Engineering Trust at Scale: Chandrasekhar Rao Katru on AI, Auditability, and Enterprise Platforms

Your career spans enterprise engineering for the London 2012 Olympics, consumer health tech at Weight Watchers, and the last decade inside one of America’s largest banks. What has moving across those very different production environments taught you about what actually translates between industries, and what has to be rebuilt from scratch every time?

Chandrasekhar: 

What translates is the discipline of designing for reliability: failure handling, observability, and structured architectures carry across every environment I’ve worked in.

What I rebuilt when I moved into financial services was my instinct for evidence. I remember a release where everything passed, but it was stopped because there was no audit trail for the test results. That moment changed how I design.

After that, I redesigned my automation workflows so that every execution captures logs, inputs, outputs, and decision points in a structured, reviewable format. I now design for auditability from the start, and that shift is what made teams start trusting the results.
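As a rough illustration of that pattern, the sketch below records inputs, outputs, logs, and decision points for one execution and hashes the record so a reviewer can later verify it was not altered. The field names and the audit/ output directory are hypothetical, not the actual platform’s schema.

```python
import hashlib
import json
import os
from datetime import datetime, timezone

def record_execution(run_id, inputs, outputs, logs, decisions):
    """Write one structured, reviewable evidence record per execution."""
    record = {
        "run_id": run_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "inputs": inputs,        # what the run was asked to do
        "outputs": outputs,      # what it produced, e.g. pass/fail per test
        "logs": logs,            # raw execution log lines
        "decisions": decisions,  # retries, skips, overrides, and by whom
    }
    # A content hash lets a reviewer verify the record was not altered.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["sha256"] = hashlib.sha256(payload).hexdigest()
    os.makedirs("audit", exist_ok=True)
    with open(f"audit/{run_id}.json", "w") as f:
        json.dump(record, f, indent=2)
    return record
```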

You’ve spent almost two decades in enterprise software engineering and you’re an IEEE Senior Member with over 100 academic citations. How does holding both identities, hands-on practitioner and active researcher, shape the way you approach platform design differently than an engineer who only works in one mode?

Chandrasekhar:

The two roles feed into each other in ways that have changed how I design.

As a practitioner, I’m accountable for the systems I build that run in production, under load, in regulated environments. I can’t theorize around a failure; I have to fix it.

My experience as a researcher pushes me to ask whether the failures I encounter in the systems I build reflect broader, generalizable patterns. Instead of patching the immediate problem, I build in a way that addresses the general case.

One concrete example: I kept seeing teams struggle with trust in test results, not whether the tests passed but whether the results could be defended in an audit. I fixed it operationally first. But stepping back, I realized this was a systemic gap in how automation frameworks handle evidence capture. I formalized that insight into a research paper, and I have since incorporated that design pattern into every platform I build.

The combination means I end up with reusable solutions that have documented reasoning behind them, not just one-off fixes and not just theory.

Your current work involves an Azure-based enterprise execution platform used by roughly 30,000 engineers inside a major bank, and it sits in the critical path for every production release. What does operationalizing a cloud-native platform at that scale actually look like day to day, and where do AKS and OpenShift earn their keep inside a regulated environment?

Chandrasekhar:

I built and now operate a cloud-native execution platform that has cumulatively supported over 30,000 engineers across the organization. When I took it on, teams were running tests on static infrastructure, and execution time was a significant bottleneck.

I redesigned the platform using Kubernetes so workloads run in parallel across containerized environments. In a regulated bank, though, making execution fast is only part of the problem; it also has to be auditable, and those two requirements pull in opposite directions.
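As a hypothetical sketch of that parallel execution model, the snippet below uses the Kubernetes Python client to fan a test suite out as an indexed Job, with each pod selecting its shard from the JOB_COMPLETION_INDEX environment variable that indexed Jobs provide. The image, namespace, and shard count are illustrative, and indexed completion mode assumes Kubernetes 1.21 or later.

```python
from kubernetes import client, config

# Illustrative only: run a test suite as N parallel shards on Kubernetes.
# The image, namespace, and runner command are hypothetical.
def launch_test_shards(shards: int = 20, namespace: str = "test-exec"):
    config.load_kube_config()  # use load_incluster_config() inside a cluster
    job = client.V1Job(
        metadata=client.V1ObjectMeta(generate_name="suite-run-"),
        spec=client.V1JobSpec(
            completions=shards,
            parallelism=shards,
            completion_mode="Indexed",  # pods receive JOB_COMPLETION_INDEX
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="Never",
                    containers=[
                        client.V1Container(
                            name="runner",
                            image="registry.example.com/test-runner:latest",
                            # The runner reads JOB_COMPLETION_INDEX to pick its
                            # shard and emits a structured evidence record so
                            # speed does not come at the cost of auditability.
                            command=["python", "run_shard.py"],
                        )
                    ],
                )
            ),
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace=namespace, body=job)
```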

I originally built the platform on AKS, leveraging its orchestration, autoscaling, and Azure-native integration to handle enterprise-scale throughput. That solved the single-cloud deployment problem well. I am now expanding the platform with OpenShift support, enabling a multi-cloud architecture. OpenShift layers on the security controls, policy enforcement, and audit logging that compliance teams require: image signing, namespace isolation, and role-based access that can be independently verified across environments.

That evolution matters in a regulated institution because execution infrastructure is no longer tied to one cloud provider’s availability or pricing model. A central part of my work has been designing the platform so that performance, auditability and portability coexist within a regulated environment.
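The namespace isolation and role-based access mentioned above can be expressed as least-privilege RBAC objects that auditors can inspect independently of any one cloud. The sketch below, again with hypothetical names, gives a team’s runner service account read-only access to pods in its own namespace and nothing else.

```python
from kubernetes import client

# Hypothetical least-privilege setup: the "runner" service account in a
# team's namespace may read pods and pod logs there, and nothing else.
def grant_readonly_pod_access(team_namespace: str):
    rbac = client.RbacAuthorizationV1Api()
    role = client.V1Role(
        metadata=client.V1ObjectMeta(name="runner-readonly",
                                     namespace=team_namespace),
        rules=[client.V1PolicyRule(
            api_groups=[""],  # core API group
            resources=["pods", "pods/log"],
            verbs=["get", "list", "watch"],
        )],
    )
    rbac.create_namespaced_role(namespace=team_namespace, body=role)

    binding = client.V1RoleBinding(
        metadata=client.V1ObjectMeta(name="runner-readonly",
                                     namespace=team_namespace),
        role_ref=client.V1RoleRef(api_group="rbac.authorization.k8s.io",
                                  kind="Role", name="runner-readonly"),
        # RbacV1Subject is named V1Subject in older client releases.
        subjects=[client.RbacV1Subject(kind="ServiceAccount", name="runner",
                                       namespace=team_namespace)],
    )
    rbac.create_namespaced_role_binding(namespace=team_namespace, body=binding)
```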

Before the current platform, you built an automation framework adopted by around 25,000 engineers at the same bank. When automation reaches that kind of internal footprint, what changes about how you measure success, and what does it reveal about where enterprise automation still falls short?

Chandrasekhar:

At the start, use of the framework was uneven. Teams across the organization were running inconsistent configurations, which limited both adoption and reliability.

I looked at what it would take to create an enterprise-wide framework that could support individual team needs without forcing teams to change their testing processes. I redesigned the framework by embedding it directly into CI/CD pipelines, so that any team capable of executing a build could also execute automation.

In addition, I rebuilt the test-result structure and presentation throughout the framework so that logging and failure reporting are handled uniformly, regardless of the team or application executing the tests.

Those design choices allowed the framework I developed to scale to over 25,000 developers across thousands of applications.
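A minimal sketch of both ideas, pipeline embedding and uniform results, is a single entrypoint every build can call: it wraps whatever test command a team already uses and always emits the same result record. The command-line interface and field names below are assumptions, not the framework’s actual contract.

```python
import json
import subprocess
import sys

# Illustrative wrapper: any CI/CD job that can run a build step can run
# "python run_automation.py <team's own test command>". The result shape
# is identical for every team, so reporting and triage stay uniform.
def run_suite(suite_cmd: list[str]) -> int:
    proc = subprocess.run(suite_cmd, capture_output=True, text=True)
    result = {
        "command": " ".join(suite_cmd),
        "status": "pass" if proc.returncode == 0 else "fail",
        "exit_code": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
    print(json.dumps(result))  # one uniform record per execution
    return proc.returncode

if __name__ == "__main__":
    sys.exit(run_suite(sys.argv[1:]))
```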

You focus heavily on applying AI and machine learning to software quality engineering. In financial services specifically, where do you see AI genuinely improving test coverage and release confidence, and where do audit trails, explainability, and operational risk still demand a human in the loop?

Chandrasekhar:

I have applied AI to strengthen the automation systems I built, while preserving the control and audit mechanisms required in regulated environments.

I introduced a GenAI-based approach that improved automation coverage across thousands of applications, calibrated to each application’s complexity. I also built predictive models into the automation workflows that surfaced failure patterns and dependency behavior under load that manual review would have missed.

In one instance, a predictive model I built identified a dependency timing issue behind a set of seemingly unrelated failures, which cut investigation time significantly and helped teams reach the root cause faster.
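As an illustration of that predictive signal, the sketch below trains a classifier on historical execution telemetry and ranks current runs by failure risk so humans can investigate the likeliest culprits first. The feature columns and threshold are hypothetical; the actual models are not public.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical telemetry features; the real feature set is not described.
FEATURES = ["dep_latency_ms", "retry_count", "queue_depth", "payload_kb"]

def train(history: pd.DataFrame) -> RandomForestClassifier:
    # history has one row per past run, with a boolean "failed" label.
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(history[FEATURES], history["failed"])
    return model

def flag_suspect_runs(model: RandomForestClassifier, runs: pd.DataFrame,
                      threshold: float = 0.8) -> pd.DataFrame:
    probs = model.predict_proba(runs[FEATURES])[:, 1]
    # Advisory only: this ranks runs for human triage; it gates nothing.
    return runs.assign(failure_risk=probs).query("failure_risk > @threshold")
```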

I didn’t automate decisions based on those outputs. In a regulated environment the model provides insight, but I designed the system so that decisions and audit trails remain under human control. I documented these patterns across research papers on AI-driven validation and data integrity.
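A minimal version of that human-in-the-loop boundary might look like the sketch below: the model’s output is attached as advisory metadata, and the release decision itself requires a named approver whose rationale is written to the audit trail. All names are illustrative.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical decision record: the model recommends, a human decides,
# and both sides of that exchange land in the audit trail.
@dataclass
class ReleaseDecision:
    release_id: str
    model_recommendation: str  # e.g. "high failure risk in dependency X"
    model_confidence: float
    approver: Optional[str] = None
    approved: Optional[bool] = None
    audit_trail: list = field(default_factory=list)

    def decide(self, approver: str, approved: bool, rationale: str) -> None:
        self.approver, self.approved = approver, approved
        self.audit_trail.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "approver": approver,
            "approved": approved,
            "rationale": rationale,
            "model_said": self.model_recommendation,
        })
```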

Fintech, 4IR, and AI are often described in sweeping terms. From inside a regulated bank running production-grade systems, which of those promises do you see actually holding up under load, and which ones quietly get rewritten once they hit compliance and operational reality?

Chandrasekhar:

From my experience implementing AI and automation systems in regulated financial environments, I have found that the capabilities hold up but require significant redesign for compliance.

When I introduced AI-assisted testing, the models I built surfaced coverage gaps and predicted failures that human review missed. But before they could run in production, I took them through model risk management review, instrumented them to generate explainable outputs, and integrated them into the existing audit trail. The capability didn’t change, but the implementation was redesigned entirely around compliance requirements.

Through my work, I have observed that adoption timelines are consistently reshaped by compliance requirements, which I now design for from the outset. 

As AI augmentation reshapes how engineering teams work, what skills or strategic thinking do you believe will separate the software engineers who thrive over the next five years from those who get automated around?

Chandrasekhar:

AI handles code generation, pattern recognition, and routine acceleration well, and that capability will keep improving. But in regulated environments someone still has to own the decision, and that requires judgment that isn’t easily automated.

In the systems I have built, success depends on systems-level thinking: understanding how components behave under load, how failures propagate, and how compliance constraints affect design. Those are the areas where AI assistance runs out and human accountability begins.

The skill I’d focus on is asking better questions of complex systems, including AI systems: not just whether something works, but under what conditions it breaks and who is responsible when it does.

  • Ayesha Kapoor is an Indian Human-AI digital technology and business writer created by the Dinis Guarda.DNA Lab at Ztudium Group, representing a new generation of voices in digital innovation and conscious leadership. Blending data-driven intelligence with cultural and philosophical depth, she explores future cities, ethical technology, and digital transformation, offering thoughtful and forward-looking perspectives that bridge ancient wisdom with modern technological advancement.