DataStax, an IBM company, provides mission-critical applications for enterprises around the globe. At the heart of its offerings is Astra DB, a cloud-native, vector-enabled database-as-a-service built on Apache Cassandra®. Thousands of developers and leading enterprises rely on DataStax to deliver scalable AI solutions. The company operates in tens of regions across the three major public clouds, processing millions of database requests per second with multiple nines of availability. Its infrastructure encompasses hundreds of microservices hosted on thousands of servers and supporting tens of thousands of database workloads. In such a complex production environment, even minor disruptions of tens of seconds to minutes can lead to significant outages, jeopardizing customer trust and operational continuity.
The addition of DataStax to watsonx.data enhances IBM's vector capabilities and strengthens our retrieval-augmented generation and knowledge embedding capabilities. IBM is integrating DataStax tools and technologies into watsonx.data. These include Astra DB and Hyper-Converged Database, which provide NoSQL and vector database capabilities powered by the open-source Apache Cassandra®. Hyper-Converged Database is available now within watsonx.data, and Astra DB will be available within watsonx.data in Q3 2025.
Uninterrupted Service: Operating at massive scale across thousands of interconnected services, even brief disruptions at DataStax can trigger cascading failures. The complexity of a globally distributed system requires that issues be identified and contained swiftly to avoid widespread impact on uptime, performance, and customer experience.
Operational Overhead: Modern production environments evolve rapidly, yet traditional troubleshooting methods have failed to keep up. Expertise is often siloed across teams, and critical knowledge, when documented at all, becomes outdated quickly. As a result, senior engineers are frequently pulled into firefighting mode, limiting their ability to focus on high-leverage, strategic work.
Fragmented Observability: Legacy observability tools rely heavily on static dashboards and fragmented telemetry streams, making it challenging to spot emerging risks in real time. Infrastructure-as-Code practices introduce additional complexity, as stateless redeployments of tools like Grafana often invalidate authentication tokens, disrupting automated monitoring when it is needed most.
DataStax’s pursuit of intelligent automation began with leveraging AI code generation tools to expedite development. However, it soon became clear that generating code only touched the surface of operational needs. The team uncovered a broader opportunity: to leverage AI to completely transform production workflows.
Resolve AI now anchors that strategy by delivering value across three pillars:
Resolve AI is just entering production at DataStax, but based on extensive testing, the team expects significant reductions in mean time to resolution (MTTR). Engineers are expected to gain back hundreds of hours every month that are currently lost to expensive triage and firefighting. More importantly, teams will finally be able to reallocate their time toward higher-leverage initiatives: building internal tooling, tackling long-delayed reliability projects, pushing debugging earlier into pre-prod, and shaping the platform’s next phase of scale.
“With Resolve AI starting to play a more central role in our operations, we’re looking to achieve transformational productivity gains, significantly increase our throughput, and enable every engineer, from junior to senior, to focus on innovation rather than routine tasks.” — Shankar Ramaswamy, Head of Engineering, DataStax
Grafana sits at the heart of DataStax’s monitoring, but its stateless Kubernetes deployment regularly broke API access. Every redeploy wiped its service tokens, undermining automation.
To address this issue, DataStax and Resolve AI partnered to develop and open-source a lightweight Grafana Service Account Sidecar that automatically provisions a dedicated service account for each new Grafana pod. This ensures that Resolve AI always has uninterrupted API access to critical monitoring data. Additionally, by securely storing tokens as Kubernetes secrets, the sidecar maintains the necessary credentials across redeployments, eliminating any potential interruptions in observability. The sidecar has become a small but critical component in making DataStax’s production observability resilient and keeping its AI systems fully informed.
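The two steps described above (create a service account and token for the new Grafana pod, then persist the token as a Kubernetes secret) can be sketched roughly as follows. This is an illustrative sketch, not the code of the open-sourced sidecar. It assumes Grafana's service account HTTP API (POST /api/serviceaccounts and POST /api/serviceaccounts/{id}/tokens). The names provision_token, secret_manifest, and fake_post are hypothetical, as are the account, token, and secret names. The HTTP call is injected as a function so the sketch runs without a live Grafana.

```python
import base64
from typing import Callable

# post(path, payload) -> parsed JSON response. Injected so the sketch can run
# against a stub; a real sidecar would send authenticated HTTP requests.
Post = Callable[[str, dict], dict]


def provision_token(post: Post, pod_name: str, role: str = "Viewer") -> str:
    """Create a service account for this pod and return a new API token."""
    account = post(
        "/api/serviceaccounts",
        {"name": f"sidecar-{pod_name}", "role": role},  # hypothetical naming
    )
    token = post(
        f"/api/serviceaccounts/{account['id']}/tokens",
        {"name": f"token-{pod_name}"},
    )
    return token["key"]


def secret_manifest(name: str, namespace: str, token: str) -> dict:
    """Build a Kubernetes Secret manifest holding the token (base64-encoded)."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {"token": base64.b64encode(token.encode()).decode()},
    }


def fake_post(path: str, payload: dict) -> dict:
    """Stub standing in for Grafana; the response shapes are assumptions."""
    if path == "/api/serviceaccounts":
        return {"id": 7, "name": payload["name"]}
    return {"id": 1, "name": payload["name"], "key": "glsa_example"}
```

Calling provision_token(fake_post, "grafana-0") and passing the result to secret_manifest("grafana-sa-token", "monitoring", ...) yields a manifest that a real sidecar would then apply through the Kubernetes API. Because the secret lives outside the Grafana pod, a consumer such as Resolve AI can keep reading a valid token after a redeploy.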
Resolve AI delivers intelligence across the entire operational lifecycle, far beyond incident response. During live incidents, the Agentic AI autonomously surfaces diagnostics such as error spikes, latency patterns, and dependency failures directly within engineers’ existing workflows. This real-time insight accelerates triage and decision-making.
Its value deepens in multi-tenant environments where overlapping services make RCA notoriously hard. Resolve AI correlates fragmented telemetry to build clear incident timelines, revealing hidden dependencies and uncovering root causes faster. Post-incident, its contextual understanding helps teams preserve learnings and prevent recurrences.
Now, DataStax is extending this intelligence earlier in the development cycle, enabling engineers in dev and staging environments to use Resolve AI for debugging, pre-prod triage, and early issue detection. This proactive shift embeds operational awareness where it matters most: before incidents hit production.
The impact of Resolve AI is both operational and strategic:
When the expected impact is achieved, it will underscore Resolve AI's position not only as a tool for incident management but also as a strategic Agentic AI that supports the entire software engineering lifecycle. This advancement changes how modern infrastructure is built, monitored, and maintained, paving the way for a future where software engineering evolves alongside Agentic AI.
View the video below to hear from Shankar Ramaswamy about how DataStax is approaching AI within software engineering, with additional context on the role Resolve AI is playing in increasing engineering velocity and developer productivity.