
Global Uptime Unlocked - From Data to Action with Agentic AI

Expected reductions: 60% in MTTR
Engineering productivity: 100s of eng hrs saved per month
Improved customer experiences
Greater system reliability and ROI

About DataStax

DataStax, an IBM company, provides mission-critical applications for enterprises around the globe. At the heart of its offerings is Astra DB, a cloud-native, vector-enabled database-as-a-service built on Apache Cassandra®. Thousands of developers and leading enterprises rely on DataStax to deliver scalable AI solutions. The company operates in tens of regions across the three major public clouds, processing millions of database requests per second with multiple nines of availability. Its infrastructure encompasses hundreds of microservices hosted on thousands of servers and supporting tens of thousands of database workloads. In such a complex production environment, even minor disruptions lasting tens of seconds to minutes can lead to significant outages, jeopardizing customer trust and operational continuity.

The addition of DataStax to watsonx.data enhances IBM's vector capabilities and strengthens our retrieval-augmented generation and knowledge embedding capabilities. IBM is integrating DataStax tools and technologies into watsonx.data. These include Astra DB and Hyper-Converged Database, which provide NoSQL and vector database capabilities powered by the open-source Apache Cassandra®. Hyper-Converged Database is available now within watsonx.data, and Astra DB will be available within watsonx.data in Q3 2025.


Key Challenges: Ensuring Unbroken Astra DB Uptime Across a Multi-Cloud, Global Footprint

Uninterrupted Service: Because DataStax operates at massive scale across thousands of interconnected services, even brief disruptions can trigger cascading failures. The complexity of a globally distributed system requires that issues be identified and contained swiftly to avoid widespread impact on uptime, performance, and customer experience.

Operational Overhead: Modern production environments evolve rapidly, yet traditional troubleshooting methods have failed to keep up. Expertise is often siloed across teams, and critical knowledge, when documented at all, becomes outdated quickly. As a result, senior engineers are frequently pulled into firefighting mode, limiting their ability to focus on high-leverage, strategic work.

Fragmented Observability: Legacy observability tools rely heavily on static dashboards and fragmented telemetry streams, making it challenging to spot emerging risks in real time. Infrastructure-as-Code practices introduce additional complexity, as stateless redeployments of tools like Grafana often break authentication tokens, disrupting automated monitoring when it is needed most.

The Solution: Embracing Autonomous Operations with Resolve AI

DataStax’s pursuit of intelligent automation began with leveraging AI code generation tools to expedite development. However, it soon became clear that generating code only touched the surface of operational needs. The team uncovered a broader opportunity: to leverage AI to completely transform production workflows.

Resolve AI now anchors that strategy by delivering value across three pillars:

  • Unified Knowledge: It continuously maps services, infrastructure, change history, and even tribal knowledge into a real-time temporal knowledge graph, providing engineers with full system context.
  • Autonomous Investigation: It approaches incidents like a seasoned engineer, observing symptoms, forming hypotheses, running diagnostics, and guiding teams to root cause without constant human handholding.
  • Continuous Learning: Resolve AI sharpens its reasoning with each incident and interaction. It learns how DataStax systems behave and where failures emerge, responding faster and smarter to both novel and recurring issues.

As Resolve AI starts its life in production at DataStax, the team expects, based on extensive testing, significant reductions in mean time to resolution (MTTR). Engineers are expected to gain back hundreds of hours every month that are currently lost to expensive triage and firefighting. More importantly, teams will finally be able to reallocate their time toward higher-leverage initiatives: building internal tooling, tackling long-delayed reliability projects, pushing debugging earlier into pre-prod, and shaping the platform’s next phase of scale.

“With Resolve AI starting to play a more central role in our operations, we’re looking to achieve transformational productivity gains, significantly increase our throughput, and enable every engineer, from junior to senior, to focus on innovation rather than routine tasks.” — Shankar Ramaswamy, Head of Engineering, DataStax

Observability Without Gaps

Grafana sits at the heart of DataStax’s monitoring, but its stateless Kubernetes deployment regularly broke API access. Every redeploy wiped its service tokens, undermining automation.

To address this issue, DataStax and Resolve AI partnered to develop and open-source a lightweight Grafana Service Account Sidecar that automatically provisions a dedicated service account for each new Grafana pod. This ensures that Resolve AI always has uninterrupted API access to critical monitoring data. Additionally, by securely storing tokens as Kubernetes secrets, the sidecar maintains the necessary credentials across redeployments, eliminating any potential interruptions in observability. The sidecar has become a small but critical component in making DataStax’s production observability resilient and keeping its AI systems fully informed.
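The flow described above can be sketched roughly as follows. This is a minimal illustration and not the open-sourced sidecar's actual code: the function names, the `sidecar-` naming scheme, the `Viewer` role, and the injected `send` transport are all assumptions made for the example. The two endpoint paths follow Grafana's service account HTTP API, and the secret is shown only as a manifest rather than being applied to a cluster.

```python
import base64
from typing import Callable, Optional

# send(method, url, body) -> parsed JSON response as a dict.
# Hypothetical transport, injected so the flow can run without a live Grafana.
Transport = Callable[[str, str, Optional[dict]], dict]


def provision_token(grafana_url: str, pod_name: str, send: Transport) -> str:
    """Create a service account for one Grafana pod and return a fresh token."""
    # Step 1: create a dedicated service account for this pod.
    account = send(
        "POST",
        f"{grafana_url}/api/serviceaccounts",
        {"name": f"sidecar-{pod_name}", "role": "Viewer"},  # assumed name and role
    )
    # Step 2: mint a token for that account; Grafana returns it under "key".
    token = send(
        "POST",
        f"{grafana_url}/api/serviceaccounts/{account['id']}/tokens",
        {"name": f"sidecar-{pod_name}-token"},
    )
    return token["key"]


def secret_manifest(name: str, namespace: str, token: str) -> dict:
    """Build a Kubernetes Secret manifest that stores the token (base64 in `data`)."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {"token": base64.b64encode(token.encode()).decode()},
    }
```

Keeping the token in a Secret rather than in the pod's own state is what lets a consumer such as Resolve AI read a valid credential after the Grafana pod has been replaced.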


Deeper Impact with Agentic AI for Software Engineering

Resolve AI delivers intelligence across the entire operational lifecycle, far beyond incident response. During live incidents, the Agentic AI autonomously surfaces diagnostics like error spikes, latency patterns, and dependency failures directly within engineers’ existing workflows. This real-time insight accelerates triage and decision-making.

Its value deepens in multi-tenant environments where overlapping services make RCA notoriously hard. Resolve AI correlates fragmented telemetry to build clear incident timelines, revealing hidden dependencies and uncovering root causes faster. Post-incident, its contextual understanding helps teams preserve learnings and prevent recurrences.
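As a rough illustration of what correlating fragmented telemetry into an incident timeline can mean, the sketch below merges events from separate streams into one time-ordered view. It is not Resolve AI's implementation: the `Event` fields, the stream names, and the 30-minute lookback window are hypothetical choices for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class Event:
    at: datetime
    source: str   # hypothetical stream name, e.g. "deploys", "metrics", "logs"
    service: str
    detail: str


def build_timeline(
    streams: Iterable[Iterable[Event]],
    incident_start: datetime,
    lookback: timedelta = timedelta(minutes=30),  # assumed window
) -> List[Event]:
    """Merge per-source events into one list ordered by time.

    Events older than `incident_start - lookback` are dropped so the
    timeline covers only the run-up to the incident and what followed.
    """
    window_start = incident_start - lookback
    merged = [e for stream in streams for e in stream if e.at >= window_start]
    return sorted(merged, key=lambda e: e.at)
```

Reading a deploy, a latency spike, and an error burst in a single ordered list is often enough to suggest which change to examine first, even before any dependency analysis.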

Now, DataStax is extending this intelligence earlier in the development cycle, enabling engineers in dev and staging environments to use Resolve AI for debugging, pre-prod triage, and early issue detection. This proactive shift embeds operational awareness where it matters most: before incidents hit production.

The impact of Resolve AI is both operational and strategic:

  • Expected reductions of up to 60% in MTTR through faster investigation and recovery.
  • Estimated monthly savings of hundreds of engineering hours by automating manual diagnostics.
  • Improved on-call experience and efficiency, reducing interruptions and enabling leaner response models.
  • Greater system reliability and ROI, with fewer escalations and lower incident-related costs.
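To make the MTTR figure concrete, here is a small worked calculation. The incident durations are invented for illustration and are not DataStax data; only the 60% figure comes from the expectations listed above.

```python
from typing import Sequence


def mean_time_to_resolution(durations_min: Sequence[float]) -> float:
    """MTTR is the average time from detection to resolution across incidents."""
    return sum(durations_min) / len(durations_min)


def after_reduction(mttr_min: float, reduction: float) -> float:
    """Apply a fractional reduction (0.60 means 60%) to an MTTR value."""
    return mttr_min * (1 - reduction)


# Hypothetical resolution times in minutes for five past incidents.
baseline = [90, 45, 120, 60, 30]

mttr = mean_time_to_resolution(baseline)
target = after_reduction(mttr, 0.60)
print(f"baseline MTTR: {mttr:.1f} min, after a 60% reduction: {target:.1f} min")
```

With these made-up numbers, the average falls from 69 minutes to roughly 28 minutes per incident.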

Once the expected impact is achieved, it will underscore Resolve AI's position not only as a tool for incident management but also as a strategic Agentic AI that supports the entire software engineering lifecycle. This advancement transforms how modern infrastructure is built, monitored, and maintained, paving the way for a future where software engineering evolves alongside Agentic AI.

View the video below to hear Shankar Ramaswamy describe how DataStax is approaching AI within software engineering, with additional context on the role Resolve AI is playing in increasing engineering velocity and developer productivity.



©Resolve.ai - All rights reserved
