
Site reliability is no longer proven by uptime alone, according to The SRE Report 2026, published by LogicMonitor following its recent acquisition of Catchpoint, which has produced the report for eight years.
“Reliability is becoming this idea of resilience. It’s about moving from uptime to performance to experience, from reliability to resilience,” Leo Vasiliou, product marketing director at Catchpoint, a LogicMonitor company, and one of the authors of the report, told ITOps Times. “If you’re not talking about earning trust for your business, if you’re not talking about resilience, then you’re probably kind of behind the eight ball, or maybe lagging behind.”
The foundation for this shift is the combination of Internet Performance Monitoring and digital experience monitoring. Vasiliou gave the example of a pizza delivery: having it arrive hot (experience) is important, but so is whether the driver can navigate all the potholes in the city and deliver it to the address the user requested (performance).
And perhaps just as importantly, can the business trust its digital systems to perform in the moments that matter? The pairing of LogicMonitor’s observability platform and Catchpoint’s monitoring solutions helps build that trust, providing AI observability from the user to the cloud, and all the way back to the application and underlying infrastructure.
Key findings cited in the report include:
- Slow is the new down, and now the default expectation: Nearly two-thirds of respondents say performance degradations are as serious as outages, reinforcing speed and experience as core reliability outcomes.
- Reliability is felt by users, but rarely measured by the business: Only 26% consistently measure whether performance improvements affect business metrics, such as revenue or NPS, revealing a persistent gap between what users feel and what organizations track.
- AI optimism is surging, while confidence in observing AI lags: 60% of respondents express optimism about AI in SRE, and more than half plan to deploy agentic AI systems in production within the next 12 months. While this represents more than double the confidence reported last year, teams report low confidence in monitoring AI reliability, underscoring the need for observability across internal systems and external dependencies.
- Toil remains high, even as AI adoption grows: Median toil is 34% of engineers’ time. While 49% report that AI has reduced toil, others report no change or an increased burden, showing a gap between leadership expectations and frontline realities.
- Resilience maturity remains uneven: Only 17% run chaos or resilience experiments regularly in production, and nearly half report low tolerance for planned failure, pointing to a widening divide between proactive resilience teams and reactive teams.
- Learning has become a reliability risk factor: Despite broad agreement that learning matters, just 6% report protected learning time, and most spend only 3–4 hours per month on upskilling, raising concerns about knowledge decay as systems become more AI-driven and Internet-dependent.
The report, the company wrote, “underscores a pivotal reality: reliability is increasingly a trust and reputation metric, not just an engineering scorecard. The organizations that treat reliability as a shared business language, and instrument it accordingly, will be better positioned to scale AI, protect digital experiences, and sustain customer trust.”
