
KubeCon + CloudNativeCon North America continues in Atlanta, and the second day of the event brought a number of new announcements.
Here are some of the highlights:
NVIDIA Grove improves inference scaling on Kubernetes
NVIDIA announced the release of NVIDIA Grove, an open-source API for running modern ML inference workloads on Kubernetes clusters. It enables users to describe their inference serving system in Kubernetes as a single Custom Resource from which the platform can coordinate hierarchical gang scheduling, topology-aware placement, multilevel autoscaling, and explicit startup ordering.
“You get precise control of how the system behaves without stitching together scripts, YAML files, or custom controllers,” NVIDIA wrote in a blog post.
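For readers unfamiliar with two of the terms above, here is a minimal Python sketch of the general ideas behind gang scheduling (a group of pods is placed all together or not at all) and explicit startup ordering (components start only after their dependencies). It is not Grove's API or NVIDIA's implementation; the function names, pod names, node names, and the simple first-fit heuristic are all assumptions made for illustration.

```python
from graphlib import TopologicalSorter


def gang_schedule(gang, free_gpus):
    """All-or-nothing placement of a group of pods (illustrative only).

    gang: dict mapping hypothetical pod name -> GPUs required
    free_gpus: dict mapping hypothetical node name -> free GPUs
    Returns a pod -> node mapping, or None if this simple heuristic
    cannot place every pod in the gang.
    """
    remaining = dict(free_gpus)  # work on a copy; a failed attempt changes nothing
    placement = {}
    # Place the largest pods first (first-fit-decreasing heuristic).
    for pod, need in sorted(gang.items(), key=lambda kv: -kv[1]):
        node = next((n for n, free in remaining.items() if free >= need), None)
        if node is None:
            return None  # one pod does not fit, so the whole gang is rejected
        remaining[node] -= need
        placement[pod] = node
    return placement


def startup_order(depends_on):
    """Return components in an order that respects their dependencies.

    depends_on: dict mapping component -> set of components that must start first
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    # Hypothetical inference system with two 4-GPU pods.
    print(gang_schedule({"prefill-0": 4, "decode-0": 4}, {"node-a": 4, "node-b": 4}))
    print(gang_schedule({"prefill-0": 4, "decode-0": 4}, {"node-a": 4, "node-b": 2}))
    print(startup_order({"decode": {"prefill"}, "router": {"decode"}}))
```

In the second call the gang is rejected outright because only one of its two pods would fit, which is the behavior that distinguishes gang scheduling from placing pods one at a time.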
SUSE adds new capabilities to Rancher Prime and SUSE AI
The latest updates to Rancher Prime are designed to provide more consistent operations at scale across a variety of workload environments. This release introduces a preview for Liz, an AI agent for Kubernetes management; virtual clusters to optimize GPU resources; a preview for micro-segmentation that decouples network functions from physical hardware; and expanded observability capabilities.
SUSE AI adds a preview for an integrated MCP proxy, which will enable centralized management of MCP endpoints, optimize model costs, and enhance data access control. It also includes an expanded inference engine portfolio and the OpenTelemetry operator for auto-instrumentation.
Chronosphere launches AI-Guided Troubleshooting capabilities
These new capabilities combine AI reasoning with a Temporal Knowledge Graph, which is a queryable map of an organization’s services, infrastructure, and their relationships. This provides the context needed for AI to surface the most meaningful next steps in an investigation, Chronosphere explained.
At each step of an investigation, the tool explains what has been analyzed or ruled out, and it also feeds these results back into the Temporal Knowledge Graph to improve future suggestions.
“For AI to be effective in observability, it needs more than pattern recognition and summarization,” said Martin Mao, CEO and co-founder of Chronosphere. “Chronosphere has spent years building the data foundation and analytical depth needed for AI to actually help engineers. With our Temporal Knowledge Graph and advanced analytics capabilities, we’re giving AI the understanding it needs to make observability truly intelligent — and giving engineers the confidence to trust its guidance.”
Sysdig announces new threat investigation and analysis capabilities for Falco
Falco is the company’s open-source tool for runtime cloud threat detection, and the latest capabilities improve how it works with Stratoshark, Sysdig’s other open-source application analysis tool.
Falco can now record system capture (SCAP) files when a specific rule is triggered, and Stratoshark can then consume those files. According to Sysdig, this new capability will enable users to more seamlessly move “from real-time threat detection into post-event analysis.”
Dash0 announces AI agents for observability
These agents are part of a new platform called Agent0, which is integrated natively into Dash0.
This new platform provides five focused expert agents: The Seeker, which helps with troubleshooting and incident triage; The Oracle, which helps improve PromQL queries; The Pathfinder, which helps with onboarding and instrumentation; The Threadweaver, which is a trace analyst and narrative builder; and The Artist, which is a dashboard and alerting builder.
The company also says it plans to expand Agent0 over the next several months with agents for error management, RUM, cloud cost optimization, security, and more.
Devtron announces 2.0 release of platform
Devtron is an open-source Kubernetes management platform. The 2.0 release focuses on providing agentic SRE so that customers’ Kubernetes installations can withstand catastrophic failures and ransomware attacks and maintain high availability at scale.
New capabilities include a single pane of glass for viewing applications and infrastructure, integrated FinOps with real-time cost attribution and GPU visibility, KubeVirt integration, and automated cost controls like hibernation and rightsizing.
“Kubernetes made applications and infrastructure inseparable. Every pod defines resources, every deployment affects costs,” said Ranjan Parthasarathy, CEO of Devtron. “Yet platform teams still use separate tools for monitoring apps, managing infrastructure, and tracking costs. Devtron 2.0 provides true unified visibility. When a service slows down, you immediately see if it’s on an overloaded node. When costs spike, you see exactly which application is consuming what resources. Our Agentic SRE takes this further, autonomously optimizing across all three domains.”
Cloudsmith launches MCP Server
Cloudsmith is a company that provides cloud-native artifact management, and this MCP server will allow developers to integrate Cloudsmith’s capabilities directly into their workflows.
Developers can use it to get answers about their repositories, packages, and builds, and can initiate certain actions with full audit logs to maintain visibility over interactions.
“AI is redefining how developers work, moving from manual clicks to natural language interactions. We see this shift every day with our customers. Cloudsmith’s MCP Server is a necessary bridge to this new way of working,” said Alison Sickelka, VP of Product at Cloudsmith. “By integrating directly with tools like Claude and CoPilot, we ensure engineers can manage, secure, and make decisions about their software artifacts simply by asking a question within the environment they already use. This isn’t just about convenience, it brings trusted artifact data and governance exactly where developers build, making the AI part of the secure software supply chain, not separate from it.”
Catch up on the news from Day 1 here.
