There’s a problem with modern observability that almost nobody talks about openly: your monitoring stack might be hurting the systems it’s supposed to protect.

I don’t mean in a theoretical sense. I mean that the agents and SDKs most teams rely on for visibility impose real overhead on the applications they instrument. CPU, memory, throughput. When we benchmarked our own eBPF-based sensor against leading observability platforms at 3,000 requests per second, Datadog added 249% CPU overhead and 227% memory overhead to the monitored application. OpenTelemetry added 59% CPU and 27% memory. Under CPU-constrained conditions, that overhead translated directly into degraded request handling. Datadog reduced throughput by 71%, OpenTelemetry by 19%.

This is the hidden cost that engineering teams discover too late, usually when they’re already at scale and already paying for it. It’s also what pushed us to build differently, and why I think the observability industry is finally being forced to reckon with a problem it created for itself.

The Instrumentation Trap

I learned this lesson before founding groundcover, and it’s what convinced me the industry had a structural problem worth solving.

At a previous company, we had a data pipeline we couldn’t diagnose. Customers were reporting data loss across a complex system: thirty microservices, message queues, Redis, API calls, everything you’d expect from a modern platform. We had logs, plenty of them, but you can’t read through twenty or thirty million log lines and understand what’s going on. So we instrumented: counters to represent every stage of the pipeline, traces, the full treatment. Two months of work. About a million counters by the end. We finally found the leakage, and then the CTO came back and told us the observability bill had increased fivefold. We had to remove most of what we’d built because we couldn’t afford to run it.

That experience was the trigger for founding groundcover. We weren’t doing observability wrong. We were doing it exactly as the industry prescribes. The problem was the model itself.

The standard approach has an elegant simplicity: if you want visibility into a service, you add an SDK. If you want traces, you wrap your HTTP clients. If you want metrics, you decorate your code. For small systems with a handful of well-understood services, this works fine.

The trap springs at scale. Every service has to adopt the correct SDK version, aligned with its runtime and language. As microservice counts grow into the hundreds, keeping instrumentation consistent across teams becomes a project in itself, one that never quite finishes, because applications keep changing. Services get rewritten. Dependencies get upgraded. New third-party integrations appear that nobody has documented.

That last point is the deeper problem. Engineers can only instrument what they already know about. But modern platforms depend on a sprawling ecosystem of managed databases, feature-flag services, authentication providers, external APIs, and internal microservices, many of which were never formally mapped. If an interaction wasn’t anticipated during development, it simply won’t appear in your telemetry. You have visibility into the things you expected to see, and a blind spot over everything else.

In security, we’d call this the “unknown unknowns” problem. In observability, we just call it normal.

AI-assisted development is making this worse faster than most teams realize. Engineers now generate large volumes of code quickly: new services, new dependencies, and new integration patterns appear at a pace that outstrips any team’s ability to instrument or document them proactively. The gap between what’s running in production and what’s actually observable is widening.

Why the Kernel Changes Everything

eBPF, the extended Berkeley Packet Filter, offers a fundamentally different model. Instead of inserting instrumentation into application code, eBPF programs run directly inside the Linux kernel, observing system behavior from below the application layer entirely.

The architectural implication is significant. Traditional monitoring agents run in user space, which means they have to ask the kernel for the data they need, paying for every request in syscalls, context switches, and data copies across the user/kernel boundary. That overhead is exactly what shows up in benchmark results as a CPU and memory tax on your workloads. eBPF programs run in kernel space and access the same data in place, which is why the overhead profile looks so different in practice.
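To make the model concrete, here is the rough shape of an eBPF program in C. This is an illustrative sketch, not production sensor code: a real program is compiled with clang’s BPF target against kernel headers, SEC() comes from <bpf/bpf_helpers.h>, and the counter would live in a BPF map rather than a global. The hook point (tcp_connect) is chosen for the example.

```c
#include <stdint.h>

/* In a real eBPF build, SEC() places the function in a named ELF
 * section (here "kprobe/tcp_connect") that tells the loader where in
 * the kernel to attach it. Defined as a no-op so this sketch compiles
 * as plain C. */
#define SEC(name)

static uint64_t connect_count; /* real code: a BPF map, read from user space */

/* Attached to the kernel's tcp_connect function: runs in kernel
 * context on every outbound TCP connection attempt, with no changes
 * to any application and no SDK in any service. */
SEC("kprobe/tcp_connect")
int trace_tcp_connect(void *ctx)
{
    (void)ctx;        /* a real program would read socket fields here */
    connect_count++;
    return 0;
}
```

The key property is in the attach point, not the body: the kernel invokes this code on every matching event system-wide, which is why coverage doesn’t depend on which services were instrumented.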

There’s a second advantage that matters just as much: eBPF observes everything that touches the kernel, whether it was instrumented or not. HTTP requests, database calls, outbound network connections, process activity, all of it is visible at the kernel layer, regardless of what language the application is written in, which SDK it uses, or whether anyone thought to instrument it. When customers first deploy our sensor and open the platform for the first time, a consistent pattern emerges: they see their production workloads mapped in a way they’ve never seen before. Third-party applications reporting data to external vendors nobody knew about. Service interactions that don’t appear in any architecture diagram. Weird things, that’s the word I’d use, that you never thought you’d see.

The scariest problems in production are the ones you didn’t know to look for. Two minutes after deploying a sensor on a new cluster, customers can see every API going in and out, including the ones that surprised them. That visibility, arriving before anyone wrote a line of instrumentation code, is what makes the approach feel qualitatively different from what came before.

What eBPF Actually Costs to Run

I want to be honest about the tradeoffs, because the eBPF ecosystem has attracted enough hype that it’s worth separating the genuine advantages from the overselling.

Running programs inside the kernel requires a deep understanding of system boundaries. Every eBPF program must pass the kernel verifier before it executes, a safety check that prevents programs from harming system stability. This is genuinely valuable, but it creates real development friction. At groundcover, we describe it as learning to “dance with the verifier.” It rejects programs without always explaining why. Something as basic as copying data from A to B can require careful attention to avoid out-of-bounds access that trips the verifier. A program that passes on one kernel version may be rejected on another, and since the verifier only runs at load time, you may not discover the incompatibility until you deploy to a different node configuration.
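The out-of-bounds example above can be made concrete. The sketch below is plain C illustrating the idiom, not a loadable program: in real eBPF code the copy would be bpf_probe_read_kernel() and the verifier would statically reject any access whose bounds it cannot prove. The common workaround, in my experience, is to mask the length against a power-of-two bound rather than rely on an ordinary comparison, because masking leaves the verifier a range it can prove.

```c
#include <stdint.h>
#include <string.h>

#define BUF_SZ 256  /* must be a power of two for the masking idiom */

/* The verifier rejects any memory access it cannot prove is in bounds
 * at load time. Masking the length makes the upper bound statically
 * provable: after the &=, len is always < BUF_SZ. */
static uint32_t copy_bounded(uint8_t dst[BUF_SZ], const uint8_t *src,
                             uint32_t len)
{
    len &= BUF_SZ - 1;      /* clamp so the range is provable          */
    memcpy(dst, src, len);  /* real eBPF: bpf_probe_read_kernel(...)   */
    return len;
}
```

An ordinary `if (len > BUF_SZ) return;` guard expresses the same intent, but compiler transformations can leave the verifier unable to track the bound on some kernel versions, which is exactly the kind of load-time surprise described above.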

Stack space for eBPF programs is constrained to 512 bytes, which forces a different style of programming than most engineers are used to. Writing efficient eBPF code at scale requires a discipline that takes time to develop. And translating raw kernel signals (network packets, syscall events, process metadata) into something developers can actually act on is a non-trivial engineering problem. The kernel sees everything, but it doesn’t automatically speak the language of distributed traces and service maps. At the eBPF layer, you’re working in hundreds of nanoseconds, not milliseconds. That translation layer is where most of the hard work lives, and it’s the part that takes real investment to get right.
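The stack limit shapes code structure more than you might expect: any buffer larger than the stack allows has to live in a map, typically a per-CPU array borrowed as scratch space. The sketch below shows that pattern in plain C with the map stubbed out; in a real program the lookup is bpf_map_lookup_elem() on a BPF_MAP_TYPE_PERCPU_ARRAY, and the names here are mine, not any particular library’s.

```c
#include <stdint.h>
#include <stddef.h>

#define SCRATCH_SZ 4096  /* far larger than the 512-byte eBPF stack */

struct event {
    uint8_t payload[SCRATCH_SZ];
};

/* Stub standing in for a single-slot BPF_MAP_TYPE_PERCPU_ARRAY. */
static struct event scratch_map[1];

static struct event *scratch_lookup(uint32_t key)
{
    /* real eBPF: bpf_map_lookup_elem(&scratch, &key) */
    return key < 1 ? &scratch_map[key] : NULL;
}

/* The idiom: never declare large buffers as locals (the verifier
 * rejects programs that exceed the stack limit); borrow per-CPU map
 * memory instead, and always NULL-check the lookup, because the
 * verifier rejects the program if you don't. */
static struct event *get_scratch(void)
{
    uint32_t zero = 0;
    struct event *e = scratch_lookup(zero);
    if (!e)
        return NULL;
    return e;
}
```

Per-CPU storage works here because an eBPF program runs to completion without being preempted by another program on the same CPU, so the scratch slot can’t be clobbered mid-use.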

The operational maturity of the ecosystem has improved considerably. CO-RE (Compile Once, Run Everywhere) has addressed many of the portability problems that plagued earlier eBPF development. Toolchains like libbpf have raised the floor significantly. But teams considering eBPF should plan for the learning curve, not assume it away.

eBPF and OpenTelemetry Are Not Competitors

One of the most common misconceptions I encounter is that eBPF and OpenTelemetry are in tension, that adopting one means moving away from the other. This misunderstands what each technology actually does.

OpenTelemetry operates at the application layer. It gives developers a standardized, vendor-neutral way to emit traces, metrics, and logs from their own code. The signals it produces are rich with business context: domain-specific events, custom attributes, application-level spans that reflect your service’s actual logic. This is valuable data that kernel-level observability cannot replicate, because the kernel has no concept of a “checkout flow” or a “recommendation engine.” That semantic layer only exists inside the application.

eBPF operates at the system layer. It gives you automatic, zero-instrumentation visibility across your entire environment, every service, every network connection, every process, regardless of language or runtime.

The right mental model is that eBPF provides the floor and OpenTelemetry provides the ceiling. eBPF ensures you have coverage across everything, including the things you didn’t know to instrument. OpenTelemetry ensures the things you do care about are instrumented with the precision and context your business needs.

In practice, we see customers needing far less OTel instrumentation than they expect. Most teams think they need to instrument everything. What they actually need is eBPF to cover the full environment automatically, and OTel to pinpoint the 20 or 30 percent of their stack where business-level context genuinely matters: a specific checkout flow, a customer-facing API, a billing event. The combination is powerful precisely because each technology is doing what it does best, rather than both trying to do the same job.

Observability That Doesn’t Wait to Be Asked

The deeper shift that eBPF enables isn’t just technical. It’s philosophical.

The traditional model of observability is reactive and anticipatory. Teams instrument what they know, discover blind spots during incidents, add more instrumentation, and repeat the cycle. The system is only as observable as the engineering time invested in instrumenting it, and that investment is always running behind the pace of development.

The kernel-native model inverts this. When you observe behavior at the system level, you get immediate coverage across everything running in your environment, including services that were deployed five minutes ago, third-party dependencies that were never in your architecture docs, and edge cases that no one thought to plan for. You don’t have to anticipate what to observe. The system tells you what’s happening.

For teams operating at high scale, or teams whose development velocity has outpaced their instrumentation discipline, this isn’t an incremental improvement. It’s a different way of thinking about what observability is supposed to do.

The observability industry has spent years asking engineers to do more upfront work: more instrumentation, more configuration, more maintenance, in exchange for visibility. eBPF-native architectures make a different offer. Visibility first, instrumentation where it adds value. That’s the direction the field is moving, and I think it’s the right one.