Over the past seven years, we’ve seen Kubernetes become the de facto platform for building modern applications. With this shift, application architectures have become increasingly distributed, dynamic, and modular. As a byproduct, logging data has exploded – depending on the company, anywhere from one terabyte to multiple petabytes of data can be generated each day.
While more data isn’t inherently bad, teams simply don’t have the tools or the budget to process and make sense of all of it. It’s like a bathtub fed by a never-ending supply of water, but with only so much capacity to hold it. So, DevOps and SRE teams are forced to make predictions and decisions about which data is “important” and worth analyzing. The rest gets “drained” into archives or a less active storage tier, where the team saves costs at the expense of real-time visibility and analytics.
Again, no two organizations are the same, but it’s common for teams to neglect as much as 80% of their logging data.
The impact of neglecting observability datasets
This practice of dropping datasets hurts DevOps and SRE teams both operationally and in terms of long-term viability.
First, in terms of operations, teams are forced to undertake a lot of manual labor to get valuable insights from their data.
- They must understand their datasets at an intimate level to index the “right” things.
- They likely have to structure their logs in a purposeful, consistent way to make sense of the data (a minimal example follows this list).
- They have to configure and refine logic at granular levels to monitor the behaviors they care about (to the best of their ability).
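To make the structuring point concrete, here’s a minimal sketch of purposeful log structuring using Python’s standard logging module. The field names (service, trace_id) and the logger name are illustrative assumptions, not a prescribed schema:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so downstream
    indexers can filter on fields instead of regex-parsing text."""
    def format(self, record):
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Illustrative fields; real schemas vary by team.
            "service": getattr(record, "service", "unknown"),
            "trace_id": getattr(record, "trace_id", None),
        }
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("payment captured", extra={"service": "checkout", "trace_id": "abc123"})
```

Emitting one JSON object per line lets downstream tooling filter on fields rather than parsing free-form text, but it’s exactly the kind of up-front, intentional work the list above describes.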
All of these operations take time, which is unfortunate since DevOps teams are already overtaxed: according to a recent survey, 83% of DevOps practitioners reported experiencing burnout.
When an issue does occur, teams must look for the proverbial “needle in a haystack” of logging data to resolve it, adding hours or days to the workflow.
Second, from a long-term perspective, data growth is not slowing down. Let’s say you’re generating one terabyte of data each day now and are only able to analyze 200 gigabytes of it. What happens in three years, when you’re generating far more?
To borrow from my earlier analogy: the water source gets bigger, but your bathtub stays the same size.
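A back-of-the-envelope projection makes the gap concrete. The 35% annual growth rate below is purely an assumption for illustration:

```python
# Fixed analysis capacity vs. growing log volume.
daily_volume_tb = 1.0   # generated today
analyzed_tb = 0.2       # what the team can actually analyze
growth_rate = 0.35      # assumed annual growth, illustration only

for year in range(4):
    coverage = analyzed_tb / daily_volume_tb * 100
    print(f"year {year}: {daily_volume_tb:.2f} TB/day generated, "
          f"{coverage:.0f}% analyzed")
    daily_volume_tb *= 1 + growth_rate
```

At that assumed rate, coverage falls from 20% today to roughly 8% in year three, even though the team’s analysis capacity never changed.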
Both of these challenges are even more extreme for teams that are new to Kubernetes. That’s because they face a sudden spike in data, plus many new layers of resources (clusters, nodes, pods, containers, and so on). To monitor the “right” datasets – and confidently drop the wrong ones – teams first have to decipher this new environment, which is no small task.
A new approach: Analyzing data at the source
DevOps and SRE teams can help their organizations solve these challenges, but doing so calls for a different approach to observability: one that allows them to analyze 100% of their data at any scale, without neglecting critical parts of it.
Currently, most observability pipeline vendors push the choice of what data to index and analyze upstream, onto the teams themselves. Instead, teams can push the analytics upstream to the data source, unlocking visibility into complete datasets without straining their existing platforms, then stream the outputs to their observability platform of choice, saving their engineers hours of manual operations.
To flip observability on its head like this, teams need a few core capabilities. It all starts with a deployment methodology that is as distributed as their Kubernetes environment. From there, teams need:
- Stream processing of data at the source, versus batch processing in a central platform (sketched after this list)
- Federated machine learning to analyze datasets
- Intuitive visualizations to communicate service behavior to each stakeholder
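As a rough illustration of the first capability, here’s a deliberately simplified Python sketch of stream processing at the source: an agent co-located with the workload tails a log file, aggregates log-level counts over a short window, and forwards only the small summary rather than every raw line. The file path, window size, and ship() exporter are assumptions for illustration; in practice this role is played by a purpose-built agent running alongside each node.

```python
import json
import time
from collections import Counter

LOG_PATH = "/var/log/app/current.log"   # assumed path, varies by setup
WINDOW_SECONDS = 60                     # assumed aggregation window

def follow(path):
    """Yield lines as they are appended, like `tail -f`."""
    with open(path) as f:
        f.seek(0, 2)  # start at the end of the file
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.5)
                continue
            yield line

def ship(summary):
    """Stand-in for an exporter; a real agent would stream this
    summary to the team's observability platform of choice."""
    print(json.dumps(summary))

counts = Counter()
window_start = time.time()
for line in follow(LOG_PATH):
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        counts["unparsed"] += 1
        continue
    counts[record.get("level", "UNKNOWN")] += 1
    if time.time() - window_start >= WINDOW_SECONDS:
        # Forward one compact summary instead of every raw line.
        ship({"window_start": window_start, "levels": dict(counts)})
        counts.clear()
        window_start = time.time()
```

The design choice this sketch illustrates: the expensive work (parsing and counting every line) happens next to the workload, so only a few bytes per window cross the network, and the central platform never has to ingest the full stream.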
Kubernetes environments have characteristics that make them advantageous for building modern applications – they’re scalable, modular, and dynamic. Those same characteristics generate huge volumes of data that can be difficult to keep up with using traditional observability tools alone.
However, data explosion doesn’t have to be an Achilles’ heel for your DevOps or SRE team. By pushing analytics upstream to the data source, teams can understand the behavior of their applications and services before they index any data. This new approach to observability helps teams gain better visibility today and stay ahead of exponential data growth in the years to come.