Application performance management (APM) didn’t use to have a Big Data problem. Three-tiered application architectures with regular, infrequent updates generated predictable and manageable amounts of data for analysis and dependency mapping.
Modern architectures, however, present a Big Data problem. Their virtual machines, containers, microservices, hybrid clouds, and software-defined networks are constantly changing and can generate petabytes of data each day. Sampling no longer works because too many objects, methods, and transactions come and go between sample intervals. It’s essential to collect data on all transactions and user experiences every second, but the resulting volume, velocity, and variety of data can overwhelm legacy monitoring tools. That calls for a different approach to APM.
No analyst has ever said they wished they had less data to help them diagnose a problem, as long as they have the appropriate tools to deal with it. Application performance management data that includes all apps, all users, on any device, is essential for quick and correct diagnosis and resolution. The breadth of infrastructure also means that this data must be collected from public cloud servers, software-as-a-service instances, and private data centers, following users as their processes spin up and down. Modern APM tools should automatically discover and instrument all processes regardless of type and location, delivering stats wherever and whenever the app is running. Intelligent agents need to compress and stream the data to large-scale, multi-threaded, and multi-queued database systems in order to handle the transaction flow that typical commercial APM systems were not designed for.
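As a rough illustration of the "compress and stream" step, the sketch below batches metric records, serializes them as JSON, and compresses them with zlib before they would be sent. The record fields (`ts`, `host`, `metric`, `value`) and the function names are assumptions made for this example and do not describe any particular APM agent.

```python
import json
import zlib

def pack_batch(records):
    """Serialize a batch of metric records to compact JSON and compress it."""
    raw = json.dumps(records, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, 6)

def unpack_batch(payload):
    """Reverse pack_batch on the receiving side."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))

# Hypothetical one-second CPU readings from a single host.
records = [
    {"ts": 1700000000 + i, "host": "web-01", "metric": "cpu_pct", "value": 40 + (i % 5)}
    for i in range(1000)
]

payload = pack_batch(records)
raw_size = len(json.dumps(records, separators=(",", ":")).encode("utf-8"))
print(f"raw={raw_size} bytes, compressed={len(payload)} bytes")
```

Monitoring data tends to be repetitive (the same hosts, metric names, and similar values), which is why general-purpose compression usually shrinks it considerably before it crosses the network.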
Velocity: Sooner is better than later
Many revenue-impacting transactions and user experiences now last only a few seconds, making it essential to collect system data in real time with one-second granularity. Sampling APM environment data at one-minute or five-minute intervals simply leaves too many unknowns. The elastic and dynamic cloud environment changes so much that many performance issues are intermittent and unpredictable. Trying to catch these elusive problems with sampled transaction trace data can be both time-consuming and frustrating.
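A toy example makes the sampling gap concrete. The workload below is hypothetical: latency sits at 50 ms except for a three-second spike to 2,000 ms. Collecting every second captures the spike, while sampling once a minute never sees it.

```python
def latency_at(second):
    """Hypothetical latency in ms: 50 ms baseline, 3-second spike starting at t=137."""
    return 2000 if 137 <= second < 140 else 50

WINDOW_SECONDS = 300  # five minutes of observation

# One-second granularity: every second is recorded.
per_second = [latency_at(t) for t in range(WINDOW_SECONDS)]

# One-minute sampling: only t = 0, 60, 120, 180, 240 are recorded.
per_minute = [latency_at(t) for t in range(0, WINDOW_SECONDS, 60)]

print(max(per_second))  # 2000, the spike is visible
print(max(per_minute))  # 50, the spike falls between samples
```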
Advanced APM tools use efficient monitoring processes at the operating system level to get second-by-second environmental statistics and trace all transactions across all back-end tiers. Unique identifiers can be used to stitch together and trace transactions from client to server and back, while Java and .NET agents are able to spin up and down with the applications themselves. This high-resolution, high-frequency performance data is continuously streamed to a scalable analytics console. Instead of investigating a problem with averages and guesses or waiting to collect additional data for analysis, you can examine the specific transaction and quickly determine the underlying causes.
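The idea of stitching a transaction together from a unique identifier can be sketched as follows. This is a minimal illustration, not a description of any vendor's agent: the span fields (`trace_id`, `tier`, `start_ms`, `self_ms`) are assumed names, with `self_ms` standing for time spent in a tier excluding its downstream calls.

```python
from collections import defaultdict

def stitch(spans):
    """Group spans by trace ID and order each trace by start time."""
    traces = defaultdict(list)
    for span in spans:
        traces[span["trace_id"]].append(span)
    for trace in traces.values():
        trace.sort(key=lambda s: s["start_ms"])
    return dict(traces)

def slowest_tier(trace):
    """Return the tier that spent the most time on its own work."""
    return max(trace, key=lambda s: s["self_ms"])["tier"]

# Spans arrive out of order and interleaved across transactions.
spans = [
    {"trace_id": "t-1", "tier": "db", "start_ms": 12, "self_ms": 170},
    {"trace_id": "t-2", "tier": "web", "start_ms": 5, "self_ms": 20},
    {"trace_id": "t-1", "tier": "web", "start_ms": 0, "self_ms": 10},
    {"trace_id": "t-1", "tier": "app", "start_ms": 4, "self_ms": 15},
]

traces = stitch(spans)
print([s["tier"] for s in traces["t-1"]])  # ['web', 'app', 'db']
print(slowest_tier(traces["t-1"]))         # db
```

With every span retained, the specific slow transaction can be examined directly rather than inferred from averages.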
Typical cloud and microservices-based apps rely on a broadly distributed infrastructure, encompassing hundreds or thousands of components, innumerable method calls, and rapid state changes. Similar transactions can take very different paths through the infrastructure based on factors like the time of day, the user’s location, or their demographic profile. There are hundreds of interacting attributes that can affect the user experience, from the virtual servers and databases, to the back-and-forth communications across multiple networks, to the wide range of user devices. Modern APM tools gather the broadest possible diagnostics data and user metadata to close the gaps in performance visibility. They then apply new technologies and techniques to store and analyze the resulting Big Data, which exceeds the processing capacity of legacy analytics systems. Machine learning can find patterns that indicate potential bottlenecks, and new types of visualizations can illustrate the complex and changing interdependencies between cloud and on-premises applications, microservices, infrastructure, and networks, helping DevOps quickly determine the critical focus areas.
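To show why user metadata matters, the sketch below breaks latency down by attribute. It is a plain group-by with medians, a deliberately simple stand-in for the pattern-finding described above rather than a machine learning method, and the attribute names and values are invented for the example.

```python
from collections import defaultdict
from statistics import median

def median_by(records, attribute):
    """Median latency for each distinct value of the given attribute."""
    groups = defaultdict(list)
    for record in records:
        groups[record[attribute]].append(record["latency_ms"])
    return {value: median(latencies) for value, latencies in groups.items()}

# Hypothetical transactions tagged with user metadata.
records = [
    {"region": "eu", "device": "mobile", "latency_ms": 120},
    {"region": "eu", "device": "desktop", "latency_ms": 110},
    {"region": "us", "device": "mobile", "latency_ms": 900},
    {"region": "us", "device": "desktop", "latency_ms": 950},
    {"region": "eu", "device": "mobile", "latency_ms": 130},
    {"region": "us", "device": "mobile", "latency_ms": 870},
]

print(median_by(records, "region"))  # {'eu': 120, 'us': 900}
print(median_by(records, "device"))
```

In this made-up data the split by region is stark while the split by device is not, which points the investigation at something region-specific. Without the metadata, the same records would only show a confusing mix of fast and slow transactions.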
APM: Scaling to support Big Data
A high-resolution application performance management system can generate petabytes of data per day, resulting in a very real Big Data problem. Modern APM tools need to scale to meet it with compressed data streaming, high-speed database technology, and AI-augmented analytics. These proven Big Data strategies provide detailed visibility into individual user experiences, powerful analytics to find common issues across multiple applications, and non-aggregated historical data to diagnose intermittent problems and to evaluate the effects of code and infrastructure changes. With these technologies and approaches, IT no longer needs to make a trade-off between data quality and monitoring scalability. Collecting and processing the full volume, velocity, and variety of data opens up new opportunities to accelerate DevOps processes and significantly improve the end-user experience.