Of the many programming fads and trends that have come and gone in the past 20 or so years, application performance management (APM) is not one of them. It’s here to stay, though perhaps not in its current definition or even its current use.
Open-source time series database provider InfluxData’s senior manager Daniela Pontes explained that for as long as organizations have been putting applications in front of users, they’ve wanted to know how those applications are performing. “You always wanted to know how long it takes for a certain resource to download or a page to be viewed,” she said.
But software development methods have changed significantly in the past few decades. As the application development process has evolved to become more dynamic and agile, applications have been split into smaller and smaller pieces so that individual parts can be updated, upgraded, or modified quickly.
This agility, though, has come at a price. Breaking these applications up into small pieces has made them more susceptible to performance degradation. “One little degradation here could create a cascading effect that would impact the application in a very visible and noticeable way. But to find that little point where the situation started is much harder than it used to be when you had one whole application,” said Pontes.
Now, APM has become more essential due to the increased complexity of application environments and their susceptibility to performance degradations, Pontes explained. For example, events like updating or migrating can cause points of degradation in an environment that is already sliced into pieces.
Not only that, but the nature of how APM needs to operate has changed as well. Ten or 20 years ago, a company might have had two major releases per year, but today, companies could be deploying hundreds of releases every day, explained Mirko Novakovic, CEO and co-founder of microservices APM provider Instana. “That changes the dynamics of the application. It means that APM has to adapt to these changes.”
With potentially hundreds of releases being pushed out every day, automation is key. You can no longer do manual configuration when you have such frequent releases, Novakovic explained.
Another thing that has changed is that APM tools need to be accessible to more people than in the past. Where once a few skilled and trained performance engineers used APM tools to fix problems, now hundreds of operators need to use those tools throughout their day. According to Novakovic, this is a huge challenge: how can you roll out a tool that hundreds of developers can leverage? He believes the answer is a tool that is easy to use, but that is also able to adapt to individual roles. Users need to be able to figure it out on their own, but also customize it to solve specific use cases, he explained.
Another challenge Novakovic pointed out is the scale of today’s environments. When you have tens of thousands of containers in your environment, multiple challenges arise. One is resources: deploying an APM agent in each container would consume far too many resources, and because containers are small, they normally don’t have those resources to spare. “Only having one agent per platform and not having a single agent for every type of technology and every container is a trend we’re seeing,” said Novakovic.
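That resource argument is easier to see in code. Below is a minimal sketch of the one-agent-per-host pattern in Python: rather than embedding a full APM agent in every container, each container fires lightweight, fire-and-forget UDP packets at a single shared agent running on the host. The port, payload shape, and metric names here are invented for illustration, not any vendor’s actual protocol.

```python
import json
import socket

# Hypothetical host-agent endpoint (statsd-style localhost UDP).
# One agent per host receives from all containers on that host.
AGENT_ADDR = ("127.0.0.1", 8125)

def emit_metric(name: str, value: float, tags: dict) -> None:
    """Fire-and-forget a metric packet to the shared host agent."""
    payload = json.dumps({"metric": name, "value": value, "tags": tags})
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(payload.encode("utf-8"), AGENT_ADDR)
    sock.close()

emit_metric("http.request.duration_ms", 42.0,
            {"service": "checkout", "container": "checkout-7f9c"})
```

The design point is that the per-container cost shrinks to a few lines of client code and a UDP send, while the memory- and CPU-hungry work of batching, aggregating, and forwarding lives in the single host agent.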
Another challenge relates to data. When you have thousands of components, you could be looking at tens of thousands, hundreds of thousands, or even billions of metrics and traces. “How do you process them? How do you deal with overhead? How do you make sure that the network is not overutilized by the APMs? So there are multiple challenges in managing the overhead. And multiple approaches on how to attack this. And I think what’s a challenge for the customers is how to understand the different types of overhead that there are, how they can tweak it, and it’s also about how much time they need to spend to configure it so it’s working in their environment,” said Novakovic.
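One common way to attack the data-volume side of that overhead is head-based probabilistic trace sampling: decide up front, per trace, whether to keep it. A minimal Python sketch follows; the 1% rate and the hashing scheme are illustrative assumptions, not any specific product’s behavior.

```python
import hashlib

SAMPLE_RATE = 0.01  # keep roughly 1% of traces (example value)

def should_sample(trace_id: str) -> bool:
    # Hash the trace ID to a number in [0, 1). Because the decision is a
    # deterministic function of the trace ID, every service in the request
    # path makes the same keep/drop choice, so kept traces stay complete.
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < SAMPLE_RATE

kept = sum(should_sample(f"trace-{i}") for i in range(100_000))
print(f"kept {kept} of 100000 traces")  # roughly 1,000
```

Sampling per trace rather than per span is what keeps the overhead tunable without leaving you with fragments: you see fewer transactions, but each one you see is whole.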
There is currently a trend towards standardization of tracing, Novakovic explained. “So I would say overall that means that the data APM gathers is more or less becoming commodity.” This means that it’s no longer just about gathering the data. It’s about what you’re actually doing with that data once it’s gathered. Novakovic explained that this is leading to the creation of new categories, like AIOps, that can actually produce value from the data that an APM tool gathers.
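That standardization is visible in vendor-neutral instrumentation projects such as OpenTracing and OpenTelemetry (the merger of the OpenTracing and OpenCensus efforts). A minimal sketch using the OpenTelemetry Python SDK, with a console exporter standing in for whatever backend actually consumes the data; the service and span names are invented for the example.

```python
# Requires: pip install opentelemetry-sdk
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import (
    ConsoleSpanExporter,
    SimpleSpanProcessor,
)

# Wire the SDK to a console exporter; in production this would point at
# whichever backend consumes the (now commodity) trace data.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")  # illustrative name

with tracer.start_as_current_span("charge_card") as span:
    span.set_attribute("order.id", "ord-123")
    # ... business logic here; the span records timing automatically
```

Because the instrumentation API is standard, the differentiation moves to the backend, which is exactly the shift toward analysis categories like AIOps that Novakovic describes.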
Another trend, according to Ben Sigelman, CEO and co-founder of LightStep, is that APM tools no longer carry the runtime overhead they once did, because customers won’t tolerate it. “It’s no longer acceptable to have any overhead in an application,” he said.
Over the next few years, Novakovic predicted that there will be increased intelligence in APM solutions. “Because of the complexity of the infrastructure and the applications, people will need more assistance by the APM tool to figure out where the problems are and monitor it 24/7,” said Novakovic.
Other types of monitoring tools will become part of APM
Novakovic also predicted that many different types of monitoring tools will merge into the APM category. For example, infrastructure monitoring and logging may merge with APM.
Sigelman isn’t even sure the APM category will retain its name. LightStep, for example, has branded its APM product as [x]PM and is targeting it specifically at microservices and serverless architectures. “You have people that are still offering APM and I’m sure they’re still making a lot of revenue off of it. But it’s a totally different product. It would be useless in a microservice environment and probably vice versa. So I think as a category it’s getting pretty confusing. But I’ve also seen that it survives.”
Sigelman also believes that any APM not designed to understand applications with hundreds of different moving parts will die off.
What to look for in APM solutions
According to Pontes, there is no universal set of qualities that organizations should look for in APM solutions. “We believe that APM is something that is very particular, so you need to understand your own needs.”
For example, distributed tracing would be an absolute necessity for making sense of larger distributed systems, Sigelman said. “If you have a hundred or a thousand microservices, if you can’t understand how they depend on each other on a per transaction basis, there’s no way you’re going to be able to explain performance issues in your system,” he said.
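To see why tracing answers that dependency question, note that each span in a trace records its parent, so a single transaction already encodes which service called which. A small self-contained Python sketch; the span fields and service names are generic placeholders, not a particular tracer’s schema.

```python
from collections import defaultdict

# Spans from one hypothetical trace: frontend -> checkout -> {payments, inventory}
spans = [
    {"span_id": "a", "parent_id": None, "service": "frontend"},
    {"span_id": "b", "parent_id": "a",  "service": "checkout"},
    {"span_id": "c", "parent_id": "b",  "service": "payments"},
    {"span_id": "d", "parent_id": "b",  "service": "inventory"},
]

# Derive service-to-service call edges from parent/child span links.
by_id = {s["span_id"]: s for s in spans}
edges = defaultdict(int)
for s in spans:
    parent = by_id.get(s["parent_id"])
    if parent and parent["service"] != s["service"]:
        edges[(parent["service"], s["service"])] += 1

for (caller, callee), count in edges.items():
    print(f"{caller} -> {callee} ({count} call(s))")
```

Aggregated over many traces, those per-transaction edges become the live dependency map that metrics or logs alone cannot reconstruct.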
Sigelman recommends that end users take a deep look at the problem that they’re trying to solve with APM. “If they’re adopting cloud native and microservices and serverless things like that, the number one problem that they’re going to have is that you can’t understand any one service in isolation and you need to understand the entire thing as a whole. And APM as a category has traditionally been pretty bad at actually delivering on that.”
Novakovic believes there are several factors an organization should consider, including how often it releases software, what level of granularity it needs for metrics, and how much it is willing to pay for a solution. “So it really depends on your architecture, on what you want to accomplish, and this drives the criteria that will be more important or less important.”
Pontes also recommends that organizations look to open-source tools so that they are less restricted. “We believe that open-source is fundamental so that you would not be locked into one solution that later on you find out doesn’t cover all your instrumentation needs because now you have another application or your application now has a better code implementation and you have to reinstrument… So you need to have that flexibility to implement APM in the way that you need it.”