Gartner has predicted that by 2026, 80% of large engineering organizations are going to have platform engineering teams.
To talk about why platform engineering is gaining so much traction, Keith Babo, head of product at Solo.io, joined us on the most recent episode of our podcast, Get With IT.
Here’s an edited and abridged version of that conversation:
Gartner has reported about 80% of large engineering organizations are going to be deploying platforms by 2026. Why are we seeing this growth? And what are some of the quick benefits?
Let me baseline real quick with a concept to understand what this market is like. I like to think of platform engineering as a two-sided marketplace. We know two-sided marketplaces, like credit cards, right? There are cardholders and there are vendors. With eBay, there are buyers and sellers. With Netflix, there are content creators and subscribers. Platform engineering is no different.
The two parties in the network are essentially developers and operations teams. Developers want self service. They want to be able to focus on developing their applications and not think about infrastructure. Operations teams are responsible for taking those applications that the development teams create and supporting them in production, which means scaling, securing, observing, and debugging those applications.
What platform engineering does is formalize the concept of a platform as a product. In a two-sided marketplace, the product is what ties the two sides together in a mutually beneficial relationship. And that’s exactly what platform engineering does. It surfaces self-service interfaces through a developer portal so application developers can onboard their apps, while the platform teams create guardrails around how those apps are deployed, so that they can observe them, secure them and make them resilient at runtime.
What do you mean when you’re talking about application networking?
I see four components of that. There’s security, observability, resiliency and traffic management. So security is exactly as it sounds, right? I want to keep my network traffic private. Only identified parties that are authorized to talk to one another should be able to do that.
Resiliency — The cloud’s an ephemeral, sort of dynamic place. Containers, clouds, regions are going up and down, right? I need my services to be resilient in the face of those ephemeral failures.
Traffic management — If I’m doing canary or A/B rollouts, I want something in my network to be able to facilitate that.
Observability — Is my app performing? Is it successful? How am I limiting the mean time to resolution for any issues I hit?
So these are all things that can happen at the network layer, and that intersects with platform engineering because of the guardrails that platform teams need to put in place. You can drive all those guardrails through declarative configuration in the networking infrastructure to realize those benefits.
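To make the traffic management idea concrete, here is a minimal Python sketch of how a network layer might split traffic for a canary rollout. It is an illustration only, not something described in the interview: the function name `pick_backend`, the `canary_percent` parameter and the backend labels are all hypothetical, and a real service mesh or gateway would express the same weights as declarative configuration rather than application code.

```python
import hashlib


def pick_backend(request_id: str, canary_percent: int) -> str:
    """Send a stable share of requests to the canary version (hypothetical helper)."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    # Hash the request (or user) ID so the same caller always lands in the same bucket.
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "canary" if bucket < canary_percent else "stable"


if __name__ == "__main__":
    ids = [f"user-{i}" for i in range(1000)]
    share = sum(pick_backend(i, 10) == "canary" for i in ids) / len(ids)
    print(f"canary share: {share:.1%}")
```

Hashing the caller ID, rather than picking at random per request, keeps each user on one version for the length of the rollout, which tends to make canary results easier to interpret.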
We’ve heard about dynamic configuration, which I guess could be a similar thing, where people can then create the infrastructure they need on the fly if they want to do a partial rollout to one cohort, instead of doing a broad rollout to everyone, or things like that.
That’s such an important point. Let me double click on that for a moment, because those are compatible things. Development teams hate ticket-based cultures, right? They hate that when they have to provision a new environment, they have to file a ticket and wait a week until it gets set up. Then, when they actually want to deploy an application, it’s in architecture review, and there’s another ticket to get it enabled to deploy to production.
They want self service, and that’s what we mean. They want to be able to use an internal developer portal, with a UI or an API, to automate deployment so they get the high level of dynamism they want. But it’s all done with those guardrails to make sure it’s safe and secure to deploy.
That’s an excellent point. It seems like there’s a lot of moving parts, especially when we’re talking about cloud native application development. So talk a little bit about the security aspect of that, and how can organizations ensure that, as all these parts are moving dynamically, nothing is becoming a vulnerability or exposing something that shouldn’t be exposed, or somebody is seeing it who shouldn’t be seeing it.
From our standpoint, the way we view the architecture is that there’s two fundamental planes of traffic. There’s a north-south traffic plane. So you have, let’s say, a Kubernetes cluster, and it’s taking traffic from the outside, like public Internet, and that’s coming into the cluster, which is a north-south traffic plane. That’s where you’re going to have the highest level of security in terms of authorizing incoming traffic, being able to detect threats, like with a web application firewall, and making sure there’s no data exfiltration of private data that’s in your network escaping to the public Internet.
These are all concerns around the north-south traffic barrier, but many companies stop there. They might deploy an API gateway that handles north-south traffic, but we’re starting to see more and more exploits happen once an attacker gets access to the inside of the network. Once the attacker is inside the gate, if you have not secured your internal network and adopted a zero trust architecture, then that attacker can run wild within that network and attack services from inside the gate. So that’s really the security component we see: leveraging declarative configuration. What I mean by that is configuration that you can check into a Git repository and then deploy automatically alongside your applications, to make sure they’re always secured in your environment, both from a north-south perspective and an east-west, or service-to-service, perspective.
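As a rough illustration of zero trust expressed as declarative configuration, the Python sketch below models an east-west policy as plain data with a default-deny check. Everything in it is an assumption made for the example: the `AllowRule` type, the `POLICY` contents and the service names are hypothetical, and in practice the policy would live in a Git repository and be enforced by the networking infrastructure rather than by application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowRule:
    source: str       # identity of the calling service
    destination: str  # identity of the service being called


# Hypothetical policy data; the service names are made up for this example.
POLICY = frozenset({
    AllowRule("gateway", "orders"),
    AllowRule("orders", "payments"),
})


def is_allowed(source: str, destination: str, policy=POLICY) -> bool:
    """Default deny: only explicitly listed service pairs may communicate."""
    return AllowRule(source, destination) in policy


if __name__ == "__main__":
    print(is_allowed("gateway", "orders"))    # listed pair
    print(is_allowed("gateway", "payments"))  # not listed, so denied
```

The point of the default-deny shape is that a service an attacker reaches inside the network can only talk to the peers the policy names, and nothing else.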
I understand that one of the key things organizations can use to secure that kind of communication is mutual TLS. So how does that come into play? And how important is that for an organization to use if they’re going to deploy a platform like this?
So mTLS is critical for two reasons. One is that it encrypts the traffic in transit. Without it, as you’re exchanging PII, transactional information, healthcare information, whatever that might be in a network, that data is just live and open on the wire for anyone who can observe that network. So encrypting it in transit becomes very important to prevent eavesdropping attacks.
Just as important, you want to make sure that all services communicating with one another in the network are authorized to do so. That means having a strong sense of identity from a client perspective and validating that identity with mTLS. That’s the mutual part: both parties are authenticated, providing credentials that verify their identity, so you can confirm that two given workloads or services are allowed to talk to one another. Those are the two reasons mTLS is so important for interior security to support zero trust architectures.
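For readers who want to see what the mutual part looks like in code, here is a minimal sketch using Python’s standard `ssl` module. It is an assumption-laden example rather than how any particular platform does it: the function name `build_mtls_server_context` and its file-path parameters are hypothetical, and a service mesh would typically handle certificates and rotation on the application’s behalf.

```python
import ssl


def build_mtls_server_context(cert_file=None, key_file=None, ca_file=None) -> ssl.SSLContext:
    """Build a server-side TLS context that also demands a client certificate."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # Requiring and verifying a client certificate is what makes TLS "mutual".
    context.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        # The server's own identity, presented to clients (paths are hypothetical).
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if ca_file:
        # The certificate authority used to validate client identities.
        context.load_verify_locations(cafile=ca_file)
    return context


if __name__ == "__main__":
    ctx = build_mtls_server_context()
    print(ctx.verify_mode == ssl.CERT_REQUIRED)
```

With ordinary TLS only the server proves its identity; setting `verify_mode` to `ssl.CERT_REQUIRED` on the server side means a client without a certificate signed by the trusted authority fails the handshake.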
So let’s take a step back for a second and talk about platform engineering in a broader sense. We just saw a survey from the development tools company Atlassian about the developer experience and how important platform engineering can be when, as you were saying, developers want to self serve and create what they need when they need it, without the waiting and all the inefficiencies that go with that. So the question for you is, how much input do developers have in the creation of the platform?
Ultimately, the goal of platform engineering is to reduce the cognitive load for the developer on how they get their applications to production. I’ve spent a long time in development in my career, and I know how these teams are measured. If I’m developing an application, that application has zero value to my organization until the time it is deployed in production and used by customers. Up until that point, it’s a cost center the entire time. Only once it’s deployed and used by customers am I realizing value.
Therefore, as a developer, I am hyper-focused: as soon as development is done, I want that app in production right now. If I have to start worrying about how I’m handling security for this app, or how I’m handling retries, circuit breaking and data exfiltration controls, that is what the industry calls undifferentiated heavy lifting. Application developers should be focusing on business logic, not on infrastructure concerns, but we can’t safely deploy these applications without addressing those infrastructure concerns, and that’s where platform engineering really shines, in connecting both sides of that market.
We reduce the cognitive load on developers by giving them easy self service to spin up clusters and deploy applications, but it’s done in such a way with the right guardrails that we’re sure that those are safe, secure and resilient when operations teams need to support them in production.
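To give a sense of the undifferentiated heavy lifting mentioned above, here is a small Python sketch of retries combined with a circuit breaker, the kind of logic a platform can take off a developer’s plate. It is a simplified illustration under stated assumptions: the class and function names (`CircuitBreaker`, `CircuitOpenError`, `call_with_retries`) and the default thresholds are hypothetical, and the sketch is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls fail fast."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit is open; failing fast")
            # Cool-down elapsed: allow one trial call; one more failure reopens.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # a success resets the failure count
        return result


def call_with_retries(breaker, func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Retry with exponential backoff, but stop at once if the circuit is open."""
    for attempt in range(attempts):
        try:
            return breaker.call(func)
        except CircuitOpenError:
            raise
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Retries absorb brief, ephemeral failures, while the breaker stops callers from hammering a dependency that is clearly down. Every team writing this by hand is the duplication a platform’s network layer is meant to remove.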