Q&A: How platform engineering teams manage infrastructure and security

Published: August 16th, 2024

Gartner has predicted that by 2026, 80% of large engineering organizations are going to have platform engineering teams.

To talk about why platform engineering is gaining so much traction, Keith Babo, head of product at Solo.io, joined us on the most recent episode of our podcast, Get With IT.

Here’s an edited and abridged version of that conversation:

Gartner has reported about 80% of large engineering organizations are going to be deploying platforms by 2026. Why are we seeing this growth? And what are some of the quick benefits?

Let me baseline real quick with a concept to understand what this market is like. I like to think of platform engineering as a two-sided marketplace. We know two-sided marketplaces, like credit cards, right? There are cardholders and there are vendors. With eBay, there are buyers and sellers. With Netflix, there are content creators and subscribers. Platform engineering is no different.

The two parties in the network are essentially developers and operations teams. Developers want self service. They want to be able to focus on developing their applications and not think about infrastructure. Operations teams are responsible for taking those applications that the development teams create and supporting them in production, which means scaling, securing, observing, and debugging those applications.

What platform engineering does is formalize the concept of a platform as a product. In a two-sided marketplace, the product is what ties the two sides together in a mutually beneficial relationship. And that’s exactly what platform engineering does. It surfaces self-service interfaces through a developer portal so that application developers can onboard their apps, while the platform teams create guardrails around how those apps are deployed, so that they can observe them, secure them, and make them resilient at runtime.

What do you mean when you’re talking about application networking?

I see four components of that. There’s security, observability, resiliency and traffic management. So security is exactly as it sounds, right? I want to keep my network traffic private. Only identified parties that are authorized to talk to one another should be able to do that.

Resiliency — The cloud’s an ephemeral, sort of dynamic place. Containers, clouds, regions are going up and down, right? I need my services to be resilient in the face of those ephemeral failures.

Traffic management — If I’m doing canary or A/B rollouts, I want something in my network to be able to facilitate that.

Observability — Is my app performing? Is it successful? How am I limiting the mean time to resolution for any issues I hit?

So these are all things that can happen at the network layer, and that intersects with platform engineering because of the guardrails that platform teams need to put in place. You can drive all those guardrails through declarative configuration in the networking infrastructure to realize those benefits.
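As an illustration of the traffic management point, here is a minimal Python sketch of a canary split driven by declarative data. The `CanaryRoute` structure, the backend names, and the hashing scheme are assumptions made for this example and are not the API of any particular gateway or service mesh.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CanaryRoute:
    """Declarative description of a traffic split (hypothetical schema)."""
    stable: str          # name of the stable backend
    canary: str          # name of the canary backend
    canary_percent: int  # share of traffic sent to the canary, 0-100


def pick_backend(route: CanaryRoute, request_id: str) -> str:
    """Map a request to a backend deterministically by hashing its ID."""
    if not 0 <= route.canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return route.canary if bucket < route.canary_percent else route.stable


if __name__ == "__main__":
    # Hypothetical backends: send roughly 10% of requests to the new version.
    route = CanaryRoute(stable="checkout-v1", canary="checkout-v2", canary_percent=10)
    hits = sum(pick_backend(route, f"req-{i}") == "checkout-v2" for i in range(10_000))
    print(f"canary received {hits} of 10000 requests")
```

Because the split is plain data, widening the rollout means changing `canary_percent` rather than changing application code.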

We’ve heard about dynamic configuration, which I guess could be a similar thing, where people can then create the infrastructure they need on the fly if they want to do a partial rollout to one cohort, instead of doing a broad rollout to everyone, or things like that.

That’s such an important point. Let me double click on that for a moment, because those are compatible things. Development teams hate ticket-based cultures, right? They hate it when, to provision a new environment, they have to file a ticket and wait a week until it gets set up. Then, when they actually want to deploy an application, it goes into architecture review, and it takes another ticket to get it enabled and deployed to production.

They want self service, and that’s what we mean. They want to be able to use an internal developer portal and a UI or an API to automate deployment, so they get the high level of dynamism they want. But it’s all done with those guardrails to make sure it’s safe and secure to deploy.
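One way to picture self service with guardrails is a check that runs automatically whenever a developer submits a deployment request, in place of a ticket. The sketch below is a minimal Python illustration. The `DeploymentRequest` fields, the registry prefix, and the replica limit are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DeploymentRequest:
    """What a developer submits through the portal or API (hypothetical)."""
    app: str
    image: str
    replicas: int
    mtls_enabled: bool


# Guardrails owned by the platform team (illustrative values only).
ALLOWED_REGISTRY = "registry.example.internal/"
MAX_REPLICAS = 20


def check_guardrails(req: DeploymentRequest) -> List[str]:
    """Return a list of violations; an empty list means the request may proceed."""
    violations = []
    if not req.image.startswith(ALLOWED_REGISTRY):
        violations.append(f"image must come from {ALLOWED_REGISTRY}")
    if not 1 <= req.replicas <= MAX_REPLICAS:
        violations.append(f"replicas must be between 1 and {MAX_REPLICAS}")
    if not req.mtls_enabled:
        violations.append("mTLS must be enabled")
    return violations
```

A request that passes every check can be deployed immediately, and a request that fails gets actionable feedback without anyone waiting on a review queue.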

That’s an excellent point. It seems like there’s a lot of moving parts, especially when we’re talking about cloud native application development. So talk a little bit about the security aspect of that, and how can organizations ensure that, as all these parts are moving dynamically, nothing is becoming a vulnerability or exposing something that shouldn’t be exposed, or somebody is seeing it who shouldn’t be seeing it.

From our standpoint, the way we view the architecture is that there’s two fundamental planes of traffic. There’s a north-south traffic plane. So you have, let’s say, a Kubernetes cluster, and it’s taking traffic from the outside, like public Internet, and that’s coming into the cluster, which is a north-south traffic plane. That’s where you’re going to have the highest level of security in terms of authorizing incoming traffic, being able to detect threats, like with a web application firewall, and making sure there’s no data exfiltration of private data that’s in your network escaping to the public Internet.

These are all concerns around the north-south traffic barrier, but many companies stop there. They might deploy an API gateway that handles north-south traffic, but we’re starting to see more and more exploits happen once an attacker gets access to the inside of the network. Once the attacker is inside the gate, if you have not secured your internal network and adopted a zero trust architecture, then that attacker can run wild within that network and attack services from inside the gate. So that’s really the security component we see: leveraging things like declarative configuration. What I mean by that is configuration that you can check into a Git repository and then deploy automatically alongside your applications to make sure that they’re always secured in your environment, both from a north-south perspective and an east-west, or service-to-service, perspective.
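The zero trust idea for east-west traffic can be sketched as a default-deny allow-list: a call between services is permitted only if it is explicitly declared. The following Python example is a minimal illustration. The service names and the shape of the policy data are assumptions, standing in for whatever policy format a given platform would keep in its repository.

```python
from typing import FrozenSet, Tuple

# Declarative allow-list of (source, destination) pairs. This is the kind of
# data that could be versioned in a repository next to the applications.
# The service names are hypothetical.
ALLOWED_CALLS: FrozenSet[Tuple[str, str]] = frozenset({
    ("frontend", "orders"),
    ("orders", "payments"),
})


def is_allowed(source: str, destination: str,
               allowed: FrozenSet[Tuple[str, str]] = ALLOWED_CALLS) -> bool:
    """Default deny: permit a call only if the pair is explicitly listed."""
    return (source, destination) in allowed
```

With this policy, a compromised `frontend` cannot reach `payments` directly, because that pair was never declared.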

I understand that one of the key things that organizations can use to secure that kind of communication is mutual TLS (mTLS). So how does that come into play? And how important is that for an organization to use if they’re going to deploy a platform like this?

So mTLS is critical for two reasons. One is that it encrypts the traffic in transit. Without encryption, as you’re exchanging PII, transactional information, healthcare information, whatever that might be in a network, that data is just live and open on the wire for anyone who can observe that network. So encrypting it in transit becomes very important to prevent eavesdropping attacks.

Just as important, you want to make sure that all services that are communicating with one another in the network are authorized to do so. That means having a strong sense of identity from a client perspective and validating that identity with mTLS. That’s the mutual part: both parties are authenticated, providing credentials that verify their identity, so you can confirm that two given workloads or services are allowed to talk to one another. Those are the two reasons mTLS is so important for interior security and for supporting zero trust architectures.
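For readers who want to see what the "mutual" part looks like in code, here is a minimal sketch using Python's standard `ssl` module. The helper names are invented for this example, and the certificate, key, and CA paths are parameters the caller must supply. In practice a service mesh or proxy often handles this on the application's behalf, so treat it as an illustration of the mechanism and not a recommended deployment pattern.

```python
import ssl


def new_server_context() -> ssl.SSLContext:
    """Server side of mTLS: refuse clients that present no valid certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # A plain TLS server does not ask for a client certificate by default;
    # CERT_REQUIRED is what makes the authentication mutual.
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def new_client_context() -> ssl.SSLContext:
    """Client side: PROTOCOL_TLS_CLIENT verifies the server cert and hostname."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def load_identity(ctx: ssl.SSLContext, certfile: str, keyfile: str, cafile: str) -> None:
    """Attach this workload's own identity and the CA used to verify its peer."""
    ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)  # who I am
    ctx.load_verify_locations(cafile=cafile)                 # whom I trust
```

Both sides call `load_identity` with their own certificate and a shared trusted CA, which is how each end proves its identity to the other while the channel itself stays encrypted.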

So let’s take a step back for a second and talk about platform engineering in a broader sense. We just saw a survey from the development tools company Atlassian about the developer experience and how platform engineering can be important when, as you’re saying, developers want to be able to self serve and create what they want when they want it, without the waiting and all the inefficiencies that go with that. So the question for you is, how much input do developers have in the creation of the platform?

Ultimately, the goal of platform engineering is to reduce the cognitive load for the developer on how they get their applications to production. I’ve spent a long time in development in my career, and I know how these teams are measured. If I’m developing an application, that application has zero value to my organization until the time it is deployed in production and used by customers. Up until that point, it’s a cost center the entire time. Only once it’s deployed and used by customers am I realizing value.

Therefore, as a developer, I am hyper-focused: as soon as development is done, I want that app in production right now. I don’t want to have to start worrying about how I’m handling security for this app, or how I’m handling retries, circuit breaking, and data exfiltration controls. This is what the industry calls undifferentiated heavy lifting. Application developers should be focusing on business logic, not on infrastructure concerns, but we can’t safely deploy these applications without addressing those infrastructure concerns, and that’s where platform engineering really shines, in connecting both sides of that market.
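To give a sense of the heavy lifting involved, here is a minimal Python sketch of a circuit breaker, one of the resiliency patterns mentioned above. The class name, thresholds, and injectable clock are assumptions made for this example. A platform would more typically provide this behavior at the network layer than ask each team to write it.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Stop calling a failing dependency until a cool-down period has passed."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call rejected")
            # Cool-down elapsed: allow one trial call (half-open state).
            # A single failure now reopens the circuit immediately.
            self._opened_at = None
            self._failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # any success closes the circuit fully
        return result
```

Even this small version has state, timing, and edge cases to get right, which is the argument for solving it once in the platform.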

We reduce the cognitive load on developers by giving them easy self service to spin up clusters and deploy applications, but it’s done in such a way with the right guardrails that we’re sure that those are safe, secure and resilient when operations teams need to support them in production.
