Atlassian has released an open-source Kubernetes autoscaler optimized for batch workloads. The project, Escalator, is designed to address two autoscaling problems the company ran into: its clusters were neither scaling up nor scaling down fast enough.
“Initially with our Kubernetes platform, we were pleasantly surprised by how quickly we were able to port these batch workloads to Kubernetes pods. However, when the number of concurrent jobs ramped up, we started to notice a few bumps in the road,” Corey Johnston, Kubernetes platform team lead at Atlassian, wrote in a post.
The problem with scaling up was that once a cluster hit capacity, users had to wait several minutes for additional Kubernetes worker nodes to boot and become able to take on the load. Many builds cannot tolerate delays that long and would fail as a result, Atlassian explained.
The problem with scaling down was that once load had subsided, the autoscaler did not remove nodes fast enough. According to Atlassian, this is not much of an issue when the node count is low, but it becomes costly once a cluster reaches hundreds of nodes or more.
To address this, the company created Escalator, a batch-job-optimized autoscaler for Kubernetes. According to Atlassian, the project had two initial goals. The first was preemptive scale-up, using a buffer of spare capacity so that clusters never fill up completely. The second was aggressive scale-down of machines that are no longer needed. The company also wanted Escalator to expose Prometheus metrics so that IT Ops teams could see how well their clusters were performing.
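To make the buffer idea concrete, here is a minimal sketch of threshold-based scaling arithmetic. It is not Escalator's code: the function names, the CPU-only resource model, the targetPercent threshold, and the maxPerStep removal cap are all hypothetical, chosen only to illustrate the two goals. Scale-up adds enough nodes to keep utilization at or below a target, so the remaining percentage acts as headroom for new jobs. Scale-down removes surplus nodes in large but bounded steps.

```go
package main

import (
	"fmt"
	"math"
)

// nodesNeeded returns the smallest node count that keeps requested CPU at or
// below targetPercent of total capacity. (Hypothetical model, CPU only.)
func nodesNeeded(requestedCPU, cpuPerNode, targetPercent float64) int {
	if cpuPerNode <= 0 || targetPercent <= 0 {
		return 0 // invalid input: treat as "no requirement"
	}
	usablePerNode := cpuPerNode * targetPercent / 100
	return int(math.Ceil(requestedCPU / usablePerNode))
}

// nodesToAdd models preemptive scale-up: add nodes as soon as utilization
// exceeds the target, leaving (100 - targetPercent)% of capacity as buffer.
func nodesToAdd(requestedCPU, cpuPerNode float64, currentNodes int, targetPercent float64) int {
	needed := nodesNeeded(requestedCPU, cpuPerNode, targetPercent)
	if needed <= currentNodes {
		return 0
	}
	return needed - currentNodes
}

// nodesToRemove models aggressive scale-down: remove every surplus node,
// capped at maxPerStep per evaluation so the cluster shrinks quickly but
// in bounded steps.
func nodesToRemove(requestedCPU, cpuPerNode float64, currentNodes int, targetPercent float64, maxPerStep int) int {
	surplus := currentNodes - nodesNeeded(requestedCPU, cpuPerNode, targetPercent)
	if surplus <= 0 {
		return 0
	}
	if surplus > maxPerStep {
		return maxPerStep
	}
	return surplus
}

func main() {
	// 10 nodes x 10 cores, 90 cores requested, 70% target: 13 nodes needed.
	fmt.Println(nodesToAdd(90, 10, 10, 70)) // 3
	// Load drops to 20 cores: 3 nodes suffice, 7 are surplus, capped at 5.
	fmt.Println(nodesToRemove(20, 10, 10, 70, 5)) // 5
}
```

Keeping the target below 100% is what lets new jobs land on existing nodes while extra machines boot. The lower the target, the larger the buffer and the higher the idle cost.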
“After several months of work, the result was exactly what we envisioned building, and what we’ve released into the open source community. Gone are our three-minute waits for EC2 instances to boot and join the cluster,” Johnston wrote in a post. “Gone too is all the wasted money we were spending every hour on unused, idle worker nodes. The cluster now scales down very quickly, so we only pay for the number of machines we actually need. Escalator has enabled us to save a lot of money – ranging from hundreds to thousands of dollars a day – based on the workloads we’re running.”
Going forward, the company is looking into extending the tool to external Bitbucket Pipelines users and into how it could handle more service-based workloads. “But given the tremendous benefits it’s had in our environment, we’ve released Escalator to the Kubernetes community as open source, so others can take advantage of its features too,” Johnston wrote.