Earlier this week, Netflix open sourced Maestro, a highly scalable workflow orchestrator for handling large-scale workflows, like data pipelines.

It can manage the entire workflow lifecycle, including retries, queuing, task distribution to compute engines, and more. 

Workflows in Maestro are structured as directed acyclic graphs (DAGs) composed of individual job definitions, called Steps. These Steps can have dependencies, triggers, workflow parameters, metadata, step parameters, configurations, and conditional or unconditional branches.
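To illustrate the DAG idea in general terms, here is a minimal Python sketch that orders a set of steps by their dependencies. It does not use Maestro's actual workflow definition format; the step names and the `execution_order` helper are hypothetical, and the ordering relies on Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical steps: each key maps to the set of steps it depends on.
steps = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"aggregate", "clean"},
}


def execution_order(dependencies):
    """Return one valid run order in which every step follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(steps)
print(order)
```

Because the graph must be acyclic, a step can never end up waiting on itself; `TopologicalSorter` raises an error if the dependencies contain a cycle.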

Users can package business logic in several different formats, including Docker images, notebooks, bash scripts, SQL, and Python.

The three main components of the platform are:

  • Workflow engine, which manages workflow definitions, the lifecycle of workflow instances, and step instances
  • Time-based scheduling service, which starts new workflow instances at set times
  • Signal service, which starts new workflow instances when certain conditions are met
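The difference between the two trigger styles listed above can be sketched in a few lines of Python. This is an illustrative toy rather than Maestro's API: the `Engine` class, both trigger functions, and the workflow names are assumptions made up for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Engine:
    """Hypothetical stand-in for a workflow engine: records started instances."""
    started: list = field(default_factory=list)

    def start(self, workflow_id, reason):
        self.started.append((workflow_id, reason))


def time_trigger(engine, workflow_id, now, scheduled_hour):
    """Start an instance when the clock reaches the top of the scheduled hour."""
    if now.hour == scheduled_hour and now.minute == 0:
        engine.start(workflow_id, "schedule")


def signal_trigger(engine, workflow_id, signal, expected):
    """Start an instance when a signal satisfies every expected key/value pair."""
    if all(signal.get(key) == value for key, value in expected.items()):
        engine.start(workflow_id, "signal")


engine = Engine()
time_trigger(engine, "daily_report", datetime(2024, 7, 1, 6, 0), scheduled_hour=6)
signal_trigger(engine, "downstream_job", {"table": "events", "ready": True}, {"ready": True})
signal_trigger(engine, "downstream_job", {"table": "events", "ready": False}, {"ready": True})
print(engine.started)
```

Only the first two calls start an instance; the last signal does not satisfy its condition, so nothing runs.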

Netflix first announced Maestro in 2022 as a replacement for its existing workflow orchestrator, which had served the company well for some time but was running into scaling issues. Netflix concluded that it needed a platform that could scale horizontally, as Maestro does, rather than vertically.

Since implementing it internally, Netflix has seen an 87.5% increase in executed jobs. It currently runs an average of 500K jobs daily and has run 2 million jobs on busy days, the company said. 

“Maestro has been extensively used within Netflix, and today, we are excited to make the Maestro source code publicly available. We hope that the scalability and usability that Maestro offers can expedite workflow development outside Netflix,” Netflix wrote in a blog post.

Since being open sourced earlier this week, Maestro has already received almost 2K stars on GitHub.

