Managing multiple Docker hosts is hard. Docker orchestration systems are built to address that problem.

There are multiple options for Docker orchestration. Last week we talked about Google's Kubernetes, and this week we are going to discuss a cluster resource management solution that has recently become an Apache "Top Level Project": Mesos.

What is Apache Mesos?

Apache Mesos started as a research project at UC Berkeley and matured while being used to scale Twitter. It is a datacenter-level cluster manager that aims to provide high-level, fine-grained, locality-aware resource management that various computing frameworks can share.

How Does Apache Mesos Work?

Mesos operates as a cluster manager: a master communicates with agents residing on each node (i.e. host machine) and with "framework schedulers". Each node runs a Mesos agent, which is responsible for managing the node's local resources, managing any "framework executors" running on that node, and coordinating with the master. Framework schedulers tell the Mesos master what jobs they have ready to run and what resources those jobs require. Framework executors, in turn, are handed the jobs that the cluster is ready to run, on the right nodes and with the right access to resources.
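To make the moving parts a little more concrete, here is a toy model of that flow in Python. It is only a sketch of the idea and not the Mesos API: the Agent, Job and Master classes, their fields, and the first-fit placement rule are all assumptions made up for illustration, and real Mesos negotiates placement with frameworks through resource offers rather than a single place() call.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """A unit of work a framework scheduler wants to run (hypothetical shape)."""
    name: str
    cpus: float
    mem_mb: int


@dataclass
class Agent:
    """Hypothetical stand-in for a Mesos agent: tracks one node's free resources."""
    name: str
    cpus: float
    mem_mb: int
    running: list = field(default_factory=list)

    def can_fit(self, job):
        return job.cpus <= self.cpus and job.mem_mb <= self.mem_mb

    def launch(self, job):
        # Stand-in for handing the job to an executor: reserve the resources
        # on this node and record the job as running.
        self.cpus -= job.cpus
        self.mem_mb -= job.mem_mb
        self.running.append(job.name)


class Master:
    """Hypothetical stand-in for the Mesos master."""

    def __init__(self, agents):
        self.agents = agents

    def place(self, job):
        """Launch the job on the first agent with room; return its name or None."""
        for agent in self.agents:
            if agent.can_fit(job):
                agent.launch(job)
                return agent.name
        return None


def demo():
    master = Master([
        Agent("node-1", cpus=2, mem_mb=4096),
        Agent("node-2", cpus=8, mem_mb=16384),
    ])
    # A framework scheduler would submit these; here we call the master directly.
    jobs = [Job("web", 1, 1024), Job("batch", 4, 8192), Job("huge", 16, 1024)]
    return {job.name: master.place(job) for job in jobs}
```

Calling demo() places "web" on node-1 and "batch" on node-2, and leaves "huge" unplaced because no node has 16 free CPUs.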

Mesos Architecture

Data locality

Data locality is, put simply, a measure of how "expensive" it is to access the data or resource you need. Generally speaking, the cost is latency (the time needed to reach a resource), but it could also be, say, machine or network usage charges.

A fundamental example of locality and cost can be found inside every IPv4/IPv6-attached computer: the routing table details how to send packets from your machine to a target IP. When more than one route is acceptable for a packet, the kernel chooses the route (generally a network interface) with the lowest "metric", the key indicator of cost. (You'll appreciate this on a well-configured laptop that has a wireless and a wired interface on the same network!)
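The route choice described above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular kernel implements it: the Route class, the ROUTES table and the interface names are invented for the example, and the sketch assumes the usual rule that the most specific matching prefix wins before the metric is compared.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    network: str    # destination in CIDR form, e.g. "192.168.1.0/24"
    interface: str  # hypothetical interface name
    metric: int     # lower means cheaper


def choose_route(routes, target_ip):
    """Return the cheapest route that can reach target_ip, or None."""
    target = ipaddress.ip_address(target_ip)
    candidates = [r for r in routes if target in ipaddress.ip_network(r.network)]
    if not candidates:
        return None
    # Most specific prefix first; among equally specific routes, lowest metric.
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r.network).prefixlen, r.metric),
    )


# A laptop with wired (eth0) and wireless (wlan0) interfaces on the same network.
ROUTES = [
    Route("0.0.0.0/0", "eth0", 100),
    Route("0.0.0.0/0", "wlan0", 600),
    Route("192.168.1.0/24", "eth0", 100),
    Route("192.168.1.0/24", "wlan0", 600),
]
```

With this table, choose_route(ROUTES, "192.168.1.20") picks the wired /24 route, since both /24 routes match and eth0 has the lower metric.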

A more thorough treatment can be found on (where else?) Wikipedia, at Locality of Reference.

To illustrate the importance of leveraging data locality, let's imagine a Mesos cluster that runs on 4 nodes. Two of the nodes have direct access, via node-attached storage, to a (say) 500TB dataset (dataset A), and all four have direct access to a 100GB dataset (B). Three frameworks run inside the cluster, and each runs jobs that use either dataset A or B, but never both at the same time. When the framework schedulers tell the Mesos master what jobs they have and what resources those jobs need, Mesos will, all other things being equal, offer slots on the nodes with direct access to dataset A to the jobs that need it. This locality of access can result in drastic speed increases, because accessing data locally (e.g. internal disk, SAN/SAS/SCSI-attached storage, etc.) often costs significantly less than accessing it remotely, which usually means going over a network and through another host.
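Here is a small Python sketch of that 4-node scenario. It is an illustration of locality-aware placement in general, not Mesos's actual allocation logic: the CLUSTER layout, the one-slot-per-node assumption, the job names and the tie-breaking rule (prefer nodes that hold fewer datasets, so the scarce dataset-A nodes stay free for jobs that need them) are all assumptions made for the example.

```python
# Two nodes hold both datasets locally; the other two hold only dataset B.
CLUSTER = {
    "node-1": {"free_slots": 1, "local": {"A", "B"}},
    "node-2": {"free_slots": 1, "local": {"A", "B"}},
    "node-3": {"free_slots": 1, "local": {"B"}},
    "node-4": {"free_slots": 1, "local": {"B"}},
}

# (job name, dataset the job reads); names are hypothetical.
JOBS = [("job-1", "A"), ("job-2", "B"), ("job-3", "A"), ("job-4", "A"), ("job-5", "B")]


def schedule(nodes, jobs):
    """Greedily place each job, preferring nodes with local access to its dataset."""
    free = {name: info["free_slots"] for name, info in nodes.items()}
    placements = {}
    for job, dataset in jobs:
        candidates = [n for n in nodes if free[n] > 0]
        if not candidates:
            placements[job] = None  # cluster is full
            continue
        best = min(
            candidates,
            key=lambda n: (
                dataset not in nodes[n]["local"],  # local access sorts first
                len(nodes[n]["local"]),            # then the least "special" node
                n,                                 # then by name, for determinism
            ),
        )
        free[best] -= 1
        placements[job] = best
    return placements
```

Running schedule(CLUSTER, JOBS) puts job-1 and job-3 on the two nodes that hold dataset A and sends job-2 to node-3. job-4 lands on node-4, where it has to read dataset A remotely, and job-5 finds the cluster full.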

Lower cost to access data/resources translates into faster processing and more efficient utilization of cluster resources.

Coming Soon

We are interviewing Florian Leibert, Founder and CEO of Mesosphere (the commercial company behind the Mesos project) on the CenturyLink Labs Podcast. Check back soon to get his background and history on the project in more detail.

We will also soon publish a follow-up post on how to get started using Apache Mesos. Until then, check out these great links...