Press "Enter" to skip to content

Distributed coordination

A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable. 

Leslie Lamport

Building distributed systems is hard. To build robust, reliable and fault-tolerant distributed systems, developers need to understand the intricacies of building such systems. More often than not, it is the failure scenarios that are most interesting. A distributed system can fail in non-intuitive ways that are easy to overlook during development, only to bite you later in production. Working through this problem space requires a high level of commitment and focus from developers. Case in point: ZooKeeper was widely misunderstood, for a long time, to provide linearizable reads. Fortunately, not all application developers need to build distributed systems from scratch, since there are building blocks we can rely upon.

In this post, we will focus on one such building block, commonly called a “coordination service”. ZooKeeper has traditionally been the de-facto product of choice in this space, although over the last few years etcd has gained popularity as well. We will briefly discuss the need for a coordination service and its use-cases, and then compare ZooKeeper with etcd. The intention is to give the reader enough information to decide whether a coordination service is needed and, if so, which one to use.

Why do we need a coordination service?

To get more tasks done on a single machine, we use multithreading or an event-driven model. To get even more done, we can spread the load across multiple machines. Within a single machine, different tasks coordinate using primitives provided by the underlying hardware, ultimately exposed to the application developer via the operating system or language libraries (e.g. compare-and-swap). The goal of a coordination service is to provide such primitives for tasks running on multiple machines.
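
For contrast, on a single machine such a primitive is just a library call away. The minimal Go sketch below uses the standard sync/atomic package to perform a compare-and-swap on a local counter; a coordination service aims to provide comparable semantics when the competing tasks live on different machines.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

func main() {
	var counter int64
	// Compare-and-swap: set counter to 1 only if it is still 0.
	// Among many concurrent callers, exactly one would win this race.
	if atomic.CompareAndSwapInt64(&counter, 0, 1) {
		fmt.Println("won the CAS, counter =", atomic.LoadInt64(&counter))
	}
}
```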

Google’s Chubby paper is a good read for anyone trying to understand how to build such a distributed coordination service (in particular, a lock service); it is also what ZooKeeper draws its inspiration from. At the core of a distributed coordination service is a consensus protocol. Paxos has been a widely adopted protocol, but it is notoriously complex to understand, and hence to implement, so much so that it prompted researchers to create Raft, a consensus protocol with simplicity as one of its main objectives. Besides these, there are also systems such as virtual synchrony, which use a different execution model to solve the consensus problem.

As a side note, when building distributed systems it is important to recognize, and not be confused by, the turtles-all-the-way-down flavour of infinite regress. For instance, ZooKeeper uses a leader election algorithm in its implementation, and we can in turn use ZooKeeper to implement leader election within our applications.

Some of the commonly cited use-cases for coordination with examples are:

  • Distributed locking
    • enables us to build synchronization primitives such as a mutex, a semaphore, or an atomic counter. A mutex could be useful for managing access to a shared resource in a cluster, so that only one server accesses the resource at a time (see the sketch after this list). A semaphore could be useful for managing access to a limited resource (e.g. only 10 workers may be started for a given job). An atomic counter could be useful for managing concurrent access to a shared counter in a cluster; implementations sometimes also provide operations such as CAS, which are invaluable when consistency is paramount.
  • Group membership
    • enables us to logically group servers. e.g. in a web application, “application servers” and “database servers” could be distinct groups.
  • DNS replacement
    • enables us to replace DNS with a coordination service that does key lookups. This was one of the motivating factors for building Chubby.
  • Configuration management
    • a central place to hold configuration. This could be any configuration the application requires, subject to the limitations imposed by the coordination service; e.g. in ZooKeeper, each znode can hold at most 1MB.
  • Leader election
    • one of the servers in the cluster can be treated specially by making it a leader. e.g. a leader in a group of application servers might be responsible for scheduling tasks that are later picked up by workers running on other servers.
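
To make the distributed-locking use-case concrete (referenced in the list above), here is a minimal sketch using etcd’s Go client and its concurrency recipe package. This is an illustrative sketch, not a production recipe: the endpoint address and lock key are placeholders for whatever your deployment uses.

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

func main() {
	// Placeholder endpoint; point this at your etcd cluster.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// A session ties the lock to a lease; if this process dies,
	// the lease expires and the lock is released automatically.
	sess, err := concurrency.NewSession(cli)
	if err != nil {
		log.Fatal(err)
	}
	defer sess.Close()

	// "/locks/shared-resource" is an illustrative key prefix.
	mu := concurrency.NewMutex(sess, "/locks/shared-resource")
	ctx := context.Background()

	if err := mu.Lock(ctx); err != nil {
		log.Fatal(err)
	}
	// Critical section: at most one holder across the whole cluster.
	log.Println("acquired lock")

	if err := mu.Unlock(ctx); err != nil {
		log.Fatal(err)
	}
}
```

The same package also offers a leader-election recipe (concurrency.NewElection with Campaign), and ZooKeeper users can find equivalent recipes in Apache Curator (e.g. InterProcessMutex, LeaderSelector).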

ZooKeeper vs. etcd

A nice comparison can be found on etcd.io. The comparison below is intended to complement it.

  • Language implemented in
    • ZooKeeper: Java
    • etcd: Go
  • Data model
    • ZooKeeper: Hierarchical
    • etcd: Flat (seemingly for performance benefits)
  • Platform support (server)
    • ZooKeeper: Linux, Windows – Production; Darwin – Development only (see support matrix)
    • etcd: Linux – Production; Darwin, Windows – Experimental (see support matrix)
  • Official client APIs
    • ZooKeeper: Java, C (over TCP)
    • etcd: Go (gRPC over HTTP/2)
  • Platform support (C client)
    • ZooKeeper: Linux – Production; Darwin, Windows – Not supported
    • etcd: A few C/C++ clients are listed, but none seem very active. Most of them are based on the HTTP APIs provided with etcd v2 (i.e. they internally use an HTTP client library to talk to etcd) rather than the newer gRPC-based APIs that etcd v3 supports.
  • Primitives (beyond key/value storage and watch APIs)
    • ZooKeeper: Implementations of recipes available in Java (Curator); not readily available in C/C++
    • etcd: Wider range of recipes implemented in Go; not readily available in C/C++
  • HTTP APIs
    • ZooKeeper: Not supported
    • etcd: gRPC gateway
  • Consensus protocol
    • ZooKeeper: ZooKeeper Atomic Broadcast (ZAB)
    • etcd: Raft
  • Adoption
    • ZooKeeper: Being a much older project (first release in 2008), it has wider adoption. Its age brings both benefits (tried and tested over the years, making it a robust system) and drawbacks (it is often seen as heavyweight).
    • etcd: Being a newer project (first stable release in 2015), it has comparatively lower adoption. It is leaner, fixes some of ZooKeeper’s limitations, and provides a richer API (see “why etcd”).
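
To give a flavour of the etcd v3 Go client referenced above (plain key/value operations plus watches), here is a small illustrative sketch; the endpoint and key names are placeholders, not part of any prescribed layout.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Store a small piece of configuration (illustrative key/value).
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	if _, err := cli.Put(ctx, "/config/app/max-workers", "10"); err != nil {
		log.Fatal(err)
	}
	cancel()

	// Read it back.
	resp, err := cli.Get(context.Background(), "/config/app/max-workers")
	if err != nil {
		log.Fatal(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Printf("%s = %s\n", kv.Key, kv.Value)
	}

	// Watch for subsequent changes under the /config/app/ prefix
	// (this loop blocks; in a real application it would run in a goroutine).
	for wresp := range cli.Watch(context.Background(), "/config/app/", clientv3.WithPrefix()) {
		for _, ev := range wresp.Events {
			fmt.Printf("%s %s = %s\n", ev.Type, ev.Kv.Key, ev.Kv.Value)
		}
	}
}
```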

Conclusion

Coordination services such as ZooKeeper and etcd definitely make it easier for application developers to build distributed systems, although using them correctly and effectively still requires considerable time and effort to understand their nuances.

For a quick overview of various consensus algorithms, see “Distributed Consensus: Making Impossible Possible” – Heidi Howard – JOTB16.