Sep. 30th, 2016

akuchling: Sherlock Hemlock (Default)
 Apache Zookeeper ( is a specialized filesystem-like database.
Database features
The database consists of znodes, which are identified by a filesystem-like path, e.g. "/prod", "/solr/status".  Each znode can contain a chunk of content, represented as a string.  It's expected that this content is relatively small, e.g. a few KB and not 600MB.   Unlike a filesystem, a znode can have content and still have children, i.e. /solr can have the value 'active', yet /solr/status, /solr/nodes, etc. can also exist.
It's possible to create an ACL controlling who can read/write a particular node, and also to create a watch that's triggered when a particular node is modified.  This can be used to create data structures inside Zookeeper such as locks or queues.  For example, to create a barrier you might pick a znode path such as /barrier/barrier-id and issue an exists(path) call that also sets a watch.  If exists() returns False, the barrier isn't held and you can create it and then go do your computation.  If exists() returns True, wait for a watch event that tells you the barrier has
been deleted and then try the exists() again.
The database is kept in memory while the server is running.  Changes are logged to disk, so the server can be stopped and restarted as necessary.
Distributed operation
Zookeeper supports distributed operation.  You can configure N zookeeper nodes into a set. Updates made on one node will become visible to clients of all the other nodes in the set.  This is coordinated by a leader node that's elected when the nodes are started.  If the leader node suddenly dies, a new leader is automatically elected.
It was straightforward to set up a 3-node set.  Each node has its own data directory and its own config file.  All the config files have a bunch of settings describing the nodes within the set: 
server.1 = node1:2888:3888
server.2 = node2:2888:3888
server.3 = node3:2888:3888
Each data directory contains a "myid" file that gives the index (1, 2, 3) of which node this is.  Once you start all the servers, they'll elect a leader that will propagate write operations to the other servers.  Clients can connect to any of the servers and should see the same database.
There's a command-line tool,, that allows running commands on a Zookeeper database.  Example invocation: -server zknodeN:2181 ls /
You could use Zookeeper's database to store configuration info, storing data in znodes instead of a set of nested dictionaries as you would in JSON.  This would be primarily a read-only database, but Zookeeper can also be used for applications that need frequent updates.  For example, I've seen references to  using Zookeeper as a status board for a set of servers.  You might run 'apachectl status' on server prod1 every minute and store the output in the znode /apache/prod1.  A monitoring process could then set a watch on /apache and its children, get triggered every time a server updates its status, and then react when one changes from 'running' to 'stopped'.
Zookeeper comes with Java and C APIs.  There's a Python wrapper for the C API, but a more complete package seems to be kazoo (  The README for kazoo says memory management for the C API is difficult, so kazoo is pure Python and supports several different event loops such as gevent and Twisted.  kazoo also has implementations of some common usage patterns such as locks, queues, and elections.


akuchling: Sherlock Hemlock (Default)

September 2016

2526272829 30 

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Page generated Sep. 23rd, 2017 03:51 am
Powered by Dreamwidth Studios