akuchling: Sherlock Hemlock (Default)
 Apache Zookeeper (http://zookeeper.apache.org/) is a specialized filesystem-like database.
Database features
The database consists of znodes, which are identified by a filesystem-like path, e.g. "/prod", "/solr/status".  Each znode can contain a chunk of content, represented as a string.  It's expected that this content is relatively small, e.g. a few KB and not 600MB.   Unlike a filesystem, a znode can have content and still have children, i.e. /solr can have the value 'active', yet /solr/status, /solr/nodes, etc. can also exist.
It's possible to create an ACL controlling who can read/write a particular node, and also to create a watch that's triggered when a particular node is modified.  This can be used to create data structures inside Zookeeper such as locks or queues.  For example, to create a barrier you might pick a znode path such as /barrier/barrier-id and issue an exists(path) call that also sets a watch.  If exists() returns False, the barrier isn't held and you can create it and then go do your computation.  If exists() returns True, wait for a watch event that tells you the barrier has
been deleted and then try the exists() again.
The database is kept in memory while the server is running.  Changes are logged to disk, so the server can be stopped and restarted as necessary.
Distributed operation
Zookeeper supports distributed operation.  You can configure N zookeeper nodes into a set. Updates made on one node will become visible to clients of all the other nodes in the set.  This is coordinated by a leader node that's elected when the nodes are started.  If the leader node suddenly dies, a new leader is automatically elected.
It was straightforward to set up a 3-node set.  Each node has its own data directory and its own config file.  All the config files have a bunch of settings describing the nodes within the set: 
server.1 = node1:2888:3888
server.2 = node2:2888:3888
server.3 = node3:2888:3888
Each data directory contains a "myid" file that gives the index (1, 2, 3) of which node this is.  Once you start all the servers, they'll elect a leader that will propagate write operations to the other servers.  Clients can connect to any of the servers and should see the same database.
There's a command-line tool, zkCli.sh, that allows running commands on a Zookeeper database.  Example invocation:

zkCli.sh -server zknodeN:2181 ls /
You could use Zookeeper's database to store configuration info, storing data in znodes instead of a set of nested dictionaries as you would in JSON.  This would be primarily a read-only database, but Zookeeper can also be used for applications that need frequent updates.  For example, I've seen references to  using Zookeeper as a status board for a set of servers.  You might run 'apachectl status' on server prod1 every minute and store the output in the znode /apache/prod1.  A monitoring process could then set a watch on /apache and its children, get triggered every time a server updates its status, and then react when one changes from 'running' to 'stopped'.
Zookeeper comes with Java and C APIs.  There's a Python wrapper for the C API, but a more complete package seems to be kazoo (http://kazoo.readthedocs.org/en/latest/).  The README for kazoo says memory management for the C API is difficult, so kazoo is pure Python and supports several different event loops such as gevent and Twisted.  kazoo also has implementations of some common usage patterns such as locks, queues, and elections.
akuchling: Sherlock Hemlock (Default)
Work has resolved the question of what NoSQL database I should learn: I'm being given a project that will involve Redis as well as PostgreSQL.

Available operations:

  • get the value corresponding to a string-valued key
  • MGET key1 key2 key3: get multiple values, returning them as an array
  • set the value corresponding to a key. Variants: set the value for a key, only if the key is already present or isn't already present.
  • DEL key: delete the value for a key.
  • EXPIRE key delay: cause the key to expire and disappear after 'delay' seconds. Resolution is 1msec. (? correct ?) Can also specify delay when adding the key: SET key value EX delay.
  • TTL key: returns the # of second the key has left before it expires.
  • TYPE key: returns the type of the corresponding value
  • EXISTS key: returns 1/0 based on whether the key exists.
  • Increment/decrement counters atomically. INCR key increments the value by 1; INCRBY key n increments the value by n.
  • GETSET key value: stores a new value and returns the previous value.

Data types for values:

Linked lists:

  • Written as 1,2,3,4.
  • LLEN key returns the length.
  • LPUSH key value adds a value as the head of the list, and runs in O(1) time.  RPUSH key value  adds the value as the tail and is O(n).
  • There are corresponding LPOP and RPOP commands.
  • LRANGE key start end returns an array containing the corresponding values, and is also O(n).
  • Capped lists: LTRIM key start end keeps only the specified range and discards everything else. To keep the 10 most recent items, do LTRIM key 0 9.
  • Blocking pops: BLPOP and BRPOP block if the list is empty, or can take a timeout. Useful for implementing queues. There's even BRPOPLPUSH srckey destkey will pop the rightmost item from list 1 and push it onto list 2.


  • HSET key attr1 value1: set a single key in a hash. For multiple keys, use HMSET key attr1 value1 attr2 value2 ...
  • HGET key attrname: retrieve a key within a hash. HGETALL key returns a flat array of [attr1, value1, attr2, ...].
  • Other commands also have hash variants: HINCRBY, HEXISTS.


  • Are unordered collections of items. Commands include SADD key elem, SISMEMBER key elem to check membership, SPOP key removes a random element and returns it, SCARD key returns the number of elements ('cardinality').
  • Intersection: SINTER key1 key2 ....

Sorted sets: each element has an FP value called its score, and the set is sorted by this score in ascending order.

  • ZADD key score1 elem1 score2 elem2 .... Updates are O(log N).
  • ZRANGE key start end and ZREVRANGE key start end return slices in sorted or reversed order, returning just the elements. Add "WITHSCORES" to also return the scores.
  • To search by score, ZRANGEBYSCORE key min max. min and max are inclusive; you can say -inf or +inf. ZREMRANGEBYSCORE would remove the matching elements.
  • Ranking: ZRANK key elem returns the position of the element in the sorted order.
  • Redis 2.8 can search by the elements as well: ZRANGEBYLEX key [B [P returns elements between B and P.

Bitmaps: bit operations are defined on strings. Size limit is 512Mb = 2**32 bits.

  • SETBIT key bitnum 0/1 sets a bit, and GETBIT key bitnum returns it.
  • Bit operations over entire string: BITOP and|or|xor|not destkey srckey1 srckey2 ... - for NOT, only one srckey can be supplied.
  • BITCOUNT key counts set bits, and BITPOS key 1/0 returns the index of the first 1 or 0 bit (you can also constrain this to a range).

HyperLogLogs: probabilistic data structure for counting # of unique items seen. Has a certain amount of error on the resulting value, but is compact and doesn't grow unboundedly.

  • PFADD key elem1 elem2 elem3 ... stores the elements.
  • PFCOUNT key returns the # of unique elements seen.


akuchling: Sherlock Hemlock (Default)

September 2016

2526272829 30 


RSS Atom

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Page generated Sep. 23rd, 2017 03:47 am
Powered by Dreamwidth Studios