Riak: Dynamo in Practice with Riak Core
Riak is a distributed key-value store written in Erlang and built on the Riak Core framework. It implements the Dynamo architecture, which favors high availability and partition tolerance (AP in the CAP theorem) over strong consistency.
The Hash Ring
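Riak places keys on a consistent hashing ring: a 160-bit SHA-1 hash space divided into equal partitions (64 by default; the ring size must be a power of two). Each partition is managed by a "vnode" (virtual node), and the vnodes are spread across the physical nodes of the cluster.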
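To make the ring concrete, here is a minimal sketch of how a key could be mapped to a partition. The module name ring_sketch and both function names are made up for illustration, and the mapping is deliberately simplified; Riak Core's real ring code differs in detail.

```erlang
-module(ring_sketch).
-export([key_hash/2, partition_index/3]).

%% Size of the hash space: SHA-1 produces 160-bit values.
-define(RING_TOP, (1 bsl 160)).

%% Hash a bucket/key pair to an integer position on the ring.
%% Assumption: the pair is serialized with term_to_binary/1 before hashing.
key_hash(Bucket, Key) ->
    <<Hash:160/integer>> = crypto:hash(sha, term_to_binary({Bucket, Key})),
    Hash.

%% Map a bucket/key pair to one of RingSize equal partitions (0-based index).
partition_index(Bucket, Key, RingSize) ->
    PartitionWidth = ?RING_TOP div RingSize,
    key_hash(Bucket, Key) div PartitionWidth.
```

Calling ring_sketch:partition_index(<<"carts">>, <<"alice">>, 64) always returns the same index between 0 and 63 for that key, which is what lets any node work out where a key lives without a central lookup.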
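%% Riak Core vnode: per-partition state and a simple put handler
-record(state, {partition, storage = #{}}).

init([Partition]) ->
    {ok, #state{partition = Partition}}.

handle_command({put, Key, Value}, _Sender, State = #state{storage = Storage0}) ->
    %% Store the value locally for this partition.
    %% An in-memory map stands in for a real storage backend here.
    Storage = maps:put(Key, Value, Storage0),
    {reply, ok, State#state{storage = Storage}}.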
Quorum and N, R, W
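Riak lets you trade consistency, latency, and availability against each other through three parameters:
- N: Number of replicas stored for each object (default 3).
- R: Number of replicas that must respond before a read is considered successful.
- W: Number of replicas that must acknowledge before a write is considered successful.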
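If you set W=1, a write returns as soon as one replica acknowledges it, so latency is low but the update is at risk if that replica fails before the other copies are written. If you set W=3 (with N=3), every replica must acknowledge the write, which improves durability, but a single slow or unreachable replica can delay or fail the write. Riak softens this with sloppy quorums and hinted handoff, where a fallback vnode accepts the write on behalf of an unavailable primary.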
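The arithmetic behind these trade-offs is small enough to sketch. The module quorum_sketch and its function names are hypothetical, and the model is a strict quorum: it ignores sloppy quorums and hinted handoff entirely.

```erlang
-module(quorum_sketch).
-export([valid/3, overlaps/3, write_ok/2, tolerated_failures/2]).

%% N, R and W only make sense when 1 =< R =< N and 1 =< W =< N.
valid(N, R, W) ->
    N >= 1 andalso R >= 1 andalso R =< N andalso W >= 1 andalso W =< N.

%% True when every read quorum shares at least one replica
%% with every write quorum.
overlaps(N, R, W) ->
    R + W > N.

%% A write succeeds once at least W replicas have answered ok.
%% Acks is the list of replies collected so far.
write_ok(Acks, W) ->
    length([ok || ok <- Acks]) >= W.

%% How many replicas may fail to answer while the quorum can still be met.
tolerated_failures(N, Quorum) ->
    N - Quorum.
```

With N=3, quorum_sketch:tolerated_failures(3, 1) is 2 and quorum_sketch:tolerated_failures(3, 3) is 0, which is the W=1 versus W=3 trade-off in numbers. The overlap check shows why R=2, W=2 is a common middle ground: reads and writes each survive one missing replica, and a read quorum still intersects the most recent write quorum.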
Vector Clocks
Because Riak is eventually consistent, two clients might update the same key on different nodes during a network partition. Riak uses vector clocks to track the causal history of each object: every update increments the counter of the node that coordinated it.
Object A, vclock: [{node1, 1}, {node2, 2}]
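A minimal sketch of how such a clock can be advanced and compared follows. The module vclock_sketch is written for this post and is not Riak's own vector clock implementation; it represents a clock as a plain list of {Node, Counter} pairs, as in the example above.

```erlang
-module(vclock_sketch).
-export([fresh/0, increment/2, descends/2, concurrent/2]).

%% A vector clock is a list of {Node, Counter} pairs, one entry per node.
fresh() -> [].

%% Record one more update coordinated by Node.
increment(Node, VClock) ->
    Counter = proplists:get_value(Node, VClock, 0),
    lists:keystore(Node, 1, VClock, {Node, Counter + 1}).

%% A descends from B when A has seen every update that B has seen.
descends(A, B) ->
    lists:all(fun({Node, CounterB}) ->
                  proplists:get_value(Node, A, 0) >= CounterB
              end, B).

%% Two clocks are concurrent when neither descends from the other.
concurrent(A, B) ->
    not descends(A, B) andalso not descends(B, A).
```

Starting from a shared clock Base, an update through node1 and a separate update through node2 produce two clocks that both descend from Base, yet concurrent/2 returns true for the pair: each side has seen an update the other has not.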
If two vclocks are concurrent (neither is a descendant of the other), Riak cannot tell which write should win. When the bucket's allow_mult property is enabled, it keeps both values as "siblings" and lets the application resolve the conflict. This is why Riak is a good fit for shopping carts: you never want to lose a customer's item just because of a network glitch.
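For the shopping cart case, one common resolution strategy is to merge the sibling carts by set union. The sketch below assumes each sibling value is a list of item identifiers; the module cart_merge_sketch and that data layout are illustrative choices, not something Riak prescribes.

```erlang
-module(cart_merge_sketch).
-export([resolve/1]).

%% Resolve sibling carts by taking the union of their items.
%% Each sibling is a list of item identifiers; the result is
%% sorted and free of duplicates.
resolve(Siblings) ->
    lists:usort(lists:append(Siblings)).
```

For example, cart_merge_sketch:resolve([[apple, book], [book, pen]]) returns [apple, book, pen]. Union never drops an added item, which is the property the cart needs, but it also means an item removed on one side of the partition can reappear after the merge.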