Paper: Raft


Raft

Paper -> https://raft.github.io/raft.pdf

USENIX -> https://web.stanford.edu/~ouster/cgi-bin/papers/raft-atc14.pdf

Website -> https://raft.github.io/

Designing for Understandability: Raft, 2016 -> Video, Slides

Raft User Study, 2013 -> Video, Slides

Motivation: Replicated State Machines

  • Service that is replicated on multiple machines:

Raft Basics

  • Leader based:
  • Server states:
  • Time divided into terms:
  • Request-response protocol between servers (remote procedure calls, or RPCs). Two request types:

Leader Election

  • All servers start as followers
  • If a follower receives no heartbeat (an AppendEntries RPC) before its election timeout expires, it starts an election:
  • Election outcomes:
  • Each server votes for at most one candidate in a given term
  • Election Safety: at most one server can be elected leader in a given term
  • Availability: randomized election timeouts reduce split votes
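
The voting rules above can be sketched in a few lines of Python. This is a minimal illustration, not a full implementation: the names VoterState, handle_vote_request, random_election_timeout, and wins_election are hypothetical, the 150-300 ms range is the example range suggested in the Raft paper, and the log up-to-date check from the Safety section is deliberately left out.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoterState:
    """Hypothetical per-server election state (names are illustrative)."""
    current_term: int = 0
    voted_for: Optional[str] = None


def random_election_timeout(low_ms: float = 150, high_ms: float = 300) -> float:
    """Pick a randomized timeout so servers rarely time out at the same moment."""
    return random.uniform(low_ms, high_ms)


def handle_vote_request(state: VoterState, candidate_id: str, term: int) -> bool:
    """Grant at most one vote per term (log comparison omitted in this sketch)."""
    if term < state.current_term:
        return False  # request from a stale term
    if term > state.current_term:
        # Entering a newer term clears any vote cast in an older term.
        state.current_term = term
        state.voted_for = None
    if state.voted_for in (None, candidate_id):
        state.voted_for = candidate_id
        return True
    return False  # already voted for someone else in this term


def wins_election(votes: int, cluster_size: int) -> bool:
    """A candidate wins with votes from a strict majority of the cluster."""
    return votes > cluster_size // 2
```

Because each server votes at most once per term and a win needs a strict majority, two candidates cannot both win the same term, which is the Election Safety property.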

Log Replication

  • Handled by leader
  • When client request arrives:
  • Log entries: index, term, command
  • Logs can become inconsistent after leader crashes
  • Raft maintains a high level of coherency between logs (Log Matching Property):
  • The AppendEntries consistency check preserves the properties above.
  • Leader forces other logs to match its own:
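
As a rough sketch of the consistency check and of how a leader forces a follower's log to match its own, assuming 1-based log indexes and in-memory lists instead of real RPCs. Entry, append_entries, and replicate are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    term: int
    command: str


def append_entries(log: List[Entry], prev_index: int, prev_term: int,
                   entries: List[Entry]) -> bool:
    """Follower side. Indexes are 1-based; prev_index 0 means 'no preceding entry'."""
    if prev_index > len(log):
        return False  # follower is missing the preceding entry
    if prev_index > 0 and log[prev_index - 1].term != prev_term:
        return False  # preceding entry exists but has a different term
    for offset, entry in enumerate(entries):
        pos = prev_index + offset  # 0-based position for this entry
        if pos < len(log) and log[pos].term != entry.term:
            del log[pos:]  # conflict: drop this entry and all that follow
        if pos >= len(log):
            log.append(entry)
    return True


def replicate(leader_log: List[Entry], follower_log: List[Entry]) -> None:
    """Leader side: back up one entry at a time until the check passes."""
    next_index = len(leader_log) + 1
    while True:
        prev_index = next_index - 1
        prev_term = leader_log[prev_index - 1].term if prev_index > 0 else 0
        if append_entries(follower_log, prev_index, prev_term,
                          leader_log[prev_index:]):
            return
        next_index -= 1  # rejected: retry with an earlier preceding entry
```

The loop always terminates, because a request with prev_index 0 has no preceding entry to mismatch and is therefore accepted.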

Safety

  • Must ensure that the leader for a new term always holds all of the log entries committed in previous terms (Leader Completeness Property).
  • Step 1: restriction on elections: don’t vote for a candidate unless the candidate’s log is at least as up-to-date as yours.
  • Compare the terms and indexes of the last log entries: the log whose last entry has the higher term is more up-to-date; if the last terms are equal, the longer log is more up-to-date.
  • Step 2: be very careful about when an entry is considered committed.
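
Both steps can be sketched as small pure functions. The function and parameter names here are hypothetical. The commit rule shown follows the Raft paper as I understand it: a leader only advances its commit index to an entry from its own current term that is stored on a majority, and earlier entries then become committed indirectly.

```python
from typing import List


def candidate_is_up_to_date(cand_last_term: int, cand_last_index: int,
                            my_last_term: int, my_last_index: int) -> bool:
    """Step 1: compare last entries; higher term wins, then longer log."""
    if cand_last_term != my_last_term:
        return cand_last_term > my_last_term
    return cand_last_index >= my_last_index


def advance_commit_index(log_terms: List[int], match_index: List[int],
                         commit_index: int, current_term: int) -> int:
    """Step 2: return the new commit index for a leader.

    log_terms[i - 1] is the term of the entry at 1-based index i.
    match_index holds, per server (leader included), the highest index
    known to be stored on that server.
    """
    cluster_size = len(match_index)
    for n in range(len(log_terms), commit_index, -1):
        replicated = sum(1 for m in match_index if m >= n)
        # Majority replication alone is not enough: the entry must also
        # come from the leader's current term.
        if replicated > cluster_size // 2 and log_terms[n - 1] == current_term:
            return n
    return commit_index
```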

Persistent Storage

  • Each server stores the following in persistent storage (e.g. disk or flash):
  • These must be recovered from persistent storage after a crash.
  • If a server loses its persistent storage, it can no longer participate in the cluster.
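
A minimal sketch of crash-safe persistence, assuming the persistent state is the current term, the vote cast in that term, and the log, as listed in the Raft paper. The JSON file layout and the names save_state and load_state are assumptions made for illustration; a real server would also need to sync the containing directory and handle corrupt files.

```python
import json
import os
import tempfile


def save_state(path, current_term, voted_for, log):
    """Write state to a temp file, flush it to disk, then atomically rename."""
    state = {"current_term": current_term, "voted_for": voted_for, "log": log}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes reach stable storage
    os.replace(tmp_path, path)  # atomic swap: readers see old or new, never half


def load_state(path):
    """Recover state after a restart; a fresh server starts at term 0."""
    if not os.path.exists(path):
        return {"current_term": 0, "voted_for": None, "log": []}
    with open(path) as f:
        return json.load(f)
```

The state should be saved before a server responds to an RPC, so that a crash right after responding cannot make it forget a vote or a log entry it has already acknowledged.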

Implementing Raft

Client Interactions

  • Clients interact only with the leader
  • Initially, a client can send a request to any server
  • If the leader crashes while executing a client request, the client retries (with a new randomly chosen server) until the request succeeds
  • This can result in multiple executions of a command: not consistent!
  • Goal: linearizability: System behaves as if each operation is executed exactly once, atomically, sometime between sending of the request and receipt of the response.
  • Solution:
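
One common way to get exactly-once behavior, and the approach described in the Raft paper, is for clients to tag every command with a unique serial number so the state machine can detect retries. The sketch below assumes each client has at most one outstanding request; DedupKVStore and its append operation are hypothetical and chosen because appending is visibly non-idempotent.

```python
from typing import Dict, Tuple


class DedupKVStore:
    """Toy state machine that filters duplicate client commands."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        # client_id -> (latest serial applied, response returned for it)
        self.last_applied: Dict[str, Tuple[int, str]] = {}

    def apply(self, client_id: str, serial: int, key: str, suffix: str) -> str:
        """Append suffix to data[key] unless this command was already applied."""
        seen = self.last_applied.get(client_id)
        if seen is not None and serial <= seen[0]:
            # Retry of an already executed command: do not execute again.
            # Only the latest response is cached, which is sufficient when a
            # client has one outstanding request at a time.
            return seen[1]
        self.data[key] = self.data.get(key, "") + suffix
        response = self.data[key]
        self.last_applied[client_id] = (serial, response)
        return response
```

For example, if a client sends the same append twice because the first leader crashed before replying, the second application returns the cached response and the stored value is unchanged.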

Other Issues

  • Cluster membership
  • Log compaction
  • See paper for details

Paxos vs. Raft by John Kubiatowicz.



Last updated: March 15, 2026