LSM Tree


Doing lots of disk writes in expensive (disk i/o). We batch writes by using in memory buffers to accumulate, and write in one-shot.

Has two components:

  • C1: In memory buffers (small)
  • C2: On disk (larger)

We merge in memory segments into disk.


Make inserts and deletes cheap (disk writes).

LSM Trees cascade data from incoming stream into smaller, higher performing stores.

Merging algorithm a.k.a rolling merge

This is heart of LSM algorithm. It migrate entries from C0 to C1, C1 to C2, …, to free buffer space.

Removes a contiguous segment of entries from c0 to c1.

C0 - Memtable C1 Tree - SSTable.

Helps with compaction.


Store instructions in log, can use for recovery.


Disk i/o not cheap.

Further questions

Will B+Tree augmented with memory be able to support the same thing?


Condensed ver: In-depth:

See also

B-Tree vs LSM Tree: LSM Tree to PostgreSQL: SSTable