LSM Tree

Motivations

Doing lots of disk writes in expensive (disk i/o). We batch writes by using in memory buffers to accumulate, and write in one-shot.

Has two components:

  • C1: In memory buffers (small)
  • C2: On disk (larger)

We merge in memory segments into disk.

Design

Make inserts and deletes cheap (disk writes).

LSM Trees cascade data from incoming stream into smaller, higher performing stores.

Merging algorithm a.k.a rolling merge

This is heart of LSM algorithm. It migrate entries from C0 to C1, C1 to C2, …, to free buffer space.

Removes a contiguous segment of entries from c0 to c1.

C0 - Memtable C1 Tree - SSTable.

Helps with compaction.

Recovery

Store instructions in log, can use for recovery.

Tradeoffs

Disk i/o not cheap.

Further questions

Will B+Tree augmented with memory be able to support the same thing?

References

Condensed ver: https://blog.acolyer.org/2014/11/26/the-log-structured-merge-tree-lsm-tree/ In-depth: https://www.cs.umb.edu/~poneil/lsmtree.pdf

See also

B-Tree vs LSM Tree: https://tikv.org/deep-dive/key-value-engine/b-tree-vs-lsm/ LSM Tree to PostgreSQL: https://www.youtube.com/watch?v=llbjTBgTPO4&ab_channel=PGCon SSTable