LSM Tree

Motivations

Doing lots of disk writes in expensive (disk i/o). We batch writes by using in memory buffers to accumulate, and write in one-shot.

Has two components:

We merge in memory segments into disk.

Make inserts and deletes cheap (disk writes).

LSM Trees cascade data from incoming stream into smaller, higher performing stores.

This is heart of LSM algorithm. It migrate entries from C0 to C1, C1 to C2, …, to free buffer space.

Removes a contiguous segment of entries from c0 to c1.

C0 - Memtable C1 Tree - SSTable.

Helps with compaction.

Store instructions in log, can use for recovery.

Disk i/o not cheap.

Will B+Tree augmented with memory be able to support the same thing?