How does blocking increase compression?
Previously we had to store a term pointer for every dictionary entry in a table:
Freq | Postings ptr | Term ptr |
---|---|---|
33 | | |
29 | | |
44 | | |
126 | | |
Given a dictionary string of 3.2 MB, there are 3.2M positions to address. Addressing them takes about 22 bits (log2 of 3.2M), so each term pointer is 3 bytes.
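The pointer width above can be checked with a small calculation. This is only a sketch; `pointer_bytes` is a hypothetical helper name, not something from the notes.

```python
def pointer_bytes(num_positions: int) -> int:
    """Smallest whole number of bytes able to address num_positions positions."""
    # Positions are numbered 0 .. num_positions - 1, so the largest index
    # determines how many bits are needed.
    bits = max(1, (num_positions - 1).bit_length())
    return (bits + 7) // 8  # round up to whole bytes


print(pointer_bytes(3_200_000))
```

For 3.2M positions the largest index needs 22 bits, which rounds up to 3 bytes.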
With blocking, we store a term pointer only for the first term of each block of k terms.
The other terms in the block are located by relative offsets encoded in the string itself.
Since words are not very long, each of these offsets fits in 1 byte.
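With k = 4, each block drops 3 of its 4 term pointers, saving 3 × 3 = 9 bytes. Assuming one 1-byte offset is stored per term, the block pays back 4 bytes in the string, for a net saving of 5 bytes per block.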
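A minimal sketch of the blocked layout, under some assumptions: each term is stored as a 1-byte length followed by its characters, k = 4, and term pointers are 3 bytes. The function names and the sample terms are illustrative only.

```python
def build_blocked(terms, k=4):
    """Concatenate terms as <length byte><term>, keeping one pointer per block of k."""
    data = bytearray()
    block_ptrs = []
    for i, term in enumerate(terms):
        encoded = term.encode("utf-8")
        if len(encoded) > 255:
            raise ValueError("term too long for a 1-byte length")
        if i % k == 0:
            block_ptrs.append(len(data))  # pointer to the first term of the block
        data.append(len(encoded))         # 1-byte length stored in the string
        data.extend(encoded)
    return bytes(data), block_ptrs


def terms_in_block(data, block_ptrs, block, k=4):
    """Walk one block by hopping from length byte to length byte."""
    pos = block_ptrs[block]
    end = block_ptrs[block + 1] if block + 1 < len(block_ptrs) else len(data)
    out = []
    while pos < end and len(out) < k:
        n = data[pos]
        out.append(data[pos + 1 : pos + 1 + n].decode("utf-8"))
        pos += 1 + n
    return out


def net_bytes_saved_per_block(k, ptr_bytes=3, len_bytes=1):
    """Bytes saved by dropping k - 1 pointers, minus the k length bytes added."""
    return (k - 1) * ptr_bytes - k * len_bytes


terms = ["systile", "syzygetic", "syzygial", "syzygy",
         "szaibelyite", "szczecin", "szomo"]
data, ptrs = build_blocked(terms, k=4)
print(ptrs)
print(terms_in_block(data, ptrs, 1))
print(net_bytes_saved_per_block(4))
```

Lookup does a binary search over the block pointers and then a short linear scan inside one block, so a larger k saves more space but makes each lookup slower.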