Zipf’s law

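$$ cf_i \propto \frac{1}{i}, \quad \text{i.e.} \quad cf_i = \frac{K}{i} $$

where

$cf_i$ is the collection frequency, the number of occurrences of the term $t_i$ in the collection, $i$ is the rank of $t_i$ when terms are sorted by decreasing collection frequency, and $K$ is a constant. Setting $i = 1$ gives $K = cf_1$.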
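To make the ranking concrete, here is a minimal Python sketch that counts collection frequencies and orders terms by rank. The function name `ranked_collection_frequencies` and the toy token list are illustrative assumptions, not part of the original notes, and whitespace splitting stands in for real tokenization.

```python
from collections import Counter

def ranked_collection_frequencies(tokens):
    """Return (term, cf) pairs sorted by decreasing collection frequency.

    The position in the returned list (1-based) is the rank i of the term.
    """
    counts = Counter(tokens)
    # most_common() orders by count, highest first; ties keep first-seen order.
    return counts.most_common()

# Toy collection (hypothetical data, for illustration only).
tokens = "the cat and the dog and the bird".split()
ranked = ranked_collection_frequencies(tokens)

for rank, (term, cf) in enumerate(ranked, start=1):
    print(rank, term, cf)
```

On a collection this small the counts will not follow Zipf's law closely; the sketch only shows how $i$ and $cf_i$ are obtained.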

Zipf's law describes the relative frequencies of terms in a collection, that is, how occurrences are distributed across terms. It says nothing about the size of the vocabulary.

Implications

The most frequent term occurs $cf_1$ times.

The second most frequent term occurs $cf_1/2$ times.
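The same pattern continues down the ranking: under $cf_i = K/i$ with $K = cf_1$, the term at rank $i$ is predicted to occur $cf_1/i$ times. A small Python sketch of that arithmetic follows. The function name `zipf_predicted_cf` and the value of `cf1` are hypothetical, chosen only for illustration.

```python
def zipf_predicted_cf(cf1, rank):
    """Predicted collection frequency at `rank` under cf_i = K / i, with K = cf_1."""
    if rank < 1:
        raise ValueError("rank must be >= 1")
    return cf1 / rank

cf1 = 60000  # assumed count of the most frequent term (made-up number)
predictions = {rank: zipf_predicted_cf(cf1, rank) for rank in (1, 2, 3, 10)}

for rank, cf in predictions.items():
    print(f"rank {rank}: about {cf:.0f} occurrences")
```

These are model predictions, not measurements. Real collections typically follow the $1/i$ curve only approximately.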

2021-03-17