Zipf’s law

$$ cf_i \propto \frac{1}{i}, \quad \text{i.e.} \quad cf_i = \frac{K}{i} $$

where

\(cf_i\) is the collection frequency: the number of occurrences of the term \(t_i\) in the collection, where \(t_i\) is the \(i\)-th most frequent term and \(K\) is a constant.

We want to know how the relative frequencies of terms are distributed in a collection (as opposed to the size of the vocabulary).

Implications

The most frequent term occurs \(cf_1\) times.

The second most frequent term occurs \(cf_1/2\) times.
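
The pattern continues down the ranks: under the law, the term at rank \(i\) occurs about \(cf_1/i\) times, since setting \(i = 1\) gives \(K = cf_1\). A minimal sketch of this, assuming an idealized Zipfian collection; the function name `zipf_predicted_frequencies` and the value 60000 for \(cf_1\) are made up for illustration and do not come from any real collection:

```python
def zipf_predicted_frequencies(cf1, n_terms):
    """Predicted collection frequencies for ranks 1..n_terms.

    Assumes cf_i = K / i with K = cf1 (the frequency of the top-ranked term).
    """
    return [cf1 / rank for rank in range(1, n_terms + 1)]


# Hypothetical top-term frequency, chosen only to make the arithmetic easy.
predicted = zipf_predicted_frequencies(cf1=60000, n_terms=4)
print(predicted)  # [60000.0, 30000.0, 20000.0, 15000.0]
```

Real collections follow this only approximately, so the output is best read as the shape the law predicts rather than as exact counts.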