Index every consecutive pair of terms in the text as a phrase.

“Hi I am” -> “Hi I”, “I am”
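As a minimal sketch of the step above, the biwords of a text can be produced by pairing each term with its successor. The function name `biwords` and the whitespace tokenization are assumptions made for illustration only.

```python
def biwords(text):
    """Return every consecutive pair of terms in text as a phrase."""
    terms = text.split()  # assumption: terms are separated by whitespace
    # zip pairs each term with the one that follows it
    return [f"{a} {b}" for a, b in zip(terms, terms[1:])]

print(biwords("Hi I am"))  # ['Hi I', 'I am']
```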

For longer phrases, use a conjunctive (AND) search over the component biwords: “Hi I am” -> “Hi I” AND “I am”

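However, the conjunction can return false positives: a document may contain both biwords at different places without containing the full phrase. We therefore have to keep the document source and check that the terms actually appear next to each other, rather than at different places in the document.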
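The sketch below illustrates both steps under simple assumptions: an in-memory dictionary stands in for the biword index, terms are lowercased and split on whitespace, and the names `build_index`, `candidate_docs`, and `phrase_search` are hypothetical. The second sample document contains both biwords of the query but not the phrase itself, so it survives the AND step and is only rejected by the check against the stored source.

```python
from collections import defaultdict

def biwords(text):
    terms = text.lower().split()  # assumption: whitespace tokenization
    return list(zip(terms, terms[1:]))

def build_index(docs):
    """Map each biword to the set of doc ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for pair in biwords(text):
            index[pair].add(doc_id)
    return index

def candidate_docs(query, index):
    """Conjunction (AND) over the postings of every biword in the query."""
    pairs = biwords(query)
    if not pairs:
        return set()
    return set.intersection(*(index.get(p, set()) for p in pairs))

def phrase_search(query, index, docs):
    """Keep only candidates whose source text contains the terms side by side."""
    q = query.lower().split()
    n = len(q)
    hits = set()
    for doc_id in candidate_docs(query, index):
        terms = docs[doc_id].lower().split()
        if any(terms[i:i + n] == q for i in range(len(terms) - n + 1)):
            hits.add(doc_id)
    return hits

docs = {
    1: "hi I am here",
    2: "hi I left and now I am back",  # has "hi I" and "I am", but not "hi I am"
}
index = build_index(docs)
print(candidate_docs("hi I am", index))       # {1, 2}
print(phrase_search("hi I am", index, docs))  # {1}
```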

For extended biwords, treat any sequence of terms of the following form as a single phrase:

N X* N

Where N is a noun and X* means zero or more articles / prepositions.

“Catcher in the rye” becomes “catcher rye”. This is because we tag each term as either a noun or an article / preposition, then drop the articles and prepositions, preserving only the nouns.
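A rough sketch of the N X* N pattern follows. It assumes a tiny hand-written list of articles and prepositions (`FUNCTION_WORDS`, hypothetical) in place of a real part-of-speech tagger and treats every other term as a noun, which a real tagger would not do. Each term is reduced to a one-letter tag so the pattern can be matched with a regular expression.

```python
import re

# Assumption: this short list stands in for a part-of-speech tagger.
FUNCTION_WORDS = {"a", "an", "the", "in", "of", "on", "at", "to", "for", "by", "with"}

def tag(term):
    """Tag a term as X (article / preposition) or N (treated as a noun)."""
    return "X" if term in FUNCTION_WORDS else "N"

def extended_biwords(text):
    terms = text.lower().split()
    tags = "".join(tag(t) for t in terms)  # one tag character per term
    result = []
    # The lookahead lets matches overlap, so the noun that closes one
    # extended biword can also open the next one.
    for m in re.finditer(r"N(?=(X*N))", tags):
        first = terms[m.start()]
        last = terms[m.end(1) - 1]
        result.append(f"{first} {last}")
    return result

print(extended_biwords("Catcher in the rye"))  # ['catcher rye']
```

At query time the same reduction would be applied to the query phrase before looking it up in the extended biword index.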