# Bigram

Index every consecutive pair of terms in the text as a phrase.

“Hi I am” -> “Hi I”, “I am”

## For longer phrases

use conjunction search for “Hi I am” -> “Hi I” AND “I am”

We have to maintain doc source however, to make sure these terms actually appear next to each other, rather then at different places in the document.

## For extended biwords

Form:

N X* N

Where N is a noun X* means that there is one or more articles / prepositions

Catcher in the rye becomes catcher rye. This is because we segment out nouns and articles, only preserving nouns.