How do you construct an inverted index?

Text extraction -> Tokenizer -> Linguistic modules (NLP) -> Indexer
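Below is a minimal Python sketch of this pipeline. It assumes a small in-memory corpus keyed by document ID. The helper names (extract_text, tokenize, normalize, build_index), the tiny stopword list and the plural-stripping rule are hypothetical stand-ins, not a standard API. A real system would use a proper HTML/PDF parser for extraction and a stemmer or lemmatizer for the linguistic step.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list (assumption for illustration).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "is", "this"}


def extract_text(raw: str) -> str:
    """Text extraction: crudely strip HTML-like tags, leaving plain text."""
    return re.sub(r"<[^>]+>", " ", raw)


def tokenize(text: str) -> list[str]:
    """Tokenizer: split text into runs of word characters."""
    return re.findall(r"\w+", text)


def normalize(tokens: list[str]) -> list[str]:
    """Linguistic modules: case-fold, drop stopwords, strip a trailing plural 's'.

    The suffix rule is a crude stand-in for a real stemmer (assumption).
    """
    terms = []
    for tok in tokens:
        tok = tok.lower()
        if tok in STOPWORDS:
            continue
        if len(tok) > 3 and tok.endswith("s") and not tok.endswith("ss"):
            tok = tok[:-1]
        terms.append(tok)
    return terms


def build_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Indexer: map each term to a sorted postings list of document IDs."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, raw in docs.items():
        for term in normalize(tokenize(extract_text(raw))):
            postings[term].add(doc_id)
    # Sort terms (the dictionary) and each postings list by document ID.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


if __name__ == "__main__":
    docs = {
        1: "<p>The cats sat</p>",
        2: "A cat and a dog",
        3: "Dogs chase cats",
    }
    print(build_index(docs))
    # {'cat': [1, 2, 3], 'chase': [3], 'dog': [2, 3], 'sat': [1]}
```

Postings lists are kept sorted by document ID so that an AND query can intersect two lists with a single linear merge.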