How are documents represented?

They are vectors.

Terms are the axes of the vector space.

The tf-idf weight of each term in a document is that document's coordinate along the term's axis.

The vectors are high-dimensional: there is one dimension for every distinct term in the vocabulary.
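
As a rough illustration of the representation described above, here is a minimal sketch that builds tf-idf vectors for a toy collection. The function name `tfidf_vectors`, the whitespace tokenization, and the particular weighting (raw term count times log(N / df)) are assumptions made for this example; other tf-idf variants are common.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocab, vectors) for a list of document strings.

    Assumed weighting: raw term frequency * log(N / document frequency).
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    # Each distinct term is one axis of the vector space.
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # One coordinate per vocabulary term; absent terms get weight 0.
        vectors.append([tf[term] * math.log(n_docs / df[term]) for term in vocab])
    return vocab, vectors

docs = ["the cat sat", "the dog sat down", "a cat and a dog"]
vocab, vectors = tfidf_vectors(docs)
print(len(vocab))  # number of dimensions = number of distinct terms
```

Even this three-document collection needs one dimension per distinct term, and most coordinates of each vector are zero. With a realistic vocabulary the number of dimensions grows accordingly.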

A drawback of this representation is that it cannot capture the order of words in a document, since the axes themselves have no inherent ordering.
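
The loss of word order can be seen in a small sketch. The helper `term_count_vector` is a hypothetical name, and raw term counts stand in for tf-idf weights here; since tf-idf weights are computed from the same counts, the same effect applies.

```python
from collections import Counter

def term_count_vector(text, vocab):
    """Map a document to its vector of term counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocab]

a = "dog bites man"
b = "man bites dog"

# The axes are the distinct terms; sorting is only a fixed labeling of the axes.
vocab = sorted(set(a.split()) | set(b.split()))

va = term_count_vector(a, vocab)
vb = term_count_vector(b, vocab)
print(va == vb)  # True: both sentences map to the same vector
```

The two sentences mean different things, yet they map to the same point in the vector space.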