Relevance feedback formulae for IR

Pseudo relevance feedback

  1. Assume / be given K relevant documents.

  2. Compute Centroid for relevant & non relevant docs: Cr, Cnr.

  3. Compute cosine similarity for Cr, Cnr and q => qr, qnr

  4. Find the q where qr - qnr is maximized.

  5. This is our new query.

  6. results from this.

Recap:

  1. We are given a query as a list of words.

  2. We calculate a query vector, q from this list of words

  3. We are given a list of documents, for each we calculate document vector, d

  4. for each d, we compute the cosine similarity with q as follows:

  q·d
-------
|q|·|d|

Which is equivalent to: unit_vec(q) * unit_vec(d)

  1. How do we calculate q from list of words in a query (1nc.ltc)?

Weight for a single word: (1 + log(tf)) * log(N / df)

After obtaining a vec of weights, vw, get unit_vec(vw)

recap:

  1. How do we calculate d from list of words in a document (1nc.ltc)?

Weight for a single word: 1 + log(tf)

After obtaining a vec of weights, vw, get unit_vec(vw)