# Relevance feedback formulae for IR

## Pseudo relevance feedback

1. Assume / be given K relevant documents.

2. Compute Centroid for relevant & non relevant docs: Cr, Cnr.

3. Compute cosine similarity for Cr, Cnr and q => qr, qnr

4. Find the q where qr - qnr is maximized.

5. This is our new query.

6. results from this.

Recap:

1. We are given a query as a list of words.

2. We calculate a query vector, q from this list of words

3. We are given a list of documents, for each we calculate document vector, d

4. for each d, we compute the cosine similarity with q as follows:

  q·d
-------
|q|·|d|

Which is equivalent to: unit_vec(q) * unit_vec(d)

1. How do we calculate q from list of words in a query (1nc.ltc)?

Weight for a single word: (1 + log(tf)) * log(N / df)

After obtaining a vec of weights, vw, get unit_vec(vw)

recap:

• df: document freq,
• N: query size,
1. How do we calculate d from list of words in a document (1nc.ltc)?

Weight for a single word: 1 + log(tf)

After obtaining a vec of weights, vw, get unit_vec(vw)