Relevance feedback formulae for IR
Pseudo relevance feedback
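Assume that the top K retrieved documents are relevant (pseudo relevance feedback), or be given K relevant documents (explicit feedback).
Compute the centroids of the relevant and non-relevant documents: Cr, Cnr.
Compute the cosine similarity of q with Cr and with Cnr, giving qr and qnr.
Find the q for which qr - qnr is maximized.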
This is our new query.
results from this.
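The steps above can be sketched as follows. This is a minimal sketch, not a definitive implementation: it assumes the query and documents are already dense weight vectors of equal length, and the function names (centroid, feedback_objective, feedback_query) are hypothetical. It relies on the observation that qr - qnr equals unit_vec(q) · (unit_vec(Cr) - unit_vec(Cnr)), so any q pointing in the direction of that difference maximizes it.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def unit_vec(v):
    """Scale v to length 1; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(a, b):
    """Cosine similarity as the dot product of the two unit vectors."""
    return sum(x * y for x, y in zip(unit_vec(a), unit_vec(b)))

def feedback_objective(q, relevant, non_relevant):
    """qr - qnr: similarity to the relevant centroid minus the non-relevant one."""
    cr = centroid(relevant)
    cnr = centroid(non_relevant)
    return cosine(q, cr) - cosine(q, cnr)

def feedback_query(relevant, non_relevant):
    """Unit vector along unit_vec(Cr) - unit_vec(Cnr), which maximizes qr - qnr."""
    cr = unit_vec(centroid(relevant))
    cnr = unit_vec(centroid(non_relevant))
    return unit_vec([r - n for r, n in zip(cr, cnr)])

relevant = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
non_relevant = [[0.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
new_q = feedback_query(relevant, non_relevant)
print(new_q, feedback_objective(new_q, relevant, non_relevant))
```

The resulting vector can contain negative components (terms that are more typical of the non-relevant centroid); in practice such weights are commonly clipped to 0 before the new query is run.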
Recap:
We are given a query as a list of words.
We calculate a query vector, q, from this list of words.
We are given a list of documents; for each one we calculate a document vector, d.
For each d, we compute the cosine similarity with q as follows:
  q·d
-------
|q|·|d|
Which is equivalent to the dot product of the unit vectors: unit_vec(q) · unit_vec(d)
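The equivalence of the two forms can be checked numerically. The sketch below assumes q and d are non-zero dense vectors of equal length; the helper names (dot, norm, cosine_similarity, cosine_similarity_unit) are hypothetical, and unit_vec mirrors the name used in these notes.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    """Euclidean length |v|."""
    return math.sqrt(dot(v, v))

def unit_vec(v):
    """Scale v to length 1; a zero vector has no direction, so reject it."""
    n = norm(v)
    if n == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]

def cosine_similarity(q, d):
    """First form: q·d divided by |q|·|d| (assumes both vectors are non-zero)."""
    return dot(q, d) / (norm(q) * norm(d))

def cosine_similarity_unit(q, d):
    """Second form: dot product of the two unit vectors."""
    return dot(unit_vec(q), unit_vec(d))

q = [1.0, 2.0, 0.0]
d = [2.0, 1.0, 1.0]
print(cosine_similarity(q, d), cosine_similarity_unit(q, d))
```

Normalizing once and storing unit vectors means each later comparison is a plain dot product, which is why the weighting schemes in these notes end with a unit_vec step.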
- How do we calculate q from the list of words in a query (lnc.ltc)?
In lnc.ltc, the query side uses ltc: logarithmic tf, idf, cosine normalization.
Weight for a single word: (1 + log(tf)) * log(N / df)
After obtaining a vec of weights, vw, get unit_vec(vw)
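Recap:
- tf: term frequency, the number of times the term occurs in the query or document being weighted
- df: document frequency, the number of documents in the collection that contain the term (not to be confused with the term frequency or the collection frequency)
- N: the total number of documents in the collection (not the query size); log(N / df) is the idf weight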
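The query-side weighting can be sketched as below. This is a hedged illustration, not a reference implementation: the function name ltc_query_vector and its parameters (a df dictionary and n_docs for N) are hypothetical, the logarithm is assumed to be base 10, and query terms that never occur in the collection are simply skipped.

```python
import math
from collections import Counter

def ltc_query_vector(query_terms, df, n_docs):
    """Return a sparse unit vector {term: weight} for the query (ltc weighting)."""
    tf = Counter(query_terms)
    weights = {}
    for term, count in tf.items():
        if df.get(term, 0) == 0:
            continue  # assumption: ignore terms unseen in the collection
        log_tf = 1 + math.log10(count)          # l: logarithmic term frequency
        idf = math.log10(n_docs / df[term])     # t: inverse document frequency
        weights[term] = log_tf * idf
    # c: cosine normalization, i.e. unit_vec(vw)
    length = math.sqrt(sum(w * w for w in weights.values()))
    if length == 0:
        return weights
    return {term: w / length for term, w in weights.items()}

df = {"car": 10, "insurance": 100, "best": 1000}
print(ltc_query_vector(["best", "car", "insurance"], df, n_docs=1000))
```

In this example "best" occurs in every document, so its idf is log(1000 / 1000) = 0 and it contributes nothing to the query vector.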
- How do we calculate d from the list of words in a document (lnc.ltc)?
In lnc.ltc, the document side uses lnc: logarithmic tf, no idf, cosine normalization.
Weight for a single word: 1 + log(tf)
After obtaining a vec of weights, vw, get unit_vec(vw)
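The document-side weighting follows the same pattern without the idf factor. The sketch below is illustrative only: lnc_document_vector and score are hypothetical names, the logarithm is assumed to be base 10, and vectors are stored sparsely as {term: weight} dictionaries.

```python
import math
from collections import Counter

def lnc_document_vector(doc_terms):
    """Return a sparse unit vector {term: weight} for a document (lnc weighting)."""
    tf = Counter(doc_terms)
    # l: logarithmic term frequency; n: no idf factor
    weights = {term: 1 + math.log10(count) for term, count in tf.items()}
    # c: cosine normalization, i.e. unit_vec(vw)
    length = math.sqrt(sum(w * w for w in weights.values()))
    if length == 0:
        return weights
    return {term: w / length for term, w in weights.items()}

def score(query_vec, doc_vec):
    """Cosine similarity of two sparse unit vectors: sum over shared terms."""
    return sum(w * doc_vec.get(term, 0.0) for term, w in query_vec.items())

doc = ["auto"] * 10 + ["car"]
d = lnc_document_vector(doc)
print(d, score({"car": 1.0}, d))
```

Because both vectors are already unit length, score only has to multiply the weights of terms that appear in both the query and the document.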