© CvR. The Geometry of IR. Keith van Rijsbergen, Tampere, 15th August 2002 (lost in Hilbert space!)


© CvR 1

The Geometry of IR

Keith van Rijsbergen

Tampere, 15th August 2002 (lost in Hilbert space!)


“Unscripted” comments I

States
Observables
Measurement

=> Reality?
Projection Postulates
Cognitive State Changes


“Unscripted” comments II (quoting John von Neumann)

However, all quantum mechanical probabilities are defined by inner products of vectors. Essentially if a state of a system is given by one vector, the transition probability in another state is the inner product of the two which is the square of the angle between them. In other words, probability correspond precisely to introducing the angles geometrically. Furthermore, there is only one way to introduce it. The more so because in the quantum mechanical machinery the negation of a statement, so the negation of a statement which represented by a linear set of vectors, corresponds to the orthogonal complement of this linear space.

Unsolved problems in mathematics, typescript, September, 1954
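
As a minimal sketch of the quotation, the snippet below reads "the square of the angle" as the squared cosine of the angle between two unit vectors, which is how the later slide on probability as inner product writes it. The two vectors and the 60 degree angle are hypothetical values chosen only for illustration, and the space is assumed to be real and two-dimensional.

```python
import math

def inner(x, y):
    """Inner product <x|y> for real vectors given as lists."""
    return sum(a * b for a, b in zip(x, y))

def normalize(x):
    """Scale x to unit length."""
    n = math.sqrt(inner(x, x))
    return [a / n for a in x]

# Two hypothetical unit state vectors in the plane, 60 degrees apart.
theta = math.radians(60)
x = normalize([1.0, 0.0])
y = normalize([math.cos(theta), math.sin(theta)])

# Transition probability: squared inner product, i.e. cos^2(theta).
p = inner(x, y) ** 2

# The negation of "the state lies along x" is the orthogonal complement of
# the ray spanned by x; in the plane that is the ray along x_perp.
x_perp = [0.0, 1.0]
p_not = inner(x_perp, y) ** 2

print(round(p, 6), round(p_not, 6), round(p + p_not, 6))
```

The probabilities of a statement and of its negation add up to one because the ray and its orthogonal complement together span the whole plane.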


What is this talk about?

Not about quantum computation: see Nielsen and Chuang, CUP, 2000

Not about logic: see Engesser and Gabbay, AI, 2002

• History (von Neumann, Dirac, Schrödinger)
• Motivation (complementarity)
• Duality (Syntax/Semantics)
• Measurement (Incompatibility)
• Projections (subspaces)
• Probability (inner products)
• IR application (feedback, clusters, ostension)


Images not Text: how might that make a difference?

no visual keywords (yet)
- tf/idf issue

aboutness revisable (eg Maron)
relevance revisable (eg Goffman)
feedback requires salience
aboutness -> relevance -> aboutness


This is not new!

Goffman, 1969: ‘..that the relevance of the information from one document depends upon what is already known about the subject, and in turn affects the relevance of other documents subsequently examined.’

Maron: ‘Just because a document is about the subject sought by a patron, that fact does not imply that he would judge it relevant.’


Maron’s theory of indexing

…..in the case where the query consists of single term, call it B, the probability that a given document will be judged relevant by a patron submitting B is simply the ratio of the number of patrons who submit B as their query and judge that document as relevant, to the number of patrons who submit B as their search query
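
The ratio in the quotation can be sketched as a simple count. The patron records, the term "B" and the document identifiers below are hypothetical; each record is assumed to hold one patron's single-term query and the set of documents that patron judged relevant.

```python
# Hypothetical records: (query term submitted, documents judged relevant).
patrons = [
    ("B", {"d1", "d2"}),
    ("B", {"d1"}),
    ("B", set()),
    ("B", {"d2"}),
    ("C", {"d1"}),
]

def maron_probability(patrons, term, doc):
    """Share of the patrons submitting `term` who judged `doc` relevant."""
    submitted = [judged for query, judged in patrons if query == term]
    if not submitted:
        return None  # nobody submitted this term, so the ratio is undefined
    relevant = sum(1 for judged in submitted if doc in judged)
    return relevant / len(submitted)

p_d1 = maron_probability(patrons, "B", "d1")
p_d2 = maron_probability(patrons, "B", "d2")
print(p_d1, p_d2)
```

Two of the four patrons who submitted B judged d1 relevant, so the estimate for d1 is one half; the patron who submitted C does not enter the count.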


In 1949 D.M. MacKay wrote a paper ‘Quantal aspects of scientific information’, SER, vol 41, no. 314, in which he alluded to applying the quantum mechanics paradigm to IR


Expectation Catalogue

It (ψ-function) is now the means for predicting probability of measurement results. In it is embodied the momentarily-attained sum of theoretically based future expectation, somewhat as laid down in a catalogue. It is the relation-and-determinacy-bridge between measurements and measurements......It is, in principle, determined by a finite number of suitably chosen measurement on the object.....Thus the catalogue of expectations is initially compiled.

Schrödinger, 1935 & 1980


Hypotheses

Cluster Hypothesis: closely associated documents tend to be relevant to the same requests. (1971)

[co-ordination is positively correlated with external relevance, Jackson, 1969]

Association Hypothesis: If an index term is good at discriminating relevant from non-relevant documents then any closely associated index term is also likely to be good at this. (1979)

[co-occurrence of terms within documents is a suitable measure of similarity between terms, Jackson, 1971]


Navigation - Browsing

T-space

D-space


DUALITY

Direct file / Inverted file
State space / Space of projections

d’ = (x,y,z,u,v,w)
d” = (u,v,w,k,l,m)

[[u]] = {d’,d”}; [[x]] = {d’}; [[m]] = {d”}

Boolean Logic: [[ux]] = {d”}; [[xm]] = {d’,d”}
Quantum Logic: [[ux]] = same; [[xm]] = different
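
The direct file / inverted file duality for the two example documents can be sketched as below. The connectives in the Boolean line did not survive in this transcript, so the three set operations shown (union, intersection, difference) are illustrative assumptions rather than a reconstruction of the slide; the quantum-logic reading is not modelled here.

```python
from collections import defaultdict

# Direct file: each document lists its index terms (as on the slide).
direct_file = {
    "d'": ["x", "y", "z", "u", "v", "w"],
    "d''": ["u", "v", "w", "k", "l", "m"],
}

def invert(direct):
    """Build the inverted file: each term maps to the documents containing it."""
    inverted = defaultdict(set)
    for doc, terms in direct.items():
        for term in terms:
            inverted[term].add(doc)
    return dict(inverted)

inverted_file = invert(direct_file)

def extension(term):
    """[[term]]: the set of documents indexed by the term."""
    return inverted_file.get(term, set())

u_docs = extension("u")
x_docs = extension("x")
m_docs = extension("m")

# Boolean connectives as set operations (assumed for illustration).
x_or_m = x_docs | m_docs        # disjunction as union
u_and_x = u_docs & x_docs       # conjunction as intersection
u_and_not_x = u_docs - x_docs   # conjunction with a negation as difference

print(sorted(u_docs), sorted(x_or_m), sorted(u_and_x), sorted(u_and_not_x))
```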


The mathematics you need

Hilbert space (complex!!!)

• inner product $\langle x|y\rangle$
• norms $\|x\|^2 = \langle x|x\rangle$
• operator (linear) $\langle x|A|y\rangle$
• Hermitian $A^* = A$
• trace $\mathrm{tr}(A) = \sum_i a_{ii}$
• eigenvalues $Ax = \lambda x$
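
A small sketch of the items in this list on a two-dimensional complex space. The operator A and the vector x are hypothetical values, and the eigenvalue step uses the closed form for a 2x2 Hermitian matrix rather than a general solver.

```python
import math

def inner(x, y):
    """<x|y> on a complex space: the first argument is conjugated."""
    return sum(a.conjugate() * b for a, b in zip(x, y))

def apply(A, x):
    """Apply the linear operator A (list of rows) to the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def adjoint(A):
    """Conjugate transpose A*."""
    return [[A[j][i].conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def trace(A):
    """Sum of the diagonal entries a_ii."""
    return sum(A[i][i] for i in range(len(A)))

# Hypothetical 2x2 Hermitian operator and vector.
A = [[2 + 0j, 1 - 1j],
     [1 + 1j, 3 + 0j]]
x = [1 + 0j, 0 + 1j]

is_hermitian = adjoint(A) == A
norm_squared = inner(x, x).real        # ||x||^2 = <x|x>
expectation = inner(x, apply(A, x))    # <x|A|x>, real when A is Hermitian

# Eigenvalues of [[a, b], [conj(b), d]] in closed form.
a, b, d = A[0][0].real, A[0][1], A[1][1].real
mean = (a + d) / 2
radius = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
eigenvalues = (mean - radius, mean + radius)

print(is_hermitian, norm_squared, expectation, eigenvalues)
```

The eigenvalues of a Hermitian operator are real and sum to its trace, which is why Hermitian operators can stand for observables with real measurement outcomes.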


Crash course on Dirac notation

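$|x\rangle$ : vector (called ket)
$\langle x| = |x\rangle^*$ : functional (bra)

$\langle x|y\rangle$ = (row vector)(column vector) $= \sum_i x_i^* y_i$

$|x\rangle\langle y|$ : linear operator
$|x\rangle\langle x|$ : a projector onto ray x
$\mathrm{tr}(|x\rangle\langle y|) = \langle y|x\rangle$ (equal to $\langle x|y\rangle$ in a real space)

$I = \sum_i |i\rangle\langle i|$ : universal projector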
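
The notation can be exercised with plain lists of complex numbers. The kets x and y below are hypothetical; note that the trace of |x><y| works out as the sum of x_i times the conjugate of y_i, that is <y|x>, which coincides with <x|y> only when the components are real.

```python
def bra(x):
    """<x| : the conjugated components of the ket |x>."""
    return [a.conjugate() for a in x]

def braket(x, y):
    """<x|y> = sum_i conj(x_i) * y_i."""
    return sum(a * b for a, b in zip(bra(x), y))

def ketbra(x, y):
    """|x><y| as a matrix with entries x_i * conj(y_j)."""
    return [[a * b for b in bra(y)] for a in x]

def matmul(A, B):
    """Matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

s = 2 ** -0.5
x = [s + 0j, s * 1j]   # hypothetical unit ket
y = [0j, 1 + 0j]       # second basis ket

# |x><x| is a projector: applying it twice changes nothing.
P = ketbra(x, x)
PP = matmul(P, P)
idempotent = all(abs(PP[i][j] - P[i][j]) < 1e-12
                 for i in range(2) for j in range(2))

# Trace of the outer product.
trace_xy = trace(ketbra(x, y))

# Summing |i><i| over an orthonormal basis gives the identity.
basis = [[1 + 0j, 0j], [0j, 1 + 0j]]
identity = [[sum(ketbra(e, e)[i][j] for e in basis) for j in range(2)]
            for i in range(2)]

print(idempotent, trace_xy, braket(y, x), identity)
```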


Hierarchy of Projectors

$P_0 = 0$, $P_n = I$

$P_1 = |1\rangle\langle 1|$
$P_2 = |1\rangle\langle 1| + |2\rangle\langle 2|$
...
$P_n = |1\rangle\langle 1| + \dots + |n\rangle\langle n|$
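
A short sketch of the hierarchy, assuming the kets |1>, ..., |n> are the standard basis vectors of an n-dimensional real space (n = 4 is an arbitrary choice). It checks that the projectors are nested: the product of any two of them is the smaller one.

```python
def projector(k, n):
    """P_k = |1><1| + ... + |k><k| in the standard basis of an n-dim space."""
    return [[1 if i == j and i < k else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

n = 4
hierarchy = [projector(k, n) for k in range(n + 1)]

zero = hierarchy[0]       # empty sum: projects onto the zero subspace
identity = hierarchy[n]   # full sum: the identity

# Nesting: P_j P_k = P_min(j, k) for every pair in the hierarchy.
nested = all(
    matmul(hierarchy[j], hierarchy[k]) == hierarchy[min(j, k)]
    for j in range(n + 1) for k in range(n + 1)
)

# The trace of each projector is the dimension of its subspace.
ranks = [sum(P[i][i] for i in range(n)) for P in hierarchy]

print(nested, ranks)
```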


Summary

Relevance/Aboutness

Documents

Queries

Observables

Operators

State function

Operators can be applied to state function; and operators can be decomposed into projectors.

$A = \sum_i a_i P_i$
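
The decomposition into projectors can be sketched on a real two-dimensional space. The orthonormal basis (rotated by 30 degrees), the coefficients 2 and 5, and the state vector are hypothetical; the operator is built from its projectors and then applied to a state.

```python
import math

def outer(x, y):
    """|x><y| for real vectors."""
    return [[a * b for b in y] for a in x]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def apply(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

# Hypothetical orthonormal basis and coefficients a_i.
phi = math.radians(30)
e1 = [math.cos(phi), math.sin(phi)]
e2 = [-math.sin(phi), math.cos(phi)]
a1, a2 = 2.0, 5.0

P1, P2 = outer(e1, e1), outer(e2, e2)
A = add(scale(a1, P1), scale(a2, P2))   # A = a1 P1 + a2 P2

# Each basis vector is an eigenvector with its coefficient as eigenvalue.
Ae1 = apply(A, e1)

# Applied to a state x, the expected value is sum_i a_i |<e_i|x>|^2.
x = [1.0, 0.0]
expectation = inner(x, apply(A, x))
weighted = a1 * inner(e1, x) ** 2 + a2 * inner(e2, x) ** 2

print(Ae1, expectation, weighted)
```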


‘That is the relevance or irrelevance of a given retrieved document may affect the user’s current state of knowledge resulting in a change of the user’s information need, which may lead to a change of the user’s perception/interpretation of the subsequent retrieved documents….’ Borlund, 2000


[Figure: diagram of T and R nodes with Y/N branches]

Relevance/Aboutness is interaction/user dependent


probability as inner product

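$$
\begin{aligned}
|t\rangle\langle t|\,|r\rangle\langle r|\,|t\rangle\langle t|
&= |t\rangle\,\langle t|r\rangle\langle r|t\rangle\,\langle t| \\
&= \langle t|r\rangle\langle r|t\rangle\,|t\rangle\langle t| \\
&= |\langle t|r\rangle|^2\,|t\rangle\langle t| \\
&= \cos^2\theta\,|t\rangle\langle t| \quad \text{(in real Hilbert space)}
\end{aligned}
$$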
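
The identity can be checked numerically in a real two-dimensional space. The vectors t and r and the 40 degree angle between them are hypothetical; the sketch also shows that the two projectors do not commute, which is the incompatibility of the two measurements.

```python
import math

def projector(v):
    """|v><v| for a real unit vector v."""
    return [[a * b for b in v] for a in v]

def matmul(A, B):
    """Matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Hypothetical unit vectors with an angle theta between them.
theta = math.radians(40)
t = [1.0, 0.0]
r = [math.cos(theta), math.sin(theta)]

Pt, Pr = projector(t), projector(r)

# Left-hand side: |t><t| |r><r| |t><t|.
lhs = matmul(matmul(Pt, Pr), Pt)

# Right-hand side: cos^2(theta) |t><t|.
weight = math.cos(theta) ** 2
rhs = [[weight * entry for entry in row] for row in Pt]

identity_holds = all(abs(lhs[i][j] - rhs[i][j]) < 1e-12
                     for i in range(2) for j in range(2))

# Order matters: Pt Pr differs from Pr Pt unless theta is 0 or 90 degrees.
commute = matmul(Pt, Pr) == matmul(Pr, Pt)

print(identity_holds, commute)
```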


[Figure: vector x drawn against the basis vectors |t=1>, |t=0> and |r=1>, |r=0>]


An operator T is of trace-class provided that T is positive ($\langle x|T|x\rangle \ge 0$, $\forall x$) and trace of T is finite ($\sum_i \langle e_i|T|e_i\rangle < \infty$)

T is a density operator if T is trace-class and $\mathrm{tr}(T) = 1$

$T = \sum_i a_i P_i$ is a density operator if $0 \le a_i$ and $\sum_i a_i = 1$
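
A minimal sketch of the last statement, assuming a three-dimensional real space, the standard basis for the projectors, and hypothetical weights 0.5, 0.3 and 0.2. It builds the mixture and checks the trace and the positivity condition on one test vector.

```python
def projector(v):
    """|v><v| for a real unit vector v."""
    return [[a * b for b in v] for a in v]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def mixture(weights, projectors):
    """T = sum_i a_i P_i."""
    n = len(projectors[0])
    return [[sum(w * P[i][j] for w, P in zip(weights, projectors))
             for j in range(n)] for i in range(n)]

def is_density_mixture(weights, tol=1e-12):
    """Check 0 <= a_i and sum_i a_i = 1."""
    return all(w >= 0 for w in weights) and abs(sum(weights) - 1) < tol

def quadratic_form(T, x):
    """<x|T|x> for a real vector x."""
    Tx = [sum(a * b for a, b in zip(row, x)) for row in T]
    return sum(a * b for a, b in zip(x, Tx))

# Hypothetical weights over the three standard basis projectors.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
projectors = [projector(e) for e in basis]
weights = [0.5, 0.3, 0.2]

T = mixture(weights, projectors)

print(is_density_mixture(weights), trace(T), quadratic_form(T, [1.0, -2.0, 0.5]))
```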


Theorem

Let $\mu$ be any measure on the closed subspaces of a separable (real or complex) Hilbert space H of dimension at least 3. There exists a positive self-adjoint operator T of trace class such that, for all closed subspaces L of H,

$\mu(L) = \mathrm{Tr}(T P_L)$

If $\mu$ is to be a probability measure, thus requiring that $\mu(H) = 1$, then $\mathrm{Tr}(T) = 1$, that is, T is a density operator.
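
The trace formula of the theorem can be illustrated in one direction only: given a density operator, it yields a probability for every subspace. The density operator below (an equal mixture of two pure states in a three-dimensional real space) and the chosen subspaces are hypothetical.

```python
import math

def projector(v):
    """|v><v| for a real unit vector v."""
    return [[a * b for b in v] for a in v]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def subspace_projector(vectors):
    """P_L for the subspace spanned by the given orthonormal vectors."""
    P = projector(vectors[0])
    for v in vectors[1:]:
        P = add(P, projector(v))
    return P

def measure(T, P_L):
    """The probability assigned to L: Tr(T P_L)."""
    return trace(matmul(T, P_L))

e1, e2, e3 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
s = 1 / math.sqrt(3)
x = [s, s, s]

# Hypothetical density operator: equal mixture of |x><x| and |e3><e3|.
T = add(scale(0.5, projector(x)), scale(0.5, projector(e3)))

mu_L = measure(T, subspace_projector([e1, e2]))      # a plane
mu_L_perp = measure(T, subspace_projector([e3]))     # its orthogonal complement
mu_H = measure(T, subspace_projector([e1, e2, e3]))  # the whole space

print(mu_L, mu_L_perp, mu_H)
```

The probabilities of a subspace and of its orthogonal complement add up to the probability of the whole space, which is one because the trace of T is one.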


Conditional Probability

$P(L_A \mid L_B) = \mathrm{tr}(P_B D P_B P_A) \,/\, \mathrm{tr}(D P_B)$

Note that PA could be E -> F
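
The conditional probability formula can be evaluated directly with small matrices. This sketch assumes a real two-dimensional space, a pure-state density operator D, and rank-one projectors for both subspaces; the angles are hypothetical. Under those assumptions the result reduces to the squared cosine of the angle between the two rays.

```python
import math

def unit(angle_degrees):
    """Unit vector in the plane at the given angle."""
    angle = math.radians(angle_degrees)
    return [math.cos(angle), math.sin(angle)]

def projector(v):
    """|v><v| for a real unit vector v."""
    return [[a * b for b in v] for a in v]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

# Hypothetical state and the two rays L_A and L_B.
x, b, a = unit(20), unit(0), unit(40)
D = projector(x)     # pure-state density operator
PB = projector(b)
PA = projector(a)

# P(L_A | L_B) = tr(P_B D P_B P_A) / tr(D P_B)
numerator = trace(matmul(matmul(matmul(PB, D), PB), PA))
denominator = trace(matmul(D, PB))
conditional = numerator / denominator

print(conditional, math.cos(math.radians(40)) ** 2)
```

Conditioning on a single ray first moves the state onto that ray, so the original state drops out of the result as long as the denominator is not zero.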


What is T? – without blinding you with science

- Relevance feedback (a mixture with log weights)
- Pseudo relevance feedback (a mixture with similarity weights)
- Clustering (superposition of members?)
- Ostension (a history)


Conclusions?

Is it worth it? Does it matter?

- images
- logic/probability/information/vectors
- language


Useful References

Readings in Information Retrieval, Morgan Kaufmann, edited by Sparck Jones and Willett.

Advances in Information Retrieval: Recent Research from CIIR, edited by Bruce Croft.

Information Retrieval: Uncertainty and Logics, Advanced Models for the Representation and Retrieval of Information, edited by Crestani, Lalmas, Van Rijsbergen.

Finding Out About, Richard Belew.