Paolo Ferragina, Università di Pisa
XML Compression and Indexing
Paolo Ferragina
Dipartimento di Informatica, Università di Pisa
[Joint with F. Luccio, G. Manzini, S. Muthukrishnan]
The Future of Web Search, Barcelona, May 2006
Under patenting by Pisa-Rutgers Univ.
Compressed Permuterm Index
Paolo Ferragina, Rossano Venturini
Dipartimento di Informatica, Università di Pisa
Under Y!-patenting
A basic problem
Given a dictionary D of strings, having variable length, design a compressed data structure that supports
1) string ↔ id
2) Prefix(α): find all strings in D that are prefixed by α
3) Suffix(β): find all strings in D that are suffixed by β
4) Substring(γ): find all strings in D that contain γ
5) PrefixSuffix(α,β) = Prefix(α) ∩ Suffix(β)
Tolerant Retrieval Problem (wildcards), from the IR book of Manning-Raghavan-Schütze
Prefix(α) = α*
Suffix(β) = *β
Substring(γ) = *γ*
PrefixSuffix(α,β) = α*β
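As a point of reference, the four pattern queries listed above can be written as brute-force scans over D. This is only a minimal, uncompressed sketch meant to pin down what each query returns; the function names, the argument names and the sample dictionary are illustrative assumptions, not part of the slides.

```python
from typing import List


def prefix(D: List[str], a: str) -> List[str]:
    """All strings in D that start with a (wildcard query a*)."""
    return [s for s in D if s.startswith(a)]


def suffix(D: List[str], b: str) -> List[str]:
    """All strings in D that end with b (wildcard query *b)."""
    return [s for s in D if s.endswith(b)]


def substring(D: List[str], g: str) -> List[str]:
    """All strings in D that contain g (wildcard query *g*)."""
    return [s for s in D if g in s]


def prefix_suffix(D: List[str], a: str, b: str) -> List[str]:
    """Intersection of prefix(a) and suffix(b), as defined on the slide.

    Note: a plain intersection does not check that a and b do not overlap
    inside a matching string.
    """
    ends_with_b = set(suffix(D, b))
    return [s for s in prefix(D, a) if s in ends_with_b]


# Hypothetical sample dictionary, for illustration only.
D = ["hello", "help", "shell", "yellow"]
print(prefix(D, "hel"))             # ['hello', 'help']
print(suffix(D, "ll"))              # ['shell']
print(substring(D, "ell"))          # ['hello', 'shell', 'yellow']
print(prefix_suffix(D, "he", "lo")) # ['hello']
```

Every query here costs time proportional to the total length of D, which is the cost a dedicated index is meant to avoid.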
Hashing
Supports only exact searches, so it cannot answer prefix, suffix or substring queries
(Compacted) Trie
Two versions: for D and for D^R (the reversed strings) + intersect the answers
No substring search (unless using a Suffix Trie)
Need to store D for resolving edge labels
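The two-trie approach can be sketched as follows. For brevity this uses a plain (uncompacted) dictionary-based trie rather than a compacted one, and stores the original string at each end node; the class name `Trie`, the helpers `build` and `prefix_suffix`, and the sample dictionary are assumptions made for illustration.

```python
from typing import List, Set, Tuple

END = ""  # end-of-string marker; cannot clash with single-character edge labels


class Trie:
    """Plain (uncompacted) trie mapping keys to the original strings."""

    def __init__(self) -> None:
        self.root: dict = {}

    def insert(self, key: str, value: str) -> None:
        node = self.root
        for ch in key:
            node = node.setdefault(ch, {})
        node.setdefault(END, set()).add(value)

    def with_prefix(self, p: str) -> Set[str]:
        """Values of all keys that start with p."""
        node = self.root
        for ch in p:
            if ch not in node:
                return set()
            node = node[ch]
        out: Set[str] = set()
        stack = [node]
        while stack:  # collect every string stored in the subtree
            cur = stack.pop()
            for label, child in cur.items():
                if label == END:
                    out |= child
                else:
                    stack.append(child)
        return out


def build(D: List[str]) -> Tuple[Trie, Trie]:
    """One trie over D and one over the reversed strings of D."""
    fwd, rev = Trie(), Trie()
    for s in D:
        fwd.insert(s, s)
        rev.insert(s[::-1], s)  # key is reversed, value is the original string
    return fwd, rev


def prefix_suffix(fwd: Trie, rev: Trie, a: str, b: str) -> Set[str]:
    """Prefix query on the forward trie, suffix query on the reversed trie, then intersect."""
    return fwd.with_prefix(a) & rev.with_prefix(b[::-1])


fwd, rev = build(["hello", "help", "shell", "yellow"])
print(sorted(fwd.with_prefix("hel")))             # ['hello', 'help']
print(sorted(rev.with_prefix("ll"[::-1])))        # ['shell']
print(sorted(prefix_suffix(fwd, rev, "he", "lo")))  # ['hello']
```

The intersection step can be expensive when one of the two partial answers is large, and neither trie helps with substring queries.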
Front coding...
Two versions: for D and for D^R + intersect the answers
Need some extra data structures for bucket identification
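A minimal sketch of bucketed front coding, assuming the sorted dictionary is cut into fixed-size buckets whose first string is stored in full, and that bucket identification is done by binary search over those first strings. The bucket size, the function names and the sample dictionary are illustrative assumptions; the slides do not specify them.

```python
from bisect import bisect_right
from typing import List, Tuple

Bucket = Tuple[str, List[Tuple[int, str]]]  # (head string, [(shared prefix length, remainder)])


def lcp(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def front_encode(sorted_words: List[str], bucket_size: int = 4) -> List[Bucket]:
    """Store each bucket head in full; store later strings as (lcp with previous, remainder)."""
    buckets: List[Bucket] = []
    for start in range(0, len(sorted_words), bucket_size):
        chunk = sorted_words[start:start + bucket_size]
        tail, prev = [], chunk[0]
        for w in chunk[1:]:
            k = lcp(prev, w)
            tail.append((k, w[k:]))
            prev = w
        buckets.append((chunk[0], tail))
    return buckets


def decode_bucket(bucket: Bucket) -> List[str]:
    head, tail = bucket
    words = [head]
    for k, rest in tail:
        words.append(words[-1][:k] + rest)
    return words


def prefix_search(buckets: List[Bucket], p: str) -> List[str]:
    """Binary-search the bucket heads, then decode only the buckets that can hold matches."""
    heads = [h for h, _ in buckets]
    i = max(bisect_right(heads, p) - 1, 0)  # last bucket whose head is <= p
    out: List[str] = []
    for b in buckets[i:]:
        if b[0][:len(p)] > p:  # this head is already past every string starting with p
            break
        out.extend(w for w in decode_bucket(b) if w.startswith(p))
    return out


words = sorted(["hello", "help", "helium", "shell", "yellow", "held", "hero"])
buckets = front_encode(words, bucket_size=3)
print(buckets[0])                     # ('held', [(3, 'ium'), (3, 'lo')])
print(prefix_search(buckets, "hel"))  # ['held', 'helium', 'hello', 'help']
```

Suffix queries would need a second front-coded copy built over the reversed strings, searched the same way with the reversed pattern.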