Information Distance: More Applications

Apr 01, 2015

Transcript
Page 1:

Information Distance

More Applications

Page 2:

1. Information Distance from a Question to an Answer

Page 3:

Question & Answer

Practical concerns:
- Partial matching does not satisfy the triangle inequality.
- When x is very popular and y is not, x contains a lot of information that is irrelevant w.r.t. y; then C(x|y) << C(y|x), and d(x,y) prefers y.
- dmax does not satisfy universality.
- Neighborhood density: some answers are much more popular than others.
- There is nothing to compress: a question and an answer are both short.

Page 4:

Partial matching

The triangle inequality does not hold:

d(man, horse) > d(man, centaur) + d(centaur, horse)

A centaur partially matches both a man and a horse, so the two distances on the right are small, yet a man and a horse remain far apart.

Page 5:

Separate Irrelevant Information

In the max theory, we wanted the smallest program p converting between x and y. Now let's remove the irrelevant information from p: replace p by a smaller program q plus side information s and t, and minimize |q| + |s| + |t|.

[Diagram: x converted to y by program p (max theory); x converted to y by the smaller program q with side information s and t (min theory).]

Page 6:

The Min Theory (Li, Int'l J. TCS, 2007; Zhang et al., KDD'2007)

Emin(x,y) = the smallest program p needed to convert between x and y, keeping irrelevant information out of p.

Formally: Emin(x,y) = min{ |p| : U(x,p,r) = y, U(y,p,q) = x, |p| + |q| + |r| ≤ E(x,y) }

All other development is similar to E(x,y). Define:

dmin(x,y) = min{C(x|y), C(y|x)} / min{C(x), C(y)}

Fundamental Theorem II: Emin(x,y) = min{C(x|y), C(y|x)}

Page 7:

Other properties

Theorem 1. dmin(x,y) ≤ dmax(x,y).

Theorem 2. dmin(x,y) is universal; does not satisfy the triangle inequality; is symmetric; and has the required density properties: good guys have more neighbors.

Page 8:

How to approximate dmax(x,y), dmin(x,y)

Each term C(x|y) may be approximated by one of the following:

1. Compression (see the sketch below).
2. Shannon-Fano code (Cilibrasi, Vitanyi): an object with probability p may be encoded in −log p + 1 bits.
3. Mixed usage of (1) and (2). This is especially useful for question-answer systems.
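For route (1), a minimal sketch, assuming zlib as the compressor and the standard proxy C(x|y) ≈ C(yx) − C(y); both choices are mine, not fixed by the slides:

```python
import zlib

def C(data: bytes) -> int:
    # Approximate Kolmogorov complexity by compressed length; any real
    # compressor gives only an upper bound on C.
    return len(zlib.compress(data, 9))

def C_cond(x: bytes, y: bytes) -> int:
    # Approximate C(x|y) via C(yx) - C(y), a common compression proxy.
    return max(C(y + x) - C(y), 0)

def d_max(x: bytes, y: bytes) -> float:
    # Normalized max distance: max{C(x|y), C(y|x)} / max{C(x), C(y)}.
    return max(C_cond(x, y), C_cond(y, x)) / max(C(x), C(y))

def d_min(x: bytes, y: bytes) -> float:
    # Normalized min distance: min{C(x|y), C(y|x)} / min{C(x), C(y)}.
    return min(C_cond(x, y), C_cond(y, x)) / min(C(x), C(y))
```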

Page 9:

Shannon-Fano Code

Consider n symbols 1, 2, …, n, with decreasing probabilities p1 ≥ p2 ≥ … ≥ pn. Let Pr = ∑i=1..r pi. The binary code E(r) for r is obtained by truncating the binary expansion of Pr at length |E(r)| such that

−log pr ≤ |E(r)| < −log pr + 1

Highly probable symbols are mapped to shorter codes, and

2^(−|E(r)|) ≤ pr < 2^(−|E(r)|+1)

Near optimal: let H = −∑r pr log pr, the entropy, a lower bound on the average number of bits needed to encode 1…n. The average code length satisfies

H = −∑r pr log pr ≤ ∑r pr |E(r)| < ∑r pr (−log pr + 1) = H + 1
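A small sketch of the construction; my reading follows the classical Shannon code, where each symbol gets the first ⌈−log2 p⌉ bits of the cumulative probability of the symbols before it:

```python
import math

def shannon_fano_codes(probs):
    # Codes from truncated binary expansions of cumulative probabilities.
    probs = sorted(probs, reverse=True)  # p1 >= p2 >= ... >= pn
    codes, cum = [], 0.0
    for p in probs:
        length = math.ceil(-math.log2(p))  # -log p <= |E(r)| < -log p + 1
        frac, bits = cum, []
        for _ in range(length):            # truncate the binary expansion
            frac *= 2
            bit = int(frac)
            bits.append(str(bit))
            frac -= bit
        codes.append((p, "".join(bits)))
        cum += p
    return codes

print(shannon_fano_codes([0.5, 0.25, 0.125, 0.125]))
# -> [(0.5, '0'), (0.25, '10'), (0.125, '110'), (0.125, '111')]
```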

Page 10:

Query-Answer System (X. Zhang, Y. Hao, X. Zhu, M. Li, KDD'2007)

Adding conditions to the normalized information distance, we built a query-answer system. The information distance naturally measures:
- good pattern matches, via compression;
- frequently occurring items, via the Shannon-Fano code;
- mixed usage of the above two.

Page 11:

Some comparisons

[Bar chart: mean RAR and top-1 hit rate for QUANTA, AskED, NSIR, DFKI, Start, ASU, BrainBoost, Ask, LCC, Google, Yahoo, and MSN.]

1. The benchmark is a common factoid QA test set of 109 questions provided by the Text REtrieval Conference (TREC), sponsored by the National Institute of Standards and Technology (NIST).

2. MRAR = (∑i C(i)/i)/N = (C1 + C2*0.5 + C3*0.33 + C4*0.25 + C5*0.2)/109, where Ci is the number of questions whose first correct answer appears at rank i.
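A tiny worked sketch of the MRAR formula; the rank counts below are hypothetical, purely for illustration:

```python
def mrar(rank_counts: dict, n_questions: int) -> float:
    # Each question contributes 1/rank of its first correct answer
    # (and 0 if no correct answer appears in the top ranks).
    return sum(count / rank for rank, count in rank_counts.items()) / n_questions

# Hypothetical: 60 questions answered at rank 1, 20 at rank 2, etc.
print(mrar({1: 60, 2: 20, 3: 10, 4: 5, 5: 3}, 109))  # ~0.69
```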

Page 12:

2. Parameter-Free Data Mining (Keogh, Lonardi, Ratanamahatana, KDD'04)

Most data mining algorithms require setting many input parameters. Parameter-laden methods carry two dangers: wrong parameter settings lead to errors, and overfitting causes further problems. Ideally, data mining algorithms should have no parameters, hence no prejudices, expectations, or presumptions.

They compared (a variant of) information distance with every time-series distance measure (51 of them) that appeared in SIGKDD, SIGMOD, ICDM, ICDE, VLDB, ICML, SSDB, PKDD, and PAKDD during the previous decade; see the sketch below.
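The variant in that paper is the compression-based dissimilarity measure (CDM); a minimal sketch, with zlib standing in for whichever compressor one prefers:

```python
import zlib

def cdm(x: bytes, y: bytes) -> float:
    # CDM(x, y) = C(xy) / (C(x) + C(y)): close to 1/2 for near-identical
    # sequences, close to 1 for unrelated ones.
    c = lambda s: len(zlib.compress(s, 9))
    return c(x + y) / (c(x) + c(y))
```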

Page 13:

Experiment on 18 pairs of time series (length 1,000 each):

Q = correct / total

- Information distance: Q = 1
- 3/4 of the other measures: Q = 0
- The best of them (HMM): Q = 0.33
- Data cover finance, science, medicine, and industry
- All data are in SAX format

Page 14:

Anomaly detection (KLR, KDD'04). Algorithm: use divide and conquer to find the region that has the largest information distance from the other parts.

* All other methods produced wrong results, i.e., different from the cardiologists' judgment.
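One plausible reading of the divide-and-conquer step, sketched below; the recursion scheme and the use of CDM to compare a half against the remainder are my assumptions, not the authors' exact procedure:

```python
import zlib

def cdm(x: bytes, y: bytes) -> float:
    c = lambda s: len(zlib.compress(s, 9))
    return c(x + y) / (c(x) + c(y))

def find_anomaly(series: bytes, lo: int, hi: int, min_len: int = 32):
    # Recursively narrow to the half that looks least like the rest of
    # the series, i.e. has the largest compression distance to it.
    if hi - lo <= min_len:
        return lo, hi
    mid = (lo + hi) // 2
    left, right = series[lo:mid], series[mid:hi]
    rest_wo_left = series[:lo] + series[mid:]
    rest_wo_right = series[:mid] + series[hi:]
    if cdm(left, rest_wo_left) > cdm(right, rest_wo_right):
        return find_anomaly(series, lo, mid, min_len)
    return find_anomaly(series, mid, hi, min_len)
```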

Page 15:

Anomaly Detection (KLR, KDD'04)

Page 16:

3. Identifying Multiword Expressions (Fan Bu, Tsinghua University)

Multiword expressions (MWEs) appear frequently, and often ungrammatically, in English. An MWE is a sequence of neighboring words whose meaning cannot be derived from the meanings of its components. Example: "Kolmogorov complexity".

Automatically identifying MWEs is a major challenge in computational linguistics.

Page 17:

Distance from an n-gram to its semantics

Given an n-gram, define the "semantics" of the n-gram to be the set of all web pages containing all the words of the n-gram. The (plain) information distance then simplifies to:

Dmax(n-gram g, its semantics) = log(#pages(g's semantics) / #pages(g))
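A minimal sketch of this score; the page_count function is hypothetical, standing in for whatever search API supplies document counts:

```python
import math

def mwe_score(ngram: str, page_count) -> float:
    # Pages containing the exact phrase are a subset of pages containing
    # all of its words, so the ratio is >= 1 and the score >= 0.
    # Non-compositional n-grams like "Kolmogorov complexity" score low
    # (their words mostly co-occur as the phrase); compositional phrases
    # score high.
    phrase_pages = page_count(f'"{ngram}"')  # exact-phrase query
    semantics_pages = page_count(ngram)      # all-words query
    return math.log2(semantics_pages / phrase_pages)
```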

Page 18:

Experiment on 1,529 idioms and 1,798 compositional phrases

Page 19:

Complex named entity extraction:

Page 20:

4. Texture classification (Campana & Keogh, 2010)

Why are we interested in this?

Page 21:

Their method

They used, as information distance:

d(x,y) = [K(x|y) + K(y|x)] / K(xy)

Images x, y are in MPEG-1 format. To approximate K(x|y), they created a movie of the pair of frames x and y and applied MPEG video compression (lossy).
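As a rough stand-in, the same formula with a generic lossless compressor over raw image bytes; zlib here is my substitution, not the MPEG-1 codec the authors actually used:

```python
import zlib

def c(s: bytes) -> int:
    return len(zlib.compress(s, 9))

def texture_distance(x: bytes, y: bytes) -> float:
    # d(x, y) = [K(x|y) + K(y|x)] / K(xy), with K(a|b) ~ C(ba) - C(b).
    # The paper instead compresses a two-frame movie of x and y.
    kxy = max(c(y + x) - c(y), 0)  # K(x|y) proxy
    kyx = max(c(x + y) - c(x), 0)  # K(y|x) proxy
    return (kxy + kyx) / c(x + y)
```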

Page 22:

Note that the algorithm has no access to color or shape information; this clustering is based only on texture.

Dictionnaire D'Histoire Naturelle by Charles Orbigny, 1849.

Page 23:

Ornaments from the hand-press period. The particularity of this period is the use of wooden blocks to print ornaments in books. Specialist historians want to record individual instances of ornament occurrences and to identify the individual blocks. This identification work is very useful for dating books and for authenticating the output of certain printing houses and authors. Indeed, numerous editions published in past centuries do not reveal their true origin on the title page. Blocks could be reused to print several books, exchanged between printing houses, or duplicated in case of damage. (Mathieu Delalandre)

The algorithm can handle very subtle differences.

Page 24:

[Figure: Clovis and Egyptian examples.]

Page 25:

Page 26:

Page 27:

5. Gene Expression Dynamics (Nykter et al., PNAS, 2008)

Macrophages (white blood cells): 94 microarrays, 9,941 differentially expressed genes. They computed the normalized information distance between each pair of time-point measurements and observed that the underlying dynamical network of the macrophage exhibits criticality.

Somebody figure out what this is about and present it in class.

Page 28:

6. Tree from metabolic networks (Nykter et al., Phys. Rev. Lett., 2008)

Metabolic networks of 107 organisms from KEGG. The normalized information distance was computed between each pair of networks, and a tree was generated; a sketch of the pipeline follows.
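A generic sketch of that pipeline (pairwise compression distances, then hierarchical clustering); the byte encoding of each network, zlib, and average linkage are all my stand-ins for the authors' actual choices:

```python
import zlib
from itertools import combinations
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def ncd(x: bytes, y: bytes) -> float:
    # Normalized compression distance, a computable proxy for the
    # normalized information distance.
    cx, cy = len(zlib.compress(x, 9)), len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)

def tree_from_networks(encoded_networks):
    # Fill a symmetric distance matrix, then build a tree from it.
    n = len(encoded_networks)
    d = np.zeros((n, n))
    for i, j in combinations(range(n), 2):
        d[i, j] = d[j, i] = ncd(encoded_networks[i], encoded_networks[j])
    return linkage(squareform(d, checks=False), method="average")
```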

Page 29:

[Tree figure. Red: bacteria; blue: archaea; green: eukaryotes.]

Page 30:

Phylogenetic Compression (Ané & Sanderson, Syst. Biol., 2005)

Several interesting questions from bioinformatics: Is the parsimony tree the best tree? Can DNA sequence compression from the sequence alone be optimal?

The authors propose a scheme that encodes the tree and the data simultaneously, minimizing the descriptive complexity of the tree(s) plus the data. This yields better compression and is more economical than the parsimony tree.