Spring 2008 Progress Report SPR2008PR David Gleich and Ying Wang (with Margot Gerritsen and Amin Saberi too!) Library of Congress May 27th or May 28th

Dec 19, 2015

Transcript
Page 1

Spring 2008 Progress Report SPR2008PR

David Gleich and Ying Wang (with Margot Gerritsen and Amin Saberi too!)

Library of Congress May 27th or May 28th

Page 2

Alternate Titles

Why LCSH is better than Wikipedia

Matching stuff to fluff

A novel quadratic programming framework for the network alignment problem.

Page 3

Outline

The matching problem and its myriad uses

Parsing Wikipedia and LCSH for all of the data

Theories on subject ontologies (you probably know better)

Page 4

Last fall

Ying, Jeremy, Vinayak, and I spoke to a few of you about the similarities between LCSH and Wikipedia categories.

We started working on ways of comparing these databases.

Page 5

From MARC to GRAPH

1. Concatenate subfields of 1xx tags for nodes.
2. Use 550 and 551 tags for edges.
3. Use 450 and 451 tags for alternate names.

...
150 0 _aKlingon (Artificial language)
450 0 _atlhIngan (Artificial language)
550 0 _wg _aLanguages, Artificial
...

[Figure: the resulting graph edge between the nodes "Klingon (Artificial language)" and "Languages, Artificial"]
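The three steps above can be sketched in Python. This is only an illustration, not the parser used for this project: it assumes each record is already available as a list of (tag, subfields) pairs, which is a hypothetical in-memory format, and it joins subfields with a space and skips the control subfield _w when building names, both of which are assumptions.

```python
def record_to_graph(record):
    """Turn one authority record into (node, alternate_names, edges).

    `record` is a hypothetical in-memory form: a list of
    (tag, [(subfield_code, value), ...]) pairs.
    """
    def label(subfields):
        # Concatenate subfield values; skipping the control subfield "w"
        # and joining with a space are assumptions made for this sketch.
        return " ".join(value for code, value in subfields if code != "w")

    node = None
    alternate_names = []
    targets = []
    for tag, subfields in record:
        if tag.startswith("1"):        # 1xx: the heading becomes the node
            node = label(subfields)
        elif tag in ("450", "451"):    # 450/451: alternate names
            alternate_names.append(label(subfields))
        elif tag in ("550", "551"):    # 550/551: edges to other headings
            targets.append(label(subfields))
    # Build edges last so the node name is known even if 1xx comes late.
    edges = [(node, target) for target in targets]
    return node, alternate_names, edges


# The Klingon record from the slide, in the hypothetical in-memory form.
klingon = [
    ("150", [("a", "Klingon (Artificial language)")]),
    ("450", [("a", "tlhIngan (Artificial language)")]),
    ("550", [("w", "g"), ("a", "Languages, Artificial")]),
]
node, alternate_names, edges = record_to_graph(klingon)
```

Running this over every record and collecting the edges would give the LCSH graph; the alternate names can be kept as extra labels for later text matching.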

Page 6

LCSH Overview (largest connected component)

[Figure: drawing of the largest connected component of the LCSH graph; labeled nodes: Privacy; Privacy, Right of; Privacy (Jewish Law); Privacy (Islamic Law); Privacy (Canon Law)]

Page 7

Wikipedia to GRAPH

[Figure: graph edges labeled "see also" and "narrower term"; example nodes: Determinants, Linear algebra]

Page 8

Wikipedia ideas

Evaluate LCSH graph vs. WC graph
Try and match LCSH with WC

... many more ideas …

Page 9

MATCHING

Matching means taking a node in LCSH and finding a single node in WC that pairs well with it.

Most famous matching problem: stable marriage.

Page 10

Stable Marriage

[Figure: stable marriage example with preference rankings numbered 1 to 6; the people shown are Angelina Jolie, Brad Pitt, David Gleich, and Laura Bofferding]

Slide approved by Laura Bofferding

2008 May 27
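For readers who have not met the stable marriage problem, here is a minimal sketch of the classic Gale-Shapley proposal algorithm. It is included only to illustrate the problem named on the slide and is not claimed to be the method used in this project; the names and preference lists are made up, and the sketch assumes both sides have the same size and rank everyone on the other side.

```python
def stable_marriage(proposer_prefs, receiver_prefs):
    """Gale-Shapley: proposers propose in preference order,
    receivers hold on to the best offer seen so far."""
    # rank[r][p] = position of proposer p in receiver r's list (lower is better)
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # next receiver index to try
    engaged_to = {}                               # receiver -> proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p                     # r was unmatched: accept
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p                     # r trades up; old partner is free
            free.append(current)
        else:
            free.append(p)                        # r rejects p; p tries again later
    return {p: r for r, p in engaged_to.items()}


# Hypothetical preference lists, most preferred first.
proposer_prefs = {"a": ["x", "y", "z"], "b": ["y", "x", "z"], "c": ["x", "y", "z"]}
receiver_prefs = {"x": ["b", "a", "c"], "y": ["a", "b", "c"], "z": ["a", "b", "c"]}
pairs = stable_marriage(proposer_prefs, receiver_prefs)
```

In the result no two people would both rather be with each other than with their assigned partners, which is what "stable" means here.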

Page 11

Matching WC and LCSH

• LCSH and WC have short text labels; use the labels to come up with a set of potential links.

[Figure: Graph A and Graph B with potential links between similarly labeled nodes; labels shown: Linear algebra (once per graph), Linear functions, Algebra]
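One simple way to turn labels into potential links is to split each label into words and propose a link whenever two labels share enough of them. The sketch below uses Jaccard word overlap with a hypothetical threshold; the slides do not say which text similarity was actually used, and the assignment of labels to graphs here is an assumption for illustration.

```python
import re


def words(label):
    """Lowercase a label and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", label.lower()))


def candidate_links(labels_a, labels_b, threshold=0.5):
    """Return (a, b, score) for label pairs whose Jaccard word overlap
    is at least `threshold` (a hypothetical cutoff)."""
    links = []
    for a in labels_a:
        wa = words(a)
        for b in labels_b:
            wb = words(b)
            union = wa | wb
            score = len(wa & wb) / len(union) if union else 0.0
            if score >= threshold:
                links.append((a, b, score))
    return links


# Labels taken from the slide; which graph each belongs to is assumed.
graph_a = ["Linear algebra"]
graph_b = ["Linear algebra", "Linear functions", "Algebra"]
links = candidate_links(graph_a, graph_b, threshold=0.3)
```

Comparing every pair of labels is quadratic, so for graphs the size of LCSH and WC one would more likely index labels by word first; the all-pairs loop is kept here for clarity.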

Page 12

Matching with links

• How?

[Figure: Graph A and Graph B]

Page 13

Matching without links

• Bipartite matching problem/stable marriage
• Maximize the cardinality (number of pairs)

[Figure: Graph A and Graph B]
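Ignoring the edges inside each graph, maximizing the number of pairs is a maximum-cardinality bipartite matching over the potential links. A minimal sketch using augmenting paths (often called Kuhn's algorithm) is shown below; the node names and candidate links are hypothetical, and this is one standard technique rather than necessarily the one used in the project.

```python
def max_bipartite_matching(candidates):
    """candidates: dict mapping each node of graph A to the list of
    graph B nodes it may be paired with.

    Returns a dict {a: b} with as many pairs as possible, where each
    node appears in at most one pair.
    """
    match_of_b = {}  # b -> the a currently paired with it

    def try_assign(a, visited):
        # Depth-first search for an augmenting path starting at a.
        for b in candidates[a]:
            if b in visited:
                continue
            visited.add(b)
            # Take b if it is free, or if its current partner can move elsewhere.
            if b not in match_of_b or try_assign(match_of_b[b], visited):
                match_of_b[b] = a
                return True
        return False

    for a in candidates:
        try_assign(a, set())
    return {a: b for b, a in match_of_b.items()}


# Hypothetical potential links between three nodes on each side.
candidates = {
    "a1": ["b1", "b2"],
    "a2": ["b1"],
    "a3": ["b2", "b3"],
}
matching = max_bipartite_matching(candidates)
```

The recursive search is fine for small examples; for very large graphs an iterative variant or the Hopcroft-Karp algorithm would be the usual choice.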

Page 14

Matching with squares

• Enumerate squares
• Maximize cardinality and squares

[Figure: Graph A and Graph B with a square on the nodes labeled i, i', j, j']
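Square enumeration can be sketched as follows, assuming a square means two potential links (i, i') and (j, j') such that (i, j) is an edge of Graph A and (i', j') is an edge of Graph B. That definition is inferred from the node labels on the slide, and all data below is hypothetical.

```python
from itertools import combinations


def enumerate_squares(edges_a, edges_b, links):
    """Return index pairs (p, q), p < q, of potential links that
    together close a square.

    edges_a, edges_b: undirected edges (u, v) inside each graph.
    links: list of potential links (node_in_a, node_in_b).
    """
    ea = {frozenset(e) for e in edges_a}
    eb = {frozenset(e) for e in edges_b}
    squares = []
    for (p, (i, i2)), (q, (j, j2)) in combinations(enumerate(links), 2):
        # Links p and q form a square when their endpoints are
        # adjacent in both graphs.
        if frozenset((i, j)) in ea and frozenset((i2, j2)) in eb:
            squares.append((p, q))
    return squares


# Hypothetical edges and potential links.
edges_a = [("Algebra", "Linear algebra")]
edges_b = [("Algebra", "Linear algebra"), ("Algebra", "Linear functions")]
links = [
    ("Algebra", "Algebra"),
    ("Linear algebra", "Linear algebra"),
    ("Linear algebra", "Linear functions"),
]
squares = enumerate_squares(edges_a, edges_b, links)
```

Checking every pair of links is quadratic in the number of links. On large inputs it would be more natural to loop over the edges of one graph and look up links at their endpoints, but the pairwise form makes the definition easy to see.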

Page 15

Matching with squares

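Bipartite matching (polynomial):

\[
\begin{aligned}
\underset{x}{\text{maximize}} \quad & e^T x \\
\text{subject to} \quad & A x \le 1, \quad x_i \in \{0, 1\}
\end{aligned}
\]

Square matching (NP-complete):

\[
\begin{aligned}
\underset{x}{\text{maximize}} \quad & e^T x + x^T S x \\
\text{subject to} \quad & A x \le 1, \quad x_i \in \{0, 1\}
\end{aligned}
\]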
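To make the two objectives concrete, the sketch below evaluates them by brute force on a tiny instance. It assumes that x has one 0/1 entry per potential link, that the constraint Ax ≤ 1 says each node is used by at most one chosen link, and that S has a 1 in entry (p, q) when links p and q form a square. The slide does not define these symbols, so these readings are assumptions, and the data is hypothetical.

```python
from itertools import product


def best_matching(links, squares, use_squares=True):
    """Exhaustively maximize e^T x (plus x^T S x if use_squares) over
    0/1 vectors x in which every node is used at most once."""
    # Symmetric S: each square (p, q) contributes entries (p, q) and (q, p).
    S = {(p, q) for p, q in squares} | {(q, p) for p, q in squares}
    best_x, best_value = None, -1
    for x in product((0, 1), repeat=len(links)):
        chosen = [k for k, bit in enumerate(x) if bit]
        a_nodes = [links[k][0] for k in chosen]
        b_nodes = [links[k][1] for k in chosen]
        # Feasibility (Ax <= 1): no node appears in two chosen links.
        if len(set(a_nodes)) < len(a_nodes) or len(set(b_nodes)) < len(b_nodes):
            continue
        value = len(chosen)  # e^T x
        if use_squares:
            # x^T S x: with a symmetric S each chosen square counts twice.
            value += sum(1 for p in chosen for q in chosen if (p, q) in S)
        if value > best_value:
            best_x, best_value = x, value
    return best_x, best_value


# Hypothetical instance: links 1 and 2 compete for "Linear algebra" in
# graph A, and only links 0 and 1 form a square.
links = [
    ("Algebra", "Algebra"),
    ("Linear algebra", "Linear algebra"),
    ("Linear algebra", "Linear functions"),
]
squares = [(0, 1)]
with_squares = best_matching(links, squares)
without_squares = best_matching(links, squares, use_squares=False)
```

Without the square term, both two-link solutions score the same and the choice between them is arbitrary; with it, the solution that preserves the edge between the matched nodes wins. Enumerating all 0/1 vectors is exponential in the number of links, so this is only an illustration of the objective, not a usable solver.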
