Managing Structured Collections of Community Data Wolfgang Gatterbauer, Dan Suciu University of Washington, Seattle

Dec 19, 2015

Transcript
Page 1:

Managing Structured Collections of Community Data

Wolfgang Gatterbauer, Dan Suciu

University of Washington, Seattle

Page 2:

1: Flashcards

Page 3:

1: Flashcards

Page 4:

1: Flashcards

Page 5:

1: Flashcards

Computer Science Abbreviations:
• 4NF
• ACID
• MVD
• RAID
• SQL
• FPGA
• FTL
• ...

Computer Science Concepts:
• Merge Sort
• Two-phase locking
• ...

Page 6:

1: Flashcards

Page 7:

1: Flashcards

Texas DPS Motorcycle Operators Manual

Page 8:

2: Spaced Repetition

Ebbinghaus Forgetting Curve

Leitner System (Pimsleur's graduated-interval recall)

Review intervals: 1 day, 3 days, 1 week, 1 month, 6 months

Answer outcomes: correct / incorrect
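The Leitner scheme named on this slide can be sketched in a few lines. This is an illustration under stated assumptions, not the scheduling used by any particular product: the five intervals on the slide are mapped to boxes reviewed after 1, 3, 7, 30, and 180 days (the day counts for "1 month" and "6 months" are approximations), a correct answer promotes a card by one box, and an incorrect answer sends it back to the first box. The promotion and demotion rule is the common Leitner convention; the slide itself only shows the labels "correct" and "incorrect". The names `Card`, `review`, and `INTERVALS_DAYS` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Intervals from the slide; 30 and 180 days approximate "1 month" and "6 months".
INTERVALS_DAYS = [1, 3, 7, 30, 180]


@dataclass
class Card:
    front: str
    back: str
    box: int = 0          # index into INTERVALS_DAYS
    due: date = date.min  # date of the next scheduled review


def review(card: Card, correct: bool, today: date) -> Card:
    """Move the card between boxes and schedule its next review."""
    if correct:
        # Assumed rule: promote by one box, staying in the last box at most.
        card.box = min(card.box + 1, len(INTERVALS_DAYS) - 1)
    else:
        # Assumed rule: a miss sends the card back to the first box.
        card.box = 0
    card.due = today + timedelta(days=INTERVALS_DAYS[card.box])
    return card
```

A card answered correctly on its first review moves to the 3-day box; a later miss returns it to the 1-day box.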

Page 9:

2: Spaced Repetition

Page 10:

2: Spaced Repetition

Specialized Software
• used by 3,000 schools
• sold 500,000 times

Page 11:

3: A Community

myPairSpace.com

Page 12:

An example PairSpace scenario

A. Alice inserts her first Spanish lesson
B. Bob searches and finds Alice's lesson
C. Bob adapts his copy of her original lesson
D. Charlie comes and searches for Spanish lessons

Alice: "Spanish 1" (entries 1. to 100.): pay/pagar, go/ir, come/venir, hear/oir, ...
Bob: "Spanish 1" (entries 1. to 100.): pay/pagar, go/andar, come/venir, hear/oir, ...
Charlie: ?

What to return, how to present, how to query, and how to rank?

Page 13:

Challenge 1

1: What to return?

Figure: Alice's "Spanish 1" (pay/pagar, go/ir, come/venir, hear/oir, ...) and Bob's "Spanish 1" (pay/pagar, go/andar, come/venir, hear/oir, ...); Charlie: ?

• Alice's (original)
• Bob's (most recent)
• their intersection
• their union
• presenting the one conflicting tuple

How to inform the user about the structural variation in collections?
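The candidate answers for Challenge 1 (intersection, union, and the one conflicting tuple) can be illustrated with plain set operations. The sketch below is only an illustration: it assumes a lesson is a set of (source word, translation) pairs and uses just the four pairs visible on the slide, not the full 100-entry lessons. The helper name `conflicts` is hypothetical.

```python
# Hypothetical lesson contents, limited to the four pairs visible on the slide.
alice = {("pay", "pagar"), ("go", "ir"), ("come", "venir"), ("hear", "oir")}
bob = {("pay", "pagar"), ("go", "andar"), ("come", "venir"), ("hear", "oir")}


def conflicts(a, b):
    """Source words whose translations differ between the two lessons."""
    def by_source(pairs):
        index = {}
        for source, target in pairs:
            index.setdefault(source, set()).add(target)
        return index

    ia, ib = by_source(a), by_source(b)
    return {
        word: (ia[word], ib[word])
        for word in ia.keys() & ib.keys()
        if ia[word] != ib[word]
    }


intersection = alice & bob           # pairs both lessons agree on
union = alice | bob                  # every pair from either lesson
conflicting = conflicts(alice, bob)  # the "go" entry: ir versus andar
```

Under this model the union contains both translations of "go", so a system returning the union still has to decide how to present the disagreement to Charlie.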

Page 14:

Challenge 2

2: How to present?

Figure: Alice's "Spanish 1" (pay/pagar, go/ir, come/venir, hear/oir, ...) and Bob's "Spanish 1" (pay/pagar, go/andar, come/venir, hear/oir, ...); Charlie: ?

• lists of tuples
• lists lessons & example tuples
• majority vs diversity
• cluster collections into meta-collections

What are optimal "return structures" and their visual representation?
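One of the options above, clustering collections into meta-collections, could be sketched as follows. This is a minimal illustration, not the authors' method: it assumes Jaccard similarity over the pair sets as the overlap measure, a hypothetical threshold of 0.5, and a simple greedy grouping. The lesson named `other` is invented test data, and the names `jaccard` and `cluster` are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two pair sets: |a ∩ b| / |a ∪ b| (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(collections, threshold=0.5):
    """Greedily group collection names whose contents overlap enough.

    A collection joins the first existing group containing a member with
    Jaccard similarity >= threshold; otherwise it starts a new group.
    """
    groups = []
    for name, items in collections.items():
        for group in groups:
            if any(jaccard(items, collections[m]) >= threshold for m in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


lessons = {
    "alice": {("pay", "pagar"), ("go", "ir"), ("come", "venir"), ("hear", "oir")},
    "bob": {("pay", "pagar"), ("go", "andar"), ("come", "venir"), ("hear", "oir")},
    # Invented example of an unrelated lesson, for illustration only.
    "other": {("pay", "payer"), ("go", "aller")},
}
meta_collections = cluster(lessons)
```

Alice's and Bob's versions share three of five distinct pairs, so they fall into one meta-collection, while the unrelated lesson forms its own.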

Page 15:

Challenge 3

3: How to search?

Figure: Alice's "Spanish 1" (pay/pagar, go/ir, come/venir, hear/oir, ...) and Bob's "Spanish 1" (pay/pagar, go/andar, come/venir, hear/oir, ...); Charlie: ?

• Keyword-based
• Form-based
• Language-based

- varying trust
- given we search for collections

How to best (fast, easy) allow users to express their search needs?

Page 16:

Challenge 4

4: How to rank?

Figure: Alice's "Spanish 1" (pay/pagar, go/ir, come/venir, hear/oir, ...) and Bob's "Spanish 1" (pay/pagar, go/andar, come/venir, hear/oir, ...); Charlie: ?

• Syntactic & semantic similarity (across languages)
• Structure (items vs collection)
• Trust (vote- vs rule-based)
• Provenance (on collections)
• Learning/Adjustment over time

Page 17:

Overview of Challenges

Figure: Alice's "Spanish 1" (pay/pagar, go/ir, come/venir, hear/oir, ...) and Bob's "Spanish 1" (pay/pagar, go/andar, come/venir, hear/oir, ...); Charlie: ?

• New Challenges
– Representation
– Interface
– Relevance measures

• Cross-Cutting Challenges
– inconsistency/trust
– non-monotonicity (dynamic evolution)
– uncertainty
– provenance

Page 18:

Some promising solutions

(VLDB 2011)

MUD 2010

SIGMOD 2010

VLDB 2009

• New Challenges
– Representation
– Interface
– Relevance measures

• Cross-Cutting Challenges
– inconsistency/trust
– non-monotonicity (dynamic evolution)
– uncertainty
– provenance

Page 19:

Managing the human genome

Sequences (labeled 1, 2, 3, 4, 5, and 1B on the slide):

ACCGCAACGTATTATAGGCACGATATCTCG

ACCGCAACGTTATAGGCACGCTATATCG

ACCGCAACGTATTATAGGCACGCTATATCG

ACCGCAACGTATTAGGCACGATATCTCG

ACCGCAATTAGGCACGTACGATATCTCG

ACCGCAATTAGGGACGTACGATATCTCG

...

Page 20:

Managing the human genome

Sequences (labeled 1, 2, 3, 4, 5, and 1B on the slide):

ACCGCAACGTATTATAGGCACGATATCTCG

ACCGCAACGTTATAGGCACGCTATATCG

ACCGCAACGTATTATAGGCACGCTATATCG

ACCGCAACGTATTAGGCACGATATCTCG

ACCGCAATTAGGCACGTACGATATCTCG

ACCGCAATTAGGGACGTACGATATCTCG

...

Large-scale structural variations: insertion, inversion, deletion, translocation

SNP: single nucleotide polymorphism
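The difference between a single-base change and a larger edit can be made concrete on the sequences from this slide. The sketch below is illustrative only: `point_differences` and `edit_blocks` are hypothetical helper names, and the assumption that the last two sequences listed are the pair the "SNP" label refers to is mine. The generic diff from Python's `difflib` reports replaced, deleted, and inserted blocks, but it does not recognize inversions or translocations as such.

```python
from difflib import SequenceMatcher


def point_differences(a: str, b: str):
    """Positions where two equal-length sequences carry different bases."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def edit_blocks(a: str, b: str):
    """Non-matching opcodes (replace/delete/insert) that turn a into b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


# The last two sequences listed on the slide differ in exactly one base.
seq_a = "ACCGCAATTAGGCACGTACGATATCTCG"
seq_b = "ACCGCAATTAGGGACGTACGATATCTCG"
snps = point_differences(seq_a, seq_b)

# Two sequences of different length from the slide, for a block-level diff.
longer = "ACCGCAACGTATTATAGGCACGCTATATCG"
shorter = "ACCGCAACGTTATAGGCACGCTATATCG"
blocks = edit_blocks(longer, shorter)
```

Here `snps` holds the single substituted position (0-based index 12, C versus G), while `blocks` lists the spans that account for the two-base length difference.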

Page 21:

The Vision

• myPairSpace.com
– one massive central repository for e-learning needs
– has the typical DM challenges of any community DB
– new: management of collections and their evolution

• Then abstract and apply learned principles
– data determines the structure
– management of the human genome ("management" versus "scientific management")