Top Banner
Adaptive Schema Databases William Spoth b , Bahareh Sadat Arab i , Eric S. Chan o , Dieter Gawlick o , Adel Ghoneimy o , Boris Glavic i , Beda Hammerschmidt o , Oliver Kennedy b , Seokki Lee i , Zhen Hua Liu o , Xing Niu i , Ying Yang b b: University at Buffalo i: Illinois Inst. Tech. o: Oracle 1
30

Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

Oct 26, 2019

Download

Documents

dariahiddleston
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

AdaptiveSchemaDatabases

WilliamSpothb,Bahareh SadatArabi,EricS.Chano,DieterGawlicko,AdelGhoneimyo,BorisGlavici,BedaHammerschmidto,OliverKennedyb,

Seokki Leei,ZhenHua Liuo,XingNiui,YingYangb

b:UniversityatBuffalo i:IllinoisInst.Tech. o:Oracle

1

Page 2: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

AdaptiveSchemaDatabases

2

Page 3: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

Classicrelationaldatabase• Navigationalandorganizationalpurposeretaindiscovery,goodperformanceandspace,reusable.

3

Page 4: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

Classicrelationaldatabase• But...Highupfrontcostandinflexible

4

Page 5: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

BigData/NOSQL• Datacanbeusedimmediately.

5

Page 6: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

BigData/NOSQL• But...SacrificenavigationalandPerformancebenefitandmayendupwithduplicateofwork

6

Page 7: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

AdaptiveSchemaDatabases

Queriesandfeedback...

eventually

• BridgethegapbetweenrelationaldatabaseandNoSQl.

7

Page 8: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

AdaptiveSchemaDatabases

Queriesandfeedback...

eventually

• BridgethegapbetweenrelationaldatabaseandNoSQl.

8

Page 9: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

AdaptiveSchemaDatabases

Input:

Queries:SELECTnameFROMUndergradUNIONSELECTnameFROMGrad

SELECTdeg FROMGrad

SELECTnameFROMStudent

9

Page 10: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

Outline

Unstructured Data Semi-structed Data (e.g., JSON)

Extraction workflow

SchemaWorkspace

Schema Matching

Extraction workflow Extraction workflow

Extraction Schema Candidates

SchemaWorkspace

SchemaWorkspace

SchemaWorkspace

Queries + Feedback• Extractionanddiscovery• Adaptive,personalizedschemas

fromqueries• Explanationsandfeedback• Adaptiveorganization• Conclusionsandfuturework

10

Page 11: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

Extraction

Unstructured Data Semi-structed Data (e.g., JSON)

Extraction workflow

SchemaWorkspace

Schema Matching

Extraction workflow Extraction workflow

Extraction Schema Candidates

SchemaWorkspace

SchemaWorkspace

SchemaWorkspace

Queries + Feedback

11

Page 12: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

• ASDextractsschemacandidateset

Giveninput:

12

Extraction

Page 13: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

• ASDextractsschemacandidateset

Giveninput:

13

Extraction

Page 14: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

• ASDextractsschemacandidateset

Giveninput:

14

Extraction

Page 15: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

Extraction• ASDextractsschemacandidateset

Giveninput:

15

Page 16: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

• ASDextractsschemacandidateset

schemacandidatesetCext={Sext,Pext},whereSext isasetofcandidateschemas,Pext isaprobabilitydistributionovertheseschemas.

16

Discovery

Page 17: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

• ASDextractsschemacandidateset

Smax:thebestguessschema

17

Discovery

Page 18: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

Adaptive,personalizedschemasfromqueries

Unstructured Data Semi-structed Data (e.g., JSON)

Extraction workflow

SchemaWorkspace

Schema Matching

Extraction workflow Extraction workflow

Extraction Schema Candidates

SchemaWorkspace

SchemaWorkspace

SchemaWorkspace

Queries + Feedback

18

Page 19: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

• ASDmaintainsasetofschemaworkspacesW={W1,...,Wn}.

Initially,W={}

19

Adaptive,personalizedschemas

Page 20: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

• ASDmaintainsasetofschemaworkspacesW={W1,...,Wn}.

20

Query1:SELECTname FROMUndergrad UNIONSELECTname FROMGrad

FindingSchemasfromQueries

Page 21: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

• ASDmaintainsasetofschemaworkspacesW={W1,...,Wn}.

21

FindingSchemasfromQueries

Query1:SELECTname FROMUndergrad UNIONSELECTname FROMGrad

Page 22: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

• ASDmaintainsasetofschemaworkspacesW={W1,...,Wn}.

22

Query2:SELECTdeg FROMGrad

FindingSchemasfromQueries

Page 23: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

• ASDmaintainsasetofschemaworkspacesW={W1,...,Wn}.

Query3:SELECTnameFROMStudent

W1 =(S1={Undergrad(name)},P1=0.27),(S1={Grad(name)},P1=0.23),(S1={Undergrad(name), Grad(name)},P1=0.5)

23

SynthesizingTables

Page 24: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

Explanationsandfeedback

Unstructured Data Semi-structed Data (e.g., JSON)

Extraction workflow

SchemaWorkspace

Schema Matching

Extraction workflow Extraction workflow

Extraction Schema Candidates

SchemaWorkspace

SchemaWorkspace

SchemaWorkspace

Queries + Feedback

24

Page 25: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

Extractionerrorsappearinthreeforms:(1)AqueryincompatiblewithSmax

(2)AnupdatewithdatathatviolatesSmax(3)Anextractionerrorpresentedtouser

Weprovide:(1)explanationofresults(2)provenance(3)Warn theanalystwithambiguity(4)Explain theambiguity(5)Evaluate themagnitudeofambiguity(6)Assisttheanalysttoresolve theambiguity

Whatmightgowrong

25

Page 26: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

ASDinteractswiththeoutsideworld:Schema,Data,andUpdate.

Schemainteractions:WhenaqueryincompatiblewithSmax andtheworkspace

Datainteractions:provenanceforattributeandrowlevelambiguity.

Updateinteractions:• representschemamismatchesasmissingvalues.• resolvedataerrorswithaprobabilisticrepair.• upgradeherschematomatchthechanges.• checkpointherworkspaceandignorenewupdates.

Typesoferrors

26

Page 27: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

Explanationsandfeedback

Explanations:WematchStudentwith

bothGradandUndergrad

27

Condition2:Queryfromunknown schemaelements:SELECTnameFROMStudent

W1 =(S1={Undergrad(name)},P1=0.27),(S1={Grad(name)},P1=0.23),(S1={Undergrad(name), Grad(name)},P1=0.5)

Page 28: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

Adaptiveorganization

Unstructured Data Semi-structed Data (e.g., JSON)

Extraction workflow

SchemaWorkspace

Schema Matching

Extraction workflow Extraction workflow

Extraction Schema Candidates

SchemaWorkspace

SchemaWorkspace

SchemaWorkspace

Queries + Feedback

28

Page 29: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

AdaptiveorganizationTrade-offbetweenstoringdatainitsnativeformatandbasedonaspecificschema.

Whatisthechallenge?Manyworkspaces,addtabletotheschema,….

ChallengesandPossibleSolutions:• Wewantmultiplepersonalizedschemas

1.Relationalworkspaceschemaisessentiallyaviewoverrawdata.Materializingviewcanbeused.

2.Useexistingadaptivephysicaldesign andcaching techniques.• Sharedmaterializations

1.Incrementalmaterializedviewmaintenance.Leveragetechniquesfromrevisioncontrolsystems.

2.Viewselectionproblem.29

Page 30: Adaptive Schema Databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf · Adaptive Schema Databases William Spothb, BaharehSadat Arabi, Eric S. Chano, Dieter

Conclusionsandfuturework

ASDbridgesthegapbetweenrelationaldatabasesandNoSQL.

• Discovery:Helpuserexploreandunderstandnewdatabyprovidinganoutlineoftheavailableinformation.Done

• Materialization:Adoptworkonadaptivedatastructures.Partiallydone• DataSynthesis:Synthesisnewtablesandattributesfromexistingdata.

Done• ConflictResponse:

– Versioningorbranchingtheschema.– Loganalysistohelpusersassesstheimpactofschemarevisions.

30