Page 1
Adaptive Schema Databases
William Spoth (b), Bahareh Sadat Arab (i), Eric S. Chan (o), Dieter Gawlick (o), Adel Ghoneimy (o), Boris Glavic (i), Beda Hammerschmidt (o), Oliver Kennedy (b), Seokki Lee (i), Zhen Hua Liu (o), Xing Niu (i), Ying Yang (b)
b: University at Buffalo, i: Illinois Inst. Tech., o: Oracle
Page 2
Adaptive Schema Databases
Page 3
Classic relational database
• Schemas serve a navigational and organizational purpose: they support discovery, good performance and space usage, and reuse.
Page 4
Classic relational database
• But... high upfront cost and inflexible.
Page 5
Big Data / NoSQL
• Data can be used immediately.
Page 6
Big Data / NoSQL
• But... sacrifices the navigational and performance benefits, and may end up duplicating work.
Page 7
Adaptive Schema Databases
Queries and feedback...
eventually
• Bridge the gap between relational databases and NoSQL.
Page 9
Adaptive Schema Databases
Input:
Queries:
SELECT name FROM Undergrad UNION SELECT name FROM Grad
SELECT deg FROM Grad
SELECT name FROM Student
…
Page 10
Outline
[Figure: ASD overview. Labels: Unstructured Data; Semi-structured Data (e.g., JSON); Extraction workflow; Extraction Schema Candidates; Schema Matching; Schema Workspace; Queries + Feedback]
• Extraction and discovery
• Adaptive, personalized schemas from queries
• Explanations and feedback
• Adaptive organization
• Conclusions and future work
Page 11
Extraction
Page 12
Extraction
• ASD extracts a schema candidate set.
Given input:
Page 16
Discovery
• ASD extracts a schema candidate set C_ext = {S_ext, P_ext}, where S_ext is a set of candidate schemas and P_ext is a probability distribution over these schemas.
Page 17
Discovery
• ASD extracts a schema candidate set.
S_max: the best-guess schema
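As a rough illustration of the definitions above, the candidate set can be modeled as a list of schemas paired with probabilities, with S_max taken as the most probable candidate. This is only a minimal sketch: the example schemas, the probabilities, the function name `best_guess`, and the reading of "best guess" as "highest probability" are assumptions, not details from the slides.

```python
from typing import Dict, FrozenSet, List, Tuple

# A schema maps each table name to its set of attribute names.
Schema = Dict[str, FrozenSet[str]]


def best_guess(s_ext: List[Schema], p_ext: List[float]) -> Tuple[Schema, float]:
    """Return the most probable candidate schema and its probability."""
    if not s_ext or len(s_ext) != len(p_ext):
        raise ValueError("need one probability per candidate schema")
    if abs(sum(p_ext) - 1.0) > 1e-9:
        raise ValueError("P_ext must sum to 1")
    # Index of the largest probability (assumed meaning of "best guess").
    i = max(range(len(p_ext)), key=p_ext.__getitem__)
    return s_ext[i], p_ext[i]


# Hypothetical candidates, made up for this example.
s_ext: List[Schema] = [
    {"Undergrad": frozenset({"name"}), "Grad": frozenset({"name", "deg"})},
    {"Person": frozenset({"name", "deg"})},
]
p_ext = [0.7, 0.3]

s_max, p_max = best_guess(s_ext, p_ext)
print(sorted(s_max), p_max)
```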
Page 18
Adaptive, personalized schemas from queries
Page 19
Adaptive, personalized schemas
• ASD maintains a set of schema workspaces W = {W_1, ..., W_n}.
Initially, W = {}.
Page 20
Finding Schemas from Queries
• ASD maintains a set of schema workspaces W = {W_1, ..., W_n}.
Query 1: SELECT name FROM Undergrad UNION SELECT name FROM Grad
Page 22
Finding Schemas from Queries
• ASD maintains a set of schema workspaces W = {W_1, ..., W_n}.
Query 2: SELECT deg FROM Grad
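One plausible reading of how a workspace grows from queries is sketched below: every table and attribute that a query references is recorded in the workspace. The `record_query` helper and the regular-expression parsing are hypothetical and handle only the simple SELECT ... FROM ... [UNION ...] form shown on the slides.

```python
import re
from typing import Dict, Set

# A workspace maps table names to the attributes queried so far.
Workspace = Dict[str, Set[str]]

# Matches only the simple "SELECT cols FROM table" form used on the slides.
_SELECT = re.compile(r"SELECT\s+(.+?)\s+FROM\s+(\w+)", re.IGNORECASE)


def record_query(workspace: Workspace, sql: str) -> Workspace:
    """Add every table and attribute referenced by the query to the workspace."""
    for branch in re.split(r"\bUNION\b", sql, flags=re.IGNORECASE):
        m = _SELECT.search(branch)
        if m is None:
            raise ValueError(f"unsupported query: {branch.strip()!r}")
        cols = {c.strip() for c in m.group(1).split(",")}
        workspace.setdefault(m.group(2), set()).update(cols)
    return workspace


w1: Workspace = {}  # initially empty, as on the slide
record_query(w1, "SELECT name FROM Undergrad UNION SELECT name FROM Grad")
record_query(w1, "SELECT deg FROM Grad")
print(w1)
```

After Query 1 and Query 2 the sketch's workspace holds Undergrad(name) and Grad(name, deg).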
Page 23
Synthesizing Tables
• ASD maintains a set of schema workspaces W = {W_1, ..., W_n}.
Query 3: SELECT name FROM Student
W_1 = (S_1 = {Undergrad(name)}, P_1 = 0.27), (S_1 = {Grad(name)}, P_1 = 0.23), (S_1 = {Undergrad(name), Grad(name)}, P_1 = 0.5)
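The candidate list for Query 3 can be turned into an executable query by picking the most probable match and rewriting the unknown table as a union over its sources. The sketch below assumes that resolution strategy; the probabilities are the ones on the slide, while the helper names are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# (source tables the unknown table may stand for, probability).
Candidate = Tuple[FrozenSet[str], float]

# Probabilities taken from the W_1 example on the slide.
W1: List[Candidate] = [
    (frozenset({"Undergrad"}), 0.27),
    (frozenset({"Grad"}), 0.23),
    (frozenset({"Undergrad", "Grad"}), 0.5),
]


def most_likely_sources(candidates: List[Candidate]) -> List[str]:
    """Pick the candidate with the highest probability (assumed strategy)."""
    sources, _ = max(candidates, key=lambda c: c[1])
    return sorted(sources)


def rewrite_select(attr: str, sources: List[str]) -> str:
    """Rewrite a query on the unknown table as a UNION over the source tables."""
    return " UNION ".join(f"SELECT {attr} FROM {t}" for t in sources)


sources = most_likely_sources(W1)
rewritten = rewrite_select("name", sources)
print(rewritten)
```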
Page 24
Explanations and feedback
Page 25
What might go wrong
Extraction errors appear in three forms:
(1) A query incompatible with S_max
(2) An update with data that violates S_max
(3) An extraction error presented to the user
We provide:
(1) Explanation of results
(2) Provenance
(3) Warn the analyst about ambiguity
(4) Explain the ambiguity
(5) Evaluate the magnitude of ambiguity
(6) Assist the analyst in resolving the ambiguity
Page 26
Types of errors
ASD interacts with the outside world through schema, data, and update interactions.
Schema interactions: arise when a query is incompatible with S_max and the workspace.
Data interactions: provenance for attribute-level and row-level ambiguity.
Update interactions:
• Represent schema mismatches as missing values.
• Resolve data errors with a probabilistic repair.
• Upgrade the analyst's schema to match the changes.
• Checkpoint the analyst's workspace and ignore new updates.
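The first update option, representing schema mismatches as missing values, can be illustrated with a small projection step. This is a sketch under assumptions: the `conform` helper and the example record are hypothetical, and Python's None stands in for a missing (NULL) value.

```python
from typing import Any, Dict, Iterable, Optional


def conform(record: Dict[str, Any], attrs: Iterable[str]) -> Dict[str, Optional[Any]]:
    """Project a record onto the workspace attributes.

    Attributes the record lacks become None; extra fields are dropped.
    """
    return {a: record.get(a) for a in attrs}


# Hypothetical update: it has no "deg" field and carries an unknown "advisor".
row = conform({"name": "Alice", "advisor": "Bob"}, ["name", "deg"])
print(row)
```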
Page 27
Explanations and feedback
Condition 2: Query from unknown schema elements: SELECT name FROM Student
W_1 = (S_1 = {Undergrad(name)}, P_1 = 0.27), (S_1 = {Grad(name)}, P_1 = 0.23), (S_1 = {Undergrad(name), Grad(name)}, P_1 = 0.5)
Explanations: We match Student with both Grad and Undergrad
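An explanation like the one above could be generated directly from the candidate list, together with a number that conveys how ambiguous the match is. In the sketch below, the `explain` helper and the message wording are hypothetical, and Shannon entropy is used as one possible ambiguity measure; the slides do not say which measure ASD uses.

```python
import math
from typing import FrozenSet, List, Tuple

Candidate = Tuple[FrozenSet[str], float]

# Probabilities taken from the W_1 example on the slide.
candidates: List[Candidate] = [
    (frozenset({"Undergrad"}), 0.27),
    (frozenset({"Grad"}), 0.23),
    (frozenset({"Undergrad", "Grad"}), 0.5),
]


def explain(table: str, cands: List[Candidate]) -> Tuple[str, float]:
    """Return an explanation string and an ambiguity score for the match."""
    best, p = max(cands, key=lambda c: c[1])
    # Shannon entropy in bits: 0 means no ambiguity (an assumed measure).
    entropy = -sum(q * math.log2(q) for _, q in cands if q > 0)
    alternatives = len(cands) - 1
    msg = (
        f"We match {table} with {' and '.join(sorted(best))} "
        f"(p={p:.2f}); {alternatives} other candidate(s) exist."
    )
    return msg, entropy


message, ambiguity = explain("Student", candidates)
print(message)
print(f"ambiguity: {ambiguity:.2f} bits")
```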
Page 28
Adaptive organization
Page 29
Adaptive organization
Trade-off between storing data in its native format and storing it according to a specific schema.
What is the challenge? Many workspaces, adding tables to the schema, ….
Challenges and possible solutions:
• We want multiple personalized schemas
1. A relational workspace schema is essentially a view over raw data. View materialization can be used.
2. Use existing adaptive physical design and caching techniques.
• Shared materializations
1. Incremental materialized view maintenance. Leverage techniques from revision control systems.
2. View selection problem.
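To make the "workspace schema as a view over raw data" idea concrete, the sketch below keeps raw records in their native form and maintains one materialized projection per workspace, updated incrementally on each insert. The class and function names are hypothetical, and the sketch covers only inserts and simple projections, not view selection or sharing between workspaces.

```python
from typing import Any, Dict, List, Tuple


class WorkspaceView:
    """A materialized projection of the raw data onto one workspace schema."""

    def __init__(self, attrs: List[str]) -> None:
        self.attrs = list(attrs)
        self.rows: List[Tuple[Any, ...]] = []  # materialized rows

    def on_insert(self, record: Dict[str, Any]) -> None:
        # Incremental maintenance: append one projected row, no recomputation.
        self.rows.append(tuple(record.get(a) for a in self.attrs))


raw_data: List[Dict[str, Any]] = []  # data kept in its native format
views = [WorkspaceView(["name"]), WorkspaceView(["name", "deg"])]


def ingest(record: Dict[str, Any]) -> None:
    """Store the raw record and propagate it to every materialized view."""
    raw_data.append(record)
    for view in views:
        view.on_insert(record)


# Hypothetical records, made up for this example.
ingest({"name": "Ann"})
ingest({"name": "Bo", "deg": "PhD"})
print(views[0].rows, views[1].rows)
```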
Page 30
Conclusions and future work
ASD bridges the gap between relational databases and NoSQL.
• Discovery: Help users explore and understand new data by providing an outline of the available information. (Done)
• Materialization: Adopt work on adaptive data structures. (Partially done)
• Data synthesis: Synthesize new tables and attributes from existing data. (Done)
• Conflict response:
– Versioning or branching the schema.
– Log analysis to help users assess the impact of schema revisions.