Value-driven Approach for Designing Extended Data Warehouses
Laboratoire d’Informatique et d’Automatique pour les Systèmes
DOLAP’2019, Lisbon, March 26, 2019
Nabila BERKANI & Selma KHOURI
ESI Algiers, Algeria
(n_berkani,s_khouri)@esi.dz
Ladjel BELLATRECHE
LIAS/ISAE-ENSMA Poitiers, France
Carlos ORDONEZ
University of Houston, USA
Impact of Big Data on DW
DaWaK Conference: 1999-2014, 2015-Present
DOLAP Workshop: 1998-2015, 2016: Thinking, 2017-Present
30 years of existence: Maturity
[Figure: DW design life cycle. Phases: Data Sources and Requirements, Mappings, Multidimensional Modeling (Field, Origin, Year, Author, Discipline), Extract-Transform-Load (Extract, Temp1, Temp2, Join, Filter, Load to DSA, Store), Deployment (Relational store), Exploitation. Cross-phase design: mappings between sources and requirements, instance extraction, DW schema definition. Big Data annotations: Variety, Quest of Value.]
☛ Actors of the Design: Designers, Data Preparators, Architects, Administrators, “Deployers”
☛ Actors of Exploitation: Data Analysts
1. Design life cycle well identified
2. Diversity of Actors
→ Augmentation of DW by Big Data Vs
Agenda
❑ Value & Variety (2Vs)
❑ Augmenting DW by Linked Open Data
❑ 2Vs-driven Design Approach
❑ Case Study
❑ Summary
Value: Different Places
[Figure: places where value arises along the DW chain: Sources, Requirements, Integration (N. W. Paton), Deployment, Exploitation (Queries, Statistics, …; Visualisation; Analysis; Decision), Decision Makers.]
Value → user feedback: N. Konstantinou
[Figure labels: CM (conceptual model), Value]
Value → offered services [D. Bork]
Value → usage of modern architectures: Teradata
Value → new programming paradigms: Spark
▪ FR (functional requirements): Value → money [A. G. Sutcliffe’2018]
▪ NfR (non-functional requirements): Value → satisfaction of qualities (security, privacy, …)
→ Interdependencies between value (phases) & value (operational DW)
Value → integration of new resources (LOD, …)
→ Recent efforts on building value ontologies: T. P. Sales, F. A. Baião, G. Guizzardi, J. P. A. Almeida, N. Guarino, J. Mylopoulos: The Common Ontology of Value and Risk. ER 2018: 121-135
Value increases Variety
Person in Charge (PiC) of Value
VALUE
❑ Examples of requirements related to value¹:
• Media: Has the coverage of media changed over time?
• Politics: Speeches in the EU Parliament that contain « human rights », by country
• Finance: Evolution of debates related to the Greece crisis, by country
❑ Measurement of the value depends on the studied domain
➡ Interaction between designers and the PiC of value: multidisciplinarity in DW
→ Usage of Linked Open Data:
• Traditional management of variety
+
• Variety of formalisms (graphs)
VARIETY
Internal Sources + External Sources
Designer
Libraries DB, Newspapers
¹ http://www.talkofeurope.eu/data
Augmenting DW by 2Vs
[Figure: DW design life cycle augmented by the 2Vs. Phases: Data Sources and Requirements, Mappings, Multidimensional Modeling (Field, Origin, Year, Author, Discipline), ETL (Variety) with steps Extract, Temp1, Temp2, Join, Filter, Load to DSA, Store; Deployment (Variety); Exploitation (Value). Cross-phase design: mappings between sources and requirements, instance extraction, DW schema definition.]
☛ Actors of the Design: Designers, Data Preparator, Architects, Administrators, “Deployer”
☛ Actors of Exploitation: Data Analysts, “PiC of value”
2. Diversity of Actors
High Variety of Sources || Global Processing
Store 1: Graph
Store n: Relational
Formalisation
❑ Inputs:
1. Set of internal sources: $S_{Int} = \{SI_1, SI_2, \dots, SI_m\}$
2. Set of external resources: $S_{Ext} = \{SE_1, SE_2, \dots, SE_n\}$
3. Each source (internal/external) $S_i$ has:
   ▪ Its own physical format ($F_i$)
   ▪ Its conceptual model $CM_i$
   ▪ A related discipline $D$ (medicine, engineering, etc.)
4. Set of requirements to be satisfied
5. [Optional]: An operational DW ([Ravat et al. 2017]), with:
   ▪ Its conceptual model $CM_{DW}$
   ▪ Its format(s) $Format(S_{DW}) = \{f_1, f_2, \dots, f_k\}$ → polystore storage
❑ Objective:
   ▪ Definition of all phases of the DW, augmenting its value
❑ Challenges:
   ▪ Metrics of Value
$$Value(DW) = Operator_{1 \le i \le n+m}\,[\,Weight(S_i, D) \times Value(S_i)\,], \quad S_i \in S_{Int} \cup S_{Ext}$$ [Ballou et al.]*
*Ballou, D. P., & Tayi, G. K. (1999). Enhancing data quality in data warehouse environments. Communications of the ACM, 42(1), 73-78.
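The aggregate Value(DW) in the formalisation can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `Source` fields, the `equal_weight` function, and the sample values are hypothetical and do not come from the slides. The Operator is left pluggable, with the average as one possible choice.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Source:
    """One internal or external source S_i (all fields are illustrative)."""
    name: str         # e.g. "SI1" (internal) or "SE1" (external)
    external: bool    # True for an external resource such as LOD
    discipline: str   # discipline D the source is related to
    value: float      # Value(S_i), assumed here to lie in [0, 1]


def dw_value(
    sources: Iterable[Source],
    weight: Callable[[Source, str], float],
    operator: Callable[[List[float]], float] = mean,
) -> float:
    """Value(DW) = Operator over all i of Weight(S_i, D) * Value(S_i)."""
    weighted = [weight(s, s.discipline) * s.value for s in sources]
    if not weighted:
        raise ValueError("at least one source is required")
    return operator(weighted)


def equal_weight(source: Source, discipline: str) -> float:
    """Hypothetical weighting: every source counts the same."""
    return 1.0


# Hypothetical values, chosen only to exercise the formula.
sources = [
    Source("SI1", external=False, discipline="engineering", value=0.5),
    Source("SI2", external=False, discipline="engineering", value=0.75),
    Source("SE1", external=True, discipline="engineering", value=1.0),
]

print(dw_value(sources, equal_weight))        # Operator = average
print(dw_value(sources, equal_weight, max))   # a different Operator
```

Keeping `weight` and `operator` as parameters mirrors the formula, where both are left open and can be fixed per study.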
Different Scenarios
[Figure: three design scenarios: (a) Serial Design, (b) Parallel Design, (c) Query-driven Design. Common elements: Users, Requirements, Internal Sources, LOD, ETL, DW. Other labels: Synchronise, ETL-LOD, Internal ETL, External ETL, Internal Cube, LOD Cube, Data Cube, OLAP Queries, MD Query, Graph Query, on-demand LOD graphs (materialized), Pool of Results, Query Results (visualized), Merge, Materialize. ETL steps: Extract, Temp1, Temp2, Join, Filter, Load to DSA, Store.]
3 Scenarios
☛ LOD is seen as a source
☛ Two parallel ETL
☛ On-demand ETL: data extracted from the existing DW and LOD, then potentially loaded into the DW → (requirement satisfaction)
Challenges?
1. Pivot schema: generic schema vs. LOD schema (graph)
2. Redefinition of operators (overloading)
3. Synchronisation of internal and external data: 3 scenarios
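The "redefinition of operators (overloading)" challenge on this slide can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the `RelationalSource` and `GraphSource` types and the `extract` operator are hypothetical names, not the authors' ETL operators, and `functools.singledispatch` is swapped in as one possible way to overload an operator per source formalism.

```python
from dataclasses import dataclass
from functools import singledispatch
from typing import Dict, List, Tuple


@dataclass
class RelationalSource:
    """Hypothetical internal source: rows as attribute/value dictionaries."""
    rows: List[Dict[str, object]]


@dataclass
class GraphSource:
    """Hypothetical external LOD source: (subject, predicate, object) triples."""
    triples: List[Tuple[str, str, str]]


@singledispatch
def extract(source, concept: str) -> List[str]:
    """Generic Extract operator; concrete behaviour depends on the source type."""
    raise TypeError(f"no extract operator for {type(source).__name__}")


@extract.register(RelationalSource)
def _(source: RelationalSource, concept: str) -> List[str]:
    # Relational overload: read the column named after the concept.
    return [str(row[concept]) for row in source.rows if concept in row]


@extract.register(GraphSource)
def _(source: GraphSource, concept: str) -> List[str]:
    # Graph overload: keep the objects of triples whose predicate is the concept.
    return [obj for (_subj, pred, obj) in source.triples if pred == concept]


internal = RelationalSource([{"Author": "A. Smith", "Year": 2018}])
lod = GraphSource([("paper1", "Author", "B. Jones"), ("paper1", "Year", "2019")])

# The same operator name covers both formalisms.
authors = extract(internal, "Author") + extract(lod, "Author")
print(authors)
```

The caller uses one operator name regardless of formalism, which is the point of the overloading; a pivot schema would play the role of the shared `concept` vocabulary.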
Value Metrics
❑ Three metrics related to:
1. Requirement satisfaction
$$Value(Req, S_i) = \frac{\text{number of responses of requirement on } S_i}{\text{number of responses of all requirements}}$$
2. Conceptual modelling (multidimensional concepts)
$$Value(Concepts, S_i) = \frac{\text{number of concepts of DW schema by integrating } S_i}{\text{total number of the DW concepts}}$$
3. Target DW population
$$Value(Instances, S_i) = \frac{\text{number of instances of DW by integrating } S_i}{\text{total number of instances of the DW}}$$
$$Value(DW) = Operator_{1 \le i \le n+m}\,[\,Weight(S_i, D) \times Value(S_i)\,], \quad \text{where } S_i \in S_{Int} \cup S_{Ext}$$
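The three metrics are plain ratios, which the following Python sketch makes concrete. The function names, the zero-denominator convention, and all counts are assumptions for illustration only; none of the numbers come from the experiments.

```python
def ratio(part: int, total: int) -> float:
    """Share of `part` in `total`; returns 0.0 when total is 0 (an assumed convention)."""
    if part < 0 or total < 0:
        raise ValueError("counts must be non-negative")
    return part / total if total else 0.0


def value_req(responses_on_source: int, responses_all_requirements: int) -> float:
    """Value(Req, S_i): responses of the requirement on S_i over all responses."""
    return ratio(responses_on_source, responses_all_requirements)


def value_concepts(concepts_from_source: int, total_dw_concepts: int) -> float:
    """Value(Concepts, S_i): DW schema concepts obtained by integrating S_i."""
    return ratio(concepts_from_source, total_dw_concepts)


def value_instances(instances_from_source: int, total_dw_instances: int) -> float:
    """Value(Instances, S_i): DW instances obtained by integrating S_i."""
    return ratio(instances_from_source, total_dw_instances)


# Hypothetical counts for a single source S_i.
print(value_req(30, 40))           # 0.75
print(value_concepts(3, 12))       # 0.25
print(value_instances(500, 1000))  # 0.5
```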
Case Study
▪ 4 internal sources generated from the LUBM benchmark
▪ 15 initial requirements
☛ University Research Analysis
❑ Analysis:
▪ 6 requirements are not satisfied by internal sources (Oracle 12c release 1)
→ External source: DBpedia
Experiments
Sources \ Metrics     Dimensions/Measures  Value(S*) MD  Value(S*) Req.  Value(S*) Instances  Instances   Response time
Internal Sources      6/1                  31%           6%              10%                  550K        1.1
Serial Design         10/7                 71%           80%             94%                  7.7 x 10^6  3.2
Parallel Design       11/8                 73%           84%             85%                  3.1 x 10^6  2.6
Query-driven Design   12/8                 74%           96%             84%                  2.9 x 10^6  1.7
* All sources have the same weight
* Operator: Avg
Augmented Schema
Summary
✔ 2Vs for the DW renaissance
✔ Value = pool of multidisciplinary expertise
✔ DW life cycle design revisited (new formalization)
✔ 3 augmented scenarios
☛ Veracity & 2V
☛ More automation (query rewriting)
☛ Value Query Language (thanks, Patrick)
Special issue on: Business Intelligence and Analytics for Value Creation in the Era of Big Data and Linked Open Data: International Journal of Information Management, Elsevier (Q1; IF=4.810)