Ontology-based Conceptual Design of ETL Processes for both Structured and Semi-structured Data
Dimitrios Skoutas, Alkis Simitsis
{dskoutas,asimi}@dblab.ece.ntua.gr
National Technical University of Athens, Dept. of Electrical and Computer Engineering
http://www.dblab.ece.ntua.gr
Outline
Introduction
Graph-based Datastore Representation
Application Ontology Construction and Representation (current section)
Datastore Annotation
ETL Transformations
Conclusions
Application Ontology
A suitable application ontology is constructed to model:
the concepts of the domain
the relationships between those concepts
the attributes characterizing each concept
the different representation formats and (ranges of) values for each attribute
Application Ontology
The application ontology comprises:
a set of classes $C = C_C \cup C_T \cup C_G$, where
  $C_C$: classes representing domain concepts
  $C_T$: classes representing value types
  $C_G$: classes representing aggregate functions
a set of properties $P$ containing
  $P_P$: properties representing attributes of concepts or relationships between concepts
  the property convertsTo
  the property aggregates
  the property groups
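A minimal sketch of this structure, assuming a plain in-memory encoding rather than OWL. The class `ApplicationOntology`, its field names and the example names `Product`, `Price` and `Sum` are hypothetical, and the `validate` check (every property instance must connect classes in $C$) is an illustrative assumption, not a rule stated on the slide.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ApplicationOntology:
    """Hypothetical in-memory encoding of an application ontology."""

    concept_classes: set[str] = field(default_factory=set)    # C_C: domain concepts
    type_classes: set[str] = field(default_factory=set)       # C_T: value types
    aggregate_classes: set[str] = field(default_factory=set)  # C_G: aggregate functions
    # P_P: attributes of / relationships between concepts, as (name, source, target)
    concept_properties: set[tuple[str, str, str]] = field(default_factory=set)
    # The three special properties, each stored as a set of (source, target) pairs
    converts_to: set[tuple[str, str]] = field(default_factory=set)
    aggregates: set[tuple[str, str]] = field(default_factory=set)
    groups: set[tuple[str, str]] = field(default_factory=set)

    @property
    def classes(self) -> set[str]:
        """C = C_C ∪ C_T ∪ C_G."""
        return self.concept_classes | self.type_classes | self.aggregate_classes

    def validate(self) -> None:
        """Raise ValueError if any property instance refers to a class not in C (assumed rule)."""
        known = self.classes
        edges = list(self.concept_properties)
        edges += [("convertsTo", s, d) for s, d in self.converts_to]
        edges += [("aggregates", s, d) for s, d in self.aggregates]
        edges += [("groups", s, d) for s, d in self.groups]
        for name, src, dst in edges:
            if src not in known or dst not in known:
                raise ValueError(f"{name}({src}, {dst}) refers to an undeclared class")


# Hypothetical example
onto = ApplicationOntology(
    concept_classes={"Product"},
    type_classes={"Price"},
    aggregate_classes={"Sum"},
)
onto.concept_properties.add(("hasPrice", "Product", "Price"))
onto.validate()  # passes: both endpoints are declared classes
```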
Ontology Graph
A graph representation is specified for the ontology:
Graph nodes represent classes of the ontology
Graph edges represent properties of the ontology
Different symbols are used for the different types of classes and properties
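To make the idea of distinct symbols concrete, the sketch below emits an ontology graph as Graphviz DOT text. The function `to_dot`, the box/ellipse/diamond shapes per class kind, and the dashed/dotted edge styles for the special properties are all hypothetical; the slides do not say which symbols the authors use.

```python
from __future__ import annotations

# Hypothetical symbol choices: the slides only say that different symbols are used.
NODE_SHAPE = {"concept": "box", "type": "ellipse", "aggregate": "diamond"}
EDGE_STYLE = {"convertsTo": "dashed", "aggregates": "dotted", "groups": "dotted"}


def to_dot(classes: dict[str, str], properties: list[tuple[str, str, str]]) -> str:
    """Render an ontology graph as Graphviz DOT text.

    classes:    class name -> kind ("concept", "type" or "aggregate")
    properties: (source class, property name, target class) triples
    """
    lines = ["digraph ontology {"]
    # One node per class, shaped according to its kind
    for name, kind in sorted(classes.items()):
        lines.append(f'  "{name}" [shape={NODE_SHAPE[kind]}];')
    # One labeled edge per property instance
    for src, prop, dst in properties:
        if src not in classes or dst not in classes:
            raise ValueError(f"edge {prop} refers to an unknown class")
        style = EDGE_STYLE.get(prop, "solid")  # ordinary P_P properties are drawn solid
        lines.append(f'  "{src}" -> "{dst}" [label="{prop}", style={style}];')
    lines.append("}")
    return "\n".join(lines)


print(to_dot({"Product": "concept", "Price": "type"}, [("Product", "hasPrice", "Price")]))
```

Piping the output through the Graphviz `dot` tool (for example `dot -Tpng`) would draw the graph.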
Ontology Graph (figure)
Reference example (cont’d): the application ontology graph (figure)
Outline
Introduction
Graph-based Datastore Representation
Application Ontology Construction and Representation
Datastore Annotation (current section)
ETL Transformations
Conclusions
Datastore Annotation
The semantic annotation of each datastore consists of establishing the appropriate mappings between the datastore graph $G_S$ and the ontology graph $G_O$.
Each internal node of $G_S$ may be mapped to one concept-node of $G_O$.
A leaf node of $G_S$ may be mapped to one or more nodes of $G_O$ of the following types:
  type-node
  format-node
  range-node
  aggregated-node
A node may have zero or more mappings.
Mappings are represented as node labels.
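The mapping rules above can be checked mechanically. The sketch below assumes the datastore graph is given as a dictionary from each node to its children (an empty list marking a leaf) and that every ontology node is tagged with its kind; the function `check_annotation` and the example node names are hypothetical.

```python
from __future__ import annotations

# Ontology node kinds that a leaf node may be mapped to
LEAF_TARGET_KINDS = {"type", "format", "range", "aggregated"}


def check_annotation(
    datastore: dict[str, list[str]],
    ontology_kind: dict[str, str],
    labels: dict[str, set[str]],
) -> list[str]:
    """Return the violated mapping rules (an empty list means the annotation is valid).

    datastore:     every datastore node -> list of its children ([] for a leaf)
    ontology_kind: ontology node -> "concept", "type", "format", "range" or "aggregated"
    labels:        datastore node -> set of ontology nodes it is mapped to
    """
    errors = [f"{node}: not a datastore node" for node in sorted(labels) if node not in datastore]
    for node, children in datastore.items():
        targets = sorted(labels.get(node, set()))  # zero mappings are allowed
        if children:
            # Internal node: at most one mapping, and only to a concept-node
            if len(targets) > 1:
                errors.append(f"{node}: internal node has {len(targets)} mappings")
            errors += [f"{node}: {t} is not a concept-node"
                       for t in targets if ontology_kind.get(t) != "concept"]
        else:
            # Leaf node: any number of type-, format-, range- or aggregated-nodes
            errors += [f"{node}: {t} is not a valid leaf target"
                       for t in targets if ontology_kind.get(t) not in LEAF_TARGET_KINDS]
    return errors


datastore = {"Product": ["pid", "price"], "pid": [], "price": []}
ontology_kind = {"Product": "concept", "Price": "type", "USD": "format"}
print(check_annotation(datastore, ontology_kind, {"Product": {"Product"}, "price": {"Price", "USD"}}))
# prints []
```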
Datastore Annotation
A defined class is created in the ontology for each labeled internal node of the datastore graph.
The definition of a node is constructed from its neighbor labeled nodes.
A neighbor labeled node of $n$ is any node $n'$ such that:
  $n'$ is labeled;
  there is a path $p$ in the datastore graph from node $n$ to node $n'$;
  $p$ contains no labeled nodes other than $n$ and $n'$.
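The neighbor-labeled-node condition amounts to a graph search that may pass through unlabeled nodes but stops at the first labeled node on each path. A minimal breadth-first sketch, assuming paths follow edge direction (parent to child) in the datastore graph; `neighbor_labeled_nodes` and the example graph are hypothetical, and building the actual class definition from the result is not shown.

```python
from __future__ import annotations

from collections import deque


def neighbor_labeled_nodes(graph: dict[str, list[str]], labeled: set[str], n: str) -> set[str]:
    """Return every labeled n' reachable from n by a path whose only labeled nodes are n and n'.

    graph:   datastore node -> successor nodes (edge direction assumed parent -> child)
    labeled: nodes that carry at least one ontology mapping
    """
    result: set[str] = set()
    visited = {n}
    queue = deque(graph.get(n, []))
    while queue:
        node = queue.popleft()
        if node in visited:
            continue
        visited.add(node)
        if node in labeled:
            # A labeled node ends the path: it is a neighbor, and the search does not go past it
            result.add(node)
        else:
            # An unlabeled node may lie inside the path, so keep exploring its successors
            queue.extend(graph.get(node, []))
    return result


graph = {"Order": ["customer", "date"], "customer": ["Customer"], "Customer": ["name"]}
labeled = {"Order", "Customer", "date", "name"}
print(sorted(neighbor_labeled_nodes(graph, labeled, "Order")))  # ['Customer', 'date']
```

Because the search never expands a labeled node, `name` is not a neighbor of `Order`: the only path to it passes through the labeled node `Customer`.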
Reference example (cont’d): datastore mappings
Reference example (cont’d): datastore definitions
Outline
Introduction
Graph-based Datastore Representation
Application Ontology Construction and Representation
Datastore Annotation
ETL Transformations (current section)
Conclusions
ETL Transformations
Generic types of ETL transformations
Generating ETL Transformations
Generating ETL transformations involves two main steps:
  select the relevant sources to populate each data warehouse (DW) element
  identify the required data transformations between the sources and the DW
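The slides do not detail how the two steps are carried out. Purely as an illustration, the sketch below replaces ontology reasoning with simple set containment: each element is reduced to a hypothetical mapping from attribute to representation format, a source is selected if it provides every attribute of the DW element, and a conversion is listed wherever the formats differ. The names `select_sources`, `required_transformations`, `S1`, `S2` and the format labels are all assumptions.

```python
from __future__ import annotations


def select_sources(dw_element: dict[str, str], sources: dict[str, dict[str, str]]) -> list[str]:
    """Step 1 (simplified): keep the sources that provide every attribute the DW element needs.

    Each element is a hypothetical mapping: attribute -> representation format.
    """
    needed = set(dw_element)
    return sorted(name for name, attrs in sources.items() if needed <= set(attrs))


def required_transformations(dw_element: dict[str, str], source: dict[str, str]) -> list[tuple[str, str, str]]:
    """Step 2 (simplified): list (attribute, source format, DW format) wherever formats differ."""
    return [
        (attr, source[attr], fmt)
        for attr, fmt in sorted(dw_element.items())
        if source[attr] != fmt
    ]


dw_sales = {"Price": "EUR", "Date": "ISO8601"}
sources = {
    "S1": {"Price": "USD", "Date": "ISO8601", "Quantity": "int"},
    "S2": {"Price": "EUR"},
}
for s in select_sources(dw_sales, sources):
    print(s, required_transformations(dw_sales, sources[s]))
# S1 [('Price', 'USD', 'EUR')]
```

`S2` is skipped in step 1 because it lacks `Date`. A realistic implementation would presumably rely on the datastore definitions built during annotation, rather than on attribute names alone.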