PS-3C: A new ensemble modelling technique
About Me
'Head of BI' @ Spilgames
Certified Data Vault modeler since 2009
Contact details: nl.linkedin.com/in/rogierwerschkull
rogierwerschkull@gmail.com
@rwerschkull
Ensemble data modelling…
WHY another Ensemble?
We've got loads already
httptopofmindssewp2014ensemble-modeling-forelasningar-och-lars-ronnback
Ensemble popularity
[Bar chart: search hits on Google, 31-8-2016, for 'Data Vault modeling + data warehouse', 'Anchor modeling + data warehouse', 'Hyper agility + data warehouse', 'Focal point modeling + data warehouse' and 'Head version modeling + data warehouse'. Vertical axis: 0 to 16000. Data labels: 13600, 9970, 37073 5]
Ran into problems using: Data Vault, Head-Version, Anchor Modelling
WHAT problems?
Photo: my own……
Problem
Not built for big data lake 'data centricity'
Photo credit: Lake: Public Domain; http://www.writeups.org/star-trek-brent-spiner-data
'Data may first be stored in a data lake, so that it can be explored, cleaned, and prepared. If it can be structured in a relational format (basically rows and columns) and needs to be used frequently and kept highly secure, it may go into a data warehouse. If it stops being used frequently, it may go back to a HDFS (Hadoop Distributed File System)-based archive.'
Data centric: data first. Thomas H. Davenport, Wall Street Journal, 3-6-2015
http://blogs.wsj.com/cio/2015/06/03/the-shift-to-a-new-data-architecture
Systems like
Data flood
Photo credit: Kurayba (https://www.flickr.com/photos/48503330@N08/28564454666), under CC licence (https://creativecommons.org/licenses/by-sa/2.0)
The possible result…
Photo credit: https://highfiveexports.wordpress.com/2010/06/25/3000-pieces-lego-mix-specialty-pieces-rare-pieces-bricks-blocks-parts-more-ultimate-lot-of-lego-parts-pieces-lego-for-sale-lego-batman-lego-starwars-lego-technic-lego-minifigur
But isn't Data Vault v2 'made for big data centric systems'?
In DV2 you still do this in one go:
Subject Oriented, Integrated, Time Variant, Non-Volatile EDW
Coding = a lot like modelling
Being data centric conflicts with the complex data MODELLING work
http://xkcd.com/844
Problem
Less mature JOIN optimizers
Photo credit: Public Domain
NoSQL databases (Key-Value, Document, Column Family) + SQL on Hadoop solutions do NOT like joins
Photo credit: Public Domain
This REALLY complicates Anchor Modeling
And personally: this one too (Link Satellite)
[Diagram: Data Vault model with HUB, SAT and LINK entities; the Link Satellite is pointed out]
Problem
a) HASHING of business keys
Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32
• Loses statistical information regarding the data distribution. Query optimizers do not like this…
• Column family, Document and Key-value databases need a good (natural) sharding key for (partial) key-lookups
• Hashing……
http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2
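The contrast between a concatenated natural business key and its MD5 hash can be sketched as follows. This is a minimal illustration, not the author's implementation: the helper names `concatenated_business_key` and `md5_hash_key` are hypothetical, and the sample values are taken from the table on the hashing slide. It shows that a partial key lookup (a prefix match on rolling stock number and date) is straightforward on the concatenated key, while the hash is always 32 characters with no usable prefix.

```python
import hashlib


def concatenated_business_key(*parts: str, sep: str = "|") -> str:
    """Join the natural key parts into one readable business key (hypothetical helper)."""
    return sep.join(parts)


def md5_hash_key(business_key: str) -> str:
    """Hash the business key; the result is a fixed-length 32-character hex string."""
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()


# (rolling stock nr, datetime, sensor id) taken from the slide's example rows
readings = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:27", "72A1_SLDET"),
    ("8739", "2015-01-22 01:34:32", "72A1_FINV"),
    ("8674", "2015-01-22 01:34:26", "72A1_FINV"),
]

natural_keys = [concatenated_business_key(*r) for r in readings]
hashed_keys = [md5_hash_key(k) for k in natural_keys]

# Partial key lookup: everything for rolling stock 8739 on 2015-01-22.
# This works on the natural key because related rows share a prefix.
prefix = "8739|2015-01-22"
partial_matches = [k for k in natural_keys if k.startswith(prefix)]

for natural, hashed in zip(natural_keys, hashed_keys):
    print(f"{natural:<40} len={len(natural):<3} md5 len={len(hashed)}")
```

The same prefix lookup has no equivalent on `hashed_keys`: rows that belong together in the source no longer share any part of their key.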
b) Surrogate business keys
• Surrogate keys require centralized coordination
• …and thus can impact the overall system's scalability and availability
• A lot of MPP NoSQL databases simply do not have them…
Then some inspiration
http://roelantvos.com/blog/?p=1119
'In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse.'
'The Historical Staging Area effectively 'acts' as Data Lake, but in a better defined form, as data deltas and event datetimes are taken into account.'
So…
[Diagram: 'Subject Oriented, Integrated, Time Variant, Non-Volatile EDW' split into 'Time Variant & Non Volatile' and 'Subject Oriented & Integrated EDW']
[Diagram: 'Time Variant & Non Volatile' could be a Data LAKE; 'Subject Oriented & Integrated EDW' could be a virtualised Ensemble tier]
How does PS-3C work?
Photo credit: Public Domain
Focus of current ensemble EDWs
[Diagram: Staging Area, EDW, Information Marts]
Splitting the work
[Diagram: Persistent Staging Area (HSA) = Data Library, EDW, Information Marts]
Persistent Staging - Concept Context, Connector, Business Concept
modelling
Persistent Staging - how
• Identify source / event stream Primary or Unique Key. Use source metadata for this
• Automate the building of a PS 'around this key'. Take all columns
• Historize using SCD-2 approach
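The three steps above can be sketched as a small SCD-2 loader. This is an illustration under assumptions, not the author's tooling: `PsRow` and `historize` are hypothetical names, all source columns are stored as one snapshot per key, and a superseded row is end-dated one second before the new row's load timestamp, following the convention visible in the example tables of this deck (21:59:59 / 22:00:00, open end 31-12-9999).

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # open-ended "current" row


@dataclass(frozen=True)
class PsRow:
    key: str                # source primary / unique key
    columns: tuple          # all source columns as sorted (name, value) pairs
    load_dts: datetime
    load_end_dts: datetime = HIGH_DATE


def historize(history, key, columns, load_dts):
    """Return a new history list with the delivered record applied SCD-2 style."""
    snapshot = tuple(sorted(columns.items()))
    current = next(
        (r for r in history if r.key == key and r.load_end_dts == HIGH_DATE), None
    )
    if current is not None and current.columns == snapshot:
        return history  # nothing changed, nothing to register
    # End-date the current row (if any) and append the new version.
    closed = [
        replace(r, load_end_dts=load_dts - timedelta(seconds=1)) if r is current else r
        for r in history
    ]
    return closed + [PsRow(key, snapshot, load_dts)]


history = []
history = historize(history, "Ut", {"postcode": "3500GJ"}, datetime(2013, 6, 5, 22))
history = historize(history, "Ut", {"postcode": "3500GJ"}, datetime(2014, 6, 5, 22))  # unchanged
history = historize(history, "Ut", {"postcode": "3511 CE"}, datetime(2015, 6, 5, 22))
```

End-dating the previous row is the step that requires an update, which is why the load end timestamp is harder to maintain on append-only storage.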
Persistent Staging Metadata-1
Entity level:
• Unique key
• Functional Description
• Delivering party
• Owner / Responsible
• MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
• …
Persistent Staging Metadata-2
Column level:
• [Load Date Timestamp]
• [Load End Date Timestamp]
• [Deleted Flag] OR delete as new record
• [Source system] on table / file level (lowest possible)
Load End Date Timestamp: possible but difficult… Requires updates
UPDATES IN HIVE (isn't HDFS APPEND ONLY?)
• ACID is possible in HIVE
• ACID makes updates possible, by registering updates as 'new data'
• Reconciliation / compacting when idle / at user command
• Use ORC files
• PLUS changing the HIVE configuration…
What about SEMI-STRUCTURED data?
• Hive: put semi-structured data = variable columns in MAP data type
• OR use a data storage type that supports schema evolution: AVRO (ORC: in development)
• Or HBASE… It only has one data type (byte); schema is 'applied'. Schema can be different for every row
modelling
3C - how
Like Data Vault (2), BUT:
• Always starts with Conceptual data modeling
• NOT the primary location of Data & History. Virtualised (only if performance allows). Should be deterministic
• No Link Satellites
• No Surrogate or Hash Keys, only 'Concatenated Natural Business Keys'
• Explicit Helper entities
3C - Details
[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; Business Alias A and Business Alias B; BC-Timeline X]
Business Concept (BC)
• A UNIQUE, domain-specific point of integration
• …a business entity
• …within its own domain
• …does not necessarily need to be Enterprise Wide
Why not 'enterprise wide'?
[Diagram: Company; Customer; Sales Customer; International Sales Customer; Local Sales Customer; Marketing Customer; Customer …]
Business Concept Metadata
Entity level:
• [Description]
• [Owner / Responsible]
Column level:
• [Load Date Timestamp]
• [Source system] on table / file level (lowest possible)
Example-Data
NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …
NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
…
NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…
NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
…
BC important notes
• Easiest entity to virtualise (if performance allows)
• No hashing & no surrogate BUSINESS KEYS (not by default at least)
3C - Details
Concept Context (CC)
• Contains Context about a Concept, in a historical way
• …like a Data Vault Satellite
• Every CC belongs to only one BC
• Separate entity per source system table / stream
Concept Context Metadata
Entity level:
• [Description]
• [Owner / Responsible]
Column level:
• [Load Date Timestamp]
• [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
Example
[Diagram: Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller, with these Concept Contexts:]
• [valuation] Source: NSS2 table p
• [description] Source: NSS1 table x
• [adres] Source: NSS1 table y
• [description] Source: NTR table q
• [ovchip_personal] Source: NSR table r
• [ovchip_on-usage] Source: NSR table s
• [personal_details] Source: NSR table t
• [adres_data] Source: NSR table t
Example-data
NSR-Station, [adres] Source: NSS1 table y
BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                  …  …            …                  …
CC important notes
• More difficult to virtualise: depends on the semantic gap with the source
• But do make it virtual when 'streaming data' is necessary
• Because we have the PS layer, exposing all columns is not necessary
• Refactoring is easier…
3C - Details
Connector (C)
• Relations between Concepts + Context
• In a historical way
• …merger of Data Vault Link + Link Satellite
• Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point in time
Connector Driving key
Explicitly defining a driving key as metadata…
• Gives business understanding
• Makes it possible for a Connector to correctly handle delta data deliveries…
  • so that a change (on the driving key)
  • is not registered as a new 'connection'
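How a driving key lets a Connector absorb a delta delivery can be sketched like this. It is an illustration only: the column names, the `apply_delta` function and the scenario (a check-in delivered first, the destination station delivered later) are assumptions, not taken from the deck. The driving key used is the one named in the travel movement example: travelcard plus check-in timestamp.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)
DRIVING_KEY = ("bk_travelcard", "checkin_ts")  # hypothetical column names


def driving_key_of(row, driving_key=DRIVING_KEY):
    """Extract the driving key values from a row."""
    return tuple(row[column] for column in driving_key)


def apply_delta(connector, delta_row, load_dts, driving_key=DRIVING_KEY):
    """Apply one delivered row: a change on an existing driving key closes the
    current version instead of registering a second, parallel connection."""
    for row in connector:
        is_current = row["meta_load_end_dts"] == HIGH_DATE
        if is_current and driving_key_of(row, driving_key) == driving_key_of(delta_row, driving_key):
            payload = {k: v for k, v in row.items() if not k.startswith("meta_")}
            if payload == delta_row:
                return connector  # identical delivery, nothing to do
            row["meta_load_end_dts"] = load_dts - timedelta(seconds=1)
            break
    connector.append({**delta_row, "meta_load_dts": load_dts, "meta_load_end_dts": HIGH_DATE})
    return connector


connector = []
checkin = {
    "bk_travelcard": "3528 0234 2073 1234",
    "checkin_ts": datetime(2016, 4, 5, 8, 49, 32),
    "bk_station_from": "Asd",
    "bk_station_to": None,          # assumed: destination not yet known
    "bk_trainseries": "IC 855",
}
apply_delta(connector, checkin, datetime(2016, 4, 5, 22))
apply_delta(connector, {**checkin, "bk_station_to": "Ut"}, datetime(2016, 4, 6, 22))

current = [r for r in connector if r["meta_load_end_dts"] == HIGH_DATE]
```

Without the driving key, the second delivery would look like a new combination of keys and the journey would appear twice as a current connection.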
Connector Metadata
Entity level:
• [Description]
• [Owner / Responsible]
Column level:
• [Load Date Timestamp]
• [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
Example
[Diagram: the example Business Concepts and Concept Contexts, with the Connector NSR-Travelmovement (Checkin timestamp, from, to) added]
Driving Key: NS-Travelcard + Checkin timestamp
Example-Data
NSR-Travelmovement (Connector between NSR-Station, NS-Travelcard and NS-Trainseries; Checkin timestamp, from, to)
BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00
Are we there yet? No
We add two mandatory HELPER entities
modelling
3C - Details
Business Alias
• To help switching from sources that are tied together by technical (surrogate) keys…
• …to a Business Key based model
• It's a LOOKUP table that translates the technical key to the Business Key
Example
NSR-Station, with Concept Contexts:
• [valuation] Source: NSS2 table p
• [description] Source: NSS1 table x
• [adres] Source: NSS1 table y
Business Aliases:
• BA-NSS1: key lookup for NSS1 source tables
• BA-NSS2: key lookup for NSS2 source tables
Example-data
BA-NSS1: key lookup for NSS1 source tables
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …
NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …
BA important details
• Has a 1 : (0,1) relation with a Business Concept
• More difficult to be virtualised. Lookup table should be kept small
• Therefore: DO NOT do key lookup in Concept Context entity
• Load / generate together with BC
• Preferably 'in memory' somehow…
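The translation a Business Alias performs can be sketched as an in-memory lookup. The surrogate values come from the BA-NSS1 example table; the function name `to_business_key`, the column names `station_id` and `bk_nsr_station`, and the sample source row are hypothetical. The sketch only shows the translation itself, not where in the load process it should run.

```python
# BA-NSS1: NSS1 surrogate key -> business key of NSR-Station (values from the example table)
ba_nss1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(source_row: dict, alias: dict, surrogate_column: str) -> dict:
    """Replace a source's technical key by the business key found in the alias lookup."""
    row = dict(source_row)                    # leave the delivered row untouched
    surrogate = row.pop(surrogate_column)
    row["bk_nsr_station"] = alias[surrogate]  # raises KeyError for an unknown surrogate
    return row


# Hypothetical NSS1 source row that only carries the technical key
source_row = {"station_id": 123522, "postadres_postcode": "3500GJ"}
translated = to_business_key(source_row, ba_nss1, "station_id")
```

Keeping the alias as a small dictionary mirrors the advice to keep the lookup table small and preferably in memory.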
3C - Details
BC-Timeline
• Integrate the validity timelines of Concept Contexts belonging to a Business Concept
• Like a Data Vault Point-in-time construct
• But: mandatory
• And with a clearly defined and performant approach
Example
NSR-Station, [valuation] Source: NSS2 table p
BK_NSR-Station  WOZ waarde (WOZ value)  Waarde Ratingbureau X (value, rating agency X)  META_Laad_dts      META_Laad_eind_dts
Ut              20 million              18 million                                      1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million              18 million                                      1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million              23 million                                      1-3-2016 22:00:00  31-12-9999 00:00:00
BCT
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
NSR-Station, [adres] Source: NSS1 table y
BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                  …  …            …                  …
modelling
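The combined timeline in the BCT table can be reproduced with a short sketch. This is one possible approach under stated assumptions, not the deck's 'clearly defined and performant approach': the function name `bc_timeline` is hypothetical, each Concept Context is reduced to its (business key, load timestamp) pairs, every timeline is assumed to be gap-free, and a period ends one second before the next one starts.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)


def bc_timeline(*context_timelines):
    """Merge the load timestamps of several Concept Contexts into one
    validity timeline per business key: {key: [(start, end), ...]}."""
    starts = {}
    for timeline in context_timelines:
        for business_key, load_dts in timeline:
            starts.setdefault(business_key, set()).add(load_dts)
    combined = {}
    for business_key, load_dts_set in starts.items():
        ordered = sorted(load_dts_set)
        # Each period ends one second before the next starts; the last stays open.
        ends = [nxt - timedelta(seconds=1) for nxt in ordered[1:]] + [HIGH_DATE]
        combined[business_key] = list(zip(ordered, ends))
    return combined


# Load timestamps from the [adres] and [valuation] example tables
adres = [
    ("Ut", datetime(2013, 6, 5, 22)),
    ("Ut", datetime(2015, 7, 4, 22)),
    ("Asd", datetime(2013, 6, 5, 22)),
]
valuation = [
    ("Ut", datetime(2014, 1, 1, 22)),
    ("Ut", datetime(2015, 1, 1, 22)),
    ("Ut", datetime(2016, 3, 1, 22)),
]

timeline = bc_timeline(adres, valuation)
for start, end in timeline["Ut"]:
    print("Ut", start, end)
```

For station Ut this yields five periods, matching the five Ut rows of the BCT table; Asd has only one context row and therefore a single open-ended period.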
What makes PS-3C a different Ensemble?
[Diagram: Business Concepts X and Y with their Concept Contexts, a Connector, Business Alias A and B, and BC-Timeline X]
1) Explicitly splitting the work
• Data + History
• Subjects + Integration
2) NO HASHED BUSINESS KEYS or surrogate keys
• Only concatenated ones
httpwwwcannabisculturecomfilesimages6hashbrickJPG
3) Fewer joins
• Relation + Technical validity timeline + Relation context, together in one entity
Photo credit: Public Domain
4) Explicit HELPER entities
• Business Alias
• Business Concept Timeline
• + explicitly define Driving key(s)
Photo credit: Public Domain
Hope to AVOID this…
http://xkcd.com/927
PS-3C: A new PROPOSED ensemble modelling technique
Help needed
Questions?
About Me
lsquoHead of BIrsquo Spillgames
Certified Data Vault modeler since 2009
Contact details nllinkedincominrogierwerschkull
rogierwerschkullgmailcom
rwerschkull
About Me
lsquoHead of BIrsquo Spilgames
Certified Data Vault modeler since 2009
Contact details nllinkedincominrogierwerschkull
rogierwerschkullgmailcom
rwerschkull
Ensemble data Modellinghellip
rwerschkull
nllinkedincominrogierwerschkull
WHY
Another Ensemble
Wersquove got loads already
httptopofmindssewp2014ensemble-modeling-forelasningar-och-lars-ronnbackrwerschkull
nllinkedincominrogierwerschkull
13600
9970
37073 5
0
2000
4000
6000
8000
10000
12000
14000
16000
Data Vault modeling+data warehouse
Anchor modeling+data warehouse
Hyper agility +datawarehouse
Focal point modeling+data warehouse
Head version modeling+data warehouse
Search Hits on Google 31-8-2016
Ensemble
Popularity
Data Vault
Ran into Problems using
Head-Version
Anchor Modelling
rwerschkull
nllinkedincominrogierwerschkull
WHAT problemS
Photo My ownhelliphelliprwerschkull
nllinkedincominrogierwerschkull
problem
rwerschkull
nllinkedincominrogierwerschkull
Not Build for
BiG data lake
Data CentrICITY
Photo credit Lake Public Domain httpwwwwriteupsorgstar-trek-brent-spiner-datarwerschkull
nllinkedincominrogierwerschkull
lsquoData may first be stored in a
data lake so that it can be explored cleaned and prepared
If it can be structured in a relational format (basically rows and columns) and needs to be used frequently and kept highly secure it may go into a
data warehouse
If it stops being used frequently it may go back to a HDFS
(Hadoop Distributed File System)-based archiversquo
Data Centric data first THOMAS H DAVENPORT WALL STEET JOURNAL OF 3-6-2015
httpblogswsjcomcio20150603the-shift-to-a-new-data-architecture rwerschkull
nllinkedincominrogierwerschkull
Systems Like
rwerschkull
nllinkedincominrogierwerschkull
Data Flood
Photo credit Kurayba (httpswwwflickrcomphotos48503330N0828564454666 )
under cc licence (httpscreativecommonsorglicensesby-sa20)
rwerschkull
nllinkedincominrogierwerschkull
The possible resulthellip
Photo credit httpshighfiveexportswordpresscom201006253000-pieces-lego-mix-specialty-pieces-rare-pieces-bricks-blocks-
parts-more-ultimate-lot-of-lego-parts-pieces-lego-for-sale-lego-batman-lego-starwars-lego-technic-lego-minifigur
rwerschkull
nllinkedincominrogierwerschkull
But isNrsquot Data Vault v2
lsquomade for
Big data centric
systemsrsquorwerschkull
nllinkedincominrogierwerschkull
In
DV2you still
do thisin one go
Subject Oriented
Integrated
Time Variant Non-Volatile
EDW
rwerschkull
nllinkedincominrogierwerschkull
Coding =
A lot like modelling
Being Data Centric
conflictswith thecomplex Data
MODELLINGwork
httpxkcdcom844
problem
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Less Mature
JOINoptimizers
Photo credit Public Domainrwerschkull
nllinkedincominrogierwerschkull
Key-Value
Document
Column Family
NoSQL Databases
+SQL on Hadoop solutions
Do NOT like joins
Photo credit Public Domainrwerschkull
nllinkedincominrogierwerschkull
This
REALLYcomplicates
AnchoRModeling
rwerschkull
nllinkedincominrogierwerschkull
And Personally
HUB
SAT
LINK
This one
too(Link Satellite)
HUB
HUB
SAT
SAT
rwerschkull
nllinkedincominrogierwerschkull
problem
rwerschkull
nllinkedincominrogierwerschkull
a) HASHING OF Business keys
Rolling
Stock Nr Datetime Sensor Id Value Concatenated Business Key
Key
Len MD5 Hash
Key
Len
8739
2015-01-22
013427 72A1_FINV 123 8739|2015-01-22 013427|72A1_FINV 34 86ae4c6b0e2e2d5a13a0d11440529aeb 32
8739
2015-01-22
013427 72A1_SLDET 100 8739|2015-01-22 013427|72A1_SLDET 35 51ce9bc292eef407bd7c91a52eebcf2e 32
8739
2015-01-22
013432 72A1_FINV 126 8739|2015-01-22 013432|72A1_FINV 34 9482a41c1fecc4c64b8c437af6cc85e8 32
8739
2015-01-22
013432
13A8_MW_UB
AT_VT 5 8739|2015-01-22 013432|13A8_MW_UBAT_VT 42 e4160914ee55ce0b93f87b23366a0ce3 32
8674
2015-01-22
013426 72A1_FINV 6 8674|2015-01-22 013426|72A1_FINV 34 fcb3e7c8c91e44ce396d908a4948ca65 32
8674
2015-01-22
013426
16A1_HSVER
OND 7 8674|2015-01-22 013426|16A1_HSVEROND 38 fe9098c8c291ad56af5c8afae5169196 32
Loses statistical Informationregarding the data distributionQuery optimizers do not like thishellip
Column family Document and Key-value databases need a
good (natural) sharding key for (partial) key-
lookups
Hashinghelliphellip
httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2rwerschkull
nllinkedincominrogierwerschkull
Surrogates keys require
centralized coordination
hellipand thus can impact the overall systemrsquos scalability and availability
A lot of MPP NoSQL databases simply do not have themhellip
B) Surrogate BuSINESS keys
rwerschkull
nllinkedincominrogierwerschkull
Then Some Inspiration
httproelantvoscomblogp=1119
rwerschkull
nllinkedincominrogierwerschkull
lsquoIn my opinion the answer lies in the adoption of the
persistent (Historical) Staging Area concept
(also known as Historical Staging or the History Area)
This basically adopts the fundamentals of a Data Warehousersquo
lsquoThe Historical Staging Area effectively lsquoactsrsquo as
Data Lake but in a better defined form as data deltas and
event datetimes are taken into accountrsquo
So
Time Variant
amp
Non Volatile
Subject Oriented
amp
Integrated
EDW
Subject Oriented Integrated
Time Variant Non-Volatile
EDW
rwerschkull
nllinkedincominrogierwerschkull
Could be a
Data LAKE
VirtualisedEnsemble
Tier
EDW
Time Variant
amp
Non Volatile
Subject Oriented
amp
Integrated
EDW
rwerschkull
nllinkedincominrogierwerschkull
How
Does PS-3C Work
rwerschkull
nllinkedincominrogierwerschkullPhoto credit Public Domain
StagingArea
EDWInformation
Marts
Focus of Current ensemble EDWrsquos
rwerschkull
nllinkedincominrogierwerschkull
Persistent StagingArea HSA =
Data LibraryEDW
Information Marts
Splitting the work
rwerschkull
nllinkedincominrogierwerschkull
Persitent
Staging
-
Concept
Context
Connector
Business
Concept
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
Identify source event stream Primary or Unique KeyUse source metadata for this
Automate the building of a PS lsquoaround this keyrsquo Take all columns
Historize using SCD-2 approach
Persistent Staging - how
rwerschkull
nllinkedincominrogierwerschkull
Entity levelUnique key
Functional Description
Delivering party
Owner Responsible
MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)
hellip
Persistent Staging Metadata-1
rwerschkull
nllinkedincominrogierwerschkull
Column level [Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR delete as new record
[Source system] on table file level (lowest possible)
Load End Date Timestamp possible but difficulthellipRequires updates
Persistent Staging Metadata-2
rwerschkull
nllinkedincominrogierwerschkull
ACID is possible in HIVE
ACID Makes Updates possibleBy registering updates as lsquonew datarsquo
Reconciliation compacting when idle at user command
Use ORC files
PLUS changing the HIVE configurationhellip
UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)
rwerschkull
nllinkedincominrogierwerschkull
HivePut semi structured data = variable columns in MAP data type
OR use Data storage type that supports schema-evolutionAVRO (ORC in development)
Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo
Schema can be different for every row
What about SEMI-STRUCTURED Data
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
3C - how
rwerschkull
nllinkedincominrogierwerschkull
Always starts with Conceptual data modeling
NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic
No Link Satellites
No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo
Explicit Helper entities
Like Data Vault(2) BUT
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
a UNIQUE Domain specific point of integration
hellipa business entity
hellipwithin itrsquos own domain
hellipdoes not necessarily need to be Enterprise Wide
Business Concept (BC)
rwerschkull
nllinkedincominrogierwerschkull
Why not lsquoenterprise widersquoCompany
Customer
Sales Customer
International
Sales Customer
Local
Sales Customer
Marketing Customer
Customer hellip
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description]
[Owner Responsible]
Column level [Load Date Timestamp]
[Source system] on table file level (lowest possible)
Business Concept Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example-Data
NSR-Station
NS-
Travelcard
NS-
Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
hellip
Business
Key
META_Source META_Load_dts
Ut NSS1_y 5-6-2015 220000
Asd NSS1_y 5-6-2015 220000
Asa NSS2_p 6-6-2015 220000
hellip hellip hellip
NS-
Traveller
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
hellip
Business Key
CRM-RW123456
CRM-LAS224466
hellip
rwerschkull
nllinkedincominrogierwerschkull
Most easy entity to be virtualised(if performance allows)
No Hashing amp No surrogateBUSINESSKEYS
(not by default at least)
BC important notes
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Containts Context about a Concept In a historical way
hellipLike a Data Vault Satellite
Every CC belongs to only one BC
Seperate entity per source system table stream
Concept Context (CC)
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description] [Owner Responsible]
Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)
Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record
Concept Context Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example
NSR-StationNS-
Travelcard
NS-
TrainseriesNS-
Traveller
[valuation]Source NSS2 table p
[description]SourceNSS1 table x
[adres]Source NSS1 table y
[description]Source NTR table q
[ovchip_
personal]Source NSR table r
[ovchip_
on-usageSource NSR table s
[personal_
details]Source NSR table t
[adres_
data]Source NSR table t rwerschkull
nllinkedincominrogierwerschkull
Example-dataBK_NSR-
Station
Postadres_
postcode
GPS hellip META_
source
META_Load_dts META_Load_end_dts META_Deleted_Ind
Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0
Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0
Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0
Asa hellip hellip hellip hellip hellip hellip
NSR-Station
[adres]Source NSS1 table y
More difficult to be virtualisedDepends on semantic gap with source
But do make virtual when lsquostreaming datarsquo is necessary
Because we have PS layerExposing all columns not necessary
Refactoring is more easyhellip
BC important notes
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Relations between Concepts + Context
In a historical way
hellipMerger of Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point
in time
Connector (C)
rwerschkull
nllinkedincominrogierwerschkull
Explicitly defining a driving key as metadatahellip
Gives business understanding
Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)
bull is not registered as a new lsquoconnectionrsquo
Connector Driving key
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description] [Owner Responsible]
Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)
Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record
Connector Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example
NSR-StationNS-
Travelcard
NS-
TrainseriesNS-
Traveller
[valuation]Source NSS2 table p
[description]SourceNSS1 table x
[adres]Source NSS1 table y
[description]Source NTR table q
[ovchip_
personal]Source NSR table r
[ovchip_
on-usageSource NSR table s
[personal_
details]Source NSR table t
[adres_
data]Source NSR table t rwerschkull
Ensemble data modelling…
WHY another Ensemble?
We've got loads already
httptopofmindssewp2014ensemble-modeling-forelasningar-och-lars-ronnback
Ensemble popularity: search hits on Google, 31-8-2016 (bar chart, axis 0 to 16000; data labels as extracted: 13600, 9970, 37073 5)
Search terms: Data Vault modeling + data warehouse; Anchor modeling + data warehouse; Hyper agility + datawarehouse; Focal point modeling + data warehouse; Head version modeling + data warehouse
Ran into problems using
Data Vault
Head-Version
Anchor Modelling
WHAT problems?
Photo: my own…
Problem
Not built for big data lake data centricity
Photo credit: Lake, Public Domain; httpwwwwriteupsorgstar-trek-brent-spiner-data
'Data may first be stored in a data lake so that it can be explored, cleaned and prepared. If it can be structured in a relational format (basically rows and columns) and needs to be used frequently and kept highly secure, it may go into a data warehouse. If it stops being used frequently, it may go back to a HDFS (Hadoop Distributed File System)-based archive.'
Data centric = data first. Thomas H. Davenport, Wall Street Journal, 3-6-2015
httpblogswsjcomcio20150603the-shift-to-a-new-data-architecture
Systems like
Data Flood
Photo credit: Kurayba (httpswwwflickrcomphotos48503330N0828564454666), under CC licence (https://creativecommons.org/licenses/by-sa/2.0)
The possible result…
Photo credit: httpshighfiveexportswordpresscom201006253000-pieces-lego-mix-specialty-pieces-rare-pieces-bricks-blocks-parts-more-ultimate-lot-of-lego-parts-pieces-lego-for-sale-lego-batman-lego-starwars-lego-technic-lego-minifigur
But isn't Data Vault v2 'made for big data centric systems'?
In DV2 you still do this in one go:
Subject Oriented, Integrated, Time Variant, Non-Volatile EDW
Coding = a lot like modelling
Being data centric conflicts with the complex data MODELLING work
http://xkcd.com/844
Problem
Less mature JOIN optimizers
Photo credit: Public Domain
Key-Value, Document and Column Family NoSQL databases + SQL-on-Hadoop solutions do NOT like joins
Photo credit: Public Domain
This REALLY complicates Anchor Modeling
And personally: this one too (Link Satellite)
(diagram: HUBs with their SATs, joined by a LINK that has its own SAT, the Link Satellite)
Problem
a) HASHING of business keys
Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32
Hashing…
Loses statistical information regarding the data distribution. Query optimizers do not like this…
Column family, Document and Key-value databases need a good (natural) sharding key for (partial) key-lookups
httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2
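The effect of hashing on partial key lookups can be illustrated with a small sketch. It uses Python's standard hashlib; the helper names (concat_key, md5_key) and the colon-separated time format are assumptions made for this illustration, not part of the original slides.

```python
import hashlib

def concat_key(*parts, sep="|"):
    """Concatenate business key parts into one natural key."""
    return sep.join(str(p) for p in parts)

def md5_key(key):
    """Return the 32-character MD5 hex digest of a key."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()

# (rolling stock nr, datetime, sensor id); the time format is an assumption
readings = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:27", "72A1_SLDET"),
    ("8674", "2015-01-22 01:34:26", "72A1_FINV"),
]
natural_keys = [concat_key(*r) for r in readings]
hashed_keys = [md5_key(k) for k in natural_keys]

# A partial (prefix) lookup works on the natural key...
prefix = "8739|"
natural_hits = [k for k in natural_keys if k.startswith(prefix)]

# ...but the hash of the prefix says nothing about the hashes of the full keys.
hashed_hits = [h for h in hashed_keys if h.startswith(md5_key(prefix))]
```

The natural key keeps its leading component, so rows of one rolling stock number stay addressable together; the hashed key always has the same length and no usable prefix.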
b) Surrogate business keys
Surrogate keys require centralized coordination
…and thus can impact the overall system's scalability and availability
A lot of MPP / NoSQL databases simply do not have them…
Then some inspiration
httproelantvoscomblogp=1119
'In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse.'
'The Historical Staging Area effectively 'acts' as Data Lake, but in a better defined form, as data deltas and event datetimes are taken into account.'
So the EDW (Subject Oriented, Integrated, Time Variant, Non-Volatile) splits into two parts:
Time Variant & Non Volatile
Subject Oriented & Integrated
Time Variant & Non Volatile: could be a Data LAKE
Subject Oriented & Integrated: virtualised Ensemble tier
Together: EDW
How does PS-3C work?
Focus of current ensemble EDWs
Staging Area, EDW, Information Marts
Photo credit: Public Domain
Splitting the work
Persistent Staging Area (HSA) = Data Library, EDW, Information Marts
Persistent Staging - Concept Context, Connector, Business Concept
Modelling
Persistent Staging - how
Identify the Primary or Unique Key of the source / event stream. Use source metadata for this
Automate the building of a PS 'around this key'. Take all columns
Historize using an SCD-2 approach
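The SCD-2 historization step can be sketched as follows. This is a minimal in-memory illustration, not the author's implementation: the function name historize, the row layout and the convention of closing a row one second before the next load are assumptions (the last one mirrors the example data in this deck). Deletes are not handled.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # open-ended validity

def historize(history, delivery, key_cols, load_dts):
    """Apply one delivery to an SCD-2 history list, keyed on key_cols."""
    current = {
        tuple(r[c] for c in key_cols): r
        for r in history
        if r["META_Load_end_dts"] == HIGH_DATE
    }
    for row in delivery:
        key = tuple(row[c] for c in key_cols)
        old = current.get(key)
        if old is not None:
            payload = {k: v for k, v in old.items() if not k.startswith("META_")}
            if payload == row:
                continue  # unchanged: keep the open row as it is
            # changed: close the old version just before the new load
            old["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
        history.append({**row, "META_Load_dts": load_dts,
                        "META_Load_end_dts": HIGH_DATE})
    return history

history = []
historize(history, [{"station": "Ut", "postcode": "3500GJ"}],
          ["station"], datetime(2013, 6, 5, 22))
historize(history, [{"station": "Ut", "postcode": "3500GJ"}],
          ["station"], datetime(2014, 6, 5, 22))   # no change, no new row
historize(history, [{"station": "Ut", "postcode": "3511 CE"}],
          ["station"], datetime(2015, 7, 4, 22))   # change, new version
```

Because all columns are taken and only the key comes from source metadata, a routine like this can be generated per source table without modelling decisions.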
Persistent Staging Metadata-1
Entity level:
Unique key
Functional description
Delivering party
Owner / Responsible
MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
…
Persistent Staging Metadata-2
Column level:
[Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR register a delete as a new record
[Source system] on table / file level (lowest possible)
Load End Date Timestamp: possible but difficult… Requires updates
UPDATES IN HIVE (isn't HDFS append-only?)
ACID is possible in Hive
ACID makes updates possible by registering updates as 'new data'
Reconciliation (compacting) happens when idle or at user command
Use ORC files
PLUS changing the Hive configuration…
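The idea of registering updates as new data and reconciling them later can be shown with a purely conceptual sketch. It is not Hive's implementation or API; write_delta, compact and the row layout are hypothetical names used only to illustrate the append-then-compact pattern.

```python
def write_delta(delta_log, txn_id, key, values):
    """Register an update (values dict) or a delete (values=None) as new data."""
    delta_log.append({"txn": txn_id, "key": key, "values": values})

def compact(base, delta_log):
    """Merge base rows and deltas into a new base; later transactions win."""
    merged = dict(base)
    for delta in sorted(delta_log, key=lambda d: d["txn"]):
        if delta["values"] is None:
            merged.pop(delta["key"], None)
        else:
            merged[delta["key"]] = delta["values"]
    return merged

# The base is never modified in place; changes are only appended.
base = {"Ut": {"postcode": "3500GJ"}, "Asd": {"postcode": "1012 AB"}}
deltas = []
write_delta(deltas, 1, "Ut", {"postcode": "3511 CE"})  # update
write_delta(deltas, 2, "Asd", None)                    # delete
new_base = compact(base, deltas)
```

Until compaction runs, a reader has to combine base and deltas on the fly, which is why the timing of the reconciliation matters for performance.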
What about SEMI-STRUCTURED data?
Hive: put semi-structured data (= variable columns) in a MAP data type
OR use a data storage type that supports schema evolution: AVRO (ORC in development)
Or HBase… It only has one data type (byte); the schema is 'applied'
Schema can be different for every row
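The first option, fixed columns plus a map for the variable part, can be sketched in a few lines. The column names and the to_ps_row helper are assumptions for illustration; values are stored as strings here only to keep the map uniformly typed.

```python
FIXED_COLUMNS = ("station", "META_Load_dts")  # hypothetical fixed part

def to_ps_row(record):
    """Split a record into fixed columns plus a map of whatever else it carries."""
    row = {c: record.get(c) for c in FIXED_COLUMNS}
    row["attributes"] = {
        k: str(v) for k, v in record.items() if k not in FIXED_COLUMNS
    }
    return row

# Two records with different sets of variable columns fit the same layout.
row_a = to_ps_row({"station": "Ut", "META_Load_dts": "2015-06-05 22:00:00",
                   "platforms": 16})
row_b = to_ps_row({"station": "Asd", "META_Load_dts": "2015-06-05 22:00:00",
                   "bike_parking": True, "shops": 40})
```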
Modelling
3C - how
Like Data Vault (2), BUT:
Always starts with conceptual data modeling
NOT the primary location of data & history. Virtualised (only if performance allows). Should be deterministic
No Link Satellites
No surrogate or hash keys, only 'Concatenated Natural Business Keys'
Explicit helper entities
3C - Details
(model overview diagram: Business Concept X with Concept Context X-A and X-B; Business Concept Y with Concept Context Y-A, Y-B and Y-C; Connector; Business Alias A; Business Alias B; BC-Timeline X)
Business Concept (BC)
a UNIQUE, domain-specific point of integration
…a business entity
…within its own domain
…does not necessarily need to be Enterprise Wide
Why not 'enterprise wide'?
Company Customer
Sales Customer
International Sales Customer
Local Sales Customer
Marketing Customer
Customer …
Business Concept Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Example data
NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …
NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
…
NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…
NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
…
BC important notes
Easiest entity to virtualise (if performance allows)
No hashing & no surrogate BUSINESS KEYS (not by default, at least)
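A virtualised Business Concept can be thought of as a deterministic function over the Persistent Staging rows. The sketch below assumes hypothetical staging columns (type, series_nr) and a hypothetical source name NTR_q; it derives concatenated business keys and keeps the first source and load timestamp seen for each.

```python
from datetime import datetime

def business_concept(ps_rows, key_cols, sep="|"):
    """Derive Business Concept rows (one per business key) from staging rows."""
    bc = {}
    for row in sorted(ps_rows, key=lambda r: r["META_Load_dts"]):
        bk = sep.join(str(row[c]) for c in key_cols)
        if bk not in bc:  # keep the first time the key was seen
            bc[bk] = {"Business Key": bk,
                      "META_Source": row["META_Source"],
                      "META_Load_dts": row["META_Load_dts"]}
    return list(bc.values())

ps_trainseries = [
    {"type": "IC", "series_nr": 855, "META_Source": "NTR_q",
     "META_Load_dts": datetime(2015, 6, 6, 22)},
    {"type": "IC", "series_nr": 855, "META_Source": "NTR_q",
     "META_Load_dts": datetime(2015, 6, 5, 22)},
    {"type": "Sp", "series_nr": 7455, "META_Source": "NTR_q",
     "META_Load_dts": datetime(2015, 6, 5, 22)},
]
bc_trainseries = business_concept(ps_trainseries, ["type", "series_nr"])
```

No key generator or hash function is involved, so re-running the derivation on the same staging data yields the same keys.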
Concept Context (CC)
Contains context about a Concept, in a historical way
…like a Data Vault Satellite
Every CC belongs to only one BC
Separate entity per source system table / stream
Concept Context Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
Example
NSR-Station: [valuation] Source NSS2 table p; [description] Source NSS1 table x; [adres] Source NSS1 table y
NS-Trainseries: [description] Source NTR table q
NS-Travelcard: [ovchip_personal] Source NSR table r; [ovchip_on-usage] Source NSR table s
NS-Traveller: [personal_details] Source NSR table t; [adres_data] Source NSR table t
Example data: NSR-Station, [adres] Source NSS1 table y
BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                 …  …            …                  …
CC important notes
More difficult to virtualise: depends on the semantic gap with the source
But do make it virtual when 'streaming data' is necessary
Because we have the PS layer, exposing all columns is not necessary
Refactoring is easier…
Connector (C)
Relations between Concepts + Context, in a historical way
…merger of Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time
Connector Driving key
Explicitly defining a driving key as metadata…
Gives business understanding
Makes it possible for a Connector to correctly handle delta data deliveries…
• so that a change (on the driving key)
• is not registered as a new 'connection'
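How a driving key lets a Connector absorb a delta delivery can be sketched as below. This is an illustrative assumption about the loading logic, not the author's code: load_connector and the column names are hypothetical, and the row for a still-open journey (no check-out yet) is invented for the example.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)

def load_connector(connector, delta_rows, driving_key, load_dts):
    """Apply a delta delivery; rows sharing a driving key are versions of one connection."""
    open_rows = {
        tuple(r[c] for c in driving_key): r
        for r in connector
        if r["META_Load_end_dts"] == HIGH_DATE
    }
    for row in delta_rows:
        dk = tuple(row[c] for c in driving_key)
        old = open_rows.get(dk)
        if old is not None:
            if all(old.get(c) == v for c, v in row.items()):
                continue  # nothing changed for this connection
            # same connection, new state: close the previous version
            old["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
        new = {**row, "META_Load_dts": load_dts, "META_Load_end_dts": HIGH_DATE}
        connector.append(new)
        open_rows[dk] = new
    return connector

driving_key = ["travelcard", "checkin"]
movement = []
checkin = datetime(2016, 4, 5, 8, 49, 32)
load_connector(movement, [{"travelcard": "3528 0234 2073 1234", "checkin": checkin,
                           "station_from": "Asd", "station_to": None, "checkout": None}],
               driving_key, datetime(2016, 4, 5, 22))
load_connector(movement, [{"travelcard": "3528 0234 2073 1234", "checkin": checkin,
                           "station_from": "Asd", "station_to": "Ut",
                           "checkout": datetime(2016, 4, 5, 9, 40, 12)}],
               driving_key, datetime(2016, 4, 6, 22))
open_now = [r for r in movement if r["META_Load_end_dts"] == HIGH_DATE]
```

Without the driving key, the second delivery would look like a second, unrelated travel movement instead of a new version of the first one.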
Connector Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
Example
(diagram: the Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts, plus the Connector NSR-Travelmovement)
NSR-Travelmovement: Checkin timestamp, from, to
Driving Key: NS-Travelcard + Checkin timestamp
Example data (NSR-Station, NS-Travelcard, NS-Trainseries)
NSR-Travelmovement (Checkin timestamp, from, to)
BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00
Are we there yet? No
We add two mandatory HELPER entities
Modelling
Business Alias
To help switching from sources that are tied together by technical (surrogate) keys…
…to a Business Key based model
It's a LOOKUP table that translates the technical key to the Business Key
Example: NSR-Station
[valuation] Source NSS2 table p
[description] Source NSS1 table x
[adres] Source NSS1 table y
BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables
Example data: NSR-Station
BA-NSS1 (key lookup for NSS1 source tables)
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …
NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …
BA important details
Has a 1-to-(0,1) relation with a Business Concept
More difficult to virtualise; the lookup table should be kept small
Therefore DO NOT do the key lookup in a Concept Context entity
Load / generate together with the BC
Preferably 'in memory' somehow…
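A Business Alias kept in memory behaves like a dictionary from the source's surrogate key to the business key. The sketch below uses the BA-NSS1 example values; the function to_business_key, the source column names and the handling of unknown surrogates are assumptions for illustration.

```python
# BA-NSS1 as an in-memory lookup: NSS1 surrogate key -> business key
ba_nss1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}

def to_business_key(source_rows, alias, surrogate_col, bk_col="BK_NSR-Station"):
    """Replace the source surrogate key by the business key; collect unknown keys."""
    translated, unknown = [], []
    for row in source_rows:
        bk = alias.get(row[surrogate_col])
        if bk is None:
            unknown.append(row)  # no alias yet: do not guess a key
            continue
        out = {k: v for k, v in row.items() if k != surrogate_col}
        out[bk_col] = bk
        translated.append(out)
    return translated, unknown

nss1_rows = [
    {"station_id": 123522, "postcode": "3511 CE"},
    {"station_id": 999999, "postcode": "unknown"},  # hypothetical unmatched row
]
translated, unknown = to_business_key(nss1_rows, ba_nss1, "station_id")
```

Doing the translation once, next to the Business Concept, keeps the lookup out of every Concept Context load.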
BC-Timeline
Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
Like a Data Vault Point-in-time construct
But mandatory
And with a clearly defined and performant approach
Example: NSR-Station
[valuation] Source NSS2 table p
BK_NSR-Station  WOZ value   Value rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00
[adres] Source NSS1 table y
BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                 …  …            …                  …
BCT
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
Modelling
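The combined timeline in the BCT example can be reproduced by collecting every load timestamp of every Concept Context per business key and cutting the timeline at each of them. The sketch below is one possible way to do that, under assumed names (bc_timeline, bk, load_dts); it is not presented as the author's algorithm.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)

def bc_timeline(*concept_contexts):
    """Merge the validity start points of several Concept Contexts per business key."""
    starts = {}
    for cc in concept_contexts:
        for row in cc:
            starts.setdefault(row["bk"], set()).add(row["load_dts"])
    timeline = []
    for bk in sorted(starts):
        points = sorted(starts[bk])
        for i, start in enumerate(points):
            # each slice ends one second before the next one starts
            end = (points[i + 1] - timedelta(seconds=1)
                   if i + 1 < len(points) else HIGH_DATE)
            timeline.append((bk, start, end))
    return timeline

valuation = [{"bk": "Ut", "load_dts": datetime(2014, 1, 1, 22)},
             {"bk": "Ut", "load_dts": datetime(2015, 1, 1, 22)},
             {"bk": "Ut", "load_dts": datetime(2016, 3, 1, 22)}]
adres = [{"bk": "Ut", "load_dts": datetime(2013, 6, 5, 22)},
         {"bk": "Ut", "load_dts": datetime(2015, 7, 4, 22)},
         {"bk": "Asd", "load_dts": datetime(2013, 6, 5, 22)}]
bct = bc_timeline(valuation, adres)
```

For Ut this yields five slices and for Asd one, the same row counts as in the BCT example; a query for any point in time can then join each Concept Context on a single, precomputed interval.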
What makes PS-3C a different Ensemble?
1) Explicitly splitting the work
Data + History
Subjects + Integration
2) NO HASHED BUSINESS KEYS or surrogate keys
Only concatenated ones
Image: httpwwwcannabisculturecomfilesimages6hashbrickJPG
3) Fewer joins
Relation + technical validity timeline + relation context, together in one entity
Photo credit: Public Domain
4) Explicit HELPER entities
Business Alias
Business Concept Timeline
+ explicitly define driving key(s)
Photo credit: Public Domain
Hope to AVOID this…
http://xkcd.com/927
PS-3C: a new PROPOSED ensemble modelling technique
Help needed
Questions?
About Me
'Head of BI', Spilgames
Certified Data Vault modeler since 2009
Contact details: nl.linkedin.com/in/rogierwerschkull, rogierwerschkull@gmail.com, rwerschkull
WHY
Another Ensemble
Wersquove got loads already
httptopofmindssewp2014ensemble-modeling-forelasningar-och-lars-ronnbackrwerschkull
nllinkedincominrogierwerschkull
13600
9970
37073 5
0
2000
4000
6000
8000
10000
12000
14000
16000
Data Vault modeling+data warehouse
Anchor modeling+data warehouse
Hyper agility +datawarehouse
Focal point modeling+data warehouse
Head version modeling+data warehouse
Search Hits on Google 31-8-2016
Ensemble
Popularity
Data Vault
Ran into Problems using
Head-Version
Anchor Modelling
rwerschkull
nllinkedincominrogierwerschkull
WHAT problemS
Photo My ownhelliphelliprwerschkull
nllinkedincominrogierwerschkull
problem
rwerschkull
nllinkedincominrogierwerschkull
Not Build for
BiG data lake
Data CentrICITY
Photo credit Lake Public Domain httpwwwwriteupsorgstar-trek-brent-spiner-datarwerschkull
nllinkedincominrogierwerschkull
lsquoData may first be stored in a
data lake so that it can be explored cleaned and prepared
If it can be structured in a relational format (basically rows and columns) and needs to be used frequently and kept highly secure it may go into a
data warehouse
If it stops being used frequently it may go back to a HDFS
(Hadoop Distributed File System)-based archiversquo
Data Centric data first THOMAS H DAVENPORT WALL STEET JOURNAL OF 3-6-2015
httpblogswsjcomcio20150603the-shift-to-a-new-data-architecture rwerschkull
nllinkedincominrogierwerschkull
Systems Like
rwerschkull
nllinkedincominrogierwerschkull
Data Flood
Photo credit Kurayba (httpswwwflickrcomphotos48503330N0828564454666 )
under cc licence (httpscreativecommonsorglicensesby-sa20)
rwerschkull
nllinkedincominrogierwerschkull
The possible resulthellip
Photo credit httpshighfiveexportswordpresscom201006253000-pieces-lego-mix-specialty-pieces-rare-pieces-bricks-blocks-
parts-more-ultimate-lot-of-lego-parts-pieces-lego-for-sale-lego-batman-lego-starwars-lego-technic-lego-minifigur
rwerschkull
nllinkedincominrogierwerschkull
But isNrsquot Data Vault v2
lsquomade for
Big data centric
systemsrsquorwerschkull
nllinkedincominrogierwerschkull
In
DV2you still
do thisin one go
Subject Oriented
Integrated
Time Variant Non-Volatile
EDW
rwerschkull
nllinkedincominrogierwerschkull
Coding =
A lot like modelling
Being Data Centric
conflictswith thecomplex Data
MODELLINGwork
httpxkcdcom844
problem
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Less Mature
JOINoptimizers
Photo credit Public Domainrwerschkull
nllinkedincominrogierwerschkull
Key-Value
Document
Column Family
NoSQL Databases
+SQL on Hadoop solutions
Do NOT like joins
Photo credit Public Domainrwerschkull
nllinkedincominrogierwerschkull
This
REALLYcomplicates
AnchoRModeling
rwerschkull
nllinkedincominrogierwerschkull
And Personally
HUB
SAT
LINK
This one
too(Link Satellite)
HUB
HUB
SAT
SAT
rwerschkull
nllinkedincominrogierwerschkull
problem
rwerschkull
nllinkedincominrogierwerschkull
a) HASHING OF Business keys
Rolling
Stock Nr Datetime Sensor Id Value Concatenated Business Key
Key
Len MD5 Hash
Key
Len
8739
2015-01-22
013427 72A1_FINV 123 8739|2015-01-22 013427|72A1_FINV 34 86ae4c6b0e2e2d5a13a0d11440529aeb 32
8739
2015-01-22
013427 72A1_SLDET 100 8739|2015-01-22 013427|72A1_SLDET 35 51ce9bc292eef407bd7c91a52eebcf2e 32
8739
2015-01-22
013432 72A1_FINV 126 8739|2015-01-22 013432|72A1_FINV 34 9482a41c1fecc4c64b8c437af6cc85e8 32
8739
2015-01-22
013432
13A8_MW_UB
AT_VT 5 8739|2015-01-22 013432|13A8_MW_UBAT_VT 42 e4160914ee55ce0b93f87b23366a0ce3 32
8674
2015-01-22
013426 72A1_FINV 6 8674|2015-01-22 013426|72A1_FINV 34 fcb3e7c8c91e44ce396d908a4948ca65 32
8674
2015-01-22
013426
16A1_HSVER
OND 7 8674|2015-01-22 013426|16A1_HSVEROND 38 fe9098c8c291ad56af5c8afae5169196 32
Loses statistical Informationregarding the data distributionQuery optimizers do not like thishellip
Column family Document and Key-value databases need a
good (natural) sharding key for (partial) key-
lookups
Hashinghelliphellip
httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2rwerschkull
nllinkedincominrogierwerschkull
Surrogates keys require
centralized coordination
hellipand thus can impact the overall systemrsquos scalability and availability
A lot of MPP NoSQL databases simply do not have themhellip
B) Surrogate BuSINESS keys
rwerschkull
nllinkedincominrogierwerschkull
Then Some Inspiration
httproelantvoscomblogp=1119
rwerschkull
nllinkedincominrogierwerschkull
lsquoIn my opinion the answer lies in the adoption of the
persistent (Historical) Staging Area concept
(also known as Historical Staging or the History Area)
This basically adopts the fundamentals of a Data Warehousersquo
lsquoThe Historical Staging Area effectively lsquoactsrsquo as
Data Lake but in a better defined form as data deltas and
event datetimes are taken into accountrsquo
So
Time Variant
amp
Non Volatile
Subject Oriented
amp
Integrated
EDW
Subject Oriented Integrated
Time Variant Non-Volatile
EDW
rwerschkull
nllinkedincominrogierwerschkull
Could be a
Data LAKE
VirtualisedEnsemble
Tier
EDW
Time Variant
amp
Non Volatile
Subject Oriented
amp
Integrated
EDW
rwerschkull
nllinkedincominrogierwerschkull
How
Does PS-3C Work
rwerschkull
nllinkedincominrogierwerschkullPhoto credit Public Domain
StagingArea
EDWInformation
Marts
Focus of Current ensemble EDWrsquos
rwerschkull
nllinkedincominrogierwerschkull
Persistent StagingArea HSA =
Data LibraryEDW
Information Marts
Splitting the work
rwerschkull
nllinkedincominrogierwerschkull
Persitent
Staging
-
Concept
Context
Connector
Business
Concept
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
Identify source event stream Primary or Unique KeyUse source metadata for this
Automate the building of a PS lsquoaround this keyrsquo Take all columns
Historize using SCD-2 approach
Persistent Staging - how
rwerschkull
nllinkedincominrogierwerschkull
Entity levelUnique key
Functional Description
Delivering party
Owner Responsible
MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)
hellip
Persistent Staging Metadata-1
rwerschkull
nllinkedincominrogierwerschkull
Column level [Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR delete as new record
[Source system] on table file level (lowest possible)
Load End Date Timestamp possible but difficulthellipRequires updates
Persistent Staging Metadata-2
rwerschkull
nllinkedincominrogierwerschkull
ACID is possible in HIVE
ACID Makes Updates possibleBy registering updates as lsquonew datarsquo
Reconciliation compacting when idle at user command
Use ORC files
PLUS changing the HIVE configurationhellip
UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)
rwerschkull
nllinkedincominrogierwerschkull
HivePut semi structured data = variable columns in MAP data type
OR use Data storage type that supports schema-evolutionAVRO (ORC in development)
Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo
Schema can be different for every row
What about SEMI-STRUCTURED Data
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
3C - how
rwerschkull
nllinkedincominrogierwerschkull
Always starts with Conceptual data modeling
NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic
No Link Satellites
No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo
Explicit Helper entities
Like Data Vault(2) BUT
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
a UNIQUE Domain specific point of integration
hellipa business entity
hellipwithin itrsquos own domain
hellipdoes not necessarily need to be Enterprise Wide
Business Concept (BC)
rwerschkull
nllinkedincominrogierwerschkull
Why not lsquoenterprise widersquoCompany
Customer
Sales Customer
International
Sales Customer
Local
Sales Customer
Marketing Customer
Customer hellip
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description]
[Owner Responsible]
Column level [Load Date Timestamp]
[Source system] on table file level (lowest possible)
Business Concept Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example-Data
NSR-Station
NS-
Travelcard
NS-
Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
hellip
Business
Key
META_Source META_Load_dts
Ut NSS1_y 5-6-2015 220000
Asd NSS1_y 5-6-2015 220000
Asa NSS2_p 6-6-2015 220000
hellip hellip hellip
NS-
Traveller
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
hellip
Business Key
CRM-RW123456
CRM-LAS224466
hellip
rwerschkull
nllinkedincominrogierwerschkull
Most easy entity to be virtualised(if performance allows)
No Hashing amp No surrogateBUSINESSKEYS
(not by default at least)
BC important notes
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Containts Context about a Concept In a historical way
hellipLike a Data Vault Satellite
Every CC belongs to only one BC
Seperate entity per source system table stream
Concept Context (CC)
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description] [Owner Responsible]
Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)
Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record
Concept Context Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example
NSR-StationNS-
Travelcard
NS-
TrainseriesNS-
Traveller
[valuation]Source NSS2 table p
[description]SourceNSS1 table x
[adres]Source NSS1 table y
[description]Source NTR table q
[ovchip_
personal]Source NSR table r
[ovchip_
on-usageSource NSR table s
[personal_
details]Source NSR table t
[adres_
data]Source NSR table t rwerschkull
nllinkedincominrogierwerschkull
Example-dataBK_NSR-
Station
Postadres_
postcode
GPS hellip META_
source
META_Load_dts META_Load_end_dts META_Deleted_Ind
Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0
Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0
Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0
Asa hellip hellip hellip hellip hellip hellip
NSR-Station
[adres]Source NSS1 table y
More difficult to be virtualisedDepends on semantic gap with source
But do make virtual when lsquostreaming datarsquo is necessary
Because we have PS layerExposing all columns not necessary
Refactoring is more easyhellip
BC important notes
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Relations between Concepts + Context
In a historical way
hellipMerger of Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point
in time
Connector (C)
rwerschkull
nllinkedincominrogierwerschkull
Explicitly defining a driving key as metadatahellip
Gives business understanding
Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)
bull is not registered as a new lsquoconnectionrsquo
Connector Driving key
rwerschkull
nllinkedincominrogierwerschkull
Connector Metadata
Entity level:
• [Description]
• [Owner / Responsible]
Column level:
• [Load Date Timestamp]
• [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
Example
Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller
Concept Contexts:
• [valuation] Source: NSS2 table p
• [description] Source: NSS1 table x
• [adres] Source: NSS1 table y
• [description] Source: NTR table q
• [ovchip_personal] Source: NSR table r
• [ovchip_on-usage] Source: NSR table s
• [personal_details] Source: NSR table t
• [adres_data] Source: NSR table t
Connector: NSR-Travelmovement (Checkin timestamp, from, to)
Driving Key: NS-Travelcard + Checkin timestamp
Example-Data: Connector NSR-Travelmovement

BK_NSR-Station-from | BK_NSR-Station-to | BK_NS_Trainseries | BK_NS-Travelcard    | Checkin timestamp | Checkout timestamp | META_Load_dts     | META_Load_end_dts
Asd                 | Ut                | IC 855            | 3528 0234 2073 1234 | 5-4-2016 8:49:32  | 5-4-2016 9:40:12   | 6-4-2016 22:00:00 | 31-12-9999 00:00:00
Ut                  | Asd               | IC 855            | 3528 0234 2073 1234 | 5-4-2016 18:10:09 | 5-4-2016 18:55:20  | 6-4-2016 22:00:00 | 31-12-9999 00:00:00
Are we there yet? No.
We add two mandatory HELPER entities.
Business Alias
• Helps switching from sources that are tied together by technical (surrogate) keys… to a Business Key based model.
• It's a LOOKUP table that translates the technical key to the Business Key.
Example
NSR-Station with Concept Contexts:
• [valuation] Source: NSS2 table p
• [description] Source: NSS1 table x
• [adres] Source: NSS1 table y
Business Aliases:
• BA-NSS1: key lookup for NSS1 source tables
• BA-NSS2: key lookup for NSS2 source tables
Example-data

NSR-Station (Business Concept)
Business Key | META_Source | META_Load_dts
Ut           | NSS1_y      | 5-6-2015 22:00:00
Asd          | NSS1_y      | 5-6-2015 22:00:00
Asa          | NSS2_p      | 6-6-2015 22:00:00
…            | …           | …

BA-NSS1 (key lookup for NSS1 source tables)
Business Key | NSS1_Surrogate_key
Ut           | 123522
Asd          | 666323
Asa          | 222443
…            | …
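As an illustration of the lookup role of a Business Alias, the sketch below translates a source row keyed by the NSS1 surrogate key into a row keyed by the business key. The surrogate values come from the example data; the function name `to_business_key`, the column name `station_id` and the sample source row are hypothetical.

```python
# Business Alias for source system NSS1: surrogate key -> business key.
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(source_row, alias, surrogate_column="station_id"):
    """Return a copy of a source row in which the technical (surrogate) key
    is replaced by the business key found in the Business Alias."""
    row = dict(source_row)
    surrogate = row.pop(surrogate_column)
    row["BK_NSR-Station"] = alias.get(surrogate)  # None when the surrogate is unknown
    return row


# Hypothetical row from an NSS1 source table that only carries the surrogate key.
nss1_row = {"station_id": 666323, "Postadres_postcode": "1012 AB"}
resolved = to_business_key(nss1_row, BA_NSS1)
```

Keeping the alias as a small in-memory mapping matches the advice to keep the lookup table small and separate from the Concept Context.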
BA important details
• Has a 1 : (0,1) relation with a Business Concept.
• More difficult to virtualise.
• The lookup table should be kept small. Therefore DO NOT do the key lookup in the Concept Context entity.
• Load / generate together with the BC.
• Preferably 'in memory' somehow…
BC-Timeline
• Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
• Like a Data Vault Point-in-time construct
• But mandatory
• And with a clearly defined and performant approach
Example
NSR-Station with Concept Contexts:
• [valuation] Source: NSS2 table p
• [adres] Source: NSS1 table y
Concept Context [valuation] (Source: NSS2 table p)
BK_NSR-Station | WOZ value  | Value rating agency X | META_Load_dts     | META_Load_end_dts
Ut             | 20 million | 18 million            | 1-1-2014 22:00:00 | 1-1-2015 21:59:59
Ut             | 22 million | 18 million            | 1-1-2015 22:00:00 | 1-3-2016 21:59:59
Ut             | 22 million | 23 million            | 1-3-2016 22:00:00 | 31-12-9999 00:00:00

Concept Context [adres] (Source: NSS1 table y)
BK_NSR-Station | Postadres_postcode | GPS               | … | META_source | META_Load_dts     | META_Load_end_dts
Ut             | 3500GJ             | 52.08954, 5.11064 | … | NSS1_y      | 5-6-2013 22:00:00 | 4-7-2015 21:59:59
Ut             | 3511 CE            | 52.37269, 4.89299 | … | NSS1_y      | 4-7-2015 22:00:00 | 31-12-9999 00:00:00
Asd            | 1012 AB            | 52.37269, 4.89299 | … | NSS1_y      | 5-6-2013 22:00:00 | 31-12-9999 00:00:00
Asa            | …                  | …                 | … | …           | …                 | …

BC-Timeline (BCT) for NSR-Station
BK_NSR-Station | Combined_Load_dts | Combined_Load_end_dts
Ut             | 5-6-2013 22:00:00 | 1-1-2014 21:59:59
Ut             | 1-1-2014 22:00:00 | 1-1-2015 21:59:59
Ut             | 1-1-2015 22:00:00 | 4-7-2015 21:59:59
Ut             | 4-7-2015 22:00:00 | 1-3-2016 21:59:59
Ut             | 1-3-2016 22:00:00 | 31-12-9999 00:00:00
Asd            | 5-6-2013 22:00:00 | 31-12-9999 00:00:00
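The combined timeline for station Ut can be reproduced with a small sketch. It is only an illustration under stated assumptions: the function name `bc_timeline` is made up, each Concept Context is assumed to have a gap-free timeline, and intervals are treated as half-open (an interval ends where the next one starts) instead of ending one second earlier as in the example data.

```python
from datetime import datetime

OPEN_END = datetime(9999, 12, 31)  # assumed marker for "still valid"


def bc_timeline(*context_timelines):
    """Merge the (load_dts, load_end_dts) intervals of several Concept Contexts
    of one business key into a single combined timeline."""
    starts = sorted({start for timeline in context_timelines for start, _ in timeline})
    ends = starts[1:] + [OPEN_END]
    return list(zip(starts, ends))


# Validity intervals for business key 'Ut', taken from the example data.
adres = [
    (datetime(2013, 6, 5, 22), datetime(2015, 7, 4, 22)),
    (datetime(2015, 7, 4, 22), OPEN_END),
]
valuation = [
    (datetime(2014, 1, 1, 22), datetime(2015, 1, 1, 22)),
    (datetime(2015, 1, 1, 22), datetime(2016, 3, 1, 22)),
    (datetime(2016, 3, 1, 22), OPEN_END),
]
timeline_ut = bc_timeline(adres, valuation)
```

Every distinct start timestamp of any context opens a new combined interval, which gives the five Ut rows of the BC-Timeline.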
What makes PS-3C a different Ensemble?
1) Explicitly splitting the work
   • Data + History
   • Subjects + Integration

2) NO HASHED BUSINESS KEYS or surrogate keys
   • Only concatenated ones
   Image: httpwwwcannabisculturecomfilesimages6hashbrickJPG

3) Fewer joins
   • Relation + Technical validity timeline + Relation context, together in one entity

4) Explicit HELPER entities
   • Business Alias
   • Business Concept Timeline
   • + explicitly define Driving key(s)
Hope to AVOID this… http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique
Help needed
Questions?

About Me
'Head of BI', Spilgames
Certified Data Vault modeler since 2009
Contact details: nl.linkedin.com/in/rogierwerschkull
rogierwerschkull@gmail.com
rwerschkull
But isn't Data Vault v2 'made for Big Data centric systems'?
In DV2 you still do this in one go:
EDW = Subject Oriented, Integrated, Time Variant, Non-Volatile
Being Data Centric conflicts with the complex data MODELLING work.
Coding = a lot like modelling (http://xkcd.com/844)

Problem
Less mature JOIN optimizers
NoSQL databases (Key-Value, Document, Column Family) + SQL-on-Hadoop solutions do NOT like joins.
This REALLY complicates Anchor Modeling.
And, personally, this one too: the Link Satellite.
[Figure: Data Vault diagram of HUBs, SATs and a LINK, with the Link Satellite marked 'This one too'.]
Problem

a) HASHING of Business keys

Rolling Stock Nr | Datetime            | Sensor Id       | Value | Concatenated Business Key                | Key Len | MD5 Hash                         | Key Len
8739             | 2015-01-22 01:34:27 | 72A1_FINV       | 123   | 8739|2015-01-22 01:34:27|72A1_FINV       | 34      | 86ae4c6b0e2e2d5a13a0d11440529aeb | 32
8739             | 2015-01-22 01:34:27 | 72A1_SLDET      | 100   | 8739|2015-01-22 01:34:27|72A1_SLDET      | 35      | 51ce9bc292eef407bd7c91a52eebcf2e | 32
8739             | 2015-01-22 01:34:32 | 72A1_FINV       | 126   | 8739|2015-01-22 01:34:32|72A1_FINV       | 34      | 9482a41c1fecc4c64b8c437af6cc85e8 | 32
8739             | 2015-01-22 01:34:32 | 13A8_MW_UBAT_VT | 5     | 8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT | 42      | e4160914ee55ce0b93f87b23366a0ce3 | 32
8674             | 2015-01-22 01:34:26 | 72A1_FINV       | 6     | 8674|2015-01-22 01:34:26|72A1_FINV       | 34      | fcb3e7c8c91e44ce396d908a4948ca65 | 32
8674             | 2015-01-22 01:34:26 | 16A1_HSVEROND   | 7     | 8674|2015-01-22 01:34:26|16A1_HSVEROND   | 38      | fe9098c8c291ad56af5c8afae5169196 | 32

Hashing…
• Loses statistical information regarding the data distribution. Query optimizers do not like this…
• Column family, Document and Key-value databases need a good (natural) sharding key for (partial) key-lookups.
http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2
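The difference between a concatenated business key and its MD5 hash can be made concrete with a short sketch. The helper names are made up for this illustration, and the timestamps are written with colons, which is an assumption about the original formatting (it does give the key length of 34 shown in the table).

```python
import hashlib


def concatenated_key(*parts, sep="|"):
    """Build a concatenated natural business key from its parts."""
    return sep.join(str(part) for part in parts)


def md5_key(*parts):
    """Hash the concatenated key; the result is always 32 hex characters."""
    return hashlib.md5(concatenated_key(*parts).encode("utf-8")).hexdigest()


def prefix_lookup(keys, prefix):
    """Partial key lookup: only meaningful on natural (unhashed) keys."""
    return [key for key in keys if key.startswith(prefix)]


readings = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:27", "72A1_SLDET"),
    ("8674", "2015-01-22 01:34:26", "72A1_FINV"),
]
natural = sorted(concatenated_key(*reading) for reading in readings)
hashed = sorted(md5_key(*reading) for reading in readings)

# All readings of rolling stock 8739 share a key prefix and sort next to each other.
stock_8739 = prefix_lookup(natural, "8739|")
```

With the natural keys, a partial lookup on the rolling stock number is a simple prefix scan; the hashed keys carry no such ordering, so the same question would need a lookup per full key.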
b) Surrogate Business keys
• Surrogate keys require centralized coordination… and thus can impact the overall system's scalability and availability.
• A lot of MPP NoSQL databases simply do not have them…
Then some inspiration: http://roelantvos.com/blog/?p=1119
'In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse.'

'The Historical Staging Area effectively 'acts' as Data Lake but in a better defined form, as data deltas and event datetimes are taken into account.'

So the EDW (Subject Oriented, Integrated, Time Variant, Non-Volatile) can be split into:
• Time Variant & Non-Volatile
• Subject Oriented & Integrated
Within the EDW:
• Time Variant & Non-Volatile: could be a Data LAKE
• Subject Oriented & Integrated: Virtualised Ensemble Tier
How does PS-3C work?
Focus of current ensemble EDWs
Staging Area -> EDW -> Information Marts

Splitting the work
Persistent Staging Area (HSA) = Data Library -> EDW -> Information Marts
PS-3C: Persistent Staging - Concept Context, Connector, Business Concept
Persistent Staging - how
• Identify the Primary or Unique Key of the source / event stream. Use source metadata for this.
• Automate the building of a PS 'around this key'. Take all columns.
• Historize using an SCD-2 approach.
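A minimal sketch of the SCD-2 style historization of a persistent staging table is given below. It is an illustration only: the function `historize`, the dict-based rows and the sample columns are assumptions, and delete detection is left out. The metadata column names follow the example data in the slides.

```python
from datetime import datetime

OPEN_END = datetime(9999, 12, 31)  # assumed marker for "still valid"


def historize(history, snapshot, key, load_dts):
    """Apply one delivery of source rows to a persistent staging table.
    Changed rows get a new version; the previous version is end-dated."""
    current = {row[key]: row for row in history if row["META_Load_end_dts"] == OPEN_END}
    for source_row in snapshot:
        old = current.get(source_row[key])
        if old is not None:
            # Compare only the source columns, not the META columns.
            if all(old[column] == value for column, value in source_row.items()):
                continue  # unchanged, nothing to store
            old["META_Load_end_dts"] = load_dts  # this is the update the slides warn about
        history.append({**source_row, "META_Load_dts": load_dts, "META_Load_end_dts": OPEN_END})
    return history


ps_station = []
historize(ps_station, [{"station": "Ut", "postcode": "3500GJ"}], "station", datetime(2013, 6, 5, 22))
historize(ps_station, [{"station": "Ut", "postcode": "3500GJ"}], "station", datetime(2014, 6, 5, 22))
historize(ps_station, [{"station": "Ut", "postcode": "3511 CE"}], "station", datetime(2015, 6, 5, 22))
```

Taking all source columns and comparing them generically is what makes this step easy to automate from source metadata; the end-dating step is the only place where an existing row has to be updated.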
Persistent Staging Metadata-1
Entity level:
• Unique key
• Functional Description
• Delivering party
• Owner / Responsible
• MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
• …

Persistent Staging Metadata-2
Column level:
• [Load Date Timestamp]
• [Load End Date Timestamp]
• [Deleted Flag] OR delete as new record
• [Source system] on table / file level (lowest possible)
Load End Date Timestamp: possible but difficult… Requires updates.
UPDATES IN HIVE (isn't HDFS append only?)
• ACID is possible in Hive.
• ACID makes updates possible by registering updates as 'new data'.
• Reconciliation / compacting happens when idle or at user command.
• Use ORC files.
• PLUS changing the Hive configuration…
What about SEMI-STRUCTURED data?
• Hive: put semi-structured data (= variable columns) in the MAP data type.
• OR use a data storage type that supports schema evolution: AVRO (ORC in development).
• Or HBase… It only has one data type (byte); the schema is 'applied'. The schema can be different for every row.
3C - how

Like Data Vault (2), BUT:
• Always starts with Conceptual data modeling
• NOT the primary location of Data & History
  • Virtualised (only if performance allows)
  • Should be deterministic
• No Link Satellites
• No Surrogate or Hash Keys, only 'Concatenated Natural Business Keys'
• Explicit Helper entities
3C - Details
Business Concept (BC)
• A UNIQUE, domain-specific point of integration
• …a business entity
• …within its own domain
• …does not necessarily need to be Enterprise Wide
Why not 'enterprise wide'?
[Figure: the many 'Customer' concepts inside one company: Company Customer, Sales Customer, International Sales Customer, Local Sales Customer, Marketing Customer, … Customer]
Business Concept Metadata
Entity level:
• [Description]
• [Owner / Responsible]
Column level:
• [Load Date Timestamp]
• [Source system] on table / file level (lowest possible)
Example-Data

NSR-Station
Business Key | META_Source | META_Load_dts
Ut           | NSS1_y      | 5-6-2015 22:00:00
Asd          | NSS1_y      | 5-6-2015 22:00:00
Asa          | NSS2_p      | 6-6-2015 22:00:00
…            | …           | …

NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
…
BC important notes
• Easiest entity to virtualise (if performance allows)
• No hashing & no surrogate BUSINESS KEYS (not by default at least)
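A 'virtualised' Business Concept can be thought of as a deterministic function over the persistent staging rows. The sketch below derives one row per concatenated natural business key. The function name, the staging columns `type` and `number`, the sample rows and the source label `NTR_q` are hypothetical and only serve the illustration.

```python
def business_concept(ps_rows, key_columns, source, sep="|"):
    """Derive Business Concept rows (one per concatenated business key)
    from persistent staging rows, keeping the earliest load timestamp."""
    first_seen = {}
    for row in ps_rows:
        business_key = sep.join(str(row[column]) for column in key_columns)
        load_dts = row["META_Load_dts"]
        if business_key not in first_seen or load_dts < first_seen[business_key]:
            first_seen[business_key] = load_dts
    return [
        {"Business Key": key, "META_Source": source, "META_Load_dts": first_seen[key]}
        for key in sorted(first_seen)
    ]


# Hypothetical persistent staging rows; ISO timestamps compare correctly as strings.
ps_trainseries = [
    {"type": "IC", "number": 855, "META_Load_dts": "2015-06-05 22:00:00"},
    {"type": "IC", "number": 855, "META_Load_dts": "2015-07-01 22:00:00"},
    {"type": "Sp", "number": 7455, "META_Load_dts": "2015-06-05 22:00:00"},
]
bc_trainseries = business_concept(ps_trainseries, ["type", "number"], source="NTR_q")
```

Since no surrogate or hash key is generated, running the function again on the same staging data gives the same keys, which is what makes virtualising this entity straightforward.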
Concept Context (CC)
• Contains context about a Concept, in a historical way
• …like a Data Vault Satellite
• Every CC belongs to only one BC
• Separate entity per source system table / stream
Concept Context Metadata
Entity level:
• [Description]
• [Owner / Responsible]
Column level:
• [Load Date Timestamp]
• [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
Example
Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller
Concept Contexts:
• [valuation] Source: NSS2 table p
• [description] Source: NSS1 table x
• [adres] Source: NSS1 table y
• [description] Source: NTR table q
• [ovchip_personal] Source: NSR table r
• [ovchip_on-usage] Source: NSR table s
• [personal_details] Source: NSR table t
• [adres_data] Source: NSR table t
Data Vault
Ran into Problems using
Head-Version
Anchor Modelling
rwerschkull
nllinkedincominrogierwerschkull
WHAT problemS
Photo My ownhelliphelliprwerschkull
nllinkedincominrogierwerschkull
problem
rwerschkull
nllinkedincominrogierwerschkull
Not Build for
BiG data lake
Data CentrICITY
Photo credit Lake Public Domain httpwwwwriteupsorgstar-trek-brent-spiner-datarwerschkull
nllinkedincominrogierwerschkull
lsquoData may first be stored in a
data lake so that it can be explored cleaned and prepared
If it can be structured in a relational format (basically rows and columns) and needs to be used frequently and kept highly secure it may go into a
data warehouse
If it stops being used frequently it may go back to a HDFS
(Hadoop Distributed File System)-based archiversquo
Data Centric data first THOMAS H DAVENPORT WALL STEET JOURNAL OF 3-6-2015
httpblogswsjcomcio20150603the-shift-to-a-new-data-architecture rwerschkull
nllinkedincominrogierwerschkull
Systems Like
rwerschkull
nllinkedincominrogierwerschkull
Data Flood
Photo credit Kurayba (httpswwwflickrcomphotos48503330N0828564454666 )
under cc licence (httpscreativecommonsorglicensesby-sa20)
rwerschkull
nllinkedincominrogierwerschkull
The possible resulthellip
Photo credit httpshighfiveexportswordpresscom201006253000-pieces-lego-mix-specialty-pieces-rare-pieces-bricks-blocks-
parts-more-ultimate-lot-of-lego-parts-pieces-lego-for-sale-lego-batman-lego-starwars-lego-technic-lego-minifigur
rwerschkull
nllinkedincominrogierwerschkull
But isNrsquot Data Vault v2
lsquomade for
Big data centric
systemsrsquorwerschkull
nllinkedincominrogierwerschkull
In
DV2you still
do thisin one go
Subject Oriented
Integrated
Time Variant Non-Volatile
EDW
rwerschkull
nllinkedincominrogierwerschkull
Coding =
A lot like modelling
Being Data Centric
conflictswith thecomplex Data
MODELLINGwork
httpxkcdcom844
problem
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Less Mature
JOINoptimizers
Photo credit Public Domainrwerschkull
nllinkedincominrogierwerschkull
Key-Value
Document
Column Family
NoSQL Databases
+SQL on Hadoop solutions
Do NOT like joins
Photo credit Public Domainrwerschkull
nllinkedincominrogierwerschkull
This
REALLYcomplicates
AnchoRModeling
rwerschkull
nllinkedincominrogierwerschkull
And Personally
HUB
SAT
LINK
This one
too(Link Satellite)
HUB
HUB
SAT
SAT
rwerschkull
nllinkedincominrogierwerschkull
problem
rwerschkull
nllinkedincominrogierwerschkull
a) HASHING OF Business keys
Rolling
Stock Nr Datetime Sensor Id Value Concatenated Business Key
Key
Len MD5 Hash
Key
Len
8739
2015-01-22
013427 72A1_FINV 123 8739|2015-01-22 013427|72A1_FINV 34 86ae4c6b0e2e2d5a13a0d11440529aeb 32
8739
2015-01-22
013427 72A1_SLDET 100 8739|2015-01-22 013427|72A1_SLDET 35 51ce9bc292eef407bd7c91a52eebcf2e 32
8739
2015-01-22
013432 72A1_FINV 126 8739|2015-01-22 013432|72A1_FINV 34 9482a41c1fecc4c64b8c437af6cc85e8 32
8739
2015-01-22
013432
13A8_MW_UB
AT_VT 5 8739|2015-01-22 013432|13A8_MW_UBAT_VT 42 e4160914ee55ce0b93f87b23366a0ce3 32
8674
2015-01-22
013426 72A1_FINV 6 8674|2015-01-22 013426|72A1_FINV 34 fcb3e7c8c91e44ce396d908a4948ca65 32
8674
2015-01-22
013426
16A1_HSVER
OND 7 8674|2015-01-22 013426|16A1_HSVEROND 38 fe9098c8c291ad56af5c8afae5169196 32
Loses statistical Informationregarding the data distributionQuery optimizers do not like thishellip
Column family Document and Key-value databases need a
good (natural) sharding key for (partial) key-
lookups
Hashinghelliphellip
httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2rwerschkull
nllinkedincominrogierwerschkull
Surrogates keys require
centralized coordination
hellipand thus can impact the overall systemrsquos scalability and availability
A lot of MPP NoSQL databases simply do not have themhellip
B) Surrogate BuSINESS keys
rwerschkull
nllinkedincominrogierwerschkull
Then Some Inspiration
httproelantvoscomblogp=1119
rwerschkull
nllinkedincominrogierwerschkull
lsquoIn my opinion the answer lies in the adoption of the
persistent (Historical) Staging Area concept
(also known as Historical Staging or the History Area)
This basically adopts the fundamentals of a Data Warehousersquo
lsquoThe Historical Staging Area effectively lsquoactsrsquo as
Data Lake but in a better defined form as data deltas and
event datetimes are taken into accountrsquo
So
Time Variant
amp
Non Volatile
Subject Oriented
amp
Integrated
EDW
Subject Oriented Integrated
Time Variant Non-Volatile
EDW
rwerschkull
nllinkedincominrogierwerschkull
Could be a
Data LAKE
VirtualisedEnsemble
Tier
EDW
Time Variant
amp
Non Volatile
Subject Oriented
amp
Integrated
EDW
rwerschkull
nllinkedincominrogierwerschkull
How
Does PS-3C Work
rwerschkull
nllinkedincominrogierwerschkullPhoto credit Public Domain
StagingArea
EDWInformation
Marts
Focus of Current ensemble EDWrsquos
rwerschkull
nllinkedincominrogierwerschkull
Persistent StagingArea HSA =
Data LibraryEDW
Information Marts
Splitting the work
rwerschkull
nllinkedincominrogierwerschkull
Persitent
Staging
-
Concept
Context
Connector
Business
Concept
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
Identify source event stream Primary or Unique KeyUse source metadata for this
Automate the building of a PS lsquoaround this keyrsquo Take all columns
Historize using SCD-2 approach
Persistent Staging - how
rwerschkull
nllinkedincominrogierwerschkull
Entity levelUnique key
Functional Description
Delivering party
Owner Responsible
MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)
hellip
Persistent Staging Metadata-1
rwerschkull
nllinkedincominrogierwerschkull
Column level [Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR delete as new record
[Source system] on table file level (lowest possible)
Load End Date Timestamp possible but difficulthellipRequires updates
Persistent Staging Metadata-2
rwerschkull
nllinkedincominrogierwerschkull
ACID is possible in HIVE
ACID Makes Updates possibleBy registering updates as lsquonew datarsquo
Reconciliation compacting when idle at user command
Use ORC files
PLUS changing the HIVE configurationhellip
UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)
rwerschkull
nllinkedincominrogierwerschkull
HivePut semi structured data = variable columns in MAP data type
OR use Data storage type that supports schema-evolutionAVRO (ORC in development)
Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo
Schema can be different for every row
What about SEMI-STRUCTURED Data
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
3C - how
rwerschkull
nllinkedincominrogierwerschkull
Always starts with Conceptual data modeling
NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic
No Link Satellites
No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo
Explicit Helper entities
Like Data Vault(2) BUT
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
a UNIQUE Domain specific point of integration
hellipa business entity
hellipwithin itrsquos own domain
hellipdoes not necessarily need to be Enterprise Wide
Business Concept (BC)
rwerschkull
nllinkedincominrogierwerschkull
Why not lsquoenterprise widersquoCompany
Customer
Sales Customer
International
Sales Customer
Local
Sales Customer
Marketing Customer
Customer hellip
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description]
[Owner Responsible]
Column level [Load Date Timestamp]
[Source system] on table file level (lowest possible)
Business Concept Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example-Data
NSR-Station
NS-
Travelcard
NS-
Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
hellip
Business
Key
META_Source META_Load_dts
Ut NSS1_y 5-6-2015 220000
Asd NSS1_y 5-6-2015 220000
Asa NSS2_p 6-6-2015 220000
hellip hellip hellip
NS-
Traveller
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
hellip
Business Key
CRM-RW123456
CRM-LAS224466
hellip
rwerschkull
nllinkedincominrogierwerschkull
Most easy entity to be virtualised(if performance allows)
No Hashing amp No surrogateBUSINESSKEYS
(not by default at least)
BC important notes
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Containts Context about a Concept In a historical way
hellipLike a Data Vault Satellite
Every CC belongs to only one BC
Seperate entity per source system table stream
Concept Context (CC)
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description] [Owner Responsible]
Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)
Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record
Concept Context Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example
NSR-StationNS-
Travelcard
NS-
TrainseriesNS-
Traveller
[valuation]Source NSS2 table p
[description]SourceNSS1 table x
[adres]Source NSS1 table y
[description]Source NTR table q
[ovchip_
personal]Source NSR table r
[ovchip_
on-usageSource NSR table s
[personal_
details]Source NSR table t
[adres_
data]Source NSR table t rwerschkull
nllinkedincominrogierwerschkull
Example-data
NSR-Station [adres] Source: NSS1 table y

BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                 …  …            …                  …
More difficult to virtualise: depends on the semantic gap with the source
But do make it virtual when 'streaming data' is necessary
Because we have the PS layer, exposing all columns is not necessary
Refactoring is easier…
BC important notes
3C - Details
Relations between Concepts + Context
In a historical way
…a merger of the Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time
Connector (C)
Explicitly defining a driving key as metadata…
Gives business understanding
Makes it possible for a Connector to handle delta data deliveries correctly…
• so that a change (on the driving key) is not registered as a new 'connection'
Connector Driving key
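The role of the driving key in delta handling can be sketched as follows. This is an illustration under assumptions, not a prescribed PS-3C loader: the function `apply_delta`, the in-memory list of dicts and the scenario (a check-in delivered first, the check-out station one day later) are all hypothetical. A row that arrives for an existing driving key value closes the open version and adds a new one, instead of creating a second connection.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)


def apply_delta(connector_rows, delta_rows, driving_key, load_dts):
    """Apply one delta delivery to a Connector (list of dicts, changed in place)."""
    # Open (current) version per driving key value
    current = {
        tuple(row[col] for col in driving_key): row
        for row in connector_rows
        if row["META_Load_end_dts"] == HIGH_DATE
    }
    for new in delta_rows:
        key = tuple(new[col] for col in driving_key)
        old = current.get(key)
        if old is not None:
            if all(old.get(col) == value for col, value in new.items()):
                continue  # identical redelivery, nothing to register
            # Same connection, changed content: close the old version
            old["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
        row = dict(new, META_Load_dts=load_dts, META_Load_end_dts=HIGH_DATE)
        connector_rows.append(row)
        current[key] = row
    return connector_rows


driving_key = ("BK_NS-Travelcard", "Checkin_timestamp")
movements = []
checkin = {
    "BK_NS-Travelcard": "3528 0234 2073 1234",
    "Checkin_timestamp": datetime(2016, 4, 5, 8, 49, 32),
    "BK_NSR-Station-from": "Asd",
    "BK_NSR-Station-to": None,
}
apply_delta(movements, [checkin], driving_key, datetime(2016, 4, 5, 22))
checkout = {**checkin, "BK_NSR-Station-to": "Ut"}
apply_delta(movements, [checkout], driving_key, datetime(2016, 4, 6, 22))

connections = {tuple(m[col] for col in driving_key) for m in movements}
print(len(movements), "versions,", len(connections), "connection")
```

Without the driving key, the second delivery could not be told apart from a new travel movement, because the full set of key columns has changed.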
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
Connector Metadata
Example
[Diagram: the entities and Concept Contexts of the previous example, now linked by a Connector]
Connector NSR-Travelmovement: Checkin timestamp, from (NSR-Station), to (NSR-Station)
Driving Key: NS-Travelcard + Checkin timestamp
Example-Data
Connector NSR-Travelmovement (relates NSR-Station, NS-Travelcard and NS-Trainseries)

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00
We add two mandatory HELPER entities
Are we there yet? No
modelling
3C - Details
To help switch from sources that are tied together by technical (surrogate) keys…
…to a Business Key based model
It's a LOOKUP table that translates the technical key to the Business Key
Business Alias
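A Business Alias can be pictured as a small dictionary from technical key to Business Key. The sketch below is an assumption-laden illustration: the function names, the column names `station_id` / `station_code` and the `addresses` rows are hypothetical; only the key values are taken from the BA-NSS1 example data.

```python
def build_business_alias(source_rows, surrogate_col, business_key_col):
    """Build the lookup: technical (surrogate) key -> Business Key."""
    alias = {}
    for row in source_rows:
        alias[row[surrogate_col]] = row[business_key_col]
    return alias


def to_business_key(alias, surrogate_key):
    """Translate a surrogate key; returns None when the key is unknown."""
    return alias.get(surrogate_key)


# Hypothetical rows from an NSS1 source table that carries both keys
stations = [
    {"station_id": 123522, "station_code": "Ut"},
    {"station_id": 666323, "station_code": "Asd"},
    {"station_id": 222443, "station_code": "Asa"},
]
ba_nss1 = build_business_alias(stations, "station_id", "station_code")

# Another NSS1 table refers to stations by surrogate key only
addresses = [{"station_id": 666323, "postcode": "1012 AB"}]
for row in addresses:
    row["BK_NSR-Station"] = to_business_key(ba_nss1, row["station_id"])
print(addresses)
```

Keeping the lookup per source system keeps each alias small enough to hold in memory.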
Example
NSR-Station
  [valuation] Source: NSS2 table p
  [description] Source: NSS1 table x
  [adres] Source: NSS1 table y
BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables
Example-data

BA-NSS1: key lookup for NSS1 source tables
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …
Has a 1-to-(0..1) relation with a Business Concept
More difficult to virtualise
Lookup table should be kept small
Therefore DO NOT do key lookups in the Concept Context entity
Load / generate together with the BC
Preferably 'in memory' somehow…
BA important details
3C - Details
Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
Like a Data Vault Point-in-time construct
But mandatory
And with a clearly defined and performant approach
BC-Timeline
Example
NSR-Station
  [valuation] Source: NSS2 table p
  [adres] Source: NSS1 table y

[valuation] Source: NSS2 table p
BK_NSR-Station  WOZ value   Value rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

BCT
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00

[adres] Source: NSS1 table y
BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                 …  …            …                  …
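The combined timeline in this example can be derived mechanically: collect every load timestamp of every Concept Context per Business Key, sort them, and let each interval end one second before the next one starts. The sketch below is one possible way to do that, not a definitive implementation; the function name `build_bc_timeline` and the input shape (a dict of timestamp lists) are assumptions.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31, 0, 0, 0)


def build_bc_timeline(load_dts_per_key):
    """Merge the change points of all Concept Contexts into one timeline.

    load_dts_per_key: dict of business key -> load timestamps collected
    from every Concept Context of that Business Concept.
    Returns (business key, combined start, combined end) tuples.
    """
    timeline = []
    for business_key, stamps in load_dts_per_key.items():
        starts = sorted(set(stamps))
        for i, start in enumerate(starts):
            if i + 1 < len(starts):
                end = starts[i + 1] - timedelta(seconds=1)
            else:
                end = HIGH_DATE  # still valid
            timeline.append((business_key, start, end))
    return timeline


changes = {
    "Ut": [
        datetime(2013, 6, 5, 22), datetime(2015, 7, 4, 22),   # [adres]
        datetime(2014, 1, 1, 22), datetime(2015, 1, 1, 22),   # [valuation]
        datetime(2016, 3, 1, 22),                             # [valuation]
    ],
    "Asd": [datetime(2013, 6, 5, 22)],                        # [adres]
}
bct = build_bc_timeline(changes)
for row in bct:
    print(row)
```

With the timeline materialised, a point-in-time query needs only equality joins from the timeline to each Concept Context.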
modelling
What makes PS-3C a different Ensemble?
1) Explicitly splitting the work
Data + History
Subjects + Integration
2) NO HASHED BUSINESS KEYS or surrogate keys
Only concatenated ones
Image: httpwwwcannabisculturecomfilesimages6hashbrickJPG
3) Fewer joins
Relation
+ Technical validity timeline
+ Relation context
Together in one entity
Photo credit: Public Domain
4) Explicit HELPER entities
Business Alias
Business Concept Timeline
+ explicitly define Driving key(s)
Photo credit: Public Domain
Hope to AVOID this…
http://xkcd.com/927
PS-3C: A new PROPOSED ensemble modelling technique
Help needed
Questions?
About Me
'Head of BI', Spil Games
Certified Data Vault modeler since 2009
Contact details: nllinkedincominrogierwerschkull
rogierwerschkullgmailcom
rwerschkull
WHAT problems?
Photo: my own……
problem
Not built for
Big Data lake
Data Centricity
Photo credit: Lake, Public Domain; httpwwwwriteupsorgstar-trek-brent-spiner-data
'Data may first be stored in a data lake so that it can be explored, cleaned and prepared.
If it can be structured in a relational format (basically rows and columns) and needs to be used frequently and kept highly secure, it may go into a data warehouse.
If it stops being used frequently, it may go back to a HDFS (Hadoop Distributed File System)-based archive.'
Data Centric: data first
THOMAS H. DAVENPORT, WALL STREET JOURNAL, 3-6-2015
httpblogswsjcomcio20150603the-shift-to-a-new-data-architecture
Systems Like
Data Flood
Photo credit: Kurayba (httpswwwflickrcomphotos48503330N0828564454666), under CC licence (https://creativecommons.org/licenses/by-sa/2.0)
The possible result…
Photo credit: httpshighfiveexportswordpresscom201006253000-pieces-lego-mix-specialty-pieces-rare-pieces-bricks-blocks-parts-more-ultimate-lot-of-lego-parts-pieces-lego-for-sale-lego-batman-lego-starwars-lego-technic-lego-minifigur
But isn't Data Vault v2 'made for Big Data centric systems'?
In DV2 you still do this in one go:
Subject Oriented
Integrated
Time Variant
Non-Volatile
EDW
Coding = a lot like modelling
Being data centric conflicts with the complex data MODELLING work
http://xkcd.com/844
problem
Less mature JOIN optimizers
Photo credit: Public Domain
Key-Value, Document and Column Family NoSQL databases
+ SQL-on-Hadoop solutions
Do NOT like joins
Photo credit: Public Domain
This REALLY complicates Anchor Modeling
And personally…
This one too (Link Satellite)
[Diagram: Data Vault model of HUBs, SATs and a LINK with its Link Satellite]
problem
a) HASHING OF Business keys

Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32

Loses statistical information regarding the data distribution
Query optimizers do not like this…
Column family, Document and Key-value databases need a good (natural) sharding key for (partial) key lookups
Hashing……
httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2
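The loss of partial key lookups can be made concrete with a few of the keys from the table above. This is a sketch under assumptions: the timestamps are written with colons, and the hashes are computed over the UTF-8 encoded key, so the printed hash values need not match those on the slide. It uses only `hashlib.md5` from the Python standard library.

```python
import hashlib

keys = [
    "8739|2015-01-22 01:34:27|72A1_FINV",
    "8739|2015-01-22 01:34:27|72A1_SLDET",
    "8739|2015-01-22 01:34:32|72A1_FINV",
    "8674|2015-01-22 01:34:26|72A1_FINV",
]


def md5_hex(key):
    """MD5 of the UTF-8 encoded key, as a 32 character hex string."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()


hashed = {key: md5_hex(key) for key in keys}

# Partial key lookup on the natural key: all readings of rolling stock 8739
prefix = "8739|"
by_prefix = [key for key in keys if key.startswith(prefix)]

# The same lookup against the hashes: hashing the prefix gives a value
# that is unrelated to the hashes of the full keys
hashed_prefix = md5_hex(prefix)
by_hash_prefix = [h for h in hashed.values() if h.startswith(hashed_prefix)]

print(len(by_prefix), "matches on the concatenated key")
print(len(by_hash_prefix), "matches on the hashed key")
```

The concatenated key keeps its leading parts usable for sharding and range scans; the hash spreads neighbouring keys uniformly and hides them from prefix queries.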
Surrogate keys require centralized coordination
…and thus can impact the overall system's scalability and availability
A lot of MPP NoSQL databases simply do not have them…
B) Surrogate BUSINESS keys
Then Some Inspiration
httproelantvoscomblogp=1119
'In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area).
This basically adopts the fundamentals of a Data Warehouse.'
'The Historical Staging Area effectively 'acts' as Data Lake, but in a better defined form, as data deltas and event datetimes are taken into account.'
So
Time Variant & Non Volatile
Subject Oriented & Integrated
EDW
Subject Oriented, Integrated, Time Variant, Non-Volatile EDW
Could be a Data LAKE
Virtualised Ensemble Tier
EDW
Time Variant & Non Volatile
Subject Oriented & Integrated
EDW
How does PS-3C work?
Photo credit: Public Domain
Staging Area
EDW
Information Marts
Focus of current ensemble EDWs
Persistent Staging Area
HSA = Data Library
EDW
Information Marts
Splitting the work
Persistent Staging - Concept Context, Connector, Business Concept
modelling
Identify the source / event stream Primary or Unique Key: use source metadata for this
Automate the building of a PS 'around this key': take all columns
Historize using an SCD-2 approach
Persistent Staging - how
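The three steps above can be sketched as a small generic loader. This is an illustration only: it assumes that every delivery is a full snapshot of the source table, that the PS table is an in-memory list of dicts, and that a delete is registered as a new record; the function name `historize` is hypothetical. The metadata column names follow the example data.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)


def historize(ps_rows, snapshot, key_cols, load_dts):
    """Load one full source snapshot into a persistent staging table (SCD-2).

    ps_rows:  rows already in the PS table (list of dicts, changed in place)
    snapshot: rows of the source table, all columns taken as they are
    key_cols: primary / unique key columns, taken from source metadata
    """
    one_second = timedelta(seconds=1)
    current = {
        tuple(row[col] for col in key_cols): row
        for row in ps_rows
        if row["META_Load_end_dts"] == HIGH_DATE
    }
    seen = set()
    for src in snapshot:
        key = tuple(src[col] for col in key_cols)
        seen.add(key)
        old = current.get(key)
        if old is not None:
            same = not old["META_Deleted_Ind"] and all(
                old.get(col) == value for col, value in src.items()
            )
            if same:
                continue  # unchanged since the previous load
            old["META_Load_end_dts"] = load_dts - one_second
        ps_rows.append(dict(src, META_Load_dts=load_dts,
                            META_Load_end_dts=HIGH_DATE, META_Deleted_Ind=0))
    # Keys that disappeared from the source: register the delete as a new record
    for key, old in current.items():
        if key not in seen and not old["META_Deleted_Ind"]:
            old["META_Load_end_dts"] = load_dts - one_second
            ps_rows.append(dict(old, META_Load_dts=load_dts,
                                META_Load_end_dts=HIGH_DATE, META_Deleted_Ind=1))
    return ps_rows


key = ["BK_NSR-Station"]
ps = []
historize(ps, [{"BK_NSR-Station": "Ut", "Postadres_postcode": "3500GJ"},
               {"BK_NSR-Station": "Asd", "Postadres_postcode": "1012 AB"}],
          key, datetime(2013, 6, 5, 22))
historize(ps, [{"BK_NSR-Station": "Ut", "Postadres_postcode": "3511 CE"},
               {"BK_NSR-Station": "Asd", "Postadres_postcode": "1012 AB"}],
          key, datetime(2015, 6, 5, 22))
for row in ps:
    print(row)
```

Because the loader depends only on the key columns and treats all other columns generically, it can be generated from source metadata for every table.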
Entity level:
Unique key
Functional Description
Delivering party
Owner / Responsible
MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
…
Persistent Staging Metadata-1
Column level:
[Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR delete as new record
[Source system] on table / file level (lowest possible)
Load End Date Timestamp: possible but difficult… requires updates
Persistent Staging Metadata-2
ACID is possible in HIVE
ACID makes updates possible by registering updates as 'new data'
Reconciliation / compacting: when idle or at user command
Use ORC files
PLUS changing the HIVE configuration…
UPDATES IN HIVE (isn't HDFS APPEND ONLY?)
Hive: put semi-structured data (= variable columns) in a MAP data type
OR use a data storage type that supports schema evolution: AVRO (ORC in development)
Or HBASE… It only has one data type (byte); the schema is 'applied'
Schema can be different for every row
What about SEMI-STRUCTURED data?
modelling
3C - how
Always starts with Conceptual data modeling
NOT the primary location of Data & History
Virtualised (only if performance allows)
Should be deterministic
No Link Satellites
No Surrogate or Hash Keys, only 'Concatenated Natural Business Keys'
Explicit Helper entities
Like Data Vault (2), BUT
3C - Details
A UNIQUE, domain-specific point of integration
…a business entity
…within its own domain
…does not necessarily need to be Enterprise Wide
Business Concept (BC)
Why not 'enterprise wide'?
Company Customer
Sales Customer
International Sales Customer
Local Sales Customer
Marketing Customer
Customer …
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Business Concept Metadata
3C - how
rwerschkull
nllinkedincominrogierwerschkull
Always starts with Conceptual data modeling
NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic
No Link Satellites
No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo
Explicit Helper entities
Like Data Vault(2) BUT
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
a UNIQUE Domain specific point of integration
hellipa business entity
hellipwithin itrsquos own domain
hellipdoes not necessarily need to be Enterprise Wide
Business Concept (BC)
rwerschkull
nllinkedincominrogierwerschkull
Why not lsquoenterprise widersquoCompany
Customer
Sales Customer
International
Sales Customer
Local
Sales Customer
Marketing Customer
Customer hellip
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description]
[Owner Responsible]
Column level [Load Date Timestamp]
[Source system] on table file level (lowest possible)
Business Concept Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example-Data
NSR-Station
NS-
Travelcard
NS-
Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
hellip
Business
Key
META_Source META_Load_dts
Ut NSS1_y 5-6-2015 220000
Asd NSS1_y 5-6-2015 220000
Asa NSS2_p 6-6-2015 220000
hellip hellip hellip
NS-
Traveller
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
hellip
Business Key
CRM-RW123456
CRM-LAS224466
hellip
rwerschkull
nllinkedincominrogierwerschkull
Most easy entity to be virtualised(if performance allows)
No Hashing amp No surrogateBUSINESSKEYS
(not by default at least)
BC important notes
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
3C - Details: Concept Context (CC)
Contains context about a Concept, in a historical way
…like a Data Vault Satellite
Every CC belongs to only one BC
Separate entity per source system table / stream
Concept Context Metadata
Entity level:
• [Description]
• [Owner / Responsible]
Column level:
• [Load Date Timestamp]
• [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
Example
Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller
Concept Contexts:
[valuation]         Source NSS2, table p
[description]       Source NSS1, table x
[adres]             Source NSS1, table y
[description]       Source NTR, table q
[ovchip_personal]   Source NSR, table r
[ovchip_on-usage]   Source NSR, table s
[personal_details]  Source NSR, table t
[adres_data]        Source NSR, table t
Example data: NSR-Station, Concept Context [adres] (source NSS1, table y)
BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                 …  …            …                  …
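To show what ‘in a historical way’ means in practice, here is a minimal sketch of an as-of lookup on a Concept Context. The rows mirror the example data above; the function name `valid_at`, the day-month-year reading of the dates and the inclusive end timestamp are assumptions.

```python
from datetime import datetime

END_OF_TIME = datetime(9999, 12, 31, 0, 0, 0)  # the open-ended 31-12-9999 marker

# Rows modelled on the [adres] example (dates read as day-month-year).
ADRES_CONTEXT = [
    {"BK_NSR-Station": "Ut", "Postadres_postcode": "3500GJ",
     "META_Load_dts": datetime(2013, 6, 5, 22, 0, 0),
     "META_Load_end_dts": datetime(2015, 6, 5, 21, 59, 59)},
    {"BK_NSR-Station": "Ut", "Postadres_postcode": "3511 CE",
     "META_Load_dts": datetime(2015, 6, 5, 22, 0, 0),
     "META_Load_end_dts": END_OF_TIME},
    {"BK_NSR-Station": "Asd", "Postadres_postcode": "1012 AB",
     "META_Load_dts": datetime(2013, 6, 5, 22, 0, 0),
     "META_Load_end_dts": END_OF_TIME},
]


def valid_at(rows, key_column, business_key, moment):
    """Return the context row that was valid for a business key at a moment, or None."""
    for row in rows:
        if (row[key_column] == business_key
                and row["META_Load_dts"] <= moment <= row["META_Load_end_dts"]):
            return row
    return None


if __name__ == "__main__":
    row = valid_at(ADRES_CONTEXT, "BK_NSR-Station", "Ut", datetime(2014, 1, 1))
    print(row["Postadres_postcode"])
```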
CC important notes
More difficult to virtualise: depends on the semantic gap with the source
But do make it virtual when ‘streaming data’ is necessary
Because we have the PS layer, exposing all columns is not necessary
Refactoring is easier…
3C - Details: Connector (C)
Relations between Concepts + context, in a historical way
…a merger of Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time
Connector Driving Key
Explicitly defining a driving key as metadata…
Gives business understanding
Makes it possible for the Connector to handle delta data deliveries correctly…
• so that a change (on the driving key) is not registered as a new ‘connection’
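The effect of a driving key on delta deliveries can be sketched as follows. This is an assumption-laden illustration, not a prescribed loader: `apply_delta`, the row layout as Python dicts and the one-second gap between versions are all hypothetical choices.

```python
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31)


def apply_delta(connector_rows, delta_rows, driving_key, load_dts):
    """Apply a delta delivery to a Connector, using the driving key to spot changes.

    A delta row whose driving key already has a current version is treated as
    a change to that connection: the old version is end-dated and a new
    version is appended. Only an unseen driving key opens a new connection.
    """
    current = {
        tuple(row[col] for col in driving_key): row
        for row in connector_rows
        if row["META_Load_end_dts"] == END_OF_TIME
    }
    for incoming in delta_rows:
        key = tuple(incoming[col] for col in driving_key)
        existing = current.get(key)
        if existing is not None:
            if all(existing.get(col) == value for col, value in incoming.items()):
                continue  # nothing changed, nothing to register
            existing["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
        new_row = dict(incoming, META_Load_dts=load_dts, META_Load_end_dts=END_OF_TIME)
        connector_rows.append(new_row)
        current[key] = new_row
    return connector_rows
```

With a driving key of travel card + check-in timestamp, a later delivery that only adds the check-out produces a second version of the same travel movement rather than a second movement.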
Connector Metadata
Entity level:
• [Description]
• [Owner / Responsible]
Column level:
• [Load Date Timestamp]
• [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
Example
[Diagram: Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts, plus the Connector NSR-Travelmovement (Checkin timestamp, from, to)]
Driving Key: NS-Travelcard + Checkin timestamp
Example data: Connector NSR-Travelmovement (between NSR-Station, NS-Travelcard and NS-Trainseries)
BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00
Are we there yet? No.
We add two mandatory HELPER entities
3C - Details: Business Alias
Helps switching from sources that are tied together by technical (surrogate) keys…
…to a Business Key based model
It’s a LOOKUP table that translates the technical key to the Business Key
Example
NSR-Station
[valuation]    Source NSS2, table p
[description]  Source NSS1, table x
[adres]        Source NSS1, table y
BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables
Example data

BA-NSS1 (key lookup for NSS1 source tables)
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …
BA important details
Has a 1-to-(0,1) relation with a Business Concept
More difficult to virtualise
Lookup table should be kept small
Therefore DO NOT do the key lookup in the Concept Context entity
Load / generate together with the BC
Preferably ‘in memory’ somehow…
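A Business Alias kept ‘in memory’ can be as simple as a dictionary. The sketch below uses the BA-NSS1 example values; the helper names `to_business_key` and `rekey_rows` and the source column name `station_id` are hypothetical.

```python
# BA-NSS1 from the example data: NSS1 surrogate key -> business key.
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(alias, surrogate_key):
    """Translate one technical (surrogate) key into its business key."""
    try:
        return alias[surrogate_key]
    except KeyError:
        raise LookupError(f"no business key registered for {surrogate_key!r}") from None


def rekey_rows(rows, alias, surrogate_column, key_column="BK_NSR-Station"):
    """Replace the surrogate key column of source rows by the business key."""
    rekeyed = []
    for row in rows:
        new_row = {col: val for col, val in row.items() if col != surrogate_column}
        new_row[key_column] = to_business_key(alias, row[surrogate_column])
        rekeyed.append(new_row)
    return rekeyed


if __name__ == "__main__":
    source_rows = [{"station_id": 666323, "postcode": "1012 AB"}]  # hypothetical source row
    print(rekey_rows(source_rows, BA_NSS1, "station_id"))
```

Keeping the alias as a small, separate lookup is what keeps the translation cheap; the Concept Context itself never has to be searched for keys.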
3C - Details: BC-Timeline
Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
Like a Data Vault Point-in-time construct
But mandatory
And with a clearly defined and performant approach
Example: NSR-Station

Concept Context [valuation] (source NSS2, table p)
BK_NSR-Station  WOZ value  Value rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 mln     18 mln                 1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 mln     18 mln                 1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 mln     23 mln                 1-3-2016 22:00:00  31-12-9999 00:00:00

Concept Context [adres] (source NSS1, table y)
BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                 …  …            …                  …

BCT (BC-Timeline)
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
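One way to derive such a combined timeline is to take the union of all load timestamps per business key and end each interval one second before the next one starts. The sketch below assumes exactly that rule (it reproduces the example, but the deck does not prescribe it); `build_bc_timeline` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31)


def build_bc_timeline(*concept_contexts):
    """Merge the validity timelines of several Concept Contexts of one Business Concept.

    Each concept context is an iterable of (business_key, load_dts) pairs.
    Returns (business_key, combined_load_dts, combined_load_end_dts) tuples.
    """
    starts = defaultdict(set)
    for context in concept_contexts:
        for business_key, load_dts in context:
            starts[business_key].add(load_dts)

    timeline = []
    for business_key in sorted(starts):
        ordered = sorted(starts[business_key])
        for i, start in enumerate(ordered):
            if i + 1 < len(ordered):
                end = ordered[i + 1] - timedelta(seconds=1)
            else:
                end = END_OF_TIME
            timeline.append((business_key, start, end))
    return timeline


if __name__ == "__main__":
    valuation = [("Ut", datetime(2014, 1, 1, 22)), ("Ut", datetime(2015, 1, 1, 22)),
                 ("Ut", datetime(2016, 3, 1, 22))]
    adres = [("Ut", datetime(2013, 6, 5, 22)), ("Ut", datetime(2015, 7, 4, 22)),
             ("Asd", datetime(2013, 6, 5, 22))]
    for entry in build_bc_timeline(valuation, adres):
        print(entry)
```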
What makes PS-3C a different Ensemble?
1) Explicitly splitting the work
Data + History
Subjects + Integration
2) NO HASHED BUSINESS KEYS or surrogate keys
Only concatenated ones
3) Fewer joins
Relation + technical validity timeline + relation context, together in one entity
4) Explicit HELPER entities
Business Alias
Business Concept Timeline
+ explicitly define driving key(s)
Hope to AVOID this… http://xkcd.com/927
PS-3C: A new PROPOSED ensemble modelling technique
Help needed
Questions?
About Me
‘Head of BI’, Spilgames
Certified Data Vault modeler since 2009
Contact details: nl.linkedin.com/in/rogierwerschkull
rogierwerschkull@gmail.com
rwerschkull
Not built for Big Data lake data centricity
‘Data may first be stored in a data lake, so that it can be explored, cleaned and prepared. If it can be structured in a relational format (basically rows and columns) and needs to be used frequently and kept highly secure, it may go into a data warehouse. If it stops being used frequently, it may go back to a HDFS (Hadoop Distributed File System)-based archive.’
Data centric = data first. Thomas H. Davenport, Wall Street Journal, 3-6-2015
http://blogs.wsj.com/cio/2015/06/03/the-shift-to-a-new-data-architecture
Systems like…
Data flood
The possible result…
But isn’t Data Vault v2 ‘made for Big Data centric systems’?
In DV2 you still do this in one go:
EDW = Subject Oriented, Integrated, Time Variant, Non-Volatile
Being data centric conflicts with the complex data MODELLING work
Coding = a lot like modelling: http://xkcd.com/844
Problem
Less mature JOIN optimizers
Key-Value, Document and Column Family NoSQL databases + SQL-on-Hadoop solutions do NOT like joins
This REALLY complicates Anchor Modeling
And personally… this one too: the Link Satellite
[Diagram: Data Vault model with HUB, SAT and LINK entities; the Link Satellite is pointed out]
Problem
a) HASHING of business keys
Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32
Hashing…
Loses statistical information regarding the data distribution; query optimizers do not like this…
Column family, Document and Key-value databases need a good (natural) sharding key for (partial) key lookups
http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2
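The trade-off can be made concrete with a small sketch. It assumes the sensor timestamps are written with colons (which matches the key lengths of 34 and 35 stated in the table) and it does not try to reproduce the hash values shown on the slide; `md5_key` and `partial_lookup` are hypothetical helper names.

```python
import hashlib


def md5_key(business_key: str) -> str:
    """Hash a concatenated business key to a fixed-length hex string."""
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()


def partial_lookup(keys, prefix):
    """Partial key lookup: only possible while the key is still readable."""
    return [key for key in keys if key.startswith(prefix)]


KEYS = [
    "8739|2015-01-22 01:34:27|72A1_FINV",
    "8739|2015-01-22 01:34:27|72A1_SLDET",
    "8674|2015-01-22 01:34:26|72A1_FINV",
]

if __name__ == "__main__":
    for key in KEYS:
        # The natural key length varies, the hash is always 32 characters.
        print(len(key), key, len(md5_key(key)), md5_key(key))
    # All readings of rolling stock 8739 on one day, found by key prefix.
    print(partial_lookup(KEYS, "8739|2015-01-22"))
```

The prefix lookup works on the concatenated keys because related readings share a leading part; after hashing, that ordering and grouping information is no longer visible in the key.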
b) Surrogate business keys
Surrogate keys require centralized coordination
…and thus can impact the overall system’s scalability and availability
A lot of MPP / NoSQL databases simply do not have them…
Then some inspiration: http://roelantvos.com/blog/?p=1119
‘In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse’
‘The Historical Staging Area effectively ‘acts’ as Data Lake, but in a better defined form, as data deltas and event datetimes are taken into account’
So…
[Diagram: the EDW (Subject Oriented, Integrated, Time Variant, Non-Volatile) split into two parts: ‘Time Variant & Non-Volatile’ and ‘Subject Oriented & Integrated’]
[Diagram: the same split EDW, labelled ‘Could be a Data LAKE’ and ‘Virtualised Ensemble tier’, alongside the parts ‘Time Variant & Non-Volatile’ and ‘Subject Oriented & Integrated’]
How does PS-3C work?
Focus of current ensemble EDWs
[Diagram: Staging Area, EDW, Information Marts]
Splitting the work
[Diagram: Persistent Staging Area / HSA = Data Library, EDW, Information Marts]
Persistent Staging - Business Concept, Concept Context, Connector modelling
Persistent Staging - how
Identify the primary or unique key of the source / event stream; use source metadata for this
Automate the building of a PS ‘around this key’
Take all columns
Historize using an SCD-2 approach
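The steps above can be sketched as a small append-only loader. It is a simplified illustration under stated assumptions: full snapshots are delivered, rows are plain dicts, a delete is registered as a new record, and `load_snapshot` plus the META_ column handling are hypothetical, not a prescribed implementation.

```python
from datetime import datetime


def load_snapshot(staging, snapshot, key_columns, load_dts):
    """Append a full source snapshot to a persistent staging table (a list of dicts).

    New or changed rows are appended as a new version; rows that disappeared
    from the source are appended once more with META_Deleted_Ind = 1.
    Nothing already in staging is updated.
    """
    latest = {}
    for row in staging:  # staging is in load order, so later versions win
        latest[tuple(row[col] for col in key_columns)] = row

    seen = set()
    for source_row in snapshot:
        key = tuple(source_row[col] for col in key_columns)
        seen.add(key)
        previous = latest.get(key)
        unchanged = (
            previous is not None
            and not previous["META_Deleted_Ind"]
            and all(previous.get(col) == val for col, val in source_row.items())
        )
        if not unchanged:
            staging.append(dict(source_row, META_Load_dts=load_dts, META_Deleted_Ind=0))

    for key, previous in latest.items():
        if key not in seen and not previous["META_Deleted_Ind"]:
            staging.append(dict(previous, META_Load_dts=load_dts, META_Deleted_Ind=1))
    return staging


if __name__ == "__main__":
    ps_station = []
    load_snapshot(ps_station, [{"code": "Ut", "postcode": "3500GJ"}], ["code"],
                  datetime(2015, 6, 5, 22))
    load_snapshot(ps_station, [{"code": "Ut", "postcode": "3511 CE"}], ["code"],
                  datetime(2015, 6, 6, 22))
    for row in ps_station:
        print(row)
```

Because every column is taken and nothing is overwritten, the key columns are the only thing that has to be configured per source table, which is what makes the build easy to automate.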
Persistent Staging Metadata-1
Entity level:
• Unique key
• Functional description
• Delivering party
• Owner / Responsible
• MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
• …
Persistent Staging Metadata-2
Column level:
• [Load Date Timestamp]
• [Load End Date Timestamp]
• [Deleted Flag] OR delete as new record
• [Source system] on table / file level (lowest possible)
Load End Date Timestamp: possible but difficult… it requires updates
UPDATES IN HIVE (isn’t HDFS append-only?)
ACID is possible in Hive
ACID makes updates possible by registering updates as ‘new data’
Reconciliation / compacting when idle or at user command
Use ORC files
PLUS changing the Hive configuration…
What about SEMI-STRUCTURED data?
Hive: put semi-structured data (= variable columns) in a MAP data type
OR use a data storage type that supports schema evolution: Avro (ORC in development)
Or HBase… it only has one data type (byte); the schema is ‘applied’
The schema can be different for every row
lsquoData may first be stored in a
data lake so that it can be explored cleaned and prepared
If it can be structured in a relational format (basically rows and columns) and needs to be used frequently and kept highly secure it may go into a
data warehouse
If it stops being used frequently it may go back to a HDFS
(Hadoop Distributed File System)-based archiversquo
Data Centric data first THOMAS H DAVENPORT WALL STEET JOURNAL OF 3-6-2015
httpblogswsjcomcio20150603the-shift-to-a-new-data-architecture rwerschkull
nllinkedincominrogierwerschkull
Systems Like
rwerschkull
nllinkedincominrogierwerschkull
Data Flood
Photo credit Kurayba (httpswwwflickrcomphotos48503330N0828564454666 )
under cc licence (httpscreativecommonsorglicensesby-sa20)
rwerschkull
nllinkedincominrogierwerschkull
The possible resulthellip
Photo credit httpshighfiveexportswordpresscom201006253000-pieces-lego-mix-specialty-pieces-rare-pieces-bricks-blocks-
parts-more-ultimate-lot-of-lego-parts-pieces-lego-for-sale-lego-batman-lego-starwars-lego-technic-lego-minifigur
rwerschkull
nllinkedincominrogierwerschkull
But isNrsquot Data Vault v2
lsquomade for
Big data centric
systemsrsquorwerschkull
nllinkedincominrogierwerschkull
In
DV2you still
do thisin one go
Subject Oriented
Integrated
Time Variant Non-Volatile
EDW
rwerschkull
nllinkedincominrogierwerschkull
Coding =
A lot like modelling
Being Data Centric
conflictswith thecomplex Data
MODELLINGwork
httpxkcdcom844
problem
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Less Mature
JOINoptimizers
Photo credit Public Domainrwerschkull
nllinkedincominrogierwerschkull
Key-Value
Document
Column Family
NoSQL Databases
+SQL on Hadoop solutions
Do NOT like joins
Photo credit Public Domainrwerschkull
nllinkedincominrogierwerschkull
This
REALLYcomplicates
AnchoRModeling
rwerschkull
nllinkedincominrogierwerschkull
And Personally
HUB
SAT
LINK
This one
too(Link Satellite)
HUB
HUB
SAT
SAT
rwerschkull
nllinkedincominrogierwerschkull
problem
rwerschkull
nllinkedincominrogierwerschkull
a) HASHING OF Business keys
Rolling
Stock Nr Datetime Sensor Id Value Concatenated Business Key
Key
Len MD5 Hash
Key
Len
8739
2015-01-22
013427 72A1_FINV 123 8739|2015-01-22 013427|72A1_FINV 34 86ae4c6b0e2e2d5a13a0d11440529aeb 32
8739
2015-01-22
013427 72A1_SLDET 100 8739|2015-01-22 013427|72A1_SLDET 35 51ce9bc292eef407bd7c91a52eebcf2e 32
8739
2015-01-22
013432 72A1_FINV 126 8739|2015-01-22 013432|72A1_FINV 34 9482a41c1fecc4c64b8c437af6cc85e8 32
8739
2015-01-22
013432
13A8_MW_UB
AT_VT 5 8739|2015-01-22 013432|13A8_MW_UBAT_VT 42 e4160914ee55ce0b93f87b23366a0ce3 32
8674
2015-01-22
013426 72A1_FINV 6 8674|2015-01-22 013426|72A1_FINV 34 fcb3e7c8c91e44ce396d908a4948ca65 32
8674
2015-01-22
013426
16A1_HSVER
OND 7 8674|2015-01-22 013426|16A1_HSVEROND 38 fe9098c8c291ad56af5c8afae5169196 32
Loses statistical Informationregarding the data distributionQuery optimizers do not like thishellip
Column family Document and Key-value databases need a
good (natural) sharding key for (partial) key-
lookups
Hashinghelliphellip
httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2rwerschkull
nllinkedincominrogierwerschkull
Surrogates keys require
centralized coordination
hellipand thus can impact the overall systemrsquos scalability and availability
A lot of MPP NoSQL databases simply do not have themhellip
B) Surrogate BuSINESS keys
rwerschkull
nllinkedincominrogierwerschkull
Then Some Inspiration
httproelantvoscomblogp=1119
rwerschkull
nllinkedincominrogierwerschkull
lsquoIn my opinion the answer lies in the adoption of the
persistent (Historical) Staging Area concept
(also known as Historical Staging or the History Area)
This basically adopts the fundamentals of a Data Warehousersquo
lsquoThe Historical Staging Area effectively lsquoactsrsquo as
Data Lake but in a better defined form as data deltas and
event datetimes are taken into accountrsquo
So
Time Variant
amp
Non Volatile
Subject Oriented
amp
Integrated
EDW
Subject Oriented Integrated
Time Variant Non-Volatile
EDW
rwerschkull
nllinkedincominrogierwerschkull
Could be a
Data LAKE
VirtualisedEnsemble
Tier
EDW
Time Variant
amp
Non Volatile
Subject Oriented
amp
Integrated
EDW
rwerschkull
nllinkedincominrogierwerschkull
How
Does PS-3C Work
rwerschkull
nllinkedincominrogierwerschkullPhoto credit Public Domain
StagingArea
EDWInformation
Marts
Focus of Current ensemble EDWrsquos
rwerschkull
nllinkedincominrogierwerschkull
Persistent StagingArea HSA =
Data LibraryEDW
Information Marts
Splitting the work
rwerschkull
nllinkedincominrogierwerschkull
Persitent
Staging
-
Concept
Context
Connector
Business
Concept
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
Identify source event stream Primary or Unique KeyUse source metadata for this
Automate the building of a PS lsquoaround this keyrsquo Take all columns
Historize using SCD-2 approach
Persistent Staging - how
rwerschkull
nllinkedincominrogierwerschkull
Entity levelUnique key
Functional Description
Delivering party
Owner Responsible
MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)
hellip
Persistent Staging Metadata-1
rwerschkull
nllinkedincominrogierwerschkull
Column level [Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR delete as new record
[Source system] on table file level (lowest possible)
Load End Date Timestamp possible but difficulthellipRequires updates
Persistent Staging Metadata-2
rwerschkull
nllinkedincominrogierwerschkull
ACID is possible in HIVE
ACID Makes Updates possibleBy registering updates as lsquonew datarsquo
Reconciliation compacting when idle at user command
Use ORC files
PLUS changing the HIVE configurationhellip
UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)
rwerschkull
nllinkedincominrogierwerschkull
HivePut semi structured data = variable columns in MAP data type
OR use Data storage type that supports schema-evolutionAVRO (ORC in development)
Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo
Schema can be different for every row
What about SEMI-STRUCTURED Data
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
3C - how
rwerschkull
nllinkedincominrogierwerschkull
Always starts with Conceptual data modeling
NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic
No Link Satellites
No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo
Explicit Helper entities
Like Data Vault(2) BUT
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
a UNIQUE Domain specific point of integration
hellipa business entity
hellipwithin itrsquos own domain
hellipdoes not necessarily need to be Enterprise Wide
Business Concept (BC)
rwerschkull
nllinkedincominrogierwerschkull
Why not lsquoenterprise widersquoCompany
Customer
Sales Customer
International
Sales Customer
Local
Sales Customer
Marketing Customer
Customer hellip
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description]
[Owner Responsible]
Column level [Load Date Timestamp]
[Source system] on table file level (lowest possible)
Business Concept Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example data

NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
...

NSR-Station
Business Key   META_Source   META_Load_dts
Ut             NSS1_y        5-6-2015 22:00:00
Asd            NSS1_y        5-6-2015 22:00:00
Asa            NSS2_p        6-6-2015 22:00:00
...            ...           ...

NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
...

NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
...
BC important notes
Easiest entity to virtualise (if performance allows)
No hashing & no surrogate BUSINESS KEYS (not by default, at least)
3C - Details: Concept Context (CC)
Contains context about a Concept, in a historical way
...like a Data Vault Satellite
Every CC belongs to only one BC
Separate entity per source system table / stream
Concept Context Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
Example
[Diagram: Business Concepts with their Concept Contexts]
Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller
Concept Contexts:
[valuation] Source NSS2 table p
[description] Source NSS1 table x
[adres] Source NSS1 table y
[description] Source NTR table q
[ovchip_personal] Source NSR table r
[ovchip_on-usage] Source NSR table s
[personal_details] Source NSR table t
[adres_data] Source NSR table t
Example data: NSR-Station, [adres] Source NSS1 table y

BK_NSR-Station  Postadres_postcode  GPS                ...  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064   ...  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299   ...  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299   ...  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             ...                 ...                ...  ...          ...                ...
CC important notes
More difficult to virtualise: depends on the semantic gap with the source
But do make it virtual when 'streaming data' is necessary
Because we have the PS layer:
• exposing all columns is not necessary
• refactoring is easier...
3C - Details: Connector (C)
Relations between Concepts + context, in a historical way
...a merger of the Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point in time
Connector driving key
Explicitly defining a driving key as metadata...
Gives business understanding
Makes it possible for the Connector to handle delta data deliveries correctly...
• so that a change (on the driving key)
• is not registered as a new 'connection'
Connector Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
Example
[Diagram: the Business Concepts (NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller) and Concept Contexts of the previous example, with a Connector added]
Connector NSR-Travelmovement: Checkin timestamp, from, to
Driving Key: NS-Travelcard + Checkin timestamp
Example data: Connector NSR-Travelmovement (Checkin timestamp, from, to) between NSR-Station, NS-Travelcard and NS-Trainseries

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp   Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32    5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09   5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00
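How a driving key lets a Connector absorb delta deliveries can be sketched in Python. This is only an illustration under assumptions, not the PS-3C reference implementation: the class `Travelmovement`, the function `apply_delta` and the two deliveries (a check-in without check-out, followed a day later by the completed trip) are hypothetical. The driving key (NS-Travelcard + Checkin timestamp) and the "end date = next load minus one second" convention are taken from the example data.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import List, Optional

OPEN_END = datetime(9999, 12, 31)  # open-ended validity, as in the example data


@dataclass(frozen=True)
class Travelmovement:
    """One version of a Connector row (column names simplified)."""
    station_from: str
    station_to: Optional[str]
    trainseries: str
    travelcard: str
    checkin: datetime
    checkout: Optional[datetime]
    load_dts: datetime
    load_end_dts: datetime = OPEN_END

    def driving_key(self):
        # Driving key from the example: NS-Travelcard + Checkin timestamp.
        return (self.travelcard, self.checkin)

    def details(self):
        # Everything that may change without creating a new connection.
        return (self.station_from, self.station_to, self.trainseries, self.checkout)


def apply_delta(rows: List[Travelmovement], delta: Travelmovement) -> List[Travelmovement]:
    """Return a new row list with one delivered row applied."""
    result = []
    for row in rows:
        is_current = row.load_end_dts == OPEN_END
        if is_current and row.driving_key() == delta.driving_key():
            if row.details() == delta.details():
                return list(rows)  # unchanged delivery: nothing to register
            # Same connection, changed details: end-date the current version.
            row = replace(row, load_end_dts=delta.load_dts - timedelta(seconds=1))
        result.append(row)
    result.append(delta)
    return result


# Hypothetical deliveries: the check-in arrives first, the check-out a day later.
card = "3528 0234 2073 1234"
checkin = datetime(2016, 4, 5, 8, 49, 32)
first = Travelmovement("Asd", None, "IC 855", card, checkin, None,
                       load_dts=datetime(2016, 4, 5, 22, 0, 0))
update = Travelmovement("Asd", "Ut", "IC 855", card, checkin,
                        datetime(2016, 4, 5, 9, 40, 12),
                        load_dts=datetime(2016, 4, 6, 22, 0, 0))

rows = apply_delta(apply_delta([], first), update)
for row in rows:
    print(row.driving_key(), row.station_to, row.load_end_dts)
```

Both versions share one driving key, so the completed trip is stored as a new version of the same connection instead of a second connection.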
Are we there yet? No.
We add two mandatory HELPER entities
3C - Details: Business Alias
To help switching from sources that are tied together by technical (surrogate) keys...
...to a Business Key based model
It's a LOOKUP table that translates the technical key to the Business Key
Example
NSR-Station
[valuation] Source NSS2 table p
[description] Source NSS1 table x
[adres] Source NSS1 table y
BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables
Example data

BA-NSS1: key lookup for NSS1 source tables
Business Key   NSS1_Surrogate_key
Ut             123522
Asd            666323
Asa            222443
...            ...

NSR-Station
Business Key   META_Source   META_Load_dts
Ut             NSS1_y        5-6-2015 22:00:00
Asd            NSS1_y        5-6-2015 22:00:00
Asa            NSS2_p        6-6-2015 22:00:00
...            ...           ...
BA important details
Has a 1-on-(0,1) relation with a Business Concept
More difficult to virtualise
The lookup table should be kept small
Therefore DO NOT do the key lookup in the Concept Context entity
Load / generate together with the BC
Preferably 'in memory' somehow...
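A Business Alias lookup can be sketched as a small in-memory mapping. This is a minimal illustration under assumptions: the function `to_business_key`, the source column name `station_id` and the sample delivery are hypothetical, and where in the load process the translation runs is left open. The surrogate keys and business keys come from the BA-NSS1 example data.

```python
# BA-NSS1 from the example data: NSS1 surrogate key -> business key.
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(source_rows, alias, surrogate_column, bk_column):
    """Swap the technical key of each source row for its business key.

    Rows whose surrogate key is unknown to the alias are returned separately
    so they can be handled explicitly instead of being dropped silently.
    """
    translated, unresolved = [], []
    for row in source_rows:
        surrogate = row[surrogate_column]
        if surrogate not in alias:
            unresolved.append(row)
            continue
        rest = {col: val for col, val in row.items() if col != surrogate_column}
        translated.append({bk_column: alias[surrogate], **rest})
    return translated, unresolved


# Hypothetical delivery from an NSS1 source table keyed on a surrogate.
delivery = [
    {"station_id": 123522, "Postadres_postcode": "3500GJ"},
    {"station_id": 999999, "Postadres_postcode": "0000XX"},
]
translated, unresolved = to_business_key(
    delivery, BA_NSS1, "station_id", "BK_NSR-Station"
)
print(translated)
print(unresolved)
```

A plain dictionary keeps the lookup small and in memory, which is the property the notes above ask for; a real platform might use a broadcast join or a cache instead.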
3C - Details: BC-Timeline
Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
Like a Data Vault Point-in-time construct
But mandatory
And with a clearly defined and performant approach
Example: NSR-Station

[valuation] Source NSS2 table p
BK_NSR-Station  WOZ value   Value rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

[adres] Source NSS1 table y
BK_NSR-Station  Postadres_postcode  GPS                ...  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064   ...  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299   ...  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299   ...  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             ...                 ...                ...  ...          ...                ...

BCT (BC-Timeline)
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
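One way to derive the BCT rows from the two Concept Contexts is sketched below. It is an illustration under assumptions, not the "clearly defined and performant approach" the slides refer to: the helper names `dt` and `bc_timeline` are hypothetical, and the sketch assumes each Concept Context has contiguous validity periods per business key with an open-ended last period, so only the load timestamps are needed.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # open-ended validity, as in the example data


def dt(text):
    """Parse a timestamp written as in the example tables (d-m-yyyy hh:mm:ss)."""
    return datetime.strptime(text, "%d-%m-%Y %H:%M:%S")


def bc_timeline(*contexts):
    """Combine the validity timelines of several Concept Contexts.

    Each context is a list of (business_key, load_dts) pairs. Every distinct
    load timestamp of a key starts a new combined period; a period ends one
    second before the next one starts.
    """
    starts = {}
    for context in contexts:
        for key, load_dts in context:
            starts.setdefault(key, set()).add(load_dts)

    timeline = []
    for key, key_starts in starts.items():
        ordered = sorted(key_starts)
        for i, start in enumerate(ordered):
            if i + 1 < len(ordered):
                end = ordered[i + 1] - timedelta(seconds=1)
            else:
                end = OPEN_END
            timeline.append((key, start, end))
    return timeline


# Load timestamps from the [adres] and [valuation] example tables.
adres = [("Ut", dt("5-6-2013 22:00:00")), ("Ut", dt("4-7-2015 22:00:00")),
         ("Asd", dt("5-6-2013 22:00:00"))]
valuation = [("Ut", dt("1-1-2014 22:00:00")), ("Ut", dt("1-1-2015 22:00:00")),
             ("Ut", dt("1-3-2016 22:00:00"))]

bct = bc_timeline(adres, valuation)
for key, start, end in bct:
    print(key, start, end)
```

With the example data this yields five periods for Ut and one for Asd, the same periods as the BCT table above.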
What makes PS-3C a different ensemble?
1) Explicitly splitting the work
Data + History
Subjects + Integration
2) NO HASHED BUSINESS KEYS or surrogate keys
Only concatenated ones
httpwwwcannabisculturecomfilesimages6hashbrickJPG
3) Fewer joins
Relation + technical validity timeline + relation context, together in one entity
4) Explicit HELPER entities
Business Alias
Business Concept Timeline (BC-Timeline)
+ explicitly define driving key(s)
Hope to AVOID this...
http://xkcd.com/927
PS-3C: a new PROPOSED ensemble modelling technique
Help needed
Questions?
But isn't Data Vault v2 'made for Big data centric systems'?
In DV2 you still do this in one go:
[Diagram: EDW = Subject Oriented, Integrated, Time Variant, Non-Volatile]
Coding = a lot like modelling
Being data centric conflicts with the complex data MODELLING work
http://xkcd.com/844
problem
Less mature JOIN optimizers
Key-Value, Document and Column Family NoSQL databases + SQL-on-Hadoop solutions do NOT like joins
This REALLY complicates Anchor Modeling
And personally: this one too (Link Satellite)
[Diagram: Data Vault model with HUB, SAT and LINK entities, including a Link Satellite]
problem
a) Hashing of business keys

Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32

Hashing...
Loses statistical information regarding the data distribution: query optimizers do not like this...
Column Family, Document and Key-Value databases need a good (natural) sharding key for (partial) key lookups
http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2
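The difference between a concatenated business key and its MD5 hash can be sketched as follows. The helper names (`concatenated_key`, `md5_key`, `partial_lookup`) are hypothetical, and the hash values printed here are not claimed to match the table above, since the exact input encoding used for the slide is not known. The sensor readings are taken from the table.

```python
import hashlib


def concatenated_key(*parts, sep="|"):
    """Join the natural key parts into one concatenated business key."""
    return sep.join(str(part) for part in parts)


def md5_key(business_key):
    """Hash a business key to a fixed-length 32-character hex string."""
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()


def partial_lookup(keys, prefix):
    """Partial key lookup: all keys that start with the given leading part."""
    return [key for key in keys if key.startswith(prefix)]


# (Rolling Stock Nr, Datetime, Sensor Id) from the table above.
readings = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:27", "72A1_SLDET"),
    ("8739", "2015-01-22 01:34:32", "72A1_FINV"),
    ("8674", "2015-01-22 01:34:26", "72A1_FINV"),
]

natural_keys = sorted(concatenated_key(*reading) for reading in readings)
hashed_keys = sorted(md5_key(key) for key in natural_keys)

# Natural keys sort by rolling stock and time, and support prefix lookups.
per_vehicle = partial_lookup(natural_keys, "8739|")
print(per_vehicle)

# Hashed keys all have the same length and carry no usable prefix or ordering.
print(hashed_keys)
```

Sorting or range-scanning the natural keys keeps readings of one rolling stock number together, whereas the hashed keys scatter them, which is the loss of distribution information noted above.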
b) Surrogate business keys
Surrogate keys require centralized coordination
...and thus can impact the overall system's scalability and availability
A lot of MPP / NoSQL databases simply do not have them...
Then some inspiration
http://roelantvos.com/blog/?p=1119
'In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse'
'The Historical Staging Area effectively 'acts' as Data Lake, but in a better defined form as data deltas and event datetimes are taken into account'
So:
[Diagram: the EDW (Subject Oriented, Integrated, Time Variant, Non-Volatile) split into two parts: 'Time Variant & Non Volatile' and 'Subject Oriented & Integrated']
[Diagram: the EDW in two parts. 'Time Variant & Non Volatile': could be a Data LAKE. 'Subject Oriented & Integrated': virtualised ensemble tier]
How does PS-3C work?
Focus of current ensemble EDWs: Staging Area, EDW, Information Marts
Splitting the work: Persistent Staging Area / HSA (= Data Library), EDW, Information Marts
[Diagram: PS-3C = Persistent Staging + 3C (Business Concept, Concept Context, Connector)]
Persistent Staging - how
Identify the source / event stream Primary or Unique Key; use source metadata for this
Automate the building of a PS 'around this key'; take all columns
Historize using the SCD-2 approach
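The steps above can be sketched as an SCD-2 style load around the source key. This is a minimal illustration under assumptions: the function `historize`, the full-snapshot delivery and the choice to register a delete as a new record with a deleted indicator are hypothetical simplifications. The META_ column names and the station rows follow the example data elsewhere in the deck.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # open-ended validity marker
META = ("META_Load_dts", "META_Load_end_dts", "META_Deleted_Ind")


def payload(row):
    """All source columns of a staging row, without the META_ columns."""
    return {col: val for col, val in row.items() if col not in META}


def historize(staging, snapshot, key_columns, load_dts):
    """Apply a full source snapshot to a persistent staging table (in place)."""
    def key_of(row):
        return tuple(row[col] for col in key_columns)

    current = {key_of(r): r for r in staging if r["META_Load_end_dts"] == OPEN_END}
    incoming = {key_of(r): r for r in snapshot}

    def add_version(source_row, deleted):
        staging.append({**source_row, "META_Load_dts": load_dts,
                        "META_Load_end_dts": OPEN_END,
                        "META_Deleted_Ind": deleted})

    for key, source_row in incoming.items():
        active = current.get(key)
        unchanged = (active is not None and active["META_Deleted_Ind"] == 0
                     and payload(active) == source_row)
        if unchanged:
            continue
        if active is not None:
            # End-dating needs an update of an existing row.
            active["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
        add_version(source_row, 0)

    for key, active in current.items():
        if key not in incoming and active["META_Deleted_Ind"] == 0:
            active["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
            add_version(payload(active), 1)  # register the delete as a new record
    return staging


staging = []
historize(staging,
          [{"station": "Ut", "Postadres_postcode": "3500GJ"},
           {"station": "Asd", "Postadres_postcode": "1012 AB"}],
          ["station"], datetime(2013, 6, 5, 22, 0, 0))
historize(staging,
          [{"station": "Ut", "Postadres_postcode": "3511 CE"},
           {"station": "Asd", "Postadres_postcode": "1012 AB"}],
          ["station"], datetime(2015, 6, 5, 22, 0, 0))
for row in staging:
    print(row)
```

Because every column is taken and only the key is configured, the same routine can be generated for each source table from its metadata. Note that end-dating changes rows that were already stored.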
Persistent Staging Metadata-1
Entity level:
Unique key
Functional description
Delivering party
Owner / Responsible
MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation...)
...
Persistent Staging Metadata-2
Column level:
[Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR delete as new record
[Source system] on table / file level (lowest possible)
Load End Date Timestamp: possible but difficult... Requires updates
About Me
lsquoHead of BIrsquo Spillgames
Certified Data Vault modeler since 2009
Contact details nllinkedincominrogierwerschkull
rogierwerschkullgmailcom
rwerschkull
Data Flood
Photo credit Kurayba (httpswwwflickrcomphotos48503330N0828564454666 )
under cc licence (httpscreativecommonsorglicensesby-sa20)
rwerschkull
nllinkedincominrogierwerschkull
The possible resulthellip
Photo credit httpshighfiveexportswordpresscom201006253000-pieces-lego-mix-specialty-pieces-rare-pieces-bricks-blocks-
parts-more-ultimate-lot-of-lego-parts-pieces-lego-for-sale-lego-batman-lego-starwars-lego-technic-lego-minifigur
rwerschkull
nllinkedincominrogierwerschkull
But isNrsquot Data Vault v2
lsquomade for
Big data centric
systemsrsquorwerschkull
nllinkedincominrogierwerschkull
In
DV2you still
do thisin one go
Subject Oriented
Integrated
Time Variant Non-Volatile
EDW
rwerschkull
nllinkedincominrogierwerschkull
Coding =
A lot like modelling
Being Data Centric
conflictswith thecomplex Data
MODELLINGwork
httpxkcdcom844
problem
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Less Mature
JOINoptimizers
Photo credit Public Domainrwerschkull
nllinkedincominrogierwerschkull
Key-Value
Document
Column Family
NoSQL Databases
+SQL on Hadoop solutions
Do NOT like joins
Photo credit Public Domainrwerschkull
nllinkedincominrogierwerschkull
This
REALLYcomplicates
AnchoRModeling
rwerschkull
nllinkedincominrogierwerschkull
And Personally
HUB
SAT
LINK
This one
too(Link Satellite)
HUB
HUB
SAT
SAT
rwerschkull
nllinkedincominrogierwerschkull
problem
rwerschkull
nllinkedincominrogierwerschkull
a) HASHING OF Business keys
Rolling
Stock Nr Datetime Sensor Id Value Concatenated Business Key
Key
Len MD5 Hash
Key
Len
8739
2015-01-22
013427 72A1_FINV 123 8739|2015-01-22 013427|72A1_FINV 34 86ae4c6b0e2e2d5a13a0d11440529aeb 32
8739
2015-01-22
013427 72A1_SLDET 100 8739|2015-01-22 013427|72A1_SLDET 35 51ce9bc292eef407bd7c91a52eebcf2e 32
8739
2015-01-22
013432 72A1_FINV 126 8739|2015-01-22 013432|72A1_FINV 34 9482a41c1fecc4c64b8c437af6cc85e8 32
8739
2015-01-22
013432
13A8_MW_UB
AT_VT 5 8739|2015-01-22 013432|13A8_MW_UBAT_VT 42 e4160914ee55ce0b93f87b23366a0ce3 32
8674
2015-01-22
013426 72A1_FINV 6 8674|2015-01-22 013426|72A1_FINV 34 fcb3e7c8c91e44ce396d908a4948ca65 32
8674
2015-01-22
013426
16A1_HSVER
OND 7 8674|2015-01-22 013426|16A1_HSVEROND 38 fe9098c8c291ad56af5c8afae5169196 32
Loses statistical Informationregarding the data distributionQuery optimizers do not like thishellip
Column family Document and Key-value databases need a
good (natural) sharding key for (partial) key-
lookups
Hashinghelliphellip
httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2rwerschkull
nllinkedincominrogierwerschkull
Surrogates keys require
centralized coordination
hellipand thus can impact the overall systemrsquos scalability and availability
A lot of MPP NoSQL databases simply do not have themhellip
B) Surrogate BuSINESS keys
rwerschkull
nllinkedincominrogierwerschkull
Then Some Inspiration
httproelantvoscomblogp=1119
rwerschkull
nllinkedincominrogierwerschkull
lsquoIn my opinion the answer lies in the adoption of the
persistent (Historical) Staging Area concept
(also known as Historical Staging or the History Area)
This basically adopts the fundamentals of a Data Warehousersquo
lsquoThe Historical Staging Area effectively lsquoactsrsquo as
Data Lake but in a better defined form as data deltas and
event datetimes are taken into accountrsquo
So
Time Variant
amp
Non Volatile
Subject Oriented
amp
Integrated
EDW
Subject Oriented Integrated
Time Variant Non-Volatile
EDW
rwerschkull
nllinkedincominrogierwerschkull
Could be a
Data LAKE
VirtualisedEnsemble
Tier
EDW
Time Variant
amp
Non Volatile
Subject Oriented
amp
Integrated
EDW
rwerschkull
nllinkedincominrogierwerschkull
How
Does PS-3C Work
rwerschkull
nllinkedincominrogierwerschkullPhoto credit Public Domain
StagingArea
EDWInformation
Marts
Focus of Current ensemble EDWrsquos
rwerschkull
nllinkedincominrogierwerschkull
Persistent StagingArea HSA =
Data LibraryEDW
Information Marts
Splitting the work
rwerschkull
nllinkedincominrogierwerschkull
Persitent
Staging
-
Concept
Context
Connector
Business
Concept
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
Identify source event stream Primary or Unique KeyUse source metadata for this
Automate the building of a PS lsquoaround this keyrsquo Take all columns
Historize using SCD-2 approach
Persistent Staging - how
rwerschkull
nllinkedincominrogierwerschkull
Entity levelUnique key
Functional Description
Delivering party
Owner Responsible
MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)
hellip
Persistent Staging Metadata-1
rwerschkull
nllinkedincominrogierwerschkull
Column level [Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR delete as new record
[Source system] on table file level (lowest possible)
Load End Date Timestamp possible but difficulthellipRequires updates
Persistent Staging Metadata-2
rwerschkull
nllinkedincominrogierwerschkull
ACID is possible in HIVE
ACID Makes Updates possibleBy registering updates as lsquonew datarsquo
Reconciliation compacting when idle at user command
Use ORC files
PLUS changing the HIVE configurationhellip
UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)
rwerschkull
nllinkedincominrogierwerschkull
HivePut semi structured data = variable columns in MAP data type
OR use Data storage type that supports schema-evolutionAVRO (ORC in development)
Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo
Schema can be different for every row
What about SEMI-STRUCTURED Data
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
3C - how
rwerschkull
nllinkedincominrogierwerschkull
Always starts with Conceptual data modeling
NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic
No Link Satellites
No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo
Explicit Helper entities
Like Data Vault(2) BUT
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
a UNIQUE Domain specific point of integration
hellipa business entity
hellipwithin itrsquos own domain
hellipdoes not necessarily need to be Enterprise Wide
Business Concept (BC)
rwerschkull
nllinkedincominrogierwerschkull
Why not lsquoenterprise widersquoCompany
Customer
Sales Customer
International
Sales Customer
Local
Sales Customer
Marketing Customer
Customer hellip
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description]
[Owner Responsible]
Column level [Load Date Timestamp]
[Source system] on table file level (lowest possible)
Business Concept Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example-Data
NSR-Station
NS-
Travelcard
NS-
Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
hellip
Business
Key
META_Source META_Load_dts
Ut NSS1_y 5-6-2015 220000
Asd NSS1_y 5-6-2015 220000
Asa NSS2_p 6-6-2015 220000
hellip hellip hellip
NS-
Traveller
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
hellip
Business Key
CRM-RW123456
CRM-LAS224466
hellip
rwerschkull
nllinkedincominrogierwerschkull
BC important notes
The easiest entity to virtualise (if performance allows)
No hashing & no surrogate BUSINESS KEYS (not by default, at least)
3C - Details: Concept Context (CC)
Contains context about a Concept, in a historical way
… like a Data Vault Satellite
Every CC belongs to only one BC
Separate entity per source system table / stream
Concept Context Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag], OR register a delete as a new record
Example
NSR-Station
  [valuation]         Source NSS2, table p
  [description]       Source NSS1, table x
  [adres]             Source NSS1, table y
NS-Trainseries
  [description]       Source NTR, table q
NS-Travelcard
  [ovchip_personal]   Source NSR, table r
  [ovchip_on-usage]   Source NSR, table s
NS-Traveller
  [personal_details]  Source NSR, table t
  [adres_data]        Source NSR, table t
Example-data: NSR-Station, Concept Context [adres] (Source NSS1, table y)
BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                  …  …            …                  …
CC important notes
More difficult to virtualise: it depends on the semantic gap with the source
But do make it virtual when 'streaming data' is necessary
Because we have the PS layer, exposing all columns is not necessary
Refactoring is easier…
3C - Details: Connector (C)
Relations between Concepts + Context, in a historical way
… a merger of the Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time
Connector Driving key
Explicitly defining a driving key as metadata…
Gives business understanding
Makes it possible for a Connector to correctly handle delta data deliveries…
• so that a change (on the driving key)
• is not registered as a new 'connection'
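How a driving key lets a Connector absorb a delta delivery can be sketched as follows. This is only an illustration under stated assumptions, not the author's implementation: the class and function names are hypothetical, the driving key (travelcard + check-in timestamp) follows the deck's NSR-Travelmovement example, the scenario of a check-out station arriving in a later delivery is invented for the example, and closing a version one second before the next one starts mimics the end dates in the example data.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import List, Tuple

OPEN_END = datetime(9999, 12, 31)  # open-ended validity, as in the example data


@dataclass(frozen=True)
class Travelmovement:
    travelcard: str
    checkin: datetime
    station_from: str
    station_to: str  # empty until the check-out is known (assumption)
    load_dts: datetime
    load_end_dts: datetime = OPEN_END

    def driving_key(self) -> Tuple[str, datetime]:
        # The (sub)set of keys that identifies one connection at a point in time.
        return (self.travelcard, self.checkin)


def apply_delta(rows: List[Travelmovement], delta: Travelmovement) -> List[Travelmovement]:
    """Apply one delta row: same driving key means a new version, not a new connection."""
    result = []
    for row in rows:
        same_connection = (
            row.load_end_dts == OPEN_END and row.driving_key() == delta.driving_key()
        )
        if same_connection:
            if (row.station_from, row.station_to) == (delta.station_from, delta.station_to):
                return list(rows)  # nothing changed, keep the history as it is
            # End-date the current version one second before the new one starts.
            row = replace(row, load_end_dts=delta.load_dts - timedelta(seconds=1))
        result.append(row)
    result.append(delta)
    return result


checkin = datetime(2016, 4, 5, 8, 49, 32)
first = Travelmovement("3528 0234 2073 1234", checkin, "Asd", "", datetime(2016, 4, 5, 22))
update = Travelmovement("3528 0234 2073 1234", checkin, "Asd", "Ut", datetime(2016, 4, 6, 22))
history = apply_delta(apply_delta([], first), update)
open_rows = [r for r in history if r.load_end_dts == OPEN_END]
```

Without the driving key, the second delivery would look like a second, unrelated travel movement; with it, the history holds two versions of one connection and only one of them is open.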
Connector Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag], OR register a delete as a new record
Example
Connector NSR-Travelmovement (with Checkin timestamp): from NSR-Station, to NSR-Station
Driving key: NS-Travelcard + Checkin timestamp
Example-Data: Connector NSR-Travelmovement (between NSR-Station, NS-Travelcard and NS-Trainseries)
BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00
Are we there yet? No
We add two mandatory HELPER entities
3C - Details: Business Alias
Helps in switching from sources that are tied together by technical (surrogate) keys… to a business-key-based model
It's a LOOKUP table that translates the technical key to the Business Key
Example
NSR-Station
  [valuation]    Source NSS2, table p
  [description]  Source NSS1, table x
  [adres]        Source NSS1, table y
BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables
Example-data

BA-NSS1 (key lookup for NSS1 source tables)
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …
BA important details
Has a 1 : (0,1) relation with a Business Concept
More difficult to virtualise; the lookup table should be kept small
Therefore, DO NOT do the key lookup in the Concept Context entity
Load / generate together with the BC
Preferably 'in memory' somehow…
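The Business Alias lookup itself is small enough to sketch with an in-memory dictionary. The values come from the BA-NSS1 example data; the function name, the `station_id` column and the row layout are hypothetical, and the sketch deliberately says nothing about where in the load the lookup should run.

```python
from typing import Dict, List, Optional

# BA-NSS1: technical (surrogate) key of source NSS1 -> Business Key,
# using the values from the example data.
BA_NSS1: Dict[int, str] = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def translate_rows(
    rows: List[dict],
    alias: Dict[int, str],
    key_column: str = "station_id",  # hypothetical name of the surrogate key column
) -> List[dict]:
    """Replace the technical key in each source row with the Business Key.

    Rows whose surrogate key is not in the alias get None, so that missing
    aliases stay visible instead of being dropped silently.
    """
    translated = []
    for row in rows:
        out = {column: value for column, value in row.items() if column != key_column}
        business_key: Optional[str] = alias.get(row[key_column])
        out["BK_NSR-Station"] = business_key
        translated.append(out)
    return translated


source_rows = [
    {"station_id": 123522, "postcode": "3500GJ"},
    {"station_id": 999999, "postcode": "0000XX"},  # no alias known for this key
]
translated = translate_rows(source_rows, BA_NSS1)
```

Keeping the alias as one small key-to-key mapping per source system is what makes the 'in memory' preference on the slide realistic.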
3C - Details: BC-Timeline
Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
Like a Data Vault Point-in-time construct
But mandatory
And with a clearly defined and performant approach
Example: NSR-Station

Concept Context [valuation] (Source NSS2, table p)
BK_NSR-Station  WOZ value   Value (rating agency X)  META_Laad_dts      META_Laad_eind_dts
Ut              20 million  18 million               1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million               1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million               1-3-2016 22:00:00  31-12-9999 00:00:00

Concept Context [adres] (Source NSS1, table y)
BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                  …  …            …                  …

BCT (BC-Timeline)
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
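The combined timeline for station Ut in the example can be reproduced with a short sketch. It assumes, as in the example data, that each Concept Context has gap-free validity intervals for the business key and that a version ends one second before the next one starts; the function and variable names are hypothetical and this is not presented as the author's 'clearly defined and performant approach'.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

OPEN_END = datetime(9999, 12, 31)  # open-ended validity, as in the example data
Interval = Tuple[datetime, datetime]  # (load_dts, load_end_dts)


def build_timeline(context_intervals: Iterable[Iterable[Interval]]) -> List[Interval]:
    """Merge the validity timelines of several Concept Contexts of one business key.

    Every distinct start timestamp opens a new combined interval; each interval
    ends one second before the next one starts, the last one stays open.
    """
    starts = sorted({start for intervals in context_intervals for start, _ in intervals})
    timeline = []
    for i, start in enumerate(starts):
        if i + 1 < len(starts):
            end = starts[i + 1] - timedelta(seconds=1)
        else:
            end = OPEN_END
        timeline.append((start, end))
    return timeline


def at_22h(year: int, month: int, day: int) -> datetime:
    """Helper: the example loads all happen at 22:00:00."""
    return datetime(year, month, day, 22)


one_second = timedelta(seconds=1)
adres_ut = [
    (at_22h(2013, 6, 5), at_22h(2015, 7, 4) - one_second),
    (at_22h(2015, 7, 4), OPEN_END),
]
valuation_ut = [
    (at_22h(2014, 1, 1), at_22h(2015, 1, 1) - one_second),
    (at_22h(2015, 1, 1), at_22h(2016, 3, 1) - one_second),
    (at_22h(2016, 3, 1), OPEN_END),
]
timeline_ut = build_timeline([adres_ut, valuation_ut])
```

With the timeline materialised once per Business Concept, a point-in-time query only has to match each Concept Context against one combined interval rather than compare every pair of context timelines.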
What makes PS-3C a different ensemble?
1) Explicitly splitting the work
Data + History
Subjects + Integration
2) NO HASHED BUSINESS KEYS or surrogate keys
Only concatenated ones
httpwwwcannabisculturecomfilesimages6hashbrickJPG
3) Fewer joins
Relation + technical validity timeline + relation context, together in one entity
4) Explicit HELPER entities
Business Alias
Business Concept Timeline (BC-Timeline)
+ explicitly define driving key(s)
Hope to AVOID this…
http://xkcd.com/927

PS-3C: A new PROPOSED ensemble modelling technique
Help needed
Questions?
But isn't Data Vault v2 'made for Big data centric systems'?
In DV2 you still do this in one go:
Subject Oriented, Integrated, Time Variant, Non-Volatile EDW
Problem
Being data centric conflicts with the complex data MODELLING work
Coding = a lot like modelling
http://xkcd.com/844
Less mature JOIN optimizers
Key-Value, Document and Column Family NoSQL databases + SQL-on-Hadoop solutions do NOT like joins
This REALLY complicates Anchor Modeling
And personally: this one too (Link Satellite)
[Diagram: Data Vault HUB, LINK and SAT entities, with the Link Satellite pointed out]
Problem

a) HASHING of business keys

Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32

Hashing…
Loses statistical information regarding the data distribution; query optimizers do not like this…
Column family, Document and Key-value databases need a good (natural) sharding key for (partial) key lookups
http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2
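The point about partial key lookups can be made concrete with a small sketch. The keys are taken from the table above; the helper names are hypothetical, and the MD5 values produced here are not claimed to match the ones on the slide, since the exact string and encoding that were hashed there are not stated.

```python
import hashlib
from typing import Iterable, List

keys = [
    "8739|2015-01-22 01:34:27|72A1_FINV",
    "8739|2015-01-22 01:34:27|72A1_SLDET",
    "8739|2015-01-22 01:34:32|72A1_FINV",
    "8674|2015-01-22 01:34:26|72A1_FINV",
]


def md5_key(business_key: str) -> str:
    """Hash a concatenated business key (UTF-8 encoding is an assumption)."""
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()


def prefix_lookup(all_keys: Iterable[str], prefix: str) -> List[str]:
    """Partial key lookup: everything that starts with the leading key part(s)."""
    return sorted(k for k in all_keys if k.startswith(prefix))


# With the natural key, "all readings of rolling stock 8739" is a prefix scan.
readings_8739 = prefix_lookup(keys, "8739|")

# With hashed keys there is no such prefix: related keys are scattered, so the
# same question needs a full scan or a separate index on the original columns.
hashed = {md5_key(k): k for k in keys}
hash_lengths = {len(h) for h in hashed}
```

The hash is always 32 characters, so it is hardly shorter than these natural keys, while it gives up the ordering and prefix structure that sharding and range scans rely on.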
b) SURROGATE business keys
Surrogate keys require centralized coordination
… and thus can impact the overall system's scalability and availability
A lot of MPP / NoSQL databases simply do not have them…
Then some inspiration
http://roelantvos.com/blog/?p=1119
'In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse'
'The Historical Staging Area effectively 'acts' as Data Lake but in a better defined form as data deltas and event datetimes are taken into account'
So
EDW (Subject Oriented, Integrated, Time Variant, Non-Volatile) = 'Time Variant & Non Volatile' + 'Subject Oriented & Integrated'
Time Variant & Non Volatile: could be a Data LAKE
Subject Oriented & Integrated: virtualised ensemble tier
How does PS-3C work?
Focus of current ensemble EDWs: Staging Area, EDW, Information Marts
Splitting the work: Persistent Staging Area (HSA) = Data Library, EDW, Information Marts
Persistent Staging - Concept, Context, Connector
Business Concept
Persistent Staging - how
Identify the source / event stream primary or unique key; use source metadata for this
Automate the building of a PS 'around this key'; take all columns
Historize using an SCD-2 approach
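The 'historize around the key' step can be sketched as a small SCD-2 routine. This is an illustration under assumptions, not the author's loader: it assumes every delivery is a full snapshot of the source table, it registers a delete as a new record with a deleted flag (one of the two options named in the metadata slides), and the function name and row layout are hypothetical. The META_ column names follow the example data.

```python
from datetime import datetime, timedelta
from typing import List

OPEN_END = datetime(9999, 12, 31)  # open-ended validity, as in the example data
META = {"META_Load_dts", "META_Load_end_dts", "META_Deleted_Ind"}


def historize(history: List[dict], snapshot: List[dict], key: str,
              load_dts: datetime) -> List[dict]:
    """Apply one full snapshot to the history, SCD-2 style, keeping all columns."""
    incoming = {row[key]: row for row in snapshot}
    result, seen = [], set()
    for old in history:
        if old["META_Load_end_dts"] != OPEN_END:
            result.append(old)  # closed versions never change
            continue
        seen.add(old[key])
        payload = {c: v for c, v in old.items() if c not in META}
        new = incoming.get(old[key])
        unchanged = new == payload and old["META_Deleted_Ind"] == 0
        still_deleted = new is None and old["META_Deleted_Ind"] == 1
        if unchanged or still_deleted:
            result.append(old)
            continue
        # Close the current version one second before the new one starts.
        result.append(dict(old, META_Load_end_dts=load_dts - timedelta(seconds=1)))
        if new is None:  # key disappeared: register the delete as a new record
            result.append(dict(payload, META_Load_dts=load_dts,
                               META_Load_end_dts=OPEN_END, META_Deleted_Ind=1))
        else:
            result.append(dict(new, META_Load_dts=load_dts,
                               META_Load_end_dts=OPEN_END, META_Deleted_Ind=0))
    for k, new in incoming.items():
        if k not in seen:  # key not seen before: first version
            result.append(dict(new, META_Load_dts=load_dts,
                               META_Load_end_dts=OPEN_END, META_Deleted_Ind=0))
    return result


h0 = historize([], [{"station": "Ut", "postcode": "3500GJ"}], "station",
               datetime(2013, 6, 5, 22))
h1 = historize(h0, [{"station": "Ut", "postcode": "3511 CE"}], "station",
               datetime(2015, 6, 5, 22))
h2 = historize(h1, [], "station", datetime(2016, 1, 1, 22))
```

Because the routine only needs the key column and treats every other column as payload, it can be generated from source metadata for each table, which is the automation the slide asks for.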
Persistent Staging Metadata-1
Entity level:
Unique key
Functional description
Delivering party
Owner / Responsible
MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
…
Persistent Staging Metadata-2
Column level:
[Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag], OR register a delete as a new record
[Source system] on table / file level (lowest possible)
Load End Date Timestamp: possible but difficult… it requires updates
UPDATES IN HIVE (isn't HDFS append-only?)
ACID is possible in Hive
ACID makes updates possible by registering updates as 'new data'
Reconciliation / compacting when idle or at user command
Use ORC files
PLUS changing the Hive configuration…
What about SEMI-STRUCTURED data?
Hive: put semi-structured data (= variable columns) in a MAP data type
OR use a data storage type that supports schema evolution: AVRO (ORC in development)
Or HBase… It only has one data type (byte); the schema is 'applied'
The schema can be different for every row
modelling
rwerschkull
nllinkedincominrogierwerschkull
3C - how
rwerschkull
nllinkedincominrogierwerschkull
Always starts with Conceptual data modeling
NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic
No Link Satellites
No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo
Explicit Helper entities
Like Data Vault(2) BUT
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
a UNIQUE Domain specific point of integration
hellipa business entity
hellipwithin itrsquos own domain
hellipdoes not necessarily need to be Enterprise Wide
Business Concept (BC)
rwerschkull
nllinkedincominrogierwerschkull
Why not lsquoenterprise widersquoCompany
Customer
Sales Customer
International
Sales Customer
Local
Sales Customer
Marketing Customer
Customer hellip
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description]
[Owner Responsible]
Column level [Load Date Timestamp]
[Source system] on table file level (lowest possible)
Business Concept Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example-Data
NSR-Station
NS-
Travelcard
NS-
Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
hellip
Business
Key
META_Source META_Load_dts
Ut NSS1_y 5-6-2015 220000
Asd NSS1_y 5-6-2015 220000
Asa NSS2_p 6-6-2015 220000
hellip hellip hellip
NS-
Traveller
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
hellip
Business Key
CRM-RW123456
CRM-LAS224466
hellip
rwerschkull
nllinkedincominrogierwerschkull
Most easy entity to be virtualised(if performance allows)
No Hashing amp No surrogateBUSINESSKEYS
(not by default at least)
BC important notes
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Containts Context about a Concept In a historical way
hellipLike a Data Vault Satellite
Every CC belongs to only one BC
Seperate entity per source system table stream
Concept Context (CC)
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description] [Owner Responsible]
Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)
Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record
Concept Context Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example
NSR-StationNS-
Travelcard
NS-
TrainseriesNS-
Traveller
[valuation]Source NSS2 table p
[description]SourceNSS1 table x
[adres]Source NSS1 table y
[description]Source NTR table q
[ovchip_
personal]Source NSR table r
[ovchip_
on-usageSource NSR table s
[personal_
details]Source NSR table t
[adres_
data]Source NSR table t rwerschkull
nllinkedincominrogierwerschkull
Example data: NSR-Station, [adres] Source NSS1 table y
BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                 …  …            …                  …

CC important notes
More difficult to virtualise: depends on the semantic gap with the source
But do make it virtual when 'streaming data' is necessary
Because we have the PS layer, exposing all columns is not necessary
Refactoring is easier…

3C - Details: Connector (C)
Relations between Concepts, plus context, in a historical way
…a merger of the Data Vault Link and Link Satellite
Must ALWAYS have a driving key defined: a (sub)set of keys that makes a Connector unique at one point in time

Connector Driving key
Explicitly defining a driving key as metadata…
• gives business understanding
• makes it possible for the Connector to handle delta data deliveries correctly, so that a change (on the driving key) is not registered as a new 'connection'

Connector Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example
[Diagram: the Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with the same Concept Contexts as in the Concept Context example, plus the Connector NSR-Travelmovement (Checkin timestamp, from, to)]
Driving Key: NS-Travelcard + Checkin timestamp

Example data: Connector NSR-Travelmovement (Checkin timestamp, from, to) between NSR-Station, NS-Travelcard and NS-Trainseries
BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

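To make the role of the driving key concrete, here is a small sketch of how a delta delivery might be applied to a Connector. It assumes the driving key of the travel-movement example (NS-Travelcard + Checkin timestamp); the function `apply_delta`, the in-memory list of dicts and the "checkout not yet known" first delivery are illustrative assumptions, not the deck's implementation.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # open-ended validity, as in the example data


def apply_delta(connector_rows, delta, driving_key, load_dts):
    """Apply one delta record to a list of Connector rows (dicts), in place.

    Returns "new", "changed" or "unchanged".
    """
    key = tuple(delta[c] for c in driving_key)
    new_row = {**delta, "META_Load_dts": load_dts, "META_Load_end_dts": HIGH_DATE}
    for row in connector_rows:
        is_open = row["META_Load_end_dts"] == HIGH_DATE
        if is_open and tuple(row[c] for c in driving_key) == key:
            if all(row[c] == v for c, v in delta.items()):
                return "unchanged"
            # Same driving key, different content: close the old version
            # instead of registering a second, parallel connection.
            row["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
            connector_rows.append(new_row)
            return "changed"
    connector_rows.append(new_row)
    return "new"


DRIVING_KEY = ("BK_NS-Travelcard", "Checkin timestamp")
trip = {
    "BK_NSR-Station-from": "Asd",
    "BK_NSR-Station-to": "Ut",
    "BK_NS_Trainseries": "IC 855",
    "BK_NS-Travelcard": "3528 0234 2073 1234",
    "Checkin timestamp": datetime(2016, 4, 5, 8, 49, 32),
    "Checkout timestamp": None,  # hypothetical: checkout not yet delivered
}
rows = []
first = apply_delta(rows, trip, DRIVING_KEY, datetime(2016, 4, 5, 22, 0, 0))
second = apply_delta(
    rows,
    {**trip, "Checkout timestamp": datetime(2016, 4, 5, 9, 40, 12)},
    DRIVING_KEY,
    datetime(2016, 4, 6, 22, 0, 0),
)
open_rows = [r for r in rows if r["META_Load_end_dts"] == HIGH_DATE]
```

Without the driving key the second delivery would look like a brand-new relation; with it, the loader recognises the same travel movement and only versions it.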
Are we there yet? No
We add two mandatory HELPER entities

3C - Details: Business Alias
Helps to switch from sources that are tied together by technical (surrogate) keys…
…to a Business Key based model
It's a LOOKUP table that translates the technical key to the Business Key

Example
NSR-Station
[valuation] Source NSS2 table p
[description] Source NSS1 table x
[adres] Source NSS1 table y
BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables

Example data

BA-NSS1: key lookup for NSS1 source tables
Business Key   NSS1_Surrogate_key
Ut             123522
Asd            666323
Asa            222443
…              …

NSR-Station
Business Key   META_Source   META_Load_dts
Ut             NSS1_y        5-6-2015 22:00:00
Asd            NSS1_y        5-6-2015 22:00:00
Asa            NSS2_p        6-6-2015 22:00:00
…              …             …

BA important details
Has a 1 to (0,1) relation with a Business Concept
More difficult to virtualise: the lookup table should be kept small
Therefore DO NOT do the key lookup in the Concept Context entity
Load / generate together with the BC
Preferably 'in memory' somehow…

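As an illustration of the lookup role of a Business Alias, the sketch below keeps BA-NSS1 as an in-memory dictionary from source surrogate key to business key. The key pairs come from the example data; the function `to_business_key` and the source record are hypothetical.

```python
# Business Alias BA-NSS1: NSS1 surrogate key -> business key (from the example data).
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(alias, surrogate_key):
    """Translate a technical key to its business key; None if the alias does not know it."""
    return alias.get(surrogate_key)


# Hypothetical source record that only carries the technical key.
source_row = {"station_id": 666323, "postcode": "1012 AB"}
station_bk = to_business_key(BA_NSS1, source_row["station_id"])
unknown_bk = to_business_key(BA_NSS1, 1)
```

Returning `None` for an unknown key leaves the decision (reject, park, or load later) to the caller; a small table like this fits comfortably in memory.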
3C - Details: BC-Timeline
Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
Like a Data Vault Point-in-Time construct
But mandatory
And with a clearly defined and performant approach

Example: BC-Timeline (BCT) for NSR-Station

Concept Context [valuation], Source NSS2 table p
BK_NSR-Station  WOZ value   Value rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

Concept Context [adres], Source NSS1 table y
BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                 …  …            …                  …

BCT
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00

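The combined timeline in the example can be derived mechanically from the load timestamps of the individual Concept Contexts. The sketch below does this for station Ut. It assumes that each Concept Context has a gap-free history, so only the start timestamps are needed; the function name `bc_timeline` is made up for this illustration.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # open end date used in the example tables


def bc_timeline(*context_start_lists):
    """Merge the start timestamps of several Concept Contexts (one business key)
    into consecutive (start, end) validity slices."""
    starts = sorted(set().union(*context_start_lists))
    slices = []
    for i, start in enumerate(starts):
        if i + 1 < len(starts):
            # A slice ends one second before the next one starts.
            end = starts[i + 1] - timedelta(seconds=1)
        else:
            end = HIGH_DATE
        slices.append((start, end))
    return slices


# Load timestamps for station Ut, taken from the two example Concept Contexts.
valuation_starts = [datetime(2014, 1, 1, 22), datetime(2015, 1, 1, 22), datetime(2016, 3, 1, 22)]
adres_starts = [datetime(2013, 6, 5, 22), datetime(2015, 7, 4, 22)]
timeline_ut = bc_timeline(valuation_starts, adres_starts)
```

Each slice in `timeline_ut` is a period in which no Concept Context of the station changes, so a point-in-time query only has to pick one slice and join each context once.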
What makes PS-3C a different Ensemble?

1) Explicitly splitting the work
Data + History
Subjects + Integration

2) NO HASHED BUSINESS KEYS or surrogate keys
Only concatenated ones
Photo credit: http://www.cannabisculture.com/files/images/6/hashbrick.JPG

3) Fewer joins
Relation + technical validity timeline + relation context, together in one entity

4) Explicit HELPER entities
Business Alias
Business Concept Timeline
+ explicitly defined driving key(s)

Hope to AVOID this…
http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique
Help needed
Questions?

About Me
'Head of BI', Spil Games
Certified Data Vault modeler since 2009
Contact details: nl.linkedin.com/in/rogierwerschkull
rogierwerschkull@gmail.com
rwerschkull

In DV2 you still do this in one go:
Subject Oriented, Integrated, Time Variant, Non-Volatile EDW

Being Data Centric conflicts with the complex data MODELLING work
Coding = a lot like modelling
http://xkcd.com/844

Problem
Less mature JOIN optimizers
NoSQL databases (Key-Value, Document, Column Family) + SQL-on-Hadoop solutions do NOT like joins
This REALLY complicates Anchor Modeling
And personally… this one too (Link Satellite)
[Diagram: Data Vault model with HUB, SAT and LINK entities; "this one too" points at the Link Satellite]

Problem
a) HASHING of Business Keys
Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32

Hashing…
• loses statistical information regarding the data distribution; query optimizers do not like this…
• Column family, Document and Key-value databases need a good (natural) sharding key for (partial) key lookups
http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2/

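The difference between a concatenated and a hashed business key can be shown in a few lines. The sketch below builds both for some of the sensor readings from the table, using Python's standard `hashlib`. The helper names are assumptions, and the digests it produces are not claimed to match the values printed on the slide.

```python
import hashlib


def concatenated_key(*parts, sep="|"):
    """Natural business key: the key parts joined with a separator."""
    return sep.join(str(p) for p in parts)


def hashed_key(*parts):
    """MD5 hex digest of the concatenated key (always 32 characters)."""
    return hashlib.md5(concatenated_key(*parts).encode("utf-8")).hexdigest()


readings = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:27", "72A1_SLDET"),
    ("8739", "2015-01-22 01:34:32", "72A1_FINV"),
    ("8674", "2015-01-22 01:34:26", "72A1_FINV"),
]
natural = sorted(concatenated_key(*r) for r in readings)
hashed = sorted(hashed_key(*r) for r in readings)

# Partial key lookup: every reading of rolling stock 8739 shares a prefix,
# so a range or prefix scan on the natural key finds them together.
prefix_hits = [k for k in natural if k.startswith("8739|")]
```

Sorting `natural` groups the readings by rolling stock and time, while the order of `hashed` is effectively random: related rows no longer sit next to each other, and no prefix of the hash identifies a rolling stock.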
b) Surrogate Business Keys
Surrogate keys require centralized coordination…
…and thus can impact the overall system's scalability and availability
A lot of MPP NoSQL databases simply do not have them…

Then: some inspiration
http://roelantvos.com/blog/?p=1119
‘In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse.’
‘The Historical Staging Area effectively ‘acts’ as Data Lake, but in a better defined form, as data deltas and event datetimes are taken into account.’

So…
EDW (Subject Oriented, Integrated, Time Variant, Non-Volatile) =
Time Variant & Non-Volatile + Subject Oriented & Integrated

Time Variant & Non-Volatile: could be a Data LAKE
Subject Oriented & Integrated: virtualised Ensemble tier

How does PS-3C work?

Focus of current ensemble EDWs
Staging Area → EDW → Information Marts

Splitting the work
Persistent Staging Area (HSA) = Data Library → EDW → Information Marts

PS-3C = Persistent Staging - 3C
3C: Concept Context, Connector, Business Concept

Persistent Staging - how
Identify the Primary or Unique Key of the source / event stream; use source metadata for this
Automate the building of a PS 'around this key'; take all columns
Historize using an SCD-2 approach

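The SCD-2 historisation step can be sketched as follows. This is a minimal in-memory illustration that assumes full snapshots are delivered and that a delete is registered as a new record with the deleted flag set; the function `scd2_load` and the two-column example rows are hypothetical, and only the META_ column names follow the example data.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)
META = ("META_Load_dts", "META_Load_end_dts", "META_Deleted_Ind")


def payload(row):
    """The source columns of a history row, without the META_ columns."""
    return {c: v for c, v in row.items() if c not in META}


def scd2_load(history, snapshot, key, load_dts):
    """Historize one full snapshot into `history`; return the number of rows added."""
    open_rows = {r[key]: r for r in history if r["META_Load_end_dts"] == HIGH_DATE}
    incoming = {r[key]: r for r in snapshot}
    added = 0
    for k, new in incoming.items():
        old = open_rows.get(k)
        if old is not None and not old["META_Deleted_Ind"] and payload(old) == new:
            continue  # unchanged: keep the open version
        if old is not None:
            old["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
        history.append({**new, "META_Load_dts": load_dts,
                        "META_Load_end_dts": HIGH_DATE, "META_Deleted_Ind": 0})
        added += 1
    for k, old in open_rows.items():
        if k not in incoming and not old["META_Deleted_Ind"]:
            # Key disappeared from the source: register the delete as a new record.
            old["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
            history.append({**payload(old), "META_Load_dts": load_dts,
                            "META_Load_end_dts": HIGH_DATE, "META_Deleted_Ind": 1})
            added += 1
    return added


history = []
scd2_load(history, [{"BK": "Ut", "postcode": "3500GJ"}, {"BK": "Asd", "postcode": "1012 AB"}],
          "BK", datetime(2013, 6, 5, 22))
added_second = scd2_load(history, [{"BK": "Ut", "postcode": "3511 CE"}, {"BK": "Asd", "postcode": "1012 AB"}],
                         "BK", datetime(2015, 6, 5, 22))
```

Because the loader only needs the key column and treats every other column as payload, the same routine can be generated for any source table once its key is known from metadata.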
Persistent Staging Metadata-1
Entity level:
• Unique key
• Functional Description
• Delivering party
• Owner / Responsible
• MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
• …

Persistent Staging Metadata-2
Column level:
• [Load Date Timestamp]
• [Load End Date Timestamp]
• [Deleted Flag] OR delete as new record
• [Source system] on table / file level (lowest possible)
Load End Date Timestamp: possible but difficult… requires updates

UPDATES IN HIVE (isn't HDFS append-only?)
ACID is possible in HIVE
ACID makes updates possible by registering updates as 'new data'
Reconciliation / compacting: when idle or at user command
Use ORC files
PLUS changing the HIVE configuration…

What about SEMI-STRUCTURED data?
Hive: put semi-structured data (= variable columns) in a MAP data type
OR use a data storage type that supports schema evolution: AVRO (ORC in development)
Or HBASE… it only has one data type (byte); the schema is 'applied'
Schema can be different for every row

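The MAP approach can be illustrated without Hive: a record is split into a fixed set of columns plus one map that holds whatever else the record carries. The column names, the `unit` attribute and the function `to_row` are assumptions for this sketch; values are stored as strings to mimic a string-to-string map.

```python
# Columns every record is expected to have (assumed for this sketch).
FIXED_COLUMNS = ("rolling_stock_nr", "datetime", "sensor_id")


def to_row(record):
    """Split a raw record into fixed columns plus a map of all other attributes."""
    row = {c: record.get(c) for c in FIXED_COLUMNS}
    row["attributes"] = {k: str(v) for k, v in record.items() if k not in FIXED_COLUMNS}
    return row


rows = [
    to_row({"rolling_stock_nr": "8739", "datetime": "2015-01-22 01:34:27",
            "sensor_id": "72A1_FINV", "value": 123}),
    # Hypothetical record with an extra attribute the first one lacks.
    to_row({"rolling_stock_nr": "8674", "datetime": "2015-01-22 01:34:26",
            "sensor_id": "72A1_FINV", "value": 6, "unit": "V"}),
]
```

Both rows share the same top-level layout even though the second one carries an extra attribute, so the table definition does not have to change when a source adds a field.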
3C - how
Like Data Vault (2), BUT:
• Always starts with conceptual data modelling
• NOT the primary location of Data & History: virtualised (only if performance allows); should be deterministic
• No Link Satellites
• No Surrogate or Hash Keys, only 'Concatenated Natural Business Keys'
• Explicit Helper entities

3C - Details: Business Concept (BC)
A UNIQUE, domain-specific point of integration
…a business entity
…within its own domain
…does not necessarily need to be Enterprise Wide

Why not 'enterprise wide'?
[Diagram: one Company, many kinds of Customer: Sales Customer, International Sales Customer, Local Sales Customer, Marketing Customer, Customer …]

Coding =
A lot like modelling
Being Data Centric
conflictswith thecomplex Data
MODELLINGwork
httpxkcdcom844
problem
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Less Mature
JOINoptimizers
Photo credit Public Domainrwerschkull
nllinkedincominrogierwerschkull
Key-Value
Document
Column Family
NoSQL Databases
+SQL on Hadoop solutions
Do NOT like joins
Photo credit Public Domainrwerschkull
nllinkedincominrogierwerschkull
This
REALLYcomplicates
AnchoRModeling
rwerschkull
nllinkedincominrogierwerschkull
And Personally
HUB
SAT
LINK
This one
too(Link Satellite)
HUB
HUB
SAT
SAT
rwerschkull
nllinkedincominrogierwerschkull
problem
rwerschkull
nllinkedincominrogierwerschkull
a) HASHING OF Business keys
Rolling
Stock Nr Datetime Sensor Id Value Concatenated Business Key
Key
Len MD5 Hash
Key
Len
8739
2015-01-22
013427 72A1_FINV 123 8739|2015-01-22 013427|72A1_FINV 34 86ae4c6b0e2e2d5a13a0d11440529aeb 32
8739
2015-01-22
013427 72A1_SLDET 100 8739|2015-01-22 013427|72A1_SLDET 35 51ce9bc292eef407bd7c91a52eebcf2e 32
8739
2015-01-22
013432 72A1_FINV 126 8739|2015-01-22 013432|72A1_FINV 34 9482a41c1fecc4c64b8c437af6cc85e8 32
8739
2015-01-22
013432
13A8_MW_UB
AT_VT 5 8739|2015-01-22 013432|13A8_MW_UBAT_VT 42 e4160914ee55ce0b93f87b23366a0ce3 32
8674
2015-01-22
013426 72A1_FINV 6 8674|2015-01-22 013426|72A1_FINV 34 fcb3e7c8c91e44ce396d908a4948ca65 32
8674
2015-01-22
013426
16A1_HSVER
OND 7 8674|2015-01-22 013426|16A1_HSVEROND 38 fe9098c8c291ad56af5c8afae5169196 32
Loses statistical Informationregarding the data distributionQuery optimizers do not like thishellip
Column family Document and Key-value databases need a
good (natural) sharding key for (partial) key-
lookups
Hashinghelliphellip
httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2rwerschkull
nllinkedincominrogierwerschkull
Surrogates keys require
centralized coordination
hellipand thus can impact the overall systemrsquos scalability and availability
A lot of MPP NoSQL databases simply do not have themhellip
B) Surrogate BuSINESS keys
rwerschkull
nllinkedincominrogierwerschkull
Then Some Inspiration
httproelantvoscomblogp=1119
rwerschkull
nllinkedincominrogierwerschkull
lsquoIn my opinion the answer lies in the adoption of the
persistent (Historical) Staging Area concept
(also known as Historical Staging or the History Area)
This basically adopts the fundamentals of a Data Warehousersquo
lsquoThe Historical Staging Area effectively lsquoactsrsquo as
Data Lake but in a better defined form as data deltas and
event datetimes are taken into accountrsquo
So
Time Variant
amp
Non Volatile
Subject Oriented
amp
Integrated
EDW
Subject Oriented Integrated
Time Variant Non-Volatile
EDW
rwerschkull
nllinkedincominrogierwerschkull
Could be a
Data LAKE
VirtualisedEnsemble
Tier
EDW
Time Variant
amp
Non Volatile
Subject Oriented
amp
Integrated
EDW
rwerschkull
nllinkedincominrogierwerschkull
How
Does PS-3C Work
rwerschkull
nllinkedincominrogierwerschkullPhoto credit Public Domain
StagingArea
EDWInformation
Marts
Focus of Current ensemble EDWrsquos
rwerschkull
nllinkedincominrogierwerschkull
Persistent StagingArea HSA =
Data LibraryEDW
Information Marts
Splitting the work
rwerschkull
nllinkedincominrogierwerschkull
Persitent
Staging
-
Concept
Context
Connector
Business
Concept
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
Identify source event stream Primary or Unique KeyUse source metadata for this
Automate the building of a PS lsquoaround this keyrsquo Take all columns
Historize using SCD-2 approach
Persistent Staging - how
rwerschkull
nllinkedincominrogierwerschkull
Entity levelUnique key
Functional Description
Delivering party
Owner Responsible
MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)
hellip
Persistent Staging Metadata-1
rwerschkull
nllinkedincominrogierwerschkull
Column level [Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR delete as new record
[Source system] on table file level (lowest possible)
Load End Date Timestamp possible but difficulthellipRequires updates
Persistent Staging Metadata-2
rwerschkull
nllinkedincominrogierwerschkull
ACID is possible in HIVE
ACID Makes Updates possibleBy registering updates as lsquonew datarsquo
Reconciliation compacting when idle at user command
Use ORC files
PLUS changing the HIVE configurationhellip
UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)
rwerschkull
nllinkedincominrogierwerschkull
HivePut semi structured data = variable columns in MAP data type
OR use Data storage type that supports schema-evolutionAVRO (ORC in development)
Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo
Schema can be different for every row
What about SEMI-STRUCTURED Data
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
3C - how
rwerschkull
nllinkedincominrogierwerschkull
Always starts with Conceptual data modeling
NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic
No Link Satellites
No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo
Explicit Helper entities
Like Data Vault(2) BUT
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
a UNIQUE Domain specific point of integration
hellipa business entity
hellipwithin itrsquos own domain
hellipdoes not necessarily need to be Enterprise Wide
Business Concept (BC)
rwerschkull
nllinkedincominrogierwerschkull
Why not lsquoenterprise widersquoCompany
Customer
Sales Customer
International
Sales Customer
Local
Sales Customer
Marketing Customer
Customer hellip
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description]
[Owner Responsible]
Column level [Load Date Timestamp]
[Source system] on table file level (lowest possible)
Business Concept Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example-Data
NSR-Station
NS-
Travelcard
NS-
Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
hellip
Business
Key
META_Source META_Load_dts
Ut NSS1_y 5-6-2015 220000
Asd NSS1_y 5-6-2015 220000
Asa NSS2_p 6-6-2015 220000
hellip hellip hellip
NS-
Traveller
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
hellip
Business Key
CRM-RW123456
CRM-LAS224466
hellip
rwerschkull
nllinkedincominrogierwerschkull
Most easy entity to be virtualised(if performance allows)
No Hashing amp No surrogateBUSINESSKEYS
(not by default at least)
BC important notes
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Containts Context about a Concept In a historical way
hellipLike a Data Vault Satellite
Every CC belongs to only one BC
Seperate entity per source system table stream
Concept Context (CC)
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description] [Owner Responsible]
Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)
Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record
Concept Context Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example
NSR-StationNS-
Travelcard
NS-
TrainseriesNS-
Traveller
[valuation]Source NSS2 table p
[description]SourceNSS1 table x
[adres]Source NSS1 table y
[description]Source NTR table q
[ovchip_
personal]Source NSR table r
[ovchip_
on-usageSource NSR table s
[personal_
details]Source NSR table t
[adres_
data]Source NSR table t rwerschkull
nllinkedincominrogierwerschkull
Example-dataBK_NSR-
Station
Postadres_
postcode
GPS hellip META_
source
META_Load_dts META_Load_end_dts META_Deleted_Ind
Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0
Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0
Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0
Asa hellip hellip hellip hellip hellip hellip
NSR-Station
[adres]Source NSS1 table y
More difficult to be virtualisedDepends on semantic gap with source
But do make virtual when lsquostreaming datarsquo is necessary
Because we have PS layerExposing all columns not necessary
Refactoring is more easyhellip
BC important notes
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Relations between Concepts + Context
In a historical way
hellipMerger of Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point
in time
Connector (C)
rwerschkull
nllinkedincominrogierwerschkull
Explicitly defining a driving key as metadatahellip
Gives business understanding
Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)
bull is not registered as a new lsquoconnectionrsquo
Connector Driving key
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description] [Owner Responsible]
Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)
Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record
Connector Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example
[Diagram: Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts, joined by the Connector NSR-Travelmovement]
Concept Contexts:
• [valuation], source NSS2 table p
• [description], source NSS1 table x
• [adres], source NSS1 table y
• [description], source NTR table q
• [ovchip_personal], source NSR table r
• [ovchip_on-usage], source NSR table s
• [personal_details], source NSR table t
• [adres_data], source NSR table t
Connector: NSR-Travelmovement (Checkin timestamp, from, to)
Driving key: NS-Travelcard + Checkin timestamp
Example data: Connector NSR-Travelmovement (between NSR-Station, NS-Travelcard and NS-Trainseries)
BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00
Are we there yet? No.
We add two mandatory HELPER entities
3C - Details: Business Alias
[Diagram: 3C overview with Business Concepts X and Y, their Concept Contexts (X-A, X-B, Y-A, Y-B, Y-C), the Connector, Business Alias A and B, and BC-Timeline X]
Helps switching from sources that are tied together by technical (surrogate) keys… to a Business Key based model
It’s a LOOKUP table that translates the technical key to the Business Key
Example
[Diagram: Business Concept NSR-Station with its Concept Contexts and one Business Alias per source system]
Concept Contexts:
• [valuation], source NSS2 table p
• [description], source NSS1 table x
• [adres], source NSS1 table y
Business Aliases:
• BA-NSS1: key lookup for NSS1 source tables
• BA-NSS2: key lookup for NSS2 source tables
Example data
BA-NSS1 (key lookup for NSS1 source tables)
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …
BA important details
Has a 1-to-(0,1) relation with a Business Concept
More difficult to virtualise
The lookup table should be kept small; therefore DO NOT do the key lookup in the Concept Context entity
Load / generate together with the BC
Preferably ‘in memory’ somehow…
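As an illustration of the lookup role of a Business Alias, here is a small sketch. The function `to_business_keys`, the column name `station_id` and the sample source rows are hypothetical; only the surrogate-to-business-key pairs are taken from the BA-NSS1 example data in this deck. Keeping the alias as a plain in-memory dictionary is one possible reading of "preferably in memory", not a prescribed design.

```python
# Business Alias for source NSS1: technical surrogate key -> business key.
# The pairs come from the BA-NSS1 example data.
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_keys(source_rows, alias, surrogate_column):
    """Replace the source's surrogate key by the business key.

    Returns (translated_rows, unknown_surrogates). Rows whose surrogate key
    is missing from the alias are not translated but reported instead.
    """
    translated, unknown = [], []
    for row in source_rows:
        surrogate = row[surrogate_column]
        business_key = alias.get(surrogate)
        if business_key is None:
            unknown.append(surrogate)
            continue
        rest = {k: v for k, v in row.items() if k != surrogate_column}
        translated.append({"BK_NSR-Station": business_key, **rest})
    return translated, unknown


# Hypothetical rows from NSS1 table y, keyed by the source's surrogate key.
nss1_y = [
    {"station_id": 123522, "Postadres_postcode": "3511 CE"},
    {"station_id": 666323, "Postadres_postcode": "1012 AB"},
    {"station_id": 999999, "Postadres_postcode": "0000 XX"},
]
rows, unknown = to_business_keys(nss1_y, BA_NSS1, "station_id")
```

The translated rows carry only the business key, so downstream entities never see the source's technical key; unknown surrogates are surfaced rather than silently dropped.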
3C - Details: BC-Timeline
[Diagram: 3C overview with Business Concepts X and Y, their Concept Contexts (X-A, X-B, Y-A, Y-B, Y-C), the Connector, Business Alias A and B, and BC-Timeline X]
Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
Like a Data Vault Point-in-time construct
But mandatory
And with a clearly defined and performant approach
Example: BC-Timeline (BCT) for NSR-Station

Concept Context [valuation] (source NSS2, table p)
BK_NSR-Station  WOZ value   Value, rating agency X  META_Laad_dts      META_Laad_eind_dts
Ut              20 million  18 million              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million              1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million              1-3-2016 22:00:00  31-12-9999 00:00:00

Concept Context [adres] (source NSS1, table y)
BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                 …  …            …                  …

BCT
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
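The combined timeline in the example can be derived by collecting every load timestamp of every Concept Context per business key and cutting the timeline at each of them. The sketch below is one way to do that under stated assumptions, not the deck's "clearly defined and performant approach": the function name `bc_timeline` is hypothetical, each context is assumed to be gapless per key (no deletes), and the end of an interval is taken as one second before the next start, as in the example data.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # open-ended validity, as in the example data


def bc_timeline(*contexts):
    """Merge the validity start points of several Concept Contexts.

    Each context is an iterable of (business_key, load_dts) pairs.
    Returns (business_key, combined_load_dts, combined_load_end_dts) tuples,
    sorted by business key and start timestamp.
    """
    starts = {}
    for context in contexts:
        for business_key, load_dts in context:
            starts.setdefault(business_key, set()).add(load_dts)

    timeline = []
    for business_key in sorted(starts):
        points = sorted(starts[business_key])
        for i, start in enumerate(points):
            if i + 1 < len(points):
                end = points[i + 1] - timedelta(seconds=1)
            else:
                end = OPEN_END
            timeline.append((business_key, start, end))
    return timeline


def at_22(year, month, day):
    # All loads in the example happen at 22:00:00.
    return datetime(year, month, day, 22, 0, 0)


# Load timestamps from the [valuation] and [adres] example tables.
valuation = [("Ut", at_22(2014, 1, 1)), ("Ut", at_22(2015, 1, 1)),
             ("Ut", at_22(2016, 3, 1))]
adres = [("Ut", at_22(2013, 6, 5)), ("Ut", at_22(2015, 7, 4)),
         ("Asd", at_22(2013, 6, 5))]

timeline = bc_timeline(valuation, adres)
ut = [t for t in timeline if t[0] == "Ut"]
```

For station Ut this yields five intervals and for Asd a single open-ended one, matching the BCT table in the example.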
What makes PS-3C a different Ensemble?
[Diagram: 3C overview with Business Concepts X and Y, their Concept Contexts (X-A, X-B, Y-A, Y-B, Y-C), the Connector, Business Alias A and B, and BC-Timeline X]
1) Explicitly splitting the work
Data + History
Subjects + Integration
2) NO HASHED BUSINESS KEYS or surrogate keys
Only concatenated ones
http://www.cannabisculture.com/files/images/6/hashbrick.JPG
3) Fewer joins
Relation + technical validity timeline + relation context, together in one entity
4) Explicit HELPER entities
Business Alias
Business Concept Timeline (BC-Timeline)
+ explicitly define driving key(s)
Hope to AVOID this…
http://xkcd.com/927
PS-3C: a new PROPOSED ensemble modelling technique
Help needed
Questions?
About Me
‘Head of BI’, Spil Games
Certified Data Vault modeler since 2009
Contact details: nl.linkedin.com/in/rogierwerschkull
rogierwerschkullgmailcom
rwerschkull
Problem
Less mature JOIN optimizers
Key-Value, Document and Column Family NoSQL databases + SQL on Hadoop solutions do NOT like joins
This REALLY complicates Anchor Modeling
And personally: this one too (Link Satellite)
[Diagram: Data Vault model of HUBs, SATs and a LINK, with the Link Satellite pointed out]
Problem
a) HASHING OF Business keys
Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32
Hashing……
Loses statistical information regarding the data distribution; query optimizers do not like this…
Column family, Document and Key-value databases need a good (natural) sharding key for (partial) key lookups
http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2
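The difference between a concatenated and a hashed business key can be made concrete with a few lines of code. This is only a sketch: the helper names are hypothetical, and the timestamps are written with colons on the assumption that the extraction dropped them (the stated key length of 34 characters only fits that reading).

```python
import hashlib


def concatenated_key(rolling_stock_nr, timestamp, sensor_id, sep="|"):
    # Natural business key: the parts stay readable and ordered.
    return sep.join([rolling_stock_nr, timestamp, sensor_id])


def hashed_key(business_key):
    # MD5 hex digest of the concatenated key: always 32 characters.
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()


keys = [
    concatenated_key("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    concatenated_key("8739", "2015-01-22 01:34:32", "72A1_FINV"),
    concatenated_key("8674", "2015-01-22 01:34:26", "72A1_FINV"),
]
hashes = [hashed_key(k) for k in keys]

# A partial-key lookup works on the concatenated key: all rows of train 8739.
train_8739 = [k for k in keys if k.startswith("8739|")]

# Sorting concatenated keys groups the rows per train and orders them in time;
# the hashes carry no such ordering or prefix information.
ordered = sorted(keys)
```

With the concatenated key, a prefix such as `8739|` can serve as a natural sharding or range key; the hash of the same value can only be used for exact-match lookups.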
B) Surrogate BUSINESS keys
Surrogate keys require centralized coordination
… and thus can impact the overall system’s scalability and availability
A lot of MPP NoSQL databases simply do not have them…
Then some inspiration
http://roelantvos.com/blog/?p=1119
‘In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse’
‘The Historical Staging Area effectively ‘acts’ as Data Lake but in a better defined form, as data deltas and event datetimes are taken into account’
So…
EDW = Subject Oriented, Integrated, Time Variant, Non-Volatile
[Diagram: the EDW split into two halves, ‘Time Variant & Non Volatile’ and ‘Subject Oriented & Integrated’]
[Diagram: the same split of the EDW, annotated ‘Could be a Data LAKE’ and ‘Virtualised Ensemble Tier’]
How does PS-3C work?
Focus of current ensemble EDWs
[Diagram: Staging Area, EDW, Information Marts]
Splitting the work
[Diagram: Persistent Staging Area (HSA) = Data Library, EDW, Information Marts]
Persistent Staging - Concept Context, Connector, Business Concept
Persistent Staging - how
Identify the Primary or Unique Key of the source / event stream; use source metadata for this
Automate the building of a PS ‘around this key’; take all columns
Historize using an SCD-2 approach
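The SCD-2 historization step named above can be sketched in a few lines. This is an assumption-laden illustration rather than the author's loader: the function `load_snapshot`, the in-memory list used as the PS table and the sample column names are hypothetical, deletes are not handled, and the old version is end-dated at the new load timestamp for simplicity.

```python
from datetime import datetime

OPEN_END = datetime(9999, 12, 31)  # open-ended validity


def load_snapshot(history, snapshot, key, load_dts):
    """Historize one source delivery into `history` (SCD type 2).

    history:  list of dicts holding all source columns plus the META_ columns
    snapshot: list of dicts as delivered by the source (all columns taken)
    key:      name of the source's primary / unique key column
    """
    current = {row[key]: row for row in history
               if row["META_Load_end_dts"] == OPEN_END}
    for source_row in snapshot:
        existing = current.get(source_row[key])
        if existing is not None:
            unchanged = all(existing.get(column) == value
                            for column, value in source_row.items())
            if unchanged:
                continue  # nothing to register
            existing["META_Load_end_dts"] = load_dts  # close the old version
        history.append({**source_row,
                        "META_Load_dts": load_dts,
                        "META_Load_end_dts": OPEN_END})
    return history


history = []
t1, t2, t3 = (datetime(2013, 6, 5, 22), datetime(2015, 6, 5, 22),
              datetime(2015, 6, 6, 22))
load_snapshot(history, [{"station": "Ut", "postcode": "3500GJ"}], "station", t1)
load_snapshot(history, [{"station": "Ut", "postcode": "3511 CE"}], "station", t2)
load_snapshot(history, [{"station": "Ut", "postcode": "3511 CE"}], "station", t3)
```

The third delivery is identical to the second, so it adds no row: the history holds one closed and one current version for station Ut.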
Persistent Staging Metadata-1
Entity level:
• Unique key
• Functional Description
• Delivering party
• Owner / Responsible
• MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
• …
Persistent Staging Metadata-2
Column level:
• [Load Date Timestamp]
• [Load End Date Timestamp]
• [Deleted Flag], OR register a delete as a new record
• [Source system] on table / file level (lowest possible)
Load End Date Timestamp: possible but difficult… it requires updates
UPDATES IN HIVE (isn’t HDFS append only?)
ACID is possible in HIVE
ACID makes updates possible by registering updates as ‘new data’
Reconciliation / compacting when idle or at user command
Use ORC files
PLUS changing the HIVE configuration…
What about SEMI-STRUCTURED data?
Hive: put semi-structured data (= variable columns) in a MAP data type
OR use a data storage type that supports schema evolution: AVRO (ORC in development)
Or HBASE… It only has one data type (byte); the schema is ‘applied’
The schema can be different for every row
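The MAP idea can be illustrated independently of Hive: keep the columns every record has as fixed columns, and push whatever varies per record into a single string-to-string map. The sketch below is hypothetical throughout (function name, field names and sample events are made up for the example) and only mimics the shape of such a row.

```python
def split_record(record, fixed_columns):
    """Split a record into fixed columns plus a string-to-string map."""
    fixed = {column: record.get(column) for column in fixed_columns}
    attributes = {
        key: str(value)
        for key, value in record.items()
        if key not in fixed_columns
    }
    return {**fixed, "attributes": attributes}


# Hypothetical sensor events with a varying set of fields.
events = [
    {"rolling_stock_nr": "8739", "sensor_id": "72A1_FINV", "value": 123},
    {"rolling_stock_nr": "8674", "sensor_id": "16A1_HSVEROND", "value": 7,
     "unit": "bar"},
]
rows = [split_record(e, ("rolling_stock_nr", "sensor_id")) for e in events]
```

Every row now has the same three columns, so a fixed table layout can hold records whose fields differ per row.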
3C - how
Like Data Vault (2), BUT:
Always starts with conceptual data modeling
NOT the primary location of data & history: virtualised (only if performance allows); should be deterministic
No Link Satellites
No Surrogate or Hash Keys, only ‘Concatenated Natural Business Keys’
Explicit Helper entities
3C - Details: Business Concept (BC)
[Diagram: 3C overview with Business Concepts X and Y, their Concept Contexts (X-A, X-B, Y-A, Y-B, Y-C), the Connector, Business Alias A and B, and BC-Timeline X]
A UNIQUE, domain-specific point of integration
… a business entity
… within its own domain
… does not necessarily need to be enterprise wide
Why not ‘enterprise wide’?
[Diagram: the many meanings of ‘Customer’ within one company: Company Customer, Sales Customer, International Sales Customer, Local Sales Customer, Marketing Customer, Customer …]
Business Concept Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Example data: Business Concepts

NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
…
BC important notes
Easiest entity to virtualise (if performance allows)
No hashing & no surrogate BUSINESS KEYS (not by default at least)
3C - Details: Concept Context (CC)
[Diagram: 3C overview with Business Concepts X and Y, their Concept Contexts (X-A, X-B, Y-A, Y-B, Y-C), the Connector, Business Alias A and B, and BC-Timeline X]
Contains context about a Concept, in a historical way
… like a Data Vault Satellite
Every CC belongs to only one BC
Separate entity per source system table / stream
Concept Context Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag], OR register a delete as a new record
Example
[Diagram: Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts]
Concept Contexts:
• [valuation], source NSS2 table p
• [description], source NSS1 table x
• [adres], source NSS1 table y
• [description], source NTR table q
• [ovchip_personal], source NSR table r
• [ovchip_on-usage], source NSR table s
• [personal_details], source NSR table t
• [adres_data], source NSR table t
rwerschkull
nllinkedincominrogierwerschkull
Less Mature
JOINoptimizers
Photo credit Public Domainrwerschkull
nllinkedincominrogierwerschkull
Key-Value
Document
Column Family
NoSQL Databases
+SQL on Hadoop solutions
Do NOT like joins
Photo credit Public Domainrwerschkull
nllinkedincominrogierwerschkull
This
REALLYcomplicates
AnchoRModeling
rwerschkull
nllinkedincominrogierwerschkull
And Personally
HUB
SAT
LINK
This one
too(Link Satellite)
HUB
HUB
SAT
SAT
rwerschkull
nllinkedincominrogierwerschkull
problem
rwerschkull
nllinkedincominrogierwerschkull
a) HASHING OF Business keys
Rolling
Stock Nr Datetime Sensor Id Value Concatenated Business Key
Key
Len MD5 Hash
Key
Len
8739
2015-01-22
013427 72A1_FINV 123 8739|2015-01-22 013427|72A1_FINV 34 86ae4c6b0e2e2d5a13a0d11440529aeb 32
8739
2015-01-22
013427 72A1_SLDET 100 8739|2015-01-22 013427|72A1_SLDET 35 51ce9bc292eef407bd7c91a52eebcf2e 32
8739
2015-01-22
013432 72A1_FINV 126 8739|2015-01-22 013432|72A1_FINV 34 9482a41c1fecc4c64b8c437af6cc85e8 32
8739
2015-01-22
013432
13A8_MW_UB
AT_VT 5 8739|2015-01-22 013432|13A8_MW_UBAT_VT 42 e4160914ee55ce0b93f87b23366a0ce3 32
8674
2015-01-22
013426 72A1_FINV 6 8674|2015-01-22 013426|72A1_FINV 34 fcb3e7c8c91e44ce396d908a4948ca65 32
8674
2015-01-22
013426
16A1_HSVER
OND 7 8674|2015-01-22 013426|16A1_HSVEROND 38 fe9098c8c291ad56af5c8afae5169196 32
Loses statistical Informationregarding the data distributionQuery optimizers do not like thishellip
Column family Document and Key-value databases need a
good (natural) sharding key for (partial) key-
lookups
Hashinghelliphellip
httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2rwerschkull
nllinkedincominrogierwerschkull
Surrogates keys require
centralized coordination
hellipand thus can impact the overall systemrsquos scalability and availability
A lot of MPP NoSQL databases simply do not have themhellip
B) Surrogate BuSINESS keys
rwerschkull
nllinkedincominrogierwerschkull
Then Some Inspiration
httproelantvoscomblogp=1119
rwerschkull
nllinkedincominrogierwerschkull
lsquoIn my opinion the answer lies in the adoption of the
persistent (Historical) Staging Area concept
(also known as Historical Staging or the History Area)
This basically adopts the fundamentals of a Data Warehousersquo
lsquoThe Historical Staging Area effectively lsquoactsrsquo as
Data Lake but in a better defined form as data deltas and
event datetimes are taken into accountrsquo
So
Time Variant
amp
Non Volatile
Subject Oriented
amp
Integrated
EDW
Subject Oriented Integrated
Time Variant Non-Volatile
EDW
rwerschkull
nllinkedincominrogierwerschkull
Could be a
Data LAKE
VirtualisedEnsemble
Tier
EDW
Time Variant
amp
Non Volatile
Subject Oriented
amp
Integrated
EDW
rwerschkull
nllinkedincominrogierwerschkull
How
Does PS-3C Work
rwerschkull
nllinkedincominrogierwerschkullPhoto credit Public Domain
StagingArea
EDWInformation
Marts
Focus of Current ensemble EDWrsquos
rwerschkull
nllinkedincominrogierwerschkull
Persistent StagingArea HSA =
Data LibraryEDW
Information Marts
Splitting the work
rwerschkull
nllinkedincominrogierwerschkull
Persitent
Staging
-
Concept
Context
Connector
Business
Concept
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
Identify source event stream Primary or Unique KeyUse source metadata for this
Automate the building of a PS lsquoaround this keyrsquo Take all columns
Historize using SCD-2 approach
Persistent Staging - how
rwerschkull
nllinkedincominrogierwerschkull
Entity levelUnique key
Functional Description
Delivering party
Owner Responsible
MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)
hellip
Persistent Staging Metadata-1
rwerschkull
nllinkedincominrogierwerschkull
Column level [Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR delete as new record
[Source system] on table file level (lowest possible)
Load End Date Timestamp possible but difficulthellipRequires updates
Persistent Staging Metadata-2
rwerschkull
nllinkedincominrogierwerschkull
ACID is possible in HIVE
ACID Makes Updates possibleBy registering updates as lsquonew datarsquo
Reconciliation compacting when idle at user command
Use ORC files
PLUS changing the HIVE configurationhellip
UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)
rwerschkull
nllinkedincominrogierwerschkull
HivePut semi structured data = variable columns in MAP data type
OR use Data storage type that supports schema-evolutionAVRO (ORC in development)
Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo
Schema can be different for every row
What about SEMI-STRUCTURED Data
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
3C - how
rwerschkull
nllinkedincominrogierwerschkull
Always starts with Conceptual data modeling
NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic
No Link Satellites
No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo
Explicit Helper entities
Like Data Vault(2) BUT
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
a UNIQUE Domain specific point of integration
hellipa business entity
hellipwithin itrsquos own domain
hellipdoes not necessarily need to be Enterprise Wide
Business Concept (BC)
rwerschkull
nllinkedincominrogierwerschkull
Why not lsquoenterprise widersquoCompany
Customer
Sales Customer
International
Sales Customer
Local
Sales Customer
Marketing Customer
Customer hellip
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description]
[Owner Responsible]
Column level [Load Date Timestamp]
[Source system] on table file level (lowest possible)
Business Concept Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example-Data
NSR-Station
NS-
Travelcard
NS-
Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
hellip
Business
Key
META_Source META_Load_dts
Ut NSS1_y 5-6-2015 220000
Asd NSS1_y 5-6-2015 220000
Asa NSS2_p 6-6-2015 220000
hellip hellip hellip
NS-
Traveller
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
hellip
Business Key
CRM-RW123456
CRM-LAS224466
hellip
rwerschkull
nllinkedincominrogierwerschkull
Most easy entity to be virtualised(if performance allows)
No Hashing amp No surrogateBUSINESSKEYS
(not by default at least)
BC important notes
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Containts Context about a Concept In a historical way
hellipLike a Data Vault Satellite
Every CC belongs to only one BC
Seperate entity per source system table stream
Concept Context (CC)
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description] [Owner Responsible]
Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)
Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record
Concept Context Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example
NSR-StationNS-
Travelcard
NS-
TrainseriesNS-
Traveller
[valuation]Source NSS2 table p
[description]SourceNSS1 table x
[adres]Source NSS1 table y
[description]Source NTR table q
[ovchip_
personal]Source NSR table r
[ovchip_
on-usageSource NSR table s
[personal_
details]Source NSR table t
[adres_
data]Source NSR table t rwerschkull
nllinkedincominrogierwerschkull
Example-dataBK_NSR-
Station
Postadres_
postcode
GPS hellip META_
source
META_Load_dts META_Load_end_dts META_Deleted_Ind
Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0
Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0
Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0
Asa hellip hellip hellip hellip hellip hellip
NSR-Station
[adres]Source NSS1 table y
More difficult to be virtualisedDepends on semantic gap with source
But do make virtual when lsquostreaming datarsquo is necessary
Because we have PS layerExposing all columns not necessary
Refactoring is more easyhellip
BC important notes
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Relations between Concepts + Context
In a historical way
hellipMerger of Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point
in time
Connector (C)
rwerschkull
nllinkedincominrogierwerschkull
Explicitly defining a driving key as metadatahellip
Gives business understanding
Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)
bull is not registered as a new lsquoconnectionrsquo
Connector Driving key
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description] [Owner Responsible]
Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)
Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record
Connector Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example
NSR-StationNS-
Travelcard
NS-
TrainseriesNS-
Traveller
[valuation]Source NSS2 table p
[description]SourceNSS1 table x
[adres]Source NSS1 table y
[description]Source NTR table q
[ovchip_
personal]Source NSR table r
[ovchip_
on-usageSource NSR table s
[personal_
details]Source NSR table t
[adres_
data]Source NSR table t rwerschkull
nllinkedincominrogierwerschkull
NSR-
TravelmovementCheckin timestamp
from
to
Driving Key
NS-Travelcard
+Checkin timestamp
Less mature JOIN optimizers
Photo credit: Public Domain
NoSQL databases (Key-Value, Document, Column Family) + SQL on Hadoop solutions
Do NOT like joins
Photo credit: Public Domain
This REALLY complicates Anchor Modeling
And personally: this one too (the Link Satellite)
[Diagram: Data Vault HUB, SAT and LINK entities, with the Link Satellite pointed out]
Problem
a) Hashing of business keys

Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32

Hashing…
Loses statistical information regarding the data distribution. Query optimizers do not like this…
Column family, Document and Key-value databases need a good (natural) sharding key for (partial) key lookups
http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2
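The difference between a concatenated key and its hash can be tried out with a few lines of Python. This is only a minimal sketch: the helper names (`concat_key`, `md5_key`, `prefix_lookup`) are made up for illustration, and the sample rows are taken from the table above. It shows that the concatenated key still supports a partial (prefix) lookup, while the MD5 value does not.

```python
import hashlib

SEP = "|"  # delimiter used in the concatenated keys of the example table


def concat_key(*parts):
    """Build a concatenated natural business key."""
    return SEP.join(str(p) for p in parts)


def md5_key(*parts):
    """MD5 hash of the concatenated key, as 32 hexadecimal characters."""
    return hashlib.md5(concat_key(*parts).encode("utf-8")).hexdigest()


def prefix_lookup(keys, prefix):
    """Partial key lookup: all keys that start with the given prefix."""
    return [k for k in keys if k.startswith(prefix)]


readings = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:27", "72A1_SLDET"),
    ("8739", "2015-01-22 01:34:32", "72A1_FINV"),
    ("8674", "2015-01-22 01:34:26", "72A1_FINV"),
]

natural_keys = [concat_key(*r) for r in readings]
hashed_keys = [md5_key(*r) for r in readings]

# All sensor readings of rolling stock 8739: possible on the natural key...
train_8739 = prefix_lookup(natural_keys, "8739" + SEP)
# ...but the same prefix is meaningless on the hashed key.
train_8739_hashed = prefix_lookup(hashed_keys, "8739" + SEP)
```

The hashed key has a fixed length of 32 characters, but the rolling stock number and the timestamp can no longer be read from it, so range or prefix access needs the original columns again.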
b) Surrogate business keys
Surrogate keys require centralized coordination
…and thus can impact the overall system’s scalability and availability
A lot of MPP / NoSQL databases simply do not have them…
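Why surrogate keys need coordination can be illustrated with a small sketch. The `Node` class below is hypothetical and stands for a loader that hands out surrogate keys from its own local sequence, without a shared coordinator. Two such nodes assign the same number to different stations and different numbers to the same station, whereas the business key itself is identical on every node.

```python
from itertools import count


class Node:
    """Hypothetical loader node with a local surrogate key sequence."""

    def __init__(self):
        self._sequence = count(1)
        self.surrogates = {}

    def surrogate_for(self, business_key):
        # Hand out the next local number the first time a key is seen.
        if business_key not in self.surrogates:
            self.surrogates[business_key] = next(self._sequence)
        return self.surrogates[business_key]


node_a, node_b = Node(), Node()

# Node A sees station Ut first, node B sees station Asd first.
ut_on_a = node_a.surrogate_for("Ut")
asd_on_b = node_b.surrogate_for("Asd")
asd_on_a = node_a.surrogate_for("Asd")

# Without central coordination the surrogates clash and disagree.
same_number_for_different_stations = ut_on_a == asd_on_b
different_numbers_for_same_station = asd_on_a != asd_on_b
```

Avoiding this requires one shared sequence or lookup, which is exactly the centralized coordination named on the slide.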
Then some inspiration
http://roelantvos.com/blog/?p=1119
‘In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse.’
‘The Historical Staging Area effectively ‘acts’ as Data Lake, but in a better defined form, as data deltas and event datetimes are taken into account.’
So
[Diagram: the EDW (Subject Oriented, Integrated, Time Variant, Non-Volatile) split into two parts: ‘Time Variant & Non Volatile’ and ‘Subject Oriented & Integrated’]
[Diagram: the same two parts of the EDW, with the labels ‘Could be a Data Lake’ and ‘Virtualised Ensemble tier’]
How does PS-3C work?
Photo credit: Public Domain
Focus of current ensemble EDWs
[Diagram: Staging Area, EDW, Information Marts]
Splitting the work
[Diagram: Persistent Staging Area (HSA) = Data Library, EDW, Information Marts]
PS-3C = Persistent Staging - Business Concept, Concept Context, Connector
modelling
Persistent Staging - how
Identify the Primary or Unique Key of the source / event / stream. Use source metadata for this
Automate the building of a PS ‘around this key’. Take all columns
Historize using the SCD-2 approach
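The SCD-2 historization step could look roughly like the sketch below. It is an illustration under assumptions, not a prescribed loader: the function name `apply_delivery`, the column names `station` and `postcode`, and the ISO timestamp strings are invented here, and a closed version gets the load timestamp of its successor as end date (the example tables in this deck use one second earlier).

```python
HIGH_DATE = "9999-12-31 00:00:00"  # open-ended end date


def apply_delivery(history, delivery, key_cols, load_dts):
    """Historize one source delivery into `history`, SCD-2 style.

    history  : list of dicts, each with META_Load_dts / META_Load_end_dts
    delivery : list of dicts holding all source columns
    key_cols : columns of the source Primary or Unique Key
    """
    def key_of(row):
        return tuple(row[c] for c in key_cols)

    # Current (open) version per key.
    current = {key_of(r): r for r in history
               if r["META_Load_end_dts"] == HIGH_DATE}

    for row in delivery:
        old = current.get(key_of(row))
        if old is not None:
            if all(old.get(c) == v for c, v in row.items()):
                continue  # nothing changed, keep the open version
            old["META_Load_end_dts"] = load_dts  # close the previous version
        new = dict(row, META_Load_dts=load_dts, META_Load_end_dts=HIGH_DATE)
        history.append(new)
        current[key_of(row)] = new
    return history


history = []
apply_delivery(history, [{"station": "Ut", "postcode": "3500GJ"}],
               ["station"], "2013-06-05 22:00:00")
apply_delivery(history, [{"station": "Ut", "postcode": "3511 CE"}],
               ["station"], "2015-06-05 22:00:00")
# An unchanged delivery does not create a new version.
apply_delivery(history, [{"station": "Ut", "postcode": "3511 CE"}],
               ["station"], "2015-06-06 22:00:00")
```

Closing the previous version is an update of an existing row, which is why the Load End Date Timestamp is harder to maintain on append-only storage.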
Persistent Staging Metadata-1
Entity level:
Unique key
Functional Description
Delivering party
Owner / Responsible
MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
…
Persistent Staging Metadata-2
Column level:
[Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR delete as new record
[Source system] on table / file level (lowest possible)
Load End Date Timestamp: possible but difficult… Requires updates
Updates in Hive (isn’t HDFS append only?)
ACID is possible in Hive
ACID makes updates possible by registering updates as ‘new data’
Reconciliation / compacting: when idle / at user command
Use ORC files
PLUS changing the Hive configuration…
What about semi-structured data?
Hive: put semi-structured data (= variable columns) in a MAP data type
OR use a data storage type that supports schema evolution: AVRO (ORC in development)
Or HBase… It only has one data type (byte); the schema is ‘applied’
Schema can be different for every row
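The idea of parking variable columns in a map can be sketched in a few lines. The fixed column names and the record layout below are assumptions made for the illustration; the `attributes` dictionary plays the role of a MAP-typed column.

```python
# Hypothetical fixed columns of a sensor event; everything else is variable.
FIXED_COLUMNS = ("rolling_stock_nr", "event_dts")


def to_staging_row(record):
    """Split a semi-structured record into fixed columns plus one map."""
    fixed = {c: record.get(c) for c in FIXED_COLUMNS}
    # Variable columns go into a string-to-string map.
    variable = {k: str(v) for k, v in record.items()
                if k not in FIXED_COLUMNS}
    return {**fixed, "attributes": variable}


row_1 = to_staging_row({"rolling_stock_nr": "8739",
                        "event_dts": "2015-01-22 01:34:27",
                        "72A1_FINV": 123})
row_2 = to_staging_row({"rolling_stock_nr": "8674",
                        "event_dts": "2015-01-22 01:34:26",
                        "72A1_FINV": 6,
                        "16A1_HSVEROND": 7})
```

Both rows share the same staging structure although the second record carries an extra sensor value.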
modelling
3C - how
Like Data Vault (2), BUT:
Always starts with conceptual data modeling
NOT the primary location of Data & History
Virtualised (only if performance allows); should be deterministic
No Link Satellites
No surrogate or hash keys, only ‘Concatenated Natural Business Keys’
Explicit helper entities
3C - Details: Business Concept (BC)
[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; Business Alias A and Business Alias B; BC-Timeline X]
A UNIQUE, domain-specific point of integration
…a business entity
…within its own domain
…does not necessarily need to be enterprise wide
Why not ‘enterprise wide’?
[Diagram: Company with the concepts Customer, Sales Customer, International Sales Customer, Local Sales Customer, Marketing Customer, Customer …]
Business Concept Metadata
Entity level:
[Description]
[Owner / Responsible]
Column level:
[Load Date Timestamp]
[Source system] on table / file level (lowest possible)
Example data

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
…
BC important notes
Easiest entity to virtualise (if performance allows)
No hashing & no surrogate BUSINESS KEYS (not by default at least)
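A virtualised Business Concept is little more than the distinct set of concatenated business keys found in Persistent Staging. The sketch below illustrates that under assumptions: the function name `business_concept`, the staging column names `train_type` and `train_nr`, and the ISO timestamps are hypothetical; only the `IC|855` key format and the META columns come from the example data.

```python
def business_concept(ps_rows, key_cols, sep="|"):
    """Derive a Business Concept from Persistent Staging rows.

    Returns {business_key: metadata}, keeping the first appearance of
    each key. The key is a plain concatenation: no hash, no surrogate.
    """
    concept = {}
    for row in sorted(ps_rows, key=lambda r: r["META_Load_dts"]):
        business_key = sep.join(str(row[c]) for c in key_cols)
        concept.setdefault(business_key, {
            "META_Source": row["META_Source"],
            "META_Load_dts": row["META_Load_dts"],
        })
    return concept


# Hypothetical staging rows of a train series source table.
ps_trainseries = [
    {"train_type": "IC", "train_nr": 855, "META_Source": "NTR_q",
     "META_Load_dts": "2015-06-06 22:00:00"},
    {"train_type": "Sp", "train_nr": 7455, "META_Source": "NTR_q",
     "META_Load_dts": "2015-06-05 22:00:00"},
    {"train_type": "IC", "train_nr": 855, "META_Source": "NTR_q",
     "META_Load_dts": "2015-06-05 22:00:00"},
]

ns_trainseries = business_concept(ps_trainseries, ("train_type", "train_nr"))
```

Because the result depends only on the staging rows, re-running the derivation gives the same keys every time, which fits the ‘should be deterministic’ requirement.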
3C - Details: Concept Context (CC)
Contains context about a Concept, in a historical way
…like a Data Vault Satellite
Every CC belongs to only one BC
Separate entity per source system / table / stream
Concept Context Metadata
Entity level:
[Description]
[Owner / Responsible]
Column level:
[Load Date Timestamp]
[Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
Example
Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller
Concept Contexts:
[valuation]        Source NSS2 table p
[description]      Source NSS1 table x
[adres]            Source NSS1 table y
[description]      Source NTR table q
[ovchip_personal]  Source NSR table r
[ovchip_on-usage]  Source NSR table s
[personal_details] Source NSR table t
[adres_data]       Source NSR table t
Example data: NSR-Station, [adres] Source NSS1 table y

BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                  …  …            …                  …

CC important notes
More difficult to virtualise: depends on the semantic gap with the source
But do make it virtual when ‘streaming data’ is necessary
Because we have the PS layer, exposing all columns is not necessary
Refactoring is easier…
3C - Details: Connector (C)
Relations between Concepts + Context, in a historical way
…merger of Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point in time
Connector Driving key
Explicitly defining a driving key as metadata…
Gives business understanding
Makes it possible for a Connector to correctly handle delta data deliveries, so that a change (on the driving key) is not registered as a new ‘connection’
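The effect of a driving key on delta deliveries can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the function `load_delta`, the column names, and the scenario in which a check-in is delivered first and the destination station only arrives with a later delta. The same two deliveries are loaded once with the driving key (travelcard + check-in timestamp) and once with all keys.

```python
HIGH_DATE = "9999-12-31 00:00:00"  # open-ended end date


def load_delta(connector, delta, key_cols, load_dts):
    """Load a delta delivery; rows with the same key replace the open row."""
    def key_of(row):
        return tuple(row[c] for c in key_cols)

    open_rows = {key_of(r): r for r in connector
                 if r["META_Load_end_dts"] == HIGH_DATE}
    for row in delta:
        old = open_rows.get(key_of(row))
        if old is not None:
            if all(old[c] == v for c, v in row.items()):
                continue  # unchanged
            old["META_Load_end_dts"] = load_dts  # end-date the old version
        new = dict(row, META_Load_dts=load_dts, META_Load_end_dts=HIGH_DATE)
        connector.append(new)
        open_rows[key_of(row)] = new
    return connector


def open_connections(connector):
    return [r for r in connector if r["META_Load_end_dts"] == HIGH_DATE]


CHECKIN = {"travelcard": "3528 0234 2073 1234",
           "checkin": "2016-04-05 08:49:32",
           "station_from": "Asd", "station_to": None}
CHECKOUT = dict(CHECKIN, station_to="Ut")  # later delta for the same trip


def run(key_cols):
    connector = []
    load_delta(connector, [dict(CHECKIN)], key_cols, "2016-04-05 22:00:00")
    load_delta(connector, [dict(CHECKOUT)], key_cols, "2016-04-06 22:00:00")
    return open_connections(connector)


with_driving_key = run(("travelcard", "checkin"))
with_all_keys = run(("travelcard", "checkin", "station_from", "station_to"))
```

With the driving key the second delivery is recognised as a change to the same travel movement; with all keys it shows up as a second, separate connection while the first one stays open.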
Connector Metadata
Entity level:
[Description]
[Owner / Responsible]
Column level:
[Load Date Timestamp]
[Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
Example
[Diagram: Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts, linked by the Connector NSR-Travelmovement]
Connector: NSR-Travelmovement (Checkin timestamp, from, to)
Driving Key: NS-Travelcard + Checkin timestamp
Example data: Connector NSR-Travelmovement

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No
We add two mandatory HELPER entities
modelling
3C - Details: Business Alias
To help switching from sources that are tied together by technical (surrogate) keys…
To a Business Key based model
It’s a LOOKUP table that translates the technical key to the Business Key
Example
NSR-Station
[valuation]    Source NSS2 table p
[description]  Source NSS1 table x
[adres]        Source NSS1 table y
BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables

Example data
BA-NSS1 (key lookup for NSS1 source tables)
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …
BA important details
Has a 1 to (0,1) relation with a Business Concept
More difficult to virtualise. The lookup table should be kept small
Therefore DO NOT do the key lookup in the Concept Context entity
Load / generate together with the BC
Preferably ‘in memory’ somehow…
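As a rough illustration, a Business Alias can be held as an in-memory dictionary that maps a source surrogate key to the business key. The values below come from the BA-NSS1 example data; the function name `translate` and the layout of the source rows (`station_id`, `description`) are assumptions.

```python
# BA-NSS1: NSS1 surrogate key -> business key of NSR-Station.
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def translate(source_rows, alias, surrogate_col, bk_col):
    """Replace the technical key of each source row by the business key."""
    translated = []
    for row in source_rows:
        out = {k: v for k, v in row.items() if k != surrogate_col}
        # None marks a surrogate that has no known Business Concept (yet).
        out[bk_col] = alias.get(row[surrogate_col])
        translated.append(out)
    return translated


# Hypothetical rows from an NSS1 table that only carries the surrogate key.
nss1_rows = [
    {"station_id": 123522, "description": "Utrecht Centraal"},
    {"station_id": 999999, "description": "Unknown station"},
]

rows_with_bk = translate(nss1_rows, BA_NSS1, "station_id", "BK_NSR-Station")
```

A dictionary lookup like this stays cheap only while the alias table is small, which matches the advice to keep it small and preferably in memory.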
3C - Details: BC-Timeline
Integrate the validity timelines of the Concept Contexts belonging to a Business Concept
Like a Data Vault Point-in-time construct
But mandatory
And with a clearly defined and performant approach
Example: NSR-Station

[valuation] Source NSS2 table p
BK_NSR-Station  WOZ value   Value rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

BCT
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00

[adres] Source NSS1 table y
BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                  …  …            …                  …

modelling
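How the combined timeline in the BCT table follows from the two Concept Contexts can be sketched as below. This is one possible derivation, not necessarily the author's: the function name `bc_timeline` is invented, timestamps are written as ISO strings so they sort correctly, each slice ends at the start of the next one (the table above uses one second earlier), and the histories are assumed to have no gaps.

```python
HIGH_DATE = "9999-12-31 00:00:00"  # open-ended end date


def bc_timeline(*context_histories):
    """Combine the validity timelines of several Concept Contexts.

    Each history is a list of (business_key, load_dts, load_end_dts).
    Every distinct load_dts of a business key starts a new slice.
    """
    starts = {}
    for history in context_histories:
        for business_key, load_dts, _load_end_dts in history:
            starts.setdefault(business_key, set()).add(load_dts)

    timeline = []
    for business_key in sorted(starts):
        points = sorted(starts[business_key])
        for start, end in zip(points, points[1:] + [HIGH_DATE]):
            timeline.append((business_key, start, end))
    return timeline


adres = [
    ("Ut", "2013-06-05 22:00:00", "2015-07-04 22:00:00"),
    ("Ut", "2015-07-04 22:00:00", HIGH_DATE),
    ("Asd", "2013-06-05 22:00:00", HIGH_DATE),
]
valuation = [
    ("Ut", "2014-01-01 22:00:00", "2015-01-01 22:00:00"),
    ("Ut", "2015-01-01 22:00:00", "2016-03-01 22:00:00"),
    ("Ut", "2016-03-01 22:00:00", HIGH_DATE),
]

bct = bc_timeline(adres, valuation)
ut_starts = [start for key, start, _ in bct if key == "Ut"]
```

For station Ut this yields the same five slices as the BCT table, and one slice for Asd.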
What makes PS-3C a different Ensemble?
1) Explicitly splitting the work
Data + History
Subjects + Integration
2) No hashed business keys or surrogate keys
Only concatenated ones
http://www.cannabisculture.com/files/images/6/hashbrick.JPG
3) Fewer joins
Relation + Technical validity timeline + Relation context: together in one entity
Photo credit: Public Domain
4) Explicit helper entities
Business Alias
Business Concept Timeline
+ explicitly define driving key(s)
Photo credit: Public Domain
Hope to avoid this…
http://xkcd.com/927
PS-3C: a new PROPOSED ensemble modelling technique
Help needed
Questions?
About Me
‘Head of BI’, Spil Games
Certified Data Vault modeler since 2009
Contact details: nl.linkedin.com/in/rogierwerschkull
rogierwerschkull@gmail.com
rwerschkull
Key-Value
Document
Column Family
NoSQL Databases
+SQL on Hadoop solutions
Do NOT like joins
Photo credit Public Domainrwerschkull
nllinkedincominrogierwerschkull
This
REALLYcomplicates
AnchoRModeling
rwerschkull
nllinkedincominrogierwerschkull
And Personally
HUB
SAT
LINK
This one
too(Link Satellite)
HUB
HUB
SAT
SAT
rwerschkull
nllinkedincominrogierwerschkull
problem
rwerschkull
nllinkedincominrogierwerschkull
a) HASHING OF Business keys
Rolling
Stock Nr Datetime Sensor Id Value Concatenated Business Key
Key
Len MD5 Hash
Key
Len
8739
2015-01-22
013427 72A1_FINV 123 8739|2015-01-22 013427|72A1_FINV 34 86ae4c6b0e2e2d5a13a0d11440529aeb 32
8739
2015-01-22
013427 72A1_SLDET 100 8739|2015-01-22 013427|72A1_SLDET 35 51ce9bc292eef407bd7c91a52eebcf2e 32
8739
2015-01-22
013432 72A1_FINV 126 8739|2015-01-22 013432|72A1_FINV 34 9482a41c1fecc4c64b8c437af6cc85e8 32
8739
2015-01-22
013432
13A8_MW_UB
AT_VT 5 8739|2015-01-22 013432|13A8_MW_UBAT_VT 42 e4160914ee55ce0b93f87b23366a0ce3 32
8674
2015-01-22
013426 72A1_FINV 6 8674|2015-01-22 013426|72A1_FINV 34 fcb3e7c8c91e44ce396d908a4948ca65 32
8674
2015-01-22
013426
16A1_HSVER
OND 7 8674|2015-01-22 013426|16A1_HSVEROND 38 fe9098c8c291ad56af5c8afae5169196 32
Loses statistical Informationregarding the data distributionQuery optimizers do not like thishellip
Column family Document and Key-value databases need a
good (natural) sharding key for (partial) key-
lookups
Hashinghelliphellip
httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2rwerschkull
nllinkedincominrogierwerschkull
Surrogates keys require
centralized coordination
hellipand thus can impact the overall systemrsquos scalability and availability
A lot of MPP NoSQL databases simply do not have themhellip
B) Surrogate BuSINESS keys
rwerschkull
nllinkedincominrogierwerschkull
Then Some Inspiration
httproelantvoscomblogp=1119
rwerschkull
nllinkedincominrogierwerschkull
lsquoIn my opinion the answer lies in the adoption of the
persistent (Historical) Staging Area concept
(also known as Historical Staging or the History Area)
This basically adopts the fundamentals of a Data Warehousersquo
lsquoThe Historical Staging Area effectively lsquoactsrsquo as
Data Lake but in a better defined form as data deltas and
event datetimes are taken into accountrsquo
So
Time Variant
amp
Non Volatile
Subject Oriented
amp
Integrated
EDW
Subject Oriented Integrated
Time Variant Non-Volatile
EDW
rwerschkull
nllinkedincominrogierwerschkull
Could be a
Data LAKE
VirtualisedEnsemble
Tier
EDW
Time Variant
amp
Non Volatile
Subject Oriented
amp
Integrated
EDW
rwerschkull
nllinkedincominrogierwerschkull
How
Does PS-3C Work
rwerschkull
nllinkedincominrogierwerschkullPhoto credit Public Domain
StagingArea
EDWInformation
Marts
Focus of Current ensemble EDWrsquos
rwerschkull
nllinkedincominrogierwerschkull
Persistent StagingArea HSA =
Data LibraryEDW
Information Marts
Splitting the work
rwerschkull
nllinkedincominrogierwerschkull
Persitent
Staging
-
Concept
Context
Connector
Business
Concept
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
Identify source event stream Primary or Unique KeyUse source metadata for this
Automate the building of a PS lsquoaround this keyrsquo Take all columns
Historize using SCD-2 approach
Persistent Staging - how
rwerschkull
nllinkedincominrogierwerschkull
Entity levelUnique key
Functional Description
Delivering party
Owner Responsible
MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)
hellip
Persistent Staging Metadata-1
rwerschkull
nllinkedincominrogierwerschkull
Column level [Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR delete as new record
[Source system] on table file level (lowest possible)
Load End Date Timestamp possible but difficulthellipRequires updates
Persistent Staging Metadata-2
rwerschkull
nllinkedincominrogierwerschkull
ACID is possible in HIVE
ACID Makes Updates possibleBy registering updates as lsquonew datarsquo
Reconciliation compacting when idle at user command
Use ORC files
PLUS changing the HIVE configurationhellip
UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)
rwerschkull
nllinkedincominrogierwerschkull
HivePut semi structured data = variable columns in MAP data type
OR use Data storage type that supports schema-evolutionAVRO (ORC in development)
Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo
Schema can be different for every row
What about SEMI-STRUCTURED Data
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
3C - how
rwerschkull
nllinkedincominrogierwerschkull
Always starts with Conceptual data modeling
NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic
No Link Satellites
No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo
Explicit Helper entities
Like Data Vault(2) BUT
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
a UNIQUE Domain specific point of integration
hellipa business entity
hellipwithin itrsquos own domain
hellipdoes not necessarily need to be Enterprise Wide
Business Concept (BC)
rwerschkull
nllinkedincominrogierwerschkull
Why not lsquoenterprise widersquoCompany
Customer
Sales Customer
International
Sales Customer
Local
Sales Customer
Marketing Customer
Customer hellip
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description]
[Owner Responsible]
Column level [Load Date Timestamp]
[Source system] on table file level (lowest possible)
Business Concept Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example-Data
NSR-Station
NS-
Travelcard
NS-
Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
hellip
Business
Key
META_Source META_Load_dts
Ut NSS1_y 5-6-2015 220000
Asd NSS1_y 5-6-2015 220000
Asa NSS2_p 6-6-2015 220000
hellip hellip hellip
NS-
Traveller
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
hellip
Business Key
CRM-RW123456
CRM-LAS224466
hellip
rwerschkull
nllinkedincominrogierwerschkull
Most easy entity to be virtualised(if performance allows)
No Hashing amp No surrogateBUSINESSKEYS
(not by default at least)
BC important notes
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Containts Context about a Concept In a historical way
hellipLike a Data Vault Satellite
Every CC belongs to only one BC
Seperate entity per source system table stream
Concept Context (CC)
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description] [Owner Responsible]
Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)
Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record
Concept Context Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example
NSR-StationNS-
Travelcard
NS-
TrainseriesNS-
Traveller
[valuation]Source NSS2 table p
[description]SourceNSS1 table x
[adres]Source NSS1 table y
[description]Source NTR table q
[ovchip_
personal]Source NSR table r
[ovchip_
on-usageSource NSR table s
[personal_
details]Source NSR table t
[adres_
data]Source NSR table t rwerschkull
nllinkedincominrogierwerschkull
Example-dataBK_NSR-
Station
Postadres_
postcode
GPS hellip META_
source
META_Load_dts META_Load_end_dts META_Deleted_Ind
Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0
Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0
Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0
Asa hellip hellip hellip hellip hellip hellip
NSR-Station
[adres]Source NSS1 table y
More difficult to be virtualisedDepends on semantic gap with source
But do make virtual when lsquostreaming datarsquo is necessary
Because we have PS layerExposing all columns not necessary
Refactoring is more easyhellip
BC important notes
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Relations between Concepts + Context
In a historical way
hellipMerger of Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point
in time
Connector (C)
rwerschkull
nllinkedincominrogierwerschkull
Explicitly defining a driving key as metadatahellip
Gives business understanding
Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)
bull is not registered as a new lsquoconnectionrsquo
Connector Driving key
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description] [Owner Responsible]
Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)
Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record
Connector Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example
NSR-StationNS-
Travelcard
NS-
TrainseriesNS-
Traveller
[valuation]Source NSS2 table p
[description]SourceNSS1 table x
[adres]Source NSS1 table y
[description]Source NTR table q
[ovchip_
personal]Source NSR table r
[ovchip_
on-usageSource NSR table s
[personal_
details]Source NSR table t
[adres_
data]Source NSR table t rwerschkull
nllinkedincominrogierwerschkull
NSR-
TravelmovementCheckin timestamp
from
to
Driving Key
NS-Travelcard
+Checkin timestamp
Example-Data
NSR-StationNS-
Travelcard
NS-
Trainseries
rwerschkull
nllinkedincominrogierwerschkull
NSR-
TravelmovementCheckin timestamp
from
to
BK_NSR-
Station-from
BK_NSR-
Station-to
BK_NS_
Trainseries
BK_NS-Travelcard Checkin
timestamp
Checkout
timestamp
META_Load_dts META_Load_end_dts
Asd Ut IC 855 3528 0234 2073
1234
5-4-2016 84932 5-4-2016 94012 6-4-2016
220000
31-12-9999
000000
Ut Asd IC 855 3528 0234 2073
1234
5-4-2016 181009 5-4-2016 185520 6-4-2016
220000
31-12-9999
000000
We add Two mandatory
HELPEREntities
Are we there yet No
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
To help switching from sources that are Tied together by technical (surrogate) keyshellip
To a Business Key based model
Itrsquos a LOOKUP table that translates the technical to the
Business Key
Business Alias
Example
NSR-Station
[valuation]Source NSS2 table p
[description]SourceNSS1 table x[adres]
Source NSS1 table y
rwerschkull
nllinkedincominrogierwerschkull
BA-NSS1
BA-NSS2
Key Lookup voor NSS1 source tables
Key Lookup voor NSS2 source tables
Example-data
NSR-Station
rwerschkull
nllinkedincominrogierwerschkull
BA-NSS1
Key Lookup voor NSS1 source tables
Business Key NSS1_Surrogate_key
Ut 123522
Asd 666323
Asa 222443
hellip hellip
Business
Key
META_Source META_Load_dts
Ut NSS1_y 5-6-2015 220000
Asd NSS1_y 5-6-2015 220000
Asa NSS2_p 6-6-2015 220000
hellip hellip hellip
Has a 1 on (01) relation with a Business Concept
More difficult to be virtualised Lookup table should be kept small
Therefore DO NOT do key lookup in Concept Context entity
Load generate together with BC
Preferably lsquoin memoryrsquo somehowhellip
BA important Details
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Integrate the validity timelines of Concept Contextsbelonging to a Business Concept
Like a Data Vault Point-in-time construct
But Mandatory
And with a clearly defined and performantapproach
BC-Timeline
rwerschkull
nllinkedincominrogierwerschkull
Example
NSR-Station
[valuation]Source NSS2 table p
[adres]Source NSS1 table y
This REALLY complicates Anchor Modeling
And personally: this one too (the Link Satellite)
[Diagram: Data Vault model with HUB, SAT and LINK entities]
Problem
a) Hashing of business keys
Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32
Hashing…… loses statistical information regarding the data distribution. Query optimizers do not like this…
Column-family, document and key-value databases need a good (natural) sharding key for (partial) key lookups
http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2
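The partial key lookup argument can be sketched in a few lines of Python. This is only an illustration: it assumes the timestamps in the table were written with colons (which matches most of the listed key lengths), it assumes UTF-8 encoding before hashing, and the helper names are hypothetical.

```python
import hashlib

SEP = "|"  # separator used in the slide's concatenated keys


def concat_business_key(*parts: str) -> str:
    """Concatenate the key parts into one natural business key."""
    return SEP.join(parts)


def md5_key(business_key: str) -> str:
    """Hash the concatenated key (UTF-8 encoding is an assumption)."""
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()


readings = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:27", "72A1_SLDET"),
    ("8739", "2015-01-22 01:34:32", "72A1_FINV"),
    ("8674", "2015-01-22 01:34:26", "72A1_FINV"),
]

natural_keys = [concat_business_key(*r) for r in readings]
hashed_keys = [md5_key(k) for k in natural_keys]

# Partial-key lookup: every reading of rolling stock 8739 shares the prefix "8739|".
rolling_stock_8739 = [k for k in natural_keys if k.startswith("8739" + SEP)]

# Sorting natural keys keeps related readings next to each other,
# which is what a range or sharding key can exploit.
sorted_natural = sorted(natural_keys)
```

The hashed keys all have the same length, but a prefix such as "8739|" no longer means anything once the key is hashed, so the same lookup would need the full key of every reading.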
b) Surrogate business keys
Surrogate keys require centralized coordination…
…and thus can impact the overall system’s scalability and availability
A lot of MPP NoSQL databases simply do not have them…
Then some inspiration
http://roelantvos.com/blog/?p=1119
‘In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse’
‘The Historical Staging Area effectively ‘acts’ as Data Lake but in a better defined form as data deltas and event datetimes are taken into account’
So the EDW (Subject Oriented, Integrated, Time Variant, Non-Volatile) can be split into two parts:
Time Variant & Non Volatile
Subject Oriented & Integrated
[Diagram: the EDW split into ‘Time Variant & Non Volatile’ and ‘Subject Oriented & Integrated’, with the labels ‘Could be a Data Lake’ and ‘Virtualised Ensemble Tier’]
How Does PS-3C Work?
Focus of current ensemble EDWs
[Diagram: Staging Area, EDW, Information Marts]
Splitting the work
[Diagram: Persistent Staging Area (HSA) = Data Library, EDW, Information Marts]
PS-3C = Persistent Staging - 3C (Business Concept, Concept Context, Connector)
Persistent Staging - how
Identify the Primary or Unique Key of the source / event stream. Use source metadata for this
Automate the building of a PS ‘around this key’. Take all columns
Historize using an SCD-2 approach
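The three steps above can be sketched as a small SCD-2 loader. This is a minimal illustration, not the author's implementation: the function name and the column names `station` and `postcode` are hypothetical, the end timestamp is set one second before the next load to mirror the example tables in this deck, and deletes are not handled.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # "31-12-9999" marks the current version


def historize(history, incoming_rows, key_cols, load_dts):
    """Apply one delivery to a persistent staging table (list of dicts), SCD-2 style."""
    for row in incoming_rows:
        key = tuple(row[c] for c in key_cols)
        # Find the currently open version for this source key, if any.
        current = next(
            (h for h in history
             if h["META_Load_end_dts"] == OPEN_END
             and tuple(h[c] for c in key_cols) == key),
            None,
        )
        if current is not None:
            if all(current[c] == v for c, v in row.items()):
                continue  # nothing changed, keep the open version
            # Close the old version one second before the new one starts.
            current["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
        history.append({**row, "META_Load_dts": load_dts, "META_Load_end_dts": OPEN_END})
    return history


history = []
historize(history,
          [{"station": "Ut", "postcode": "3500GJ"}, {"station": "Asd", "postcode": "1012 AB"}],
          ["station"], datetime(2013, 6, 5, 22))
historize(history,
          [{"station": "Ut", "postcode": "3511 CE"}, {"station": "Asd", "postcode": "1012 AB"}],
          ["station"], datetime(2015, 7, 4, 22))
```

After the second delivery the table holds three rows: the closed and the open version for `Ut`, and the unchanged open version for `Asd`.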
Persistent Staging Metadata-1
Entity level:
Unique key
Functional Description
Delivering party
Owner / Responsible
MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
…
Persistent Staging Metadata-2
Column level:
[Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR register a delete as a new record
[Source system] on table / file level (lowest possible)
Load End Date Timestamp: possible but difficult… Requires updates
Updates in Hive (isn’t HDFS append-only?)
ACID is possible in Hive
ACID makes updates possible by registering updates as ‘new data’
Reconciliation (compacting) when idle or at user command
Use ORC files
PLUS change the Hive configuration…
What about semi-structured data?
Hive: put semi-structured data (= variable columns) in a MAP data type
OR use a data storage type that supports schema evolution: Avro (ORC in development)
Or HBase… It only has one data type (byte); the schema is ‘applied’
The schema can be different for every row
3C - how
Like Data Vault (2), BUT:
Always starts with conceptual data modeling
NOT the primary location of Data & History
Virtualised (only if performance allows); should be deterministic
No Link Satellites
No Surrogate or Hash Keys, only ‘Concatenated Natural Business Keys’
Explicit Helper entities
3C - Details
[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; Business Alias A and B; BC-Timeline X]
Business Concept (BC)
A UNIQUE, domain-specific point of integration
…a business entity
…within its own domain
…does not necessarily need to be Enterprise Wide
Why not ‘enterprise wide’?
[Diagram of customer definitions: Company Customer, Sales Customer, International Sales Customer, Local Sales Customer, Marketing Customer, Customer …]
Business Concept Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Example data
NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …
NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
…
NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…
NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
…
BC important notes
Easiest entity to virtualise (if performance allows)
No hashing & no surrogate BUSINESS KEYS (not by default at least)
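Why a Business Concept is easy to virtualise can be illustrated with a deterministic derivation from persistent staging rows. This is a sketch under assumptions: the function name and the staging column `station_code` are hypothetical, and ‘first load timestamp per key’ is one possible rule for META_Load_dts, not necessarily the author's.

```python
from datetime import datetime


def business_concept(ps_rows, key_col):
    """Derive the Business Concept rows (one per business key) from persistent staging."""
    first_seen = {}
    for row in ps_rows:
        bk = row[key_col]
        known = first_seen.get(bk)
        # Keep the earliest load of each business key, so the result is deterministic.
        if known is None or row["META_Load_dts"] < known["META_Load_dts"]:
            first_seen[bk] = {
                "Business Key": bk,
                "META_Source": row["META_Source"],
                "META_Load_dts": row["META_Load_dts"],
            }
    return [first_seen[bk] for bk in sorted(first_seen)]


ps_station = [
    {"station_code": "Ut", "postcode": "3500GJ", "META_Source": "NSS1_y",
     "META_Load_dts": datetime(2013, 6, 5, 22)},
    {"station_code": "Ut", "postcode": "3511 CE", "META_Source": "NSS1_y",
     "META_Load_dts": datetime(2015, 6, 5, 22)},
    {"station_code": "Asd", "postcode": "1012 AB", "META_Source": "NSS1_y",
     "META_Load_dts": datetime(2013, 6, 5, 22)},
]

bc_station = business_concept(ps_station, "station_code")
```

Because the result depends only on the staging rows, the same view can be rebuilt at any time, which is what makes virtualisation an option when performance allows.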
3C - Details
Concept Context (CC)
Contains context about a Concept, in a historical way
…like a Data Vault Satellite
Every CC belongs to only one BC
Separate entity per source system table / stream
Concept Context Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
Example
NSR-Station: [valuation] Source NSS2 table p; [description] Source NSS1 table x; [adres] Source NSS1 table y
NS-Trainseries: [description] Source NTR table q
NS-Travelcard: [ovchip_personal] Source NSR table r; [ovchip_on-usage] Source NSR table s
NS-Traveller: [personal_details] Source NSR table t; [adres_data] Source NSR table t
Example data: NSR-Station, [adres] (Source NSS1 table y)
BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064   …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299   …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299   …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                  …  …            …                  …
CC important notes
More difficult to virtualise: depends on the semantic gap with the source
But do make it virtual when ‘streaming data’ is necessary
Because we have the PS layer, exposing all columns is not necessary
Refactoring is easier…
3C - Details
Connector (C)
Relations between Concepts + Context, in a historical way
…a merger of the Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point in time
Connector Driving key
Explicitly defining a driving key as metadata…
Gives business understanding
Makes it possible for the Connector to correctly handle delta data deliveries…
• so that a change (on the driving key)
• is not registered as a new ‘connection’
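The effect of the driving key on delta loads can be shown with a small sketch. Everything here is illustrative: the loader function, the underscore column names and the corrected delivery (destination changed to `Asa`) are assumptions, using the travel movement example from this deck with NS-Travelcard + Checkin timestamp as driving key.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)


def load_connector(connector, delta_rows, driving_key, load_dts):
    """Load a delta into a Connector; rows are matched on the driving key only."""
    for row in delta_rows:
        dk = tuple(row[c] for c in driving_key)
        current = next(
            (r for r in connector
             if r["META_Load_end_dts"] == OPEN_END
             and tuple(r[c] for c in driving_key) == dk),
            None,
        )
        if current is not None:
            if all(current[c] == v for c, v in row.items()):
                continue  # identical delivery, nothing to register
            # Same driving key, different content: close the old version.
            current["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
        connector.append({**row, "META_Load_dts": load_dts, "META_Load_end_dts": OPEN_END})
    return connector


def open_rows(connector):
    return [r for r in connector if r["META_Load_end_dts"] == OPEN_END]


DRIVING_KEY = ["BK_NS_Travelcard", "Checkin_timestamp"]
ALL_KEYS = ["BK_NSR_Station_from", "BK_NSR_Station_to", "BK_NS_Trainseries"] + DRIVING_KEY

trip = {
    "BK_NSR_Station_from": "Asd", "BK_NSR_Station_to": "Ut",
    "BK_NS_Trainseries": "IC 855", "BK_NS_Travelcard": "3528 0234 2073 1234",
    "Checkin_timestamp": datetime(2016, 4, 5, 8, 49, 32),
}
correction = {**trip, "BK_NSR_Station_to": "Asa"}  # hypothetical corrected delivery

with_dk = load_connector([], [trip], DRIVING_KEY, datetime(2016, 4, 6, 22))
load_connector(with_dk, [correction], DRIVING_KEY, datetime(2016, 4, 7, 22))

without_dk = load_connector([], [trip], ALL_KEYS, datetime(2016, 4, 6, 22))
load_connector(without_dk, [correction], ALL_KEYS, datetime(2016, 4, 7, 22))
```

With the driving key, the correction closes the first version and leaves one open travel movement. Matching on all keys instead leaves two open rows, so the correction looks like a second ‘connection’.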
Connector Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
Example
[Diagram: the Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts, plus the Connector NSR-Travelmovement (Checkin timestamp; from / to)]
Driving Key: NS-Travelcard + Checkin timestamp
Example data: Connector NSR-Travelmovement
BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00
Are we there yet? No
We add two mandatory HELPER entities
3C - Details
Business Alias
Helps to switch from sources that are tied together by technical (surrogate) keys…
…to a Business Key based model
It’s a LOOKUP table that translates the technical key to the Business Key
Example
NSR-Station: [valuation] Source NSS2 table p; [description] Source NSS1 table x; [adres] Source NSS1 table y
BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables
Example data
BA-NSS1 (key lookup for NSS1 source tables)
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …
NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …
BA important details
Has a 1 to (0,1) relation with a Business Concept
More difficult to virtualise; the lookup table should be kept small
Therefore DO NOT do the key lookup in the Concept Context entity
Load / generate together with the BC
Preferably ‘in memory’ somehow…
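A Business Alias used as an in-memory lookup could look like the sketch below. The surrogate values come from the BA-NSS1 example; the function name and the source column names `station_id` and `BK_NSR_Station` are hypothetical.

```python
# BA-NSS1 kept in memory: NSS1 surrogate key -> business key (values from the example).
ba_nss1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(source_rows, alias, surrogate_col, bk_col):
    """Replace the technical key of each source row by its business key."""
    translated = []
    for row in source_rows:
        row = dict(row)  # do not modify the caller's rows
        surrogate = row.pop(surrogate_col)
        # An unknown surrogate raises KeyError: the BA must be loaded together with the BC.
        row[bk_col] = alias[surrogate]
        translated.append(row)
    return translated


nss1_table_y = [
    {"station_id": 123522, "postcode": "3511 CE"},
    {"station_id": 666323, "postcode": "1012 AB"},
]

address_rows = to_business_key(nss1_table_y, ba_nss1, "station_id", "BK_NSR_Station")
```

Keeping the alias as a small dictionary keeps the translation step separate from the Concept Context itself, in line with the note above.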
3C - Details
BC-Timeline
Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
Like a Data Vault Point-in-time construct
But mandatory
And with a clearly defined and performant approach
Example
NSR-Station: [valuation] Source NSS2 table p; [adres] Source NSS1 table y
[valuation] (Source NSS2 table p)
BK_NSR-Station  WOZ value   Value Rating agency X  META_Laad_dts      META_Laad_eind_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00
BCT
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
[adres] (Source NSS1 table y)
BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064   …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299   …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299   …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                  …  …            …                  …
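How the combined timeline follows from the two Concept Contexts can be sketched as below. This is one possible way to build a BC-Timeline, not necessarily the ‘clearly defined and performant approach’ the deck refers to; the function name is hypothetical and the load timestamps are those of the example (read as day-month-year).

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)


def bc_timeline(*context_timelines):
    """Merge the load timestamps of several Concept Contexts into one timeline per key."""
    starts = {}
    for timeline in context_timelines:
        for business_key, load_dts in timeline:
            starts.setdefault(business_key, set()).add(load_dts)
    combined = []
    for business_key in sorted(starts):
        points = sorted(starts[business_key])
        for i, start in enumerate(points):
            # Each slice ends one second before the next one starts; the last stays open.
            end = points[i + 1] - timedelta(seconds=1) if i + 1 < len(points) else OPEN_END
            combined.append((business_key, start, end))
    return combined


valuation = [("Ut", datetime(2014, 1, 1, 22)), ("Ut", datetime(2015, 1, 1, 22)),
             ("Ut", datetime(2016, 3, 1, 22))]
address = [("Ut", datetime(2013, 6, 5, 22)), ("Ut", datetime(2015, 7, 4, 22)),
           ("Asd", datetime(2013, 6, 5, 22))]

timeline = bc_timeline(valuation, address)
```

For `Ut` this yields five slices and for `Asd` one open slice, the same rows as the combined table in the example.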
What Makes PS-3C a Different Ensemble?
1) Explicitly splitting the work
Data + History
Subjects + Integration
2) NO HASHED BUSINESS KEYS or surrogate keys
Only concatenated ones
3) Fewer joins
Relation + technical validity timeline + relation context, together in one entity
4) Explicit HELPER entities
Business Alias
BC-Timeline
+ explicitly define driving key(s)
Hope to AVOID this…
http://xkcd.com/927
PS-3C: a new PROPOSED ensemble modelling technique
Help needed
Questions?
About Me
‘Head of BI’, Spilgames
Certified Data Vault modeler since 2009
Contact details: nl.linkedin.com/in/rogierwerschkull
rogierwerschkullgmailcom
rwerschkull
problem
rwerschkull
nllinkedincominrogierwerschkull
a) HASHING OF Business keys
Rolling
Stock Nr Datetime Sensor Id Value Concatenated Business Key
Key
Len MD5 Hash
Key
Len
8739
2015-01-22
013427 72A1_FINV 123 8739|2015-01-22 013427|72A1_FINV 34 86ae4c6b0e2e2d5a13a0d11440529aeb 32
8739
2015-01-22
013427 72A1_SLDET 100 8739|2015-01-22 013427|72A1_SLDET 35 51ce9bc292eef407bd7c91a52eebcf2e 32
8739
2015-01-22
013432 72A1_FINV 126 8739|2015-01-22 013432|72A1_FINV 34 9482a41c1fecc4c64b8c437af6cc85e8 32
8739
2015-01-22
013432
13A8_MW_UB
AT_VT 5 8739|2015-01-22 013432|13A8_MW_UBAT_VT 42 e4160914ee55ce0b93f87b23366a0ce3 32
8674
2015-01-22
013426 72A1_FINV 6 8674|2015-01-22 013426|72A1_FINV 34 fcb3e7c8c91e44ce396d908a4948ca65 32
8674
2015-01-22
013426
16A1_HSVER
OND 7 8674|2015-01-22 013426|16A1_HSVEROND 38 fe9098c8c291ad56af5c8afae5169196 32
Loses statistical Informationregarding the data distributionQuery optimizers do not like thishellip
Column family Document and Key-value databases need a
good (natural) sharding key for (partial) key-
lookups
Hashinghelliphellip
httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2rwerschkull
nllinkedincominrogierwerschkull
Surrogates keys require
centralized coordination
hellipand thus can impact the overall systemrsquos scalability and availability
A lot of MPP NoSQL databases simply do not have themhellip
B) Surrogate BuSINESS keys
rwerschkull
nllinkedincominrogierwerschkull
Then Some Inspiration
httproelantvoscomblogp=1119
rwerschkull
nllinkedincominrogierwerschkull
lsquoIn my opinion the answer lies in the adoption of the
persistent (Historical) Staging Area concept
(also known as Historical Staging or the History Area)
This basically adopts the fundamentals of a Data Warehousersquo
lsquoThe Historical Staging Area effectively lsquoactsrsquo as
Data Lake but in a better defined form as data deltas and
event datetimes are taken into accountrsquo
So
Time Variant
amp
Non Volatile
Subject Oriented
amp
Integrated
EDW
Subject Oriented Integrated
Time Variant Non-Volatile
EDW
rwerschkull
nllinkedincominrogierwerschkull
Could be a
Data LAKE
VirtualisedEnsemble
Tier
EDW
Time Variant
amp
Non Volatile
Subject Oriented
amp
Integrated
EDW
rwerschkull
nllinkedincominrogierwerschkull
How
Does PS-3C Work
rwerschkull
nllinkedincominrogierwerschkullPhoto credit Public Domain
StagingArea
EDWInformation
Marts
Focus of Current ensemble EDWrsquos
rwerschkull
nllinkedincominrogierwerschkull
Persistent StagingArea HSA =
Data LibraryEDW
Information Marts
Splitting the work
rwerschkull
nllinkedincominrogierwerschkull
Persitent
Staging
-
Concept
Context
Connector
Business
Concept
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
Identify source event stream Primary or Unique KeyUse source metadata for this
Automate the building of a PS lsquoaround this keyrsquo Take all columns
Historize using SCD-2 approach
Persistent Staging - how
Persistent Staging Metadata - 1
Entity level:
• Unique key
• Functional description
• Delivering party
• Owner / Responsible
• MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
• …
Persistent Staging Metadata - 2
Column level:
• [Load Date Timestamp]
• [Load End Date Timestamp]
• [Deleted Flag] OR delete as new record
• [Source system] on table / file level (lowest possible)
Load End Date Timestamp: possible but difficult… requires updates.
Updates in Hive (isn’t HDFS append-only?)
ACID is possible in Hive.
ACID makes updates possible by registering updates as ‘new data’.
Reconciliation / compacting happens when idle or at user command.
Use ORC files.
PLUS changing the Hive configuration…
What about semi-structured data?
Hive: put semi-structured data (= variable columns) in a MAP data type.
OR use a data storage type that supports schema evolution: Avro (ORC in development).
Or HBase… It only has one data type (byte); the schema is ‘applied’.
The schema can be different for every row.
3C - how
Like Data Vault (2), BUT:
Always starts with conceptual data modelling.
NOT the primary location of data & history. Virtualised (only if performance allows). Should be deterministic.
No Link Satellites.
No surrogate or hash keys, only ‘concatenated natural business keys’.
Explicit helper entities.
3C - Details
[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; helper entities Business Alias A, Business Alias B and BC-Timeline X]
Business Concept (BC)
A UNIQUE, domain-specific point of integration
…a business entity
…within its own domain
…does not necessarily need to be enterprise-wide
Why not ‘enterprise wide’?
Company Customer
Sales Customer
International Sales Customer
Local Sales Customer
Marketing Customer
… Customer
Business Concept Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Example data

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
…
BC important notes
The easiest entity to virtualise (if performance allows).
No hashing & no surrogate BUSINESS KEYS (not by default at least).
Concept Context (CC)
Contains context about a Concept, in a historical way
…like a Data Vault Satellite
Every CC belongs to only one BC
Separate entity per source system table / stream
Concept Context Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
Example
Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller
Concept Contexts:
• [valuation], source NSS2 table p
• [description], source NSS1 table x
• [adres], source NSS1 table y
• [description], source NTR table q
• [ovchip_personal], source NSR table r
• [ovchip_on-usage], source NSR table s
• [personal_details], source NSR table t
• [adres_data], source NSR table t
Example data
NSR-Station: [adres], source NSS1 table y

BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064   …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299   …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299   …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                  …  …            …                  …
CC important notes
More difficult to virtualise: depends on the semantic gap with the source.
But do make it virtual when ‘streaming data’ is necessary.
Because we have the PS layer, exposing all columns is not necessary.
Refactoring is easier…
Connector (C)
Relations between Concepts + context, in a historical way
…a merger of Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point in time
Connector driving key
Explicitly defining a driving key as metadata…
Gives business understanding.
Makes it possible for a Connector to handle delta data deliveries correctly…
• so that a change (on the driving key)
• is not registered as a new ‘connection’
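One way to read the driving-key rule is sketched below in Python. It is an assumption-laden illustration rather than the deck's method: the function `apply_delta`, the column names and the one-second end-dating are hypothetical. A delivered row with the same driving-key values as an open Connector row closes that row and opens a new version, so only one ‘connection’ stays open per driving key.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Sequence

# Assumption: an open-ended Connector row carries this end timestamp.
END_OF_TIME = datetime(9999, 12, 31)

# Hypothetical driving key for a travel movement: travelcard + check-in timestamp.
DRIVING_KEY = ("travelcard", "checkin")


def apply_delta(connector: List[Dict], row: Dict,
                driving_key: Sequence[str], load_dts: datetime) -> None:
    """Apply one delivered row to a Connector, versioning it per driving key."""
    wanted = tuple(row[column] for column in driving_key)
    for existing in connector:
        is_open = existing["load_end_dts"] == END_OF_TIME
        same_driving_key = tuple(existing[c] for c in driving_key) == wanted
        if is_open and same_driving_key:
            if all(existing[c] == value for c, value in row.items()):
                return  # identical delivery: nothing to register
            # Same driving key, other content: close the old version.
            existing["load_end_dts"] = load_dts - timedelta(seconds=1)
            break
    connector.append({**row, "load_dts": load_dts, "load_end_dts": END_OF_TIME})
```

Without the driving key as metadata, the loader could not tell a changed relation from a second, independent relation, which is the point the slide makes about delta deliveries.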
Connector Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
Example
[Diagram: the Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts, plus the Connector NSR-Travelmovement (Checkin timestamp, from, to)]
Driving key: NS-Travelcard + Checkin timestamp
Example data
Connector NSR-Travelmovement (Checkin timestamp, from, to)

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00
Are we there yet? No.
We add two mandatory HELPER entities.
Business Alias
To help switching from sources that are tied together by technical (surrogate) keys…
…to a Business Key based model.
It’s a LOOKUP table that translates the technical key to the Business Key.
Example
NSR-Station
• [valuation], source NSS2 table p
• [description], source NSS1 table x
• [adres], source NSS1 table y
BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables
Example data

BA-NSS1 (key lookup for NSS1 source tables)
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …
BA important details
Has a 1 to (0..1) relation with a Business Concept.
More difficult to virtualise. The lookup table should be kept small.
Therefore DO NOT do the key lookup in the Concept Context entity.
Load / generate together with the BC.
Preferably ‘in memory’ somehow…
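A Business Alias kept ‘in memory’ can be as simple as a dictionary. The sketch below is an assumption, not the deck's implementation: `BA_NSS1` reuses the example values of the BA-NSS1 table, while `to_business_key`, `rekey` and the column names `station_id` and `business_key` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Optional

# Business Alias for source NSS1: technical (surrogate) key -> Business Key.
BA_NSS1: Dict[int, str] = {
    123522: "Ut",
    666323: "Asd",
    222443: "Asa",
}


def to_business_key(alias: Dict[int, str], surrogate_key: int) -> Optional[str]:
    """Translate one technical key; None when the alias has no entry for it."""
    return alias.get(surrogate_key)


def rekey(rows: Iterable[dict], alias: Dict[int, str],
          surrogate_column: str) -> Iterator[dict]:
    """Swap the source's surrogate key column for the Business Key."""
    for row in rows:
        rekeyed = {k: v for k, v in row.items() if k != surrogate_column}
        rekeyed["business_key"] = to_business_key(alias, row[surrogate_column])
        yield rekeyed
```

For example, `list(rekey([{"station_id": 123522, "postcode": "3500GJ"}], BA_NSS1, "station_id"))` yields one row keyed on `Ut` instead of the NSS1 surrogate.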
BC-Timeline
Integrates the validity timelines of the Concept Contexts belonging to a Business Concept.
Like a Data Vault point-in-time construct
…but mandatory
…and with a clearly defined and performant approach.
Example
NSR-Station
• [valuation], source NSS2 table p
• [adres], source NSS1 table y
[valuation], source NSS2 table p
BK_NSR-Station  WOZ value   Value rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

[adres], source NSS1 table y
BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064   …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299   …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299   …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                  …  …            …                  …

BCT
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
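How the combined timeline follows from the Concept Context load timestamps can be sketched in Python. This is a minimal illustration under assumptions: the function name `bc_timeline` and the tuple layout are hypothetical, and each slice is assumed to end one second before the next one starts, as in the example data.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Set, Tuple

# Assumption: the last slice per business key stays open until this timestamp.
END_OF_TIME = datetime(9999, 12, 31)


def bc_timeline(
    context_loads: Iterable[Tuple[str, datetime]]
) -> List[Tuple[str, datetime, datetime]]:
    """Merge the load timestamps of all Concept Contexts of a Business Concept.

    `context_loads` holds (business key, load timestamp) pairs taken from
    every Concept Context. The result has one row per business key and time
    slice: (business key, combined load dts, combined load end dts).
    """
    starts: Dict[str, Set[datetime]] = {}
    for business_key, load_dts in context_loads:
        starts.setdefault(business_key, set()).add(load_dts)

    timeline: List[Tuple[str, datetime, datetime]] = []
    for business_key in sorted(starts):
        ordered = sorted(starts[business_key])
        for i, start in enumerate(ordered):
            if i + 1 < len(ordered):
                end = ordered[i + 1] - timedelta(seconds=1)
            else:
                end = END_OF_TIME
            timeline.append((business_key, start, end))
    return timeline
```

Fed with the three [valuation] and two [adres] load timestamps for `Ut` from the example, this gives five slices for `Ut`; a query can then join each Concept Context to a slice with a simple range condition.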
What makes PS-3C a different Ensemble?
1) Explicitly splitting the work
Data + History
Subjects + Integration
2) NO HASHED BUSINESS KEYS or surrogate keys
Only concatenated ones
http://www.cannabisculture.com/files/images/6/hashbrick.JPG
3) Fewer joins
Relation + Technical validity timeline + Relation context
Together in one entity
Photo credit: Public Domain
4) Explicit HELPER entities
Business Alias
Business Concept Timeline
+ explicitly define driving key(s)
Photo credit: Public Domain
Hope to AVOID this…
http://xkcd.com/927
PS-3C: A new PROPOSED ensemble modelling technique
Help needed
Questions?
About Me
‘Head of BI’, Spilgames
Certified Data Vault modeler since 2009
Contact details: nl.linkedin.com/in/rogierwerschkull
rogierwerschkullgmailcom
rwerschkull
A) Hashing of business keys

Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32

Loses statistical information regarding the data distribution. Query optimizers do not like this…
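The difference between the two key styles can be sketched in Python. The helper names `concatenated_key`, `md5_key` and `prefix_lookup` are hypothetical, and the hashes this sketch produces are not claimed to match the values on the slide, since the exact input encoding used there is unknown.

```python
import hashlib
from typing import List


def concatenated_key(*parts: str, sep: str = "|") -> str:
    """Build a concatenated natural business key from its parts."""
    return sep.join(parts)


def md5_key(*parts: str, sep: str = "|") -> str:
    """Hash the concatenated key; the result is always 32 hex characters."""
    return hashlib.md5(concatenated_key(*parts, sep=sep).encode("utf-8")).hexdigest()


def prefix_lookup(keys: List[str], prefix: str) -> List[str]:
    """Partial key lookup: all keys that start with the given leading parts."""
    return [key for key in sorted(keys) if key.startswith(prefix)]


# Three sensor readings from the example: rolling stock nr, datetime, sensor id.
READINGS = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:27", "72A1_SLDET"),
    ("8674", "2015-01-22 01:34:26", "72A1_FINV"),
]
NATURAL_KEYS = [concatenated_key(*reading) for reading in READINGS]
HASHED_KEYS = [md5_key(*reading) for reading in READINGS]
```

`prefix_lookup(NATURAL_KEYS, "8739|")` finds both readings of rolling stock 8739, whereas the same question cannot be asked of `HASHED_KEYS`: the hash keeps none of the ordering or prefix structure of the original key parts.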
Loses statistical Informationregarding the data distributionQuery optimizers do not like thishellip
Column family Document and Key-value databases need a
good (natural) sharding key for (partial) key-
lookups
Hashinghelliphellip
httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2rwerschkull
nllinkedincominrogierwerschkull
Surrogates keys require
centralized coordination
hellipand thus can impact the overall systemrsquos scalability and availability
A lot of MPP NoSQL databases simply do not have themhellip
B) Surrogate BuSINESS keys
rwerschkull
nllinkedincominrogierwerschkull
Then Some Inspiration
httproelantvoscomblogp=1119
rwerschkull
nllinkedincominrogierwerschkull
lsquoIn my opinion the answer lies in the adoption of the
persistent (Historical) Staging Area concept
(also known as Historical Staging or the History Area)
This basically adopts the fundamentals of a Data Warehousersquo
lsquoThe Historical Staging Area effectively lsquoactsrsquo as
Data Lake but in a better defined form as data deltas and
event datetimes are taken into accountrsquo
So
Time Variant
amp
Non Volatile
Subject Oriented
amp
Integrated
EDW
Subject Oriented Integrated
Time Variant Non-Volatile
EDW
rwerschkull
nllinkedincominrogierwerschkull
Could be a
Data LAKE
VirtualisedEnsemble
Tier
EDW
Time Variant
amp
Non Volatile
Subject Oriented
amp
Integrated
EDW
rwerschkull
nllinkedincominrogierwerschkull
How
Does PS-3C Work
rwerschkull
nllinkedincominrogierwerschkullPhoto credit Public Domain
StagingArea
EDWInformation
Marts
Focus of Current ensemble EDWrsquos
rwerschkull
nllinkedincominrogierwerschkull
Persistent StagingArea HSA =
Data LibraryEDW
Information Marts
Splitting the work
rwerschkull
nllinkedincominrogierwerschkull
Persitent
Staging
-
Concept
Context
Connector
Business
Concept
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
Identify source event stream Primary or Unique KeyUse source metadata for this
Automate the building of a PS lsquoaround this keyrsquo Take all columns
Historize using SCD-2 approach
Persistent Staging - how
rwerschkull
nllinkedincominrogierwerschkull
Entity levelUnique key
Functional Description
Delivering party
Owner Responsible
MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)
hellip
Persistent Staging Metadata-1
rwerschkull
nllinkedincominrogierwerschkull
Column level [Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR delete as new record
[Source system] on table file level (lowest possible)
Load End Date Timestamp possible but difficulthellipRequires updates
Persistent Staging Metadata-2
rwerschkull
nllinkedincominrogierwerschkull
ACID is possible in HIVE
ACID Makes Updates possibleBy registering updates as lsquonew datarsquo
Reconciliation compacting when idle at user command
Use ORC files
PLUS changing the HIVE configurationhellip
UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)
rwerschkull
nllinkedincominrogierwerschkull
HivePut semi structured data = variable columns in MAP data type
OR use Data storage type that supports schema-evolutionAVRO (ORC in development)
Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo
Schema can be different for every row
What about SEMI-STRUCTURED Data
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
3C - how
rwerschkull
nllinkedincominrogierwerschkull
Always starts with Conceptual data modeling
NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic
No Link Satellites
No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo
Explicit Helper entities
Like Data Vault(2) BUT
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
a UNIQUE Domain specific point of integration
hellipa business entity
hellipwithin itrsquos own domain
hellipdoes not necessarily need to be Enterprise Wide
Business Concept (BC)
rwerschkull
nllinkedincominrogierwerschkull
Why not lsquoenterprise widersquoCompany
Customer
Sales Customer
International
Sales Customer
Local
Sales Customer
Marketing Customer
Customer hellip
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description]
[Owner Responsible]
Column level [Load Date Timestamp]
[Source system] on table file level (lowest possible)
Business Concept Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example-Data
NSR-Station
NS-
Travelcard
NS-
Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
hellip
Business
Key
META_Source META_Load_dts
Ut NSS1_y 5-6-2015 220000
Asd NSS1_y 5-6-2015 220000
Asa NSS2_p 6-6-2015 220000
hellip hellip hellip
NS-
Traveller
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
hellip
Business Key
CRM-RW123456
CRM-LAS224466
hellip
rwerschkull
nllinkedincominrogierwerschkull
Most easy entity to be virtualised(if performance allows)
No Hashing amp No surrogateBUSINESSKEYS
(not by default at least)
BC important notes
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Containts Context about a Concept In a historical way
hellipLike a Data Vault Satellite
Every CC belongs to only one BC
Seperate entity per source system table stream
Concept Context (CC)
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description] [Owner Responsible]
Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)
Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record
Concept Context Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example: Business Concepts and their Concept Contexts
Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller
Concept Contexts of NSR-Station:
[valuation], source NSS2 table p
[description], source NSS1 table x
[adres], source NSS1 table y
Concept Contexts of the other Business Concepts:
[description], source NTR table q
[ovchip_personal], source NSR table r
[ovchip_on-usage], source NSR table s
[personal_details], source NSR table t
[adres_data], source NSR table t
Example data: Concept Context [adres] of NSR-Station (source NSS1, table y)

BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                 …  …            …                  …
CC important notes
More difficult to virtualise: depends on the semantic gap with the source
But do make it virtual when ‘streaming data’ is necessary
Because we have the PS layer:
• exposing all columns is not necessary
• refactoring is easier…
Connector (C)
Relations between Concepts, plus context, in a historical way
…a merger of the Data Vault Link and Link Satellite
Must ALWAYS have a driving key defined: a (sub)set of keys that makes a Connector unique at one point in time
Connector Driving key
Explicitly defining a driving key as metadata…
• gives business understanding
• makes it possible for a Connector to handle delta data deliveries correctly, so that a change (on the driving key) is not registered as a new ‘connection’
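One way to read the driving key rule is sketched below: when a delta row arrives for a driving key that already has an open row, the open row is end-dated and the delta becomes the new version of the same connection, rather than a second, parallel connection. This is an interpretation under stated assumptions, not the author's implementation; the function name, the row layout (plain dicts) and the one-second end-dating convention (taken from the 21:59:59 / 22:00:00 pattern in the example data) are illustrative.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # open-ended validity, as in the example data


def apply_delta(connector, delta, driving_key, load_dts):
    """Apply a delta delivery to a list of Connector rows (dicts), in place."""
    # Index the currently valid row per driving key value.
    open_rows = {
        tuple(row[c] for c in driving_key): row
        for row in connector
        if row["META_Load_end_dts"] == HIGH_DATE
    }
    for new in delta:
        key = tuple(new[c] for c in driving_key)
        current = open_rows.get(key)
        if current is not None:
            if all(current[c] == v for c, v in new.items()):
                continue  # nothing changed for this driving key
            # Same driving key, different content: close the old version.
            current["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
        row = dict(new, META_Load_dts=load_dts, META_Load_end_dts=HIGH_DATE)
        connector.append(row)
        open_rows[key] = row
    return connector
```

With the driving key NS-Travelcard + Checkin timestamp from the travel movement example, a later delivery that adds the checkout station for the same card and check-in time would close the earlier row instead of creating a second open travel movement.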
Connector Metadata
Entity level: [Description], [Owner Responsible]
Column level: [Load Date Timestamp], [Source system] at table/file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag], OR register a delete as a new record
Example: Connector NSR-Travelmovement
Same Business Concepts and Concept Contexts as in the Concept Context example
The Connector relates NSR-Station (from), NSR-Station (to), NS-Trainseries and NS-Travelcard, and carries the Checkin timestamp
Driving key: NS-Travelcard + Checkin timestamp
Example data: Connector NSR-Travelmovement

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00
Are we there yet? No
We add two mandatory HELPER entities
Business Alias
Helps with switching from sources that are tied together by technical (surrogate) keys…
…to a Business Key based model
It’s a LOOKUP table that translates the technical key to the Business Key
Example: Business Alias
NSR-Station, with Concept Contexts [valuation] (source NSS2 table p), [description] (source NSS1 table x) and [adres] (source NSS1 table y)
BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables
Example data: Business Alias

BA-NSS1 (key lookup for NSS1 source tables)
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …
BA important details
Has a 1-to-(0,1) relation with a Business Concept
More difficult to virtualise: the lookup table should be kept small
Therefore DO NOT do the key lookup in the Concept Context entity
Load/generate together with the BC
Preferably ‘in memory’ somehow…
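The lookup itself can be sketched with the BA-NSS1 example data held in memory. The dictionary mirrors the example table; the function name and the source column name `station_id` are hypothetical, introduced only for illustration:

```python
# Business Alias for source NSS1: surrogate key -> business key
# (values taken from the BA-NSS1 example data).
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(source_row, alias, surrogate_column="station_id"):
    """Return a copy of source_row with the surrogate key swapped for BK_NSR-Station."""
    row = dict(source_row)
    surrogate = row.pop(surrogate_column)
    # None when the surrogate key is not (yet) present in the lookup.
    row["BK_NSR-Station"] = alias.get(surrogate)
    return row


print(to_business_key({"station_id": 123522, "postcode": "3500GJ"}, BA_NSS1))
```

Keeping the alias as one small key-to-key mapping per source is what makes the ‘in memory’ preference on the slide practical.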
BC-Timeline
Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
Like a Data Vault Point-in-time construct
But mandatory
And with a clearly defined and performant approach
Example: BC-Timeline (BCT) for NSR-Station
Built from the Concept Contexts [valuation] (source NSS2 table p) and [adres] (source NSS1 table y). Every row in the BC-Timeline starts where a version starts in one of the two Concept Contexts.

Concept Context [valuation]
BK_NSR-Station  WOZ value   Value rating agency X  META_Laad_dts      META_Laad_eind_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

BC-Timeline
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00

Concept Context [adres]
BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                 …  …            …                  …
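The BC-Timeline in the NSR-Station example can be derived mechanically from the load timestamps of the Concept Contexts. The sketch below is one possible way to do it, not necessarily the ‘clearly defined and performant approach’ the deck has in mind: it collects every distinct load timestamp per business key, sorts them, and closes each interval one second before the next one starts. The function name and the tuple layout are assumptions.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # open-ended validity, as in the example data


def build_timeline(*contexts):
    """Combine the validity timelines of several Concept Contexts.

    Each context is an iterable of (business_key, load_dts) pairs.
    Returns a list of (business_key, combined_load_dts, combined_load_end_dts).
    """
    starts = {}
    for rows in contexts:
        for key, load_dts in rows:
            starts.setdefault(key, set()).add(load_dts)

    timeline = []
    for key in sorted(starts):
        points = sorted(starts[key])
        for i, start in enumerate(points):
            if i + 1 < len(points):
                end = points[i + 1] - timedelta(seconds=1)
            else:
                end = HIGH_DATE
            timeline.append((key, start, end))
    return timeline


# Load timestamps from the [valuation] and [adres] example tables (day-month-year).
valuation = [("Ut", datetime(2014, 1, 1, 22)), ("Ut", datetime(2015, 1, 1, 22)),
             ("Ut", datetime(2016, 3, 1, 22))]
adres = [("Ut", datetime(2013, 6, 5, 22)), ("Ut", datetime(2015, 7, 4, 22)),
         ("Asd", datetime(2013, 6, 5, 22))]

for row in build_timeline(valuation, adres):
    print(row)
```

For Ut this yields five intervals and for Asd one, matching the rows of the example BC-Timeline table.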
What makes PS-3C a different Ensemble?
1) Explicitly splitting the work
Data + History
Subjects + Integration
2) NO HASHED BUSINESS KEYS or surrogate keys
Only concatenated ones
httpwwwcannabisculturecomfilesimages6hashbrickJPG
3) Fewer joins
Relation + technical validity timeline + relation context, together in one entity
Photo credit: Public Domain
4) Explicit HELPER entities
Business Alias
Business Concept Timeline (BC-Timeline)
+ explicitly defined driving key(s)
Photo credit: Public Domain
Hope to AVOID this…
http://xkcd.com/927
PS-3C: a new PROPOSED ensemble modelling technique
Help needed
Questions?
About Me
‘Head of BI’, Spil Games
Certified Data Vault modeler since 2009
Contact details: nl.linkedin.com/in/rogierwerschkull
rogierwerschkull@gmail.com
rwerschkull
B) Surrogate business keys
Surrogate keys require centralized coordination
…and thus can impact the overall system’s scalability and availability
A lot of MPP and NoSQL databases simply do not have them…
Then some inspiration
http://roelantvos.com/blog/?p=1119
‘In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse’
‘The Historical Staging Area effectively ‘acts’ as Data Lake but in a better defined form, as data deltas and event date/times are taken into account’
So…
[Diagram: the four EDW properties (Subject Oriented, Integrated, Time Variant, Non-Volatile) split into two groups: Time Variant & Non-Volatile, and Subject Oriented & Integrated]
[Diagram: the same split of the EDW. The Time Variant & Non-Volatile part could be a Data Lake; the Subject Oriented & Integrated part could be a virtualised Ensemble tier]
How does PS-3C work?
Photo credit: Public Domain
Focus of current ensemble EDWs
[Diagram: Staging Area, EDW, Information Marts]
Splitting the work
[Diagram: Persistent Staging Area (HSA = Data Library), EDW, Information Marts]
PS-3C: Persistent Staging - Concept Context, Connector, Business Concept
Persistent Staging - how
Identify the Primary or Unique Key of the source table or event stream; use source metadata for this
Automate the building of a PS ‘around this key’; take all columns
Historize using an SCD-2 approach
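The SCD-2 historization step can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the author's loader: it assumes full snapshots are delivered, rows are plain dicts, deletes are registered as new records with a deleted flag, and a version ends one second before its successor starts (the 21:59:59 / 22:00:00 pattern of the example data). The function name is hypothetical; the META_ column names follow the example tables.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # open-ended validity, as in the example data


def historize(history, snapshot, key_columns, load_dts):
    """SCD-2 style load of a full source snapshot into a persistent staging table.

    history  : list of dicts (all source columns + META_ columns), changed in place
    snapshot : list of dicts with all source columns
    """
    current = {
        tuple(r[c] for c in key_columns): r
        for r in history
        if r["META_Load_end_dts"] == HIGH_DATE
    }
    seen = set()
    for src in snapshot:
        key = tuple(src[c] for c in key_columns)
        seen.add(key)
        old = current.get(key)
        if (old is not None and not old["META_Deleted_Ind"]
                and all(old[c] == v for c, v in src.items())):
            continue  # unchanged since the previous load
        if old is not None:
            old["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
        history.append(dict(src, META_Load_dts=load_dts,
                            META_Load_end_dts=HIGH_DATE, META_Deleted_Ind=0))
    # Keys that disappeared from the source: register the delete as a new record.
    for key, old in current.items():
        if key not in seen and not old["META_Deleted_Ind"]:
            old["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
            history.append(dict(old, META_Load_dts=load_dts,
                                META_Load_end_dts=HIGH_DATE, META_Deleted_Ind=1))
    return history
```

Because the function only needs the key columns as input, it can be generated per source table from source metadata, which is what ‘automate the building of a PS around this key’ suggests.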
Persistent Staging Metadata-1
Entity level:
• Unique key
• Functional Description
• Delivering party
• Owner Responsible
• MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
• …
Persistent Staging Metadata-2
Column level:
• [Load Date Timestamp]
• [Load End Date Timestamp]
• [Deleted Flag], OR delete as new record
• [Source system] at table/file level (lowest possible)
Load End Date Timestamp: possible but difficult… it requires updates
Updates in Hive (isn’t HDFS append-only?)
ACID is possible in Hive
ACID makes updates possible by registering updates as ‘new data’
Reconciliation (compacting): when idle or at user command
Use ORC files
PLUS changing the Hive configuration…
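The idea of ‘registering updates as new data’ and reconciling later can be shown with a toy model. This is not Hive code and does not use any Hive API; it is a plain Python sketch of an append-only store with a base, a list of deltas and a compaction step, and all names in it are illustrative:

```python
class AppendOnlyTable:
    """Toy model: a base plus append-only deltas that are merged at read time."""

    def __init__(self):
        self.base = {}    # row id -> row
        self.deltas = []  # each delta is a list of (operation, row id, row)

    def write(self, operations):
        # Updates and deletes never touch existing data; they are appended as a new delta.
        self.deltas.append(list(operations))

    def read(self):
        # Readers see the base with all deltas applied in order.
        rows = dict(self.base)
        for delta in self.deltas:
            for op, row_id, row in delta:
                if op == "delete":
                    rows.pop(row_id, None)
                else:  # "insert" or "update"
                    rows[row_id] = row
        return rows

    def compact(self):
        # Fold all deltas into a new base; the only step that rewrites data.
        self.base = self.read()
        self.deltas = []


table = AppendOnlyTable()
table.write([("insert", 1, {"postcode": "3500GJ"})])
table.write([("update", 1, {"postcode": "3511 CE"})])
print(table.read())
table.compact()
```

Setting the Load End Date Timestamp of a previous version is exactly such an update, which is why the deck treats it as possible but more involved on an append-only file system.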
What about SEMI-STRUCTURED data?
Hive: put semi-structured data (= variable columns) in a MAP data type
OR use a data storage type that supports schema evolution: AVRO (ORC in development)
Or HBase… It only has one data type (byte); the schema is ‘applied’
The schema can be different for every row
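The first option, keeping a small fixed schema and pushing the variable columns into a map, can be sketched like this. The fixed column names `event_id` and `event_type` and the function name are hypothetical; the string conversion assumes a string-to-string map column:

```python
FIXED_COLUMNS = ("event_id", "event_type")  # hypothetical fixed part of the schema


def split_record(record):
    """Split a source record into fixed columns plus a map of all other attributes."""
    fixed = {column: record.get(column) for column in FIXED_COLUMNS}
    # Everything else goes into one map column; values are stored as strings.
    fixed["attributes"] = {
        name: str(value)
        for name, value in record.items()
        if name not in FIXED_COLUMNS
    }
    return fixed


print(split_record({"event_id": 7, "event_type": "checkin", "station": "Ut", "gate": 12}))
```

Two records with entirely different extra attributes then fit the same table layout, at the price of losing column-level typing for the mapped values.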
3C - how
Like Data Vault (2), BUT
Always starts with conceptual data modelling
NOT the primary location of data & history: virtualised (only if performance allows); should be deterministic
No Link Satellites
No surrogate or hash keys, only ‘Concatenated Natural Business Keys’
Explicit helper entities
Business Concept (BC)
A UNIQUE, domain-specific point of integration
…a business entity
…within its own domain
…does not necessarily need to be enterprise wide
Why not ‘enterprise wide’?
Company Customer
Sales Customer
International Sales Customer
Local Sales Customer
Marketing Customer
Customer …
Then Some Inspiration
httproelantvoscomblogp=1119
rwerschkull
nllinkedincominrogierwerschkull
lsquoIn my opinion the answer lies in the adoption of the
persistent (Historical) Staging Area concept
(also known as Historical Staging or the History Area)
This basically adopts the fundamentals of a Data Warehousersquo
lsquoThe Historical Staging Area effectively lsquoactsrsquo as
Data Lake but in a better defined form as data deltas and
event datetimes are taken into accountrsquo
So
Time Variant
amp
Non Volatile
Subject Oriented
amp
Integrated
EDW
Subject Oriented Integrated
Time Variant Non-Volatile
EDW
rwerschkull
nllinkedincominrogierwerschkull
Could be a
Data LAKE
VirtualisedEnsemble
Tier
EDW
Time Variant
amp
Non Volatile
Subject Oriented
amp
Integrated
EDW
rwerschkull
nllinkedincominrogierwerschkull
How
Does PS-3C Work
rwerschkull
nllinkedincominrogierwerschkullPhoto credit Public Domain
StagingArea
EDWInformation
Marts
Focus of Current ensemble EDWrsquos
rwerschkull
nllinkedincominrogierwerschkull
Persistent StagingArea HSA =
Data LibraryEDW
Information Marts
Splitting the work
rwerschkull
nllinkedincominrogierwerschkull
Persitent
Staging
-
Concept
Context
Connector
Business
Concept
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
Identify source event stream Primary or Unique KeyUse source metadata for this
Automate the building of a PS lsquoaround this keyrsquo Take all columns
Historize using SCD-2 approach
Persistent Staging - how
rwerschkull
nllinkedincominrogierwerschkull
Entity levelUnique key
Functional Description
Delivering party
Owner Responsible
MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)
hellip
Persistent Staging Metadata-1
rwerschkull
nllinkedincominrogierwerschkull
Column level [Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR delete as new record
[Source system] on table file level (lowest possible)
Load End Date Timestamp possible but difficulthellipRequires updates
Persistent Staging Metadata-2
rwerschkull
nllinkedincominrogierwerschkull
ACID is possible in HIVE
ACID Makes Updates possibleBy registering updates as lsquonew datarsquo
Reconciliation compacting when idle at user command
Use ORC files
PLUS changing the HIVE configurationhellip
UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)
rwerschkull
nllinkedincominrogierwerschkull
HivePut semi structured data = variable columns in MAP data type
OR use Data storage type that supports schema-evolutionAVRO (ORC in development)
Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo
Schema can be different for every row
What about SEMI-STRUCTURED Data
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
3C - how
rwerschkull
nllinkedincominrogierwerschkull
Always starts with Conceptual data modeling
NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic
No Link Satellites
No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo
Explicit Helper entities
Like Data Vault(2) BUT
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
a UNIQUE Domain specific point of integration
hellipa business entity
hellipwithin itrsquos own domain
hellipdoes not necessarily need to be Enterprise Wide
Business Concept (BC)
rwerschkull
nllinkedincominrogierwerschkull
Why not lsquoenterprise widersquoCompany
Customer
Sales Customer
International
Sales Customer
Local
Sales Customer
Marketing Customer
Customer hellip
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description]
[Owner Responsible]
Column level [Load Date Timestamp]
[Source system] on table file level (lowest possible)
Business Concept Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example-Data
NSR-Station
NS-
Travelcard
NS-
Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
hellip
Business
Key
META_Source META_Load_dts
Ut NSS1_y 5-6-2015 220000
Asd NSS1_y 5-6-2015 220000
Asa NSS2_p 6-6-2015 220000
hellip hellip hellip
NS-
Traveller
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
hellip
Business Key
CRM-RW123456
CRM-LAS224466
hellip
rwerschkull
nllinkedincominrogierwerschkull
Most easy entity to be virtualised(if performance allows)
No Hashing amp No surrogateBUSINESSKEYS
(not by default at least)
BC important notes
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Containts Context about a Concept In a historical way
hellipLike a Data Vault Satellite
Every CC belongs to only one BC
Seperate entity per source system table stream
Concept Context (CC)
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description] [Owner Responsible]
Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)
Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record
Concept Context Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example
NSR-StationNS-
Travelcard
NS-
TrainseriesNS-
Traveller
[valuation]Source NSS2 table p
[description]SourceNSS1 table x
[adres]Source NSS1 table y
[description]Source NTR table q
[ovchip_
personal]Source NSR table r
[ovchip_
on-usageSource NSR table s
[personal_
details]Source NSR table t
[adres_
data]Source NSR table t rwerschkull
nllinkedincominrogierwerschkull
Example-dataBK_NSR-
Station
Postadres_
postcode
GPS hellip META_
source
META_Load_dts META_Load_end_dts META_Deleted_Ind
Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0
Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0
Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0
Asa hellip hellip hellip hellip hellip hellip
NSR-Station
[adres]Source NSS1 table y
More difficult to be virtualisedDepends on semantic gap with source
But do make virtual when lsquostreaming datarsquo is necessary
Because we have PS layerExposing all columns not necessary
Refactoring is more easyhellip
BC important notes
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Relations between Concepts + Context
In a historical way
hellipMerger of Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point
in time
Connector (C)
rwerschkull
nllinkedincominrogierwerschkull
Explicitly defining a driving key as metadatahellip
Gives business understanding
Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)
bull is not registered as a new lsquoconnectionrsquo
Connector Driving key
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description] [Owner Responsible]
Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)
Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record
Connector Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example
NSR-StationNS-
Travelcard
NS-
TrainseriesNS-
Traveller
[valuation]Source NSS2 table p
[description]SourceNSS1 table x
[adres]Source NSS1 table y
[description]Source NTR table q
[ovchip_
personal]Source NSR table r
[ovchip_
on-usageSource NSR table s
[personal_
details]Source NSR table t
[adres_
data]Source NSR table t rwerschkull
nllinkedincominrogierwerschkull
NSR-
TravelmovementCheckin timestamp
from
to
Driving Key
NS-Travelcard
+Checkin timestamp
Example-Data
NSR-StationNS-
Travelcard
NS-
Trainseries
rwerschkull
nllinkedincominrogierwerschkull
NSR-
TravelmovementCheckin timestamp
from
to
BK_NSR-
Station-from
BK_NSR-
Station-to
BK_NS_
Trainseries
BK_NS-Travelcard Checkin
timestamp
Checkout
timestamp
META_Load_dts META_Load_end_dts
Asd Ut IC 855 3528 0234 2073
1234
5-4-2016 84932 5-4-2016 94012 6-4-2016
220000
31-12-9999
000000
Ut Asd IC 855 3528 0234 2073
1234
5-4-2016 181009 5-4-2016 185520 6-4-2016
220000
31-12-9999
000000
We add Two mandatory
HELPEREntities
Are we there yet No
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
To help switching from sources that are Tied together by technical (surrogate) keyshellip
To a Business Key based model
Itrsquos a LOOKUP table that translates the technical to the
Business Key
Business Alias
Example
NSR-Station
[valuation]Source NSS2 table p
[description]SourceNSS1 table x[adres]
Source NSS1 table y
rwerschkull
nllinkedincominrogierwerschkull
BA-NSS1
BA-NSS2
Key Lookup voor NSS1 source tables
Key Lookup voor NSS2 source tables
Example-data
NSR-Station
rwerschkull
nllinkedincominrogierwerschkull
BA-NSS1
Key Lookup voor NSS1 source tables
Business Key NSS1_Surrogate_key
Ut 123522
Asd 666323
Asa 222443
hellip hellip
Business
Key
META_Source META_Load_dts
Ut NSS1_y 5-6-2015 220000
Asd NSS1_y 5-6-2015 220000
Asa NSS2_p 6-6-2015 220000
hellip hellip hellip
Has a 1 on (01) relation with a Business Concept
More difficult to be virtualised Lookup table should be kept small
Therefore DO NOT do key lookup in Concept Context entity
Load generate together with BC
Preferably lsquoin memoryrsquo somehowhellip
BA important Details
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Integrate the validity timelines of Concept Contextsbelonging to a Business Concept
Like a Data Vault Point-in-time construct
But Mandatory
And with a clearly defined and performantapproach
BC-Timeline
rwerschkull
nllinkedincominrogierwerschkull
Example
NSR-Station
[valuation]Source NSS2 table p
[adres]Source NSS1 table y
rwerschkull
nllinkedincominrogierwerschkull
BK_NSR-
Station
WOZ
waarde
Waarde
Ratingbureau X
META_Laad_dts META_Laad_eind_dts
Ut 20 milj 18 milj 1-1-2014 220000 1-1-2015 215959
Ut 22 milj 18 milj 1-1-2015 220000 1-3-2016 215959
Ut 22 milj 23 milj 1-3-2016 220000 31-12-9999 000000
BK_NSR-
Station
Combined_Load_dts Combined_Load_end_dts
Ut 5-6-2013 220000 1-1-2014 215959
Ut 1-1-2014 220000 1-1-2015 215959
Ut 1-1-2015 220000 4-7-2015 215959
Ut 4-7-2015 220000 1-3-2016 215959
Ut 1-3-2016 220000 31-12-9999 000000
Asd 5-6-2013 220000 31-12-9999 000000
BK_NSR-
Station
Postadres_
postcode
GPS hellip META_
source
META_Load_dts META_Load_end_dts
Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 4-7-2015 215959
Ut 3511 CE 5237269 489299 hellip NSS1_y 4-7-2015 220000 31-12-9999 000000
Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000
Asa hellip hellip hellip hellip hellip hellip
BCT
modelling
rwerschkull
nllinkedincominrogierwerschkull
What makes PS-3C a different ensemble?
1) Explicitly splitting the work
Data + History
Subjects + Integration
2) No hashed business keys or surrogate keys
Only concatenated ones
Photo credit: httpwwwcannabisculturecomfilesimages6hashbrickJPG
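A concatenated natural business key, such as the `IC|855` key in the train series example, can be built with a few lines of code. This is a hedged Python sketch: the function name `concat_business_key`, the `|` default delimiter and the rejection of parts that contain the delimiter are assumptions made for illustration, not rules stated in the slides.

```python
def concat_business_key(*parts, delimiter="|"):
    """Build one concatenated natural business key from its parts."""
    cleaned = []
    for part in parts:
        text = str(part).strip()
        if not text:
            raise ValueError("business key parts must not be empty")
        if delimiter in text:
            # Assumption: reject ambiguous parts instead of escaping them.
            raise ValueError(f"part {text!r} contains the delimiter {delimiter!r}")
        cleaned.append(text)
    return delimiter.join(cleaned)


train_series_key = concat_business_key("IC", 855)
```

Because the key stays human-readable, it can be joined on directly and inspected without a lookup, which is the trade-off against hashed or surrogate keys.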
3) Fewer joins
Relation + technical validity timeline + relation context, together in one entity
4) Explicit helper entities
Business Alias
Business Concept Timeline (BC-Timeline)
+ explicitly defined driving key(s)
Hope to avoid this…
http://xkcd.com/927
PS-3C: a new PROPOSED ensemble modelling technique
Help needed
Questions?
‘In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse.’
‘The Historical Staging Area effectively ‘acts’ as Data Lake but in a better defined form as data deltas and event datetimes are taken into account.’
So
EDW = Subject Oriented, Integrated, Time Variant, Non-Volatile
Split into: Time Variant & Non-Volatile | Subject Oriented & Integrated
Time Variant & Non-Volatile: could be a Data Lake
Subject Oriented & Integrated: virtualised ensemble tier
How does PS-3C work?
Focus of current ensemble EDWs: Staging Area | EDW | Information Marts
Splitting the work: Persistent Staging Area (HSA = Data Library) | EDW | Information Marts
PS-3C modelling
PS: Persistent Staging
3C: Concept Context, Connector, Business Concept
Persistent Staging - how
Identify the Primary or Unique Key of each source / event stream; use source metadata for this
Automate the building of a PS ‘around this key’; take all columns
Historize using an SCD-2 approach
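The SCD-2 historization step can be sketched as follows. This is a minimal in-memory Python illustration under stated assumptions, not the author's loader. The function `historize`, the column name `station` and the one-second gap between versions are hypothetical, the META column names follow the example tables in this deck, and deletes are not handled.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # open-ended end date, as in the examples
META_COLUMNS = ("META_Load_dts", "META_Load_end_dts")


def historize(history, delivery, key_column, load_dts):
    """Apply one delivery of source rows to an SCD-2 history list, in place.

    A changed row closes the currently open version and appends a new one;
    an unchanged row is skipped.
    """
    open_rows = {
        row[key_column]: row
        for row in history
        if row["META_Load_end_dts"] == HIGH_DATE
    }
    for record in delivery:
        current = open_rows.get(record[key_column])
        if current is not None:
            payload = {k: v for k, v in current.items() if k not in META_COLUMNS}
            if payload == record:
                continue  # nothing changed, keep the open version
            current["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
        new_row = {**record, "META_Load_dts": load_dts, "META_Load_end_dts": HIGH_DATE}
        history.append(new_row)
        open_rows[record[key_column]] = new_row
    return history


history = []
historize(history, [{"station": "Ut", "postcode": "3500GJ"}], "station", datetime(2013, 6, 5, 22))
historize(history, [{"station": "Ut", "postcode": "3511 CE"}], "station", datetime(2015, 6, 5, 22))
# Re-delivering the same row does not create a new version.
historize(history, [{"station": "Ut", "postcode": "3511 CE"}], "station", datetime(2015, 6, 6, 22))
```

Note that closing a version updates its end date in place, which is the part that is hard on append-only storage.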
Persistent Staging Metadata-1
Entity level:
Unique key
Functional description
Delivering party
Owner / Responsible
MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
…
Persistent Staging Metadata-2
Column level:
[Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR register a delete as a new record
[Source system] on table / file level (lowest possible)
Load End Date Timestamp is possible but difficult… it requires updates
Updates in Hive (isn't HDFS append-only?)
ACID is possible in Hive
ACID makes updates possible by registering updates as ‘new data’
Reconciliation (compacting) when idle or at user command
Use ORC files
PLUS changing the Hive configuration…
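The idea of registering updates as new data and reconciling them later can be pictured with a small conceptual sketch. This Python example is only an illustration of the principle and does not reproduce Hive's internals or API. The names `read_current` and `compact` and the in-memory representation are assumptions.

```python
def read_current(base, deltas):
    """Return the current state: base rows overlaid with all delta batches.

    base is a dict of key -> row; deltas is a list of batches, oldest first,
    each batch a list of (key, row) pairs. A row of None marks a delete.
    """
    merged = dict(base)
    for batch in deltas:
        for key, row in batch:
            if row is None:
                merged.pop(key, None)
            else:
                merged[key] = row
    return merged


def compact(base, deltas):
    """Fold every delta into a new base, leaving no deltas to merge at read time."""
    return read_current(base, deltas), []


base = {"Ut": {"postcode": "3500GJ"}}
deltas = [
    [("Ut", {"postcode": "3511 CE"})],   # an update, stored as new data
    [("Asd", {"postcode": "1012 AB"})],  # an insert
]
current = read_current(base, deltas)
new_base, new_deltas = compact(base, deltas)
```

The base is never modified in place: readers merge base and deltas until a compaction produces a new base.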
What about semi-structured data?
Hive: put semi-structured data (= variable columns) in a MAP data type
OR use a data storage type that supports schema evolution: Avro (ORC in development)
Or HBase… it only has one data type (byte); the schema is ‘applied’
The schema can be different for every row
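Putting variable columns into a map can be sketched in a few lines. The Python example below is a hedged illustration of the idea only: `to_staging_row`, the `attributes` column and the sample field names are hypothetical, and values are stored as strings to mimic a string-to-string map column.

```python
def to_staging_row(record, fixed_columns):
    """Split a semi-structured record into fixed columns plus one map column.

    Every field that is not a fixed column goes into the 'attributes' map,
    so each row can carry a different set of variable fields.
    """
    row = {name: record.get(name) for name in fixed_columns}
    row["attributes"] = {
        key: str(value)
        for key, value in record.items()
        if key not in fixed_columns
    }
    return row


staged = to_staging_row(
    {"station": "Ut", "postcode": "3500GJ", "platforms": 16},
    fixed_columns=["station"],
)
```

The fixed part keeps the key queryable as a normal column, while new source fields land in the map without a schema change.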
3C - how
Like Data Vault (2) BUT
Always starts with conceptual data modelling
NOT the primary location of data & history: virtualised (only if performance allows), should be deterministic
No Link Satellites
No surrogate or hash keys, only ‘Concatenated Natural Business Keys’
Explicit helper entities
3C - Details: Business Concept (BC)
A UNIQUE, domain-specific point of integration
…a business entity
…within its own domain
…does not necessarily need to be enterprise-wide
Why not ‘enterprise-wide’?
[Diagram: Company Customer; Sales Customer; International Sales Customer; Local Sales Customer; Marketing Customer; Customer …]
Business Concept Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Example-Data
NSR-Station:
Business Key | META_Source | META_Load_dts
Ut | NSS1_y | 5-6-2015 22:00:00
Asd | NSS1_y | 5-6-2015 22:00:00
Asa | NSS2_p | 6-6-2015 22:00:00
… | … | …
NS-Trainseries, Business Key: IC|855, IC|8852, Sp|7455, St|16050, …
NS-Travelcard, Business Key: 3528 0234 2073 1234, 3528 0234 2073 5678, …
NS-Traveller, Business Key: CRM-RW123456, CRM-LAS224466, …
BC important notes
Easiest entity to virtualise (if performance allows)
No hashed & no surrogate BUSINESS KEYS (not by default at least)
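Virtualising a Business Concept can be pictured as a deterministic query over Persistent Staging that returns each business key once. The Python sketch below illustrates this under assumptions: `business_concept_view` is a hypothetical name, the first load timestamp is taken as the BC load timestamp (the slides do not specify this), and ties are broken on the source name so the result does not depend on input order.

```python
from datetime import datetime


def business_concept_view(staging_rows, key_column):
    """Derive Business Concept rows from persistent staging rows.

    Returns one row per business key, carrying the earliest load timestamp
    and the source that delivered it, sorted by business key.
    """
    first_seen = {}
    for row in staging_rows:
        business_key = row[key_column]
        candidate = (row["META_Load_dts"], row["META_Source"])
        known = first_seen.get(business_key)
        if known is None or candidate < known:
            first_seen[business_key] = candidate
    return [
        {"business_key": bk, "META_Source": source, "META_Load_dts": load_dts}
        for bk, (load_dts, source) in sorted(first_seen.items())
    ]


staging = [
    {"BK": "Ut", "META_Source": "NSS1_y", "META_Load_dts": datetime(2015, 6, 5, 22)},
    {"BK": "Ut", "META_Source": "NSS1_y", "META_Load_dts": datetime(2013, 6, 5, 22)},
    {"BK": "Asd", "META_Source": "NSS1_y", "META_Load_dts": datetime(2013, 6, 5, 22)},
]
stations = business_concept_view(staging, "BK")
```

Since the function only reads staging rows and has no state of its own, running it twice on the same input gives the same result, which is what makes it a candidate for a view.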
3C - Details: Concept Context (CC)
Contains context about a Concept, in a historical way
…like a Data Vault Satellite
Every CC belongs to only one BC
Separate entity per source system table / stream
Concept Context Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
[Load End Date Timestamp]
[Deleted Flag] OR register a delete as a new record
Example
NSR-Station: [valuation] Source NSS2 table p; [description] Source NSS1 table x; [adres] Source NSS1 table y
NS-Trainseries: [description] Source NTR table q
NS-Travelcard: [ovchip_personal] Source NSR table r; [ovchip_on-usage] Source NSR table s
NS-Traveller: [personal_details] Source NSR table t; [adres_data] Source NSR table t
Example-data
NSR-Station, Concept Context [adres] (Source NSS1 table y):
BK_NSR-Station | Postadres_postcode | GPS | … | META_source | META_Load_dts | META_Load_end_dts | META_Deleted_Ind
Ut | 3500GJ | 52.08954, 5.11064 | … | NSS1_y | 5-6-2013 22:00:00 | 5-6-2015 21:59:59 | 0
Ut | 3511 CE | 52.37269, 4.89299 | … | NSS1_y | 5-6-2015 22:00:00 | 31-12-9999 00:00:00 | 0
Asd | 1012 AB | 52.37269, 4.89299 | … | NSS1_y | 5-6-2013 22:00:00 | 31-12-9999 00:00:00 | 0
Asa | … | … | … | … | … | … | …
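With load and load-end timestamps on every row, reading a Concept Context ‘as of’ a moment is a simple range check. The following Python sketch uses two columns from the example rows; the function name `context_as_of` and the inclusive comparison on both ends are assumptions.

```python
from datetime import datetime

HIGH_DATE = datetime(9999, 12, 31)

# Rows taken from the [adres] example data, reduced to the columns used here.
address_context = [
    {"BK_NSR-Station": "Ut", "Postadres_postcode": "3500GJ",
     "META_Load_dts": datetime(2013, 6, 5, 22),
     "META_Load_end_dts": datetime(2015, 6, 5, 21, 59, 59)},
    {"BK_NSR-Station": "Ut", "Postadres_postcode": "3511 CE",
     "META_Load_dts": datetime(2015, 6, 5, 22),
     "META_Load_end_dts": HIGH_DATE},
    {"BK_NSR-Station": "Asd", "Postadres_postcode": "1012 AB",
     "META_Load_dts": datetime(2013, 6, 5, 22),
     "META_Load_end_dts": HIGH_DATE},
]


def context_as_of(rows, key_column, business_key, moment):
    """Return the context row valid for a business key at a given moment."""
    for row in rows:
        if (row[key_column] == business_key
                and row["META_Load_dts"] <= moment <= row["META_Load_end_dts"]):
            return row
    return None  # no version was valid at that moment


early = context_as_of(address_context, "BK_NSR-Station", "Ut", datetime(2014, 1, 1))
late = context_as_of(address_context, "BK_NSR-Station", "Ut", datetime(2016, 1, 1))
```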
CC important notes
More difficult to virtualise: depends on the semantic gap with the source
But do make it virtual when ‘streaming data’ is necessary
Because we have the PS layer, exposing all columns is not necessary
Refactoring is easier…
3C - Details: Connector (C)
Relations between Concepts + context, in a historical way
…a merger of Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time
Connector driving key
Explicitly defining a driving key as metadata…
Gives business understanding
Makes it possible for a Connector to handle delta data deliveries correctly…
so that a change (on the driving key) is not registered as a new ‘connection’
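How a driving key lets a Connector absorb delta deliveries can be sketched as below. This is a minimal Python illustration under assumptions, not the author's loader. `load_connector` is a hypothetical name, and the scenario is invented for illustration: a check-in is first delivered without a destination and later delivered again with one. Column names follow the Travelmovement example in this deck.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)
META_COLUMNS = ("META_Load_dts", "META_Load_end_dts")


def load_connector(connector_rows, delta_rows, driving_key, load_dts):
    """Apply a delta delivery to a Connector, in place.

    Rows are matched on the driving key columns. A changed row closes the
    open version for that driving key instead of adding a second open one.
    """
    def key_of(row):
        return tuple(row[column] for column in driving_key)

    open_rows = {
        key_of(row): row
        for row in connector_rows
        if row["META_Load_end_dts"] == HIGH_DATE
    }
    for delta in delta_rows:
        current = open_rows.get(key_of(delta))
        if current is not None:
            payload = {k: v for k, v in current.items() if k not in META_COLUMNS}
            if payload == delta:
                continue  # unchanged delivery
            current["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
        new_row = {**delta, "META_Load_dts": load_dts, "META_Load_end_dts": HIGH_DATE}
        connector_rows.append(new_row)
        open_rows[key_of(delta)] = new_row
    return connector_rows


driving_key = ("BK_NS-Travelcard", "Checkin timestamp")
checkin = {
    "BK_NS-Travelcard": "3528 0234 2073 1234",
    "Checkin timestamp": datetime(2016, 4, 5, 8, 49, 32),
    "BK_NSR-Station-from": "Asd",
    "BK_NSR-Station-to": None,  # destination not yet known (hypothetical)
}
movements = []
load_connector(movements, [checkin], driving_key, datetime(2016, 4, 5, 22))
load_connector(movements, [{**checkin, "BK_NSR-Station-to": "Ut"}], driving_key,
               datetime(2016, 4, 6, 22))
open_movements = [row for row in movements if row["META_Load_end_dts"] == HIGH_DATE]
```

Without the driving key, the second delivery would look like a different relation (Asd to Ut instead of Asd to unknown) and leave two open connections for the same journey.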
Connector Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
[Load End Date Timestamp]
[Deleted Flag] OR register a delete as a new record
Example
Connector NSR-Travelmovement relates NSR-Station (from), NSR-Station (to), NS-Trainseries and NS-Travelcard, with the Checkin timestamp as context
Driving key: NS-Travelcard + Checkin timestamp
Example-Data
Connector NSR-Travelmovement:
BK_NSR-Station-from | BK_NSR-Station-to | BK_NS_Trainseries | BK_NS-Travelcard | Checkin timestamp | Checkout timestamp | META_Load_dts | META_Load_end_dts
Asd | Ut | IC 855 | 3528 0234 2073 1234 | 5-4-2016 8:49:32 | 5-4-2016 9:40:12 | 6-4-2016 22:00:00 | 31-12-9999 00:00:00
Ut | Asd | IC 855 | 3528 0234 2073 1234 | 5-4-2016 18:10:09 | 5-4-2016 18:55:20 | 6-4-2016 22:00:00 | 31-12-9999 00:00:00
Are we there yet? No
We add two mandatory helper entities
3C - Details: Business Alias
Helps switching from sources that are tied together by technical (surrogate) keys…
to a Business Key based model
It's a LOOKUP table that translates the technical key to the Business Key
Example
NSR-Station: [valuation] Source NSS2 table p; [description] Source NSS1 table x; [adres] Source NSS1 table y
BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables
Example-data
BA-NSS1 (key lookup for NSS1 source tables):
Business Key | NSS1_Surrogate_key
Ut | 123522
Asd | 666323
Asa | 222443
… | …
NSR-Station:
Business Key | META_Source | META_Load_dts
Ut | NSS1_y | 5-6-2015 22:00:00
Asd | NSS1_y | 5-6-2015 22:00:00
Asa | NSS2_p | 6-6-2015 22:00:00
… | … | …
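A Business Alias is in essence a small lookup from a source surrogate key to a business key, which fits naturally in memory. The Python sketch below uses the BA-NSS1 example values. The function name `resolve_business_keys` and the source column name `station_id` are hypothetical.

```python
# BA-NSS1 from the example: NSS1 surrogate key -> business key.
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def resolve_business_keys(rows, alias, surrogate_column, business_key_column):
    """Replace the source surrogate key in each row by its business key."""
    resolved = []
    for row in rows:
        surrogate = row[surrogate_column]
        if surrogate not in alias:
            raise KeyError(f"no business key registered for surrogate key {surrogate!r}")
        translated = {k: v for k, v in row.items() if k != surrogate_column}
        translated[business_key_column] = alias[surrogate]
        resolved.append(translated)
    return resolved


source_rows = [{"station_id": 123522, "postcode": "3500GJ"}]
keyed_rows = resolve_business_keys(source_rows, BA_NSS1, "station_id", "BK_NSR-Station")
```

Failing loudly on an unknown surrogate key is a design choice of this sketch; a loader could also park such rows until the alias has been generated.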
Could be a
Data LAKE
VirtualisedEnsemble
Tier
EDW
Time Variant
amp
Non Volatile
Subject Oriented
amp
Integrated
EDW
rwerschkull
nllinkedincominrogierwerschkull
How
Does PS-3C Work
rwerschkull
nllinkedincominrogierwerschkullPhoto credit Public Domain
StagingArea
EDWInformation
Marts
Focus of Current ensemble EDWrsquos
rwerschkull
nllinkedincominrogierwerschkull
Persistent StagingArea HSA =
Data LibraryEDW
Information Marts
Splitting the work
rwerschkull
nllinkedincominrogierwerschkull
Persitent
Staging
-
Concept
Context
Connector
Business
Concept
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
Identify source event stream Primary or Unique KeyUse source metadata for this
Automate the building of a PS lsquoaround this keyrsquo Take all columns
Historize using SCD-2 approach
Persistent Staging - how
rwerschkull
nllinkedincominrogierwerschkull
Entity levelUnique key
Functional Description
Delivering party
Owner Responsible
MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)
hellip
Persistent Staging Metadata-1
rwerschkull
nllinkedincominrogierwerschkull
Column level [Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR delete as new record
[Source system] on table file level (lowest possible)
Load End Date Timestamp possible but difficulthellipRequires updates
Persistent Staging Metadata-2
rwerschkull
nllinkedincominrogierwerschkull
ACID is possible in HIVE
ACID Makes Updates possibleBy registering updates as lsquonew datarsquo
Reconciliation compacting when idle at user command
Use ORC files
PLUS changing the HIVE configurationhellip
UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)
rwerschkull
nllinkedincominrogierwerschkull
HivePut semi structured data = variable columns in MAP data type
OR use Data storage type that supports schema-evolutionAVRO (ORC in development)
Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo
Schema can be different for every row
What about SEMI-STRUCTURED Data
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
3C - how
rwerschkull
nllinkedincominrogierwerschkull
Always starts with Conceptual data modeling
NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic
No Link Satellites
No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo
Explicit Helper entities
Like Data Vault(2) BUT
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
a UNIQUE Domain specific point of integration
hellipa business entity
hellipwithin itrsquos own domain
hellipdoes not necessarily need to be Enterprise Wide
Business Concept (BC)
rwerschkull
nllinkedincominrogierwerschkull
Why not lsquoenterprise widersquoCompany
Customer
Sales Customer
International
Sales Customer
Local
Sales Customer
Marketing Customer
Customer hellip
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description]
[Owner Responsible]
Column level [Load Date Timestamp]
[Source system] on table file level (lowest possible)
Business Concept Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example-Data
NSR-Station
NS-
Travelcard
NS-
Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
hellip
Business
Key
META_Source META_Load_dts
Ut NSS1_y 5-6-2015 220000
Asd NSS1_y 5-6-2015 220000
Asa NSS2_p 6-6-2015 220000
hellip hellip hellip
NS-
Traveller
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
hellip
Business Key
CRM-RW123456
CRM-LAS224466
hellip
rwerschkull
nllinkedincominrogierwerschkull
Most easy entity to be virtualised(if performance allows)
No Hashing amp No surrogateBUSINESSKEYS
(not by default at least)
BC important notes
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Containts Context about a Concept In a historical way
hellipLike a Data Vault Satellite
Every CC belongs to only one BC
Seperate entity per source system table stream
Concept Context (CC)
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description] [Owner Responsible]
Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)
Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record
Concept Context Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example
NSR-StationNS-
Travelcard
NS-
TrainseriesNS-
Traveller
[valuation]Source NSS2 table p
[description]SourceNSS1 table x
[adres]Source NSS1 table y
[description]Source NTR table q
[ovchip_
personal]Source NSR table r
[ovchip_
on-usageSource NSR table s
[personal_
details]Source NSR table t
[adres_
data]Source NSR table t rwerschkull
nllinkedincominrogierwerschkull
Example-dataBK_NSR-
Station
Postadres_
postcode
GPS hellip META_
source
META_Load_dts META_Load_end_dts META_Deleted_Ind
Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0
Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0
Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0
Asa hellip hellip hellip hellip hellip hellip
NSR-Station
[adres]Source NSS1 table y
CC important notes
More difficult to virtualise: depends on the semantic gap with the source
But do make it virtual when 'streaming data' is necessary
Because we have the PS layer, exposing all columns is not necessary
Refactoring is easier...
Connector (C)
Relations between Concepts + Context, in a historical way
...a merger of the Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time
Connector Driving key
Explicitly defining a driving key as metadata...
Gives business understanding
Makes it possible for the Connector to correctly handle delta data deliveries...
• so that a change (on the driving key) is not registered as a new 'connection'
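The driving-key idea can be sketched in Python as follows. This is an illustration under assumptions, not the author's loader: the function apply_delta, the column names and the two delivered rows are hypothetical, and the end-dating convention (one second before the new load timestamp) is borrowed from the example tables in this deck.

```python
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31)

def apply_delta(history, delivered, driving_key, load_dts):
    """Apply one delivered row to a Connector history (list of dicts, changed in place)."""
    wanted = tuple(delivered[col] for col in driving_key)
    for row in history:
        is_open = row["load_end_dts"] == END_OF_TIME
        same_connection = tuple(row[col] for col in driving_key) == wanted
        if is_open and same_connection:
            if all(row[col] == value for col, value in delivered.items()):
                return history  # identical redelivery: nothing to register
            # Same connection with changed content: close the old version.
            row["load_end_dts"] = load_dts - timedelta(seconds=1)
            break
    history.append({**delivered, "load_dts": load_dts, "load_end_dts": END_OF_TIME})
    return history

DRIVING_KEY = ("travelcard", "checkin")  # NS-Travelcard + Checkin timestamp

history = []
checked_in = {"travelcard": "3528 0234 2073 1234", "checkin": "5-4-2016 8:49:32",
              "station_from": "Asd", "station_to": None}
apply_delta(history, checked_in, DRIVING_KEY, datetime(2016, 4, 5, 22))

# Hypothetical later delivery: same driving key, destination now known.
checked_out = {**checked_in, "station_to": "Ut"}
apply_delta(history, checked_out, DRIVING_KEY, datetime(2016, 4, 6, 22))

open_rows = [row for row in history if row["load_end_dts"] == END_OF_TIME]
print(len(history), len(open_rows))
```

Because both deliveries share the same driving key, the second one closes the first version instead of creating a second open connection.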
Connector Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag], OR register a delete as a new record
Example
Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller (with the same Concept Contexts as in the Concept Context example)
Connector: NSR-Travelmovement, with Checkin timestamp and the roles 'from' and 'to'
Driving Key: NS-Travelcard + Checkin timestamp
Example-Data
Connector NSR-Travelmovement (between NSR-Station, NS-Travelcard and NS-Trainseries):
BK_NSR-Station-from | BK_NSR-Station-to | BK_NS_Trainseries | BK_NS-Travelcard | Checkin timestamp | Checkout timestamp | META_Load_dts | META_Load_end_dts
Asd | Ut | IC 855 | 3528 0234 2073 1234 | 5-4-2016 8:49:32 | 5-4-2016 9:40:12 | 6-4-2016 22:00:00 | 31-12-9999 00:00:00
Ut | Asd | IC 855 | 3528 0234 2073 1234 | 5-4-2016 18:10:09 | 5-4-2016 18:55:20 | 6-4-2016 22:00:00 | 31-12-9999 00:00:00
Are we there yet? No
We add two mandatory HELPER entities
Business Alias
Helps switching from sources that are tied together by technical (surrogate) keys...
...to a Business Key based model
It's a LOOKUP table that translates the technical key to the Business Key
Example
Business Concept NSR-Station, with Concept Contexts [valuation] (Source NSS2 table p), [description] (Source NSS1 table x) and [adres] (Source NSS1 table y)
BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables
Example-data
BA-NSS1 (key lookup for NSS1 source tables):
Business Key | NSS1_Surrogate_key
Ut | 123522
Asd | 666323
Asa | 222443
... | ...
NSR-Station:
Business Key | META_Source | META_Load_dts
Ut | NSS1_y | 5-6-2015 22:00:00
Asd | NSS1_y | 5-6-2015 22:00:00
Asa | NSS2_p | 6-6-2015 22:00:00
... | ... | ...
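As a rough Python illustration of the lookup a Business Alias provides, the sketch below translates a surrogate key from source NSS1 into the business key. The mapping values come from the BA-NSS1 example; the function to_business_key, the column name station_id and the sample source row are hypothetical.

```python
# Business Alias for source NSS1: surrogate key -> business key
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}

def to_business_key(alias, source_row, surrogate_column):
    """Return a copy of source_row that carries the business key instead of the surrogate key."""
    row = dict(source_row)
    surrogate = row.pop(surrogate_column)
    row["business_key"] = alias[surrogate]  # raises KeyError for an unknown surrogate key
    return row

source_row = {"station_id": 666323, "postcode": "1012 AB"}  # hypothetical NSS1 record
translated = to_business_key(BA_NSS1, source_row, "station_id")
print(translated)
```

Keeping the alias as a small in-memory mapping is in line with the wish, stated in this deck, to keep the lookup table small.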
BA important details
Has a 1 to (0..1) relation with a Business Concept
More difficult to virtualise; the lookup table should be kept small
Therefore DO NOT do the key lookup in the Concept Context entity
Load / generate together with the BC
Preferably 'in memory' somehow...
BC-Timeline
Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
Like a Data Vault Point-in-time construct
But mandatory
And with a clearly defined and performant approach
Example
Business Concept NSR-Station, with Concept Contexts [valuation] (Source NSS2 table p) and [adres] (Source NSS1 table y)
BCT modelling
Concept Context [valuation]:
BK_NSR-Station | WOZ value | Value rating agency X | META_Load_dts | META_Load_end_dts
Ut | 20 million | 18 million | 1-1-2014 22:00:00 | 1-1-2015 21:59:59
Ut | 22 million | 18 million | 1-1-2015 22:00:00 | 1-3-2016 21:59:59
Ut | 22 million | 23 million | 1-3-2016 22:00:00 | 31-12-9999 00:00:00
BC-Timeline:
BK_NSR-Station | Combined_Load_dts | Combined_Load_end_dts
Ut | 5-6-2013 22:00:00 | 1-1-2014 21:59:59
Ut | 1-1-2014 22:00:00 | 1-1-2015 21:59:59
Ut | 1-1-2015 22:00:00 | 4-7-2015 21:59:59
Ut | 4-7-2015 22:00:00 | 1-3-2016 21:59:59
Ut | 1-3-2016 22:00:00 | 31-12-9999 00:00:00
Asd | 5-6-2013 22:00:00 | 31-12-9999 00:00:00
Concept Context [adres]:
BK_NSR-Station | Postadres_postcode | GPS | ... | META_source | META_Load_dts | META_Load_end_dts
Ut | 3500GJ | 5208954 511064 | ... | NSS1_y | 5-6-2013 22:00:00 | 4-7-2015 21:59:59
Ut | 3511 CE | 5237269 489299 | ... | NSS1_y | 4-7-2015 22:00:00 | 31-12-9999 00:00:00
Asd | 1012 AB | 5237269 489299 | ... | NSS1_y | 5-6-2013 22:00:00 | 31-12-9999 00:00:00
Asa | ... | ... | ... | ... | ... | ...
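The combined timeline in this example can be reproduced with a short Python sketch. It is one possible approach, not the 'clearly defined and performant approach' the deck refers to: the function bc_timeline is hypothetical, only load timestamps are used as input, and deletes are ignored.

```python
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31)

def ts(text):
    """Parse the day-month-year timestamps used in the example data."""
    return datetime.strptime(text, "%d-%m-%Y %H:%M:%S")

def bc_timeline(*context_histories):
    """Merge the load timestamps of several Concept Contexts into one
    combined validity timeline per business key."""
    starts = {}
    for history in context_histories:
        for business_key, load_dts in history:
            starts.setdefault(business_key, set()).add(load_dts)
    timeline = []
    for business_key in sorted(starts):
        points = sorted(starts[business_key])
        for i, start in enumerate(points):
            # A slice ends one second before the next one starts.
            end = points[i + 1] - timedelta(seconds=1) if i + 1 < len(points) else END_OF_TIME
            timeline.append((business_key, start, end))
    return timeline

valuation = [("Ut", ts("1-1-2014 22:00:00")), ("Ut", ts("1-1-2015 22:00:00")),
             ("Ut", ts("1-3-2016 22:00:00"))]
adres = [("Ut", ts("5-6-2013 22:00:00")), ("Ut", ts("4-7-2015 22:00:00")),
         ("Asd", ts("5-6-2013 22:00:00"))]

timeline = bc_timeline(valuation, adres)
for row in timeline:
    print(row)
```

Every distinct load timestamp of any Concept Context opens a new slice for that business key, which gives the five 'Ut' rows and the single 'Asd' row of the example.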
What makes PS-3C a different Ensemble?
1) Explicitly splitting the work
Data + History
Subjects + Integration
2) NO HASHED BUSINESS KEYS
or surrogate keys
Only concatenated ones
httpwwwcannabisculturecomfilesimages6hashbrickJPG
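A small Python sketch of what a concatenated business key could look like. The '|' delimiter follows the train series keys shown in the example data of this deck (such as IC|855); the function concatenated_key and its delimiter check are assumptions added for illustration.

```python
def concatenated_key(*parts, delimiter="|"):
    """Join the natural key parts into one readable business key."""
    cleaned = [str(part).strip() for part in parts]
    if any(delimiter in part for part in cleaned):
        # A delimiter inside a part would make two different keys look the same.
        raise ValueError("a key part contains the delimiter")
    return delimiter.join(cleaned)

key = concatenated_key("IC", 855)
print(key)
```

Unlike a hash or a surrogate, the resulting key stays readable and can be rebuilt from the source data alone.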
3) Fewer joins
Relation
+ Technical validity timeline
+ Relation context
Together in one entity
Photo credit: Public Domain
4) Explicit HELPER entities
Business Alias
Business Concept Timeline
+ explicitly define Driving key(s)
Photo credit: Public Domain
Hope to AVOID this...
http://xkcd.com/927
PS-3C: A new PROPOSED ensemble modelling technique
Help needed
Questions?
About Me
'Head of BI', Spil Games
Certified Data Vault modeler since 2009
Contact details: nl.linkedin.com/in/rogierwerschkull
rogierwerschkull@gmail.com
rwerschkull
How does PS-3C work?
Photo credit: Public Domain
Focus of current ensemble EDWs
Staging Area, EDW, Information Marts
Splitting the work
Persistent Staging Area / HSA = Data Library
EDW
Information Marts
Persistent Staging - Concept Context, Connector, Business Concept
Persistent Staging - how
Identify the Primary or Unique Key of the source / event stream. Use source metadata for this
Automate the building of a PS 'around this key'. Take all columns
Historize using an SCD-2 approach
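The SCD-2 style historisation around a key can be sketched in Python as below. This is a simplified illustration under assumptions: the function historize is hypothetical, each delivery is taken to be a full snapshot with at most one record per key, and deletes are not handled. The sample values reuse the station address example from this deck.

```python
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31)

def historize(history, snapshot, key, load_dts):
    """Merge a source snapshot into an SCD-2 style history (list of dicts, changed in place)."""
    current = {row[key]: row for row in history if row["load_end_dts"] == END_OF_TIME}
    for record in snapshot:
        open_row = current.get(record[key])
        if open_row is not None:
            if all(open_row[col] == value for col, value in record.items()):
                continue  # unchanged: keep the open version
            # Changed: end-date the open version one second before the new load.
            open_row["load_end_dts"] = load_dts - timedelta(seconds=1)
        history.append({**record, "load_dts": load_dts, "load_end_dts": END_OF_TIME})
    return history

history = []
historize(history,
          [{"station": "Ut", "postcode": "3500GJ"}, {"station": "Asd", "postcode": "1012 AB"}],
          key="station", load_dts=datetime(2013, 6, 5, 22))
historize(history,
          [{"station": "Ut", "postcode": "3511 CE"}, {"station": "Asd", "postcode": "1012 AB"}],
          key="station", load_dts=datetime(2015, 6, 5, 22))

for row in history:
    print(row["station"], row["postcode"], row["load_dts"], row["load_end_dts"])
```

Because all columns are taken and compared, no column-level modelling decision is needed at this stage; only the key has to be known.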
Persistent Staging Metadata-1
Entity level:
Unique key
Functional Description
Delivering party
Owner / Responsible
MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation...)
...
Persistent Staging Metadata-2
Column level:
[Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR delete as new record
[Source system] on table / file level (lowest possible)
Load End Date Timestamp: possible but difficult... requires updates
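One way to sidestep the update requirement, offered here only as an assumption and not as something the slides prescribe, is to keep the rows insert-only and derive the Load End Date Timestamp when reading. A hypothetical Python sketch, with rows laid out as (key, load_dts, payload):

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter

END_OF_TIME = datetime(9999, 12, 31)

def with_end_dates(rows):
    """Derive load_end_dts for insert-only rows of (key, load_dts, payload)."""
    result = []
    ordered = sorted(rows, key=itemgetter(0, 1))
    for _, group in groupby(ordered, key=itemgetter(0)):
        versions = list(group)
        for current, following in zip(versions, versions[1:] + [None]):
            # The end date is one second before the next version's load date.
            end = END_OF_TIME if following is None else following[1] - timedelta(seconds=1)
            result.append((*current, end))
    return result

rows = [
    ("Ut", datetime(2015, 6, 5, 22), "3511 CE"),
    ("Ut", datetime(2013, 6, 5, 22), "3500GJ"),
    ("Asd", datetime(2013, 6, 5, 22), "1012 AB"),
]
derived = with_end_dates(rows)
for row in derived:
    print(row)
```

The trade-off is that every read pays for the sort, so whether this is acceptable depends on the volumes involved.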
UPDATES IN HIVE (isn't HDFS APPEND ONLY?)
ACID is possible in HIVE
ACID makes updates possible by registering updates as 'new data'
Reconciliation / compacting when idle or at user command
Use ORC files
PLUS changing the HIVE configuration...
What about SEMI-STRUCTURED data?
Hive: put semi-structured data (= variable columns) in a MAP data type
OR use a data storage type that supports schema evolution: AVRO (ORC in development)
Or HBASE... It only has one data type (byte); the schema is 'applied'
Schema can be different for every row
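The MAP option can be pictured with a small Python sketch that splits an incoming record into fixed columns and one map holding everything else. The column names, the sample event and the function split_record are hypothetical; values are cast to strings on the assumption that the map has a single value type.

```python
FIXED_COLUMNS = ("event_id", "event_ts")  # hypothetical fixed part of the schema

def split_record(record):
    """Separate the fixed columns from the variable ones, which go into one map."""
    fixed = {name: record.get(name) for name in FIXED_COLUMNS}
    attributes = {k: str(v) for k, v in record.items() if k not in FIXED_COLUMNS}
    return {**fixed, "attributes": attributes}

event = {"event_id": 1, "event_ts": "2016-04-05", "device": "gate-7", "zone": 3}
stored = split_record(event)
print(stored)
```

New attributes in the source then land in the map without a change to the table definition.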
3C - how
Like Data Vault (2), BUT
Always starts with conceptual data modelling
NOT the primary location of data & history: virtualised (only if performance allows), should be deterministic
No Link Satellites
No surrogate or hash keys, only 'Concatenated Natural Business Keys'
Explicit helper entities
Business Concept (BC)
A UNIQUE, domain-specific point of integration
...a business entity
...within its own domain
...does not necessarily need to be enterprise wide
Why not 'enterprise wide'?
Company Customer
Sales Customer
International Sales Customer
Local Sales Customer
Marketing Customer
Customer ...
Business Concept Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Example-Data
NSR-Station:
Business Key | META_Source | META_Load_dts
Ut | NSS1_y | 5-6-2015 22:00:00
Asd | NSS1_y | 5-6-2015 22:00:00
Asa | NSS2_p | 6-6-2015 22:00:00
... | ... | ...
NS-Travelcard, Business Key:
3528 0234 2073 1234
3528 0234 2073 5678
...
NS-Trainseries, Business Key:
IC|855
IC|8852
Sp|7455
St|16050
...
NS-Traveller, Business Key:
CRM-RW123456
CRM-LAS224466
...
Persistent StagingArea HSA =
Data LibraryEDW
Information Marts
Splitting the work
rwerschkull
nllinkedincominrogierwerschkull
Persitent
Staging
-
Concept
Context
Connector
Business
Concept
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
Identify source event stream Primary or Unique KeyUse source metadata for this
Automate the building of a PS lsquoaround this keyrsquo Take all columns
Historize using SCD-2 approach
Persistent Staging - how
rwerschkull
nllinkedincominrogierwerschkull
Entity levelUnique key
Functional Description
Delivering party
Owner Responsible
MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)
hellip
Persistent Staging Metadata-1
rwerschkull
nllinkedincominrogierwerschkull
Column level [Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR delete as new record
[Source system] on table file level (lowest possible)
Load End Date Timestamp possible but difficulthellipRequires updates
Persistent Staging Metadata-2
rwerschkull
nllinkedincominrogierwerschkull
ACID is possible in HIVE
ACID Makes Updates possibleBy registering updates as lsquonew datarsquo
Reconciliation compacting when idle at user command
Use ORC files
PLUS changing the HIVE configurationhellip
UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)
rwerschkull
nllinkedincominrogierwerschkull
What about SEMI-STRUCTURED data?
Hive: put semi-structured data (= variable columns) in a MAP data type
OR use a data storage type that supports schema evolution: AVRO (ORC in development)
Or HBase… It only has one data type (byte); the schema is 'applied'
Schema can be different for every row
modelling
3C - how
Like Data Vault (2), BUT:
Always starts with conceptual data modelling
NOT the primary location of data & history
Virtualised (only if performance allows); should be deterministic
No Link Satellites
No surrogate or hash keys, only 'Concatenated Natural Business Keys'
Explicit helper entities
3C - Details
[Diagram: Business Concept X with Concept Context X-A and X-B; Business Concept Y with Concept Context Y-A, Y-B and Y-C; Connector; Business Alias A; Business Alias B; BC-Timeline X]
Business Concept (BC)
A UNIQUE, domain-specific point of integration
…a business entity
…within its own domain
…does not necessarily need to be enterprise wide
Why not 'enterprise wide'?
[Diagram: Company Customer; Sales Customer; International Sales Customer; Local Sales Customer; Marketing Customer; … Customer]
Business Concept Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Example data

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
…
BC important notes
Easiest entity to virtualise (if performance allows)
No hashing & no surrogate BUSINESS KEYS (not by default at least)
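A small sketch of what a 'concatenated natural business key' could look like in code. The '|' separator follows the NS-Trainseries example keys (IC|855); the function name business_key and the rule that a key part may not be empty or contain the separator are assumptions added here to keep the concatenation unambiguous.

```python
def business_key(*parts, sep="|"):
    """Concatenate natural key parts into one business key, without hashing."""
    cleaned = [str(part).strip() for part in parts]
    for part in cleaned:
        # Assumption: reject parts that would make the key ambiguous.
        if part == "" or sep in part:
            raise ValueError(f"unusable business key part: {part!r}")
    return sep.join(cleaned)


# Hypothetical source columns: train type and series number.
print(business_key("IC", 855))
print(business_key("St", 16050))
```

Because the key is a plain concatenation it stays readable and deterministic, so the same source values always give the same key without a lookup.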
3C - Details
Concept Context (CC)
Contains context about a Concept, in a historical way
…like a Data Vault Satellite
Every CC belongs to only one BC
Separate entity per source system table / stream
Concept Context Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
Example
[Diagram: Business Concepts with their Concept Contexts]
Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller
Concept Contexts:
[valuation] (source NSS2 table p)
[description] (source NSS1 table x)
[adres] (source NSS1 table y)
[description] (source NTR table q)
[ovchip_personal] (source NSR table r)
[ovchip_on-usage] (source NSR table s)
[personal_details] (source NSR table t)
[adres_data] (source NSR table t)
Example data
NSR-Station, Concept Context [adres] (source NSS1 table y)
BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts     META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59     0
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00   0
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00   0
Asa             …                   …                 …  …            …                  …
CC important notes
More difficult to virtualise: depends on the semantic gap with the source
But do make it virtual when 'streaming data' is necessary
Because we have the PS layer, exposing all columns is not necessary
Refactoring is easier…
3C - Details
Connector (C)
Relations between Concepts + context, in a historical way
…a merger of Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point in time
Connector Driving key
Explicitly defining a driving key as metadata…
Gives business understanding
Makes it possible for the Connector to correctly handle delta data deliveries…
• so that a change (on the driving key)
• is not registered as a new 'connection'
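A sketch of how a driving key could steer the loading of a Connector from delta deliveries. This is an illustration under assumptions, not the author's loader: load_connector and open_connections are hypothetical names, the rows are plain dicts, and the scenario (a check-in delivered first, the destination station delivered a day later) is invented for the example. The driving key itself, NS-Travelcard + Checkin timestamp, comes from the NSR-Travelmovement example in the deck.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # open-ended validity, as in the example data


def load_connector(rows, delta, driving_key, load_dts):
    """Apply a delta delivery to a Connector, versioning around the driving key."""
    # Index the currently valid row per driving-key value.
    open_rows = {
        tuple(row[k] for k in driving_key): row
        for row in rows
        if row["META_Load_end_dts"] == OPEN_END
    }
    for rec in delta:
        dk = tuple(rec[k] for k in driving_key)
        old = open_rows.get(dk)
        if old is not None:
            if all(old.get(col) == val for col, val in rec.items()):
                continue  # nothing changed for this connection
            # Same connection, changed content: end-date the old version
            # instead of leaving two open 'connections' side by side.
            old["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
        new = dict(rec, META_Load_dts=load_dts, META_Load_end_dts=OPEN_END)
        rows.append(new)
        open_rows[dk] = new
    return rows


def open_connections(rows):
    """Return the rows that are valid right now."""
    return [row for row in rows if row["META_Load_end_dts"] == OPEN_END]


DRIVING_KEY = ("BK_NS-Travelcard", "Checkin_timestamp")
checkin = datetime(2016, 4, 5, 8, 49, 32)
movement = {"BK_NS-Travelcard": "3528 0234 2073 1234", "Checkin_timestamp": checkin,
            "BK_NSR-Station-from": "Asd", "BK_NSR-Station-to": None}
rows = load_connector([], [movement], DRIVING_KEY, datetime(2016, 4, 5, 22))
# A later delivery fills in the destination for the same travelcard + check-in.
rows = load_connector(rows, [dict(movement, **{"BK_NSR-Station-to": "Ut"})],
                      DRIVING_KEY, datetime(2016, 4, 6, 22))
print(len(rows), len(open_connections(rows)))
```

Without the driving key, the second delivery would look like a different combination of keys and would be stored as a second, equally valid travel movement.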
Connector Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
Example
[Diagram: the Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts, plus the Connector NSR-Travelmovement (Checkin timestamp, from, to)]
Driving Key: NS-Travelcard + Checkin timestamp
Example data
Connector NSR-Travelmovement (Checkin timestamp, from, to), connecting NSR-Station, NS-Travelcard and NS-Trainseries
BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00
Are we there yet? No.
We add two mandatory HELPER entities
modelling
3C - Details
Business Alias
Helps switching from sources that are tied together by technical (surrogate) keys…
…to a Business Key based model
It's a LOOKUP table that translates the technical key to the Business Key
Example
NSR-Station with Concept Contexts [valuation] (source NSS2 table p), [description] (source NSS1 table x) and [adres] (source NSS1 table y)
BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables
Example data

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

BA-NSS1 (key lookup for NSS1 source tables)
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …
BA important details
Has a 1-to-(0..1) relation with a Business Concept
More difficult to virtualise: the lookup table should be kept small
Therefore DO NOT do the key lookup in the Concept Context entity
Load / generate together with the BC
Preferably 'in memory' somehow…
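A sketch of a Business Alias used as an in-memory lookup, under the assumption that a plain dictionary is an acceptable stand-in for 'in memory somehow'. The function names build_alias and to_business_keys and the source column names are hypothetical; the surrogate-to-business-key pairs are the BA-NSS1 example values.

```python
def build_alias(pairs):
    """Build a surrogate-key -> business-key lookup from (surrogate, business) pairs."""
    alias = {}
    for surrogate, business in pairs:
        # A surrogate key may translate to only one business key.
        if alias.setdefault(surrogate, business) != business:
            raise ValueError(f"surrogate key {surrogate!r} maps to two business keys")
    return alias


def to_business_keys(rows, alias, surrogate_col, business_col):
    """Swap the technical key of each source row for its business key."""
    translated = []
    for row in rows:
        rec = {col: val for col, val in row.items() if col != surrogate_col}
        rec[business_col] = alias[row[surrogate_col]]
        translated.append(rec)
    return translated


ba_nss1 = build_alias([(123522, "Ut"), (666323, "Asd"), (222443, "Asa")])
source_rows = [{"NSS1_Surrogate_key": 666323, "Postadres_postcode": "1012 AB"}]
print(to_business_keys(source_rows, ba_nss1, "NSS1_Surrogate_key", "BK_NSR-Station"))
```

Keeping the translation in one small lookup per source, rather than repeating it inside every Concept Context, is what allows the lookup to stay small enough to hold in memory.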
3C - Details
BC-Timeline
Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
Like a Data Vault Point-in-Time construct
But mandatory
And with a clearly defined and performant approach
Example
NSR-Station with Concept Contexts [valuation] (source NSS2 table p) and [adres] (source NSS1 table y)

[valuation]
BK_NSR-Station  WOZ value  Value rating agency X  META_Laad_dts      META_Laad_eind_dts
Ut              20 mln     18 mln                 1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 mln     18 mln                 1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 mln     23 mln                 1-3-2016 22:00:00  31-12-9999 00:00:00

[adres]
BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                 …  …            …                  …

BCT
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
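A sketch of one way to derive the BC-Timeline: collect every load timestamp of every Concept Context per business key, sort them, and let each interval end one second before the next one starts. This is an assumed construction that happens to match the example's combined intervals; the deck does not spell out its 'clearly defined and performant approach', and the function name bc_timeline is hypothetical.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # open-ended validity, as in the example tables


def bc_timeline(*contexts):
    """Combine Concept Context timelines into one timeline per business key.

    Each context is a list of (business_key, load_dts) pairs.
    Returns (business_key, combined_load_dts, combined_load_end_dts) tuples.
    """
    starts = {}
    for context in contexts:
        for business_key, load_dts in context:
            starts.setdefault(business_key, set()).add(load_dts)
    timeline = []
    for business_key, stamps in starts.items():
        ordered = sorted(stamps)
        for i, start in enumerate(ordered):
            if i + 1 < len(ordered):
                end = ordered[i + 1] - timedelta(seconds=1)
            else:
                end = OPEN_END
            timeline.append((business_key, start, end))
    return timeline


def at(day, month, year):
    """Helper for the example: loads happen at 22:00:00."""
    return datetime(year, month, day, 22)


adres = [("Ut", at(5, 6, 2013)), ("Ut", at(4, 7, 2015)), ("Asd", at(5, 6, 2013))]
valuation = [("Ut", at(1, 1, 2014)), ("Ut", at(1, 1, 2015)), ("Ut", at(1, 3, 2016))]
for row in bc_timeline(adres, valuation):
    print(row)
```

Each combined interval then points at exactly one version in every Concept Context, which is what makes a point-in-time query a simple range join.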
modelling
What makes PS-3C a different Ensemble?
1) Explicitly splitting the work
Data + History
Subjects + Integration
2) NO HASHED BUSINESS KEYS or surrogate keys
Only concatenated ones
httpwwwcannabisculturecomfilesimages6hashbrickJPG
3) Fewer joins
Relation + technical validity timeline + relation context, together in one entity
Photo credit: Public Domain
4) Explicit HELPER entities
Business Alias
Business Concept Timeline
+ explicitly define driving key(s)
Photo credit: Public Domain
Hope to AVOID this… (http://xkcd.com/927)
Persistent Staging - how
Identify the Primary or Unique Key of the source / event stream. Use source metadata for this.
Automate the building of a PS 'around this key'. Take all columns.
Historize using an SCD-2 approach.
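The SCD-2 step above can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the names `PsRow`, `historize` and `ps_table` are hypothetical, and closing a version one second before the next one starts is an assumption taken from the 21:59:59 / 22:00:00 pattern in the example data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # open-ended end date, as in the example data


@dataclass
class PsRow:
    key: str            # value of the source primary / unique key
    columns: dict       # all source columns, taken as delivered
    load_dts: datetime
    load_end_dts: datetime = HIGH_DATE


def historize(ps_table: list, key: str, columns: dict, load_dts: datetime) -> bool:
    """Apply one delivered source record; return True if a new version was written."""
    current = next(
        (r for r in ps_table if r.key == key and r.load_end_dts == HIGH_DATE),
        None,
    )
    if current is not None:
        if current.columns == columns:
            return False  # nothing changed, keep the open version
        # Close the open version one second before the new one starts.
        current.load_end_dts = load_dts - timedelta(seconds=1)
    ps_table.append(PsRow(key, dict(columns), load_dts))
    return True


ps_table = []
historize(ps_table, "Ut", {"postcode": "3500GJ"}, datetime(2013, 6, 5, 22))
historize(ps_table, "Ut", {"postcode": "3500GJ"}, datetime(2014, 6, 5, 22))  # unchanged
historize(ps_table, "Ut", {"postcode": "3511 CE"}, datetime(2015, 6, 5, 22))
for row in ps_table:
    print(row.key, row.columns, row.load_dts, row.load_end_dts)
```

Note that closing the previous version is an in-place update, which is exactly the part that is hard on append-only storage.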
Persistent Staging Metadata-1
Entity level:
Unique key
Functional description
Delivering party
Owner / Responsible
MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
…
Persistent Staging Metadata-2
Column level:
[Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR register a delete as a new record
[Source system] on table / file level (lowest possible)
Load End Date Timestamp: possible but difficult… Requires updates
Updates in Hive (isn't HDFS append only?)
ACID is possible in Hive
ACID makes updates possible by registering updates as 'new data'
Reconciliation / compacting happens when idle or at user command
Use ORC files
PLUS changing the Hive configuration…
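The idea of 'updates as new data' plus later compaction can be illustrated with a small in-memory sketch. This is not Hive's API or file layout; `append_update`, `read_current` and `compact` are hypothetical names used only to show the principle.

```python
def append_update(delta_log: list, key: str, row: dict) -> None:
    """Register an update as new data: nothing is changed in place."""
    delta_log.append((key, dict(row)))


def read_current(base: dict, delta_log: list) -> dict:
    """Reconcile at read time: later deltas win over the base, per key."""
    view = dict(base)
    for key, row in delta_log:
        view[key] = row
    return view


def compact(base: dict, delta_log: list) -> tuple:
    """Fold all deltas into a new base and start again with an empty delta log."""
    return read_current(base, delta_log), []


base = {"Ut": {"postcode": "3500GJ"}, "Asd": {"postcode": "1012 AB"}}
deltas = []
append_update(deltas, "Ut", {"postcode": "3511 CE"})

current = read_current(base, deltas)   # readers see the update immediately
base, deltas = compact(base, deltas)   # done when idle or on request
print(current["Ut"], len(deltas))
```

Until compaction runs, every read has to merge the base with the delta log, which is why compacting when the system is idle matters for performance.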
What about semi-structured data?
Hive: put semi-structured data (= variable columns) in a MAP data type
OR use a data storage type that supports schema evolution: AVRO (ORC in development)
Or HBase… It only has one data type (byte); the schema is 'applied'
The schema can be different for every row
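A minimal sketch of the MAP idea: keep the stable columns as real columns and push everything that varies per record into one key/value column. The column names `event_id` and `event_ts` and the function `to_row` are hypothetical, and casting the variable values to strings is an assumption (it mimics a string-to-string map).

```python
FIXED_COLUMNS = ("event_id", "event_ts")  # hypothetical stable columns


def to_row(record: dict) -> dict:
    """Split a semi-structured record into fixed columns plus one map column."""
    fixed = {name: record.get(name) for name in FIXED_COLUMNS}
    variable = {k: str(v) for k, v in record.items() if k not in FIXED_COLUMNS}
    return {**fixed, "attributes": variable}


row_a = to_row({"event_id": 1, "event_ts": "2016-04-05", "level": 3})
row_b = to_row({"event_id": 2, "event_ts": "2016-04-05", "coupon": "X1", "price": 9.5})
print(row_a)
print(row_b)
```

Both rows end up with the same three columns even though the delivered attributes differ per record.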
modelling
3C - how
Like Data Vault (2), BUT:
Always starts with conceptual data modelling
NOT the primary location of Data & History. Virtualised (only if performance allows). Should be deterministic
No Link Satellites
No Surrogate or Hash Keys, only 'Concatenated Natural Business Keys'
Explicit Helper entities
3C - Details: Business Concept (BC)
[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; Business Alias A and Business Alias B; BC-Timeline X]
A UNIQUE, domain-specific point of integration
…a business entity
…within its own domain
…does not necessarily need to be enterprise wide
Why not 'enterprise wide'?
[Diagram: within one Company several 'Customer' concepts exist side by side: Customer, Sales Customer, International Sales Customer, Local Sales Customer, Marketing Customer, Customer …]
Business Concept Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Example-Data

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
…
BC important notes
Easiest entity to virtualise (if performance allows)
No hashed & no surrogate BUSINESS KEYS
(not by default, at least)
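A concatenated natural business key such as `IC|855` from the example data could be built along these lines. The helper `business_key` is hypothetical; trimming whitespace and rejecting parts that already contain the separator are assumptions added so that two different part lists can never produce the same key.

```python
def business_key(*parts: str, sep: str = "|") -> str:
    """Concatenate natural key parts into one readable business key."""
    cleaned = [part.strip() for part in parts]
    if any(part == "" for part in cleaned):
        raise ValueError("every key part must be filled")
    if any(sep in part for part in cleaned):
        raise ValueError(f"key parts may not contain the separator {sep!r}")
    return sep.join(cleaned)


print(business_key("IC", "855"))
print(business_key("Sp ", " 7455"))
```

Unlike a hash or a surrogate, the resulting key stays readable and can be derived from the source data alone, without a lookup.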
3C - Details: Concept Context (CC)
Contains context about a Concept, in a historical way
…like a Data Vault Satellite
Every CC belongs to only one BC
Separate entity per source system table / stream
Concept Context Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
Example
[Diagram: the Business Concepts with their Concept Contexts]
NSR-Station: [valuation] Source NSS2 table p; [description] Source NSS1 table x; [adres] Source NSS1 table y
NS-Trainseries: [description] Source NTR table q
NS-Travelcard: [ovchip_personal] Source NSR table r; [ovchip_on-usage] Source NSR table s
NS-Traveller: [personal_details] Source NSR table t; [adres_data] Source NSR table t
Example-data
NSR-Station: [adres] Source NSS1 table y
BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                 …  …            …                  …
CC important notes
More difficult to virtualise: depends on the semantic gap with the source
But do make it virtual when 'streaming data' is necessary
Because we have the PS layer, exposing all columns is not necessary
Refactoring is easier…
3C - Details: Connector (C)
Relations between Concepts + Context
In a historical way
…Merger of Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point in time
Connector Driving key
Explicitly defining a driving key as metadata…
Gives business understanding
Makes it possible for a Connector to handle delta data deliveries correctly…
• so that a change (on the driving key) is not registered as a new 'connection'
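How a driving key lets a Connector treat a delta as a change rather than a new connection can be sketched as below. This is an illustration under assumptions: the column names, the function `load_connector` and the one-second end-dating are hypothetical; only the driving key itself (NS-Travelcard + Checkin timestamp) comes from the travel movement example in the deck.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)
DRIVING_KEY = ("travelcard", "checkin_ts")  # NS-Travelcard + Checkin timestamp
META = ("load_dts", "load_end_dts")


def load_connector(rows: list, delta: dict, load_dts: datetime) -> str:
    """Apply one delivered record to the Connector and report what happened."""
    wanted = tuple(delta[col] for col in DRIVING_KEY)
    for row in rows:
        is_open = row["load_end_dts"] == HIGH_DATE
        if is_open and tuple(row[col] for col in DRIVING_KEY) == wanted:
            payload = {k: v for k, v in row.items() if k not in META}
            if payload == delta:
                return "unchanged"
            # Same driving key, different content: close the old version.
            row["load_end_dts"] = load_dts - timedelta(seconds=1)
            rows.append({**delta, "load_dts": load_dts, "load_end_dts": HIGH_DATE})
            return "new version"
    rows.append({**delta, "load_dts": load_dts, "load_end_dts": HIGH_DATE})
    return "new connection"


rows = []
checked_in = {
    "station_from": "Asd", "station_to": None, "trainseries": "IC|855",
    "travelcard": "3528 0234 2073 1234",
    "checkin_ts": datetime(2016, 4, 5, 8, 49, 32), "checkout_ts": None,
}
checked_out = {**checked_in, "station_to": "Ut",
               "checkout_ts": datetime(2016, 4, 5, 9, 40, 12)}

r1 = load_connector(rows, checked_in, datetime(2016, 4, 5, 22))
r2 = load_connector(rows, checked_out, datetime(2016, 4, 6, 22))
r3 = load_connector(rows, checked_out, datetime(2016, 4, 7, 22))
print(r1, r2, r3, len(rows))
```

Without the driving key, the second delivery would differ from the first in two columns and would be stored as a second, unrelated travel movement.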
Connector Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
Example
[Diagram: the Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts, plus the Connector NSR-Travelmovement (Checkin timestamp) with a 'from' and a 'to' relation]
Driving Key: NS-Travelcard + Checkin timestamp
Example-Data
Connector NSR-Travelmovement (between NSR-Station 'from' and 'to', NS-Travelcard and NS-Trainseries)
BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00
Are we there yet? No
We add two mandatory HELPER entities
modelling
3C - Details: Business Alias
To help switching from sources that are tied together by technical (surrogate) keys…
…to a Business Key based model
It's a LOOKUP table that translates the technical key to the Business Key
Example
[Diagram: NSR-Station with its Concept Contexts [valuation] Source NSS2 table p, [description] Source NSS1 table x and [adres] Source NSS1 table y, plus two Business Aliases]
BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables
Example-data

BA-NSS1 (key lookup for NSS1 source tables)
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …
BA important details
Has a 1-to-(0..1) relation with a Business Concept
More difficult to virtualise. The lookup table should be kept small
Therefore DO NOT do the key lookup in the Concept Context entity
Load / generate together with the BC
Preferably 'in memory' somehow…
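A Business Alias used as an in-memory lookup could look roughly like this. The dictionary content is taken from the BA-NSS1 example data; the function `apply_alias` and the source column name `station_id` are hypothetical.

```python
# BA-NSS1: NSS1 surrogate key -> business key (values from the example data)
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def apply_alias(source_row: dict, key_column: str, alias: dict) -> dict:
    """Replace the technical key of a source row with its business key."""
    surrogate = source_row[key_column]
    if surrogate not in alias:
        raise KeyError(f"no business key known for surrogate key {surrogate!r}")
    translated = {k: v for k, v in source_row.items() if k != key_column}
    translated["business_key"] = alias[surrogate]
    return translated


nss1_row = {"station_id": 123522, "postcode": "3500GJ"}
print(apply_alias(nss1_row, "station_id", BA_NSS1))
```

Keeping the alias as a small key-to-key mapping, separate from the wide Concept Context rows, is what makes holding it in memory feasible.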
3C - Details: BC-Timeline
Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
Like a Data Vault Point-in-time construct
But mandatory
And with a clearly defined and performant approach
Example
[Diagram: NSR-Station with two Concept Contexts: [valuation] Source NSS2 table p and [adres] Source NSS1 table y]
Concept Context [valuation]
BK_NSR-Station  WOZ value   Value rating agency X  META_Laad_dts      META_Laad_eind_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

BCT
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00

Concept Context [adres]
BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                 …  …            …                  …
modelling
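The BC-Timeline rows in the example can be derived from the load timestamps of the two Concept Contexts. The sketch below is one possible way to do that, not necessarily the author's approach; `build_timeline` is a hypothetical name, and the input lists only repeat the (business key, load timestamp) pairs from the example tables.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)


def build_timeline(*contexts: list) -> list:
    """Merge the load timestamps of several Concept Contexts per business key."""
    starts = {}
    for context in contexts:
        for business_key, load_dts in context:
            starts.setdefault(business_key, set()).add(load_dts)
    timeline = []
    for business_key, stamps in starts.items():
        ordered = sorted(stamps)
        for i, start in enumerate(ordered):
            # A slice ends one second before the next change in any context.
            is_last = i + 1 == len(ordered)
            end = HIGH_DATE if is_last else ordered[i + 1] - timedelta(seconds=1)
            timeline.append((business_key, start, end))
    return timeline


valuation = [("Ut", datetime(2014, 1, 1, 22)), ("Ut", datetime(2015, 1, 1, 22)),
             ("Ut", datetime(2016, 3, 1, 22))]
adres = [("Ut", datetime(2013, 6, 5, 22)), ("Ut", datetime(2015, 7, 4, 22)),
         ("Asd", datetime(2013, 6, 5, 22))]

timeline = build_timeline(valuation, adres)
for business_key, start, end in timeline:
    print(business_key, start, end)
```

Each resulting slice is a period in which none of the contexts changes, so a point-in-time query needs one equality join per Concept Context instead of a range join.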
What makes PS-3C a different Ensemble?
1) Explicitly splitting the work
Data + History
Subjects + Integration
2) NO hashed Business Keys or surrogate keys
Only concatenated ones
httpwwwcannabisculturecomfilesimages6hashbrickJPG
3) Fewer joins
Relation
+ Technical validity timeline
+ Relation context
Together in one entity
Photo credit: Public Domain
4) Explicit HELPER entities
Business Alias
Business Concept Timeline
+ explicitly define Driving key(s)
Photo credit: Public Domain
Hope to AVOID this…
http://xkcd.com/927
PS-3C: A new PROPOSED ensemble modelling technique
Help needed
Questions?
About Me
'Head of BI', Spil Games
Certified Data Vault modeler since 2009
Contact details nllinkedincominrogierwerschkull
rogierwerschkullgmailcom
rwerschkull
Ut 3511 CE 5237269 489299 hellip NSS1_y 4-7-2015 220000 31-12-9999 000000
Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000
Asa hellip hellip hellip hellip hellip hellip
BCT
modelling
rwerschkull
nllinkedincominrogierwerschkull
What
Makes PS-3Ca Different Ensemble
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
rwerschkull
nllinkedincominrogierwerschkull
1) Explicitly Splitting The work
Data
+History
Subjects
+Integration
rwerschkull
nllinkedincominrogierwerschkull
2) NO HASHED BUSINeSS KEYS
or surrogate keys
httpwwwcannabisculturecomfilesimages6hashbrickJPG
Only
Concatenatedones
rwerschkull
nllinkedincominrogierwerschkull
3) Less
joinsRelation
+Technical validity timeline
+ Relation context
Together in one entityPhoto credit Public Domain
rwerschkull
nllinkedincominrogierwerschkull
4) Explicit
HelpEREntities
Business Alias
Business Component Timeline
+ explicitly define Driving key(s)
Photo credit Public Domainrwerschkull
nllinkedincominrogierwerschkull
Hope to
AVOIDThishellip
httpxkcdcom927
PS-3CA new PROPOSEDensemble modelling technique
Help
needed
Questions
About Me
lsquoHead of BIrsquo Spillgames
Certified Data Vault modeler since 2009
Contact details nllinkedincominrogierwerschkull
rogierwerschkullgmailcom
rwerschkull
Updates in Hive (isn't HDFS append-only?)
• ACID is possible in Hive
• ACID makes updates possible by registering updates as ‘new data’
• Reconciliation (compacting): when idle or at user command
• Use ORC files
• Plus changing the Hive configuration…
What about semi-structured data?
• Hive: put semi-structured data (= variable columns) in a MAP data type
• Or use a data storage type that supports schema evolution: Avro (ORC in development)
• Or HBase… It only has one data type (byte); the schema is ‘applied’
• The schema can be different for every row
Modelling: 3C - how
Like Data Vault (2), BUT:
• Always starts with conceptual data modelling
• NOT the primary location of data & history: virtualised (only if performance allows) and should be deterministic
• No Link Satellites
• No surrogate or hash keys, only ‘concatenated natural business keys’
• Explicit helper entities
3C - Details
[Diagram: Business Concept X with Concept Contexts X-A and X-B, and Business Concept Y with Concept Contexts Y-A, Y-B and Y-C, joined by a Connector; helper entities Business Alias A, Business Alias B and BC-Timeline X]
Business Concept (BC)
• A UNIQUE, domain-specific point of integration
• …a business entity
• …within its own domain
• …does not necessarily need to be enterprise-wide
Why not ‘enterprise-wide’?
[Diagram labels: Company; Customer; Sales Customer; International Sales Customer; Local Sales Customer; Marketing Customer; Customer …]
Business Concept Metadata
• Entity level: [Description], [Owner / Responsible]
• Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Example data
NSR-Station
  Business Key  META_Source  META_Load_dts
  Ut            NSS1_y       5-6-2015 22:00:00
  Asd           NSS1_y       5-6-2015 22:00:00
  Asa           NSS2_p       6-6-2015 22:00:00
  …             …            …
NS-Trainseries
  Business Key
  IC|855
  IC|8852
  Sp|7455
  St|16050
  …
NS-Travelcard
  Business Key
  3528 0234 2073 1234
  3528 0234 2073 5678
  …
NS-Traveller
  Business Key
  CRM-RW123456
  CRM-LAS224466
  …
BC important notes
• The easiest entity to virtualise (if performance allows)
• No hashing & no surrogate BUSINESS KEYS (not by default, at least)
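A minimal sketch of what a ‘concatenated natural business key’ could look like in code. The function name, the `|` separator (taken from keys such as `IC|855` in the example data) and the validation rules are assumptions for illustration; they are not part of the PS-3C proposal itself.

```python
from typing import Iterable

SEPARATOR = "|"  # assumption: mirrors the 'IC|855' style keys in the example data


def build_business_key(parts: Iterable[object], separator: str = SEPARATOR) -> str:
    """Concatenate natural key parts into one readable business key.

    No hashing and no surrogate number: the key stays human-readable.
    """
    cleaned = []
    for part in parts:
        text = str(part).strip()
        if not text:
            raise ValueError("business key parts must not be empty")
        if separator in text:
            # A separator inside a part would make the key ambiguous.
            raise ValueError(f"key part {text!r} contains the separator {separator!r}")
        cleaned.append(text)
    if not cleaned:
        raise ValueError("a business key needs at least one part")
    return separator.join(cleaned)


if __name__ == "__main__":
    print(build_business_key(["IC", 855]))
```

Rejecting parts that contain the separator is one possible design choice; escaping the separator would be another.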
Concept Context (CC)
• Contains context about a Concept, in a historical way
• …like a Data Vault Satellite
• Every CC belongs to only one BC
• Separate entity per source system table / stream
Concept Context Metadata
• Entity level: [Description], [Owner / Responsible]
• Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
• Not mandatory for streaming data:
  • [Load End Date Timestamp]
  • [Deleted Flag], OR register a delete as a new record
Example
[Diagram: Business Concepts with their Concept Contexts]
NSR-Station
  [valuation]         Source NSS2, table p
  [description]       Source NSS1, table x
  [adres]             Source NSS1, table y
NS-Trainseries
  [description]       Source NTR, table q
NS-Travelcard
  [ovchip_personal]   Source NSR, table r
  [ovchip_on-usage]   Source NSR, table s
NS-Traveller
  [personal_details]  Source NSR, table t
  [adres_data]        Source NSR, table t
Example data: NSR-Station, Concept Context [adres] (Source NSS1, table y)
BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                 …  …            …                  …
CC important notes
• More difficult to virtualise: depends on the semantic gap with the source
• But do make it virtual when ‘streaming data’ is necessary
• Because we have the PS layer, exposing all columns is not necessary
• Refactoring is easier…
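The Concept Context example data end-dates each version one second before the next version's load timestamp and leaves the current version open at 31-12-9999. A small sketch of how that end date could be derived when only load timestamps are delivered; the row layout (`bk`, `load_dts`) and the function name are assumptions for illustration.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # '31-12-9999 00:00:00' in the example data


def add_load_end_dts(rows):
    """Return copies of rows with a derived 'load_end_dts' per business key.

    rows: iterable of dicts with at least 'bk' and 'load_dts' (datetime).
    """
    by_key = {}
    for row in rows:
        by_key.setdefault(row["bk"], []).append(row)

    result = []
    for versions in by_key.values():
        versions = sorted(versions, key=lambda r: r["load_dts"])
        # Pair every version with its successor; the last one has none.
        for current, following in zip(versions, versions[1:] + [None]):
            if following is None:
                end = OPEN_END
            else:
                end = following["load_dts"] - timedelta(seconds=1)
            result.append({**current, "load_end_dts": end})
    return result
```

Because the end date is derived purely from the load timestamps, it can be recomputed at any time, which fits the note that end dates are not mandatory for streaming data.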
Connector (C)
• Relations between Concepts + context
• In a historical way
• …a merger of Data Vault Link + Link Satellite
• Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time
Connector Driving key
Explicitly defining a driving key as metadata…
• Gives business understanding
• Makes it possible for a Connector to correctly handle delta data deliveries…
  • so that a change (on the driving key)
  • is not registered as a new ‘connection’
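One possible reading of the driving-key rule, sketched under assumptions: when a delta row arrives for a driving key that already has an open Connector row, the open row is end-dated and the delta becomes the new version of the same connection. The function name, the dict-based row layout and the column names in the usage are hypothetical.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # open-ended validity, as in the example data


def apply_delta(connector_rows, delta_rows, driving_key, load_dts):
    """Apply a delta delivery to a Connector, in place.

    connector_rows: list of dicts, each with 'load_dts' and 'load_end_dts'.
    delta_rows: list of dicts holding key and context columns only.
    driving_key: tuple of column names that identify one connection.
    """
    open_rows = {
        tuple(row[col] for col in driving_key): row
        for row in connector_rows
        if row["load_end_dts"] == OPEN_END
    }
    for delta in delta_rows:
        dk = tuple(delta[col] for col in driving_key)
        current = open_rows.get(dk)
        if current is not None:
            if all(current.get(col) == val for col, val in delta.items()):
                continue  # nothing changed for this connection
            # Same driving key, different content: close the old version.
            current["load_end_dts"] = load_dts - timedelta(seconds=1)
        new_row = {**delta, "load_dts": load_dts, "load_end_dts": OPEN_END}
        connector_rows.append(new_row)
        open_rows[dk] = new_row
    return connector_rows
```

With the driving key NS-Travelcard + Checkin timestamp, a later delivery that adds the checkout station to an existing check-in would update that travel movement rather than create a second one.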
Connector Metadata
• Entity level: [Description], [Owner / Responsible]
• Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
• Not mandatory for streaming data:
  • [Load End Date Timestamp]
  • [Deleted Flag], OR register a delete as a new record
Example
[Diagram: the Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with the same Concept Contexts as in the previous example, plus the Connector NSR-Travelmovement with labels Checkin timestamp, from, to]
Driving key: NS-Travelcard + Checkin timestamp
Example data: Connector NSR-Travelmovement (NSR-Station from / to, NS-Trainseries, NS-Travelcard, Checkin timestamp)
BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp   Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32    5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09   5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00
Are we there yet? No.
We add two mandatory HELPER entities.
Business Alias (BA)
• Helps switching from sources that are tied together by technical (surrogate) keys…
• …to a Business Key based model
• It's a LOOKUP table that translates the technical key to the Business Key
Example
[Diagram: NSR-Station with its Concept Contexts and one Business Alias per source system]
NSR-Station
  [valuation]    Source NSS2, table p
  [description]  Source NSS1, table x
  [adres]        Source NSS1, table y
  BA-NSS1        Key lookup for NSS1 source tables
  BA-NSS2        Key lookup for NSS2 source tables
Example data
NSR-Station
  Business Key  META_Source  META_Load_dts
  Ut            NSS1_y       5-6-2015 22:00:00
  Asd           NSS1_y       5-6-2015 22:00:00
  Asa           NSS2_p       6-6-2015 22:00:00
  …             …            …
BA-NSS1 (key lookup for NSS1 source tables)
  Business Key  NSS1_Surrogate_key
  Ut            123522
  Asd           666323
  Asa           222443
  …             …
BA important details
• Has a 1-to-(0,1) relation with a Business Concept
• More difficult to virtualise; the lookup table should be kept small
• Therefore DO NOT do the key lookup in the Concept Context entity
• Load / generate together with the BC
• Preferably ‘in memory’ somehow…
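A sketch of the Business Alias as a small in-memory lookup that swaps a source surrogate key for the business key. The values in `BA_NSS1` come from the example data; the function name, the `station_id` column and the handling of unknown keys are assumptions, and where this step runs in the load is left open.

```python
# BA-NSS1 as a plain dict: NSS1 surrogate key -> business key (example data).
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(source_rows, alias, surrogate_column):
    """Replace the technical key by the business key.

    Returns (resolved, unresolved): rows whose surrogate key is unknown to the
    alias are returned untouched so they can be handled separately.
    """
    resolved, unresolved = [], []
    for row in source_rows:
        business_key = alias.get(row[surrogate_column])
        if business_key is None:
            unresolved.append(row)
            continue
        translated = {k: v for k, v in row.items() if k != surrogate_column}
        translated["bk"] = business_key
        resolved.append(translated)
    return resolved, unresolved


if __name__ == "__main__":
    rows = [{"station_id": 123522, "postcode": "3500GJ"}]  # hypothetical source rows
    print(to_business_key(rows, BA_NSS1, "station_id"))
```

A dict keeps the lookup in memory, which is only practical while the alias table stays small, as the notes above ask for.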
BC-Timeline (BCT)
• Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
• Like a Data Vault Point-in-time construct
• But mandatory
• And with a clearly defined and performant approach
Example
NSR-Station
  [valuation]  Source NSS2, table p
  [adres]      Source NSS1, table y
Concept Context [valuation] (Source NSS2, table p)
BK_NSR-Station  WOZ value   Value rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00
BC-Timeline (BCT)
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
Concept Context [adres] (Source NSS1, table y)
BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                 …  …            …                  …
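The combined timeline in the example can be reproduced by collecting every load timestamp of every Concept Context per business key and end-dating each slice one second before the next. A sketch under that assumption; the function name and the `(business_key, load_dts)` input shape are illustrative, and this is not necessarily the ‘clearly defined and performant approach’ the proposal has in mind.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # '31-12-9999 00:00:00' in the example data


def build_bc_timeline(*contexts):
    """Merge the load timestamps of several Concept Contexts into one timeline.

    Each argument is an iterable of (business_key, load_dts) pairs taken from
    one Concept Context. Returns a list of
    (business_key, combined_load_dts, combined_load_end_dts) tuples.
    """
    points = {}
    for context in contexts:
        for business_key, load_dts in context:
            points.setdefault(business_key, set()).add(load_dts)

    timeline = []
    for business_key in sorted(points):
        starts = sorted(points[business_key])
        # Each slice ends one second before the next slice starts.
        for start, following in zip(starts, starts[1:] + [None]):
            end = OPEN_END if following is None else following - timedelta(seconds=1)
            timeline.append((business_key, start, end))
    return timeline


if __name__ == "__main__":
    adres = [("Ut", datetime(2013, 6, 5, 22)), ("Ut", datetime(2015, 7, 4, 22)),
             ("Asd", datetime(2013, 6, 5, 22))]
    valuation = [("Ut", datetime(2014, 1, 1, 22)), ("Ut", datetime(2015, 1, 1, 22)),
                 ("Ut", datetime(2016, 3, 1, 22))]
    for row in build_bc_timeline(adres, valuation):
        print(row)
```

For the example data this yields five slices for Ut and one for Asd, matching the BC-Timeline table; each slice can then be joined back to the Concept Context version that was valid at its start.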
What makes PS-3C a different ensemble?
1) Explicitly splitting the work
• Data + history
• Subjects + integration
2) NO HASHED BUSINESS KEYS or surrogate keys
Only concatenated ones
Photo credit: httpwwwcannabisculturecomfilesimages6hashbrickJPG
3) Fewer joins
Relation + technical validity timeline + relation context, together in one entity
Photo credit: Public Domain
4) Explicit HELPER entities
• Business Alias
• Business Concept Timeline (BC-Timeline)
• + explicitly define driving key(s)
Photo credit: Public Domain
Hope to AVOID this…
http://xkcd.com/927
PS-3C: a new PROPOSED ensemble modelling technique
Help needed
Questions?
About me
‘Head of BI’, Spilgames
Certified Data Vault modeler since 2009
Contact details: nl.linkedin.com/in/rogierwerschkull, rogierwerschkull@gmail.com, rwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
3C - how
rwerschkull
nllinkedincominrogierwerschkull
Always starts with Conceptual data modeling
NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic
No Link Satellites
No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo
Explicit Helper entities
Like Data Vault(2) BUT
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
a UNIQUE Domain specific point of integration
hellipa business entity
hellipwithin itrsquos own domain
hellipdoes not necessarily need to be Enterprise Wide
Business Concept (BC)
rwerschkull
nllinkedincominrogierwerschkull
Why not lsquoenterprise widersquoCompany
Customer
Sales Customer
International
Sales Customer
Local
Sales Customer
Marketing Customer
Customer hellip
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description]
[Owner Responsible]
Column level [Load Date Timestamp]
[Source system] on table file level (lowest possible)
Business Concept Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example-Data
NSR-Station
NS-
Travelcard
NS-
Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
hellip
Business
Key
META_Source META_Load_dts
Ut NSS1_y 5-6-2015 220000
Asd NSS1_y 5-6-2015 220000
Asa NSS2_p 6-6-2015 220000
hellip hellip hellip
NS-
Traveller
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
hellip
Business Key
CRM-RW123456
CRM-LAS224466
hellip
rwerschkull
nllinkedincominrogierwerschkull
Most easy entity to be virtualised(if performance allows)
No Hashing amp No surrogateBUSINESSKEYS
(not by default at least)
BC important notes
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Containts Context about a Concept In a historical way
hellipLike a Data Vault Satellite
Every CC belongs to only one BC
Seperate entity per source system table stream
Concept Context (CC)
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description] [Owner Responsible]
Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)
Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record
Concept Context Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example
NSR-StationNS-
Travelcard
NS-
TrainseriesNS-
Traveller
[valuation]Source NSS2 table p
[description]SourceNSS1 table x
[adres]Source NSS1 table y
[description]Source NTR table q
[ovchip_
personal]Source NSR table r
[ovchip_
on-usageSource NSR table s
[personal_
details]Source NSR table t
[adres_
data]Source NSR table t rwerschkull
nllinkedincominrogierwerschkull
Example-dataBK_NSR-
Station
Postadres_
postcode
GPS hellip META_
source
META_Load_dts META_Load_end_dts META_Deleted_Ind
Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0
Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0
Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0
Asa hellip hellip hellip hellip hellip hellip
NSR-Station
[adres]Source NSS1 table y
More difficult to be virtualisedDepends on semantic gap with source
But do make virtual when lsquostreaming datarsquo is necessary
Because we have PS layerExposing all columns not necessary
Refactoring is more easyhellip
BC important notes
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Relations between Concepts + Context
In a historical way
hellipMerger of Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point
in time
Connector (C)
rwerschkull
nllinkedincominrogierwerschkull
Explicitly defining a driving key as metadatahellip
Gives business understanding
Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)
bull is not registered as a new lsquoconnectionrsquo
Connector Driving key
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description] [Owner Responsible]
Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)
Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record
Connector Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example
NSR-StationNS-
Travelcard
NS-
TrainseriesNS-
Traveller
[valuation]Source NSS2 table p
[description]SourceNSS1 table x
[adres]Source NSS1 table y
[description]Source NTR table q
[ovchip_
personal]Source NSR table r
[ovchip_
on-usageSource NSR table s
[personal_
details]Source NSR table t
[adres_
data]Source NSR table t rwerschkull
nllinkedincominrogierwerschkull
NSR-
TravelmovementCheckin timestamp
from
to
Driving Key
NS-Travelcard
+Checkin timestamp
Example-Data
NSR-StationNS-
Travelcard
NS-
Trainseries
rwerschkull
nllinkedincominrogierwerschkull
NSR-
TravelmovementCheckin timestamp
from
to
BK_NSR-
Station-from
BK_NSR-
Station-to
BK_NS_
Trainseries
BK_NS-Travelcard Checkin
timestamp
Checkout
timestamp
META_Load_dts META_Load_end_dts
Asd Ut IC 855 3528 0234 2073
1234
5-4-2016 84932 5-4-2016 94012 6-4-2016
220000
31-12-9999
000000
Ut Asd IC 855 3528 0234 2073
1234
5-4-2016 181009 5-4-2016 185520 6-4-2016
220000
31-12-9999
000000
We add Two mandatory
HELPEREntities
Are we there yet No
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
To help switching from sources that are Tied together by technical (surrogate) keyshellip
To a Business Key based model
Itrsquos a LOOKUP table that translates the technical to the
Business Key
Business Alias
Example
NSR-Station
[valuation]Source NSS2 table p
[description]SourceNSS1 table x[adres]
Source NSS1 table y
rwerschkull
nllinkedincominrogierwerschkull
BA-NSS1
BA-NSS2
Key Lookup voor NSS1 source tables
Key Lookup voor NSS2 source tables
Example-data
NSR-Station
rwerschkull
nllinkedincominrogierwerschkull
BA-NSS1
Key Lookup voor NSS1 source tables
Business Key NSS1_Surrogate_key
Ut 123522
Asd 666323
Asa 222443
hellip hellip
Business
Key
META_Source META_Load_dts
Ut NSS1_y 5-6-2015 220000
Asd NSS1_y 5-6-2015 220000
Asa NSS2_p 6-6-2015 220000
hellip hellip hellip
BA important details
• Has a 1 to (0..1) relation with a Business Concept.
• More difficult to virtualise; the lookup table should be kept small.
• Therefore do NOT do the key lookup in the Concept Context entity.
• Load / generate it together with the BC.
• Preferably ‘in memory’ somehow…
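A minimal Python sketch of the lookup idea, assuming the Business Alias is small enough to hold in memory as a dictionary. The names `BA_NSS1` and `to_business_key` and the `source_row` layout are hypothetical; only the key values come from the BA-NSS1 example data.

```python
# Hypothetical in-memory Business Alias for source NSS1:
# technical (surrogate) key -> Business Key, values from the example data.
BA_NSS1 = {
    123522: "Ut",
    666323: "Asd",
    222443: "Asa",
}


def to_business_key(alias, surrogate_key):
    """Translate a source surrogate key to its Business Key.

    Returns None when the alias has no entry, so the caller decides how to
    handle unknown keys (reject, park, or load later).
    """
    return alias.get(surrogate_key)


# A source row that only carries the technical key of the station.
source_row = {"station_id": 666323, "postcode": "1012 AB"}
business_key = to_business_key(BA_NSS1, source_row["station_id"])
print(business_key)  # Asd
```

Keeping the translation in one small structure, loaded together with the BC, is what lets the Concept Context load stay free of key lookups.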
BC-Timeline (BCT)
• Integrates the validity timelines of the Concept Contexts belonging to a Business Concept.
• Like a Data Vault Point-in-Time construct, but mandatory,
• and with a clearly defined and performant approach.
Example: NSR-Station

Concept Context [valuation], source: NSS2 table p
BK_NSR-Station  WOZ value   Value rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

Concept Context [adres], source: NSS1 table y
BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954  5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269  4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269  4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                  …  …            …                  …

BCT (combined timeline of both Concept Contexts)
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
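The combined timeline in the example can be reproduced with a small Python sketch. This is one possible derivation and not necessarily the approach the slides have in mind. It assumes that every distinct load timestamp in any Concept Context starts a new slice, and that a slice ends one second before the next one starts (a convention read off the example data). The function name `build_bc_timeline` is hypothetical.

```python
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31, 0, 0, 0)


def build_bc_timeline(load_dts_per_context):
    """Combine the load timestamps of several Concept Contexts into one
    list of (start, end) validity slices for a single Business Key."""
    # Every distinct load timestamp in any context starts a new slice.
    starts = sorted({dts for context in load_dts_per_context for dts in context})
    timeline = []
    for i, start in enumerate(starts):
        if i + 1 < len(starts):
            # Close the slice one second before the next one begins.
            end = starts[i + 1] - timedelta(seconds=1)
        else:
            end = END_OF_TIME
        timeline.append((start, end))
    return timeline


# Load timestamps for station "Ut" from the two example Concept Contexts.
valuation = [datetime(2014, 1, 1, 22), datetime(2015, 1, 1, 22), datetime(2016, 3, 1, 22)]
adres = [datetime(2013, 6, 5, 22), datetime(2015, 7, 4, 22)]

for start, end in build_bc_timeline([valuation, adres]):
    print(start, "->", end)
```

For "Ut" this yields five slices, matching the five BCT rows in the example.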
What makes PS-3C a different Ensemble?
1) Explicitly splitting the work
• Data + History
• Subjects + Integration
2) NO hashed Business Keys or surrogate keys
Only concatenated ones
httpwwwcannabisculturecomfilesimages6hashbrickJPG
3) Fewer joins
Relation + technical validity timeline + relation context, together in one entity
Photo credit: Public Domain
4) Explicit helper entities
• Business Alias
• Business Concept Timeline (BC-Timeline)
+ explicitly defined Driving Key(s)
Photo credit: Public Domain
Hope to AVOID this…
http://xkcd.com/927
PS-3C: a new PROPOSED ensemble modelling technique
Help needed
Questions?
About Me
‘Head of BI’, Spil Games
Certified Data Vault modeler since 2009
Contact details: nl.linkedin.com/in/rogierwerschkull
rogierwerschkull@gmail.com
rwerschkull
3C - how
Like Data Vault (2), BUT:
• Always starts with conceptual data modeling
• NOT the primary location of data & history
• Virtualised (only if performance allows)
• Should be deterministic
• No Link Satellites
• No surrogate or hash keys, only ‘Concatenated Natural Business Keys’
• Explicit helper entities
Business Concept (BC)
A UNIQUE, domain-specific point of integration:
• …a business entity
• …within its own domain
• …that does not necessarily need to be enterprise-wide
Why not ‘enterprise wide’?
[Diagram: the many meanings of ‘Customer’ within one Company: Customer, Sales Customer, International Sales Customer, Local Sales Customer, Marketing Customer, Customer …]
Business Concept Metadata
• Entity level: [Description], [Owner / Responsible]
• Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Example-Data

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
…
BC important notes
• The easiest entity to virtualise (if performance allows)
• No hashing & no surrogate BUSINESS KEYS (not by default at least)
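As an illustration of a concatenated natural Business Key, here is a short Python sketch. The "|" separator is taken from the example keys such as IC|855. The function name `concat_business_key` and the validation rule (reject empty parts and parts that contain the separator) are assumptions added for the sketch, not something the slides prescribe.

```python
def concat_business_key(*parts, sep="|"):
    """Build a concatenated natural Business Key from its parts."""
    cleaned = [str(part).strip() for part in parts]
    for part in cleaned:
        if part == "" or sep in part:
            # An empty or separator-containing part would make the key ambiguous.
            raise ValueError(f"invalid key part: {part!r}")
    return sep.join(cleaned)


print(concat_business_key("IC", 855))    # IC|855
print(concat_business_key("St", 16050))  # St|16050
```

Unlike a hash or surrogate, the resulting key stays readable and can be rebuilt from source data alone, which is what makes virtualising the BC straightforward.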
Concept Context (CC)
• Contains context about a Concept, in a historical way
• …like a Data Vault Satellite
• Every CC belongs to only one BC
• Separate entity per source system table / stream
Concept Context Metadata
• Entity level: [Description], [Owner / Responsible]
• Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
• Not mandatory for streaming data:
  • [Load End Date Timestamp]
  • [Deleted Flag], OR register a delete as a new record
Example
[Diagram: Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller, with the following Concept Contexts:]
• [valuation], source: NSS2 table p
• [description], source: NSS1 table x
• [adres], source: NSS1 table y
• [description], source: NTR table q
• [ovchip_personal], source: NSR table r
• [ovchip_on-usage], source: NSR table s
• [personal_details], source: NSR table t
• [adres_data], source: NSR table t
Example-data: NSR-Station, Concept Context [adres], source: NSS1 table y

BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954  5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269  4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269  4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                  …  …            …                  …
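For batch loads, the [Load End Date Timestamp] in the example data above can be derived from the load timestamps alone. The Python sketch below assumes the convention visible in the data: a version ends one second before the next version of the same Business Key is loaded, and the open version ends at 31-12-9999 00:00:00. The function name `add_load_end_dts` and the shortened column name `BK` are hypothetical.

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter

END_OF_TIME = datetime(9999, 12, 31, 0, 0, 0)


def add_load_end_dts(records):
    """Return new records with META_Load_end_dts derived per Business Key."""
    ordered = sorted(records, key=itemgetter("BK", "META_Load_dts"))
    result = []
    for _, group in groupby(ordered, key=itemgetter("BK")):
        versions = list(group)
        # Pair every version with its successor (None for the latest one).
        for current, nxt in zip(versions, versions[1:] + [None]):
            if nxt is None:
                end = END_OF_TIME
            else:
                end = nxt["META_Load_dts"] - timedelta(seconds=1)
            result.append({**current, "META_Load_end_dts": end})
    return result


rows = [
    {"BK": "Ut", "postcode": "3500GJ", "META_Load_dts": datetime(2013, 6, 5, 22)},
    {"BK": "Ut", "postcode": "3511 CE", "META_Load_dts": datetime(2015, 6, 5, 22)},
    {"BK": "Asd", "postcode": "1012 AB", "META_Load_dts": datetime(2013, 6, 5, 22)},
]
for row in add_load_end_dts(rows):
    print(row["BK"], row["postcode"], row["META_Load_dts"], row["META_Load_end_dts"])
```

Because the end date is fully determined by the start dates, it can be computed in a view, which fits the remark that it is not mandatory to store it for streaming data.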
CC important notes
• More difficult to virtualise; this depends on the semantic gap with the source
• But do make it virtual when ‘streaming data’ is necessary
• Because we have the PS layer:
  • exposing all columns is not necessary
  • refactoring is easier…
Connector (C)
• Relations between Concepts + Context, in a historical way
• …a merger of Data Vault Link + Link Satellite
• Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time
Connector Driving key
Explicitly defining a driving key as metadata…
• Gives business understanding
• Makes it possible for a Connector to correctly handle delta data deliveries, so that a change (on the driving key) is not registered as a new ‘connection’
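A minimal Python sketch of how a driving key could steer delta handling. It assumes the Connector is held as a list of dictionaries and that the driving key is NS-Travelcard + Checkin timestamp, as in the travel movement example. The function `apply_delta`, the column names and the scenario (a check-in delivered first, the destination station delivered later) are hypothetical illustrations, not the author's loading procedure.

```python
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31, 0, 0, 0)
# Driving key from the example: NS-Travelcard + Checkin timestamp.
DRIVING_KEY = ("travelcard", "checkin")


def apply_delta(connector, delivery, load_dts):
    """Apply one delivered row to the Connector records, in place."""
    key = tuple(delivery[col] for col in DRIVING_KEY)
    for record in connector:
        same_key = tuple(record[col] for col in DRIVING_KEY) == key
        if same_key and record["load_end_dts"] == END_OF_TIME:
            payload = {k: v for k, v in record.items() if not k.startswith("load_")}
            if payload == delivery:
                return  # nothing changed, keep the open version
            # Same connection, changed content: close the current version.
            record["load_end_dts"] = load_dts - timedelta(seconds=1)
            break
    connector.append({**delivery, "load_dts": load_dts, "load_end_dts": END_OF_TIME})


connector = []
trip = {"travelcard": "3528 0234 2073 1234", "checkin": datetime(2016, 4, 5, 8, 49, 32),
        "station_from": "Asd", "station_to": None}
apply_delta(connector, trip, datetime(2016, 4, 5, 22))
# Later delivery for the same driving key: the destination is now known.
apply_delta(connector, {**trip, "station_to": "Ut"}, datetime(2016, 4, 6, 22))
open_rows = [r for r in connector if r["load_end_dts"] == END_OF_TIME]
print(len(connector), len(open_rows))  # 2 1
```

Without the driving key, the second delivery would look like a brand-new connection between card, stations and train series. With it, the loader recognises a new version of the same travel movement and end-dates the old one.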
Connector Metadata
• Entity level: [Description], [Owner / Responsible]
• Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
• Not mandatory for streaming data:
  • [Load End Date Timestamp]
  • [Deleted Flag], OR register a delete as a new record
Example
[Diagram: the Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts, as in the Concept Context example, now with the Connector NSR-Travelmovement (context: Checkin timestamp) linking NSR-Station (from, to), NS-Travelcard and NS-Trainseries.]
Driving Key: NS-Travelcard + Checkin timestamp
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
a UNIQUE Domain specific point of integration
hellipa business entity
hellipwithin itrsquos own domain
hellipdoes not necessarily need to be Enterprise Wide
Business Concept (BC)
rwerschkull
nllinkedincominrogierwerschkull
Why not lsquoenterprise widersquoCompany
Customer
Sales Customer
International
Sales Customer
Local
Sales Customer
Marketing Customer
Customer hellip
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description]
[Owner Responsible]
Column level [Load Date Timestamp]
[Source system] on table file level (lowest possible)
Business Concept Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example-Data
NSR-Station
NS-
Travelcard
NS-
Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
hellip
Business
Key
META_Source META_Load_dts
Ut NSS1_y 5-6-2015 220000
Asd NSS1_y 5-6-2015 220000
Asa NSS2_p 6-6-2015 220000
hellip hellip hellip
NS-
Traveller
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
hellip
Business Key
CRM-RW123456
CRM-LAS224466
hellip
rwerschkull
nllinkedincominrogierwerschkull
Most easy entity to be virtualised(if performance allows)
No Hashing amp No surrogateBUSINESSKEYS
(not by default at least)
BC important notes
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Containts Context about a Concept In a historical way
hellipLike a Data Vault Satellite
Every CC belongs to only one BC
Seperate entity per source system table stream
Concept Context (CC)
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description] [Owner Responsible]
Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)
Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record
Concept Context Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example
NSR-StationNS-
Travelcard
NS-
TrainseriesNS-
Traveller
[valuation]Source NSS2 table p
[description]SourceNSS1 table x
[adres]Source NSS1 table y
[description]Source NTR table q
[ovchip_
personal]Source NSR table r
[ovchip_
on-usageSource NSR table s
[personal_
details]Source NSR table t
[adres_
data]Source NSR table t rwerschkull
nllinkedincominrogierwerschkull
Example-dataBK_NSR-
Station
Postadres_
postcode
GPS hellip META_
source
META_Load_dts META_Load_end_dts META_Deleted_Ind
Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0
Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0
Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0
Asa hellip hellip hellip hellip hellip hellip
NSR-Station
[adres]Source NSS1 table y
More difficult to be virtualisedDepends on semantic gap with source
But do make virtual when lsquostreaming datarsquo is necessary
Because we have PS layerExposing all columns not necessary
Refactoring is more easyhellip
BC important notes
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Relations between Concepts + Context
In a historical way
hellipMerger of Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point
in time
Connector (C)
rwerschkull
nllinkedincominrogierwerschkull
Explicitly defining a driving key as metadatahellip
Gives business understanding
Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)
bull is not registered as a new lsquoconnectionrsquo
Connector Driving key
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description] [Owner Responsible]
Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)
Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record
Connector Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example
NSR-StationNS-
Travelcard
NS-
TrainseriesNS-
Traveller
[valuation]Source NSS2 table p
[description]SourceNSS1 table x
[adres]Source NSS1 table y
[description]Source NTR table q
[ovchip_
personal]Source NSR table r
[ovchip_
on-usageSource NSR table s
[personal_
details]Source NSR table t
[adres_
data]Source NSR table t rwerschkull
nllinkedincominrogierwerschkull
NSR-
TravelmovementCheckin timestamp
from
to
Driving Key
NS-Travelcard
+Checkin timestamp
Example-Data
NSR-StationNS-
Travelcard
NS-
Trainseries
rwerschkull
nllinkedincominrogierwerschkull
NSR-
TravelmovementCheckin timestamp
from
to
BK_NSR-
Station-from
BK_NSR-
Station-to
BK_NS_
Trainseries
BK_NS-Travelcard Checkin
timestamp
Checkout
timestamp
META_Load_dts META_Load_end_dts
Asd Ut IC 855 3528 0234 2073
1234
5-4-2016 84932 5-4-2016 94012 6-4-2016
220000
31-12-9999
000000
Ut Asd IC 855 3528 0234 2073
1234
5-4-2016 181009 5-4-2016 185520 6-4-2016
220000
31-12-9999
000000
We add Two mandatory
HELPEREntities
Are we there yet No
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
To help switching from sources that are Tied together by technical (surrogate) keyshellip
To a Business Key based model
Itrsquos a LOOKUP table that translates the technical to the
Business Key
Business Alias
Example
NSR-Station
[valuation]Source NSS2 table p
[description]SourceNSS1 table x[adres]
Source NSS1 table y
rwerschkull
nllinkedincominrogierwerschkull
BA-NSS1
BA-NSS2
Key Lookup voor NSS1 source tables
Key Lookup voor NSS2 source tables
Example-data
NSR-Station
rwerschkull
nllinkedincominrogierwerschkull
BA-NSS1
Key Lookup voor NSS1 source tables
Business Key NSS1_Surrogate_key
Ut 123522
Asd 666323
Asa 222443
hellip hellip
Business
Key
META_Source META_Load_dts
Ut NSS1_y 5-6-2015 220000
Asd NSS1_y 5-6-2015 220000
Asa NSS2_p 6-6-2015 220000
hellip hellip hellip
Has a 1 on (01) relation with a Business Concept
More difficult to be virtualised Lookup table should be kept small
Therefore DO NOT do key lookup in Concept Context entity
Load generate together with BC
Preferably lsquoin memoryrsquo somehowhellip
BA important Details
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Integrate the validity timelines of Concept Contextsbelonging to a Business Concept
Like a Data Vault Point-in-time construct
But Mandatory
And with a clearly defined and performantapproach
BC-Timeline
rwerschkull
nllinkedincominrogierwerschkull
Example
NSR-Station
[valuation]Source NSS2 table p
[adres]Source NSS1 table y
rwerschkull
nllinkedincominrogierwerschkull
BK_NSR-
Station
WOZ
waarde
Waarde
Ratingbureau X
META_Laad_dts META_Laad_eind_dts
Ut 20 milj 18 milj 1-1-2014 220000 1-1-2015 215959
Ut 22 milj 18 milj 1-1-2015 220000 1-3-2016 215959
Ut 22 milj 23 milj 1-3-2016 220000 31-12-9999 000000
BK_NSR-
Station
Combined_Load_dts Combined_Load_end_dts
Ut 5-6-2013 220000 1-1-2014 215959
Ut 1-1-2014 220000 1-1-2015 215959
Ut 1-1-2015 220000 4-7-2015 215959
Ut 4-7-2015 220000 1-3-2016 215959
Ut 1-3-2016 220000 31-12-9999 000000
Asd 5-6-2013 220000 31-12-9999 000000
BK_NSR-
Station
Postadres_
postcode
GPS hellip META_
source
META_Load_dts META_Load_end_dts
Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 4-7-2015 215959
Ut 3511 CE 5237269 489299 hellip NSS1_y 4-7-2015 220000 31-12-9999 000000
Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000
Asa hellip hellip hellip hellip hellip hellip
BCT
modelling
rwerschkull
nllinkedincominrogierwerschkull
What
Makes PS-3Ca Different Ensemble
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
rwerschkull
nllinkedincominrogierwerschkull
1) Explicitly Splitting The work
Data
+History
Subjects
+Integration
rwerschkull
nllinkedincominrogierwerschkull
2) NO HASHED BUSINeSS KEYS
or surrogate keys
httpwwwcannabisculturecomfilesimages6hashbrickJPG
Only
Concatenatedones
rwerschkull
nllinkedincominrogierwerschkull
3) Less
joinsRelation
+Technical validity timeline
+ Relation context
Together in one entityPhoto credit Public Domain
4) Explicit helper entities
Business Alias
Business Concept Timeline
+ explicitly define driving key(s)
Photo credit: Public Domain
Hope to avoid this…
http://xkcd.com/927
PS-3C: a new PROPOSED ensemble modelling technique
Help needed
Questions?
About Me
'Head of BI', Spilgames
Certified Data Vault modeler since 2009
Contact details nllinkedincominrogierwerschkull
rogierwerschkullgmailcom
rwerschkull
Business Concept (BC)
A UNIQUE, domain-specific point of integration
…a business entity
…within its own domain
…does not necessarily need to be enterprise-wide
Why not 'enterprise wide'?
[Diagram: Company Customer; Sales Customer; International Sales Customer; Local Sales Customer; Marketing Customer; Customer …]
Business Concept Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Example-Data

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
…
BC important notes
Easiest entity to virtualise (if performance allows)
No hashing & no surrogate BUSINESS KEYS (not by default, at least)
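As an illustration of a concatenated business key such as IC|855 from the NS-Trainseries example, a minimal Python sketch follows. The helper name `make_business_key`, the choice of `|` as separator, and the check for a separator inside a key part are assumptions made for this example; the slides only state that keys are concatenated rather than hashed.

```python
def make_business_key(*parts, sep="|"):
    """Concatenate key parts into one readable business key (no hash, no surrogate).

    Assumption: parts are trimmed and must not contain the separator themselves,
    otherwise two different part lists could yield the same key.
    """
    cleaned = [str(part).strip() for part in parts]
    if any(sep in part for part in cleaned):
        raise ValueError(f"key part contains the separator {sep!r}")
    return sep.join(cleaned)


# Train type and series number together identify a train series.
train_series_key = make_business_key("IC", 855)
```

The key stays human-readable, so a row in any entity can be traced back to its source record without a lookup.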
3C - Details
[Diagram: Business Concepts X and Y with their Concept Contexts (X-A, X-B; Y-A, Y-B, Y-C), the Connector, Business Aliases A and B, and BC-Timeline X]
Concept Context (CC)
Contains context about a Concept, in a historical way
…like a Data Vault Satellite
Every CC belongs to only one BC
Separate entity per source system table / stream
Concept Context Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
Example
Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller
Concept Contexts:
[valuation] Source: NSS2 table p
[description] Source: NSS1 table x
[adres] Source: NSS1 table y
[description] Source: NTR table q
[ovchip_personal] Source: NSR table r
[ovchip_on-usage] Source: NSR table s
[personal_details] Source: NSR table t
[adres_data] Source: NSR table t
Example-data
NSR-Station: [adres] (Source: NSS1 table y)
BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                 …  …            …                  …
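How a Concept Context could build up the history shown in this table can be sketched as follows. This is a hedged illustration only: the function `load_context_row`, the dictionary field names, and the rule that the previous version is end-dated one second before the new load are assumptions, not something the slides specify.

```python
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31)  # assumed open-ended marker, as in the example data


def load_context_row(history, business_key, attributes, load_dts, deleted=False):
    """Add a new version to a Concept Context history when something changed.

    history is a list of dict rows. A delete is registered as a new record
    with the deleted flag set, one of the two options named on the metadata slide.
    """
    current = next(
        (row for row in history
         if row["bk"] == business_key and row["load_end_dts"] == END_OF_TIME),
        None,
    )
    if current is not None:
        if current["attributes"] == attributes and current["deleted"] == deleted:
            return history  # unchanged delivery: keep the current version open
        # Assumption: close the old version one second before the new load.
        current["load_end_dts"] = load_dts - timedelta(seconds=1)
    history.append({
        "bk": business_key,
        "attributes": attributes,
        "load_dts": load_dts,
        "load_end_dts": END_OF_TIME,
        "deleted": deleted,
    })
    return history


adres_history = []
load_context_row(adres_history, "Ut", {"postcode": "3500GJ"}, datetime(2013, 6, 5, 22))
load_context_row(adres_history, "Asd", {"postcode": "1012 AB"}, datetime(2013, 6, 5, 22))
load_context_row(adres_history, "Ut", {"postcode": "3511 CE"}, datetime(2015, 6, 5, 22))
```

After these three loads the first Ut row is closed at 5-6-2015 21:59:59 and the second Ut row is open-ended, as in the example data.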
CC important notes
More difficult to virtualise: depends on the semantic gap with the source
But do make it virtual when 'streaming data' is necessary
Because we have the PS layer, exposing all columns is not necessary
Refactoring is easier…
3C - Details
[Diagram: Business Concepts X and Y with their Concept Contexts (X-A, X-B; Y-A, Y-B, Y-C), the Connector, Business Aliases A and B, and BC-Timeline X]
Connector (C)
Relations between Concepts + Context, in a historical way
…a merger of the Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time
Connector Driving key
Explicitly defining a driving key as metadata…
Gives business understanding
Makes it possible for a Connector to correctly handle delta data deliveries…
• so that a change (on the driving key) is not registered as a new 'connection'
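The role of the driving key in a delta delivery can be sketched like this. It is a minimal illustration under stated assumptions: the function `load_connector_row`, the column names, and the scenario (a check-in delivered first without a check-out, then delivered again once the trip is complete) are hypothetical and not taken from the slides.

```python
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31)  # assumed open-ended marker
DRIVING_KEY = ("travelcard", "checkin")  # hypothetical column names


def load_connector_row(history, row, load_dts, driving_key=DRIVING_KEY):
    """Load one delivered row into a Connector history.

    If an open row with the same driving key exists and differs, it is
    end-dated and replaced, so the change is a new version of the same
    connection instead of a second, unrelated connection.
    """
    key = tuple(row[column] for column in driving_key)
    for current in history:
        is_open = current["load_end_dts"] == END_OF_TIME
        if is_open and tuple(current[column] for column in driving_key) == key:
            if all(current[column] == value for column, value in row.items()):
                return history  # identical delivery: nothing to do
            current["load_end_dts"] = load_dts - timedelta(seconds=1)
            break
    history.append({**row, "load_dts": load_dts, "load_end_dts": END_OF_TIME})
    return history


movements = []
trip = {
    "station_from": "Asd", "station_to": None, "train_series": "IC|855",
    "travelcard": "3528 0234 2073 1234",
    "checkin": datetime(2016, 4, 5, 8, 49, 32), "checkout": None,
}
load_connector_row(movements, trip, datetime(2016, 4, 5, 22))
completed = {**trip, "station_to": "Ut", "checkout": datetime(2016, 4, 5, 9, 40, 12)}
load_connector_row(movements, completed, datetime(2016, 4, 6, 22))
open_rows = [row for row in movements if row["load_end_dts"] == END_OF_TIME]
```

Without the driving key, the completed trip would look like a second travel movement; with it, only one open row remains for this travelcard and check-in.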
Connector Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
Example
Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller
Concept Contexts:
[valuation] Source: NSS2 table p
[description] Source: NSS1 table x
[adres] Source: NSS1 table y
[description] Source: NTR table q
[ovchip_personal] Source: NSR table r
[ovchip_on-usage] Source: NSR table s
[personal_details] Source: NSR table t
[adres_data] Source: NSR table t
Connector: NSR-Travelmovement (Checkin timestamp; from, to)
Driving Key: NS-Travelcard + Checkin timestamp
Example-Data
NSR-Travelmovement (Connector between NSR-Station from / to, NS-Trainseries and NS-Travelcard)
BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00
Are we there yet? No.
We add two mandatory HELPER entities
modelling
3C - Details
[Diagram: Business Concepts X and Y with their Concept Contexts (X-A, X-B; Y-A, Y-B, Y-C), the Connector, Business Aliases A and B, and BC-Timeline X]
Business Alias
Helps switching from sources that are tied together by technical (surrogate) keys…
…to a Business Key based model
It's a LOOKUP table that translates the technical key to the Business Key
Example
NSR-Station
[valuation] Source: NSS2 table p
[description] Source: NSS1 table x
[adres] Source: NSS1 table y
BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables
Example-data

BA-NSS1 (key lookup for NSS1 source tables)
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …
BA important Details
Has a 1-to-(0,1) relation with a Business Concept
More difficult to virtualise
Lookup table should be kept small
Therefore DO NOT do the key lookup in the Concept Context entity
Load / generate together with the BC
Preferably 'in memory' somehow…
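A Business Alias used as an in-memory lookup could look roughly like the sketch below. The dictionary values come from the BA-NSS1 example data; the function `to_business_keys`, the source column name `station_id`, and the handling of unknown surrogate keys are assumptions for illustration. The translation runs as a separate step on the source rows, so the Concept Context itself only ever stores business keys.

```python
# BA-NSS1 as an in-memory lookup: NSS1 surrogate key -> business key.
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_keys(source_rows, alias, surrogate_column="station_id"):
    """Replace the technical key of each source row by its business key.

    Rows whose surrogate key is not in the alias are returned separately,
    so they can be handled once the Business Concept and alias are loaded.
    """
    translated, unknown = [], []
    for row in source_rows:
        business_key = alias.get(row[surrogate_column])
        if business_key is None:
            unknown.append(row)
            continue
        rest = {k: v for k, v in row.items() if k != surrogate_column}
        translated.append({"bk": business_key, **rest})
    return translated, unknown


nss1_rows = [
    {"station_id": 123522, "postcode": "3500GJ"},
    {"station_id": 999999, "postcode": "0000XX"},  # hypothetical unknown key
]
translated, unknown = to_business_keys(nss1_rows, BA_NSS1)
```

Keeping the alias small and in memory makes this step a plain dictionary lookup per row.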
3C - Details
[Diagram: Business Concepts X and Y with their Concept Contexts (X-A, X-B; Y-A, Y-B, Y-C), the Connector, Business Aliases A and B, and BC-Timeline X]
Example-Data
NSR-Station
NS-
Travelcard
NS-
Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
hellip
Business
Key
META_Source META_Load_dts
Ut NSS1_y 5-6-2015 220000
Asd NSS1_y 5-6-2015 220000
Asa NSS2_p 6-6-2015 220000
hellip hellip hellip
NS-
Traveller
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
hellip
Business Key
CRM-RW123456
CRM-LAS224466
hellip
rwerschkull
nllinkedincominrogierwerschkull
Most easy entity to be virtualised(if performance allows)
No Hashing amp No surrogateBUSINESSKEYS
(not by default at least)
BC important notes
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Containts Context about a Concept In a historical way
hellipLike a Data Vault Satellite
Every CC belongs to only one BC
Seperate entity per source system table stream
Concept Context (CC)
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description] [Owner Responsible]
Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)
Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record
Concept Context Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example
NSR-StationNS-
Travelcard
NS-
TrainseriesNS-
Traveller
[valuation]Source NSS2 table p
[description]SourceNSS1 table x
[adres]Source NSS1 table y
[description]Source NTR table q
[ovchip_
personal]Source NSR table r
[ovchip_
on-usageSource NSR table s
[personal_
details]Source NSR table t
[adres_
data]Source NSR table t rwerschkull
nllinkedincominrogierwerschkull
Example-dataBK_NSR-
Station
Postadres_
postcode
GPS hellip META_
source
META_Load_dts META_Load_end_dts META_Deleted_Ind
Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0
Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0
Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0
Asa hellip hellip hellip hellip hellip hellip
NSR-Station
[adres]Source NSS1 table y
More difficult to be virtualisedDepends on semantic gap with source
But do make virtual when lsquostreaming datarsquo is necessary
Because we have PS layerExposing all columns not necessary
Refactoring is more easyhellip
BC important notes
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Relations between Concepts + Context
In a historical way
hellipMerger of Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point
in time
Connector (C)
rwerschkull
nllinkedincominrogierwerschkull
Explicitly defining a driving key as metadatahellip
Gives business understanding
Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)
bull is not registered as a new lsquoconnectionrsquo
Connector Driving key
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description] [Owner Responsible]
Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)
Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record
Connector Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example
[Diagram: Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller, with these Concept Contexts:]
• [valuation], source NSS2 table p
• [description], source NSS1 table x
• [adres], source NSS1 table y
• [description], source NTR table q
• [ovchip_personal], source NSR table r
• [ovchip_on-usage], source NSR table s
• [personal_details], source NSR table t
• [adres_data], source NSR table t
Connector: NSR-Travelmovement (Checkin timestamp, from, to)
Driving Key: NS-Travelcard + Checkin timestamp
Example-Data
[Diagram: Connector NSR-Travelmovement (Checkin timestamp, from, to) relating NSR-Station, NS-Travelcard and NS-Trainseries]

| BK_NSR-Station-from | BK_NSR-Station-to | BK_NS_Trainseries | BK_NS-Travelcard | Checkin timestamp | Checkout timestamp | META_Load_dts | META_Load_end_dts |
|---|---|---|---|---|---|---|---|
| Asd | Ut | IC 855 | 3528 0234 2073 1234 | 5-4-2016 8:49:32 | 5-4-2016 9:40:12 | 6-4-2016 22:00:00 | 31-12-9999 00:00:00 |
| Ut | Asd | IC 855 | 3528 0234 2073 1234 | 5-4-2016 18:10:09 | 5-4-2016 18:55:20 | 6-4-2016 22:00:00 | 31-12-9999 00:00:00 |
Are we there yet? No.
We add two mandatory HELPER entities
modelling
Business Alias
• Helps switching from sources that are tied together by technical (surrogate) keys… to a Business Key based model
• It’s a LOOKUP table that translates the technical key to the Business Key
Example
[Diagram: NSR-Station with Concept Contexts [valuation] (source NSS2 table p), [description] (source NSS1 table x) and [adres] (source NSS1 table y), plus two Business Aliases:]
• BA-NSS1: key lookup for NSS1 source tables
• BA-NSS2: key lookup for NSS2 source tables
Example-data
BA-NSS1: key lookup for NSS1 source tables

| Business Key | NSS1_Surrogate_key |
|---|---|
| Ut | 123522 |
| Asd | 666323 |
| Asa | 222443 |
| … | … |

NSR-Station

| Business Key | META_Source | META_Load_dts |
|---|---|---|
| Ut | NSS1_y | 5-6-2015 22:00:00 |
| Asd | NSS1_y | 5-6-2015 22:00:00 |
| Asa | NSS2_p | 6-6-2015 22:00:00 |
| … | … | … |
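As a rough illustration of the Business Alias idea, the sketch below keeps BA-NSS1 as an in-memory dictionary and uses it to swap a source row's surrogate key for the Business Key. The function name `to_business_key`, the source column name `station_id` and the sample source row are assumptions made for this example; only the key values come from the example data.

```python
# BA-NSS1 as an in-memory lookup: NSS1 surrogate key -> Business Key
# (key values taken from the example data).
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(source_row, alias, surrogate_column):
    """Return a copy of source_row keyed by Business Key instead of surrogate key.

    Returns None when the surrogate key is not (yet) known in the alias.
    """
    business_key = alias.get(source_row[surrogate_column])
    if business_key is None:
        return None
    translated = {k: v for k, v in source_row.items() if k != surrogate_column}
    translated["BK_NSR-Station"] = business_key
    return translated


# Hypothetical NSS1 row; the column name "station_id" is an assumption.
nss1_row = {"station_id": 123522, "Postadres_postcode": "3511 CE"}
translated_row = to_business_key(nss1_row, BA_NSS1, "station_id")
print(translated_row)
```

Doing the translation once, in a small dedicated lookup, keeps surrogate keys out of the rest of the model.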
BA important Details
• Has a 1 : (0..1) relation with a Business Concept
• More difficult to virtualise
• The lookup table should be kept small; therefore DO NOT do the key lookup in the Concept Context entity
• Load / generate together with the BC
• Preferably ‘in memory’ somehow…
BC-Timeline
• Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
• Like a Data Vault Point-in-time construct
• But mandatory
• And with a clearly defined and performant approach
Example
NSR-Station [valuation], source NSS2 table p

| BK_NSR-Station | WOZ value | Value rating agency X | META_Load_dts | META_Load_end_dts |
|---|---|---|---|---|
| Ut | 20 million | 18 million | 1-1-2014 22:00:00 | 1-1-2015 21:59:59 |
| Ut | 22 million | 18 million | 1-1-2015 22:00:00 | 1-3-2016 21:59:59 |
| Ut | 22 million | 23 million | 1-3-2016 22:00:00 | 31-12-9999 00:00:00 |

BC-Timeline for NSR-Station

| BK_NSR-Station | Combined_Load_dts | Combined_Load_end_dts |
|---|---|---|
| Ut | 5-6-2013 22:00:00 | 1-1-2014 21:59:59 |
| Ut | 1-1-2014 22:00:00 | 1-1-2015 21:59:59 |
| Ut | 1-1-2015 22:00:00 | 4-7-2015 21:59:59 |
| Ut | 4-7-2015 22:00:00 | 1-3-2016 21:59:59 |
| Ut | 1-3-2016 22:00:00 | 31-12-9999 00:00:00 |
| Asd | 5-6-2013 22:00:00 | 31-12-9999 00:00:00 |

NSR-Station [adres], source NSS1 table y

| BK_NSR-Station | Postadres_postcode | GPS | … | META_source | META_Load_dts | META_Load_end_dts |
|---|---|---|---|---|---|---|
| Ut | 3500GJ | 52.08954, 5.11064 | … | NSS1_y | 5-6-2013 22:00:00 | 4-7-2015 21:59:59 |
| Ut | 3511 CE | 52.37269, 4.89299 | … | NSS1_y | 4-7-2015 22:00:00 | 31-12-9999 00:00:00 |
| Asd | 1012 AB | 52.37269, 4.89299 | … | NSS1_y | 5-6-2013 22:00:00 | 31-12-9999 00:00:00 |
| Asa | … | … | … | … | … | … |
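The combined timeline in this example can be derived mechanically from the load timestamps of the two Concept Contexts. The sketch below is one possible way to do that in Python, assuming (as the example data suggests) that each interval ends one second before the next one starts and that 31-12-9999 marks the open end. The function name `combine_timelines` is hypothetical.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # open end date, as in the example data


def combine_timelines(*context_load_dts):
    """Merge the load timestamps of several Concept Contexts (for one
    Business Key) into one list of (start, end) validity intervals."""
    starts = sorted(set().union(*context_load_dts))
    intervals = []
    for i, start in enumerate(starts):
        if i + 1 < len(starts):
            end = starts[i + 1] - timedelta(seconds=1)
        else:
            end = OPEN_END
        intervals.append((start, end))
    return intervals


# Load timestamps for station 'Ut', taken from the two example tables.
valuation_ut = [datetime(2014, 1, 1, 22), datetime(2015, 1, 1, 22), datetime(2016, 3, 1, 22)]
address_ut = [datetime(2013, 6, 5, 22), datetime(2015, 7, 4, 22)]

timeline_ut = combine_timelines(valuation_ut, address_ut)
for start, end in timeline_ut:
    print(start, "->", end)
```

For 'Ut' this yields five intervals with the same boundaries as the BC-Timeline rows in the example. A query for a point in time then needs only one lookup in this timeline to know which version of each Concept Context applies.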
BCT modelling
What makes PS-3C a different Ensemble?
1) Explicitly splitting the work
• Data + History
• Subjects + Integration
2) NO HASHED BUSINESS KEYS or surrogate keys
httpwwwcannabisculturecomfilesimages6hashbrickJPG
Only concatenated ones
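A concatenated key can be as simple as joining the key parts with a separator. The sketch below is a minimal illustration under assumptions: the function name `concat_business_key`, the `|` separator, the whitespace trimming and the ISO-style timestamp text are choices made for this example and are not prescribed by the deck.

```python
def concat_business_key(*parts, sep="|"):
    """Build one readable business key string from its parts, without hashing."""
    cleaned = [str(part).strip() for part in parts]
    for part in cleaned:
        if sep in part:
            # A separator inside a part would make two different keys collide.
            raise ValueError(f"separator {sep!r} occurs in key part {part!r}")
    return sep.join(cleaned)


# Travel card number + check-in timestamp, as in the travel movement example.
travel_movement_key = concat_business_key("3528 0234 2073 1234", "2016-04-05T08:49:32")
print(travel_movement_key)
```

Unlike a hash, the resulting key stays human-readable and can be split back into its parts.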
3) Fewer joins
Relation + technical validity timeline + relation context, together in one entity
4) Explicit HELPER entities
• Business Alias
• Business Concept Timeline
+ explicitly define Driving key(s)
Hope to AVOID this…
http://xkcd.com/927
PS-3C A new PROPOSED ensemble modelling technique
Help needed
Questions?
About Me
‘Head of BI’, Spil Games
Certified Data Vault modeler since 2009
Contact details nllinkedincominrogierwerschkull
rogierwerschkullgmailcom
rwerschkull
BC important notes
• Easiest entity to virtualise (if performance allows)
• No hashing & no surrogate BUSINESS KEYS (not by default at least)
Concept Context (CC)
• Contains Context about a Concept, in a historical way
• …like a Data Vault Satellite
• Every CC belongs to only one BC
• Separate entity per source system table / stream
Concept Context Metadata
• Entity level: [Description], [Owner / Responsible]
• Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
• Not mandatory for streaming data:
  • [Load End Date Timestamp]
  • [Deleted Flag] OR register a delete as a new record
Example-data
NSR-Station [adres], source NSS1 table y

| BK_NSR-Station | Postadres_postcode | GPS | … | META_source | META_Load_dts | META_Load_end_dts | META_Deleted_Ind |
|---|---|---|---|---|---|---|---|
| Ut | 3500GJ | 52.08954, 5.11064 | … | NSS1_y | 5-6-2013 22:00:00 | 5-6-2015 21:59:59 | 0 |
| Ut | 3511 CE | 52.37269, 4.89299 | … | NSS1_y | 5-6-2015 22:00:00 | 31-12-9999 00:00:00 | 0 |
| Asd | 1012 AB | 52.37269, 4.89299 | … | NSS1_y | 5-6-2013 22:00:00 | 31-12-9999 00:00:00 | 0 |
| Asa | … | … | … | … | … | … | … |
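With load and load-end timestamps on every row, reading a Concept Context "as of" a moment is a simple range check. The Python sketch below illustrates this on the NSR-Station [adres] example rows; the function name `as_of` and the in-memory list of dicts are assumptions for the example, and only the columns needed for the lookup are included.

```python
from datetime import datetime

OPEN_END = datetime(9999, 12, 31)  # open end date, as in the example data

# The 'Ut' and 'Asd' rows of the example, reduced to the relevant columns.
cc_rows = [
    {"BK_NSR-Station": "Ut", "Postadres_postcode": "3500GJ",
     "META_Load_dts": datetime(2013, 6, 5, 22),
     "META_Load_end_dts": datetime(2015, 6, 5, 21, 59, 59), "META_Deleted_Ind": 0},
    {"BK_NSR-Station": "Ut", "Postadres_postcode": "3511 CE",
     "META_Load_dts": datetime(2015, 6, 5, 22),
     "META_Load_end_dts": OPEN_END, "META_Deleted_Ind": 0},
    {"BK_NSR-Station": "Asd", "Postadres_postcode": "1012 AB",
     "META_Load_dts": datetime(2013, 6, 5, 22),
     "META_Load_end_dts": OPEN_END, "META_Deleted_Ind": 0},
]


def as_of(rows, business_key, moment):
    """Return the non-deleted row valid for business_key at moment, or None."""
    for row in rows:
        if (row["BK_NSR-Station"] == business_key
                and row["META_Load_dts"] <= moment <= row["META_Load_end_dts"]
                and not row["META_Deleted_Ind"]):
            return row
    return None


old_address = as_of(cc_rows, "Ut", datetime(2014, 1, 1))
new_address = as_of(cc_rows, "Ut", datetime(2016, 1, 1))
print(old_address["Postadres_postcode"], new_address["Postadres_postcode"])
```

Without a load-end timestamp the same lookup is still possible, but the end of each interval has to be derived from the next row's load timestamp at query time.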
Entity level [Description] [Owner Responsible]
Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)
Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record
Concept Context Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example
NSR-StationNS-
Travelcard
NS-
TrainseriesNS-
Traveller
[valuation]Source NSS2 table p
[description]SourceNSS1 table x
[adres]Source NSS1 table y
[description]Source NTR table q
[ovchip_
personal]Source NSR table r
[ovchip_
on-usageSource NSR table s
[personal_
details]Source NSR table t
[adres_
data]Source NSR table t rwerschkull
nllinkedincominrogierwerschkull
Example-dataBK_NSR-
Station
Postadres_
postcode
GPS hellip META_
source
META_Load_dts META_Load_end_dts META_Deleted_Ind
Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0
Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0
Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0
Asa hellip hellip hellip hellip hellip hellip
NSR-Station
[adres]Source NSS1 table y
More difficult to be virtualisedDepends on semantic gap with source
But do make virtual when lsquostreaming datarsquo is necessary
Because we have PS layerExposing all columns not necessary
Refactoring is more easyhellip
BC important notes
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Relations between Concepts + Context
In a historical way
hellipMerger of Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point
in time
Connector (C)
rwerschkull
nllinkedincominrogierwerschkull
Explicitly defining a driving key as metadatahellip
Gives business understanding
Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)
bull is not registered as a new lsquoconnectionrsquo
Connector Driving key
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description] [Owner Responsible]
Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)
Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record
Connector Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example
Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller
Concept Contexts:
• [valuation] Source NSS2 table p
• [description] Source NSS1 table x
• [adres] Source NSS1 table y
• [description] Source NTR table q
• [ovchip_personal] Source NSR table r
• [ovchip_on-usage] Source NSR table s
• [personal_details] Source NSR table t
• [adres_data] Source NSR table t
Connector: NSR-Travelmovement (Checkin timestamp, from, to)
Driving Key: NS-Travelcard + Checkin timestamp
Example-Data: Connector NSR-Travelmovement (between NSR-Station, NS-Travelcard and NS-Trainseries)
BK_NSR-Station-from | BK_NSR-Station-to | BK_NS_Trainseries | BK_NS-Travelcard | Checkin timestamp | Checkout timestamp | META_Load_dts | META_Load_end_dts
Asd | Ut | IC 855 | 3528 0234 2073 1234 | 5-4-2016 8:49:32 | 5-4-2016 9:40:12 | 6-4-2016 22:00:00 | 31-12-9999 00:00:00
Ut | Asd | IC 855 | 3528 0234 2073 1234 | 5-4-2016 18:10:09 | 5-4-2016 18:55:20 | 6-4-2016 22:00:00 | 31-12-9999 00:00:00
Are we there yet? No.
We add two mandatory HELPER entities
modelling
Business Alias
• Helps switching from sources that are tied together by technical (surrogate) keys… to a Business Key based model
• It's a LOOKUP table that translates the technical key to the Business Key
Example
NSR-Station
• [valuation] Source NSS2 table p
• [description] Source NSS1 table x
• [adres] Source NSS1 table y
BA-NSS1: Key Lookup for NSS1 source tables
BA-NSS2: Key Lookup for NSS2 source tables
Example-data
BA-NSS1: Key Lookup for NSS1 source tables
Business Key | NSS1_Surrogate_key
Ut | 123522
Asd | 666323
Asa | 222443
… | …

NSR-Station
Business Key | META_Source | META_Load_dts
Ut | NSS1_y | 5-6-2015 22:00:00
Asd | NSS1_y | 5-6-2015 22:00:00
Asa | NSS2_p | 6-6-2015 22:00:00
… | … | …
BA important details
• Has a 1 to (0,1) relation with a Business Concept
• More difficult to virtualise
• Lookup table should be kept small
• Therefore DO NOT do the key lookup in the Concept Context entity
• Load / generate together with the BC
• Preferably 'in memory' somehow…
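A minimal Python sketch of such an in-memory lookup, using the BA-NSS1 values from the example data. The function name `resolve_business_keys`, the delivery rows and the column name `station_id` are hypothetical; the sketch only illustrates translating a technical key once, at load time, so that downstream entities carry the business key.

```python
# BA-NSS1 held in memory: NSS1 surrogate key -> business key (example data values).
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def resolve_business_keys(rows, alias, key_column):
    """Split source rows into (resolved, unresolved) using the alias lookup."""
    resolved, unresolved = [], []
    for row in rows:
        bk = alias.get(row[key_column])
        if bk is None:
            # Unknown technical key: keep it aside instead of guessing.
            unresolved.append(row)
        else:
            resolved.append({**row, "business_key": bk})
    return resolved, unresolved


# Hypothetical NSS1 delivery that only carries the technical key.
delivery = [
    {"station_id": 123522, "postcode": "3500GJ"},
    {"station_id": 999999, "postcode": "0000XX"},
]
resolved, unresolved = resolve_business_keys(delivery, BA_NSS1, "station_id")
```

Keeping the alias as a small dictionary matches the advice to keep the lookup table small and preferably in memory; how unresolved keys are handled is left open here.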
BC-Timeline
• Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
• Like a Data Vault Point-in-time construct
• But mandatory
• And with a clearly defined and performant approach
Example
NSR-Station
• [valuation] Source NSS2 table p
• [adres] Source NSS1 table y

[valuation]
BK_NSR-Station | WOZ value | Value rating agency X | META_Laad_dts | META_Laad_eind_dts
Ut | 20 million | 18 million | 1-1-2014 22:00:00 | 1-1-2015 21:59:59
Ut | 22 million | 18 million | 1-1-2015 22:00:00 | 1-3-2016 21:59:59
Ut | 22 million | 23 million | 1-3-2016 22:00:00 | 31-12-9999 00:00:00

[adres]
BK_NSR-Station | Postadres_postcode | GPS | … | META_source | META_Load_dts | META_Load_end_dts
Ut | 3500GJ | 5208954 511064 | … | NSS1_y | 5-6-2013 22:00:00 | 4-7-2015 21:59:59
Ut | 3511 CE | 5237269 489299 | … | NSS1_y | 4-7-2015 22:00:00 | 31-12-9999 00:00:00
Asd | 1012 AB | 5237269 489299 | … | NSS1_y | 5-6-2013 22:00:00 | 31-12-9999 00:00:00
Asa | … | … | … | … | … | …

BCT
BK_NSR-Station | Combined_Load_dts | Combined_Load_end_dts
Ut | 5-6-2013 22:00:00 | 1-1-2014 21:59:59
Ut | 1-1-2014 22:00:00 | 1-1-2015 21:59:59
Ut | 1-1-2015 22:00:00 | 4-7-2015 21:59:59
Ut | 4-7-2015 22:00:00 | 1-3-2016 21:59:59
Ut | 1-3-2016 22:00:00 | 31-12-9999 00:00:00
Asd | 5-6-2013 22:00:00 | 31-12-9999 00:00:00
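The combined rows of the BC-Timeline example can be reproduced by collecting every load timestamp of every Concept Context per business key and end-dating each one second before the next. A minimal Python sketch under that assumption; the function name `build_timeline` and the input layout are illustrative, and the slides do not say how the 'clearly defined and performant approach' is actually implemented:

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)


def build_timeline(*contexts):
    """Each context is a list of (business_key, load_dts).

    Returns {business_key: [(combined_load_dts, combined_load_end_dts), ...]}.
    """
    starts = {}
    for context in contexts:
        for bk, load_dts in context:
            starts.setdefault(bk, set()).add(load_dts)

    timeline = {}
    for bk, stamps in starts.items():
        ordered = sorted(stamps)
        # Assumption: each slice ends one second before the next one starts.
        ends = [s - timedelta(seconds=1) for s in ordered[1:]] + [OPEN_END]
        timeline[bk] = list(zip(ordered, ends))
    return timeline


# Load timestamps from the [valuation] and [adres] example tables.
valuation = [
    ("Ut", datetime(2014, 1, 1, 22)),
    ("Ut", datetime(2015, 1, 1, 22)),
    ("Ut", datetime(2016, 3, 1, 22)),
]
adres = [
    ("Ut", datetime(2013, 6, 5, 22)),
    ("Ut", datetime(2015, 7, 4, 22)),
    ("Asd", datetime(2013, 6, 5, 22)),
]
timeline = build_timeline(valuation, adres)
```

For Ut this yields five slices starting on 5-6-2013, 1-1-2014, 1-1-2015, 4-7-2015 and 1-3-2016, and for Asd a single open-ended slice, matching the BCT rows in the example.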
modelling
What makes PS-3C a different Ensemble?
1) Explicitly splitting the work
• Data + History
• Subjects + Integration
2) NO HASHED BUSINESS KEYS or surrogate keys
Only concatenated ones
httpwwwcannabisculturecomfilesimages6hashbrickJPG
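A minimal Python sketch of a concatenated (unhashed) composite key. The separator `|`, the function name `concat_business_key` and the check for separators inside a key part are assumptions for illustration; the slides only state that keys are concatenated rather than hashed.

```python
def concat_business_key(*parts, sep="|"):
    """Join the parts of a composite business key into one readable string."""
    texts = [str(p).strip() for p in parts]
    for text in texts:
        if sep in text:
            # A separator inside a part would make two different keys collide.
            raise ValueError(f"key part {text!r} contains the separator {sep!r}")
    return sep.join(texts)


# Driving key of the travel movement example: travelcard + check-in timestamp.
movement_key = concat_business_key("3528 0234 2073 1234", "5-4-2016 8:49:32")
```

Unlike a hash, the resulting key stays human-readable and can be split back into its parts.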
3) Fewer joins
Relation + Technical validity timeline + Relation context, together in one entity
Photo credit: Public Domain
4) Explicit HELPER entities
• Business Alias
• Business Concept Timeline (BC-Timeline)
• + explicitly define Driving key(s)
Photo credit: Public Domain
Hope to AVOID this…
http://xkcd.com/927
PS-3C: A new PROPOSED ensemble modelling technique
Help needed
Questions?
About Me
'Head of BI', Spil Games
Certified Data Vault modeler since 2009
Contact details: nl.linkedin.com/in/rogierwerschkull
rogierwerschkullgmailcom
rwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Relations between Concepts + Context
In a historical way
hellipMerger of Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point
in time
Connector (C)
rwerschkull
nllinkedincominrogierwerschkull
Explicitly defining a driving key as metadatahellip
Gives business understanding
Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)
bull is not registered as a new lsquoconnectionrsquo
Connector Driving key
rwerschkull
nllinkedincominrogierwerschkull
Entity level [Description] [Owner Responsible]
Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)
Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record
Connector Metadata
rwerschkull
nllinkedincominrogierwerschkull
Example
NSR-StationNS-
Travelcard
NS-
TrainseriesNS-
Traveller
[valuation]Source NSS2 table p
[description]SourceNSS1 table x
[adres]Source NSS1 table y
[description]Source NTR table q
[ovchip_
personal]Source NSR table r
[ovchip_
on-usageSource NSR table s
[personal_
details]Source NSR table t
[adres_
data]Source NSR table t rwerschkull
nllinkedincominrogierwerschkull
NSR-
TravelmovementCheckin timestamp
from
to
Driving Key
NS-Travelcard
+Checkin timestamp
Example-Data
NSR-StationNS-
Travelcard
NS-
Trainseries
rwerschkull
nllinkedincominrogierwerschkull
NSR-
TravelmovementCheckin timestamp
from
to
BK_NSR-
Station-from
BK_NSR-
Station-to
BK_NS_
Trainseries
BK_NS-Travelcard Checkin
timestamp
Checkout
timestamp
META_Load_dts META_Load_end_dts
Asd Ut IC 855 3528 0234 2073
1234
5-4-2016 84932 5-4-2016 94012 6-4-2016
220000
31-12-9999
000000
Ut Asd IC 855 3528 0234 2073
1234
5-4-2016 181009 5-4-2016 185520 6-4-2016
220000
31-12-9999
000000
We add Two mandatory
HELPEREntities
Are we there yet No
rwerschkull
nllinkedincominrogierwerschkull
modelling
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
To help switching from sources that are Tied together by technical (surrogate) keyshellip
To a Business Key based model
Itrsquos a LOOKUP table that translates the technical to the
Business Key
Business Alias
Example
NSR-Station
[valuation]Source NSS2 table p
[description]SourceNSS1 table x[adres]
Source NSS1 table y
rwerschkull
nllinkedincominrogierwerschkull
BA-NSS1
BA-NSS2
Key Lookup voor NSS1 source tables
Key Lookup voor NSS2 source tables
Example-data
NSR-Station
rwerschkull
nllinkedincominrogierwerschkull
BA-NSS1
Key Lookup voor NSS1 source tables
Business Key NSS1_Surrogate_key
Ut 123522
Asd 666323
Asa 222443
hellip hellip
Business
Key
META_Source META_Load_dts
Ut NSS1_y 5-6-2015 220000
Asd NSS1_y 5-6-2015 220000
Asa NSS2_p 6-6-2015 220000
hellip hellip hellip
Has a 1 on (01) relation with a Business Concept
More difficult to be virtualised Lookup table should be kept small
Therefore DO NOT do key lookup in Concept Context entity
Load generate together with BC
Preferably lsquoin memoryrsquo somehowhellip
BA important Details
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details
Integrate the validity timelines of Concept Contextsbelonging to a Business Concept
Like a Data Vault Point-in-time construct
But Mandatory
And with a clearly defined and performantapproach
BC-Timeline
rwerschkull
nllinkedincominrogierwerschkull
Example
NSR-Station
[valuation]Source NSS2 table p
[adres]Source NSS1 table y
rwerschkull
nllinkedincominrogierwerschkull
BK_NSR-
Station
WOZ
waarde
Waarde
Ratingbureau X
META_Laad_dts META_Laad_eind_dts
Ut 20 milj 18 milj 1-1-2014 220000 1-1-2015 215959
Ut 22 milj 18 milj 1-1-2015 220000 1-3-2016 215959
Ut 22 milj 23 milj 1-3-2016 220000 31-12-9999 000000
BK_NSR-
Station
Combined_Load_dts Combined_Load_end_dts
Ut 5-6-2013 220000 1-1-2014 215959
Ut 1-1-2014 220000 1-1-2015 215959
Ut 1-1-2015 220000 4-7-2015 215959
Ut 4-7-2015 220000 1-3-2016 215959
Ut 1-3-2016 220000 31-12-9999 000000
Asd 5-6-2013 220000 31-12-9999 000000
BK_NSR-
Station
Postadres_
postcode
GPS hellip META_
source
META_Load_dts META_Load_end_dts
Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 4-7-2015 215959
Ut 3511 CE 5237269 489299 hellip NSS1_y 4-7-2015 220000 31-12-9999 000000
Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000
Asa hellip hellip hellip hellip hellip hellip
BCT
modelling
rwerschkull
nllinkedincominrogierwerschkull
What
Makes PS-3Ca Different Ensemble
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
rwerschkull
nllinkedincominrogierwerschkull
1) Explicitly Splitting The work
Data
+History
Subjects
+Integration
rwerschkull
nllinkedincominrogierwerschkull
2) NO HASHED BUSINeSS KEYS
or surrogate keys
httpwwwcannabisculturecomfilesimages6hashbrickJPG
Only
Concatenatedones
rwerschkull
nllinkedincominrogierwerschkull
3) Less
joinsRelation
+Technical validity timeline
+ Relation context
Together in one entityPhoto credit Public Domain
rwerschkull
nllinkedincominrogierwerschkull
4) Explicit helper entities
Business Alias
Business Concept Timeline (BC-Timeline)
+ explicitly define driving key(s)
Photo credit: Public Domain
Hope to avoid this…
http://xkcd.com/927
PS-3C: A new PROPOSED ensemble modelling technique
Help needed
Questions
About Me
'Head of BI', Spil Games
Certified Data Vault modeler since 2009
Contact details nllinkedincominrogierwerschkull
rogierwerschkullgmailcom
rwerschkull
Connector (C)
Relations between Concepts + Context
In a historical way
… a merger of the Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined: a (sub)set of keys that makes a Connector unique at one point in time
Connector Driving key
Explicitly defining a driving key as metadata…
• Gives business understanding
• Makes it possible for the Connector to handle delta data deliveries correctly, so that a change delivered for an existing driving key is not registered as a new 'connection'
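A minimal sketch of how a declared driving key could steer a delta load is given below. It is not the author's implementation: the column names (`travelcard`, `checkin_ts`, `station_from`, `station_to`, `checkout_ts`), the `apply_delta` function and the two-step delivery (check-in first, check-out a day later) are all hypothetical. Only the idea is taken from the slide: a delivery for an existing driving key closes the current record instead of creating a new connection.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)
DRIVING_KEY = ("travelcard", "checkin_ts")  # declared as metadata (assumed column names)


def apply_delta(connector, delta_rows, load_dts):
    """Apply a delta delivery to a Connector held as a list of dict rows."""
    current = {
        tuple(row[col] for col in DRIVING_KEY): row
        for row in connector
        if row["load_end_dts"] == HIGH_DATE
    }
    for new in delta_rows:
        dk = tuple(new[col] for col in DRIVING_KEY)
        old = current.get(dk)
        if old is not None:
            if all(old[k] == v for k, v in new.items()):
                continue  # nothing changed, keep the current record
            # Same driving key, different content: end-date the old record.
            old["load_end_dts"] = load_dts - timedelta(seconds=1)
        row = dict(new, load_dts=load_dts, load_end_dts=HIGH_DATE)
        connector.append(row)
        current[dk] = row
    return connector


card = "3528 0234 2073 1234"
checkin = datetime(2016, 4, 5, 8, 49, 32)
connector = []
# First delivery: traveller has checked in, no check-out known yet.
apply_delta(connector, [{"travelcard": card, "checkin_ts": checkin,
                         "station_from": "Asd", "station_to": None,
                         "checkout_ts": None}], datetime(2016, 4, 5, 22))
# Second delivery: same driving key, now with destination and check-out.
apply_delta(connector, [{"travelcard": card, "checkin_ts": checkin,
                         "station_from": "Asd", "station_to": "Ut",
                         "checkout_ts": datetime(2016, 4, 5, 9, 40, 12)}],
            datetime(2016, 4, 6, 22))
open_rows = [r for r in connector if r["load_end_dts"] == HIGH_DATE]
print(len(connector), len(open_rows))
```

Without the driving key, the second delivery would look like a second, unrelated travel movement; with it, the Connector holds one connection with two historical versions.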
Connector Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
Example
Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller
Concept Contexts:
[valuation] Source NSS2 table p
[description] Source NSS1 table x
[adres] Source NSS1 table y
[description] Source NTR table q
[ovchip_personal] Source NSR table r
[ovchip_on-usage] Source NSR table s
[personal_details] Source NSR table t
[adres_data] Source NSR table t
Connector: NSR-Travelmovement (Checkin timestamp, from, to)
Driving key: NS-Travelcard + Checkin timestamp
Example data
Connector NSR-Travelmovement (Checkin timestamp, from, to) between NSR-Station, NS-Travelcard and NS-Trainseries

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00
Are we there yet? No
We add two mandatory helper entities
modelling
3C - Details
Business Alias
Helps switching from sources that are tied together by technical (surrogate) keys… to a Business Key based model
It's a LOOKUP table that translates the technical key to the Business Key
Example
NSR-Station
[valuation] Source NSS2 table p
[description] Source NSS1 table x
[adres] Source NSS1 table y
BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables

Example data
BA-NSS1 (key lookup for NSS1 source tables)
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …
BA important details
Has a 1 to (0..1) relation with a Business Concept
More difficult to virtualise; the lookup table should be kept small
Therefore DO NOT do the key lookup in the Concept Context entity
Load / generate together with the BC
Preferably 'in memory' somehow…
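To make the lookup role of the Business Alias concrete, here is a small sketch that holds BA-NSS1 in memory and uses it to replace a source surrogate key with the Business Key. The dictionary layout, the function `to_business_key` and the source column names `station_id` and `postcode` are assumptions for illustration; only the key values come from the BA-NSS1 example data.

```python
# BA-NSS1 held in memory: NSS1 surrogate key -> Business Key (values from the example data).
ba_nss1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(source_rows, alias, surrogate_column="station_id"):
    """Swap the source surrogate key for the Business Key.

    Rows whose surrogate key is unknown to the alias are returned
    separately instead of being dropped silently.
    """
    resolved, unresolved = [], []
    for row in source_rows:
        business_key = alias.get(row[surrogate_column])
        if business_key is None:
            unresolved.append(row)
            continue
        out = {k: v for k, v in row.items() if k != surrogate_column}
        out["BK_NSR-Station"] = business_key
        resolved.append(out)
    return resolved, unresolved


# Hypothetical rows from NSS1 table y, keyed by the source's surrogate key.
nss1_y = [
    {"station_id": 123522, "postcode": "3511 CE"},
    {"station_id": 666323, "postcode": "1012 AB"},
    {"station_id": 999999, "postcode": "0000 XX"},  # not present in BA-NSS1
]

resolved, unresolved = to_business_key(nss1_y, ba_nss1)
print(resolved)
print(unresolved)
```

Keeping the alias as a plain key-to-key mapping is one way to keep it small enough to hold in memory, in line with the details listed above.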
modelling
rwerschkull
nllinkedincominrogierwerschkull
rwerschkull
nllinkedincominrogierwerschkull
Business
Concept
X
Concept
ContextX-A
Concept
ContextX-B
Business
Concept
Y
Concept
Context Y-A
Concept
ContextY-B
Concept
ContextY-C
Connector
Business Alias A
Business Alias B
BC-Timeline X
3C - Details: Business Alias
Helps with switching from sources that are tied together by technical (surrogate) keys to a Business Key based model.
It is a lookup table that translates the technical key to the Business Key.
Example: NSR-Station
[valuation] Source NSS2, table p
[description] Source NSS1, table x
[address] Source NSS1, table y
BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables
Example data
BA-NSS1 (key lookup for NSS1 source tables):
Business Key | NSS1_Surrogate_key
Ut           | 123522
Asd          | 666323
Asa          | 222443
…            | …
NSR-Station:
Business Key | META_Source | META_Load_dts
Ut           | NSS1_y      | 5-6-2015 22:00:00
Asd          | NSS1_y      | 5-6-2015 22:00:00
Asa          | NSS2_p      | 6-6-2015 22:00:00
…            | …           | …
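The lookup role of a Business Alias can be sketched as follows. This is only an illustration under assumptions: the dictionary holds the three BA-NSS1 rows from the example data, while the source column names `station_id` and `postcode`, the output column `BK_NSR_Station`, the unknown surrogate key 999999 and the function `to_business_key` are hypothetical.

```python
# Business Alias for source NSS1: surrogate key -> Business Key
# (the three rows shown in the example data above).
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(rows, alias, key_field="station_id"):
    """Replace the source surrogate key in each row by the Business Key.

    Rows whose surrogate key is not present in the alias are returned
    separately, so they can be handled instead of silently dropped.
    """
    resolved, unresolved = [], []
    for row in rows:
        business_key = alias.get(row[key_field])
        if business_key is None:
            unresolved.append(row)
            continue
        out = {k: v for k, v in row.items() if k != key_field}
        out["BK_NSR_Station"] = business_key
        resolved.append(out)
    return resolved, unresolved


# Hypothetical source rows that still carry the technical key.
source_rows = [
    {"station_id": 123522, "postcode": "3511 CE"},
    {"station_id": 999999, "postcode": "0000 XX"},
]
resolved, unresolved = to_business_key(source_rows, BA_NSS1)
print(resolved)
print(unresolved)
```

Holding the alias in a plain in-memory mapping is one possible reading of the wish to keep the lookup table small.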
BA: important details
Has a 1-to-(0..1) relation with a Business Concept
More difficult to virtualise; the lookup table should be kept small
Therefore do NOT do the key lookup in a Concept Context entity
Load / generate together with the BC
Preferably 'in memory' somehow…
3C - Details: BC-Timeline
Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
Like a Data Vault point-in-time construct, but mandatory, and with a clearly defined and performant approach
Example: NSR-Station
[valuation] Source NSS2, table p
[address] Source NSS1, table y
[valuation]
BK_NSR-Station | WOZ value  | Value (rating agency X) | META_Laad_dts     | META_Laad_eind_dts
Ut             | 20 million | 18 million              | 1-1-2014 22:00:00 | 1-1-2015 21:59:59
Ut             | 22 million | 18 million              | 1-1-2015 22:00:00 | 1-3-2016 21:59:59
Ut             | 22 million | 23 million              | 1-3-2016 22:00:00 | 31-12-9999 00:00:00
BCT
BK_NSR-Station | Combined_Load_dts | Combined_Load_end_dts
Ut             | 5-6-2013 22:00:00 | 1-1-2014 21:59:59
Ut             | 1-1-2014 22:00:00 | 1-1-2015 21:59:59
Ut             | 1-1-2015 22:00:00 | 4-7-2015 21:59:59
Ut             | 4-7-2015 22:00:00 | 1-3-2016 21:59:59
Ut             | 1-3-2016 22:00:00 | 31-12-9999 00:00:00
Asd            | 5-6-2013 22:00:00 | 31-12-9999 00:00:00
[address]
BK_NSR-Station | Postadres_postcode | GPS              | … | META_source | META_Load_dts     | META_Load_end_dts
Ut             | 3500GJ             | 52.08954 5.11064 | … | NSS1_y      | 5-6-2013 22:00:00 | 4-7-2015 21:59:59
Ut             | 3511 CE            | 52.37269 4.89299 | … | NSS1_y      | 4-7-2015 22:00:00 | 31-12-9999 00:00:00
Asd            | 1012 AB            | 52.37269 4.89299 | … | NSS1_y      | 5-6-2013 22:00:00 | 31-12-9999 00:00:00
Asa            | …                  | …                | … | …           | …                 | …
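How the combined timeline for station Ut follows from the two context timelines can be sketched as below. This is not the author's loading procedure, only one way to reproduce the BCT rows of the example. It assumes each context timeline is gap-free, that a slice ends one second before the next one starts (as in the tables), and that 31-12-9999 00:00:00 marks an open end; the name `combined_timeline` is hypothetical.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # open-ended marker used in the tables


def combined_timeline(*contexts):
    """Merge the validity timelines of several contexts of one business key.

    Each context is a list of (load_dts, load_end_dts) tuples. A new
    combined slice starts at every load_dts found in any context and ends
    one second before the next slice starts; the last slice is open-ended.
    """
    starts = sorted({start for context in contexts for start, _ in context})
    slices = []
    for i, start in enumerate(starts):
        if i + 1 < len(starts):
            end = starts[i + 1] - timedelta(seconds=1)
        else:
            end = HIGH_DATE
        slices.append((start, end))
    return slices


# Timelines for station Ut, taken from the [address] and [valuation] tables.
address = [
    (datetime(2013, 6, 5, 22), datetime(2015, 7, 4, 21, 59, 59)),
    (datetime(2015, 7, 4, 22), HIGH_DATE),
]
valuation = [
    (datetime(2014, 1, 1, 22), datetime(2015, 1, 1, 21, 59, 59)),
    (datetime(2015, 1, 1, 22), datetime(2016, 3, 1, 21, 59, 59)),
    (datetime(2016, 3, 1, 22), HIGH_DATE),
]
timeline = combined_timeline(address, valuation)
for start, end in timeline:
    print(start, end)
```

For Ut this yields five slices, matching the five Ut rows in the BCT table.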
2) NO HASHED BUSINeSS KEYS
or surrogate keys
httpwwwcannabisculturecomfilesimages6hashbrickJPG
Only
Concatenatedones
rwerschkull
nllinkedincominrogierwerschkull
3) Less
joinsRelation
+Technical validity timeline
+ Relation context
Together in one entityPhoto credit Public Domain
rwerschkull
nllinkedincominrogierwerschkull
4) Explicit
HelpEREntities
Business Alias
Business Component Timeline
+ explicitly define Driving key(s)
Photo credit Public Domainrwerschkull
nllinkedincominrogierwerschkull
Hope to
AVOIDThishellip
httpxkcdcom927
PS-3CA new PROPOSEDensemble modelling technique
Help
needed
Questions
About Me
lsquoHead of BIrsquo Spillgames
Certified Data Vault modeler since 2009
Contact details nllinkedincominrogierwerschkull
rogierwerschkullgmailcom
rwerschkull
3) Less
joinsRelation
+Technical validity timeline
+ Relation context
Together in one entityPhoto credit Public Domain
rwerschkull
nllinkedincominrogierwerschkull
4) Explicit
HelpEREntities
Business Alias
Business Component Timeline
+ explicitly define Driving key(s)
Photo credit Public Domainrwerschkull
nllinkedincominrogierwerschkull
Hope to
AVOIDThishellip
httpxkcdcom927
PS-3CA new PROPOSEDensemble modelling technique
Help
needed
Questions
About Me
lsquoHead of BIrsquo Spillgames
Certified Data Vault modeler since 2009
Contact details nllinkedincominrogierwerschkull
rogierwerschkullgmailcom
rwerschkull
4) Explicit
HelpEREntities
Business Alias
Business Component Timeline
+ explicitly define Driving key(s)
Photo credit Public Domainrwerschkull
nllinkedincominrogierwerschkull
Hope to
AVOIDThishellip
httpxkcdcom927
PS-3CA new PROPOSEDensemble modelling technique
Help
needed
Questions
About Me
lsquoHead of BIrsquo Spillgames
Certified Data Vault modeler since 2009
Contact details nllinkedincominrogierwerschkull
rogierwerschkullgmailcom
rwerschkull
Hope to
AVOIDThishellip
httpxkcdcom927
PS-3CA new PROPOSEDensemble modelling technique
Help
needed
Questions
About Me
lsquoHead of BIrsquo Spillgames
Certified Data Vault modeler since 2009
Contact details nllinkedincominrogierwerschkull
rogierwerschkullgmailcom
rwerschkull
PS-3CA new PROPOSEDensemble modelling technique
Help
needed
Questions
About Me
lsquoHead of BIrsquo Spillgames
Certified Data Vault modeler since 2009
Contact details nllinkedincominrogierwerschkull
rogierwerschkullgmailcom
rwerschkull