PS-3C A new ensemble modelling technique

PS-3C Data Modelling Zone Berlin

Jan 18, 2017

Page 1: PS-3C Data Modelling Zone Berlin

PS-3C: A new ensemble modelling technique

About Me

'Head of BI', Spil Games

Certified Data Vault modeler since 2009

Contact details: nl.linkedin.com/in/rogierwerschkull

rogierwerschkull@gmail.com

@rwerschkull

Ensemble data modelling…

Why another Ensemble? We've got loads already.

http://topofminds.se/wp/2014/ensemble-modeling-forelasningar-och-lars-ronnback

[Bar chart "Ensemble Popularity": search hits on Google on 31-8-2016, axis 0 to 16000. Search terms, in order: "Data Vault modeling" + "data warehouse"; "Anchor modeling" + "data warehouse"; "Hyper agility" + "datawarehouse"; "Focal point modeling" + "data warehouse"; "Head version modeling" + "data warehouse". Data labels, in order: 13600, 9970, 370, 73, 5.]

Ran into problems using: Data Vault, Head-Version, Anchor Modelling

What problems?

Photo: my own

Problem: not built for Big Data Lake data centricity

Photo credit: Lake, Public Domain; http://www.writeups.org/star-trek-brent-spiner-data

'Data may first be stored in a data lake so that it can be explored, cleaned and prepared. If it can be structured in a relational format (basically rows and columns) and needs to be used frequently and kept highly secure, it may go into a data warehouse. If it stops being used frequently, it may go back to a HDFS (Hadoop Distributed File System)-based archive.'

Data centric = data first. Thomas H. Davenport, Wall Street Journal, 3-6-2015

http://blogs.wsj.com/cio/2015/06/03/the-shift-to-a-new-data-architecture

Systems like

Data flood

Photo credit: Kurayba (https://www.flickr.com/photos/48503330@N08/28564454666), under CC licence (https://creativecommons.org/licenses/by-sa/2.0)

The possible result…

Photo credit: https://highfiveexports.wordpress.com/2010/06/25/3000-pieces-lego-mix-specialty-pieces-rare-pieces-bricks-blocks-parts-more-ultimate-lot-of-lego-parts-pieces-lego-for-sale-lego-batman-lego-starwars-lego-technic-lego-minifigur

But isn't Data Vault v2 'made for Big Data centric systems'?

In DV2 you still do this in one go: Subject Oriented, Integrated, Time Variant, Non-Volatile EDW

Coding = a lot like modelling

Being data centric conflicts with the complex data modelling work

http://xkcd.com/844

Problem: less mature JOIN optimizers

Photo credit: Public Domain

Key-Value, Document and Column Family NoSQL databases + SQL-on-Hadoop solutions do NOT like joins

Photo credit: Public Domain

This REALLY complicates Anchor Modeling

And personally: this one too (Link Satellite)

[Diagram: HUBs with their SATs, connected by a LINK that has its own SAT, the Link Satellite]

Problem:

a) Hashing of business keys

Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32

Hashing…

- Loses statistical information regarding the data distribution. Query optimizers do not like this…
- Column family, Document and Key-value databases need a good (natural) sharding key for (partial) key-lookups

http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2
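The effect described on the hashing slide can be tried out with a few lines of Python. This is only a minimal sketch: the helper names `concat_business_key` and `md5_key` are made up for illustration, and the timestamps are assumed to be written with colons. It builds concatenated keys like the ones in the table, hashes them with MD5, and shows that a partial-key lookup is a plain prefix match on the natural key, something the fixed-length hash no longer supports.

```python
import hashlib

DELIMITER = "|"  # assumed separator, as in the concatenated keys on the slide


def concat_business_key(*parts: str) -> str:
    """Join the parts of a composite business key into one natural key."""
    return DELIMITER.join(parts)


def md5_key(business_key: str) -> str:
    """Return the 32-character hexadecimal MD5 digest of a business key."""
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()


# (rolling stock nr, datetime, sensor id); illustrative rows only
readings = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:27", "72A1_SLDET"),
    ("8674", "2015-01-22 01:34:26", "72A1_FINV"),
]

natural_keys = [concat_business_key(*r) for r in readings]
hashed_keys = [md5_key(k) for k in natural_keys]

# "Everything for rolling stock 8739" is a prefix match on the natural key.
prefix = "8739" + DELIMITER
matches = [k for k in natural_keys if k.startswith(prefix)]

for natural, hashed in zip(natural_keys, hashed_keys):
    print(f"{len(natural):>2}  {natural:<40}  {len(hashed)}  {hashed}")
print(f"{len(matches)} natural keys start with {prefix!r}")
```

The natural keys keep their length differences and their shared leading parts; the digests are always 32 characters and carry no such structure, which is the statistical information the slide says gets lost.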

b) Surrogate business keys

Surrogate keys require centralized coordination… and thus can impact the overall system's scalability and availability

A lot of MPP NoSQL databases simply do not have them…

Then some inspiration

http://roelantvos.com/blog/?p=1119

'In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse.'

'The Historical Staging Area effectively 'acts' as Data Lake, but in a better defined form, as data deltas and event datetimes are taken into account.'

So:

[Diagram: the EDW (Subject Oriented, Integrated, Time Variant, Non-Volatile) split into a "Time Variant & Non-Volatile" part and a "Subject Oriented & Integrated" part]

[Diagram: within the EDW, the "Time Variant & Non-Volatile" part could be a Data Lake; the "Subject Oriented & Integrated" part could be a virtualised Ensemble tier]

How does PS-3C work?

Photo credit: Public Domain

Focus of current ensemble EDWs

[Diagram: Staging Area, EDW, Information Marts]

Splitting the work

[Diagram: Persistent Staging Area / HSA (= Data Library), EDW, Information Marts]

Persistent Staging - Concept Context, Connector, Business Concept

modelling

Persistent Staging - how

- Identify the source / event stream Primary or Unique Key. Use source metadata for this.
- Automate the building of a PS 'around this key'. Take all columns.
- Historize using an SCD-2 approach
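The SCD-2 historization step can be sketched as follows. This is an illustration under assumptions rather than the author's loader: the names `PsRow` and `historize` are hypothetical, rows live in a plain Python list, and a closed version gets the next version's load timestamp as its end (the example data in the slides uses one second earlier instead).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional

HIGH_DATE = datetime(9999, 12, 31)  # marks the open, currently valid version


@dataclass
class PsRow:
    key: str                    # source primary or unique key
    attributes: Dict[str, str]  # all source columns, taken as delivered
    load_dts: datetime
    load_end_dts: datetime = HIGH_DATE


def current_row(history: List[PsRow], key: str) -> Optional[PsRow]:
    """Return the open version for a key, if there is one."""
    for row in history:
        if row.key == key and row.load_end_dts == HIGH_DATE:
            return row
    return None


def historize(history: List[PsRow], key: str,
              attributes: Dict[str, str], load_dts: datetime) -> bool:
    """Store a new version only when the attributes changed (SCD type 2)."""
    current = current_row(history, key)
    if current is not None:
        if current.attributes == attributes:
            return False                  # unchanged delivery, nothing to do
        current.load_end_dts = load_dts   # close the previous version
    history.append(PsRow(key, dict(attributes), load_dts))
    return True


ps_station: List[PsRow] = []
historize(ps_station, "Ut", {"postcode": "3500GJ"}, datetime(2013, 6, 5, 22))
historize(ps_station, "Ut", {"postcode": "3511 CE"}, datetime(2015, 6, 5, 22))
for row in ps_station:
    print(row.key, row.attributes, row.load_dts, row.load_end_dts)
```

Closing the previous version is the update that the metadata slides call "possible but difficult" on append-only storage.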

Persistent Staging Metadata-1

Entity level:
- Unique key
- Functional Description
- Delivering party
- Owner / Responsible
- MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
- …

Persistent Staging Metadata-2

Column level:
- [Load Date Timestamp]
- [Load End Date Timestamp]
- [Deleted Flag] OR delete as new record
- [Source system] on table / file level (lowest possible)

Load End Date Timestamp: possible but difficult… Requires updates

Updates in Hive (isn't HDFS append only?)

- ACID is possible in Hive
- ACID makes updates possible by registering updates as 'new data'
- Reconciliation / compacting when idle / at user command
- Use ORC files
- PLUS changing the Hive configuration…

What about semi-structured data?

- Hive: put semi-structured data (= variable columns) in a MAP data type
- OR use a data storage type that supports schema evolution: AVRO (ORC in development)
- Or HBase… It only has one data type (byte); schema is 'applied'. Schema can be different for every row

modelling

3C - how

Like Data Vault (2), BUT:
- Always starts with Conceptual data modeling
- NOT the primary location of Data & History
- Virtualised (only if performance allows)
- Should be deterministic
- No Link Satellites
- No Surrogate or Hash Keys, only 'Concatenated Natural Business Keys'
- Explicit Helper entities

3C - Details

[Diagram: Business Concept X (Concept Contexts X-A, X-B), Business Concept Y (Concept Contexts Y-A, Y-B, Y-C), Connector, Business Alias A, Business Alias B, BC-Timeline X]

Business Concept (BC)

- a UNIQUE, domain-specific point of integration
- …a business entity
- …within its own domain
- …does not necessarily need to be Enterprise Wide

Why not 'enterprise wide'?

[Diagram: Company, Customer, Sales Customer, International Sales Customer, Local Sales Customer, Marketing Customer, Customer …]

Business Concept Metadata

Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Example-Data

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

NS-Trainseries, Business Key: IC|855, IC|8852, Sp|7455, St|16050, …

NSR-Station:
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Travelcard, Business Key: 3528 0234 2073 1234, 3528 0234 2073 5678, …

NS-Traveller, Business Key: CRM-RW123456, CRM-LAS224466, …

BC important notes

- Easiest entity to virtualise (if performance allows)
- No hashing & no surrogate BUSINESS KEYS (not by default at least)
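One way to picture a virtualised Business Concept is as a derived, deterministic view over persistent staging: the distinct business keys, each with the source and load timestamp of its first appearance. The sketch below assumes exactly that; `StagedRow`, `BusinessConceptRow` and `business_concept_view` are illustrative names, not part of PS-3C.

```python
from datetime import datetime
from typing import Dict, Iterable, List, NamedTuple


class StagedRow(NamedTuple):
    business_key: str
    source: str
    load_dts: datetime


class BusinessConceptRow(NamedTuple):
    business_key: str
    meta_source: str
    meta_load_dts: datetime


def business_concept_view(staged: Iterable[StagedRow]) -> List[BusinessConceptRow]:
    """Derive one row per business key: the earliest staged occurrence."""
    first_seen: Dict[str, StagedRow] = {}
    for row in staged:
        known = first_seen.get(row.business_key)
        if known is None or row.load_dts < known.load_dts:
            first_seen[row.business_key] = row
    # Sorting keeps the result independent of the input order.
    ordered = sorted(first_seen.values(), key=lambda r: r.business_key)
    return [BusinessConceptRow(r.business_key, r.source, r.load_dts) for r in ordered]


staged_stations = [
    StagedRow("Ut", "NSS1_y", datetime(2015, 6, 5, 22)),
    StagedRow("Asd", "NSS1_y", datetime(2015, 6, 5, 22)),
    StagedRow("Asa", "NSS2_p", datetime(2015, 6, 6, 22)),
    StagedRow("Ut", "NSS1_y", datetime(2015, 7, 4, 22)),  # later version, same key
]
for bc_row in business_concept_view(staged_stations):
    print(bc_row)
```

Because nothing is stored, the view can be dropped and rebuilt at will; whether it performs well enough is the "if performance allows" caveat from the notes.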

3C - Details

Concept Context (CC)

- Contains context about a Concept, in a historical way
- …like a Data Vault Satellite
- Every CC belongs to only one BC
- Separate entity per source system table / stream

Concept Context Metadata

Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
- [Load End Date Timestamp]
- [Deleted Flag] OR register a delete as a new record

Example

[Diagram: Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts:]
- [valuation], source NSS2 table p
- [description], source NSS1 table x
- [adres], source NSS1 table y
- [description], source NTR table q
- [ovchip_personal], source NSR table r
- [ovchip_on-usage], source NSR table s
- [personal_details], source NSR table t
- [adres_data], source NSR table t

Example-data

NSR-Station, [adres], source NSS1 table y

BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                  …  …            …                  …

CC important notes

- More difficult to virtualise: depends on the semantic gap with the source
- But do make it virtual when 'streaming data' is necessary
- Because we have the PS layer, exposing all columns is not necessary
- Refactoring is easier…

3C - Details

Connector (C)

- Relations between Concepts + Context, in a historical way
- …merger of Data Vault Link + Link Satellite
- Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time

Connector Driving key

Explicitly defining a driving key as metadata…
- Gives business understanding
- Makes it possible for the Connector to correctly handle delta data deliveries, so that a change (on the driving key) is not registered as a new 'connection'
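One possible reading of the driving-key rule, sketched in Python: when a delta arrives whose driving-key values match an open Connector row but whose other keys differ, the open row is end-dated and the delta becomes its successor, instead of both rows staying open as two separate connections. `ConnectorRow`, `load_delta` and the column names are assumptions made for this sketch, and the corrected destination in the usage is invented to trigger the case.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple

HIGH_DATE = datetime(9999, 12, 31)  # marks an open (current) connection


@dataclass
class ConnectorRow:
    keys: Dict[str, str]  # business keys plus relation context
    load_dts: datetime
    load_end_dts: datetime = HIGH_DATE


def driving_values(keys: Dict[str, str], driving_key: Tuple[str, ...]) -> Tuple[str, ...]:
    """Pick the values of the driving-key columns, in a fixed order."""
    return tuple(keys[name] for name in driving_key)


def load_delta(connector: List[ConnectorRow], driving_key: Tuple[str, ...],
               keys: Dict[str, str], load_dts: datetime) -> None:
    """Apply one delivered row, superseding the open row with the same driving key."""
    wanted = driving_values(keys, driving_key)
    for row in connector:
        is_open = row.load_end_dts == HIGH_DATE
        if is_open and driving_values(row.keys, driving_key) == wanted:
            if row.keys == keys:
                return                    # identical delivery, nothing changes
            row.load_end_dts = load_dts   # same connection, new version
    connector.append(ConnectorRow(dict(keys), load_dts))


DRIVING_KEY = ("travelcard", "checkin_ts")  # as in the travel movement example
movements: List[ConnectorRow] = []
first = {"travelcard": "3528 0234 2073 1234", "checkin_ts": "2016-04-05 08:49:32",
         "station_from": "Asd", "station_to": "Ut"}
load_delta(movements, DRIVING_KEY, first, datetime(2016, 4, 6, 22))
# Hypothetical correction of the destination for the same check-in:
load_delta(movements, DRIVING_KEY, dict(first, station_to="Asa"), datetime(2016, 4, 7, 22))
for m in movements:
    print(m.keys["station_to"], m.load_dts, m.load_end_dts)
```

Without the driving key as metadata the loader could not tell a corrected travel movement from a second one, which is why the slide asks for it to be defined explicitly.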

Connector Metadata

Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
- [Load End Date Timestamp]
- [Deleted Flag] OR register a delete as a new record

Example

[Diagram: the Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts, plus the Connector NSR-Travelmovement (Checkin timestamp, from, to)]

Driving Key: NS-Travelcard + Checkin timestamp

Example-Data

NSR-Travelmovement (Checkin timestamp, from, to), connecting NSR-Station, NS-Travelcard and NS-Trainseries

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No.

We add two mandatory HELPER entities

modelling

3C - Details

Business Alias

- Helps switching from sources that are tied together by technical (surrogate) keys… to a Business Key based model
- It's a LOOKUP table that translates the technical key to the Business Key

Example

NSR-Station with Concept Contexts [valuation] (source NSS2 table p), [description] (source NSS1 table x) and [adres] (source NSS1 table y)

BA-NSS1: Key Lookup for NSS1 source tables
BA-NSS2: Key Lookup for NSS2 source tables

Example-data

NSR-Station:
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

BA-NSS1: Key Lookup for NSS1 source tables
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

BA important details

- Has a 1-to-(0..1) relation with a Business Concept
- More difficult to virtualise
- Lookup table should be kept small. Therefore DO NOT do key lookup in Concept Context entity
- Load / generate together with BC
- Preferably 'in memory' somehow…
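Held in memory, a Business Alias is little more than a dictionary from a source's technical key to the business key. The following sketch uses the BA-NSS1 example values; the function names and the `station_sk` column are hypothetical, and where in the load process the translation happens is left open here.

```python
from typing import Dict, List, Optional, Tuple

# Business Alias for source system NSS1: surrogate key -> business key.
ba_nss1: Dict[int, str] = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(alias: Dict[int, str], surrogate_key: int) -> Optional[str]:
    """Translate one technical key; None when the alias does not know it."""
    return alias.get(surrogate_key)


def rekey(rows: List[dict], alias: Dict[int, str],
          surrogate_column: str) -> Tuple[List[dict], List[dict]]:
    """Swap the surrogate column for a business_key column, row by row.

    Rows whose surrogate key is unknown are returned separately so that
    they can be handled or reported instead of being dropped silently.
    """
    translated: List[dict] = []
    unresolved: List[dict] = []
    for row in rows:
        business_key = alias.get(row[surrogate_column])
        if business_key is None:
            unresolved.append(row)
            continue
        rekeyed = {k: v for k, v in row.items() if k != surrogate_column}
        rekeyed["business_key"] = business_key
        translated.append(rekeyed)
    return translated, unresolved


source_rows = [{"station_sk": 123522, "postcode": "3500GJ"},
               {"station_sk": 424242, "postcode": "unknown"}]
ok_rows, unknown_rows = rekey(source_rows, ba_nss1, "station_sk")
print(ok_rows)
print(unknown_rows)
```

Keeping the alias to two narrow columns is what makes holding it in memory realistic.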

3C - Details

BC-Timeline

- Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
- Like a Data Vault Point-in-time construct
- But mandatory
- And with a clearly defined and performant approach

Example

NSR-Station with Concept Contexts [valuation] (source NSS2 table p) and [adres] (source NSS1 table y)

[valuation]:
BK_NSR-Station  WOZ value   Value Rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

BCT:
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00

[adres]:
BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                  …  …            …                  …

modelling
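The combined timeline in the example can be reproduced with a small sketch. It assumes that each Concept Context delivers contiguous validity intervals for one business key, that the latest interval is open-ended, and that intervals are half-open (an end equals the next start, where the slides use one second earlier). `combined_timeline` is an illustrative name.

```python
from datetime import datetime
from typing import Iterable, List, Tuple

HIGH_DATE = datetime(9999, 12, 31)
Interval = Tuple[datetime, datetime]  # (load_dts, load_end_dts), end exclusive


def combined_timeline(*contexts: Iterable[Interval]) -> List[Interval]:
    """Merge the change points of several Concept Contexts into one timeline.

    Every distinct load timestamp starts a new slice; each slice runs
    until the next change point, and the last slice stays open.
    """
    starts = sorted({start for context in contexts for start, _end in context})
    ends = starts[1:] + [HIGH_DATE]
    return list(zip(starts, ends))


# Validity intervals for station "Ut", taken from the example tables.
valuation = [
    (datetime(2014, 1, 1, 22), datetime(2015, 1, 1, 22)),
    (datetime(2015, 1, 1, 22), datetime(2016, 3, 1, 22)),
    (datetime(2016, 3, 1, 22), HIGH_DATE),
]
adres = [
    (datetime(2013, 6, 5, 22), datetime(2015, 7, 4, 22)),
    (datetime(2015, 7, 4, 22), HIGH_DATE),
]

timeline_ut = combined_timeline(valuation, adres)
for start, end in timeline_ut:
    print(start, "->", end)
```

For "Ut" this gives five slices starting on 5-6-2013, 1-1-2014, 1-1-2015, 4-7-2015 and 1-3-2016, the same change points as the BCT table.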

What makes PS-3C a different Ensemble?

1) Explicitly splitting the work: Data + History versus Subjects + Integration

2) NO hashed business keys or surrogate keys. Only concatenated ones

http://www.cannabisculture.com/files/images/6/hashbrick.JPG

3) Fewer joins: Relation + Technical validity timeline + Relation context, together in one entity

Photo credit: Public Domain

4) Explicit HELPER entities: Business Alias, Business Concept Timeline + explicitly define Driving key(s)

Photo credit: Public Domain

Hope to AVOID this…

http://xkcd.com/927

PS-3C: A new PROPOSED ensemble modelling technique

Help needed

Questions?

Page 2: PS-3C Data Modelling Zone Berlin

About Me

lsquoHead of BIrsquo Spilgames

Certified Data Vault modeler since 2009

Contact details nllinkedincominrogierwerschkull

rogierwerschkullgmailcom

rwerschkull

Ensemble data Modellinghellip

rwerschkull

nllinkedincominrogierwerschkull

WHY

Another Ensemble

Wersquove got loads already

httptopofmindssewp2014ensemble-modeling-forelasningar-och-lars-ronnbackrwerschkull

nllinkedincominrogierwerschkull

13600

9970

37073 5

0

2000

4000

6000

8000

10000

12000

14000

16000

Data Vault modeling+data warehouse

Anchor modeling+data warehouse

Hyper agility +datawarehouse

Focal point modeling+data warehouse

Head version modeling+data warehouse

Search Hits on Google 31-8-2016

Ensemble

Popularity

Data Vault

Ran into Problems using

Head-Version

Anchor Modelling

rwerschkull

nllinkedincominrogierwerschkull

WHAT problemS

Photo My ownhelliphelliprwerschkull

nllinkedincominrogierwerschkull

problem

rwerschkull

nllinkedincominrogierwerschkull

Not Build for

BiG data lake

Data CentrICITY

Photo credit Lake Public Domain httpwwwwriteupsorgstar-trek-brent-spiner-datarwerschkull

nllinkedincominrogierwerschkull

lsquoData may first be stored in a

data lake so that it can be explored cleaned and prepared

If it can be structured in a relational format (basically rows and columns) and needs to be used frequently and kept highly secure it may go into a

data warehouse

If it stops being used frequently it may go back to a HDFS

(Hadoop Distributed File System)-based archiversquo

Data Centric data first THOMAS H DAVENPORT WALL STEET JOURNAL OF 3-6-2015

httpblogswsjcomcio20150603the-shift-to-a-new-data-architecture rwerschkull

nllinkedincominrogierwerschkull

Systems Like

rwerschkull

nllinkedincominrogierwerschkull

Data Flood

Photo credit Kurayba (httpswwwflickrcomphotos48503330N0828564454666 )

under cc licence (httpscreativecommonsorglicensesby-sa20)

rwerschkull

nllinkedincominrogierwerschkull

The possible resulthellip

Photo credit httpshighfiveexportswordpresscom201006253000-pieces-lego-mix-specialty-pieces-rare-pieces-bricks-blocks-

parts-more-ultimate-lot-of-lego-parts-pieces-lego-for-sale-lego-batman-lego-starwars-lego-technic-lego-minifigur

rwerschkull

nllinkedincominrogierwerschkull

But isNrsquot Data Vault v2

lsquomade for

Big data centric

systemsrsquorwerschkull

nllinkedincominrogierwerschkull

In

DV2you still

do thisin one go

Subject Oriented

Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Coding =

A lot like modelling

Being Data Centric

conflictswith thecomplex Data

MODELLINGwork

httpxkcdcom844

problem

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Less Mature

JOINoptimizers

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

Key-Value

Document

Column Family

NoSQL Databases

+SQL on Hadoop solutions

Do NOT like joins

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

This

REALLYcomplicates

AnchoRModeling

rwerschkull

nllinkedincominrogierwerschkull

And Personally

HUB

SAT

LINK

This one

too(Link Satellite)

HUB

HUB

SAT

SAT

rwerschkull

nllinkedincominrogierwerschkull

problem

rwerschkull

nllinkedincominrogierwerschkull

a) HASHING OF Business keys

Rolling

Stock Nr Datetime Sensor Id Value Concatenated Business Key

Key

Len MD5 Hash

Key

Len

8739

2015-01-22

013427 72A1_FINV 123 8739|2015-01-22 013427|72A1_FINV 34 86ae4c6b0e2e2d5a13a0d11440529aeb 32

8739

2015-01-22

013427 72A1_SLDET 100 8739|2015-01-22 013427|72A1_SLDET 35 51ce9bc292eef407bd7c91a52eebcf2e 32

8739

2015-01-22

013432 72A1_FINV 126 8739|2015-01-22 013432|72A1_FINV 34 9482a41c1fecc4c64b8c437af6cc85e8 32

8739

2015-01-22

013432

13A8_MW_UB

AT_VT 5 8739|2015-01-22 013432|13A8_MW_UBAT_VT 42 e4160914ee55ce0b93f87b23366a0ce3 32

8674

2015-01-22

013426 72A1_FINV 6 8674|2015-01-22 013426|72A1_FINV 34 fcb3e7c8c91e44ce396d908a4948ca65 32

8674

2015-01-22

013426

16A1_HSVER

OND 7 8674|2015-01-22 013426|16A1_HSVEROND 38 fe9098c8c291ad56af5c8afae5169196 32

Loses statistical Informationregarding the data distributionQuery optimizers do not like thishellip

Column family Document and Key-value databases need a

good (natural) sharding key for (partial) key-

lookups

Hashinghelliphellip

httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2rwerschkull

nllinkedincominrogierwerschkull

Surrogates keys require

centralized coordination

hellipand thus can impact the overall systemrsquos scalability and availability

A lot of MPP NoSQL databases simply do not have themhellip

B) Surrogate BuSINESS keys

rwerschkull

nllinkedincominrogierwerschkull

Then Some Inspiration

httproelantvoscomblogp=1119

rwerschkull

nllinkedincominrogierwerschkull

lsquoIn my opinion the answer lies in the adoption of the

persistent (Historical) Staging Area concept

(also known as Historical Staging or the History Area)

This basically adopts the fundamentals of a Data Warehousersquo

lsquoThe Historical Staging Area effectively lsquoactsrsquo as

Data Lake but in a better defined form as data deltas and

event datetimes are taken into accountrsquo

So

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

Subject Oriented Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Could be a

Data LAKE

VirtualisedEnsemble

Tier

EDW

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

rwerschkull

nllinkedincominrogierwerschkull

How

Does PS-3C Work

rwerschkull

nllinkedincominrogierwerschkullPhoto credit Public Domain

StagingArea

EDWInformation

Marts

Focus of Current ensemble EDWrsquos

rwerschkull

nllinkedincominrogierwerschkull

Persistent StagingArea HSA =

Data LibraryEDW

Information Marts

Splitting the work

rwerschkull

nllinkedincominrogierwerschkull

Persitent

Staging

-

Concept

Context

Connector

Business

Concept

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

Identify source event stream Primary or Unique KeyUse source metadata for this

Automate the building of a PS lsquoaround this keyrsquo Take all columns

Historize using SCD-2 approach

Persistent Staging - how

rwerschkull

nllinkedincominrogierwerschkull

Entity levelUnique key

Functional Description

Delivering party

Owner Responsible

MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)

hellip

Persistent Staging Metadata-1

rwerschkull

nllinkedincominrogierwerschkull

Column level [Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table file level (lowest possible)

Load End Date Timestamp possible but difficulthellipRequires updates

Persistent Staging Metadata-2

rwerschkull

nllinkedincominrogierwerschkull

ACID is possible in HIVE

ACID Makes Updates possibleBy registering updates as lsquonew datarsquo

Reconciliation compacting when idle at user command

Use ORC files

PLUS changing the HIVE configurationhellip

UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)

rwerschkull

nllinkedincominrogierwerschkull

HivePut semi structured data = variable columns in MAP data type

OR use Data storage type that supports schema-evolutionAVRO (ORC in development)

Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo

Schema can be different for every row

What about SEMI-STRUCTURED Data

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

3C - how

rwerschkull

nllinkedincominrogierwerschkull

Always starts with Conceptual data modeling

NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic

No Link Satellites

No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo

Explicit Helper entities

Like Data Vault(2) BUT

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

a UNIQUE Domain specific point of integration

hellipa business entity

hellipwithin itrsquos own domain

hellipdoes not necessarily need to be Enterprise Wide

Business Concept (BC)

rwerschkull

nllinkedincominrogierwerschkull

Why not lsquoenterprise widersquoCompany

Customer

Sales Customer

International

Sales Customer

Local

Sales Customer

Marketing Customer

Customer hellip

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

Most easy entity to be virtualised(if performance allows)

No Hashing amp No surrogateBUSINESSKEYS

(not by default at least)

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Containts Context about a Concept In a historical way

hellipLike a Data Vault Satellite

Every CC belongs to only one BC

Seperate entity per source system table stream

Concept Context (CC)

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

Example-dataBK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts META_Deleted_Ind

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0

Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0

Asa hellip hellip hellip hellip hellip hellip

NSR-Station

[adres]Source NSS1 table y

More difficult to be virtualisedDepends on semantic gap with source

But do make virtual when lsquostreaming datarsquo is necessary

Because we have PS layerExposing all columns not necessary

Refactoring is more easyhellip

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Relations between Concepts + Context

In a historical way

hellipMerger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point

in time

Connector (C)

rwerschkull

nllinkedincominrogierwerschkull

Explicitly defining a driving key as metadatahellip

Gives business understanding

Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)

bull is not registered as a new lsquoconnectionrsquo

Connector Driving key

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Connector Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

Driving Key

NS-Travelcard

+Checkin timestamp

Example-Data: Connector NSR-Travelmovement (between NSR-Station [from, to], NS-Trainseries and NS-Travelcard)

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No.

We add two mandatory HELPER entities

modelling

3C - Details: Business Alias

[Diagram: 3C model overview]

- Helps with switching from sources that are tied together by technical (surrogate) keys…
- …to a Business Key based model
- It’s a LOOKUP table that translates the technical key to the Business Key

Example

[Diagram: Business Concept NSR-Station with Concept Contexts [valuation] (source NSS2 table p), [description] (source NSS1 table x) and [adres] (source NSS1 table y), plus two Business Aliases]

- BA-NSS1: key lookup for NSS1 source tables
- BA-NSS2: key lookup for NSS2 source tables

Example-data

BA-NSS1 (key lookup for NSS1 source tables)

Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

BA important details

- Has a 1-to-(0,1) relation with a Business Concept
- More difficult to virtualise: the lookup table should be kept small
- Therefore DO NOT do key lookups in the Concept Context entity
- Load / generate together with the BC
- Preferably ‘in memory’ somehow…
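A Business Alias kept ‘in memory’ can be pictured as a plain dictionary from the source's surrogate key to the Business Key. The sketch below is only an illustration under assumptions: the values come from the BA-NSS1 example table, while the function names `to_business_key` and `rekey` and the column name `nss1_id` are hypothetical.

```python
# Business Alias for source NSS1: technical surrogate key -> Business Key.
ba_nss1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(alias, surrogate_key):
    """Translate a surrogate key; returns None when the key is not (yet) known."""
    return alias.get(surrogate_key)


def rekey(source_rows, alias, surrogate_column):
    """Yield copies of the source rows with the surrogate key column
    replaced by a 'business_key' field."""
    for row in source_rows:
        out = {k: v for k, v in row.items() if k != surrogate_column}
        out["business_key"] = to_business_key(alias, row[surrogate_column])
        yield out


# Hypothetical source rows that only carry the technical key.
nss1_rows = [{"nss1_id": 123522, "postcode": "3500GJ"},
             {"nss1_id": 666323, "postcode": "1012 AB"}]
rekeyed = list(rekey(nss1_rows, ba_nss1, "nss1_id"))
```

Keeping the alias this small (two columns, one entry per Business Key) is what makes holding it in memory realistic.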

3C - Details: BC-Timeline

[Diagram: 3C model overview]

- Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
- Like a Data Vault Point-in-time construct
- But mandatory
- And with a clearly defined and performant approach

Example: BC-Timeline (BCT) for NSR-Station

Concept Context [valuation], source NSS2 table p:

BK_NSR-Station  WOZ value   Value rating agency X  META_Laad_dts      META_Laad_eind_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

Concept Context [adres], source NSS1 table y:

BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                 …  …            …                  …

BCT (the combined timeline):

BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
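The combined timeline in the BCT example is the ordered union of the load timestamps of the Concept Contexts for one Business Key. A minimal Python sketch, assuming half-open slices (each slice ends where the next one starts, whereas the slides end one second earlier at 21:59:59); `bc_timeline`, `dts` and `OPEN_END` are hypothetical names:

```python
from datetime import datetime

OPEN_END = datetime(9999, 12, 31)  # assumed open-ended marker


def dts(day, month, year):
    """Load timestamp at 22:00:00, as used in the example tables."""
    return datetime(year, month, day, 22)


def bc_timeline(*context_load_dts):
    """Combine the load timestamps of several Concept Contexts of ONE
    business key into consecutive (start, end) slices."""
    starts = sorted(set().union(*context_load_dts))
    ends = starts[1:] + [OPEN_END]
    return list(zip(starts, ends))


# Load timestamps for station 'Ut', taken from the example tables.
valuation_ut = [dts(1, 1, 2014), dts(1, 1, 2015), dts(1, 3, 2016)]
address_ut = [dts(5, 6, 2013), dts(4, 7, 2015)]
timeline_ut = bc_timeline(valuation_ut, address_ut)
```

For ‘Ut’ this yields five slices starting on 5-6-2013, 1-1-2014, 1-1-2015, 4-7-2015 and 1-3-2016, matching the start dates in the BCT table.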

modelling

What makes PS-3C a different Ensemble?

[Diagram: 3C model overview]

1) Explicitly splitting the work

Data + History

Subjects + Integration

2) NO HASHED BUSINESS KEYS or surrogate keys

Only concatenated ones

httpwwwcannabisculturecomfilesimages6hashbrickJPG

3) Fewer joins

Relation + technical validity timeline + relation context, together in one entity

Photo credit: Public Domain

4) Explicit HELPER entities

- Business Alias
- Business Concept Timeline
- + explicitly define Driving key(s)

Photo credit: Public Domain

Hope to AVOID this…

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me

‘Head of BI’, Spilgames

Certified Data Vault modeler since 2009

Contact details: nl.linkedin.com/in/rogierwerschkull

rogierwerschkullgmailcom

rwerschkull

Page 3: PS-3C Data Modelling Zone Berlin

Ensemble data modelling…

WHY another Ensemble?

We’ve got loads already

httptopofmindssewp2014ensemble-modeling-forelasningar-och-lars-ronnback

Ensemble popularity

[Chart: search hits on Google, 31-8-2016, for ‘Data Vault modeling + data warehouse’, ‘Anchor modeling + data warehouse’, ‘Hyper agility + data warehouse’, ‘Focal point modeling + data warehouse’ and ‘Head version modeling + data warehouse’; vertical axis from 0 to 16000; data labels as extracted: 13600, 9970, 37073 5]

Ran into problems using: Data Vault, Head-Version, Anchor Modelling

WHAT problems?

Photo: my own……

Problem: not built for Big Data / data lake ‘data centricity’

Photo credit: Lake, Public Domain; httpwwwwriteupsorgstar-trek-brent-spiner-data

‘Data may first be stored in a data lake so that it can be explored, cleaned and prepared. If it can be structured in a relational format (basically rows and columns) and needs to be used frequently and kept highly secure, it may go into a data warehouse. If it stops being used frequently, it may go back to a HDFS (Hadoop Distributed File System)-based archive.’

Data Centric = data first. Thomas H. Davenport, Wall Street Journal, 3-6-2015

http://blogs.wsj.com/cio/2015/06/03/the-shift-to-a-new-data-architecture

Systems like

Data Flood

Photo credit: Kurayba (httpswwwflickrcomphotos48503330N0828564454666), under cc licence (https://creativecommons.org/licenses/by-sa/2.0)

The possible result…

Photo credit: httpshighfiveexportswordpresscom201006253000-pieces-lego-mix-specialty-pieces-rare-pieces-bricks-blocks-parts-more-ultimate-lot-of-lego-parts-pieces-lego-for-sale-lego-batman-lego-starwars-lego-technic-lego-minifigur

But isn’t Data Vault v2 ‘made for Big Data centric systems’?

In DV2 you still do this in one go:

[Diagram: EDW = Subject Oriented, Integrated, Time Variant, Non-Volatile]

Being Data Centric conflicts with the complex data MODELLING work

Coding = a lot like modelling (http://xkcd.com/844)

Problem: less mature JOIN optimizers

Photo credit: Public Domain

NoSQL databases (Key-Value, Document, Column Family) + SQL-on-Hadoop solutions do NOT like joins

Photo credit: Public Domain

This REALLY complicates Anchor Modeling

And personally: this one too (Link Satellite)

[Diagram: Data Vault model with HUB, SAT and LINK entities, with the Link Satellite singled out]

Problem

a) HASHING of Business Keys

Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash Key                      Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32

Hashing……
- Loses statistical information regarding the data distribution. Query optimizers do not like this…
- Column family, Document and Key-value databases need a good (natural) sharding key for (partial) key-lookups

http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2
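The trade-off in the table can be tried out with Python's standard `hashlib`. This is a small sketch rather than the author's tooling: `concat_key` and `md5_key` are hypothetical helper names, and because the exact input format and encoding behind the slide's hash values are not given, the digests produced here need not match the ones in the table. It shows that the concatenated key keeps its natural prefix (so a partial lookup on rolling stock number still works), while the MD5 value is always 32 characters with no usable ordering.

```python
import hashlib


def concat_key(*parts, sep="|"):
    """Concatenated natural business key, e.g. '8739|2015-01-22 01:34:27|72A1_FINV'."""
    return sep.join(str(p) for p in parts)


def md5_key(business_key):
    """Hashed variant of the same key (hex digest, always 32 characters)."""
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()


readings = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:27", "72A1_SLDET"),
    ("8739", "2015-01-22 01:34:32", "72A1_FINV"),
    ("8674", "2015-01-22 01:34:26", "72A1_FINV"),
]
keys = sorted(concat_key(*r) for r in readings)
hashes = [md5_key(k) for k in keys]

# A partial (prefix) lookup is possible on the natural key only.
rolling_stock_8739 = [k for k in keys if k.startswith("8739|")]
```

The hashed keys are no longer grouped by rolling stock number, which is the loss of distribution information the slide refers to.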

b) Surrogate Business Keys

- Surrogate keys require centralized coordination
- …and thus can impact the overall system’s scalability and availability
- A lot of MPP NoSQL databases simply do not have them…

Then some inspiration

http://roelantvos.com/blog/?p=1119

‘In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse.’

‘The Historical Staging Area effectively ‘acts’ as Data Lake but in a better defined form, as data deltas and event date/times are taken into account.’

So:

[Diagram: the EDW (Subject Oriented, Integrated, Time Variant, Non-Volatile) is split into two parts, ‘Time Variant & Non Volatile’ and ‘Subject Oriented & Integrated’]

[Diagram: the two EDW parts, ‘Time Variant & Non Volatile’ and ‘Subject Oriented & Integrated’, with the labels ‘Could be a Data LAKE’ and ‘Virtualised Ensemble Tier’]

How does PS-3C work?

Photo credit: Public Domain

Focus of current ensemble EDWs

[Diagram: Staging Area, EDW, Information Marts]

Splitting the work

[Diagram: Persistent Staging Area (HSA) = Data Library, EDW, Information Marts]

Persistent Staging - Business Concept, Concept Context, Connector

modelling

Persistent Staging - how

- Identify the source / event stream Primary or Unique Key. Use source metadata for this
- Automate the building of a PS ‘around this key’. Take all columns
- Historize using an SCD-2 approach
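The three steps above (find the key, take all columns, historize SCD-2 style) can be sketched as a small in-memory loader. This is a minimal sketch under assumptions, not the author's automation: `historize`, `OPEN_END`, the dictionary layout and the sample rows are hypothetical, and deletes are not handled.

```python
from datetime import datetime

OPEN_END = datetime(9999, 12, 31)  # assumed open-ended marker


def historize(staging, delivered_rows, key_columns, load_dts):
    """SCD-2 style load. `staging` maps a key tuple to a list of versions
    (oldest first); every version stores all delivered columns."""
    for row in delivered_rows:
        key = tuple(row[c] for c in key_columns)
        versions = staging.setdefault(key, [])
        if versions and versions[-1]["data"] == row:
            continue                                  # unchanged: store nothing
        if versions:
            versions[-1]["load_end_dts"] = load_dts   # close the current version
        versions.append({"data": dict(row),
                         "load_dts": load_dts,
                         "load_end_dts": OPEN_END})
    return staging


# Hypothetical deliveries of one source table keyed on 'station'.
staging = {}
historize(staging, [{"station": "Ut", "postcode": "3500GJ"}],
          ["station"], datetime(2013, 6, 5, 22))
historize(staging, [{"station": "Ut", "postcode": "3511 CE"}],
          ["station"], datetime(2015, 6, 5, 22))
historize(staging, [{"station": "Ut", "postcode": "3511 CE"}],
          ["station"], datetime(2015, 6, 6, 22))
```

Closing the previous version is an update of an existing record, which is why the Load End Date Timestamp is harder to maintain on append-only storage.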

Persistent Staging Metadata-1

Entity level:
- Unique key
- Functional description
- Delivering party
- Owner / Responsible
- MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
- …

Persistent Staging Metadata-2

Column level:
- [Load Date Timestamp]
- [Load End Date Timestamp]
- [Deleted Flag] OR delete as new record
- [Source system] on table / file level (lowest possible)

Load End Date Timestamp: possible but difficult… Requires updates

UPDATES IN HIVE (isn’t HDFS append-only?)

- ACID is possible in HIVE
- ACID makes updates possible by registering updates as ‘new data’
- Reconciliation / compacting: when idle or at user command
- Use ORC files
- PLUS changing the HIVE configuration…

What about SEMI-STRUCTURED data?

- Hive: put semi-structured data (= variable columns) in a MAP data type
- OR use a data storage type that supports schema evolution: AVRO (ORC in development)
- Or HBASE… It only has one data type (byte); the schema is ‘applied’
- The schema can be different for every row

modelling

3C - how

Like Data Vault (2), BUT:

- Always starts with conceptual data modeling
- NOT the primary location of Data & History: virtualised (only if performance allows), should be deterministic
- No Link Satellites
- No Surrogate or Hash Keys, only ‘Concatenated Natural Business Keys’
- Explicit helper entities

3C - Details: Business Concept (BC)

[Diagram: 3C model overview. Business Concepts X and Y; Concept Contexts X-A, X-B, Y-A, Y-B and Y-C; a Connector; Business Aliases A and B; BC-Timeline X]

- A UNIQUE, domain-specific point of integration
- …a business entity
- …within its own domain
- …does not necessarily need to be Enterprise Wide

Why not ‘enterprise wide’?

[Diagram: several definitions of ‘customer’: Company Customer, Sales Customer, International Sales Customer, Local Sales Customer, Marketing Customer, Customer …]

Business Concept Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Example-Data

NSR-Station

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Trainseries

Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Travelcard

Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller

Business Key
CRM-RW123456
CRM-LAS224466
…

BC important notes

- The easiest entity to virtualise (if performance allows)
- No hashing & no surrogate BUSINESS KEYS (not by default at least)
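One way to picture a ‘virtualised’ Business Concept is as a deterministic function over the persistent staging rows, much like a view: one row per concatenated Business Key, with the earliest load timestamp. The sketch below is an assumption-laden illustration; `business_concept`, the row layout and the source name `NTR_q` are hypothetical, and only the key format (`IC|855`) comes from the example data.

```python
from datetime import datetime


def business_concept(ps_rows, key_columns, sep="|"):
    """Derive Business Concept rows (one per business key) from
    persistent-staging rows; the same input always gives the same output."""
    first_seen = {}
    for row in ps_rows:
        bk = sep.join(str(row[c]) for c in key_columns)
        seen = first_seen.get(bk)
        if seen is None or row["load_dts"] < seen["META_Load_dts"]:
            first_seen[bk] = {"business_key": bk,
                              "META_Source": row["source"],
                              "META_Load_dts": row["load_dts"]}
    return sorted(first_seen.values(), key=lambda r: r["business_key"])


# Hypothetical persistent-staging rows for train series.
ps_trainseries = [
    {"type": "IC", "number": 855, "source": "NTR_q", "load_dts": datetime(2015, 6, 5, 22)},
    {"type": "Sp", "number": 7455, "source": "NTR_q", "load_dts": datetime(2015, 6, 5, 22)},
    {"type": "IC", "number": 855, "source": "NTR_q", "load_dts": datetime(2015, 6, 6, 22)},
]
bc_trainseries = business_concept(ps_trainseries, ["type", "number"])
```

Because no surrogate or hash key has to be issued, nothing needs to be stored or coordinated to rebuild the same Business Concept rows later.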

3C - Details: Concept Context (CC)

[Diagram: 3C model overview]

- Contains context about a Concept, in a historical way
- …like a Data Vault Satellite
- Every CC belongs to only one BC
- Separate entity per source system table / stream

Concept Context Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
- [Load End Date Timestamp]
- [Deleted Flag] OR register a delete as a new record

Example

[Diagram: example model]

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts (in slide order):
- [valuation], source NSS2 table p
- [description], source NSS1 table x
- [adres], source NSS1 table y
- [description], source NTR table q
- [ovchip_personal], source NSR table r
- [ovchip_on-usage], source NSR table s
- [personal_details], source NSR table t
- [adres_data], source NSR table t

Example-data: Concept Context [adres] of NSR-Station

BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                 …  …            …                  …

CC important notes

[Diagram: NSR-Station with Concept Context [adres], source NSS1 table y]

- More difficult to virtualise: depends on the semantic gap with the source
- But do make it virtual when ‘streaming data’ is necessary
- Because we have the PS layer, exposing all columns is not necessary
- Refactoring is easier…






rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

a UNIQUE Domain specific point of integration

hellipa business entity

hellipwithin itrsquos own domain

hellipdoes not necessarily need to be Enterprise Wide

Business Concept (BC)

rwerschkull

nllinkedincominrogierwerschkull

Why not lsquoenterprise widersquoCompany

Customer

Sales Customer

International

Sales Customer

Local

Sales Customer

Marketing Customer

Customer hellip

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

BC important notes

Easiest entity to virtualise (if performance allows)

No hashed and no surrogate BUSINESS KEYS (not by default, at least)


Concept Context (CC)

Contains context about a Concept, in a historical way

…like a Data Vault Satellite

Every CC belongs to only one BC

Separate entity per source system table / stream

Concept Context Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

[Figure: Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller, surrounded by these Concept Contexts:
[valuation] (source NSS2, table p)
[description] (source NSS1, table x)
[adres] (source NSS1, table y)
[description] (source NTR, table q)
[ovchip_personal] (source NSR, table r)
[ovchip_on-usage] (source NSR, table s)
[personal_details] (source NSR, table t)
[adres_data] (source NSR, table t)]

Example data: NSR-Station, Concept Context [adres] (source NSS1, table y)

BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064   …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299   …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299   …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                  …  …            …                  …
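The end dates in the example data follow a simple pattern: a version ends one second before the next version of the same business key starts, and the current version ends at 31-12-9999. A minimal sketch of that rule, assuming versions arrive as dictionaries with the hypothetical field names `business_key`, `load_dts` and `load_end_dts`:

```python
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31)  # open-ended marker, as in the example data


def end_date_versions(versions):
    """Return the versions with a load_end_dts derived from the next version."""
    ordered = sorted(versions, key=lambda v: (v["business_key"], v["load_dts"]))
    result = []
    for current, nxt in zip(ordered, ordered[1:] + [None]):
        same_key = nxt is not None and nxt["business_key"] == current["business_key"]
        if same_key:
            end = nxt["load_dts"] - timedelta(seconds=1)  # close just before the successor
        else:
            end = END_OF_TIME  # latest version for this key stays open
        result.append({**current, "load_end_dts": end})
    return result


versions = [
    {"business_key": "Ut", "postcode": "3500GJ", "load_dts": datetime(2013, 6, 5, 22)},
    {"business_key": "Ut", "postcode": "3511 CE", "load_dts": datetime(2015, 6, 5, 22)},
    {"business_key": "Asd", "postcode": "1012 AB", "load_dts": datetime(2013, 6, 5, 22)},
]
historized = end_date_versions(versions)
```

Deriving the end date this way avoids updating stored rows, which is one reason the end date can be treated as optional for streaming data.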

CC important notes

More difficult to virtualise: depends on the semantic gap with the source

But do make it virtual when 'streaming data' is necessary

Because we have the PS layer, exposing all columns is not necessary

Refactoring is easier…


Connector (C)

Relations between Concepts + Context

In a historical way

…merger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point in time

Connector Driving key

Explicitly defining a driving key as metadata…

Gives business understanding

Makes it possible for a Connector to correctly handle delta data deliveries…
• so that a change (on the driving key)
• is not registered as a new 'connection'
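One way to read the driving-key rule is sketched below: a delta record whose driving key matches a current Connector row closes that row and adds a new version, instead of creating a second open 'connection'. This is only an illustration under that assumption; `apply_delta`, the field names and the one-second end-dating convention are hypothetical choices, not a prescribed PS-3C loader.

```python
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31)
DRIVING_KEY = ("travelcard", "checkin_ts")  # assumed driving key: travelcard + check-in timestamp


def apply_delta(connector_rows, delta, load_dts):
    """Apply one delta record to the Connector rows, versioning on the driving key."""
    key = tuple(delta[k] for k in DRIVING_KEY)
    for row in connector_rows:
        is_current = row["load_end_dts"] == END_OF_TIME
        if is_current and tuple(row[k] for k in DRIVING_KEY) == key:
            if all(row.get(k) == v for k, v in delta.items()):
                return connector_rows  # identical delivery, nothing to register
            # Same driving key, different content: close the old version.
            row["load_end_dts"] = load_dts - timedelta(seconds=1)
            break
    connector_rows.append({**delta, "load_dts": load_dts, "load_end_dts": END_OF_TIME})
    return connector_rows
```

Without a declared driving key, the loader could not tell a corrected destination station from a genuinely new travel movement.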

Connector Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

[Figure: the Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts, now joined by the Connector NSR-Travelmovement (attribute: Checkin timestamp; roles: from, to).]

Driving Key: NS-Travelcard + Checkin timestamp

Example data: Connector NSR-Travelmovement (NSR-Station from / to, NS-Trainseries, NS-Travelcard)

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp   Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32    5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09   5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No.

We add two mandatory HELPER entities


Business Alias

To help switching from sources that are tied together by technical (surrogate) keys…

…to a Business Key based model

It's a LOOKUP table that translates the technical key to the Business Key

Example

[Figure: Business Concept NSR-Station with Concept Contexts [valuation] (source NSS2, table p), [description] (source NSS1, table x) and [adres] (source NSS1, table y), plus two Business Aliases:
BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables]

Example data

BA-NSS1 (key lookup for NSS1 source tables)
Business Key   NSS1_Surrogate_key
Ut             123522
Asd            666323
Asa            222443
…              …

NSR-Station
Business Key   META_Source   META_Load_dts
Ut             NSS1_y        5-6-2015 22:00:00
Asd            NSS1_y        5-6-2015 22:00:00
Asa            NSS2_p        6-6-2015 22:00:00
…              …             …

BA important details

Has a 1-to-(0,1) relation with a Business Concept

More difficult to virtualise

The lookup table should be kept small

Therefore DO NOT do the key lookup in the Concept Context entity

Load / generate together with the BC

Preferably 'in memory' somehow…
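A minimal sketch of a Business Alias used as an in-memory lookup, filled with the BA-NSS1 values from the example data. The dictionary layout, the function `to_business_key` and the sample source rows (including the column name `station_sk`) are illustrative assumptions.

```python
# Business Alias for source NSS1: surrogate key -> business key (values from the example data).
ba_nss1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(surrogate_key, alias):
    """Translate a technical (surrogate) key into its business key."""
    try:
        return alias[surrogate_key]
    except KeyError:
        raise LookupError(f"no business key registered for surrogate key {surrogate_key}") from None


# Hypothetical source rows that only carry the technical key.
source_rows = [
    {"station_sk": 123522, "postcode": "3511 CE"},
    {"station_sk": 666323, "postcode": "1012 AB"},
]
translated = [
    {"BK_NSR-Station": to_business_key(row["station_sk"], ba_nss1), "postcode": row["postcode"]}
    for row in source_rows
]
```

Keeping the alias as a small dictionary matches the advice to keep the lookup table small and preferably in memory.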


BC-Timeline

Integrates the validity timelines of the Concept Contexts belonging to a Business Concept

Like a Data Vault Point-in-time construct

But mandatory

And with a clearly defined and performant approach

Example

[Figure: Business Concept NSR-Station with Concept Contexts [valuation] (source NSS2, table p) and [adres] (source NSS1, table y).]

Concept Context [valuation] (source NSS2, table p)
BK_NSR-Station  WOZ value   Value rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

BCT (BC-Timeline)
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00

Concept Context [adres] (source NSS1, table y)
BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064   …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299   …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299   …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                  …  …            …                  …
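The combined timeline for station Ut can be reproduced by merging the load timestamps of both Concept Contexts and cutting a new interval at every change. The sketch below shows one way to do that, not necessarily the 'clearly defined and performant approach' the slide refers to; `bc_timeline` is a hypothetical name.

```python
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31)


def bc_timeline(*context_start_dates):
    """Merge the start timestamps of several Concept Contexts into contiguous intervals."""
    starts = sorted(set().union(*context_start_dates))
    timeline = []
    for i, start in enumerate(starts):
        if i + 1 < len(starts):
            end = starts[i + 1] - timedelta(seconds=1)  # ends just before the next change
        else:
            end = END_OF_TIME  # last interval stays open
        timeline.append((start, end))
    return timeline


# Load timestamps for business key Ut, taken from the example data.
valuation_starts = [datetime(2014, 1, 1, 22), datetime(2015, 1, 1, 22), datetime(2016, 3, 1, 22)]
address_starts = [datetime(2013, 6, 5, 22), datetime(2015, 7, 4, 22)]
ut_timeline = bc_timeline(valuation_starts, address_starts)
```

For Ut this yields five intervals, matching the five Ut rows of the BCT table; each interval can then be joined to the one version of every Concept Context that was valid during it.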

What makes PS-3C a different ensemble?

1) Explicitly splitting the work

Data + History

Subjects + Integration

2) NO HASHED BUSINESS KEYS or surrogate keys

Only concatenated ones

httpwwwcannabisculturecomfilesimages6hashbrickJPG
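A concatenated natural business key can be built with a few lines of code. The sketch below assumes a pipe delimiter, as in the train series key IC|855 from the example data; the function name and the rule of rejecting parts that contain the delimiter are illustrative choices.

```python
def concatenated_business_key(*parts, delimiter="|"):
    """Join natural key parts into one readable business key."""
    cleaned = [str(part).strip() for part in parts]
    for part in cleaned:
        if delimiter in part:
            # A delimiter inside a part would make two different keys look identical.
            raise ValueError(f"key part {part!r} contains the delimiter {delimiter!r}")
    return delimiter.join(cleaned)


train_series_key = concatenated_business_key("IC", 855)
```

Unlike a hash, the result keeps its natural ordering and prefix, so it remains usable for partial key lookups and sharding.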

3) Fewer joins

Relation + technical validity timeline + relation context

Together in one entity

Photo credit: Public Domain

4) Explicit HELPER entities

Business Alias

Business Concept Timeline

+ explicitly define driving key(s)

Photo credit: Public Domain

Hope to AVOID this…

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?



Page 6: PS-3C Data Modelling Zone Berlin

Data Vault

Ran into Problems using

Head-Version

Anchor Modelling

rwerschkull

nllinkedincominrogierwerschkull

WHAT problemS

Photo My ownhelliphelliprwerschkull

nllinkedincominrogierwerschkull

problem

rwerschkull

nllinkedincominrogierwerschkull

Not Build for

BiG data lake

Data CentrICITY

Photo credit Lake Public Domain httpwwwwriteupsorgstar-trek-brent-spiner-datarwerschkull

nllinkedincominrogierwerschkull

lsquoData may first be stored in a

data lake so that it can be explored cleaned and prepared

If it can be structured in a relational format (basically rows and columns) and needs to be used frequently and kept highly secure it may go into a

data warehouse

If it stops being used frequently it may go back to a HDFS

(Hadoop Distributed File System)-based archiversquo

Data Centric data first THOMAS H DAVENPORT WALL STEET JOURNAL OF 3-6-2015

httpblogswsjcomcio20150603the-shift-to-a-new-data-architecture rwerschkull

nllinkedincominrogierwerschkull

Systems Like

rwerschkull

nllinkedincominrogierwerschkull

Data Flood

Photo credit Kurayba (httpswwwflickrcomphotos48503330N0828564454666 )

under cc licence (httpscreativecommonsorglicensesby-sa20)

rwerschkull

nllinkedincominrogierwerschkull

The possible resulthellip

Photo credit httpshighfiveexportswordpresscom201006253000-pieces-lego-mix-specialty-pieces-rare-pieces-bricks-blocks-

parts-more-ultimate-lot-of-lego-parts-pieces-lego-for-sale-lego-batman-lego-starwars-lego-technic-lego-minifigur

rwerschkull

nllinkedincominrogierwerschkull

But isNrsquot Data Vault v2

lsquomade for

Big data centric

systemsrsquorwerschkull

nllinkedincominrogierwerschkull

In

DV2you still

do thisin one go

Subject Oriented

Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Coding =

A lot like modelling

Being Data Centric

conflictswith thecomplex Data

MODELLINGwork

httpxkcdcom844

problem

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Less Mature

JOINoptimizers

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

Key-Value

Document

Column Family

NoSQL Databases

+SQL on Hadoop solutions

Do NOT like joins

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

This

REALLYcomplicates

AnchoRModeling

rwerschkull

nllinkedincominrogierwerschkull

And Personally

HUB

SAT

LINK

This one

too(Link Satellite)

HUB

HUB

SAT

SAT

rwerschkull

nllinkedincominrogierwerschkull

problem

rwerschkull

nllinkedincominrogierwerschkull

a) HASHING OF Business keys

Rolling

Stock Nr Datetime Sensor Id Value Concatenated Business Key

Key

Len MD5 Hash

Key

Len

8739

2015-01-22

013427 72A1_FINV 123 8739|2015-01-22 013427|72A1_FINV 34 86ae4c6b0e2e2d5a13a0d11440529aeb 32

8739

2015-01-22

013427 72A1_SLDET 100 8739|2015-01-22 013427|72A1_SLDET 35 51ce9bc292eef407bd7c91a52eebcf2e 32

8739

2015-01-22

013432 72A1_FINV 126 8739|2015-01-22 013432|72A1_FINV 34 9482a41c1fecc4c64b8c437af6cc85e8 32

8739

2015-01-22

013432

13A8_MW_UB

AT_VT 5 8739|2015-01-22 013432|13A8_MW_UBAT_VT 42 e4160914ee55ce0b93f87b23366a0ce3 32

8674

2015-01-22

013426 72A1_FINV 6 8674|2015-01-22 013426|72A1_FINV 34 fcb3e7c8c91e44ce396d908a4948ca65 32

8674

2015-01-22

013426

16A1_HSVER

OND 7 8674|2015-01-22 013426|16A1_HSVEROND 38 fe9098c8c291ad56af5c8afae5169196 32

Loses statistical Informationregarding the data distributionQuery optimizers do not like thishellip

Column family Document and Key-value databases need a

good (natural) sharding key for (partial) key-

lookups

Hashinghelliphellip

httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2rwerschkull

nllinkedincominrogierwerschkull

Surrogates keys require

centralized coordination

hellipand thus can impact the overall systemrsquos scalability and availability

A lot of MPP NoSQL databases simply do not have themhellip

B) Surrogate BuSINESS keys

rwerschkull

nllinkedincominrogierwerschkull

Then Some Inspiration

httproelantvoscomblogp=1119

rwerschkull

nllinkedincominrogierwerschkull

lsquoIn my opinion the answer lies in the adoption of the

persistent (Historical) Staging Area concept

(also known as Historical Staging or the History Area)

This basically adopts the fundamentals of a Data Warehousersquo

lsquoThe Historical Staging Area effectively lsquoactsrsquo as

Data Lake but in a better defined form as data deltas and

event datetimes are taken into accountrsquo

So

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

Subject Oriented Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Could be a

Data LAKE

VirtualisedEnsemble

Tier

EDW

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

rwerschkull

nllinkedincominrogierwerschkull

How

Does PS-3C Work

rwerschkull

nllinkedincominrogierwerschkullPhoto credit Public Domain

StagingArea

EDWInformation

Marts

Focus of Current ensemble EDWrsquos

rwerschkull

nllinkedincominrogierwerschkull

Persistent StagingArea HSA =

Data LibraryEDW

Information Marts

Splitting the work

rwerschkull

nllinkedincominrogierwerschkull

Persitent

Staging

-

Concept

Context

Connector

Business

Concept

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

Identify the Primary or Unique Key of the source / event stream. Use source metadata for this.

Automate the building of a PS ‘around this key’. Take all columns.

Historize using an SCD-2 approach.

Persistent Staging - how
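The three steps above can be sketched roughly as follows. This is a minimal in-memory illustration, not the author's implementation: the `PsRow` class, the `historize` function and the convention of closing a version one second before the next one starts (as in the example tables of this deck) are assumptions made for the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # open-ended validity, as in the example tables

@dataclass
class PsRow:
    key: str            # primary / unique key taken from source metadata
    attrs: dict         # all source columns
    load_dts: datetime
    load_end_dts: datetime = OPEN_END

def historize(history: list, key: str, attrs: dict, load_dts: datetime) -> list:
    """Apply one delivered record to the PS history, SCD-2 style."""
    current = next(
        (r for r in history if r.key == key and r.load_end_dts == OPEN_END), None
    )
    if current is not None and current.attrs == attrs:
        return history  # nothing changed, keep the current version open
    if current is not None:
        # close the current version just before the new one starts
        current.load_end_dts = load_dts - timedelta(seconds=1)
    history.append(PsRow(key, dict(attrs), load_dts))
    return history

# Usage: three deliveries for station 'Ut', the second one unchanged.
ps_station = []
historize(ps_station, "Ut", {"postcode": "3500GJ"}, datetime(2013, 6, 5, 22))
historize(ps_station, "Ut", {"postcode": "3500GJ"}, datetime(2014, 6, 5, 22))
historize(ps_station, "Ut", {"postcode": "3511 CE"}, datetime(2015, 6, 5, 22))
```

Because the function only needs the key and the full set of columns, it can be generated from source metadata for every table or stream.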

Entity level:

Unique key

Functional description

Delivering party

Owner / Responsible

MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)

…

Persistent Staging Metadata-1

Column level:

[Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table / file level (lowest possible)

Load End Date Timestamp: possible but difficult… Requires updates.

Persistent Staging Metadata-2
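Where updates are hard, one possible workaround (an assumption of this sketch, not something the slides prescribe) is to store only the Load Date Timestamp and derive the Load End Date Timestamp when reading, from the next version of the same key. The function and field names are hypothetical.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)

def with_load_end_dts(rows):
    """Return copies of insert-only rows with a derived 'load_end_dts'.

    rows: list of dicts that have at least 'key' and 'load_dts'.
    """
    ordered = sorted(rows, key=lambda r: (r["key"], r["load_dts"]))
    result = []
    for i, row in enumerate(ordered):
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        if nxt is not None and nxt["key"] == row["key"]:
            # the version ends one second before the next version starts
            end = nxt["load_dts"] - timedelta(seconds=1)
        else:
            end = OPEN_END
        result.append({**row, "load_end_dts": end})
    return result

stored = [
    {"key": "Ut", "postcode": "3511 CE", "load_dts": datetime(2015, 6, 5, 22)},
    {"key": "Ut", "postcode": "3500GJ", "load_dts": datetime(2013, 6, 5, 22)},
    {"key": "Asd", "postcode": "1012 AB", "load_dts": datetime(2013, 6, 5, 22)},
]
versions = with_load_end_dts(stored)
```

The stored rows are never modified, which fits append-only storage; the price is a sort (or window function) at read time.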

ACID is possible in Hive.

ACID makes updates possible by registering updates as ‘new data’.

Reconciliation / compacting happens when idle or at user command.

Use ORC files,

PLUS changing the Hive configuration…

Updates in Hive (isn’t HDFS append only?)

Hive: put semi-structured data (= variable columns) in a MAP data type,

OR use a data storage type that supports schema evolution: AVRO (ORC in development).

Or HBase… It only has one data type (byte); the schema is ‘applied’.

The schema can be different for every row.

What about semi-structured data?
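The MAP option can be pictured with a small sketch: the columns that every record shares stay as regular columns, and everything else goes into one key/value map. The column names and the choice to store all map values as strings are assumptions for illustration.

```python
# Hypothetical fixed columns; everything else is treated as variable.
FIXED_COLUMNS = ("sensor_id", "event_dts")

def to_ps_record(event: dict) -> dict:
    """Split one semi-structured event into fixed columns plus an attribute map."""
    fixed = {column: event.get(column) for column in FIXED_COLUMNS}
    # a map column has a single value type, so values are stored as strings here
    variable = {k: str(v) for k, v in event.items() if k not in FIXED_COLUMNS}
    return {**fixed, "attributes": variable}

record = to_ps_record(
    {"sensor_id": "72A1_FINV", "event_dts": "2015-01-22 01:34:27", "value": 123}
)
```

New attributes in the source then arrive as new map entries, without a schema change on the Persistent Staging table.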

modelling


3C - how


Always starts with conceptual data modeling.

NOT the primary location of data & history. Virtualised (only if performance allows). Should be deterministic.

No Link Satellites.

No Surrogate or Hash Keys, only ‘Concatenated Natural Business Keys’.

Explicit helper entities.

Like Data Vault (2), BUT
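A concatenated natural business key can be built without any central coordination, which is what makes it deterministic. The sketch below assumes a `|` delimiter (as in the example keys such as IC|855) and rejects parts that contain the delimiter; both the function name and these rules are illustrative choices.

```python
DELIMITER = "|"

def business_key(*parts) -> str:
    """Build a concatenated natural business key from its parts.

    The same input always gives the same key, on any node, with no lookup.
    """
    cleaned = []
    for part in parts:
        if part is None:
            raise ValueError("business key parts must not be missing")
        text = str(part).strip()
        if DELIMITER in text:
            raise ValueError(f"key part {text!r} contains the delimiter {DELIMITER!r}")
        cleaned.append(text)
    return DELIMITER.join(cleaned)

train_series_key = business_key("IC", 855)
```

Rejecting the delimiter inside a part avoids two different part lists producing the same key.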

[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; Business Alias A and Business Alias B; BC-Timeline X]

3C - Details

A UNIQUE, domain-specific point of integration

…a business entity

…within its own domain

…does not necessarily need to be enterprise wide

Business Concept (BC)

Why not ‘enterprise wide’?

Company Customer

Sales Customer

International Sales Customer

Local Sales Customer

Marketing Customer

Customer …

Entity level:

[Description]

[Owner / Responsible]

Column level:

[Load Date Timestamp]

[Source system] on table / file level (lowest possible)

Business Concept Metadata

Example-Data

NSR-Station

Business Key   META_Source   META_Load_dts
Ut             NSS1_y        5-6-2015 22:00:00
Asd            NSS1_y        5-6-2015 22:00:00
Asa            NSS2_p        6-6-2015 22:00:00
…              …             …

NS-Trainseries

Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Travelcard

Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller

Business Key
CRM-RW123456
CRM-LAS224466
…

Easiest entity to virtualise (if performance allows).

No hashing & no surrogate BUSINESS KEYS (not by default at least).

BC important notes
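Virtualising a Business Concept can be thought of as a view over the Persistent Staging rows: one row per business key, with the source and the first load timestamp. The sketch below is an in-memory stand-in for such a view; the row layout and function name are assumptions.

```python
from datetime import datetime

def business_concept_view(ps_rows):
    """Derive Business Concept rows (one per business key) from PS rows."""
    first_seen = {}
    for row in ps_rows:
        bk = row["business_key"]
        known = first_seen.get(bk)
        if known is None or row["load_dts"] < known["META_Load_dts"]:
            first_seen[bk] = {
                "business_key": bk,
                "META_Source": row["source"],
                "META_Load_dts": row["load_dts"],
            }
    # sorted output keeps the result deterministic for the same input
    return [first_seen[bk] for bk in sorted(first_seen)]

ps_rows = [
    {"business_key": "Ut", "source": "NSS1_y", "load_dts": datetime(2015, 6, 5, 22)},
    {"business_key": "Ut", "source": "NSS1_y", "load_dts": datetime(2013, 6, 5, 22)},
    {"business_key": "Asd", "source": "NSS1_y", "load_dts": datetime(2013, 6, 5, 22)},
]
bc_station = business_concept_view(ps_rows)
```

No key generation or hashing is involved, so the view can be recomputed at any time and always yields the same rows.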


Contains context about a Concept, in a historical way

…like a Data Vault Satellite

Every CC belongs to only one BC

Separate entity per source system / table / stream

Concept Context (CC)

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts:

[valuation] Source: NSS2, table p

[description] Source: NSS1, table x

[adres] Source: NSS1, table y

[description] Source: NTR, table q

[ovchip_personal] Source: NSR, table r

[ovchip_on-usage] Source: NSR, table s

[personal_details] Source: NSR, table t

[adres_data] Source: NSR, table t

Example-data

NSR-Station: [adres] Source: NSS1, table y

BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064   …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299   …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299   …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                  …  …            …                  …

More difficult to virtualise: depends on the semantic gap with the source.

But do make it virtual when ‘streaming data’ is necessary.

Because we have the PS layer, exposing all columns is not necessary.

Refactoring is easier…

CC important notes


Relations between Concepts + context, in a historical way

…merger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point in time

Connector (C)

Explicitly defining a driving key as metadata…

Gives business understanding

Makes it possible for a Connector to correctly handle delta data deliveries…
• so that a change (on the driving key)
• is not registered as a new ‘connection’

Connector Driving key
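How a driving key steers delta handling can be sketched as follows. The sketch assumes the Travelmovement example of this deck (driving key: travelcard + check-in timestamp); the column names, the `apply_delta` function and the one-second end-dating convention are illustrative assumptions.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)
DRIVING_KEY = ("bk_travelcard", "checkin_ts")  # defined as metadata of the Connector

def apply_delta(connector, delta, load_dts, driving_key=DRIVING_KEY):
    """Apply one delivered relation row to a Connector (a list of dict rows)."""
    dk = tuple(delta[c] for c in driving_key)
    for row in connector:
        same_dk = tuple(row[c] for c in driving_key) == dk
        if same_dk and row["load_end_dts"] == OPEN_END:
            if all(row.get(k) == v for k, v in delta.items()):
                return connector  # nothing changed
            # same driving key, other values: close the old version,
            # so the change does not show up as a second open 'connection'
            row["load_end_dts"] = load_dts - timedelta(seconds=1)
            break
    connector.append({**delta, "load_dts": load_dts, "load_end_dts": OPEN_END})
    return connector

# Usage: the check-out station arrives in a later delivery.
checkin = datetime(2016, 4, 5, 8, 49, 32)
trip = {"bk_travelcard": "3528 0234 2073 1234", "checkin_ts": checkin,
        "bk_station_from": "Asd", "bk_station_to": None}
movements = []
apply_delta(movements, trip, datetime(2016, 4, 5, 22))
apply_delta(movements, {**trip, "bk_station_to": "Ut"}, datetime(2016, 4, 6, 22))
apply_delta(movements, {**trip, "bk_station_to": "Ut"}, datetime(2016, 4, 7, 22))
```

Without the driving key, the second delivery would look like a new relation next to the first one instead of a new version of it.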

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Connector Metadata

Example

[Diagram: the example Business Concepts (NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller) and their Concept Contexts, with the Connector NSR-Travelmovement (Checkin timestamp, from, to)]

Driving Key: NS-Travelcard + Checkin timestamp

Example-Data

Connector NSR-Travelmovement (Checkin timestamp, from, to) between NSR-Station, NS-Travelcard and NS-Trainseries

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

We add two mandatory HELPER entities

Are we there yet? No

modelling


To help switching from sources that are tied together by technical (surrogate) keys…

…to a Business Key based model.

It’s a LOOKUP table that translates the technical key to the Business Key.

Business Alias

Example

NSR-Station

[valuation] Source: NSS2, table p

[description] Source: NSS1, table x

[adres] Source: NSS1, table y

BA-NSS1: key lookup for NSS1 source tables

BA-NSS2: key lookup for NSS2 source tables

Example-data

BA-NSS1: key lookup for NSS1 source tables

Business Key   NSS1_Surrogate_key
Ut             123522
Asd            666323
Asa            222443
…              …

NSR-Station

Business Key   META_Source   META_Load_dts
Ut             NSS1_y        5-6-2015 22:00:00
Asd            NSS1_y        5-6-2015 22:00:00
Asa            NSS2_p        6-6-2015 22:00:00
…              …             …

Has a 1-on-(0,1) relation with a Business Concept.

More difficult to virtualise. The lookup table should be kept small.

Therefore DO NOT do the key lookup in the Concept Context entity.

Load / generate together with the BC.

Preferably ‘in memory’ somehow…

BA important details
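A Business Alias kept ‘in memory’ can be as simple as a dictionary from the source's surrogate key to the Business Key. The sketch below uses the BA-NSS1 example values; the `rekey` function and the `station_id` column name are hypothetical.

```python
# Business Alias for source NSS1: surrogate key -> business key (example values).
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}

def rekey(source_rows, alias=BA_NSS1, surrogate_column="station_id"):
    """Replace the technical key of each source row by the Business Key.

    Rows whose surrogate key is unknown are returned separately instead of
    being dropped silently.
    """
    resolved, unresolved = [], []
    for row in source_rows:
        bk = alias.get(row[surrogate_column])
        if bk is None:
            unresolved.append(row)
            continue
        rest = {k: v for k, v in row.items() if k != surrogate_column}
        resolved.append({"BK_NSR-Station": bk, **rest})
    return resolved, unresolved

resolved, unresolved = rekey([
    {"station_id": 123522, "postcode": "3511 CE"},
    {"station_id": 999999, "postcode": "unknown"},
])
```

Keeping the translation in one small lookup per source is what allows the other entities to stay free of surrogate keys.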


Integrates the validity timelines of the Concept Contexts belonging to a Business Concept.

Like a Data Vault Point-in-time construct,

but mandatory,

and with a clearly defined and performant approach.

BC-Timeline

Example

NSR-Station

[valuation] Source: NSS2, table p

BK_NSR-Station  WOZ value   Value rating agency X  META_Laad_dts      META_Laad_eind_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

[adres] Source: NSS1, table y

BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064   …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299   …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299   …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                  …  …            …                  …

BCT

BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
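The combined timeline in this example can be derived mechanically: collect every load timestamp of every Concept Context per business key, sort them, and let each interval end one second before the next one starts. The sketch below is one possible way to do that, not necessarily the author's approach; the function name and input shape are assumptions.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)

def bc_timeline(*concept_contexts):
    """Combine the validity timelines of several Concept Contexts.

    Each Concept Context is an iterable of (business_key, load_dts) pairs.
    Returns (business_key, combined_load_dts, combined_load_end_dts) tuples.
    """
    starts = {}
    for cc in concept_contexts:
        for bk, load_dts in cc:
            starts.setdefault(bk, set()).add(load_dts)
    timeline = []
    for bk in sorted(starts):
        points = sorted(starts[bk])
        for i, start in enumerate(points):
            if i + 1 < len(points):
                end = points[i + 1] - timedelta(seconds=1)
            else:
                end = OPEN_END
            timeline.append((bk, start, end))
    return timeline

# Load timestamps from the NSR-Station example (day-month-year in the tables).
adres = [("Ut", datetime(2013, 6, 5, 22)), ("Ut", datetime(2015, 7, 4, 22)),
         ("Asd", datetime(2013, 6, 5, 22))]
valuation = [("Ut", datetime(2014, 1, 1, 22)), ("Ut", datetime(2015, 1, 1, 22)),
             ("Ut", datetime(2016, 3, 1, 22))]
bct = bc_timeline(adres, valuation)
```

For station Ut this yields the five intervals of the BCT table; joining each Concept Context to these intervals then gives a consistent view at any point in time.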

modelling


What makes PS-3C a different ensemble?

1) Explicitly splitting the work

Data + History

Subjects + Integration

2) NO hashed business keys or surrogate keys

Only concatenated ones

Image: http://www.cannabisculture.com/files/images/6/hashbrick.JPG

3) Fewer joins

Relation + technical validity timeline + relation context, together in one entity

Photo credit: Public Domain

4) Explicit HELPER entities

Business Alias

Business Concept Timeline

+ explicitly define driving key(s)

Photo credit: Public Domain

Hope to AVOID this…

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me

‘Head of BI’, Spil Games

Certified Data Vault modeler since 2009

Contact details: nl.linkedin.com/in/rogierwerschkull

rogierwerschkull@gmail.com

rwerschkull

Page 7: PS-3C Data Modelling Zone Berlin

What problems?

Photo: my own…

Problem

Not built for Big Data lake data centricity

Photo credit: Lake, Public Domain; http://www.writeups.org/star-trek-brent-spiner-data

‘Data may first be stored in a data lake so that it can be explored, cleaned and prepared.

If it can be structured in a relational format (basically rows and columns) and needs to be used frequently and kept highly secure, it may go into a data warehouse.

If it stops being used frequently, it may go back to a HDFS (Hadoop Distributed File System)-based archive’

Data centric: data first. Thomas H. Davenport, Wall Street Journal, 3-6-2015

http://blogs.wsj.com/cio/2015/06/03/the-shift-to-a-new-data-architecture

Systems Like


Data Flood

Photo credit: Kurayba (https://www.flickr.com/photos/48503330@N08/28564454666) under cc licence (https://creativecommons.org/licenses/by-sa/2.0)

The possible result…

Photo credit: https://highfiveexports.wordpress.com/2010/06/25/3000-pieces-lego-mix-specialty-pieces-rare-pieces-bricks-blocks-parts-more-ultimate-lot-of-lego-parts-pieces-lego-for-sale-lego-batman-lego-starwars-lego-technic-lego-minifigur

But isn’t Data Vault v2 ‘made for Big Data centric systems’?

In DV2 you still do this in one go:

Subject Oriented, Integrated, Time Variant, Non-Volatile EDW

Coding = a lot like modelling

Being data centric conflicts with the complex data MODELLING work

http://xkcd.com/844

Problem

Less mature JOIN optimizers

Photo credit: Public Domain

Key-Value, Document and Column Family NoSQL databases + SQL-on-Hadoop solutions do NOT like joins

Photo credit: Public Domain

This REALLY complicates Anchor Modeling

And personally: this one too (Link Satellite)

[Diagram: Data Vault model with HUBs, SATs and a LINK; the label "This one too" points at the Link Satellite]

problem


a) HASHING of business keys

Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32
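The key-length columns of this table can be checked with a few lines of code. The sketch assumes the timestamps were written as `hh:mm:ss` inside the concatenated key (which matches the stated key lengths) and makes no claim that the digests it computes equal the hash values printed on the slide.

```python
import hashlib

# (rolling stock nr, timestamp, sensor id) for three of the example rows
rows = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:27", "72A1_SLDET"),
    ("8674", "2015-01-22 01:34:26", "16A1_HSVEROND"),
]

def key_and_hash_length(parts):
    """Return (length of concatenated key, length of its MD5 hex digest)."""
    key = "|".join(parts)
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return len(key), len(digest)

lengths = [key_and_hash_length(r) for r in rows]
print(lengths)
```

The hash is a fixed 32 characters, so for keys of this size hashing saves little or no space while giving up the readable, sortable natural key.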


Page 8: PS-3C Data Modelling Zone Berlin

problem

rwerschkull

nllinkedincominrogierwerschkull

Not Build for

BiG data lake

Data CentrICITY

Photo credit Lake Public Domain httpwwwwriteupsorgstar-trek-brent-spiner-datarwerschkull

nllinkedincominrogierwerschkull

lsquoData may first be stored in a

data lake so that it can be explored cleaned and prepared

If it can be structured in a relational format (basically rows and columns) and needs to be used frequently and kept highly secure it may go into a

data warehouse

If it stops being used frequently it may go back to a HDFS

(Hadoop Distributed File System)-based archiversquo

Data Centric data first THOMAS H DAVENPORT WALL STEET JOURNAL OF 3-6-2015

httpblogswsjcomcio20150603the-shift-to-a-new-data-architecture rwerschkull

nllinkedincominrogierwerschkull

Systems Like

rwerschkull

nllinkedincominrogierwerschkull

Data Flood

Photo credit Kurayba (httpswwwflickrcomphotos48503330N0828564454666 )

under cc licence (httpscreativecommonsorglicensesby-sa20)

rwerschkull

nllinkedincominrogierwerschkull

The possible resulthellip

Photo credit httpshighfiveexportswordpresscom201006253000-pieces-lego-mix-specialty-pieces-rare-pieces-bricks-blocks-

parts-more-ultimate-lot-of-lego-parts-pieces-lego-for-sale-lego-batman-lego-starwars-lego-technic-lego-minifigur

rwerschkull

nllinkedincominrogierwerschkull

But isNrsquot Data Vault v2

lsquomade for

Big data centric

systemsrsquorwerschkull

nllinkedincominrogierwerschkull

In

DV2you still

do thisin one go

Subject Oriented

Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Coding =

A lot like modelling

Being Data Centric

conflictswith thecomplex Data

MODELLINGwork

httpxkcdcom844

problem

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Less Mature

JOINoptimizers

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

Key-Value

Document

Column Family

NoSQL Databases

+SQL on Hadoop solutions

Do NOT like joins

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

This

REALLYcomplicates

AnchoRModeling

rwerschkull

nllinkedincominrogierwerschkull

And Personally

HUB

SAT

LINK

This one

too(Link Satellite)

HUB

HUB

SAT

SAT

rwerschkull

nllinkedincominrogierwerschkull

problem

rwerschkull

nllinkedincominrogierwerschkull

a) Hashing of business keys

Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Hash Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32

Hashing…

Loses statistical information regarding the data distribution. Query optimizers do not like this…

Column family, Document and Key-value databases need a good (natural) sharding key for (partial) key-lookups

http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2/
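The contrast between a concatenated business key and its hash can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, the `|` separator follows the slide, and the timestamps are assumed to be formatted as `01:34:27`, which is what makes the key lengths on the slide (34 and 35 characters) come out.

```python
import hashlib


def concatenated_business_key(*parts: str, sep: str = "|") -> str:
    """Join the natural key parts into one readable business key."""
    return sep.join(parts)


def md5_hash_key(business_key: str) -> str:
    """Return the 32-character hexadecimal MD5 digest of a business key."""
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()


# (rolling stock nr, datetime, sensor id), taken from the slide's table
readings = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:27", "72A1_SLDET"),
    ("8674", "2015-01-22 01:34:26", "72A1_FINV"),
]

keys = [concatenated_business_key(*reading) for reading in readings]
hashes = [md5_hash_key(key) for key in keys]

# Natural keys of one rolling stock number share a prefix, so a partial
# key lookup (or a range scan on a sorted store) stays possible.
prefix_matches = [key for key in sorted(keys) if key.startswith("8739|")]

# The hashes of those same keys have a fixed length and no common prefix.
for key, digest in zip(keys, hashes):
    print(len(key), key, len(digest), digest)
```

The natural keys stay sortable and prefix-searchable, while the hashed keys are spread evenly without any relation to the original values, which is the loss of distribution information the slide refers to.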

B) Surrogate business keys

Surrogate keys require centralized coordination

…and thus can impact the overall system’s scalability and availability

A lot of MPP NoSQL databases simply do not have them…

Then some inspiration

http://roelantvos.com/blog/?p=1119

‘In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse’

‘The Historical Staging Area effectively ‘acts’ as Data Lake but in a better defined form, as data deltas and event datetimes are taken into account’

So

Subject Oriented, Integrated, Time Variant, Non-Volatile EDW

Time Variant & Non Volatile

Subject Oriented & Integrated EDW

Could be a Data Lake: Time Variant & Non Volatile

Virtualised Ensemble Tier: Subject Oriented & Integrated EDW

How does PS-3C work?

Photo credit: Public Domain

Focus of current ensemble EDWs

Staging Area

EDW

Information Marts

Splitting the work

Persistent Staging Area (HSA) = Data Library

EDW

Information Marts

Persistent Staging - Concept Context, Connector, Business Concept

modelling

Persistent Staging - how

Identify the Primary or Unique Key of the source event stream. Use source metadata for this

Automate the building of a PS ‘around this key’. Take all columns

Historize using an SCD-2 approach
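The three steps above can be sketched as a small SCD-2 loader. This is a minimal sketch under stated assumptions, not the author's implementation: `PsRow`, `load_delta` and the column names are hypothetical, the key columns are assumed to come from source metadata, and the end-dating convention (one second before the next load timestamp, `31-12-9999` for the open row) is borrowed from the example data elsewhere in these slides.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Sequence, Tuple

HIGH_DATE = datetime(9999, 12, 31)  # marks the currently valid row


@dataclass
class PsRow:
    key: Tuple[str, ...]           # values of the source primary/unique key
    attributes: Dict[str, str]     # all remaining source columns
    load_dts: datetime
    load_end_dts: datetime = HIGH_DATE


def load_delta(history: List[PsRow], key_columns: Sequence[str],
               record: Dict[str, str], load_dts: datetime) -> None:
    """Historize one source record 'around its key' using SCD-2."""
    key = tuple(record[c] for c in key_columns)
    attributes = {c: v for c, v in record.items() if c not in key_columns}
    current = next((r for r in history
                    if r.key == key and r.load_end_dts == HIGH_DATE), None)
    if current is not None:
        if current.attributes == attributes:
            return  # nothing changed, keep the open row
        # Close the old version one second before the new one starts.
        current.load_end_dts = load_dts - timedelta(seconds=1)
    history.append(PsRow(key, attributes, load_dts))


history: List[PsRow] = []
load_delta(history, ["station"], {"station": "Ut", "postcode": "3500GJ"},
           datetime(2013, 6, 5, 22))
load_delta(history, ["station"], {"station": "Ut", "postcode": "3500GJ"},
           datetime(2014, 1, 1, 22))   # unchanged: no new row
load_delta(history, ["station"], {"station": "Ut", "postcode": "3511 CE"},
           datetime(2015, 6, 5, 22))   # changed: old row is end-dated
```

Note that closing the old row is an update, which is exactly why the Load End Date Timestamp is harder to maintain on append-only storage.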

Persistent Staging Metadata-1

Entity level:

Unique key

Functional Description

Delivering party

Owner / Responsible

MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)

…

Persistent Staging Metadata-2

Column level:

[Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table / file level (lowest possible)

Load End Date Timestamp is possible but difficult… it requires updates

Updates in Hive (isn’t HDFS append only?)

ACID is possible in Hive

ACID makes updates possible by registering updates as ‘new data’

Reconciliation / compacting when idle or at user command

Use ORC files

PLUS changing the Hive configuration…
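The idea of "updates registered as new data, reconciled later" can be illustrated with a toy model. The sketch below is not Hive's implementation and uses no Hive API; `AppendOnlyTable` and its methods are hypothetical names that only mimic the principle of a base plus appended deltas and a compaction step.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]
Delta = Tuple[str, Optional[Row]]  # (key, new row) or (key, None) for a delete


class AppendOnlyTable:
    """Toy model: the base is never edited in place, changes are appended."""

    def __init__(self) -> None:
        self.base: Dict[str, Row] = {}
        self.deltas: List[Delta] = []

    def write(self, key: str, row: Row) -> None:
        """Insert or update: both are just a new delta record."""
        self.deltas.append((key, row))

    def delete(self, key: str) -> None:
        self.deltas.append((key, None))

    def read(self) -> Dict[str, Row]:
        """Readers merge the base with all deltas, in arrival order."""
        view = dict(self.base)
        for key, row in self.deltas:
            if row is None:
                view.pop(key, None)
            else:
                view[key] = row
        return view

    def compact(self) -> None:
        """Fold the deltas into a new base (when idle or on request)."""
        self.base = self.read()
        self.deltas = []


table = AppendOnlyTable()
table.write("Ut", {"postcode": "3500GJ"})
table.write("Ut", {"postcode": "3511 CE"})   # the 'update' is new data
pending_before = len(table.deltas)
view_before = table.read()
table.compact()
```

Reads give the same result before and after compaction; compaction only reduces the amount of delta data each read has to merge.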

What about semi-structured data?

Hive: put semi-structured data (= variable columns) in the MAP data type

OR use a data storage type that supports schema evolution: Avro (ORC in development)

Or HBase… It only has one data type (byte); the schema is ‘applied’

Schema can be different for every row
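Putting the variable columns in a map can be sketched as follows. This is an illustration in plain Python, not Hive DDL: the fixed column names and the sample events are hypothetical, and the map values are stored as strings on the assumption that a single map value type is used.

```python
from typing import Any, Dict, Tuple

# Hypothetical fixed columns that every record is expected to carry.
FIXED_COLUMNS = ("rolling_stock_nr", "datetime", "sensor_id")


def split_record(record: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Split a semi-structured record into fixed columns and a map column."""
    fixed = {column: record.get(column) for column in FIXED_COLUMNS}
    variable = {name: str(value) for name, value in record.items()
                if name not in FIXED_COLUMNS}
    return fixed, variable


# Two events with different sets of attributes still fit the same layout.
event_a = {"rolling_stock_nr": "8739", "datetime": "2015-01-22 01:34:27",
           "sensor_id": "72A1_FINV", "value": 123}
event_b = {"rolling_stock_nr": "8674", "datetime": "2015-01-22 01:34:26",
           "sensor_id": "16A1_HSVEROND", "value": 7, "unit": "V"}

fixed_a, variable_a = split_record(event_a)
fixed_b, variable_b = split_record(event_b)
```

The table structure stays stable while each row carries its own set of attribute names, which is the "schema can be different for every row" behaviour described above.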

modelling

3C - how

Like Data Vault (2) BUT

Always starts with conceptual data modelling

NOT the primary location of data & history

Virtualised (only if performance allows)

Should be deterministic

No Link Satellites

No surrogate or hash keys, only ‘Concatenated Natural Business Keys’

Explicit helper entities

3C - Details: Business Concept (BC)

[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; Connector; Business Alias A; Business Alias B; BC-Timeline X]

A UNIQUE, domain-specific point of integration

…a business entity

…within its own domain

…does not necessarily need to be enterprise wide

Why not ‘enterprise wide’?

Company Customer

Sales Customer

International Sales Customer

Local Sales Customer

Marketing Customer

Customer …

Business Concept Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Example-Data

NSR-Station

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Trainseries

Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Travelcard

Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller

Business Key
CRM-RW123456
CRM-LAS224466
…

BC important notes

Easiest entity to virtualise (if performance allows)

No hashing & no surrogate BUSINESS KEYS (not by default at least)

3C - Details: Concept Context (CC)

Contains context about a Concept, in a historical way

…like a Data Vault Satellite

Every CC belongs to only one BC

Separate entity per source system table / stream

Concept Context Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:

• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts:

[valuation] Source: NSS2 table p

[description] Source: NSS1 table x

[adres] Source: NSS1 table y

[description] Source: NTR table q

[ovchip_personal] Source: NSR table r

[ovchip_on-usage] Source: NSR table s

[personal_details] Source: NSR table t

[adres_data] Source: NSR table t

Example-data

NSR-Station, [adres] Source: NSS1 table y

BK_NSR-Station  Postadres_postcode  GPS             …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              5208954 511064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             5237269 489299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             5237269 489299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …               …  …            …                  …

CC important notes

More difficult to virtualise: depends on the semantic gap with the source

But do make it virtual when ‘streaming data’ is necessary

Because we have the PS layer, exposing all columns is not necessary

Refactoring is easier…

3C - Details: Connector (C)

Relations between Concepts + Context, in a historical way

…a merger of the Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time

Connector Driving key

Explicitly defining a driving key as metadata…

Gives business understanding

Makes it possible for the Connector to correctly handle delta data deliveries…

…so that a change (on the driving key) is not registered as a new ‘connection’
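How an explicitly defined driving key helps with delta deliveries can be sketched as below. This is one possible reading of the slide, not a definitive loader: `ConnectorRow`, `apply_delta`, the column names and the two sample deliveries (a journey first delivered without, then with its destination) are hypothetical. A delivery that matches an open row on the driving key is treated as a new version of that connection instead of as an additional connection.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Sequence

HIGH_DATE = datetime(9999, 12, 31)  # marks the currently valid row


@dataclass
class ConnectorRow:
    values: Dict[str, str]
    load_dts: datetime
    load_end_dts: datetime = HIGH_DATE


def apply_delta(rows: List[ConnectorRow], driving_key: Sequence[str],
                record: Dict[str, str], load_dts: datetime) -> None:
    """Load one delivered record, versioning on the driving key."""
    delivered_key = tuple(record[c] for c in driving_key)
    for row in rows:
        is_current = row.load_end_dts == HIGH_DATE
        if is_current and tuple(row.values[c] for c in driving_key) == delivered_key:
            if row.values == record:
                return  # identical delivery, nothing to do
            # Same connection, changed content: close the old version.
            row.load_end_dts = load_dts - timedelta(seconds=1)
    rows.append(ConnectorRow(dict(record), load_dts))


# Driving key as on the example slide: travel card + check-in timestamp.
DRIVING_KEY = ("travelcard", "checkin")

movements: List[ConnectorRow] = []
apply_delta(movements, DRIVING_KEY,
            {"travelcard": "3528 0234 2073 1234", "checkin": "5-4-2016 8:49:32",
             "station_from": "Asd", "station_to": ""},
            datetime(2016, 4, 5, 22))
apply_delta(movements, DRIVING_KEY,
            {"travelcard": "3528 0234 2073 1234", "checkin": "5-4-2016 8:49:32",
             "station_from": "Asd", "station_to": "Ut"},
            datetime(2016, 4, 6, 22))

current = [m for m in movements if m.load_end_dts == HIGH_DATE]
```

Without the driving key, the second delivery would look like a second journey; with it, there is one journey with two versions.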

Connector Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:

• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Connector: NSR-Travelmovement (Checkin timestamp, from, to)

Driving Key: NS-Travelcard + Checkin timestamp

Example-Data

NSR-Travelmovement (Checkin timestamp, from, to), connecting NSR-Station, NS-Travelcard and NS-Trainseries

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No

We add two mandatory HELPER entities

modelling

3C - Details: Business Alias

To help switching from sources that are tied together by technical (surrogate) keys…

…to a Business Key based model

It’s a LOOKUP table that translates the technical key to the Business Key

Example

NSR-Station

[valuation] Source: NSS2 table p

[description] Source: NSS1 table x

[adres] Source: NSS1 table y

BA-NSS1: key lookup for NSS1 source tables

BA-NSS2: key lookup for NSS2 source tables

Example-data

BA-NSS1: key lookup for NSS1 source tables

Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

BA important details

Has a 1-to-(0,1) relation with a Business Concept

More difficult to virtualise

Lookup table should be kept small

Therefore DO NOT do the key lookup in the Concept Context entity

Load / generate together with the BC

Preferably ‘in memory’ somehow…
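A Business Alias is in essence a small key translation table. The sketch below holds it in memory as a dictionary, using the BA-NSS1 values from the example data; the function name, the `station_sk` column and the sample source rows are hypothetical.

```python
from typing import Dict, List, Optional

# Business Alias for source NSS1: technical surrogate key -> business key.
ba_nss1: Dict[int, str] = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(alias: Dict[int, str], surrogate_key: int) -> Optional[str]:
    """Translate a source surrogate key; None when the key is unknown."""
    return alias.get(surrogate_key)


# Hypothetical source rows that only carry the technical key.
source_rows: List[Dict[str, object]] = [
    {"station_sk": 123522, "postcode": "3500GJ"},
    {"station_sk": 666323, "postcode": "1012 AB"},
]

translated = [
    {"BK_NSR-Station": to_business_key(ba_nss1, int(row["station_sk"])),
     "postcode": row["postcode"]}
    for row in source_rows
]
```

Keeping the alias small and in memory makes the translation a cheap dictionary lookup rather than a join against a large table.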

3C - Details: BC-Timeline

Integrates the validity timelines of the Concept Contexts belonging to a Business Concept

Like a Data Vault Point-in-time construct

But mandatory

And with a clearly defined and performant approach

Example

NSR-Station

[valuation] Source: NSS2 table p

BK_NSR-Station  WOZ value   Value rating agency X  META_Laad_dts     META_Laad_eind_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

[adres] Source: NSS1 table y

BK_NSR-Station  Postadres_postcode  GPS             …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              5208954 511064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             5237269 489299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             5237269 489299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …               …  …            …                  …

BCT

BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
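The combined timeline in the example can be derived from the load timestamps of the individual Concept Contexts. The sketch below is one simple way to do this, assuming that only the start timestamps matter (gaps and deletes are ignored) and that dates are read as day-month-year; `bc_timeline` is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Set, Tuple

HIGH_DATE = datetime(9999, 12, 31)

Interval = Tuple[datetime, datetime]


def bc_timeline(context_rows: Iterable[Tuple[str, datetime]]) -> Dict[str, List[Interval]]:
    """Merge the load timestamps of all Concept Contexts per business key."""
    starts: Dict[str, Set[datetime]] = {}
    for business_key, load_dts in context_rows:
        starts.setdefault(business_key, set()).add(load_dts)

    timeline: Dict[str, List[Interval]] = {}
    for business_key, points in starts.items():
        ordered = sorted(points)
        # Each slice ends one second before the next one starts.
        ends = [nxt - timedelta(seconds=1) for nxt in ordered[1:]] + [HIGH_DATE]
        timeline[business_key] = list(zip(ordered, ends))
    return timeline


# (business key, load timestamp) pairs from the two example Concept Contexts.
valuation = [("Ut", datetime(2014, 1, 1, 22)), ("Ut", datetime(2015, 1, 1, 22)),
             ("Ut", datetime(2016, 3, 1, 22))]
adres = [("Ut", datetime(2013, 6, 5, 22)), ("Ut", datetime(2015, 7, 4, 22)),
         ("Asd", datetime(2013, 6, 5, 22))]

timeline = bc_timeline(valuation + adres)
```

For station `Ut` this yields five consecutive slices and for `Asd` a single open slice, matching the BCT rows of the example.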

modelling

What makes PS-3C a different Ensemble?

1) Explicitly splitting the work

Data + History

Subjects + Integration

2) NO HASHED BUSINESS KEYS or surrogate keys

Only concatenated ones

http://www.cannabisculture.com/files/images/6/hashbrick.JPG

3) Fewer joins

Relation + Technical validity timeline + Relation context: together in one entity

Photo credit: Public Domain

4) Explicit HELPER entities

Business Alias

Business Concept Timeline

+ explicitly define Driving key(s)

Photo credit: Public Domain

Hope to AVOID this…

http://xkcd.com/927

PS-3C: A new PROPOSED ensemble modelling technique

Help needed

Questions

About Me

‘Head of BI’, Spil Games

Certified Data Vault modeler since 2009

Contact details: nl.linkedin.com/in/rogierwerschkull

rogierwerschkullgmailcom

rwerschkull


Page 10: PS-3C Data Modelling Zone Berlin

'Data may first be stored in a data lake, so that it can be explored, cleaned and prepared.

If it can be structured in a relational format (basically rows and columns) and needs to be used frequently and kept highly secure, it may go into a data warehouse.

If it stops being used frequently, it may go back to a HDFS (Hadoop Distributed File System)-based archive.'

Data centric: data first. Thomas H. Davenport, Wall Street Journal, 3-6-2015

http://blogs.wsj.com/cio/2015/06/03/the-shift-to-a-new-data-architecture/

Systems Like

Data Flood

Photo credit: Kurayba (https://www.flickr.com/photos/48503330@N08/28564454666), under CC licence (https://creativecommons.org/licenses/by-sa/2.0/)

The possible result…

Photo credit: https://highfiveexports.wordpress.com/2010/06/25/3000-pieces-lego-mix-specialty-pieces-rare-pieces-bricks-blocks-parts-more-ultimate-lot-of-lego-parts-pieces-lego-for-sale-lego-batman-lego-starwars-lego-technic-lego-minifigur

But isn't Data Vault v2 'made for Big Data centric systems'?

In DV2 you still do this in one go:

Subject Oriented, Integrated, Time Variant, Non-Volatile EDW

Being data centric conflicts with the complex data MODELLING work

Coding = a lot like modelling

http://xkcd.com/844/

problem

Less mature JOIN optimizers

Photo credit: Public Domain

Key-Value, Document and Column Family NoSQL databases + SQL-on-Hadoop solutions do NOT like joins

Photo credit: Public Domain

This REALLY complicates Anchor Modeling

And personally…

[Diagram: a Data Vault model of Hubs (HUB), Satellites (SAT) and a Link (LINK); the Link Satellite is marked 'This one too']

problem

a) Hashing of business keys

Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32

Hashing…
• Loses statistical information regarding the data distribution. Query optimizers do not like this…
• Column family, Document and Key-value databases need a good (natural) sharding key for (partial) key-lookups

http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2/
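The contrast between a concatenated and a hashed business key can be sketched in a few lines of Python. This is only an illustration, not the author's tooling: the helper names `concat_business_key` and `md5_key` are hypothetical, the `|` separator is taken from the table, and the timestamps are assumed to be written with colons. The hash values in the table are not reproduced here, since the exact input encoding used for them is unknown.

```python
import hashlib


def concat_business_key(*parts, sep="|"):
    """Join the key parts into one readable, concatenated business key."""
    return sep.join(str(part) for part in parts)


def md5_key(business_key):
    """Hash a business key to a fixed-length hex string (the practice questioned here)."""
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()


# Hypothetical sample readings: (rolling stock nr, datetime, sensor id)
readings = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:27", "72A1_SLDET"),
    ("8674", "2015-01-22 01:34:26", "72A1_FINV"),
]

keys = [concat_business_key(*reading) for reading in readings]
hashes = [md5_key(key) for key in keys]

# A natural key keeps its leading part, so a partial (prefix) lookup still works.
vehicle_8739 = [key for key in keys if key.startswith("8739|")]

# Every hash has the same length and carries no usable prefix or ordering.
hash_lengths = {len(h) for h in hashes}
```

The prefix filter stands in for the partial key-lookups that sharded stores rely on. With hashed keys the same selection would need the original columns or a full scan.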

b) Surrogate business keys

Surrogate keys require centralized coordination

… and thus can impact the overall system's scalability and availability

A lot of MPP NoSQL databases simply do not have them…

Then some inspiration

http://roelantvos.com/blog/?p=1119

'In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse.'

'The Historical Staging Area effectively 'acts' as Data Lake, but in a better defined form as data deltas and event datetimes are taken into account.'

So

[Diagram: the single EDW (Subject Oriented, Integrated, Time Variant, Non-Volatile) shown next to an EDW split into two parts: 'Time Variant & Non-Volatile' and 'Subject Oriented & Integrated']

Could be a:
• Data Lake (Time Variant & Non-Volatile)
• Virtualised Ensemble tier (Subject Oriented & Integrated)

[Diagram: the two parts together form the EDW]

How does PS-3C work?

Photo credit: Public Domain

Focus of current ensemble EDWs

[Diagram: Staging Area, EDW, Information Marts]

Splitting the work

[Diagram: Persistent Staging Area / HSA = Data Library, EDW, Information Marts]

Persistent Staging - Concept Context, Connector, Business Concept

modelling

Persistent Staging - how

• Identify the Primary or Unique Key of the source / event stream. Use source metadata for this
• Automate the building of a PS 'around this key'. Take all columns
• Historize using an SCD-2 approach
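The SCD-2 historization step can be sketched as follows. This is a minimal in-memory sketch under stated assumptions, not the author's implementation: the function `historize` and the column names `station` and `postcode` are hypothetical, the `META_Load_dts` / `META_Load_end_dts` names and the 31-12-9999 open end date follow the example tables in this deck, and a closed version is assumed to end one second before the next load.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # open-ended validity, as in the example tables


def historize(history, key_cols, incoming, load_dts):
    """Apply one delivered record to an SCD-2 history list (in place)."""
    key = tuple(incoming[c] for c in key_cols)
    current = next(
        (row for row in history
         if tuple(row[c] for c in key_cols) == key
         and row["META_Load_end_dts"] == HIGH_DATE),
        None,
    )
    if current is not None:
        payload = {k: v for k, v in current.items() if not k.startswith("META_")}
        if payload == incoming:
            return history  # nothing changed, keep the current version open
        # Close the current version one second before the new one starts.
        current["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
    history.append({**incoming,
                    "META_Load_dts": load_dts,
                    "META_Load_end_dts": HIGH_DATE})
    return history


history = []
historize(history, ["station"], {"station": "Ut", "postcode": "3500GJ"},
          datetime(2013, 6, 5, 22))
historize(history, ["station"], {"station": "Ut", "postcode": "3511 CE"},
          datetime(2015, 7, 4, 22))
# Re-delivering identical data does not create a new version.
historize(history, ["station"], {"station": "Ut", "postcode": "3511 CE"},
          datetime(2016, 1, 1, 22))
```

Because the key columns come from source metadata and all other columns are simply carried along, a loop over a catalogue of source tables could generate the whole Persistent Staging layer in the same way.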

Persistent Staging Metadata-1

Entity level:
• Unique key
• Functional description
• Delivering party
• Owner / Responsible
• MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation, …)
• …

Persistent Staging Metadata-2

Column level:
• [Load Date Timestamp]
• [Load End Date Timestamp]
• [Deleted Flag] OR delete as new record
• [Source system] on table / file level (lowest possible)

Load End Date Timestamp is possible but difficult… it requires updates

Updates in Hive (isn't HDFS append-only?)

• ACID is possible in Hive
• ACID makes updates possible by registering updates as 'new data'
• Reconciliation / compacting: when idle or at user command
• Use ORC files
• PLUS changing the Hive configuration…

What about SEMI-STRUCTURED data?

• Hive: put semi-structured data (= variable columns) in a MAP data type
• OR use a data storage type that supports schema evolution: Avro (ORC in development)
• Or HBase… It only has one data type (byte); the schema is 'applied'. The schema can be different for every row

modelling

3C - how

Like Data Vault (2), BUT:
• Always starts with conceptual data modelling
• NOT the primary location of data & history
• Virtualised (only if performance allows); should be deterministic
• No Link Satellites
• No surrogate or hash keys, only 'Concatenated Natural Business Keys'
• Explicit helper entities

3C - Details

[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; helper entities Business Alias A, Business Alias B and BC-Timeline X]

Business Concept (BC)
• A UNIQUE, domain-specific point of integration
• … a business entity
• … within its own domain
• … does not necessarily need to be enterprise-wide

Why not 'enterprise wide'?

[Diagram labels: Company; Customer; Sales Customer; International Sales Customer; Local Sales Customer; Marketing Customer; Customer …]

Business Concept Metadata

Entity level:
• [Description]
• [Owner / Responsible]

Column level:
• [Load Date Timestamp]
• [Source system] on table / file level (lowest possible)

Example-data

NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
…

BC important notes

• Easiest entity to virtualise (if performance allows)
• No hashing & no surrogate BUSINESS KEYS (not by default at least)

Concept Context (CC)

• Contains context about a Concept, in a historical way
• … like a Data Vault Satellite
• Every CC belongs to only one BC
• Separate entity per source system table / stream

Concept Context Metadata

Entity level:
• [Description]
• [Owner / Responsible]

Column level:
• [Load Date Timestamp]
• [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts:
• [valuation], source NSS2 table p
• [description], source NSS1 table x
• [adres], source NSS1 table y
• [description], source NTR table q
• [ovchip_personal], source NSR table r
• [ovchip_on-usage], source NSR table s
• [personal_details], source NSR table t
• [adres_data], source NSR table t

Example-data: NSR-Station, [adres] (source NSS1 table y)

BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                 …  …            …                  …

CC important notes

• More difficult to virtualise; depends on the semantic gap with the source
• But do make it virtual when 'streaming data' is necessary
• Because we have the PS layer, exposing all columns is not necessary
• Refactoring is easier…

Connector (C)

• Relations between Concepts + context, in a historical way
• … a merger of the Data Vault Link + Link Satellite
• Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time

Connector driving key

Explicitly defining a driving key as metadata…
• Gives business understanding
• Makes it possible for the Connector to correctly handle delta data deliveries…
  • so that a change (on the driving key)
  • is not registered as a new 'connection'
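How a driving key might steer delta handling can be sketched like this. It is an illustrative sketch only: `load_connector`, the simplified column names and the half-finished journey in the demo are assumptions made for the example, not part of the original deck. The driving key used (travelcard + check-in timestamp) follows the NSR-Travelmovement example.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)


def load_connector(rows, driving_key, delivery, load_dts):
    """Load one delivered relation into a Connector, using its driving key.

    A delivery with a known driving key is treated as a change to the existing
    relation: the open row is end-dated and a new version is added, instead of
    leaving two open 'connections' side by side.
    """
    dk = tuple(delivery[c] for c in driving_key)
    for row in rows:
        same_key = tuple(row[c] for c in driving_key) == dk
        if same_key and row["META_Load_end_dts"] == HIGH_DATE:
            if all(row[c] == v for c, v in delivery.items()):
                return rows  # identical re-delivery, nothing to do
            row["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
    rows.append({**delivery,
                 "META_Load_dts": load_dts,
                 "META_Load_end_dts": HIGH_DATE})
    return rows


driving_key = ("travelcard", "checkin_ts")
rows = []
checkin = datetime(2016, 4, 5, 8, 49, 32)

# Hypothetical first delivery: checked in, destination not yet known.
load_connector(rows, driving_key,
               {"travelcard": "3528 0234 2073 1234", "checkin_ts": checkin,
                "station_from": "Asd", "station_to": None},
               datetime(2016, 4, 5, 22))
# Later delivery for the same driving key: the destination is now known.
load_connector(rows, driving_key,
               {"travelcard": "3528 0234 2073 1234", "checkin_ts": checkin,
                "station_from": "Asd", "station_to": "Ut"},
               datetime(2016, 4, 6, 22))

open_rows = [r for r in rows if r["META_Load_end_dts"] == HIGH_DATE]
```

Without the driving key, the loader could only compare the full set of keys, and the second delivery would look like a second, unrelated journey.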

Connector Metadata

Entity level:
• [Description]
• [Owner / Responsible]

Column level:
• [Load Date Timestamp]
• [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

[Diagram: the Business Concepts and Concept Contexts of the previous example, now with the Connector NSR-Travelmovement (Checkin timestamp, from, to)]

Driving key: NS-Travelcard + Checkin timestamp

Example-data: Connector NSR-Travelmovement (NSR-Station, NS-Travelcard, NS-Trainseries)

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No.

We add two mandatory HELPER entities

modelling

Business Alias

• Helps switching from sources that are tied together by technical (surrogate) keys… to a Business Key based model
• It's a LOOKUP table that translates the technical key to the Business Key

Example: NSR-Station
• [valuation], source NSS2 table p
• [description], source NSS1 table x
• [adres], source NSS1 table y

• BA-NSS1: key lookup for NSS1 source tables
• BA-NSS2: key lookup for NSS2 source tables

Example-data

BA-NSS1 (key lookup for NSS1 source tables)
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

BA important details

• Has a 1 to (0..1) relation with a Business Concept
• More difficult to virtualise
• The lookup table should be kept small; therefore DO NOT do the key lookup in the Concept Context entity
• Load / generate together with the BC
• Preferably 'in memory' somehow…
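A Business Alias used as an in-memory lookup could look roughly like the sketch below. The dictionary mirrors the BA-NSS1 example data; the function `to_business_key` and the source column name `station_id` are hypothetical and only serve the illustration.

```python
# BA-NSS1: technical (surrogate) key of source NSS1 -> business key of NSR-Station
ba_nss1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(source_row, alias, surrogate_col="station_id"):
    """Replace a source row's surrogate key with the business key from the alias.

    Unknown surrogate keys yield None, so they can be reported instead of guessed.
    """
    row = dict(source_row)
    surrogate = row.pop(surrogate_col)
    row["BK_NSR-Station"] = alias.get(surrogate)
    return row


translated = to_business_key({"station_id": 123522, "postcode": "3500GJ"}, ba_nss1)
unknown = to_business_key({"station_id": 999999, "postcode": "0000XX"}, ba_nss1)
```

Keeping the alias as a small key-to-key mapping is what makes an in-memory lookup realistic; context columns stay in the Concept Context entities.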

BC-Timeline

• Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
• Like a Data Vault Point-in-time construct
• But mandatory
• And with a clearly defined and performant approach

Example: NSR-Station

[valuation], source NSS2 table p

BK_NSR-Station  WOZ value   Value rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

BC-Timeline (BCT)

BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00

[adres], source NSS1 table y

BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                 …  …            …                  …
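The combined timeline in the example can be derived mechanically from the load timestamps of the two Concept Contexts. The sketch below is one possible way to do it, not the author's defined approach: `combine_timelines` is a hypothetical helper, dates are read as day-month-year, and each interval is assumed to end one second before the next one starts.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)


def combine_timelines(*contexts):
    """Merge the load timestamps of several Concept Contexts per business key.

    Each context is a list of (business_key, load_dts) tuples. The result is a
    list of (business_key, combined_load_dts, combined_load_end_dts) tuples.
    """
    starts = {}
    for context in contexts:
        for business_key, load_dts in context:
            starts.setdefault(business_key, set()).add(load_dts)

    timeline = []
    for business_key in sorted(starts):
        points = sorted(starts[business_key])
        for start, nxt in zip(points, points[1:] + [None]):
            end = HIGH_DATE if nxt is None else nxt - timedelta(seconds=1)
            timeline.append((business_key, start, end))
    return timeline


adres = [("Ut", datetime(2013, 6, 5, 22)), ("Ut", datetime(2015, 7, 4, 22)),
         ("Asd", datetime(2013, 6, 5, 22))]
valuation = [("Ut", datetime(2014, 1, 1, 22)), ("Ut", datetime(2015, 1, 1, 22)),
             ("Ut", datetime(2016, 3, 1, 22))]

bct = combine_timelines(adres, valuation)
ut_rows = [row for row in bct if row[0] == "Ut"]
```

For Ut this yields five consecutive intervals and for Asd a single open-ended one, the same shape as the BCT table in the example.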

modelling

What makes PS-3C a different Ensemble?

1) Explicitly splitting the work
• Data + History
• Subjects + Integration

2) NO hashed business keys or surrogate keys
• Only concatenated ones

Image: http://www.cannabisculture.com/files/images/6/hashbrick.JPG

3) Fewer joins
• Relation + technical validity timeline + relation context, together in one entity

Photo credit: Public Domain

4) Explicit HELPER entities
• Business Alias
• Business Concept Timeline
• + explicitly define driving key(s)

Photo credit: Public Domain

Hope to AVOID this…

http://xkcd.com/927/

PS-3C
A new PROPOSED ensemble modelling technique

Help needed

Questions?

Page 11: PS-3C Data Modelling Zone Berlin

Systems Like

rwerschkull

nllinkedincominrogierwerschkull

Data Flood

Photo credit Kurayba (httpswwwflickrcomphotos48503330N0828564454666 )

under cc licence (httpscreativecommonsorglicensesby-sa20)

rwerschkull

nllinkedincominrogierwerschkull

The possible resulthellip

Photo credit httpshighfiveexportswordpresscom201006253000-pieces-lego-mix-specialty-pieces-rare-pieces-bricks-blocks-

parts-more-ultimate-lot-of-lego-parts-pieces-lego-for-sale-lego-batman-lego-starwars-lego-technic-lego-minifigur

rwerschkull

nllinkedincominrogierwerschkull

But isNrsquot Data Vault v2

lsquomade for

Big data centric

systemsrsquorwerschkull

nllinkedincominrogierwerschkull

In

DV2you still

do thisin one go

Subject Oriented

Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Coding =

A lot like modelling

Being Data Centric

conflictswith thecomplex Data

MODELLINGwork

httpxkcdcom844

problem

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Less Mature

JOINoptimizers

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

Key-Value

Document

Column Family

NoSQL Databases

+SQL on Hadoop solutions

Do NOT like joins

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

This

REALLYcomplicates

AnchoRModeling

rwerschkull

nllinkedincominrogierwerschkull

And Personally

HUB

SAT

LINK

This one

too(Link Satellite)

HUB

HUB

SAT

SAT

rwerschkull

nllinkedincominrogierwerschkull

problem

rwerschkull

nllinkedincominrogierwerschkull

a) HASHING OF Business keys

Rolling

Stock Nr Datetime Sensor Id Value Concatenated Business Key

Key

Len MD5 Hash

Key

Len

8739

2015-01-22

013427 72A1_FINV 123 8739|2015-01-22 013427|72A1_FINV 34 86ae4c6b0e2e2d5a13a0d11440529aeb 32

8739

2015-01-22

013427 72A1_SLDET 100 8739|2015-01-22 013427|72A1_SLDET 35 51ce9bc292eef407bd7c91a52eebcf2e 32

8739

2015-01-22

013432 72A1_FINV 126 8739|2015-01-22 013432|72A1_FINV 34 9482a41c1fecc4c64b8c437af6cc85e8 32

8739

2015-01-22

013432

13A8_MW_UB

AT_VT 5 8739|2015-01-22 013432|13A8_MW_UBAT_VT 42 e4160914ee55ce0b93f87b23366a0ce3 32

8674

2015-01-22

013426 72A1_FINV 6 8674|2015-01-22 013426|72A1_FINV 34 fcb3e7c8c91e44ce396d908a4948ca65 32

8674

2015-01-22

013426

16A1_HSVER

OND 7 8674|2015-01-22 013426|16A1_HSVEROND 38 fe9098c8c291ad56af5c8afae5169196 32

Loses statistical Informationregarding the data distributionQuery optimizers do not like thishellip

Column family Document and Key-value databases need a

good (natural) sharding key for (partial) key-

lookups

Hashinghelliphellip

httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2rwerschkull

nllinkedincominrogierwerschkull

Surrogates keys require

centralized coordination

hellipand thus can impact the overall systemrsquos scalability and availability

A lot of MPP NoSQL databases simply do not have themhellip

B) Surrogate BuSINESS keys

rwerschkull

nllinkedincominrogierwerschkull

Then Some Inspiration

httproelantvoscomblogp=1119

rwerschkull

nllinkedincominrogierwerschkull

lsquoIn my opinion the answer lies in the adoption of the

persistent (Historical) Staging Area concept

(also known as Historical Staging or the History Area)

This basically adopts the fundamentals of a Data Warehousersquo

lsquoThe Historical Staging Area effectively lsquoactsrsquo as

Data Lake but in a better defined form as data deltas and

event datetimes are taken into accountrsquo

So

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

Subject Oriented Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Could be a

Data LAKE

VirtualisedEnsemble

Tier

EDW

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

rwerschkull

nllinkedincominrogierwerschkull

How

Does PS-3C Work

rwerschkull

nllinkedincominrogierwerschkullPhoto credit Public Domain

StagingArea

EDWInformation

Marts

Focus of Current ensemble EDWrsquos

rwerschkull

nllinkedincominrogierwerschkull

Persistent StagingArea HSA =

Data LibraryEDW

Information Marts

Splitting the work

rwerschkull

nllinkedincominrogierwerschkull

Persitent

Staging

-

Concept

Context

Connector

Business

Concept

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

Identify source event stream Primary or Unique KeyUse source metadata for this

Automate the building of a PS lsquoaround this keyrsquo Take all columns

Historize using SCD-2 approach

Persistent Staging - how

rwerschkull

nllinkedincominrogierwerschkull

Entity levelUnique key

Functional Description

Delivering party

Owner Responsible

MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)

hellip

Persistent Staging Metadata-1

rwerschkull

nllinkedincominrogierwerschkull

Column level [Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table file level (lowest possible)

Load End Date Timestamp possible but difficulthellipRequires updates

Persistent Staging Metadata-2

rwerschkull

nllinkedincominrogierwerschkull

ACID is possible in HIVE

ACID Makes Updates possibleBy registering updates as lsquonew datarsquo

Reconciliation compacting when idle at user command

Use ORC files

PLUS changing the HIVE configurationhellip

UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)

rwerschkull

nllinkedincominrogierwerschkull

HivePut semi structured data = variable columns in MAP data type

OR use Data storage type that supports schema-evolutionAVRO (ORC in development)

Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo

Schema can be different for every row

What about SEMI-STRUCTURED Data

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

3C - how

rwerschkull

nllinkedincominrogierwerschkull

Always starts with Conceptual data modeling

NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic

No Link Satellites

No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo

Explicit Helper entities

Like Data Vault(2) BUT

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

a UNIQUE Domain specific point of integration

hellipa business entity

hellipwithin itrsquos own domain

hellipdoes not necessarily need to be Enterprise Wide

Business Concept (BC)

rwerschkull

nllinkedincominrogierwerschkull

Why not lsquoenterprise widersquoCompany

Customer

Sales Customer

International

Sales Customer

Local

Sales Customer

Marketing Customer

Customer hellip

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

Most easy entity to be virtualised(if performance allows)

No Hashing amp No surrogateBUSINESSKEYS

(not by default at least)

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Containts Context about a Concept In a historical way

hellipLike a Data Vault Satellite

Every CC belongs to only one BC

Seperate entity per source system table stream

Concept Context (CC)

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

Example-dataBK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts META_Deleted_Ind

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0

Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0

Asa hellip hellip hellip hellip hellip hellip

NSR-Station

[adres]Source NSS1 table y

More difficult to be virtualisedDepends on semantic gap with source

But do make virtual when lsquostreaming datarsquo is necessary

Because we have PS layerExposing all columns not necessary

Refactoring is more easyhellip

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Relations between Concepts + Context

In a historical way

hellipMerger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point

in time

Connector (C)

rwerschkull

nllinkedincominrogierwerschkull

Explicitly defining a driving key as metadatahellip

Gives business understanding

Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)

bull is not registered as a new lsquoconnectionrsquo

Connector Driving key

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Connector Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

Driving Key

NS-Travelcard

+Checkin timestamp

Example-Data

NSR-StationNS-

Travelcard

NS-

Trainseries

rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

BK_NSR-

Station-from

BK_NSR-

Station-to

BK_NS_

Trainseries

BK_NS-Travelcard Checkin

timestamp

Checkout

timestamp

META_Load_dts META_Load_end_dts

Asd Ut IC 855 3528 0234 2073

1234

5-4-2016 84932 5-4-2016 94012 6-4-2016

220000

31-12-9999

000000

Ut Asd IC 855 3528 0234 2073

1234

5-4-2016 181009 5-4-2016 185520 6-4-2016

220000

31-12-9999

000000

We add Two mandatory

HELPEREntities

Are we there yet No

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

To help switching from sources that are Tied together by technical (surrogate) keyshellip

To a Business Key based model

Itrsquos a LOOKUP table that translates the technical to the

Business Key

Business Alias

Example

NSR-Station

[valuation]Source NSS2 table p

[description]SourceNSS1 table x[adres]

Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

BA-NSS2

Key Lookup voor NSS1 source tables

Key Lookup voor NSS2 source tables

Example-data

NSR-Station

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1


Page 12: PS-3C Data Modelling Zone Berlin

Data Flood

Photo credit: Kurayba (https://www.flickr.com/photos/48503330@N08/28564454666), under CC licence (https://creativecommons.org/licenses/by-sa/2.0)

The possible result…

Photo credit: https://highfiveexports.wordpress.com/2010/06/25/3000-pieces-lego-mix-specialty-pieces-rare-pieces-bricks-blocks-parts-more-ultimate-lot-of-lego-parts-pieces-lego-for-sale-lego-batman-lego-starwars-lego-technic-lego-minifigur

But isn't Data Vault v2 'made for Big Data centric systems'?

In DV2 you still do this in one go:

Subject Oriented, Integrated, Time Variant, Non-Volatile: all in the EDW

Problem

Being data centric conflicts with the complex data MODELLING work

Coding = a lot like modelling

http://xkcd.com/844

Less mature JOIN optimizers

Photo credit: Public Domain

Key-Value, Document and Column Family NoSQL databases + SQL-on-Hadoop solutions

Do NOT like joins

Photo credit: Public Domain

This REALLY complicates Anchor Modeling

And personally: this one too (Link Satellite)

[Diagram: Data Vault model with HUB, SAT and LINK entities]

Problem

a) Hashing of business keys

Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32

Hashing…

• Loses statistical information regarding the data distribution. Query optimizers do not like this…
• Column family, Document and Key-value databases need a good (natural) sharding key for (partial) key lookups

http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2
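The trade-off on this slide can be sketched in a few lines of Python. This is only an illustration, not the author's implementation: the helper names (`concatenated_key`, `md5_key`, `prefix_scan`) are made up for the example, and the timestamp format with colons is assumed from the key lengths in the table. It shows that a concatenated natural key still supports a partial (prefix) lookup, while the MD5 hash of the same key does not.

```python
import hashlib


def concatenated_key(*parts, sep="|"):
    """Build a concatenated natural business key from its parts."""
    return sep.join(str(part) for part in parts)


def md5_key(key):
    """Hash a business key to a fixed-length (32 character) hex string."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def prefix_scan(sorted_keys, prefix):
    """Partial key lookup: every key that starts with the given prefix."""
    return [key for key in sorted_keys if key.startswith(prefix)]


# Sensor readings in the shape of the slide: rolling stock nr, datetime, sensor id.
readings = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:27", "72A1_SLDET"),
    ("8739", "2015-01-22 01:34:32", "72A1_FINV"),
    ("8674", "2015-01-22 01:34:26", "72A1_FINV"),
]

natural_keys = sorted(concatenated_key(*reading) for reading in readings)
hashed_keys = sorted(md5_key(key) for key in natural_keys)

# All readings of rolling stock 8739 can be found by key prefix...
print(prefix_scan(natural_keys, "8739|"))
# ...but the hashed keys carry no trace of that prefix.
print(prefix_scan(hashed_keys, "8739|"))
```

The natural keys also stay sorted by rolling stock and time, which is the kind of distribution information a sharding key or query optimizer can use; the hashes are spread uniformly by design.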

b) Surrogate business keys

Surrogate keys require centralized coordination… and thus can impact the overall system's scalability and availability

A lot of MPP NoSQL databases simply do not have them…

Then some inspiration

http://roelantvos.com/blog/?p=1119

'In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse'

'The Historical Staging Area effectively 'acts' as Data Lake, but in a better defined form, as data deltas and event date/times are taken into account'

So…

[Diagram: the single EDW that is Subject Oriented, Integrated, Time Variant and Non-Volatile, shown next to an EDW split into 'Time Variant & Non Volatile' and 'Subject Oriented & Integrated']

[Diagram: the EDW split into 'Time Variant & Non Volatile' and 'Subject Oriented & Integrated', annotated 'Could be a Data LAKE' and 'Virtualised Ensemble Tier']

How does PS-3C work?

Photo credit: Public Domain

Staging Area → EDW → Information Marts

Focus of current ensemble EDWs

Splitting the work

Persistent Staging Area (HSA) = Data Library → EDW → Information Marts

Persistent Staging - Concept Context, Connector, Business Concept

modelling

Persistent Staging - how

• Identify the Primary or Unique Key of the source / event stream. Use source metadata for this
• Automate the building of a PS 'around this key'. Take all columns
• Historize using an SCD-2 approach
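The SCD-2 historization step could look roughly like the sketch below. It is a minimal illustration under assumptions, not the method used in the talk: the function name `scd2_load`, the row layout (`key`, `attrs`, `load_dts`, `load_end_dts`) and the in-memory list standing in for the Persistent Staging table are all hypothetical. Following the example data in the slides, a closed record ends one second before its successor starts, and open records end at 31-12-9999. Deletes are not handled here.

```python
from datetime import datetime, timedelta

# Open-ended records carry this end date, as in the example data.
HIGH_DATE = datetime(9999, 12, 31)


def scd2_load(history, incoming, key_columns, load_dts):
    """Apply one delivery to a persistent staging table, SCD-2 style.

    history     : list of dict rows (mutated in place and returned)
    incoming    : list of dict source records, all columns taken as-is
    key_columns : the source primary / unique key columns
    load_dts    : load date timestamp of this delivery
    """
    # Index the currently valid row per key.
    current = {row["key"]: row for row in history
               if row["load_end_dts"] == HIGH_DATE}

    for record in incoming:
        key = tuple(record[column] for column in key_columns)
        attrs = {c: v for c, v in record.items() if c not in key_columns}
        existing = current.get(key)

        if existing is not None and existing["attrs"] == attrs:
            continue  # nothing changed, keep the current version

        if existing is not None:
            # Close the previous version one second before the new one starts.
            existing["load_end_dts"] = load_dts - timedelta(seconds=1)

        new_row = {"key": key, "attrs": attrs,
                   "load_dts": load_dts, "load_end_dts": HIGH_DATE}
        history.append(new_row)
        current[key] = new_row

    return history


ps_station = []
scd2_load(ps_station, [{"station": "Ut", "postcode": "3500GJ"}],
          ["station"], datetime(2013, 6, 5, 22, 0, 0))
scd2_load(ps_station, [{"station": "Ut", "postcode": "3511 CE"}],
          ["station"], datetime(2015, 6, 5, 22, 0, 0))
for row in ps_station:
    print(row)
```

Closing the previous version is an update, which is why the Load End Date Timestamp is the hard part on append-only storage.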

Persistent Staging Metadata-1

Entity level:
• Unique key
• Functional description
• Delivering party
• Owner / Responsible
• MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
• …

Persistent Staging Metadata-2

Column level:
• [Load Date Timestamp]
• [Load End Date Timestamp]
• [Deleted Flag] OR delete as new record
• [Source system] on table / file level (lowest possible)

Load End Date Timestamp: possible but difficult… Requires updates

Updates in Hive (isn't HDFS append only?)

• ACID is possible in Hive
• ACID makes updates possible, by registering updates as 'new data'
• Reconciliation / compacting: when idle or at user command
• Use ORC files
• PLUS changing the Hive configuration…

What about semi-structured data?

• Hive: put semi-structured data (= variable columns) in a MAP data type
• OR use a data storage type that supports schema evolution: AVRO (ORC in development)
• Or HBase… It only has one data type (byte); the schema is 'applied'. The schema can be different for every row

modelling

3C - how

Like Data Vault (2), BUT:

• Always starts with conceptual data modeling
• NOT the primary location of data & history
• Virtualised (only if performance allows). Should be deterministic
• No Link Satellites
• No surrogate or hash keys, only 'Concatenated Natural Business Keys'
• Explicit helper entities

3C - Details

[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; helper entities Business Alias A, Business Alias B and BC-Timeline X]

Business Concept (BC)

• A UNIQUE, domain specific point of integration
• …a business entity
• …within its own domain
• …does not necessarily need to be enterprise wide

Why not 'enterprise wide'?

[Diagram: Company Customer versus Sales Customer, International Sales Customer, Local Sales Customer, Marketing Customer, … Customer]

Business Concept Metadata

Entity level:
• [Description]
• [Owner / Responsible]

Column level:
• [Load Date Timestamp]
• [Source system] on table / file level (lowest possible)

Example data

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
…

BC important notes

• Easiest entity to virtualise (if performance allows)
• No hashing & no surrogate BUSINESS KEYS (not by default, at least)

3C - Details

[3C overview diagram]

Concept Context (CC)

• Contains context about a Concept, in a historical way
• …like a Data Vault Satellite
• Every CC belongs to only one BC
• Separate entity per source system table / stream

Concept Context Metadata

Entity level:
• [Description]
• [Owner / Responsible]

Column level:
• [Load Date Timestamp]
• [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

[Diagram: Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts]

Concept Context     Source
[valuation]         NSS2 table p
[description]       NSS1 table x
[adres]             NSS1 table y
[description]       NTR table q
[ovchip_personal]   NSR table r
[ovchip_on-usage]   NSR table s
[personal_details]  NSR table t
[adres_data]        NSR table t

Example data

NSR-Station, Concept Context [adres], source NSS1 table y

BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                  …  …            …                  …

BC important notes

• More difficult to virtualise. Depends on the semantic gap with the source
• But do make it virtual when 'streaming data' is necessary
• Because we have the PS layer, exposing all columns is not necessary
• Refactoring is easier…

3C - Details

[3C overview diagram]

Connector (C)

• Relations between Concepts + Context, in a historical way
• …merger of Data Vault Link + Link Satellite
• Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point in time

Connector Driving key

Explicitly defining a driving key as metadata…

• Gives business understanding
• Makes it possible for the Connector to correctly handle delta data deliveries… so that a change (on the driving key) is not registered as a new 'connection'
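How a driving key keeps a delta from turning into a second 'connection' can be sketched as follows. This is an illustrative sketch only: the function `load_connector`, the column names and the in-memory list of rows are assumptions made for the example, and the end-dating convention (one second before the new load) is copied from the example data in the slides.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)


def load_connector(rows, delta, driving_key, load_dts):
    """Load a delta delivery into a Connector, using its driving key.

    A record whose driving key already has an open row replaces that row
    (the old row is end-dated) instead of being added as a new connection.
    """
    open_rows = {tuple(row[column] for column in driving_key): row
                 for row in rows if row["load_end_dts"] == HIGH_DATE}

    for record in delta:
        key = tuple(record[column] for column in driving_key)
        current = open_rows.get(key)

        if current is not None:
            if all(current[column] == value for column, value in record.items()):
                continue  # identical redelivery, nothing to do
            current["load_end_dts"] = load_dts - timedelta(seconds=1)

        new_row = dict(record, load_dts=load_dts, load_end_dts=HIGH_DATE)
        rows.append(new_row)
        open_rows[key] = new_row

    return rows


# Travel movement: the driving key is travelcard + check-in timestamp.
DRIVING_KEY = ["travelcard", "checkin_ts"]
checkin = datetime(2016, 4, 5, 8, 49, 32)

movements = []
# First delivery: checked in at Asd, destination not yet known.
load_connector(movements, [{
    "station_from": "Asd", "station_to": None, "trainseries": "IC|855",
    "travelcard": "3528 0234 2073 1234",
    "checkin_ts": checkin, "checkout_ts": None,
}], DRIVING_KEY, datetime(2016, 4, 5, 22, 0, 0))
# Later delivery: same check-in, now with checkout at Ut.
load_connector(movements, [{
    "station_from": "Asd", "station_to": "Ut", "trainseries": "IC|855",
    "travelcard": "3528 0234 2073 1234",
    "checkin_ts": checkin, "checkout_ts": datetime(2016, 4, 5, 9, 40, 12),
}], DRIVING_KEY, datetime(2016, 4, 6, 22, 0, 0))

open_movements = [m for m in movements if m["load_end_dts"] == HIGH_DATE]
print(len(movements), len(open_movements))
```

Without the driving key, the full combination of keys would differ between the two deliveries and the same journey would show up as two separate connections.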

Connector Metadata

Entity level:
• [Description]
• [Owner / Responsible]

Column level:
• [Load Date Timestamp]
• [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record


Example data

[Diagram: Connector NSR-Travelmovement (Checkin timestamp, from, to) with Business Concepts NSR-Station, NS-Travelcard and NS-Trainseries]

Driving key: NS-Travelcard + Checkin timestamp

NSR-Travelmovement (Checkin timestamp, from, to)

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No

We add two mandatory HELPER entities

modelling

3C - Details

[3C overview diagram]

Business Alias

• Helps switching from sources that are tied together by technical (surrogate) keys… to a Business Key based model
• It's a LOOKUP table that translates the technical key to the Business Key

Example

NSR-Station with Concept Contexts [valuation] (source NSS2 table p), [description] (source NSS1 table x) and [adres] (source NSS1 table y)

• BA-NSS1: key lookup for NSS1 source tables
• BA-NSS2: key lookup for NSS2 source tables

Example data

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

BA-NSS1: key lookup for NSS1 source tables
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

BA important details

• Has a 1 to (0,1) relation with a Business Concept
• More difficult to virtualise. The lookup table should be kept small
• Therefore DO NOT do key lookups in the Concept Context entity
• Load / generate together with the BC
• Preferably 'in memory' somehow…
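As a rough illustration of the Business Alias as an in-memory lookup, the sketch below translates a source surrogate key to the Business Key. The dictionary holds the BA-NSS1 values from the example; everything else (the function `apply_business_alias`, the column name `station_id`, the choice to return unresolved rows separately) is an assumption made for the example.

```python
# Business Alias for source system NSS1, kept in memory:
# technical (surrogate) key -> Business Key (values from the example data).
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def apply_business_alias(rows, alias, surrogate_column, business_key_column):
    """Replace the source surrogate key in each row by the Business Key.

    Returns (resolved, unresolved): rows whose surrogate key is unknown to
    the alias are handed back untouched so they can be dealt with later.
    """
    resolved, unresolved = [], []
    for row in rows:
        business_key = alias.get(row[surrogate_column])
        if business_key is None:
            unresolved.append(row)
            continue
        rekeyed = {c: v for c, v in row.items() if c != surrogate_column}
        rekeyed[business_key_column] = business_key
        resolved.append(rekeyed)
    return resolved, unresolved


# Hypothetical NSS1 source rows, identified by a surrogate key only.
source_rows = [
    {"station_id": 123522, "postcode": "3500GJ"},
    {"station_id": 999999, "postcode": "unknown"},
]
resolved, unresolved = apply_business_alias(
    source_rows, BA_NSS1, "station_id", "BK_NSR-Station")
print(resolved)
print(unresolved)
```

Keeping the alias this small is what makes a dictionary or broadcast-style lookup feasible instead of a join.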

3C - Details

[3C overview diagram]

BC-Timeline

• Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
• Like a Data Vault Point-in-time construct
• But mandatory
• And with a clearly defined and performant approach

Example

NSR-Station with Concept Contexts [valuation] (source NSS2 table p) and [adres] (source NSS1 table y)

Concept Context [valuation]
BK_NSR-Station  WOZ value  Value Ratingbureau X  META_Laad_dts      META_Laad_eind_dts
Ut              20 mln     18 mln                1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 mln     18 mln                1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 mln     23 mln                1-3-2016 22:00:00  31-12-9999 00:00:00

BC-Timeline
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00

Concept Context [adres]
BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                  …  …            …                  …

BCT modelling
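The combined timeline for station Ut in the example can be reproduced with a small sketch. It is one possible way to derive a BC-Timeline, not necessarily the approach the talk has in mind: the function `bc_timeline` is hypothetical, and it assumes each Concept Context timeline is gap-free and open-ended, so that only the start timestamps matter.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)


def bc_timeline(*context_timelines):
    """Combine the validity timelines of several Concept Contexts.

    Each timeline is a list of (load_dts, load_end_dts) tuples for one
    business key. Every distinct start timestamp opens a new combined
    interval, which ends one second before the next one starts.
    """
    starts = sorted({start
                     for timeline in context_timelines
                     for start, _end in timeline})
    combined = []
    for position, start in enumerate(starts):
        if position + 1 < len(starts):
            end = starts[position + 1] - timedelta(seconds=1)
        else:
            end = HIGH_DATE
        combined.append((start, end))
    return combined


# Timelines for station 'Ut', taken from the example tables.
valuation = [
    (datetime(2014, 1, 1, 22), datetime(2015, 1, 1, 21, 59, 59)),
    (datetime(2015, 1, 1, 22), datetime(2016, 3, 1, 21, 59, 59)),
    (datetime(2016, 3, 1, 22), HIGH_DATE),
]
adres = [
    (datetime(2013, 6, 5, 22), datetime(2015, 7, 4, 21, 59, 59)),
    (datetime(2015, 7, 4, 22), HIGH_DATE),
]

for start, end in bc_timeline(valuation, adres):
    print(start, end)
```

Each combined interval can then be joined back to the single Concept Context row that was valid during it, which is what makes the construct comparable to a Data Vault point-in-time table.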

What makes PS-3C a different Ensemble?

[3C overview diagram]

1) Explicitly splitting the work

• Data + History
• Subjects + Integration

2) NO hashed business keys or surrogate keys

Only concatenated ones

http://www.cannabisculture.com/files/images/6/hashbrick.JPG

3) Fewer joins

Relation + technical validity timeline + relation context, together in one entity

Photo credit: Public Domain

4) Explicit HELPER entities

• Business Alias
• Business Component Timeline
• + explicitly define driving key(s)

Photo credit: Public Domain

Hope to AVOID this…

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me

'Head of BI', Spil Games

Certified Data Vault modeler since 2009

Contact details: nl.linkedin.com/in/rogierwerschkull, rogierwerschkull@gmail.com, rwerschkull


Contact details nllinkedincominrogierwerschkull

rogierwerschkullgmailcom

rwerschkull

Page 14: PS-3C Data Modelling Zone Berlin

But isn't Data Vault v2 'made for Big Data centric systems'?

In DV2 you still do this in one go:

Subject Oriented, Integrated, Time Variant, Non-Volatile EDW

Problem

Being data centric conflicts with the complex data MODELLING work

Coding = a lot like modelling (http://xkcd.com/844)

Less mature JOIN optimizers

NoSQL databases (Key-Value, Document, Column Family) + SQL-on-Hadoop solutions

Do NOT like joins

This REALLY complicates Anchor Modeling

And personally: this one too (Link Satellite)

[Diagram: Data Vault HUB, LINK and SAT entities; the Link Satellite is the one marked "This one too"]

Problem

a) HASHING of Business Keys

Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32

Hashing……

• Loses statistical information regarding the data distribution. Query optimizers do not like this…
• Column family, Document and Key-value databases need a good (natural) sharding key for (partial) key-lookups

http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2
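The trade-off between a concatenated business key and its MD5 hash can be sketched in a few lines of Python. This is only an illustration: the helper names `concatenated_key` and `hashed_key` are made up for this example, and the three rows are taken from the sensor table on the hashing slide.

```python
import hashlib

def concatenated_key(*parts, sep="|"):
    """Join the natural key parts into one readable business key."""
    return sep.join(str(p) for p in parts)

def hashed_key(business_key):
    """MD5 hex digest of the concatenated key; always 32 characters long."""
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()

# Three rows from the sensor table: (rolling stock nr, datetime, sensor id)
rows = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:27", "72A1_SLDET"),
    ("8674", "2015-01-22 01:34:26", "72A1_FINV"),
]
keys = [concatenated_key(*row) for row in rows]
hashes = [hashed_key(key) for key in keys]

# A partial key lookup ("everything for rolling stock 8739") is a simple
# prefix match on the concatenated key. The hashes offer no such prefix.
rolling_stock_8739 = [key for key in keys if key.startswith("8739|")]

for key, digest in zip(keys, hashes):
    print(len(key), key, len(digest), digest)
```

The concatenated keys keep their natural ordering and prefixes, which is what a sharding or partial-lookup key needs; the hashes are fixed length but carry no usable structure.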

b) Surrogate Business Keys

• Surrogate keys require centralized coordination
• …and thus can impact the overall system's scalability and availability
• A lot of MPP NoSQL databases simply do not have them…

Then some inspiration

http://roelantvos.com/blog/?p=1119

"In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse."

"The Historical Staging Area effectively 'acts' as Data Lake, but in a better defined form as data deltas and event datetimes are taken into account."

So

[Diagram: the EDW (Subject Oriented, Integrated, Time Variant, Non-Volatile) split into two parts: 'Time Variant & Non-Volatile' and 'Subject Oriented & Integrated']

[Diagram: the same split of the EDW; the 'Time Variant & Non-Volatile' part is labelled "Could be a Data LAKE", the 'Subject Oriented & Integrated' part is labelled "Virtualised Ensemble Tier"]

How does PS-3C work?

Focus of current ensemble EDWs

Staging Area -> EDW -> Information Marts

Splitting the work

Persistent Staging Area (HSA) = Data Library -> EDW -> Information Marts

Persistent Staging - Concept Context, Connector, Business Concept

Persistent Staging - how

• Identify the source event stream's Primary or Unique Key. Use source metadata for this
• Automate the building of a PS 'around this key'. Take all columns
• Historize using the SCD-2 approach
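The three steps above can be sketched as a small in-memory loader. This is a minimal illustration of SCD-2 historization around a source key, not the author's implementation: the function name `historize`, the column names `load_dts` and `load_end_dts`, and the use of plain Python dicts as rows are all assumptions made for this example.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # open-ended "valid until" marker

def historize(ps_rows, delivery, key_cols, load_dts):
    """Apply one source delivery to a persistent staging table, SCD-2 style."""
    # Index the currently valid version of every key
    current = {tuple(r[k] for k in key_cols): r
               for r in ps_rows if r["load_end_dts"] == HIGH_DATE}
    for rec in delivery:
        key = tuple(rec[k] for k in key_cols)
        old = current.get(key)
        if old is not None:
            payload = {c: v for c, v in old.items()
                       if c not in ("load_dts", "load_end_dts")}
            if payload == rec:
                continue  # unchanged record, nothing to store
            # Close the previous version one second before the new one starts
            old["load_end_dts"] = load_dts - timedelta(seconds=1)
        # Take all columns of the source record and add the load metadata
        ps_rows.append({**rec, "load_dts": load_dts, "load_end_dts": HIGH_DATE})
    return ps_rows

ps = []
historize(ps, [{"station": "Ut", "postcode": "3500GJ"}], ["station"],
          datetime(2013, 6, 5, 22))
historize(ps, [{"station": "Ut", "postcode": "3511 CE"}], ["station"],
          datetime(2015, 7, 4, 22))
# A repeated, unchanged delivery adds no new version
historize(ps, [{"station": "Ut", "postcode": "3511 CE"}], ["station"],
          datetime(2015, 7, 5, 22))
```

Closing the old version requires an update of an existing row, which is the part that is awkward on append-only storage.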

Persistent Staging Metadata-1

Entity level:
• Unique key
• Functional Description
• Delivering party
• Owner / Responsible
• MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
• …

Persistent Staging Metadata-2

Column level:
• [Load Date Timestamp]
• [Load End Date Timestamp]
• [Deleted Flag] OR delete as new record
• [Source system] on table / file level (lowest possible)

Load End Date Timestamp: possible but difficult… requires updates

UPDATES IN HIVE (isn't HDFS append only?)

• ACID is possible in Hive
• ACID makes updates possible by registering updates as 'new data'
• Reconciliation / compacting: when idle or at user command
• Use ORC files
• PLUS changing the Hive configuration…

What about SEMI-STRUCTURED data?

• Hive: put semi-structured data (= variable columns) in a MAP data type
• OR use a data storage type that supports schema evolution: AVRO (ORC in development)
• Or HBase… It only has one data type (byte); the schema is 'applied'
• Schema can be different for every row

3C - how

Like Data Vault (2), BUT:
• Always starts with conceptual data modelling
• NOT the primary location of Data & History
• Virtualised (only if performance allows)
• Should be deterministic
• No Link Satellites
• No Surrogate or Hash Keys, only 'Concatenated Natural Business Keys'
• Explicit Helper entities

3C - Details

[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; helper entities Business Alias A, Business Alias B and BC-Timeline X]

Business Concept (BC)

• A UNIQUE, domain-specific point of integration
• …a business entity
• …within its own domain
• …does not necessarily need to be Enterprise Wide

Why not 'enterprise wide'?

[Diagram: Company > Customer, specialised into Sales Customer (International Sales Customer, Local Sales Customer), Marketing Customer and … Customer]

Business Concept Metadata

Entity level:
• [Description]
• [Owner / Responsible]

Column level:
• [Load Date Timestamp]
• [Source system] on table / file level (lowest possible)

Example-Data

NSR-Station

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Trainseries

Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Travelcard

Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller

Business Key
CRM-RW123456
CRM-LAS224466
…

BC important notes

• Easiest entity to virtualise (if performance allows)
• No Hashing & No surrogate BUSINESS KEYS (not by default at least)
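One way to read "virtualised" here is that a Business Concept is not stored but derived on demand from the persistent staging tables. The sketch below illustrates that reading only; the function `business_concept` and the row fields `business_key`, `source` and `load_dts` are hypothetical names, and the sample rows are invented to resemble the NSR-Station example.

```python
from datetime import datetime

def business_concept(*staging_tables):
    """Derive one row per business key, keeping the first source that delivered it."""
    first_seen = {}
    for table in staging_tables:
        for row in table:
            bk = row["business_key"]
            best = first_seen.get(bk)
            if best is None or row["load_dts"] < best["meta_load_dts"]:
                first_seen[bk] = {"business_key": bk,
                                  "meta_source": row["source"],
                                  "meta_load_dts": row["load_dts"]}
    # Sorting makes the result deterministic for the same input
    return [first_seen[bk] for bk in sorted(first_seen)]

nss1_y = [
    {"business_key": "Ut", "source": "NSS1_y", "load_dts": datetime(2015, 6, 5, 22)},
    {"business_key": "Asd", "source": "NSS1_y", "load_dts": datetime(2015, 6, 5, 22)},
]
nss2_p = [
    {"business_key": "Asa", "source": "NSS2_p", "load_dts": datetime(2015, 6, 6, 22)},
    {"business_key": "Ut", "source": "NSS2_p", "load_dts": datetime(2015, 6, 7, 22)},
]
stations = business_concept(nss1_y, nss2_p)
```

Because the natural business key is used as-is, no hashing or surrogate key generation is needed to build the result.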


Concept Context (CC)

• Contains Context about a Concept, in a historical way
• …like a Data Vault Satellite
• Every CC belongs to only one BC
• Separate entity per source system table / stream

Concept Context Metadata

Entity level:
• [Description]
• [Owner / Responsible]

Column level:
• [Load Date Timestamp]
• [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts and their sources:
[valuation]         Source NSS2 table p
[description]       Source NSS1 table x
[adres]             Source NSS1 table y
[description]       Source NTR table q
[ovchip_personal]   Source NSR table r
[ovchip_on-usage]   Source NSR table s
[personal_details]  Source NSR table t
[adres_data]        Source NSR table t

Example-data

NSR-Station [adres], Source NSS1 table y

BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                 …  …            …                  …

CC important notes

• More difficult to virtualise: depends on the semantic gap with the source
• But do make it virtual when 'streaming data' is necessary
• Because we have the PS layer, exposing all columns is not necessary
• Refactoring is easier…


Connector (C)

• Relations between Concepts + Context, in a historical way
• …merger of Data Vault Link + Link Satellite
• Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point in time

Connector Driving key

Explicitly defining a driving key as metadata…
• Gives business understanding
• Makes it possible for a Connector to correctly handle delta data deliveries…
  so that a change (on the driving key) is not registered as a new 'connection'
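The effect of a driving key on delta deliveries can be shown with a toy loader. This is a sketch under assumptions, not the author's loading pattern: `load_connector`, `open_count` and the column names are invented for the example, and the scenario (a travel movement first delivered without a destination, then corrected) is hypothetical.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)

def load_connector(rows, delta, key_cols, load_dts):
    """Apply a delta delivery; key_cols decides what counts as 'the same connection'."""
    open_rows = {tuple(r[c] for c in key_cols): r
                 for r in rows if r["load_end_dts"] == HIGH_DATE}
    for rec in delta:
        key = tuple(rec[c] for c in key_cols)
        old = open_rows.get(key)
        if old is not None:
            if all(old[c] == v for c, v in rec.items()):
                continue  # nothing changed
            # Same connection, changed content: close the old version
            old["load_end_dts"] = load_dts - timedelta(seconds=1)
        new = {**rec, "load_dts": load_dts, "load_end_dts": HIGH_DATE}
        rows.append(new)
        open_rows[key] = new
    return rows

def open_count(rows):
    """Number of currently valid connector rows."""
    return sum(1 for r in rows if r["load_end_dts"] == HIGH_DATE)

checkin = datetime(2016, 4, 5, 8, 49, 32)
first = {"travelcard": "3528 0234 2073 1234", "checkin": checkin,
         "station_from": "Asd", "station_to": None}
corrected = {**first, "station_to": "Ut"}

ALL_KEYS = ["travelcard", "checkin", "station_from", "station_to"]
DRIVING_KEY = ["travelcard", "checkin"]

def run(key_cols):
    rows = load_connector([], [dict(first)], key_cols, datetime(2016, 4, 5, 22))
    return load_connector(rows, [dict(corrected)], key_cols, datetime(2016, 4, 6, 22))

without_driving_key = run(ALL_KEYS)
with_driving_key = run(DRIVING_KEY)
```

Matching on all relationship keys leaves two open rows for one journey, while matching on the driving key closes the first version and keeps a single open connection.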

Connector Metadata

Entity level:
• [Description]
• [Owner / Responsible]

Column level:
• [Load Date Timestamp]
• [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record


Example

NSR-Travelmovement (Connector): Checkin timestamp, from, to

Driving Key: NS-Travelcard + Checkin timestamp

Example-Data

NSR-Travelmovement (Checkin timestamp, from, to), connecting NSR-Station, NS-Travelcard and NS-Trainseries

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No

We add two mandatory HELPER entities


Business Alias

• Helps switching from sources that are tied together by technical (surrogate) keys…
• …to a Business Key based model
• It's a LOOKUP table that translates the technical key to the Business Key

Example

NSR-Station
[valuation]    Source NSS2 table p
[description]  Source NSS1 table x
[adres]        Source NSS1 table y

BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables

Example-data

BA-NSS1: key lookup for NSS1 source tables

Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

BA important details

• Has a 1-to-(0..1) relation with a Business Concept
• More difficult to virtualise
• The lookup table should be kept small; therefore DO NOT do the key lookup in a Concept Context entity
• Load / generate together with the BC
• Preferably 'in memory' somehow…
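A Business Alias held in memory can be as simple as a dictionary from surrogate key to business key. The sketch below uses the BA-NSS1 example values; the function `to_business_key` and the source column name `station_id` are assumptions made for this illustration, not part of the slides.

```python
# BA-NSS1: surrogate key of source system NSS1 -> business key of NSR-Station
ba_nss1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}

def to_business_key(source_rows, alias, surrogate_col):
    """Replace the technical key of each source row by its business key."""
    translated, unresolved = [], []
    for row in source_rows:
        business_key = alias.get(row[surrogate_col])
        if business_key is None:
            unresolved.append(row)  # no alias known yet for this surrogate key
            continue
        rest = {c: v for c, v in row.items() if c != surrogate_col}
        translated.append({"business_key": business_key, **rest})
    return translated, unresolved

# Hypothetical rows from an NSS1 source table keyed on a surrogate key
nss1_rows = [
    {"station_id": 123522, "postcode": "3500GJ"},
    {"station_id": 666323, "postcode": "1012 AB"},
    {"station_id": 999999, "postcode": "0000 XX"},
]
translated, unresolved = to_business_key(nss1_rows, ba_nss1, "station_id")
```

Keeping the alias as a small two-column structure is what makes an in-memory lookup feasible; rows without a known alias are set aside instead of receiving a guessed key.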


BC-Timeline

• Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
• Like a Data Vault Point-in-time construct
• But mandatory
• And with a clearly defined and performant approach

Example

NSR-Station
[valuation]  Source NSS2 table p
[adres]      Source NSS1 table y

BCT modelling

NSR-Station [valuation], Source NSS2 table p

BK_NSR-Station  WOZ value   Value rating agency X  META_Laad_dts      META_Laad_eind_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

NSR-Station [adres], Source NSS1 table y

BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                 …  …            …                  …

BCT (BC-Timeline) for NSR-Station

BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
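The combined timeline in the example can be reproduced by collecting every load timestamp of every Concept Context per business key and cutting the time axis at each of them. The following is a minimal sketch of that idea; the function name `bc_timeline` and the representation of a Concept Context as a list of (business key, load timestamp) pairs are assumptions made for the illustration.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)

def bc_timeline(*concept_contexts):
    """Merge the load timestamps of several Concept Contexts into one timeline per key."""
    starts = {}
    for cc in concept_contexts:
        for business_key, load_dts in cc:
            starts.setdefault(business_key, set()).add(load_dts)
    timeline = []
    for business_key in sorted(starts):
        points = sorted(starts[business_key])
        for i, start in enumerate(points):
            # Each slice ends one second before the next change, the last one stays open
            if i + 1 < len(points):
                end = points[i + 1] - timedelta(seconds=1)
            else:
                end = HIGH_DATE
            timeline.append((business_key, start, end))
    return timeline

# Load timestamps taken from the NSR-Station example (22:00:00 loads)
valuation = [("Ut", datetime(2014, 1, 1, 22)),
             ("Ut", datetime(2015, 1, 1, 22)),
             ("Ut", datetime(2016, 3, 1, 22))]
adres = [("Ut", datetime(2013, 6, 5, 22)),
         ("Ut", datetime(2015, 7, 4, 22)),
         ("Asd", datetime(2013, 6, 5, 22))]

timeline = bc_timeline(valuation, adres)
for row in timeline:
    print(row)
```

For station Ut this yields five consecutive slices and for Asd a single open one, the same six rows as in the BCT example. Joining each Concept Context to these slices then becomes an equality or range lookup per slice.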

What makes PS-3C a different Ensemble?

1) Explicitly splitting the work

• Data + History
• Subjects + Integration

2) NO HASHED BUSINESS KEYS or surrogate keys

Only concatenated ones

3) Fewer joins

Relation + Technical validity timeline + Relation context: together in one entity

4) Explicit HELPER entities

• Business Alias
• Business Concept Timeline (BC-Timeline)
• + explicitly define Driving key(s)

Hope to AVOID this… (http://xkcd.com/927)

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me

'Head of BI', Spil Games

Certified Data Vault modeler since 2009

Contact details: nl.linkedin.com/in/rogierwerschkull, rogierwerschkull@gmail.com, rwerschkull

Page 15: PS-3C Data Modelling Zone Berlin

In

DV2you still

do thisin one go

Subject Oriented

Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Coding =

A lot like modelling

Being Data Centric

conflictswith thecomplex Data

MODELLINGwork

httpxkcdcom844

problem

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Less Mature

JOINoptimizers

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

Key-Value

Document

Column Family

NoSQL Databases

+SQL on Hadoop solutions

Do NOT like joins

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

This

REALLYcomplicates

AnchoRModeling

rwerschkull

nllinkedincominrogierwerschkull

And Personally

HUB

SAT

LINK

This one

too(Link Satellite)

HUB

HUB

SAT

SAT

rwerschkull

nllinkedincominrogierwerschkull

problem

rwerschkull

nllinkedincominrogierwerschkull

a) HASHING OF Business keys

Rolling

Stock Nr Datetime Sensor Id Value Concatenated Business Key

Key

Len MD5 Hash

Key

Len

8739

2015-01-22

013427 72A1_FINV 123 8739|2015-01-22 013427|72A1_FINV 34 86ae4c6b0e2e2d5a13a0d11440529aeb 32

8739

2015-01-22

013427 72A1_SLDET 100 8739|2015-01-22 013427|72A1_SLDET 35 51ce9bc292eef407bd7c91a52eebcf2e 32

8739

2015-01-22

013432 72A1_FINV 126 8739|2015-01-22 013432|72A1_FINV 34 9482a41c1fecc4c64b8c437af6cc85e8 32

8739

2015-01-22

013432

13A8_MW_UB

AT_VT 5 8739|2015-01-22 013432|13A8_MW_UBAT_VT 42 e4160914ee55ce0b93f87b23366a0ce3 32

8674

2015-01-22

013426 72A1_FINV 6 8674|2015-01-22 013426|72A1_FINV 34 fcb3e7c8c91e44ce396d908a4948ca65 32

8674

2015-01-22

013426

16A1_HSVER

OND 7 8674|2015-01-22 013426|16A1_HSVEROND 38 fe9098c8c291ad56af5c8afae5169196 32

Loses statistical Informationregarding the data distributionQuery optimizers do not like thishellip

Column family Document and Key-value databases need a

good (natural) sharding key for (partial) key-

lookups

Hashinghelliphellip

httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2rwerschkull

nllinkedincominrogierwerschkull

Surrogates keys require

centralized coordination

hellipand thus can impact the overall systemrsquos scalability and availability

A lot of MPP NoSQL databases simply do not have themhellip

B) Surrogate BuSINESS keys

rwerschkull

nllinkedincominrogierwerschkull

Then Some Inspiration

httproelantvoscomblogp=1119

rwerschkull

nllinkedincominrogierwerschkull

lsquoIn my opinion the answer lies in the adoption of the

persistent (Historical) Staging Area concept

(also known as Historical Staging or the History Area)

This basically adopts the fundamentals of a Data Warehousersquo

lsquoThe Historical Staging Area effectively lsquoactsrsquo as

Data Lake but in a better defined form as data deltas and

event datetimes are taken into accountrsquo

So

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

Subject Oriented Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Could be a

Data LAKE

VirtualisedEnsemble

Tier

EDW

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

rwerschkull

nllinkedincominrogierwerschkull

How

Does PS-3C Work

rwerschkull

nllinkedincominrogierwerschkullPhoto credit Public Domain

StagingArea

EDWInformation

Marts

Focus of Current ensemble EDWrsquos

rwerschkull

nllinkedincominrogierwerschkull

Persistent StagingArea HSA =

Data LibraryEDW

Information Marts

Splitting the work

rwerschkull

nllinkedincominrogierwerschkull

Persitent

Staging

-

Concept

Context

Connector

Business

Concept

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

Identify source event stream Primary or Unique KeyUse source metadata for this

Automate the building of a PS lsquoaround this keyrsquo Take all columns

Historize using SCD-2 approach

Persistent Staging - how

rwerschkull

nllinkedincominrogierwerschkull

Entity levelUnique key

Functional Description

Delivering party

Owner Responsible

MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)

hellip

Persistent Staging Metadata-1

rwerschkull

nllinkedincominrogierwerschkull

Column level [Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table file level (lowest possible)

Load End Date Timestamp possible but difficulthellipRequires updates

Persistent Staging Metadata-2

rwerschkull

nllinkedincominrogierwerschkull

ACID is possible in HIVE

ACID Makes Updates possibleBy registering updates as lsquonew datarsquo

Reconciliation compacting when idle at user command

Use ORC files

PLUS changing the HIVE configurationhellip

UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)

rwerschkull

nllinkedincominrogierwerschkull

HivePut semi structured data = variable columns in MAP data type

OR use Data storage type that supports schema-evolutionAVRO (ORC in development)

Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo

Schema can be different for every row

What about SEMI-STRUCTURED Data

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

3C - how

rwerschkull

nllinkedincominrogierwerschkull

Always starts with Conceptual data modeling

NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic

No Link Satellites

No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo

Explicit Helper entities

Like Data Vault(2) BUT

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

a UNIQUE Domain specific point of integration

hellipa business entity

hellipwithin itrsquos own domain

hellipdoes not necessarily need to be Enterprise Wide

Business Concept (BC)

rwerschkull

nllinkedincominrogierwerschkull

Why not lsquoenterprise widersquoCompany

Customer

Sales Customer

International

Sales Customer

Local

Sales Customer

Marketing Customer

Customer hellip

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

Most easy entity to be virtualised(if performance allows)

No Hashing amp No surrogateBUSINESSKEYS

(not by default at least)

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Containts Context about a Concept In a historical way

hellipLike a Data Vault Satellite

Every CC belongs to only one BC

Seperate entity per source system table stream

Concept Context (CC)

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

Example-dataBK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts META_Deleted_Ind

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0

Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0

Asa hellip hellip hellip hellip hellip hellip

NSR-Station

[adres]Source NSS1 table y

More difficult to be virtualisedDepends on semantic gap with source

But do make virtual when lsquostreaming datarsquo is necessary

Because we have PS layerExposing all columns not necessary

Refactoring is more easyhellip

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Relations between Concepts + Context

In a historical way

hellipMerger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point

in time

Connector (C)

rwerschkull

nllinkedincominrogierwerschkull

Explicitly defining a driving key as metadatahellip

Gives business understanding

Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)

bull is not registered as a new lsquoconnectionrsquo

Connector Driving key

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Connector Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

Driving Key

NS-Travelcard

+Checkin timestamp

Example-Data

NSR-StationNS-

Travelcard

NS-

Trainseries

rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

BK_NSR-

Station-from

BK_NSR-

Station-to

BK_NS_

Trainseries

BK_NS-Travelcard Checkin

timestamp

Checkout

timestamp

META_Load_dts META_Load_end_dts

Asd Ut IC 855 3528 0234 2073

1234

5-4-2016 84932 5-4-2016 94012 6-4-2016

220000

31-12-9999

000000

Ut Asd IC 855 3528 0234 2073

1234

5-4-2016 181009 5-4-2016 185520 6-4-2016

220000

31-12-9999

000000

We add Two mandatory

HELPEREntities

Are we there yet No

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

To help switching from sources that are Tied together by technical (surrogate) keyshellip


Page 16: PS-3C Data Modelling Zone Berlin

Coding = a lot like modelling

Being Data Centric conflicts with the complex data MODELLING work

http://xkcd.com/844/

Problem

Less mature JOIN optimizers

Key-Value, Document and Column Family NoSQL databases + SQL-on-Hadoop solutions do NOT like joins

This REALLY complicates Anchor Modeling

And personally… this one too (Link Satellite)

[Diagram: Data Vault model of HUB, SAT and LINK entities, including a Link Satellite]

Problem

a) HASHING OF business keys

Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32

Loses statistical information regarding the data distribution
Query optimizers do not like this…
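The contrast between a concatenated and a hashed key can be sketched in a few lines of Python. This is only an illustration: the `|` separator follows the table above, but the UTF-8 encoding and the helper names `concatenated_key` and `md5_key` are assumptions, so the digests it produces are not claimed to match the ones on the slide.

```python
import hashlib


def concatenated_key(*parts, sep="|"):
    """Join the natural key parts into one readable business key."""
    return sep.join(str(p) for p in parts)


def md5_key(business_key):
    """Hash a business key to a fixed-width hex string (32 characters)."""
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()


# (rolling stock nr, datetime, sensor id), taken from the table above
readings = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:27", "72A1_SLDET"),
    ("8674", "2015-01-22 01:34:26", "72A1_FINV"),
]

keys = [concatenated_key(*r) for r in readings]
hashes = [md5_key(k) for k in keys]

# A partial key lookup works on the natural key: all readings of train 8739
same_train = [k for k in keys if k.startswith("8739|")]

# The hashes have a fixed length but no shared, meaningful prefix,
# so the same question cannot be answered from the hashed key alone.
for key, digest in zip(keys, hashes):
    print(len(key), key, len(digest), digest)
```

The natural key keeps its ordering and prefixes, which is what range scans, partial lookups and optimizer statistics rely on; the hash trades that away for a fixed width.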

Column family, Document and Key-value databases need a good (natural) sharding key for (partial) key-lookups

Hashing……

http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2/

B) Surrogate BUSINESS keys

Surrogate keys require centralized coordination
…and thus can impact the overall system's scalability and availability

A lot of MPP NoSQL databases simply do not have them…

Then some inspiration

http://roelantvos.com/blog/?p=1119

‘In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse’

‘The Historical Staging Area effectively ‘acts’ as Data Lake but in a better defined form, as data deltas and event datetimes are taken into account’

So

[Diagram: the EDW (Subject Oriented, Integrated, Time Variant, Non-Volatile) split into two parts, ‘Time Variant & Non Volatile’ and ‘Subject Oriented & Integrated’]

[Diagram: the EDW split into ‘Time Variant & Non Volatile’ and ‘Subject Oriented & Integrated’, with the labels ‘Could be a Data LAKE’ and ‘Virtualised Ensemble Tier’]

HOW does PS-3C work?

Focus of current ensemble EDW's

[Diagram: Staging Area, EDW, Information Marts]

Splitting the work

[Diagram: Persistent Staging Area (HSA) = Data Library, EDW, Information Marts]

Persistent Staging - Concept Context, Connector, Business Concept

Modelling

Persistent Staging - how

Identify the source / event stream Primary or Unique Key. Use source metadata for this
Automate the building of a PS ‘around this key’. Take all columns
Historize using the SCD-2 approach
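A minimal sketch of the SCD-2 step, assuming in-memory rows held as dictionaries. The function name `historize`, the column names `station` and `postcode`, and the convention of closing a version one second before the next load are illustrative assumptions; the end-dating convention mirrors the example data used elsewhere in this deck.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # marks the currently valid version
META_COLS = ("META_Load_dts", "META_Load_end_dts")


def historize(history, key_cols, incoming, load_dts):
    """Apply one source row to a persistent-staging history, SCD type 2 style."""
    key = tuple(incoming[c] for c in key_cols)
    current = None
    for row in history:
        if row["META_Load_end_dts"] == HIGH_DATE and tuple(row[c] for c in key_cols) == key:
            current = row
            break
    if current is not None:
        payload = {k: v for k, v in current.items() if k not in META_COLS}
        if payload == incoming:
            return history  # unchanged row: no new version
        # close the old version one second before the new one starts
        current["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
    history.append({**incoming, "META_Load_dts": load_dts, "META_Load_end_dts": HIGH_DATE})
    return history


history = []
historize(history, ["station"], {"station": "Ut", "postcode": "3500GJ"}, datetime(2013, 6, 5, 22))
historize(history, ["station"], {"station": "Ut", "postcode": "3511 CE"}, datetime(2015, 7, 4, 22))
# the same row delivered again does not create a third version
historize(history, ["station"], {"station": "Ut", "postcode": "3511 CE"}, datetime(2015, 7, 5, 22))
```

Because every column is taken and only the key comes from source metadata, a routine like this can be generated per source table instead of written by hand.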

Persistent Staging Metadata-1

Entity level
Unique key
Functional Description
Delivering party
Owner / Responsible
MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
…

Persistent Staging Metadata-2

Column level
[Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR delete as new record
[Source system] on table / file level (lowest possible)

Load End Date Timestamp possible but difficult… Requires updates

UPDATES IN HIVE (isn't HDFS APPEND ONLY?)

ACID is possible in HIVE
ACID makes updates possible by registering updates as ‘new data’
Reconciliation / compacting when idle / at user command
Use ORC files
PLUS changing the HIVE configuration…

What about SEMI-STRUCTURED data?

Hive: put semi-structured data = variable columns in a MAP data type
OR use a data storage type that supports schema evolution: AVRO (ORC in development)
Or HBASE… It only has one data type (byte); the schema is ‘applied’
Schema can be different for every row

Modelling

3C - how

Like Data Vault(2) BUT

Always starts with Conceptual data modeling
NOT the primary location of Data & History
Virtualised (only if performance allows)
Should be deterministic
No Link Satellites
No Surrogate or Hash Keys, only ‘Concatenated Natural Business Keys’
Explicit Helper entities

3C - Details

[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; Business Alias A and Business Alias B; BC-Timeline X]

Business Concept (BC)

a UNIQUE, Domain specific point of integration
…a business entity
…within its own domain
…does not necessarily need to be Enterprise Wide

Why not ‘enterprise wide’?

[Diagram: Company Customer, Sales Customer, International Sales Customer, Local Sales Customer, Marketing Customer, Customer …]

Business Concept Metadata

Entity level
[Description]
[Owner / Responsible]

Column level
[Load Date Timestamp]
[Source system] on table / file level (lowest possible)

Example-Data

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
…

BC important notes

Easiest entity to virtualise (if performance allows)
No Hashing & No surrogate BUSINESS KEYS (not by default at least)

3C - Details

[Diagram: 3C model overview, with Business Concepts, Concept Contexts, Connector, Business Aliases and BC-Timeline]

Concept Context (CC)

Contains Context about a Concept, in a historical way
…Like a Data Vault Satellite
Every CC belongs to only one BC
Separate entity per source system table / stream

Concept Context Metadata

Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller
Concept Contexts:
[valuation] Source NSS2 table p
[description] Source NSS1 table x
[adres] Source NSS1 table y
[description] Source NTR table q
[ovchip_personal] Source NSR table r
[ovchip_on-usage] Source NSR table s
[personal_details] Source NSR table t
[adres_data] Source NSR table t

Example-data

NSR-Station, [adres] Source NSS1 table y
BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                 …  …            …                  …

CC important notes

More difficult to be virtualised: depends on the semantic gap with the source
But do make virtual when ‘streaming data’ is necessary
Because we have the PS layer, exposing all columns is not necessary
Refactoring is easier…

3C - Details

[Diagram: 3C model overview, with Business Concepts, Concept Contexts, Connector, Business Aliases and BC-Timeline]

Connector (C)

Relations between Concepts + Context, in a historical way
…Merger of Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point in time

Connector Driving key

Explicitly defining a driving key as metadata…
Gives business understanding
Makes it possible for a Connector to correctly handle delta data deliveries…
• so that a change (on the driving key)
• is not registered as a new ‘connection’
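One reading of the driving-key rule, sketched in Python. The loader `load_connector`, the column names and the scenario (a travel movement whose destination arrives in a later delta) are hypothetical; the driving key of travelcard plus check-in timestamp follows the NSR-Travelmovement example in this deck.

```python
from datetime import datetime

HIGH_DATE = datetime(9999, 12, 31)  # marks the currently valid version


def load_connector(rows, driving_key, delta, load_dts):
    """Apply one delta row to a Connector.

    A delta whose driving key is already open supersedes that version
    instead of being stored as a second, parallel connection.
    """
    dk = tuple(delta[c] for c in driving_key)
    for row in rows:
        is_open = row["META_Load_end_dts"] == HIGH_DATE
        if is_open and tuple(row[c] for c in driving_key) == dk:
            if all(row.get(c) == v for c, v in delta.items()):
                return rows  # identical delivery: nothing to do
            row["META_Load_end_dts"] = load_dts  # close the superseded version
            break
    rows.append({**delta, "META_Load_dts": load_dts, "META_Load_end_dts": HIGH_DATE})
    return rows


DRIVING_KEY = ["travelcard", "checkin"]
checkin = datetime(2016, 4, 5, 8, 49, 32)
movements = []

# first delivery: the traveller has checked in at Asd, destination unknown
load_connector(movements, DRIVING_KEY,
               {"travelcard": "3528 0234 2073 1234", "checkin": checkin,
                "station_from": "Asd", "station_to": None},
               datetime(2016, 4, 5, 22))

# later delta: same driving key, destination now known
load_connector(movements, DRIVING_KEY,
               {"travelcard": "3528 0234 2073 1234", "checkin": checkin,
                "station_from": "Asd", "station_to": "Ut"},
               datetime(2016, 4, 6, 22))

open_rows = [r for r in movements if r["META_Load_end_dts"] == HIGH_DATE]
```

Without the driving key as metadata, the loader could not tell that the second delivery describes the same movement, and both rows would stay open.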

Connector Metadata

Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

[Diagram: the Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts, plus the Connector NSR-Travelmovement]

Connector: NSR-Travelmovement (Checkin timestamp, from, to)
Driving Key: NS-Travelcard + Checkin timestamp

Example-Data

NSR-Travelmovement (Connector with NSR-Station, NS-Travelcard and NS-Trainseries)
BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No

We add two mandatory HELPER entities

Modelling

3C - Details

[Diagram: 3C model overview, with Business Concepts, Concept Contexts, Connector, Business Aliases and BC-Timeline]

Business Alias

To help switching from sources that are tied together by technical (surrogate) keys…
to a Business Key based model
It's a LOOKUP table that translates the technical key to the Business Key

Example

NSR-Station
[valuation] Source NSS2 table p
[description] Source NSS1 table x
[adres] Source NSS1 table y
BA-NSS1: Key Lookup for NSS1 source tables
BA-NSS2: Key Lookup for NSS2 source tables

Example-data

BA-NSS1 (Key Lookup for NSS1 source tables)
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

BA important Details

Has a 1 to (0,1) relation with a Business Concept
More difficult to be virtualised
Lookup table should be kept small
Therefore DO NOT do key lookup in Concept Context entity
Load / generate together with BC
Preferably ‘in memory’ somehow…
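As a rough illustration of the lookup, the Business Alias can be pictured as a small in-memory dictionary from surrogate key to business key. The surrogate values come from the BA-NSS1 example above; the function `to_business_key`, the source column `station_id` and the sample source rows are assumptions made for this sketch.

```python
# Business Alias BA-NSS1: NSS1 surrogate key -> business key of NSR-Station
ba_nss1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(source_rows, alias, surrogate_col="station_id"):
    """Replace the technical key of each source row with the business key.

    Rows whose surrogate key is unknown are returned separately
    rather than being given a guessed key.
    """
    translated, unresolved = [], []
    for row in source_rows:
        surrogate = row[surrogate_col]
        if surrogate in alias:
            new_row = {k: v for k, v in row.items() if k != surrogate_col}
            new_row["BK_NSR-Station"] = alias[surrogate]
            translated.append(new_row)
        else:
            unresolved.append(row)
    return translated, unresolved


# hypothetical rows from an NSS1 source table
source_rows = [
    {"station_id": 123522, "Postadres_postcode": "3511 CE"},
    {"station_id": 999999, "Postadres_postcode": "0000 XX"},
]
translated, unresolved = to_business_key(source_rows, ba_nss1)
```

Keeping the alias this small and separate from the Concept Context is what makes an in-memory lookup realistic.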

3C - Details

[Diagram: 3C model overview, with Business Concepts, Concept Contexts, Connector, Business Aliases and BC-Timeline]

BC-Timeline

Integrate the validity timelines of Concept Contexts belonging to a Business Concept
Like a Data Vault Point-in-time construct
But Mandatory
And with a clearly defined and performant approach

Example

NSR-Station

[valuation] Source NSS2 table p
BK_NSR-Station  WOZ value   Value rating agency X  META_Laad_dts      META_Laad_eind_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

[adres] Source NSS1 table y
BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                 …  …            …                  …

BCT
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
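The combined timeline for station Ut can be reproduced with a short Python sketch. The function name `bc_timeline` is an assumption; the load timestamps are taken from the [adres] and [valuation] tables above (dates read as day-month-year), and each interval is closed one second before the next one starts, as in the example data.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # end of the currently valid interval


def bc_timeline(*context_load_dts):
    """Merge the load timestamps of all Concept Contexts of ONE business key
    into consecutive, non-overlapping validity intervals."""
    starts = sorted({dts for context in context_load_dts for dts in context})
    timeline = []
    for i, start in enumerate(starts):
        if i + 1 < len(starts):
            end = starts[i + 1] - timedelta(seconds=1)
        else:
            end = HIGH_DATE
        timeline.append((start, end))
    return timeline


# load timestamps of station "Ut" in the two Concept Contexts
adres = [datetime(2013, 6, 5, 22), datetime(2015, 7, 4, 22)]
valuation = [datetime(2014, 1, 1, 22), datetime(2015, 1, 1, 22), datetime(2016, 3, 1, 22)]

timeline_ut = bc_timeline(adres, valuation)
for start, end in timeline_ut:
    print("Ut", start, end)
```

Each interval then joins to exactly one version in every Concept Context, which is what a point-in-time query needs.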

Modelling

WHAT makes PS-3C a different Ensemble?

[Diagram: 3C model overview, with Business Concepts, Concept Contexts, Connector, Business Aliases and BC-Timeline]

1) Explicitly splitting the work
Data + History
Subjects + Integration

2) NO HASHED BUSINESS KEYS or surrogate keys
Only concatenated ones
(Image: http://www.cannabisculture.com/files/images/6/hashbrick.JPG)

3) Fewer joins
Relation + Technical validity timeline + Relation context, together in one entity

4) Explicit HELPER entities
Business Alias
Business Concept Timeline
+ explicitly define Driving key(s)

Hope to AVOID this…
http://xkcd.com/927/

PS-3C: A new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me

‘Head of BI’, Spilgames
Certified Data Vault modeler since 2009
Contact details: nl.linkedin.com/in/rogierwerschkull
rogierwerschkull@gmail.com
rwerschkull

Page 17: PS-3C Data Modelling Zone Berlin

problem

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Less Mature

JOINoptimizers

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

Key-Value

Document

Column Family

NoSQL Databases

+SQL on Hadoop solutions

Do NOT like joins

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

This

REALLYcomplicates

AnchoRModeling

rwerschkull

nllinkedincominrogierwerschkull

And Personally

HUB

SAT

LINK

This one

too(Link Satellite)

HUB

HUB

SAT

SAT

rwerschkull

nllinkedincominrogierwerschkull

problem

rwerschkull

nllinkedincominrogierwerschkull

a) HASHING OF Business keys

Rolling

Stock Nr Datetime Sensor Id Value Concatenated Business Key

Key

Len MD5 Hash

Key

Len

8739

2015-01-22

013427 72A1_FINV 123 8739|2015-01-22 013427|72A1_FINV 34 86ae4c6b0e2e2d5a13a0d11440529aeb 32

8739

2015-01-22

013427 72A1_SLDET 100 8739|2015-01-22 013427|72A1_SLDET 35 51ce9bc292eef407bd7c91a52eebcf2e 32

8739

2015-01-22

013432 72A1_FINV 126 8739|2015-01-22 013432|72A1_FINV 34 9482a41c1fecc4c64b8c437af6cc85e8 32

8739

2015-01-22

013432

13A8_MW_UB

AT_VT 5 8739|2015-01-22 013432|13A8_MW_UBAT_VT 42 e4160914ee55ce0b93f87b23366a0ce3 32

8674

2015-01-22

013426 72A1_FINV 6 8674|2015-01-22 013426|72A1_FINV 34 fcb3e7c8c91e44ce396d908a4948ca65 32

8674

2015-01-22

013426

16A1_HSVER

OND 7 8674|2015-01-22 013426|16A1_HSVEROND 38 fe9098c8c291ad56af5c8afae5169196 32

Loses statistical Informationregarding the data distributionQuery optimizers do not like thishellip

Column family Document and Key-value databases need a

good (natural) sharding key for (partial) key-

lookups

Hashinghelliphellip

httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2rwerschkull

nllinkedincominrogierwerschkull

Surrogates keys require

centralized coordination

hellipand thus can impact the overall systemrsquos scalability and availability

A lot of MPP NoSQL databases simply do not have themhellip

B) Surrogate BuSINESS keys

rwerschkull

nllinkedincominrogierwerschkull

Then Some Inspiration

httproelantvoscomblogp=1119

rwerschkull

nllinkedincominrogierwerschkull

lsquoIn my opinion the answer lies in the adoption of the

persistent (Historical) Staging Area concept

(also known as Historical Staging or the History Area)

This basically adopts the fundamentals of a Data Warehousersquo

lsquoThe Historical Staging Area effectively lsquoactsrsquo as

Data Lake but in a better defined form as data deltas and

event datetimes are taken into accountrsquo

So

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

Subject Oriented Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Could be a

Data LAKE

VirtualisedEnsemble

Tier

EDW

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

rwerschkull

nllinkedincominrogierwerschkull

How

Does PS-3C Work

rwerschkull

nllinkedincominrogierwerschkullPhoto credit Public Domain

StagingArea

EDWInformation

Marts

Focus of Current ensemble EDWrsquos

rwerschkull

nllinkedincominrogierwerschkull

Persistent StagingArea HSA =

Data LibraryEDW

Information Marts

Splitting the work

rwerschkull

nllinkedincominrogierwerschkull

Persitent

Staging

-

Concept

Context

Connector

Business

Concept

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

Identify source event stream Primary or Unique KeyUse source metadata for this

Automate the building of a PS lsquoaround this keyrsquo Take all columns

Historize using SCD-2 approach

Persistent Staging - how

rwerschkull

nllinkedincominrogierwerschkull

Entity levelUnique key

Functional Description

Delivering party

Owner Responsible

MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)

hellip

Persistent Staging Metadata-1

rwerschkull

nllinkedincominrogierwerschkull

Column level [Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table file level (lowest possible)

Load End Date Timestamp possible but difficulthellipRequires updates

Persistent Staging Metadata-2

rwerschkull

nllinkedincominrogierwerschkull

ACID is possible in HIVE

ACID Makes Updates possibleBy registering updates as lsquonew datarsquo

Reconciliation compacting when idle at user command

Use ORC files

PLUS changing the HIVE configurationhellip

UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)

rwerschkull

nllinkedincominrogierwerschkull

HivePut semi structured data = variable columns in MAP data type

OR use Data storage type that supports schema-evolutionAVRO (ORC in development)

Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo

Schema can be different for every row

What about SEMI-STRUCTURED Data

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

3C - how

rwerschkull

nllinkedincominrogierwerschkull

Always starts with Conceptual data modeling

NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic

No Link Satellites

No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo

Explicit Helper entities

Like Data Vault(2) BUT

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

a UNIQUE Domain specific point of integration

hellipa business entity

hellipwithin itrsquos own domain

hellipdoes not necessarily need to be Enterprise Wide

Business Concept (BC)

rwerschkull

nllinkedincominrogierwerschkull

Why not lsquoenterprise widersquoCompany

Customer

Sales Customer

International

Sales Customer

Local

Sales Customer

Marketing Customer

Customer hellip

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

Most easy entity to be virtualised(if performance allows)

No Hashing amp No surrogateBUSINESSKEYS

(not by default at least)

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Containts Context about a Concept In a historical way

hellipLike a Data Vault Satellite

Every CC belongs to only one BC

Seperate entity per source system table stream

Concept Context (CC)

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

Example-dataBK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts META_Deleted_Ind

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0

Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0

Asa hellip hellip hellip hellip hellip hellip

NSR-Station

[adres]Source NSS1 table y

More difficult to be virtualisedDepends on semantic gap with source

But do make virtual when lsquostreaming datarsquo is necessary

Because we have PS layerExposing all columns not necessary

Refactoring is more easyhellip

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Relations between Concepts + Context

In a historical way

hellipMerger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point

in time

Connector (C)

rwerschkull

nllinkedincominrogierwerschkull

Explicitly defining a driving key as metadatahellip

Gives business understanding

Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)

bull is not registered as a new lsquoconnectionrsquo

Connector Driving key

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Connector Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

Driving Key

NS-Travelcard

+Checkin timestamp

Example-Data

NSR-StationNS-

Travelcard

NS-

Trainseries

rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

BK_NSR-

Station-from

BK_NSR-

Station-to

BK_NS_

Trainseries

BK_NS-Travelcard Checkin

timestamp

Checkout

timestamp

META_Load_dts META_Load_end_dts

Asd Ut IC 855 3528 0234 2073

1234

5-4-2016 84932 5-4-2016 94012 6-4-2016

220000

31-12-9999

000000

Ut Asd IC 855 3528 0234 2073

1234

5-4-2016 181009 5-4-2016 185520 6-4-2016

220000

31-12-9999

000000

We add Two mandatory

HELPEREntities

Are we there yet No

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

To help switching from sources that are Tied together by technical (surrogate) keyshellip

To a Business Key based model

Itrsquos a LOOKUP table that translates the technical to the

Business Key

Business Alias

Example

NSR-Station

[valuation]Source NSS2 table p

[description]SourceNSS1 table x[adres]

Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

BA-NSS2

Key Lookup voor NSS1 source tables

Key Lookup voor NSS2 source tables

Example-data

NSR-Station

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

Key Lookup voor NSS1 source tables

Business Key NSS1_Surrogate_key

Ut 123522

Asd 666323

Asa 222443

hellip hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

Has a 1 on (01) relation with a Business Concept

More difficult to be virtualised Lookup table should be kept small

Therefore DO NOT do key lookup in Concept Context entity

Load generate together with BC

Preferably lsquoin memoryrsquo somehowhellip

BA important Details

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Integrate the validity timelines of Concept Contextsbelonging to a Business Concept

Like a Data Vault Point-in-time construct

But Mandatory

And with a clearly defined and performantapproach

BC-Timeline

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-Station

[valuation]Source NSS2 table p

[adres]Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BK_NSR-

Station

WOZ

waarde

Waarde

Ratingbureau X

META_Laad_dts META_Laad_eind_dts

Ut 20 milj 18 milj 1-1-2014 220000 1-1-2015 215959

Ut 22 milj 18 milj 1-1-2015 220000 1-3-2016 215959

Ut 22 milj 23 milj 1-3-2016 220000 31-12-9999 000000

BK_NSR-

Station

Combined_Load_dts Combined_Load_end_dts

Ut 5-6-2013 220000 1-1-2014 215959

Ut 1-1-2014 220000 1-1-2015 215959

Ut 1-1-2015 220000 4-7-2015 215959

Ut 4-7-2015 220000 1-3-2016 215959

Ut 1-3-2016 220000 31-12-9999 000000

Asd 5-6-2013 220000 31-12-9999 000000

BK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 4-7-2015 215959

Ut 3511 CE 5237269 489299 hellip NSS1_y 4-7-2015 220000 31-12-9999 000000

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000

Asa hellip hellip hellip hellip hellip hellip

BCT

modelling

rwerschkull

nllinkedincominrogierwerschkull

What makes PS-3C a different ensemble?

1) Explicitly splitting the work

Data + History

Subjects + Integration

2) NO hashed business keys or surrogate keys

Only concatenated ones

httpwwwcannabisculturecomfilesimages6hashbrickJPG

3) Fewer joins

Relation + technical validity timeline + relation context: together in one entity

4) Explicit helper entities

Business Alias

Business Concept Timeline

+ explicitly defined driving key(s)

Hope to avoid this…

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me

'Head of BI', Spil Games

Certified Data Vault modeler since 2009

Contact details: nl.linkedin.com/in/rogierwerschkull

rogierwerschkull@gmail.com

rwerschkull

Page 18: PS-3C Data Modelling Zone Berlin

Less mature JOIN optimizers

Key-value, document and column-family NoSQL databases + SQL-on-Hadoop solutions do NOT like joins

This REALLY complicates Anchor Modeling

And personally: this one too (the Data Vault Link Satellite)

[Diagram: Data Vault model with HUBs, their SATs and a LINK carrying a Link Satellite]

Problem

a) Hashing of business keys

Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32

Hashing…

…loses statistical information regarding the data distribution. Query optimizers do not like this…

Column-family, document and key-value databases need a good (natural) sharding key for (partial) key lookups.

http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2
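The difference between the two key styles in the table can be tried out with a small Python sketch. It is an illustration under assumptions: the helper names are hypothetical, UTF-8 is assumed as the encoding fed to MD5, and the prefix filter merely stands in for a partial key lookup on a sorted or sharded store.

```python
import hashlib


def concatenated_business_key(*parts, sep="|"):
    """Build a natural business key by joining its parts with a separator."""
    return sep.join(str(p) for p in parts)


def hashed_business_key(key):
    """MD5 hex digest of a business key (the encoding is an assumption)."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()


# Three sensor readings taken from the table above.
readings = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:27", "72A1_SLDET"),
    ("8674", "2015-01-22 01:34:26", "72A1_FINV"),
]

keys = [concatenated_business_key(*r) for r in readings]
hashes = [hashed_business_key(k) for k in keys]

# A partial key lookup works on the natural key: all readings of rolling
# stock 8739 share a prefix and sort next to each other.
readings_8739 = [k for k in sorted(keys) if k.startswith("8739|")]

# The hashed keys share no usable prefix, so the same question can only be
# answered by hashing every candidate key or by scanning everything.
for key, digest in zip(keys, hashes):
    print(len(key), key, len(digest), digest)
```

The hash always has the same length whatever the key, which matches the constant 32 in the last column of the table, while the natural key keeps its ordering and its distribution.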

b) Surrogate business keys

Surrogate keys require centralized coordination…

…and thus can impact the overall system's scalability and availability.

A lot of MPP NoSQL databases simply do not have them…

Then: some inspiration

http://roelantvos.com/blog/?p=1119

'In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse.'

'The Historical Staging Area effectively 'acts' as Data Lake, but in a better defined form, as data deltas and event datetimes are taken into account.'

So

EDW = Subject Oriented, Integrated, Time Variant, Non-Volatile

Time Variant & Non-Volatile: could be a Data Lake

Subject Oriented & Integrated: virtualised ensemble tier

How does PS-3C work?

Focus of current ensemble EDWs

Staging Area, EDW, Information Marts

Splitting the work

Persistent Staging Area (HSA) = Data Library, EDW, Information Marts

Persistent Staging - Concept Context, Connector, Business Concept (PS-3C) modelling

Persistent Staging - how

Identify the Primary or Unique Key of the source / event stream. Use source metadata for this.

Automate the building of a PS 'around this key'. Take all columns.

Historize using an SCD-2 approach.
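A minimal sketch of such SCD-2 historization around a source key is given below. It rests on assumptions: the row layout (plain dicts), the column names load_dts and load_end_dts, and the rule that a superseded row ends one second before its successor are illustrative choices, not taken from the slides.

```python
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31)


def historize(history, delivery, key_columns, load_dts):
    """Apply one delivery of source rows to an SCD-2 history, in place.

    history     : list of dicts, each carrying load_dts / load_end_dts
    delivery    : list of dicts with the source columns only
    key_columns : the source primary or unique key the PS is built around
    """
    # Index the currently open version of every key.
    current = {
        tuple(row[c] for c in key_columns): row
        for row in history
        if row["load_end_dts"] == END_OF_TIME
    }
    for record in delivery:
        key = tuple(record[c] for c in key_columns)
        open_row = current.get(key)
        if open_row is not None:
            if all(open_row.get(c) == v for c, v in record.items()):
                continue  # nothing changed, keep the open version
            # Close the old version just before the new one starts.
            open_row["load_end_dts"] = load_dts - timedelta(seconds=1)
        new_row = {**record, "load_dts": load_dts, "load_end_dts": END_OF_TIME}
        history.append(new_row)
        current[key] = new_row
    return history


history = []
historize(history, [{"station": "Ut", "postcode": "3500GJ"}],
          ["station"], datetime(2013, 6, 5, 22))
historize(history, [{"station": "Ut", "postcode": "3511 CE"}],
          ["station"], datetime(2015, 6, 5, 22))
# Re-delivering an unchanged row adds no new version.
historize(history, [{"station": "Ut", "postcode": "3511 CE"}],
          ["station"], datetime(2015, 6, 6, 22))
```

Because the function only needs the key columns and takes every other column as it comes, it can be generated from source metadata, which is the point of automating the PS build.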

Persistent Staging Metadata-1

Entity level:

Unique key

Functional description

Delivering party

Owner / Responsible

MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)

…

Persistent Staging Metadata-2

Column level:

[Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR register a delete as a new record

[Source system] on table / file level (lowest possible)

Load End Date Timestamp: possible but difficult… Requires updates.

Updates in Hive (isn't HDFS append-only?)

ACID is possible in Hive.

ACID makes updates possible by registering updates as 'new data'.

Reconciliation / compacting: when idle or at user command.

Use ORC files.

PLUS changing the Hive configuration…

What about semi-structured data?

Hive: put semi-structured data (= variable columns) in a MAP data type,

OR use a data storage type that supports schema evolution: Avro (ORC in development).

Or HBase… It only has one data type (byte); the schema is 'applied'.

The schema can be different for every row.

3C - how

Like Data Vault (2), BUT:

Always starts with conceptual data modelling.

NOT the primary location of data & history. Virtualised (only if performance allows). Should be deterministic.

No Link Satellites.

No surrogate or hash keys, only 'concatenated natural business keys'.

Explicit helper entities.

3C - Details: Business Concept (BC)

[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; Connector; Business Alias A and B; BC-Timeline X]

A UNIQUE, domain-specific point of integration…

…a business entity

…within its own domain

…that does not necessarily need to be enterprise-wide.

Why not 'enterprise wide'?

Company Customer

Sales Customer

International Sales Customer

Local Sales Customer

Marketing Customer

Customer …

Business Concept Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Example data

NSR-Station:

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Trainseries:

Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Travelcard:

Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller:

Business Key
CRM-RW123456
CRM-LAS224466
…

BC important notes

The easiest entity to virtualise (if performance allows).

No hashing & no surrogate BUSINESS KEYS (not by default at least).
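Why the Business Concept is easy to virtualise can be seen from a sketch that derives it on the fly from persistent staging rows. The row layout, the column names series_type and series_nr, and the choice of the earliest load timestamp as META_Load_dts are assumptions made for illustration; the deck does not prescribe them.

```python
from datetime import datetime


def business_concept(ps_rows, key_columns, sep="|"):
    """Derive one Business Concept row per concatenated business key.

    Works like a view over persistent staging: nothing is stored, and the
    result is deterministic because it is sorted on the business key.
    """
    concept = {}
    for row in ps_rows:
        business_key = sep.join(str(row[c]) for c in key_columns)
        seen = concept.get(business_key)
        # Keep the first moment the key was delivered (assumption).
        if seen is None or row["load_dts"] < seen["META_Load_dts"]:
            concept[business_key] = {
                "business_key": business_key,
                "META_Source": row["source"],
                "META_Load_dts": row["load_dts"],
            }
    return [concept[k] for k in sorted(concept)]


# Hypothetical persistent staging rows for the train series source.
ps_rows = [
    {"series_type": "IC", "series_nr": 855, "source": "NTR_q",
     "load_dts": datetime(2015, 6, 5, 22)},
    {"series_type": "Sp", "series_nr": 7455, "source": "NTR_q",
     "load_dts": datetime(2015, 6, 5, 22)},
    {"series_type": "IC", "series_nr": 855, "source": "NTR_q",
     "load_dts": datetime(2015, 6, 6, 22)},  # a later version of IC|855
    {"series_type": "IC", "series_nr": 8852, "source": "NTR_q",
     "load_dts": datetime(2015, 6, 6, 22)},
]

bc_trainseries = business_concept(ps_rows, ["series_type", "series_nr"])
```

No hash function or key generator is involved, so two independent runs over the same staging data give the same Business Concept rows.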

3C - Details: Concept Context (CC)

Contains context about a Concept, in a historical way

…like a Data Vault Satellite.

Every CC belongs to only one BC.

Separate entity per source system table / stream.

Concept Context Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:

• [Load End Date Timestamp]

• [Deleted Flag] OR register a delete as a new record

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts:

[valuation]: source NSS2, table p

[description]: source NSS1, table x

[adres]: source NSS1, table y

[description]: source NTR, table q

[ovchip_personal]: source NSR, table r

[ovchip_on-usage]: source NSR, table s

[personal_details]: source NSR, table t

[adres_data]: source NSR, table t

Example data: NSR-Station [adres], source NSS1, table y

BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                  …  …            …                  …

CC important notes

More difficult to virtualise: it depends on the semantic gap with the source.

But do make it virtual when 'streaming data' is necessary.

Because we have the PS layer, exposing all columns is not necessary.

Refactoring is easier…

3C - Details: Connector (C)

Relations between Concepts + context, in a historical way

…a merger of the Data Vault Link + Link Satellite.

Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time.

Connector driving key

Explicitly defining a driving key as metadata…

gives business understanding;

makes it possible for the Connector to correctly handle delta data deliveries…

• so that a change (on the driving key)

• is not registered as a new 'connection'.

Connector Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:

• [Load End Date Timestamp]

• [Deleted Flag] OR register a delete as a new record

Example

Connector NSR-Travelmovement: relates NSR-Station (from, to), NS-Travelcard and NS-Trainseries, with a Checkin timestamp

Driving key: NS-Travelcard + Checkin timestamp

Example data: NSR-Travelmovement

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00
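What the driving key changes when a delta delivery arrives can be sketched as follows. The scenario is hypothetical: a travel movement is first delivered without a destination and later completed. The function and column names are assumptions as well; only the driving key (NS-Travelcard + Checkin timestamp) comes from the example.

```python
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31)


def load_connector(connector, delivery, driving_key, load_dts):
    """Load a delta delivery into a Connector, matching rows on the driving key."""
    open_rows = {
        tuple(r[c] for c in driving_key): r
        for r in connector
        if r["META_Load_end_dts"] == END_OF_TIME
    }
    for record in delivery:
        dk = tuple(record[c] for c in driving_key)
        open_row = open_rows.get(dk)
        if open_row is not None:
            if all(open_row.get(c) == v for c, v in record.items()):
                continue  # same connection, nothing changed
            # Same driving key, different content: end-date the old version.
            open_row["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
        new_row = {**record, "META_Load_dts": load_dts,
                   "META_Load_end_dts": END_OF_TIME}
        connector.append(new_row)
        open_rows[dk] = new_row
    return connector


def open_connections(connector):
    return [r for r in connector if r["META_Load_end_dts"] == END_OF_TIME]


checkin = datetime(2016, 4, 5, 8, 49, 32)
first = {"travelcard": "3528 0234 2073 1234", "checkin": checkin,
         "station_from": "Asd", "station_to": None}
second = {**first, "station_to": "Ut"}  # the delta completes the movement

# With the driving key, the delta replaces the earlier version.
with_driving_key = []
for delivery, dts in [([first], datetime(2016, 4, 5, 22)),
                      ([second], datetime(2016, 4, 6, 22))]:
    load_connector(with_driving_key, delivery, ("travelcard", "checkin"), dts)

# Matching on all relation keys instead registers a second 'connection'.
all_keys = ("travelcard", "checkin", "station_from", "station_to")
without_driving_key = []
for delivery, dts in [([first], datetime(2016, 4, 5, 22)),
                      ([second], datetime(2016, 4, 6, 22))]:
    load_connector(without_driving_key, delivery, all_keys, dts)
```

With the driving key one open row remains per travel movement; without it both versions stay open, which is the false new 'connection' the slide warns about.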

Are we there yet? No.

We add two mandatory HELPER entities

3C - Details: Business Alias

To help switching from sources that are tied together by technical (surrogate) keys…

to a Business Key based model.

It's a LOOKUP table that translates the technical key to the Business Key.

Example: NSR-Station, with Concept Contexts [valuation] (source NSS2, table p), [description] (source NSS1, table x) and [adres] (source NSS1, table y)

BA-NSS1: key lookup for NSS1 source tables

BA-NSS2: key lookup for NSS2 source tables

Example data: NSR-Station

BA-NSS1: key lookup for NSS1 source tables

Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station (Business Concept):

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …
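The Business Alias lookup itself is small enough to show as a sketch. The dictionary holds the BA-NSS1 rows from the example; the function name, the source column name station_id and the shape of the source row are hypothetical.

```python
# BA-NSS1 from the example: NSS1 surrogate key -> business key.
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(source_row, alias, surrogate_column):
    """Replace a source row's surrogate key by the business key from the alias.

    Raises KeyError when the surrogate key is unknown, so a missing alias
    entry is noticed instead of silently producing an unkeyed row.
    """
    row = dict(source_row)  # leave the caller's row untouched
    surrogate = row.pop(surrogate_column)
    row["business_key"] = alias[surrogate]
    return row


# A hypothetical NSS1 address row that only carries the surrogate key.
nss1_row = {"station_id": 666323, "postcode": "1012 AB"}
keyed_row = to_business_key(nss1_row, BA_NSS1, "station_id")
```

Kept as a plain in-memory mapping like this, the lookup stays cheap as long as the alias table is small.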


Page 19: PS-3C Data Modelling Zone Berlin

Less Mature

JOINoptimizers

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

Key-Value

Document

Column Family

NoSQL Databases

+SQL on Hadoop solutions

Do NOT like joins

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

This

REALLYcomplicates

AnchoRModeling

rwerschkull

nllinkedincominrogierwerschkull

And Personally

HUB

SAT

LINK

This one

too(Link Satellite)

HUB

HUB

SAT

SAT

rwerschkull

nllinkedincominrogierwerschkull

problem

rwerschkull

nllinkedincominrogierwerschkull

a) HASHING OF Business keys

Rolling

Stock Nr Datetime Sensor Id Value Concatenated Business Key

Key

Len MD5 Hash

Key

Len

8739

2015-01-22

013427 72A1_FINV 123 8739|2015-01-22 013427|72A1_FINV 34 86ae4c6b0e2e2d5a13a0d11440529aeb 32

8739

2015-01-22

013427 72A1_SLDET 100 8739|2015-01-22 013427|72A1_SLDET 35 51ce9bc292eef407bd7c91a52eebcf2e 32

8739

2015-01-22

013432 72A1_FINV 126 8739|2015-01-22 013432|72A1_FINV 34 9482a41c1fecc4c64b8c437af6cc85e8 32

8739

2015-01-22

013432

13A8_MW_UB

AT_VT 5 8739|2015-01-22 013432|13A8_MW_UBAT_VT 42 e4160914ee55ce0b93f87b23366a0ce3 32

8674

2015-01-22

013426 72A1_FINV 6 8674|2015-01-22 013426|72A1_FINV 34 fcb3e7c8c91e44ce396d908a4948ca65 32

8674

2015-01-22

013426

16A1_HSVER

OND 7 8674|2015-01-22 013426|16A1_HSVEROND 38 fe9098c8c291ad56af5c8afae5169196 32

Loses statistical Informationregarding the data distributionQuery optimizers do not like thishellip

Column family Document and Key-value databases need a

good (natural) sharding key for (partial) key-

lookups

Hashinghelliphellip

httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2rwerschkull

nllinkedincominrogierwerschkull

Surrogates keys require

centralized coordination

hellipand thus can impact the overall systemrsquos scalability and availability

A lot of MPP NoSQL databases simply do not have themhellip

B) Surrogate BuSINESS keys

rwerschkull

nllinkedincominrogierwerschkull

Then Some Inspiration

httproelantvoscomblogp=1119

rwerschkull

nllinkedincominrogierwerschkull

lsquoIn my opinion the answer lies in the adoption of the

persistent (Historical) Staging Area concept

(also known as Historical Staging or the History Area)

This basically adopts the fundamentals of a Data Warehousersquo

lsquoThe Historical Staging Area effectively lsquoactsrsquo as

Data Lake but in a better defined form as data deltas and

event datetimes are taken into accountrsquo

So

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

Subject Oriented Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Could be a

Data LAKE

VirtualisedEnsemble

Tier

EDW

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

rwerschkull

nllinkedincominrogierwerschkull

How

Does PS-3C Work

rwerschkull

nllinkedincominrogierwerschkullPhoto credit Public Domain

StagingArea

EDWInformation

Marts

Focus of Current ensemble EDWrsquos

rwerschkull

nllinkedincominrogierwerschkull

Persistent StagingArea HSA =

Data LibraryEDW

Information Marts

Splitting the work

rwerschkull

nllinkedincominrogierwerschkull

Persitent

Staging

-

Concept

Context

Connector

Business

Concept

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

Identify source event stream Primary or Unique KeyUse source metadata for this

Automate the building of a PS lsquoaround this keyrsquo Take all columns

Historize using SCD-2 approach

Persistent Staging - how

rwerschkull

nllinkedincominrogierwerschkull

Entity levelUnique key

Functional Description

Delivering party

Owner Responsible

MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)

hellip

Persistent Staging Metadata-1

rwerschkull

nllinkedincominrogierwerschkull

Column level [Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table file level (lowest possible)

Load End Date Timestamp possible but difficulthellipRequires updates

Persistent Staging Metadata-2

rwerschkull

nllinkedincominrogierwerschkull

ACID is possible in HIVE

ACID Makes Updates possibleBy registering updates as lsquonew datarsquo

Reconciliation compacting when idle at user command

Use ORC files

PLUS changing the HIVE configurationhellip

UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)

rwerschkull

nllinkedincominrogierwerschkull

HivePut semi structured data = variable columns in MAP data type

OR use Data storage type that supports schema-evolutionAVRO (ORC in development)

Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo

Schema can be different for every row

What about SEMI-STRUCTURED Data

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

3C - how

rwerschkull

nllinkedincominrogierwerschkull

Always starts with Conceptual data modeling

NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic

No Link Satellites

No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo

Explicit Helper entities

Like Data Vault(2) BUT

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

a UNIQUE Domain specific point of integration

hellipa business entity

hellipwithin itrsquos own domain

hellipdoes not necessarily need to be Enterprise Wide

Business Concept (BC)

rwerschkull

nllinkedincominrogierwerschkull

Why not lsquoenterprise widersquoCompany

Customer

Sales Customer

International

Sales Customer

Local

Sales Customer

Marketing Customer

Customer hellip

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

Most easy entity to be virtualised(if performance allows)

No Hashing amp No surrogateBUSINESSKEYS

(not by default at least)

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Containts Context about a Concept In a historical way

hellipLike a Data Vault Satellite

Every CC belongs to only one BC

Seperate entity per source system table stream

Concept Context (CC)

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

Example-dataBK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts META_Deleted_Ind

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0

Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0

Asa hellip hellip hellip hellip hellip hellip

NSR-Station

[adres]Source NSS1 table y

More difficult to be virtualisedDepends on semantic gap with source

But do make virtual when lsquostreaming datarsquo is necessary

Because we have PS layerExposing all columns not necessary

Refactoring is more easyhellip

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Relations between Concepts + Context

In a historical way

hellipMerger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point

in time

Connector (C)

rwerschkull

nllinkedincominrogierwerschkull

Explicitly defining a driving key as metadatahellip

Gives business understanding

Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)

bull is not registered as a new lsquoconnectionrsquo

Connector Driving key

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Connector Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

Driving Key

NS-Travelcard

+Checkin timestamp

Example-Data

NSR-StationNS-

Travelcard

NS-

Trainseries

rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

BK_NSR-

Station-from

BK_NSR-

Station-to

BK_NS_

Trainseries

BK_NS-Travelcard Checkin

timestamp

Checkout

timestamp

META_Load_dts META_Load_end_dts

Asd Ut IC 855 3528 0234 2073

1234

5-4-2016 84932 5-4-2016 94012 6-4-2016

220000

31-12-9999

000000

Ut Asd IC 855 3528 0234 2073

1234

5-4-2016 181009 5-4-2016 185520 6-4-2016

220000

31-12-9999

000000

We add Two mandatory

HELPEREntities

Are we there yet No

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

To help switching from sources that are Tied together by technical (surrogate) keyshellip

To a Business Key based model

Itrsquos a LOOKUP table that translates the technical to the

Business Key

Business Alias

Example

NSR-Station

[valuation]Source NSS2 table p

[description]SourceNSS1 table x[adres]

Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

BA-NSS2

Key Lookup voor NSS1 source tables

Key Lookup voor NSS2 source tables

Example-data

NSR-Station

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

Key Lookup voor NSS1 source tables

Business Key NSS1_Surrogate_key

Ut 123522

Asd 666323

Asa 222443

hellip hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

Has a 1 on (01) relation with a Business Concept

More difficult to be virtualised Lookup table should be kept small

Therefore DO NOT do key lookup in Concept Context entity

Load generate together with BC

Preferably lsquoin memoryrsquo somehowhellip

BA important Details

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Integrate the validity timelines of Concept Contextsbelonging to a Business Concept

Like a Data Vault Point-in-time construct

But Mandatory

And with a clearly defined and performantapproach

BC-Timeline

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-Station

[valuation]Source NSS2 table p

[adres]Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BK_NSR-

Station

WOZ

waarde

Waarde


Page 20: PS-3C Data Modelling Zone Berlin

NoSQL databases (key-value, document, column family)
+ SQL-on-Hadoop solutions
do NOT like joins

This REALLY complicates Anchor Modeling

And personally: this one too (the Link Satellite)
[Diagram: Data Vault model of Hubs with their Satellites, joined by a Link that carries its own Link Satellite]

Problem

a) Hashing of business keys

Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32

Hashing…
Loses statistical information regarding the data distribution; query optimizers do not like this…
Column family, document and key-value databases need a good (natural) sharding key for (partial) key lookups
http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2
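The difference between a concatenated and a hashed business key can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the helper names `concatenated_key` and `hashed_key`, the `|` separator as a parameter and the UTF-8 encoding are assumptions, and the digests it produces are not claimed to match the ones on the slide.

```python
import hashlib

def concatenated_key(*parts, sep="|"):
    """Join the natural key parts into one readable business key."""
    return sep.join(str(p) for p in parts)

def hashed_key(business_key):
    """MD5 hex digest of the business key (UTF-8 encoding is an assumption)."""
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()

# Three sensor readings taken from the slide's example table.
readings = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:27", "72A1_SLDET"),
    ("8674", "2015-01-22 01:34:26", "72A1_FINV"),
]
keys = [concatenated_key(*r) for r in readings]
hashes = [hashed_key(k) for k in keys]

# A partial-key lookup (all readings of rolling stock 8739) is a simple
# prefix match on the concatenated key.
stock_8739 = [k for k in keys if k.startswith("8739|")]

# The hashes all have the same length and carry no usable prefix:
# related rows no longer sort or shard together.
hash_lengths = {len(h) for h in hashes}
```

The concatenated key keeps its natural ordering and prefix, which is what a sharding key for partial lookups needs; the hash trades that away for a fixed width.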

b) Surrogate business keys

Surrogate keys require centralized coordination
…and thus can impact the overall system’s scalability and availability
A lot of MPP / NoSQL databases simply do not have them…

Then some inspiration
http://roelantvos.com/blog/?p=1119

‘In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse.’

‘The Historical Staging Area effectively ‘acts’ as Data Lake, but in a better defined form, as data deltas and event datetimes are taken into account.’

So
EDW = Subject Oriented, Integrated, Time Variant, Non-Volatile
can be split into
  Time Variant & Non Volatile
  Subject Oriented & Integrated

Time Variant & Non Volatile: could be a Data Lake
Subject Oriented & Integrated: could be a virtualised ensemble tier

How does PS-3C work?

Focus of current ensemble EDWs
Staging Area -> EDW -> Information Marts

Splitting the work
Persistent Staging Area (HSA) = Data Library -> EDW -> Information Marts

Persistent Staging - (Business) Concept, Context, Connector

modelling

Persistent Staging - how
Identify the source / event stream primary or unique key. Use source metadata for this.
Automate the building of a PS ‘around this key’. Take all columns.
Historize using an SCD-2 approach.
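The three steps above can be sketched as a small SCD-2 loader. This is a minimal sketch under stated assumptions, not the author's implementation: the function `load_delta`, the column names `station`, `postcode`, `load_dts` and `load_end_dts`, and the convention of closing a version one second before the next load are illustrative (the last one mirrors the 21:59:59 / 22:00:00 pattern in the example data). Deletes are not handled here.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # open-ended validity, as in the example data

def load_delta(ps_rows, delta, key_cols, load_dts):
    """Apply one delivery to a persistent staging table (a list of dicts)."""
    # Index the currently valid version of every key.
    current = {
        tuple(r[c] for c in key_cols): r
        for r in ps_rows
        if r["load_end_dts"] == OPEN_END
    }
    for rec in delta:
        key = tuple(rec[c] for c in key_cols)
        old = current.get(key)
        if old is not None:
            if all(old[c] == v for c, v in rec.items()):
                continue  # unchanged: keep the current version open
            # Changed: close the old version one second before the new load.
            old["load_end_dts"] = load_dts - timedelta(seconds=1)
        new = {**rec, "load_dts": load_dts, "load_end_dts": OPEN_END}
        ps_rows.append(new)
        current[key] = new
    return ps_rows

ps = []
load_delta(ps, [{"station": "Ut", "postcode": "3500GJ"}], ["station"], datetime(2013, 6, 5, 22))
load_delta(ps, [{"station": "Ut", "postcode": "3511 CE"}], ["station"], datetime(2015, 7, 4, 22))
# A repeated, unchanged delivery adds no new version.
load_delta(ps, [{"station": "Ut", "postcode": "3511 CE"}], ["station"], datetime(2016, 1, 1, 22))
```

Because the loader only needs the key columns and takes every other column as-is, it can be generated from source metadata rather than written per table.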

Persistent Staging Metadata-1
Entity level:
  Unique key
  Functional description
  Delivering party
  Owner / Responsible
  MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
  …

Persistent Staging Metadata-2
Column level:
  [Load Date Timestamp]
  [Load End Date Timestamp]
  [Deleted Flag] OR delete as new record
  [Source system] on table / file level (lowest possible)
Load End Date Timestamp: possible but difficult… requires updates

Updates in Hive (isn’t HDFS append-only?)
ACID is possible in Hive
ACID makes updates possible by registering updates as ‘new data’
Reconciliation / compacting when idle or at user command
Use ORC files
PLUS changing the Hive configuration…
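The idea of ‘registering updates as new data’ and compacting later can be pictured with a toy model. This is only a conceptual sketch in Python, not Hive's actual storage format or API: the functions `read` and `compact` and the tuple layout of a delta are hypothetical.

```python
def read(base, deltas):
    """Merge-on-read: apply the deltas, in write order, on top of the base."""
    view = dict(base)
    for op, key, row in deltas:
        if op == "delete":
            view.pop(key, None)
        else:  # "insert" and "update" both write a new row version
            view[key] = row
    return view

def compact(base, deltas):
    """Fold all deltas into a new base and start again with no deltas."""
    return read(base, deltas), []

base = {"Ut": {"postcode": "3500GJ"}}
deltas = []

# The update is not applied in place; it is appended as new data.
deltas.append(("update", "Ut", {"postcode": "3511 CE"}))
before = read(base, deltas)

# Compaction rewrites the base; readers see the same result afterwards.
base, deltas = compact(base, deltas)
after = read(base, deltas)
```

Readers get the updated row both before and after compaction; compaction only reduces the amount of merging a read has to do.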

What about semi-structured data?
Hive: put semi-structured data (= variable columns) in a MAP data type
OR use a data storage type that supports schema evolution: Avro (ORC in development)
Or HBase… it only has one data type (byte); schema is ‘applied’
Schema can be different for every row
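The MAP option can be illustrated by splitting each incoming record into a fixed part and a key/value part. A minimal Python sketch, assuming a hypothetical fixed schema (`sensor_id`, `datetime`) and a hypothetical `attributes` column; values are cast to strings because a map column has a single value type.

```python
FIXED_COLUMNS = ("sensor_id", "datetime")  # hypothetical fixed schema

def to_row(record):
    """Keep the fixed columns; push every other column into one map."""
    row = {c: record.get(c) for c in FIXED_COLUMNS}
    row["attributes"] = {
        k: str(v) for k, v in record.items() if k not in FIXED_COLUMNS
    }
    return row

# Two records with different variable columns fit the same row layout.
a = to_row({"sensor_id": "72A1_FINV", "datetime": "2015-01-22 01:34:27", "value": 123})
b = to_row({"sensor_id": "72A1_SLDET", "datetime": "2015-01-22 01:34:27", "value": 100, "unit": "V"})
```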

modelling

3C - how

Like Data Vault (2), BUT
Always starts with conceptual data modeling
NOT the primary location of data & history
Virtualised (only if performance allows)
Should be deterministic
No Link Satellites
No surrogate or hash keys, only ‘concatenated natural business keys’
Explicit helper entities

3C - Details
[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; helper entities Business Alias A, Business Alias B and BC-Timeline X]

Business Concept (BC)
A UNIQUE, domain-specific point of integration
…a business entity
…within its own domain
…does not necessarily need to be enterprise wide

Why not ‘enterprise wide’?
Company Customer
  Sales Customer
    International Sales Customer
    Local Sales Customer
  Marketing Customer
  … Customer

Business Concept Metadata
Entity level:
  [Description]
  [Owner / Responsible]
Column level:
  [Load Date Timestamp]
  [Source system] on table / file level (lowest possible)

Example-Data

Business Concept NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

Business Concept NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

Business Concept NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

Business Concept NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
…

BC important notes
Easiest entity to virtualise (if performance allows)
No hashing & no surrogate BUSINESS KEYS (not by default at least)

3C - Details
[Diagram: 3C model overview (Business Concepts, Concept Contexts, Connector, Business Aliases, BC-Timeline)]

Concept Context (CC)
Contains context about a Concept, in a historical way
…like a Data Vault Satellite
Every CC belongs to only one BC
Separate entity per source system table / stream

Concept Context Metadata
Entity level:
  [Description]
  [Owner / Responsible]
Column level:
  [Load Date Timestamp]
  [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
  [Load End Date Timestamp]
  [Deleted Flag] OR register a delete as a new record

Example
Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller
Concept Contexts:
  [valuation]         Source NSS2 table p
  [description]       Source NSS1 table x
  [adres]             Source NSS1 table y
  [description]       Source NTR table q
  [ovchip_personal]   Source NSR table r
  [ovchip_on-usage]   Source NSR table s
  [personal_details]  Source NSR table t
  [adres_data]        Source NSR table t

Example-data
NSR-Station, Concept Context [adres], Source NSS1 table y

BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                 …  …            …                  …

CC important notes
More difficult to virtualise; depends on the semantic gap with the source
But do make it virtual when ‘streaming data’ is necessary
Because we have the PS layer, exposing all columns is not necessary
Refactoring is easier…

3C - Details
[Diagram: 3C model overview (Business Concepts, Concept Contexts, Connector, Business Aliases, BC-Timeline)]

Connector (C)
Relations between Concepts + Context, in a historical way
…merger of Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point in time

Connector Driving key
Explicitly defining a driving key as metadata…
Gives business understanding
Makes it possible for the Connector to correctly handle delta data deliveries…
  so that a change (on the driving key) is not registered as a new ‘connection’
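How a driving key lets a Connector absorb delta deliveries can be sketched as follows. This is a minimal sketch under stated assumptions, not the author's loading logic: the function `load_connector`, the column names, and the scenario in which a travel movement is first delivered without its destination are hypothetical; the driving key (travelcard + check-in timestamp) follows the example in this deck.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)

def load_connector(rows, delta, driving_key, load_dts):
    """Load a delta into a Connector, versioning rows per driving key."""
    for rec in delta:
        dk = tuple(rec[c] for c in driving_key)
        current = next(
            (r for r in rows
             if r["load_end_dts"] == OPEN_END
             and tuple(r[c] for c in driving_key) == dk),
            None,
        )
        if current is not None:
            if all(current[c] == v for c, v in rec.items()):
                continue  # nothing changed for this connection
            # Same connection, changed details: close the old version.
            current["load_end_dts"] = load_dts - timedelta(seconds=1)
        rows.append({**rec, "load_dts": load_dts, "load_end_dts": OPEN_END})
    return rows

dk = ["travelcard", "checkin"]
first = {
    "travelcard": "3528 0234 2073 1234",
    "checkin": datetime(2016, 4, 5, 8, 49, 32),
    "station_from": "Asd",
    "station_to": None,  # hypothetical: destination not yet known
}
second = {**first, "station_to": "Ut"}

moves = []
load_connector(moves, [first], dk, datetime(2016, 4, 5, 22))
load_connector(moves, [second], dk, datetime(2016, 4, 6, 22))

# Two versions, but still one connection when counted on the driving key.
connections = {tuple(r[c] for c in dk) for r in moves}
```

Without the driving key, the second delivery would look like a second, unrelated relation between the same Business Concepts.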

Connector Metadata
Entity level:
  [Description]
  [Owner / Responsible]
Column level:
  [Load Date Timestamp]
  [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
  [Load End Date Timestamp]
  [Deleted Flag] OR register a delete as a new record

Example
[Diagram: the Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts, as in the Concept Context example, now joined by a Connector]
Connector NSR-Travelmovement
  Checkin timestamp
  from
  to
Driving Key: NS-Travelcard + Checkin timestamp

Example-Data
Connector NSR-Travelmovement between NSR-Station, NS-Travelcard and NS-Trainseries

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No
We add two mandatory HELPER entities

modelling

3C - Details
[Diagram: 3C model overview (Business Concepts, Concept Contexts, Connector, Business Aliases, BC-Timeline)]

Business Alias
To help switching from sources that are tied together by technical (surrogate) keys…
to a Business Key based model
It’s a LOOKUP table that translates the technical key to the Business Key

Example
NSR-Station
  [valuation]    Source NSS2 table p
  [description]  Source NSS1 table x
  [adres]        Source NSS1 table y
BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables

Example-data
BA-NSS1 (key lookup for NSS1 source tables)
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …
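The Business Alias lookup can be sketched as a small in-memory mapping. A minimal illustration, assuming a plain dictionary and a hypothetical helper `to_business_key`; the surrogate key values are the ones from the BA-NSS1 example, and 999999 is a made-up key that is not in the lookup.

```python
# BA-NSS1: NSS1 surrogate key -> business key of NSR-Station (example values).
ba_nss1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}

def to_business_key(alias, surrogate_key):
    """Translate a source surrogate key; None if the key is not (yet) known."""
    return alias.get(surrogate_key)

# Surrogate keys as they would arrive from an NSS1 source table.
incoming = [123522, 666323, 999999]
translated = [to_business_key(ba_nss1, sk) for sk in incoming]
unknown = [sk for sk, bk in zip(incoming, translated) if bk is None]
```

Keeping the alias as a small key-to-key mapping is what makes holding it in memory realistic.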

BA important details
Has a 1-to-(0,1) relation with a Business Concept
More difficult to virtualise
Lookup table should be kept small
Therefore DO NOT do the key lookup in the Concept Context entity
Load / generate together with the BC
Preferably ‘in memory’ somehow…

3C - Details
[Diagram: 3C model overview (Business Concepts, Concept Contexts, Connector, Business Aliases, BC-Timeline)]

BC-Timeline
Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
Like a Data Vault Point-in-time construct
But mandatory
And with a clearly defined and performant approach

Example
NSR-Station
  [valuation]  Source NSS2 table p
  [adres]      Source NSS1 table y

Concept Context [valuation]
BK_NSR-Station  WOZ value   Value rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

BCT
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00

Concept Context [adres]
BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                 …  …            …                  …

modelling
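The BC-Timeline in this example can be derived mechanically from the load timestamps of the two Concept Contexts. The sketch below is one possible way to do it, not necessarily the author's approach: the function `bc_timeline` and the helper `at` are hypothetical, dates are read as day-month-year, and each period is closed one second before the next one starts, as in the example data.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)

def bc_timeline(*contexts):
    """Combine the load timestamps of all Concept Contexts per business key."""
    starts = {}
    for rows in contexts:
        for key, load_dts in rows:
            starts.setdefault(key, set()).add(load_dts)
    timeline = []
    for key, dts in starts.items():
        ordered = sorted(dts)
        # Every period ends one second before the next one starts.
        ends = [d - timedelta(seconds=1) for d in ordered[1:]] + [OPEN_END]
        timeline += [(key, s, e) for s, e in zip(ordered, ends)]
    return timeline

def at(day, month, year):
    """Load timestamp at 22:00:00 on the given day-month-year date."""
    return datetime(year, month, day, 22)

# (business key, META_Load_dts) pairs taken from the two example tables.
valuation = [("Ut", at(1, 1, 2014)), ("Ut", at(1, 1, 2015)), ("Ut", at(1, 3, 2016))]
adres = [("Ut", at(5, 6, 2013)), ("Ut", at(4, 7, 2015)), ("Asd", at(5, 6, 2013))]

bct = bc_timeline(valuation, adres)
```

For these inputs the result has five periods for Ut and one for Asd, the same six rows as the BCT table in the example.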

What makes PS-3C a different ensemble?
[Diagram: 3C model overview (Business Concepts, Concept Contexts, Connector, Business Aliases, BC-Timeline)]

1) Explicitly splitting the work
Data + History
Subjects + Integration

2) NO hashed business keys or surrogate keys
Only concatenated ones

3) Fewer joins
Relation + technical validity timeline + relation context, together in one entity

4) Explicit helper entities
Business Alias
Business Concept Timeline
+ explicitly define driving key(s)

Hope to AVOID this…
http://xkcd.com/927

PS-3C
A new PROPOSED ensemble modelling technique
Help needed
Questions?

About Me
‘Head of BI’, Spilgames
Certified Data Vault modeler since 2009
Contact details: nl.linkedin.com/in/rogierwerschkull
rogierwerschkull@gmail.com
rwerschkull

Page 22: PS-3C Data Modelling Zone Berlin

And Personally

HUB

SAT

LINK

This one

too(Link Satellite)

HUB

HUB

SAT

SAT

rwerschkull

nllinkedincominrogierwerschkull

problem

a) HASHING OF business keys

Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32

Hashing…

Loses statistical information regarding the data distribution. Query optimizers do not like this…

Column-family, document and key-value databases need a good (natural) sharding key for (partial) key lookups

http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2
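A minimal Python sketch of the point above, using the key parts from the table. The helper names concatenated_key and md5_key are made up for this illustration. It shows that a concatenated natural key still supports a partial (prefix) lookup and keeps related rows adjacent when sorted, while the MD5 digest is a fixed 32 characters without any visible structure.

```python
import hashlib

def concatenated_key(*parts, sep="|"):
    """Join natural key parts into one readable business key."""
    return sep.join(str(p) for p in parts)

def md5_key(business_key):
    """Return the 32-character MD5 hex digest of a business key."""
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()

# (rolling stock nr, datetime, sensor id) taken from the slide's table
readings = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:27", "72A1_SLDET"),
    ("8739", "2015-01-22 01:34:32", "72A1_FINV"),
    ("8674", "2015-01-22 01:34:26", "72A1_FINV"),
]

keys = [concatenated_key(*r) for r in readings]
hashes = [md5_key(k) for k in keys]

# Partial key lookup: all readings of rolling stock 8739.
# This needs the natural key; the digest offers no usable prefix.
stock_8739 = [k for k in keys if k.startswith("8739|")]

# Sorting natural keys keeps the rows of one rolling stock together.
sorted_keys = sorted(keys)
```

The same prefix filter applied to the values in hashes would have no meaning, which is the information loss the slide refers to.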

b) Surrogate business keys

Surrogate keys require centralized coordination…

…and thus can impact the overall system's scalability and availability

A lot of MPP / NoSQL databases simply do not have them…

Then some inspiration

http://roelantvos.com/blog/?p=1119

'In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse.'

'The Historical Staging Area effectively "acts" as Data Lake, but in a better defined form, as data deltas and event datetimes are taken into account.'

So…

EDW = Subject Oriented, Integrated, Time Variant, Non-Volatile

[Diagram: the EDW split into two halves: 'Time Variant & Non Volatile' and 'Subject Oriented & Integrated']

Time Variant & Non Volatile: could be a Data Lake

Subject Oriented & Integrated: could be a virtualised Ensemble tier

How does PS-3C work?

Photo credit: Public Domain

Focus of current ensemble EDWs:

Staging Area → EDW → Information Marts

Splitting the work:

Persistent Staging Area / HSA = Data Library → EDW → Information Marts

PS-3C = Persistent Staging + 3C modelling (Business Concept, Concept Context, Connector)

Persistent Staging - how

Identify the Primary or Unique Key of each source / event stream. Use source metadata for this

Automate the building of a PS 'around this key'. Take all columns

Historize using an SCD-2 approach
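The three steps above could be sketched in Python as follows. This is an illustration under stated assumptions, not the author's implementation: the function name historize is made up, full snapshots are assumed as input, and a closed version gets the load timestamp of its successor as end date (the example tables in this deck end a version one second earlier instead).

```python
from datetime import datetime

HIGH_DATE = datetime(9999, 12, 31)  # marks the open (current) version
META = {"META_Load_dts", "META_Load_end_dts", "META_Deleted_Ind"}

def historize(history, snapshot, key, load_dts):
    """Apply a full source snapshot to an SCD-2 history (list of dicts).

    history  : rows already stored, including the META_* columns
    snapshot : current source rows, all columns taken over unchanged
    key      : name of the source primary / unique key column
    """
    def payload(row):
        return {c: v for c, v in row.items() if c not in META}

    current = {r[key]: r for r in history if r["META_Load_end_dts"] == HIGH_DATE}
    incoming = {r[key]: r for r in snapshot}

    for k, new in incoming.items():
        old = current.get(k)
        if old is not None and not old["META_Deleted_Ind"] and payload(old) == new:
            continue  # unchanged: the open version stays as it is
        if old is not None:
            old["META_Load_end_dts"] = load_dts  # close the previous version
        history.append({**new, "META_Load_dts": load_dts,
                        "META_Load_end_dts": HIGH_DATE, "META_Deleted_Ind": 0})

    for k, old in current.items():
        if k not in incoming and not old["META_Deleted_Ind"]:
            old["META_Load_end_dts"] = load_dts
            # register the delete as a new record
            history.append({**payload(old), "META_Load_dts": load_dts,
                            "META_Load_end_dts": HIGH_DATE, "META_Deleted_Ind": 1})
    return history

ps_station = []
historize(ps_station, [{"station": "Ut", "postcode": "3500GJ"}],
          "station", datetime(2013, 6, 5, 22))
historize(ps_station, [{"station": "Ut", "postcode": "3511 CE"}],
          "station", datetime(2015, 6, 5, 22))
```

Because the loader only needs the key column name, the same function can be generated for every source table from its metadata.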

Persistent Staging Metadata-1

Entity level:
Unique key
Functional Description
Delivering party
Owner / Responsible
MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
…

Persistent Staging Metadata-2

Column level:
[Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR delete as new record
[Source system] on table / file level (lowest possible)

Load End Date Timestamp: possible but difficult… Requires updates

Updates in Hive (isn't HDFS append-only?)

ACID is possible in Hive

ACID makes updates possible by registering updates as 'new data'

Reconciliation / compacting: when idle or at user command

Use ORC files

PLUS changing the Hive configuration…
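The idea of 'updates as new data' plus compaction can be illustrated with a small Python model. This is a conceptual sketch only: it does not use Hive or reproduce its file format, and the names read_table and compact are made up for the example.

```python
def read_table(base, deltas):
    """Merge a base with delta records; the latest record per key wins."""
    rows = dict(base)                    # key -> row, copy so base stays untouched
    for op, key, row in deltas:          # deltas are applied in write order
        if op == "delete":
            rows.pop(key, None)
        else:                            # "insert" or "update"
            rows[key] = row
    return rows

def compact(base, deltas):
    """Fold all deltas into a new base and start again with no deltas."""
    return read_table(base, deltas), []

base = {"Ut": {"postcode": "3500GJ"}}
deltas = []                              # only ever appended to

# An update is not an in-place change, it is one more appended record.
deltas.append(("update", "Ut", {"postcode": "3511 CE"}))
deltas.append(("insert", "Asd", {"postcode": "1012 AB"}))

view_before = read_table(base, deltas)   # readers merge base + deltas
base, deltas = compact(base, deltas)     # compaction rewrites the base
```

Readers see the same rows before and after compaction; only the amount of merge work per read changes.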

What about SEMI-STRUCTURED data?

Hive: put semi-structured data (= variable columns) in a MAP data type

OR use a data storage type that supports schema evolution: AVRO (ORC in development)

Or HBase… It only has one data type (byte); schema is 'applied'

Schema can be different for every row

3C - how

Always starts with Conceptual data modeling

Like Data Vault (2), BUT:

NOT the primary location of data & history: virtualised (only if performance allows), should be deterministic

No Link Satellites

No surrogate or hash keys, only 'Concatenated Natural Business Keys'

Explicit helper entities

[Diagram: the 3C model. Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; helper entities Business Alias A, Business Alias B and BC-Timeline X]

3C - Details

Business Concept (BC)

A UNIQUE, domain-specific point of integration

…a business entity

…within its own domain

…does not necessarily need to be enterprise-wide

Why not 'enterprise wide'?

[Diagram: within one Company, 'Customer' splits into Sales Customer (International Sales Customer, Local Sales Customer), Marketing Customer, Customer …]

Business Concept Metadata

Entity level:
[Description]
[Owner / Responsible]

Column level:
[Load Date Timestamp]
[Source system] on table / file level (lowest possible)

Example-Data

NSR-Station

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Trainseries

Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Travelcard

Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller

Business Key
CRM-RW123456
CRM-LAS224466
…

BC important notes

Easiest entity to virtualise (if performance allows)

No hashing & no surrogate BUSINESS KEYS (not by default at least)
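A sketch of what a virtualised Business Concept could look like: nothing more than the distinct concatenated business keys found in persistent staging. The function names, the staging column names (type, nr) and the source label NTR_q are assumptions made for this example.

```python
def business_key(*parts, sep="|"):
    """Concatenate natural key parts, e.g. ('IC', 855) -> 'IC|855'."""
    return sep.join(str(p).strip() for p in parts)

def business_concept(staging_rows, key_columns, source):
    """Derive Business Concept rows from persistent staging rows.

    One row per distinct business key, stamped with the earliest
    load timestamp on which that key was seen.
    """
    first_seen = {}
    for row in staging_rows:
        bk = business_key(*(row[c] for c in key_columns))
        dts = row["META_Load_dts"]
        if bk not in first_seen or dts < first_seen[bk]:
            first_seen[bk] = dts
    return [{"Business Key": bk, "META_Source": source, "META_Load_dts": dts}
            for bk, dts in sorted(first_seen.items())]

# Hypothetical persistent staging rows for train series.
ps_trainseries = [
    {"type": "IC", "nr": 855, "META_Load_dts": "2015-06-05 22:00:00"},
    {"type": "IC", "nr": 855, "META_Load_dts": "2015-06-06 22:00:00"},
    {"type": "Sp", "nr": 7455, "META_Load_dts": "2015-06-05 22:00:00"},
]

bc_trainseries = business_concept(ps_trainseries, ["type", "nr"], "NTR_q")
```

The result is deterministic for a given staging content, so it can be recomputed on demand instead of being stored.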

3C - Details

Concept Context (CC)

Contains context about a Concept, in a historical way

…Like a Data Vault Satellite

Every CC belongs to only one BC

Separate entity per source system table / stream

Concept Context Metadata

Entity level:
[Description]
[Owner / Responsible]

Column level:
[Load Date Timestamp]
[Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

[Diagram: Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller, each with its Concept Contexts]

Concept Contexts and their sources:
[valuation] Source NSS2 table p
[description] Source NSS1 table x
[adres] Source NSS1 table y
[description] Source NTR table q
[ovchip_personal] Source NSR table r
[ovchip_on-usage] Source NSR table s
[personal_details] Source NSR table t
[adres_data] Source NSR table t

Example-data

NSR-Station, Concept Context [adres] (Source NSS1 table y)

BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                 …  …            …                  …

CC important notes

More difficult to virtualise: depends on the semantic gap with the source

But do make it virtual when 'streaming data' is necessary

Because we have the PS layer:
exposing all columns is not necessary
refactoring is easier…

3C - Details

Connector (C)

Relations between Concepts + Context, in a historical way

…Merger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point in time

Connector Driving key

Explicitly defining a driving key as metadata…

Gives business understanding

Makes it possible for a Connector to handle delta data deliveries correctly…
• so that a change (on the driving key)
• is not registered as a new 'connection'

Connector Metadata

Entity level:
[Description]
[Owner / Responsible]

Column level:
[Load Date Timestamp]
[Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

[Diagram: Connector NSR-Travelmovement (Checkin timestamp, from, to) linking the Business Concepts NSR-Station, NS-Travelcard and NS-Trainseries]

Driving Key: NS-Travelcard + Checkin timestamp

Example-Data

NSR-Travelmovement (Checkin timestamp, from, to)

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00
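How a driving key lets a Connector absorb delta deliveries could be sketched like this in Python. The function load_connector and the delivery scenario (a check-in that first arrives without destination and is completed in a later delivery) are assumptions made for illustration; the driving key itself, NS-Travelcard + Checkin timestamp, is the one from the example.

```python
HIGH_DATE = "9999-12-31 00:00:00"

def load_connector(connector, delivery, driving_key, load_dts):
    """Apply a delta delivery to a Connector (list of dict rows).

    A delivered row whose driving-key values match an open row is a new
    version of that connection: the old version is end-dated instead of
    a second, independent connection being registered.
    """
    def dk(row):
        return tuple(row[c] for c in driving_key)

    open_rows = {dk(r): r for r in connector
                 if r["META_Load_end_dts"] == HIGH_DATE}
    for new in delivery:
        old = open_rows.get(dk(new))
        if old is not None:
            if all(old.get(c) == v for c, v in new.items()):
                continue                       # nothing changed
            old["META_Load_end_dts"] = load_dts
        row = {**new, "META_Load_dts": load_dts, "META_Load_end_dts": HIGH_DATE}
        connector.append(row)
        open_rows[dk(new)] = row
    return connector

DRIVING_KEY = ["BK_NS-Travelcard", "Checkin timestamp"]

checkin = {"BK_NS-Travelcard": "3528 0234 2073 1234",
           "Checkin timestamp": "2016-04-05 08:49:32",
           "BK_NSR-Station-from": "Asd",
           "BK_NSR-Station-to": None}
completed = {**checkin, "BK_NSR-Station-to": "Ut"}

travelmovement = []
load_connector(travelmovement, [checkin], DRIVING_KEY, "2016-04-05 22:00:00")
load_connector(travelmovement, [completed], DRIVING_KEY, "2016-04-06 22:00:00")
open_movements = [r for r in travelmovement
                  if r["META_Load_end_dts"] == HIGH_DATE]
```

Without the driving key, the second delivery would look like a different combination of keys and show up as a second open travel movement.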

Are we there yet? No…

We add two mandatory HELPER entities

3C - Details

Business Alias

To help switching from sources that are tied together by technical (surrogate) keys…

…to a Business Key based model

It's a LOOKUP table that translates the technical key to the Business Key

Example

NSR-Station, with Concept Contexts:
[valuation] Source NSS2 table p
[description] Source NSS1 table x
[adres] Source NSS1 table y

BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables

Example-data

BA-NSS1 (key lookup for NSS1 source tables)

Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

BA important details

Has a 1 on (0,1) relation with a Business Concept

More difficult to virtualise

Lookup table should be kept small

Therefore DO NOT do the key lookup in the Concept Context entity

Load / generate together with the BC

Preferably 'in memory' somehow…
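A minimal sketch of a Business Alias used as a small in-memory lookup, with the BA-NSS1 values from the example data. The function name to_business_key and the source column name station_id are hypothetical.

```python
# BA-NSS1: technical (surrogate) key of source NSS1 -> business key
ba_nss1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}

def to_business_key(rows, alias, surrogate_column, bk_column):
    """Replace a source surrogate key by the business key via the alias.

    Returns (resolved, unresolved): rows whose surrogate key is not in
    the alias are handed back untouched so they can be dealt with later.
    """
    resolved, unresolved = [], []
    for row in rows:
        sk = row[surrogate_column]
        if sk in alias:
            out = {c: v for c, v in row.items() if c != surrogate_column}
            out[bk_column] = alias[sk]
            resolved.append(out)
        else:
            unresolved.append(row)
    return resolved, unresolved

# Hypothetical NSS1 rows that reference a station by its surrogate key.
nss1_rows = [
    {"station_id": 123522, "postcode": "3500GJ"},
    {"station_id": 999999, "postcode": "unknown"},
]

resolved, unresolved = to_business_key(nss1_rows, ba_nss1,
                                       "station_id", "BK_NSR-Station")
```

Keeping the alias as a compact key-to-key mapping is what makes holding it in memory realistic.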

3C - Details

BC-Timeline

Integrates the validity timelines of the Concept Contexts belonging to a Business Concept

Like a Data Vault Point-in-time construct

But mandatory

And with a clearly defined and performant approach

Example

NSR-Station, Concept Context [valuation] (Source NSS2 table p)

BK_NSR-Station  WOZ value  Value rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 mln     18 mln                 1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 mln     18 mln                 1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 mln     23 mln                 1-3-2016 22:00:00  31-12-9999 00:00:00

NSR-Station, Concept Context [adres] (Source NSS1 table y)

BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                 …  …            …                  …

BCT (BC-Timeline for NSR-Station)

BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
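The BCT table above can be derived from the load timestamps of the two Concept Contexts. A minimal Python sketch follows; the function name bc_timeline is made up, and the end-date rule (one second before the next start, open-ended for the last slice) is read off the example data rather than taken from a specification.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)

def bc_timeline(*context_timelines):
    """Combine the validity timelines of the Concept Contexts of one BC.

    Each timeline is a list of (business_key, load_dts) tuples.
    Returns (business_key, combined_load_dts, combined_load_end_dts) rows.
    """
    starts = {}
    for timeline in context_timelines:
        for bk, load_dts in timeline:
            starts.setdefault(bk, set()).add(load_dts)

    rows = []
    for bk in sorted(starts):
        points = sorted(starts[bk])
        for i, start in enumerate(points):
            if i + 1 < len(points):
                end = points[i + 1] - timedelta(seconds=1)
            else:
                end = HIGH_DATE          # last slice stays open
            rows.append((bk, start, end))
    return rows

# Load timestamps from the [adres] and [valuation] example tables.
adres = [("Ut", datetime(2013, 6, 5, 22)), ("Ut", datetime(2015, 7, 4, 22)),
         ("Asd", datetime(2013, 6, 5, 22))]
valuation = [("Ut", datetime(2014, 1, 1, 22)), ("Ut", datetime(2015, 1, 1, 22)),
             ("Ut", datetime(2016, 3, 1, 22))]

timeline = bc_timeline(adres, valuation)
ut_slices = [row for row in timeline if row[0] == "Ut"]
```

For Ut this gives the five slices of the BCT table, and one open-ended slice for Asd. A point-in-time query then joins each Concept Context to the slice that falls inside its validity period.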

What makes PS-3C a different ensemble?

1) Explicitly splitting the work

Data + History

Subjects + Integration

2) NO HASHED BUSINESS KEYS or surrogate keys

Only concatenated ones

httpwwwcannabisculturecomfilesimages6hashbrickJPG

3) Less joins

Relation
+ Technical validity timeline
+ Relation context
together in one entity

Photo credit: Public Domain

4) Explicit helper entities

Business Alias

Business Concept Timeline (BC-Timeline)

+ explicitly define driving key(s)

Photo credit: Public Domain

Hope to avoid this…

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?



Page 24: PS-3C Data Modelling Zone Berlin

a) HASHING OF Business keys

Rolling

Stock Nr Datetime Sensor Id Value Concatenated Business Key

Key

Len MD5 Hash

Key

Len

8739

2015-01-22

013427 72A1_FINV 123 8739|2015-01-22 013427|72A1_FINV 34 86ae4c6b0e2e2d5a13a0d11440529aeb 32

8739

2015-01-22

013427 72A1_SLDET 100 8739|2015-01-22 013427|72A1_SLDET 35 51ce9bc292eef407bd7c91a52eebcf2e 32

8739

2015-01-22

013432 72A1_FINV 126 8739|2015-01-22 013432|72A1_FINV 34 9482a41c1fecc4c64b8c437af6cc85e8 32

8739

2015-01-22

013432

13A8_MW_UB

AT_VT 5 8739|2015-01-22 013432|13A8_MW_UBAT_VT 42 e4160914ee55ce0b93f87b23366a0ce3 32

8674

2015-01-22

013426 72A1_FINV 6 8674|2015-01-22 013426|72A1_FINV 34 fcb3e7c8c91e44ce396d908a4948ca65 32

8674

2015-01-22

013426

16A1_HSVER

OND 7 8674|2015-01-22 013426|16A1_HSVEROND 38 fe9098c8c291ad56af5c8afae5169196 32

Loses statistical Informationregarding the data distributionQuery optimizers do not like thishellip

Column family Document and Key-value databases need a

good (natural) sharding key for (partial) key-

lookups

Hashinghelliphellip

httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2rwerschkull

nllinkedincominrogierwerschkull

Surrogates keys require

centralized coordination

hellipand thus can impact the overall systemrsquos scalability and availability

A lot of MPP NoSQL databases simply do not have themhellip

B) Surrogate BuSINESS keys

rwerschkull

nllinkedincominrogierwerschkull

Then Some Inspiration

httproelantvoscomblogp=1119

rwerschkull

nllinkedincominrogierwerschkull

lsquoIn my opinion the answer lies in the adoption of the

persistent (Historical) Staging Area concept

(also known as Historical Staging or the History Area)

This basically adopts the fundamentals of a Data Warehousersquo

lsquoThe Historical Staging Area effectively lsquoactsrsquo as

Data Lake but in a better defined form as data deltas and

event datetimes are taken into accountrsquo

So

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

Subject Oriented Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Could be a

Data LAKE

VirtualisedEnsemble

Tier

EDW

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

rwerschkull

nllinkedincominrogierwerschkull

How

Does PS-3C Work

rwerschkull

nllinkedincominrogierwerschkullPhoto credit Public Domain

StagingArea

EDWInformation

Marts

Focus of Current ensemble EDWrsquos

rwerschkull

nllinkedincominrogierwerschkull

Persistent StagingArea HSA =

Data LibraryEDW

Information Marts

Splitting the work

rwerschkull

nllinkedincominrogierwerschkull

Persitent

Staging

-

Concept

Context

Connector

Business

Concept

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

Identify source event stream Primary or Unique KeyUse source metadata for this

Automate the building of a PS lsquoaround this keyrsquo Take all columns

Historize using SCD-2 approach

Persistent Staging - how

rwerschkull

nllinkedincominrogierwerschkull

Persistent Staging Metadata-1

Entity level:

Unique key

Functional description

Delivering party

Owner / Responsible

MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operations...)

...

Persistent Staging Metadata-2

Column level:

[Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table / file level (lowest possible)

Load End Date Timestamp is possible but difficult... it requires updates.

Updates in Hive (isn't HDFS append-only?)

ACID is possible in Hive.

ACID makes updates possible by registering updates as 'new data'.

Reconciliation (compacting) happens when idle or at user command.

Use ORC files.

PLUS changing the Hive configuration...
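The idea of 'registering updates as new data' and reconciling later can be illustrated with a toy model. This is only a sketch of the principle under simplified assumptions, not Hive's actual storage format or API; the class `AppendOnlyTable` and its methods are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class AppendOnlyTable:
    """Toy model: changes are appended as delta records, never written in place."""

    def __init__(self) -> None:
        self.base: Dict[str, dict] = {}
        # Each delta is (key, new row), or (key, None) for a delete.
        self.deltas: List[Tuple[str, Optional[dict]]] = []

    def upsert(self, key: str, row: dict) -> None:
        self.deltas.append((key, dict(row)))

    def delete(self, key: str) -> None:
        self.deltas.append((key, None))

    def read(self) -> Dict[str, dict]:
        """Merge base and deltas at read time; later deltas win."""
        view = dict(self.base)
        for key, row in self.deltas:
            if row is None:
                view.pop(key, None)
            else:
                view[key] = row
        return view

    def compact(self) -> None:
        """Reconciliation: fold all deltas into a new base."""
        self.base = self.read()
        self.deltas = []


table = AppendOnlyTable()
table.upsert("Ut", {"postcode": "3500GJ"})
table.upsert("Ut", {"postcode": "3511 CE"})   # an 'update' is just new data
table.upsert("Asd", {"postcode": "1012 AB"})
table.delete("Asd")
before = table.read()
table.compact()
after = table.read()
```

Reads give the same answer before and after compaction; compaction only reduces the amount of delta data that has to be merged on every read.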

What about semi-structured data?

Hive: put semi-structured data (= variable columns) in a MAP data type.

OR use a data storage type that supports schema evolution: Avro (ORC in development).

Or HBase... It only has one data type (byte); the schema is 'applied'.

The schema can be different for every row.
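Putting the variable columns of semi-structured data in a MAP can be sketched as below. The fixed column names (`event_id`, `event_type`) and the `properties` map column are hypothetical; values are converted to strings on the assumption that the target column is a string-to-string map.

```python
from typing import Any, Dict

FIXED_COLUMNS = ("event_id", "event_type")  # hypothetical fixed part of the schema


def to_row(event: Dict[str, Any]) -> Dict[str, Any]:
    """Split an event into fixed columns plus one map holding everything else."""
    fixed = {column: event.get(column) for column in FIXED_COLUMNS}
    variable = {k: str(v) for k, v in event.items() if k not in FIXED_COLUMNS}
    return {**fixed, "properties": variable}


checkin = to_row({"event_id": 1, "event_type": "checkin", "station": "Ut"})
topup = to_row({"event_id": 2, "event_type": "topup", "amount": 20, "channel": "web"})
```

Both events end up with the same three columns, even though their source attributes differ.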

modelling

3C - how

Like Data Vault (2), BUT:

Always starts with conceptual data modelling.

NOT the primary location of data & history.

Virtualised (only if performance allows).

Should be deterministic.

No Link Satellites.

No surrogate or hash keys, only 'concatenated natural business keys'.

Explicit helper entities.
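A concatenated natural business key can be built deterministically with a few lines of code. This is a minimal sketch: the function name `business_key` is hypothetical, and the '|' delimiter is an assumption taken from keys such as IC|855 in the example data.

```python
DELIMITER = "|"  # assumption: same delimiter as in example keys like IC|855


def business_key(*parts: str) -> str:
    """Concatenate natural key parts into one deterministic business key."""
    cleaned = [part.strip() for part in parts]
    if any(part == "" for part in cleaned):
        raise ValueError("empty key part")
    if any(DELIMITER in part for part in cleaned):
        # A delimiter inside a part would make two different keys collide.
        raise ValueError("key part contains the delimiter")
    return DELIMITER.join(cleaned)


key = business_key("IC", "855")
same_key = business_key(" IC ", "855")
```

Because the key is derived only from source values, every load process computes the same key independently, without a central key generator or a hash function.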

3C - Details

[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; Connector; Business Alias A; Business Alias B; BC-Timeline X]

Business Concept (BC)

A UNIQUE, domain-specific point of integration

...a business entity

...within its own domain

...does not necessarily need to be enterprise wide

Why not 'enterprise wide'?

[Diagram: Company Customer; Sales Customer; International Sales Customer; Local Sales Customer; Marketing Customer; Customer ...]

Business Concept Metadata

Entity level:

[Description]

[Owner / Responsible]

Column level:

[Load Date Timestamp]

[Source system] on table / file level (lowest possible)

Example data

NSR-Station

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
...           ...          ...

NS-Trainseries

Business Key
IC|855
IC|8852
Sp|7455
St|16050
...

NS-Travelcard

Business Key
3528 0234 2073 1234
3528 0234 2073 5678
...

NS-Traveller

Business Key
CRM-RW123456
CRM-LAS224466
...

BC important notes

The easiest entity to virtualise (if performance allows).

No hashing & no surrogate BUSINESS KEYS (not by default at least).

3C - Details

Concept Context (CC)

Contains context about a Concept, in a historical way

...like a Data Vault Satellite

Every CC belongs to only one BC

Separate entity per source system table / stream

Concept Context Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:

[Load End Date Timestamp]

[Deleted Flag] OR register a delete as a new record

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts:

[valuation] Source: NSS2 table p

[description] Source: NSS1 table x

[adres] Source: NSS1 table y

[description] Source: NTR table q

[ovchip_personal] Source: NSR table r

[ovchip_on-usage] Source: NSR table s

[personal_details] Source: NSR table t

[adres_data] Source: NSR table t

Example data: NSR-Station, Concept Context [adres] (Source: NSS1 table y)

BK_NSR-Station  Postadres_postcode  GPS               ...  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064  ...  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299  ...  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299  ...  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             ...                 ...               ...  ...          ...                ...                  ...

CC important notes

More difficult to virtualise: it depends on the semantic gap with the source.

But do make it virtual when 'streaming data' is necessary.

Because we have the PS layer, exposing all columns is not necessary.

Refactoring is easier...

3C - Details

Connector (C)

Relations between Concepts + Context, in a historical way

...a merger of the Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time

Connector driving key

Explicitly defining a driving key as metadata...

Gives business understanding.

Makes it possible for the Connector to correctly handle delta data deliveries, so that a change (on the driving key) is not registered as a new 'connection'.
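How a driving key lets a Connector absorb delta deliveries can be sketched as follows. This is an illustration under assumptions, not the author's load logic: the column names and the function `apply_delta` are hypothetical, and the driving key (travel card + check-in timestamp) follows the NSR-Travelmovement example.

```python
from datetime import datetime
from typing import List

HIGH_DATE = datetime(9999, 12, 31)
DRIVING_KEY = ("travelcard", "checkin_ts")   # defined as metadata of the Connector
META = ("load_dts", "load_end_dts")


def apply_delta(connector: List[dict], row: dict, load_dts: datetime) -> str:
    """Store a delivered row; a change on an existing driving key closes the old version."""
    driving_key = tuple(row[c] for c in DRIVING_KEY)
    for existing in connector:
        is_current = existing["load_end_dts"] == HIGH_DATE
        if is_current and tuple(existing[c] for c in DRIVING_KEY) == driving_key:
            payload = {k: v for k, v in existing.items() if k not in META}
            if payload == row:
                return "unchanged"
            existing["load_end_dts"] = load_dts      # same connection, new version
            connector.append({**row, "load_dts": load_dts, "load_end_dts": HIGH_DATE})
            return "changed"
    connector.append({**row, "load_dts": load_dts, "load_end_dts": HIGH_DATE})
    return "new"


card = "3528 0234 2073 1234"
checkin = datetime(2016, 4, 5, 8, 49, 32)
connector: List[dict] = []
results = [
    # First delivery: destination not yet known.
    apply_delta(connector, {"travelcard": card, "checkin_ts": checkin,
                            "station_from": "Asd", "station_to": None},
                datetime(2016, 4, 5, 22)),
    # Second delivery: same driving key, destination now known.
    apply_delta(connector, {"travelcard": card, "checkin_ts": checkin,
                            "station_from": "Asd", "station_to": "Ut"},
                datetime(2016, 4, 6, 22)),
]
connections = {(r["travelcard"], r["checkin_ts"]) for r in connector}
```

Without the driving key, the second delivery would look like a second travel movement instead of a new version of the first one.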

Connector Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:

[Load End Date Timestamp]

[Deleted Flag] OR register a delete as a new record

Example

[Diagram: the Business Concepts and Concept Contexts of the previous example, now linked by a Connector]

Connector NSR-Travelmovement: Checkin timestamp, from, to

Driving key: NS-Travelcard + Checkin timestamp

Example data: Connector NSR-Travelmovement (between NSR-Station, NS-Travelcard and NS-Trainseries)

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No.

We add two mandatory HELPER entities.

modelling

3C - Details

Business Alias

Helps switching from sources that are tied together by technical (surrogate) keys... to a Business Key based model.

It's a LOOKUP table that translates the technical key to the Business Key.

Example

NSR-Station with Concept Contexts [valuation] (Source: NSS2 table p), [description] (Source: NSS1 table x) and [adres] (Source: NSS1 table y)

BA-NSS1: key lookup for NSS1 source tables

BA-NSS2: key lookup for NSS2 source tables

Example data

NSR-Station

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
...           ...          ...

BA-NSS1: key lookup for NSS1 source tables

Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
...           ...

BA important details

Has a 1-to-(0,1) relation with a Business Concept.

More difficult to virtualise.

The lookup table should be kept small. Therefore DO NOT do the key lookup in the Concept Context entity.

Load / generate together with the BC.

Preferably 'in memory' somehow...
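A Business Alias lookup can be sketched as a small in-memory mapping. The mapping values come from the BA-NSS1 example table; the source column name `station_id`, the output column `bk_station` and the helper `to_business_key` are hypothetical.

```python
from typing import Dict, List

# BA-NSS1: NSS1 surrogate key -> Business Key (values from the example table)
ba_nss1: Dict[int, str] = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(alias: Dict[int, str], surrogate_key: int) -> str:
    """Translate a technical (surrogate) key to its Business Key."""
    try:
        return alias[surrogate_key]
    except KeyError:
        raise LookupError(f"no Business Alias for surrogate key {surrogate_key}") from None


# Source rows that only know the technical key.
source_rows: List[dict] = [
    {"station_id": 123522, "postcode": "3500GJ"},
    {"station_id": 666323, "postcode": "1012 AB"},
]
translated = [
    {"bk_station": to_business_key(ba_nss1, row["station_id"]), "postcode": row["postcode"]}
    for row in source_rows
]
```

Keeping the alias as a narrow key-to-key table is what makes an 'in memory' lookup realistic.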

3C - Details

BC-Timeline

Integrates the validity timelines of the Concept Contexts belonging to a Business Concept.

Like a Data Vault Point-in-time construct,

but mandatory,

and with a clearly defined and performant approach.

Example

NSR-Station with Concept Contexts [valuation] (Source: NSS2 table p) and [adres] (Source: NSS1 table y)

Concept Context [valuation]

BK_NSR-Station  WOZ value   Value rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

Concept Context [adres]

BK_NSR-Station  Postadres_postcode  GPS               ...  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064  ...  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299  ...  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299  ...  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             ...                 ...               ...  ...          ...                ...

BCT (BC-Timeline)

BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00

modelling
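The combined timeline for station Ut can be derived from the two Concept Context timelines with a short sketch. It assumes each context timeline is contiguous (no gaps) and uses half-open intervals, so an interval ends at the next start instead of one second before it as in the tables; the function name `bc_timeline` is hypothetical.

```python
from datetime import datetime
from typing import List, Tuple

HIGH_DATE = datetime(9999, 12, 31)
Interval = Tuple[datetime, datetime]   # (load_dts, load_end_dts)


def bc_timeline(*context_timelines: List[Interval]) -> List[Interval]:
    """Merge the validity timelines of all Concept Contexts of ONE business key."""
    starts = sorted({start for timeline in context_timelines for start, _ in timeline})
    ends = starts[1:] + [HIGH_DATE]    # each interval runs until the next change point
    return list(zip(starts, ends))


# Dates are day-month-year in the example tables.
valuation_ut = [
    (datetime(2014, 1, 1, 22), datetime(2015, 1, 1, 22)),
    (datetime(2015, 1, 1, 22), datetime(2016, 3, 1, 22)),
    (datetime(2016, 3, 1, 22), HIGH_DATE),
]
adres_ut = [
    (datetime(2013, 6, 5, 22), datetime(2015, 7, 4, 22)),
    (datetime(2015, 7, 4, 22), HIGH_DATE),
]
timeline_ut = bc_timeline(valuation_ut, adres_ut)
```

The result has five intervals for Ut, starting on the same change points as the BCT rows in the example.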

What makes PS-3C a different ensemble?

[Diagram: the 3C model with Business Concepts X and Y, their Concept Contexts, the Connector, Business Alias A and B, and BC-Timeline X]

1) Explicitly splitting the work

Data + History

Subjects + Integration

2) NO hashed business keys or surrogate keys

Only concatenated ones

httpwwwcannabisculturecomfilesimages6hashbrickJPG

3) Fewer joins

Relation + technical validity timeline + relation context, together in one entity

Photo credit: Public Domain

4) Explicit helper entities

Business Alias

Business Concept Timeline

+ explicitly define driving key(s)

Photo credit: Public Domain

Hope to AVOID this...

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me

'Head of BI', Spilgames

Certified Data Vault modeler since 2009

Contact details: nl.linkedin.com/in/rogierwerschkull

rogierwerschkullgmailcom

rwerschkull

Page 25: PS-3C Data Modelling Zone Berlin

Hashing...

Loses statistical information regarding the data distribution. Query optimizers do not like this...

Column-family, document and key-value databases need a good (natural) sharding key for (partial) key lookups.

http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2
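Why hashing gets in the way of partial key lookups can be shown with a small sketch. MD5 is used here only as an example hash function; the helper names are hypothetical and the keys are the train series keys from the example data.

```python
import hashlib
from typing import List


def hashed(key: str) -> str:
    """Hash a business key to a fixed-length hex digest."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def prefix_lookup(keys: List[str], prefix: str) -> List[str]:
    """Partial key lookup: all keys that start with the given prefix."""
    return [key for key in keys if key.startswith(prefix)]


natural_keys = ["IC|855", "IC|8852", "Sp|7455", "St|16050"]
hashed_keys = [hashed(key) for key in natural_keys]

# On natural keys, 'all Intercity series' is a simple prefix lookup.
intercity = prefix_lookup(natural_keys, "IC|")

# On hashed keys the prefix no longer exists, so the same lookup finds nothing.
lost = prefix_lookup(hashed_keys, "IC|")
```

The natural key keeps related rows findable by prefix; after hashing, every lookup has to be an exact match on the full key.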

B) Surrogate business keys

Surrogate keys require centralized coordination...

...and thus can impact the overall system's scalability and availability.

A lot of MPP and NoSQL databases simply do not have them...


Page 26: PS-3C Data Modelling Zone Berlin

Surrogates keys require

centralized coordination

hellipand thus can impact the overall systemrsquos scalability and availability

A lot of MPP NoSQL databases simply do not have themhellip

B) Surrogate BuSINESS keys

rwerschkull

nllinkedincominrogierwerschkull

Then Some Inspiration

httproelantvoscomblogp=1119

rwerschkull

nllinkedincominrogierwerschkull

lsquoIn my opinion the answer lies in the adoption of the

persistent (Historical) Staging Area concept

(also known as Historical Staging or the History Area)

This basically adopts the fundamentals of a Data Warehousersquo

lsquoThe Historical Staging Area effectively lsquoactsrsquo as

Data Lake but in a better defined form as data deltas and

event datetimes are taken into accountrsquo

So

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

Subject Oriented Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Could be a

Data LAKE

VirtualisedEnsemble

Tier

EDW

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

rwerschkull

nllinkedincominrogierwerschkull

How

Does PS-3C Work

rwerschkull

nllinkedincominrogierwerschkullPhoto credit Public Domain

StagingArea

EDWInformation

Marts

Focus of Current ensemble EDWrsquos

rwerschkull

nllinkedincominrogierwerschkull

Persistent StagingArea HSA =

Data LibraryEDW

Information Marts

Splitting the work

rwerschkull

nllinkedincominrogierwerschkull

Persitent

Staging

-

Concept

Context

Connector

Business

Concept

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

Identify source event stream Primary or Unique KeyUse source metadata for this

Automate the building of a PS lsquoaround this keyrsquo Take all columns

Historize using SCD-2 approach

Persistent Staging - how

rwerschkull

nllinkedincominrogierwerschkull

Entity levelUnique key

Functional Description

Delivering party

Owner Responsible

MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)

hellip

Persistent Staging Metadata-1

rwerschkull

nllinkedincominrogierwerschkull

Column level [Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table file level (lowest possible)

Load End Date Timestamp possible but difficulthellipRequires updates

Persistent Staging Metadata-2

rwerschkull

nllinkedincominrogierwerschkull

ACID is possible in HIVE

ACID Makes Updates possibleBy registering updates as lsquonew datarsquo

Reconciliation compacting when idle at user command

Use ORC files

PLUS changing the HIVE configurationhellip

UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)

rwerschkull

nllinkedincominrogierwerschkull

HivePut semi structured data = variable columns in MAP data type

OR use Data storage type that supports schema-evolutionAVRO (ORC in development)

Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo

Schema can be different for every row

What about SEMI-STRUCTURED Data

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

3C - how

rwerschkull

nllinkedincominrogierwerschkull

Always starts with Conceptual data modeling

NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic

No Link Satellites

No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo

Explicit Helper entities

Like Data Vault(2) BUT

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

a UNIQUE Domain specific point of integration

hellipa business entity

hellipwithin itrsquos own domain

hellipdoes not necessarily need to be Enterprise Wide

Business Concept (BC)

rwerschkull

nllinkedincominrogierwerschkull

Why not lsquoenterprise widersquoCompany

Customer

Sales Customer

International

Sales Customer

Local

Sales Customer

Marketing Customer

Customer hellip

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

Most easy entity to be virtualised(if performance allows)

No Hashing amp No surrogateBUSINESSKEYS

(not by default at least)

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Containts Context about a Concept In a historical way

hellipLike a Data Vault Satellite

Every CC belongs to only one BC

Seperate entity per source system table stream

Concept Context (CC)

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts (one per source table):
[valuation] Source NSS2 table p
[description] Source NSS1 table x
[adres] Source NSS1 table y
[description] Source NTR table q
[ovchip_personal] Source NSR table r
[ovchip_on-usage] Source NSR table s
[personal_details] Source NSR table t
[adres_data] Source NSR table t

Example data: NSR-Station, [adres] Source NSS1 table y

BK_NSR-Station | Postadres_postcode | GPS | … | META_source | META_Load_dts | META_Load_end_dts | META_Deleted_Ind
Ut | 3500GJ | 5208954 511064 | … | NSS1_y | 5-6-2013 22:00:00 | 5-6-2015 21:59:59 | 0
Ut | 3511 CE | 5237269 489299 | … | NSS1_y | 5-6-2015 22:00:00 | 31-12-9999 00:00:00 | 0
Asd | 1012 AB | 5237269 489299 | … | NSS1_y | 5-6-2013 22:00:00 | 31-12-9999 00:00:00 | 0
Asa | … | … | … | … | … | …
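To show how the load and load-end timestamps give a Concept Context its history, here is a small sketch of an 'as of' lookup over the NSR-Station [adres] rows. The function name `context_as_of` and the in-memory list of dicts are assumptions made for illustration; a real implementation would query the stored or virtualised entity.

```python
from datetime import datetime

END_OF_TIME = datetime(9999, 12, 31)

# Two versions of the Ut row, taken from the example data (GPS columns left out).
cc_station_adres = [
    {"BK_NSR-Station": "Ut", "Postadres_postcode": "3500GJ",
     "META_Load_dts": datetime(2013, 6, 5, 22),
     "META_Load_end_dts": datetime(2015, 6, 5, 21, 59, 59),
     "META_Deleted_Ind": 0},
    {"BK_NSR-Station": "Ut", "Postadres_postcode": "3511 CE",
     "META_Load_dts": datetime(2015, 6, 5, 22),
     "META_Load_end_dts": END_OF_TIME,
     "META_Deleted_Ind": 0},
]


def context_as_of(rows, business_key, moment):
    """Return the context row that was valid at `moment`, or None."""
    for row in rows:
        if row["BK_NSR-Station"] != business_key or row["META_Deleted_Ind"]:
            continue
        if row["META_Load_dts"] <= moment <= row["META_Load_end_dts"]:
            return row
    return None


old = context_as_of(cc_station_adres, "Ut", datetime(2014, 1, 1))
new = context_as_of(cc_station_adres, "Ut", datetime(2016, 1, 1))
print(old["Postadres_postcode"], new["Postadres_postcode"])
```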

CC important notes

More difficult to virtualise: depends on the semantic gap with the source

But do make it virtual when 'streaming data' is necessary

Because we have the PS layer, exposing all columns is not necessary

Refactoring is easier …

3C - Details: Connector (C)

Relations between Concepts + Context, in a historical way

… a merger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time

Connector Driving key

Explicitly defining a driving key as metadata …

Gives business understanding

Makes it possible for a Connector to handle delta data deliveries correctly …
• so that a change (on the driving key) is not registered as a new 'connection'
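The role of the driving key in delta handling can be sketched as follows. This is an illustration under assumptions, not the author's implementation: the function `apply_delta`, the in-memory list of dicts, and the scenario in which the destination station arrives in a later delivery are all hypothetical.

```python
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31)


def apply_delta(connector_rows, delta_row, driving_key, load_dts):
    """Apply one delta row to a Connector; return 'new', 'changed' or 'unchanged'."""
    key = tuple(delta_row[col] for col in driving_key)
    for row in connector_rows:
        same_key = tuple(row[col] for col in driving_key) == key
        if same_key and row["META_Load_end_dts"] == END_OF_TIME:
            payload = {k: v for k, v in row.items() if not k.startswith("META_")}
            if payload == delta_row:
                return "unchanged"
            # Same driving key, different content: close the open version
            # and add a new version of the SAME connection.
            row["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
            connector_rows.append({**delta_row, "META_Load_dts": load_dts,
                                   "META_Load_end_dts": END_OF_TIME})
            return "changed"
    # No open row for this driving key: a genuinely new connection.
    connector_rows.append({**delta_row, "META_Load_dts": load_dts,
                           "META_Load_end_dts": END_OF_TIME})
    return "new"


DRIVING_KEY = ("BK_NS-Travelcard", "Checkin timestamp")
travelmovement = []
checkin = {"BK_NS-Travelcard": "3528 0234 2073 1234",
           "Checkin timestamp": datetime(2016, 4, 5, 8, 49, 32),
           "BK_NSR-Station-from": "Asd", "BK_NSR-Station-to": None}
first = apply_delta(travelmovement, checkin, DRIVING_KEY, datetime(2016, 4, 5, 22))
second = apply_delta(travelmovement, {**checkin, "BK_NSR-Station-to": "Ut"},
                     DRIVING_KEY, datetime(2016, 4, 6, 22))
print(first, second, len(travelmovement))
```

Without the driving key, the second delivery would look like a different combination of keys and would be stored as a second travel movement instead of a new version of the first.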

Connector Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] at table or file level (the lowest level possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag], OR register a delete as a new record

Example

[Diagram: Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts, plus the Connector NSR-Travelmovement (Checkin timestamp, from, to)]

Driving Key: NS-Travelcard + Checkin timestamp

Example data: Connector NSR-Travelmovement (between NSR-Station from / to, NS-Trainseries and NS-Travelcard)

BK_NSR-Station-from | BK_NSR-Station-to | BK_NS_Trainseries | BK_NS-Travelcard | Checkin timestamp | Checkout timestamp | META_Load_dts | META_Load_end_dts
Asd | Ut | IC 855 | 3528 0234 2073 1234 | 5-4-2016 8:49:32 | 5-4-2016 9:40:12 | 6-4-2016 22:00:00 | 31-12-9999 00:00:00
Ut | Asd | IC 855 | 3528 0234 2073 1234 | 5-4-2016 18:10:09 | 5-4-2016 18:55:20 | 6-4-2016 22:00:00 | 31-12-9999 00:00:00

Are we there yet? No

We add two mandatory HELPER entities

3C - Details: Business Alias

To help switch from sources that are tied together by technical (surrogate) keys …

… to a Business Key based model

It's a LOOKUP table that translates the technical key to the Business Key

Example

NSR-Station
[valuation] Source NSS2 table p
[description] Source NSS1 table x
[adres] Source NSS1 table y

BA-NSS1: key lookup for NSS1 source tables

BA-NSS2: key lookup for NSS2 source tables

Example data

BA-NSS1: key lookup for NSS1 source tables

Business Key | NSS1_Surrogate_key
Ut | 123522
Asd | 666323
Asa | 222443
… | …

NSR-Station

Business Key | META_Source | META_Load_dts
Ut | NSS1_y | 5-6-2015 22:00:00
Asd | NSS1_y | 5-6-2015 22:00:00
Asa | NSS2_p | 6-6-2015 22:00:00
… | … | …

BA important details

Has a 1 : (0..1) relation with a Business Concept

More difficult to virtualise; the lookup table should be kept small

Therefore DO NOT do the key lookup in the Concept Context entity

Load / generate together with the BC

Preferably 'in memory' somehow …
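As an illustration of the Business Alias as a small in-memory lookup, the sketch below translates surrogate keys from a source delivery into Business Keys. The dictionary holds the BA-NSS1 example values; the function `translate_rows` and the source column names `station_id` and `postcode` are hypothetical.

```python
# Business Alias BA-NSS1: NSS1 surrogate key -> Business Key (example values).
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def translate_rows(alias, source_rows, surrogate_column, bk_column):
    """Swap the technical key for the Business Key; collect rows that cannot be resolved."""
    translated, unresolved = [], []
    for row in source_rows:
        business_key = alias.get(row[surrogate_column])
        if business_key is None:
            unresolved.append(row)
            continue
        new_row = {k: v for k, v in row.items() if k != surrogate_column}
        new_row[bk_column] = business_key
        translated.append(new_row)
    return translated, unresolved


delivery = [
    {"station_id": 123522, "postcode": "3500GJ"},
    {"station_id": 999999, "postcode": "0000 XX"},  # unknown surrogate key
]
resolved, rejected = translate_rows(BA_NSS1, delivery, "station_id", "BK_NSR-Station")
print(resolved)
print(rejected)
```

Keeping the alias as a plain dictionary reflects the 'small and in memory' note; how unresolved rows should be handled is left open here.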

3C - Details: BC-Timeline

Integrates the validity timelines of the Concept Contexts belonging to a Business Concept

Like a Data Vault Point-in-time construct

But mandatory

And with a clearly defined and performant approach

Example

NSR-Station
[valuation] Source NSS2 table p
[adres] Source NSS1 table y

[valuation] Source NSS2 table p

BK_NSR-Station | WOZ value | Value rating agency X | META_Laad_dts | META_Laad_eind_dts
Ut | 20 million | 18 million | 1-1-2014 22:00:00 | 1-1-2015 21:59:59
Ut | 22 million | 18 million | 1-1-2015 22:00:00 | 1-3-2016 21:59:59
Ut | 22 million | 23 million | 1-3-2016 22:00:00 | 31-12-9999 00:00:00

BCT

BK_NSR-Station | Combined_Load_dts | Combined_Load_end_dts
Ut | 5-6-2013 22:00:00 | 1-1-2014 21:59:59
Ut | 1-1-2014 22:00:00 | 1-1-2015 21:59:59
Ut | 1-1-2015 22:00:00 | 4-7-2015 21:59:59
Ut | 4-7-2015 22:00:00 | 1-3-2016 21:59:59
Ut | 1-3-2016 22:00:00 | 31-12-9999 00:00:00
Asd | 5-6-2013 22:00:00 | 31-12-9999 00:00:00

[adres] Source NSS1 table y

BK_NSR-Station | Postadres_postcode | GPS | … | META_source | META_Load_dts | META_Load_end_dts
Ut | 3500GJ | 5208954 511064 | … | NSS1_y | 5-6-2013 22:00:00 | 4-7-2015 21:59:59
Ut | 3511 CE | 5237269 489299 | … | NSS1_y | 4-7-2015 22:00:00 | 31-12-9999 00:00:00
Asd | 1012 AB | 5237269 489299 | … | NSS1_y | 5-6-2013 22:00:00 | 31-12-9999 00:00:00
Asa | … | … | … | … | … | …
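One way the combined timeline in the BCT example could be derived is sketched below: collect every load timestamp of every Concept Context per Business Key, sort them, and let each slice end one second before the next one starts. The function name `build_bc_timeline` is an assumption, and the sketch assumes each context's own timeline has no gaps.

```python
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31)


def build_bc_timeline(*contexts):
    """Merge (business_key, load_dts) pairs from several contexts into one timeline."""
    starts = {}
    for context in contexts:
        for business_key, load_dts in context:
            starts.setdefault(business_key, set()).add(load_dts)
    timeline = []
    for business_key, stamps in starts.items():
        ordered = sorted(stamps)
        for position, start in enumerate(ordered):
            if position + 1 < len(ordered):
                end = ordered[position + 1] - timedelta(seconds=1)
            else:
                end = END_OF_TIME
            timeline.append((business_key, start, end))
    return timeline


# Load timestamps from the example (day-month-year in the slides).
adres = [("Ut", datetime(2013, 6, 5, 22)), ("Ut", datetime(2015, 7, 4, 22)),
         ("Asd", datetime(2013, 6, 5, 22))]
valuation = [("Ut", datetime(2014, 1, 1, 22)), ("Ut", datetime(2015, 1, 1, 22)),
             ("Ut", datetime(2016, 3, 1, 22))]

bct = build_bc_timeline(adres, valuation)
for row in bct:
    print(row)
```

With the example timestamps this yields five slices for Ut and one for Asd, the same boundaries as in the BCT table.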

What makes PS-3C a different Ensemble?

1) Explicitly splitting the work

Data + History

Subjects + Integration

2) NO HASHED BUSINESS KEYS or surrogate keys

Only concatenated ones

httpwwwcannabisculturecomfilesimages6hashbrickJPG

3) Fewer joins

Relation + Technical validity timeline + Relation context

Together in one entity

4) Explicit HELPER entities

Business Alias

Business Concept Timeline

+ explicitly define Driving key(s)

Hope to AVOID this …

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me

'Head of BI', Spilgames

Certified Data Vault modeler since 2009

Contact details: nl.linkedin.com/in/rogierwerschkull

rogierwerschkull@gmail.com

rwerschkull

Page 27: PS-3C Data Modelling Zone Berlin

Then some inspiration

http://roelantvos.com/blog/?p=1119

'In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse'

'The Historical Staging Area effectively 'acts' as Data Lake but in a better defined form as data deltas and event datetimes are taken into account'

So

EDW = Subject Oriented + Integrated + Time Variant + Non-Volatile

EDW = (Time Variant & Non Volatile) + (Subject Oriented & Integrated)

Time Variant & Non Volatile: could be a Data LAKE

Subject Oriented & Integrated: virtualised Ensemble tier

EDW

How does PS-3C work?

Focus of current ensemble EDWs

[Diagram: Staging Area, EDW, Information Marts]

Splitting the work

Persistent Staging Area (HSA) = Data Library

EDW

Information Marts

Persistent Staging - Concept Context, Connector, Business Concept

Persistent Staging - how

Identify the Primary or Unique Key of the source / event stream; use source metadata for this

Automate the building of a PS 'around this key'; take all columns

Historize using an SCD-2 approach
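The three steps can be sketched as a small SCD-2 loader built around the source key. This is a simplified illustration under assumptions: the function `historize`, the in-memory list, the column names `station` and `postcode`, and the full-snapshot delivery are hypothetical, and delete detection is left out.

```python
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31)


def historize(ps_rows, snapshot, key_columns, load_dts):
    """Load one source delivery into a Persistent Staging table, SCD-2 style."""
    def key_of(row):
        return tuple(row[col] for col in key_columns)

    def payload(row):
        return {k: v for k, v in row.items() if not k.startswith("META_")}

    # Index the currently open version of every key.
    open_rows = {key_of(r): r for r in ps_rows
                 if r["META_Load_end_dts"] == END_OF_TIME}
    for source_row in snapshot:
        current = open_rows.get(key_of(source_row))
        if current is not None and payload(current) == source_row:
            continue  # nothing changed for this key
        if current is not None:
            # Close the old version one second before the new one starts.
            current["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
        # Take all source columns and add the load metadata.
        ps_rows.append({**source_row, "META_Load_dts": load_dts,
                        "META_Load_end_dts": END_OF_TIME})
    return ps_rows


ps_station = []
historize(ps_station,
          [{"station": "Ut", "postcode": "3500GJ"},
           {"station": "Asd", "postcode": "1012 AB"}],
          ("station",), datetime(2013, 6, 5, 22))
historize(ps_station,
          [{"station": "Ut", "postcode": "3511 CE"},
           {"station": "Asd", "postcode": "1012 AB"}],
          ("station",), datetime(2015, 6, 5, 22))
print(len(ps_station))
```

Closing the old version is an update, which is the part that needs extra care on append-only storage.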

Persistent Staging Metadata-1

Entity level:
Unique key
Functional Description
Delivering party
Owner / Responsible
MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation, …)
…

Persistent Staging Metadata-2

Column level:
[Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag], OR register a delete as a new record
[Source system] at table or file level (the lowest level possible)

Load End Date Timestamp is possible but difficult … it requires updates

UPDATES IN HIVE (isn't HDFS APPEND ONLY?)

ACID is possible in HIVE

ACID makes updates possible by registering updates as 'new data'

Reconciliation / compaction when idle or at user command

Use ORC files

PLUS changing the HIVE configuration …

What about SEMI-STRUCTURED data?

Hive: put semi-structured data (= variable columns) in a MAP data type

OR use a data storage type that supports schema evolution: AVRO (ORC in development)

Or HBASE … it only has one data type (byte); the schema is 'applied'

The schema can be different for every row

3C - how

Like Data Vault (2), BUT

Always starts with conceptual data modelling

NOT the primary location of Data & History; virtualised (only if performance allows); should be deterministic

No Link Satellites

No Surrogate or Hash Keys, only 'Concatenated Natural Business Keys'

Explicit helper entities

3C - Details: Business Concept (BC)

A UNIQUE, domain-specific point of integration

… a business entity

… within its own domain

… does not necessarily need to be enterprise wide

Why not 'enterprise wide'?

Company Customer

Sales Customer

International Sales Customer

Local Sales Customer

Marketing Customer

Customer …


Page 28: PS-3C Data Modelling Zone Berlin

lsquoIn my opinion the answer lies in the adoption of the

persistent (Historical) Staging Area concept

(also known as Historical Staging or the History Area)

This basically adopts the fundamentals of a Data Warehousersquo

lsquoThe Historical Staging Area effectively lsquoactsrsquo as

Data Lake but in a better defined form as data deltas and

event datetimes are taken into accountrsquo

So

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

Subject Oriented Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Could be a

Data LAKE

VirtualisedEnsemble

Tier

EDW

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

rwerschkull

nllinkedincominrogierwerschkull

How

Does PS-3C Work

rwerschkull

nllinkedincominrogierwerschkullPhoto credit Public Domain

StagingArea

EDWInformation

Marts

Focus of Current ensemble EDWrsquos

rwerschkull

nllinkedincominrogierwerschkull

Persistent StagingArea HSA =

Data LibraryEDW

Information Marts

Splitting the work

rwerschkull

nllinkedincominrogierwerschkull

Persitent

Staging

-

Concept

Context

Connector

Business

Concept

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

Identify source event stream Primary or Unique KeyUse source metadata for this

Automate the building of a PS lsquoaround this keyrsquo Take all columns

Historize using SCD-2 approach

Persistent Staging - how

rwerschkull

nllinkedincominrogierwerschkull

Entity levelUnique key

Functional Description

Delivering party

Owner Responsible

MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)

hellip

Persistent Staging Metadata-1

rwerschkull

nllinkedincominrogierwerschkull

Column level [Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table file level (lowest possible)

Load End Date Timestamp possible but difficulthellipRequires updates

Persistent Staging Metadata-2

rwerschkull

nllinkedincominrogierwerschkull

ACID is possible in HIVE

ACID Makes Updates possibleBy registering updates as lsquonew datarsquo

Reconciliation compacting when idle at user command

Use ORC files

PLUS changing the HIVE configurationhellip

UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)

rwerschkull

nllinkedincominrogierwerschkull

HivePut semi structured data = variable columns in MAP data type

OR use Data storage type that supports schema-evolutionAVRO (ORC in development)

Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo

Schema can be different for every row

What about SEMI-STRUCTURED Data

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

3C - how

rwerschkull

nllinkedincominrogierwerschkull

Always starts with Conceptual data modeling

NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic

No Link Satellites

No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo

Explicit Helper entities

Like Data Vault(2) BUT

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

a UNIQUE Domain specific point of integration

hellipa business entity

hellipwithin itrsquos own domain

hellipdoes not necessarily need to be Enterprise Wide

Business Concept (BC)

rwerschkull

nllinkedincominrogierwerschkull

Why not lsquoenterprise widersquoCompany

Customer

Sales Customer

International

Sales Customer

Local

Sales Customer

Marketing Customer

Customer hellip

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

Most easy entity to be virtualised(if performance allows)

No Hashing amp No surrogateBUSINESSKEYS

(not by default at least)

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Containts Context about a Concept In a historical way

hellipLike a Data Vault Satellite

Every CC belongs to only one BC

Seperate entity per source system table stream

Concept Context (CC)

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

Example-dataBK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts META_Deleted_Ind

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0

Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0

Asa hellip hellip hellip hellip hellip hellip

NSR-Station

[adres]Source NSS1 table y

More difficult to be virtualisedDepends on semantic gap with source

But do make virtual when lsquostreaming datarsquo is necessary

Because we have PS layerExposing all columns not necessary

Refactoring is more easyhellip

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Relations between Concepts + Context

In a historical way

hellipMerger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point

in time

Connector (C)

rwerschkull

nllinkedincominrogierwerschkull

Explicitly defining a driving key as metadatahellip

Gives business understanding

Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)

bull is not registered as a new lsquoconnectionrsquo

Connector Driving key

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Connector Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

Driving Key

NS-Travelcard

+Checkin timestamp

Example-Data

NSR-StationNS-

Travelcard

NS-

Trainseries

rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

BK_NSR-

Station-from

BK_NSR-

Station-to

BK_NS_

Trainseries

BK_NS-Travelcard Checkin

timestamp

Checkout

timestamp

META_Load_dts META_Load_end_dts

Asd Ut IC 855 3528 0234 2073

1234

5-4-2016 84932 5-4-2016 94012 6-4-2016

220000

31-12-9999

000000

Ut Asd IC 855 3528 0234 2073

1234

5-4-2016 181009 5-4-2016 185520 6-4-2016

220000

31-12-9999

000000

We add Two mandatory

HELPEREntities

Are we there yet No


Page 29: PS-3C Data Modelling Zone Berlin

So

Time Variant & Non Volatile

Subject Oriented & Integrated

EDW = Subject Oriented, Integrated, Time Variant, Non-Volatile

Could be a Data Lake + a Virtualised Ensemble Tier

EDW = (Time Variant & Non Volatile) + (Subject Oriented & Integrated)

How does PS-3C work?

Focus of current ensemble EDWs

Staging Area -> EDW -> Information Marts

Splitting the work

Persistent Staging Area / HSA (= Data Library) -> EDW -> Information Marts

Persistent Staging - 3C (Business Concept, Concept Context, Connector) modelling

Persistent Staging - how

Identify the Primary or Unique Key of the source / event stream. Use source metadata for this.

Automate the building of a PS 'around this key'. Take all columns.

Historize using an SCD-2 approach.
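The SCD-2 historization step can be sketched roughly as follows. This is only an illustration under assumptions, not the author's loader: the function name `historize`, the row layout (`data`, `load_dts`, `load_end_dts`) and the `station` / `postcode` columns are made up for the example, the open end date of 31-12-9999 and the "end = next start minus one second" convention are taken from the example tables in this deck, and deletes are not handled.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # assumed "still valid" marker, as in the deck's examples


def historize(history, key_cols, incoming, load_dts):
    """Apply one source delivery to an SCD-2 history list (modified in place)."""
    def key_of(row):
        return tuple(row[c] for c in key_cols)

    # Index the currently valid version of every key.
    current = {key_of(r["data"]): r for r in history if r["load_end_dts"] == OPEN_END}
    for row in incoming:
        cur = current.get(key_of(row))
        if cur is not None and cur["data"] == row:
            continue  # unchanged: nothing to record
        if cur is not None:
            # Close the previous version one second before the new one starts.
            cur["load_end_dts"] = load_dts - timedelta(seconds=1)
        version = {"data": dict(row), "load_dts": load_dts, "load_end_dts": OPEN_END}
        history.append(version)
        current[key_of(row)] = version
    return history


# Hypothetical deliveries for one station key.
history = []
historize(history, ["station"], [{"station": "Ut", "postcode": "3500GJ"}], datetime(2013, 6, 5, 22))
historize(history, ["station"], [{"station": "Ut", "postcode": "3511 CE"}], datetime(2015, 7, 4, 22))
historize(history, ["station"], [{"station": "Ut", "postcode": "3511 CE"}], datetime(2015, 7, 5, 22))
```

The third delivery repeats the second one unchanged, so it adds no new version; only real changes create history rows.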

Persistent Staging Metadata-1

Entity level:

Unique key

Functional Description

Delivering party

Owner / Responsible

MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation, ...)

...

Persistent Staging Metadata-2

Column level:

[Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table / file level (lowest possible)

Load End Date Timestamp: possible but difficult... Requires updates.

UPDATES IN HIVE (isn't HDFS append only?)

ACID is possible in Hive

ACID makes updates possible by registering updates as 'new data'

Reconciliation / compacting when idle / at user command

Use ORC files

PLUS changing the Hive configuration...

What about SEMI-STRUCTURED data?

Hive: put semi-structured data (= variable columns) in a MAP data type

OR use a data storage type that supports schema evolution: AVRO (ORC in development)

Or HBase... It only has one data type (byte); the schema is 'applied'

Schema can be different for every row

3C modelling - how

Like Data Vault (2), BUT:

Always starts with conceptual data modelling

NOT the primary location of data & history. Virtualised (only if performance allows). Should be deterministic

No Link Satellites

No surrogate or hash keys, only 'Concatenated Natural Business Keys'

Explicit helper entities
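A 'Concatenated Natural Business Key' could be built along these lines. The sketch is an assumption for illustration: the helper name `business_key` is invented, the `|` separator is borrowed from the train series keys in the example data (such as IC|855), and the trimming and separator check are choices made here to keep the result deterministic, not rules stated in the deck.

```python
def business_key(*parts, sep="|"):
    """Concatenate natural key parts into one deterministic business key string."""
    cleaned = []
    for part in parts:
        text = "" if part is None else str(part).strip()
        if sep in text:
            # A separator inside a part would make two different keys look identical.
            raise ValueError(f"key part {text!r} contains the separator {sep!r}")
        cleaned.append(text)
    return sep.join(cleaned)


key = business_key("IC", 855)
padded = business_key(" Sp ", 7455)
```

Because the key is a plain function of the source values, it can be recomputed at any time from Persistent Staging, which fits the requirement that the 3C layer be deterministic.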

3C - Details: Business Concept (BC)

[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; Connector; Business Alias A; Business Alias B; BC-Timeline X]

A UNIQUE, domain-specific point of integration

... a business entity

... within its own domain

... does not necessarily need to be Enterprise Wide

Why not 'enterprise wide'?

Company Customer

Sales Customer

International Sales Customer

Local Sales Customer

Marketing Customer

Customer ...

Business Concept Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Example data

NSR-Station

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
...           ...          ...

NS-Trainseries

Business Key
IC|855
IC|8852
Sp|7455
St|16050
...

NS-Travelcard

Business Key
3528 0234 2073 1234
3528 0234 2073 5678
...

NS-Traveller

Business Key
CRM-RW123456
CRM-LAS224466
...

BC important notes

Easiest entity to virtualise (if performance allows)

No hashing & no surrogate BUSINESS KEYS (not by default at least)

3C - Details: Concept Context (CC)

Contains context about a Concept, in a historical way

... like a Data Vault Satellite

Every CC belongs to only one BC

Separate entity per source system table / stream

Concept Context Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:

• [Load End Date Timestamp]

• [Deleted Flag] OR register a delete as a new record

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts:

[valuation] Source NSS2 table p

[description] Source NSS1 table x

[adres] Source NSS1 table y

[description] Source NTR table q

[ovchip_personal] Source NSR table r

[ovchip_on-usage] Source NSR table s

[personal_details] Source NSR table t

[adres_data] Source NSR table t

Example data: NSR-Station, Concept Context [adres], source NSS1 table y

BK_NSR-Station  Postadres_postcode  GPS                ...  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954, 5.11064  ...  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269, 4.89299  ...  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269, 4.89299  ...  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             ...                 ...                ...  ...          ...                ...

CC important notes

More difficult to virtualise: depends on the semantic gap with the source

But do make it virtual when 'streaming data' is necessary

Because we have the PS layer, exposing all columns is not necessary

Refactoring is easier...

3C - Details: Connector (C)

Relations between Concepts + Context, in a historical way

... merger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point in time

Connector Driving key

Explicitly defining a driving key as metadata...

Gives business understanding

Makes it possible for a Connector to correctly handle delta data deliveries, so that a change (on the driving key) is not registered as a new 'connection'
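How a driving key lets a Connector absorb delta deliveries might look roughly like this. It is a sketch under assumptions rather than the deck's loading logic: the function `load_connector`, the underscore-style column names and the two-step scenario (a check-in delivered first, the destination station delivered a day later) are hypothetical; the driving key itself (travel card + check-in timestamp) and the end-dating convention follow the deck's examples.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # assumed "still valid" marker


def load_connector(rows, driving_key, delta, load_dts):
    """Load a delta into a Connector; a change on an existing driving key
    closes the old version instead of creating a second open connection."""
    def dk(row):
        return tuple(row[c] for c in driving_key)

    open_rows = {dk(r): r for r in rows if r["META_Load_end_dts"] == OPEN_END}
    for new in delta:
        cur = open_rows.get(dk(new))
        if cur is not None:
            if all(cur.get(k) == v for k, v in new.items()):
                continue  # same connection, nothing changed
            cur["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
        version = dict(new, META_Load_dts=load_dts, META_Load_end_dts=OPEN_END)
        rows.append(version)
        open_rows[dk(new)] = version
    return rows


DRIVING_KEY = ["BK_NS_Travelcard", "Checkin_timestamp"]
trip = {"BK_NS_Travelcard": "3528 0234 2073 1234",
        "Checkin_timestamp": "5-4-2016 8:49:32",
        "BK_Station_from": "Asd", "BK_Station_to": None}

rows = []
load_connector(rows, DRIVING_KEY, [trip], datetime(2016, 4, 5, 22))
load_connector(rows, DRIVING_KEY, [dict(trip, BK_Station_to="Ut")], datetime(2016, 4, 6, 22))
open_now = [r for r in rows if r["META_Load_end_dts"] == OPEN_END]
```

Without the driving key, the second delivery would differ from the first on the full key set and would be stored as a second, parallel travel movement. With it, only one version of the movement is open at any point in time.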

Connector Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:

• [Load End Date Timestamp]

• [Deleted Flag] OR register a delete as a new record

Example

Connector: NSR-Travelmovement (Checkin timestamp, from station, to station)

Driving Key: NS-Travelcard + Checkin timestamp

Example data: Connector NSR-Travelmovement (between NSR-Station, NS-Travelcard and NS-Trainseries)

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No.

We add two mandatory Helper Entities

3C - Details: Business Alias

To help switching from sources that are tied together by technical (surrogate) keys...

to a Business Key based model

It's a LOOKUP table that translates the technical key to the Business Key

Example

NSR-Station

[valuation] Source NSS2 table p

[description] Source NSS1 table x

[adres] Source NSS1 table y

BA-NSS1: key lookup for NSS1 source tables

BA-NSS2: key lookup for NSS2 source tables

Example data

BA-NSS1 (key lookup for NSS1 source tables)

Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
...           ...

NSR-Station

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
...           ...          ...

BA important details

Has a 1-to-(0 or 1) relation with a Business Concept

More difficult to virtualise. The lookup table should be kept small

Therefore DO NOT do the key lookup in the Concept Context entity

Load / generate together with the BC

Preferably 'in memory' somehow...
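The Business Alias translation can be pictured as a small in-memory lookup. The sketch below is an assumption for illustration only: the surrogate-to-business-key pairs come from the BA-NSS1 example, while the function `to_business_key`, the `station_id` column and the sample source rows are invented here. It does not say where in the load chain the lookup should run.

```python
# BA-NSS1 as an in-memory lookup: NSS1 surrogate key -> business key.
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(source_rows, alias, surrogate_col, bk_col="business_key"):
    """Replace a technical key by its business key; keep unknown keys aside."""
    resolved, unresolved = [], []
    for row in source_rows:
        bk = alias.get(row[surrogate_col])
        if bk is None:
            unresolved.append(row)  # no alias yet, e.g. the BC has not been loaded
        else:
            out = {k: v for k, v in row.items() if k != surrogate_col}
            out[bk_col] = bk
            resolved.append(out)
    return resolved, unresolved


# Hypothetical source rows keyed by the NSS1 surrogate key.
source_rows = [{"station_id": 123522, "postcode": "3500GJ"},
               {"station_id": 999999, "postcode": "0000XX"}]
resolved, unresolved = to_business_key(source_rows, BA_NSS1, "station_id")
```

Keeping the alias as a plain dictionary reflects the "keep it small, preferably in memory" note: the lookup only holds one pair per business key and source system.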

3C - Details: BC-Timeline

Integrate the validity timelines of the Concept Contexts belonging to a Business Concept

Like a Data Vault Point-in-time construct

But mandatory

And with a clearly defined and performant approach

Example: BC-Timeline (BCT) for NSR-Station

Concept Context [valuation], source NSS2 table p

BK_NSR-Station  WOZ value   Value Rating agency X  META_Laad_dts      META_Laad_eind_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

Concept Context [adres], source NSS1 table y

BK_NSR-Station  Postadres_postcode  GPS                ...  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954, 5.11064  ...  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269, 4.89299  ...  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269, 4.89299  ...  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             ...                 ...                ...  ...          ...                ...

BCT

BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
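One way to derive the combined timeline from the two Concept Contexts in the example is to collect every load timestamp per business key and cut the timeline at each of them. This is a sketch under assumptions, not the "clearly defined and performant approach" the deck refers to: the function name `bc_timeline` is invented, each context is assumed to have contiguous validity (no gaps, no deletes), and dates are read as day-month-year.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # open end date, as in the example tables


def bc_timeline(*contexts):
    """Combine the load timestamps of several Concept Contexts into one
    validity timeline per business key.

    Each context is a list of (business_key, load_dts) tuples."""
    starts = {}
    for context in contexts:
        for bk, load_dts in context:
            starts.setdefault(bk, set()).add(load_dts)
    timeline = []
    for bk, points in starts.items():
        ordered = sorted(points)
        for i, start in enumerate(ordered):
            # A slice ends one second before the next change point.
            end = ordered[i + 1] - timedelta(seconds=1) if i + 1 < len(ordered) else OPEN_END
            timeline.append((bk, start, end))
    return timeline


valuation = [("Ut", datetime(2014, 1, 1, 22)), ("Ut", datetime(2015, 1, 1, 22)),
             ("Ut", datetime(2016, 3, 1, 22))]
adres = [("Ut", datetime(2013, 6, 5, 22)), ("Ut", datetime(2015, 7, 4, 22)),
         ("Asd", datetime(2013, 6, 5, 22))]
timeline = bc_timeline(valuation, adres)
```

For these inputs the result has five slices for Ut and one for Asd, the same six rows as the BCT table in the example. Each slice can then be joined back to the single valid row of every Concept Context.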

What makes PS-3C a different Ensemble?

1) Explicitly splitting the work

Data + History

Subjects + Integration

2) NO HASHED BUSINESS KEYS or surrogate keys

Only concatenated ones

Photo: httpwwwcannabisculturecomfilesimages6hashbrickJPG

3) Fewer joins

Relation + Technical validity timeline + Relation context

Together in one entity

4) Explicit Helper Entities

Business Alias

Business Component Timeline

+ explicitly define Driving key(s)

Hope to AVOID this...

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me

'Head of BI', Spilgames

Certified Data Vault modeler since 2009

Contact details: nl.linkedin.com/in/rogierwerschkull

rogierwerschkull@gmail.com

rwerschkull


Page 31: PS-3C Data Modelling Zone Berlin

How

Does PS-3C Work

rwerschkull

nllinkedincominrogierwerschkullPhoto credit Public Domain

StagingArea

EDWInformation

Marts

Focus of Current ensemble EDWrsquos

rwerschkull

nllinkedincominrogierwerschkull

Persistent StagingArea HSA =

Data LibraryEDW

Information Marts

Splitting the work

rwerschkull

nllinkedincominrogierwerschkull

Persitent

Staging

-

Concept

Context

Connector

Business

Concept

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

Identify source event stream Primary or Unique KeyUse source metadata for this

Automate the building of a PS lsquoaround this keyrsquo Take all columns

Historize using SCD-2 approach

Persistent Staging - how

rwerschkull

nllinkedincominrogierwerschkull

Entity levelUnique key

Functional Description

Delivering party

Owner Responsible

MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)

hellip

Persistent Staging Metadata-1

rwerschkull

nllinkedincominrogierwerschkull

Column level [Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table file level (lowest possible)

Load End Date Timestamp possible but difficulthellipRequires updates

Persistent Staging Metadata-2

rwerschkull

nllinkedincominrogierwerschkull

ACID is possible in HIVE

ACID Makes Updates possibleBy registering updates as lsquonew datarsquo

Reconciliation compacting when idle at user command

Use ORC files

PLUS changing the HIVE configurationhellip

UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)

rwerschkull

nllinkedincominrogierwerschkull

HivePut semi structured data = variable columns in MAP data type

OR use Data storage type that supports schema-evolutionAVRO (ORC in development)

Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo

Schema can be different for every row

What about SEMI-STRUCTURED Data

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

3C - how

rwerschkull

nllinkedincominrogierwerschkull

Always starts with Conceptual data modeling

NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic

No Link Satellites

No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo

Explicit Helper entities

Like Data Vault(2) BUT

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

a UNIQUE Domain specific point of integration

hellipa business entity

hellipwithin itrsquos own domain

hellipdoes not necessarily need to be Enterprise Wide

Business Concept (BC)

rwerschkull

nllinkedincominrogierwerschkull

Why not lsquoenterprise widersquoCompany

Customer

Sales Customer

International

Sales Customer

Local

Sales Customer

Marketing Customer

Customer hellip

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

BC important notes

Easiest entity to virtualise (if performance allows)

No hashing & no surrogate keys (not by default, at least)

BUSINESS KEYS
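A minimal Python sketch of what a concatenated natural business key could look like, using the `|` delimiter visible in the example keys (`IC|855`). The function names, the trimming of whitespace and the rule that a key part may not contain the delimiter are assumptions made for illustration; they are not part of the PS-3C proposal itself.

```python
DELIMITER = "|"  # delimiter seen in the example keys, e.g. IC|855


def business_key(*parts):
    """Concatenate natural key parts into one business key (no hashing)."""
    cleaned = [str(part).strip() for part in parts]
    for part in cleaned:
        if not part:
            raise ValueError("business key parts must not be empty")
        if DELIMITER in part:
            # Assumption: reject parts that would make the key ambiguous.
            raise ValueError(f"key part {part!r} contains the delimiter")
    return DELIMITER.join(cleaned)


def split_business_key(key):
    """Recover the natural key parts; possible because nothing was hashed."""
    return key.split(DELIMITER)


train_key = business_key("IC", "855")
```

Because the key stays human-readable, it can be split back into its parts, which a hashed key does not allow.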

3C - Details: Concept Context (CC)

[Diagram: 3C overview of Business Concepts, Concept Contexts, Connector, Business Aliases and BC-Timeline]

Contains context about a Concept, in a historical way

…like a Data Vault Satellite

Every CC belongs to only one BC

Separate entity per source system table / stream

Concept Context Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts:
[valuation]         Source NSS2, table p
[description]       Source NSS1, table x
[adres]             Source NSS1, table y
[description]       Source NTR, table q
[ovchip_personal]   Source NSR, table r
[ovchip_on-usage]   Source NSR, table s
[personal_details]  Source NSR, table t
[adres_data]        Source NSR, table t

Example-data: NSR-Station, [adres] (Source NSS1, table y)

BK_NSR-Station  Postadres_postcode  GPS              …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              5208954 511064   …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             5237269 489299   …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             5237269 489299   …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                …  …            …                  …

CC important notes

More difficult to virtualise: it depends on the semantic gap with the source

But do make it virtual when 'streaming data' is necessary

Because we have a PS layer, exposing all columns is not necessary

Refactoring is easier…

3C - Details: Connector (C)

[Diagram: 3C overview of Business Concepts, Concept Contexts, Connector, Business Aliases and BC-Timeline]

Relations between Concepts + Context

In a historical way

…a merger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point in time

Connector Driving key

Explicitly defining a driving key as metadata…

Gives business understanding

Makes it possible for the Connector to correctly handle delta data deliveries…
• so that a change (on the driving key)
• is not registered as a new 'connection'
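The effect of a driving key on delta loads can be sketched in Python as follows. This is an illustration under assumptions, not the author's loader: the column names (`travelcard`, `checkin_ts`, `station_from`, `station_to`), the in-memory list that stands in for the Connector table and the one-second gap between versions are all hypothetical choices.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # open-ended validity
DRIVING_KEY = ("travelcard", "checkin_ts")  # hypothetical column names


def apply_delta(connector, row, load_dts):
    """Load one delivered row into a Connector, using the driving key.

    A row whose driving key matches an open row replaces that row
    (the old version is end-dated) instead of becoming a second,
    parallel connection.
    """
    driving = tuple(row[col] for col in DRIVING_KEY)
    for existing in connector:
        same_driver = tuple(existing["data"][col] for col in DRIVING_KEY) == driving
        if same_driver and existing["load_end_dts"] == HIGH_DATE:
            if existing["data"] == row:
                return  # nothing changed
            existing["load_end_dts"] = load_dts - timedelta(seconds=1)
    connector.append({"data": dict(row), "load_dts": load_dts, "load_end_dts": HIGH_DATE})


connector = []
trip = {"travelcard": "3528 0234 2073 1234", "checkin_ts": "2016-04-05 08:49:32",
        "station_from": "Asd", "station_to": None}
apply_delta(connector, trip, datetime(2016, 4, 5, 22))
# A later delivery fills in the destination for the same card and check-in.
apply_delta(connector, {**trip, "station_to": "Ut"}, datetime(2016, 4, 6, 22))
open_rows = [r for r in connector if r["load_end_dts"] == HIGH_DATE]
```

Without the driving key, the second delivery would look like a brand-new relation; with it, the loader knows which earlier row it supersedes.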

Connector Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

[Diagram: the Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts, plus the Connector NSR-Travelmovement]

Connector NSR-Travelmovement: Checkin timestamp, from, to

Driving Key: NS-Travelcard + Checkin timestamp

Example-Data: Connector NSR-Travelmovement (NSR-Station from / to, NS-Travelcard, NS-Trainseries)

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No

We add two mandatory HELPER entities

modelling

3C - Details: Business Alias

[Diagram: 3C overview of Business Concepts, Concept Contexts, Connector, Business Aliases and BC-Timeline]

Helps switching from sources that are tied together by technical (surrogate) keys…

…to a Business Key based model

It's a LOOKUP table that translates the technical key to the Business Key

Example

NSR-Station
[valuation]    Source NSS2, table p
[description]  Source NSS1, table x
[adres]        Source NSS1, table y

BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables

Example-data

BA-NSS1 (key lookup for NSS1 source tables)

Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

BA important details

Has a 1-to-(0..1) relation with a Business Concept

More difficult to virtualise; the lookup table should be kept small

Therefore DO NOT do the key lookup in the Concept Context entity

Load / generate it together with the BC

Preferably 'in memory' somehow…
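As a rough Python illustration of the lookup role of a Business Alias: a small in-memory mapping translates the source's surrogate key to the Business Key while rows are being loaded, so that downstream entities only carry the Business Key. The mapping values come from the BA-NSS1 example data; the function names and the `station_id` column are hypothetical.

```python
# Business Alias BA-NSS1: surrogate key -> Business Key (values from the example data).
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(alias, surrogate_key):
    """Translate a source surrogate key into the Business Key."""
    try:
        return alias[surrogate_key]
    except KeyError:
        raise LookupError(
            f"no Business Key registered for surrogate key {surrogate_key}"
        ) from None


def rekey(rows, alias, surrogate_column):
    """Return copies of source rows keyed by Business Key instead of the surrogate."""
    result = []
    for row in rows:
        out = {k: v for k, v in row.items() if k != surrogate_column}
        out["business_key"] = to_business_key(alias, row[surrogate_column])
        result.append(out)
    return result


# 'station_id' is a hypothetical name for the surrogate key column in NSS1.
source_rows = [{"station_id": 123522, "postcode": "3500GJ"}]
rekeyed = rekey(source_rows, BA_NSS1, "station_id")
```

Keeping the alias as a plain dictionary mirrors the wish to hold the lookup small and 'in memory'.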

3C - Details: BC-Timeline

[Diagram: 3C overview of Business Concepts, Concept Contexts, Connector, Business Aliases and BC-Timeline]

Integrates the validity timelines of the Concept Contexts belonging to a Business Concept

Like a Data Vault Point-in-time construct

But mandatory

And with a clearly defined and performant approach

Example: BC-Timeline (BCT) for NSR-Station

[valuation] Source NSS2, table p

BK_NSR-Station  WOZ value   Value rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

[adres] Source NSS1, table y

BK_NSR-Station  Postadres_postcode  GPS              …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              5208954 511064   …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             5237269 489299   …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             5237269 489299   …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                …  …            …                  …

BCT

BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
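One way to derive such a combined timeline, sketched in Python for a single Business Key: collect the Load Date Timestamps of all Concept Contexts, sort them, and end each interval one second before the next one starts. The function name and the one-second convention (read off the example data) are assumptions; the slides do not prescribe an algorithm.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # open-ended validity, as in the example data


def combine_timelines(*context_start_dates):
    """Merge the load timestamps of several Concept Contexts into one timeline.

    Each argument is an iterable of Load Date Timestamps for one Business Key.
    Returns (combined_load_dts, combined_load_end_dts) pairs.
    """
    starts = sorted({dts for dates in context_start_dates for dts in dates})
    ends = [nxt - timedelta(seconds=1) for nxt in starts[1:]] + [HIGH_DATE]
    return list(zip(starts, ends))


# Load Date Timestamps for station 'Ut', taken from the two example contexts.
adres = [datetime(2013, 6, 5, 22), datetime(2015, 7, 4, 22)]
valuation = [datetime(2014, 1, 1, 22), datetime(2015, 1, 1, 22), datetime(2016, 3, 1, 22)]
timeline_ut = combine_timelines(adres, valuation)
```

For station Ut this yields five intervals, starting 5-6-2013 and ending with the open-ended 31-12-9999, in line with the BCT rows shown in the example.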

modelling

What makes PS-3C a different ensemble?

[Diagram: 3C overview of Business Concepts, Concept Contexts, Connector, Business Aliases and BC-Timeline]

1) Explicitly splitting the work

Data + History

Subjects + Integration

2) NO HASHED BUSINESS KEYS or surrogate keys

Only concatenated ones

Photo: httpwwwcannabisculturecomfilesimages6hashbrickJPG

3) Fewer joins

Relation + Technical validity timeline + Relation context

Together in one entity

Photo credit: Public Domain

4) Explicit HELPER entities

Business Alias

Business Concept Timeline

+ explicitly define Driving key(s)

Photo credit: Public Domain

Hope to AVOID this… http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me

'Head of BI', Spil Games

Certified Data Vault modeler since 2009

Contact details: nl.linkedin.com/in/rogierwerschkull

rogierwerschkullgmailcom

rwerschkull

Page 32: PS-3C Data Modelling Zone Berlin

Focus of current ensemble EDWs

[Diagram: Staging Area; EDW; Information Marts]

Splitting the work

[Diagram: Persistent Staging Area / HSA = Data Library; EDW; Information Marts]

Persistent Staging - Concept Context, Connector, Business Concept

modelling

Persistent Staging - how

Identify the source / event stream Primary or Unique Key
Use source metadata for this

Automate the building of a PS 'around this key'
Take all columns

Historize using an SCD-2 approach
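The SCD-2 step can be sketched in Python as below. This is a simplified illustration, not the author's automation: the history is an in-memory list, `key` stands for the source Primary or Unique Key, and the attribute name `postcode` and the one-second gap between versions are assumptions modelled on the example data elsewhere in the deck.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # open-ended validity


def historize(history, key, attributes, load_dts):
    """Apply one delivered source row to an SCD-2 history (a list of dicts).

    Returns True when a new version was written, False when nothing changed.
    """
    current = next(
        (r for r in history if r["key"] == key and r["load_end_dts"] == HIGH_DATE),
        None,
    )
    if current is not None:
        if current["attributes"] == attributes:
            return False  # unchanged: keep the current version open
        # Close the current version one second before the new one starts.
        current["load_end_dts"] = load_dts - timedelta(seconds=1)
    history.append(
        {"key": key, "attributes": dict(attributes),
         "load_dts": load_dts, "load_end_dts": HIGH_DATE}
    )
    return True


history = []
historize(history, "Ut", {"postcode": "3500GJ"}, datetime(2013, 6, 5, 22))
historize(history, "Ut", {"postcode": "3500GJ"}, datetime(2014, 6, 5, 22))   # no change
historize(history, "Ut", {"postcode": "3511 CE"}, datetime(2015, 6, 5, 22))  # new version
```

Note that closing the previous version is an update of an existing row, which is why the Load End Date Timestamp is harder to maintain on append-only storage.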

Persistent Staging Metadata-1

Entity level:
Unique key
Functional Description
Delivering party
Owner / Responsible
MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
…

Persistent Staging Metadata-2

Column level:
[Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR delete as new record
[Source system] on table / file level (lowest possible)

Load End Date Timestamp: possible but difficult… it requires updates

UPDATES IN HIVE (isn't HDFS APPEND ONLY?)

ACID is possible in Hive

ACID makes updates possible by registering updates as 'new data'

Reconciliation / compacting when idle or at user command

Use ORC files

PLUS changing the Hive configuration…
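The idea of 'updates as new data' followed by compaction can be illustrated with a small, purely conceptual Python sketch. It does not reproduce Hive's file formats or its compaction mechanics; the record layout `(operation, key, row)` and the function name are assumptions.

```python
def compact(base, delta_log):
    """Fold an append-only log of (operation, key, row) records into a new base.

    The base itself is never modified: changes only ever arrive as new
    records, and compaction produces a fresh, merged base.
    """
    merged = dict(base)
    for operation, key, row in delta_log:  # applied in arrival order
        if operation == "delete":
            merged.pop(key, None)
        else:  # "insert" or "update": the latest record wins
            merged[key] = row
    return merged


base = {"Ut": {"postcode": "3500GJ"}}
delta_log = [
    ("update", "Ut", {"postcode": "3511 CE"}),
    ("insert", "Asd", {"postcode": "1012 AB"}),
]
new_base = compact(base, delta_log)
```

Until compaction runs, readers have to combine the base with the pending change records, which is why doing it when the system is idle is attractive.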

What about SEMI-STRUCTURED data?

Hive: put semi-structured data (= variable columns) in a MAP data type

OR use a data storage format that supports schema evolution: Avro (ORC in development)

Or HBase… it only has one data type (byte); the schema is 'applied'

The schema can be different for every row
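The MAP idea can be sketched in Python (this is not Hive syntax): fixed columns are kept as they are, and every other attribute goes into one string-to-string map, so rows with different attributes still fit one table layout. The column names, the helper `to_row` and the sample attributes (`platforms`, `shops`) are made up for illustration.

```python
import json

# Hypothetical fixed columns every record is expected to carry.
FIXED_COLUMNS = ("business_key", "load_dts")


def to_row(record):
    """Split a semi-structured record into fixed columns plus one map column.

    Attributes that are not fixed columns end up in 'attributes', a
    string-to-string map, so rows may carry different sets of attributes.
    """
    row = {col: record.get(col) for col in FIXED_COLUMNS}
    row["attributes"] = {
        key: json.dumps(value) if isinstance(value, (dict, list)) else str(value)
        for key, value in record.items()
        if key not in FIXED_COLUMNS
    }
    return row


# Made-up sample records with differing attributes.
rows = [
    to_row({"business_key": "Ut", "load_dts": "2015-06-05 22:00:00", "platforms": 16}),
    to_row({"business_key": "Asd", "load_dts": "2015-06-05 22:00:00", "shops": ["kiosk"]}),
]
```

The trade-off is that everything inside the map is untyped text, so type checks move to the consumer of the data.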

Page 33: PS-3C Data Modelling Zone Berlin

Persistent StagingArea HSA =

Data LibraryEDW

Information Marts

Splitting the work

rwerschkull

nllinkedincominrogierwerschkull

Persitent

Staging

-

Concept

Context

Connector

Business

Concept

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

Identify source event stream Primary or Unique KeyUse source metadata for this

Automate the building of a PS lsquoaround this keyrsquo Take all columns

Historize using SCD-2 approach

Persistent Staging - how

rwerschkull

nllinkedincominrogierwerschkull

Entity levelUnique key

Functional Description

Delivering party

Owner Responsible

MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)

hellip

Persistent Staging Metadata-1

rwerschkull

nllinkedincominrogierwerschkull

Column level [Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table file level (lowest possible)

Load End Date Timestamp possible but difficulthellipRequires updates

Persistent Staging Metadata-2

rwerschkull

nllinkedincominrogierwerschkull

ACID is possible in HIVE

ACID Makes Updates possibleBy registering updates as lsquonew datarsquo

Reconciliation compacting when idle at user command

Use ORC files

PLUS changing the HIVE configurationhellip

UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)

rwerschkull

nllinkedincominrogierwerschkull

HivePut semi structured data = variable columns in MAP data type

OR use Data storage type that supports schema-evolutionAVRO (ORC in development)

Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo

Schema can be different for every row

What about SEMI-STRUCTURED Data

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

3C - how

rwerschkull

nllinkedincominrogierwerschkull

Always starts with Conceptual data modeling

NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic

No Link Satellites

No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo

Explicit Helper entities

Like Data Vault(2) BUT

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

a UNIQUE Domain specific point of integration

hellipa business entity

hellipwithin itrsquos own domain

hellipdoes not necessarily need to be Enterprise Wide

Business Concept (BC)

rwerschkull

nllinkedincominrogierwerschkull

Why not lsquoenterprise widersquoCompany

Customer

Sales Customer

International

Sales Customer

Local

Sales Customer

Marketing Customer

Customer hellip

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

Most easy entity to be virtualised(if performance allows)

No Hashing amp No surrogateBUSINESSKEYS

(not by default at least)

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Containts Context about a Concept In a historical way

hellipLike a Data Vault Satellite

Every CC belongs to only one BC

Seperate entity per source system table stream

Concept Context (CC)

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

Example-dataBK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts META_Deleted_Ind

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0

Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0

Asa hellip hellip hellip hellip hellip hellip

NSR-Station

[adres]Source NSS1 table y

More difficult to be virtualisedDepends on semantic gap with source

But do make virtual when lsquostreaming datarsquo is necessary

Because we have PS layerExposing all columns not necessary

Refactoring is more easyhellip

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Relations between Concepts + Context

In a historical way

hellipMerger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point

in time

Connector (C)

rwerschkull

nllinkedincominrogierwerschkull

Explicitly defining a driving key as metadatahellip

Gives business understanding

Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)

bull is not registered as a new lsquoconnectionrsquo

Connector Driving key

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Connector Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

Driving Key

NS-Travelcard

+Checkin timestamp

Example-Data

NSR-StationNS-

Travelcard

NS-

Trainseries

rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

BK_NSR-

Station-from

BK_NSR-

Station-to

BK_NS_

Trainseries

BK_NS-Travelcard Checkin

timestamp

Checkout

timestamp

META_Load_dts META_Load_end_dts

Asd Ut IC 855 3528 0234 2073

1234

5-4-2016 84932 5-4-2016 94012 6-4-2016

220000

31-12-9999

000000

Ut Asd IC 855 3528 0234 2073

1234

5-4-2016 181009 5-4-2016 185520 6-4-2016

220000

31-12-9999

000000

We add Two mandatory

HELPEREntities

Are we there yet No

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

To help switching from sources that are Tied together by technical (surrogate) keyshellip

To a Business Key based model

Itrsquos a LOOKUP table that translates the technical to the

Business Key

Business Alias

Example

NSR-Station

[valuation]Source NSS2 table p

[description]SourceNSS1 table x[adres]

Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

BA-NSS2

Key Lookup voor NSS1 source tables

Key Lookup voor NSS2 source tables

Example-data

NSR-Station

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

Key Lookup voor NSS1 source tables

Business Key NSS1_Surrogate_key

Ut 123522

Asd 666323

Asa 222443

hellip hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

Has a 1 on (01) relation with a Business Concept

More difficult to be virtualised Lookup table should be kept small

Therefore DO NOT do key lookup in Concept Context entity

Load generate together with BC

Preferably lsquoin memoryrsquo somehowhellip

BA important Details

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Integrate the validity timelines of Concept Contextsbelonging to a Business Concept

Like a Data Vault Point-in-time construct

But Mandatory

And with a clearly defined and performantapproach

BC-Timeline

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-Station, Concept Context [valuation] (source: NSS2 table p)
  BK_NSR-Station  WOZ value   Value rating agency X  META_Laad_dts      META_Laad_eind_dts
  Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
  Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
  Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

NSR-Station, Concept Context [adres] (source: NSS1 table y)
  BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts
  Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
  Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
  Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
  Asa             …                   …                  …  …            …                  …

BCT (BC-Timeline) for NSR-Station
  BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
  Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
  Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
  Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
  Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
  Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
  Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
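One way to derive such a combined timeline is sketched below in Python. This is an illustration under assumptions, not the author's implementation: the function name and input shape are hypothetical, and the end of an interval is taken as one second before the next start, as in the example tables.

```python
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31)


def build_bc_timeline(*context_timelines):
    """Merge the load timestamps of several Concept Contexts into one timeline.

    Each input is a list of (business_key, load_dts) tuples. The result maps a
    business key to a list of (combined_load_dts, combined_load_end_dts) pairs.
    """
    starts = {}
    for timeline in context_timelines:
        for business_key, load_dts in timeline:
            starts.setdefault(business_key, set()).add(load_dts)

    result = {}
    for business_key, dts_set in starts.items():
        ordered = sorted(dts_set)
        # An interval ends one second before the next one starts; the last stays open.
        ends = [nxt - timedelta(seconds=1) for nxt in ordered[1:]] + [END_OF_TIME]
        result[business_key] = list(zip(ordered, ends))
    return result


valuation = [("Ut", datetime(2014, 1, 1, 22)), ("Ut", datetime(2015, 1, 1, 22)),
             ("Ut", datetime(2016, 3, 1, 22))]
adres = [("Ut", datetime(2013, 6, 5, 22)), ("Ut", datetime(2015, 7, 4, 22)),
         ("Asd", datetime(2013, 6, 5, 22))]

bct = build_bc_timeline(valuation, adres)
for start, end in bct["Ut"]:
    print("Ut", start, end)
```

With the example data, 'Ut' gets five intervals and 'Asd' one, which corresponds to the rows of the BCT table.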

modelling

What makes PS-3C a different Ensemble?

[Diagram: 3C model overview. Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; Business Alias A and B; BC-Timeline X]

1) Explicitly splitting the work

Data + History
Subjects + Integration

2) NO HASHED BUSINESS KEYS or surrogate keys

Only concatenated ones
httpwwwcannabisculturecomfilesimages6hashbrickJPG

3) Fewer joins

Relation + technical validity timeline + relation context, together in one entity
Photo credit: Public Domain

4) Explicit HELPER entities

Business Alias
Business Concept Timeline (BC-Timeline)
+ explicitly define driving key(s)
Photo credit: Public Domain

Hope to AVOID this…
http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me
'Head of BI', Spilgames
Certified Data Vault modeler since 2009
Contact details: nl.linkedin.com/in/rogierwerschkull
rogierwerschkullgmailcom
rwerschkull

Page 34: PS-3C Data Modelling Zone Berlin

Persistent Staging - Concept Context Connector
Business Concept

modelling

Persistent Staging - how

• Identify the primary or unique key of each source event stream; use source metadata for this
• Automate the building of a PS 'around this key'; take all columns
• Historize using an SCD-2 approach
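The SCD-2 historization step can be sketched as follows. This is a minimal in-memory illustration in Python, assuming full rows arrive per delivery; the function name and the `load_dts` / `load_end_dts` column names are hypothetical, and deletes are not handled.

```python
from datetime import datetime

META_COLUMNS = ("load_dts", "load_end_dts")


def historize(history, incoming, key, load_dts):
    """Apply one delivery to a persistent staging table, SCD-2 style.

    history:  list of dict rows with 'load_dts' and 'load_end_dts' (None = current)
    incoming: list of dict rows from the source, containing all source columns
    key:      name of the source primary / unique key column
    """
    current = {r[key]: r for r in history if r["load_end_dts"] is None}
    for row in incoming:
        old = current.get(row[key])
        if old is not None:
            old_payload = {k: v for k, v in old.items() if k not in META_COLUMNS}
            if old_payload == row:
                continue  # nothing changed, keep the current version
            old["load_end_dts"] = load_dts  # close the previous version
        history.append({**row, "load_dts": load_dts, "load_end_dts": None})
    return history


ps_table = []
historize(ps_table, [{"id": 1, "postcode": "3500GJ"}], "id", datetime(2013, 6, 5, 22))
historize(ps_table, [{"id": 1, "postcode": "3500GJ"}], "id", datetime(2014, 1, 1, 22))
historize(ps_table, [{"id": 1, "postcode": "3511 CE"}], "id", datetime(2015, 6, 5, 22))
for version in ps_table:
    print(version)
```

Closing the previous version is an update in place here. On an append-only store the end date would instead be derived from the load date of the next version.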

Persistent Staging Metadata-1

Entity level:
• Unique key
• Functional description
• Delivering party
• Owner / responsible
• MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operations…)
• …

Persistent Staging Metadata-2

Column level:
• [Load Date Timestamp]
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
• [Source system] on table / file level (lowest possible)

Load End Date Timestamp: possible but difficult… it requires updates

UPDATES IN HIVE (isn't HDFS APPEND ONLY?)

• ACID is possible in Hive
• ACID makes updates possible by registering updates as 'new data'
• Reconciliation / compacting: when idle or at user command
• Use ORC files
• PLUS changing the Hive configuration…

What about SEMI-STRUCTURED data?

• Hive: put semi-structured data (= variable columns) in a MAP data type
• OR use a data storage type that supports schema evolution: AVRO (ORC in development)
• Or HBase… it only has one data type (byte); the schema is 'applied'
• Schema can be different for every row
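The 'variable columns in a MAP' option could look like the Python sketch below. It only shows the record shape before loading; the fixed column names and the function are hypothetical, and string values are assumed so that the map would fit a Hive column of type MAP<STRING,STRING>.

```python
FIXED_COLUMNS = ("event_id", "event_ts")  # hypothetical fixed part of the schema


def to_map_record(raw_event, fixed_columns=FIXED_COLUMNS):
    """Split an event into fixed columns plus one map holding all variable columns."""
    record = {name: raw_event.get(name) for name in fixed_columns}
    # Everything else goes into a single string-to-string map.
    record["attributes"] = {
        k: str(v) for k, v in raw_event.items() if k not in fixed_columns
    }
    return record


event = {"event_id": 7, "event_ts": "2016-04-05T08:49:32", "level": 3, "device": "ios"}
print(to_map_record(event))
```

New source attributes then end up as extra map entries, so the table definition does not have to change when the source adds a field.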

modelling

3C - how

Like Data Vault (2), BUT:
• Always starts with conceptual data modelling
• NOT the primary location of data & history
• Virtualised (only if performance allows)
• Should be deterministic
• No Link Satellites
• No surrogate or hash keys, only 'Concatenated Natural Business Keys'
• Explicit helper entities

3C - Details: Business Concept (BC)

[Diagram: 3C model overview. Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; Business Alias A and B; BC-Timeline X]

• A UNIQUE, domain-specific point of integration
• … a business entity
• … within its own domain
• … does not necessarily need to be enterprise-wide

Why not 'enterprise wide'?

[Diagram: the concept 'Customer' at different scopes]
Company Customer
Sales Customer
International Sales Customer
Local Sales Customer
Marketing Customer
Customer …

Business Concept Metadata

Entity level:
• [Description]
• [Owner / Responsible]

Column level:
• [Load Date Timestamp]
• [Source system] on table / file level (lowest possible)

Example data

NSR-Station
  Business Key  META_Source  META_Load_dts
  Ut            NSS1_y       5-6-2015 22:00:00
  Asd           NSS1_y       5-6-2015 22:00:00
  Asa           NSS2_p       6-6-2015 22:00:00
  …             …            …

NS-Trainseries
  Business Key
  IC|855
  IC|8852
  Sp|7455
  St|16050
  …

NS-Travelcard
  Business Key
  3528 0234 2073 1234
  3528 0234 2073 5678
  …

NS-Traveller
  Business Key
  CRM-RW123456
  CRM-LAS224466
  …

BC important notes

• Easiest entity to virtualise (if performance allows)
• No hashing & no surrogate BUSINESS KEYS (not by default, at least)
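A concatenated natural business key such as 'IC|855' could be built as in the Python sketch below. The function name and the check on the separator are assumptions added for illustration; only the '|' separator and the example values come from the slides.

```python
def concat_business_key(*parts, separator="|"):
    """Build a concatenated natural business key from its parts."""
    cleaned = [str(part).strip() for part in parts]
    for part in cleaned:
        # A part that contains the separator would make the key ambiguous.
        if separator in part:
            raise ValueError(f"key part {part!r} contains the separator {separator!r}")
    return separator.join(cleaned)


print(concat_business_key("IC", 855))
print(concat_business_key("Sp", " 7455 "))
```

Because the key stays readable, a loaded row can be traced back to the source without a hash or surrogate lookup.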

3C - Details: Concept Context (CC)

[Diagram: 3C model overview. Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; Business Alias A and B; BC-Timeline X]

• Contains context about a Concept, in a historical way
• … like a Data Vault Satellite
• Every CC belongs to only one BC
• Separate entity per source system table / stream

Concept Context Metadata

Entity level:
• [Description]
• [Owner / Responsible]

Column level:
• [Load Date Timestamp]
• [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts and their sources:
  [valuation]         source: NSS2 table p
  [description]       source: NSS1 table x
  [adres]             source: NSS1 table y
  [description]       source: NTR table q
  [ovchip_personal]   source: NSR table r
  [ovchip_on-usage]   source: NSR table s
  [personal_details]  source: NSR table t
  [adres_data]        source: NSR table t

Example data: NSR-Station, Concept Context [adres] (source: NSS1 table y)

  BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
  Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
  Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
  Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
  Asa             …                   …                  …  …            …                  …

CC important notes

• More difficult to virtualise; this depends on the semantic gap with the source
• But do make it virtual when 'streaming data' is necessary
• Because we have the PS layer, exposing all columns is not necessary
• Refactoring is easier…

3C - Details: Connector (C)

[Diagram: 3C model overview. Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; Business Alias A and B; BC-Timeline X]

• Relations between Concepts + context
• In a historical way
• … a merger of the Data Vault Link + Link Satellite
• Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time

Connector Driving key

Explicitly defining a driving key as metadata…
• Gives business understanding
• Makes it possible for a Connector to handle delta data deliveries correctly…
  so that a change (on the driving key) is not registered as a new 'connection'
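How a driving key lets a Connector absorb a delta delivery can be sketched in Python as below. The function, the column names and the string timestamps are hypothetical; the driving key (travelcard + check-in timestamp) follows the NSR-Travelmovement example.

```python
META_COLUMNS = ("load_dts", "load_end_dts")


def apply_delta(connector_rows, delta_row, driving_key, load_dts):
    """Register one delta row in a Connector, matching on the driving key.

    connector_rows: list of dicts with 'load_dts' and 'load_end_dts' (None = current)
    driving_key:    tuple of column names that identify one connection at a point in time
    """
    def key_of(row):
        return tuple(row[column] for column in driving_key)

    for row in connector_rows:
        if row["load_end_dts"] is None and key_of(row) == key_of(delta_row):
            payload = {k: v for k, v in row.items() if k not in META_COLUMNS}
            if payload == delta_row:
                return connector_rows  # nothing changed
            row["load_end_dts"] = load_dts  # same connection, new version
            break
    connector_rows.append({**delta_row, "load_dts": load_dts, "load_end_dts": None})
    return connector_rows


DRIVING_KEY = ("travelcard", "checkin")
movements = []
apply_delta(movements, {"travelcard": "3528 0234 2073 1234", "checkin": "2016-04-05 08:49:32",
                        "station_from": "Asd", "station_to": None},
            DRIVING_KEY, "2016-04-05 22:00:00")
apply_delta(movements, {"travelcard": "3528 0234 2073 1234", "checkin": "2016-04-05 08:49:32",
                        "station_from": "Asd", "station_to": "Ut"},
            DRIVING_KEY, "2016-04-06 22:00:00")
connections = {(m["travelcard"], m["checkin"]) for m in movements}
print(len(movements), "versions,", len(connections), "connection")
```

The later delivery, which adds the destination station, closes the earlier version instead of creating a second travel movement.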

Connector Metadata

Entity level:
• [Description]
• [Owner / Responsible]

Column level:
• [Load Date Timestamp]
• [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts and their sources:
  [valuation]         source: NSS2 table p
  [description]       source: NSS1 table x
  [adres]             source: NSS1 table y
  [description]       source: NTR table q
  [ovchip_personal]   source: NSR table r
  [ovchip_on-usage]   source: NSR table s
  [personal_details]  source: NSR table t
  [adres_data]        source: NSR table t

Connector: NSR-Travelmovement
  Context: Checkin timestamp; 'from' and 'to' relations to NSR-Station
  Driving key: NS-Travelcard + Checkin timestamp

Example data: Connector NSR-Travelmovement
(connects NSR-Station 'from' and 'to', NS-Trainseries and NS-Travelcard)

  Column               Row 1                Row 2
  BK_NSR-Station-from  Asd                  Ut
  BK_NSR-Station-to    Ut                   Asd
  BK_NS_Trainseries    IC 855               IC 855
  BK_NS-Travelcard     3528 0234 2073 1234  3528 0234 2073 1234
  Checkin timestamp    5-4-2016 8:49:32     5-4-2016 18:10:09
  Checkout timestamp   5-4-2016 9:40:12     5-4-2016 18:55:20
  META_Load_dts        6-4-2016 22:00:00    6-4-2016 22:00:00
  META_Load_end_dts    31-12-9999 00:00:00  31-12-9999 00:00:00

Are we there yet? No

We add two mandatory HELPER entities

modelling

3C - Details: Business Alias

[Diagram: 3C model overview. Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; Business Alias A and B; BC-Timeline X]

• Helps to switch from sources that are tied together by technical (surrogate) keys…
• … to a Business Key based model
• It's a LOOKUP table that translates the technical key to the Business Key



Page 36: PS-3C Data Modelling Zone Berlin

Identify source event stream Primary or Unique KeyUse source metadata for this

Automate the building of a PS lsquoaround this keyrsquo Take all columns

Historize using SCD-2 approach

Persistent Staging - how

rwerschkull

nllinkedincominrogierwerschkull

Entity levelUnique key

Functional Description

Delivering party

Owner Responsible

MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)

hellip

Persistent Staging Metadata-1

rwerschkull

nllinkedincominrogierwerschkull

Column level [Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table file level (lowest possible)

Load End Date Timestamp possible but difficulthellipRequires updates

Persistent Staging Metadata-2

rwerschkull

nllinkedincominrogierwerschkull

ACID is possible in HIVE

ACID Makes Updates possibleBy registering updates as lsquonew datarsquo

Reconciliation compacting when idle at user command

Use ORC files

PLUS changing the HIVE configurationhellip

UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)

rwerschkull

nllinkedincominrogierwerschkull

HivePut semi structured data = variable columns in MAP data type

OR use Data storage type that supports schema-evolutionAVRO (ORC in development)

Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo

Schema can be different for every row

What about SEMI-STRUCTURED Data

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

3C - how

rwerschkull

nllinkedincominrogierwerschkull

Always starts with Conceptual data modeling

NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic

No Link Satellites

No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo

Explicit Helper entities

Like Data Vault(2) BUT

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

a UNIQUE Domain specific point of integration

hellipa business entity

hellipwithin itrsquos own domain

hellipdoes not necessarily need to be Enterprise Wide

Business Concept (BC)

rwerschkull

nllinkedincominrogierwerschkull

Why not lsquoenterprise widersquoCompany

Customer

Sales Customer

International

Sales Customer

Local

Sales Customer

Marketing Customer

Customer hellip

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

Most easy entity to be virtualised(if performance allows)

No Hashing amp No surrogateBUSINESSKEYS

(not by default at least)

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Containts Context about a Concept In a historical way

hellipLike a Data Vault Satellite

Every CC belongs to only one BC

Seperate entity per source system table stream

Concept Context (CC)

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

Example-dataBK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts META_Deleted_Ind

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0

Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0

Asa hellip hellip hellip hellip hellip hellip

NSR-Station

[adres]Source NSS1 table y

More difficult to be virtualisedDepends on semantic gap with source

But do make virtual when lsquostreaming datarsquo is necessary

Because we have PS layerExposing all columns not necessary

Refactoring is more easyhellip

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; Business Aliases A and B; BC-Timeline X]

3C - Details: Connector (C)

• Relations between Concepts + Context, in a historical way
• …merger of Data Vault Link + Link Satellite
• Must ALWAYS have a driving key defined: a (sub)set of keys that makes a Connector unique at one point in time

Connector Driving key

Explicitly defining a driving key as metadata…
• Gives business understanding
• Makes it possible for the Connector to correctly handle delta data deliveries…
  so that a change (on the driving key) is not registered as a new 'connection'
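One possible reading of the driving-key rule, sketched in Python: when a delta delivers a row whose driving-key values match a currently open Connector row but whose other columns differ, the open row is end-dated and replaced rather than left open next to a second 'connection'. The function name `apply_delta`, the dict layout and the one-second end-dating are assumptions for illustration only.

```python
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31)

def apply_delta(connector_rows, delta_row, driving_key, load_dts):
    """Apply one delivered row to a Connector, using the driving key to find its predecessor.

    connector_rows: list of dicts with 'load_dts' and 'load_end_dts' plus data columns.
    driving_key: tuple of column names identifying one connection at a point in time.
    """
    dk = tuple(delta_row[c] for c in driving_key)
    for row in connector_rows:
        is_open = row["load_end_dts"] == END_OF_TIME
        if is_open and tuple(row[c] for c in driving_key) == dk:
            if all(row[c] == v for c, v in delta_row.items()):
                return connector_rows  # nothing new was delivered
            # Same connection, changed attributes: close the old version.
            row["load_end_dts"] = load_dts - timedelta(seconds=1)
    connector_rows.append({**delta_row, "load_dts": load_dts, "load_end_dts": END_OF_TIME})
    return connector_rows
```

With the driving key from the example (NS-Travelcard + Checkin timestamp), a corrected destination station for the same check-in would close the earlier row instead of producing two open travel movements.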

Connector Metadata

Entity level:
• [Description]
• [Owner / Responsible]

Column level:
• [Load Date Timestamp]
• [Source system], on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag], OR register a delete as a new record

Example (diagram): the same Business Concepts and Concept Contexts, now with the Connector NSR-Travelmovement.

NSR-Travelmovement:
• Checkin timestamp
• from
• to

Driving Key: NS-Travelcard + Checkin timestamp

Example data: Connector NSR-Travelmovement (between NSR-Station, NS-Travelcard and NS-Trainseries)

BK_NSR-Station-from | BK_NSR-Station-to | BK_NS_Trainseries | BK_NS-Travelcard    | Checkin timestamp | Checkout timestamp | META_Load_dts     | META_Load_end_dts
Asd                 | Ut                | IC 855            | 3528 0234 2073 1234 | 5-4-2016 8:49:32  | 5-4-2016 9:40:12   | 6-4-2016 22:00:00 | 31-12-9999 00:00:00
Ut                  | Asd               | IC 855            | 3528 0234 2073 1234 | 5-4-2016 18:10:09 | 5-4-2016 18:55:20  | 6-4-2016 22:00:00 | 31-12-9999 00:00:00

Are we there yet? No.

We add two mandatory HELPER entities

modelling

3C - Details: Business Alias (BA)

• Helps switching from sources that are tied together by technical (surrogate) keys… to a Business Key based model
• It's a LOOKUP table that translates the technical key to the Business Key

Example (diagram): NSR-Station

Concept Contexts:
• [valuation], source NSS2 table p
• [description], source NSS1 table x
• [adres], source NSS1 table y

Business Aliases:
• BA-NSS1: key lookup for NSS1 source tables
• BA-NSS2: key lookup for NSS2 source tables

Example data

BA-NSS1 (key lookup for NSS1 source tables)

Business Key | NSS1_Surrogate_key
Ut           | 123522
Asd          | 666323
Asa          | 222443
…            | …

NSR-Station

Business Key | META_Source | META_Load_dts
Ut           | NSS1_y      | 5-6-2015 22:00:00
Asd          | NSS1_y      | 5-6-2015 22:00:00
Asa          | NSS2_p      | 6-6-2015 22:00:00
…            | …           | …

BA important details

• Has a 1 : (0..1) relation with a Business Concept
• More difficult to virtualise: the lookup table should be kept small
• Therefore DO NOT do the key lookup in the Concept Context entity
• Load / generate together with the BC
• Preferably 'in memory' somehow…
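As a rough illustration of the lookup idea, a Business Alias can be thought of as a small in-memory mapping from a source's surrogate key to the Business Key. The sketch below uses the BA-NSS1 example values; the names `to_business_key` and `rekey`, and the column name `station_id`, are hypothetical, and where exactly the translation runs in a load process is left open.

```python
# Business Alias for source NSS1: surrogate key -> Business Key (values from the example).
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}

def to_business_key(alias, surrogate_key):
    """Translate a source surrogate key to its Business Key; None if unknown."""
    return alias.get(surrogate_key)

def rekey(source_rows, alias, key_column):
    """Yield source rows with the surrogate key column replaced by 'business_key'."""
    for row in source_rows:
        translated = {k: v for k, v in row.items() if k != key_column}
        translated["business_key"] = to_business_key(alias, row[key_column])
        yield translated

# Usage: a source row keyed on a surrogate comes out keyed on the Business Key.
rekeyed = list(rekey([{"station_id": 123522, "postcode": "3500GJ"}], BA_NSS1, "station_id"))
```

Keeping the mapping this small (key to key, nothing else) is what makes holding it in memory plausible.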

3C - Details: BC-Timeline (BCT)

• Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
• Like a Data Vault Point-in-Time construct
• But mandatory
• And with a clearly defined and performant approach

Example: NSR-Station with Concept Contexts [valuation] (source NSS2 table p) and [adres] (source NSS1 table y)

Concept Context [valuation]

BK_NSR-Station | WOZ value  | Value (rating agency X) | META_Laad_dts     | META_Laad_eind_dts
Ut             | 20 million | 18 million              | 1-1-2014 22:00:00 | 1-1-2015 21:59:59
Ut             | 22 million | 18 million              | 1-1-2015 22:00:00 | 1-3-2016 21:59:59
Ut             | 22 million | 23 million              | 1-3-2016 22:00:00 | 31-12-9999 00:00:00

Concept Context [adres]

BK_NSR-Station | Postadres_postcode | GPS               | … | META_source | META_Load_dts     | META_Load_end_dts
Ut             | 3500GJ             | 52.08954, 5.11064 | … | NSS1_y      | 5-6-2013 22:00:00 | 4-7-2015 21:59:59
Ut             | 3511 CE            | 52.37269, 4.89299 | … | NSS1_y      | 4-7-2015 22:00:00 | 31-12-9999 00:00:00
Asd            | 1012 AB            | 52.37269, 4.89299 | … | NSS1_y      | 5-6-2013 22:00:00 | 31-12-9999 00:00:00
Asa            | …                  | …                 | … | …           | …                 | …

BCT

BK_NSR-Station | Combined_Load_dts | Combined_Load_end_dts
Ut             | 5-6-2013 22:00:00 | 1-1-2014 21:59:59
Ut             | 1-1-2014 22:00:00 | 1-1-2015 21:59:59
Ut             | 1-1-2015 22:00:00 | 4-7-2015 21:59:59
Ut             | 4-7-2015 22:00:00 | 1-3-2016 21:59:59
Ut             | 1-3-2016 22:00:00 | 31-12-9999 00:00:00
Asd            | 5-6-2013 22:00:00 | 31-12-9999 00:00:00
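The combined timeline in the example can be reproduced with a small Python sketch: collect every load timestamp per Business Key across the Concept Contexts, sort them, and close each interval one second before the next one starts. This is only one way to derive it, not necessarily the 'clearly defined and performant approach' the slides refer to; the name `build_bc_timeline` is hypothetical, and the sketch ignores deletes and gaps.

```python
from collections import defaultdict
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31)

def build_bc_timeline(*concept_contexts):
    """Combine the load timestamps of several Concept Contexts into one timeline.

    Each Concept Context is an iterable of (business_key, load_dts) pairs.
    Returns (business_key, combined_load_dts, combined_load_end_dts) tuples.
    """
    starts = defaultdict(set)
    for cc in concept_contexts:
        for business_key, load_dts in cc:
            starts[business_key].add(load_dts)

    timeline = []
    for business_key in sorted(starts):
        points = sorted(starts[business_key])
        for i, start in enumerate(points):
            # An interval ends one second before the next change, or never.
            if i + 1 < len(points):
                end = points[i + 1] - timedelta(seconds=1)
            else:
                end = END_OF_TIME
            timeline.append((business_key, start, end))
    return timeline
```

Fed with the load timestamps of [adres] and [valuation] above, this yields five intervals for Ut and one for Asd, matching the BCT table.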

modelling

What makes PS-3C a different Ensemble?

1) Explicitly splitting the work

• Data + History
• Subjects + Integration

2) NO hashed Business Keys or surrogate keys

Only concatenated ones

httpwwwcannabisculturecomfilesimages6hashbrickJPG
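A concatenated Business Key such as "IC|855" from the NS-Trainseries example could be built as in the Python sketch below. The "|" delimiter is taken from that example; trimming whitespace and rejecting parts that contain the delimiter are assumptions added here so that two different key combinations cannot collapse into the same string.

```python
DELIMITER = "|"  # as in the example key "IC|855"

def business_key(*parts):
    """Concatenate natural key parts into one Business Key string."""
    cleaned = [str(p).strip() for p in parts]
    for part in cleaned:
        if DELIMITER in part:
            raise ValueError(f"key part {part!r} contains the delimiter {DELIMITER!r}")
    return DELIMITER.join(cleaned)
```

Unlike a hash, the resulting key stays readable and can be split back into its parts.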

3) Fewer joins

Relation + technical validity timeline + relation context: together in one entity

Photo credit: Public Domain

4) Explicit HELPER entities

• Business Alias
• Business Concept Timeline (BC-Timeline)
• + explicitly define Driving key(s)

Photo credit: Public Domain

Hope to AVOID this…

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me

'Head of BI', Spil Games

Certified Data Vault modeler since 2009

Contact details nllinkedincominrogierwerschkull

rogierwerschkullgmailcom

rwerschkull

Page 37: PS-3C Data Modelling Zone Berlin

Persistent Staging Metadata-1

Entity level:
• Unique key
• Functional description
• Delivering party
• Owner / Responsible
• MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
• …

Persistent Staging Metadata-2

Column level:
• [Load Date Timestamp]
• [Load End Date Timestamp]
• [Deleted Flag], OR register a delete as a new record
• [Source system], on table / file level (lowest possible)

Load End Date Timestamp: possible but difficult… it requires updates

UPDATES IN HIVE (isn't HDFS append-only?)

• ACID is possible in Hive
• ACID makes updates possible by registering updates as 'new data'
• Reconciliation / compacting: when idle or at user command
• Use ORC files
• PLUS changing the Hive configuration…
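The idea of "updates registered as new data, reconciled later" can be illustrated with a toy Python model. It is a conceptual sketch only: it does not reflect Hive's actual file layout, transaction handling or API, and the class and method names are invented for this example.

```python
class AppendOnlyTable:
    """Toy model: every change is appended; readers see the latest version per key."""

    def __init__(self):
        self.base = {}    # compacted state: key -> row
        self.deltas = []  # appended change records: (key, row or None)

    def upsert(self, key, row):
        self.deltas.append((key, row))

    def delete(self, key):
        # A delete is registered as a new record as well.
        self.deltas.append((key, None))

    def read(self):
        """Merge base and deltas on read; later deltas win."""
        merged = dict(self.base)
        for key, row in self.deltas:
            if row is None:
                merged.pop(key, None)
            else:
                merged[key] = row
        return merged

    def compact(self):
        """Fold the deltas into the base, e.g. when idle or on request."""
        self.base = self.read()
        self.deltas = []
```

Nothing is ever overwritten in place before compaction, which is why an append-only store can still present updated and deleted rows to readers.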

What about SEMI-STRUCTURED data?

• Hive: put semi-structured data (= variable columns) in a MAP data type
• OR use a data storage type that supports schema evolution: Avro (ORC in development)
• Or HBase… it only has one data type (byte); the schema is 'applied', and can be different for every row

modelling

3C - how

Like Data Vault (2), BUT

• Always starts with conceptual data modelling
• NOT the primary location of Data & History
• Virtualised (only if performance allows)
• Should be deterministic
• No Link Satellites
• No surrogate or hash keys, only 'Concatenated Natural Business Keys'
• Explicit helper entities

3C - Details: Business Concept (BC)

• A UNIQUE, domain-specific point of integration
• …a business entity
• …within its own domain
• …does not necessarily need to be enterprise-wide

Why not 'enterprise wide'?

[Diagram labels: Company, Customer, Sales Customer, International Sales Customer, Local Sales Customer, Marketing Customer, Customer …]

Business Concept Metadata

Entity level:
• [Description]
• [Owner / Responsible]

Column level:
• [Load Date Timestamp]
• [Source system], on table / file level (lowest possible)

Example data: Business Keys per Business Concept

NS-Trainseries

Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NSR-Station

Business Key | META_Source | META_Load_dts
Ut           | NSS1_y      | 5-6-2015 22:00:00
Asd          | NSS1_y      | 5-6-2015 22:00:00
Asa          | NSS2_p      | 6-6-2015 22:00:00
…            | …           | …

NS-Travelcard

Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller

Business Key
CRM-RW123456
CRM-LAS224466
…

BC important notes

• Easiest entity to virtualise (if performance allows)
• No hashing & no surrogate BUSINESS KEYS (not by default, at least)


Page 38: PS-3C Data Modelling Zone Berlin


Page 39: PS-3C Data Modelling Zone Berlin

ACID is possible in HIVE

ACID Makes Updates possibleBy registering updates as lsquonew datarsquo

Reconciliation compacting when idle at user command

Use ORC files

PLUS changing the HIVE configurationhellip

UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)

rwerschkull

nllinkedincominrogierwerschkull

HivePut semi structured data = variable columns in MAP data type

OR use Data storage type that supports schema-evolutionAVRO (ORC in development)

Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo

Schema can be different for every row

What about SEMI-STRUCTURED Data

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

3C - how

rwerschkull

nllinkedincominrogierwerschkull

Always starts with Conceptual data modeling

NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic

No Link Satellites

No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo

Explicit Helper entities

Like Data Vault(2) BUT

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

a UNIQUE Domain specific point of integration

hellipa business entity

hellipwithin itrsquos own domain

hellipdoes not necessarily need to be Enterprise Wide

Business Concept (BC)

rwerschkull

nllinkedincominrogierwerschkull

Why not lsquoenterprise widersquoCompany

Customer

Sales Customer

International

Sales Customer

Local

Sales Customer

Marketing Customer

Customer hellip

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station
Business Key   META_Source   META_Load_dts
Ut             NSS1_y        5-6-2015 22:00:00
Asd            NSS1_y        5-6-2015 22:00:00
Asa            NSS2_p        6-6-2015 22:00:00
…              …             …

NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
…

BC important notes
Easiest entity to virtualise (if performance allows)
BUSINESS KEYS: no hashing & no surrogates (not by default at least)
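The example keys for NS-Trainseries (IC|855, Sp|7455) suggest that a concatenated Business Key is simply the natural key parts joined with a separator. A minimal Python sketch, assuming a pipe separator and assuming that parts may never be empty or contain the separator; the function name and both rules are illustrative, not something the slides prescribe.

```python
def concat_business_key(*parts, sep="|"):
    """Build a concatenated natural business key such as 'IC|855'."""
    cleaned = [str(p).strip() for p in parts]
    if any(p == "" for p in cleaned):
        raise ValueError("every business key part must be non-empty")
    if any(sep in p for p in cleaned):
        raise ValueError(f"a key part may not contain the separator {sep!r}")
    return sep.join(cleaned)


print(concat_business_key("IC", 855))
print(concat_business_key("Sp", 7455))
```

Because the key is built only from source values, it can be recomputed anywhere without a lookup, which is what makes a virtualised Business Concept feasible.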


Concept Context (CC)
Contains context about a Concept, in a historical way
…like a Data Vault Satellite
Every CC belongs to only one BC
Separate entity per source system table / stream

Concept Context Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example
[Diagram: Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller, with these Concept Contexts:]
[valuation] Source: NSS2 table p
[description] Source: NSS1 table x
[adres] Source: NSS1 table y
[description] Source: NTR table q
[ovchip_personal] Source: NSR table r
[ovchip_on-usage] Source: NSR table s
[personal_details] Source: NSR table t
[adres_data] Source: NSR table t

Example-data: NSR-Station, [adres] Source: NSS1 table y

BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                 …  …            …                  …
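The example rows show how a Concept Context keeps history: the version that is replaced gets a Load End Date one second before the new Load Date, and the current version carries 31-12-9999. A minimal Python sketch of that end-dating step, assuming an in-memory list of rows; the function name and the row layout are illustrative, only the column naming follows the example data.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # open-ended end date, as in the example data


def add_version(history, business_key, attributes, load_dts):
    """Append a new version for business_key and end-date the current one.

    history is a list of dicts; the open version of a key is the one whose
    META_Load_end_dts equals HIGH_DATE.
    """
    for row in history:
        if row["BK"] == business_key and row["META_Load_end_dts"] == HIGH_DATE:
            if row["attributes"] == attributes:
                return history  # nothing changed, keep the open version
            row["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
    history.append({
        "BK": business_key,
        "attributes": attributes,
        "META_Load_dts": load_dts,
        "META_Load_end_dts": HIGH_DATE,
    })
    return history


history = []
add_version(history, "Ut", {"postcode": "3500GJ"}, datetime(2013, 6, 5, 22))
add_version(history, "Ut", {"postcode": "3511 CE"}, datetime(2015, 6, 5, 22))
for row in history:
    print(row["BK"], row["attributes"], row["META_Load_dts"], row["META_Load_end_dts"])
```

For streaming data, where the Load End Date is not mandatory, the same history could be kept insert-only and the end dates derived at query time instead.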

CC important notes
More difficult to virtualise: depends on the semantic gap with the source
But do make it virtual when 'streaming data' is necessary
Because we have the PS layer, exposing all columns is not necessary
Refactoring is easier…


Connector (C)
Relations between Concepts + Context, in a historical way
…a merger of the Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time

Connector Driving key
Explicitly defining a driving key as metadata…
Gives business understanding
Makes it possible for the Connector to handle delta data deliveries correctly…
• so that a change (on the driving key) is not registered as a new 'connection'

Connector Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example
[Diagram: the Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts, plus the Connector NSR-Travelmovement (Checkin timestamp; from / to)]
Driving Key: NS-Travelcard + Checkin timestamp

Example-Data: Connector NSR-Travelmovement (Checkin timestamp; from / to), connecting NSR-Station, NS-Travelcard and NS-Trainseries

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00
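With NS-Travelcard + Checkin timestamp as the driving key, a later delivery for the same card and check-in can be treated as a new version of the same travel movement rather than as an additional one. A minimal Python sketch of that delta handling, assuming an in-memory list of rows; the function name, the underscore in `Checkin_timestamp` and the corrected delivery are hypothetical, and the talk does not prescribe this exact logic.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # open-ended end date used in the example data
DRIVING_KEY = ("BK_NS-Travelcard", "Checkin_timestamp")  # as on the example slide


def apply_delta(connector, record, load_dts):
    """Load one delivered record into the connector (a list of dicts)."""
    key = tuple(record[col] for col in DRIVING_KEY)
    for row in connector:
        is_open = row["META_Load_end_dts"] == HIGH_DATE
        if is_open and tuple(row[col] for col in DRIVING_KEY) == key:
            current = {k: v for k, v in row.items() if not k.startswith("META_")}
            if current == record:
                return connector  # unchanged re-delivery: nothing to do
            # Same driving key, different content: close the old version.
            row["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
    connector.append(
        {**record, "META_Load_dts": load_dts, "META_Load_end_dts": HIGH_DATE}
    )
    return connector


movement = {
    "BK_NSR-Station-from": "Asd",
    "BK_NSR-Station-to": "Ut",
    "BK_NS-Travelcard": "3528 0234 2073 1234",
    "Checkin_timestamp": datetime(2016, 4, 5, 8, 49, 32),
}
connector = []
apply_delta(connector, movement, datetime(2016, 4, 6, 22))
# Hypothetical correction delivered a day later: same trip, other destination.
apply_delta(connector, {**movement, "BK_NSR-Station-to": "Asa"}, datetime(2016, 4, 7, 22))
open_rows = [r for r in connector if r["META_Load_end_dts"] == HIGH_DATE]
print(len(connector), len(open_rows))
```

The connector then holds two rows but only one open version per driving key, so the correction does not count as a second connection.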

Are we there yet? No
We add two mandatory HELPER entities

modelling

Business Alias
Helps switching from sources that are tied together by technical (surrogate) keys… to a Business Key based model
It's a LOOKUP table that translates the technical key to the Business Key

Example
NSR-Station
[valuation] Source: NSS2 table p
[description] Source: NSS1 table x
[adres] Source: NSS1 table y
BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables

Example-data

NSR-Station
Business Key   META_Source   META_Load_dts
Ut             NSS1_y        5-6-2015 22:00:00
Asd            NSS1_y        5-6-2015 22:00:00
Asa            NSS2_p        6-6-2015 22:00:00
…              …             …

BA-NSS1: key lookup for NSS1 source tables
Business Key   NSS1_Surrogate_key
Ut             123522
Asd            666323
Asa            222443
…              …

BA important details
Has a 1 : (0..1) relation with a Business Concept
More difficult to virtualise; the lookup table should be kept small
Therefore DO NOT do the key lookup in the Concept Context entity
Load / generate together with the BC
Preferably 'in memory' somehow…
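A small, in-memory Business Alias behaves like a dictionary from a source's surrogate key to the Business Key. A minimal Python sketch using the BA-NSS1 example values; the function name and the list of incoming surrogate keys are assumptions for illustration.

```python
# Contents of BA-NSS1 as shown in the example data:
# NSS1 surrogate key -> Business Key of NSR-Station.
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(alias, surrogate_key):
    """Translate a technical key to its Business Key, failing loudly if unknown."""
    try:
        return alias[surrogate_key]
    except KeyError:
        raise KeyError(
            f"no Business Key known for surrogate key {surrogate_key}"
        ) from None


# Hypothetical surrogate keys arriving from an NSS1 source table.
incoming = [123522, 666323]
print([to_business_key(BA_NSS1, key) for key in incoming])
```

Keeping one alias per source system keeps each lookup small enough to hold in memory, in line with the note above.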


BC-Timeline
Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
Like a Data Vault Point-in-time construct
But mandatory
And with a clearly defined and performant approach

Example
NSR-Station
[valuation] Source: NSS2 table p
[adres] Source: NSS1 table y

[valuation]
BK_NSR-Station  WOZ value   Value rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

BCT
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00

[adres]
BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                 …  …            …                  …
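The BCT rows in the example can be derived by collecting every Load Date of every Concept Context per Business Key, sorting them, and ending each interval one second before the next one starts. A minimal Python sketch of that derivation, fed with the load dates from the [valuation] and [adres] examples; the function name and the tuple layout are assumptions, and the talk does not show its own implementation.

```python
from collections import defaultdict
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # open-ended end date, as in the example data


def build_bc_timeline(*context_rows):
    """Combine (business_key, load_dts) pairs of several contexts into one timeline.

    Returns a list of (business_key, combined_load_dts, combined_load_end_dts).
    """
    starts = defaultdict(set)
    for rows in context_rows:
        for business_key, load_dts in rows:
            starts[business_key].add(load_dts)

    timeline = []
    for business_key in sorted(starts):
        points = sorted(starts[business_key])
        for i, start in enumerate(points):
            if i + 1 < len(points):
                end = points[i + 1] - timedelta(seconds=1)
            else:
                end = HIGH_DATE
            timeline.append((business_key, start, end))
    return timeline


valuation = [
    ("Ut", datetime(2014, 1, 1, 22)),
    ("Ut", datetime(2015, 1, 1, 22)),
    ("Ut", datetime(2016, 3, 1, 22)),
]
adres = [
    ("Ut", datetime(2013, 6, 5, 22)),
    ("Ut", datetime(2015, 7, 4, 22)),
    ("Asd", datetime(2013, 6, 5, 22)),
]
timeline = build_bc_timeline(valuation, adres)
for row in timeline:
    print(*row)
```

The result contains five intervals for Ut and one for Asd, matching the BCT table; a query for a point in time can then join each Concept Context on the interval that contains it.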

modelling

What makes PS-3C a different Ensemble?


1) Explicitly splitting the work
Data + History
Subjects + Integration

2) NO HASHED BUSINESS KEYS or surrogate keys
Only concatenated ones
httpwwwcannabisculturecomfilesimages6hashbrickJPG

3) Fewer joins
Relation + technical validity timeline + relation context, together in one entity
Photo credit: Public Domain

4) Explicit HELPER entities
Business Alias
Business Concept Timeline
+ explicitly define Driving key(s)
Photo credit: Public Domain

Hope to AVOID this…
http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique
Help needed
Questions?

About Me
'Head of BI', Spil Games
Certified Data Vault modeler since 2009
Contact details: nl.linkedin.com/in/rogierwerschkull, rogierwerschkull@gmail.com, rwerschkull

Page 40: PS-3C Data Modelling Zone Berlin


Page 41: PS-3C Data Modelling Zone Berlin


Page 42: PS-3C Data Modelling Zone Berlin

3C - how

rwerschkull

nllinkedincominrogierwerschkull

Always starts with Conceptual data modeling

NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic

No Link Satellites

No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo

Explicit Helper entities

Like Data Vault(2) BUT

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

a UNIQUE Domain specific point of integration

hellipa business entity

hellipwithin itrsquos own domain

hellipdoes not necessarily need to be Enterprise Wide

Business Concept (BC)

rwerschkull

nllinkedincominrogierwerschkull

Why not lsquoenterprise widersquoCompany

Customer

Sales Customer

International

Sales Customer

Local

Sales Customer

Marketing Customer

Customer hellip

rwerschkull

nllinkedincominrogierwerschkull

Business Concept Metadata

Entity level:
• [Description]
• [Owner / Responsible]

Column level:
• [Load Date Timestamp]
• [Source system], on table / file level (lowest possible)

Example-Data

NSR-Station

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Trainseries

Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Travelcard

Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller

Business Key
CRM-RW123456
CRM-LAS224466
…

BC important notes

• Easiest entity to virtualise (if performance allows)
• No hashing & no surrogate BUSINESS KEYS (not by default at least)
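The 'concatenated natural business key' idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper name `concat_business_key` is hypothetical, and the `|` separator is taken from the example keys such as IC|855 rather than from any stated rule.

```python
def concat_business_key(*parts, sep="|"):
    """Join natural key parts into one concatenated business key.

    No hashing and no surrogate key: the result stays human readable.
    """
    cleaned = [str(part).strip() for part in parts]
    if any(not part for part in cleaned):
        raise ValueError("every business key part must be non-empty")
    if any(sep in part for part in cleaned):
        # A separator inside a part would make the key ambiguous.
        raise ValueError(f"a key part may not contain the separator {sep!r}")
    return sep.join(cleaned)


train_key = concat_business_key("IC", "855")
print(train_key)
```

Rejecting parts that already contain the separator is a design choice of this sketch; it keeps two different part combinations from collapsing into the same key.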


Concept Context (CC)

• Contains context about a Concept, in a historical way
• …like a Data Vault Satellite
• Every CC belongs to only one BC
• Separate entity per source system table / stream

Concept Context Metadata

Entity level:
• [Description]
• [Owner / Responsible]

Column level:
• [Load Date Timestamp]
• [Source system], on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag], OR register a delete as a new record

Example

[Diagram: the Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller, with the following Concept Contexts attached]

• [valuation], source NSS2 table p
• [description], source NSS1 table x
• [adres], source NSS1 table y
• [description], source NTR table q
• [ovchip_personal], source NSR table r
• [ovchip_on-usage], source NSR table s
• [personal_details], source NSR table t
• [adres_data], source NSR table t

Example-data

NSR-Station, Concept Context [adres], source NSS1 table y

BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                  …  …            …                  …

CC important notes

• More difficult to virtualise: depends on the semantic gap with the source
• But do make it virtual when 'streaming data' is necessary
• Because we have the PS layer, exposing all columns is not necessary
• Refactoring is easier…
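The historical behaviour of a Concept Context, as seen in the [adres] example rows, can be sketched as follows. This is a minimal illustration, not the author's load routine: the function `apply_version`, the dictionary layout and the attribute name `postcode` are assumptions, and the one-second gap between an end timestamp and the next start is copied from the example data (21:59:59 followed by 22:00:00).

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # open-ended validity, as in the example rows


def apply_version(history, business_key, attributes, load_dts):
    """Add a new version for a business key and end-date the current one."""
    current = next(
        (row for row in history
         if row["bk"] == business_key and row["load_end_dts"] == OPEN_END),
        None,
    )
    if current is not None:
        if current["attributes"] == attributes:
            return history  # nothing changed, keep the current version
        # Close the old version one second before the new one starts.
        current["load_end_dts"] = load_dts - timedelta(seconds=1)
    history.append({
        "bk": business_key,
        "attributes": attributes,
        "load_dts": load_dts,
        "load_end_dts": OPEN_END,
        "deleted_ind": 0,
    })
    return history


history = []
apply_version(history, "Ut", {"postcode": "3500GJ"}, datetime(2013, 6, 5, 22))
apply_version(history, "Ut", {"postcode": "3511 CE"}, datetime(2015, 6, 5, 22))
for row in history:
    print(row["bk"], row["attributes"], row["load_dts"], row["load_end_dts"])
```

For streaming data the end timestamp is not mandatory according to the metadata slide, so a streaming variant could simply append rows and derive the end date at query time.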


Connector (C)

• Relations between Concepts + Context, in a historical way
• …merger of Data Vault Link + Link Satellite
• Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time

Connector Driving key

Explicitly defining a driving key as metadata…
• Gives business understanding
• Makes it possible for the Connector to handle delta data deliveries correctly… so that a change (on the driving key) is not registered as a new 'connection'

Connector Metadata

Entity level:
• [Description]
• [Owner / Responsible]

Column level:
• [Load Date Timestamp]
• [Source system], on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag], OR register a delete as a new record

Example

[Diagram: the Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts, now linked by the Connector NSR-Travelmovement (Checkin timestamp, from, to)]

Driving Key: NS-Travelcard + Checkin timestamp

Example-Data

NSR-Travelmovement (Connector between NSR-Station, NS-Travelcard and NS-Trainseries)

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC|855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC|855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00
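How an explicitly defined driving key could steer a delta load of the NSR-Travelmovement Connector is sketched below. This is an interpretation under assumptions, not the author's implementation: the function names, the column names and the first (check-in only) delivery are hypothetical, and only the driving key itself (NS-Travelcard + Checkin timestamp) and the final row come from the example.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)
# Driving key from the example: NS-Travelcard + Checkin timestamp.
DRIVING_KEY = ("bk_travelcard", "checkin_ts")


def load_delta(connector, row, load_dts):
    """Load one delivered row; a changed row for a known driving key
    becomes a new version of the same connection, not a second one."""
    driving_value = tuple(row[col] for col in DRIVING_KEY)
    for existing in connector:
        is_open = existing["load_end_dts"] == OPEN_END
        same_key = tuple(existing[col] for col in DRIVING_KEY) == driving_value
        if is_open and same_key:
            if all(existing[col] == value for col, value in row.items()):
                return connector  # identical delivery, nothing to do
            existing["load_end_dts"] = load_dts - timedelta(seconds=1)
            break
    connector.append({**row, "load_dts": load_dts, "load_end_dts": OPEN_END})
    return connector


def open_connections(connector):
    """Connections that are valid right now."""
    return [row for row in connector if row["load_end_dts"] == OPEN_END]


trip = {
    "bk_station_from": "Asd",
    "bk_station_to": None,          # hypothetical: checkout not yet known
    "bk_trainseries": "IC|855",
    "bk_travelcard": "3528 0234 2073 1234",
    "checkin_ts": datetime(2016, 4, 5, 8, 49, 32),
    "checkout_ts": None,
}
connector = []
load_delta(connector, trip, datetime(2016, 4, 5, 22))
completed = {**trip, "bk_station_to": "Ut",
             "checkout_ts": datetime(2016, 4, 5, 9, 40, 12)}
load_delta(connector, completed, datetime(2016, 4, 6, 22))
print(len(connector), len(open_connections(connector)))
```

Without the driving key, the second delivery would look like a new relation because one of the linked keys changed; with it, the loader can tell that the same travel movement was updated.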

Are we there yet? No.

We add two mandatory HELPER entities.


Business Alias

• Helps switching from sources that are tied together by technical (surrogate) keys… to a Business Key based model
• It's a LOOKUP table that translates the technical key to the Business Key

Example

[Diagram: NSR-Station with its Concept Contexts [valuation] (source NSS2 table p), [description] (source NSS1 table x) and [adres] (source NSS1 table y), plus two Business Aliases]

• BA-NSS1: key lookup for NSS1 source tables
• BA-NSS2: key lookup for NSS2 source tables

Example-data

BA-NSS1 (key lookup for NSS1 source tables)

Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

BA important details

• Has a 1-to-(0,1) relation with a Business Concept
• More difficult to virtualise
• Lookup table should be kept small; therefore DO NOT do the key lookup in the Concept Context entity
• Load / generate together with the BC
• Preferably 'in memory' somehow…
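A Business Alias is essentially a small key-translation table, which can be sketched as an in-memory dictionary. The values below are the BA-NSS1 example rows; the function name `to_business_key` and the choice of a plain `dict` are assumptions made for illustration only.

```python
# BA-NSS1: NSS1 surrogate key -> business key (values from the example data).
BA_NSS1 = {
    123522: "Ut",
    666323: "Asd",
    222443: "Asa",
}


def to_business_key(surrogate_key, alias):
    """Translate a source system's technical key to the business key."""
    if surrogate_key not in alias:
        # An unknown surrogate key means the alias is incomplete.
        raise KeyError(f"no business key known for surrogate key {surrogate_key!r}")
    return alias[surrogate_key]


print(to_business_key(123522, BA_NSS1))
```

Keeping the alias as one small lookup structure per source system matches the advice to keep it small and preferably in memory.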


BC-Timeline

• Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
• Like a Data Vault point-in-time construct
• But mandatory
• And with a clearly defined and performant approach

Example

NSR-Station, with Concept Contexts [valuation] (source NSS2 table p) and [adres] (source NSS1 table y)

Concept Context [valuation]

BK_NSR-Station  WOZ value   Value rating agency X  META_Laad_dts      META_Laad_eind_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

BCT (BC-Timeline)

BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00

Concept Context [adres]

BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                  …  …            …                  …
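The combined timeline for station Ut can be reproduced from the start timestamps of its two Concept Contexts. The sketch below is one possible way to derive it, not the 'clearly defined and performant approach' the slide refers to; `build_bc_timeline` is a hypothetical name, and the one-second gap before each next start follows the example rows.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)


def build_bc_timeline(*context_start_lists):
    """Merge the load timestamps of all Concept Contexts of one
    business key into a single gap-free list of validity periods."""
    starts = sorted({dts for one_list in context_start_lists for dts in one_list})
    timeline = []
    for index, start in enumerate(starts):
        if index + 1 < len(starts):
            # A period ends one second before the next one starts.
            end = starts[index + 1] - timedelta(seconds=1)
        else:
            end = OPEN_END
        timeline.append((start, end))
    return timeline


# Load timestamps for station "Ut", taken from the two example tables.
valuation_starts = [datetime(2014, 1, 1, 22), datetime(2015, 1, 1, 22),
                    datetime(2016, 3, 1, 22)]
adres_starts = [datetime(2013, 6, 5, 22), datetime(2015, 7, 4, 22)]

timeline = build_bc_timeline(valuation_starts, adres_starts)
for start, end in timeline:
    print("Ut", start, end)
```

The five resulting periods correspond to the five Ut rows of the BC-Timeline table; each period can then be joined to exactly one version of every Concept Context.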

What makes PS-3C a different Ensemble?

1) Explicitly splitting the work

• Data + History
• Subjects + Integration

2) NO HASHED BUSINESS KEYS or surrogate keys

httpwwwcannabisculturecomfilesimages6hashbrickJPG

Only concatenated ones

3) Fewer joins

Relation + technical validity timeline + relation context: together in one entity

Photo credit: Public Domain

4) Explicit HELPER entities

• Business Alias
• Business Concept Timeline
• + explicitly define driving key(s)

Photo credit: Public Domain

Hope to AVOID this…

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me

'Head of BI', Spil Games

Certified Data Vault modeler since 2009

Contact details:
nl.linkedin.com/in/rogierwerschkull
rogierwerschkull@gmail.com
@rwerschkull



Page 45: PS-3C Data Modelling Zone Berlin

a UNIQUE Domain specific point of integration

hellipa business entity

hellipwithin itrsquos own domain

hellipdoes not necessarily need to be Enterprise Wide

Business Concept (BC)

rwerschkull

nllinkedincominrogierwerschkull

Why not lsquoenterprise widersquoCompany

Customer

Sales Customer

International

Sales Customer

Local

Sales Customer

Marketing Customer

Customer hellip

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

Most easy entity to be virtualised(if performance allows)

No Hashing amp No surrogateBUSINESSKEYS

(not by default at least)

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Containts Context about a Concept In a historical way

hellipLike a Data Vault Satellite

Every CC belongs to only one BC

Seperate entity per source system table stream

Concept Context (CC)

rwerschkull

nllinkedincominrogierwerschkull

Concept Context Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts and their sources:
[valuation]          Source NSS2, table p
[description]        Source NSS1, table x
[adres]              Source NSS1, table y
[description]        Source NTR, table q
[ovchip_personal]    Source NSR, table r
[ovchip_on-usage]    Source NSR, table s
[personal_details]   Source NSR, table t
[adres_data]         Source NSR, table t

Example-data: NSR-Station, [adres] (Source NSS1, table y)

BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                  …  …            …                  …
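
Storing context 'in a historical way' means every record carries its own validity window, so any past state can be looked up. The sketch below is only an illustration of that idea, not part of the PS-3C proposal itself: the tuple layout and the function name version_at are assumptions, and the rows are a subset of the columns in the example data above (dates read as day-month-year).

```python
from datetime import datetime

HIGH_DATE = datetime(9999, 12, 31)  # open-ended validity, as in the example data

# (BK_NSR-Station, Postadres_postcode, META_Load_dts, META_Load_end_dts)
adres_cc = [
    ("Ut", "3500GJ", datetime(2013, 6, 5, 22), datetime(2015, 6, 5, 21, 59, 59)),
    ("Ut", "3511 CE", datetime(2015, 6, 5, 22), HIGH_DATE),
    ("Asd", "1012 AB", datetime(2013, 6, 5, 22), HIGH_DATE),
]


def version_at(rows, business_key, moment):
    """Return the postcode that was valid for business_key at the given moment."""
    for bk, postcode, load_dts, load_end_dts in rows:
        if bk == business_key and load_dts <= moment <= load_end_dts:
            return postcode
    return None  # no version of this key was loaded at that moment


print(version_at(adres_cc, "Ut", datetime(2014, 1, 1)))  # 3500GJ
print(version_at(adres_cc, "Ut", datetime(2016, 1, 1)))  # 3511 CE
```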

CC important notes

More difficult to virtualise: depends on the semantic gap with the source

But do make it virtual when 'streaming data' is necessary

Because we have a PS layer, exposing all columns is not necessary

Refactoring is easier …

3C - Details

Connector (C)

Relations between Concepts + Context, in a historical way

… a merger of the Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time

Connector Driving key

Explicitly defining a driving key as metadata …

Gives business understanding

Makes it possible for a Connector to correctly handle delta data deliveries …
• so that a change (on the driving key) is not registered as a new 'connection'
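
One way to read the driving-key rule is sketched below. It is only an assumption about how a loader could behave, not the author's implementation: a delivery with a known driving key closes the current version of that connection instead of adding a second open one. The function name apply_delta, the dictionary column names and the corrected delivery are hypothetical. The end-dating convention (new load timestamp minus one second) follows the example data in this deck.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # open-ended validity


def apply_delta(rows, new_row, driving_key, load_dts):
    """Register new_row in a Connector history, honouring the driving key."""
    key = tuple(new_row[k] for k in driving_key)
    for row in rows:
        is_current = row["META_Load_end_dts"] == HIGH_DATE
        if is_current and tuple(row[k] for k in driving_key) == key:
            if all(row[k] == v for k, v in new_row.items()):
                return rows  # identical delivery: nothing to register
            # Same connection with changed content: close the old version.
            row["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
            break
    rows.append({**new_row, "META_Load_dts": load_dts,
                 "META_Load_end_dts": HIGH_DATE})
    return rows


driving_key = ("travelcard", "checkin")
movement = {"station_from": "Asd", "station_to": "Ut",
            "travelcard": "3528 0234 2073 1234",
            "checkin": datetime(2016, 4, 5, 8, 49, 32)}
history = apply_delta([], movement, driving_key, datetime(2016, 4, 6, 22))

# Hypothetical later delivery that corrects the destination station.
corrected = {**movement, "station_to": "Asa"}
apply_delta(history, corrected, driving_key, datetime(2016, 4, 7, 22))

current = [r for r in history if r["META_Load_end_dts"] == HIGH_DATE]
print(len(history), len(current))  # 2 1
```

Without the driving key, the corrected delivery would look like a brand-new combination of keys and both rows would stay open.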

Connector Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

[Diagram: the Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts, plus the Connector NSR-Travelmovement (Checkin timestamp; 'from' and 'to')]

Driving Key: NS-Travelcard + Checkin timestamp

Example-Data: NSR-Travelmovement

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No.

We add two mandatory HELPER entities

3C - Details

Business Alias

To help switch from sources that are tied together by technical (surrogate) keys …

… to a Business Key based model

It's a LOOKUP table that translates the technical key to the Business Key

Example

NSR-Station
[valuation]     Source NSS2, table p
[description]   Source NSS1, table x
[adres]         Source NSS1, table y

BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables

Example-data

BA-NSS1 (key lookup for NSS1 source tables)

Business Key   NSS1_Surrogate_key
Ut             123522
Asd            666323
Asa            222443
…              …

NSR-Station

Business Key   META_Source   META_Load_dts
Ut             NSS1_y        5-6-2015 22:00:00
Asd            NSS1_y        5-6-2015 22:00:00
Asa            NSS2_p        6-6-2015 22:00:00
…              …             …
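
A small sketch of how a load process might use such a lookup table. This is an assumption for illustration only: the alias is held as an in-memory dictionary filled with the BA-NSS1 example rows above, while the source column name station_id, the sample source rows and the function translate_keys are hypothetical.

```python
# NSS1_Surrogate_key -> Business Key, taken from the BA-NSS1 example data
ba_nss1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def translate_keys(source_rows, alias, key_column="station_id"):
    """Swap the technical key of each source row for its Business Key.

    Rows whose technical key is not in the alias are returned separately,
    so the caller can decide how to handle them.
    """
    translated, unresolved = [], []
    for row in source_rows:
        business_key = alias.get(row[key_column])
        if business_key is None:
            unresolved.append(row)
            continue
        out = {k: v for k, v in row.items() if k != key_column}
        out["BK_NSR-Station"] = business_key
        translated.append(out)
    return translated, unresolved


# Hypothetical rows as they might arrive from an NSS1 source table.
source_rows = [
    {"station_id": 123522, "postcode": "3500GJ"},
    {"station_id": 999999, "postcode": "unknown"},
]
translated, unresolved = translate_keys(source_rows, ba_nss1)
print(translated)  # [{'postcode': '3500GJ', 'BK_NSR-Station': 'Ut'}]
```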

BA important details

Has a 1-to-(0..1) relation with a Business Concept

More difficult to virtualise: the lookup table should be kept small

Therefore DO NOT do the key lookup in the Concept Context entity

Load / generate together with the BC

Preferably 'in memory' somehow …

3C - Details

BC-Timeline

Integrates the validity timelines of the Concept Contexts belonging to a Business Concept

Like a Data Vault Point-in-time construct

But mandatory

And with a clearly defined and performant approach

Example

NSR-Station
[valuation]   Source NSS2, table p
[adres]       Source NSS1, table y

[valuation] (Source NSS2, table p)

BK_NSR-Station  WOZ value   Value rating agency X  META_Laad_dts      META_Laad_eind_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

BCT (BC-Timeline)

BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00

[adres] (Source NSS1, table y)

BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                  …  …            …                  …
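
The combined timeline in this example can be reproduced by collecting every load timestamp of the Concept Contexts per business key and chaining them into consecutive intervals. The sketch below is one possible way to do that, not necessarily the 'clearly defined and performant approach' the slide refers to: build_timeline is a hypothetical name, and it assumes each Concept Context has a gap-free history, so only the start timestamps are needed.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # open-ended validity


def build_timeline(*contexts):
    """Merge (business_key, load_dts) pairs of several CCs into one timeline."""
    starts = {}
    for context in contexts:
        for business_key, load_dts in context:
            starts.setdefault(business_key, set()).add(load_dts)
    timeline = []
    for business_key, moments in starts.items():
        ordered = sorted(moments)
        for start, following in zip(ordered, ordered[1:] + [None]):
            # An interval ends one second before the next one starts.
            end = HIGH_DATE if following is None else following - timedelta(seconds=1)
            timeline.append((business_key, start, end))
    return timeline


# Load timestamps from the [valuation] and [adres] example data.
valuation = [("Ut", datetime(2014, 1, 1, 22)), ("Ut", datetime(2015, 1, 1, 22)),
             ("Ut", datetime(2016, 3, 1, 22))]
adres = [("Ut", datetime(2013, 6, 5, 22)), ("Ut", datetime(2015, 7, 4, 22)),
         ("Asd", datetime(2013, 6, 5, 22))]

for row in build_timeline(valuation, adres):
    print(row)
```

For 'Ut' this yields the five intervals of the BCT table, for 'Asd' a single open-ended one.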

What makes PS-3C a different Ensemble?

1) Explicitly splitting the work

Data + History

Subjects + Integration

2) NO HASHED BUSINESS KEYS or surrogate keys

Only concatenated ones

httpwwwcannabisculturecomfilesimages6hashbrickJPG
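
A minimal sketch of what a concatenated business key could look like, assuming the '|' separator seen in the NS-Trainseries example data ('IC|855'). The function name business_key and the rule that a part may not contain the separator are assumptions added for illustration, not part of the proposal.

```python
def business_key(*parts, sep="|"):
    """Concatenate the parts of a business key into one readable string."""
    cleaned = [str(part).strip() for part in parts]
    for part in cleaned:
        if sep in part:
            # Otherwise two different keys could collapse into the same string.
            raise ValueError(f"key part {part!r} contains the separator {sep!r}")
    return sep.join(cleaned)


print(business_key("IC", 855))  # IC|855
```

Unlike a hash, the result stays readable and can be split back into its parts.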

3) Fewer joins

Relation + Technical validity timeline + Relation context

Together in one entity

4) Explicit HELPER entities

Business Alias

Business Concept Timeline

+ explicitly define Driving key(s)

Hope to AVOID this …

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?



Page 48: PS-3C Data Modelling Zone Berlin

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

Most easy entity to be virtualised(if performance allows)

No Hashing amp No surrogateBUSINESSKEYS

(not by default at least)

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Containts Context about a Concept In a historical way

hellipLike a Data Vault Satellite

Every CC belongs to only one BC

Seperate entity per source system table stream

Concept Context (CC)

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

Example-dataBK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts META_Deleted_Ind

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0

Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0

Asa hellip hellip hellip hellip hellip hellip

NSR-Station

[adres]Source NSS1 table y

More difficult to be virtualisedDepends on semantic gap with source

But do make virtual when lsquostreaming datarsquo is necessary

Because we have PS layerExposing all columns not necessary

Refactoring is more easyhellip

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Relations between Concepts + Context

In a historical way

hellipMerger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point

in time

Connector (C)

rwerschkull

nllinkedincominrogierwerschkull

Explicitly defining a driving key as metadatahellip

Gives business understanding

Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)

bull is not registered as a new lsquoconnectionrsquo

Connector Driving key

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Connector Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

Driving Key

NS-Travelcard

+Checkin timestamp

Example-Data

NSR-StationNS-

Travelcard

NS-

Trainseries

rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

BK_NSR-

Station-from

BK_NSR-

Station-to

BK_NS_

Trainseries

BK_NS-Travelcard Checkin

timestamp

Checkout

timestamp

META_Load_dts META_Load_end_dts

Asd Ut IC 855 3528 0234 2073

1234

5-4-2016 84932 5-4-2016 94012 6-4-2016

220000

31-12-9999

000000

Ut Asd IC 855 3528 0234 2073

1234

5-4-2016 181009 5-4-2016 185520 6-4-2016

220000

31-12-9999

000000

We add Two mandatory

HELPEREntities

Are we there yet No

rwerschkull

nllinkedincominrogierwerschkull

modelling

3C - Details

[Diagram: the 3C overview. Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; helper entities Business Alias A, Business Alias B and BC-Timeline X]

Business Alias

Helps with switching from sources that are tied together by technical (surrogate) keys... to a Business Key based model

It's a LOOKUP table that translates the technical key to the Business Key

Example

[Diagram: NSR-Station with its Concept Contexts and Business Aliases]
• [valuation]: source NSS2, table p
• [description]: source NSS1, table x
• [adres]: source NSS1, table y
• BA-NSS1: key lookup for NSS1 source tables
• BA-NSS2: key lookup for NSS2 source tables

Example data

BA-NSS1 (key lookup for NSS1 source tables)

Business Key | NSS1_Surrogate_key
Ut           | 123522
Asd          | 666323
Asa          | 222443
...          | ...

NSR-Station

Business Key | META_Source | META_Load_dts
Ut           | NSS1_y      | 5-6-2015 22:00:00
Asd          | NSS1_y      | 5-6-2015 22:00:00
Asa          | NSS2_p      | 6-6-2015 22:00:00
...          | ...         | ...

BA important details

Has a 1-to-(0,1) relation with a Business Concept
More difficult to virtualise; the lookup table should be kept small
Therefore, DO NOT do the key lookup in the Concept Context entity
Load / generate together with the BC
Preferably 'in memory' somehow...
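As a rough sketch of the lookup idea, a Business Alias can be held as a small in-memory mapping from surrogate key to Business Key. The key values come from the BA-NSS1 example data; the function `to_business_key` and the column name `station_id` are assumptions made for illustration only.

```python
# Business Alias BA-NSS1: NSS1 surrogate key -> Business Key (values from the example data)
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(source_row, alias, surrogate_column="station_id"):
    """Return a copy of source_row keyed on the Business Key instead of the surrogate key."""
    row = dict(source_row)
    surrogate = row.pop(surrogate_column)
    # A KeyError here means the alias has not been loaded for this surrogate key yet
    row["BK_NSR-Station"] = alias[surrogate]
    return row


translated = to_business_key({"station_id": 123522, "postcode": "3500GJ"}, BA_NSS1)
```

Keeping the alias to just the key pair is what keeps the lookup table small enough to hold in memory.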

3C - Details

[Diagram: the 3C overview. Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; helper entities Business Alias A, Business Alias B and BC-Timeline X]

BC-Timeline

Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
Like a Data Vault Point-in-time construct
But mandatory
And with a clearly defined and performant approach

Example

NSR-Station [valuation] (source NSS2, table p)

BK_NSR-Station | WOZ value  | Value (rating agency X) | META_Laad_dts     | META_Laad_eind_dts
Ut             | 20 million | 18 million              | 1-1-2014 22:00:00 | 1-1-2015 21:59:59
Ut             | 22 million | 18 million              | 1-1-2015 22:00:00 | 1-3-2016 21:59:59
Ut             | 22 million | 23 million              | 1-3-2016 22:00:00 | 31-12-9999 00:00:00

BCT (BC-Timeline for NSR-Station)

BK_NSR-Station | Combined_Load_dts | Combined_Load_end_dts
Ut             | 5-6-2013 22:00:00 | 1-1-2014 21:59:59
Ut             | 1-1-2014 22:00:00 | 1-1-2015 21:59:59
Ut             | 1-1-2015 22:00:00 | 4-7-2015 21:59:59
Ut             | 4-7-2015 22:00:00 | 1-3-2016 21:59:59
Ut             | 1-3-2016 22:00:00 | 31-12-9999 00:00:00
Asd            | 5-6-2013 22:00:00 | 31-12-9999 00:00:00

NSR-Station [adres] (source NSS1, table y)

BK_NSR-Station | Postadres_postcode | GPS               | ... | META_source | META_Load_dts     | META_Load_end_dts
Ut             | 3500GJ             | 52.08954, 5.11064 | ... | NSS1_y      | 5-6-2013 22:00:00 | 4-7-2015 21:59:59
Ut             | 3511 CE            | 52.37269, 4.89299 | ... | NSS1_y      | 4-7-2015 22:00:00 | 31-12-9999 00:00:00
Asd            | 1012 AB            | 52.37269, 4.89299 | ... | NSS1_y      | 5-6-2013 22:00:00 | 31-12-9999 00:00:00
Asa            | ...                | ...               | ... | ...         | ...               | ...
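The combined timeline in the BCT example can be reproduced with a small sketch: collect every start timestamp of every Concept Context per Business Key, sort them, and close each interval one second before the next one starts. The function name `build_bc_timeline` and the tuple layout are assumptions; the deck does not prescribe an implementation.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # open-ended validity, as in the example data


def build_bc_timeline(*contexts):
    """Merge the start timestamps of several Concept Contexts into one timeline.

    Each context is a list of (business_key, load_dts) pairs.
    Returns (business_key, combined_load_dts, combined_load_end_dts) tuples.
    """
    starts = {}
    for context in contexts:
        for business_key, load_dts in context:
            starts.setdefault(business_key, set()).add(load_dts)
    timeline = []
    for business_key, stamps in starts.items():
        ordered = sorted(stamps)
        for i, start in enumerate(ordered):
            last = i + 1 == len(ordered)
            end = OPEN_END if last else ordered[i + 1] - timedelta(seconds=1)
            timeline.append((business_key, start, end))
    return timeline


def at_22(year, month, day):
    """Load timestamps in the example all fall at 22:00:00."""
    return datetime(year, month, day, 22, 0, 0)


# Start timestamps taken from the [valuation] and [adres] example tables
valuation = [("Ut", at_22(2014, 1, 1)), ("Ut", at_22(2015, 1, 1)), ("Ut", at_22(2016, 3, 1))]
adres = [("Ut", at_22(2013, 6, 5)), ("Ut", at_22(2015, 7, 4)), ("Asd", at_22(2013, 6, 5))]
timeline = build_bc_timeline(valuation, adres)
```

For the example data this yields five intervals for Ut and one for Asd, matching the BCT table.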

modelling

What makes PS-3C a different Ensemble?

[Diagram: the 3C overview. Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; helper entities Business Alias A, Business Alias B and BC-Timeline X]

1) Explicitly splitting the work

Data + History
Subjects + Integration

2) NO hashed Business Keys or surrogate keys

Only concatenated ones

Photo credit: httpwwwcannabisculturecomfilesimages6hashbrickJPG
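A concatenated Business Key stays human-readable, unlike a hash. The sketch below shows one possible way to build one; the function name, the `|` delimiter and the whitespace trimming are assumptions, since the deck only states that keys are concatenated rather than hashed.

```python
def concat_business_key(*parts, delimiter="|"):
    """Build a readable composite Business Key by concatenation (no hashing)."""
    cleaned = [str(part).strip() for part in parts]
    if any(delimiter in part for part in cleaned):
        # A delimiter inside a part would make two different keys look the same
        raise ValueError("key part contains the delimiter")
    return delimiter.join(cleaned)


# Driving key of the travel-movement example: travel card + check-in timestamp
key = concat_business_key("3528 0234 2073 1234", "2016-04-05 08:49:32")
```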

3) Fewer joins

Relation + technical validity timeline + relation context, together in one entity

Photo credit: Public Domain

4) Explicit HELPER entities

Business Alias
Business Concept Timeline (BC-Timeline)
+ explicitly defined driving key(s)

Photo credit: Public Domain

Hope to AVOID this...
http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me

'Head of BI', Spilgames
Certified Data Vault modeler since 2009
Contact details nllinkedincominrogierwerschkull
rogierwerschkullgmailcom
rwerschkull

Page 49: PS-3C Data Modelling Zone Berlin

BC important notes

Easiest entity to virtualise (if performance allows)
No hashing & no surrogate BUSINESS KEYS (not by default, at least)

3C - Details

[Diagram: the 3C overview. Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; helper entities Business Alias A, Business Alias B and BC-Timeline X]

Concept Context (CC)

Contains context about a Concept, in a historical way
... like a Data Vault Satellite
Every CC belongs to only one BC
Separate entity per source system table / stream

Concept Context metadata

Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag], OR register a delete as a new record

Example

[Diagram: Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts]

Concept Contexts shown:
• [valuation]: source NSS2, table p
• [description]: source NSS1, table x
• [adres]: source NSS1, table y
• [description]: source NTR, table q
• [ovchip_personal]: source NSR, table r
• [ovchip_on-usage]: source NSR, table s
• [personal_details]: source NSR, table t
• [adres_data]: source NSR, table t

Example data: NSR-Station [adres] (source NSS1, table y)

BK_NSR-Station | Postadres_postcode | GPS               | ... | META_source | META_Load_dts     | META_Load_end_dts   | META_Deleted_Ind
Ut             | 3500GJ             | 52.08954, 5.11064 | ... | NSS1_y      | 5-6-2013 22:00:00 | 5-6-2015 21:59:59   | 0
Ut             | 3511 CE            | 52.37269, 4.89299 | ... | NSS1_y      | 5-6-2015 22:00:00 | 31-12-9999 00:00:00 | 0
Asd            | 1012 AB            | 52.37269, 4.89299 | ... | NSS1_y      | 5-6-2013 22:00:00 | 31-12-9999 00:00:00 | 0
Asa            | ...                | ...               | ... | ...         | ...               | ...                 | ...
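The metadata slide lists [Load End Date Timestamp] as not mandatory for streaming data. One way to still arrive at the end timestamps shown in the example data is to derive them at read time from the load timestamps alone. The sketch below assumes that approach; the function `derive_end_dates`, the dictionary layout and the one-second gap are illustrative choices, not something the deck specifies.

```python
from datetime import datetime, timedelta
from itertools import groupby

OPEN_END = datetime(9999, 12, 31)  # open-ended validity, as in the example data


def derive_end_dates(rows):
    """Add 'load_end_dts' to insert-only rows that carry 'bk' and 'load_dts'."""
    ordered = sorted(rows, key=lambda r: (r["bk"], r["load_dts"]))
    result = []
    for _, group in groupby(ordered, key=lambda r: r["bk"]):
        versions = list(group)
        # Pair every version with its successor (None for the latest version)
        for current, following in zip(versions, versions[1:] + [None]):
            if following is None:
                end = OPEN_END
            else:
                end = following["load_dts"] - timedelta(seconds=1)
            result.append({**current, "load_end_dts": end})
    return result


rows = [
    {"bk": "Ut", "postcode": "3500GJ", "load_dts": datetime(2013, 6, 5, 22, 0, 0)},
    {"bk": "Ut", "postcode": "3511 CE", "load_dts": datetime(2015, 6, 5, 22, 0, 0)},
    {"bk": "Asd", "postcode": "1012 AB", "load_dts": datetime(2013, 6, 5, 22, 0, 0)},
]
derived = derive_end_dates(rows)
```

The result is ordered by Business Key and load timestamp, so the Asd row comes first, followed by the two Ut versions.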

CC important notes

More difficult to virtualise: depends on the semantic gap with the source
But do make it virtual when 'streaming data' is necessary
Because we have the PS layer, exposing all columns is not necessary
Refactoring is easier...

3C - Details

[Diagram: the 3C overview. Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; helper entities Business Alias A, Business Alias B and BC-Timeline X]

Connector (C)

Relations between Concepts + context, in a historical way
... a merger of the Data Vault Link and Link Satellite
Must ALWAYS have a driving key defined: a (sub)set of keys that makes a Connector unique at one point in time


Page 50: PS-3C Data Modelling Zone Berlin


Page 51: PS-3C Data Modelling Zone Berlin


Page 52: PS-3C Data Modelling Zone Berlin

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

Example-dataBK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts META_Deleted_Ind

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0

Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0

Asa hellip hellip hellip hellip hellip hellip

NSR-Station

[adres]Source NSS1 table y

More difficult to be virtualisedDepends on semantic gap with source

But do make virtual when lsquostreaming datarsquo is necessary

Because we have PS layerExposing all columns not necessary

Refactoring is more easyhellip

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

3C - Details

Diagram: Business Concept X (with Concept Context X-A and X-B), Business Concept Y (with Concept Context Y-A, Y-B and Y-C), Connector, Business Alias A, Business Alias B, BC-Timeline X

Connector (C)

• Relations between Concepts + Context
• In a historical way
• …a merger of the Data Vault Link + Link Satellite
• Must ALWAYS have a driving key defined: a (sub)set of keys that makes a Connector unique at one point in time

Connector Driving key

Explicitly defining a driving key as metadata…

• Gives business understanding
• Makes it possible for a Connector to correctly handle delta data deliveries… so that a change (on the driving key) is not registered as a new 'connection'
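The driving-key behaviour can be sketched in a few lines of Python. This is only one possible reading of the slide, not the author's implementation: the function name `apply_delta`, the list-of-dicts storage and the "end date = new load date minus one second" convention (borrowed from the example tables) are all assumptions, and the corrected destination station is made up for illustration.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # open-ended marker, as in the example tables


def apply_delta(connector_rows, delta_row, driving_key, load_dts):
    """Apply one delivered row to a Connector (a list of dicts), in place."""
    key = tuple(delta_row[col] for col in driving_key)
    for row in connector_rows:
        is_open = row["META_Load_end_dts"] == OPEN_END
        same_key = tuple(row[col] for col in driving_key) == key
        if is_open and same_key:
            if all(row[col] == value for col, value in delta_row.items()):
                return connector_rows  # unchanged: nothing to register
            # Changed: close the old version instead of keeping two open rows.
            row["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
            break
    connector_rows.append(
        {**delta_row, "META_Load_dts": load_dts, "META_Load_end_dts": OPEN_END}
    )
    return connector_rows


DRIVING_KEY = ("BK_NS-Travelcard", "Checkin_timestamp")
first = {
    "BK_NSR-Station-from": "Asd",
    "BK_NSR-Station-to": "Ut",
    "BK_NS-Travelcard": "3528 0234 2073 1234",
    "Checkin_timestamp": datetime(2016, 4, 5, 8, 49, 32),
}
corrected = {**first, "BK_NSR-Station-to": "Asa"}  # hypothetical correction

rows = apply_delta([], first, DRIVING_KEY, datetime(2016, 4, 6, 22))
rows = apply_delta(rows, corrected, DRIVING_KEY, datetime(2016, 4, 7, 22))
open_rows = [r for r in rows if r["META_Load_end_dts"] == OPEN_END]
```

After the second delivery the Connector holds two rows for the same driving key, but only one of them is still open, so the correction reads as a new version of the same connection rather than as a second travel movement.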

Connector Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system], on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag], OR register a delete as a new record

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller, with the same Concept Contexts as in the Concept Context example.

Connector: NSR-Travelmovement (from, to, Checkin timestamp)

Driving key: NS-Travelcard + Checkin timestamp

Example-data: Connector NSR-Travelmovement (between NSR-Station, NS-Travelcard and NS-Trainseries)

BK_NSR-Station-from | BK_NSR-Station-to | BK_NS_Trainseries | BK_NS-Travelcard | Checkin timestamp | Checkout timestamp | META_Load_dts | META_Load_end_dts
Asd | Ut | IC 855 | 3528 0234 2073 1234 | 5-4-2016 8:49:32 | 5-4-2016 9:40:12 | 6-4-2016 22:00:00 | 31-12-9999 00:00:00
Ut | Asd | IC 855 | 3528 0234 2073 1234 | 5-4-2016 18:10:09 | 5-4-2016 18:55:20 | 6-4-2016 22:00:00 | 31-12-9999 00:00:00

Are we there yet? No.

We add two mandatory HELPER entities.

modelling

3C - Details

Business Alias

• Helps switching from sources that are tied together by technical (surrogate) keys… to a Business Key based model
• It's a LOOKUP table that translates the technical key to the Business Key

Example

NSR-Station:
• [valuation], source NSS2 table p
• [description], source NSS1 table x
• [adres], source NSS1 table y

BA-NSS1: key lookup for NSS1 source tables

BA-NSS2: key lookup for NSS2 source tables

Example-data

BA-NSS1 (key lookup for NSS1 source tables)

Business Key | NSS1_Surrogate_key
Ut | 123522
Asd | 666323
Asa | 222443
… | …

NSR-Station

Business Key | META_Source | META_Load_dts
Ut | NSS1_y | 5-6-2015 22:00:00
Asd | NSS1_y | 5-6-2015 22:00:00
Asa | NSS2_p | 6-6-2015 22:00:00
… | … | …

BA important details

• Has a 1-to-(0,1) relation with a Business Concept
• More difficult to virtualise: the lookup table should be kept small
• Therefore DO NOT do the key lookup in the Concept Context entity
• Load / generate together with the BC
• Preferably 'in memory' somehow…
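As a minimal sketch of the lookup idea, a Business Alias can be thought of as an in-memory map from a source's surrogate key to the Business Key. The dictionary below reuses the BA-NSS1 example values; the function name `to_business_key` and the source column name `station_id` are hypothetical and only serve the illustration.

```python
# Hypothetical in-memory Business Alias for source NSS1:
# surrogate key -> Business Key (values taken from the BA-NSS1 example)
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(source_row, alias, surrogate_col="station_id"):
    """Return a copy of source_row keyed by Business Key instead of surrogate key."""
    row = dict(source_row)
    surrogate = row.pop(surrogate_col)
    # A KeyError here signals a surrogate key that the alias does not know yet.
    row["BK_NSR-Station"] = alias[surrogate]
    return row


nss1_row = {"station_id": 123522, "Postadres_postcode": "3500GJ"}
translated = to_business_key(nss1_row, BA_NSS1)
```

Keeping the translation in one small lookup, applied before the Concept Context is loaded, matches the advice above not to repeat the key lookup inside the Concept Context entity itself.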

3C - Details

BC-Timeline

• Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
• Like a Data Vault Point-in-time construct
• But mandatory
• And with a clearly defined and performant approach

Example

NSR-Station:
• [valuation], source NSS2 table p
• [adres], source NSS1 table y

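[valuation], source NSS2 table p

BK_NSR-Station | WOZ waarde | Waarde Ratingbureau X | META_Laad_dts | META_Laad_eind_dts
Ut | 20 million | 18 million | 1-1-2014 22:00:00 | 1-1-2015 21:59:59
Ut | 22 million | 18 million | 1-1-2015 22:00:00 | 1-3-2016 21:59:59
Ut | 22 million | 23 million | 1-3-2016 22:00:00 | 31-12-9999 00:00:00

(Dutch column names: WOZ waarde = WOZ value, the Dutch official property valuation; Waarde Ratingbureau X = value according to rating agency X; META_Laad_dts / META_Laad_eind_dts = load and load end timestamps.)

[adres], source NSS1 table y

BK_NSR-Station | Postadres_postcode | GPS | … | META_source | META_Load_dts | META_Load_end_dts
Ut | 3500GJ | 52.08954, 5.11064 | … | NSS1_y | 5-6-2013 22:00:00 | 4-7-2015 21:59:59
Ut | 3511 CE | 52.37269, 4.89299 | … | NSS1_y | 4-7-2015 22:00:00 | 31-12-9999 00:00:00
Asd | 1012 AB | 52.37269, 4.89299 | … | NSS1_y | 5-6-2013 22:00:00 | 31-12-9999 00:00:00
Asa | … | … | … | … | … | …

BCT (BC-Timeline for NSR-Station)

BK_NSR-Station | Combined_Load_dts | Combined_Load_end_dts
Ut | 5-6-2013 22:00:00 | 1-1-2014 21:59:59
Ut | 1-1-2014 22:00:00 | 1-1-2015 21:59:59
Ut | 1-1-2015 22:00:00 | 4-7-2015 21:59:59
Ut | 4-7-2015 22:00:00 | 1-3-2016 21:59:59
Ut | 1-3-2016 22:00:00 | 31-12-9999 00:00:00
Asd | 5-6-2013 22:00:00 | 31-12-9999 00:00:00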
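The combined timeline for station Ut can be reproduced with a short Python sketch. It assumes the BC-Timeline is simply the sorted union of all Concept Context load timestamps, with each period ending one second before the next one starts; the function name `bc_timeline` is hypothetical and the slides do not prescribe this particular algorithm.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # open-ended marker, as in the example tables


def bc_timeline(*context_start_lists):
    """Merge the load timestamps of several Concept Contexts into one timeline."""
    all_starts = sorted({ts for starts in context_start_lists for ts in starts})
    timeline = []
    for i, start in enumerate(all_starts):
        if i + 1 < len(all_starts):
            end = all_starts[i + 1] - timedelta(seconds=1)
        else:
            end = OPEN_END
        timeline.append((start, end))
    return timeline


# Load timestamps for station Ut, taken from the two example tables.
valuation_starts = [
    datetime(2014, 1, 1, 22),
    datetime(2015, 1, 1, 22),
    datetime(2016, 3, 1, 22),
]
adres_starts = [datetime(2013, 6, 5, 22), datetime(2015, 7, 4, 22)]

ut_timeline = bc_timeline(valuation_starts, adres_starts)
```

Under these assumptions `ut_timeline` contains the same five periods as the Ut rows of the BCT table, which is what lets a query pick the right version of every Concept Context with a single range lookup.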

modelling

What makes PS-3C a different Ensemble?

1) Explicitly splitting the work

• Data + History
• Subjects + Integration

2) No hashed Business Keys or surrogate keys

httpwwwcannabisculturecomfilesimages6hashbrickJPG

Only concatenated ones
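A concatenated Business Key can be sketched as follows. The helper name `concat_business_key` and the `|` separator are assumptions made for illustration; the slides only state that keys are concatenated rather than hashed.

```python
def concat_business_key(*parts, sep="|"):
    """Build one readable Business Key string from its parts, without hashing."""
    cleaned = [str(part).strip() for part in parts]
    if any(sep in part for part in cleaned):
        # A separator inside a part would make two different keys look equal.
        raise ValueError(f"key part contains the separator {sep!r}")
    return sep.join(cleaned)


# Driving key of the travel movement example: travelcard + check-in timestamp.
key = concat_business_key("3528 0234 2073 1234", "2016-04-05T08:49:32")
```

Unlike a hash, the resulting key stays human-readable and can be split back into its parts, at the cost of a longer join column.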

3) Fewer joins

Relation + technical validity timeline + relation context, together in one entity

Photo credit: Public Domain

4) Explicit HELPER entities

• Business Alias
• Business Concept Timeline (BC-Timeline)
• + explicitly define Driving key(s)

Photo credit: Public Domain

Hope to AVOID this…

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me

'Head of BI', Spilgames

Certified Data Vault modeler since 2009

Contact details: nl.linkedin.com/in/rogierwerschkull

rogierwerschkullgmailcom

rwerschkull

Page 53: PS-3C Data Modelling Zone Berlin


Page 54: PS-3C Data Modelling Zone Berlin


Page 55: PS-3C Data Modelling Zone Berlin


Page 56: PS-3C Data Modelling Zone Berlin


1) Explicitly Splitting The work

Data

+History

Subjects

+Integration

rwerschkull

nllinkedincominrogierwerschkull

2) NO HASHED BUSINeSS KEYS

or surrogate keys

httpwwwcannabisculturecomfilesimages6hashbrickJPG

Only

Concatenatedones

rwerschkull

nllinkedincominrogierwerschkull

3) Less

joinsRelation

+Technical validity timeline

+ Relation context

Together in one entityPhoto credit Public Domain

rwerschkull

nllinkedincominrogierwerschkull

4) Explicit

HelpEREntities

Business Alias

Business Component Timeline

+ explicitly define Driving key(s)

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

Hope to

AVOIDThishellip

httpxkcdcom927

PS-3CA new PROPOSEDensemble modelling technique

Help

needed

Questions

About Me

lsquoHead of BIrsquo Spillgames

Certified Data Vault modeler since 2009

Contact details nllinkedincominrogierwerschkull

rogierwerschkullgmailcom

rwerschkull

Page 57: PS-3C Data Modelling Zone Berlin

Connector (C)

Relations between Concepts + Context

In a historical way

…Merger of the Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time
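As a rough illustration of the Connector idea, the Python sketch below keeps the relation keys, the relation context and the technical validity in a single record and checks uniqueness on the driving key. The class name, the field names, the END_OF_TIME convention and the is_unique_at helper are assumptions made for this example, loosely based on the travel-movement example in this deck; they are not part of the PS-3C proposal itself.

```python
from dataclasses import dataclass
from datetime import datetime

# Open-ended rows get a far-future end timestamp (assumed convention).
END_OF_TIME = datetime(9999, 12, 31)


@dataclass(frozen=True)
class TravelMovement:
    """One Connector row: relation keys, context and validity in one entity."""
    station_from: str   # business key of the departure station
    station_to: str     # business key of the arrival station
    trainseries: str    # business key of the train series
    travelcard: str     # business key of the travel card
    checkin: datetime   # part of the driving key in this example
    checkout: datetime  # relation context
    load_dts: datetime  # technical validity: start
    load_end_dts: datetime = END_OF_TIME  # technical validity: end

    def driving_key(self) -> tuple:
        """The (sub)set of keys that identifies the row at one point in time."""
        return (self.travelcard, self.checkin)


def is_unique_at(rows, moment):
    """True if no two rows that are valid at `moment` share a driving key."""
    valid = [r for r in rows if r.load_dts <= moment < r.load_end_dts]
    keys = [r.driving_key() for r in valid]
    return len(keys) == len(set(keys))
```

A check like is_unique_at could run after every load to confirm that the chosen driving key really makes the Connector unique at any point in time.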

Page 58: PS-3C Data Modelling Zone Berlin

Connector Driving key

Explicitly defining a driving key as metadata…

Gives business understanding

Makes it possible for a Connector to correctly handle delta data deliveries…
• so that a change (on the driving key) is not registered as a new ‘connection’
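One possible reading of this delta handling, sketched in Python: when a delivered record has the same driving key as an open Connector row, the open row is closed and a new version is added, instead of the delivery being treated as an unrelated connection. The function name apply_delta, the dict-based rows and the column names are hypothetical and only serve the illustration.

```python
from datetime import datetime

# Open-ended rows get a far-future end timestamp (assumed convention).
END_OF_TIME = datetime(9999, 12, 31)
TECHNICAL = ("load_dts", "load_end_dts")


def apply_delta(rows, incoming, driving_key, load_dts):
    """Apply one delivered record to a list of Connector rows (dicts).

    rows: existing rows, each carrying 'load_dts' and 'load_end_dts'
    incoming: the delivered record, without load timestamps
    driving_key: names of the columns that form the driving key (metadata)
    """
    key = tuple(incoming[c] for c in driving_key)
    for row in rows:
        same_key = tuple(row[c] for c in driving_key) == key
        if same_key and row["load_end_dts"] == END_OF_TIME:
            payload = {k: v for k, v in row.items() if k not in TECHNICAL}
            if payload == incoming:
                return rows  # nothing changed, nothing to register
            row["load_end_dts"] = load_dts  # close the current version
    # New driving key, or a changed version of an existing one.
    rows.append({**incoming, "load_dts": load_dts, "load_end_dts": END_OF_TIME})
    return rows
```

Because the driving key is passed in as metadata, the same routine could serve every Connector; only the list of key columns differs.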

Page 59: PS-3C Data Modelling Zone Berlin

Connector Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Page 60: PS-3C Data Modelling Zone Berlin

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts:
[valuation] Source NSS2 table p
[description] Source NSS1 table x
[adres] Source NSS1 table y
[description] Source NTR table q
[ovchip_personal] Source NSR table r
[ovchip_on-usage] Source NSR table s
[personal_details] Source NSR table t
[adres_data] Source NSR table t

Connector: NSR-Travelmovement (Checkin timestamp, from, to)

Driving Key: NS-Travelcard + Checkin timestamp

Page 61: PS-3C Data Modelling Zone Berlin

Example-Data

Connector NSR-Travelmovement (Checkin timestamp, from, to), relating NSR-Station, NS-Travelcard and NS-Trainseries

BK_NSR-Station-from | BK_NSR-Station-to | BK_NS_Trainseries | BK_NS-Travelcard    | Checkin timestamp | Checkout timestamp | META_Load_dts     | META_Load_end_dts
Asd                 | Ut                | IC 855            | 3528 0234 2073 1234 | 5-4-2016 8:49:32  | 5-4-2016 9:40:12   | 6-4-2016 22:00:00 | 31-12-9999 00:00:00
Ut                  | Asd               | IC 855            | 3528 0234 2073 1234 | 5-4-2016 18:10:09 | 5-4-2016 18:55:20  | 6-4-2016 22:00:00 | 31-12-9999 00:00:00

Page 62: PS-3C Data Modelling Zone Berlin

Are we there yet? No.

We add two mandatory HELPER entities

3C - Details

[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; and the helper entities Business Alias A, Business Alias B and BC-Timeline X]

To help switching from sources that are Tied together by technical (surrogate) keyshellip

To a Business Key based model

Itrsquos a LOOKUP table that translates the technical to the

Business Key

Business Alias

Example

NSR-Station

[valuation]Source NSS2 table p

[description]SourceNSS1 table x[adres]

Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

BA-NSS2

Key Lookup voor NSS1 source tables

Key Lookup voor NSS2 source tables

Example-data

NSR-Station

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

Key Lookup voor NSS1 source tables

Business Key NSS1_Surrogate_key

Ut 123522

Asd 666323

Asa 222443

hellip hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

Has a 1 on (01) relation with a Business Concept

More difficult to be virtualised Lookup table should be kept small

Therefore DO NOT do key lookup in Concept Context entity

Load generate together with BC

Preferably lsquoin memoryrsquo somehowhellip

BA important Details

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Integrate the validity timelines of Concept Contextsbelonging to a Business Concept

Like a Data Vault Point-in-time construct

But Mandatory

And with a clearly defined and performantapproach

BC-Timeline

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-Station

[valuation]Source NSS2 table p

[adres]Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BK_NSR-

Station

WOZ

waarde

Waarde

Ratingbureau X

META_Laad_dts META_Laad_eind_dts

Ut 20 milj 18 milj 1-1-2014 220000 1-1-2015 215959

Ut 22 milj 18 milj 1-1-2015 220000 1-3-2016 215959

Ut 22 milj 23 milj 1-3-2016 220000 31-12-9999 000000

BK_NSR-

Station

Combined_Load_dts Combined_Load_end_dts

Ut 5-6-2013 220000 1-1-2014 215959

Ut 1-1-2014 220000 1-1-2015 215959

Ut 1-1-2015 220000 4-7-2015 215959

Ut 4-7-2015 220000 1-3-2016 215959

Ut 1-3-2016 220000 31-12-9999 000000

Asd 5-6-2013 220000 31-12-9999 000000

BK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 4-7-2015 215959

Ut 3511 CE 5237269 489299 hellip NSS1_y 4-7-2015 220000 31-12-9999 000000

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000

Asa hellip hellip hellip hellip hellip hellip

BCT

modelling

rwerschkull

nllinkedincominrogierwerschkull

What

Makes PS-3Ca Different Ensemble

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

rwerschkull

nllinkedincominrogierwerschkull

1) Explicitly Splitting The work

Data

+History

Subjects

+Integration

rwerschkull

nllinkedincominrogierwerschkull

2) NO HASHED BUSINeSS KEYS

or surrogate keys

httpwwwcannabisculturecomfilesimages6hashbrickJPG

Only

Concatenatedones

rwerschkull

nllinkedincominrogierwerschkull

3) Less

joinsRelation

+Technical validity timeline

+ Relation context

Together in one entityPhoto credit Public Domain

rwerschkull

nllinkedincominrogierwerschkull

4) Explicit

HelpEREntities

Business Alias

Business Component Timeline

+ explicitly define Driving key(s)

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

Hope to

AVOIDThishellip

httpxkcdcom927

PS-3CA new PROPOSEDensemble modelling technique

Help

needed

Questions

About Me

lsquoHead of BIrsquo Spillgames

Certified Data Vault modeler since 2009

Contact details nllinkedincominrogierwerschkull

rogierwerschkullgmailcom

rwerschkull

Page 63: PS-3C Data Modelling Zone Berlin

modelling


Page 64: PS-3C Data Modelling Zone Berlin

3C - Details

[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; Connector; Business Alias A; Business Alias B; BC-Timeline X]


Page 65: PS-3C Data Modelling Zone Berlin

Business Alias

Helps to switch from sources that are tied together by technical (surrogate) keys… to a business-key-based model.

It's a lookup table that translates the technical key to the business key.


Page 66: PS-3C Data Modelling Zone Berlin

Example: NSR-Station

[valuation] Source NSS2 table p
[description] Source NSS1 table x
[address] Source NSS1 table y

BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables


Page 67: PS-3C Data Modelling Zone Berlin

Example data

BA-NSS1: key lookup for NSS1 source tables

Business Key | NSS1_Surrogate_key
Ut | 123522
Asd | 666323
Asa | 222443
… | …

NSR-Station

Business Key | META_Source | META_Load_dts
Ut | NSS1_y | 5-6-2015 22:00:00
Asd | NSS1_y | 5-6-2015 22:00:00
Asa | NSS2_p | 6-6-2015 22:00:00
… | … | …
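The key lookup a Business Alias performs can be sketched with the BA-NSS1 example values. This is a minimal in-memory illustration, not the author's implementation: the function name `to_business_key`, the column name `station_sk` and the shape of the source row are assumptions made for the example.

```python
# Business Alias for source system NSS1: surrogate key -> business key,
# using the three example rows of the BA-NSS1 table.
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(row: dict, alias: dict, sk_column: str) -> dict:
    """Return a copy of a source row with the surrogate key replaced by the business key."""
    out = {k: v for k, v in row.items() if k != sk_column}
    # A surrogate key missing from the alias raises KeyError, so unknown
    # keys surface during loading instead of being passed on silently.
    out["business_key"] = alias[row[sk_column]]
    return out


# Hypothetical source row from an NSS1 table that only knows the surrogate key.
source_row = {"station_sk": 123522, "postcode": "3511 CE"}
print(to_business_key(source_row, BA_NSS1, "station_sk"))
```

Holding the alias as a plain dictionary mirrors the wish to keep the lookup table small and preferably in memory.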


Page 68: PS-3C Data Modelling Zone Berlin

BA: important details

Has a 1-to-(0..1) relation with a Business Concept

More difficult to virtualise; the lookup table should be kept small

Therefore do NOT do the key lookup in the Concept Context entity

Load/generate together with the BC

Preferably 'in memory' somehow…


Page 69: PS-3C Data Modelling Zone Berlin

3C - Details

[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; Connector; Business Alias A; Business Alias B; BC-Timeline X]


Page 70: PS-3C Data Modelling Zone Berlin

BC-Timeline

Integrates the validity timelines of the Concept Contexts belonging to a Business Concept

Like a Data Vault point-in-time construct

But mandatory

And with a clearly defined and performant approach

Example

NSR-Station

[valuation]Source NSS2 table p

[adres]Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BK_NSR-

Station

WOZ

waarde

Waarde

Ratingbureau X

META_Laad_dts META_Laad_eind_dts

Ut 20 milj 18 milj 1-1-2014 220000 1-1-2015 215959

Ut 22 milj 18 milj 1-1-2015 220000 1-3-2016 215959

Ut 22 milj 23 milj 1-3-2016 220000 31-12-9999 000000

BK_NSR-

Station

Combined_Load_dts Combined_Load_end_dts

Ut 5-6-2013 220000 1-1-2014 215959

Ut 1-1-2014 220000 1-1-2015 215959

Ut 1-1-2015 220000 4-7-2015 215959

Ut 4-7-2015 220000 1-3-2016 215959

Ut 1-3-2016 220000 31-12-9999 000000

Asd 5-6-2013 220000 31-12-9999 000000

BK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 4-7-2015 215959

Ut 3511 CE 5237269 489299 hellip NSS1_y 4-7-2015 220000 31-12-9999 000000

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000

Asa hellip hellip hellip hellip hellip hellip

BCT

modelling

rwerschkull

nllinkedincominrogierwerschkull

What

Makes PS-3Ca Different Ensemble

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

rwerschkull

nllinkedincominrogierwerschkull

1) Explicitly Splitting The work

Data

+History

Subjects

+Integration

rwerschkull

nllinkedincominrogierwerschkull

2) NO HASHED BUSINeSS KEYS

or surrogate keys

httpwwwcannabisculturecomfilesimages6hashbrickJPG

Only concatenated ones
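A concatenated business key could be built along the lines below. This is a minimal sketch, not a PS-3C specification: the function name `concat_business_key`, the `|` delimiter, the whitespace trimming and the example key parts are all assumptions made for illustration, since the slide only states that keys are concatenated rather than hashed or replaced by surrogates.

```python
def concat_business_key(*parts, delimiter="|"):
    """Join business key parts into one readable key (no hash, no surrogate).

    The delimiter and the trimming rule are assumptions of this sketch.
    """
    cleaned = []
    for part in parts:
        # Treat a missing part as an empty string and trim stray whitespace.
        text = "" if part is None else str(part).strip()
        # A part containing the delimiter would make the key ambiguous.
        if delimiter in text:
            raise ValueError(f"key part {text!r} contains the delimiter {delimiter!r}")
        cleaned.append(text)
    return delimiter.join(cleaned)


# Hypothetical two-part key: a country code plus the station code 'Ut'.
station_key = concat_business_key("NL", " Ut ")
print(station_key)  # NL|Ut
```

Because the key stays human-readable, it can be compared and joined on directly, at the cost of wider key columns than a fixed-length hash would give.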

3) Fewer joins

Relation + technical validity timeline + relation context

Together in one entity

Photo credit: Public Domain

4) Explicit helper entities

Business Alias

Business Component Timeline

+ explicitly define driving key(s)

Photo credit: Public Domain

Hope to avoid this…

http://xkcd.com/927

PS-3C

A new PROPOSED ensemble modelling technique

Help needed

Questions

About Me

'Head of BI', Spil Games

Certified Data Vault modeler since 2009

Contact details: nl.linkedin.com/in/rogierwerschkull

rogierwerschkullgmailcom

rwerschkull
