1 Pivotal Confidential–Internal Use Only Modern Data Architecture Alexey Grishchenko
About me
Enterprise Architect @ Pivotal
• 7 years in data processing
• 5 years with MPP
• 4 years with Hadoop
• Spark contributor
• http://0x0fff.com
How it started…
Front End
How it started…
Front End
Back End
How it started…
Front End
Back End
DBMS
How it started…
Front End
Back End
DBMS
What about BI?
How it started…
Front End
Back End
DBMS
Just put it there!
How it started…
Front End
Back End
DBMS
BI
How it started…
Front End
Back End
DBMS
BI
Was it fast?
How it started…
Front End
10ms
Back End
DBMS
BI
100ms
200ms
1-2 min
How it started…
Front End
10ms
Back End
DBMS
BI
100ms
200ms
1-2 min
yes, single server…
First Issues
Front End
10ms
Back End
DBMS
BI
100ms
200ms
1-2 min
More users got workstations
First Issues
Front End
10ms
Back End
DBMS
BI
400ms
800ms
1-2 min
First Issues
Front End
10ms
Back End
DBMS
BI
400ms
800ms
1-2 min
Split!
First Issues
Front End
10ms
Back End
DBMS
BI
300ms
600ms
1-2 min
First Issues
Front End
10ms
Back End
DBMS
BI
300ms
600ms
1-2 min
Even more users?
First Issues
Front End
10ms
Back End
DBMS
BI
300ms
600ms
1-2 min
Split!
First Issues
Front End
10ms
Back End
DBMS
BI
100ms
400ms
1-2 min
Front End
Back End
Front End
Back End
First Issues
Front End
10ms
Back End
DBMS
BI
100ms
400ms
1-2 min
Front End
Back End
Front End
Back End
What about automated systems?
First Issues
Front End
10ms
Back End
DBMS
BI
100ms
1 sec
5-10 min
Front End
Back End
Front End
Back End
Front End
Back End
Front End
Back End
First Issues
Front End
10ms
Back End
DBMS
BI
100ms
1 sec
5-10 min
Front End
Back End
Front End
Back End
Front End
Back End
Front End
Back End
Database, please, live!
First Issues
Front End
10ms
Back End
DBMS
BI
100ms
800ms
15-20 min
Front End
Back End
Front End
Back End
Front End
Back End
Front End
Back End
First Issues
Front End
10ms
Back End
DBMS
BI
100ms
800ms
15-20 min
Front End
Back End
Front End
Back End
Front End
Back End
Front End
Back End
What if “split” didn’t help this time?
First Issues
Front End
10ms
Back End
DBMS
BI
100ms
800ms
15-20 min
Front End
Back End
Front End
Back End
Front End
Back End
Front End
Back End
Split more! Eventually it will help…
First Issues
Front End
10ms
Back End
DBMS
BI
100ms
300ms
35-40 min
Front End
Back End
Front End
Back End
Front End
Back End
Front End
Back End
DBMS DBMS DBMS DBMS
First Issues
Front End
10ms
Back End
DBMS
BI
100ms
300ms
35-40 min
Front End
Back End
Front End
Back End
Front End
Back End
Front End
Back End
DBMS DBMS DBMS DBMS
Sales went 10% up!
First Issues
Front End
10ms
Back End
DBMS
BI
100ms
300ms
35-40 min
Front End
Back End
Front End
Back End
Front End
Back End
Front End
Back End
DBMS DBMS DBMS DBMS
Sales went 10% up!
Sales went 20% down!
First Issues
Front End
10ms
Back End
DBMS
BI
100ms
600ms
2-3 hrs
Front End
Back End
Front End
Back End
Front End
Back End
Front End
Back End
DBMS DBMS DBMS DBMS
Sales went 10% up!
Sales went 20% down!
First Issues
Front End
10ms
Back End
DBMS
BI
100ms
600ms
2-3 hrs
Front End
Back End
Front End
Back End
Front End
Back End
Front End
Back End
DBMS DBMS DBMS DBMS
Sales went 10% up!
Sales went 20% down!
Stop loading my system with your stupid reports!
BI
The Era of Data Warehouse
100ms
DBMS 300ms
2 days
FE BE
DBMS DBMS DBMS DBMS
FE BE
FE BE
FE BE
FE BE
ETL
DWH 1 day
BI
The Era of Data Warehouse
100ms
DBMS 300ms
2 days
FE BE
DBMS DBMS DBMS DBMS
FE BE
FE BE
FE BE
FE BE
ETL
DWH 1 day
We need more reports!
BI
The Era of Data Warehouse
100ms
DBMS 300ms
3-4 days
FE BE
DBMS DBMS DBMS DBMS
FE BE
FE BE
FE BE
FE BE
ETL
DWH 1 day
Data Mining OLAP …
BI
The Era of Data Warehouse
100ms
DBMS 300ms
3-4 days
FE BE
DBMS DBMS DBMS DBMS
FE BE
FE BE
FE BE
FE BE
ETL
DWH 1 day
Data Mining OLAP …
We need a secondary site!
The Era of Data Warehouse
100ms
300ms
3-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
ETL
DWH 1 day
BI Data Mining OLAP …
The Era of Data Warehouse
100ms
300ms
3-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
ETL
DWH 1 day
BI Data Mining OLAP …
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
WAL Replication
3-5 minutes late
The Era of Data Warehouse
100ms
300ms
3-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
ETL
DWH 1 day
BI Data Mining OLAP …
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
WAL Replication
3-5 minutes late
Where is our DWH? We need this data now!
ETL
The Era of Data Warehouse
100ms
300ms
3-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
ETL
DWH 1 day
BI Data Mining OLAP …
FE BE
FE BE
FE BE
FE BE
FE BE
WAL Replication
3-5 minutes late
NAS NAS Backup / Restore
3 days late DWH
BI Data Mining OLAP …
5-7 days
DBMS DBMS DBMS DBMS DBMS
ETL
The Era of Data Warehouse
100ms
300ms
3-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
ETL
DWH 1 day
BI Data Mining OLAP …
FE BE
FE BE
FE BE
FE BE
FE BE
WAL Replication
3-5 minutes late
NAS NAS Backup / Restore
3 days late DWH
BI Data Mining OLAP …
5-7 days
DBMS DBMS DBMS DBMS DBMS
Why is this data so old?
ETL
Advanced Architecture – ELT
100ms
300ms
3-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
ETL
DWH 1 day
BI Data Mining OLAP …
FE BE
FE BE
FE BE
FE BE
FE BE
WAL Replication
3-5 minutes late
NAS NAS Backup / Restore
3 days late DWH
BI Data Mining OLAP …
5-7 days
DBMS DBMS DBMS DBMS DBMS
DBMS DBMS DBMS …
ETL
DDS
Data Marts Reports
Aggregates
OLAP
DBMS DBMS DBMS …
ELT
DDS
Data Marts Reports
Aggregates
OLAP
ODS ODS ODS …
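The ELT flow in this diagram (sources land untransformed in the ODS, and the warehouse itself then builds the DDS and the data marts) can be sketched roughly as follows. This is only an illustration: SQLite stands in for the DWH, and the table names, columns and the `run_elt` function are hypothetical, not taken from the slides.

```python
import sqlite3


def run_elt(source_rows):
    """Load raw rows first, then transform them with SQL inside the warehouse."""
    dwh = sqlite3.connect(":memory:")  # stand-in for the real DWH (assumption)

    # Extract + Load: land the source rows untransformed in the ODS.
    dwh.execute(
        "CREATE TABLE ods_orders (order_id INTEGER, region TEXT, amount_cents INTEGER)"
    )
    dwh.executemany("INSERT INTO ods_orders VALUES (?, ?, ?)", source_rows)

    # Transform: cleanse into the detailed layer (DDS) using the warehouse engine.
    dwh.execute(
        """
        CREATE TABLE dds_orders AS
        SELECT order_id, UPPER(TRIM(region)) AS region, amount_cents
        FROM ods_orders
        WHERE amount_cents > 0
        """
    )

    # Aggregate the DDS into a data mart that reports can query directly.
    dwh.execute(
        """
        CREATE TABLE mart_sales_by_region AS
        SELECT region, SUM(amount_cents) AS total_cents
        FROM dds_orders
        GROUP BY region
        """
    )
    return dict(dwh.execute("SELECT region, total_cents FROM mart_sales_by_region"))
```

In the ETL variant the cleansing step would run in a separate tool before anything reaches the warehouse; here the raw data is loaded first and the warehouse does the work.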
ELT
Advanced Architecture – ELT
100ms
300ms
3-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
ELT
DWH 1 day
BI Data Mining OLAP …
FE BE
FE BE
FE BE
FE BE
FE BE
WAL Replication
3-5 minutes late
NAS NAS Backup / Restore
3 days late DWH
BI Data Mining OLAP …
5-7 days
DBMS DBMS DBMS DBMS DBMS
ELT
Advanced Architecture – CDC
100ms
300ms
3-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
ELT
DWH 1 day
BI Data Mining OLAP …
FE BE
FE BE
FE BE
FE BE
FE BE
WAL Replication
3-5 minutes late
NAS NAS Backup / Restore
3 days late DWH
BI Data Mining OLAP …
5-7 days
DBMS DBMS DBMS DBMS DBMS
DBMS DBMS DBMS …
ELT
DDS
Data Marts Reports
Aggregates
OLAP
ODS ODS ODS …
DBMS DBMS DBMS …
ELT
DDS
Data Marts Reports
Aggregates
OLAP
ODS ODS ODS …
CDC
1 day
1 hour
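The idea behind CDC, shipping only the changes since the last sync instead of reloading whole tables, can be sketched as below. The change-log format (position, operation, key, row) and the `apply_changes` function are assumptions made for illustration; real CDC tools read the database's own log in a product-specific format.

```python
def apply_changes(replica, change_log, last_applied_lsn):
    """Apply log entries newer than last_applied_lsn and return the new position.

    replica is a dict keyed by primary key; change_log is a list of
    (lsn, op, key, row) tuples ordered by lsn (hypothetical format).
    """
    for lsn, op, key, row in change_log:
        if lsn <= last_applied_lsn:
            continue  # already applied in an earlier cycle
        if op == "delete":
            replica.pop(key, None)
        else:  # "insert" or "update"
            replica[key] = row
        last_applied_lsn = lsn
    return last_applied_lsn
```

Because each cycle only touches the rows that changed, the cycle can run far more often than a full reload, which is what moves the lag from about a day toward about an hour in the diagram.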
ELT CDC
Advanced Architecture – CDC
100ms
300ms
1-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
ELT
DWH 3-24 hrs
BI Data Mining OLAP …
FE BE
FE BE
FE BE
FE BE
FE BE
WAL Replication
3-5 minutes late
NAS NAS Backup / Restore
3 days late
BI Data Mining OLAP …
4-7 days
DBMS DBMS DBMS DBMS DBMS
CDC
DWH
ELT CDC
Advanced Architecture – CDC
100ms
300ms
1-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
ELT
DWH 3-24 hrs
BI Data Mining OLAP …
FE BE
FE BE
FE BE
FE BE
FE BE
WAL Replication
3-5 minutes late
NAS NAS Backup / Restore
3 days late
BI Data Mining OLAP …
4-7 days
DBMS DBMS DBMS DBMS DBMS
CDC
DWH
Why is our secondary site’s DWH so old?
ELT CDC
100ms
300ms
1-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
ELT
DWH 3-24 hrs
BI Data Mining OLAP …
FE BE
FE BE
FE BE
FE BE
FE BE
WAL Replication
3-5 minutes late
NAS NAS Backup / Restore
3 days late
BI Data Mining OLAP …
4-7 days
DBMS DBMS DBMS DBMS DBMS
CDC
DWH
Moving Forward
ELT CDC
100ms
300ms
1-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
ELT
DWH 3-24 hrs
BI Data Mining OLAP …
FE BE
FE BE
FE BE
FE BE
FE BE
WAL Replication
3-5 minutes late
NAS NAS Backup / Restore
3 days late
BI Data Mining OLAP …
4-7 days
DBMS DBMS DBMS DBMS DBMS
CDC
DWH
Our problems are
Moving Forward
ELT CDC
100ms
300ms
1-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
ELT
DWH 3-24 hrs
BI Data Mining OLAP …
FE BE
FE BE
FE BE
FE BE
FE BE
WAL Replication
3-5 minutes late
NAS NAS Backup / Restore
3 days late
BI Data Mining OLAP …
4-7 days
DBMS DBMS DBMS DBMS DBMS
CDC
DWH
Our problems are
• Time to action is up to 7 days
Moving Forward
ELT CDC
100ms
300ms
1-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
ELT
DWH 3-24 hrs
BI Data Mining OLAP …
FE BE
FE BE
FE BE
FE BE
FE BE
WAL Replication
3-5 minutes late
NAS NAS Backup / Restore
3 days late
BI Data Mining OLAP …
4-7 days
DBMS DBMS DBMS DBMS DBMS
CDC
DWH
Our problems are
• Time to action is up to 7 days
• Amount of data is growing
Moving Forward
ELT CDC
100ms
300ms
1-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
ELT
DWH 3-24 hrs
BI Data Mining OLAP …
FE BE
FE BE
FE BE
FE BE
FE BE
WAL Replication
3-5 minutes late
NAS NAS Backup / Restore
3 days late
BI Data Mining OLAP …
4-7 days
DBMS DBMS DBMS DBMS DBMS
CDC
DWH
Our problems are
• Time to action is up to 7 days
• Amount of data is growing
• DWH MPP storage is expensive
Moving Forward
ELT CDC
Modern Architectures
100ms
300ms
1-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
ELT
DWH 3-24 hrs
BI Data Mining OLAP …
FE BE
FE BE
FE BE
FE BE
FE BE
WAL Replication
3-5 minutes late
NAS NAS Backup / Restore
3 days late
BI Data Mining OLAP …
4-7 days
DBMS DBMS DBMS DBMS DBMS
CDC
DWH
Our problems are
• Time to action is up to 7 days
• Amount of data is growing
• DWH MPP storage is expensive
Data Lake
ELT CDC
Modern Architectures
100ms
300ms
1-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
ELT
DWH 3-24 hrs
BI Data Mining OLAP …
FE BE
FE BE
FE BE
FE BE
FE BE
WAL Replication
3-5 minutes late
NAS NAS Backup / Restore
3 days late
BI Data Mining OLAP …
4-7 days
DBMS DBMS DBMS DBMS DBMS
CDC
DWH
Our problems are
• Time to action is up to 7 days
• Amount of data is growing
• DWH MPP storage is expensive
Lambda
Data Lake
ELT CDC
Modern Architectures – Data Lake
100ms
300ms
1-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
ELT
DWH 3-24 hrs
BI Data Mining OLAP …
FE BE
FE BE
FE BE
FE BE
FE BE
WAL Replication
3-5 minutes late
NAS NAS Backup / Restore
3 days late
BI Data Mining OLAP …
4-7 days
DBMS DBMS DBMS DBMS DBMS
CDC
DWH
Hadoop
DBMS DBMS DBMS …
ELT
DDS
OLAP Data Marts
Aggregates
Reports
ODS ODS ODS …
CDC
DWH ODS UDS
Analytical Archives
BI Data Mining OLAP
SQL-on-Hadoop
Data Mining At Scale
ELT CDC
Modern Architectures – Data Lake
100ms
300ms
1-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
DBMS
FE BE
ELT
DWH 3-24 hrs
BI Data Mining OLAP …
FE BE
FE BE
FE BE
FE BE
FE BE
WAL Replication
3-5 minutes late
NAS NAS Backup / Restore
3 days late
BI Data Mining OLAP …
4-7 days
DBMS DBMS DBMS DBMS DBMS
CDC
DWH
ELT CDC
Modern Architectures – Data Lake
100ms
300ms
1-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH 3-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
Data Mining BI OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
?
ELT CDC
Modern Architectures – Lambda
100ms
300ms
1-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH 3-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
Data Mining BI OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
?
Source Data
Speed Layer Batch Layer
Serving Layer
Query Query
Master Dataset
Batch View
Batch View
Batch View
Real-time View
Real-time View
Real-time View
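The three Lambda layers named in this diagram can be sketched in a few lines. The page-view counting workload, the event shape and all function names are hypothetical; the sketch only shows how a query combines a batch view with a real-time view.

```python
from collections import Counter


def build_batch_view(master_dataset):
    """Batch layer: recompute the view from the full master dataset."""
    return Counter(event["page"] for event in master_dataset)


def update_realtime_view(realtime_view, event):
    """Speed layer: fold in one event the batch view has not seen yet."""
    realtime_view[event["page"]] += 1


def query(batch_view, realtime_view, page):
    """Serving layer: answer by merging the batch and real-time views."""
    return batch_view[page] + realtime_view[page]
```

After each batch recomputation the real-time view would be reset for the events the batch view now covers; that bookkeeping is left out of the sketch.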
ELT CDC
Modern Architectures – Lambda
100ms
300ms
1-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH 3-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
Data Mining BI OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
?
In-Memory Data Store
ELT CDC
Modern Architectures – Lambda
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
In-Memory Data Store
ELT CDC
Modern Architectures
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
Our problems are
In-Memory Data Store
ELT CDC
Modern Architectures
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
Our problems are
• Too many standby systems
In-Memory Data Store
ELT CDC
Modern Architectures
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
Our problems are
• Too many standby systems
• How to replicate a Hadoop cluster?
In-Memory Data Store
ELT CDC
Modern Architectures
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
Our problems are
• Too many standby systems
• How to replicate a Hadoop cluster?
• How to sync data in real-time systems?
In-Memory Data Store
ELT CDC
Modern Architectures
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
Our problems are
• Too many standby systems
• How to replicate a Hadoop cluster?
• How to sync data in real-time systems?
• How to better sync the DWH?
In-Memory Data Store
ELT CDC
Modern Architectures
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
Our problems are
• Too many standby systems
• How to replicate a Hadoop cluster?
• How to sync data in real-time systems?
• How to better sync the DWH?
Pipelining
In-Memory Data Store
ELT CDC
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
Modern Data Architecture – Pipelining
In-Memory Data Store
ELT CDC
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
Modern Data Architecture – Pipelining
FE
App
App
App
… HTTP
In-Memory Data Store
ELT CDC
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
Modern Data Architecture – Pipelining
FE
App
App
App
… HTTP
BE
Srv
Srv
Srv
… SOAP
In-Memory Data Store
ELT CDC
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
Modern Data Architecture – Pipelining
FE
App
App
App
… HTTP
BE
Srv
Srv
Srv
… SOAP
OLTP
SP JDBC
Table
In-Memory Data Store
ELT CDC
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
Modern Data Architecture – Pipelining
FE
App
App
App
… HTTP
BE
Srv
Srv
Srv
… SOAP
OLTP
SP JDBC
Log
Table
In-Memory Data Store
ELT CDC
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
Modern Data Architecture – Pipelining
FE
App
App
App
… HTTP
BE
Srv
Srv
Srv
… SOAP
OLTP
SP JDBC
Log
Table
CDC
copy Parse
Batch
In-Memory Data Store
ELT CDC
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
Modern Data Architecture – Pipelining
FE
App
App
App
… HTTP
BE
Srv
Srv
Srv
… SOAP
OLTP
SP JDBC
Log
Table
CDC
copy Parse
Batch
ETL
cp Batch
ETL
In-Memory Data Store
ELT CDC
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
Modern Data Architecture – Pipelining
FE
App
App
App
… HTTP
BE
Srv
Srv
Srv
… SOAP
OLTP
SP JDBC
Log
Table
CDC
copy Parse
Batch
ETL
cp Batch
ETL load
ODS
DWH
In-Memory Data Store
ELT CDC
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
Modern Data Architecture – Pipelining
FE
App
App
App
… HTTP
BE
Srv
Srv
Srv
… SOAP
OLTP
SP JDBC
Log
Table
CDC
copy Parse
Batch
ETL
cp Batch
ETL load
ODS
DDS
DWH
In-Memory Data Store
ELT CDC
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
Modern Data Architecture – Pipelining
FE
App
App
App
… HTTP
BE
Srv
Srv
Srv
… SOAP
OLTP
SP JDBC
Log
Table
CDC
copy Parse
Batch
ETL
cp Batch
ETL load
ODS
DDS
Data Mart
DWH
In-Memory Data Store
ELT CDC
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
Modern Data Architecture – Pipelining
FE
BI
App
App
App
… HTTP
BE
Srv
Srv
Srv
… SOAP
OLTP
SP JDBC
Log
Table
CDC
copy Parse
Batch
ETL
cp Batch
ETL load
ODS
DDS
Data Mart
DWH
JDBC
In-Memory Data Store
ELT CDC
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
Modern Data Architecture – Pipelining
FE
BI
App
App
App
… HTTP
BE
Srv
Srv
Srv
…
OLTP
SP JDBC
Log
Table
CDC
copy Parse
Batch
ETL
cp Batch
ETL
ODS
DDS
Data Mart
DWH
JDBC
In-Memory Data Store
ELT CDC
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
Modern Data Architecture – Pipelining
FE
BI
App
App
App
… HTTP
BE
Srv
Srv
Srv
…
OLTP
SP JDBC
Log
Table
CDC
copy Parse
Batch
load
ODS
DDS
Data Mart
DWH
JDBC
API
Queue ETL
ETL Batch
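The queue step in this pipeline (the back end publishes events through an API, and a downstream consumer groups them into batches for loading) might look roughly like the sketch below. It uses Python's in-process `queue.Queue` as a stand-in for a real message broker; the function names, the `None` end-of-stream marker and the batch size are assumptions for illustration.

```python
import queue
import threading


def consume_in_batches(q, batch_size, load):
    """Drain events from the queue and hand them to load() in fixed-size batches."""
    batch = []
    while True:
        event = q.get()
        if event is None:  # sentinel: the producer has finished
            break
        batch.append(event)
        if len(batch) == batch_size:
            load(batch)
            batch = []
    if batch:
        load(batch)  # flush the final partial batch


def run_pipeline(events, batch_size=2):
    """Publish events to a queue and collect the batches the consumer loads."""
    q = queue.Queue()
    loaded = []  # stand-in for the downstream store (assumption)
    consumer = threading.Thread(
        target=consume_in_batches, args=(q, batch_size, loaded.append)
    )
    consumer.start()
    for event in events:
        q.put(event)  # the back end publishes and moves on
    q.put(None)
    consumer.join()
    return loaded
```

The producer never waits for the load to finish, which is the point of putting a queue between the back end and the analytical stores.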
In-Memory Data Store
ELT CDC
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
Modern Data Architecture – Pipelining
FE
BI
App
App
App
… HTTP
BE
Srv
Srv
Srv
…
OLTP
SP JDBC
Log
Table
CDC
copy Parse
Batch
load
ODS
DDS
Data Mart
DWH
JDBC
API
Queue ETL
ETL Batch
load ETL
In-Memory Data Store
ELT CDC
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
Modern Data Architecture – Pipelining
FE
BI
App
App
App
… HTTP
BE
Srv
Srv
Srv
…
OLTP
SP JDBC
Log
Table
CDC
copy Parse
Batch
load
ODS
DDS
Data Mart
DWH
JDBC
API
Queue ETL
ETL Batch App
ETL Batch
load
load ETL
In-Memory Data Store
ELT CDC
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
Modern Data Architecture – Pipelining
FE
BI
App
App
App
… HTTP
BE
Srv
Srv
Srv
…
OLTP
SP JDBC
Log
Table
CDC
copy Parse
Batch
load
ODS
DDS
Data Mart
DWH
JDBC
API
Queue ETL
ETL Batch App
ETL Batch
load
load ETL
STG
Batch App
Hadoop
HDFS SQL On
Hadoop
In-Memory Data Store
ELT CDC
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
Modern Data Architecture – Pipelining
FE
BI
App
App
App
… HTTP
BE
Srv
Srv
Srv
…
OLTP
SP JDBC
Log
Table
CDC
copy Parse
Batch
load
ODS
DDS
Data Mart
DWH
JDBC
API
Queue ETL
ETL Batch App
ETL Batch
load
load ETL
STG
Batch App
Hadoop
HDFS SQL On
Hadoop
RTI App
In-Memory Data Store
ELT CDC
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
Modern Data Architecture – Pipelining
FE
BI
App
App
App
… HTTP
BE
Srv
Srv
Srv
…
OLTP
SP JDBC
Log
Table
CDC
copy Parse
Batch
load
ODS
DDS
Data Mart
DWH
JDBC
API
Queue ETL
ETL Batch App
ETL Batch
load
load ETL
STG
Batch App
Hadoop
HDFS SQL On
Hadoop
RTI App Replicate
In-Memory Data Store
ELT CDC
100ms
300ms
0-4 days
FE BE
DBMS DBMS
FE BE
DBMS
FE BE
ELT
DWH
0-24 hrs
OLAP Data Mining BI …
FE BE
FE BE
FE BE
NAS NAS Backup / Restore
2 days late
OLAP …
3-6 days
DBMS DBMS DBMS WAL Replication
3-5 minutes late
CDC
DWH Hadoop Hadoop
? In-Memory Data Store
RTDM BI Data Mining
Modern Data Architecture – Pipelining
ELT CDC
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
OLAP Data Mining RTBI …
FE
BE
FE
BE
FE
BE
CDC
Hadoop In-Memory Data Store
BI
Modern Data Architecture – Pipelining
Replication Queue 3-5 minutes late
In-Memory Data Store
OLAP …
DWH Hadoop
BI Data Mining RTBI
DBMS DBMS DBMS WAL Replication
3-5 minutes late
Pivotal and Modern Data Architecture
BI
Pivotal Cloud Foundry
HTTP
FE
…
App
App
App
Queue BE
…
App
App
App
Pivotal GemFire
App
Spring XD
Streaming
Streaming
Data
Pivotal HD
Pivotal HAWQ
ES
DDS
Data Mart
Pivotal Greenplum
Data Mart PostgreSQL
SP Table
ODS
ETL
ETL
Pivotal and Modern Data Architecture
BI
HTTP
Pivotal GemFire
App
Spring XD
Streaming
Streaming
Data
Pivotal HD
Pivotal HAWQ
ES
DDS
Data Mart
Pivotal Greenplum
Data Mart PostgreSQL
SP Table
ODS
ETL
ETL
Pivotal Cloud Foundry
FE
…
App
App
App
Queue BE
…
App
App
App
• Pivotal Labs – agile software development for next-generation applications
• Pivotal Cloud Foundry – PaaS for customer applications
• RabbitMQ – distributed message queue service on top of PCF
• Spring IO – foundation platform for modern applications
Pivotal and Modern Data Architecture
BI
Pivotal Cloud Foundry
HTTP
FE
…
App
App
App
Queue BE
…
App
App
App
Spring XD
Streaming
Streaming
Data
Pivotal HD
Pivotal HAWQ
ES
DDS
Data Mart
Pivotal Greenplum
Data Mart PostgreSQL
SP Table
ODS
ETL
ETL
Pivotal GemFire
App
Pivotal GemFire and Apache Geode (incubating) – in-memory data grid enabling real-time data processing and real-time decision making for enterprises
Pivotal and Modern Data Architecture
BI
Pivotal Cloud Foundry
HTTP
FE
…
App
App
App
Queue BE
…
App
App
App
Pivotal GemFire
App
Streaming
Data
Pivotal HD
Pivotal HAWQ
ES
DDS
Data Mart
Pivotal Greenplum
Data Mart PostgreSQL
SP Table
ODS
ETL
ETL
Spring XD
Streaming
Spring XD – unified, distributed and extensible framework for data pipelining: ingesting, batching, processing and exporting
Pivotal and Modern Data Architecture
BI
Pivotal Cloud Foundry
HTTP
FE
…
App
App
App
Queue BE
…
App
App
App
Pivotal GemFire
App
Spring XD
Streaming
ES
DDS
Data Mart
Pivotal Greenplum
PostgreSQL
SP Table
ODS
ETL
ETL
Streaming
Data
Pivotal HD
Pivotal HAWQ
Data Mart
• Pivotal HD – leading Hadoop distribution based on ODP
• Pivotal HAWQ and Apache HAWQ (incubating) – bringing the power of MPP to the Hadoop cluster, best-in-class SQL-on-Hadoop solution
• Apache Spark – component of the Pivotal HD distribution, modern framework for distributed data processing
Pivotal and Modern Data Architecture
BI
Pivotal Cloud Foundry
HTTP
FE
…
App
App
App
Queue BE
…
App
App
App
Pivotal GemFire
App
Spring XD
Streaming
Streaming
Data
Pivotal HD
Pivotal HAWQ
ES
DDS
Data Mart
Pivotal Greenplum
Data Mart
ODS
ETL
ETL
PostgreSQL
SP Table
• Pivotal PostgreSQL – open source distribution of PostgreSQL, commercially supported by Pivotal
Pivotal and Modern Data Architecture
BI
Pivotal Cloud Foundry
HTTP
FE
…
App
App
App
Queue BE
…
App
App
App
Pivotal GemFire
App
Spring XD
Streaming
Streaming
Data
Pivotal HD
Pivotal HAWQ
Data Mart PostgreSQL
SP Table
ETL
ETL
ES
DDS
Data Mart
Pivotal Greenplum
ODS
Pivotal Greenplum – leading analytical MPP database, foundation for the enterprise data warehousing systems and advanced analytics
Pivotal and Modern Data Architecture Pivotal GemFire
App
Spring XD
Streaming
BI
Pivotal Cloud Foundry
HTTP
FE
…
App
App
App
Queue BE
…
App
App
App
Streaming
Data
Pivotal HD
Pivotal HAWQ
ES
DDS
Data Mart
Pivotal Greenplum
Data Mart PostgreSQL
SP Table
ODS
ETL
ETL
Data Lake
Pivotal and Modern Data Architecture
Pivotal Cloud Foundry
HTTP
FE
…
App
App
App
Queue BE
…
App
App
App
Spring XD
Streaming
ES
DDS
Data Mart
Pivotal Greenplum
PostgreSQL
SP Table
ODS
ETL
ETL
Pivotal GemFire
App
Streaming
Data
Pivotal HD
Pivotal HAWQ
Data Mart
BI
Lambda Architecture
Pivotal and Modern Data Architecture
ES
DDS
Data Mart
Pivotal Greenplum
PostgreSQL
SP Table
ODS
ETL
ETL
Pivotal Cloud Foundry
HTTP
FE
…
App
App
App
Queue BE
…
App
App
App
Streaming
Pivotal HD BI
Pivotal GemFire
App
Spring XD
Streaming
Data
Pivotal HAWQ
Data Mart
Pipelining
Questions?
BUILT FOR THE SPEED OF BUSINESS