BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH
Architecture of Big Data Solutions
Guido Schmutz
Frankfurt, 13.12.2017
@gschmutz guidoschmutz.wordpress.com
Guido Schmutz
Working at Trivadis for more than 20 years
Oracle ACE Director for Fusion Middleware and SOA
Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data
Head of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: [email protected]
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
Architecture of Big Data Solutions
Agenda
1. Introduction
2. Big Data & Fast Data Reference Architectures
3. Continuous Streaming Data Ingestion
4. Big Data & Cloud
5. Microservices Architecture
6. Big Data Ecosystem – many choices sorted!
Introduction
Big Data Definition (4 Vs)
+ Time to action? – Big Data + Real-Time = Stream Processing
Characteristics of Big Data: its Volume, Velocity and Variety in combination
[Figure: traditional flow diagram. Sources: Location, Social, Clickstream, Billing & Ordering, CRM / Profile, Marketing Campaigns. Processing: ETL / Stored Procedures, Enterprise Data Warehouse, Data Marts / Aggregations. Outputs: Segmentation & Churn Analysis, BI Tools, Marketing Offers]
Traditional Flow Diagram - Challenges
[Figure: the traditional flow diagram, annotated with its challenges]
• Limited processing power
• Data does not map easily to a traditional database schema
• Storage scaling is very expensive
• Analysis based on samples / limited data
• Loss in fidelity
• Other / new data sources with high volume and velocity
Big Data to the rescue?
Why are structure and architecture important?
Why talk about Big Data Architectures?
• Choosing the right architecture is key for any (big data) project
• Big Data is still a rather young field and therefore a “moving target”
  • there are no standard architectures that have been in use for years
• In recent years, some architectures and best practices have emerged
• Know your use cases before choosing your architecture / technologies
• Having a reference architecture in place helps in choosing the right / matching technologies
Big Data & Fast Data Reference Architectures
Big Data Architecture
[Figure: sources (Billing & Ordering, CRM / Profile, Marketing Campaigns) loaded via File Import / SQL Import into a Big Data Cluster with Storage (Raw, Refined, Results) and Parallel Processing (• Machine Learning • Graph Algorithms • Natural Language Processing). Consumers: BI Tools and Enterprise Data Warehouse (SQL), Online & Mobile Apps (Search / Explore)]
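The Raw / Refined / Results split of the Big Data Cluster storage can be illustrated with a small, single-process Python sketch. It assumes raw records arrive as text lines in a hypothetical `customer_id,amount` format, that "refining" means parsing and dropping malformed records, and that the result is a per-customer total; in a real cluster each step would run as a parallel job over distributed storage.

```python
from collections import defaultdict

# Raw zone: data kept exactly as ingested (hypothetical "customer_id,amount" lines)
raw_zone = [
    "c1,10.0",
    "c2,5.5",
    "c1,not-a-number",  # malformed: stays in raw, dropped during refinement
    "c2,4.5",
]

def refine(raw_records):
    """Refined zone: parsed, typed and cleaned records."""
    refined = []
    for line in raw_records:
        parts = line.split(",")
        if len(parts) != 2:
            continue  # wrong number of fields
        customer_id, amount = parts
        try:
            refined.append((customer_id, float(amount)))
        except ValueError:
            continue  # amount is not a number
    return refined

def aggregate(refined_records):
    """Results zone: consumer-ready aggregates (total amount per customer)."""
    totals = defaultdict(float)
    for customer_id, amount in refined_records:
        totals[customer_id] += amount
    return dict(totals)

refined_zone = refine(raw_zone)
results_zone = aggregate(refined_zone)
print(results_zone)  # {'c1': 10.0, 'c2': 10.0}
```

Because the raw zone is never modified, the refinement step can be re-run later with improved parsing rules without re-ingesting the sources.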
Big Data Architecture - Hadoop
[Figure: the Big Data Architecture diagram with its components mapped to Hadoop technologies (product logos not recoverable)]
Big Data Architecture - Spark
[Figure: the Big Data Architecture diagram with its components mapped to Spark technologies (product logos not recoverable)]
Event Hub for handling streaming data
[Figure: sources (Location, Social, Clickstream, Sensor Data, Billing & Ordering, CRM / Profile, Marketing Campaigns, Call Center, Mobile Apps, Weather Data) publish into an Event Hub; Data Flow from the Event Hub into the Big Data Cluster (Storage: Raw, Refined, Results; Parallel Processing: • Machine Learning • Graph Algorithms • Natural Language Processing). Consumers: BI Tools and Enterprise Data Warehouse (SQL), Online & Mobile Apps (Search / Explore)]
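The Event Hub decouples producers from consumers: sources append events to a topic, and each consumer reads at its own pace. The following minimal in-memory Python sketch assumes a Kafka-style append-only log with one read offset per consumer group; the `Topic` class and the group names are hypothetical and not the API of any particular product.

```python
class Topic:
    """Append-only event log; each consumer group tracks its own read offset."""

    def __init__(self):
        self._log = []
        self._offsets = {}  # consumer group -> offset of the next event to read

    def publish(self, event):
        self._log.append(event)
        return len(self._log) - 1  # offset assigned to the new event

    def poll(self, group, max_events=10):
        start = self._offsets.get(group, 0)
        batch = self._log[start:start + max_events]
        self._offsets[group] = start + len(batch)  # commit progress for this group
        return batch

clicks = Topic()
for page in ["/home", "/products", "/checkout"]:
    clicks.publish({"type": "click", "page": page})

# Two independent consumers read the same events, each at its own pace
stream_batch = clicks.poll("stream-analytics")
big_data_first = clicks.poll("big-data-loader", max_events=2)
big_data_rest = clicks.poll("big-data-loader")
```

Events are not removed when read, so a slow consumer (for example a batch loader into the Big Data Cluster) does not hold back a fast one (for example stream analytics).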
[Figure: the Event Hub diagram repeated, annotated "high latency"]
“Data at Rest” vs. “Data in Motion”
[Figure: "Data at Rest" (Big Data Cluster) vs. "Data in Motion" (Stream Processing Cluster)]
Streaming Analytics Architecture
[Figure: sources publish into the Event Hub; Data Flow into a Stream Processing Cluster running Stream Analytics with Reference / Models for • Low Latency Processing • Alerting • "Real-Time" Dashboard. Results feed a Dashboard, Search / Explore for Online & Mobile Apps, BI Tools and the Enterprise Data Warehouse]
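Low-latency processing and alerting in a stream processing cluster typically means aggregating events over time windows as they arrive. Below is a minimal single-node Python sketch, assuming sensor events of the form `(timestamp_seconds, sensor_id, value)` that arrive in timestamp order, a hypothetical 60-second tumbling window and a hypothetical alert threshold. Real stream processors additionally handle late events, state checkpointing and distribution.

```python
WINDOW_SECONDS = 60      # hypothetical tumbling window size
ALERT_THRESHOLD = 30.0   # hypothetical average value that triggers an alert

class TumblingAverage:
    """Incremental per-sensor average over tumbling event-time windows.

    Assumes events arrive in timestamp order (no late-event handling).
    """

    def __init__(self, on_result):
        self.on_result = on_result
        self.current_window = None
        self.sums = {}
        self.counts = {}

    def _flush(self):
        # Emit one result per sensor for the window that just closed
        for sensor_id, total in self.sums.items():
            self.on_result(self.current_window, sensor_id, total / self.counts[sensor_id])
        self.sums.clear()
        self.counts.clear()

    def on_event(self, ts, sensor_id, value):
        window_start = ts - ts % WINDOW_SECONDS
        if self.current_window is not None and window_start != self.current_window:
            self._flush()  # a new window started, so the previous one is complete
        self.current_window = window_start
        self.sums[sensor_id] = self.sums.get(sensor_id, 0.0) + value
        self.counts[sensor_id] = self.counts.get(sensor_id, 0) + 1

    def close(self):
        if self.current_window is not None:
            self._flush()

dashboard = []  # every window result, e.g. for a "real-time" dashboard
alerts = []     # only results above the threshold

def handle_result(window_start, sensor_id, avg):
    dashboard.append((window_start, sensor_id, avg))
    if avg > ALERT_THRESHOLD:
        alerts.append((window_start, sensor_id, avg))

operator = TumblingAverage(handle_result)
events = [(0, "s1", 20.0), (30, "s1", 24.0), (45, "s2", 35.0), (61, "s1", 31.0), (90, "s1", 33.0)]
for ts, sensor, value in events:
    operator.on_event(ts, sensor, value)
operator.close()
```

Each result is produced as soon as its window closes, rather than after data has been stored and processed in a batch job.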
Streaming Analytics Architecture – Open Source
[Figure: the Streaming Analytics Architecture mapped to open-source components (product logos not recoverable), annotated "low latency without keeping raw data/events"]
Keep raw event data
[Figure: sources publish into the Event Hub, which feeds both an Event Processing Cluster (Stream Analytics with Reference / Models, Results to a Dashboard) and a Big Data Cluster (Storage: Raw, Refined, Results; Parallel Processing), alongside File Import / SQL Import. Consumers: BI Tools, Enterprise Data Warehouse, Search / Explore, Online & Mobile Apps]
“Lambda Architecture” for Big Data
[Figure: sources publish into the Event Hub, which feeds an Event Processing Cluster (Stream Analytics with Reference / Models, Results to a Dashboard) and a Big Data Cluster (Storage: Raw, Refined, Results; Parallel Processing), plus File Import / SQL Import. Consumers via SQL and Search: BI Tools, Enterprise Data Warehouse, Search / Explore, Online & Mobile Apps]
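The Lambda Architecture processes the same events twice: a batch layer periodically recomputes complete views from all raw data, a speed layer covers the events that arrived since the last batch run, and queries merge both. The following Python sketch illustrates this with page-view counts. It assumes everything fits in memory and that `reset()` is called exactly when a new batch view has absorbed the speed layer's events, which real systems must coordinate carefully.

```python
from collections import Counter

def batch_view(master_dataset):
    """Batch layer: recompute page-view counts from all raw events (complete, high latency)."""
    return Counter(event["page"] for event in master_dataset)

class SpeedLayer:
    """Speed layer: incrementally count events not yet covered by the last batch run."""

    def __init__(self):
        self.view = Counter()

    def on_event(self, event):
        self.view[event["page"]] += 1

    def reset(self):
        # Call once a recomputed batch view includes these events
        self.view = Counter()

def query(page, batch, speed):
    """Serving layer: merge batch and real-time views at query time."""
    return batch[page] + speed.view[page]

master_dataset = [{"page": "/home"}, {"page": "/home"}, {"page": "/cart"}]
batch = batch_view(master_dataset)

speed = SpeedLayer()
new_event = {"page": "/home"}
master_dataset.append(new_event)  # raw event is also kept for the next batch run
speed.on_event(new_event)

home_views = query("/home", batch, speed)  # 2 from batch + 1 from speed layer
```

The price of this design is two code paths (batch and streaming) that must produce consistent results for the same logic.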
“Kappa Architecture” for Big Data
[Figure: Kappa Architecture diagram with sources, Event Hub, Event Processing Cluster (Stream Analytics, Reference / Models, Results, Dashboard), Big Data Cluster (Storage: Raw, Refined, Results; Parallel Processing), File Import / SQL Import and consumers (SQL, Search, BI Tools, Enterprise Data Warehouse, Search / Explore, Online & Mobile Apps)]
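The Kappa Architecture drops the separate batch layer: the event log is retained, all views are built by a single stream-processing code path, and reprocessing means replaying the log from the beginning with the new code version into a new view. Below is a minimal in-memory Python sketch under those assumptions; `build_view` and the two update functions are hypothetical names chosen for illustration.

```python
from collections import Counter

# Retained event log (the Event Hub topic acts as the system of record)
event_log = [
    {"page": "/home", "user": "u1"},
    {"page": "/cart", "user": "u1"},
    {"page": "/home", "user": "u2"},
]

def build_view(log, from_offset, update, view=None):
    """Single stream-processing code path: fold events from an offset into a view."""
    view = Counter() if view is None else view
    for event in log[from_offset:]:
        update(view, event)
    return view, len(log)  # updated view and the offset to resume from

def count_per_page(view, event):  # version 1 of the processing logic
    view[event["page"]] += 1

def count_per_user(view, event):  # version 2, after requirements changed
    view[event["user"]] += 1

# Normal operation: process incrementally, remembering the offset
view_v1, offset = build_view(event_log, 0, count_per_page)
event_log.append({"page": "/home", "user": "u1"})
view_v1, offset = build_view(event_log, offset, count_per_page, view_v1)

# Reprocessing: replay the whole log with the new code into a new view,
# then switch readers over to it
view_v2, _ = build_view(event_log, 0, count_per_user)
```

This only works if the Event Hub retains events long enough to replay the history that the new view needs.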
“Unified Architecture” for Big Data
[Figure: sources publish into the Event Hub; a Big Data Cluster combines Batch Analytics (Storage: Raw, Refined, Results; Parallel Processing) and Streaming Analytics (Stream Analytics with Reference / Models, NoSQL), plus File Import / SQL Import. Consumers: Dashboard, BI Tools and Enterprise Data Warehouse (SQL), Search / Explore and Online & Mobile Apps (Search)]
Continuous Streaming Data Ingestion
Continuous Data Ingestion
[Figure: the Unified Architecture diagram repeated as context for data ingestion]
Continuous Streaming Data Ingestion
[Figure: ingestion paths from sources into Event Hub topics, which feed Stream Processing and Big Data. Sources: DB Source, File Source, Log, IoT Sensor, Social. Integration options shown: CDC, Log CDC, CDC GW, Connect, Dataflow, Dataflow GW, IoT GW, Message GW (with Queue), REST, Native]
Continuous Streaming Data Ingestion
• SQL Polling
• Change Data Capture (CDC)
• File Polling
• File Stream (File Tailing)
• File Stream (Appender)
• Sensor Stream
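SQL polling repeatedly queries a source table for rows newer than the last one seen. Below is a minimal Python sketch using the standard-library `sqlite3` module. It assumes a hypothetical `orders` table whose monotonically increasing `id` column serves as the high-water mark; the `published` list stands in for sending rows to an Event Hub topic, and a real poller would run `poll_new_rows` in a loop with a sleep interval.

```python
import sqlite3

def poll_new_rows(conn, last_id):
    """SQL polling: fetch rows added since the last seen id (high-water mark)."""
    rows = conn.execute(
        "SELECT id, customer, amount FROM orders WHERE id > ? ORDER BY id",
        (last_id,),
    ).fetchall()
    new_last_id = rows[-1][0] if rows else last_id
    return rows, new_last_id

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY AUTOINCREMENT, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 10.0), ("bob", 20.0)],
)

published = []  # stand-in for publishing to an Event Hub topic
last_id = 0

rows, last_id = poll_new_rows(conn, last_id)  # first poll picks up alice and bob
published.extend(rows)

conn.execute("INSERT INTO orders (customer, amount) VALUES (?, ?)", ("carol", 5.0))
rows, last_id = poll_new_rows(conn, last_id)  # next poll only returns carol
published.extend(rows)
```

Unlike Change Data Capture, which reads changes from the database's transaction log, this kind of polling only sees what the query returns at poll time: updates to existing rows and deletes are missed unless the table carries something like a last-modified column, and every poll adds query load on the source database.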
Big Data & Cloud
Data Locality vs. Compute/Storage Separation
[Figure: Data Local Compute vs. Separate Compute and Storage. Left: a Master Node and Workers #1 to #3, each with its own Disk and Processing, connected over the Network. Right: a Master Node and Compute nodes #1 to #3 with Processing only, connected over the Network to a separate Storage layer of disks]
Separation of compute and storage – the fundamental difference:
• store data in object storage instead of a distributed file system (DFS)
• bring up compute nodes only for data processing
• multiple workloads on separate clusters can access the same data
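The separation of compute and storage can be reduced to a small Python sketch: data lives in a shared store, and each workload only needs compute while its job runs. The `ObjectStore` class is a hypothetical in-memory stand-in for an S3-style bucket (not a real client API), and `run_job` only models the lifetime of a short-lived compute cluster; it does not provision anything.

```python
import json

class ObjectStore:
    """Hypothetical in-memory stand-in for shared object storage: key -> bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data: bytes):
        self._objects[key] = data

    def get(self, key) -> bytes:
        return self._objects[key]

    def list(self, prefix):
        return sorted(k for k in self._objects if k.startswith(prefix))

def run_job(store, prefix, job):
    """Models a short-lived compute cluster: read shared data, compute, go away."""
    records = [json.loads(store.get(key)) for key in store.list(prefix)]
    return job(records)

store = ObjectStore()
store.put("sales/part-0000.json", json.dumps({"region": "CH", "amount": 100}).encode())
store.put("sales/part-0001.json", json.dumps({"region": "DE", "amount": 250}).encode())

# Two different workloads, as if on separate clusters, read the same data
total = run_job(store, "sales/", lambda rs: sum(r["amount"] for r in rs))
regions = run_job(store, "sales/", lambda rs: sorted({r["region"] for r in rs}))
```

Since neither job owns the data, either one can be scaled, stopped or replaced without copying or moving what is stored.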
A New Way to Manage Big Data
Big Data: Traditional Assumptions
• Bare-metal
• Data locality
• HDFS on local disks
Big Data: A New Approach
• Containers and VMs
• Compute and storage separation
• Shared storage
Benefits and Value
• Big-Data-as-a-Service
• Agility and cost savings
• Faster time-to-insights
Big Data & Cloud - Amazon Web Services (AWS)
[Figure: the Unified Architecture diagram mapped to AWS services (product logos not recoverable)]
Microservices Architecture
Asynchronous Microservice Architecture
[Figure: a Microservice Cluster (microservices, each with its own State and API) and a Stream Analytics Cluster (Stream Processor with State and API) communicate via Event Streams through the Event Hub; a Service consumes these streams. Also shown: sources, File Import / SQL Import, the Big Data Cluster (Storage: Raw, Refined, Results; Parallel Processing) and consumers (SQL, Search, BI Tools, Enterprise Data Warehouse, Search / Explore, Online & Mobile Apps)]
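In an asynchronous microservice architecture, each service owns its state, builds it from the event streams it consumes, answers its API from that local state, and publishes new events instead of calling other services directly. Below is a minimal Python sketch of one such service. The `EventStream` class, the event types and the `VIP_THRESHOLD` rule are hypothetical; delivery here is synchronous for simplicity, whereas an Event Hub would deliver asynchronously.

```python
class EventStream:
    """Hypothetical in-memory stand-in for an Event Hub topic (synchronous delivery)."""

    def __init__(self):
        self.events = []
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def publish(self, event):
        self.events.append(event)
        for handler in self._handlers:
            handler(event)

VIP_THRESHOLD = 1000.0  # hypothetical business rule

class CustomerService:
    """Microservice that owns its state and only reacts to events."""

    def __init__(self, orders, customer_events):
        self._totals = {}  # local state: order total per customer
        self._vips = set()
        self._customer_events = customer_events
        orders.subscribe(self._on_event)

    def _on_event(self, event):
        if event.get("type") != "OrderPlaced":
            return  # ignore event types this service does not care about
        customer = event["customer"]
        self._totals[customer] = self._totals.get(customer, 0.0) + event["amount"]
        if self._totals[customer] >= VIP_THRESHOLD and customer not in self._vips:
            self._vips.add(customer)
            # publish a derived event instead of calling downstream services directly
            self._customer_events.publish({"type": "CustomerBecameVip", "customer": customer})

    def order_total(self, customer):
        """API: answered from local state only."""
        return self._totals.get(customer, 0.0)

orders = EventStream()
customer_events = EventStream()
service = CustomerService(orders, customer_events)
for amount in (600.0, 500.0, 50.0):
    orders.publish({"type": "OrderPlaced", "customer": "c1", "amount": amount})
```

Because the state is derived from events, a new instance of the service could rebuild it by replaying the order stream from the Event Hub.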
Big Data Ecosystem – many choices sorted!