New trends in big data: in-memory analytics, streaming computing and distributed machine learning

Post on 12-Jan-2017

Trends in Big Data.

Natalino Busa, Data Platform Architect at ING

Play with your phones

Re-think Big Data

Hadoop has turned 10

Memory is eating Big Data

Amazon is delivering instances with 2 TB RAM

Facebook, Microsoft: 90% of workloads are below 100 GB

Machine Learning algorithms fit on a single node

250 MB hard disk drive from 1979

I like Big Data and I cannot lie.

Disk -> RAM

Hadoop -> Spark

Map-Reduce -> Data Flow Graphs

HDFS -> Storage, MPPs, NoSQL
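The move from Map-Reduce to data flow graphs can be sketched in a few lines. This is a minimal, pure-Python illustration, not Spark's API: the helper `flat_map` and the function `word_count` are hypothetical names, and the sample lines are taken from the slides. Each step is a lazy node in a small graph, the data stays in memory, and nothing runs until the final node consumes the chain.

```python
from collections import Counter
from itertools import chain


def flat_map(fn, items):
    """Lazily apply fn to each item and flatten the results."""
    return chain.from_iterable(map(fn, items))


def word_count(lines):
    # Each assignment below is one node in a small data flow graph.
    # The iterators are lazy: no work happens until Counter pulls
    # items through the whole chain, and no stage is written to disk.
    words = flat_map(str.split, lines)
    lowered = map(str.lower, words)
    kept = filter(lambda w: len(w) > 2, lowered)
    return Counter(kept)


lines = [
    "Hadoop has turned 10",
    "Memory is eating Big Data",
    "big data big memory",
]
counts = word_count(lines)
```

A classic Map-Reduce job would instead materialize the output of each stage before the next one starts; chaining the stages in memory is the part this sketch is meant to show.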

Wheel mill.

Stream like a boss

Streaming and Real-Time Analytics

Batch -> Event-Driven

ETL -> Streaming

Hive -> Flink, Akka, Spark
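The batch versus event-driven contrast can be illustrated with a small sketch. It assumes a hypothetical stream of `(key, amount)` events and uses plain Python generators rather than Flink, Akka, or Spark; `batch_totals` and `streaming_totals` are made-up names for this example.

```python
from collections import defaultdict


def batch_totals(events):
    """Batch style: read the full data set, then report one result."""
    totals = defaultdict(float)
    for key, amount in events:
        totals[key] += amount
    return dict(totals)


def streaming_totals(events):
    """Event-driven style: emit an updated total after every event."""
    totals = defaultdict(float)
    for key, amount in events:
        totals[key] += amount
        yield key, totals[key]


events = [("a", 1.0), ("b", 2.0), ("a", 3.0)]

final = batch_totals(events)             # available only at the end
updates = list(streaming_totals(events))  # one result per incoming event
```

Both functions compute the same aggregate; the difference is when results become available. The streaming version can also run over an unbounded source, since it never needs the whole input at once.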

Stream Centric Architectures

Spark - RDDs

Streaming, SQL, MLlib, GraphX

Analytics, Statistics, Data Science, Model Training

HDFS, NoSQL, SQL

Data Sources

Map-Reduce

HDFS, Kafka

Spark: Unified Distributed Computing: SQL + Machine Learning + Graph Analytics

Hive

Virtual resources

Big Data Applications,

Assemble!

Clusters -> Resources

Orchestrated -> Isolated

Static -> Disposable

YARN, Mesos, CoreOS, Kubernetes

Application-oriented Infrastructure

Elastic: Docker, Mesos, YARN, Kubernetes

Data Processing: Flink, Spark, Akka

Indexing: Elasticsearch, Deep Learning

APIs and microservices: Akka, Python, Java

Data storage: SQL, NoSQL, HDFS, Streaming

Mesos, YARN

Spark: Streaming, SQL, MLlib, GraphX

DBs: ES, C*

Application Oriented Architectures

That’s all folks!

Natalino Busa, Data Platform Architect at ING

@natbusa