Fast Data Platforms @HUG_Italy Meetup (17/4/2015) @andrea_gioia
Fast data platforms - Hadoop User Group (Italy)

Jul 17, 2015

Data & Analytics

Andrea Gioia
Transcript
Page 1: Fast data platforms  - Hadoop User Group (Italy)

Fast Data Platforms@HUG_Italy Meetup (17/4/2015)

@andrea_gioia

Page 2: Fast data platforms  - Hadoop User Group (Italy)

A bit of history

VoltDB and Fast Data

Using VoltDB in an Enterprise Data Platform

Page 3: Fast data platforms  - Hadoop User Group (Italy)

A bit of history

VoltDB and Fast Data

Using VoltDB in an Enterprise Data Platform

Page 4: Fast data platforms  - Hadoop User Group (Italy)

PHASE 1: ONE SIZE FITS ALL

Page 5: Fast data platforms  - Hadoop User Group (Italy)

PHASE 2: OLAP vs OLTP

Page 6: Fast data platforms  - Hadoop User Group (Italy)

PHASE 2: DATA ARCHITECTURE

Page 7: Fast data platforms  - Hadoop User Group (Italy)

…BUT VOLUMES GROW FAST

Page 8: Fast data platforms  - Hadoop User Group (Italy)

PROBLEM: VERTICAL SCALABILITY ONLY

Page 9: Fast data platforms  - Hadoop User Group (Italy)

SOLUTION: QUEUES + SHARDING

Page 10: Fast data platforms  - Hadoop User Group (Italy)

SOLUTION: QUEUES + SHARDING

Partition-1 Partition-2 Partition-3 Partition-4 Partition-5 Partition-6
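
The slide names six partitions but not the sharding scheme. The following is a minimal sketch that assumes hash-based sharding on a record key; `partition_for` and `shard` are hypothetical helpers introduced only for illustration.

```python
import hashlib

NUM_PARTITIONS = 6  # Partition-1 .. Partition-6, as on the slide


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition number in 1..num_partitions."""
    # A stable digest is used because Python's built-in hash() for strings
    # is salted per process and would route keys differently on each run.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions + 1


def shard(records, num_partitions: int = NUM_PARTITIONS):
    """Group (key, value) records by their target partition."""
    partitions = {p: [] for p in range(1, num_partitions + 1)}
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


if __name__ == "__main__":
    events = [(f"customer-{i}", i) for i in range(12)]
    for number, rows in shard(events).items():
        print(f"Partition-{number}: {len(rows)} rows")
```

With this scheme, adding a partition changes the target of most keys, which is one reason application-managed sharding becomes hard to operate as volumes grow.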

Page 11: Fast data platforms  - Hadoop User Group (Italy)

…BUT VOLUMES GROW FAST

Page 12: Fast data platforms  - Hadoop User Group (Italy)

PROBLEMS

1. Fault handling

2. Application-level cluster management

3. Massive recomputation

Page 13: Fast data platforms  - Hadoop User Group (Italy)

PHASE 3: HADOOP 1.0

Components
1. Distributed data (HDFS)
2. Distributed computation (Map-Reduce)

Advantages
1. Hides the complexity of cluster management
2. Minimizes data movement
3. Scales horizontally on commodity hardware
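
To make the Map-Reduce model concrete, here is a minimal single-process sketch of a word count. It does not use Hadoop; the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) are assumptions chosen for illustration. In a real cluster the map and reduce steps would run in parallel on the nodes that hold the data.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def map_reduce(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(map_reduce(["big data", "fast data"]))
```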

Page 14: Fast data platforms  - Hadoop User Group (Italy)

PHASE 3: ARCHITECTURE

Page 15: Fast data platforms  - Hadoop User Group (Italy)

PHASE 3: DATA LAKE

Characteristics
1. All data at the finest level of detail (Volume)
2. Structured and unstructured data (Variety)
3. Data added as soon as it is available (Velocity)
4. Data that can be processed in a distributed fashion (Value)

Page 16: Fast data platforms  - Hadoop User Group (Italy)

DATA LAKE != DWH

Page 17: Fast data platforms  - Hadoop User Group (Italy)

PROBLEM: BIG BUT NOT FAST

COLLECT EXPLORE ANALYZE ACT

RESULTS
1. Discovery
2. Querying
3. Optimization

Page 18: Fast data platforms  - Hadoop User Group (Italy)

PHASE 4: SQL on HADOOP

Page 19: Fast data platforms  - Hadoop User Group (Italy)

PHASE 4: ARCHITECTURE

Page 20: Fast data platforms  - Hadoop User Group (Italy)

PROBLEM: FAST BUT NOT FAST ENOUGH

Page 21: Fast data platforms  - Hadoop User Group (Italy)

…BECAUSE DATA GROWS IN VOLUME AND VELOCITY

Page 22: Fast data platforms  - Hadoop User Group (Italy)

PHASE 5: SPECIALIZATION

Page 23: Fast data platforms  - Hadoop User Group (Italy)

PHASE 5: LAMBDA ARCHITECTURE

Merged View (QUERY)
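
The merged view can be sketched as a query-time combination of a batch view and a real-time view. This is a minimal illustration, assuming a simple page-view count; the event shape and the names `batch_view`, `speed_view` and `merged_view` are hypothetical.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute counts from the whole historical dataset."""
    return Counter(event["page"] for event in master_dataset)


def speed_view(recent_events):
    """Speed layer: count only the events not yet absorbed by the batch layer."""
    return Counter(event["page"] for event in recent_events)


def merged_view(batch, speed):
    """Query time: the application combines the two views into one answer."""
    return batch + speed


if __name__ == "__main__":
    history = [{"page": "home"}, {"page": "home"}, {"page": "cart"}]
    recent = [{"page": "home"}, {"page": "checkout"}]
    print(merged_view(batch_view(history), speed_view(recent)))
```

Note that the two layers implement the same aggregate twice, once per layer, and that the merge happens in application code.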

Page 24: Fast data platforms  - Hadoop User Group (Italy)

LAMBDA ARCHITECTURE: PAIN POINTS

Issues
1. Duplicated computation logic
2. View integration performed at the application level
3. Many software components to manage
4. Many hardware components exposed to possible faults
5. Fast layer speed limited by the storage system used for its state

Page 25: Fast data platforms  - Hadoop User Group (Italy)

SIMPLIFIED FAST LAYER

Page 26: Fast data platforms  - Hadoop User Group (Italy)

A bit of history

VoltDB and Fast Data

Using VoltDB in an Enterprise Data Platform

Page 27: Fast data platforms  - Hadoop User Group (Italy)

WHAT IS IT?

VoltDB is a database that is…
1. In-memory
2. Partitioned
3. Single-threaded
4. Distributed
5. ACID compliant
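
The combination of "partitioned" and "single-threaded" can be illustrated with a toy model: each partition owns its rows and runs procedures one at a time on a single worker thread, so no locks are needed on the data. This is a sketch of the general idea only, not VoltDB's implementation or API; `Partition`, `Database` and `add_credit` are hypothetical names.

```python
import queue
import threading
import zlib


class Partition:
    """One slice of the data, touched only by its own worker thread."""

    def __init__(self):
        self.rows = {}
        self.inbox = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        # Procedures are executed strictly one after another.
        while True:
            task = self.inbox.get()
            if task is None:
                break
            procedure, args, result = task
            try:
                outcome = procedure(self.rows, *args)
            except Exception as error:  # hand the error back to the caller
                outcome = error
            result.put(outcome)

    def submit(self, procedure, *args):
        """Queue a procedure and wait for its result."""
        result = queue.Queue(maxsize=1)
        self.inbox.put((procedure, args, result))
        outcome = result.get()
        if isinstance(outcome, Exception):
            raise outcome
        return outcome

    def stop(self):
        self.inbox.put(None)
        self.worker.join()


class Database:
    """Routes each call to the partition that owns the key."""

    def __init__(self, num_partitions=4):
        self.partitions = [Partition() for _ in range(num_partitions)]

    def call(self, key, procedure, *args):
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        return self.partitions[index].submit(procedure, key, *args)

    def stop(self):
        for partition in self.partitions:
            partition.stop()


def add_credit(rows, key, amount):
    """Example procedure: a read-modify-write that needs no lock."""
    rows[key] = rows.get(key, 0) + amount
    return rows[key]
```

Because all calls for a given key are serialized on one thread, concurrent callers cannot interleave inside `add_credit`, while different partitions still work in parallel.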

Page 28: Fast data platforms  - Hadoop User Group (Italy)

WHO NEEDS IT

All applications that need to process large amounts of data reliably and quickly (fast data).

Key requirements for these applications are…
1. Very high throughput
2. Scalability
3. Reliability
4. High availability

Page 29: Fast data platforms  - Hadoop User Group (Italy)

WHO DOES NOT NEED IT

All applications that need to store and compare large amounts of historical data spread across multiple tables (DWH and BI).

Page 30: Fast data platforms  - Hadoop User Group (Italy)

DATA PARTITIONING

Page 31: Fast data platforms  - Hadoop User Group (Italy)

DATA REPLICATION

Page 32: Fast data platforms  - Hadoop User Group (Italy)

DISTRIBUTED PROCESSING

Page 33: Fast data platforms  - Hadoop User Group (Italy)

HIGH AVAILABILITY

Guaranteed by…
1. Partition replication (K-SAFETY)
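
With a K-safety value of K, each partition is kept in K + 1 copies, so the cluster can lose any K nodes without losing data. The sketch below checks that property for a simple round-robin placement. The placement rule and the function names are assumptions made for illustration and do not describe how VoltDB actually assigns replicas.

```python
from itertools import combinations


def place_replicas(num_partitions, num_nodes, k):
    """Assign k + 1 copies of every partition to distinct nodes."""
    copies = k + 1
    if copies > num_nodes:
        raise ValueError("need at least k + 1 nodes")
    return {
        partition: {(partition + copy) % num_nodes for copy in range(copies)}
        for partition in range(num_partitions)
    }


def survives(placement, failed_nodes):
    """True if every partition still has at least one live copy."""
    failed = set(failed_nodes)
    return all(nodes - failed for nodes in placement.values())


def tolerates_any_k_failures(placement, num_nodes, k):
    """Check every possible combination of k failed nodes."""
    return all(
        survives(placement, failed)
        for failed in combinations(range(num_nodes), k)
    )


if __name__ == "__main__":
    placement = place_replicas(num_partitions=6, num_nodes=3, k=1)
    print(tolerates_any_k_failures(placement, num_nodes=3, k=1))
```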

Page 34: Fast data platforms  - Hadoop User Group (Italy)

DURABILITY

Guaranteed by…
1. Periodic snapshots
2. Command logging (synchronous or asynchronous)
3. Replication (business continuity)
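
How snapshots and command logging work together can be sketched as follows: recovery restores the last snapshot and then replays the commands logged after it. This is a toy model in which plain Python objects stand in for files on disk; `DurableStore` and its methods are hypothetical and are not VoltDB's API.

```python
import copy


class DurableStore:
    """In-memory state protected by a snapshot plus a command log."""

    def __init__(self):
        self.state = {}        # volatile, lost on a crash
        self.snapshot = {}     # stands in for the last snapshot on disk
        self.command_log = []  # stands in for the command log on disk

    def execute(self, key, delta):
        # Synchronous flavour: the command is logged before it is applied.
        # The log records the invocation, not the resulting value.
        self.command_log.append((key, delta))
        self.state[key] = self.state.get(key, 0) + delta

    def take_snapshot(self):
        # A snapshot makes the earlier log entries unnecessary.
        self.snapshot = copy.deepcopy(self.state)
        self.command_log.clear()

    def recover(self):
        # Restore the snapshot, then replay the logged commands in order.
        state = copy.deepcopy(self.snapshot)
        for key, delta in self.command_log:
            state[key] = state.get(key, 0) + delta
        return state


if __name__ == "__main__":
    store = DurableStore()
    store.execute("alice", 10)
    store.take_snapshot()
    store.execute("alice", 5)
    store.state = {}  # simulate a crash that wipes memory
    store.state = store.recover()
    print(store.state)
```

An asynchronous variant would acknowledge the command before the log write completes, trading a small window of possible loss for lower latency.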

Page 35: Fast data platforms  - Hadoop User Group (Italy)

A bit of history

VoltDB and Fast Data

Using VoltDB in an Enterprise Data Platform

Page 36: Fast data platforms  - Hadoop User Group (Italy)

DATA PLATFORM 1

Page 37: Fast data platforms  - Hadoop User Group (Italy)

DATA PLATFORM 2

Page 38: Fast data platforms  - Hadoop User Group (Italy)

DATA PLATFORM 2

Page 39: Fast data platforms  - Hadoop User Group (Italy)

DATA PLATFORM 2

Page 40: Fast data platforms  - Hadoop User Group (Italy)

DATA PLATFORM 2

APP APP

Page 41: Fast data platforms  - Hadoop User Group (Italy)

THANK YOU!

Page 42: Fast data platforms  - Hadoop User Group (Italy)

QUESTIONS?

Page 43: Fast data platforms  - Hadoop User Group (Italy)

BIBLIOGRAPHY

1. How to beat the CAP (Nathan Marz)
2. Questioning the Lambda Architecture (Jay Kreps)
3. The Log: What every software engineer should know about real-time data's unifying abstraction (Jay Kreps)
4. Polyglot Persistence (Martin Fowler)
5. Fast Data and the New Enterprise Data Architecture (Scott Jarr)
6. Simplifying the (complex) Lambda architecture (John Piekos)