Streaming Computing Some thoughts and technology choices for event-driven processing Natalino Busa - 29 Aug. 2013

Streaming computing: architectures and technologies

Dec 14, 2014



Natalino Busà

Some loose thoughts about the latest buzzwords: streaming computing, real-time processing, and in-memory computing.
Transcript
Page 1: Streaming computing: architectures and technologies

Streaming Computing
Some thoughts and technology choices for event-driven processing

Natalino Busa - 29 Aug. 2013

Page 2

Outline

● Concurrency
● Streaming computing
● Technologies
  ○ Gigaspaces
  ○ Storm
  ○ Akka
● Comparison matrix
● Opportunities

Page 3

Algorithms: a tribute

Numbers and algorithms:

The 9th-century Persian Muslim mathematician Abu Abdullah Muhammad ibn Musa Al-Khwarizmi, whose work built upon that of the 7th-century Indian mathematician Brahmagupta.

We owe a lot to these guys!

Page 4

Why do we need parallelism?

It gets bigger,

It doesn’t get much faster

BUT

We get more cores in a chip.

More cores = more parallelism
We are happy now, right?

Page 5

Moore’s law

Every 18 months, the number of CPU cores doubles.

Another interpretation:

Every 18 months, the number of idle CPU cores doubles.

Page 6

More parallelism

We trade:

Time vs. (CPU, memory, I/O)

Page 7

Modern applications

Scalability:
Vertical: concurrency (use all the cores, memory and I/O of a given machine)
Horizontal: distribution (use all the machines in the cluster)

High availability:
Fault tolerance at all levels (local, distributed)
(the Terminator effect: you can stop it, but you can’t kill it)

Page 8

Streaming applications

Performance:
Efficient use of resources: CPU and memory, but also OS threads and sockets

Asynchronous:
Event driven, reacts to new data

Distributed:
More machines = more performance
The algorithm is partitioned and/or replicated on the cluster
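The asynchronous and partitioned properties above can be sketched in a few lines. This is only an illustration in plain Python with asyncio, not the API of Storm, Akka, or Gigaspaces. The names `NUM_WORKERS`, `partition_for`, `worker`, and `process` are hypothetical, and the "machines" are simulated by tasks in a single process.

```python
import asyncio
import zlib
from collections import Counter

NUM_WORKERS = 4  # hypothetical partition count; stands in for machines in a cluster


def partition_for(key: str, num_partitions: int = NUM_WORKERS) -> int:
    """Map a key to a partition with a stable hash (crc32 is the same on every run)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


async def worker(inbox: asyncio.Queue, state: Counter) -> None:
    """React to each event as it arrives; this worker owns the state of its keys."""
    while True:
        event = await inbox.get()
        if event is None:  # sentinel: no more events for this worker
            break
        state[event["key"]] += event["value"]


async def run(events):
    inboxes = [asyncio.Queue() for _ in range(NUM_WORKERS)]
    states = [Counter() for _ in range(NUM_WORKERS)]
    tasks = [asyncio.create_task(worker(q, s)) for q, s in zip(inboxes, states)]
    for event in events:
        # Route by key so that all events for one key reach the same worker.
        await inboxes[partition_for(event["key"])].put(event)
    for q in inboxes:
        await q.put(None)
    await asyncio.gather(*tasks)
    return states


def process(events):
    """Run the partitioned pipeline and return the per-worker state."""
    return asyncio.run(run(events))
```

Routing by key keeps each key's state on exactly one worker, so workers never need to coordinate. Replication, the other option named on the slide, would instead send every event to several workers.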

Page 9

What to increase?

More CPU: it helps when there is computation involved

More memory: it helps when there is more state to keep

More I/O: it helps when there are more messages to transfer

Page 10

Streaming or batch?

[Diagram labels: Data, Processing, Data; source system, our system, target system]

Natalino Busa - 12 Feb. 2013

What differentiates streaming from batch?

● Granularity of data
● Granularity of processing

Granularity impacts:

Throughput, latency, and the cost of the system!

Page 11

The choice is yours

1000 events/sec (1 KB/event)
running on 100 cores all day long

BATCH: Hadoop
“Wait a day, then process”
86.4 M events ≈ 86 GB of data
Latency: 24 hours
Throughput: 1 update/day

STREAMING: Akka
“Do not wait”
Process the 1 KB of data each msec.
Latency: 1 ms
Throughput: 1000 updates/sec

“Both are valid options. It depends on the application domain and the requirements/specs of the target and source systems”
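The arithmetic behind the two options can be checked with a short calculation. A day has 86,400 seconds, so 1000 events/sec gives 86.4 million events per day. The sketch below assumes 1 KB means 1000 bytes. The constant names and `summarize` are illustrative, and "latency" here is simply the waiting time implied by each option (a full day for batch, the gap between events for streaming).

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds
EVENT_RATE_PER_SEC = 1000        # from the slide
EVENT_SIZE_BYTES = 1000          # 1 KB per event, taken as 1000 bytes (assumption)

events_per_day = EVENT_RATE_PER_SEC * SECONDS_PER_DAY
bytes_per_day = events_per_day * EVENT_SIZE_BYTES


def summarize():
    """Compare the two extremes: one update per day versus one update per event."""
    return {
        "events_per_day": events_per_day,
        "gigabytes_per_day": bytes_per_day / 1e9,
        "batch_latency_sec": SECONDS_PER_DAY,
        "batch_updates_per_day": 1,
        "streaming_latency_sec": 1 / EVENT_RATE_PER_SEC,
        "streaming_updates_per_sec": EVENT_RATE_PER_SEC,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

The data volume is the same in both cases. Only the granularity changes: one 86 GB unit per day, or one 1 KB unit per millisecond.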

Page 12

Mapping it to existing applications

Granularity of data         256 GB                    256 GB
Granularity of processing   1 CPU                     100 CPUs
Example                     Traditional DB systems    Big Data (Hadoop)

Granularity of data         1 KB                      1 KB
Granularity of processing   1 CPU                     100 CPUs
Example                     Traditional mail server   Web application server

Page 13

Technologies: Gigaspaces

Page 14

Technologies: Storm

Topology
Supervising
Scaling

Page 15

Technologies: Akka

Supervising: tree of actors

Topology (static and dynamic actors)

Scaling and distributed processing
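The idea of supervision with static and dynamic actors can be illustrated with a small, synchronous sketch in plain Python. It does not use the Akka API and has no mailboxes or threads. `Worker`, `Supervisor`, `tell`, and `spawn` are hypothetical names, and the only failure strategy shown is "restart the failed child".

```python
class Worker:
    """A child 'actor': handles one message at a time and holds local state."""

    def __init__(self, name):
        self.name = name
        self.processed = 0

    def receive(self, message):
        if message == "boom":  # simulated failure for the example
            raise ValueError(f"{self.name} cannot handle {message!r}")
        self.processed += 1
        return f"{self.name} handled {message}"


class Supervisor:
    """A parent node: creates children, routes messages, restarts failed children."""

    def __init__(self, child_names):
        # Static topology: children known when the supervisor is created.
        self.children = {name: Worker(name) for name in child_names}
        self.restarts = {name: 0 for name in child_names}

    def spawn(self, name):
        """Dynamic topology: add a child while the system is running."""
        self.children[name] = Worker(name)
        self.restarts[name] = 0

    def tell(self, child_name, message):
        child = self.children[child_name]
        try:
            return child.receive(message)
        except Exception:
            # Restart strategy: replace the failed child with a fresh instance.
            # Its local state is lost; the other children are unaffected.
            self.children[child_name] = Worker(child_name)
            self.restarts[child_name] += 1
            return None
```

A failure stays contained in the subtree where it happens: the supervisor decides what to do with the failed child, and siblings keep their state. In a real actor system each supervisor is itself supervised, which gives the tree.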

Page 16

Technology matrix

Rows: granularity of data. Columns: granularity of processing.

          Small    Big
Small     Akka     Akka, Gigaspaces
Big       ?        Storm

System end-to-end throughput

High (~10,000 events/sec)     Akka
Medium (~100 events/sec)      Storm / Gigaspaces
Low (~10 events/sec)          Scripting languages

Page 17

Big Data in motion

All three are: distributed, fault-tolerant, streaming

- Storm
  ++ multi-language
  -- not user/admin friendly
  -- slow supervising
  Processing elements are JVMs; ideal when data is coarse grained

- Akka
  ++ high throughput, fine-grained actors
  ++ dynamic topologies
  -- low-level, but high performance
  Processing elements are small and lightweight; ideal for millions of transactions per second

- Gigaspaces
  ++ combines memory + application distribution
  -- framework API is not very flexible
  Processing elements are JVMs; ideal for an all-in-one solution, with little customization

Page 18

Opportunity: Lambda Architecture

Logic layer
Software as a Service
e.g. real-time predictor

Natalino Busa - 12 Feb. 2013
from http://www.manning.com/marz/

Page 19

Opportunity: Batch + Streaming

[Architecture diagram; labels as extracted]
Batch Computing
Front End Services
In-Memory Distributed Database
In-Memory Distributed DBs
Batch
Streaming
HTML5 Client / Responsive App
Low-latency HTTP API services
FETCH (refresh)
Streaming Computing
Data Warehouses
Messaging Buses
PUSH (SSE, notifications)
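One possible reading of this diagram is sketched below: an in-memory store combines a periodically recomputed batch view with increments from the streaming side, clients can FETCH the current value, and subscribers receive a PUSH on each update. The class `ServingLayer` and all of its methods are hypothetical. The sketch also assumes, as a simplification, that each new batch view already covers every event streamed before it was loaded.

```python
from collections import Counter


class ServingLayer:
    """In-memory store that combines a batch view with real-time increments."""

    def __init__(self):
        self.batch_view = Counter()     # recomputed periodically by the batch side
        self.realtime_view = Counter()  # incremented per event by the streaming side
        self.subscribers = []           # callbacks standing in for SSE/notifications

    def subscribe(self, callback):
        """Register a callback that is invoked as callback(key, total) on each event."""
        self.subscribers.append(callback)

    def load_batch_view(self, counts):
        """Swap in a fresh batch result and drop the increments it now covers."""
        self.batch_view = Counter(counts)
        self.realtime_view.clear()

    def on_event(self, key, amount=1):
        """Streaming path: update the real-time view and PUSH the new total."""
        self.realtime_view[key] += amount
        total = self.fetch(key)
        for callback in self.subscribers:
            callback(key, total)

    def fetch(self, key):
        """FETCH path: answer from memory by adding both views."""
        return self.batch_view[key] + self.realtime_view[key]
```

Usage, under the same assumptions: call `load_batch_view({"clicks": 100})` after a batch run, call `on_event("clicks")` for each incoming event, and read `fetch("clicks")` from the front-end service.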