Reactive Streams: Handling Data-Flow the Reactive Way

Aug 27, 2014

Roland Kuhn

Building on the success of Reactive Extensions—first in Rx.NET and now in RxJava—we are taking Observers and Observables to the next level: by adding the capability of handling back-pressure between asynchronous execution stages we enable the distribution of stream processing across a cluster of potentially thousands of nodes. The project defines the common interfaces for interoperable stream implementations on the JVM and is the result of a collaboration between Twitter, Netflix, Pivotal, RedHat and Typesafe. In this presentation I introduce the guiding principles behind its design and show examples using the actor-based implementation in Akka.
Page 1: Reactive Streams: Handling Data-Flow the Reactive Way

Dr. Roland Kuhn Akka Tech Lead @rolandkuhn

Reactive Streams: Handling Data-Flows the Reactive Way

Introduction: Streams

What is a Stream?

• ephemeral flow of data
• focused on describing transformation
• possibly unbounded in size


Common uses of Streams

• bulk data transfer
• real-time data sources
• batch processing of large data sets
• monitoring and analytics


What is special about Reactive Streams?

Reactive Applications

The Four Reactive Traits

http://reactivemanifesto.org/

Needed: Asynchrony

• Resilience demands it:
  • encapsulation
  • isolation
• Scalability demands it:
  • distribution across nodes
  • distribution across cores

Many Kinds of Async Boundaries

• between different applications
• between network nodes
• between CPUs
• between threads
• between actors

The Problem: Getting Data across an Async Boundary

Possible Solutions

• the Traditional way: blocking calls

• the Push way: buffering and/or dropping

• the Reactive way: non-blocking & non-dropping & bounded

How do we achieve that?

Supply and Demand

• data items flow downstream
• demand flows upstream
• data items flow only when there is demand
• recipient is in control of incoming data rate
• data in flight is bounded by signaled demand

[Diagram: data flows from Publisher to Subscriber; demand flows from Subscriber to Publisher]
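
The supply-and-demand rules above can be sketched in Java using the java.util.concurrent.Flow interfaces that ship with JDK 9 and later. This is only a minimal illustration under stated assumptions: the names DemandDemo, RangePublisher and CollectingSubscriber are made up for this sketch, everything runs synchronously on one thread (a real implementation dispatches signals asynchronously), and it is not a specification-compliant Publisher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;

// Hypothetical names; a single-threaded sketch of demand accounting only.
class DemandDemo {

    /** Emits 1..count, but never more elements than have been requested. */
    static final class RangePublisher implements Flow.Publisher<Integer> {
        private final int count;

        RangePublisher(int count) { this.count = count; }

        @Override
        public void subscribe(Flow.Subscriber<? super Integer> subscriber) {
            subscriber.onSubscribe(new Flow.Subscription() {
                private int next = 1;          // next element to emit
                private long demand = 0;       // outstanding signaled demand
                private boolean emitting = false;
                private boolean done = false;

                @Override
                public void request(long n) {
                    if (done) return;
                    if (n <= 0) {
                        done = true;
                        subscriber.onError(new IllegalArgumentException("non-positive request"));
                        return;
                    }
                    // Demand is additive; cap at Long.MAX_VALUE on overflow.
                    demand = (demand + n < 0) ? Long.MAX_VALUE : demand + n;
                    if (emitting) return;      // re-entrant call: the loop below picks it up
                    emitting = true;
                    while (demand > 0 && next <= count && !done) {
                        demand--;
                        subscriber.onNext(next++);   // data flows only while demand exists
                    }
                    emitting = false;
                    if (next > count && !done) {
                        done = true;
                        subscriber.onComplete();
                    }
                }

                @Override
                public void cancel() { done = true; }
            });
        }
    }

    /** Stores what it receives; the caller drives demand through the subscription. */
    static final class CollectingSubscriber implements Flow.Subscriber<Integer> {
        final List<Integer> received = new ArrayList<>();
        boolean completed = false;
        Flow.Subscription subscription;

        @Override public void onSubscribe(Flow.Subscription s) { subscription = s; }
        @Override public void onNext(Integer item) { received.add(item); }
        @Override public void onError(Throwable t) { throw new AssertionError(t); }
        @Override public void onComplete() { completed = true; }
    }

    public static void main(String[] args) {
        CollectingSubscriber sub = new CollectingSubscriber();
        new RangePublisher(5).subscribe(sub);
        System.out.println(sub.received);              // [] : no demand, no data
        sub.subscription.request(2);
        System.out.println(sub.received);              // [1, 2]
        sub.subscription.request(10);
        System.out.println(sub.received + " completed=" + sub.completed);
        // [1, 2, 3, 4, 5] completed=true
    }
}
```

The recipient decides when and how much to request, so the number of elements in flight never exceeds what it has signaled.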

Dynamic Push–Pull

• “push” behavior when consumer is faster
• “pull” behavior when producer is faster
• switches automatically between these
• batching demand allows batching data

[Diagram: data flows from Publisher to Subscriber; demand flows from Subscriber to Publisher]

Explicit Demand: Tailored Flow Control

[Diagram: data and demand flows]

splitting the data means merging the demand

merging the data means splitting the demand

Reactive Streams

• asynchronous non-blocking data flow
• asynchronous non-blocking demand flow
• minimal coordination and contention
• message passing allows for distribution
  • across applications
  • across nodes
  • across CPUs
  • across threads
  • across actors

Are Streams Collections?

What is a Collection?

• Oxford Dictionary:
  • “a group of things or people”
• wikipedia:
  • “a grouping of some variable number of data items”
• backbone.js:
  • “collections are simply an ordered set of models”
• java.util.Collection:
  • definite size, provides an iterator, query membership

User Expectations

• an Iterator is expected to visit all elements (especially with immutable collections)
• x.head + x.tail == x
• the contents do not depend on who is processing the collection
• the contents do not depend on when the processing happens (especially with immutable collections)

Streams have Unexpected Properties

• the observed sequence depends on
  • … when the observer subscribed to the stream
  • … whether the observer can process fast enough
  • … whether the stream flows fast enough

Streams are not Collections!

• java.util.stream: Stream is not derived from Collection
  “Streams differ from Coll’s in several ways”
  • no storage
  • functional in nature
  • laziness seeking
  • possibly unbounded
  • consumable

• a collection can be streamed
• a stream observer can create a collection
• … but saying that a Stream is just a lazy Collection evokes the wrong associations

So, Reactive Streams: why not just java.util.stream.Stream?

Java 8 Stream

    import java.util.stream.*;

    // get some stream
    final Stream<Integer> s = Stream.of(1, 2, 3);
    // describe transformation
    final Stream<String> s2 = s.map(i -> "a" + i);
    // make a pull collection
    s2.iterator();
    // or alternatively push it somewhere
    s2.forEach(i -> System.out.println(i));
    // (need to pick one, Stream is consumable)

• provides a DSL for describing transformation
• introduces staged computation (but does not allow reuse)
• prescribes an eager model of execution
• offers either push or pull, chosen statically
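
The "does not allow reuse" point can be checked with a small sketch. The class and method names (StreamReuseDemo, secondUseFails) are made up for illustration; the behavior shown is that a java.util.stream.Stream rejects a second terminal operation with an IllegalStateException.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical names; demonstrates that a Stream can be consumed only once.
class StreamReuseDemo {

    /** Returns true if the first use succeeds and the second use is rejected. */
    static boolean secondUseFails() {
        Stream<String> s = Stream.of(1, 2, 3).map(i -> "a" + i);

        // First terminal operation consumes the stream.
        List<String> first = s.collect(Collectors.toList());

        try {
            // Second terminal operation on the same Stream instance.
            s.forEach(x -> { });
            return false;
        } catch (IllegalStateException alreadyConsumed) {
            return first.size() == 3;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondUseFails()); // true
    }
}
```

To run the same transformation twice, the pipeline has to be rebuilt from its source each time.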

What about RxJava?

RxJava

    import rx.Observable;
    import static rx.Observable.*;

    // get some stream source
    final Observable<Integer> obs = range(1, 3);
    // describe transformation
    final Observable<String> obs2 = obs.map(i -> "b" + i);
    // and use it twice
    obs2.subscribe(i -> System.out.println(i));
    obs2.filter(i -> i.equals("b2"))
        .subscribe(i -> System.out.println(i));

• implements pure “push” model
• includes extensive DSL for transformations
• only allows blocking for back pressure
• currently uses unbounded buffering for crossing an async boundary
• work on distributed Observables sparked participation in Reactive Streams

The Reactive Streams Project

Participants

• Engineers from
  • Netflix
  • Oracle
  • Pivotal
  • Red Hat
  • Twitter
  • Typesafe
• Individuals like Doug Lea and Todd Montgomery

The Motivation

• all participants had the same basic problem
• all are building tools for their community
• a common solution benefits everybody
• interoperability to make best use of efforts
  • e.g. use Reactor data store driver with Akka transformation pipeline and Rx monitoring to drive a vert.x REST API (purely made up, at this point)

see also Jon Brisbin’s post on “Tribalism as a Force for Good”

Recipe for Success

• minimal interfaces
• rigorous specification of semantics
• full TCK for verification of implementation
• complete freedom for many idiomatic APIs

The Meat

    trait Publisher[T] {
      def subscribe(sub: Subscriber[T]): Unit
    }
    trait Subscription {
      def requestMore(n: Int): Unit
      def cancel(): Unit
    }
    trait Subscriber[T] {
      def onSubscribe(s: Subscription): Unit
      def onNext(elem: T): Unit
      def onError(thr: Throwable): Unit
      def onComplete(): Unit
    }

The Sauce

• all calls on Subscriber must dispatch async
• all calls on Subscription must not block
• Publisher is just there to create Subscriptions

How does it Connect?

[Diagram: Subscriber, Publisher and Subscription exchanging the signals subscribe, onSubscribe, requestMore and onNext]
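
The connection sequence can be observed with the java.util.concurrent.Flow interfaces and SubmissionPublisher from JDK 9 and later, which follow the same four-signal shape as the traits shown earlier. Note that the demand method there is request(long) rather than the requestMore(Int) of the preview version on the slides. The class name HandshakeDemo and the event strings are made up for this sketch, which records each signal in the order the subscriber sees it.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

// Hypothetical names; records the signals exchanged between Publisher and Subscriber.
class HandshakeDemo {

    static List<String> run() {
        List<String> events = new CopyOnWriteArrayList<>();
        CountDownLatch finished = new CountDownLatch(1);

        Flow.Subscriber<Integer> subscriber = new Flow.Subscriber<Integer>() {
            private Flow.Subscription subscription;

            @Override public void onSubscribe(Flow.Subscription s) {
                events.add("onSubscribe");
                subscription = s;
                s.request(1);                 // signal demand for the first element
            }
            @Override public void onNext(Integer item) {
                events.add("onNext(" + item + ")");
                subscription.request(1);      // ask for one more after each element
            }
            @Override public void onError(Throwable t) {
                events.add("onError");
                finished.countDown();
            }
            @Override public void onComplete() {
                events.add("onComplete");
                finished.countDown();
            }
        };

        // SubmissionPublisher delivers signals asynchronously on an executor.
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            publisher.submit(1);
            publisher.submit(2);
            publisher.submit(3);
        } // close() issues onComplete once the submitted items are delivered

        try {
            if (!finished.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for completion");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return List.copyOf(events);
    }

    public static void main(String[] args) {
        System.out.println(run());
        // [onSubscribe, onNext(1), onNext(2), onNext(3), onComplete]
    }
}
```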

Akka Streams

• powered by Akka Actors
  • execution
  • distribution
  • resilience
• type-safe streaming through Actors with bounded buffering

Basic Akka Example

    implicit val system = ActorSystem("Sys")
    val mat = FlowMaterializer(...)

    Flow(text.split("\\s").toVector).
      map(word => word.toUpperCase).
      foreach(transformed => println(transformed)).
      onComplete(mat) {
        case Success(_) => system.shutdown()
        case Failure(e) =>
          println("Failure: " + e.getMessage)
          system.shutdown()
      }

More Advanced Akka Example

    val s = Source.fromFile("log.txt", "utf-8")
    Flow(s.getLines()).
      groupBy {
        case LogLevelPattern(level) => level
        case other => "OTHER"
      }.
      foreach { case (level, producer) =>
        val out = new PrintWriter(level+".txt")
        Flow(producer).
          foreach(line => out.println(line)).
          onComplete(mat)(_ => Try(out.close()))
      }.
      onComplete(mat)(_ => Try(s.close()))

Java 8 Example

    final ActorSystem system = ActorSystem.create("Sys");
    final MaterializerSettings settings = MaterializerSettings.create();
    final FlowMaterializer materializer = FlowMaterializer.create(settings, system);

    final String[] lookup = { "a", "b", "c", "d", "e", "f" };
    final Iterable<Integer> input = Arrays.asList(0, 1, 2, 3, 4, 5);

    Flow.create(input).drop(2).take(3).            // leave 2, 3, 4
      map(elem -> lookup[elem]).                   // translate to "c","d","e"
      filter(elem -> !elem.equals("c")).           // filter out the "c"
      grouped(2).                                  // make into a list
      mapConcat(list -> list).                     // flatten the list
      fold("", (acc, elem) -> acc + elem).         // accumulate into "de"
      foreach(elem -> System.out.println(elem)).   // print it
      consume(materializer);

Akka TCP Echo Server

    val future = IO(StreamTcp) ? Bind(addr, ...)
    future.onComplete {
      case Success(server: TcpServerBinding) =>
        println("listen at "+server.localAddress)

        Flow(server.connectionStream).foreach{ c =>
          println("client from "+c.remoteAddress)
          c.inputStream.produceTo(c.outputStream)
        }.consume(mat)

      case Failure(ex) =>
        println("cannot bind: "+ex)
        system.shutdown()
    }

Akka HTTP Sketch (NOT REAL CODE)

    val future = IO(Http) ? RequestChannelSetup(...)
    future.onSuccess {
      case ch: HttpRequestChannel =>
        val p = ch.processor[Int]
        val body = Flow(new File(...)).toProducer(mat)
        Flow(HttpRequest(method = POST,
                         uri = "http://ex.amp.le/42",
                         entity = body) -> 42)
          .produceTo(mat, p)
        Flow(p).map { case (resp, token) =>
          val out = FileSink(new File(...))
          Flow(resp.entity.data).produceTo(mat, out)
        }.consume(mat)
    }

Closing Remarks

Current State

• Early Preview is available:
  "org.reactivestreams" % "reactive-streams-spi" % "0.2"
  "com.typesafe.akka" %% "akka-stream-experimental" % "0.3"
• check out the Activator template "Akka Streams with Scala!" (https://github.com/typesafehub/activator-akka-stream-scala)

Next Steps

• we work towards inclusion in future JDK
• try it out and give feedback!
• http://reactive-streams.org/
• https://github.com/reactive-streams

©Typesafe 2014 – All Rights Reserved