@crichardson Map(), flatMap() and reduce() are your new best friends: Simpler collections, concurrency, and big data Chris Richardson Author of POJOs in Action Founder of the original CloudFoundry.com @crichardson [email protected] http://plainoldobjects.com

Map(), flatmap() and reduce() are your new best friends: simpler collections, concurrency, and big data (jax, jax2014)

May 09, 2015



Higher-order functions such as map(), flatMap(), filter() and reduce() have their origins in mathematics and ancient functional programming languages such as Lisp. But today they have entered the mainstream and are available in languages such as JavaScript, Scala and Java 8. They are well on their way to becoming an essential part of every developer’s toolbox.

In this talk you will learn how these and other higher-order functions enable you to write simple, expressive and concise code that solves problems in a diverse set of domains. We will describe how you use them to process collections in Java and Scala. You will learn how functional Futures and Rx (Reactive Extensions) Observables simplify concurrent code. We will even talk about how to write big data applications in a functional style using libraries such as Scalding.
Transcript
Page 1: Map(), flatmap() and reduce() are your new best friends: simpler collections, concurrency, and big data (jax, jax2014)

@crichardson

Map(), flatMap() and reduce() are your new best friends:

Simpler collections, concurrency, and big data

Chris Richardson

Author of POJOs in Action

Founder of the original CloudFoundry.com

@crichardson [email protected] http://plainoldobjects.com

Page 2

Presentation goal

How functional programming simplifies your code

Show that map(), flatMap() and reduce() are remarkably versatile functions

Page 3

About Chris

Page 4

About Chris

Founder of a buzzword-compliant (stealthy, social, mobile, big data, machine learning, ...) startup

Consultant helping organizations improve how they architect and deploy applications using cloud, microservices, polyglot applications, NoSQL, ...

Page 5

Agenda

Why functional programming?

Simplifying collection processing

Eliminating NullPointerExceptions

Simplifying concurrency with Futures and Rx Observables

Tackling big data problems with functional programming

Page 6

What’s functional programming?

Page 7

It’s a programming paradigm

Page 8

It’s a kind of programming language

Page 9

Functions as the building blocks of the application

Page 10

Functions as first-class citizens

Assign functions to variables

Store functions in fields

Use and write higher-order functions:

Pass functions as arguments

Return functions as values
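A minimal Java 8 sketch of the four bullet points above: a function stored in a field, assigned to a variable, passed as an argument, and returned as a value. The names FirstClassFunctions, twice and adder are illustrative, not from the talk.

```java
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical class and helper names, for illustration only
class FirstClassFunctions {
    // Stored in a field: a function is just a value of type Function<Integer, Integer>
    static final Function<Integer, Integer> square = x -> x * x;

    // Higher-order: takes a function as an argument and applies it twice
    static <T> T twice(UnaryOperator<T> f, T value) {
        return f.apply(f.apply(value));
    }

    // Higher-order: returns a new function as its result
    static Function<Integer, Integer> adder(int n) {
        return x -> x + n;
    }

    static int demo() {
        // Assigned to a local variable
        Function<Integer, Integer> addTen = adder(10);
        // andThen() builds a new function: square first, then addTen
        Function<Integer, Integer> squareThenAddTen = square.andThen(addTen);
        // twice(+1) turns 1 into 3; 3 squared is 9; 9 + 10 is 19
        return squareThenAddTen.apply(twice(x -> x + 1, 1));
    }
}
```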

Page 11

Avoids mutable state

Use:

Immutable data structures

Single assignment variables

Some functional languages, such as Haskell, don’t allow side-effects

There are benefits to immutability

Easier concurrency

More reliable code

But be pragmatic

Page 12

Why functional programming?

"the highest goal of programming-language design to enable good ideas to be elegantly expressed"

http://en.wikipedia.org/wiki/Tony_Hoare

Page 13

Why functional programming?

More expressive

More concise

More intuitive - solution matches problem definition

Elimination of error-prone mutable state

Easy parallelization

Page 14

An ancient idea that has recently become popular

Page 15

Mathematical foundation:

λ-calculus

Introduced by Alonzo Church in the 1930s

Page 16

Lisp = an early functional language invented in 1958

http://en.wikipedia.org/wiki/Lisp_(programming_language)

[Timeline, 1940 to 2010. Lisp features: garbage collection, dynamic typing, self-hosting compiler, tree data structures]

(defun factorial (n) (if (<= n 1) 1 (* n (factorial (- n 1)))))

Page 17

My final year project in 1985: Implementing SASL

sieve (p:xs) = p : sieve [x | x <- xs, rem x p > 0];

primes = sieve [2..]

A list of integers starting with 2

Filter out multiples of p

Page 18

Mostly an Ivory Tower technology

Lisp was used for AI

FP languages: Miranda, ML, Haskell, ...

“Side-effects kills kittens and puppies”

Page 19

http://steve-yegge.blogspot.com/2010/12/haskell-researchers-announce-discovery.html


Page 20

But today FP is mainstream

Clojure - a dialect of Lisp

Scala - a hybrid OO/functional language

F# - a hybrid OO/FP language for .NET

Java 8 has lambda expressions

Page 21

Java 8 lambda expressions are functions

x -> x * x

x -> { for (int i = 2; i <= Math.sqrt(x); i = i + 1) { if (x % i == 0) return false; } return true; }

(x, y) -> x * x + y * y

Page 22

Java 8 lambdas are a shorthand* for an anonymous inner class

* not exactly. See http://programmers.stackexchange.com/questions/177879/type-inference-in-java-8

Page 23

Java 8 functional interfaces

Interface with a single abstract method

e.g. Runnable, Callable, Spring’s TransactionCallback

A lambda expression is an instance of a functional interface.

You can use a lambda wherever a functional interface “value” is expected

The type of the lambda expression is determined from its context
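To make the anonymous-inner-class comparison and the single-abstract-method rule concrete, here is a small sketch; IntTransformer is a hypothetical interface invented for this example.

```java
import java.util.function.Predicate;

// Hypothetical single-abstract-method interface; @FunctionalInterface asks the
// compiler to reject it if a second abstract method is ever added
@FunctionalInterface
interface IntTransformer {
    int transform(int x);
}

class LambdaVsInnerClass {
    // Pre-Java 8: an anonymous inner class implementing the interface
    static final IntTransformer squareOld = new IntTransformer() {
        @Override
        public int transform(int x) {
            return x * x;
        }
    };

    // Java 8: the same behavior as a lambda; its type, IntTransformer,
    // comes from the declaration it is assigned to
    static final IntTransformer squareNew = x -> x * x;

    // A similar lambda in a different context gets a different type
    static final Predicate<Integer> isPositive = x -> x > 0;
}
```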

Page 24

Example functional interfaces

Function<Integer, Integer> square = x -> x * x;

BiFunction<Integer, Integer, Integer> sumSquares = (x, y) -> x * x + y * y;

Predicate<Integer> makeIsDivisibleBy(int y) { return x -> x % y == 0; }

Predicate<Integer> isEven = makeIsDivisibleBy(2);

Assert.assertTrue(isEven.test(8));

Assert.assertFalse(isEven.test(11));

Page 25

Example functional interface

ExecutorService executor = ...;

final int x = 999;

Future<Boolean> outcome = executor.submit(() -> { for (int i = 2; i <= Math.sqrt(x); i = i + 1) { if (x % i == 0) return false; } return true; });

This lambda is a Callable
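A self-contained variant of this slide, assuming a single-thread executor and a hypothetical PrimeCheck.isPrimeAsync wrapper. The loop bound is i * i <= x so that perfect squares such as 9 are correctly rejected.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical wrapper class, for illustration only
class PrimeCheck {
    static boolean isPrimeAsync(int x) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // The lambda returns a value, so submit() treats it as a Callable<Boolean>
            Future<Boolean> outcome = executor.submit(() -> {
                if (x < 2) return false;
                for (int i = 2; (long) i * i <= x; i++) {
                    if (x % i == 0) return false;
                }
                return true;
            });
            return outcome.get(); // blocks until the worker thread finishes
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }
}
```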

Page 26

Agenda

Why functional programming?

Simplifying collection processing

Eliminating NullPointerExceptions

Simplifying concurrency with Futures and Rx Observables

Tackling big data problems with functional programming

Page 27

Lots of application code = collection processing:

Mapping, filtering, and reducing

Page 28

Social network example

public class Person {

enum Gender { MALE, FEMALE }

private Name name; private LocalDate birthday; private Gender gender; private Hometown hometown;

private Set<Friend> friends = new HashSet<Friend>(); ....

public class Friend {

private Person friend; private LocalDate becameFriends; ...}

public class SocialNetwork { private Set<Person> people; ...

Page 29

Mapping, filtering, and reducing

public class Person {

public Set<Hometown> hometownsOfFriends() { Set<Hometown> result = new HashSet<>(); for (Friend friend : friends) { result.add(friend.getPerson().getHometown()); } return result; }

Page 30

Mapping, filtering, and reducing

public class Person {

public Set<Person> friendOfFriends() { Set<Person> result = new HashSet<>(); for (Friend friend : friends) for (Friend friendOfFriend : friend.getPerson().friends) if (friendOfFriend.getPerson() != this) result.add(friendOfFriend.getPerson()); return result; }

Page 31

Mapping, filtering, and reducing

public class SocialNetwork {

private Set<Person> people;

...

public Set<Person> lonelyPeople() { Set<Person> result = new HashSet<Person>(); for (Person p : people) { if (p.getFriends().isEmpty()) result.add(p); } return result; }

Page 32

Mapping, filtering, and reducing

public class SocialNetwork {

private Set<Person> people;

...

public int averageNumberOfFriends() { int sum = 0; for (Person p : people) { sum += p.getFriends().size(); } return sum / people.size(); }

Page 33

Problems with this style of programming

Low level

Imperative (how to do it) NOT declarative (what to do)

Verbose

Mutable variables are potentially error prone

Difficult to parallelize

Page 34

Java 8 streams to the rescue

A sequence of elements

“Wrapper” around a collection

Streams can also be infinite

Provides a functional/lambda-based API for transforming, filtering and aggregating elements

Much simpler, cleaner code

Page 35

Using Java 8 streams - mapping

class Person ..

private Set<Friend> friends = ...;

public Set<Hometown> hometownsOfFriends() { return friends.stream() .map(f -> f.getPerson().getHometown()) .collect(Collectors.toSet()); }

Page 36

The map() function

s1 a b c d e ...

s2 f(a) f(b) f(c) f(d) f(e) ...

s2 = s1.map(f)
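The same idea as runnable Java 8 on plain strings (the input values are arbitrary): map() produces exactly one output element per input element.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper names, for illustration only
class MapDemo {
    // s2 = s1.map(f): one output element per input element, in order
    static <A, B> List<B> mapAll(List<A> s1, Function<A, B> f) {
        return s1.stream().map(f).collect(Collectors.toList());
    }

    static List<Integer> wordLengths() {
        return mapAll(Arrays.asList("a", "bb", "ccc"), String::length);
    }
}
```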

Page 37

Using Java 8 streams - filtering

public class SocialNetwork {

private Set<Person> people;

...

public Set<Person> peopleWithNoFriends() { Set<Person> result = new HashSet<Person>(); for (Person p : people) { if (p.getFriends().isEmpty()) result.add(p); } return result; }

public class SocialNetwork {

private Set<Person> people;

...

public Set<Person> lonelyPeople() { return people.stream()

.filter(p -> p.getFriends().isEmpty())

.collect(Collectors.toSet()); }
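A self-contained sketch of lonelyPeople(), assuming for brevity that the social network is just a Map from each person's name to the names of their friends, rather than the Person and Friend classes above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical simplified model: name -> names of friends
class FilterDemo {
    // Keeps only the people whose friend set is empty
    static Set<String> lonelyPeople(Map<String, Set<String>> network) {
        return network.entrySet().stream()
                .filter(e -> e.getValue().isEmpty())
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
    }

    // Invented sample data
    static Map<String, Set<String>> sampleNetwork() {
        Map<String, Set<String>> network = new HashMap<>();
        network.put("alice", new HashSet<>(Collections.singletonList("bob")));
        network.put("bob", new HashSet<>(Collections.singletonList("alice")));
        network.put("carol", Collections.emptySet());
        return network;
    }
}
```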

Page 38

Using Java 8 streams - friend of friends V1

class Person ..

public Set<Person> friendOfFriends() { Set<Set<Friend>> fof = friends.stream() .map(friend -> friend.getPerson().friends) .collect(Collectors.toSet()); ... }

Using map() => Set of Sets :-(

Somehow we need to flatten

Page 39

Using Java 8 streams - mapping

class Person ..

public Set<Person> friendOfFriends() { return friends.stream() .flatMap(friend -> friend.getPerson().friends.stream()) .map(Friend::getPerson) .filter(f -> f != this) .collect(Collectors.toSet()); }

maps and flattens

Page 40

The flatMap() function

s1 a b ...

s2 f(a)[0] f(a)[1] f(b)[0] f(b)[1] f(b)[2] ...

s2 = s1.flatMap(f)
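Under the same kind of simplified name-to-friends model (an assumption, with invented sample data), friendOfFriends() becomes a flatMap() followed by a filter():

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical simplified model: name -> names of friends
class FlatMapDemo {
    // map() would yield a Stream<List<String>>; flatMap() concatenates the
    // inner streams into a single Stream<String>
    static Set<String> friendsOfFriends(Map<String, List<String>> network, String person) {
        return network.getOrDefault(person, Collections.emptyList()).stream()
                .flatMap(friend -> network.getOrDefault(friend, Collections.emptyList()).stream())
                .filter(fof -> !fof.equals(person)) // exclude the person themselves
                .collect(Collectors.toSet());
    }

    // Invented sample data
    static Map<String, List<String>> sampleNetwork() {
        Map<String, List<String>> network = new HashMap<>();
        network.put("alice", Arrays.asList("bob", "carol"));
        network.put("bob", Arrays.asList("alice", "dave"));
        network.put("carol", Arrays.asList("alice", "erin", "dave"));
        return network;
    }
}
```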

Page 41

Using Java 8 streams - reducing

public class SocialNetwork {

private Set<Person> people;

...

public long averageNumberOfFriends() { return people.stream() .map(p -> p.getFriends().size()) .reduce(0, (x, y) -> x + y) / people.size(); }

Equivalent imperative loop: int x = 0; for (int y : inputStream) x = x + y; return x;

Page 42

The reduce() function

s1 a b c d e ...

x = s1.reduce(initial, f)

f(f(f(f(f(f(initial, a), b), c), d), e), ...)
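A sketch of reduce() with an identity value and an associative f, applied to plain friend counts instead of Person objects. Dividing two ints truncates, so this version of the average divides as double; that is a design choice of the sketch, not of the slide.

```java
import java.util.List;

// Hypothetical helper names, for illustration only
class ReduceDemo {
    // reduce(initial, f) conceptually computes f(...f(f(initial, a), b)..., e)
    static int total(List<Integer> values) {
        return values.stream().reduce(0, (x, y) -> x + y);
    }

    // Average over friend counts; dividing as double avoids int truncation
    static double average(List<Integer> friendCounts) {
        return friendCounts.isEmpty() ? 0.0 : (double) total(friendCounts) / friendCounts.size();
    }
}
```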

Page 43

Newton's method for finding square roots

public class SquareRootCalculator {

public double squareRoot(double input, double precision) { return Stream.iterate( new Result(1.0, false), current -> refine(current, input, precision)) .filter(r -> r.done) .findFirst().get().value; }

private static Result refine(Result current, double input, double precision) { double value = current.value; double newCurrent = value - (value * value - input) / (2 * value); boolean done = Math.abs(value - newCurrent) < precision; return new Result(newCurrent, done); } private static class Result { final double value; final boolean done; Result(double value, boolean done) { this.value = value; this.done = done; } } }

Creates an infinite stream: seed, f(seed), f(f(seed)), .....

Don’t panic! Streams are lazy

Page 44

Agenda

Why functional programming?

Simplifying collection processing

Eliminating NullPointerExceptions

Simplifying concurrency with Futures and Rx Observables

Tackling big data problems with functional programming

Page 45

Tony’s $1B mistake

“I call it my billion-dollar mistake. It was the invention of the null reference in 1965. ... But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement...”

http://qconlondon.com/london-2009/presentation/Null+References:+The+Billion+Dollar+Mistake

Page 46

Coding with null pointers

class Person

public Friend longestFriendship() { Friend result = null; for (Friend friend : friends) { if (result == null || friend.getBecameFriends() .isBefore(result.getBecameFriends())) result = friend; } return result; }

Friend oldestFriend = person.longestFriendship(); if (oldestFriend != null) { ... } else { ... }

Null check is essential yet easily forgotten

Page 47

Java 8 Optional<T>

A wrapper for nullable references

It has two states:

empty ⇒ throws an exception if you try to get the reference

non-empty ⇒ contains a non-null reference

Provides methods for:

testing whether it has a value

getting the value

...

Return reference wrapped in an instance of this type instead of null
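A minimal sketch of the two states; nickname is a hypothetical method that might otherwise have returned null.

```java
import java.util.NoSuchElementException;
import java.util.Optional;

// Hypothetical example class, for illustration only
class OptionalBasics {
    // Return Optional.ofNullable(...) instead of a bare, possibly-null reference
    static Optional<String> nickname(String maybeNull) {
        return Optional.ofNullable(maybeNull);
    }

    // get() on an empty Optional throws NoSuchElementException instead of returning null
    static boolean getOnEmptyThrows() {
        try {
            nickname(null).get();
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }
}
```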

Page 48

Coding with optionals

class Person

public Optional<Friend> longestFriendship() { Friend result = null; for (Friend friend : friends) { if (result == null || friend.getBecameFriends().isBefore(result.getBecameFriends())) result = friend; } return Optional.ofNullable(result); }

Optional<Friend> oldestFriend = person.longestFriendship();
// Might throw java.util.NoSuchElementException: No value present
// Friend dangerous = oldestFriend.get();
if (oldestFriend.isPresent()) { ...oldestFriend.get() } else { ... }

Page 49

Using Optionals - better

Optional<Friend> oldestFriendship = ...;

Friend whoToCall1 = oldestFriendship.orElse(mother);

Friend whoToCall2 = oldestFriendship.orElseGet(() -> lazilyFindSomeoneElse());

Friend whoToCall3 = oldestFriendship.orElseThrow( () -> new LonelyPersonException());

Avoid calling isPresent() and get()
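The three fallbacks differ mainly in when the fallback runs. This sketch uses String in place of Friend, and lazilyFindSomeoneElse is a hypothetical expensive lookup that counts its own invocations.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class, for illustration only
class OptionalFallbacks {
    // Counts how often the hypothetical expensive fallback really runs
    static final AtomicInteger lookups = new AtomicInteger();

    static String lazilyFindSomeoneElse() {
        lookups.incrementAndGet();
        return "someone else";
    }

    // orElse(): the fallback argument is evaluated before the call, present or not
    static String whoToCall1(Optional<String> oldestFriend) {
        return oldestFriend.orElse("mother");
    }

    // orElseGet(): the supplier runs only when the Optional is empty
    static String whoToCall2(Optional<String> oldestFriend) {
        return oldestFriend.orElseGet(OptionalFallbacks::lazilyFindSomeoneElse);
    }

    // orElseThrow(): turn "no value" into a domain-specific exception
    static String whoToCall3(Optional<String> oldestFriend) {
        return oldestFriend.orElseThrow(() -> new IllegalStateException("lonely person"));
    }
}
```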

Page 50

Using Optional.map()

public class Person {

public Optional<Friend> longestFriendship() { return ...; }

public Optional<Long> ageDifferenceWithOldestFriend() { Optional<Friend> oldestFriend = longestFriendship(); return oldestFriend.map(of -> Math.abs(of.getPerson().getAge() - getAge())); }

Eliminates messy conditional logic

Page 51

Using flatMap()

class Person

public Optional<Friend> longestFriendship() {...}

public Optional<Friend> longestFriendshipOfLongestFriend() { return longestFriendship() .flatMap(friend -> friend.getPerson().longestFriendship());}

not always a symmetric relationship. :-)
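Both Optional.map() and Optional.flatMap() can be exercised with a hypothetical lookup table standing in for the Person and Friend classes (names and ages invented):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical data model, for illustration only
class OptionalChaining {
    // Each person's longest-standing friend, and everyone's age (invented data)
    static final Map<String, String> longestFriendOf = new HashMap<>();
    static final Map<String, Integer> ageOf = new HashMap<>();

    static {
        longestFriendOf.put("alice", "bob");
        longestFriendOf.put("bob", "carol"); // carol has no entry: no friends
        ageOf.put("alice", 30);
        ageOf.put("bob", 35);
        ageOf.put("carol", 41);
    }

    static Optional<String> longestFriendship(String person) {
        return Optional.ofNullable(longestFriendOf.get(person));
    }

    // map(): transform the value if there is one, with no explicit null checks
    static Optional<Integer> ageDifferenceWithOldestFriend(String person) {
        return longestFriendship(person).map(f -> Math.abs(ageOf.get(f) - ageOf.get(person)));
    }

    // flatMap(): the function already returns an Optional, so map() would give
    // Optional<Optional<String>>; flatMap() flattens it to Optional<String>
    static Optional<String> longestFriendshipOfLongestFriend(String person) {
        return longestFriendship(person).flatMap(OptionalChaining::longestFriendship);
    }
}
```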

Page 52

Agenda

Why functional programming?

Simplifying collection processing

Eliminating NullPointerExceptions

Simplifying concurrency with Futures and Rx Observables

Tackling big data problems with functional programming

Page 53

Let’s imagine you are performing a CPU-intensive operation

class Person ..

public Set<Hometown> hometownsOfFriends() { return friends.stream() .map(f -> cpuIntensiveOperation()) .collect(Collectors.toSet()); }

Page 54

class Person ..

public Set<Hometown> hometownsOfFriends() { return friends.parallelStream() .map(f -> cpuIntensiveOperation()) .collect(Collectors.toSet()); }

Parallel streams = simple concurrency

Potentially uses N cores ⇒ up to Nx speed-up
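A sketch of the sequential and parallel versions side by side, with a hypothetical cpuIntensiveOperation (a naive prime count) standing in for real work. Both must produce the same result; any actual speed-up depends on the number of cores, the amount of work per element and how well the source splits.

```java
import java.util.stream.IntStream;

// Hypothetical example class, for illustration only
class ParallelDemo {
    // Hypothetical CPU-bound stand-in: counts primes below n by trial division
    static long cpuIntensiveOperation(int n) {
        return IntStream.range(2, n)
                .filter(c -> IntStream.rangeClosed(2, (int) Math.sqrt(c)).noneMatch(d -> c % d == 0))
                .count();
    }

    static long sequentialTotal(int tasks) {
        return IntStream.range(0, tasks).mapToLong(i -> cpuIntensiveOperation(2_000 + i)).sum();
    }

    static long parallelTotal(int tasks) {
        // Same pipeline; parallel() splits the work across the common ForkJoinPool
        return IntStream.range(0, tasks).parallel().mapToLong(i -> cpuIntensiveOperation(2_000 + i)).sum();
    }
}
```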

Page 55

Let’s imagine that you are writing code to display the products in a user’s wish list

Page 56

The need for concurrency

Step #1

Web service request to get the user profile including wish list (list of product Ids)

Step #2

For each productId: web service request to get product info

Sequentially ⇒ terrible response time

Need to fetch productInfo concurrently

Page 57

Futures are a great concurrency abstraction

http://en.wikipedia.org/wiki/Futures_and_promises

Page 58

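How futures work

[Diagram: the Client, on the main thread, initiates an asynchronous operation that runs on a worker thread or is event-driven; the operation sets the Outcome of a Future, and the Client calls get on that Future to retrieve the Outcome.]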

Page 59

Benefits

Simple way for two concurrent activities to communicate safely

Abstraction:

Client does not know how the asynchronous operation is implemented

Easy to implement scatter/gather:

Scatter: Client can invoke multiple asynchronous operations and gets a Future for each one.

Gather: Get values from the futures
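A minimal scatter/gather sketch with plain java.util.concurrent futures; fetchProductName is a hypothetical stand-in for a remote call, and the pool size and timeout are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical example class, for illustration only
class ScatterGather {
    // Hypothetical stand-in for a slow remote lookup
    static String fetchProductName(long productId) {
        return "product-" + productId;
    }

    static List<String> fetchAll(List<Long> productIds) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            // Scatter: start every request and keep a Future for each one
            List<Future<String>> futures = new ArrayList<>();
            for (long id : productIds) {
                futures.add(executor.submit(() -> fetchProductName(id)));
            }
            // Gather: wait for each result in order, with a per-call timeout
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get(1, TimeUnit.SECONDS));
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }
}
```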

Page 60

Example wish list service

public interface UserService { Future<UserProfile> getUserProfile(long userId); }

public class UserServiceProxy implements UserService {

private ExecutorService executorService;

@Override public Future<UserProfile> getUserProfile(long userId) { return executorService.submit(() -> restfulGet("http://uservice/user/" + userId, UserProfile.class)); } ...}

public interface ProductInfoService { Future<ProductInfo> getProductInfo(long productId);}

Page 61

Example wish list service

public class WishlistService {

private UserService userService; private ProductInfoService productInfoService;

public Wishlist getWishlistDetails(long userId) throws Exception {

// get user info
Future<UserProfile> userProfileFuture = userService.getUserProfile(userId); UserProfile userProfile = userProfileFuture.get(300, TimeUnit.MILLISECONDS);

// asynchronously get all products
List<Future<ProductInfo>> productInfoFutures = userProfile.getWishListProductIds().stream() .map(productInfoService::getProductInfo) .collect(Collectors.toList());

// wait for product info
long deadline = System.currentTimeMillis() + 300;

List<ProductInfo> products = new ArrayList<ProductInfo>(); for (Future<ProductInfo> pif : productInfoFutures) { long timeout = deadline - System.currentTimeMillis(); if (timeout <= 0) throw new TimeoutException(...); products.add(pif.get(timeout, TimeUnit.MILLISECONDS)); }... return new Wishlist(products); }

Page 62

It works BUT

Code is very low-level and messy

And, it’s blocking

Page 63

Better: Futures with callbacks ⇒ no blocking!

def asyncSquare(x : Int) : Future[Int] = ... x * x...

val f = asyncSquare(25)

Guava ListenableFutures, Spring 4 ListenableFuture, Java 8 CompletableFuture, Scala Futures

f onSuccess { case x : Int => println(x) }

f onFailure { case e : Exception => println("exception thrown") }

Partial function applied to successful outcome

Applied to failed outcome

Page 64

But callback-based scatter/gather

⇒ Messy, tangled code (aka callback hell)

Page 65

Functional futures - map

def asyncPlus(x : Int, y : Int) = ... x + y ...

val future2 = asyncPlus(4, 5).map{ _ * 3 }

assertEquals(27, Await.result(future2, 1 second))

Scala, Java 8 CompletableFuture

Asynchronously transforms future
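In Java 8, CompletableFuture spells map() as thenApply(). A sketch of the Scala example, assuming asyncPlus simply runs the addition via supplyAsync() on the common pool:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical example class, for illustration only
class FutureMap {
    // Hypothetical async operation: computes x + y on ForkJoinPool.commonPool()
    static CompletableFuture<Integer> asyncPlus(int x, int y) {
        return CompletableFuture.supplyAsync(() -> x + y);
    }

    // map(): transform the eventual value without blocking the caller
    static CompletableFuture<Integer> plusThenTriple(int x, int y) {
        return asyncPlus(x, y).thenApply(sum -> sum * 3);
    }
}
```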

Page 66

Functional futures - flatMap()

val f2 = asyncPlus(5, 8).flatMap { x => asyncSquare(x) }

assertEquals(169, Await.result(f2, 1 second))

Scala, Java 8 CompletableFuture (partially)

Calls asyncSquare() with the eventual outcome of asyncPlus()
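CompletableFuture's counterpart of flatMap() is thenCompose(). Again, asyncPlus and asyncSquare are trivial hypothetical stand-ins that run via supplyAsync():

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical example class, for illustration only
class FutureFlatMap {
    static CompletableFuture<Integer> asyncPlus(int x, int y) {
        return CompletableFuture.supplyAsync(() -> x + y);
    }

    static CompletableFuture<Integer> asyncSquare(int x) {
        return CompletableFuture.supplyAsync(() -> x * x);
    }

    // flatMap(): the function returns a future, and thenCompose() flattens what
    // would otherwise be CompletableFuture<CompletableFuture<Integer>>
    static CompletableFuture<Integer> plusThenSquare(int x, int y) {
        return asyncPlus(x, y).thenCompose(FutureFlatMap::asyncSquare);
    }
}
```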

Page 67

flatMap() is asynchronous

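f2 = f1 flatMap (someFn)

[Diagram: when f1 completes with Outcome1, someFn(outcome1) is called and returns a future f3; when f3 completes with Outcome3, f2 completes with Outcome3.]

Implemented using callbacks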

Page 68

Scala wishlist service

class WishListService(...) {
  def getWishList(userId : Long) : Future[WishList] = {
    userService.getUserProfile(userId) flatMap { userProfile =>
      val listOfProductFutures : List[Future[ProductInfo]] = userProfile.wishListProductIds
        .map { productInfoService.getProductInfo }
      val futureOfProductsList : Future[List[ProductInfo]] = Future.sequence(listOfProductFutures)
      val wishlist = futureOfProductsList.map { products => WishList(products) }
      val timeoutFuture = ...
      Future.firstCompletedOf(Seq(wishlist, timeoutFuture))
    }
  }
}

Page 69

Using Java 8 CompletableFutures

public class UserServiceImpl implements UserService { @Override public CompletableFuture<UserInfo> getUserInfo(long userId) { return CompletableFuture.supplyAsync( () -> httpGetRequest("http://myuservice/user/" + userId, UserInfo.class)); }

Runs in an ExecutorService (by default, ForkJoinPool.commonPool())

Page 70

Using Java 8 CompletableFutures

public CompletableFuture<Wishlist> getWishlistDetails(long userId) { return userService.getUserProfile(userId).thenComposeAsync(userProfile -> {

Stream<CompletableFuture<ProductInfo>> s1 = userProfile.getWishListProductIds() .stream() .map(productInfoService::getProductInfo);

Stream<CompletableFuture<List<ProductInfo>>> s2 = s1.map(fOfPi -> fOfPi.thenApplyAsync(pi -> Arrays.asList(pi)));

CompletableFuture<List<ProductInfo>> productInfos = s2 .reduce((f1, f2) -> f1.thenCombine(f2, ListUtils::union)) .orElse(CompletableFuture.completedFuture(Collections.emptyList()));

return productInfos.thenApply(list -> new Wishlist(list)); }); }

Java 8 is missing Future.sequence()

flatMap()!

map()!
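Future.sequence() can be approximated with CompletableFuture.allOf(). In this sketch, sequence is a helper name borrowed from Scala, not a JDK method, and productNames is a hypothetical example caller.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only
class FutureSequence {
    // Turns a List<CompletableFuture<T>> into a CompletableFuture<List<T>>;
    // if any input fails, allOf() and therefore the result fail too
    static <T> CompletableFuture<List<T>> sequence(List<CompletableFuture<T>> futures) {
        CompletableFuture<Void> all = CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
        // Once allOf() has completed, every join() below returns immediately
        return all.thenApply(ignored -> futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList()));
    }

    // Hypothetical caller: one async "lookup" per product id, gathered into one future
    static CompletableFuture<List<String>> productNames(List<Long> ids) {
        List<CompletableFuture<String>> futures = ids.stream()
                .map(id -> CompletableFuture.supplyAsync(() -> "product-" + id))
                .collect(Collectors.toList());
        return sequence(futures);
    }
}
```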

Page 71

Introducing Reactive Extensions (Rx)

The Reactive Extensions (Rx) is a library for composing asynchronous and event-based programs using observable sequences and LINQ-style query operators. Using Rx, developers represent asynchronous data streams with Observables, query asynchronous data streams using LINQ operators, and .....

https://rx.codeplex.com/

Page 72

About RxJava

Reactive Extensions (Rx) for the JVM

Original motivation for Netflix was to provide rich Futures

Implemented in Java

Adaptors for Scala, Groovy and Clojure

https://github.com/Netflix/RxJava

Page 73

RxJava core concepts

trait Observable[T] { def subscribe(observer : Observer[T]) : Subscription ...}

trait Observer[T] {
  def onNext(value : T)
  def onCompleted()
  def onError(e : Throwable)
}

Notifies

An asynchronous stream of items

Used to unsubscribe
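To make the push model concrete, here is a deliberately tiny, synchronous Java sketch of the two traits. Observer, Observable, fromList, map and CollectingObserver are hypothetical names for this illustration; this is not RxJava's API and it has no subscriptions, schedulers or backpressure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical toy Observer: receives pushed items, then one completion or error signal
interface Observer<T> {
    void onNext(T value);
    void onCompleted();
    void onError(Throwable e);
}

// Hypothetical toy Observable: anything an Observer can subscribe to
interface Observable<T> {
    void subscribe(Observer<T> observer);

    // Pushes the list's items, then completes (synchronously, for simplicity)
    static <T> Observable<T> fromList(List<T> items) {
        return observer -> {
            try {
                for (T item : items) observer.onNext(item);
            } catch (RuntimeException e) {
                observer.onError(e);
                return;
            }
            observer.onCompleted();
        };
    }

    // map(): wraps the downstream observer so each pushed value is transformed
    default <R> Observable<R> map(Function<T, R> f) {
        return downstream -> subscribe(new Observer<T>() {
            public void onNext(T value) { downstream.onNext(f.apply(value)); }
            public void onCompleted() { downstream.onCompleted(); }
            public void onError(Throwable e) { downstream.onError(e); }
        });
    }
}

// Hypothetical test helper: records everything pushed to it
class CollectingObserver<T> implements Observer<T> {
    final List<T> items = new ArrayList<>();
    boolean completed;

    public void onNext(T value) { items.add(value); }
    public void onCompleted() { completed = true; }
    public void onError(Throwable e) { throw new IllegalStateException(e); }
}
```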

Page 74

Comparing Observable to...

Observer pattern - similar but adds

Observer.onCompleted()

Observer.onError()

Iterator pattern - mirror image

Push rather than pull

Futures - similar

Can be used as Futures

But Observables = a stream of multiple values

Collections and Streams - similar

Functional API supporting map(), flatMap(), ...

But Observables are asynchronous

Page 75

Fun with observables

val every10Seconds = Observable.interval(10 seconds)

-1 0 1 ...

t=0 t=10 t=20 ...

val oneItem = Observable.items(-1L)

val ticker = oneItem ++ every10Seconds

val subscription = ticker.subscribe { (value: Long) => println("value=" + value) }

...

subscription.unsubscribe()

Page 76

Connecting observables to the outside world

def getTableStatus(tableName: String) : Observable[DynamoDbStatus] =
  Observable { subscriber: Subscriber[DynamoDbStatus] =>
    amazonDynamoDBAsyncClient.describeTableAsync(new DescribeTableRequest(tableName),
      new AsyncHandler[DescribeTableRequest, DescribeTableResult] {
        override def onSuccess(request: DescribeTableRequest, result: DescribeTableResult) = {
          subscriber.onNext(DynamoDbStatus(result.getTable.getTableStatus))
          subscriber.onCompleted()
        }
        override def onError(exception: Exception) = exception match {
          case t: ResourceNotFoundException =>
            subscriber.onNext(DynamoDbStatus("NOT_FOUND"))
            subscriber.onCompleted()
          case _ =>
            subscriber.onError(exception)
        }
      })
  }

Page 77

Transforming observables

val tableStatus = ticker.flatMap { i =>
  logger.info("{}th describe table", i + 1)
  getTableStatus(name)
}

Status1 Status2 Status3 ...

t=0 t=10 t=20 ...+ Usual collection methods: map(), filter(), take(), drop(), ...

Page 78

Calculating rolling average

class AverageTradePriceCalculator {
  def calculateAverages(trades: Observable[Trade]): Observable[AveragePrice] = { ... }
}

case class Trade(symbol: String, price: Double, quantity: Int, ...)

case class AveragePrice(symbol: String, price: Double, ...)

Page 79

Calculating average prices

def calculateAverages(trades: Observable[Trade]): Observable[AveragePrice] = {
  trades.groupBy(_.symbol).map { symbolAndTrades =>
    val (symbol, tradesForSymbol) = symbolAndTrades
    val openingEverySecond =
      Observable.items(-1L) ++ Observable.interval(1.seconds)
    def closingAfterSixSeconds(opening: Any) =
      Observable.interval(6.seconds).take(1)

    tradesForSymbol.window(...).map { windowOfTradesForSymbol =>
      // Accumulate (sum of price * quantity, total quantity, prices) over one window
      windowOfTradesForSymbol.fold((0.0, 0, List[Double]())) { (soFar, trade) =>
        val (sum, count, prices) = soFar
        (sum + trade.price * trade.quantity, count + trade.quantity, trade.price +: prices)
      } map { x =>
        val (sum, length, prices) = x
        // Volume-weighted average price for the window
        AveragePrice(symbol, sum / length, prices)
      }
    }.flatten
  }.flatten
}
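To check the windowing arithmetic without any Rx machinery, a batch model over an ordinary collection can help. This sketch assumes each trade carries a hypothetical `timeSec` timestamp and uses its own `AveragePrice` shape (with a `windowStart` field); it opens a window every `step` seconds that stays open for `length` seconds (1 and 6, as on the slide) and computes a volume-weighted average price, i.e. the sum of price × quantity divided by the total quantity. Empty windows are skipped. It is a plain-Scala model, not the Rx implementation.

```scala
// Hypothetical: `timeSec` is an assumed timestamp field so windows can be computed in batch.
final case class Trade(symbol: String, price: Double, quantity: Int, timeSec: Long)
final case class AveragePrice(symbol: String, windowStart: Long, price: Double)

// Windows open every `step` seconds from `firstOpen` to `lastOpen`;
// each covers the half-open interval [open, open + length).
def rollingAverages(trades: Seq[Trade], firstOpen: Long, lastOpen: Long,
                    step: Long = 1L, length: Long = 6L): Seq[AveragePrice] =
  trades.groupBy(_.symbol).toSeq.sortBy(_._1).flatMap { case (symbol, forSymbol) =>
    (firstOpen to lastOpen by step).flatMap { open =>
      val window = forSymbol.filter(t => t.timeSec >= open && t.timeSec < open + length)
      // Fold each window into (sum of price * quantity, total quantity).
      val (notional, volume) = window.foldLeft((0.0, 0L)) { case ((sum, qty), t) =>
        (sum + t.price * t.quantity, qty + t.quantity)
      }
      // Skip empty windows instead of dividing by zero.
      if (volume == 0L) None else Some(AveragePrice(symbol, open, notional / volume))
    }
  }
```

Because windows overlap (a new one opens every second and each lasts six), the same trade contributes to several consecutive averages, which is what makes the result a rolling average.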

Page 80

Agenda

Why functional programming?

Simplifying collection processing

Eliminating NullPointerExceptions

Simplifying concurrency with Futures and Rx Observables

Tackling big data problems with functional programming

Page 81

Let’s imagine that you want to count word frequencies

Page 82

Scala Word Count

import scala.io.Source

val frequency: Map[String, Int] =
  Source.fromFile("gettysburgaddress.txt").getLines()
    .flatMap { _.split(" ") }.toList      // Map
    .groupBy(identity)                    // Reduce
    .map { case (word, occurrences) => word -> occurrences.length }

frequency("THE") should be(11)
frequency("LIBERTY") should be(1)
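The groupBy(identity) step materialises every occurrence of every word before counting it. The same reduction can be written as an explicit fold that keeps only a running count per word. Here is a sketch on in-memory text; `wordFrequencies` is a hypothetical name, and it also drops the empty tokens that `split(" ")` produces when a line contains consecutive spaces.

```scala
// Count words with an explicit fold (the "reduce") instead of groupBy(identity).
def wordFrequencies(lines: Iterator[String]): Map[String, Int] =
  lines
    .flatMap(_.split(" "))   // map step: one line becomes many words
    .filter(_.nonEmpty)      // consecutive spaces would otherwise yield ""
    .foldLeft(Map.empty[String, Int]) { (counts, word) =>
      counts.updated(word, counts.getOrElse(word, 0) + 1) // reduce step: bump this word's count
    }
```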

Page 83

But how to scale to a cluster of machines?

Page 84

Apache Hadoop

Open-source software for reliable, scalable, distributed computing

Hadoop Distributed File System (HDFS)

Efficiently stores very large amounts of data

Files are partitioned and replicated across multiple machines

Hadoop MapReduce

Batch processing system

Provides plumbing for writing distributed jobs

Handles failures

...

Page 85

Overview of MapReduce

[Diagram: Input Data → three Mappers, each reading (K,V) records and emitting (K,V)* pairs → Data Shuffle, which groups the pairs by key → three Reducers, receiving (K1,V, ....)*, (K2,V, ....)* and (K3,V, ....)* → each Reducer writes (K,V) pairs to the Output]
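The three phases in the diagram can be modelled in a single process with ordinary collection operations. In this sketch the hypothetical `mapReduce` helper uses flatMap for the map phase, groupBy for the shuffle and a per-key function for the reduce phase. Hadoop additionally partitions, sorts, spills to disk and retries failed tasks, none of which is modelled here.

```scala
// Single-process model of the three MapReduce phases (hypothetical helper).
def mapReduce[I, K, V, O](inputs: Seq[I])(mapper: I => Seq[(K, V)])(reducer: (K, Seq[V]) => O): Map[K, O] = {
  // Map phase: every input record yields zero or more (key, value) pairs.
  val mapped: Seq[(K, V)] = inputs.flatMap(mapper)
  // Shuffle phase: bring together all values that share a key.
  val shuffled: Map[K, Seq[V]] = mapped.groupBy(_._1).map { case (k, pairs) => k -> pairs.map(_._2) }
  // Reduce phase: collapse each key's values into a single output.
  shuffled.map { case (k, vs) => k -> reducer(k, vs) }
}
```

Word count is then a mapper that emits `(word, 1)` and a reducer that sums the ones, mirroring the Hadoop classes on the following pages.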

Page 86

MapReduce Word count - mapper

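import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString();
    StringTokenizer tokenizer = new StringTokenizer(line);
    // Emit (token, 1) for every whitespace-separated token in the line
    while (tokenizer.hasMoreTokens()) {
      word.set(tokenizer.nextToken());
      context.write(word, one);
    }
  }
}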

Four score and seven years ⇒ ("Four", 1), ("score", 1), ("and", 1), ("seven", 1), ...

http://wiki.apache.org/hadoop/WordCount

Page 87

Hadoop then shuffles the key-value pairs...

Page 88

MapReduce Word count - reducer

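import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    // Sum all the 1s emitted for this word
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    context.write(key, new IntWritable(sum));
  }
}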

("the", (1, 1, 1, 1, 1, 1, ...)) ⇒ ("the", 11)

http://wiki.apache.org/hadoop/WordCount

Page 89

About MapReduce

Very simple programming abstraction, yet incredibly powerful

By chaining together multiple map/reduce jobs you can process very large amounts of data

e.g. Apache Mahout for machine learning

But

Mappers and Reducers = verbose code

Development is challenging, e.g. unit testing is difficult

It’s disk-based, batch processing ⇒ slow

Page 90

Scalding: Scala DSL for MapReduce

import com.twitter.scalding._

class WordCountJob(args: Args) extends Job(args) {
  TextLine(args("input"))
    .flatMap('line -> 'word) { line: String => tokenize(line) }
    .groupBy('word) { _.size }
    .write(Tsv(args("output")))

  // Lower-case the text, strip punctuation, split on whitespace and drop empty tokens
  def tokenize(text: String): Array[String] =
    text.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "")
      .split("\\s+")
      .filter(_.nonEmpty)
}

https://github.com/twitter/scalding

Expressive and unit testable

Each row is a map of named fields
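Because `tokenize` is an ordinary function, it can be unit tested without a cluster or Scalding's job runner. One subtlety worth a test: `split("\\s+")` returns a leading empty string when a line begins with whitespace, and `""` for a blank line, which would then be counted as a word. A sketch of a tokenizer that filters those empty tokens, with assertions you might put in a unit test:

```scala
// Pure tokenizer based on the Scalding example, dropping empty tokens.
def tokenize(text: String): Array[String] =
  text.toLowerCase
    .replaceAll("[^a-zA-Z0-9\\s]", "") // remove punctuation
    .split("\\s+")                    // split on runs of whitespace
    .filter(_.nonEmpty)               // leading whitespace or a blank line would yield ""
```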

Page 91

Apache Spark

Part of the Hadoop ecosystem

Key abstraction = Resilient Distributed Datasets (RDD)

Collection that is partitioned across cluster members

Operations are parallelized

Created from either a Scala collection or a Hadoop-supported data source, e.g. HDFS or S3

Can be cached in memory, which makes repeated access very fast

Can be replicated for fault-tolerance

http://spark.apache.org
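RDD operations can be parallelised because each partition can be processed independently. A single-machine sketch of the idea: count words within each partition, then merge the partial maps. The helper names are hypothetical, and splitting the input with `grouped` is an assumption (Spark chooses partitions itself). Because merging is associative and commutative, the result does not depend on how the lines were partitioned.

```scala
// Count words within a single partition.
def countPartition(lines: Seq[String]): Map[String, Int] =
  lines.flatMap(_.split(" ")).filter(_.nonEmpty)
    .groupBy(identity)
    .map { case (word, occurrences) => word -> occurrences.size }

// Merge two partial counts; associative and commutative, so order does not matter.
def merge(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
  b.foldLeft(a) { case (acc, (word, n)) => acc.updated(word, acc.getOrElse(word, 0) + n) }

// Split the input into roughly `partitions` chunks, count each, then merge.
def partitionedWordCount(lines: Seq[String], partitions: Int): Map[String, Int] = {
  val chunkSize = math.max(1, math.ceil(lines.size.toDouble / partitions).toInt)
  lines.grouped(chunkSize)
    .map(countPartition)                     // independent per-partition work
    .foldLeft(Map.empty[String, Int])(merge) // combine the partial results
}
```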

Page 92

Spark Word Count

import org.apache.spark.SparkContext

val sc = new SparkContext(...)

sc.textFile("s3n://mybucket/...")
  .flatMap { _.split(" ") }
  .groupBy(identity)
  .mapValues(_.size)   // groupBy yields an Iterable of occurrences per word
  .collect()
  .toMap

Expressive, unit testable and very fast

Page 93

Summary

Functional programming enables the elegant expression of good ideas in a wide variety of domains

map(), flatMap() and reduce() are remarkably versatile higher-order functions

Use FP and OOP together

Java 8 has taken a good first step towards supporting FP

Page 94

Questions?

@crichardson

http://plainoldobjects.com