Map(), flatMap() and reduce() are your new best friends: Simpler collections, concurrency, and big data
Chris Richardson
Author of POJOs in Action
Founder of the original CloudFoundry.com
@crichardson
[email protected]
http://plainoldobjects.com
Map(), flatMap() and reduce() are your new best friends: simpler collections, concurrency, and big data (JAX 2014)
Higher-order functions such as map(), flatmap(), filter() and reduce() have their origins in mathematics and ancient functional programming languages such as Lisp. But today they have entered the mainstream and are available in languages such as JavaScript, Scala and Java 8. They are well on their way to becoming an essential part of every developer’s toolbox.
In this talk you will learn how these and other higher-order functions enable you to write simple, expressive and concise code that solves problems in a diverse set of domains. We will describe how you use them to process collections in Java and Scala. You will learn how functional Futures and Rx (Reactive Extensions) Observables simplify concurrent code. We will even talk about how to write big data applications in a functional style using libraries such as Scalding.
Transcript
public class SocialNetwork {
    private Set<Person> people;
    ...
}

public class Person {
    private Set<Friend> friends = new HashSet<Friend>();
    ....
}

public class Friend {
    private Person friend;
    private LocalDate becameFriends;
    ...
}

Mapping, filtering, and reducing

public class Person {
    public Set<Hometown> hometownsOfFriends() {
        Set<Hometown> result = new HashSet<>();
        for (Friend friend : friends) {
            result.add(friend.getPerson().getHometown());
        }
        return result;
    }
}
Mapping, filtering, and reducing

public class Person {
    public Set<Person> friendOfFriends() {
        Set<Person> result = new HashSet<>();
        for (Friend friend : friends)
            for (Friend friendOfFriend : friend.getPerson().friends)
                if (friendOfFriend.getPerson() != this)
                    result.add(friendOfFriend.getPerson());
        return result;
    }
}
Mapping, filtering, and reducing

public class SocialNetwork {
    private Set<Person> people;
    ...
    public Set<Person> lonelyPeople() {
        Set<Person> result = new HashSet<Person>();
        for (Person p : people) {
            if (p.getFriends().isEmpty())
                result.add(p);
        }
        return result;
    }
}
Mapping, filtering, and reducing

public class SocialNetwork {
    private Set<Person> people;
    ...
    public int averageNumberOfFriends() {
        int sum = 0;
        for (Person p : people) {
            sum += p.getFriends().size();
        }
        return sum / people.size();
    }
}
Problems with this style of programming
Low level
Imperative (how to do it) NOT declarative (what to do)
Verbose
Mutable variables are potentially error prone
Difficult to parallelize
Java 8 streams to the rescue
A sequence of elements
“Wrapper” around a collection
Streams can also be infinite
Provides a functional/lambda-based API for transforming, filtering and aggregating elements
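The transform/filter/aggregate pipeline described above can be sketched with a toy example; the class name and the data here are my own illustration, not from the talk:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class StreamPipelineDemo {
    // Keep the even numbers, square each one, and collect into a Set.
    static Set<Integer> evenSquares(List<Integer> xs) {
        return xs.stream()
                 .filter(x -> x % 2 == 0)       // keep elements matching a predicate
                 .map(x -> x * x)               // transform each element
                 .collect(Collectors.toSet());  // aggregate into a collection
    }

    public static void main(String[] args) {
        System.out.println(evenSquares(Arrays.asList(1, 2, 3, 4, 5)));
    }
}
```

Each intermediate operation (filter, map) is lazy; work happens only when the terminal collect() runs.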
Using Java 8 streams - filtering

public class SocialNetwork {
    private Set<Person> people;
    ...
    public Set<Person> lonelyPeople() {
        return people.stream()
                     .filter(p -> p.getFriends().isEmpty())
                     .collect(Collectors.toSet());
    }
}

It replaces the imperative version:

    public Set<Person> peopleWithNoFriends() {
        Set<Person> result = new HashSet<Person>();
        for (Person p : people) {
            if (p.getFriends().isEmpty())
                result.add(p);
        }
        return result;
    }
public class Person {
    public Set<Person> friendOfFriends() {
        return friends.stream()
                      .flatMap(friend -> friend.getPerson().friends.stream())  // maps and flattens
                      .map(Friend::getPerson)
                      .filter(f -> f != this)
                      .collect(Collectors.toSet());
    }
}
The flatMap() function
s1 a b ...
s2 f(a)0 f(a)1 f(b)0 f(b)1 f(b)2 ...
s2 = s1.flatMap(f)
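A minimal runnable sketch of that picture (my own example; the word list is arbitrary): f maps each element to a stream, and flatMap() concatenates all of the resulting streams into one.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlatMapDemo {
    // Each word becomes a stream of its letters; flatMap joins them end to end.
    static List<String> lettersOf(List<String> words) {
        return words.stream()
                    .flatMap(w -> Stream.of(w.split("")))  // one stream per word, flattened
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(lettersOf(Arrays.asList("ab", "cde"))); // [a, b, c, d, e]
    }
}
```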
Using Java 8 streams - reducing

public class SocialNetwork {
    private Set<Person> people;
    ...
    public long averageNumberOfFriends() {
        return people.stream()
                     .map(p -> p.getFriends().size())
                     .reduce(0, (x, y) -> x + y) / people.size();
    }
}

reduce(0, (x, y) -> x + y) is the functional equivalent of:

    int x = 0;
    for (int y : inputStream) x = x + y;
    return x;
The reduce() function
s1 a b c d e ...
x = s1.reduce(initial, f)
f(f(f(f(f(f(initial, a), b), c), d), e), ...)
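The left-fold shown above, written out as a runnable sketch (my own example, with addition as f and 0 as the initial value):

```java
import java.util.Arrays;
import java.util.List;

public class ReduceDemo {
    // reduce(initial, f) computes f(f(f(initial, a), b), c) ... over the stream.
    static int sum(List<Integer> xs) {
        return xs.stream().reduce(0, (x, y) -> x + y);
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, 3, 4, 5))); // 15
    }
}
```

Because addition is associative with identity 0, the same reduction can also run in parallel on a parallelStream().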
Newton's method for finding square roots
public class SquareRootCalculator {

    public double squareRoot(double input, double precision) {
        return Stream.iterate(
                   new Result(1.0),
                   current -> refine(current, input, precision))
               .filter(r -> r.done)
               .findFirst().get().value;
    }

    private static Result refine(Result current, double input, double precision) {
        double value = current.value;
        double newCurrent = value - (value * value - input) / (2 * value);
        boolean done = Math.abs(value - newCurrent) < precision;
        return new Result(newCurrent, done);
    }

    static class Result {
        boolean done;
        double value;
        Result(double value) { this.value = value; }
        Result(double value, boolean done) { this.value = value; this.done = done; }
    }
}
Creates an infinite stream: seed, f(seed), f(f(seed)), .....
Don’t panic! Streams are lazy
Agenda
Why functional programming?
Simplifying collection processing
Eliminating NullPointerExceptions
Simplifying concurrency with Futures and Rx Observables
Tackling big data problems with functional programming
Tony’s $1B mistake
“I call it my billion-dollar mistake. It was the invention of the null reference in 1965. ... But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement.”
Java 8 Optional<T>

A wrapper for nullable references

It has two states:

empty ⇒ throws an exception if you try to get the reference
non-empty ⇒ contains a non-null reference
Provides methods for:
testing whether it has a value
getting the value
...
Return reference wrapped in an instance of this type instead of null
Coding with optionals

class Person {
    public Optional<Friend> longestFriendship() {
        Friend result = null;
        for (Friend friend : friends) {
            if (result == null ||
                friend.getBecameFriends().isBefore(result.getBecameFriends()))
                result = friend;
        }
        return Optional.ofNullable(result);
    }
}

Optional<Friend> oldestFriend = person.longestFriendship();
// Might throw java.util.NoSuchElementException: No value present
// Friend dangerous = oldestFriend.get();
if (oldestFriend.isPresent()) {
    ... oldestFriend.get() ...
} else {
    ...
}
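The empty/non-empty protocol can be exercised with a tiny self-contained sketch (class and method names are mine, not from the slides):

```java
import java.util.Optional;

public class OptionalDemo {
    // Handle a possibly-missing value without ever touching null.
    static String greet(Optional<String> maybeName) {
        // The isPresent()/get() pair mirrors the slide; orElse() is a shortcut.
        if (maybeName.isPresent()) {
            return "Hello, " + maybeName.get();
        } else {
            return "Hello, stranger";
        }
    }

    public static void main(String[] args) {
        System.out.println(greet(Optional.of("Chris")));  // Hello, Chris
        System.out.println(greet(Optional.empty()));      // Hello, stranger
    }
}
```

Calling get() on the empty Optional would throw NoSuchElementException, which is exactly the failure mode the isPresent() check guards against.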
tradesForSymbol.window(...).map { windowOfTradesForSymbol =>
    windowOfTradesForSymbol.fold((0.0, 0, List[Double]())) { (soFar, trade) =>
      val (sum, count, prices) = soFar
      (sum + trade.price, count + trade.quantity, trade.price +: prices)
    } map { x =>
      val (sum, length, prices) = x
      AveragePrice(symbol, sum / length, prices)
    }
  }.flatten
}.flatten
}
Agenda
Why functional programming?
Simplifying collection processing
Eliminating NullPointerExceptions
Simplifying concurrency with Futures and Rx Observables
Tackling big data problems with functional programming
Let’s imagine that you want to count word frequencies
Scala Word Count

val frequency : Map[String, Int] =
  Source.fromFile("gettysburgaddress.txt").getLines()
    .flatMap { _.split(" ") }.toList   // Map
    .groupBy(identity)
    .mapValues(_.length)               // Reduce

frequency("THE") should be(11)
frequency("LIBERTY") should be(1)
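For comparison, the same word count can be written with Java 8 streams; this is my own sketch (not from the talk), where groupingBy + counting plays the role of groupBy/mapValues:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordCount {
    static Map<String, Long> frequency(Stream<String> lines) {
        return lines.flatMap(line -> Arrays.stream(line.split(" ")))     // Map
                    .collect(Collectors.groupingBy(Function.identity(),
                                                   Collectors.counting())); // Reduce
    }

    public static void main(String[] args) {
        Map<String, Long> freq = frequency(Stream.of("to be or not to be"));
        System.out.println(freq.get("to")); // 2
    }
}
```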
But how to scale to a cluster of machines?
Apache Hadoop

Open-source software for reliable, scalable, distributed computing
Hadoop Distributed File System (HDFS)
Efficiently stores very large amounts of data
Files are partitioned and replicated across multiple machines
Hadoop MapReduce
Batch processing system
Provides plumbing for writing distributed jobs
Handles failures
...
Overview of MapReduce

[Diagram: Input Data is split across several Mappers; each Mapper consumes (K,V) pairs and emits (K,V)* pairs. A Shuffle phase groups the values by key into (K1,V,...)*, (K2,V,...)*, (K3,V,...)* lists, one group per Reducer; each Reducer emits (K,V) pairs to the Output Data.]
MapReduce Word count - mapper

class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context) {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}

MapReduce Word count - reducer

class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context) {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}

(“the”, (1, 1, 1, 1, 1, 1, ...)) ⇒ (“the”, 11)
http://wiki.apache.org/hadoop/WordCount
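Conceptually, the Hadoop job above runs three phases. This plain-Java sketch (my own, no Hadoop involved) mimics them on a list of in-memory "documents":

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MiniMapReduce {
    public static Map<String, Integer> wordCount(List<String> docs) {
        // Map phase: each document yields (word, 1) pairs, like Map.map().
        List<SimpleEntry<String, Integer>> pairs = new ArrayList<>();
        for (String doc : docs)
            for (String word : doc.split(" "))
                pairs.add(new SimpleEntry<>(word, 1));

        // Shuffle phase: group the values by key, as Hadoop does between phases.
        Map<String, List<Integer>> grouped = new HashMap<>();
        for (SimpleEntry<String, Integer> p : pairs)
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());

        // Reduce phase: sum each key's values, like Reduce.reduce().
        Map<String, Integer> counts = new HashMap<>();
        grouped.forEach((word, ones) ->
            counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("the quick fox", "the lazy dog")).get("the")); // 2
    }
}
```

Hadoop's value is everything this sketch leaves out: partitioning the input across machines, moving the shuffled data over the network, and retrying failed tasks.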
About MapReduce

A very simple programming abstraction, yet incredibly powerful
By chaining together multiple map/reduce jobs you can process very large amounts of data
e.g. Apache Mahout for machine learning
But
Mappers and Reducers = verbose code
Development is challenging, e.g. unit testing is difficult