Deconstructing Lambda Architectures Felix Crisan, Co-Founder & CTO, NETOPIA

Deconstructing Lambda Architectures

Jul 19, 2015



Transcript
Page 1: Deconstructing Lambda Architectures

Deconstructing Lambda Architectures
Felix Crisan, Co-Founder & CTO, NETOPIA

Page 2: Deconstructing Lambda Architectures

Then...

Page 3: Deconstructing Lambda Architectures

...and now

Page 4: Deconstructing Lambda Architectures

Meaning of NoSQL

1970 = We have no SQL

1980 = Know SQL

2000 = No SQL!

2005 = Not only SQL

2015 = No, SQL

(slide adapted from @markmadsen)

Page 5: Deconstructing Lambda Architectures

λ - Architecture - Why?

● Robust & Fault Tolerant system
● Multiple workloads
● Ad-hoc queries
● Scalable (out rather than up)
● Extensible

Page 6: Deconstructing Lambda Architectures

λ - Architecture - So?

● In a nutshell - storage is cheap
● Data is generated (retrieved, stored) in time increments
● Data tends to be immutable (especially events)

Page 7: Deconstructing Lambda Architectures

Relational Algebra

query result = f(all data)

f is a composite of one or more of σ (selection), Π (projection), ∪ (union), × (product), − (set difference)
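To make "query result = f(all data)" concrete, here is a minimal Python sketch of the five operators over rows stored as dicts. The `events` relation, its column names and the `users_who_paid` query are hypothetical, chosen only for illustration; `product` assumes the two relations have disjoint column names.

```python
# Toy relation (hypothetical data): one dict per row.
events = [
    {"user": "ana", "page": "/home", "ts": 1},
    {"user": "bob", "page": "/pay", "ts": 2},
    {"user": "ana", "page": "/pay", "ts": 3},
]

def _dedupe(rows):
    # Relations are sets, so drop duplicate rows while keeping first-seen order.
    out = []
    for row in rows:
        if row not in out:
            out.append(row)
    return out

def select(rows, predicate):      # σ: keep the rows matching a predicate
    return [row for row in rows if predicate(row)]

def project(rows, columns):       # Π: keep only the named columns
    return _dedupe([{c: row[c] for c in columns} for row in rows])

def union(a, b):                  # ∪: rows present in either relation
    return _dedupe(a + b)

def difference(a, b):             # −: rows of a that are not in b
    return [row for row in a if row not in b]

def product(a, b):                # ×: every pairing of a row from a with a row from b
    return [{**ra, **rb} for ra in a for rb in b]

def users_who_paid(all_data):
    # The query f, composed from σ and Π and applied to all the data.
    return project(select(all_data, lambda r: r["page"] == "/pay"), ["user"])
```

Calling `users_who_paid(events)` recomputes the answer from the full dataset every time, which is the costly step the layers on the following slides are meant to avoid.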

Page 8: Deconstructing Lambda Architectures

CAP - Pick Any Two

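[Venn diagram of three overlapping circles: Consistency, Availability, Partition Tolerance. The region where all three overlap is labeled "No system can be in this region"; the pairwise overlaps are labeled "You can find systems here".]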

Page 9: Deconstructing Lambda Architectures

λ-Arch - Who/When?

● Nathan Marz
● Oct 2011
● “How To Beat the CAP theorem” http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html

Page 10: Deconstructing Lambda Architectures

λ-Architecture - What?

(Big)Data Source

Batch Layer

Speed Layer

Serving Layer

Page 11: Deconstructing Lambda Architectures

λ-Architecture

Layer    | Holds                     | Characteristics
Batch    | Deep Global Truth         | High Throughput/High Latency
Speed    | Relevant Local Truth      | Medium Throughput/Low Latency
Serving  | Data for Rapid Retrieval  | Low Throughput/Low Latency

Page 12: Deconstructing Lambda Architectures

Batch Layer

● de-normalized data inputs/master dataset
● append-only
● scalable
● idempotent calculations
● AP
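The append-only and idempotent bullets can be sketched in a few lines of Python. This is an illustration only, not how any particular batch framework works: `Event`, `MasterDataset` and `total_per_user` are hypothetical names, and the deduplication by `event_id` is one assumed way to make the calculation safe against duplicate deliveries.

```python
from dataclasses import dataclass

@dataclass(frozen=True)          # frozen: an event cannot be modified once created
class Event:
    event_id: str
    user: str
    amount: int

class MasterDataset:
    """Append-only store: events can be added and scanned, never updated or deleted."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def scan(self):
        # Return an immutable snapshot for the batch job to read.
        return tuple(self._events)

def total_per_user(events):
    # Pure function of the input: rerunning it on the same snapshot gives the same view.
    # Events delivered more than once are counted once, keyed by event_id.
    seen = set()
    totals = {}
    for e in events:
        if e.event_id in seen:
            continue
        seen.add(e.event_id)
        totals[e.user] = totals.get(e.user, 0) + e.amount
    return totals
```

Because the view is derived entirely from the master dataset, a buggy calculation can be fixed and the view rebuilt from scratch, without repairing stored state by hand.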

Page 13: Deconstructing Lambda Architectures

Usual suspects

● HDFS/Hadoop
● MapReduce (MRv1), YARN (MRv2)
● Spark/SparkSQL
● Hive
● Pig
● ...and others

Page 14: Deconstructing Lambda Architectures

Speed/Realtime Layer

● Realtime is actually Near Real Time
● Compensates for the latency of the Batch Layer
● Continuous computation/Limited window
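As a rough illustration of "continuous computation over a limited window", the sketch below keeps a count of events seen in the last N seconds. `WindowedCounter` is a hypothetical class, not an API of Storm, Spark Streaming or Samza, and it assumes timestamps arrive in non-decreasing order.

```python
from collections import deque

class WindowedCounter:
    """Counts events whose timestamp falls within the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._timestamps = deque()   # oldest timestamp on the left

    def add(self, ts):
        # Assumption: ts is never smaller than a previously added timestamp.
        self._timestamps.append(ts)
        self._evict(ts)

    def count(self, now):
        self._evict(now)
        return len(self._timestamps)

    def _evict(self, now):
        # Drop everything that has fallen out of the window (now - window, now].
        while self._timestamps and self._timestamps[0] <= now - self.window_seconds:
            self._timestamps.popleft()
```

Only the window is held in memory, so the state stays small; anything older is left to the batch layer.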

Page 15: Deconstructing Lambda Architectures

Usual Suspects

● Storm (http://storm.apache.org/)

● Spark Streaming (http://spark.apache.org/)

● Samza (http://samza.apache.org/)

● S4 (http://incubator.apache.org/s4/)

● MQ (0MQ, RabbitMQ/AMQP, etc.)

Page 16: Deconstructing Lambda Architectures

Serving Layer

● Indexes and exposes views into data
● Sometimes split into Speed Serving and Batch Serving
● Supports ad-hoc queries
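One way to picture the split between Batch Serving and Speed Serving is a query that adds a precomputed batch view to whatever arrived after the batch cutoff. The sketch below is a simplification under stated assumptions: `ServingLayer`, its method names and the page-view counting use case are all hypothetical, and real serving stores index far more than a single counter.

```python
from collections import Counter

class ServingLayer:
    def __init__(self):
        self._batch_view = Counter()   # precomputed by the batch layer
        self._batch_cutoff = 0         # newest timestamp covered by the batch view
        self._recent = []              # (ts, page) pairs supplied by the speed layer

    def publish_batch_view(self, view, cutoff_ts):
        # Swap in a fresh batch view and forget realtime data it now covers.
        self._batch_view = Counter(view)
        self._batch_cutoff = cutoff_ts
        self._recent = [(ts, page) for ts, page in self._recent if ts > cutoff_ts]

    def record_realtime(self, ts, page):
        # Keep only events the batch view has not absorbed yet.
        if ts > self._batch_cutoff:
            self._recent.append((ts, page))

    def page_views(self, page):
        # Query = batch view + realtime view.
        realtime = sum(1 for _, p in self._recent if p == page)
        return self._batch_view[page] + realtime
```

Each time a newer batch view is published, the realtime portion shrinks, so approximations made in the speed layer are eventually replaced by batch results.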

Page 17: Deconstructing Lambda Architectures

(Non)Usual suspects

● Drill (http://drill.apache.org/)

● ElephantDB (https://github.com/nathanmarz/elephantdb)

● Voldemort (http://www.project-voldemort.com/voldemort/)

● ElasticSearch/Solr/Lucene (https://www.elastic.co/)

● Cassandra sstables (http://cassandra.apache.org/)

● Cloudera Impala (http://www.cloudera.com/)

Page 18: Deconstructing Lambda Architectures

Takeaways

● Immutability
● (P)Re-computation
● Human fault-tolerance
● No One-Size-Fits-All
● YMMV
● ψ-Architectures?