How to Architect Big Data Apps with the Lambda Architecture OCTOBER 2014 Altan Khendup – Big Data Architect Ron Bodkin – Founder Think Big, a Teradata company

Dec 14, 2015

Page 1: How to Architect Big Data Apps with the Lambda Architecture

OCTOBER 2014

Altan Khendup – Big Data Architect

Ron Bodkin – Founder, Think Big, a Teradata company

Page 2: Real-Time

• Low latency
– Query response
– Data refresh
– End-to-end response

• … nanoseconds, milliseconds, seconds, or minutes depending on your problem

• Two basic patterns
– Strategic insight: decision support
– Process execution: system of engagement/operational analytics

Copyright 2013-2014 Think Big, a Teradata Company

Page 3: Real-time Demand Growing

• Many users looking to gain valuable insights from both batch and real-time systems

• User Characteristics
– Do not always understand the complexities of tackling this challenge
– Also want to use familiar/easy-to-use interfaces wherever possible
– Want best practices about ways to integrate real-time (current) and batch (historical)
– Often not aware of all the options and trade-offs among them

© 2014 Teradata

Page 4: Enter Lambda Architecture

• Lambda Architecture…
– Provides a common architectural pattern for discussion
– Provides a clearer picture of the complexities typically found in most organizations

• Some challenges in tackling Lambda architecture
– Complete Lambda requires more than just a single system
  - Typically requires multiple components
  - E.g. batch/cold storage via Hadoop, real-time/current data via Storm, query via business analysis using a database
– Also some challenges in delivering results to the business
  - Coordination is very difficult across the stack
  - Quality results back to the organization very important
– Takes a lot of knowledge/expertise/technology to tackle
– Not typically a first step in Big Data implementation

Page 5: Background of Lambda Architecture

Background
– Reference architecture for Big Data systems
– Designed by Nathan Marz (Twitter)
– Defined as a system that runs arbitrary functions on arbitrary data
– “query = function(all data)”

Design Principles
– Human fault-tolerant, Immutability, Computable

Lambda Layers
– Batch - Contains the immutable, constantly growing master dataset.
– Speed - Deals only with new data and compensates for the high latency updates of the serving layer.
– Serving - Loads and exposes the combined view of data so that they can be queried.
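To make "query = function(all data)" and the three layers concrete, here is a minimal sketch in plain Python. The event shape, the page-view count, and every function name are assumptions chosen for illustration; the slides do not prescribe an implementation.

```python
from collections import Counter

# Hypothetical master dataset: an immutable, append-only list of events.
master_dataset = [
    {"user": "a", "page": "/home"},
    {"user": "b", "page": "/home"},
    {"user": "a", "page": "/cart"},
]

def batch_view(events):
    """Batch layer: recompute page-view counts from all data seen so far."""
    return Counter(e["page"] for e in events)

def speed_view(recent_events):
    """Speed layer: count only events that arrived after the last batch run."""
    return Counter(e["page"] for e in recent_events)

def query(page, batch, speed):
    """Serving side: answer from the combined batch and real-time views."""
    return batch[page] + speed[page]

batch = batch_view(master_dataset)
recent = [{"user": "c", "page": "/home"}]  # not yet absorbed by the batch layer
speed = speed_view(recent)
home_views = query("/home", batch, speed)
print(home_views)  # 3
```

Because the batch view is always recomputed from the raw events, a bug in `batch_view` can be fixed and the view rebuilt, which is one way to read the "human fault-tolerant" principle.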

Page 6: Overview of Lambda Architecture

Page 7: USE CASE - MEDICAL

Page 8: Challenges in Medical Data

Every year, more than a million people from all 50 states and nearly 150 countries come for care

• Health data tends to be “wide”, not “deep”

• New data types are becoming more important
– Unstructured
– Real-time streaming

• A challenge to generally move from retrospective “BI” viewing to event-based and predictive analytics usage

Page 9

• Optimize an existing Natural Language Processing pipeline in support of critical Colorectal Surgery (move to tens of thousands of documents processed)

• Replace an existing free-text search facility used by Clinical Web Service for colorectal cancer (move search to milliseconds)

Page 10: Overall Architecture

Page 11: Operational Statistics

• Current Storm throughput up to 1.5 million documents per hour

• Average of 140,000 HL7 messages actually processed per day with average latency of 60 milliseconds from ingest to persistence

• Average of 50,000 documents passed through annotators per day versus 5,000 historically

• Actual annotations of documents up to 6 times faster than previously accomplished

• Free-text search use cases that took over 30 minutes on old infrastructure completing in milliseconds in ElasticSearch
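For a sense of scale, the figures above convert to per-second rates as follows. This is a back-of-the-envelope sketch that assumes load is spread evenly over the hour or day, which the slides do not state; the variable names are illustrative.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR

# Figures quoted on the slide.
peak_docs_per_second = 1_500_000 / SECONDS_PER_HOUR  # Storm throughput ceiling
avg_hl7_per_second = 140_000 / SECONDS_PER_DAY       # average HL7 message rate
annotation_volume_ratio = 50_000 / 5_000             # daily annotator volume vs. historical

print(round(peak_docs_per_second))   # 417
print(round(avg_hl7_per_second, 2))  # 1.62
print(annotation_volume_ratio)       # 10.0
```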

Page 12: Implementing Lambda

• Challenges
– Multiple layers
  - Lots of events, data
– Complex
  - Lots of different languages and data structures
– Difficult to maintain
  - Lots of moving pieces/components/technologies
  - Lots of changes for the business

• Need for Practical Lambda approach
– Based on real-world implementations
– Metadata model (events and data)
– Discrete data (query focused datasets)
– Data convergence (holistic query focused dataset)

Page 13: Active Executor Lambda Framework

Page 14: Real Time and Lambda

Page 15: KISS

• Real-Time isn’t free!
- 1 hour vs. 5 min vs. seconds
- And may not be meaningful anyhow
- Is there a robot or a human in the loop?

• Simpler Instantiations of Lambda
- Micro-Batch Feeds & Real-Time Queries
- Embarrassingly Parallel Speed Layer
- Transient Speed Layer
- … One database for Speed & Serving (RDBMS or NoSQL)

Page 16: Use Case: Cross-Channel Behavior Analytics

• Understanding consumer purchase behavior across more than one touchpoint to drive holistic results

• Each channel for consumer marketing and engagement has siloed applications and analytic tools

• Correlating behavior across channels to understand customer journeys allows better engagement (e.g., web, mobile, call center, in store, email, social)

• Common goals: increased response rates, increased share of wallet, reduced churn, focus on high value customers, increase customer satisfaction

• Challenges: data volumes, correlation/sessionization, feature discovery
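The correlation/sessionization challenge can be illustrated with a small sketch: events from several channels are merged per customer and split into sessions wherever there is a long inactivity gap. The 30-minute threshold, the tuple layout, and the function name are assumptions for illustration, not details from the talk.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity threshold, not from the slides

def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Group events per customer into sessions split on inactivity gaps.

    events: iterable of (customer_id, channel, unix_timestamp) tuples.
    Returns {customer_id: [[(channel, ts), ...], ...]}.
    """
    by_customer = defaultdict(list)
    for customer_id, channel, ts in events:
        by_customer[customer_id].append((ts, channel))

    sessions = {}
    for customer_id, items in by_customer.items():
        items.sort()  # order by timestamp across all channels
        customer_sessions = []
        last_ts = None
        for ts, channel in items:
            if last_ts is None or ts - last_ts > gap:
                customer_sessions.append([])  # gap exceeded: start a new session
            customer_sessions[-1].append((channel, ts))
            last_ts = ts
        sessions[customer_id] = customer_sessions
    return sessions

events = [
    ("c1", "web", 0),
    ("c1", "mobile", 600),
    ("c1", "call center", 10_000),
    ("c2", "email", 50),
]
journeys = sessionize(events)
print(journeys["c1"])
# [[('web', 0), ('mobile', 600)], [('call center', 10000)]]
```

In practice the hard part is usually upstream of this step: resolving the same customer across siloed channel identifiers so that a shared `customer_id` exists at all.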

Page 17: Real-Time Queries Pattern

• Many analytics use cases can be handled with update latencies of a few minutes

• Micro-batching allows for dramatic efficiency improvements
- … can extend to updates per event with additional infrastructure

• Pre-aggregation (HBase, MPP, etc.) can serve many users

• Hadoop query (Hive 0.13+/Tez, Impala etc.) emerging

[Diagram labels: Events, Web server…, Queue, Kafka etc., Micro-batch, Hadoop, HBase/Teradata/Hive…, Query/Serving]
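A minimal sketch of the micro-batch idea, assuming an in-memory buffer in place of the queue and a `Counter` in place of the pre-aggregated serving store. The class name, the size-based flush trigger, and the event shape are hypothetical; a real feed would more likely flush on a timer (every few minutes) as well.

```python
from collections import Counter

class MicroBatchAggregator:
    """Buffer events and fold them into a pre-aggregated view once per batch."""

    def __init__(self, batch_size):
        self.batch_size = batch_size  # hypothetical flush threshold
        self.buffer = []              # stands in for the queue
        self.view = Counter()         # stands in for the serving store
        self.flushes = 0

    def ingest(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # One aggregated write per batch instead of one write per event.
        self.view.update(Counter(e["page"] for e in self.buffer))
        self.buffer.clear()
        self.flushes += 1

    def query(self, page):
        """Queries read only the pre-aggregated view, never the raw buffer."""
        return self.view[page]

agg = MicroBatchAggregator(batch_size=3)
for page in ["/home", "/home", "/cart", "/home"]:
    agg.ingest({"page": page})

print(agg.query("/home"))  # 2 (the fourth event is still buffered)
```

The trade-off is visible in the last line: queries stay cheap because they hit pre-aggregated data, at the cost of results lagging by up to one batch.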

Page 18: Use Case: Recommendations

• Recommendations rely on
- recent activity (purchases, content viewed, product interest, support issues)
- trends/fashion
- long-term propensity (relationship history, micro-segments, social…)

• The opportunity is to integrate deep insight into
- Behavior
- Social graph

• Building product recommendations/person/next best offer that’s maximally effective

• All A/B tested
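One simple way to combine the three kinds of signal listed above is a weighted blend per candidate item. The sketch below is illustrative only: the weights, signal names, and score ranges are assumptions, and the talk does not say how the signals were actually combined.

```python
# Hypothetical weights; in practice they would be tuned, for example via A/B tests.
WEIGHTS = {"recent": 0.5, "trend": 0.2, "propensity": 0.3}

def blended_score(signals, weights=WEIGHTS):
    """Combine per-signal scores (each assumed to be in [0, 1]) into one score."""
    return sum(weights[name] * signals.get(name, 0.0) for name in weights)

def recommend(candidates, top_n=2):
    """Rank candidate products by blended score, highest first."""
    ranked = sorted(
        candidates,
        key=lambda item: blended_score(candidates[item]),
        reverse=True,
    )
    return ranked[:top_n]

candidates = {
    "shoes":  {"recent": 1.0, "trend": 0.0, "propensity": 0.5},
    "jacket": {"recent": 0.0, "trend": 1.0, "propensity": 1.0},
    "hat":    {"recent": 0.0, "trend": 1.0, "propensity": 0.0},
}
top = recommend(candidates)
print(top)  # ['shoes', 'jacket']
```

In Lambda terms, the long-term propensity scores would typically come from the batch layer, while recent activity would be the speed layer's contribution.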

Page 19: Embarrassingly Parallel Speed Layer Pattern

• Many operational use cases can be distributed across app server farm

• Batch computed views pushed to NoSQL

• Read NoSQL, update, respond & write to NoSQL can be done quickly

• No need for streaming analytics/computation

[Diagram labels: Events, Web server…, Queue, Kafka etc., Micro-batch, Hadoop, HBase/Mongo…, NoSQL/Speed]
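The read, update, respond, write cycle can be sketched with a plain dictionary standing in for the NoSQL store. The record fields, the VIP threshold, and the function names are hypothetical and exist only to make the pattern concrete.

```python
# A plain dict stands in for the NoSQL store (HBase, Mongo, ...).
store = {}

VIP_THRESHOLD = 1000.0  # hypothetical business rule, not from the slides

def load_batch_view(view):
    """Batch layer pushes its precomputed per-key view into the store."""
    for key, record in view.items():
        store[key] = dict(record)

def handle_event(customer_id, amount):
    """Runs on any app server: read, update, respond, write back."""
    record = store.get(customer_id, {"total_spend": 0.0, "orders": 0})
    updated = {
        "total_spend": record["total_spend"] + amount,
        "orders": record["orders"] + 1,
    }
    response = {
        "customer_id": customer_id,
        "vip": updated["total_spend"] >= VIP_THRESHOLD,
    }
    store[customer_id] = updated
    return response

load_batch_view({"c1": {"total_spend": 950.0, "orders": 12}})
first = handle_event("c1", 75.0)   # crosses the hypothetical VIP threshold
second = handle_event("c2", 20.0)  # unseen customer starts from an empty record
print(first["vip"], second["vip"])  # True False
```

Each event touches only its own key, so requests can be spread across the app server farm without coordination. A real store would still need atomic or conditional updates to stay correct when two servers handle the same key at once.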

Page 20: Conclusions

• There are many kinds of real-time problems

• No one Big Data technology solves all the problems

• Lambda architecture provides a powerful way to solve the more sophisticated ones

• There are simpler approaches for simpler problems…

• …which may be a step towards Lambda

Page 21: We’re Hiring!

thinkbig.teradata.com

Booth #324

Page 22: Thank you!

Altan Khendup (@madmongol)

Ron Bodkin (@ronbodkin)