© Hortonworks Inc. 2012
Go beyond debug: Wire Tap your App for knowledge with Hadoop
Tom McCuch, Solution Engineering @ Hortonworks, Twitter: tmccuch
Oleg Zhurakousky, Principal Architect @ Hortonworks, Twitter: z_oleg
Mar 17, 2016
The Application Development Dilemma
• Today, application developers devote roughly 80% of their code to persisting roughly 20% of the total data flowing through their applications
– The other 80% of the data flowing through our applications is at best lost in rolling log files and at worst never collected, so it is never analyzed or accounted for
– For the 20% we do collect, application-level database programming, licensing, storage, administration, and ETL processing have maxed out IT operations budgets and kept app development teams from keeping pace with the rate of change in the business
Example: Data Available During Ingest
• Record count
• Highest/Lowest record length
• Average record length
• Compression ratio
But with a little more work...
• Field parsing
– Unique values
– Unique values per field
– Access to values of each field independently from the record
– Relatively fast field-based searches, without indexing
– Value encoding
– Etc…
These are cross-cutting concerns!
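The record-level figures listed above can all be gathered in one pass while the data is being ingested. The following is a minimal sketch, not the presenters' implementation: the function name `ingest_stats`, the choice of UTF-8 byte length as "record length", and the use of `zlib` to estimate the compression ratio are all assumptions made for illustration.

```python
import zlib
from typing import Dict, Iterable


def ingest_stats(records: Iterable[str]) -> Dict[str, float]:
    """Compute record-level statistics in a single pass over the records.

    Hypothetical helper: lengths are measured in UTF-8 bytes, and the
    compression ratio is raw bytes divided by zlib-compressed bytes.
    """
    count = 0
    total = 0
    shortest = None
    longest = 0
    compressor = zlib.compressobj()
    compressed = 0
    for record in records:
        data = record.encode("utf-8")
        size = len(data)
        count += 1
        total += size
        shortest = size if shortest is None else min(shortest, size)
        longest = max(longest, size)
        # Feed the streaming compressor so nothing has to be buffered.
        compressed += len(compressor.compress(data))
    compressed += len(compressor.flush())
    return {
        "record_count": count,
        "min_length": shortest or 0,
        "max_length": longest,
        "avg_length": total / count if count else 0.0,
        "compression_ratio": total / compressed if compressed else 0.0,
    }
```

Because the statistics are accumulated as each record passes by, the same function could be fed from a tap on the ingest path without changing the ingest code itself.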
How do we address cross-cutting concerns without disturbing the existing process flow?
Wire Tap Defined
Wire Tap is an Enterprise Integration Pattern
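As a rough illustration of the pattern, a wire tap sends a copy of every message to a secondary channel while the primary flow carries on unchanged. The sketch below is a toy in-process model under stated assumptions: the `Channel` and `WireTappedChannel` classes are hypothetical names invented here and do not correspond to any particular integration framework's API.

```python
import copy
from typing import Any, Callable, List

Handler = Callable[[Any], None]


class Channel:
    """A minimal in-process channel that delivers each message to its subscribers."""

    def __init__(self) -> None:
        self._handlers: List[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        self._handlers.append(handler)

    def send(self, message: Any) -> None:
        for handler in self._handlers:
            handler(message)


class WireTappedChannel(Channel):
    """A channel that copies every message to a tap channel before normal delivery."""

    def __init__(self, tap: Channel) -> None:
        super().__init__()
        self._tap = tap

    def send(self, message: Any) -> None:
        try:
            # The tap gets its own copy so it cannot mutate the primary message.
            self._tap.send(copy.deepcopy(message))
        except Exception:
            # A failing tap must never disturb the primary flow.
            pass
        super().send(message)
```

Subscribers on the primary channel need no changes: the tap is attached to the channel, not to the code that sends or receives.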
Other Enterprise Integration Patterns
• Transformer: Convert payload or modify headers
• Filter: Discard messages based on boolean evaluation
• Router: Determine next channel based on content
• Splitter: Generate multiple messages from one
• Aggregator: Assemble a single message from multiple
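To make the one-line definitions concrete, each of the five patterns can be modeled as a small function over a message with headers and a payload. This is only a sketch: the dictionary message shape, the function names, and the sample rules (comma-separated payloads, a "VIP" prefix for routing) are assumptions chosen for illustration.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def splitter(message: Message) -> List[Message]:
    """Splitter: generate one message per comma-separated item in the payload."""
    return [
        {"headers": dict(message["headers"]), "payload": item}
        for item in message["payload"].split(",")
    ]


def transformer(message: Message) -> Message:
    """Transformer: convert the payload (here, trim whitespace and upper-case it)."""
    return {"headers": message["headers"], "payload": message["payload"].strip().upper()}


def message_filter(message: Message) -> bool:
    """Filter: keep the message only if the boolean test passes (non-empty payload)."""
    return bool(message["payload"])


def router(message: Message) -> str:
    """Router: determine the next channel name from the content."""
    return "priority" if message["payload"].startswith("VIP") else "standard"


def aggregator(messages: List[Message]) -> Message:
    """Aggregator: assemble a single message from multiple messages."""
    return {"headers": {}, "payload": [m["payload"] for m in messages]}


def run(message: Message) -> Dict[str, Message]:
    """Chain the patterns: split, transform, filter, route, then aggregate per channel."""
    channels: Dict[str, List[Message]] = {"priority": [], "standard": []}
    for part in splitter(message):
        part = transformer(part)
        if not message_filter(part):
            continue
        channels[router(part)].append(part)
    return {name: aggregator(parts) for name, parts in channels.items()}
```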
The Business Case
© Hortonworks Inc. 2013
6 Key Hadoop Data Types
1. Sentiment: Understand how your customers feel about your brand and products, right now
2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website
3. Sensor/Machine: Discover patterns in data streaming automatically from remote sensors and machines
4. Geographic: Analyze location-based data to manage operations where they occur
5. Server Logs: Research logs to diagnose process failures and prevent security breaches
6. Text: Understand patterns in text across millions of web pages, emails, and documents
20 Apache Hadoop Enterprise Use Cases
Vertical            Use Case                             Data Type
Financial Services  New Account Risk Screens             Text, Server Logs
                    Fraud Prevention                     Server Logs
                    Trading Risk                         Server Logs
                    Maximize Deposit Spread              Text, Server Logs
                    Insurance Underwriting               Geographic, Sensor, Text
                    Accelerate Loan Processing           Text
Telecom             Call Detail Records (CDRs)           Machine, Geographic
                    Infrastructure Investment            Machine, Server Logs
                    Next Product to Buy (NPTB)           Clickstream
                    Real-time Bandwidth Allocation       Server Logs, Text, Sentiment
                    New Product Development              Machine, Geographic
Retail              360° View of the Customer            Clickstream, Text
                    Analyze Brand Sentiment              Sentiment
                    Localized, Personalized Promotions   Geographic
                    Website Optimization                 Clickstream
                    Optimal Store Layout                 Sensor
Manufacturing       Supply Chain and Logistics           Sensor
                    Assembly Line Quality Assurance      Sensor
                    Proactive Maintenance                Machine
                    Crowdsourced Quality Assurance       Sentiment
Fraud Prevention
Business Problem
• Financial institutions are always at risk of fraud
• Fraudsters test bank systems for vulnerabilities
• This testing leaves subtle patterns that often go undetected by bank employees or law enforcement
• Fraud losses cost banks millions
Solution
• HDP reduces the cost to detect fraudulent activity
• HDP stores more types of data for longer
• Analysis of data in the “data lake” exposes fraudulent patterns that would otherwise have gone undetected
Financial Services Data: Server Logs
Credit Request Process Flow - Before
Credit Request Processing
• Credit Request arrives on a Gateway
• Credit Request is sent over a Channel
• Credit Request Processor
– Receives Request
– Processes the Request
– Issues a Response
Cross-Cutting Concerns
• Credit Scoring
• Fraud Detection
• Gathering Data Available during Credit Request Process Flow
Demo
Credit Request Processing Flow - After
HDP
Example: HTTP Header Collection
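One way to picture header collection is a tap that writes each request's HTTP headers to a sink as a line of JSON and then hands the untouched request to the existing handler. This is a sketch under assumptions, not the demo's code: the request is modeled as a plain dictionary, `with_header_tap` is a hypothetical name, and the sink is any writable text stream standing in for a file that would eventually land in HDP.

```python
import io
import json
from typing import Any, Callable, Dict, TextIO

Request = Dict[str, Any]
RequestHandler = Callable[[Request], Any]


def with_header_tap(handler: RequestHandler, sink: TextIO) -> RequestHandler:
    """Wrap a request handler so headers are recorded before normal processing."""

    def tapped(request: Request) -> Any:
        try:
            # One JSON document per line keeps the output easy to load in bulk later.
            sink.write(json.dumps(request.get("headers", {}), sort_keys=True) + "\n")
        except (OSError, TypeError, ValueError):
            # Collection is best-effort; the credit request must still be processed.
            pass
        return handler(request)

    return tapped
```

A usage example with an in-memory sink:

```python
sink = io.StringIO()
process = with_header_tap(lambda request: {"status": 200}, sink)
process({"headers": {"User-Agent": "curl/8.0"}, "body": "credit request"})
```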
Example: Data Available During Ingest
• Record count
• Highest/Lowest record length
• Average record length
• Compression ratio
But with a little more work...
• Field parsing - unstructured data is not all that unstructured…
– Unique values
– Unique values per field
– Access to values of each field independently from the record
– Relatively fast field-based searches, without indexing
– Value encoding
– Etc…
These are cross-cutting concerns!
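The field-level items above can be sketched with dictionary encoding: each distinct value in a field is assigned a small integer code, and each field is stored as its own column of codes. The names `profile_fields` and `find_records`, the comma delimiter, and the assumption that every record contains every field are illustrative choices, not details taken from the talk.

```python
from typing import Dict, List, Sequence, Tuple

Codes = Dict[str, Dict[str, int]]
Columns = Dict[str, List[int]]


def profile_fields(
    records: Sequence[str], field_names: Sequence[str], delimiter: str = ","
) -> Tuple[Codes, Columns]:
    """Parse delimited records into per-field dictionaries and encoded columns.

    Assumes every record has a value for every field, in field_names order.
    """
    unique: Codes = {name: {} for name in field_names}
    columns: Columns = {name: [] for name in field_names}
    for record in records:
        for name, value in zip(field_names, record.split(delimiter)):
            codes = unique[name]
            # Value encoding: first occurrence of a value gets the next free code.
            code = codes.setdefault(value, len(codes))
            columns[name].append(code)
    return unique, columns


def find_records(unique: Codes, columns: Columns, field: str, value: str) -> List[int]:
    """Return indexes of records whose field equals value.

    Scans a single column of integer codes; no separate index is built.
    """
    code = unique[field].get(value)
    if code is None:
        return []
    return [i for i, c in enumerate(columns[field]) if c == code]
```

With this layout, `len(unique[field])` gives the number of unique values per field, and `columns[field]` gives access to one field's values independently of the rest of the record.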
Demo
Thank You!
Questions & Answers
Follow: @tmccuch, @z_oleg, @hortonworks