Real-Time BI in Hadoop Bradford Stephens Lead Engineer, Visible Technologies Principal Consultant, Drawn to Scale Consulting

Hw09 Real Time Business Intelligence

Aug 20, 2015


Cloudera, Inc.
Transcript
Page 1: Hw09   Real Time Business Intelligence

Real-Time BI in Hadoop

Bradford Stephens

Lead Engineer, Visible Technologies
Principal Consultant, Drawn to Scale Consulting


Topics

•Scalability and BI

•Costs and Abilities

•Search as BI


What Is BI?


What Is “Real-Time”?

•Understanding Latency

•We aim for latency under 5 seconds


Scalability in BI

•Scalability matters now

•Social Media: Catalyst

•All data is important

•Data volume no longer scales with business size


Search as BI

•Katta = Distributed Search on Hadoop

•Bobo = Faceted Lucene


Doing it Cheap

•100 TB, Structured and Unstructured

•Oracle - $100,000,000

•“NewSQL” - $4,000,000

•Hadoop + Katta - $250,000
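
To put the three figures on a common footing, the arithmetic can be sketched as cost per terabyte and as a multiple of the cheapest option. Only the dollar amounts and the 100 TB size come from the slide; the names `DATA_TB`, `COSTS_USD`, and `cost_per_tb` are made up for this illustration.

```python
# Dollar figures and data size are taken from the slide.
# All identifiers are hypothetical, chosen for this sketch only.
DATA_TB = 100
COSTS_USD = {
    "Oracle": 100_000_000,
    "NewSQL": 4_000_000,
    "Hadoop + Katta": 250_000,
}


def cost_per_tb(total_usd, data_tb=DATA_TB):
    """Total cost divided by the amount of data stored."""
    return total_usd / data_tb


per_tb = {name: cost_per_tb(usd) for name, usd in COSTS_USD.items()}

# Express each option as a multiple of the Hadoop + Katta figure.
baseline = COSTS_USD["Hadoop + Katta"]
ratio = {name: usd / baseline for name, usd in COSTS_USD.items()}

for name in COSTS_USD:
    print(f"{name}: ${per_tb[name]:,.0f} per TB, {ratio[name]:.0f}x the baseline")
```

On these numbers the Oracle figure works out to 400 times the Hadoop + Katta figure, and the “NewSQL” figure to 16 times.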


Why We Need Hadoop

•Need to process high-latency data to get the “small stuff” fast

•Robust Ecosystem

•Need more than SQL: an RDBMS is not a Swiss Army knife


Aggregation is Real-Time

•Distributed Search w/ Katta + Facets = Aggregation-Based BI

•Sum, Count, Filter, Avg, Group
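
A minimal sketch of how facet counts over a sharded index can answer Sum, Count, Filter, Avg, and Group questions. It is plain Python and does not use the Katta or Bobo APIs; the shard contents, field names, and helper functions are all assumptions made for illustration. Each shard computes partial counts and sums per facet value, and a merge step combines them.

```python
from collections import defaultdict

# Hypothetical documents spread over two index shards.
SHARDS = [
    [{"brand": "acme", "sentiment": "pos", "reach": 120},
     {"brand": "acme", "sentiment": "neg", "reach": 30}],
    [{"brand": "zeta", "sentiment": "pos", "reach": 50},
     {"brand": "acme", "sentiment": "pos", "reach": 80}],
]


def shard_facets(docs, group_by, value_field, predicate=lambda d: True):
    """Per-shard partial result: {facet value: [count, sum]}."""
    partial = defaultdict(lambda: [0, 0])
    for doc in docs:
        if predicate(doc):                 # Filter
            bucket = partial[doc[group_by]]  # Group
            bucket[0] += 1                 # Count
            bucket[1] += doc[value_field]  # Sum
    return partial


def merge(partials):
    """Combine the per-shard partials and derive the average."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (count, total) in partial.items():
            merged[key][0] += count
            merged[key][1] += total
    return {k: {"count": c, "sum": s, "avg": s / c}
            for k, (c, s) in merged.items()}


def aggregate(shards, group_by, value_field, predicate=lambda d: True):
    """Scatter the query to every shard, then gather and merge."""
    return merge(shard_facets(s, group_by, value_field, predicate)
                 for s in shards)


# Positive mentions only, grouped by brand.
result = aggregate(SHARDS, "brand", "reach",
                   lambda d: d["sentiment"] == "pos")
print(result)
```

Counts and sums merge by simple addition, which is why the average is derived only after the merge rather than averaged per shard.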


Protips: Review

•Understand High vs. Low Latency data

•Hadoop makes it cheap

•Pre-aggregate w/ Hadoop, Explore w/ Katta + Faceted Search
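
The pre-aggregation step can be pictured as a map/reduce pass that collapses raw events into small summary records, which are then what the search layer indexes. This is a single-process Python sketch of that idea, not Hadoop code; the event tuples, dates, and function names are invented for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical raw events: (day, brand, mentions).
RAW_EVENTS = [
    ("2009-10-01", "acme", 3),
    ("2009-10-01", "acme", 5),
    ("2009-10-01", "zeta", 2),
    ("2009-10-02", "acme", 7),
]


def map_phase(events):
    """Emit one ((day, brand), mentions) pair per raw event."""
    for day, brand, mentions in events:
        yield (day, brand), mentions


def reduce_phase(pairs):
    """Sort by key, then sum the values that share a key."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


# Small summary records, ready to be indexed for faceted exploration.
summaries = [
    {"day": day, "brand": brand, "mentions": mentions}
    for (day, brand), mentions in reduce_phase(map_phase(RAW_EVENTS))
]

for row in summaries:
    print(row)
```

The slow batch job runs over the full data set, while interactive queries only touch the much smaller summaries.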


The Future

•Search/BI as a Platform: “Google my Data Warehouse”

•Real-Time MR on HBase