© Copyright 2000-2014 TIBCO Software Inc. Hadoop and Data Warehouse – Friends, Enemies or Profiteers? What about Real Time? Kai Wähner [email protected] @KaiWaehner www.kai-waehner.de

"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about Real Time?" - Slides (including TIBCO Examples) from JAX 2014 Online

Jan 27, 2015


Kai Waehner

I discuss a good big data architecture which includes Data Warehouse / Business Intelligence + Apache Hadoop + Real Time / Stream Processing. Several real-world examples are shown. TIBCO offers some very nice products for realizing these use cases, e.g. Spotfire (Business Intelligence / BI), StreamBase (Stream Processing), BusinessEvents (Complex Event Processing / CEP) and BusinessWorks (Integration / ESB). TIBCO is also ready for Hadoop, offering connectors and plugins for many important Hadoop frameworks / interfaces such as HDFS, Pig, Hive, Impala, Apache Flume and more.
Transcript
Page 1

Hadoop and Data Warehouse – Friends, Enemies or Profiteers? What about Real Time? Kai Wähner [email protected] @KaiWaehner www.kai-waehner.de

Page 2

Disclaimer

These opinions are my own and do not necessarily represent those of my employer.

Page 3

Key Messages

Big Data is not just Hadoop, concentrate on Business Value!

A good Big Data Architecture combines DWH, Hadoop and Real Time!

The Integration Layer is getting even more important in the Big Data Era!

Page 4

Agenda

•  Terminology •  Data Warehouse and Business Intelligence •  Big Data Processing with Hadoop •  Big Data Processing in Real Time

Page 5

Agenda

•  Terminology •  Data Warehouse and Business Intelligence •  Big Data Processing with Hadoop •  Big Data Processing in Real Time

Page 6

Big Data Architecture

DWH  /  BI  

Hadoop  

Real  Time  

Big  Data  Architecture  

Page 7

DWH means analyzing OLAP Cubes

http://www.exforsys.com/tutorials/msas/data-warehouse-database-and-oltp-database.html

Page 8

Big Data means analyzing Everything

http://blogs.teradata.com/international/tag/hadoop/

• Store everything
• Even without structure
• Use whatever you need (now or later)

Page 9

Big Data: Three shifts in the Way we analyze Information

• Messiness: Using ALL data, not just samples
  • Also bad data (e.g. Word spell checker, Google auto-complete and „did you mean...“ recommendation)
• Correlations: Instead of causalities
  • May not tell us WHY something is happening, but THAT it is happening
  • In many situations, this is good enough
  • What drug substance cures cancer? When should I buy an airplane ticket?
• Datafication: Store, process, combine, reuse, enhance all data!
  • Digitalisation (Amazon Kindle → Read) vs. Datafication (Google Books → Read, Search, Process, ...)
  • Words become data: Google Books: not just read, but also search, analyse, etc.
  • Locations become data: GPS: not just navigation, but also insurance costs, economic routes, etc.

Page 10

What is Big Data? The combined Vs of Big Data

Volume (terabytes, petabytes)
Variety (social networks, blog posts, logs, sensors, etc.)
Velocity (real time)
Value

Page 11

Real Time

Wikipedia definition:
• Real time programs must guarantee response within strict time constraints, often referred to as “deadlines”. Real time responses are often understood to be in the order of milliseconds, and sometimes microseconds.
• The term “near real time” refers to the time delay introduced by automated data processing or network transmission.
• The distinction between the terms “near real time” and “real time” is somewhat nebulous and must be defined for the situation at hand.

Hence, for this talk, I define:
– Real time == response in nanoseconds || microseconds || milliseconds || <= one second
– Near real time == response time > one second
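The two definitions above can be written as a tiny classifier. This is only a sketch: the one-second cut-off comes from the slide, while the function name `classify_latency` is hypothetical.

```python
def classify_latency(response_seconds: float) -> str:
    """Classify a measured response time using this talk's definitions.

    Real time: response within one second (ns, µs, ms or <= 1 s).
    Near real time: response time greater than one second.
    """
    if response_seconds < 0:
        raise ValueError("response time cannot be negative")
    return "real time" if response_seconds <= 1.0 else "near real time"


for latency in (0.000_002, 0.250, 1.0, 4.5):
    print(f"{latency} s -> {classify_latency(latency)}")
```

Note that, by this definition, a response of exactly one second still counts as real time.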

Page 12

Agenda

•  Terminology •  Data Warehouse and Business Intelligence •  Big Data Processing with Hadoop •  Big Data Processing in Real Time

Page 13

Big Data Architecture

DWH  /  BI  

Hadoop  

Real  Time  

Big  Data  Architecture  

Page 14

DWH vs. BI

• Data Warehouse (DWH) → Storage
• Business Intelligence (BI) → Analytics
• Both terms are often used as synonyms, i.e. when someone talks about a DWH, this might include analytics
• BI can be used without a DWH

Page 15

Typical DWH Process

http://wikibon.org/blog/not-your-fathers-data-analytics/

A DWH is „Business Case driven“:
• Reporting
• Dashboards
• Drill Down Analytics

Different DWH Options:
• Enterprise DWH (== EDW)
• Department / Project DWH
• Embedded BI (into Applications)

Page 16

BI == Reporting + Statistics + Data Discovery

DWH  

BI  

Page 17

BI Visualization

Page 18

Products

DWH
• SQL: e.g. MySQL
• MPP: e.g. Teradata, EMC Greenplum, IBM Netezza
  – Scale very well (almost linearly), very high performance, but hardware / software costs also increase a lot

BI
• Microsoft Excel
• BI Tools: e.g. TIBCO Spotfire, Tableau, MicroStrategy

Hint: Good BI tools
• allow data discovery / visualization using different sources, not just a DWH
• are easy to use

Page 19

BI Tool Example: TIBCO Spotfire

Page 20

BI Tool Example: TIBCO Spotfire

The whole team needs analytics. Spotfire is for everyone, helping users with a variety of skill levels to visualize, explore and share information. It has:
• At-a-glance business facts for managers
• Dashboards for front-line decision-makers
• Visual discovery for business users
• Deep data exploration for analysts
• Advanced predictive analytics for statisticians
• And beautiful visualizations to impress your executives

Page 21

Example: TIBCO Spotfire

Page 22

Live Demo

„TIBCO Spotfire“ in action...

Page 23

DWH Real World Use Case

http://spotfire.tibco.com/resources/content-center?Content%20Type=Case%20Studies

Page 24

DWH Real World Use Case

http://spotfire.tibco.com/resources/content-center?Content%20Type=Case%20Studies

Page 25

Embedded BI Real World Use Case

https://www.jaspersoft.com/embeddedShowcase/periscope.html

Page 26

Problems of a DWH

No flexibility / agility
• Just structured data
• Just some (maybe aggregated) historical data
• Just good for already known business cases

Low speed
• ETL is batch, usually takes hours or sometimes even days
• No proactive reactions possible → “too late architecture”

High costs (per GB)
• Just selected data
• Older data is often moved out to archives

Page 27

Classic BI vs. Big Data BI

Page 28

Agenda

•  Terminology •  Data Warehouse and Business Intelligence •  Big Data Processing with Hadoop •  Big Data Processing in Real Time

Page 29

Big Data Architecture

DWH  /  BI  

Hadoop  

Real  Time  

Big  Data  Architecture  

Page 30

Why Hadoop instead of a DWH?

Hadoop was built to solve problems of RDBMS and DWH…

Benefits of Hadoop:
• Store and analyze all data
  – all data == not just selected (maybe aggregated) data
  – all data == structured + semi-structured + unstructured → be more flexible, adapt to changing business cases
• Better performance (massively parallel)
• Ad hoc data discovery, also for big data volumes
• Save money (commodity hardware, open source software)

Page 31

What is Hadoop?

Apache Hadoop, an open-source software library, is a framework that allows for the distributed processing of large data sets across clusters of commodity hardware using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

Page 32

MapReduce

Simple example:
• Input: (very large) text files with lists of strings, such as: „318, 0043012650999991949032412004...0500001N9+01111+99999999999...“
• We are only interested in some of the content: year and temperature (marked in red on the slide)
• The MapReduce function has to compute the maximum temperature for every year

Example from the book “Hadoop: The Definitive Guide, 3rd Edition”
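A minimal, single-process Python sketch of the three MapReduce phases for this max-temperature job. It assumes the year and temperature have already been extracted into simple "year,temperature" strings; that format and the sample values are hypothetical (the raw lines above are fixed-width records), and real Hadoop would run many mappers and reducers in parallel across the cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: parse each record and emit a (year, temperature) pair."""
    for line in lines:
        year, temperature = line.strip().split(",")
        yield year, int(temperature)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all values emitted for the same key (year), sorted by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for year, temperature in pairs:
        grouped[year].append(temperature)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: keep only the maximum temperature per year."""
    return {year: max(temps) for year, temps in grouped.items()}


# Hypothetical pre-extracted records in "year,temperature" form.
records = ["1950,0", "1950,22", "1950,-11", "1949,111", "1949,78"]
print(reduce_phase(shuffle(map_phase(records))))  # {'1949': 111, '1950': 22}
```

Because the reduce function (max) only sees values for one key at a time, the framework can distribute different years to different reducers without changing the result.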

Page 33

Hadoop Products

Apache Hadoop: MapReduce + HDFS + ecosystem
(Axis: features included, from few to many)

Page 34

Hadoop Ecosystem

Page 35

Hadoop Products

Apache Hadoop: MapReduce + HDFS + ecosystem
Hadoop Distribution: Apache Hadoop + packaging, deployment tooling, support
(Axis: features included, from few to many)

Page 36

Hadoop Distributions

(…  some  more  arising)  

EMR  

Page 37

Hadoop Products

Apache Hadoop: MapReduce + HDFS + ecosystem
Hadoop Distribution: Apache Hadoop + packaging, deployment tooling, support
Big Data Suite: Hadoop Distribution + tooling / modeling, code generation, scheduling, integration
(Axis: features included, from few to many)

Page 38

Big Data Integration Suite: TIBCO BusinessWorks

Page 39

Live Demo

„TIBCO BusinessWorks“ in action...

Page 40

Hadoop Real World Use Case: Replace ETL to improve Performance

“The advantage of their new system is that they can now look at their data [from their log processing system] in anyway they want:
• Nightly MapReduce jobs collect statistics about their mail system such as spam counts by domain, bytes transferred and number of logins.
• When they wanted to find out which part of the world their customers logged in from, a quick [ad hoc] MapReduce job was created and they had the answer within a few hours. Not really possible in your typical ETL system.”

http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data

(no TIBCO reference)

Page 41

Hadoop Real World Use Case: Storage to reduce Costs

Global Parcel Service
• A lot of data must be stored „forever“
• Numbers increase exponentially
• Goal: As cheap as possible
• Problem: Queries must still be possible (compliance!)
• Solution: Commodity servers and „Hadoop querying“

http://archive.org/stream/BigDataImPraxiseinsatz-SzenarienBeispieleEffekte/Big_Data_BITKOM-Leitfaden_Sept.2012#page/n0/mode/2up

(no TIBCO reference)

Page 42

DWH or Hadoop?

Criterion | DWH | Hadoop
Data | Structured | All data
Maturity | Established in enterprise | New concepts
Tooling | Installed, good knowledge and experience | New tools, coding required; business can still use SQL-like queries or the same BI tool
Costs | High (per GB) | Low (per GB)

Page 43

DWH plus Hadoop?

DWH and Hadoop complement each other very well:
• Store all data in Hadoop (cheap per GB)
• ETL from Hadoop to DWH (expensive per GB)
• Create specific reports / dashboards in DWH (leverage existing products and knowledge)
• Do ad hoc (big) data discovery directly in Hadoop, no DWH needed

Good BI tools support both DWH and Hadoop! For example, TIBCO Spotfire has connectors to:
• RDBMS (e.g. MySQL)
• MPP (e.g. Teradata, IBM Netezza, Greenplum)
• Hadoop (e.g. Hive, Impala)
• In-Memory (e.g. TIBCO ActiveSpaces, SAP HANA)
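A toy Python sketch of this division of work, under loudly stated assumptions: a plain list of dicts stands in for raw data kept in Hadoop, an in-memory SQLite database stands in for the DWH, and the event fields and the `etl_to_dwh` helper are hypothetical. Only an aggregated subset is loaded into the "DWH" for the known report, while an unplanned question is answered from the raw data.

```python
import sqlite3
from collections import Counter

# Hypothetical raw events as they might land in Hadoop: everything is kept,
# including fields that no report uses yet (e.g. user_agent).
raw_events = [
    {"day": "2014-04-01", "product": "A", "amount": 10.0, "user_agent": "ios"},
    {"day": "2014-04-01", "product": "B", "amount": 5.0, "user_agent": "web"},
    {"day": "2014-04-01", "product": "A", "amount": 7.5, "user_agent": "web"},
    {"day": "2014-04-02", "product": "A", "amount": 3.0, "user_agent": "android"},
]


def etl_to_dwh(events, conn):
    """ETL: aggregate raw events into a compact daily revenue table (the 'DWH')."""
    totals = Counter()
    for e in events:
        totals[(e["day"], e["product"])] += e["amount"]
    conn.execute("CREATE TABLE daily_revenue (day TEXT, product TEXT, revenue REAL)")
    conn.executemany("INSERT INTO daily_revenue VALUES (?, ?, ?)",
                     [(day, product, revenue) for (day, product), revenue in sorted(totals.items())])
    conn.commit()


conn = sqlite3.connect(":memory:")
etl_to_dwh(raw_events, conn)

# Known business case: a specific report served from the DWH table.
report = conn.execute(
    "SELECT day, SUM(revenue) FROM daily_revenue GROUP BY day ORDER BY day").fetchall()
print(report)  # [('2014-04-01', 22.5), ('2014-04-02', 3.0)]

# Ad hoc discovery: a question nobody planned for, answered from the raw data.
by_agent = Counter(e["user_agent"] for e in raw_events)
print(by_agent.most_common(1))  # [('web', 2)]
```

The `user_agent` question cannot be answered from `daily_revenue` because that field was dropped during ETL, which is the slide's argument for keeping all raw data in Hadoop.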

Page 44

Recommendation DWH vs. Hadoop vs. XYZ

• Short term: Use Hadoop (only) when you can save (a lot of) money or when you cannot solve your business problem without Hadoop. A lot of things still have to be improved, e.g. governance, security, performance, and tool support.
• Long term: Hadoop can replace the DWH (as you can already create a DWH on top of Hadoop with a SQL interface today)!
• Be aware: Many other options for analyzing big data are emerging besides Hadoop, e.g.
  – Analytical databases with SQL interface (MemSQL, Citus Data)
  – Log analytics (Splunk, TIBCO LogLogic)
  – Graph databases (Neo4j, InfiniteGraph)

Page 45

Vendors Strategy...

Hadoop vendors push Hadoop as a DWH replacement → called e.g. „Enterprise Data Hub“ (Cloudera) or „Data Lake“ (Hortonworks)

http://gigaom.com/2013/10/29/clouderas-plan-to-become-the-center-of-your-data-universe/
http://hortonworks.com/wp-content/uploads/downloads/2013/04/Hortonworks.ApacheHadoopPatternsOfUse.v1.0.pdf

Page 46

Vendors Strategy...

MPP / DWH vendors add Hadoop support as a complementary add-on to their DWH

→ Reason (probably): Market pressure!
→ Benefit: One platform (including tooling and support) for DWH and Hadoop

Page 47

Example: EMC combines DWH and Hadoop

http://wikibon.org/wiki/v/EMC_Integrates_Greenplum_DB_and_Hadoop_with_Pivotal_HD
http://www.gopivotal.com/big-data/pivotal-hd

Page 48

Example: Teradata combines DWH and Hadoop

http://www.teradata.com/Teradata-Enterprise-Access-for-Hadoop/
http://gigaom.com/2014/04/07/teradata-says-hadoop-is-good-for-business-but-for-how-long/

Page 49

Hadoop evolving from Batch to Near Real Time

Hadoop is MapReduce == Batch (== hours, minutes, seconds)
• Good for complex transformations / computations of big data volumes
• Not so good for ad hoc data exploration
• Improvements: Hive Stinger (Hortonworks) etc.

Non-MapReduce processing engines added in the meantime (YARN makes it possible)
• Ad hoc data discovery (== seconds)
• Hive / Pig with Apache Tez replacing MapReduce under the hood for data processing
• New query engines, e.g. Impala (Cloudera) or Apache Drill (MapR)

MPP vendors (e.g. Teradata, EMC Greenplum) also add their own query engines
• Offer fast data exploration (without MapReduce)

Some Hadoop problems remain
• No good, easy tooling (Hadoop ecosystem) → might be solved in the next years
• Missing maturity (alpha / beta versions) → might be solved in the next years
• No “real time” (== ms, ns), but “near real time” (> 1 sec) → “too late architecture”

Page 50

Agenda

•  Terminology •  Data Warehouse and Business Intelligence •  Big Data Processing with Hadoop •  Big Data Processing in Real Time

Page 51

Big Data Architecture

DWH  /  BI  

Hadoop  

Real  Time  

Big  Data  Architecture  

Page 52

Real Time: “The Two-Second Advantage”

“A little bit of the right information, just a little bit beforehand – whether it is a couple of seconds, minutes or hours – is more valuable than all of the information in the world six months later… this is the two-second advantage.”
Vivek Ranadivé, Founder and CEO of TIBCO

Page 53

The Value of Data decreases over Time

Page 54

What is Big Data? The combined Vs of Big Data

Volume (terabytes, petabytes)
Variety (social networks, blog posts, logs, sensors, etc.)
Velocity (real time)
Fast Data

Page 55

Real Time Architecture?

Transaction-Based Architectures: Events → Mainframe/ERP/DB/App → Action (transaction)
Not elastic, doesn't scale, “always late” architecture and analytics

Behavior-Based Architectures: Events → Mainframe/ERP/DB/App → Action (data, event and analytics)
Elastic, scales, real-time architecture (events, data and analytics)

Page 56

Complex Event / Stream Processing / In-Memory

Concepts
• Streams: Monitoring millions of events in a specific time window to react proactively
• Stateful: Collect, filter and correlate events with state to anticipate outcomes and react proactively
• Transactional: High-performance transactional event processing

Products vs. Frameworks
• Products are mature, mission-critical and in production, e.g. TIBCO StreamBase, IBM InfoSphere Streams
• Open source frameworks, e.g. “Apache Spark” and “Apache Storm”
  – Future will tell us about performance, tooling, support, etc.
  – Can be combined with Hadoop
  – Are complementary to products such as TIBCO StreamBase

In-Memory
• Can also be used for “big data” (terabytes possible!)
• Usually complementary, i.e. they can be / have to be combined with stream processing / complex event processing
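The "Streams" and "Stateful" concepts above (keeping state over a time window and reacting as soon as a condition holds) can be sketched in a few lines of Python. This is a hypothetical, single-process illustration of a sliding time window, not how StreamBase, Spark or Storm implement it; it assumes events arrive in timestamp order, and the class name, keys and threshold are made up.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class SlidingWindowMonitor:
    """Count events per key within a sliding time window and alert at a threshold.

    Assumes events arrive in timestamp order (no out-of-order handling).
    """

    def __init__(self, window_seconds: float, threshold: int) -> None:
        if window_seconds <= 0:
            raise ValueError("window must be positive")
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.events: Deque[Tuple[float, str]] = deque()  # (timestamp, key), oldest first
        self.counts: Dict[str, int] = {}

    def on_event(self, timestamp: float, key: str) -> Optional[str]:
        # Stateful part: add the new event to the window.
        self.events.append((timestamp, key))
        self.counts[key] = self.counts.get(key, 0) + 1
        # Evict events outside the window (timestamp - window, timestamp].
        while self.events and self.events[0][0] <= timestamp - self.window_seconds:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
        if self.counts[key] >= self.threshold:
            return f"ALERT: {key} seen {self.counts[key]} times in {self.window_seconds}s"
        return None


monitor = SlidingWindowMonitor(window_seconds=10, threshold=3)
stream = [(0, "card-42"), (4, "card-42"), (8, "card-7"), (9, "card-42"), (25, "card-42")]
alerts = []
for ts, key in stream:
    alert = monitor.on_event(ts, key)
    if alert:
        alerts.append((ts, alert))
print(alerts)
```

Only the third "card-42" event at t=9 triggers an alert; by t=25 the earlier events have left the window, so the count drops back to one. Evicting from the left of a deque keeps each event's cost proportional to the number of expired events rather than to the whole window.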

Page 57

Stream Processing Architecture

Inputs: social media data, market data, sensor data, historical data, enterprise data
Connectivity: messaging (low latency), messaging (JMS), ESB integration, JDBC, ActiveSpaces (in-memory)
Processing: Continuous Query Processor, CEP
LiveView Datamart: continuous query, ad hoc query, alerts

Page 58

Stream Processing Architecture (Example: TIBCO StreamBase)

TIBCO StreamBase: Continuous Query Processor with active tables
Data: market data, orders / executions, trading signal, transaction cost, alert setting
TIBCO LiveView: continuous query, ad hoc query, alerts; snapshot AND always-live updates
• Quickly connect to streams
• Anticipate opportunities, proactive action

Page 59

Example: TIBCO StreamBase Tooling

StreamBase Development Studio •  Visual Development •  Visual Debugging •  Feed Simulation •  Unit Testing

StreamBase LiveView •  Real Time Analytics and Visualization •  Ad hoc queries •  Alerts and Notifications •  Web, Mobile and API Integration

Page 60

Real World: Real-Time Trade Surveillance

Inputs: market data, trade data, static data, systems data, performance benchmarks
Adapters and handlers → StreamBase Server(s): applications integration; normalization, aggregation, correlation; rules, alerts; automation → adapters and handlers
Outputs: desktop, alerts, automation
Data management: persistence stores, logs
Development: StreamBase Studio for developing EventFlow applications

Page 61

Real Time (Stream Processing) Real World Use Case

Real-Time Fraud Detection

“The firm needs to monitor machine-driven algorithms, and look for suspicious patterns. Sounds simple, right? Not so simple! In this case, the patterns of interest required correlation of 5 streams of real-time data. Patterns happen within 15-30 second windows, during which thousands of dollars could be lost. Attacks come in bursts. The data required to find these patterns was loaded into a data warehouse and reports were checked each day. Decisions to act were made every day. LiveView now intercepts the data before it hit the warehouse by connecting LiveView to the source of data. It took 3 days to integrate these sources because it took that long to find someone who knew where 3 of the data streams came from! StreamBase detects fraud patterns in milliseconds. But the really interesting part came next. Once this firm could see patterns of fraud, they were faced with a new challenge: what to DO about it? How many times did the pattern need to be repeated until active surveillance is started? Should the action be quarantined for a period, or halted immediately? All these questions were new, and the answers to them keeps changing. The fact that the answers keep changing highlights the importance of ease of use. Analytics must be changed quickly and be made available to fraud experts - in some cases, in hours - as understanding deepens, and as the bad guys change their tactics. Better, higher value-add customer service for highly automated industries. Knowledge workers who anticipate sales opportunities. Spotting fraud in high-speed transactions streams and taking action.”

Some more use cases: http://streambase.typepad.com/streambase_stream_process/2012/04/streambase-liveview-10-3-stories-from-the-trenches.html
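The core of the pattern described in this quote, correlating several streams within a short time window, can be sketched as follows. Everything here is hypothetical: the five stream names, the per-account key and the 30-second window (the upper end of the quoted 15-30 seconds) are assumptions for illustration, and this is not how StreamBase implements correlation.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class MultiStreamCorrelator:
    """Flag a key once it has been seen on every required stream within a time window."""

    def __init__(self, streams: Iterable[str], window_seconds: float) -> None:
        self.streams = frozenset(streams)
        self.window_seconds = window_seconds
        # key -> {stream name -> timestamp of the latest event for that key on that stream}
        self.last_seen: Dict[str, Dict[str, float]] = {}

    def on_event(self, stream: str, key: str, timestamp: float) -> Optional[str]:
        if stream not in self.streams:
            raise ValueError(f"unknown stream: {stream}")
        seen = self.last_seen.setdefault(key, {})
        seen[stream] = timestamp
        # Forget sightings too old to belong to the same burst.
        stale = [name for name, ts in seen.items() if timestamp - ts > self.window_seconds]
        for name in stale:
            del seen[name]
        if set(seen) == self.streams:
            del self.last_seen[key]  # reset, so one burst is reported only once
            return f"pattern matched for {key} across {len(self.streams)} streams"
        return None


# Hypothetical streams and events: (stream, account, timestamp in seconds).
correlator = MultiStreamCorrelator(
    ["orders", "cancels", "quotes", "logins", "transfers"], window_seconds=30)
events: List[Tuple[str, str, float]] = [
    ("orders", "acct-1", 0.0), ("cancels", "acct-1", 5.0), ("quotes", "acct-1", 12.0),
    ("logins", "acct-2", 13.0), ("logins", "acct-1", 20.0), ("transfers", "acct-1", 28.0),
    # acct-3 touches all five streams, but spread over 35 seconds: no match.
    ("orders", "acct-3", 40.0), ("cancels", "acct-3", 45.0), ("quotes", "acct-3", 50.0),
    ("logins", "acct-3", 60.0), ("transfers", "acct-3", 75.0),
]
matches = []
for stream, key, ts in events:
    result = correlator.on_event(stream, key, ts)
    if result:
        matches.append((ts, result))
print(matches)
```

Only acct-1 matches, at t=28, because all five of its sightings fall within 30 seconds. The window length and the reset-after-match policy are exactly the kind of parameters the quote says must keep changing as fraud experts learn more.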

Page 62

Real Time (CEP + In-Memory) Real World Use Case

“With 38 million fans, MGM knows how to put its customers first, it takes more than a smile too. Customers want a personalized, tailored experience, one that knows their name and can anticipate their needs. With the help of TIBCO technologies that leverage big data and give customers a digital identity, MGM can send personalized offers directly to customers, save them a seat, and have their favorite drink on the way. With multiple customer touch points and channels, MGM can reach customers in more ways, and in more places, than ever before.”

https://www.youtube.com/watch?v=X-7S3kCOx9k

CEP:
• Correlate
• Analyze
• Action

In-Memory:
• Enable Real Time
• Only customers that have checked in

Page 63

Live Demo

„TIBCO StreamBase“ in action...

Page 64

Real Time plus Hadoop?

Hadoop:
• Storage
• Complex computing (MapReduce)

Real Time:
• Immediate (proactive) reactions, automated or manually by a user
• Monitor streaming data in real time

Example: TIBCO StreamBase and its Apache Flume connector for reading streaming data from Hadoop / HDFS or sending streaming data to Hadoop / HDFS

Page 65

Real Time plus Hadoop Real World Use Case

Use Case:
• Predict pricing movement in live bets

Hadoop:
• Store all historical information about all past bets
• Use MapReduce to precompute odds for new matches, based on all historical data

TIBCO StreamBase:
• Compute new odds in real time to react within a live game after events (e.g. when a team scores a goal)
• Monitor stream data in real-time dashboards

http://www.casestudyu.com/news/2014/04/04/7762652.htm
http://vimeo.com/91461315
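A toy Python sketch of the batch / real-time split in this use case. The odds model is an assumption for illustration only: the batch step (Hadoop's role) turns hypothetical historical data into home-win rates per goal difference, and the streaming step (StreamBase's role) converts the current rate into fair decimal odds (1 / probability, no bookmaker margin) the moment a goal event arrives. All names and numbers are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def precompute_home_win_rates(history: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Batch step: fraction of historical situations with a given goal difference
    (home minus away) that ended in a home win."""
    wins: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for goal_diff, home_won in history:
        totals[goal_diff] += 1
        wins[goal_diff] += int(home_won)
    return {diff: wins[diff] / totals[diff] for diff in totals}


class LiveOdds:
    """Real-time step: react to live events using the precomputed rates."""

    def __init__(self, home_win_rates: Dict[int, float]) -> None:
        self.rates = home_win_rates
        self.goal_diff = 0  # match starts level

    def current_odds(self) -> float:
        p = self.rates[self.goal_diff]  # KeyError if this goal difference never occurred
        return float("inf") if p == 0 else round(1.0 / p, 2)

    def on_goal(self, scored_by_home: bool) -> float:
        self.goal_diff += 1 if scored_by_home else -1
        return self.current_odds()


# Hypothetical history: (goal difference at some point in a match, did the home team win?)
history = [(0, True), (0, False), (0, False), (0, True),
           (1, True), (1, True), (1, True), (1, False),
           (-1, False), (-1, False), (-1, True), (-1, False)]
rates = precompute_home_win_rates(history)   # {0: 0.5, 1: 0.75, -1: 0.25}
live = LiveOdds(rates)
print(live.current_odds())                   # 2.0
print(live.on_goal(scored_by_home=True))     # 1.33
print(live.on_goal(scored_by_home=False))    # 2.0
```

The expensive aggregation over all history happens once, offline; the live path is only a dictionary lookup and a division, which is what lets it react within the game.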

Page 66

Recap: Big Data Architecture

DWH  /  BI  

Hadoop  

Real  Time  

Big  Data  Architecture  

Page 67

Off Topic

What about Integration?

Page 68

Off Topic

Integration is not a topic of this session… However: it will become even more important in the future! The number of different data sources and technologies is increasing even faster than in the past:
– CRM, ERP, host, B2B, etc. will not disappear
– DWH, Hadoop cluster, event / streaming server, in-memory DB have to communicate
– Cloud, mobile, Internet of Things are not optional, but our future!

Page 69

Recap: Key Messages

Big Data is not just Hadoop, concentrate on Business Value!

A good Big Data Architecture combines DWH, Hadoop and Real Time!

The Integration Layer is getting even more important in the Big Data Era!

Page 70

Questions? Kai Wähner [email protected], @KaiWaehner, www.kai-waehner.de