© 2012 IBM Corporation Smarter Computing - Big Data 11 June 2012 Dipl.Ing.Wolfgang Nimführ Information Agenda Executive Consultant Big Data Tiger Team IBM Software Group Europe [email protected]

IBM CEC Big Data 2011 06-11 final

Jan 20, 2015


Technology

COMMON Europe

COMMON Europe Congress 2012 - Vienna
Transcript
Page 1: IBM CEC Big Data 2011 06-11 final

© 2012 IBM Corporation

Smarter Computing - Big Data
11 June 2012

Dipl.Ing. Wolfgang Nimführ

Information Agenda Executive Consultant
Big Data Tiger Team
IBM Software Group Europe

[email protected]

Page 2: IBM CEC Big Data 2011 06-11 final


Legal Disclaimer

© IBM Corporation 2012. All Rights Reserved.

The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.

References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.

Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Page 3: IBM CEC Big Data 2011 06-11 final


Welcome to the Instrumented Interconnected World!

INSTRUMENTED

INTERCONNECTED

INTELLIGENT

Build a Smarter Planet

Page 4: IBM CEC Big Data 2011 06-11 final


Why Big Data

“Most enterprise data warehouse (EDW) and BI teams currently lack a clear understanding of big data technologies… They are increasingly asking the question, ‘How can we use big data to deliver new insights?’”
Gartner 2012

Searches for “big data” on Gartner's website increased 981% between March 2011 and October 2011

Big Data - We are at a huge inflection point and this opportunity comes only once.

We are declaring that IBM is the #1 leader in providing a Big Data platform.

Alyse Passarelli, WW VP IM Sales Jan 10th 2012

“Big Data: The next frontier for innovation, competition and productivity”
McKinsey Global Institute

2012 will be the year of 'big data'
BBC, Nov 30 2011

Big Data will be the CIO Issue of 2012

IDC Prediction 2012 report

Page 5: IBM CEC Big Data 2011 06-11 final


The Information Explosion in Data and Real World Events

• 44x as much data and content over the coming decade: 800,000 petabytes in 2009, 35 zettabytes in 2020
• 80% of the world’s data is unstructured

Organizations Need Deeper Insights

• 1 in 3 business leaders frequently make decisions based on information they don’t trust, or don’t have
• 1 in 2 business leaders say they don’t have access to the information they need to do their jobs
• 83% of CIOs cited “Business intelligence and analytics” as part of their visionary plans to enhance competitiveness
• 60% of CEOs need to do a better job capturing and understanding information rapidly in order to make swift business decisions
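The 44x growth figure can be checked against the two volumes quoted on this slide. The short calculation below is only a sanity check and assumes decimal units (1 zettabyte = 1,000,000 petabytes); the variable names are illustrative.

```python
# Growth factor implied by the two data volumes quoted on the slide.
# Assumption: decimal (SI) units, i.e. 1 zettabyte = 1,000,000 petabytes.
PB_PER_ZB = 1_000_000

volume_2009_pb = 800_000          # 2009: 800,000 petabytes
volume_2020_pb = 35 * PB_PER_ZB   # 2020: 35 zettabytes

growth_factor = volume_2020_pb / volume_2009_pb
print(f"Growth factor 2009 -> 2020: {growth_factor:.2f}x")
```

The ratio comes out just under 44, which the slide rounds to 44x.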

Page 6: IBM CEC Big Data 2011 06-11 final


Data AVAILABLE to an organization

Data an organization can PROCESS

• The percentage of available data an enterprise can analyze decreases as the volume of data available to it grows

• Quite simply, this means that as enterprises we are getting “more naive” about our business over time

• We don’t know what we could already know…

The Blind Spot

The resulting explosion of information creates a need for a new kind of intelligence

Missing Insights and Analytics

Page 7: IBM CEC Big Data 2011 06-11 final


Challenge: Study a Large Volume and Variety of Data to Find New Insights

Identify criminals and threats from disparate video, audio, and data feeds

Make risk decisions and detect fraud based on real-time transactional data

Predict weather patterns to plan optimal wind turbine usage, and optimize capital expenditure on asset placement

Support medical diagnostics
Detect life-threatening conditions

Multi-channel customer sentiment and experience analysis

Page 8: IBM CEC Big Data 2011 06-11 final


How do you address the challenges presented by empowered market participants generating mountains of data?

Source: 1 – Barrera, Clod and Wojtowecz. “Cloud Leads Five Storage Trends for 2011.” CIO. January 27, 2011. 2 – http://www.internetworldstats.com/stats.htm. 3 – http://www.abiresearch.com/press/3584-More+than+Seven+Trillion+SMS+Messages+Will+Be+Sent+in+2011

Can you capture data generated by these interactions?

Can you turn that data into insights to predict customer / competitive / market behavior?

Can you do it in real-time?

Leveraging Big Data Analytics

Page 9: IBM CEC Big Data 2011 06-11 final


How does Big Data Analytics impact business?

Source: Outperforming in a Data Rich, Hyper Connected World, an IBM Center for Applied Insights research report. Copyright © IBM 2012

Deploying these competencies extensively correlates with long-term financial performance

The Listen and Anticipate competencies, when consistently deployed across the enterprise, correlate with higher compound annual growth rates (5-year CAGR, 2005-2010)

Page 10: IBM CEC Big Data 2011 06-11 final


Information Management Capabilities

• Event triggers
• Customer profitability analysis
• Complaint data
• Voice to text data
• Transactional data
• Policy & procedure data

Dashboards
Data Scientist
Call Center
Client Mgr

• Relationship / risk data

• Product profitability data

• Email correspondents

• Company website logs

Internal Data

Big Data Analytics

Hub

Natural Language

External Data

• Web logs
• Twitter feeds
• Facebook chats
• YouTube video
• Blogs/postings
• Appraisal data
• Credit bureau data

Leveraging Big Data Analytics can improve Experience

Page 11: IBM CEC Big Data 2011 06-11 final


On 16 Feb 2011 the IBM Watson system won Jeopardy!

Can we design a computing system that rivals a human’s ability to answer questions posed in natural language, interpreting meaning and context and retrieving, analyzing and understanding vast amounts of information in real-time?

Page 12: IBM CEC Big Data 2011 06-11 final


The IBM Watson project started in 2007

“IBM is not in the entertainment business. But we are in the business of technology and pushing frontiers.”

David Shepler, IBM Research Program Manager

• Project started in 2007, led by David Ferrucci

• Initial goal: create a system able to process natural language & extract knowledge faster than any other computer or human

• Jeopardy! was chosen because it’s a huge challenge for a computer to find the questions to such “human” answers under time pressure

• Watson was NOT online!

• Watson weighs the probability of his answer being right – doesn’t ring the buzzer if he’s not confident enough

• Which questions Watson got wrong are almost as interesting as which he got right!
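The confidence-gated buzzing described in the bullets above can be pictured with a small sketch. This is not Watson's actual logic: the function name `decide_to_buzz`, the 0.5 threshold and the confidence values are all hypothetical, chosen only to show the idea of answering when the best candidate is confident enough.

```python
from typing import Optional

def decide_to_buzz(candidates: dict[str, float], threshold: float = 0.5) -> Optional[str]:
    """Return the best answer if its confidence clears the threshold, else None.

    `candidates` maps each candidate answer to an estimated probability of
    being correct. The threshold value is a hypothetical choice.
    """
    if not candidates:
        return None
    best_answer, best_confidence = max(candidates.items(), key=lambda item: item[1])
    return best_answer if best_confidence >= threshold else None

# Hypothetical confidence values for illustration only.
confident = decide_to_buzz({"Vasco da Gama": 0.87, "Gary": 0.10})
unsure = decide_to_buzz({"Vasco da Gama": 0.32, "Gary": 0.30})
print(confident)  # an answer is given
print(unsure)     # None: the system stays silent
```

In a real game the threshold would presumably depend on the game state; a fixed value keeps the sketch simple.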

Page 13: IBM CEC Big Data 2011 06-11 final


Different Types of Evidence: Keyword Evidence

In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.

In May, Gary arrived in India after he celebrated his anniversary in Portugal.

[Figure: keyword matching pairs the question terms “In May 1898”, “celebrated”, “400th anniversary”, “Portugal”, “arrival in” and “India” with the passage terms “In May”, “celebrated”, “anniversary”, “in Portugal”, “arrived in” and “India”; “explorer” is paired with “Gary”.]

Evidence suggests “Gary” is the answer, BUT the system must learn that keyword matching may be weak relative to other types of evidence
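Why keyword evidence misleads here can be shown with a naive overlap score. The sketch below is an illustration only, not part of DeepQA: `keyword_overlap` is a hypothetical helper that measures the fraction of distinct question tokens found in a passage. It compares the "Gary" passage with the Vasco da Gama passage used elsewhere in this deck.

```python
import re

def keyword_overlap(question: str, passage: str) -> float:
    """Fraction of distinct question tokens that also occur in the passage."""
    question_tokens = set(re.findall(r"[a-z0-9]+", question.lower()))
    passage_tokens = set(re.findall(r"[a-z0-9]+", passage.lower()))
    if not question_tokens:
        return 0.0
    return len(question_tokens & passage_tokens) / len(question_tokens)

QUESTION = ("In May 1898 Portugal celebrated the 400th anniversary "
            "of this explorer's arrival in India.")
GARY_PASSAGE = ("In May, Gary arrived in India after he celebrated "
                "his anniversary in Portugal.")
GAMA_PASSAGE = "On 27th May 1498, Vasco da Gama landed in Kappad Beach"

gary_score = keyword_overlap(QUESTION, GARY_PASSAGE)
gama_score = keyword_overlap(QUESTION, GAMA_PASSAGE)
print(f"Gary passage: {gary_score:.2f}, da Gama passage: {gama_score:.2f}")
```

The wrong passage scores higher on pure keyword overlap than the correct one, which is the weakness the slide points out.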

Page 14: IBM CEC Big Data 2011 06-11 final


In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.

On the 27th of May 1498, Vasco da Gama landed in Kappad Beach.

[Figure: deeper evidence links the question to the passage]
• Temporal Reasoning (DateMath): “May 1898” and “400th anniversary” match “27th May 1498”
• Statistical Paraphrasing (Para-phrases): “arrival in” matches “landed in”
• GeoSpatial Reasoning (Geo-KB): “India” matches “Kappad Beach”
• “explorer” matches “Vasco da Gama”

Stronger evidence can be much harder to find and score.

The evidence is still not 100% certain.

• Search far and wide

• Explore many hypotheses

• Find and judge evidence

• Many inference algorithms

Different Types of Evidence: Deeper Evidence
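The temporal reasoning step on this slide comes down to simple date arithmetic: a 400th anniversary celebrated in 1898 points to an event in 1498. The helper names below are hypothetical and only illustrate that check.

```python
def anniversary_origin_year(celebration_year: int, anniversary: int) -> int:
    """Year of the original event implied by an N-th anniversary."""
    return celebration_year - anniversary

def temporal_match(celebration_year: int, anniversary: int, event_year: int) -> bool:
    """True if the candidate event year is consistent with the anniversary clue."""
    return anniversary_origin_year(celebration_year, anniversary) == event_year

# Clue: "In May 1898 ... the 400th anniversary"; passage: "27th May 1498".
print(anniversary_origin_year(1898, 400))
print(temporal_match(1898, 400, 1498))
```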

Page 15: IBM CEC Big Data 2011 06-11 final


[Figure: DeepQA pipeline]

Question → Question & Topic Analysis → Question Decomposition → Hypothesis Generation → Hypothesis and Evidence Scoring → Synthesis → Final Confidence Merging & Ranking → Answer & Confidence

• Multiple interpretations
• 100s of sources
• 100s of possible answers
• 1000s of pieces of evidence
• 100,000s of scores from many simultaneous text analysis algorithms

DeepQA: Massively Parallel Probabilistic Evidence-Based Architecture
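The final merging and ranking stage can be pictured as combining several evidence scores per candidate into one confidence value and sorting by it. The sketch below is an assumption-laden illustration, not the DeepQA implementation: the logistic combination, the weights, the bias and all scores are hypothetical.

```python
import math

# Hypothetical weights: deeper evidence types count more than keyword matches.
WEIGHTS = {"keyword": 0.5, "temporal": 2.0, "geospatial": 1.5, "paraphrase": 1.5}
BIAS = -2.0  # hypothetical offset so that weak evidence yields low confidence

def merged_confidence(scores: dict[str, float]) -> float:
    """Combine per-evidence scores (0..1) into one confidence via a logistic function."""
    z = BIAS + sum(weight * scores.get(name, 0.0) for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def rank(candidates: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (answer, confidence) pairs sorted from most to least confident."""
    scored = [(answer, merged_confidence(scores)) for answer, scores in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical evidence scores for the two candidates discussed in this deck.
candidates = {
    "Gary": {"keyword": 0.9},
    "Vasco da Gama": {"keyword": 0.2, "temporal": 1.0, "geospatial": 1.0, "paraphrase": 0.8},
}
ranking = rank(candidates)
for answer, confidence in ranking:
    print(f"{answer}: {confidence:.2f}")
```

With these made-up numbers the candidate backed by temporal, geospatial and paraphrase evidence outranks the one backed only by keyword matches.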

Page 16: IBM CEC Big Data 2011 06-11 final


Maximum Benefit Requires Combining Deep and Reactive Analytics

[Figure: data scale (Kilo, Mega, Giga, Tera, Peta, Exa) plotted against decision frequency (Occasional, Frequent, Real-time; yr, mo, wk, day, hr, min, sec … ms, µs)]

• Traditional Data Warehouse and Business Intelligence
• Predictive Analytics: 100,000 records/sec, 6B/day, 10 ms/decision, 6 PB for Deep Analytics
• DeepQA: 100s GB for Deep Analytics, 3 sec/decision, 1 PB training corpus
• Smart Traffic: 250K GPS probes/sec, 630K segments/sec, 2 ms/decision, 4K vehicles
• Real time Optimization: 100,000 updates/sec, 5 ms/decision, round-trip automation, 10 PB for Deep Analytics

Reactive Analytics (Fast): Observations; Reality → Actions
Deep Analytics (Deep): History; Hypotheses → Predictions
Integration
Feedback

Page 17: IBM CEC Big Data 2011 06-11 final


Traditional Approach vs Big Data Approach

Traditional Approach
Structured & Repeatable Analysis
• Business Users: determine what question to ask
• IT: structures the data to answer that question
• Examples: monthly sales reports, profitability analysis, customer surveys

Big Data Approach
Iterative & Exploratory Analysis
• IT: delivers a platform to enable creative discovery
• Business: explores what questions could be asked
• Examples: brand sentiment, product strategy, maximum asset utilization

Page 18: IBM CEC Big Data 2011 06-11 final


Big Data use cases across all industries

Utilities
• Weather impact analysis on power generation
• Transmission monitoring
• Smart grid management

Retail
• 360° View of the Customer
• Click-stream analysis
• Real-time promotions

Law Enforcement
• Real-time multimodal surveillance
• Situational awareness
• Cyber security detection

Transportation
• Weather and traffic impact on logistics and fuel consumption

Financial Services
• Fraud detection
• Risk management
• 360° View of the Customer

IT
• Transition log analysis for multiple transactional systems
• Cybersecurity

Health & Life Sciences
• Epidemic early warning system
• ICU monitoring
• Remote healthcare monitoring

Telecommunications
• CDR processing
• Churn prediction
• Geomapping / marketing
• Network monitoring

Page 19: IBM CEC Big Data 2011 06-11 final


Monetizing Relationships - not just Transactions

Social Network
Public Database

Amy Bearn
32, Married, mother of 3, Accountant
Telco Score: 91
CPG Score: 76
Fashion Score: 88

Telco company
How valuable is Amy to my mobile phone network? How likely is she to switch carriers? How many other customers will follow?

Merged Network
Calling Network

Telco Retail
How valuable is Amy to my retail sales? Who does she influence? What do they spend?

Page 20: IBM CEC Big Data 2011 06-11 final


Social Media based 360-degree Consumer Profiles

Personal Attributes
• Identifiers: name, address, age, gender, occupation…
• Interests: sports, pets, cuisine…
• Life Cycle Status: marital, parental

Products Interests
• Personal preferences of products
• Product purchase history
• Suggestions on products & services

Life Events
• Life-changing events: relocation, having a baby, getting married, getting divorced, buying a house…

Timely Insights
• Intent to buy various products
• Current location
• Sentiment on products, services, campaigns
• Incidents damaging reputation
• Customer satisfaction/attrition

Relationships
• Personal relationships: family, friends and roommates…
• Business relationships: co-workers and work/interest network…

Monetizable intent to buy products
• I need a new digital camera for my food pictures, any recommendations around 300?
• What should I buy?? A mini laptop with Windows 7 OR a Apple MacBook!??!

Life Events
• College: Off to Stanford for my MBA! Bbye chicago!
• Looks like we'll be moving to New Orleans sooner than I thought.

Intent to buy a house
• I'm thinking about buying a home in Buckingham Estates per a recommendation. Anyone have advice on that area? #atx #austinrealestate #austin

Location announcements
• I'm at Starbucks Parque Tezontle http://4sq.com/fYReSj

Sample: Big Data 360° Lead Generation
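How posts like the samples above might be sorted into intent categories can be sketched with a few pattern rules. This is a toy illustration, not the text analytics used in the IBM solution: the rule set, the labels and the `classify` function are all hypothetical, and real posts would need far more robust extraction.

```python
import re

# Hypothetical rules, checked in order; the first matching pattern wins.
RULES = [
    ("intent_to_buy_house", re.compile(r"\bbuying a (home|house)\b", re.IGNORECASE)),
    ("intent_to_buy_product", re.compile(r"\b(i need a new|what should i buy)\b", re.IGNORECASE)),
    ("life_event_relocation", re.compile(r"\b(moving to|off to)\b", re.IGNORECASE)),
    ("location_announcement", re.compile(r"\bi'm at\b", re.IGNORECASE)),
]

def classify(post: str) -> str:
    """Return the label of the first rule whose pattern occurs in the post."""
    for label, pattern in RULES:
        if pattern.search(post):
            return label
    return "other"

posts = [
    "I'm thinking about buying a home in Buckingham Estates per a recommendation.",
    "I need a new digital camera for my food pictures, any recommendations around 300?",
    "Looks like we'll be moving to New Orleans sooner than I thought.",
    "I'm at Starbucks Parque Tezontle http://4sq.com/fYReSj",
]
for post in posts:
    print(f"{classify(post):24} {post}")
```

Keyword rules like these are brittle, since entries also contain promotional messages, wishful thinking and questions, which is why extraction, cleansing and normalization matter.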

Page 21: IBM CEC Big Data 2011 06-11 final


Micro-segmentation of consumers by hobbies

Micro-segmentation of product intents by occupation

Real-time product intents enriched with consumer attributes

Real-time tracking by micro-segmentation

Integration across Social Media sites

Entries contain promotional messages, wishful thinking, questions, etc.

For many of the attributes we need to extract, cleanse, normalize and categorize

Sample: Big Data 360° Lead Generation

Page 22: IBM CEC Big Data 2011 06-11 final


Sample: Institutional Risk Application
Comprehensive view of publicly traded companies and related people based on regulatory filings

Extract

Integrate

Page 23: IBM CEC Big Data 2011 06-11 final


Requirements for a Big Data Solution Platform

Analyze Information in Motion
• Streaming data analysis
• Large volume data bursts & ad-hoc analysis

Analyze a Variety of Information
• Novel analytics on a broad set of mixed information that could not be analyzed before
• Multiple relational & non-relational data types and schemas

Discover & Experiment
• Ad-hoc analytics, data discovery & experimentation

Analyze Extreme Volumes of Information
• Cost-efficiently process and analyze petabytes of information
• Manage & analyze high volumes of structured, relational data

Manage & Plan
• Enforce data structure, integrity and control to ensure consistency for repeatable queries

Page 24: IBM CEC Big Data 2011 06-11 final


IBM Big Data Platform for Ingest, Data and Analytics

IBM Big Data Platform

Systems Management

Application Development

Visualization & Discovery

Accelerators

Information Integration & Governance

Hadoop System

Stream Computing

Data Warehouse

New analytic applications drive the requirements for a big data platform

• Integrate and manage the full variety, velocity and volume of data

• Apply advanced analytics to information in its native form

• Visualize all available data for ad-hoc analysis

• Development environment for building new analytic applications

• Workload optimization and scheduling

• Security and Governance

BI / Reporting

Exploration / Visualization

Functional App

Industry App

Predictive Analytics

Content Analytics

Analytic Applications

Page 25: IBM CEC Big Data 2011 06-11 final


Big Data Hadoop Capabilities

Big Data Challenges: NoSQL
• Very high volumes (TBs to PBs) of unstructured data
• Exploration and discovery
• Text, Entity and Social Media Analytics

IBM Big Data Solution: IBM BigInsights
Hadoop-based processing for analytics on variety and volumes of data

Big Data Challenges: Data Streaming
• Real time processing
• Detect failure patterns
• High volume, low latency processing
• Scoring and decision analytics

IBM Big Data Solution: IBM Streams
Low latency analytics for streaming data

Page 26: IBM CEC Big Data 2011 06-11 final


High Level Conceptual View *)
*) Example for Industry Energy & Utility

Data Asset Landscape
• Social
• Web/social
• Sensors
• Smart Meters
• Regulations
• Legacy Applications
• Operational Systems
• Employee, Supplier, Maintenance, Orders, GIS, Generation, Distribution, Transmission, Customer, Trading, Marketing

IBM Streams (streaming data, structured or unstructured)
Real Time Scoring and Response
• Smart Grid Analytics
• Distribution Grid Monitoring
• Root Cause Failure Analysis
• Demand Response Effectiveness

IBM BigInsights (unstructured data)
Exploration/Discovery, Queryable Archive
• Sentiment analysis
• Call Centre analysis
• Log analysis
• Outage Information
• Micro customer segmentation
• Offering Management

Foundational (structured data)
Analytics and Reporting
• Meter Data Management
• Customer Portals
• Smart Meter Analytics
• Demand Forecasting
• Generation Scheduling
• Customer Segmentation
• Campaign Management
• Outage Management
• Estimate Load Shedding
• Time of Use Tariffs
• Maintenance Scheduling

Improved Analytics

IBM power i

Page 27: IBM CEC Big Data 2011 06-11 final


Based on open source & IBM technologies

Distinguishing characteristics

• Built-in analytics enhances business knowledge

• Enterprise software integration complements and extends existing capabilities

• Production-ready platform with tooling for analysts, developers, and administrators speeds time-to-value and simplifies development/maintenance

IBM advantage

• Combination of software, hardware, services and advanced research

IBM InfoSphere BigInsights
Analytical platform for Big Data at-rest


Page 28: IBM CEC Big Data 2011 06-11 final


IBM InfoSphere BigInsights
Embrace and Extend Hadoop

[Figure: BigInsights component stack; legend distinguishes Open Source and IBM components]
• Storage: HDFS, HBase, GPFS-SNC *)
• Application: MapReduce, AdaptiveMR, ZooKeeper, Avro, Pig, Hive, Jaql, Flume, Oozie, Lucene
• Data Sources/Connectors: JDBC, Netezza, BoardReader, DB2, Streams, Web Crawler, R, CSV/XML/JSON, DataStage, SPSS
• Analytics: ML Analytics *), Text Analytics
• Interface: BigSheets, BigIndex, FLEX, Management Console (browser based), Development Tooling (Eclipse plug-ins), REST API (for applications)
• IBM LZO Compression

*) future release

Page 29: IBM CEC Big Data 2011 06-11 final

BigSheets
A visual tool for data manipulation and prototyping

• Ad-hoc analytics for LOB user

• Analyze a variety of data - unstructured and structured

• Spreadsheet metaphor for exploring/ visualizing data

• Browser-based

Page 30: IBM CEC Big Data 2011 06-11 final


Pre-configured text annotators ready for distributed processing on Big Data

Support for native languages including double-byte

Turns disparate words into measurable insights

Identify positive or negative sentiment, NLP-based analytics, define variables, macros and rules.

Physically assemble data, standardize formats, address auto-identify language, process punctuation and non-grammatical characters, standardize spelling.

Part-of-speech identification, standard and customized extraction dictionaries, proper noun identification, concept categorization, synonyms, exclusions, multi-terms, regular expressions, fuzzy-matching

Iterative classification using automated and manual techniques.

Concept derivation & inclusion, semantic networks and co-occurrence rules

Reporting/Monitoring social commentary, combination w/structured data, clustering, associated concepts, correlated concepts, auto-classification of documents, sites, posts.

Text Analytics

Page 31: IBM CEC Big Data 2011 06-11 final


How it works

• Parses text and detects meaning with annotators

• Understands the context in which the text is analyzed

• Hundreds of pre-built annotators for names, addresses, phone numbers, among others

Accuracy

• Highly accurate in deriving meaning from complex text

Performance

• AQL language optimized for MapReduce

Football World Cup 2010, one team distinguished themselves well, losing to the eventual champions 1-0 in the Final. Early in the second half, Netherlands’ striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casillas, made the save. Winger Andres Iniesta scored for Spain for the win.

Unstructured text (document, email, etc)

Classification and Insight

Text Analytics
Highly accurate analysis of textual content
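What an annotator does with the sample passage can be approximated with dictionary and pattern matching. The sketch below is a plain Python stand-in, not AQL and not the BigInsights annotators: the dictionaries, labels and the `annotate` function are hypothetical and cover only this one passage.

```python
import re
from typing import NamedTuple

class Annotation(NamedTuple):
    label: str   # annotation type, e.g. "Player"
    text: str    # matched text span
    start: int   # character offset in the input

# Hypothetical extraction dictionaries for this sample only.
DICTIONARIES = {
    "Team": ["Netherlands", "Spain"],
    "Player": ["Arjen Robben", "Iker Casillas", "Andres Iniesta"],
    "Position": ["striker", "keeper", "winger"],
}
SCORE_PATTERN = re.compile(r"\b\d+-\d+\b")  # e.g. "1-0"

def annotate(text: str) -> list[Annotation]:
    """Return dictionary and score annotations ordered by position in the text."""
    annotations = []
    for label, entries in DICTIONARIES.items():
        for entry in entries:
            for match in re.finditer(re.escape(entry), text, flags=re.IGNORECASE):
                annotations.append(Annotation(label, match.group(0), match.start()))
    for match in SCORE_PATTERN.finditer(text):
        annotations.append(Annotation("Score", match.group(0), match.start()))
    return sorted(annotations, key=lambda annotation: annotation.start)

TEXT = ("Football World Cup 2010, one team distinguished themselves well, "
        "losing to the eventual champions 1-0 in the Final. Early in the second "
        "half, Netherlands' striker, Arjen Robben, had a breakaway, but the keeper "
        "for Spain, Iker Casillas, made the save. Winger Andres Iniesta scored "
        "for Spain for the win.")

for annotation in annotate(TEXT):
    print(f"{annotation.label:9} {annotation.text}")
```

Relating the annotations to each other (which player belongs to which team, who lost the final) is the harder part and is where context-aware rules come in.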

Page 32: IBM CEC Big Data 2011 06-11 final


Framework for machine learning (ML) implementations on Big Data

• Large, sparse data sets, e.g. 5B non-zero values

• Runs on large BigInsights clusters with 1000s of nodes

Productivity

• Build and enhance predictive models directly on Big Data

• High-level language – Declarative Machine Learning Language (DML)

• E.g. 1500 lines of Java code boils down to 15 lines of DML code

• Parallel SPSS data mining algorithms implementable in DML

Optimization

• Compile algorithms into optimized parallel code

• For different clusters and different data characteristics

• E.g. 1 hr. execution (hand-coded) down to 10 mins

[Chart: execution time (sec) versus # non zeros (million) for Java Map-Reduce, SystemML and single node R]

ML Analytics
Statistical and Predictive Analysis

Page 33: IBM CEC Big Data 2011 06-11 final


Task → Map (break task into small parts) → Adaptive Map (optimization: order small units of work) → Reduce (many results to a single result set)

Adaptive MapReduce

• Algorithm to optimize execution time of multiple small jobs

• Performance gains of 30% by reducing the overhead of task startup

Hadoop System Scheduler

• Identifies small and large jobs from prior experience

• Sequences work to reduce overhead

Workload Optimization
Optimized performance for big data analytic workloads
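The Map and Reduce steps named on this slide can be illustrated with a single-process word count. This is plain Python, not Hadoop and not IBM's Adaptive MapReduce; the function names are illustrative, and the shuffle step stands in for the grouping a real framework performs between the two phases.

```python
from collections import defaultdict
from functools import reduce

def map_phase(document: str) -> list[tuple[str, int]]:
    """Map: break one document into small (word, 1) pairs."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all values by key, as a framework would between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: collapse the many values per key into a single result."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}

def word_count(documents: list[str]) -> dict[str, int]:
    pairs = [pair for document in documents for pair in map_phase(document)]
    return reduce_phase(shuffle(pairs))

counts = word_count(["big data", "big insights"])
print(counts)
```

In a cluster each map call would run as its own task, and the per-task startup cost is the overhead the slide says Adaptive MapReduce reduces.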

Page 34: IBM CEC Big Data 2011 06-11 final

Perspective: The Vestas Wind library

• Public wind data is available on 284 km x 284 km grids (2.5° LAT/LONG)

• More data means more accurate and richer models (adding hundreds of variables)

- Vestas wind library at 2.5 PB: to grow to over 6 PB in the near-term

- Granularity 27 km x 27 km grids: driving to 9x9, 3x3 to 10 m x 10 m simulations

• Reduced turbine placement identification from weeks to hours

Page 35: IBM CEC Big Data 2011 06-11 final


Built to analyze data in motion

• Multiple concurrent input streams

• Massive scalability

Process and analyze a variety of data

• Structured, unstructured content, video, audio

• Advanced analytic operators

InfoSphere Streams
Analytical platform for Big Data in-motion


Page 36: IBM CEC Big Data 2011 06-11 final


Stream Computing
Analyze Data in Motion

Traditional Computing
• Historical fact finding
• Find and analyze information stored on disk
• Batch paradigm, pull model
• Query-driven: submits queries to static data

Stream Computing
• Current fact finding
• Analyze data in motion, before it is stored
• Low latency paradigm, push model
• Data driven: bring the data to the query

[Figure label: Data Base]
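The pull and push models contrasted on this slide can be sketched in a few lines. This is a conceptual illustration in plain Python, not InfoSphere Streams or its Streams Processing Language: `batch_average` queries data that is already stored, while `streaming_average` is a standing query that emits a result for each reading as it arrives. Both names and the window size are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator

def batch_average(stored_readings: list[float]) -> float:
    """Pull model: run a query once over data that is already stored."""
    return sum(stored_readings) / len(stored_readings)

def streaming_average(readings: Iterable[float], window: int = 3) -> Iterator[float]:
    """Push model: emit a sliding-window average as each reading arrives."""
    recent: deque[float] = deque(maxlen=window)  # keeps only the last `window` readings
    for value in readings:
        recent.append(value)
        yield sum(recent) / len(recent)

readings = [10.0, 20.0, 30.0, 40.0]
batch_result = batch_average(readings)
stream_results = list(streaming_average(readings, window=2))
print(batch_result)
print(stream_results)
```

The streaming version never needs the full history in memory, which is what makes it possible to analyze data before it is stored.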

Page 37: IBM CEC Big Data 2011 06-11 final


Streams approach illustrated

[Figure label: tuple]

Page 38: IBM CEC Big Data 2011 06-11 final


Linear Scalability

• Clustered deployments – unlimited scalability

Automated Deployment

• Automatically optimize operator deployment across clusters

Performance Optimization

• JVM Sharing – minimize memory use

• Fuse operators on same cluster

• Telco client – 25 Million messages per second

Analytics on Streaming Data

• Analytic accelerators for a variety of data types

• Optimized for real-time performance

IBM InfoSphere Streams
Massively Scalable Stream Analytics

[Figure: Streams Studio IDE; Streaming Data Sources → Source Adapters → Analytic Operators → Sink Adapters → Visualization; Streams Runtime with automated and optimized deployments]

Page 39: IBM CEC Big Data 2011 06-11 final


• Optimize building energy consumption with centralized monitoring

• Automate preventive and corrective maintenance

Capabilities Utilized:
• Streaming Analytics
• Hadoop System
• Business Intelligence

Applications:
• Log Analytics
• Energy Bill Forecasting
• Energy consumption optimization
• Detection of anomalous usage
• Presence-aware energy mgt.
• Policy enforcement

Cisco turns to IBM big data for intelligent infrastructure management

Page 40: IBM CEC Big Data 2011 06-11 final

University of Ontario Institute of Technology

• Use case
– Neonatal infant monitoring
– Predict infection in ICU 24 hours in advance

• Solutions
– 120 children monitored: 120K msg/sec, billion msg/day
– Trials expanding to include hospitals in US and China

[Figure: Stream-based Distributed Interoperable Health care Infrastructure: Sensor Network → Event Pre-processor → Analysis Framework → Solutions (Applications)]

Page 41: IBM CEC Big Data 2011 06-11 final


Without a Big Data Platform You Code…

Streams provides development, deployment, runtime, and infrastructure services

“TerraEchos developers can deliver applications 45% faster due to the agility of Streams Processing Language…”
– Alex Philip, CEO and President, TerraEchos

• Multithreading
• Custom SQL and Scripts
• Performance Optimization
• Debug
• Application Management
• Event Handling
• Connectors
• Checkpointing
• Security
• HA
• Accelerators and Toolkits

Over 100 sample applications and toolkits, including industry-focused toolkits with 300+ functions and operators

Page 42: IBM CEC Big Data 2011 06-11 final


IBM is Committed to Innovation

[Timeline: 2005 to 2012]
* Tealeaf, Varicent, Vivisimo pending acquisition close

• $16B+ in acquisitions since 2005

• 10,000+ technical professionals

• ~8000 dedicated consultants

• 27,000+ business partner certifications

• 8 Analytics Solutions Centers

• 100 analytics-based research assets; almost 300 researchers

IBM Research: Almaden, Austin, Melbourne, Sao Paulo, Beijing, Haifa, Delhi, Ireland, Yamato, Watson, Zurich

“Watson is going to revolutionize many, many industries and it will fundamentally change the way we interact with computers & machines.”

John Kelly, SVP & Head of IBM Research

Selected SW Acquisitions

Page 43: IBM CEC Big Data 2011 06-11 final


bigdatauniversity.com/

ibm.com/software/data/bigdata/

Making Learning Easy and Fun
Ask for a Big Data Discovery Workshop

ibm.com/software/data/infosphere/biginsights/

youtube.com/user/ibmbigdata

Page 44: IBM CEC Big Data 2011 06-11 final


Dipl.Ing.Wolfgang Nimführ

Information Agenda Executive Consultant
Big Data Tiger Team
IBM Software Group Europe

IBM Austria
Obere Donaustrasse 95
A-1020 Vienna

Tel [email protected]

Questions & Answers