Everything You Need to Know About ‘Big Data’, BI and Data Acceleration
Adrian Westmoreland
December 2012
© 2012 SAP AG. All rights reserved. 2
Safe harbor statement
The information in this presentation is confidential and proprietary to SAP and may not be disclosed without the permission of SAP. This presentation is not subject to your license agreement or any other service or subscription agreement with SAP. SAP has no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation and SAP's strategy and possible future developments, products and or platforms directions and functionality are all subject to change and may be changed by SAP at any time for any reason without notice. The information on this document is not a commitment, promise or legal obligation to deliver any material, code or functionality. This document is provided without a warranty of any kind, either express or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. This document is for informational purposes and may not be incorporated into a contract. SAP assumes no responsibility for errors or omissions in this document, except if such damages were caused by SAP intentionally or grossly negligent.
All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.
Agenda
Over the next 30 minutes
You’ll gain an understanding of Big Data technologies, opportunities and challenges including how
technology innovations are changing the Big Data landscape.
You’ll see a brief overview of SAP’s Big Data architecture.
You’ll discover how other companies are utilizing Big Data for their benefit.
Introduction to “Big Data”
SOCIAL
[Word cloud: CRM Data, GPS, Demand, Speed, Velocity, Transactions, Opportunities, Service Calls, Customer, Sales Orders, Inventory, Emails, Tweets, Planning, Things, Mobile, Instant Messages]
VELOCITY: Worldwide digital content will double in 18 months, and every 18 months thereafter. (IDC)
VOLUME: In 2005, humankind created 150 exabytes of information. In 2011, 1,200 exabytes will be created. (The Economist)
VARIETY: 80% of enterprise data will be unstructured, spanning traditional and non-traditional sources. (Gartner)
VARIABILITY: Configuring modern software can be extremely difficult since a good configuration depends (at least) on the hardware environment, the workload, the load intensity, and the target behavior. (MassConf Paper, ACM)
VALUE: Empowerment of the End User is the goal of Enterprise Software. (SAP)
VALIDITY: Ensure that the information was created in accordance with complete understanding of the use cases and includes all the other aspects of data quality. (Gartner)
IDC “Big Data” definition
IDC’s “Big Data” definition uses a sequence of criteria to determine whether a use case and
its associated technology and services should be included in the “Big Data” market sizing.
The first step checks for any of the following scenarios:
• Deployments where the data collected is over 100TB (data collected, not stored, accounts for
the use of in-memory technology where data may not be stored on a disk)
• Deployments of ultra-high-speed messaging technology for real-time, streaming data capture,
and monitoring
• Deployments where the data sets may not be very large today but are growing very rapidly at
a rate of 60% or more annually
Next, IDC evaluates whether, for each of the above scenarios, the technology is deployed on
scale-out infrastructure, and finally, IDC evaluates whether the deployments include two or more
data types or data sources, and/or include high-speed data sources such as click-stream
tracking or monitoring of machine-generated data.
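The evaluation described above can be read as a simple decision procedure. The following is a minimal sketch in Python, not IDC's own tooling: the `Deployment` fields and the functions `matches_a_scenario` and `qualifies_as_big_data` are hypothetical names introduced for illustration, and the thresholds are the ones quoted above. It assumes that the scale-out check and the data-variety check are both required once a scenario matches.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical description of a deployment (field names are illustrative)."""
    collected_tb: float             # data collected (not necessarily stored), in TB
    uses_realtime_streaming: bool   # ultra-high-speed messaging / streaming capture
    annual_growth_rate: float       # 0.60 means 60% growth per year
    scale_out_infrastructure: bool  # deployed on scale-out infrastructure
    data_source_count: int          # distinct data types or data sources
    has_high_speed_sources: bool    # click-stream or machine-generated data


def matches_a_scenario(d: Deployment) -> bool:
    """Step 1: at least one of the three scenarios applies."""
    return (
        d.collected_tb > 100
        or d.uses_realtime_streaming
        or d.annual_growth_rate >= 0.60
    )


def qualifies_as_big_data(d: Deployment) -> bool:
    """Apply the scenario, infrastructure, and data-variety checks in order."""
    if not matches_a_scenario(d):
        return False
    if not d.scale_out_infrastructure:
        return False
    return d.data_source_count >= 2 or d.has_high_speed_sources


# A small but fast-growing deployment on scale-out infrastructure.
example = Deployment(
    collected_tb=5,
    uses_realtime_streaming=False,
    annual_growth_rate=0.75,
    scale_out_infrastructure=True,
    data_source_count=3,
    has_high_speed_sources=False,
)
print(qualifies_as_big_data(example))  # True
```

Note that under this reading a very large data set on a single scale-up server would not count, because the infrastructure check fails.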
Open Source Big Data – Even More Confused?
Collect: Kafka, Flume, Scribe
Process: Azkaban, Oozie, Pig, Hive, Hadoop MapReduce, S4, Storm
Store: Voldemort, Cassandra, HBase
Present: Analytics? Applications? Mobile?
New storage and processing techniques required
Storage techniques: Columnar, Distributed, In-memory, Row
Real-time queries, high value data, targeted data read
Batch queries, flexible data sets, all data read
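To make the contrast between "targeted data read" and "all data read" concrete, here is a small sketch using a toy in-memory table. It is plain Python and does not represent how SAP HANA or any other database lays out data; the table contents and the names `rows`, `columns`, `total_from_rows`, and `total_from_columns` are made up for illustration.

```python
# Toy sales table stored two ways (illustrative data, not from the presentation).
rows = [
    {"order_id": 1, "customer": "A", "region": "EMEA", "amount": 120.0},
    {"order_id": 2, "customer": "B", "region": "APJ", "amount": 80.0},
    {"order_id": 3, "customer": "A", "region": "EMEA", "amount": 50.0},
]

# Columnar layout: one list per attribute instead of one dict per record.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_from_rows(table):
    """Row layout: every record is visited in full to pick out one field."""
    values_touched = 0
    total = 0.0
    for record in table:
        values_touched += len(record)  # the whole record is read
        total += record["amount"]
    return total, values_touched


def total_from_columns(table):
    """Columnar layout: only the 'amount' column is read."""
    amounts = table["amount"]
    return sum(amounts), len(amounts)


print(total_from_rows(rows))        # (250.0, 12)
print(total_from_columns(columns))  # (250.0, 3)
```

Both layouts give the same total, but the columnar version touches 3 values instead of 12. The gap widens as tables gain more columns.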
In-Memory Computing: Rethink
Yesterday
Disk + CPU
Software and data reside on HDD
• I/O constraint
• Support many platforms
• Optimized for none
Today
Memory + Disk + CPU
Memory: Partitioning, Insert Only on Delta, Compression, Row and Column Store, No aggregates
Logging and Backup – Solid State / Flash / HDD
CPU: Multi-Core, Massively Parallel
Single Optimized Platform
64-bit address space supports 2TB RAM
100GB/s throughput
• Leverage latest advances in hardware
• Minimize I/O time
• Optimized for x86 platform
The future of database technology
[Chart: In-memory Computing Adoption and Traditional Database Adoption, plotted over Time]
Cost per Terabyte
1990: Disk $9,000,000; Memory $106,000,000
2012: Disk $60; Memory $4,900
Falling prices move processing from
Disk/SSD to In-Memory
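The size of the price drop is easier to appreciate as a ratio. The short calculation below uses only the cost figures quoted on this slide; the variable names are illustrative.

```python
# Cost per terabyte in US dollars, as quoted on the slide.
cost_per_tb = {
    1990: {"disk": 9_000_000, "memory": 106_000_000},
    2012: {"disk": 60, "memory": 4_900},
}

for medium in ("disk", "memory"):
    drop = cost_per_tb[1990][medium] / cost_per_tb[2012][medium]
    print(f"{medium}: about {drop:,.0f}x cheaper in 2012 than in 1990")

# Memory is still the more expensive medium, but the absolute gap per terabyte
# shrank from roughly $97,000,000 to under $5,000.
gap_1990 = cost_per_tb[1990]["memory"] - cost_per_tb[1990]["disk"]
gap_2012 = cost_per_tb[2012]["memory"] - cost_per_tb[2012]["disk"]
print(f"Memory premium per TB: ${gap_1990:,} in 1990, ${gap_2012:,} in 2012")
```

On these figures it is the absolute cost of a terabyte of memory, rather than its price relative to disk, that has come within reach.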
Main memory reference 100 ns
Compress 1K bytes with Zippy 3,000 ns = 3 µs
Send 2K bytes over 1 Gbps network 20,000 ns = 20 µs
SSD random read 150,000 ns = 150 µs
Read 1 MB sequentially from memory 250,000 ns = 250 µs
Round trip within same datacenter 500,000 ns = 0.5 ms
Read 1 MB sequentially from SSD* 1,000,000 ns = 1 ms
Disk seek 10,000,000 ns = 10 ms
Read 1 MB sequentially from disk 20,000,000 ns = 20 ms
Send packet Canada->Europe->Canada 150,000,000 ns = 150 ms
*Assuming ~1GB/sec SSD
Data by [Jeff Dean](http://research.google.com/people/jeff/)
Originally by [Peter Norvig](http://norvig.com/21-days.html#answers)
The future of database technology
Let's multiply all these durations by a billion:
Hour:
Main memory reference 100 s Brushing your teeth
Compress 1K bytes with Zippy 50 min One episode of a TV show (including ad breaks)
Day:
Send 2K bytes over 1 Gbps network 5.5 hr From lunch to end of work day
Week:
SSD random read 1.7 days A normal weekend
Read 1 MB sequentially from memory 2.9 days A long weekend
Round trip within same datacenter 5.8 days A vacation
Read 1 MB sequentially from SSD 11.6 days A European vacation
Year:
Disk seek 16.5 weeks A semester in university
Read 1 MB sequentially from disk 7.8 months
The above two together 1 year
Decade:
Send packet Canada->Europe->Canada 4.8 years Average time it takes to complete a bachelor's degree
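The scaled figures can be reproduced with a few lines of code. This is a minimal sketch: `humanize` is a hypothetical helper, and its rule for picking a unit (the largest unit that gives a value of at least 1, with a 365.25-day year) is an assumption, so some rows come out in a different unit or with slightly different rounding than the slide.

```python
# Latencies in nanoseconds, taken from the latency table.
LATENCIES_NS = {
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 3_000,
    "Send 2K bytes over 1 Gbps network": 20_000,
    "SSD random read": 150_000,
    "Read 1 MB sequentially from memory": 250_000,
    "Round trip within same datacenter": 500_000,
    "Read 1 MB sequentially from SSD": 1_000_000,
    "Disk seek": 10_000_000,
    "Read 1 MB sequentially from disk": 20_000_000,
    "Send packet Canada->Europe->Canada": 150_000_000,  # 150 ms in nanoseconds
}

# Unit sizes in seconds, largest first.
UNITS = [
    ("years", 365.25 * 86_400),
    ("weeks", 7 * 86_400),
    ("days", 86_400),
    ("hr", 3_600),
    ("min", 60),
    ("s", 1),
]


def humanize(seconds: float) -> str:
    """Express a duration in the largest unit that yields a value of at least 1."""
    for name, size in UNITS:
        if seconds >= size:
            return f"{seconds / size:.1f} {name}"
    return f"{seconds:.1f} s"


for operation, ns in LATENCIES_NS.items():
    # Multiplying by a billion turns each nanosecond into one second,
    # so the nanosecond count can be read directly as seconds.
    print(f"{operation:<38} {humanize(ns)}")
```

The point of the exercise is the spread: the same scaling that turns a memory reference into a couple of minutes turns a disk seek into months.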
The future of database technology
What is SAP HANA?
A flexible, data source agnostic in-memory
analytic appliance to quickly process and
analyze large volumes of transactional data in
real-time
A modern platform that serves as the foundation
to develop a new class of real-time applications
In-Memory Database that runs under SAP
NetWeaver BW for a supercharged data
warehouse
SAP HANA Studio
Real-Time Data Replication
SAP HANA™
SAP Applications
Non-SAP data sources
SAP HANA Database
Calculation Engine
Row & Column In-Memory
SAP BusinessObjects Data Integrator
SAP Information Composer
SAP BusinessObjects BI Solutions
SAP Applications
SAP NetWeaver BW
SAP HANA – Overview
Next generation SAP Real-time Data Platform
SAP Analytics, SAP Business Suite, SAP Big Data Applications, 3rd Party BI Clients, SAP Mobile, Custom Apps
On Premise / Cloud
Open Developer APIs and Protocols
Common Landscape Management
SAP Enterprise Information Management
SAP Sybase Replication Server
SAP Data Services
SAP HANA Platform
SAP MDG and MDM
SAP Real-time Data Platform
SAP Sybase IQ, SAP Sybase ASE, SAP Sybase SQLA, SAP Sybase ESP
Common Modeling
Sybase PowerDesigner
MPP Scale-Out
SAP NW BW
Introducing SAP Big Data Processing Framework
Provide optimized data management across each phase of the information lifecycle
process and deliver real-time, actionable insights
Collect
• Sybase Replication Server for real-time high value data replication
• Sybase ESP for collecting stream data
• SAP BusinessObjects Data Services with Hadoop Connectors for collecting data from disparate sources via batch
Store
• SAP HANA, ASE, or IQ for real-time data store
• Sybase IQ for near-time data store and multimedia data storage
• Hadoop for long-term, extended archive
Process
• SAP HANA and Sybase IQ for real-time high value data processing
• Sybase Event Stream Processor for real-time event data processing
• Sybase IQ for federated query w/ MapReduce API
• Hadoop/MapReduce for batch, explorative data processing
Present
• SAP BusinessObjects BI platform to display federated query results across Hadoop and HANA/IQ to provide deep insights (dashboards, visualization, data exploration, predictive analysis, analytic applications, and embedded BI in business applications)
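The Collect / Store / Process / Present split can be summarized as a lookup from a phase and workload type to the components named on this slide. The sketch below is only an illustration in plain Python: the `FRAMEWORK` mapping restates the slide, while the workload keys and the `components_for` function are hypothetical names and are not part of any SAP product API.

```python
# Phase -> workload type -> component(s), restating the slide.
# The workload keys are shorthand chosen for this example.
FRAMEWORK = {
    "collect": {
        "real-time replication": ["Sybase Replication Server"],
        "stream": ["Sybase ESP"],
        "batch": ["SAP BusinessObjects Data Services with Hadoop Connectors"],
    },
    "store": {
        "real-time": ["SAP HANA", "ASE", "IQ"],
        "near-time": ["Sybase IQ"],
        "archive": ["Hadoop"],
    },
    "process": {
        "real-time": ["SAP HANA", "Sybase IQ"],
        "event": ["Sybase Event Stream Processor"],
        "federated query": ["Sybase IQ"],
        "batch": ["Hadoop/MapReduce"],
    },
    "present": {
        "federated results": ["SAP BusinessObjects BI platform"],
    },
}


def components_for(phase: str, workload: str) -> list[str]:
    """Return the components the slide lists for a phase and workload type."""
    try:
        return FRAMEWORK[phase][workload]
    except KeyError:
        raise ValueError(f"no component listed for {phase!r} / {workload!r}") from None


print(components_for("store", "archive"))  # ['Hadoop']
print(components_for("process", "batch"))  # ['Hadoop/MapReduce']
```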
Fresh Direct
“Our Food is Fresh.
Our Customers Are Spoiled”
Parking Ticket Optimization
“Transforming information into intelligence in real time is a cornerstone for McLaren’s winning formula – and increasingly critical for the future of every company,” Jim Hagemann Snabe, co-CEO, SAP AG
“Using HANA we can hopefully automate decision making. People have always made decisions based on the data, but we want to get to the point where the system can make the decision,” Stuart Birrell, McLaren CIO
McLaren Group Limited Automotive Industry (Formula One) – Predict and Transform the outcome of races
Product: Agile Datamart - POC
Business Challenges
Cut the cost of the expensive data scientists who currently help the team's data analysts measure
and predict the car’s performance
Better anticipate, accelerate and differentiate its business from competitors
Technical Challenges
Turbo-charge both the speed and depth of McLaren’s telemetry technology
Process Big Data and act on it rapidly to create the prescriptive intelligence that helps transform
the outcome of races
Benefits
Real-time analysis of car sensor data – historical data and predictive models
Make immediate, proactive corrections to avoid costly, dangerous incidents and win the race
Provide a technology engine that is integrated and scalable and delivers maximum performance
14,000x faster data analysis – from 5 hours to 1 second
99% predict the outcome of a race
SunGard Leading software and IT services company
“SAP Sybase IQ is simple to manage and operate and it’s enabling us to easily build really big systems in a way that is cost-effective, manageable and sustainable…It doesn’t matter what we throw at it, it seems to take it in stride and give us a great response…We feel like it’s a solution that will carry us forward into uncharted territory. We see no limit to how far we can go with it.”
Product Architect, SunGard
Business Challenges
Enabling the building of newer and larger systems – allowing expansion into new
markets and business areas.
Technical Challenges
Handle very large and continuously growing volumes of data without
performance degradation.
Existing system began to experience performance deterioration that was
unacceptable to end-users
Benefit
Slashes query response time regardless of data volumes
Enables analytics and reporting against virtually unlimited data
1 trillion rows of data stored
80 TB of compressed data
SAP HANA + Hadoop + R
Benefits
Reduces time to detect variant DNA
In-memory accelerates predictive & correlation
analysis
Optimized treatment plans based on DNA mutations
Long-term study of DNA-based cancer treatment
“Genomic DNA analysis in real-time will transform how we enable comprehensive patient care to fight against cancer. SAP HANA will be the mission critical and reliable data platform to make real-time cancer analytics into a reality. Separately, our internal technical comparison demonstrated that SAP HANA outperforms a traditional disk-based system by a factor of 408,000 when performing other types of data analysis.”
Yukihisa Kato, Director & Executive Officer, CTO, Research and Development Center, MITSUI KNOWLEDGE INDUSTRY CO., LTD.
408,000x faster than traditional disk-based systems in PoC
216x faster DNA analysis results - from 2-3 days to 20 minutes
SOCIAL ANALYTICS MOBILE BIG DATA CLOUD
HANA REAL-TIME PLATFORM
SAP BusinessObjects BI 4.0
and SAP HANA
SAP HANA A platform for a new class of real-time analytics and applications
Real-time analytics
SAP Business Suite
Third-party systems
SAP HANA
Microsoft Excel
SAP BusinessObjects solutions
Others… (Open)
Real-time replication services
Data services
Real-time apps
In-memory database
Planning and Calculation Engine
R & Hadoop integration
Predictive Analysis & Business Function Libraries
SAP NetWeaver Business Client
Information Composer & Modeling Studio
Text Search
Application Services (e.g. HTML 5 Server)
Today's World
Transactional System (OLTP)
Real-time posting into Transactional System
Aggregation
Batch transfer to Data Warehouse
Data Warehouse / Marts (OLAP)
Reporting Challenges
• Limited flexibility due to pre-defined data structures
• Long query run-times
• Loss of detail
• Long wait times for reports
Large Volumes
High Impact
'Real Life' Business Transaction
Analysis and Insight
Action
What if this all happened in real time?
No Aggregation / No Data Staging / No Data Marts
Real-time loading into SAP HANA
SAP HANA IN-MEMORY
High Performance Large Volume Data Processing
Fast, flexible and detailed analytics over large volumes
'Real Life' Business Transaction
Analysis and Insight
Action
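The "loss of detail" caused by pre-aggregation, and what querying the detailed records directly avoids, can be shown with a toy example. This is a sketch in plain Python with made-up data; `transactions`, `nightly_aggregate`, and `revenue_by` are hypothetical names, and the code does not represent how SAP HANA is implemented.

```python
from collections import defaultdict

# Detailed transactions as they are posted (illustrative data).
transactions = [
    {"region": "EMEA", "product": "bike", "amount": 300.0},
    {"region": "EMEA", "product": "helmet", "amount": 40.0},
    {"region": "APJ", "product": "bike", "amount": 280.0},
]


def revenue_by(records, key):
    """Aggregate on the fly over whatever dimension the question needs."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["amount"]
    return dict(totals)


# Batch world: only a pre-defined aggregate reaches the warehouse.
nightly_aggregate = revenue_by(transactions, "region")
print(nightly_aggregate)  # {'EMEA': 340.0, 'APJ': 280.0}

# A new question (revenue per product) cannot be answered from the aggregate,
# because the product dimension was dropped. With the detail still available,
# it is just another aggregation.
print(revenue_by(transactions, "product"))  # {'bike': 580.0, 'helmet': 40.0}
```

The trade-off is that every question now scans the detailed records, which is why the approach depends on the fast in-memory processing described earlier.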
Accelerated BI with SAP BusinessObjects and SAP HANA
One Unified and Complete BI Suite Addressing the Full Spectrum of BI on SAP HANA
Discovery and Analysis
Discover areas to optimize your business
Adapt data to business needs
Tell your story with beautiful visualizations
Discover. Predict. Create.
Dashboards and Apps
Deliver engaging information to users where they
need it
Track key performance indicators and summary
data
Build custom experiences so users get what they
need quickly
Build Engaging Experiences
Reporting
Securely distribute information across your
organization
Give users the ability to ask and answer their
own questions
Build printable reports for operational efficiency
Share Information
Agility for business analysts and business users
Discover trends, outliers and areas of interest in your business
Adapt to business scenarios by combining, manipulating, and enriching data
Tell your story with self-service visualizations and analytics
Forecast and predict future outcomes
Discovery and Analysis Discover. Predict. Create.
Portfolio
Visual Intelligence
Explorer
Analysis
Predictive Analysis
Build engaging, visual dashboards
Powerful environment to build interactive and visually appealing analytics
Rich set of controls: buttons, list boxes, drop-down, crosstabs, charts…
Use custom code to extend and build workflows
Dashboards and Apps Build Engaging Experiences
Portfolio
Design Studio
Dashboards (aka Xcelsius®)
High productivity design for report designers
Quickly build formatted reports on any data source
Securely distribute reports both internally and externally
Minimize IT support costs by empowering end users to easily create and modify their
own reports
Enhance custom applications with embedded reports
Reporting Share Information
Portfolio
Web Intelligence
Crystal Reports
BI 4 Platform: Open, Agnostic, and Unified
Access any data, consume information anywhere
Enterprise Portals
MS Office On Demand Services
Browsers Mobile Devices
ERP
Embedded Content
Personal
Universe Semantic Layer
Business Intelligence Platform
EDW
Discovery and Analysis Dashboards and Apps Reporting
Unstructured
No part of this publication may be reproduced or transmitted in any form or for any purpose without
the express permission of SAP AG. The information contained herein may be changed without prior
notice.
Some software products marketed by SAP AG and its distributors contain proprietary software
components of other software vendors.
Microsoft, Windows, Excel, Outlook, PowerPoint, Silverlight, and Visual Studio are registered
trademarks of Microsoft Corporation.
IBM, DB2, DB2 Universal Database, System i, System i5, System p, System p5, System x, System z,
System z10, z10, z/VM, z/OS, OS/390, zEnterprise, PowerVM, Power Architecture, Power Systems,
POWER7, POWER6+, POWER6, POWER, PowerHA, pureScale, PowerPC, BladeCenter, System
Storage, Storwize,
XIV, GPFS, HACMP, RETAIN, DB2 Connect, RACF, Redbooks, OS/2, AIX, Intelligent Miner,
WebSphere, Tivoli, Informix, and Smarter Planet are trademarks or registered trademarks of IBM
Corporation.
Linux is the registered trademark of Linus Torvalds in the United States and other countries.
Adobe, the Adobe logo, Acrobat, PostScript, and Reader are trademarks or registered trademarks of
Adobe Systems Incorporated in the United States and other countries.
Oracle and Java are registered trademarks of Oracle and its affiliates.
UNIX, X/Open, OSF/1, and Motif are registered trademarks of the Open Group.
Citrix, ICA, Program Neighborhood, MetaFrame, WinFrame, VideoFrame, and MultiWin are
trademarks or registered trademarks of Citrix Systems Inc.
HTML, XML, XHTML, and W3C are trademarks or registered trademarks of W3C®, World Wide Web
Consortium, Massachusetts Institute of Technology.
Apple, App Store, iBooks, iPad, iPhone, iPhoto, iPod, iTunes, Multi-Touch, Objective-C, Retina,
Safari, Siri,
and Xcode are trademarks or registered trademarks of Apple Inc.
IOS is a registered trademark of Cisco Systems Inc.
RIM, BlackBerry, BBM, BlackBerry Curve, BlackBerry Bold, BlackBerry Pearl, BlackBerry Torch,
BlackBerry Storm, BlackBerry Storm2, BlackBerry PlayBook, and BlackBerry App World are
trademarks or registered trademarks of Research in Motion Limited.
Google App Engine, Google Apps, Google Checkout, Google Data API, Google Maps, Google Mobile
Ads, Google Mobile Updater, Google Mobile, Google Store, Google Sync, Google Updater, Google
Voice,
Google Mail, Gmail, YouTube, Dalvik and Android are trademarks or registered trademarks of
Google Inc.
INTERMEC is a registered trademark of Intermec Technologies Corporation.
Wi-Fi is a registered trademark of Wi-Fi Alliance.
Bluetooth is a registered trademark of Bluetooth SIG Inc.
Motorola is a registered trademark of Motorola Trademark Holdings LLC.
Computop is a registered trademark of Computop Wirtschaftsinformatik GmbH.
SAP, R/3, SAP NetWeaver, Duet, PartnerEdge, ByDesign, SAP BusinessObjects Explorer,
StreamWork,
SAP HANA, and other SAP products and services mentioned herein as well as their respective logos
are trademarks or registered trademarks of SAP AG in Germany and other countries.
Business Objects and the Business Objects logo, BusinessObjects, Crystal Reports, Crystal
Decisions, Web Intelligence, Xcelsius, and other Business Objects products and services mentioned
herein as well as their respective logos are trademarks or registered trademarks of Business Objects
Software Ltd. Business Objects is an SAP company.
Sybase and Adaptive Server, iAnywhere, Sybase 365, SQL Anywhere, and other Sybase products
and services mentioned herein as well as their respective logos are trademarks or registered
trademarks of Sybase Inc. Sybase is an SAP company.
Crossgate, m@gic EDDY, B2B 360°, and B2B 360° Services are registered trademarks of
Crossgate AG
in Germany and other countries. Crossgate is an SAP company.
All other product and service names mentioned are the trademarks of their respective companies.
Data contained in this document serves informational purposes only. National product specifications
may vary.
The information in this document is proprietary to SAP. No part of this document may be reproduced,
copied,
or transmitted in any form or for any purpose without the express prior written permission of SAP AG.