© 2014 IBM Corporation
Open '14
Analyzing Big Data
Jeff Scheel, Chief Engineer, Linux on Power
June 2, 2014
Agenda
1. Getting started with Big Data
2. OpenPOWER Foundation
3. The future of Analytics
Getting started with Big Data
Big Data is growing and moving fast from a variety of sources. Are you keeping up?
• 1 trillion connected devices generate 2.5 quintillion bytes of data per day
• 80% of the world’s data today is unstructured
• 1 in 2 business leaders don’t have access to the data they need
“Data is the new oil.” (Clive Humby) In its raw form, oil has little value. Once processed and refined, it helps power the world.
“Big Data has arrived at Seton Health Care Family, fortunately accompanied by an analytics tool that will help deal with the complexity of more than two million patient contacts a year…”
“At the World Economic Forum last month in Davos, Switzerland, Big Data was a marquee topic. A report by the forum, ‘Big Data, Big Impact,’ declared data a new class of economic asset, like currency or gold.”
“Increasingly, businesses are applying analytics to social media such as Facebook and Twitter, as well as to product review websites, to try to ‘understand where customers are, what makes them tick and what they want’, says Deepak Advani, who heads IBM’s predictive analytics group.”
“Companies are being inundated with data—from information on customer-buying habits to supply-chain efficiency. But many managers struggle to make sense of the numbers.”
The challenge: handling the large Volume, Variety, Velocity, and Veracity of data to find new insights and improve business outcomes
[Diagram: IBM Big Data Platform]
• Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
• IBM Big Data Platform: Visualization & Discovery, Application Development, Systems Management, Accelerators, Hadoop System, Stream Computing, Data Warehouse, Information Integration & Governance
MFG - Analyze & correlate log records to improve service and predict failures
Telco - Address customer satisfaction, predict churn, and match promotions in real time
Healthcare - Detect life-threatening conditions at hospitals in time to intervene
Retail - Multi-channel customer sentiment and experience analysis
Financial Services - Make risk decisions based on real-time transactional data
Law Enforcement - Identify criminals and threats from video and audio feeds
Customers are deploying new infrastructure to leverage all data types
[Diagram: reference architecture for all data types]
• Data in Motion, Data at Rest, Data in Many Forms
• Information Ingestion and Operational Information
• Real-time Analytics (Streams): Video/Audio, Network/Sensor, Entity Analytics, Predictive, Stream Processing
• Landing Area, Analytics Zone and Archive: Raw Data, Structured Data, Text Analytics, Data Mining, Entity Analytics, Machine Learning
• Exploration, Integrated Warehouse, and Mart Zones: Discovery, Deep Reflection, Operational, Predictive
• Data Integration, Master Data
• Decision Management; BI and Predictive Analytics; Navigation and Discovery; Intelligence Analysis
• Information Governance, Security and Business Continuity
Hadoop Infrastructure – currently being deployed on commodity hardware
WATSON
Two new Watson-based products:
• Interactive Care Insights for Oncology
• The WellPoint Interactive Care Guide and Interactive Care Reviewer
IBM and Red Hat innovating in Healthcare with Watson
• Watson's oncology education:
– 600,000 pieces of medical evidence
– 2 million pages of text
– 25,000 training cases
• Watson can review 1.5 million patient records in less time than it takes most office computers to boot up
Big Data implementation patterns
[Diagram: four patterns combining Hadoop and a Warehouse, each feeding App / BI and Visualization / Exploration tools]
• Common analysis of structured & unstructured data
• Warehouse and BigInsights partitioning
• Warehouse batch offload
• Separate unstructured & structured analysis
What the experts say
1. Seek project input from Sales, Marketing, and Operations teams
2. Select projects that are well-defined and have a quick ROI (less than a year)
3. Leverage your experiences from data warehouse and business intelligence projects
4. Avoid starting with a “Big Bang”
Source: http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=SA&subtype=WH&htmlfid=POL03133USEN
More ideas for starting
[Diagram: Separate unstructured & structured analysis. The existing BI stack (Warehouse, App / BI, Visualization / Exploration) runs alongside a new Hadoop stack with its own App / BI and Visualization / Exploration.]
Find a small problem to solve, e.g., an internal phone directory, and start “on the side”.
Locate relevant data and identify which pieces are “in motion” or “at rest”.
For data at rest, build open source Hadoop on your PowerLinux system or try the InfoSphere BigInsights Basic Edition (no charge).
For data in motion, use the InfoSphere Streams trial download.
Reference the IBM Information Center for details on how to import data into Hadoop and how to write applications using Streams Studio.
Explore Datameer to visualize your Hadoop-based Big Data
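A common first Hadoop application is a word count. The Python sketch below is a hypothetical local simulation of the MapReduce model: the mapper and reducer follow the Hadoop Streaming convention of tab-separated key/value lines, and a plain sort stands in for Hadoop's shuffle-and-sort step. It does not call Hadoop or BigInsights; the function names are illustrative only.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'word<TAB>1' for every word in the input lines."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(sorted_pairs):
    """Sum the counts per word; input must be sorted by key, as Hadoop does."""
    parsed = (pair.split("\t", 1) for pair in sorted_pairs)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Run map, a local stand-in for shuffle-and-sort, then reduce."""
    return list(reducer(sorted(mapper(lines))))
```

For example, `run_local(["the cat", "The dog"])` yields one tab-separated count per distinct word. On a cluster the same mapper and reducer logic would run as separate processes reading stdin and writing stdout.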
PowerLinux jump start services facilitate starting with Big Data Analytics
5 Day IBM Power Analytics Services Jump Start
Includes:
• 5-day, on-site service offering
• Quick Analytics Assessment Workshop
• Software installation
• Hands-on education in getting started
• Evaluating the analytical approach for your business that will make the biggest impact
• Quick sample application to consume customer data
• Reference Architecture Workshop
Why Jump Start Services for your IBM Power Analytics solution?
• Learn how to optimally leverage IBM Power Systems for analytics
• Learn the benefits of, and reasoning behind, Big Data
• Learn how to gain business value from the data you have
2 Day IBM Power Analytics Services Jump Start
Includes:
• 2-day, on-site Big Data Analytics service offering
• Software installation
• Hands-on education in getting started
• Evaluating the analytical approach for your business that will make the biggest impact
IBM Systems Lab Services & Training - Power Systems
Services for PowerLinux, AIX, and OS
Contact: Linda Hoben, Opportunity Manager, [email protected]
IBM Power servers are an ideal platform for streaming data and performing analytic computations for a multitude of applications.
Let us help make you successful!
IBM POWER has a strong history in transactional processing workloads
[Chart: TPC-C results by system; legend: tpcC and $/tpcC]
• S70: tpcC 1,556; $/tpcC $109.00
• S7A: tpcC 2,845; $/tpcC $89.00
• S80: tpcC 5,669; $/tpcC $52.70
• S85: tpcC 9,200; $/tpcC $43.00
• p690: tpcC 12,602; $/tpcC $17.80
• p690+: tpcC 23,871; $/tpcC $8.31
• p690++: tpcC 32,046; $/tpcC $5.42
• p5-595: tpcC 50,164; $/tpcC $5.19
• p5-595+: tpcC 63,021; $/tpcC $2.97
• P6 595: tpcC 95,081; $/tpcC $2.81
• P7 780: tpcC 150,000; $/tpcC $0.69
POWER8 Processor
• Caches: 512 KB SRAM L2 / core; 96 MB eDRAM shared L3; up to 128 MB eDRAM L4 (off-chip)
• Cores: 12 cores (SMT8); 8 dispatch, 10 issue, 16 exec pipe; 2X internal data flows/queues; enhanced prefetching; 64K data cache, 32K instruction cache
• Accelerators: Crypto & memory expansion; Transactional Memory; VMM assist; Data Move / VM Mobility
• Energy Management: On-chip Power Management Micro-controller; Integrated Per-core VRM; Critical Path Monitors
• Technology: 22nm SOI, eDRAM, 15 ML, 650 mm²
• Memory: Up to 230 GB/s sustained bandwidth
• Bus Interfaces: Durable open memory attach interface; Integrated PCIe Gen3; SMP Interconnect; CAPI (Coherent Accelerator Processor Interface)
ComputerWorld: To make the chip faster, IBM has turned to a more advanced manufacturing process, increased the clock speed and added more cache memory, but perhaps the biggest change heralded by the Power8 cannot be found in the specifications. After years of restricting Power processors to its servers, IBM is throwing open the gates and will be licensing Power8 to third-party chip and component makers.
The Register: the Power8 is so clearly engineered for midrange and enterprise systems for running applications on a giant shared memory space, backed by lots of cores and threads. Power8 does not belong in a smartphone unless you want one the size of a shoebox that weighs 20 pounds. But it most certainly does belong in a badass server, and Power8 is by far one of the most elegant chips that Big Blue has ever created, based on the initial specs.
PCWorld: With Power8, IBM has more than doubled the sustained memory bandwidth from the Power7 and Power7+, to 230 GB/s, as well as I/O speed, to 48 GB/s. Put another way, Watson’s ability to look up and respond to information has more than doubled as well.
Microprocessor Report: Called Power8, the new chip delivers impressive numbers, doubling the performance of its already powerful predecessor, Power7+. Oracle currently leads in server-processor performance, but IBM’s new chip will crush those records. The Power8 specs are mind boggling.
Source: Hot Chips presentation
POWER8 delivers 2.5x performance on Big Data / Hadoop
POWER8 reduces the number of servers by 60% based on the best published x86 Terasort result
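The 60% figure follows from the 2.5x ratio if one assumes (for this sketch only) that cluster throughput scales linearly with server count, so the same Terasort workload needs proportionally fewer POWER8 servers than x86 servers ($N$ denotes the number of servers):

```latex
N_{\text{POWER8}} = \frac{N_{\text{x86}}}{2.5} = 0.4\,N_{\text{x86}},
\qquad
1 - \frac{N_{\text{POWER8}}}{N_{\text{x86}}} = 1 - 0.4 = 0.6 = 60\%
```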
POWER8 S822L will deliver over 2x the performance of the best published x86 system
… and continues to offer far superior RAS
POWER8 delivers 1.7x the performance of HP on a per-core normalized benchmark.
POWER8 exploits additional cores, more threads, larger caches, and greater memory bandwidth
Terasort is a popular benchmark to measure the performance of a Hadoop solution
Sorts a large dataset (10 TB) in parallel and exercises the MapReduce framework and the Hadoop Distributed File System (HDFS)
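The core idea behind TeraSort can be sketched in a single process: sample the input keys, choose range split points, route each record to the partition that covers its key, sort each partition independently, and concatenate the results. The Python below is an illustrative sketch assuming records are plain strings and partitions are in-memory lists; it is not the Hadoop TeraSort code, and the function names are hypothetical.

```python
import bisect
import random


def choose_split_points(keys, num_partitions, sample_size=1000, seed=0):
    """Sample the keys and return num_partitions - 1 sorted split points."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_partitions
    return [sample[int(step * i)] for i in range(1, num_partitions)]


def terasort_sketch(records, num_partitions=4):
    """Sort records with sampled range partitioning (TeraSort-style)."""
    if not records:
        return []
    splits = choose_split_points(records, num_partitions)
    # "Map" side: route each record to the partition whose key range covers it.
    partitions = [[] for _ in range(num_partitions)]
    for rec in records:
        partitions[bisect.bisect_right(splits, rec)].append(rec)
    # "Reduce" side: sort each partition on its own (one reducer each on a
    # real cluster). Ranges do not overlap, so concatenation is fully sorted.
    return [rec for part in partitions for rec in sorted(part)]
```

Sampling is what keeps the partitions roughly equal in size, so no single reducer becomes the bottleneck on a large input.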
[Chart: Relative System Performance, POWER8 vs. Cisco (2.5x)]
IBM Analytics Stack: IBM Power System S822L; 24 cores / 192 threads, POWER8; 3.0GHz, 512 GB memory, RHEL 6.5, InfoSphere BigInsights 3.0
Compared to a 16-core HP system
http://www.cisco.com/en/US/solutions/collateral/ns340/ns517/ns224/ns944/le_tera.pdf
New IBM Power Systems based on POWER8 (1 & 2 sockets)
• Power Systems S812L: 1-socket, 2U, Linux only
• Power Systems S822L: 2-socket, 2U, Linux only
• Power Systems S822: 2-socket, 2U, all operating systems
• Power Systems S814: 1-socket, 4U, all operating systems
• Power Systems S824: 2-socket, 4U, all operating systems
• Power Systems S824L: 2-socket, 4U, Linux only (SOD)
OpenPOWER Foundation – The emerging ecosystem
18 © OpenPOWER Foundation 2014
Industry trends
• The number of companies designing & building servers is increasing
– Traditionally there have been few companies designing systems: HP, IBM, Sun, Dell, etc.
– Today there are many more: Google, Microsoft, Facebook, Rackspace, Huawei, Sugon, Inspur, etc.
– A fairly mature ecosystem including the Taiwanese ODMs is a key enabler of this trend
• Numerous disruptive forces are impacting these custom system designs and driving designers to consider new ways of innovating
– Ability to handle rapid growth in Big Data & Analytics based solutions
– Choice and innovation
– CPU SoC integration drives the need for chip development
• These trends create a need for a server-targeted “chip-system-software” ecosystem
– IBM has technology and a software stack ready to meet these needs
– IBM recognizes the need to work with partners to create this ecosystem
– IBM recognizes the need for choice and options in processor sourcing
OpenPOWER Foundation Structure
OpenPOWER is an industry foundation based on the POWER architecture, enabling an open community for development and opportunity for member differentiation and growth
Building collaboration and innovation at all levels
Welcoming new members in all areas of the ecosystem
100+ inquiries and numerous active dialogues underway
Boards/Systems
I/O, Storage, Acceleration
Chip/SOC
System/Software/Services
OpenPOWER Proposed Ecosystem Enablement
[Diagram: OpenPOWER proposed ecosystem enablement]
Software: System Operating Environment Software Stack. A modern development environment is emerging based on tools and services.
• Cloud Software
• Operating System / KVM
• Standard Operating Environment (System Mgmt)
• xCAT
• Power Open Source Software Stack Components, drawing on existing open source software communities and a new OSS community
Firmware: OpenPOWER Firmware
Hardware: Multiple Options to Design with POWER Technology Within OpenPOWER
• “Standard POWER Products” – 2014: POWER8 with CAPP and PCIe, CAPI over PCIe, OpenPOWER Technology
• “Custom POWER SoC” – Future: Customizable Framework to Integrate System IP on Chip, Industry IP License Model
Non-IBM POWER8 products
http://www.enterprisetech.com/2014/04/28/inside-google-tyan-power8-server-boards/
The Tyan reference (ATX) board, SP010, measures 12” by 9.6”:
➢ one single-chip module (SCM)
➢ four DDR3 memory slots
➢ four 6 Gb/sec SATA peripheral connectors
➢ two USB 3.0 ports
➢ two Gigabit Ethernet network interfaces
➢ keyboard and video
➢ intended for developers
The Google reference board:
➢ two single-chip modules (SCMs)
➢ four modified SATA ports
➢ Google use only
The future of Analytics
The future of Analytics: An open approach
Open Platform for Choice
POWER8 CAPI: Coherent Accelerator Processor Interface (CAPI)
[Diagram: a custom hardware application on an FPGA or ASIC attaches through the PSL, over PCIe Gen 3, to the CAPP on the POWER8 coherence bus]
• Customizable Hardware Application Accelerator
– Specific system SW, middleware, or user application
– Written to the durable interface provided by the PSL
• PCIe Gen 3: transport for encapsulated messages
• POWER Service Layer (PSL)
– Presents robust, durable interfaces to applications
– Offloads complexity / content from the CAPP
• Virtual Addressing
– The accelerator can work with the same memory addresses that the processors use
– Pointers are de-referenced the same way as in the host application
– Removes OS & device driver overhead
• Hardware Managed Cache Coherence
– Enables the accelerator to participate in “locks” as a normal thread
• Lowers latency compared with the I/O communication model
Coherent Accelerator Processor Interface (CAPI) Overview
[Diagram: the POWER8 processor (CAPP, PCIe) connects through CAPI to an FPGA containing the IBM Supplied POWER Service Layer and accelerator functions 0 through n]
Typical I/O Model Flow:
Device driver (DD) call → Copy or pin source data → MMIO notify accelerator → Acceleration → Poll / interrupt completion → Copy or unpin result data → Return from DD completion
Flow with a Coherent Model:
Shared memory notify accelerator → Acceleration → Shared memory completion
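The difference between the two flows can be shown with a toy Python model. It is a hypothetical illustration only: it does not use CAPI, libcxl, or any device driver; a Python function stands in for the accelerator, and a dict stands in for shared memory. The single point it makes is the one in the flow comparison: the I/O model stages data through a separate device buffer (copy in, copy out), while the coherent model lets the accelerator dereference the host's own data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def accelerate(data: list[int]) -> list[int]:
    """Hypothetical accelerator function (squares each element)."""
    return [x * x for x in data]


@dataclass
class IoModelDevice:
    """I/O model: the device works on its own buffer, so data is staged."""
    copies: int = 0  # number of host <-> device staging copies performed
    device_buffer: list[int] = field(default_factory=list)

    def run(self, host_data: list[int]) -> list[int]:
        # DD call: copy (or pin) the source data into the device buffer.
        self.device_buffer = list(host_data)
        self.copies += 1
        # MMIO notify; the accelerator processes its private copy.
        device_result = accelerate(self.device_buffer)
        # Poll / interrupt completion: copy (or unpin) the result data back.
        result = list(device_result)
        self.copies += 1
        return result


@dataclass
class CoherentDevice:
    """Coherent model: the accelerator shares the host's address space."""
    copies: int = 0  # stays 0: no staging copies in this model

    def run(self, shared: dict) -> None:
        # Notify via shared memory: the accelerator dereferences the same
        # source object the host application uses.
        shared["result"] = accelerate(shared["source"])
        # Completion is signalled by a flag written to shared memory.
        shared["done"] = True
```

In real hardware the saving is not only the copies but also the device driver call and MMIO round trips that the coherent flow skips; the model only counts the copies.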
Example: Innovative “In-Memory” NoSQL/KVS Integrated Solution via POWER8 CAPI-attached Flash
Today’s NoSQL in memory (x86)
[Diagram: WWW → 10Gb uplink → load balancer → six 500GB cache nodes plus a backup node]
Infrastructure Requirements
• Large distributed (scale out)
• Large memory per node
• Networking bandwidth needs
• Load balancing
Differentiated NoSQL (POWER8 + CAPI Flash)
[Diagram: WWW → 10Gb uplink → POWER8 server → flash array with up to 40TB]
Infrastructure Attributes
• 192 threads in 4U server drawer
• 40 TB of memory-based flash per 4U drawer
• Shared memory & cache for dynamic tuning
• Elimination of I/O and network overhead
• Cluster solution in a box
5X cost reduction with equivalent performance
Power CAPI-attached Flash model for NoSQL offers dramatic (24:1) density advantage
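The x86 design spreads data across cache nodes behind a load balancer. The slide does not say how keys are assigned to nodes, so the Python sketch below assumes simple hash-based partitioning over six hypothetical node names. It shows why each node holds only a shard and why every lookup must first be routed; it is not the configuration of any particular NoSQL product.

```python
from __future__ import annotations

import hashlib
from typing import Optional

# Hypothetical names for the six 500GB cache nodes in the x86 design.
CACHE_NODES = [f"cache-node-{i}" for i in range(6)]


def node_for_key(key: str, nodes: list[str] = CACHE_NODES) -> str:
    """Pick a cache node by hashing the key (stable across processes)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


class ScaleOutCache:
    """Each node holds only its shard; in a real deployment every lookup
    would be routed over the network to the owning node."""

    def __init__(self, nodes: list[str] = CACHE_NODES) -> None:
        self.nodes = nodes
        self.shards: dict[str, dict[str, str]] = {n: {} for n in nodes}

    def put(self, key: str, value: str) -> None:
        self.shards[node_for_key(key, self.nodes)][key] = value

    def get(self, key: str) -> Optional[str]:
        return self.shards[node_for_key(key, self.nodes)].get(key)
```

With modulo hashing, adding or removing a node remaps most keys, which is one reason scale-out caches often use consistent hashing instead.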
Wrap-up
For more information on Big Data / Analytics
● Sales kits
– PartnerWorld
– IBM internal
● Worldwide contacts
– Renato Loffreda-Mancinelli, World Wide Business Analytics and Big Data Solutions on Power - Business Dev. Leader ([email protected])
– Michael Tabron, Solution Offering Manager, Power Analytics ([email protected])
– Gina King, Solution Offering Manager, Big Data Analytics ([email protected])
– Bob Friske, Marketing Manager ([email protected])
Q & A
Summary:
1. Getting started with Big Data is the toughest part. Start simple, small, and on the side.
2. The OpenPOWER Foundation enables new systems and helps support the emerging analytic solutions around NoSQL databases.
3. POWER8 technology like CAPI will enable new solutions from IBM and the OpenPOWER Foundation.
Special notices
This document was developed for IBM offerings in the United States as of the date of publication. IBM may not make these offerings available in other countries, and the information is subject to change without notice. Consult your local IBM business contact for information on the IBM offerings available in your area.
Information in this document concerning non-IBM products was obtained from the suppliers of these products or other public sources. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to these patents. Send license inquiries, in writing, to IBM Director of Licensing, IBM Corporation, New Castle Drive, Armonk, NY 10504-1785 USA.
All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
The information contained in this document has not been submitted to any formal IBM test and is provided "AS IS" with no warranties or guarantees either expressed or implied.
All examples cited or described in this document are presented as illustrations of the manner in which some IBM products can be used and the results that may be achieved. Actual environmental costs and performance characteristics will vary depending on individual client configurations and conditions.
IBM Global Financing offerings are provided through IBM Credit Corporation in the United States and other IBM subsidiaries and divisions worldwide to qualified commercial and government clients. Rates are based on a client's credit rating, financing terms, offering type, equipment type and options, and may vary by country. Other restrictions may apply. Rates and offerings are subject to change, extension or withdrawal without notice.
IBM is not responsible for printing errors in this document that result in pricing or information inaccuracies.
All prices shown are IBM's United States suggested list prices and are subject to change without notice; reseller prices may vary.
IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
Any performance data contained in this document was determined in a controlled environment. Actual results may vary significantly and are dependent on many factors including system hardware configuration and software design and configuration. Some measurements quoted in this document may have been made on development-level systems. There is no guarantee these measurements will be the same on generally-available systems. Some measurements quoted in this document may have been estimated through extrapolation. Users of this document should verify the applicable data for their specific environment.
Revised September 26, 2006
Backup
Where to find more information? http://openpowerfoundation.org/