Thinking Outside the Cube: How In-Memory Bolsters Analytics
Posted on 11-May-2015
The Briefing Room
Twitter Tag: #briefr
Welcome
Host: Eric Kavanagh
eric.kavanagh@bloorgroup.com
Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
Topics
This Month: ANALYTIC PLATFORMS
September: ANALYTICS
October: DATA PROCESSING
Analytic Platforms
“If you always do what you always did, you will always get what you always got.”
~Albert Einstein
Analyst: Mark Madsen
Mark Madsen is president of Third Nature, Inc.
IBM
• IBM Cognos Business Intelligence is an enterprise BI platform with an open data access strategy
• The platform includes IBM Cognos Dynamic Cubes, an in-memory relational OLAP component that complements the existing query engine
• Dynamic Cubes can enable users to perform interactive analysis and reporting over terabytes of data
Guest: Chris McPherson
Chris McPherson is a Senior Product Manager on the IBM Business Analytics Platform team in the IBM Canada Ottawa Lab. His current area of responsibility is IBM Cognos Dynamic Cubes; before that, he was product owner for Modelling, Metadata and EII for the Cognos suite of tools. He has more than nine years of experience within the IBM Business Analytics organization.
© 2012 IBM Corporation
IBM Cognos Dynamic Cubes
Chris McPherson, Senior Product Manager, IBM Business Analytics
Dynamic Cubes Feature Mission
• High performance analytics over growing data volumes
• Aggregate awareness, aggregate optimization
• In-memory caching of members, data, expressions, results, and aggregates
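Aggregate awareness means a query can be answered from a pre-summarized aggregate instead of the detail fact data. A minimal Python sketch of that routing decision follows, assuming additive measures and a hypothetical `Aggregate` record with an illustrative `row_count`. It is not the Cognos implementation, and it ignores hierarchy roll-ups (for example Month to Year), matching only aggregates grouped by every level the query asks for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aggregate:
    """Hypothetical description of a pre-computed aggregate."""
    name: str
    grain: frozenset  # dimension levels the aggregate is grouped by
    row_count: int    # illustrative size, used to prefer the cheapest match


def route(query_levels, aggregates, fact_table="FACT"):
    """Pick the smallest aggregate grouped by every level the query needs.

    With additive measures, extra levels in the aggregate can be summed away,
    so any aggregate whose grain is a superset of the query's levels can answer it.
    Falls back to the detail fact table when no aggregate qualifies.
    """
    needed = frozenset(query_levels)
    candidates = [a for a in aggregates if needed <= a.grain]
    if not candidates:
        return fact_table
    return min(candidates, key=lambda a: a.row_count).name


aggregates = [
    Aggregate("agg_year_region", frozenset({"Year", "Region"}), 200),
    Aggregate("agg_month_region_product", frozenset({"Month", "Region", "Product"}), 50_000),
]

print(route({"Year"}, aggregates))             # agg_year_region
print(route({"Month", "Region"}, aggregates))  # agg_month_region_product
print(route({"Day"}, aggregates))              # FACT
```

Preferring the smallest qualifying aggregate is one simple cost rule; a real engine would weigh more than row counts.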
Extensive caching
– Shared caches for maximum reuse
– All caches are security aware
[Figure: the MDX Engine with its Member Cache, Expression Cache, Result Set Cache, Data Cache, and In-Memory Aggregate Cache, with security applied across the cache layers, above the Data Warehouse]
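The two cache properties above, shared for reuse and security aware, pull in opposite directions: sharing wants one copy per query, security wants each user to see only permitted data. One way to reconcile them, sketched in Python under the assumption that each user maps to a named security view (the user and view names are hypothetical), is to key cache entries by security view plus query rather than by user.

```python
class SecurityAwareCache:
    """Cache shared across users, partitioned by security view (illustrative model)."""

    def __init__(self):
        self._entries = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, security_view, query, compute):
        # Users who share a security view share entries; different views never do.
        key = (security_view, query)
        if key in self._entries:
            self.hits += 1
        else:
            self.misses += 1
            self._entries[key] = compute()
        return self._entries[key]


# Hypothetical user-to-view mapping.
view_of = {"alice": "emea_view", "bob": "emea_view", "carol": "apac_view"}

cache = SecurityAwareCache()
for user in ["alice", "bob", "carol"]:
    cache.get_or_compute(view_of[user], "sales by region", lambda: f"rows for {view_of[user]}")

print(cache.hits, cache.misses)  # 1 2
```

Bob reuses Alice's entry because they share a view; Carol gets her own entry, so nothing leaks across views.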
[Figure: Dynamic Cube query processing, built up over several slides. Components: Query Processor, DQM, and the Dynamic Cube (MDX Engine, Result Set Cache, Expression Cache, Member Cache, Data Cache, In-memory Aggregates), with security applied to each cache. Steps added across the build: SQL queries to obtain member information; SQL queries to obtain aggregate data; Initial Query; search aggregate cache for data; SQL queries to obtain fact and summary data.]
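The diagram labels suggest a tiered lookup in which in-memory structures are consulted before the warehouse is queried. A minimal Python sketch follows, assuming the order in-memory aggregates, then the data cache, then a SQL round trip (the diagram does not spell the order out, and the result set, expression, and member caches are left out). `fake_sql` is a hypothetical stand-in for the warehouse.

```python
class LayeredLookup:
    """Illustrative tiered lookup: in-memory aggregates, then data cache, then SQL."""

    def __init__(self, aggregate_cache, run_sql):
        self.aggregate_cache = aggregate_cache  # cell -> value, loaded ahead of time
        self.data_cache = {}                    # cell -> value, filled on demand
        self.run_sql = run_sql                  # stand-in for a warehouse round trip
        self.trace = []                         # which tier answered each request

    def value(self, cell):
        if cell in self.aggregate_cache:
            self.trace.append("aggregate_cache")
            return self.aggregate_cache[cell]
        if cell in self.data_cache:
            self.trace.append("data_cache")
            return self.data_cache[cell]
        self.trace.append("sql")
        result = self.run_sql(cell)
        self.data_cache[cell] = result  # later requests for this cell skip the database
        return result


warehouse = {("2013", "Q1"): 40}
sql_calls = []


def fake_sql(cell):
    sql_calls.append(cell)
    return warehouse[cell]


lookup = LayeredLookup({("2013",): 100}, fake_sql)
print(lookup.value(("2013",)))       # 100, from the in-memory aggregates
print(lookup.value(("2013", "Q1")))  # 40, via SQL
print(lookup.value(("2013", "Q1")))  # 40, now from the data cache
print(lookup.trace)                  # ['aggregate_cache', 'sql', 'data_cache']
```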
Dynamic Cube Lifecycle
1. Model & publish
2. Deploy, manage
3. Reporting & analytics
4. Optimize
[Figure: lifecycle diagram linking the Dynamic Cube Server, Dynamic Cube logs, CM, and the Warehouse]
Aggregate Advisor for in-memory aggregates: easy performance improvements
1. Launch Aggregate Advisor Wizard
2. Run with or without workload (optimize per report, package, user, or time)
3. Advisor returns with in-memory and/or in-database recommendations
4. Save recommendations
• In-memory aggregates are created on restart; no re-modeling or re-authoring required
• DBA creates in-database aggregate tables, and the modeler updates the model and redeploys
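The Advisor's internal algorithm is not described here. As a deliberately naive stand-in, the Python sketch below counts which combinations of grouping levels appear most often in a query log, optionally filtered to one user (one of the scopes listed in step 2), and proposes the most frequent ones as aggregates. The log format and all names are hypothetical.

```python
from collections import Counter


def recommend_aggregates(query_log, max_aggregates=2, user=None):
    """Naive stand-in for an aggregate advisor: most frequent grouping-level sets.

    query_log: iterable of (user, levels) pairs from a hypothetical workload log.
    user: restrict the workload to one user's queries, or None for all users.
    """
    counts = Counter(
        frozenset(levels) for who, levels in query_log if user is None or who == user
    )
    # most_common orders by frequency; ties keep first-seen order.
    return [set(levels) for levels, _ in counts.most_common(max_aggregates)]


query_log = [
    ("ann", {"Year", "Region"}),
    ("ann", {"Year", "Region"}),
    ("bob", {"Month"}),
    ("bob", {"Month"}),
    ("bob", {"Month"}),
    ("ann", {"Product"}),
]

print(recommend_aggregates(query_log))              # whole workload
print(recommend_aggregates(query_log, user="ann"))  # one user's workload
```

Running without a workload, as step 2 allows, would need a different basis such as the model's hierarchies; frequency counting only covers the workload case.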
Virtual Cubes
• Virtual cubes combine two cubes
• Combines cubes with nearly identical dimensions
• Combines cubes with a common Time dimension
• A virtual cube can be used as the source for another virtual cube
[Figure: example cubes Sales, Inventory, Store Sales, and Web Sales combined into virtual cubes]
Virtual Cubes: Low latency & faster cube refresh
[Figure: an All Sales virtual cube combining a Historic Sales cube and a Current Month Sales cube, split on the Time dimension at the current month]
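The All Sales example splits sales along the Time dimension: a large Historic Sales cube that rarely changes and a small Current Month Sales cube that is refreshed often, combined by a virtual cube. A Python sketch, assuming cubes are plain dicts from cell coordinates to a measure and that overlapping cells are combined by summing (an assumed merge rule; others are possible):

```python
from collections import defaultdict


def virtual_cube(*source_cubes):
    """Combine source cubes (dicts of cell -> measure) into one view.

    Cells present in only one source pass through unchanged; cells present in
    several sources are summed (an assumed merge rule for this sketch).
    """
    merged = defaultdict(float)
    for cube in source_cubes:
        for cell, value in cube.items():
            merged[cell] += value
    return dict(merged)


# Cells are (month, channel) pairs; values are illustrative sales amounts.
historic_sales = {("2013-06", "Store"): 120.0, ("2013-07", "Store"): 130.0}
current_month_sales = {("2013-08", "Store"): 40.0}

all_sales = virtual_cube(historic_sales, current_month_sales)

# Refresh: rebuild only the small current-month cube, then recombine.
current_month_sales = {("2013-08", "Store"): 55.0}
all_sales = virtual_cube(historic_sales, current_month_sales)
print(all_sales[("2013-08", "Store")])  # 55.0
```

Only the small current-month cube is rebuilt on refresh while the historic cube is reused as-is, which is the source of the low latency.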
Cognos Dynamic Cubes - Summary
High Performance
• 80x improvement with aggregates
• 80% of queries under 3 seconds
• Over 50% of queries sub-second
Growing Data Volumes
• Scalable to terabytes of fact data
Flexible and Optimized
• You choose where to take advantage of in-memory capabilities
• Aggregate Advisor to easily create optimized aggregates
Maximize Value of Data Warehouse
• Aggregate awareness to balance load across app and DB tiers
• Reduce load on the database through use of application tier caching
Perceptions & Questions
Analyst: Mark Madsen
Commentary on analysis and performance,
IBM Business Analytics Briefing Room
August 27, 2013
Mark Madsen
www.ThirdNature.net
@markmadsen
Terminology Disambiguation
Analysis:
a. The separation of an intellectual or material whole into its constituent parts for individual study.
b. The study of such constituent parts and their interrelationships in making up a whole.
Analytics: the mathy stuff, like statistics, machine learning, numerical methods, data mining* (so I won’t use the term as a synonym for OLAP)
In-memory: a vague term mainly implying not using disks for immediate data access
BI is using broken metaphors
We think of BI as publishing, which is only one part.
Most BI is built on an outdated interaction model
Result of a poor interaction model
Delayed interaction disrupts work
"...each second of system response degradation leads to a similar degradation added to the user's time for the following [command]. This phenomenon seems to be related to an individual's attention span. The traditional model of a person thinking after each system response appears to be inaccurate. Instead, people seem to have a sequence of actions in mind, contained in a short-term mental memory buffer. Increases in SRT [system response time] seem to disrupt the thought processes, and this may result in having to rethink the sequence of actions to be continued."
Note the nonlinearity in the graph, an indication that something important is happening.
“The Economic Value of Rapid Response Time”, IBM, 1982
Traditional BI fails to put users into the flow zone
Flow (Csíkszentmihályi)
▪ Concept of engagement and immersion in a task
▪ The appropriate application of tools and knowledge to analytical problems enables productivity.
▪ The stilted interaction of BI disrupts flow.
Interaction timescale for analysis problems
Until you resolve this task performance gap, real analysis work is a challenge (and a reason why Excel remains popular).
[Figure: response times running from Days, Hours, Minutes, and Seconds to Instantaneous, annotated with what the user does while waiting: come back tomorrow, go to lunch, take a break, get some coffee, check email/FB, take a sip of coffee, immerse yourself in work]
Flow is possible only in the “less than 3 second” range.
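The “less than 3 second” flow range and the summary figures “80% of queries under 3 seconds” and “over 50% of queries sub-second” are all shares of a query workload under a time limit. A small Python helper computes those shares from measured latencies; the sample values are made up for illustration.

```python
def latency_profile(latencies_s, flow_threshold_s=3.0):
    """Return (share under the flow threshold, share under one second)."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    n = len(latencies_s)
    under_flow = sum(1 for t in latencies_s if t < flow_threshold_s)
    sub_second = sum(1 for t in latencies_s if t < 1.0)
    return under_flow / n, sub_second / n


# Made-up response times in seconds for a small workload.
sample = [0.4, 0.8, 2.5, 5.0]
print(latency_profile(sample))  # (0.75, 0.5)
```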
Future-proofing
The tool market is shifting, driven by new architectures that are enabled by new technologies. Front-end tools are evolving away from BI-as-publishing, which is going to increase the burden on the back-end data stores and cause interaction problems. You need to evaluate tools based on more detailed usage scenarios and interactive capabilities, less on report-building features.
BI should support two sets of actions. One is monitoring the known, one is analyzing the unknown.
[Figure: a monitor-and-analyze cycle with the steps Monitor, Analyze Exceptions, Analyze Causes, Decide, Act, and Collect new data; outcomes No problem, No idea, and Do nothing; “Act on the process” (usually days or a longer timeframe) versus “Act within the process” (usually real-time to daily)]
The real BI design point: context and point of use
Information use is diverse and varies based on context:
▪ Get a quick answer
▪ Solve a one-off problem
▪ Make repetitive decisions
▪ Monitor routine processes
▪ Make complex decisions
▪ Choose a course of action
▪ Convince others to take action
Different problems require different response times in order to be effective.
How expensive was performance? 500 GB DW…
Maximum Capacities
• 2 to 30 100MHz Intel Pentium processors
• Up to 3.5GB system memory
• Up to 1.7TB of online storage
Base Configuration
• 18-slot Sequent bus chassis
• 1 processor card - dual 100MHz Pentium CPUs
• 1 2.1GB SCSI boot disk
• 1 CD-ROM/QIC-525 1/4” tape
• 1 memory controller (64MB, 256MB)
• 1 integrated Ethernet
• 5-slot VMEbus chassis
• Room for 3 additional 5.25” devices
Expansion Options
• Up to 400 SCSI-2 disks
• Up to 29 VMEbus slots
• Up to 8 QCIC I/O controllers
• Token Ring, FDDI LAN adapters
• Sync or async communications ports
Price: $1.6 million in 1993
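A rough sense of scale, assuming the $1.6 million price covers the system hosting the 500 GB warehouse and taking the 3.5GB maximum memory from the capacity list:

```latex
\frac{\$1{,}600{,}000}{500\ \text{GB}} = \$3{,}200\ \text{per GB of warehouse data},
\qquad
\frac{3.5\ \text{GB}}{500\ \text{GB}} = 0.007 = 0.7\%
```

Even a maximally configured machine could hold less than 1% of that warehouse in memory at once.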
OLAP was a response-time answer
The Codd OLAP paper was written for a vendor in 1993. State-of-the-art client technology was the 60 MHz Intel Pentium running Windows 3.1; server technology was the $1M+ database server.
It’s still hard to get less than 3 second response times from a round trip to a DB.
It’s still hard to get interaction right when the BI model is mainly compose-compile-execute.
“You lied about it being in-memory.”
“I didn’t say it would all fit in at the same time…”
Differentiating in-memory claims
Tool vs. Platform: OLAP is (generally) in-memory technology; there are tradeoffs in the choice.
Platform:
a) Conventional: use a large buffer pool and cache or pin everything in memory. Speeds up a DB, but not really “in-memory”.
b) Memory optimized: designed assuming all or mostly in memory; map the data needed for operations to memory and/or add features to recognize and use large-memory configurations.
c) In-memory: purpose-built; the entire database is resident in main memory, and the only disk access is loading on a cold start or logging changes.
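Option (a) is still a disk-page design: a buffer pool caches pages and evicts the least recently used one when full. A minimal Python LRU sketch follows (`fake_disk` is a hypothetical stand-in for storage). A pool large enough to hold everything stops rereading pages after the first pass, yet every access still goes through page lookups built around disk, one reading of why this approach is “not really in-memory”.

```python
from collections import OrderedDict


class BufferPool:
    """Conventional design (a): fixed-size LRU cache of disk pages in front of storage."""

    def __init__(self, capacity_pages, read_page_from_disk):
        self.capacity = capacity_pages
        self.read_page_from_disk = read_page_from_disk
        self.pages = OrderedDict()  # page_id -> page, oldest access first
        self.disk_reads = 0

    def get(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # mark as most recently used
            return self.pages[page_id]
        self.disk_reads += 1
        page = self.read_page_from_disk(page_id)
        self.pages[page_id] = page
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return page


def fake_disk(page_id):
    return f"page-{page_id}"


scan = [1, 2, 3, 1, 2, 3]
small = BufferPool(2, fake_disk)   # too small: a repeated scan thrashes
large = BufferPool(10, fake_disk)  # big enough to keep every page resident
for page_id in scan:
    small.get(page_id)
    large.get(page_id)

print(small.disk_reads, large.disk_reads)  # 6 3
```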
Some questions to start discussion
1. Will this work with any database back end?
2. Who are these features aimed at: end users or the people who define structures and manage data for the end users?
3. Are cube definitions static in this model?
4. Can cubes be populated in slices or layers based on what a person is looking at?
5. How do the caching improvements address cube-building times?
6. Is this addressing static performance management or dynamic?
7. Are virtual cubes defined by the user or admin, or can they be automatic?
About the Presenter
Mark Madsen is president of Third Nature, a technology research and consulting firm focused on business intelligence, data integration and data management. Mark is an award-winning author, architect and CTO whose work has been featured in numerous industry publications. Over the past ten years, Mark has received awards for his work from the American Productivity & Quality Center, TDWI, and the Smithsonian Institution. He is an international speaker and a contributor at Forbes Online and Information Management. For more information or to contact Mark, follow @markmadsen on Twitter or visit http://ThirdNature.net
About Third Nature
Third Nature is a research and consulting firm focused on new and emerging technology and practices in analytics, business intelligence, and performance management. If your question is related to data, analytics, information strategy and technology infrastructure, then you’re at the right place.
Our goal is to help companies take advantage of information-driven management practices and applications. We offer education, consulting and research services to support business and IT organizations as well as technology vendors.
We fill the gap between what the industry analyst firms cover and what IT needs. We specialize in product and technology analysis, so we look at emerging technologies and markets, evaluating technology and how it is applied rather than vendor market positions.
CC Image Attributions
Thanks to the people who supplied the Creative Commons licensed images used in this presentation:
train_to_sea.jpg - http://www.flickr.com/photos/innoxiuss/457069767/
well town hall.jpg - http://flickr.com/photos/tuinkabouter/1135560976/
Upcoming Topics
September: ANALYTICS
October: DATA PROCESSING
www.insideanalysis.com
Thank You for Your Attention