Agile BI & Data Virtualization
Tom Breur
BI & IM Symposium, Bussum, 26 November 2012
Jan 13, 2015

It’s a stretch…
- Volumes of data are growing (fast)!
- Variety of sources keeps expanding: social media, RFID, log files, GPS, etc.
- Business users need their data (much) sooner: monthly → weekly → daily → intra-day
- BI in support of operational processes calls for (near) real-time data

Why go “Agile”? (1)
- BI projects fail too often, or don’t live up to expectations
- Increasingly, BI development takes place alongside (instead of after) application engineering

Why go “Agile”? (2)
Winston Royce (1970):
“In my experience, the simpler model … [as pictured below] has never worked on large software development efforts”
[Figure: waterfall model with the phases Analysis, Design, Development, Test, Release]
[Royce subsequently went on to describe an enhanced model, which included building a prototype first and then using the prototype plus feedback between phases to build a final deployment]
www.xlntconsulting.com

History of development methods
From “code and fix” to more structured, methodical approaches to software development
[Timeline, 1960 to 2010; bands labeled Unstructured, Prescriptive methods, Structured methods]
- Royce (1970) “Managing the Development of Large Software Systems”
- Brooks (1975) “The Mythical Man-Month”
- Jackson (1975) “Principles of Program Design”
- Boehm (1986) “A Spiral Model of Software Development and Enhancement”
- Martin (1991) “Rapid Application Development”
- 1994: DSDM Consortium launched
- 1996: Scrum ‘invented’
- 1997: term eXtreme Programming (XP) ‘invented’
- Beck (2000) “Extreme Programming Explained”
- 2001: term ‘Agile’ adopted
- Poppendieck (2003) “Lean Software Development”
- Cockburn (2004) “Crystal Clear”
- Anderson (2010) “Kanban”

Quick & Dirty ≠ Agile (1)
www.agilemanifesto.org (principle #1):
“Our highest priority is to satisfy the customer through early and continuous delivery of valuable software” [emphasis added]
Creating “technical debt” stands squarely in the way of continuous delivery and of maintaining a so-called “sustainable pace”: it creates (new) legacy!

Quick & Dirty ≠ Agile (2)
Top-down project management (e.g. Scrum) & bottom-up software engineering (e.g. Extreme Programming, XP)
Expedited delivery & architectural integrity

Quick & Dirty ≠ Agile (3)

BI requirements
- Information products trigger change requests: new data insights lead to new requirements
- Gerald M. (Jerry) Weinberg: “Without stable requirements, development can’t stabilize, either”

BI: means and ends uncertainty
Means uncertainty (How do we get there?):
- Lack of “design patterns”
- Data integration fraught with data quality issues
- Lack of Master Data Management
- Lack of Metadata
- No agreement on how to conform dimensions
Ends uncertainty (Where are we going?):
- Requirements are difficult to pin down
- Diverse end-user groups
- Ambiguous business case(s)
- Scope is unclear
- Data warehouses are never “done”

Waterfall vs. Agile
Source: Dean Leffingwell (2011)
Waterfall/Traditional (plan driven): requirements fixed; resources and date estimated
Agile (value driven): resources and date fixed; requirements estimated
Agile fixes the date and resources and varies the scope.
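To make the inverted triangle concrete, here is a minimal Python sketch, assuming a hypothetical backlog with made-up value and effort estimates. Date and resources are fixed inputs (`team_size`, `days`), capacity follows from them, and only the scope (which items make it in) varies. The greedy, value-first selection is just one simple choice for illustration, not Leffingwell’s method.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BacklogItem:
    name: str
    value: int   # relative business value (hypothetical units)
    effort: int  # estimated effort in person-days (hypothetical)


def plan_timebox(backlog: List[BacklogItem], team_size: int,
                 days: int) -> Tuple[List[BacklogItem], int]:
    """Fix resources and date; let scope vary.

    Capacity is team_size * days. Items are considered in descending value
    order; an item that doesn't fit the remaining capacity is skipped, so a
    smaller, lower-value item may still make it in.
    """
    capacity = team_size * days
    selected = []
    for item in sorted(backlog, key=lambda i: i.value, reverse=True):
        if item.effort <= capacity:
            selected.append(item)
            capacity -= item.effort
    return selected, capacity


backlog = [
    BacklogItem("Sales dashboard", value=8, effort=6),
    BacklogItem("Customer churn report", value=5, effort=5),
    BacklogItem("Supplier scorecard", value=3, effort=4),
    BacklogItem("Data quality checks", value=9, effort=3),
]

scope, slack = plan_timebox(backlog, team_size=2, days=5)
print([item.name for item in scope], slack)
# ['Data quality checks', 'Sales dashboard'] 1
```

When estimates change, you rerun the selection against the same capacity: the date and team stay put, and what moves is the list of items delivered.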

Weinberg on Quality
“If quality isn’t an objective (if the software doesn’t have to work), you can satisfy any other constraint (e.g.: budget, time, etc.)” Gerald M. (Jerry) Weinberg

Concurrent development (1)
- Waterfall: you can avoid mistakes/rework by getting good requirements upfront
- The most costly mistakes arise from forgetting important elements early on
- Detailed planning (Big Design Up Front, BDUF) requires early, ill-informed decisions and uses more time, leaving fewer tangible products with which to resolve ambiguity: a vicious cycle

Concurrent development (2)
- Agile: decide at the “last responsible moment”
- Decisions that haven’t been made never need to be reverted
- No “free lunch”: deferring decisions requires anticipating likely change, coordination/collaboration within the team, and close contact with customers

Inmon vs. Kimball (1)
Inmon: 3-tiered; Kimball: 2-tiered

Inmon vs. Kimball (2)
Problems with Inmon:
- Uncovering the ‘correct’ 3NF model requires scarce business expertise
- It is unclear where 3NF model boundaries begin and end
- Model redesigns trigger a cascading nightmare of parent-child key updates
Problems with Kimball:
- The smallest unit of delivery is a star, and incremental growth adds prohibitive overhead
- The dimensional structure is very rigid, not conducive to expansion or change
- Conforming dimensions is hard, especially without access to data

3NF vs. Dimensional (1)

3NF vs. Dimensional (2)
See: Kimball Design Tip #149 (“Facing the Re-Keying Crisis”), http://www.kimballgroup.com/2012/10/02/design-tip-149-facing-the-re-keying-crisis/
This problem gets (much!) worse with multiple parent-child levels.

Hyper normalized model
Business keys, context attributes (history), and relations all have their own tables. Appending “Supplier data” to the model (or any other new source) is therefore guaranteed to remain a “local” problem (an extension) in the data model.
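A minimal Python/SQLite sketch of why a new source stays a local extension. The slide does not name a specific method; the hub/satellite/link naming below is borrowed from Data Vault as an assumption, and all table and column names are hypothetical. Business keys (hubs), context attributes with history (satellites), and relations (links) each get their own tables, so adding supplier data only creates new tables and leaves the existing DDL untouched.

```python
import sqlite3


def create_product_tables(conn: sqlite3.Connection) -> None:
    # Hub: nothing but the business key (plus a load timestamp).
    conn.execute(
        "CREATE TABLE hub_product (product_bk TEXT PRIMARY KEY, load_ts TEXT NOT NULL)"
    )
    # Satellite: descriptive context, one row per version, so history is kept.
    conn.execute(
        """CREATE TABLE sat_product (
               product_bk TEXT NOT NULL REFERENCES hub_product (product_bk),
               load_ts    TEXT NOT NULL,
               name       TEXT,
               list_price REAL,
               PRIMARY KEY (product_bk, load_ts))"""
    )


def add_supplier_source(conn: sqlite3.Connection) -> None:
    # A new source adds a hub, a satellite and a link; no ALTER TABLE anywhere.
    conn.execute(
        "CREATE TABLE hub_supplier (supplier_bk TEXT PRIMARY KEY, load_ts TEXT NOT NULL)"
    )
    conn.execute(
        """CREATE TABLE sat_supplier (
               supplier_bk TEXT NOT NULL REFERENCES hub_supplier (supplier_bk),
               load_ts     TEXT NOT NULL,
               name        TEXT,
               country     TEXT,
               PRIMARY KEY (supplier_bk, load_ts))"""
    )
    # Link: the product-supplier relation lives in its own table.
    conn.execute(
        """CREATE TABLE link_product_supplier (
               product_bk  TEXT NOT NULL REFERENCES hub_product (product_bk),
               supplier_bk TEXT NOT NULL REFERENCES hub_supplier (supplier_bk),
               load_ts     TEXT NOT NULL,
               PRIMARY KEY (product_bk, supplier_bk))"""
    )


def table_ddl(conn: sqlite3.Connection) -> dict:
    # Map each table name to the CREATE statement SQLite stored for it.
    return dict(conn.execute("SELECT name, sql FROM sqlite_master WHERE type = 'table'"))


conn = sqlite3.connect(":memory:")
create_product_tables(conn)
conn.execute("INSERT INTO hub_product VALUES ('P-100', '2012-11-01')")
conn.executemany(
    "INSERT INTO sat_product VALUES (?, ?, ?, ?)",
    [("P-100", "2012-11-01", "Widget", 9.95),
     ("P-100", "2012-11-20", "Widget", 10.50)],  # a price change becomes a new version
)

before = table_ddl(conn)
add_supplier_source(conn)
after = table_ddl(conn)

print(sorted(set(after) - set(before)))
# ['hub_supplier', 'link_product_supplier', 'sat_supplier']
print(all(after[name] == ddl for name, ddl in before.items()))
# True
```

The comparison of stored DDL is the point of the sketch: the existing product tables, and any loads or queries written against them, are not touched when the supplier source arrives.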

3-tiered DWH architecture
[Figure components: sources (Legacy, OLTP, ERP, LOG files, External); ETL; Staging Area; Data Warehouse; ODS; Datamarts 1 to n; Business Intelligence Applications; Metadata. Modeling styles annotated: 3NF, hyper normalized, dimensional]

Horses for courses
- 3NF: quickly & accurately capture transaction data; easy to get data in
- Hyper normalized: integrate historical data; capture all data, all the time
- Dimensional: present & analyze data; easy to get data out
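A minimal Python sketch of the 3NF versus dimensional trade-off, using hypothetical in-memory data. In the 3NF form each fact is stored once (a region name lives in one place), which keeps writes simple; the dimensional form flattens the customer-to-region hierarchy so an aggregate needs a single lookup per fact row. The price shows up as redundancy: renaming a region touches every customer row in that region.

```python
from collections import defaultdict

# 3NF: every fact is stored exactly once (hypothetical sample data).
regions = {1: "North", 2: "South"}
customers = {10: {"name": "Acme", "region_id": 1},
             11: {"name": "Bolt", "region_id": 2},
             12: {"name": "Core", "region_id": 1}}
orders = [(100, 10, 250.0), (101, 11, 100.0), (102, 12, 50.0), (103, 10, 75.0)]


def build_customer_dim(customers, regions):
    """Flatten the customer -> region hierarchy into one dimension row per customer."""
    return {cid: {"customer": c["name"], "region": regions[c["region_id"]]}
            for cid, c in customers.items()}


def revenue_by(attribute, facts, dim):
    """Aggregate fact amounts by any dimension attribute: one lookup, no join chain."""
    totals = defaultdict(float)
    for _order_id, customer_id, amount in facts:
        totals[dim[customer_id][attribute]] += amount
    return dict(totals)


dim_customer = build_customer_dim(customers, regions)
print(revenue_by("region", orders, dim_customer))  # {'North': 375.0, 'South': 100.0}

# The flip side: renaming a region is one write in 3NF (regions[1]), but in
# the dimension the name is repeated on every customer row in that region.
rows_to_update = sum(1 for d in dim_customer.values() if d["region"] == "North")
print(rows_to_update)  # 2
```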

Back room vs. front room
[Figure: the same architecture (sources, ETL, Staging Area, Data Warehouse, ODS, Datamarts 1 to n, Business Intelligence Applications, Metadata), divided into a Back room (Data Warehouse Architecture) and a Front room (Business Intelligence Architecture)]

Divide & Conquer
- “Break down” the semantic gap from back room to front room
- Offer a range of data services: source data “as is”; source data that have undergone cleansing; dimensional models; full-fledged BI applications
- Allow business to set priorities!
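One way to picture a range of data services is a minimal Python sketch in which each tier is a function derived from the one beneath it, so the business can start consuming whichever tier is ready first. The sample rows, the cleansing rules (trim and upper-case country codes, drop unparseable amounts), and the `SERVICES` registry are all hypothetical; full-fledged BI applications would sit on top and are not shown.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical extract, exactly as the source system delivers it.
RAW: List[Dict[str, str]] = [
    {"id": "1", "country": " nl ", "amount": "120.50"},
    {"id": "2", "country": "NL", "amount": "80"},
    {"id": "3", "country": "be", "amount": "n/a"},
    {"id": "4", "country": "BE", "amount": "40"},
]


def as_is() -> List[Dict[str, str]]:
    """Source data "as is" (copied, so consumers can't alter the source rows)."""
    return [dict(row) for row in RAW]


def cleansed() -> List[dict]:
    """Typed and standardized rows; a row whose amount won't parse is dropped."""
    rows = []
    for row in as_is():
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # hypothetical rule: reject rather than guess
        rows.append({"id": int(row["id"]),
                     "country": row["country"].strip().upper(),
                     "amount": amount})
    return rows


def dimensional() -> Dict[str, float]:
    """Amounts aggregated by country, built on top of the cleansed tier."""
    totals: Dict[str, float] = defaultdict(float)
    for row in cleansed():
        totals[row["country"]] += row["amount"]
    return dict(totals)


# Each tier is exposed as its own service; consumers pick the one they need.
SERVICES = {"as_is": as_is, "cleansed": cleansed, "dimensional": dimensional}

print(SERVICES["dimensional"]())  # {'NL': 200.5, 'BE': 40.0}
```

Because each tier only depends on the one below, a team can ship the raw tier first and let business priorities decide which higher tier gets built next.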

Why data virtualization?
- Operational BI calls for real-time data
- Integrate heterogeneous sources, at least “in the eye of the beholder”
- A data virtualization layer hides the complexity of the underlying applications and enables sharing of metadata
- Data virtualization enables federation, so you can delay (definitive) modeling, yet make data available early
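To illustrate federation without copying, here is a minimal Python sketch of a virtualization layer, assuming two hypothetical sources: an in-memory SQLite database standing in for an OLTP system, and a CSV string standing in for a log file. `VirtualLayer`, `register` and `federated_join` are illustrative names, not a product API. Each query fetches from the sources at call time, so results reflect the sources’ current state; real virtualization products add query push-down, caching and security, none of which is shown here.

```python
import csv
import io
import sqlite3
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class VirtualLayer:
    """Hypothetical, minimal virtualization layer: rows are fetched on demand."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], Iterable[Row]]] = {}
        self.metadata: Dict[str, str] = {}  # shared description of each source

    def register(self, name: str, fetch: Callable[[], Iterable[Row]],
                 description: str) -> None:
        self._sources[name] = fetch
        self.metadata[name] = description

    def federated_join(self, left: str, right: str, key: str) -> List[Row]:
        # Hash join computed at query time; nothing is materialized between calls.
        index: Dict[object, List[Row]] = {}
        for right_row in self._sources[right]():
            index.setdefault(right_row[key], []).append(right_row)
        return [
            {**left_row, **right_row}
            for left_row in self._sources[left]()
            for right_row in index.get(left_row[key], [])
        ]


# Source 1: stand-in for an OLTP customer table.
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
oltp.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Acme"), (2, "Bolt")])

# Source 2: stand-in for a web log file.
WEB_LOG_CSV = "customer_id,page\n1,/pricing\n2,/support\n1,/checkout\n"


def fetch_customers() -> List[Row]:
    cur = oltp.execute("SELECT customer_id, name FROM customer")
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur]


def fetch_web_log() -> List[Row]:
    # The layer hides the source's quirks: CSV yields strings, so cast the key.
    return [{"customer_id": int(r["customer_id"]), "page": r["page"]}
            for r in csv.DictReader(io.StringIO(WEB_LOG_CSV))]


layer = VirtualLayer()
layer.register("customers", fetch_customers, "OLTP customer master (SQLite)")
layer.register("web_log", fetch_web_log, "Web log file (CSV)")

print(layer.federated_join("web_log", "customers", "customer_id"))

# A change in the source is visible on the next query, with no reload step.
oltp.execute("UPDATE customer SET name = 'Acme Corp' WHERE customer_id = 1")
print(layer.federated_join("web_log", "customers", "customer_id")[0]["name"])  # Acme Corp
```

The trade-off of this design is that every query pays the cost of reading the sources, which is why definitive modeling (and copying into a warehouse) can be deferred but not always avoided.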

Conclusion
- Big Data are here to stay (and let’s hope the hype passes soon)
- Data provide a source of sustainable competitive advantage
- Speed and volume prohibit (wholesale) copying: virtualization is the way forward
- Agile BI enables business alignment, and gives us a “sporting chance” to keep up