DATA MANAGEMENT & INTEGRATION BUSINESS DILEMMAS: SO MANY METHODOLOGIES, SO LITTLE TIME
By: Marianne Gleason, PMP, CSSBB
Data Management & Warehouse Consultant
Jun 12, 2015
04/13/2023
DATA MANAGEMENT & INTEGRATION: BUSINESS DILEMMAS
DEFINITION OF DATA MANAGEMENT
Data Management: The business function of planning for, controlling, and delivering data and information assets. This function includes the disciplines of development, execution, and supervision of plans, policies, programs, projects, processes, and practices that control, protect, deliver, and enhance the value of data and information assets.
--- DMBOK
THE STORY OF TWO LIFECYCLES
SYSTEM DEVELOPMENT LIFECYCLE (SDLC)
Plan => Analyze => Design => Build => Test => Deploy => Maintain
DATA LIFECYCLE
Plan => Specify => Enable => Create & Acquire => Maintain & Use => Archive & Retrieve => Purge
Data is created or acquired, stored and maintained, used, and eventually purged. As I’m sure many businesses, SMB and enterprise alike, agree, here’s where it gets interesting. This is due to the dynamics of data: before it is purged, it may be extracted, imported, exported, validated, cleansed, transformed, aggregated, analyzed, reported, updated, archived, and backed up, to name a few.
HOW DO WE TRANSFORM THE TRADITIONAL LIFE CYCLE TO HANDLE TODAY’S DATA INTEGRATION DEMANDS?
WATERFALL METHODOLOGY
AGILE METHODOLOGY
COMPONENTS OF AGILE
● Story Writing
● Estimation
● Release Planning
● Sprint Planning
● Metrics
APPLY TO DATA INTEGRATION LIFE CYCLE
KEYS ARE:
1. CADENCE
2. COLLABORATION
3. COMMUNICATION
4. RISK MITIGATION
5. MINIMIZE DATA TIME TO USE FOR THE BUSINESS
HOW DOES AGILE APPLY TO DATA INTEGRATION?
For the purpose of this presentation, I will provide examples relating to an enterprise data warehouse (EDW). In this case, the data sets are large and unstructured, meaning data that does not fit well into relational database management systems (RDBMS).
EXAMPLE: ADDING COMPLEX DATA FROM A NEW SOURCE INTO THE ENTERPRISE DATA WAREHOUSE (EDW)
Below are the process steps within an iteration that integrate with the Agile components and the macro data integration life cycle.
Requirements => Data Profiling => Coding & Data Transformation Rules & Mappings => Development / Coding => QA & System Testing / Validation => Deployment
Rework (appears four times in the diagram)
DATA GOVERNANCE (Meta Data and Document Control)
COMMUNICATION & RISK MITIGATION
HOW DO WE USE THE AGILE COMPONENTS WITH THE DATA INTEGRATION LIFE CYCLE?
Requirements
• Story Writing
• Estimation
Data Profiling
• Estimation
• Release Planning
• Sprint Planning
Coding & Data Transformation Rules & Mappings
• Estimation
• Release Planning
• Sprint Planning
Development / Coding
• Release Planning
• Sprint Planning
QA & System Testing / Validation
• Estimation
• Release Planning
• Sprint Planning
• Metrics
Deployment
• Retrospective / Lessons Learned
• Continuous Improvement
Agile components: Story Writing, Estimation, Release Planning, Sprint Planning, Metrics
COMMUNICATION
STORY WRITING
How does a team determine requirements?
● Understand the business case / problem statement
● Draw on the team’s expertise to determine the tables affected by the new data source
● Data Profiling can assist in determining database tables affected
● Define all areas of the business affected – Define as Epic vs. Function vs. Task Breakdown
Tools that can be used:
User Stories, Refer to Stakeholder Matrix, Card, User Conversations, Confirmation (Consensus), Acceptance Criteria, System As A Whole Mentality w/in Scope, What/Why/How Personas, Questionnaires, Observations, SMEs, SIPOC Diagrams, Ishikawa Diagrams, RACI Matrix, to name a few
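As a rough illustration of how data profiling can inform story writing, the sketch below summarizes each field of a sample from the new source. The function name `profile_rows` and the sample claim records are hypothetical, and a real profiling tool would report far more than this.

```python
from collections import defaultdict


def profile_rows(rows):
    """Summarize each field: null rate, distinct value count, observed types.

    `rows` is a list of dicts with scalar values (an assumption of this sketch).
    """
    fields = set()
    for row in rows:
        fields.update(row.keys())

    stats = defaultdict(lambda: {"nulls": 0, "values": set(), "types": set()})
    for row in rows:
        for field in fields:
            value = row.get(field)
            # Treat missing keys, None, and empty strings as nulls.
            if value is None or value == "":
                stats[field]["nulls"] += 1
            else:
                stats[field]["values"].add(value)
                stats[field]["types"].add(type(value).__name__)

    total = len(rows)
    return {
        field: {
            "null_rate": s["nulls"] / total if total else 0.0,
            "distinct": len(s["values"]),
            "types": sorted(s["types"]),
        }
        for field, s in stats.items()
    }


# Hypothetical sample from a new claim source.
sample = [
    {"claim_id": 1, "amount": 10.5, "status": "OPEN"},
    {"claim_id": 2, "amount": None, "status": "open"},
]
for field, summary in sorted(profile_rows(sample).items()):
    print(field, summary)
```

Fields with high null rates, mixed types, or inconsistent codes (such as "OPEN" vs. "open" here) are candidates for their own stories or tasks.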
EPIC STORY WRITING EXAMPLE (SIPOC) => STORIES FOR LARGE DATA SETS
Define the Process
Supplier (Who): Who PROVIDES the input?
• Software / Hardware Vendors
• Source Input Customer / Organization
• Government
• Internal Functions affected by data / SMEs
Input (Nouns): What is provided to START the process?
• Regulations
• Data Transportation & Security
• Staff Training & Availability (Resources)
• IDS, EDW, Data Mart / Tables Affected
• Database Environment / Platform(s)
• Methodology & Standards
• Process Project / Program Management Plans
Process (Verbs): What STEPS are included in the process today? (high level)
• Requirements
• Data Profiling
• Coding & Data Transformation Rules and Mappings
• Development / Coding
• QA & System Testing / Validation
• Deployment
Output (Nouns): WHAT does the customer receive? (Think of their CTQs)
• Cycle Time for Data to Use
• Report Generation / External Extracts
• Valid / Invalid Data to the Warehouse
• Metric Evaluation
• Data Analytics (Transactional / Analytical)
• Risk Analysis
• Testing Results and Evaluations
Customer (Who): WHO are your primary customers?
• Third Party Extract Recipients
• Stakeholders (Internal / External)
• Regulators
• Vendors
• Mobile Device / Web Customers
ESTIMATION
● Understand the assumptions and constraints
● Make sure requirements are understood
● Understand potential and known areas of rework
● Use historical throughputs of similar projects
● Estimations are not contracts – so have cultural flexibility with the team
● Break down requirement(s) stories into tasks
● Monitor backlogs throughout iteration => helps for sprint determination
Tools That Can Be Used:
Planning Poker, Historical Estimates, Velocities for Sprints, Forecasting as a Range/Percentage (Short Term) for sprints and project durations, Project Cost Estimations from Velocity Forecasting, Process Mapping, Hypothesis Statements
ESTIMATION EXAMPLE
Three Components:
■ Estimate Size of Stories = Defines Sprint
■ Measure Velocity For Each Iteration = Total Sprints Throughput
■ Forecast Duration
[Chart: Estimation (story pts.) by sprint, Sprints 1 to 4]
Forecast: Predict using a Range and a % using Project backlog
- Derive Low Velocity
- Derive High Velocity
- Derive Average Velocity
- Forecast project duration by # of sprints then convert to $/sprint then $/iteration
Iteration 1
STORY: MAP A DBASE TABLE (CLAIM)
TASK: Define fields to be mapped (100)
TASK: Profile source to target data for mapping / coding complexity
RELEASE PLANNING
● Paradigm shift from traditional plan-driven delivery to agility driven by vision and values.
● Agile Levels: DI Vision, DI Roadmap, Go Live Plan, Iteration Plan, Daily Commitment
● Set iterations to fit DI Roadmap (usually 1 – 4 week timeframe); decrease data to business use cycle times
● Connects strategic vision to delivery approach (source to target), Eliminates Waste (rework) / Lean, Eliminates Variation, Better Decision Making, Improves Communication, Improves Morale
● Release Planning/DI Planning leads to Roadmap, Plan, Backlog
● Key Elements: Schedule, Estimates on Epics / Stories, Prioritized Backlogs, Velocity of Team
Bottom Line to Tools: Complexity is Estimated, Velocity is Measured, Duration is Derived
RELEASE PLANNING PICTORIAL
RELEASE / DATA INTEGRATION PHASE 1: Iteration 1, Iteration 2, Iteration 3
RELEASE / DATA INTEGRATION PHASE 2: Iteration 4, Iteration 5, Iteration 6, Iteration 7, Iteration 8
SPRINT PLANNING
● Determine and agree on the sprint and next sprint goals
● Determine required attendees, inputs and outputs
● Prioritize logs/backlogs and validate based on estimates
● Review and seek clarification of stories & tasks
● Define and estimate the work plan by breaking into tasks from user stories
● Daily Standups
● Sprint Review and Demo Integration
● Retrospective / Lessons Learned
EXPANDING ON SPRINT PLANNING ELEMENTS
● Participation
● Prioritized Backlog
● Presentation of Candidate Stories
● Agreeing On Sprint Goal
● Validation of Sprint Backlog Based on Team Estimation of Stories
● Capacity Planning
● Defining and Estimating the Work Plan
● Daily Stand Up Meetings
● Sprint Review and Closeout
● Retrospective / Lessons Learned
METRICS
● Derive measurements (Quantitative/Qualitative)
● Leading / Lagging measurements
● Metrics must be motivational and informative
● Determine whether tasks are done – either 100% complete or not complete
● Some agile metrics (going beyond common metrics):
■ Velocity – Sum of points delivered across all iterations / # of iterations
■ Burndown – Rate at which requirements are being delivered
■ Burnup – Whether project story points are being met (i.e., scope)
■ Cumulative Flow Charts – Where the requirements stand in the lifecycle over time (i.e., Not Started, In Progress, Pending Acceptance, Completed)
These metrics lead to more accurate OLAP and/or OLTP results for BI and analytics, in conjunction with the company’s business model and its data management strategic planning efforts.
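The metric definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function names, the per-sprint figures (10, 30, 20, 22), and the 100-point scope are all hypothetical. Stories earn points only when fully done, reflecting the "100% complete or not complete" rule.

```python
def delivered_points(stories):
    """Sum points for stories that are 100% complete (no partial credit).

    `stories` is a list of (points, done) pairs.
    """
    return sum(points for points, done in stories if done)


def velocity(points_per_iteration):
    """Average story points delivered per iteration."""
    return sum(points_per_iteration) / len(points_per_iteration)


def burndown(scope_points, points_per_iteration):
    """Story points remaining after each iteration."""
    remaining, left = [], scope_points
    for delivered in points_per_iteration:
        left -= delivered
        remaining.append(left)
    return remaining


def burnup(points_per_iteration):
    """Cumulative story points delivered after each iteration."""
    completed, total = [], 0
    for delivered in points_per_iteration:
        total += delivered
        completed.append(total)
    return completed


# Hypothetical figures for four sprints against a 100-point scope.
delivered = [10, 30, 20, 22]
print(velocity(delivered))       # 20.5
print(burndown(100, delivered))  # [90, 60, 40, 18]
print(burnup(delivered))         # [10, 40, 60, 82]
```

Plotting the burndown list against an ideal straight line, or the burnup list against total scope, gives the charts described in the examples that follow.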
EXAMPLES OF AGILE METRICS - BURNDOWN
[Chart: % Complete (0 to 90) by Iterations (1 to 5), Ideal vs. Actual]
QA Testing Defects Pareto Chart
[Chart: Frequency (% and Cumulative %, axis 0% to 120%) by Cause. Causes: Mapping, Coding, Target Domains, Meta Data, Data Standards Unclear, Joins, Data Type, Foreign Key Lookup, Grouping, Wrong SK Value]
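For readers who want to reproduce a Pareto chart like this from their own QA defect log, the sketch below computes the % and Cumulative % series. The function name `pareto` and the defect counts are hypothetical; only the cause names are taken from the chart.

```python
def pareto(defect_counts):
    """Return (cause, count, percent, cumulative_percent) rows, largest cause first."""
    total = sum(defect_counts.values())
    if total == 0:
        return []
    rows, running = [], 0
    ranked = sorted(defect_counts.items(), key=lambda kv: kv[1], reverse=True)
    for cause, count in ranked:
        running += count
        rows.append((cause, count, 100 * count / total, 100 * running / total))
    return rows


# Hypothetical defect counts, keyed by cause names from the chart.
counts = {
    "Mapping": 40,
    "Coding": 25,
    "Target Domains": 15,
    "Meta Data": 10,
    "Joins": 10,
}
for cause, count, pct, cum_pct in pareto(counts):
    print(f"{cause:15s} {count:3d} {pct:5.1f}% {cum_pct:5.1f}%")
```

The causes at the top of the list, where the cumulative percentage climbs fastest, are usually the first candidates for rework-reduction effort.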
EXAMPLES OF AGILE METRICS - BURNUP
EXAMPLES OF AGILE METRICS - ITERATION COST USING VELOCITY
If the backlog is sized at 60 story points, then using this velocity trend the projected duration is:
Range:
Low Velocity: 10 story points
High Velocity: 30 story points
Average Velocity: 20.5 story points
The team’s velocity ranged from 10 to 30 story points.
60/10 = 6 sprints
60/30 = 2 sprints
60/20.5 ≈ 2.9 sprints
The backlog will release in between 2 and 6 sprints.
Notice Sprints 1 and 2 have a high degree of story point variability, as the team is likely in the Forming/Storming team development stages. Sprints 3 & 4 tend to be closer in story points, as the team begins to attain the Norming/Performing team development status.
[Chart: Iteration - Duration Estimate; estimate in story points (0 to 30) for Sprints 1 to 4]
If cost per sprint is $10,000, then the iteration range prediction is:
Low Estimate: (2 sprints)(10,000) = $20,000
High Estimate: (6 sprints)(10,000) = $60,000
Avg. Estimate: (2.9 sprints)(10,000) = $29,000