Essentials for Test Data Management
© 2009 IBM Corporation
Agenda
• Drivers for Effective Test Data Management (TDM)
• Effective Test Data Management
• Test Environment Creation
• Data Masking Considerations
• Editing Test Data
• Compare
• Refreshing Test Environments
• IBM Optim
• Q&A
The Challenge
Production 500GB
Training 500GB
Unit Test 500GB
System Test 500GB
UAT 500GB
Integration 500GB
Total 3 TB
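The arithmetic behind this slide can be sketched in a few lines of Python. The environment sizes come from the slide; the 10% subset fraction is a hypothetical value chosen only for illustration.

```python
# Sizes in GB, taken from the slide: production plus five full-size copies.
PRODUCTION_GB = 500
NON_PROD_ENVIRONMENTS = ["Training", "Unit Test", "System Test", "UAT", "Integration"]


def total_storage_gb(subset_fraction: float = 1.0) -> float:
    """Return production storage plus one copy per non-production environment.

    subset_fraction=1.0 means every environment is a full clone of production.
    """
    copies = len(NON_PROD_ENVIRONMENTS) * PRODUCTION_GB * subset_fraction
    return PRODUCTION_GB + copies


full_clones = total_storage_gb()        # the 3 TB total shown on the slide
with_subsets = total_storage_gb(0.10)   # assumption: each copy is a 10% subset

print(f"Full clones: {full_clones:.0f} GB")
print(f"10% subsets: {with_subsets:.0f} GB")
```

Under that assumed fraction, the five non-production copies shrink from 2,500 GB to 250 GB in total.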
Test Data Management (TDM): What & Why?
What?
• TDM refers to the need to manage data used in testing and other non-production environments
• Extract related subsets of production data that are targeted to functionality under test
• Edit data to create error and boundary conditions
• De-identify (mask) test data to protect privacy
• Compare “before” and “after” images of test data
Why?
• Your business can deploy new/improved enterprise applications faster without sacrificing quality
• increase revenue generation
• Your business can benefit from using IT resources more effectively
• reduce costs
• Your company can implement a reliable database upgrade
• ensure positive customer experience
Data Privacy Considerations
• Organizations need the ability to de-identify, mask and transform sensitive data
• Companies can apply a range of transformation techniques to substitute customer data with contextually-accurate but fictionalized data to produce accurate test results
• By masking personally-identifying information, you protect the privacy and security of confidential customer data, and support compliance with local, state, national, international and industry-based privacy regulations
What If I Don’t Do Anything?
• Infrastructure Costs – higher storage costs
• Cloning databases requires more storage hardware
• Larger databases could mean more license costs
• Higher staff costs
• Greater data volumes take longer to clone
• Greater data volumes equate to longer test cycles
• Defects can be expensive
• Costs to resolve defects in production can be 10 – 100 times greater than those caught in the development environment
• Privacy breaches
6/19/2009
The Symptoms of Poor Testing Strategies
• Management notices that new application functionality is delayed three months
• The business is unable to compete for customers because its software lacks “state-of-the-art” functionality
• The CFO is complaining about how high the IT budget has become to fix application defects
• Developers are sitting around waiting for their copy of the database to work with
TDM: Benefits to Key Stakeholders
CIO
• Speed time-to-market without sacrificing quality.
• Ensure consistent testing methodologies and reduce costs.
• Minimize threat of data breach.
VP, Line of Business
• Ensure a reliable, positive customer experience.
• Sustain or react to competitive situations quickly.
• Provide customers with sense of security.
Director of IT
• Populate realistic test data to improve testing and quality.
• Streamline testing processes for optimal environment.
• Consistent methodology for privatization of data.
Test Data Management – Building Blocks
[Diagram: the application cycle runs from Create/Modify Application through TEST to Go Production, supported by five numbered building blocks]
1. Create Test Environment
2. Privatization of Personal Information
3. Inspect and Edit Data to Test Error Routines
4. Compare Before/After Data
5. Refresh Test Data
Also shown: Archive Old Data; Correct Errors in Production Data
Environment Creation: Some Current Practices
#1 – Clone Production
[Diagram: Request for Copy -> Wait -> Clone Production -> Production Database Copy -> Changes -> After]
• Manual examination: Right data? What changed? Correct results? Unintended result? Someone else modify?
#2 – Write SQL
[Diagram: Write SQL -> Extract -> Changes -> After]
• Share test database with everyone else
• Complex
• Subject to change
• RI accuracy?
• Right data?
• Expensive, dedicated staff, ongoing responsibility
What is Subsetting?
[Diagram: Production or Production Clone feeding smaller Development, QA, Test and Training environments]
• Create targeted, right-sized test environments instead of cloning entire production environments
• Development environments are then more manageable, speeding the testing process!
“When performing the development upgrade, it is important to leverage a representative subset of production data instead of an exact copy; this is because the development environment usually has less capacity in both memory and hard drive space than the test and production environments. Limiting the size of the conversion files during the development upgrade will better ensure that the processes will complete in a timely manner.”
Testing Best Practices – Oracle
Tip #27—Test with a Representative Subset of Production Data
Test Environment Creation Using Subsetting
[Diagram: Production Environment -> Extract/Archive File -> Baseline Subset -> target environments]
• Target environments: Test (DB2 LUW / AIX), Dev (Oracle / Solaris), QA (Sybase / Linux)
• Dynamically load relationally intact data sets & objects based on selection criteria
Creating the Baseline Subset
[Diagram: Production Environment -> Baseline Subset]
2 Common Approaches:
• Clone production and truncate transactions
• Extract and seed common set up data
Extracting a Subset Using Templates
• Criteria can be based on one or more modules
• All Date Values
• Create Date
• Transaction Date
• Effective Date
• Organizations
• Status
• Order number(s)
• “And/Or” combinations
• More…
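As a rough illustration of criteria-driven subsetting, the sketch below uses Python's built-in `sqlite3` module. It selects parent rows by status and create date, then copies only the child rows that belong to them, so the subset stays referentially intact. The `orders` and `order_items` tables, their columns, and the `extract_subset` helper are hypothetical names invented for this example. This is not how any particular TDM product implements extraction.

```python
import sqlite3

# Hypothetical two-table schema: a parent table and one child table.
SCHEMA = """
    CREATE TABLE orders (order_no INTEGER PRIMARY KEY, status TEXT, create_date TEXT);
    CREATE TABLE order_items (item_id INTEGER PRIMARY KEY,
                              order_no INTEGER REFERENCES orders(order_no),
                              sku TEXT);
"""


def extract_subset(src, dst, where, params=()):
    """Copy parent rows matching the criteria, then only their child rows."""
    dst.executescript(SCHEMA)
    parents = src.execute(
        f"SELECT order_no, status, create_date FROM orders WHERE {where}", params
    ).fetchall()
    dst.executemany("INSERT INTO orders VALUES (?, ?, ?)", parents)

    children = []
    keys = [row[0] for row in parents]
    if keys:
        marks = ",".join("?" * len(keys))  # one placeholder per selected key
        children = src.execute(
            f"SELECT item_id, order_no, sku FROM order_items WHERE order_no IN ({marks})",
            keys,
        ).fetchall()
        dst.executemany("INSERT INTO order_items VALUES (?, ?, ?)", children)
    dst.commit()
    return len(parents), len(children)


# Stand-in for production: three orders, four order items.
src = sqlite3.connect(":memory:")
src.executescript(SCHEMA)
src.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, "OPEN", "2009-01-15"),
    (2, "CLOSED", "2009-02-01"),
    (3, "OPEN", "2008-11-30"),
])
src.executemany("INSERT INTO order_items VALUES (?, ?, ?)", [
    (10, 1, "A"), (11, 1, "B"), (12, 2, "C"), (13, 3, "D"),
])

# "And" combination of two criteria: status and create date.
dst = sqlite3.connect(":memory:")
counts = extract_subset(src, dst, "status = ? AND create_date >= ?", ("OPEN", "2009-01-01"))
print(counts)
```

Only order 1 satisfies both criteria, so the target receives one order and its two items. No item row is left without its parent.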
Data Masking
• Also known as: data de-identification, depersonalization, desensitization, obfuscation, data scrubbing
• Technology that helps conceal real data
• Scrambles data to create new, legible data
• Retains the data's properties, such as its width, type and format
• Common data masking algorithms include random, substring, concatenation, date aging
• Used in non-production environments as a Best Practice to protect sensitive data
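A minimal Python sketch of three of the algorithms named above (random, substring, date aging) might look like the following. The function names are invented for illustration, and the sample values are arbitrary. A real masking tool would offer many more options.

```python
import random
import string
from datetime import date, timedelta


def mask_digits(value: str, rng: random.Random) -> str:
    """Random masking: replace each digit, keep separators, width and format."""
    return "".join(rng.choice(string.digits) if ch.isdigit() else ch for ch in value)


def mask_substring(value: str, keep: int, fill: str = "X") -> str:
    """Substring masking: keep the first `keep` characters, overwrite the rest."""
    return value[:keep] + fill * max(len(value) - keep, 0)


def age_date(value: date, days: int) -> date:
    """Date aging: shift a date by a fixed number of days."""
    return value + timedelta(days=days)


rng = random.Random(42)  # fixed seed so repeated runs give the same masked value
masked_ssn = mask_digits("123-45-6789", rng)

print(masked_ssn)                       # same NNN-NN-NNNN shape, different digits
print(mask_substring("Winters", 1))     # WXXXXXX
print(age_date(date(2009, 6, 19), 30))  # 2009-07-19
```

Each function returns a value with the same type and width as its input, which is what lets the application under test keep working against masked data.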
Data Privacy – General Principles
• Do What Is Needed – But Not More
• Balance Costs vs. Data Breach Risks.
• Identify Company Best Practices
• Designate an internal champion
• Meet Regulatory/Legal Needs
• Government Regulations/Internal Privacy Policies
• Understand Application and Business Requirements
• Developers should be debugging the test application, not the test data. Data should be masked appropriately and consistently in the application
• Volume of Data – Independent Test Environments
• Use smaller test beds of data for frequent refreshes
Data Masking Techniques
• Lookup values
• Intelligence
• Arithmetic expressions
• Concatenated expressions
• Date aging
• String literal values
• Character substrings
• Random/sequential numbers

Example 1: Data is masked with contextually correct data to preserve integrity of test data

Client Information (two versions of the record shown on the slide)
Client No.  SSN          Name            Address            City    State  Zip
112233      123-45-6789  Amanda Winters  40 Bayberry Drive  Elgin   IL     60123
123456      333-22-4444  Erica Schafer   12 Murray Court    Austin  TX     78704

Example 2: Referential integrity is maintained with key propagation

Personal Info Table (original)
PersNbr  FirstName  LastName
08054    Alice      Bennett
19101    Carl       Davis
27645    Elliot     Flynn

Event Table (original)
PersNbr  FstNEvtOwn  LstNEvtOwn
27645    Elliot      Flynn
27645    Elliot      Flynn

Personal Info Table (masked)
PersNbr  FirstName  LastName
10000    Jeanne     Renoir
10001    Claude     Monet
10002    Pablo      Picasso

Event Table (masked)
PersNbr  FstNEvtOwn  LstNEvtOwn
10002    Pablo       Picasso
10002    Pablo       Picasso
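Key propagation can be sketched in Python using the Personal Info and Event table values from the slide. The helper names and the in-memory list-of-dicts representation are assumptions made for this example. The idea is that each original key is mapped to its replacement exactly once, and the same mapping is applied to every table that refers to it.

```python
# Rows as they appear on the slide before masking.
personal_info = [
    {"PersNbr": "08054", "FirstName": "Alice", "LastName": "Bennett"},
    {"PersNbr": "19101", "FirstName": "Carl", "LastName": "Davis"},
    {"PersNbr": "27645", "FirstName": "Elliot", "LastName": "Flynn"},
]
events = [
    {"PersNbr": "27645", "FstNEvtOwn": "Elliot", "LstNEvtOwn": "Flynn"},
    {"PersNbr": "27645", "FstNEvtOwn": "Elliot", "LstNEvtOwn": "Flynn"},
]
# Lookup values used as replacement names (taken from the slide).
lookup_names = [("Jeanne", "Renoir"), ("Claude", "Monet"), ("Pablo", "Picasso")]


def build_key_map(person_rows, first_new_key=10000):
    """Assign each original PersNbr a sequential replacement key."""
    return {row["PersNbr"]: f"{first_new_key + i:05d}" for i, row in enumerate(person_rows)}


def mask_people(person_rows, key_map, names):
    """Mask the parent table: new key plus a lookup name per row."""
    return [
        {"PersNbr": key_map[row["PersNbr"]], "FirstName": first, "LastName": last}
        for row, (first, last) in zip(person_rows, names)
    ]


def propagate(event_rows, key_map, masked_people):
    """Apply the same key mapping to the child table so it still joins to its parent."""
    by_key = {person["PersNbr"]: person for person in masked_people}
    masked = []
    for row in event_rows:
        person = by_key[key_map[row["PersNbr"]]]
        masked.append({
            "PersNbr": person["PersNbr"],
            "FstNEvtOwn": person["FirstName"],
            "LstNEvtOwn": person["LastName"],
        })
    return masked


key_map = build_key_map(personal_info)
masked_people = mask_people(personal_info, key_map, lookup_names)
masked_events = propagate(events, key_map, masked_people)

for row in masked_events:
    print(row["PersNbr"], row["FstNEvtOwn"], row["LstNEvtOwn"])
```

Both event rows come out as `10002 Pablo Picasso`, matching the masked Personal Info row for the person who was originally 27645.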
Browse and Edit Test Data
• You must be sure that all logic paths are tested
• BUT…
• Your production data may not contain all the needed test cases
• Errors
• Boundary conditions
• Unusual combinations of data
Editing Test Data
• Browse or edit referentially intact sets of data, from multiple related tables, simultaneously on one screen
• Dynamically “join” related tables and views, synchronously scroll related data, and edit the data displayed
• Create data values to test program logic
• Boundary conditions
• Error conditions
• Rare combinations of data
• Inspect and correct data that is causing problems
• Verify execution results
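One way to picture creating boundary and error conditions is to take a known-good row and vary a single column around its valid range. The sketch below does this in Python. The `quantity` column and its 1 to 100 range are hypothetical and were chosen only to make the example concrete.

```python
def boundary_values(minimum: int, maximum: int) -> list[int]:
    """Values just outside, on, and just inside each end of an inclusive range."""
    return [minimum - 1, minimum, minimum + 1, maximum - 1, maximum, maximum + 1]


def make_test_rows(template: dict, column: str, minimum: int, maximum: int) -> list[dict]:
    """Copy a known-good row once per boundary value and flag expected rejections."""
    rows = []
    for value in boundary_values(minimum, maximum):
        row = dict(template, **{column: value})
        # Out-of-range values are the error conditions the application should reject.
        row["expect_error"] = not (minimum <= value <= maximum)
        rows.append(row)
    return rows


# Assumption: quantity must be between 1 and 100 inclusive.
rows = make_test_rows({"order_no": 1, "quantity": 5}, "quantity", 1, 100)
for row in rows:
    print(row)
```

The six generated rows carry quantities 0, 1, 2, 99, 100 and 101, with the first and last flagged as error cases.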
Comparing Data
• Compare the "before" and "after" data from an application test
• Compare results after running modified application during regression testing
• Identify differences between separate databases
• Audit changes to a database
• Compare analyzes complete sets of data, finding changes in rows in tables
• Single-table or multi-table compare
• Creates compare file of results
• Displays results on screen
Analyzing Test Data Results
• Both Invoices total $100
• Composition is different
• Could we have missed an error?

INVOICES – Version 1
27645  86-4538  Widget#1     $80.00
27645  86-4538  Widget#PG13  $20.00
Invoice Total                $100.00

INVOICES – Version 2
27645  86-4538  Widget#1     $50.00
27645  86-4538  Widget#PG13  $50.00
Invoice Total                $100.00
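The invoice example shows why checking totals alone is not enough. A short Python sketch using the slide's figures makes the point: the totals agree, but a row-level compare on a match key finds both line items changed. Treating the first three columns as the match key is an assumption made for this example, since the slide does not name the columns.

```python
from decimal import Decimal

# Each row: (first column, second column, item, amount); values are from the slide.
version_1 = [
    ("27645", "86-4538", "Widget#1", Decimal("80.00")),
    ("27645", "86-4538", "Widget#PG13", Decimal("20.00")),
]
version_2 = [
    ("27645", "86-4538", "Widget#1", Decimal("50.00")),
    ("27645", "86-4538", "Widget#PG13", Decimal("50.00")),
]


def invoice_total(rows):
    """Sum the amount column."""
    return sum(row[3] for row in rows)


def changed_rows(before, after):
    """Match rows on the first three columns and report amounts that differ."""
    after_by_key = {row[:3]: row[3] for row in after}
    return [
        (row[:3], row[3], after_by_key[row[:3]])
        for row in before
        if row[:3] in after_by_key and after_by_key[row[:3]] != row[3]
    ]


print(invoice_total(version_1) == invoice_total(version_2))  # True: totals agree
diffs = changed_rows(version_1, version_2)
for key, old, new in diffs:
    print(key[2], old, "->", new)
```

A test that only asserted the invoice total would pass for both versions. The row-level compare reports that `Widget#1` and `Widget#PG13` both changed.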
Browsing the Compare File
• Generated for each pair of tables
• Identifies tables containing unmatched rows
• Identifies tables containing duplicate match keys
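The two conditions a compare file flags, unmatched rows and duplicate match keys, can be illustrated with a small Python sketch. The sample rows, the `id` match key and the function names are hypothetical. The sketch shows only the underlying checks, not the format of an actual compare file.

```python
from collections import Counter


def duplicate_keys(rows, key):
    """Return match-key values that occur more than once in a table."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def unmatched_keys(source_1, source_2, key):
    """Return keys found only in source 1 and keys found only in source 2."""
    keys_1 = {row[key] for row in source_1}
    keys_2 = {row[key] for row in source_2}
    return sorted(keys_1 - keys_2), sorted(keys_2 - keys_1)


# Hypothetical "before" and "after" copies of one table.
before = [{"id": 1, "status": "NEW"}, {"id": 2, "status": "NEW"}, {"id": 3, "status": "NEW"}]
after = [{"id": 2, "status": "DONE"}, {"id": 3, "status": "NEW"},
         {"id": 3, "status": "DONE"}, {"id": 4, "status": "NEW"}]

only_before, only_after = unmatched_keys(before, after, "id")
dupes = duplicate_keys(after, "id")

print("Only in before:", only_before)
print("Only in after:", only_after)
print("Duplicate match keys in after:", dupes)
```

Duplicate match keys matter because a row-by-row compare cannot pair rows unambiguously when a key occurs more than once.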
Easily Refresh Test Environments
[Diagram: Production Environment -> Extract/Archive File -> Baseline Subset -> target environments]
• Target environments: Test (DB2 LUW / AIX), Dev (Oracle / Solaris), QA (Sybase / Linux)
• Dynamically load relationally intact data sets & objects based on selection criteria
Additional TDM Features to Consider
• Compare Pre and Post Mask
• Extract File Browsing
• Schedule Jobs
• Command Line Interface
• Federated Data Access
• MetaData Extracts
• Re-startability
Benefits of Test Data Management
• Efficient creation and management of test environments
• Environment size reduction
• Replication time reduction
• Fewer users per environment through a segmented test process
• Increased accuracy of testing through fresher data
• Reduced time to conduct the tests
• More parallel testing possible
• Reusable tool and methodology
• Reduced risk to test data
• Reduced volume of exposed data.
• Reduced value of exposed data via masking
• Increased regulatory compliance
• Reduced risk of legal exposure
Why Do Something? TDM Saves Money
• Leading North American Financial Institution: Eliminated downtime associated with rebuilding test environments, for savings of up to $250,000 per year. Achieved more than $100,000 annual savings collectively for 10 to 15 projects.
• Large International Financial Services Group: Reduced the time needed to create a test environment by up to 90% (from 20 days to just 2 days). Improved time-to-deployment of new application functionality, contributing to critical business/financial initiatives.
• Leading Banking & Payment Technology Solutions: Reduced operational cost and improved efficiencies by reducing the size of the test database from 1.2TB to 24GB.
About IBM Optim
• Proven leader in Integrated Data Management (IDM):
• Manage and Control Data Growth
• Data Retention, Compliance & Discovery
• Speed Application Delivery & Quality with Test Data Management
• Speed Application Upgrades & Migrations
• Application Retirement
• Improve Storage Management – ILM
• Improve Application Performance and SLAs
• Solving complex data management issues since 1989
• Global company: 2500 clients; 50% of Fortune 500
• Recognized by Gartner, IDC, META as EDM industry leader with 46% market share.
Optim™ Solves the IDM Challenge
• Archiving
• Improve performance
• Control data growth, save storage
• Support retention compliance
• Streamline upgrades
• Test Data Management
• Create targeted, right sized test environments
• Improve application quality
• Speed iterative testing processes
• Data Privacy
• Mask confidential data
• Comply with privacy policies
• Application Migration & Retirement
• Maintain referential integrity
IBM Integrated Data Management
IBM Optim: Enterprise Architecture
Database Design, Development & Administration, Data Growth, Data Privacy, Test Data Management, Application Upgrades & Retirements, Data Retention & E-Discovery
Trademarks and disclaimers
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. UNIX is a registered trademark of The Open Group in the United States and other countries. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others. Information is provided "AS IS" without warranty of any kind.
The customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.
Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products.
All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here.
Prices are suggested U.S. list prices and are subject to change without notice. Starting price may not include a hard drive, operating system or other features. Contact your IBM representative or Business Partner for the most current pricing in your geography.
Photographs shown may be engineering prototypes. Changes may be incorporated in production models.
© IBM Corporation 1994-2008. All rights reserved.
References in this document to IBM products or services do not imply that IBM intends to make them available in every country.
Trademarks of International Business Machines Corporation in the United States, other countries, or both can be found on the World Wide Web at http://www.ibm.com/legal/copytrade.shtml.