Data Warehouse Testing Best practices to improve and sustain Data Quality – Getting ready for Serious DevOps
Ajay Nalabhatla, QA Lead
Srihari Gopisetty, Technology Manager
Wells Fargo India Solutions

Source: qaistc.com/2017/wp-content/uploads/2017/12/ajay_srihari.pdf

Aug 27, 2018

Abstract

In the age of digital disruption, every organization wants to transform its technology arm by adopting advanced practices such as DevOps to fulfill the continuous demand from the business.

However, all organizations are data driven, and they need to realize that success relies not only on faster throughput and speed but also on the ability to access diverse, large volumes of complex data in real time to make strategic decisions.

The important question is: does the organization have 'quality' data?

“On average, U.S. organizations believe 32% of their data is inaccurate” - Gartner
“Average organization loses $8.2 million annually due to poor Data Quality.” - Experian
“Less than 0.5% of all data is ever analyzed” - Forrester

Even as many organizations are establishing Data Warehouse Testing as a specialized service, recent surveys indicate that much more improvement is needed. It is a call to action for organizations to address Data Quality gaps.

Setting the context

Data are of high quality "if they are fit for their intended uses in operations, decision making and planning." - J. M. Juran (Source: dqglossary.com)

Data Quality Dimension:
• Validity
• Completeness
• Timeliness
• Integrity
• Accuracy
• Consistency

Issue Drivers:
• Unavailability of Complete Data
• ETL Transformation
• Delayed Batch SLA
• Batch Performance
• Obsolete Jobs & Records

QA Key Causes:
• No Exhaustive Validation
• Missing Defined Test Strategies
• Lack of Tools / Accelerators
• Incomplete DB Objects Validation
• Missing End-to-End QA Framework
• Missing Standard Process

REMEMBER: Poor Data Quality = Use of Less Information for Decision Making
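Two of the dimensions named on this slide, completeness and validity, can be measured directly on a column. The following is a minimal sketch, not the authors' tooling: the row layout, the column names and the branch-code rule are all hypothetical and exist only for illustration.

```python
from typing import Callable, Optional

Row = dict[str, Optional[str]]


def completeness(rows: list[Row], column: str) -> float:
    """Share of rows where `column` is populated (not None and not blank)."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if (r.get(column) or "").strip())
    return filled / len(rows)


def validity(rows: list[Row], column: str, rule: Callable[[str], bool]) -> float:
    """Share of populated values in `column` that satisfy `rule`."""
    values = [r[column] for r in rows if (r.get(column) or "").strip()]
    if not values:
        return 0.0
    return sum(1 for v in values if rule(v)) / len(values)


# Hypothetical sample rows; "branch" is an assumed column name.
rows = [
    {"account_id": "A1", "branch": "HYD"},
    {"account_id": "A2", "branch": ""},
    {"account_id": "A3", "branch": "hyd1"},
    {"account_id": "A4", "branch": None},
]

# Assumed rule: a valid branch code is all upper-case letters.
branch_completeness = completeness(rows, "branch")
branch_validity = validity(rows, "branch", lambda v: v.isalpha() and v.isupper())
print(branch_completeness, branch_validity)
```

Tracking such ratios per load makes a drop in data quality visible as a number rather than as a downstream complaint.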

Where, What, Why

[Figure: data flows from heterogeneous sources (internal and external; XML, EBCDIC, ASCII, DB, extracts) through ETL into Staging/ODS, and through ETL again into the Data Warehouse (tables, views, other DB objects). The warehouse feeds BI reports, extracts and downstream applications. Other labels: OLTP, OLAP. Legend: High, Medium, Low.]

High Level Tests:
• Static Testing
• ETL Transformation Testing
• Staging/ODS Validation
• Data Warehouse Validation
• Data Quality & Objects Validation
• Batch Performance
• BI Testing / Extracts

QA Framework – For High Quality

• Exhaustive validation at every intermediate checkpoint
• Data integrity validation: RI checks etc.
• Heterogeneous sources validation: XML / ASCII / EBCDIC
• Database privileges validation at table / view / report level
• Runbook and scheduler / dependency validations
• Database objects validation: partitions, synonyms, flashback etc.
• Batch performance execution
• BI reports: UI, data and performance validation
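The referential integrity (RI) check in this framework can be expressed as an orphan-row query. The sketch below assumes an in-memory SQLite database as a stand-in for the warehouse, and the table and column names (`dim_customer`, `fact_txn`, `customer_id`) are hypothetical.

```python
import sqlite3


def find_orphans(conn, child, fk, parent, pk):
    """Return child foreign-key values that have no matching parent row.

    Identifiers are interpolated into the SQL, so they must come from
    trusted test configuration, never from user input.
    """
    sql = (
        f"SELECT c.{fk} FROM {child} c "
        f"LEFT JOIN {parent} p ON c.{fk} = p.{pk} "
        f"WHERE c.{fk} IS NOT NULL AND p.{pk} IS NULL "
        f"ORDER BY c.{fk}"
    )
    return [row[0] for row in conn.execute(sql)]


# Hypothetical dimension and fact tables for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_txn (txn_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO dim_customer VALUES (1, 'A'), (2, 'B');
    INSERT INTO fact_txn VALUES (10, 1, 5.0), (11, 2, 7.5), (12, 3, 1.0);
""")

orphans = find_orphans(conn, "fact_txn", "customer_id", "dim_customer", "customer_id")
print(orphans)  # customer 3 has a transaction but no dimension row
```

An empty result means every fact row resolves to a dimension row; any returned value is a candidate defect.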

Specialized Testing

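Database Object Validation - Partition

[Figure: a database table shown from the users' view and the DBA's view. Daily loads (Day 2, Day 3 ... Day 31) land in day-wise partitions inside the table, are merged monthly, and are purged after more than 13 months. Timeline labels: Jan'17, Feb'17, Dec'17, Feb'18.]

Test strategy and test flow labels: 1st time load from source; daily loads; merge @ monthly; purge > 13 months; regression interval; day-wise partition.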
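The purge rule on this slide (partitions older than 13 months are removed) can be checked with simple date arithmetic. This is a sketch under stated assumptions: partitions are monthly, each is identified by the first day of its month, and "older than 13 months" is read as strictly more than 13 calendar months before the as-of month. The function names are illustrative.

```python
from datetime import date


def months_between(older: date, newer: date) -> int:
    """Whole calendar months from `older` to `newer` (day of month ignored)."""
    return (newer.year - older.year) * 12 + (newer.month - older.month)


def partitions_to_purge(partition_months, as_of, retention_months=13):
    """Monthly partitions that fall outside the retention window."""
    return [m for m in partition_months if months_between(m, as_of) > retention_months]


# Monthly partitions Jan'17 through Feb'18, matching the slide's timeline labels.
partitions = [date(2017, m, 1) for m in range(1, 13)] + [date(2018, 1, 1), date(2018, 2, 1)]

# Assumed as-of month for the check: March 2018.
stale = partitions_to_purge(partitions, as_of=date(2018, 3, 1))
print(stale)
```

A test can then assert that none of the partitions returned here still exist in the table after the purge job has run.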

Automation Possibility

Parameter | Home Grown Tools | External Tools
Static Testing | Limited | Limited
Source File: Metadata / Layout / Fields Order | Yes: Macro / UNIX Shell | Yes
Exhaustive Validation: Diff Server DBs | Yes: Macro / UNIX Shell | Yes for Both
Batch Job execution in sequence | Yes: UNIX | No
Heterogeneous File Load & Comparison (ASCII / XML / EBCDIC) | Yes: ASCII / XML only | Yes
Regression Testing: Tables / Extracts | Yes: Macro / UNIX | Yes
Data Quality checks | Yes: Macro / UNIX | Yes
Table Metadata validation | Yes: Macro / UNIX | Yes
BI Reports Validation (Data / Graphs) | Yes: Only Data | Yes
Batch Performance Testing | Yes: UNIX | Yes
Views Validation | Yes: Macro using ODBC / UNIX | Yes
Partition / Index validation | No | No
Test Cases Batch Execution | Yes: UNIX / Excel | Yes
Automate Test Execution Scheduler | Yes: UNIX | Yes

Market Tools = Automation possibility
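The table notes that home-grown tools covered ASCII and XML but not EBCDIC in heterogeneous file comparison. As a rough illustration of what that comparison involves, the sketch below decodes fixed-length records from both encodings and diffs them. It assumes code page `cp037` (one common EBCDIC variant shipped with Python) and a hypothetical 10-byte record layout; real mainframe extracts may use other code pages or packed-decimal fields, which this does not handle.

```python
def decode_records(raw: bytes, encoding: str, record_length: int) -> list[str]:
    """Split a fixed-length-record file and decode each record to text."""
    if len(raw) % record_length:
        raise ValueError("file size is not a multiple of the record length")
    return [
        raw[i:i + record_length].decode(encoding).rstrip()
        for i in range(0, len(raw), record_length)
    ]


def diff_records(left: list[str], right: list[str]) -> list[tuple[int, str, str]]:
    """Return (index, left, right) for every position where the records differ."""
    if len(left) != len(right):
        raise ValueError(f"record counts differ: {len(left)} vs {len(right)}")
    return [(i, a, b) for i, (a, b) in enumerate(zip(left, right)) if a != b]


# Hypothetical 10-byte records: 4-digit id followed by a padded name.
ascii_file = b"0001ALICE 0002BOB   "
ebcdic_file = "0001ALICE 0002BOB   ".encode("cp037")

mismatches = diff_records(
    decode_records(ascii_file, "ascii", 10),
    decode_records(ebcdic_file, "cp037", 10),
)
print(mismatches)
```

Decoding both sides to text before comparing keeps the diff logic independent of the source encoding.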

Integrate the ETL Testing in DevOps

[Figure: usual architecture for ETL, with the test stages Static Testing, ETL/DW Testing and Batch Performance Testing.]

Demo Overview

[Figure: CI, CT and CD stages covering Static Testing, ETL/DW Testing and Batch Performance Testing.]

Sources & ETL Jobs

[Screenshot labels: ETL jobs; shell script; SQL test cases.]

Data Validation Test case

[Screenshot labels: MySQL; test .SQL file; .SQL in script.]
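A data validation test case of the kind shown on this slide typically reconciles a target table against its source. The slide's actual SQL is not recoverable from the transcript, so the following is only a sketch: it uses SQLite in place of MySQL, hypothetical table names (`stg_orders`, `dw_orders`), and a row-count plus column-total comparison as the validation rule.

```python
import sqlite3


def reconcile(conn, source, target, amount_col):
    """Compare row count and column total between a source and a target table.

    Table and column names are interpolated, so they must come from
    trusted test configuration.
    """
    def profile(table):
        return conn.execute(
            f"SELECT COUNT(*), COALESCE(SUM({amount_col}), 0) FROM {table}"
        ).fetchone()

    src, tgt = profile(source), profile(target)
    return {"source": src, "target": tgt, "passed": src == tgt}


# Hypothetical staging and warehouse tables; amounts are stored in whole cents.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_orders (order_id INTEGER, amount_cents INTEGER);
    CREATE TABLE dw_orders  (order_id INTEGER, amount_cents INTEGER);
    INSERT INTO stg_orders VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO dw_orders  VALUES (1, 100), (2, 200);
""")

result = reconcile(conn, "stg_orders", "dw_orders", "amount_cents")
print(result)  # the warehouse is missing one order, so the check fails
```

Count and sum checks are cheap first-line tests; they do not replace row-level comparison, since offsetting errors can leave both totals unchanged.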

Jenkins Dashboard

Jenkins – First job Creation

Adding Subsequent job(s)

ETL Test case Execution Job

Dependency Scheduling for Pipeline
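Dependency scheduling amounts to ordering jobs so that each one runs only after its predecessors. The sketch below shows that idea with Python's standard `graphlib` module (Python 3.9+) rather than Jenkins itself; the job names and the runbook mapping are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(runbook: dict[str, set[str]]) -> list[str]:
    """Return a job order in which every job runs after its predecessors."""
    try:
        return list(TopologicalSorter(runbook).static_order())
    except CycleError as exc:
        # args[1] holds the list of jobs that form the cycle.
        raise ValueError(f"runbook has a circular dependency: {exc.args[1]}") from exc


# Hypothetical runbook: each job maps to the set of jobs it depends on.
runbook = {
    "extract_sources": set(),
    "load_staging": {"extract_sources"},
    "load_warehouse": {"load_staging"},
    "run_sql_tests": {"load_warehouse"},
    "refresh_reports": {"load_warehouse"},
}

order = execution_order(runbook)
print(order)
```

Jobs with no dependency between them (here the SQL tests and the report refresh) may appear in either order, so a test should assert relative positions, not one exact sequence.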

GitHub – Add WebHook with Jenkins

Jenkins – GitHub for Automatic Triggering

Jenkins Scheduling using ‘Poll’ Feature

GitHub: add files to GitHub

Pipeline – For Sequence Execution

Batch Execution Start

Batch Execution Finish

Green =Success
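The behaviour described here (stages run in sequence, and green means success) can be illustrated outside Jenkins with a small runner that stops at the first failing stage. This is a sketch of the concept only; the stage callables are placeholders and the status names are assumptions, not Jenkins output.

```python
import time
from typing import Callable


def run_pipeline(stages: list[tuple[str, Callable[[], bool]]]) -> list[dict]:
    """Run stages in order; after the first failure, mark the rest as skipped."""
    results = []
    failed = False
    for name, stage in stages:
        if failed:
            results.append({"stage": name, "status": "SKIPPED", "seconds": 0.0})
            continue
        started = time.perf_counter()
        try:
            ok = bool(stage())
        except Exception:
            # A stage that raises is treated the same as one that returns False.
            ok = False
        elapsed = time.perf_counter() - started
        results.append(
            {"stage": name, "status": "SUCCESS" if ok else "FAILURE", "seconds": elapsed}
        )
        failed = not ok
    return results


# Placeholder stages named after the three test phases in the deck.
results = run_pipeline([
    ("Static Testing", lambda: True),
    ("ETL/DW Testing", lambda: False),
    ("Batch Performance Testing", lambda: True),
])
for r in results:
    print(r["stage"], r["status"])
```

Recording the elapsed time per stage gives the raw material for the batch performance comparison later in the deck.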

Console output for jobs

Final pipeline output

Test Results document on Jenkins

Test Results on Linux

Build Execution History

- Useful for Batch performance testing
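One way build history can serve batch performance testing is by comparing the latest run duration against a baseline from earlier builds. The sketch below assumes the durations have already been collected as plain numbers in seconds; the 20% tolerance is an arbitrary illustrative threshold, not a figure from the deck.

```python
from statistics import mean


def is_regression(history_seconds: list[float], latest_seconds: float,
                  tolerance: float = 0.20) -> bool:
    """True when the latest run exceeds the historical mean by more than `tolerance`."""
    if not history_seconds:
        raise ValueError("at least one historical run is required")
    baseline = mean(history_seconds)
    return latest_seconds > baseline * (1 + tolerance)


# Hypothetical durations (seconds) of three earlier builds and the newest one.
history = [600, 620, 580]
slow_run = is_regression(history, 750)
normal_run = is_regression(history, 700)
print(slow_run, normal_run)
```

A mean baseline is the simplest choice; a median or a percentile would be less sensitive to one unusually slow historical build.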

DevOps Readiness

Prerequisites:
• Thorough knowledge of the 'Line of Business' data flow
• Runbook availability, following predecessors and successors
• Identification of a suitable test approach based on project type
• Availability of test data that represents all needs
• Alternative analysis to avoid table-unusable-state issues
• Ensure table referential integrity is addressed
• Availability of the TDM team to refresh tables to their previous state in case of failures
• Batch performance SLA prediction

Focus on culture, process and tools.

Benefits

• Improved business confidence in quality
• Blueprint that helps companies to gear up
• Possible faster iteration, quick feedback, great collaboration
• Low-priced automation possibilities
• Insights on various database object validations
• Early defect detection

Author Biography

Ajay Nalabhatla works as a Data Management QA Lead at Wells Fargo for a Line of Business. He has more than 11 years of experience in Quality Assurance for ETL, DWT and BI testing. Over a decade, he has been involved in various DWT projects for different Banking & Securities clients and delivered them successfully. He has also conducted due diligence for various DWT clients and suggested many improvements in both process and technical competencies.

Ajay holds a Bachelor of Engineering in Electronics & Communication from Anna University.

Srihari Gopisetty manages the Data Management and Digital Advisory teams for a Line of Business at Wells Fargo India Solutions. He has more than 17 years of experience leading teams in the BFSI domain and on Microsoft products. Prior to Wells Fargo, he worked with Microsoft and First Advantage.

Srihari holds a Bachelor of Engineering in Mechanical Engineering from Gulbarga University.

APPENDIX

Test Approach - Data Migration

Pre-Migration
• Analyze the DB and identify the objects: tables, indexes, views
• Segregate into a wave-wise plan: forklift, consolidation, static and dynamic tables / views
• Take the snapshot for post-migration comparison
• Prioritize / create batches and collect pre-run stats

Post-Migration
• Compare all DB objects between legacy and new
• Validate data for all tables identified for migration between legacy and new
• Validate that the new tables' transformation rules are as per specs
• Parallel load comparison for tables between legacy and new
• Execute batch performance testing and compare stats with legacy

Steady State
• Steady-state validations for monthly loads
• Downstream apps support
• Validate that the purging process is as expected for the new system
• Performance monitoring for data load
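The pre-migration snapshot and the post-migration data comparison can be reduced to a keyed diff between two row sets. The following is a minimal sketch that assumes both snapshots fit in memory as lists of dictionaries and share a unique key column; the column names and values are hypothetical.

```python
def compare_snapshots(legacy: list[dict], new: list[dict], key: str) -> dict:
    """Diff two table snapshots on a unique key column."""
    legacy_by_key = {row[key]: row for row in legacy}
    new_by_key = {row[key]: row for row in new}
    shared = legacy_by_key.keys() & new_by_key.keys()
    return {
        "missing_in_new": sorted(legacy_by_key.keys() - new_by_key.keys()),
        "extra_in_new": sorted(new_by_key.keys() - legacy_by_key.keys()),
        "changed": sorted(k for k in shared if legacy_by_key[k] != new_by_key[k]),
    }


# Hypothetical snapshots of the same table before and after migration.
legacy_rows = [
    {"id": 1, "balance": 100},
    {"id": 2, "balance": 250},
    {"id": 3, "balance": 75},
]
new_rows = [
    {"id": 1, "balance": 100},
    {"id": 2, "balance": 205},
    {"id": 4, "balance": 10},
]

report = compare_snapshots(legacy_rows, new_rows, key="id")
print(report)
```

For tables too large to hold in memory, the same three categories can be computed in the database with set operations or by comparing per-row checksums.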

Infrastructure Upgrade Test Strategy - ETL tool / Scheduler tool upgrade

Job Types
• Direct SQL processing
• ETL jobs
• Stored procedure jobs
• File watcher / legacy jobs using C/C++

Test Strategy

Pre Migration
• Identify the various types of jobs
• Identify the priority jobs for phase-wise execution
• Build a runbook for the phases with all dependencies
• Collect the batch statistics
• Take snapshots of the relevant table data

Post Migration
• Execute the identified jobs on the upgraded system
• Validate the batch / job performance against the pre-migration stats
• Regression testing for tables
• Parallel load processing and data validation
• Validate job dependencies and predecessors

Steady State
• Batch performance monitoring
• Analyse the failures to understand whether they are upgrade related
• Steady-state support for downstream applications
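Validating job dependencies after a scheduler upgrade can be done by comparing the predecessor relationships recorded before and after. The sketch below assumes each runbook has been exported as a mapping from job name to its set of predecessors; the job names are hypothetical and the export step itself is not shown.

```python
def dependency_edges(runbook: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Flatten a runbook into (predecessor, job) pairs."""
    return {(pred, job) for job, preds in runbook.items() for pred in preds}


def dependency_drift(pre: dict[str, set[str]], post: dict[str, set[str]]) -> dict:
    """Report dependencies that were dropped or added by the upgrade."""
    before, after = dependency_edges(pre), dependency_edges(post)
    return {"dropped": sorted(before - after), "added": sorted(after - before)}


# Hypothetical runbooks captured before and after the upgrade.
pre_upgrade = {
    "load_staging": {"extract"},
    "load_dw": {"load_staging"},
    "reports": {"load_dw"},
}
post_upgrade = {
    "load_staging": {"extract"},
    "load_dw": {"load_staging"},
    "reports": {"load_staging"},  # predecessor changed during the upgrade
}

drift = dependency_drift(pre_upgrade, post_upgrade)
print(drift)
```

An empty `dropped` and `added` list indicates the upgraded scheduler preserved every predecessor relationship from the original runbook.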

Thank You!!!
