BI Error Processing Framework

May 22, 2015

Page 1: BI Error Processing Framework

Target Corporation

BI Framework Error Processing

Mohan.Kumar2

Page 2: BI Error Processing Framework

Table of Contents

1. Exception Handling Overview (ref 2.5.2) ....................................................................................... 3

1.1. Data Reprocessing ......................................................................................................................... 5

1.2. Infrastructure Exception Handling ................................................................................................ 7

1.3. Data Correction in DWH ................................................................................................................ 9

2. Error Processing – High Level .............................................................................................................. 11

2.1. Capturing ..................................................................................................................................... 11

2.2. Error threshold ............................................................................................................................ 11

2.3. Purging ........................................................................................................................................ 12

2.3.1. Landing Area ....................................................................................................................... 12

2.3.2. Staging Area ........................................................................................................................ 12

2.3.3. EDW ..................................................................................................................................... 12

2.3.4. Datamart ............................................................................................................................. 12

2.4. Purge threshold ........................................................................................................................... 12

2.5. Appendix ..................................................................................................................................... 12

2.5.1. About Target ....................................................................................................................... 12

2.5.2. Reference ............................................................................................................................ 13

2.5.3. Other Contributors .............................................................................................................. 13

Page 3: BI Error Processing Framework

1. Exception Handling Overview (ref 2.5.2)

Exception Handling deals with any abnormal termination, unacceptable event or incorrect data that can impact the data flow or accuracy of data in the warehouse/mart.

Exceptions in ETL could be classified as Data Related Exceptions and Infrastructure Related Exceptions.

Please Note: Temporary infrastructure glitches are not classified as exceptions, as they are usually resolved by the time the job(s) are rerun. However, the logs are still tracked and maintained.

The process of recovering or gracefully exiting when an exception occurs is called exception handling.

Page 4: BI Error Processing Framework

Data related exceptions are caused by incorrect data formats, incorrect values, or incomplete data from the source system. These lead to data validation exceptions and Data Rejects. The process of handling the Data Rejects is called Data Reprocessing.

Page 5: BI Error Processing Framework

Infrastructure related exceptions are caused by issues in the network, the database, or the operating system. Common infrastructure exceptions are FTP failure, database connectivity failure, a full file system, etc.

Data related exceptions are usually documented in the requirements; if not, they must be, because unhandled data related exceptions lead to inaccurate data in the warehouse/mart. We also keep a threshold on the maximum number of validation or reject failures allowed per load. Exceeding the threshold would mean the data is too inaccurate because of too many rejections.

There is one more exception, which is the presence of inaccurate or incorrect data in the warehouse. This could happen due to:

1. Incorrect or missed requirements, leading to incorrect ETL.
2. Incorrect interpretation of requirements, leading to incorrect ETL.
3. Uncaught coding defects.
4. Incorrect data from the source.

Correcting data already loaded in the warehouse involves both fixing the loaded data and preventing the inaccuracy from persisting in the future.

1.1. Data Reprocessing

Reprocessing is an exception handling process which involves correcting the data that could not be loaded into the warehouse/mart. There could be many reasons why source data gets rejected from the DWH. The most common of them are:

Data Rejection - Source data not matching critical business codes/attributes. This is called Lookup Failure in ETL.

Data Cleansing - Source data containing junk values for business critical fields, hence getting rejected during data validation.

The rejected records can be dealt with in two ways: we can leave the rejected data out of the DWH, or, if the rejected field is critical to the business and worth reprocessing, correct it and then load it into the DWH. The process of correcting the rejected data and then loading it into the DWH is called Data Reprocessing.

Page 6: BI Error Processing Framework

As depicted in the figure above, we reject data during the data validation, data cleansing, and data transformation processes. The rejected data is collected in temporary files on the ETL server while the ETL is running. Once the ETL is complete, the rejected data is moved into the Landing Area. The end user and the business analyst are provided interfaces to read the rejected data in the landing area. They take this as input, analyze the cause of rejection, and correct the data at the source itself. Once the data is corrected at the source, it is extracted again (depicted by the brown line in the figure). The corrected data is not expected to get rejected again unless the correction provided was insufficient.
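
To illustrate this flow, here is a minimal sketch in Python of collecting rejects into a temporary file during the run and publishing them to the landing area once the ETL completes. The file paths, file layout, and helper names are assumptions for illustration, not part of the framework itself.

```python
import csv
import shutil
from pathlib import Path

# Hypothetical locations; the real ETL server paths would come from the job configuration.
TEMP_REJECT_FILE = Path("/tmp/etl_run/rejects_sales_order.csv")
LANDING_AREA_DIR = Path("/data/landing_area/rejects")

def write_reject(record: dict, reason: str) -> None:
    """Append a rejected record, with its rejection reason, to the temporary reject file."""
    TEMP_REJECT_FILE.parent.mkdir(parents=True, exist_ok=True)
    with TEMP_REJECT_FILE.open("a", newline="") as fh:
        csv.writer(fh).writerow(list(record.values()) + [reason])

def publish_rejects_to_landing_area() -> None:
    """Once the ETL is complete, move the collected rejects into the landing area."""
    if TEMP_REJECT_FILE.exists():
        LANDING_AREA_DIR.mkdir(parents=True, exist_ok=True)
        shutil.move(str(TEMP_REJECT_FILE), str(LANDING_AREA_DIR / TEMP_REJECT_FILE.name))
```

The analyst interfaces would then read these files in the landing area to analyze and correct the rejects at the source.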

Page 7: BI Error Processing Framework

In some business critical data warehouses which have very low tolerance for inaccurate data, we need a sophisticated and fast mechanism for handling rejected data in the landing area. Here we consider a database to land the data. The database schema is the same as that of the source files/tables, with two additional columns: one to flag whether the record got rejected in the ETL, and the other to identify the date when the data was sent by the source system. Having a database gives us the option of easily creating applications to access and update the data in the landing area. Please note that adding a database in the landing area adds infrastructure and maintenance costs. Adding the database would also increase the number of steps in the extraction process, thereby affecting the performance of the ETL.
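
As an illustration, below is a minimal sketch of such a landing table, assuming a hypothetical source entity named sales_order, generic column names, and SQLite as a stand-in for the actual landing database. The two extra audit columns follow the description above.

```python
import sqlite3  # stand-in for the actual landing-area database

# Hypothetical landing table mirroring a source entity as-is, plus the two audit columns.
LANDING_DDL = """
CREATE TABLE IF NOT EXISTS lnd_sales_order (
    order_id         TEXT,               -- columns mirror the source file/table
    order_amount     TEXT,
    store_id         TEXT,
    is_rejected      INTEGER DEFAULT 0,  -- 1 if the record was rejected by the ETL
    source_sent_date TEXT                -- date the data was sent by the source system
);
"""

def create_landing_table(db_path: str) -> None:
    """Create the landing table if it does not already exist."""
    with sqlite3.connect(db_path) as conn:
        conn.execute(LANDING_DDL)

if __name__ == "__main__":
    create_landing_table("landing_area.db")
```

Applications for the end users and business analysts can then query and update this table directly, which is the main advantage over flat files.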

1.2. Infrastructure Exception Handling

Infrastructure related exceptions are caused by issues in network connectivity, database operations, and the operating system.

Common Infrastructure exceptions are

Page 8: BI Error Processing Framework

Database errors like DB connection errors, referential integrity constraint failures, primary key constraint failures, incorrect credentials, data type mismatches, and NULLs in NOT NULL fields.

Network connection failures causing FTP failures.

Operating system issues on the ETL server causing aborts due to insufficient memory, un-mounted file systems, a full file system, 100% CPU utilization, or incorrect file/directory permissions.

The diagram below depicts the exceptions and the process to handle them.

The above-mentioned exceptions are generally detected by the ETL scheduler, which checks whether the ETL process returned a non-zero value.

If an exception occurs, we make a log entry, send emails or alerts to notify the users that the ETL process has aborted, and exit to the operating system with a non-zero value.

The notification process alerts the IS team to take appropriate action so that the ETL process can be restarted once the infrastructure issue is resolved.
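
A minimal sketch of this pattern is shown below, assuming a hypothetical ETL step callable, hypothetical email addresses, and a local SMTP relay; the actual scheduler, logging, and alerting mechanisms would be whatever the platform provides.

```python
import logging
import smtplib
import sys
from email.message import EmailMessage

logging.basicConfig(filename="etl.log", level=logging.INFO)

def notify_failure(step_name: str, error: Exception) -> None:
    """Alert the IS team so the ETL can be restarted once the infrastructure issue is resolved."""
    msg = EmailMessage()
    msg["Subject"] = f"ETL step '{step_name}' aborted"
    msg["From"] = "etl@example.com"       # hypothetical addresses
    msg["To"] = "is-team@example.com"
    msg.set_content(f"The ETL step '{step_name}' aborted with: {error}")
    with smtplib.SMTP("localhost") as smtp:  # assumes a local mail relay
        smtp.send_message(msg)

def run_with_exception_handling(step_name: str, etl_step) -> None:
    """Log, notify, and exit non-zero so the scheduler detects the abort."""
    try:
        etl_step()
    except Exception as exc:              # e.g. DB connection or FTP failure
        logging.error("ETL step %s aborted: %s", step_name, exc)
        notify_failure(step_name, exc)
        sys.exit(1)                       # non-zero value picked up by the ETL scheduler
```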

Page 9: BI Error Processing Framework

1.3. Data Correction in DWH

The data in the DWH could be incorrect or inaccurate due to a variety of reasons, mainly:

1. Incorrect or missed requirements, leading to incorrect ETL.
2. Incorrect interpretation of requirements, leading to incorrect ETL.
3. Uncaught coding defects.
4. Incorrect data from the source.

Reasons 1, 2, and 3 would require us to revisit the ETL code with respect to the incorrect requirements, missed requirements, and uncaught defects.

The figure below depicts the process to be followed to correct the data already loaded in DWH.

Detection

The most important step is the detection of inaccurate or incorrect data in the DWH. Incorrect data loaded in the DWH is usually detected long after it has been loaded, when some end user identifies it in a report.

Analysis

Once reported, we analyze the report and its metadata. This requires understanding the report metadata, the calculations, and the SQL generated by the report.

Page 10: BI Error Processing Framework

If there is no issue in the report definition, we analyze the data in the DWH. Once we have pinpointed the table, the attributes, and the data where the inaccuracy lies, we perform a root cause analysis of the inaccuracy.

The root cause analysis requires us to check the data against the requirements, design, and code. The root cause helps us identify the next course of action.

Missing Requirements - If the root cause is missing requirements, then we go to the users and get the complete requirements.

Misinterpretation of Requirements - Here too we go to the end users and clarify the misinterpreted requirement.

Defect in the code - There is a possibility of bugs going undetected during the testing phase. An undetected bug could cause inaccuracy in the data.

Correction Process

In case of missing requirements,

1. Get the new requirements from the users.
2. Document the new requirements.
3. Design the new ETL.
4. Code the new ETL.
5. Test the new ETL.
6. Take the DWH offline.
7. Perform the history load for the new requirements. This is possible only when we have added new tables or new attributes in the data model.
8. Check the reports against the new requirements.
9. If the reports are correct, then implement the new ETL into the regular ETL.
10. Perform the catch-up load for the duration the DWH was offline.
11. Bring the DWH online.

In case of misinterpreted requirements or undetected bugs,

1. Analyze the ETL and identify the changes in it.
2. Update the design.
3. Correct the code.
4. Test the code.
5. Create a patch to correct the historical data (data already in the DWH).
6. Test the patch.
7. Take the DWH offline.
8. Run the patch.
9. Check the reports for the correction.
10. If the reports are correct, then implement the corrected ETL.
11. Perform the catch-up load for the duration the DWH was offline.
12. Bring the DWH online.

Page 11: BI Error Processing Framework

2. Error Processing – High Level

The error processing in Target follows a consistent framework covering the capture of rejected records, error thresholds, and purging, described at a high level below.

2.1. Capturing

Data from all the various source systems is dumped into the landing area as-is. All records in the landing area are initially marked as valid during the load.

On a given schedule, the records are processed from the landing area to the staging area, and all the business validations are executed on these records. Once the staging load is finished, all records which have not been loaded into the staging area are marked as invalid in the landing area.

Information about all the rejected records is stored in the error tables along with an error code. A separate reference table holds the descriptions of the error codes.

Depending on the table(s), we may have multiple business validations for each record, and could therefore end up with multiple entries in the error table(s) for a given source record.

Records which have been marked as invalid will be reprocessed in every staging load until they are purged or until a corrected record is sent from the source.
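
The sketch below illustrates the capture step, reusing the hypothetical lnd_sales_order landing table from the earlier sketch and assuming hypothetical staging (stg_sales_order), error (err_sales_order), and dimension (dim_store) tables. Each failed validation writes one row to the error table, so a record can appear there multiple times, once per failed rule.

```python
import sqlite3

def load_staging(conn: sqlite3.Connection, load_date: str) -> None:
    """Validate landing records, record rejects with error codes, and load the rest to staging."""
    validations = {
        "ERR001": "order_amount IS NULL OR order_amount = ''",         # missing amount
        "ERR002": "store_id NOT IN (SELECT store_id FROM dim_store)",  # lookup failure
    }
    for error_code, failing_condition in validations.items():
        # One error-table entry per record per failed business validation.
        conn.execute(
            f"""INSERT INTO err_sales_order (order_id, error_code, load_date)
                SELECT order_id, ?, ? FROM lnd_sales_order WHERE {failing_condition}""",
            (error_code, load_date),
        )
    # Load only the records that failed no validation into the staging area.
    conn.execute(
        """INSERT INTO stg_sales_order (order_id, order_amount, store_id)
           SELECT order_id, order_amount, store_id FROM lnd_sales_order
           WHERE order_id NOT IN (SELECT order_id FROM err_sales_order WHERE load_date = ?)""",
        (load_date,),
    )
    # Flag everything that was not loaded as invalid in the landing area.
    conn.execute(
        """UPDATE lnd_sales_order SET is_rejected = 1
           WHERE order_id IN (SELECT order_id FROM err_sales_order WHERE load_date = ?)""",
        (load_date,),
    )
    conn.commit()
```

A separate reference table (for example err_code_ref) would map each error code such as ERR001 to its description.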

2.2. Error threshold

If the number of rejections reaches a given threshold limit, a mail is sent to the EAM / Business data quality team informing them of the abnormal behavior, and the job is aborted.

Page 12: BI Error Processing Framework

Based on the feedback the jobs are rerun/re-triggered manually.
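
A minimal sketch of the threshold check is shown below, assuming a hypothetical REJECT_THRESHOLD value and the hypothetical err_sales_order table from the capture sketch; the mail itself could be sent with a helper like notify_failure from the infrastructure-exception sketch.

```python
import logging
import sqlite3
import sys

REJECT_THRESHOLD = 1000  # hypothetical limit on rejections allowed per load

def check_error_threshold(conn: sqlite3.Connection, load_date: str) -> None:
    """Abort the job and alert the EAM / Business data quality team when rejections exceed the limit."""
    (reject_count,) = conn.execute(
        "SELECT COUNT(*) FROM err_sales_order WHERE load_date = ?", (load_date,)
    ).fetchone()
    if reject_count > REJECT_THRESHOLD:
        logging.error("Load %s aborted: %d rejections exceed threshold %d",
                      load_date, reject_count, REJECT_THRESHOLD)
        # mail the EAM / Business data quality team here (see the notification sketch above)
        sys.exit(1)  # aborted job is rerun/re-triggered manually after the team's feedback
```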

2.3. Purging

Purging is the deletion of older records which are no longer required by a given business process. The purging logic applied to the various areas is as follows:

2.3.1. Landing Area

1. Valid records – For valid records which have been loaded into the Staging area, only the previous 7 days of data will be retained. The rest will be purged.

2. Invalid records – Invalid records which have errored out of the Staging area will be retained for 30 days. The rest will be purged.
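
A minimal sketch of this landing-area purge is shown below, again using the hypothetical lnd_sales_order table with its is_rejected flag and an ISO-formatted source_sent_date column.

```python
import sqlite3

def purge_landing_area(conn: sqlite3.Connection) -> int:
    """Delete valid records older than 7 days and invalid records older than 30 days."""
    cur = conn.execute(
        """DELETE FROM lnd_sales_order
           WHERE (is_rejected = 0 AND source_sent_date < DATE('now', '-7 days'))
              OR (is_rejected = 1 AND source_sent_date < DATE('now', '-30 days'))"""
    )
    conn.commit()
    return cur.rowcount  # number of records purged
```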

2.3.2. Staging Area

Truncate and load. The staging area is where we load the data and make sure it is good before we make any changes to the warehouse tables.

2.3.3. EDW

Depending on Business need, data is maintained in EDW.

2.3.4. Datamart

Depending on business need, data is maintained in the Datamart.

2.4. Purge threshold

During purging, the business can set a threshold limit on the number of records being purged. If the threshold limit is crossed while deleting, the purge jobs are automatically aborted and a mail is sent to the EAM / Business data quality team for confirmation. Once the business confirms, the aborted jobs are triggered manually.
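
A minimal sketch of the purge-threshold check is shown below, assuming a hypothetical PURGE_THRESHOLD and reusing the purge_landing_area helper from the previous sketch: the candidate rows are counted first, and the purge only runs when the count is within the limit.

```python
import logging
import sqlite3
import sys

PURGE_THRESHOLD = 100000  # hypothetical limit set by the business

def purge_with_threshold(conn: sqlite3.Connection) -> None:
    """Abort the purge and request confirmation when too many records would be deleted."""
    (candidates,) = conn.execute(
        """SELECT COUNT(*) FROM lnd_sales_order
           WHERE (is_rejected = 0 AND source_sent_date < DATE('now', '-7 days'))
              OR (is_rejected = 1 AND source_sent_date < DATE('now', '-30 days'))"""
    ).fetchone()
    if candidates > PURGE_THRESHOLD:
        logging.error("Purge aborted: %d candidate records exceed threshold %d",
                      candidates, PURGE_THRESHOLD)
        # mail the EAM / Business data quality team for confirmation here
        sys.exit(1)  # once the business confirms, the purge job is triggered manually
    purge_landing_area(conn)  # helper from the previous sketch
```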

2.5. Appendix

2.5.1. About Target

TBU

Page 13: BI Error Processing Framework

2.5.2. Reference

The Exception Handling Overview is an extract from www.dwhinfo.com, written by [email protected].

2.5.3. Other Contributors

Krishan.Vinayak – Delivery Manager

Devanathan.Rajagopalan – Senior Technical Architect

Asis.Mohanty – BI Manager

Joseph.Raj – Technical Architect