Mining Financial Data: Histograms & Contingency Tables. Shishir Gupta, under the guidance of Dr. Mirsad Hadzikadic. In memory of Dr. Jan Zytkow (SEP 09 1944 - JAN 16 2001).

Dec 19, 2015

Transcript
Page 1: Mining Financial Data Histograms & Contingency Tables Shishir Gupta Under the guidance of Dr. Mirsad Hadzikadic.

Mining Financial Data
Histograms & Contingency Tables

Shishir Gupta
Under the guidance of Dr. Mirsad Hadzikadic

In memory of Dr. Jan Zytkow
SEP 09 1944 - JAN 16 2001

Page 2:

Agenda

• Database
• Task goals
• Tool & technique used
• Data preparation and cleaning
• Attribute selection
• Data transformation
• Data Mining/Pattern Evaluation
• Knowledge presentation
• Pros/Cons
• Questions & Demonstration

Page 3:

Database

• Financial Dataset from PKDD 1999
• Financial Dataset from a Czech Bank
• Relational Dataset
• 8 Relations
  – ACCOUNT
  – LOAN
  – DEMOGRAPH
  – ORDER
  – TRANSACTION
  – CARD
  – DISPOSITION
  – CLIENT

Page 4:

Task Goal

• Identify good clients to whom additional services can be offered
• Identify bad clients to watch carefully, to minimize the bank's losses
• Services offered:
  – Loan
  – Credit Card

Page 5:

Technique Used - Histogram

SQL Statement used

    SELECT age, COUNT(age)
    FROM table_x
    GROUP BY age
    ORDER BY age
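The histogram query above can be tried end to end with a small sketch. This is only an illustration, not the tool shown in the presentation: it assumes Python's built-in sqlite3 module as the database, and the `histogram` helper and the sample rows are hypothetical.

```python
import sqlite3

def histogram(conn, table, column):
    """Return [(value, count), ...] for one column, ordered by value."""
    # Identifiers cannot be bound as SQL parameters, so table and column
    # names are assumed to come from a trusted source.
    sql = (
        f"SELECT {column}, COUNT({column}) FROM {table} "
        f"GROUP BY {column} ORDER BY {column}"
    )
    return conn.execute(sql).fetchall()

# Hypothetical sample data standing in for table_x.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_x (id INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany(
    "INSERT INTO table_x (age) VALUES (?)",
    [(25,), (31,), (25,), (40,), (31,), (25,)],
)

rows = histogram(conn, "table_x", "age")
for value, count in rows:
    # One text bar per distinct value.
    print(f"{value:>3} {'#' * count} ({count})")
```

Storing `rows` rather than printing directly keeps the counts available for later steps such as discretization.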

Page 6:

Technique Used – C-Tables

SQL Statement used

    SELECT sex, COUNT(sex), age
    FROM table_x a, table_y b
    WHERE a.id = b.fid
    GROUP BY sex, age
    ORDER BY sex, age
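The query above returns one row per (sex, age) cell. A contingency table is usually read as a grid, so a possible next step is to pivot those rows. The sketch below assumes sqlite3 and made-up sample data. The `contingency_table` helper is hypothetical, and the explicit JOIN is an equivalent rewrite of the comma join on the slide.

```python
import sqlite3

def contingency_table(cells):
    """Pivot (row_value, col_value, count) triples into a dense grid."""
    row_labels = sorted({r for r, _, _ in cells})
    col_labels = sorted({c for _, c, _ in cells})
    counts = {(r, c): n for r, c, n in cells}
    # Combinations that never occur in the data get an explicit zero.
    grid = [[counts.get((r, c), 0) for c in col_labels] for r in row_labels]
    return row_labels, col_labels, grid

# Hypothetical sample data standing in for table_x and table_y.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_x (id INTEGER PRIMARY KEY, sex TEXT);
CREATE TABLE table_y (fid INTEGER, age INTEGER);
INSERT INTO table_x VALUES (1, 'F'), (2, 'M'), (3, 'F'), (4, 'M');
INSERT INTO table_y VALUES (1, 30), (2, 30), (3, 40), (4, 30);
""")

cells = conn.execute("""
    SELECT sex, age, COUNT(*)
    FROM table_x a JOIN table_y b ON a.id = b.fid
    GROUP BY sex, age
    ORDER BY sex, age
""").fetchall()

row_labels, col_labels, grid = contingency_table(cells)
print("sex", *col_labels)
for label, row in zip(row_labels, grid):
    print(label, *row)
```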

Page 7:

Technique Used – Correlation

SQL Statement used

    SELECT x, y
    FROM table_x a, table_y b
    WHERE a.id = b.fid
    ORDER BY x, y
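The query above only fetches the (x, y) pairs, and the slides do not say which correlation function is applied to them. As one plausible reading, the sketch below computes the Pearson correlation coefficient. The choice of Pearson and the `pearson` helper are assumptions made for illustration.

```python
import math

def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of squared deviations and of cross products.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant attribute")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical rows as they would come back from the SELECT x, y query.
print(pearson([(1, 2), (2, 4), (3, 6)]))
```

Values near +1 or -1 suggest a strong linear relationship between the two continuous attributes, and values near 0 suggest none.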

Page 8:

Tool - Architecture

Page 9:

Tool - Description

Page 10:

Data Cleaning

• Missing Values
  – Relation DEMOGRAPHIC
• Incorrect Values
  – Relation TRANSACTION

(Data reduced by 10% after cleaning)

Page 11:

Data Preparation

• Relation CLIENT
  – Separating SEX & BDATE from BIRTHNUMBER
• All Date fields converted to AGE
  – Ref 199901.
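The two preparation steps above can be sketched in a few lines. The slides do not give the encoding, so this sketch assumes the convention described in the PKDD 1999 dataset documentation: the birth number is YYMMDD for men and YYMM+50DD for women. It also assumes that "Ref 199901" means ages are computed relative to January 1999 and that all clients were born in the 1900s. The `decode_birth_number` helper is hypothetical.

```python
def decode_birth_number(birth_number, ref_year=1999, ref_month=1):
    """Split a birth number into (sex, (year, month, day), age in years)."""
    yy = birth_number // 10000
    mm = (birth_number // 100) % 100
    dd = birth_number % 100
    # Assumed encoding: women have 50 added to the month field.
    if mm > 50:
        sex = "F"
        mm -= 50
    else:
        sex = "M"
    year = 1900 + yy  # assumption: every client was born in the 20th century
    # Age at month granularity: subtract one year if the birthday month
    # has not yet been reached in the reference month.
    age = ref_year - year - (1 if ref_month < mm else 0)
    return sex, (year, mm, dd), age

print(decode_birth_number(706213))
print(decode_birth_number(451204))
```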

Page 12:

Data Preparation Cont.

• Creating table definitions
• Converting the data into a table-compatible format
• Loading data into the database
• Evaluating loading errors and changing attribute definitions accordingly

Page 13:

Attribute Selection

• Decision Relation
  – LOAN
• Decision Attributes
  – STATUS
• Classification Attributes
  – All other attributes that do not belong to LOAN relation.

[Figure: decision tree with test nodes A4?, A6? and A1?, Y/N branches, and leaves Class1 and Class2]

Page 14:

Data Transformation

• Discretization
  – Continuous attributes into 4 to 10 buckets
• Only transactions performed in the year 1997 are considered for relation TRANSACTION.
  – Due to resource limitations
  – The largest number of loans was approved during this period
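The slides do not say how the bucket boundaries were chosen. As an illustration only, the sketch below uses equal-width bucketing, which is one common choice. The `equal_width_buckets` helper and the sample values are hypothetical.

```python
def equal_width_buckets(values, n_buckets=5):
    """Map each value to a bucket index in the range 0 .. n_buckets - 1."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # A constant attribute falls into a single bucket.
        return [0] * len(values)
    width = (hi - lo) / n_buckets
    # The maximum value would land in bucket n_buckets, so clamp it
    # into the last bucket.
    return [min(int((v - lo) / width), n_buckets - 1) for v in values]

amounts = [0, 10, 20, 30, 40]  # hypothetical transaction amounts
print(equal_width_buckets(amounts, n_buckets=4))
```

The resulting bucket indices can be stored as a new column, so that the histogram and contingency-table queries group on the bucket instead of the raw continuous value.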

Page 15:

Data Mining/Pattern Evaluation

• Run Histogram on all non-key attributes to study their distributions.
• Discretize continuous attributes.
• Run Contingency Table to study the relationship between two attributes.
• Check significance with the Correlation function if both attributes are continuous.

Page 16:

Knowledge Presentation - 1

• All loans on accounts where a second person is allowed to dispose are GOOD LOANS (100%)

Page 17:

Knowledge Presentation - 2

• Permanent Orders of type household & leasing indicate financial stability

Page 18:

Knowledge Presentation - 3

• Accounts with Cash withdrawals are more likely to repay their loans

Page 19:

Knowledge Presentation - 4

• Accounts with low transaction amounts indicate good loans

Page 20:

Knowledge Presentation - 5

• Accounts that are in debt indicate BAD LOANS

Page 21:

Pros

• Flexibility to alter data presentation to understand the nature of data
• Customers with no background in data mining can appreciate the output results because of their simplicity
• Since there is a provision to store the results in a file, subsequent analysis on a subset of data becomes very easy

Page 22:

Cons

• Needs capability for Multi-Variable analysis.
• Some kind of quantification needs to be put in.
• Performance issues with using RDBMS.

Page 23:

Questions & Demonstration