Data Warehousing and Data Mining
Data Warehousing and Data Mining - Bharathidasan · PDF fileContent •Introduction •Overview of data mining technology •Association rules •Classification •Clustering...

Mar 07, 2018

phungtu
Content

• Introduction

• Overview of data mining technology

• Association rules

• Classification

• Clustering

• Applications of data mining

• Commercial tools

• Conclusion

Introduction

• What is data mining?

• Why do we need to ‘mine’ data?

• On what kind of data can we ‘mine’?

What is data mining?

• The process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques.

• Part of the Knowledge Discovery in Databases (KDD) process.

Why data mining?

• The explosive growth in data collection

• The storing of data in data warehouses

• Increased access to data from Web navigation and intranets

• The need for a more effective way to use these data in the decision support process than traditional query languages alone

On what kind of data?

• Relational databases

• Data warehouses

• Transactional databases

• Advanced database systems: object-relational, spatial and temporal, time-series, multimedia and text, and the WWW

[Figure: GeneFilter Comparison Report comparing two arrays (O2#1 8-20-99adjfinal and N2#1finaladj), listing ORF name, gene name and chromosome with raw and normalized intensities for each array, plus their difference and ratio. Panels: Structure - 3D Anatomy; Function - 1D Signal; Metadata - Annotation.]

Overview of data mining technology

• Data Mining vs. Data Warehousing

• Data Mining as Part of the Knowledge Discovery Process

• Goals of Data Mining and Knowledge Discovery

• Types of Knowledge Discovery during Data Mining

Data Mining vs. Data Warehousing

• A Data Warehouse is a repository of integrated information, available for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated.

• This makes it much easier and more efficient to run queries over data that originally came from different sources.

• The goal of data warehousing is to support decision making with data!
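As a minimal sketch of the warehousing idea above, assume two hypothetical source systems: a store till that records amounts in cents and a web shop that records dollars in a different shape. The code maps both onto one schema and loads them into an in-memory SQLite table, so a single query covers data that originally came from different sources. All table, field and customer names are illustrative, and a real warehouse would load incrementally rather than in one batch.

```python
import sqlite3

# Hypothetical source 1: in-store till, amounts stored in cents
store_sales = [
    {"cust": "C1", "amount_cents": 1250},
    {"cust": "C2", "amount_cents": 400},
]
# Hypothetical source 2: web shop, different shape, amounts in dollars
web_orders = [
    ("C1", 30.0),
    ("C3", 8.5),
]

def transform():
    """Map both sources onto one schema: (customer_id, source, amount_usd)."""
    rows = [(r["cust"], "store", r["amount_cents"] / 100) for r in store_sales]
    rows += [(cust, "web", amount) for cust, amount in web_orders]
    return rows

# Load: one integrated table in an in-memory "warehouse"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id TEXT, source TEXT, amount_usd REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", transform())

# Query: total spend per customer across both original sources
totals = conn.execute(
    "SELECT customer_id, SUM(amount_usd) FROM sales "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()
print(totals)  # → [('C1', 42.5), ('C2', 4.0), ('C3', 8.5)]
```

Converting units during the transform step is what makes the final SUM meaningful; querying the raw sources side by side would mix cents with dollars.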

Knowledge Discovery in Databases and Data Mining

• The non-trivial extraction of implicit, unknown, and potentially useful information from databases.

Goals of Data Mining and KDD

• Prediction: determine how certain attributes within the data will behave in the future.

• Identification: identify the existence of an item, an event or an activity.

• Classification: partition the data into categories.

• Optimization: optimize the use of limited resources.

Applications of data mining

• Market analysis: marketing strategies, advertisement

• Risk analysis and management: finance and financial investments, manufacturing and production

• Fraud detection and detection of unusual patterns (outliers): telecommunication, financial transactions, anti-terrorism (!!!)
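Detecting unusual patterns (outliers), as in the fraud-detection bullet above, can be sketched with a simple z-score rule: flag any value lying more than a chosen number of standard deviations from the mean. This is one elementary technique among many, not the method any particular bank or telecom operator uses; the transaction amounts and the threshold of 2.0 are made up for illustration.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    """Return (index, value) pairs whose |z-score| exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Made-up card transaction amounts for one customer; the last one is unusual
amounts = [42.0, 38.5, 45.0, 40.0, 39.5, 41.0, 43.5, 37.0, 44.0, 950.0]
print(zscore_outliers(amounts))  # → [(9, 950.0)]
```

A single extreme value also inflates the standard deviation, which can mask other outliers; robust measures such as the median absolute deviation are often preferred in practice.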

Applications

Service providers in the mobile phone and utilities industries

• Mobile phone and utilities companies use Data Mining and Business Intelligence to predict ‘churn’, the term they use for when a customer leaves their company to get their phone/gas/broadband from another provider.

• They collate billing information, customer service interactions, website visits and other metrics to give each customer a probability score, then target offers and incentives at customers whom they perceive to be at a higher risk of churning.
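The probability score described above is often produced by a classification model. As a minimal sketch, assuming logistic regression weights have already been learned from past customers, the code below turns each customer's metrics into a churn probability and lists those above a cut-off. The feature names, weights, bias and the 0.5 threshold are hypothetical values chosen for illustration, not taken from any real operator.

```python
import math

# Hypothetical weights from an already-trained logistic regression model
WEIGHTS = {
    "late_payments": 0.8,       # from billing information
    "support_calls": 0.5,       # from customer service interactions
    "cancel_page_visits": 1.2,  # from website visits
    "tenure_years": -0.4,       # assumed: long-standing customers churn less
}
BIAS = -2.0

def churn_probability(features):
    """Logistic function applied to a weighted sum of the customer's features."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

customers = {
    "alice": {"late_payments": 0, "support_calls": 1, "cancel_page_visits": 0, "tenure_years": 6},
    "bob": {"late_payments": 2, "support_calls": 3, "cancel_page_visits": 2, "tenure_years": 1},
}

scores = {name: churn_probability(f) for name, f in customers.items()}
# Target retention offers at customers above the (assumed) 0.5 risk threshold
at_risk = [name for name, p in scores.items() if p > 0.5]
print(at_risk)  # → ['bob']
```

Lowering the threshold reaches more would-be churners at the cost of sending incentives to customers who would have stayed anyway.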

Retail

• Retailers segment customers into ‘Recency, Frequency, Monetary’ (RFM) groups and target marketing and promotions to those different groups.

• A customer who spends little but often, and last did so recently, will be handled differently from a customer who spent big but only once, and some time ago. The former may receive loyalty, upsell and cross-sell offers, whereas the latter may be offered a win-back deal, for instance.
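RFM groups can be derived directly from transaction history. The sketch below computes, per customer, recency (days since last purchase), frequency (number of purchases) and monetary value (total spend), then applies simple hand-picked rules that mirror the two customer types above. Real programmes often score each dimension by quintiles instead; the dates, amounts, thresholds and segment labels here are illustrative assumptions.

```python
from datetime import date

# Made-up transactions: (customer, purchase date, amount)
transactions = [
    ("C1", date(2018, 2, 20), 5.0),
    ("C1", date(2018, 2, 27), 4.0),
    ("C1", date(2018, 3, 5), 6.0),
    ("C2", date(2017, 6, 1), 900.0),
]
TODAY = date(2018, 3, 7)

def rfm(transactions, today):
    """Return {customer: (recency_days, frequency, monetary)}."""
    stats = {}
    for cust, day, amount in transactions:
        last, freq, total = stats.get(cust, (None, 0, 0.0))
        last = day if last is None or day > last else last  # keep latest date
        stats[cust] = (last, freq + 1, total + amount)
    return {c: ((today - last).days, f, m) for c, (last, f, m) in stats.items()}

def segment(recency, frequency, monetary):
    """Hypothetical rules: recent frequent buyers vs. lapsed one-off buyers."""
    if recency <= 30 and frequency >= 3:
        return "loyalty / upsell / cross-sell"
    if recency > 180:
        return "win-back"
    return "standard"

for cust, (r, f, m) in rfm(transactions, TODAY).items():
    print(cust, r, f, m, segment(r, f, m))
```

Here C1 (small, frequent, recent purchases) lands in the loyalty group, while C2 (one large purchase 279 days earlier) is routed to a win-back offer.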

Cross-Selling and Up-Selling

• Buy an Android phone, get recommendations for a memory card or accessories (cross-sell)

• Buy an Android phone, get recommendations for an iPhone (up-sell)
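The memory-card recommendation corresponds to an association rule such as {Android phone} → {memory card}. Such rules are usually judged by support (the share of all baskets containing both items) and confidence (the share of baskets with the phone that also contain the card). A minimal sketch over made-up baskets follows; a full miner such as Apriori would enumerate candidate itemsets automatically rather than checking one rule by hand.

```python
# Made-up shopping baskets, one set of items per transaction
baskets = [
    {"android_phone", "memory_card", "case"},
    {"android_phone", "memory_card"},
    {"android_phone", "charger"},
    {"memory_card"},
    {"android_phone", "memory_card", "charger"},
]

def count(itemset, baskets):
    """Number of baskets containing every item in itemset."""
    return sum(1 for b in baskets if itemset <= b)

def support(itemset, baskets):
    """Fraction of all baskets containing the itemset."""
    return count(itemset, baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Share of baskets with the antecedent that also contain the consequent."""
    n_antecedent = count(antecedent, baskets)
    if n_antecedent == 0:
        return 0.0
    return count(antecedent | consequent, baskets) / n_antecedent

# Rule: {android_phone} -> {memory_card}
print(support({"android_phone", "memory_card"}, baskets))       # → 0.6
print(confidence({"android_phone"}, {"memory_card"}, baskets))  # → 0.75
```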

Winback Offers

Applications

E-commerce

• E-commerce companies cross-sell and up-sell through their websites.

• One of the most famous is, of course, Amazon, which uses sophisticated mining techniques to drive its ‘People who viewed that product also liked this’ functionality.
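The ‘People who viewed that product also liked this’ idea can be approximated by counting how often pairs of products appear in the same browsing session and recommending the most frequent companions. This is a simplified item-to-item co-occurrence sketch, not Amazon's actual system; the sessions and product names are made up.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Made-up browsing sessions: the set of products each visitor viewed
sessions = [
    {"phone", "case", "charger"},
    {"phone", "case", "charger"},
    {"phone", "case"},
    {"phone", "headphones"},
    {"case", "charger"},
]

# co_counts[a][b] = number of sessions in which both a and b were viewed
co_counts = defaultdict(Counter)
for viewed in sessions:
    for a, b in permutations(viewed, 2):
        co_counts[a][b] += 1

def also_viewed(product, k=2):
    """Top-k products most often viewed alongside `product` (empty if unseen)."""
    return [item for item, _ in co_counts.get(product, Counter()).most_common(k)]

print(also_viewed("phone"))  # → ['case', 'charger']
```

Raw counts favour items that are popular everywhere; production systems typically normalise them (for example with a similarity measure such as cosine) so that recommendations reflect genuine association rather than overall popularity.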

Applications

Supermarkets

• Famously, supermarket loyalty card programmes are driven mostly, if not solely, by the desire to gather comprehensive data about customers for use in data mining.

Crime prevention agencies

• Agencies use data mining to spot trends across vast amounts of data, helping with everything from where to deploy police (where and when is crime most likely to happen?) and whom to search at a border crossing (based on age/type of vehicle, number/age of occupants, border crossing history) to which intelligence to take seriously in counter-terrorism activities.

Applications of data mining

• Text mining (newsgroups, email, documents) and Web mining

• Stream data mining

• DNA and bio-data analysis: disease outcomes, effectiveness of treatments, identifying new drugs

Closing thoughts!

• Data mining is a “decision support” process in which we search for patterns of information in data.

• This technique can be used on many types of data.

• Overlaps with machine learning, statistics, artificial intelligence, databases, visualization…