The Apriori Algorithm and its Extension by the Application of DeMorgan's Laws Mitch Fernandez Giri Narasimhan

Aug 07, 2015

Page 1: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

The Apriori Algorithm and its Extension by the Application of DeMorgan's Laws

Mitch Fernandez
Giri Narasimhan

Page 2: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

A Visit to Publix

http://sentimentalistlindley.blogspot.com

Page 3: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

Items and Transactions

http://food-pictures.feedio.net/

Page 4: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

Possible Item Sets

http://www.dreamstime.com/

Page 5: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

Possible Item Sets

http://trashcandiaries.wordpress.com

Page 6: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

The only way to verify you’ve found all of the item sets in a database is to check all of the possible item sets

[Figure: Processing Time. Seconds vs. Max Items, annotated "11.3 hours". Fitted curve: f(x) = 0.00167030278004725 exp(2.1736345740781 x), R² = 0.994652089677174]

[Figure: Item Sets Produced. Item Sets vs. Max Items. Fitted curve: f(x) = 0.00131257744766744 x^11.7821784802445, R² = 0.999883665008539]

An NP-Hard Problem

Maximum Size  Item Sets   Time (secs) - Ermine  Time (secs) - Laptop
2             5           0.162                 0.150
3             537         0.642                 0.810
4             14,858      10.456                10.590
5             212,681     98.549                102.000
6             1,927,513   1,135.844
7             12,385,790  7,992.549
8             60,727,444  40,870.235
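The growth in the table can be put next to the size of the search space itself. The sketch below counts how many item sets are possible for a given number of items and a cap on item set size. The function name and the example value of 20 items are illustrative assumptions, not figures from the slides.

```python
from math import comb

def count_possible_item_sets(n_items: int, max_size: int) -> int:
    """Number of non-empty item sets with at most max_size of n_items items."""
    return sum(comb(n_items, k) for k in range(1, max_size + 1))

# With no size cap, every non-empty subset is a candidate: 2**n - 1 of them.
assert count_possible_item_sets(4, 4) == 2**4 - 1

# Illustrative only: 20 items is an assumed value, not taken from the slides.
for max_size in range(1, 6):
    print(max_size, count_possible_item_sets(20, max_size))
```

Each extra level of item set size adds another binomial term, which is why exhaustive checking becomes impractical so quickly.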

Page 7: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

The Apriori Algorithm

Page 8: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

The Apriori Algorithm

First pass
• Count all item occurrences to calculate support

Subsequent kth pass
• Use item sets from the (k – 1)th pass to generate candidate item sets
• Calculate support for candidate item sets
• Prune candidates with support below threshold
• Proceed to the (k + 1)th pass
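The passes above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for the timings in this talk. The function name `apriori`, the toy baskets, and the threshold of 0.5 are assumptions made for the example. The subset check before counting is the standard Apriori pruning rule, which the slide does not spell out.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(item set): support} for all frequent item sets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(item_set):
        # Fraction of transactions that contain every item in item_set.
        return sum(item_set <= t for t in transactions) / n

    # First pass: count single items and keep those meeting the threshold.
    items = sorted({item for t in transactions for item in t})
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Generate size-k candidates by joining frequent (k-1)-item sets.
        prev = list(current)
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Standard Apriori rule: every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Calculate support and prune candidates below the threshold.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1  # proceed to the (k+1)th pass
    return frequent

# Toy data (hypothetical, not from the slides).
baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(baskets, min_support=0.5)
for item_set in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(item_set), result[item_set])
```

In the toy data every single item and every pair is frequent, while the triple {a, b, c} appears in only two of five baskets and is pruned.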

Page 9: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

Maximally Frequent Item Sets

Page 10: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

The Apriori Algorithm

         Hot Dogs  Buns  Mustard  Chips
Trans01  1         1     1        1
Trans02  0         1     0        1
Trans03  1         1     0        0
Trans04  0         1     0        1
Trans05  1         0     0        0

Set minimum support to 0.25

1st Pass
• {Hot Dogs} = 0.60
• {Buns} = 0.80
• {Mustard} = 0.20
• {Chips} = 0.60

2nd Pass
• {Hot Dogs, Buns} = 0.40
• {Hot Dogs, Chips} = 0.20
• {Buns, Chips} = 0.60

3rd Pass
• {Hot Dogs, Buns, Chips} = 0.20
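The support values on this slide can be re-derived directly from the transaction table. The sketch below is a plain counting check rather than a full Apriori run. The names `ITEMS`, `ROWS`, and `support` are chosen for the example.

```python
from itertools import combinations

ITEMS = ["Hot Dogs", "Buns", "Mustard", "Chips"]
ROWS = {
    "Trans01": [1, 1, 1, 1],
    "Trans02": [0, 1, 0, 1],
    "Trans03": [1, 1, 0, 0],
    "Trans04": [0, 1, 0, 1],
    "Trans05": [1, 0, 0, 0],
}

def support(item_set):
    """Fraction of transactions in which every item of item_set is present."""
    idx = [ITEMS.index(item) for item in item_set]
    hits = sum(all(row[j] for j in idx) for row in ROWS.values())
    return hits / len(ROWS)

# Print the support of every item set of size 1 to 3.
for size in (1, 2, 3):
    for combo in combinations(ITEMS, size):
        print(combo, f"{support(combo):.2f}")
```

With the minimum support of 0.25, {Mustard}, {Hot Dogs, Chips}, and {Hot Dogs, Buns, Chips} all fall below the threshold at 0.20.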

Page 11: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

DeMorgan’s Laws

Page 12: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

DeMorgan’s Laws

Page 13: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

DeMorgan’s Laws

Page 14: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

Creating a Complement Set

         Item01  Item02  Item03  Item04
Trans01  1       0       1       1
Trans02  0       1       0       1
Trans03  1       0       0       1
Trans04  0       1       1       0
Trans05  1       0       0       1
Trans06  0       1       1       1

Page 15: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

Creating a Complement Set

         Item01  Item02  Item03  Item04
Trans01  0       1       0       0
Trans02  1       0       1       0
Trans03  0       1       1       0
Trans04  1       0       0       1
Trans05  0       1       1       0
Trans06  1       0       0       0
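Creating the complement set is a cell-by-cell flip of the original table. A minimal sketch, using the six transactions from the previous slide; the names `DATA` and `complement` are chosen for the example.

```python
DATA = {
    "Trans01": [1, 0, 1, 1],
    "Trans02": [0, 1, 0, 1],
    "Trans03": [1, 0, 0, 1],
    "Trans04": [0, 1, 1, 0],
    "Trans05": [1, 0, 0, 1],
    "Trans06": [0, 1, 1, 1],
}

def complement(data):
    """Flip every cell: 1 becomes 0 and 0 becomes 1."""
    return {name: [1 - value for value in row] for name, row in data.items()}

for name, row in complement(DATA).items():
    print(name, *row)
```

Applying `complement` twice returns the original table, so no information is lost by the transformation.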

Page 16: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

What DeMorgan Means for Apriori

If (A OR B) has 90% support, then NOT (A OR B) must have 10% support. Since NOT (A OR B) = (NOT A) AND (NOT B),

(NOT A) AND (NOT B) must have 10% support

Page 17: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

What DeMorgan Means for Apriori

If (NOT A) AND (NOT B) has 10% support, then NOT (A OR B) must have 10% support, since (NOT A) AND (NOT B) = NOT (A OR B)

Therefore, (A OR B) must have 90% support

Page 18: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

Putting it Into Practice

1. Take the complement of the original data set by changing all the 1s to 0s and all the 0s to 1s
2. Set minimum support as low as possible and set an upper limit for maximum support
3. Run Apriori – Resulting item sets will be ORs
4. Correct the reported support by subtracting it from 100%
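The four steps above can be checked on the six-transaction example from the complement slides. In this sketch a direct count stands in for the Apriori run, and the item pair Item01 and Item03 is picked arbitrarily for illustration. The helper names are assumptions.

```python
ITEMS = ["Item01", "Item02", "Item03", "Item04"]
DATA = [
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
]

def and_support(rows, cols):
    """Support of the ordinary (AND) item set: all columns are 1."""
    return sum(all(row[c] for c in cols) for row in rows) / len(rows)

def or_support(rows, cols):
    """Support of the OR set, counted directly for comparison."""
    return sum(any(row[c] for c in cols) for row in rows) / len(rows)

# Step 1: take the complement of the data set.
comp = [[1 - value for value in row] for row in DATA]

# Steps 2-3: a direct count stands in for Apriori on the complement.
cols = [ITEMS.index("Item01"), ITEMS.index("Item03")]
reported = and_support(comp, cols)   # support of (NOT Item01) AND (NOT Item03)

# Step 4: subtract the reported support from 100%.
corrected = 1 - reported

print(f"reported in complement: {reported:.4f}")
print(f"corrected OR support:   {corrected:.4f}")
print(f"direct OR support:      {or_support(DATA, cols):.4f}")
```

The corrected value matches the OR support counted directly on the original data, which is the DeMorgan identity at work.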

Page 19: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

But what if…

What if your data set looks like this?

         Item01  Item02  Item03  Item04
Trans01  1       0       1       1
Trans02  0       1       0       1
Trans03  1       0       0       1
Trans04  0       1       1       0
Trans05  1       0       0       1
Trans06  0       1       1       1

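Item01 OR Item02 is found in every patient, so the OR set has 100% support. But Apriori can’t find that: in the complement data no transaction contains both items, so the item set has 0% support there.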

Page 20: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

… then

Add a dummy transaction to ensure all OR sets can be found

         Item01  Item02  Item03  Item04
Trans01  1       0       1       1
Trans02  0       1       0       1
Trans03  1       0       0       1
Trans04  0       1       1       0
Trans05  1       0       0       1
Trans06  0       1       1       1
Dummy01  1       1       1       1

Now Apriori can find it!

Page 21: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

The Complete Procedure

1. Take the complement of the original data set by changing all the 1s to 0s and all the 0s to 1s
2. Add dummy transactions to ensure all item sets can be found
3. Set minimum support as low as possible and set an upper limit for maximum support
4. Run Apriori – Resulting item sets will be ORs

Page 22: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

The Complete Procedure

5. Correct for the reported support using the formula:
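One correction that is consistent with steps 1 to 4 can be sketched as follows. This is a sketch under stated assumptions, not necessarily the formula shown on the slide. It assumes that each dummy transaction is a row of all 1s appended to the complemented data, that there are `n` real transactions and `n_dummy` dummy ones, and that a direct count stands in for the Apriori run. Under those assumptions the dummy rows inflate both the count and the total, so both have to be backed out before subtracting from 100%.

```python
ITEMS = ["Item01", "Item02", "Item03", "Item04"]
DATA = [
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
]

def reported_support(data, cols, n_dummy=1):
    """Support seen in the complemented data after adding dummy rows."""
    comp = [[1 - value for value in row] for row in data]       # step 1
    comp += [[1] * len(data[0]) for _ in range(n_dummy)]        # step 2 (assumed all 1s)
    count = sum(all(row[c] for c in cols) for row in comp)      # stands in for step 4
    return count / len(comp)

def correct_support(reported, n, n_dummy=1):
    """Assumed correction: remove the dummy rows, then subtract from 100%."""
    real_count = reported * (n + n_dummy) - n_dummy
    return 1 - real_count / n

cols = [ITEMS.index("Item01"), ITEMS.index("Item02")]
reported = reported_support(DATA, cols)
print(f"reported:  {reported:.4f}")                      # nonzero only because of the dummy row
print(f"corrected: {correct_support(reported, len(DATA)):.4f}")
```

For Item01 OR Item02 the only matching row in the complemented data is the dummy, so the item set becomes visible to Apriori and the corrected OR support comes out at 100%.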

Page 23: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

Lupus

http://www.mollysfund.org/

Page 24: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

Lupus

289 Items to Analyze
134 Subjects

• 17 Muscular Symptoms
• 4 Neurological Symptoms
• 16 Dermatological Symptoms
• 10 Inflammatory Symptoms
• 23 Major Organ Symptoms
• 61 Miscellaneous Symptoms
• 158 Laboratory Tests

Page 25: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

Lupus – Symptoms Needed for Full Coverage

• Raynaud’s Syndrome
• Photosensitivity
• Nasal or Oral Ulcers
• Joint Swelling
• Swelling of 3 or More Joints
• Hand Swelling
• Joint Pain
• Swelling of Lower Extremities
• Alopecia
• Swelling of Hand Muscles
• Fever
• Malar Rash

Page 26: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

Lupus – Full Coverage for SLE vs MCTD

• Joint Pain
• Hand Swelling
• Nasal or Oral Ulcers
• Malar Rash
• Raynaud’s Syndrome
• Hand Swelling
• Muscle Weakness

SLE MCTD

Page 27: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

Chronic Obstructive Pulmonary Disease

http://www.nhlbi.nih.gov/health/health-topics/topics/copd/

Page 28: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

Metagenomics

Page 29: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

Chronic Obstructive Pulmonary Disease

9939 Items to Analyze
55 Subjects

• Subjects with COPD
• Active Smokers
• Former Smokers
• Never Smokers

Page 30: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

Metagenomic Data Set

           Species01  Species02  Species03  Species04
Patient01  12,347     400        1,254      845
Patient02  8,457      523        1,632      698
Patient03  9,322      1,351      1,324      1,709
Patient04  4,252      451        1,155      821
Patient05  7,453      1,625      959        58
Patient06  8,255      54         1,434      457

Page 31: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

Metagenomic Data Set

           Species01  Species02  Species03  Species04
Patient01  High       Low        Med        Low
Patient02  High       Low        Med        Low
Patient03  High       Med        Med        Med
Patient04  High       Low        Med        Low
Patient05  High       Med        Low        Low
Patient06  High       Low        Med        Low
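Turning read counts into Low, Med, and High levels gives Apriori categorical items to work with. The sketch below shows one way to do that binning. The cutoffs of 1,000 and 4,000 on raw counts are hypothetical and were picked only because they reproduce the example table; the binning rule actually used in the study may differ, for instance by working on normalized z-scores.

```python
SPECIES = ["Species01", "Species02", "Species03", "Species04"]
COUNTS = {
    "Patient01": [12347, 400, 1254, 845],
    "Patient02": [8457, 523, 1632, 698],
    "Patient03": [9322, 1351, 1324, 1709],
    "Patient04": [4252, 451, 1155, 821],
    "Patient05": [7453, 1625, 959, 58],
    "Patient06": [8255, 54, 1434, 457],
}

LOW_CUTOFF = 1000    # hypothetical: counts below this are "Low"
HIGH_CUTOFF = 4000   # hypothetical: counts at or above this are "High"

def level(count):
    """Map a read count to a categorical level using the assumed cutoffs."""
    if count < LOW_CUTOFF:
        return "Low"
    if count < HIGH_CUTOFF:
        return "Med"
    return "High"

def to_items(row):
    """Turn one patient's counts into a transaction of 'Species = Level' items."""
    return {f"{species} = {level(count)}" for species, count in zip(SPECIES, row)}

for patient, row in COUNTS.items():
    print(patient, [level(count) for count in row])
```

Each patient then becomes a transaction whose items look like `Species01 = High`, which is the form an item set miner can consume.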

Page 32: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

Distribution of Normalized Read Counts

[Figure: two histograms of Frequency vs. Z-score, with bins labeled Low, Med, and High]

Page 33: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

Significant Frequent Item Sets

Item Set ID  Support - Active  Support - Former  Support - Never  Max - Min  No. of Items  p-value
Active_0003  50.00%            4.17%             44.44%           45.83%     2             0.0242
    Item Set Details: Family.Pseudomonadaceae.001 = High, Rhizobium.001 = Med
Active_0009  50.00%            8.33%             22.22%           41.67%     3             0.0702
    Item Set Details: Staphylococcus.001 = High, Prevotella.002 = High, Porphyromonas.001 = Med
Active_0026  50.00%            12.50%            11.11%           38.89%     4             0.0889
    Item Set Details: Prevotella.002 = High, Mogibacterium.001 = Med, Family.Carnobacteriaceae.001 = Med, Order.Actinomycetales.001 = Med
Active_0028  50.00%            20.83%            0.00%            50.00%     4             0.1033
    Item Set Details: Staphylococcus.001 = High, Prevotella.002 = High, Order.Actinomycetales.001 = Med, Mogibacterium.001 = Med
Active_0020  50.00%            25.00%            0.00%            50.00%     4             0.1469
    Item Set Details: Staphylococcus.001 = High, Prevotella.002 = High, Family.Prevotellaceae.001 = Med, Mycoplasma.001 = Med
Active_0029  50.00%            25.00%            0.00%            50.00%     4             0.1469
    Item Set Details: Family.Pseudomonadaceae.001 = High, Order.Actinomycetales.001 = Med, Prevotella.002 = High, Mogibacterium.001 = Med
Active_0010  50.00%            12.50%            22.22%           37.50%     3             0.1524
    Item Set Details: Family.Pseudomonadaceae.001 = High, Prevotella.002 = High, Porphyromonas.001 = Med
Active_0021  50.00%            12.50%            22.22%           37.50%     4             0.1524
    Item Set Details: Neisseria.001 = High, Prevotella.002 = High, Order.Actinomycetales.001 = Med, Treponema.001 = Med
Active_0014  50.00%            16.67%            11.11%           38.89%     3             0.1636
    Item Set Details: Family.Prevotellaceae.001 = Med, Prevotella.002 = High, Order.Actinomycetales.001 = Med
Former_0003  13.64%            50.00%            22.22%           36.36%     3             0.1868
    Item Set Details: Neisseria.001 = High, Porphyromonas.001 = High, Order.Clostridiales.004 = Med

Page 34: 11.19.2013.the.apriori.algorithm.and.its.extension.by.the.application.of.de morgans.laws

Dr. DeEtta Mills, Dr. Kalai Mathee, Annia Mesa

Ronald E. McNair Scholars Program

MBRS-RISE (NIH Grant #R5 GM061347)

Florida Dept. of Health

National Science Foundation – Graduate Research Fellowship Program

Thank You