Page 1: The Do's and Don'ts of Data Mining

A publication of

The Do’s and Don’ts of

DATA MINING

Based on real-world experiences

Published By

Page 2: The Do's and Don'ts of Data Mining

Data mining has come a long way over the past 300 years…

Page 3: The Do's and Don'ts of Data Mining

Over time, data practitioners have had their fair share of

Page 4: The Do's and Don'ts of Data Mining

Change the way you do things, or KEEP doing what works!

Page 5: The Do's and Don'ts of Data Mining

The Do’s of

Data Mining ✓

Page 6: The Do's and Don'ts of Data Mining

…According to Scott Terry

President

Rapid Progress Marketing and Modeling, LLC

www.RPMSquared.com

Page 7: The Do's and Don'ts of Data Mining

Do Create a Clearly-Defined, Measurable Objective for Every Project

Page 8: The Do's and Don'ts of Data Mining

Do Simplify The Solution

To Increase Your Chances of Success

Page 9: The Do's and Don'ts of Data Mining

…According to Gregory Piatetsky-Shapiro

Editor

www.kdnuggets.com @kdnuggets

Page 10: The Do's and Don'ts of Data Mining

DO ASK QUESTIONS.

Understanding the problem and asking the right question is more important than using an advanced algorithm.

Page 11: The Do's and Don'ts of Data Mining

…According to Jim Kenyon

Director of IT Services

Optimization Group

www.optimizationgroup.com

Page 12: The Do's and Don'ts of Data Mining

Do Plan For Data To Be Messy

While data is available for mining projects in ever-increasing amounts, it rarely arrives in a tidy, mining-ready format. More typically, it shows up in multiple spreadsheets that vary in format and granularity. These varied formats frequently require hours (and hours) of ETL (Extract, Transform, Load) time.
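As a rough illustration of what "vary in format and granularity" can mean in practice, the sketch below brings two sources to a common shape. Everything in it is hypothetical: the field names, the date and currency formats, and the helper names `parse_amount` and `daily_to_monthly` are assumptions made for the example, not anything taken from the slides.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical source A: daily rows, US-style dates, amounts as formatted strings.
daily_rows = [
    {"date": "01/05/2024", "amount": "$1,200.50"},
    {"date": "01/20/2024", "amount": "$300.00"},
    {"date": "02/03/2024", "amount": "$50.25"},
]

# Hypothetical source B: already monthly, ISO-style month keys, numeric amounts.
monthly_rows = [
    {"month": "2024-01", "amount": 1500.50},
    {"month": "2024-02", "amount": 50.25},
]


def parse_amount(text):
    """Turn a formatted string such as '$1,200.50' into a float."""
    return float(text.replace("$", "").replace(",", ""))


def daily_to_monthly(rows):
    """Roll daily rows up to month granularity, keyed by 'YYYY-MM'."""
    totals = defaultdict(float)
    for row in rows:
        day = datetime.strptime(row["date"], "%m/%d/%Y")
        totals[day.strftime("%Y-%m")] += parse_amount(row["amount"])
    return dict(totals)


# After the roll-up, both sources share the same key format and granularity.
rolled_up = daily_to_monthly(daily_rows)
reference = {row["month"]: row["amount"] for row in monthly_rows}
```

Even this toy case needs two format conversions and one aggregation before the sources can be compared, which is where the hours of ETL time tend to go.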

Page 13: The Do's and Don'ts of Data Mining

Do use more than one technique/algorithm.

Page 14: The Do's and Don'ts of Data Mining

Do cross-check data coming out of the ETL process with the original values, and with project stakeholders.
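The automated half of that cross-check can be sketched as a small reconciliation routine. This is a minimal sketch under assumed conditions: the `reconcile` function, the `id` and `sales` fields, and the sample rows are all hypothetical. The stakeholder review it cannot replace.

```python
def reconcile(source_rows, loaded_rows, key, value, tolerance=1e-9):
    """Compare ETL output with the original rows and list any discrepancies."""
    issues = []
    if len(source_rows) != len(loaded_rows):
        issues.append(
            f"row count: source={len(source_rows)} loaded={len(loaded_rows)}"
        )

    source_by_key = {row[key]: row[value] for row in source_rows}
    loaded_by_key = {row[key]: row[value] for row in loaded_rows}

    # Keys that were dropped or invented somewhere in the ETL process.
    for k in sorted(source_by_key.keys() - loaded_by_key.keys()):
        issues.append(f"missing after ETL: {k}")
    for k in sorted(loaded_by_key.keys() - source_by_key.keys()):
        issues.append(f"unexpected after ETL: {k}")

    # Values that changed for keys present on both sides.
    for k in sorted(source_by_key.keys() & loaded_by_key.keys()):
        if abs(source_by_key[k] - loaded_by_key[k]) > tolerance:
            issues.append(
                f"value changed for {k}: {source_by_key[k]} -> {loaded_by_key[k]}"
            )
    return issues


# Hypothetical example: one row lost and one value mangled during loading.
source = [
    {"id": "A1", "sales": 100.0},
    {"id": "A2", "sales": 250.0},
    {"id": "A3", "sales": 75.0},
]
loaded = [
    {"id": "A1", "sales": 100.0},
    {"id": "A2", "sales": 25.0},
]
problems = reconcile(source, loaded, key="id", value="sales")
```

An empty list means the loaded data matches the source on the checked field. Anything else is a concrete item to raise with the people who own the data.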

Page 15: The Do's and Don'ts of Data Mining

…According to Falk Huettmann

Wildlife Ecologist

His work is explicit in space and time, and looks closely at the global effects of the economy.

Page 16: The Do's and Don'ts of Data Mining

DO BE INFORMED

Stay fluent on the latest data mining concepts and approaches, as well as data mining history.

Page 17: The Do's and Don'ts of Data Mining

The Don’ts of

Data Mining ✗

Page 18: The Do's and Don'ts of Data Mining

…According to Scott Terry

President

Rapid Progress Marketing and Modeling, LLC

www.RPMSquared.com

Page 19: The Do's and Don'ts of Data Mining

DO NOT EVER…

I MEAN EVER UNDERESTIMATE THE POWER OF GOOD DATA PREPARATION

Page 20: The Do's and Don'ts of Data Mining

Do Not Ascribe Them Mystical Powers and Wrongly Think “It’s All About the Algorithms”

Page 21: The Do's and Don'ts of Data Mining

…According to Dean Abbott

Founder & President

Abbott Analytics/Abbott Consulting

www.abbottanalytics.com @deanabb

Page 22: The Do's and Don'ts of Data Mining

DON’T USE THE DEFAULT MODEL ACCURACY METRIC
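One common reading of this advice is that a default metric such as plain classification accuracy can reward a model that is useless for the actual goal. The sketch below illustrates that with made-up labels: 5 responders in 100 cases. The data, the two "models", and the helper names are assumptions for illustration only, and precision and recall are just one possible alternative to the default.

```python
def accuracy(actual, predicted):
    """Fraction of cases where the prediction matches the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def precision_recall(actual, predicted, positive=1):
    """Precision and recall for the positive class (0.0 when undefined)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical data: 5 responders followed by 95 non-responders.
actual = [1] * 5 + [0] * 95

# Model A never predicts a responder.
always_negative = [0] * 100

# Model B finds 4 of the 5 responders at the cost of 10 false alarms.
model_b = [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85

acc_a = accuracy(actual, always_negative)
acc_b = accuracy(actual, model_b)
pr_a = precision_recall(actual, always_negative)
pr_b = precision_recall(actual, model_b)
```

Model A wins on accuracy (0.95 against 0.89) while finding no responders at all. Model B loses on accuracy but recovers 80% of them. Which one is better depends on the project objective, not on the default score.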

Page 23: The Do's and Don'ts of Data Mining

…According to Gregory Piatetsky-Shapiro

Editor

www.kdnuggets.com @kdnuggets

Page 24: The Do's and Don'ts of Data Mining

Don’t Overfit

With Big Data, it is easy to find patterns even in random data. Use appropriate tests such as randomization tests to avoid finding false patterns in test data, which will not hold later on.
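A randomization (permutation) test can be sketched in a few lines. The version below is a minimal illustration, not a prescribed procedure: it assumes the "pattern" is a difference in means between two groups, and the function name, shuffle count, and seed are arbitrary choices made for the example.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_shuffles=2000, seed=0):
    """Estimate how often a random relabeling produces a difference in means
    at least as large as the one actually observed."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # Shuffling breaks any real link between group label and value.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_shuffles + 1)


# Hypothetical measurements for two groups.
p_separated = permutation_test(
    [10, 11, 12, 13, 14, 15, 16, 17], [1, 2, 3, 4, 5, 6, 7, 8]
)
p_identical = permutation_test([1, 2, 3], [1, 2, 3])
```

A small estimated p-value says the observed difference rarely arises from random relabeling. A value near 1 says the apparent pattern is the kind that shuffled data produces all the time.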

Page 25: The Do's and Don'ts of Data Mining

…According to Jim Kenyon

Director of IT Services

Optimization Group

www.optimizationgroup.com

Page 26: The Do's and Don'ts of Data Mining

Do not just collect a pile of data and “toss it into the big data mining engine” to see what comes out.

Domain knowledge is an important cross-check on the variables being used. Extraneous data can reduce model accuracy.

Page 27: The Do's and Don'ts of Data Mining

Do not underestimate the power of a simpler-to-understand solution that is slightly less accurate.

A model a client cannot grasp is one that will not be trusted as much as one that “makes sense.”

Page 28: The Do's and Don'ts of Data Mining

…According to Falk Huettmann

Wildlife Ecologist

His work is explicit in space and time, and looks closely at the global effects of the economy.

Page 29: The Do's and Don'ts of Data Mining

Don’t forget to document all modeling steps and underlying data

Page 30: The Do's and Don'ts of Data Mining

Do not blindly trust assumptions made to satisfy frequency statistics, as well as p-values and AIC

Page 31: The Do's and Don'ts of Data Mining

Fun data mining articles

SUBSCRIBE TO SALFORD SYSTEMS’ BLOG

Sign up for more