1/17/17
IS5126 - HowBA
Lecture 2 – Data, Databases, SQL, Behavioral Analytics; Jan 18, 2017
Dr. Tuan Q Phan
NUS IS5126

Admin
• Pick up syllabus and schedule, also available on my website: http://www.tuanqphan.us
• Purchase HBS Case from http://hbsp.harvard.edu
  – Data.gov, #9-610-075
• Sign up team of 4 on IVLE by Jan. 30
  – Use IVLE forums to find teammates
Dr. Tuan Q PHAN, NUS IS5126, (c) 2017

Learning Objectives
• Data.gov Case Discussion and Presentations
• Data Manipulation, ETL
• SQL
  – Database Design
  – Best Practices
  – Normalization Guidelines
• Marketing and Behavioral Analytics
• Mini-case

Learning Objectives
• Products
  – Product Life Cycle
  – Supply/Demand
  – Market Basket
  – Marketing Strategy
• People
  – CRM
  – Utility Modeling
• Organizations/Companies
  – Competition
  – Strategy
• Correlation and Causalities
• Resource:
  – The Ten Day MBA, Steven Silbiger
  – 50 Social/Psychology books: http://www.sparringmind.com/psychology-books/

Databases and Manipulation
[Diagram: Raw data, Data warehouse, with Import, Transform, and Analyze steps]

Data Manipulation
• Raw data is large, unstructured, noisy
• Extract, Transform, Load (ETL): process to "clean up" the data for processing and storage
• Extract: parsing, collection from multiple sources/formats, web scraping
• Transform: convert to appropriate format, apply set of rules, noise reduction, error handling, translate codes, validation
  – Python, SQL, awk, sed, …
• Load: loads into the data warehouse (database)
• Staging environment
• Resource: The Data Warehouse ETL Toolkit, Ralph Kimball & Joe Caserta
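The extract, transform, and load steps above can be sketched with Python's standard csv and sqlite3 modules. This is a minimal sketch, not a production pipeline: the RAW extract, its messy rows, and the books table schema are hypothetical, and the transform step shows only a few of the listed rules (trimming, type conversion, validation, de-duplication).

```python
import csv
import io
import sqlite3

# Hypothetical raw extract: stray whitespace, a duplicate id, an invalid price.
RAW = """id,title,published_year,price,author
1, Practical SQL ,1998,14.00,Bowman
2,Data Mining,2011,26.85,Linoff
2,Data Mining,2011,26.85,Linoff
3,Scoring Points,2008,n/a,Humby
"""

def extract(text):
    """Extract: parse CSV text into one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim strings, convert types, drop invalid and duplicate rows."""
    clean, seen = [], set()
    for row in rows:
        try:
            rec = (
                int(row["id"]),
                row["title"].strip(),
                int(row["published_year"]),
                float(row["price"]),
                row["author"].strip(),
            )
        except ValueError:
            continue  # validation: skip rows whose fields fail type conversion
        if rec[0] in seen:
            continue  # skip rows whose id was already loaded
        seen.add(rec[0])
        clean.append(rec)
    return clean

def load(conn, records):
    """Load: write the cleaned records into a warehouse table."""
    conn.execute(
        "create table if not exists books "
        "(id integer primary key, title text, published_year integer, "
        "price real, author text)"
    )
    conn.executemany("insert into books values (?, ?, ?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stands in for the data warehouse
load(conn, transform(extract(RAW)))
result = conn.execute("select id, title, price from books order by id").fetchall()
print(result)  # [(1, 'Practical SQL', 14.0), (2, 'Data Mining', 26.85)]
```

The row with price "n/a" fails validation and the repeated id 2 is dropped, so only two clean rows reach the table.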
insert into books values (2, 'Data Mining', 2011, 26.85, 'Linoff');
id          title          published_year  price       author
----------  -------------  --------------  ----------  ----------
1           Practical SQL  1998            14.0        Bowman
2           Data Mining    2011            26.85       Linoff
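The create table statement and the first insert are not shown above. A minimal Python sketch with the standard sqlite3 module, assuming a hypothetical schema built from the five columns in the output, creates the same two rows using parameterized inserts (? placeholders), which avoid hand-quoting string values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
# Hypothetical schema: column names taken from the output above, types assumed.
conn.execute(
    "create table books (id integer primary key, title text, "
    "published_year integer, price real, author text)"
)
rows = [
    (1, "Practical SQL", 1998, 14.00, "Bowman"),
    (2, "Data Mining", 2011, 26.85, "Linoff"),
]
# executemany runs the insert once per tuple, binding each value to a ? placeholder.
conn.executemany("insert into books values (?, ?, ?, ?, ?)", rows)
result = conn.execute("select * from books order by id").fetchall()
print(result)
```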
SQL – Loading data
• Load data from a CSV file (software-specific; the commands below are for the sqlite3 shell): books.csv
3,"Scoring Points",2008,22.00,"Humby"
4,"Business Intelligence",2009,57.85,"Vercellis"
.separator ","
.import books.csv books
id          title          published_year  price       author
----------  -------------  --------------  ----------  ----------
1           Practical SQL  1998            14.0        Bowman
2           Data Mining    2011            26.85       Linoff
3           Scoring Point  2008            22.0        Humby
4           Business Inte  2009            57.85       Vercellis
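The .separator and .import commands are specific to the sqlite3 command-line shell. An equivalent load from Python can be sketched with the standard csv module; here io.StringIO stands in for opening books.csv, and the table schema is an assumption:

```python
import csv
import io
import sqlite3

# Contents of books.csv from the slide (no header row, quoted text fields).
BOOKS_CSV = '''3,"Scoring Points",2008,22.00,"Humby"
4,"Business Intelligence",2009,57.85,"Vercellis"
'''

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table books (id integer primary key, title text, "
    "published_year integer, price real, author text)"
)
# csv.reader strips the double quotes around quoted fields.
reader = csv.reader(io.StringIO(BOOKS_CSV))
records = [(int(i), t, int(y), float(p), a) for i, t, y, p, a in reader]
conn.executemany("insert into books values (?, ?, ?, ?, ?)", records)
loaded = conn.execute("select id, title from books order by id").fetchall()
print(loaded)
```

The full titles are stored; the truncated titles in the shell output above ("Scoring Point", "Business Inte") appear to be a display effect of the shell's fixed column widths.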
SQL – DELETE & UPDATE
• Deletes a row: delete from books where id=4;
• Modifies value(s); without a where clause, the update changes every row: update books set price=5.00;
Output after the update (row 4 is still present, so this reflects the update but not the delete):
id          title          published_year  price       author
----------  -------------  --------------  ----------  ----------
1           Practical SQL  1998            5.0         Bowman
2           Data Mining    2011            5.0         Linoff
3           Scoring Point  2008            5.0         Humby
4           Business Inte  2009            5.0         Vercellis
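Because the update above has no where clause, it sets every price to 5.00. A minimal sketch (Python sqlite3, with a hypothetical trimmed schema) of deleting one row and then updating a single row selected by its id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical trimmed schema: only the columns this example needs.
conn.execute("create table books (id integer primary key, title text, price real)")
conn.executemany(
    "insert into books values (?, ?, ?)",
    [(1, "Practical SQL", 14.00), (2, "Data Mining", 26.85),
     (4, "Business Intelligence", 57.85)],
)
# Delete a single row, selected by primary key.
conn.execute("delete from books where id = ?", (4,))
# The where clause restricts the update to one row; without it, every row changes.
conn.execute("update books set price = ? where id = ?", (5.00, 2))
result = conn.execute("select id, price from books order by id").fetchall()
print(result)  # [(1, 14.0), (2, 5.0)]
```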
SQL – SELECT
• Query database: select * from books;
• Sort results: select * from books order by published_year desc;
id          title          published_year  price       author
----------  -------------  --------------  ----------  ----------
1           Practical SQL  1998            5.0         Bowman
2           Data Mining    2011            5.0         Linoff
3           Scoring Point  2008            5.0         Humby
4           Business Inte  2009            5.0         Vercellis

id          title        published_year  price       author
----------  -----------  --------------  ----------  ----------
2           Data Mining  2011            5.0         Linoff
4           Business In  2009            5.0         Vercellis
3           Scoring Poi  2008            5.0         Humby
1           Practical S  1998            5.0         Bowman
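The same sort can be run from Python. This sketch (hypothetical trimmed schema) reproduces the newest-first ordering of ids shown above; without an order by, SQL does not guarantee any row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table books (id integer, title text, published_year integer)")
conn.executemany(
    "insert into books values (?, ?, ?)",
    [(1, "Practical SQL", 1998), (2, "Data Mining", 2011),
     (3, "Scoring Points", 2008), (4, "Business Intelligence", 2009)],
)
# desc sorts from the largest published_year to the smallest.
newest_first = [row[0] for row in conn.execute(
    "select id from books order by published_year desc")]
print(newest_first)  # [2, 4, 3, 1]
```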
SQL – SELECT … WHERE
• Where clause subsets results: select title, author, published_year from books where published_year > 2000;
• Combining conditions: select * from books where published_year > 2000 and author='Linoff';
title        author      published_year
-----------  ----------  --------------
Data Mining  Linoff      2011
Scoring Poi  Humby       2008
Business In  Vercellis   2009

id          title        published_year  price       author
----------  -----------  --------------  ----------  ----------
2           Data Mining  2011            5.0         Linoff
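A sketch of both filters from Python, passing the comparison values as parameters instead of embedding them in the SQL string (the trimmed schema is an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table books (id integer, title text, published_year integer, author text)"
)
conn.executemany(
    "insert into books values (?, ?, ?, ?)",
    [(1, "Practical SQL", 1998, "Bowman"), (2, "Data Mining", 2011, "Linoff"),
     (3, "Scoring Points", 2008, "Humby"), (4, "Business Intelligence", 2009, "Vercellis")],
)
# One condition: books published after 2000.
recent = conn.execute(
    "select title from books where published_year > ? order by id", (2000,)
).fetchall()
# Two conditions joined by and: a row must satisfy both.
recent_linoff = conn.execute(
    "select id from books where published_year > ? and author = ?", (2000, "Linoff")
).fetchall()
print(recent)
print(recent_linoff)
```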
SQL – SELECT … FUZZY
• Allows for wildcard string matching with like (% matches any sequence of characters, _ matches exactly one character)
select * from books where title like '%ness%';
id          title                  published_year  price       author
----------  ---------------------  --------------  ----------  ----------
4           Business Intelligence  2009            5.0         Vercellis
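A sketch of like patterns from Python (hypothetical two-column table). It also shows that SQLite's like is case-insensitive for ASCII letters by default, a behavior other databases may not share:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table books (id integer, title text)")
conn.executemany(
    "insert into books values (?, ?)",
    [(1, "Practical SQL"), (2, "Data Mining"),
     (3, "Scoring Points"), (4, "Business Intelligence")],
)
# %ness% matches "ness" anywhere in the title.
contains_ness = conn.execute(
    "select id from books where title like ?", ("%ness%",)).fetchall()
# Lowercase "d%" still matches "Data Mining" because SQLite's like ignores ASCII case.
starts_with_d = conn.execute(
    "select id from books where title like ?", ("d%",)).fetchall()
print(contains_ness, starts_with_d)  # [(4,)] [(2,)]
```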
SQL – Group by
• Aggregate by a column:
insert into books values (5, '2008 book', 2008, 25.00, 'Phan');
select published_year, count(*), avg(price), sum(price) from books group by published_year;
id          title          published_year  price       author
----------  -------------  --------------  ----------  ----------
1           Practical SQL  1998            5.0         Bowman
2           Data Mining    2011            5.0         Linoff
3           Scoring Point  2008            5.0         Humby
4           Business Inte  2009            5.0         Vercellis
5           2008 book      2008            25.0        Phan
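The grouped result itself is not shown above. This sketch (Python sqlite3, trimmed schema as an assumption) rebuilds the five-row table and runs the same aggregation, with an order by added so the output order is deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table books (id integer, published_year integer, price real)")
# Table state from the listing above: all prices 5.00 plus the extra 2008 book at 25.00.
conn.executemany(
    "insert into books values (?, ?, ?)",
    [(1, 1998, 5.0), (2, 2011, 5.0), (3, 2008, 5.0), (4, 2009, 5.0), (5, 2008, 25.0)],
)
summary = conn.execute(
    "select published_year, count(*), avg(price), sum(price) "
    "from books group by published_year order by published_year"
).fetchall()
for row in summary:
    print(row)
# (1998, 1, 5.0, 5.0)
# (2008, 2, 15.0, 30.0)
# (2009, 1, 5.0, 5.0)
# (2011, 1, 5.0, 5.0)
```

Only 2008 has two books, so it is the only group where count, avg, and sum differ from a single row's price.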
• Acquisition:
  – Acquisition rate (%) = (Number of prospects acquired / Number of prospects targeted) x 100
  – Acquisition is defined as the first purchase or purchasing in the first predefined period
  – Denotes average probability of acquiring a customer
  – Always calculated for a group of customers
  – Usually computed on a campaign-by-campaign basis
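The acquisition rate formula as a small Python function; the campaign figures are hypothetical:

```python
def acquisition_rate(prospects_acquired, prospects_targeted):
    """Acquisition rate (%) = prospects acquired / prospects targeted x 100."""
    if prospects_targeted <= 0:
        raise ValueError("prospects_targeted must be positive")
    return 100 * prospects_acquired / prospects_targeted

# Hypothetical campaign: 10,000 prospects targeted, 250 made a first purchase.
rate = acquisition_rate(250, 10_000)
print(rate)  # 2.5
```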
• Acquisition cost per prospect:
  – Acquisition cost ($) = Acquisition spending ($) / Number of prospects acquired
  – Measured in monetary terms
  – Precise values for companies targeting prospects through direct mail
  – Less precise for broadcast communications
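The acquisition cost formula in the same style; the spending and acquisition counts are hypothetical:

```python
def acquisition_cost(acquisition_spending, prospects_acquired):
    """Acquisition cost ($) = acquisition spending ($) / number of prospects acquired."""
    if prospects_acquired <= 0:
        raise ValueError("no prospects acquired; acquisition cost is undefined")
    return acquisition_spending / prospects_acquired

# Hypothetical direct-mail campaign: $5,000 spent, 250 prospects acquired.
cost = acquisition_cost(5_000, 250)
print(cost)  # 20.0
```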
• Average inter-purchase time (AIT) = 1 / Number of purchase incidences from first purchase till current time period
  – Measured in time periods
  – Evaluation of metric:
    – Easy to calculate
    – Useful for industries with frequent customer purchases
    – Marketing intervention might be warranted anytime customers fall considerably below their AIT
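The AIT formula leaves the time base implicit. This sketch assumes that "number of purchase incidences" means purchases per period, counted inclusively from the first purchase period through the current period, so AIT is the elapsed periods divided by the number of purchases; the purchase history is hypothetical.

```python
def average_interpurchase_time(purchase_periods, current_period):
    """AIT under the assumed reading: 1 / (purchases per period since first purchase)."""
    if not purchase_periods:
        raise ValueError("AIT is undefined for a customer with no purchases")
    # Assumption: count periods inclusively from the first purchase to the current period.
    elapsed = current_period - min(purchase_periods) + 1
    # elapsed / purchases is the reciprocal of the purchase frequency (purchases / elapsed).
    return elapsed / len(purchase_periods)

# Hypothetical customer who bought in months 1, 4, 7 and 10, evaluated at month 12.
ait = average_interpurchase_time([1, 4, 7, 10], current_period=12)
print(ait)  # 3.0
```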
People – CRM Retention/Defection rates
• Retention rate
  – Average likelihood that a customer purchases in period t, given that he/she has purchased in the last period t-1
  – Retention rate (%) = [(Number of customers in cohort buying in period t | buying in period t-1) / Number of customers in cohort buying in period t-1] x 100
• Number of retained customers in any period (t+n) = (Number of acquired customers in period t) x (Retention rate)^n
  – Assuming a constant retention rate among acquired customers
• Example
  – Assume a constant retention rate of 0.75, or defection rate of 0.25
  – Average lifetime duration = 4 (1 / [1 - 0.75])
  – Customers starting at beginning of year 1 = 100
  – Customers remaining at end of year 1 = 75.00 (100 x 0.75^1)
  – Customers remaining at end of year 2 = 56.25 (100 x 0.75^2)
  – Customers remaining at end of year 3 = 42.19 (100 x 0.75^3)
  – Customers remaining at end of year 4 = 31.64 (100 x 0.75^4)
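The retention example can be checked with two small functions. The sketch assumes a constant retention rate, as the slide does, and prints unrounded values (the slide rounds them to two decimals):

```python
def retained_customers(acquired, retention_rate, n):
    """Customers still retained n periods after acquisition, at a constant retention rate."""
    return acquired * retention_rate ** n

def average_lifetime_duration(retention_rate):
    """Less precise metric: 1 / (1 - retention rate), i.e. 1 / defection rate."""
    return 1 / (1 - retention_rate)

lifetime = average_lifetime_duration(0.75)
remaining = [retained_customers(100, 0.75, year) for year in range(1, 5)]
print(lifetime)   # 4.0
print(remaining)  # [75.0, 56.25, 42.1875, 31.640625]
```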
People – CRM Defection Rate vs. Customer Tenure
• Variation (or heterogeneity) around average lifetime duration of 4 years
[Figure: number of customers defecting (y-axis, 0 to 30) by customer tenure in periods (x-axis, 1 to 19)]
People – CRM Lifetime Duration
• Less precise metric
  – Average lifetime duration = 1 / (1 - Average retention rate)
• More precise metric
  – Average lifetime duration:
$$\text{Average lifetime duration} = \frac{\sum_{t=1}^{T} t \times \text{Number of customers retained}_t}{N}$$
  – where N = cohort size, t = time period
• Complete or incomplete information on customer
  – Complete: customer's time of first and last purchases are known
  – Incomplete: either only time of first purchase, or only time of last purchase, or both time of first and last purchases are unknown
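The more precise metric can be sketched as a weighted average of lifetimes. This assumes complete information and reads "Number of customers retained_t" as the number of customers whose observed lifetime is exactly t periods, so the counts sum to the cohort size N; the cohort itself is hypothetical.

```python
def lifetime_duration_cohort(retained_by_period):
    """(1/N) * sum over t of t * customers retained_t.

    Assumption: retained_by_period[t] counts customers whose observed lifetime is
    exactly t periods, so the counts add up to the cohort size N.
    """
    n = sum(retained_by_period.values())  # cohort size N
    return sum(t * count for t, count in retained_by_period.items()) / n

# Hypothetical cohort of 100 customers with complete information.
cohort = {1: 25, 2: 30, 3: 20, 4: 15, 5: 10}
avg = lifetime_duration_cohort(cohort)
print(avg)  # 2.55
```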
People – CRM Probability(Active)
• Probability of a customer being active in time t in a non-contractual setting
  – Probability(Active) = T^n
  – where n = number of purchases in a given period, T = time of the last purchase (given as a fraction of the observation period)
  – Simple approximation of probability(active)
  – More advanced computation methods exist
People – CRM Probability(Active)
• Customer 1: T = (8/12) = 0.667 and n = 4
  – Probability(Active) = (0.667)^4 = 0.198
• Customer 2: T = (8/12) = 0.667 and n = 2
  – Probability(Active) = (0.667)^2 = 0.444
[Figure: purchase timelines for Customer 1 and Customer 2 across an observation period (Month 1 to Month 12, with Month 8 marked) and a holdout period ending at Month 18; X indicates that a purchase was made by a customer in that month]
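The Probability(Active) approximation as a Python function, reproducing the two customers above (the function and parameter names are illustrative):

```python
def probability_active(last_purchase_period, observation_periods, n_purchases):
    """Simple approximation: P(Active) = T ** n, where T is the time of the last
    purchase as a fraction of the observation period and n the number of purchases."""
    t = last_purchase_period / observation_periods
    return t ** n_purchases

# Values from the example: both customers last purchased in month 8 of 12.
p1 = probability_active(8, 12, 4)
p2 = probability_active(8, 12, 2)
print(round(p1, 3), round(p2, 3))  # 0.198 0.444
```

With the same last-purchase month, Customer 1's higher purchase count lowers the score: under this approximation, a frequent buyer who has been silent since Month 8 is treated as more likely to have lapsed than an infrequent one.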