The Big Data Challenges of Computational Market Research Frank Smadja [email protected] (@FrankieMbaye) EVP Engineering Toluna April 2014

Big data market research

Jan 26, 2015

Frank Smadja

Big data challenges for Market Research.
Presented at BIG 2014 (http://big2014.org) part of WWW2014 (http://www2014.kr/).
Transcript
Page 1: Big data market research

The Big Data Challenges of Computational Market Research

Frank Smadja

[email protected] (@FrankieMbaye)

EVP Engineering, Toluna

April 2014

Page 2: Big data market research

Table of Contents

1. What is a Market Research study?
2. The main challenge: Targeting
3. Machine Learning Problem and Model
4. Some Experiments
5. Current and Future Work

Page 3: Big data market research

What is a Market Research Study?

Page 4: Big data market research

Market Research Goal: Answering Questions for Brands

Customer/Employee Satisfaction:

• Are my customers happy?

• What can I do better for them?

• Am I getting better or worse?

Concept testing:

• Would dog owners buy my organic dog food?

• What should be my target market?

Ad testing:

• Is my advertising campaign effective?

Brand positioning:

• How is my brand doing compared to the competition?

• What are my perceived strong features?

• Where should I invest more?

And many more types of questions

Page 5: Big data market research

Output example: ‘Positioning survey’ for Hilton Garden Inn.

Page 8: Big data market research

Example: Positioning survey for Beyonce

Page 9: Big data market research

Market Research Main Challenge: Targeting

Select a segment of respondents (a sample) that is:

• Relevant to the question (dog owners who have one big dog and one small dog, smokers who are trying to stop, etc.)

• Representative and balanced (not biased).

The tougher and more restrictive the targeting, the more expensive the study.

Page 10: Big data market research

The Targeting Pipeline and Incidence Rate

Pipeline stages: Demographics, Behavioral, Study.

Demographics: select the right population based on simple demographic attributes (age, gender, region, ethnicity, income, etc.). Fixed set of attributes, known beforehand.

Behavioral: further select based on behavioral and custom attributes (flies more than 5 times a year, uses aspirin on a daily basis, etc.). Free-style attributes, usually unknown.

Incidence Rate:

IR = Completes / Starts

Cost is a decreasing function of IR: the lower the incidence rate, the more respondents must start the survey for each complete.

[Figure: the targeting process, running from Start to Complete]
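The incidence rate definition can be turned into a small calculation. The following is a minimal sketch, not Toluna's implementation; the function names `incidence_rate` and `starts_per_complete` are hypothetical and chosen only for illustration.

```python
def incidence_rate(completes: int, starts: int) -> float:
    """IR = Completes / Starts, as defined on the slide."""
    if starts <= 0:
        raise ValueError("starts must be positive")
    if not 0 <= completes <= starts:
        raise ValueError("completes must be between 0 and starts")
    return completes / starts


def starts_per_complete(ir: float) -> float:
    """Average number of respondents who must start for one to complete."""
    if not 0 < ir <= 1:
        raise ValueError("ir must be in (0, 1]")
    return 1 / ir


# Hypothetical study: 1,000 respondents start, 250 qualify and complete.
ir = incidence_rate(completes=250, starts=1000)
print(f"IR = {ir:.0%}, starts needed per complete = {starts_per_complete(ir):.1f}")
```

Since every start consumes panel capacity whether or not it ends in a complete, the number of starts per complete is one simple way to see how effort rises as the incidence rate falls.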

Page 11: Big data market research

Why is targeting hard?

Looking for 1,000 people in the UK who “smoke,” “tried to stop in the past,” “live around London,” “age 24-50.”

Data on UK population:

• 18% of the UK adults smoke

• 40% of smokers tried to stop

• 15% of the population is in the London area

• 30% is between 24-50

Incidence rate: 0.18 * 0.4 * 0.15 * 0.3 ≈ 0.3%

Sample size: 333,333

[Figure: nested populations, narrowing from UK to London, Adults, Smokers, and Tried to stop]
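The arithmetic on this slide multiplies the four rates together, which implicitly assumes the criteria are independent of one another. A minimal sketch of that calculation follows; the function names are hypothetical, and the rates are the ones quoted on the slide.

```python
import math


def combined_incidence(rates):
    """Multiply per-criterion rates; only valid if the criteria are independent."""
    ir = 1.0
    for rate in rates:
        ir *= rate
    return ir


def required_sample(target_completes, ir):
    """People to contact so that, on average, target_completes of them qualify."""
    return math.ceil(target_completes / ir)


# Rates from the slide: smokers, tried to stop, London area, age 24-50.
uk_rates = [0.18, 0.40, 0.15, 0.30]
ir = combined_incidence(uk_rates)
print(f"incidence rate: {ir:.3%}")
print(f"sample size for 1,000 completes: {required_sample(1000, ir):,}")
```

Without rounding, the product is about 0.324% and the sample size is roughly 308,600. The slide's figure of 333,333 corresponds to rounding the incidence rate to 0.3% before dividing.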

Page 12: Big data market research

State of the Art: Use Known Demographic Features

• Basic demographics are known: 100% incidence.
  o Age and London

• Smokers: 18%

• Tried to stop: 40%

Incidence rate: 1 * 0.18 * 0.4 = 7.2%

Sample size: about 13,900 (1,000 / 0.072)

[Figure: nested populations, narrowing from Adults in the London Area to Smokers and Tried to stop]

Page 13: Big data market research

New Direction: Use Known Features and Predict Unknown Features

• Basic demographics are known: 100% incidence.
  o Age and London

• Suppose we could predict smokers with 85% accuracy.

• Tried to stop still unknown: 40%

Incidence rate: 1 * 0.85 * 0.4 = 34%

Sample size: 2,900

[Figure: nested populations, narrowing from Adults in the London Area who are predicted to be smokers to Smokers and Tried to stop]
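The calculation above substitutes 0.85 for the 18% smoking rate, which reads "85% accuracy" as precision: the fraction of respondents predicted to be smokers who really are smokers. Under that assumption, the sketch below shows how the required sample size varies with the predictor's precision. The function name `sample_size` is hypothetical; the 40% "tried to stop" rate comes from the slide.

```python
import math


def sample_size(target_completes, precision, tried_to_stop_rate=0.40):
    """Sample needed when only predicted smokers are invited.

    precision: assumed share of predicted smokers who actually smoke.
    tried_to_stop_rate: still unknown per respondent, so it stays in the product.
    """
    ir = precision * tried_to_stop_rate
    return math.ceil(target_completes / ir)


# A precision equal to the 18% base rate means the predictor adds nothing.
for precision in (0.18, 0.50, 0.85):
    print(f"precision {precision:.0%}: sample size {sample_size(1000, precision):,}")
```

The sample size is inversely proportional to precision, so each improvement in the predictor translates directly into fewer respondents screened out.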

Page 14: Big data market research

How to Predict Features? The Space Model

Sparse matrix containing all the attributes (integer answers to questions) we have ever asked.

[Figure: users-by-features matrix. Rows: users (User 1, User 2, User 3, User 4, ...), 10^9 users. Columns: features, 10^7 features, grouped into demographic attributes and behavioral attributes. Example columns: Shirt color (Red, Blue), Smokes? (Yes, No), Sex, Age, Region, etc.]
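With 10^9 users and 10^7 features, only the cells that were actually answered can be stored. As an illustration of the idea, here is a minimal in-memory sketch using nested dictionaries; the class name `SparseAnswers` and its methods are hypothetical and say nothing about how the data is stored in production.

```python
from collections import defaultdict


class SparseAnswers:
    """Row-oriented sparse store: only answered (user, feature) cells take space."""

    def __init__(self):
        self._rows = defaultdict(dict)  # user id -> {feature name: integer answer}

    def set(self, user, feature, answer):
        self._rows[user][feature] = int(answer)

    def get(self, user, feature):
        # None means "never asked or not answered", which differs from answer 0.
        return self._rows.get(user, {}).get(feature)

    def answered_cells(self):
        return sum(len(row) for row in self._rows.values())

    def density(self, n_features):
        """Share of the users-by-features matrix that is filled in."""
        n_users = len(self._rows)
        if n_users == 0 or n_features == 0:
            return 0.0
        return self.answered_cells() / (n_users * n_features)


answers = SparseAnswers()
answers.set("user1", "smokes", 1)       # 1 = yes
answers.set("user1", "shirt_color", 0)  # 0 = red, 1 = blue (illustrative coding)
answers.set("user2", "smokes", 0)
print(answers.get("user2", "shirt_color"))  # unanswered cell
print(answers.density(n_features=4))
```

Keeping "unanswered" distinct from any valid answer matters here, because the missing cells are exactly the ones the prediction task is meant to fill.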

Page 15: Big data market research

The Learning Task - The Model

Try to predict answer to the “Smokes?” attribute based on other attributes.

Smokes? Dog owner? Jogger? Overweight?
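One very simple way to see whether another attribute carries signal about "Smokes?" is to compare smoking rates across its values. The sketch below is only a baseline for intuition, not the model the talk proposes; the function name `conditional_rate` and the toy rows are hypothetical.

```python
def conditional_rate(rows, target, given, value):
    """Share of respondents with target == 1 among those where given == value.

    rows: list of dicts mapping feature name -> integer answer (missing = absent).
    Returns None when no respondent matches.
    """
    matching = [r for r in rows if r.get(given) == value and target in r]
    if not matching:
        return None
    return sum(r[target] == 1 for r in matching) / len(matching)


# Toy data, invented for illustration only.
rows = [
    {"jogger": 1, "smokes": 0},
    {"jogger": 1, "smokes": 0},
    {"jogger": 0, "smokes": 1},
    {"jogger": 0, "smokes": 0},
    {"jogger": 0},  # "smokes" unanswered, so this row is ignored
]
print(conditional_rate(rows, "smokes", "jogger", 1))
print(conditional_rate(rows, "smokes", "jogger", 0))
```

If the two rates differ noticeably, the attribute is a useful predictor; combining many such attributes is what the learning task is about.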

Page 16: Big data market research

The Learning Task - Collaborative Filtering

User correlation or Feature correlation

User correlation: High level features [William Cohen]

• If Josie and Bob both have the X feature then if Josie has the Y feature, Bob is more likely to have the Y feature as well.

• Dog owners

• Political inclination, Taste, Lifestyle

Feature correlation:

• If Josie has the X feature, Josie is more likely to also have the Y feature.

• Joggers (y) and Smokers (n)

• Favorite sports and Race/Ethnicity

• Income level and Education level
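Both kinds of correlation can be measured directly on the answer data. The sketch below is one possible choice, assumed here for illustration: the phi coefficient for two binary features, and a simple agreement ratio between two users. The function names are hypothetical, and the slides do not say which measures are actually used.

```python
import math


def phi_coefficient(pairs):
    """Correlation between two binary features, from (x, y) pairs of 0/1 answers.

    Each pair comes from one user who answered both questions.
    """
    n11 = sum(1 for x, y in pairs if x == 1 and y == 1)
    n10 = sum(1 for x, y in pairs if x == 1 and y == 0)
    n01 = sum(1 for x, y in pairs if x == 0 and y == 1)
    n00 = sum(1 for x, y in pairs if x == 0 and y == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def user_agreement(a, b):
    """Share of commonly answered features on which two users gave the same answer."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return sum(a[f] == b[f] for f in shared) / len(shared)


# Feature correlation: joggers (x) who never smoke (y), toy data.
jogger_vs_smoker = [(1, 0), (0, 1), (1, 0), (0, 1)]
print(phi_coefficient(jogger_vs_smoker))

# User correlation: Josie and Bob agree on the two features they share.
josie = {"dog_owner": 1, "jogger": 1, "smokes": 0}
bob = {"dog_owner": 1, "jogger": 1}
print(user_agreement(josie, bob))
```

A strongly negative phi, as in the jogger and smoker toy data, is as informative for prediction as a strongly positive one.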

Page 17: Big data market research

First Experiments with Multiple Imputation

Smaller task: complete missing data on a single survey for a single customer.

Example: On a specific survey, some respondents skip certain questions, such as the income level question. Use the answers provided by other respondents to impute the missing data.

Imputation: Complete missing data with substituted values, with more or less sophistication: mean, nearest neighbor, multiple imputation, etc. [Andridge & Little 2011], [Rubin 1987], ...

Implementation: IBM SPSS Missing Values module. Uses an iterative Markov Chain Monte Carlo (MCMC) method and multiple imputation.

Used by the US Census Bureau.
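To make the idea of imputation concrete, here is a minimal sketch of the nearest-neighbor variant mentioned above: a respondent's missing answer is copied from the most similar respondent who did answer. This is single imputation written for illustration, not the MCMC-based multiple imputation of the SPSS module; the function names and the toy rows are hypothetical.

```python
def similarity(a, b, ignore):
    """Number of commonly answered features (other than `ignore`) with equal answers."""
    shared = (a.keys() & b.keys()) - {ignore}
    return sum(a[f] == b[f] for f in shared)


def nearest_neighbor_impute(rows, target):
    """Return copies of rows where a missing `target` is filled from the closest donor.

    rows: list of dicts mapping feature name -> integer answer (missing = absent).
    Ties are broken in favor of the earliest donor in the list.
    """
    donors = [r for r in rows if target in r]
    filled = []
    for row in rows:
        if target in row or not donors:
            filled.append(dict(row))
            continue
        best = max(donors, key=lambda d: similarity(row, d, target))
        filled.append({**row, target: best[target]})
    return filled


# Toy survey: the third respondent skipped the income level question.
survey = [
    {"age_band": 1, "dog_owner": 1, "income_level": 3},
    {"age_band": 2, "dog_owner": 0, "income_level": 1},
    {"age_band": 1, "dog_owner": 1},
]
print(nearest_neighbor_impute(survey, "income_level")[2])
```

Multiple imputation goes further by drawing several plausible values per missing cell, so that the uncertainty of the fill-in is carried into later analysis.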

Page 18: Big data market research

First Experiments with Multiple Imputation: Some Results

Where it does not work:

• Too much missing data (over 10%)

• Too many possible answers (What are the names of your children? What is your home city? etc.)

• Not enough data overall (less than 1,000)

Examples of features that work well: dog owners, smokers, income level, age (3 bands), etc.

Accuracy: 85% using blind tests.
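A blind test of this kind can be run by hiding answers that are actually known, imputing them, and counting how often the imputed value matches the hidden one. The sketch below assumes that protocol; the slides do not describe the exact procedure. The function names are hypothetical, and the mode-based imputer is a deliberately simple stand-in for the real one.

```python
import random
from collections import Counter


def mode_impute(rows, target):
    """Baseline imputer: fill a missing `target` with the most common observed answer."""
    observed = [r[target] for r in rows if target in r]
    if not observed:
        return [dict(r) for r in rows]
    mode = Counter(observed).most_common(1)[0][0]
    return [{**r, target: r.get(target, mode)} for r in rows]


def blind_test_accuracy(rows, target, impute, hide_fraction=0.1, seed=0):
    """Hide a share of the known `target` answers, impute them, and score the result."""
    known = [i for i, r in enumerate(rows) if target in r]
    if not known:
        raise ValueError("no known answers to hide")
    n_hide = max(1, int(len(known) * hide_fraction))
    hidden = random.Random(seed).sample(known, n_hide)
    hidden_set = set(hidden)
    # Copy every row, dropping the target answer only for the hidden respondents.
    masked = [
        {k: v for k, v in r.items() if not (i in hidden_set and k == target)}
        for i, r in enumerate(rows)
    ]
    filled = impute(masked, target)
    correct = sum(filled[i].get(target) == rows[i][target] for i in hidden)
    return correct / n_hide


# Toy data in which every respondent smokes, so the baseline is always right.
toy_rows = [{"smokes": 1, "dog_owner": i % 2} for i in range(20)]
print(blind_test_accuracy(toy_rows, "smokes", mode_impute))
```

Any imputer with the same `(rows, target)` signature can be passed in, which makes it easy to compare a simple baseline against a more sophisticated method on the same hidden cells.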

Page 19: Big data market research

Current Work

Currently working on the storage component on AWS, using HBase, Elasticsearch, and Hadoop.

Some queries:

• Find people who smoke, have a red shirt, and are between 22 and 34.

• Compute and store the similarity or correlation between any pair of users.

• Compute and store the similarity between features.
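The first query is a conjunction of attribute filters, which is the kind of lookup an inverted index serves well. The sketch below is an in-memory stand-in written for illustration; it does not use or imitate the HBase or Elasticsearch APIs, and the class name, feature names, and value codes are all hypothetical.

```python
from collections import defaultdict


class InvertedIndex:
    """Maps each (feature, value) pair to the set of users who gave that answer."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, user, feature, value):
        self._postings[(feature, value)].add(user)

    def equals(self, feature, value):
        return set(self._postings.get((feature, value), set()))

    def between(self, feature, low, high):
        """Users whose integer answer lies in [low, high], inclusive."""
        result = set()
        for value in range(low, high + 1):
            result |= self._postings.get((feature, value), set())
        return result


RED = 0  # illustrative integer code for the shirt color answer

index = InvertedIndex()
for user, smokes, shirt, age in [
    ("user1", 1, RED, 25),
    ("user2", 1, RED, 40),
    ("user3", 0, RED, 30),
    ("user4", 1, 1, 22),
]:
    index.add(user, "smokes", smokes)
    index.add(user, "shirt_color", shirt)
    index.add(user, "age", age)

# People who smoke, have a red shirt, and are between 22 and 34.
matches = index.equals("smokes", 1) & index.equals("shirt_color", RED) & index.between("age", 22, 34)
print(sorted(matches))
```

Set intersection keeps the query cost tied to the size of the posting lists rather than to the full user base, which is the property a search engine provides at scale.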

Page 20: Big data market research

Future Work

• Define the model: binary features (smokes), integer features (number of children, income), string features (city, diseases, etc.).

• Experiment on a large scale with collaborative filtering algorithms and others.

• Experiment with user-based and feature-based filtering (blend? Slope One?)

• Integrate this into Targeting methodology
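For readers unfamiliar with Slope One, the sketch below shows the weighted variant applied to survey answers: the answer to a target feature is predicted from a respondent's other answers plus the average difference observed between features across other respondents. It assumes numeric or ordinal answers and is only an illustration of the technique named above; the function name and toy data are hypothetical.

```python
from collections import defaultdict


def slope_one_predict(rows, user_answers, target):
    """Weighted Slope One prediction of `target` for one respondent.

    rows: list of dicts mapping feature name -> numeric answer (training data).
    user_answers: the respondent's known answers, without `target`.
    Returns None when no training row shares a feature with the respondent.
    """
    diff_sum = defaultdict(float)  # feature -> sum of (target answer - feature answer)
    count = defaultdict(int)       # feature -> number of rows with both answers
    for row in rows:
        if target not in row:
            continue
        for feature, value in row.items():
            if feature != target:
                diff_sum[feature] += row[target] - value
                count[feature] += 1

    numerator = 0.0
    denominator = 0
    for feature, value in user_answers.items():
        n = count.get(feature, 0)
        if feature != target and n:
            # (value + average deviation) weighted by the supporting row count n.
            numerator += value * n + diff_sum[feature]
            denominator += n
    return numerator / denominator if denominator else None


# Toy data: education level tends to sit one band above income band.
training = [
    {"income_band": 1, "education_level": 2},
    {"income_band": 2, "education_level": 3},
]
print(slope_one_predict(training, {"income_band": 3}, "education_level"))
```

For binary features such as "Smokes?", the numeric prediction would still need to be thresholded or otherwise mapped back to a yes/no answer.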

Page 21: Big data market research

Q&A

Suggestions? Ideas? Comments? Questions?