Challenges and Opportunities in Data Mining: Big Data, Predictive User Modeling, and Personalization
Bamshad Mobasher
Center for Web Intelligence, School of Computing, DePaul University, Chicago, Illinois, USA
April 20, 2012
Google Trends: Data Mining vs. Analytics
The Big Question
Will data mining remain relevant? If so, how?
Quick survey: Do you think the amount of data available in the digital world
• will decrease in the future?
• will become less complex?
Where is the Life we have lost in living?
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?
-- T.S. Eliot, “The Rock”
How much data?
• Google: ~20-30 PB a day
• Wayback Machine has ~4 PB + 100-200 TB/month
• Facebook: ~3 PB of user data + 25 TB/day
• eBay: ~7 PB of user data + 50 TB/day
• CERN’s Large Hadron Collider generates 15 PB a year
• In 2010, enterprises stored 7 Exabytes = 7,000,000,000 GB
640K ought to be enough for anybody.
The Data Tsunami
McKinsey Global Institute Report: “Big Data: the next frontier for innovation, competition and productivity”
Big Data Value
McKinsey Global Institute Report: “Big Data: the next frontier for innovation, competition and productivity”
What’s Seen the Most Growth in 2008-2011
Types of Data
• Location / Geo / Mobile Data
• Music / Audio
• Social Media / Social Networks
• Time Series
• Images / Video
• User Profile data
• Text feeds / Micro-blog data
Types of Activities/Areas
• Search / Web content mining
• Text mining / opinion analysis
• Personalization / recommendation
• Social network / Social media analysis
• Topic modeling / micro-blog analysis
• Health informatics
Much of this growth is driven by end user mobile or Web-based applications
• users are inundated with huge volume of complex information
• need for more personalized intelligent applications
Personalization
The Problem
• Dynamically serve customized content (pages, products, recommendations, etc.) to users based on their profiles, preferences, or expected interests
Why we need it?
• Information spaces are becoming much more complex for users to navigate (huge online repositories, social networks, mobile applications, blogs, …)
• For businesses: need to grow customer loyalty / increase sales
• Industry research: successful online retailers are generating as much as 35% of their business from recommendations
Data Mining and Personalization
“Killer App” for data mining?
Tangible successes both in research and in industrial applications
• recommender systems
• personalized Web agents
• user adaptive systems
• Web marketing and eCRM
• personalized search
Sophisticated modeling approaches based on both predictive and unsupervised DM techniques
Personalization: Common Approaches
Collaborative Filtering
• Give recommendations to a user based on preferences of “similar” users
Content-Based Filtering
• Give recommendations to a user based on items with “similar” content in the user’s profile
Rule-Based (Knowledge-Based) Filtering
• Provide recommendations to users based on predefined (or learned) rules
• age(x, 25-35) and income(x, 70-100K) and children(x, >=3) → recommend(x, Minivan)
Combined or Hybrid Approaches
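The rule-based approach can be illustrated with a minimal Python sketch. The first rule encodes the minivan example from the slide; the `User` class, the `RULES` list, the `recommend` function, and the second rule are hypothetical names and data introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class User:
    age: int
    income: int    # annual income in dollars (assumed unit)
    children: int


# Each rule pairs a condition over a user with the item to recommend.
# The first rule is the slide's example; the second is made up.
RULES = [
    (lambda u: 25 <= u.age <= 35
               and 70_000 <= u.income <= 100_000
               and u.children >= 3,
     "Minivan"),
    (lambda u: u.age < 25 and u.children == 0, "Compact car"),
]


def recommend(user):
    """Return the items of every rule whose condition the user satisfies."""
    return [item for condition, item in RULES if condition(user)]
```

Calling `recommend(User(age=30, income=85_000, children=3))` fires only the first rule. Learned rules, which the slide also mentions, would populate the same kind of list from data instead of by hand.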
The Recommendation Task
Basic formulation as a prediction problem
• Given a profile Pu for a user u, and a target item it, predict the preference score of user u on item it
• Typically, the profile Pu contains preference scores by u on some other items, {i1, …, ik} different from it
• Preference scores on i1, …, ik may have been obtained explicitly (e.g., movie ratings) or implicitly (e.g., time spent on a product page or a news article)
The Recommendation Task
Content-Based Recommendation
• Predictions for unseen (target) items are computed based on their similarity (in terms of content) to items in the user profile
Collaborative Recommendation
• Predictions for unseen (target) items are computed based on other users with similar interest scores on items in user u’s profile
• i.e., users with similar tastes (aka “nearest neighbors”)
• Requires computing correlations between user u and other users according to interest scores or ratings
• k-nearest-neighbor (kNN) strategy
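The k-nearest-neighbor strategy for collaborative recommendation can be sketched as follows. This is one common variant, not necessarily the one used in the talk: it assumes Pearson correlation over co-rated items as the similarity measure and a similarity-weighted average of mean-centered neighbor ratings as the prediction. The function names and the toy `RATINGS` data are hypothetical.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation of two users over the items both have rated."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    cov = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    var_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    var_b = sum((ratings_b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def predict(ratings, user, target, k=2):
    """Predict the user's score on target from the k most similar raters."""
    profile = ratings[user]
    user_mean = sum(profile.values()) / len(profile)
    neighbors = []
    for other, other_profile in ratings.items():
        if other == user or target not in other_profile:
            continue
        sim = pearson(profile, other_profile)
        if sim > 0:  # keep only positively correlated users
            neighbors.append((sim, other_profile))
    neighbors.sort(key=lambda pair: pair[0], reverse=True)
    top = neighbors[:k]
    if not top:
        return user_mean  # fall back to the user's own average
    # Weighted average of each neighbor's deviation from their own mean.
    num = sum(sim * (p[target] - sum(p.values()) / len(p)) for sim, p in top)
    den = sum(sim for sim, _ in top)
    return user_mean + num / den


# Toy data: user -> {item: rating}; invented for illustration.
RATINGS = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 5, "B": 3, "C": 4, "D": 5},
    "carol": {"A": 1, "B": 5, "C": 2, "D": 1},
    "dave": {"A": 4, "B": 2, "C": 3, "D": 4},
}
```

Here `predict(RATINGS, "alice", "D")` relies on bob and dave, whose ratings correlate positively with alice's, and ignores carol, whose tastes run opposite.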
Content-Based Recommender Systems
Need to “learn” the user profile:
• User is an art historian?
• User is a pop music fan?
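One simple way to "learn" such a profile is sketched below, under stated assumptions: items are described by sparse term-weight vectors, the profile is the average of the vectors of items the user liked, and candidates are ranked by cosine similarity to that profile. All names and the sample data in the usage note are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_profile(liked_items):
    """Average the feature vectors of the items the user liked."""
    totals = {}
    for features in liked_items:
        for term, weight in features.items():
            totals[term] = totals.get(term, 0.0) + weight
    return {term: total / len(liked_items) for term, total in totals.items()}


def rank(profile, candidates):
    """Order candidate item names by similarity to the profile, best first."""
    return sorted(candidates,
                  key=lambda name: cosine(profile, candidates[name]),
                  reverse=True)
```

For a user who liked items tagged `renaissance`, `painting`, and `museum`, `rank` would place an art survey ahead of a pop music compilation, which is the art historian versus pop music fan distinction the slide raises.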
Content-Based Recommenders:: more examples
• Music recommendations
• Play list generation
Example: Pandora
Collaborative Recommender Systems
Personalization Based on User Behavior Data: Data Mining Approach
Typically an Offline Process
Data Preparation / Modeling Phase
• Implicit or explicit user preference data (clickthroughs, ratings, purchases, reviews)
• Content & Structure; Domain Knowledge
• Data Preprocessing: Data Cleaning, Data Integration, Data Transformation, Event Model Generation, Sessionization
• User Transaction / Preference Database
Pattern Discovery Phase
• Data Mining: User Segmentation, Item Clustering / Similarity, User/Item Classification, Correlation Analysis, Association Rule Mining, Sequential Pattern Mining
• Pattern Analysis: Pattern Filtering, Aggregation, Characterization
• Patterns; Aggregate User Models
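Sessionization, one of the preprocessing steps named on this slide, can be sketched as below. The sketch assumes click events arrive as `(user, timestamp_in_seconds, item)` tuples and that 30 minutes of inactivity ends a session; both the event format and the timeout are assumptions, not details from the talk.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that end a session (assumed)


def sessionize(events, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp, item) click events into per-user sessions.

    A new session starts whenever the gap between two consecutive events
    of the same user exceeds `timeout` seconds.
    """
    by_user = defaultdict(list)
    for user, timestamp, item in events:
        by_user[user].append((timestamp, item))

    sessions = []
    for user, clicks in by_user.items():
        clicks.sort()  # chronological order within each user
        current = [clicks[0][1]]
        for (prev_ts, _), (ts, item) in zip(clicks, clicks[1:]):
            if ts - prev_ts > timeout:
                sessions.append((user, current))
                current = []
            current.append(item)
        sessions.append((user, current))
    return sessions
```

The resulting `(user, [items])` records are the kind of transaction data that the pattern discovery phase (clustering, association rules, sequential patterns) would then consume.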
Personalization Based on User Behavior Data: Data Mining Approach
Online Process
• Client Application / Web Server: Active Session
• Stored User Profile
• Integrated User Profile: <user, item1, item2, …>
• Recommendation Engine, with Aggregate User Models and Domain Knowledge
• Recommendations, Predictions
New Challenges
Context-Awareness
• Can systems understand user’s context, situation, current intentions?
• Need to understand “task” being performed; user’s environment, domain knowledge/characteristics; short-term and long-term preferences
Integrating Domain Knowledge
• Most current modeling approaches focus on the discovery of “shallow” patterns
• DM + Domain Knowledge (DM + AI) → intelligent apps that can reason about / explain patterns
New Challenges
Security / Trust / Reputation
• Many user adaptive systems vulnerable to malicious manipulation (e.g., “shilling”)
• Need more robust algorithms and ways to detect malicious profiles
• In social systems the notion of “reputation” becomes critical
Serendipity
• The most predictive models are not necessarily the best
• Need the ability to “surprise” or provide novelty
Big Data Challenges
• Questions of scale require new frameworks and algorithms
• Wide variation in user behaviors requires more sophisticated models (e.g., matrix factorization, hybrid / ensemble models)
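Matrix factorization, mentioned above as one of the more sophisticated models, can be sketched in a few lines. This is a minimal illustration using stochastic gradient descent on observed ratings with L2 regularization; the hyperparameter values, function names, and integer user/item indexing are assumptions chosen for the example, and production systems at the scales discussed in this talk would need distributed or otherwise optimized implementations.

```python
import random
from math import sqrt


def factorize(ratings, n_users, n_items, n_factors=2, epochs=200,
              learning_rate=0.01, regularization=0.02, seed=0):
    """Learn user and item factor vectors from (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.5) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.5) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(pf * qf for pf, qf in zip(P[u], Q[i]))
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += learning_rate * (err * qi - regularization * pu)
                Q[i][f] += learning_rate * (err * pu - regularization * qi)
    return P, Q


def predict_rating(P, Q, user, item):
    """Predicted score is the dot product of user and item factors."""
    return sum(pf * qf for pf, qf in zip(P[user], Q[item]))


def rmse(ratings, P, Q):
    """Root mean squared error of the model on the given ratings."""
    squared = [(r - predict_rating(P, Q, u, i)) ** 2 for u, i, r in ratings]
    return sqrt(sum(squared) / len(squared))
```

After training, `predict_rating` can also be called for user/item pairs that never appeared in `ratings`, which is how the model fills in the sparse rating matrix.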
Challenges:: Problems of Scale
New Opportunities:: Social Annotation Systems
Amazon Example: Tags describe the Resource
• Tags can describe
  • The resource (genre, actors, etc.)
  • Organizational (toRead)
  • Subjective (awesome)
  • Ownership (abc)
  • etc.
Tag Recommendation
Example: Tags describe the user
These systems are “collaborative.”
Recommendation / Analytics based on the “wisdom of crowds.”
Rai Aren's profile: co-author, “Secret of the Sands”
New Opportunities:: Social Recommendation
A form of collaborative filtering using social network data
• Users’ profiles represented as sets of links to other nodes (users or items) in the network
• Prediction problem: infer a currently non-existent link in the network
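The link prediction view of social recommendation can be illustrated with the common-neighbors heuristic, one of the simplest scoring schemes and not necessarily the one used in the talk. The sketch assumes an undirected network stored as a dict from each node to the set of its neighbors; the function name and graph format are hypothetical.

```python
from itertools import combinations


def common_neighbor_scores(graph):
    """Score every currently missing link by the number of shared neighbors."""
    scores = {}
    for a, b in combinations(sorted(graph), 2):
        if b in graph[a]:
            continue  # link already exists, nothing to predict
        shared = len(graph[a] & graph[b])
        if shared:
            scores[(a, b)] = shared
    return scores
```

Pairs with the highest scores are the candidate links to recommend, for example a new connection between two users, or an item for a user when the network contains both kinds of nodes.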
Conclusions
Personalization and Recommendation Technologies
• The killer app for predictive data analytics
• Will drive the next generation of Web applications
Lots of new (and old) challenges
• New: Social media and social networks provide new challenges and opportunities; big data challenges scalability and effectiveness of old algorithms
• Old: scalability, sparsity, scrutability, serendipity
• Promising new work:
  • New approaches to hybridization
  • Social media analytics