Forecasting Audience Increase on YouTube
Matthew Rowe
Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom
Nov 18, 2014
Reputation on the Social Web
• Reputation is: “the beliefs or opinions that are generally held about someone or something”
• On the Social Web, reputation = greater influence
  – Important to information flow
  – Control information diffusion
• How to quantify reputation?
  – Greater audience = greater reputation
  – Greater reputation = greater influence
  – How to measure ‘reputation’?
    • In-degree – i.e. number of ‘in links’
    • Audience levels, subscriber counts
Influential Social Nodes
Why Forecast?
• Users want to expand their audience
  – What can users do to increase their audience?
  – What factors contribute to increases?
• Solution: explore the relation between
  – Audience levels – i.e. in-degree; and
  – Behaviour – of user and content
• Discover patterns, then use patterns for forecasting
  – Given my behaviour, will my audience grow?
Features
• User behaviour statistics
  – In-degree – i.e. number of followers
  – Out-degree – i.e. number of users followed
  – User view count – number of posts viewed by the user
  – Post count – number of posts uploaded by the user
• Content statistics
  – Post view count – i.e. number of views
  – Favourite count – i.e. number of likes of content
Schema Barrier
• Social Web platforms provide data using bespoke schemas
  – i.e. communicating through different languages
• Data from platform A == data from platform B
• Schema from platform A != schema from platform B
• Models must function across platforms
  – Enabling portable behaviour patterns
• How can we interpret data from different platforms?
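One way to picture the schema barrier is a small translation layer that maps each platform's field names onto one shared representation. The sketch below is illustrative only: the platform names, field keys and the `UserImpact` class are hypothetical and are not taken from any real API or from the ontology itself.

```python
from dataclasses import dataclass


@dataclass
class UserImpact:
    """Platform-neutral user statistics (field names are illustrative)."""
    in_degree: int
    out_degree: int


# Hypothetical mappings: common field name -> platform-specific key.
FIELD_MAPS = {
    "platform_a": {"in_degree": "subscriberCount", "out_degree": "subscriptionCount"},
    "platform_b": {"in_degree": "followers", "out_degree": "following"},
}


def to_common(platform: str, record: dict) -> UserImpact:
    """Translate a platform-specific record into the shared representation."""
    mapping = FIELD_MAPS[platform]
    return UserImpact(**{common: int(record[key]) for common, key in mapping.items()})


# Same underlying data, delivered under two different schemas.
a = to_common("platform_a", {"subscriberCount": 120, "subscriptionCount": 8})
b = to_common("platform_b", {"followers": 120, "following": 8})
print(a == b)
```

Once records share one representation, a model fitted on data from one platform can at least be applied to another without rewriting its feature extraction.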
Behaviour Ontology
• Solution: OU Behaviour Ontology
• Defines behaviour in a common format
  – Extending the SIOC ontology
  – Captures ‘impact’
• Vital to capture time-stamped user statistics
• Two classes for impact
  – User impact
    • Models user features
  – Post impact
    • Models post statistics
www.purl.org/NET/oubo/0.23/
Data Collection: YouTube
• Gathered a dataset from the video-sharing platform YouTube
• One aim of usage is to increase ‘channel’ popularity
  – Gain more subscriptions
• For 10 days, at 4-hour intervals:
  – Logged 100 most recently uploaded videos
    • Stopping once 2k were logged
  – Logged user + content stats for each video
• Randomly chose 10% for analysis
  – Split dataset into 80/20 for training/testing
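The sampling and split steps above can be sketched as follows. This is a minimal illustration, not the original collection code: the function name, the fixed seed and the placeholder integer IDs standing in for logged videos are all assumptions.

```python
import random


def sample_and_split(items, sample_frac=0.10, train_frac=0.80, seed=0):
    """Randomly keep sample_frac of items, then split the sample into train/test."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    chosen = rng.sample(items, round(len(items) * sample_frac))
    cut = round(len(chosen) * train_frac)
    # rng.sample returns items in random order, so slicing gives a random split.
    return chosen[:cut], chosen[cut:]


video_ids = list(range(2000))  # placeholder IDs, not real data
train, test = sample_and_split(video_ids)
print(len(train), len(test))
```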
Forecasting Audience Increase
• How can we predict audience levels given observed features?
[Equation: linear regression model, annotated with the labels “coefficient/weight”, “predictor/independent variable” and “error/residual vector”]
• What features are good predictors?
  – i.e. can we induce a better model than above?
  – Perform model selection
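The equation annotated by the three labels above did not survive extraction. A common way to write a linear regression model with those components is sketched below. The notation is an assumption, not the slide's own: y is the audience level (in-degree), x_i is the i-th predictor, \beta_i is its coefficient or weight, and \epsilon is the error term.

```latex
y = \beta_0 + \sum_{i=1}^{p} \beta_i x_i + \epsilon
```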
Model Selection I
• To perform model selection:
  – Aim: maximise the coefficient of determination
  – Procedure: average features within the training split in the same time period
• First Model: all features
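For reference, the standard definition of the coefficient of determination is given below. The slides do not say which variant was used, so treat the notation as an assumption: y_j is the observed audience level for observation j, \hat{y}_j is the model's prediction, and \bar{y} is the mean of the observed values.

```latex
R^2 = 1 - \frac{\sum_{j=1}^{n} (y_j - \hat{y}_j)^2}{\sum_{j=1}^{n} (y_j - \bar{y})^2}
```

A value close to 1 means the model accounts for most of the variation in audience levels.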
Model Selection II
• How can we improve upon the previous model?
• Feature selection
  – Exhaustive search of all possible feature combinations
  – Optimize coefficient of determination
• Shows improvements using certain models
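The exhaustive search described above can be sketched in plain Python. Because plain R² cannot decrease when a predictor is added to a least-squares fit, this sketch scores each subset with adjusted R². That choice is an assumption, since the slides only say "coefficient of determination". The helper names and the data values are made up for illustration.

```python
from itertools import combinations


def solve(a, b):
    """Solve the square system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, ..., bp] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def adjusted_r2(rows, y):
    """Fit on the given columns and return the adjusted R² of that fit."""
    n, p = len(rows), len(rows[0])
    beta = fit_ols(rows, y)
    y_hat = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in rows]
    mean = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def best_subset(data, y):
    """Try every non-empty feature subset; return (best score, feature names)."""
    names = list(data)
    best = (float("-inf"), ())
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            rows = list(zip(*(data[f] for f in subset)))
            score = adjusted_r2(rows, y)
            if score > best[0]:
                best = (score, subset)
    return best


# Made-up averaged features per time period (illustrative values only).
data = {
    "post_views": [1, 2, 3, 4, 5, 6, 7, 8],
    "favourites": [2, 1, 4, 3, 6, 5, 8, 7],
    "user_views": [5, 3, 8, 1, 7, 2, 6, 4],
}
in_degree = [10.3, 19.8, 30.1, 39.6, 50.2, 60.0, 69.9, 80.3]
score, subset = best_subset(data, in_degree)
print(subset, round(score, 4))
```

With f features the search fits 2^f − 1 models, which is manageable for the handful of features listed earlier but grows quickly beyond that.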
Model Selection III
• Exhaustive feature selection drops user view count
Forecasting I
• Now have 2 models to forecast with:
  – All features
  – Best features
• Which model is best?
• Two experiments to test predictive power:
  – One-step forecast
    • Train model on previous k-steps, predict k+1
  – Final-step forecast
    • Predict t=10, train on previous k-steps
  – Predictions are user dependent
• Evaluation measure: Root Mean Square Error
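The two experiments and the error measure can be sketched for a single user's time series. This is a simplified illustration under stated assumptions: it uses one predictor and a closed-form least-squares line rather than the multi-feature models from the slides, and the function names and data values are made up.

```python
import math


def fit_line(xs, ys):
    """Closed-form least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def one_step_forecasts(xs, ys, min_train=2):
    """For each k >= min_train, train on steps 0..k-1 and predict step k."""
    actual, predicted = [], []
    for k in range(min_train, len(xs)):
        a, b = fit_line(xs[:k], ys[:k])
        actual.append(ys[k])
        predicted.append(a + b * xs[k])
    return actual, predicted


def final_step_forecast(xs, ys, k):
    """Train on the first k steps and predict the final step."""
    a, b = fit_line(xs[:k], ys[:k])
    return a + b * xs[-1]


# Made-up series for one user: post view count and in-degree per time step.
xs = [10, 20, 30, 40, 50]
ys = [10.0, 15.0, 20.0, 25.0, 30.0]
actual, predicted = one_step_forecasts(xs, ys)
print(rmse(actual, predicted))
print(final_step_forecast(xs, ys, 3))
```

Running the same protocol once with all features and once with the selected features, then comparing the two RMSE values, is what decides which model forecasts better.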
Forecasting: Results
• One-step forecast
• Final-step forecast
Conclusions and Future Work
• Quantified reputation by audience levels
• Content reception linked to increased levels:
  – More content views = increased audience levels
  – More favourites = increased audience levels
• Able to accurately predict audience levels
  – Post feature selection improves performance
• Behaviour ontology captures required features
  – Common conceptualisation of behaviour
• Future work:
  – Extend analysis to a larger dataset
  – Apply models to additional platforms
QUESTIONS
Questions?
people.kmi.open.ac.uk/[email protected]
@mattroweshow