I Tube, You Tube, Everybody Tubes…
Pablo Rodriguez, Telefonica Research, Barcelona

Dec 24, 2015


Rosalind Rose
Transcript
Page 1:

I Tube, You Tube, Everybody Tubes…

Pablo Rodriguez, Telefonica Research, Barcelona

Page 2:

YouTube Video Example

Page 3:

“Content is NOT king”

Page 4:

Content Explosion

[Figure: number of TV channels (y-axis) over time (x-axis: 1950, 1980, 1995, today). Labels: broadcast, 3; analog cable, 40; digital cable, 100; Internet, infinite.]

How to search content?

Page 5:

Infinite Choice = Overwhelming Confusion

Filters are required to connect users with content that appeals to their interests

Aggregation and Recommendation

Page 6:

Video and Social Networks

Trends in video services
- Users generate new videos
- Users help each other find videos

Need to understand users and content
- Video characteristics in YouTube
- User behavior and potential for recommendations

Page 7:

Particularities of

“bite-size bits for high-speed munching” [Wired mag. Mar 2007]

Plethora of YouTube clones

UGC is very different

How different?

Page 8:

UGC vs. Non-UGC

Massive production scale
- 15 days in YouTube to produce 120 years' worth of movies in IMDb!

Extreme publishers
- 1000 uploads over a few years vs. 100 movies over 50 years

Short video length
- 30 sec–5 min vs. 100 min movies in LoveFilm

The rest: consumption patterns

Page 9:

User Participation/Finding Videos

Despite Web 2.0 features, user participation remains low
- Only 0.16%-0.22% of viewers rate videos or comment

47% of videos have pointers from external sites
- But requests from such sites account for less than 3% of the total views

Page 10:

Goals and Data

Goals
- Potential for recommendation systems?
- Popularity evolution
- Content duplication

Data
- Crawled YouTube and other UGC systems
- Metadata: video ID, length, views
- 1.6M Entertainment, 250K Science videos

Page 11:

Part 1: Popularity Distribution
• Static popularity characteristics
• Underlying mechanism

Page 12:

Pareto Principle

[Figure: fraction of aggregate views (y-axis) vs. normalized video ranking (x-axis)]

Other online VoD systems show smaller skew!

The 10% most popular videos account for 80% of total views
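The skew quoted on this slide can be checked on any list of per-video view counts. Below is a minimal sketch, assuming the counts are already available as a Python list; the function name `top_share` and the toy numbers are made up for illustration and are not taken from the dataset.

```python
def top_share(views, top_fraction=0.10):
    """Fraction of total views captured by the most-viewed `top_fraction` of videos."""
    if not views:
        raise ValueError("views must be non-empty")
    ranked = sorted(views, reverse=True)          # most popular first
    k = max(1, int(len(ranked) * top_fraction))   # always include at least one video
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Toy data (invented for illustration): one hit and nine niche videos.
toy_views = [800, 40, 30, 30, 25, 25, 20, 15, 10, 5]
print(f"top 10% share: {top_share(toy_views):.0%}")
```

Sweeping `top_fraction` from 0 to 1 would trace the curve in the figure (fraction of aggregate views against normalized rank).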

Page 13:

Dominant Power-Law Behavior

Richer-get-richer principle
- If a video has K views, then users will watch the video with rate K

- word frequency
- citations of papers
- scale of earthquakes
- web hits

[Figure: frequency (log) vs. city population (log), with the line $y = x^a$]
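The richer-get-richer rule can be illustrated with a small simulation. This is a sketch of a Simon-style process, not the model used in the talk: the arrival probability `p_new`, the seed, and the function name are assumptions introduced here. Each new view either goes to a brand-new video or to an existing video picked with probability proportional to its current view count K.

```python
import random


def simulate_rich_get_richer(n_views, p_new=0.1, seed=0):
    """Simon-style process: each view goes to a new video with probability
    p_new (hypothetical parameter), otherwise to an existing video chosen
    with probability proportional to its current view count K."""
    rng = random.Random(seed)
    views = [1]      # views[i] = view count of video i; start with one video
    view_log = [0]   # one entry per view; a uniform pick from it is K-proportional
    for _ in range(n_views - 1):
        if rng.random() < p_new:
            views.append(1)                 # a new video arrives with its first view
            video = len(views) - 1
        else:
            video = rng.choice(view_log)    # rich get richer
            views[video] += 1
        view_log.append(video)
    return views


views = simulate_rich_get_richer(n_views=5000)
ranked = sorted(views, reverse=True)
print("videos:", len(ranked), "| top 5:", ranked[:5], "| median:", ranked[len(ranked) // 2])
```

Plotting `ranked` against rank on log-log axes should show a roughly straight segment, which is the signature the slide refers to.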

Page 14:

UGC Video Distribution

Straight-line waist, truncated at both ends

Page 15:

Focusing on Popular Videos

Why do popular videos deviate from the power law?

Fetch-at-most-once [SOSP2003]
- Behavior of fetching immutable objects once
- cf. visiting popular web sites many times
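The effect of fetch-at-most-once on the head of the distribution can be sketched as follows. The setup is an assumption made for illustration: every user draws requests from a Zipf law with exponent 1, and under fetch-at-most-once a repeated request for the same video is simply dropped. The function name and all parameter values are hypothetical.

```python
import random


def simulate_requests(n_users, n_videos, requests_per_user, fetch_once, seed=0):
    """Per-video view counts when every user draws requests from a Zipf law."""
    rng = random.Random(seed)
    weights = [1.0 / rank for rank in range(1, n_videos + 1)]  # Zipf, exponent 1 (assumed)
    views = [0] * n_videos
    for _ in range(n_users):
        wanted = rng.choices(range(n_videos), weights=weights, k=requests_per_user)
        if fetch_once:
            wanted = set(wanted)   # repeated requests for the same video are dropped
        for video in wanted:
            views[video] += 1
    return views


repeat = simulate_requests(200, 100, 50, fetch_once=False)
once = simulate_requests(200, 100, 50, fetch_once=True)
print("top video, repeated fetches:", repeat[0], "| fetch-at-most-once:", once[0])
```

With fetch-at-most-once, no video can collect more views than there are users, so the most popular ranks flatten out instead of following the straight power-law line.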

Page 16:

Why the Unpopular Tail Falls Off

Natural shape is curved

Sampling bias or pre-filters
- Publishers tend to upload interesting videos

Information filtering or post-filters
- Search results or suggestions favor popular items

Page 17:

Impact of Post-Filters

Videos exposed longer to the filtering effect appear more truncated

[Figure: x-axis is video rank]

Page 18:

Is it Naturally Curved?

[Figure: Matlab curve fitting for Science videos. Fitted curves: Zipf, log-normal, exponential, Zipf + exponential cutoff]

Page 19:

Is it Naturally Curved?

[Figure: Matlab curve fitting for Science videos. Fitted curves: Zipf, log-normal, exponential, Zipf + exponential cutoff]

Zipf is scale-free, while exponential is scaled: the underlying mechanism is Zipf and truncation is due to bottlenecks
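The slides use Matlab for the fits; the sketch below is not that code. It shows one way a "Zipf + exponential cutoff" curve could be fitted, assuming the form views = C · rank^(-a) · exp(-rank / k), which is linear in log space and can be solved by ordinary least squares. The parameter names `C`, `a`, `k` and both helper functions are assumptions introduced here.

```python
import math


def solve_linear(a, b):
    """Solve a small dense system a*z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def fit_zipf_exp_cutoff(ranks, views):
    """Least-squares fit of log(views) = log(C) - a*log(rank) - rank/k.

    Assumes all views are positive and that a cutoff is present
    (a pure Zipf curve would give a zero coefficient on rank)."""
    rows = [(1.0, math.log(x), float(x)) for x in ranks]
    targets = [math.log(y) for y in views]
    # Normal equations (X^T X) beta = X^T t for the three coefficients.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    b0, b1, b2 = solve_linear(xtx, xty)
    return math.exp(b0), -b1, -1.0 / b2              # C, a, k


# Synthetic check: generate a curve with known parameters and recover them.
ranks = list(range(1, 301))
synthetic = [1e6 * x ** -0.8 * math.exp(-x / 150.0) for x in ranks]
print(fit_zipf_exp_cutoff(ranks, synthetic))
```

A large fitted `k` means the cutoff only bites deep in the tail; a small `k` means the curve bends away from the Zipf line early.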

Page 20:

Implication of Our Findings

Latent demand for products that is suppressed by bottlenecks in the system [Chris Anderson, The Long Tail]

[Figure: views vs. rankings for Entertainment videos]

40% additional views! How?
- Personalized recommendation
- Enriched metadata
- Abundant videos
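One way to put a number on latent demand is to extend the straight log-log line fitted on the waist of the curve across all ranks and add up the views that fall short of it. The sketch below assumes exactly that procedure; it is an illustration, not necessarily how the 40% figure on the slide was obtained, and the function name and the `waist` argument are hypothetical.

```python
import math


def latent_demand_ratio(views_by_rank, waist):
    """Extra views, as a fraction of observed views, if every rank followed the
    straight log-log line fitted on ranks waist[0]..waist[1] (1-based, inclusive).
    Assumes all view counts inside the waist are positive."""
    lo, hi = waist
    xs = [math.log(r) for r in range(lo, hi + 1)]
    ys = [math.log(views_by_rank[r - 1]) for r in range(lo, hi + 1)]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    extra = 0.0
    for rank, observed in enumerate(views_by_rank, start=1):
        predicted = math.exp(intercept + slope * math.log(rank))
        extra += max(0.0, predicted - observed)   # count only suppressed demand
    return extra / sum(views_by_rank)


# Toy data (invented): Zipf views with the tail (ranks 51-100) cut in half.
toy = [1000.0 / r if r <= 50 else 500.0 / r for r in range(1, 101)]
print(f"additional views if the tail followed the fit: {latent_demand_ratio(toy, (1, 50)):.1%}")
```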

Page 21:

Part 2: Popularity Evolution
• Relationship between popularity and age

Page 22:

Popularity Evolution

So far, we focused on static popularity
- Now focus on popularity dynamics

How are requests on any given day distributed across video age?

6-day daily trace of Science videos
- Step 1: Group videos requested at least once by age
- Step 2: Count request volume per age group
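The two steps above can be sketched in a few lines. The input layout (one dict of per-video requests for a day, one dict of video ages in days) and the bucket edges are assumptions made for this example; the slides do not specify either.

```python
from collections import Counter

# Hypothetical age buckets: (upper limit in days, label).
AGE_BUCKETS = [(1, "<= 1 day"), (7, "<= 1 week"), (30, "<= 1 month")]


def age_bucket(age_days):
    """Map a video age in days to a bucket label."""
    for limit, label in AGE_BUCKETS:
        if age_days <= limit:
            return label
    return "> 1 month"


def request_volume_by_age(daily_requests, video_age_days):
    """daily_requests: {video_id: requests on that day};
    video_age_days: {video_id: age in days on that day}."""
    volume = Counter()
    for video_id, count in daily_requests.items():
        if count > 0:                                              # Step 1: requested at least once
            volume[age_bucket(video_age_days[video_id])] += count  # Step 2: sum per age group
    return dict(volume)


# Toy day of requests (invented numbers).
requests = {"a": 5, "b": 0, "c": 7, "d": 2}
ages = {"a": 0.5, "b": 3, "c": 400, "d": 45}
print(request_volume_by_age(requests, ages))
```

Dividing each bucket's volume by the day's total gives the share of requests per age group.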

Page 23:

Request Volume Across Age

User preference is relatively insensitive to age
--> 80% of requests are for videos older than a month

The probability of a video being watched is 43%, 18%, 17%, and 14% for the first 24 hours, 6 days, 3 weeks, and 1 month, respectively

Page 24:

Part 4: Content Duplication
• Level of duplication
• Birth of duplicates

Page 25:

Content Duplication

Alias: identical or similar copies of the same content

Aliases dilute popularity of a single event
- Views distributed across multiple copies
- Difficulty in recommendation & ranking systems

Test with 51 volunteers
- Find aliases using keyword search
- Identified 1,224 aliases for 184 original videos

Page 26:

The Level of Popularity Dilution

Popularity is diluted by up to a few orders of magnitude

Aliases often got more requests than the original (e.g. an alias got >1000 times more requests)
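Dilution for a single event can be summarized by pooling the views of the original and its aliases. This is a minimal sketch with invented numbers; the function name and the returned fields are assumptions, not metrics defined in the talk.

```python
def dilution_report(original_views, alias_views):
    """Summarize how an event's views are split between the original and its aliases."""
    total = original_views + sum(alias_views)
    largest_alias = max(alias_views, default=0)
    return {
        "total_views": total,                                      # popularity of the event as a whole
        "original_share": original_views / total if total else 0.0,
        "max_alias_to_original": (
            largest_alias / original_views if original_views else float("inf")
        ),
    }


# Toy event (invented): the original is barely watched, one alias goes viral.
print(dilution_report(10, [5, 20000, 100]))
```

A ranking or recommendation system that scores each copy separately sees four weak or unrelated items here; pooling aliases first recovers the event's total popularity.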

Page 27:

How Late Do Aliases Appear?

Significant aliases appear within one week

Within the first day of posting the original video, sometimes you get more than 80 aliases

Page 28:

Conclusions

UGC is a new form of video social interaction

User interaction remains low

Lots of potential for social recommendations

Page 29:

Questions?

Dataset available at http://an.kaist.ac.kr/traces/IMC2007.html