
Post on 24-Dec-2015


I Tube, You Tube, Everybody Tubes…

Pablo Rodriguez, Telefonica Research, Barcelona


YouTube Video Example

3

“Content is NOT king”

Content Explosion

[Figure: number of TV channels over time (1950, 1980, 1995, today). Labels: broadcast, analog cable, digital cable, Internet; values: 3, 40, 100, infinite.]

4

How to search content?

5

Infinite Choice = Overwhelming Confusion

Filters are required to connect users with content that appeals to their interests

Aggregation and Recommendation

6

Video and Social Networks

Trends in video services
- Users generate new videos
- Users help each other find videos

Need to understand users and content
- Video characteristics in YouTube
- User behavior and potential for recommendations

7

Particularities of

“bite-size bits for high-speed munching” [Wired mag., Mar 2007]

Plethora of YouTube clones

UGC is very different

How different?

8

UGC vs. Non-UGC

Massive production scale
- 15 days in YouTube to produce 120 years' worth of movies in IMDb!

Extreme publishers
- 1000 uploads over a few years vs. 100 movies over 50 years

Short video length
- 30 sec–5 min vs. 100 min movies in LoveFilm

The rest: consumption patterns

9

User Participation/Finding Videos

Despite Web 2.0 features, user participation remains low
- Only 0.16%-0.22% of viewers rate videos or comment

47% of videos have pointers from external sites
- But requests from such sites account for less than 3% of total views

10

Goals and Data

Goals
- Potential for recommendation systems?
- Popularity evolution
- Content duplication

Data
- Crawled YouTube and other UGC systems
- Metadata: video ID, length, views
- 1.6M Entertainment, 250K Science videos

11

Part 1: Popularity Distribution
• Static popularity characteristics
• Underlying mechanism

12

Pareto Principle

[Figure: fraction of aggregate views vs. normalized video ranking]

Other online VoD systems show smaller skew!

The 10% most popular videos account for 80% of total views
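As a rough illustration of the skew measure on this slide, the sketch below computes the share of total views captured by the top fraction of videos. The function name `top_share` and the sample view counts are made up for illustration; they are not from the talk or its dataset.

```python
def top_share(views, top_fraction=0.10):
    """Fraction of total views captured by the most-viewed `top_fraction` of videos."""
    if not views:
        raise ValueError("views must be non-empty")
    ranked = sorted(views, reverse=True)          # most-viewed first
    k = max(1, int(len(ranked) * top_fraction))   # number of videos in the top group
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical per-video view counts (illustrative only, not from the dataset).
sample_views = [5000, 1200, 300, 150, 90, 60, 40, 30, 20, 10]
share = top_share(sample_views, 0.10)
print(f"Top 10% of videos hold {share:.0%} of views")
```

Running the same function over a crawled list of per-video view counts would give the kind of figure quoted on the slide.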

13

Dominant Power-Law Behavior

Richer-get-richer principle
- If a video has K views, then users will watch the video at a rate proportional to K

Examples of power laws:
- word frequency
- citations of papers
- scale of earthquakes
- web hits

[Figure: frequency (log) vs. city population (log); power law y = x^a]
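The richer-get-richer rule can be sketched as a small simulation in which each new request picks a video with probability proportional to its current view count. This is only a toy model: the function name, the fixed video catalogue, and the initial count of one view per video are assumptions, and whether the outcome follows a power law depends on details (such as how new videos arrive) that the slide does not specify.

```python
import random

def simulate_rich_get_richer(num_videos, num_requests, seed=0):
    """Return view counts (sorted, most-viewed first) after proportional-rate requests."""
    rng = random.Random(seed)
    views = [1] * num_videos  # assumption: every video starts with one view
    for _ in range(num_requests):
        # Pick one video with probability proportional to its current views.
        chosen = rng.choices(range(num_videos), weights=views, k=1)[0]
        views[chosen] += 1
    return sorted(views, reverse=True)

ranked_views = simulate_rich_get_richer(num_videos=200, num_requests=5000)
print("most viewed:", ranked_views[:5], "least viewed:", ranked_views[-5:])
```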

14

UGC Video Distribution

Straight-line waist, truncated at both ends

15

Focusing on Popular Videos

Why do popular videos deviate from the power law?

Fetch-at-most-once [SOSP2003]
- Behavior of fetching immutable objects once
- cf. visiting popular web sites many times
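A toy simulation can show why fetch-at-most-once flattens the head of the distribution: if each user counts at most one view per video, no video can collect more views than there are users. The sketch below is a simplification under stated assumptions (a 1/rank preference, repeat requests simply dropped); all names and parameters are hypothetical and it is not the model from the cited paper.

```python
import random

def simulate_views(num_users, num_videos, requests_per_user, fetch_at_most_once, seed=0):
    """Return per-video view counts, indexed by popularity rank (0 = most popular)."""
    rng = random.Random(seed)
    # Assumption: Zipf-like preference with exponent 1, i.e. weight 1/rank.
    weights = [1.0 / rank for rank in range(1, num_videos + 1)]
    views = [0] * num_videos
    for _ in range(num_users):
        seen = set()  # videos this user has already fetched
        for _ in range(requests_per_user):
            video = rng.choices(range(num_videos), weights=weights, k=1)[0]
            if fetch_at_most_once:
                if video in seen:
                    continue  # repeat request is dropped: the object is immutable
                seen.add(video)
            views[video] += 1
    return views

repeat_ok = simulate_views(100, 50, 20, fetch_at_most_once=False)
once_only = simulate_views(100, 50, 20, fetch_at_most_once=True)
print("top video, repeats allowed:", repeat_ok[0], "| at most once:", once_only[0])
```

With repeats allowed, the top video can exceed the user count; with fetch-at-most-once it is capped at 100, which bends the head of the rank plot.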

16

Why the Unpopular Tail Falls Off

Natural shape is curved

Sampling bias or pre-filters
- Publishers tend to upload interesting videos

Information filtering or post-filters
- Search results or suggestions favor popular items

17

Impact of Post-Filters

Videos exposed longer to filtering effect appear more truncated

[Figure: x-axis is video rank]

18

Is it Naturally Curved?

[Figure: Matlab curve fitting for Science videos; candidate fits: Zipf, log-normal, exponential, Zipf + exp cutoff]

19

Is it Naturally Curved?

Zipf is scale-free, while exponential is scaled: the underlying mechanism is Zipf, and the truncation is due to bottlenecks
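The slides used Matlab for curve fitting; as a stand-in, the sketch below fits a Zipf law with exponential cutoff, views = c · rank^(-a) · exp(-rank / r_c), by ordinary least squares on the logarithm, which is linear in (log c, a, 1/r_c). The functional form's parameter names, the helper functions, and the synthetic data are assumptions for illustration, not the authors' procedure.

```python
import math

def solve_linear(a, b):
    """Solve a small dense system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_zipf_with_cutoff(ranks, views):
    """Least-squares fit of log(views) = log(c) - a*log(rank) - rank/r_c."""
    rows = [[1.0, -math.log(r), -float(r)] for r in ranks]
    targets = [math.log(v) for v in views]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * t for row, t in zip(rows, targets)) for i in range(3)]
    log_c, a, inv_rc = solve_linear(xtx, xty)
    return math.exp(log_c), a, 1.0 / inv_rc  # assumes a non-zero cutoff term

# Synthetic, noise-free data (illustrative only): c = 1e5, a = 0.8, r_c = 50.
ranks = list(range(1, 201))
views = [1e5 * r ** -0.8 * math.exp(-r / 50.0) for r in ranks]
c_fit, a_fit, rc_fit = fit_zipf_with_cutoff(ranks, views)
print(f"c={c_fit:.1f}, a={a_fit:.3f}, r_c={rc_fit:.1f}")
```

Fitting in log space weights all ranks by relative rather than absolute error, which is a design choice; a nonlinear fit on raw view counts would emphasize the head instead.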

20

Implication of Our Findings

Latent demand for products that is suppressed by bottlenecks in the system [Chris Anderson, The Long Tail]

[Figure: views vs. rankings for Entertainment videos]

40% additional views! How?
- Personalized recommendation
- Enriched metadata
- Abundant videos

21

Part 2: Popularity Evolution
• Relationship between popularity and age

22

Popularity Evolution

So far, we have focused on static popularity
Now we focus on popularity dynamics

How are requests on any given day distributed across video age?

6-day daily trace of Science videos
- Step 1: Group videos requested at least once by age
- Step 2: Count request volume per age group
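The two steps above can be sketched roughly as follows. The age buckets, function names, and example numbers are hypothetical; the slides do not give the exact bins or the trace format.

```python
from collections import Counter

# Hypothetical age buckets (upper bound in days, label); the slides do not give exact bins.
AGE_BUCKETS = [(1, "<= 1 day"), (7, "<= 1 week"), (30, "<= 1 month")]
OLDEST_LABEL = "> 1 month"

def age_bucket(age_days):
    """Return the label of the first bucket whose upper bound covers age_days."""
    for upper_bound, label in AGE_BUCKETS:
        if age_days <= upper_bound:
            return label
    return OLDEST_LABEL

def request_share_by_age(daily_requests, age_days):
    """Map each age bucket to its share of one day's requests.

    daily_requests: {video_id: number of requests on the day of interest}
    age_days:       {video_id: video age in days on that day}
    """
    volume = Counter()
    for video_id, count in daily_requests.items():
        if count < 1:
            continue  # Step 1: keep only videos requested at least once
        volume[age_bucket(age_days[video_id])] += count  # Step 2: sum per age group
    total = sum(volume.values())
    return {label: n / total for label, n in volume.items()} if total else {}

# Made-up example values, not taken from the trace.
requests = {"a": 10, "b": 0, "c": 30, "d": 60}
ages = {"a": 0, "b": 3, "c": 20, "d": 400}
shares = request_share_by_age(requests, ages)
print(shares)
```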

23

Request Volume Across Age

User preference is relatively insensitive to age
--> 80% of requests are for videos older than a month

The probability of a video being watched is 43%, 18%, 17%, and 14% for the first 24 hours, 6 days, 3 weeks, and 1 month, respectively

24

Part 4: Content Duplication
• Level of duplication
• Birth of duplicates

25

Content Duplication

Alias: an identical or similar copy of the same content

Aliases dilute popularity of a single event
- Views are distributed across multiple copies
- Difficulty in recommendation and ranking systems

Test with 51 volunteers
- Find aliases using keyword search
- Identified 1,224 aliases for 184 original videos

26

The Level of Popularity Dilution

Popularity is diluted by up to a few orders of magnitude

Aliases often got more requests than the original (e.g., an alias got >1000 times more requests)
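One simple way to quantify dilution for a single piece of content is to compare the original's views with those of its aliases. The sketch below is an assumed formulation (the slides do not define the exact metric), and the function name and numbers are made up for illustration.

```python
def dilution_summary(original_views, alias_views):
    """Summarize how views of one piece of content are split across its copies."""
    total = original_views + sum(alias_views)
    top_alias = max(alias_views, default=0)
    return {
        "total_views": total,
        # Share of all views that the original copy received
        "original_share": original_views / total if total else 0.0,
        # How many times more views the most-viewed alias got than the original
        "top_alias_ratio": top_alias / original_views if original_views else None,
    }

# Made-up numbers for illustration only.
summary = dilution_summary(original_views=1000, alias_views=[500, 2500, 1000])
print(summary)
```

Summing views over all copies before ranking is one way a recommendation or ranking system could undo this dilution.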

27

How Late Do Aliases Appear?

Significant aliases appear within one week

Within the first day of posting the original video, sometimes you get more than 80 aliases

28

Conclusions

UGC is a new form of video social interaction

User interaction remains low

Lots of potential for social recommendations

29

Questions?

Dataset available at http://an.kaist.ac.kr/traces/IMC2007.html
