Media Fragment Indexing Using Social Media. Yunjia Li (1), Raphael Troncy (2), Mike Wald (1) and Gary Wills (1). (1) School of Electronics and Computer Science, University of Southampton, UK; (2) EURECOM, Sophia Antipolis, France

Media Fragments Indexing using Social Media

May 11, 2015

LinkedTV

As more and more video is shared on the Web, many video sharing platforms have implemented deep-linking, the practice of sharing a video object from a certain time point. Despite the many media fragments being created, annotated and shared, major search engines still do not index video objects at a fine-grained level on the Web scale. To address this problem, this paper proposes the Twitter Media Fragment Indexer, which monitors Tweet text and uses the embedded URLs pointing to video fragments to create an index for media fragments on a massive scale. A preliminary evaluation has shown that media fragments can be successfully indexed on a large scale using this system.
This is a presentation from the LIME workshop at ESWC2014.
Transcript
Page 1: Media Fragments Indexing using Social Media

Media Fragment Indexing Using Social Media

Yunjia Li (1), Raphael Troncy (2), Mike Wald (1) and Gary Wills (1)

(1) School of Electronics and Computer Science, University of Southampton, UK

(2) EURECOM, Sophia Antipolis, France

Page 2: Media Fragments Indexing using Social Media

Agenda

• Media Fragments

• Media Fragment Indexing Framework

• Survey on Media Fragment URI Implementations on Video Sharing Platforms

• Indexing Media Fragments Using Twitter

• Conclusions and Future Work

Page 3: Media Fragments Indexing using Social Media

Media Fragment

• Denotes a part of the content inside a multimedia resource

• Dimensions defined in the Media Fragment URI 1.0 spec

– Temporal dimension

http://example.org/test.mp4#t=3,7

– Spatial dimension (a rectangle area)

http://example.org/test.mp4#xywh=120,240,180,240
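The two example URIs above can be taken apart with a few lines of Python. This is a minimal sketch, not a full implementation of the specification: it assumes plain seconds for the temporal dimension and plain pixel integers for the spatial one, and the function name `parse_media_fragment` is hypothetical.

```python
from urllib.parse import urlsplit


def parse_media_fragment(uri):
    """Return the temporal and spatial dimensions found in the URI fragment.

    Assumption: only the plain forms shown on the slide are handled
    (t=start,end in seconds and xywh=x,y,w,h in pixels).
    """
    fragment = urlsplit(uri).fragment
    result = {}
    for pair in fragment.split("&"):
        name, sep, value = pair.partition("=")
        if not sep:
            continue  # not a name=value pair
        if name == "t":
            # "3,7" -> start 3 s, end 7 s; "3" -> start only; ",7" -> end only
            start, _, end = value.partition(",")
            result["t"] = (float(start) if start else 0.0,
                           float(end) if end else None)
        elif name == "xywh":
            x, y, w, h = (int(v) for v in value.split(","))
            result["xywh"] = (x, y, w, h)
    return result


temporal = parse_media_fragment("http://example.org/test.mp4#t=3,7")
spatial = parse_media_fragment("http://example.org/test.mp4#xywh=120,240,180,240")
```

Here `temporal` holds the start and end times and `spatial` holds the rectangle as an `(x, y, w, h)` tuple.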

Page 4: Media Fragments Indexing using Social Media

Current Situation

• Multimedia uploading, sharing and tagging are easy

• Searching for a complete multimedia resource on major search engines is easy

• But searching for multimedia resources at a fine-grained level on major search engines is difficult

– Availability of annotations: limited amount of annotations linked to media fragments

– SEO problem:

• The landing page is not search-engine-friendly

• Everything is on the same page and the notion of media fragment is not explicitly embedded in HTML

Page 5: Media Fragments Indexing using Social Media

Media Fragment Indexing Framework

Page 6: Media Fragments Indexing using Social Media

Google’s Ajax Content Crawler

• The Crawler is designed to index Ajax content

• Replace token “#!” in URLs with “_escaped_fragment_”

*Diagram from https://developers.google.com/webmasters/ajax-crawling/docs/getting-started
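The "#!" replacement can be sketched as a pair of small Python helpers. The function names are hypothetical, and the escaping is an assumption: only `=` and `,` are left unescaped here, whereas the exact set of characters to escape is defined in Google's documentation.

```python
from urllib.parse import quote, unquote

TOKEN = "_escaped_fragment_="


def to_crawler_url(pretty_url):
    """Map a pretty URL containing "#!" to the URL the crawler requests."""
    base, sep, fragment = pretty_url.partition("#!")
    if not sep:
        return pretty_url  # no hashbang, nothing to rewrite
    joiner = "&" if "?" in base else "?"
    # Assumption: keep "=" and "," readable, percent-encode the rest.
    return f"{base}{joiner}{TOKEN}{quote(fragment, safe='=,')}"


def to_pretty_url(crawler_url):
    """Inverse mapping, used when reporting the URL back to the user."""
    base, sep, fragment = crawler_url.partition(TOKEN)
    if not sep:
        return crawler_url
    return f"{base.rstrip('?&')}#!{unquote(fragment)}"


crawler_url = to_crawler_url("replay/1#!t=3,7")
```

With the URL used later in this deck, `crawler_url` becomes `replay/1?_escaped_fragment_=t=3,7`, and `to_pretty_url` maps it back.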

Page 7: Media Fragments Indexing using Social Media

Key Ideas

• The fragment information must be included in the URL

– Syntax: W3C Media Fragment 1.0 Specification

• Prepare two sets of pages for every media fragment

– original landing page for end-users

– a snapshot page for SEO

• Landing page keeps the original user interaction

– Highlight media fragments on opening

• SEO page

– ONLY includes annotations of the media fragment

– Embed rich snippet
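A snapshot page of this kind could be generated as follows. This is only a sketch: the function name `render_snapshot` and the choice of schema.org properties (`name`, `url`, `description`) are assumptions, not the markup used by the authors.

```python
from html import escape


def render_snapshot(video_name, fragment_url, annotations):
    """Build a snapshot page body that holds only one fragment's annotations.

    Assumption: annotations are exposed as schema.org VideoObject
    descriptions using Microdata attributes.
    """
    descriptions = "\n".join(
        f'  <p itemprop="description">{escape(text)}</p>' for text in annotations
    )
    return (
        '<div itemscope itemtype="http://schema.org/VideoObject">\n'
        f'  <h1 itemprop="name">{escape(video_name)}</h1>\n'
        f'  <link itemprop="url" href="{escape(fragment_url)}">\n'
        f"{descriptions}\n"
        "</div>"
    )


page = render_snapshot("Example talk", "replay/1#!t=3,7", ["Terrace Theater"])
```

Escaping the annotation text matters because it comes from users and may contain markup characters.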

Page 8: Media Fragments Indexing using Social Media

The Solution

[Diagram: requests exchanged between the crawler, the server and the user for the landing page replay/1#!t=3,7 and the snapshot page Snapshot/1?_escaped_fragment_=t=3,7]

1: Submit the pretty URL replay/1#!t=3,7 to the crawler

2: The crawler asks the server for replay/1?_escaped_fragment_=t=3,7

3: Redirect the request to the snapshot page generated by the server. The snapshot page only contains annotations and Microdata for “#t=3,7”

4: The snapshot page is returned to the crawler with URL replay/1#!t=3,7

5: A user searches for the keyword “Terrace Theater”

6: Google includes replay/1#!t=3,7 in the search results

7: The user clicks the link and asks for the document at replay/1#!t=3,7

8: The server returns the landing page containing both “Terrace Theater” and “Linked Data”

9: The landing page highlights the media fragment by playing from 3s to 7s
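Steps 2, 3 and 8 amount to a small server-side decision: a request carrying `_escaped_fragment_` comes from the crawler and is redirected to the snapshot page, and any other request gets the landing page. A minimal sketch, assuming a hypothetical `route` function and a 302 redirect (the slide does not say which status code is used):

```python
from urllib.parse import parse_qs


def route(path, query_string):
    """Return (status, target) for a request to replay/<id>."""
    params = parse_qs(query_string, keep_blank_values=True)
    if "_escaped_fragment_" in params:
        # Crawler request: send it to the snapshot page for this fragment.
        fragment = params["_escaped_fragment_"][0]
        video_id = path.rstrip("/").rsplit("/", 1)[-1]
        return 302, f"Snapshot/{video_id}?_escaped_fragment_={fragment}"
    # Browser request: the "#!t=3,7" part never reaches the server, so the
    # landing page is returned and client-side script highlights the fragment.
    return 200, "landing page"


crawler_response = route("replay/1", "_escaped_fragment_=t=3,7")
browser_response = route("replay/1", "")
```

The split works because URL fragments are handled by the client only, which is why the crawler needs the query-string form to ask for fragment-specific content.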

Page 9: Media Fragments Indexing using Social Media

Discussion

• The Media Fragment Indexing Framework solves the SEO problem of media fragments

• The scalability of this method largely depends on whether a large number of annotations are linked to media fragments

• Looking for media fragment annotations?

– Timed-text, transcript, speech recognition

– Manual annotations on each video sharing platform

– Social Media (Twitter)

Page 10: Media Fragments Indexing using Social Media

Survey on Media Fragment URI Implementation

Page 11: Media Fragments Indexing using Social Media

Media Fragments and Social Media

• The deep-linking function

• A Media Fragment URL can be embedded in a Tweet

• Text of the Tweet is the annotation to the URL

• Get annotations by filtering Tweets that have MF URIs

Page 12: Media Fragments Indexing using Social Media

Filter Tweets by Media Fragment URIs

• Problem:

– Any URL in a Tweet is potentially an MF URI

– Too many false-positive cases

http://example.org/1234#t=23

http://example.org/1234?t=23

http://example.org/1234?track=23

– They could all be MF URIs, so each one would need to be identified manually

• Work around:

– Identify platforms (partially-)implementing MF URI

– Only filter Tweets containing URLs from those domains
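The workaround boils down to a host check on every URL found in a Tweet. The sketch below is illustrative only: the domain set and the function name `is_candidate` are assumptions, and the real list comes from the survey that follows.

```python
from urllib.parse import urlsplit

# Illustrative domain list (assumption); the survey determines the real one.
SUPPORTED_DOMAINS = {"youtube.com", "dailymotion.com", "vimeo.com"}


def is_candidate(url, domains=SUPPORTED_DOMAINS):
    """Return True if the URL is hosted on a platform known to support deep-linking."""
    host = (urlsplit(url).hostname or "").lower()
    # Accept the bare domain and any subdomain such as "www.".
    return any(host == d or host.endswith("." + d) for d in domains)


urls = [
    "https://www.youtube.com/watch?v=abc&t=23",
    "http://example.org/1234?track=23",
]
candidates = [u for u in urls if is_candidate(u)]
```

Matching on the parsed host rather than on a substring of the URL avoids accepting look-alike domains.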

Page 13: Media Fragments Indexing using Social Media

Survey Methodology

• Find a list of video sharing platforms

– http://en.wikipedia.org/wiki/List_of_video_hosting_services

– 59 websites are targeted in the survey

– Some of them have access restrictions

• Go through each website manually to see whether it provides a deep-linking function, such as:

– Social sharing button from a certain time point

– Deep-linking option in right click menu

Page 14: Media Fragments Indexing using Social Media

Survey Results (1)

• 9 websites partially implement MF URIs

– 56.com, Dailymotion, Hulu, Vbox7, Viddler, Vimeo, Tudou, Youku and YouTube

• They use different syntaxes to encode the temporal dimension

– Most of them use the URI query, except YouTube & Vimeo

– Parameter name: “start”, “t”, “st”, etc.

– Only Hulu implemented the end time

• Only YouTube partially implemented spatial dimension

– This is an external function implemented by Clickberry

https://clickberry.tv/video/6dafe30e-dcb8-44b8-8190-32be8249a297
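Because the platforms disagree on syntax, an extractor has to look in both the URI query and the fragment and accept several parameter names. The following is a sketch under stated assumptions: the parameter list contains only the three names quoted above, the value is assumed to be either plain seconds or an `h`/`m`/`s` form such as `1m30s`, and the function names are hypothetical.

```python
import re
from urllib.parse import parse_qs, urlsplit

TIME_PARAMS = ("t", "start", "st")  # names reported in the survey
_HMS = re.compile(r"^(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s?)?$")


def to_seconds(value):
    """Convert "90" or "1m30s" to seconds; return None if unparseable."""
    match = _HMS.match(value)
    if not match or not any(match.groups()):
        return None
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def start_time(url):
    """Look for a start time in the query first, then in the fragment."""
    parts = urlsplit(url)
    for source in (parts.query, parts.fragment):
        params = parse_qs(source)
        for name in TIME_PARAMS:
            if name in params:
                seconds = to_seconds(params[name][0])
                if seconds is not None:
                    return seconds
    return None


from_fragment = start_time("http://example.org/1234#t=23")
from_query = start_time("http://example.org/1234?start=1m30s")
```

A URL such as `http://example.org/1234?track=23` yields `None`, since `track` is not one of the recognised parameter names.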

Page 15: Media Fragments Indexing using Social Media

Survey Results (2)

• Only 9 websites partially implement MF URIs, however:

– Those websites cover most videos shared on the Web

– eBizMBA report: http://www.ebizmba.com/articles/video-websites

• Select filter keywords based on the survey results:

– Twitter is banned in China, so 56.com, Tudou and Youku are ignored

– Hulu has access restriction outside U.S.

• Filter keywords: “YouTube”, “Dailymotion”, “Vbox7”, “Vimeo” and “Viddler”

Page 16: Media Fragments Indexing using Social Media

Indexing Media Fragments Using Twitter

Page 17: Media Fragments Indexing using Social Media

Twitter Media Fragment Indexer

• Collect Tweets filtered by the keywords

• Extract MF URIs in Tweets, parse the media fragment information

• Use Media Fragment Indexing Framework to publish Tweets as media fragment annotations

• Embed rich snippets in the snapshot pages

• Create a sitemap for Google to crawl the snapshot pages

• When a user searches Google for keywords from the Tweet, the resulting link leads to the video at the corresponding start time
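The sitemap step can be sketched with the standard library. The function name `build_sitemap` and the example URLs are hypothetical; the XML namespace is the one defined by the sitemap protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Serialise the distinct URLs as a sitemap <urlset> document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # dict.fromkeys drops duplicates (e.g. retweets) but keeps first-seen order.
    for url in dict.fromkeys(urls):
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    "http://example.org/replay/1#!t=3,7",
    "http://example.org/replay/1#!t=3,7",  # duplicate from a retweet
    "http://example.org/replay/2#!t=40",
])
```

Deduplicating before writing the sitemap keeps one entry per distinct media fragment, however many Tweets shared it.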

Page 18: Media Fragments Indexing using Social Media

The Detailed Workflow

Page 19: Media Fragments Indexing using Social Media

Indexing Results (1)

• Monitor 50-hour non-stop Twitter stream

• Filter phrase: “youtube, dailymotion, vimeo, vbox7, viddler”

• 5,779,858 Tweets examined, 5,269,742 contain URLs

• 32,754 Tweets contain MF URIs, 32,796 MF URIs in total

• Media Fragment URIs shared in each website:

Website        No. of MF URIs    %
YouTube        32,666            99.604
Dailymotion    101               0.308
Vbox7          0                 0
Viddler        0                 0
Vimeo          29                0.088
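The percentage column follows directly from the counts, as this short check shows. The counts are copied from the table; the variable names are arbitrary.

```python
# Number of MF URIs per website, as reported in the table above.
counts = {"YouTube": 32666, "Dailymotion": 101, "Vbox7": 0, "Viddler": 0, "Vimeo": 29}

total = sum(counts.values())  # should equal the 32,796 MF URIs reported
shares = {site: round(100 * n / total, 3) for site, n in counts.items()}
```

The per-site counts add up to the reported total of 32,796 MF URIs, and the rounded shares reproduce the table.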

Page 20: Media Fragments Indexing using Social Media

Indexing Results (2)

• 13,088 distinct videos are found

• 17,854 distinct MF URIs for sitemap

– Many Tweets share the same video, but different fragments

– Many retweets

– Some videos are not available in the UK

• 17,479 URLs (97.9%) in the sitemap have been indexed by Google

• Only 775 URLs are indexed as VideoObject, even though rich snippets are embedded in every snapshot page

Page 21: Media Fragments Indexing using Social Media

Demo

• Search “Chris Eppstein”

• As a result, this landing page will be opened and the video starts playing from the time indicated in the Tweet containing the keywords “Chris Eppstein”

Page 22: Media Fragments Indexing using Social Media

Conclusions and Future Work

Page 23: Media Fragments Indexing using Social Media

Conclusions

• Introduced the Media Fragment Indexing Framework

• Proposed the use of social media to acquire more annotations for media fragments

• Surveyed the MF URI implementation on major video sharing platforms

• Twitter Media Fragment Indexer

– Monitor Tweet Stream and automatically create media fragment annotations

– Index media fragments in Google

– YouTube is the most important domain to share media fragments on Twitter

Page 24: Media Fragments Indexing using Social Media

Future Work

• Assess how well Tweets can serve as media fragment annotations

– much noisy and unrelated text

– many re-tweets

• Experiment on larger scale (billions of tweets and continuous monitoring)

• Expand the methodology to other media fragment annotations, such as timed-text

• Extract named entities from tweets and further link media fragments to the Linked Data Cloud

Page 25: Media Fragments Indexing using Social Media

Questions?
