Social Data and Multimedia Analytics for News and Events Applications
Dr. Yiannis Kompatsiaris, [email protected], Head, Multimedia, Knowledge and Social Media Analytics Lab, CERTH-ITI
Multimodal Social Data Management (MSDM) Workshop


Jan 27, 2015



The keynote discusses a framework enabling real-time multimedia indexing and search across multiple social media sources. It places particular emphasis on the real-time, social and contextual nature of content and information consumption in order to integrate topic and event detection, mining, search and retrieval, based on aggregation and indexing of shared user-generated multimedia content. User-friendly applications for the News and Events domains have been developed based on these approaches, incorporating novel user-centric media visualisation and browsing methods. The research and development is part of the FP7 EU project SocialSensor.

Content:
Introduction
Motivation – Challenges
SocialSensor Project and Use Cases
Research Approaches
Large-Scale visual search
Clustering
Verification
Demos – Applications
MM News Demo
Clusttour
ThessFest
Conclusions
Transcript
Page 1: Social Data and Multimedia Analytics for News and Events Applications

Social Data and Multimedia Analytics for News and Events Applications

Dr. Yiannis Kompatsiaris, [email protected]

Head, Multimedia, Knowledge and Social Media Analytics Lab, CERTH-ITI

Multimodal Social Data Management (MSDM) Workshop

Page 2

MSDM 2014, Athens Social Data and Multimedia Analytics #2

Overview

• Introduction
– Motivation
– Challenges

• SocialSensor Project and Use Cases

• Research Approaches
– Large-Scale visual search
– Clustering
– Verification

• Demos – Applications
– MM News Demo
– Clusttour
– ThessFest

• Conclusions

Page 3

Introduction
• Motivation
• Example Applications
• Conceptual Architecture
• Challenges

Page 4

http://www.puzzlemarketer.com/digital-social-brands-in-60-seconds/ (Apr, 2012)

Page 5

Social Networks as Real-Life Sensors

• Social networks are a data source with an extremely dynamic nature that reflects events and the evolution of community focus (users’ interests)

• The massive penetration of smartphones and mobile devices provides real-time and location-based user feedback

• Transform individually rare but collectively frequent media into meaningful topics, events, points of interest, emotional states and social connections

• Present these efficiently for a variety of applications (news, marketing, entertainment)

Page 6

Pope Francis

Pope Benedict

2007: iPhone release

2008: Android release

2010: iPad release

http://petapixel.com/2013/03/14/a-starry-sea-of-cameras-at-the-unveiling-of-pope-francis/

Page 7

Social Networks as Graphs

Page 8

Social Networks as Graphs

“Social networks have emergent properties. Emergent properties are new attributes of a whole that arise from the interaction and interconnection of the parts”

• Emotions, health and sexual relationships depend not just on our connections (e.g. their number) but on our position in the social graph and its structure

– Central
– Hub
– Outlier
– Transitivity (connections between friends)
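Two of the structural notions above can be made concrete with standard graph measures. The following is a minimal plain-Python sketch. It assumes an undirected friendship graph stored as adjacency sets, and the node names and edges are hypothetical. Degree centrality captures how much of a hub a node is, and the local clustering coefficient captures transitivity: the share of a node's friend pairs who are themselves friends.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj, node):
    """Share of all other nodes that `node` is directly connected to."""
    n = len(adj)
    return len(adj[node]) / (n - 1) if n > 1 else 0.0


def local_clustering(adj, node):
    """Transitivity around `node`: fraction of its neighbour pairs that are also linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return linked / (k * (k - 1) / 2)


# Toy friendship graph (hypothetical): "a" is a hub, "d" is an outlier hanging off it.
adj = build_adjacency([("a", "b"), ("a", "c"), ("b", "c"), ("a", "d")])
print(degree_centrality(adj, "a"), local_clustering(adj, "a"))  # 1.0 0.3333333333333333
```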

Page 9

Examples - Science

X. Jin, A. Gallagher, L. Cao, J. Luo, J. Han. “The Wisdom of Social Multimedia: Using Flickr for Prediction and Forecast”. In Proceedings of the International Conference on Multimedia (MM '10), ACM, 2010.

“…if you're more than 100 km away from the epicenter [of an earthquake] you can read about the quake on twitter before it hits you…”

Page 10

Example – News (Boston bombing)


“Following the Boston Marathon bombings, one quarter of Americans reportedly looked to Facebook, Twitter and other social networking sites for information, according to The Pew Research Center. When the Boston Police Department posted its final “CAPTURED!!!” tweet of the manhunt, more than 140,000 people retweeted it.”

“Authorities have recognized that one of the first places people go in events like this is social media, to see what the crowd is saying about what to do next”

"I have been following my friend's Facebook [account] who is near the scene and she is updating everyone before it even gets to the news”

Page 11

Events - Festivals

http://www.eventmanagerblog.com/uploads/2012/12/event-technology-infographic.jpg

Page 12

Conceptual Architecture (diagram). Layers: CRAWLING, INDEXING, MINING, RANKING, VISUALIZATION. Components shown: API Wrapper, Website Wrapper, Scheduler, Media Fetcher, Visual Indexing, Near-duplicates, Text Indexing, SNA, Sentiment - Influence, Trends - Topics, Model Building, Concepts, Relevance, Diversity, Popularity, Veracity, Crawling Specs, Sources, Interaction, Responsiveness, Aggregation, Aesthetics.

Page 13

Challenges – Content (Mining)

• Multi-modality: e.g. image + tags

• Rich social context: spatio-temporal, social connections, relations and social graph

• Inconsistent quality: noise, spam, ambiguity, fake, propaganda

• Huge volume: Massively produced and disseminated

• Multi-source: may be generated by different applications and user communities

• Also connected to other sources (e.g. LOD, web)

• Dynamic: Fast updates, real-time

Page 14

Policy – Licensing – Legal challenges

• Fragmented access to data
– Separate wrappers/APIs for each source (Twitter, Facebook, etc.)
– Different data collection/crawling policies

• Limitations imposed by API providers (“Walled Gardens”)
• Full access to data impossible or extremely expensive (e.g. see the data licensing plans of GNIP and DataSift)
• Non-transparent data access practices (e.g. access is provided to an organization/person if they have a contact at Twitter)
• Constant change of model and ToS of social APIs
– No backwards compatibility, additional development costs

• Ephemeral nature of content
• Social search results often lead to removed content, causing inconsistent and unreliable referencing

• User Privacy & Purpose of use
• Fuzzy regulatory framework regarding mining user-contributed data

Page 15

SocialSensor Project – Use Cases

Page 16

SocialSensor Project Objective

SocialSensor quickly surfaces trusted and relevant material from social media – with context.

Diagram (DySCO): behaviour, location, time, content, usage, social context; Massive social media and unstructured web; Social media mining, Aggregation & indexing; News - Infotainment, Personalised access; Ad-hoc P2P networks

Page 17

The SocialSensor Vision

SocialSensor quickly surfaces trusted and relevant material from social media – with context.

• “quickly”: in real time
• “surfaces”: automatically discovers, clusters and searches
• “trusted”: automatic support in the verification process
• “relevant”: to the users, personalized
• “material”: any material (text, image, audio, video = multimedia), aggregated with other sources (e.g. web)
• “social media”: across all relevant social media platforms
• “with context”: location, time, sentiment, influence

Page 18

Conceptual Architecture and Main components

Diagram components: SEMANTIC MIDDLEWARE; Public Data; In-project Data; SEARCH & RECOMMENDATION; USER MODELLING & PRESENTATION; INDEXING; MINING; STORAGE; DATA COLLECTION / CRAWLING

• Real time dynamic topic and event clustering

• Trend, popularity and sentiment analysis

• Calculate trust/influence scores around people

• Personalized search, access & presentation based on social network interactions

• Semantic enrichment and discovery of services

Page 19

Use Cases

NEWS
• Casual News application: Casual News Readers
• Professional News application: Journalists, Editors, etc.

INFOTAINMENT
• EventLiveDashboard: Festival organizers
• Social Media Walls: Festival attendants

Page 20

“It has changed the way we do news” (MSN)

“Social media is the key place for emerging stories – internationally, nationally, locally” (BBC)

“Social media is transforming the way we do journalism” (New York Times)

Source: picture alliance / dpa

Page 21

Source: Getty Images

“It’s really hard to find the nuggets of useful stuff in an ocean of content” (BBC)

“Things that aren’t relevant crowd out the content you are looking for” (MSN)

“The filters aren’t configurable enough” (CNN)

Page 22

Verification was simpler in the past...

Source: Frank Grätz


Page 23

Infotainment
• Events with large numbers of visitors
• Thessaloniki International Film Festival
– 80,000 viewers / 100,000 visitors in 10 days
– 150 films, 350 screenings
• Discovery and presentation of relevant aggregated social media
– Trending Topics
– Sentiment
– Tweet – film matching
– Visualization (Social Walls)

Page 24

Research Approaches
• Large-Scale Visual Search
• Clustering – Community Detection
• Social Media Verification

Page 25

Scalable visual feature aggregation & indexing
• Problem: Example-based image search
– Find images that represent the same or a similar object or scene as a given query image
– Objects may be viewed from different viewpoints, with occlusions and clutter
• Challenge: Large scale
– Searching databases with tens of millions of images
– Objectives to be fulfilled:
• Sufficient discriminative power
• Fast response times
• Efficient memory usage

Page 26

Large-scale visual search

Diagram: image collection from social media/Web; image local feature extraction; feature aggregation; feature indexing; kNN visual similarity search; concept-based image annotation; image clustering; image (geo)tagging; concept-based search/filtering; duplicate detection

Page 27

Framework

• Implementation and evaluation of the effectiveness of VLAD in combination with SURF

• Scalable image indexing

E. Spyromitros-Xioufis, et al. “An Empirical Study on the Combination of SURF Features with VLAD Vectors for Image Search”. In WIAMIS 2012, Dublin, Ireland, May 2012.

Pipeline: image → local descriptor extraction (SIFT / SURF) → set of local descriptors → descriptor aggregation (BOW / VLAD) → fixed-size vector → dimensionality reduction (PCA) → low-dimensional vector → encoding & indexing (PQ + ADC/IVFADC)
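The aggregation and reduction steps of this pipeline can be sketched with NumPy. This is a minimal illustration, not the WIAMIS 2012 implementation. It assumes 64-dimensional SURF-like descriptors and a codebook learned offline by k-means; random vectors stand in for both here, and the codebook size k=16 is an arbitrary choice. It also assumes the signed square-root normalisation commonly applied to VLAD. PCA then reduces the vector to d'=128, the dimensionality quoted for the exhaustive-search timing on the next slide.

```python
import numpy as np


def vlad(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Aggregate an (n, d) set of local descriptors into one k*d VLAD vector."""
    k, d = codebook.shape
    # Hard-assign every descriptor to its nearest visual word (squared L2 distance).
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)
    v = np.zeros((k, d))
    for word in range(k):
        members = descriptors[nearest == word]
        if len(members):
            # Accumulate residuals (descriptor minus its centroid) per visual word.
            v[word] = (members - codebook[word]).sum(axis=0)
    v = v.ravel()
    # Signed square-root (power) normalisation, then L2 normalisation (assumed variant).
    v = np.sign(v) * np.sqrt(np.abs(v))
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def fit_pca(vectors: np.ndarray, out_dim: int):
    """Return the mean and the top `out_dim` principal directions of the rows."""
    mean = vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(vectors - mean, full_matrices=False)
    return mean, vt[:out_dim]


rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 64))  # stand-in for a k-means codebook (k=16, d=64)
train = np.stack([vlad(rng.normal(size=(100, 64)), codebook) for _ in range(200)])
mean, components = fit_pca(train, out_dim=128)

query = vlad(rng.normal(size=(300, 64)), codebook)  # 16 * 64 = 1024-d VLAD vector
reduced = (query - mean) @ components.T             # d' = 128
```

The fixed-size output is the point of the design: however many local descriptors an image yields, it ends up as one 1024-d (and then 128-d) vector that can be indexed and compared cheaply.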

Page 28

Scalable indexing of features

• ADC 16x8 requires 16 bytes per image
– ~67M images per GB

• IVFADC requires 4 additional bytes per image
– ~53.6M images per GB

• The current implementation achieves only half of the above numbers because it uses short int[] instead of byte[]; this can be improved.

• Ideally, 1 billion images could be indexed on a server with 20GB of RAM (projection).

• Query time (for 1M vectors):
– Exhaustive search of VLAD vectors (d’=128): 0.50 sec
– Product Quantization with ADC 16x8: 0.10 sec (x5 faster)
– Product Quantization with IVFADC 16x8: 0.02 sec (x25 faster)
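The "16x8" notation works as follows. A product quantiser splits each d'=128 vector into 16 sub-vectors and, for each one, stores the index of the nearest of 2^8 = 256 centroids, which takes 16 bytes per image. ADC (asymmetric distance computation) then compares the unquantised query against these codes through one lookup table per sub-vector. The NumPy sketch below is a toy illustration: it uses random data and a simple k-means trainer, and it omits the inverted-file coarse quantiser and inverted lists that IVFADC adds. The `images_per_gib` helper just reproduces the memory arithmetic from the bullets, assuming 1 GB = 2^30 bytes.

```python
import numpy as np


def train_pq(x: np.ndarray, m: int = 16, k: int = 256, iters: int = 8, seed: int = 0) -> np.ndarray:
    """Learn m sub-codebooks of k centroids each with a plain k-means loop."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    ds = d // m
    books = np.empty((m, k, ds))
    for j in range(m):
        sub = x[:, j * ds:(j + 1) * ds]
        cent = sub[rng.choice(n, size=k, replace=False)].copy()
        for _ in range(iters):
            assign = ((sub[:, None, :] - cent[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
            for c in range(k):
                pts = sub[assign == c]
                if len(pts):
                    cent[c] = pts.mean(axis=0)
        books[j] = cent
    return books


def pq_encode(x: np.ndarray, books: np.ndarray) -> np.ndarray:
    """Replace each sub-vector by the index of its nearest centroid: m bytes per vector."""
    m, k, ds = books.shape
    codes = np.empty((len(x), m), dtype=np.uint8)  # valid because k <= 256 (8 bits)
    for j in range(m):
        sub = x[:, j * ds:(j + 1) * ds]
        codes[:, j] = ((sub[:, None, :] - books[j][None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return codes


def adc_search(q: np.ndarray, codes: np.ndarray, books: np.ndarray, topk: int = 5) -> np.ndarray:
    """Asymmetric distance: exact query vs. quantised database, via m lookup tables."""
    m, k, ds = books.shape
    # table[j, c] = squared distance between query sub-vector j and centroid c
    table = np.stack([((books[j] - q[j * ds:(j + 1) * ds]) ** 2).sum(axis=1) for j in range(m)])
    dists = table[np.arange(m), codes].sum(axis=1)  # m table look-ups per database item
    return np.argsort(dists)[:topk]


def images_per_gib(bytes_per_image: int) -> int:
    return 2**30 // bytes_per_image


rng = np.random.default_rng(0)
db = rng.normal(size=(2000, 128))  # stand-in for PCA-reduced VLAD vectors (d'=128)
books = train_pq(db)               # 16 sub-quantisers x 8 bits = "16x8"
codes = pq_encode(db, books)
hits = adc_search(db[7], codes, books)
print(codes.nbytes // len(db), images_per_gib(16), images_per_gib(16 + 4))  # 16 67108864 53687091
```

The speed-up comes from the lookup tables. Once the table is built, scoring an item costs 16 additions instead of a 128-dimensional distance, and the database holds only the compact codes.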

Page 29

VLAD+SIFT vs. VLAD+SURF: Accuracy vs. dimensionality
• VLAD+SURF improves on VLAD+SIFT and FV+SIFT across all dimensions on both the Holidays and Oxford datasets

Results in rows starting with * are taken from Jégou et al., 2011, hence the missing values for some entries. SIFT corresponds to PCA-reduced SIFT, which yielded better results than standard SIFT in Jégou et al., 2011.

Page 30

Large-scale graph-based clustering
• Problem: Discover structure in large-scale datasets by exploiting their relations
• Challenges - Approach:
– Large scale
– Fast response times
– Efficient memory usage
– Noise resilient
– Number of clusters not known
• Structural similarity + local expansion community detection techniques

Page 31

Large-scale graph-based clustering

• Structural similarity + local expansion (highly efficient and scalable approach)

• Not necessary to know the number of clusters

• Noise resilient (not all nodes need to be part of a community)

• Generic approach adaptable to many applications (depending on the node – edge representation)

S. Papadopoulos, Y. Kompatsiaris, A. Vakali. “A Graph-based Clustering Scheme for Identifying Related Tags in Folksonomies”. In Proceedings of DaWaK'10, Springer-Verlag, pp. 65-76
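A sketch in the style of the SCAN structural clustering algorithm shows why structural similarity with local expansion needs no preset number of clusters and tolerates noise. It illustrates the general idea only and is not the DaWaK'10 scheme. Structural similarity is taken as |Γ(u) ∩ Γ(v)| / √(|Γ(u)|·|Γ(v)|), where Γ includes the node itself. A node with at least `mu` ε-similar neighbours seeds a community that grows outward through similar neighbours, and nodes never reached are left as noise. The `eps` and `mu` values, and counting neighbours without the node itself, are assumptions.

```python
from collections import deque
from math import sqrt


def structural_similarity(adj, u, v):
    """Overlap of closed neighbourhoods, normalised by their sizes."""
    gu, gv = adj[u] | {u}, adj[v] | {v}
    return len(gu & gv) / sqrt(len(gu) * len(gv))


def scan(adj, eps=0.7, mu=2):
    """Return {node: community id}; nodes absent from the result are treated as noise."""
    def eps_neighbours(u):
        return [v for v in adj[u] if structural_similarity(adj, u, v) >= eps]

    label, next_id = {}, 0
    for seed in adj:
        if seed in label or len(eps_neighbours(seed)) < mu:
            continue  # already assigned, or not a core node
        next_id += 1
        label[seed] = next_id
        queue = deque([seed])
        while queue:  # local expansion from the core node
            w = queue.popleft()
            nbrs = eps_neighbours(w)
            if len(nbrs) < mu:
                continue  # border node: joins the community but does not expand it
            for x in nbrs:
                if x not in label:
                    label[x] = next_id
                    queue.append(x)
    return label


# Toy graph (hypothetical): two 4-cliques joined by the bridge 4-5, plus node 9 hanging off 1.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (4, 5), (1, 9)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
labels = scan(adj)
```

On this toy graph the two cliques come out as separate communities. The bridge edge is too weak structurally (similarity 0.4) to merge them, and node 9 stays unassigned as noise.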

Page 32

Computational Verification in Social Media

• Create a computational verification framework to classify tweets with unreliable media content.

• Events used for experimentation: Hurricane Sandy and the Boston Marathon bombings

Figures: fake images posted during the Hurricane Sandy natural disaster; fake images posted during the Boston Marathon bombings
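The classifiers work on two groups of features per tweet: content features and user features, the two groups compared in the results tables. The slides do not list the individual features, so the ones below are illustrative assumptions, as is the minimal `User` record. The resulting vectors could be fed to any classifier, such as the J48, KStar or Random Forest models reported.

```python
import re
from dataclasses import dataclass


@dataclass
class User:  # hypothetical minimal user record
    followers: int
    friends: int
    statuses: int
    verified: bool


def content_features(text: str) -> list:
    """Illustrative content features computed from the tweet text only."""
    words = text.split()
    return [
        len(text),                                    # characters
        len(words),                                   # words
        text.count("?"),                              # question marks
        text.count("!"),                              # exclamation marks
        len(re.findall(r"#\w+", text)),               # hashtags
        len(re.findall(r"@\w+", text)),               # mentions
        len(re.findall(r"https?://\S+", text)),       # URLs
        sum(w.isupper() and len(w) > 1 for w in words),  # "shouted" words
    ]


def user_features(user: User) -> list:
    """Illustrative features describing the posting account."""
    return [
        user.followers,
        user.friends,
        user.followers / max(user.friends, 1),  # follower/friend ratio
        user.statuses,
        float(user.verified),
    ]


def tweet_features(text: str, user: User) -> list:
    """'Total features' = content features followed by user features."""
    return content_features(text) + user_features(user)


x = tweet_features("BREAKING!! Shark on the highway #sandy http://t.co/x @cnn",
                   User(followers=100, friends=0, statuses=50, verified=False))
```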

Page 33

Methodology


Page 34

Results
• Tweet statistics

                           Hurricane Sandy   Boston Marathon
Tweets with URLs                    343939            112449
Tweets with fake images              10758               281
Tweets with real images               3540               460

• Approaches

Detection accuracy using cross-validation approach (classified correctly, %)

Hurricane Sandy
Classifier       Content features   User features   Total features
J48 tree         81.41              67.72           80.68
KStar            81.28              71.16           81.38
Random Forest    80.59              70.15           80.94

Boston Marathon
Classifier       Content features   User features   Total features
J48 tree         76.45              70.81           81.25
KStar            81.28              74.12           75.78
Random Forest    78.59              76.15           79.10

Page 35

Results (2)

Detection accuracy using different training and testing sets in Hurricane Sandy (classified correctly, %)
Classifier       Content features   User features   Total features
J48 tree         73.79              51.06           65.06
KStar            75.30              62.29           53.31
Random Forest    74.02              63.10           65.96

Detection accuracy using Hurricane Sandy for training and Boston Marathon for testing (classified correctly, %)
Classifier       Content features   User features   Total features
J48 tree         55.05              50.12           54.10
KStar            50.01              50.10           50.97
Random Forest    58.75              51.03           58.78
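The tables contrast two evaluation protocols. The first is k-fold cross-validation within one event. The second trains on one event (Hurricane Sandy) and tests on another (Boston Marathon). The Python sketch below reproduces both protocols on synthetic two-dimensional data, using a nearest-centroid classifier as a simple stand-in for J48, KStar and Random Forest. The data, the classifier and the fold count are all assumptions. When the feature distribution shifts between events, the cross-event score falls towards chance, which qualitatively mirrors the last table.

```python
import random
from math import dist


def fit_nearest_centroid(X, y):
    """Stand-in classifier: one mean feature vector per class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def accuracy(model, X, y):
    correct = sum(min(model, key=lambda c: dist(model[c], x)) == label for x, label in zip(X, y))
    return correct / len(y)


def k_fold_accuracy(X, y, k=10, seed=0):
    """Protocol 1: train and test on disjoint folds of the same event."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    scores = []
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_nearest_centroid([X[i] for i in train], [y[i] for i in train])
        scores.append(accuracy(model, [X[i] for i in fold], [y[i] for i in fold]))
    return sum(scores) / len(scores)


def cross_event_accuracy(X_train, y_train, X_test, y_test):
    """Protocol 2: train on one event, test on a different one."""
    return accuracy(fit_nearest_centroid(X_train, y_train), X_test, y_test)


rng = random.Random(1)


def make_event(c0, c1, n=100):
    """Synthetic features: class 0 (real) around c0, class 1 (fake) around c1."""
    X = [(rng.gauss(c0, 0.5), rng.gauss(c0, 0.5)) for _ in range(n)]
    X += [(rng.gauss(c1, 0.5), rng.gauss(c1, 0.5)) for _ in range(n)]
    return X, [0] * n + [1] * n


X_a, y_a = make_event(0.0, 5.0)  # "training" event
X_b, y_b = make_event(3.0, 8.0)  # another event with a shifted feature distribution
print(k_fold_accuracy(X_a, y_a, k=5), cross_event_accuracy(X_a, y_a, X_b, y_b))
```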

Page 36

Other approaches

• Graph-based multimodal clustering for social event detection in large collections of images
– Automatic organization of a multimedia collection into groups of items, each of which corresponds to a distinct event

• Unsupervised learning of concept detectors using social media as training data

• Text analysis for entity matching and sentiment analysis

• Placing images based on content features
• Retrieving diverse images for the same entity

Page 37

Demos - Applications
• MM News Demo
• Clusttour
• ThessFest

Page 38

Multimedia Demo

Page 39

Multimedia Demo Architecture

• StreamManager (Twitter, Facebook, Flickr, YouTube, RSS, Instagram): 160.xx.xx.207
• MongoDBWrapper: 160.xx.xx.207
• TextIndexer (Solr): 160.xx.xx.207
• MediaFetcher, FeatureExtractor (HDFS): 160.xx.xx.58, 160.xx.xx.107
• Social Focused Crawler (HDFS): 160.xx.xx.187
• FeatureIndexer (HDFS): 160.xx.xx.207
• Data Mining (Visual Clust., Geo Clust., Statistics): 160.xx.xx.191
• Web server: 160.xx.xx.116
• APIs (1), (2), (3), (4)
• Other diagram labels: Nutch, VLAD, IVFADC

Page 40

City profile creation (Clusttour)

Example photo: tags: sagrada familia, cathedral, barcelona; taken: 12 May 2009; lat: 41.4036, lon: 2.1743

Pipeline: PHOTOS & METADATA → SPATIAL CLUSTERING + TEMPORAL ANALYSIS → COMMUNITY DETECTION (VISUAL, TAG, HYBRID) → CLASSIFICATION TO LANDMARKS/EVENTS

Plot: duration vs. #users / #photos, with example clusters at [2 years, 50 users / 120 photos] and [1 day, 2 users / 10 photos]

S. Papadopoulos, C. Zigkolis, Y. Kompatsiaris, A. Vakali. “Cluster-based Landmark and Event Detection on Tagged Photo Collections”. In IEEE Multimedia Magazine 18(1), pp. 52-63, 2011
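The last stage, telling landmarks from events, can be illustrated with the duration and #users / #photos statistics plotted on this slide. The Python sketch below uses a hand-written threshold rule purely for illustration. It is not the classifier of the cited IEEE Multimedia paper. The `Photo` record, the 30-day and 3-day spans and the five-user minimum are all hypothetical choices.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Photo:
    user: str
    taken: datetime
    lat: float
    lon: float


def cluster_profile(photos):
    """Temporal, social and spatial statistics of one spatial cluster of photos."""
    times = [p.taken for p in photos]
    n = len(photos)
    return {
        "duration": max(times) - min(times),
        "users": len({p.user for p in photos}),
        "photos": n,
        "centre": (sum(p.lat for p in photos) / n, sum(p.lon for p in photos) / n),
    }


def classify_cluster(photos, landmark_span=timedelta(days=30),
                     event_span=timedelta(days=3), min_users=5):
    """Hypothetical rule: many users over a long period -> landmark;
    many users within a short burst -> event; anything else -> other."""
    p = cluster_profile(photos)
    if p["users"] < min_users:
        return "other"
    if p["duration"] >= landmark_span:
        return "landmark"
    if p["duration"] <= event_span:
        return "event"
    return "other"


start = datetime(2009, 5, 12)
# 120 photos by 50 users spread over roughly two years at one location.
sagrada = [Photo(f"user{i % 50}", start + timedelta(days=6 * i), 41.4036, 2.1743)
           for i in range(120)]
print(classify_cluster(sagrada))  # landmark
```

With these thresholds the [1 day, 2 users / 10 photos] profile falls into "other", because it has too few users. This shows how strongly hand-set cut-offs shape the result, and why a learned classifier is preferable.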

Page 41

City profile creation (Clusttour)

Community detection on image similarity graphs
• Nodes: photos
• Edges: visual and tag similarity
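Building an image similarity graph comes down to deciding when two photos get an edge. The following is a minimal Python sketch. It assumes each photo comes with a tag set and a visual feature vector, and adds an edge when tag Jaccard similarity or visual cosine similarity passes a threshold; both thresholds are hypothetical. Each edge records which modality produced it, so the same graph can serve the visual, tag or hybrid variants, and the adjacency sets can be passed to any community detection routine.

```python
from math import sqrt


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    nu, nv = sqrt(sum(x * x for x in u)), sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_hybrid_graph(photos, tag_thr=0.3, visual_thr=0.8):
    """photos: {id: (tag set, visual vector)} -> (adjacency sets, {(a, b): edge kind})."""
    adj = {pid: set() for pid in photos}
    kinds = {}
    ids = list(photos)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:  # all pairs: O(n^2), fine only for small collections
            tag_edge = jaccard(photos[a][0], photos[b][0]) >= tag_thr
            vis_edge = cosine(photos[a][1], photos[b][1]) >= visual_thr
            if tag_edge or vis_edge:
                adj[a].add(b)
                adj[b].add(a)
                kinds[(a, b)] = "both" if tag_edge and vis_edge else ("tag" if tag_edge else "visual")
    return adj, kinds


# Hypothetical photos with toy 3-d visual vectors.
photos = {
    "p1": ({"sagrada", "familia", "barcelona"}, [1.0, 0.0, 0.0]),
    "p2": ({"sagrada", "familia"}, [0.9, 0.1, 0.0]),
    "p3": ({"beach"}, [0.0, 0.0, 1.0]),
}
adj, kinds = build_hybrid_graph(photos)
```

At the scale discussed in this deck, the all-pairs loop would be replaced by candidate pairs retrieved from a nearest-neighbour index.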

Page 42

Page 43

ThessFest

• Thessaloniki International Film Festival

• Support Twitter/comment usage within the app

• Ratings and comments per film

• Feedback aggregation
– Votes
– Tweets

• Real-time feedback to the organisation and visitors

Page 44

Fête de la Musique Berlin app
• FETEberlin in App Store and Google Play
• More than 100K visitors
• About 5K musicians
• More than 5K app downloads, 25K sessions

App features
• Browse and filter the detailed program
• Interactive maps and routing
• Social sharing
• Artists’ and stages’ details
• Social monitoring

Main benefits for attendants
• Visitors can browse through maps and don’t get lost, as stages are numerous
• The event schedule is always available, and per stage
– Very useful when the server was down and there was no access to the online schedule

Page 45

Topic analysis

• Top-10 topics
• Manual inspection of clusters:
– 53.8% of topic titles considered informative
– 98.5% of clusters were found to be “clean”

• Topics in time

Page 46

Other Application Areas

• Science
– Sociology, machine learning (machine as a teacher), computer vision (annotation)
• Tourism – Leisure – Culture
– Off-the-beaten-path POI extraction
• Marketing
– Brand monitoring, personalised ads
• Prediction
– Politics: election results
• News
– Topics, trends, event detection
• Others
– Environment, emergency response, energy saving, etc.

Page 47

Conclusions – Further topics
• Social media data are useful in many applications
• Not all data are always available (e.g. user queries, Facebook)
– Infrastructure
– Policy - Privacy issues

• Real-time and scalable approaches
– Efficiency of semantics and analysis vs. performance vs. infrastructure

• Fusion of various modalities
– Content, social, temporal, location

• Verification & linking to other sources (web, Linked Open Data)
• Visualization - Interfaces
• Applications and commercialization
• User engagement

Page 48

Reusable results

• Starting point: http://www.socialsensor.eu/results
– Deliverables
– Publications
– Datasets
– Software
– e-letter: http://stcsn.ieee.net/e-letter/vol-1-no-3

• Open-source projects (Apache License v2): https://github.com/socialsensor
– Data collection (stream-manager, storm-focused-crawler)
– Indexing (framework-client, multimedia-indexing)
– Mining (topic-detection, multimedia-analysis, community-evolution-analysis, social-event-detection)

Page 49

European Centre for Social Media

• Topics
– Social media analytics
– Verification
– Visualisation
– Applications in different domains

• Activities
– Listings of projects, results, institutions, events
– Community building
– Support/organise events
– Common social media presence (e.g. LinkedIn)
– Funding from subscriptions, training, commercialisation
– Supporting projects: SocialSensor, Reveal, MULTISENSOR, PHEME, DecarboNet, MWCC, uComp
– Website: http://www.socialmediacentre.eu/
– Research-academic: STCSN http://stcsn.ieee.net/

Page 50

Contributions from
• Dr. Symeon Papadopoulos
– Leading R&D in Social Media Mining
– Large-Scale visual search
– Community detection – Clusttour

• Dr. Sotirios Diplaris
– SocialSensor Technical Project Manager

• Lefteris Spyromitros (PhD Student, AUTH)
– Large-Scale visual search

• Christina Boididou
– Social Media Verification

• Lazaros Apostolidis
– Visualization - User Interface, MM News Demo

• Manos Schinas
– Topic Analysis
– Back-end ThessFest – Clusttour
– MM News Demo

• Juxhin Bakalli
– iOS Applications development (ThessFest - Clusttour)

• Antonis Latas
– Android Application Development (ThessFest)

Page 51

Thank you for your attention

[email protected]

http://mklab.iti.gr