Page 1: Data Mining for Moderation of Social Data

Data Mining for Moderation of Social Data

Fernando G. Guerrero CEO SolidQ [email protected]

Page 2
Page 3

3 © 2011 SolidQ

Page 4

Introductions
• Fernando G. Guerrero
  • Global CEO of SolidQ
  • [email protected]
  • Microsoft Regional Director for Spain since 2004
  • SQL Server MVP from 2000 to 2007
  • Usual suspect at many international conferences

Page 5

SolidQ 2012… 10th anniversary
• 160 people in 23 countries:
  • Argentina, Australia, Austria, Bulgaria, Canada, Chile, Costa Rica, Croatia, Denmark, France, Germany, India, Israel, Italy, Mexico, Saudi Arabia, Serbia, Slovakia, Slovenia, Spain, Sweden, UK, USA
• 50 current or former RDs or MVPs
• Authors of many books, articles, and whitepapers
• Research collaboration with:
  • Universidad de Alicante
  • Universidad de les Illes Balears
  • Universidad de Santiago de Compostela
  • The European Union
  • The Spanish Ministry of Economy and Innovation

Page 6

6 © 2012 SolidQ

Agenda

• Social Data
• Market Research
• Sentiment Analysis, Text Mining
• Moderation, Data Mining
• SolidQ Research Lines in Social Data

Page 7

Social data is everywhere

Page 8

Page 9

Social data is about everything

Music

Page 10

Social is there

• Is your organization promoting social content about you?
  • Products
  • Services
  • Stories

Page 11

Social is there: reputation

• What is social saying about you?
  • Product
  • Services
  • Decisions
  • Image

Page 12

Market Research

• What is social requesting from you?
  • Future services
  • Product updates
• Can you put questions to social?
  • Is this service going to succeed?
  • How can I fix the current problem?
  • Is society ready for this law?

Page 13

Sentiment Analysis, Text Mining

The movie was fabulous! [ Sentimental ]

The movie stars Mr. X [ Factual ]

The movie was horrible! [ Sentimental ]
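A minimal sketch of the factual-versus-sentimental distinction above, assuming a tiny hand-written opinion lexicon (`SENTIMENT_LEXICON`, `tokenize` and `classify` are hypothetical names; real sentiment analysis would use a much larger lexicon or a trained model). A sentence with no opinion words is labelled factual; otherwise the polarities of its opinion words are summed.

```python
# Hypothetical opinion lexicon: word -> polarity (+1 positive, -1 negative).
SENTIMENT_LEXICON = {
    "fabulous": 1, "great": 1, "love": 1,
    "horrible": -1, "awful": -1, "hate": -1,
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def classify(text: str) -> tuple[str, int]:
    """Return ('Factual', 0) if no opinion words occur, else ('Sentimental', summed polarity)."""
    polarities = [SENTIMENT_LEXICON[tok] for tok in tokenize(text) if tok in SENTIMENT_LEXICON]
    if not polarities:
        return ("Factual", 0)
    return ("Sentimental", sum(polarities))


for sentence in ["The movie was fabulous!", "The movie stars Mr. X", "The movie was horrible!"]:
    label, polarity = classify(sentence)
    print(f"{sentence} -> {label} ({polarity:+d})")
```

The second sentence comes out factual only because "stars" is not in the lexicon; a word list cannot tell whether "stars" is a verb or a rating, which is one reason text mining looks at context rather than isolated words.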

Page 14

Page 15

What is Data Mining?

• Inform actionable business decisions
• Contrasts with “machine learning”

Page 16

Media Case Study

• Millions of posts per year (different moderation scenarios)
• About 25% are moderated by humans
• About 10% of the moderated posts fail moderation
• No Business Intelligence applications for analysis or reporting
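To make the case-study ratios concrete, the sketch below applies them to a hypothetical volume of one million posts a year (the figure and the function name are illustrative assumptions; the slide only says "millions").

```python
def moderation_volumes(total_posts: int,
                       moderated_share: float = 0.25,
                       fail_share_of_moderated: float = 0.10) -> tuple[int, int]:
    """Return (human-moderated posts, failing moderated posts) for a yearly volume."""
    moderated = round(total_posts * moderated_share)
    failing = round(moderated * fail_share_of_moderated)
    return moderated, failing


# Hypothetical yearly volume; the case study only says "millions of posts".
moderated, failing = moderation_volumes(1_000_000)
print(moderated, failing)  # 250000 25000
```

The ratios say nothing about the 750,000 posts that were never reviewed: how many of those would have failed is unknown.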

Page 17

Moderation, Data Mining

• Contextual information
  • Time
  • Location
  • User
• At 10 AM, comments are safer than at 2 AM.
• A user may be safe talking about science but dangerous talking about sports.
• If a thread is hot (dangerous), a comment in it may be hot too.
• By combining context patterns, the system assigns risk to posts without analyzing the text.
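The three examples above can be combined into a context-only risk score. The sketch below is a hand-weighted stand-in for the patterns the data mining models learn; the `PostContext` fields, the night-hour rule and the weights are all hypothetical, and the post text is never read.

```python
from dataclasses import dataclass


@dataclass
class PostContext:
    hour: int                    # hour of day the post arrived (0-23)
    user_topic_fail_rate: float  # share of this user's past posts on this topic that failed
    thread_fail_rate: float      # share of recent posts in this thread that failed ("hot" thread)


def hour_risk(hour: int) -> float:
    """Hypothetical baseline: late-night posts are treated as riskier than daytime ones."""
    return 0.3 if hour < 6 or hour >= 22 else 0.1


def context_risk(ctx: PostContext,
                 weights: tuple[float, float, float] = (0.2, 0.5, 0.3)) -> float:
    """Weighted blend of context signals; the post text is never inspected."""
    w_hour, w_user, w_thread = weights
    return (w_hour * hour_risk(ctx.hour)
            + w_user * ctx.user_topic_fail_rate
            + w_thread * ctx.thread_fail_rate)


# 10 AM, a user with a clean record on science, in a quiet thread.
day = PostContext(hour=10, user_topic_fail_rate=0.02, thread_fail_rate=0.05)
# 2 AM, the same user posting on sports, where they often fail, in a hot thread.
night = PostContext(hour=2, user_topic_fail_rate=0.40, thread_fail_rate=0.30)
print(round(context_risk(day), 3), round(context_risk(night), 3))  # 0.045 0.35
```

A linear blend keeps each signal's contribution visible; in the approach described here, the weights and interactions would come from mining historical moderation results rather than being set by hand.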

Page 18

Solution – Logical Model

• Post Context (behavior analysis)
  • Patterns, data mining
• Post Content (text analysis)
  • Profanity, low-score sentences, text mining, mood or tone (sentiment analysis)
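One way to read the logical model is as two scoring branches feeding a single decision. This sketch assumes both branches emit a risk in [0, 1]; the equal blend, the 0.3 review threshold and the function names are illustrative, not taken from the deck. When a post has no text (a picture or video), only the context branch applies.

```python
from typing import Optional


def combined_risk(context_score: float, content_score: Optional[float],
                  content_weight: float = 0.5) -> float:
    """Blend context (behavior) and content (text) risk; use context alone when there is no text."""
    if content_score is None:  # pictures, videos, bare links: nothing for text analysis to score
        return context_score
    return (1 - content_weight) * context_score + content_weight * content_score


def route(score: float, threshold: float = 0.3) -> str:
    """Send risky posts to a human moderator; publish the rest directly."""
    return "human moderation" if score >= threshold else "publish"


print(route(combined_risk(0.35, None)))  # human moderation
print(route(combined_risk(0.045, 0.1)))  # publish
```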

Page 19

Typically Available Data on Posts

• Historical and real-time data for:
  • User (e.g. userid, email, nationalid)
  • Location (e.g. Life & Style Fashion)
  • Time (e.g. 12 March 2011 18:56)
  • Content (e.g. text, link, picture, video)
  • Moderation result
• Other attributes, such as geography, age, or education, could also be used
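A post record holding the data listed above might look like the following. The field names are hypothetical; "location" is read as the site section where the post appears (as in the "Life & Style Fashion" example), and the moderation result stays empty for real-time posts that have not been reviewed yet. The helper derives the two simplest time attributes from the timestamp.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Post:
    user_id: str
    location: str                   # site section, e.g. "Life & Style Fashion"
    posted_at: datetime
    content_type: str               # "text", "link", "picture" or "video"
    content: str
    failed: Optional[bool] = None   # moderation result; None while not yet reviewed


def time_context(post: Post) -> dict:
    """Derive hour-of-day and day-of-week attributes from the posting time."""
    return {
        "hour_of_day": post.posted_at.hour,
        "day_of_week": post.posted_at.strftime("%A"),
    }


post = Post("user-42", "Life & Style Fashion",
            datetime.strptime("12 March 2011 18:56", "%d %B %Y %H:%M"),
            "text", "Love this season's colours!")
print(time_context(post))  # {'hour_of_day': 18, 'day_of_week': 'Saturday'}
```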

Page 20

Post context, Patterns, Data Mining
• User behavior
• Time behavior
• Location behavior

20 © 2012 Solid Quality Mentors

Page 21

Building useful attributes
• 1.- Thread (% Fails in a certain thread)
• 2.- User (% Fails per User)
• 3.- Diff Hour Forum Created (TimeDatePosted - TimeForumCreated)
• 4.- User Forum (% Fails in a certain forum)
• 5.- Diff Last for User (TimeDatePosted - TimeLastFailUser)
• 6.- Hour of the day
• 7.- Diff hour UserJoined-Now (TimeDatePosted - TimeUserJoined)
• 8.- User Thread (% Fails per User in a thread)
• 9.- Diff Hour Thread Created (TimeDatePosted - TimeThreadCreated)
• 10.- Day of Week
• More than 100 attributes in total.
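Several of these attributes can be computed from a post history. The sketch covers attributes 1, 2, 5, 8 and 9 over a hypothetical in-memory history of (user, thread, timestamp, failed) tuples; the function names and sample rows are illustrative. Only posts published before the post being scored are used, so the attributes never leak later moderation results.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical history rows: (user_id, thread_id, posted_at, failed_moderation)
HISTORY = [
    ("ana", "t1", datetime(2012, 1, 25, 20, 0), False),
    ("ana", "t1", datetime(2012, 1, 25, 20, 5), True),
    ("bob", "t1", datetime(2012, 1, 25, 20, 7), True),
    ("bob", "t2", datetime(2012, 1, 25, 21, 0), False),
]


def fail_pct(rows, key) -> dict:
    """Percentage of failed posts per group; key(row) selects the grouping."""
    totals, fails = defaultdict(int), defaultdict(int)
    for row in rows:
        totals[key(row)] += 1
        fails[key(row)] += int(row[3])
    return {k: 100.0 * fails[k] / totals[k] for k in totals}


def hours_between(later: datetime, earlier: datetime) -> float:
    return (later - earlier).total_seconds() / 3600


def attributes_for(user_id, thread_id, posted_at, thread_created, history) -> dict:
    past = [r for r in history if r[2] < posted_at]  # only what was known at posting time
    user_fail_times = [r[2] for r in past if r[0] == user_id and r[3]]
    by_user_thread = fail_pct(past, lambda r: (r[0], r[1]))
    return {
        "thread_fail_pct": fail_pct(past, lambda r: r[1]).get(thread_id, 0.0),   # 1
        "user_fail_pct": fail_pct(past, lambda r: r[0]).get(user_id, 0.0),       # 2
        "hours_since_last_user_fail": (hours_between(posted_at, max(user_fail_times))
                                       if user_fail_times else None),            # 5
        "user_thread_fail_pct": by_user_thread.get((user_id, thread_id), 0.0),   # 8
        "hours_since_thread_created": hours_between(posted_at, thread_created),  # 9
    }


print(attributes_for("ana", "t1", datetime(2012, 1, 25, 20, 30),
                     datetime(2012, 1, 25, 19, 0), HISTORY))
```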


Page 22

Hard Work
• Periods
• Algorithms
• Algorithms' parameters
• Model refreshing
• Attribute analysis
• Outliers
• Overpopulating
• Behavior after the system is in production
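Assuming "Overpopulating" refers to oversampling the rare class (failing posts were only about 10% of moderated posts in the case study), a minimal random-oversampling sketch follows. It is one common way to balance training data, not necessarily the one the team used; the function name is hypothetical. It should be applied to the training period only, so evaluation data keeps its real class mix.

```python
import random


def oversample_failures(rows, label_index=-1, seed=0):
    """Duplicate randomly chosen failing rows until both classes have the same size.

    rows: list of tuples whose element at label_index is True for a failed post.
    """
    rng = random.Random(seed)
    failing = [r for r in rows if r[label_index]]
    passing = [r for r in rows if not r[label_index]]
    if not failing or len(failing) >= len(passing):
        return list(rows)  # nothing to oversample, or already balanced
    extra = [rng.choice(failing) for _ in range(len(passing) - len(failing))]
    balanced = list(rows) + extra
    rng.shuffle(balanced)
    return balanced


# Hypothetical training set: 90 passing posts, 10 failing posts.
rows = [(i, False) for i in range(90)] + [(i, True) for i in range(10)]
balanced = oversample_failures(rows)
print(len(balanced), sum(1 for r in balanced if r[-1]))  # 180 90
```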


Page 23

Data Mining Algorithms

• Decision Trees / Linear Regression
• Sequence Analysis
• Neural Networks / Logistic Regression
• Clustering
• Text Mining (Words and Phrases)
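As an illustration of one algorithm family on the list, the sketch below fits a logistic regression by batch gradient descent on three context attributes. It is a from-scratch toy, not the tooling used in the project; the feature choice, learning rate and the six training rows are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=1.0, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(w, b, x) -> float:
    """Estimated probability that a post with attributes x fails moderation."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical rows: [posted_at_night, user_fail_rate, thread_fail_rate] -> failed (1) or not (0)
X = [[0, 0.0, 0.1], [0, 0.1, 0.0], [1, 0.6, 0.5], [1, 0.8, 0.7], [0, 0.7, 0.6], [1, 0.1, 0.1]]
y = [0, 0, 1, 1, 1, 0]
w, b = train_logistic(X, y)
print(predict_proba(w, b, [1, 0.7, 0.6]), predict_proba(w, b, [0, 0.05, 0.05]))
```

The output is a probability rather than a hard label, which is what a risk-ranked moderation queue needs.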


Page 24

Conclusion on Context

• Risk based on the context of the post
  • Time
  • User’s history
  • Publish location
• Enables risk analysis for all types of content
  • Comments (in any language)
  • Links
  • Pictures
  • Videos

Page 25

Logical Model: Post content

• Profanity Analysis
• Text Mining
  "The first minister and his secretary found sleeping together last night. They got drunk at a nearby pub."
• Sentiment Analysis
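The example sentence contains no profanity, yet it is presumably the kind of claim a moderator would want to review; that is why the slide pairs profanity analysis with text mining and sentiment analysis. A minimal blocklist check, with a hypothetical three-word list, lets it through without a single hit:

```python
import re

# Hypothetical blocklist; production lists are large, multilingual and curated.
PROFANITY = {"damn", "bloody", "crap"}


def profanity_hits(text: str) -> list[str]:
    """Return blocklisted words found in the text (case-insensitive, whole words)."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w in PROFANITY]


example = ("The first minister and his secretary found sleeping together last night. "
           "They got drunk at a nearby pub.")
print(profanity_hits(example))  # []
```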


Page 26

Page 27

Moderation, Data Mining System

Page 28

Page 29

Analysis and Reporting
• Published through an integrated web application
  • Moderation statistics
  • User statistics
  • News and story statistics
  • Peaks
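Assuming "Peaks" means unusually busy periods in comment volume, one simple way to flag them is to mark hours whose count exceeds the mean by k standard deviations. The threshold rule, k = 2, the function name and the sample counts are illustrative assumptions.

```python
from statistics import mean, pstdev


def find_peaks(hourly_counts: list[int], k: float = 2.0) -> list[int]:
    """Return indexes of hours whose count exceeds mean + k * population standard deviation."""
    mu = mean(hourly_counts)
    sigma = pstdev(hourly_counts)
    return [hour for hour, count in enumerate(hourly_counts) if count > mu + k * sigma]


# Hypothetical day: steady traffic with one match-night spike at 20:00.
counts = [100] * 20 + [1000] + [100] * 3
print(find_peaks(counts))  # [20]
```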


Page 30

Conclusion: Benefits

• By moderating half of all posts, the solution captures 90% of the failing posts. The remaining 10% appear to be posts that are likely safe.
• Using Intelligent Moderation, media companies can scan the whole universe of posts at a comparatively low cost.
• At peak times, Intelligent Moderation works perfectly.
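The 90% figure is a recall at a fixed review budget: rank posts by predicted risk, send the top half to moderators, and measure what share of all failing posts lands in that half. A minimal sketch with hypothetical scores and outcomes (`capture_rate` is an illustrative name):

```python
def capture_rate(scores: list[float], failed: list[bool], budget: float = 0.5) -> float:
    """Share of failing posts that fall within the top `budget` fraction by risk score."""
    n_review = int(len(scores) * budget)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    reviewed = set(ranked[:n_review])
    total_fail = sum(failed)
    caught = sum(1 for i in reviewed if failed[i])
    return caught / total_fail if total_fail else 0.0


# Hypothetical risk scores and moderation outcomes for ten posts.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
failed = [True, True, False, True, False, False, False, False, True, False]
print(capture_rate(scores, failed))  # 0.75
```

On the case-study data, the slide reports a value of about 0.9 for a budget of 0.5.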

Page 31

Football night in Europe

• On January 25th, 2012:
  • Liverpool defeated Manchester City in the Carling Cup
  • Barcelona defeated Real Madrid in the Copa del Rey
• More than 100,000 comments arrived at the various BBC sites over 10 hours
• All comments were filtered through our system
• No problems were observed during that time

Page 32

SolidQ Team in this project

• Project Managers
  • Francisco Gonzalez, Javier Torrenteras, Alejandro Leguizamo
• Developers
  • Itzik Ben-Gan, Enrique Puig, Ruben Pertusa, Carlos Martinez, Fernando G. Guerrero
• Technical reviewers
  • Mark Tabladillo, Dejan Sarka
• Social Media Specialists
  • Jose Quinto, Rocio Díaz

Page 33

SolidQ Research

• Incomplete Grammar Analysis
• Human interaction with IT systems
  • Collaboration
  • Contextual analysis
• Sentiment Analysis
  • Market Research
  • Reputation
• Data Mining of Social Context
  • Moderation
  • Market Research
  • Reputation

Page 34

Invisible computing…


… Driven by Social Data

Page 35

THANK YOU!


Fernando G. Guerrero Global CEO SolidQ [email protected]