Tackling toxic content
Mark Harwood (@elasticmark), developer, Elastic
November 29th 2017

Tackling toxic content - irsg.bcs.org/SearchSolutions/2017/presentations/Tackling toxic content.pdf

Business drivers for tackling toxic content

Toxic content:

• Fake news
• Hate speech
• Extremist videos

Sources of pressure:

• Advertisers - withdrawing ads
• Government - fines, legal restrictions
• Consumers - reputational damage, loss of audience
• Pressure groups - public shaming

How?

Two approaches:

• Proactive - root out content before it gathers an audience
• Reactive - respond to complaints from the audience

How do your staff determine what is “toxic”?

Whose opinions do you trust?

Proactive challenge

How do we determine what is toxic?

Content-based analysis is hard

• Parsing is hard - content is often binary, e.g. audio or video
• Limited metadata - lack of descriptions or keywords

It is easier to examine activity around content

• Reuse the basis of recommendation engines: people who liked X also liked Y
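
“People who liked X also liked Y” boils down to counting co-occurrences. The minimal Python sketch below assumes likes are held as a mapping from a user id to the set of items that user liked; the function name `also_liked` and the toy data are hypothetical. Raw counts like these tend to favour items that are popular with everyone, which is the problem the later “random sample” slides address.

```python
from collections import Counter


def also_liked(likes_by_user, item):
    """Count how many users who liked `item` also liked each other item.

    likes_by_user maps a user id to the set of items that user liked
    (an assumed representation, not a specific library's format).
    """
    counts = Counter()
    for liked in likes_by_user.values():
        if item in liked:
            # Every other item this user liked co-occurs with `item` once
            counts.update(liked - {item})
    return counts


# Hypothetical toy data: user id -> set of liked items
likes = {
    "u1": {"X", "Y", "Z"},
    "u2": {"X", "Y"},
    "u3": {"Y", "Z"},
    "u4": {"X", "Z"},
}

print(also_liked(likes, "X").most_common())
```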

Recommendations recap: MovieLens data

http://files.grouplens.org/datasets/movielens/ml-10m-README.html
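
The ML-10M README linked above describes `ratings.dat` lines of the form `UserID::MovieID::Rating::Timestamp`, with ratings on a five-star scale in half-star increments. As a sketch, assuming a rating of 4 or more counts as a “like” (a threshold chosen here for illustration, not taken from the talk), the file can be turned into a map from each user to the movies they liked. The function name `load_likes` and the sample lines are made up.

```python
from collections import defaultdict


def load_likes(lines, min_rating=4.0):
    """Build {user id: set of liked movie ids} from MovieLens-style lines.

    Each line is assumed to look like "UserID::MovieID::Rating::Timestamp".
    A rating >= min_rating (an assumed threshold) counts as a "like".
    Users with no liked movies are left out.
    """
    likes = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user_id, movie_id, rating, _timestamp = line.split("::")
        if float(rating) >= min_rating:
            likes[int(user_id)].add(int(movie_id))
    return dict(likes)


# Made-up sample lines in the ratings.dat layout
sample = """\
1::122::5::838985046
1::185::3::838983525
2::122::4.5::868245777
2::46970::4::868246191
"""

print(load_likes(sample.splitlines()))
```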

Random samples should hold no surprises

• 17% of all people like “Forrest Gump”
• In a random sample of people, 17% of them will also like “Forrest Gump”
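
A quick simulation makes the point concrete: if 17% of a made-up population likes an item, a uniformly random sample shows roughly the same rate. The population size, sample size and random seed are arbitrary choices for this sketch.

```python
import random


def like_rate(users, item):
    """Fraction of users whose set of liked items contains `item`."""
    return sum(item in liked for liked in users) / len(users)


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: each user likes "Forrest Gump" with probability 0.17
population = [
    {"Forrest Gump"} if rng.random() < 0.17 else set()
    for _ in range(100_000)
]

# A uniformly random sample of users, drawn without replacement
sample = rng.sample(population, 5_000)

print(f"population:    {like_rate(population, 'Forrest Gump'):.3f}")
print(f"random sample: {like_rate(sample, 'Forrest Gump'):.3f}")
```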

Dull. But in non-random samples, something interesting happens…

Non-random sample: people who liked “Talladega Nights”

Find all people who liked movie #46970, then summarise how their movie tastes differ from everyone else:

• <0.5% of all people like “Anchorman”
• In the set of “Talladega-likers”, 20% of them like “Anchorman”

…a huge uplift in popularity from the norm!
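
The comparison on this slide is between a foreground set (the Talladega-likers) and a background (all users). Elasticsearch's `significant_terms` aggregation performs this kind of foreground/background comparison, and its default scoring heuristic, JLH, multiplies the absolute change in popularity by the relative change. The plain-Python sketch below reimplements that formula for illustration only; it is not Elasticsearch's code, and the counts are invented so that the rates match the slides (using 0.5% for the “<0.5%” figure).

```python
def jlh(fg_count, fg_size, bg_count, bg_size):
    """JLH-style significance of one item.

    fg_count / fg_size: share of the foreground users who like the item.
    bg_count / bg_size: share of all users (the background) who like it.
    Score = (fg% - bg%) * (fg% / bg%); items no more popular in the
    foreground than in the background score 0.
    """
    if fg_size == 0 or bg_size == 0 or bg_count == 0:
        return 0.0
    fg_rate = fg_count / fg_size
    bg_rate = bg_count / bg_size
    absolute_change = fg_rate - bg_rate
    if absolute_change <= 0:
        return 0.0
    relative_change = fg_rate / bg_rate
    return absolute_change * relative_change


# Invented counts: 10,000 users in total, 200 of whom liked the foreground movie
print(jlh(34, 200, 1_700, 10_000))  # 17% vs 17%: no uplift, prints 0.0
print(jlh(40, 200, 50, 10_000))     # 20% vs 0.5%: roughly 0.195 * 40 = 7.8
```

An item liked at the same rate everywhere, like “Forrest Gump” in a random sample, scores zero; the rare-but-overrepresented item gets the high score.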

Proactive demo

Reactive challenge

Whose opinions do we trust?

Allow end users to report toxic content

BUT - some user reports, like some content, can be questionable

Review fraud is a thing

• Positive reviews - “shill” or “sock puppet” accounts are used to artificially inflate the reputation of sellers in a marketplace.
• Negative reviews - fake accounts or mob-rallying are used to sabotage the reputation of an innocent party.
• Tell-tale signs of collusion might include:
  • A common IP address or user agent
  • A common “hit list” of items being flagged
  • A common phrase used in feedback
  • The same time of day when logging requests
  • The same site join date
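
The slide lists the signals but not how to detect them. One simple approach, sketched below under assumed field names (`reporter`, `item`, `ip`, `phrase`, `hour`, `join_date`) and an arbitrary threshold, is to count how many distinct reporters share each value, including each reporter's whole “hit list” of flagged items, and flag values shared by too many of them. User agent is omitted only to keep the example short; the function name and data are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-report attributes that might betray collusion
SIGNALS = ("ip", "phrase", "hour", "join_date")


def collusion_signals(reports, min_reporters=3):
    """Return {(signal, value): sorted reporter ids} for values shared by
    at least `min_reporters` distinct reporters."""
    reporters_by_value = defaultdict(set)
    hit_lists = defaultdict(set)
    for r in reports:
        hit_lists[r["reporter"]].add(r["item"])
        for signal in SIGNALS:
            reporters_by_value[(signal, r[signal])].add(r["reporter"])
    # A reporter's whole set of flagged items acts as one more signal
    for reporter, items in hit_lists.items():
        reporters_by_value[("hit_list", frozenset(items))].add(reporter)
    return {
        key: sorted(ids)
        for key, ids in reporters_by_value.items()
        if len(ids) >= min_reporters
    }


# Made-up reports: a, b and c act in concert, d is an ordinary user
reports = [
    {"reporter": who, "item": item, "ip": "10.0.0.7", "phrase": "total scam",
     "hour": 3, "join_date": "2017-11-01"}
    for who in ("a", "b", "c")
    for item in ("p1", "p2")
] + [
    {"reporter": "d", "item": "p1", "ip": "192.0.2.10", "phrase": "misleading",
     "hour": 14, "join_date": "2015-06-12"},
]

for (signal, value), ids in collusion_signals(reports).items():
    print(signal, value, ids)
```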

Components of a fraud detection stack

Ingest, Linking, Risk-scoring and Investigation lead to Outcomes:

• Ingest - cleansing, enriching, normalisation
• Linking - entity resolution, filtering
• Risk-scoring - graph exploration, anomaly detection, scoring
• Investigation - task lists, case management, visualisation
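
The four stages can be wired together as a chain of functions. The sketch below is deliberately tiny, and every function, field name and threshold in it is hypothetical: ingest cleans and normalises records, linking groups accounts by a shared IP address (a crude stand-in for entity resolution), risk-scoring counts accounts per IP, and investigation turns high scores into a review list.

```python
from collections import defaultdict


def ingest(raw_events):
    """Cleansing and normalisation: trim fields, lowercase account names,
    and drop records that cannot be linked."""
    clean = []
    for event in raw_events:
        account = event.get("account", "").strip().lower()
        ip = event.get("ip", "").strip()
        if account and ip:
            clean.append({"account": account, "ip": ip})
    return clean


def link(events):
    """Linking: group accounts that share an IP address."""
    accounts_by_ip = defaultdict(set)
    for event in events:
        accounts_by_ip[event["ip"]].add(event["account"])
    return accounts_by_ip


def score(accounts_by_ip):
    """Risk-scoring: here simply the number of distinct accounts per IP."""
    return {ip: len(accounts) for ip, accounts in accounts_by_ip.items()}


def investigate(scores, threshold=3):
    """Investigation: a task list of IPs to review, highest score first."""
    flagged = [ip for ip, s in scores.items() if s >= threshold]
    return sorted(flagged, key=lambda ip: scores[ip], reverse=True)


raw = [
    {"account": " Alice ", "ip": "10.0.0.7"},
    {"account": "bob", "ip": "10.0.0.7"},
    {"account": "carol", "ip": "10.0.0.7 "},
    {"account": "dave", "ip": "192.0.2.10"},
    {"account": "", "ip": "192.0.2.11"},  # unusable record, filtered at ingest
]

print(investigate(score(link(ingest(raw)))))  # prints ['10.0.0.7']
```

Keeping each stage a separate function mirrors the stack: a real system could swap in proper entity resolution or a richer score without touching the other stages.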

Bad actors make strange shapes

It is hard for identity manipulators to avoid reusing resources (IP addresses, join dates, subject lists, phrases, time). Fraudsters generate too many “coincidences”.

Use the Graph API to gather related data, then raise alerts on anomalies.

See example: http://bit.ly/es_fraud
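
The “too many coincidences” idea can be illustrated without Elasticsearch. The sketch below is plain Python, not the Graph API: it treats each account's resources (IP address, join date, phrase, hour of activity) as typed values, counts how many each pair of accounts shares, and raises an alert when a pair shares at least an assumed minimum. Account names, values and the threshold are all made up.

```python
from collections import Counter, defaultdict
from itertools import combinations


def coincidences(resources_by_account):
    """Count, for every pair of accounts, how many resources they share."""
    accounts_by_resource = defaultdict(set)
    for account, resources in resources_by_account.items():
        for resource in resources:
            accounts_by_resource[resource].add(account)
    pair_counts = Counter()
    for accounts in accounts_by_resource.values():
        # Every pair of accounts using this resource gets one more coincidence
        for pair in combinations(sorted(accounts), 2):
            pair_counts[pair] += 1
    return pair_counts


def alerts(resources_by_account, min_shared=3):
    """Account pairs sharing at least `min_shared` resources (an assumed threshold)."""
    counts = coincidences(resources_by_account)
    return sorted(pair for pair, n in counts.items() if n >= min_shared)


# Made-up accounts described by typed resources
accounts = {
    "a": {("ip", "10.0.0.7"), ("joined", "2017-11-01"), ("phrase", "total scam"), ("hour", 3)},
    "b": {("ip", "10.0.0.7"), ("joined", "2017-11-01"), ("phrase", "total scam"), ("hour", 3)},
    "c": {("ip", "192.0.2.10"), ("joined", "2015-06-12"), ("phrase", "misleading"), ("hour", 3)},
}

print(alerts(accounts))  # prints [('a', 'b')]
```

Here all three accounts are active at hour 3, which on its own proves little; it is the pile-up of four shared resources between `a` and `b` that trips the alert.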

Responding to alerts

Kibana with the Graph plugin allows investigators to examine details behind alerts.

See example: http://bit.ly/es_fraud

Demo