Suspicious News Detection Using Micro Blog Text

Tsubasa Tagami (1), Hiroki Ouchi (2,1), Hiroki Asano (1,2), Kazuaki Hanawa (1), Kaori Uchiyama (1), Kaito Suzuki (1), Kentaro Inui (1,2), Atsushi Komiya (3), Atsuo Fujimura (3), Hitofumi Yanai (4), Ryo Yamashita (5), Akinori Machino (6)

(1) Graduate School of Information Sciences, Tohoku University, Japan
(2) RIKEN
(3) SmartNews, Inc.
(4) FactCheck Initiative Japan
(5) Watchdog for Accuracy in News-reporting, Japan
(6) Hi-Ether Japan

PACLIC2018

Outline

- Proposed a new task: suspicious news detection using micro blog text
- This task aims to detect suspicious news articles that need to be fact-checked
- Developed human-machine hybrid fact-checking
- Applied it to a real-world situation, the Okinawa governor election, and detected 21 Fake News


[Figure: posts such as the following are used to predict suspicious articles, which are then passed to a fact-checker.]

http://www.news1~I suspect it is fake news. Read WSJ...

http://www.news2~This is completely misinformation ...

The Post-Truth Era

- “Fake News” is considered to be a significant problem
  - Researchers reported that fake news on social media influenced US election voters [Bovet+, 2018]
  - Fake News led a young man to murder nine people at a historic African-American church in Charleston
  - A drama featuring Fake News was produced in Japan by the national broadcasting company


https://www.nhk.or.jp/dodra/fakenews/

https://www.eurweb.com/2018/01/trump-reveals-winners-controversial-fake-news-awards/

https://theundefeated.com/features/how-fake-news-led-to-dylann-roof-to-murder-nine-people/

What is Fake News

- Definition
  - News articles that are intentionally false and could mislead readers [Shu et al., 2017]
- Problematic Issue
  - The spreading of Fake News has a negative impact on our society and the news industry

[Figure: Fake News can negatively affect an election and cause a conflict.]

Difficulty of Fact-Checking

- Fact-checking is a time-consuming task; it sometimes takes a whole day to research and write an article
- Fact-checkers cannot keep up with the amount of misinformation generated every day
- Human fact-checking is an intellectually demanding and laborious process


Narrowing down the number of articles that require human fact-checking is necessary

Difficulty of Narrowing Down Articles

- Simply filtering with specific keywords such as ‘misinformation’ and ‘fake’ cannot find such articles efficiently
  - Some posts merely state a personal impression of the article
  - In other posts, the target of the remark is not the content of the news


http://www.news1~I really can not believe it. I wish it were a misinformation. I’m lost for words, but I’ll send my prayers!

http://www.news1~Does anybody feel she is trying to talk around false teeth because of all those implants in her cheeks and chin?
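The limitation above can be illustrated with a minimal sketch. The keyword list and the `keyword_filter` helper are assumptions made for illustration, not the filter used in this work; the two posts are the examples shown above.

```python
# Hypothetical keyword list: the slides name only 'misinformation' and 'fake';
# 'false' is added here purely for illustration.
KEYWORDS = ("misinformation", "fake", "false")


def keyword_filter(post: str) -> bool:
    """Return True if the post contains any keyword (case-insensitive)."""
    text = post.lower()
    return any(keyword in text for keyword in KEYWORDS)


posts = [
    # Personal impression: the author is shocked by the news, not doubting it.
    "I really can not believe it. I wish it were a misinformation. "
    "I'm lost for words, but I'll send my prayers!",
    # The remark targets a person's appearance, not the content of the news.
    "Does anybody feel she is trying to talk around false teeth because of "
    "all those implants in her cheeks and chin?",
]

flagged = [keyword_filter(post) for post in posts]
print(flagged)
```

Both posts are flagged even though neither casts suspicion on the content of the article, which is the kind of false positive that keyword filtering alone cannot avoid.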

Our Goal

- Automating suspicious news detection using posts on SNS that cast suspicion on news articles

1. Collect posts on SNS and store them in a database
2. Predict whether each article is suspicious or not using the posts

Definitions of Terms

- Suspicion casting posts (SCP)
  - Posts on SNS that refer to and cast suspicion on certain news articles
- Suspicious articles (SA)
  - News articles to be verified by a human fact-checker
  - We define SA as news articles mentioned by at least one SCP

[Figure: a citizen writes a suspicion casting post (SCP) about a news article; the article becomes a suspicious article (SA), which is passed to a fact-checker for fact-checking.]

http://www.news.~I suspect it is fake news. Read WSJ article ‒ says ...

Proposed Task

- Propose and formalize two tasks
- Task 1: Suspicion Casting Post Detection
  - Input: a post on SNS that refers to a news article
  - Output: a judgement of whether the post is an SCP or merely states a personal impression of the article
- Task 2: Suspicious Article Detection
  - Given a set of posts that refer to the same article, judge whether the set includes an SCP or not

Examples for Task 1:

http://www.news1.~This article denotes misinformation, doesn’t it?
  -> Suspicion casting post (SCP)

http://www.news2.~I really can not believe it. I wish it were a lie. I’ll send my prayers!
  -> Not suspicion casting post
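The relation between the two tasks can be sketched as follows: an article is judged suspicious when at least one post referring to it is judged to be an SCP. The function names, the article keys `news1` and `news2`, and the toy rule standing in for a post-level classifier are all hypothetical and serve only to make the sketch runnable.

```python
from typing import Callable, Dict, Iterable, List


def detect_suspicious_article(
    posts: Iterable[str], is_scp: Callable[[str], bool]
) -> bool:
    """Task 2: True if at least one post referring to the article is an SCP."""
    return any(is_scp(post) for post in posts)


def toy_is_scp(post: str) -> bool:
    """Stand-in for a Task 1 classifier (hypothetical rule, illustration only)."""
    return "denotes misinformation" in post.lower()


# Posts grouped by the article they refer to (keys are placeholders).
posts_by_article: Dict[str, List[str]] = {
    "news1": ["This article denotes misinformation, doesn't it?"],
    "news2": [
        "I really can not believe it. I wish it were a lie. "
        "I'll send my prayers!"
    ],
}

results = {
    article: detect_suspicious_article(posts, toy_is_scp)
    for article, posts in posts_by_article.items()
}
print(results)
```

In practice the stand-in rule would be replaced by a trained post-level model; the aggregation with `any` mirrors the definition of a suspicious article as one mentioned by at least one SCP.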

Dataset

- Created two datasets for our tasks
- Dataset 1: Suspicion Casting Post Dataset
  1. Collected the posts on SNS including the URL of articles and specific keywords, such as misinformation and fake
  2. Removed noise such as the article title, URL, mentions and hashtags from the posts
  3. Annotated each collected post with 1 if the post casts suspicion and -1 otherwise

Preprocessing example:

Before: http:www.news.~ #pleaserepost This article is completely misinformation because …

After: This article is completely misinformation because …
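The noise-removal step can be sketched with regular expressions. The patterns and the `preprocess` helper below are assumptions for illustration and are not taken from the authors' code; in particular, they assume hashtags and mentions are separated from the body text by whitespace.

```python
import re

# Illustrative patterns (assumptions, not the authors' actual rules).
URL_RE = re.compile(r"https?:\S+")   # also matches scheme-only forms like http:www...
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def preprocess(post: str, article_title: str = "") -> str:
    """Remove the article title, URLs, mentions and hashtags from a post."""
    if article_title:
        post = post.replace(article_title, " ")
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE):
        post = pattern.sub(" ", post)
    # Collapse the whitespace left behind by the removals.
    return " ".join(post.split())


raw = "http:www.news.~ #pleaserepost This article is completely misinformation because …"
cleaned = preprocess(raw)
print(cleaned)
```

Passing `article_title` removes a quoted headline as well, for example `preprocess("Some Headline is wrong", article_title="Some Headline")`.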


Dataset

- We created two datasets for our tasks
- Dataset 2: Suspicious Article Dataset
  1. Collected a set of posts that refer to the same news article and preprocessed these posts similarly
  2. Annotated the set with 1 if the posts referring to the same article include at least one SCP and 0 otherwise

Annotation example (posts referring to the same article):

This is completely false …
  -> Suspicion casting post (SCP)

This fiscal policy is wrong …

  -> The article is annotated as a suspicious article

Dataset

- Statistics of datasets
- Dataset 1: Suspicion Casting Post Dataset
  - Number of samples: 7,775 posts (pos: 1,036 / neg: 6,739)
  - Average length of posts: 56.6 characters
- Dataset 2: Suspicious Article Dataset
  - Number of samples: 1,836 articles (pos: 564 / neg: 1,272)
  - Average length of posts: 60.4 characters
  - Average number of posts per article: 2.75
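As a quick check of the figures above, the totals and the share of positive samples can be recomputed from the reported counts. The dictionary layout is only an illustration.

```python
# Counts as reported in the dataset statistics.
datasets = {
    "Suspicion Casting Post Dataset": {"pos": 1036, "neg": 6739},
    "Suspicious Article Dataset": {"pos": 564, "neg": 1272},
}

totals = {}
positive_rates = {}
for name, counts in datasets.items():
    totals[name] = counts["pos"] + counts["neg"]
    positive_rates[name] = counts["pos"] / totals[name]
    print(f"{name}: {totals[name]} samples, {positive_rates[name]:.1%} positive")
```

The totals match the reported 7,775 posts and 1,836 articles, and the positive class makes up roughly 13.3% of the posts and 30.7% of the articles.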



Experiment Setup

- Models
  - Logistic Regression (LR)
  - SVM
  - Decision Tree (DT)
  - Random Forest (RF)
  - LSTM
- Settings
  - Word embeddings: 300 dimensions (learned from 4.5M tweets)
  - Vocabulary size: 80K
- Evaluation
  - Precision, Recall, Micro-F1, Recall@K (SA detection only)
  - Stratified 5-fold cross validation
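Stratified cross validation keeps the class ratio roughly constant across folds, which matters when positives are a minority. The following is a minimal pure-Python sketch of one way to build such folds; the `stratified_folds` helper is hypothetical and is not the authors' tooling (libraries such as scikit-learn offer `StratifiedKFold` for the same purpose).

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence


def stratified_folds(
    labels: Sequence[Hashable], k: int = 5, seed: int = 0
) -> List[List[int]]:
    """Split sample indices into k folds with near-equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a similar share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Toy data: 10 positive and 40 negative samples.
labels = [1] * 10 + [0] * 40
folds = stratified_folds(labels, k=5)
for fold in folds:
    positives = sum(labels[i] for i in fold)
    print(len(fold), positives)
```

Each fold serves once as the test set while the remaining four are used for training; in this toy example every fold holds 10 samples, 2 of them positive.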


Results

- Results for SCP detection
  - Overall, the LR, SVM and LSTM models yielded higher Micro-F1 scores than the DT and RF models
- Results for SA detection
  - Similarly, the LR, SVM and LSTM models achieved higher scores than the other two models

Error Analysis

- Analyzed the posts that were judged incorrectly by all models
  - It is difficult for the basic models to properly capture sentence-level meanings, since the models mainly used word-level features
- Two types of errors were observed:
  - Answer: SCP, Prediction: not SCP
  - Answer: not SCP, Prediction: SCP


http://www.news1~At last, the news source has got clear... I wished it had been misinformation

http://www.news1~The description that a part ... is not wrong, but since the level of ~ , this title can mislead readers.

Results

- Recall@K curve of the SA detection task
  - Most of the models achieved 80% recall at the top 40% of the ranked articles
  - We can collect 80% of the suspicious articles by checking only the top 40% of the ranked articles
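Recall@K can be computed as sketched below, assuming K is the percentage of top-ranked articles that are inspected, as the statements above suggest. The `recall_at_k` helper, the scores, and the labels are made up for illustration.

```python
import math
from typing import Sequence


def recall_at_k(
    scores: Sequence[float], labels: Sequence[int], k_percent: float
) -> float:
    """Fraction of all positives found in the top k_percent of ranked items."""
    total_positives = sum(labels)
    if total_positives == 0:
        return 0.0
    # Rank articles from most to least suspicious according to the model.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    cutoff = math.ceil(len(ranked) * k_percent / 100)
    found = sum(label for _, label in ranked[:cutoff])
    return found / total_positives


# Toy example: 10 articles, 4 of which are truly suspicious (label 1).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(recall_at_k(scores, labels, 40))
```

In this toy example, checking the top 40% of the ranking (4 articles) finds 3 of the 4 suspicious ones, a recall of 0.75.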

Application

- Created an application to support manual fact-checking, named Fact-checking console

[Figure: the Fact-checking console, with labels marking suspicious articles and suspicion casting posts.]


Fact-check Project

- Our project used the Fact-checking console for the Okinawa governor election held in Sep. 2018 (2018.9.1~10.3)
  - http://fij.info/project/okinawa2018
- 6 media organizations and 26 volunteers participated in this project as fact-checkers

Fact-check Project Outline

1. Collect posts on SNS and store them in a database
2. Predict whether each article is suspicious or not using the posts
3. Fact-checkers check the suspicious news articles using the Fact-checking console

Example of detected Fake News

- Some media reported that a famous female singer, Namie Amuro, supported a candidate, Denny Tamaki

Suspicion casting post (SCP):

"A misinformation as if Namie Amuro is supporting Denny Tamaki is spreading."

[Figure: Namie Amuro and Denny Tamaki, with the report marked FAKE.]

Conclusion

- Summary
  - Formalized and tackled a task, suspicious news detection using microblog text
  - Applied our system to fact-checking activities in a real-world situation and succeeded in detecting fake news
- Future work
  - Create more sophisticated models for suspicious news detection to further develop the system
  - Evaluate the difference between fact-checking with and without our application
  - Consider using information from the news articles themselves for prediction

