Suspicious News Detection Using Micro Blog Text
Tsubasa Tagami (1), Hiroki Ouchi (2,1), Hiroki Asano (1,2), Kazuaki Hanawa (1), Kaori Uchiyama (1), Kaito Suzuki (1), Kentaro Inui (1,2), Atsushi Komiya (3), Atsuo Fujimura (3), Hitofumi Yanai (4), Ryo Yamashita (5), Akinori Machino (6)
(1) Graduate School of Information Sciences, Tohoku University, Japan
(2) RIKEN
(3) SmartNews, Inc.
(4) FactCheck Initiative Japan
(5) Watchdog for Accuracy in News-reporting, Japan
(6) Hi-Ether Japan
PACLIC2018
■ Fake News
  □ News articles that are intentionally false and could mislead readers [Shu et al., 2017]
■ Problematic Issue
  □ The spreading of fake news has a negative impact on our society and the news industry
[Figure: fake news can negatively affect an election or cause a conflict]
Difficulty of Fact-Checking
■ Fact-checking is a time-consuming task; it sometimes takes a whole day to research and write an article
■ Fact-checkers cannot keep up with the amount of misinformation generated every day
■ Human fact-checking is an intellectually demanding and laborious process
Narrowing down the number of articles that require human fact-checking is necessary
Difficulty of Narrowing Down Articles
■ Simply filtering with specific keywords such as ‘misinformation’ and ‘fake’ cannot find these articles efficiently
  □ Some posts just state a personal impression of the article
  □ The target of the mention is not the content of the news
http://www.news1~ I really can not believe it. I wish it were a misinformation. I’m lost for words, but I’ll send my prayers!
http://www.news1~ Does anybody feel she is trying to talk around false teeth because of all those implants in her cheeks and chin?
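The limitation of keyword filtering can be illustrated with a minimal Python sketch. The function name `keyword_filter` and the third post are hypothetical; the first two posts are taken from the slides. The filter keeps both the personal-impression post and the post that actually casts suspicion, so it cannot tell them apart.

```python
# The two keywords named on the slide.
KEYWORDS = ("misinformation", "fake")


def keyword_filter(posts, keywords=KEYWORDS):
    """Return the posts that contain at least one keyword (case-insensitive)."""
    return [p for p in posts if any(k in p.lower() for k in keywords)]


posts = [
    # From the slides: a personal impression, not a claim about the article.
    "I really can not believe it. I wish it were a misinformation. "
    "I'm lost for words, but I'll send my prayers!",
    # From the slides: a post that casts suspicion on the article.
    "I suspect it is fake news.",
    # Hypothetical post without any keyword.
    "Thanks for sharing this article.",
]

# Both keyword-bearing posts pass the filter, whether or not they cast suspicion.
for post in keyword_filter(posts):
    print(post)
```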
Our Goal
■ Automating suspicious news detection using posts on SNS that cast suspicion on news articles
  1. Collect posts on SNS into a database
  2. Predict whether articles are suspicious or not using the posts
Definitions of Terms
■ Suspicion casting posts (SCP)
  □ Posts on SNS that refer to and cast suspicion on certain news articles
■ Suspicious articles (SA)
  □ News articles to be verified by human fact-checkers
  □ We define SAs as news articles mentioned by at least one SCP
[Figure: a citizen posts "http://www.news.~ I suspect it is fake news. Read WSJ article ‒ says ..." about a news article. The post is a suspicion casting post (SCP); the article is a suspicious article (SA), which a fact-checker then fact-checks.]
Proposed Task
■ Propose and formalize two tasks
■ Suspicion Casting Post Detection
  □ Given a post on SNS that refers to a news article, judge whether it is an SCP or just mentions a personal impression of the article
■ Suspicious Article Detection
  □ Given a set of posts that refer to the same article, judge whether the set includes an SCP or not
[Figure: example judgements, labeled "Suspicion casting post (SCP)" and "Not suspicion casting post"; the post shown is "http://www.news2.~ I really can not believe it. I wish it were a lie. I‘ll send my prayers!"]
Dataset
■ Created two datasets for our tasks
■ Suspicion Casting Post Dataset
  1. Collected posts on SNS that include the URL of an article and specific keywords, such as "misinformation" and "fake"
  2. Removed noise such as article titles, URLs, mentions and hashtags from the posts
  3. Annotated each collected post with 1 if the post casts suspicion and 0 otherwise
[Figure: example for dataset 1. The post "http:www.news.~ #pleaserepost This article is completely misinformation because …" is preprocessed into "This article is completely misinformation because …" and labeled as a suspicion casting post (SCP).]
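The noise-removal step could look roughly like the following Python sketch. The slides do not show the actual implementation, so the regular expressions, the `preprocess` name, the `title` parameter and the example URL are all assumptions made for illustration.

```python
import re

# Assumed patterns; the slides only say that URLs, mentions and hashtags are removed.
URL_RE = re.compile(r"https?:\S+")   # also matches the slide's "http:www..." form
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def preprocess(post, title=None):
    """Strip the article title (if given), URLs, mentions and hashtags from a post."""
    text = post
    if title:
        text = text.replace(title, " ")
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE):
        text = pattern.sub(" ", text)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())


# Hypothetical raw post modeled on the slide's example.
raw = "http://news.example/a1 #pleaserepost This article is completely misinformation because ..."
print(preprocess(raw))
```

In this sketch the article title is removed by exact string match, which assumes the title text is available for each collected post.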
Dataset
■ We created two datasets for our tasks
■ Suspicious Article Dataset
  1. Collected sets of posts that refer to the same news article and preprocessed these posts in the same way
  2. Annotated each set of posts with 1 if it includes at least one SCP and 0 otherwise
[Figure: example for dataset 2. Posts such as "This is completely false …" and "This fiscal policy is wrong …" that refer to one article are annotated; labels shown: suspicion casting post (SCP), suspicious article.]
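The article-level labeling rule (1 if the set of posts includes at least one SCP, 0 otherwise) can be sketched in Python as below. This is only an illustration of the rule as stated on the slide: the function name, the tuple layout, the example URLs and the per-post labels are hypothetical, and the slides do not say whether the article labels were derived automatically or annotated by hand.

```python
from collections import defaultdict


def build_article_dataset(labeled_posts):
    """Group posts by article and label each article.

    labeled_posts: iterable of (article_url, post_text, scp_label),
    where scp_label is 1 for an SCP and 0 otherwise.
    Returns {article_url: (list_of_post_texts, article_label)}.
    """
    groups = defaultdict(list)
    for url, text, label in labeled_posts:
        groups[url].append((text, label))

    dataset = {}
    for url, items in groups.items():
        texts = [text for text, _ in items]
        # An article is suspicious if at least one of its posts is an SCP.
        article_label = 1 if any(label == 1 for _, label in items) else 0
        dataset[url] = (texts, article_label)
    return dataset


# Hypothetical URLs and per-post labels, for illustration only.
example = build_article_dataset([
    ("http://news.example/a", "This is completely false", 1),
    ("http://news.example/a", "This fiscal policy is wrong", 0),
    ("http://news.example/b", "Sad news", 0),
])
print(example)
```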
Dataset
■ Statistics of datasets
■ Suspicion Casting Post Dataset
  □ Number of samples: 7,775 posts (pos: 1,036 / neg: 6,739)
  □ Average length of posts: 56.6 characters
■ Suspicious Article Dataset
  □ Number of samples: 1,836 articles (pos: 564 / neg: 1,272)
  □ Average length of posts: 60.4 characters
  □ Average number of posts per article: 2.75
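The class balance implied by these counts can be checked with a few lines of Python. The counts are taken from the slide; the dictionary layout and the `positive_rate` helper are just illustrative names.

```python
# Counts as reported on the slide.
scp_dataset = {"pos": 1036, "neg": 6739, "total": 7775}
sa_dataset = {"pos": 564, "neg": 1272, "total": 1836}


def positive_rate(stats):
    """Fraction of positive samples; also checks that pos + neg equals the total."""
    assert stats["pos"] + stats["neg"] == stats["total"]
    return stats["pos"] / stats["total"]


print(f"SCP dataset positive rate: {positive_rate(scp_dataset):.1%}")  # 13.3%
print(f"SA dataset positive rate:  {positive_rate(sa_dataset):.1%}")   # 30.7%
```

Both datasets are imbalanced toward the negative class, which is consistent with the use of stratified cross-validation in the evaluation.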
Experiments Setup
■ Models
  □ Logistic Regression (LR)
  □ SVM
  □ Decision Tree (DT)
  □ Random Forest (RF)
  □ LSTM
■ Settings
  □ Word embeddings: 300 dimensions (learned from 4.5M tweets)
  □ Vocabulary size: 80K
■ Evaluation
  □ Precision, Recall, Micro-F1, Recall@K (SA detection only)
  □ Stratified 5-fold cross-validation
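As an illustration of stratified 5-fold cross-validation, the following pure-Python sketch splits sample indices so that each fold keeps roughly the overall class ratio. It is not the authors' code: the `stratified_kfold` function, the round-robin assignment and the fixed seed are assumptions, and in practice a library routine would normally be used instead.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k stratified folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the samples of each class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(i for f in range(k) if f != held_out for i in folds[f])
        yield train, test


# Toy labels: 10 positives and 40 negatives.
labels = [1] * 10 + [0] * 40
for train, test in stratified_kfold(labels, k=5):
    positives = sum(labels[i] for i in test)
    print(len(train), len(test), positives)
```

With the toy labels above, every test fold contains 2 positives and 8 negatives, mirroring the 1:4 ratio of the full set.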
Results
■ Results for SCP detection
  □ Overall, the LR, SVM and LSTM models yielded higher Micro-F1 scores than the DT and RF models
■ Results for SA detection
  □ Similarly, the LR, SVM and LSTM models achieved higher scores than the other two models
Error Analysis
■ Analyzed posts that were judged incorrectly by all models
  □ It is difficult for the basic models to properly capture sentence-level meanings, since the models mainly use word-level features
    • Answer: SCP, Prediction: not SCP
    • Answer: not SCP, Prediction: SCP
http://www.news1~ At last, the news source has got clear... I wished it had been misinformation
http://www.news1~ The description that a part ... is not wrong, but since the level of ~ , this title can mislead readers.
Results
■ Recall@K curve of SA detection task
  □ Most of the models achieved 80% recall within the top 40% of ranked articles
  □ We can collect 80% of the suspicious articles by checking only the top 40% of ranked articles
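A minimal Python sketch of how Recall@K can be computed is given below, assuming K is expressed as a percentage of the ranked articles (as the "top 40%" wording suggests) and that each article has a model score where higher means more suspicious. The function name, the toy scores and the toy labels are hypothetical.

```python
def recall_at_k(scores, labels, k_percent):
    """Share of all suspicious articles found in the top k_percent of the ranking.

    scores: model scores, higher means more suspicious.
    labels: 1 for a suspicious article, 0 otherwise.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_n = int(len(order) * k_percent / 100)
    found = sum(labels[i] for i in order[:top_n])
    total = sum(labels)
    return found / total if total else 0.0


# Toy data: 10 articles, 4 of them suspicious.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(recall_at_k(scores, labels, 40))  # 0.75: 3 of the 4 suspicious articles are in the top 4
```

Read this way, a Recall@40% of 0.8 means a fact-checker who reviews only the top 40% of the ranking would see 80% of the suspicious articles.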
Application
■ Created an application, named Fact-checking Console, to support manual fact-checking
[Figure: Fact-checking Console, with suspicious articles and suspicion casting posts labeled]
Fact-check Project
■ Our project used Fact-checking Console at the