Unsupervised Content-Based Identification of Fake News Articles with Tensor Decomposition Ensembles

Seyedmehdi Hosseinimotlagh, University of California, Riverside

shoss007@ucr.edu

Evangelos E. Papalexakis, University of California, Riverside

epapalex@cs.ucr.edu

ABSTRACT
Social media provide a platform for quick and seamless access to information. However, the propagation of false information, especially during the last year, raises major concerns, especially given the fact that social media are the primary source of information for a large percentage of the population. False information may manipulate people's beliefs and have real-life consequences. Therefore, one major challenge is to automatically identify false information by categorizing it into different types, and to notify users about the credibility of different articles shared online.

Existing approaches primarily focus on feature generation and selection from various sources, including corpus-related features. However, so far, prior work has not paid considerable attention to the following question: how can we accurately distinguish different categories of false news, solely based on the content?

In this paper, we work on answering this question. In particular, we propose a tensor modeling of the problem, where we capture latent relations between articles and terms, as well as spatial/contextual relations between terms, towards unlocking the full potential of the content. Furthermore, we propose an ensemble method which judiciously combines and consolidates results from different tensor decompositions into clean, coherent, and high-accuracy groups of articles that belong to different categories of false news. We extensively evaluate our proposed method on real data, for which we have labels, and demonstrate that the proposed algorithm was able to identify all different false news categories within the corpus, with average homogeneity per group of up to 80%.

KEYWORDS
Tensor decomposition, Ensemble method, Fake news

1 INTRODUCTION
The advent of social media services has facilitated producing, sharing, and searching information at an unprecedented level. For instance, over 65% of the US adult population has access to Facebook and shares information on a daily basis.¹ Social media allow people to express their feelings, state their opinions on news, and relay information from the media to their audience. Therefore, social media play a profound role in influencing the social, economic, and political domains of everyday decision making. For instance, during the 2016 US presidential election, candidates effectively used Twitter to send their messages and express their opinions directly to their supporters. Hillary Clinton's Twitter account, for example, has reached 16 million followers, with her most popular tweet receiving more than 600,000 re-tweets² and one million likes.

¹ http://www.pewinternet.org/2016/11/11/social-media-update-2016/

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
MIS2, Marina Del Rey, CA, USA. © 2018 Copyright held by the owner/author(s).

Aside from all merits, the proliferation of false information raises major concerns. In the 2013 World Economic Forum, "the rapid spread of misinformation online" was ranked among the top ten trends for the year 2014 [11]. False information that is deliberately produced to manipulate readers is called disinformation; it serves different purposes, for example stating a biased belief or tempting the audience to click on a link. On the other hand, misinformation is the unintentional relay of incorrect facts to an audience who may perceive the delivered information as true.

There exist several online fact-checking applications, like Hoaxy [30], which gather credibility scores of news from several online fact-checking websites such as Snopes.com, PolitiFact.com, and FactCheck.org. Recent research shows that fake news spreads fast through social media networks (for instance, Hurricane Sandy in 2012 [4] and the Boston Marathon blasts in 2013 [31]), and this can adversely amplify public anxiety. For instance, in 2013, $130 billion in "stock value was wiped out" abruptly after a rumor of Barack Obama being injured by an explosion at the White House [26]. Thus, early identification of fake news, before it finds its way into online fact-checking repositories, potentially using solely its content, is imperative.

A popular line of work focuses on analyzing user connectivity in order to understand how information is diffused on a social network. There exist several proposed models, such as the independent cascade and linear threshold models [17], SIR [35], and tipping models [5], most of which seek to learn a parameterized probability of a user being infected as a function of their friends and their behavior.

In addition to network-based analyses, fake news identification can be done based on the content of a particular article. For example, Facebook recently proposed a number of tips³ for spotting fake news, such as being skeptical of all-CAPS headlines and unusual formatting. Aside from using merely the metadata of a news article, such as its title or potential hashtags shared along with it, the very text of the article can reveal crucial information. In this paper, we focus on content-based unsupervised fake news identification using factorization methods.

² https://www.weforum.org/agenda/2016/08/hillary-clinton-or-donald-trump-winning-on-twitter
³ https://techcrunch.com/2017/04/06/facebook-puts-link-to-10-tips-for-spotting-false-news-atop-feed

Most unsupervised clustering methods in text retrieval, based on Non-negative Matrix Factorization (NMF) [16, 34, 20, 32, 18, 12], Singular Value Decomposition (SVD) [7], and Independent Component Analysis (ICA) [19], focus on the frequencies of terms (words) in documents. To improve the results, tf-idf (term frequency-inverse document frequency) has been widely proposed as a term-weighting scheme to enhance the effectiveness of words. The tf-idf value increases proportionally to the number of times a word appears in a document, and decreases for terms that appear frequently across the corpus in general. Although the frequencies of terms are considered in these algorithms, the affinity of terms is ignored, which causes irrelevant news articles that share the same top keywords to be assigned to the same latent factor.
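To make the tf-idf weighting concrete, the following is a minimal sketch in plain Python. It assumes raw term counts for the term frequency and log(N / df) for the inverse document frequency, which is only one of several common variants; the function name `tf_idf` and the toy documents are illustrative and do not come from the paper.

```python
import math

def tf_idf(docs):
    """Return one {term: weight} dict per document, using raw counts and log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        w = {}
        for term in doc:
            # Adding the idf once per occurrence equals tf * idf.
            w[term] = w.get(term, 0.0) + math.log(n_docs / df[term])
        weights.append(w)
    return weights

# Toy corpus (hypothetical): "news" occurs in every document, so its weight is zero.
docs = [["obama", "obama", "news"], ["news", "hoax"]]
weights = tf_idf(docs)
```

Note that this weighting still treats each document as a bag of words: two articles with the same counts receive the same weights regardless of where the terms appear.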

The aforementioned matrix techniques are able to capture correlations and similarities between different documents; however, they are not able to fully leverage the context of a document, since they are restricted to unigrams (or even n-grams, which still do not fully capture context). Context has been shown to be extremely useful in designing effective word vector representations, a prime example being the Word2Vec model [22].

In this paper, we take this analysis a step further and explicitly model the context of words within a document by capturing the spatial vicinity of each word. In particular, we model the corpus as a third-order tensor which simultaneously models article and term relations, as well as spatial/contextual relations between terms. Exploiting both aspects of the corpus, and in particular the spatial relations between words, is a determining factor for identifying coherent groups of articles that fall under different types of false news. To the best of our knowledge, this work is the first to simultaneously model spatial relations between terms and latent relations between terms and articles in the context of fake news identification. Our contributions are:

• Exploiting higher-order context information: We model the corpus as a three-mode (article, term, term) tensor which captures spatial term relations and article-term relations. We subsequently use the CP/PARAFAC decomposition [14] to identify latent groups of articles that fall under coherent categories of false news.

• Introducing an ensemble method: We design an ensemble method which leverages multiple decompositions of the (article, term, term) tensor to further refine the latent groups discovered by the decomposition and produce a clean categorization of articles, which is more accurate than the state of the art. In particular, our proposed ensemble technique 1) discovers classes with higher homogeneity, 2) is able to identify "hard" categories of false news (where baselines fail), and 3) produces results with lower outlier diversity.

• Evaluation on real-world data: We apply our proposed algorithm to the Kaggle fake news dataset [28], which contains several features, including the body of each news article and labels that are used only for evaluation. Our results show that the proposed algorithm finds most fake news categories with homogeneity of up to 80% overall, reduces the diversity of outliers to 25, and explores all categories with high homogeneity, whereas other algorithms are incapable of discovering all categories.

Indicatively, our proposed method was able to produce latent groups with 65% coherence overall and more than 80% for most categories, whereas NMF/SVD reach up to 50% on average, over the top 30 news articles of each factor.

The outline of the paper is as follows. In Section 2, we describe the preliminaries and notation used in the paper. In Section 3, we define the problem and describe our proposed two-tier algorithm. We give the results of running numerous experiments with two different unsupervised clustering algorithms in Section 4. In Section 5, we provide a brief literature survey on false news. Finally, in Section 6, we describe our conclusions about the usefulness of our proposed method and how the two algorithms could be used together to detect and analyze false news categories.

2 PRELIMINARIES AND NOTATIONS
Before explaining our proposed algorithm, we provide a few necessary definitions and describe our notation for tensor decomposition and co-clustering, which we use throughout this paper.

2.1 CP/PARAFAC Tensor Decomposition
Tensors are denoted by boldface underlined capital letters ($\underline{\mathbf{X}}$), matrices by boldface capital letters ($\mathbf{X}$), and vectors by boldface lowercase letters ($\mathbf{x}$). Furthermore, $\underline{\mathbf{X}}_i$ denotes the $i$-th horizontal slice of $\underline{\mathbf{X}}$. The CP/PARAFAC decomposition [14] of a 3-way tensor $\underline{\mathbf{X}}$ is written as

$$\underline{\mathbf{X}} \approx [\mathbf{A}, \mathbf{B}, \mathbf{C}] = \sum_{r=1}^{R} \lambda_r \, \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r,$$

where the symbol $\circ$ denotes the outer product and $\mathbf{a}_r$ (likewise $\mathbf{b}_r$ and $\mathbf{c}_r$) denotes the normalized $r$-th column (called a component) of the non-negative factor matrix $\mathbf{A}$ (likewise $\mathbf{B}$ and $\mathbf{C}$). The rank $R$ of the CP/PARAFAC decomposition is the minimum number of components whose outer products reconstruct the original tensor. However, in most cases a low-rank decomposition is preferred, because it is able to effectively capture hidden patterns and similarities in the data. In this paper, we employ the CP/PARAFAC decomposition with alternating optimization under a Poisson model of the data, which has been shown to be well suited to sparse count data [6]; hence $\mathbf{A}, \mathbf{B}, \mathbf{C} \in [0, 1]^{I_n \times R}$ and $\|\mathbf{a}_r\| = 1$ (likewise for $\mathbf{b}_r$ and $\mathbf{c}_r$).
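As an illustration of the CP/PARAFAC model in this section, the sketch below rebuilds a small 3-way tensor from given factors, entry by entry, as the weighted sum of R outer products. It is not a fitting procedure, and the function name `cp_reconstruct` and the toy factors are assumptions made for this example. The columns are normalized to sum to one, which is one possible reading of the unit-norm condition.

```python
def cp_reconstruct(lam, A, B, C):
    """Return X with X[i][j][k] = sum_r lam[r] * A[i][r] * B[j][r] * C[k][r]."""
    R = len(lam)
    return [[[sum(lam[r] * A[i][r] * B[j][r] * C[k][r] for r in range(R))
              for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]

# Toy rank-2 model (hypothetical values): 2 articles x 2 terms x 2 terms.
lam = [2.0, 1.0]                  # component weights lambda_r
A = [[1.0, 0.0], [0.0, 1.0]]      # article loadings, one column per component
B = [[0.5, 0.0], [0.5, 1.0]]      # term loadings (second mode)
C = [[0.5, 0.0], [0.5, 1.0]]      # term loadings (third mode)
X = cp_reconstruct(lam, A, B, C)
```

In this toy model the first article loads only on the first component and the second article only on the second, so each horizontal slice of `X` is a scaled outer product of the corresponding term columns.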

2.2 Soft Co-Clustering
Co-clustering searches for subsets of similar rows and columns in the data matrix, with similarity measured in the least-squares sense. In the Sparse Matrix Regression (SMR) algorithm of [23], the authors introduce a soft matrix co-clustering method in which co-clusters may overlap. Since co-clustering decomposes both rows and columns, it is called a bilinear decomposition:

$$\mathbf{X}_{N \times M} \approx [\mathbf{R}_{N \times K}, \mathbf{C}_{M \times K}] = \sum_{i=1}^{K} \mathbf{r}_i \mathbf{c}_i^T,$$

where $\mathbf{r}_i$ and $\mathbf{c}_i$ are the latent factors of the rows and columns, respectively, and $K$ is the rank of the co-clustering. The authors of [23] further impose $L_1$-norm regularization to promote sparsity in the latent factors. Therefore, the loss function is

$$\|\mathbf{X} - \mathbf{R}\mathbf{C}^T\|_F^2 + \lambda \sum_{i,k} |\mathbf{R}(i,k)| + \lambda \sum_{j,k} |\mathbf{C}(j,k)|.$$
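The loss above can be evaluated directly for given factors. The sketch below does so in plain Python, assuming the fit term is the squared Frobenius norm (in line with the least-squares description) and that the same λ penalizes both factors. The function name `smr_loss` and the toy matrices are illustrative; the code only evaluates the objective and does not perform the optimization of [23].

```python
def smr_loss(X, R, C, lam):
    """Squared Frobenius error of X - R C^T plus L1 penalties on R and C."""
    N, M, K = len(X), len(X[0]), len(R[0])
    fit = 0.0
    for i in range(N):
        for j in range(M):
            approx = sum(R[i][k] * C[j][k] for k in range(K))  # (R C^T)[i][j]
            fit += (X[i][j] - approx) ** 2
    l1_R = sum(abs(v) for row in R for v in row)
    l1_C = sum(abs(v) for row in C for v in row)
    return fit + lam * l1_R + lam * l1_C

# Toy example (hypothetical): R C^T reproduces X exactly, so only the penalty remains.
X = [[1.0, 0.0], [0.0, 2.0]]
R = [[1.0, 0.0], [0.0, 1.0]]
C = [[1.0, 0.0], [0.0, 2.0]]
loss = smr_loss(X, R, C, lam=0.1)
```

Larger values of λ make non-zero entries in R and C more expensive, which is what pushes the factors towards sparse, interpretable co-clusters.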

3 PROBLEM DEFINITION AND PROPOSED METHOD

In this section, we explain our proposed two-tier method. First, we explain how to extract spatial relations between terms, and then we discuss the co-clustering that groups related documents.

3.1 Problem description
Given a corpus of fake news articles $d = \{n_1, n_2, n_3, \dots\}$ of size $N$, where each document $\vec{n}_i$ is a vector of terms from a dictionary $\Sigma = \{t_1, t_2, t_3, \dots\}$ of size $T = |\Sigma|$, the problem is to cluster the documents, based on their terms, into classes that are homogeneous with respect to fake news categories. To this end, we first cluster documents based on the positions at which each term appears in an article and its correlations with other terms (spatial relation extraction), and then apply an automatic ensemble co-clustering that clusters documents according to their positions in different factors across various low-rank decompositions.

3.2 Tier-1: Spatial relation extraction
In this section, we propose a method to decompose the documents based on the spatial relations between their terms. The intuition behind this stage is that a subset of terms that appear in the vicinity of one another conveys a meaning. In this paper, we call the affinities between terms in a document spatial relations between terms, and we propose a tensor constructed from these relations. A low-rank tensor decomposition then mines the most important relations, each of which represents a category of false news.

In most previous content-based decomposition algorithms, a document-term matrix (for example, $\mathbf{A}_{T \times N}$) is constructed such that $\mathbf{A}(i, j)$ represents the number of occurrences of the $i$-th term of the dictionary in document $j$. Documents with similar word frequencies have similar coefficients in each factor of the decomposed matrix. Therefore, selecting the top values of each factor of the decomposed document matrix yields a coherent group of documents.

In contrast to previous research, in which only the existence and frequency of words in documents matter, we also take the spatial relations of words into account. To this end, we propose a tensor of size $N \times T \times T$, where each horizontal slice $\underline{\mathbf{X}}_i$ represents the spatial relations in one document. For a horizontal slice $\mathbf{S}$, $\mathbf{S}(i, j)$ is the number of times that the $i$-th term and the $j$-th term of the dictionary appear in the vicinity of each other. Therefore, each slice of the tensor contains the co-occurrences of terms within $\delta$, the size of the affinity window.
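The tensor construction can be sketched as follows. This is a minimal illustration, assuming that two terms co-occur when they are at most δ token positions apart, that each slice is kept symmetric, and that out-of-dictionary tokens are dropped before windowing; the paper does not spell out these details. The names `cooccurrence_slice` and `build_tensor` are hypothetical, and each slice is stored sparsely as a dictionary of (i, j) counts.

```python
from collections import Counter

def cooccurrence_slice(tokens, vocab, delta):
    """Count how often term i and term j occur within delta positions in one document."""
    index = {term: i for i, term in enumerate(vocab)}
    ids = [index[t] for t in tokens if t in index]   # drop out-of-dictionary tokens
    counts = Counter()
    for p, i in enumerate(ids):
        # Look ahead at most delta positions; symmetry covers the look-behind case.
        for q in range(p + 1, min(p + delta + 1, len(ids))):
            j = ids[q]
            counts[(i, j)] += 1
            if i != j:
                counts[(j, i)] += 1
    return counts

def build_tensor(docs, vocab, delta=10):
    """One sparse T x T slice per document, i.e. an N x T x T tensor."""
    return [cooccurrence_slice(doc, vocab, delta) for doc in docs]

# Toy corpus (hypothetical) with a window of one position.
vocab = ["fake", "news", "spreads", "fast"]
docs = [["fake", "news", "spreads", "fast"], ["news", "fake"]]
tensor = build_tensor(docs, vocab, delta=1)
```

With `delta=1` only adjacent terms are counted, so "fake" and "spreads" do not co-occur in the first document; a larger window would link them.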

The problem is to extract densely correlated sets of documents and words, which usually manifest in the data as dense blocks. To this end, we use the CP/PARAFAC decomposition. Because the tensor is very sparse, we compute a non-negative CP/PARAFAC with alternating Poisson regression (CP_APR), which uses the Kullback-Leibler divergence [6].
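For reference, the Poisson-based fitting criterion can be written out explicitly. The following is a sketch of the objective as it is usually stated for alternating Poisson regression, where $\underline{\mathbf{M}}$ denotes the low-rank model tensor and $m_{ijk}$, $x_{ijk}$ are the entries of the model and the data; the symbol $\underline{\mathbf{M}}$ is introduced here for illustration and does not appear in the original text.

```latex
\min_{\underline{\mathbf{M}} \ge 0} \; f(\underline{\mathbf{M}})
  = \sum_{i,j,k} \bigl( m_{ijk} - x_{ijk} \log m_{ijk} \bigr),
\qquad
\underline{\mathbf{M}} = \sum_{r=1}^{R} \lambda_r \, \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r .
```

Up to terms that do not depend on the model, minimizing this function is equivalent to minimizing the (generalized) Kullback-Leibler divergence between the data tensor and the model, which suits sparse count data better than a least-squares fit.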

3.3 Tier-2: Tensor Ensemble Co-clustering
The second tier of our proposed method is a clustering algorithm for categorizing news articles based on their cluster membership across a set of different tensor decompositions. Intuitively, the proposed clustering seeks to find a subset of news articles that frequently cluster together in different configurations of the tensor decomposition of the first tier (where "configuration" refers to the rank of the decomposition).

The intuition behind our proposed ensemble method is that news articles that frequently appear close to each other across different rank configurations, while having the same ranking within their latent factors, are more likely to ultimately belong to the same category. The ranking of a news article with respect to a latent factor is derived by simply sorting the coefficients of that latent factor, which correspond to the clustering membership of each news article in the factor.

Before we proceed with our proposed method, we describe a number of different scenarios that can arise for two different news articles in our results, both when dealing with a single decomposition and when dealing with an ensemble of configurations.

3.3.1 Scenario 1: Within factors of CP/PARAFAC. Suppose that two news articles $n_1$ and $n_2$ belong to the same category and, moreover, that their content is very similar. One can then expect that, after a tensor decomposition, they appear near the top of some columns, close to each other (see Fig. 1a).

3.3.2 Scenario 2: Different configurations. If $n_1$ and $n_2$ are close to each other most of the time across different configurations of the low-rank decomposition, one can expect that they belong to the same category ($n_1$ appears near $n_2$ in the configurations of both Fig. 1a and Fig. 1b).

3.3.3 Scenario 3: Different columns. On the contrary, if $n_1$ and $n_2$ appear at the top of different columns (see Fig. 1c) but in other factors they are usually near each other, it is more than likely that they are outliers in those columns.

3.3.4 Scenario 4: Transitivity. If there exists another news article $n_3$ such that $n_1$ and $n_3$ appear nearby in some configurations and $n_1$ and $n_2$ appear nearby in some other configurations, it is likely that $n_1$, $n_2$, and $n_3$ belong to the same category (see Fig. 1d).

In order to extract similar news articles from the ensemble of tensor decompositions with different configurations, we apply co-clustering. In particular, we propose to combine the clustering results of each individual tensor decomposition into a collective (news-article by latent-factor) matrix, from which we extract co-clusters of news articles and the corresponding latent factors (coming from our ensemble of decompositions). News articles within the same co-cluster tend to appear frequently (with high membership value) in the same latent tensor decomposition factor.

When constructing the (news-article by latent-factor) matrix, we have to decide whether a particular news article belongs to one of the $r$ factors of each decomposition. We thus use the value of the factor corresponding to each article and assign the article accordingly to one of the $r$ factors. To that end, we divide each factor into several partitions based on the Empirical Cumulative Distribution Function (ECDF) of its values, so that we do not create arbitrary partitions or make arbitrary assignment decisions. After computing the partitions, each column is reordered as shown in Figure 2, so that all news articles that are assigned to the same latent factor (i.e., latent cluster) are located in contiguous positions in the vector.

Figure 1: Examples of different conditions for news articles: a) two news articles close to each other in a factor; b) two news articles close to each other in a factor of a different configuration; c) two news articles with the same label appear as outliers of different factors; d) transitive relation between news articles.

Figure 2: Overview of our proposed algorithm, demonstrating the construction of the S matrix out of an ensemble of tensor decompositions and the extraction of high-quality clusters of fake news articles. Next to each decomposition of the ensemble, we also plot the homogeneity (with respect to the actual labels) of each factor, and we demonstrate how the homogeneity improves by combining the decomposition ensembles via co-clustering.

After constructing the matrix, we co-cluster its rows (corresponding to news articles) and columns (corresponding to latent tensor factors). The result is a set of news articles that frequently appear in the same latent factor partition (computed by the ECDF of each factor) with similar factor-membership values. Intuitively, the proposed ensemble method decreases the likelihood of outliers, i.e., news articles which, by mere noise, would appear with a very high value in one of the decompositions but not in the entire ensemble.
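One way to picture the ECDF-based partitioning and the resulting (news-article by latent-factor) matrix is sketched below. It assumes cut points at the 65th, 80th, and 90th percentiles of the non-zero values with a separate bin for zeros (the values reported in Section 4.2), and it assumes a one-hot encoding of the partitions, which is a guess at the layout rather than the authors' exact construction. The names `ecdf_partition` and `ensemble_matrix` are hypothetical, and the co-clustering step itself (SMR) is not shown.

```python
from bisect import bisect_right

def ecdf_partition(values, cuts=(0.65, 0.80, 0.90)):
    """Label each entry: 0 for zeros, otherwise 1 + number of cut points its ECDF value exceeds."""
    nonzero = sorted(v for v in values if v > 0)
    labels = []
    for v in values:
        if v <= 0:
            labels.append(0)
            continue
        ecdf = bisect_right(nonzero, v) / len(nonzero)  # share of non-zero entries <= v
        labels.append(1 + sum(ecdf > c for c in cuts))
    return labels

def ensemble_matrix(factor_columns, cuts=(0.65, 0.80, 0.90)):
    """One-hot encode the partition label of every article in every factor, side by side."""
    n_bins = len(cuts) + 2                    # zero bin plus len(cuts) + 1 value bins
    rows = [[] for _ in factor_columns[0]]
    for column in factor_columns:
        for row, label in zip(rows, ecdf_partition(column, cuts)):
            row.extend(1 if b == label else 0 for b in range(n_bins))
    return rows

# Toy factor (hypothetical): article loadings taken from one decomposition.
col = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
labels = ecdf_partition(col)
S = ensemble_matrix([col, col[::-1]])   # two factors, e.g. from two rank configurations
```

Articles that keep landing in the same high-percentile partitions across many factors end up with similar rows of `S`, which is the pattern a co-clustering of rows and columns can then pick up.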

4 EXPERIMENTAL EVALUATION
In this section, we evaluate our proposed method on a real dataset. We use the fake news dataset provided by Kaggle [28], which contains the text and metadata of over 12,000 articles. Table 1 describes the categories of fake news, as labeled by BS Detector [8]. We discard the fake category because it has few instances, and the bs category because its articles are missing labels. We implement our proposed algorithms in Matlab. For tensor decomposition and SMR co-clustering, we use the Tensor Toolbox library [3] and the publicly available implementation in [9], respectively. We only keep news articles that contain more than 100 words after stemming and removing stop words. To balance the dataset, we randomly select an equal number of instances for each category, which results in 75 instances per category. The default kernel size used to construct the tensor is 10, unless explicitly stated otherwise. We repeat the experiments a number of times on differently shuffled instances and report the median results.

Table 1: Fake news categories (labels from BS Detector)

Class name [short form]   Description
[Satire]                  Sources that provide humorous commentary on current events in the form of fake news
Extreme [Bias]            Sources that traffic in political propaganda and gross distortions of fact
[Conspiracy] Theory       Sources that are well-known promoters of kooky conspiracy theories
[Junk Sci]ence            Sources that promote pseudoscience, metaphysics, naturalistic fallacies, and other scientifically dubious claims
[Hate] Group              Sources that actively promote racism, misogyny, homophobia, and other forms of discrimination
[State] News              Sources in repressive states operating under government sanction

4.1 Evaluation of Tier-1
First, we show our results for the first tier of our approach, evaluating it from different aspects. More specifically, we analyze 1) the homogeneity and 2) the variety of outliers for different algorithms, and 3) investigate the sensitivity to the word kernel.

4.1.1 Homogeneity. Our primary objective is to identify high-quality clusters of fake news in an unsupervised manner. To this end, we report the homogeneity of the factors for different decomposition algorithms, which indicates how pure each latent cluster is. Figure 3a depicts this metric for different thresholds of top news articles. We observe that the homogeneity of CP/PARAFAC falls slightly and levels off at almost 56% on average when selecting the top 30 news articles of each factor, while that of the other decomposition methods drops significantly.

4.1.2 Outlier variety. Another important evaluation metric that we measure in this paper is the variety of outliers. As shown in Fig. 3b, although CP/PARAFAC reduces the number of outliers substantially, the outliers belong to different news categories. During our experiments, we noticed that the number of outliers increases when the number of top news articles is increased to 25 or more. Overall, comparing the homogeneity results with the variety of outliers indicates that, even though homogeneity is less sensitive to an increase in the number of news articles per cluster, the number of outliers increases substantially, warranting the second tier of our proposed method.
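The two metrics of Sections 4.1.1 and 4.1.2 can be sketched as follows. The definitions here are assumptions, since the text does not give formulas: homogeneity is taken to be the share of the top-k articles of a factor that carry the majority label, and outlier variety the number of distinct non-majority labels among them. All function names and the toy data are illustrative.

```python
from collections import Counter

def top_k_labels(factor, labels, k):
    """Labels of the k articles with the largest coefficients in one latent factor."""
    order = sorted(range(len(factor)), key=lambda i: factor[i], reverse=True)
    return [labels[i] for i in order[:k]]

def homogeneity(top_labels):
    """Share of the selected articles that carry the majority label."""
    majority_count = Counter(top_labels).most_common(1)[0][1]
    return majority_count / len(top_labels)

def outlier_variety(top_labels):
    """Number of distinct labels other than the majority label."""
    return len(set(top_labels)) - 1

# Toy factor (hypothetical): six articles with their loadings and true categories.
factor = [0.9, 0.8, 0.1, 0.7, 0.6, 0.05]
labels = ["state", "state", "bias", "state", "hate", "satire"]
top = top_k_labels(factor, labels, k=4)
```

In this toy case the top four articles contain three "state" articles and one "hate" article, so the factor is 75% homogeneous with a single type of outlier; raising k to 6 would lower the homogeneity and bring in more outlier types.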

4.1.3 Categories identified. The third dimension on which we evaluate our proposed method is its ability to discover as many categories of fake news as possible. As seen in Fig. 3d-f, bias, hate, and junksci are relatively easy for all of the decomposition algorithms. In contrast to other algorithms, our proposed CP/PARAFAC-based method is able to classify the other, harder categories with high homogeneity. More specifically, most other algorithms fail to even identify the state category, while our proposed method identifies this category with 80% accuracy. Therefore, our proposed one-tier algorithm outperforms other algorithms in both overall homogeneity and the ability to find all categories.

Figure 3: 1-tier decomposition algorithms: a) homogeneity; b) diversity of outliers; c) kernel sensitivity; d-f) average homogeneity in different categories. Panel titles: (a) homogeneity, (b) outlier diversity, (c) kernel sensitivity, (d) bias, (e) hate, (f) conspiracy, (g) satire, (h) state, (i) junksci.

4.1.4 Spatial kernel sensitivity. We investigate the sensitivity of the proposed CP/PARAFAC to the size of the spatial kernel $\delta$ for the terms. As seen previously, considering the spatial relations when constructing the corresponding tensor leads to improvements in the homogeneity of the factors. However, one may ask which kernel size results in better homogeneity, or how sensitive the proposed decomposition is to the kernel size. Figure 3c illustrates the homogeneity of CP/PARAFAC with respect to kernel size. We observe that our method is not particularly sensitive to the exact choice of the kernel size $\delta$.

4.2 Evaluation of Tier-2
In this section, we show that better homogeneity can be obtained by keeping track of the relations between news articles. We will show that the proposed method boosts homogeneity while decreasing the variety of outliers. Furthermore, we show the homogeneity of each false news category individually. In this paper, we partition the factors for SMR at the 90th, 80th, and 65th percentiles of their ECDF values, after excluding zeros. We then add one column for each corresponding CP/PARAFAC factor to hold the zero values.

4.2.1 Homogeneity. In this section, SMR is employed after applying the Tier-1 decomposition method. Fig. 4a depicts the results of the proposed algorithm in comparison with others. Our proposed algorithm outperforms the other algorithms significantly. One merit of employing the proposed SMR on top of the Tier-1 decomposition is that it stabilizes homogeneity: we observe that there is less variation in homogeneity when selecting more top news articles from each factor. Furthermore, the comparison with Tier-1 CP/PARAFAC shows a significant improvement of the proposed CP/PARAFAC-SMR in the purity of some factors. Later, we will discuss the homogeneity of each news category separately.

Figure 4: The results of the proposed Tier-2 decomposition with $\delta = 10$: a) homogeneity of the ensemble co-clustering is almost 13% higher than that of 1-tier CP/PARAFAC; b) diversity of outliers is reduced by one on average compared to 1-tier algorithms; c) sensitivity to different SMR ranks; d-f) k-means on the constructed ensemble matrix of CP/PARAFAC: state, satire, junksci, and bias are clustered with low homogeneity; g-l) average homogeneity in different categories: all categories are clustered with an average of 90% homogeneity. Panel titles recovered from the figure: (a) homogeneity, (b) outlier diversity, (c) rank sensitivity, (g) satire, (h) state, (i) junksci.

4.2.2 Outlier variety. Fig. 4b illustrates the variety of outliers obtained with the proposed algorithm. As one can see, not only the number of outliers but also the variety of outliers is reduced significantly. In some factors, there exists only one type of outlier. Comparing the proposed method with Tier-1 CP/PARAFAC reveals that outlier variety is reduced by more than one on average. This indicates that the proposed algorithm effectively suppresses outliers.

Table 2 lists, for each category, the percentage of other categories that appear as its outliers. Conspiracy is the only outlier of the state category and the dominant outlier of bias. Moreover, hate and satire news appear to be similar, although the hate class has a high homogeneity.

4.2.3 Categories identified. We investigate the homogeneity and outliers of the different news categories individually. As seen in Fig. 4g-l, all of the categories are clustered successfully. Although the results seem similar to those of Tier-1 CP/PARAFAC, each news category improves because irrelevant news articles are eliminated. That is, comparing the variety of outliers at this stage with that of 1-tier CP/PARAFAC shows that relevant news articles take the place of articles from categories that have no other member in the vicinity.

Table 2: Outliers of each category

Category     Outliers (in descending order), with percentage of outliers
Satire       Conspiracy (< 50%), JunkSci (< 50%)
Bias         Conspiracy (< 70%), State (< 20%), JunkSci (< 10%)
Conspiracy   Bias (< 50%), State (< 30%), Hate (< 10%)
JunkSci      Satire (< 50%), Conspiracy (< 50%)
Hate         Satire (< 95%), Bias (< 5%)
State        Conspiracy (< 99%)

4.2.4 Sensitivity to decomposition rank. Fig. 4c shows the sensitivity of SMR after employing CP/PARAFAC. As one can see, the homogeneity of the proposed method is robust with respect to different decomposition rank configurations of SMR. We observed that although the homogeneity of classes at rank 8 is slightly higher than that of other ranks, it finds fewer kinds of categories than the others.

5 RELATED WORK
Supervised learning approaches have been widely used to detect false information. In [10], a logistic regression classifier is proposed to detect mismatches between the headline and the content of articles. Gupta et al. [13] proposed a classifier that estimates the credibility of tweets using various features, such as the number of words, URLs, hashtags, and emojis, and the presence of swear words and pronouns. Horne et al. [15] proposed an SVM algorithm on selected stylistic, complexity, and psychological features of both the title and body of articles to classify real, fake, and satire news based on their content. Qazvinian et al. [25] proposed a Bayes classifier on content-based, network-based, and microblog-specific features to detect rumors from the content of tweets. In [27], the Klatsch framework is proposed, with AdaBoost and SVM classifiers on topological, crowdsourced, and content-based (hashtags, mentions, URLs, and phrases with sentiment extraction) features, to detect political misinformation at an early stage on Twitter. In [36], the authors extracted several statistical features from tweets, such as tweet length, the average number of hashtags, and the entropy ratio of the word frequency distribution, to train a decision tree model that ranks the likelihood of tweets. In all of the aforementioned research, extracted features are substituted for the content.

NMF-based algorithms have been widely used to cluster documents. Kuang et al. [20] proposed a 2NMF to cluster hierarchical documents based on their content. The NMF feature extraction of documents in [32] suffers from the need to adjust the number of keywords for categories. In [16], the authors proposed an anchor-free topic identification method based on the word-word co-occurrence matrix. However, the location of terms and their affinity relations in documents have not been studied in this line of research.

Some research takes early labeled data into account to estimate the credibility of newly emerging data. In [33], the authors proposed a framework that clusters data into different rumor categories, selects features, and trains classifiers that can detect emerging rumors using previously labeled rumors. Sampson et al. [29] proposed to use implicit linking on hashtags and web addresses to help classification on a set of content-based features [21] within the detection deadline. In this paper, we focus our research on the relations between news articles in order to cluster them into categories, instead of utilizing previously labeled information.

In [22], the authors proposed several techniques for learning distributed representations of words, such as the Continuous Bag-of-Words (CBOW) model and the Continuous Skip-gram model, to train high-dimensional word vectors. In contrast, we employ the co-occurrence of terms within a sliding window for each document, and accordingly construct a tensor of documents. Unlike [22], we propose a spatial CP/PARAFAC that considers the relations of terms among documents and the representation of words within a document, without further training for classification.

6 CONCLUSION
In this paper, we set out to cluster false news into different categories. We proposed a tensor-based scheme which effectively leverages term context by capturing spatial relations between terms for each article. We further introduced an ensemble method which is able to consolidate and refine the results of multiple tensor decompositions into a single, high-quality, and high-coherence set of article clusters, which achieves higher coherence than state-of-the-art baselines and is able to identify all different categories of fake news within our dataset.

7 ACKNOWLEDGEMENTS
Research was supported by an Adobe Data Science Research Faculty Award. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding parties.

REFERENCES[1] Fake news challenge dataset Available online hpwwwfakenewschallenge

org FEB 2017[2] Hunt Allco and Mahew Gentzkow Social media and fake news in the 2016

election Technical report National Bureau of Economic Research 2017[3] Bre W Bader Tamara G Kolda et al Matlab tensor toolbox version 26 Avail-

able online February 2015[4] Farida Vis Burgess Jean and Axel Bruns Hurricane sandy the most tweeted

pictures the guardian data blog Available online hpwwwguardiancouknewsdatabloggallery2012nov06hurricane-sandy-tweeted-pictures Novem-ber 2012

[5] Damon Centola. The spread of behavior in an online social network experiment. Science, 329(5996):1194–1197, 2010.

[6] Eric C. Chi and Tamara G. Kolda. On tensors, sparsity, and nonnegative factorizations. SIAM Journal on Matrix Analysis and Applications, 33(4):1272–1299, 2012.

[7] Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391, 1990.

[8] BS Detector. Fake news classifications. http://bsdetector.tech, 2017.

[9] N. D. Sidiropoulos, E. E. Papalexakis, and M. N. Garofalakis. Smr library. Available online: http://www.models.life.ku.dk/cocluster, February 2015.

[10] William Ferreira and Andreas Vlachos. Emergent: a novel data-set for stance classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. ACL, 2016.

[11] The Guardian. To tackle the spread of misinformation online we must first understand it. Available online: https://www.theguardian.com/commentisfree/2014/apr/24/tackle-spread-misinformation-online, 2014.

[12] Nicolas Gillis. Robustness analysis of Hottopixx, a linear programming model for factoring nonnegative matrices. SIAM Journal on Matrix Analysis and Applications, 34(3):1189–1212, 2013.

[13] Aditi Gupta and Ponnurangam Kumaraguru. Credibility ranking of tweets during high impact events. In Proceedings of the 1st Workshop on Privacy and Security in Online Social Media, PSOSM '12, pages 2:2–2:8, New York, NY, USA, 2012. ACM.

[14] Richard A. Harshman. Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-modal factor analysis. UCLA Working Papers in Phonetics, 1970. Available at http://publish.uwo.ca/~harshman/wpppfac0.pdf.

[15] Benjamin Horne and Sibel Adali. This just in: Fake news packs a lot in title, uses simpler, repetitive content in text body, more similar to satire than real news. In 11th International AAAI Conference on Web and Social Media, 2017.

[16] Kejun Huang, Xiao Fu, and Nikolaos D. Sidiropoulos. Anchor-free correlated topic modeling: Identifiability and algorithm. In Advances in Neural Information Processing Systems, pages 1786–1794, 2016.

[17] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, pages 137–146. ACM, 2003.

[18] Jingu Kim and Haesun Park. Toward faster nonnegative matrix factorization: A new algorithm and comparisons. In Data Mining, 2008. ICDM '08. Eighth IEEE International Conference on, pages 353–362. IEEE, 2008.

[19] Thomas Kolenda, Lars Kai Hansen, and Sigurdur Sigurdsson. Independent components in text. In Advances in Independent Component Analysis, pages 235–256. Springer, 2000.

[20] Da Kuang and Haesun Park. Fast rank-2 nonnegative matrix factorization for hierarchical document clustering. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '13, pages 739–747, New York, NY, USA, 2013. ACM.

[21] Jing Ma, Wei Gao, Zhongyu Wei, Yueming Lu, and Kam-Fai Wong. Detect rumors using time series of social context information on microblogging websites. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, CIKM '15, pages 1751–1754, New York, NY, USA, 2015. ACM.

[22] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.

[23] Evangelos E. Papalexakis and Nicholas D. Sidiropoulos. Co-clustering as multilinear decomposition with sparse latent factors. In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pages 2064–2067. IEEE, 2011.

[24] Martin Porter. Porter stemming library. Available online: https://tartarus.org/martin/PorterStemmer, 2017.

[25] Vahed Qazvinian, Emily Rosengren, Dragomir R. Radev, and Qiaozhu Mei. Rumor has it: Identifying misinformation in microblogs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1589–1599. Association for Computational Linguistics, 2011.

[26] Kenneth Rapoza. Can 'fake news' impact the stock market? Available online: https://www.forbes.com/sites/kenrapoza/2017/02/26/can-fake-news-impact-the-stock-market/#6584de9d2fac, Feb. 2017.

[27] Jacob Ratkiewicz, Michael Conover, Mark R. Meiss, Bruno Gonçalves, Alessandro Flammini, and Filippo Menczer. Detecting and tracking political abuse in social media. ICWSM, 11:297–304, 2011.

[28] Megan Risdal. Fake news dataset. https://www.kaggle.com/mrisdal/fake-news, 2017.

[29] Justin Sampson, Fred Morstatter, Liang Wu, and Huan Liu. Leveraging the implicit structure within social media for emergent rumor detection. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, CIKM '16, pages 2377–2382, New York, NY, USA, 2016. ACM.

[30] Chengcheng Shao, Giovanni Luca Ciampaglia, Alessandro Flammini, and Filippo Menczer. Hoaxy: A platform for tracking online misinformation. In Proceedings of the 25th International Conference Companion on World Wide Web, pages 745–750. International World Wide Web Conferences Steering Committee, 2016.

[31] The Independent, Shih Gerry. Boston marathon bombings: How Twitter and Reddit got it wrong. Available online: http://www.independent.co.uk/news/world/americas/boston-marathon-bombings-how-twitter-and-reddit-got-it-wrong-8581167.html, April 2013.

[32] Hyun Ah Song and Soo-Young Lee. Hierarchical representation using NMF. In International Conference on Neural Information Processing, pages 466–473. Springer, 2013.

[33] Liang Wu, Jundong Li, Xia Hu, and Huan Liu. Gleaning wisdom from the past: Early detection of emerging rumors in social media. SDM, 2016.

[34] Wei Xu, Xin Liu, and Yihong Gong. Document clustering based on non-negative matrix factorization. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 267–273. ACM, 2003.

[35] Reza Zafarani, Mohammad Ali Abbasi, and Huan Liu. Social Media Mining: An Introduction. Cambridge University Press, 2014.

[36] Zhe Zhao, Paul Resnick, and Qiaozhu Mei. Enquiring minds: Early detection of rumors in social media from enquiry posts. In Proceedings of the 24th International Conference on World Wide Web, WWW '15, pages 1395–1405, Republic and Canton of Geneva, Switzerland, 2015. International World Wide Web Conferences Steering Committee.
