
Analysis Methods in Neural Language Processing: A Survey

Yonatan Belinkov (1,2) and James Glass (1)

(1) MIT Computer Science and Artificial Intelligence Laboratory
(2) Harvard School of Engineering and Applied Sciences

Cambridge, MA, USA
{belinkov, glass}@mit.edu

Abstract

The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems. A plethora of new models have been proposed, many of which are thought to be opaque compared to their feature-rich counterparts. This has led researchers to analyze, interpret, and evaluate neural networks in novel and more fine-grained ways. In this survey paper, we review analysis methods in neural language processing, categorize them according to prominent research trends, highlight existing limitations, and point to potential directions for future work.

1 Introduction

The rise of deep learning has transformed the field of natural language processing (NLP) in recent years. Models based on neural networks have obtained impressive improvements in various tasks, including language modeling (Mikolov et al., 2010; Jozefowicz et al., 2016), syntactic parsing (Kiperwasser and Goldberg, 2016), machine translation (MT) (Bahdanau et al., 2014; Sutskever et al., 2014), and many other tasks; see Goldberg (2017) for example success stories.

This progress has been accompanied by a myriad of new neural network architectures. In many cases, traditional feature-rich systems are being replaced by end-to-end neural networks that aim to map input text to some output prediction. As end-to-end systems are gaining prevalence, one may point to two trends. First, some push back against the abandonment of linguistic knowledge and call for incorporating it inside the networks in different ways.1 Others strive to better understand how NLP models work. This theme of analyzing neural networks has connections to the broader work on interpretability in machine learning, along with specific characteristics of the NLP field.

Why should we analyze our neural NLP models? To some extent, this question falls into the larger question of interpretability in machine learning, which has been the subject of much debate in recent years.2 Arguments in favor of interpretability in machine learning usually mention goals like accountability, trust, fairness, safety, and reliability (Doshi-Velez and Kim, 2017; Lipton, 2016). Arguments against interpretability typically stress performance as the most important desideratum. All these arguments naturally apply to machine learning applications in NLP.

In the context of NLP, this question needs to be understood in light of earlier NLP work, often referred to as feature-rich or feature-engineered systems. In some of these systems, features are more easily understood by humans—they can be morphological properties, lexical classes, syntactic categories, semantic relations, etc. In theory, one could observe the importance assigned by statistical NLP models to such features in order to gain a better understanding of the model.3

1 See, for instance, Noah Smith's invited talk at ACL 2017: vimeo.com/234958746. See also a recent debate on this matter by Chris Manning and Yann LeCun: www.youtube.com/watch?v=fKk9KhGRBdI. (Videos accessed on December 11, 2018.)

2 See, for example, the NIPS 2017 debate: www.youtube.com/watch?v=2hW05ZfsUUo. (Accessed on December 11, 2018.)

3 Nevertheless, one could question how feasible such an analysis is; consider, for example, interpreting support vectors in high-dimensional support vector machines (SVMs).


Transactions of the Association for Computational Linguistics, vol. 7, pp. 49–72, 2019. Action Editor: Marco Baroni. Submission batch: 10/2018; Revision batch: 12/2018; Published 3/2019.

© 2019 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.


In contrast, it is more difficult to understand what happens in an end-to-end neural network model that takes input (say, word embeddings) and generates an output (say, a sentence classification). Much of the analysis work thus aims to understand how linguistic concepts that were common as features in NLP systems are captured in neural networks.

As the analysis of neural networks for language is becoming more and more prevalent, neural networks in various NLP tasks are being analyzed; different network architectures and components are being compared, and a variety of new analysis methods are being developed. This survey aims to review and summarize this body of work, highlight current trends, and point to existing lacunae. It organizes the literature into several themes. Section 2 reviews work that targets a fundamental question: What kind of linguistic information is captured in neural networks? We also point to limitations in current methods for answering this question. Section 3 discusses visualization methods, and emphasizes the difficulty in evaluating visualization work. In Section 4, we discuss the compilation of challenge sets, or test suites, for fine-grained evaluation, a methodology that has old roots in NLP. Section 5 deals with the generation and use of adversarial examples to probe weaknesses of neural networks. We point to unique characteristics of dealing with text as a discrete input and how different studies handle them. Section 6 summarizes work on explaining model predictions, an important goal of interpretability research. This is a relatively underexplored area, and we call for more work in this direction. Section 7 mentions a few other methods that do not fall neatly into one of the above themes. In the conclusion, we summarize the main gaps and potential research directions for the field.

The paper is accompanied by online supplementary materials that contain detailed references for studies corresponding to Sections 2, 4, and 5 (Tables SM1, SM2, and SM3, respectively), available at https://boknilev.github.io/nlp-analysis-methods.

Before proceeding, we briefly mention some earlier work of a similar spirit.

A Historical Note Reviewing the vast literature on neural networks for language is beyond our scope.4 However, we mention here a few representative studies that focused on analyzing such networks in order to illustrate how recent trends have roots that go back to before the recent deep learning revival.

Rumelhart and McClelland (1986) built a feedforward neural network for learning the English past tense and analyzed its performance on a variety of examples and conditions. They were especially concerned with the performance over the course of training, as their goal was to model the past form acquisition in children. They also analyzed a scaled-down version having eight input units and eight output units, which allowed them to describe it exhaustively and examine how certain rules manifest in network weights.

In his seminal work on recurrent neural networks (RNNs), Elman trained networks on synthetic sentences in a language prediction task (Elman, 1989, 1990, 1991). Through extensive analyses, he showed how networks discover the notion of a word when predicting characters; capture syntactic structures like number agreement; and acquire word representations that reflect lexical and syntactic categories. Similar analyses were later applied to other networks and tasks (Harris, 1990; Niklasson and Linaker, 2000; Pollack, 1990; Frank et al., 2013).

While Elman's work was limited in some ways, such as evaluating generalization or various linguistic phenomena—as Elman himself recognized (Elman, 1989)—it introduced methods that are still relevant today: from visualizing network activations in time, through clustering words by hidden state activations, to projecting representations to dimensions that emerge as capturing properties like sentence number or verb valency. The sections on visualization (Section 3) and identifying linguistic information (Section 2) contain many examples of these kinds of analysis.

2 What Linguistic Information Is Captured in Neural Networks?

Neural network models in NLP are typically trained in an end-to-end manner on input–output pairs, without explicitly encoding linguistic features. Thus, a primary question is the following: What linguistic information is captured in neural networks? When examining answers to this question, it is convenient to consider three dimensions: which methods are used for conducting the analysis, what kind of linguistic information is sought, and which objects in the neural network are being investigated. Table SM1 (in the supplementary materials) categorizes relevant analysis work according to these criteria. In the next subsections, we discuss trends in analysis work along these lines, followed by a discussion of limitations of current approaches.

4 For instance, a neural network that learns distributed representations of words was developed already in Miikkulainen and Dyer (1991). See Goodfellow et al. (2016, chapter 12.4) for references to other important milestones.

2.1 Methods

The most common approach for associating neural network components with linguistic properties is to predict such properties from activations of the neural network. Typically, in this approach a neural network model is trained on some task (say, MT) and its weights are frozen. Then, the trained model is used for generating feature representations for another task by running it on a corpus with linguistic annotations and recording the representations (say, hidden state activations). Another classifier is then used for predicting the property of interest (say, part-of-speech [POS] tags). The performance of this classifier is used for evaluating the quality of the generated representations, and by proxy that of the original model. This kind of approach has been used in numerous papers in recent years; see Table SM1 for references.5 It is referred to by various names, including "auxiliary prediction tasks" (Adi et al., 2017b), "diagnostic classifiers" (Veldhoen et al., 2016), and "probing tasks" (Conneau et al., 2018).
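To make the recipe concrete, the following sketch trains a probing classifier with scikit-learn; it is a minimal illustration, assuming token-level activations have already been extracted from a frozen model, with random arrays standing in for real representations and POS labels.

```python
# A minimal probing ("diagnostic classifier") sketch. Assumes one activation
# vector per token from a frozen model, plus an aligned linguistic label
# (e.g., a POS tag) per token. Shapes and label set are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(5000, 512))   # stand-in for frozen-model activations
pos_tags = rng.integers(0, 17, size=5000)      # stand-in for POS labels from an annotated corpus

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, pos_tags, test_size=0.2, random_state=0)

probe = LogisticRegression(max_iter=1000)      # the auxiliary classifier
probe.fit(X_train, y_train)

# Probing accuracy is taken as a proxy for how much POS information
# the frozen representations encode.
print("probing accuracy:", probe.score(X_test, y_test))
```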

As an example of this approach, let us walk through an application to analyzing syntax in neural machine translation (NMT) by Shi et al. (2016b). In this work, two NMT models were trained on standard parallel data—English→French and English→German. The trained models (specifically, the encoders) were run on an annotated corpus and their hidden states were used for training a logistic regression classifier that predicts different syntactic properties. The authors concluded that the NMT encoders learn significant syntactic information at both word level and sentence level. They also compared representations at different encoding layers and found that "local features are somehow preserved in the lower layer whereas more global, abstract information tends to be stored in the upper layer." These results demonstrate the kind of insights that the classification analysis may lead to, especially when comparing different models or model components.

5 A similar method has been used to analyze hierarchical structure in neural networks trained on arithmetic expressions (Veldhoen et al., 2016; Hupkes et al., 2018).

Other methods for finding correspondences between parts of the neural network and certain properties include counting how often attention weights agree with a linguistic property like anaphora resolution (Voita et al., 2018) or directly computing correlations between neural network activations and some property; for example, correlating RNN state activations with depth in a syntactic tree (Qian et al., 2016a) or with Mel-frequency cepstral coefficient (MFCC) acoustic features (Wu and King, 2016). Such correspondence may also be computed indirectly. For instance, Alishahi et al. (2017) defined an ABX discrimination task to evaluate how a neural model of speech (grounded in vision) encoded phonology. Given phoneme representations from different layers in their model, and three phonemes, A, B, and X, they compared whether the model representation for X is closer to A or B. This discrimination task enabled them to draw conclusions about which layers encode phonology better, observing that lower layers generally encode more phonological information.
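A minimal version of such an ABX trial might look as follows; the vectors are random stand-ins for phoneme representations taken from a layer of the speech model, with A and X drawn from the same category.

```python
# A sketch of one ABX discrimination trial: the representations pass the trial
# if X is closer to A (same category) than to B (different category).
import numpy as np

rng = np.random.default_rng(0)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Stand-in "phoneme" representations: A and X share a category center; B does not.
center_a, center_b = rng.normal(size=(2, 64))
a = center_a + 0.1 * rng.normal(size=64)
x = center_a + 0.1 * rng.normal(size=64)
b = center_b + 0.1 * rng.normal(size=64)

print("ABX correct:", cosine(x, a) > cosine(x, b))
```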

2.2 Linguistic Phenomena

Different kinds of linguistic information have been analyzed, ranging from basic properties like sentence length, word position, word presence, or simple word order, to morphological, syntactic, and semantic information. Phonetic/phonemic information, speaker information, and style and accent information have been studied in neural network models for speech, or in joint audio-visual models. See Table SM1 for references.

While it is difficult to synthesize a holistic picture from this diverse body of work, it appears that neural networks are able to learn a substantial amount of information on various linguistic phenomena. These models are especially successful at capturing frequent properties, while some rare properties are more difficult to learn.


Linzen et al. (2016), for instance, found that long short-term memory (LSTM) language models are able to capture subject–verb agreement in many common cases, while direct supervision is required for solving harder cases.

Another theme that emerges in several studies is the hierarchical nature of the learned representations. We have already mentioned such findings regarding NMT (Shi et al., 2016b) and a visually grounded speech model (Alishahi et al., 2017). Hierarchical representations of syntax were also reported to emerge in other RNN models (Blevins et al., 2018).

Finally, a couple of papers discovered that models trained with latent trees perform better on natural language inference (NLI) (Williams et al., 2018; Maillard and Clark, 2018) than ones trained with linguistically annotated trees. Moreover, the trees in these models do not resemble syntactic trees corresponding to known linguistic theories, which casts doubts on the importance of syntax-learning in the underlying neural network.6

2.3 Neural Network Components

In terms of the object of study, various neural network components were investigated, including word embeddings, RNN hidden states or gate activations, sentence embeddings, and attention weights in sequence-to-sequence (seq2seq) models. Generally less work has analyzed convolutional neural networks in NLP, but see Jacovi et al. (2018) for a recent exception. In speech processing, researchers have analyzed layers in deep neural networks for speech recognition and different speaker embeddings. Some analysis has also been devoted to joint language–vision or audio–vision models, or to similarities between word embeddings and convolutional image representations. Table SM1 provides detailed references.

2.4 Limitations

The classification approach may find that a certain amount of linguistic information is captured in the neural network. However, this does not necessarily mean that the information is used by the network. For example, Vanmassenhove et al. (2017) investigated aspect in NMT (and in phrase-based statistical MT). They trained a classifier on NMT sentence encoding vectors and found that they can accurately predict tense about 90% of the time. However, when evaluating the output translations, they found them to have the correct tense only 79% of the time. They interpreted this result to mean that "part of the aspectual information is lost during decoding." Relatedly, Cífka and Bojar (2018) compared the performance of various NMT models in terms of translation quality (BLEU) and representation quality (classification tasks). They found a negative correlation between the two, suggesting that high-quality systems may not be learning certain sentence meanings. In contrast, Artetxe et al. (2018) showed that word embeddings contain divergent linguistic information, which can be uncovered by applying a linear transformation on the learned embeddings. Their results suggest an alternative explanation, showing that "embedding models are able to encode divergent linguistic information but have limits on how this information is surfaced."

6 Others found that even simple binary trees may work well in MT (Wang et al., 2018b) and sentence classification (Chen et al., 2015).

From a methodological point of view, most of the relevant analysis work is concerned with correlation: How correlated are neural network components with linguistic properties? What may be lacking is a measure of causation: How does the encoding of linguistic properties affect the system output? Giulianelli et al. (2018) make some headway on this question. They predicted number agreement from RNN hidden states and gates at different time steps. They then intervened in how the model processes the sentence by changing a hidden activation based on the difference between the prediction and the correct label. This improved agreement prediction accuracy, and the effect persisted over the course of the sentence, indicating that this information has an effect on the model. However, they did not report the effect on overall model quality, for example by measuring perplexity. Methods from causal inference may shed new light on some of these questions.
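Loosely in the spirit of this intervention idea, one could nudge a hidden state along the gradient of a trained probe's loss before letting the model continue; the probe, hidden state, and step size below are all illustrative stand-ins rather than the exact procedure of Giulianelli et al. (2018).

```python
# A sketch of an intervention on a hidden state: move the state in the
# direction that reduces a trained agreement probe's error, then hand the
# corrected state back to the model. All tensors here are stand-ins.
import torch

hidden = torch.randn(1, 512, requires_grad=True)   # RNN hidden state at some time step
probe = torch.nn.Linear(512, 2)                    # stand-in for a trained number-agreement probe
label = torch.tensor([1])                          # the correct agreement label

loss = torch.nn.functional.cross_entropy(probe(hidden), label)
loss.backward()

alpha = 0.5                                        # intervention step size (a free parameter)
corrected = hidden - alpha * hidden.grad           # nudge the state toward the correct label
# `corrected` would replace `hidden` before the model processes the next token.
```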

Finally, the predictor for the auxiliary task is usually a simple classifier, such as logistic regression. A few studies compared different classifiers and found that deeper classifiers lead to overall better results, but do not alter the respective trends when comparing different models or components (Qian et al., 2016b; Belinkov, 2018). Interestingly, Conneau et al. (2018) found that tasks requiring more nuanced linguistic knowledge (e.g., tree depth, coordination inversion) gain the most from using a deeper classifier. However, the approach is usually taken for granted; given its prevalence, better theoretical or empirical foundations are called for.

Figure 1: A heatmap visualizing neuron activations. In this case, the activations capture position in the sentence.

3 Visualization

Visualization is a valuable tool for analyzing neural networks in the language domain and beyond. Early work visualized hidden unit activations in RNNs trained on an artificial language modeling task, and observed how they correspond to certain grammatical relations such as agreement (Elman, 1991). Much recent work has focused on visualizing activations on specific examples in modern neural networks for language (Karpathy et al., 2015; Kádár et al., 2017; Qian et al., 2016a; Liu et al., 2018) and speech (Wu and King, 2016; Nagamine et al., 2015; Wang et al., 2017b). Figure 1 shows an example visualization of a neuron that captures position of words in a sentence. The heatmap uses blue and red colors for negative and positive activation values, respectively, enabling the user to quickly grasp the function of this neuron.
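A heatmap of this kind is simple to produce; the sketch below mimics Figure 1 with matplotlib, using invented words and an invented "position" neuron whose activation grows along the sentence.

```python
# A sketch of the Figure 1 visualization: one neuron's activation per word,
# colored from blue (negative) to red (positive). Words and values are invented.
import matplotlib.pyplot as plt
import numpy as np

words = "The quick brown fox jumps over the lazy dog".split()
activations = np.linspace(-1, 1, len(words)).reshape(1, -1)  # a "position" neuron

fig, ax = plt.subplots(figsize=(8, 1.5))
ax.imshow(activations, cmap="bwr", vmin=-1, vmax=1, aspect="auto")
ax.set_xticks(range(len(words)))
ax.set_xticklabels(words)
ax.set_yticks([])
plt.tight_layout()
plt.show()
```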

The attention mechanism that originated in work on NMT (Bahdanau et al., 2014) also lends itself to a natural visualization. The alignments obtained via different attention mechanisms have produced visualizations ranging from tasks like NLI (Rocktäschel et al., 2016; Yin et al., 2016), summarization (Rush et al., 2015), MT post-editing (Jauregi Unanue et al., 2018), and morphological inflection (Aharoni and Goldberg, 2017) to matching users on social media (Tay et al., 2018). Figure 2 reproduces a visualization of attention alignments from the original work by Bahdanau et al. Here grayscale values correspond to the weight of the attention between words in an English source sentence (columns) and its French translation (rows). As Bahdanau et al. explain, this visualization demonstrates that the NMT model learned a soft alignment between source and target words. Some aspects of word order may also be noticed, as in the reordering of noun and adjective when translating the phrase "European Economic Area."

Figure 2: A visualization of attention weights, showing soft alignment between source and target sentences in an NMT model. Reproduced from Bahdanau et al. (2014), with permission.

Another line of work computes various saliency measures to attribute predictions to input features. The important or salient features can then be visualized in selected examples (Li et al., 2016a; Aubakirova and Bansal, 2016; Sundararajan et al., 2017; Arras et al., 2017a,b; Ding et al., 2017; Murdoch et al., 2018; Mudrakarta et al., 2018; Montavon et al., 2018; Godin et al., 2018). Saliency can also be computed with respect to intermediate values, rather than input features (Ghaeini et al., 2018).7
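As one simple instance of this family, a gradient-times-input saliency score backpropagates a class score to the input embeddings and reduces the result to one value per token; the model and inputs below are stand-ins, not any specific published method.

```python
# A sketch of gradient-times-input saliency: gradient of a class score with
# respect to each input embedding, collapsed to a per-token scalar.
import torch

embeddings = torch.randn(7, 300, requires_grad=True)   # one embedding per token (stand-in)
model = torch.nn.Sequential(torch.nn.Flatten(0), torch.nn.Linear(7 * 300, 2))

score = model(embeddings)[1]                            # score of one output class
score.backward()

saliency = (embeddings.grad * embeddings).sum(dim=1).abs()  # one value per token
print(saliency)
```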

An instructive visualization technique is to cluster neural network activations and compare them to some linguistic property. Early work clustered RNN activations, showing that they organize in lexical categories (Elman, 1989, 1990). Similar techniques have been followed by others. Recent examples include clustering of sentence embeddings in an RNN encoder trained in a multitask learning scenario (Brunner et al., 2017), and phoneme clusters in a joint audio-visual RNN model (Alishahi et al., 2017).
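One possible realization, assuming hidden states aligned with labels for some linguistic property, is to cluster the states and measure cluster purity with respect to that property; the data below are random stand-ins.

```python
# A sketch of the clustering analysis: k-means over activations, then a purity
# score against a linguistic property (here, hypothetical POS labels).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
states = rng.normal(size=(1000, 256))     # hidden states, one per word (stand-in)
pos = rng.integers(0, 5, size=1000)       # aligned POS labels (stand-in)

clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(states)

# Purity: each cluster votes for its majority label.
purity = sum(np.bincount(pos[clusters == c]).max() for c in range(5)) / len(pos)
print(f"cluster purity w.r.t. POS: {purity:.2f}")
```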

A few online tools for visualizing neural networks have recently become available. LSTMVis (Strobelt et al., 2018b) visualizes RNN activations, focusing on tracing hidden state dynamics.8

7 Generally, many of the visualization methods are adapted from the vision domain, where they have been extremely popular; see Zhang and Zhu (2018) for a survey.

Seq2Seq-Vis (Strobelt et al., 2018a) visualizes different modules in attention-based seq2seq models, with the goal of examining model decisions and testing alternative decisions. Another tool focused on comparing attention alignments was proposed by Rikters (2018). It also provides translation confidence scores based on the distribution of attention weights. NeuroX (Dalvi et al., 2019b) is a tool for finding and analyzing individual neurons, focusing on machine translation.

Evaluation As in much work on interpretability, evaluating visualization quality is difficult and often limited to qualitative examples. A few notable exceptions report human evaluations of visualization quality. Singh et al. (2018) showed human raters hierarchical clusterings of input words generated by two interpretation methods, and asked them to evaluate which method is more accurate, or which method they trust more. Others reported human evaluations for attention visualization in conversation modeling (Freeman et al., 2018) and medical code prediction tasks (Mullenbach et al., 2018).

The availability of open-source tools of the sort described above will hopefully encourage users to utilize visualization in their regular research and development cycle. However, it remains to be seen how useful visualizations turn out to be.

4 Challenge Sets

The majority of benchmark datasets in NLP are drawn from text corpora, reflecting a natural frequency distribution of language phenomena. While useful in practice for evaluating system performance in the average case, such datasets may fail to capture a wide range of phenomena. An alternative evaluation framework consists of challenge sets, also known as test suites, which have been used in NLP for a long time (Lehmann et al., 1996), especially for evaluating MT systems (King and Falkedal, 1990; Isahara, 1995; Koh et al., 2001). Lehmann et al. (1996) noted several key properties of test suites: systematicity, control over data, inclusion of negative data, and exhaustivity. They contrasted such datasets with test corpora, "whose main advantage is that they reflect naturally occurring data." This idea underlines much of the work on challenge sets and is echoed in more recent work (Wang et al., 2018a). For instance, Cooper et al. (1996) constructed a semantic test suite that targets phenomena as diverse as quantifiers, plurals, anaphora, ellipsis, adjectival properties, and so on.

8 RNNVis (Ming et al., 2017) is a similar tool, but its online demo does not seem to be available at the time of writing.

After a hiatus of a couple of decades,9 challenge sets have recently gained renewed popularity in the NLP community. In this section, we include datasets used for evaluating neural network models that diverge from the common average-case evaluation. Many of them share some of the properties noted by Lehmann et al. (1996), although negative examples (ill-formed data) are typically less utilized. The challenge datasets can be categorized along the following criteria: the task they seek to evaluate, the linguistic phenomena they aim to study, the language(s) they target, their size, their method of construction, and how performance is evaluated.10 Table SM2 (in the supplementary materials) categorizes many recent challenge sets along these criteria. Below we discuss common trends along these lines.

4.1 Task

By far, the most targeted tasks in challenge sets are NLI and MT. This can partly be explained by the popularity of these tasks and the prevalence of neural models proposed for solving them. Perhaps more importantly, tasks like NLI and MT arguably require inferences at various linguistic levels, making the challenge set evaluation especially attractive. Still, other high-level tasks like reading comprehension or question answering have not received as much attention, and may also benefit from the careful construction of challenge sets.

A significant body of work aims to evaluate the quality of embedding models by correlating the similarity they induce on word or sentence pairs with human similarity judgments. Datasets containing such similarity scores are often used to evaluate word embeddings (Finkelstein et al., 2002; Bruni et al., 2012; Hill et al., 2015, inter alia) or sentence embeddings; see the many shared tasks on semantic textual similarity in SemEval (Cer et al., 2017, and previous editions). Many of these datasets evaluate similarity at a coarse-grained level, but some provide a more fine-grained evaluation of similarity or relatedness. For example, some datasets are dedicated to specific word classes such as verbs (Gerz et al., 2016) or rare words (Luong et al., 2013), or to evaluating compositional knowledge in sentence embeddings (Marelli et al., 2014). Multilingual and cross-lingual versions have also been collected (Leviant and Reichart, 2015; Cer et al., 2017). Although these datasets are widely used, this kind of evaluation has been criticized for its subjectivity and questionable correlation with downstream performance (Faruqui et al., 2016).

9 One could speculate that their decrease in popularity can be attributed to the rise of large-scale quantitative evaluation of statistical NLP systems.

10 Another typology of evaluation protocols was put forth by Burlot and Yvon (2017). Their criteria are partially overlapping with ours, although they did not provide a comprehensive categorization like the one compiled here.

4.2 Linguistic Phenomena

One of the primary goals of challenge sets is to evaluate models on their ability to handle specific linguistic phenomena. While earlier studies emphasized exhaustivity (Cooper et al., 1996; Lehmann et al., 1996), recent ones tend to focus on a few properties of interest. For example, Sennrich (2017) introduced a challenge set for MT evaluation focusing on five properties: subject–verb agreement, noun phrase agreement, verb–particle constructions, polarity, and transliteration. Slightly more elaborate is an MT challenge set for morphology, including 14 morphological properties (Burlot and Yvon, 2017). See Table SM2 for references to datasets targeting other phenomena.

Other challenge sets cover a more diverse range of linguistic properties, in the spirit of some of the earlier work. For instance, extending the categories in Cooper et al. (1996), the GLUE analysis set for NLI covers more than 30 phenomena in four coarse categories (lexical semantics, predicate–argument structure, logic, and knowledge). In MT evaluation, Burchardt et al. (2017) reported results using a large test suite covering 120 phenomena, partly based on Lehmann et al. (1996).11 Isabelle et al. (2017) and Isabelle and Kuhn (2018) prepared challenge sets for MT evaluation covering fine-grained phenomena at morpho-syntactic, syntactic, and lexical levels.

11 Their dataset does not seem to be available yet, but more details are promised to appear in a future publication.

Generally, datasets that are constructed programmatically tend to cover less fine-grained linguistic properties, while manually constructed datasets represent more diverse phenomena.

4.3 Languages

As is unfortunately usual in much NLP work, especially neural NLP, the vast majority of challenge sets are in English. This situation is slightly better in MT evaluation, where naturally all datasets feature other languages (see Table SM2). A notable exception is the work by Gulordava et al. (2018), who constructed examples for evaluating number agreement in language modeling in English, Russian, Hebrew, and Italian. Clearly, there is room for more challenge sets in non-English languages. However, perhaps more pressing is the need for large-scale non-English datasets (besides MT) to develop neural models for popular NLP tasks.

4.4 Scale

The size of proposed challenge sets varies greatly (Table SM2). As expected, datasets constructed by hand are smaller, with typical sizes in the hundreds. Automatically built datasets are much larger, ranging from several thousands to close to a hundred thousand (Sennrich, 2017), or even more than one million examples (Linzen et al., 2016). In the latter case, the authors argue that such a large test set is needed for obtaining a sufficient representation of rare cases. A few manually constructed datasets contain a fairly large number of examples, up to 10 thousand (Burchardt et al., 2017).

4.5 Construction Method

Challenge sets are usually created either programmatically or manually, by handcrafting specific examples. Often, semi-automatic methods are used to compile an initial list of examples that is manually verified by annotators. The specific method also affects the kind of language use and how natural or artificial/synthetic the examples are. We describe here some trends in dataset construction methods in the hope that they may be useful for researchers contemplating new datasets.


Several datasets were constructed by modifying or extracting examples from existing datasets. For instance, Sánchez et al. (2018) and Glockner et al. (2018) extracted examples from SNLI (Bowman et al., 2015) and replaced specific words such as hypernyms, synonyms, and antonyms, followed by manual verification. Linzen et al. (2016), on the other hand, extracted examples of subject–verb agreement from raw texts using heuristics, resulting in a large-scale dataset. Gulordava et al. (2018) extended this to other agreement phenomena, but they relied on syntactic information available in treebanks, resulting in a smaller dataset.

Several challenge sets utilize existing test suites, either as a direct source of examples (Burchardt et al., 2017) or for searching similar naturally occurring examples (Wang et al., 2018a).12

Sennrich (2017) introduced a method for evaluating NMT systems via contrastive translation pairs, where the system is asked to estimate the probability of two candidate translations that are designed to reflect specific linguistic properties. Sennrich generated such pairs programmatically by applying simple heuristics, such as changing gender and number to induce agreement errors, resulting in a large-scale challenge set of close to 100 thousand examples. This framework was extended to evaluate other properties, but often requiring more sophisticated generation methods like using morphological analyzers/generators (Burlot and Yvon, 2017) or more manual involvement in generation (Bawden et al., 2018) or verification (Rios Gonzales et al., 2017).
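The evaluation itself reduces to comparing model scores on each pair, as in this sketch; the scoring function is a placeholder for the NMT model's log-probability, and the contrastive pair is invented.

```python
# A sketch of contrastive-pair evaluation in the spirit of Sennrich (2017):
# a system "passes" an example if it scores the reference translation higher
# than a minimally perturbed, incorrect variant.
def score(src, tgt):
    # Placeholder for the NMT model's log-probability of tgt given src;
    # here, incorrect variants are simply marked with "*" so the sketch runs.
    return 0.0 if "*" not in tgt else -1.0

# Each triple: source, correct translation, contrastive (incorrect) translation.
# The invented example injects a German subject-verb agreement error.
pairs = [
    ("they sleep", "sie schlafen", "sie schlaeft*"),
]
accuracy = sum(score(s, good) > score(s, bad) for s, good, bad in pairs) / len(pairs)
print(f"contrastive accuracy: {accuracy:.2f}")
```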

Finally, a few studies define templates that capture certain linguistic properties and instantiate them with word lists (Dasgupta et al., 2018; Rudinger et al., 2018; Zhao et al., 2018a). Template-based generation has the advantage of providing more control, for example for obtaining a specific vocabulary distribution, but this comes at the expense of how natural the examples are.
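A toy instantiation, with an invented template and word lists, might look as follows.

```python
# A sketch of template-based example generation: a template with slots is
# filled from word lists, giving full control over the vocabulary distribution.
import itertools

template = "The {noun} that the {noun2} {verb} is tall."
nouns = ["author", "farmer", "senator"]
verbs = ["likes", "admires"]

examples = [
    template.format(noun=n1, noun2=n2, verb=v)
    for n1, n2, v in itertools.product(nouns, nouns, verbs)
    if n1 != n2  # avoid repeating the same noun in both slots
]
print(len(examples), examples[0])
```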

4.6 Evaluation

Systems are typically evaluated by their performance on the challenge set examples, either with the same metric used for evaluating the system in the first place, or via a proxy, as in the contrastive pairs evaluation of Sennrich (2017). Automatic evaluation metrics are cheap to obtain and can be calculated on a large scale. However, they may miss certain aspects. Thus a few studies report human evaluation on their challenge sets, such as in MT (Isabelle et al., 2017; Burchardt et al., 2017).

12 Wang et al. (2018a) also verified that their examples do not contain annotation artifacts, a potential problem noted in recent studies (Gururangan et al., 2018; Poliak et al., 2018b).

We note here also that judging the quality of a model by its performance on a challenge set can be tricky. Some authors emphasize their wish to test systems on extreme or difficult cases, "beyond normal operational capacity" (Naik et al., 2018). However, whether one should expect systems to perform well on specially chosen cases (as opposed to the average case) may depend on one's goals. To put results in perspective, one may compare model performance to human performance on the same task (Gulordava et al., 2018).

5 Adversarial Examples

Understanding a model also requires an understanding of its failures. Despite their success in many tasks, machine learning systems can also be very sensitive to malicious attacks or adversarial examples (Szegedy et al., 2014; Goodfellow et al., 2015). In the vision domain, small changes to the input image can lead to misclassification, even if such changes are indistinguishable by humans.

The basic setup in work on adversarial examples can be described as follows.13 Given a neural network model f and an input example x, we seek to generate an adversarial example x′ that will have a minimal distance from x, while being assigned a different label by f:

\min_{x'} \|x - x'\| \quad \text{s.t.} \quad f(x) = l, \; f(x') = l', \; l \neq l'

In the vision domain, x can be the input image pixels, resulting in a fairly intuitive interpretation of this optimization problem: measuring the distance ||x − x′|| is straightforward, and finding x′ can be done by computing gradients with respect to the input, since all quantities are continuous.

In the text domain, the input is discrete (for example, a sequence of words), which poses two problems. First, it is not clear how to measure the distance between the original and adversarial examples, x and x′, which are two discrete objects (say, two words or sentences). Second, minimizing this distance cannot be easily formulated as an optimization problem, as this requires computing gradients with respect to a discrete input.

13 The notation here follows Yuan et al. (2017).

In the following, we review methods for handling these difficulties according to several criteria: the adversary's knowledge, the specificity of the attack, the linguistic unit being modified, and the task on which the attacked model was trained.14 Table SM3 (in the supplementary materials) categorizes work on adversarial examples in NLP according to these criteria.

5.1 Adversary's Knowledge

Adversarial examples can be generated using access to model parameters, also known as white-box attacks, or without such access, with black-box attacks (Papernot et al., 2016a, 2017; Narodytska and Kasiviswanathan, 2017; Liu et al., 2017).

White-box attacks are difficult to adapt to the text world as they typically require computing gradients with respect to the input, which would be discrete in the text case. One option is to compute gradients with respect to the input word embeddings, and perturb the embeddings. Since this may result in a vector that does not correspond to any word, one could search for the closest word embedding in a given dictionary (Papernot et al., 2016b); Cheng et al. (2018) extended this idea to seq2seq models. Others computed gradients with respect to input word embeddings to identify and rank words to be modified (Samanta and Mehta, 2017; Liang et al., 2018). Ebrahimi et al. (2018b) developed an alternative method by representing text edit operations in vector space (e.g., a binary vector specifying which characters in a word would be changed) and approximating the change in loss with the derivative along this vector.
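A sketch of this embedding-space recipe, with a stand-in model and vocabulary: take an FGSM-style gradient step on the input embeddings and snap each perturbed vector back to its nearest vocabulary neighbor. This illustrates the general idea rather than any specific published attack.

```python
# A sketch of a white-box attack in embedding space: perturb input embeddings
# along the loss gradient, then map each perturbed vector to the closest word
# in the vocabulary. Model, vocabulary, and shapes are stand-ins.
import torch

vocab_emb = torch.randn(10000, 300)                 # embedding table (stand-in)
word_ids = torch.tensor([5, 42, 7])                 # input sentence as word indices
emb = vocab_emb[word_ids].clone().requires_grad_(True)

model = torch.nn.Sequential(torch.nn.Flatten(0), torch.nn.Linear(3 * 300, 2))
loss = torch.nn.functional.cross_entropy(model(emb).unsqueeze(0), torch.tensor([0]))
loss.backward()

perturbed = emb + 0.5 * emb.grad.sign()             # FGSM-style step to increase the loss
# Snap each perturbed embedding to its nearest neighbor in the vocabulary.
dists = torch.cdist(perturbed.detach(), vocab_emb)  # (3, 10000) pairwise distances
adversarial_ids = dists.argmin(dim=1)
print(adversarial_ids)
```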

Given the difficulty in generating white-box adversarial examples for text, much research has been devoted to black-box examples. Often, the adversarial examples are inspired by text edits that are thought to be natural or commonly generated by humans, such as typos, misspellings, and so on (Sakaguchi et al., 2017; Heigold et al., 2018; Belinkov and Bisk, 2018). Gao et al. (2018) defined scoring functions to identify tokens to modify. Their functions do not require access to model internals, but they do require the model prediction score. After identifying the important tokens, they modify characters with common edit operations.

14 These criteria are partly taken from Yuan et al. (2017), where a more elaborate taxonomy is laid out. At present, though, the work on adversarial examples in NLP is more limited than in computer vision, so our criteria will suffice.

Zhao et al. (2018c) used generative adversarial networks (GANs) (Goodfellow et al., 2014) to minimize the distance between latent representations of input and adversarial examples, and performed perturbations in latent space. Since the latent representations do not need to come from the attacked model, this is a black-box attack.

Finally, Alzantot et al. (2018) developed an interesting population-based genetic algorithm for crafting adversarial examples for text classification by maintaining a population of modifications of the original sentence and evaluating fitness of modifications at each generation. They do not require access to model parameters, but do use prediction scores. A similar idea was proposed by Kuleshov et al. (2018).

5.2 Attack Specificity

Adversarial attacks can be classified into targeted vs. nontargeted attacks (Yuan et al., 2017). A targeted attack specifies a specific false class, l′, while a nontargeted attack cares only that the predicted class is wrong, l′ ≠ l. Targeted attacks are more difficult to generate, as they typically require knowledge of model parameters; that is, they are white-box attacks. This might explain why the majority of adversarial examples in NLP are nontargeted (see Table SM3). A few targeted attacks include Liang et al. (2018), which specified a desired class to fool a text classifier, and Chen et al. (2018a), which specified words or captions to generate in an image captioning model. Others targeted specific words to omit, replace, or include when attacking seq2seq models (Cheng et al., 2018; Ebrahimi et al., 2018a).

Methods for generating targeted attacks in NLP could possibly take more inspiration from adversarial attacks in other fields. For instance, in attacking malware detection systems, several studies developed targeted attacks in a black-box scenario (Yuan et al., 2017). A black-box targeted attack for MT was proposed by Zhao et al. (2018c), who used GANs to search for attacks on Google's MT system after mapping sentences into continuous space with adversarially regularized autoencoders (Zhao et al., 2018b).

5.3 Linguistic Unit

Most of the work on adversarial text examples involves modifications at the character- and/or word-level; see Table SM3 for specific references. Other transformations include adding sentences or text chunks (Jia and Liang, 2017) or generating paraphrases with desired syntactic structures (Iyyer et al., 2018). In image captioning, Chen et al. (2018a) modified pixels in the input image to generate targeted attacks on the caption text.

5.4 Task

Generally, most work on adversarial examples in NLP concentrates on relatively high-level language understanding tasks, such as text classification (including sentiment analysis) and reading comprehension, while work on text generation focuses mainly on MT. See Table SM3 for references. There is relatively little work on adversarial examples for more low-level language processing tasks, although one can mention morphological tagging (Heigold et al., 2018) and spelling correction (Sakaguchi et al., 2017).

5.5 Coherence and Perturbation Measurement

In adversarial image examples, it is fairly straightforward to measure the perturbation, either by measuring distance in pixel space, say ||x − x′|| under some norm, or with alternative measures that are better correlated with human perception (Rozsa et al., 2016). It is also visually compelling to present an adversarial image with imperceptible difference from its source image. In the text domain, measuring distance is not as straightforward, and even small changes to the text may be perceptible by humans. Thus, evaluation of attacks is fairly tricky. Some studies imposed constraints on adversarial examples to have a small number of edit operations (Gao et al., 2018). Others ensured syntactic or semantic coherence in different ways, such as filtering replacements by word similarity or sentence similarity (Alzantot et al., 2018; Kuleshov et al., 2018), or by using synonyms and other word lists (Samanta and Mehta, 2017; Yang et al., 2018).

Some reported whether a human can classify the adversarial example correctly (Yang et al., 2018), but this does not indicate how perceptible the changes are. More informative human studies evaluate grammaticality or similarity of the adversarial examples to the original ones (Zhao et al., 2018c; Alzantot et al., 2018). Given the inherent difficulty in generating imperceptible changes in text, more such evaluations are needed.

6 Explaining Predictions

Explaining specific predictions is recognized as a desideratum in interpretability work (Lipton, 2016), argued to increase the accountability of machine learning systems (Doshi-Velez et al., 2017). However, explaining why a deep, highly non-linear neural network makes a certain prediction is not trivial. One solution is to ask the model to generate explanations along with its primary prediction (Zaidan et al., 2007; Zhang et al., 2016),15 but this approach requires manual annotations of explanations, which may be hard to collect.

An alternative approach is to use parts of the input as explanations. For example, Lei et al. (2016) defined a generator that learns a distribution over text fragments as candidate rationales for justifying predictions, evaluated on sentiment analysis. Alvarez-Melis and Jaakkola (2017) discovered input–output associations in a sequence-to-sequence learning scenario, by perturbing the input and finding the most relevant associations. Gupta and Schütze (2018) inspected how information is accumulated in RNNs towards a prediction, and associated peaks in prediction scores with important input segments. As these methods use input segments to explain predictions, they do not shed much light on the internal computations that take place in the network.

At present, despite the recognized importance for interpretability, our ability to explain predictions of neural networks in NLP is still limited.

7 Other Methods

We briefly mention here several analysis methods that do not fall neatly into the previous sections.

A number of studies evaluated the effect of erasing or masking certain neural network components, such as word embedding dimensions, hidden units, or even full words (Li et al., 2016b; Feng et al., 2018; Khandelwal et al., 2018; Bau et al., 2018). For example, Li et al. (2016b) erased specific dimensions in word embeddings or hidden states and computed the change in probability assigned to different labels. Their experiments revealed interesting differences between word embedding models, where in some models information is more focused in individual dimensions. They also found that information is more distributed in hidden layers than in the input layer, and erased entire words to find important words in a sentiment analysis task.

15 Other work considered learning textual-visual explanations from multimodal annotations (Park et al., 2018).
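An erasure analysis of this kind can be sketched in a few lines; the model and input below are stand-ins, and importance is measured as the drop in the probability of the originally predicted label.

```python
# A sketch of erasure analysis in the spirit of Li et al. (2016b): zero out one
# representation dimension at a time and record how much the probability of the
# predicted label drops.
import torch

model = torch.nn.Sequential(torch.nn.Linear(300, 2), torch.nn.Softmax(dim=-1))
x = torch.randn(300)                      # a word embedding or hidden state (stand-in)
base = model(x)
label = base.argmax()

importance = torch.empty(300)
for d in range(300):
    erased = x.clone()
    erased[d] = 0.0                       # erase one dimension
    importance[d] = base[label] - model(erased)[label]

print("most important dimensions:", importance.topk(5).indices)
```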

Several studies conducted behavioral experiments to interpret word embeddings by defining intrusion tasks, where humans need to identify an intruder word, chosen based on difference in word embedding dimensions (Murphy et al., 2012; Fyshe et al., 2015; Faruqui et al., 2015).16

In this kind of work, a word embedding model may be deemed more interpretable if humans are better able to identify the intruding words. Since the evaluation is costly for high-dimensional representations, alternative automatic metrics were considered (Park et al., 2017; Şenel et al., 2018).

A long tradition in work on neural networks is to evaluate and analyze their ability to learn different formal languages (Das et al., 1992; Casey, 1996; Gers and Schmidhuber, 2001; Bodén and Wiles, 2002; Chalup and Blair, 2003). This trend continues today, with research into modern architectures and what formal languages they can learn (Weiss et al., 2018; Bernardy, 2018; Suzgun et al., 2019), or the formal properties they possess (Chen et al., 2018b).

8 Conclusion

Analyzing neural networks has become a hot topic in NLP research. This survey attempted to review and summarize as much of the current research as possible, while organizing it along several prominent themes. We have emphasized aspects in analysis that are specific to language—namely, what linguistic information is captured in neural networks, which phenomena they are successful at capturing, and where they fail. Many of the analysis methods are general techniques from the larger machine learning community, such as visualization via saliency measures or evaluation by adversarial examples. But even those sometimes require non-trivial adaptations to work with text input. Some methods are more specific to the field, but may prove useful in other domains. Challenge sets or test suites are such a case.

16 The methodology follows earlier work on evaluating the interpretability of probabilistic topic models with intrusion tasks (Chang et al., 2009).

Throughout this survey, we have identified several limitations or gaps in current analysis work:

• The use of auxiliary classification tasks for identifying which linguistic properties neural networks capture has become standard practice (Section 2), while lacking both a theoretical foundation and a better empirical consideration of the link between the auxiliary tasks and the original task.

• Evaluation of analysis work is often limited or qualitative, especially in visualization techniques (Section 3). Newer forms of evaluation are needed for determining the success of different methods.

• Relatively little work has been done on explaining predictions of neural network models, apart from providing visualizations (Section 6). With the increasing public demand for explaining algorithmic choices in machine learning systems (Doshi-Velez and Kim, 2017; Doshi-Velez et al., 2017), there is pressing need for progress in this direction.

• Much of the analysis work is focused on the English language, especially in constructing challenge sets for various tasks (Section 4), with the exception of MT due to its inherent multilingual character. Developing resources and evaluating methods on other languages is important as the field grows and matures.

• More challenge sets for evaluating other tasks besides NLI and MT are needed.

Finally, as with any survey in a rapidly evolving field, this paper is likely to omit relevant recent work by the time of publication. While we intend to continue updating the online appendix with newer publications, we hope that our summarization of prominent analysis work and its categorization into several themes will be a useful guide for scholars interested in analyzing and understanding neural networks for NLP.


Acknowledgments

We would like to thank the anonymous reviewers and the action editor for their very helpful comments. This work was supported by the Qatar Computing Research Institute. Y.B. is also supported by the Harvard Mind, Brain, Behavior Initiative.

References

Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, and Yoav Goldberg. 2017a. Analysis of sentence embedding models using prediction tasks in natural language processing. IBM Journal of Research and Development, 61(4):3–9.

Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, and Yoav Goldberg. 2017b. Fine-Grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks. In International Conference on Learning Representations (ICLR).

Roee Aharoni and Yoav Goldberg. 2017. Morphological Inflection Generation with Hard Monotonic Attention. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2004–2015. Association for Computational Linguistics.

Wasi Uddin Ahmad, Xueying Bai, Zhechao Huang, Chao Jiang, Nanyun Peng, and Kai-Wei Chang. 2018. Multi-task Learning for Universal Sentence Embeddings: A Thorough Evaluation using Transfer and Auxiliary Tasks. arXiv preprint arXiv:1804.07911v2.

Afra Alishahi, Marie Barking, and GrzegorzChrupała. 2017. Encoding of phonology in arecurrent neural model of grounded speech.In Proceedings of the 21st Conference onComputational Natural Language Learning(CoNLL 2017), pages 368–378. Association forComputational Linguistics.

David Alvarez-Melis and Tommi Jaakkola.2017. A causal framework for explaining thepredictions of black-box sequence-to-sequencemodels. In Proceedings of the 2017 Conferenceon Empirical Methods in Natural Language

Processing, pages 412–421. Association forComputational Linguistics.

Moustafa Alzantot, Yash Sharma, AhmedElgohary, Bo-Jhang Ho, Mani Srivastava, andKai-Wei Chang. 2018. Generating NaturalLanguage Adversarial Examples. In Proceed-ings of the 2018 Conference on EmpiricalMethods in Natural Language Processing,pages 2890–2896. Association for Computa-tional Linguistics.

Leila Arras, Franziska Horn, Gregoire Montavon,Klaus-Robert Muller, and Wojciech Samek.2017a. ‘‘What is relevant in a text document?’’:An interpretable machine learning approach.PLOS ONE, 12(8):1–23.

Leila Arras, Gregoire Montavon, Klaus-RobertMuller, and Wojciech Samek. 2017b. Explain-ing Recurrent Neural Network Predictions inSentiment Analysis. In Proceedings of the8th Workshop on Computational Approachesto Subjectivity, Sentiment and Social MediaAnalysis, pages 159–168. Association forComputational Linguistics.

Mikel Artetxe, Gorka Labaka, Inigo Lopez-Gazpio, and Eneko Agirre. 2018. UncoveringDivergent Linguistic Information in WordEmbeddings with Lessons for Intrinsic andExtrinsic Evaluation. In Proceedings of the22nd Conference on Computational NaturalLanguage Learning, pages 282–291. Associa-tion for Computational Linguistics.

Malika Aubakirova and Mohit Bansal. 2016. Interpreting Neural Networks to Improve Politeness Comprehension. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2035–2041. Association for Computational Linguistics.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv preprint arXiv:1409.0473v7.

Anthony Bau, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, and James Glass. 2018. Identifying and Controlling Important Neurons in Neural Machine Translation. arXiv preprint arXiv:1811.01157v1.


Rachel Bawden, Rico Sennrich, Alexandra Birch, and Barry Haddow. 2018. Evaluating Discourse Phenomena in Neural Machine Translation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1304–1313. Association for Computational Linguistics.

Yonatan Belinkov. 2018. On Internal Language Representations in Deep Learning: An Analysis of Machine Translation and Speech Recognition. Ph.D. thesis, Massachusetts Institute of Technology.

Yonatan Belinkov and Yonatan Bisk. 2018. Synthetic and Natural Noise Both Break Neural Machine Translation. In International Conference on Learning Representations (ICLR).

Yonatan Belinkov, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, and James Glass. 2017a. What do Neural Machine Translation Models Learn about Morphology? In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 861–872. Association for Computational Linguistics.

Yonatan Belinkov and James Glass. 2017. Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 2441–2451. Curran Associates, Inc.

Yonatan Belinkov, Lluís Màrquez, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, and James Glass. 2017b. Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1–10. Asian Federation of Natural Language Processing.

Jean-Philippe Bernardy. 2018. Can Recurrent Neural Networks Learn Nested Recursion? LiLT (Linguistic Issues in Language Technology), 16(1).

Arianna Bisazza and Clara Tump. 2018. The Lazy Encoder: A Fine-Grained Analysis of the Role of Morphology in Neural Machine Translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2871–2876. Association for Computational Linguistics.

Terra Blevins, Omer Levy, and Luke Zettlemoyer. 2018. Deep RNNs Encode Soft Hierarchical Syntax. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 14–19. Association for Computational Linguistics.

Mikael Bodén and Janet Wiles. 2002. On learning context-free and context-sensitive languages. IEEE Transactions on Neural Networks, 13(2):491–493.

Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 632–642. Association for Computational Linguistics.

Elia Bruni, Gemma Boleda, Marco Baroni, and Nam Khanh Tran. 2012. Distributional Semantics in Technicolor. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 136–145. Association for Computational Linguistics.

Gino Brunner, Yuyi Wang, Roger Wattenhofer, and Michael Weigelt. 2017. Natural Language Multitasking: Analyzing and Improving Syntactic Saliency of Hidden Representations. The 31st Annual Conference on Neural Information Processing (NIPS)—Workshop on Learning Disentangled Features: From Perception to Control.

Aljoscha Burchardt, Vivien Macketanz, Jon Dehdari, Georg Heigold, Jan-Thorsten Peter, and Philip Williams. 2017. A Linguistic Evaluation of Rule-Based, Phrase-Based, and Neural MT Engines. The Prague Bulletin of Mathematical Linguistics, 108(1):159–170.

Franck Burlot and François Yvon. 2017. Evaluating the morphological competence of Machine Translation Systems. In Proceedings of the Second Conference on Machine Translation, pages 43–55. Association for Computational Linguistics.

Mike Casey. 1996. The Dynamics of Discrete-Time Computation, with Application to Recurrent Neural Networks and Finite State Machine Extraction. Neural Computation, 8(6):1135–1178.

Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, and Lucia Specia. 2017. SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 1–14. Association for Computational Linguistics.

Rahma Chaabouni, Ewan Dunbar, Neil Zeghidour, and Emmanuel Dupoux. 2017. Learning weakly supervised multimodal phoneme embeddings. In Interspeech 2017.

Stephan K. Chalup and Alan D. Blair. 2003. Incremental Training of First Order Recurrent Neural Networks to Predict a Context-Sensitive Language. Neural Networks, 16(7):955–972.

Jonathan Chang, Sean Gerrish, Chong Wang, Jordan L. Boyd-Graber, and David M. Blei. 2009. Reading Tea Leaves: How Humans Interpret Topic Models. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 288–296. Curran Associates, Inc.

Hongge Chen, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, and Cho-Jui Hsieh. 2018a. Attacking visual language grounding with adversarial examples: A case study on neural image captioning. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2587–2597. Association for Computational Linguistics.

Xinchi Chen, Xipeng Qiu, Chenxi Zhu, Shiyu Wu, and Xuanjing Huang. 2015. Sentence Modeling with Gated Recursive Neural Network. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 793–798. Association for Computational Linguistics.

Yining Chen, Sorcha Gilroy, Andreas Maletti, Jonathan May, and Kevin Knight. 2018b. Recurrent Neural Networks as Weighted Language Recognizers. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2261–2271. Association for Computational Linguistics.

Minhao Cheng, Jinfeng Yi, Huan Zhang, Pin-Yu Chen, and Cho-Jui Hsieh. 2018. Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples. arXiv preprint arXiv:1803.01128v1.

Grzegorz Chrupała, Lieke Gelderloos, and Afra Alishahi. 2017. Representations of language in a model of visually grounded speech signal. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 613–622. Association for Computational Linguistics.

Ondřej Cífka and Ondřej Bojar. 2018. Are BLEU and Meaning Representation in Opposition? In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1362–1371. Association for Computational Linguistics.

Alexis Conneau, Germán Kruszewski, Guillaume Lample, Loïc Barrault, and Marco Baroni. 2018. What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2126–2136. Association for Computational Linguistics.

Robin Cooper, Dick Crouch, Jan van Eijck, Chris Fox, Josef van Genabith, Jan Jaspars, Hans Kamp, David Milward, Manfred Pinkal, Massimo Poesio, Steve Pulman, Ted Briscoe, Holger Maier, and Karsten Konrad. 1996. Using the framework. Technical report, The FraCaS Consortium.

Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Yonatan Belinkov, D. Anthony Bau, and James Glass. 2019a. What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI).

Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Yonatan Belinkov, and Stephan Vogel. 2017. Understanding and Improving Morphological Learning in the Neural Machine Translation Decoder. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 142–151. Asian Federation of Natural Language Processing.

Fahim Dalvi, Avery Nortonsmith, D. Anthony Bau, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, and James Glass. 2019b. NeuroX: A Toolkit for Analyzing Individual Neurons in Neural Networks. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI): Demonstrations Track.

Sreerupa Das, C. Lee Giles, and Guo-Zheng Sun. 1992. Learning Context-Free Grammars: Capabilities and Limitations of a Recurrent Neural Network with an External Stack Memory. In Proceedings of The Fourteenth Annual Conference of Cognitive Science Society, Indiana University, page 14.

Ishita Dasgupta, Demi Guo, Andreas Stuhlmüller, Samuel J. Gershman, and Noah D. Goodman. 2018. Evaluating Compositionality in Sentence Embeddings. arXiv preprint arXiv:1802.04302v2.

Dhanush Dharmaretnam and Alona Fyshe. 2018. The Emergence of Semantics in Neural Network Representations of Visual Information. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 776–780. Association for Computational Linguistics.

Yanzhuo Ding, Yang Liu, Huanbo Luan, and Maosong Sun. 2017. Visualizing and Understanding Neural Machine Translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1150–1159. Association for Computational Linguistics.

Finale Doshi-Velez and Been Kim. 2017. Towards a Rigorous Science of Interpretable Machine Learning. arXiv preprint arXiv:1702.08608v2.

Finale Doshi-Velez, Mason Kortz, Ryan Budish, Chris Bavitz, Sam Gershman, David O’Brien, Stuart Shieber, James Waldo, David Weinberger, and Alexandra Wood. 2017. Accountability of AI Under the Law: The Role of Explanation. Privacy Law Scholars Conference.

Jennifer Drexler and James Glass. 2017. Analysis of Audio-Visual Features for Unsupervised Speech Recognition. In International Workshop on Grounding Language Understanding.

Javid Ebrahimi, Daniel Lowd, and Dejing Dou. 2018a. On Adversarial Examples for Character-Level Neural Machine Translation. In Proceedings of the 27th International Conference on Computational Linguistics, pages 653–663. Association for Computational Linguistics.

Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. 2018b. HotFlip: White-Box Adversarial Examples for Text Classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 31–36. Association for Computational Linguistics.

Ali Elkahky, Kellie Webster, Daniel Andor, and Emily Pitler. 2018. A Challenge Set and Methods for Noun-Verb Ambiguity. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2562–2572. Association for Computational Linguistics.

Zied Elloumi, Laurent Besacier, Olivier Galibert, and Benjamin Lecouteux. 2018. Analyzing Learned Representations of a Deep ASR Performance Prediction Model. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 9–15. Association for Computational Linguistics.

Jeffrey L. Elman. 1989. Representation and Structure in Connectionist Models. University of California, San Diego, Center for Research in Language.


Jeffrey L. Elman. 1990. Finding Structure in Time. Cognitive Science, 14(2):179–211.

Jeffrey L. Elman. 1991. Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7(2–3):195–225.

Allyson Ettinger, Ahmed Elgohary, and Philip Resnik. 2016. Probing for semantic evidence of composition by means of simple classification tasks. In Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP, pages 134–139. Association for Computational Linguistics.

Manaal Faruqui, Yulia Tsvetkov, Pushpendre Rastogi, and Chris Dyer. 2016. Problems With Evaluation of Word Embeddings Using Word Similarity Tasks. In Proceedings of the 1st Workshop on Evaluating Vector Space Representations for NLP.

Manaal Faruqui, Yulia Tsvetkov, Dani Yogatama, Chris Dyer, and Noah A. Smith. 2015. Sparse Overcomplete Word Vector Representations. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1491–1500. Association for Computational Linguistics.

Shi Feng, Eric Wallace, Alvin Grissom II, Mohit Iyyer, Pedro Rodriguez, and Jordan Boyd-Graber. 2018. Pathologies of Neural Models Make Interpretations Difficult. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3719–3728. Association for Computational Linguistics.

Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, and Eytan Ruppin. 2002. Placing Search in Context: The Concept Revisited. ACM Transactions on Information Systems, 20(1):116–131.

Robert Frank, Donald Mathis, and William Badecker. 2013. The Acquisition of Anaphora by Simple Recurrent Networks. Language Acquisition, 20(3):181–227.

Cynthia Freeman, Jonathan Merriman, Abhinav Aggarwal, Ian Beaver, and Abdullah Mueen. 2018. Paying Attention to Attention: Highlighting Influential Samples in Sequential Analysis. arXiv preprint arXiv:1808.02113v1.

Alona Fyshe, Leila Wehbe, Partha P. Talukdar, Brian Murphy, and Tom M. Mitchell. 2015. A Compositional and Interpretable Semantic Space. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 32–41. Association for Computational Linguistics.

David Gaddy, Mitchell Stern, and Dan Klein. 2018. What’s Going On in Neural Constituency Parsers? An Analysis. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 999–1010. Association for Computational Linguistics.

J. Ganesh, Manish Gupta, and Vasudeva Varma. 2017. Interpretation of Semantic Tweet Representations. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, ASONAM ’17, pages 95–102, New York, NY, USA. ACM.

Ji Gao, Jack Lanchantin, Mary Lou Soffa, and Yanjun Qi. 2018. Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers. arXiv preprint arXiv:1801.04354v5.

Lieke Gelderloos and Grzegorz Chrupała. 2016. From phonemes to images: Levels of representation in a recurrent neural model of visually-grounded language learning. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 1309–1319, Osaka, Japan. The COLING 2016 Organizing Committee.

Felix A. Gers and Jürgen Schmidhuber. 2001. LSTM Recurrent Networks Learn Simple Context-Free and Context-Sensitive Languages. IEEE Transactions on Neural Networks, 12(6):1333–1340.


Daniela Gerz, Ivan Vulić, Felix Hill, Roi Reichart, and Anna Korhonen. 2016. SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2173–2182. Association for Computational Linguistics.

Hamidreza Ghader and Christof Monz. 2017. What does Attention in Neural Machine Translation Pay Attention to? In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 30–39. Asian Federation of Natural Language Processing.

Reza Ghaeini, Xiaoli Fern, and Prasad Tadepalli. 2018. Interpreting Recurrent and Attention-Based Neural Models: A Case Study on Natural Language Inference. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4952–4957. Association for Computational Linguistics.

Mario Giulianelli, Jack Harding, Florian Mohnert, Dieuwke Hupkes, and Willem Zuidema. 2018. Under the Hood: Using Diagnostic Classifiers to Investigate and Improve How Language Models Track Agreement Information. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 240–248. Association for Computational Linguistics.

Max Glockner, Vered Shwartz, and Yoav Goldberg. 2018. Breaking NLI Systems with Sentences that Require Simple Lexical Inferences. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 650–655. Association for Computational Linguistics.

Fréderic Godin, Kris Demuynck, Joni Dambre, Wesley De Neve, and Thomas Demeester. 2018. Explaining Character-Aware Neural Networks for Word-Level Prediction: Do They Discover Linguistic Rules? In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3275–3284. Association for Computational Linguistics.

Yoav Goldberg. 2017. Neural Network Methods for Natural Language Processing, volume 10 of Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers.

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press. http://www.deeplearningbook.org.

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative Adversarial Nets. In Advances in Neural Information Processing Systems, pages 2672–2680.

Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and Harnessing Adversarial Examples. In International Conference on Learning Representations (ICLR).

Kristina Gulordava, Piotr Bojanowski, Edouard Grave, Tal Linzen, and Marco Baroni. 2018. Colorless Green Recurrent Networks Dream Hierarchically. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1195–1205. Association for Computational Linguistics.

Abhijeet Gupta, Gemma Boleda, Marco Baroni, and Sebastian Padó. 2015. Distributional vectors encode referential attributes. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 12–21. Association for Computational Linguistics.

Pankaj Gupta and Hinrich Schütze. 2018. LISA: Explaining Recurrent Neural Network Judgments via Layer-wIse Semantic Accumulation and Example to Pattern Transformation. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 154–164. Association for Computational Linguistics.

Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel Bowman, and Noah A. Smith. 2018. Annotation Artifacts in Natural Language Inference Data. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 107–112. Association for Computational Linguistics.

Catherine L. Harris. 1990. Connectionism and Cognitive Linguistics. Connection Science, 2(1–2):7–33.

David Harwath and James Glass. 2017. Learning Word-Like Units from Joint Audio-Visual Analysis. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 506–517. Association for Computational Linguistics.

Georg Heigold, Günter Neumann, and Josef van Genabith. 2018. How Robust Are Character-Based Word Embeddings in Tagging and MT Against Wrod Scramlbing or Randdm Nouse? In Proceedings of the 13th Conference of The Association for Machine Translation in the Americas (Volume 1: Research Track), pages 68–79.

Felix Hill, Roi Reichart, and Anna Korhonen. 2015. SimLex-999: Evaluating Semantic Models with (Genuine) Similarity Estimation. Computational Linguistics, 41(4):665–695.

Dieuwke Hupkes, Sara Veldhoen, and Willem Zuidema. 2018. Visualisation and ‘‘diagnostic classifiers’’ reveal how recurrent and recursive neural networks process hierarchical structure. Journal of Artificial Intelligence Research, 61:907–926.

Pierre Isabelle, Colin Cherry, and George Foster. 2017. A Challenge Set Approach to Evaluating Machine Translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2486–2496. Association for Computational Linguistics.

Pierre Isabelle and Roland Kuhn. 2018. A Challenge Set for French→English Machine Translation. arXiv preprint arXiv:1806.02725v2.

Hitoshi Isahara. 1995. JEIDA’s test-sets for quality evaluation of MT systems—technical evaluation from the developer’s point of view. In Proceedings of MT Summit V.

Mohit Iyyer, John Wieting, Kevin Gimpel, and Luke Zettlemoyer. 2018. Adversarial Example Generation with Syntactically Controlled Paraphrase Networks. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1875–1885. Association for Computational Linguistics.

Alon Jacovi, Oren Sar Shalom, and Yoav Goldberg. 2018. Understanding Convolutional Neural Networks for Text Classification. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 56–65. Association for Computational Linguistics.

Inigo Jauregi Unanue, Ehsan Zare Borzeshi, and Massimo Piccardi. 2018. A Shared Attention Mechanism for Interpretation of Neural Automatic Post-Editing Systems. In Proceedings of the 2nd Workshop on Neural Machine Translation and Generation, pages 11–17. Association for Computational Linguistics.

Robin Jia and Percy Liang. 2017. Adversarial examples for evaluating reading comprehension systems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2021–2031. Association for Computational Linguistics.

Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu. 2016. Exploring the Limits of Language Modeling. arXiv preprint arXiv:1602.02410v2.

Ákos Kádár, Grzegorz Chrupała, and Afra Alishahi. 2017. Representation of Linguistic Form and Function in Recurrent Neural Networks. Computational Linguistics, 43(4):761–780.

Andrej Karpathy, Justin Johnson, and Fei-Fei Li. 2015. Visualizing and Understanding Recurrent Networks. arXiv preprint arXiv:1506.02078v2.

Urvashi Khandelwal, He He, Peng Qi, and Dan Jurafsky. 2018. Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 284–294. Association for Computational Linguistics.

Margaret King and Kirsten Falkedal. 1990. Using Test Suites in Evaluation of Machine Translation Systems. In COLING 1990 Volume 2: Papers Presented to the 13th International Conference on Computational Linguistics.

Eliyahu Kiperwasser and Yoav Goldberg. 2016. Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations. Transactions of the Association for Computational Linguistics, 4:313–327.

Sungryong Koh, Jinee Maeng, Ji-Young Lee, Young-Sook Chae, and Key-Sun Choi. 2001. A test suite for evaluation of English-to-Korean machine translation systems. In MT Summit Conference.

Arne Köhn. 2015. What’s in an Embedding? Analyzing Word Embeddings through Multilingual Evaluation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2067–2073, Lisbon, Portugal. Association for Computational Linguistics.

Volodymyr Kuleshov, Shantanu Thakoor, Tingfung Lau, and Stefano Ermon. 2018. Adversarial Examples for Natural Language Classification Problems.

Brenden Lake and Marco Baroni. 2018. Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 2873–2882, Stockholmsmässan, Stockholm, Sweden. PMLR.

Sabine Lehmann, Stephan Oepen, Sylvie Regnier-Prost, Klaus Netter, Veronika Lux, Judith Klein, Kirsten Falkedal, Frederik Fouvry, Dominique Estival, Eva Dauphin, Hervé Compagnion, Judith Baur, Lorna Balkan, and Doug Arnold. 1996. TSNLP—Test Suites for Natural Language Processing. In COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics.

Tao Lei, Regina Barzilay, and Tommi Jaakkola. 2016. Rationalizing Neural Predictions. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 107–117. Association for Computational Linguistics.

Ira Leviant and Roi Reichart. 2015. Separated by an Un-Common Language: Towards Judgment Language Informed Vector Space Modeling. arXiv preprint arXiv:1508.00106v5.

Jiwei Li, Xinlei Chen, Eduard Hovy, and Dan Jurafsky. 2016a. Visualizing and Understanding Neural Models in NLP. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 681–691. Association for Computational Linguistics.

Jiwei Li, Will Monroe, and Dan Jurafsky. 2016b. Understanding Neural Networks through Representation Erasure. arXiv preprint arXiv:1612.08220v3.

Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, and Wenchang Shi. 2018. Deep Text Classification Can Be Fooled. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 4208–4215. International Joint Conferences on Artificial Intelligence Organization.

Tal Linzen, Emmanuel Dupoux, and Yoav Goldberg. 2016. Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies. Transactions of the Association for Computational Linguistics, 4:521–535.

Zachary C. Lipton. 2016. The Mythos of Model Interpretability. In ICML Workshop on Human Interpretability of Machine Learning.

Nelson F. Liu, Omer Levy, Roy Schwartz, Chenhao Tan, and Noah A. Smith. 2018. LSTMs Exploit Linguistic Attributes of Data. In Proceedings of The Third Workshop on Representation Learning for NLP, pages 180–186. Association for Computational Linguistics.

Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. 2017. Delving into Transferable Adversarial Examples and Black-Box Attacks. In International Conference on Learning Representations (ICLR).

Thang Luong, Richard Socher, and Christopher Manning. 2013. Better Word Representations with Recursive Neural Networks for Morphology. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 104–113. Association for Computational Linguistics.

Jean Maillard and Stephen Clark. 2018. Latent Tree Learning with Differentiable Parsers: Shift-Reduce Parsing and Chart Parsing. In Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP, pages 13–18. Association for Computational Linguistics.

Marco Marelli, Luisa Bentivogli, Marco Baroni, Raffaella Bernardi, Stefano Menini, and Roberto Zamparelli. 2014. SemEval-2014 Task 1: Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entailment. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 1–8. Association for Computational Linguistics.

R. Thomas McCoy, Robert Frank, and Tal Linzen. 2018. Revisiting the poverty of the stimulus: Hierarchical generalization without a hierarchical bias in recurrent neural networks. In Proceedings of the 40th Annual Conference of the Cognitive Science Society.

Risto Miikkulainen and Michael G. Dyer. 1991. Natural Language Processing with Modular PDP Networks and Distributed Lexicon. Cognitive Science, 15(3):343–399.

Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Černocký, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Eleventh Annual Conference of the International Speech Communication Association.

Yao Ming, Shaozu Cao, Ruixiang Zhang, Zhen Li, Yuanzhe Chen, Yangqiu Song, and Huamin Qu. 2017. Understanding Hidden Memories of Recurrent Neural Networks. In IEEE Conference on Visual Analytics Science and Technology (IEEE VAST 2017).

Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. 2018. Methods for interpreting and understanding deep neural networks. Digital Signal Processing, 73:1–15.

Pramod Kaushik Mudrakarta, Ankur Taly, Mukund Sundararajan, and Kedar Dhamdhere. 2018. Did the Model Understand the Question? In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1896–1906. Association for Computational Linguistics.

James Mullenbach, Sarah Wiegreffe, Jon Duke, Jimeng Sun, and Jacob Eisenstein. 2018. Explainable Prediction of Medical Codes from Clinical Text. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1101–1111. Association for Computational Linguistics.

W. James Murdoch, Peter J. Liu, and Bin Yu. 2018. Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs. In International Conference on Learning Representations.

Brian Murphy, Partha Talukdar, and Tom Mitchell. 2012. Learning Effective and Interpretable Semantic Models Using Non-Negative Sparse Embedding. In Proceedings of COLING 2012, pages 1933–1950. The COLING 2012 Organizing Committee.

Tasha Nagamine, Michael L. Seltzer, and Nima Mesgarani. 2015. Exploring How Deep Neural Networks Form Phonemic Categories. In Interspeech 2015.

Tasha Nagamine, Michael L. Seltzer, and Nima Mesgarani. 2016. On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models. In Interspeech 2016, pages 803–807.

Aakanksha Naik, Abhilasha Ravichander, Norman Sadeh, Carolyn Rosé, and Graham Neubig. 2018. Stress Test Evaluation for Natural Language Inference. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2340–2353. Association for Computational Linguistics.

Nina Narodytska and Shiva Kasiviswanathan. 2017. Simple Black-Box Adversarial Attacks on Deep Neural Networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1310–1318.


Lars Niklasson and Fredrik Linaker. 2000. Distributed representations for extended syntactic transformation. Connection Science, 12(3–4):299–314.

Tong Niu and Mohit Bansal. 2018. Adversarial Over-Sensitivity and Over-Stability Strategies for Dialogue Models. In Proceedings of the 22nd Conference on Computational Natural Language Learning, pages 486–496. Association for Computational Linguistics.

Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. 2016. Transferability in Machine Learning: From Phenomena to Black-Box Attacks Using Adversarial Samples. arXiv preprint arXiv:1605.07277v1.

Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. 2017. Practical Black-Box Attacks Against Machine Learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ASIA CCS ’17, pages 506–519, New York, NY, USA. ACM.

Nicolas Papernot, Patrick McDaniel, Ananthram Swami, and Richard Harang. 2016. Crafting Adversarial Input Sequences for Recurrent Neural Networks. In Military Communications Conference, MILCOM 2016, pages 49–54. IEEE.

Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, and Marcus Rohrbach. 2018. Multimodal Explanations: Justifying Decisions and Pointing to the Evidence. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Sungjoon Park, JinYeong Bak, and Alice Oh. 2017. Rotated Word Vector Representations and Their Interpretability. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 401–411. Association for Computational Linguistics.

Matthew Peters, Mark Neumann, Luke Zettlemoyer, and Wen-tau Yih. 2018. Dissecting Contextual Word Embeddings: Architecture and Representation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1499–1509. Association for Computational Linguistics.

Adam Poliak, Aparajita Haldar, Rachel Rudinger, J. Edward Hu, Ellie Pavlick, Aaron Steven White, and Benjamin Van Durme. 2018a. Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 67–81. Association for Computational Linguistics.

Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger, and Benjamin Van Durme. 2018b. Hypothesis Only Baselines in Natural Language Inference. In Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics, pages 180–191. Association for Computational Linguistics.

Jordan B. Pollack. 1990. Recursive distributed representations. Artificial Intelligence, 46(1):77–105.

Peng Qian, Xipeng Qiu, and Xuanjing Huang. 2016a. Analyzing Linguistic Knowledge in Sequential Model of Sentence. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 826–835, Austin, Texas. Association for Computational Linguistics.

Peng Qian, Xipeng Qiu, and Xuanjing Huang. 2016b. Investigating Language Universal and Specific Properties in Word Embeddings. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1478–1488, Berlin, Germany. Association for Computational Linguistics.

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2018. Semantically Equivalent Adversarial Rules for Debugging NLP models. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 856–865. Association for Computational Linguistics.

Matīss Rikters. 2018. Debugging Neural Machine Translations. arXiv preprint arXiv:1808.02733v1.


Annette Rios Gonzales, Laura Mascarell, and Rico Sennrich. 2017. Improving Word Sense Disambiguation in Neural Machine Translation with Sense Embeddings. In Proceedings of the Second Conference on Machine Translation, pages 11–19. Association for Computational Linguistics.

Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, and Phil Blunsom. 2016. Reasoning about Entailment with Neural Attention. In International Conference on Learning Representations (ICLR).

Andras Rozsa, Ethan M. Rudd, and Terrance E. Boult. 2016. Adversarial Diversity and Hard Positive Generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 25–32.

Rachel Rudinger, Jason Naradowsky, Brian Leonard, and Benjamin Van Durme. 2018. Gender Bias in Coreference Resolution. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 8–14. Association for Computational Linguistics.

D. E. Rumelhart and J. L. McClelland. 1986. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, volume 2, chapter On Learning the Past Tenses of English Verbs, pages 216–271. MIT Press, Cambridge, MA, USA.

Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. A Neural Attention Model for Abstractive Sentence Summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 379–389. Association for Computational Linguistics.

Keisuke Sakaguchi, Kevin Duh, Matt Post, and Benjamin Van Durme. 2017. Robsut Wrod Reocginiton via Semi-Character Recurrent Neural Network. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pages 3281–3287. AAAI Press.

Suranjana Samanta and Sameep Mehta. 2017. Towards Crafting Text Adversarial Samples. arXiv preprint arXiv:1707.02812v1.

Ivan Sanchez, Jeff Mitchell, and Sebastian Riedel. 2018. Behavior Analysis of NLI Models: Uncovering the Influence of Three Factors on Robustness. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1975–1985. Association for Computational Linguistics.

Motoki Sato, Jun Suzuki, Hiroyuki Shindo, and Yuji Matsumoto. 2018. Interpretable Adversarial Perturbation in Input Embedding Space for Text. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 4323–4330. International Joint Conferences on Artificial Intelligence Organization.

Lutfi Kerem Senel, Ihsan Utlu, Veysel Yucesoy, Aykut Koc, and Tolga Cukur. 2018. Semantic Structure and Interpretability of Word Embeddings. IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Rico Sennrich. 2017. How Grammatical Is Character-Level Neural Machine Translation? Assessing MT Quality with Contrastive Translation Pairs. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 376–382. Association for Computational Linguistics.

Haoyue Shi, Jiayuan Mao, Tete Xiao, Yuning Jiang, and Jian Sun. 2018. Learning Visually-Grounded Semantics from Contrastive Adversarial Samples. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3715–3727. Association for Computational Linguistics.

Xing Shi, Kevin Knight, and Deniz Yuret. 2016a. Why Neural Translations are the Right Length. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2278–2282. Association for Computational Linguistics.

Xing Shi, Inkit Padhi, and Kevin Knight. 2016b. Does String-Based Neural MT Learn Source Syntax? In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1526–1534, Austin, Texas. Association for Computational Linguistics.

Chandan Singh, W. James Murdoch, and Bin Yu. 2018. Hierarchical interpretations for neural network predictions. arXiv preprint arXiv:1806.05337v1.

Hendrik Strobelt, Sebastian Gehrmann, Michael Behrisch, Adam Perer, Hanspeter Pfister, and Alexander M. Rush. 2018a. Seq2Seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models. arXiv preprint arXiv:1804.09299v1.

Hendrik Strobelt, Sebastian Gehrmann, Hanspeter Pfister, and Alexander M. Rush. 2018b. LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks. IEEE Transactions on Visualization and Computer Graphics, 24(1):667–676.

Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. Axiomatic Attribution for Deep Networks. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 3319–3328, International Convention Centre, Sydney, Australia. PMLR.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to Sequence Learning with Neural Networks. In Advances in Neural Information Processing Systems, pages 3104–3112.

Mirac Suzgun, Yonatan Belinkov, and Stuart M. Shieber. 2019. On Evaluating the Generalization of LSTM Models in Formal Languages. In Proceedings of the Society for Computation in Linguistics (SCiL).

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2014. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR).

Gongbo Tang, Rico Sennrich, and Joakim Nivre. 2018. An Analysis of Attention Mechanisms: The Case of Word Sense Disambiguation in Neural Machine Translation. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 26–35. Association for Computational Linguistics.

Yi Tay, Anh Tuan Luu, and Siu Cheung Hui. 2018. CoupleNet: Paying Attention to Couples with Coupled Attention for Relationship Recommendation. In Proceedings of the Twelfth International AAAI Conference on Web and Social Media (ICWSM).

Ke Tran, Arianna Bisazza, and Christof Monz. 2018. The Importance of Being Recurrent for Modeling Hierarchical Structure. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4731–4736. Association for Computational Linguistics.

Eva Vanmassenhove, Jinhua Du, and Andy Way. 2017. Investigating ‘‘Aspect’’ in NMT and SMT: Translating the English Simple Past and Present Perfect. Computational Linguistics in the Netherlands Journal, 7:109–128.

Sara Veldhoen, Dieuwke Hupkes, and Willem Zuidema. 2016. Diagnostic Classifiers: Revealing How Neural Networks Process Hierarchical Structure. In CEUR Workshop Proceedings.

Elena Voita, Pavel Serdyukov, Rico Sennrich, and Ivan Titov. 2018. Context-Aware Neural Machine Translation Learns Anaphora Resolution. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1264–1274. Association for Computational Linguistics.

Ekaterina Vylomova, Trevor Cohn, Xuanli He, and Gholamreza Haffari. 2016. Word Representation Models for Morphologically Rich Languages in Neural Machine Translation. arXiv preprint arXiv:1606.04217v1.

Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2018a. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. arXiv preprint arXiv:1804.07461v1.

Shuai Wang, Yanmin Qian, and Kai Yu. 2017a. What Does the Speaker Embedding Encode? In Interspeech 2017, pages 1497–1501.


Xinyi Wang, Hieu Pham, Pengcheng Yin, and Graham Neubig. 2018b. A Tree-Based Decoder for Neural Machine Translation. In Conference on Empirical Methods in Natural Language Processing (EMNLP), Brussels, Belgium.

Yu-Hsuan Wang, Cheng-Tao Chung, and Hung-yi Lee. 2017b. Gate Activation Signal Analysis for Gated Recurrent Neural Networks and Its Correlation with Phoneme Boundaries. In Interspeech 2017.

Gail Weiss, Yoav Goldberg, and Eran Yahav. 2018. On the Practical Computational Power of Finite Precision RNNs for Language Recognition. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 740–745. Association for Computational Linguistics.

Adina Williams, Andrew Drozdov, and Samuel R. Bowman. 2018. Do latent tree learning models identify meaningful structure in sentences? Transactions of the Association for Computational Linguistics, 6:253–267.

Zhizheng Wu and Simon King. 2016. Investigating gated recurrent networks for speech synthesis. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5140–5144. IEEE.

Puyudi Yang, Jianbo Chen, Cho-Jui Hsieh, Jane-Ling Wang, and Michael I. Jordan. 2018. Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data. arXiv preprint arXiv:1805.12316v1.

Wenpeng Yin, Hinrich Schütze, Bing Xiang, and Bowen Zhou. 2016. ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs. Transactions of the Association for Computational Linguistics, 4:259–272.

Xiaoyong Yuan, Pan He, Qile Zhu, and Xiaolin Li. 2017. Adversarial Examples: Attacks and Defenses for Deep Learning. arXiv preprint arXiv:1712.07107v3.

Omar Zaidan, Jason Eisner, and Christine Piatko. 2007. Using ‘‘Annotator Rationales’’ to Improve Machine Learning for Text Categorization. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, pages 260–267. Association for Computational Linguistics.

Quan-shi Zhang and Song-chun Zhu. 2018. Visual interpretability for deep learning: A survey. Frontiers of Information Technology & Electronic Engineering, 19(1):27–39.

Ye Zhang, Iain Marshall, and Byron C. Wallace. 2016. Rationale-Augmented Convolutional Neural Networks for Text Classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 795–804. Association for Computational Linguistics.

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. 2018a. Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 15–20. Association for Computational Linguistics.

Junbo Zhao, Yoon Kim, Kelly Zhang, Alexander Rush, and Yann LeCun. 2018b. Adversarially Regularized Autoencoders. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 5902–5911, Stockholmsmässan, Stockholm, Sweden. PMLR.

Zhengli Zhao, Dheeru Dua, and Sameer Singh. 2018c. Generating Natural Adversarial Examples. In International Conference on Learning Representations.
