Information Retrieval and the Philosophy of Language

D. C. BLAIR
The University of Michigan, Ann Arbor, Michigan, 48109-1234

This discussion takes the position that information retrieval systems are fundamentally linguistic in nature - in essence, the languages of document representation and searching are dialects of natural language. Because of this, the discipline of the Philosophy of Language should have some bearing on the problems of document representation and search query formulation. The philosophies of Austin, Searle, Grice and Wittgenstein are briefly examined and their relevance to information retrieval theory is discussed.

Received December 1991

Information Retrieval systems are fundamentally linguistic: the content or context of documents must be described, and the inquirers' needs for documents must be expressed. These descriptions and expressions are most frequently articulated in free or controlled vocabularies that have some of the same characteristics as natural language. As a consequence, the processes of document description or request formulation must be strongly related to the processes of description and inquiry in natural language. This is not an original observation: the fields of linguistics and formal logic have had an early and continuing impact on Information Retrieval theory [see, e.g. Cooper, Sparck Jones, and more recently, van Rijsbergen].10,25,28 But recently Information Retrieval theory has begun to be influenced by past and present theory in the Philosophy of Language.5,13 The rationale for this shift is clear: the more we understand about how language works, the better we can understand how to describe and ask for documents. The Philosophy of Language, though a difficult, and at times obscure, discipline, has a lot to say about how language works. It is the purpose of this paper to show, first, how the Philosophy of Language has already helped Information Retrieval research, and, second, how it may continue to do so. To this end, this paper will discuss three specific Philosophies of Language and how they relate to Information Retrieval research: the theory of Illocutionary Acts of Austin and Searle; Grice's theory of conversation; and Wittgenstein's Ordinary Language Philosophy.

Information Retrieval and Illocutionary Acts

Traditionally, the Philosophy of Language has been primarily concerned with the propositional content of language - the nature of factual discourse. Most battles in the Philosophy of Language have been waged over the grounds of truth, that is, how it is that we can assert that certain things are, in fact, the case. Although the evaluation of the propositional content of language has had an influence on the design of computerized information systems - primarily 'fact retrieval' or question-answering systems - it has not had a significant influence on information retrieval design [an exception to this is Cooper].10 But language is not solely concerned with factual assertions. John Austin showed that there was a class of linguistic acts that are not governed primarily by truth conditions.2 In a large number of cases language could be used to do things: that is, instead of merely describing what had occurred, language could also be used to make things happen; Austin called this the performative nature of language. I can say:

(1) a. I'll pay you right back.
    b. I name this ship the 'Norton Sound'.
    c. Finish the report before tomorrow's meeting.
    d. Bill's a better worker than Bob.

When we say things like this (make a promise, christen a ship, give an order, make an evaluation) we aren't so much talking about something; we are in fact doing something with our statement. If a reasonable individual in a normal situation promises to meet you tomorrow for lunch, then by virtue of that statement, he has made a promise. It would make no sense to ask, 'Did he really make a promise?' Of course, he could break his promise, but that would not change the fact that a promise had been made. What makes the promise, or any other performative, 'work' is a set of 'felicity' conditions - a set of normal circumstances - that are presupposed by the performative act. For example, the individual must be the kind of person who can promise (e.g. an infant can't promise), the circumstances must be appropriate for the promise (e.g. I can't promise you something if I am all alone and no one hears me), I can't promise what is beyond my power (e.g. that I can make you the Duke of Kent), etc. Performatives work within the broad context of personal and social conventions, and often require an institutional context in which to be successful. For example, only a minister can declare a man and woman 'man and wife'; and this can only take place in an appropriate ceremony, with a prescribed number of witnesses, following local and federal statutes that specify who is legally eligible to marry. Further, the man and woman must be willing to marry each other, etc.

Some illocutionary acts are very formal and quite institutionally dependent (e.g. legal contracts, or orders within a 'chain of command' in the military) while others are less formal (e.g. ordering a drink at a bar, or promising to have lunch with a friend). Nevertheless, the most important characteristic of illocutionary acts is that they, by virtue of being uttered in the appropriate circumstances, cause something to happen (a promise to be made, an order to be given, etc.).

200 THE COMPUTER JOURNAL, VOL. 35, NO. 3, 1992

A taxonomy of Illocutionary Acts

Although both Austin and Searle proposed that Illocutionary Acts fell into several specific categories, Searle's taxonomy, while similar to Austin's both in number and type, has had the greater impact on recent linguistic theory and information system design (Searle calls these acts 'Speech Acts', but the descriptions 'Speech Acts' and 'Illocutionary Acts' can be used interchangeably). The principal difference between Austin's and Searle's classification of Illocutionary Acts is, according to Searle,23 that Austin's classification is really a classification of illocutionary verbs, while Searle's is a classification of illocutionary acts. Further, while Austin was uncomfortable grouping his illocutionary verbs into categories, Searle has been quite insistent that his classification is definitive and final. In fact, Searle limited the number of different illocutionary acts in direct response to Wittgenstein's insistence that there are 'countless' kinds of sentences, or what he called Language Games. While it is not clear that Illocutionary Acts and Language Games are the same kind of phenomenon, Searle felt that they were similar enough to take Wittgenstein's claim about Language Games as being, ceteris paribus, applicable to Illocutionary Acts, too.

Briefly, according to Searle, there are the following kinds of Illocutionary Acts:

• Assertives: in which we tell others (truly or falsely) how things are. E.g. 'Bill was at Mary's party last Friday.'

• Directives: in which we attempt to get others to do things. E.g. 'All hands on deck!'

• Commissives: in which we commit ourselves to doing specific things. E.g. 'I'll send the report to you tomorrow.'

• Declarations: in which we bring about changes in our world by our utterance - in short, saying makes it so. E.g. 'I now pronounce you husband and wife.'

• Expressives: in which we express our personal feelings and attitudes. E.g. 'You did a great job!'

Searle noticed that there is some overlap between Assertives and Declarations, and proposed a sixth category, or sub-category, called 'assertive declarations'. These are Declarations that are based on the assertion that something is or is not the case. For example, when the referee in a basketball game cries 'Foul!', he declares a penalty and asserts that some specific event has occurred. The difference between this category and 'pure' declarations is that for the Assertive Declaration the speaker can lie, but for the 'pure' declaration the speaker cannot lie. At the level of the categories themselves, the principal difference between Austin's and Searle's classification is that Austin has no category that corresponds to Searle's 'Assertives'. There are rough parallels between the other categories, though.

Illocutionary Acts and Information Retrieval

Blair5 has proposed a document indexing structure based on Austin's taxonomy of Illocutionary Acts, but the theory of Illocutionary Acts has had its greatest influence in electronic messaging systems. Such systems, since they usually involve the transmission of textual information, can be thought of as a special case of information retrieval. The best-known application of Illocutionary Acts to electronic messaging is the COORDINATOR system.12,30-31 The structure of the COORDINATOR is based on the observation that messages are not always independent, but are often parts of 'conversations' that, together, carry out some activity. If the 'activities' of language can be placed under five well-defined Illocutionary Acts, as Searle believes, then, by inference, messages or documents can be similarly classified (of course, a message or document may comprise more than one Illocutionary Act, which means that such classification may not always be straightforward). What the COORDINATOR does is to structure electronic messages according to the Illocutionary Act under which each falls (it also urges the sender to limit his/her messages to those that perform a single Illocutionary Act). For example, if an individual makes a request, he identifies his message as a 'request' and the system then imposes a general 'request structure' on that and all subsequent messages that are part of this transaction. That is, individual A may make a request to B, who, in turn, may promise to fulfil A's request, or make a counter-offer of how he, B, would prefer to fulfil A's request. A, then, could accept B's offer or make a counter-offer of his own, putting the onus on B to continue the dialogue. Of course, at any time one or both of the individuals can cancel the dialogue, or conclude it. The COORDINATOR contains procedures to manage this interchange efficiently. It will prompt the initiator of a request for a 'respond by' date and/or a 'completion' date, and then prompt the receiver of the request to reply or finish by that date. It also organizes and presents the ongoing 'conversations' (requests, promises, offers, what if, or questions) in a structured format so that the individual using the system can see exactly how each of his conversational activities stands in terms of completion or need for further dialogue. An individual could be presented with a screen on his computer/workstation that summarizes both the responses due to him and the responses he is required to make. It can also present a complete history of all the messages exchanged in a given 'conversation'.
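The request 'conversation' described above can be pictured as a small state machine. The following Python sketch is offered only as an illustration of that idea; it is not the COORDINATOR's actual design, and the state names, move names and transition table are assumptions introduced here.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    REQUESTED = "requested"    # A has asked B for something
    PROMISED = "promised"      # B has committed to fulfil the request
    COUNTERED = "countered"    # one party has proposed different terms
    COMPLETED = "completed"    # the requested action has been carried out
    CANCELLED = "cancelled"    # either party has withdrawn


# Hypothetical transition table: (current state, move) -> next state.
TRANSITIONS = {
    (State.REQUESTED, "promise"): State.PROMISED,
    (State.REQUESTED, "counter"): State.COUNTERED,
    (State.COUNTERED, "accept"): State.PROMISED,
    (State.COUNTERED, "counter"): State.COUNTERED,
    (State.PROMISED, "complete"): State.COMPLETED,
}
OPEN_STATES = {State.REQUESTED, State.PROMISED, State.COUNTERED}


@dataclass
class Conversation:
    initiator: str
    recipient: str
    respond_by: str                      # ISO date string, e.g. "1992-03-01"
    state: State = State.REQUESTED
    history: list = field(default_factory=list)

    def move(self, actor: str, action: str) -> State:
        """Apply one message to the conversation and record it in the history."""
        if action == "cancel" and self.state in OPEN_STATES:
            new_state = State.CANCELLED  # either party may cancel while open
        else:
            key = (self.state, action)
            if key not in TRANSITIONS:
                raise ValueError(
                    f"'{action}' is not allowed in state {self.state.value}"
                )
            new_state = TRANSITIONS[key]
        self.history.append((actor, action))
        self.state = new_state
        return new_state


# A requests, B counter-offers, A accepts, B completes.
conv = Conversation(initiator="A", recipient="B", respond_by="1992-03-01")
conv.move("B", "counter")
conv.move("A", "accept")
conv.move("B", "complete")
```

The recorded history plays the role of the 'complete history of all the messages exchanged', and the current state indicates whose response is outstanding; a real system would also have to act on the 'respond by' date, which this sketch merely stores.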

A more ambitious application of Speech Act Theory to electronic messaging is the work of Kimbrough et al.16 Kimbrough has been working on a Formal Language for Business Communication originally based on the illocutionary logic of Searle and Vanderveken.24 Kimbrough has since abandoned illocutionary logic and has developed a method for capturing the propositional attitudes of Speech Act Theory in first-order predicate logic.17 This modelling language has been applied primarily to electronic messaging systems.

While electronic messages are often thought of as 'ephemeral' documents and are not routinely kept for any length of time, electronic messaging systems increasingly are being used to undertake transactions which may be important to keep. Consequently, it will be necessary to store the messages/documents created by them in a more traditional information retrieval system. In that case, it would be useful if the document representations were to be based on the Illocutionary Acts that were used to structure them when they were messages. That is, the taxonomy of Illocutionary Acts becomes a classification scheme for the representation of performative messages, and the links that were made with other messages that were used to perform the same act become useful ways of clustering the messages when they are stored as documents. The description of the subject content of these documents, as well as the recording of important contextual information (such as the author's name, date the message was written, etc.) may still remain important parts of the document representation, but the taxonomy of Illocutionary Acts and the linking of documents according to their mutual participation in a particular activity are important additions to our theories of how to represent documents for retrieval.

Traditionally, Information Retrieval theory has been applied primarily to scholarly articles and related journal communication; but even this kind of organization can often be seen as implicitly following certain types of Illocutionary Acts. For example, the citations at the end of a scholarly article are often used as links to previous articles that are, in the author's opinion, relevant to his/her work. The inclusion of a citation in the bibliography of the article is, from an illocutionary viewpoint, an 'assertive declaration'; that is, the author declares that the citation(s) are part of the bibliography of the article, and asserts that the articles they refer to are relevant to the citing article. It is also clear that scholarly articles routinely contain expressives (e.g. evaluations of previous research), directives (e.g. 'see' or 'see-also' references), and even commissives (e.g. 'This work will be continued in a later paper').

Future applications of Illocutionary Acts

There are several ways in which the basic notion of Illocutionary Acts can be expanded to assist in information retrieval. In the first place, the application of Illocutionary Acts as an organizing principle for information retrieval should not be limited to Searle's taxonomy. Searle's rigid classification of Illocutionary Acts, it must be remembered, is primarily useful for maintaining distinctions important to the Philosophy of Language. Such distinctions may not be as crucial for information retrieval. In fact, casual inspection quickly shows that, for example, not all promises are the same. A promise between two business associates to have lunch is clearly not the same as a promise (that is, a contract) by a company to provide specific products or services to a client. The latter would be, at the very least, a binding legal obligation, while the former would not be. As a result, if messages were used to carry out each of the above activities, they would certainly require different kinds of information to represent those documents. For example, to execute a business contract, specific documents must be drawn up, and certain empowered individuals must authorize or witness the execution of them. Further, the messages that document a legal contract may be used quite differently than a message promising to have lunch with someone. The documents used to execute the contract would probably be used to demonstrate that the contract had been executed in good faith and according to proper, binding procedures. The message promising to have lunch might only be used to establish that two individuals probably met at a particular point in time, but could not indicate much beyond that. Both could be important from a legal point of view, and both are promises, in the strict sense; but the structure of the acts which they perform differs markedly. This difference would need to be taken into consideration in the representational scheme of the information retrieval system that manages them. Similar variety could be found in the other categories of Illocutionary Acts (Assertives, Directives, Declarations and Expressives), and even among philosophers there is no consensus that Searle's taxonomy is final (Bach and Harnish3 give a detailed description of this variety, breaking each category down into more specific types). The originator of Illocutionary Act theory, John Austin,2 believed that even his own classification of such acts was very rough and not rigorous, and Wittgenstein,32-34 as we have mentioned, believed that no such segmentation of linguistic activity was possible. There are also indications that linguists see complexity where Searle sees simplicity.5,19

In spite of the limitations of Searle's taxonomy, it still offers a new perspective on how to represent documents - specifically, that documents, like language in general, can be used to perform certain types of acts, and that it might be useful to use a performative taxonomy to represent documents for retrieval (although there certainly remains a great deal to be done working out the details of an indexing or representation scheme based on Illocutionary Acts). It also points to the importance of information external to the text of a document for representing or describing that document for retrieval. Traditionally, document representation has been thought of primarily as representing the intellectual content of documents - what they are about. The theory of Illocutionary Acts, when applied to information retrieval, shows that documents not only have a content, but they have a use - in fact, they may have multiple uses (for example, a document could be used to execute a contract, but could later be used as evidence to support the assertion that the contract was valid); and these uses of a document may not be entirely deducible from the text of that document. The reason for this is that the successful performance of an illocutionary act is predicated on the satisfaction of certain more or less specific felicity conditions, and usually the satisfaction of the felicity conditions is part of the milieu of the document, not part of its text. For example, the drawing up and signing of a contract (promise) does not necessarily execute the contract, because only certain individuals are empowered to execute the contract. In this case, an individual can be said, in effect, to promise something, whereas it could be shown in a court of law that although he made the 'right noises' he did not actually make a promise (since he was not empowered to do so). Since the satisfaction of felicity conditions is an important precondition for the execution of an Illocutionary Act, it stands to reason that the felicity conditions themselves may also play an important role in any document representation scheme based on Illocutionary Acts (for example, if only specific, empowered individuals are allowed to execute contracts, then an information retrieval system that stores contracts should have some way of maintaining an authority list of empowered individuals and be able to match it to the names of individuals who have executed stored contracts).
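The authority-list check suggested in the preceding example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the record layout, the names and the function names are all invented here, and a real system would need a far richer model of who is empowered to do what, and when.

```python
# Hypothetical authority list: people empowered to execute contracts.
AUTHORIZED_SIGNERS = {"J. Smith", "M. Jones"}

# Hypothetical stored contract records, each with the names of its executors.
contracts = [
    {"id": "C-001", "executed_by": ["J. Smith"]},
    {"id": "C-002", "executed_by": ["R. Brown", "M. Jones"]},
]


def unauthorized_executors(contract, authority_list):
    """Return the executors of a contract who are not on the authority list."""
    return [name for name in contract["executed_by"] if name not in authority_list]


def felicity_report(contracts, authority_list):
    """Map each contract id to any executors who fail the authority check."""
    return {c["id"]: unauthorized_executors(c, authority_list) for c in contracts}


report = felicity_report(contracts, AUTHORIZED_SIGNERS)
```

An empty list for a contract means that this one felicity condition appears to be satisfied; it says nothing about the other conditions, which, as argued above, lie in the milieu of the document rather than in its text.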

We mentioned before that even scholarly articles can be seen as conducting Illocutionary Acts, and, clearly, there are ways to extend such analysis. First of all, the declarations, commissives, expressives, etc. found in scholarly articles can be broken down further. For example, we stated that citations were implicit assertions that the articles or books cited were relevant to the citing article. But certainly there is more than one way in which a previous article or book can be relevant to the citing article, so the implicit assertions of the bibliographies are not all asserting the same thing (Hodges15 describes 14 different kinds of relation that a citation could imply).

Wilson29 has demonstrated that there may be a way of classifying scholarly writings that has some interesting similarities to Illocutionary Acts. He has shown that the traditional 'form subdivision' of the Library of Congress is similar to what has been known in literary studies as genre, that is, kinds of literature. Wilson makes the point that the notion of genre has applications well beyond just literary studies, and has characterized genres as 'long Speech Acts' [personal communication]. 'Any field of human activity is likely to develop a repertory of "definite and relatively stable typical forms of construction" characterizing linguistic communication in the field; "A particular function (scientific, technical, commentarial, business, everyday) and the particular conditions of speech communication specific for each sphere give rise to particular genres, that is, certain relatively stable thematic, compositional, and stylistic types of utterances." Nonliterary genres have not been the subject of much serious study, but it becomes clear on reflection that more or less well-settled conventional types are to be found throughout the world of text production, not merely in the literary section of that world' [p. 37. Wilson quotes Bakhtin]. Clearly, then, there are issues in the control of scholarly texts that are similar to, and may be informed by, the notion of Illocutionary Acts.

Brute facts vs. institutional facts

A final distinction that Searle makes (originally proposed by Anscombe)1 is that between 'brute facts' and 'institutional facts' [see Blair6]. Brute facts are similar to the largely physical sense data which we call 'facts' in the natural sciences. The reductionistic nature of scientific investigation assumes that all scientific study is a study of brute facts, and inasmuch as the study of language aspires to scientific status it, too, has been dominated by the study of the brute facts of language - syntax, word frequencies, statistical models of word/phrase occurrence, etc. Yet, in spite of the superficial rigor that the tabulation of such linguistic brute facts attains, it can never do what it purports to do: it can never be the basis for any reasonably complete semantic analysis. The reason for this, according to Searle, is that meaning in language is based on institutional facts, not brute facts. Meaning in language is best accessible through an analysis of conventions and human activities that are inextricably tied up with human institutions. But although these institutions may have some brute or statistical facts about them, Searle is quite insistent that institutional facts are not derivable from brute facts in any complete way. In this, Searle would find agreement with his predecessors, Austin and Wittgenstein; and it was Wittgenstein who put it so succinctly, 'Only in the stream of thought and life do words have meaning' [Wittgenstein,33 para. 173]. Yet, if this is the case, then we have a real problem in information retrieval, for clearly the words and phrases that we use to describe and search for documents have been, for the most part, taken out of 'the stream of thought and life'. In fact, what is most readily accessible for analysis in computerized Information Retrieval systems are the 'brute facts' of documents: the frequencies with which index terms are assigned to documents; the frequencies with which terms co-occur in their assignment to documents; and, for full-text retrieval systems, which words occur in the text of documents, and which words co-occur with others in the same document, the same sentence, the same paragraph, or within a specified proximity of other words. On this simple foundation of brute facts we have built remarkably complex models of document representation. But have we captured the 'meaning' of the documents represented by these simple facts? If Searle is correct, we have not - nor can we. No number or complex combination of brute facts can be produced to give us the 'meaning' of a document - where the 'meaning' of a document would include such things as its subject, intellectual content, context, use, purpose or links to other documents. The 'meaning' of a document is underdetermined by the brute facts of that document. Of course, this is not to say that word or index term occurrences have no relation to the meaning of a document. There is a relation, but it is an adumbrative one, where the occurrences of words in a text (or some complex metric based on them) only hint at the 'meaning' of the document.
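To make concrete what the 'brute facts' of a full-text system look like, the following Python sketch tabulates word frequencies and document-level co-occurrences for two toy documents. The texts and function names are invented for illustration, and whitespace tokenization is a simplifying assumption.

```python
from collections import Counter
from itertools import combinations

# Two toy documents; the texts are invented for illustration.
docs = {
    "d1": "the contract was signed by the client",
    "d2": "the client cancelled the contract",
}


def term_frequencies(text):
    """Count how often each word occurs in one document."""
    return Counter(text.lower().split())


def cooccurrences(documents):
    """Count, for each unordered word pair, the documents containing both words."""
    pairs = Counter()
    for text in documents.values():
        vocabulary = sorted(set(text.lower().split()))
        pairs.update(combinations(vocabulary, 2))
    return pairs


tf = {doc_id: term_frequencies(text) for doc_id, text in docs.items()}
co = cooccurrences(docs)
```

Here 'client' and 'contract' co-occur in both documents, yet one document records the signing of a contract and the other its cancellation. The tabulation hints at the subject matter while leaving the use of each document undetermined, which is the sense of 'adumbrative' intended above.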

The obvious question, though, is: 'Do we really need the "complete meaning" of a document in an information retrieval system?' Perhaps not. The only way we could tell whether 'semantically sparse' document representations were good enough is to conduct frequent, extensive tests of retrieval effectiveness on large, operational systems. This, unfortunately, is not done very often - primarily because such tests are so costly. But this may also be a reason for applying the theory of Illocutionary Acts (as well as other theories of language) to document representation. That is, if Illocutionary Acts are good, though perhaps incomplete, models of language usage, then using them as a framework for document representation guarantees, at the very least, that those document representations will function more like natural language than document representations that are not tied to any theory of language use. It is a reasonable assumption that document representations which function more like natural language will work better than those that do not, even if we do not have the empirical evidence based on tests of retrieval effectiveness to support it. Theories of natural language usage, then, may become the theoretical foundation that we need in information retrieval. This is not to say that there won't be other theories that will help, too. But these theories of natural language use may become the theoretical bridges of information retrieval that, like the theories of astronomy and quantum physics, allow us to make reasonable advances in document retrieval theory in between the rare empirical tests of large-scale system effectiveness.

The inferential nature of language

If models of language use have some relevance to information retrieval theory, then it is no great intellectual leap to see why they are applicable. In short, the language of document representation and searching is a 'dialect' of natural language, and the searching and indexing processes are, fundamentally, processes of communication between indexers and searchers.5 That such communication is awkward or stilted does not change the fact that it is, fundamentally, communication. As communication, the language of information retrieval must conform to the principles of conversation. For an understanding of this area of linguistic theory we need to turn to the work of Paul Grice.14 Until Grice's work, the ascendant theories of meaning in language were, explicitly or implicitly, theories of coding (see, e.g. Eco11). That is, the way that one individual understood another was for the listener to hear the words/phrases of the speaker and then to 'decode' those words/phrases into some kind of semantic content. Different theories of meaning were distinguished primarily by how they asserted that this decoding took place. Grice showed that the decoding theories could not account for, among other things, how a hearer could tell when a speaker mis-spoke: for example, when the speaker made a mistake or lied (that is, simple decoding theories would always take language literally, something that is clearly not the case in natural discourse). Grice showed that this phenomenon could only be accounted for if the hearer inferred some or all of the speaker's meaning and intentions prior to the speaking of the words in question. In short, Grice was able to show how it was possible to convey an unambiguous idea with an ambiguous sentence.27

Grice never said that no decoding takes place in natural discourse; understanding is a process of both decoding and inference. The inferences that individuals make about a speaker's meaning are based on the recognition of subtle contextual and behavioural cues (e.g., I know that although Mary says 'Thanks!' the tone of her voice, her manner and what transpired before have told me that she is being sarcastic). Such inferences are not generally available to the inquirer who uses an information retrieval system or to the indexer who prepares documents for retrieval. Consequently, the language of information retrieval is missing much of the dimension of inferential communication, and reduces to a system of literal semantic transactions; hence, most theories of document representation or indexing are implicitly coding theories. What effect does this have on the process of information retrieval? It's not clear, without, again, many tests of retrieval effectiveness. But, lacking such tests, and accepting the hypothesis that making the language of information retrieval more like natural language is a good thing, it appears that information retrieval languages - to become more natural - should have two components: a set of descriptors and rules to assign meaning to them; and a system of contextual reference that provides information that would permit the inquirer to infer other, non-literal meanings from the words which describe the documents in the system. Just what this system of contextual reference should comprise is not clear, although some preliminary suggestions have been made [see Blair5, pp. 183ff.]. It may also be the case that the system of contextual reference is composed mostly of 'institutional facts'. For example, it may be that for an inquirer to retrieve documents effectively on an information retrieval system used in litigation support, the indexing structure may need to include information specific to that lawsuit, e.g. the titles and affiliations of individuals named in the documents; significant dates, such as when contracts were signed; the time periods in which individuals held significant positions in the involved institutions; the professional and perhaps casual relationships between individuals named in the lawsuit; etc. All of this would be sensitive to the specific context of retrieval. The inferential nature of natural language implies, therefore, that any theory of document representation/query formulation that does not take this inferential dimension into consideration may be less effective than a theory that does.
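One way to picture such a system of contextual reference is as a small store of institutional facts kept alongside the document index. The Python sketch below is purely illustrative: the record types, names, dates and the query function are assumptions made here, not a scheme proposed in the sources cited.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Tenure:
    """One institutional fact: a person held a title at an organization for a period."""
    person: str
    title: str
    organization: str
    start: date
    end: date


# Hypothetical case-specific context; names and dates are invented.
CONTEXT = [
    Tenure("A. Lee", "Purchasing Manager", "Acme", date(1985, 1, 1), date(1987, 6, 30)),
    Tenure("B. Cruz", "Purchasing Manager", "Acme", date(1987, 7, 1), date(1990, 12, 31)),
]

# Hypothetical document index: each entry records named individuals and a date.
DOCUMENTS = [
    {"id": "memo-17", "names": ["A. Lee"], "date": date(1986, 3, 4)},
    {"id": "memo-42", "names": ["B. Cruz"], "date": date(1988, 9, 9)},
    {"id": "memo-58", "names": ["A. Lee"], "date": date(1989, 2, 1)},
]


def documents_by_role(title, organization, context, documents):
    """Find documents naming whoever held the given role when the document was written."""
    hits = []
    for doc in documents:
        for t in context:
            if (t.title == title and t.organization == organization
                    and t.person in doc["names"]
                    and t.start <= doc["date"] <= t.end):
                hits.append(doc["id"])
                break
    return hits


result = documents_by_role("Purchasing Manager", "Acme", CONTEXT, DOCUMENTS)
```

The query never mentions a person's name, and the third memo is excluded because its author no longer held the post when it was written. Neither inference could be drawn from the words of the documents alone.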

The cooperative principle

Also of interest to information retrieval is Grice's notion of conversational cooperation. For Grice,14 our conversations are characteristically cooperative efforts. Because of the inherent ambiguity of meaning, in order for the hearer to make any inferences about the speaker's intended meaning, the hearer must assume that the speaker is honestly trying to communicate in good faith. This leads the hearer, according to Grice, to assume several quite specific things about the speaker's intentions, namely:

Maxims of quantity
(1) The speaker should be as informative as required.
(2) The speaker should not be more informative than required.

Maxims of quality
(1) The speaker should not say what he believes to be false.
(2) The speaker should not say that for which he lacks evidence.

Maxim of relevance
The speaker should be relevant.

Maxims of manner
(1) The speaker should avoid obscurity of expression.
(2) The speaker should avoid ambiguity.
(3) The speaker should be brief.
(4) The speaker should be orderly.

In so far as an information retrieval search is like a conversation - with the inquirer making requests and the retrieval system 'answering' with sets of documents which match the requests - it stands to reason that these maxims of cooperation hold up in the search process. Note an important distinction, though: Grice's principle of cooperation does not assert that these maxims are always upheld in conversation; he asserts the more subtle point that they are assumed to be upheld in conversation. That is, unless there is evidence to the contrary, the listener assumes that the speaker is being as informative as necessary, is not lying, is relevant, brief, orderly, etc. In information retrieval, then, it is likely that the inquirer submitting search queries to a system and receiving sets of documents in return assumes that his search queries are being dealt with cooperatively. But is this the case? It is often hard to tell. Unlike an ordinary conversation where we have continual evidence of cooperation, or at least the means by which to test it, in the information retrieval process it is more difficult to determine whether the maxims of cooperation are being fulfilled. In fact, we can only easily detect the violation of two of the maxims of cooperation: the second maxim of quantity (the speaker should not be more informative than necessary) and the maxim of relation, or relevance (the speaker should be relevant). The first of these is obviously violated whenever the inquirer receives excessively large sets of retrieved documents in reply to his queries, and the second is violated whenever the inquirer receives retrieved sets with large numbers of non-relevant documents along with relevant documents (n.b. the retrieval of no relevant documents is not necessarily a violation of the maxim of relation - for the obvious reason that there may be no relevant documents on the retrieval system). The other seven maxims of cooperation are very difficult even to challenge (much less substantiate) during the retrieval process. Why is this? Just think how we would challenge whether one of these maxims is being upheld during a normal conversation: we give the speaker feedback about his/her conversation. For example, we might say, 'Aren't you beating around the bush here?' if we think he is not being brief, or, 'That's pretty ambiguous', if he appears to violate the second maxim of manner, or, 'Can't you just say it in plain English?' if he is being obscure, or, 'Are you sure that's right?', if he violates the first maxim of quality, or, 'What makes you believe that?' if he violates the second maxim of quality, etc. Our language is replete with phrases that would permit us to challenge any of the maxims of cooperation during the course of conversation. It also gives us linguistic ways for answering these challenges. For most information retrieval systems, though, we cannot ask these kinds of questions; and since we cannot ask these kinds of questions, we cannot challenge the cooperation of the system.

Because the non-cooperative conversation is the exception rather than the rule in ordinary discourse, and since we do not have the ability to assess the cooperation of the information retrieval process, it is natural that inquirers usually will assume that retrieval systems are more cooperative than they actually may be. This explains why, in spite of the uniformly low reported levels of retrieval performance [9, 25], inquirers generally do not express much dissatisfaction with large, operational document retrieval systems. In short, inquirers typically assume more cooperation - that is, conscientious retrieval - than is probably the case. This may also explain why the evaluation of large-scale, operational retrieval systems is not treated with the urgency that it deserves, and why empirical tests which show low levels of retrieval effectiveness, even when rigorously carried out, are often met with scepticism or even disbelief [22].

If information retrieval is really a form of conversation, then Grice's maxims of cooperation become heuristics for effective information retrieval: maxims of cooperative retrieval. Consequently, the challenge of information retrieval research is to build into retrieval processes effective tools that the inquirer can use to challenge and assess the cooperativeness of the document retrieval process. This will not be an easy task, but it is better to work on a difficult important problem than to work on a more tractable but less significant problem.
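
As a minimal sketch of such a tool, assuming the inquirer supplies relevance judgements for a retrieved set, the two easily detected violations discussed above could be flagged as follows. The function name, the report fields and both thresholds are hypothetical choices made for illustration; nothing in Grice's account fixes what counts as an 'excessively large' set or as 'large numbers' of non-relevant documents.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class CooperationReport:
    retrieved: int            # size of the retrieved set
    relevant_retrieved: int   # retrieved documents the inquirer judged relevant
    precision: float          # relevant_retrieved / retrieved (0.0 for an empty set)
    violates_quantity: bool   # second maxim of quantity: more informative than required
    violates_relation: bool   # maxim of relation: the reply is not relevant enough


def assess_cooperation(
    retrieved_ids: Iterable[str],
    judged_relevant: Set[str],
    max_set_size: int = 50,      # hypothetical limit on a usable retrieved set
    min_precision: float = 0.5,  # hypothetical floor on the share of relevant documents
) -> CooperationReport:
    retrieved = list(retrieved_ids)
    hits = sum(1 for doc_id in retrieved if doc_id in judged_relevant)
    precision = hits / len(retrieved) if retrieved else 0.0

    # Quantity: the system "says" more than the inquirer can reasonably use.
    violates_quantity = len(retrieved) > max_set_size

    # Relation: relevant documents are swamped by non-relevant ones. A set with
    # no relevant documents at all is not flagged, since the collection may
    # simply contain none.
    violates_relation = hits > 0 and precision < min_precision

    return CooperationReport(
        len(retrieved), hits, precision, violates_quantity, violates_relation
    )
```

The remaining seven maxims have no counterpart in this sketch, which is the point of the argument: assessing them would require some form of feedback from the system that a set of retrieved documents does not provide.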

Wittgenstein

Wittgenstein's Philosophy of Language defies simple explanation, and there already exist both explanations (especially by Pitkin [21]) and a lengthy discussion of its application to information retrieval [5]. So the discussion here will be brief. But even though Wittgenstein's philosophy of language is difficult to summarize, the following two quotations encapsulate the essence of Wittgenstein's understanding of language.

For a large class of cases - though not for all - in which we employ the word 'meaning' it can be defined thus: the meaning of a word is its use in the language [Wittgenstein, ref. 32, para. 43].

We don't start from certain words, but from certain occasions or activities [Wittgenstein, ref. 34, p. 3].

Where Searle saw simplicity and only a few types of language use, Wittgenstein saw a welter of complexity. For Wittgenstein, language is primarily a set of tools which we use to engage in certain specific activities. We learn how to use language in the same way that we learn how to use tools or implements - not by definitions and explanations, but by having the appropriate usage demonstrated or shown to us by means of what Wittgenstein called 'perspicuous examples/representations' (übersichtliche Darstellung); and the only place where we can see language being demonstrated correctly is in the 'occasions or activities' in which it is embedded. Language does have a structure, but it is not a grammar in the traditional sense; and this structure is dynamic - like a game being played, rather than a set of instructions to be followed. The structures of language use are embodied by what Wittgenstein called 'Language Games', and the occasions or activities in which these games are embedded are the intensely human activities that he called 'Forms of Life' - what he sometimes referred to as our 'Natural History'. Our language, then, is inextricably caught up in the things that we do, and we must understand how we participate in these activities before we can understand how language is used in them. In this sense, Wittgenstein would appear to favour Grice's inferential theory of meaning rather than a more traditional coding theory. In other words, like Grice, Wittgenstein would agree that we often know a lot about what a speaker will say before he says it. We have a complex set of 'criteria' that help us disambiguate both language meaning and the speaker's intention.

But the point of Wittgenstein's analysis of language was not just to try to show how it worked, but also to show how its misuse could lead to certain systematic misunderstandings - what he called 'diseases of thinking'. The primary cause of these problems was that those who looked at language (mostly philosophers) tried to examine it independently of its usage, and independently of the activities in which it was 'at home'. The scientific method of Wittgenstein's day was primarily an extractive one, a method which isolated the object of study and removed it from its natural context. While such a method may have success in some scientific pursuits, for Wittgenstein it could only be misleading in the study of language. Further, the reason problems in language can lead to mistakes in thinking was, according to Wittgenstein's theory, that language was not the product of thought, but the vehicle of thought:

When I think in language, there aren't 'meanings' going through my mind in addition to the verbal expressions: the language is itself the vehicle of thought [Wittgenstein, ref. 32 (1953), para. 329].



Language is the means by which the speaker works his thought out - it is the set of tools and the work area which are the foundation of his thinking. The speaker's ability to think clearly is as reliant on his mastery of language as the artist's ability to express himself is reliant on the paints and brushes he has and his ability to use them.

Although we have described Wittgenstein's Philosophy of Language in the briefest possible terms, we can see some broad implications for information retrieval. In the first place, the language of document retrieval, like ordinary language, must have its meaning grounded in activities. Consequently, there won't be one way of describing a document, but a variety of ways, each based on the activity that uses the document in question. Thus, information retrieval systems are activity-specific. They, like their language, are dependent on the activities that they serve. The role of the indexer or the designer of indexing algorithms is to relate the usage of the terms which represent the documents to the usage of those words in the activities that employ those documents. As a result, the study of information retrieval can be thought of as the study of information in context. We cannot separate the design of information retrieval systems from the activities in which they are embedded (some Information Retrieval researchers have begun to explore this connection between document description and activities; see Oddy et al. [20]). Further, if our language is truly a vehicle of our thought, then the kinds of thinking we can do in the process of searching for documents will be strongly dependent on the language used to represent those documents. The level of complexity or refinement for working out and expressing our information needs will be ineluctably limited to the level of complexity or refinement of the language of representation available to us.
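
One way to picture an activity-specific language of representation is a toy index in which the same document carries different descriptors for different activities. This is only a sketch under assumed names: the class `ActivityIndex`, its methods, and the example activities and terms are all hypothetical and are not drawn from any existing system.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


class ActivityIndex:
    """Toy index in which a document's descriptors depend on the activity using it."""

    def __init__(self) -> None:
        # (activity, term) -> ids of documents described by that term in that activity
        self._postings: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def describe(self, doc_id: str, activity: str, terms: Iterable[str]) -> None:
        """Record how a document is described within one activity."""
        for term in terms:
            self._postings[(activity, term.lower())].add(doc_id)

    def search(self, activity: str, term: str) -> Set[str]:
        """Return the documents that the given activity describes with this term."""
        return set(self._postings.get((activity, term.lower()), set()))


# Hypothetical example: one contract, described differently for two activities.
index = ActivityIndex()
index.describe("contract-42", "litigation", ["breach", "liability"])
index.describe("contract-42", "accounting", ["invoice", "receivable"])

litigation_hits = index.search("litigation", "breach")
accounting_hits = index.search("accounting", "breach")
```

In this sketch a search term has no meaning apart from an activity: 'breach' retrieves the contract for the litigation activity and nothing for the accounting activity, even though the document is the same.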

Conclusion

This discussion has been an attempt to outline, however briefly, some parts of the Philosophy of Language that have bearing on the task of Information Retrieval. Clearly, information retrieval systems are in large part linguistic systems, so it has been the implicit thesis of this discussion that theories of language use or meaning are, mutatis mutandis, theories of document representation or query formulation. The natural question at this point, then, is why do we need any theory of information retrieval at all? Why can't we just build information retrieval systems, changing and adjusting them as we go along until we get good systems? The answer is that, because of the linguistic nature of Information Retrieval, there are simply too many degrees of freedom in design for us to arrive at good designs haphazardly. But if the language of Information Retrieval is really based on natural language (here, English) in some non-trivial way, then it stands to reason that any theories of how natural language works will help us to understand the linguistic aspects of Information Retrieval. To build a language for information retrieval without considering how natural languages work is presumptuous at best, impossible at worst. As Thomas Kuhn wrote [18], one of the necessary characteristics of a scientific discipline, as opposed to a non-scientific or pseudo-scientific one, is that its failures are informative: they exclude certain avenues of research and encourage others. The Philosophy of Language can help to make our failures in information retrieval design more informative than they presently are.

But although theories of natural language use can be helpful for understanding the linguistic aspects of information retrieval, there may be aspects of the language of information retrieval that have no clear correlation with the processes of natural language. Specifically, because of the enormous size of their document collections, some current retrieval systems are creating linguistic universes that are unprecedented. For example, our language was never meant to make the kind of subject distinctions that it is being called on to make in large-scale systems (one merely needs to look in the subject catalogue of a large research library to see how inadequate most subject classifications are for making fine distinctions in large document or book collections - and these research collections are usually indexed by professional cataloguers, a luxury that most commercial information retrieval systems do not have). It may be the case that it is simply impossible to make fine distinctions of intellectual content on large document retrieval systems. If this is the case, then the kinds of systems we can build and the levels of effectiveness that we can expect from them may have to undergo substantial reconsideration (for example, if fine distinctions in the intellectual content of documents must be made, then it may be more profitable to develop strategies to keep the collection small, rather than letting the collection grow and trying to make finer distinctions in the indexing language) [7]. Whatever the case may be, we probably have not treated the problem of document representation with the care and intensity that it deserves. If the languages of document representation and searching are really dialects of natural language, then we should expect that any theory of document representation and searching will be no less complex than the theories of language meaning and use are.

REFERENCES

1. G. E. M. Anscombe, On brute facts. Analysis 18 (3) (1958).

2. J. L. Austin, How to Do Things With Words. Oxford University Press, Oxford (1962).

3. K. Bach and R. M. Harnish, Linguistic Communication and Speech Acts. The MIT Press, Cambridge, MA (1982).

4. M. M. Bakhtin, The problem of speech genres, in his Speech Genres and Other Late Essays. University of Texas Press, Austin (1986).

5. D. C. Blair, Language and Representation in Information Retrieval. Elsevier Science Publishers, Amsterdam (1990).

6. D. C. Blair, Brute facts, institutional facts, and large-scale computerized information systems. Unpublished manuscript (1991).

7. D. C. Blair, Search exhaustivity and data base size as a framework for text retrieval systems. Unpublished manuscript (1991).

8. D. C. Blair and M. E. Maron, An evaluation of retrieval effectiveness for a full-text document retrieval system. Communications of the ACM 28 (3), 289-299 (1985).

9. D. C. Blair and M. E. Maron, An evaluation of retrieval effectiveness for a full-text document retrieval system: technical correspondence. Communications of the ACM 28 (11), 1238-1242 (1985).

10. W. S. Cooper, A definition of relevance for Information Retrieval. Information Storage and Retrieval 7, 19-37 (1971).

11. U. Eco, A Theory of Semiotics. Indiana University Press, Bloomington, Indiana (1976).

12. F. Flores, M. Graves, B. Hartfield and T. Winograd, Computer systems and the design of organizational interaction. ACM Transactions on Office Information Systems 6 (2), 153-172 (1988).

13. B. Frohman, Rules of indexing: a critique of mentalism in Information Retrieval theory. Journal of Documentation 46 (2), 81-101 (1990).

14. H. P. Grice, Studies in the Way of Words. Harvard University Press, Cambridge, MA (1989).

15. T. Hodges, Citation indexing: its potential for bibliographic control. Unpublished doctoral dissertation, University of California, Berkeley (1972).

16. S. O. Kimbrough and R. M. Lee, On illocutionary logic as a telecommunications language. Proceedings of the 7th International Conference on Information Systems, San Diego, 15-26 (1986).

17. S. O. Kimbrough, On representing schemes for promising electronically. Decision Support Systems 6 (2), 99-122 (1990).

18. T. Kuhn, The Structure of Scientific Revolutions, 2nd ed. University of Chicago Press, Chicago (1970).

19. S. C. Levinson, Pragmatics. Cambridge University Press, Cambridge (1983), pp. 280-281.

20. R. N. Oddy, E. D. Liddy, B. Balakrishnan, A. Bishop, J. Elewononi and E. Martin, Towards the use of situational information in Information Retrieval. Journal of Documentation, June 1992 (forthcoming).

21. H. F. Pitkin, Wittgenstein and Justice. University of California Press, Berkeley (1972).

22. G. Salton, Another look at Automatic Text Retrieval Systems. Communications of the ACM 29 (7), 648-656 (1986).

23. J. Searle, Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press, Cambridge (1969).

24. J. Searle and D. Vanderveken, Foundation of Illocutionary Logic. Cambridge University Press, Cambridge (1985).

25. K. Sparck Jones, Information Retrieval Experiment. Butterworths, London (1981).

26. K. Sparck Jones and M. Kay, Linguistics and Information Science. Academic Press, New York (1973).

27. D. Sperber and D. Wilson, Relevance: Communication and Cognition. Harvard University Press, Cambridge, MA (1988).

28. C. J. van Rijsbergen, A non-classical logic for Information Retrieval. The Computer Journal 29, 481-485 (1986).

29. P. Wilson and N. Robinson, Form subdivisions and genre. LRTS 34 (1), 36-43 (1990).

30. T. Winograd, Where the action is. Byte, 256A-258 (1988).

31. T. Winograd and F. Flores, Understanding Computers and Cognition: A New Foundation for Design. Addison Wesley, Reading, MA (1987).

32. L. Wittgenstein, Philosophical Investigations. The Macmillan Company, New York (1953).

33. L. Wittgenstein, Zettel. University of California Press, Berkeley (1967).

34. L. Wittgenstein, Lectures and Conversations on Aesthetics, Psychology and Religious Belief, edited C. Barrett. University of California Press, Berkeley (1972).

206 THE COMPUTER JOURNAL, VOL. 35, NO. 3, 1992

Book Review

RAJ JAIN
The Art of Computer Systems Performance Analysis (Techniques for Experimental Design, Measurement, Simulation, and Modeling)
John Wiley & Sons, Inc., New York. £42.50. ISBN 0 471 50336 3.

A friend of mine once said that 'common' sense was pretty rare in the computing area. A good dose of common sense is a common factor in nearly all my favoured computing texts. In order to measure this book's performance we must define our criteria. To a deal of common sense we may wish to add the criteria of comprehensibility, clarity and interest.

The book scores well on common sense: for example, a number of checklists are presented to highlight common problems encountered. The author reinforces the view that simple techniques are always better than complex ones where the simpler ones will suffice.

The book is admirable in its comprehensiveness. A wide range of theoretical techniques are presented, but the large number of practical examples prevents the treatment from becoming dry or sterile. The book is divided into six sections comprising an overview, techniques and tools, probability and statistics, experimental design and analysis, simulation and queuing models. Each section concludes with a section pointing the reader to further sources in that area. The author claims that no other text covers this range of topics in a single treatment, and the reviewer would concur with this view. Certainly the book has revealed a number of techniques new to me.

The text compares favourably for clarity with texts on probability and statistics encountered by the author. A text of this type cannot assume prior knowledge of such a broad range of topics. The text requires a background in computer systems but no great prior knowledge of statistics. The structure is of the 'tell them what you're going to say, say it, and tell them what you said' type, based on the six principal sections. This aids the clarity of what might otherwise become an unstructured collection of methods.

Interest is maintained through the author's writing style together with a large number of examples and case studies. Overall, the book is recommended for its interesting and comprehensive treatment of the subject. However, there are a number of limitations which should be mentioned.

The first arises from the broad scope of the text. Some subjects are treated in a rather superficial way; for example, chapter 29 provides a list of commonly used distributions, but with limited explanation.

It is not always apparent where to find an appropriate technique if you dip into the book without looking for a specific named technique.

Perhaps the most obvious flaw is the need for four pages of tabulated errata, particularly fifty errors in the Author Index. As an author myself, I understand the difficulty in ensuring that a correct and complete manuscript reaches publication. I also respect the author's integrity in providing such a full and painstaking errata section. However, the basic integrity of the text should be better, and it is inconvenient to keep checking the information contained in the main text against the errata.

Taken as a whole, these limitations notwithstanding, the text is recommended as a comprehensible and comprehensive treatment of the subject, written in an interesting way. Let's hope that it sells well enough to require another print run, so that those niggling errata can be incorporated into the main text.

A. GILLES
Salford

