Software Agents for Information Retrieval

Timothy Finin, University of Maryland Baltimore County
[email protected]

Charles Nicholas, University of Maryland Baltimore County
[email protected]

James Mayfield, The Johns Hopkins University Applied Physics Laboratory
[email protected]

April 21, 1998


Overview of the Tutorial

• Introduction
• What is an agent?
• Agent technologies
  – Agent theories
  – Knowledge representation for agents
  – Agent languages
  – Agent communication
• Why agents for information retrieval and digital libraries?
• Examples of agent-based information retrieval systems
• Conclusions


Tutorial Objectives

• Clear away, if possible, some of the current agent hype.
• Demonstrate a variety of uses for agent technologies in digital libraries for information routing, filtering, and querying.
• Provide a snapshot of the state of the art in agent-based information retrieval.
• Present examples in enough detail to allow you to investigate the use of these technologies yourselves.


Why Study Agents?

• Bruce Croft presented an IR top-ten wish list (D-Lib Magazine, November 1995) in an attempt to capture his view of the most significant problems facing current IR systems.
• Agent technologies are potentially applicable to most of these ten issues.
• Guaranteed multi-year funding.


Croft’s Top Ten List

10. Relevance Feedback
9. Information Extraction
8. Multimedia Retrieval
7. Effective Retrieval
6. Routing and Filtering
5. Interfaces and Browsing
4. “Magic” (term expansion)
3. Efficient, Flexible Indexing and Retrieval
1. Integrated Solutions

and the number two IR problem is...


The Number Two IR Problem

2. Distributed Information Retrieval




What’s an agent?


What is an agent? A brief tour of agent-space

An agent is a powerful and ubiquitous abstraction in computer science:
• Daemons (e.g., ftp agent)
• User interface clients (e.g., mail agent)
• Physical agents (e.g., robotics)
• Believable agents (e.g., VR and graphics)
• Intelligent software agents


What is a software agent?

We can identify some sub-categories:
• User-facing agents
  – Intelligent human-computer interface agents
  – Adaptive user modeling agents
• Personal (expert) assistants
  – Personal retrieval agent
  – Financial portfolio manager
• Mobile software technology
  – Mobile agents for network switch management
• Cooperating software agents
  – Resource discovery agents
  – Mediators and facilitators
  – Market agents ...


So, what’s a software agent?

• No consensus yet, but several key concepts are important to this emerging paradigm.
• A software agent:
  – is an autonomous, goal-directed process
  – is situated in, is aware of, and reacts to its environment
  – cooperates with other agents (software or human) to accomplish its tasks


An emerging system-building paradigm (with hype)

[Figure: agents shown as the next system-building paradigm after structured programming (1975) and objects (1982), dated 1997, drawing on distributed systems, database and knowledge-base technology, information retrieval, machine learning, cognitive science, AI, and mobile computing.]

Software Agent Characteristics

[Figure: three overlapping characteristics of software agents: cooperation, autonomy, and adaptation (after Hyacinth Nwana, 1996).]

Agent Characteristic: Adaptation

Agents adapt to their environment and users and learn from experience.
• Via machine learning, knowledge discovery, data mining, etc.
• Via exchange of metadata, brokering, and facilitation.
• Interface agents acquire and use user models.
• Situated in and aware of their environment.


Agent Characteristic: Cooperation

Agents use standard languages and protocols to cooperate and collaborate to achieve common goals.
• Cooperate with human agents and other software agents.
• Supported by agent communication languages and protocols.
• Consistent with human conventions and intuition.
• Toward team formation and agent ensembles.


Agent Characteristic: Autonomy

Agents act autonomously to pursue their agenda.
• Proactive and reactive
• Goal-directed behavior
• Appropriately persistent
• Multi-threaded behavior
• Encourage viewing from an “intentional stance”


Agent Characteristic: Mobility?

A mobile agent is an executing program that migrates from machine to machine in a heterogeneous network under its own control.

• Examples: programs in Telescript, Agent-Tcl, Voyager, etc., and, to a limited degree, Java applets.
• Note -- this definition implies some agent attributes, e.g., autonomy, persistence, ...
• Mobile agents offer some very interesting advantages as well as some disadvantages.
• This is an important technology for distributed systems but is largely orthogonal to other “agent” issues.


Agent Characteristic: Intelligence?

Q: What makes an agent an “intelligent agent”?

A: The size of the price tag.

More seriously…
– The paradigm covers agents of varying degrees of intelligence.
– Intelligent agents will tend to
  • know and apply more sophisticated domain knowledge
  • recognize underlying goals and intentions
  • react to unexpected situations in a robust manner
  • have better NLP skills
  • etc.

Much of what we will be saying applies to agents of little or no intelligence.


Some key ideas

• Software agents offer a new paradigm for very large-scale, distributed, heterogeneous applications.
• The paradigm focuses on the interactions of autonomous, cooperating processes that can adapt to humans and other agents.
• Mobility is an orthogonal characteristic that many, but not all, consider central.
• Intelligence is always a desirable characteristic but is not required by the paradigm.
• The paradigm is still forming.




Agent Theory and Technology


Agent Theories

We will touch on some of the theoretical machinery useful for agent-based systems:

1. Database and knowledge-base technology
2. Distributed computing
3. Cognitive science
4. Computational linguistics
5. Econometrics
6. Biological analogies
7. Machine learning


Theory 1: Database and Knowledge-base Technology

Intelligent agents need to be able to represent and reason about a number of things, including:
– metadata about documents and collections of documents
– linguistic knowledge (e.g., thesauri, proper name recognition, etc.)
– domain-specific data, information and knowledge
– models of other agents (human or artificial): their capabilities, performance, beliefs, desires, intentions, plans, etc.
– tasks, task structures, plans, etc.


Theory 2: Distributed Computing

• Concurrency
  – analyzing and specifying protocols, e.g., deadlock and livelock prevention, fairness
  – achieving and preserving consistency
• Performance evaluation
  – visualization
  – debugging
• Exploit the advantages of parallelism
  – multi-threaded implementation


Theory 3: Cognitive Science

The BDI agent model considers an agent to have
– Beliefs about itself, other agents and its environment
– Desires about future states (i.e., goals)
– Intentions about its own future actions (i.e., plans)

The BDI model is particularly useful for
– Developing formal models of agents
– Developing a deep model of agent communication
– Inferring an agent’s internal state from its behavior
– Predicting what other agents will do
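A minimal sketch of how these three notions might be turned into code, assuming beliefs are a set of facts, desires are goal names, and a hypothetical plan library maps each goal to the facts it requires and the actions it takes; intentions are the actions the agent has committed to. All names here (`BDIAgent`, `PLANS`, the goals and actions) are illustrative and are not taken from any particular BDI system.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical plan library: goal -> (facts that must be believed first, actions).
PLANS = {
    "have_report": ({"query_known"}, ["search_sources", "merge_results", "deliver_report"]),
    "query_known": (set(), ["ask_user_for_query"]),
}


@dataclass
class BDIAgent:
    beliefs: set = field(default_factory=set)       # what the agent takes to be true
    desires: list = field(default_factory=list)     # goals it would like to achieve
    intentions: list = field(default_factory=list)  # actions it has committed to
    current_goal: Optional[str] = None

    def deliberate(self):
        """Commit to the first unachieved desire whose plan is applicable."""
        for goal in self.desires:
            if goal in self.beliefs:
                continue  # already achieved
            precondition, actions = PLANS[goal]
            if precondition <= self.beliefs:
                self.current_goal = goal
                self.intentions = list(actions)
                return goal
        return None

    def step(self):
        """Carry out the next intended action; finishing a plan achieves its goal."""
        if not self.intentions:
            return None
        action = self.intentions.pop(0)
        if not self.intentions:
            self.beliefs.add(self.current_goal)
        return action


agent = BDIAgent(desires=["have_report", "query_known"])
while (goal := agent.deliberate()) is not None:
    while (action := agent.step()) is not None:
        print(goal, "->", action)
```

Run as is, the agent first adopts query_known (the precondition of have_report is not yet believed); once that plan finishes, it commits to have_report and carries out its three actions.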


Theory 4: Computational Linguistics

• We draw on research in (computational) linguistics for an underlying communication model.
• Speech act theory is a high-level framework developed by philosophers and linguists to account for human communication.
  – Speakers do more than just utter sentences that have a logical (truth-theoretic) meaning.
    • “The cat is on the mat”
  – Speakers perform speech acts: requests, suggestions, promises, warnings, threats, criticisms, praise, etc.
    • “I hereby promise to buy you lunch”
  – Every utterance is a speech act.

[Figure: a conversation state diagram with states start, asked and told, and transitions labeled ask-one, tell, deny, untell and sorry.]




Performatives

• In speech act theory, a performative is a speech act whose utterance performs an action.
• Examples: promise, assert, request, confirm, ask ...
• Heuristic: X is a speech act if you can say “I hereby X”.
• The KQML agent communication language uses the term performative to mean a primitive message type.
• A key objective is to develop systems based on the right set of communication primitives.


Theory 5: Econometric models

• Economics studies how to model, predict and control the aggregate behavior of large collections of independent agents.
• Conceptual tools include:
  – game theory
  – general equilibrium market mechanisms
  – protocols for voting and auctions
• An objective is to design good artificial markets and protocols that result in the desired behavior of our agents.
• Michigan’s Digital Library project uses a market-based approach to control its agents.
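As one concrete auction protocol, the sketch below runs a sealed-bid second-price (Vickrey) auction: every agent submits one hidden bid, the highest bidder wins, and it pays the second-highest bid, which makes bidding one's true value a dominant strategy. The agent names and bids are hypothetical, and nothing here describes the market mechanism actually used in the Michigan project.

```python
def vickrey_auction(bids):
    """Run a sealed-bid second-price auction.

    bids: dict mapping agent name -> bid amount (at least two bidders).
    Returns (winner, price). Ties are broken by agent name for determinism.
    """
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bids")
    # Sort by bid descending, then by name ascending to break ties.
    ranked = sorted(bids.items(), key=lambda item: (-item[1], item[0]))
    winner = ranked[0][0]
    price = ranked[1][1]  # the winner pays the second-highest bid
    return winner, price


# Hypothetical agents bidding for a slot on a busy search service.
bids = {"search_agent": 12.0, "index_agent": 9.5, "cache_agent": 7.0}
print(vickrey_auction(bids))  # ('search_agent', 9.5)
```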


Theory 6: Biological Analogies

• One way that agents could adapt is via an evolutionary process.
• Individual agents use their own strategies for behavior, with “death” as the punishment for poor performance and “reproduction” the reward for good performance.
• Artificial life techniques include:
  – genetic programming
  – natural selection
  – sexual reproduction
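A toy sketch of this evolutionary loop, assuming each agent's strategy is a single number in [0, 1], performance is closeness to a hypothetical target value, the worse half of the population “dies” each generation, and survivors “reproduce” by copying themselves with a small random mutation. Real artificial-life systems use far richer strategies and also recombine parents (the “sexual reproduction” above), which this sketch omits.

```python
import random


def fitness(strategy, target=0.7):
    """Hypothetical performance measure: strategies closer to target do better."""
    return -abs(strategy - target)


def evolve(population, generations=30, seed=0):
    """Each generation the worse half "dies" and survivors "reproduce"."""
    rng = random.Random(seed)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: max(1, len(ranked) // 2)]
        # Children are mutated copies of random survivors, clamped to [0, 1],
        # so the population size stays constant.
        children = [
            min(1.0, max(0.0, rng.choice(survivors) + rng.gauss(0, 0.05)))
            for _ in range(len(ranked) - len(survivors))
        ]
        population = survivors + children
    return max(population, key=fitness)


start_rng = random.Random(42)
initial = [start_rng.random() for _ in range(20)]
best = evolve(initial)
print(f"best initial: {max(initial, key=fitness):.3f}  after evolution: {best:.3f}")
```

Because the best survivor is always kept, the best strategy never gets worse from one generation to the next.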


Theory 7: Machine Learning

• Techniques developed for machine learning are being applied to software agents to allow them to adapt to users and other agents.
• Popular techniques include:
  – Reasoning with uncertainty
  – Decision tree induction
  – Neural networks
  – Reinforcement learning
  – Memory-based reasoning
  – Genetic algorithms

[Figure: Rev. Thomas Bayes and his theorem]
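Bayes' theorem underlies several of these techniques. As a small illustration of reasoning with uncertainty, here is a naive Bayes classifier that labels a page as “hot” or “cold” from its words, picking the label that maximizes log P(label) plus the sum of log P(word | label), with add-one (Laplace) smoothing. The training data are made up, and a real agent would use far more features and examples.

```python
import math
from collections import Counter


def train(examples):
    """examples: list of (list_of_words, label). Returns model parameters."""
    label_counts = Counter(label for _, label in examples)
    word_counts = {label: Counter() for label in label_counts}
    for words, label in examples:
        word_counts[label].update(words)
    vocab = {w for words, _ in examples for w in words}
    return label_counts, word_counts, vocab


def classify(model, words):
    """Return the label maximizing log P(label) + sum of log P(word | label)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        n_words = sum(word_counts[label].values())
        score = math.log(count / total)
        for w in words:
            # Add-one smoothing so unseen words do not zero out the product.
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    (["agents", "kqml", "retrieval"], "hot"),
    (["agents", "ontology", "kif"], "hot"),
    (["celebrity", "gossip", "shopping"], "cold"),
]
model = train(training)
print(classify(model, ["kqml", "agents"]))  # hot
```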


Conclusions

• There is a wealth of prior theory from several disciplines being applied to software agents.
• You don’t need to be an expert on the theory to be able to apply it or use the practical concepts or technology it supports.


Agent Technology




Agent Technology

We will cover four aspects of technology related to building agent-based systems:

– Agent architectures

– Agent programming languages

– Mobile agent frameworks

– Agent communication languages


Agent Architectures


Agent Architectures

People are using several architectures for agent-based information systems:
• Mediated architectures
• Multi-agent systems
• Markets and swarms


Mediated Architectures

• Agents generalize the client-server architecture that has dominated the Internet since its beginning.
• Wiederhold introduced the notion of a “mediated architecture” for information systems.

[Figure: a client-server architecture, in which clients (C) talk directly to servers holding software and data objects, contrasted with an agent-based architecture, in which many agents (A) mediate between clients, servers and data services.]

Multi-agent Systems

• Some research focuses on developing sophisticated individual agents with advanced capabilities.
• Other research focuses on multi-agent systems (MAS), with an emphasis on
  – agent-to-agent communication
  – cooperation and collaboration
  – team and coalition formation
  – information sharing among the team
  – joint beliefs, goals and plans


Agent markets and swarms

• Yet another architectural view is the decentralized market or swarm.
• Key idea -- the parallel, autonomous actions of a large collection of individual agents result in emergent behavior of the collective.
• The market view usually assumes rational agents, whereas the swarm view, associated with artificial life, does not.




Agent Programming Systems

Desiderata for Agent Programming Languages

Although many languages can be used for writing software agents, here are some useful characteristics:
– Support for maintaining DBs and/or KBs
– Good interoperability features to support mediated architectures and component-based programming
– Good communication support
– Multi-threaded
– Support for security (communication and execution)


What are people using?

• Mobile agents require a language that supports mobility. Some examples:
  – Languages designed to work in a sandbox: Java, Safe-Tcl, Safe-Python, …
  – Java-based frameworks for mobile languages: Aglets, Voyager, Odyssey, Concordia, …
  – Other mobile languages: Tacoma, Agent-Tcl, Plan, ...
• Non-mobile agents can be written in your favorite programming language (Java, C/C++, Lisp, Prolog, Smalltalk, Tcl, Perl, …) with a suitable agent-communication module.


Example -- TKQML

The TKQML application adds KQML functionality to any Tcl/Tk shell.
• Provides a simple route to high-level communication among Tcl/Tk-based agents.
• Incorporates tools for a facilitated environment (resource brokering).
• Parameters to KQML functions are converted to and from string representations automatically.
• TKQML implements a call-by-value-result mechanism to best mirror the usage of C functions.
• Scripts are registered dynamically for handling incoming messages.
• Current processing is interrupted by the handler, and resumes when message handling is complete.

    set receiver bob
    set reply null
    # kqml is the KQML function; 0 specifies no timeout; tell is the performative;
    # 24 is the content; TKQML will properly interpret receiver as a variable.
    # Since reply names a variable, TKQML will block, and a reference to the
    # return message will be stored in reply.
    kqml send_msg 0 tell 24 receiver reply


A Sample TKQML Agent

    # Agent name specified on the command line
    set myName "MonA-[lindex $argv 0]"
    # Initialize the agent and register it with the ANS
    kqml initialize myName
    # Initialize A to 0 (not important to the example)
    set A 0
    # Initialize the subscribers list to empty
    set subscribers {}

    # Register the following script to handle subscribe messages
    kqml register-script subscribe {
        # Get the name of the sender (kqml_msg always contains the message)
        set new [kqml get_field :sender kqml_msg]
        # If the sender is new, add its name to the list of subscribers
        if {[lsearch $subscribers $new] == -1} {
            set subscribers [linsert $subscribers 0 $new]
        }
    }

    # Register the following script to handle unsubscribe messages
    kqml register-script unsubscribe {
        # Get the name of the sender and locate it in the list of subscribers
        set name [kqml get_field :sender kqml_msg]
        set idx [lsearch $subscribers $name]
        # If present, remove the name from the list
        if {$idx >= 0} {
            set subscribers [lreplace $subscribers $idx $idx]
        }
    }

    while {1} {
        # Set AA to the previous value of A
        set AA $A
        # Probe for a new value of A (details not important to the example)
        set A [toolkit prob_1]
        # If the value has changed, send each subscriber a message reporting it
        if {$AA != $A} {
            foreach sub $subscribers {
                kqml send_msg 0 tell A sub NULL
            }
        }
        # Wait for some period, then continue
        after 1000
    }

Agent Communication




Agent Communication

• We assume that agents use an Agent Communication Language, or ACL, to communicate information and knowledge.
• Genesereth (CACM, 1992) defined a software agent as any system that uses an ACL to exchange information.


Some ACLs

• Is CORBA an ACL?
• Knowledge sharing approach
  – KQML, KIF, Ontologies
• Foundation for Intelligent Physical Agents (FIPA)
• Ad hoc languages
  – e.g., SRI’s Open Agent Architecture language

[Figure: levels of sharing among communicating agents:
– Object sharing: shared objects, procedure calls and data structures (e.g., CORBA, RPC)
– Knowledge sharing: shared facts, rules, constraints, procedures and knowledge (e.g., KQML+KIF, FIPA, Aglets)
– Intentional sharing: shared beliefs, plans, goals and intentions (e.g., AgentTalk)
– Experiential sharing: shared experiences and strategies]


Knowledge Sharing Effort

• The DARPA KSE is a distributed research project involving over a dozen research groups, aimed at developing techniques, methodologies and software tools for knowledge sharing and knowledge reuse.
• This requires a common language (syntax, semantics, pragmatics).
• Some existing components that can be used independently or together:
  – KIF -- Knowledge Interchange Format (syntax)
  – Ontolingua -- a language for defining sharable ontologies (semantics)
  – KQML -- a high-level interaction language (pragmatics)


Knowledge Interchange Format

• KIF ~ first-order logic with set theory
• An interlingua for encoded knowledge
  – Takes translation among n systems from O(n²) to O(n)
• Common language for reusable knowledge
  – Implementation-independent semantics
  – Highly expressive -- can represent knowledge in typical application knowledge bases
  – Translatable -- into and out of typical application languages

[Figure: System 1, with a knowledge base in Lang1, and System 2, with a knowledge base in Lang2, share a library of knowledge in KIF through KIF <-> Lang1 and KIF <-> Lang2 translators.]
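The O(n²) to O(n) claim follows from counting translators, assuming a separate one-way translator is needed for each direction between two languages. Without an interlingua every ordered pair of languages needs its own translator; with KIF each language needs only one translator into KIF and one out of it:

```latex
T_{\text{pairwise}}(n) = n(n-1) = O(n^2)
\qquad\text{vs.}\qquad
T_{\text{KIF}}(n) = 2n = O(n)
```

For n = 10 systems, that is 90 pairwise translators versus 20 translators into and out of KIF.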


Ontologies

Ontology: a common vocabulary and agreed-upon meanings to describe a subject domain.

• This is not a profoundly new idea …
  – Vocabulary specification
  – Domain theory
  – Conceptual schema (for a database)
  – Class-subclass taxonomy
  – Object schema
• An ontology contains specifications of concepts to be used for expressing knowledge:
  – Types of entities
  – Relations and functions
  – Attributes and properties
  – Constraints


Ontology Library and Editing Tools

[Figure: a shared library of ontologies, with editing tools to browse, compare, compose, extend and check them. The library is layered: basic representation concepts (sets, sequences, arrays, quantities, probabilities); lexicons and skeleton ontologies (WordNet, Penman Ontology, CYC Upper Ontology); common ontologies and theories (models of space, models of time, physical objects, actions and causality, geography and terrain, situations and contexts); and domain-specific ontologies and theories (operations, logistics, sensor management, battlefield situations, command and control).]

Ontolingua is a language for building, publishing, and sharing ontologies.
– A web-based interface to a browser/editor server is available at http://ontolingua.stanford.edu/ and mirror sites.
– Ontologies can be translated into a number of content languages, including KIF, LOOM, Prolog, CLIPS, etc.




KQML: Knowledge Query and Manipulation Language

• KQML is a high-level, message-oriented communication language and protocol for information exchange, independent of content syntax and ontology.
• KQML is independent of
  – Transport mechanism (e.g., TCP/IP, email, CORBA objects, IIOP, etc.)
  – Content language (e.g., KIF, SQL, STEP, Prolog, etc.)
  – Ontology assumed by the content
• KQML includes primitive message types of particular interest for building interesting agent architectures (e.g., for mediators, sharing intentions, etc.)


A KQML Message

A KQML message represents a single speech act or performative (ask, tell, reply, subscribe, achieve, monitor, ...) with an associated semantics and protocol, and a list of attribute/value pairs (:content, :language, :from, :in-reply-to):

    (tell :sender      bhkAgent
          :receiver    fininBot
          :in-reply-to id7.24.97.45391
          :ontology    ecbk12
          :language    Prolog
          :content     "price(ISBN3429459,24.95)")

Here tell is the performative, and each keyword such as :sender is a parameter followed by its value.
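A minimal sketch of how such a message might be built and read back, assuming a flat message whose values are either plain symbols or double-quoted strings (nested expressions, such as a performative embedded in :content, are not handled). `make_kqml` and `parse_kqml` are hypothetical helpers, not part of any KQML API.

```python
import re


def make_kqml(performative, **params):
    """Render a flat KQML message; Python names like in_reply_to become :in-reply-to."""
    parts = [performative]
    for name, value in params.items():
        keyword = ":" + name.replace("_", "-")
        text = str(value)
        # Quote anything that is not a plain symbol, escaping \ and ".
        if not re.fullmatch(r"[\w.\-]+", text):
            text = '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'
        parts.append(keyword + " " + text)
    return "(" + " ".join(parts) + ")"


# A token is either a quoted string (with escapes) or a run of non-space, non-paren chars.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|[^\s()]+')


def parse_kqml(message):
    """Parse a flat KQML message into (performative, {parameter: value})."""
    body = message.strip()
    if not (body.startswith("(") and body.endswith(")")):
        raise ValueError("a KQML message must be a parenthesized expression")
    tokens = TOKEN.findall(body[1:-1])
    performative, rest = tokens[0], tokens[1:]
    params = {}
    for keyword, value in zip(rest[0::2], rest[1::2]):
        if value.startswith('"'):
            value = re.sub(r"\\(.)", r"\1", value[1:-1])  # undo the escaping
        params[keyword] = value
    return performative, params


msg = make_kqml("tell", sender="bhkAgent", receiver="fininBot",
                in_reply_to="id7.24.97.45391", ontology="ecbk12",
                language="Prolog", content="price(ISBN3429459,24.95)")
print(msg)
performative, params = parse_kqml(msg)
print(performative, params[":content"])  # tell price(ISBN3429459,24.95)
```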


Some KQML Performatives

• Basic query performatives:
  – evaluate, ask-if, ask-in, ask-one, ask-all, ...
• Multi-response query performatives:
  – stream-in, stream-all
• Response performatives:
  – reply, sorry, error
• Generic informational performatives:
  – tell, achieve, cancel, untell, unachieve
• Generator performatives:
  – standby, ready, next, rest, discard, generator
• Capability-definition performatives:
  – advertise, subscribe, monitor, import, export, ...
• Networking performatives:
  – register, unregister, forward, broadcast, route, ...


Simple Query Performatives

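The ask, ask-one, ask-all, ask-if, and ask-about performatives provide a simple query mechanism.

[Figure: agent A queries agent B. An ask-one(P) is answered by a single tell(P); an ask-all(P) is answered by a single tell((p1 p2 p3 ...)) carrying all the answers; and a third exchange, labeled ask-all(P), is answered by a series of separate messages tell(p1), tell(p2), tell(p3).]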
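A sketch of how a responding agent B might answer these kinds of query against a small fact base, assuming P is a predicate name and B's knowledge base maps it to its known instances. To return one tell per answer it uses stream-all, the multi-response performative listed earlier; the fact base and the `respond` function are hypothetical.

```python
# Hypothetical knowledge base of agent B: predicate -> known instances.
KB = {"price": ["price(isbn1, 24.95)", "price(isbn2, 9.99)", "price(isbn3, 15.00)"]}


def respond(performative, predicate):
    """Return the list of (performative, content) replies B sends back to A."""
    answers = KB.get(predicate, [])
    if not answers:
        return [("sorry", predicate)]  # B has nothing that matches
    if performative == "ask-one":
        return [("tell", answers[0])]  # a single answer
    if performative == "ask-all":
        return [("tell", "(" + " ".join(answers) + ")")]  # one reply holding a list
    if performative == "stream-all":
        return [("tell", a) for a in answers]  # one reply per answer
    return [("error", performative)]  # unsupported performative


for query in ("ask-one", "ask-all", "stream-all"):
    print(query, respond(query, "price"))
```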


KQML protocols

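[Figure: four facilitation protocols involving a requesting agent A, a facilitator C, and a provider B that has sent C an advertise(ask(P)) message:
– Broker: A sends broker(ask(P)) to C; C sends ask(P) to B, receives tell(P), and passes tell(P) back to A.
– Recruit: A sends recruit(ask(P)) to C; C sends ask(P) to B, and B sends tell(P) directly to A.
– Recommend: A sends recommend(ask(P)) to C; C forwards B’s advertisement, fwd(adv(ask(P))), to A; A then sends ask(P) to B, which replies tell(P).
– Register: agents register with agent name servers (ANS1, ANS2) via register(B,...); lookups such as ask-one(register(B,...)) and ask-one(register(D,...)) are answered with the corresponding tell(register(...)) messages.]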
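The broker, recruit and recommend protocols differ mainly in who talks to whom. A minimal message-trace sketch, assuming a facilitator C that stores advertisements and a single provider B that can answer ask(P); the classes and the (from, to, message) trace format are hypothetical and model only the routing, not KQML parsing or transport.

```python
class Provider:
    def __init__(self, name, answers):
        self.name, self.answers = name, answers

    def ask(self, query):
        return self.answers[query]


class Facilitator:
    def __init__(self):
        self.ads = {}  # query -> provider that advertised it

    def advertise(self, provider, query):
        self.ads[query] = provider

    def handle(self, mode, requester, query, trace):
        """Append (from, to, message) tuples to trace and return the answer."""
        provider = self.ads[query]
        trace.append((requester, "C", f"{mode}(ask({query}))"))
        if mode == "broker":
            # C asks B itself and relays the answer back to A.
            trace.append(("C", provider.name, f"ask({query})"))
            answer = provider.ask(query)
            trace.append((provider.name, "C", f"tell({answer})"))
            trace.append(("C", requester, f"tell({answer})"))
        elif mode == "recruit":
            # C passes the query on; B answers A directly.
            trace.append(("C", provider.name, f"ask({query})"))
            answer = provider.ask(query)
            trace.append((provider.name, requester, f"tell({answer})"))
        elif mode == "recommend":
            # C only tells A who advertised; A then asks B itself.
            trace.append(("C", requester, f"fwd(adv(ask({query})))"))
            trace.append((requester, provider.name, f"ask({query})"))
            answer = provider.ask(query)
            trace.append((provider.name, requester, f"tell({answer})"))
        else:
            raise ValueError(f"unknown protocol: {mode}")
        return answer


b = Provider("B", {"P": "P"})
c = Facilitator()
c.advertise(b, "P")
for mode in ("broker", "recruit", "recommend"):
    trace = []
    c.handle(mode, "A", "P", trace)
    print(mode, ["%s->%s %s" % step for step in trace])
```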


KQML Semantics

Semantics is currently defined as:
– preconditions and postconditions on sending and receiving agents, in terms of
– predicates describing the virtual knowledge base (“mental states”) of agents, and
– grammars describing constraints on coherent discourse.

See (Labrou 1996) and (Labrou and Finin, 1997)




KQML APIs and System Interfaces

KQML APIs:
– KATS (Loral/UMBC)
– KAPI (Lockheed/EIT/Stanford)
– MAGENTA (Stanford)
– InfoSleuth (MCC)
– LogicWare (Cryzalys)
– Toronto -- Lisp and C
– JavaAgentTemplate (Stanford), JAFMAS (Cincinnati), JKQML (IBM), Jackal (UMBC) -- Java
– TKQML (UMBC) -- Tcl/Tk
– Agentbase (SICS) -- Prolog

System interfaces:
– Languages: Lisp, C, C++, Perl, Prolog, Tcl, CLIPS, Scheme, Java, ...
– KR languages: LOOM, KRSL, ...
– Systems: DBMSs, SIMS, ...


KQML Utility Agents

• We have developed a number of KQML-speaking utility agents based on standard ontologies:
  – Agent Name Server
  – Message Router
  – Broker
  – Controller
  – Logger
  – Authenticator
  – Communication Visualizer
  – Proxy Agent
• These facilitate the development of agent-based systems.

[Figure: utility agents (ANS, network logger, visualizer, controller) connected to application agents, with registrations sent to the ANS and a log and log stream produced by the network logger for the visualizer.]


KQML Utility Agents


Agent Technology Conclusions

• Modern programming languages support building agents.
• Mobility is a new feature, however.
• Communication is of central importance, and in particular:
  – Establishing common agent communication languages
  – Developing common ontologies for application domains
  – Establishing common agent protocols

What Do Agents Have To Do With Information Retrieval?


Agent-based Information Systems

• Agents facilitate access to multiple information sources.
• Mediated architectures facilitate access to specialized collections.
• Distributed (agent) architectures facilitate scalability.




Information Retrieval and Agent Characteristics

Adaptation   Cooperation   Autonomy

10. Relevance Feedback ✔ ✔
9. Information Extraction ✔
8. Multimedia Retrieval ✔
7. Effective Retrieval ✔ ✔
6. Routing & Filtering ✔ ✔ ✔
5. Interfaces & Browsing ✔ ✔
4. Term Expansion
3. Efficiency & Flexibility ✔ ✔ ✔
2. Distributed IR ✔ ✔
1. Integrated solutions ✔ ✔ ✔

Agent-Based Information Retrieval (ABIR) Mind Map

[Figure: a mind map of agent-based IR systems with branches for knowbots (single interface to multiple resources: shopbots and metasearch), collaborative and adaptive-interface systems (adaptation to users and content), proactive and push systems, content experts (content comprehension), and large, heterogeneous, distributed systems. Systems shown include CARROT, InfoSleuth, Retsina, SAIRE, UMDL, Similarities Engine, EachMovie, Firefly, GroupLens, Morse, MovieCritic, Phoaks, RARE/Tunes, ReferralWeb, SiteSeer, Yenta, Letizia, Remembrance Agent, Backweb, Marimba, Pointcast, SIFT, TopicAGENTs, Fishwrap, MyYahoo, Netbot Jango, Shopbot, All-in-one, Fastfind, MetaCrawler, Metasearch, Profusion, SavvySearch, WebCompass, Fab, Syskill and Webert, and urlAgents.]


Knowbot ABIR Systems

• A ‘knowbot’ is an ABIR system that provides a single query language to access a variety of information sources.
• It serves as a representative for the user (which demands program autonomy).
• Examples of knowbots: MetaCrawler, SavvySearch, Shopbot, BargainFinder, Fido.
• MetaCrawler:
  – Queries a variety of search engines in parallel.
  – Provides a uniform user interface.
  – Merges results from different engines.
  – Downloads and scans pages if necessary.
  – Passes ads through.

Knowbots
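A sketch of the merging step, assuming each engine returns a ranked list of URLs with scores on its own scale. It normalizes each engine's scores by that engine's top score, collapses duplicates by summing the normalized scores of a URL that several engines returned (a CombSUM-style rule), and re-ranks. This is one illustrative fusion rule, not MetaCrawler's documented algorithm; the engine names and URLs are made up.

```python
from collections import defaultdict


def merge_results(results_by_engine):
    """results_by_engine: dict engine -> list of (url, score). Returns merged URL list."""
    combined = defaultdict(float)
    for engine, results in results_by_engine.items():
        if not results:
            continue
        top = max(score for _, score in results)
        for url, score in results:
            # Normalize so every engine's best hit scores 1.0.
            combined[url] += score / top if top > 0 else 0.0
    # Highest combined score first; break ties alphabetically for stable output.
    return sorted(combined, key=lambda url: (-combined[url], url))


results = {
    "engine_a": [("http://a.example/kqml", 90), ("http://b.example/kif", 45)],
    "engine_b": [("http://b.example/kif", 0.8), ("http://c.example/agents", 0.4)],
}
print(merge_results(results))
# ['http://b.example/kif', 'http://a.example/kqml', 'http://c.example/agents']
```

The page found by both engines rises to the top even though neither engine ranked it first, which is the usual motivation for summing rather than taking the maximum.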


Example: SavvySearch

• Tries to maximize the likelihood of finding good links while holding resource consumption to a minimum.
• Reasons about resource requirements.
• Ranks the available search engines by how well they respond to the terms in the query.
• Dispatches the query only to the top-ranked search engines.

Knowbots
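A sketch of the ranking-and-dispatch idea, assuming a hypothetical table of how well each engine has responded to each query term in the past. Engines are ranked by the sum of their scores for the query's terms, and only the top k receive the query, which is how resource consumption is held down. The table, engine names and scoring rule are illustrative; they are not SavvySearch's actual data or formula.

```python
# Hypothetical per-term effectiveness scores: engine -> {term: score}.
META_INDEX = {
    "engine_a": {"agents": 0.9, "kqml": 0.2},
    "engine_b": {"agents": 0.4, "kqml": 0.8, "ontology": 0.7},
    "engine_c": {"ontology": 0.3},
}


def rank_engines(query_terms, meta_index):
    """Score each engine by summing its scores for the query terms."""
    scores = {
        engine: sum(term_scores.get(term, 0.0) for term in query_terms)
        for engine, term_scores in meta_index.items()
    }
    return sorted(scores, key=lambda engine: (-scores[engine], engine))


def dispatch(query_terms, meta_index, k=2):
    """Send the query only to the k top-ranked engines to limit resource use."""
    return rank_engines(query_terms, meta_index)[:k]


print(dispatch(["agents", "kqml"], META_INDEX))  # ['engine_b', 'engine_a']
```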


Netbot Jango

• A “shopbot” for a selected set of products (e.g., music CDs, PC software, cigars)
• The server can do some limited “natural language processing” to interpret the query and match it to sources.
• Adapters for sources are downloaded from the server to the client as needed.
• The client runs in a browser on the user’s machine and issues the final queries.
• Bought by Excite early in 1998

Knowbots


Netbot Jango

• A dynamic display shows sources and their status.
• Collects information on sellers as well as ratings.
• A final display summarizes products, costs, etc.
• Provides links for ordering.

Knowbots




Fab -- Adaptive IR

• Fab recommends web pages, using adaptive information retrieval techniques to learn an individual’s profile.
• Like any recommendation service, Fab has three components:
  – Collection -- collect the items to be recommended
  – Selection -- select from the collected items those best for a particular user
  – Delivery -- deliver the selected items to the user
• Users’ feedback on how much they liked recommended pages is used to
  – adapt the user’s profile
  – assign credit or blame to the recommending collection agents
• A “genetic algorithm” is used to evolve the population of collection agents:
  – collection agents specialize over time to different topics, serving distinct groups of users
  – useless collection agents die; successful ones live and “reproduce”

http://fab.stanford.edu

Adaptive


Collaborative Filtering ABIR Systems

• A collaborative filtering ABIR system makes recommendations to a person based on the preferences of similar users.
• Recommending people: Yenta, ReferralWeb
• Recommending products: Firefly, Similarities Engine, Tunes (music); EachMovie, Morse, RARE, MovieCritic (movies & videos)
• Recommending readings: Wisewire, Firefly, Fab, Phoaks

Collaborative


Content-based vs. Collaborative Recommendation

• Content-based recommendation retrieves other documents similar to those the user liked earlier.
• Collaborative recommendation retrieves documents liked by other people similar to the user.

[Figure: in content-based recommendation, the user likes a document, and a recommended document is similar to it; in collaborative recommendation, the user is similar to another user, who likes the recommended document.]

Collaborative


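How Does Collaborative Filtering Work?

• Form a large vector space in which each item (e.g., each movie) represents one dimension.
• Represent each user as a (sparse) vector of movie ratings.
• Estimate how likely user $U_i$ is to like movie $M_j$ from the neighbors who have rated it:
  $P(U_i \text{ likes } M_j) \propto \sum_{k \in N_j(i)} P(U_k \text{ likes } M_j)$,
  where $N_j(i)$ is the set of nearest neighbors of $U_i$ who have rated $M_j$.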

Collaborative
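A small sketch of user-based collaborative filtering along these lines, assuming ratings on a 1-5 scale stored as sparse dictionaries. It finds neighbors with cosine similarity over the movies both users rated (Pearson correlation is a common alternative) and predicts a rating as the similarity-weighted average of the k nearest neighbors who rated the movie, a normalized variant of the sum above. The users, movies and ratings are invented.

```python
import math

ratings = {
    "ann": {"Alien": 5, "Brazil": 4, "Casablanca": 1},
    "bob": {"Alien": 4, "Brazil": 5, "Dune": 4},
    "cat": {"Alien": 1, "Casablanca": 5, "Dune": 2},
}


def cosine(u, v):
    """Cosine similarity over the movies both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[m] * v[m] for m in common)
    norm_u = math.sqrt(sum(u[m] ** 2 for m in common))
    norm_v = math.sqrt(sum(v[m] ** 2 for m in common))
    return dot / (norm_u * norm_v)


def predict(user, movie, ratings, k=2):
    """Similarity-weighted average rating of movie by user's k nearest raters."""
    neighbors = [
        (cosine(ratings[user], ratings[other]), other)
        for other in ratings
        if other != user and movie in ratings[other]
    ]
    neighbors = sorted(neighbors, reverse=True)[:k]
    total_sim = sum(sim for sim, _ in neighbors)
    if total_sim == 0:
        return None  # no usable neighbors
    return sum(sim * ratings[other][movie] for sim, other in neighbors) / total_sim


print(round(predict("ann", "Dune", ratings), 2))  # 3.43
```

Ann's tastes track Bob's far more closely than Cat's, so the prediction for Dune lands much nearer Bob's 4 than Cat's 2.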


Example: Firefly

• Has been applied to music, movies, web pages, books, etc.
• Uses several neighbor sets to improve precision.
• Will recommend people who rate a given page highly, and pages that a given person rates highly (e.g., MyYahoo).
• Developed a user “passport” allowing a single user profile to be used in a variety of applications.
• Bought by Microsoft in April 1998.

Collaborative


Syskill & Webert

• Takes over the browser, adding a new panel.
• Makes it easy for the user to rate a page as “hot” or “cold” with respect to one of several user-defined categories.
• Can retrieve new pages that the user should like.

Collaborative




Phoaks

• People Helping One Another Know Stuff
• "Together, we know it all."
• Looks for postings to topic-oriented Usenet newsgroups (e.g., comp.lang.prolog) that recommend web resources.
• Tallies and summarizes those recommendations.
• Indexed like Netnews.
• Filters out spurious and signature references.
• http://www.phoaks.com/

Collaborative
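A rough sketch of the tallying step, assuming postings are plain text, that a signature starts at a line consisting of "-- " (the usual Usenet convention), and that a resource counts once per distinct poster. The URL pattern and the sample postings are illustrative, and a real system would need further filtering rules beyond dropping signatures.

```python
import re
from collections import Counter

URL = re.compile(r"https?://[^\s<>\"')]+")


def body_without_signature(text):
    """Drop everything from the conventional '-- ' signature separator on."""
    lines = text.splitlines()
    if "-- " in lines:
        lines = lines[: lines.index("-- ")]
    return "\n".join(lines)


def tally(postings):
    """postings: list of (poster, text). Count distinct posters mentioning each URL."""
    seen = set()
    for poster, text in postings:
        for url in URL.findall(body_without_signature(text)):
            seen.add((poster, url.rstrip(".,;")))  # drop trailing punctuation
    return Counter(url for _, url in seen)


postings = [
    ("ann", "Try http://www.example.org/prolog-faq for that.\n-- \nhttp://ann.example/home"),
    ("bob", "The FAQ at http://www.example.org/prolog-faq is good."),
    ("bob", "Again: http://www.example.org/prolog-faq"),
]
print(tally(postings).most_common())  # [('http://www.example.org/prolog-faq', 2)]
```

Ann's home page appears only in her signature, so it is not counted, and Bob's repeated mention counts once.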


Proactive ABIR Systems

• Remembrance Agent
• Letizia
• Push model ABIR systems
  – Marimba
  – Backweb
  – Pointcast
  – Verity TopicAGENTs
  – SIFT (routing of Usenet news articles)

Proactive


Example: Remembrance Agent

• Indexes personal files and e-mail.
• Embedded in Emacs.
• When you perform a task, it automatically suggests relevant documents.
• Provides continuous associative recall.

Proactive


Example: Letizia

• The agent browses as you browse.
• People typically browse depth-first; Letizia browses breadth-first.
• Uses a variety of heuristics to identify interesting pages.
• When an interesting page is identified, it is displayed in a separate browser window.

Proactive
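The breadth-first idea can be sketched as a look-ahead search outward from the page the user is viewing, assuming a hypothetical link graph and a stand-in interest test (sharing a keyword with the user's interests). Letizia's actual heuristics were richer; this only shows how breadth-first exploration surfaces nearby interesting pages that a depth-first reader might never reach.

```python
from collections import deque

# Hypothetical web: page -> (keywords, outgoing links).
WEB = {
    "home": ({"news"}, ["agents", "sports"]),
    "agents": ({"agents", "kqml"}, ["kqml-spec", "ads"]),
    "sports": ({"football"}, ["scores"]),
    "kqml-spec": ({"kqml", "performatives"}, []),
    "ads": ({"shopping"}, []),
    "scores": ({"football"}, []),
}


def scout(start, interests, max_pages=10):
    """Breadth-first look-ahead from the current page; returns interesting pages."""
    queue, visited, found = deque([start]), {start}, []
    while queue and len(visited) <= max_pages:
        page = queue.popleft()
        keywords, links = WEB.get(page, (set(), []))
        if page != start and keywords & interests:
            found.append(page)  # Letizia would show this in a separate window
        for link in links:
            if link not in visited:
                visited.add(link)
                queue.append(link)
    return found


print(scout("home", {"kqml"}))  # ['agents', 'kqml-spec']
```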


Example: TopicAGENTs

• Provides an agent view of information retrieval tasks to the user.
• Tasks include filtering, categorization, and routing.
• A variety of delivery modes are available, including web page, database entry, e-mail, fax, and pager.
• Verity’s SEARCH'97 Agent Server “enables developers and integrators to set up Agents, which are personalized queries that include administrative and notification”

Proactive


Adding Semantic Markup

• Problem: agents’ understanding of NL text is very limited.
• Long-term solution: better NLP technology.
• Immediate solution: add key semantic information in a structured form.
• Possible encodings:
  – SHOE -- Simple HTML Ontology Extensions, developed at UMCP
  – XML -- eXtensible Markup Language
  – RDF -- Resource Description Framework
• A key to this will be our ability to develop “consensus ontologies”, or common conceptual models for the semantic markup.

Content experts




Proposed W3C Standards for Metadata

• XML provides an extensible markup language (SGML light).
• RDF data consists of nodes and attached attribute/value pairs, providing the expressive power of semantic networks.
  – Nodes can be any web resources (pages, servers, basically anything for which you can give a URI), including other RDF expressions.
  – Attributes are named properties of the nodes, and their values are either atomic (e.g., strings, numbers) or other resources or metadata instances.
• Other standards are built on top of RDF, including:
  – P3P: Platform for Privacy Preferences, for the exchange of privacy practices and preferences among Web sites, agents and users
  – PICS: an infrastructure for associating metadata labels with Internet content

Content experts


An RDF Example

The first namespace declaration below points to the document that defines the BIB ontology; the second points to the document that defines the RDF syntax.

    <?xml:namespace name="http://docs.r.us.com/bibliography-info/" as="BIB"?>
    <?xml:namespace name="http://www.w3.org/TR/WD-rdf-syntax#" as="RDF"?>
    <RDF:RDF>
      <RDF:Description RDF:HREF="http://www.bar.com/some.doc">
        <BIB:Author>
          <RDF:Description>
            <BIB:Name>John Smith</BIB:Name>
            <BIB:Email>[email protected]</BIB:Email>
            <BIB:Phone>+1 (555) 123-4567</BIB:Phone>
          </RDF:Description>
        </BIB:Author>
      </RDF:Description>
    </RDF:RDF>

In words: document http://www.bar.com/some.doc has an author whose name is “John Smith”, whose email address is “[email protected]”, and whose phone number is “+1 (555) 123-4567”.

Content experts
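RDF's node-and-attribute model can be pictured as a set of (subject, property, value) triples. A sketch of the name and phone statements from the example above in that form, with the anonymous author description given a hypothetical blank-node label `_:author1`; the helper functions are illustrative only and are not an RDF library API.

```python
# Each statement is a (subject, property, value) triple.
triples = [
    ("http://www.bar.com/some.doc", "BIB:Author", "_:author1"),
    ("_:author1", "BIB:Name", "John Smith"),
    ("_:author1", "BIB:Phone", "+1 (555) 123-4567"),
]


def values(subject, prop, graph):
    """Return all values of property prop attached to node subject."""
    return [v for s, p, v in graph if s == subject and p == prop]


def follow(subject, path, graph):
    """Follow a sequence of properties from a node, e.g. Author then Name."""
    nodes = [subject]
    for prop in path:
        nodes = [v for n in nodes for v in values(n, prop, graph)]
    return nodes


print(follow("http://www.bar.com/some.doc", ["BIB:Author", "BIB:Name"], triples))
# ['John Smith']
```

Because the author is itself a node, a question such as “what is the author's phone number?” becomes a two-step walk through the graph.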


Large, Heterogeneous Corpora

• Dynamic corpus, measured in MBs (or GBs)

• Documents may vary in size, format, or language

• Queries may be comparable to documents in size

• Assume the duality of queries and documents

[Figure: new documents and queries both flow into the corpus.]

Distributed


Basic Agent-based IR Architecture

Distributed

[Figure: multiple users exchange queries, documents and feedback with agents, which send queries to multiple information sources and receive documents from them.]


Key Ideas in Basic Architecture

• User as client, data provider as server

• Data providers may be very different

• Metadata - data about the data. Salient characteristics of a query, a document, or a corpus can be discovered, stored, and used for future searches

• Similarity function - to match a query against a candidate document, or against metadata

• Relevance Feedback - let the user indicate which documents are/are not useful, and let that information be used in future queries
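A minimal sketch of the similarity-function and relevance-feedback ideas, assuming a simple vector space model with raw term frequencies and a naive whitespace tokenizer. Cosine similarity is one common choice of similarity function; the feedback step uses the classic Rocchio update, a standard technique swapped in here for illustration (the slide does not name a method), and the weights `alpha`, `beta`, `gamma` are conventional illustrative values rather than tuned ones.

```python
import math
from collections import Counter


def vectorize(text):
    """Term-frequency vector using a naive lowercase/whitespace tokenizer."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style update: move the query toward the average of the
    documents judged relevant and away from those judged non-relevant.
    Terms whose weight drops to zero or below are discarded."""
    new = Counter()
    for t, w in query.items():
        new[t] += alpha * w
    for doc in relevant:
        for t, w in doc.items():
            new[t] += beta * w / len(relevant)
    for doc in nonrelevant:
        for t, w in doc.items():
            new[t] -= gamma * w / len(nonrelevant)
    return {t: w for t, w in new.items() if w > 0}


docs = {
    "d1": vectorize("agents route queries to brokers"),
    "d2": vectorize("brokers merge ranked results"),
    "d3": vectorize("stock prices fell sharply"),
}
query = vectorize("agents brokers")
ranking = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranking)  # ['d1', 'd2', 'd3']

# The user marks d2 as useful and d3 as not useful.
refined = rocchio(query, [docs["d2"]], [docs["d3"]])
```

Because queries and documents share one representation, the same `cosine` function could also compare a query against a collection's metadata vector.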


Issues in the Basic ABIR Architecture

• Each user is represented by (at least) one agent, which has (or acquires) information about that user’s preferences and information needs. How best to do this is the user profile problem

• Queries may be modified (e.g., expanded) and divided among providers: this is the query processing problem

• The providers may use different data models and/or query formats: this is the heterogeneity problem

• Documents derived from different sources need to be merged or ranked in a consistent manner: this is the data fusion problem
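One simple way to attack the data fusion problem is to rescale each source's scores to a common range and add them up, a rule often called CombSUM. The sketch below assumes each provider returns a `{doc_id: score}` map; min-max normalization and summation are illustrative choices, not the method of any particular system described here.

```python
def min_max(scores):
    """Rescale one source's {doc_id: score} map into [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def comb_sum(result_lists):
    """Fuse several sources' results by summing their normalized scores.
    Returns doc ids, best first; ties are broken by doc id."""
    fused = {}
    for scores in result_lists:
        for d, s in min_max(scores).items():
            fused[d] = fused.get(d, 0.0) + s
    return sorted(fused, key=lambda d: (-fused[d], d))


# Source A scores on a 0-10 scale, source B on a 0-100 scale.
source_a = {"a1": 9, "x": 7, "a2": 1}
source_b = {"x": 80, "b1": 50, "b2": 20}
print(comb_sum([source_a, source_b]))  # ['x', 'a1', 'b1', 'a2', 'b2']
```

Documents returned by several providers (here `x`) rise to the top. Min-max scaling is sensitive to outliers in a source's scores, so a real broker might prefer rank-based or calibrated scores.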


Mediated Agent-based IR Architecture

(Figure labels: multiple users, User Agents, Broker Agents, Specialized Agents, Provider Agents, multiple information sources; queries & documents, feedback, queries, documents, metadata about users, metadata on collections)


How Does this Help?

• User Agents

– “Remember” what each user likes and dislikes

– May be involved in gathering and fusing results

• Brokers

– Facilitate communication among user agents and back-ends, and in particular

– Route queries and new documents to the correct back-ends

• Back-ends

– Handle operations native to providers (IR systems or DBMSs)


Metadata

• Back-ends generate metadata, which describes their own data

• Brokers collect metadata, and use that to make routing and data fusion decisions

• User Agents collect metadata as they record what material the user does or does not want

• Document “centroids” are useful as metadata

– May need to use several such objects to describe “logical” sub-corpora

– Centroid may be cumbersome, so some reduction may be called for
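The centroid idea can be sketched as follows, assuming each back-end summarizes its documents as the average of unit-length term-frequency vectors and a broker routes a query to the back-end(s) whose centroid is most similar. Keeping only the heaviest terms stands in for the “reduction” mentioned above; the tokenizer, the `keep` cut-off, and all function and collection names are hypothetical.

```python
import math
from collections import Counter


def unit_vector(text):
    """Term-frequency vector scaled to unit length (naive tokenizer)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def centroid(vectors):
    """Average of a collection's document vectors: its 'centroid'."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds weights term by term
    return {t: w / len(vectors) for t, w in total.items()}


def reduce_centroid(c, keep):
    """Crude reduction: keep only the `keep` heaviest terms."""
    top = sorted(c.items(), key=lambda item: item[1], reverse=True)[:keep]
    return dict(top)


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def route(query, metadata, k=1):
    """Broker-side routing: send the query to the k back-ends whose
    centroid is most similar to it."""
    q = unit_vector(query)
    ranked = sorted(metadata, key=lambda name: cosine(q, metadata[name]),
                    reverse=True)
    return ranked[:k]


backends = {
    "finance": ["stock prices fell", "bond yields rose"],
    "earth": ["satellite images of clouds", "earth observing satellite data"],
}
metadata = {name: reduce_centroid(centroid([unit_vector(d) for d in docs]), 50)
            for name, docs in backends.items()}
print(route("satellite data", metadata))  # ['earth']
```

Only the (reduced) centroids travel to the broker, which is what keeps this kind of metadata cheap to share.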


SAIRE: Scalable Agent-based Information Retrieval Engine

• To provide access to NASA EOSDIS data

• Support for naive and expert users

• Three varieties of agents: User Interface Agents, Coordinator Agents, and Domain Specialists

• Uses a “coarse-grained” agent communication language

• Agents include a CLIPS shell, and are complete expert systems

• To see a demo, visit http://saire.ivv.nasa.gov/saire.html


UMDL: University of Michigan Digital Library

Three classes of agents:

• User Interface Agents (UIAs) accept queries and add them to user profiles

• Mediators apply the user profile to plan searches and forward queries to Collection Interface Agents (CIAs)

• CIAs provide interface functions to search engines


UMDL

• UMDL treats alternative information services as competing economic activities.

• Agents interact in supplier-producer relationships.

• Agents dynamically connect with each other as opportunities arise.

• The collections, represented by collection interface agents, provide “raw materials” in this process.

• Library end users, represented by user interface agents, are the consumers of the “finished goods”.

• Mediator agents bridge the gap by bringing to bear knowledge, processing, storage, or other computational resources to improve the expected value of the information.




Ontologies in UMDL

• The UMDL ontology for bibliographic relations defines a fairly elaborate structure of precisely defined concepts.

• Users can explore the CD holdings via a Java applet which accesses an ontology knowledge base describing them.


Retsina

• Reusable Task Structure-based Intelligent Network Agents

• Interface Agents represent users, and communicate with Task Agents

• Task Agents are capable of making plans to achieve goals

– Domain-independent sub-plans, indexed by goal

– Domain-specific sub-plans can be added

• Information Agents may collaborate with each other


Retsina Architecture


Inside a RETSINA Agent


Warren Portfolio Manager

An application of this architecture is WARREN, a personal expert agent providing an integrated financial picture of an investment portfolio using existing Internet information resources.


CARROT: Cooperating Agent-based Routing and Retrieval of Text

• Agents provide access to different corpora, using existing IR engines

• Agents share metadata with Broker agents, which route queries and new documents to the “right” place(s).

• Two enabling technologies:

– KQML agent communication language

– N-gram processing
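KQML messages are s-expressions built from a performative (such as `ask-all` or `tell`) plus keyword parameters like `:sender`, `:receiver`, `:reply-with`, `:language`, `:ontology`, and `:content`. The Python helper below only formats such a message as text; it does not implement KQML transport or semantics. The agent names, the `text` content language, and the `carrot-ir` ontology are hypothetical placeholders, not CARROT's actual vocabulary (the slides indicate CARROT agents were built with TKQML and Tcl).

```python
def kqml(performative, **params):
    """Render a KQML message as an s-expression string.

    Python keyword names use underscores and are mapped to KQML's
    hyphenated parameter names (reply_with -> :reply-with). String values
    that are empty or contain spaces or quotes are written as quoted
    strings; everything else is written as-is."""
    parts = [performative]
    for key, value in params.items():
        if isinstance(value, str) and (value == "" or " " in value or '"' in value):
            value = '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
        parts.append(":" + key.replace("_", "-") + " " + str(value))
    return "(" + " ".join(parts) + ")"


# A user agent asks a broker for all matches to a free-text query.
msg = kqml("ask-all", sender="user-agent-1", receiver="broker",
           reply_with="q17", language="text", ontology="carrot-ir",
           content="agent based retrieval")
print(msg)
```

The `:reply-with` tag lets the broker's eventual answer be matched to this request, which matters when many queries are in flight at once.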




(Figure: CARROT architecture. Labels: Agent, Control Agent, Agent Name Server, Agent Broker, Back-end (mg), two Back-ends (Telltale), Back-end (DBMS), KQML messages, queries & data)


CARROT Back-end Agent

• Interacts with local IR/DB engines to access data

• No interference with existing applications

• Generates metadata, to be shared with one or more brokers

• Metadata for a set of documents is a [compressed] n-gram profile
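A back-end's n-gram profile might be computed as below, assuming character 5-grams with word breaks marked by `_` and “compression” done simply by keeping the most frequent grams. The slides do not say how CARROT compresses its profiles, so `keep` and the helper names are assumptions; note that the kept frequencies are relative to all grams seen, not renormalized.

```python
from collections import Counter


def ngrams(text, n=5):
    """Character n-grams; whitespace runs become '_' so grams span words."""
    s = "_".join(text.lower().split())
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def ngram_profile(documents, n=5, keep=100):
    """Relative frequencies of the `keep` most common n-grams in a set of
    documents: one simple way to 'compress' a back-end's metadata."""
    counts = Counter()
    for doc in documents:
        counts.update(ngrams(doc, n))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.most_common(keep)}


profile = ngram_profile(["agents route queries", "agents merge results"], keep=3)
print(sorted(profile))  # ['agent', 'ents_', 'gents']
```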

(Figure: back-end agent structure. Labels: KQML, TKQML, Tcl agent, “your app. here”, Telltale, TK GUI, network, corpus)


CARROT Broker

(Figure: broker structure. Labels: KQML, TKQML, TK GUI, broker.c, Tcl agent, Telltale, network, metadata)

• Collects metadata from back-end agents

• Uses Telltale to manage these metadata corpora

• Otherwise similar to back-end agents

• Can be organized in hierarchies


Telltale

• Telltale is an information retrieval engine designed for

– scalability

– use with a wide variety of document types and languages

– embedding in larger systems

– generating corpus metadata

• Some key features include

– vector space model for representing documents and queries

– use of character n-grams

– use of corpus “centroids” as metadata

– Tcl/Tk user interface

– Agent API using KQML
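One way a corpus centroid can sharpen n-gram matching is to subtract it from every vector before computing cosine similarity, so grams that are common across the whole corpus (such as `_the_`) count for little. The sketch below assumes normalized 5-gram frequencies and this centroid-subtraction step; it illustrates the general technique and is not Telltale's documented implementation. All names are hypothetical.

```python
import math
from collections import Counter


def ngram_vector(text, n=5):
    """Normalized character n-gram frequencies ('_' marks word breaks)."""
    s = "_".join(text.lower().split())
    grams = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()} if total else {}


def corpus_centroid(vectors):
    """Mean of all document vectors in the corpus."""
    acc = Counter()
    for v in vectors:
        acc.update(v)
    return {g: w / len(vectors) for g, w in acc.items()}


def centered_cosine(u, v, centroid):
    """Cosine of (u - centroid) and (v - centroid): grams common to the
    whole corpus contribute little, distinctive grams dominate."""
    keys = set(u) | set(v) | set(centroid)
    a = {g: u.get(g, 0.0) - centroid.get(g, 0.0) for g in keys}
    b = {g: v.get(g, 0.0) - centroid.get(g, 0.0) for g in keys}
    dot = sum(a[g] * b[g] for g in keys)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0


corpus = [
    "satellite images of the earth",
    "earth observing satellite data",
    "stock prices of the day",
    "bond prices of the year",
]
vectors = {doc: ngram_vector(doc) for doc in corpus}
c = corpus_centroid(list(vectors.values()))
q = ngram_vector("satellite imagery")
ranking = sorted(corpus, key=lambda d: centered_cosine(q, vectors[d], c),
                 reverse=True)
print(ranking[0])
```

The same centroid doubles as corpus metadata, which is why one computation can serve both retrieval and routing.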


n-grams vs. words

• An IR system can use n-grams or words as terms

• Advantages of n-grams

– No morphological model (e.g., for stemming) is needed, so they work well in multilingual environments or on non-natural-language corpora (e.g., Java code).

– Robust with respect to letter errors (typos, OCR errors, etc.)

– Provide some context, since they span adjacent words: “computer science” --> compu + … + ter_s + er_sc + r_sci + _scie + ...

• Advantages of words

– Can be more precise (“computation” but not “computer”)

– Amenable to boolean combinations

• Bottom line?

– Depends on specifics of application
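The “computer science” example can be reproduced with a few lines of Python, assuming spaces are replaced by `_` before sliding a 5-character window. The `dice` helper, a hypothetical illustration of the typo-robustness point, compares two words by their shared trigrams: a transposed spelling still overlaps with the correct one, where an exact word match would score zero.

```python
def char_ngrams(text, n=5):
    """Character n-grams; whitespace runs become '_' so grams can span
    adjacent words."""
    s = "_".join(text.lower().split())
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def dice(a, b, n=3):
    """Dice coefficient of two strings' n-gram sets: 2|A & B| / (|A| + |B|)."""
    grams_a, grams_b = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    if not grams_a and not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


print(char_ngrams("computer science")[5:9])
# ['ter_s', 'er_sc', 'r_sci', '_scie']

# A transposition typo still leaves shared trigrams; an unrelated word shares none.
print(round(dice("retrieval", "retreival"), 3))  # 0.429
print(dice("retrieval", "computer"))             # 0.0
```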


Desiderata for Corpus Metadata

• Effective - metadata accurately predicts corpus content

• Concise - metadata can be quickly transmitted to brokers

• Abstractable - to support hierarchies of brokers

• Interchangeable - applies to queries, documents, and corpora in the same manner

• Generatable - can be produced automatically

• Versatile - applicable to a wide variety of documents

Corpus centroids based on n-grams satisfy these criteria
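Abstractability can be made concrete for centroid metadata: a parent broker can combine its children's centroids into one, weighting each by the number of documents it summarizes, and obtain exactly the centroid of all the underlying documents. This sketch assumes the document counts are reported along with the centroids (the slides do not say how); if children send reduced centroids, the merged result is only an approximation. Function names are hypothetical.

```python
def centroid(vectors):
    """Mean of a list of sparse vectors (dicts)."""
    acc = {}
    for v in vectors:
        for t, w in v.items():
            acc[t] = acc.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in acc.items()}


def merge_centroids(children):
    """Combine child centroids into a parent broker's centroid.

    `children` is a list of (centroid, num_docs) pairs. Weighting each
    child by its document count makes the result equal to the centroid
    of all the underlying documents."""
    total_docs = sum(n for _, n in children)
    merged = {}
    for c, n in children:
        for t, w in c.items():
            merged[t] = merged.get(t, 0.0) + w * n
    return {t: w / total_docs for t, w in merged.items()}


docs_a = [{"x": 1.0}, {"y": 1.0}]   # summarized by back-end A
docs_b = [{"x": 1.0}]               # summarized by back-end B
parent = merge_centroids([(centroid(docs_a), len(docs_a)),
                          (centroid(docs_b), len(docs_b))])
direct = centroid(docs_a + docs_b)
```

Because the merge needs only each child's centroid and count, brokers can be stacked to any depth without ever shipping raw documents upward.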




Telltale user interface

(Figure: Telltale interface. Labels: query; current document; documents in corpus, sorted by similarity to query; show highlights; set thresholds; functions; relevance feedback; highlighted word or phrase)


VR-based visualization of retrieval

• The only way to comprehend a large corpus or result set is through visualization.

• SFA, for example, provides

– Real-time, interactive stereo viewing of results of an information retrieval engine

– Each document returned is rendered as a glyph (icon)

– Document properties mapped to 3D location, shape, color, transparency, and texture.

– Spatialization of complex relationships and comprehensible display of multiple variables


VR Approach

• Immersive

– Isolates the user from the environment

– Expensive

• Minimally immersive

– Access to environment

– Collaboration possible

– Low cost

– Two hands give proprioception

• Uses two 3D trackers with buttons

– User manipulates 3D scene with trackers

– Each hand has a distinct role: left sets up context and right performs fine manipulation


Visualizing a document space

Mappings

X: similarity to “federal reserve bank”

Y: similarity to “commodity prices”

Z: similarity to “foreign exchange rate of the dollar”

Shape: similarity to “coup attempt against Noriega”, with cube as lowest and cone as highest

Color: age of document, with blue as the oldest and yellow as the newest

Transparency:

Texture:
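These mappings amount to a small function from a document's scores to glyph attributes. The sketch assumes similarity scores in [0, 1], represents shape as a single cube-to-cone blend value, and linearly interpolates color from yellow (newest) to blue (oldest). These encodings, the `Glyph` type, and `make_glyph` are hypothetical; transparency and texture are omitted because their mappings are not given.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    x: float           # similarity to "federal reserve bank"
    y: float           # similarity to "commodity prices"
    z: float           # similarity to "foreign exchange rate of the dollar"
    cone_blend: float  # 0.0 = cube (lowest similarity) ... 1.0 = cone (highest)
    color: tuple       # (r, g, b), yellow = newest ... blue = oldest


YELLOW = (1.0, 1.0, 0.0)
BLUE = (0.0, 0.0, 1.0)


def make_glyph(sims, age, max_age):
    """Map one document's scores to glyph attributes.

    `sims` holds four similarity scores assumed to lie in [0, 1];
    `age` runs from 0 (newest) to `max_age` (oldest)."""
    t = age / max_age if max_age else 0.0
    color = tuple(yl + t * (bl - yl) for yl, bl in zip(YELLOW, BLUE))
    return Glyph(
        x=sims["federal reserve bank"],
        y=sims["commodity prices"],
        z=sims["foreign exchange rate of the dollar"],
        cone_blend=sims["coup attempt against Noriega"],
        color=color,
    )


scores = {
    "federal reserve bank": 0.8,
    "commodity prices": 0.1,
    "foreign exchange rate of the dollar": 0.6,
    "coup attempt against Noriega": 0.0,
}
newest = make_glyph(scores, age=0, max_age=10)
oldest = make_glyph(scores, age=10, max_age=10)
print(newest.color, oldest.color)  # (1.0, 1.0, 0.0) (0.0, 0.0, 1.0)
```

Straight RGB interpolation passes through gray midway; a renderer might prefer interpolating in a perceptual color space instead.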


Conclusions

• ABIR provides advantages

– Handles dynamic, distributed, heterogeneous corpora

– Better performance, via specialized agents

– Supports user pull and data push

– Can adapt to users’ interests and preferences

• Enabling technologies are here

– Agent communication languages and protocols, e.g., KQML

– Semantic markup languages and ontologies, e.g., SHOE, XML, RDF

– Machine learning methods and algorithms

• What’s next?

– We need better ways to represent and process metadata

– Ability to handle rich media: images, speech, video, data, etc.

– Intelligent fusion across heterogeneous sources

– More experience through building real systems in this paradigm


Conclusions

• The evidence suggests that agents can help with some interesting problems

• Growing consensus on agent varieties and attributes

• Enabling technologies are here:

– Agent Communication Languages, e.g., KQML

– Ontology software and standards (e.g., Ontolingua, XML, RDF)

• Example systems are being built for a variety of relevant tasks, including information retrieval, filtering, and routing; intelligent user interfaces; and ontology construction.


Not a Magic Bullet

• We can’t point to a single task, problem, or capability that requires an agent-oriented approach.

• However, we believe that many tasks will be easier if we follow this paradigm.

• Moreover, the resulting system should be easier to extend to new capabilities.


Research Issues in ABIR

• Questions of scale

– Metadata

– Data fusion

– Performance on interesting (large) corpora

– Architectural issues

• User interface

• Multi-lingual corpora

• Multi-media documents


Research Issues in Agent-Based Computation

• Social acceptance

• Generic vs. specialized approaches and architectures

• Standards for languages and protocols

• Controlling agents in an open environment

• Social models for large sets of agents

– Economies or ecologies?

• More experience with non-trivial systems


To Learn More

• See http://www.cs.umbc.edu/abir for information on agent-based information retrieval

• See http://www.cs.umbc.edu/agents for general information on agents


Tim Finin

Dr. Timothy Finin is a Professor of Computer Science and Electrical Engineering at the University of Maryland Baltimore County. He has over 25 years of experience in the application of Artificial Intelligence to problems in database and knowledge base systems, intelligent information systems, expert systems, natural language processing, intelligent interfaces and robotics. He is currently working on the development of technology to support intelligent information agents. Prior to joining UMBC, he was a Technical Director at the Unisys Center for Advanced Information Technology, a member of the faculty of the University of Pennsylvania, and on the research staff of the MIT AI Lab. He holds a PhD in Computer Science from the University of Illinois. Finin is the author of over 100 research publications. He has been chair or program chair of several conferences in the area of intelligent systems and will serve as technical co-chair of Autonomous Agents-98.




James Mayfield

Dr. James Mayfield is an Associate Professor in the UMBC CSEE Department, currently on leave at the Johns Hopkins Applied Physics Laboratory. He received a Ph.D. in Computer Science from the University of California at Berkeley in 1989. Mayfield’s dissertation, which was part of the Unix Consultant project, explored how a consultant system can recognize the plans and goals of its users based on their English queries, so as to more effectively address their needs. Mayfield also has extensive research experience in developing applied natural language processing systems and participated in the third, fourth and fifth ARPA-supported Message Understanding Conferences (MUC3, MUC4 and MUC5). Mayfield has organized four workshops in the areas of "Natural Language Text Retrieval", "Intelligent Hypertext Systems" and "Intelligent Information Agents".


Charles Nicholas

Dr. Charles Nicholas is an Associate Professor of Computer Science and Electrical Engineering at UMBC. He received a Ph.D. in Computer Science from The Ohio State University in 1988. Nicholas was (is) General Chair of the ACM Conference on Information and Knowledge Management CIKM’95, CIKM’96 and CIKM’97, and Co-Chair of the Principles of Document Processing Workshop PODP’96 and PODP’98. His areas of interest include information retrieval, electronic document processing, and software engineering.