Worldwide Lexicon Brian McConnell May, 2002

Jan 29, 2016



Worldwide Lexicon

Brian McConnell

May, 2002


WWL – Brian McConnell

Worldwide Lexicon Intro

• Automatic discovery of dictionary, semantic net and translation servers throughout the net

• Creates standard client/server interface for communicating with servers

• Creates distributed human computing grid (allows servers to poll idle users to enter data, score recent submissions)

• “GNUtella for language services”

What WWL Does

• Creates a SOAP based interface for locating and communicating with language services

• Creates mechanism for discovering WWL servers on the fly

• Allows any application to talk to language servers with a few lines of code

• Allows existing dictionaries and MT systems to expose their data via WWL

• Creates something similar to SETI@Home, except it taps idle users to contribute knowledge

• Creates a web services API for language services

What WWL Does Not Do

• WWL does not create a global, centrally managed dictionary (WWL is a P2P network of dictionaries and language servers)

• WWL does not provide machine translation services (although WWL can be used to talk to existing MT servers)

• WWL does not compete with existing dictionaries or translation services. It makes existing systems more accessible to applications and their users.

• WWL does not specify how the internal processes of dictionary and MT servers work

Some Example Applications

• Browser and text editor plug-ins

• Extended dictionaries for machine translation systems

• Human assisted document translation

• Lexicon@Home client (polls users to enter data when they’re not busy)

• Multilingual chat clients (poll WWL data sources as needed to assist with translations)

• Real-time translation (via Jabber or SMS)

• Teaching aids

• User supported dictionaries and translation memories

Worldwide Lexicon Protocol

• Built upon the Simple Object Access Protocol

• Applications communicate via a small set of SOAP methods

• HTTP CGI interface also used for data entry and user peer review

• Goal: allow developers to locate and query any WWL data source with a few lines of code.

Protocol Overview

• Three types of methods

    • WWL server discovery and network status methods

    • WWL client/server query methods

    • Utility functions

• About a dozen methods overall

System Overview

• Four basic types of nodes

    • Supernodes (directory servers)

    • WWL servers (dictionaries, MT servers, semantic nets)

    • Gateways (allow non-WWL servers to present a WWL front end)

    • Client apps (plug-ins, IM clients, Lexicon@Home, etc.)

WWL Server Discovery

• Client app contacts a WWL supernode

• Invokes WWLFindServers() to fetch list of active servers and gateways that can process client’s request

• Supernode replies with a list of WWL servers, as well as information about each server’s capabilities

• WWL servers and gateways announce themselves to supernodes at startup via the WWLRegister() and WWLServerStatus() methods
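
The discovery flow above can be sketched as a toy in-memory directory. This is only an illustration under stated assumptions: the slides name WWLRegister(), WWLServerStatus() and WWLFindServers() but give no signatures, so the parameters, the `ServerRecord` fields and the example URLs are hypothetical, and a real supernode would expose these methods over SOAP rather than as local calls.

```python
from dataclasses import dataclass


@dataclass
class ServerRecord:
    url: str
    services: set          # hypothetical field, e.g. {"dictionary", "translation"}
    language_pairs: set    # hypothetical field, e.g. {("es", "fr")}
    active: bool = True


class Supernode:
    """Toy in-memory directory; not the WWL wire protocol."""

    def __init__(self):
        self._servers = {}

    def register(self, url, services, language_pairs):
        # Stands in for WWLRegister(): a server announces itself at startup.
        self._servers[url] = ServerRecord(url, set(services), set(language_pairs))

    def server_status(self, url, active):
        # Stands in for WWLServerStatus(): a server reports whether it is up.
        if url in self._servers:
            self._servers[url].active = active

    def find_servers(self, service, source, target):
        # Stands in for WWLFindServers(): list active servers that can
        # handle the requested service and language pair.
        return [
            rec for rec in self._servers.values()
            if rec.active
            and service in rec.services
            and (source, target) in rec.language_pairs
        ]
```

A client would call `find_servers("translation", "es", "fr")` and receive only servers that are both registered for that pair and currently marked active.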

WWL Supernodes

• Track current status of WWL servers and their peers (servers send registration and status messages)

• Client apps use supernodes to locate WWL servers and gateways on the fly (e.g. locate Spanish-French full-text translation server)

• Supernodes also provide quality control (known WWL servers are listed first)

• Anyone can host a supernode (similar to GNUtella directory servers)
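
Listing known servers first can be as simple as a stable sort. The sketch below assumes the supernode keeps a set of known server URLs; the function name and the set are illustrative, not part of the WWL spec.

```python
def rank_servers(urls, known):
    """Return urls with known servers first, preserving order within each group."""
    # The key is False for known servers and True for unknown ones, and
    # False sorts before True. sorted() is stable, so ties keep input order.
    return sorted(urls, key=lambda url: url not in known)
```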

WWL Gateways

• Translate WWL/SOAP method calls into other formats

• Can be used to talk to DICT dictionary servers

• Can be used to talk to proprietary systems

• Can do screen scraping (e.g. send query to web based MT server via CGI, scrape results from HTML response)

• Can even be used to cache and index static wordlists, making them appear as WWL data sources to any WWL client
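
As a rough illustration of the last point, a gateway could index a static wordlist in memory and answer lookups from it. The tab-separated input format and the class and method names are assumptions made for this sketch; the slides do not describe how a gateway stores or serves wordlists.

```python
class WordlistGateway:
    """Indexes a tab-separated wordlist so it can answer lookups like a language server."""

    def __init__(self, lines):
        self._index = {}
        for line in lines:
            line = line.strip()
            if not line or "\t" not in line:
                continue  # skip blank or malformed rows
            word, translation = line.split("\t", 1)
            # Case-insensitive index; one word may have several translations.
            self._index.setdefault(word.lower(), []).append(translation)

    def lookup(self, word):
        # Return every cached translation for the word (empty list if unknown).
        return list(self._index.get(word.lower(), []))
```

A real gateway would wrap `lookup` behind the WWL/SOAP query methods so that clients cannot tell the wordlist from a native WWL server.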

Client/Server Communication

• Three SOAP methods allow clients to submit queries to WWL servers via standard interface.

• WWL servers reply via SOAP, results are returned to client app in XML data structure

• WWL interface can co-exist with other interfaces (DICT, web/CGI, WAP, etc.)
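
To make the SOAP exchange concrete, here is a minimal sketch of how a client might wrap a method call in a SOAP 1.1 envelope using only the Python standard library. The envelope namespace is the standard SOAP 1.1 one; the `urn:example:wwl` namespace and the `source`/`target` parameter names are hypothetical placeholders, since the slides do not give the actual WWL message schema.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
WWL_NS = "urn:example:wwl"  # hypothetical namespace, not taken from the WWL spec


def build_request(method, params):
    """Wrap a method call and its string parameters in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{WWL_NS}}}{method}")
    for name, value in params.items():
        # Parameters are emitted as unqualified child elements of the call.
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")
```

For example, `build_request("WWLFindServers", {"source": "es", "target": "fr"})` returns an XML string that could be sent as the body of an HTTP POST to a supernode.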

Typical Client Session

• Contacts WWL supernode(s) to fetch list of active WWL servers according to language, services required

• Contacts top-ranked WWL server to perform query (e.g. translate phrase from Spanish to French)

• If query fails, contacts other WWL servers to perform query
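
The session above amounts to a failover loop over a ranked server list. A minimal sketch, assuming the caller supplies a `send_query(server, phrase)` function that raises an exception on failure (both the function and its signature are hypothetical):

```python
def query_with_failover(ranked_servers, send_query, phrase):
    """Try each server in rank order; return (server, result) from the first success."""
    errors = []
    for server in ranked_servers:
        try:
            return server, send_query(server, phrase)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append((server, exc))
    # Every server failed (or the list was empty).
    raise RuntimeError(f"all {len(errors)} servers failed")
```

Returning the server along with the result lets the client remember which server answered, for example to prefer it on the next query.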

Application Development

• WWL defines a client/server interface

• Client and server apps can be developed and tested independently

• System is complex, but individual components are simple

• Perfect fit for open source development model

Server Apps & Projects

• Updating existing dictionaries and machine translation servers for WWL and Lexicon@Home

• Building gateway servers that emulate WWL while talking to non-WWL servers (DICT, HTTP, etc)

• Document translation servers based on Lexicon@Home concept

Client Applications

• Browser/text editor plug-ins

• WWL chat clients

• Lexicon@Home clients

• Teaching aids

Updating Existing Servers

• As simple as adding a few scripts to respond to SOAP calls (reply via SOAP versus HTML)

• SOAP/WWL interface co-exists with other front ends

• WWL server can be read-only, or can allow user data entry through Lexicon@Home initiative

• Allows hundreds of existing dictionaries, encyclopedias and machine translation servers to participate in WWL with minimal effort

Example: WWL Chat Client

• Listens to incoming and outgoing messages

• When user enables translation, IM client uses WWL to contact machine translation servers as needed

• When user enables dictionary features, IM client assists user in translating words and phrases when composing messages (ideal for users who know a language but are not fluent)

Lexicon@Home

• Distributed human computing

• Users download small client program that polls WWL server(s) for jobs when user is not busy

• When WWL server has a job, it instructs the Lexicon@Home client to open the user's browser to a CGI form (data entry form is generated by WWL server)

• User enters requested information (definition, translation, score for other user’s submission)

• Each user does a small amount of work; with a large population, the system learns at a rapid pace
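
The polling cycle described above might look like the following sketch. Everything here is an assumption made for illustration: the `JobServer` class stands in for a WWL server, and `user_is_idle` and `ask_user` are caller-supplied callbacks replacing idle detection and the browser-based CGI form.

```python
from collections import deque


class JobServer:
    """Toy stand-in for a WWL server holding data-entry jobs."""

    def __init__(self, jobs):
        self._pending = deque(jobs)
        self.answers = {}

    def next_job(self):
        # Hand out one job, or None when nothing is waiting.
        return self._pending.popleft() if self._pending else None

    def submit(self, job, answer):
        self.answers[job] = answer


def poll_once(server, user_is_idle, ask_user):
    """One polling cycle: fetch and complete a job only when the user is idle."""
    if not user_is_idle():
        return False
    job = server.next_job()
    if job is None:
        return False
    server.submit(job, ask_user(job))
    return True
```

A client would call `poll_once` on a timer; each call handles at most one job, which keeps the work per user small.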

Quality Control

• Editorial oversight (WWL servers can require some or all user submissions to be reviewed by editors and trusted users via private CGI form)

• Randomized peer review (WWL server asks some Lexicon@Home users to score submissions from their peers)

• Hybrid system that combines randomized peer review with editorial oversight (editors focus on submissions with ambiguous scores or from unknown users).
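
One way to picture the hybrid approach is a triage function that accepts or rejects clear cases automatically and escalates the rest to editors. The 1-5 score scale, the thresholds and the minimum review count below are invented for this sketch; the slides do not specify how scores are defined or combined.

```python
from statistics import mean


def triage(scores, submitter_known, accept_at=4.0, reject_at=2.0, min_reviews=3):
    """Return 'accept', 'reject', or 'editor' for one submission.

    scores: peer-review scores on an assumed 1-5 scale.
    """
    if not submitter_known or len(scores) < min_reviews:
        return "editor"   # unknown user or too few peer reviews
    avg = mean(scores)
    if avg >= accept_at:
        return "accept"
    if avg <= reject_at:
        return "reject"
    return "editor"       # ambiguous score: escalate to an editor
```

With these defaults, editors only see submissions from unknown users, submissions with too few reviews, and those whose average falls strictly between the two thresholds.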

Project Timeline

• WWL protocol spec is available at www.worldwidelexicon.org

• Work to develop first generation apps (supernodes, retrofit existing dictionary servers) is underway

• Work to develop Lexicon@Home client is in progress

• Looking for developers to contribute to project

Development Priorities

• Stable supernode server

• Source libraries for use by existing dictionary and translation servers

• WWL gateway servers (to talk to non-WWL sites)

• Lexicon@Home client

• Simple client apps (browser plug-in, IM client that links to MT servers)
