
Aug 08, 2018

  • 8/22/2019 Screen Supp

    1/25

    LANGUAGE TECHNOLOGY
    Getting Started Guide

    The Guide from MultiLingual Computing & Technology
    October/November 2002, #51 Supplement

    Contents

    4   What Is Language Technology? Chris Langewis
    11  Comparing Basic Features of TM Tools, Angelika Zerfaß
    15  Technology and the Freelance Translator, Jonathan T. Hine, Jr.
    18  Finding the Right Language Technology Tools, David Shadbolt


    MultiLingual Computing & Technology

    Publisher / Editor: Donna Parrish
    Editor-in-Chief: Seth Thomas Schneider
    Associate Publisher: Jennifer Del Carlo
    Managing Editor: Laurel Wagers
    Associate Editor: Jim Healey

    Staff: Becky Bennett, Sandy Compton, Kendra Gray, Olya Helms, Doug Jones, Jerry Luther, Courtney McSherry, Bonnie Merrell, Sarah Ragan, David Shadbolt, Aric Spence, Cecilia Spence

    Editorial Board: Jeff Allen, Henri Broekmate, Bill Hall, Andres Heuberger, Chris Langewis, Ken Lunde, John O'Conner, Mandy Pet, Chris Pratley, Reinhard Schäler

    Printed in the USA

    This guide is published as a supplement to MultiLingual Computing & Technology, the magazine about language technology, Web globalization and international software development.

    To subscribe to MultiLingual Computing & Technology, go to www.multilingual.com/subscribe or send your name and address to

    MultiLingual Computing, Inc.
    319 North First Avenue
    Sandpoint, Idaho 83864 USA
    208-263-8178  Fax: 208-263-6310
    [email protected]

    Subscription rates: One year (eight issues) $58. Two years (sixteen issues) $100. Outside the United States, add $20 per year for postage.


    Kevin Watson


    I am not a very handy person. So, when I walk into a hardware store, I am amazed at the choice of tools, implements and machines that are available. The usefulness of some of the tools is inherently obvious. We can all appreciate a fine, well-crafted hammer and imagine how it can make a carpenter's job easier. Some of the things I find there are a total enigma to me. What does it do? Why would I want it to do that? And then, once I finally understand its function, what clever person designed this?

    I think this must be what it is like to approach the choices of language technology products. The range of capabilities and applications is mind-boggling. So, we've put together this supplement to introduce you to the field of multilingual language technology.

    Chris Langewis gives us an introduction to language technology with definitions of all the types of tools and descriptions of how they have evolved to make the language professional's job easier. Translation memory products are some of the most useful tools, and Angelika Zerfaß describes their common features. Jonathan Hine walks us through the technology used by a typical freelance translator and how that translator makes the tools work. So, once you have a grasp of this technology, what companies produce what? David Shadbolt introduces us to some of the producers in the industry. We follow that with a table of product listings to give you an idea of what is out there. We gathered much more information than is in the table, as you can see at www.multilingual.com/ltDetail

    It might be helpful to view the field of products as the parts forming a modern transportation bridge. The foundation is made up of engineering tools and computer resources (font libraries, character sets, internationalization and localization routines) that help form the basis of translation and localization processes. The horizontal girders are composed of translation software such as dictionaries, translation memory databases, machine translation products or desktop suites of these tools. All of these help to form the road that carries localized software, e-mail, data, documents and other translated materials. The bridge's suspension structure represents the emerging array of multilingual workflow systems, including content management, project management and workflow monitoring. Together these products are changing the nature of communications by creating a high-speed bridge across the multilingual gulf.

    There is a Chinese proverb: "To do good work, one must first have good tools." There are some fine, well-crafted language technology products available. It's time to pick up the ones you need and do some good work!

    Donna Parrish, Publisher

    Authors

    CHRIS LANGEWIS is vice president for marketing at Logos Corporation, teaches computer-assisted translation at the Monterey Institute of International Studies and is a member of the MultiLingual Computing & Technology editorial board. He can be reached at [email protected]

    ANGELIKA ZERFAß is a freelance consultant and trainer for translation tools and localization-related processes. She can be reached at [email protected]

    JONATHAN T. HINE, JR. is a freelance translator and also teaches technical translation and business organization for language mediation professionals. He can be reached at [email protected]

    DAVID SHADBOLT is a research editor for MultiLingual Computing & Technology and can be reached at [email protected]

    What Is Language Technology?
    Chris Langewis

    I remember a conversation at a major business conference in the 1980s. I was sitting with a group of executives from Europe, Asia and the United States. We were discussing the problems associated with language and translation of our various high-tech product lines across the various languages of our markets. One opinion that surfaced caught me by surprise: that the whole issue of translation in our community would become academic, since English would become the universal business language anyhow. A lively discussion ensued, needless to say. We concluded that even though many people in many countries could speak English, they did not necessarily wish to use that foreign language in their daily routines. If you are to successfully sell your product in a foreign market, it should be adapted to the local user environment.

    History has since confirmed that our language, localization and translation issues have not only stayed with us, but have dramatically increased in some critical aspects. Several phenomena conspire to seriously challenge today's multinational marketer. The products that we market and the format of their translatable content have become much more complex. The time-to-market requirements have shortened dramatically. The volume of the content that requires translation has grown exponentially. The subject matter of the translatables covers an increasing range of topics and fields. Products are going into a larger number of foreign markets, some of whose languages are considered to be exotic.

    So, we are faced with localizing a growing number of increasingly large, complex products targeted to more exotic foreign markets, and it has to happen faster. Sounds like a challenging situation, one that threatens to strain both workforce and budget. In response to these issues, lots of energy and thought have gone into the idea that a technological solution might exist to resolve this dilemma, inventing thereby the field of "Language Technology." One definition of this is a computer-based tool or program which supports humans who are involved in adapting a product and its language content for optimum usability in a foreign market.

    Many products and solutions have been developed that are designed to address the problems of getting your products into foreign markets more quickly and elegantly. But which product do you choose? How do you integrate these solutions into your business processes? How, exactly, does one address an issue as complex as human language combined with high-tech (or even low-tech) products? First, let's analyze and define the problem a bit so that we can segment the issues into manageable pieces.

    Regardless of why we send a product overseas, it has a human language component (called content, generally) that needs to be translated into multiple alternate languages. This content could be manifested as a printed page, a Web site, the user interface of a software product, on-line material or any number of other representations. The problem is not only that the translation of the content is a complex issue, but also that the process of re-publishing the product in each alternate language is in itself very complex. In fact, if you start with a product in one language and re-publish it in another language, only 25% or so of this entire process is actually language work, that is, translation. The overall process consists of many steps, including content extraction, project and workflow management, translation, product reconstitution, review, QA and testing.

    Where and how do we apply language technology? The trick is to identify the critical path in the overall process and to separate the necessarily human tasks from those tasks that can be handled by technology. We can divide the field into categories, even though there are overlaps where a given tool might appear more than once. Generally the categories are translation tools, engineering resources and localization tools, and multilingual workflow systems.

    Translation

    Translation is the process whereby the meaning, style and information of a piece of written text in one human language are reproduced in another written human language.

    In the old days a translator would receive a stack of paper containing text of some kind, and would be asked to return a stack of paper with that text in a different language. A typewriter might be used, but it might also have been a pen or pencil. In the process of translating, a dictionary and thesaurus would be used, and the translator conferred with colleagues and document authors to resolve terminology and meaning issues. Unique terminology (specific to the subject matter of the assignment) would be captured bilingually on file cards for future reference and to ensure consistency of use in the document being translated. The whole process required static text and took a long time.

    This process worked for centuries. Images of monks using quill pens by candlelight, bent over scrolls of paper, translating the Bible come to mind. They had time on their hands. The source documents were fairly static: very few updates were issued (presumably). But modern times are more difficult. Computers with word processors came along. Documents proliferated, and translations of them were needed rapidly. Translators now needed computers and word processing systems to keep up with demand. One very critical fact was learned in all this: a person's output of translated material is basically the same whether you type on a computer, type on a typewriter or write with a pencil. Generally, 2,000 words per day is the amount of productivity you can expect, with variations depending upon language, subject matter and complexity. This fact dictates much of the critical path in organizing projects.

    To complicate matters, documents could change during the translation process. When a change was made in a source document (an update) in mid-process, it wreaked havoc. Do you cut and paste the new material into your work in progress? Were there deletions or changes? Where are all the differences? Just finding the changes is very time-consuming and error-prone. So, let's invent some language technology to address this and other issues.

    Along comes the first wave of translation tools, one of the main categories within the language technology field.

    Translation Tools

    Translation tool is the label given to that category of computer program that supports humans who are performing translation tasks.


    The tools merely provide environments which enhance the ability of humans to provide timely translations of high quality in a cost-effective manner. Translation memory (or sentence memory), terminology management systems, bilingual editing systems and machine translation are examples of this type of tool. Interestingly, these tools can have a dramatic impact on how you approach and conduct everything from product development to multilingual publishing.

    Translation Memory. Translation memory (or sentence memory) is the label given to that category of computer program that stores sentences, along with the translation of each, in a database.

    The single most important and useful tool in this category is translation memory (TM). This tool is quite simple in principle, yet can become very sophisticated in practice. TM is a database into which a translator places sentences from a source document being translated and then correlates the corresponding translated sentences (target sentences) with each source sentence. When the document translation is finished, you have a database which correlates the entire source and target document sentence by sentence. By itself, this does not get the first project done any faster. You can still get only 2,000 words per day of translations created. But several extremely powerful benefits carry over to future work. You may be able to get later jobs done sooner, and you might save a lot of time and money.

    Since we can't materially change translators' productivity (words per day), we must extract increased productivity by insulating the translator from process-related issues. We do this by finding ways to invoke the "Three Golden Rules" of translator productivity: 1) Deal with fewer words. 2) Avoid repetition. 3) Avoid non-translation-related tasks. In fact, to qualify as a translation tool, a product must invoke at least one of these rules. If you can't relate the effect of a given tool to one of the Golden Rules, it won't be of any help to translation productivity. To see how TM fits these criteria, here are a couple of examples.

    Let's assume your translator is partly done with the translation of a project (which could be a document, a Web page, strings from a resource file or any other content). The content owner decides that some changes are needed in the source material and delivers an updated source file to the translator. This is traditionally the worst possible scenario. Time will be wasted either by starting over, or by researching changes, cutting, pasting and so on. These activities are repetitive. They increase the word count being dealt with, and they involve non-translation work.

    But suppose the project has been done using TM. We simply stop work on the original document and start with the new version of it. The TM system will look at each sentence of the new document and compare it to what's in the database. If it finds an exactly matching source sentence, then its previous translation is still correct, so it is used in the new document's translation automatically ("leveraged") without translator intervention. If a sentence in the new document is almost the same as the original (a "fuzzy match"), the original translation is reused with a note to the translator to check and edit it to reflect the changes. Any new content is simply passed through to be translated. In this manner, the update causes almost no disruption to the translator's productivity. All prior work is recycled into the updated project. All of our Golden Rules are actively applied: no translation work was repeated, very little non-translation-related work was done, and the translator could focus on translating only new words.
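The update scenario above can be sketched in a few lines of Python. This is only an illustration of the exact / fuzzy / new classification, not how any particular TM product works: the `lookup` function, the sample sentences and the 0.75 similarity cutoff are all assumptions made for the example, and the similarity score comes from the standard library's `difflib` rather than from a commercial matching algorithm.

```python
from difflib import SequenceMatcher
from typing import Optional

FUZZY_THRESHOLD = 0.75  # assumed cutoff; chosen only for this illustration


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two sentences."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def lookup(memory: dict, sentence: str) -> tuple:
    """Classify a source sentence as 'exact', 'fuzzy' or 'new'.

    memory maps previously translated source sentences to their translations.
    Returns (status, suggested_translation_or_None).
    """
    if sentence in memory:
        return "exact", memory[sentence]
    best: Optional[str] = max(
        memory, key=lambda s: similarity(s, sentence), default=None
    )
    if best is not None and similarity(best, sentence) >= FUZZY_THRESHOLD:
        return "fuzzy", memory[best]  # translator must check and edit this
    return "new", None


# Hypothetical memory built while translating the original document.
memory = {
    "Click the Save button.": "Cliquez sur le bouton Enregistrer.",
    "The file could not be opened.": "Impossible d'ouvrir le fichier.",
}

# Hypothetical updated source document.
updated_document = [
    "Click the Save button.",
    "The file could not be opened now.",
    "Restart the computer to finish.",
]

for sentence in updated_document:
    status, suggestion = lookup(memory, sentence)
    print(f"{status:5}  {sentence!r} -> {suggestion!r}")
```

Only the third sentence reaches the translator as untouched new work; the first is reused as is and the second arrives with a suggestion to review.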

    The power of this capability lies in the effect it has on overall process options. The ability to deal elegantly with updates enables you to begin a translation project on unfinished content, thereby starting sooner and finishing sooner, which is critical in managing the time-to-market of key materials. Further, translating future releases of similar materials may be greatly simplified. Any content in a new release that is the same as or similar to an older version will enable TM to leverage its translation, thereby saving much time and money. Also, if you have products that share pieces of content, you need only translate this content once and then leverage that translation into each recurring instance.
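To make the saving concrete, here is a rough back-of-the-envelope sketch using the 2,000-words-per-day rule of thumb. The function names and the sample figures (a 120,000-word release of which 36,000 words remain after leveraging) are hypothetical, and the sketch assumes that only exact matches are free; fuzzy matches are counted as new work, which overstates the effort.

```python
import math

WORDS_PER_DAY = 2000  # the article's rule-of-thumb output per translator


def words_to_translate(segments, memory):
    """Count the words in segments that have no exact match in the memory."""
    return sum(len(seg.split()) for seg in segments if seg not in memory)


def translator_days(word_count, translators=1):
    """Estimate working days, rounding up to whole days."""
    return math.ceil(word_count / (WORDS_PER_DAY * translators))


# Hypothetical release: 120,000 words in total, 36,000 of them not in the memory.
print(translator_days(120_000))     # effort without any leveraging
print(translator_days(36_000))      # effort after exact matches are reused
print(translator_days(36_000, 3))   # same work split across three translators
```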

    Some TM systems come with a built-in editing environment so that translators view and translate content within the TM session. Other TMs use third-party editors such as Microsoft Word to view and edit, but then capture and catalog the results in a database for you. In any case, the ability to view and edit bilingually is critical to terminology management.

    Terminology Management (Glossary or Dictionary). Terminology Management is the label given to programs that catalog words and phrases along with pertinent related information in a database in a manner conducive for use in linguistic applications.

    Another powerful feature that many TM systems employ is a Terminology Manager. Using terminology consistently within a given project is critical to achieving a high quality of translation. In going from one human language to another, any given word might have multiple meanings and therefore will have multiple possible translations. For example, in English the word "pole" might mean "stick" or "rod," but it might also refer to one end of a magnet or to the top or bottom of the planet Earth. Each of these would require a different translation depending upon context. Also, new words are invented which require creative translation that must then be applied consistently, not only within the project at hand, but subsequently among similar projects as well.

    Instead of relying on note cards to catalogue these details, the terminology management software can be used to create glossaries. Each word with unique usage in your particular content can be placed in a glossary and assigned a subject matter, translation, definition and other characteristics. The glossary interacts with TM so that translators are reminded of the proper translation for terms that appear in the content being worked on. This helps to create consistency and quality within a given project, among a series of projects, and within a workgroup translating in parallel. The content owner can archive the glossaries and reuse them in future works, thereby saving a great deal of time and money by avoiding rework.
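A glossary of this kind can be pictured as a small table keyed by term and subject matter. The sketch below is a minimal illustration, not the data model of any real terminology manager: the `TermEntry` fields follow the characteristics listed above, while the German translations, the subject labels and the `find_terms` helper are assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class TermEntry:
    term: str
    subject: str
    translation: str
    definition: str


# Hypothetical English-to-German glossary entries.
GLOSSARY = [
    TermEntry("pole", "physics", "Pol", "one end of a magnet"),
    TermEntry("pole", "construction", "Stange", "a stick or rod"),
    TermEntry("resource file", "software", "Ressourcendatei",
              "file holding translatable strings"),
]


def find_terms(sentence, subject, glossary=GLOSSARY):
    """Return glossary entries for the given subject that occur in the sentence."""
    hits = []
    for entry in glossary:
        if entry.subject != subject:
            continue
        # Whole-word match so that "pole" does not fire on "poles".
        pattern = r"\b" + re.escape(entry.term) + r"\b"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            hits.append(entry)
    return hits


for entry in find_terms("Attach the north pole of the magnet.", "physics"):
    print(f"{entry.term} -> {entry.translation} ({entry.definition})")
```

Filtering by subject is what lets the same source word carry different approved translations in different projects.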

    Machine Translation. Machine Translation (MT) is the label given to that category of computer program that converts a sentence of human language into an equivalent sentence in another human language.

    This writer teaches a translation technology course (Computer Assisted Translation) at the Monterey Institute of International Studies in Monterey, California. At the beginning of each semester, I ask the class to raise their hands if they believe that MT works. Typically, no hands go up. After all, only humans can really translate. Why else are they at the Institute to earn a degree in the subject? However, by the end of the course, all hands go up in response to the same question. The difference lies in knowing when, why and how to apply the technology.

    MT has a reputation for producing flawed, even laughable translations. However well deserved this reputation, there is ample reason to believe otherwise. MT is like an automobile in that it is the driver who determines where it goes and whether the trip is successful. By understanding the critical path to maximizing results, one gets radically different results from the same car or from the same MT system.

    Without going into great detail, it's relevant to know what makes MT tick. The basic idea is to create a translation of a whole sentence automatically. The various products that exist accomplish this somewhat differently, but the general pattern is to parse the sentence into words; look the words up in a custom dictionary and establish possible roles for each word (noun, verb and so on); based upon grammatical and syntactic rules of the source language, establish the role of each word or group of words (collocations); based upon rules of grammar and the syntax of the target language, build a model of the translated sentence; and populate the target sentence with appropriate words from the dictionary, inflecting and modifying as necessary.
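The general pattern described above can be illustrated with a deliberately tiny toy. It is a sketch only: the six-word English-to-Spanish dictionary, the single reordering rule (adjective + noun becomes noun + adjective) and all function names are assumptions made for this example, and real MT systems handle inflection, agreement and ambiguity that this toy ignores.

```python
# Hypothetical custom dictionary: source word -> (role, target word).
DICTIONARY = {
    "the": ("det", "el"),
    "red": ("adj", "rojo"),
    "fast": ("adj", "rápido"),
    "car": ("noun", "coche"),
    "train": ("noun", "tren"),
    "arrives": ("verb", "llega"),
}


def parse(sentence):
    """Step 1: split the sentence into lowercase words, dropping final punctuation."""
    return sentence.lower().rstrip(".!?").split()


def tag(words):
    """Step 2: look each word up and attach its role and translation."""
    tagged = []
    for word in words:
        # A word missing from the dictionary gets no role and is passed through marked.
        role, target = DICTIONARY.get(word, ("unknown", f"[{word}]"))
        tagged.append((role, target))
    return tagged


def reorder(tagged):
    """Step 3: apply one target-language rule: adjective + noun -> noun + adjective."""
    result = list(tagged)
    i = 0
    while i < len(result) - 1:
        if result[i][0] == "adj" and result[i + 1][0] == "noun":
            result[i], result[i + 1] = result[i + 1], result[i]
            i += 2
        else:
            i += 1
    return result


def translate(sentence):
    """Steps 1-4: parse, tag, reorder, then populate the target sentence."""
    words = [target for _, target in reorder(tag(parse(sentence)))]
    text = " ".join(words)
    return text[:1].upper() + text[1:] + "."


print(translate("The red car arrives."))
print(translate("The blue car arrives."))  # "blue" is not in the dictionary
```

The second call shows what happens when a word is missing from the dictionary: its role cannot be established, so the toy can only pass it through untranslated.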

    The factors critical to making this process work properly are 1) correctness and clarity of the source sentence; 2) completeness and correctness of dictionaries; and 3) sufficiency of the rulebase (grammar, syntax) that is being used.

    Given a perfect situation in the three factors above, an MT system can very likely produce a perfect translation. But any small deficiency in any of these factors will result in a very poor translation. Using colloquial expressions or slang will confuse most systems, for example. If a word is found to be missing from the dictionary (a new technical term, for example), its role and meaning cannot be established, so a very bad translation will result. Fragments of sentences are difficult for MT systems to parse because the rules of grammar generally don't apply. Human skills are required to comprehend them. Worst of all is ambiguity. What is obvious to a human may be confusing to a machine. For example, the sentence "the boy threw the rock at the window and it broke" is clear to a human who understands fragility. A machine may need to guess whether the window or the rock broke.

    Therefore, to maximize results you know what needs to be done: ensure that all input sentences are grammatically correct and unambiguous; ensure that all words in the sources are properly represented in dictionaries; and adjust the rulebase to cover any special writing style of the source content.

    Even given the optimum scenario of perfect inputs and preparation, you still need to review and post-edit the output to ensure quality, which can be a significant task in itself.

    Granted, all this can be difficult to accomplish. It will take significant effort to glean any reasonable results from an MT system. So, when is it useful? A project might require turnaround times that are not possible without MT. There may be times when a project is just too large for humans, such as 500,000 pages of technical manuals. When optimized, MT is very fast. A high-quality draft can be produced in minutes, leaving only the QA process. In short, large projects with short turnaround requirements are good candidates. Also, special projects that can accept draft or "gist" translation, where less-than-perfect is acceptable, are good candidates.

    An example that I recently encountered illustrates these points. A Japanese company is developing a software product using two engineering teams, one in the United States and the other in Japan. Technical documentation is created for the product by each team in its native language as work progresses. The two document parts together make up one manual. How does the English team read the Japanese document (and vice versa) so that development can stay synchronized? At first, translators were hired to translate all of the English content into Japanese, and vice versa, so that a complete English and Japanese document would be available. This was too slow to be useful, since it took weeks for one team to see what the other side was doing. It was also very costly.

    The solution was to use MT. All English is instantly translated into Japanese and vice versa. At almost no incremental cost, a complete bilingual document is created in real time. The translation is not perfect, but it is sufficient for use by the engineers. Also, not all of the content is used by the opposite team, so why pay for perfection? Parts of the content which are critical can be post-edited by experts when needed. This proved to be a timely and cost-effective solution and an optimum application of MT technology.


    Some additional ideas include gist translation of e-mails; translation of newswire content on the Internet; and on-line catalog item descriptions in fast-turnover sites. Generally, where speed is mission critical or volume is high, MT may offer a solution. Creative application is always needed, but can be rewarding.

    After the Translation

    Regardless of how we got there, let's assume we have our content translated. We now have the content of our Web site, document or software product in multiple languages. Even though the translation has been completed, much work actually remains to be done. We need to put the content in a "live" environment to make sure it looks and reads right and to ensure that the resulting translated object functions as the original did. Also, we need to ensure that the resulting translated product is what will be considered "native language quality" in the target market. This is especially critical with software and Web-site projects. We're very likely to find some problems. These problems that come to light as a result of translation will require some engineering resources, such as localization or internationalization, to resolve. Fortunately, a category of language technology has evolved to help us here.

    Engineering Resources

    Engineering Resources are those tools and capabilities that are used in conjunction with translation activities to adapt a domestic product to a foreign market environment so that it has the same technical suitability as a product that has been created in that foreign market.

    Localization and Internationalization

    Localization is that process whereby an object such as a computer program, multimedia presentation or document is not only translated, but also is adapted to another culture or foreign locale. Such things as character set handling, date formats, metric/English measures, comma/decimal conventions and other elements are considered in addition to the normal language issues.
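As a small illustration of the date-format and comma/decimal conventions mentioned above, the sketch below formats the same number and date for two locales. The `CONVENTIONS` table and both helper functions are hypothetical and hard-coded only to keep the example self-contained; a real product would normally take these conventions from the operating system or a locale library.

```python
from datetime import date

# Hypothetical convention table for two locales.
CONVENTIONS = {
    "en-US": {"decimal": ".", "thousands": ",", "date": "{m:02d}/{d:02d}/{y}"},
    "de-DE": {"decimal": ",", "thousands": ".", "date": "{d:02d}.{m:02d}.{y}"},
}


def format_number(value, locale_id):
    """Format a number with two decimals using the locale's separators."""
    conv = CONVENTIONS[locale_id]
    text = f"{value:,.2f}"  # always produces the 1,234,567.89 style
    # Swap separators through a placeholder so they do not overwrite each other.
    return (text.replace(",", "\0")
                .replace(".", conv["decimal"])
                .replace("\0", conv["thousands"]))


def format_date(day, locale_id):
    """Format a date in the locale's day/month order."""
    return CONVENTIONS[locale_id]["date"].format(d=day.day, m=day.month, y=day.year)


for locale_id in CONVENTIONS:
    print(locale_id,
          format_number(1234567.891, locale_id),
          format_date(date(2002, 10, 31), locale_id))
```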

    Internationalization is that process whereby an object is designed and developed in such a manner as to minimize or eliminate the need to perform specific localization tasks after translation.

    A good example of a variety of localization and internationalization issues is found in software that is marketed in multiple countries or locales. We have a combination of program code and textual content. The content shows up in menus, dialogs, on-line help systems and more.

    Content Extraction. The first problem that arises is how to organize the content so that translators can deal with it. In the old days, content (menus, error messages and so on) was sprinkled throughout the program code wherever the programmer decided to place it. You needed a special extraction tool just to get to it and to put the translated text back where it belonged. Most of these tools were custom made by the software developers. Very rapidly, the cost and awkwardness of this process led to the first internationalization strategy: put all the content in a file and have the program refer to it and call it up as needed. So, the Resource File was created. Today, almost all software has content in separate, easy-to-read files. Simply translate this file or files, preserve the format, and the program displays the foreign language content during program execution. Without the resource files, you'll need to build or buy extraction tools.
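The resource-file idea can be sketched as follows. This is a minimal illustration under stated assumptions: the JSON layout, the message keys, the French strings and the `message` helper are all invented for the example, and the two "files" are inlined as strings so the block runs on its own.

```python
import json

# In practice each table would live in its own file (for example one per
# language); they are inlined here so the sketch is self-contained.
RESOURCES = {
    "en": json.loads(
        '{"menu.open": "Open", "error.not_found": "File {name} not found."}'
    ),
    "fr": json.loads(
        '{"menu.open": "Ouvrir", "error.not_found": "Fichier {name} introuvable."}'
    ),
}


def message(key, language, **values):
    """Look up a string by key, falling back to English if it is untranslated."""
    table = RESOURCES.get(language, {})
    template = table.get(key, RESOURCES["en"][key])
    return template.format(**values)


# The program code never contains the display text itself, only the keys.
print(message("menu.open", "fr"))
print(message("error.not_found", "fr", name="report.txt"))
print(message("menu.open", "de"))  # no German table yet, so English is shown
```

Because the code refers only to keys, translating the product means translating the tables while leaving the program untouched.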

    Coincidentally, TM tools perform many extraction tasks for you. In fact, you might choose a particular TM tool because it is able to deal with the particular resource structure or text file that you have. For example, today's content is often held in HTML, XML or other sophisticated formats. Most TM systems can read and regenerate these structures and thereby insulate the translator from the complexities of dealing with the embedded coding. In TM parlance, these extraction devices are called filters. Before selecting any tool used in a language technology context, make sure that it has a filter for your particular file formats.
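As a rough illustration of what a filter does, the sketch below pulls the translatable text out of an HTML fragment using Python's standard `html.parser` module. It is a simplification under stated assumptions: the `TextFilter` class is invented for this example, and commercial filters also keep inline tags inside a segment and write the translation back into the original structure.

```python
from html.parser import HTMLParser

class TextFilter(HTMLParser):
    """Collect translatable text and skip the markup around it."""

    def __init__(self):
        super().__init__()
        self.segments = []

    def handle_data(self, data):
        text = data.strip()
        if text:  # ignore whitespace-only runs between tags
            self.segments.append(text)

def extract_segments(html):
    parser = TextFilter()
    parser.feed(html)
    parser.close()
    return parser.segments

sample = "<html><body><h1>Welcome</h1><p>Click <b>Save</b> to continue.</p></body></html>"
print(extract_segments(sample))
```

Note how the bold tag splits one sentence into three pieces here; handling such inline formatting gracefully is one of the things that distinguishes one filter from another.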

    Character Set Handling. In most modern software and Web-site environments, foreign character sets are handled at the operating system level or by the browser. You might still find some issues, however. If you start with English or any language that uses a single-byte character set, and the translation is into a double-byte set (such as Japanese or Chinese), testing is a must. Even more problematic are the right-to-left (Arabic) and bidirectional scripts. Your program code may be fooled by some of the characters' encoding. In the case of Web sites that hold content in a database, the exotic characters can foil lookup routines and string handling code. It's wise to anticipate these factors.

    A code sweep may be in order, in which a special tool scans your program code to identify telltale areas where character encoding will trip you up. Many developers have created custom tools to help with this, but many are available on the market as well. Of course, as developers get experience with these factors, they anticipate and internationalize their code by avoiding the pitfalls to begin with.

    Another very effective way to check your Web site, program, or any other object that you will translate is to pre-translate it and run it through QA. This can be done very inexpensively by running all of the content through an MT system, available on-line for most languages at little or no cost. Of course, you can always purchase a system for local use at a variety of prices. In any case, you're not concerned so much with the quality of the output, just with the character set and other factors unique to the language. If your software works properly with MT content, it should be fine with human-translated content as well. Why wait until it's too late to find out that a problem exists?

    Sizing Issues. One localization problem that often arises is related to size. Compared to English especially, most other languages use more space to say the same thing. Going into European languages may require 30% or 35% more room in a page, dialog box or menu. So, even if you have a perfect translation, it may look corrupted when displayed in a dialog box that's too small. You need to adjust the software's program code to change the dimensions of dialog boxes or to change the size of a button that has text on it. Also, as the spelling of items in menus changes, you may need to rearrange the list or to assign new hot-key sequences. All of these activities typically require special training. Left to translators alone, the work may prove very inefficient, yet it can also be very inconvenient to tie up a development team with the tasks.
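A simple check for the sizing problem described above might look like the following sketch. The character limits and the sample strings are assumptions; a real dialog is measured in pixels and depends on the font, so this only approximates the idea.

```python
def expansion(source, target):
    """Relative growth of the target over the source, e.g. 0.30 for 30%."""
    return (len(target) - len(source)) / len(source)

def too_long(target, max_chars):
    """Flag a translation that no longer fits an assumed character limit."""
    return len(target) > max_chars

# Hypothetical UI strings: (source, translation, assumed width limit in characters).
ui = [
    ("Save", "Speichern", 8),
    ("Cancel", "Abbrechen", 10),
]
for src, tgt, limit in ui:
    verdict = "TOO LONG" if too_long(tgt, limit) else "ok"
    print(f"{src!r} -> {tgt!r}: {expansion(src, tgt):+.0%} {verdict}")
```

Short strings such as button labels often grow far more than the 30% to 35% average, which is why they are the first place to look.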

    Localization Tools. These problems have been so pervasive that special localization tools have emerged to deal with them. These tools have many traits in common, while each has special features as well. Basically these tools are similar to TM in that they have filters that enable you to read and extract content from a variety of file types, and they use a database of some sort to house content and its translation. As with standard TM functionality, you can leverage translations between updates as you work and between versions over time.

    What differentiates these tools is their ability to operate in a WYSIWYG mode: the system shows you the content that requires translation in context (like a dialog box or menu) and enables you to translate it. At the same time, you can easily adjust button and dialog box dimensions and configurations to suit the new language size requirements.

    An innovation on the part of some localization products is that you do not need to


    have the source code or even the developer's resource files to be able to create a completely localized version of the product. You can operate directly on the executable version. Also, at least one of the localization products can do an automatic "pseudo-translation" of your software product or Web site. Source-language text is partly replaced with characters unique to your target language and strings are expanded to mimic a translated effect. This way you can test run a product and see if any character set tolerance issues exist or if any reconfiguration is needed.

    Given a perfect scenario, developers internationalize a product so that it will operate correctly regardless of the language that it's translated into or the locale in which it operates. What the developers miss remains as a localization task after the translation is done. Fortunately, we have tools to help us here.
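Pseudo-translation as described above is easy to approximate. The sketch below is not any product's implementation: the vowel mapping, the 30% default growth and the bracket markers are assumptions chosen for illustration.

```python
import math

# Assumed mapping from plain vowels to accented characters.
ACCENTED = str.maketrans("aeiouAEIOU", "àéîöüÀÉÎÖÜ")

def pseudo_translate(text, growth=0.3):
    """Swap vowels for accented ones and pad the string by `growth`.

    The brackets mark the start and end of the string, so a truncated
    string is easy to spot on screen.
    """
    padding = "~" * math.ceil(len(text) * growth)
    return "[" + text.translate(ACCENTED) + padding + "]"

print(pseudo_translate("Open file"))
```

Running every resource string through such a function and then starting the product quickly shows which dialogs clip text and which code paths mishandle non-ASCII characters.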

    So, we see that we can take either a whole product or the language content of a product and get help from language technology in getting it translated and localized. But earlier it was noted that this is only a fraction of the entire process that ultimately fields a product in a foreign market. Organizations with multinational experience have learned that much of the effort and cost is hidden in process management. This has caused the emergence of yet another category of language technology: multilingual workflow systems.

    Multilingual Workflow System

    Multilingual Workflow System is the label given to that category of computer program which creates an environment that supports and orchestrates a range of activities that facilitate the development of multilingual products.

    The first time an organization decides to convert a product (or translate a Web site, for example) into a foreign language version, I call it a "Stage A" organization. The translation is handled much like an event. You first finish the source language version, freeze it, market it domestically and then give it to someone to localize. Some months later you're ready to go to your foreign market. By the time you get rolling, a new source version is ready, thereby obsoleting your translated version. Oops. To avoid multilingual "versionitis," we need to develop a way to release the product in multiple markets simultaneously by foreseeing the lead times and the necessary activities. Let's use some language technology to increase productivity and set the stage to leverage from one version of product to the next. This would make you a "Stage B" organization, where you actively anticipate foreign market requirements.

    You're successful with your overseas marketing. You are now global; the planet is your market. When you design product, you have all of your markets in mind, and you design to local requirements, incorporating feedback from the field. You publish your products multilingually rather than in one language and then in other languages. You internationalize your products to minimize localization issues that emerge from the translation process. You're now a "Stage C" organization. But you learn that managing the overall process is getting very cumbersome and difficult. Many products, many languages, several translation vendors, many groups and individuals within your own organization share ownership of the overall picture. What now?

    Fortunately, there is some light on the horizon. A new category of language technology is emerging: the multilingual workflow system. Implementing this would put you at Stage D. There are several possible incarnations of the technology but the objective is similar with all, which is to smooth the process through


    the pipeline from product development to the final delivery of localized product.

    The Process

    Even though we can isolate individual translation and localization tools and discuss their benefits and features, that's still just the tip of the iceberg. It's critical that these tools and technologies are applied to create efficiency, but it's also necessary to make sure that they fit into the greater context of your global plans and infrastructure.

    A sophisticated process flows all the way from product planning to global delivery and would contain the following elements: define global product needs; assess cost/benefit per language; design multilingual publication strategy and process; design product with internationalization; identify translation/localization resources; deliver translatables to global manager; deliver files to translation/localization resources; translate/localize; manage and distribute updates to work in progress; get and disseminate status information; manage feedback loop to product development and marketing; retrieve content files; build product to test; test feedback loop; QA feedback loop; and publish/deliver.

    The pacing of items in managing this process is often hidden in the logistics of file transfer and control, and in managing the great variety of resources involved in the process. The most difficult thing is often being able to get the information you need about status. There are many parallel activities, and there is a mix of in-house and outside resources involved. Disparate groups need to be closely coordinated. What may happen in a completely manual system is that a file may sit in someone's e-mail box inactively for an amount of time, may be misplaced or misdirected or may simply flow slowly in the due course of activity. Status information is not readily available and must usually be synthesized by contacting multiple resources. Often, one has to search and seek out where something is. As the process increases in complexity (number of products, number of languages), the project management becomes more difficult, time consuming, error prone, slow and expensive. At some point, automation becomes necessary.

    Fortunately, some solutions are emerging.

    The In-house System. Although not very interesting to the outsider, many Stage D organizations have developed their own systems to deal with automating the workflow. The benefit of an in-house solution is that it can be tailored to meet the exact needs of the organization and its process.

    The solutions are server based or Web based. The project manager defines a project, describes the objects that are elements, assigns owners and describes the sequence of operations. The main objects that traverse the system are the translatable content files. If there is an in-house translation group, they are simply included in the process. However, many times an outside translation vendor is used, so that can create a "hole" in the process where it's difficult to integrate the function directly.
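The project definition described above (objects, owners and a sequence of operations) can be pictured with a small data model. Everything in this sketch is hypothetical, including the `WorkItem` class, the step names and the file name; it only shows why such a system can answer a status question at once.

```python
from dataclasses import dataclass

@dataclass
class WorkItem:
    """One translatable file moving through an assumed sequence of steps."""
    name: str
    language: str
    steps: list
    owner: str
    done: int = 0  # number of completed steps

    def status(self):
        """Report the current step and who holds the file."""
        if self.done >= len(self.steps):
            return "complete"
        return f"{self.steps[self.done]} (owner: {self.owner})"

    def advance(self, new_owner):
        """Mark the current step finished and hand the file to the next owner."""
        self.done += 1
        self.owner = new_owner

STEPS = ["extract", "translate", "review", "build", "test"]
item = WorkItem("help.xml", "ja", STEPS, owner="project manager")
item.advance("translation vendor")
print(item.name, item.language, "->", item.status())
```

With every file represented this way, finding out where something is becomes a query rather than a round of phone calls and e-mails.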

    The key benefit derived from these systems is visibility. One can simply inquire and receive a wealth of information about where any given object is, what's happening to it, who is doing it and when completion is expected. Also, these systems can track cost information, which has historically been very difficult to get for translation/localization projects. So many different groups are involved (inside, outside, contractors) that consolidating financial data has proved to be elusive.

    Vendor-based System. The translation service bureau, or vendor, has historically had problems that are very similar to the Stage D groups: many clients, many languages, many projects simultaneously. Projects involve using both in-house and contract resources. All projects are time-critical, and costs absolutely must be tracked. So, not surprisingly, many have developed very sophisticated systems to deal with managing the workflow. Their systems allow things like price estimates to be generated and sent to prospective customers, automated client-viewable status information, and complete start to final QA process management. Savings resulting from greater efficiency accrue both to the customer of such a vendor and to the vendor itself.

    The customer merely drops sample files into an inbox along with a job description and gets a price quote and schedule for the work. After confirming and approving the quote, a complete set of translatables is placed in the inbox. Completed work (or interim versions for review and testing) is subsequently delivered to them in an outbox. The customer can log on at any time and see whatever status information he or she needs. The vendor runs a much more efficient business by having control over its resources and workflow. But the customer still has a problem of managing the rest of its process in-house. If the customer also has an in-house system, it would ideally be connected to the vendor's system to simplify the overall process.

    Hosted System. Slowly but surely, the combination of turn-around requirements, cost-control needs and overall complexity is drawing innovators to the table. New and demanding applications are emerging, which fuel the incentive to supply solutions. Picture a Web-based catalog sales organization with a global audience. Each item's product description in the catalog must be presented in multiple languages; there are thousands of items; new items post daily; and the translations are needed very quickly, in hours instead of weeks.

    What would be good is if the writer of the product description were able to drop it into an automated system that routed it to standby translators, who rapidly return translated copy, which the system posts on the Web site for viewing. Similarly, larger projects could flow automatically through a system where the customer and all involved vendors had some kind of pre-set financial understanding.

    This is exactly what is emerging in the market today: systems hosted on the Internet, much like an ASP, where as an authorized user you can greatly simplify your process and project management. Although some evolution is sure to further improve the arrangements, you can deal now with systems that have automated the process of quotations, implementing translation resources, reporting, and delivery of final goods. There may be some limitations as to the selection of resources, language and so on, but the current offerings are significant.

    Future System. Today, we create a product with a given language comprising its user interface or other textual content. The process whereby we convert our product's content to be displayed in additional languages is becoming more sophisticated and simplified. Ultimately, we must evolve to a point where language becomes simply an attribute of the works that we create, as opposed to adding languages on as an afterthought.

    We will see fully automated processes connected to the point of origin of the works we create. This way, you simply declare when you start a project that you want your work manifested in English, German, Japanese and so on. When you type something in English, the system routes it through a process that returns it in each target language. When you have completed your work, it is already published also in each of your target languages. Sound incredible? We have most of what's needed for this already, and projects are in the works to push it further.

    In Summary

    Demand for translation is likely to continue to increase. We can expect projects to be larger and larger and increasingly complex, and to be demanded in ever-shrinking timeframes. Technology is coming to our rescue, but only by carefully analyzing our process requirements can we implement meaningful solutions. Much of the challenge lies in managing the process and in minimizing pressure on critical path resources such as translators. We have not yet fully integrated all of the tools and technologies available into a seamless top-to-bottom solution, but we are on a realistic path to do so.


    In this article I will discuss some of the most prominent features of translation memory (TM) tools. My intention is not to give you a list of "this tool or feature is better than that one." What I want to show is how the different tools have realized the basics of a TM system.

    At first glance, all the TM tools promise the same things: recycling of repetitive text parts, time and cost savings, better consistency, data exchange and so on. That sounds nice and easy. But as soon as you have decided that you need to work with one or more of these tools, you will have to spend some time evaluating and comparing features and processes.

    Although I will not be able to save you all the time and effort of looking at the tools for yourself, this article might give you an idea what to look for when you do your evaluation. Please always keep in mind that there is no one "best" tool that can satisfy all the needs that a translator or project manager might have. One tool might serve your purpose better than another, depending on the processes you have, the file formats you work with or the user preferences you have.

    I have looked at some of the TM tools on the market today: Déjà Vu by Atril; SDLX by SDL; Translator's Workbench by TRADOS; Transit by STAR; TRANS Suite 2000 by Cypresoft; and Wordfast by Champollion Wordfast, Ltd. I have not included IBM TranslationManager, as this tool is no longer being marketed.

    In this article, I will take a look at these features: TM model; translation environment; translation memory exchange format (TMX); statistics such as word count; fuzzy matching; and special elements such as abbreviations and acronyms.

    TM Models

    Before going into individual features, I would like to introduce the two different models that are being used in TMs: the database model and the reference model.

    The database model saves the source segment and its translation as one unit, a so-called translation unit, often abbreviated as TU. But it saves this unit out of context.

    The reference model does not save the segments and their translations in one place but references their position within the documents themselves. That is, if you want to see the segment in context, you can call up the reference material and take a look. But you also have to decide before you start a project which files or previous projects you want to use as reference material, whereas a database will just grow with every translation unit you save.
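The database model can be pictured as a simple store of translation units. The sketch below is a deliberately minimal illustration; the `TranslationMemory` class is invented here, the sample translation is only an example, and real tools add fuzzy lookup, attributes and much more.

```python
class TranslationMemory:
    """Database-model sketch: each source segment is stored together
    with its translation as one translation unit (TU), without context."""

    def __init__(self):
        self.units = {}

    def add(self, source, target):
        """Save a TU; the memory simply grows with every unit saved."""
        self.units[source] = target

    def lookup(self, source):
        """Return the stored translation for an identical segment, if any."""
        return self.units.get(source)

tm = TranslationMemory()
tm.add("Save the file.", "Speichern Sie die Datei.")
print(tm.lookup("Save the file."))
print(tm.lookup("Close the file."))  # no identical segment stored
```

A reference-model tool would instead keep pointers into earlier source and target documents, which is why the context stays available but the reference material has to be chosen up front.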

    Translation Environment

    Let me start with a comparison of the most visible part of all translation tools: the translation environment.

    There are basically two different ways: working with an existing editor in addition to the TM tool and working within the TM tool.

    Some tools (TRADOS Translator's Workbench, Wordfast) use Word as an editor, which simplifies the translation of all file formats that can be comfortably displayed in Word. But as soon as other file formats come into play, they will need to be prepared (converted) for use with Word or they have to be translated with a different editor altogether.

    The other tools have their own editors. They import the files to translate and show them either in the form of a two-column table (one segment per table cell) or in different windows (one for the source language and one for the target language). They also differ in how they show formatting information and file structure information. Structural information is either omitted altogether or shown within protected tags. Formatting information can appear as tags or partly also in WYSIWYG mode.

    File Formats

    Each tool provides a list of file formats that it supports. It is basically possible to translate any format with any tool. It really depends on the filters or conversion utilities that are offered by


    Comparing Basic Features of TM Tools

    Angelika Zerfaß

    [Figure: TM models. The reference model is used by Transit and most software localization tools: the translation is saved in the target file, and the project file or source and target files can be used as reference for further projects. The database model is used by SDLX, Déjà Vu, Translator's Workbench, TRANS Suite and Wordfast: source segments and translations are saved as a unit in the database.]


    the manufacturer or the possibility to create or customize them.

    A tool that uses its own editor, such as Transit, SDLX, Déjà Vu and TRANS Suite, will offer a list of supported file formats during the creation of a project. But some tools only allow one format per project; with others you can choose several formats to be included in one project.

    A tool that uses Word as its editor, like Translator's Workbench and Wordfast, doesn't need any conversion or filtering as long as the files to be translated can be opened comfortably in Word. As soon as other file formats have to be processed, the tool either offers another editor, such as TagEditor from TRADOS, or the files need to be converted so that they can be opened in Word for translation. This also means that after translation the files have to be converted back again.

    Although the display has not much to do with the actual translation features of a tool, this is one of the most important factors for many translators when they decide which tool to use. Another decisive factor is that many customers today know about translation tools or even use one of them in-house and want the translator to use that tool as well. But with the advent of TMX (translation memory exchange format), this can be circumvented to some extent.

    TMX

    TMX is a format that is used to exchange data among TM systems. So, if you want to reuse a TM that was produced with one tool in another one, TMX is the way to go. TMX is an XML-based description of the information that is contained in a TM system. There are three different levels of TMX compliance, which I will list here very briefly. (To see more information on the TMX specifications, go to the site www.lisa.org/tmx)

    TMX level 1 includes a description of text information only.

    TMX level 2 includes a description of text information and formatting information.

    TMX level 3 includes a description of text, formatting and tool-specific additional information. Some tools offer the possibility


    [Figure: Examples of Six Translation Environments. Déjà Vu: text-only table. Transit: windows for source and target text, partly with WYSIWYG formatting; copy of source text is overwritten during translation. TRANS Suite 2000: source and target window. SDLX: table, formatting partly WYSIWYG.]


    of adding information such as project name, file format, customer name, name of translator and so on.

    Most tools on the market today comply with level 2.

    Even though the TMX standard exists, this still does not guarantee that all data can be brought safely from one tool to the other, as different tools use different structures for encoding RTF formatting information, sub-segments such as footnotes or index entries. So, even though a tool can export text and formatting information into a TMX file, another tool might not be able to read it correctly, and you will end up with text only in your new memory.
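To give an idea of what a TMX file contains, the following sketch writes a text-only (level 1 style) translation unit with Python's standard `xml.etree.ElementTree` module. The element and attribute names follow the TMX 1.4 format as far as I know it; the header values (`creationtool`, `o-tmf` and so on) are placeholders, and the `build_tmx` function is an assumption made for this example, not any tool's export routine.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced attribute as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def build_tmx(pairs, srclang="en", tgtlang="de"):
    """Build a minimal TMX tree from (source, target) segment pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "sketch", "creationtoolversion": "0.1",
        "segtype": "sentence", "o-tmf": "none", "adminlang": "en",
        "srclang": srclang, "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx

root = build_tmx([(
    "This sentence contains different formatting information.",
    "Dieser Satz enthält verschiedene Formatierungen.",
)])
print(ET.tostring(root, encoding="unicode"))
```

A level 2 export would additionally wrap the formatting codes of each segment in inline elements, and it is exactly in that inline content that tools tend to differ.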

    Examples

    The following sentence in an RTF file, "This sentence contains different formatting information." (with "sentence" in bold, "different" in italics and "formatting information" underlined), will be represented differently in TMX depending on the tool used, as you can see in the accompanying code example.

    Level 3 has not been implemented in any tool yet, as it demands the encoding of tool-specific information such as project name, customer name and other additional information with XML tags. But as different tools offer different customizability for such information, it is difficult to define how to encode it. For example, one tool offers a dialog where the user can define as many additional fields as he or she wants. Another tool offers only a limited number of fields. Here, the import from one into the other might result in loss of information.

    Most tools also offer an import/export feature for the format of other tools' manufacturers. But as new tools pop up all the time and as manufacturers tend to work on their own formats (improving or migrating completely to another format), it is not possible for any tool to support all the other formats on the market. That was one of the reasons for the development of TMX.

    Statistics: Word Count

    How to calculate the word count (or line count respectively) may be one of the most disputed issues in the world of translation, since it is usually the basis for cost calculations. Here are some examples of how differently words are counted with different tools.

    Word. Word counts everything between spaces as a word. A list of stand-alone symbols such as , %, &, / creates a word count of four words. Numbers are also counted as words. Terms linked with a hyphen or a colon, such as "pre-translation" or "human:machine," are counted as one word each.

    SDLX. SDLX offers two different ways to count words: like Word or SDLX specific. The SDLX-specific counting includes numbers and special characters as words but not the symbols of bulleted or numbered lists, whereas Word counts these as words, too.

    Translator's Workbench. This tool counts only translatable text. No stand-alone numbers, no symbols. A word here consists of at least one letter surrounded by spaces.

    Most TM tools count differently as they concentrate on translatable text. Stand-alone numbers or symbols are, therefore, usually not counted as words; or they count hyphenated terms as two words. In the


    [Figure, continued: Translator's Workbench: interface to Word through TRADOS toolbar; Word window, segment to translate. Wordfast: plug-in for Word; interface to Word through Wordfast toolbar; Word window, segment to translate.]

    TMX Representation Differs Between Tools

    Translator's Workbench Example
    This {\b sentence} contains {\i different} {\ul formatting information}.
    This {\b sentence} contains {\i different} {\ul formatting information}.

    SDLX Example
    This sentence contains different formatting information.
    Dieser Satz enthält verschiedene Formatierungen.


    Word Count Examples table, I will give you a few examples. This table illustrates what the word counts are for a few sample texts when using different tools.

    As you can see, the statistics might look very different depending on the type of text that you are analyzing. And if you have sub-segments such as index entries or footnotes, the differences are even more visible.
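Two of the counting styles described in this section can be approximated in a few lines. These functions are simplified stand-ins, not the actual algorithms of Word or of any TM tool: the first counts everything between spaces, the second counts only tokens that contain at least one letter and treats a hyphenated term as two words.

```python
import re

def count_whitespace(text):
    """Count everything between spaces as a word."""
    return len(text.split())

def count_translatable(text):
    """Count only tokens containing a letter; hyphenated terms count as two."""
    tokens = re.split(r"[\s\-]+", text)
    return sum(1 for t in tokens if any(c.isalpha() for c in t))

samples = [
    "12356",
    "Abbreviations like e.g. or i.e. in one segment.",
    "Hyphenated two-word expression.",
]
for s in samples:
    print(count_whitespace(s), count_translatable(s), s)
```

Even on three short segments the two totals differ, which is why the counting method should be agreed on before a price is quoted.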

    Fuzzy Matching and Match Rates

    Fuzzy matching is the technology that is used to find segments in the TM similar to those that have to be translated. The percentage of similarity between segments in the document and source segments in the TM defines the recyclability of the TM.

    Most TM tools will create a statistic of match rates either with a separate analysis feature or automatically during import of the files to translate into the TM system. But because the fuzzy matching algorithms are a little different for every tool, the match rates in the analysis statistics are not really comparable.

    Looking at the match rates for each tool as shown in the Match Rate Examples table, you will find that they are all set up differently.
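One way to see why match rates differ is to compute one yourself. The sketch below uses `difflib.SequenceMatcher` from the Python standard library on word tokens; this is an assumed stand-in metric chosen for illustration and does not reproduce the algorithm of any of the tools discussed here.

```python
from difflib import SequenceMatcher

def match_rate(new_segment, tm_segment):
    """Word-level similarity as a percentage between 0 and 100."""
    a, b = new_segment.split(), tm_segment.split()
    return round(100 * SequenceMatcher(None, a, b).ratio())

tm_segment = "There is information on a new tool."
print(match_rate("There is new information on a tool.", tm_segment))
print(match_rate("Tools information", "Tool information"))
```

Comparing characters instead of words, or weighting word order differently, would give other percentages for the same pair of segments, which is the effect visible across the tools in the table.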

    Handling of Special Elements in a Segment

    Abbreviations. Most tools recognize the common abbreviations of the language that is set as source language by internal lists. That means that even though they encounter a dot after a word, they do not end the segment there because they know that this is an abbreviation. Abbreviations that are not recognized can either be added in a dialog or with an external text file containing a list of unknown abbreviations. Some tools also offer a list of words after import of the files to translate where the user can define which of those words should be handled as abbreviations.
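The effect of an abbreviation list on segmentation can be shown with a small sketch. The list contents and the `segment` function are assumptions for illustration; actual tools use language-specific lists and more elaborate segmentation rules.

```python
ABBREVIATIONS = {"e.g.", "i.e.", "etc."}  # assumed internal list

def segment(text, abbreviations=ABBREVIATIONS):
    """Split text at sentence-final punctuation, skipping known abbreviations."""
    segments, current = [], []
    for token in text.split():
        current.append(token)
        ends_sentence = token.endswith((".", "!", "?"))
        if ends_sentence and token.lower() not in abbreviations:
            segments.append(" ".join(current))
            current = []
    if current:  # trailing text without final punctuation
        segments.append(" ".join(current))
    return segments

text = "Use a tool, e.g. a filter. Then translate."
print(segment(text))         # abbreviation recognized: two segments
print(segment(text, set()))  # empty list: the segment breaks after "e.g."
```

An unrecognized abbreviation splits one sentence into two segments, and neither half will find a useful match in the TM.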

    Acronyms. Acronyms are words consisting only of uppercase letters, such as WYSIWYG. Some tools can recognize acronyms as units that do not need to be translated. Acronyms are even substituted automatically by some of the tools.

    Source segment 1: Use the acronym TM here.

    Source segment 2: Use the acronym MT here.

    During translation of the second segment, some tools automatically substitute the acronym from the source segment for the one that was saved in the TM.
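The substitution just described can be sketched as follows. The regular expression, the `substitute_acronyms` function and the German sample translation are assumptions made for this example; they only illustrate the principle of treating an acronym as a placeable unit.

```python
import re

# Assumed definition: two or more uppercase letters standing alone.
ACRONYM = re.compile(r"\b[A-Z]{2,}\b")

def substitute_acronyms(new_source, tm_source, tm_target):
    """Reuse a stored translation, swapping acronyms that differ between
    the stored source segment and the new source segment."""
    old = ACRONYM.findall(tm_source)
    new = ACRONYM.findall(new_source)
    if len(old) != len(new):
        return None  # structure differs; leave it to the translator
    mapping = dict(zip(old, new))
    # Single pass, so a replaced acronym is never replaced a second time.
    return ACRONYM.sub(lambda m: mapping.get(m.group(), m.group()), tm_target)

print(substitute_acronyms(
    "Use the acronym MT here.",
    "Use the acronym TM here.",
    "Verwenden Sie hier das Akronym TM.",
))
```

The translator still has to confirm the result, since a swapped acronym can call for a different article or word order in the target language.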

    Summary

    For this article I just did some small tests with simple RTF files, and there are many more things to consider when you are working with files coming from a DTP application or tagged files in HTML, SGML or XML. Take your time when you evaluate the tools and use some real-life files, not the sample files that are provided with the tools. They are great for getting to know how the tools work, but they do not give you the real-life picture.


    Word Count Examples

    Test segments:
      A: 1. item / 2. point bullet list
      B: 12356
      C: The symbols are , % and &.
      D: Abbreviations like e.g. or i.e. in one segment.
      E: Hyphenated two-word expression.

    Tool                      A    B    C     D      E
    Word                      8    1    7     8      3
    Translator's Workbench    4    0    4     8      4
    Transit                   4    0    4     8      4
    SDLX                      4    1    7    10      3
    Déjà Vu                   6    1    4    10      3
    TRANS Suite 2000          4    1    6*    8**    3
    Wordfast                  4    0    6     8      4

    * 5 translatable words
    ** 6 translatable words

    Match Rate Examples

    Base segment: There is information on a new tool.

    Test segment                                  Nature of change
      A: There is new information on a tool.      Changed position of one word
      B: There is information on a new tool.      Same segment with different (bold) formatting
      C: Tool information / Tools information

    Tool                      A          B          C
    SDLX                      81%        99%        49%
    Translator's Workbench    92%        99%        67%
    TRANS Suite 2000          85%        97%        No match
    Transit                   99%        98%        No match
    Déjà Vu*                  Level 1    Level 1    No match

    * Déjà Vu does not use percentages, but levels 0-9 (very similar to very fuzzy). "Level 1" here means that level 1 creates a match.


    A Tale of Tom

    Tom had just reached 55 km/hr down his favorite mountain road when he felt the pager buzz against his back. At the bottom of the hill, he turned his bicycle onto a side street and checked the number in the pager window. It was a translation company in Paris. He found a quiet area near a picnic table and called them back with his cellular telephone.

    A half-hour later, he took a shower while his computer booted up. When he sat at his desk, he connected to the Internet and opened his e-mail. The message from Paris contained the specifications of the job and a two-megabyte compressed file attached to it. While the file downloaded, he opened his accounting software and started an estimate/job order form. Within a half-hour, he had opened the files, inspected them, verified the client's word count and sent his estimate as an e-mail attachment to the client. He decided not to use dictation software for this assignment, so he put a CD of Schubert's Third Symphony on the stereo and went to work.

    The assignment consisted of engineering reports to the main office of a multinational oil company. Tom had done some of these the year before, so his translation tool had them in memory. He opened the tool, imported the job and examined it on the left side of the screen. He moved a few segments together and split out some others. Then he ordered the tool to pre-translate. While the tool worked, he opened the paper mail he had brought in from his bicycle ride. It took the tool five minutes to pre-translate all the reports, but it found matches for more than half of the segments in the document. It was time to edit the pre-translation.

    Tom loaded a technical dictionary in the CD-ROM drive and launched his browser. Scrolling down, he compared each segment of source text to the target text and edited the target text as needed. Each time he finished a segment, a single keystroke ordered the tool to pre-translate all future segments using what Tom had chosen so far and to put his cursor in the next segment.

    The next day Tom finished the document in the translation tool. He ran the code checker to make sure no formatting codes were lost and then the spell-checker. Finally, he exported the file back into Word and printed a copy for proofreading. The proofreading went faster than it used to before he started using the translation tool, but he always managed to find something on paper that he had missed on the screen.

    He entered the corrections in the document and attached the final version to an e-mail. His accounting software converted his estimate into an invoice and posted an entry to Accounts Receivable for him. He attached the invoice to an e-mail to his client's accounting manager. While he was on-line, he checked his bank balance and saw that a wire transfer from another customer had come in. He downloaded that information into his accounting software and called it a good day's work. He decided to ride to his favorite pizzeria to meet some friends and relax.

    Same Job, Different World

    Not much has changed about what translators like Tom (a fictional character) do since Jerome translated the Christian Bible into Latin. How we do it has changed a lot: for example, how we receive the source material and send the translation and our invoice to the clients (communications and administration); how we look up answers to our questions (research); and how we draft the translation and check our work (production and quality control).

    Communications

    Being there when the client calls is crucial. The translator who answers the phone or calls back first gets the assignment and keeps the client. How we achieve this depends on our individual lifestyles, our working environments and our budgets.

    Whether the translator is working from home or an office, a voice telephone line has to stay clear. This may mean having a second telephone line for the computer and fax or a DSL (Digital Subscriber Line) for the computer.

    My old-fashioned fax machine is reliable and works with the computer turned off. But today services such as Efax (www.efax.com) convert faxes into e-mail attachments. Incoming faxes are free; outgoing are inexpensive. Electronic faxes can be enlarged, rotated and copied to the clipboard. Electronic fax services in wired countries, such as www.freefax.it in Italy, allow the subscriber to give clients a local telephone number for faxes. Advertising-supported services such as www.freefax.com offer free outgoing faxes.

    Cellular telephones have become standard issue. Some telephone companies offer a feature that allows the subscriber to forward the office phone to the cell number when running errands or working off-site. Some translators use the cell phone as their primary office phone.

    Pagers come as separate units or are incorporated in cell phones. They let a customer leave a number or a message. The voice mail feature offered by some telephone companies will dial the pager when someone leaves a message. Pager signals can reach areas where cell phone service fades.

    Once the initial contact is made, often by telephone, most communications take place by e-mail. There are two elements in using e-mail and the Internet: the Internet connection and the various programs that reach out to the world through that connection.

    The connection itself is the job of the Internet service provider (ISP). To check e-mail occasionally, a telephone line with a modem may suffice. Translators who rely on Internet sources for research and terminology checking (through Eurodicautom, yourDictionary.com, subject-area Web sites and so on) may need to be on-line for long periods. They may need a DSL or cable connection. Cable TV companies, telephone companies and cellular companies are competing vigorously for small office/home-office (SOHO) business. Translators and language service companies in populous metropolitan areas may need help choosing among the offers.

    The programs running through the Internet connection include mainly the browser (Netscape, Internet Explorer and so on) and e-mail (usually with the browser, but also Eudora, Mulberry, Outlook and others). Then there are the applications that come with built-in Internet links, many of which support the business itself.


    Technology and the Freelance Translator

    Jonathan T. Hine, Jr.


    Administration

    Taking care of business requires integrating the same technologies that translating does, plus on-line banking, e-commerce and support software. Armed with a credit card and a current browser, the wired freelancer sends invoices, pays bills and purchases what the business needs. Modern accounting software programs, such as Quicken, Money, Peachtree and others, include integrated browsing to various services over secure links. They also generate invoices in text or HTML that may be attached to e-mail.

    At the very minimum, a spreadsheet program such as Excel and a presentation program such as PowerPoint should be in the freelancer's kit. Often, translation assignments come as combinations of Word files with embedded spreadsheets as tables.

    As we become increasingly dependent on electronic media, the stakes involved in a hard-drive crash, a fire or a power failure become even higher. It is time to include a CD read/write (R/W) drive, often called a CD-burner, in the computer suite. CD-burners offer a cost-effective way to back up files. They make it possible to store everything on the computer safely in a distant location.

    An uninterruptible power supply (UPS) is also a good investment. An economical one can protect equipment from power fluctuations and provide time to save files and shut down gracefully during a blackout.

    Finally, no discussion of technology can pass without mention of anti-virus software and intrusion protection. Translators and companies that use continuous connections like DSL or cable, in particular, need to install firewalls and keep their virus checkers updated. Symantec, makers of Norton AntiVirus, and Network Associates, which sells McAfee, are two of the best-known companies in the field. Those who do not know what a virus, a worm, a hoax and a firewall are or how to recognize a suspicious attachment should find out immediately.

    Research

    Not so long ago, translators collected bound dictionaries with a passion matched only by devoted philatelists. Today, we still need paper dictionaries for many reasons, but CD-ROMs offer cost-effective information storage. And what we need quickly, but not for very long, we can often look up faster on the Internet than at the local library.

    All this means that technical translators need the fastest CD-ROM drive they can find. Lots of memory in the computer (at least 256 MB), speed (at least 133 MHz), a large video card (2 GHz) and the biggest hard drive that will fit the budget are all essential. Computer power is needed to use the CD-ROMs efficiently; to take full advantage of a fast Internet connection if available; and to run modern programs, which are full of heavy routines and graphics that slow down less powerful computers.

    Production and Quality Control

    So far, what a translator needs differs little from the needs of any freelancer in the SOHO workplace. Software tools set the technical translator apart. Freelancers typically work with tools written specifically for the translation task and with tools written for other purposes, which translators may use effectively.

    Translation Tools

    The first group includes products such as Déjà Vu and TransCorpora, which help the human being translate efficiently. They perform pre-translation and present the translator with a split screen: source text on one side, target on the other. Our mythical Tom was using such a tool. Localization project management software often includes translator modules that can be sent to the freelancers participating in the project. TRADOS, SDLX and STAR Transit are examples. These three also offer freelance editions which translators or freelance project managers can buy and use independently.

    With translation tools, the source document has to be in an electronic format. This poses a problem for freelancers who perform a craft-like service on older paper documents, such as evidence in legal cases, or work with clients who do not provide electronic files. However, as attaching word-processing files to e-mail becomes more common, more freelancers will turn to translation tools to maximize efficiency.

    Most translation tools suitable for freelancing or outsourcing projects work by accumulating strings (segments) of matched source and target texts into a translation memory (TM). The translator or the project team usually creates the TM by aligning segments of text in already-translated document pairs and saving the results. The tools also feature terminology databases. These are matched sets of source and target terms with additional information for sorting and filtering (parts of speech, gender, subject area and so on).
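    As a rough picture of what such a terminology database holds, the Python sketch below models one entry and a simple filter. The field names, the `filter_terms` helper and the English-French sample terms are all assumptions for illustration; each tool defines its own record layout.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TermEntry:
    source: str
    target: str
    part_of_speech: str
    gender: Optional[str]  # None where the target language has no gender
    subject_area: str

def filter_terms(entries, **criteria):
    """Return the entries whose fields equal every given criterion."""
    return [
        entry for entry in entries
        if all(getattr(entry, field) == value for field, value in criteria.items())
    ]

# Invented sample data.
termbase = [
    TermEntry("valve", "vanne", "noun", "f", "engineering"),
    TermEntry("pump", "pompe", "noun", "f", "engineering"),
    TermEntry("invoice", "facture", "noun", "f", "accounting"),
]
for entry in filter_terms(termbase, subject_area="engineering"):
    print(entry.source, "->", entry.target)
```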

    When ready to work on a new translation, the freelancer or project manager imports the source document into the tool. The latest tools will import from most major office application suites (Office, Lotus, Corel and so on). They embed codes in the imported document to preserve the formatting of the source document while working on the translated text.
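    Because the embedded codes carry the formatting, a translation that drops one will lose formatting on export, which is why Tom ran a code checker. The Python sketch below shows the idea under a loud assumption: codes are written as numbered placeholders such as {1}. That syntax is hypothetical; each tool uses its own notation.

```python
import re
from collections import Counter

CODE = re.compile(r"\{\d+\}")  # hypothetical placeholder syntax, e.g. {1}

def missing_codes(source: str, target: str) -> list[str]:
    """Return the codes that occur more often in the source than in the target."""
    lost = Counter(CODE.findall(source)) - Counter(CODE.findall(target))
    return sorted(lost.elements())

# Invented example: the closing code {2} was dropped during translation.
print(missing_codes("Press {1}Save{2} now.", "Appuyez sur {1}Enregistrer maintenant."))
```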

    The tool displays the imported file on half the screen, thus allowing the user to make any changes needed before pre-translation. For example, the tool may have started a new segment after each period in an abbreviation or numbered list. The user can join the parts of the list or the letters of the abbreviation, or change the segmentation rules and re-import the file.

    When the source file looks good, the user orders the tool to pre-translate. The tool compares each segment with the segments in its TM. When it finds a match, it puts the target segment for the match next to the source, filling up the other side of the screen segment by segment. Segments with no match may be left blank or may have the source text inserted automatically. Fuzzy matching allows the tool to retrieve partially matched segments if perfect matches are not found. For example, only the year or some numbers may be different in an otherwise perfect match.
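    The pre-translation pass just described can be sketched in Python as follows. The TM is modeled as a plain dictionary, similarity comes from difflib's character-level ratio, and the 0.75 threshold is arbitrary; all three are assumptions for illustration, not how any particular tool works.

```python
from difflib import SequenceMatcher

def pretranslate(segments, tm, threshold=0.75):
    """Return (source, target, score) rows; target is None when nothing clears the threshold."""
    rows = []
    for source in segments:
        if source in tm:  # perfect match
            rows.append((source, tm[source], 1.0))
            continue
        best_source, best_score = None, 0.0
        for candidate in tm:  # fuzzy match: find the closest stored source segment
            score = SequenceMatcher(None, source, candidate).ratio()
            if score > best_score:
                best_source, best_score = candidate, score
        if best_source is not None and best_score >= threshold:
            # The stored target is offered unchanged; the translator still has to edit it.
            rows.append((source, tm[best_source], best_score))
        else:
            rows.append((source, None, best_score))
    return rows

# Invented sample data.
tm = {"Annual report for 2001.": "Rapport annuel pour 2001."}
for row in pretranslate(["Annual report for 2001.", "Annual report for 2002.", "Drill the well."], tm):
    print(row)
```

    Note that the fuzzy row still carries the old year in the target, which is exactly the kind of difference the translator corrects during editing.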

    Next, the translator edits what the tool has presented. This can be tedious or easy, depending on how many segments in the source document found a good match in the TM. The source and target segments are aligned on the screen, so comparing them and scrolling down are physically easy for the human being. As the translator scrolls to the next segment pair, the tool can automatically save the work, update the TM and pre-translate any segments still ahead. As the work proceeds, more segments from this particular document go into the TM. The tool presents more matched segments as the translator scrolls down.
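    The update-as-you-go behavior can be pictured with a short Python sketch. It stores each confirmed segment in the TM and fills any identical, still-empty segments further down. Restricting propagation to exact repeats is a simplification made for illustration, and the function name and sample sentences are invented.

```python
def confirm_segment(index, translation, sources, targets, tm):
    """Store a confirmed translation and pre-fill identical untranslated segments ahead."""
    tm[sources[index]] = translation
    targets[index] = translation
    for i in range(index + 1, len(sources)):
        if targets[i] is None and sources[i] in tm:
            targets[i] = tm[sources[i]]
    return targets

sources = ["Close valve A.", "Check the pressure.", "Close valve A."]
targets = [None, None, None]
tm = {}
print(confirm_segment(0, "Fermer la vanne A.", sources, targets, tm))
```

    After the first segment is confirmed, the third is already filled when the translator reaches it, which is why repetitive documents benefit most.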

    The tool maintains its own file for all this work, so the source document is never modified. This allows a project manager to send the imported file and the TM and terminology database to a freelance translator who also has the tool. It can be sent at any point in the process, so who does the importing, checking and pre-translation can depend on the human beings' individual involvement in the project.

    These tools work best when the source material is consistent in style and terminology and when there is enough of it to become repetitious. My own accuracy and speed increased just by having the split screen keep my work aligned in front of me and by having the tool pre-translate portions before I get to them. TM can also be shared, transmitted and updated by different people working on a common project. This facilitates consistency between translators.

    For translators and companies that need to manage terminology, several programs are devoted to that task. They do not feature TM and pre-translation, but may provide cost-effective support to those working with paper source documents. LogiTerm is an example.

    Other Tools for Translators

    The other group of tools includes desktop publishing (DTP) programs and dictation software. DTP programs (Quark, Ventura, MS Publisher and others) perform the layout and typesetting needed for finished print documents such as this magazine. Technical communicators in many fields, including translators, may need these tools for particular clients. The DTP programs pick up where the word-processing programs leave off, so freelancers should know all the features of their own word processors (Word, WordPerfect, Lotus Notes, for example) before buying DTP software.

    Dictation software replaces the tape recording that was transcribed by a typist in the old days. Before the personal computer, this was how high-volume, very specialized translators worked. At 125 words/minute with a familiar source text, a translator could dictate 40,000 words/day. Even today, it is a cost-effective way to produce the first draft of a translation. Dragon Naturally Speaking is probably the best-known of these programs.
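    The two figures quoted above imply a certain amount of speaking time, which a few lines of Python make explicit. The helper name is made up, and the calculation assumes uninterrupted dictation at a constant rate.

```python
def dictation_time(words: int, words_per_minute: int) -> tuple[int, int]:
    """Return (hours, minutes) of continuous dictation needed for a word count."""
    total_minutes = round(words / words_per_minute)
    return divmod(total_minutes, 60)

hours, minutes = dictation_time(40_000, 125)
print(f"{hours} h {minutes} min")  # 5 h 20 min
```

    In other words, the daily figure corresponds to a little over five hours of steady dictation, leaving the rest of the working day for preparation and review.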

    Both groups of tools can also be used while working in the office programs with which they are compatible. For example, I can call up a translation tool database (terminology or TM) from MS Word, check a choice and insert it from the tool into the Word document directly. Similarly, I can have Dragon Naturally Speaking running in the background and dictate directly into the Word document. How well this integration works depends on the versions of each application and the amount of RAM, clock speed and free disk space available to the user.

    Changes in the Works

    Today, we see all this technology being combined and rearranged in many ways. For example, personal digital assistants (PDAs) are not only replacing bulky agenda books and phonebooks but are also integrating the cell phone and pager and providing portable PC technology. Notebooks and laptop computers allow freelancers to work anywhere, whether traveling for business or just working somewhere different. Wireless technology is becoming affordable and widespread, thereby alleviating the need to find a telephone connection for the computer on the road.

    However, let us not forget the most amazing technological marvel of all.

    Timeless Technology

    What is at the center of this wired SOHO workplace? A few dozen kilograms of mostly water and organic material performing untold miracles of human thought. Some technology in the office may not be spectacular, but becomes vitally important to the long-term health of the human beings it serves. We should read about lighting, monitor height, wrist position and ergonomics and take a good look at where we work.

    For example, is there adequate sunlight? If not, I would use full-spectrum light bulbs in some of the lamps. We will spend most of our working lives in the chair at the workstation, so we need the most comfortable, highest quality armchair we can afford. It took me two upgrades over four years to afford the chair I use now.

    Tom probably set a good example for us by making sure his work did not keep him off his bicycle. We have to make the technology work for us and not the other way around. That way we will be there for ourselves, our clients and our families for many years to come.




    If your organization is new to language technology, this article introduces you to a few language software tools that help reduce the costs of globalization and improve delivery of different language versions to market. For the purposes of this article, the tools are categorized as machine translation (MT); internationalization; localization embracing both translation memory (TM) and computer-aided translation (CAT); and, finally, multilingual content management, including multilingual workflow.

    Before choosing tools for internal use or a localization vendor with those tools, Andrea Bhringer, marketing manager at STAR Group, suggests an organization needs to look at the number of files that need translating, as well as information within graphics for translating. It would need to consider timeframe, file management, publishing platforms and whether to use SGML or XML. "I also recommend training technical writers in the techniques of writing and formatting to meet the requirements of TM processes, as well as building a terminology pool for source and target languages before starting with translation. If possible, implementing a workflow system allows automated handling of routine tasks."

    Machine Translation

    After in-depth evaluation, an organization may decide that MT will provide sufficient quality to meet sales and customer service requirements, particularly if it has large volumes of content or if real-time translation is essential. In essence, MT scans a Web page that is in another language, parses the sentences into words and matches those words against custom dictionaries for comprehension before constructing translated sentences. Translation quality depends on the level of grammatical correctness and unambiguousness of the source language, coupled with MT technology whose bilingual dictionaries contain all the source words. Although MT produces only gist-level comprehension, the benefits of low cost and real-time translation have led to an enormous number of documents being translated with MT.

    Founded in 1968, SYSTRAN claims that more than 400,000 Web sites translate over six million Web pages per day using its MT technologies. The company has recently undergone a complete redesign of its architecture to improve quality. Among many improvements, the redesign has modularized the code, converted to declarative programming, become fully XML compliant and made its dictionaries more accessible and intuitive. Reba Rosenbluth, director of corporate sales at SYSTRAN, says, "SYSTRAN offers three levels of translation quality. To begin with, there is gist quality, then first level customization which is the ability to build your own dictionaries with Intuitive Coding Technology. The next level is publishing quality for near perfect translations, used for large-scale corporate customization projects. Often people do not know what they want, and we have to help them make the correct choice through a highly interactive process. It's an education process, and the marketing incorporates the education. The bottom line is that we have to supply what the customer wants. We supply metrics to our customers, and if we say that we can provide 70% or 80% or 90% coverage of the language, we deliver."
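    The article does not say how such coverage is measured. One plausible reading, given that MT depends on dictionaries containing the source words, is the share of source words that have a dictionary entry. The Python sketch below computes that figure; the definition, the function name and the sample dictionary are assumptions for illustration and do not describe SYSTRAN's actual metric.

```python
import re

def dictionary_coverage(text: str, dictionary: set[str]) -> float:
    """Return the share of word tokens in the text that have a dictionary entry."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return 0.0
    known = sum(1 for word in words if word in dictionary)
    return known / len(words)

# Invented sample dictionary of known source words.
dictionary = {"the", "pump", "is", "new"}
print(dictionary_coverage("The pump is new.", dictionary))        # 1.0
print(dictionary_coverage("The compressor is new.", dictionary))  # 0.75
```

    Under this reading, adding customer-specific terms to the dictionary is what moves a text from gist quality toward the higher levels described above.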

    Another MT vendor is SDL International. Its Enterprise Translation Server (ETS) can process three million words per hour (50,000 per minute) and, claims Hedley Rees-Evans, marketing director at SDL, has an application program interface that is much easier to implement than our main competitors, as wel