Seminar 3 Multiterm Extract Eng

Post on 25-Dec-2015

MultiTerm Extract: Terminology Extraction

Marchitan Irina, Senior Lecturer, MA

Department of Translation, Interpreting and Applied Linguistics

MultiTerm Extract

• MultiTerm Extract uses statistical extraction, based on how frequently candidate terms appear, to extract term candidates and, for multilingual termbases, their probable translations. It presents the results in a term candidate list.

• You can extract terms from monolingual or bilingual documents and from translation memories and then quickly and easily validate the terms.

• You can then use the export options to add the extracted terms to an existing MultiTerm termbase, or to save them in MultiTerm XML format or a tab-delimited format.

• SDL MultiTerm Extract supports all languages, including those of the Far East.
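The idea of frequency-based (statistical) extraction mentioned above can be illustrated with a minimal Python sketch. This is not the algorithm SDL MultiTerm Extract actually uses, which the slides do not describe; the function name `candidate_frequencies`, the naive tokenizer and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter


def candidate_frequencies(text: str, max_words: int = 2) -> Counter:
    """Count every word sequence of 1..max_words words in the text."""
    # Naive tokenizer (an assumption): lowercase, keep runs of letters only.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


sample = "The termbase stores terms. A termbase entry holds one term. Open the termbase."
freq = candidate_frequencies(sample)
for term, count in freq.most_common(3):
    print(term, count)
```

The most frequent sequences become the term candidates. Note that raw counting also ranks function words such as "the" highly, which is why exclusion lists and stop word lists matter later in the workflow.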

MultiTerm Extract allows you to:

• Create term candidate lists from monolingual documents.

• Create bilingual term candidate lists from bilingual file formats.

• Update existing multilingual termbases by creating new translations for terms already stored in MultiTerm.

• Analyze the quality of termbases and documents by comparing the terms found in documents with those stored in the termbase.

• Compile bilingual dictionaries from bilingual file formats.

• Manually extract terms from the source document(s) to complement and enhance the automatic term extraction process.

• Export extracted terms to MultiTerm, MultiTerm XML or a tab-delimited format.
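The quality analysis item in the list above (comparing the terms found in documents with those stored in the termbase) can be thought of as a set comparison. The sketch below is only an illustration of that idea; the function `compare_with_termbase` and the sample terms are hypothetical and do not reflect how the tool performs the analysis.

```python
def compare_with_termbase(document_terms, termbase_terms):
    """Split terms into shared, document-only and termbase-only groups."""
    # Case-insensitive comparison is an assumption of this sketch.
    doc = {t.lower() for t in document_terms}
    tb = {t.lower() for t in termbase_terms}
    return {
        "in_both": sorted(doc & tb),
        "only_in_document": sorted(doc - tb),
        "only_in_termbase": sorted(tb - doc),
    }


report = compare_with_termbase(
    ["Termbase", "export wizard", "stop word list"],
    ["termbase", "translation memory"],
)
print(report)
```

Terms that appear only in the document are candidates for adding to the termbase; terms that appear only in the termbase were not used in the document.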

Objectives

• Compiling glossaries for simultaneous and consecutive translation

• Updating existing termbases (TB)

• Making up AutoSuggest dictionaries

Stages of work

• 1. Create a new SDL MultiTerm Extract project. Select File > New Project from the menu bar and launch New Project Wizard.

• Choose the project type, add the working files, configure the necessary settings and select the stop word lists.

• 2. Launch the term extraction process.

• 3. Review and validate the extracted terms.

• 4. Export the validated terms using the Export Wizard.

Types of projects

• There are five types of projects in SDL MultiTerm Extract:

  o Monolingual Term Extraction

  o Bilingual Term Extraction

  o Translation

  o Quality Assurance

  o Dictionary Compilation

• Monolingual Term Extraction project: extracts terminology from a text in one language.

• Bilingual Term Extraction project: extracts terminology from monolingual formats; extracts and translates terminology from bilingual formats.

• Dictionary Compilation project: extracts a dictionary from an existing translation memory file.

• Translation project: provides suggestions from existing texts and allows you to enter translations into a termbase.

• QA (Quality Assurance) project: analyzes the quality of termbases and documents and allows you to improve it.

CREATING A PROJECT

• 1. Launch MultiTerm Extract.

• 2. Select New Project from the File menu. The New Project Wizard opens and the Choose project type, name and location page is displayed. Use this page to define the project type, name and location.

Select Monolingual Project

Project name

Location on drive D

Add the files containing the source text (ST)

Exclusion settings (what to exclude)

Settings for terms excluded by user

Excluded terms:

• Excluded terms settings are used to define the exclusion and learning settings for your project.

• This page is not displayed for Translation Projects and QA Projects.

Exclusion Settings

• Exclusion Settings – during the extraction process, SDL MultiTerm Extract ignores any terms that already exist in the termbase and/or file that is specified under Exclusion Settings.

Settings for saving terms excluded by the user

• Learning Settings - during the extraction process, SDL MultiTerm Extract records any terms that you manually reject in a dedicated database.

• SDL MultiTerm Extract learns to automatically ignore these unwanted terms as extraction continues.
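The interplay of the exclusion and learning settings described above can be sketched in Python. The class `ExclusionFilter` and its methods are hypothetical names invented for this illustration; the real tool stores these terms in termbases and files, not in in-memory sets.

```python
class ExclusionFilter:
    """Hypothetical stand-in for the exclusion and learning settings."""

    def __init__(self, excluded_terms=()):
        # Terms taken from the exclude termbase and/or exclude file.
        self.excluded = {t.lower() for t in excluded_terms}
        # Terms the user rejects manually (the "learning" store).
        self.discarded = set()

    def reject(self, term):
        """Record a manually rejected term so that later runs skip it."""
        self.discarded.add(term.lower())

    def keep(self, candidates):
        """Return candidates that are neither excluded nor previously rejected."""
        blocked = self.excluded | self.discarded
        return [c for c in candidates if c.lower() not in blocked]


flt = ExclusionFilter(excluded_terms=["termbase"])
first_run = flt.keep(["Termbase", "click next", "export wizard"])
flt.reject("click next")
second_run = flt.keep(["Termbase", "click next", "export wizard"])
print(first_run, second_run)
```

In the first run only the term already in the exclusion list is skipped; after "click next" has been rejected once, the second run skips it as well.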

To specify the Exclusion Settings:

• Click Exclude termbase.

• The Choose Exclude Termbase dialog box is displayed.

• Select the termbase you want to exclude from the term extraction process.

• Click OK.

• If you selected a project termbase already in step 5, this is automatically displayed under Exclusion Settings.

• Complete the Choose Index to Exclude dialog box.

• Select the source index to which SDL MultiTerm Extract should refer when checking for terms to exclude from the term extraction process.

• Click OK to confirm settings.

• Click Exclude file to specify lists of terms you do not want to include in the extraction process.

• Select the corresponding file (list of terms).

• SDL MultiTerm Extract comes with basic vocabulary lists for certain European languages. If there is a basic vocabulary list available for your source language, this is automatically displayed.

• Press Select to confirm your choice.

To specify the Learning Settings:

• Select the Record discarded terms check box to choose a learning database where terms excluded by the user will be stored and "learned" by MultiTerm.

• Click Browse to select the database or to create a new one.

• You can also select a remote termbase, if available. Click OK to confirm.

Terms excluded manually by the user are stored in the specified termbase

• Depending on the type of project you are creating, one of three pages is displayed.

• For Monolingual and Bilingual Term Extraction Projects, the Term extraction settings page is displayed.

• In the Minimum term length box, specify the minimum number of words required to form a term candidate. The default setting is 1.

• In the Maximum term length box, specify the maximum number of words that a term candidate may have. The default setting is 10.

• Select Maximum number of extracted terms to set the maximum number of term candidates that SDL MultiTerm Extract will extract. The default setting is 100.
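The effect of the three settings above can be sketched as filters over a table of candidate frequencies. The defaults below mirror the values given on the slide (1, 10 and 100); the function name `apply_extraction_settings`, the ranking by frequency and the sample counts are assumptions, since the slides do not say how the tool chooses which candidates to keep when the maximum is reached.

```python
def apply_extraction_settings(counts, min_len=1, max_len=10, max_terms=100):
    """Filter {term: frequency} by word count, then keep the most frequent."""
    in_range = {
        term: freq
        for term, freq in counts.items()
        if min_len <= len(term.split()) <= max_len
    }
    # Sort by descending frequency, breaking ties alphabetically.
    ranked = sorted(in_range.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:max_terms]


counts = {"termbase": 5, "term candidate list": 3, "translation memory": 4, "of": 9}
result = apply_extraction_settings(counts, min_len=2, max_len=3, max_terms=2)
print(result)
```

With a minimum length of 2, single words such as "termbase" and "of" are dropped no matter how frequent they are.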

Silence/noise ratio

• The higher the silence ratio, the fewer term candidates are extracted. However, these term candidates are generally of a higher quality.

• The higher the noise ratio, the more term candidates are extracted. However, it is likely that these will include a number of lower quality candidates.
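One way to picture the silence/noise trade-off is as a threshold on candidate frequency. The sketch below assumes a `silence` value between 0 and 1 that is compared with each candidate's frequency relative to the most frequent candidate. This mapping is purely illustrative: the slides do not document how SDL MultiTerm Extract scores candidates, and `filter_by_silence` is a hypothetical function.

```python
def filter_by_silence(counts, silence):
    """Keep terms whose frequency is at least `silence` times the top frequency.

    silence = 0.0 keeps everything (maximum noise);
    silence = 1.0 keeps only the most frequent term(s) (maximum silence).
    """
    if not 0.0 <= silence <= 1.0:
        raise ValueError("silence must be between 0 and 1")
    if not counts:
        return {}
    top = max(counts.values())
    return {t: f for t, f in counts.items() if f >= silence * top}


counts = {"termbase": 10, "export wizard": 6, "click": 2}
for level in (0.0, 0.5, 1.0):
    print(level, sorted(filter_by_silence(counts, level)))
```

Raising the threshold shortens the list and leaves only the strongest candidates; lowering it lets more candidates through, including weaker ones.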

• Click Stopword Lists to set the language-specific list(s) containing articles, pronouns and other items that are to be excluded from the extraction process.

• These lists are available online and should be downloaded to drive D.
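The role of a stop word list can be sketched as follows. The rule used here (discard any candidate that begins or ends with a stop word) is an assumption chosen for illustration, and `strip_stopword_candidates` is a hypothetical function; the tool's own handling of stop words may differ.

```python
def strip_stopword_candidates(candidates, stopwords):
    """Drop candidates that start or end with a stop word."""
    stop = {w.lower() for w in stopwords}
    kept = []
    for cand in candidates:
        words = cand.lower().split()
        if words and words[0] not in stop and words[-1] not in stop:
            kept.append(cand)
    return kept


stopwords = ["the", "of", "a", "and"]
candidates = ["the termbase", "quality of termbases", "of", "export wizard"]
kept = strip_stopword_candidates(candidates, stopwords)
print(kept)
```

A stop word in the middle of a candidate ("quality of termbases") is tolerated in this sketch, because multiword terms often contain prepositions.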

Validating extracted terms

• 1. Review the term candidates in the Term window. Validate terms by selecting a term and pressing the space bar or selecting the check box. You can also right-click and select Validate from the shortcut menu. Use the down arrow to scroll through the term candidate list. You can delete terms that are not relevant.

Review the term and its translations using the Generate sentences button

Context can be generated for a single term or for all terms

• 2. After you review all term candidates in the Term window, use the Term properties window to examine terms in more detail. You can add new terms and remove term candidates that are not useful.

You can validate all terms by right-clicking in the Term window and selecting Validate All from the shortcut menu

EXPORTING EXTRACTED TERMS

• You can export to a:

• MultiTerm Termbase

• MultiTerm XML Format

• Tab-delimited Text Format

Export in TXT format

• To export extracted terms in TXT format:

• 1. Select File > Export from the menu bar. The Export Wizard is launched.

• 2. Click Next. The Export Definition page is displayed. Use this page to select the export type.

• 3. Tick the option Create a new export definition and select Tab-delimited.

• 4. Click Next.

• 5. Select the Export File Location. By default, SDL MultiTerm Extract selects the current project's folder and creates a text file (*.txt) with the same name as the project.

• The Filter settings window is displayed.

• You can select a filter from the drop-down list. SDL MultiTerm Extract has two predefined filter definitions: Only Validated and Only Non Validated terms. The default setting is No Filter.

• 6. Click Next. The export process starts.
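To show what a filtered tab-delimited export amounts to, here is a minimal Python sketch. The record layout (`source`, `target`, `validated`), the function `export_tab_delimited` and the sample terms are assumptions; the actual column layout of the file written by the Export Wizard is not given in the slides.

```python
import csv
import io


def export_tab_delimited(terms, out, only_validated=None):
    """Write (source, target) rows as tab-delimited text.

    only_validated: None = no filter, True = only validated,
    False = only non-validated.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    written = 0
    for term in terms:
        if only_validated is not None and term["validated"] != only_validated:
            continue
        writer.writerow([term["source"], term["target"]])
        written += 1
    return written


# Illustrative sample data, not taken from a real project.
terms = [
    {"source": "termbase", "target": "base terminologique", "validated": True},
    {"source": "click", "target": "clic", "validated": False},
]
buffer = io.StringIO()
count = export_tab_delimited(terms, buffer, only_validated=True)
print(count, repr(buffer.getvalue()))
```

The three values of `only_validated` correspond to the No Filter, Only Validated and Only Non Validated choices described above. To write to disk instead of a buffer, pass a file opened with `open(path, "w", newline="", encoding="utf-8")`.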

As a result we have a list of extracted terms
