MultiTerm Extract: Terminology Extraction. Marchitan Irina, Senior Lecturer, MA, Dept. of Translation, Interpreting and Applied Linguistics

Seminar 3 Multiterm Extract Eng

Dec 25, 2015

Page 1: Seminar 3 Multiterm Extract Eng

MultiTerm Extract

Terminology Extraction

Marchitan Irina, Senior Lecturer, MA

Dept. of Translation, Interpreting and Applied Linguistics

Page 2: Seminar 3 Multiterm Extract Eng

MultiTerm Extract

• MultiTerm Extract uses statistical extraction (based on how frequently candidate terms appear) to extract term candidates and, for multilingual termbases, their probable translations. It presents the results in a term candidate list.
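The idea of frequency-based extraction can be illustrated with a minimal Python sketch. It is not the algorithm used by MultiTerm Extract, which these slides do not describe in detail. The function name `extract_candidates` and the parameters `max_words` and `min_frequency` are assumptions made for illustration only. The sketch counts every word sequence of up to `max_words` words and keeps the ones that recur; as a simplification, sequences may run across sentence boundaries.

```python
import re
from collections import Counter


def extract_candidates(text, max_words=3, min_frequency=2):
    """Count word n-grams and keep those seen at least min_frequency times.

    Hypothetical helper for illustration; not part of any SDL product.
    """
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    frequent = [(term, freq) for term, freq in counts.items() if freq >= min_frequency]
    # Most frequent first; ties are broken alphabetically for a stable order.
    return sorted(frequent, key=lambda item: (-item[1], item[0]))


sample = (
    "The term base stores each term. "
    "A term base can be updated. "
    "Export the term base."
)
candidates = extract_candidates(sample)
for term, freq in candidates:
    print(freq, term)
```

Note that the raw list still contains function words such as "the", which is why a real extraction run also relies on stop word lists and manual validation.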

Page 3: Seminar 3 Multiterm Extract Eng

MultiTerm Extract

• You can extract terms from monolingual or bilingual documents and from translation memories and then quickly and easily validate the terms.

• You can then use the multiple export options to send the extracted terms to an existing MultiTerm termbase, to MultiTerm XML format or to a tab-delimited format.

Page 4: Seminar 3 Multiterm Extract Eng

• SDL MultiTerm Extract supports all languages, including those of the Far East.

Page 5: Seminar 3 Multiterm Extract Eng

MultiTerm Extract allows you to:

• Create term candidate lists from monolingual documents.

• Create bilingual term candidate lists from bilingual file formats.

• Update existing multilingual termbases by creating new translations for terms already stored in MultiTerm.

Page 6: Seminar 3 Multiterm Extract Eng

MultiTerm Extract allows you to:

• Analyze the quality of termbases and documents by comparing the terms found in documents with those stored in the termbase.

• Compile bilingual dictionaries from bilingual file formats.

Page 7: Seminar 3 Multiterm Extract Eng

MultiTerm Extract allows you to:

• Manually extract terms from the source document(s) to complement and enhance the automatic term extraction process.

• Export extracted terms to MultiTerm, MultiTerm XML or a tab-delimited format.

Page 8: Seminar 3 Multiterm Extract Eng

Objectives

• Compiling glossaries for simultaneous and consecutive translation

• Updating existing termbases (TB)

• Creating AutoSuggest dictionaries

Page 9: Seminar 3 Multiterm Extract Eng

Stages of work

• 1. Create a new SDL MultiTerm Extract project. Select File > New Project from the menu bar to launch the New Project Wizard.

• Choose the project type, add the working files, configure the necessary settings and select the stop word lists.

Page 10: Seminar 3 Multiterm Extract Eng

Stages of work

• 2. Launch the term extraction process.

• 3. Review and validate the extracted terms.

• 4. Export the validated terms using the Export Wizard.

Page 11: Seminar 3 Multiterm Extract Eng

Types of projects

• There are five types of projects in SDL MultiTerm Extract:

• o Monolingual Term Extraction

• o Bilingual Term Extraction

• o Translation

• o Quality Assurance

• o Dictionary Compilation

Page 12: Seminar 3 Multiterm Extract Eng

• Monolingual Term Extraction project

• Extracts terminology from a text in one language.

• Bilingual Term Extraction project

• Extracts terminology from monolingual formats.

• Extracts and translates from bilingual formats.

Page 13: Seminar 3 Multiterm Extract Eng

• Dictionary Compilation project

• Extracts a dictionary from an existing translation memory file.

• Translation project

• Provides suggestions from existing texts and allows you to enter translations into a termbase.

• QA (Quality Assurance) project

• Analyzes the quality of termbases and documents and allows you to improve it.

Page 14: Seminar 3 Multiterm Extract Eng

CREATING A PROJECT

• 1. Launch MultiTerm Extract.

• 2. Select New Project from the File menu. The New Project Wizard opens on the Choose project type, name and location page. Use this page to define the project type, name and location.

Page 15: Seminar 3 Multiterm Extract Eng
Page 16: Seminar 3 Multiterm Extract Eng

Select Monolingual Project

Project name

Location on drive D

Page 17: Seminar 3 Multiterm Extract Eng
Page 18: Seminar 3 Multiterm Extract Eng
Page 19: Seminar 3 Multiterm Extract Eng

Add the files containing the source text (ST)

Page 20: Seminar 3 Multiterm Extract Eng
Page 21: Seminar 3 Multiterm Extract Eng
Page 22: Seminar 3 Multiterm Extract Eng

Exclusion settings (what to exclude)

Settings for terms excluded by user

Page 23: Seminar 3 Multiterm Extract Eng

Excluded terms:

• Excluded terms settings are used to define the exclusion and learning settings for your project.

• This page is not displayed for Translation Projects and QA Projects.

Page 24: Seminar 3 Multiterm Extract Eng

Exclusion Settings

• Exclusion Settings – during the extraction process, SDL MultiTerm Extract ignores any terms that already exist in the termbase and/or file that is specified under Exclusion Settings.

Page 25: Seminar 3 Multiterm Extract Eng

Settings for saving terms excluded by the user

• Learning Settings - during the extraction process, SDL MultiTerm Extract records any terms that you manually reject in a dedicated database.

• SDL MultiTerm Extract learns to automatically ignore these unwanted terms as extraction continues.
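The interplay of exclusion and learning settings can be pictured with a small Python sketch. It only illustrates the behaviour described above and does not reflect how the product stores its data. The names `filter_candidates`, `reject`, `excluded` and `learned` are hypothetical, and plain sets stand in for the exclude termbase and the learning database.

```python
def filter_candidates(candidates, excluded_terms, learned_rejections):
    """Drop candidates found in the exclusion list or rejected earlier by the user."""
    blocked = {t.lower() for t in excluded_terms} | {t.lower() for t in learned_rejections}
    return [c for c in candidates if c.lower() not in blocked]


def reject(term, learned_rejections):
    """Record a manually rejected term so that later passes skip it."""
    learned_rejections.add(term.lower())


excluded = {"termbase"}   # stands in for the exclude termbase or exclude file
learned = set()           # stands in for the learning database

raw = ["termbase", "wizard", "next page"]
first_pass = filter_candidates(raw, excluded, learned)

# The user discards "next page"; the next pass no longer proposes it.
reject("next page", learned)
second_pass = filter_candidates(raw, excluded, learned)
print(first_pass, second_pass)
```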

Page 26: Seminar 3 Multiterm Extract Eng

To specify the Exclusion Settings:

• Click Exclude termbase.

• The Choose Exclude Termbase dialog box is displayed.

Page 27: Seminar 3 Multiterm Extract Eng
Page 28: Seminar 3 Multiterm Extract Eng

• Select the termbase you want to exclude from the term extraction process.

• Click OK.

• If you already selected a project termbase in step 5, it is automatically displayed under Exclusion Settings.

Page 29: Seminar 3 Multiterm Extract Eng
Page 30: Seminar 3 Multiterm Extract Eng

• Complete the Choose Index to Exclude dialog box.

• Select the source index to which SDL MultiTerm Extract should refer when checking for terms to exclude from the term extraction process.

• Click OK to confirm settings.

Page 31: Seminar 3 Multiterm Extract Eng
Page 32: Seminar 3 Multiterm Extract Eng

• Click Exclude file to specify lists of terms you do not want to include in the extraction process.

• Select the corresponding file (list of terms).

Page 33: Seminar 3 Multiterm Extract Eng

• SDL MultiTerm Extract comes with basic vocabulary lists for certain European languages. If there is a basic vocabulary list available for your source language, this is automatically displayed.

• Press Select to confirm your choice.

Page 34: Seminar 3 Multiterm Extract Eng

To specify the Learning Settings:

• Select the Record discarded terms check box to choose a learning database, where terms excluded by the user are stored and "learned" by MultiTerm.

• Click Browse to select the database or to create a new one.

• You can also select a remote termbase, if available. Click OK to confirm.

Page 35: Seminar 3 Multiterm Extract Eng
Page 36: Seminar 3 Multiterm Extract Eng

Terms excluded manually by the user are stored in the specified termbase

Page 37: Seminar 3 Multiterm Extract Eng

• Depending on the type of project you are creating, one of three pages is displayed.

• For Monolingual and Bilingual Term Extraction Projects, the Term extraction settings page is displayed.

• In the Minimum term length box, specify the minimum number of words required to form a term candidate. The default setting is 1.

Page 38: Seminar 3 Multiterm Extract Eng

• In the Maximum term length box, specify the maximum number of words that a term candidate may have. The default setting is 10.

• Select Maximum number of extracted terms to set the maximum number of term candidates that SDL MultiTerm Extract will extract. The default setting is 100.
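The effect of these three settings can be sketched in Python as follows. The class `ExtractionSettings` and the function `apply_settings` are hypothetical names, and the input is assumed to be a list of candidates already ranked from best to worst. The default values (1, 10 and 100) are the ones stated above.

```python
from dataclasses import dataclass


@dataclass
class ExtractionSettings:
    """Hypothetical container mirroring the wizard defaults described above."""
    min_term_length: int = 1    # minimum number of words per candidate
    max_term_length: int = 10   # maximum number of words per candidate
    max_terms: int = 100        # maximum number of extracted candidates


def apply_settings(ranked_candidates, settings):
    """Keep candidates within the word-length bounds, then cap the list size."""
    kept = [
        term for term in ranked_candidates
        if settings.min_term_length <= len(term.split()) <= settings.max_term_length
    ]
    return kept[:settings.max_terms]


ranked = ["term", "term base", "new project wizard", "export"]
with_defaults = apply_settings(ranked, ExtractionSettings())
multiword_only = apply_settings(ranked, ExtractionSettings(min_term_length=2, max_terms=1))
print(with_defaults, multiword_only)
```

Raising the minimum term length to 2, as in the second call, is one way to suppress single-word candidates when only multi-word terms are of interest.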

Page 39: Seminar 3 Multiterm Extract Eng
Page 40: Seminar 3 Multiterm Extract Eng

Silence/noise ratio.

• The higher the silence ratio, the fewer term candidates are extracted. However, these term candidates are generally of a higher quality.

• The higher the noise ratio, the more term candidates are extracted. However, it is likely that these will include a number of lower quality candidates.
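The trade-off can be pictured as a cut-off applied to scored candidates. The slides do not say how MultiTerm Extract computes the ratio, so the Python sketch below simply assumes a minimum-frequency threshold; `select_by_threshold` and the sample scores are invented for illustration.

```python
def select_by_threshold(scored_candidates, min_frequency):
    """Keep candidates whose frequency reaches the threshold (assumed model)."""
    return [term for term, freq in scored_candidates if freq >= min_frequency]


scored = [("term base", 5), ("export wizard", 3), ("click next", 1)]

# A high threshold behaves like a high silence ratio: fewer, stronger candidates.
strict = select_by_threshold(scored, min_frequency=4)

# A low threshold behaves like a high noise ratio: more candidates, some of them weak.
lenient = select_by_threshold(scored, min_frequency=1)
print(strict, lenient)
```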

Page 41: Seminar 3 Multiterm Extract Eng
Page 42: Seminar 3 Multiterm Extract Eng

• Click Stopword Lists to select the language-specific list(s) of articles, pronouns and other items that are to be excluded from the extraction process.

• These lists are available online and should be downloaded to drive D.
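A stop word list is typically a plain text file with one word per line. The Python sketch below shows one plausible way such a list could be applied; the rule used here (drop any candidate that starts or ends with a stop word) is an assumption, not the documented behaviour of MultiTerm Extract, and both function names are hypothetical.

```python
def load_stopwords(lines):
    """Build a stop word set from an iterable of lines, skipping blank ones."""
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stopword_candidates(candidates, stopwords):
    """Drop candidates whose first or last word is a stop word (assumed rule)."""
    kept = []
    for term in candidates:
        words = term.lower().split()
        if words and words[0] not in stopwords and words[-1] not in stopwords:
            kept.append(term)
    return kept


# In practice the lines would come from a downloaded file, e.g. open(path, encoding="utf-8").
stopwords = load_stopwords(["the", "of", "a", "", "you"])
kept = remove_stopword_candidates(
    ["the term", "term base", "of", "quality of termbases"], stopwords
)
print(kept)
```

A stop word inside a longer candidate, as in "quality of termbases", is left alone under this rule.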

Page 43: Seminar 3 Multiterm Extract Eng
Page 44: Seminar 3 Multiterm Extract Eng
Page 45: Seminar 3 Multiterm Extract Eng
Page 46: Seminar 3 Multiterm Extract Eng
Page 47: Seminar 3 Multiterm Extract Eng
Page 48: Seminar 3 Multiterm Extract Eng

Validating extracted terms

Page 49: Seminar 3 Multiterm Extract Eng

• 1. Review the term candidates in the Term window. Validate terms by selecting a term and pressing the space bar or selecting the check box. You can also right-click and select Validate from the shortcut menu. Use the down arrow to scroll through the term candidate list. You can delete terms that are not relevant.

Page 50: Seminar 3 Multiterm Extract Eng

Review the term and its translations using the Generate sentences button

Page 51: Seminar 3 Multiterm Extract Eng

Context can be generated for a single term or for all terms

Page 52: Seminar 3 Multiterm Extract Eng
Page 53: Seminar 3 Multiterm Extract Eng

• 2. After you review all term candidates in the Term window, use the Term properties window to examine terms in more detail. You can add new terms and remove term candidates that are not useful.

Page 54: Seminar 3 Multiterm Extract Eng
Page 55: Seminar 3 Multiterm Extract Eng

You can validate all terms by right-clicking in the Term window and selecting Validate All from the shortcut menu

Page 56: Seminar 3 Multiterm Extract Eng

EXPORTING EXTRACTED TERMS

• You can export to a:

• MultiTerm Termbase

• MultiTerm XML Format

• Tab-delimited Text Format.

Page 57: Seminar 3 Multiterm Extract Eng

Export in TXT format

• To export extracted terms in TXT format:

• 1. Select File > Export from the menu bar. The Export Wizard is launched.

• 2. Click Next. The Export Definition page is displayed. Use this page to select the export type.

Page 58: Seminar 3 Multiterm Extract Eng
Page 59: Seminar 3 Multiterm Extract Eng
Page 60: Seminar 3 Multiterm Extract Eng
Page 61: Seminar 3 Multiterm Extract Eng

• 3. Tick the option Create a new export definition and select Tab-delimited.

• 4. Click Next.

• 5. Select the Export File Location. By default, SDL MultiTerm Extract selects the current project's folder and creates a text file (*.txt) with the same name as the project.

• The Filter settings window is displayed.

Page 62: Seminar 3 Multiterm Extract Eng
Page 63: Seminar 3 Multiterm Extract Eng

• You can select a filter from the drop-down list. SDL MultiTerm Extract has two predefined filter definitions: Only Validated and Only Non Validated terms. The default setting is No Filter.

Page 64: Seminar 3 Multiterm Extract Eng
Page 65: Seminar 3 Multiterm Extract Eng

6. Click Next and the export process starts.
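For readers unfamiliar with the tab-delimited format, the following Python sketch shows what such an export amounts to. It is not the Export Wizard's implementation: the function `export_tab_delimited`, the two-column layout (term, frequency) and the sample data are assumptions. The `only_validated` argument mimics the filter choices described above, with `None` standing for No Filter.

```python
import csv
import io


def export_tab_delimited(terms, only_validated=None):
    """Write (term, frequency) rows separated by tabs.

    terms: list of (term, frequency, validated) tuples.
    only_validated: True = only validated, False = only non-validated, None = no filter.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for term, frequency, validated in terms:
        if only_validated is None or validated == only_validated:
            writer.writerow([term, frequency])
    return buffer.getvalue()


terms = [("term base", 5, True), ("click next", 1, False), ("export wizard", 3, True)]
validated_only = export_tab_delimited(terms, only_validated=True)
everything = export_tab_delimited(terms)
print(validated_only)
```

To produce an actual file, the returned text can be written to a `.txt` file in the project folder; a file of this kind opens directly in a spreadsheet application.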

Page 66: Seminar 3 Multiterm Extract Eng

As a result we have a list of extracted terms

Page 67: Seminar 3 Multiterm Extract Eng