
integrated translation environment

Introduction to Regular

Expressions in memoQ

© 2004-2013 Kilgray Translation Technologies.

All rights reserved.


Regular expressions tutorial

memoQ integrated translation environment Page 2 of 14

Contents

1 Basics on regular expressions
1.1 Introduction to regular expressions
1.2 Special characters in regular expressions
2 memoQ and regular expressions
2.1 Auto-translation rules
2.1.1 Using Auto-translatables in the QA check
2.2 Segmentation rules
2.3 Cascading filters
2.4 The Regex text filter
2.5 The internal Regex Tagger

This tutorial covers the regular expression functionality of memoQ 6.2. It contains text items from the English user interface of the program. These items are under constant verification and are subject to change without prior notification.


1 Basics on regular expressions

Regular expressions are patterns that describe text strings. They are used to match character combinations in strings.

1.1 Introduction to regular expressions

To explain regular expressions, the following example sentence is used:

This is a regular expression.

This example sentence can be “described” as the following:

• a group of 5 words ending with a dot

• a string that starts with a capital “T” and ends with a dot

• a group of characters, followed by a space, followed by another group of characters, and so on, until we meet a dot

There are countless ways to describe this sentence. Which description you choose depends on what you want to match.
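The three descriptions above can be written down as patterns. The following is a minimal sketch in Python, used here only for illustration: memoQ itself uses the .NET regex engine, and the pattern names are made up for this example. The basic constructs shown behave the same way in both engines.

```python
import re

sentence = "This is a regular expression."

# Three illustrative ways to describe the same sentence.
patterns = {
    # a group of 5 words ending with a dot
    "five_words_then_dot": r"(\w+\s){4}\w+\.",
    # a string that starts with a capital "T" and ends with a dot
    "capital_t_to_dot": r"T.*\.",
    # groups of characters separated by spaces, until we meet a dot
    "groups_until_dot": r"([^\s.]+\s)*[^\s.]+\.",
}

for name, pattern in patterns.items():
    # fullmatch succeeds only if the pattern describes the whole sentence
    print(name, bool(re.fullmatch(pattern, sentence)))
```

All three patterns describe the example sentence, but each of them would also accept different other sentences, which is why the choice depends on what you want to match.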

1.2 Special characters in regular expressions

There are a few basic rules to observe when you create regular expressions. The first rule: keep it simple. Ten “commands” are enough for basic usage.

The following commands can be used to create your own expressions:

. any character
( ) a group
[ ] a range
\s any whitespace character
\d any digit
? zero or one time; after another quantifier (for example *? or +?), as few times as possible
* matches the preceding element 0 or more times
+ matches the preceding element 1 or more times
{x} exactly x times
{x,y} between x and y times

When you want to match one of the characters used as commands literally, put a backslash (\) in front of it:

\* an asterisk
\[ a left bracket
\( a left parenthesis
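To see the commands in action, here is a small sketch in Python. Python is an assumption made for illustration only, since memoQ uses the .NET engine, but these basic commands work the same way in both. The sample strings are invented.

```python
import re

# "." matches any single character
any_char = re.findall(r"c.t", "cat cut c3t")

# "[ ]" matches one character out of a range
in_range = re.findall(r"[0-5]", "1984")

# "\s" matches whitespace such as a space or a tab
words = re.split(r"\s", "a b\tc")

# "\d" with "{x}": exactly four digits
years = re.findall(r"\d{4}", "In 2012 and 2013")

# "*" allows zero repetitions, "+" needs at least one
star = re.findall(r"ab*", "a ab abb")
plus = re.findall(r"ab+", "a ab abb")

# "?" after a quantifier: as few times as possible
lazy = re.findall(r"<.+?>", "<b>bold</b>")
greedy = re.findall(r"<.+>", "<b>bold</b>")

# a backslash makes a command character literal
literal = re.findall(r"\(\d\)", "see (1) and (2)")

print(any_char, in_range, words, years, star, plus, lazy, greedy, literal)
```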


1.2.1 Example

How would you describe a date given in its numeric form, such as 31/01/2012?

31/01/2012 = \d{1,2}/\d{1,2}/\d{4}

Or, if you are sure that day and month are always written with 2 digits:

\d{2}/\d{2}/\d{4}

You can break down the regular expression into groups:

31/01/2012 = DD/MM/YYYY

DD          MM          YYYY
(\d{2}) /   (\d{2}) /   (\d{4})
group 1     group 2     group 3
$1          $2          $3

You can also use the groups to convert the date into another numeric format, such as MM/DD/YYYY. To do so, exchange the groups in the replacement:

MM          DD          YYYY
group 2     group 1     group 3
$2          $1          $3

Note: A group is whatever is enclosed in parentheses, such as (\d{2}). In a replacement, you refer to a group with the dollar symbol followed by the number of the group, for example $1.
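The date example can be tried out with a short script. This is a sketch in Python, chosen only for illustration. Note the one difference in notation: Python writes \1, \2, \3 in the replacement where the .NET syntax used by memoQ writes $1, $2, $3. The function name is made up for this example.

```python
import re

# Three groups: day, month, year (DD/MM/YYYY)
date_pattern = re.compile(r"(\d{2})/(\d{2})/(\d{4})")

def swap_day_and_month(text):
    # Python's \2/\1/\3 corresponds to $2/$1/$3 in the tutorial's notation
    return date_pattern.sub(r"\2/\1/\3", text)

print(swap_day_and_month("Due on 31/01/2012."))
```

The pattern itself stays the same. Only the order of the groups in the replacement changes.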


2 memoQ and regular expressions

In memoQ, you can adjust light resources (auto-translation rules, segmentation rules) to enhance the default regular expressions or to create new ones. If you are an advanced user of regular expressions, note that memoQ uses the standard .NET implementation of regex.

You can also use the cascading filter and Regex text filter functionality in memoQ to improve file import, and you can use the Regex Tagger to tag code after a document has been imported. The following sections describe how you can edit, create, and use regular expressions in memoQ.

Please also see the Kilgray webinar on regular expressions on the Kilgray website > Resource Center:

http://kilgray.com/webinars/regex-masses-english-011749

2.1 Auto-translation rules

In memoQ, you can define regular expressions. Open memoQ, go to Tools > Resource console. Select

Auto-translation rules in the Resource categories on the left.


memoQ offers you default regular expressions for each language. You can create new expressions by clicking the Create new command link below the list, or you can clone an existing set. To clone a default set of auto-translatables, click the Clone command link below the Auto-translation rules list. You can also import auto-translation rules created by another memoQ user: click the Import new command link below the Auto-translation rules list, browse to the MQRES file that contains the regular expressions, and import the file.

Note: MQRES is a proprietary memoQ file format for importing and exporting resources. It enables you to exchange light resources such as auto-translation rules with other memoQ users.

Select an auto-translation rule set, and click the Edit link. The Edit auto-translation rule set dialog appears:


1. Enter your rule in Auto-translation rules.

2. Click the Add button to add your auto-translation rule. To change an existing rule, edit it, then click the Change button. Click the Delete button to delete a rule.

3. When you enter a rule, you also need to enter, in Replace order rules, the replacement for the text that your expression matches.

In the example above:

• Figuren is replaced by Figs.

• ([\d]{1,4}) is the first number to be replaced. It is made of 1 to 4 digits and corresponds to group $2 in the Replace order rules.

• (bis) is replaced by to.

• ([\d]{1,4}) is the second number to be replaced. It is also made of 1 to 4 digits and corresponds to group $4 in the Replace order rules.

4. Click OK to close the Edit auto-translation rule set dialog.
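The rule from the example can be approximated outside memoQ. The sketch below is in Python for illustration only. The complete rule is not reproduced in this text, so the pattern and the replacement are assumptions pieced together from the four bullet points, with Figuren as group 1 and bis as group 3. Python writes \g<2> where memoQ's Replace order rules write $2.

```python
import re

# Assumed reconstruction of the example rule: four groups in a row
rule = re.compile(r"(Figuren) ([\d]{1,4}) (bis) ([\d]{1,4})")

# Assumed replacement: fixed text plus groups 2 and 4 (the two numbers)
replace_order = r"Figs. \g<2> to \g<4>"

def auto_translate(source_text):
    return rule.sub(replace_order, source_text)

print(auto_translate("Siehe Figuren 3 bis 12"))
```

Groups 1 and 3 are matched but not copied. The replacement supplies their translations as fixed text.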

Note: If you want to automatically replace some words or expressions with their translated equivalents, you can enter custom translation pairs in Translation pairs. Translation pairs can be used, for example, for translating the names of months, days, or measurement units.

Further information on auto-translation rule sets can be found in the memoQ Help: Functions and Settings > Edit resources > Light resources > Edit auto-translation rules. Further information on regular expressions can be found in the memoQ Help as well: Functions and Settings > Regular Expressions and Tagging.

2.1.1 Using Auto-translatables in the QA check

You can use the QA check to verify that the auto-translation rules are correctly applied in your target text.

In Project home, go to Settings > QA settings. Select the QA settings for this project, and click the Edit command link. The Edit QA settings dialog appears:

In the Segments and terms tab, check the Check auto-translatables check box. memoQ now checks if

the auto-translation rules for the target text are correctly applied when you run the QA.

2.2 Segmentation rules

In memoQ, you can define segmentation rules. Open memoQ, and go to Tools > Resource console. Select Segmentation rules in the Resource categories on the left. Select a segmentation rule set, and click the Edit command link. The Edit segmentation rule set dialog appears:


Note: You can also import or export SRX files. SRX (Segmentation Rule Exchange) is a format for exchanging segmentation rules between different tools. This enables you to use the same segmentation rules in memoQ as well as in other tools.

In the Segmentation tab, you have 2 lists: Rules and Exceptions. You can add, change, or delete segmentation rules. Click the Preview button to display a preview of the segmentation rule set which you want to apply.

In the Custom lists tab, you can add, change, or delete Custom lists. Select a custom list to see the

corresponding elements in the List items displayed:


Select, for instance, #abbr_long# in the general abbreviation list. You can see the abbreviations of this list in the List items. Items in #abbr_long# do not have to be preceded by whitespace.

Items in #abbr_short#, in contrast, need to be preceded by whitespace. For example, if you have eg. as an abbreviation, you should include it in #abbr_short#, not in #abbr_long#. If you put it in #abbr_long#, memoQ also treats the end of "beg." as this abbreviation and does not start a new sentence after it.
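The difference between the two lists can be illustrated with a small sketch. It is written in Python and it is not how memoQ implements segmentation: the function and its parameters are hypothetical and only mimic the "preceded by whitespace" condition described above.

```python
import re

def ends_with_abbreviation(text, abbreviations, needs_whitespace):
    """Return True if text ends with one of the abbreviations.

    needs_whitespace=True mimics #abbr_short#: the abbreviation only counts
    when it follows whitespace (or starts the text).
    needs_whitespace=False mimics #abbr_long#: any ending counts.
    """
    prefix = r"(^|\s)" if needs_whitespace else ""
    for abbreviation in abbreviations:
        if re.search(prefix + re.escape(abbreviation) + r"$", text):
            return True
    return False

# "eg." handled like an #abbr_long# item: "beg." is mistaken for it
print(ends_with_abbreviation("They beg.", ["eg."], needs_whitespace=False))
# "eg." handled like an #abbr_short# item: "beg." is left alone
print(ends_with_abbreviation("They beg.", ["eg."], needs_whitespace=True))
```

When the check returns True at the end of a sentence, no break would be made there, which is the unwanted behavior for "beg.".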

Click the Preview button to display a preview of the segmentation rule you created or changed.

IMPORTANT: Segmentation rules must be selected before you import documents into a memoQ project. To assign segmentation rules, create a project, but do not import documents yet. You can click Finish in the New project wizard after the first dialog. In Project home, go to Settings > Segmentation rules, and check the check box of the segmentation rules you want to apply during document import. Then start the document import.

2.3 Cascading filters

You can use cascading filters to import a document. memoQ detects the default filter based on the file extension, but you can also select a second filter in a filter chain. A filter chain, or cascading filter, is a document filter configuration where memoQ runs a second filter after the default document filter when it imports a document.

This is useful when the imported text contains further markup. For example, cells in an Excel workbook may contain HTML markup, and you can turn that into sensible inline tags by applying the HTML filter or the Regex tagger after the Excel document filter.
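The idea of a filter chain can be sketched as two functions run one after the other. This Python sketch is only an illustration and does not use memoQ or read real Excel files: the list of cell texts stands in for the output of the first filter, and the function names and the placeholder notation {1}, {2} are made up.

```python
import re

# A simple description of an HTML tag, used by the second stage
HTML_TAG = re.compile(r"</?[A-Za-z][^>]*>")

def first_filter(workbook_cells):
    # Stands in for the default document filter: keep non-empty cell texts
    return [cell for cell in workbook_cells if cell.strip()]

def second_filter(cell_text):
    # Runs on the text produced by the first filter:
    # every HTML tag becomes a numbered placeholder
    tags = []

    def to_placeholder(match):
        tags.append(match.group(0))
        return "{%d}" % len(tags)

    return HTML_TAG.sub(to_placeholder, cell_text), tags

cells = ["Click <b>Save</b> to continue.", "   ", "No markup here"]
imported = [second_filter(text) for text in first_filter(cells)]
print(imported)
```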

How to add a cascading filter:

1. In the Translations pane of Project home, click the Import with options... link, then select Change filter and configuration to display the Document import settings dialog.

2. In the Document import settings dialog, click Add cascading filter....

3. In the Filter drop-down list, choose one of the available filters. Choose the one that is most appropriate for the text that is imported by the first filter. In most cases, you will use the XML filter, the HTML filter, or the Regex tagger here.

4. In the Filter configuration drop-down list, choose one of the filter configurations available for the selected document type or filter. When you configure a filter, you can save the settings as a filter configuration resource to re-use or share them. To load a saved configuration, click the folder icon to display the Load filter configuration dialog.

A cascading filter selection could look like the following:


5. Click OK to return to the Document import options, and click OK to import the document using the specified cascading filter.

2.4 The Regex text filter

Using a Regex text filter, memoQ can process structured text files and extract translatable content from these files. memoQ can also extract context and comments for the imported content. You control the Regex text filter mainly through regular expressions.

The Regex text filter processes structured text files in three steps:

1. It breaks up the files into paragraphs.

2. It extracts the paragraphs that contain translatable text.

3. From the extracted paragraphs, it extracts translatable text, and optionally context and comments.

The options of the filter follow these three steps:

1. Specify how paragraphs are separated.

2. Specify what an imported paragraph should look like.

3. List the parts that really need to be translated.

This procedure requires writing up regular expressions, and this is something you can do through trial

and error. Before you proceed with importing the file, you can always click the Preview tab to see

what will be imported.
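The three steps can be tried out on a small sample before you configure the filter. The sketch below is in Python and only imitates the procedure. The file format (an optional comment line followed by key=value) and all names in the code are assumptions made for this example, not memoQ settings.

```python
import re

# Hypothetical structured text file: optional "# comment" line, then key=value
SAMPLE = """# Greeting shown at startup
welcome.title=Welcome to the program

# Button label
button.ok=OK

build.number=1234
"""

# Step 1: paragraphs are separated by blank lines
PARAGRAPH_SEPARATOR = re.compile(r"\n\s*\n")

# Steps 2 and 3: what an importable paragraph looks like, with named
# groups for the comment, the context (key) and the translatable text.
# The text must contain at least one letter, so "1234" is skipped.
TRANSLATABLE_PARAGRAPH = re.compile(
    r"(?:#\s*(?P<comment>.*)\n)?"
    r"(?P<context>[\w.]+)="
    r"(?P<text>.*[A-Za-z].*)"
)

def extract_units(file_content):
    units = []
    for paragraph in PARAGRAPH_SEPARATOR.split(file_content.strip()):
        match = TRANSLATABLE_PARAGRAPH.fullmatch(paragraph)
        if match:
            units.append(match.groupdict())
    return units

units = extract_units(SAMPLE)
for unit in units:
    print(unit)
```

Changing one expression at a time and checking the result is the same trial-and-error approach that the Preview tab supports in memoQ.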

You have 2 options to configure a Regex-based text filter:

Option 1:


1. In the Translations pane of Project home, click the Import with options... command link

below the document list.

2. In the Open dialog, select All files from the Files of type drop-down list. Click Open to

proceed: the Document import settings dialog appears.

3. From the Filter drop-down list, choose Regex text filter.

Option 2:

1. From the Tools menu, choose Resource console > Filter configurations.

2. Click the Create new command link, and choose Regex text filter from the Filters drop-down list.

3. Enter a name for the filter, and click OK. The filter is now listed in the Filters list.

4. Select the filter you have just created, and click the Edit command link.

Both options open the Edit filter configuration dialog:

Further information on how to use the different tabs to configure this filter can be found in the

memoQ Help.


2.5 The internal Regex Tagger

The internal Regex Tagger runs after a document has been imported. You can use the Regex Tagger to create tags out of non-translatable elements in any text.

When you are working on a document in the translation editor, choose Run Regex Tagger... from the

Format menu. The Tag current document dialog appears:

You can set up multiple rules in a single regular expression filter configuration. These are listed in the

top box of the Rules section.

To add a pattern to turn into a tag:

1. First, type a regular expression in the Regular expression text box. This can be a simple expression. For example, if you want to replace the word 'memoQ' with an empty inline tag, simply type 'memoQ' in the Regular expression text box.

2. You can also enter more complex expressions where a single pattern can represent several different character sequences. If you click the Pattern... link next to the text box, you get a menu of the most common commands.

For example, the regular expression '<[^/].*?>' matches text that starts with the '<' character, continues with one character that is not '/', followed by the shortest possible sequence of any characters, and ends in a '>' character. In short, text that matches this pattern looks like an XML <tag> that is not a closing tag.


3. After you type the regular expression, choose the type of tag you want to see in place of the text. You can choose an opening tag, a closing tag, or an empty tag. These correspond to the types of tags commonly used in XML markup.

Note: If you check the Required check box, memoQ will indicate a tagging error instead of a

simple warning if the corresponding tag is not copied to the target text.

4. In the Display text text box, you can specify the label that memoQ writes inside the tag. This is called a replacement rule, and you also use these in auto-translation rules. You can write any text here, but you can also use the pre-defined $0 expression, which stands for the whole text matched by the pattern.

Note: If the regular expression contains groups, you can use $1, $2 etc. to refer to the first,

second etc. group in the replacement rule. You can choose from available options if you click

the Pattern... link next to the Display text box.

5. After you have filled in the Display text field, click Add to add the rule to the list. To edit an existing rule, click the rule in the list, make your changes, and click Change. To remove a rule from the list, click the rule, and click Delete.

6. Click Run tagger now to close the dialog, and match your patterns against the text in the

active document. Click Cancel to close the dialog without making changes.

Note: The Run tagger now button does not save the rules. If you want to re-use the rule set for later tagging, click the Save icon at the top before leaving the Tag current document dialog.
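To get a feeling for how such rules behave, you can imitate them outside memoQ. The following Python sketch is only an approximation under stated assumptions: the rule format, the {type:label} notation for tags, and the second and third rule are invented for this example. Only the first expression, <[^/].*?>, comes from this tutorial. The display text uses $0 and $1 as described above.

```python
import re

# Hypothetical rules: (regular expression, tag type, display text)
RULES = [
    (r"<[^/].*?>", "open", "$0"),   # expression from the tutorial
    (r"</.*?>", "close", "$0"),     # assumed rule for closing tags
    (r"%(\d)", "empty", "var$1"),   # assumed rule for placeholders like %1
]

def display_label(match, display_text):
    # Replace $0, $1, ... with the whole match or the numbered group
    return re.sub(r"\$(\d+)",
                  lambda ref: match.group(int(ref.group(1))),
                  display_text)

def tag_text(text, rules):
    found = []
    for pattern, tag_type, display_text in rules:
        for match in re.finditer(pattern, text):
            label = display_label(match, display_text)
            found.append((match.start(), match.end(), tag_type, label))
    found.sort(key=lambda item: item[0])

    pieces, position = [], 0
    for start, end, tag_type, label in found:
        if start < position:
            continue  # skip a match that overlaps an earlier tag
        pieces.append(text[position:start])
        pieces.append("{%s:%s}" % (tag_type, label))
        position = end
    pieces.append(text[position:])
    return "".join(pieces)

print(tag_text("Press <b>Enter</b> to open %1.", RULES))
```

The third rule shows a group in use: the digit after % is group 1, so the label for %1 becomes var1.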