Go Global Fearless
Conquer the world by Internationalizing your product!
D Venkata Rajesh, Principal QA Engineer, Progress Software
© 2014 Progress Software Corporation. All rights reserved.
Agenda
Introduction: I18N, L10N
All about I18N and L10N: terminology
Unicode: a deep dive into the details
Localization testing tips
Why L10N?
The top 10 global internet websites have 81% of their user base outside America
92% of the top 25 grossing iPhone apps in China use Chinese names
80% of the top 25 grossing Android apps in Japan use Japanese names
41% of total global app revenue came from Asia, while North America generated 31% and Europe 23%
72.4% of global consumers indicated that they prefer to use their native language when shopping online
Sources: KPCB, Common Sense Advisory, App Annie 2014
Localization in the era of mobile, cloud, and social media
I18N and L10N
Internationalization is the process of designing a software application so that it can be adapted to various languages and regions without changes to the source code
Localization is the process of adapting a software application originally designed for a domestic market so that it can be released in foreign markets, for example by translating its text and adjusting formats and graphics for each target locale
Internationalization process
Source code: hard-coded contents
Resource bundles: move the contents to a properties file per locale
MessagesBundle_en_US.properties, MessagesBundle_fr_FR.properties
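The MessagesBundle_*.properties files follow Java's resource-bundle naming (base name, then language and country), so the code side of this step can be sketched in Java. This is a minimal sketch under stated assumptions: the two files are inlined as strings and parsed with PropertyResourceBundle so it runs without files on the classpath, and the keys, texts, and en_US fallback are illustrative. In a real project the files would sit on the classpath and ResourceBundle.getBundle("MessagesBundle", locale) would select the matching one. Requires Java 9 or later for Map.of.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Map;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class MessagesBundleDemo {
    // Illustrative stand-ins for MessagesBundle_en_US.properties and
    // MessagesBundle_fr_FR.properties; the keys and texts are assumptions.
    private static final Map<String, String> FILES = Map.of(
            "en_US", "greetings = Hello.\nfarewell = Goodbye.\n",
            "fr_FR", "greetings = Bonjour.\nfarewell = Au revoir.\n");

    // Parses the properties text for a locale, falling back to en_US if none exists.
    public static ResourceBundle load(Locale locale) {
        String text = FILES.getOrDefault(locale.toString(), FILES.get("en_US"));
        try {
            return new PropertyResourceBundle(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Application code looks strings up by key; no user-visible text is hard-coded.
        for (Locale locale : new Locale[] {Locale.US, Locale.FRANCE, Locale.JAPAN}) {
            System.out.println(locale + ": " + load(locale).getString("greetings"));
        }
    }
}
```

Adding a language then means adding one more properties file, not touching the code that calls getString.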
Evolution: Character sets, Code pages, Encoding
From text to encoding
Sample text: ÀéçЉД文字निخ
À → code point U+00C0 → UTF-8 bytes 11000011 10000000
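The À example can be checked in a few lines of Java. This sketch (class and method names are illustrative) prints the code point of a character and its UTF-8 bytes in binary; for À it prints U+00C0 -> 11000011 10000000, matching the bytes listed above.

```java
import java.nio.charset.StandardCharsets;

public class TextToBytes {
    // Code point of the first character, in U+XXXX notation.
    public static String codePoint(String s) {
        return String.format("U+%04X", s.codePointAt(0));
    }

    // UTF-8 encoding of the string, each byte written as 8 binary digits.
    public static String utf8Bits(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (out.length() > 0) {
                out.append(' ');
            }
            // b & 0xFF turns the signed byte into 0..255 before formatting.
            out.append(String.format("%8s", Integer.toBinaryString(b & 0xFF)).replace(' ', '0'));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "\u00C0"; // LATIN CAPITAL LETTER A WITH GRAVE
        System.out.println(codePoint(text) + " -> " + utf8Bits(text));
    }
}
```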
Code Pages
IBM code pages
ISO code pages
Microsoft code pages
ISO code pages
Code page      Name
ISO 8859-1     Latin-1
ISO 8859-2     Latin-2
ISO 8859-3     Latin-3
ISO 8859-4     Latin-4
ISO 8859-5     Cyrillic
ISO 8859-6     Arabic
ISO 8859-7     Greek
ISO 8859-8     Hebrew
ISO 8859-9     Latin-5
ISO 8859-10    Latin-6
ISO 8859-11    Thai
ISO 8859-13    Latin-7
ISO 8859-14    Latin-8
ISO 8859-15    Latin-9
ISO 8859-16    Latin-10
Microsoft code pages
Code page      Name
CP 1250        Latin 2
CP 1251        Cyrillic
CP 1252        Latin 1
CP 1253        Greek
CP 1254        Latin 5
CP 1255        Hebrew
CP 1256        Arabic
CP 1257        Baltic
CP 1258        Viet Nam
CP 874         Thai
IBM code pages
Code page      Name
37             USA/Canada - CECP
256            International #1
259            Symbols, Set 7
273            Germany F.R./Austria - CECP
274            Old Belgium Code Page
275            Brazil - CECP
276            Canada (French) - 94
278            Finland, Sweden - CECP
280            Italy - CECP
281            Japan (Latin) - CECP
282            Portugal - CECP
284            Spain/Latin America - CECP
285            United Kingdom - CECP
850            Personal Computer - Multilingual Page
Common Encoding Problems
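Tofu: hollow boxes, shown when the font has no glyph for a character
Mojibake: garbage characters, produced when text is decoded with a different encoding than the one it was written in
Question marks: the conversion is not supported because the target code page has no such character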
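Mojibake and question marks are easy to reproduce in Java; tofu is not, because it depends on the installed fonts rather than on the bytes. This sketch (illustrative names) decodes UTF-8 bytes as ISO 8859-1 to produce mojibake, and converts CJK text to ISO 8859-1, which cannot represent it, to produce question marks: String.getBytes substitutes the charset's replacement byte for characters it cannot map.

```java
import java.nio.charset.StandardCharsets;

public class EncodingProblems {
    // Mojibake: bytes written as UTF-8 but read back as ISO 8859-1 (Latin-1).
    public static String mojibake(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Question marks: converting to a code page that lacks the character.
    // For ISO 8859-1 the replacement byte is '?'.
    public static String lossyRoundTrip(String original) {
        byte[] latin1 = original.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(mojibake("caf\u00E9"));          // e-acute turns into two characters
        System.out.println(lossyRoundTrip("\u6587\u5B57")); // two CJK ideographs turn into "??"
    }
}
```

The first line prints "cafÃ©": the two UTF-8 bytes of é (0xC3 0xA9) are read as the two Latin-1 characters Ã and ©.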
Unicode
A deep dive into normalization, compatibility, and replacement characters
Unicode: encodes the world’s scripts
Code space of 1,114,112 code points (U+0000 to U+10FFFF), about 1.1 million
Currently encodes 120,737 characters
Currently allocated code points: 264,256
Notation: U+0041 is the code point of "A", written in hexadecimal
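Plane       Allocated code points   Assigned characters
0  BMP      65,392                  55,181
1  SMP      14,000                  11,833
2  SIP      53,424                  53,386
3  TIP      16,672                  799
14 SSP      368                     337
15 PUA-A    65,536                  (private use)
16 PUA-B    65,536                  (private use)
Totals      264,256                 120,737
BMP = Basic Multilingual Plane, SMP = Supplementary Multilingual Plane, SIP = Supplementary Ideographic Plane, TIP = Tertiary Ideographic Plane, SSP = Supplementary Special-purpose Plane, PUA = Supplementary Private Use Area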
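Because each plane covers 65,536 (0x10000) code points, the plane of a code point is its value shifted right by 16 bits. A minimal Java sketch; the class name and sample code points are illustrative choices, not taken from the table.

```java
public class UnicodePlanes {
    // Each plane spans 0x10000 code points, so the plane number is the code point >>> 16.
    public static int plane(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a Unicode code point: " + codePoint);
        }
        return codePoint >>> 16;
    }

    public static void main(String[] args) {
        // Latin A, a CJK ideograph, an emoji, a CJK Extension B ideograph, a tag character.
        int[] samples = {0x0041, 0x6587, 0x1F600, 0x20000, 0xE0001};
        for (int cp : samples) {
            System.out.printf("U+%04X is in plane %d%n", cp, plane(cp));
        }
    }
}
```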
Four Normalization Forms
Form D (NFD): canonical decomposition
Form C (NFC): canonical decomposition followed by canonical composition
Form KD (NFKD): compatibility decomposition
Form KC (NFKC): compatibility decomposition followed by canonical composition
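Ways to represent Ǻ (U+01FA):
Canonically equivalent: U+00C5 U+0301; U+212B U+0301; U+0041 U+030A U+0301
Not canonically equivalent: U+00C1 U+030A; U+0041 U+0301 U+030A (the acute accent and the ring above both have combining class 230, so their order is significant and these sequences stack the marks the other way round)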
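Java's java.text.Normalizer can confirm which spellings are canonically equivalent: equivalent sequences produce identical NFC and NFD output. This sketch (class name illustrative) prints each spelling with its NFC and NFD forms, then applies NFKC to the ﬁ ligature (U+FB01), a compatibility character that Form KC folds to plain "fi".

```java
import java.text.Normalizer;

public class NormalizationDemo {
    // Renders a string as space-separated U+XXXX code points.
    public static String codePoints(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String[] spellings = {
            "\u01FA",             // precomposed
            "\u00C5\u0301",       // A with ring + combining acute
            "\u212B\u0301",       // ANGSTROM SIGN + combining acute
            "\u0041\u030A\u0301", // A + combining ring + combining acute
            "\u00C1\u030A",       // A with acute + combining ring: marks in the other order
        };
        for (String s : spellings) {
            System.out.println(codePoints(s)
                    + "  NFC=" + codePoints(Normalizer.normalize(s, Normalizer.Form.NFC))
                    + "  NFD=" + codePoints(Normalizer.normalize(s, Normalizer.Form.NFD)));
        }
        // Compatibility normalization also folds look-alike characters such as ligatures.
        System.out.println(codePoints(Normalizer.normalize("\uFB01", Normalizer.Form.NFKC)));
    }
}
```

The first four spellings all normalize to U+01FA under NFC; the last one stays U+00C1 U+030A, which is why comparing strings without normalizing them first can give false mismatches (or, across the two groups, correct ones).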
Unicode Encoding Forms
UTF-32
• Uses 32-bit code units
• All characters are the same width
UTF-16
• Uses 16-bit code units
• BMP characters use one 16-bit code unit
• Supplementary characters use two special 16-bit code units: a “surrogate pair”
UTF-8
• Uses 8-bit code units (bytes!)
• It’s a multi-byte encoding!
• Characters use between 1 and 4 bytes
• ASCII text is unchanged in UTF-8 (one byte per character)
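The difference between the three encoding forms shows up as byte counts. This Java sketch (illustrative names) uses UTF-16BE so that no byte-order mark is counted, and computes UTF-32 as four bytes per code point because UTF-32 is not among the charsets guaranteed by StandardCharsets. It also prints String.length(), which counts UTF-16 code units, so the emoji's surrogate pair counts as 2.

```java
import java.nio.charset.StandardCharsets;

public class EncodingForms {
    public static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // UTF-16BE rather than UTF-16 so that no byte-order mark is counted.
    public static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // UTF-32 always uses one 4-byte code unit per code point.
    public static int utf32Bytes(String s) {
        return 4 * s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // ASCII letter, Latin-1 letter, CJK ideograph (BMP), emoji U+1F600 (supplementary plane)
        String[] samples = {"A", "\u00E9", "\u6587", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.printf("U+%04X  UTF-8: %d  UTF-16: %d  UTF-32: %d  String.length(): %d%n",
                    s.codePointAt(0), utf8Bytes(s), utf16Bytes(s), utf32Bytes(s), s.length());
        }
    }
}
```

UTF-8 grows from 1 to 4 bytes across the samples, UTF-16 jumps from 2 to 4 bytes only for the supplementary character, and UTF-32 stays at 4 bytes throughout.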
Localization testing tips
Case study: a website + 10 languages + 4 browsers + 20 test cases = 10 × 4 × 20 = 800 test runs
L10N TESTING (LOCALIZATION: L + 10 letters + N)
Localization testing UI checks
Layout
• Text truncation
• Control truncation
• Misalignment
• Overlapping
• Tabbing order
• Oversized dialogs
• Different layout in general
Hot keys
• Duplicated hotkeys
• Missing hotkey
• Inappropriate hotkey
Text
• Untranslated text
• Mistranslated text
• Unexpected text
• Inconsistent translation
• Technical inaccuracy
• Double space after full stop
• Wrong alphabetical order
• Wrong date/time format
• Corrupt characters
Graphics
• Missing graphics
• Different graphics
• Untranslated icons
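Hotkey checks lend themselves to automation. The sketch below assumes the Windows-style convention in which & marks the hotkey letter in a label (for example &File) and && stands for a literal ampersand; the class name and the sample French menu are hypothetical. It flags labels without a hotkey and labels that reuse a hotkey already taken in the same menu; inappropriate hotkeys still need a human reviewer. Requires Java 9 or later for List.of.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HotkeyCheck {
    // Returns the lower-cased hotkey letter of a label, or null if it has none.
    // Assumes '&' marks the hotkey and "&&" is an escaped literal ampersand.
    public static Character hotkey(String label) {
        int i = label.indexOf('&');
        while (i >= 0 && i + 1 < label.length()) {
            char next = label.charAt(i + 1);
            if (next != '&') {
                return Character.toLowerCase(next);
            }
            i = label.indexOf('&', i + 2); // skip the "&&" pair
        }
        return null;
    }

    // Reports missing and duplicated hotkeys within one menu or dialog.
    public static List<String> problems(List<String> labels) {
        List<String> found = new ArrayList<>();
        Map<Character, String> seen = new HashMap<>();
        for (String label : labels) {
            Character key = hotkey(label);
            if (key == null) {
                found.add("missing hotkey: " + label);
            } else if (seen.containsKey(key)) {
                found.add("duplicated hotkey '" + key + "': " + seen.get(key) + " / " + label);
            } else {
                seen.put(key, label);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical translated menu: "&Aide" and "&Annuler" collide, one label lost its marker.
        List<String> menu = List.of("&Nouveau", "&Ouvrir", "&Enregistrer", "Enregistrer sous", "&Aide", "&Annuler");
        problems(menu).forEach(System.out::println);
    }
}
```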
Pseudo-localization testing
A way to evaluate a website or software product’s readiness for the localization process
Considered a part of the internationalization testing process
1. Identify hard-coded strings that should be translatable
2. Find strings in the source files that shouldn’t be translated
3. Identify design and layout issues that will affect the software or site when it is translated
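A pseudo-localizer is small enough to sketch. This Java version (class name illustrative, Java 9 or later for Map.of) applies three common tricks, and the specific choices are assumptions rather than a standard: accented look-alike vowels expose strings that bypass the resource bundles (they show up unaccented) and fonts that cannot render non-ASCII text; tilde padding of about 30% simulates text expansion; square brackets make truncation visible at either end.

```java
import java.util.Map;

public class PseudoLocalizer {
    // Accented look-alikes for ASCII vowels (an illustrative subset).
    private static final Map<Character, Character> ACCENTS = Map.of(
            'a', '\u00E1', 'e', '\u00E9', 'i', '\u00ED', 'o', '\u00F3', 'u', '\u00FA',
            'A', '\u00C1', 'E', '\u00C9', 'I', '\u00CD', 'O', '\u00D3', 'U', '\u00DA');

    // Assumed expansion allowance: pad by 30% of the source length, rounded up.
    private static final int EXPANSION_PERCENT = 30;

    public static String pseudo(String source) {
        StringBuilder out = new StringBuilder("[");
        for (char c : source.toCharArray()) {
            out.append(ACCENTS.getOrDefault(c, c).charValue());
        }
        int padding = (source.length() * EXPANSION_PERCENT + 99) / 100;
        for (int i = 0; i < padding; i++) {
            out.append('~');
        }
        return out.append(']').toString();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Save", "Settings", "Cancel"}) {
            System.out.println(s + " -> " + pseudo(s));
        }
    }
}
```

Running the pseudo-localized build and looking for plain English text, missing closing brackets, or broken accented characters covers the three goals listed above.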
Regional differences
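Number and date formats are among the most visible regional differences. A minimal Java sketch using the JDK's own locale data (class and method names illustrative) formats the same number and the same date, 4 March 2014, for several locales. Exact output depends on the JDK's locale data version, but the US short date puts the month first while the UK and German ones put the day first, and German uses the period for grouping and the comma as the decimal separator.

```java
import java.text.NumberFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

public class RegionalFormats {
    // Grouping and decimal separators come from the locale's data.
    public static String number(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // The locale decides the field order and separators of the short date pattern.
    public static String shortDate(LocalDate date, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT).withLocale(locale).format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2014, 3, 4); // 4 March 2014
        for (Locale locale : new Locale[] {Locale.US, Locale.UK, Locale.GERMANY, Locale.JAPAN}) {
            System.out.println(locale + "  " + number(1234567.89, locale) + "  " + shortDate(date, locale));
        }
    }
}
```

A date such as 03/04 is ambiguous across these locales, which is why "wrong date/time format" belongs on the localization checklist.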
More localization testing
Aspect: Challenge
Limitation of screen size: Character counts and fonts differ across languages, so translated text may not fit the space designed for the original.
Direction: Some languages are written left to right, whereas others (for example Arabic and Hebrew) are written right to left.
Spelling rules and upper/lower case conversion: Rules differ by locale.
Regional standards: Applications may have to support not only national languages but also regional languages.
Data handling: Data storage and processing mechanisms differ, as do encodings and code pages.
Context and special characters: Special characters need careful handling in translation, because the same character can carry different meanings in different languages.
Collation and sorting: Sorting and collation rules differ between languages.
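Collation and case conversion can both be checked with the JDK. This sketch (illustrative names, Java 9 or later for List.of) sorts the same words by raw String.compareTo order and by an English java.text.Collator, then uppercases "i" with the root locale and with Turkish, where it becomes the dotted capital İ (U+0130) rather than I.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LocaleRules {
    // Sorts a copy of the list using the collation rules of the given locale.
    public static List<String> collate(List<String> words, Locale locale) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(Collator.getInstance(locale));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = List.of("zebra", "\u00E9clair", "Banana", "apple");

        // String.compareTo compares UTF-16 code units: uppercase sorts before
        // lowercase, and accented letters land after z.
        List<String> naive = new ArrayList<>(words);
        naive.sort(null);
        System.out.println("compareTo order:   " + naive);
        System.out.println("English collation: " + collate(words, Locale.ENGLISH));

        // Case mapping is locale-sensitive: Turkish uppercases i to dotted capital I (U+0130).
        System.out.println("i".toUpperCase(Locale.ROOT));
        System.out.println("i".toUpperCase(Locale.forLanguageTag("tr")));
    }
}
```

The compareTo order is [Banana, apple, zebra, éclair], while the collator gives [apple, Banana, éclair, zebra]; a test that expects the first order is testing code unit values, not alphabetical order.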