
DEV-23: Global Applications and Code Pages Jordi Sastre Application Architect PSC IT.

Dec 23, 2015


DEV-23: Global Applications and Code Pages

Jordi Sastre, Application Architect

PSC IT

© 2007 Progress Software Corporation

Introduction

Global applications need to deal with several languages, countries and time zones

Do’s and don'ts about globalization using OpenEdge® technology

Based on real experience from an IT department

Not a complete review of OpenEdge features


Agenda

• Code Pages Overview
• OpenEdge Settings
• Common Mistakes
• Hints & Tips
• Linguistic Sorting and Collation
• Time Zones
• Summary
• Questions


Code Pages Overview

A code page is a table that maps characters to numbers (code points)

ASCII was created in 1963 to encode 128 characters (code points 0 to 127) based on the English alphabet

ASCII = “American Standard Code for Information Interchange”

EBCDIC = “Extended Binary Coded Decimal Interchange Code”

8-bit code pages appeared for other languages, encoding up to 256 characters


Code Pages Overview

All code pages include the ASCII encoding in the first 128 code points (0 to 127), except EBCDIC

A single code page does not contain all characters for all languages, except Unicode

A character may have different code points in different code pages

Data may become corrupted when transferred between two different code pages
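The last two points can be reproduced outside OpenEdge. A minimal sketch in Python (used here only as a neutral illustration, not as part of the original deck; the codec names `latin-1` and `cp1250` are Python's names for ISO8859-1 and 1250):

```python
# One stored byte, two code pages: the byte never changes,
# only the table used to interpret it.
raw = bytes([0xE8])

as_western = raw.decode("latin-1")   # ISO8859-1 reading
as_central = raw.decode("cp1250")    # Windows 1250 reading

print(as_western, as_central)
```

The same byte prints as two different letters, which is exactly what happens when data written under one code page is read under another.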


Data Corruption

France (ISO8859-1): “è” = E8

Czech Republic (1250): “č” = E8


Data Corruption

English uses the 128 codes that are common to all code pages, including Unicode

Problems may occur when:
• Handling non-English data
• Using platforms with non-English settings
• Pasting MS Office text, even in English


8-bit Code Pages

ISO8859-1 and ISO8859-2 were defined by ISO and mainly used on Unix systems.

1250 and 1252 were defined by Microsoft and used on MS Windows.

IBM437, IBM850 and IBM852 were defined by IBM and used on PC-DOS/MS-DOS.


8-bit Code Pages

ISO8859-1, IBM850 and 1252 are used for Western European languages:
• Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Portuguese, Spanish, Swedish, etc.

ISO8859-2, IBM852 and 1250 are used for Central European languages:
• Czech, Hungarian, Polish, German, etc.

IBM437 is mainly used for English, although it contains some extra characters


8-bit Code Pages

Examples of character encoding:

Char  ISO8859-1  ISO8859-2  1252  1250  IBM437  IBM850  IBM852
a     61         61         61    61    61      61      61
á     E1         E1         E1    E1    A0      A0      A0
È     C8         n/a        C8    n/a   n/a     D4      n/a
Č     n/a        C8         n/a   C8    n/a     n/a     AC
“     n/a        n/a        93    93    n/a     n/a     n/a
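Rows like these can be regenerated for any character. The sketch below is an illustration in Python, not part of the original deck; the mapping from the deck's code page names to Python codec names is an assumption of this example, and `encoding_row` is a hypothetical helper.

```python
# Code page names as used in the deck, mapped to Python codec names.
CODE_PAGES = {
    "ISO8859-1": "iso8859-1",
    "ISO8859-2": "iso8859-2",
    "1252": "cp1252",
    "1250": "cp1250",
    "IBM437": "cp437",
    "IBM850": "cp850",
    "IBM852": "cp852",
}

def encoding_row(char):
    """Return the hex encoding of `char` in each code page, or 'n/a'."""
    row = {}
    for name, codec in CODE_PAGES.items():
        try:
            row[name] = char.encode(codec).hex().upper()
        except UnicodeEncodeError:
            row[name] = "n/a"   # the code page has no slot for this character
    return row

for ch in "a\u00e1\u00c8\u010c\u201c":
    print(ch, encoding_row(ch))
```

A character that is missing from a code page raises an encoding error, which is the programmatic equivalent of an "n/a" cell.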


8-bit Code Pages

Where to find code page tables:

• 10.1B Internationalizing Applications manual (IBM850 and ISO8859-1)
• http://www.microsoft.com/globaldev/reference/cphome.mspx
• http://www-03.ibm.com/servers/eserver/iseries/software/globalization/codepages.html
• http://en.wikipedia.org
• http://www.fileformat.info/info/charset/index.htm


Unicode

What is Unicode?

“Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.”

http://www.unicode.org/standard/WhatIsUnicode.html

ISO/IEC 10646

It covers virtually ALL characters in the world!


Unicode and UTF

• Unicode stands for “Unique Code”
• UTF stands for “Unicode Transformation Format”
• UTF is not a code page, but an encoding format for the Unicode code page
• UTF encodes Unicode codes into 1 to 4 bytes
• UTF-8, UTF-16 and UTF-32 are the three basic encoding forms supported by Unicode
• All UTF formats handle all Unicode codes


UTF Encoding Examples

Unicode   UTF-8         UTF-16        UTF-32
U+004D    4D            00 4D         00 00 00 4D
U+00A1    C2 A1         00 A1         00 00 00 A1
U+00E1    C3 A1         00 E1         00 00 00 E1
U+0470    D1 B0         04 70         00 00 04 70
U+4E9C    E4 BA 9C      4E 9C         00 00 4E 9C
U+10302   F0 90 8C 82   D8 00 DF 02   00 01 03 02
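The byte sequences for any code point can be recomputed rather than looked up. A small Python sketch (an illustration added here, not from the deck; `utf_forms` is a hypothetical helper and the big-endian variants are chosen to match the byte order shown in the table):

```python
def utf_forms(code_point):
    """Return the UTF-8, UTF-16 and UTF-32 bytes of a code point as hex."""
    ch = chr(code_point)
    return {
        "UTF-8": ch.encode("utf-8").hex(" ").upper(),
        "UTF-16": ch.encode("utf-16-be").hex(" ").upper(),
        "UTF-32": ch.encode("utf-32-be").hex(" ").upper(),
    }

for cp in (0x004D, 0x00A1, 0x00E1, 0x0470, 0x4E9C, 0x10302):
    print(f"U+{cp:04X}", utf_forms(cp))
```

Note how U+10302, which lies outside the Basic Multilingual Plane, needs four bytes in UTF-8 and a surrogate pair in UTF-16.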


Unicode Conversion

• All code pages convert to Unicode
• Unicode may not convert to other code pages

[Diagram: ISO8859-1, ISO8859-2, 1252, 1250, IBM850, IBM852 and IBM437 all convert into Unicode; the conversions from Unicode back to each of them are marked “??”]


OpenEdge Settings

Database settings
• _db._db-xl-name: Database code page
• _db._db-coll-name: Database collation

Startup parameters
• -cpinternal: Process code page
• -cpstream: Input/Output code page
• -cpcoll: Process collation
• -d: Date format
• -E: Numeric format


More OpenEdge Settings

• -cplog: Code page for log files (-cpstream)
• -cpterm: Code page for screen I/O (-cpstream)
• -cpprint: Code page for printing (-cpstream)
• -numsep: Separator for thousands (-E)
• -numdec: Separator for decimals (-E)
• -cprcodein/-cprcodeout: Code page for compiled code (-cpinternal)
• -lng: Translation Manager language


Even More OpenEdge Settings

• convmap.cp: Character Processing Tables
• progress.ini: Fonts

(More parameters in documentation)


OpenEdge Settings

_db-xl-name, -cpinternal and -cpstream

[Diagram: the OpenEdge process (-cpinternal) sits between the database (_db-xl-name) and the -cpstream side (OS files, printer, keyboard and screen, with GUI and CHUI labels); both boundaries are marked “OpenEdge code page conversions!”]


OpenEdge Settings

[Diagram: deployment overview. A DB server (_mprosrv) with the database (_db-xl-name), a GUI client (prowin32), a CHUI client (_progres), a WebSpeed™ agent (_progres -web) and an AppServer™ agent (_proapsv). Each process has its own -cpinternal and -cpstream. The boxes are surrounded by their I/O endpoints: OS files, keyboard, screen, printer and web browser.]


OpenEdge Settings

Since OpenEdge 10 supports UTF-8 in most processes…

… just configure all OE settings to UTF-8!

Well, not really. We need to look at:
• Operating System
• Web Server
• Printer drivers
• Data from/to other systems
• OCX’s
• Terminal Emulators, etc.


OpenEdge Settings

_db-xl-name (metaschema field)

The database should use Unicode (UTF-8) to ensure support for all characters


OpenEdge Settings

-cpinternal (startup parameter)

Processes should use Unicode to ensure support for all characters
• Best if -cpinternal matches database
• Batch Client (_progres -b) can use Unicode, but Character Client (_progres) cannot
• Interfaces with Windows controls


OpenEdge Settings

-cpstream (startup parameter)

-cpstream is the main cause of data corruption when set incorrectly

It specifies the code page of data read from and written to files

On the Character Client it also specifies the code page of the keyboard and screen

Rule of thumb:
• Set -cpstream to match the Operating System code page
• Use ABL to override -cpstream when needed


OpenEdge Settings

-cpstream (startup parameter)

Unix/Linux code page:

    % locale charmap
    ISO8859-1

DOS code page:

    C:\>mode con cp

    Status for device CON:
    ----------------------
        Code page: 437
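The same lookup can be scripted when a startup script has to derive a -cpstream value. A hedged Python sketch (not from the deck; `os_code_page` is a hypothetical helper, and translating the returned Python codec name into an OpenEdge code page name is left as a manual step):

```python
import codecs
import locale

def os_code_page():
    """Return the normalized codec name of the OS default encoding."""
    name = locale.getpreferredencoding(False)
    # codecs.lookup() normalizes aliases such as 'ANSI_X3.4-1968' -> 'ascii'
    return codecs.lookup(name).name

print(os_code_page())
```

The value reflects the environment of the process that runs the script, so it should be executed under the same user and locale settings as the OpenEdge session.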


OpenEdge Settings

convmap.cp (OpenEdge file)

• Contains the Character Processing Tables
• DLC/convmap.cp
• DLC/prolang/convmap/convmap.dat

OpenEdge 10.1B out of the box contains:
• 54 code pages
• 595 code page conversion tables
• 491 collation tables

More tables can be added


OpenEdge Settings

progress.ini (OpenEdge file)

Use appropriate fonts for code page and language:
• Recommended to replace MS Sans Serif with Microsoft Sans Serif
• MS Gothic or MS Mincho for Japanese
• MS Song for Chinese
• Use script when needed:

    font0=Courier New, size=8, script=russian
    font0=Courier New, size=8, script=easteurope


Not an OpenEdge Setting

Windows Fonts

Linked fonts

Information about Windows fonts:
http://www.microsoft.com/typography/fonts/default.aspx
http://www.microsoft.com/globaldev/getwr/steps/wrg_font.mspx


OpenEdge Settings

Summary

Meet requirements for Input/Output:
• -cpinternal for process and GUI I/O (UTF-8)
• -cpstream for file I/O and CHUI I/O (OS)

Decide the code page when exporting data

Know the code page when importing data

[Diagram: the OpenEdge process (-cpinternal) between the database (_db-xl-name) and OS files, printer, keyboard and screen (-cpstream; GUI and CHUI labels)]


Common Mistakes

Loading or importing data with the wrong code page

Bytes in the file: C4 8C 7A 65 63 68
• Read as UTF-8: Čzech
• Read as 1250: ÄŚzech
• Read as ISO8859-1: ÄŒzech
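The three readings of the same six bytes can be checked with a short Python sketch (added as an illustration, not from the deck). One assumption to flag: strict ISO 8859-1 defines 8C as a control character, so the “Œ” glyph associated with the ISO8859-1 reading is obtained here with Windows-1252, which is what Windows fonts typically render.

```python
data = bytes.fromhex("C4 8C 7A 65 63 68")

readings = {
    "UTF-8": data.decode("utf-8"),
    "1250": data.decode("cp1250"),
    # cp1252 stands in for ISO8859-1 (see the note above about byte 8C)
    "1252": data.decode("cp1252"),
}

for code_page, text in readings.items():
    print(f"{code_page:>5}: {text}")
```

Only the reading that matches the code page the file was written in yields the intended word; the other two are silent corruption, since no error is raised.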


Byte Order Mark (BOM)

• Identifies the UTF encoding of a data file
• Unicode code point U+FEFF
• U+FEFF is also encoded:

    UTF-8:    EF BB BF
    UTF-16BE: FE FF
    UTF-16LE: FF FE
    UTF-32BE: 00 00 FE FF
    UTF-32LE: FF FE 00 00

OpenEdge understands BOMs when reading
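For tools outside OpenEdge, BOM detection is a prefix check. A minimal Python sketch (an illustration, not OpenEdge's own logic; `detect_bom` is a hypothetical helper built on the constants in Python's standard `codecs` module):

```python
import codecs

# Longest marks first: the UTF-32LE BOM begins with the UTF-16LE BOM.
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]

def detect_bom(data):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for mark, name in BOMS:
        if data.startswith(mark):
            return name
    return None

print(detect_bom(bytes.fromhex("EF BB BF C4 8C 7A 65 63 68")))
```

The ordering of the table matters: testing UTF-16LE before UTF-32LE would misreport every UTF-32LE file.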


Byte Order Mark (BOM)

Bytes in the file: EF BB BF C4 8C 7A 65 63 68
• Read as UTF-8: Čzech
• Read as 1250: Čzech
• Read as ISO8859-1: Čzech

Caution!


Common Mistakes

Loading or importing data with the wrong code page

    (…)
    "imuller" "Ian Muller" "Y" "C" 1657 283200
    "jdoe" "Jane Doe" "N" "U" 3275 450010
    "jsmith" "John Smith" "Y" "C" 1450 323700
    "jsanchez" "Juan Sánchez" "Y" "C" 4250 323900
    .
    PSC
    filename=users
    records=0000000001133
    ldbname=mydatabase
    timestamp=2007/03/28-20:55:03
    numformat=44,46
    dateformat=mdy-1950
    map=NO-MAP
    cpstream=ISO8859-1
    .
    0000143373
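The trailer of a dump file states the code page it was written in, so it can be checked before loading. The following Python sketch is an illustration only (not from the deck): `dump_trailer` is a hypothetical helper, and it assumes the trailer holds one key=value pair per line between a `PSC` line and a closing `.` line.

```python
def dump_trailer(text):
    """Return the key=value pairs that follow the 'PSC' marker line."""
    lines = text.splitlines()
    if "PSC" not in lines:
        return {}
    fields = {}
    for line in lines[lines.index("PSC") + 1:]:
        if line == ".":          # end of the trailer block
            break
        key, sep, value = line.partition("=")
        if sep:
            fields[key] = value
    return fields

SAMPLE = """"jsmith" "John Smith" "Y" "C" 1450 323700
.
PSC
filename=users
records=0000000001133
ldbname=mydatabase
numformat=44,46
dateformat=mdy-1950
map=NO-MAP
cpstream=ISO8859-1
.
0000143373
"""

print(dump_trailer(SAMPLE)["cpstream"])
```

Because the trailer itself is plain ASCII, a file of unknown encoding can be read byte-safely (for example as latin-1) just to extract it.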


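Common Mistakes

Updating data with the wrong code page

[Diagram: OS = 1252. The user types “à” (E0) into a _progres client started with -cpstream IBM850 and -cpinternal IBM850, so E0 is taken as is. The _mprosrv server runs with -cpinternal ISO8859-1 on a database with _db-xl-name ISO8859-1. E0 is “Ó” in IBM850, so the value is converted to D3 and stored as “Ó”.]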


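Common Mistakes

Updating data with the CORRECT code page

[Diagram: OS = 1252. The user types “à” (E0) into a _progres client started with -cpstream 1252 and -cpinternal IBM850, so E0 is converted to 85 (“à” in IBM850). The _mprosrv server runs with -cpinternal ISO8859-1 on a database with _db-xl-name ISO8859-1, so 85 is converted to E0 and stored as “à”.]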
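The chain of conversions can be modelled in a few lines. This Python sketch is a simplified model for illustration, not OpenEdge code: `client_to_database` is a hypothetical function, and `cp850`, `cp1252` and `latin-1` are Python's names for IBM850, 1252 and ISO8859-1.

```python
def client_to_database(typed_bytes, cpstream, cpinternal="cp850", db_codec="latin-1"):
    """Follow a keystroke from the OS to the database bytes."""
    text = typed_bytes.decode(cpstream)       # client interprets input via -cpstream
    in_client = text.encode(cpinternal)       # and holds it in its -cpinternal
    # the server converts from the client's code page to the database code page
    return in_client.decode(cpinternal).encode(db_codec)

typed = "\u00e0".encode("cp1252")             # OS = 1252: "à" arrives as E0

wrong = client_to_database(typed, cpstream="cp850")
right = client_to_database(typed, cpstream="cp1252")

print(wrong.hex().upper(), right.hex().upper())
```

With the mismatched -cpstream every later conversion is performed faithfully, which is why the corruption goes unnoticed until the data is viewed from a correctly configured session.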


Common Mistakes

Updating data with the wrong code page

[Diagram: WebSpeed agent (_progres -web) started with -cpstream UTF-8]


Common Mistakes

Incorrect tools to verify data

• Notepad sometimes guesses the code page based on the content
• Notepad understands BOM, Excel doesn’t
• Startup parameters in Procedure Editor
• Fonts in progress.ini
• Terminal Emulator needs to be configured to support remote OS code page
• Use a hexadecimal editor
• Two wrongs may make it look right


Tips & Hints

When starting development, make sure all the components have the correct code page settings

Each application may need different code page settings

When integrating, review the code page settings of all applications and processes involved

Development and Integration


Tips & Hints

How to display the code page settings:

    MESSAGE "Database = " DBCODEPAGE(1) SKIP
            "Collation = " DBCOLLATION(1) SKIP
            "-cpinternal = " SESSION:CPINTERNAL SKIP
            "-cpstream = " SESSION:CPSTREAM SKIP
            "-cpcoll = " SESSION:CPCOLL SKIP
            VIEW-AS ALERT-BOX.


Tips & Hints

Temp-tables using Word Indexes

• Temp-tables use their own word-break tables for word indexes
• Use -ttwrdrul parameter

[Diagram: a word-break table is compiled with proutil -C wbreak-compiler; the database is associated with it through proutil -C word-rules, and Progress clients (prowin32, _progres [-web]) through -ttwrdrul]


Tips & Hints

Input/Output

When using OUTPUT TO, know the code page the output must be converted to, which depends on how the file will be used

When using INPUT FROM, know in what code page the imported data was encoded

To override the -cpstream default:

    OUTPUT TO file CONVERT TARGET "UTF-8".
    INPUT FROM file CONVERT SOURCE "UTF-8".

Stamp code page, especially for integration
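The same discipline applies to any program that exchanges files with an OpenEdge application: state the encoding on both sides instead of relying on a default. A Python sketch for illustration (not from the deck; `export_text` and `import_text` are hypothetical helpers, and the file name is a throwaway):

```python
import os
import tempfile

def export_text(path, text, target="utf-8"):
    """Write text, converting it to an explicitly chosen target encoding."""
    with open(path, "w", encoding=target, newline="") as f:
        f.write(text)

def import_text(path, source="utf-8"):
    """Read text, decoding it from an explicitly stated source encoding."""
    with open(path, "r", encoding=source, newline="") as f:
        return f.read()

path = os.path.join(tempfile.mkdtemp(), "users.txt")
export_text(path, "Juan S\u00e1nchez", target="utf-8")

print(import_text(path, source="utf-8"))     # matches what was written
print(import_text(path, source="latin-1"))   # wrong source: mojibake, no error
```

Reading with the wrong source encoding raises no error here, which is why the code page should travel with the file (in its name, a header, or the interface contract).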


Tips & Hints

UTF-8 can be multi-byte!

Many UTF-8 characters are more than one byte:

    DEFINE VARIABLE c AS CHARACTER INIT "á".
    MESSAGE LENGTH(c) SKIP
            LENGTH(c,"RAW")
            VIEW-AS ALERT-BOX.

returns 1 (characters) and 2 (bytes) in a session with -cpinternal UTF-8.
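The character/byte distinction is independent of OpenEdge. A short Python equivalent for illustration (not from the deck):

```python
c = "\u00e1"   # "á"

chars = len(c)                          # length in characters
utf8_bytes = len(c.encode("utf-8"))     # length in bytes when stored as UTF-8
latin1_bytes = len(c.encode("latin-1")) # length in bytes in a single-byte code page

print(chars, utf8_bytes, latin1_bytes)
```

Any logic that sizes buffers, truncates strings or computes fixed-width positions has to decide which of the two lengths it means.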


Tips & Hints

CHR() and ASC()

• Use CHR() and ASC() with code page parameters
• Do not hard-code encoding values
• See examples…


Tips & Hints

CHR() and ASC() – Example 1

Detecting non-breaking blank spaces (NBSP)

    CASE SESSION:CPINTERNAL:
      WHEN "UTF-8" THEN
        IF c = CHR(49824) THEN MESSAGE "NBSP" VIEW-AS ALERT-BOX.
      WHEN "ISO8859-1" THEN
        IF c = CHR(160) THEN MESSAGE "NBSP" VIEW-AS ALERT-BOX.
    END CASE.

Better code:

    IF c = CHR(49824,SESSION:CPINTERNAL,"UTF-8") THEN
      MESSAGE "NBSP" VIEW-AS ALERT-BOX.
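Where the two magic numbers come from, and why comparing characters is more robust than comparing encoded values, can be shown with a Python sketch (an illustration, not from the deck; `encoded_value` and `contains_nbsp` are hypothetical helpers):

```python
NBSP = "\u00a0"   # the character itself, independent of any code page

def encoded_value(char, codec):
    """Integer formed by the bytes of `char` in the given code page."""
    return int.from_bytes(char.encode(codec), "big")

def contains_nbsp(raw, codec):
    """Decode first, then compare characters instead of byte values."""
    return NBSP in raw.decode(codec)

print(encoded_value(NBSP, "utf-8"), encoded_value(NBSP, "latin-1"))
```

The first function reproduces the per-code-page constants; the second never needs them, because the comparison happens after the bytes have been interpreted in their own code page.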


Tips & Hints

CHR() and ASC() – Example 2

OpenEdge silently ignores incorrect values to ASC() or CHR()

    /* When run with -cpinternal UTF-8 it returns YES because
       160 is not a valid UTF-8 encoding.
       When run with -cpinternal 1252 it returns NO. */
    MESSAGE CHR(160) = "" VIEW-AS ALERT-BOX.

    /* Always returns NO */
    MESSAGE CHR(49824,SESSION:CPINTERNAL,"UTF-8") = "" VIEW-AS ALERT-BOX.


Tips & Hints

CHR() and ASC() – Example 3

CHR() and ASC() work with encoding values, as opposed to code points

For example, this code run on a session with -cpinternal UTF-8

    DEFINE VARIABLE c AS CHARACTER NO-UNDO.
    c = "á".
    MESSAGE ASC(c) VIEW-AS ALERT-BOX.

returns 50081 (C3A1) and not 225 (00E1).

    Unicode   UTF-8
    U+00E1    C3 A1


Tips & Hints

Unicode code points

If needed, Unicode code points can be used:

    DEFINE VARIABLE c AS CHARACTER NO-UNDO.
    c = "á".
    MESSAGE c = "~u00E1" SKIP
            c = CHR(50081) SKIP
            c = CHR(225,"UTF-8","1252")
            VIEW-AS ALERT-BOX.


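Tips & Hints

Un-corrupting data

[Diagram: the scenario from “Updating data with the wrong code page”. OS = 1252; the _progres client runs with -cpstream IBM850 and -cpinternal IBM850; the _mprosrv server runs with -cpinternal ISO8859-1 on a database with _db-xl-name ISO8859-1. The typed “à” (E0) is stored as “Ó” (D3).]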


Tips & Hints

Un-corrupting data

ISO8859-1 database with data encoded in IBM850

Run on session with -cpinternal iso8859-1

    FOR EACH myTable EXCLUSIVE-LOCK:
      RUN FixChar(INPUT-OUTPUT myTable.myField).
    END.

    PROCEDURE FixChar:
      DEF INPUT-OUTPUT PARAM c AS CHAR NO-UNDO.
      c = CODEPAGE-CONVERT(c,"IBM850","ISO8859-1").
    END PROCEDURE.
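The repair works by running the unwanted conversion backwards. A Python model of the same idea, for illustration only (not from the deck; `fix_char` is a hypothetical helper, with `cp850` and `latin-1` standing for IBM850 and ISO8859-1):

```python
def fix_char(stored, wrong_codec="cp850", db_codec="latin-1"):
    """Undo a conversion that treated `db_codec` bytes as `wrong_codec` text."""
    # Re-encode to the code page the data was mistaken for, which recovers
    # the original bytes, then read those bytes in the database code page.
    return stored.encode(wrong_codec).decode(db_codec)

print(fix_char("\u00d3"))   # the corrupted value seen in the database
```

Pure ASCII values pass through unchanged, but a value that was entered correctly would be damaged by the fix, so a repair like this should be limited to the records known to be affected and tried on a copy first.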


Tips & Hints

BOM

How to output UTF-8 BOM to a file

Intended for Notepad (.txt) or web browser (.html)

    OUTPUT TO text.txt CONVERT TARGET "UTF-8".
    PUT CONTROL "~357~273~277". /* BOM */
    PUT UNFORMATTED "UTF-8 text".
    OUTPUT CLOSE.
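The octal escapes ~357~273~277 are the bytes EF BB BF. For comparison, a Python sketch that produces the same file layout (an illustration, not from the deck; `write_with_bom` is a hypothetical helper that relies on Python's standard `utf-8-sig` codec):

```python
import os
import tempfile

def write_with_bom(path, text):
    """Write UTF-8 text preceded by the EF BB BF byte order mark."""
    with open(path, "w", encoding="utf-8-sig") as f:
        f.write(text)

path = os.path.join(tempfile.mkdtemp(), "text.txt")
write_with_bom(path, "UTF-8 text")

with open(path, "rb") as f:
    print(f.read()[:3].hex(" ").upper())   # the first three bytes of the file
```

Reading the file back with the same `utf-8-sig` codec strips the mark again, so the BOM never leaks into the data.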


Tips & Hints

Web browser needs to map WebSpeed’s -cpstream

[Diagram: _progres -web with -cpstream UTF-8 sends output to the web browser; encoding ???]

Original outputHeader procedure:

    PROCEDURE outputHeader:
      output-content-type ("text/html").
    END PROCEDURE.


Tips & Hints

Web browser needs to map WebSpeed’s -cpstream (1)

Use OpenEdge’s convcp.p procedure

    PROCEDURE outputHeader:
      DEF VAR cMimeCP AS CHAR NO-UNDO.
      RUN adecomm/convcp.p(SESSION:CPSTREAM, "ToMime", OUTPUT cMimeCP).
      output-content-type ("text/html; charset=" + cMimeCP).
    END PROCEDURE.


Tips & Hints

Web browser needs to map WebSpeed’s -cpstream (2)

Use User Defined Function

GetMimeCP converts OpenEdge code page names to MIME names

See example…

    PROCEDURE outputHeader:
      output-content-type ("text/html; charset=" + GetMimeCP(SESSION:CPSTREAM)).
    END PROCEDURE.


Tips & Hints

GetMimeCP example

    FUNCTION GetMimeCP RETURNS CHAR (INPUT progress-CodePage AS CHAR):
      DEF VAR pro-cplist AS CHAR INIT
    "1250,1251,1252,1253,1254,1255,1256,1257,1258,620-2533,BIG-5,EUCJIS,GB2312,IBM037,IBM273,IBM277,IBM278,IBM284,IBM297,IBM437,IBM500,IBM850,IBM851,IBM852,IBM857,IBM858,IBM861,IBM862,IBM866,ISO8859-1,ISO8859-10,ISO8859-15,ISO8859-2,ISO8859-3,ISO8859-4,ISO8859-5,ISO8859-6,ISO8859-7,ISO8859-8,ISO8859-9,KOI8-R,KSC5601,ROMAN-8,SHIFT-JIS,UCS2,UTF-8".
      DEF VAR MIME-cplist AS CHAR INIT
    "Windows-1250,Windows-1251,Windows-1252,Windows-1253,Windows-1254,Windows-1255,Windows-1256,Windows-1257,Windows-1258,TIS-620,Big5,EUC-JP,GB_2312-80,IBM037,IBM273,IBM277,IBM278,IBM284,IBM297,IBM437,IBM500,IBM850,IBM851,IBM852,IBM857,IBM00858,IBM861,IBM862,IBM866,ISO-8859-1,ISO-8859-10,ISO-8859-15,ISO-8859-2,ISO-8859-3,ISO-8859-4,ISO-8859-5,ISO-8859-6,ISO-8859-7,ISO-8859-8,ISO-8859-9,KOI8-R,KS_C_5601-1987,hp-roman8,Shift_JIS,UTF-16,UTF-8".
      DEF VAR i AS INT.
      i = LOOKUP(progress-CodePage,pro-cplist).
      RETURN IF i = 0 THEN "Unknown" ELSE ENTRY(i,MIME-cplist).
    END FUNCTION.


Tips & Hints

Caution with numeric format

• Do not store decimal values in char fields
• prog2.p will fail if run with a different -E or -numdec than prog1.p
• Comma-delimited lists

    /* prog1.p */
    DEFINE VARIABLE d AS DECIMAL INIT 123.45.
    CREATE table.
    table.char1 = STRING(d).

    /* prog2.p */
    FIND FIRST table.
    DISPLAY DECIMAL(table.char1).
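Why text is a fragile container for decimals can be seen in a toy model. The Python sketch below is an illustration only and does not claim to reproduce the exact behaviour of ABL's DECIMAL(); `parse_decimal` is a hypothetical function that interprets a string under a given pair of separators.

```python
def parse_decimal(text, decimal_sep, thousands_sep):
    """Interpret `text` under one session's numeric format."""
    normalized = text.replace(thousands_sep, "").replace(decimal_sep, ".")
    return float(normalized)

stored = "123.45"   # written by a session using "." as the decimal separator

american = parse_decimal(stored, decimal_sep=".", thousands_sep=",")
european = parse_decimal(stored, decimal_sep=",", thousands_sep=".")

print(american, european)
```

In this model the second session reads a value one hundred times larger without any error, which is the kind of mismatch that storing the value in a decimal field avoids.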


Tips & Hints

Date and Numeric formats can be changed at run time

    DEFINE VARIABLE mynum AS DECIMAL NO-UNDO.

    SESSION:DATE-FORMAT = "mdy".
    DISPLAY SESSION:DATE-FORMAT TODAY SKIP.
    SESSION:DATE-FORMAT = "dmy".
    DISPLAY SESSION:DATE-FORMAT TODAY FORMAT "99-99-9999" SKIP.
    SESSION:DATE-FORMAT = "ymd".
    DISPLAY SESSION:DATE-FORMAT TODAY FORMAT "9999.99.99" SKIP.

    mynum = 12345.67.
    SESSION:NUMERIC-FORMAT = "American".
    DISPLAY SESSION:NUMERIC-FORMAT STRING(mynum) SKIP.
    SESSION:NUMERIC-FORMAT = "European".
    DISPLAY SESSION:NUMERIC-FORMAT STRING(mynum) SKIP WITH NO-LABELS.


Tips & Hints

Miscellaneous

• Never use the “undefined” code page
• If the source and target code pages are the same, no conversion happens
• If we always make the same mistake we’ll not notice the data corruption
• r-code is encoded using -cpinternal
• Source files are encoded using -cpstream
• Recognize UTF-8 read as iso8859-1: ö becomes Ã¶
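The last symptom can also be detected mechanically. A heuristic Python sketch (an illustration, not from the deck; `looks_like_utf8_read_as_latin1` is a hypothetical helper, and like any heuristic it can be fooled by text that legitimately contains such character pairs):

```python
def looks_like_utf8_read_as_latin1(text):
    """True if re-reading the text's latin-1 bytes as UTF-8 changes it."""
    try:
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False   # not representable, or not valid UTF-8: leave it alone
    return repaired != text

for sample in ("\u00c3\u00b6", "\u00f6", "plain ascii"):
    print(repr(sample), looks_like_utf8_read_as_latin1(sample))
```

Correctly decoded accented text fails the UTF-8 step and is reported as clean, while plain ASCII survives the round trip unchanged and is also reported as clean.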


Tips & Hints

DBA reminder

How to create a UTF-8 word-break table:

    > proutil -C wbreak-compiler %DLC%\prolang\convmap\utf8-bas.wbt 1
    > copy proword.1 %DLC%

How to create a UTF-8 database:

    > prodb <db> %DLC%\prolang\utf\empty.db
    > proutil <db> -C word-rules 1

How to start a UTF-8 client:

    > _progres -b -cpinternal UTF-8 -ttwrdrul 1
    > prowin32 -cpinternal UTF-8 -ttwrdrul 1


Linguistic Sorting and Collation

Collation: Set of rules for ordering and comparing character data

OpenEdge supports 54 ICU (International Components for Unicode) collations with UTF-8

Local databases vs global databases

COMPARE and COLLATE


Linguistic Sorting and Collation

Sorting with Basic collation

    FOR EACH mytable BY myfield:
      DISPLAY myfield WITH FONT 8.
    END.

Basic: Aaa, Ááá, Äää, Ççç, Ĉĉĉ, Bbb, Ccc, Zzz


Linguistic Sorting and Collation

Sorting with English collation

    FOR EACH mytable
      BY COLLATE(myfield,"CASE-INSENSITIVE","ICU-UCA"):
      DISPLAY myfield WITH FONT 8.
    END.

Basic:   Aaa, Ááá, Äää, Ççç, Ĉĉĉ, Bbb, Ccc, Zzz
ICU-UCA: Aaa, Ááá, Äää, Bbb, Ccc, Ĉĉĉ, Ççç, Zzz


Linguistic Sorting and Collation

Sorting with Finnish collation

    FOR EACH mytable
      BY COLLATE(myfield,"CASE-INSENSITIVE","ICU-fi"):
      DISPLAY myfield WITH FONT 8.
    END.

Basic:   Aaa, Ááá, Äää, Ççç, Ĉĉĉ, Bbb, Ccc, Zzz
ICU-UCA: Aaa, Ááá, Äää, Bbb, Ccc, Ĉĉĉ, Ççç, Zzz
ICU-fi:  Aaa, Ááá, Bbb, Ccc, Ĉĉĉ, Ççç, Zzz, Äää
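The idea behind language-specific collations can be approximated with sort keys. The Python sketch below is a toy model for illustration and is not ICU: `uca_like_key` compares base letters first and accents second, and `finnish_like_key` adds a single tailoring that treats Ä as a separate letter after Z (real Finnish collation has further rules that are omitted here).

```python
import unicodedata

def uca_like_key(word):
    """Primary key: letters without accents; secondary key: accents."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), decomposed)

def finnish_like_key(word):
    """Same as above, except that A-diaeresis sorts after Z."""
    composed = unicodedata.normalize("NFC", word)
    # U+FFFF is used as a placeholder that compares greater than any letter
    shielded = composed.replace("\u00c4", "\uffff").replace("\u00e4", "\uffff")
    return uca_like_key(shielded)

WORDS = ["Zzz", "Ççç", "Äää", "Ccc", "Bbb", "Ááá", "Ĉĉĉ", "Aaa"]

print(sorted(WORDS, key=uca_like_key))
print(sorted(WORDS, key=finnish_like_key))
```

The same data produces two different orders depending on the key function, which is why the collation has to be chosen per user language rather than per database.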


Linguistic Sorting and Collation

FOR EACH mytable WHERE myfield >= "C" BY myfield: DISPLAY myfield WITH FONT 8.END.

Comparing with Basic collation

Basic: Ccc, Zzz


Linguistic Sorting and Collation

FOR EACH mytable
    WHERE COMPARE(myfield, ">=", "C", "CASE-INSENSITIVE", "ICU-UCA")
    BY COLLATE(myfield, "CASE-INSENSITIVE", "ICU-UCA"):
    DISPLAY myfield WITH FONT 8.
END.

Comparing with English collation

Basic: Ccc, Zzz

ICU-UCA: Ccc, Ĉĉĉ, Ççç, Zzz


Linguistic Sorting and Collation

FOR EACH mytable
    WHERE COMPARE(myfield, ">=", "C", "CASE-INSENSITIVE", "ICU-fi")
    BY COLLATE(myfield, "CASE-INSENSITIVE", "ICU-fi"):
    DISPLAY myfield WITH FONT 8.
END.

Comparing with Finnish collation

Basic: Ccc, Zzz

ICU-UCA: Ccc, Ĉĉĉ, Ççç, Zzz

ICU-fi: Ccc, Ĉĉĉ, Ççç, Zzz, Äää
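The difference between the ICU-UCA and ICU-fi results can be imitated outside OpenEdge. The following Python sketch is only a toy model, not ICU: the accent weights in `MARK_RANK`, the `TAILORINGS` table, and the function names are assumptions made up for this illustration, chosen so that the eight sample values from the slides order the same way. The point it shows is that Finnish treats "ä" as a separate letter after "z", while the default ordering treats it as an accented "a".

```python
import unicodedata

# Toy secondary weights for the four accents in the slide data (assumed order).
MARK_RANK = {"\u0301": 1, "\u0302": 2, "\u0308": 3, "\u0327": 4}

# Hypothetical tailorings: letters that a language sorts after "z".
# "{", "|" and "}" are the ASCII characters that follow "z".
TAILORINGS = {
    "uca": {},
    "fi": {"\u00e5": "{", "\u00e4": "|", "\u00f6": "}"},  # å, ä, ö
}

def collation_key(text, collation):
    """Case-insensitive key: base letters first, accents as a tie-breaker."""
    tailoring = TAILORINGS[collation]
    primary, accents = [], []
    for ch in unicodedata.normalize("NFC", text).casefold():
        if ch in tailoring:
            primary.append(tailoring[ch])   # a letter of its own, no accent weight
            continue
        for part in unicodedata.normalize("NFD", ch):
            if unicodedata.combining(part):
                accents.append(MARK_RANK.get(part, 99))
            else:
                primary.append(part)
    return ("".join(primary), tuple(accents))

def compare_ge(value, bound, collation):
    """Rough analogue of COMPARE(value, ">=", bound, ...) under a collation."""
    return collation_key(value, collation) >= collation_key(bound, collation)

words = ["Zzz", "Äää", "Ccc", "Ááá", "Ççç", "Aaa", "Ĉĉĉ", "Bbb"]

for name in ("uca", "fi"):
    ordered = sorted(words, key=lambda w: collation_key(w, name))
    filtered = [w for w in ordered if compare_ge(w, "C", name)]
    print(name, ordered, filtered)
```

With the "fi" tailoring, Äää moves to the end of the sorted list and also passes the `>= "C"` filter, which mirrors the slide results.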


Linguistic Sorting and Collation

Global Setup

Database: -cpcoll ICU-uca

AppServer: -cpcoll ICU-uca (uses client collation in COMPARE and COLLATE)

Client sessions (the diagram labels each with TEMP-TABLES):

English User: -cpcoll ICU-en

French User: -cpcoll ICU-fr

Czech User: -cpcoll ICU-cs

Finnish User: -cpcoll ICU-fi

RUN ASprg.p ON hAppServer (INPUT SESSION:CPCOLL, INPUT USERID, INPUT <other parameters>, OUTPUT TABLE ttMytable).

Caution with performance!


Agenda

Code Pages Overview
OpenEdge Settings
Common Mistakes
Hints & Tips
Linguistic Sorting and Collation
Time Zones
Summary
Questions


Time Zones

Timestamps: client vs server vs GMT

Display time: saved vs converted

Database queries: saved vs converted

Considerations

http://www.csgnetwork.com/timezonemap.html


Time Zones

[Map legend: DST used / DST no longer used / DST never used] http://en.wikipedia.org/wiki/Daylight_saving_time

Extra consideration

Daylight Saving Time for time conversions


Time Zones

Operating systems have time zone tables

• Solaris: /usr/share/lib/zoneinfo

• HP-UX: /usr/lib/tztab

• Red Hat: /usr/share/zoneinfo

• Windows: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Time Zones

Java uses its own time zone tables

OpenEdge relies on the platform

OS Support


Time Zones

DATETIME and DATETIME-TZ data types

DEFINE VARIABLE dt  AS DATETIME.
DEFINE VARIABLE dtz AS DATETIME-TZ.
dt  = NOW.
dtz = NOW.
MESSAGE dt SKIP dtz VIEW-AS ALERT-BOX.

The value stored with a DATETIME-TZ is an offset, not a time zone!
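Why an offset is not a time zone can be seen in any language that has fixed-offset timestamps. The following Python sketch (an illustration outside OpenEdge; the variable names and dates are arbitrary) attaches a +01:00 offset to a winter and a summer timestamp. The offset stays the same in both, so it cannot tell you that a city such as Berlin is actually on +02:00 in July.

```python
from datetime import datetime, timedelta, timezone

# A fixed offset of +01:00: it carries no rules about daylight saving time.
plus_one = timezone(timedelta(hours=1))

winter = datetime(2007, 1, 15, 12, 0, tzinfo=plus_one)
summer = datetime(2007, 7, 15, 12, 0, tzinfo=plus_one)

print(winter.isoformat())  # 2007-01-15T12:00:00+01:00
print(summer.isoformat())  # 2007-07-15T12:00:00+01:00

# Both values convert to GMT by subtracting the same hour, summer or winter.
print(winter.astimezone(timezone.utc).isoformat())  # 2007-01-15T11:00:00+00:00
print(summer.astimezone(timezone.utc).isoformat())  # 2007-07-15T11:00:00+00:00
```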


Time Zones

Timestamping

Database: all times are GMT

AppServer

User

Gets OS time in GMT

Converts GMT to user's time zone


Time Zones

Displaying times

Database: GMT times

AppServer

Converts GMT to user's time zone

Stored time: 12:30 GMT

User location          Summer    Winter
Bedford, USA           08:30     (-1) 07:30
Berlin, Germany        14:30     (-1) 13:30
Brisbane, Australia    22:30     (0) 22:30
Sydney, Australia      22:30     (+1) 23:30


Time Zones

users
Order  Field       Type  Format      Description
10     user-id     C     X(8)        User ID
20     tz-id       C     X(4)        Time zone ID

timezones
Order  Field       Type  Format      Description
10     tz-id       C     X(4)        Time zone ID
20     tz-name     C     X(40)       Time zone name

tz-changes
Order  Field       Type  Format      Description
10     tz-id       C     X(4)        Time zone ID
20     tz-date     D     99/99/9999  Date that the changes apply from
30     min-1       I     ->>>9       Normal minutes of difference from GMT
40     min-2       I     ->>>9       Minutes of difference from GMT during DST
50     from-month  I     >9          Month when DST starts
60     from-day    I     9           Code for day when DST starts
70     from-time   C     99:99       Time when DST starts
80     to-month    I     >9          Month when DST ends
90     to-day      I     9           Code for day when DST ends
100    to-time     C     99:99       Time when DST ends

Database tables


Time Zones

ABL functions

GetGMT() to get current time in GMT

FUNCTION GetGMT RETURNS DATETIME ():
    DEFINE VARIABLE dtGMT AS DATETIME NO-UNDO.
    /* TIMEZONE is the session offset from GMT in minutes; subtract it from NOW. */
    dtGMT = ADD-INTERVAL(NOW, - TIMEZONE, 'MINUTES').
    RETURN dtGMT.
END FUNCTION.
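The same "local time minus offset" idea can be checked in another language. This Python sketch is only an analogue of GetGMT() for illustration (the name `get_gmt` is made up here); it subtracts the machine's current UTC offset from the local time and compares the result with the library's direct route to UTC.

```python
from datetime import datetime, timezone

def get_gmt():
    """Return the current GMT/UTC time as a naive datetime."""
    local_now = datetime.now().astimezone()   # local time with its current UTC offset
    # Same idea as ADD-INTERVAL(NOW, - TIMEZONE, 'MINUTES'): subtract the offset.
    return local_now.replace(tzinfo=None) - local_now.utcoffset()

# Cross-check against asking the library for UTC directly.
direct = datetime.now(timezone.utc).replace(tzinfo=None)
print(get_gmt(), direct)
```

Both values should agree to within the time it takes to run the two calls.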


Time Zones

ABL functions

ConvertDT() to convert GMT to user’s time

FUNCTION ConvertDT RETURNS DATETIME
    (INPUT pdtNow  AS DATETIME,
     INPUT pcTz-id AS CHARACTER):

    DEFINE VARIABLE dtOut AS DATETIME NO-UNDO.

    /* Find the most recent rule for this time zone that applies on the given date. */
    FIND LAST tz-changes NO-LOCK
        WHERE tz-changes.tz-id    = pcTz-id
          AND tz-changes.tz-date <= DATE(pdtNow) NO-ERROR.
    (...)
    RETURN dtOut.
END FUNCTION.
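The slide elides the body of ConvertDT(), so the following Python sketch is one possible reading of it, not the author's implementation. It rests on loudly stated assumptions: the day code means 1 to 4 = nth Sunday and 5 = last Sunday, the switch times are expressed in GMT, and a from-month of 0 means the zone never uses DST. The sample rows and the ids "BER" and "BNE" are hypothetical.

```python
import calendar
from dataclasses import dataclass
from datetime import date, datetime, timedelta

@dataclass
class TzChange:
    """One row shaped like the tz-changes table (field meanings assumed)."""
    tz_id: str
    tz_date: date      # rule applies from this date
    min_1: int         # normal minutes of difference from GMT
    min_2: int         # minutes of difference from GMT during DST
    from_month: int    # assumption: 0 means the zone never uses DST
    from_day: int      # assumption: 1-4 = nth Sunday, 5 = last Sunday
    from_time: str     # "HH:MM", assumed to be expressed in GMT
    to_month: int
    to_day: int
    to_time: str

def switch_instant(year, month, code, hhmm):
    """Resolve month + assumed day code + time to a naive GMT datetime."""
    sundays = [d for d in calendar.Calendar().itermonthdates(year, month)
               if d.month == month and d.weekday() == calendar.SUNDAY]
    day = sundays[-1] if code == 5 else sundays[code - 1]
    hours, minutes = map(int, hhmm.split(":"))
    return datetime(day.year, day.month, day.day, hours, minutes)

def convert_dt(gmt, tz_id, rules):
    """Convert a naive GMT datetime to the local time of zone tz_id."""
    candidates = [r for r in rules if r.tz_id == tz_id and r.tz_date <= gmt.date()]
    if not candidates:
        raise LookupError(f"no rule for {tz_id}")
    rule = max(candidates, key=lambda r: r.tz_date)   # latest applicable rule
    offset = rule.min_1
    if rule.from_month:
        start = switch_instant(gmt.year, rule.from_month, rule.from_day, rule.from_time)
        end = switch_instant(gmt.year, rule.to_month, rule.to_day, rule.to_time)
        if start < end:
            in_dst = start <= gmt < end
        else:                                          # DST period spans the new year
            in_dst = gmt >= start or gmt < end
        if in_dst:
            offset = rule.min_2
    return gmt + timedelta(minutes=offset)

# Hypothetical sample rows.
rules = [
    TzChange("BER", date(1996, 1, 1), 60, 120, 3, 5, "01:00", 10, 5, "01:00"),
    TzChange("BNE", date(1992, 1, 1), 600, 600, 0, 0, "00:00", 0, 0, "00:00"),
]

print(convert_dt(datetime(2007, 7, 1, 12, 30), "BER", rules))   # 2007-07-01 14:30:00
print(convert_dt(datetime(2007, 1, 15, 12, 30), "BER", rules))  # 2007-01-15 13:30:00
print(convert_dt(datetime(2007, 7, 1, 12, 30), "BNE", rules))   # 2007-07-01 22:30:00
```

Keeping the rules in a dated table, as the tz-changes design does, means a change in a country's DST legislation becomes a new row rather than a code change.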


Agenda

Code Pages Overview
OpenEdge Settings
Common Mistakes
Hints & Tips
Linguistic Sorting and Collation
Time Zones
Summary
Questions


Summary

UTF-8 for database and -cpinternal as a start

Know the code page of data getting into and out of OpenEdge (-cpstream / CONVERT)

Two wrongs may make it look right

It's not only about conversion, but checking results as well: use hexadecimal tools

Take a look at the 10.1B Internationalizing Applications manual

Code Pages are tricky, but fun!
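The "two wrongs" point and the advice to check results in hexadecimal can be reproduced in a few lines. This Python sketch is an illustration outside OpenEdge, using ISO8859-1 (called "latin-1" in Python) and UTF-8 as the two code pages. Bytes read with the wrong code page and written back with the same wrong code page look right again, and only the hex dump shows when data has really been double-encoded.

```python
original = "\u00e9"                       # é

utf8_bytes = original.encode("utf-8")
print(utf8_bytes.hex())                   # c3a9

# First wrong: UTF-8 bytes are read as if they were ISO8859-1.
misread = utf8_bytes.decode("latin-1")
print(misread)                            # Ã©

# Second wrong: the misread text is written back as ISO8859-1.
# The bytes are the original UTF-8 again, so the result "looks right".
round_trip = misread.encode("latin-1")
print(round_trip.decode("utf-8"))         # é

# If the misread text is stored as UTF-8 instead, the data is double-encoded.
double_encoded = misread.encode("utf-8")
print(double_encoded.hex())               # c383c2a9
```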


Questions?


Thank you for your time
