DEV-23: Global Applications and Code Pages
Jordi Sastre, Application Architect, PSC IT
Dec 23, 2015
© 2007 Progress Software Corporation2 DEV-23: Global Applications and Code Pages
Introduction
Global applications need to deal with several languages, countries and time zones
Do’s and don'ts about globalization using OpenEdge® technology
Based on real experience from an IT department
Not a complete review of OpenEdge features
Agenda
Code Pages Overview
OpenEdge Settings
Common Mistakes
Hints & Tips
Linguistic Sorting and Collation
Time Zones
Summary
Questions
Code Pages Overview
A code page is a table that maps characters to numbers (code points)
ASCII was created in 1963 to encode 128 characters (code points 0 to 127) based on the English alphabet
ASCII = “American Standard Code for Information Interchange”
EBCDIC = “Extended Binary Coded Decimal Interchange Code”
8-bit code pages appeared for other languages, encoding up to 256 characters
Code Pages Overview
All code pages include the ASCII encoding in the first 128 code points, except EBCDIC
A single code page does not contain all characters for all languages, except Unicode
A character may have different code points in different code pages
Data may become corrupted when transferred between two different code pages
Data Corruption
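The same byte maps to different characters in different code pages:
• France (ISO8859-1): “è” is stored as E8
• Czech Republic (1250): E8 is displayed as “č”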
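A minimal ABL sketch of the same effect, assuming a session started with -cpinternal UTF-8 so that both characters can be represented. The byte value 232 (E8) is interpreted once as ISO8859-1 and once as 1250, using the three-argument form of CHR(); the variable names are illustrative only.

```abl
/* Sketch: interpret byte E8 (232) under two different source code pages.
   Assumes -cpinternal UTF-8; variable names are hypothetical. */
DEFINE VARIABLE cFromLatin1 AS CHARACTER NO-UNDO.
DEFINE VARIABLE cFrom1250   AS CHARACTER NO-UNDO.

ASSIGN
    cFromLatin1 = CHR(232, SESSION:CPINTERNAL, "ISO8859-1")  /* E8 read as ISO8859-1 */
    cFrom1250   = CHR(232, SESSION:CPINTERNAL, "1250").      /* E8 read as 1250      */

MESSAGE "E8 read as ISO8859-1:" cFromLatin1 SKIP
        "E8 read as 1250:"      cFrom1250
    VIEW-AS ALERT-BOX.
```

The byte is identical in both calls; only the declared source code page changes, which is exactly how data gets corrupted when the wrong code page is assumed.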
Data Corruption
English uses the 128 codes that are common to all code pages, including Unicode
Problems may occur when:
• Handling non-English data
• Using platforms with non-English settings
• Pasting MS Office text, even in English
8-bit Code Pages
ISO8859-1 and ISO8859-2 were defined by ISO and mainly used on Unix systems.
1250 and 1252 were defined by Microsoft and used on MS Windows.
IBM437, IBM850 and IBM852 were defined by IBM and used on PC-DOS/MS-DOS.
8-bit Code Pages
ISO8859-1, IBM850 and 1252 are used for Western European languages:
• Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Portuguese, Spanish, Swedish, etc.
ISO8859-2, IBM852 and 1250 are used for Central European languages:
• Czech, Hungarian, Polish, German, etc.
IBM437 is mainly used for English, although it contains some extra characters
8-bit Code Pages
Examples of character encoding (hexadecimal values):
Char  ISO8859-1  ISO8859-2  1252  1250  IBM437  IBM850  IBM852
a     61         61         61    61    61      61      61
á     E1         E1         E1    E1    A0      A0      A0
È     C8         n/a        C8    n/a   n/a     D4      n/a
Č     n/a        C8         n/a   C8    n/a     n/a     AC
“     n/a        n/a        93    93    n/a     n/a     n/a
8-bit Code Pages
Where to find code page tables:
• 10.1B Internationalizing Applications manual (IBM850 and ISO8859-1)
• http://www.microsoft.com/globaldev/reference/cphome.mspx
• http://www-03.ibm.com/servers/eserver/iseries/software/globalization/codepages.html
• http://en.wikipedia.org
• http://www.fileformat.info/info/charset/index.htm
Unicode
What is Unicode?
“Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.”
http://www.unicode.org/standard/WhatIsUnicode.html
ISO/IEC 10646
It covers virtually ALL characters in the world!
Unicode and UTF
The name Unicode suggests a unique, unified, universal encoding
UTF stands for “Unicode Transformation Format”
UTF is not a code page, but an encoding format for the Unicode code page
UTF encodes Unicode code points into 1 to 4 bytes
UTF-8, UTF-16 and UTF-32 are the three basic encoding forms supported by Unicode
All UTF formats handle all Unicode code points
UTF Encoding Examples
Unicode   UTF-8         UTF-16        UTF-32
U+004D    4D            00 4D         00 00 00 4D
U+00A1    C2 A1         00 A1         00 00 00 A1
U+00E1    C3 A1         00 E1         00 00 00 E1
U+0470    D1 B0         04 70         00 00 04 70
U+4E9C    E4 BA 9C      4E 9C         00 00 4E 9C
U+10302   F0 90 8C 82   D8 00 DF 02   00 01 03 02
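The table can be checked from ABL with a small hex dump. This is only a sketch: it assumes a session with -cpinternal UTF-8 and an OpenEdge release that provides the HEX-ENCODE function, and the variable names are illustrative.

```abl
/* Sketch: show the bytes that "á" (U+00E1) occupies in the session code page.
   Assumes -cpinternal UTF-8 and availability of HEX-ENCODE. */
DEFINE VARIABLE cChar  AS CHARACTER NO-UNDO.
DEFINE VARIABLE rBytes AS RAW       NO-UNDO.

/* Build the character from its ISO8859-1 value so the source file
   encoding does not matter */
cChar = CHR(225, SESSION:CPINTERNAL, "ISO8859-1").

/* Copy the bytes of the string into a RAW, without a trailing null */
PUT-STRING(rBytes, 1, LENGTH(cChar, "RAW")) = cChar.

MESSAGE "Characters:" LENGTH(cChar)        SKIP
        "Bytes:"      LENGTH(cChar, "RAW") SKIP
        "Hex:"        HEX-ENCODE(rBytes)
    VIEW-AS ALERT-BOX.
```

With -cpinternal UTF-8 the hex value should correspond to the UTF-8 column of the U+00E1 row; with a single-byte -cpinternal it shows that code page's encoding instead.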
Unicode Conversion
All code pages convert to Unicode
Unicode may not convert to other code pages
[Diagram: IBM437, IBM850, IBM852, 1250, 1252, ISO8859-1 and ISO8859-2 each convert to Unicode; the conversion from Unicode back to any of them is marked “??”]
OpenEdge Settings
Database settings
• _db._db-xl-name: Database code page
• _db._db-coll-name: Database collation
Startup parameters
• -cpinternal: Process code page
• -cpstream: Input/Output code page
• -cpcoll: Process collation
• -d: Date format
• -E: Numeric format
More OpenEdge Settings
-cplog: Code page for log files (-cpstream)
-cpterm: Code page for screen I/O (-cpstream)
-cpprint: Code page for printing (-cpstream)
-numsep: Separator for thousands (-E)
-numdec: Separator for decimals (-E)
-cprcodein/-cprcodeout: Code page for compiled code (-cpinternal)
-lng: Translation Manager language
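Most of these parameters can be inspected at run time through SESSION attributes. A minimal sketch, assuming it is run from any ABL client; it only reads values and changes nothing:

```abl
/* Sketch: display the additional code page and numeric settings
   of the current session */
MESSAGE "-cplog      = " SESSION:CPLOG                 SKIP
        "-cpterm     = " SESSION:CPTERM                SKIP
        "-cpprint    = " SESSION:CPPRINT               SKIP
        "-cprcodein  = " SESSION:CPRCODEIN             SKIP
        "-cprcodeout = " SESSION:CPRCODEOUT            SKIP
        "-numsep     = " SESSION:NUMERIC-SEPARATOR     SKIP
        "-numdec     = " SESSION:NUMERIC-DECIMAL-POINT
    VIEW-AS ALERT-BOX.
```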
Even More OpenEdge Settings
convmap.cp: Character Processing Tables
progress.ini: Fonts
(More parameters in documentation)
OpenEdge Settings
_db-xl-name, -cpinternal and -cpstream
[Diagram: an OpenEdge process (-cpinternal) connected to the database (_db-xl-name), OS files, printer, keyboard and screen (GUI and CHUI), with -cpstream on the I/O side; OpenEdge code page conversions happen between them]
OpenEdge Settings
[Diagram: DB server (_mprosrv), GUI client (prowin32), CHUI client (_progres), WebSpeed™ (_progres -web) and AppServer™ (_proapsv), each with its own -cpinternal and -cpstream, connected to the database (_db-xl-name), OS files, keyboards, screens, printers and a web browser]
OpenEdge Settings
Since OpenEdge 10 supports UTF-8 in most processes…
… just configure all OE settings to UTF-8 !
Well, not really. We need to look at:
• Operating System
• Web Server
• Printer drivers
• Data from/to other systems
• OCX’s
• Terminal Emulators, etc.
OpenEdge Settings
_db-xl-name (metaschema field)
Database should use Unicode (UTF-8) to ensure support for all characters
OpenEdge Settings
-cpinternal (startup parameter)
Processes should use Unicode to ensure support for all characters
Best if -cpinternal matches database
Batch Client (_progres -b) can use Unicode, but Character Client (_progres) cannot
Interfaces with Windows controls
OpenEdge Settings
-cpstream (startup parameter)
-cpstream is the main cause of data corruption when set incorrectly
It specifies the code page of data read from and written to files
On the Character Client it also specifies the code page of keyboard and screen
Rule of thumb:
• Set -cpstream to match the Operating System code page
• Use ABL to override -cpstream when needed
OpenEdge Settings
-cpstream (startup parameter)
DOS code page
C:\>mode con cp
Status for device CON:
----------------------
 Code page: 437
Unix/Linux code page
% locale charmap
ISO8859-1
OpenEdge Settings
convmap.cp (OpenEdge file)
Contains the Character Processing Tables
DLC/convmap.cp
DLC/prolang/convmap/convmap.dat
OpenEdge 10.1B out of the box contains:
• 54 code pages
• 595 code page conversion tables
• 491 collation tables
More tables can be added
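The tables loaded from convmap.cp can be listed from ABL. A hedged sketch, assuming the GET-CODEPAGES and GET-COLLATIONS functions behave as documented in the ABL reference for the release in use; the counts shown depend on the installed convmap.cp.

```abl
/* Sketch: count the code pages in memory and the collations for UTF-8 */
DEFINE VARIABLE cCodePages  AS CHARACTER NO-UNDO.
DEFINE VARIABLE cCollations AS CHARACTER NO-UNDO.

ASSIGN
    cCodePages  = GET-CODEPAGES
    cCollations = GET-COLLATIONS("UTF-8").

MESSAGE "Code pages available:" NUM-ENTRIES(cCodePages)  SKIP
        "Collations for UTF-8:" NUM-ENTRIES(cCollations)
    VIEW-AS ALERT-BOX.
```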
OpenEdge Settings
progress.ini (OpenEdge file)
Use appropriate fonts for code page and language:
• Recommended to replace MS Sans Serif with Microsoft Sans Serif
• MS Gothic or MS Mincho for Japanese
• MS Song for Chinese
• Use script when needed
font0=Courier New, size=8, script=russian
font0=Courier New, size=8, script=easteurope
Not an OpenEdge Setting
Windows Fonts
Linked fonts
Information about Windows fonts:
http://www.microsoft.com/typography/fonts/default.aspx
http://www.microsoft.com/globaldev/getwr/steps/wrg_font.mspx
OpenEdge Settings
Summary
Meet requirements for Input/Output:
• -cpinternal for process and GUI I/O (UTF-8)
• -cpstream for file I/O and CHUI I/O (OS)
Decide the code page when exporting data
Know the code page when importing data
Common Mistakes
Loading or importing data with the wrong code page
The bytes C4 8C 7A 65 63 68 are displayed as:
• UTF-8: Čzech
• 1250: ÄŚzech
• ISO8859-1: ÄŒzech
Byte Order Mark (BOM)
Identifies the UTF encoding of a data file
Unicode code point U+FEFF
U+FEFF is also encoded:
• UTF-8: EF BB BF
• UTF-16BE: FE FF
• UTF-16LE: FF FE
• UTF-32BE: 00 00 FE FF
• UTF-32LE: FF FE 00 00
OpenEdge understands BOMs when reading
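When it is not known whether a file carries a BOM, the first bytes can be inspected directly. A minimal sketch that checks for the UTF-8 BOM only; the file name "data.txt" is hypothetical, and the whole file is loaded into memory, so this suits small files.

```abl
/* Sketch: check whether a file starts with the UTF-8 BOM (EF BB BF).
   "data.txt" is a hypothetical file name. */
DEFINE VARIABLE mFile   AS MEMPTR  NO-UNDO.
DEFINE VARIABLE lHasBom AS LOGICAL NO-UNDO.

/* Load the raw bytes; a MEMPTR target holds them unconverted */
COPY-LOB FROM FILE "data.txt" TO mFile.

IF GET-SIZE(mFile) >= 3 THEN
    lHasBom = (GET-BYTE(mFile, 1) = 239       /* EF */
           AND GET-BYTE(mFile, 2) = 187       /* BB */
           AND GET-BYTE(mFile, 3) = 191).     /* BF */

MESSAGE "UTF-8 BOM found:" lHasBom VIEW-AS ALERT-BOX.

SET-SIZE(mFile) = 0.   /* release the memory */
```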
Byte Order Mark (BOM)
The bytes EF BB BF C4 8C 7A 65 63 68 (UTF-8 BOM followed by “Čzech”) are read as:
• UTF-8: Čzech
• 1250: Čzech
• ISO8859-1: Čzech
Caution !
Common Mistakes
Loading or importing data with the wrong code page
(…)
"imuller" "Ian Muller" "Y" "C" 1657 283200
"jdoe" "Jane Doe" "N" "U" 3275 450010
"jsmith" "John Smith" "Y" "C" 1450 323700
"jsanchez" "Juan Sánchez" "Y" "C" 4250 323900
.
PSC
filename=users
records=0000000001133
ldbname=mydatabase
timestamp=2007/03/28-20:55:03
numformat=44,46
dateformat=mdy-1950
map=NO-MAP
cpstream=ISO8859-1
.
0000143373
Common Mistakes
Updating data with the wrong code page
OS = 1252; _progres started with -cpstream IBM850 and -cpinternal IBM850
_mprosrv started with -cpinternal ISO8859-1; _db-xl-name = ISO8859-1
• The user types “à”, which the OS delivers as E0
• -cpstream IBM850: E0 is taken to be IBM850, where it means “Ó”
• IBM850 to ISO8859-1 conversion: “Ó” is stored in the database as D3
• Result: “à” has become “Ó”
Common Mistakes
Updating data with the CORRECT code page
OS = 1252; _progres started with -cpstream 1252 and -cpinternal IBM850
_mprosrv started with -cpinternal ISO8859-1; _db-xl-name = ISO8859-1
• The user types “à”, which the OS delivers as E0
• -cpstream 1252 to -cpinternal IBM850: E0 is converted to 85
• IBM850 to ISO8859-1 conversion: 85 is stored in the database as E0
• Result: “à” is stored as “à”
Common Mistakes
Updating data with the wrong code page
_progres -web started with -cpstream UTF-8
Common Mistakes
Incorrect tools to verify data
Notepad sometimes guesses the code page based on the content
Notepad understands BOM, Excel doesn’t
Startup parameters in Procedure Editor
Fonts in progress.ini
Terminal Emulator needs to be configured to support remote OS code page
Use a hexadecimal editor
Two wrongs may make it look right
Tips & Hints
Development and Integration
When starting development, make sure all the components have the correct code page settings
Each application may need different code page settings
When integrating, review the code page settings of all applications and processes involved
Tips & Hints
How to display the code page settings:
MESSAGE "Database = " DBCODEPAGE(1) SKIP
        "Collation = " DBCOLLATION(1) SKIP
        "-cpinternal = " SESSION:CPINTERNAL SKIP
        "-cpstream = " SESSION:CPSTREAM SKIP
        "-cpcoll = " SESSION:CPCOLL SKIP
    VIEW-AS ALERT-BOX.
Tips & Hints
Temp-tables using Word Indexes
Temp-tables use their own word-break tables for word indexes
Use -ttwrdrul parameter
[Diagram: a word-break table is compiled with proutil -C wbreak-compiler; the database uses it through proutil -C word-rules, and Progress clients (prowin32, _progres [-web]) use it through -ttwrdrul]
Tips & Hints
Input/Output
When using OUTPUT TO, know the code page the output must be converted to, which depends on how the file will be used
When using INPUT FROM, know the code page in which the imported data was encoded
To override the -cpstream default:
OUTPUT TO file CONVERT TARGET "UTF-8".
INPUT FROM file CONVERT SOURCE "UTF-8".
Stamp code page, especially for integration
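A short sketch of both directions. The file names and the "codepage=" stamp line are hypothetical conventions chosen for illustration, not an OpenEdge format; the point is that the code page is stated explicitly instead of inherited from -cpstream.

```abl
/* Sketch: read a file known to be encoded in 1250 and re-export it as UTF-8.
   File names and the "codepage=" stamp are hypothetical. */
DEFINE VARIABLE cLine AS CHARACTER NO-UNDO.

DEFINE TEMP-TABLE ttLine NO-UNDO
    FIELD iSeq  AS INTEGER
    FIELD cText AS CHARACTER
    INDEX idxSeq IS PRIMARY iSeq.

DEFINE VARIABLE iSeq AS INTEGER NO-UNDO.

/* Import: declare the source code page, whatever -cpstream is */
INPUT FROM VALUE("customers-cz.txt") CONVERT SOURCE "1250".
REPEAT:
    IMPORT UNFORMATTED cLine.
    iSeq = iSeq + 1.
    CREATE ttLine.
    ASSIGN ttLine.iSeq  = iSeq
           ttLine.cText = cLine.     /* now held in the -cpinternal code page */
END.
INPUT CLOSE.

/* Export: declare the target code page and stamp it in the first line */
OUTPUT TO VALUE("customers-utf8.txt") CONVERT TARGET "UTF-8".
PUT UNFORMATTED "codepage=UTF-8" SKIP.
FOR EACH ttLine:
    PUT UNFORMATTED ttLine.cText SKIP.
END.
OUTPUT CLOSE.
```

Characters that -cpinternal cannot represent are lost at import time, so this round trip is only safe when the session code page covers the source data (for example -cpinternal UTF-8).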
Tips & Hints
UTF-8 can be multi-byte!
Many UTF-8 characters are more than one byte:
DEFINE VARIABLE c AS CHARACTER INIT "á".
MESSAGE LENGTH(c) SKIP
        LENGTH(c,"RAW")
    VIEW-AS ALERT-BOX.
returns 1 and 2 when run with -cpinternal UTF-8
Tips & Hints
CHR() and ASC()
Use CHR() and ASC() with code page parameters
Do not hard-code encoding values
See examples…
Tips & Hints
CHR() and ASC() – Example 1
Detecting non-breaking blank spaces (NBSP)
CASE SESSION:CPINTERNAL:
    WHEN "UTF-8" THEN
        IF c = CHR(49824) THEN MESSAGE "NBSP" VIEW-AS ALERT-BOX.
    WHEN "ISO8859-1" THEN
        IF c = CHR(160) THEN MESSAGE "NBSP" VIEW-AS ALERT-BOX.
END CASE.
Better code:
IF c = CHR(49824,SESSION:CPINTERNAL,"UTF-8") THEN
    MESSAGE "NBSP" VIEW-AS ALERT-BOX.
Tips & Hints
CHR() and ASC() – Example 2
OpenEdge silently ignores incorrect values passed to ASC() or CHR()
/* When run with -cpinternal UTF-8 it returns YES because 160 is not
   a valid UTF-8 encoding. When run with -cpinternal 1252 it returns NO. */
MESSAGE CHR(160) = "" VIEW-AS ALERT-BOX.
/* Always returns NO */
MESSAGE CHR(49824,SESSION:CPINTERNAL,"UTF-8") = "" VIEW-AS ALERT-BOX.
Tips & Hints
CHR() and ASC() – Example 3
CHR() and ASC() work with encoding values, as opposed to code points
For example, this code run on a session with -cpinternal UTF-8
DEFINE VARIABLE c AS CHARACTER NO-UNDO.
c = "á".
MESSAGE ASC(c) VIEW-AS ALERT-BOX.
returns 50081 (C3A1) and not 225 (00E1).
Unicode   UTF-8
U+00E1    C3 A1
Tips & Hints
Unicode code points
If needed, Unicode code points can be used:
DEFINE VARIABLE c AS CHARACTER NO-UNDO.
c = "á".
MESSAGE c = "~u00E1" SKIP
        c = CHR(50081) SKIP
        c = CHR(225,"UTF-8","1252")
    VIEW-AS ALERT-BOX.
Tips & Hints
Un-corrupting data
[Diagram: on an OS using 1252, _progres runs with -cpstream IBM850 and -cpinternal IBM850; “à” is typed as E0, taken as IBM850 “Ó”, and stored as D3 by _mprosrv (-cpinternal ISO8859-1, _db-xl-name ISO8859-1)]
Tips & Hints
Un-corrupting data
ISO8859-1 database with data encoded in IBM850
Run on session with -cpinternal ISO8859-1
FOR EACH myTable EXCLUSIVE-LOCK:
    RUN FixChar(INPUT-OUTPUT myTable.myField).
END.
PROCEDURE FixChar:
    DEF INPUT-OUTPUT PARAM c AS CHAR NO-UNDO.
    c = CODEPAGE-CONVERT(c,"IBM850","ISO8859-1").
END PROCEDURE.
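Because running the fix twice corrupts the data again, it may be worth previewing the result before updating anything. A read-only sketch using the same hypothetical myTable.myField and the same CODEPAGE-CONVERT call, under the same -cpinternal ISO8859-1 assumption:

```abl
/* Sketch: preview stored versus repaired values without updating the database */
FOR EACH myTable NO-LOCK:
    DISPLAY
        myTable.myField
            FORMAT "x(30)" LABEL "Stored"
        CODEPAGE-CONVERT(myTable.myField, "IBM850", "ISO8859-1")
            FORMAT "x(30)" LABEL "Fixed".
END.
```

If the "Fixed" column looks worse than the "Stored" column, the records were probably not corrupted in this way, or have already been repaired.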
Tips & Hints
BOM
How to output UTF-8 BOM to a file
Intended for Notepad (.txt) or web browser (.html)
OUTPUT TO text.txt CONVERT TARGET "UTF-8".
PUT CONTROL "~357~273~277". /* BOM */
PUT UNFORMATTED "UTF-8 text".
OUTPUT CLOSE.
Tips & Hints
Web Browser
Web browser needs to map WebSpeed’s -cpstream
[Diagram: _progres -web with -cpstream UTF-8 sends its output to the web browser, which does not know the encoding (“Encoding ???”)]
Original outputHeader procedure:
PROCEDURE outputHeader:
    output-content-type ("text/html").
END PROCEDURE.
Tips & Hints
Web browser needs to map WebSpeed’s -cpstream (1)
Use OpenEdge’s convcp.p procedure
PROCEDURE outputHeader:
    DEF VAR cMimeCP AS CHAR NO-UNDO.
    RUN adecomm/convcp.p(SESSION:CPSTREAM, "ToMime", OUTPUT cMimeCP).
    output-content-type ("text/html; charset=" + cMimeCP).
END PROCEDURE.
Tips & Hints
Web browser needs to map WebSpeed’s -cpstream (2)
Use a user-defined function
GetMimeCP converts OpenEdge code page names to MIME names
See example…
PROCEDURE outputHeader:
    output-content-type ("text/html; charset=" + GetMimeCP(SESSION:CPSTREAM)).
END PROCEDURE.
Tips & Hints
GetMimeCP example
FUNCTION GetMimeCP RETURNS CHAR (INPUT progress-CodePage AS CHAR):
    DEF VAR pro-cplist AS CHAR INIT
"1250,1251,1252,1253,1254,1255,1256,1257,1258,620-2533,BIG-5,EUCJIS,GB2312,IBM037,IBM273,IBM277,IBM278,IBM284,IBM297,IBM437,IBM500,IBM850,IBM851,IBM852,IBM857,IBM858,IBM861,IBM862,IBM866,ISO8859-1,ISO8859-10,ISO8859-15,ISO8859-2,ISO8859-3,ISO8859-4,ISO8859-5,ISO8859-6,ISO8859-7,ISO8859-8,ISO8859-9,KOI8-R,KSC5601,ROMAN-8,SHIFT-JIS,UCS2,UTF-8".
    DEF VAR MIME-cplist AS CHAR INIT
"Windows-1250,Windows-1251,Windows-1252,Windows-1253,Windows-1254,Windows-1255,Windows-1256,Windows-1257,Windows-1258,TIS-620,Big5,EUC-JP,GB_2312-80,IBM037,IBM273,IBM277,IBM278,IBM284,IBM297,IBM437,IBM500,IBM850,IBM851,IBM852,IBM857,IBM00858,IBM861,IBM862,IBM866,ISO-8859-1,ISO-8859-10,ISO-8859-15,ISO-8859-2,ISO-8859-3,ISO-8859-4,ISO-8859-5,ISO-8859-6,ISO-8859-7,ISO-8859-8,ISO-8859-9,KOI8-R,KS_C_5601-1987,hp-roman8,Shift_JIS,UTF-16,UTF-8".
    DEF VAR i AS INT.
    i = LOOKUP(progress-CodePage,pro-cplist).
    RETURN IF i = 0 THEN "Unknown" ELSE ENTRY(i,MIME-cplist).
END FUNCTION.
Tips & Hints
Caution with numeric format
Do not store decimal values in char fields
prog2.p will fail if run with a different -E or -numdec than prog1.p
Comma-delimited lists
/* prog1.p */
DEFINE VARIABLE d AS DECIMAL INIT 123.45.
CREATE table.
table.char1 = STRING(d).
/* prog2.p */
FIND FIRST table.
DISPLAY DECIMAL(table.char1).
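Where a decimal really has to travel as text (an interface file, for instance), one option is to pin the numeric format for the duration of the conversion. The helper names DecToText and TextToDec are hypothetical; this is a sketch that always uses the American format for the stored text and restores the session's own separators afterwards.

```abl
/* Sketch: convert decimals to and from text independently of -E / -numdec.
   DecToText and TextToDec are hypothetical helper names. */
FUNCTION DecToText RETURNS CHARACTER (INPUT pdValue AS DECIMAL):
    DEFINE VARIABLE cSep  AS CHARACTER NO-UNDO.
    DEFINE VARIABLE cDec  AS CHARACTER NO-UNDO.
    DEFINE VARIABLE cText AS CHARACTER NO-UNDO.

    /* Remember the session separators so they can be restored */
    ASSIGN cSep = SESSION:NUMERIC-SEPARATOR
           cDec = SESSION:NUMERIC-DECIMAL-POINT.

    SESSION:NUMERIC-FORMAT = "American".
    cText = STRING(pdValue).
    SESSION:SET-NUMERIC-FORMAT(cSep, cDec).

    RETURN cText.
END FUNCTION.

FUNCTION TextToDec RETURNS DECIMAL (INPUT pcText AS CHARACTER):
    DEFINE VARIABLE cSep   AS CHARACTER NO-UNDO.
    DEFINE VARIABLE cDec   AS CHARACTER NO-UNDO.
    DEFINE VARIABLE dValue AS DECIMAL   NO-UNDO.

    ASSIGN cSep = SESSION:NUMERIC-SEPARATOR
           cDec = SESSION:NUMERIC-DECIMAL-POINT.

    SESSION:NUMERIC-FORMAT = "American".
    dValue = DECIMAL(pcText).
    SESSION:SET-NUMERIC-FORMAT(cSep, cDec).

    RETURN dValue.
END FUNCTION.

/* Usage: the text is the same whatever the session numeric format is */
MESSAGE DecToText(123.45) SKIP TextToDec("123.45") VIEW-AS ALERT-BOX.
```

Storing the value in a DECIMAL field remains the simpler choice whenever the schema allows it.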
Tips & Hints
Date and Numeric formats can be changed at run time
DEFINE VARIABLE mynum AS DECIMAL NO-UNDO.
SESSION:DATE-FORMAT = "mdy".
DISPLAY SESSION:DATE-FORMAT TODAY SKIP.
SESSION:DATE-FORMAT = "dmy".
DISPLAY SESSION:DATE-FORMAT TODAY FORMAT "99-99-9999" SKIP.
SESSION:DATE-FORMAT = "ymd".
DISPLAY SESSION:DATE-FORMAT TODAY FORMAT "9999.99.99" SKIP.
mynum = 12345.67.
SESSION:NUMERIC-FORMAT = "American".
DISPLAY SESSION:NUMERIC-FORMAT STRING(mynum) SKIP.
SESSION:NUMERIC-FORMAT = "European".
DISPLAY SESSION:NUMERIC-FORMAT STRING(mynum) SKIP WITH NO-LABELS.
Tips & Hints
Miscellaneous
Never use the “undefined” code page
If the source and target code pages are the same, no conversion happens
If we always make the same mistake we’ll not notice the data corruption
r-code is encoded using -cpinternal
Source files are encoded using -cpstream
Recognize UTF-8 read as ISO8859-1:
• ö becomes Ã¶
Tips & Hints
DBA reminder
How to create a UTF-8 word-break table:
> proutil -C wbreak-compiler %DLC%\prolang\convmap\utf8-bas.wbt 1
> copy proword.1 %DLC%
How to create a UTF-8 database:
> prodb <db> %DLC%\prolang\utf\empty.db
> proutil <db> -C word-rules 1
How to start a UTF-8 client:
> _progres -b -cpinternal UTF-8 -ttwrdrul 1
> prowin32 -cpinternal UTF-8 -ttwrdrul 1
Linguistic Sorting and Collation
Collation: Set of rules for ordering and comparing character data
OpenEdge supports 54 ICU (International Components for Unicode) collations with UTF-8
Local databases vs global databases
COMPARE and COLLATE
Linguistic Sorting and Collation
Sorting with Basic collation
FOR EACH mytable BY myfield:
    DISPLAY myfield WITH FONT 8.
END.
Basic
Aaa
Ááá
Äää
Ççç
Ĉĉĉ
Bbb
Ccc
Zzz
Linguistic Sorting and Collation
Sorting with English collation
FOR EACH mytable BY COLLATE(myfield,"CASE-INSENSITIVE","ICU-UCA"):
    DISPLAY myfield WITH FONT 8.
END.
Basic   ICU-UCA
Aaa     Aaa
Ááá     Ááá
Äää     Äää
Ççç     Bbb
Ĉĉĉ     Ccc
Bbb     Ĉĉĉ
Ccc     Ççç
Zzz     Zzz
Linguistic Sorting and Collation
Sorting with Finnish collation
FOR EACH mytable BY COLLATE(myfield,"CASE-INSENSITIVE","ICU-fi"):
    DISPLAY myfield WITH FONT 8.
END.
Basic   ICU-UCA   ICU-fi
Aaa     Aaa       Aaa
Ááá     Ááá       Ááá
Äää     Äää       Bbb
Ççç     Bbb       Ccc
Ĉĉĉ     Ccc       Ĉĉĉ
Bbb     Ĉĉĉ       Ççç
Ccc     Ççç       Zzz
Zzz     Zzz       Äää
Linguistic Sorting and Collation
Comparing with Basic collation
FOR EACH mytable WHERE myfield >= "C" BY myfield:
    DISPLAY myfield WITH FONT 8.
END.
Basic
Ccc
Zzz
Linguistic Sorting and Collation
Comparing with English collation
FOR EACH mytable
    WHERE COMPARE(myfield,">=","C","CASE-INSENSITIVE","ICU-UCA")
    BY COLLATE(myfield,"CASE-INSENSITIVE","ICU-UCA"):
    DISPLAY myfield WITH FONT 8.
END.
Basic   ICU-UCA
Ccc     Ccc
Zzz     Ĉĉĉ
        Ççç
        Zzz
Linguistic Sorting and Collation
Comparing with Finnish collation
FOR EACH mytable
    WHERE COMPARE(myfield,">=","C","CASE-INSENSITIVE","ICU-fi")
    BY COLLATE(myfield,"CASE-INSENSITIVE","ICU-fi"):
    DISPLAY myfield WITH FONT 8.
END.
Basic   ICU-UCA   ICU-fi
Ccc     Ccc       Ccc
Zzz     Ĉĉĉ       Ĉĉĉ
        Ççç       Ççç
        Zzz       Zzz
                  Äää
Linguistic Sorting and Collation
Global Setup
[Diagram: the database and the AppServer run with -cpcoll ICU-uca; each client runs with its own collation (English user: -cpcoll ICU-en, French user: -cpcoll ICU-fr, Czech user: -cpcoll ICU-cs, Finnish user: -cpcoll ICU-fi) and exchanges TEMP-TABLES with the AppServer, which uses the client collation in COMPARE and COLLATE]
RUN ASprg.p ON hAppServer
    (INPUT SESSION:CPCOLL,
     INPUT USERID,
     INPUT <other parameters>,
     OUTPUT TABLE ttMytable).
Caution with performance!
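The server side of such a call might look like the sketch below. Everything here is an assumption made for illustration: the parameter names, the single-field layout of ttMytable, and the omission of the other parameters. It also assumes the AppServer runs with -cpinternal UTF-8 so that the ICU collation named by the client is available.

```abl
/* ASprg.p (sketch): sort with the collation passed in by the client.
   Parameter names and the ttMytable layout are hypothetical. */
DEFINE TEMP-TABLE ttMytable NO-UNDO
    FIELD iOrder  AS INTEGER
    FIELD myfield AS CHARACTER
    INDEX idxOrder IS PRIMARY iOrder.

DEFINE INPUT  PARAMETER pcCollation AS CHARACTER NO-UNDO.
DEFINE INPUT  PARAMETER pcUserId    AS CHARACTER NO-UNDO.
DEFINE OUTPUT PARAMETER TABLE FOR ttMytable.

DEFINE VARIABLE iOrder AS INTEGER NO-UNDO.

/* COLLATE cannot use an index, so every row is read and then sorted */
FOR EACH mytable NO-LOCK
    BY COLLATE(mytable.myfield, "CASE-INSENSITIVE", pcCollation):
    iOrder = iOrder + 1.
    CREATE ttMytable.
    ASSIGN ttMytable.iOrder  = iOrder      /* preserves the linguistic order */
           ttMytable.myfield = mytable.myfield.
END.
```

Storing an explicit sequence number lets the client read the temp-table back in the user's linguistic order without sorting again; the full scan is the performance cost that the slide warns about.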
Time Zones
Considerations
Timestamps: client vs server vs GMT
Display time: saved vs converted
Database queries: saved vs converted
http://www.csgnetwork.com/timezonemap.html
Time Zones
Extra consideration
Daylight Saving Time for time conversions
[Map legend: DST used / DST no longer used / DST never used]
http://en.wikipedia.org/wiki/Daylight_saving_time
Time Zones
OS Support
Operating Systems have time zone tables
• Solaris: /usr/share/lib/zoneinfo
• HP-UX: /usr/lib/tztab
• Red Hat: /usr/share/zoneinfo
• Windows: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Time Zones
Java uses its own time zone tables
OpenEdge relies on the platform
Time Zones
DATETIME and DATETIME-TZ data types
DEFINE VARIABLE dt AS DATETIME.
DEFINE VARIABLE dtz AS DATETIME-TZ.
dt = NOW.
dtz = NOW.
MESSAGE dt SKIP dtz VIEW-AS ALERT-BOX.
The value shown at the end of a DATETIME-TZ is an offset, not a Time Zone !
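Since a DATETIME-TZ only carries an offset in minutes, moving a value to GMT is a matter of re-expressing it at offset 0. A minimal sketch using the DATETIME-TZ and TIMEZONE functions; variable names are illustrative.

```abl
/* Sketch: express the current local time at offset +00:00 (GMT) */
DEFINE VARIABLE dtzLocal AS DATETIME-TZ NO-UNDO.
DEFINE VARIABLE dtzGmt   AS DATETIME-TZ NO-UNDO.

dtzLocal = NOW.
dtzGmt   = DATETIME-TZ(dtzLocal, 0).   /* same instant, offset of 0 minutes */

MESSAGE "Local:"            dtzLocal           SKIP
        "Offset (minutes):" TIMEZONE(dtzLocal) SKIP
        "GMT:"              dtzGmt
    VIEW-AS ALERT-BOX.
```

The offset says nothing about Daylight Saving Time rules, so converting a past or future timestamp for a given user still needs time zone data from elsewhere.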
Time Zones
Timestamping
[Diagram: the AppServer gets the OS time in GMT, so all times in the database are GMT; GMT is converted to the user’s time zone on the way to the user]
Time Zones
Displaying times
[Diagram: the database holds GMT times; the AppServer converts GMT to each user’s time zone]
12:30 GMT is displayed as:
User location         Summer   Winter
Bedford, USA          08:30    07:30 (-1)
Berlin, Germany       14:30    13:30 (-1)
Brisbane, Australia   22:30    22:30 (0)
Sydney, Australia     22:30    23:30 (+1)
Time Zones
Database tables
users
   10 user-id     C  X(8)        User ID
   20 tz-id       C  X(4)        Time zone ID
timezones
   10 tz-id       C  X(4)        Time zone ID
   20 tz-name     C  X(40)       Time zone name
tz-changes
   10 tz-id       C  X(4)        Time zone ID
   20 tz-date     D  99/99/9999  Date that the changes apply from
   30 min-1       I  ->>>9       Normal minutes of difference from GMT
   40 min-2       I  ->>>9       Minutes of difference from GMT during DST
   50 from-month  I  >9          Month when DST starts
   60 from-day    I  9           Code for day when DST starts
   70 from-time   C  99:99       Time when DST starts
   80 to-month    I  >9          Month when DST ends
   90 to-day      I  9           Code for day when DST ends
  100 to-time     C  99:99       Time when DST ends
Time Zones
ABL functions
GetGMT() to get current time in GMT
FUNCTION GetGMT RETURNS DATETIME ():
    DEF VAR dtGMT AS DATETIME NO-UNDO.
    dtGMT = ADD-INTERVAL(NOW,- TIMEZONE,'MINUTES').
    RETURN dtGMT.
END FUNCTION.
Time Zones
ABL functions
ConvertDT() to convert GMT to user’s time
FUNCTION ConvertDT RETURNS DATETIME
    (INPUT pdtNow AS DATETIME,
     INPUT pcTz-id AS CHARACTER):
    DEF VAR dtOut AS DATETIME NO-UNDO.
    FIND LAST tz-changes NO-LOCK
        WHERE tz-changes.tz-id = pcTz-id
          AND tz-changes.tz-date <= DATE(pdtNow) NO-ERROR.
    (...)
    RETURN dtOut.
END FUNCTION.
Summary
UTF-8 for database and -cpinternal as a start
Know the code page of data getting into and out of OpenEdge (-cpstream / CONVERT)
Two wrongs may make it look right
It’s not only about conversion, but checking results as well: use hexadecimal tools
Take a look at the 10.1B Internationalizing Applications manual
Code Pages are tricky, but fun !
Thank you for your time