Unicode Support in SAP Web Application Server
Dr. Christian Hansen, Matthias Mittelstein
Server Technology Internationalization, SAP AG
2002 SAP Labs, LLC, WAS203, Christian Hansen 2
Agenda
Scripts, Characters and Code Pages
  • Conventional code pages
  • Unicode
Unicode in the Web Application Server
  • Front End
  • Communication
  • Application server
  • Database
  • Printing
Conversion to Unicode
  • C and C++ programs
  • ABAP programs
  • Database
  • System landscape
Availability and Release Planning
Summary
Scripts, Characters and Code Pages: Languages
English, German, Turkish
Danish, Dutch, Finnish, French, Italian, Norwegian, Portuguese, Spanish, Swedish, Icelandic
Croatian, Czech, Hungarian, Polish, Rumanian, Slovakian, Slovene
Russian, Ukrainian
Greek
Hebrew
Thai
Korean
Japanese
Chinese, Taiwanese
Conventional Code Pages: ISO8859-1
[Figure: the same set of languages as on the "Languages" slide, marking those that ISO8859-1 covers.]
Conventional Code Pages: ISO8859-5
[Figure: the same set of languages as on the "Languages" slide, marking those that ISO8859-5 covers.]
Conventional Code Pages: ISO8859-7
[Figure: the same set of languages as on the "Languages" slide, marking those that ISO8859-7 covers.]
Conventional Code Pages: ISO8859-8
[Figure: the same set of languages as on the "Languages" slide, marking those that ISO8859-8 covers.]
Conventional Code Pages: Shift-JIS
[Figure: the same set of languages as on the "Languages" slide, marking those that Shift-JIS covers.]
Conventional Code Pages: GB2312
[Figure: the same set of languages as on the "Languages" slide, marking those that GB2312 covers.]
Conventional Code Pages: KSC5601-1992
[Figure: the same set of languages as on the "Languages" slide, marking those that KSC5601-1992 covers.]
Conventional Code Pages: Earlier SAP approaches
Earlier SAP approaches used a fixed mapping
• from database to code page: "single code page system"
• from application server to code page: "MNLS" (obsolete)
  - R/3 releases 2.2F to 3.0C
• from language to code page: "MDMP" (current SAP technique for using multiple code pages)
  - Using tables TCP0C, TCP0D, TCPDB
Conventional Code Pages: Earlier SAP approaches (MDMP)
[Figure: the same data shown in a West European view, a Japanese view and a Korean view.]
Conventional Code Pages: Earlier SAP approaches
Earlier SAP approaches had disadvantages when the number of concurrent system code pages increased:
• ambiguous data when accessing across a code page boundary
• data encoding hard to understand for non-SAP programs
• sophisticated programming techniques needed to handle data in the appropriate code page
• each user limited to her own language
• …
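The "ambiguous data" problem can be illustrated with a small Python sketch. It is not SAP code: the codec names are Python's and merely stand in for the code pages named on the earlier slides. The same two stored bytes turn into different text depending on which code page the reader assumes.

```python
# Two bytes as they might sit in a database field that carries no encoding tag.
stored = bytes([0xE4, 0xF6])

# Python codec names used as stand-ins for the conventional code pages.
CODE_PAGES = ["latin_1", "iso8859_5", "iso8859_7", "iso8859_8"]


def views(raw):
    """Decode the same raw bytes under each candidate code page."""
    return {code_page: raw.decode(code_page) for code_page in CODE_PAGES}


for code_page, text in views(stored).items():
    # Print code points rather than glyphs so the output works on any console.
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in text)
    print(f"{code_page:10} {code_points}")
```

Each code page yields a different pair of characters, so without extra information (such as a language key) a reader cannot tell which interpretation was intended.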
Solution: Unicode
[Figure: the same set of languages as on the "Languages" slide, all covered by Unicode.]
And more languages can be supported easily without the need for new code pages or other new methods.
Solution: Unicode characters
[Figure: layout of the Unicode code space]
• 65,000 characters: ASCII, General Scripts, Symbols, CJK Ideographs, Hangul, Compatibility, Surrogate Area
• Additional 1,000,000 characters
Representation of Unicode Characters

Character         Unicode scalar value   UTF-16 big endian   UTF-16 little endian   UTF-8
a                 U+0061                 00 61               61 00                  61
ä                 U+00E4                 00 E4               E4 00                  C3 A4
α                 U+03B1                 03 B1               B1 03                  CE B1
(CJK ideograph)   U+3479                 34 79               79 34                  E3 91 B9

UTF-16 – Unicode Transformation Format, 16 bit encoding
• Fixed length, 1 character = 2 bytes (surrogate pairs = 2 + 2 bytes)
• Platform-dependent byte order (big/little endian)
• 2 byte alignment restriction
UTF-8 – Unicode Transformation Format, 8 bit encoding
• Variable length, 1 character = 1...4 bytes
• Platform independent
• No alignment restriction
• 7 bit US ASCII compatible
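The byte sequences on this slide can be reproduced with Python's built-in codecs. The following is a minimal sketch; the helper names `hex_bytes` and `row` are made up for this illustration, and U+10000 is added as an assumed extra sample to show a surrogate pair.

```python
# Sample characters: the four from the slide plus one beyond U+FFFF.
SAMPLES = ["a", "\u00e4", "\u03b1", "\u3479", "\U00010000"]


def hex_bytes(text, encoding):
    """Return the encoded bytes of text as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


def row(ch):
    """Scalar value, UTF-16 big endian, UTF-16 little endian and UTF-8 bytes."""
    return (
        f"U+{ord(ch):04X}",
        hex_bytes(ch, "utf-16-be"),
        hex_bytes(ch, "utf-16-le"),
        hex_bytes(ch, "utf-8"),
    )


for ch in SAMPLES:
    print(" | ".join(row(ch)))
```

The last sample needs 2 + 2 bytes in UTF-16 (a surrogate pair) and 4 bytes in UTF-8, which matches the length rules listed above.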
Unicode in the SAP Web Application Server
Front-end
  non-Unicode:
  • SAP GUI
  Unicode:
  • Release 6.20 GUIs use UTF-8 for communication and UTF-8 and UTF-16 internally
  • WinGUI 6.30 will use UTF-16 internally
Communication
  non-Unicode:
  • RFC and other: implicit reinterpretations and memcopy-based loopholes create ambiguous data
  Unicode:
  • RFC, XML and other: code page conversions on character data are explicit and mandatory
Appl. Server
  non-Unicode:
  • ASCII, DBCS, EBCDIC, MDMP, blended code pages, …
  Unicode:
  • UTF-16
Database
  non-Unicode:
  • ASCII, DBCS, EBCDIC, MDMP, ...
  • Or only bytes?
  Unicode:
  • UTF-16 or CESU-8, chosen by DBMS vendor
Printing
  non-Unicode:
  • One standard printer type for each code page
  • UTF-8 printer for all code pages
  Unicode:
  • UTF-8 printer to cover all characters
  • Normal printers restricted to local texts with reduced character set
Unicode in the Front-end: SAPGUI
Use a 6.20 SAPGUI when working on a Unicode system.
a) Frontend code page is set to 4110 (UTF-8)
b) Multibyte functionality is activated (see note 508854)
The feature to use Unicode is built into the SAPGUIs. No separate executable or separate installation is necessary.
6.20 SAPGUIs use UTF-8 for communication when in Unicode mode.
[Screenshots a) and b)]
SAPGUI for the Windows Environment
WinGUI 6.20
• delivered
• Screenshot: Version 6.20 Revision 2 Patch level 18
• Standard installation
SAPGUI for the Java Environment
JavaGUI 6.20
• delivered
• Screenshot: Version 6.20 Revision 4
• Standard installation
• Plus modified C:\program files\JavaSoft\JRE\1.3.1_02\lib\font.properties
SAPGUI for the HTML Environment
WebGUI
• prototype
• Version 6.20
• Remember: WAS 6.20 comes with ITS 6.10
Unicode and RFC (1)
RFC non-Unicode – non-Unicode
• Long running standard
• Receiver converts code pages, if it can
RFC Unicode – Unicode
• No problem
RFC Unicode – non-Unicode
• The Unicode side converts from/to the old code page
• In Unicode <-> MDMP communication, data is interpreted using language key information
• System settings allow possible conversion problems to be caught or ignored
• Applications have to readjust fields if structured data has been transported while stored in character containers
When the data has a complicated structure, RFC has used XML and UTF-8 since release 4.6C.
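The choice between catching and ignoring conversion problems can be sketched in Python. This is an illustration only, not the RFC library: the function `to_legacy` and its `on_error` values are invented here, and Python codec names stand in for the receiver's code page.

```python
def to_legacy(text, code_page, on_error="catch"):
    """Convert Unicode text to a legacy code page.

    on_error="catch"  raises UnicodeEncodeError for unmappable characters.
    on_error="ignore" substitutes '?' for them and carries on.
    """
    errors = "strict" if on_error == "catch" else "replace"
    return text.encode(code_page, errors=errors)


# Every character of this text exists in Latin-1, so nothing is lost.
print(to_legacy("Gr\u00fc\u00dfe", "latin_1"))

# Greek letters are not in Latin-1: either substitute them ...
print(to_legacy("\u03b1\u03b2", "latin_1", on_error="ignore"))

# ... or surface the problem to the caller.
try:
    to_legacy("\u03b1\u03b2", "latin_1")
except UnicodeEncodeError as exc:
    print("conversion problem:", exc.reason)
```

Ignoring keeps the transfer running but silently loses information, whereas catching keeps the data intact at the price of an error the application must handle.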
Unicode and RFC (2)
• A Unicode receiver can receive all characters (solid lines).
• A non-Unicode receiver cannot receive characters that are not in its own code page. But as long as you restrict the character set, data can be sent from everywhere to everywhere (dotted lines).
Unicode in the application servers
Application servers use UTF-16
• not UCS-4 or UTF-32, because that would be too expensive (memory, bandwidth)
• not UTF-8, because the dynamic length and offset would be too complicated (CPU consumption, robustness of applications)
• ABAP uses UTF-16, like Java
Kernel has only one C / C++ source
Applications have only one ABAP source
Additional hardware requirement:
• CPU 30-35%
• Memory 50%
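The trade-off behind this choice can be made concrete with a short Python sketch. The sample strings and helper names are assumptions for illustration, and the byte counts say nothing about the hardware figures quoted above.

```python
SAMPLES = {
    "English": "Unicode support",
    "German": "Gr\u00f6\u00dfe \u00e4ndern",
    "Greek": "\u03b1\u03b2\u03b3\u03b4",
    "Japanese": "\u65e5\u672c\u8a9e",
}


def sizes(text):
    """Bytes needed for text in UTF-8, UTF-16 and UTF-32 (no byte order mark)."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


def utf16_offset(char_index):
    """Byte offset of a character in UTF-16, ignoring surrogate pairs: one multiplication."""
    return 2 * char_index


def utf8_offset(text, char_index):
    """Byte offset of a character in UTF-8: all preceding characters must be examined."""
    return len(text[:char_index].encode("utf-8"))


for name, text in SAMPLES.items():
    print(name, len(text), "characters", sizes(text))
```

UTF-32 always costs four bytes per character, while UTF-8 offsets depend on the content that comes before them. UTF-16 sits in between: moderate size and, outside surrogate pairs, offsets computable from the character index alone.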
Unicode in the databases
Databases use UTF-16 or CESU-8 * internally
• Hidden by database client software library
• The library interface to the kernel uses UTF-16.
Additional hardware requirement:
• UTF-8/CESU-8 36%
• UTF-16 60-70%

Database         Encoding
SAPDB 7.0        UTF-16
Informix         - not supported -
DB/2 390         later: UTF-8
DB/2 400         UTF-16
DB/2 6000        UTF-8
MS SQL Server    UTF-16
Oracle           CESU-8

* CESU-8 is similar to UTF-8, but when binary sorting is used, it gives the same result as binary sorting on UTF-16BE. The difference between UTF-8 and CESU-8 is visible only for surrogate pairs.
See note 379940 for current status.
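The footnote on CESU-8 can be checked with a small Python sketch. Python has no built-in CESU-8 codec, so the `cesu8` helper below is a hand-written illustration built from the definition (each UTF-16 code unit is encoded separately with the UTF-8 bit pattern); the sample strings are assumptions.

```python
import struct


def cesu8(text):
    """Encode text as CESU-8: every UTF-16 code unit is encoded on its own,
    so a surrogate pair becomes 3 + 3 bytes instead of one 4-byte sequence."""
    utf16 = text.encode("utf-16-be")
    units = struct.unpack(f">{len(utf16) // 2}H", utf16)
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)


# One ASCII letter, one high BMP character, one character beyond U+FFFF.
words = ["\uff61", "\U00010000", "a"]

by_utf8 = sorted(words, key=lambda w: w.encode("utf-8"))
by_utf16 = sorted(words, key=lambda w: w.encode("utf-16-be"))
by_cesu8 = sorted(words, key=cesu8)

print("CESU-8 of U+10000:", cesu8("\U00010000").hex())
print("CESU-8 order equals UTF-16BE order:", by_cesu8 == by_utf16)
print("UTF-8 order equals UTF-16BE order:", by_utf8 == by_utf16)
```

For text without surrogate pairs, `cesu8` produces exactly the UTF-8 bytes. The two encodings only diverge, in bytes and in binary sort order, once characters beyond U+FFFF appear.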
Unicode and printers: LEXMARK UTF-8 printer
Printer of choice when connected to a Unicode system
Can handle any single language in an MDMP system
Can print mixed languages in an MDMP system, when SAPscript is used carefully. (See scan of printout.)
Conversion of C and C++ programs
SAP: C and C++ programs, the kernel (done)
• Modify sources (with tools ccU and ccQ++ and some manual changes). Replace
    char    with SAP_UC  or SAP_RAW
    strcmp  with strcmpU or strcmpR
    fopen   with fopenU
    fread   with freadU  or freadR
• Generate backward compatible non-Unicode kernels and UTF-16 kernels from the same sources.
• Suggest char16_t and u"String" to standards organizations.
• Use ICU in Unicode systems for locale-dependent functionality. The International Components for Unicode (ICU) is a C and C++ library that provides Unicode support functionality. ICU is a collaborative, open-source development project. It is licensed under the X License. For more details see http://oss.software.ibm.com/icu/index.html. There is also a parallel ICU for Java project: http://oss.software.ibm.com/icu4j/.
Customer: external RFC applications in C or C++
• Modify sources (with tools ccU and ccQ++ and some manual changes; see the SAP RFC Software Development Kit documentation for further information)
Conversion of ABAP programs
ABAP character data types (C, N, D, T, STRING) are automatically Unicode in a Unicode system.
• The major part of ABAP coding is ready for Unicode without any changes
• A minor part of ABAP coding has to be adapted to comply with Unicode restrictions (→ Workshop ABAP217)
Use UCCHECK to activate the Unicode syntax check and view problematic places.
Do runtime tests to detect semantic changes in the application. Screen runtime tests with the ABAP Coverage Analyzer SCOV.
Conversion of ABAP programs: UCCHECK
Conversion of ABAP programs: SCOV
Conversion of the database: Single Code Page System
Building up a Unicode system requires converting all character data in the system to Unicode.
In a single code page system the code page of the character data is unambiguous.
The conversion is done with R3load:
• R3load export (conversion to Unicode is done)
• R3load import (Unicode data is imported)
Downtime may be reduced with IMIG, an incremental conversion with minimal downtime (in development).
Unicode database conversion and downtime
[Figure: conversion from the non-Unicode system via an export file to the Unicode system. Labels: Online conversion time, Conversion of rest, Downtime.]
Conversion of the database: MDMP System
In MDMP systems the code page of the character data is ambiguous and has to be derived from secondary information:
• Scan database with R/3 transaction SPUMG
  - Find explicit language fields and hidden language fields
  - Recognize typical characters, recognize language-dependent words
  - Classify tables and report problematic data
• Enhance database or give hints
The conversion is done with R3load:
• R3load multi-code page export
• R3load import
• Post-conversion repair with SUMG (to correct wrong table classifications)
(plus IMIG)
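The idea of deriving the code page from a language field can be sketched in Python. This is not how SPUMG or R3load are implemented: the mapping table, the language keys and the function `convert_field` are hypothetical, and Python codec names stand in for SAP code pages.

```python
# Hypothetical language-key-to-codec table; a real system would take this
# information from its own configuration.
CODE_PAGE_BY_LANGUAGE = {
    "DE": "latin_1",
    "RU": "iso8859_5",
    "EL": "iso8859_7",
    "JA": "shift_jis",
}


def convert_field(language_key, raw):
    """Decode one text field using the code page implied by its language key.

    Returns None when the language is unknown or the bytes do not fit the
    code page, so the row can be reported as problematic data.
    """
    code_page = CODE_PAGE_BY_LANGUAGE.get(language_key)
    if code_page is None:
        return None
    try:
        return raw.decode(code_page)
    except UnicodeDecodeError:
        return None


rows = [
    ("DE", "Gr\u00fc\u00dfe".encode("latin_1")),
    ("EL", "\u03b1\u03b2".encode("iso8859_7")),
    ("JA", "\u65e5\u672c".encode("shift_jis")),
    ("XX", b"abc"),  # no language information: needs a hint
]

for language_key, raw in rows:
    text = convert_field(language_key, raw)
    print(language_key, "problematic" if text is None else "converted")
```

Rows that come back as `None` correspond to the "problematic data" that has to be enhanced or given hints before the export, or repaired afterwards.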
Unicode Conversion: System landscapes
Convert systems one by one:
• RFC and other communication between non-Unicode and Unicode systems can do code page conversions where necessary
• Convert data destinations first (only Unicode systems can receive all data)
• Treat external files as systems of their own and determine their code page. Convert files once or set up conversion during reading.
• If you had systems separated by code pages, they can be migrated into a single system now
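The two options for external files, converting once or converting during reading, can be sketched in Python as follows. The function names and file names are invented for the example, and the source code page is assumed to be known.

```python
import os
import tempfile


def convert_file_once(src, dst, src_code_page):
    """Rewrite a legacy text file as UTF-8, line by line."""
    with open(src, encoding=src_code_page, newline="") as fin, \
            open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(line)


def read_converting(path, code_page):
    """Leave the file untouched and convert while reading."""
    with open(path, encoding=code_page, newline="") as f:
        return f.read()


# Small demonstration with a throwaway Latin-1 file.
with tempfile.TemporaryDirectory() as tmp:
    legacy = os.path.join(tmp, "legacy.txt")
    converted = os.path.join(tmp, "converted.txt")
    with open(legacy, "wb") as f:
        f.write("Gr\u00fc\u00dfe\n".encode("latin_1"))

    convert_file_once(legacy, converted, "latin_1")
    same = read_converting(legacy, "latin_1") == read_converting(converted, "utf-8")
    print("both routes give the same text:", same)
```

Converting once suits files that only Unicode systems will read from now on. Converting during reading keeps the file usable for non-Unicode systems that still depend on it.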
Unicode Conversion and Release planning
[Figure: upgrade and conversion paths between releases. Systems: R/3 4.6D (NU), SAP Web AS 6.20 (NU), SAP Web AS 6.20 Unicode, SAP Web AS 6.30 Unicode. Path labels: New installation, Unicode conversion, Upgrade, Upgrade, Upgrade with Unicode conversion?]
Unicode enabled mySAP.com Status and Planning
mySAP CRM
• SAP CRM 3.0: selected stand-alone
• SAP CRM 4.0: complete
mySAP SCM
• SAP SC Event Manager 1.1
• SAP SCM 4.01
mySAP Enterprise Portals
• SAP EP 5.0: ISO-LATIN1
• SAP EP 6.0: Unicode
mySAP BI
• SAP BW 3.1
SAP R/3 Enterprise
mySAP Exchanges
• SAP XI 2.0
Unicode Roll-Out is started
All the availability dates and release schedules given (see note 79991) are based on SAP internal planning only and, thus, may be subject to change.
Summary
After this lecture you know
• the benefits of Unicode
• SAP's implementation of Unicode in the Web Application Server
• the steps necessary to convert an existing SAP system into a Unicode system
• the availability dates of already Unicode-enabled mySAP.com products
Further Information
• Service Marketplace:
  Technical information: http://service.sap.com/Unicode@SAP
  Customer contact: http://service.sap.com/Unicode
• Public Web:
  www.sap.com
• Related Workshop at SAP TechEd 2002
  Unicode Enabling of ABAP Programs:
  Tue., 4:15 PM - 6:15 PM, 391
  Wed., 4:15 PM - 6:15 PM, 298 / 299
• Related Lectures at SAP TechEd 2002
  Global Solutions: Legal Requirements, Languages, Unicode:
  Tue., 2:45 PM - 3:45 PM, 398 / 399
  Wed., 5:45 PM - 6:45 PM, 350 / 351
Q&A
Questions?
Feedback
Please complete your session evaluation and drop it in the box on
your way out.
Be courteous — deposit your trash, and do not take the handouts for the
following session.
The SAP TechEd ’02 New Orleans Team
No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice.
Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.
Microsoft®, WINDOWS®, NT®, EXCEL®, Word®, PowerPoint® and SQL Server® are registered trademarks of Microsoft Corporation.
IBM®, DB2®, DB2 Universal Database, OS/2®, Parallel Sysplex®, MVS/ESA, AIX®, S/390®, AS/400®, OS/390®, OS/400®, iSeries, pSeries, xSeries, zSeries, z/OS, AFP, Intelligent Miner, WebSphere®, Netfinity®, Tivoli®, Informix and Informix® Dynamic ServerTM are trademarks of IBM Corporation in USA and/or other countries.
ORACLE® is a registered trademark of ORACLE Corporation.
UNIX®, X/Open®, OSF/1®, and Motif® are registered trademarks of the Open Group.
Citrix®, the Citrix logo, ICA®, Program Neighborhood®, MetaFrame®, WinFrame®, VideoFrame®, MultiWin® and other Citrix product names referenced herein are trademarks of Citrix Systems, Inc.
HTML, DHTML, XML, XHTML are trademarks or registered trademarks of W3C®, World Wide Web Consortium, Massachusetts Institute of Technology.
JAVA® is a registered trademark of Sun Microsystems, Inc.
JAVASCRIPT® is a registered trademark of Sun Microsystems, Inc., used under license for technology invented and implemented by Netscape.
MarketSet and Enterprise Buyer are jointly owned trademarks of SAP Markets and Commerce One.
SAP, SAP Logo, R/2, R/3, mySAP, mySAP.com and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and in several other countries all over the world. All other product and service names mentioned are trademarks of their respective companies.
Copyright 2002 SAP AG. All Rights Reserved