Standardization of Internationalized Domain Name at IETF

Post on 12-Jan-2016


Standardization of Internationalized Domain Name at IETF

24 Jan 2002

Yoshiro YONEYA <yone@nic.ad.jp>

JPNIC

24 Jan 2002 APAN2002 Conference 2

What is IDN?

• Internationalized Domain Name.
  – Current domain names are represented with ASCII alphanumeric and hyphen characters.
  – IDN is the technical challenge of representing domain names with not only ASCII but also non-ASCII characters.
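As a minimal sketch of the restriction described above, the check below tests whether a label uses only ASCII letters, digits and hyphen. The function name `is_ldh_label` is hypothetical and chosen for illustration only.

```python
import string

# Letters, digits and hyphen: the repertoire the slide describes for
# current (ASCII) domain names.
LDH = set(string.ascii_letters + string.digits + "-")


def is_ldh_label(label: str) -> bool:
    """Return True if a non-empty label contains only ASCII LDH characters."""
    return bool(label) and all(ch in LDH for ch in label)


print(is_ldh_label("jpnic"))
print(is_ldh_label("日本語"))
```

A label that fails this check is the kind of name that IDN sets out to support.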


What is Internationalization?

• A framework to extend the character repertoire for domain names.
• It needs to be a global standard so that global communication is not lost.
• The IETF IDN (Internationalized Domain Name) WG is doing the work.
• The word ‘Multilingualization’ causes some confusion.
  – Characters are just one component of a language.
  – A multilingual domain name is a service-level aspect.


Internationalized Domain Names

华人.公司.cn
華人.商業.tw
高島屋.会社.jp
삼성.회사.kr
三星.회사.kr
م. االهرام
viagénie.qc.ca
קום.ישראל
ทีเอชนิค.พาณิชย์.ไทีย์
現代.com
ヤフー.com

http://www.jdna.jp/activities/event/jdn-tutorial/IDNSDK.pdf


Why IDN?

• Increase in Internet users who are not familiar with English.
  – IDNs are easier to memorize, type in, etc.
• Drastic changes in how domain names are used.
  – A domain name is now used not only as a host name but also as a signboard.
• Creates new business opportunities.
  – Many ventures have begun services.


Drawback of IDN

• Loses global acceptability at the end-user interface.
  – Non-ASCII characters are hard to type in or display without appropriate I/O devices and/or software.
• Impacts operations.
  – Requires software updates and/or additional processing.
  – Deployment issues.


History of IDN WG

• Established in Jan 2000.
  – Discussion is mainly done on the mailing list.
• Held its 1st meeting at the 47th IETF in Adelaide.
  – Since then, it has met at every IETF.
• Decided the WG’s solution at the last (52nd) IETF.
  – IDNA, NAMEPREP and Punycode (formerly known as AMC-ACE-Z).
  – Waiting for WG last call.


Scope and priority of IDN WG

• Provide a standard.
  – So as not to divide the global connectivity and communication of the Internet.
• Backward compatibility.
  – Compatibility with the current DNS and application protocols, to work with the current Internet infrastructure.
• No localization.
  – Independent of particular regions, countries and/or languages.
  – Refer to existing universal standards.
  – A common framework essential to internationalization.


IDNA (Internationalizing Domain Names In Applications)

draft-ietf-idn-idna-06.txt

• An architecture that specifies how to process IDNs.
  – Use Unicode, which is upward compatible with ASCII, as the character code set.
  – Normalize the internal representation of characters that have multiple code points (such as upper/lower case, full-width/half-width and composing characters) into a single representation so that matching does not fail.
  – Represent non-ASCII characters that are input or displayed at the user interface as an ASCII Compatible Encoding (ACE) string on the network.
  – These processes are performed in application software.
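The conversion between the user-interface form and the ACE form can be sketched with Python's built-in `idna` codec. This is only an illustration under one assumption that differs from the slides: the codec implements the later published IDNA specification, so it emits the `xn--` prefix rather than the draft-era prefixes shown in this talk. The helper names `to_ace` and `from_ace` are hypothetical.

```python
def to_ace(name: str) -> bytes:
    """Convert a Unicode domain name to its ASCII Compatible Encoding."""
    # The codec prepares (normalizes) each label and then encodes it.
    return name.encode("idna")


def from_ace(ace: bytes) -> str:
    """Convert an ACE domain name back to Unicode for display."""
    return ace.decode("idna")


ace = to_ace("日本語.jp")
print(ace)            # ASCII-only bytes, suitable for the network layer
print(from_ace(ace))  # the Unicode form shown at the user interface
```

Both steps run inside the application, which matches the point that the DNS infrastructure itself only ever sees ASCII.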


Important point of IDNA

• The representation at the user interface layer differs from that at the network layer.
  – For ASCII domain names the two are the same.
• Application-level solution.
  – Least impact on the Internet infrastructure.


Image of the IDNA

[Figure: IDNA processing in the end system. Components: User, UI, Application, API, Resolver, Internal Representation (Local / Int’l), Application servers, DNS servers. Conversions on the paths to the servers: To/From Unicode, NAMEPREP, To/From ACE.]


NAMEPREP (Stringprep Profile for Internationalized Host Names)

draft-ietf-idn-nameprep-07.txt

• A profile of STRINGPREP (Preparation of Internationalized Strings).
  – draft-hoffman-stringprep-00.txt
• Some scripts, such as the alphabet, have multiple representations of a character.
  – Domain names are case insensitive.
• A normalization process that unifies strings that are the same in meaning or display into a single representation.
  – Case (upper / lower)
  – Compatibility characters (full / half width)
  – Composing characters


Important point of NAMEPREP

• Normalize the representation of an internationalized domain name string so that it matches correctly.
  – ‘a’ vs ‘A’
  – ‘u’+‘¨’ vs ‘ü’
  – ‘ア’ vs ‘ｱ’


Processes in NAMEPREP

1. map
  • Case folding of upper/lower characters (UTR#21)
2. normalize
  • Normalize the representation of the string (UAX#15 NFKC)
3. prohibit
  • Check for characters that are inappropriate in a domain name.
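The three steps above can be sketched with Python's standard library. This is a simplified illustration, not the NAMEPREP draft itself: `str.casefold()` stands in for the mapping tables, and the prohibited set used here (separators, control and unassigned characters) is an assumption made for the example. The name `nameprep_sketch` is hypothetical.

```python
import unicodedata


def nameprep_sketch(label: str) -> str:
    """Apply map, normalize and prohibit to a single label (simplified)."""
    # 1. map: fold upper/lower case to one form.
    mapped = label.casefold()
    # 2. normalize: NFKC unifies full/half-width and composed forms.
    normalized = unicodedata.normalize("NFKC", mapped)
    # 3. prohibit: reject characters unsuitable for a domain name.
    #    Illustrative rule only: any separator (Z*) or "other" (C*) category.
    for ch in normalized:
        if unicodedata.category(ch)[0] in ("Z", "C"):
            raise ValueError(f"prohibited character U+{ord(ch):04X}")
    return normalized


print(nameprep_sketch("A") == nameprep_sketch("a"))            # case
print(nameprep_sketch("u\u0308") == nameprep_sketch("\u00fc"))  # composing
print(nameprep_sketch("\uff71") == nameprep_sketch("\u30a2"))   # half/full width
```

The three comparisons correspond to the case, composing-character and width examples listed for NAMEPREP; the composing example uses the combining diaeresis U+0308.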


ACE (ASCII Compatible Encoding)

• Represents non-ASCII characters with ASCII characters.
  – Easy to apply to the current DNS.
  – Least impact on current applications.
• Decreases the maximum number of characters in each label.
  – The penalty of using only 5 bits to represent 8-bit data.
  – Requires some sort of compression algorithm.
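The size penalty can be made concrete with a rough calculation. The sketch assumes the DNS limit of 63 octets per label, a four-character ACE identifier such as the ones in the samples later in this talk, a plain 5-bits-per-character encoding with no compression, and 16 bits per non-ASCII character. All variable names are illustrative.

```python
MAX_LABEL_OCTETS = 63      # DNS limit on the length of one label
ACE_ID_LENGTH = 4          # e.g. a prefix of the form "xx--" (assumption)
BITS_PER_ACE_CHAR = 5      # each ASCII character carries 5 bits of payload
BITS_PER_CHARACTER = 16    # assumed size of one uncompressed character

payload_chars = MAX_LABEL_OCTETS - ACE_ID_LENGTH
payload_bits = payload_chars * BITS_PER_ACE_CHAR
max_characters = payload_bits // BITS_PER_CHARACTER

print(payload_chars)   # ASCII characters left for the encoded name
print(payload_bits)    # bits of payload they can carry
print(max_characters)  # non-ASCII characters that fit without compression
```

Under these assumptions only 18 characters fit in a label, which is why a compression step matters for practical IDNs.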


ACE Identifier

• Requires an explicit ACE identifier (ACE-ID).
  – For reverse conversion.
  – The choice of ACE-ID is a political issue.
• The ACE-ID is itself an ASCII string, so whenever an ACE-ID is proposed, names using it get registered as ASCII domain names.
• This actually happened in gTLDs.
• IANA will assign the ACE-ID.


Criteria of ACE selection

• Simple algorithm.
  – For ease of implementation.
  – Interoperability.
• Effective compression for practical IDNs.
  – To accommodate as many characters as possible.
• One-to-one correspondence between encoding and decoding.
  – To avoid alternative encoded representations of one IDN.
  – Security consideration.


Comparison of ACE proposals

Encoding sample of ‘日本語ドメイン名試験.JP’

RACE      BQ--3BS6KZZMRKPDBSJQ4EYKIMHTKQGYUZU2CM.JP
Punycode  ZQ--ECKWD4C7C777U7MWO4BOV4JIOAU09J.JP

[Figure: Evaluation result from existing Japanese JP domain names]


Punycode

draft-ietf-idn-punycode-00.txt

• The ACE selected by the IDN WG.
• Compression algorithm.
  – Extract characters in ascending order of code point.
  – Encode the difference in code point from the previously processed character, together with the position, into an integer.
  – Extract letters, digits and hyphen as bootstring.
• ASCII conversion algorithm.
  – Introduces a new concept named ‘Generalized variable-length integers’.
  – BASE36 (A-Z, 0-9).


Compression process of Punycode (simplified for understanding)

• “文字列例”
• Compression.
  1. 1:U+6587 2:U+5B57 3:U+5217 4:U+4F8B
  2. (sort, diff) 4:0x4F8B 3:0x28C 2:0x440 1:0xA30
  3. (to integer: diff*chars+position) 0x13E30 0xA33 0x1102 0x28C1
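The simplified compression steps can be sketched as follows. This mirrors only the slide's simplified description (sort, difference, then diff*chars+position), not the real Punycode algorithm, and the function name `simplified_compress` is hypothetical. Computed this way, the third difference comes out as 0x940 (0x5B57 - 0x5217) and the third integer as 0x2502, where the slide lists 0x440 and 0x1102; the other values match the slide.

```python
def simplified_compress(text: str) -> list:
    """Sort by code point, take differences, fold in the 1-based position."""
    count = len(text)
    # Pair every code point with its original position, then sort ascending.
    pairs = sorted((ord(ch), pos) for pos, ch in enumerate(text, start=1))
    integers = []
    previous = 0
    for codepoint, position in pairs:
        diff = codepoint - previous            # distance from previous character
        integers.append(diff * count + position)
        previous = codepoint
    return integers


print([hex(value) for value in simplified_compress("文字列例")])
```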


Generalized variable-length integers of Punycode

• 12345 in decimal is represented as 1*10^4+2*10^3+3*10^2+4*10^1+5*10^0

• The digit in every place is 0-9, so in a run of digits such as 12345 one cannot tell whether it means 123 and 45, or 1234 and 5.

• Furthermore, 012345 and 12345 are the same value with different representations.

• GVLI (Generalized variable-length integers) is an idea to solve this problem.

• It defines a threshold for each place, and treats a digit below the threshold as a delimiter.

• The threshold is an appropriate number smaller than the base.


Encoding process of Punycode (simplified for understanding)

• Assign A-Z0-9 to GVLI.
  – Assume 36 for the base and 10, 18, 25, 25 for the thresholds.
  1. 0x13E30 0xA33 0x1102 0x28C1
  2. OIUD = 24*1+18*26(=1*(36-10))+30*468(=26*(36-18))+13*5148(=468*(36-25))
  3. BS4 = 11*1+28*26+4*468
  4. CN8 = 12*1+23*26+8*468
  5. XML = 33*1+22*26+21*468
• “文字列例” => “OIUDBS4CN8XML”.
  – Real Punycode generates “FSQW5D78MBSK”.
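The worked arithmetic above can be reproduced with a short sketch of generalized variable-length integers. It assumes the digit assignment implied by the slide's sums (0-9 for values 0 to 9, A-Z for values 10 to 35) and assumes that the last threshold, 25, repeats for any further places. The function names are hypothetical, and this is the simplified scheme of the slide rather than the full Punycode algorithm.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # digit value -> character
BASE = 36
THRESHOLDS = [10, 18, 25, 25]  # from the slide; last value assumed to repeat


def threshold(place: int) -> int:
    return THRESHOLDS[min(place, len(THRESHOLDS) - 1)]


def gvli_encode(value: int) -> str:
    """Encode one integer; the final digit is below its threshold."""
    out = []
    place = 0
    while True:
        t = threshold(place)
        if value < t:
            out.append(DIGITS[value])   # a digit below t ends the integer
            return "".join(out)
        out.append(DIGITS[t + (value - t) % (BASE - t)])
        value = (value - t) // (BASE - t)
        place += 1


def gvli_decode_all(text: str) -> list:
    """Split a run of digits back into integers without any separators."""
    values, value, weight, place = [], 0, 1, 0
    for ch in text:
        digit = DIGITS.index(ch)
        value += digit * weight
        t = threshold(place)
        if digit < t:                   # delimiter digit: integer complete
            values.append(value)
            value, weight, place = 0, 1, 0
        else:
            weight *= BASE - t
            place += 1
    return values


integers = [0x13E30, 0xA33, 0x1102, 0x28C1]
encoded = "".join(gvli_encode(v) for v in integers)
print(encoded)
print(gvli_decode_all(encoded) == integers)
```

Because a digit below the threshold marks the end of each integer, the concatenated string can be split back into the four integers unambiguously.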


Standardization of IDN is just the starting point of utilization

• End users use IDNs through application software.
  – Web, mail, etc.
• IDNA requires support in applications.
• How to handle IDNs in application protocols must be defined.

Standardization of IDN does not mean that it is ready to use. It is just a starting point for applications to incorporate the new features.


HTTP Request (DNS resolve only)

[Figure: User → Web browser: http://ジェーピーニック.JP/; DNS query: ZQ--HCKQZ9BZB1CYRB.JP; DNS answer: Web server’s IP address; request sent to the Web server (below); response: Error!]

GET http://ジェーピーニック.JP/ HTTP/1.1
Host: ジェーピーニック.JP
Referer: http://ジェーピーニック.JP/


HTTP Request (ACE in HTTP header)

[Figure: User → Web browser: http://ジェーピーニック.JP/; DNS query: ZQ--HCKQZ9BZB1CYRB.JP; DNS answer: Web server’s IP address; request sent to the Web server (below); response: Contents]

GET http://ZQ--HCKQZ9BZB1CYRB.JP/ HTTP/1.1
Host: ZQ--HCKQZ9BZB1CYRB.JP
Referer: http://ZQ--HCKQZ9BZB1CYRB.JP/
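A client that follows the second pattern converts the host name to ACE before it writes any protocol header. The sketch below illustrates this with Python's built-in `idna` codec, which is an assumption relative to the slides: it produces the later `xn--` prefix instead of the draft-era `ZQ--`. The function name `build_request` is hypothetical.

```python
def build_request(host: str, path: str = "/") -> str:
    """Build a minimal HTTP/1.1 request that carries the host name as ACE."""
    # Convert once, then use the ASCII form everywhere in the protocol.
    ace_host = host.encode("idna").decode("ascii")
    return (
        f"GET http://{ace_host}{path} HTTP/1.1\r\n"
        f"Host: {ace_host}\r\n"
        "\r\n"
    )


request = build_request("ジェーピーニック.jp")
print(request)
```

The request then contains only ASCII, so a server that knows nothing about IDN can still match the Host header.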


References

• IETF IDN WG Web page
  – http://www.i-d-n.net/
• Unicode Consortium
  – http://www.unicode.org/


Acknowledgement

• Telecommunications Advancement Organization of Japan (TAO).
  – JPNIC’s research activity on the security investigation of IDN is a part of TAO’s research.
  – http://www.shiba.tao.go.jp/


IDN Compliant clients & implementations

• Mozilla
  – http://playground.i-dns.net/mozilla/index.html
  – Plug-in to Mozilla, resolution using RACE
• Opera
  – http://www.opera.com/
  – Native, resolution using RACE
• Internet Explorer 5 or higher
  – http://www.microsoft.com/windows/ie/default.asp
  – Uses a keyword search engine as RACE converter
• mDNkit
  – http://www.nic.ad.jp/jp/research/idn/mdnkit/download/
  – Open-source toolkit for developing IDN-compliant software
