Problems with Non-roman Character (Korean) Searching

Prepared by Young Ki Lee
Senior Cataloging Specialist
Korean/Chinese Team, RCCD
Library of Congress

Slide 2
Topics to be covered
1. Non-roman script (Korean) searching under CJK data fields without spacing
2. No unified index (normalization) between Hangul (Korean) and Hancha (Chinese characters)
3. Microsoft Korean IME
4. Display of search results
5. CJK Compatibility Database

Slide 3
Title Word Search (search term: the Korean word for "the border")
- The number of hits on this ti: search is 363.
- The ratio of relevant hits is only 13% (13 out of 99) in the 1st group (Books 1970-1993).
- Records that have the word in any position in the title fields, including between subfields, are picked up by the system (e.g., titles of the form ... : ... / ..., etc.).
- In Voyager (currently with spacing), the same search (tkey) retrieves only 9 hits.

Slide 4
[Screenshot: Search9]

Slide 5
Title Word Search (search term: the Korean word for "the border")
- The number of hits on this ti: search is 360.
- The ratio of relevant hits is only 13% (13 out of 99) in the 1st group (Books 1970-1993).
- Records that have the word in any position in the title fields, including between subfields, are picked up by the system (e.g., titles of the form ... : ... / ..., etc.).
- In Voyager (currently with spacing), the same search (tkey) retrieves only 9 hits.

Slide 11
[Screenshot]

Slide 12
Title Word Search (search term: the Korean word for "the border")
- The number of hits on this ti: search is 360.
- The ratio of relevant hits is only 13% (13 out of 99) in the 1st group (Books 1970-1993).
- Records that have the word in any position in the title fields, including between subfields, are retrieved (e.g., titles of the form ... = ... / ..., etc.).
- In the LC Online Catalog (currently with spacing), a title word search retrieves only 9 hits.

Slide 13
Title Word Search (search term: the Korean word for "philology")
- In OCLC, the number of hits on the ti: search is 308.
- The ratio of relevant hits is only 37% (36 out of 95) in the first group (Books 1900-1991).
- Retrieved titles include forms such as ... = ... / ..., etc.
- In Voyager (currently with spacing), the same search (tkey) retrieves 32 hits.

Slide 14
Title Word Search (search term: the Korean name of an ancient Korean country)
- The search retrieves irrelevant records, such as titles of the form ... = ... / ..., including CD-ROM titles, etc.

Slide 15
[Screenshot]

Slide 16
[Screenshot]

Slide 17
[Screenshot]

Slide 18
[Screenshot: Kochoson8]

Slide 19
[Screenshot: komunso1]
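
The gap between the large ti: result sets and Voyager's 9 hits can be illustrated with a minimal Python sketch. It assumes, hypothetically, that an unspaced index drops spaces and subfield punctuation and then matches the search word anywhere, while a spaced index matches only whole space-delimited words. The titles and function names are invented for illustration; this is not how OCLC or Voyager actually implement their indexes. A third variant, also hypothetical, still ignores spaces but never matches across a subfield boundary.

```python
import re

# Hypothetical title strings; ":", "/" and "=" stand in for subfield punctuation.
TITLES = [
    "국경과 경계",             # the word appears as a separate spaced word
    "경계선 연구",             # the word is the start of a longer compound
    "도시 풍경 : 계절의 기록",  # 경 (end of 풍경) and 계 (start of 계절) meet only across ":"
    "경제 계획",               # the two syllables never occur side by side
]

SPACES_AND_PUNCT = re.compile(r"[\s:/=]+")
SUBFIELD_PUNCT = re.compile(r"\s*[:/=]\s*")


def unspaced_substring_hits(word, titles):
    """Assumed unspaced index: strip spaces and punctuation, then match anywhere."""
    return [t for t in titles if word in SPACES_AND_PUNCT.sub("", t)]


def spaced_word_hits(word, titles):
    """Assumed spaced index: match only whole space-delimited words."""
    return [t for t in titles if word in SPACES_AND_PUNCT.split(t)]


def per_subfield_substring_hits(word, titles):
    """Hypothetical mitigation: ignore spaces, but search each subfield separately."""
    return [
        t for t in titles
        if any(word in part.replace(" ", "") for part in SUBFIELD_PUNCT.split(t))
    ]


if __name__ == "__main__":
    for search in (unspaced_substring_hits, spaced_word_hits, per_subfield_substring_hits):
        print(search.__name__, search("경계", TITLES))
```

With the search word 경계 (a Korean word for "boundary", chosen only for illustration), the unspaced variant retrieves titles 1 to 3, including the cross-subfield false hit in title 3; the spaced variant retrieves only title 1; and the per-subfield variant keeps the compound in title 2 but drops title 3.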
Slide 20
[Screenshot: Komunso2]

Slide 21
[Screenshot: Komunso3]

Slide 22
Title Word Search (search term: the Korean for "Korean Economy"; ti: search)
- Searches on four different non-roman forms of the term retrieve 300, 652, 3, and 0 hits.
- Search Hanguk kyongje: 1,490 hits.
- Title phrase search for the term: ti= search.

Slide 23
Title Word Search (search term: the Korean for "Korean Economy"; ti: search)
- Searches on four different non-roman forms of the term retrieve 295, 652, 3, and 0 hits.
- Search Hanguk kyongje: 1,490 hits.
- Title phrase search for the term: ti= search.

Slide 27
Title Phrase Search (search term: the Korean for "Korean Economy"; ti: search)
- Searches on four different non-roman forms of the term retrieve 295, 652, 3, and 0 hits.
- Search Hanguk kyongje: 1,490 hits.
- Search with #: 461 hits (ti: ... AND ti: ...).
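
A unified (normalized) index, the thing topic 2 says is missing, would file the Hangul, Hancha, and spaced variants of a term under one key. The minimal Python sketch below takes 한국경제 / 韓國經濟 (romanized Hanguk kyongje, "Korean economy") as an illustrative term and uses a hypothetical four-character Hancha-to-Hangul table that covers only this term. A real index would need a complete reading table and a policy for characters with more than one Korean reading. NFC normalization is applied first so that compatibility ideographs fold to their unified forms.

```python
import unicodedata

# Hypothetical reading table covering only the example term; not a complete Hancha list.
HANCHA_TO_HANGUL = str.maketrans({"韓": "한", "國": "국", "經": "경", "濟": "제"})


def unified_key(term: str) -> str:
    """Fold a search term to a single index key.

    1. NFC folds CJK compatibility ideographs to their unified code points.
    2. Hancha are replaced with their Hangul readings (from the table above).
    3. Spaces are removed, so spaced and unspaced forms collide.
    """
    term = unicodedata.normalize("NFC", term)
    return term.translate(HANCHA_TO_HANGUL).replace(" ", "")


if __name__ == "__main__":
    for form in ("한국경제", "한국 경제", "韓國經濟", "韓國 經濟"):
        print(form, "->", unified_key(form))
```

All four forms map to the same key, so a search on any one of them would reach the records indexed under the others.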

Slide 28
Search ti: nodongja or ... or ... or ...

Slide 29

Slide 30
Korean IME Problems
1. Personal name search with an invalid character from the Korean IME
- Search in pn: with U+F9E1: 0 hits. U+F9E1 is an invalid character from the Korean IME.
- Search in pn: with U+674E: 157 hits. U+674E is a valid MARC 21 character.
2. Title search with an invalid character from the Korean IME
- Search in ti: with U+F941: 0 hits. U+F941 is an invalid character from the Korean IME.
- Search in ti: with U+8AD6: 21,393 hits. U+8AD6 is a valid MARC 21 character.
3. Korean family name
- No MARC 21 equivalent.

Slide 31
Display Order
1. Browse search: sorted by Unicode value (number, roman, Japanese, Hancha, Hangul)
2. Keyword search: sorted in alphabetical order of the romanized form (number, then romanization)
3. Display order: character by character on the designated value

Slide 32
sort2

Unicode | Total strokes | Radical (#: strokes)
9280    | 14            | 167 (gold): 8
9580    | 8             | 169 (gate): 8
990A    | 15            | 184 (eat): 6
9B42    | 14            | 194 (ghost): 10
AC00    |               |

Slide 33
[Screenshot: sort3]

Slide 34
Display Order
1. Browse search: sorted by Unicode value (number, roman, Japanese, Hancha, Hangul)
2. Keyword search: sorted in alphabetical order of the romanized form (number, then romanization)
3. Display order: character by character on the designated value, NOT word by word

Slide 35

Slide 36
[Screenshot: sort1]
Unicode values: C9C4, CE68, C911, C778
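
The browse order listed on slide 31 (number, roman, Japanese, Hancha, Hangul) is what a plain code-point sort produces. Digits and Latin letters are in the ASCII range, kana start at U+3040, CJK unified ideographs occupy U+4E00 to U+9FFF, and Hangul syllables occupy U+AC00 to U+D7A3. Python compares strings character by character by code point, so a minimal sketch needs no custom collation. The headings and their romanized forms below are hypothetical. The keyword order is modeled, as an assumption, by sorting on a romanized form stored alongside each heading.

```python
# Hypothetical headings, each paired with a hypothetical romanized form.
HEADINGS = [
    ("한국", "Hanguk"),
    ("韓國", "Hanguk"),
    ("かんこく", "Kankoku"),
    ("Korea", "Korea"),
    ("1990", "1990"),
]


def browse_order(headings):
    """Sort on the vernacular string itself, i.e. by Unicode code point, character by character."""
    return [heading for heading, _ in sorted(headings, key=lambda pair: pair[0])]


def keyword_order(headings):
    """Sort on the romanized form: numbers first, then romanization A-Z (sort is stable)."""
    return [heading for heading, _ in sorted(headings, key=lambda pair: pair[1])]


if __name__ == "__main__":
    print(browse_order(HEADINGS))   # number, roman, Japanese, Hancha, Hangul
    print(keyword_order(HEADINGS))  # number, then by romanization
```

Because comparison stops at the first differing code point, this is character-by-character ordering rather than word-by-word filing. The Unicode column on slide 32 is likewise in ascending code-point order.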

Slide 38
CJK Compatibility Database
1. The CJK Compatibility Database includes more than 450 non-MARC21 Chinese, Japanese, and Korean characters, Hangul syllables, and diacritic marks, each matched with its MARC21 equivalent.
2. The database is intended to let catalogers quickly and conveniently replace a non-MARC21 character with its MARC21 equivalent.
3. The list of characters in the database was initially identified by LC staff and was supplemented by entries in a similar database at Yale University.
4. The database is a cooperative undertaking and is intended for the use of all CJK catalogers. If you encounter a non-MARC21 character in the course of your work, please report it to us so that it can be added to the database. Notify Young Ki Lee, Senior Cataloging Specialist, Korean/Chinese Team, Library of Congress, at [email protected].

Slide 39
Thank you
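
The IME problem on slide 30 and the replacement workflow on slide 38 can be sketched in Python as a lookup table. The table format and the function name replace_non_marc21 are hypothetical. The table holds only the two pairs given on slide 30, not the database's actual contents. Both IME characters are CJK compatibility ideographs whose canonical decomposition is the valid character, so Unicode NFC normalization happens to perform the same two replacements. The sketch checks this.

```python
import unicodedata

# Hypothetical excerpt of a compatibility table: non-MARC21 code point -> MARC21 equivalent.
# Only the two pairs cited on slide 30 are included.
COMPAT_TABLE = {0xF9E1: 0x674E, 0xF941: 0x8AD6}


def replace_non_marc21(text: str) -> str:
    """Replace each listed non-MARC21 character with its MARC21 equivalent."""
    return text.translate(COMPAT_TABLE)


if __name__ == "__main__":
    for bad in ("\uF9E1", "\uF941"):
        fixed = replace_non_marc21(bad)
        # NFC maps these compatibility ideographs to the same unified ideographs.
        same_as_nfc = unicodedata.normalize("NFC", bad) == fixed
        print(f"U+{ord(bad):04X} -> U+{ord(fixed):04X}", same_as_nfc)
```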