Korean script searching in Korean Library OPACs Junglim Chae Yonsei University


Jan 21, 2016


Korean script searching in Korean Library OPACs. Junglim Chae, Yonsei University. Indexing methods: n-gram indexing and morphological analysis. N-gram: unigram, bigram, trigram, ..., n-gram. E.g.) 아버지가 방에 들어가신다: 12 index terms by bigram segmentation.
Transcript
  • Korean script searching in Korean Library OPACs Junglim Chae Yonsei University

  • Indexing Method

    N-Gram

    Morphological Analysis

  • N-Gram Indexing

    N-Gram: Unigram, Bigram, Trigram, ..., N-Gram

    E.g.) 아버지가 방에 들어가신다 ("Father is going into the room")

    12 index terms by bigram segmentation (0 marks a space): 아버, 버지, 지가, 가0, 0방, 방에, 에0, 0들, 들어, 어가, 가신, 신다

    Many index terms: many results, but lots of noise

    High recall ratio but low precision ratio
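The bigram segmentation on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the behaviour of any particular OPAC: the function name `ngrams` is hypothetical, and the code assumes that spaces count as characters and are written as `0`.

```python
import unicodedata


def ngrams(text: str, n: int = 2, space_marker: str = "0") -> list[str]:
    """Return the overlapping character n-grams of text, in order.

    Assumption: spaces are kept as characters and shown as space_marker.
    """
    # Compose Hangul into whole syllables so each syllable is one character.
    chars = unicodedata.normalize("NFC", text).replace(" ", space_marker)
    return [chars[i:i + n] for i in range(len(chars) - n + 1)]


sentence = "아버지가 방에 들어가신다"
bigrams = ngrams(sentence)
print(len(bigrams))  # 13 characters including 2 spaces -> 12 bigrams
print(bigrams)
```

Calling `ngrams(sentence, n=1)` or `ngrams(sentence, n=3)` gives the unigram and trigram variants. A larger `n` produces fewer terms, each more specific.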

  • Morphological Analysis

    Requires a morphological analysis dictionary

    E.g.) Three index terms by morphological analysis

    Ability to match linguistically similar terms

    Faster performance with a smaller index

    Accurate matches that meet user expectations

    High precision ratio but low recall ratio

  • N-Gram Vs. Morphological Analysis

                       N-Gram       Morphological Analysis
    Recall Ratio       High         Low
    Precision Ratio    Low          High
    Size of Index      Big          Small
    Indexing Speed     Fast         Slow
    Search Speed       Slow         Fast
    Application        Libraries    Web Search Engines
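To make the recall and precision rows concrete, here is a small sketch using the standard definitions: precision is the share of retrieved records that are relevant, and recall is the share of relevant records that are retrieved. The record IDs and result sets are invented for illustration and are not measurements from any catalog.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one search result set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up record IDs: r* are relevant to the query, x* are noise.
relevant = {"r1", "r2", "r3", "r4"}
ngram_results = {"r1", "r2", "r3", "r4", "x1", "x2", "x3", "x4"}  # finds everything, plus noise
morph_results = {"r1", "r2"}  # only relevant records, but misses some

print(precision_recall(ngram_results, relevant))  # (0.5, 1.0)
print(precision_recall(morph_results, relevant))  # (1.0, 0.5)
```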

  • A Case Study

    Yonsei University Library

    Library System: Maestro-Y

    Search Engine: K2 by Verity

    Indexing Method: N-Gram (bigram) + Morphological Analysis

    Indexing Rules

    Rule 1: Divide strings by space

    Rule 2: Extract index terms using the bigram indexing method

    Rule 3: Add the whole string, excluding the spaces between strings

    Rule 4: Add words from the Korean morphological analysis dictionary

  • A Case Study

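    Yonsei University Library

    E.g.) A Korean title indexed step by step (the Korean text is missing from this transcript): the strings divided by space (rule 1), their bigrams (rule 2), the whole string without spaces (rule 3), and a dictionary word (rule 4). The index is the combined list of these terms.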
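The four indexing rules of the case study can be sketched as follows. This is only an illustration under stated assumptions, not the Maestro-Y or K2 implementation. `SAMPLE_DICTIONARY` is a hypothetical two-word stand-in for a real morphological analysis dictionary. Bigrams are taken within each space-delimited string, and Rule 4 is reduced to a simple substring lookup.

```python
import unicodedata

# Hypothetical stand-in for a Korean morphological analysis dictionary.
SAMPLE_DICTIONARY = {"도서관", "정보"}


def bigrams(s: str) -> list[str]:
    """Overlapping two-character terms of one string."""
    return [s[i:i + 2] for i in range(len(s) - 1)]


def index_terms(title: str, dictionary: set[str] = SAMPLE_DICTIONARY) -> list[str]:
    """Apply Rules 1 to 4 and return the de-duplicated index terms in order."""
    title = unicodedata.normalize("NFC", title)
    strings = title.split()                    # Rule 1: divide strings by space
    terms: list[str] = []
    for s in strings:
        terms.extend(bigrams(s))               # Rule 2: bigrams of each string (assumption)
    joined = "".join(strings)
    terms.append(joined)                       # Rule 3: whole string without spaces
    # Rule 4 (simplified assumption): dictionary words found in the joined string
    terms.extend(sorted(w for w in dictionary if w in joined))
    return list(dict.fromkeys(terms))          # drop duplicates, keep first-seen order


print(index_terms("도서관 정보"))
```

The de-duplication step is a design choice of this sketch. A term produced by more than one rule is stored once, which keeps the combined index from growing unnecessarily.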

  • Search Tips

  • Search Tips (1)

    Keyword Search

    Default search option

    Use at most 3 keywords

    Use Boolean operators

    Omit stop-words

  • Search Tips (2)

    Keyword Search

    Follow the Korean word division rules

    E.g.) (O) marks correct spacing, (X) incorrect spacing

  • Search Tips (3)

    Keyword Search

    Compound nouns: do not use spaces between nouns

    E.g.) without spaces (O), with spaces (X)

  • Search Tips (4)

    Browse Search

    Begin-with search, or truncation

    Use when you already know the first word of the title, author, or publisher
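A begin-with (browse) search amounts to prefix matching over a sorted list of headings. The sketch below shows one way to do this with Python's standard `bisect` module. The function name `browse` and the sample headings are made up for illustration and do not describe how any specific OPAC implements browsing.

```python
from bisect import bisect_left


def browse(headings: list[str], prefix: str, limit: int = 10) -> list[str]:
    """Return up to limit headings that begin with prefix, in sorted order."""
    ordered = sorted(headings)
    start = bisect_left(ordered, prefix)  # first heading >= prefix
    matches = []
    for heading in ordered[start:]:
        if not heading.startswith(prefix):
            break  # headings sharing a prefix are contiguous once sorted
        matches.append(heading)
        if len(matches) == limit:
            break
    return matches


titles = ["Hamlet", "Macbeth", "Hamnet", "Othello"]
print(browse(titles, "Ham"))  # ['Hamlet', 'Hamnet']
```

A real catalog would keep the headings sorted in its index rather than sorting on every call. The sort is done inside the function here only to keep the example self-contained.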

  • Search Tips (5)

    Browse Search

    Korean classics

  • Search Tips (6)

    Exact Match

    Precise search

    Known items

  • Search Tips (7)

    Exact Match

    Single-character words

    E.g.) C

  • Search Tips (8)

    Support for Hangul/Hancha searching

  • Search Tips (9)

    Japanese Kana, Archaic Korean, Russian, special characters: choose scripts from the Multi-language Input Table

  • E.g.) Multi-Script Input Table

  • Search Tips (10)

    Japanese Kana

  • Search Tips (11)

    Personal names

    E.g.) Shakespeare; Murakami, Haruki

  • Search Tips (12)

    Space considered as AND

    E.g.) two terms separated by a space = first term AND second term

    In some OPACs, spaces in the character fields do make a difference in retrieval
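Treating a space as AND means intersecting the record sets of the individual query terms. The following is a minimal sketch over a toy inverted index. The record IDs, titles, and function names are invented for illustration, and the word-level index is an assumption rather than a description of a real OPAC.

```python
from collections import defaultdict


def build_index(records: dict[int, str]) -> dict[str, set[int]]:
    """Map each lower-cased word to the IDs of the records containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for record_id, title in records.items():
        for word in title.lower().split():
            index[word].add(record_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Treat every space in the query as AND: intersect the per-term results."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


records = {1: "korean library catalog", 2: "korean history", 3: "library science"}
idx = build_index(records)
print(search(idx, "korean library"))  # {1}
```

Under this model, adding a space to a query can only narrow the result set. That is why the same phrase typed with and without a space can retrieve different records.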

  • Comparative search with and without space

  • Thank You
