Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.
Text-based Construction and Comparison of Domain Ontology: A study based on classical poetry
Chu-Ren Huang
Academia Sinica
Outline
• Motivation and Framework: Laying the foundation
• Basic Resources: The building blocks
• From General Ontology to Specific Ontology: Study of Su Shi Poems
• Epilogue: From Specific Ontology to General Ontology
• Conclusion
Motivation and Framework: Laying the foundation
Knowledge Structure Discovery
Issues and Significance
Knowledge and Knowledge Structure Variation
Knowledge is Structured Information
• Most salient factors dictating variations in knowledge structures are time, space, and domain
• Language is both the product and conduit of the conceptual structure of its speakers
The Shakespearean-garden Approach
• A Shakespearean garden collects all the plants referred to in Shakespearean texts.
– The garden is used to illustrate the flora of Shakespearean England and gives scholars a context in which to interpret his work.
• There is a knowledge structure behind each corpus (i.e. a collection of texts with design criteria)
– For instance, the complete set of texts by an author, from a certain period, or in a certain domain
Lexicon as a Structured Inventory of Conceptual Atoms
The Ontology-merging as Ontology-discovery Approach I
• Ontology provides a structure for knowledge to be situated
• However, there is a dilemma for the construction of a new ontology
– If no existing ontology is referred to: reinventing the wheel, difficult to start a structure from scratch without rules
– If an existing ontology is referred to: misled by the existing structure, which may be mismatched or erroneous
The Ontology-merging as Ontology-discovery Approach II
The Solution
• Map conceptual atoms to two (or more) reference ontologies
• Merge the two resultant ontologies
– Matched Mapping: Confirmation of knowledge structure
– Mismatched Mapping: Only one or neither is correct. Possibly leads to discovery of new knowledge structure
– Complementary Mapping: Increases coverage
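The three mapping outcomes can be illustrated with a minimal sketch. It assumes that every term has already been mapped to a concept label through each of two reference ontologies, and that the two sets of labels have been aligned to a shared vocabulary so that they can be compared by simple equality. The function name and the table layout are hypothetical and are not taken from the system described in the talk.

```python
from typing import Dict, List


def classify_mappings(map_a: Dict[str, str],
                      map_b: Dict[str, str]) -> Dict[str, List[str]]:
    """Sort terms into matched, mismatched and complementary mappings.

    map_a and map_b are hypothetical term -> concept tables, one per
    reference ontology, with concept labels already aligned (assumption).
    """
    result: Dict[str, List[str]] = {
        "matched": [],        # both ontologies agree on the concept
        "mismatched": [],     # both map the term, but to different concepts
        "complementary": [],  # only one ontology covers the term
    }
    for term in sorted(set(map_a) | set(map_b)):
        if term in map_a and term in map_b:
            kind = "matched" if map_a[term] == map_b[term] else "mismatched"
        else:
            kind = "complementary"
        result[kind].append(term)
    return result
```

In this sketch the mismatched terms are the ones a human editor would inspect first, since, as noted above, they are where a new knowledge structure may be discovered.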
Further Developments
• The Ontology of Chinese Characters: A common knowledge structure for East Asian Cultures
• In contrast to the earlier study, which constructed specific ontologies based on a general ontology, the Chinese character ontology will be a crucial general ontology based on a specific ontology
Basic Resources: The building blocks
From Text to Lexicon
From Lexicon to Ontology
Resources used
• WordNet
• SUMO Ontology
• Academia Sinica Bilingual Ontological Wordnet (Sinica BOW)
• Domain Lexicon Management System: Segmentation, New Word Detection, Lexical Database
What We Learned about Specific Ontology
Constructing ontology from a larger corpus and comparison of two specific ontologies
• Local information can be effectively mapped
• Global information offers deeper insights into the knowledge structure
☆ Human conceptualization of animals and plants has been relatively stable, but NOT that of artifacts.
☆ Regardless of the criteria for classification, genetically determined features (behaviors, appearances etc.) do not vary greatly
☆ However, human technology is highly fluid. Our conceptualization of artifacts is highly dependent on the development of engineering and on our varying societal needs.
Towards a Workbench for Specific Ontology: Browser and Editor
[Diagram: workflow of the workbench]
• User login
• Function menu (personal ontologies list)
• Browse an ontology
– 1. SUMO
– 2. SUMO + WordNet + concept map with lexicon
• Edit an ontology
– 1. Update lexical concepts
– 2. Update mapping between WordNet synset and lexicon
– 3. Edit other information in lexicon
• Add an ontology
– Import text, then word segmentation
– Import lexicon
– Match concept and synset automatically: 1. Suggestion list 2. Missing list
• Logout
Constructing a Specific Ontology
• Import text, or domain lexicon
– Select style of writing
– Select category of word list for word segmentation
– Select reference ontologies to match SUMO and lexicon
• Information of suggestion list
– Candidate synset
– Candidate synset synonyms
– Explanation of candidate synset
– Concept of candidate synset
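As a rough illustration of how a suggestion list and a missing list could be produced, the sketch below looks up each word of a segmented lexicon in a synset index. Everything here is an assumption made for the example: the `Candidate` record, the `match_lexicon` function and the index layout are hypothetical, and the index is a toy dictionary rather than the actual Sinica BOW or WordNet data.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Candidate:
    """One entry of the suggestion list (hypothetical record layout)."""
    synset_id: str       # candidate synset
    synonyms: List[str]  # candidate synset synonyms
    explanation: str     # explanation (gloss) of the candidate synset
    concept: str         # concept the candidate synset is mapped to


def match_lexicon(
    lexicon: List[str],
    synset_index: Dict[str, List[Candidate]],
) -> Tuple[Dict[str, List[Candidate]], List[str]]:
    """Split a lexicon into words with candidate synsets and words without."""
    suggestions: Dict[str, List[Candidate]] = {}
    missing: List[str] = []
    for word in lexicon:
        candidates = synset_index.get(word, [])
        if candidates:
            suggestions[word] = candidates  # left for a human editor to confirm
        else:
            missing.append(word)            # no candidate synset found
    return suggestions, missing
```

Words that end up in the missing list are the ones that would need manual mapping or a new lexical concept.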
Example of SUMO concept
http://bow.sinica.edu.tw/ont/SuShi_ont.html
Summary and Future Work
• Ontologies represent the knowledge structure of a domain or historical period
• We have provided an online interface to browse ontologies and lexica
• In the future, we will complete the online ontology editor and browser, which will
– Map lexicon, WordNet and SUMO.
– Integrate ontologies based on different texts.
– Facilitate comparative studies of various domain ontologies.
From Specific Ontology to General Ontology: 漢字知識本體 (Hanzi Ontology)
An introduction to Hanzi ontology
Research in Collaboration with and Conducted by Ya-Ming Zhou 周亞民
Outline
• Introduction
• The logographic features of Hanzi
• Semantic symbols of Hanzi
• The structure of lexical relations
• The structure of Hanzi ontology
• Summary
Introduction (1/2)
• Ideograph: Each Chinese character (kanji) is a writing unit which also represents a pre-defined concept. The represented concept is independent of phonological variations, including language changes and cross-lingual adaptation
• The complete Han writing system is expected to consist of 40,000-70,000 characters, each representing one or more concepts.
Logographic Features of Hanzi
• 馬 is a semantic symbol of horse
• Examples:
– 驩: 馬名 (a kind of horse)
– 驫: 眾馬 (horses)
– 騎: 騎馬 (riding a horse)
– 驍: 良馬 (a good horse)
– 驚: 馬驚 (a scared horse)
Semantic Symbols in Hanzi(1/3)
• The characteristics of Hanzi mainly come from semantic symbols.
• According to Xu Shen's Shuowen Jiezi (100 A.D.), there are 540 semantic classes (radicals)
• These radicals represent the knowledge structure of Hanzi.
Semantic Symbols in Hanzi
• 540 radicals are used to classify all Chinese characters and represented
• The semantic symbols about animals:
– 鳥 (bird), 隹 (bird), 犬 (dog), 馬 (horse), 羊 (sheep), 虫 (insect)…
• The semantic symbols about plants:
– 艸 (grass), 木 (tree), 竹 (bamboo), 禾 (grain)…
• The semantic symbols about religion:
– 示 (altar, spirit)
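To make the idea of classifying characters by semantic symbol concrete, here is a minimal sketch that groups characters under their radical. The `TOY_RADICALS` table is a small hand-written assumption built from characters that appear on these slides; it is not derived from the Shuowen Jiezi or from any Unicode data, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# Hand-written toy table (assumption): character -> semantic symbol.
TOY_RADICALS: Dict[str, str] = {
    "驩": "馬", "驫": "馬", "騎": "馬", "驍": "馬", "驚": "馬",
    "草": "艸", "芽": "艸", "葉": "艸", "菜": "艸",
}


def group_by_semantic_symbol(
    chars: Iterable[str],
    radical_of: Dict[str, str],
) -> Dict[Optional[str], List[str]]:
    """Group characters by semantic symbol; unknown characters go under None."""
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    for ch in chars:
        groups[radical_of.get(ch)].append(ch)
    return dict(groups)
```

Calling `group_by_semantic_symbol("騎草驚芽", TOY_RADICALS)` places 騎 and 驚 under 馬, and 草 and 芽 under 艸.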
The Classification of Hanzi with 艸 (艹)
Plants
– Name: 蕉蘭芒蒙菌蔓苦菊茱范荷茅蕈蔚菲草
– Parts: 萌莖芽茄苗蓮葉
– Description: 茲蒼芳落茸茂荒薄芬蒸莊
– Usage: 蕃藥蔬菜薪苑藩藉茭
The structure of Hanzi lexical relation(1/2)
Paradigmatic associations
The structure of Hanzi lexical relation(2/2)
Syntagmatic associations
Tennis Problem: Cross-classification and Multiple Inheritance
• WordNet suffers from the "tennis problem" (George Miller, 1993)
• Tennis refers to the entity, the sport, the organization running the sport, etc.
• The tennis problem is caused by the lack of syntagmatic associations in WordNet
• Most computational ontologies only have paradigmatic associations, and have the same problem
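The effect of missing syntagmatic associations can be shown with a toy sketch. The two tables below are invented for illustration and are not taken from WordNet: `HYPERNYM` stands in for paradigmatic (is-a) links and `CO_OCCURS_WITH` for syntagmatic links, and all function names are hypothetical.

```python
from typing import Dict, Set

# Toy data only (assumption): these links are illustrative, not WordNet content.
HYPERNYM: Dict[str, str] = {          # paradigmatic: term -> broader term
    "tennis": "sport",
    "sport": "activity",
    "racket": "equipment",
    "ball": "equipment",
    "equipment": "artifact",
}
CO_OCCURS_WITH: Dict[str, Set[str]] = {  # syntagmatic: terms used together
    "tennis": {"racket", "ball"},
}


def ancestors(term: str) -> Set[str]:
    """Collect every broader term reachable through HYPERNYM links."""
    seen: Set[str] = set()
    while term in HYPERNYM and HYPERNYM[term] not in seen:
        term = HYPERNYM[term]
        seen.add(term)
    return seen


def paradigmatic(a: str, b: str) -> bool:
    """True if one term is a (transitive) hypernym of the other."""
    return a in ancestors(b) or b in ancestors(a)


def syntagmatic(a: str, b: str) -> bool:
    """True if the toy table records the two terms as used together."""
    return b in CO_OCCURS_WITH.get(a, set()) or a in CO_OCCURS_WITH.get(b, set())


def related(a: str, b: str, use_syntagmatic: bool = False) -> bool:
    """Relatedness with or without the syntagmatic layer."""
    return paradigmatic(a, b) or (use_syntagmatic and syntagmatic(a, b))
```

With paradigmatic links alone, "racket" and "tennis" come out as unrelated; they are connected only once the syntagmatic table is consulted.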
The semantic symbol 艸 (艹) pulls the concepts about plants together
[Diagram: concept nodes labeled Physical, Abstract, Process, Plants, Body Parts, Internal Attribute, External Attribute, Subjective Assessment Attribute]
The structure of Hanzi Ontology
• Each character is connected to the semantic symbols ontology.
• The derived and loan meanings of each character are mapped to SUMO.
[Figure: glyph (字形) by meaning (意義) matrix, with glyph columns G1 ... Gm and meaning rows M1 ... Mn; each cell is either empty (-) or marked O, D, or L. The matrix is linked to the semantic symbols ontology (鳥, 艸, 竹, 水, 示, ...) and to SUMO, and is repeated for successive time periods Ti-1, Ti, Ti+1, each with its own glyphs and meanings.]
Legend: G = Glyph, M = Meaning, O = Original Meaning, D = Derived Meaning, L = Loan Meaning
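One possible way to hold such a glyph-by-meaning matrix in code is a sparse table per time period. This is only a sketch under stated assumptions: the `Sense` enum, the `Matrix` alias and both helper functions are hypothetical names, and the identifiers G1, M1 and so on are placeholders, as they are in the figure.

```python
from enum import Enum
from typing import Dict, Tuple


class Sense(Enum):
    """Cell labels used in the figure legend."""
    ORIGINAL = "O"
    DERIVED = "D"
    LOAN = "L"


# Sparse matrix for one time period: (glyph, meaning) -> label.
# Empty cells ("-") are simply absent from the dictionary.
Matrix = Dict[Tuple[str, str], Sense]


def meanings_of(matrix: Matrix, glyph: str) -> Dict[str, Sense]:
    """Return every meaning attached to one glyph in one period."""
    return {m: s for (g, m), s in matrix.items() if g == glyph}


def new_links(earlier: Matrix, later: Matrix) -> Matrix:
    """Return the glyph-meaning links present later but not earlier."""
    return {cell: s for cell, s in later.items() if cell not in earlier}
```

Comparing the matrices of two adjacent periods with `new_links` would list the derived or loan meanings a glyph acquired in between.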
Summary
• Hanzi ontology is organized by the glyph (字形) structure of each character.
• We believe it can help solve problems in Chinese information processing.
• It would also be a good candidate for a cross-lingual general ontology for East Asian languages using Chinese characters.
Conclusion
• Knowledge Structure Discovery (KSD) can be achieved by mapping a lexicon to a general ontology
• General ontologies have constraints in their applicability. In particular, the sub-structures dependent on human activities will vary easily when human societies change.
• Specific ontology can turn out to be a useful basis for general ontology.