1
The Next-Generation Knowledge Management for Multilingual Agricultural Information
Asanee Kawtrakul, Mukda Suktarachan, Aurawan Imsombut, Chaveewan Petchsiri, Chalermpon Sirigayon, Thana Sukvaree, Trakul Permpool, Prachaya Boonkhuan, Worapoj Peerawit, Intiraporn Mulasastra
The Specialty Research Unit of Natural Language Processing and Intelligent Information System Technology, Department of Computer Engineering, Kasetsart University, Bangkok, Thailand
Fifth Agricultural Ontology Service (AOS) Workshop, 29 April 2004, Beijing, China
• NECTEC – I-Know (Information Extraction and Knowledge Discovery) Project
• AFITA 2002 (First, work for Fun, then get the Fund)
3
Agenda
• Motivation
• System Architecture: Knowledge Management
• Automatic Ontology Construction and Maintenance
• Ontology-based Knowledge Management
– Information Extraction
– Summarization
– Knowledge Discovery
– Knowledge Tracking
4
Motivation
– Information overload, especially from unstructured electronic articles and reports
– Language barriers
– Thailand is an agriculture-based country
Knowledge Management for Multilingual Agricultural Information Management
5
What is KM?
• Knowledge Acquisition
• Knowledge Processing:
– Knowledge Discovery
– Best Practice
• Knowledge Service
– Knowledge Tracking
6
Agricultural Information Knowledge Management
Related Projects
1. Multilingual Dictionary
2. Ontology Construction and Maintenance System
3. Knowledge Portal
• Information Extraction
• Summarization
• Knowledge Discovery
4. Knowledge Tracking
5. Machine translation
7
System Architecture
[Figure: system architecture. Labels: WWW; unstructured, semi-structured, and structured documents; metadata annotation tools; knowledge structure; Thai AGRIS corpus; agricultural information bases; real-world ontology; task-oriented ontology; multilingual dictionary; MT; KT; knowledge portal processing; intelligent search engine. Example user requests about rice: diseases and how to protect against them, how to plant in the winter, following up the price, yield, etc.]
8
Ontology Construction and Maintenance System
9
Introduction to Ontology
Two essential aspects of Ontologies
- Real-world Ontology
- For IR, IE and Semantic Web
- Task-Oriented Ontology
- For IE, Knowledge Tracking
10
Introduction to Ontology: Real-World Plants Taxonomy Ontology
[Figure: plant taxonomy ontology. Concept labels: taxonomy, family, genus, species, plants, plant reproductive organs, plant vegetative organs, fruit, seeds, flower. Instance label: Acalypha. Concepts and instances are linked by IS-A relations and part-of relations.]
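The two relation types on this slide can be sketched as a small typed graph. This is only an illustration: the `Ontology` class, its method names, and the specific edges are assumptions, since the extracted figure does not show exactly which node links to which.

```python
from collections import defaultdict


class Ontology:
    """Minimal typed graph: each edge is (child, relation, parent)."""

    def __init__(self):
        # Maps (relation, child) to the set of direct parents.
        self.edges = defaultdict(set)

    def add(self, child, relation, parent):
        self.edges[(relation, child)].add(parent)

    def ancestors(self, term, relation):
        """Follow one relation type transitively from `term`."""
        seen, stack = set(), [term]
        while stack:
            current = stack.pop()
            for parent in self.edges.get((relation, current), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


onto = Ontology()
# Hypothetical edges, chosen only to illustrate the two relation types.
onto.add("flower", "is-a", "plant reproductive organs")
onto.add("fruit", "is-a", "plant reproductive organs")
onto.add("seeds", "is-a", "plant reproductive organs")
onto.add("plant reproductive organs", "part-of", "plants")
onto.add("plant vegetative organs", "part-of", "plants")
onto.add("Acalypha", "is-a", "genus")  # instance attached to a concept

flower_classes = onto.ancestors("flower", "is-a")
organ_wholes = onto.ancestors("plant reproductive organs", "part-of")
```

Keeping the relation name in the edge key means IS-A and part-of hierarchies can be traversed independently, which matches the slide's distinction between the two.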
11
Introduction to Ontology: Task-Oriented Ontology
[Figure: task-oriented ontology for plant diseases. Concept labels: plant diseases, symptom, cause, treatment, cause from pathogen, cause from environment, disease control. Instance labels: Scorch, Blight, ... Concepts and instances are linked by IS-A relations and by specific relations (e.g. Cause, hasSymptom).]
12
Why do we need an automatic ontology construction and maintenance system?
• It enhances the performance of information processing systems such as IR, IE, and Knowledge Tracking.
• Having experts create an ontology by hand is expensive, and ontology maintenance is an endless task, especially for new instances.
13
Automatic Ontology Construction
System Architecture
[Figure: system architecture. Inputs: structured corpus (dictionary, AGROVOC thesaurus) and unstructured corpus (raw text). Module labels: morphological analysis, term extraction, semantic relation identification, heuristic rules, structure analysis, database conversion, thesaurus recycling, organizing system, verification system.]
14
Automatic Ontology Construction
• Sources
– Thesaurus
– Dictionary of Agriculture
– Technical papers, published documents, encyclopedias
• Differences among the 3 sources

                              Thesaurus   Dictionary   Text
Structuring                   Yes         Yes          No
Terms relation organization   Yes         Yes          No
Expert validation             Yes         Yes          No
Up-to-date data               No          No           Yes
Amount of data                Small       Small        Large
15
Ontology from AGROVOC Thesaurus
• Technique:
– Convert BT/NT to IS-A relations
• Problem:
– Not every BT/NT pair can be defined as an IS-A relation. Some carry other semantics, such as "ingredient of". For example:
MILK NT: Milk Fat (ingredient of)
• Solution:
– NLP technique: NP analysis
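A minimal sketch of how NP analysis might separate true IS-A pairs from other BT/NT pairs. The head/modifier heuristic and the function name `nt_to_relation` are assumptions for illustration, not the system's actual rules: a narrower term that uses the broader term only as a modifier (as in "Milk Fat" under "MILK") is flagged, and everything else defaults to IS-A. The heuristic assumes English head-final noun phrases.

```python
def nt_to_relation(broader, narrower):
    """Guess the relation for a BT/NT pair from the shape of the noun phrase."""
    b = broader.lower().split()
    n = narrower.lower().split()
    # Head match: the NT ends with the BT, e.g. "Dent Maize" under "Maize".
    head_matches = n[-len(b):] == b
    # Modifier match: the NT starts with the BT and adds a new head,
    # e.g. "Milk Fat" under "Milk".
    modifier_matches = len(n) > len(b) and n[:len(b)] == b
    if modifier_matches and not head_matches:
        return "OTHER"  # needs review: may be "ingredient of" or similar
    return "IS-A"


pairs = [("MILK", "Milk Fat"), ("Maize", "Dent Maize"), ("Oil Crops", "Oil Palms")]
relations = {nt: nt_to_relation(bt, nt) for bt, nt in pairs}
```

Pairs returned as "OTHER" would still need a second step, or an expert, to decide which specific relation applies.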
16
Ontology from Dictionary
• Apply a plant names dictionary to add formal names and the local names that are familiar to users, for retrieval and machine translation.
Sample entry (local names shown in romanized form):
Acalypha EUPHORBIACEAE
brachystachya Hornem. H Tamyae doi bai bang (General).
chinensis Roxb. = A. indica L.
delpyana Gagnep. US Khang poi tua mia (Central).
evrardii Gagnep. = A. siamensis Oliv. ex Gage
hispida Burm. f. ExS Kiao klao, Mai phrom (Northern); Hang krarok daeng (Bangkok); Hang maeo (Central); Hu pla chon (Ratchaburi); chenille plant, Red hot cat's tail.
Entry fields: family/subfamily, genus, specific epithet, author name, formal name, local name, habit.
17
Ontology from Plant Names Dictionary
• Technique:
– Apply a task-oriented parser to extract relation terms.
– Convert terms into a relational database according to their alphabetic characteristics and their position in the entry.
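A rough sketch of such a position-based parser, run on romanized lines from the Acalypha sample entry. The function `parse_entry`, the record field names, and the `HABIT_CODES` set are assumptions made for illustration; the habit codes are limited to those visible in the sample.

```python
import re

HABIT_CODES = {"H", "US", "ExS"}  # assumption: only the codes seen in the sample


def parse_entry(lines):
    """First line is 'Genus FAMILY'; each later line starts with a specific epithet."""
    genus, family = lines[0].split()[:2]
    records = []
    for line in lines[1:]:
        epithet, rest = line.split(maxsplit=1)
        record = {"family": family, "genus": genus, "epithet": epithet}
        if "=" in rest:
            # Synonym line: 'author = accepted formal name'.
            author, synonym = (part.strip() for part in rest.split("=", 1))
            record.update(author=author, synonym_of=synonym)
        else:
            tokens = rest.split()
            # The habit code separates the author name from the local names.
            pos = next(i for i, tok in enumerate(tokens) if tok in HABIT_CODES)
            names = " ".join(tokens[pos + 1:])
            record.update(
                author=" ".join(tokens[:pos]),
                habit=tokens[pos],
                # 'name, name (Region)' groups, separated by ';'.
                local_names=[
                    (name.strip(), region)
                    for chunk, region in re.findall(r"([^();]+)\(([^)]+)\)", names)
                    for name in chunk.split(",")
                ],
            )
        records.append(record)
    return records


entry = [
    "Acalypha EUPHORBIACEAE",
    "brachystachya Hornem. H Tamyae doi bai bang (General).",
    "chinensis Roxb. = A. indica L.",
    "hispida Burm. f. ExS Kiao klao, Mai phrom (Northern); Hang krarok daeng (Bangkok)",
]
records = parse_entry(entry)
```

Each record maps directly onto a row of a relational table, with the local names going into a child table keyed by genus and epithet.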
– Concept boundary identification
• Silk powder is used as a film for maintaining freshness in the seafood industry.
Concept => film, film for maintaining freshness, film for maintaining freshness in the seafood industry
• Many herbs can be used as medicine, and some of them are manufactured at the industrial level, such as garlic and ginkgo biloba.
Candidate terms => herbs, medicine, industry
20
Problem
• Clue-word ambiguity
• Sunflower is an oil crop.
=> HYPONYM (Sunflower, Oil Crop)
• The staminate flower is a green bush.
=> PROPERTIES (Flower, Color)
• Implicit expression (no clue word)
• Phrase level
"Jasmine Rice" => HYPONYM (Jasmine Rice, Rice)
21
Solutions

Problem                  Technique
Concept identification   NP analysis using grammatical rules and statistics
Clue-word ambiguity      Heuristic rules, such as using a word list of object properties to eliminate non-concept terms
Implicit expression      Named entity extraction
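The clue-word heuristic can be sketched on the English glosses of the slide's examples. The regular expression, the `PROPERTY_WORDS` list, and both function names are hypothetical; the real system works on Thai text with its own rules, and the head-final shortcut in `implicit_hyponym` only holds for English phrases.

```python
import re

# Hypothetical word list of object properties (colors, sizes, ...).
PROPERTY_WORDS = {"green", "red", "yellow", "brown", "big", "small"}

CLUE = re.compile(r"^(?P<subj>.+?) is (?:an? )?(?P<pred>.+?)\.?$")


def classify(sentence):
    """Resolve the ambiguous clue word 'is' into HYPONYM or PROPERTIES."""
    m = CLUE.match(sentence)
    if not m:
        return None
    subj, pred = m["subj"], m["pred"]
    # A property word in the predicate means it describes the subject
    # rather than naming a broader concept.
    if any(word.lower() in PROPERTY_WORDS for word in pred.split()):
        return ("PROPERTIES", subj, pred)
    return ("HYPONYM", subj, pred)


def implicit_hyponym(phrase):
    """Phrase-level case with no clue word: the head noun is the broader term."""
    words = phrase.split()
    if len(words) < 2:
        return None
    return ("HYPONYM", phrase, words[-1])  # assumes an English head-final NP


r1 = classify("Sun-flower is oil crop.")
r2 = classify("Staminate is a green bush.")
r3 = implicit_hyponym("Jasmine Rice")
```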
22
Forest Ontology Organizing
• Use the AGROVOC ontology as the core tree
• Merge the forest of ontologies built from the dictionary and text into the core ontology using NLP techniques such as phrase analysis and term matching
23
Forest Ontology Organizing
[Figure: four examples, (a) to (d), of merging small trees into the core tree. Tree labels: Plant Products, Fruit, Watermelons, Tamarind; Crops, Oil Crops, Oil Palms, Sesame; Cereals, Field Crops, Maize, Dent Maize. A "+" marks the trees being merged.]
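Merging by term matching can be sketched as follows. This is a simplified illustration that uses exact string matching on child-to-parent maps; the function `merge_fragment`, the sample trees, and the unmatched "Cassava" fragment are assumptions, and a real system would normalize terms and apply phrase analysis first.

```python
def merge_fragment(core, fragment):
    """Attach fragment edges (child -> parent) whose parent term exists in the core tree."""
    merged = dict(core)
    pending = dict(fragment)
    progress = True
    while pending and progress:
        progress = False
        known = set(merged) | set(merged.values())
        for child, parent in list(pending.items()):
            if parent in known:
                # Keep the core's own parent if the child is already present.
                merged.setdefault(child, parent)
                del pending[child]
                progress = True
    # Edges left in `pending` found no matching term and need expert review.
    return merged, pending


core = {
    "Fruit": "Plant Products",
    "Watermelons": "Fruit",
    "Oil Crops": "Crops",
    "Oil Palms": "Oil Crops",
    "Maize": "Cereals",
}
fragment = {
    "Tamarind": "Fruit",
    "Dent Maize": "Maize",
    "Cassava": "Root Crops",  # hypothetical edge with no matching core term
}
merged, unattached = merge_fragment(core, fragment)
```

The loop repeats until no more edges attach, so a chain of new terms can be merged even when only its top node matches the core tree.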
24
Verification Tools
• For the expert to verify output and add additional related
1. Segment the text into EDUs (Elementary Discourse Units).
2. Build a discourse tree structure from the EDUs in step 1.
3. Select leaf nodes as the knowledge summary (salient units).
43
Knowledge Discovery
44
Knowledge Processing Architecture
[Figure: knowledge processing architecture. Labels: document, annotated corpus, ontology, template construction, template, text extraction, knowledge extraction, summarization, knowledge summary, knowledge structure, generalization rules, knowledge discovery.]
45
Knowledge structure
• Knowledge structure consists of
– Plant growing method
• Variety selection
• Soil preparation
• Seedling preparation
• Cultural practice
– Plant disease and insect control
• Cause and symptom
• Treatment / killing
• Protection
46
Knowledge extraction
• Relations to be extracted
– Cause relation
• e.g. Pyricularia grisea causes blast disease in rice
– Effect relation
• e.g. The blast symptoms caused by Pyricularia grisea are big brown eye-shaped spots on the leaf and……..
– Consequence relation
47
Generalization of Cause/Result Relations
• Processes needed
– Knowledge representation
– Inductive reasoning
• An ontology is needed to define the superset of insects and microorganisms, e.g. Louse = {chili thrips, psyllid, small leafhopper, leafhopper, …}
48
Knowledge Discovery
• Generalized rules
∀x Disease(x, Disease from Louse) → Symptom(x, leaf, curve)
∀x Disease(x, Leaf blight/Blast) → Symptom(x, leaf, grey blot)
49
Knowledge Discovery
• Generalized rules for symptoms
– All louse species cause the curved-leaf symptom in fruit
– All blast and blight leaf diseases in plants have grey blots/spots
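One way to picture the induction step: when every member of an ontology class shows the same symptom, the symptom is lifted to the class. The sketch below is an illustration under assumptions; the function `generalize`, the fact triples, and the pest names are hypothetical sample data, not extracted results.

```python
def generalize(facts, superset):
    """Lift (pest, part, symptom) facts to a class when all its members share them."""
    rules = []
    for cls, members in superset.items():
        observed = {
            member: {(part, sym) for pest, part, sym in facts if pest == member}
            for member in members
        }
        # Generalize only when every member has at least one extracted fact.
        if members and all(observed.values()):
            common = set.intersection(*observed.values())
            rules += [(cls, part, sym) for part, sym in sorted(common)]
    return rules


# Hypothetical extracted facts.
facts = [
    ("chili thrips", "leaf", "curve"),
    ("psyllid", "leaf", "curve"),
    ("leafhopper", "leaf", "curve"),
    ("leafhopper", "shoot", "dry"),
]
# Superset defined by the ontology: class -> member pests.
superset = {"Louse": {"chili thrips", "psyllid", "leafhopper"}}

rules = generalize(facts, superset)
```

The symptom seen on only one member ("shoot", "dry") is not generalized, which is what keeps the induced rule at the level of the whole class.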
50
From Generalized Symptoms Matrix to Inductive Prediction
• Using the ID3 technique
[Figure: decision tree. X3 (leaf, curve): Y => disease caused by louse; N => test X7 (leaf, grey blot): Y => leaf blight/blast disease.]
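ID3 chooses, at each node, the attribute with the highest information gain, $Gain(S, A) = H(S) - \sum_{v} \frac{|S_v|}{|S|} H(S_v)$, where $H$ is the entropy of the class labels. The sketch below is a textbook ID3 on a made-up symptom matrix; the rows, the "other" label, and the function names are assumptions chosen so that the resulting tree has the shape of the slide's example.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, attrs):
    if len(set(labels)) == 1:          # pure node: return the class
        return labels[0]
    if not attrs:                      # no attribute left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    rest = [a for a in attrs if a != best]
    tree = {best: {}}
    for value in sorted(set(row[best] for row in rows)):
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = id3([rows[i] for i in keep], [labels[i] for i in keep], rest)
    return tree


# Hypothetical symptom matrix: X3 = (leaf, curve), X7 = (leaf, grey blot).
rows = [
    {"X3": "Y", "X7": "N"}, {"X3": "Y", "X7": "N"}, {"X3": "Y", "X7": "Y"},
    {"X3": "N", "X7": "Y"}, {"X3": "N", "X7": "Y"}, {"X3": "N", "X7": "N"},
]
labels = ["louse", "louse", "louse", "blight/blast", "blight/blast", "other"]

tree = id3(rows, labels, ["X3", "X7"])
```

On this sample, X3 splits off the louse-caused cases first and X7 then separates blight/blast from the remaining case.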
51
Information Retrieval
Multi-viewpoint Knowledge Tracking
52
Why do we need multi-viewpoint knowledge tracking?
53
What’s Knowledge Tracking?
– The viewpoint of interest on the knowledge differs for each user.
[Figure: documents in the Computer domain tracked by author and year, using information extracted from the documents. Authors: A, B, C. Years: 2000, 2001, 2002.]
Knowledge gained:
– There are 5 documents in the Computer domain
– 3 authors have published in the Computer domain
– The Computer domain started in year 2000
– and more ...
59
Knowledge Tracking: Different Tracking Paths (Same Documents)
[Figure: the same documents 1 to 5 in the Computer domain, arranged along two tracking paths.]
Path Computer > Author > Year (authors A, B, C; years 2000, 2002, 2004, 2001, 2002). Knowledge gained:
– Author B is a new researcher.
– Author C publishes papers continuously.
– Author A did not publish in year 2001.
– And more...
Path Computer > Year > Author (years 2000, 2001, 2002; authors A, B, C, A, C). Knowledge gained:
– Author C is the only one who published in year 2001.
– Authors A and B are pioneer researchers in the domain.
– And more ...
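The idea of different tracking paths over the same documents amounts to grouping one metadata table by different key orders. The sketch below uses hypothetical document records and a hypothetical `track` function; the author and year values are illustrative only.

```python
def track(docs, path):
    """Group documents into a nested dict that follows the given metadata path."""
    tree = {}
    for doc in docs:
        node = tree
        for key in path[:-1]:
            node = node.setdefault(doc[key], {})
        node.setdefault(doc[path[-1]], []).append(doc["id"])
    return tree


# Hypothetical metadata for five documents in the Computer domain.
docs = [
    {"id": 1, "author": "A", "year": 2000},
    {"id": 2, "author": "B", "year": 2000},
    {"id": 3, "author": "C", "year": 2001},
    {"id": 4, "author": "A", "year": 2002},
    {"id": 5, "author": "C", "year": 2002},
]

by_author = track(docs, ["author", "year"])
by_year = track(docs, ["year", "author"])
```

Reading `by_author` shows each author's publication history, while `by_year` shows who was active in each year: the documents are the same, but the knowledge gained differs.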
60
Tracking by domain
[Figure: ontology path Plant > Disease > Cause, Symptom, Prevention. With Domain=Plant, documents C, F, A, D, B, E are grouped under Title=Ginger, Title=Cabbage, Title=Cucumber, …]
61
Tracking by title
[Figure: ontology path Plant > Disease > Cause, Symptom, Prevention. With Title=Cabbage, documents A and D are grouped under Author=Doae, Author=KU, …]
62
Tracking by author
[Figure: ontology path Plant > Disease > Cause, Symptom, Prevention. With Author=KU, documents C, F, A, D, B, E are grouped under Title=Ginger, Title=Cabbage, Title=Cucumber, …]
63
[Screenshot: search interface. Labels: metadata classification, machine translation mode, input word search, content area.]
68
Conclusion
To be continued: Forever Maintaining Ontology in
• AFITA/WCCA 2004
Joint Conference: the 4th International Conference of the Asian Federation of Information Technology in Agriculture and the 2nd World Congress of Computers in Agriculture and Natural Resources
August 9-12, 2004 in Bangkok, Thailand
69
THE END.
Thank you for your attention.
84
Future Works
• Resolving problems
– Head vs. non-head of NP
• Mulberry leaves are used as animal feed, for example for fish, cows, and buffalo.
– Implicit expression at the sentence level
• The toxin in derris has insect-repelling properties; it is effective when used in powder form.