IndoWordNet Database Design Presented By: Konkani NLP Team Goa University IndoWordNet Database Design 1

Dec 13, 2015


Sybil Armstrong

IndoWordNet Database Design

Presented By:Konkani NLP Team

Goa University

IndoWordNet Database Design


Brief Outline

• Objectives

• Background

• Requirements

• Proposed database design

– Database design details

– Issues to be resolved

– Tools and Scripts

• APIs

– IndoWordNet API

– Layers of API

– Class Diagram

– Sample API code


Objectives

• To finalize the database design.

• To finalize the tools/scripts necessary for distributing the database.

• API design.

• API demonstration.


Background

• IndoWordNet is a multilingual WordNet that links WordNets of different Indian languages.

• A WordNet is a crucial resource for a language which aids in NLP tasks such as Machine Translation, Information Retrieval, etc.

• A database is necessary to maintain the data for one or multiple WordNets.

• The database needs to support the development of online and offline applications.


Requirements

• Database design should accommodate multiple languages.

• Store synsets of different languages.
• Store semantic relations.
• Store lexical relationships.
• Store ontological details.
• Allow any additional information to be stored for each synset.
• Avoid duplication of data.
• Open, scalable, modular design.
• Independent of storage technology.


Proposed Database Design

• Software Platform: Reference implementation done using MySQL
  – MySQL is freely available
  – Supported on Windows and Linux

• Database design details
  – wordnet_master
    • It contains language independent data.
  – wordnet_<respective_language>
    • It contains language dependent data.
    • It contains the synset data for a language.
  – wordnet_admin
    • It contains data necessary for administrative purposes.


wordnet_master

• The wordnet_master maintains the data shared by all the languages.

• The wordnet_master includes tables for semantic relations.

• It will include all ontology related tables in English.

• The language specific data will be available in the wordnet_<respective_language> database.


List of tables in wordnet_master

– wn_master_category
  • To maintain the different grammatical categories such as noun, verb, etc.
– wn_master_language
  • To maintain the language information in a database.
– wn_master_language_lss_range
  • To maintain the language specific synset range w.r.t. the given language.
– wn_master_synset_file
  • To associate a file with a synset.


Tables for maintaining semantic relation

• wn_rel_hypernymy_hyponymy
  – To maintain hypernymy and hyponymy relations, which are IS-A-KIND-OF semantic relationships between synsets.
• wn_rel_meronymy_holonymy
  – To maintain meronymy and holonymy relations, which are PART-WHOLE semantic relationships between synsets.
• wn_rel_troponymy
  – To maintain the troponymy semantic relationship between synsets.
• wn_rel_causative
  – To maintain the causative semantic relation between synsets.


• wn_rel_entailment
  – To maintain the entailment semantic relationship between synsets.
• wn_rel_similar
  – To maintain the relation between similar synsets.
• wn_rel_also_see
  – To maintain relations between synsets other than the regular semantic relations.
• wn_rel_noun_verb_link
  – To maintain the semantic relation between a noun synset and its associated verb synset.


• wn_rel_noun_adjective_attribute_link
  – To maintain the semantic relation between a noun synset and the associated adjective attribute that go together.
• wn_rel_adjective_modifies_noun
  – To maintain the semantic relation between an adjective synset and the corresponding noun synset which it modifies.
• wn_rel_adverb_modifies_verb
  – To maintain the semantic relation between an adverb synset and the corresponding verb synset which it modifies.
• wn_rel_near_synsets
  – To maintain the near synsets relation between synsets.


• wn_property_antonymy_gradation
  – To maintain the properties of a relation; for example, the antonymy relation has properties such as colour, gender, etc.
• wn__property_meronymy_holonymy
  – To maintain the properties of relations such as meronymy and holonymy, which have properties like component-object, feature-activity, etc.
• wn_relation_types
  – To maintain the relation information of all the relation tables.
• wn_semantic_relations
  – To maintain the semantic relations w.r.t. the synsets.
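
The relation tables listed above link one synset to another. As a rough illustration only, the Java sketch below models a table such as wn_rel_hypernymy_hyponymy in memory and follows the IS-A-KIND-OF chain upwards. The slides do not give the column layout, so the (hyponym id, hypernym id) row shape, the class name RelationSketch, and the sample ids are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for a relation table such as
// wn_rel_hypernymy_hyponymy. The (hyponym id, hypernym id) row shape
// is an assumption; the slides do not list the columns.
class RelationSketch {
    private final Map<Integer, List<Integer>> hypernymsOf = new HashMap<>();

    // Records one row: hyponymId IS-A-KIND-OF hypernymId.
    void addHypernymy(int hyponymId, int hypernymId) {
        hypernymsOf.computeIfAbsent(hyponymId, k -> new ArrayList<>()).add(hypernymId);
    }

    // Collects direct and inherited hypernyms of a synset, nearest first.
    Set<Integer> allHypernyms(int synsetId) {
        Set<Integer> seen = new LinkedHashSet<>();
        collect(synsetId, seen);
        return seen;
    }

    private void collect(int synsetId, Set<Integer> seen) {
        List<Integer> parents = hypernymsOf.getOrDefault(synsetId, Collections.emptyList());
        for (int parent : parents) {
            // Set.add returns false for ids already visited, which also stops cycles.
            if (seen.add(parent)) {
                collect(parent, seen);
            }
        }
    }

    public static void main(String[] args) {
        RelationSketch relations = new RelationSketch();
        relations.addHypernymy(101, 50); // made-up synset ids
        relations.addHypernymy(50, 10);
        System.out.println(relations.allHypernyms(101)); // prints [50, 10]
    }
}
```

Because the rows refer only to synset ids, the same relation data can be shared by every language, which is why these tables sit in wordnet_master.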


Tables for maintaining ontology relation

• wn_ontology_nodes
  – To maintain the different ontology types or positions. (Common information in English)
• wn_ontology_tree
  – To maintain the hierarchical relationship of the ontology types.
  – The root node in the ontology hierarchy has id value = 1.
• wn_ontology_synset_map
  – To link a synset/concept to a particular position in the ontology.
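
A minimal Java sketch of how the hierarchy held in wn_ontology_tree could be walked from a node up to the root. Only the convention that the root node has id 1 comes from the slides; the child-to-parent map, the class name OntologyTreeSketch, and the sample node ids are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for wn_ontology_tree: child id -> parent id.
class OntologyTreeSketch {
    static final int ROOT_ID = 1; // from the slides: the root node has id value = 1

    private final Map<Integer, Integer> parentOf = new HashMap<>();

    void addEdge(int parentId, int childId) {
        parentOf.put(childId, parentId);
    }

    // Returns the path from nodeId up to the root, both ends included.
    List<Integer> pathToRoot(int nodeId) {
        List<Integer> path = new ArrayList<>();
        Integer current = nodeId;
        while (current != null) {
            path.add(current);
            if (current == ROOT_ID) {
                return path;
            }
            // A path longer than the number of known nodes means the data has a cycle.
            if (path.size() > parentOf.size() + 1) {
                throw new IllegalStateException("Cycle detected near node " + current);
            }
            current = parentOf.get(current);
        }
        throw new IllegalArgumentException("Node " + nodeId + " is not connected to the root");
    }

    public static void main(String[] args) {
        OntologyTreeSketch tree = new OntologyTreeSketch();
        tree.addEdge(1, 2); // made-up node ids
        tree.addEdge(2, 5);
        System.out.println(tree.pathToRoot(5)); // prints [5, 2, 1]
    }
}
```

A synset mapped through wn_ontology_synset_map to node 5 would, under these assumptions, inherit the ontology positions 2 and 1 as well.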


wordnet_<respective_language>

• The wordnet_<respective_language> database will keep tables which will have information related to the particular language.

• It will include tables to keep synset details, words in the language, examples, etc.

• <respective_language> is to be replaced by the name of any language in the IndoWordNet group, e.g. Assamese, Hindi, Konkani, Oriya, Punjabi, Urdu, etc., as applicable.

• Example: wordnet_bodo
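
The naming convention can be sketched as a small Java helper. The helper name databaseNameFor, the class name DatabaseNameSketch, and the choice to lower-case the language name (inferred from the wordnet_bodo example) are assumptions, not part of the presented design.

```java
import java.util.Locale;

// Hypothetical helper that derives a language database name
// following the wordnet_<respective_language> pattern.
class DatabaseNameSketch {
    static String databaseNameFor(String language) {
        String trimmed = language == null ? "" : language.trim();
        // Accept plain alphabetic names only, so the result is a safe identifier.
        if (!trimmed.matches("[A-Za-z]+")) {
            throw new IllegalArgumentException("Unexpected language name: " + language);
        }
        // Lower-casing is an assumption based on the wordnet_bodo example.
        return "wordnet_" + trimmed.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(databaseNameFor("Bodo"));    // prints wordnet_bodo
        System.out.println(databaseNameFor("Konkani")); // prints wordnet_konkani
    }
}
```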


wordnet_admin

• This database is used to keep other related tables such as:
  – Feedback table
  – FAQ table
  – Website administration tables
  – User + password table
  – …


Fig 1: Some of the important tables which are part of the WordNet, colour-coded to show the common data shared by all languages and the data that differs for each language. (Legend: language dependent data; language independent data.)


Issues to be Resolved

• Should the tables below be stored as language independent data or as language dependent data? (In view of the change in POS category reported by language groups.)
  – wn_rel_adjective_modifies_noun
  – wn_rel_adverb_modifies_verb
  – wn_rel_noun_adjective_attribute_link
  – wn_rel_noun_verb_link


• In the table ‘wn_ontology_nodes’ the data should be only in English; the data in other languages can be kept in their respective language databases.

• Needs to be done NOW: approve the master and <respective_language> tables of each language.


Tools & Scripts

• Tool to populate data into the various tables of the database.
• Population of data into tables such as
  – wn_synset
  – wn_word
  – wn_synset_words
  – wn_synset_example
• Scripts to create language specific data tables.
• Scripts to dump and restore data.
• Scripts to manage/update incremental changes done to tables in wordnet_master
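
To illustrate how a population tool might honour the "avoid duplication of data" requirement, the Java sketch below stores each word once and links it to synsets by id. The slides do not give the columns of wn_word or wn_synset_words, so the two maps, the class name PopulateSketch, and the sample words are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-ins for wn_word and wn_synset_words.
class PopulateSketch {
    // Stand-in for wn_word: word text -> word id (each word stored once).
    private final Map<String, Integer> wordIds = new LinkedHashMap<>();
    // Stand-in for wn_synset_words: synset id -> ordered word ids.
    private final Map<Integer, List<Integer>> synsetWords = new HashMap<>();

    // Returns the id of an existing word, or assigns the next id to a new one.
    int internWord(String word) {
        Integer existing = wordIds.get(word);
        if (existing != null) {
            return existing;
        }
        int id = wordIds.size() + 1;
        wordIds.put(word, id);
        return id;
    }

    // Links a word to a synset without duplicating the word or the link.
    void addSynsetWord(int synsetId, String word) {
        Integer wordId = internWord(word);
        List<Integer> ids = synsetWords.computeIfAbsent(synsetId, k -> new ArrayList<>());
        if (!ids.contains(wordId)) {
            ids.add(wordId);
        }
    }

    int wordCount() {
        return wordIds.size();
    }

    List<Integer> wordIdsOf(int synsetId) {
        return synsetWords.getOrDefault(synsetId, Collections.emptyList());
    }

    public static void main(String[] args) {
        PopulateSketch db = new PopulateSketch();
        db.addSynsetWord(1, "house"); // made-up sample data
        db.addSynsetWord(1, "home");
        db.addSynsetWord(2, "home");  // reuses the existing word row
        System.out.println(db.wordCount());  // prints 2
        System.out.println(db.wordIdsOf(2)); // prints [2]
    }
}
```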


Graphical User Interface to Populate data into the database Tables


Questions?


APIs

• An Application Programming Interface (API) is a set of commands, functions and protocols which programmers can use when building software.
• It allows programmers to use predefined functions to interact with systems, instead of writing them from scratch.
• Characteristics of a good API
  – Easy to learn and use, hard to misuse.
  – Easy to read and maintain code that uses it.
  – Programming language neutral.
  – Sufficiently powerful to support all computational requirements.


IndoWordNet API

• It allows a user to use the API without knowledge of the database design.
• The API has an object-oriented design.
• The API is designed to support single or multiple languages.
• The API design consists of two layers:
  – Application layer
  – Database layer
• The Database layer will change depending on the DBMS, but the Application layer will mostly remain unchanged.
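
The two-layer split can be pictured with a short Java sketch: the application-side class depends only on an interface, so the storage backend can be swapped without touching it. All names here (ConceptStore, InMemoryConceptStore, SynsetFacade, LayerSketch) are hypothetical stand-ins and not the IndoWordNet API classes; the in-memory map simply takes the place of a real DBMS.

```java
import java.util.HashMap;
import java.util.Map;

// Runs the sketch; every type in this file is a hypothetical stand-in.
class LayerSketch {
    public static void main(String[] args) {
        ConceptStore store = new InMemoryConceptStore(); // swap the backend here
        SynsetFacade synset = new SynsetFacade(7, store); // made-up synset id
        synset.setConcept("a building for living in");
        System.out.println(synset.getConcept()); // prints a building for living in
    }
}

// Database layer contract: the only part that knows how data is stored.
interface ConceptStore {
    String loadConcept(int synsetId);
    boolean saveConcept(int synsetId, String concept);
}

// One possible backend. A MySQL-backed class could implement the same interface.
class InMemoryConceptStore implements ConceptStore {
    private final Map<Integer, String> rows = new HashMap<>();

    public String loadConcept(int synsetId) {
        return rows.get(synsetId);
    }

    public boolean saveConcept(int synsetId, String concept) {
        rows.put(synsetId, concept);
        return true;
    }
}

// Application layer: holds the logic and talks to storage only via the interface.
class SynsetFacade {
    private final int synsetId;
    private final ConceptStore store;

    SynsetFacade(int synsetId, ConceptStore store) {
        this.synsetId = synsetId;
        this.store = store;
    }

    String getConcept() {
        return store.loadConcept(synsetId);
    }

    // Rejects empty concepts before they reach the database layer.
    boolean setConcept(String concept) {
        if (concept == null || concept.trim().isEmpty()) {
            return false;
        }
        return store.saveConcept(synsetId, concept);
    }
}
```

Replacing InMemoryConceptStore with another implementation changes one line in main and nothing in SynsetFacade, which is the property the last bullet describes.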


Application layer

• The Application layer implements the logic of the IndoWordNet requirements: it provides the classes and objects needed to perform all operations on synsets, relations, ontology, other master data, etc.

• Reference Implementation is being done in Java and PHP.


Application Layer consists of the following classes:

– IWAPIClass
  • A class that initialises the API library for use.
  • Maintains master tables.
  • Manages connectivity to language specific databases.
– IWSynset
  • A class that represents a Synset
– IWWord
  • A class that represents a Word
– IWSynsetCollection
  • Collection of Synsets
– IWWordCollection
  • Collection of words for a synset
– IWOntology
  • A class that represents Ontology
  • Each synset is mapped to some place in the ontology tree


– IWOntologyCollection
  • Collection of child nodes for a given ontology node
– IWExampleCollection
  • Collection of examples
– IWFile
  • A class that represents a File
– IWDataFile
  • A class that represents a data file
– IWPictureFile
  • A class that represents a picture file
– IWFileCollection
  • Collection of files


• The Application Layer allows us to perform operations such as:
  – get all the synsets
  – get various relations for a given synset/word
  – get words for a given synset
  – add a new source or domain
  – add a new relation
  – update the records in the table
  – delete a synset/source/domain
  – modify ontology information


Database layer

• The Database layer deals with encapsulation of the database design.

• It provides a standard interface to the application layer.

• The Database layer supports all the operations needed to be performed on the database.


Database Layer consists of the following classes:

– IWDb
  • A class that connects to a Language Dependent Database.
– IWCon
  • A class that sets up a connection to a database
– IWStatement
  • A class which contains all the queries pertaining to the application layer
  • Also the basic functions such as update, delete, insert, select, etc.
– IWResult
  • A class which returns the results of executed queries to the application layer
– IWField
  • A class which returns the proper data type to the application irrespective of the database data type, or vice versa


Class Diagram


Sample API code

• Set up of database connection
  – IWDb dbobject = new IWDb(IWAPIClass.Language_Name);
• Create object for synset
  – IWSynset synsetobject = new IWSynset(synsetID, dbobject);
• Get concept for a synset
  – String concept = synsetobject.getConcept();
• Set concept to a synset
  – boolean flag = synsetobject.setConcept("conceptDefinition");
• Get word collection for a synset
  – IWWordCollection words = synsetobject.getWords();


Questions?


THANK YOU