
Chapter 3 Study of Hindi & Marathi Languages

Dec 09, 2016

  • Thesis HinMaT MT Framework

    Chapter 3

    Study of Hindi & Marathi Languages

    “Language is a process of free creation; its laws and principles are fixed, but the manner in which the principles of generation are used is free and infinitely varied.”
    - Noam Chomsky

    To develop a rule-based MT system, we must have thorough knowledge of the language pair it
    handles. Hence a side-by-side study of the grammar of the source language (SL) and the target
    language (TL) is necessary. This chapter presents our study of Hindi and Marathi from the
    linguistic and computational perspectives. A complete description of the grammar of both
    languages is a huge topic in itself, so we have confined our discussion to grammar in the
    context of HinMaT; wherever necessary, we present a detailed discussion of particular topics.
    One of the objectives of this study is to freeze the approach and architecture for HinMaT,
    which is highly constrained by the linguistic character of the language pair, i.e.
    Hindi-Marathi. Hence, as stated earlier, this study was scoped to the linguistic and MT
    perspectives. It involved a detailed study of the script (Devanagari), the basic alphabet set
    (vowels and consonants), the morphology of both languages, part-of-speech types, grammatical
    categories (व्याकरणिक कोटियाँ) such as gender, number, person, case, tense, aspect, modality and
    voice, sentence structures, and an anatomical study of verbs and verb phrases. Based on the
    study, a paper was presented at the All India Conference on Linguistics (AICL) held at Deccan
    College, Pune (Bhavsar & Pawar, 2008). For this study we referred to the linguistic literature
    (Sing, 2006), (Sing, 1985), (Sahay, 2000), (Tiwari, 2000), (Pandharipande, 1997),
    (Dhongade & Wali, 2009), (Kaul, 2008), (Guru, 1920), (Bajpayee K. P., 1959), (Deshmukh, 1990),
    (Deshmukh, 1990), (Basutkar, 1970), (Hiremath, 1993), (Sabanis, 1974), (Sahay, 2000),
    (Vishwanathdev), (Valimbe, 1983), (Damale M. K., 1965) and a few internet sites. Our study is
    comparative as well as contrastive in nature.

    3.1 General Introduction

    India is a vibrant country, rich in cultural as well as linguistic diversity. According to a
    comprehensive survey conducted in 2009-2013 by the People's Linguistic Survey of India (PLSI),
    a public consultation and appraisal forum, around 780 spoken languages and 86 scripts are in
    use in the country, and the country has lost 250 languages in the last 50 years. The PLSI
    survey was carried out over four years in collaboration with 85 institutions and universities,
    involving 3000 experts. It is the first such survey in independent democratic India, following
    the earlier linguistic survey conducted by the Irish linguist George Abraham Grierson from
    1898 to 1928 (Indian Express article dated 16th July 2013). Of these languages, 122 (already
    recognized in the government census) are spoken by more than 10,000 people each, while the
    others have fewer speakers. The Government of India, under the Eighth Schedule of the Indian
    Constitution, has recognized 22 languages as official languages of the Indian union. These 22
    languages are Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Malayalam, Manipuri, Marathi,
    Nepali, Oriya, Punjabi, Sanskrit, Kannada, Kashmiri, Konkani, Maithili, Santali, Sindhi,
    Tamil, Telugu and Urdu. Thus both languages of the pair handled by HinMaT are recognized under
    the Eighth Schedule. Figure 3.1 shows the language map of India, which clearly reveals that
    Hindi is the most widely spoken language of India, spoken in around 10 states. After
    independence, Hindi written in Devanagari was declared the official language of the Indian
    union (Indian Constitution, Official Language Act-1963, Part 17, article 343, section A),
    while Marathi was declared the official language of Maharashtra state by the Maharashtra
    Official Language Act 1964 of the Government of Maharashtra. Both languages belong to the
    Indo-Aryan language family, a subset of the Indo-European language family, which is the
    world's largest language family.

    Globalization has changed the linguistic character of countries worldwide; languages have
    crossed their traditional boundaries and are now spoken in many parts of the world. Besides
    India, Hindi is spoken in other parts of the world, chiefly Pakistan, Bangladesh, Nepal, the
    UAE etc. Marathi is widely spoken by the Marathi community in Maharashtra and the rest of the
    country, as well as in Mauritius and Israel owing to mass migration of the labour class.
    Today Hindi is ranked as the world's 4th most spoken language and Marathi as the world's 16th
    most spoken language (Lewis, 2014).

    Figure 3.1 Language Map of India (courtesy: http://www.mapsofindia.com)

    3.2 History of evolution and development

    The origin and evolution of Hindi and Marathi (Sing, 2006), (Guru, 1920),
    (Pandharipande, 1997) are shown as a tree diagram on a time scale in Figure 3.2.

    Figure 3.2 Hindi-Marathi Language Evolution Tree

    From Figure 3.2 it is clear that both languages have their roots in Sanskrit and evolved
    after the Apbhransha (अपभ्रंश) era. The milestone eras in the development of these languages,
    as excerpted from (Sing, 2006) and (Deshmukh, 1990), are presented in Table 3.1.

    Table 3.1 Historical Milestones in development of Hindi and Marathi

    Columns: Period | Ruler | Highlights

    Hindi

    1000-1200 AD (Early Period) | Hindu Rulers
      - Last phase of Hindu rule
      - Use of Apbhransha languages in literature
      - Shauraseni Apbhransha became the language of literary work in Northern Indian literature
      - Evolution of Urdu and Khadi boli from Apbhransha

    1200-1500 AD (Pre-Medieval Period) | Muslim Rulers
      - Heirs of the Ghulam, Khilji, Tughlaq and Lodi families ruled the country
      - Spiritual era: Amir Khusro (1253-1325), Kabir (1378-1448), Guru Nanak (1469-1539),
        Sant Dnyaneshwar, Sant Namdev
      - End of Apbhransha and rise of Khadi boli
      - Spread of Khadi boli in the south (Dakhhani)
      - Pharsi became the official language of India

    1500-1800 AD (Post-Medieval Period) | Muslim (Mughal) Rulers
      - Era of peace and prosperity (Akbar, Jahangir, Shahjahan)
      - Encouragement for arts, languages and literature
      - Spread of Hindi during Mughal rule

    1800-1947 AD (Modern Period) | British Rule
      - Fall of Mughal rule
      - Use of Khadi boli for literature preparation and propagation
      - Translation of the Bible into Hindi
      - Hindi played a crucial role in the Independence movement
      - Gandhiji's role in popularization of Hindi at national level
      - Era of standardization and of research and development initiatives for Hindi grammar
      - Inclusion of Hindi in the education system through Fort William College

    1947-till date (Post-Independence era) | Indian Republic
      - Modernization and standardization of Hindi
      - Hindi declared the official language of the Indian union vide article 343 of the Indian
        constitution
      - Enforcement of the Official Language Act 1963
      - Official Language rules (1976/1987)
      - Establishment of the Central Hindi Directorate, Central Hindi Training Institute, Central
        Translation Bureau (CTB) and Department of Official Languages
      - Development of Hindi fonts by the Gist group, C-DAC, Pune
      - Development of software like dictionaries, encyclopedia, etc. for the Hindi language
      - Emergence of Unicode (1987)

    Marathi

    605 AD | - 
      - First inscription at Shravan Belgola (Mysore), Karnataka, in support of the existence of
        Marathi

    1250-1350 AD | Yadav era
      - Important literary work: Mukundraj: Vivek Sindhu; Dnyaneshwar: Dnyaneshwari;
        Mahanubhav Panth: LilaCharitra

    1350-1600 AD | Bramhani era
      - Important literary work: Sant Eknath: Ramayana and Bhagwat; Dasopant: Mitarnav, Padarnav

    1600-1700 AD | Shivaji era
      - Important literary work: Sant Ramdas: Dasbodh and Manache Shlok; Tukaram: Abhang;
        Vaman Pandit: Yatharth Dipeeka; Raghunath Pandit: Damayanti Swayamvar;
        Mukteshwar: Ramayana, Harishchandrakhyan

    1700-1800 AD | Peshava Rule
      - Important literary work: Moropant: Sanskrit Granth, Mahabharat, Ramayana Kavya;
        Ram Joshi: Chhand-manjiri; Shridhar: Harivijay, Pandav Pratap, Shivlilamrut

    1800-till date | British Rule to Indian Republic
      - New dimensions to literature by various scholars
      - Marathi Grammar (Dadoba)
      - Use of Marathi in Mathematics, Geography, Astrology, Newspaper, Drama writing, Spiritual
        Granthas, Translated Granthas

    3.3 Script (लिपि)

    Hindi as well as Marathi have officially adopted the Devanagari script for writing. It is
    important to note that during the 17th century an alternative cursive script called Modi was
    used for official documentation under Hemadpanth ( ) rule as well as during the Shivaji era;
    it was replaced by Balbodh (based on Sanskrit Devanagari) in 1917 during British rule. The
    Balbodh script was used for writing Marathi poetry. The Devanagari script is based on the
    Brahmi script (abugida family of writing systems). The Brahmi script is said to have come
    into existence in 500 BC (though scholars dispute this). When it came to India, it split into
    two streams, the North Stream and the South Stream, which were transformed into different
    scripts. The details of this evolution (Sing, 2006) are presented in chart form in Figure 3.3.

    Figure 3.3 Evolution of Devnagari Script
    [Chart labels: Brahmi Script (500 BC); North Stream: Gupta Lipi (400-500 AD), Kutil Lipi
    (600 AD), Ancient Nagari (900-1100 AD), Bangla, Devnagari, Modernization (15th Century),
    Further Modifications (19th Century), Gujrathi, Asamiya; South Stream: Nandnagari, Tamil,
    Kannada, Telugu, Malayalam]

    Like most of the world's scripts, Devanagari is written from left to right. Owing to its
    syllabic nature and alphabet set, it is scientifically a more complete script than other
    existing scripts such as Chinese, Arabic and Roman, because every sound can be expressed by a
    letter of the alphabet. Unlike the Roman script, it has no provision for capital and small
    letters. The traditional Devanagari alphabet set has 52 characters. It also has cluster
    consonants ( ) , , , Maatras and special symbols. The alphabet set is exhaustive and was
    augmented under the influence of foreign languages like Arabic-Pharsi (, , , , ) and English
    (, ). Some vowels, especially from Sanskrit, have been deprecated (, , ) owing to very low
    usage. The Devanagari alphabet set as adopted in Unicode is given in Appendix A. The Unicode
    set contains the original character set (along with the deprecated letters) plus extended
    characters.

    3.3.1 Hindi Alphabet set ( )

    We found that the linguistic community disagrees about the number of consonants and vowels in
    the original Devanagari alphabet set. Here we present the alphabet set (see Table 3.2) as
    prescribed by the Kendriya Hindi Sansthan, Agra, a Government of India funded autonomous
    institution set up especially for the promotion of Hindi.

    Table 3.2 Hindi Alphabet Set

    Vowels (15): a; aa; i; ee, ii, I; u; oo, uu, U; e; ai, ei; o; au, ou; a^; a:, aH; A; O; Ria
    Consonants (33): ka, ca; kha; ga; gha; Nga; cha; chha; ja; jha; Nja; Ta; Tha; Da; Dha; Na;
      ta; tha; da; dha; na; pa; pha, fa; ba; bha; ma; ya; ra; la; va, wa; sha; Sha; sa; ha
      Also listed: xa, kSha; tra; Gya, jNja, dny; D_a; Dh_a
    Nuktas (05): qa; Kha; Ga; za; Fa
    Matras (12): aa; i; ee, ii, I; u; oo, uu, O; R; e; ai, ei; o; au, ou; A; O
    Special symbols (03): ^; H; M
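The classes used in Table 3.2 (vowels, consonants, nuktas, matras, special symbols) map fairly directly onto ranges of the Unicode Devanagari block (U+0900-U+097F). The following is a minimal sketch of such a classifier, not HinMaT's own code: the function names `classify` and `split_nukta` and the label strings are assumptions made for illustration, and a few rarely used code points simply fall into "other".

```python
import unicodedata

def classify(ch: str) -> str:
    """Label one character using ranges of the Devanagari block (U+0900-U+097F)."""
    cp = ord(ch)
    if not 0x0900 <= cp <= 0x097F:
        return "not Devanagari"
    if 0x0904 <= cp <= 0x0914:
        return "vowel"            # independent vowel letters
    if 0x0915 <= cp <= 0x0939:
        return "consonant"        # KA .. HA
    if 0x0958 <= cp <= 0x095F:
        return "nukta consonant"  # precomposed letters such as U+095B (za)
    if 0x093E <= cp <= 0x094C:
        return "matra"            # dependent vowel signs
    if cp in (0x0901, 0x0902, 0x0903):
        return "special symbol"   # chandrabindu, anusvara, visarga
    if cp == 0x093C:
        return "nukta sign"
    if cp == 0x094D:
        return "halant"
    return "other"

def split_nukta(ch: str) -> str:
    """Decompose a precomposed nukta letter into base consonant + nukta sign."""
    return unicodedata.normalize("NFD", ch)

word = "\u0939\u093F\u0902\u0926\u0940"  # the word 'Hindi' written in Devanagari
labels = [classify(c) for c in word]
print(labels)
```

Because the nukta letters can be stored either precomposed or as base letter plus nukta sign, a lexicon lookup would probably want to normalize both spellings to one form before comparing strings.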

    3.3.2 Marathi Alphabet set ( )

    Traditionally, the Marathi alphabet set has 52 letters, comprising 16 vowels and 36
    consonants. As Marathi follows the Sanskrit alphabet set, it contains a few vowels and
    consonants that are typically found only in Sanskrit; hence most grammarians have recommended
    deleting such vowels and consonants from the Marathi alphabet set. These include consonants
    like , , and , whereas the leading linguist M. K. Damale is of the opinion that only the
    cluster consonants and may be deleted. Another linguist, M. P. Sabanis, is against deleting
    any consonant. Arvind Manglurkar has advocated only 22 consonants. With respect to vowels,
    M. K. Damale suggested deleting long , , and , as they are used only in Sanskrit. Arvind
    Manglurkar has recommended 7 traditional vowels (excluding , , , , , ), while Dr. Lila
    Govilkar is in favour of 9 traditional vowels (, , , , , , , ) and 2 from English (, ).
    Ideally consonants are written with the halant () i.e. , , , etc., but they cannot be
    pronounced in their bare form; hence, while representing them, we have used their vowelized
    letter forms, which can be pronounced (see Table 3.3).

    Table 3.3 Marathi Alphabet Set ()

    50

    , , , , , , , , , , , , , (14), , , , ,

    , , , , , , , , , , , , , , , , , , ,

    , , , , ,
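The distinction between a bare consonant (written with the halant) and its vowelized letter form can be made concrete at the code point level. The sketch below assumes the standard Unicode encoding, in which the halant is U+094D (DEVANAGARI SIGN VIRAMA) and a cluster consonant is a sequence of consonants joined by halants; the helper names `bare` and `conjunct` are hypothetical.

```python
HALANT = "\u094D"  # DEVANAGARI SIGN VIRAMA

def bare(consonant: str) -> str:
    """Return the bare (vowel-less) form of a vowelized consonant letter."""
    return consonant + HALANT

def conjunct(*consonants: str) -> str:
    """Build a cluster consonant: every consonant except the last takes a halant."""
    return HALANT.join(consonants)

KA, SSA, TA, RA = "\u0915", "\u0937", "\u0924", "\u0930"

print(bare(KA))           # KA followed by the halant sign
print(conjunct(KA, SSA))  # the kSha cluster
print(conjunct(TA, RA))   # the tra cluster
```

Fonts render such sequences as a single ligature, but a program comparing or counting letters sees two or three code points, which matters when alphabet sets are counted or words are matched character by character.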

    So, if we compare the alphabet sets of Marathi and Hindi, we find that the letter is not
    present in Hindi, while the Hindi letters added under the influence of Pharsi and Urdu, like
    , , , , as well as , , are not found in Marathi. On the vowel side the two sets are the same
    with the exception of , but as already stated, many people have advocated its deletion from
    the Marathi alphabet set. Regardless of these conflicting views and controversies over the
    alphabet sets, we are of the opinion that whatever is available in the Unicode character set
    for Devanagari, and whatever is required by all sections of society (lay people,
    intellectuals, writers, poets, government and private sector employees, the business
    community etc.), should be acceptable.

    In many cases Hindi and Marathi differ in the spelling of vowels: transformation from a short
    vowel ( ) to a long vowel ( ) and vice versa is a common phenomenon. This is a common source
    of spelling mistakes by speakers of these languages, e.g. -, -. As a raw observation, we have
    witnessed common transformations like -, -, -/, -, - in a few letters. These are illustrated
    in Table 3.4.

    Table 3.4 Alphabet Transformations in Hindi-Marathi Spellings

    Hindi Marathi Example(Hindi-Marathi)

    - , -

    - , -, -

    / -, -

    -, -

    -, -

    -

    -, -

    , -

    - , -
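Short/long vowel alternations of this kind are easy to detect mechanically, which can help a transfer lexicon or spell checker recognize Hindi-Marathi cognates. The sketch below is an illustration under stated assumptions, not part of HinMaT: it covers only the i and u vowel pairs, the function names are hypothetical, and the test word (Hindi kavi with short final i, Marathi kavī with long final ī, "poet") is an example chosen here rather than taken from Table 3.4.

```python
# Short -> long pairs for the i and u vowels (independent letters and matras).
SHORT_TO_LONG = {
    "\u0907": "\u0908",  # letter I  -> letter II
    "\u0909": "\u090A",  # letter U  -> letter UU
    "\u093F": "\u0940",  # matra i   -> matra ii
    "\u0941": "\u0942",  # matra u   -> matra uu
}
LONG_TO_SHORT = {long: short for short, long in SHORT_TO_LONG.items()}

def lengthen_final_vowel(word: str) -> str:
    """Replace a word-final short i/u with its long counterpart."""
    if word and word[-1] in SHORT_TO_LONG:
        return word[:-1] + SHORT_TO_LONG[word[-1]]
    return word

def differ_only_in_vowel_length(a: str, b: str) -> bool:
    """True if two different spellings become identical once vowel length is ignored."""
    def shorten(w: str) -> str:
        return "".join(LONG_TO_SHORT.get(c, c) for c in w)
    return a != b and shorten(a) == shorten(b)

hindi_kavi = "\u0915\u0935\u093F"    # ka + va + short i matra
marathi_kavi = "\u0915\u0935\u0940"  # ka + va + long ii matra
print(lengthen_final_vowel(hindi_kavi) == marathi_kavi)
print(differ_only_in_vowel_length(hindi_kavi, marathi_kavi))
```

A rule like `lengthen_final_vowel` over-generates if applied blindly, so in practice it would be used to propose candidates that are then checked against a Marathi lexicon.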

    3.4 Language Structure ()

    A language is a collection of sentences, which are formed using the words in its vocabulary
    and its grammar rules. Vocabulary in the normal sense refers to the collection of words in a
    given natural language. Before we discuss the nature and structure of Hindi and Marathi
    vocabulary, a few notions related to vocabulary are discussed first.


    3.4.1 Word

    A meaningful cluster of letters is called a word. Every word has a definite meaning and an
    independent existence. When a word is employed in a sentence, it may undergo some changes
    (mostly morphological; see footnote 6) and is transformed into a new form called a Pada;
    hence a lexicon or dictionary always contains words and not Padas.

    3.4.2 Classification of Words

    Words can be classified along different dimensions of abstraction, such as grammatical
    category, morphology, vocabulary etc. A language's vocabulary is enriched from various
    sources. For Indian languages, words are primarily classified using four main categories as
    depicted in (Sharma); these are presented formally in Figure 3.4.

    6 Morphological changes inflect words with gender, number, person, case, tense, aspect and
    mood suffixes (the last three apply only to the verb category).


    Figure 3.4 Word Classification

    Word Classification Criteria
      Part-of-Speech (POS) Category: Noun / Pronoun / Verb / Auxiliary verb / Adjective / Adverb
      Construction
        Traditional
        Conjunct: suffixing, Sandhi, Compounding, Prefixing
      Origin
        Tatsam
        Tadbhav
        Deshaj
        Foreign words: Arabic, Farsi, Turki, English, Portuguese
        Other Indian Languages
      Meaning: Synonyms, Polysemic, Antonyms
      Usage
        Declinable: Noun, Pronoun, Adjective, Verb
        Indeclinable: Adverb, Conjunctives, Relative post position, Vocatives


    3.4.2.1 Part-of-Speech (POS)

    Every word plays a syntactic role when employed in a sentence; this syntactic role is its
    lexical category, which is called its part-of-speech (POS). The term POS is also referred to
    as word type, syntactic category or simply category. Universally, eight basic POS categories
    have been recommended: Noun, Adjective, Pronoun, Verb (Auxiliary verb), Adverb,
    Pre-positions/Post-positions, Conjunction and Interjection. Besides these, Determiner/Article
    and Auxiliary verb, which are subclasses of Pronoun and Verb respectively, are also treated
    as separate POS categories. Each POS type can be further sub-classified into a hierarchy of
    any depth, thus defining an ontology. Both Hindi and Marathi follow the universal POS
    categories, except that pre-positions are not observed in either language. Table 3.5 lists
    the POS categories of English, Hindi and Marathi.

    Table 3.5 English-Hindi-Marathi POS categories

    Sr.No. | English | Hindi | Marathi
    1 | Noun | |
    2 | Adjective | |
    3 | Pronoun | , ( : , ) | ( : , )
    4 | Verb / (Auxiliary verb) | | , ,
    5 | Adverb | |
    6 | Prepositions | Post-positions, , | Post-positions , ,
    7 | Conjunction | |
    8 | Interjection (exclamatory) | |
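The idea that POS categories form a hierarchy of arbitrary depth can be sketched as a small tag set with child-to-parent links. This is an illustrative structure only: the tag strings, the `PARENT` map and the helper names are assumptions, while the declinable/indeclinable split follows the Usage branch of Figure 3.4.

```python
from enum import Enum

class POS(Enum):
    NOUN = "noun"
    ADJECTIVE = "adjective"
    PRONOUN = "pronoun"
    VERB = "verb"
    ADVERB = "adverb"
    POSTPOSITION = "postposition"
    CONJUNCTION = "conjunction"
    INTERJECTION = "interjection"

# Child -> parent links for sub-classes named in the text; deeper levels
# can be added by chaining further entries.
PARENT = {
    "auxiliary verb": "verb",
    "determiner": "pronoun",
}

# Declinable categories as listed under "Usage" in Figure 3.4.
DECLINABLE = {POS.NOUN, POS.PRONOUN, POS.ADJECTIVE, POS.VERB}

def top_level(tag: str) -> POS:
    """Walk up the hierarchy until a basic POS category is reached."""
    while tag in PARENT:
        tag = PARENT[tag]
    return POS(tag)

def is_declinable(tag: str) -> bool:
    return top_level(tag) in DECLINABLE

print(top_level("auxiliary verb"))
print(is_declinable("adverb"))
```

Keeping the hierarchy in data rather than in code means a finer-grained tag can be introduced without touching rules that only care about the top-level category.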

    Except for the verb, which is a complex category, the POS categories have conventional
    meanings and are comparatively easy to understand; hence we skip a detailed discussion of the
    other categories. Since the verb is a very important lexical category from the computational
    perspective, we present a detailed discussion of verbs. Sample words for each POS category
    from Hindi and Marathi are listed in Appendix-B.

    3.4.2.1.1 Verb category

    The verb is a crucial lexical category for sentence analysis and construction, because it
    denotes the action or message specified in the sentence. The entire sentence is anchored at
    the verb. The verb system of any language, including Hindi and Marathi, is complex because an
    action may be expressed using a combination of lexical categories (mostly noun, adjective and
    verb) together with intensifiers, verbalizers and auxiliary verbs. Verb classification is an
    interesting problem, because verbs can be classified at different levels of abstraction:
    lexical, functional (predicate-argument structure) and semantic (action scope).

    At the lexical level, verbs are primarily classified as main verbs and auxiliary verbs.
    Auxiliary verbs assist the main verb in specifying grammatical information about the verb
    action, such as tense, aspect and mood (TAM; these are discussed in section 3.5.2.5). The
    auxiliary verbs in Hindi and Marathi (see footnote 7) can be further sub-classified as forms
    of / , modal auxiliaries like /, /, and auxiliaries denoting TAM and voice. A main verb can
    be classified as a simple verb (verb root), complex verb ( ), compound verb ( ) or conjunct
    verb ( ) (Sing, 1985).

    Simple verbs are formed from verb roots like (eat), (drink), / (wash), (beat) etc., while
    complex verbs ( ) are formed with the help of a noun or adjective and a verbalizer ( ). The
    verbalizers , / are used most frequently, while in idiomatic/conventional complex verb
    constructions other verbalizers like , , , , etc. are also used. These verbalizers can also
    be used as intensifiers. A compound verb ( ) is constructed using lexical verbs ( )
    (generally two verbs) and an intensifier ( ). The intensifier itself can also appear as a
    simple verb. In compound verbs the intensifier loses its own lexical meaning; rather, it is
    used to intensify the meaning of the lexical verb. Around 18 intensifiers are observed in
    Hindi, but primarily only eight are used. These are / (to go), / (to take sense), / (to
    come), / (obligation sense), /, /, /, / . The compound verb is a peculiarity of Indian
    languages. A conjunct verb ( ) is constructed with the help of two lexical verbs such that
    the meaning of both verbs is preserved; such verbs indicate multiple actions. E.g. /
    (delivering and returning back) would mean going, giving and coming back. Use of between
    the two verbs is also observed, as in (delivering and returning back).

    7 Forms written x/y denote word x in Hindi and word y in Marathi, unless otherwise specified.

    From the functional point of view (argument structure), verbs can be classified as
    intransitive (), transitive (), di-transitive/co-agentive ( / ), subject + ( ), linking verb
    ( ), verbs taking an object complement (), and causatives ( ) (Sing, 2006).

    At the meaning level (verb action scope), verbs are classified into finite and non-finite
    forms. Finite forms denote finiteness of the verb action (constrained in time), while
    non-finite forms express an action that is not so constrained. It is important to note that
    only finite verb forms can serve as the root (main verb) of a sentence. Non-finite forms are
    used in an adjectival or adverbial sense. They are further subcategorized as infinitives and
    participles, and participles are in turn classified as perfective, imperfective and
    conjunctive participles (Kaul, 2008). The complete verb classification is shown in
    Figure 3.5.


    Figure 3.5 Verb Abstraction Classification
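The three levels of verb classification (lexical, functional and finiteness) are independent of one another, so a lexicon or parser can carry them as three separate fields on each verb form. The sketch below shows one possible encoding; the class and member names are assumptions made for illustration and are not taken from HinMaT. The only behaviour it encodes is the statement that just finite forms can serve as the main verb of a sentence.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Lexical(Enum):
    AUXILIARY = auto()
    SIMPLE = auto()
    COMPLEX = auto()
    COMPOUND = auto()
    CONJUNCT = auto()

class Functional(Enum):
    INTRANSITIVE = auto()
    TRANSITIVE = auto()
    DITRANSITIVE = auto()
    LINKING = auto()
    OBJECT_COMPLEMENT = auto()
    CAUSATIVE = auto()

class Form(Enum):
    FINITE = auto()
    INFINITIVE = auto()
    PERFECTIVE_PARTICIPLE = auto()
    IMPERFECTIVE_PARTICIPLE = auto()
    CONJUNCTIVE_PARTICIPLE = auto()

@dataclass(frozen=True)
class VerbTag:
    lexical: Lexical
    functional: Functional
    form: Form

    def can_head_sentence(self) -> bool:
        """Only finite verb forms can act as the root (main verb) of a sentence."""
        return self.form is Form.FINITE

finite_transitive = VerbTag(Lexical.SIMPLE, Functional.TRANSITIVE, Form.FINITE)
infinitive = VerbTag(Lexical.SIMPLE, Functional.TRANSITIVE, Form.INFINITIVE)
print(finite_transitive.can_head_sentence(), infinitive.can_head_sentence())
```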

    The various types of verbs listed in this classification are explained briefly below.

    Intransitive (): An intransitive verb takes only a subject (Karta) as its mandatory argument;
    it cannot take an object (Karma). The verb agrees with the subject (Karta) in gender, number
    and person (GNP), except in past tense constructions where the Karta takes a post-position
    marker and hence does not agree with the verb. Hindi verb root and Marathi verb root, are
    examples of intransitive verbs. Example sentences for these verbs are:

    Hindi: (subject) (Shoham slept)

    Marathi: (subject) . (Shoham slept)

    Transitive (): Transitive verbs take a subject (Karta) as well as an object (Karma). When the
    subject takes a post-position marker, the verb agrees with the object in GNP.

    E.g. Hindi: (Shyam ate Mango)

    Marathi: . (Shyam ate Mango)
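The agreement rule stated for intransitive and transitive verbs can be written as a small decision function: the verb takes its GNP from the subject unless the subject carries a post-position marker, in which case it takes it from the object. The sketch below is a simplified illustration; the `Argument` class, the two-letter feature codes (ms, fs) and the English glosses used as word labels are assumptions, and the case of a marked subject with no object is deliberately left undecided because the text does not state it.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Argument:
    word: str                       # gloss of the word, for readability only
    gnp: str                        # e.g. "ms" = masculine singular, "fs" = feminine singular
    has_postposition: bool = False  # True if the argument carries a post-position marker

def agreement_gnp(subject: Argument, obj: Optional[Argument] = None) -> Optional[str]:
    """Return the GNP features the verb should agree with."""
    if not subject.has_postposition:
        return subject.gnp          # default: agree with the subject (Karta)
    if obj is not None:
        return obj.gnp              # marked subject: agree with the object (Karma)
    return None                     # not covered by the rule as stated in the text

girl = Argument("girl", "fs")
girl_marked = Argument("girl", "fs", has_postposition=True)
mango = Argument("mango", "ms")

print(agreement_gnp(girl, mango))         # agrees with the unmarked subject
print(agreement_gnp(girl_marked, mango))  # agrees with the object
```

In a rule-based generator this choice is made before inflecting the target verb, so that the same function can serve both Hindi analysis and Marathi generation as long as both sides expose the marker information.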

    Linking Verbs ( ): Linking verbs are used in copula sentences, where they link the subject
    with the predicate (complement/). A noun, adjective or adverb can appear as the predicate.
    The Hindi verb root form of and the Marathi verb root form of are mostly used as linking
    verbs.

    E.g. Hindi: i) (subject) (noun predicate) (Rafiq is my servant)

    ii) (subject) (adjective predicate) (Radha is beautiful)

    Marathi: i) . (Rafiq is my servant)

    ii) . (Radha is beautiful)

    Di-transitive/Co-agentive ( / ): Di-transitive verb forms take three arguments: subject,
    direct object and indirect object. Di-transitive constructions are also referred to as dative
    constructions.

    E.g. consider the following illustrations:

    Hindi: i) (subject) (indirect object) (direct object) (Ramesh gave money to servant)

    ii) (co-agent: Ablative, source) (Gita took money from mother)

    iii) (co-agent: Instrument) (Prashant opened the lock with key)

    Their Marathi equivalents are:

    i) .

    ii) (co-agent: Ablative, source) .

    iii) (co-agent: Instrument) .

    Object Complements (): The object of such verbs requires a complement () to complete the
    meaning of the sentence. It is important to note that these verbs are not classified as
    di-transitive, though their argument structure appears similar.

    E.g.

    Hindi: i) (subject) (object) (object complement) (We elected Ramesh as our leader)

    ii) (object) (object complement)

    Marathi: i) (object) (object complement) .

    ii) (object) (object complement) .

    Causative verbs ( ): Two types of causatives, Causative-1 and Causative-2, are witnessed in
    Hindi as well as Marathi. Intransitive and transitive verbs can be causativized by
    introducing causativization suffixes. Causative-1 constructions involve two essential
    entities, the causer (sponsor) and the agent (subject), followed by an optional object
    (karma; see footnote 8) and the causative form of the verb. Here the sponsor or causer
    functions as the grammatical Karta, which agrees with the verb, while the agent (the Karta
    from the real-world point of view, ( )) takes a post-position marker (quite often ko:). The
    causer in causative-1 forms participates in the action. Causative-2 verb forms require a
    mediator to perform the verb action. The causer in such cases only triggers the verb action;
    the mediator participates in the process to aid the agent ( ) in performing the verb action.
    Intransitive as well as transitive verb forms can be transformed into causative-2 verb forms.

    It is important to note that not all verb roots can be causativized (Basutkar, 1970). E.g.
    Hindi verb roots like (come), (go), (want) etc. cannot be causativized; the same is true for
    Marathi. Causatives are used less frequently in Marathi than in Hindi. The Hindi suffixes
    -, , - are mainly used for obtaining causative forms. Marathi causativization is done in two
    ways: first, by adding the causative suffixes or - (see footnote 9) with some morphological
    transformations, and secondly, by adding additional affixes like , , and , intensifier. The
    first is more popular and frequent. Table 3.6 shows the causative suffixes for both Hindi and
    Marathi (Basutkar, 1970) in both causative forms along with examples (note that / in the
    table indicates an option).

    8 Presence of the object depends upon the functional nature of the verb, i.e. intransitive or
    transitive.

    Table 3.6 Causative suffixes in Hindi and Marathi

    Columns: Language | Causative-1 suffix | Causative-2 suffix | Example

    Hindi
      Causative-1 suffix: , -, , ,
        E.g. -,- ,-,,
      Causative-2 suffix: , -,-
        E.g. -,- ,-
      Example, Causative-1:
        i) (causer) (agent)
        ii) (causer) (agent)
      Example, Causative-2:
        i) (causer) (mediator) (agent)
        ii) (causer) (mediator) (agent)

    Marathi
      Causative-1 suffix:
        Type 1 (suffixing): -,-,-
          E.g. -/ , -/ -/ , -
        Type 2 (transformations): , ,
          E.g. -, -, -, - , ,
      Causative-2 suffix:
        Type 1 (suffixing): -,-,-
          E.g. -/ ,-/ ,-/ ,-/
        Type 2: using intensifiers in compound verb
      Example, Causative-1:
        i) (causer) (agent) .
        ii) (causer) (agent) .
        iii) (causer) (agent) .
        iv) (causer) (agent) .
        v) .
      Example, Causative-2:
        i) (causer) (mediator) (agent) .
        ii) (causer) (mediator) (agent) .
        iii) (causer) (mediator) (agent) .

    9 Marathi uses the same suffixes for both causative forms, i.e. causative-1 and causative-2.

    Infinitives ( ): These are also called gerunds and are used as nouns (abstract class) or
    adjectives. In Hindi, the infinitive form is obtained by adding the suffix -ना to the verb
    stem. It may also take an object as its argument. Adjectival use of infinitives is observed
    with verbs of obligation like , . In Hindi, if the infinitive is transitive, it may take
    inflected forms of the suffix -ना, i.e. -ने, -नी. Marathi infinitives use the suffix -णे with
    the stem to mark the infinitive.

    E.g. The following examples show the use of infinitives, which are marked with underline,
    while infinitival suffixes are set in bold face.

    Hindi: (noun) (Running is good for health)

    Marathi: (noun) .

    Hindi: (object) (He is fast in book reading)

    Marathi: (object) .

    Hindi: (object) (I will have to drink the medicine)

    Marathi: (object) .

    Hindi: (object) (I want to drink water.)

    Marathi: (object) .
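Infinitive formation by suffixing can be sketched as plain string concatenation on the verb stem. The sketch assumes the standard infinitive suffixes of the two languages (Hindi -nā with inflected forms -ne and -nī, Marathi -ṇe); the dictionary keys and function names are hypothetical, and stems that undergo vowel changes before the suffix are not handled.

```python
# Suffixes as code point sequences: NA + AA sign, NA + E sign, NA + II sign, NNA + E sign.
HINDI_INFINITIVE = {
    "base": "\u0928\u093E",  # -naa
    "ne": "\u0928\u0947",    # inflected form -ne
    "ni": "\u0928\u0940",    # inflected form -nii
}
MARATHI_INFINITIVE = "\u0923\u0947"  # -Ne

def hindi_infinitive(stem: str, form: str = "base") -> str:
    """Attach the Hindi infinitive suffix (or one of its inflected forms) to a stem."""
    return stem + HINDI_INFINITIVE[form]

def marathi_infinitive(stem: str) -> str:
    """Attach the Marathi infinitive suffix to a stem."""
    return stem + MARATHI_INFINITIVE

KHA = "\u0916\u093E"  # verb stem 'eat' (khaa), shared by Hindi and Marathi
print(hindi_infinitive(KHA), marathi_infinitive(KHA))
```

Because this stem is shared by both languages, Hindi-to-Marathi transfer of the infinitive reduces here to swapping the suffix; stems that differ between the languages would first need a lexicon lookup.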

    Participles ( ): Participles in Hindi work as adjectives and adverbs. They are further
    classified as perfective, imperfective and conjunctive participles (Kaul, 2008). Perfective
    participles indicate completed activities, while imperfective participles denote unfinished
    actions.

    In Hindi, imperfective participles are formed by adding the suffix -ता (ms; see footnote 10),
    -ती (fs), -ते (mp) or -तीं (fp) to the stem to agree with the noun in GNP. In adverbial use
    only the -ते suffix is used, while in adjectival use all the suffixes are used. Adverbial
    imperfectives may be reduplicated and used in time expressions. Adjectival imperfective as
    well as perfective participles are expanded with simple present inflections of the Hindi
    auxiliary verb , i.e. (ms), (*p), (fs).

    Perfective participles in Hindi are formed by adding the suffixes -, -, -. Like imperfective
    participles, perfective participles can be used adjectivally or adverbially. Perfective
    adverbial participles are often reduplicated. Conjunctive participles are used in sentences
    where two actions share the same subject and are temporally ordered, the first being the
    antecedent of the other. In such sentences the first verb appears in stem form followed by
    the purvkalik krudant ( ) suffix (kar), while the second verb takes the other conjugation
    suffixes. Examples of each type of participle are given in Table 3.7; note that the suffixes
    are marked in bold, while participles are underlined in the examples.

    Table 3.7 Participle Usage and Examples

    Participle Usage Example

    Imperfective Adjectival

    Hindi:

    Marathi: . (The running boy fell down.)

    Hindi:

    Marathi: .

    10 ms: masculine singular, mp: masculine plural, fp: feminine plural, fs: feminine singular, *: anything (m or f, s or p)

    Adverbial

    Hindi: (Radha met me while returning from the school.)

    Marathi: .

    Perfective

    Adjectival

    Hindi: (ms)/ (fs)/(mp)(//) (ms)/(fs)/(mp) (Sitting man/woman/boys)

    Marathi: (ms) / (fs)/ (mp)

    Adverbial

    Hindi: (ms)/(fs) / (The man/woman sitting on the roof was singing.)

    Marathi: (ms)/ (fs)(ms)/(fs) /.

    Hindi: - (I got tired of sitting idle at home.)

    Marathi: - .

    Conjunctive

    Hindi: (verb1 stem+) (verb2) (He wrote the letter after reading the newspaper.)

    Marathi: .

    Hindi: (conjunctive marker) (He took tea after finishing the work.)

    Marathi: .

    Adverbial

    Hindi: (He went and came back.)


    Marathi: .

    Fixed expressions

    Hindi: (I specially meet him.)

    Marathi: .

    3.4.2.2 Construction

    Words can be classified on the basis of their construction. There are two sub-classes under this categorization, viz. Traditional ( ) and Conjunct ( ). Traditional words are those that have been in use since the past and are accepted as part of tradition, while conjunct words are formed by conjoining two or more words. This conjoining may be done through affixation, Sandhi or compounding (). Affixation is done using suffixes ( ) and prefixes ().

    3.4.2.3 Origin ( )

    There are four subtypes under this type, i.e. Tatsam ( ), Tadbhav ( ), Deshaj () and Videshaj ( ). Most of these words are loan words or transformations of loan words, except for the Deshaj category.

    Tatsam words are loan words borrowed from Sanskrit and used as they are.

    Tadbhav words are borrowed Sanskrit words that have undergone some transformations.

    Deshaj words are not borrowed from Sanskrit or other Indian languages; they come from dialects and carry a strong influence of local culture and lifestyle.

    Videshaj words are borrowed strictly from foreign (non-Indian) languages.

    3.4.2.4 Meaning ()

    This typically refers to semantic ontological classification like synonymy, antonymy,

    and polysemy. Synonymy talks about meaning equivalence between two different

    words, while antonymy relates words with opposite meaning. Polysemy refers to

    multiple meanings associated with the same word.


    3.4.2.5 Usage ( )

    This classification is done purely on linguistic consideration of morphology and part

    of speech (POS). Declinable words are those words that undergo morphological

    inflections due to Gender, Number, Person and Case, while indeclinable words do not

    change at all.

    3.4.3 Hindi & Marathi Vocabulary

    Hindi and Marathi are rich languages in terms of their vocabulary (Deshmukh, 1990). Their vocabulary consists of all the types of words shown in Figure 3.4 above. Since Maharashtra shares its state border with neighboring states like Karnataka, Andhra Pradesh, Gujarat and Madhya Pradesh (Dhongade & Wali, 2009), Marathi vocabulary is enriched by Telugu, Kannada, Gujarati and even Hindi. Hindi and Marathi also have a good number of Tatsam and Tadbhav words as well as foreign words (pl. see Appendix B). Both vocabularies are highly influenced by Sanskrit. It is common to observe words that have the same origin but different meanings, or the same origin but different spellings. Consider the following examples from (Deshmukh, 1990):

    E.g. 1) Same source, different spellings:

    English word Guest: (Hindi) and (Marathi)

    English word Committee: (Hindi) and (Marathi).

    2) Same source, different meaning:

    The Hindi word means (attempt), while in Marathi the same word means (mockery); similarly (Hindi) means education, while in Marathi it means punishment.

    3.5 Morphology ( )

    Morphology is the branch of linguistics that deals with deriving new word forms from the vocabulary of a language. Morphology attracted special attention from scholars after the advent of NLP, when practical NLP application development started. Every natural language has its own morphological system. In the Indian context, the study of morphology is very important because Indian languages are morphologically very rich. Morphology enables a word to be employed in a sentence by deriving a Pada. The linguistic literature (Kaul, 2008), (Bajpayee K. P., 1959), (Guru, 1920) prescribes two kinds of morphology, viz. inflectional and derivational. The derivation process may or may not change the Part of Speech (POS) category of a word; based on this latter phenomenon, morphology is classified as inflectional or derivational. Before we begin the discussion of these types, let us discuss the detailed structure of a Pada.

    3.5.1 Anatomy of Pada

    For the purpose of study, the Pada is divided into two parts: Prakriti and Pratyaya (affix). The group of Prakriti and Pratyaya is called Tidant ( ) (Deshmukh, 1990). The Prakriti part can be further classified as Pratipadik (stem) and Dhaatu (verbal root). The term Pratipadik (stem) denotes the non-verb POS categories, while Dhaatu (verbal root) refers to verbs. A Pratipadik can be subtyped as Vyutpanna (constructed) or Avyutpanna. Vyutpanna Pratipadiks are constructed by the affixation process. Compounds (Saamasik words) are also treated as Vyutpanna. The Pratyayas (affixes) are classified as grammatical and Vyutpadak, the latter comprising krut () and taddhit ( ). The detailed classification is shown in the chart (pl. see Figure 3.6 below). krut affixes are conjoined only with a Dhaatu, while taddhit affixes are conjoined only with Pratipadiks. Both Hindi and Marathi Padas follow the structure depicted in Figure 3.6. It is important to note that both languages have only prefixes and suffixes, and no infixes. Words formed by suffixing krut suffixes to a Dhaatu are called kridant ( ), and these can further take taddhit suffix(es). Words formed by suffixing taddhit suffixes can recursively take additional taddhit suffix(es) (not more than 2-3 levels). Both Marathi and Hindi have borrowed prefixes from Sanskrit as well as from foreign languages like Farsi and Arabic. A list of sample prefixes and suffixes (Sharma), (Deshmukh, 1990), (Pandharipande, 1997), (Dhongade & Wali, 2009) is presented in Table 3.8 below. Grammatical suffixes are discussed in the following section of this chapter.
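The attachment constraints described above (krut only on a Dhaatu, taddhit only on a Pratipadik, a kridant accepting further taddhit suffixes) can be sketched as a small data structure. This is only an illustrative sketch and not the HinMaT implementation: the class names, the romanized morphemes and the depth limit `MAX_TADDHIT_DEPTH` are our assumptions, and treating a kridant as a Pratipadik for further suffixation is a modelling choice. Sandhi and surface spelling are not handled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class PrakritiKind(Enum):
    PRATIPADIK = "stem"       # non-verb base
    DHAATU = "verbal root"    # verb base


class AffixKind(Enum):
    GRAMMATICAL = "grammatical"
    KRUT = "krut"             # attaches only to a Dhaatu
    TADDHIT = "taddhit"       # attaches only to a Pratipadik


# Assumption: upper bound taken from the "not more than 2-3 levels" remark.
MAX_TADDHIT_DEPTH = 3


@dataclass(frozen=True)
class Pada:
    prakriti: str
    kind: PrakritiKind
    suffixes: Tuple[Tuple[str, AffixKind], ...] = ()

    def attach(self, form: str, affix: AffixKind) -> "Pada":
        """Return a new Pada with one more suffix, enforcing the constraints."""
        kind = self.kind
        if affix is AffixKind.KRUT:
            if kind is not PrakritiKind.DHAATU:
                raise ValueError("krut suffixes attach only to a Dhaatu")
            # Modelling choice: the resulting kridant behaves like a Pratipadik.
            kind = PrakritiKind.PRATIPADIK
        elif affix is AffixKind.TADDHIT:
            if kind is not PrakritiKind.PRATIPADIK:
                raise ValueError("taddhit suffixes attach only to a Pratipadik")
            depth = sum(1 for _, k in self.suffixes if k is AffixKind.TADDHIT)
            if depth >= MAX_TADDHIT_DEPTH:
                raise ValueError("too many taddhit levels")
        return Pada(self.prakriti, kind, self.suffixes + ((form, affix),))


# Romanized, illustrative morphemes: sundar + -taa, likh + -aavat.
beauty = Pada("sundar", PrakritiKind.PRATIPADIK).attach("-taa", AffixKind.TADDHIT)
handwriting = Pada("likh", PrakritiKind.DHAATU).attach("-aavat", AffixKind.KRUT)
```

Keeping the morphemes as a tuple rather than a concatenated string leaves the surface form to a separate spelling step, which is where sandhi rules would apply.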


    Table 3.8 Sample Affixes used in Hindi/Marathi

    Affix Type Hindi Marathi

    Suffix

    Sanskrit:-, -, -, - , -, -, -, - , -, -, -,,-, - ,-, -, -, - , -,- , -, -, -, - , -, -, -, -,-,-, -

    Arabic/Farsi:-, -, -, -, -,-, -, -, -, -, - , -, -, -, -,

    Sanskrit:-,-,-, - ,-, -,-, - , -, -, -, -, -, - ,-, -,-, - ,-, -, - ,- ,-, - , - , -, , -, -, -

    Arabic/Farsi:-, - , -, -, -, -, -, -, - , -,-, -, -/-, -, -, -, -,-, -

    Prefix

    Sanskrit: -, -, -, -, -,-, -, -, -, -, -, -, -, -, -, -, -, -, -, -, -, -, -

    Arabic/Farsi:-, -, -, -, -, -, -, -, -, -, -, -

    Sanskrit:-, -, -, -, -,-, - -,-, -,-, -, -, -, -, -, -, -, -, -, -, -,-, -, -, - -, -Arabic/Farsi:-, -, -, -, -, -,-, -, -


    Figure 3.6 Anatomy of Pada

    3.5.2 Grammatical Categories ( )

    Words employed in a sentence (Padas) convey two types of information, viz. lexical meaning ( ) and grammatical information ( ). Lexical meaning reflects some object or concept in the physical world, whereas grammatical information includes attributes/features like Gender, Number, Person, Case, Tense, Aspect, Voice and Mood. Gender, Number, Person and Case are collectively referred to as GNPC, while Tense, Aspect and Mood are abbreviated as TAM. Lexical meaning conveys the semantic character of a word, which is called its logical category, while the grammatical information is called its grammatical category (Deshmukh, 1990) (Sing, 2006). The grammatical category is often confused with the part-of-speech (POS) category, which is actually the lexical category of a word. E.g. consider the sentence (Girl is eating chapatti). Here the word (girl) conveys the lexical meaning of a female human being and the grammatical information that the word has feminine gender and singular number, while the verb phrase tells about the action of eating in the present continuous tense.

    Grammatical categories are scoped only to the grammatical world. They may or may not have relevance in the physical world. E.g. the grammatical category Gender may or may not bear any resemblance to the concept of biological sex: the English word chair does not denote anything with a biological sex, yet its Hindi equivalent has feminine gender, while in English it is neuter. In our opinion this is more a matter of convenience for coping with the world of grammar than with the physical world. As stated above, eight such grammatical categories, namely Gender, Number, Person, Case, Tense, Aspect, Voice and Mood, have been considered. The latter four apply specifically to verbs and auxiliary verbs, while the first four apply to nouns and adjectives. Their detailed classification is shown in Figure 3.7 below. A verb does not have gender, number or person by nature, but verbs do get inflected for GNP features: to maintain the harmony of sentence construction, the verb form in a sentence has to agree with either the subject (Karta), the object (Karma) or neither. Hence the verb does get inflected for gender, number, person and case. The following section discusses these categories in sufficient detail, first in a general context and then in the specific context of Hindi and Marathi.

    Figure 3.7 Grammatical Categories ( ). Verb/Aux Verb: Tense, Aspect, Mood, Voice. Noun/Adjective: Gender, Number, Person, Case.

    3.5.2.1 Gender

    This important category is amongst the most discussed grammatical categories in linguistic literature. As stated in the opening discussion (pl. see 3.5.2), the concept of Gender ( ) is inspired by biological sex ( ), which is a natural phenomenon in all living beings; at times, however, grammatical gender bears no resemblance to natural sex, and in such cases it is more a grammatical convenience. The gender of a word is scoped to the grammar space, while biological sex is scoped to the physical world. Languages differ in the number of genders they permit, ranging from two up to 30 (Deshmukh, 1990). By and large, most languages follow a three gender ( ) system comprising masculine, feminine and neutral genders. As a rule of thumb, animate things are classified as masculine or feminine according to their biological sex, and non-living things are normally put under the neutral gender. This rule is followed in Sanskrit. Most Indian languages follow the three gender system, while Hindi has only two genders: masculine and feminine. Gender fixing and identification is a common problem across most languages, because the gender system is irregular and not well defined in many of them. Here we present some commonly used tricks for identifying the gender of words in Hindi and Marathi. These tricks are based more on intuition and heuristics than on sound scientific principles. Their applicability to other languages has not been studied in the present research work, and exceptions to the rules are easy to find. The tricks are explained below:

    a. Using Affixes: Affixes can be used to identify gender in some cases in Hindi as

    well as Marathi. E.g. Hindi Suffixes , -, -, -, , -,

    -,- are commonly used to denote feminine gender , ,

    .

    b. Karaka: Hindi Vibhakti symbols / / and Marathi Vibhakti symbols

    // can also be used for identifying gender. E.g. (son of Ram),

    (Daughter of Ram), (son of Ram),

    (Daughter of Ram), (tree of tamarind)

    c. Compound words: [Prince] (masculine), [Princess] (feminine).

    d. Based on importance/size/greatness: Things that are bigger in size and more important under prevailing social norms are treated as masculine, while things that are smaller in size and socially less important are treated as feminine. E.g. (King), (proprietor), (driver), (Commander in chief) are treated as masculine, while (queen), (land lady), etc. are feminine. Exceptions to this are (police), (rope) and (rope), which are treated as feminine, while (thread) is treated as masculine.

    e. Word ending: generally - or- ending words are considered as feminine in

    Hindi as well as Marathi, while ending words are treated as masculine.

    E.g. Hindi: (m) (bull), (f)(rope), (f)(Jalebi-sweet dish) etc.

    Marathi: (m)(boy), (m)(horse), (f)(girl), (f)(mare:

    female horse) etc.

    Exceptions: Marathi-(f)(school)
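The word-ending trick (item e) lends itself to a very small lookup. The sketch below is only illustrative: since the exact endings are not legible in this copy, we assume the common pattern in which a final long -ii marks feminine and a final long -aa marks masculine, and we treat known exceptions (such as the Marathi word for school) through an explicit table. The function name and the exception table are hypothetical.

```python
from typing import Dict, Optional

# Assumed endings (Devanagari code points): long -ii and long -aa,
# both as dependent vowel signs and as independent letters.
FEMININE_ENDINGS = ("\u0940", "\u0908")
MASCULINE_ENDINGS = ("\u093e", "\u0906")

# Hypothetical exception list: Marathi 'shaalaa' (school) ends in -aa
# but is feminine.
MARATHI_EXCEPTIONS: Dict[str, str] = {"\u0936\u093e\u0933\u093e": "f"}


def guess_gender(word: str,
                 exceptions: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return 'm', 'f', or None when the ending gives no hint."""
    if exceptions and word in exceptions:
        return exceptions[word]
    if word.endswith(FEMININE_ENDINGS):
        return "f"
    if word.endswith(MASCULINE_ENDINGS):
        return "m"
    return None  # heuristic does not apply; fall back to the lexicon
```

Because the heuristic is unreliable, returning `None` rather than a default gender lets the caller fall back to a lexicon entry instead of silently guessing.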

    In Indian languages, gender has an impact on number inflections. It is interesting to note that synonymous words may be divided between the masculine and feminine classes. Also, the vocabulary of a language is often dominated by the masculine gender; hence by default gender transformation suffixes are found for masculine to feminine transformation. This issue is discussed in the section on the Number category.

    3.5.2.1.1 Hindi Gender System

    Though Hindi has its roots in Sanskrit, unlike Sanskrit (see footnote 11) it has two genders, i.e. masculine and feminine; it does not have a neutral gender. When we tried to trace the reason for the omission of the neutral gender, we found scholars stating that during the long span of foreign rule, specifically Islamic rule, Hindi came under the influence of Farsi, which has only masculine and feminine genders, and hence Hindi too has two genders (Yadav, 2011). Senior grammarian Kishoridas Bajpayee has strongly justified the two gender system on the grounds that it brings simplicity to the gender system. It is common practice to put most neutral gender words from Sanskrit and other languages under the masculine gender. The two gender system is a source of difficulty for Hindi speakers learning foreign and other Indian languages that have three genders. As reported by (Yadav, 2011), the Hindi gender system is not regular and stable, because Hindi has a wide geographical spread and is therefore influenced by local culture, traditions and dialects. Gender transformation in foreign words is also observed in Hindi.

    Gender forms are generated by suffixing gender suffixes to stems; this process is called inflection. It is important to note that only masculine nouns are converted to their feminine counterparts, and not all masculine nouns can be transformed to feminine nouns. A few nouns are always masculine plural, e.g. (darshana), (tears), (lips), (hair). Stems are inflected not only for gender but also for the other grammatical categories listed above. This kind of inflection is possible for verbs also: besides GNP features, verbs are also inflected for Tense, Aspect and Mood (see footnote 12). Gender has a strong influence on number inflections. For the feminine gender, the suffix - is heavily used in Hindi. The Sanskrit feminine suffixes , - are derived as , -, -, -, - in Hindi. In addition to these, Hindi also uses other suffixes, but the eight frequently used suffixes are: , -, -, -, -, - , and -. Sample masculine and feminine words in Hindi and Marathi are given in Appendix B. Most Hindi pronouns in all three persons and cases are gender neutral, so to speak; they can be used for both masculine and feminine gender. Computationally this is a very important aspect in the context of Hindi parsing and Machine Translation.

    11 Sanskrit has three genders: masculine, feminine and neutral.

    3.5.2.1.2 Marathi Gender System

    Since Marathi follows the Sanskrit gender system, it has three genders, i.e. masculine, feminine and neutral. Marathi pronouns are more diversified in terms of gender than Hindi pronouns. Senior grammarian Dadoba Pandurang has advocated an additional common gender ( ) for words whose gender cannot be determined in any tense, e.g. (you), (bird), but other grammarians like Damale and Chiplunkar have opposed the idea of a common gender (Hiremath, 1993). As far as gender fixing or identification is concerned, word ending and traditional usage are the important criteria. Because of its three genders, the Marathi gender system is more complex than that of Hindi. Like Hindi, the Marathi gender system is also found to be irregular: there are words whose gender cannot really be justified within the existing frame of rules but which have been followed as part of tradition and ancient practice, e.g. (Gold)-Neutral, (Silver)-Feminine. Neutral gender is also used for animate living things. Words (common nouns) representing a particular ontological class (college, class of all fruits etc.) are generally put under the neutral gender. Inanimate things can be put under any of the three genders.

    Various suffixes are used for gender identification as well as gender transformation; the frequently used suffixes are listed in the following table (Table 3.9). It is interesting to note that only masculine to feminine transformation and vice versa is observed in Marathi; masculine or feminine to neutral is not observed. However, neutral to masculine or feminine is observed in some cases (e.g. / ). Marathi verbs are not affected by gender in their future tense forms. A few words in Marathi represent more than one gender; common nouns representing professions generally fall under this category, e.g. (advocate), (judge), (client), (registrar), (Prime-minister) etc. Borrowed foreign words as well as Sanskrit words may also undergo gender transformation in some cases.

    12 In our view, TAM is a manifestation of the case feature for verbs.

    Table 3.9 List of Hindi-Marathi Gender Transformation Suffixes

    Language

    Gender

    Masculine

    (word ending)

    Feminine

    (word ending)

    Hindi

    -( ) - ( )

    -( ) -( )

    Hindi

    -() -()

    -( ) - ( )

    -() - ( )

    -() - ()


    -() - ()

    -() - ()

    -() - ()

    -() - ( )

    - ( ) - ( )

    - ( ) - ( )

    - ( ) - ( )

    - () - ()

    - ( ) - ( )

    Marathi

    -() - ()

    -() - ( )

    () - ( )

    -() - ()

    -( ) - ( )

    -() - ()

    -( ) - ( )

    -() - ()

    - ( ) - ( )


    Hindi and Marathi vocabularies contain a lot of common words. A considerable number of such words that are masculine in Hindi are put under the neutral gender in Marathi. Some examples are given in the following table (Table 3.10).

    Table 3.10 Hindi-Marathi common words with different gender

    Due to the difference in gender systems between Hindi and Marathi, gender divergence is frequently observed during translation. This divergence has a strong effect on MT, as it may break the agreement between sentence constituents. The effect is not limited to the divergent word and its modifiers; it may even affect the verb and auxiliary verbs, if the divergent word governs the GNP features of the verb. HinMaT handles these issues carefully.
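To make the effect of gender divergence concrete, the following sketch shows one way the new gender of a transferred noun could be propagated to every constituent that agrees with it. It is a simplified illustration under our own assumptions, not the mechanism used in HinMaT: the `Token` fields, the `head` index convention and the one-entry transfer lexicon are hypothetical, and only features (not the dependents' lemmas) are transferred.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Token:
    lemma: str
    gender: str   # "m", "f" or "n"
    number: str   # "s" or "p"
    head: int     # index of the agreement controller, -1 if the token has its own gender


# Hypothetical transfer lexicon: Hindi lemma -> (Marathi lemma, Marathi gender).
# 'ghar' (house) is masculine in Hindi and neutral in Marathi.
TRANSFER: Dict[str, Tuple[str, str]] = {"ghar": ("ghar", "n")}


def transfer_gender(tokens: List[Token],
                    lexicon: Dict[str, Tuple[str, str]]) -> List[Token]:
    # Step 1: controllers (nouns) receive the target-language lemma and gender.
    out = []
    for tok in tokens:
        if tok.head == -1 and tok.lemma in lexicon:
            lemma, gender = lexicon[tok.lemma]
            tok = replace(tok, lemma=lemma, gender=gender)
        out.append(tok)
    # Step 2: modifiers and verbs copy the gender of their controller.
    return [replace(t, gender=out[t.head].gender) if t.head != -1 else t
            for t in out]


# "big house fell": the adjective and the verb both agree with the noun at index 1.
hindi = [Token("baDaa", "m", "s", 1), Token("ghar", "m", "s", -1), Token("gir", "m", "s", 1)]
marathi_features = transfer_gender(hindi, TRANSFER)
```

Doing the propagation in a second pass means a dependent that precedes its noun (as the adjective does here) still receives the updated gender.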

    3.5.2.2 Number

    This is a simple grammatical category as compared to the other categories. Number is used to represent the cardinality (count) of the things denoted by a lexical item. Even though it is primarily associated with nouns (see footnote 13), its effect can be observed on adjectives and verbs, as they can also be inflected for number by affixing an appropriate suffix. This is a grammatical convenience for coping with feature agreement between sentence constituents. Quite often the notion of grammatical number agrees with real world number, but sometimes there is disagreement, because some words are used in a particular way by tradition. E.g. wheat (), sugar ( ) denote singular number, although they actually refer to any number of grains. A few words are always used in either singular or plural form, and their number cannot be changed; uncountable things (milk, water, hair) fall under this category. To quantify such things, measures such as liter, gram etc. are used.

    There is no uniformity regarding the number category amongst the languages of the world. Languages like Ancient Greek and Sanskrit have three numbers: singular (), dual ( ) and plural (). The Fiji language has four numbers: singular (), dual ( ), trial ( ) and plural (). Both Hindi and Marathi use two numbers, i.e. singular () and plural (). A detailed discussion of the number systems of Hindi and Marathi is presented below.

    13 Only common nouns/pronouns are affected by the number category; other noun types such as proper nouns and abstract nouns are not affected by number.

    Table 3.10 (continued)

    Word    Marathi Gender    Hindi Gender
            Neutral           Masculine
            Neutral           Masculine
            Neutral           Masculine
            Neutral           Masculine
            Neutral           Masculine
            Neutral           Feminine
            Neutral           Masculine
            Neutral           Masculine
            Neutral           Masculine

    3.5.2.2.1 Hindi Number System

    Hindi uses two numbers, singular and plural. Ontological classification of number

    (with examples) for noun POS category is represented using following Figure 3.8.

    Figure 3.8 Number category classifications in Hindi

    The uncountable nouns under the mass category in Figure 3.8 are quantified with the help of measures like liter, kilogram, meter etc., depending on their natural property (liquid, solid etc.). The uninflected word form normally denotes singular number (see footnote 14). Plural forms are derived by affixing plural suffixes to singular word forms. These suffixes are shown in Table 3.11 below.

    14 Excluding those words which are by default plural

    Number
        Countable
            Singular: (boy), (goat), (horse)
            Group: (crowd), (meeting), (family)
        Uncountable
            Mass: (gold), (steel), (water)
            Abstract: (truth), (fear)


    Singularity or plurality is also governed by another factor, i.e. the case marker. Two forms are found for singular and plural representation depending on the use of a post-position marker (see footnote 15); these forms are called (Savibhaktik: with post-position marker) and (Avibhaktik: without post-position marker). It is important to note that, in Hindi, all masculine plural direct case common noun forms are the same as the masculine singular oblique forms. E.g. the word (bacche: boys) denotes the plural direct case, whereas the same word in the phrase (bacche ne) denotes the singular sense, a boy. This fact can be modelled using the following equation:

    $$\mathrm{Word}(\text{masculine}, \text{plural}, \text{direct}) = \mathrm{Word}(\text{masculine}, \text{singular}, \text{oblique})$$

    Table 3.11 Hindi Plural Suffixes (Direct case)

    This phenomenon is not observed in Marathi. For plural forms two separate word forms are found in Hindi. While pluralizing a singular form through affixing, long vowels are sometimes shortened, e.g. (ladki/girl: Singular) (ladkiyaan/girls: Plural), (Nadii/River: Singular) (Nadiiyaan/Rivers: Plural). Adjectives, pronouns and verbs are also inflected for the number category.
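The two observations above (a shared form for masculine plural direct and masculine singular oblique, and vowel shortening during pluralization) can be illustrated with a minimal sketch. The suffixes used here are not taken from Table 3.11, which is not legible in this copy; they are our assumption based on the standard pattern for inflecting -aa ending masculine nouns and -ii ending feminine nouns. Nouns outside these two classes, and -aa ending nouns that do not inflect, are not covered.

```python
AA = "\u093e"                 # vowel sign -aa
E = "\u0947"                  # vowel sign -e
OM = "\u094b\u0902"           # vowel sign -o + anusvara
II = "\u0940"                 # vowel sign long -ii
I = "\u093f"                  # vowel sign short -i
YAAN = "\u092f\u093e\u0901"   # ya + -aa + chandrabindu


def inflect_masc_aa(noun: str, number: str, case: str) -> str:
    """Inflect an -aa ending masculine noun; number in {s, p}, case in {direct, oblique}."""
    if not noun.endswith(AA):
        raise ValueError("expected an -aa ending noun")
    if number not in ("s", "p") or case not in ("direct", "oblique"):
        raise ValueError("unknown number or case")
    stem = noun[:-1]
    if (number, case) == ("s", "direct"):
        return noun
    if (number, case) == ("p", "oblique"):
        return stem + OM
    # (p, direct) and (s, oblique) share one surface form.
    return stem + E


def pluralize_fem_ii(noun: str) -> str:
    """Direct plural of an -ii ending feminine noun: shorten the vowel, add -yaan."""
    if not noun.endswith(II):
        raise ValueError("expected an -ii ending noun")
    return noun[:-1] + I + YAAN


boy = "\u092c\u091a\u094d\u091a\u093e"       # 'bacchaa'
river = "\u0928\u0926\u0940"                  # 'nadii'
same_form = inflect_masc_aa(boy, "p", "direct") == inflect_masc_aa(boy, "s", "oblique")
rivers = pluralize_fem_ii(river)
```

For a parser the consequence is that a form such as bacche cannot be assigned number and case in isolation; the presence of a following post-position decides between the two readings.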

    In the case of adjectives, masculine-singular (ms), masculine-plural (mp), feminine-singular (fs) and feminine-plural (fp) forms are possible. However, the feminine singular and plural forms are the same, and the masculine plural direct case form is the same as the masculine singular oblique form. E.g. (Acchha: good, masculine, S), (Acchhi: good, feminine, S/P), (Acchhe: good, masculine, /). This important fact is depicted using the following equation.

    15 The term post-position is used here in a broader sense, which also includes case markers. The term case marker is also found in the literature.

    Form( ) Singular() Plural() , , , , ,


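    $$\mathrm{Word}(\text{feminine}, \text{singular}, *) = \mathrm{Word}(\text{feminine}, \text{plural}, *)$$

    $$\mathrm{Word}(\text{masculine}, \text{plural}, \text{direct}) = \mathrm{Word}(\text{masculine}, \text{singular}, \text{oblique})$$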

    In Marathi one more form is found, i.e. four forms in total; they are explained in a later section. Verb forms are inflected for number and take the number of either the subject or the object, or of neither. In the no-agreement case, i.e. when the Hindi verb agrees with neither karta nor karma, the verb form is always masculine singular. In masculine/feminine plural verb forms, the auxiliary verbs are nasalized with anuswaar (), e.g. / (They are going). If the auxiliary verb is not present, the feminine plural verb form is nasalized with anuswaar (), e.g. (gave: feminine, plural), (ate: feminine, plural), (sent: feminine, plural) etc., while the masculine plural forms become - , e.g. (gave), (ate), (sent). Plurality can also be expressed using compound words ( ), reduplication ( ) and quantifiers ( ). A few words are by default used only in the plural form ( ). Examples of such words are given below:

    Compound Words: (cow) + (buffaloes) = , (sheep) + (goats) = , (teacher) + (community/group)=

    Reduplication: -, - etc.

    Quantifiers: (five boys), (all students), (some rupees),

    (all employees) etc.

    Default Plural: (tears), (life), (Darshana) etc.
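The default agreement rule described earlier in this section (a Hindi verb that agrees with neither karta nor karma is masculine singular) can be sketched as a simple fallback; the Marathi default of neutral singular is the one stated in the Marathi number section of this chapter. The function name and argument convention are our assumptions: the caller passes `None` for an argument that cannot control agreement, and the conditions that make an argument unavailable, as well as the subject-before-object priority, are assumptions that this sketch does not derive from the text.

```python
from typing import Optional, Tuple

Features = Tuple[str, str]  # (gender, number), e.g. ("f", "p")

# Defaults used when the verb agrees with neither karta nor karma.
DEFAULT_AGREEMENT = {
    "hindi": ("m", "s"),     # masculine singular
    "marathi": ("n", "s"),   # neutral singular
}


def verb_agreement(language: str,
                   karta: Optional[Features] = None,
                   karma: Optional[Features] = None) -> Features:
    """Pick the gender/number the verb should carry.

    karta / karma are the features of the subject / object when that
    argument is available as an agreement controller, otherwise None.
    """
    if karta is not None:
        return karta
    if karma is not None:
        return karma
    return DEFAULT_AGREEMENT[language]
```

For example, `verb_agreement("hindi", karma=("f", "p"))` yields feminine plural, while `verb_agreement("marathi")` falls back to neutral singular.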

    Number feature of foreign words is mostly decided as per rules of foreign language; in

    few cases Hindi suffixes are conjoined to such words to derive plural forms. Detail

    discussion on this aspect can be found in (Guru, 1920). Pronouns in Hindi are also

    affected by number, their morphology is completely irregular. Detail discussion on

    pronouns is presented in Person category (section 3.5.2.3 ahead). -


    adjectives are normally inflected for gender and number. Plural forms are also used

    for denoting honor, in which case they are actually being used in singular sense. As

    stated earlier (section 3.5.2.1), Gender and the word ending letter ( ) have strong

    influence on number suffixes along with Case. The following Table 3.12 describes

    the paradigm used in deriving plural inflections, considering gender, word ending and

    direct/oblique cases.

    Table 3.12 Hindi Plural Suffixes Paradigm

    Form ( ) Gender( ) Word Ending(- )

    Plural()

    Example(.)

    ( )

    Withoutpost-position

    marker

    (masculine)

    - - ,,

    - ,

    (feminine)

    - +

    +

    , , , ,

    - -

    - -

    -, ,

    , - -

    (masculine)

    -/- - ,

    (feminine)

    -/- -

    ./ . -/- - ->- : --

    ,

    ./ . - - (--) , Rest AllForms

    Rest All Forms - , ,


    The suffix besides above usage with post position markers, is also used to denote

    plurality of words without post position markers e.g. (many decades),

    (many years), (both), (thousands), (crores) etc.

    3.5.2.2.2 Marathi Number system

    Marathi also has two numbers, viz. singular and plural. As in Hindi, the number category affects only common nouns in Marathi. Pronoun classification is based on gender and number; personal plural pronouns are also used to express honour, in which case they denote the singular. Verb forms are also affected by number, depending on whether they agree with the subject, the object or neither in a given sentence; in the case of no agreement, the verb form is always neutral gender, singular (ns). No suffixes are required to express the singular. However, in the singular a word has two forms: with post-position marker ( ) and without post-position marker ( ). When used with post-positions, the oblique forms of nouns and adjectives are used. The derivation of Marathi oblique forms is explained in a later section. The paradigm for pluralization of Marathi words in the different genders is given in the following chart (Table 3.13).

    Table 3.13 Marathi Plural Suffixes Paradigm

    WordEnding

    Masculine Feminine Neutral

    -, -,- - - - No word -, - - -, - - -, - - -, - - No word - - No word - No word - - No word - No word No word


    The chart is self explanatory. In Marathi only ending masculine word forms are inflected for plural number; other forms are the same in the singular and the plural. It is also important to note that certain vowel endings are not found in all three genders. During pluralization, Marathi words also undergo some morphological changes, like a change in vowel length (long to short) and the introduction of or at the end (see footnote 16). ending feminine forms can be pluralized in three different ways (- , - or both). In both classes, two plural forms are derived, e.g. (behavior) plural-1, plural-2 etc. As in Hindi, plurality can also be expressed in Marathi using compound words ( ), reduplication ( ) and quantifiers ( ), and a few words are by default used only in the plural form ( ). It is important to note that Marathi has four inflected forms for gender and number, as against three in Hindi. E.g. /// , / // , / // . For declinable words, the following equations hold.

    Word , , = Word , ,

    Word , , = Word , ,

    Word , / , = Word , / , = Word , / ,

    This feature overloading of word forms in Marathi morphology is very important from the computational as well as the storage point of view: we do not need to store all these forms separately with different feature specifications; we can store them as a single word with a compact feature specification. In such cases, the form must be resolved to the appropriate feature specification from the above before parsing, otherwise the parse may be wrong.
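A compact feature specification of the kind described above can be sketched as a lexicon that stores each surface form once, together with the set of feature bundles it may stand for, and narrows that set from context before parsing. This is a minimal illustration under our own assumptions, not the HinMaT storage format: the class and method names are hypothetical, and the romanized adjective form `changle` with its two readings (masculine plural and neutral singular) is used only as an example of an overloaded form.

```python
from collections import defaultdict, namedtuple
from typing import Optional, Set

Analysis = namedtuple("Analysis", ["lemma", "gender", "number", "case"])


class FormLexicon:
    """One entry per surface form; each entry holds all admissible analyses."""

    def __init__(self) -> None:
        self._entries = defaultdict(set)

    def add(self, form: str, lemma: str, gender: str, number: str, case: str) -> None:
        self._entries[form].add(Analysis(lemma, gender, number, case))

    def analyses(self, form: str) -> Set[Analysis]:
        return set(self._entries.get(form, ()))

    def resolve(self, form: str, gender: Optional[str] = None,
                number: Optional[str] = None,
                case: Optional[str] = None) -> Set[Analysis]:
        """Keep only the analyses compatible with the features known from context."""
        wanted = {"gender": gender, "number": number, "case": case}
        return {a for a in self.analyses(form)
                if all(v is None or getattr(a, k) == v for k, v in wanted.items())}


lexicon = FormLexicon()
# Illustrative overloaded form: one stored string, two feature bundles.
lexicon.add("changle", "changla", "m", "p", "direct")
lexicon.add("changle", "changla", "n", "s", "direct")

# The head noun's gender (known from context) selects the reading.
reading = lexicon.resolve("changle", gender="n")
```

If `resolve` still returns more than one analysis, the ambiguity has to be carried into the parser or settled by a later agreement check; an empty result signals a feature clash.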

    16 / change, e.g. , . later is governed by letter clustering rules( ): - += (E.g. - ), - += (E.g. , )


    3.5.2.3 Person

    Besides its message (the gist of the sentence), every sentence also encodes a reference to the speaker, the listener or some other entity by means of the person category. All languages use three persons to denote the participating entities in the sentence: first person, second person and third person. First person refers to the speaker, second person to the listener or hearer, and third person to anything other than these two. All nouns (except pronouns) are always treated as third person. Number affects the person category in all three persons: different pronouns are used to denote singular or plural number for each of the first, second and third persons. First and second person pronouns are gender neutral, i.e. they are not affected by gender; only third person pronouns/nouns are affected by the gender category. The person category also affects the auxiliary verbs in most languages. The third person pronouns can be further classified on the basis of grammatical features like human (+h)/non-human (-h), proximity/remoteness, definiteness/indefiniteness, interrogative, presence/absence, relational and reflexive. A detailed discussion of the person category with respect to Hindi and Marathi is presented below.

    3.5.2.3.1 Hindi and Marathi Person category

    Like English, Hindi and Marathi have three persons. The person category primarily affects the pronouns and auxiliary verbs. Theoretically, the person category is influenced by the gender as well as the number category. However, Hindi pronouns are affected only by number and not by gender, whereas Marathi pronouns are affected by both number and gender. For third person personal pronouns four forms are observed in Marathi, e.g. for the English remote demonstrative pronoun "that" we have /// forms in Marathi, and the proximate demonstrative pronoun "this" has / // equivalent forms in Marathi. Hindi has only one form for all these cases, (this) and (that). The and forms are used for marking the oblique case in all three genders and both numbers of Marathi. This is an important factor from the MT point of view, as there is a one-to-many mapping between the pronouns of Hindi and Marathi. This hints that the parsing process must fix the person feature value of the pronoun under consideration. As such, no divergence is observed in pronouns. The complete list of pronouns in Hindi and Marathi is presented in the following table (Table 3.14).
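The one-to-many pronoun mapping discussed above can be illustrated with a small lookup. This is a hedged sketch only: the romanised forms (vah/yah for Hindi; to/ti/te and ha/hi/he for Marathi), the dictionary layout and the function names are assumptions for illustration and cover only the singular direct forms, not the full pronoun table.

```python
# Illustrative subset: Hindi demonstratives map to several Marathi forms,
# and the choice depends on the gender fixed during parsing.
# Romanised spellings and all identifiers here are assumptions.
HI_TO_MR_PRONOUN = {
    "vah": {("m", "sg"): "to", ("f", "sg"): "ti", ("n", "sg"): "te"},  # that
    "yah": {("m", "sg"): "ha", ("f", "sg"): "hi", ("n", "sg"): "he"},  # this
}


def candidates(hindi):
    """All Marathi forms a Hindi pronoun may translate to (one-to-many)."""
    return sorted(set(HI_TO_MR_PRONOUN[hindi].values()))


def translate_pronoun(hindi, gender, number="sg"):
    """Pick the Marathi form once the referent's gender and number are known."""
    try:
        return HI_TO_MR_PRONOUN[hindi][(gender, number)]
    except KeyError:
        raise ValueError(
            f"no Marathi form recorded for {hindi!r} with {gender}/{number}"
        )
```

Without gender information the transfer step can only return `candidates("vah")`; once the parser has resolved the referent as feminine, `translate_pronoun("vah", "f")` selects a single form.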

    Table 3.14 Hindi & Marathi Person System

    [Table 3.14: pronouns of Hindi and Marathi, each in singular and plural, for the first, second and third persons. The third person rows cover demonstrative (remote), proximate, relative, interrogative, indefinite and reflexive pronouns; Marathi third person forms are listed separately for masculine, feminine, neuter and oblique.]

    As stated earlier, the auxiliary verbs are also affected by the person category in Hindi as well as Marathi. This impact is greater in Hindi than in Marathi. The Hindi and Marathi auxiliary verbs in different persons are listed in the following table (Table 3.15).

    Table 3.15 Person suffixes for Auxiliary verbs of Hindi & Marathi

    [Table 3.15: auxiliary verb suffixes by person (first, second, third) and tense, given for Hindi (masculine, feminine) and Marathi (masculine, feminine, neuter), each in singular and plural.]

    3.5.2.4 Karaka

    3.5.2.4 Karaka

    Karaka (case) is an important category because it operates at the sentence level and can be exploited computationally during sentence parsing. A sentence is generally defined as a sequence of meaningful words, but this definition is incomplete: the participating words must also be related to each other. These relationships help at different levels of sentence analysis, i.e. morphological, syntactic and semantic. Without this correlation among the words, a sentence is not meaningful. To maintain harmony in the sentence, its words must therefore be compatible with each other. The relationship between these words is denoted by Karaka. For a sentence to be meaningful, it should have three characteristics: Yogyata (eligibility), Aakansha (expectancy), and Aasatti (bonding). These are mentioned in a Sanskrit verse by the Hindi scholar Vishwanathji (Vishwanathdev), which says, . The three characteristics are explained below:

    Yogyata (eligibility): The meaning of each sentence constituent should be capable of relating to the other constituents in a meaningful way.

    Aakansha (expectancy): The meaning of some constituents cannot be expressed in isolation; such constituents expect the presence of other constituents to which they relate. This dependence is called Aakansha (expectancy). Expectancy is of two types: mandatory and optional. Without the fulfillment of a mandatory expectancy the sentence is not meaningful, while an optional expectancy extends the meaning of the sentence.

    Aasatti (bonding): Also called Saannidhi, this refers to the positional proximity (closeness of word positions) between related sentence constituents.
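Of the three characteristics, Aakansha (expectancy) lends itself most directly to a computational check. The sketch below models mandatory and optional expectancy as a verb frame; the frames, the English verb keys and the function name are hypothetical examples, and Yogyata and Aasatti are deliberately not modelled.

```python
# Hypothetical verb frames: which Karaka slots a verb expects.
# "mandatory" slots must be filled for the sentence to be meaningful;
# "optional" slots only extend its meaning.
VERB_FRAMES = {
    "give": {
        "mandatory": {"karta", "karma", "sampradan"},
        "optional": {"adhikaran"},
    },
    "go": {
        "mandatory": {"karta"},
        "optional": {"apadan", "adhikaran"},
    },
}


def check_expectancy(verb, found_karakas):
    """Compare the Karakas found in a sentence with the verb's frame."""
    frame = VERB_FRAMES[verb]
    found = set(found_karakas)
    missing = frame["mandatory"] - found
    unexpected = found - frame["mandatory"] - frame["optional"]
    return {
        "complete": not missing,
        "missing": sorted(missing),
        "unexpected": sorted(unexpected),
    }
```

A sentence with "give" that supplies only a Karta and a Karma is reported as incomplete because the mandatory Sampradan (beneficiary) is unfulfilled, whereas a missing Adhikaran would not affect completeness.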

    Scholars differ on the exact definition of Karaka. According to Jespersen (Otto, 1965), the relationship between noun(s), adjective(s) or pronoun(s) and the other constituents of a sentence is called the Karaka relationship. This relationship is scoped to noun-noun, noun-verb, auxiliary verb-main verb, adjective-noun etc. relationships. According to many Sanskrit scholars, including the Sanskrit legend Panini, , i.e. any relationship between constituents (mostly nouns) and the verb in a given sentence is a Karaka relation. The nature of this relationship is functional. These scholars do not accept noun-noun, adjective-noun or non-noun-verb relationships as Karaka relationships. The Western linguist Fillmore (Fillmore, 1968) gave serious thought to Case theory. His notion of case is based on an early conception of theta role theory. Theta theory specifies the following eight theta roles (see footnote 17), which per se do not go hand in hand with the Sanskrit Karakas:

    1. Agent ( ): The doer of the action. E.g. He (agent) is writing a letter.

    2. Experiencer (): One who experiences the act mentioned in the verb or takes the denoted action. E.g. Ramesh (experiencer) was happy to receive the prize.

    17 The number of theta roles as such is not fixed, but we have described the primary theta roles. Theta roles give the semantic (functional) relationship between constituent words at the meaning level.


    3. Instrumental (): An inanimate thing or object used in carrying out the action specified in the verb. E.g. He cut the apple with a knife (instrument).

    4. Object/Patient ( ): A thing or somebody that undergoes the change implied in the verb. E.g. Ram painted the house (patient).

    5. Theme: Something or somebody that is the topic of discussion and whose state can be perceived by the speaker as in motion or steady. E.g. The ball (theme) is rolling down; the bottle (theme) is green.

    6. Locative ( ): The place of the action. E.g. He was arrested in Diamond Hotel (locative).

    7. Source ( ): The point of separation, which remains stationary as the action progresses, in most motion verbs. E.g. The train departed from the platform (source).

    8. Goal ( ): The last point of the action, where the action ends, in stative/motion verbs. E.g. Ram went to school (goal) from his home (source).

    The standard Latin and Greek case grammar assumes eight cases: nominative, accusative, instrumental, dative, ablative, possessive, locative and vocative, while Panini's karaka theory (500 BC) describes six Karaka relations: Karta, Karm, Karan, Sampradan, Apadan and Adhikaran. In Sanskrit, the term Vibhakti refers to the word form that is morphologically inflected to denote a particular case using the special post-position marker symbol prescribed for that Vibhakti. The relation between Karaka and Vibhakti is many-to-many, because the same Karaka can be expressed by many morphological forms (Vibhakti) and one morphological form (Vibhakti) can represent different Karakas. Karakas are syntactico-semantic (Bharati, Chaitanya, & Sangal, 1995) in nature; they are identified with syntactic cues like post-position markers (Vibhakti symbols). Karaka and Vibhakti are closely related, hence many people confuse one with the other. But Panini has clearly differentiated Karaka from Vibhakti. According to him, Karaka is a semantic element ( ) and Vibhakti is its morphological representation ( ). This aspect is discussed in a later section of this chapter. We now review the Hindi and Marathi Karaka systems.


    3.5.2.4.1 Hindi Karaka system

    Owing to Western influence on Hindi grammar, the early grammarians prescribed eight Karaka relations (Guru, 1920), (Kellogs, 1955). These Karakas, along with their equivalent cases in Case Theory, are presented below:

    1. Karta () Karaka (Nominative case)

    2. Karm () Karaka (Accusative case)

    3. Karan () Karaka (Instrumental case)

    4. Sampradan ( ) Karaka (Dative case)

    5. Apadan () Karaka (Ablative case)

    6. Sambandh () Karaka (Possessive case)

    7. Adhikaran ( ) Karaka (Locative case)

    8. Sambodhan () Karaka (Vocative case)

    Hindi scholars like Kishoridas Bajpayee (Bajpayee K. P., 1959) and others coming from the Sanskrit school of thought are not in favor of Sambandh Karaka (Possessive case) and Sambodhan Karaka (Vocative case), as they do not describe any relationship with the verb. For HinMaT, we have considered Karakas 1-7. In Hindi the Karaka relation is expressed with the help of post-position markers (Vibhakti symbol/parsarga). The following table (Table 3.16) shows the Karakas and their Vibhakti symbols in Hindi.

    Table 3.16 Hindi Karaka-Vibhakti Table

    Sr. No. | Karaka    | Vibhakti Symbol
    1       | Karta     | (ne), zero
    2       | Karma     | (ko), zero
    3       | Karan     | (se)
    4       | Sampradan | (ko), (ke liye)
    5       | Apadan    | (se)
    6       | Sambandh  | (ka) / (ki) / (ke)
    7       | Adhikaran | (mai) / (par)

    zero: zero (no) Vibhakti marker

    It is apparent from the above table that the Vibhakti markers (ko) and (se) are overloaded: (ko) is used to denote Karma (accusative/theme/object) Karaka and Sampradan (dative/beneficiary) Karaka, while (se) denotes Karan (instrumental) and Apadan (ablative/source) Karaka. From the parsing point of view this is an important aspect, as we need to resolve the appropriate Karaka. Whenever a Vibhakti marker is used to specify the Karaka relationship(s) for which it is designated, such usage is called SwaVibhaktik ( : native usage), while instances where a Vibhakti marker specifies other Karaka relationship(s) are called Parvibhakti ( : foreign usage). Foreign Vibhakti usage is discussed along with the individual Karakas.
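The overloading can be made explicit by inverting the Karaka-to-marker table. The sketch below encodes the romanised markers of Table 3.16 (with the empty string standing for the zero marker, an encoding chosen here for illustration) and lists every marker that points to more than one Karaka; the variable and function names are assumptions.

```python
from collections import defaultdict

# Table 3.16 in romanised form; "" stands for the zero (no) Vibhakti marker.
KARAKA_TO_MARKERS = {
    "Karta": ["ne", ""],
    "Karma": ["ko", ""],
    "Karan": ["se"],
    "Sampradan": ["ko", "ke liye"],
    "Apadan": ["se"],
    "Sambandh": ["ka", "ki", "ke"],
    "Adhikaran": ["mai", "par"],
}


def invert(table):
    """Build the marker -> list of candidate Karakas index."""
    index = defaultdict(list)
    for karaka, markers in table.items():
        for marker in markers:
            index[marker].append(karaka)
    return dict(index)


MARKER_TO_KARAKAS = invert(KARAKA_TO_MARKERS)

# Markers a parser cannot map to a Karaka without further context.
AMBIGUOUS = {m: ks for m, ks in MARKER_TO_KARAKAS.items() if len(ks) > 1}
```

With this table, `AMBIGUOUS` contains "ko" (Karma or Sampradan), "se" (Karan or Apadan) and the zero marker (Karta or Karma), which are exactly the cases where the parser needs verb or semantic information to decide.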

    Words in a Hindi sentence must be converted to their oblique form whenever they take a Vibhakti marker to specify a Karaka relationship. Proper noun word forms are by default the same in the direct and the oblique case, so they do not undergo any morphological inflection, while other noun types may undergo morphological change(s) depending on their number and gender. The Hindi oblique morphological suffixes (Kaul, 2008) are listed in the following table (Table 3.17a) and examples are presented in Table 3.17b.
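As a rough illustration of oblique-form generation, the sketch below handles only three romanised noun classes: masculine nouns ending in -a, feminine nouns ending in -i, and consonant-final nouns. The suffix rules are the commonly described Hindi pattern and are stated here as an assumption rather than taken from the suffix table; exceptions (for example masculine nouns in -a that do not inflect) are ignored, and the function name is hypothetical.

```python
VOWELS = "aeiou"


def oblique(direct_sg, gender, number, proper=False):
    """Return a romanised oblique form from the direct singular form.

    Deliberately partial: any noun class not listed below raises ValueError
    instead of guessing.
    """
    if proper:
        # Proper nouns keep the same form in the direct and oblique case.
        return direct_sg
    if gender == "m" and direct_sg.endswith("a"):
        stem = direct_sg[:-1]
        return stem + ("e" if number == "sg" else "on")
    if gender == "f" and direct_sg.endswith("i"):
        return direct_sg if number == "sg" else direct_sg + "yon"
    if direct_sg[-1] not in VOWELS:
        return direct_sg if number == "sg" else direct_sg + "on"
    raise ValueError(f"noun class of {direct_sg!r} is not covered by this sketch")
```

For example, the word for "boy" (ladka) becomes ladke before a singular marker and ladkon before a plural one, while a proper noun such as Ram is returned unchanged.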

    Table 3.17a Hindi Oblique Suffixes Table (Kaul, 2008)

    Case Masculine Feminine

    Singular Plural Singular Plural

    Direct

    Oblique - - - -

    Vocative - - - -


    Table 3.17b Hindi Oblique Word Form Examples

    Case     | Masculine Singular | Masculine Plural | Feminine Singular | Feminine Plural
    Direct   | (boy)              | (boys)           | (girl)            | (girls)
    Oblique  | (boy)              | (boys)           | (girl)            | (girls)
    Vocative | (hey boy)          | (hey boys)       | (hey girl)        | (hey girls)

    Hindi literature uses the terms Parsarg or Vibhakti to mean case markers. Early grammarians like Pandit Kamata Prasad Guru and Pandit Kishoridas Bajpeyi used the term Vibhakti, as it has been used by tradition since Sanskrit times. Parsarg or Vibhakti differ from normal suffixes because normal suffixes agglutinate with the preceding word (noun, adjective etc.), e.g. + = , while Vibhakti suffixes do not (see footnote 18). Another argument is that the Vibhakti marker is the rightmost suffix ( ), i.e. no other marker can follow it, e.g. , etc., whereas Parsargas can appear any finite number of times, e.g. ..., ..., . Here - , - , - two parsargas appear in sequence. In our understanding Parsarga is a broader class of post-position markers, which also includes the Vibhakti markers. Parsarga can be further classified as declinable ( ) and non-declinable ( ), as given in the following figure (Figure 3.9).

    18 Exception: pronouns. All pronominal forms agglutinate in their oblique cases, e.g. , , etc.

    Figure 3.9 Post-position marker classification

    [Figure 3.9: post-position markers are divided into non-declinable (others) and declinable (possessive, similarity) classes.]

    3.5.2.4.1.1 Karaka Usage in Hindi Sentences

    Each of the seven Karakas discussed in the preceding section is explained below. Our discussion is not confined to the traditional literature and the opinions of the early grammarians; it also draws on the modern Paninian analysis from the Computational Linguistics point of view by Prof. Rajeev Sangal, Vineet Chaitanya and Prof. Amba Kulkarni (Bharati, et al., 2006), (Begum, Husain, Dhwaj, Sharma, & L. Bai, 2008):

    3.5.2.4.1.1.1 Karta () Karaka

    Karta Karaka refers to the doer of the action, or the subject of the sentence. E.g. (Ram went home), (Sita cooked the food). Here and appear in Karta Karaka. Karta can govern the GNP features of the verb in a given sentence. In most cases the Karta is an animate entity. The Karta Karaka may or may not correspond to the real-world conception of the doer of the action; where it does not, the scope of Karta is restricted to the grammatical world and it is a grammatical Karta. E.g. (The man died). This is a common scenario in causative sentence constructions, where the actual action indicated by the verb is performed by one entity but is enforced or initiated by someone else with the help of one more entity. E.g. (Mother had the maid make the baby drink milk). Here (Bachhe: oblique form of baby) is actually drinking the milk; he is the doer of the action, but he is not drinking it himself: he drinks it with the help of the maid, who is doing so on orders from the mother. The maid mediates between mother and baby. Mother (), maid () and baby ( -oblique form) are treated as prayojak Karta (sponsor), madhyasth Karta (mediator) and prayojya/anubhavak Karta (experiencer) respectively. This terminology is used in modern Paninian analysis (Bharati, et al., 2006), (Begum, Husain, Dhwaj, Sharma, & L. Bai, 2008).

    Karta Karaka is specified using the Vibhakti marker whenever the verb is in its past tense form (Karmani Prayog), where Karma generally agrees with the GNP of the verb. It takes no Vibhakti () in (Kartari Prayog), where Karta agrees with the GNP features of the verb. The Vibhakti marker symbol is also used to denote Karta Karaka in conjunction with verbs like , , , and - main verbs + auxiliary verb usages. Whenever the Karta is in the experiencer role, it takes the Vibhakti marker. A detailed discussion on the use of as Karta is presented in (Sing, 1985). Examples of the above usage are given below:

    1. (Mohan wants the book)

    2. (Ram is feeling hungry)

    3. (He appears to be brave to me)

    4. (Kalpana got the prize)

    In i.e. when the verb does not agree with either Karta or Karma in GNP and is always in masculine singular form, the Vibhakti symbol is used to denote Karta's inability or ability to perform the action in the verb. E.g. (Ram is not able (unable) to eat), (I committed a mistake).
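The marker choice for Karta described above can be summarised as a small decision function. This is a sketch under stated assumptions: the romanised marker values (ne, ko, se, and the empty string for the zero marker) are one reading of the passage, of Table 3.16 and of the glossed examples, and the function and key names are hypothetical.

```python
# Assumed mapping from construction type (prayog) to the Karta marker.
# "" stands for the zero marker; the values are an interpretation of the
# passage, not a quotation from it.
KARTA_MARKER_BY_PRAYOG = {
    "kartari": "",    # verb agrees with Karta
    "karmani": "ne",  # past tense form, verb agrees with Karma
    "bhave": "se",    # verb agrees with neither; (in)ability of the Karta
}


def karta_marker(prayog, experiencer=False):
    """Choose the post-position marker for the Karta of a clause."""
    if experiencer:
        # Experiencer subjects, as in "Mohan wants the book".
        return "ko"
    try:
        return KARTA_MARKER_BY_PRAYOG[prayog]
    except KeyError:
        raise ValueError(f"unknown prayog: {prayog!r}")
```

In a generation step this kind of function would be called once the verb form has been analysed; in analysis, the same table can be read in reverse to propose a construction type from the marker that follows the Karta.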

    3.5.2.4.1.1.2 Karma () Karaka

    The object of the verb, or the thing which is directly impacted by the action indicated in the verb, is called the Karma. E.g. (Ram eats mango), (Ram killed Ravan). Here the words (mango), (Ravan) represent the Karma Karaka. One more instance of Karma, called gaun karma, is described in Paninian analysis, e.g. (home) (Bombay) (Leader) . In these sentences the bold words (home), (Bombay), (Leader) are gaun karma.

    The Vibhakti marker is used with animate nouns; in the case of inanimate nouns generally no Vibhakti () is used, except in special cases where the use of is allowed, e.g. (you save the country).

    The Vibhakti marker is used with animate nouns in the accusative case with special reference to verbs of psychological predicate like /(to speak), (to ask), (to demand), e.g. (Ram said to his father) etc. Kamataprasad Guru (Guru, 1920) has described this usage of Karma as Gaun karma.

    The Vibhakti marker (see footnote 19) is used to mark Karma Karaka in sentences with a complex verb ( ), as discussed in (3.5.2.4.1.1.6) ahead. E.g. (we worship god)

    3.5.2.4.1.1.3 Karan () Karaka

    Karan denotes the Instrument used for carrying out the action. E.g. (knife)

    (Gita cut the apple with knife). (knife) is used to cut the apple

    hence it has occurred in Instrumental case. The Vibhakti Marker is used to mark

    this Karaka.

    3.5.2.4.1.1.4 Sampradan ( ) Karaka

    The beneficiary of the action in ditransitive verbs is denoted by Sampradan Karaka. E.g. (Dhananjay gave the book to Vilas). Vilas, who received the book, is the beneficiary of the verb (give) and appears in Sampradan Karaka.

    3.5.2.4.1.1.5 Apadan () Karaka

    19 is used in representative sense to denote its inflections , also.


    Panini, in his legendary Ashtadhyayi, mentions three types of Apadan: (motion), (state/stative), (about fear).

    Apadan Karaka indicates the point of separation, which remains stationary as the action indicated by the verb progresses, in motion verbs or verbs indicating a change of state. Some people treat Apadan as the source from which the verb action begins. Consider, for example, the Hindi sentences (The train departed from the platform), (The cat fell off the roof), (He saved me from the tiger), (I am afraid of snakes). Here (platform), (roof) denote the point of departure, which remains stationary after the train and the cat move ahead; hence they appear in Apadan Karaka. So do (tiger), (snake) in psychological predicates like saving-from, afraid-of etc. Besides these, one more type of Apadan is prescribed in modern Paninian analysis (Bharati, et al., 2006), (Begum, Husain, Dhwaj, Sharma, & L. Bai, 2008), called Prakritik Apadan (natural ablative), which is related to a change of state of material, e.g. (Doors are made from wood), (Ice-cream is made from milk). Here the words (wood), (milk) denote the source material from which doors and ice-cream are made. We can observe that the Vibhakti marker symbol (se) is used to mark two Karakas, i.e. Karan and Apadan; disambiguating each instance between these two is a computational challenge in Hindi parsing. Apadan is also expressed with the help of Vibhakti compounds like , . E.g. (The bullet was fired from the gun), (He fell off the roof). Different usages of (se) are listed in the following table (Table 3.18).
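One simple way to approach the Karan/Apadan ambiguity of the se marker is to consult the class of the governing verb. The following is a toy heuristic for illustration only, not the disambiguation method of HinMaT: the verb classes, the English verb keys and the function name are assumptions, and the other usages of se (comparative, co-agentive and so on) are not covered.

```python
# Hypothetical verb lexicon keyed by English gloss for readability.
VERB_CLASS = {
    "depart": "motion",
    "fall": "motion",
    "fear": "fear",
    "save": "protection",
    "make": "material_change",
    "cut": "action",
    "write": "action",
}

# Verb classes whose se-marked argument is read as a point of separation
# or source, following the Apadan types discussed above.
APADAN_CLASSES = {"motion", "fear", "protection", "material_change"}


def resolve_se(verb):
    """Return the candidate Karakas for a se-marked noun governed by `verb`."""
    verb_class = VERB_CLASS.get(verb)
    if verb_class is None:
        # Unknown verb: keep both readings for a later stage to decide.
        return ["Karan", "Apadan"]
    if verb_class in APADAN_CLASSES:
        return ["Apadan"]
    return ["Karan"]
```

Returning a list rather than a single label lets an unknown verb keep both readings alive, so a later semantic stage (for example an animacy or instrument check on the noun) can make the final choice.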

    Table 3.18 Use of (Se) post position Marker

    Sr. No. | Usage
    1       | Instrumental
    2       | Ablative
    3       | Comparative
    4       | Incapabilitative (causative-2)
    5       | Indirect Object
    6       | Co-agentive

    3.5.2.4.1.1.6 Sambandh () Karaka

    Sambandh Karaka specifies a possessive ( ) relationship between two nouns. The Vibhakti marker (kaa) is used to mark this Karaka; it is inflected for GNP features to the feminine form (ki) and the plural form (ke). It agrees with the GNP features of the noun that follows it, so this Vibhakti marker is a preposition, e.g. (Lata's father/father of Lata), (Ram's brother/brother of Ram), (Sachin's car). These Vibhakti markers are also used with verbs in conjunct verbs ( ); hence modern Paninian analysis describes finer shades of this Karaka, such as Karta Sambandh vachak or Karma sambandh vachak in a complex predicate, and the argument of a complex verb.

    E.g. 1) (The shop was inaugurated yesterday)

    2) (Vice-Chancellor sir inaugurated the

    exhibition yesterday)

    3) (Guests are about to come).

    In example 1) the bold word (exhibition) appears as Karta-Sambandh karaka

    as (Inauguration) is part of complex verb , while in example 2)

    it is Karma-Sambandh karaka and in 3) infinitive appears in Kriya-

    Sambandh karaka.


    3.5.2.4.1.1.7 Adhikaran ( ) Karaka

    Adhikaran primarily answers questions like where and when in the context of the verb action. In the classical sense, it denotes the place where the action took place. The locative case markers in Hindi, , , are also used to denote objects other than place and time expressions.

    E.g. 1. (We had met in Delhi)

    2. (The book was kept on the table)

    3. (They were discussing the state of the country)

    Here the bold words in 1) and 2) show Adhikaran of place and time respectively, while in 3) the bold words denote Vishayadhikaran. The Vibhakti symbol or the compound Vibhakti is used for comparison between two objects, e.g. ?(Who sings well out of Suman and Sudip), (Out of three of you, the first one is better)

    In foreign usage ( ) of the Vibhakti marker, , are also used to express Adhikaran Karaka of time.

    E.g. 1) (I had not been to office since last four days)

    2) (Santosh will return on Sunday)

    Cl