Sanskrit Linguistic Processing Character-encoding, morphology, and lexicography Peter M. Scharf Brown University 23 December 2009

Sanskrit Linguistic Processing

Character-encoding, morphology, and lexicography

Peter M. Scharf

Brown University

23 December 2009

Peter M. Scharf, 23 Dec. 2009: 2

Roman-based Standards

Devanāgarī-based Standards

Nominal inflection

Verbal inflection

Vedic Unicode

Encoding Vedic Characters

The Vedic Unicode Proposal recommends adding Vedic characters to the Unicode standard so that the tone marks that appear in red in this palm-leaf manuscript of the Vājasaneyisaṃhitā may be accurately represented in print.
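As a rough illustration of what the proposal led to, the sketch below lists the characters that Python's bundled Unicode database assigns in the two blocks named on the following slides. The block ranges are taken from the Unicode Standard (version 5.2 and later), not from the slides, and the helper name `assigned_names` is hypothetical.

```python
import unicodedata

# Block ranges as published in the Unicode Standard (5.2 and later).
# They are an outside fact, not something stated on the slides.
BLOCKS = {
    "Vedic Extensions": (0x1CD0, 0x1CFF),
    "Devanagari Extended": (0xA8E0, 0xA8FF),
}


def assigned_names(first, last):
    """Return {code point: character name} for assigned code points in a range."""
    names = {}
    for cp in range(first, last + 1):
        name = unicodedata.name(chr(cp), None)  # None for unassigned code points
        if name is not None:
            names[cp] = name
    return names


if __name__ == "__main__":
    for block, (first, last) in BLOCKS.items():
        for cp, name in assigned_names(first, last).items():
            print(f"{block}: U+{cp:04X} {name}")
```

The exact set of names printed depends on the Unicode version shipped with the Python interpreter in use.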

Vedic Unicode Charts

Devanāgarī Extended

Vedic Extensions

LIES Appendix B

The Sanskrit Library Phonetic Basic encoding scheme (SLP1) attempts to meet high standards of unambiguous encoding while restricting encoding to 75 codepoints in the ASCII character set. SLP1 utilizes 57 codepoints to encode segments: 53 to represent phonetic segments and four to represent punctuation. In addition, SLP1 utilizes 18 codepoints to encode phonetic features: three to indicate stricture, six to indicate length, eight to indicate tone, and one to indicate nasalization….

SLP1 Basic Segments

B.3 Modifiers

Modifiers are added after a character to indicate variations in segment stricture, length, accent, and nasalization, in the order stated. Prolonged length, accent, and nasalization occur in classical Sanskrit as well as Vedic. Modifiers are used in combination to indicate special features of stricture, length, accent, and nasalization in Vedic.
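The ordering constraint can be sketched as a small validator. The following is a minimal sketch, assuming the modifier inventory is exactly the set of symbols listed in sections B.3.1 to B.3.4 of these slides; the function name `split_modifiers` is hypothetical and not part of any Sanskrit Library tool.

```python
import re

# One group per feature, in the order the text prescribes:
# stricture, then length, then accent, then nasalization.
# The symbol sets are copied from sections B.3.1 to B.3.4.
MODIFIER_RE = re.compile(
    r"(?P<stricture>[_=!]*)"
    r"(?P<length>[*#1234]*)"
    r"(?P<accent>[/\\^6789+]*)"
    r"(?P<nasalization>~?)"
)


def split_modifiers(modifiers):
    """Split a modifier string into its four feature groups.

    Raises ValueError if the string contains an unknown symbol or if
    the groups do not appear in the prescribed order.
    """
    match = MODIFIER_RE.fullmatch(modifiers)
    if match is None:
        raise ValueError(f"modifiers out of order or unknown: {modifiers!r}")
    return match.groupdict()


if __name__ == "__main__":
    # A three-mora, high-pitched, nasalized vowel would carry "3/~".
    print(split_modifiers("3/~"))
```

A string such as `~/` is rejected because nasalization precedes accent, which the stated order does not allow.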

B.3.1 Stricture

_ heaviness [used for semivowels y or v]

= lightness [used for semivowels y or v]

! lack of release (abhinidhāna) [used for stops or semivowels y, v, or l]

B.3.2 Length

* subsegmental epenthetic vowel (svarabhakti)

# length of half a mora

1 length of one mora [used in Vedic after short agitated kampa; short e, o; and heavy anusvāra]

1# slightly lengthened

2 length of two morae [used for dvimātra anusvāra in Vedic]

3 prolonged length of three morae [used for pluta vowels]

4 prolonged length of four or more morae [used in raṅga]
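The length modifiers can be read as a lookup from symbol to duration. This is a minimal sketch under two assumptions that go beyond the list: `*` and `1#` are left unquantified because the list gives no number for them, and `4` is stored as its lower bound of four morae. The names `LENGTH_MORAE` and `morae` are hypothetical.

```python
from fractions import Fraction

# Durations in morae for the length modifiers listed above.
# "*" and "1#" have no numeric duration in the list, so they map to None.
# "4" means four or more morae, so the stored value is only a lower bound.
LENGTH_MORAE = {
    "*": None,
    "#": Fraction(1, 2),
    "1": Fraction(1),
    "1#": None,
    "2": Fraction(2),
    "3": Fraction(3),
    "4": Fraction(4),
}


def morae(length_modifier):
    """Return the (minimum) duration in morae, or None if it is not quantified."""
    try:
        return LENGTH_MORAE[length_modifier]
    except KeyError:
        raise ValueError(f"unknown length modifier: {length_modifier!r}") from None


if __name__ == "__main__":
    for symbol in LENGTH_MORAE:
        print(repr(symbol), morae(symbol))
```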

B.3.3 Accent

/ high pitch

\ low pitch

^ circumflex

6 extra low tone

7 low tone

8 high tone

9 extra high tone

+ sharpness

B.3.4 Nasalization

~ nasalization

Yamas

20 epenthetic nasalized segments: k~, kh~, . . . , b~, bh~

4 epenthetic nasalized segments: k~, kh~, g~, gh~

20 replacements for a non-nasal stop before a nasal: k~, kh~, . . . , b~, bh~ (Ṛkprātiśākhya)
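The yama inventory follows mechanically from the nasalization modifier: each non-nasal stop is followed by `~`. The sketch below uses the single-letter SLP1 codes for the stops (for example `K` where the slide writes kh in transliteration); that letter inventory comes from the SLP1 basic segment table rather than from this transcript, and the function name `yamas` is hypothetical.

```python
# The twenty non-nasal stops in SLP1, in the traditional order:
# velar, palatal, retroflex, dental, labial (four per class).
NON_NASAL_STOPS = "kKgGcCjJwWqQtTdDpPbB"


def yamas(stops=NON_NASAL_STOPS):
    """Return the yama encodings for the given stops: the stop followed by '~'."""
    return [stop + "~" for stop in stops]


if __name__ == "__main__":
    print(yamas())        # the set of twenty
    print(yamas("kKgG"))  # the set of four
```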

B.4.4 Syllabified visarga and anusvāra accent

H/ high-pitched visarga

H\ low-pitched visarga

H^ svarita visarga

M\ low-pitched anusvāra
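A hedged sketch of how such sequences might be picked out of an SLP1 string: each segment is taken to be one ASCII letter followed by any run of modifier symbols, and the four combinations listed above are looked up in a table. The segment pattern is a simplifying assumption (SLP1 punctuation and the non-letter segment codes are ignored), the input string is illustrative rather than attested text, and `label_segments` is a hypothetical name.

```python
import re

# Modifier symbols from sections B.3.1 to B.3.4; a segment is assumed to be
# one ASCII letter followed by zero or more of these symbols.
MODIFIERS = r"_=!*#1234/\\^6789+~"
SEGMENT_RE = re.compile(rf"[A-Za-z][{MODIFIERS}]*")

# The accented visarga and anusvāra combinations listed above.
LABELS = {
    "H/": "high-pitched visarga",
    "H\\": "low-pitched visarga",
    "H^": "svarita visarga",
    "M\\": "low-pitched anusvāra",
}


def label_segments(text):
    """Return (segment, label) pairs; label is None for segments not in LABELS."""
    return [(seg, LABELS.get(seg)) for seg in SEGMENT_RE.findall(text)]


if __name__ == "__main__":
    # Illustrative input only, not a quotation from a Vedic text.
    for segment, label in label_segments("devaH/"):
        print(segment, label)
```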

Nominal Declension

Verbal Conjugation

XML Rules for guṇa
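As a hedged illustration of what a guṇa rule does (not the XML rule format shown on the slide, which this transcript does not preserve), the sketch below applies the standard guṇa substitutions to a stem-final vowel in SLP1: i and ī become e, u and ū become o, ṛ and ṝ become ar, and ḷ becomes al. The names `GUNA` and `guna_of_final_vowel` are hypothetical, and conditions that block guṇa in particular grammatical contexts are deliberately left out.

```python
# Guṇa substitutes, keyed by SLP1 vowel.
# Vowels not listed (A, e, E, o, O) are left unchanged.
GUNA = {
    "a": "a",
    "i": "e", "I": "e",
    "u": "o", "U": "o",
    "f": "ar", "F": "ar",  # ṛ, ṝ
    "x": "al",             # ḷ
}


def guna_of_final_vowel(stem):
    """Replace a stem-final vowel by its guṇa substitute.

    Stems that do not end in one of the listed vowels are returned unchanged.
    """
    if stem and stem[-1] in GUNA:
        return stem[:-1] + GUNA[stem[-1]]
    return stem


if __name__ == "__main__":
    for root in ("nI", "BU", "kf", "gam"):
        print(root, "->", guna_of_final_vowel(root))
```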

Executable Perl code

XML Full-form Lexicon

Morphological Analyzer

Cologne Digital Sanskrit Dictionaries

CDSL Monier Williams

Digital Dictionaries of South Asia

Digital Sanskrit Library Integration

Flexible input and display, linking text to the full-form lexicon, and aligning inflectional and morphological tags

Sanskrit Library Text-lexicon Integration

Sanskrit Library Text-lexicon Integration

Sanskrit Library Morphological Analysis

Monier Williams: anuttama

Sanskrit Library Input/Display Preferences

Sanskrit Library Input/Display Preferences

Sanskrit Library Input/Display Preferences

Sanskrit Library Input/Display Preferences

Sanskrit Library Lexical Sources Preferences

Böhtlingk’s Sanskrit-Wörterbuch in kürzerer Fassung: anuttama

Böhtlingk and Roth’s Grosses Sanskrit-Wörterbuch: anuttama

Apte's Practical Sanskrit-English Dictionary: anuttama

Macdonell's A Practical Sanskrit Dictionary: anuttama

Sanskrit Linguistic Processing

Text-image alignment, and digital critical editing

Monier Williams Digital Image

Machine-readable text

Below is a segment of Ṣaḍguruśiṣya’s Vedārthadīpikā in SLP1 encoding.
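For readers unfamiliar with SLP1, the sketch below converts SLP1 basic segments to IAST for display. The mapping follows the usual SLP1 basic-segment correspondences and is not taken from the slides; modifiers, Vedic-only segments, and punctuation are passed through unchanged, and the names `SLP1_TO_IAST` and `slp1_to_iast` are hypothetical.

```python
# SLP1 letters whose IAST form differs; all other characters pass through.
SLP1_TO_IAST = {
    "A": "ā", "I": "ī", "U": "ū", "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "E": "ai", "O": "au", "M": "ṃ", "H": "ḥ",
    "K": "kh", "G": "gh", "N": "ṅ",
    "C": "ch", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "T": "th", "D": "dh", "P": "ph", "B": "bh",
    "S": "ś", "z": "ṣ",
}
_TABLE = str.maketrans(SLP1_TO_IAST)


def slp1_to_iast(text):
    """Convert SLP1 basic segments to IAST, one code point at a time."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(slp1_to_iast("zaqguruSizya"))
    print(slp1_to_iast("vedArTadIpikA"))
```

Because SLP1 uses exactly one code point per segment, this direction needs no lookahead. The reverse direction does, since IAST digraphs such as kh and ai are ambiguous without it.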

Syllable Tags

Below is a segment of Ṣaḍguruśiṣya’s Vedārthadīpikā with orthographic syllable XML tags inserted.
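One way such tags could be produced is sketched below, on the assumption that an orthographic syllable is a consonant cluster plus a vowel, with an optional anusvāra or visarga, and that leftover vowelless consonants form a unit of their own. The tag name `syl` and its `n` attribute are hypothetical stand-ins, since the actual tag format is not preserved in this transcript, and accent and other modifiers are ignored.

```python
import re

# SLP1 basic vowels and consonants (classical inventory only).
VOWELS = "aAiIuUfFxXeEoO"
CONSONANTS = "kKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh"

# Consonant cluster + vowel + optional anusvāra (M) or visarga (H),
# or a run of consonants with no following vowel (e.g. word-final).
SYLLABLE_RE = re.compile(rf"[{CONSONANTS}]*[{VOWELS}][MH]?|[{CONSONANTS}]+")


def orthographic_syllables(word):
    """Split one SLP1 word into orthographic syllables."""
    return SYLLABLE_RE.findall(word)


def tag_syllables(word, start=1):
    """Wrap each syllable in a hypothetical <syl n="..."> element."""
    parts = orthographic_syllables(word)
    return "".join(
        f'<syl n="{i}">{part}</syl>' for i, part in enumerate(parts, start)
    )


if __name__ == "__main__":
    print(orthographic_syllables("vedArTadIpikA"))
    print(tag_syllables("vedArTa"))
```

Numbering the syllables gives each one a stable identifier that other files, such as variant readings or page boundaries, could point to.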

Variant Readings

An XML file contains variant readings for various manuscripts and editions of Ṣaḍguruśiṣya’s Vedārthadīpikā.

Page Boundaries

An XML file of entries associates page boundaries in manuscript Wai321 of Ṣaḍguruśiṣya’s Vedārthadīpikā with orthographic syllable tags in the machine-readable edition and in the manuscript-variant tags.

Word-spotting

A highlighted passage in a manuscript of Ṣaḍguruśiṣya’s Vedārthadīpikā: Wai321, folio 131, recto, line 8.

VAD Digital Critical Edition