Proposal for a Bangla (or Bengali) Script Root Zone Label Generation Ruleset (LGR)
LGR Version: 4.0
Current Date: 2020-05-20
Document version: 5
Authors: Neo-Brahmi Generation Panel [NBGP]

1. General Information
This document lays down the Label Generation Ruleset (LGR) for the Bangla (or 'Bengali')¹ script under the general rubric of the Neo-Brāhmī Writing System. The three main components of the Bangla Script LGR, i.e. (i) the code point repertoire, (ii) variants and (iii) Whole Label Evaluation rules, are described in detail here, after a brief historical background of the script in Section 3. All these components will be incorporated in a machine-readable format in an XML file named "proposal-bengali-lgr-20mar20-en.xml". Labels for testing can be found in the accompanying text document "bangla-test-labels-20mar20-en.txt".

2. Script for Which the LGR Is Proposed
ISO 15924 Code: Beng
ISO 15924 Key N°: 325
ISO 15924 English Name: Bengali (Bangla)
Latin transliteration of native script names [in IPA]: bɑːŋlɑː, ôxômiya
Native names of the script: বাংলা, অসমীয়া
Maximal Starting Repertoire (MSR) version: MSR-4

3. Background on Script & Principal Languages Using It
3.0. Introduction
'Bangla' (or Bengali) is historically and genealogically regarded as an eastern Indo-Aryan language with around 178.2 million speakers in Bangladesh (98% of the population), and 83.4 million speakers in the Indian states of West Bengal (68.37 million), Tripura (2.15 million), South Assam (7.3 million), Odisha (0.49 million) and Delhi (0.21 million) as

¹ The term 'Bangla' is used in the descriptive text and the term 'Bengali' is used in the normative part of this proposal.
well as in the Andaman and Nicobar Islands (close to a hundred thousand), accounting for 8.3% of India's population. It is also a major language in Jharkhand (2.6 million) and a language with a sizable population in Bihar (0.44 million). Apart from these, a huge number of Bangla-speaking diasporas are spread all over the world. It is the seventh largest spoken and written language in the world. Bangla is the national and official language of Bangladesh and one of the 22 official languages of India (listed in the 8th Schedule of the Indian Constitution). It is also one of the official languages of Sierra Leone. The script is also called Bangla [102]; it is an eastern variety of the 'Brāhmī' writing system, written from left to right. Historically it derives from the Brāhmī alphabet as used in the Ashokan inscriptions (269-232 BC).
Bangla and its cognate languages mentioned above together form a linguistic group known as Eastern New Indo-Aryan (NIA). There is a gross inadequacy of inscriptions and manuscripts in Eastern Apabhranśa or 'Avahaṭṭha', except for small inscriptions and the manuscripts of the Tantric Buddhist text titled 'Caryyācaryyaviniścaya' or the Caryā-Pada [114], dating back to the 9th-11th centuries. As a result, there is not much epigraphic evidence for the development of its writing system. However, the evidence that is available on the genesis of the Bangla writing system is discussed in Section 3.1 [109]. Historically, the Bangla language is divided into three periods, as evident from various sources:
(i) Firstly, the Old Bangla Period (roughly AD 950/1000 to AD 1200/1350), of which three specimens are found: (a) the 47 Caryā songs, the Dohākōṣa of Saraha and the Dohākōṣa of Kānha (mostly in Apabhranśa), and the Ḍākārṇava (in a variety of Prakrt); (b) Old Bangla specimens of over 300 words in a commentary [141]
(iii) Finally, after 1800 AD we find Modern or New Bangla, marked by the introduction of written prose [109] in the books of Fort William College (established in 1800). The colloquial variety of Bangla based on the speech of Calcutta (now called 'Kolkata') made its first appearance through the Hutōm Pẽcāra Nakśā (1862) by Kaliprasanna Singha. By this time the influence of English on the vocabulary, idioms and expressions, as well as on the writing styles of Bangla, is significant. The fonts and types for Bangla developed during this time also spread to all parts of the Bangla speech community [101, 120]. The same fonts, with some extensions, were also used for the neighbouring languages deploying this writing system.
Bangla prose developed two literary styles during the 19th-20th centuries: the Sādhubhāṣā (সাধুভাষা, 'Elegant Language or Style') and the Calitabhāṣā (চলিতভাষা, 'Current Language or Modern Style'). It is the latter style that is prevalent in written prose today. The Language Movement in Bangladesh (then East Pakistan) began in 1948, as civil society dissented against the elimination of the Bangla script from currency and stamps, which had been in use since the British Raj. The movement reached its pinnacle in 1952 when, on 21 February, the police fired on demonstrating students and civilians, causing numerous injuries and deaths.² Later, following the Language Movement, on 27 April 1952 the All Party National Language Committee decided to demand the establishment of an organization for the promotion of the Bengali language. Bangla Academy, Dhaka, has, right from its inception in 1955, been engaged in promoting and fostering Bangla as the lingua franca of the country, both before and after independence from Pakistan in 1971. Through the various commissions and committees constituted by the Government of Bangladesh after independence in 1971 (Banladesa Jatiya Siksa Kamisana in 1972, Jatiya Siksa Upadesta Parisad in 1979, Banla Bhasa Bastabayana Sela in 1982, Banla Bhasa Kamiti in 1983, etc.³), Bangla was made the primary medium of instruction and communication in all governmental and educational activities. Through great struggle and bloodshed, the Bengalis established Bangla as an official language of the state.⁴
3.1. Written Bangla
The 'Bangla alphabet' (বাংলা লিপি, Bānglā lipi; ISO 15924: Beng) is derived from the Brāhmī writing system, which is related to the Nāgarī (also known as Devanāgarī⁵) script [108] as well as to the Tirhutā writing system [106]. Considered to be the fifth most widely used writing system in the world, this combined Bangla-Asamiyā-Manipuri script (showing some variations for Asamiyā and Meitei or Bisnupriya Manipuri) [130] was also used in eastern Indian Sanskrit manuscripts. For Chakma in India and Bangladesh, and for Kokborok in Tripura, it was and still is one of the scripts in use. A close variant called Tirhutā ([123]; now also available in Unicode 10.0 as U+11480-U+114DF; see [110]) or
² The UN declared Ekuśe February (21 February) as International Mother Language Day at the UNESCO General Conference in Paris on 17 November 1999, "in recognition of the sanctity and preservation of all vernacular languages in the world" [22].
³ Bāṅlā Bhāṣā Kamiṭi. 1983. Bāṅlā Bhāṣā Kamiṭi Riporṭ (Report of the Bangla Bhasha Committee). Dhaka: Śikṣā, Dharma, Krīṛā O Saṅskṛti Mantraṇālaya, People's Republic of Bangladesh.
⁴ Chakraborty, Rajib. 2018. The Fishermen's Community: A Language-Culture Interplay (A Study of Post-1971 Select Bangla Novels). Unpublished PhD dissertation, Visva-Bharati.
⁵ William Dwight Whitney in his Sanskrit Grammar unequivocally said, "This name (Devanāgarī) is of doubtful origin and value" (Whitney, William Dwight. 1994 reprint. Sanskrit Grammar. New Delhi: Motilal Banarsidass Publishers, p. 1).
Mithilākṣara was used for Maithili from the 14th century until the early 20th century [106]. In this context one finds a mention of 'Sylheti Nāgarī lipi' or 'Siloti' (added to the Unicode Standard in March 2005 with the release of version 4.1), the details of which may be of interest only to historians and historical linguists (see [137] and [144]). For all practical purposes, however, Sylheti Bangla is now generally written in the modern-day Bangla script. Originally, during the reign of the Pāla dynasty (750-1154 AD) in eastern India, and perhaps even earlier during the Malla period (694 AD onwards), the Bangla writing system took a shape comparable to the modern-day one [111, 119]. A pictorial description of the development from Brāhmī to the modern Bangla script could be presented here in tabular form.
The inscriptional evidence in Brāhmī is found in Archaic Brāhmī from the 3rd century BC to the 1st century BC, in Middle Brāhmī soon after (1st-3rd century AD) and then in Late Brāhmī (4th-6th century AD). This evidence can be seen in both Bangladesh and West Bengal [108] in (1) the Mahasthanagara inscriptions (Bogra district, Bangladesh; the ancient name being Pundranagara or Paundravardhanapura); (2) Brāhmī (and Kharoṣṭhī) inscriptions from lower 'Gangetic Bengal'; and (3) copper plate inscriptions of the Imperial Guptas from the northern part of West Bengal and north-west Bangladesh, in the areas under Dharmaditya, Gopachandra and Samācāradeva (about whom one knows only from five copper plates found in Kotalipara in the Faridpur district of Bangladesh, one in Mallasarul in the Burdwan district of West Bengal and one in Jayramapura, in the Ballesvara district, now in Odisha).
These epigraphs from the eastern part of undivided India (dating back to the 4th-6th centuries AD) show some characteristic features of letters (especially in ম 'ma', ল 'la', শ 'śa', স 'sa' and হ 'ha'), which led to the development of the eastern variety of the Gupta script. Epigraphic records from Bangladesh demonstrate remarkable developments in Eastern Brāhmī. Notable examples in this context are the Tippera copper plate inscription of the 'Samatata' rulers [139, p. 265], such as Lokanātha (latter half of the 7th century AD), the Kailan inscription of Śrīdhāraṇa Rāta, and the Ashrafpur copper plates. The letters seem to hang down from wedge-shaped solid triangles, with right-hand verticals bending down at the bottom, which is why Prinsep and Fleet described the script as Kuṭila-lipi (literally 'cursive writing style'), whereas Al Biruni (973-1048) used the term Siddhamātṛkā (as a mātrā or bar is placed over each of the letters) to designate the script of northern India. The next stage of development is illustrated by the 9th century copper plate inscriptions from Khalimpur of the reign of Dharmapāla, from Monghyr and Nalanda of the time of Devapāla in Bihar, and from Jagjīvanpura (Malda) of the reign of Mahendrapāla. The Siddhamātṛkā (mentioned as 'Siddham' in Chinese sources) is said to have been prevalent in this region as well, up to the end of the tenth century. Also called the Gauri (i.e. Gandi) in Pūrvadeśa or the Eastern country, it was regarded as the same script to which the appellative Proto-Bangla is given, with its characteristics in rudimentary form in the period between AD 875 and AD 1025. In some epigraphs it is considered as belonging to the second quarter of the eleventh century AD. Flattening of head-marks becomes prominent in comparison to the wedge-shaped serifs. An important landmark in the development of the Bangla script is the Ramaganja copper plate inscription of Mahāmāṇḍalika in the last quarter of the eleventh century AD. It is the earliest document from this entire region that bears the letter m with a tick rising upwards. The full vowel i develops a tick at the right end of the upper horizontal bar above and a curved hook below. Initial e approaches the modern Bangla character. A mature form of Proto-Bangla, the immediate precursor of the Bangla script, is illustrated in the inscriptions of the Varman, Sena and Deva rulers of the twelfth and thirteenth centuries [104].

The evolution of the Bangla script (cf. [136]) is aligned with the advancement of printing technology. The first "movable type" script was technically created and used for printing Nathaniel Brassey Halhed's (1751-1830) 1778 book titled A Grammar of the Bengal Language. In 1785 Governor-General Warren Hastings (1732-1818) requested another civilian, Charles Wilkins (1749-1836), to cut punches for Bangla printing characters. The current printed form of the Bangla script appeared soon after. It is generally agreed that Wilkins developed the Bangla print script [111]. He passed on this knowledge to Pancanana Karmakara (d. 1804), a renowned artist in Bengal. Later it was Karmakara and his family who became famous in Bangla printing technology. Shepherd was another assistant of Wilkins in the design of this script, which became more angular, with sharper turns and edges [133]. A few archaic letters were modernized during the 19th century. The script was standardized by Pandit Ishwar Chandra Vidyasagar when Bangla type fonts were to be used for publishing on a large scale under the Calcutta School Book Society [116, for several references]. Much later, the Linotype technique invented by Ottmar Mergenthaler (1854-1899) in 1886 was introduced into Bangla printing in 1935 through the efforts of Suresh Chandra Majumdar (1888-1954), Rajsekhar Basu (1880-1960), Jatindra Kumar Sen (1882-1966) and his disciple Sushil Kumar Bhattacharya, and began to be used by the Ānandabazar Patrikā group, later followed by others. Within a few years the more advanced Monotype technology came to be used in Bangla printing. However, Monotype found very limited acceptance in Bangla printing culture, and Linotype held the stage until digital technology eventually replaced all earlier techniques. All these stages could be presented in a table.
3.2. Languages Considered
Below is a tabular representation of the languages using the Bangla script, placed on EGIDS Scale 1-6 (see [117] for details). Some languages under EGIDS 5 and 6 have also developed their own scripts for printing and publishing. Some used the Bangla script earlier (such as Bodo) or used it in West Bengal at some point in time (Santali), but have later shifted to another writing system: Bodo is now written in Nāgarī or Devanāgarī, and Santali uses both Nāgarī/Devanāgarī and Ol Chiki [145]. For the purposes of the Bangla LGR, only languages belonging to EGIDS scale 1 to 4 have been considered. Consider the following table:
EGIDS Scale 1 | EGIDS Scale 2 | EGIDS Scale 3 | EGIDS Scale 4 | EGIDS Scale 5 | EGIDS Scale 6
Bangla (Bengali)
Santali, Bodo, Riang, Khumi, Mru(ng), Asho
Lepcha, Pnar, Koda, Kora, Chak
Asamiyā (Assamese)
Koch or Rajabansi
Malto or Malpahariya
Manipuri or Meitei
Bisnupriya Manipuri, Kok-Borok (Tripura & Bangladesh)
Chakma, Hajong, Mundari & Kurux (of Bangladesh)
Toto, Rohingya, Tippera, Megam, Tanchangya
Usoi, Limbu, Sadri or Oraon
Bhumij or Mundari, Bawm, Chin
Table 4: Main languages in India and Bangladesh that use the Bangla script, on the EGIDS Scale
3.3. Notable Features of Bangla Script [150]
The Bangla writing system has certain features that show how it has to be written, or how typesetting in Bangla can be done. This section is followed by a section that explains the code points (and fixed code point sequences) that show certain distinctive characteristics of Bangla and make up the repertoire. The next sections will also cover the 'akshar'-formation rules (ABNF) showing character classes, Whole Label Evaluation (WLE) and context rules, as well as in-script and cross-script variants. Here we present some basic features of the script and its pronunciation.
The Bangla script is an alpha-syllabic writing system in which all consonants are assumed to contain an accompanying 'inherent' vowel (theoretically before or after each consonant). It varies between ɔ and o depending on the position of the consonant in the word. At times these 'assumed' or 'inherent' vowels are not pronounced at all [142].
Vowels can be written as independent letters or by using a variety of diacritical marks, which are written above, below, before or after the consonant they follow in pronunciation, or on both sides of it [105].
When consonants occur together in clusters, special conjunct letters are formed. In printed Bangla many of these consonantal clusters or conjoined consonants are in use. The letters for the consonants other than the final one in the group are generally reduced. But there are a few special conjunct characters which are compounds of the consonant characters, e.g. ক (k) + ষ (ṣ) = ক্ষ (kṣ), ঞ (ñ) + জ (j) = ঞ্জ (ñj), জ (j) + ঞ (ñ) = জ্ঞ (jñ), হ (h) + ম (m) = হ্ম (hm). There are other issues as well: র as the second member of a cluster is reduced to a secondary symbol, e.g. প (p) + র (r) = প্র (pr), ষ (ṣ) + ট (ṭ) + র (r) = ষ্ট্র (ṣṭr) (as in উষ্ট্র uṣṭra 'camel'). য (y), when used as a primary symbol, represents jɔ in Bangla, but its secondary symbol (allograph), jɔ-phalā, has two phonetic values. When added to the initial consonant in a word, it is the vowel æ (as in শ্যামল (śyāmala) 'green', র‍্যাপার (ryāpāra) 'wrapper' etc.). After a non-initial consonant, however, it just doubles that consonant in pronunciation (as in কার্য, ধার্য etc.). The র (r) + য (y) combination has two
shape of the second member is changed, e.g. দ্ধ (ddh), গ্ধ (gdh) and ন্ধ (ndh) respectively.
The solitary example of র (r) + ঋ (ṛ) = র্ঋ (as in নৈর্ঋত nairṛta 'southwest'), used mostly in Classical borrowings, shows the use of the secondary symbol of a consonant followed by the primary symbol of a vowel. The inherent vowel applies only to the final consonant of the cluster.
The Bangla script has at least fifty-two primary symbols and quite a few allographs (positional variants of them), corresponding to forty-four phonemes (7 oral and 7 nasal vowels and 30 consonants) [150] or functional speech sounds, with some obvious redundancies, although in one of the first phonemic analyses the number was thought to be thirty-five phonemes [140].
As mentioned above, in Bangla several graphemic symbols have secondary shapes, technically called 'allographs', each with a complementary distribution. These graphs or markings are generally added at the following positions of the primary symbol [113]:
As for the complementary distribution of vowel letters (word- or syllable-initial) and vowel mātrās, which is relevant for the ABNF, let us consider the following. Besides some simple vowel modifiers called 'kārs' in Bangla (also referred to as Matra in the other Neo-Brāhmī LGR documents), there are some combinatory modifiers of Bangla vowels with certain consonants. For example, whereas
আ U+0986 BENGALI LETTER AA is substituted by া U+09BE BENGALI VOWEL SIGN AA,
ই U+0987 BENGALI LETTER I is substituted by pre-posed ি U+09BF BENGALI VOWEL SIGN I,
ঈ U+0988 BENGALI LETTER II is substituted by ী U+09C0 BENGALI VOWEL SIGN II, and
উ U+0989 BENGALI LETTER U is substituted by ু U+09C1 BENGALI VOWEL SIGN U, marked below the primary grapheme,
there are some special vowel modifiers of উ, as in the following combined letters: গু (gu), written with a special shape rather than as the regular গ (g) + ু (u); ভ (bh) + র (r) (ভ্রূ bhrū 'eyebrow'); শ (ś) + র (r) (শ্রূ śrū); ঋ (ṛ) after হ (h) (হৃ hṛ), etc.
There have been many notable contributions to simplifying and modifying Bangla spellings and combinatory techniques, especially by scholars such as Pabitra Sarkar (1992) [134]. These include an attempt to reduce the number of allographs of both vowels and consonants in clusters, which has been widely accepted in the printing of school texts in both Bangladesh and West Bengal [151, 152]. As of now, two systems, the old (traditional) and the new, coexist side by side, operative in different domains.
However, in preparing this LGR document the aim has been to consider the widely used and usable sequences and combinations, and their variations across the sister scripts belonging to the basket of Brāhmī writing systems. Bangla Academy, Dhaka, published the Standard Bangla Spelling Rules in 1992, following the recommendations of a committee constituted through a workshop jointly organized by the Jatiya Siksakrama and Pathyapustaka Board in 1988. A thoroughly revised edition of the Rules was published in September 2012.⁶ After the establishment of the Bangla Akademi of West Bengal in 1986, its first President, Annadasankar Ray (1904-2002), in his inaugural address gave a direction for the standardization of the Bangla alphabet/script and spelling system, and clearly argued that they would not blindly follow the Sanskritic model of conventional grammar. A broad list of proposals was sent to experts on Bangla, and a broad agreement was reached on the 'homogenization of Bangla spelling' by 1988. Based on opinions received from different quarters, a unanimous list of 'rules' was agreed upon. This was published as a 'Spelling Dictionary' titled Ākādemi Bānāna Abhidhāna (1997), which was more comprehensive than 'The University of Calcutta proposals' made in 1936. Along with the 'rationalization' of spellings, another step was taken to make the writing system easier to read by making the symbols used, both single and combined, more 'transparent'. These reforms were originally suggested by Sarkar (1987, first published in 1978) [134][153], where he used the terms Swaccha ('Transparent') and Aswaccha ('Opaque' or non-transparent), even adding Ardha Swaccha ('half transparent') in between the two. Some sample transparent clusters, in which both members of the cluster can be recognized, are ন্ন (nn), প্ত (pt) and স্ত (st).
⁶ Bangla Academy. 2012. Bāṅlā Ekaḍemī Pramita Bāṅlā Bānānera Niyama (Bangla Academy Standard Bangla Spelling Rules). Dhaka: Bangla Academy.
There were in fact two types of proposals. One concerned the shapes of letters: those of consonant + vowel (CV) combinations and of conjuncts, which are consonant + consonant combinations. There were further complex shapes, i.e. those of consonant + consonant + (consonant +) vowel (CC(C)V) signs, as in প্রু (pru) or স্ক্রু (skru). Some decisions in this area were necessary because a few of the CC(C) symbols represented complexities that made them difficult for children to learn. The other type dealt only with the spellings of words, without any reference to the shapes of the letters in which they were written. The basic objective here was 'one word, one spelling' to the greatest extent possible [151].
Below is a statement of the most salient changes that affect consonant + vowel combinations [153]:
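a. The variants of the short u (ু, উ-কার, hrasva u-kāra) vowel sign have been brought down to one, i.e. ু. So the special ligated shape of গু (gu) is replaced by গ with the regular ু. Similarly, রু (ru), শু (śu) and হু (hu) take the regular sign, and therefore so do clusters followed by the short u sign: ন্তু (ntu) (ন + ্ + ত + উ) and স্তু (stu) (স + ্ + ত + উ).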
b. The variants of long u (দীর্ঘ ঊ-কার, dīrgha u-kāra) have also been reduced to one: রূ (rū), ভ্রূ (bhrū) (ভ bh + ্ + র r + ঊ ū), দ্রূ (drū) (দ d + ্ + র r + ঊ ū) and শ্রূ (śrū) (শ ś + ্ + র r + ঊ ū) now take the regular ূ sign.
c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been brought down to one: হৃ (hṛ) now takes the regular ৃ sign.
Regarding consonant + consonant + (consonant) … + (vowel) clusters, Paschimbanga Bangla Akademi proposed transparent or semi-transparent shapes for clusters to the extent admissible in the Bangla writing system. Some examples will clarify the proposal (a slash means that the traditional cluster shape precedes it, while the Bangla Akademi innovation follows) [153].
3.3.1. The Consonants
As per the traditional classification, Bangla consonants are categorized according to their phonetic properties, especially in terms of place and manner of articulation [107]. There are five 'Vargas' (pronounced 'Barga' in Bangla) or groups (sets or classes) distinguished by place of articulation, and one non-'varga' group [105]. Each varga, which corresponds to the stops at a certain place of articulation, contains a series of five consonants classified by their phonetic qualities (i.e. manner of articulation), beginning with unvoiced unaspirated and going up to voiced aspirated (in the fourth column), finally ending with a homorganic or corresponding nasal [107]. Consider the following table:
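The varga arrangement is mirrored in the layout of the Unicode Bengali block, where the twenty-five stops and nasals occupy U+0995 to U+09AE in five runs of five (U+09A9 is unassigned, so the labial run starts one position later). The minimal Python sketch below rebuilds the five vargas from the code chart; the `VARGA_STARTS` labels are illustrative and are not taken from the LGR XML.

```python
import unicodedata

# First consonant (unvoiced, unaspirated) of each varga in the Bengali block.
VARGA_STARTS = {
    "velar (ka-varga)": 0x0995,
    "palatal (ca-varga)": 0x099A,
    "retroflex (ṭa-varga)": 0x099F,
    "dental (ta-varga)": 0x09A4,
    "labial (pa-varga)": 0x09AA,  # U+09A9 is unassigned
}


def vargas() -> dict[str, str]:
    """Return each varga as five letters: unvoiced unaspirated ... homorganic nasal."""
    return {name: "".join(chr(cp + i) for i in range(5))
            for name, cp in VARGA_STARTS.items()}


for name, letters in vargas().items():
    # The last letter of each run is the nasal of that place of articulation.
    print(f"{name:22} {' '.join(letters)}  nasal: {unicodedata.name(letters[-1])}")
```

Because the chart follows the traditional order, the fifth member of every run is the homorganic nasal (NGA, NYA, NNA, NA, MA). The non-varga consonants (য, র, ল, শ, ষ, স, হ) follow from U+09AF onwards.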
3.3.2. The Implicit Vowel Killer: Hasanta (called 'Halant' or 'Halanta' in other Brāhmī-based scripts)
As stated earlier, all consonants are pronounced in isolation with an implicit vowel (central-back ɔ in Bangla as the neutral vowel) assumed to be associated with them [121]. The terms 'Hasanta' (= 'Halant' or 'Halanta' in other Brāhmī-based scripts) and 'Virāma'⁷ (= 'Dari' in Bangla), as preferred in Unicode (cf. Unicode 3.0 and above), are used in this report to denote the character that marks the absence of this inherent vowel. It may be noted that the term virama has been adopted in Unicode in a sense that differs from its traditional grammatical definition, and hence it requires some explanation here. Given the importance of this document, this note should be part of it, so that anybody referring to it can know the proper grammatical explanation of the term. Because a special sign is needed whenever this implicit vowel is stripped off, the symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster, one can, in common parlance, "kill" its vowel and create conjuncts. In this manner conjunct characters can generally be written by joining two to four consonants. In rare cases this process can join up to five consonants. However, the notion of a maximum number of consonants joining to form one akṣara⁸ is to be bounded empirically. This is an observation based on the CIIL-Emille Corpora of Bangla words [132 & 133] as seen in print these days. Given the mixture of scripts and languages occurring on the web, the possibility that one may want a generic Top Level Domain [gTLD] with more than the observed maximum cannot be ruled out. This can be the case when a foreign-language word that admits a large number of consonants is transliterated into Bangla. Hence, in the Bangla LGR work this limit will not be enforced.
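Since the LGR deliberately leaves the number of consonants per cluster unbounded, the maximum is something to measure in a corpus rather than to enforce. The minimal Python sketch below shows such a measurement under two stated assumptions: a cluster is simply a run of consonant letters (U+0995 to U+09B9) linked by Hasanta, and ZWJ, ZWNJ and Nukta receive no special treatment (any of them simply ends the run). The function names are hypothetical.

```python
HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA (Hasanta)


def is_consonant(ch: str) -> bool:
    """Rough test: Bengali consonant letters KA (U+0995) to HA (U+09B9)."""
    return "\u0995" <= ch <= "\u09B9"


def longest_cluster(word: str) -> int:
    """Length of the longest run of consonants linked by Hasanta in a word."""
    best = run = 0
    prev = ""
    for ch in word:
        if is_consonant(ch):
            # Extend the run only if a Hasanta directly precedes this consonant.
            run = run + 1 if (prev == HASANTA and run) else 1
            best = max(best, run)
        elif ch != HASANTA:
            run = 0  # a vowel, vowel sign or any other mark ends the cluster
        prev = ch
    return best


corpus = ["\u0989\u09B7\u09CD\u099F\u09CD\u09B0",  # উষ্ট্র
          "\u0995\u09B2\u09AE"]                    # কলম
print(max(longest_cluster(w) for w in corpus))
```

Applied to a word list such as the corpora cited above, the largest value returned gives the "observed maximum", which the LGR records but does not impose.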
3.3.3. Vowels
Separate symbols exist for all 'Swaras' or vowels in Bangla, which are pronounced independently, either at the beginning of a word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign called 'kār' in Bangla, or Mātrā in Nāgarī⁹, is attached to the consonant. Since the consonant has this built-in neutral vowel at the end, there are equivalent kārs (Mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:
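The same correlation between independent vowels and their kārs can be read off the Unicode character names, since each BENGALI LETTER X with a dependent form has a matching BENGALI VOWEL SIGN X. The minimal Python sketch below relies on that naming pattern as a convenience; it is not data from the LGR. Note that a kār is always stored after the consonant in the code point sequence, even when it is displayed before it (ি, ে, ৈ).

```python
import unicodedata
from typing import Optional

# Independent vowels of modern Bangla: অ আ ই ঈ উ ঊ ঋ এ ঐ ও ঔ
MODERN_VOWELS = "\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994"
KA = "\u0995"


def kar_for(vowel: str) -> Optional[str]:
    """Return the kār (dependent vowel sign) of an independent vowel, or None.

    Uses the naming pattern 'BENGALI LETTER X' -> 'BENGALI VOWEL SIGN X'.
    অ has no kār because it is the inherent vowel, so its lookup fails.
    """
    try:
        return unicodedata.lookup(unicodedata.name(vowel).replace("LETTER", "VOWEL SIGN"))
    except KeyError:
        return None


for v in MODERN_VOWELS:
    sign = kar_for(v)
    # ka + kār is stored as KA followed by the sign, whatever the display order.
    print(v, "inherent vowel, no kār" if sign is None else KA + sign)
```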
⁷ Virāma as used here is also a misnomer according to the Indian grammatical traditions: nowhere is the mere absence of a vowel marked as virāma. Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
⁸ This term needs to be disambiguated: akṣara also means 'syllable' in Indian grammatical traditions.
⁹ The term 'Mātrā' in Bangla, however, stands for an altogether different concept, viz. the top bar placed over a letter, typically found in Hindi and Bangla but missing in Gujarati.
3.3.4. The Anusvāra onuʃʃār (ং U+0982)
The Anusvāra, or onuʃʃār in Bangla, at times represents a homorganic nasal, but not always. It replaces a conjunct group of 'Nasal Consonant + Hasanta + Consonant' where the second consonant belongs to the velar varga or set, as in লংকা. But it often also appears in such combinations where a non-velar is the last member, as in ল্যাংটা 'naked' or ল্যাংচা 'a kind of sweet; to limp'. Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined writing symbol representing the corresponding nasal consonant of the particular set. Although Modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal, in Bangla it is clearly demarcated where one must use the Anusvāra and where it has to be a conjunct cluster with a nasal as the first or the second component.
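For label comparison it matters that the two spellings of the velar case are different code point sequences: লংকা uses ং (U+0982), while the conjunct spelling লঙ্কা uses ঙ + Hasanta. The minimal Python sketch below uses a hypothetical helper that rewrites anusvāra before a velar stop into the homorganic conjunct. It covers only the velar case described above and is not a rule taken from the LGR.

```python
ANUSVARA = "\u0982"                       # ং
NGA_HASANTA = "\u0999\u09CD"              # ঙ + ্
VELARS = set("\u0995\u0996\u0997\u0998")  # ক খ গ ঘ


def anusvara_to_conjunct(word: str) -> str:
    """Replace ং with ঙ্ when the next character is a velar stop; leave other cases alone."""
    out = []
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        out.append(NGA_HASANTA if ch == ANUSVARA and nxt in VELARS else ch)
    return "".join(out)


lanka = "\u09B2\u0982\u0995\u09BE"      # লংকা
langta = "\u09B2\u09CD\u09AF\u09BE\u0982\u099F\u09BE"  # ল্যাংটা (non-velar: unchanged)
print(anusvara_to_conjunct(lanka) == lanka)    # False: a different sequence
print(anusvara_to_conjunct(langta) == langta)  # True
```

A mapping of this kind could feed a variant or test-label discussion, but whether the two spellings should be treated as variants is a policy question the sketch does not settle.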
3.3.5. Nasalization: Candrabindu (ঁ U+0981)
Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cā̃d 'moon' (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].
3.3.6. Nukta (় U+09BC)
The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi). The term and the concept of nukta are borrowed in Bangla. The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, the Bangla letters য় YYA, ড় RRA and ঢ় RRHA have to be represented in this LGR by the sequences YA + Nukta (U+09AF + U+09BC), DDA + Nukta (U+09A1 + U+09BC) and DDHA + Nukta (U+09A2 + U+09BC) instead of the single code points YYA (U+09DF), RRA (U+09DC) and RRHA (U+09DD), although the use of 'Nukta' is otherwise completely unnatural in Bangla.
It is noted that in the current Unicode Standard chart these characters are listed as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA Protocol, through a set of procedures developed by the IETF. Even though the Unicode Standard also provides these three characters as atomic characters (for example 09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y], each as a single keystroke), the IDNA protocol requires that they be treated as combined characters, represented by sequences of code points from the Unicode Bengali block.
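The effect of the composition exclusions can be checked directly with Python's standard `unicodedata` module: normalizing any of the three atomic code points to NFC yields the consonant + Nukta sequence, so only the sequence form can appear in an NFC label. The helper `is_nfc` is a hypothetical name used for illustration.

```python
import unicodedata

# Atomic letters that are composition exclusions, so NFC never keeps them.
PRECOMPOSED = {"RRA": "\u09DC", "RRHA": "\u09DD", "YYA": "\u09DF"}


def is_nfc(label: str) -> bool:
    """True if the label is already in Normalization Form C."""
    return unicodedata.normalize("NFC", label) == label


for name, ch in PRECOMPOSED.items():
    nfc = unicodedata.normalize("NFC", ch)
    print(name, f"U+{ord(ch):04X}", "->",
          " + ".join(f"U+{ord(c):04X}" for c in nfc), is_nfc(ch))
# RRA U+09DC -> U+09A1 + U+09BC False
# RRHA U+09DD -> U+09A2 + U+09BC False
# YYA U+09DF -> U+09AF + U+09BC False
```

The reverse direction holds as well: the sequence YA + Nukta is left unchanged by NFC, which is why the LGR lists the sequences rather than the atomic code points.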
It may be noted that there may be sporadic attempts to write Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph), only for the sake of correct pronunciation and of preserving the sanctity of the loanword. Such attempts amount to using the Bangla writing system like a phonetic (IPA-like) script. This practice is, however, not in use in printed Bangla.
3.3.7. Visarga biʃɔrgo (ঃ U+0983) and Avagraha (ঽ U+09BD)
The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. One example is দুঃখ duḥkho 'sorrow, unhappiness' (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrt or Maithili texts written in Bangla script. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো’পরাণি), and it is now rarely used even in other languages using the Bangla script. The Avagraha is not part of the LGR repertoire: it has not been retained (ঽ, U+09BD) because it is excluded from TLDs by the Maximal Starting Repertoire (MSR). Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.
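Code point citations such as the one for দুঃখ are easy to check mechanically, which helps catch slips such as citing the Devanagari code points of the look-alike word दुःख. The minimal Python sketch below (helper names are hypothetical) lists the code points of দুঃখ and reports any code point that falls outside the Bengali block U+0980..U+09FF.

```python
BENGALI_BLOCK = range(0x0980, 0x0A00)  # U+0980..U+09FF


def code_points(text: str) -> list[str]:
    """Return the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def outside_bengali_block(text: str) -> list[str]:
    """Return the code points of text that do not belong to the Bengali block."""
    return [cp for ch, cp in zip(text, code_points(text)) if ord(ch) not in BENGALI_BLOCK]


dukkha = "\u09A6\u09C1\u0983\u0996"      # দুঃখ
print(code_points(dukkha))              # ['U+09A6', 'U+09C1', 'U+0983', 'U+0996']
print(outside_bengali_block("\u0926\u0941\u0983\u0916"))  # ['U+0926', 'U+0941', 'U+0916']
```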
type in the following manner: র + ্ + য, but for র‍্য the sequence would be র + ZWJ + ্ + য [154]. In other words, ZWJ is used in rendering words that demand ya-phalā after ra, which otherwise cannot be typed (rendered), because the sequence ra + hasanta + antastha ja is the same in medial and/or final position. Interestingly, ra + hasanta + antastha ja is used to type repha on the consonant antastha ja, as in কার্য (kārjo). To get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার.
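The two spellings can be compared at the code point level. The minimal Python sketch below follows the sequences given above [154]; the variable names are illustrative, and the visual result (repha versus ya-phalā) depends on the font and shaping engine rather than on this code.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER
RA, HASANTA, YA = "\u09B0", "\u09CD", "\u09AF"

repha_on_ya = RA + HASANTA + YA             # drawn as repha over য, as in কার্য
ra_with_yaphala = RA + ZWJ + HASANTA + YA   # drawn as র + ya-phalā, as in র‍্যাপার


def show(s: str) -> str:
    """Return the code points of s in U+XXXX notation."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


print(show(repha_on_ya))       # U+09B0 U+09CD U+09AF
print(show(ra_with_yaphala))   # U+09B0 U+200D U+09CD U+09AF
# Dropping the ZWJ turns the ya-phalā spelling back into the repha spelling.
print(ra_with_yaphala.replace(ZWJ, "") == repha_on_ya)  # True
```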
The use of ZWJ/ZWNJ has been ruled out from the root zone by the [Procedure]. Since these two signs are used in Bangla to create alternate renderings, inserting them can affect searching as well as NLP. The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly prevented and the Hasanta joining the two consonants that would otherwise form the conjunct needs to be explicitly shown.
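A root-zone validator therefore has to reject any label that contains either joiner, which in turn means that a spelling like র‍্যাপার cannot be registered in its ZWJ form. The minimal Python sketch below shows such a check; `passes_joiner_check` is a hypothetical name and stands in for whatever mechanism the actual LGR tooling uses.

```python
import unicodedata

JOINERS = frozenset("\u200C\u200D")  # ZWNJ, ZWJ


def joiner_positions(label: str) -> list[tuple[int, str]]:
    """Return (index, character name) for every ZWNJ/ZWJ found in the label."""
    return [(i, unicodedata.name(ch)) for i, ch in enumerate(label) if ch in JOINERS]


def passes_joiner_check(label: str) -> bool:
    """Hypothetical root-zone check: a label may not contain ZWNJ or ZWJ."""
    return not joiner_positions(label)


wrapper = "\u09B0\u200D\u09CD\u09AF\u09BE\u09AA\u09BE\u09B0"  # র‍্যাপার
print(joiner_positions(wrapper))     # [(1, 'ZERO WIDTH JOINER')]
print(passes_joiner_check(wrapper))  # False
```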
3.3.9. Use of Ya-phalā + ā
Ya-phalā + ā sequences are the two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). To render ya-phalā after অ and এ, it is necessary to type U+09CD Hasanta plus U+09AF ya after these vowels. This is a purely ligatural entity, and the addition of ya-phalā and ā-kār is used to elicit the æ sound, as in English 'acid' অ্যাসিড, 'association' অ্যাসোসিয়েশন, 'bat' ব্যাট, 'fat' ফ্যাট, 'mat' ম্যাট, 'cap' ক্যাপ, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant; only consonants can carry the Hasanta. But as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
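Because Hasanta normally follows only a consonant, a label validator has to treat vowel + Hasanta as an exception limited to the two ligatures above. The minimal Python sketch below assumes that the exception is exactly অ or এ + Hasanta + য, as described in this section, and that every other independent vowel followed by Hasanta is rejected. The regular expressions and the function name are illustrative; they are not the LGR's actual WLE rules.

```python
import re

# Any independent vowel (U+0985..U+0994) immediately followed by Hasanta (U+09CD).
VOWEL_HASANTA = re.compile("[\u0985-\u0994]\u09CD")
# The two permitted ligatures: অ or এ + Hasanta + য.
PERMITTED = re.compile("[\u0985\u098F]\u09CD\u09AF")


def vowel_hasanta_ok(label: str) -> bool:
    """Accept vowel + Hasanta only where it begins অ্য or এ্য."""
    for m in VOWEL_HASANTA.finditer(label):
        if not PERMITTED.match(label, m.start()):
            return False
    return True


print(vowel_hasanta_ok("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড: True
print(vowel_hasanta_ok("\u0987\u09CD\u09AF"))                          # ই + ্ + য: False
```

Consonant + Hasanta sequences are untouched by this check, since the first pattern only looks at independent vowels.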
Owing to its co-occurrence with Hasanta, RA either loses its own implicit vowel (repha) or suppresses the implicit vowel of the preceding consonant (ra-phalā). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka 'the sun'); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra 'cycle'). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases. The LGR notes this point of concern with respect to the two RAs in disguise, as it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may consequently lead to concerns related to spoofing and other kinds of cyber irregularities. The motive for classing these two code points as (blocking) variants is that fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of following Hasanta. The difference between the RAs is only distinguishable if one looks into their Unicode values. Therefore labels such as অর্ক arka, শীর্ষ śīrṣa 'top, apex', অভ্র abhra 'cloud, the sky' and শ্রম śrama 'physical labour' could be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode values (code points). This point is made explicit with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47) that are to follow. Moreover, it …
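The risk described above can be made concrete by enumerating the labels that render identically once either RA may appear next to a Hasanta. The minimal Python sketch below assumes, as an illustration rather than as the LGR's formal context rule, that an RA adjacent to a Hasanta on either side (repha or ra-phalā) can take either form. It only generates the variant set; deciding that the variants are blocked is left to the LGR.

```python
from itertools import product

BANGLA_RA, ASSAMESE_RA, HASANTA = "\u09B0", "\u09F0", "\u09CD"


def ra_variants(label: str) -> set[str]:
    """All labels that differ only in the RA choice where RA touches a Hasanta."""
    slots = []
    for i, ch in enumerate(label):
        touches_hasanta = ((i + 1 < len(label) and label[i + 1] == HASANTA)
                           or (i > 0 and label[i - 1] == HASANTA))
        if ch in (BANGLA_RA, ASSAMESE_RA) and touches_hasanta:
            slots.append((BANGLA_RA, ASSAMESE_RA))  # either RA looks the same here
        else:
            slots.append((ch,))
    return {"".join(choice) for choice in product(*slots)}


arka = "\u0985\u09B0\u09CD\u0995"     # অর্ক (repha)
srama = "\u09B6\u09CD\u09B0\u09AE"    # শ্রম (ra-phalā)
for label in (arka, srama):
    print(sorted(f"U+{ord(c[1]):04X}" if len(c) > 1 else c
                 for c in ra_variants(label)))
```

Each of the two example labels yields exactly two variants, one with U+09B0 and one with U+09F0; a label with two RA slots yields four.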
4. Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) has been formed by members having experience in Linguistics (especially in NLP/Computational Linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up to assign a separate LGR to each. However, an attempt is made to ensure that the fundamental philosophy behind building those LGRs is consistent with all other Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about the Bangla LGR code points.
4.1. Guiding Principles
The NBGP adopts the following broad principles for the selection of code points in the code-point repertoire across the board for all the Neo-Brāhmī scripts within its ambit.
4.1.1. Inclusion Principles
4.1.1.1. Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription or archival purposes only will not be considered for inclusion in the code-point repertoire.
4.1.1.2. Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.
4.2. Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards which are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations but have been spelt out separately for the sake of clarity.
4.2.1. External Limits of Scope
The code point repertoire for the root zone being a very special case at the top of protocol hierarchies, the canvas of available characters for selection as part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart: Out of all the characters that are needed by the script in question, if a particular character is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially so in the Bangla-Asamiyā-Manipuri Writing System, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol: Unicode, being the character-encoding standard for providing the maximum possible representation of a given script/language, has encoded as far as possible all the characters needed by the script. However, domain names being a specialized case, they are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR): The Root Zone LGR being the repertoire of characters which are going to be used for the creation of Root Zone TLDs, which in turn constitute an even more specialized case of domain names, the Root Zone LGR procedure introduces additional exclusions on IDNA's allowed set of characters. Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), even if allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the MSR.
To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. The IDNA protocol further narrows this down, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
4.2.1.1. No Punctuation Marks
The TLDs being identifiers, punctuation marks present in Brāhmī-based scripts will not be included.
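The three successive filters can be pictured as predicates applied in order, reporting the first one that rejects a code point. This is only an illustration under stated assumptions: the IDNA2008 stage is crudely approximated by Unicode general category (the real PVALID derivation is more involved), and the MSR stage lists only the avagraha example from the text plus decimal digits, which are ineligible for the root zone. All function names are hypothetical.

```python
import unicodedata

BENGALI_BLOCK = range(0x0980, 0x0A00)
# Assumed, deliberately incomplete stand-in for the MSR's extra exclusions.
MSR_EXTRA_EXCLUSIONS = {0x09BD}  # BENGALI SIGN AVAGRAHA


def in_script_block(cp: int) -> bool:
    """Filter 1: an assigned code point inside the Bengali Unicode block."""
    return cp in BENGALI_BLOCK and unicodedata.name(chr(cp), "") != ""


def idna_like(cp: int) -> bool:
    """Filter 2: rough proxy for IDNA2008 PVALID (letters, marks, decimal digits)."""
    cat = unicodedata.category(chr(cp))
    return cat[0] in ("L", "M") or cat == "Nd"


def msr_allows(cp: int) -> bool:
    """Filter 3: drop the assumed MSR exclusions and all decimal digits."""
    return cp not in MSR_EXTRA_EXCLUSIONS and unicodedata.category(chr(cp)) != "Nd"


FILTERS = [("Unicode block", in_script_block), ("IDNA", idna_like), ("MSR", msr_allows)]


def first_failing_filter(cp: int):
    """Name of the first filter that rejects cp, or None if all three accept it."""
    for name, accepts in FILTERS:
        if not accepts(cp):
            return name
    return None


for cp in (0x0995, 0x09BD, 0x09F9, 0x09E6, 0x0964):
    print(f"U+{cp:04X}", first_failing_filter(cp))
```

Ordering the predicates this way mirrors the narrowing described above: each stage only ever sees what the previous stage let through.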
4.2.1.2. No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters like BENGALI ISSHAR ৺ (U+09FA), BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), etc. will also not be included.
4.2.1.3. No Rare and Obsolete Characters
There are characters which have been added to Unicode to accommodate rare forms, such as the Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), as well as VOCALIC LL ৡ (U+09E1), and the allographic kāra forms of the latter two symbols: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, in compliance with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the kāra corresponding to VOCALIC RR ৠ (U+09E0), which is VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.
4.2.1.4. No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.
4.2.1.5. ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and the Appendix (Section 10.1).
5. Repertoire
The Bangla Writing System is represented in Unicode using the Bengali (Bangla) script name as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in Unicode has 93 entries. This section details the code-point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to be included in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in its 8th Meeting of the Indian Language Technologies and Products Sectional Committee, LITD 20, held on 23rd August 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 11 is valid for all three languages mentioned above.
For each of the code points, language references have been given in the last column, titled ‘Reference’, of Table 8, titled the “Code Point Repertoire”. For entire coverage of Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced many new complications, except for one particular character. Though only a few representative languages under EGIDS scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under Bangla Unicode Points (as given in Unicode 6.3). However, before the details are presented, it is ideal to look at the Bangla Code Point Chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs given below in the code charts are based on one of the many fonts designed and are not prescriptive, because there could be some variation in actual fonts, both Unicode-compatible and TrueType ones. Consider the following code point table:
Colour convention:
All characters that are included in the [MSR]: yellow background
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background
Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences which enable the conditional inclusion of BENGALI LETTER A and E followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, again followed by BENGALI VOWEL SIGN AA, in the repertoire, for enabling inclusion of the æ sound as in English ‘bat’, ‘cat’, etc.
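A minimal sketch of this conditional inclusion: an independent vowel followed by VIRAMA is accepted only when it opens one of the two æ sequences অ্যা or এ্যা. The function name and the vowel set are hypothetical illustrations, not the LGR's own rule syntax.

```python
VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA (Hasanta)
YA = "\u09AF"      # BENGALI LETTER YA
AA = "\u09BE"      # BENGALI VOWEL SIGN AA
# The two permitted sequences: A/E + VIRAMA + YA + AA  (অ্যা, এ্যা)
AE_SEQUENCES = {v + VIRAMA + YA + AA for v in ("\u0985", "\u098F")}
# Assumed set of independent vowels (MSR-style, without VOCALIC L)
INDEPENDENT_VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")


def vowel_virama_ok(label: str) -> bool:
    """Reject a vowel + VIRAMA pair unless it starts one of the æ sequences."""
    for i in range(len(label) - 1):
        if label[i] in INDEPENDENT_VOWELS and label[i + 1] == VIRAMA:
            if label[i:i + 4] not in AE_SEQUENCES:
                return False
    return True


print(vowel_virama_ok("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড
print(vowel_virama_ok("\u0985\u09CD\u0995"))                          # অ্ক
```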
5.2. Code Point Repertoire Exclusion
There are some characters of the Bangla script that find a place in Unicode but have not been included in the repertoire in the LGR proposal. The reason for excluding ঌ (U+098C) and ৗ (U+09D7) is that they are rare and obsolete characters.
5.3. Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it will never be used alone. It will be used as part of a sequence in three special characters in normalized form: ড়, ঢ় and য়.
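Why NUKTA appears only inside sequences can be checked directly: U+09DC, U+09DD and U+09DF are Unicode composition exclusions, so Normalization Form C (which IDNA2008 requires of labels) stores them as base letter + NUKTA and never recomposes them. The check below uses only Python's standard `unicodedata` module; the list name is an illustrative choice.

```python
import unicodedata

NUKTA = "\u09BC"
# (precomposed letter, base letter) pairs for ড়, ঢ়, য়
NUKTA_LETTERS = [("\u09DC", "\u09A1"), ("\u09DD", "\u09A2"), ("\u09DF", "\u09AF")]

for precomposed, base in NUKTA_LETTERS:
    nfc = unicodedata.normalize("NFC", precomposed)
    # Print the Unicode name and the code points of the NFC form.
    print(unicodedata.name(precomposed), "->", " ".join(f"U+{ord(c):04X}" for c in nfc))
```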
5.4.2. The Vowel Sequence
In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To facilitate understanding for users of other Brāhmī scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra, onuʃʃār (B), a Candrabindu (D) or a Visarga, biʃɔrgo (X). The number of D, B or X which can follow a V in Bangla may not be restricted to one. Going by the rules illustrated in this document, it is clear that formations such as VDD, VBB and VXX are invalid orthographic units. However, it is valid and possible to have formations or sequences combining a chandrabindu with an anusvara on one hand, and a chandrabindu with a visarga on the other, as in হ্যাংচা ‘hænchā’ and হ্যাঃ ‘hæn’ respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu exists in Bangla. A Vowel can optionally be followed by a combination of Hasanta/Virama [H] + Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF, the Bangla letter য or ‘ya’, represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla SIGN Hasanta or VIRAMA), U+09AF য BENGALI LETTER YA>. Ya-phalā has a special form, ্য. Again, when combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. ‘aa’ (ā)), it is used for transcribing [æ], as in the “a” in the English word “bat”, written in Bangla as ব্যাট. A Vowel sequence admits the following combinations:
5.4.2.1. A Single Vowel
Example (V): অ (Devanāgarī अ)
5.4.2.2. A Vowel with Conditions
A Vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by a Consonant [C] followed by a kāra (Mātrā) [M].
5.4.3.2. A Consonant with Conditions
A Consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
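The conditions in 5.4.2.2 and 5.4.3.2 can be read as regular expressions over single code points. The sketch below is a literal, simplified reading under assumptions: the character classes are an approximate MSR-style subset of the Bengali block, NUKTA sequences are ignored, and each pattern matches one unit only, not a whole label.

```python
import re

# Assumed character classes (approximate subset of the Bengali block).
V = "[\u0985-\u098B\u098F\u0990\u0993\u0994]"                      # independent vowels
C = "[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0\u09F1]"  # consonants
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"                      # kāra (mātrā)
D, B, X, H = "\u0981", "\u0982", "\u0983", "\u09CD"  # candrabindu, anusvara, visarga, hasanta

# 5.4.2.2: V optionally followed by DB | DX | B | D | X | H C M
VOWEL_SEQ = re.compile(f"{V}(?:{D}[{B}{X}]|[{B}{D}{X}]|{H}{C}{M})?")
# 5.4.3.2: C optionally followed by DB | DX | M | B | D | X | H
CONSONANT_SEQ = re.compile(f"{C}(?:{D}[{B}{X}]|{M}|[{B}{D}{X}{H}])?")

for unit in ("\u0985", "\u0985\u0981\u0982", "\u0985\u0981\u0981", "\u0985\u09CD\u09AF\u09BE"):
    print(" ".join(f"U+{ord(c):04X}" for c in unit), bool(VOWEL_SEQ.fullmatch(unit)))
```

Using `fullmatch` makes VDD, VBB and VXX fail, because only one optional sign is consumed after the vowel.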
5.4.3.5. Subsets
While considering its subsets, as a representative example we will consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’.
Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
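The KHANDA TA condition can be checked with a short function. `khanda_ta_context_ok` is a hypothetical name, and the sketch enforces only the C + H + Z context stated in the note, not the rest of the ABNF.

```python
HASANTA = "\u09CD"
KHANDA_TA = "\u09CE"          # ৎ
RAS = {"\u09B0", "\u09F0"}    # র (Bangla), ৰ (Assamese)


def khanda_ta_context_ok(label: str) -> bool:
    """In a C H Z sequence, the consonant before the Hasanta must be one of the RAs."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] not in RAS:
                return False
    return True


print(khanda_ta_context_ok("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা
print(khanda_ta_context_ok("\u0995\u09CD\u09CE"))                          # ক্ৎ
```

A KHANDA TA that is not preceded by a Hasanta (as in মহৎ) is outside the scope of this condition and passes.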
5.4.5. Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) could be described briefly here. Let us take up S in the first instance. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). For rendering Ya-phalā followed by অ and এ, it is necessary to type U+09CD plus U+09AF ya preceded by the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English ‘bat’, ‘fat’, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant. Only consonants can have the Hasanta marked. But, as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
Another case, mentioned in the table above as P, refers to the formation of repha and ra-phalā in the said script. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta +
10 Refer to Rule P in Section 7, Table 16.
6. Variants
This section talks about the variants in the Bangla script. The NBGP categorizes these confusing variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from character formations as normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. The confusable but not identical cases are not proposed, as there is another panel (the String Similarity Assessment Panel) entrusted to deal with such cases. However, cases which belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Bangla akshar formation. These can cause confusion even to a careful observer and hence are being proposed as variants.
Variants arise in a script when two or more identical-looking forms are stored with different code points. In Bangla, the e-kāra, the ā-kāra and the o-kāra have different code points. One can type o with a consonant in one go, and get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variants, because a kāra followed by a kāra is not allowed.
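The two ways of typing কো are also canonically equivalent in Unicode: U+09CB BENGALI VOWEL SIGN O decomposes to U+09C7 + U+09BE, so NFC normalization, which IDNA2008 requires of labels, folds the two-key input into the single code point. The sketch shows this with the standard `unicodedata` module, together with a hypothetical `has_kara_after_kara` check for the raw, un-normalized input.

```python
import unicodedata

KA, E_KAR, AA_KAR, O_KAR = "\u0995", "\u09C7", "\u09BE", "\u09CB"
# Assumed set of dependent vowel signs (kāras)
KARAS = set("\u09BE\u09BF\u09C0\u09C1\u09C2\u09C3\u09C4\u09C7\u09C8\u09CB\u09CC")

two_keys = KA + E_KAR + AA_KAR   # ক + ে + া, typed as two kāras
one_key = KA + O_KAR             # ক + ো, typed as one kāra


def has_kara_after_kara(label: str) -> bool:
    """True if any kāra is immediately followed by another kāra."""
    return any(a in KARAS and b in KARAS for a, b in zip(label, label[1:]))


print(unicodedata.normalize("NFC", two_keys) == one_key)              # True
print(has_kara_after_kara(two_keys), has_kara_after_kara(one_key))  # True False
```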
6.1. In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I: As far as true variants in Bangla are concerned, we may draw our attention to cases wherein থ (tha, U+09A5) with Hasanta appears in a conjunct with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus
স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus
ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
The above combinations, if written in traditional orthography, could be a little confusing, as the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in the initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly as হ (ha, U+09B9), thinking it was a ha, increasing the risks in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
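One way to picture these sequence-level variants is as a generator of variant labels. The function below is a hypothetical sketch, not the LGR's variant mechanism: it swaps THA and HA only where they follow SA or NA plus Hasanta, and returns every combination of swaps.

```python
SA, NA, HASANTA = "\u09B8", "\u09A8", "\u09CD"
THA, HA = "\u09A5", "\u09B9"


def tha_ha_variant_labels(label: str) -> set:
    """All labels obtained by swapping THA/HA after SA or NA + HASANTA."""
    slots = [
        i for i in range(2, len(label))
        if label[i] in (THA, HA) and label[i - 1] == HASANTA and label[i - 2] in (SA, NA)
    ]
    variants = {label}
    for i in slots:
        swapped = set()
        for lab in variants:
            other = HA if lab[i] == THA else THA
            swapped.add(lab[:i] + other + lab[i + 1:])
        variants |= swapped  # each slot doubles the set of variants
    return variants


sthana = "\u09B8\u09CD\u09A5\u09BE\u09A8"  # স্থান
for v in sorted(tha_ha_variant_labels(sthana)):
    print(" ".join(f"U+{ord(c):04X}" for c in v))
```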
6.2. Cross-Script Variants
A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusions they may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Although there are certain characters which are somewhat similar, they have not been included here; they have been provided in the Appendix (10.3) for reference.
7. Whole Label Evaluation Rules (WLE)
This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications.
11 Unicode uses ‘Oriya’ for the script, although ‘Odia’ is now the official term.
12 As used by Unicode, denoting and including both Assamese and Maṇipuri.
Below are the symbols used in the WLE rules for each Indic Syllabic Category, as mentioned in the table provided in the Code Point Repertoire (Section 5.1):
C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA).
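To apply the WLE rules, each code point of a label is first replaced by its symbol. A minimal sketch under assumptions: the classes approximate the repertoire, the names are hypothetical, and the sequence-level symbols S and P are left out because they span several code points.

```python
# Assumed per-code-point classes; S and P are sequence-level and not handled here.
CLASSES = {
    "V": "\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994",
    "C": "".join(chr(cp) for cp in list(range(0x0995, 0x09A9)) + list(range(0x09AA, 0x09B1))
                 + [0x09B2] + list(range(0x09B6, 0x09BA)) + [0x09F0, 0x09F1]),
    "M": "\u09BE\u09BF\u09C0\u09C1\u09C2\u09C3\u09C4\u09C7\u09C8\u09CB\u09CC",
    "B": "\u0982",  # anusvara
    "D": "\u0981",  # candrabindu
    "X": "\u0983",  # visarga
    "H": "\u09CD",  # hasanta
    "Z": "\u09CE",  # khanda ta
}
SYMBOL_OF = {ch: sym for sym, chars in CLASSES.items() for ch in chars}


def to_symbols(label: str) -> str:
    """Map each code point to its WLE symbol; raises KeyError for unclassified ones."""
    return "".join(SYMBOL_OF[ch] for ch in label)


print(to_symbols("\u0985\u09B0\u09CD\u0995"))                      # অর্ক
print(to_symbols("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা
```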
Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in adding such ligatures, because any user can easily create them, even though they may not be attested combinations.
7.1.1. Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE) (meaning ‘Bank of India’). This is the case where two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H (U+09CD) is considered enough for full representation of the sound intended for the first word. This is a unique situation necessitated by the lack of the hyphen, space or Zero Width Non-Joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptual similarity between two labels (with and without H) for the majority of linguistic communities; hence this is explicitly prohibited by the NBGP. In future, if required, depending on the prevailing requirements from the community, the future NBGP may consider revisiting this rule.
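The prohibition amounts to a simple adjacency check. In this sketch (the function name is hypothetical), য় is written in its NFC form U+09AF U+09BC rather than the U+09DF listed above, since NFC decomposes U+09DF.

```python
HASANTA = "\u09CD"
# Assumed set of independent vowels (MSR-style)
INDEPENDENT_VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")


def violates_h_before_v(label: str) -> bool:
    """True if a Hasanta is immediately followed by an independent vowel."""
    return any(a == HASANTA and b in INDEPENDENT_VOWELS for a, b in zip(label, label[1:]))


# ব্যাঙ্কঅফ্ইন্ডিয়া (with H after ফ) and the same label without that H
with_h = ("\u09AC\u09CD\u09AF\u09BE\u0999\u09CD\u0995\u0985\u09AB\u09CD"
          "\u0987\u09A8\u09CD\u09A1\u09BF\u09AF\u09BC\u09BE")
without_h = with_h.replace("\u09AB\u09CD\u0987", "\u09AB\u0987")
print(violates_h_before_v(with_h), violates_h_before_v(without_h))  # True False
```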
7.2. Additional Examples from Bangla ABNF
Below are a few examples which help one understand some of the rules ABNF puts in place. These are just given for reference purposes and are not meant to be comprehensive.
ঃক (ःक): As can be seen, such a combination will automatically result in a “golu”, or dotted circle, marking it as an invalid formation. This is an intrinsic property of the
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink, and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What Has Happened So Far in Terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0, Core Specification, Chapter 12, p. 473.
10. Appendix-I
10.1. Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language/script, certain restriction rules apply. In other words, in a given language some of the Formalism structures do not necessarily apply. To take care of such cases, restriction rules are set in place. These restrictions will help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:
1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not generally allow ya (য়) at the beginning of a word either, though a couple of native examples can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across the possibility of Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).
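Rule 1 amounts to a check on the first code point of a label. The sketch below (hypothetical name) applies only the three letters named in the rule; as written, it would also reject a transliteration such as ৎসেরিং, which is the behaviour the rule prescribes.

```python
FORBIDDEN_INITIALS = {"\u09CE", "\u099E", "\u0999"}  # ৎ (KHANDA TA), ঞ (NYA), ঙ (NGA)


def initial_ok(label: str) -> bool:
    """False for an empty label or one starting with a forbidden initial letter."""
    return bool(label) and label[0] not in FORBIDDEN_INITIALS


print(initial_ok("\u09AC\u09BE\u0982\u09B2\u09BE"))        # বাংলা
print(initial_ok("\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"))  # ৎসেরিং
```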
10.2. ‘Sylheti Nagarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal had brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to the Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here (138).
13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.
Table 17: The Script Table of Sylheti Nagarī or Siloti
10.3. Confusable Code Points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable but not enough to be defined as variant code points.
[Table of confusable conjunct forms built on ক (k), গ (g) and ণ (n); the conjunct glyphs and their component sequences are not recoverable from the source text.]
well as in the Andaman and Nicobar Islands (close to a hundred thousand), accounting for 8.3% of India. It is a major language in Jharkhand (2.6 million) too, and a language with a sizable population in Bihar (0.44 million). Apart from these, there is a huge Bangla-speaking diaspora spread all over the world. It is the seventh largest spoken and written language in the world. Bangla is the national and official language of Bangladesh and one of the 22 official languages in India (listed in the 8th Schedule of the Indian Constitution). It is also one of the official languages of Sierra Leone. The script is also called Bangla [102], which is an eastern variety of the ‘Brāhmī’ writing system, written from left to right. Historically, it derives from the Brāhmī alphabet as used in the Ashokan inscriptions (269-232 BC).
Bangla and its cognate languages, as mentioned above, together form a linguistic group known as Eastern New Indo-Aryan (NIA). There is a gross inadequacy of inscriptions and manuscripts in Eastern Apabhranśa or ‘Avahaṭṭha’, except for small inscriptions and the manuscripts of the Tantric Buddhist text titled ‘Caryyācaryyaviniścaya’, or the Caryā-Pada [114], dating back to the 9th-11th century. As a result, there is not much epigraphic evidence for the development of its writing system. However, whatever evidence is available of the genesis of the Bangla writing system is discussed in Section 3.1 [109]. Historically, the Bangla language is divided into three periods, as evident from various sources:
(i) Firstly, the Old Bangla Period (roughly 950/1000 to AD 1200/1350), of which three specimens are found: (a) 47 Caryā songs, the Dohākōṣa of Saraha and the Dohākōṣa of Kānha (mostly in Apabhranśa) and the Ḍākārṇava (in a variety of Prakrit); (b) Old Bangla specimens of over 300 words in a commentary [141]
(iii) Finally, after 1800 AD, we find Modern or New Bangla, marked by the introduction of written prose [109] in the books of Fort William College (established in 1800). The colloquial variety of Bangla based on the speech variety of Calcutta (now called ‘Kolkata’) made its first appearance through the Hutōm Pẽcāra Nakśā (1862) by Kaliprasanna Singha. The influence of English on the vocabulary, idioms and expressions, as well as on the writing styles of Bangla, is significant by this time. The fonts and types for Bangla developed during this time also spread to all parts of the Bangla speech community [101, 120]. The same fonts, with some extensions, were also used for the neighbouring languages deploying this writing system.
Bangla prose developed two literary styles during the 19th-20th century: Sādhubhāṣā (সাধুভাষা, ‘Elegant Language or Style’) and Calitabhāṣā (চলিতভাষা, ‘Current Language or Modern Style’). It is the latter style that is prevalent today in written prose. The Language Movement in Bangladesh (then East Pakistan) began in 1948, as civil society dissented from the elimination of the Bangla script from currency and stamps, which had been in use since the British Raj. The movement reached its pinnacle in 1952 when, on 21 February, the police fired on demonstrating students and civilians, causing numerous injuries and deaths.² Later, following the Language Movement, on 27 April 1952 the All Party National Language Committee decided to demand the establishment of an organization for the promotion of the Bengali language. Bangla Academy, Dhaka, right from its inception in 1955, has been engaged in promoting and fostering Bangla as the lingua franca of the country, before and after independence from Pakistan in 1971. Through the various commissions and committees constituted by the Government of Bangladesh (Bāṅlādeśa Jātīya Śikṣā Kamiśana in 1972, Jātīya Śikṣā Upadeṣṭā Pariṣad in 1979, Bāṅlā Bhāṣā Bāstabāyana Sela in 1982, Bāṅlā Bhāṣā Kamiṭi in 1983, etc.³) after independence in 1971, Bangla was made the primary medium of instruction and communication in all governmental and educational activities. Through great struggle and bloodshed, the Bengalis established Bangla as an official language of the state.⁴
3.1. Written Bangla
The ‘Bangla alphabet’ (বাংলা লিপি, Bānglā lipi; ISO 15924) is derived from the Brāhmī writing system, which is related to the Nāgarī (also known as Devanāgarī⁵) script [108] as well as to the Tirhutā writing system [106]. Considered to be the fifth most widely used writing system in the world, this combined Bangla-Asamiyā-Manipuri script (showing some variations for Asamiyā and Meitei or Bishnupriya Manipuri) (130) was used in eastern Indian Sanskrit manuscripts too. For Chakma in India and Bangladesh and for Kokborok in Tripura, it was and still is one of the scripts used. A close variant called Tirhutā (123; now available also in Unicode 10.0 as U+11480-U+114DF, see 110) or Mithilākṣara was used for Maithili from the 14th century until the early 20th century [106]. In this context, one finds a mention of ‘Sylheti Nagarī lipi’ or ‘Siloti’ (added to the Unicode Standard in March 2005 with the release of version 4.1), the details of which could be of interest only to historians and historical linguists (see 137 and 144). But Sylheti Bangla is now generally written by many in the modern-day Bangla script for all practical purposes. Originally, during the reign of the Pāla dynasty (750-1154 AD) in eastern India, and even earlier, perhaps during the Malla period (694 AD onwards), the present-day Bangla writing system acquired a shape comparable to the modern-day one [111, 119]. A pictorial description of the development from Brāhmī to the modern Bangla script could be presented here in tabular form.
2 The UN declared Ekuśe February (21st February) as International Mother Language Day at the UNESCO General Conference in Paris on 17 November 1999, “in recognition of the sanctity and preservation of all vernacular languages in the world” (22).
3 Bāṅlā Bhāṣā Kamiṭi. 1983. Bāṅlā Bhāṣā Kamiṭi Riporṭ (Report of the Bangla Bhasha Committee). Dhaka: Śikṣā, Dharma, Krīṛā O Saṅskṛti Mantraṇālaya, People’s Republic of Bangladesh.
4 Chakraborty, Rajib. 2018. The Fishermen’s Community: A Language-Culture Interplay (A Study of Post-1971 Select Bangla Novels). Unpublished PhD Dissertation, Visva-Bharati.
5 William Dwight Whitney, in his Sanskrit Grammar, unequivocally said: “This name (Devanāgarī) is of doubtful origin and value” (Whitney, William Dwight. 1994 reprint. Sanskrit Grammar. New Delhi: Motilal Banarasidass Publishers, p. 1).
The inscriptional evidence in Brāhmī is found in Archaic Brāhmī from the 3rd century BC to the 1st century BC, in Middle Brāhmī soon after (1st-3rd century AD), and then in Late Brāhmī (4th-6th century AD). This evidence can be seen in both Bangladesh and West Bengal [108] in: (1) the Mahasthanagara inscriptions (Bogra district, Bangladesh; the ancient name being Pundranagara or Paundravardhanapura); (2) Brāhmī (and Kharoṣṭhī) inscriptions from lower ‘Gangetic Bengal’; and (3) copper plate inscriptions of the Imperial Guptas from the northern part of West Bengal and north-west Bangladesh, in the areas under Dharmaditya, Gopachandra and Samācāradeva (about whom one only knows from five copper plates found in Kotalipara in the Faridpur district in Bangladesh, one in Mallasarul in the Burdwan district (West Bengal) and one in Jayramapura (Baleswar district, now in Odisha)).
These epigraphs from the eastern part of undivided India (dating back to the 4th-6th centuries AD) show some characteristic features of letters (especially in ম ‘ma’, ল ‘la’, শ ‘śa’, স ‘sa’ and হ ‘ha’), which led to the development of the eastern variety of the Gupta script. Epigraphic records from Bangladesh demonstrate remarkable developments in Eastern Brāhmī. In this context, mention may be made of the Tippera copper plate inscription of the ‘Samatata’ rulers (139, p. 265) such as Lokanātha (dated to the latter half of the 7th century AD), the Kailan inscription of Śrīdhāraṇa Rāta, as well as the Ashrafpur copper plates. The letters seem to hang down from wedge-shaped solid triangles, with right-hand verticals bending down at the bottom, because of which the script was described by Prinsep and Fleet as Kuṭila-lipi (literally ‘cursive writing style’), whereas the term Siddhamātrikā (as a mātrā or bar is placed over each of the letters) was used by Al-Biruni (973-1048) to designate the script of northern India. The next stage of development is illustrated by the 9th-century copper plate inscriptions from Khalimpur of the reign of Dharmapāla, from Monghyr and Nalanda of the time of Devapāla in Bihar, and from Jagjīvanpur (Malda) of the reign of Mahendrapāla. The Siddhamātrikā (mentioned as ‘Siddham’ in Chinese sources) is said to have been prevalent in this region too, up to the end of the tenth century. Also called the Gauri (i.e. Gandi) in Pūrvadeśa, or the eastern country, it was regarded as the same script to which the appellative Proto-Bangla is given, with its characteristics in rudimentary form in the period between AD 875 and AD 1025. In some epigraphs it is considered as belonging to the second quarter of the eleventh century AD. Flattening of head-marks becomes prominent in comparison to the wedge-shaped serifs. An important landmark in the development of the Bangla script is the Ramaganja copper plate inscription of Mahāmāṇḍalika in the last quarter of the eleventh century AD. It is the earliest document from this entire region which bears the letter m with a tick rising upwards. The full vowel i develops a tick at the right end of the upper horizontal bar above and a curved hook below. Initial e approaches the modern Bangla character. A mature form of Proto-Bangla, the immediate precursor of the Bangla script, is illustrated in the inscriptions of the Varmana, Sena and Deva rulers of the twelfth and thirteenth centuries [104].
The evolution of the Bangla script (cf. 136) is aligned with the story of the advancement of printing technology. The first ‘movable type’ script was technically created and used while printing Nathaniel Brassey Halhed’s (1751-1830) 1778 book titled A Grammar of the Bengal Language. In 1785, Governor-General Warren Hastings (1732-1818) requested another civilian, Charles Wilkins (1749-1836), to cut punches for Bangla printing characters. The current printed form of the Bangla script appeared soon after. It is generally agreed that Wilkins developed the Bangla print script [111]. He passed on this knowledge to Pancanana Karmakara (d. 1804), a renowned artist in Bengal. Later it was Karmakara and his family that became famous in Bangla printing technology. Shepherd was another assistant of Wilkins in this designing of the script, which became more angular, with sharper turns and edges [133]. A few archaic letters were modernized during the 19th century. It was standardized by Pandit Ishwar Chandra Vidyasagar when the Bangla type fonts were to be used to publish on a large scale under the Calcutta School Book Society [116 for several references]. Much later, the Linotype technique, invented by Ottmar Mergenthaler (1854-1899) in 1886, was introduced into Bangla printing in 1935 by the efforts of Suresh Chandra Majumdar (1888-1954), Rajsekhar Basu (1880-1960), Jatindra Kumar Sen (1882-1966) and his disciple Sushil Kumar Bhattacharya, and began to be used by the Anandabazar Patrika group, later followed by others. Within a few years, the more advanced Monotype technology came to be used in Bangla printing. However, in Bangla printing culture Monotype has had a very limited acceptance, and Linotype held the stage until digital technology eventually came in to replace all earlier techniques. All these could be presented in a table.
3.2. Languages Considered
Below is the tabular representation of the languages using the Bangla script that are placed on EGIDS scale 1-6 (see 117 for details). Some languages under EGIDS 5 and 6 have also developed their own scripts for printing and publishing. Some had used the Bangla script earlier (such as Bodo) or used it in West Bengal at some point in time (Santali), but have later shifted to another writing system. Bodo is now written in Nāgarī or Devanāgarī, and for Santali one uses both Nāgarī/Devanāgarī and Ol Chiki (145). For the purposes of the Bangla LGR, only languages belonging to EGIDS scale 1 to 4 have been considered. Consider the following table:
EGIDS Scale 1
EGIDS Scale 2
EGIDS Scale 3
EGIDS Scale 4
EGIDS Scale 5
EGIDS Scale 6
Bangla (Bengali)
Santali, Bodo, Riang, Khumi, Mru(ng), Asho
Lepcha, Pnar, Koda, Kora, Chak
Asamiyā (Assamese)
Koch or Rajabansī
Malto or Malpahariya
Manipuri or Meitei
Bisnupriya Manipuri, Kok-Borok (Tripura & Bangladesh)
Chakma, Hajong, Mundari & Kurux (of Bangladesh)
Toto, Rohingya, Tippera, Megam, Tanchangya
Usoi, Limbu, Sadri or Oraon
Bhumij or Mundari, Bawm, Chin
Table 4: Main languages in India and Bangladesh that use the Bangla script on the EGIDS scale
3.3. Notable Features of Bangla Script [150]
The Bangla Writing System has certain features that show how it has to be written or how type-setting in Bangla could be done. This section is followed by a section that explains the code points (and fixed code point sequences) which show certain distinctive characteristics of Bangla and which make up the Repertoire. The next sections will also cover the ‘akshar’-formation rules (ABNF) showing character classes, Whole Label Evaluation (WLE) and Context Rules, as well as In-Script and Cross-Script Variants. Here we present some basic features of the script and pronunciation.
The Bangla script is an alpha-syllabic writing system in which all consonants are assumed to contain an accompanying ‘inherent’ vowel (theoretically before or after each consonant). It varies between ɔ and o depending on the position of the consonant in the word. At times these ‘assumed’ or ‘inherent’ vowels are not pronounced at all [142].
Vowels can be written as independent letters or by using a variety of diacritical marks, which are written above, below, before or after the consonant they follow in pronunciation, or on both sides of it (before and after) [105].
When consonants occur together in clusters, special conjunct letters are formed. In printed Bangla many of these consonantal clusters or conjoined consonants are in use. The letters for the consonants other than the final one in the group are generally reduced. But there are a few special conjunct characters which are compounds of the consonant characters, e.g. ক (k) + ষ (s) = ক্ষ (ks), ঞ (n) + জ (j) = ঞ্জ (nj), জ (j) + ঞ (n) = জ্ঞ (jn), হ (h) + ম (m) = হ্ম (hm). There are other issues as well: র as the second member of a cluster is reduced to a secondary symbol, e.g. প (p) + র (r) = প্র (pr), ষ (s) + ট (t) + র (r) = ষ্ট্র (str) (as in উষ্ট্র ustra “camel”). য (y), when used as a primary symbol, represents jɔ in Bangla. But its secondary symbol (allograph), jɔ-phala, has two phonetic values. When added to the initial consonant in a word it is a vowel æ (as in শ্যামল (syamala) “green”, র‍্যাপার (ryapara) “wrapper” etc.). But after a non-initial consonant it just doubles that consonant in pronunciation (as in কার্য, ধার্য etc.). The র (r) + য (y) combination has two […] shape of the second member is changed, e.g. দ্ধ (ddh), গ্ধ (gdh) and ন্ধ (ndh) respectively. The solitary example of র (r) + ঋ (r) = র্ঋ (as in নৈর্ঋত nairrt “Southwest”), used mostly in Classical borrowings, shows the use of the secondary symbol of a consonant followed by the primary symbol of a vowel. The inherent vowel only applies to the final consonant of the cluster.
The Bangla script has at least fifty-two primary symbols and quite a few allographs (positional variants of them), corresponding to forty-four phonemes or functional speech sounds (7 oral and 7 nasal vowels, and 30 consonants) [150], with some obvious redundancies, although in one of the first phonemic analyses the number was thought to be thirty-five phonemes [140].
As mentioned above, in Bangla several graphemic symbols have secondary shapes, technically called ‘allographs’, with a complementary distribution in each case. These graphs or markings are generally added to the following positions of the primary symbol [113] in the following manner:
As for the complementary distribution of vowel letters (word- or syllable-initial) and vowel mātrās, which is relevant for the ABNF, let us consider the following. Besides some simple vowel modifiers called ‘kārs’ in Bangla (also referred to as Mātrā in the other LGR documents of Neo-Brāhmī), there are some combinatory modifiers of Bangla vowels with certain consonants. For example, whereas
আ U+0986 BENGALI LETTER AA is substituted by া U+09BE BENGALI VOWEL SIGN AA,
ই U+0987 BENGALI LETTER I is substituted by the pre-posed ি U+09BF BENGALI VOWEL SIGN I,
ঈ U+0988 BENGALI LETTER II is substituted by ী U+09C0 BENGALI VOWEL SIGN II, or
উ U+0989 BENGALI LETTER U is substituted by ু U+09C1 BENGALI VOWEL SIGN U, marked below the primary grapheme,
there are some special vowel modifiers of উ, as in the following combined letters: গু (gu) rather than writing it as গ (g) + ু (u); ভ (bh) + র (r) in ভ্রু (bhru “eyebrow”); স (s) + র (r) in স্রু (sru); ঋ (r) after হ (h) in হৃ (hr), etc.
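The substitutions listed above can be illustrated with a short Python sketch that uses only the standard `unicodedata` module. It covers just the four pairs named here. The mapping `VOWEL_TO_KAR` and the helper `attach_vowel` are hypothetical names and not part of the LGR. The sketch makes one point concrete: in storage a kāra always follows its consonant. This holds even for the pre-posed ি, whose position before the consonant is purely a matter of rendering.

```python
import unicodedata

# Independent vowel letter -> dependent vowel sign (kāra), for the four
# pairs listed above. Illustrative subset only, not the full repertoire.
VOWEL_TO_KAR = {
    "\u0986": "\u09BE",  # আ BENGALI LETTER AA -> া VOWEL SIGN AA
    "\u0987": "\u09BF",  # ই BENGALI LETTER I  -> ি VOWEL SIGN I (pre-posed)
    "\u0988": "\u09C0",  # ঈ BENGALI LETTER II -> ী VOWEL SIGN II
    "\u0989": "\u09C1",  # উ BENGALI LETTER U  -> ু VOWEL SIGN U (below)
}


def attach_vowel(consonant: str, vowel: str) -> str:
    """Hypothetical helper: write `vowel` after `consonant` as its kāra.

    The kāra is always stored after the consonant; a pre-posed sign such
    as ি is reordered only when the text is rendered.
    """
    return consonant + VOWEL_TO_KAR[vowel]


if __name__ == "__main__":
    for letter, sign in VOWEL_TO_KAR.items():
        print(f"{unicodedata.name(letter)} -> {unicodedata.name(sign)}")
    ki = attach_vowel("\u0995", "\u0987")  # ক + ই -> কি
    print(" ".join(f"U+{ord(c):04X}" for c in ki))  # U+0995 U+09BF
```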
There have been many notable contributions to simplifying and modifying Bangla spellings and combinatory techniques, especially by scholars such as Pabitra Sarkar (1992) [134]. In these there has been an attempt to reduce the number of allographs of both vowels and consonants in clusters, and this has been widely accepted in the printing of school texts in both Bangladesh and West Bengal [151, 152]. As of now, two systems, the old (traditional) and the new, go on side by side, operative in different domains.
However, in the preparation of this LGR document the aim has been to consider the widely used and usable sequences and combinations and their variations across the sister scripts belonging to the basket of Brāhmī writing systems. Bangla Academy, Dhaka published Standard Bangla Spelling Rules in 1992, following the recommendations of a committee constituted through a workshop jointly organized by the Jatīya Śikṣākrama and Pathyapustaka Board in 1988. A thoroughly revised edition of the Rules was published in September 2012.⁶ After the establishment of the Bangla Akademi of West Bengal in 1986, its first President, Annadasankar Ray (1904-2002), in his inaugural address gave a direction for the standardization of the Bangla alphabet, script and spelling system, and clearly argued that they would not blindly follow the Sanskritic model of conventional grammar. A broad list of proposals was sent to experts on Bangla, and a broad agreement was reached on the ‘homogenization of Bangla spelling’ by 1988. Based on opinions received from different quarters, a unanimous list of ‘rules’ was agreed upon. This was published as a ‘Spelling Dictionary’ titled Ākādemi Bānāna Abhidhāna (1997), which was more comprehensive than ‘The University of Calcutta proposals’ made in 1936. Along with the ‘rationalization’ of spellings, another step was taken to make the writing system easier to read by making the symbols used, both single and combined ones, more ‘transparent’. These reforms were originally suggested by Sarkar (1987, first published in 1978) [134][153], where he used the terms Swaccha (‘Transparent’) and Aswaccha (‘Opaque’ or non-transparent), even adding Ardha Swaccha (‘half transparent’) in between the two. Some sample examples are: Transparent: ন্ন (nn), প্ত (pt), স্ত (st), where both members of the cluster can be recognized.
6 Bangla Academy. 2012. Bāṅlā Ekaḍemī Pramita Bāṅlā Bānānera Niyama (Bangla Academy Standard Bangla Spelling Rules). Dhaka: Bangla Academy.
There were in fact two types of proposals. One concerned the shape of the letters: those of consonant + vowel (CV) combinations, and conjuncts, which are consonant + consonant combinations. There were further complex shapes, i.e. those of consonant + consonant + (consonant +) vowel (CC(C)V) signs, as in প্রু (pru) or স্ক্রু (skru). Some decisions in this area were necessary because a few of the CC(C) symbols represented complexities that made learning them difficult for children. The other type dealt only with the spellings of words, without any reference to the shapes of the letters in which they were written. The basic objective here was ‘one word, one spelling’ to the greatest extent possible [151].
Below we place a statement of the most salient changes that affect the consonant + vowel combinations [153]:
a. The variants of the short u (হ্রস্ব উ-কার hrasva u-kāra) vowel sign have been brought down to one, i.e. ু. So গু (gu) is now written as গ + ু. Similarly রু (ru) > র + ু, শু (śu) > শ + ু, হু (hu) > হ + ু, and therefore, for a cluster + short u sign, ন্তু (ntu) > ন্ত + ু (ন + ্ + ত + উ) and স্তু (stu) > স্ত + ু (স + ্ + ত + উ).
b. The variants of long u (দীর্ঘ ঊ-কার dīrgha u-kāra) have also been reduced: রূ (rū) > র + ূ, ভ্রূ (bhrū) > ভ্র + ূ (ভ bh + ্ + র r + ঊ ū), দ্রূ (drū) > দ্র + ূ (দ d + ্ + র r + ঊ ū), শ্রূ (śrū) > শ্র + ূ (শ ś + ্ + র r + ঊ ū).
c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been brought down to one: হৃ (hṛ) > হ + ৃ.
Regarding consonant + consonant + (consonant)… + (vowel) clusters, the Paschimbanga Bangla Akademi proposed transparent or semi-transparent shapes for clusters, to the extent admissible in the Bangla writing system. Some examples will clarify the proposal (a slash means that the traditional cluster shape precedes it, while the Bangla Akademi innovation follows) [153].
3.3.1. The Consonants
As per traditional classification, Bangla consonants are categorized according to their phonetic properties, especially in terms of place and manner of articulation [107]. There are five ‘Vargas’ (pronounced ‘Barga’ in Bangla), or groups (sets or classes), distinguished by place of articulation, and one non-‘varga’ group [105]. Each Varga, which corresponds to stops at a certain place of articulation, contains a series of five consonants classified according to their phonetic qualities (i.e. manner of articulation). The series begins with unvoiced and unaspirated and goes to voiced and aspirated (in the fourth column), finally ending with a homorganic or corresponding nasal [107]. Consider the following table.
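The varga grid can be derived from the Unicode Bengali block, where each varga happens to occupy five consecutive code points. A minimal Python sketch follows. The row labels (velar, palatal, retroflex, dental, labial) are conventional phonetic names assumed here for readability, and the non-varga consonants are not modelled.

```python
import unicodedata

# Each varga is five consecutive code points in the Bengali block.
# U+09A9 (between ন and প) is unassigned, so the labial row starts at U+09AA.
VARGA_STARTS = {
    "velar": 0x0995,      # ক খ গ ঘ ঙ
    "palatal": 0x099A,    # চ ছ জ ঝ ঞ
    "retroflex": 0x099F,  # ট ঠ ড ঢ ণ
    "dental": 0x09A4,     # ত থ দ ধ ন
    "labial": 0x09AA,     # প ফ ব ভ ম
}
COLUMNS = (
    "unvoiced unaspirated",
    "unvoiced aspirated",
    "voiced unaspirated",
    "voiced aspirated",
    "nasal",
)


def varga_table() -> dict:
    """Return the 5 x 5 varga grid, one row per place of articulation."""
    return {
        place: [chr(start + col) for col in range(len(COLUMNS))]
        for place, start in VARGA_STARTS.items()
    }


if __name__ == "__main__":
    for place, row in varga_table().items():
        print(f"{place:10}", " ".join(row))
    # The last column holds the homorganic nasals, e.g. ঙ for the velar row.
    print(unicodedata.name(varga_table()["velar"][4]))  # BENGALI LETTER NGA
```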
3.3.2. The Implicit Vowel Killer Hasanta (called ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts)
As stated earlier, all consonants are pronounced in isolation with an implicit vowel (the central-back ɔ in Bangla, as the neutral vowel) assumed to be associated with them [121]. The terms ‘Hasanta’ (= ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts) and ‘Virāma’⁷ (= ‘Dāṛi’ in Bangla), the latter as preferred in UNICODE (cf. Unicode 3.0 and above), have been used in this report to denote the character that marks the absence of this inherent vowel. It may be noted that the term virāma has been adopted in UNICODE in a sense that is different from its traditional definition in grammar, and hence it requires some explanation here. Considering the importance of the document, this note should be a part of this LGR document, so that anybody referring to it is able to know the proper grammatical explanation of the term. Because a special sign is needed whenever this implicit vowel is stripped off, the symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster, one could, in common parlance, “kill” its vowel and create conjuncts. In this manner conjunct characters can generally be written by joining two to four consonants; in rare cases this process can join up to five consonants. However, the maximum number of consonants joining to form one akṣara⁸ can only be bounded empirically. This is an observation based on the CIIL-EMILLE Corpora of Bangla words [132, 133] as seen in print these days. Given the mixture of scripts and languages on the web, the possibility that one may want a generic Top Level Domain (gTLD) with more than the observed maximum cannot be ruled out. This can be the case when a foreign-language word which admits a large number of consonants is transliterated into Bangla. Hence, in the Bangla LGR work this limit will not be enforced.
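At the code point level the mechanism described above is simple. A conjunct is stored as consonants separated by U+09CD, and the rendering engine decides which conjunct glyph to show. The Python helper below (`conjunct` is a hypothetical name) sketches this. Like the LGR, it imposes no maximum cluster length.

```python
HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA (Hasanta)


def conjunct(*consonants: str) -> str:
    """Join consonants with Hasanta into one cluster.

    Every consonant except the last loses its inherent vowel; only the
    final member keeps it. No maximum length is enforced, in line with
    the decision not to cap cluster size in the LGR.
    """
    if len(consonants) < 2:
        raise ValueError("a conjunct needs at least two consonants")
    return HASANTA.join(consonants)


if __name__ == "__main__":
    ksa = conjunct("\u0995", "\u09B7")  # ক + ষ -> ক্ষ
    print(ksa, [f"U+{ord(c):04X}" for c in ksa])
```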
3.3.3. Vowels
Separate symbols exist for all ‘Swaras’ or vowels in Bangla, which are pronounced independently either at the beginning of the word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign called ‘kār’ in Bangla or Mātrā in Nāgarī⁹ is attached to the consonant. Since the consonant has this built-in neutral vowel at the end, there are equivalent kāras (Mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:
7 Virāma as used here is also a misnomer according to the Indian grammatical traditions, where mere absence of a vowel is nowhere marked as virāma. Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
8 This term needs to be disambiguated: akṣara also means ‘syllable’ in Indian grammatical traditions.
9 The term ‘Mātrā’ in Bangla, however, stands for an altogether different concept, viz. the top bar placed over a letter, typically found in Hindi and Bangla but missing in Gujarati.
3.3.4. The Anusvāra onuʃʃār (ং U+0982)
The Anusvāra, or onuʃʃār in Bangla, at times represents a homorganic nasal, but not always. It replaces a conjunct group ‘Nasal Consonant + Hasanta + Consonant’ where the second consonant belongs to the velar varga or set, as in লংকা. But it often also appears in such combinations with non-velars as the last member, as in ল্যাংটা “naked” or ল্যাংচা “a kind of sweet; to limp”. Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined writing symbol representing the corresponding nasal consonant of the particular set. Modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal. In Bangla, by contrast, it is clearly demarcated where one must use the Anusvāra and where it has to be a conjunct cluster with a nasal as the first or the second component.
3.3.5. Nasalization: Candrabindu (ঁ U+0981)
Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cãd ‘moon’ (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].
3.3.6. Nukta (় U+09BC)
The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi). The term and the concept of nukta are borrowed in Bangla. The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, the Bangla letters য় YYA, ড় RRA and ঢ় RRHA have to be represented in this LGR by the sequences YA + Nukta (U+09AF + U+09BC), DDA + Nukta (U+09A1 + U+09BC) and DDHA + Nukta (U+09A2 + U+09BC). They are not represented by the single code points YYA (U+09DF), RRA (U+09DC) and RRHA (U+09DD), although the use of ‘Nukta’ is otherwise completely unnatural in Bangla.
It is noted that in the current Unicode Standard chart these characters are listed as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA Protocol through a set of procedures developed by the IETF. The Unicode Standard also provides these three characters as atomic characters, each entered with a single keystroke: 09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y]. Even so, the IDNA protocol requires that we treat them as conjunct characters, represented with code points of the Unicode Bengali block.
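The effect of the composition exclusions can be checked with Python's standard `unicodedata` module. The sketch assumes nothing beyond NFC itself; `to_label_form` is an illustrative name. It shows that under NFC each precomposed letter is replaced by its base + nukta sequence, while the sequence itself is left unchanged.

```python
import unicodedata

NUKTA = "\u09BC"
# Precomposed letters that are composition exclusions, with their NFC form
PRECOMPOSED = {
    "\u09DC": "\u09A1" + NUKTA,  # ড় RRA  -> DDA + NUKTA
    "\u09DD": "\u09A2" + NUKTA,  # ঢ় RRHA -> DDHA + NUKTA
    "\u09DF": "\u09AF" + NUKTA,  # য় YYA  -> YA + NUKTA
}


def to_label_form(text: str) -> str:
    """Normalize to NFC, the form IDNA requires for labels."""
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    for atomic, expected in PRECOMPOSED.items():
        nfc = to_label_form(atomic)
        print(f"U+{ord(atomic):04X} ->", " ".join(f"U+{ord(c):04X}" for c in nfc))
        assert nfc == expected                      # the single code point never survives NFC
        assert to_label_form(expected) == expected  # the two-code-point sequence is stable
```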
It may be noted that there could be sporadic attempts to write Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph). Such spellings serve only the sake of correct pronunciation and of maintaining the sanctity of the loanword. They amount to using the Bangla writing system like the IPA script. This practice is, however, not in use in printed Bangla.
3.3.7. Visarga biʃɔrgo (ঃ U+0983) and Avagraha (ঽ U+09BD)
The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. One could quote as an example দুঃখ duhkho “sorrow”, “unhappiness” (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in Bangla. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো’পরাণি). It is rarely used now, even in other languages using Bangla script. The Avagraha is not part of the repertoire of this LGR: it has been decided not to retain Avagraha (ঽ) (U+09BD) because it is blocked in TLDs as per the Maximal Starting Repertoire (MSR). Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.
type in the following manner: র + ্ + য. For র‍্য, however, the sequence would be র + ZWJ + ্ + য [154]. In other words, ZWJ is used in rendering words that demand ya-phalā after ra. Such words cannot otherwise be typed (rendered), because ra + hasanta + antastha ja in medial and/or final position has the same code point order. Interestingly, ra + hasanta + antastha ja is used to type repha on the consonant antastha ja, as in কার্য (kaarjo). In order to get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার.
The use of ZWJ/ZWNJ has been ruled out from the root zone by the [Procedure]. Although these two signs are used in Bangla to create alternate renderings, their insertion can affect searching as well as NLP. The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly restricted. In those cases the Hasanta joining the two consonants participating in the conjunct formation needs to be explicitly shown.
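The consequence of excluding ZWJ/ZWNJ can be seen at the code point level. In this Python sketch (the constant and function names are hypothetical), the ra + ya-phalā spelling needs ZWJ. Once joiners are disallowed, the only remaining sequence is the one that renders as repha over ya. Under this reading, a label like র‍্যাপার cannot be written in its conventional form.

```python
RA, YA, HASANTA = "\u09B0", "\u09AF", "\u09CD"
ZWJ, ZWNJ = "\u200D", "\u200C"

# The same three letters, two conventional renderings:
REPHA_ON_YA = RA + HASANTA + YA             # repha over ya (as in কার্য)
RA_PLUS_YA_PHALA = RA + ZWJ + HASANTA + YA  # ra with ya-phalā (as in র‍্যাপার)


def has_joiner(label: str) -> bool:
    """True if the label contains ZWJ or ZWNJ, neither of which the root zone permits."""
    return ZWJ in label or ZWNJ in label


if __name__ == "__main__":
    print(has_joiner(RA_PLUS_YA_PHALA))  # True
    # With the joiner removed, the two spellings become one code point sequence.
    print(RA_PLUS_YA_PHALA.replace(ZWJ, "") == REPHA_ON_YA)  # True
```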
3.3.9. Use of Ya-phalā-ā Sequences
Ya-phalā-ā sequences provide the two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E).
For rendering ya-phalā after অ and এ, it is necessary to type U+09CD Hasanta plus U+09AF ya after the said vowels. This is a purely ligatural entity. The addition of ya-phalā and ā-kāra is used to elicit the æ sound, as in English: acid অ্যাসিড, association অ্যাসোসিয়েশন, ‘bat’ ব্যাট, ‘fat’ ফ্যাট, ‘mat’ ম্যাট, ‘cap’ ক্যাপ etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant. Only consonants can have the Hasanta marked. As we see here, however, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
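The exception described above can be expressed as a small check. This Python sketch (the function name is hypothetical) accepts Hasanta after an independent vowel only when the vowel is অ or এ and the Hasanta is followed by য and া. It assumes the other sequence rules of the LGR are applied separately, so Hasanta after a consonant is not examined here.

```python
HASANTA = "\u09CD"
YA_PHALA_AA = HASANTA + "\u09AF" + "\u09BE"  # ্ + য + া
# Bengali independent vowels: U+0985-U+0994 minus the unassigned code points
INDEPENDENT_VOWELS = {
    chr(cp) for cp in range(0x0985, 0x0995)
    if cp not in (0x098D, 0x098E, 0x0991, 0x0992)
}
VOWELS_ALLOWING_HASANTA = {"\u0985", "\u098F"}  # অ, এ


def hasanta_after_vowel_ok(label: str) -> bool:
    """Allow Hasanta after an independent vowel only in অ/এ + ্ + য + া."""
    for i in range(1, len(label)):
        if label[i] == HASANTA and label[i - 1] in INDEPENDENT_VOWELS:
            if label[i - 1] not in VOWELS_ALLOWING_HASANTA:
                return False
            if label[i:i + 3] != YA_PHALA_AA:
                return False
    return True


if __name__ == "__main__":
    acid = "\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"  # অ্যাসিড
    print(hasanta_after_vowel_ok(acid))                       # True
    print(hasanta_after_vowel_ok("\u0987\u09CD\u09AF\u09BE"))  # ই + ্যা: False
```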
Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka “the sun”); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra “cycle”). The point is that in both cases the slot for ra could be filled by Bangla ra র (U+09B0) or Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD). The shapes of repha and ra-phalā, however, remain the same in both cases. The LGR makes a note of this point of concern with respect to the two RAs in disguise. It would be completely impossible to distinguish between them with the naked eye in a label so generated, which may consequently lead to spoofing and other kinds of cyber irregularities. The motive for classing these two code points as (blocking) variants is that fully rendered labels may mask the distinction between Bangla ra র (U+09B0) and Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of a following Hasanta. The difference between the RAs is only distinguishable if one looks into their Unicode values. Therefore labels such as অর্ক arka, শীর্ষ sīrsa ‘top, apex’, অভ্র abhra ‘cloud, the sky’ and শ্রম śrama ‘physical labour’ could be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode code points. This point is made explicitly with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47), which are to follow. Moreover, it
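Blocking variants of this kind are typically handled by generating every variant of a label so that all of them can be blocked together. In the LGR itself this is expressed declaratively in the RFC 7940 XML format. The Python sketch below shows only the generation step, with hypothetical names. The text ties Variant Set 4 to the context of a following Hasanta, yet it also names ra-phalā (Hasanta + RA). As an assumption, the sketch therefore swaps RA in both positions and leaves every other RA untouched.

```python
from itertools import product

BANGLA_RA, ASSAMESE_RA, HASANTA = "\u09B0", "\u09F0", "\u09CD"
RAS = (BANGLA_RA, ASSAMESE_RA)


def ra_variants(label: str) -> set:
    """Every label obtained by swapping র/ৰ wherever RA touches a Hasanta.

    Assumption: the swap applies to repha (RA + Hasanta) and to ra-phalā
    (Hasanta + RA); RA elsewhere is left unchanged.
    """
    slots = [
        i for i, ch in enumerate(label)
        if ch in RAS and (
            (i + 1 < len(label) and label[i + 1] == HASANTA)
            or (i > 0 and label[i - 1] == HASANTA)
        )
    ]
    variants = set()
    for choice in product(RAS, repeat=len(slots)):  # one empty choice if no slots
        chars = list(label)
        for i, ra in zip(slots, choice):
            chars[i] = ra
        variants.add("".join(chars))
    return variants


if __name__ == "__main__":
    arka = "\u0985\u09B0\u09CD\u0995"  # অর্ক
    for v in sorted(ra_variants(arka)):
        print(v, " ".join(f"U+{ord(c):04X}" for c in v))
```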
4. Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) has been formed by members with experience in Linguistics (especially NLP and Computational Linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up to assign a separate LGR for each. However, an attempt is made to ensure that the fundamental philosophy behind building those LGRs is consistent across all Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS Scale 1 to 4 (see Table 4) that use Bangla script. The following guiding principles are used in making decisions about Bangla LGR code points.
4.1. Guiding Principles
The NBGP adopts the following broad principles for the selection of code points in the code point repertoire, across the board for all the Neo-Brāhmī scripts within its ambit.
4.1.1. Inclusion Principles
4.1.1.1. Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription or archival purposes only will not be considered for inclusion in the code point repertoire.
4.1.1.2. Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.
4.2. Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards which are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations but have been spelt out separately for the sake of clarity.
4.2.1. External Limits of Scope
The code point repertoire for the root zone is a very special case at the top of the protocol hierarchy. The set of characters available for selection as part of the Root Zone code point repertoire is therefore already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart: Out of all the characters that are needed by the script in question, if a particular character is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol: Unicode is the character-encoding standard that provides the maximum possible representation of a given script/language, so it has encoded, as far as possible, all the characters needed by the script. However, the domain name is a specialized case, and it is governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR): The Root Zone LGR is the repertoire of characters which are going to be used for the creation of Root Zone TLDs. These in turn constitute an even more specialized case of domain names, so the Root Zone LGR procedure introduces additional exclusions on IDNA's allowed set of characters. Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), even though allowed by the IDNA protocol, is not permitted in the Root Zone Repertoire as per the MSR.
To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. The IDNA Protocol further narrows this down. Finally, an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
4.2.1.1. No Punctuation Marks
The TLDs being identifiers, punctuation marks present in Brāhmī-based scripts will not be included.
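The three successive filters can be sketched as set operations in Python. Only the Unicode stage is exact here, since unassigned code points have general category `Cn`. The IDNA stage approximates a single IDNA2008 rule, namely that PVALID characters come from the letter, digit and mark categories (the LetterDigits rule of RFC 5892), and it ignores the others. The MSR stage uses a hypothetical, partial exclusion set built only from items this document names: Avagraha and the digits.

```python
import unicodedata

# Approximation of one IDNA2008 rule (RFC 5892 LetterDigits); the real
# derivation has further rules (exceptions, contextual rules, and so on).
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}
# Hypothetical, partial stand-in for root-zone/MSR exclusions named here.
ROOT_ZONE_EXCLUDED = {0x09BD} | set(range(0x09E6, 0x09F0))  # Avagraha, digits


def unicode_filter(cps):
    """Keep only code points that are assigned in Unicode."""
    return {cp for cp in cps if unicodedata.category(chr(cp)) != "Cn"}


def idna_filter(cps):
    """Keep only letters, digits and marks (approximate IDNA2008 stage)."""
    return {cp for cp in cps if unicodedata.category(chr(cp)) in LETTER_DIGITS}


def msr_filter(cps):
    """Drop the (placeholder) root-zone exclusions."""
    return {cp for cp in cps if cp not in ROOT_ZONE_EXCLUDED}


def candidate_repertoire(cps):
    return msr_filter(idna_filter(unicode_filter(cps)))


if __name__ == "__main__":
    block = range(0x0980, 0x0A00)  # the Bengali block
    print(sorted(f"U+{cp:04X}" for cp in candidate_repertoire(block))[:5])
```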
4.2.1.2. No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters like BENGALI ISSHAR ৺ (U+09FA), BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9) etc. will also not be included.
4.2.1.3. No Rare and Obsolete Characters
There are characters which have been added to Unicode to accommodate rare forms, such as the Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), as well as VOCALIC LL ৡ (U+09E1). The same applies to the allographic kāra forms of the latter two symbols: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, which complies with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the kāra corresponding to VOCALIC RR ৠ (U+09E0), namely VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.
4.2.1.4. No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.
4.2.1.5. ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and the Appendix (Section 10.1).
5. Repertoire
The Bangla writing system is represented in UNICODE using the Bengali (Bangla) script name, as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in UNICODE has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to be included in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26 February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in its 8th Meeting of the Indian Language Technologies and Products Sectional Committee, LITD 20, held on 23 August 2017, decided to refer to ISO the proposal for recognition of the Assamese script in ISO/IEC 10646. Until the UNICODE Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 11 is valid for all three languages above.
For each of the code points, language references have been given in the last column, titled Reference, of Table 8, titled the “Code Point Repertoire”. For entire coverage of Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in Bangla script is not known to have introduced many new complications, except for one particular character. Only a few representative languages under EGIDS Scale 1-4 have been chosen for referencing. Together, however, they cover all the code points required for all the languages that the NBGP has considered, as given under Bangla Unicode Points (as given in UNICODE 6.3). Before the details are presented, it is helpful to look at the Bangla Code Point Chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs given below in the code charts are based on one of the many fonts designed. They are not prescriptive, because there could be some variation in actual fonts, both UNICODE-compatible and TrueType ones. Consider the following code point table.
Colour convention:
All characters that are included in the [MSR]: yellow background.
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background.
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background.
Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences. These enable the conditional inclusion in the repertoire of BENGALI LETTER A and E followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, again followed by BENGALI VOWEL SIGN AA, to represent the æ sound as in English ‘bat’, ‘cat’ etc.
5.2. Code Point Repertoire Exclusion
There are some characters of the Bangla script that find a place in Unicode but have not been included in the repertoire in this LGR proposal. The reason for excluding ঌ (U+098C) and ৗ (U+09D7) is that they are rare and obsolete characters.
5.3. Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it will never be used alone. It will be used only in sequences for three special characters in normalized form: ড় ঢ় য়.
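Taken together, the exclusions named in this document can be checked per label. The Python sketch below uses an illustrative, non-exhaustive table of excluded code points drawn from sections 3.3.7, 4.2.1.2, 4.2.1.3 and 5.2. It also enforces that the nukta appears only after ড, ঢ or য. The function name and the messages are hypothetical.

```python
# Code points this document excludes, with the stated reason (not exhaustive).
EXCLUDED = {
    0x098C: "VOCALIC L, rare/obsolete",
    0x09D7: "AU LENGTH MARK, rare/obsolete",
    0x09BD: "AVAGRAHA, not in the MSR",
    0x09E0: "VOCALIC RR, rare",
    0x09E1: "VOCALIC LL, rare",
    0x09E2: "VOWEL SIGN VOCALIC L, rare",
    0x09E3: "VOWEL SIGN VOCALIC LL, rare",
    0x09F9: "CURRENCY DENOMINATOR SIXTEEN, symbol",
    0x09FA: "ISSHAR, symbol",
}
NUKTA = "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # ড, ঢ, য


def repertoire_problems(label: str) -> list:
    """List every excluded code point and every stray nukta in the label."""
    problems = []
    for i, ch in enumerate(label):
        cp = ord(ch)
        if cp in EXCLUDED:
            problems.append(f"U+{cp:04X} excluded ({EXCLUDED[cp]})")
        elif ch == NUKTA and (i == 0 or label[i - 1] not in NUKTA_BASES):
            problems.append(f"U+09BC at position {i} not after DDA, DDHA or YA")
    return problems


if __name__ == "__main__":
    print(repertoire_problems("\u09AA\u09A1\u09BC\u09BE"))  # পড়া: []
    print(repertoire_problems("\u0995\u09BC"))              # nukta under ক
```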
5.4.2. The Vowel Sequence
In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To facilitate understanding by users of other Brāhmī scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra onuʃʃār (B), a Candrabindu (D) or a Visarga biʃɔrgo (X). The number of D, B or X which can follow a V in Bangla may not be restricted to one. Going by the rules illustrated in the document, it is clear that formations such as VDD, VBB and VXX are invalid orthographic units. However, it is valid and possible to have sequences combining a candrabindu with an anusvāra on the one hand, and with a visarga on the other, as in হ্যাংচা ‘hæñchā’ and হ্যাঃ ‘hæñ’ respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu exists in Bangla. A vowel can optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phala. Ya-phala is a presentation form of U+09AF BENGALI LETTER YA (য or ‘ya’). It is represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Hasanta), U+09AF য BENGALI LETTER YA> and has a special reduced form. Again, when combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. ‘aa’ (ā)), it is used for transcribing [æ], as in the “a” of the English word “bat”, written in Bangla as ব্যাট. A vowel sequence admits the following combinations:
5.4.2.1. A Single Vowel
Examples: V: অ (अ)
5.4.2.2. A Vowel with Conditions
A vowel can optionally be followed by Anusvāra [B], or Candrabindu [D], or Visarga [X], or Candrabindu + Anusvāra [DB], or Candrabindu + Visarga [DX]. It can also be followed by a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].
5.4.3.2. A Consonant with Conditions
A consonant may optionally be followed by a dependent vowel sign kāra (Mātrā) [M], or Anusvāra [B], or Candrabindu [D], or Visarga [X], or Hasanta (also known as Virama) [H], or Candrabindu + Anusvāra [DB], or Candrabindu + Visarga [DX].
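The vowel and consonant rules can be approximated by mapping each code point to its WLE symbol (V, C, M, B, D, X, H, plus N for the nukta) and matching the resulting string with a regular expression. This Python sketch is a simplified reading, not the LGR's ABNF. It also lets a kāra be followed by B, D, X, DB or DX, which চাঁদ (U+099A U+09BE U+0981 U+09A6) requires, and it allows chains of Hasanta-joined consonants. Khanda Ta, the ya-phalā sequences and the whole-label rules are left to separate checks, and all names are hypothetical.

```python
import re

INDEPENDENT_VOWELS = {
    chr(cp) for cp in range(0x0985, 0x0995)
    if cp not in (0x098D, 0x098E, 0x0991, 0x0992)  # unassigned
}
CONSONANTS = (
    {chr(cp) for cp in range(0x0995, 0x09A9)}    # ক .. ন
    | {chr(cp) for cp in range(0x09AA, 0x09B1)}  # প .. র
    | {"\u09B2"}                                 # ল
    | {chr(cp) for cp in range(0x09B6, 0x09BA)}  # শ .. হ
    | {"\u09F0"}                                 # ৰ (Assamese ra)
)
KARS = {chr(cp) for cp in (*range(0x09BE, 0x09C5), 0x09C7, 0x09C8, 0x09CB, 0x09CC)}
SIGNS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09CD": "H", "\u09BC": "N"}


def categories(label: str) -> str:
    """Map each code point to a WLE symbol; '?' marks anything unmodelled."""
    out = []
    for ch in label:
        if ch in INDEPENDENT_VOWELS:
            out.append("V")
        elif ch in CONSONANTS:
            out.append("C")
        elif ch in KARS:
            out.append("M")
        else:
            out.append(SIGNS.get(ch, "?"))
    return "".join(out)


ENDING = r"(?:B|D|X|DB|DX)?"
VOWEL_SEQ = r"V" + ENDING
CONSONANT_SEQ = r"CN?(?:HCN?)*(?:M?" + ENDING + r"|H)"
LABEL = re.compile(r"(?:" + CONSONANT_SEQ + r"|" + VOWEL_SEQ + r")+")


def is_well_formed(label: str) -> bool:
    return LABEL.fullmatch(categories(label)) is not None


if __name__ == "__main__":
    print(categories("\u099A\u09BE\u0981\u09A6"), is_well_formed("\u099A\u09BE\u0981\u09A6"))
```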
5.4.3.5. Subsets
While considering its subsets, as a representative example we will consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’.
Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
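The KHANDA TA condition in the note can be checked in isolation. In this Python sketch (hypothetical function name), a ৎ that directly follows a Hasanta is accepted only if the consonant before that Hasanta is Bangla র or Assamese ৰ. A ৎ that is not preceded by Hasanta is outside the condition and passes.

```python
KHANDA_TA, HASANTA = "\u09CE", "\u09CD"
RAS = {"\u09B0", "\u09F0"}  # Bangla র, Assamese ৰ


def khanda_ta_context_ok(label: str) -> bool:
    """Where ৎ follows a Hasanta, the consonant before that Hasanta must be RA."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i > 0 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] not in RAS:
                return False
    return True


if __name__ == "__main__":
    print(khanda_ta_context_ok("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা: True
    print(khanda_ta_context_ok("\u09AD\u0995\u09CD\u09CE"))  # ক + ্ + ৎ: False
```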
5.4.5. Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). For rendering ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of ya-phalā and ā-kāra is used to elicit the æ sound as in English ‘bat’, ‘fat’ etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant. Only consonants can have the Hasanta marked. As we see here, however, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]). Another case refers to the formation of repha and ra-phalā in the said script, mentioned in the table above as P.¹⁰ Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta +
10 Refer to Rule P in Section 7, Table 16.
6. Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusing variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from the character formations normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. The confusable but not identical cases are not proposed, as there is another panel (the String Similarity Assessment Panel) entrusted to deal with such cases. However, cases which belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Bangla akshar formation. They can cause confusion even to a careful observer and hence are being proposed as variants.
Variants arise in a script when two or more forms are created with different stored code points. In Bangla the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or type e-kāra and ā-kāra as two separate keys, getting the same result. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other with two different keys. However, this will not be considered a case of variant, because a kāra followed by a kāra is not allowed.
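A further way to see why the two ways of typing কো need no variant rule: U+09CB BENGALI VOWEL SIGN O canonically decomposes to U+09C7 + U+09BE. Since IDNA requires NFC, both inputs end up as the same stored code point sequence. The short Python check below uses only the standard `unicodedata` module.

```python
import unicodedata

KA = "\u0995"
E_KAR, AA_KAR, O_KAR = "\u09C7", "\u09BE", "\u09CB"

two_keys = KA + E_KAR + AA_KAR  # ক + ে + া, typed as two kāras
one_key = KA + O_KAR            # ক + ো, typed as one kāra

# Different code point sequences before normalization...
print(two_keys == one_key)  # False
# ...but NFC composes ে + া into ো, so both become the same stored label.
print(unicodedata.normalize("NFC", two_keys) == one_key)  # True
print(unicodedata.normalize("NFD", one_key) == two_keys)  # True
```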
6.1. In-Script Variants
We propose two cases of true in-script variants in the Bangla script.
CASE I
As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) with Hasanta appears in a conjunct with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus
স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus
ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
The above combinations, if written in traditional orthography, could be a little confusing, as the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in initial, medial or final position (as shown below in example 1). A user who takes the থ for a হ (ha, U+09B9) could also type it wrongly, increasing the risks in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
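The two cases can be handled like any other variant pair: generate every label obtained by swapping থ and হ in the listed contexts, so that the variants can be blocked together. Below is a Python sketch with hypothetical names. It assumes the swap applies only when থ or হ immediately follows স + Hasanta or ন + Hasanta.

```python
from itertools import product

HASANTA, THA, HA = "\u09CD", "\u09A5", "\u09B9"
FIRST_MEMBERS = {"\u09B8", "\u09A8"}  # স, ন


def tha_ha_variants(label: str) -> set:
    """Swap থ/হ wherever they follow স or ন + Hasanta (cases 1 and 2 above)."""
    slots = [
        i for i in range(2, len(label))
        if label[i] in (THA, HA)
        and label[i - 1] == HASANTA
        and label[i - 2] in FIRST_MEMBERS
    ]
    variants = set()
    for choice in product((THA, HA), repeat=len(slots)):  # one empty choice if no slots
        chars = list(label)
        for i, ch in zip(slots, choice):
            chars[i] = ch
        variants.add("".join(chars))
    return variants


if __name__ == "__main__":
    sthana = "\u09B8\u09CD\u09A5\u09BE\u09A8"  # স্থান
    for v in sorted(tha_ha_variants(sthana)):
        print(v, " ".join(f"U+{ord(c):04X}" for c in v))
```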
6.2. Cross-Script Variants
A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusions they may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Certain other characters are somewhat similar but have not been included here; they are provided in the Appendix (10.2) for reference.
7. Whole Label Evaluation Rules (WLE)
This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications.
11 Unicode uses Oriya for the script, although Odia is now the official term.
12 As used by Unicode, denoting and including both Assamese and Maṇipuri.
Below are the symbols used in the WLE rules for each Indic Syllabic Category, as mentioned in the table provided in the code point repertoire (Section 5.1):
C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even though they would not be allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ, the Assamese letter RA, Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL)
Now let us elaborate on each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, and they may not be attested. There is no harm in adding such ligatures, however, because any user can easily create them.
7.1.1. Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋk ʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09AF U+09BC U+09BE) (meaning Bank of India). Here two different words are joined together: the first ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. By and large, however, the form of the first word without an H (U+09CD) is considered enough to represent the sound intended for it. This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-Joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. For the majority of the linguistic communities, permitting this may create a perceptible similarity between two labels (with and without H); hence it is explicitly prohibited by the NBGP. In future, if the prevailing needs of the community require it, a future NBGP may consider revisiting this rule.
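The prohibition itself reduces to a simple scan. In this Python sketch (hypothetical function name), a label is rejected if any independent vowel directly follows U+09CD. This catches the form of ব্যাঙ্কঅফ্ইন্ডিয়া with an explicit H and accepts the same label without it.

```python
HASANTA = "\u09CD"
# Bengali independent vowels: U+0985-U+0994 minus the unassigned code points
INDEPENDENT_VOWELS = {
    chr(cp) for cp in range(0x0985, 0x0995)
    if cp not in (0x098D, 0x098E, 0x0991, 0x0992)
}


def vowel_after_hasanta(label: str) -> bool:
    """True if any independent vowel (V) directly follows a Hasanta (H)."""
    return any(
        label[i - 1] == HASANTA and label[i] in INDEPENDENT_VOWELS
        for i in range(1, len(label))
    )


if __name__ == "__main__":
    # ব্যাঙ্কঅফ্ইন্ডিয়া with an explicit H after ফ: prohibited by the rule
    with_h = ("\u09AC\u09CD\u09AF\u09BE\u0999\u09CD\u0995\u0985"
              "\u09AB\u09CD\u0987\u09A8\u09CD\u09A1\u09BF\u09AF\u09BC\u09BE")
    # The same label without that H
    without_h = with_h.replace("\u09AB\u09CD\u0987", "\u09AB\u0987")
    print(vowel_after_hasanta(with_h), vowel_after_hasanta(without_h))  # True False
```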
7.2 Additional Examples from Bangla ABNF
Below are a few examples that help explain some of the rules the ABNF puts in place. They are given for reference only and are not meant to be comprehensive.
ঃক ःक As can be seen, such a combination automatically results in a “golu” or dotted circle, marking it as an invalid formation. This is an intrinsic property of the
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka; Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow; Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum; Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS); Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka; Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka; Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka.
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What has happened So Far in Terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0: Core Specification, Chapter 12, p. 473.
10 Appendix-I
10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, some of the structures of the formalism do not necessarily apply to a given language. Restriction rules are set in place to take care of such cases and fine-tune the ABNF. For Bangla¹³ in particular, the following rules apply:
1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, e.g. the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ by Kazi Nazrul Islam. There are also instances of it in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, e.g. ৎসেরিং (Tsering).
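Rule 1 can be checked by looking only at the first code point of a label. This minimal sketch encodes just the three characters the rule names (ৎ U+09CE, ঞ U+099E, ঙ U+0999). It does not cover the ya (য়) case, which the text discusses without prohibiting outright. The function name is hypothetical.

```python
# Characters that rule 1 forbids at the start of a label: ৎ, ঞ, ঙ.
FORBIDDEN_INITIAL = {"\u09CE", "\u099E", "\u0999"}


def violates_initial_rule(label: str) -> bool:
    """True if the label begins with a character that rule 1 does not allow initially."""
    return bool(label) and label[0] in FORBIDDEN_INITIAL


tsering = "\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"  # ৎসেরিং
print(violates_initial_rule(tsering))  # True under rule 1 as stated
```

Under the rule as written, transliterations such as ৎসেরিং are therefore rejected, even though the text notes that such spellings are appearing.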
10.2 ‘Sylheti Nagarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924: Kthi) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, which was widely in use during the 1880s. Sylheti Nagarī or Siloti had several other names (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Others ascribe the credit to Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).
13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri have not been covered in this section.
Table 17 – The Script Table of Sylheti Nagarī or Siloti
10.3 Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.
ক k: ক্ক, ক্ট, ক্ত, ক্ত্র, ক্ত্ব, ক্ন, ক্ব, ক্ম, ক্য, ক্র, ক্ষ, ক্ষ্ণ, ক্ষ্ম, ক্ষ্ব, ক্ষ্য, ক্স, ঙ্ক, (… + ক্র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g: গ্গ, গ্দ, গ্ধ, গ্ন, গ্ব, গ্ম, গ্য, গ্র, গ্ল, ঙ্গ, (… + ঙ্গ)
ণ ṇ: ণ্ট, ণ্ঠ, ণ্ড, ণ্ড্র, ণ্ঢ, ণ্ণ, ণ্য, ণ্ব, ক্ষ্ণ, ষ্ণ, (… + ণ)
Bangla prose developed two literary styles during the 19th-20th century: Sādhubhāṣā (সাধুভাষা, ‘elegant language or style’) and Calitabhāṣā (চলিতভাষা, ‘current language’ or ‘modern style’). It is the latter style that prevails in written prose today. The Language Movement in Bangladesh (then East Pakistan) began in 1948, when civil society protested against the elimination of the Bangla script from currency and stamps, where it had been in use since the British Raj. The movement reached its pinnacle on 21 February 1952, when the police fired on demonstrating students and civilians, causing numerous injuries and deaths². Later, following the Language Movement, on 27 April 1952 the All Party National Language Committee decided to demand the establishment of an organization for the promotion of the Bengali language. Right from its inception in 1955, Bangla Academy, Dhaka, has been engaged in promoting and fostering Bangla as the lingua franca of the country, both before and after independence from Pakistan in 1971. Through the various commissions and committees constituted by the Government of Bangladesh after independence (Banladesa Jatiya Siksa Kamisana in 1972, Jatiya Siksa Upadesta Parisad in 1979, Banla Bhasa Bastabayana Sela in 1982, Banla Bhasa Kamiti in 1983, etc.³), Bangla was made the primary medium of instruction and communication in all governmental and educational activities. Through great struggle and bloodshed, the Bengalis established Bangla as an official language of the state⁴.
3.1 Written Bangla
The ‘Bangla alphabet’ (বাংলা লিপি, Bānglā lipi; ISO 15924: Beng) is derived from the Brāhmī writing system, which is related to the Nagarī (also known as Devanāgarī⁵) script [108] as well as to the Tirhutā writing system [106]. Considered to be the fifth most widely used writing system in the world, this combined Bangla-Asamiyā-Manipuri script (showing some variations for Asamiyā and for Meitei or Bisnupriya Manipuri) (130) was also used in eastern Indian Sanskrit manuscripts. It was, and still is, one of the scripts used for Chakma in India and Bangladesh and for Kokborok in Tripura. A close variant called Tirhutā (123; now also available in Unicode 10.0 as 11480–114DF, see 110) or
2 The UN declared Ekuśe February (21st February) as the International Mother Language Day at the UNESCO General Conference in Paris on 17 November 1999, “in recognition of the sanctity and preservation of all vernacular languages in the world” (22).
3 Bāṅlā Bhāṣā Kamiṭi. 1983. Bāṅlā Bhāṣā Kamiṭi Riporṭ (Report of the Bangla Bhasha Committee). Dhaka: Śikṣā, Dharma, Krīṛā O Saṅskṛti Mantraṇālaya, People’s Republic of Bangladesh.
4 Chakraborty, Rajib. 2018. The Fishermen’s Community: A Language-Culture Interplay (A Study of Post-1971 Select Bangla Novels). Unpublished PhD Dissertation, Visva-Bharati.
5 William Dwight Whitney, in his Sanskrit Grammar, unequivocally said: “This name (Devanāgarī) is of doubtful origin and value” (Whitney, William Dwight. 1994 reprint. Sanskrit Grammar. New Delhi: Motilal Banarsidass Publishers, p. 1).
Mithilākṣara was used for Maithili from the 14th century until the early 20th century [106]. In this context one finds a mention of ‘Sylheti Nagarī lipi’ or ‘Siloti’ (added to the Unicode Standard in March 2005 with the release of version 4.1), the details of which would interest only historians and historical linguists (see 137 and 144). For all practical purposes, Sylheti Bangla is now generally written in the modern Bangla script. Originally, during the reign of the Pāla dynasty (750-1154 AD) in eastern India, and even earlier, perhaps during the Malla period (694 AD onwards), the Bangla writing system took a shape comparable to the modern one [111, 119]. A pictorial description of the development from Brāhmī to the modern Bangla script could be presented here in tabular form.
The inscriptional evidence in Brāhmī is found in Archaic Brāhmī from the 3rd century BC to the 1st century BC, in Middle Brāhmī soon after (1st-3rd century AD), and then in Late Brāhmī (4th-6th century AD). This evidence can be seen in both Bangladesh and West Bengal [108] in: 1) the Mahasthanagara inscriptions (Bogra district, Bangladesh; the ancient name being Pundranagara or Paundravardhanapura); 2) Brāhmī (and Kharoṣṭhī) inscriptions from lower ‘Gangetic Bengal’; and 3) copper-plate inscriptions of the Imperial Guptas from the northern part of West Bengal and north-west Bangladesh, in the areas under Dharmaditya, Gopachandra and Samācāradeva (about whom one knows only from five copper plates found in Kotalipara in the Faridpur district of Bangladesh, one in Mallasarul in the Burdwan district of West Bengal, and one in Jayramapura in the Ballesvara district, now in Odisha).
These epigraphs from the eastern part of undivided India (dating back to the 4th-6th centuries AD) show some characteristic features of letters (especially in ম ‘ma’, ল ‘la’, শ ‘śa’, স ‘sa’ and হ ‘ha’) that led to the development of an eastern variety of the Gupta script. Epigraphic records from Bangladesh demonstrate remarkable developments in Eastern Brāhmī. Examples include the Tippera copper-plate inscription of the ‘Samatata’ rulers (139, p. 265), such as Lokanātha (dated to the latter half of the 7th century AD), the Kailan inscription of Śrīdharaṇa Rāta, and the Ashrafpur copper plates. The letters seem to hang down from wedge-shaped solid triangles, with right-hand verticals bending down at the bottom. For this reason Prinsep and Fleet described the script as Kuṭila-lipi (literally ‘cursive writing style’), whereas Al Biruni (973-1048) used the term Siddhamātrikā (because a mātrā or bar is placed over each letter) to designate the script of northern India. The next stage of development is illustrated by the 9th-century copper-plate inscriptions from Khalimpur of the reign of Dharmapāla, from Monghyr and Nalanda of the time of Devapāla in Bihar, and from Jagjīvanpura (Malda) of the reign of Mahendrapāla. The Siddhamātrikā (mentioned as ‘Siddham’ in Chinese sources) is said to have been prevalent in this region as well, up to the end of the tenth century. Also called Gauṛī (i.e. Gauḍī) in Pūrvadeśa, or the eastern country, it was regarded as the same script to which the appellation Proto-Bangla is given, showing its characteristics in rudimentary form between AD 875 and AD 1025. In some epigraphs it is considered to belong to the second quarter of the eleventh century AD. Flattening of the head-marks becomes prominent in comparison with the wedge-shaped serifs. An important landmark in the development of the Bangla script is the Ramaganja copper-plate inscription of Mahāmāṇḍalika in the last quarter of the eleventh century AD. It is the earliest document from this entire region bearing the letter m with a tick rising upwards. The full vowel i develops a tick at the right end of the upper horizontal bar and a curved hook below. Initial e approaches the modern Bangla character. A mature form of Proto-Bangla, the
immediate precursor of the Bangla script, is illustrated in the inscriptions of the Varmana, Sena and Deva rulers of the twelfth and thirteenth centuries [104]. The evolution of the Bangla script (cf. 136) is tied to the advancement of printing technology. The first “movable type” scripts were technically created and used to print Nathaniel Brassey Halhed’s (1751-1830) 1778 book titled A Grammar of the Bengal Language. In 1785 Governor-General Warren Hastings (1732-1818) requested another civilian, Charles Wilkins (1749-1836), to cut punches for Bangla printing characters. The current printed form of the Bangla script appeared soon after, and it is generally agreed that Wilkins developed the Bangla print script [111]. He passed on this knowledge to Pancanana Karmakara (d. 1804), a renowned artist in Bengal; later, Karmakara and his family became famous in Bangla printing technology. Shepherd was another assistant of Wilkins in designing the script, which became more angular, with sharper turns and edges [133]. A few archaic letters were modernized during the 19th
century. The script was standardized by Pandit Ishwar Chandra Vidyasagar when Bangla type fonts were to be used for large-scale publishing under the Calcutta School Book Society [116 for several references]. Much later, in 1935, the Linotype technique, invented by Ottmar Mergenthaler (1854-1899) in 1886, was introduced into Bangla printing through the efforts of Suresh Chandra Majumdar (1888-1954), Rajsekhar Basu (1880-1960), Jatindra Kumar Sen (1882-1966) and his disciple Sushil Kumar Bhattacharya. It was first used by the Ānandabazar Patrikā group and later by others. Within a few years the more advanced Monotype technology came into use in Bangla printing. In Bangla printing culture, however, Monotype found very limited acceptance, and Linotype held the stage until digital technology eventually replaced all earlier techniques. All of these developments could be presented in a table.
3.2 Languages Considered
Below is a tabular representation of the languages using the Bangla script that are placed on EGIDS Scale 1-6 (see 117 for details). Some languages under EGIDS 5 and 6 have also developed their own scripts for printing and publishing. Some had used the Bangla script earlier (such as Bodo), or used it in West Bengal at some point in time (Santali), but have since shifted to another writing system: Bodo is now written in Nāgarī or Devanāgarī, and Santali uses both Nāgarī/Devanāgarī and Ol Chiki (145). For the purposes of the Bangla LGR, only languages at EGIDS scale 1 to 4 have been considered. Consider the following table:
EGIDS Scale 1
EGIDS Scale 2
EGIDS Scale 3
EGIDS Scale 4
EGIDS Scale 5
EGIDS Scale 6
Bangla (Bengali)
Santali, Bodo, Riang, Khumi, Mru(ng), Asho
Lepcha, Pnar, Koda, Kora, Chak
Asamiyā (Assamese)
Koch or Rajabansī
Malto or Malpahariya
Manipuri or Meitei
Bisnupriya Manipuri, Kok-Borok (Tripura & Bangladesh)
Chakma, Hajong, Mundari & Kurux (of Bangladesh)
Toto, Rohingya, Tippera, Megam, Tanchangya
Usoi, Limbu, Sadri or Oraon
Bhumij or Mundari, Bawm, Chin
Table 4 – Main languages in India and Bangladesh that use the Bangla script, on the EGIDS Scale
3.3 Notable Features of Bangla Script [150]
The Bangla writing system has certain features that show how it has to be written, or how typesetting in Bangla could be done. This section is followed by a section that explains the code points (and fixed code point sequences) which show certain distinctive characteristics of Bangla and which make up the repertoire. The subsequent sections also cover the ‘akshar’-formation rules (ABNF), showing the character classes, the Whole Label Evaluation (WLE) and context rules, and the in-script and cross-script variants. Here we present some basic features of the script and its pronunciation.
The Bangla script is an alpha-syllabic writing system in which all consonants are assumed to carry an accompanying ‘inherent’ vowel (theoretically before or after each consonant). This vowel varies between ɔ and o depending on the position of the consonant in the word. At times these ‘assumed’ or ‘inherent’ vowels are not pronounced at all [142].
Vowels can be written as independent letters or with a variety of diacritical marks, which are placed above, below, before or after the consonant they follow in pronunciation, or in both of the last two positions [105].
When consonants occur together in clusters, special conjunct letters are formed. Many of these consonant clusters, or conjoined consonants, are in use in printed Bangla. The letters for the consonants other than the final one in the group are generally reduced. There are, however, a few special conjunct characters that are compounds of the consonant characters, e.g. ক (k) + ষ (ṣ) = ক্ষ (kṣ),
ঞ (ñ) + জ (j) = ঞ্জ (ñj), জ (j) + ঞ (ñ) = জ্ঞ (jñ), হ (h) + ম (m) = হ্ম (hm). There are other issues as well: র as the second member of a cluster is reduced to a secondary symbol, e.g.
প (p) + র (r) = প্র (pr), ষ (ṣ) + ট (ṭ) + র (r) = ষ্ট্র (ṣṭr) (as in উষ্ট্র uṣṭra “camel”). When used as a primary symbol, য (y) represents jɔ in Bangla. Its secondary symbol (allograph), jɔ-phalā, however, has two phonetic values. Added to the initial consonant of a word, it is the vowel æ (as in শ্যামল (śyāmala) “green”, র‍্যাপার
(ryāpāra) “wrapper”, etc.). After a non-initial consonant, however, it simply doubles that consonant in
pronunciation (as in কার্য, ধার্য, etc.). The র (r) + য (y) combination has two
shape of the second member is changed, e.g. দ্ধ (ddh), গ্ধ (gdh) and ন্ধ (ndh)
respectively. The solitary example of র (r) + ঋ (ṛ) = র্ঋ (as in নৈর্ঋত nairṛta “southwest”), used mostly in Classical borrowings, shows the secondary symbol of a consonant followed by the primary symbol of a vowel. The inherent vowel applies only to the final consonant of the cluster.
The Bangla script has at least fifty-two primary symbols and quite a few allographs (positional variants of them), corresponding to forty-four phonemes or functional speech sounds (7 oral and 7 nasal vowels and 30 consonants) (150), with some obvious redundancies. In one of the first phonemic analyses, however, the number was thought to be thirty-five phonemes [140].
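Whatever shape a font draws for a conjunct, the stored text is simply the consonants joined by Hasanta (U+09CD). The sketch below builds some of the conjuncts mentioned above that way and lists their code points. The helper names are hypothetical and only illustrate the encoding.

```python
HASANTA = "\u09CD"


def conjunct(*consonants: str) -> str:
    """Join consonants with Hasanta, e.g. ক + ষ -> ক্ষ (stored as ক ্ ষ)."""
    return HASANTA.join(consonants)


def codepoints(text: str) -> list[str]:
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(codepoints(conjunct("\u0995", "\u09B7")))            # ক্ষ: ['U+0995', 'U+09CD', 'U+09B7']
print(codepoints(conjunct("\u09B7", "\u099F", "\u09B0")))  # ষ্ট্র
```

The special shapes described in the text (reduced first members, ra-phalā, and so on) are therefore produced by rendering, not by distinct code points.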
As mentioned above, several graphemic symbols in Bangla have secondary shapes, technically called ‘allographs’, with a complementary distribution in each case. These graphs or markings are generally added at the following positions of the primary symbol [113]:
As for the complementary distribution of vowel letters (word- or syllable-initial) and vowel mātrās, which is relevant for the ABNF, consider the following. Besides some simple vowel modifiers called ‘kārs’ in Bangla (also referred to as mātrā in the other Neo-Brāhmī LGR documents), there are some combinatory modifiers of Bangla vowels with certain consonants. For example, whereas
আ U+0986 BENGALI LETTER AA is substituted by া U+09BE BENGALI VOWEL SIGN AA,
ই U+0987 BENGALI LETTER I is substituted by the pre-posed ি U+09BF BENGALI VOWEL SIGN I,
ঈ U+0988 BENGALI LETTER II is substituted by ী U+09C0 BENGALI VOWEL SIGN II, and
উ U+0989 BENGALI LETTER U is substituted by ু U+09C1 BENGALI VOWEL SIGN U, marked below the primary grapheme,
there are some special vowel modifiers of উ, as in the following combined letters: গু (gu), rather than writing it as গ (g) + ু (u);
ভ (bh) + র (r) (ভ্রূ bhru “eyebrow”); শ (ś) + র (r) (শ্রূ śru); ঋ (ṛ) after হ (h) (হৃ hṛ), etc.
There have been many notable contributions to simplifying and modifying Bangla spellings and combinatory techniques, especially by scholars such as Pabitra Sarkar (1992) [134]. These have attempted to reduce the number of allographs of both vowels and consonants in clusters, and the results have been widely accepted in the printing of school texts in both Bangladesh and West Bengal [151, 152]. At present, two systems, the old (traditional) and the new, coexist, each operative in different domains.
In preparing this LGR document, however, the aim has been to consider the widely used and usable sequences and combinations, and their variations across the sister scripts in the family of Brāhmī writing systems. Bangla Academy, Dhaka published the Standard Bangla Spelling Rules in 1992, following the recommendations of a committee constituted through a workshop jointly organized by the Jatiya Siksakrama and Pathyapustaka Board in 1988. A thoroughly revised edition of the Rules was published in September 2012⁶. After the establishment of the Bangla Akademi of West Bengal in 1986, its first President, Annadasankar Ray (1904-2002), gave a direction in his inaugural address for the standardization of the Bangla alphabet, script and spelling system, and clearly argued that they would not blindly follow the Sanskritic model of conventional grammar. A broad list of proposals was sent to experts on Bangla, and by 1988 a broad agreement was reached on the ‘homogenization of Bangla spelling’. Based on opinions received from different quarters, a unanimous list of ‘rules’ was agreed upon. It was published in a ‘spelling dictionary’ titled Ākādemi Bānāna Abhidhāna (1997), which was far more comprehensive than ‘The University of Calcutta proposals’ made in 1936. Along with the ‘rationalization’ of spellings, another step was taken to make the writing system easier to read by making the symbols used, both single and combined, more ‘transparent’. These reforms were originally suggested by Sarkar (1987, first published in 1978) [134] [153], where he used the terms Swaccha (‘transparent’) and Aswaccha (‘opaque’ or non-transparent), even adding Ardha Swaccha (‘half-transparent’) in between the two. Some examples of transparent clusters, in which both members of the cluster can be recognized, are ন্ন (nn), প্ত (pt) and স্ত (st).
6 Bangla Academy. 2012. Bāṅlā Ekaḍemī Pramita Bāṅlā Bānānera Niyama (Bangla Academy Standard Bangla Spelling Rules). Dhaka: Bangla Academy.
There were in fact two types of proposals. The first concerned the shape of the letters: consonant + vowel (CV) combinations and conjuncts, i.e. consonant + consonant combinations. There were further, more complex shapes, i.e. consonant + consonant (+ consonant) + vowel (CC(C)V) signs, as in প্রু (pru) or স্ক্রু (skru). Decisions in this area were necessary because a few of the CC(C) symbols involved complexities that made them difficult for children to learn. The second type dealt only with the spellings of words, without reference to the shapes of the letters in which they were written. The basic objective here was ‘one word, one spelling’ to the greatest extent possible [151].
Below is a statement of the most salient changes affecting consonant + vowel combinations [153]:
a. The variants of the short u vowel sign (ু, উ-কার hrasva u-kāra) have been reduced to one, i.e. ু. So গু (gu) is now written as গ + ু. Similarly রু (ru) > র + ু, শু (śu) > শ + ু, হু (hu) > হ + ু, and therefore for a cluster + short u sign: ন্তু (ntu) > (ন + ্ + ত + উ),
স্তু (stu) > (স + ্ + ত + উ).
b. The variants of the long u sign (দীর্ঘ ঊ-কার dīrgha u-kāra) have also been reduced:
রূ (rū) > র + ূ, ভ্রূ (bhrū) > (ভ bh + ্ + র r + ঊ ū), দ্রূ (drū) > (দ d + ্ + র r + ঊ ū), শ্রূ (śrū) > (শ ś + ্ + র r + ঊ ū).
c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been reduced to one: হৃ (hṛ) > হ + ৃ.
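In Unicode, the traditional and the reformed (“transparent”) shapes listed in a-c are normally the same code point sequence; the choice between them is made by the font. The text above does not state this; it is a general property of Unicode text. A label containing গু or হৃ therefore stores the same characters whichever shape is displayed. This small helper, which is illustrative only, prints what is actually stored:

```python
import unicodedata


def describe(text: str) -> list[str]:
    """List each code point of text with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for word in ("\u0997\u09C1", "\u09AD\u09CD\u09B0\u09C2", "\u09B9\u09C3"):  # গু, ভ্রূ, হৃ
    print(word, describe(word))
```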
For consonant + consonant + (consonant)… + (vowel) clusters, the Paschimbanga Bangla Akademi proposed transparent or semi-transparent shapes to the extent admissible in the Bangla writing system. Some examples will clarify the proposal (a slash separates the traditional cluster shape, which precedes it, from the Bangla Akademi innovation, which follows it) [153].
3.3.1 The Consonants
Following the traditional classification, Bangla consonants are categorized by their phonetic properties, especially place and manner of articulation [107]. There are five ‘vargas’ (pronounced ‘barga’ in Bangla), or groups (sets or classes), distinguished by place of articulation, and one non-‘varga’ group [105]. Each varga corresponds to the stops at a given place of articulation and contains a series of five consonants ordered by phonetic quality (i.e. manner of articulation): from unvoiced unaspirated to voiced aspirated (in the fourth column), ending with the homorganic (corresponding) nasal [107]. Consider the following table:
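The five vargas occupy five consecutive runs of code points in the Bengali block, so the grid described above can be generated rather than typed. This is a sketch for illustration; the English column labels follow the description in the text, and the function name is hypothetical.

```python
MANNERS = ("unvoiced unaspirated", "unvoiced aspirated",
           "voiced unaspirated", "voiced aspirated", "nasal")
# First code point of each five-letter varga in the Bengali block.
VARGA_STARTS = {"velar": 0x0995, "palatal": 0x099A, "retroflex": 0x099F,
                "dental": 0x09A4, "labial": 0x09AA}
VARGAS = {name: [chr(start + i) for i in range(5)] for name, start in VARGA_STARTS.items()}


def varga_position(ch: str):
    """Return (varga, manner) for a varga consonant, or None for non-varga letters."""
    for name, letters in VARGAS.items():
        if ch in letters:
            return name, MANNERS[letters.index(ch)]
    return None


for name, letters in VARGAS.items():
    print(f"{name:10}", " ".join(letters))
```

For example, `varga_position("\u0999")` (ঙ) gives `("velar", "nasal")`, and non-varga letters such as য return `None`.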
3.3.2 The Implicit Vowel Killer: Hasanta (called ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts)
As stated earlier, all consonants are pronounced in isolation with an implicit vowel assumed to be associated with them (central-back ɔ in Bangla, the neutral vowel) [121]. In this report, the term ‘Hasanta’ (= ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts) and the term ‘Virāma’⁷ (= ‘Dāṛi’ in Bangla), as preferred in Unicode (cf. Unicode 3.0 and above), denote the character that marks the absence of this inherent vowel. Note that Unicode has adopted the term virama in a sense different from its traditional grammatical definition, so it requires some explanation. Given the importance of this document, the note should be part of the LGR, so that anyone referring to it can find the proper grammatical explanation of the term. Because a special sign is needed whenever the implicit vowel is stripped off, the symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster one can, in common parlance, “kill” its vowel and create conjuncts. In this manner conjunct characters can generally be written by joining two to four consonants; in rare cases the process can join up to five. The notion of a maximum number of consonants joining to form one akṣara⁸, however, can only be bounded empirically. This observation is based on the CIIL-EMILLE corpora of Bangla words [132 & 133], as seen in print these days. Given the mixture of scripts and languages on the web, one cannot rule out that someone may want a generic Top-Level Domain (gTLD) with more than the observed maximum. This can happen when a foreign-language word that admits a large number of consonants is transliterated into Bangla. Hence this limit will not be enforced in the Bangla LGR.
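Since the LGR deliberately does not cap cluster length, a tool would only measure it, for instance to compare a proposed label against the corpus observation of two to five consonants. A minimal sketch, assuming a simplified consonant test (KA..HA plus the Assamese RA and WA); the function names are hypothetical.

```python
HASANTA = "\u09CD"


def is_consonant(ch: str) -> bool:
    # Simplified: KA..HA plus Assamese RA/WA; unassigned points in the range are not filtered.
    cp = ord(ch)
    return 0x0995 <= cp <= 0x09B9 or cp in (0x09F0, 0x09F1)


def longest_cluster(label: str) -> int:
    """Length of the longest run of consonants joined by Hasanta."""
    best = run = 0
    joined = False  # True when the previous character was a Hasanta after a consonant
    for ch in label:
        if is_consonant(ch):
            run = run + 1 if joined else 1
            best = max(best, run)
            joined = False
        elif ch == HASANTA and run:
            joined = True
        else:
            run, joined = 0, False
    return best


print(longest_cluster("\u0989\u09B7\u09CD\u099F\u09CD\u09B0"))  # উষ্ট্র -> 3
```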
3.3.3 Vowels
Bangla has separate symbols for all ‘Swara’, or vowels, which are pronounced independently, either at the beginning of a word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign, called ‘kār’ in Bangla or Mātrā in Nagarī⁹, is attached to the consonant. Since the consonant already carries the neutral vowel, there are equivalent kāras (mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:
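The correlation can be expressed as a lookup from each independent vowel to its kār, with অ having none because it is the inherent vowel. A sketch for illustration; the mapping covers the common vowels only and is not the LGR's normative repertoire. Note that the pre-posed ি is stored after the consonant even though it is drawn before it.

```python
# Independent vowel -> dependent vowel sign (kār). অ has no kār (inherent vowel).
KAR = {
    "\u0986": "\u09BE",  # আ -> া
    "\u0987": "\u09BF",  # ই -> ি (drawn before the consonant, stored after it)
    "\u0988": "\u09C0",  # ঈ -> ী
    "\u0989": "\u09C1",  # উ -> ু
    "\u098A": "\u09C2",  # ঊ -> ূ
    "\u098B": "\u09C3",  # ঋ -> ৃ
    "\u098F": "\u09C7",  # এ -> ে
    "\u0990": "\u09C8",  # ঐ -> ৈ
    "\u0993": "\u09CB",  # ও -> ো
    "\u0994": "\u09CC",  # ঔ -> ৌ
}
INHERENT = "\u0985"  # অ


def syllable(consonant: str, vowel: str) -> str:
    """Attach a vowel to a consonant in logical (storage) order."""
    if vowel == INHERENT:
        return consonant  # nothing to write: the consonant already carries অ
    return consonant + KAR[vowel]


print([f"U+{ord(c):04X}" for c in syllable("\u0995", "\u0987")])  # কি: ['U+0995', 'U+09BF']
```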
7 Virāma as used here is also a misnomer according to the Indian grammatical traditions. Nowhere is the mere absence of a vowel marked as virāma; Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
8 This term needs to be disambiguated: akṣara also means ‘syllable’ in the Indian grammatical traditions.
9 The term ‘mātrā’ in Bangla, however, stands for an altogether different concept, viz. the top bar placed over a letter, typically found in Hindi and Bangla but absent in Gujarati.
3.3.4 The Anusvāra onuʃʃār (ং - U+0982)
The Anusvāra, or onuʃʃār in Bangla, sometimes represents a homorganic nasal, but not always. It replaces a ‘Nasal Consonant + Hasanta + Consonant’ conjunct group in which the second consonant belongs to the velar varga or set, as in লংকা. It often also appears in such combinations when the last member is a non-velar, as in ল্যাংটা “naked” or ল্যাংচা “a kind of sweet; to limp”. Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined spelling using the corresponding nasal consonant of the particular set. Although modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal, Bangla clearly demarcates where the Anusvāra must be used and where a conjunct cluster with a nasal as the first or second component is required.
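The velar case described above means that a pair such as লংকা / লঙ্কা is written with two different code point sequences. The sketch below derives the conjunct spelling from the anusvāra spelling, only for a following velar (ক খ গ ঘ). It is an illustration of the encoding difference. This section does not state that the LGR treats such pairs as variants, and the function name is hypothetical.

```python
ANUSVARA = "\u0982"
VELAR_NASAL_HASANTA = "\u0999\u09CD"               # ঙ্
VELARS = {chr(cp) for cp in range(0x0995, 0x0999)}  # ক খ গ ঘ


def expand_anusvara(word: str) -> str:
    """Rewrite ং before a velar stop as ঙ্ (e.g. লংকা -> লঙ্কা); other cases are kept as is."""
    out = []
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        out.append(VELAR_NASAL_HASANTA if ch == ANUSVARA and nxt in VELARS else ch)
    return "".join(out)


print(expand_anusvara("\u09B2\u0982\u0995\u09BE"))  # লঙ্কা
```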
3.3.5 Nasalization: Candrabindu (ঁ - U+0981)
Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cãd ‘moon’ (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].
3.3.6 Nukta (় - U+09BC)
The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi); both the term and the concept are borrowings in Bangla. The IDNA protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, the Bangla letters য় YYA, ড় RRA and ঢ় RRHA have to be represented in this LGR by the sequences YA + Nukta (U+09AF + U+09BC), DDA + Nukta (U+09A1 + U+09BC) and DDHA + Nukta (U+09A2 + U+09BC) instead of the single code points YYA (U+09DF), RRA (U+09DC) and RRHA (U+09DD), even though the use of ‘Nukta’ is otherwise completely unnatural in Bangla. Note that the current Unicode Standard chart lists these characters as additional consonants. Under the LGR Procedure, however, these decisions depend on the IDNA protocol, through a set of procedures developed by the IETF. Although the Unicode Standard also provides these three characters as atomic characters (09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y], each as a single keystroke), the IDNA protocol requires that we treat them as conjunct characters and then allocate codes for these in the Unicode Bengali block.
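Python's standard `unicodedata.normalize` shows the composition exclusions directly. NFC turns each atomic code point into the base letter + Nukta sequence that this LGR must use, and leaves those sequences unchanged.

```python
import unicodedata

ATOMIC = {"RRA": "\u09DC", "RRHA": "\u09DD", "YYA": "\u09DF"}

for name, ch in ATOMIC.items():
    nfc = unicodedata.normalize("NFC", ch)
    print(name, [f"U+{ord(c):04X}" for c in nfc])
# RRA ['U+09A1', 'U+09BC']
# RRHA ['U+09A2', 'U+09BC']
# YYA ['U+09AF', 'U+09BC']
```

A registry could apply this normalization to an incoming label before checking it against the repertoire, so that users typing the atomic characters still arrive at the sequences the LGR defines. That pipeline step is an assumption for illustration, not something this section prescribes.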
There may be sporadic attempts to write Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph), solely for the sake of correct pronunciation and to preserve the integrity of the loanword. Such usage effectively makes the Bangla writing system work like a phonetic script such as the IPA. It is not, however, in use in printed Bangla.
3.3.7 Visarga biʃɔrgo (ঃ - U+0983) and Avagraha (ঽ - U+09BD)
The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. An example is দুঃখ duhkho ‘sorrow’, ‘unhappiness’ (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in the Bangla script, and it is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো’পরাণি). It is now rarely used, even in other languages that use the Bangla script. For the LGR, it has been decided not to retain the Avagraha (ঽ) (U+09BD) in the repertoire, because it is blocked in TLDs under the Maximal Starting Repertoire (MSR). Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.
type in the following manner: র + ্ + য (which yields repha, র্য), but for র‍্য the sequence would be
র + ZWJ + ্ + য [154]. In other words, ZWJ is used to render words that need ya-phalā after ra, which otherwise cannot be typed (rendered), because the same order ra + hasanta + antastha ja occurs in medial and/or final position. Interestingly, ra + hasanta + antastha ja is used to type repha on the consonant antastha ja, as in কার্য (kaarjo). To get a ya-phalā after the
consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার.
The [Procedure] rules out ZWJ and ZWNJ in the root zone. In Bangla they are used to create alternate renderings, and inserting these two signs can affect searching as well as NLP. The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation must be explicitly prevented, so that the Hasanta joining the two consonants is shown explicitly.
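Because ZWJ and ZWNJ are excluded from the root zone (and the Avagraha is excluded by the MSR, as noted in 3.3.7), a label such as র‍্যাপার, which needs ZWJ, cannot be represented, while the repha spelling কার্য passes. A minimal sketch of that check; the excluded set lists only the three characters discussed in this section.

```python
ZWNJ, ZWJ, AVAGRAHA = "\u200C", "\u200D", "\u09BD"
EXCLUDED = {ZWNJ: "ZWNJ", ZWJ: "ZWJ", AVAGRAHA: "AVAGRAHA"}


def excluded_characters(label: str) -> list[str]:
    """Names of the excluded characters found in the label, in order of occurrence."""
    return [EXCLUDED[ch] for ch in label if ch in EXCLUDED]


rapar = "\u09B0\u200D\u09CD\u09AF\u09BE\u09AA\u09BE\u09B0"  # র‍্যাপার: RA + ZWJ + HASANTA + YA ...
karjo = "\u0995\u09BE\u09B0\u09CD\u09AF"                    # কার্য: repha on YA, no joiner
print(excluded_characters(rapar))  # ['ZWJ']
print(excluded_characters(karjo))  # []
```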
3.3.9 Use of Ya-phalā
The æ Ya-phalā sequences are the two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ - BENGALI LETTER A and U+098F এ - BENGALI LETTER E).
To render ya-phalā after অ and এ, one must type U+09CD Hasanta plus U+09AF ya after the vowel. This is a purely ligatural entity: ya-phalā together with ā-kāra is used to produce the æ sound, as in English ‘acid’ অ্যাসিড,
‘association’ অ্যাসোসিয়েশন, ‘bat’ ব্যাট, ‘fat’ ফ্যাট, ‘mat’ ম্যাট, ‘cap’ ক্যাপ, etc. By nature, the Brāhmī script does not have a Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry it. As we see here, however, Bangla orthography has a deviant feature in which Hasanta comes immediately after a vowel, in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
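The S symbol (V1 H C1 M1) limits this deviant Hasanta to exactly অ্যা and এ্যা. A minimal sketch of that restriction, assuming any code point in U+0985–U+0994 counts as an independent vowel. It is an illustration of the rule as described here, not the LGR's actual WLE expression.

```python
HASANTA, YA, AA = "\u09CD", "\u09AF", "\u09BE"
ALLOWED_VOWELS = {"\u0985", "\u098F"}  # অ, এ


def is_independent_vowel(ch: str) -> bool:
    return 0x0985 <= ord(ch) <= 0x0994  # simplified range check


def vowel_hasanta_ok(label: str) -> bool:
    """Accept a Hasanta after an independent vowel only inside অ্যা or এ্যা."""
    for i, ch in enumerate(label):
        if ch == HASANTA and i > 0 and is_independent_vowel(label[i - 1]):
            if label[i - 1] not in ALLOWED_VOWELS or label[i + 1:i + 3] != YA + AA:
                return False
    return True


print(vowel_hasanta_ok("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড -> True
```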
Owing to its co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka “the sun”), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra “cycle”). The point is that in both cases the ra slot could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), while the shapes of repha and ra-phalā remain the same. The LGR notes this concern about the two RAs in disguise: it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may lead to spoofing and other kinds of cyber irregularities. The two code points are classed as (blocked) variants because fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). This is the justification for Variant Set 4, though only in the context of a following Hasanta. The two RAs can be distinguished only by looking at their Unicode values. Labels such as অর্ক arka, শীর্ষ śīrṣa ‘top, apex’, অভ্র abhra ‘cloud, the sky’ and শ্রম śrama ‘physical labour’ could therefore be extremely dangerous, as web users may never verify the digital content (the labels) against its Unicode code points. This point is made explicit with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47), which follow. Moreover, it
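The variant relation just described can be illustrated by enumerating, for a given label, every label obtained by swapping র (U+09B0) and ৰ (U+09F0) where RA is followed by Hasanta (the repha context, i.e. the P symbol). Under a blocked-variant policy, registering one of these labels would block the others. Whether the ra-phalā context (Hasanta + RA) also applies is ambiguous in the text above, so the sketch makes it an opt-in flag. The function and flag names are hypothetical and are not the LGR XML's actual rule names.

```python
from itertools import product

BN_RA, AS_RA, HASANTA = "\u09B0", "\u09F0", "\u09CD"


def ra_variants(label: str, include_ra_phala: bool = False) -> set[str]:
    """All labels reachable by swapping the two RAs in the Hasanta context (original included)."""
    options = []
    for i, ch in enumerate(label):
        before_h = i + 1 < len(label) and label[i + 1] == HASANTA           # repha: RA + H
        after_h = include_ra_phala and i > 0 and label[i - 1] == HASANTA   # ra-phalā: H + RA
        if ch in (BN_RA, AS_RA) and (before_h or after_h):
            options.append((BN_RA, AS_RA))
        else:
            options.append((ch,))
    return {"".join(combo) for combo in product(*options)}


arka = "\u0985\u09B0\u09CD\u0995"  # অর্ক
print(sorted(ra_variants(arka)))   # অর্ক and its visually identical Assamese-RA twin
```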
4 Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) is made up of members with experience in linguistics (especially NLP and computational linguistics), literature, language, history and epigraphy. Under the NBGP, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR for each. An attempt is made, however, to keep the fundamental philosophy behind these LGRs consistent across all the Brāhmī-derived scripts. The present LGR will cater to the multiple languages at EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in deciding on Bangla LGR code points.
41 GuidingPrinciplesTheNBGP adopts followingbroadprinciples for selection of code-points in the code-pointrepertoireacrosstheboardforalltheNeo-Brāhmīscriptswithinitsambit
411 InclusionPrinciples4111 ModernUsageEvery character proposed should be in the everyday usage of a particular linguisticcommunityThecharacterswhichhavebeenencodedintheUnicodefortranscriptionpurposesonlyorforarchivalpurposeswillnotbeconsideredforinclusioninthecode-pointrepertoire4112 UnambiguousUseEvery character proposed should have unambiguous understanding among linguistsaboutitsusageinthelanguage
4.2. Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards which are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.
4.2.1. External Limits of Scope
The code point repertoire for the root zone being a very special case at the top of protocol hierarchies, the canvas of characters available for selection as part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart: Out of all the characters that are needed by the script in question, if a particular character is not encoded in Unicode it cannot be incorporated in the code point repertoire. Such cases are quite rare, and especially so in the Bangla-Asamiyā-Manipuri Writing System, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol: Unicode being the character-encoding standard providing the maximum possible representation of a given script/language, it has encoded as far as possible all the characters needed by the script. However, the domain name being a specialized case, it is governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR): The Root Zone LGR being the repertoire of characters which are going to be used for the creation of Root Zone TLDs, which in turn constitute an even more specialized case of domain names, the Root Zone LGR procedure introduces additional exclusions on the IDNA's allowed set of characters. Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), even if allowed by the IDNA protocol, is not permitted in the Root Zone Repertoire as per the MSR.
To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. The IDNA Protocol further narrows this down, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
4.2.1.1. No Punctuation Marks
The TLDs being identifiers, punctuation marks present in Brāhmī-based scripts will not be included.
4.2.1.2. No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters like BANGLA ISSHAR ৺ (U+09FA), BANGLA CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9) etc. will also not be included.
4.2.1.3. No Rare and Obsolete Characters
There are characters which have been added to Unicode to accommodate rare forms, such as the Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L “ঌ” (U+098C), as well as VOCALIC LL ৡ (U+09E1) and the allographic -kāra forms of the latter two symbols: VOWEL SIGN VOCALIC L (U+09E2) and VOWEL SIGN VOCALIC LL (U+09E3). All such characters are excluded, which complies with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the -kāra corresponding to VOCALIC RR ৠ (U+09E0), which is VOWEL SIGN VOCALIC RR “ৄ” (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.
4.2.1.4. No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.
4.2.1.5. ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and the Appendix (Section 10.1).
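The successive filters of Section 4.2.1 and the exclusions listed in 4.2.1.2 to 4.2.1.3 can be pictured as a simple predicate over code points. This is a hypothetical sketch limited to the code points named in this document: the IDNA2008 PVALID step is not computed (a real implementation would need the IDNA2008 derived-property tables), and the names `EXCLUDED` and `passes_filters` are illustrative only.

```python
import unicodedata

# Step i (Unicode): assigned code points inside the Bengali block.
BENGALI_BLOCK = range(0x0980, 0x0A00)

# Step iii (MSR / NBGP exclusions), limited to code points named in the text.
# The IDNA2008 PVALID step (ii) is NOT computed in this sketch.
EXCLUDED = {
    0x09BD,  # BENGALI SIGN AVAGRAHA (excluded by the MSR)
    0x098C,  # BENGALI LETTER VOCALIC L (rare/obsolete)
    0x09E0,  # BENGALI LETTER VOCALIC RR
    0x09E1,  # BENGALI LETTER VOCALIC LL
    0x09E2,  # BENGALI VOWEL SIGN VOCALIC L
    0x09E3,  # BENGALI VOWEL SIGN VOCALIC LL
    0x09D7,  # BENGALI AU LENGTH MARK (Section 5.2)
    0x09F9,  # BENGALI CURRENCY DENOMINATOR SIXTEEN (symbol)
    0x09FA,  # BENGALI ISSHAR (symbol)
}
DIGITS = range(0x09E6, 0x09F0)  # Bengali digits: ineligible for the root zone
# U+09C4 BENGALI VOWEL SIGN VOCALIC RR is deliberately NOT excluded.


def passes_filters(cp: int) -> bool:
    """Apply the block check, the assignment check and the named exclusions."""
    if cp not in BENGALI_BLOCK:
        return False
    if unicodedata.category(chr(cp)) == "Cn":  # unassigned in this Unicode version
        return False
    return cp not in EXCLUDED and cp not in DIGITS


candidates = [cp for cp in BENGALI_BLOCK if passes_filters(cp)]
```

Note that the Devanāgarī avagraha U+093D fails the very first filter, since it lies outside the Bengali block.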
5. Repertoire
The Bangla Writing System is represented in UNICODE using the Bengali (Bangla) script name, as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in UNICODE has 93 entries. This section details the code-point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to be included in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in its 8th Meeting of the Indian Language Technologies and Products Sectional Committee LITD 20, held on 23rd August 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the UNICODE Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 11 is valid for all three languages above.
For each of the code points, language references have been given in the last column, titled Reference, under Table 8, titled the “Code Point Repertoire”. For entire coverage of Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in Bangla script is not known to have introduced many new complications, except for one particular character. Though only a few representative languages under EGIDS Scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under Bangla Unicode Points (as given in UNICODE 6.3). However, before the details are presented, it is ideal to look at the Bangla Code Point Chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs given below in the code charts are based on one of the many fonts designed and are not prescriptive, because there could be some variations in actual fonts, both UNICODE-compatible and TrueType ones. Consider the following code point table:
Colour convention:
All characters that are included in the [MSR]: yellow background
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
Not PVALID in IDNA2008 or ineligible for the root zone (digits, hyphen): white background
Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences which enable the conditional inclusion in the repertoire of Bangla LETTER A and LETTER E followed by Bangla SIGN VIRAMA and Bangla LETTER YA, again followed by Bangla VOWEL SIGN AA, so as to represent the æ sound as in English ‘bat’, ‘cat’ etc.
5.2. Code Point Repertoire Exclusion
There are some characters of the Bangla script that find a place in Unicode but have not been included in the repertoire in this LGR proposal. The reason for excluding ঌ (U+098C) and ৗ (U+09D7) is that they are rare and obsolete characters.
5.3. Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it will never be used alone. It will be used in sequence in three special characters, in normalized form, for ড় ঢ় য়.
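Why NUKTA only ever appears in sequence becomes visible under Unicode normalization: U+09DC, U+09DD and U+09DF are composition exclusions, so Normalization Form C stores ড়, ঢ় and য় as base letter + NUKTA. The sketch below uses Python's standard `unicodedata` module. The checking rule `nukta_ok` (NUKTA allowed only directly after ড, ঢ or য) is an assumption drawn from this section, not the normative XML rule.

```python
import unicodedata

NUKTA = "\u09BC"
# Bases that combine with NUKTA in the repertoire: ড, ঢ, য (giving ড়, ঢ়, য়)
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}


def nukta_ok(label: str) -> bool:
    """True if every NUKTA (after NFC) directly follows one of the three bases."""
    text = unicodedata.normalize("NFC", label)
    for i, ch in enumerate(text):
        if ch == NUKTA and (i == 0 or text[i - 1] not in NUKTA_BASES):
            return False
    return True


# NFC turns the precomposed ড় (U+09DC) into U+09A1 U+09BC.
print([hex(ord(c)) for c in unicodedata.normalize("NFC", "\u09DC")])
```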
5.4.2. The Vowel Sequence
In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To facilitate understanding for users of other Brāhmī scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra, onuʃʃār (B), a Candrabindu (D) or a Visarga, biʃɔrgo (X). The number of D, B or X which can follow a V in Bangla is not necessarily restricted to one. Going by the rules illustrated in the document, it is clear that formations such as VDD, VBB and VXX are invalid orthographic units. However, it is valid and possible to have formations or sequences such as an anusvāra followed by a candrabindu on the one hand, and a visarga followed by a candrabindu on the other, as in হ্যাংচা ‘hæŋchā’ and হ্যাঃ ‘hæŋ’ respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu also exists in Bangla. A vowel can optionally be followed by a combination of Hasanta/Virama [H] + Consonant [C] to form a Ya-phalā. “Ya-phalā” is a presentation form of U+09AF, Bangla letter য or ‘ya’, represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla SIGN Hasanta or VIRAMA), U+09AF য BENGALI LETTER YA>. Ya-phalā has a special form (্য). When combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. ‘aa’ (ā)), it is used for transcribing [æ], as in the “a” of the English word “bat”, written in Bangla as ব্যাট. A vowel sequence admits the following combinations:
5.4.2.1. A Single Vowel
Example: V অ (अ)
5.4.2.2. A Vowel with Conditions
A Vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].
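Read literally, the rule above is a small regular language over the WLE symbols listed in Section 7 (V, B, D, X, H, C, M). The sketch below assumes the label has already been mapped to such a symbol string and checks a single vowel-sequence unit; it encodes only the combinations listed in 5.4.2.2 (so BD and XD, mentioned in the prose of 5.4.2, are not accepted here). The names `VOWEL_SEQUENCE` and `is_vowel_sequence` are hypothetical.

```python
import re

# Section 5.4.2.2 read literally: V, optionally followed by
# B | D | X | DB | DX | H C M (the Ya-phala + kara case).
VOWEL_SEQUENCE = re.compile(r"V(?:DB|DX|HCM|[BDX])?")


def is_vowel_sequence(symbols: str) -> bool:
    """True if the whole symbol string is one valid vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(symbols) is not None


for s in ["V", "VD", "VDB", "VHCM", "VDD", "VBB", "VXX"]:
    print(s, is_vowel_sequence(s))
```

`fullmatch` is used so that trailing symbols (as in VDD) cause a rejection rather than a partial match.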
5.4.3.2. A Consonant with Conditions
A Consonant optionally followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
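The same symbol-level approach applies to the consonant rule. This sketch covers only 5.4.3.2 as stated (exactly one optional follower); combinations handled by other subsections of the ABNF are deliberately out of scope, and the names are illustrative.

```python
import re

# Section 5.4.3.2 read literally: C optionally followed by exactly one of
# M | B | D | X | H | DB | DX.
CONSONANT_SEQUENCE = re.compile(r"C(?:DB|DX|[MBDXH])?")


def is_consonant_sequence(symbols: str) -> bool:
    """True if the whole symbol string matches the consonant-with-conditions rule."""
    return CONSONANT_SEQUENCE.fullmatch(symbols) is not None


print(is_consonant_sequence("CDB"), is_consonant_sequence("CHH"))
```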
5.4.3.5. Subsets
While considering its subsets, as a representative example we will consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC. [A] The combination may be followed by M, B, D, X, DB or DX.
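The CHC, CHCHC and CHCHCHC subsets differ only in how many Hasanta joins occur, so they can share one parameterised pattern. In this sketch `max_joins=3` mirrors the three forms listed above; passing `None` removes the upper bound, reflecting the statement in Section 3.3.2 that the Bangla LGR will not enforce a maximum cluster length. The helper name `cluster_regex` is hypothetical.

```python
import re


def cluster_regex(max_joins=3):
    """C (H C){1,max_joins}, optionally followed by M | B | D | X | DB | DX.

    max_joins=None removes the upper bound on Hasanta joins.
    """
    joins = "{1,%d}" % max_joins if max_joins else "+"
    return re.compile(r"C(?:HC)" + joins + r"(?:DB|DX|[MBDX])?")


print(bool(cluster_regex().fullmatch("CHCHCHCDB")))
print(bool(cluster_regex().fullmatch("CHCHCHCHC")))      # four joins: rejected
print(bool(cluster_regex(None).fullmatch("CHCHCHCHC")))  # no upper bound
```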
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’.
Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
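The KHANDA TA condition in the note above can be checked at the character level. This is a minimal sketch assuming the rule means “a Hasanta before ৎ is allowed only when the consonant before that Hasanta is a RA”; a ৎ without a preceding Hasanta (as in উৎসব) is left alone. The function name is hypothetical.

```python
HASANTA = "\u09CD"
KHANDA_TA = "\u09CE"
RA_FORMS = {"\u09B0", "\u09F0"}  # র (Bangla), ৰ (Assamese)


def khanda_ta_context_ok(label: str) -> bool:
    """A Hasanta before KHANDA TA is only allowed when the consonant is a RA."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i > 0 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] not in RA_FORMS:
                return False
    return True


print(khanda_ta_context_ok("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা
```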
5.4.5. Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full
10 Refer to Rule P in Section 7 Table 16
vowel (U+0985 অ BANGLA LETTER A and U+098F এ BANGLA LETTER E). For rendering Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF (ya) preceded by the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound as in English ‘bat’, ‘fat’ etc. The Brāhmī script by nature does not have a Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant. Only consonants can have the Hasanta marked. But as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]). The other case refers to the formation of repha and ra-phalā in the script, mentioned in the table above as P.
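Because Hasanta after a vowel is otherwise foreign to Brāhmī orthography, Rule S can be implemented as a narrow exception: a Hasanta following an independent vowel is accepted only as part of the exact sequences অ্যা or এ্যা (V1 H C1 M1). This is a sketch under that assumption; the vowel set is an approximation of the repertoire and the function name is hypothetical.

```python
# Independent vowels অ..ঋ, এ, ঐ, ও, ঔ (approximate set, assumed for illustration)
INDEPENDENT_VOWELS = {
    chr(cp) for cp in (*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994)
}
HASANTA = "\u09CD"
# Rule S: V1 H C1 M1 with V1 in {অ, এ}, C1 = য (U+09AF), M1 = া (U+09BE)
S_SEQUENCES = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}


def hasanta_after_vowel_ok(label: str) -> bool:
    """Reject any vowel + Hasanta that is not the start of an S sequence."""
    for i, ch in enumerate(label):
        if ch == HASANTA and i > 0 and label[i - 1] in INDEPENDENT_VOWELS:
            if label[i - 1:i + 3] not in S_SEQUENCES:
                return False
    return True


print(hasanta_after_vowel_ok("\u098F\u09CD\u09AF\u09BE\u09AA"))  # এ্যাপ
```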
6. Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusing variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from character formations as normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. Confusable but not identical cases are not proposed, as there is another panel (the String Similarity Assessment Panel) entrusted to deal with such cases. However, cases which belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Bangla akshar formation. They can cause confusion even to a careful observer and are hence being proposed as variants.
Variants arise in a script when two or more identical-looking forms are produced from different stored code point sequences. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or obtain the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variant, because a kāra followed by a kāra is not allowed.
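The two ways of typing কো also collapse at the encoding level: U+09CB BENGALI VOWEL SIGN O canonically decomposes to U+09C7 + U+09BE, and U+09CC AU to U+09C7 + U+09D7, so Normalization Form C (which IDNA2008 requires for labels) maps the two-key input onto the single code point. The helper name `same_after_nfc` is hypothetical; the normalization itself is standard Python `unicodedata`.

```python
import unicodedata


def same_after_nfc(a: str, b: str) -> bool:
    """True if two inputs become identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


KA = "\u0995"
# কো typed with one key (O-kara) vs. E-kara + AA-kara
print(same_after_nfc(KA + "\u09CB", KA + "\u09C7\u09BE"))
# কৌ: AU-kara vs. E-kara + AU LENGTH MARK
print(same_after_nfc(KA + "\u09CC", KA + "\u09C7\u09D7"))
```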
6.1. In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I: As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) with Hasanta appears as a conjunct with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus
স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus
ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
The above combinations, if written in traditional orthography, can be a little confusing, as the থ (tha) in the conjunct appears like a হ (ha). The conjunct can be in initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly by someone who thinks it is a হ (ha, U+09B9), increasing the risks in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
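The variant pairs of CASE I can be generated mechanically by swapping থ and হ wherever they are the second member of a conjunct after স্ or ন্. This is a sketch of the idea only, assuming every such position may be swapped independently; the function name is hypothetical and the normative variant mapping lives in the LGR XML.

```python
from __future__ import annotations

HASANTA = "\u09CD"
THA, HA = "\u09A5", "\u09B9"          # থ, হ
FIRST_MEMBERS = {"\u09B8", "\u09A8"}  # স, ন
SWAP = {THA: HA, HA: THA}


def tha_ha_variants(label: str) -> set[str]:
    """Labels obtained by swapping থ/হ after স্ or ন্ (CASE I)."""
    positions = [
        i for i in range(2, len(label))
        if label[i] in SWAP and label[i - 1] == HASANTA and label[i - 2] in FIRST_MEMBERS
    ]
    results = {label}
    for i in positions:
        results |= {v[:i] + SWAP[v[i]] + v[i + 1:] for v in results}
    return results - {label}


print(tha_ha_variants("\u09B8\u09CD\u09A5\u09BE\u09A8"))  # স্থান -> স্হান
```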
6.2. Cross-Script Variants
A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusions they may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Although there are certain characters which are somewhat similar, they have not been included here. They have been provided in the Appendix (10.3) for reference.
7. Whole Label Evaluation Rules (WLE)
This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications.
11 Unicode uses “Oriya” for the script, although Odia is now the official term.
12 As used by Unicode, denoting and including both Assamese and Maṇipuri.
Below are the symbols used in the WLE rules for each Indic Syllabic Category, as mentioned in the table provided in the Code Point Repertoire (Section 5.1).
C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9) or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA)
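Applying these rules requires mapping each label to the symbol alphabet above. The sketch below does this for the single-character symbols; the category sets are an approximation assumed for illustration (the normative membership is the code point repertoire table), NUKTA after a consonant is folded into that consonant, and anything unrecognised becomes `?`. The sequence symbols S and P are left to later, pattern-level checks.

```python
import unicodedata

# Approximate category sets, assumed for illustration only.
VOWELS = set(range(0x0985, 0x098C)) | {0x098F, 0x0990, 0x0993, 0x0994}
CONSONANTS = (
    set(range(0x0995, 0x09A9))    # ক .. ন
    | set(range(0x09AA, 0x09B1))  # প .. র
    | {0x09B2}                    # ল
    | set(range(0x09B6, 0x09BA))  # শ ষ স হ
    | {0x09DC, 0x09DD, 0x09DF, 0x09F0}  # precomposed ড় ঢ় য়, and ৰ
)
MATRAS = set(range(0x09BE, 0x09C5)) | {0x09C7, 0x09C8, 0x09CB, 0x09CC}
SIGNS = {0x0982: "B", 0x0981: "D", 0x0983: "X", 0x09CD: "H", 0x09CE: "Z"}
NUKTA = 0x09BC


def to_symbols(label: str) -> str:
    """Map a label to WLE symbols; '?' marks anything outside these sets."""
    out = []
    for ch in unicodedata.normalize("NFC", label):
        cp = ord(ch)
        if cp == NUKTA and out and out[-1] == "C":
            continue  # base + NUKTA (ড়, ঢ়, য়) counts as one consonant
        if cp in VOWELS:
            out.append("V")
        elif cp in CONSONANTS:
            out.append("C")
        elif cp in MATRAS:
            out.append("M")
        elif cp in SIGNS:
            out.append(SIGNS[cp])
        else:
            out.append("?")
    return "".join(out)


print(to_symbols("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা
```

Normalizing to NFC first means that a two-key কো (e-kāra + ā-kāra) yields a single M, consistent with the rule that a kāra cannot follow a kāra.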
Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in adding such ligatures, because any user can easily create them, even though they may not be attested combinations.
7.1.1. Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE) (meaning Bank of India). This is the case where two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered enough for full representation of the sound intended for the first word. This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-Joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence this is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements from the community, the NBGP may consider revisiting this rule.
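The prohibition amounts to rejecting any label in which an independent vowel immediately follows a Hasanta. The sketch below checks exactly that adjacency; the vowel set is an approximation assumed for illustration and the function name is hypothetical.

```python
HASANTA = "\u09CD"
# Independent vowels অ..ঋ, এ, ঐ, ও, ঔ (approximate set, assumed for illustration)
INDEPENDENT_VOWELS = {
    chr(cp) for cp in (*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994)
}


def has_vowel_after_hasanta(label: str) -> bool:
    """True if an independent vowel directly follows a Hasanta (prohibited)."""
    return any(a == HASANTA and b in INDEPENDENT_VOWELS for a, b in zip(label, label[1:]))


BANK_OF_INDIA = (
    "\u09AC\u09CD\u09AF\u09BE\u0999\u09CD\u0995\u0985\u09AB\u09CD"
    "\u0987\u09A8\u09CD\u09A1\u09BF\u09DF\u09BE"
)
print(has_vowel_after_hasanta(BANK_OF_INDIA))  # the অফ্ + ই junction is flagged
```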
7.2. Additional Examples from Bangla ABNF
Below are a few examples which help one understand some of the rules the ABNF puts in place. These are given for reference purposes only and are not meant to be comprehensive.
ঃক (ःक): As can be seen, such a combination will automatically result in a “golu” or dotted circle, marking it as an invalid formation. This is an intrinsic property of the
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What Has Happened So Far in Terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0 – Core Specification, Chapter 12, p. 473.
10. Appendix-I
10.1. Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language/script, certain restriction rules apply. In other words, in a given language some of the Formalism structures do not necessarily apply. To take care of such cases, restriction rules are set in place. These restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:
1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, but we can cite a couple of native examples, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across the possibility of Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).
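Restriction 1 is a label-initial check. The sketch below encodes only the three characters the rule forbids (ৎ, ঞ, ঙ); য় is not forbidden here, since the text notes attested initial uses in names. The names `FORBIDDEN_INITIAL` and `initial_ok` are hypothetical.

```python
# Characters that restriction 1 forbids at the start of a label.
FORBIDDEN_INITIAL = {
    "\u09CE": "KHANDA TA",  # ৎ
    "\u099E": "NYA",        # ঞ
    "\u0999": "NGA",        # ঙ
}


def initial_ok(label: str) -> bool:
    """False for empty labels or labels starting with a forbidden character."""
    return bool(label) and label[0] not in FORBIDDEN_INITIAL


print(initial_ok("\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"))  # ৎসেরিং: rejected
print(initial_ok("\u09DF\u09BE\u0995\u09C1\u09AC"))        # য়াকুব: allowed
```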
10.2. ‘Sylheti Nagarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, which was widely in use during the 1880s. There were several other names for Sylheti
13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.
Nagarī or Siloti (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here (138).
Table 17 – The Script Table of Sylheti Nagarī or Siloti
10.3. Confusable Code Points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.
[Table: confusable conjuncts formed with ক (k), গ (g) and ণ (n); the conjunct glyphs and their component listings are not recoverable from the extracted text.]
Mithilākṣara was used for Maithili from the 14th century until the early 20th century [106]. In this context, one finds a mention of ‘Sylheti Nagarī lipi’ or ‘Siloti’ (added to the Unicode Standard in March 2005 with the release of version 4.1), the details of which could be of interest only to historians and historical linguists (see 137 and 144). But for all practical purposes, Sylheti Bangla is now generally written by many in the modern-day Bangla script. Originally, during the reign of the Pāla dynasty (750-1154 AD) in eastern India, and even earlier, perhaps during the Malla period (694 AD onwards), the present-day Bangla writing system got a shape comparable to the modern-day one [111, 119]. A pictorial description of the evolution from Brāhmī to the modern Bangla script could be presented here in tabular form.
The inscriptional evidence in Brāhmī is found in Archaic Brāhmī from the 3rd century BC to the 1st century BC, in Middle Brāhmī soon after (1st-3rd century AD), and then in Late Brāhmī (4th-6th century AD). This evidence can be seen in both Bangladesh and West Bengal [108] in (1) the Mahasthanagara inscriptions (Bogra district, Bangladesh; the ancient name being Pundranagara or Paundravardhanapura), (2) Brāhmī (and Kharoṣṭhī) inscriptions from lower ‘Gangetic Bengal’ and (3) copper plate inscriptions of the Imperial Guptas from the northern part of West Bengal and North-West Bangladesh, in the areas under Dharmaditya, Gopachandra and Samācāradeva (about whom one only knows from five copper plates found in Kotalipara in the Faridpur district in Bangladesh, one in Mallasarul in the Burdwan district (West Bengal) and one in Jayramapura (Ballesvara district, now in Odisha)).
These epigraphs from the eastern part of undivided India (dating back to the 4th-6th centuries AD) show some characteristic features of letters (especially in ম ‘ma’, ল ‘la’, শ ‘śa’, স ‘sa’ and হ ‘ha’), which led to the development of the eastern variety of the Gupta script. Epigraphic records from Bangladesh demonstrate remarkable developments in Eastern Brāhmī. In this context one may mention the Tippera copper plate inscription of the ‘Samatata’ rulers (139, p. 265) such as Lokanātha (dated to the latter half of the 7th century AD), the Kailan inscription of Śrīdharaṇa Rāta, as well as the Astafpur copper plates. The letters seem to hang down from wedge-shaped solid triangles, with right-hand verticals bending down at the bottom, because of which the script was described by Prinsep and Fleet as Kuṭila-lipi (literally ‘cursive writing style’), whereas the term Siddhamātrikā (as a mātrā or bar is placed over each of the letters) was used by Al Biruni (973-1048) to designate the script of Northern India. The next stage of development is illustrated by the 9th-century copper plate inscriptions from Khalimpur of the reign of Dharmapāla, from Monghyr and Nalanda of the time of Devapāla in Bihar, and from Jagjīvanpura (Malda) of the reign of Mahendrapāla. The Siddhamātrikā (mentioned as ‘Siddham’ in Chinese sources) is said to have been prevalent in this region too, up to the end of the tenth century. Also called the Gauri (i.e. Gandi) in Pūrvadeśa, or the Eastern country, it was regarded as the same script to which the appellative Proto-Bangla is given, with characteristics in rudimentary form in the period between AD 875 and AD 1025. In some epigraphs it is considered as belonging to the second quarter of the eleventh century AD. Flattening of head-marks becomes prominent in comparison to the wedge-shaped serifs. An important landmark in the development of the Bangla script is the Ramaganja copper plate inscription of Mahāmāṇḍalika in the last quarter of the eleventh century AD. It is the earliest document from this entire region which bears the letter m with a tick rising upwards. The full vowel i develops a tick at the right end of the upper horizontal bar and a curved hook below. Initial e approaches the modern Bangla character. A mature form of Proto-Bangla, the
immediate precursor of the Bangla script, is illustrated in the inscriptions of the Varmana, Sena and Deva rulers of the twelfth and thirteenth centuries [104]. The evolution of the Bangla script (cf. 136) is aligned with the story of the advancement of printing technology. The first “movable type” scripts were technically created and used while printing Nathaniel Brassey Halhed’s (1751-1830) 1778 book titled A Grammar of the Bengal Language. In 1785, Governor-General Warren Hastings (1732-1818) requested another civilian, Charles Wilkins (1749-1836), to cut punches for Bangla printing characters. The current printed form of the Bangla script appeared soon after. It is generally agreed that Wilkins developed the Bangla print script [111]. He passed on this knowledge to Pancanana Karmakara (d. 1804), a renowned artist in Bengal. Later it was Karmakara and his family that became famous in Bangla printing technology. Shepherd was another assistant of Wilkins in this designing of the script, which became more angular, with sharper turns and edges [133]. A few archaic letters were modernized during the 19th
century. The script was standardized by Pandit Ishwar Chandra Vidyasagar when Bangla type fonts were to be used for large-scale publishing under the Calcutta School Book Society [116 for several references]. Much later, in 1935, the Linotype technique invented by Ottmar Mergenthaler (1854-1899) in 1886 was introduced into Bangla printing by the efforts of Suresh Chandra Majumdar (1888-1954), Rajsekhar Basu (1880-1960), Jatindra Kumar Sen (1882-1966) and his disciple Sushil Kumar Bhattacharya, and began to be used by the Anandabazar Patrika group, later followed by others. Within a few years, the more advanced Monotype technology came to be used in Bangla printing. However, in Bangla printing culture Monotype had very limited acceptance, and Linotype held the stage until digital technology eventually came in to replace all earlier techniques. All these developments could be presented in a table.
3.2. Languages Considered
Below is a tabular representation of the languages using the Bangla script that are placed on EGIDS Scale 1-6 (see 117 for details). Some languages under EGIDS 5 and 6 have also developed their own scripts for printing and publishing. Some used the Bangla script earlier (such as Bodo) or used it in West Bengal at some point of time (Santali), but have later shifted to another writing system. Bodo is now written in Nāgarī or Devanāgarī, and for Santali one uses both Nāgarī/Devanāgarī and Ol-chiki (145). For the purposes of the Bangla LGR, only languages belonging to EGIDS scale 1 to 4 have been considered. Consider the following table:
EGIDS Scale 1
EGIDS Scale 2
EGIDS Scale 3
EGIDS Scale 4
EGIDS Scale 5
EGIDS Scale 6
Bangla (Bengali)
Santali, Bodo, Riang, Khumi, Mru(ng), Asho
Lepcha, Pnar, Koda, Kora, Chak
Asamiyā (Assamese)
Koch or Rajabansī
Malto or Malpahariya
Manipuri or Meitei
Bisnupriya Manipuri, Kok-Borok (Tripura & Bangladesh)
Chakma, Hajong, Mundari & Kurux (of Bangladesh)
Toto, Rohingya, Tippera, Megam, Tanchangya
Usoi, Limbu, Sadri or Oraon
Bhumij or Mundari, Bawm, Chin
Table 4 – Main languages in India and Bangladesh that use the Bangla script, on the EGIDS Scale
3.3. Notable Features of Bangla Script [150]
The Bangla Writing System has certain features that show how it has to be written and how type-setting in Bangla can be done. This section is followed by a section that explains the code points (and fixed code-point sequences) which show certain distinctive characteristics of Bangla and which make up the Repertoire. The next sections also cover the ‘akshar’-formation rules (ABNF) showing character classes, the Whole Label Evaluation (WLE) and Context Rules, as well as In-Script and Cross-Script Variants. Here we present some basic features of the script and its pronunciation. The Bangla script is an alpha-syllabic writing system in which all
consonants are assumed to contain an accompanying ‘inherent’ vowel (theoretically before or after each consonant). It varies between ɔ and o depending on the position of the consonant in the word. At times these ‘assumed’ or ‘inherent’ vowels are not pronounced at all [142].
Vowels can be written as independent letters or by using a variety of diacritical marks, which are written above, below, before or after the consonant they follow in pronunciation, or in both of the last two positions [105].
When consonants occur together in clusters, special conjunct letters are formed. In printed Bangla many of these consonantal clusters or conjoined consonants are in use. The letters for the consonants other than the final one in the group are generally reduced. But there are a few special conjunct characters which are compounds of the consonant characters, e.g. ক (k) + ষ (ṣ) = ক্ষ (kṣ),
ঞ (ñ) + জ (j) = ঞ্জ (ñj), জ (j) + ঞ (ñ) = জ্ঞ (jñ), হ (h) + ম (m) = হ্ম (hm). There are other issues as well: র as the second member of a cluster is reduced to a secondary symbol, e.g.
প (p) + র (r) = প্র (pr), ষ (ṣ) + ট (ṭ) + র (r) = ষ্ট্র (ṣṭr) (as in উষ্ট্র uṣṭra “camel”). য (y), when used as a primary symbol, represents jɔ in Bangla. But its secondary symbol (allograph), jɔ-phalā, has two phonetic values. When added to the initial consonant of a word it is the vowel æ (as in শ্যামল (syamala) “green”, র‍্যাপার
(ryapara) “wrapper” etc.). But after a non-initial consonant it just doubles that consonant in
pronunciation (as in কার্য, ধার্য etc.). The র (r) + য (y) combination has two
shape of the second member is changed, e.g. দ্ধ (ddh), গ্ধ (gdh) and ন্ধ (ndh)
respectively. The solitary example of র (r) + ঋ (ṛ) = র্ঋ (as in নৈর্ঋত nairṛta ‘southwest’), used mostly in cases of classical borrowings, shows the use of the secondary symbol of a consonant followed by the primary symbol of a vowel. The inherent vowel only applies to the final consonant of the cluster.
The Bangla script has at least fifty-two primary symbols and quite a few allographs (positional variants of them), corresponding to forty-four phonemes (7 oral and 7 nasal vowels and 30 consonants) (150), or functional speech sounds, with some obvious redundancies, although in one of the first phonemic analyses the number was thought to be thirty-five phonemes [140].
As mentioned above, in Bangla several graphemic symbols have secondary shapes, technically called ‘allographs’, with a complementary distribution in each case. These graphs or markings are generally added to the following positions of the primary symbol [113], in the following manner:
As for the complementary distribution of vowel letters (word- or syllable-initial) and vowel Mātrās, which are relevant for the ABNF, let us consider the following. Besides some simple vowel modifiers called ‘Kars’ in Bangla (also referred to as Mātrā in the other LGR documents of Neo-Brāhmī), there are some combinatory modifiers of Bangla vowels with certain consonants. For example, whereas
আ U+0986 BENGALI LETTER AA is substituted by া U+09BE BENGALI VOWEL SIGN AA,
ই U+0987 BENGALI LETTER I is substituted by pre-posed ি U+09BF BENGALI VOWEL SIGN I,
ঈ U+0988 BENGALI LETTER II is substituted by ী U+09C0 BENGALI VOWEL SIGN II, and
উ U+0989 BENGALI LETTER U is substituted by ু U+09C1 BENGALI VOWEL SIGN U by marking below the primary grapheme,
there are some special vowel modifiers of উ, as in the following combined letters: গু (gu), rather than writing it as গ (g) + ু (u); ভ (bh) + র (r) (ভ্রু bhru “eyebrow”); (s) + র (r) (sru); ঋ (ṛ) after হ (h) (হৃ hṛ); etc.
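The substitution of independent vowel letters by their kāra forms can be sketched as a lookup table over the standard Unicode Bengali vowel letters and vowel signs. The helper `attach_vowel` is hypothetical; note that the pre-posed ি is stored after the consonant in logical order and is only rendered before it, and that অ has no sign because it is the inherent vowel.

```python
# Independent vowel letter -> dependent vowel sign (kara)
VOWEL_SIGN = {
    "\u0986": "\u09BE",  # আ -> া
    "\u0987": "\u09BF",  # ই -> ি (rendered before the consonant, stored after)
    "\u0988": "\u09C0",  # ঈ -> ী
    "\u0989": "\u09C1",  # উ -> ু
    "\u098A": "\u09C2",  # ঊ -> ূ
    "\u098B": "\u09C3",  # ঋ -> ৃ
    "\u098F": "\u09C7",  # এ -> ে
    "\u0990": "\u09C8",  # ঐ -> ৈ
    "\u0993": "\u09CB",  # ও -> ো
    "\u0994": "\u09CC",  # ঔ -> ৌ
}


def attach_vowel(consonant: str, vowel: str) -> str:
    """Spell consonant + vowel in logical (storage) order."""
    if vowel == "\u0985":  # অ: the inherent vowel, no sign is written
        return consonant
    return consonant + VOWEL_SIGN[vowel]


print([hex(ord(c)) for c in attach_vowel("\u0995", "\u0987")])  # কি
```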
There have been many notable contributions to simplifying and modifying Bangla spellings and combinatory techniques, especially by scholars such as Pabitra Sarkar (1992) [134]. In this there has been an attempt to reduce the number of allographs of both vowels and consonants in clusters, and it has been widely accepted in the printing of school texts in both Bangladesh and West Bengal [151, 152]. As of now, two systems, the old (traditional) and the new, coexist, operating in different domains.
However, in the preparation of this LGR document the aim has been to consider the widely used and usable sequences and combinations, and their variations across the sister scripts belonging to the basket of Brāhmī writing systems. Bangla Academy, Dhaka published Standard Bangla Spelling Rules in 1992, following the recommendations of a committee constituted through a workshop jointly organized by the Jatīya Śiksakrama and Pathyapustaka Board in 1988. A thoroughly revised edition of the Rules was published in September 2012⁶. After the establishment of the Bangla Akademi of West Bengal in 1986, its first President, Annadasankar Ray (1904-2002), in his inaugural address gave a direction for the standardization of the Bangla alphabet/script and the spelling system, and clearly argued that they would not blindly follow the Sanskritic model of conventional grammar. A broad list of proposals was sent to experts on Bangla, and a broad agreement was reached on the ‘homogenization of Bangla spelling’ by 1988. Based on opinions received from different quarters, a unanimous list of ‘rules’ was agreed upon. This was published as a ‘Spelling Dictionary’ titled Ākādemi Bānāna Abhidhāna (1997), which was obviously more comprehensive than ‘The University of Calcutta proposals’ made in 1936. Along with the ‘rationalization’ of spellings, another step was taken to make the writing system easier to read by making the symbols used, both single and combined, more ‘transparent’. These reforms were originally suggested by Sarkar (1987, first published in 1978) [134][153], where he used the terms Swaccha (‘Transparent’) and Aswaccha (‘Opaque’ or non-transparent), even adding Ardha Swaccha (‘half-transparent’) in between the two. Some sample examples are: Transparent: ন্ন (nn), প্ত (pt), স্ত (st), where both members of the cluster can be recognized.
6 Bangla Academy. 2012. Bāṅlā Ekaḍemī Pramita Bāṅlā Bānānera Niyama (Bangla Academy Standard Bangla Spelling Rules). Dhaka: Bangla Academy.
There were in fact two types of proposals. One concerned the shapes of the letters: those of consonant + vowel (CV) combinations and of conjuncts, which are consonant + consonant combinations. There were further complex shapes, i.e. those of consonant + consonant + (consonant +) vowel (CC(C)V) signs, as in প্রু (pru) or স্ক্রু (skru). Some decisions in this area were necessary because a few of the CC(C) symbols represented complexities that made learning them difficult for children. The other type dealt only with the spellings of words, without any reference to the shapes of the letters in which they were written. The basic objective here was ‘one word, one spelling’, to the greatest extent possible [151].
Below we give a statement of the most salient changes that affect consonant + vowel combinations [153].
a. The variants of the short u (হ্রস্ব উ-কার hrasva u-kāra) vowel sign have been brought down to one, i.e. ু. So গু (gu) is now written গ + ু. Similarly রু (ru) > র + ু, শু (śu) > শ + ু, হু (hu) > হ + ু, and therefore, for a cluster + short u sign, ন্তু (ntu) >
(ন + ্ + ত + উ), স্তু (stu) > (স + ্ + ত + উ).
b. The variants of long u (দীর্ঘ ঊ-কার, dīrgha ū-kāra) have also been reduced: rū > রূ, bhrū > ভ্রূ (ভ bh + ্ + র r + ঊ ū), drū > দ্রূ (দ d + ্ + র r + ঊ ū), śrū > শ্রূ (শ ś + ্ + র r + ঊ ū).
c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been brought down to one: hṛ > হৃ.
Regarding consonant + consonant + (consonant)… + (vowel) clusters, the Paschimbanga Bangla Akademi proposed transparent or semi-transparent shapes for clusters to the extent admissible in the Bangla writing system. Some examples will clarify the proposal (a slash means that the traditional cluster shape precedes it, while the Bangla Akademi innovation follows) [153].
3.3.1 The Consonants
As per traditional classification, Bangla consonants are categorized according to their phonetic properties, especially in terms of place and manner of articulation [107]. There are five 'Varga' (pronounced 'Barga' in Bangla) groups (sets or classes), distinguished by place of articulation, and one non-'varga' group [105]. Each Varga, which corresponds to stops at a certain place of articulation, contains a series of five consonants classified by their phonetic qualities (i.e. manner of articulation), beginning from unvoiced and unaspirated, through voiced and aspirated (in the fourth column), and finally ending with a homorganic or corresponding nasal [107]. Consider the following table:
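As a rough machine-readable companion to the varga table, the five vargas can be written as contiguous Unicode ranges in the Bengali block, with the nasal always in fifth position. The sketch below assumes English phonetic labels ("velar", "palatal", and so on) chosen here for illustration; `VARGAS`, `varga_of` and `nasal_of` are hypothetical names, not part of the LGR.

```python
# Sketch: the five Bangla vargas as Unicode code point ranges.
# Order within each varga: unvoiced unaspirated, unvoiced aspirated,
# voiced unaspirated, voiced aspirated, homorganic nasal.
VARGAS = {
    "velar": range(0x0995, 0x099A),      # ক খ গ ঘ ঙ
    "palatal": range(0x099A, 0x099F),    # চ ছ জ ঝ ঞ
    "retroflex": range(0x099F, 0x09A4),  # ট ঠ ড ঢ ণ
    "dental": range(0x09A4, 0x09A9),     # ত থ দ ধ ন
    "labial": range(0x09AA, 0x09AF),     # প ফ ব ভ ম (U+09A9 is unassigned)
}


def varga_of(ch):
    """Return the varga label of a consonant, or None for non-varga letters."""
    cp = ord(ch)
    for name, cps in VARGAS.items():
        if cp in cps:
            return name
    return None


def nasal_of(varga):
    """The homorganic nasal is the fifth (last) member of each varga."""
    return chr(VARGAS[varga][-1])


if __name__ == "__main__":
    for name, cps in VARGAS.items():
        print(f"{name:10}", " ".join(chr(cp) for cp in cps))
```

Letters outside these ranges, such as য, র, ল, শ, ষ, স and হ, fall into the non-varga group and return None.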
3.3.2 The Implicit Vowel Killer Hasanta (called 'Halant' or 'Halanta' in other Brāhmī-based scripts)
As stated earlier, all consonants are pronounced in isolation with an implicit vowel (central-back ɔ in Bangla, as the neutral vowel) assumed to be associated with them [121]. The terms 'Hasanta' (= 'Halant' or 'Halanta' in other Brāhmī-based scripts) and 'Virāma'7 (= 'Dāṛi' in Bangla), the latter preferred in Unicode (cf. Unicode 3.0 and above), have been used in this report to denote the character that marks the absence of this inherent vowel. It may be noted that the term virāma has been adopted in Unicode in a sense that is different from its traditional grammatical definition, and hence it requires some explanation here. Considering the importance of the document, this note should be part of this LGR document, so that anybody referring to it is able to know the proper grammatical explanation of the term. Because a special sign is needed whenever this implicit vowel is stripped off, the symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster, one could, in common parlance, "kill" its vowel and create conjuncts. In this manner, conjunct characters can generally be written by joining two to four consonants. In rare cases this process can join up to five consonants. However, the notion of a maximum number of consonants joining to form one akṣara8 is to be bounded empirically. This is an observation based on the CIIL-EMILLE Corpora of Bangla words [132 & 133] as seen in print these days. Given the mixture of scripts and languages happening on the web, the possibility that one may want a generic Top Level Domain [gTLD] with more than the observed maximum cannot be ruled out. This can be the case when a foreign-language word which admits a large number of consonants is transliterated into Bangla. Hence, in the Bangla LGR work, this limit will not be enforced.
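Since the LGR measures but does not cap conjunct length, a small helper that only counts consonants chained by Hasanta may be useful when comparing labels against corpus observations. This is a sketch under stated assumptions: the consonant range (KA to HA plus U+09F0) is an approximation, and `cluster_lengths` is a hypothetical name.

```python
# Sketch: count the consonants joined by Hasanta (U+09CD) in each cluster.
# No maximum is enforced; this only reports cluster sizes.
HASANTA = "\u09CD"


def is_consonant(ch):
    # Assumption: Bangla consonants KA..HA plus RA WITH MIDDLE DIAGONAL (U+09F0).
    # Unassigned gaps inside U+0995..U+09B9 never occur in valid text.
    return "\u0995" <= ch <= "\u09B9" or ch == "\u09F0"


def cluster_lengths(label):
    """Return the number of consonants in each consonant + Hasanta chain."""
    lengths = []
    i = 0
    while i < len(label):
        if is_consonant(label[i]):
            count, j = 1, i + 1
            # Keep consuming "Hasanta + consonant" pairs.
            while j + 1 < len(label) and label[j] == HASANTA and is_consonant(label[j + 1]):
                count += 1
                j += 2
            lengths.append(count)
            i = j
        else:
            i += 1
    return lengths
```

For example, স্ক্রু (skru) yields one cluster of three consonants, while বাংলা yields two single consonants.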
3.3.3 Vowels
Separate symbols exist for all 'Swara' or vowels in Bangla, which are pronounced independently either at the beginning of a word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign, called 'kār' in Bangla or Mātrā in Nāgarī,9 is attached to the consonant. Since the consonant has this built-in neutral vowel at the end, there are equivalent kāras (Mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:
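The vowel-to-kāra correlation can also be expressed as a lookup table keyed by independent vowel. A minimal sketch, assuming only the vowels in everyday use (the excluded vocalic L, LL and RR are left out); `KARA` and `attach` are illustrative names. Note that Unicode stores the kāra after the consonant even when it is drawn to the left (e.g. ি, ে).

```python
# Sketch: independent vowel -> dependent vowel sign (kāra).
# অ (U+0985) has no kāra: its sound is the consonant's inherent vowel.
KARA = {
    "\u0986": "\u09BE",  # আ -> া
    "\u0987": "\u09BF",  # ই -> ি
    "\u0988": "\u09C0",  # ঈ -> ী
    "\u0989": "\u09C1",  # উ -> ু
    "\u098A": "\u09C2",  # ঊ -> ূ
    "\u098B": "\u09C3",  # ঋ -> ৃ
    "\u098F": "\u09C7",  # এ -> ে
    "\u0990": "\u09C8",  # ঐ -> ৈ
    "\u0993": "\u09CB",  # ও -> ো
    "\u0994": "\u09CC",  # ঔ -> ৌ
}


def attach(consonant, vowel):
    """Write consonant + vowel as one akṣara in logical order, e.g. ক + ই -> কি."""
    if vowel == "\u0985":  # অ: nothing to add, the inherent vowel applies
        return consonant
    return consonant + KARA[vowel]
```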
7 Virāma, as used here, is also a misnomer according to the Indian grammatical traditions. Nowhere is the mere absence of a vowel marked as virāma; Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
8 This term needs to be disambiguated: akṣara also means 'syllable' in Indian grammatical traditions.
9 The term 'Mātrā' in Bangla, however, stands for an altogether different concept, viz. the top bar placed over a letter, typically found in Hindi and Bangla but missing in Gujarati.
3.3.4 The Anusvāra onuʃʃār (ং U+0982)
The Anusvāra, or onuʃʃār in Bangla, at times represents a homorganic nasal, but not always. It replaces a conjunct group of 'Nasal Consonant + Hasanta + Consonant' where the second consonant belongs to the Velar varga or set, as in লংকা. But it often also appears in such combinations where a non-velar is the last member of the combination, as in ল্যাংটা "naked" or ল্যাংচা "a kind of sweet; to limp". Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined writing symbol representing the corresponding nasal consonant of the particular set. Although Modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal, in Bangla it is clearly demarcated where one must use the Anusvāra and where it has to be a conjunct cluster with a nasal as the first or the second component.
3.3.5 Nasalization Candrabindu (ঁ U+0981)
Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cãd 'moon' (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].
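Because this document identifies characters by their "U+XXXX" values, it can help to convert between that notation and actual text when checking examples such as চাঁদ. A minimal sketch; `from_notation` and `to_notation` are hypothetical helper names.

```python
import re


def from_notation(s):
    """'U+099A U+09BE' -> the corresponding string (spaces are optional)."""
    return "".join(chr(int(h, 16)) for h in re.findall(r"U\+([0-9A-Fa-f]{4,6})", s))


def to_notation(text):
    """The corresponding string -> 'U+099A U+09BE ...'."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)
```

For example, `to_notation("চাঁদ")` gives back the sequence quoted above, with Candrabindu (U+0981) stored after the vowel sign it nasalizes.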
3.3.6 Nukta (় U+09BC)
The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi). The term and the concept of nukta are borrowed in Bangla. The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, the Bangla letters য় YYA, ড় RRA and ঢ় RRHA have to be represented in this LGR by the sequences YA + Nukta (U+09AF + U+09BC), DDA + Nukta (U+09A1 + U+09BC) and DDHA + Nukta (U+09A2 + U+09BC) instead of the single code points YYA (U+09DF), RRA (U+09DC) and RRHA (U+09DD), although the use of 'Nukta' is otherwise completely unnatural in Bangla.
It is noted that in the current Unicode Standard chart these characters are listed as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA Protocol, through a set of procedures developed by the IETF. Even though the Unicode Standard also provides these three characters as atomic code points (for example, 09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y], each typed as a single keystroke), the IDNA protocol requires that we treat them as sequences of the base consonant and Nukta, both of which are allocated in the Unicode Bengali block.
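The effect of the composition exclusions can be checked directly with Python's standard `unicodedata` module: normalizing any of the three atomic letters to NFC yields the base consonant followed by NUKTA, and that two-code-point form is left unchanged. A sketch; `to_lgr_form` is a hypothetical name for "the form a label must take before LGR evaluation".

```python
import unicodedata


def to_lgr_form(label):
    """Return the label in NFC, the form IDNA (RFC 5891) requires."""
    return unicodedata.normalize("NFC", label)


for atomic in ("\u09DC", "\u09DD", "\u09DF"):
    nfc = to_lgr_form(atomic)
    print(unicodedata.name(atomic), "->",
          " + ".join(f"U+{ord(c):04X}" for c in nfc))
```

Each line printed shows the base consonant (U+09A1, U+09A2 or U+09AF) followed by U+09BC.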
It may be noted that there could be sporadic attempts to write Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph), only for the sake of correct pronunciation and to maintain the sanctity of the loanword. Such use amounts to making the Bangla writing system work like the IPA. It is, however, not in use in printed Bangla.
3.3.7 Visarga biʃɔrgo (ঃ U+0983) and Avagraha (ঽ U+09BD)
The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. One could quote as an example দুঃখ duhkho "sorrow", "unhappiness" (U+09A6 U+09C1 U+0983 U+0996).
The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in Bangla. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো'পরাণি). It is rarely used now, even in other languages using Bangla script. In the case of the LGR, the Avagraha is not part of the repertoire: it has been decided not to retain Avagraha (ঽ) (U+09BD), because it is blocked in TLDs as per the Maximal Starting Repertoire (MSR). Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.
type in the following manner: র + ্ + য; but for র‍্য the sequence would be র + ZWJ + ্ + য [154]. In other words, ZWJ is used in the rendering of words demanding ya-phalā after ra, which is otherwise not possible to type (render), because the same order ra + hasanta + antastha ja occurs in the medial and/or final position. Interestingly, ra + hasanta + antastha ja is used to type a repha on the consonant antastha ja, as in কার্য (kaarjo). In order to get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার.
The use of ZWJ/ZWNJ has been ruled out from the root zone by the [Procedure]. Because they are used in Bangla to create alternate renderings, the insertion of these two signs can affect searching as well as NLP. The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly prevented and the Hasanta between the two consonants that would otherwise form the conjunct needs to be explicitly shown.
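The sketch below makes the consequence concrete: কার্য and র‍্যাপার differ in their RA + YA portion only by the invisible ZWJ, so once ZWJ/ZWNJ are excluded, a root-zone check only needs to flag their presence, and stripping the ZWJ turns the ra + ya-phalā spelling into the repha spelling. `joiners_in` and the constant names are hypothetical.

```python
# কার্য: RA + HASANTA + YA renders as a repha on YA.
KARYO = "\u0995\u09BE\u09B0\u09CD\u09AF"
# র‍্যাপার: RA + ZWJ + HASANTA + YA renders as RA with ya-phalā.
RAPAR = "\u09B0\u200D\u09CD\u09AF\u09BE\u09AA\u09BE\u09B0"

JOINERS = {"\u200C": "ZWNJ", "\u200D": "ZWJ"}


def joiners_in(label):
    """List (index, name) of every ZWJ/ZWNJ; any hit disqualifies a root-zone label."""
    return [(i, JOINERS[ch]) for i, ch in enumerate(label) if ch in JOINERS]
```

Here `joiners_in(RAPAR)` reports the ZWJ at index 1, while `joiners_in(KARYO)` is empty.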
3.3.9 Use of Ya-phalā
There are two instances in Bangla where Hasanta is preceded by a full vowel, both of them Ya-phalā sequences: after U+0985 অ (BENGALI LETTER A) and after U+098F এ (BENGALI LETTER E).
For rendering অ and এ followed by Ya-phalā, it is necessary to type U+09CD Hasanta plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English acid অ্যাসিড, association অ্যাসোসিয়েশন, 'bat' ব্যাট, 'fat' ফ্যাট, 'mat' ম্যাট, 'cap' ক্যাপ, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta. But as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
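The deviant "Hasanta after a vowel" pattern is narrow enough to check mechanically: it may occur only as অ/এ + HASANTA + YA + AA. A sketch, assuming the independent vowels are those in U+0985..U+0994; `hasanta_after_vowel_ok` is an illustrative name, not an LGR rule identifier.

```python
HASANTA = "\u09CD"
YA = "\u09AF"
AA_SIGN = "\u09BE"
S_VOWELS = {"\u0985", "\u098F"}  # অ, এ


def is_independent_vowel(ch):
    # Assumption: independent vowels lie in U+0985..U+0994.
    return "\u0985" <= ch <= "\u0994"


def hasanta_after_vowel_ok(label):
    """False if a Hasanta follows a vowel outside অ/এ + HASANTA + YA + AA."""
    for i, ch in enumerate(label):
        if ch == HASANTA and i > 0 and is_independent_vowel(label[i - 1]):
            if label[i - 1] not in S_VOWELS or label[i + 1:i + 3] != YA + AA_SIGN:
                return False
    return True
```

Under this check অ্যাসিড passes, whereas ই্যা or an incomplete অ্য is rejected.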
Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka "the sun"); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra "cycle"). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases. The LGR notes this point of concern about the two RAs in disguise, as it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may lead to spoofing and other kinds of cyber irregularities. The motive for classing these two code points as (blocking) variants is that fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of a following Hasanta. The difference between the two RAs is only distinguishable by looking at their Unicode values. Therefore, labels such as অর্ক arka, শীর্ষ śīrṣa 'top, apex', অভ্র abhra 'cloud, the sky' and শ্রম śrama 'physical labour' could be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode code points. This point is made explicit with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47), which follow. Moreover, it
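Variant Set 4 can be illustrated by enumerating every label that differs from the applied-for label only in the choice of র/ৰ at positions where RA is immediately followed by Hasanta; under a blocking disposition, all of these would be withheld from other registrants. This is a sketch of the literal reading of the text above ("in the context of a following Hasanta"); the normative context is the one defined in the LGR XML, and `ra_variants` is a hypothetical name.

```python
from itertools import product

BANGLA_RA = "\u09B0"    # র
ASSAMESE_RA = "\u09F0"  # ৰ
HASANTA = "\u09CD"
SWAP_RA = {BANGLA_RA: ASSAMESE_RA, ASSAMESE_RA: BANGLA_RA}


def ra_variants(label):
    """Labels obtained by swapping র/ৰ at every RA immediately followed by
    Hasanta; the input label itself is not included."""
    slots = [i for i in range(len(label) - 1)
             if label[i] in SWAP_RA and label[i + 1] == HASANTA]
    variants = set()
    for flips in product((False, True), repeat=len(slots)):
        chars = list(label)
        for flip, i in zip(flips, slots):
            if flip:
                chars[i] = SWAP_RA[chars[i]]
        variants.add("".join(chars))
    variants.discard(label)
    return sorted(variants)
```

For অর্ক this yields the single look-alike অৰ্ক; a label with n such positions yields 2^n - 1 variants.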
4 Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) has been formed by members with experience in Linguistics (especially NLP and Computational Linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR for each. However, an attempt is made to ensure that the fundamental philosophy behind building those LGRs is consistent across all Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use Bangla script. The following guiding principles are used in making decisions about Bangla LGR code points.
4.1 Guiding Principles
The NBGP adopts the following broad principles for the selection of code points in the code point repertoire, across the board, for all the Neo-Brāhmī scripts within its ambit.
4.1.1 Inclusion Principles
4.1.1.1 Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode only for transcription or archival purposes will not be considered for inclusion in the code point repertoire.
4.1.1.2 Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.
4.2 Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards which are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.
4.2.1 External Limits of Scope
The code point repertoire for the root zone being a very special case at the top of the protocol hierarchy, the canvas of characters available for selection as part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart
Out of all the characters needed by the script in question, if a particular character is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol
Unicode, being the character-encoding standard that provides the maximum possible representation of a given script/language, has encoded as far as possible all the characters needed by the script. However, domain names being a specialized case, they are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR is the repertoire of characters that will be used for creating Root Zone TLDs, which in turn constitute an even more specialized case of domain names. The Root Zone LGR procedure therefore introduces additional exclusions on IDNA's allowed set of characters. Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), even though it is allowed by the IDNA protocol, is not permitted in the Root Zone Repertoire as per the MSR.
To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. The IDNA Protocol further narrows this down, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
4.2.1.1 No Punctuation Marks
The TLDs being identifiers, punctuation marks present in Brāhmī-based scripts will not be included.
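The three successive filters can be pictured as set narrowing over the Bengali block. The sketch below is illustrative only: stage 2 is a rough proxy for IDNA2008 PVALID (letters, marks and decimal digits that are stable under NFKC), not the RFC 5892 derivation, and stage 3 is seeded only with exclusions named in this document (Avagraha, plus the digits that the colour convention marks as ineligible for the root zone), not the real MSR tables. All names are hypothetical.

```python
import unicodedata

# Stage 1: assigned code points of the Unicode Bengali block (U+0980..U+09FF).
def in_bengali_block(cp):
    return 0x0980 <= cp <= 0x09FF and unicodedata.name(chr(cp), "") != ""

# Stage 2: rough proxy for IDNA2008 PVALID, NOT the full RFC 5892 algorithm.
LETTER_DIGITS = {"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"}

def idna_proxy(cp):
    ch = chr(cp)
    return (unicodedata.category(ch) in LETTER_DIGITS
            and unicodedata.normalize("NFKC", ch) == ch)

# Stage 3: hypothetical root-zone exclusions, seeded from this document only:
# Avagraha (MSR) and Bengali digits U+09E6..U+09EF (ineligible for the root zone).
ROOT_ZONE_EXCLUDED = {0x09BD} | set(range(0x09E6, 0x09F0))

def root_zone_candidates():
    stage1 = {cp for cp in range(0x0980, 0x0A00) if in_bengali_block(cp)}
    stage2 = {cp for cp in stage1 if idna_proxy(cp)}
    stage3 = stage2 - ROOT_ZONE_EXCLUDED
    return stage1, stage2, stage3
```

Each stage is a subset of the one before it: ISSHAR (U+09FA, a symbol) and the atomic RRA (U+09DC, unstable under normalization) drop out at stage 2, and Avagraha survives stage 2 but not stage 3. The NBGP's own principles (4.2.1.1 to 4.2.1.4) would then narrow the set further.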
4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters, like BENGALI ISSHAR ৺ (U+09FA), BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), etc., will also not be included.
4.2.1.3 No Rare and Obsolete Characters
There are characters which have been added to Unicode to accommodate rare forms, such as the Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), as well as VOCALIC LL ৡ (U+09E1) and the allographic kāra forms of the latter two symbols, VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, which complies with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the kāra corresponding to VOCALIC RR ৠ (U+09E0), i.e. VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.
4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.
4.2.1.5 ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).
5 Repertoire
The Bangla writing system is represented in Unicode under the Bengali (Bangla) script name enumerated in ISO 15924, and corresponds to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in Unicode has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in its 8th Meeting of the Indian Language Technologies and Products Sectional Committee, LITD 20, held on 23rd August 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 11 is valid for all three of the above languages.
For each of the code points, language references are given in the last column, titled "Reference", of Table 8, the "Code Point Repertoire". For complete coverage of Bangla code points, references from Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in Bangla script is not known to introduce many new complications, except for one particular character. Though only a few representative languages under EGIDS scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as listed under the Bangla Unicode code points (as given in Unicode 6.3). Before the details are presented, however, it is useful to look at the Bangla Code Point Chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs in the code charts below are based on one of the many available fonts and are not prescriptive, because there could be some variations in actual fonts, both Unicode-compatible and TrueType ones. Consider the following code point table:
Colour convention:
All characters that are included in the [MSR]: yellow background
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background
Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences which enable the conditional inclusion, in the repertoire, of BENGALI LETTER A and E followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, again followed by BENGALI VOWEL SIGN AA, so as to represent the æ sound as in English 'bat', 'cat', etc.
5.2 Code Point Repertoire Exclusion
There are some characters of the Bangla script that are encoded in Unicode but have not been included in the repertoire in this LGR proposal. The reason for excluding ঌ (U+098C) and ৗ (U+09D7) is that they are rare and obsolete characters.
5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire as a stand-alone code point, since it will never be used alone. It will be used only in sequences, for the three special characters ড়, ঢ়, য় in normalized form.
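The "never used alone" condition amounts to a simple context check: every NUKTA in an NFC label must directly follow DDA, DDHA or YA. A minimal sketch; `nukta_ok` and `NUKTA_BASES` are hypothetical names.

```python
NUKTA = "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # ড, ঢ, য -> ড়, ঢ়, য়


def nukta_ok(label):
    """True if every NUKTA directly follows DDA, DDHA or YA."""
    return all(i > 0 and label[i - 1] in NUKTA_BASES
               for i, ch in enumerate(label) if ch == NUKTA)
```

A bare nukta, or a nukta under ক as in the loanword spellings mentioned in 3.3.6, fails this check.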
5.4.2 The Vowel Sequence
In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To help users of other Brāhmī scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel, which may optionally be followed by an Anusvāra onuʃʃār (B), Candrabindu (D) or Visarga biʃɔrgo (X). The number of D, B or X which can follow a V in Bangla may not be restricted to one. Going by the rules illustrated in this document, it is clear that formations such as VDD, VBB and VXX are invalid orthographic units. However, it is valid and possible to have formations or sequences such as an anusvāra followed by a candrabindu on the one hand and a visarga followed by a candrabindu on the other, as in হ্যাংচা 'hæŋchā' and হ্যাঃ 'hæŋ' respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu also exists in Bangla. A vowel can optionally be followed by a combination of Hasanta/Virama [H] + Consonant [C] to form a Ya-phalā. "Ya-phalā" is a presentation form of U+09AF, the Bangla letter য or 'ya', represented by the sequence <U+09CD ্ BENGALI SIGN VIRAMA (Bangla Hasanta), U+09AF য BENGALI LETTER YA>. Ya-phalā has a special form, ্য. When combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. 'aa' (ā)), it is used for transcribing [æ], as in the "a" of the English word "bat", written in Bangla as ব্যাট. A vowel sequence admits the following combinations:
5421 ASingleVowel
Example (V): অ (Devanāgarī अ)
5.4.2.2 A Vowel with Conditions
A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].
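Once a label has been mapped to the WLE symbols of Section 7, the alternatives listed in 5.4.2.1 and 5.4.2.2 can be written as a regular expression over symbol strings. This sketch follows the list above literally (DB and DX, not BD or XD) and does not encode the further restriction of HCM to the অ/এ + ্য + া sequence; `VOWEL_SEQUENCE` and `is_vowel_sequence` are illustrative names.

```python
import re

# Operates on strings of WLE symbols (V, B, D, X, H, C, M), not on raw text.
VOWEL_SEQUENCE = re.compile(r"V(?:B|D|X|DB|DX|HCM)?")


def is_vowel_sequence(symbols):
    return VOWEL_SEQUENCE.fullmatch(symbols) is not None
```

With this pattern, VDD, VBB and VXX are rejected, as the text requires.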
5.4.3.2 A Consonant with Conditions
A consonant is optionally followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
5.4.3.5 Subsets
While considering its subsets, as a representative example we will consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC. [A] The combination may be followed by M, B, D, X, DB or DX.
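The consonant-sequence alternatives of 5.4.3.2 and 5.4.3.5 can likewise be sketched as a pattern over WLE symbol strings. The cluster part is written as C(HC)+ with no upper bound, in line with 3.3.2, where the maximum cluster length is deliberately not enforced. This implements only the alternatives listed in these two subsections; the full ABNF (Section 10.1) may combine them differently, for example M followed by B as in বাং. All names are hypothetical.

```python
import re

CONSONANT_SEQUENCE = re.compile(
    r"C(?:M|B|D|X|H|DB|DX)?"        # 5.4.3.2: a single consonant with conditions
    r"|C(?:HC)+(?:M|B|D|X|DB|DX)?"  # 5.4.3.5: CHC, CHCHC, ... with conditions
)


def is_consonant_sequence(symbols):
    return CONSONANT_SEQUENCE.fullmatch(symbols) is not None
```

For example, CHCM (such as স্থা) matches, whereas CHCH, a cluster ending in a dangling Hasanta, does not.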
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.
Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
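The KHANDA TA condition can be checked in isolation: whenever ৎ is preceded by Hasanta (the C H Z pattern), that consonant must be one of the two RAs. Khanda Ta after a kāra or a consonant without Hasanta (as in হঠাৎ) is not affected by this particular condition. A sketch with hypothetical names.

```python
HASANTA = "\u09CD"
KHANDA_TA = "\u09CE"
RAS = {"\u09B0", "\u09F0"}  # র (Bangla), ৰ (Assamese)


def khanda_ta_cluster_ok(label):
    """If ৎ is preceded by Hasanta (C H Z), that C must be a RA."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] not in RAS:
                return False
    return True
```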
5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full
10 Refer to Rule P in Section 7, Table 16.
vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). For rendering অ and এ followed by Ya-phalā, it is necessary to type U+09CD plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta. But as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
Another case refers to the formation of repha and ra-phalā in the said script, mentioned in the table above as P. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta +
6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusable variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from character formations as normally perceived by the larger linguistic community.
For Group 1, only identical code points are defined as variants. Confusable but not identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with such cases. However, cases which belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviation from the widely accepted norms of Bangla akṣara formation. They can confuse even a careful observer, and hence are proposed as variants.
Variants arise in a script when two or more identical-looking forms have different storage, i.e. different code points. In Bangla, the e-kāra, the ā-kāra and the o-kāra have different code points. One can type o with a consonant in one go, or obtain the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two kos (কো), one typed with a single key and the other typed with two different keys. Nevertheless, this will not be considered a case of variants, because a kāra followed by a kāra is not allowed.
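A further reason the two ways of typing কো need no variant mapping is Unicode normalization: VOWEL SIGN O (U+09CB) canonically decomposes to VOWEL SIGN E + VOWEL SIGN AA, so under NFC, which IDNA requires (see 3.3.6), both keystroke sequences become the same code points. The sketch uses only the standard `unicodedata` module; the constant names are illustrative.

```python
import unicodedata

ONE_KEY = "\u0995\u09CB"         # ক + ো (VOWEL SIGN O)
TWO_KEYS = "\u0995\u09C7\u09BE"  # ক + ে (VOWEL SIGN E) + া (VOWEL SIGN AA)


def same_after_nfc(a, b):
    """True if two strings are identical once both are normalized to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

The same reasoning applies to ৌ (U+09CC), which decomposes to ে + ৗ (U+09C7 U+09D7).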
6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I
As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) with Hasanta appears in a conjunct after স (sa, U+09B8) or ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus
স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus
ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
The above combinations, if written in traditional orthography, can be confusing, because the থ (tha) in the conjunct looks like a হ (ha). The conjunct can occur in initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly as a হ (ha, U+09B9), increasing the risks in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
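Generating the variant labels for CASE I works like the RA case: at each position where স or ন + Hasanta is followed by থ or হ, either letter may appear, and all combinations are enumerated. A sketch under that reading; the variant disposition (blocked or allocatable) is defined in the LGR XML and is not modelled here, and `tha_ha_variants` is a hypothetical name.

```python
from itertools import product

HASANTA = "\u09CD"
THA, HA = "\u09A5", "\u09B9"  # থ, হ
BASES = {"\u09B8", "\u09A8"}  # স, ন
SWAP_TH = {THA: HA, HA: THA}


def tha_ha_variants(label):
    """Labels differing only by থ/হ after স or ন + Hasanta (input excluded)."""
    slots = [i + 2 for i in range(len(label) - 2)
             if label[i] in BASES and label[i + 1] == HASANTA
             and label[i + 2] in SWAP_TH]
    out = set()
    for flips in product((False, True), repeat=len(slots)):
        chars = list(label)
        for flip, j in zip(flips, slots):
            if flip:
                chars[j] = SWAP_TH[chars[j]]
        out.add("".join(chars))
    out.discard(label)
    return sorted(out)
```

For স্থান this yields the single look-alike স্হান; a থ after a consonant other than স or ন produces no variant.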
6.2 Cross-Script Variants
A concise cross-script study for Bangla has been carried out with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia11 (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels on the web. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Certain other characters that are somewhat similar have not been included here; they are provided in the Appendix (10.3) for reference.
7 Whole Label Evaluation Rules (WLE)
This section provides the WLE rules required by all the languages mentioned in Section 3.2 when written in Bangla12 script. The rules have been drafted so that they can easily be translated into the LGR specifications.
11 Unicode uses Oriya for the script, although Odia is now the official term.
12 As used by Unicode, denoting and including both Assamese and Maṇipuri.
Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table in Code Point Repertoire (Section 5.1):
C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), or the (æ) Ya-phalā sequence (V1 H C1 M1), where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA)
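To apply the rules, a label is first rewritten as a string of these symbols. The sketch below assumes a hypothetical symbol table assembled from the Bengali block (the authoritative class membership is Table 8 and the LGR XML), folds NUKTA into the preceding consonant so that ড়, ঢ় and য় each count as one C, and does not collapse the S and P sequences. `to_symbols` is an illustrative name.

```python
# Hypothetical symbol table; class membership follows Table 8 only roughly.
SYMBOL = {}
for cp in (0x0985, 0x0986, 0x0987, 0x0988, 0x0989, 0x098A, 0x098B,
           0x098F, 0x0990, 0x0993, 0x0994):            # অ .. ঔ (no ঌ)
    SYMBOL[chr(cp)] = "V"
for cp in (0x09BE, 0x09BF, 0x09C0, 0x09C1, 0x09C2, 0x09C3, 0x09C4,
           0x09C7, 0x09C8, 0x09CB, 0x09CC):            # া .. ৌ
    SYMBOL[chr(cp)] = "M"
SYMBOL.update({"\u0982": "B", "\u0981": "D", "\u0983": "X",
               "\u09CD": "H", "\u09CE": "Z"})
CONSONANTS = (list(range(0x0995, 0x09A9)) + list(range(0x09AA, 0x09B1))
              + [0x09B2, 0x09B6, 0x09B7, 0x09B8, 0x09B9,
                 0x09F0, 0x09F1])                     # ক .. হ, ৰ, ৱ (assumed)
for cp in CONSONANTS:
    SYMBOL[chr(cp)] = "C"
NUKTA = "\u09BC"


def to_symbols(label):
    """Map a label to WLE symbols; NUKTA is folded into its consonant.
    Raises KeyError for code points outside this sketch's table."""
    return "".join(SYMBOL[ch] for ch in label if ch != NUKTA)
```

For example, বাংলা maps to CMBCM and অ্যাসিড to VHCMCMC, whose leading VHCM is exactly the S sequence defined above.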
Let us now elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in adding such ligatures: any user can easily create them, even if they are not attested combinations.
7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE) (meaning Bank of India). This is the case where two different words are joined together, the first of which ends with an H (অফ্) while the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered enough to represent the intended sound of the first word. This unique situation arises from the absence of the hyphen, the space and the Zero Width Non-Joiner from the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to be allowed to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP.
In future, if required by the prevailing needs of the community, a future NBGP may consider revisiting this rule.
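The prohibition amounts to rejecting any label in which an independent vowel immediately follows Hasanta. A sketch, assuming the independent vowels of the repertoire are অ to ঔ without ঌ; `violates_h_before_v` is a hypothetical name.

```python
HASANTA = "\u09CD"
# Assumed independent vowels in the repertoire (অ .. ঔ without ঌ).
VOWELS = {chr(cp) for cp in (0x0985, 0x0986, 0x0987, 0x0988, 0x0989, 0x098A,
                             0x098B, 0x098F, 0x0990, 0x0993, 0x0994)}


def violates_h_before_v(label):
    """True if an independent vowel directly follows Hasanta (rule 7.1.1)."""
    return any(a == HASANTA and b in VOWELS for a, b in zip(label, label[1:]))
```

Applied to the Bank of India example, the label with অফ্ + ইন্ডিয়া is rejected, while the same label written without the Hasanta after ফ passes.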
7.2 Additional Examples from Bangla ABNF
Below are a few examples which help in understanding some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.
ঃক (Devanāgarī ःक): As can be seen, such a combination will automatically result in a "golu", or dotted circle, marking it as an invalid formation. This is an intrinsic property of the
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink, and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. "What Has Happened So Far in Terms of Script Reforms". Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0, Core Specification, Chapter 12, p. 473.
10 Appendix-I
10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language/script, certain restriction rules apply. In other words, in a given language some of the formalism's structures do not necessarily apply. To take care of such cases, restriction rules are set in place. These restrictions help to fine-tune the ABNF. In the case of Bangla13 in particular, the following rules apply:
1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichuchor' written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).
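Restriction 1 can be sketched as a check on the first character of a label. This models only the three characters the rule bars outright (ৎ, ঞ, ঙ); label-initial য় is described above as unusual but attested, so it is not modelled. `NOT_INITIAL` and `initial_violation` are hypothetical names.

```python
# Characters barred from label-initial position by restriction 1.
NOT_INITIAL = {
    "\u09CE": "KHANDA TA (ৎ)",
    "\u099E": "NYA (ঞ)",
    "\u0999": "NGA (ঙ)",
}


def initial_violation(label):
    """Return a description of the offending first character, or None."""
    if label and label[0] in NOT_INITIAL:
        return NOT_INITIAL[label[0]]
    return None
```

Applied literally, this rule also rejects transliterations such as ৎসেরিং, which is the tension the paragraph above points out.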
10.2 'Sylheti Nāgarī lipi' or 'Siloti'
This version of the Bangla script resembles the 'Kaithī' script (ISO 15924) used by accountants (perhaps by the Kāyastha community) in eastern Uttar Pradesh and Bihar, and was widely in use during the 1880s. There were several other names for Sylheti
13 This section specifically takes up restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri are not covered in this section.
Nāgarī or Siloti (129), such as 'Jalalabada Nāgarī', 'Fula (flower) Nāgarī', 'Muslim Nāgarī' or 'Muhammad Nāgarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).
Table 17: The Script Table of Sylheti Nāgarī or Siloti
10.3 Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.
[Table: confusable conjunct forms built on ক (k), গ (g) and ণ (ṇ)]
Theseepigraphs fromtheeasternpartofUndivided India (datingback to the4th-6thCenturiesAD)showedsomecharacteristicfeaturesofletters(especiallyinমlsquomarsquoলlsquolarsquo
শlsquosarsquoসlsquosarsquoandহlsquoharsquo)whichledtothedevelopmentofeasternvarietyofGuptascriptEpigraphicrecordsfromBangladeshdemonstrateremarkabledevelopmentsinEasternBrāhmī In this context the Tippera copper plate inscription of the lsquoSamatatarsquo rulers(139 pp 265) such as Lokanātha (dated 7th Century AD during the latter half) theKailaninscriptionofSy ridharanaRātaaswellastheAstafpurcopperplatesThelettersseemtohangdownfromwedgeshapedsolidtriangleswithrighthandverticalsbendingdownatthebottombecauseofwhichitwasdescribedbyPrinsepandFleetasKuṭila-lipi (literally lsquoCursivewriting stylersquo)whereas the termSiddhamātrikā (as amatra orbarisplacedovereachoftheletters)wasusedbyAlBiruni(973-1048)todesignatethescriptofNorthernIndiaThenextstageofdevelopmentisillustratedbythe9thCenturycopper plate inscriptions fromKhalimpur of the reign of Dharmapāla fromMonghyrand Nalanda of the time of Devapāla in Bihar and from Jagjıvanpura (Malda) of thereignofMahendrapālaTheSiddhamātrikā(mentionedaslsquoSiddhamrsquoinChinesesources)issaidtohavebeenprevalentalsointhisregionuptotheendofthetenthcenturyAlsocalledtheGauri(ieGandi)inPūrvadeśāortheEasterncountryitwasregardedasthesame script to which is given the appellative Proto-Bangla characteristics inrudimentaryformsintheperiodbetweenAD875andAD1025Insomeepigraphs it isconsideredasbelonging to thesecondquarterof theeleventhcenturyADFlatteningofhead-marksbecomesprominentincomparisontothewedge-shaped serifs An important landmark in the development of the Bangla script is theRamaganja copper plate inscription of Mahāmānḍalika in the last quarter of theeleventhcenturyADItistheearliestdocumentfromthisentireregionwhichbearsthelettermwithatickrisingupwardsThefullvowelidevelopsatickattherightendofthe upper horizontal bar above and a curved hook below Initial e approaches themodernBanglacharacterAmature formofProto-Bangla the 
immediate precursor of the Bangla script, is illustrated in the inscriptions of the Varman, Sena and Deva rulers of the twelfth and thirteenth centuries [104]. The evolution of the Bangla script (cf. [136]) is aligned with the story of the advancement of printing technology. The first ‘movable type’ script was technically created and used while printing Nathaniel Brassey Halhed's (1751-1830) 1778 book titled A Grammar of the Bengal Language. In 1785, Governor-General Warren Hastings (1732-1818) requested another civilian, Charles Wilkins (1749-1836), to cut punches for Bangla printing characters. The current printed form of the Bangla script appeared soon after. It is generally agreed that Wilkins developed the Bangla print script [111]. He passed on this knowledge to Pañcānana Karmakāra (d. 1804), a renowned artist in Bengal. Later it was Karmakāra and his family who became famous in Bangla printing technology. Shepherd was another assistant of Wilkins in this designing of the script, which became more angular, with sharper turns and edges [133]. A few archaic letters were modernized during the 19th
century. It was standardized by Pandit Ishwar Chandra Vidyasagar when Bangla type fonts were to be used to publish on a large scale under the Calcutta School Book Society [116 for several references]. Much later, in 1935, the Linotype technique invented by Ottmar Mergenthaler (1854-1899) in 1886 was introduced into Bangla printing by the efforts of Suresh Chandra Majumdar (1888-1954), Rajsekhar Basu (1880-1960), Jatindra Kumar Sen (1882-1966) and his disciple Sushil Kumar Bhattacharya. It began to be used by the Anandabazar Patrika group and was later followed by others. Within a few years the more advanced Monotype technology came to be used in Bangla printing. However, in Bangla printing culture Monotype had very limited acceptance, and Linotype held the stage until digital technology eventually came in to replace all earlier techniques. All these could be presented in a table.
3.2. Languages Considered
Below is the tabular representation of the languages using the Bangla script that are placed on EGIDS Scale 1-6 (see [117] for details). Some languages under EGIDS 5 and 6 have also developed their own scripts for printing and publishing. Some had used the Bangla script earlier (such as Bodo) or used it in West Bengal at some point of time (Santali), but have later shifted to another writing system: Bodo is now written in Nāgarī or Devanāgarī, and for Santali one uses both Nāgarī/Devanāgarī and Ol Chiki [145]. For the purposes of the Bangla LGR, only languages belonging to EGIDS scale 1 to 4 have been considered. Consider the following table:
EGIDS Scale 1
EGIDS Scale 2
EGIDS Scale 3
EGIDS Scale 4
EGIDS Scale 5
EGIDS Scale 6
Bangla (Bengali)
Santali, Bodo, Riang, Khumi, Mru(ng), Asho
Lepcha, Pnar, Koda, Kora, Chak
Asamiyā (Assamese)
Koch or Rajbanshi
Malto or Malpahariya
Manipuri or Meitei
Bisnupriya Manipuri, Kok-Borok (Tripura & Bangladesh)
Chakma, Hajong, Mundari & Kurux (of Bangladesh)
Toto, Rohingya, Tippera, Megam, Tanchangya
Usoi, Limbu, Sadri or Oraon
Bhumij or Mundari, Bawm, Chin
Table 4: Main languages in India and Bangladesh that use the Bangla script, on the EGIDS Scale
3.3. Notable Features of Bangla Script [150]
The Bangla writing system has certain features that show how it has to be written or how typesetting in Bangla could be done. This section is followed by a section that explains the code points (and fixed code point sequences) which show certain distinctive characteristics of Bangla and which make up the repertoire. The next sections will also cover the ‘akshar’-formation rules (ABNF) showing character classes, Whole Label Evaluation (WLE) and context rules, as well as in-script and cross-script variants. Here we present some basic features of the script and pronunciation.
The Bangla script is an alpha-syllabic writing system in which all
consonants are assumed to contain an accompanying ‘inherent’ vowel (theoretically before or after each consonant). It varies between ɔ and o depending on the position of the consonant in the word. At times these ‘assumed’ or ‘inherent’ vowels are not pronounced at all [142].
Vowels can be written as independent letters or by using a variety of diacritical marks, which are written above, below, before or after the consonant they follow in pronunciation, or on both sides of it (before and after) [105].
When consonants occur together in clusters, special conjunct letters are formed. In printed Bangla many of these consonantal clusters or conjoined consonants are in use. The letters for the consonants other than the final one in the group are generally reduced. But there are a few special conjunct characters which are compounds of the consonant characters, e.g. ক (k) + ষ (ṣ) = ক্ষ (kṣ),
ঞ (ñ) + জ (j) = ঞ্জ (ñj), জ (j) + ঞ (ñ) = জ্ঞ (jñ), হ (h) + ম (m) = হ্ম (hm). There are other issues also: র as the second member of a cluster is reduced to a secondary symbol, e.g.
প (p) + র (r) = প্র (pr), ষ (ṣ) + ট (ṭ) + র (r) = ষ্ট্র (ṣṭr) (as in উষ্ট্র uṣṭra ‘camel’). য (y), when used as a primary symbol, represents jɔ in Bangla. But its secondary symbol (allograph), jɔ-phala, has two phonetic values. When added to the initial consonant in a word, it is the vowel æ (as in শ্যামল (syamala) ‘green’, র‍্যাপার
(ryapara) ‘wrapper’, etc.). But after a non-initial consonant it just doubles that consonant in
pronunciation (as in কার্য, ধার্য, etc.). The র (r) + য (y) combination has two
shape of the second member is changed, e.g. দ্ধ (ddh), গ্ধ (gdh) and ন্ধ (ndh)
respectively. The solitary example of র (r) + ঋ (ṛ) = র্ঋ (as in নৈর্ঋত nairṛt ‘southwest’), used mostly in Classical borrowings, shows the use of the secondary symbol of a consonant followed by the primary symbol of a vowel. The inherent vowel only applies to the final consonant of the cluster.
The Bangla script has at least fifty-two primary symbols and quite a few allographs (positional variants of them), corresponding to forty-four phonemes (7 oral and 7 nasal vowels and 30 consonants) [150], or functional speech sounds, with some obvious redundancies, although in one of the first phonemic analyses the number was thought to be thirty-five phonemes [140].
As mentioned above, in Bangla several graphemic symbols have secondary shapes, technically called ‘allographs’, with a complementary distribution in each case. These graphs or markings are generally added to the following positions of the primary symbol [113] in the following manner:
As for the complementary distribution of vowel letters (word- or syllable-initial) and vowel mātrās, which are relevant for the ABNF, let us consider the following. Besides some simple vowel modifiers called ‘kārs’ in Bangla (also referred to as Mātrā in the other LGR documents of Neo-Brāhmī), there are some combinatory modifiers of Bangla vowels with certain consonants. For example, whereas
আ U+0986 BENGALI LETTER AA is substituted by
া U+09BE BENGALI VOWEL SIGN AA,
ই U+0987 BENGALI LETTER I is substituted by
pre-posed ি U+09BF BENGALI VOWEL SIGN I,
ঈ U+0988 BENGALI LETTER II is substituted by
ী U+09C0 BENGALI VOWEL SIGN II, or
উ U+0989 BENGALI LETTER U is substituted by
ু U+09C1 BENGALI VOWEL SIGN U by marking below the primary
grapheme, there are some special vowel modifiers of উ, as in the following combined letters: গু (gu), rather than writing it as গ (g) + ু (u);
ভ (bh) + র (r) (ভ্রূ bhrū ‘eyebrow’); শ (ś) + র (r) (শ্রু śru); ঋ (ṛ) after হ (h) (হৃ hṛ); etc.
There have been many notable contributions to simplifying and modifying Bangla spellings and combinatory techniques, especially by scholars such as Pabitra Sarkar (1992) [134]. In this there has been an attempt to reduce the number of allographs of both vowels and consonants in clusters, and it has been widely accepted in the printing of school texts in both Bangladesh and West Bengal [151, 152]. As of now, two systems, the old (traditional) and the new, go on side by side, operative in different domains.
However, in preparing this LGR document, the aim has been to consider the widely used and usable sequences and combinations and their variations across the sister scripts belonging to the basket of Brāhmī writing systems. Bangla Academy, Dhaka published the Standard Bangla Spelling Rules in 1992, following the recommendations of a committee constituted through a workshop jointly organized by the Jātīya Śikṣākrama and Pāṭhyapustaka Board in 1988. A thoroughly revised edition of the Rules was published in September 2012.⁶ After the establishment of the Bangla Akademi of West Bengal in 1986, its first President, Annadasankar Ray (1904-2002), in his inaugural address gave a direction for the standardization of the Bangla alphabet, script and spelling system, and clearly argued that they would not blindly follow the Sanskritic model of conventional grammar. A broad list of proposals was sent to experts on Bangla, and a broad agreement was reached on the ‘homogenization of Bangla spelling’ by 1988. Based on opinions received from different quarters, a unanimous list of ‘rules’ was agreed upon. This was published as a ‘Spelling Dictionary’ titled Ākādemi Bānāna Abhidhāna (1997), which was obviously more comprehensive than ‘The University of Calcutta proposals’ made in 1936. Along with the ‘rationalization’ of spellings, another step was taken to make the writing system easier to read by making the symbols used, both single and combined ones, more ‘transparent’. These reforms were originally suggested by Sarkar (1987, first published in 1978) [134][153], where he used the terms Swaccha (‘transparent’) and Aswaccha (‘opaque’ or non-transparent), even adding Ardha Swaccha (‘half transparent’) in between the two. Some sample examples are: Transparent: ন্ন (nn), প্ত (pt), স্ত (st), where both members of the cluster can be recognized.
6 Bangla Academy. 2012. Bāṅlā Ekāḍemī Pramita Bāṅlā Bānānera Niyama (Bangla Academy Standard Bangla Spelling Rules). Dhaka: Bangla Academy.
There were, in fact, two types of proposals. One concerned the shape of the letters: those of consonant + vowel (CV) combinations and of conjuncts, which are consonant + consonant combinations. There were further complex shapes, i.e. those of consonant + consonant + (consonant +) vowel (CC(C)V) signs, as in প্রু (pru) or স্ক্রু (skru). Some decisions in this area were necessary because a few of the CC(C) symbols represented complexities that made learning them difficult for children. The other type dealt with the spellings of words only, without any reference to the shapes of the letters in which they were written. The basic objective here was ‘one word, one spelling’ to the greatest extent possible [151].
Below we place a statement of the most salient changes that affect the consonant + vowel combinations [153]:
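a. The variants of the short u (হ্রস্ব উ-কার, hrasva u-kāra) vowel sign have been brought down to one, i.e. ু. So the traditional ligature for গু (gu) is now written transparently as গ + ু; similarly রু (ru) as র + ু, শু (śu) as শ + ু and হু (hu) as হ + ু; and therefore, for a cluster + short u sign, ন্তু (ntu) is written as ন্ত + ু (ন + ্ + ত + উ)
and স্তু (stu) as স্ত + ু (স + ্ + ত + উ).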
b. The variants of long u (দীর্ঘ ঊ-কার, dīrgha ū-kāra) have also been reduced:
রূ (rū) is written as র + ূ; ভ্রূ (bhrū) as ভ্র + ূ (ভ bh + ্ + র r + ঊ ū); দ্রূ (drū) as দ্র + ূ (দ d + ্ + র r + ঊ ū); শ্রূ (śrū) as শ্র + ূ (শ ś + ্ + র r + ঊ ū).
c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been brought down to one: হৃ (hṛ) is written as হ + ৃ.
Regarding consonant + consonant + (consonant) … + (vowel) clusters, the Paschimbanga Bangla Akademi proposed transparent or semi-transparent shapes for clusters to the extent admissible in the Bangla writing system. Some examples will clarify the proposal (a slash means that the traditional cluster shape precedes it while the Bangla Akademi innovation follows) [153].
3.3.1. The Consonants
As per traditional classification, Bangla consonants are categorized according to their phonetic properties, especially in terms of place and manner of articulation [107]. There are five ‘Vargas’ (pronounced ‘Barga’ in Bangla), or groups (sets or classes), distinguished by place of articulation, and one non-‘varga’ group [105]. Each varga, which corresponds to stops at a certain place of articulation, contains a series of five consonants classified as per their phonetic qualities (i.e. manner of articulation), beginning from unvoiced and unaspirated, moving to voiced and aspirated (in the fourth column), and finally ending with a homorganic or corresponding nasal [107]. Consider the following table:
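The five vargas map onto contiguous runs of five code points in the Unicode Bengali block. There is one unassigned gap at U+09A9, between the dental and labial rows. Because of this layout, the varga grid can be sketched directly from code points. The row labels below (`velar`, `palatal`, …) are descriptive names chosen only for this illustration; they are not identifiers taken from the LGR.

```python
import unicodedata
from typing import Dict, List

# First code point of each varga row in the Unicode Bengali block.
# Each row runs: unvoiced unaspirated, unvoiced aspirated, voiced unaspirated,
# voiced aspirated, homorganic nasal. U+09A9 is unassigned, so the labial
# row starts at U+09AA.
VARGA_STARTS = {
    "velar": 0x0995,      # ক খ গ ঘ ঙ
    "palatal": 0x099A,    # চ ছ জ ঝ ঞ
    "retroflex": 0x099F,  # ট ঠ ড ঢ ণ
    "dental": 0x09A4,     # ত থ দ ধ ন
    "labial": 0x09AA,     # প ফ ব ভ ম
}


def varga_table() -> Dict[str, List[str]]:
    """Return the 5 x 5 grid of varga consonants, keyed by place of articulation."""
    return {place: [chr(start + i) for i in range(5)]
            for place, start in VARGA_STARTS.items()}


if __name__ == "__main__":
    for place, row in varga_table().items():
        names = [unicodedata.name(ch).replace("BENGALI LETTER ", "") for ch in row]
        print(f"{place:10} {' '.join(row)}  ({', '.join(names)})")
```

The non-varga consonants (য, র, ল, শ, ষ, স, হ and others) fall outside these five runs and are not covered by the sketch.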
3.3.2. The Implicit Vowel Killer Hasanta (called ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts)
As stated earlier, all consonants are pronounced in isolation with an implicit vowel (the central-back ɔ in Bangla, as the neutral vowel) assumed to be associated with them [121]. The terms ‘Hasanta’ (= ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts) and ‘Virāma’⁷ (= ‘Dari’ in Bangla), the latter as preferred in Unicode (cf. Unicode 3.0 and above), are used in this report to denote the character that marks the absence of this inherent vowel. It may be noted that the term virāma has been adopted in Unicode in a sense that differs from the traditional grammatical definition, and hence it requires some explanation here. Considering the importance of the document, this note should be a part of this LGR document, so that anybody referring to it is able to know the proper grammatical explanation of the term. Because a special sign is needed whenever this implicit vowel is stripped off, the symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster, one could, in common parlance, ‘kill’ its vowel and create conjuncts. In this manner, conjunct characters can generally be written by joining two to four consonants. In rare cases this process can join up to five consonants. However, the notion of a maximum number of consonants joining to form one akṣara⁸ can only be bounded empirically. This is an observation based on the CIIL-EMILLE corpora of Bangla words [132 & 133] as seen in print these days. Given the mixture of scripts and languages happening on the web, the possibility that one may want a generic Top Level Domain (gTLD) which has more than the observed maximum cannot be ruled out. This can be the case when a foreign-language word which admits a large number of consonants is transliterated into Bangla. Hence, in the Bangla LGR work, this limit will not be enforced.
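The cluster structure described above can be sketched with a regular expression: a consonant followed by any number of Hasanta + consonant links. The pattern deliberately sets no upper bound on cluster length. Treating the whole range U+0995-U+09B9 as "consonant" is a simplifying assumption, because that range also contains a few unassigned code points. The sketch also ignores Nukta and Khanda Ta. The helper `cluster_sizes` is hypothetical and is not part of the LGR.

```python
import re
from typing import List

HASANTA = "\u09CD"
# Simplifying assumption: treat U+0995..U+09B9 as the consonant class.
CONSONANT = "[\u0995-\u09B9]"
# Consonant, then zero or more (Hasanta + consonant) links. There is no upper
# bound, mirroring the decision not to enforce a maximum cluster length.
CLUSTER = re.compile(CONSONANT + "(?:" + HASANTA + CONSONANT + ")*")


def cluster_sizes(word: str) -> List[int]:
    """Number of consonants in each consonant cluster of `word`, left to right."""
    return [m.group().count(HASANTA) + 1 for m in CLUSTER.finditer(word)]
```

For example, `cluster_sizes("উষ্ট্র")` gives `[3]`: the initial vowel is skipped and ষ + ্ + ট + ্ + র forms one cluster. `cluster_sizes("চক্র")` gives `[1, 2]`.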
3.3.3. Vowels
Separate symbols exist for all ‘Swaras’ or vowels in Bangla, which are pronounced independently, either at the beginning of the word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign called ‘kār’ in Bangla, or Mātrā in Nāgarī⁹, is attached to the consonant. Since the consonant has this built-in neutral vowel at the end, there are equivalent kāras (mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:
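The correlation between independent vowels and kāras can also be derived from Unicode character names. Each independent vowel named "BENGALI LETTER X" has a kāra named "BENGALI VOWEL SIGN X", except অ, whose vowel is the inherent one. The sketch below relies on that naming pattern, which is an assumption about Unicode names rather than something this LGR specifies. It is restricted to the eleven common vowels.

```python
import unicodedata
from typing import Optional

# Independent vowels অ আ ই ঈ উ ঊ ঋ এ ঐ ও ঔ, given by code point.
VOWELS = [chr(cp) for cp in (0x0985, 0x0986, 0x0987, 0x0988, 0x0989,
                             0x098A, 0x098B, 0x098F, 0x0990, 0x0993, 0x0994)]


def kara_for(vowel: str) -> Optional[str]:
    """Return the dependent vowel sign (kāra) for an independent vowel, or None."""
    sign_name = unicodedata.name(vowel).replace("BENGALI LETTER", "BENGALI VOWEL SIGN")
    try:
        return unicodedata.lookup(sign_name)
    except KeyError:
        return None  # no "BENGALI VOWEL SIGN A": অ is the inherent vowel


if __name__ == "__main__":
    for v in VOWELS:
        k = kara_for(v)
        shown = f"{k} U+{ord(k):04X}" if k else "(inherent vowel, no kāra)"
        print(f"{v} U+{ord(v):04X} -> {shown}")
```

Running the script lists ten vowel/kāra pairs, for example আ → া (U+09BE) and ই → ি (U+09BF). অ is reported as having no kāra.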
7 Virāma as used here is also a misnomer according to the Indian grammatical traditions. Nowhere is the mere absence of a vowel marked as virāma; Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
8 This term needs to be disambiguated: akṣara also means ‘syllable’ in Indian grammatical traditions.
9 The term ‘Mātrā’ in Bangla, however, stands for an altogether different concept, viz. the top bar placed over a letter, typically found in Hindi and Bangla but missing in Gujarati.
3.3.4. The Anusvāra onuʃʃār (ং, U+0982)
The Anusvāra or onuʃʃār in Bangla at times represents a homorganic nasal, but not always. It replaces a conjunct group of ‘nasal consonant + Hasanta + consonant’ where the second consonant belongs to the velar varga or set, as in লংকা. But it often also appears for combinations in which a non-velar is the last member, as in ল্যাংটা ‘naked’ or ল্যাংচা ‘a kind of sweet; to limp’. Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined writing symbol representing the corresponding nasal consonant of the particular set. Although modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal, in Bangla it is clearly demarcated where one must use the Anusvāra and where there has to be a conjunct cluster with a nasal as the first or the second component.
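The two ways of writing the velar nasal, with the Anusvāra or as a nasal conjunct, produce different code point strings. Unicode normalization does not unify them. The sketch below checks this for লংকা and লঙ্কা. It makes no claim about how the LGR itself treats the pair; it only shows that any equivalence would have to come from an explicit rule, not from NFC. The names `WITH_ANUSVARA`, `WITH_CONJUNCT` and `code_points` are illustrative.

```python
import unicodedata
from typing import List

# লংকা : LA + ANUSVARA + KA + VOWEL SIGN AA
WITH_ANUSVARA = "\u09B2\u0982\u0995\u09BE"
# লঙ্কা : LA + NGA + HASANTA + KA + VOWEL SIGN AA
WITH_CONJUNCT = "\u09B2\u0999\u09CD\u0995\u09BE"


def code_points(s: str) -> List[str]:
    """List the code points of `s` in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in s]


if __name__ == "__main__":
    print(code_points(WITH_ANUSVARA))
    print(code_points(WITH_CONJUNCT))
    # NFC leaves both spellings as they are, so they remain distinct strings.
    print(unicodedata.normalize("NFC", WITH_ANUSVARA)
          == unicodedata.normalize("NFC", WITH_CONJUNCT))
```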
3.3.5. Nasalization: Candrabindu (ঁ, U+0981)
Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cãd ‘moon’ (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].
3.3.6. Nukta (়, U+09BC)
The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi). The term and the concept of nukta are borrowed in Bangla. The IDNA protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, the Bangla letters য় YYA, ড় RRA and ঢ় RHA have to be represented in this LGR by the sequences (YA + Nukta, U+09AF + U+09BC), (DDA + Nukta, U+09A1 + U+09BC) and (DDHA + Nukta, U+09A2 + U+09BC), instead of the single code points YYA (U+09DF), RRA (U+09DC) and RHA (U+09DD), although the use of ‘Nukta’ is otherwise completely unnatural in Bangla.
It is noted that in the current Unicode Standard chart these characters are listed as additional consonants. As per the LGR procedure, however, these decisions depend on the IDNA protocol, through a set of procedures developed by the IETF. Even though the Unicode Standard also provides these three characters as atomic characters (for example, 09DC for ড় [ṛ], 09DD for ঢ় [ṛh] and 09DF for য় [y], each typed with a single keystroke), the IDNA protocol requires that we treat them as conjunct characters and then allocate codes for these in the Unicode Bengali block.
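The normalization behaviour described above can be checked with Python's standard `unicodedata` module. The three precomposed letters are composition exclusions. NFC therefore turns each of them into a base + Nukta sequence and never recomposes that sequence. The dictionary name is illustrative.

```python
import unicodedata

# Precomposed letter -> the sequence that must appear in an NFC label.
PRECOMPOSED_TO_SEQUENCE = {
    "\u09DC": "\u09A1\u09BC",  # ড় RRA -> DDA + NUKTA
    "\u09DD": "\u09A2\u09BC",  # ঢ় RHA -> DDHA + NUKTA
    "\u09DF": "\u09AF\u09BC",  # য় YYA -> YA + NUKTA
}

if __name__ == "__main__":
    for atomic, sequence in PRECOMPOSED_TO_SEQUENCE.items():
        nfc = unicodedata.normalize("NFC", atomic)
        print(f"U+{ord(atomic):04X} {unicodedata.name(atomic)} -> "
              + " + ".join(f"U+{ord(c):04X}" for c in nfc),
              "(as expected)" if nfc == sequence else "(unexpected)")
```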
It may be noted that there could be sporadic attempts to write Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph), only for the sake of correct pronunciation and to maintain the sanctity of the loanword. This amounts to using the Bangla writing system like a phonetic (IPA-style) script. It is, however, not in use in Bangla writing or printing.
3.3.7. Visarga biʃɔrgo (ঃ, U+0983) and Avagraha (ঽ, U+09BD)
The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. One could quote as an example দুঃখ duhkho ‘sorrow’, ‘unhappiness’ (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in the Bangla script. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো’পরাণি). It is now rarely used, even in other languages using the Bangla script. In the LGR, the Avagraha is not part of the repertoire: it has been decided not to retain Avagraha (ঽ, U+09BD) because it is blocked in TLDs as per the Maximal Starting Repertoire (MSR). Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.
type in the following manner: র + ্ + য; but for র‍্য the sequence would be
র + ZWJ + ্ + য [154]. In other words, ZWJ is used in the rendering of words demanding ya-phalā after ra, which are otherwise not possible to type (render), because the same order of ra + hasanta + antastha ja occurs in the medial and/or final position. Interestingly, ra + hasanta + antastha ja is used to type repha on the consonant antastha ja, as in কার্য (kaarjo). In order to get a ya-phalā after the
consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার.
The use of ZWJ/ZWNJ has been ruled out from the root zone by the [Procedure]. These two signs are used in Bangla to create alternate renderings, and their insertion can affect searching as well as NLP. The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly restricted, and the Hasanta joining the two consonants participating in the conjunct formation needs to be explicitly shown.
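The two renderings of র + য differ only by the invisible ZWJ. For that reason, excluding the joiners from the root zone also excludes the র‍্য form. The sketch below builds both sequences and applies a simple joiner check. The function `joiner_free` is a hypothetical stand-in for that exclusion and is not an API from the LGR tooling.

```python
RA, HASANTA, YA = "\u09B0", "\u09CD", "\u09AF"
ZWJ, ZWNJ = "\u200D", "\u200C"

# RA + HASANTA + YA renders as repha over ya (as in কার্য).
REPHA_ON_YA = RA + HASANTA + YA
# RA + ZWJ + HASANTA + YA renders as ra followed by ya-phalā (as in র‍্যাপার).
RA_WITH_YA_PHALA = RA + ZWJ + HASANTA + YA


def joiner_free(label: str) -> bool:
    """Hypothetical root-zone check: reject any label containing ZWJ or ZWNJ."""
    return ZWJ not in label and ZWNJ not in label
```

Under this check, কার্য (`"\u0995\u09BE" + REPHA_ON_YA`) passes and র‍্যাপার is rejected. Stripping the joiner does not rescue the label: `RA_WITH_YA_PHALA.replace(ZWJ, "")` is exactly `REPHA_ON_YA`, so it would render with a repha instead.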
3.3.9. Use of Ya-phalā-ā Sequences
There are two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E).
To render ya-phalā after অ and এ, it is necessary to type U+09CD HASANTA plus U+09AF YA preceded by the said vowels. This is a purely ligatural entity, and the addition of ya-phalā and ā-kāra is used to elicit the æ sound, as in English ‘acid’ অ্যাসিড,
‘association’ অ্যাসোসিয়েশন, ‘bat’ ব্যাট, ‘fat’ ফ্যাট, ‘mat’ ম্যাট, ‘cap’ ক্যাপ, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant. Only consonants can have the Hasanta marked. But as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka ‘the sun’); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra ‘cycle’). The point is that in both cases the slot for ra could be the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed/preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases. The LGR makes a note of this point of concern with respect to the two RAs in disguise: it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may consequently lead to spoofing and other kinds of cyber irregularities. These two code points are classed as (blocked) variants because fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of a following Hasanta. The difference between the RAs is distinguishable only if one looks at their Unicode values. Therefore, labels such as অর্ক arka, শীর্ষ śīrṣa ‘top, apex’, অভ্র abhra ‘cloud, the sky’ and শ্রম śrama ‘physical labour’ could be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode code points. This point is made explicit with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47), which are to follow. Moreover, it
4. Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) has been formed by members with experience in linguistics (especially NLP and computational linguistics), literature, language, history and epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR assigned to each. However, an attempt is made to ensure that the fundamental philosophy behind building those LGRs is consistent across all Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about Bangla LGR code points.
4.1. Guiding Principles
The NBGP adopts the following broad principles for the selection of code points in the code point repertoire across the board for all the Neo-Brāhmī scripts within its ambit.
4.1.1. Inclusion Principles
4.1.1.1. Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode only for transcription or archival purposes will not be considered for inclusion in the code point repertoire.
4.1.1.2. Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.
4.2. Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards which are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.
4.2.1. External Limits of Scope
The code point repertoire for the root zone is a very special case at the top of the protocol hierarchy, so the canvas of characters available for selection as part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart
Out of all the characters needed by the script in question, if a particular character is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol
Unicode, being the character-encoding standard that provides the maximum possible representation of a given script/language, has encoded, as far as possible, all the characters needed by the script. However, domain names being a specialized case, they are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR is the repertoire of characters that will be used to create root zone TLDs, which in turn are an even more specialized case of domain names. The Root Zone LGR procedure therefore introduces additional exclusions on the IDNA's allowed set of characters. Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), even though allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the MSR.
To sum up, the restrictions start by admitting only such characters as are part of the code block of the given script/language. The IDNA protocol further narrows this down, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
4.2.1.1. No Punctuation Marks
The TLDs being identifiers, punctuation marks present in Brāhmī-based scripts will not be included.
4.2.1.2. No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters, like BENGALI ISSHAR ৺ (U+09FA), BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), etc., will also not be included.
4.2.1.3. No Rare and Obsolete Characters
Some characters have been added to Unicode to accommodate rare forms, such as the Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), as well as VOCALIC LL ৡ (U+09E1), and the allographic kāra forms of the latter two symbols: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, in compliance with the Conservatism principle laid down in the Root Zone LGR procedure. However, in Bangla the kāra corresponding to VOCALIC RR ৠ (U+09E0), which is VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.
4.2.1.4. No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle laid down in the Root Zone LGR procedure.
4.2.1.5. ABNF
The Augmented Backus-Naur Form (ABNF) is described in Section 5.4.1 and the Appendix (Section 10.1).
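The successive filters of Section 4.2.1, together with the exclusions in 4.2.1.1-4.2.1.3, can be sketched as a pass over the Bengali block. This is a simplification under stated assumptions. The IDNA2008 derived-property step is not reproduced; the general-category checks only approximate some of its effect. Only the characters named above are excluded explicitly. The result is therefore a superset of the proposed repertoire: for example, Nukta (U+09BC) and ৗ (U+09D7) survive here and are removed by the rules in Section 5. All names in the code are illustrative.

```python
import unicodedata
from typing import List

BENGALI_BLOCK = range(0x0980, 0x0A00)

# Rare/obsolete characters named in 4.2.1.3 (ৠ ঌ ৡ ৢ ৣ).
RARE_OR_OBSOLETE = {0x09E0, 0x098C, 0x09E1, 0x09E2, 0x09E3}
# Example of an MSR-level exclusion: BENGALI SIGN AVAGRAHA (ঽ).
MSR_EXCLUDED = {0x09BD}


def candidate_repertoire() -> List[int]:
    """Apply simplified successive filters to the Bengali block (sketch only)."""
    kept = []
    for cp in BENGALI_BLOCK:
        category = unicodedata.category(chr(cp))
        if category == "Cn":        # filter i: not encoded in Unicode
            continue
        if category[0] in "PSN":    # punctuation, symbols, digits and other numbers
            continue
        if cp in RARE_OR_OBSOLETE or cp in MSR_EXCLUDED:
            continue
        kept.append(cp)
    return kept
```

U+09C4 (VOWEL SIGN VOCALIC RR) is kept, because it is a combining mark that is not on the explicit exclusion list. U+09FA and U+09F9 are dropped by the symbol and number category checks.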
5. Repertoire
The Bangla writing system is represented in Unicode under the Bengali (Bangla) script name, as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in Unicode has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26 February 2016 for the disunification of the Bangla and Asamiyā scripts. The BIS, at the 8th meeting of its Indian Language Technologies and Products Sectional Committee (LITD 20), held on 23 August 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, it will be assumed that the code point repertoire under Table 11 is valid for all three languages mentioned above.
For each of the code points, language references are given in the last column, titled Reference, of Table 8, the “Code Point Repertoire”. For complete coverage of Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced many new complications, except for one particular character. Though only a few representative languages under EGIDS Scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under the Bangla Unicode points (as given in Unicode 6.3). However, before the details are presented, it is ideal to look at the Bangla code point chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs given below in the code charts are based on one of the many fonts designed and are not prescriptive, because there could be some variation in actual fonts, both Unicode-compatible and TrueType ones. Consider the following code point table:
Colour convention:
All characters that are included in the [MSR]: yellow background
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background
Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences that enable conditional inclusion in the repertoire of BENGALI LETTER A and E followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, again followed by BENGALI VOWEL SIGN AA, so that the æ sound, as in English ‘bat’, ‘cat’, etc., can be written.
5.2. Code Point Repertoire Exclusion
Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire in the LGR proposal. The reason for excluding ঌ (U+098C) and ৗ (U+09D7) is that they are rare and obsolete characters.
5.3. Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it will never be used alone. It will be used in sequences for three special characters in normalized form: ড়, ঢ়, য়.
5.4.2. The Vowel Sequence
In what follows, the vowel sequence and the consonant sequence pertinent to Bangla are given. To facilitate understanding for users of other Brāhmī scripts, Devanāgarī equivalents are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra onuʃʃār (B), a Candrabindu (D) or a Visarga biʃɔrgo (X). The number of D, B or X that can follow a V in Bangla may not be restricted to one. Going by the rules illustrated in the document, it is clear that formations such as VDD, VBB and VXX are invalid orthographic units. However, it is valid and possible to have formations or sequences such as an anusvāra followed by a candrabindu on the one hand, and a visarga followed by a candrabindu on the other, as in হ্যাংচা ‘hænchā’ and হ্যাঃ ‘hæn’ respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu also exists in Bangla. A vowel can optionally be followed by a combination of Hasanta/Virāma [H] and Consonant [C] to form a ya-phalā. Ya-phalā is a presentation form of U+09AF BENGALI LETTER YA (য, ‘ya’), represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Hasanta), U+09AF য BENGALI LETTER YA>. Ya-phalā has a special form, ্য. When further combined with U+09BE া BENGALI VOWEL SIGN AA (‘aa’, ā), it is used to transcribe [æ], as in the ‘a’ of the English word ‘bat’, written in Bangla as ব্যাট. A vowel sequence admits the following combinations:
5.4.2.1. A Single Vowel
Example: V: অ (Devanāgarī अ)
5.4.2.2. A Vowel with Conditions
A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virāma) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].
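The options listed in 5.4.2.2 can be written as a single regular expression: V, optionally followed by B, D, X, DB, DX or H C M. The sketch follows that list literally. It therefore rejects VBB, VDD and VXX, and it does not narrow the H C M branch to the ya-phalā case the way the special sequence S does. The character ranges are simplifying assumptions (they include a few unassigned or excluded code points), and `is_vowel_sequence` is a hypothetical helper.

```python
import re

# Simplified character classes.
V = "[\u0985-\u0994]"   # independent vowels
C = "[\u0995-\u09B9]"   # consonants
M = "[\u09BE-\u09CC]"   # dependent vowel signs (kāras)
B, D, X, H = "\u0982", "\u0981", "\u0983", "\u09CD"  # ং ঁ ঃ ্

# V optionally followed by exactly one of: B | D | X | DB | DX | HCM (5.4.2.2).
VOWEL_SEQUENCE = re.compile(
    V + "(?:" + "|".join([B, D, X, D + B, D + X, H + C + M]) + ")?"
)


def is_vowel_sequence(s: str) -> bool:
    """True if `s` as a whole matches the vowel-sequence options of 5.4.2.2."""
    return VOWEL_SEQUENCE.fullmatch(s) is not None
```

For example, আঁ and অ্যা are accepted, while অংং (VBB) is rejected.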
5.4.3.2. A Consonant with Conditions
A consonant is optionally followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virāma) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
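The rule in 5.4.3.2 can be sketched the same way: C, optionally followed by one of M, B, D, X, H, DB or DX. On its own, this rule does not accept a kāra followed by an Anusvāra, as in the first syllable of বাংলা. It is only one piece of the full sequence grammar given in the ABNF appendix. The character ranges are simplified assumptions, and `is_consonant_with_conditions` is a hypothetical name.

```python
import re

# Simplified character classes.
C = "[\u0995-\u09B9]"   # consonants
M = "[\u09BE-\u09CC]"   # dependent vowel signs (kāras)
B, D, X, H = "\u0982", "\u0981", "\u0983", "\u09CD"  # ং ঁ ঃ ্

# C optionally followed by exactly one of: M | B | D | X | H | DB | DX (5.4.3.2).
CONSONANT_WITH_CONDITIONS = re.compile(
    C + "(?:" + M + "|" + "|".join([B, D, X, H, D + B, D + X]) + ")?"
)


def is_consonant_with_conditions(s: str) -> bool:
    """True if `s` as a whole matches the options listed in 5.4.3.2."""
    return CONSONANT_WITH_CONDITIONS.fullmatch(s) is not None
```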
5.4.3.5. Subsets
When considering its subsets, as a representative example we will consider only the combination CHC; however, the same is equally applicable to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’.
Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
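The KHANDA TA condition can be sketched as a context check. Wherever ৎ follows a Hasanta, the consonant before that Hasanta must be র (U+09B0) or ৰ (U+09F0). A ৎ that does not follow a Hasanta, as in word-final হঠাৎ, is outside the scope of this particular check. The function name `khanda_ta_ok` is hypothetical.

```python
RA_BANGLA, RA_ASSAMESE = "\u09B0", "\u09F0"
HASANTA, KHANDA_TA = "\u09CD", "\u09CE"


def khanda_ta_ok(label: str) -> bool:
    """Check the CHZ context: a ৎ after a Hasanta must have র or ৰ as the C."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] not in (RA_BANGLA, RA_ASSAMESE):
                return False
    return True
```

With this check, ভর্ৎসনা is accepted both with the Bangla র and with the Assamese ৰ, while স্ৎ is rejected.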
5.4.5. Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full
10 Refer to Rule P in Section 7, Table 16.
vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). To render ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya preceded by the said vowels. This is a purely ligatural entity, and the addition of ya-phalā and ā-kāra is used to elicit the æ sound, as in English ‘bat’, ‘fat’, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant. Only consonants can have the Hasanta marked. But as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]). The other case, P, refers to the formation of repha and ra-phalā in the script, as mentioned in the table above. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta +
6. Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusable variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from the character formations normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. Confusable but non-identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with such cases. However, cases that belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviation from the widely accepted norms of Bangla akṣara formation. They can confuse even a careful observer and are therefore proposed as variants.
Variants arise in a script when two or more identical-looking forms are stored with different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or type e-kāra and ā-kāra as two separate keys and get the same result. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other with two different keys. However, this will not be considered a case of variants, because a kāra followed by a kāra is not allowed.
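A further observation supports not treating the two কো forms as variants. U+09CB BENGALI VOWEL SIGN O has the canonical decomposition U+09C7 + U+09BE and is not a composition exclusion. NFC, which IDNA labels must satisfy, therefore composes the two-key form into the single code point. The variable names below are illustrative.

```python
import unicodedata

KA, E_KARA, AA_KARA, O_KARA = "\u0995", "\u09C7", "\u09BE", "\u09CB"

single_key = KA + O_KARA            # কো typed with VOWEL SIGN O
two_keys = KA + E_KARA + AA_KARA    # কো typed as VOWEL SIGN E + VOWEL SIGN AA

if __name__ == "__main__":
    print(single_key == two_keys)                                # False
    print(unicodedata.normalize("NFC", two_keys) == single_key)  # True
```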
6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I: As far as true variants in Bangla are concerned, we may draw our attention to cases wherein থ (tha, U+09A5) appears with Hasanta in a conjunct with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus
স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus
ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
The above combinations, if written in traditional orthography, could be a little confusing, as the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in the initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly, thinking it was a হ (ha, U+09B9), increasing the risks in label writing and identification. Examples: 1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthula, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
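A minimal sketch of how the CASE I pairs could be expanded into a label's in-script variant set, assuming a simple swap of থ (U+09A5) and হ (U+09B9) whenever either directly follows স + Hasanta or ন + Hasanta. The function name `in_script_variants` is hypothetical; the LGR itself expresses variants in its XML rules, which this sketch does not reproduce.

```python
from itertools import product

HASANTA = "\u09cd"
THA, HA = "\u09a5", "\u09b9"   # থ, হ
SA, NA = "\u09b8", "\u09a8"    # স, ন
SWAP = {THA: HA, HA: THA}


def in_script_variants(label: str) -> set[str]:
    """All labels reachable by swapping থ/হ inside স্থ, স্হ, ন্থ, ন্হ."""
    options = []
    for i, ch in enumerate(label):
        if (ch in SWAP and i >= 2 and label[i - 1] == HASANTA
                and label[i - 2] in (SA, NA)):
            options.append((ch, SWAP[ch]))   # this position can go either way
        else:
            options.append((ch,))            # this position is fixed
    return {"".join(p) for p in product(*options)}


sthana = "\u09b8\u09cd\u09a5\u09be\u09a8"   # স্থান sthāna
for v in sorted(in_script_variants(sthana)):
    print(" ".join(f"U+{ord(c):04X}" for c in v))
```

A label with n such positions yields 2^n variants, including the original label.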
6.2 Cross-Script Variants
A thorough cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusions they may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Although there are certain characters which are somewhat similar, they have not been included here; they have been provided in the Appendix (10.3) for reference.
7 Whole Label Evaluation Rules (WLE)
This section provides the WLEs that are required by all the languages mentioned in section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications.
¹¹ Unicode uses Oriya for the script, although Odia is now the official term. ¹² As used by Unicode, denoting and including both Assamese and Maṇipuri.
Below are the symbols used in the WLE rules for each Indic Syllabic Category, as mentioned in the table provided under Code point repertoire (Section 5.1):
C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1/S2 (from Table 9), or the (æ) Ya-phalā sequence (V1 H C1 M1), where V1 is either 0985 (অ - BENGALI LETTER A) or 098F (এ - BENGALI LETTER E), H is 09CD (্ - BENGALI SIGN VIRAMA), C1 is 09AF (য - BENGALI LETTER YA) and M1 is 09BE (া - BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র - BENGALI LETTER RA) or 09F0 (ৰ - ASSAMESE LETTER RA).
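The symbol list above can be applied mechanically. The following sketch maps a label to its WLE symbol string; the character sets are assumptions derived from the Unicode Bengali block rather than copied from the normative repertoire table, and `to_symbols` is a hypothetical helper. S is matched before single code points so that অ্যা/এ্যা is read as one unit, and P is matched whenever র or ৰ is directly followed by Hasanta.

```python
# Assumed symbol classes (not the LGR XML itself).
HASANTA, NUKTA, YA, AA_KARA = 0x09CD, 0x09BC, 0x09AF, 0x09BE
VOWELS = set(range(0x0985, 0x098C)) | {0x098F, 0x0990, 0x0993, 0x0994}
KARAS = set(range(0x09BE, 0x09C5)) | {0x09C7, 0x09C8, 0x09CB, 0x09CC}
CONSONANTS = (set(range(0x0995, 0x09A9)) | set(range(0x09AA, 0x09B1))
              | {0x09B2, 0x09B6, 0x09B7, 0x09B8, 0x09B9, 0x09F0, 0x09F1})
SINGLE = {0x0982: "B", 0x0981: "D", 0x0983: "X", HASANTA: "H", 0x09CE: "Z"}
S_VOWELS = {0x0985, 0x098F}   # V1: অ, এ
RAS = {0x09B0, 0x09F0}        # C2: র, ৰ


def to_symbols(label: str) -> str:
    """Map a label to its WLE symbol string."""
    cps = [ord(c) for c in label]
    out = []
    i = 0
    while i < len(cps):
        cp = cps[i]
        # S: V1 H C1 M1, i.e. অ/এ + ্ + য + া
        if cp in S_VOWELS and cps[i + 1:i + 4] == [HASANTA, YA, AA_KARA]:
            out.append("S")
            i += 4
            continue
        # P: C2 H, i.e. র/ৰ + ্
        if cp in RAS and cps[i + 1:i + 2] == [HASANTA]:
            out.append("P")
            i += 2
            continue
        if cp in VOWELS:
            out.append("V")
        elif cp in KARAS:
            out.append("M")
        elif cp in CONSONANTS:
            out.append("C")
        elif cp in SINGLE:
            out.append(SINGLE[cp])
        elif cp == NUKTA and out and out[-1] == "C":
            pass  # ড়/ঢ়/য় sequences: the nukta stays part of the consonant
        else:
            raise ValueError(f"U+{cp:04X} is outside the classes of this sketch")
        i += 1
    return "".join(out)


print(to_symbols("\u0985\u09b0\u09cd\u0995"))                    # অর্ক: VPC
print(to_symbols("\u0985\u09cd\u09af\u09be\u09b8\u09bf\u09a1"))  # অ্যাসিড: SCMC
```

Ra-phalā needs no special symbol in this scheme: চক্র comes out as `CCHC`, because the র there follows the Hasanta instead of preceding it.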
Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in allowing such ligatures, because any user can easily create them even though they may not be attested combinations.
7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋk ʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE) (meaning Bank of India). This is the case where two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered enough to represent the sound intended for the first word. This is a unique situation necessitated by the lack of a hyphen, space or Zero Width Non-Joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence this is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements from the community, the NBGP may consider revisiting this rule.
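The prohibition can be checked with a simple scan over adjacent code points. This sketch (hypothetical helper `has_vowel_after_hasanta`; the vowel-letter set is an assumption based on the Unicode Bengali block) flags the Bank of India sequence quoted above and accepts the same string once the Hasanta after ফ is dropped:

```python
HASANTA = "\u09cd"
# Assumed independent vowel letters: U+0985..U+098B, U+098F, U+0990, U+0993, U+0994.
VOWEL_LETTERS = {chr(cp) for cp in [*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994]}


def has_vowel_after_hasanta(label: str) -> bool:
    """True if some independent vowel letter directly follows U+09CD."""
    return any(a == HASANTA and b in VOWEL_LETTERS for a, b in zip(label, label[1:]))


cited = "09AC 09CD 09AF 09BE 0999 09CD 0995 0985 09AB 09CD 0987 09A8 09CD 09A1 09BF 09DF 09BE"
bank_of_india = "".join(chr(int(h, 16)) for h in cited.split())
without_h = bank_of_india.replace("\u09ab\u09cd\u0987", "\u09ab\u0987")  # অফ্ই -> অফই

print(has_vowel_after_hasanta(bank_of_india))  # True: ্ is followed by ই
print(has_vowel_after_hasanta(without_h))      # False
```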
7.2 Additional Examples from Bangla ABNF
Below are a few examples which help one understand some of the rules ABNF puts in place. These are given for reference purposes only and are not meant to be comprehensive.
ঃক ःक As can be seen, such a combination will automatically result in a “golu” or dotted circle, marking it as an invalid formation. This is an intrinsic property of the
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka; Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow; Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum; Mr Imran Hossen, CEO, EyeSoft and key member of Bangladesh Association of Software & Information Services (BASIS); Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka; Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka; Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What Has Happened So Far in Terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard Version 11.0 – Core Specification, Chapter 12, p. 473.
10 Appendix-I
10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language script, certain restriction rules apply. In other words, in a given language some of the Formalism structures do not necessarily apply. To take care of such cases, restriction rules are set in place. These restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:
1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although one can cite a couple of native examples, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).
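Rule 1 can be sketched as a check on the first code point. This sketch covers only the three letters the rule states as disallowed; initial য় is left out because the text records attested exceptions for it. `initial_violation` is a hypothetical name.

```python
from typing import Optional

# Letters that, per rule 1, may not begin a label (in this sketch).
FORBIDDEN_INITIAL = {
    "\u09ce": "KHANDA TA",  # ৎ
    "\u099e": "NYA",        # ঞ
    "\u0999": "NGA",        # ঙ
}


def initial_violation(label: str) -> Optional[str]:
    """Return the name of an illegal first letter, or None if the start is fine."""
    if label and label[0] in FORBIDDEN_INITIAL:
        return FORBIDDEN_INITIAL[label[0]]
    return None


tsering = "\u09ce\u09b8\u09c7\u09b0\u09bf\u0982"  # ৎসেরিং
print(initial_violation(tsering))   # KHANDA TA
print(initial_violation("\u0995"))  # None
```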
10.2 ‘Sylheti Nagarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, and was widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th–14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).
¹³ This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.
Table 17 – The Script Table of Sylheti Nagarī or Siloti
10.3 Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.
[Table: confusable conjunct clusters built on ক (k), গ (g) and ণ (n); the conjunct glyphs were not recoverable from the source text.]
century. It was standardized by Pandit Ishwar Chandra Vidyasagar when Bangla type fonts came to be used for large-scale publishing under the Calcutta School Book Society [116 for several references]. Much later, in 1935, the Linotype technique, invented by Ottmar Mergenthaler (1854–1899) in 1886, was introduced into Bangla printing through the efforts of Suresh Chandra Majumdar (1888–1954), Rajsekhar Basu (1880–1960), Jatindra Kumar Sen (1882–1966) and his disciple Sushil Kumar Bhattacharya, and began to be used by the Ānandabazar Patrikā group, later followed by others. Within a few years, the more advanced Monotype technology came to be used in Bangla printing. However, in Bangla printing culture Monotype had very limited acceptance, and Linotype held the stage until digital technology eventually replaced all earlier techniques. All these could be presented in a table.
3.2 Languages Considered
Below is the tabular representation of the languages using Bangla script that are placed on EGIDS Scale 1–6 (see 117 for details). Some languages under EGIDS 5 and 6 have also developed their own scripts for printing and publishing. Some had used Bangla script earlier (such as Bodo) or used it in West Bengal at some point in time (Santali), but have later shifted to another writing system. Bodo is now written in Nāgarī or Devanāgarī, and for Santali one uses both Nāgarī/Devanāgarī and Ol-chiki (145). For the purposes of the Bangla LGR, only languages belonging to EGIDS scale 1 to 4 have been considered. Consider the following table:
EGIDS Scale 1
EGIDS Scale 2
EGIDS Scale 3
EGIDS Scale 4
EGIDS Scale 5
EGIDS Scale 6
Bangla (Bengali)
Santali, Bodo, Riang, Khumi, Mru(ng), Asho
Lepcha, Pnar, Koda, Kora, Chak
Asamiyā (Assamese)
Koch or Rajabansī
Malto or Malpahariya
Manipuri or Meitei
Bisnupriya Manipuri, Kok-Borok (Tripura & Bangladesh)
Chakma, Hajong, Mundari & Kurux (of Bangladesh)
Toto, Rohingya, Tippera, Megam, Tanchangya
Usoi, Limbu, Sadri or Oraon
Bhumij or Mundari, Bawm, Chin
Table 4: Main languages in India and Bangladesh that use Bangla Script on the EGIDS Scale
3.3 Notable Features of Bangla Script [150]
The Bangla writing system has certain features that show how it has to be written and how typesetting in Bangla could be done. This section is followed by a section that explains the code points (and fixed code-point sequences) which show certain distinctive characteristics of Bangla and which make up the Repertoire. The following sections also cover the ‘akshar’-formation rules (ABNF) showing character classes, the Whole Label Evaluation (WLE) and Context Rules, as well as In-Script and Cross-Script Variants. Here we present some basic features of the script and its pronunciation. The Bangla script is an alpha-syllabic writing system in which all
consonants are assumed to contain an accompanying ‘inherent’ vowel (theoretically before or after each consonant). It varies between ɔ and o depending on the position of the consonant in the word. At times these ‘assumed’ or ‘inherent’ vowels are not pronounced at all [142].
Vowels can be written as independent letters or by using a variety of diacritical marks, which are written above, below, before or after the consonant they follow in pronunciation, or in both of the last two positions [105].
When consonants occur together in clusters, special conjunct letters are formed. In printed Bangla, many of these consonantal clusters or conjoined consonants are in use. The letters for the consonants other than the final one in the group are generally reduced. But there are a few special conjunct characters which are compounds of the consonant characters, e.g. ক (k) + ষ (s) = ক্ষ (ks), ঞ (n) + জ (j) = ঞ্জ (nj), জ (j) + ঞ (n) = জ্ঞ (jn), হ (h) + ম (m) = হ্ম (hm). There are other issues as well: র as the second member of a cluster is reduced to a secondary symbol, e.g. প (p) + র (r) = প্র (pr), ষ (s) + ট (t) + র (r) = ষ্ট্র (str) (as in উষ্ট্র ustra “camel”). য (y), when used as a primary symbol, represents jɔ in Bangla. But its secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant in a word, it is the vowel æ (as in শ্যামল (syamala) “green”, র‍্যাপার (ryapara) “wrapper”, etc.). But after a non-initial consonant, it just doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র (r) + য (y) combination has two
shape of the second member is changed, e.g. দ্ধ (ddh), গ্ধ (gdh) and ন্ধ (ndh) respectively. The solitary example of র (r) + ঋ (r) = র্ঋ (as in নৈর্ঋত nairrt “Southwest”), used mostly in Classical borrowings, shows the use of the secondary symbol of a consonant followed by the primary symbol of a vowel. The inherent vowel applies only to the final consonant of the cluster.
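In Unicode, each of these conjuncts is stored as its consonants joined by HASANTA (U+09CD). Whether a font shows a special ligature, a reduced form or a visible Hasanta is a rendering decision, not an encoding one. A small illustrative sketch (the helper names are hypothetical):

```python
HASANTA = "\u09cd"


def conjunct(*consonants: str) -> str:
    """Join consonants with HASANTA; the rendered shape is up to the font."""
    return HASANTA.join(consonants)


def cps(s: str) -> str:
    """Show a string as the U+ notation used throughout this document."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


print(cps(conjunct("\u0995", "\u09b7")))            # ক্ষ: U+0995 U+09CD U+09B7
print(cps(conjunct("\u09b7", "\u099f", "\u09b0")))  # ষ্ট্র: U+09B7 U+09CD U+099F U+09CD U+09B0
```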
The Bangla script has at least fifty-two primary symbols and quite a few allographs (positional variants of them), corresponding to forty-four phonemes (7 oral and 7 nasal vowels and 30 consonants) (150), or functional speech sounds, with some obvious redundancies, although in one of the first phonemic analyses the number was thought to be thirty-five phonemes [140].
As mentioned above, in Bangla several graphemic symbols have secondary shapes, technically called ‘allographs’, with a complementary distribution in each case. These graphs or markings are generally added to the following positions of the primary symbol [113] in the following manner:
As for the complementary distribution of vowel letters (word- or syllable-initial) and Vowel Matras, which are relevant for ABNF, let us consider the following. Besides some simple Vowel Modifiers called ‘Kars’ in Bangla (also referred to as Matra in the other LGR documents of Neo-Brāhmī), there are some combinatory modifiers of Bangla vowels with certain consonants. For example, whereas
আ U+0986 BENGALI LETTER AA is substituted by
া U+09BE BENGALI VOWEL SIGN AA,
ই U+0987 BENGALI LETTER I is substituted by
pre-posed ি U+09BF BENGALI VOWEL SIGN I,
ঈ U+0988 BENGALI LETTER II is substituted by
ী U+09C0 BENGALI VOWEL SIGN II, or
উ U+0989 BENGALI LETTER U is substituted by
ু U+09C1 BENGALI VOWEL SIGN U by marking below the primary grapheme,
there are some special vowel modifiers of উ, as in the following combined letters: গু (gu), rather than writing it as গ (g) + ু (u); ভ (bh) + র (r) (ভ্রূ bhru “eyebrow”); শ (s) + র (r) (শ্রু sru); ঋ (r) after হ (h) (হৃ hr), etc.
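The vowel-letter to kāra substitution amounts to a lookup table. This sketch extends the four pairs listed above to the remaining vowel letters of the Unicode Bengali block, and shows that the pre-posed ি is stored after the consonant in logical order even though it is rendered before it. `syllable` is a hypothetical helper.

```python
# Independent vowel letter -> dependent vowel sign (kāra).
# অ (U+0985) has no kāra: a bare consonant already carries it.
KARA = {
    "\u0986": "\u09be",  # আ -> া
    "\u0987": "\u09bf",  # ই -> ি (pre-posed when rendered)
    "\u0988": "\u09c0",  # ঈ -> ী
    "\u0989": "\u09c1",  # উ -> ু
    "\u098a": "\u09c2",  # ঊ -> ূ
    "\u098b": "\u09c3",  # ঋ -> ৃ
    "\u098f": "\u09c7",  # এ -> ে
    "\u0990": "\u09c8",  # ঐ -> ৈ
    "\u0993": "\u09cb",  # ও -> ো
    "\u0994": "\u09cc",  # ঔ -> ৌ
}


def syllable(consonant: str, vowel: str) -> str:
    """Consonant + vowel in logical (storage) order."""
    if vowel == "\u0985":
        return consonant
    return consonant + KARA[vowel]


ki = syllable("\u0995", "\u0987")           # কি
print([f"U+{ord(c):04X}" for c in ki])      # ['U+0995', 'U+09BF']
```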
There have been many notable contributions to simplifying and modifying Bangla spellings and combinatory techniques, especially by scholars such as Pabitra Sarkar (1992) [134]. In this, there has been an attempt to reduce the number of allographs of both vowels and consonants in clusters, and this has been widely accepted in the printing of school texts in both Bangladesh and West Bengal [151, 152]. As of now, two systems, the old (traditional) and the new, coexist, each operative in different domains.
However, in preparing this LGR document, the aim has been to consider the widely used and usable sequences and combinations, and their variations across the sister scripts belonging to the family of Brāhmī writing systems. Bangla Academy, Dhaka published the Standard Bangla Spelling Rules in 1992, following the recommendations of a committee constituted through a workshop jointly organized by the Jatīya Shikshakrama and Pathyapustaka Board in 1988. A thoroughly revised edition of the Rules was published in September 2012⁶. After the establishment of the Bangla Akademi of West Bengal in 1986, its first President, Annadasankar Ray (1904–2002), in his inaugural address gave a direction for the standardization of the Bangla alphabet, script and spelling system, and clearly argued that they would not blindly follow the Sanskritic model of conventional grammar. A broad list of proposals was sent to experts on Bangla, and a broad agreement was reached on the ‘homogenization of Bangla spelling’ by 1988. Based on opinions received from different quarters, a unanimous list of ‘rules’ was agreed upon. This was published as a ‘Spelling Dictionary’ titled Ākādemi Bānāna Abhidhāna (1997), which was obviously more comprehensive than ‘The University of Calcutta proposals’ made in 1936. Along with the ‘rationalization’ of spellings, another step was taken to make the writing system easier to read by making the symbols used, both single and combined ones, more ‘transparent’. These reforms were originally suggested by Sarkar (1987, first published in 1978) [134][153], where he used the terms Swaccha (‘Transparent’) and Aswaccha (‘Opaque’ or non-transparent), even adding Ardha Swaccha (‘half transparent’) in between the two. Some sample examples are: Transparent: ন্ন (nn), প্ত (pt), স্ত (st), where both members of the cluster can be recognized.
⁶ Bangla Academy. 2012. Bāṅlā Ekāḍemī Pramita Bāṅlā Bānānera Niyama (Bangla Academy Standard Bangla Spelling Rules). Dhaka: Bangla Academy.
There were, in fact, two types of proposals. One concerned the shape of the letters: those of consonant + vowel (CV) combinations, and conjuncts, which are consonant + consonant combinations. There were further complex shapes, i.e. those of consonant + consonant (+ consonant) + vowel (CC(C)V) signs, as in প্রু (pru) or স্ক্রু (skru). Some decisions in this area were necessary because a few of the CC(C) symbols represented complexities that made learning them difficult for children. The other dealt only with the spellings of words, without any reference to the shapes of the letters in which they were written. The basic objective here was ‘one word, one spelling’ to the greatest extent possible [151].
Below we place a statement of the most salient changes that affect the consonant + vowel combinations [153]:
a. The variants of the short u (হ্রস্ব উ-কার, hrasva u-kāra) vowel sign have been brought down to one, i.e. the detached ু. So the ligature গু (gu) is now written as গ followed by a detached ু. Similarly, রু (ru) > র + ু, শু (śu) > শ + ু, হু (hu) > হ + ু, and therefore, for a cluster + short u sign, ন্তু (ntu) > ন + ্ + ত + উ and স্তু (stu) > স + ্ + ত + উ.
b. The variants of long u (দীর্ঘ ঊ-কার, dīrgha u-kāra) have also been reduced: রূ (rū) > র + ূ; ভ্রূ (bhrū) > ভ bh + ্ + র r + ঊ ū; দ্রূ (drū) > দ d + ্ + র r + ঊ ū; শ্রূ (śrū) > শ ś + ্ + র r + ঊ ū.
c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been brought down to one: হৃ (hṛ) > হ + ৃ.
Regarding consonant + consonant + (consonant)… + (vowel) clusters, the Paschimbanga Bangla Akademi proposed transparent or semi-transparent shapes for clusters to the extent admissible in the Bangla writing system. Some examples will clarify the proposal (a slash means that the traditional cluster shape precedes it, while the Bangla Akademi innovation follows it) [153].
3.3.1 The Consonants
As per the traditional classification, Bangla consonants are categorized according to their phonetic properties, especially in terms of place and manner of articulation [107]. There are five ‘Varga’ (pronounced ‘Barga’ in Bangla) or Groups (sets or classes), distinguished by place of articulation, and one non-‘varga’ group [105]. Each Varga, which corresponds to the stops at a certain place of articulation, contains a series of five consonants classified by their phonetic qualities (i.e. manner of articulation), beginning from unvoiced and unaspirated to voiced and aspirated (in the fourth column), and finally ending with a homorganic or corresponding nasal [107]. Consider the following table:
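The five vargas described above occupy five consecutive code points each in the Unicode Bengali block, which follows the traditional order. A short sketch that rebuilds the rows from their starting code points (the row labels are descriptive, not Unicode names):

```python
# Starting code point of each varga in the Unicode Bengali block.
VARGA_START = {
    "velar (ka-varga)": 0x0995,
    "palatal (ca-varga)": 0x099A,
    "retroflex (ṭa-varga)": 0x099F,
    "dental (ta-varga)": 0x09A4,
    "labial (pa-varga)": 0x09AA,   # U+09A9 is unassigned, so this row starts later
}


def varga_table() -> dict[str, str]:
    """Five consonants per varga: unaspirated voiceless ... homorganic nasal."""
    return {name: "".join(chr(start + k) for k in range(5))
            for name, start in VARGA_START.items()}


for name, row in varga_table().items():
    print(f"{name:22} {' '.join(row)}")
```

The last member of every row is the homorganic nasal: ঙ, ঞ, ণ, ন, ম.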
3.3.2 The Implicit Vowel Killer: Hasanta (called ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts)
As stated earlier, all consonants are pronounced in isolation with an implicit vowel (central-back ɔ in Bangla, as the neutral vowel) assumed to be associated with them [121]. The terms ‘Hasanta’ (= ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts) and ‘Virāma’⁷ (= ‘Dari’ in Bangla), as preferred in UNICODE (cf. Unicode 3.0 and above), have been used in this report to denote the character that marks the absence of this inherent vowel. It may be noted that the term virama has been adopted in UNICODE in a sense that differs from the traditional grammatical definition, and hence it requires some explanation here. Given the importance of the document, this note should be part of this LGR document, so that anybody referring to it is able to know the proper grammatical explanation of the term. Because a special sign is needed whenever this implicit vowel is stripped off, the symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster, one can, in common parlance, “kill” its vowel and create conjuncts. In this manner, conjunct characters can generally be written by joining two to four consonants. In rare cases, this process can join up to five consonants. However, the maximum number of consonants joining to form one akṣara⁸ has to be bounded empirically. This is an observation based on the CIIL-Emille Corpora of Bangla words [132 & 133] as seen in print these days. Given the mixture of scripts and languages on the web, the possibility that one may want a generic Top Level Domain [gTLD] with more than the observed maximum cannot be ruled out. This can be the case when a foreign-language word which admits a large number of consonants is transliterated into Bangla. Hence, in the Bangla LGR work, this limit will not be enforced.
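Since the Bangla LGR does not enforce a maximum cluster length, a count like the one below would only serve corpus observation, such as the CIIL-Emille figures mentioned above. A minimal sketch, assuming a cluster is a run of consonants joined by HASANTA; the helper names are hypothetical and nukta sequences are not handled.

```python
HASANTA = "\u09cd"


def is_consonant(ch: str) -> bool:
    # Assumption: Bengali consonant letters lie in U+0995..U+09B9 (the few
    # unassigned slots there never occur in valid text), plus ৰ and ৱ.
    cp = ord(ch)
    return 0x0995 <= cp <= 0x09B9 or cp in (0x09F0, 0x09F1)


def longest_cluster(label: str) -> int:
    """Number of consonants in the longest C(HC)* run of the label."""
    best, i, n = 0, 0, len(label)
    while i < n:
        if not is_consonant(label[i]):
            i += 1
            continue
        run, j = 1, i + 1
        # Extend the run while HASANTA + consonant keeps repeating.
        while j + 1 < n and label[j] == HASANTA and is_consonant(label[j + 1]):
            run += 1
            j += 2
        best = max(best, run)
        i = j
    return best


print(longest_cluster("\u0989\u09b7\u09cd\u099f\u09cd\u09b0"))  # উষ্ট্র -> 3
```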
3.3.3 Vowels
Separate symbols exist for all ‘Swara’ or vowels in Bangla, which are pronounced independently, either at the beginning of the word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign called ‘kār’ in Bangla, or Mātrā in Nāgarī⁹, is attached to the consonant. Since the consonant has this built-in neutral vowel at the end, there are equivalent kāras (Mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:
⁷ Virāma as used here is also a misnomer according to the Indian grammatical traditions. Nowhere is the mere absence of a vowel marked as virama; Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
⁸ This term needs to be disambiguated. Akṣara also means ‘syllable’ in Indian grammatical traditions.
⁹ The term ‘Matra’ in Bangla stands for an altogether different concept, viz. the top bar placed over a letter, typically found in Hindi and Bangla but missing in Gujarati.
3.3.4 The Anusvāra onuʃʃār (ং - U+0982)
The Anusvāra, or onuʃʃār in Bangla, at times represents a homorganic nasal, but not always. It replaces a conjunct group of ‘Nasal Consonant + Hasanta + Consonant’ where the second consonant belongs to the velar varga or set, as in লংকা. But it often also appears in such combinations when a non-velar is the last member of the combination, as in ল্যাংটা “naked” or ল্যাংচা “a kind of sweet; to limp”. Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined writing symbol representing the corresponding nasal consonant of the particular set. Although Modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal, in Bangla it is clearly demarcated where one must use the Anusvāra and where it has to be a conjunct cluster with a nasal as the first or the second component.
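Whichever spelling a label uses, the Anusvāra form and the conjunct form are different code-point sequences, and Unicode normalization does not relate them, so any equivalence between them would have to come from the LGR rules. A quick standard-library illustration:

```python
import unicodedata

with_anusvara = "\u09b2\u0982\u0995\u09be"        # লংকা: ল + ং + ক + া
with_conjunct = "\u09b2\u0999\u09cd\u0995\u09be"  # লঙ্কা: ল + ঙ + ্ + ক + া

# NFC leaves both strings unchanged, so they remain two distinct labels.
nfc = [unicodedata.normalize("NFC", s) for s in (with_anusvara, with_conjunct)]
print(nfc[0] == nfc[1])            # False
print(unicodedata.name("\u0982"))  # BENGALI SIGN ANUSVARA
```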
3.3.5 Nasalization: Candrabindu (ঁ - U+0981)
Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cãd ‘moon’ (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].
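The document cites words as U+ code-point sequences. A hypothetical helper that turns such a citation back into text, and prints the Unicode name of each character, makes these examples easy to verify:

```python
import unicodedata


def from_codepoints(spec: str) -> str:
    """Turn 'U+099A U+09BE ...' into the string it denotes."""
    return "".join(chr(int(tok.removeprefix("U+"), 16)) for tok in spec.split())


word = from_codepoints("U+099A U+09BE U+0981 U+09A6")  # চাঁদ
for ch in word:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

For চাঁদ this lists CA, VOWEL SIGN AA, SIGN CANDRABINDU and DA, confirming that the Candrabindu is stored after the kāra it nasalizes.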
3.3.6 Nukta (় - U+09BC)
The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi). The term and the concept of nukta are borrowed in Bangla. The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC). RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, the Bangla letters য় YYA, ড় RRA and ঢ় RRHA have to be represented in this LGR by the sequences (YA + Nukta, U+09AF + U+09BC), (DDA + Nukta, U+09A1 + U+09BC) and (DDHA + Nukta, U+09A2 + U+09BC) instead of the single code points YYA (U+09DF), RRA (U+09DC) and RRHA (U+09DD), although the use of ‘Nukta’ is otherwise completely unnatural in Bangla. It is noted that in the current Unicode Standard chart these characters are listed as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA Protocol through a set of procedures developed by the IETF. Even though the Unicode Standard also prescribes methods to produce these three characters as atomic characters (for example 09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y], each as a single keystroke), the IDNA protocol requires that we treat them as conjunct characters and then allocate codes for these in the Unicode Bengali Block.
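The composition exclusions can be observed directly with Python's standard `unicodedata` module: normalizing each atomic letter to NFC yields the base letter followed by NUKTA, which is why the LGR has to list the two-code-point sequences.

```python
import unicodedata

ATOMIC = {"\u09dc": "RRA", "\u09dd": "RRHA", "\u09df": "YYA"}  # ড়, ঢ়, য়

for ch, name in ATOMIC.items():
    nfc = unicodedata.normalize("NFC", ch)
    print(name, f"U+{ord(ch):04X} ->", " ".join(f"U+{ord(c):04X}" for c in nfc))
```

This prints `U+09A1 U+09BC`, `U+09A2 U+09BC` and `U+09AF U+09BC` for RRA, RRHA and YYA respectively; the sequences themselves are left unchanged by NFC.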
It may be noted that there could be sporadic attempts or cases of writing Muslim names, Urdu poetic words and Perso-Arabic loan words with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph), only for the sake of correct pronunciation and for maintaining the sanctity of the loan word. This was akin to making the Bangla writing system work like the IPA script. It is, however, not in use in printed Bangla.
3.3.7 Visarga biʃɔrgo (ঃ - U+0983) and Avagraha (ঽ - U+09BD)
The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loan words borrowed from Sanskrit and represents a sound very close to h. One could quote as an example দুঃখ duhkho “sorrow”, “unhappiness” (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in Bangla. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো’পরাণি). It is now rarely used, even in other languages using Bangla script. The Avagraha is not part of the repertoire for the LGR: it has been decided not to retain Avagraha (ঽ) (U+09BD) because it is blocked in TLDs as per the Maximal Starting Repertoire (MSR). Please see Appendix II in section 11 for a complete list of Bangla consonants and their allographs.
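Because Bengali and Devanagari glyphs look alike in many fonts, a cited sequence can silently mix blocks; U+0926, U+0941 and U+0916, for instance, are Devanagari DA, VOWEL SIGN U and KHA. A minimal sketch that flags code points outside the Unicode Bengali block (U+0980–U+09FF); the function name is hypothetical:

```python
def outside_bengali_block(label: str) -> list[str]:
    """Code points of the label that are not in U+0980..U+09FF."""
    return [f"U+{ord(c):04X}" for c in label if not 0x0980 <= ord(c) <= 0x09FF]


mixed = "\u0926\u0941\u0983\u0916"   # Devanagari letters around a Bengali visarga
bangla = "\u09a6\u09c1\u0983\u0996"  # দুঃখ, all in the Bengali block

print(outside_bengali_block(mixed))   # ['U+0926', 'U+0941', 'U+0916']
print(outside_bengali_block(bangla))  # []
```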
type in the following manner: র + ্ + য; but for র‍্য the sequence would be র + ZWJ + ্ + য [154]. In other words, ZWJ is used in rendering words demanding ya-phalā after ra, which otherwise cannot be typed (rendered), because the same order of ra + hasanta + antastha ja is used in the medial and/or final position. Interestingly, ra + hasanta + antastha ja is used to type repha on the consonant antastha ja, as in কার্য (kaarjo). In order to get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার.
The use of ZWJ/ZWNJ has been ruled out in the root zone by the [Procedure]. Because these two signs are used in Bangla to create alternate renderings, inserting them can affect searching as well as NLP. The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly prevented and the Hasanta joining the two consonants participating in the conjunct needs to be shown explicitly.
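The effect of excluding the joiners can be sketched as follows. Per the Unicode convention cited above, র + ZWJ + ্ + য requests ra followed by ya-phalā, while র + ্ + য yields repha on য. Once the joiners are removed, as a root-zone label requires, the two collapse into the same code points. The helper names are hypothetical.

```python
ZWJ, ZWNJ = "\u200d", "\u200c"

repha_ya = "\u09b0\u09cd\u09af"          # র + ্ + য -> র্য (repha over য)
ra_yaphala = "\u09b0\u200d\u09cd\u09af"  # র + ZWJ + ্ + য -> র‍্য (ra + ya-phalā)


def contains_joiner(label: str) -> bool:
    """True if the label uses ZWJ or ZWNJ, which the root zone excludes."""
    return ZWJ in label or ZWNJ in label


def strip_joiners(label: str) -> str:
    return label.replace(ZWJ, "").replace(ZWNJ, "")


print(contains_joiner(ra_yaphala))              # True
print(strip_joiners(ra_yaphala) == repha_ya)    # True: the distinction is lost
```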
3.3.9 Use of Ya-phalā
The Ya-phalā + ā sequences are the two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ - BENGALI LETTER A and U+098F এ - BENGALI LETTER E).
For rendering Ya-phalā followed by অ and এ, it is necessary to type U+09CD Hasanta plus U+09AF ya preceded by the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound as in English acid অ্যাসিড, association অ্যাসোসিয়েশন, ‘bat’ ব্যাট, ‘fat’ ফ্যাট, ‘mat’ ম্যাট, ‘cap’ ক্যাপ, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant. Only consonants can have the Hasanta marked. But as we see here, Bangla ends up with a deviant feature in the orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (Cf. Unicode 10.0, p. 473 [100]).
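Since Hasanta after a vowel letter is otherwise deviant, a context check could confine it to these two ligatures. The sketch below assumes that only অ্যা and এ্যা (vowel + ্ + য + া) are acceptable, and reports the position of every other vowel letter followed by Hasanta; `bad_vowel_hasanta` is a hypothetical name and the vowel-letter set is an assumption.

```python
HASANTA = "\u09cd"
VOWEL_LETTERS = {chr(cp) for cp in [*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994]}
ALLOWED_VOWELS = {"\u0985", "\u098f"}   # অ, এ
ALLOWED_TAIL = "\u09cd\u09af\u09be"     # ্ + য + া


def bad_vowel_hasanta(label: str) -> list[int]:
    """Indices of vowel letters followed by HASANTA outside অ্যা / এ্যা."""
    bad = []
    for i, ch in enumerate(label[:-1]):
        if ch in VOWEL_LETTERS and label[i + 1] == HASANTA:
            if not (ch in ALLOWED_VOWELS and label.startswith(ALLOWED_TAIL, i + 1)):
                bad.append(i)
    return bad


print(bad_vowel_hasanta("\u0985\u09cd\u09af\u09be\u09b8\u09bf\u09a1"))  # অ্যাসিড: []
print(bad_vowel_hasanta("\u0987\u09cd\u09af\u09be"))                    # ই্যা: [0]
```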
Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka “the sun”) and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra “cycle”). The point is that in both cases the slot for ra could be the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases. The LGR makes a note of this concern about the two RAs in disguise, as it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may lead to spoofing and other kinds of cyber irregularities. The motive for classing these two code points as (blocking) variants is that fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of a following Hasanta. The difference between the RAs is only distinguishable if one looks at their Unicode values. Therefore, labels such as অর্ক arka, শীর্ষ śīrṣa ‘top, apex’, অভ্র abhra ‘cloud, the sky’ and শ্রম śrama ‘physical labour’ could be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode code points. This point is made explicit with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE Symbols, p. 47), which follow. Moreover, it
4 Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) has been formed by members having experience in Linguistics (especially in NLP and Computational Linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR assigned to each. However, an attempt is made to keep the fundamental philosophy behind building those LGRs consistent across all the Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about the Bangla LGR code points.
4.1 Guiding Principles
The NBGP adopts the following broad principles for the selection of code points in the code-point repertoire, across the board for all the Neo-Brāhmī scripts within its ambit.
4.1.1 Inclusion Principles
4.1.1.1 Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode only for transcription or archival purposes will not be considered for inclusion in the code-point repertoire.
4.1.1.2 Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.
4.2 Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards which are prerequisites to the Label Generation Rule-sets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.
4.2.1 External Limits of Scope
The code point repertoire for the root zone being a very special case at the top of the protocol hierarchy, the canvas of characters available for selection as part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart: Out of all the characters that are needed by the script in question, if a particular character is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol: Unicode, being the character-encoding standard that aims at the maximum possible representation of a given script/language, has encoded, as far as possible, all the characters needed by the script. However, domain names being a specialized case, they are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR): The Root Zone LGR being the repertoire of characters to be used for creating Root Zone TLDs, which in turn constitute an even more specialized case of domain names, the Root Zone LGR procedure introduces additional exclusions on IDNA’s allowed set of characters. Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), even though allowed by the IDNA protocol, is not permitted in the Root Zone Repertoire as per the MSR.
To sum up, the restrictions start off by admitting only those characters that are part of the code block of the given script/language. The IDNA Protocol further narrows this down, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
4.2.1.1 No Punctuation Marks
TLDs being identifiers, the punctuation marks present in Brāhmī-based scripts will not be included.
4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters like BANGLA ISSHAR (U+09FA), BANGLA CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), etc. will also not be included.
4.2.1.3 No Rare and Obsolete Characters
There are characters which have been added to Unicode to accommodate rare forms, such as the Sanskritic VOCALIC RR ৠ (U+09E0), VOCALIC L ঌ (U+098C) and VOCALIC LL ৡ (U+09E1), and the allographic kāra forms of the latter two symbols, VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, in compliance with the Conservatism principle laid down in the Root Zone LGR procedure. However, in Bangla the kāra corresponding to VOCALIC RR ৠ (U+09E0), which is VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.
4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for Classical Sanskrit will not be included. This is also in consonance with the Letter principle laid down in the Root Zone LGR procedure.
4.2.1.5 ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and the Appendix (Section 10.1).
5 Repertoire
The Bangla writing system is represented in UNICODE using the Bengali (Bangla) script name as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in UNICODE has 93 entries. This section details the code-point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in the 8th Meeting of its Indian Language Technologies and Products Sectional Committee LITD 20, held on 23rd Aug 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the UNICODE Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 11 is valid for all three languages above.
24
For each code point, language references are given in the last column, titled Reference, of Table 8, the "Code Point Repertoire". For complete coverage of the Bangla code points, references from Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced new complications, except for one particular character. Only a few representative languages under EGIDS Scale 1-4 have been chosen for referencing. Together, however, they cover all the code points required for all the languages that the NBGP has considered, as given under Bangla Unicode Points (as given in UNICODE 6.3). Before the details are presented, it is useful to look at the Bangla Code Point Chart from the Maximal Starting Repertoire [MSR] Version 3. The shapes of the reference glyphs in the code charts below are based on one of the many available fonts and are not prescriptive, because actual fonts (both UNICODE-compatible and TrueType ones) may vary. Consider the following code point table:
Colour convention:
All characters that are included in the [MSR]: yellow background
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background
Apart from the individual code points above, the Neo-Brāhmī Generation Panel also proposes some specific sequences for the repertoire. These allow the conditional inclusion of Bangla LETTER A or E, followed by Bangla SIGN VIRAMA and Bangla LETTER YA, again followed by Bangla VOWEL SIGN AA. They are needed to write the æ sound, as in English 'bat', 'cat', etc.
5.2 Code Point Repertoire Exclusion
Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.
5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire as a stand-alone code point, since it will never be used alone. It will be used only in sequences, to form the three special characters ড়, ঢ় and য় in normalized form.
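The restriction in 5.3 can be read as a context check. After NFC normalization, every U+09BC must sit directly after DDA, DDHA or YA. The following is a minimal Python sketch under that reading. The function name `nukta_ok` is hypothetical and does not belong to any LGR toolkit. A real LGR expresses the same idea as sequences in its XML, not as code.

```python
import unicodedata

NUKTA = "\u09BC"
# Bases that may carry the nukta in this LGR: DDA, DDHA, YA
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}


def nukta_ok(label: str) -> bool:
    """Return True if every NUKTA in the NFC form of `label` follows DDA, DDHA or YA."""
    s = unicodedata.normalize("NFC", label)
    for i, ch in enumerate(s):
        if ch == NUKTA and (i == 0 or s[i - 1] not in NUKTA_BASES):
            return False
    return True


print(nukta_ok("\u09DC"))        # True: precomposed RRA normalizes to DDA + NUKTA
print(nukta_ok("\u0995\u09BC"))  # False: KA + NUKTA is not one of the three sequences
```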
5.4.2 The Vowel Sequence
The Vowel Sequence and the Consonant Sequence pertinent to Bangla are given below. To help users of other Brāhmī scripts, Devanāgarī equivalents are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra, onuʃʃār (B), a Candrabindu (D) or a Visarga, biʃɔrgo (X). In Bangla, the number of such signs (D, B or X) following a V is not necessarily restricted to one. However, the rules illustrated in this document make it clear that formations such as VDD, VBB and VXX are invalid orthographic units. A candrabindu can validly be combined with an anusvāra, as in হ্যাঁংচা 'hæŋchā', or with a visarga, as in হ্যাঁঃ 'hæŋ'. In Bangla, a Visarga or an Anusvāra (onuʃʃār) may follow a Candrabindu.
A vowel can optionally be followed by the combination Hasanta/Virama [H] + Consonant [C], forming a Ya-phalā. Ya-phalā is a presentation form of U+09AF, the Bangla letter য or 'ya'. It is represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla Hasanta), U+09AF য BENGALI LETTER YA> and has a special subjoined form, ্য. When it is further combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. 'aa' (ā)), it is used to transcribe [æ], as in the "a" of the English word "bat", written in Bangla as ব্যাট. A vowel sequence admits the following combinations:
5.4.2.1 A Single Vowel
Example: V অ (Devanāgarī अ)
5.4.2.2 A Vowel with Conditions
A vowel can optionally be followed by any one of the following:
Anusvāra [B]
Candrabindu [D]
Visarga [X]
Candrabindu + Anusvāra [DB]
Candrabindu + Visarga [DX]
Hasanta (or Virama) [H] followed by a Consonant [C] followed by a kāra (Mātrā) [M]
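Written over the one-letter symbols used in this section (V, B, D, X, H, C, M), 5.4.2.1 and 5.4.2.2 together describe a small regular language. The sketch below assumes that a label has already been mapped to such a symbol string. The helper `is_vowel_sequence` is hypothetical. It also shows how VDD, VBB and VXX fall out as invalid.

```python
import re

# V vowel, B anusvara, D candrabindu, X visarga, H hasanta, C consonant, M kara
VOWEL_SEQUENCE = re.compile(r"V(?:B|D|X|DB|DX|HCM)?")


def is_vowel_sequence(symbols: str) -> bool:
    """True if the whole symbol string is a vowel sequence per 5.4.2.1-5.4.2.2."""
    return VOWEL_SEQUENCE.fullmatch(symbols) is not None


for s in ("V", "VB", "VDX", "VHCM", "VDD", "VBB", "VXX"):
    print(s, is_vowel_sequence(s))
```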
5.4.3.2 A Consonant with Conditions
A consonant may optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
5.4.3.5 Subsets
As a representative example of the subsets, only the combination CHC is considered here. The same is equally applicable to CHCHC and CHCHCHC. [A] The combination may be followed by M, B, D, X, DB or DX.
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.
Note: In this context of KHANDA TA, the condition is that C must be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
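The KHANDA TA note can be checked mechanically. The sketch below assumes that the condition applies only to the sequence C + Hasanta + ৎ. It does not attempt the other khanda-ta restriction (label-initial position). The function name is hypothetical.

```python
KHANDA_TA = "\u09CE"
HASANTA = "\u09CD"
RAS = {"\u09B0", "\u09F0"}  # Bangla RA, Assamese RA


def khanda_ta_context_ok(label: str) -> bool:
    """In every C + HASANTA + KHANDA TA sequence, C must be one of the two RAs."""
    for i in range(2, len(label)):
        if label[i] == KHANDA_TA and label[i - 1] == HASANTA and label[i - 2] not in RAS:
            return False
    return True


bhartsana = "\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"  # bha + ra + hasanta + khanda ta + sa + na + aa
print(khanda_ta_context_ok(bhartsana))             # True
print(khanda_ta_context_ok("\u0995\u09CD\u09CE"))  # False: KA + HASANTA + KHANDA TA
```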
5.4.5 Special Cases: S and P
Two special cases involving sequences, referred to as S and P in Table 16 under Section 7, are described briefly here. Let us take up S first.
Bangla has two instances where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ - BANGLA LETTER A and U+098F এ - BANGLA LETTER E). To render Ya-phalā after অ and এ, one must type U+09CD plus U+09AF ya after the vowel. This is a purely ligatural entity. The Ya-phalā together with the ā-kāra is used to produce the æ sound, as in English 'bat', 'fat', etc.
By nature, the Brāhmī script does not have a Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, so only consonants can carry it. As we see here, however, Bangla orthography ends up with a deviant feature: the Hasanta comes immediately after a vowel, in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
The other case, P, concerns the formation of repha and ra-phalā in the script. Owing to its co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka 'the sun').
10 Refer to Rule P in Section 7, Table 16.
6 Variants
This section discusses the variants in the Bangla script. The NBGP divides confusing variants into two groups:
Group 1: confusing due to pure visual similarity.
Group 2: confusing due to deviation from character formations as normally perceived by the larger linguistic community.
For Group 1, only identical-looking code points are defined as variants. Confusable but non-identical cases are not proposed, because another panel (the String Similarity Assessment Panel) is entrusted with such cases. Cases belonging to Group 2, however, are proposed as variants. These are not cases of mere visual similarity: they involve deviations from the widely accepted norms of Bangla akṣara formation. They can confuse even a careful observer and are therefore proposed as variants.
Variants arise in a script when two or more identical-looking forms are encoded with different code points or code point sequences. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. A user can type o-kāra with a consonant in one go, or obtain the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot tell the two renderings of ko (কো) apart, one typed with a single key and the other with two keys. Nevertheless, this is not considered a case of variant, because a kāra followed by another kāra is not allowed.
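A further point supports treating the ko (কো) case as a non-variant. IDNA requires labels to be in NFC (see 3.3.6), and NFC maps the two-key spelling onto the one-key spelling. U+09CB has the canonical decomposition U+09C7 U+09BE. The short sketch below uses Python's standard `unicodedata` module to show this. The helper name `codepoints` is only illustrative.

```python
import unicodedata

KA, E_KAR, AA_KAR, O_KAR = "\u0995", "\u09C7", "\u09BE", "\u09CB"


def codepoints(s: str) -> list[str]:
    """Render a string as a list of U+XXXX code point labels."""
    return [f"U+{ord(c):04X}" for c in s]


two_keys = KA + E_KAR + AA_KAR  # ka + e-kara + aa-kara, typed separately
one_key = KA + O_KAR            # ka + o-kara, typed in one go

# NFC recomposes U+09C7 U+09BE into U+09CB, so both inputs end up identical
print(codepoints(unicodedata.normalize("NFC", two_keys)))  # ['U+0995', 'U+09CB']
print(codepoints(unicodedata.normalize("NFD", one_key)))   # ['U+0995', 'U+09C7', 'U+09BE']
```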
6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I: Among the true variants in Bangla are the cases where থ (tha, U+09A5) with a Hasanta forms a conjunct with স (sa, U+09B8) or ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus
স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus
ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
Written in traditional orthography, the above combinations can be somewhat confusing, because the থ (tha) in the conjunct looks like a হ (ha). The conjunct can occur in initial, medial or final position (as shown below in example no. 1). A user who takes the letter for ha may also type it wrongly as হ (U+09B9), which increases the risks in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
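Treating these pairs as variants means that a label containing স্থ or ন্থ also "generates" the forms with স্হ or ন্হ. A minimal sketch of that enumeration follows. It assumes that each occurrence can be swapped independently, which is how LGR variant generation usually works. The names `SWAP` and `variant_labels` are hypothetical, and the sketch ignores variant dispositions (blocked/allocatable).

```python
import itertools

SA, NA, HASANTA = "\u09B8", "\u09A8", "\u09CD"
THA, HA = "\u09A5", "\u09B9"

# CASE I pairs in both directions: sa/na + hasanta + tha <-> sa/na + hasanta + ha
SWAP: dict[str, str] = {}
for first in (SA, NA):
    a, b = first + HASANTA + THA, first + HASANTA + HA
    SWAP[a], SWAP[b] = b, a


def variant_labels(label: str) -> set[str]:
    """Enumerate every label obtained by swapping any subset of the CASE I sequences."""
    options = []
    i = 0
    while i < len(label):
        chunk = label[i:i + 3]
        if chunk in SWAP:
            options.append((chunk, SWAP[chunk]))
            i += 3
        else:
            options.append((label[i],))
            i += 1
    return {"".join(choice) for choice in itertools.product(*options)}


sthana = "\u09B8\u09CD\u09A5\u09BE\u09A8"  # sa + hasanta + tha + aa + na
print(sorted(variant_labels(sthana)))
```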
6.2 Cross-Script Variants
A careful cross-script study of Bangla has been carried out with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusions they may cause as labels on the web. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The NBGP proposes the following characters as variants. Certain other characters are somewhat similar but have not been included here. They are provided in the Appendix (10.3) for reference.
7 Whole Label Evaluation Rules (WLE)
This section provides the WLE rules required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can easily be translated into the LGR specifications.
11 Unicode uses Oriya for the script, although Odia is now the official term.
12 As used by Unicode, denoting and including both Assamese and Maṇipurī.
Below are the symbols used in the WLE rules for each Indic Syllabic Category, as listed in the table in the code point repertoire (Section 5.1):
C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), i.e. the (æ) Ya-phalā sequence (V1 H C1 M1), where V1 is either 0985 (অ - BENGALI LETTER A) or 098F (এ - BENGALI LETTER E), H is 09CD (্ - BENGALI SIGN VIRAMA), C1 is 09AF (য - BENGALI LETTER YA) and M1 is 09BE (া - BENGALI VOWEL SIGN AA). S1 and S2 are valid even though the other context rules would not allow them.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র - BENGALI LETTER RA) or 09F0 (ৰ - ASSAMESE LETTER RA).
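Before the WLE rules can be applied, a label has to be rewritten in these symbols. The sketch below is one hedged way to do that. It recognizes S and P first, because both are multi-code-point units, and then classifies single code points. The character classes are approximations built from Unicode ranges rather than from the actual repertoire table. In particular, the consonant class is "every assigned letter in U+0995..U+09B9 plus U+09F0/U+09F1", and a following nukta is absorbed into its consonant. The name `to_symbols` is hypothetical.

```python
import unicodedata

VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")
CONSONANTS = {chr(cp) for cp in range(0x0995, 0x09BA)
              if unicodedata.category(chr(cp)) == "Lo"} | {"\u09F0", "\u09F1"}
KARAS = set("\u09BE\u09BF\u09C0\u09C1\u09C2\u09C3\u09C4\u09C7\u09C8\u09CB\u09CC")
SIGNS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09CD": "H", "\u09CE": "Z"}
NUKTA = "\u09BC"
S_SEQS = ("\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE")  # a/e + hasanta + ya + aa
P_SEQS = ("\u09B0\u09CD", "\u09F0\u09CD")                          # RA + hasanta (both RAs)


def to_symbols(label: str) -> str:
    """Map a Bangla label to the one-letter WLE symbols (approximate classes)."""
    out, i = [], 0
    while i < len(label):
        ch = label[i]
        if label.startswith(S_SEQS, i):
            out.append("S")
            i += 4
        elif label.startswith(P_SEQS, i):
            out.append("P")
            i += 2
        elif ch in CONSONANTS:
            out.append("C")
            i += 1
            if i < len(label) and label[i] == NUKTA:
                i += 1  # DDA/DDHA/YA + nukta count as one consonant
        elif ch in VOWELS:
            out.append("V")
            i += 1
        elif ch in KARAS:
            out.append("M")
            i += 1
        elif ch in SIGNS:
            out.append(SIGNS[ch])
            i += 1
        else:
            raise ValueError(f"U+{ord(ch):04X} is outside this sketch's classes")
    return "".join(out)


print(to_symbols("\u0985\u09B0\u09CD\u0995"))  # VPC (a + ra + hasanta + ka)
```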
Let us now elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare. There is no harm in adding such ligatures, however, because any user can easily create them, even if they are not attested combinations.
7.1.1 Case of V Preceded by H
Multi-word domains may need V to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning 'Bank of India'. Here two different words are joined together: the first ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া).
Some sections of the linguistic community require the H to be written explicitly to represent the intended sound fully. By and large, however, the first word without the H (U+09CD) is considered enough to represent the intended sound. The situation arises only because the hyphen, the space and the Zero Width Non-Joiner are not in the permissible set of characters of the Root Zone repertoire. Otherwise, V never needs to follow an H. For the majority of the linguistic communities, permitting this could make two labels (one with H and one without) look similar, so the NBGP explicitly prohibits it. Depending on the prevailing requirements of the community, a future NBGP may revisit this rule.
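The prohibition in 7.1.1 is a simple adjacency check: no independent vowel may directly follow U+09CD. The sketch below applies it to the code points of the 'Bank of India' example, with the H and without it. The vowel set lists the Bangla independent vowel letters discussed in this document. The function name is hypothetical.

```python
HASANTA = "\u09CD"
VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")


def has_vowel_after_hasanta(label: str) -> bool:
    """True if an independent vowel (V) directly follows a HASANTA (H)."""
    return any(a == HASANTA and b in VOWELS for a, b in zip(label, label[1:]))


# Code points of the 'Bank of India' example in 7.1.1
with_h = "".join(map(chr, [0x09AC, 0x09CD, 0x09AF, 0x09BE, 0x0999, 0x09CD, 0x0995,
                           0x0985, 0x09AB, 0x09CD, 0x0987, 0x09A8, 0x09CD, 0x09A1,
                           0x09BF, 0x09DF, 0x09BE]))
without_h = with_h.replace("\u09AB\u09CD\u0987", "\u09AB\u0987")  # drop the H after pha

print(has_vowel_after_hasanta(with_h))     # True: this label is rejected
print(has_vowel_after_hasanta(without_h))  # False
```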
7.2 Additional Examples from Bangla ABNF
Below are a few examples that help explain some of the rules the ABNF puts in place. They are given for reference only and are not meant to be comprehensive.
ঃক (Devanāgarī ःक): Such a combination automatically renders with a 'golu', or dotted circle, marking it as an invalid formation. This is an intrinsic property of the
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink, and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. "What Has Happened So Far in Terms of Script Reforms". Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0: Core Specification, Chapter 12, p. 473.
10 Appendix I
10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature. When it is applied to a specific language or script, certain restriction rules apply, because some of its structures do not necessarily apply to a given language. Restriction rules are set in place to take care of such cases and to fine-tune the ABNF. For Bangla¹³ in particular, the following rules apply:
1. Khaṇḍa ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and to the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Bangla also does not normally allow ya (য়) at the beginning of a word, although a couple of native examples can be cited, such as the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichu Chor' by Kazi Nazrul Islam. Initial য় also appears in names, mostly of foreign origin: Yaqub, for example, may be written with ya (য়) at the beginning, as in য়াকুব. More recently, when some Chinese and Japanese names are transliterated into Bangla, Khaṇḍa ta (ৎ) followed by sa (স) may occur at the beginning of a word, for example ৎসেরিং (Tsering).
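Rule 1 reduces to a check on the first code point of a label. The sketch below takes the strict reading: ৎ, ঞ and ঙ are rejected in initial position. The initial-য় and initial-ৎস cases described above are left out, because the text reports them as attested usage without turning them into rules. Under this reading, a transliteration such as ৎসেরিং would be rejected. The function name is hypothetical.

```python
# Rule 1 of 10.1: KHANDA TA, NYA and NGA may not begin a label
FORBIDDEN_INITIAL = {"\u09CE", "\u099E", "\u0999"}


def initial_ok(label: str) -> bool:
    """Reject empty labels and labels whose first code point is in FORBIDDEN_INITIAL."""
    return bool(label) and label[0] not in FORBIDDEN_INITIAL


print(initial_ok("\u0995"))                                # True: starts with KA
print(initial_ok("\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"))  # False: starts with KHANDA TA
```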
10.2 'Sylheti Nagarī lipi' or 'Siloti'
This version of the Bangla script resembles the 'Kaithī' script (ISO 15924) used by accountants (perhaps of the Kāyastha community) in Eastern Uttar Pradesh and Bihar, which was widely in use during the 1880s. Sylheti Nagarī or Siloti (129) had several other names, such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Others ascribe the credit to Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).
13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipurī are not covered in this section.
Table 17: The Script Table of Sylheti Nagarī or Siloti
10.3 Confusable Code Points
The following code points were analysed. They were found to be either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.
[Table: conjunct clusters analysed for ক (k), গ (g) and ণ (n)]
3.2 Languages Considered
Below is a table of the languages using the Bangla script, placed on EGIDS Scale 1-6 (see 117 for details). Some languages under EGIDS 5 and 6 have also developed their own scripts for printing and publishing. Some used the Bangla script earlier (such as Bodo), or used it in West Bengal at some point (Santali), but later shifted to another writing system. Bodo is now written in Nāgarī or Devanāgarī, and Santali is written in both Nāgarī/Devanāgarī and Ol-chiki (145). For the purposes of the Bangla LGR, only languages on EGIDS scale 1 to 4 have been considered. Consider the following table:
EGIDS Scale 1
EGIDS Scale 2
EGIDS Scale 3
EGIDS Scale 4
EGIDS Scale 5
EGIDS Scale 6
Bangla (Bengali)
Santali, Bodo, Riang, Khumi, Mru(ng), Asho
Lepcha, Pnar, Koda, Kora, Chak
Asamiyā (Assamese)
Koch or Rajabansī
Malto or Malpahariya
Manipuri or Meitei
Bisnupriya Manipuri, Kok-Borok (Tripura & Bangladesh)
Chakma, Hajong, Mundari & Kurux (of Bangladesh)
Toto, Rohingya, Tippera, Megam, Tanchangya
Usoi, Limbu, Sadri or Oraon
Bhumij or Mundari, Bawm, Chin
Table 4: Main languages in India and Bangladesh that use the Bangla script, on the EGIDS Scale
3.3 Notable Features of Bangla Script [150]
The Bangla Writing System has certain features that show how it has to be written, or how type-setting in Bangla can be done. This section is followed by one that explains the code points (and fixed code point sequences) which show distinctive characteristics of Bangla and which make up the Repertoire. The following sections also cover the 'akshar'-formation rules (ABNF) showing character classes, the Whole Label Evaluation (WLE) and Context Rules, and the In-Script and Cross-Script Variants. Here we present some basic features of the script and its pronunciation.
The Bangla script is an alpha-syllabic writing system in which all consonants are assumed to carry an accompanying 'inherent' vowel (theoretically before or after each consonant). This vowel varies between ɔ and o depending on the position of the consonant in the word. At times these 'assumed' or 'inherent' vowels are not pronounced at all [142].
Vowels can be written either as independent letters or with a variety of diacritical marks. These marks are placed above, below, before or after the consonant they follow in pronunciation, or on both sides of it [105].
When consonants occur together in clusters, special conjunct letters are formed. Many of these consonantal clusters, or conjoined consonants, are in use in printed Bangla. The letters for the consonants other than the final one in the group are generally reduced. There are, however, a few special conjunct characters that are compounds of the consonant characters, e.g. ক (k) + ষ (ṣ) = ক্ষ (kṣ), ঞ (ñ) + জ (j) = ঞ্জ (ñj), জ (j) + ঞ (ñ) = জ্ঞ (jñ) and হ (h) + ম (m) = হ্ম (hm).
There are other issues as well. As the second member of a cluster, র is reduced to a secondary symbol, e.g. প (p) + র (r) = প্র (pr) and ষ (ṣ) + ট (ṭ) + র (r) = ষ্ট্র (ṣṭr) (as in উষ্ট্র uṣṭra 'camel'). Used as a primary symbol, য (y) represents jɔ in Bangla. Its secondary symbol (allograph), jɔ-phalā, however, has two phonetic values. Added to the initial consonant of a word, it is the vowel æ (as in শ্যামল (syamala) 'green', র‍্যাপার (ryapara) 'wrapper', etc.). After a non-initial consonant, it simply doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র (r) + য (y) combination has two
shape of the second member is changed, e.g. দ্ধ (ddh), গ্ধ (gdh) and ন্ধ (ndh) respectively. The solitary example of র (r) + ঋ (ṛ) = র্ঋ (as in নৈর্ঋত nairṛta 'southwest'), used mostly in classical borrowings, shows a consonant's secondary symbol followed by a vowel's primary symbol. The inherent vowel applies only to the final consonant of the cluster.
The Bangla script has at least fifty-two primary symbols and quite a few allographs (positional variants of them). These correspond to forty-four phonemes, or functional speech sounds (7 oral vowels, 7 nasal vowels and 30 consonants) (150), with some obvious redundancies. In one of the first phonemic analyses, however, the number was thought to be thirty-five phonemes [140].
As mentioned above, several graphemic symbols in Bangla have secondary shapes, technically called 'allographs', each with a complementary distribution. These graphs or markings are generally added at the following positions of the primary symbol [113]:
Let us now consider the complementary distribution of vowel letters (word- or syllable-initial) and vowel Mātrās, which is relevant for the ABNF. Besides some simple vowel modifiers, called 'kārs' in Bangla (and Mātrā in the other Neo-Brāhmī LGR documents), there are some combinatory modifiers of Bangla vowels with certain consonants. For example, whereas
আ U+0986 BENGALI LETTER AA is substituted by া U+09BE BENGALI VOWEL SIGN AA,
ই U+0987 BENGALI LETTER I is substituted by the pre-posed ি U+09BF BENGALI VOWEL SIGN I,
ঈ U+0988 BENGALI LETTER II is substituted by ী U+09C0 BENGALI VOWEL SIGN II, and
উ U+0989 BENGALI LETTER U is substituted by ু U+09C1 BENGALI VOWEL SIGN U, marked below the primary grapheme,
there are some special vowel modifiers of উ, as in the following combined letters: গু (gu), rather than গ (g) + ু (u); ভ (bh) + র (r) (ভ্রু bhru 'eyebrow'); o (s) + র (r) (psru); ঋ (ṛ) after হ (h) (হৃ hṛ), etc.
There have been many notable contributions to simplifying and modifying Bangla spellings and combinatory techniques, especially by scholars such as Pabitra Sarkar (1992) [134]. These efforts have tried to reduce the number of allographs of both vowels and consonants in clusters, and the result has been widely accepted in the printing of school texts in both Bangladesh and West Bengal [151, 152]. At present, two systems, the old (traditional) one and the new one, coexist and are used in different domains.
In preparing this LGR document, however, the aim has been to consider the widely used and usable sequences and combinations, and their variations across the sister scripts of the Brāhmī family of writing systems.
Bangla Academy, Dhaka published its Standard Bangla Spelling Rules in 1992. It followed the recommendations of a committee constituted through a workshop jointly organized in 1988 with the Jātīya Śikṣākrama and Pāṭhyapustaka Board. A thoroughly revised edition of the Rules was published in September 2012⁶.
The Bangla Akademi of West Bengal was established in 1986. In his inaugural address, its first President, Annadasankar Ray (1904-2002), gave a direction for standardizing the Bangla alphabet, script and spelling system, and clearly argued that they would not blindly follow the Sanskritic model of conventional grammar. A broad list of proposals was sent to experts on Bangla, and by 1988 a broad agreement was reached on the 'homogenization of Bangla spelling'. Based on opinions received from different quarters, a unanimous list of 'rules' was agreed upon. It was published in a 'Spelling Dictionary' titled Ākādemi Bānāna Abhidhāna (1997), which was more comprehensive than 'The University of Calcutta proposals' of 1936.
Along with the 'rationalization' of spellings, another step was taken to make the writing system easier to read: the symbols used, both single and combined, were made more 'transparent'. These reforms were originally suggested by Sarkar (1987, first published in 1978) [134] [153]. He used the terms Swaccha ('Transparent') and Aswaccha ('Opaque' or non-transparent), and added Ardha Swaccha ('half transparent') in between. Some examples follow. Transparent: ন্ন (nn), প্ত (pt), স্ত (st), where both members of the cluster can be recognized.
6 Bangla Academy. 2012. Bāṅlā Ekāḍemī Pramita Bāṅlā Bānānera Niyama (Bangla Academy Standard Bangla Spelling Rules). Dhaka: Bangla Academy.
There were, in fact, two types of proposals. The first concerned the shapes of the letters: consonant + vowel (CV) combinations and conjuncts, i.e. consonant + consonant combinations. It also covered more complex shapes, namely consonant + consonant + (consonant +) vowel (CC(C)V) signs, as in প্রু (pru) or স্ক্রু (skru). Decisions in this area were needed because a few of the CC(C) symbols represented complexities that made them difficult for children to learn. The second type dealt only with the spellings of words, without reference to the shapes of the letters in which they were written. Its basic objective was 'one word, one spelling', to the greatest extent possible [151].
Below is a statement of the most salient changes affecting consonant + vowel combinations [153]:
a. The variants of the short u (হ্রস্ব উ-কার, hrasva u-kāra) vowel sign have been reduced to one, i.e. ু. So গু (gu) is now written as গ with a separate ু. Similarly, ru > র + ু, śu > শ + ু and hu > হ + ু. The same applies to cluster + short u sign: ntu > (ন + ্ + ত + উ-কার) and stu > (স + ্ + ত + উ-কার).
b. The variants of the long u (দীর্ঘ ঊ-কার, dīrgha ū-kāra) have also been reduced:
rū > র + ূ; bhrū > (ভ bh + ্ + র r + ঊ ū); drū > (দ d + ্ + র r + ঊ ū); śrū > (শ ś + ্ + র r + ঊ ū)
c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been reduced to one: hṛ > হ + ৃ.
For consonant + consonant + (consonant)… + (vowel) clusters, the Paschimbanga Bangla Akademi proposed transparent or semi-transparent shapes, to the extent admissible in the Bangla writing system. Some examples clarify the proposal. In each pair, the traditional cluster shape precedes the slash and the Bangla Akademi innovation follows it [153]:
3.3.1 The Consonants
According to traditional classification, Bangla consonants are categorized by their phonetic properties, especially place and manner of articulation [107]. There are five 'Vargas' (pronounced 'Barga' in Bangla), i.e. groups (sets or classes) distinguished by place of articulation, and one non-'varga' group [105]. Each Varga corresponds to the stops at one place of articulation and contains a series of five consonants. They are ordered by their phonetic qualities (i.e. manner of articulation): from unvoiced unaspirated to voiced aspirated (in the fourth column), and finally the homorganic or corresponding nasal [107]. Consider the following table:
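In the Unicode Bengali block, each varga occupies five consecutive code points in the traditional order. The homorganic nasal therefore always comes last. U+09A9, between the dental and labial rows, is unassigned, so the labial varga starts at U+09AA. A sketch that derives the rows from code points follows. The English labels (velar, palatal, retroflex, dental, labial) are the conventional phonetic names, and the helper `varga` is hypothetical.

```python
import unicodedata

# First code point of each varga (conventional phonetic labels)
VARGA_START = {"velar": 0x0995, "palatal": 0x099A, "retroflex": 0x099F,
               "dental": 0x09A4, "labial": 0x09AA}


def varga(name: str) -> list[str]:
    """Return the five consonants of a varga, ending with its homorganic nasal."""
    start = VARGA_START[name]
    return [chr(cp) for cp in range(start, start + 5)]


for name in VARGA_START:
    letters = varga(name)
    print(name, "".join(letters), unicodedata.name(letters[-1]))
```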
3.3.2 The Implicit Vowel Killer: Hasanta (called 'Halant' or 'Halanta' in other Brāhmī-based scripts)
As stated earlier, every consonant is pronounced in isolation with an implicit vowel assumed to be associated with it (central-back ɔ in Bangla, the neutral vowel) [121]. This report uses two terms to denote the character that marks the absence of this inherent vowel: 'Hasanta' (= 'Halant' or 'Halanta' in other Brāhmī-based scripts) and 'Virāma'⁷ (= 'Dari' in Bangla), the term preferred in UNICODE (cf. Unicode 3.0 and above). UNICODE has adopted the term virama in a sense different from the traditional grammatical definition, so some explanation is needed here. Given the importance of this document, the note belongs in it, so that anyone referring to the document can find the proper grammatical explanation of the term.
Because a special sign is needed whenever this implicit vowel is stripped off, the symbol is known as the Hasanta (= Halant) (U+09CD). Placing the Hasanta under the first consonant of a combination or cluster "kills" its vowel, in common parlance, and creates a conjunct. In this manner, conjunct characters are generally written by joining two to four consonants, and in rare cases up to five. These figures are observations based on the CIIL-Emille Corpora of Bangla words [132 & 133] as seen in print today, so any maximum number of consonants joining to form one akṣara⁸ can only be bounded empirically. Given the mixture of scripts and languages on the web, one may well want a generic Top Level Domain [gTLD] with more than the observed maximum. This can happen, for example, when a foreign word with a large number of consonants is transliterated into Bangla. Hence this limit will not be enforced in the Bangla LGR.
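To make the "limit not enforced" point concrete, the sketch below measures the number of consonants in the longest C(HC)* run of a word, without capping it. The consonant class is an approximation: the Bengali consonant range plus KHANDA TA and the Assamese RA/WA. It does not handle nukta forms inside clusters. The function name `longest_cluster` is hypothetical.

```python
import re

HASANTA = "\u09CD"
# Approximate consonant class: U+0995..U+09B9, KHANDA TA, Assamese RA and WA
CONSONANT = "[\u0995-\u09B9\u09CE\u09F0\u09F1]"
CLUSTER = re.compile(CONSONANT + "(?:" + HASANTA + CONSONANT + ")*")


def longest_cluster(word: str) -> int:
    """Consonant count of the longest C(HC)* run; 0 if the word has no consonant."""
    return max((m.group().count(HASANTA) + 1 for m in CLUSTER.finditer(word)), default=0)


# No maximum is compared against: the Bangla LGR leaves cluster length unbounded
print(longest_cluster("\u0989\u09B7\u09CD\u099F\u09CD\u09B0"))  # 3 (u + ssa-tta-ra)
```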
3.3.3 Vowels
Bangla has separate symbols for all 'Swara', or vowels, which are pronounced independently, either at the beginning of a word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign, called 'kār' in Bangla or Mātrā in Nāgarī⁹, is attached to the consonant. Because the consonant already has the neutral vowel built in, there are equivalent kāras (Mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:
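The correlation can also be written as a lookup table from each independent vowel letter to its kāra. অ has no entry, because the bare consonant already carries it. The table below is limited to the vowel letters discussed in this document. U+09C4, which is retained without its independent letter (see 4.2.1.3), is left out. The names `KARA_OF` and `with_vowel` are hypothetical.

```python
import unicodedata

# Independent vowel letter -> dependent vowel sign (kara)
KARA_OF = {
    "\u0986": "\u09BE",  # AA
    "\u0987": "\u09BF",  # I (pre-posed sign)
    "\u0988": "\u09C0",  # II
    "\u0989": "\u09C1",  # U
    "\u098A": "\u09C2",  # UU
    "\u098B": "\u09C3",  # VOCALIC R
    "\u098F": "\u09C7",  # E
    "\u0990": "\u09C8",  # AI
    "\u0993": "\u09CB",  # O
    "\u0994": "\u09CC",  # AU
}
INHERENT = "\u0985"  # A: no kara, the bare consonant already carries it


def with_vowel(consonant: str, vowel: str) -> str:
    """Spell consonant + vowel: the bare consonant for A, otherwise consonant + kara."""
    return consonant if vowel == INHERENT else consonant + KARA_OF[vowel]


for vowel, kara in KARA_OF.items():
    print(unicodedata.name(vowel), "->", unicodedata.name(kara))
```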
7 Virāma as used here is also a misnomer according to the Indian grammatical traditions: nowhere is the mere absence of a vowel marked as virāma. Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
8 This term needs to be disambiguated. Akṣara also means 'syllable' in the Indian grammatical traditions.
9 The term 'Mātrā' in Bangla, however, stands for an altogether different concept, viz. the top bar placed over a letter, found in Hindi and Bangla but missing in Gujarati.
3.3.4 The Anusvāra, onuʃʃār (ং - U+0982)
The Anusvāra, onuʃʃār in Bangla, sometimes represents a homorganic nasal, but not always. It replaces the conjunct group 'Nasal Consonant + Hasanta + Consonant' when the second consonant belongs to the velar varga, or set, as in লংকা. It often also appears in such combinations when the last member is a non-velar, as in ল্যাংটা 'naked' or ল্যাংচা 'a kind of sweet; to limp'. Before a non-varga consonant, the Anusvāra represents a nasal sound that may alternatively be written as a conjunct with the corresponding nasal consonant of the particular set. Modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal. Bangla, by contrast, clearly demarcates where the Anusvāra must be used and where a conjunct cluster with a nasal as the first or second component is required.
3.3.5 Nasalization: Candrabindu (ঁ - U+0981)
Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cā̃d 'moon' (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].
3.3.6 Nukta (় - U+09BC)
The nukta sign does not exist in Bangla orthography. It is used predominantly in many Brāhmī-derived scripts, such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi), and both the term and the concept of nukta are borrowed in Bangla. The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, this LGR has to represent the Bangla letters য় YYA, ড় RRA and ঢ় RHA by the sequences YA + Nukta (U+09AF + U+09BC), DDA + Nukta (U+09A1 + U+09BC) and DDHA + Nukta (U+09A2 + U+09BC). It cannot use the single code points YYA (U+09DF), RRA (U+09DC) and RHA (U+09DD), even though the use of 'Nukta' is otherwise completely unnatural in Bangla.
The current Unicode Standard chart lists these characters as additional consonants. Under the LGR Procedure, however, these decisions depend on the IDNA Protocol, through a set of procedures developed by the IETF. The Unicode Standard also provides the three characters as atomic code points (09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y], each as a single keystroke). Even so, the IDNA protocol requires that they be treated as conjunct characters, i.e. as sequences of code points in the Unicode Bengali block.
There may also be sporadic attempts to write Muslim names, Urdu poetic words and Perso-Arabic loan words with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph). Such spellings serve only to indicate the correct pronunciation and to preserve the sanctity of the loan word, in effect making the Bangla writing system work like the IPA. This usage, however, is not found in printed Bangla.
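The composition-exclusion point above can be checked with Python's standard `unicodedata` module. NFC does not recompose U+09DC, U+09DD or U+09DF. Any label typed with the atomic letters therefore arrives in the two-code-point nukta form. The helper name `nfc_codepoints` is only illustrative.

```python
import unicodedata


def nfc_codepoints(s: str) -> list[str]:
    """Code points of the NFC form of `s`, as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", s)]


for ch in ("\u09DC", "\u09DD", "\u09DF"):
    print(unicodedata.name(ch), "->", nfc_codepoints(ch))
# BENGALI LETTER RRA -> ['U+09A1', 'U+09BC']
# BENGALI LETTER RHA -> ['U+09A2', 'U+09BC']
# BENGALI LETTER YYA -> ['U+09AF', 'U+09BC']
```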
3.3.7 Visarga, biʃɔrgo (ঃ - U+0983) and Avagraha (ঽ - U+09BD)
The Visarga, biʃɔrgo (U+0983), is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. An example is দুঃখ duhkho 'sorrow', 'unhappiness' (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in Bangla. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি is re-written as নরো’পরাণি), and it is now rarely used, even in other languages written in the Bangla script. The Avagraha is not part of the repertoire of this LGR: it has been decided not to retain Avagraha ঽ (U+09BD), because it is blocked in TLDs as per the Maximal Starting Repertoire (MSR). Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.
type in the following manner: র + ্ + য. For র‍্য, however, the sequence is র + ZWJ + ্ + য [154]. In other words, ZWJ is used to render words that require a ya-phalā after ra. Without it, such words cannot be typed (rendered), because the plain order ra + hasanta + antastha ja in medial and/or final position already has another use: it types a repha on the consonant antastha ja, as in কার্য (kaarjo). To get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার.
The [Procedure] rules out the use of ZWJ and ZWNJ in the root zone. In Bangla these two signs are used to create alternate renderings, and inserting them can affect searching as well as NLP. The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) to prevent the default conjunct formation, so that the Hasanta between the two consonants is shown explicitly.
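The two spellings above differ only by an invisible U+200D, and that is why they matter for labels. The sketch below builds both sequences and flags ZWJ/ZWNJ. As we read this paragraph together with the previous one, the ra + ya-phalā spelling cannot appear in a root zone label at all. The helper name `has_joiner_controls` is hypothetical.

```python
ZWJ, ZWNJ = "\u200D", "\u200C"
RA, HASANTA, YA = "\u09B0", "\u09CD", "\u09AF"

repha_on_ya = RA + HASANTA + YA             # ra + hasanta + ya: repha over ya
ra_with_ya_phala = RA + ZWJ + HASANTA + YA  # ra + ZWJ + hasanta + ya: ra with ya-phala


def has_joiner_controls(label: str) -> bool:
    """True if the label contains ZWJ or ZWNJ, which the root zone excludes."""
    return ZWJ in label or ZWNJ in label


print(has_joiner_controls(repha_on_ya))       # False
print(has_joiner_controls(ra_with_ya_phala))  # True
```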
3.3.9 Use of Ya-phalā
a. Ya-phalā + ā sequences: Bangla has two instances where Hasanta is preceded by a full vowel (U+0985 অ - BENGALI LETTER A and U+098F এ - BENGALI LETTER E).
To render Ya-phalā after অ and এ, one must type U+09CD Hasanta plus U+09AF ya after the vowel. This is a purely ligatural entity. The Ya-phalā together with the ā-kāra is used to produce the æ sound, as in English 'acid' অ্যাসিড, 'association' অ্যাসোসিয়েশন, 'bat' ব্যাট, 'fat' ফ্যাট, 'mat' ম্যাট, 'cap' ক্যাপ, etc. By nature, the Brāhmī script does not have a Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, so only consonants can carry it. As we see here, however, Bangla orthography ends up with a deviant feature: the Hasanta comes immediately after a vowel, in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
Owing to its co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka 'the sun'), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra 'cycle'). In both cases, the slot for ra can be filled by the Bangla ra র (U+09B0) or by the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), and the shapes of repha and ra-phalā remain the same either way.
The LGR notes this point of concern about the two RAs in disguise. It would be completely impossible to tell them apart with the naked eye in a generated label, which could lead to spoofing and other kinds of cyber irregularities. Fully rendered labels may mask the distinction between Bangla ra র (U+09B0) and Assamese ra ৰ (U+09F0), and this is the motive for classing the two code points as (blocking) variants. It provides the justification for Variant Set 4, though only in the context of a following Hasanta. The two RAs can be distinguished only by looking at their Unicode values. Labels such as অর্ক arka, শীর্ষ śīrṣa 'top, apex', অভ্র abhra 'cloud, the sky' and শ্রম śrama 'physical labour' could therefore be extremely dangerous, because web users may never check the digital content (the labels) against its Unicode values (code points). This point is made explicit in Table 9 (of sequences, p. 36) and Table 16 (of WLE Symbols, p. 47), which follow. Moreover, it
4 Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) has been formed by members having experience in Linguistics (especially in NLP/Computational Linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR assigned to each. However, an attempt is made to ensure that the fundamental philosophy behind building those LGRs is consistent with all other Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about Bangla LGR code points.
4.1 Guiding Principles
The NBGP adopts the following broad principles for the selection of code points in the code point repertoire, across the board for all the Neo-Brāhmī scripts within its ambit.
4.1.1 Inclusion Principles
4.1.1.1 Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription purposes only, or for archival purposes, will not be considered for inclusion in the code point repertoire.
4.1.1.2 Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.
4.2 Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards which are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.
4.2.1 External Limits of Scope
The code point repertoire for the root zone being a very special case at the top of the protocol hierarchy, the canvas of available characters for selection as part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart: Out of all the characters that are needed by the script in question, if a particular character is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially so in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol: Unicode, being the character-encoding standard for providing the maximum possible representation of a given script/language, has encoded as far as possible all the characters needed by the script. However, domain names being a specialized case, they are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR): The Root Zone LGR is the repertoire of characters which are going to be used for the creation of root zone TLDs. These constitute an even more specialized case of domain names, so the Root Zone LGR procedure introduces additional exclusions on IDNA's allowed set of characters. Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), even if allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the MSR.
To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. The IDNA protocol further narrows this down, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
4.2.1.1 No Punctuation Marks
The TLDs being identifiers, punctuation marks present in Brāhmī-based scripts will not be included.
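The three successive filters of 4.2.1 compose naturally as predicates. The Python sketch below is only an approximation, built on stated assumptions. Filter ii uses the general-category test of RFC 5892's LetterDigits category rather than the full IDNA2008 derivation. Filter iii knows only the avagraha example from the text plus the root-zone exclusion of digits. The authoritative data are the IDNA2008 tables and MSR-4, and all names below are hypothetical.

```python
import unicodedata

# Rough stand-in for IDNA2008 (RFC 5892) category A "LetterDigits".
# The real PVALID derivation applies further rules; use the published tables.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Lm", "Mn", "Mc", "Nd"}

# Illustrative MSR exclusion only; the authoritative list is MSR-4.
MSR_EXCLUDED_EXAMPLES = {0x09BD}  # ঽ BENGALI SIGN AVAGRAHA


def in_bengali_block(cp: int) -> bool:
    # Filter i: the script's Unicode block.
    return 0x0980 <= cp <= 0x09FF


def idna_approx(cp: int) -> bool:
    # Filter ii (approximate): unassigned points have category "Cn" and fail here.
    return unicodedata.category(chr(cp)) in LETTER_DIGITS


def msr_approx(cp: int) -> bool:
    # Filter iii (approximate): digits are ineligible for the root zone,
    # plus the example exclusion above.
    return unicodedata.category(chr(cp)) != "Nd" and cp not in MSR_EXCLUDED_EXAMPLES


def root_zone_candidate(cp: int) -> bool:
    # Each filter can only narrow what the previous one admitted.
    return in_bengali_block(cp) and idna_approx(cp) and msr_approx(cp)


candidates = [cp for cp in range(0x0980, 0x0A00) if root_zone_candidate(cp)]
```

The ordering mirrors the text: a code point outside the block never reaches the IDNA test, and one rejected by IDNA never reaches the MSR test.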
4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters like BENGALI ISSHAR (U+09FA), BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), etc. will also not be included.
4.2.1.3 No Rare and Obsolete Characters
There are characters which have been added to Unicode to accommodate rare forms. These include the Sanskritic VOCALIC RR ৠ (U+09E0), VOCALIC L ঌ (U+098C) and VOCALIC LL ৡ (U+09E1), as well as the allographic kāra forms of the latter two symbols, VOWEL SIGN VOCALIC L (U+09E2) and VOWEL SIGN VOCALIC LL (U+09E3). All such characters are excluded, which complies with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the kāra corresponding to VOCALIC RR ৠ (U+09E0), which is VOWEL SIGN VOCALIC RR (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.
4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.
4.2.1.5 ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).
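The exclusions of 4.2.1.2 and 4.2.1.3, and the one retained sign, can be kept as a plain lookup table. The helper below is hypothetical and not part of the LGR XML. It uses only the code points named above, together with Python's standard `unicodedata` module, to report them by their Unicode names.

```python
import unicodedata

# Code points that 4.2.1.2 and 4.2.1.3 exclude from the repertoire.
EXCLUDED = {0x09FA, 0x09F9, 0x09E0, 0x098C, 0x09E1, 0x09E2, 0x09E3}
# Retained despite being Sanskritic: BENGALI VOWEL SIGN VOCALIC RR.
RETAINED = {0x09C4}
assert EXCLUDED.isdisjoint(RETAINED)  # sanity check on the two tables


def excluded_code_points(label: str) -> list[tuple[str, str]]:
    """List (U+XXXX, Unicode name) for each excluded code point in label."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in label
        if ord(ch) in EXCLUDED
    ]


print(excluded_code_points("\u0995\u09E2"))  # [('U+09E2', 'BENGALI VOWEL SIGN VOCALIC L')]
print(excluded_code_points("\u0995\u09C4"))  # []
```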
5 Repertoire
The Bangla writing system is represented in Unicode using the Bengali (Bangla) script name as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in Unicode has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to be included in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26 February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in the 8th Meeting of the Indian Language Technologies and Products Sectional Committee (LITD 20) held on 23 August 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 11 is valid for all three languages above.
For each of the code points, language references have been given in the last column, titled 'Reference', of Table 8, titled the 'Code Point Repertoire'. For complete coverage of Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in Bangla script is not known to have introduced many new complications, except for one particular character. Though only a few representative languages under EGIDS scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under Bangla Unicode Points (as given in Unicode 6.3). However, before the details are presented, it is ideal to look at the Bangla code point chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs given below in the code charts are based on one of the many fonts designed and are not prescriptive, because there could be some variation in actual fonts, both Unicode-compatible and TrueType ones. Consider the following code point table.
Colour convention:
All characters that are included in the [MSR]: yellow background.
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background.
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background.
Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences. These enable the conditional inclusion in the repertoire of BENGALI LETTER A and E followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, again followed by BENGALI VOWEL SIGN AA, so that the æ sound, as in English 'bat', 'cat', etc., can be represented.
5.2 Code Point Repertoire Exclusion
There are some characters of the Bangla script that find a place in Unicode but have not been included in the repertoire in this LGR proposal. The reason for excluding ঌ (U+098C) and ৗ (U+09D7) is that they are rare and obsolete characters.
5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire as a standalone code point, since it will never be used alone. It will be used in sequences for three special characters in normalized form: ড়, ঢ়, য়.
5.4.2 The Vowel Sequence
In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To facilitate understanding for users of other Brāhmī scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra, onuʃʃār (B), a Candrabindu (D) or a Visarga, biʃɔrgo (X). The number of D, B or X which can follow a V in Bangla may not be restricted to one. Going by the rules illustrated in this document, it is clear that formations such as VDD, VBB and VXX are invalid orthographic units. However, it is valid and possible to have formations or sequences combining a Candrabindu with an Anusvāra on the one hand and with a Visarga on the other, as in হ্যাঁংচা 'hænchā' and 'hæn' হ্যাঁঃ respectively. The possibility of a Visarga or Anusvāra following a Candrabindu exists in Bangla. A vowel can optionally be followed by a combination of Hasanta/Virama [H] + Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF, Bangla letter য or 'ya', represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla Hasanta), U+09AF য BENGALI LETTER YA>. Ya-phalā has a special form ্য. When further combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. 'aa', ā), it is used for transcribing [æ], as in the 'a' of the English word 'bat', written in Bangla as ব্যাট. A vowel sequence admits the following combinations:
5.4.2.1 A Single Vowel
Example: V, অ (Devanāgarī अ)
5.4.2.2 A Vowel with Conditions
A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].
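Because 5.4.2.2 is a finite list of alternatives, it can be transcribed directly as a regular expression over the category letters, one letter per code point (for example, অ্যা becomes VHCM). The sketch below follows the list literally and assumes the category string has already been computed. The function name is hypothetical.

```python
import re

# 5.4.2.2 as a regular expression over category letters:
# V, optionally followed by B, X, D, DB, DX, or H C M (Ya-phala + kara).
VOWEL_SEQUENCE = re.compile(r"V(?:B|X|D[BX]?|HCM)?")


def is_vowel_sequence(categories: str) -> bool:
    """categories is a string of symbols such as 'VDB' (one letter per code point)."""
    return VOWEL_SEQUENCE.fullmatch(categories) is not None


for s in ["V", "VDB", "VDX", "VHCM", "VDD", "VBB", "VXX"]:
    print(s, is_vowel_sequence(s))
```

The invalid units named in 5.4.2 (VDD, VBB, VXX) fail because the optional group can match only once.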
5.4.3.2 A Consonant with Conditions
A consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
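The same transcription works for 5.4.3.2. This sketch follows that list literally, so combinations the list does not name are outside its scope. One example is a kāra followed by an Anusvāra, as in বাংলা. The function name is hypothetical, and the input is again a precomputed category string.

```python
import re

# 5.4.3.2 transcribed literally: C, optionally followed by exactly one of
# M, B, D, X, H, DB or DX.
CONSONANT_SEQUENCE = re.compile(r"C(?:M|B|X|H|D[BX]?)?")


def is_consonant_sequence(categories: str) -> bool:
    return CONSONANT_SEQUENCE.fullmatch(categories) is not None


print(is_consonant_sequence("CDB"))  # True
print(is_consonant_sequence("CHH"))  # False
```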
5.4.3.5 Subsets
While considering its subsets, as a representative example we will consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC [A]. The combination may be followed by M, B, D, X, DB or DX.
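A regular expression captures the subsets directly. Section 3.3.2 states that no maximum cluster length is enforced, so this sketch allows any number of (H C) pairs rather than stopping at CHCHCHC. The function name is hypothetical.

```python
import re

# 5.4.3.5: a consonant followed by one or more (Hasanta, consonant) pairs,
# i.e. CHC, CHCHC, CHCHCHC, ... (no upper bound, per 3.3.2), optionally
# followed by one of M, B, D, X, DB or DX.
CLUSTER = re.compile(r"C(?:HC)+(?:M|B|X|D[BX]?)?")


def is_cluster(categories: str) -> bool:
    return CLUSTER.fullmatch(categories) is not None


print(is_cluster("CHCHCHCDX"))  # True
print(is_cluster("CHCH"))       # False
```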
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.
Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র), used in Bangla, or RA U+09F0 (ৰ), used in Assamese.
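The KHANDA TA condition is a context check on the two code points before ৎ. Below is a minimal sketch with a hypothetical function name. It covers only the C + H + Khanda Ta case described in the note; other positions of ৎ are not governed by it.

```python
HASANTA = "\u09CD"                  # ্ BENGALI SIGN VIRAMA
KHANDA_TA = "\u09CE"                # ৎ BENGALI LETTER KHANDA TA
RA_LETTERS = {"\u09B0", "\u09F0"}   # র Bangla RA, ৰ Assamese RA


def khanda_ta_context_ok(label: str) -> bool:
    """A Khanda Ta that follows a Hasanta is accepted only in the shape
    RA + Hasanta + Khanda Ta (either RA)."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i > 0 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] not in RA_LETTERS:
                return False
    return True


print(khanda_ta_context_ok("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা -> True
print(khanda_ta_context_ok("\u0995\u09CD\u09CE"))                          # ক্ৎ -> False
```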
5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S in the first instance. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full
¹⁰ Refer to Rule P in Section 7, Table 16.
vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). To render Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya, preceded by the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant. Only consonants can have the Hasanta marked. But, as we see here, Bangla ends up with a deviant feature in its orthography wherein Hasanta comes immediately after a vowel, in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]). Another case concerns the formation of repha and ra-phalā in this script, mentioned in the table above as P. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta +
6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusing variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from character formations as normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. The confusable but not identical cases are not proposed, as there is another panel (the String Similarity Assessment Panel) entrusted to deal with such cases. However, cases which belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Bangla akshar formation. These can cause confusion even to a careful observer and hence are being proposed as variants.
Variants arise in a script when two or more forms are composed from different stored code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or obtain the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variants, because a kāra followed by a kāra is not allowed.
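There is a further, independent reason the two ways of typing কো cannot coexist in labels. IDNA2008 requires U-labels to be in Unicode Normalization Form C (RFC 5891), and NFC already unifies them. A minimal check with Python's standard `unicodedata` module:

```python
import unicodedata

KA = "\u0995"       # ক BENGALI LETTER KA
E_KARA = "\u09C7"   # ে BENGALI VOWEL SIGN E
AA_KARA = "\u09BE"  # া BENGALI VOWEL SIGN AA
O_KARA = "\u09CB"   # ো BENGALI VOWEL SIGN O

one_key = KA + O_KARA             # কো typed with a single key
two_keys = KA + E_KARA + AA_KARA  # the same shape typed as e-kara + aa-kara

# NFC composes e-kara + aa-kara into the single o-kara code point, so the
# two-key form never survives into a normalized label.
print(unicodedata.normalize("NFC", two_keys) == one_key)  # True
print(unicodedata.normalize("NFD", one_key) == two_keys)  # True
```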
6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I: As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) with Hasanta appears as a conjunct with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus
স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus
ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
The above combinations, if written in traditional orthography, could be a little confusing, as the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly by someone who takes it for a হ (ha) U+09B9, increasing the risks in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthula, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
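One way to detect whether two labels are in-script variants under Case I is to fold both to a common form and compare. In this minimal sketch, folding the হ-spelling to the থ-spelling is an arbitrary choice, and the function names are hypothetical. An LGR tool would instead express these as variant mappings in the XML.

```python
HASANTA = "\u09CD"
THA, HA = "\u09A5", "\u09B9"   # থ, হ
SA, NA = "\u09B8", "\u09A8"    # স, ন

# Case I of 6.1; the direction of folding is an arbitrary choice for this sketch.
FOLDS = {
    SA + HASANTA + HA: SA + HASANTA + THA,  # স্হ -> স্থ
    NA + HASANTA + HA: NA + HASANTA + THA,  # ন্হ -> ন্থ
}


def fold(label: str) -> str:
    for variant, target in FOLDS.items():
        label = label.replace(variant, target)
    return label


def are_in_script_variants(a: str, b: str) -> bool:
    """True if two distinct labels differ only by the Case I conjuncts."""
    return a != b and fold(a) == fold(b)


sthana = "\u09B8\u09CD\u09A5\u09BE\u09A8"          # স্থান
sthana_with_ha = "\u09B8\u09CD\u09B9\u09BE\u09A8"  # স্হান
print(are_in_script_variants(sthana, sthana_with_ha))  # True
```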
6.2 Cross-Script Variants
A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusions they may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Although there are certain characters which are somewhat similar, they have not been included here. They have been provided in the Appendix (10.3) for reference.
7 Whole Label Evaluation Rules (WLE)
This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications.
¹¹ Unicode uses Oriya for the script, although Odia is now the official term.
¹² As used by Unicode, denoting and including both Assamese and Maṇipuri.
Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table in the Code Point Repertoire (Section 5.1):
C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ, BENGALI LETTER A) or 098F (এ, BENGALI LETTER E), H is 09CD (্, BENGALI SIGN VIRAMA), C1 is 09AF (য, BENGALI LETTER YA) and M1 is 09BE (া, BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র, BENGALI LETTER RA) or 09F0 (ৰ, ASSAMESE LETTER RA).
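Because P admits either RA, and the two look identical next to a Hasanta, the document classes র (U+09B0) and ৰ (U+09F0) as blocking variants in that context (Variant Set 4). Below is a minimal Python sketch of the resulting collision check. It assumes the context covers both the repha and the ra-phalā positions discussed in the prose; the normative context is whatever the XML LGR defines. The function names are hypothetical.

```python
BANGLA_RA = "\u09B0"    # র BENGALI LETTER RA
ASSAMESE_RA = "\u09F0"  # ৰ BENGALI LETTER RA WITH MIDDLE DIAGONAL (Assamese ra)
HASANTA = "\u09CD"      # ্ BENGALI SIGN VIRAMA


def fold_ra(label: str) -> str:
    """Map Assamese ra to Bangla ra wherever it touches a Hasanta
    (repha: RA + Hasanta; ra-phala: Hasanta + RA)."""
    out = []
    for i, ch in enumerate(label):
        prev_h = i > 0 and label[i - 1] == HASANTA
        next_h = i + 1 < len(label) and label[i + 1] == HASANTA
        out.append(BANGLA_RA if ch == ASSAMESE_RA and (prev_h or next_h) else ch)
    return "".join(out)


def ra_variant_collision(a: str, b: str) -> bool:
    """True if two distinct labels differ only in an RA choice next to Hasanta,
    so registering one should block the other."""
    return a != b and fold_ra(a) == fold_ra(b)


print(ra_variant_collision("\u0985\u09B0\u09CD\u0995", "\u0985\u09F0\u09CD\u0995"))  # অর্ক -> True
print(ra_variant_collision("\u09B0", "\u09F0"))                                      # False
```

A standalone ৰ without a neighbouring Hasanta is left alone, because the two RAs are visually distinct there.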
Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in adding such ligatures: any user can easily create them, even though they may not be attested combinations.
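To apply the symbols above to a concrete label, each code point first has to be mapped to its symbol. The sketch below is a partial, hypothetical mapping. Its ranges are taken from the Unicode Bengali block chart, and its function names are invented for illustration. It does not produce the composite symbols S and P, which are recognized at the sequence level. Anything it does not cover, such as the Nukta, is marked with '?'.

```python
# Partial mapping from code points to the WLE symbols of Section 7.
SINGLE = {
    0x0981: "D",  # ঁ Candrabindu
    0x0982: "B",  # ং Anusvara
    0x0983: "X",  # ঃ Visarga
    0x09CD: "H",  # ্ Hasanta
    0x09CE: "Z",  # ৎ Khanda Ta
}


def wle_category(ch: str) -> str:
    cp = ord(ch)
    if cp in SINGLE:
        return SINGLE[cp]
    if 0x0985 <= cp <= 0x0994:
        return "V"  # independent vowels (unassigned gaps never occur in labels)
    if 0x0995 <= cp <= 0x09B9 or cp == 0x09F0:
        return "C"  # consonants, including Assamese RA
    if 0x09BE <= cp <= 0x09CC:
        return "M"  # dependent vowel signs (kara)
    return "?"      # not covered by this sketch (e.g. Nukta U+09BC)


def category_string(label: str) -> str:
    return "".join(wle_category(ch) for ch in label)


print(category_string("\u09AC\u09BE\u0982\u09B2\u09BE"))  # বাংলা -> CMBCM
print(category_string("\u0985\u09CD\u09AF\u09BE"))        # অ্যা -> VHCM
```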
7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09AF U+09BC U+09BE) (meaning 'Bank of India'). Here two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered enough for full representation of the sound intended for the first word. This is a unique situation necessitated by the lack of a hyphen, space or the Zero Width Non-Joiner character in the permissible set of characters in the root zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptual similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, if required, depending on the prevailing requirements from the community, a future NBGP may consider revisiting this rule.
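The prohibition of V directly after H is a simple adjacency check. In this minimal sketch, the function names are hypothetical, and independent vowels are approximated by the range U+0985 to U+0994. It is run on the 'Bank of India' example with and without the Hasanta after ফ.

```python
HASANTA = "\u09CD"


def is_independent_vowel(ch: str) -> bool:
    return 0x0985 <= ord(ch) <= 0x0994


def vowel_follows_hasanta(label: str) -> bool:
    """True if some independent vowel (V) directly follows a Hasanta (H)."""
    return any(
        label[i] == HASANTA and is_independent_vowel(label[i + 1])
        for i in range(len(label) - 1)
    )


with_h = ("\u09AC\u09CD\u09AF\u09BE\u0999\u09CD\u0995\u0985"
          "\u09AB\u09CD\u0987\u09A8\u09CD\u09A1\u09BF\u09AF\u09BC\u09BE")
without_h = with_h.replace("\u09AB\u09CD\u0987", "\u09AB\u0987")
print(vowel_follows_hasanta(with_h))     # True: the label is rejected
print(vowel_follows_hasanta(without_h))  # False
```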
7.2 Additional Examples from Bangla ABNF
Below are a few examples which help one understand some of the rules the ABNF puts in place. These are given for reference purposes only and are not meant to be comprehensive.
ঃক (Devanāgarī ःक): As can be seen, such a combination will automatically result in a 'golu' or dotted circle, marking it as an invalid formation. This is an intrinsic property of the
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka; Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow; Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum; Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS); Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka; Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka; Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka.
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. "What Has Happened So Far in Terms of Script Reforms". Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0 – Core Specification, Chapter 12, p. 473.
10 Appendix-I
10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language/script certain restriction rules apply. In other words, in a given language some of the formalism structures do not necessarily apply. To take care of such cases, restriction rules are set in place. These restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:
1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and to the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although we can cite a couple of native examples, such as the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichuchor' written by Kazi Nazrul Islam. However, there are instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).
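Rule 1 can be sketched as a check on the first code point. The set below contains only the three letters the rule names. This passage does not settle how the LGR treats য় at the start of a label, so the sketch leaves it alone. Note that ৎসেরিং, cited above as a possible transliteration, is rejected under this rule. The function name is hypothetical.

```python
# Code points that may not begin a label under rule 1 of 10.1.
FORBIDDEN_INITIAL = {
    "\u09CE",  # ৎ BENGALI LETTER KHANDA TA
    "\u099E",  # ঞ BENGALI LETTER NYA
    "\u0999",  # ঙ BENGALI LETTER NGA
}


def initial_ok(label: str) -> bool:
    return bool(label) and label[0] not in FORBIDDEN_INITIAL


print(initial_ok("\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"))  # ৎসেরিং -> False
print(initial_ok("\u09AC\u09BE\u0982\u09B2\u09BE"))        # বাংলা -> True
```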
10.2 'Sylheti Nagarī lipi' or 'Siloti'
This version of the Bangla script resembles the 'Kaithī' script (ISO 15924) used by accountants (perhaps by the Kāyastha community) in eastern Uttar Pradesh and Bihar, widely in use during the 1880s. There were several other names of Sylheti
¹³ This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri have not been covered in this section.
Nagarī or Siloti (129), such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to the Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here (138).
Table 17 – The Script Table of Sylheti Nagarī or Siloti
10.3 Confusable Code Points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable but not enough to be defined as variant code points:
[Table: confusable conjuncts analysed for ক (k), গ (g) and ণ (n); the conjunct glyphs are not recoverable from this copy.]
3.2 Languages Considered
Below is a tabular representation of the languages using the Bangla script that are placed on EGIDS scale 1-6 (see 117 for details). Some languages under EGIDS 5 and 6 have also developed their own scripts for printing and publishing. Some used the Bangla script earlier (such as Bodo), or used it in West Bengal at some point in time (Santali), but have later shifted to another writing system. Bodo is now written in Nāgarī or Devanāgarī, and Santali is written in both Nāgarī/Devanāgarī and Ol-chiki (145). For the purposes of the Bangla LGR, only languages belonging to EGIDS scale 1 to 4 have been considered. Consider the following table:
EGIDS Scale 1
EGIDS Scale 2
EGIDS Scale 3
EGIDS Scale 4
EGIDS Scale 5
EGIDS Scale 6
Bangla (Bengali)
Santali Bodo Riang Khumi Mru(ng) Asho
Lepcha Pnar Koda Kora Chak
Asamiyā (Assamese)
Koch or Rajabansī
Malto or Malpahariya
Manipuri or Meitei
Bisnupriya Manipuri Kok-Borok (Tripura & Bangladesh)
Chakma Hajong Mundari & Kurux (of Bangladesh)
Toto Rohingya Tippera Megam Tanchangya
Usoi Limbu Sadri or Oraon
Bhumij or Mundari Bawm Chin
Table 4 – Main languages in India and Bangladesh that use Bangla script on the EGIDS scale
3.3 Notable Features of Bangla Script [150]
The Bangla writing system has certain features that show how it has to be written, or how typesetting in Bangla could be done. This section is followed by a section that explains the code points (and fixed code point sequences) which show certain distinctive characteristics of Bangla and which make up the repertoire. The next sections will also cover the 'akshar'-formation rules (ABNF) showing character classes, Whole Label Evaluation (WLE) and context rules, as well as in-script and cross-script variants. Here we present some basic features of the script and its pronunciation.
The Bangla script is an alpha-syllabic writing system in which all consonants are assumed to contain an accompanying 'inherent' vowel (theoretically before or after each consonant). It varies between ɔ and o depending on the position of the consonant in the word. At times, these 'assumed' or 'inherent' vowels are not pronounced at all [142].
Vowels can be written as independent letters or by using a variety of diacritical marks. These are written above, below, before or after the consonant they follow in pronunciation, or in both of the last two positions [105].
When consonants occur together in clusters, special conjunct letters are formed. In printed Bangla, many of these consonantal clusters or conjoined consonants are in use. The letters for the consonants other than the final one in the group are generally reduced. But there are a few special conjunct characters which are compounds of the consonant characters, e.g. ক (k) + ষ (s) = ক্ষ (ks), ঞ (n) + জ (j) = ঞ্জ (nj), জ (j) + ঞ (n) = জ্ঞ (jn), হ (h) + ম (m) = হ্ম (hm). There are other issues as well: র as the second member of a cluster is reduced to a secondary symbol, e.g. প (p) + র (r) = প্র (pr), ষ (s) + ট (t) + র (r) = ষ্ট্র (str) (as in উষ্ট্র ustra 'camel'). য (y), when used as a primary symbol, represents jɔ in Bangla. But its secondary symbol (allograph), jɔ-phala, has two phonetic values. When added to the initial consonant of a word, it is the vowel æ (as in শ্যামল (syamala) 'green', র‍্যাপার (ryapara) 'wrapper', etc.). But after a non-initial consonant, it just doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র (r) + য (y) combination has two
shape of the second member is changed, e.g. দ্ধ (ddh), গ্ধ (gdh) and ন্ধ (ndh)
respectively. The solitary example of র (r) + ঋ (r) = র্ঋ (as in নৈর্ঋত nairrt 'southwest'), used mostly in classical borrowings, shows the secondary symbol of a consonant followed by the primary symbol of a vowel. The inherent vowel only applies to the final consonant of the cluster.
The Bangla script has at least fifty-two primary symbols and quite a few allographs (positional variants of them), corresponding to forty-four phonemes, or functional speech sounds (7 oral and 7 nasal vowels and 30 consonants) (150), with some obvious redundancies; in one of the first phonemic analyses, however, the number was thought to be thirty-five phonemes [140].
As mentioned above, in Bangla several graphemic symbols have secondary shapes, technically called 'allographs', with a complementary distribution in each case. These graphs or markings are generally added to the following positions of the primary symbol [113], in the following manner:
As for the complementary distribution of vowel letters (word- or syllable-initial) and vowel Mātrās, which is relevant for the ABNF, let us consider the following. Besides some simple vowel modifiers called 'Kārs' in Bangla (also referred to as Mātrās in the other LGR documents of Neo-Brāhmī), there are some combinatory modifiers of Bangla vowels with certain consonants. For example, whereas
আ U+0986 BENGALI LETTER AA is substituted by া U+09BE BENGALI VOWEL SIGN AA,
ই U+0987 BENGALI LETTER I is substituted by the pre-posed ি U+09BF BENGALI VOWEL SIGN I,
ঈ U+0988 BENGALI LETTER II is substituted by ী U+09C0 BENGALI VOWEL SIGN II, and
উ U+0989 BENGALI LETTER U is substituted by ু U+09C1 BENGALI VOWEL SIGN U, marked below the primary grapheme,
there are some special vowel modifiers of উ, as in the following combined letters: গু (gu), rather than writing it as গ (g) + ু (u); ভ (bh) + র (r) (ভ্রু bhru 'eyebrow'); (s) + র (r) (sru); ঋ (r) after হ (h) (হৃ hr); etc.
There have been many notable contributions to simplifying and modifying Bangla spellings and combinatory techniques, especially by scholars such as Pabitra Sarkar (1992) [134]. In this work there has been an attempt to reduce the number of allographs of both vowels and consonants in clusters, and it has been widely accepted in the printing of school texts in both Bangladesh and West Bengal [151, 152]. As of now, two systems, the old (traditional) and the new, go on side by side, operative in different domains.
However, in preparing this LGR document, the aim has been to consider the widely used and usable sequences and combinations, and their variations across the sister scripts belonging to the basket of Brāhmī writing systems. Bangla Academy, Dhaka published the Standard Bangla Spelling Rules in 1992, following the recommendations of a committee constituted through a workshop jointly organized by the Jatīya Śiksakrama and Pathyapustaka Board in 1988. A thoroughly revised edition of the Rules was published in September 2012.⁶
After the establishment of the Bangla Akademi of West Bengal in 1986, its first President, Annadasankar Ray (1904-2002), in his inaugural address gave a direction for the standardization of the Bangla alphabet/script and the spelling system, and clearly argued that they would not blindly follow the Sanskritic model of conventional grammar. A broad list of proposals was sent to experts on Bangla, and a broad agreement was reached for the 'homogenization of Bangla spelling' by 1988. Based on opinions received from different quarters, a unanimous list of 'rules' was agreed upon. This was published as a 'Spelling Dictionary' titled Ākādemi Bānāna Abhidhāna (1997), which was obviously more comprehensive than 'The University of Calcutta proposals' made in 1936. Along with the 'rationalization' of spellings, another step was taken to make the writing system easier to read by making the symbols used, both single and combined ones, more 'transparent'. These reforms were originally suggested by Sarkar (1987, first published in 1978) [134][153], where he used the terms Swaccha ('Transparent') and Aswaccha ('Opaque' or non-transparent), even adding Ardha Swaccha ('half transparent') in between the two. Some sample examples are:
Transparent: ন্ন (nn), প্ত (pt), স্ত (st), where both members of the cluster can be recognized.
⁶ Bangla Academy. 2012. Bāṅlā Ekāḍemī Pramita Bāṅlā Bānānera Niyama (Bangla Academy Standard Bangla Spelling Rules). Dhaka: Bangla Academy.
There were in fact two types of proposals. One concerned the shape of the letters: those of consonant + vowel (CV) combinations, and of conjuncts, which are consonant + consonant combinations. There were further complex shapes, i.e. those of consonant + consonant + (consonant +) vowel (CC(C)V) signs, as in প্রু (pru) or স্ক্রু (skru). Some decisions in this area were necessary because a few of the CC(C) symbols represented complexities that made learning them difficult for children. The other type dealt only with the spellings of words, without any reference to the shapes of the letters in which they were written. The basic objective here was 'one word, one spelling' to the greatest extent possible [151].
Below we give a statement of the most salient changes that affect consonant + vowel combinations [153]:
a. The variants of the short u (হ্রস্ব উ-কার, hrasva u-kāra) vowel sign have been brought down to one, i.e. ু. So গু (gu) is now গ + ু. Similarly রু (ru) > র + ু, শু (śu) > শ + ু, হু (hu) > হ + ু, and therefore, for a cluster + the short u sign, ন্তু (ntu) > (ন + ্ + ত + উ) and স্তু (stu) > (স + ্ + ত + উ).
b. The variants of long u (দীর্ঘ ঊ-কার, dīrgha u-kāra) have also been reduced: রূ (rū) > র + ূ, ভ্রূ (bhrū) > (ভ bh + ্ + র r + ঊ ū), দ্রূ (drū) > (দ d + ্ + র r + ঊ ū), শ্রূ (śrū) > (শ ś + ্ + র r + ঊ ū).
c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been brought down to one: হৃ (hṛ) > হ + ৃ.
Regarding consonant + consonant + (consonant) … + (vowel) clusters, the Paschimbanga Bangla Akademi proposed transparent or semi-transparent shapes for clusters, to the extent admissible in the Bangla writing system. Some examples will clarify the proposal (a slash means that the traditional cluster shape precedes it, while the Bangla Akademi innovation follows it) [153].
3.3.1 The Consonants
As per traditional classification, Bangla consonants are categorized according to their phonetic properties, especially in terms of place and manner of articulation [107]. There are five 'Vargas' (pronounced 'Barga' in Bangla), or groups (sets or classes), distinguished by place of articulation, and one non-'varga' group [105]. Each varga, which corresponds to the stops at a certain place of articulation, contains a series of five consonants classified by their phonetic qualities (i.e. manner of articulation). The series begins with unvoiced and unaspirated and runs to voiced and aspirated (in the fourth column), finally ending with a homorganic or corresponding nasal [107]. Consider the following table:
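As an illustration independent of the table referred to above, the five vargas map onto the Unicode Bengali block in chart order, which makes them easy to derive programmatically. The only irregularity handled here is the unassigned code point U+09A9. The names in the sketch are hypothetical.

```python
import unicodedata

# First code point of each varga. Each varga is five consecutive code points;
# U+09A9 is unassigned in the Bengali block, so the pa-varga starts at U+09AA.
VARGA_STARTS = {
    "ka-varga": 0x0995,
    "ca-varga": 0x099A,
    "ṭa-varga": 0x099F,
    "ta-varga": 0x09A4,
    "pa-varga": 0x09AA,
}

VARGAS = {
    name: [chr(start + k) for k in range(5)] for name, start in VARGA_STARTS.items()
}

for name, letters in VARGAS.items():
    # The fifth member of each varga is its homorganic nasal.
    print(name, " ".join(letters), unicodedata.name(letters[4]))
```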
3.3.2 The Implicit Vowel Killer Hasanta (called 'Halant' or 'Halanta' in other Brāhmī-based scripts)
As stated earlier, all consonants are pronounced in isolation with an implicit vowel assumed to be associated with them (central-back ɔ, the neutral vowel in Bangla) [121]. In this report, 'Hasanta' (= 'Halant' or 'Halanta' in other Brāhmī-based scripts) and 'Virāma'⁷ (= 'Dari' in Bangla), the term preferred in Unicode (cf. Unicode 3.0 and above), are used to denote the character that marks the absence of this inherent vowel. It may be noted that the term virama has been adopted in Unicode in a sense that differs from its traditional grammatical definition, and hence it requires some explanation here. Given the importance of this document, the explanation should be part of it, so that anybody referring to it can find the proper grammatical sense of the term. Because a special sign is needed whenever this implicit vowel is stripped off, the symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster, one can, in common parlance, 'kill' its vowel and create conjuncts. In this manner, conjunct characters can generally be written by joining two to four consonants. In rare cases, this process can join up to five consonants. However, any maximum number of consonants joining to form one akṣara⁸ has to be established empirically. The above is an observation based on the CIIL-Emille Corpora of Bangla words [132 & 133], as seen in print these days. Given the mixture of scripts and languages on the web, one cannot rule out that someone may want a generic Top Level Domain [gTLD] exceeding the observed maximum. This can be the case when a foreign-language word which admits a large number of consonants is transliterated into Bangla. Hence, in the Bangla LGR work, this limit will not be enforced.
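Since the LGR does not enforce a maximum, measuring cluster length is only diagnostic, for example when comparing labels against the corpus observation. The sketch below uses a hypothetical helper that counts the consonants in the longest C(HC)* run. It approximates consonants by the range U+0995 to U+09B9 plus Assamese RA.

```python
HASANTA = "\u09CD"


def is_consonant(ch: str) -> bool:
    # Approximation: the main consonant range plus Assamese RA (U+09F0).
    cp = ord(ch)
    return 0x0995 <= cp <= 0x09B9 or cp == 0x09F0


def longest_conjunct(label: str) -> int:
    """Number of consonants in the longest consonant-Hasanta-consonant chain."""
    best = 0
    i = 0
    n = len(label)
    while i < n:
        if is_consonant(label[i]):
            run = 1
            # Extend the run while the pattern Hasanta + consonant continues.
            while i + 2 < n and label[i + 1] == HASANTA and is_consonant(label[i + 2]):
                run += 1
                i += 2
            best = max(best, run)
        i += 1
    return best


print(longest_conjunct("\u0989\u09B7\u09CD\u099F\u09CD\u09B0"))  # উষ্ট্র -> 3
```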
3.3.3 Vowels
Separate symbols exist for all 'Swara', or vowels, in Bangla, which are pronounced independently, either at the beginning of the word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign called 'kār' in Bangla, or Mātrā in Nāgarī⁹, is attached to the consonant. Since the consonant has this built-in neutral vowel at the end, there are equivalent kāras (Mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:
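The vowel-to-kāra correlation is also visible in Unicode's naming. Each independent vowel BENGALI LETTER X has a matching BENGALI VOWEL SIGN X, except for অ. The sketch below uses `unicodedata.lookup` to derive the pairs. The helper name is hypothetical, and the vowel list covers only the letters A to AU.

```python
import unicodedata
from typing import Optional

# Vowel names as they appear in the Unicode character names.
VOWELS = ["A", "AA", "I", "II", "U", "UU", "VOCALIC R", "E", "AI", "O", "AU"]


def kara_for(vowel: str) -> Optional[str]:
    """Return the dependent vowel sign (kara) for an independent vowel, or
    None if Unicode defines no such sign (the case for অ, the inherent vowel)."""
    try:
        return unicodedata.lookup(f"BENGALI VOWEL SIGN {vowel}")
    except KeyError:
        return None


for v in VOWELS:
    letter = unicodedata.lookup(f"BENGALI LETTER {v}")
    kara = kara_for(v)
    print(v, letter, "-" if kara is None else f"U+{ord(kara):04X}")
```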
⁷ Virāma as used here is also a misnomer according to the Indian grammatical traditions. Nowhere is the mere absence of a vowel marked as virama; Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
⁸ This term needs to be disambiguated: akṣara also means 'syllable' in the Indian grammatical traditions.
⁹ The term 'Mātrā' in Bangla, however, stands for an altogether different concept, viz. the top bar placed over a letter, which is typically found in Hindi and Bangla but is missing in Gujarati.
3.3.4 The Anusvāra, onuʃʃār (ং, U+0982)
The Anusvāra, or onuʃʃār, in Bangla at times represents a homorganic nasal, but not always. It replaces a conjunct group of ‘Nasal Consonant + Hasanta + Consonant’ where the second consonant belongs to the velar varga or set, as in লংকা. But it often also appears in such combinations when a non-velar is the last member of the combination, as in ল্যাংটা “naked” or ল্যাংচা “a kind of sweet; to limp”. Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined written form using the corresponding nasal consonant of the particular set. Although Modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal, in Bangla it is clearly demarcated where one must use the Anusvāra and where a conjunct cluster with a nasal as the first or the second component is required.
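The varga relationship behind the Anusvāra’s behaviour can be sketched as a lookup table. The sketch below is illustrative only. It covers just the velar case stated above, where the Anusvāra stands for the velar nasal ঙ + Hasanta before a velar consonant, and it does not attempt the non-velar or non-varga cases. The names `VARGAS`, `homorganic_nasal` and `anusvara_to_conjunct` are hypothetical and are not defined by the LGR.

```python
# Five vargas (stop series); the fifth member of each row is its homorganic nasal.
VARGAS = {
    "velar": "\u0995\u0996\u0997\u0998\u0999",      # ক খ গ ঘ ঙ
    "palatal": "\u099A\u099B\u099C\u099D\u099E",    # চ ছ জ ঝ ঞ
    "retroflex": "\u099F\u09A0\u09A1\u09A2\u09A3",  # ট ঠ ড ঢ ণ
    "dental": "\u09A4\u09A5\u09A6\u09A7\u09A8",     # ত থ দ ধ ন
    "labial": "\u09AA\u09AB\u09AC\u09AD\u09AE",     # প ফ ব ভ ম
}
ANUSVARA, HASANTA = "\u0982", "\u09CD"


def homorganic_nasal(consonant):
    """Return the nasal of the varga containing `consonant`, or None."""
    for row in VARGAS.values():
        if len(consonant) == 1 and consonant in row:
            return row[4]
    return None


def anusvara_to_conjunct(word):
    """Spell ANUSVARA before a velar as NGA + HASANTA (e.g. লংকা -> লঙ্কা)."""
    out = []
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch == ANUSVARA and nxt and nxt in VARGAS["velar"]:
            out.append(homorganic_nasal(nxt) + HASANTA)
        else:
            out.append(ch)
    return "".join(out)
```

For example, `anusvara_to_conjunct("লংকা")` yields the conjunct spelling লঙ্কা. Word-final or non-velar contexts are left unchanged.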
3.3.5 Nasalization: Candrabindu (ঁ, U+0981)
Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cãd ‘moon’ (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].
3.3.6 Nukta (়, U+09BC)
The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi). The term and the concept of nukta are borrowed in Bangla. The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, the Bangla letters য় YYA, ড় RRA and ঢ় RHA have to be represented in this LGR by the sequences YA + Nukta (U+09AF + U+09BC), DDA + Nukta (U+09A1 + U+09BC) and DDHA + Nukta (U+09A2 + U+09BC), instead of the single code points YYA (U+09DF), RRA (U+09DC) and RHA (U+09DD), although the use of ‘Nukta’ is otherwise completely unnatural in Bangla.
It is noted that in the current Unicode Standard chart these characters are listed as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA Protocol through a set of procedures developed by the IETF. Even though the Unicode Standard also provides these three characters as atomic characters (for example, 09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y], each entered as a single keystroke), the IDNA protocol requires that we treat them as base-letter-plus-Nukta sequences built from code points in the Unicode Bengali block.
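The effect of the NFC requirement on these three letters can be checked with Python’s standard `unicodedata` module. This is a minimal sketch. `to_lgr_form` is a hypothetical helper name, and the only assumption is that the interpreter’s Unicode database lists U+09DC, U+09DD and U+09DF as composition exclusions, as the Unicode Standard does.

```python
import unicodedata


def to_lgr_form(label):
    """Return the NFC form of a label, as required by IDNA (RFC 5891)."""
    return unicodedata.normalize("NFC", label)


# The precomposed letters do not survive NFC: each becomes base letter + NUKTA.
for ch in ("\u09DC", "\u09DD", "\u09DF"):
    nfc = to_lgr_form(ch)
    print(unicodedata.name(ch), "->", " ".join(f"U+{ord(c):04X}" for c in nfc))
# BENGALI LETTER RRA -> U+09A1 U+09BC
# BENGALI LETTER RHA -> U+09A2 U+09BC
# BENGALI LETTER YYA -> U+09AF U+09BC

# The decomposed sequence is already in NFC and stays as it is.
assert to_lgr_form("\u09A1\u09BC") == "\u09A1\u09BC"
```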
It may be noted that there could be sporadic attempts or cases of writing Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph), only for the sake of correct pronunciation and to maintain the sanctity of the loanword. Such usage amounts to making the Bangla writing system work like the IPA. It is, however, not in use in Bangla writing or printing.
3.3.7 Visarga, biʃɔrgo (ঃ, U+0983) and Avagraha (ঽ, U+09BD)
The Visarga, biʃɔrgo (U+0983), is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. One could quote as an example দুঃখ duhkho “sorrow”, “unhappiness” (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in Bangla. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো’পরাণি). It is rarely used now, even in other languages using the Bangla script. In the case of the LGR, it has been decided not to retain the Avagraha (ঽ, U+09BD), because it is blocked in TLDs as per the Maximal Starting Repertoire (MSR). Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.
type in the following manner: র + ্ + য. But for র‍্য the sequence would be
র + ZWJ + ্ + য [154]. In other words, ZWJ is used in the rendering of words that demand ya-phalā after ra, which is otherwise not possible to type (render), because the same order of ra + hasanta + antastha ja occurs in the medial and/or final position. Interestingly, ra + hasanta + antastha ja is used to type repha on the consonant antastha ja, as in কার্য (kaarjo). In order to get a ya-phalā after the
consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার.
The use of ZWJ and ZWNJ has been ruled out from the root zone by the [Procedure]. These two signs are used in Bangla to create alternate renderings, and their insertion can affect searching as well as NLP. The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly prevented, so that the Hasanta joining the two consonants that would otherwise form the conjunct is explicitly shown.
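A root-zone label checker therefore has to reject any label containing either joiner. The following minimal Python sketch shows that check. The function name `has_joiner_controls` is hypothetical. It also shows, in passing, why the র‍্য spelling described above cannot survive in a root-zone label: once the ZWJ is removed, the remaining sequence is the one that renders as repha on য.

```python
ZWNJ, ZWJ = "\u200C", "\u200D"


def has_joiner_controls(label):
    """True if the label contains ZWNJ or ZWJ, neither of which the root zone permits."""
    return ZWNJ in label or ZWJ in label


# RA + ZWJ + HASANTA + YA: the typed form that requests ya-phala after RA
ra_yaphala = "\u09B0\u200D\u09CD\u09AF"
# RA + HASANTA + YA: without ZWJ this is rendered as repha on YA
repha_ya = "\u09B0\u09CD\u09AF"

print(has_joiner_controls(ra_yaphala))  # True
print(has_joiner_controls(repha_ya))    # False
```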
3.3.9 Use of Ya-phalā
Ya-phalā sequences: there are two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E).
To render Ya-phalā after অ and এ, it is necessary to type U+09CD Hasanta plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English acid অ্যাসিড,
association অ্যাসোসিয়েশন, ‘bat’ ব্যাট, ‘fat’ ফ্যাট, ‘mat’ ম্যাট, ‘cap’ ক্যাপ, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant. Only consonants can carry the Hasanta. But, as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
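The S sequence can be located mechanically. The Python sketch below is illustrative only, and the name `find_vowel_hasanta` is hypothetical. It reports every Hasanta that directly follows an independent vowel and flags whether that Hasanta belongs to one of the two permitted ligatures অ্যা or এ্যা.

```python
HASANTA, YA, AA_SIGN = "\u09CD", "\u09AF", "\u09BE"
ALLOWED_V1 = {"\u0985", "\u098F"}  # অ (A), এ (E)


def find_vowel_hasanta(label):
    """Return (index, ok) for each Hasanta that directly follows an independent vowel.

    ok is True only when the vowel is অ or এ and the Hasanta continues as
    Hasanta + য + া, i.e. the ligatures অ্যা / এ্যা used for the [æ] sound.
    """
    results = []
    for i in range(1, len(label)):
        if label[i] == HASANTA and "\u0985" <= label[i - 1] <= "\u0994":
            ok = (label[i - 1] in ALLOWED_V1
                  and label[i + 1:i + 3] == YA + AA_SIGN)
            results.append((i, ok))
    return results


acid = "\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"  # অ্যাসিড
print(find_vowel_hasanta(acid))  # [(1, True)]
```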
Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka “the sun”); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra “cycle”). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases. The LGR takes note of this point of concern regarding the two RAs in disguise, as it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may lead to spoofing and other kinds of cyber irregularities. These two code points are classed as (blocked) variants because fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of a following Hasanta. The difference between the RAs is only distinguishable if one looks at their Unicode values. Therefore labels such as অর্ক arka, শীর্ষ śīrṣa ‘top, apex’, অভ্র abhra ‘cloud, the sky’ and শ্রম śrama ‘physical labour’ could be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode code points. This point is made explicit with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47) that follow. Moreover, it
4 Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) has been formed by members with experience in Linguistics (especially NLP and computational linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR for each. However, an attempt is made to ensure that the fundamental philosophy behind building those LGRs is consistent across all Brāhmī-derived scripts. The present LGR will cater to multiple languages at EGIDS levels 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about the Bangla LGR code points.
4.1 Guiding Principles
The NBGP adopts the following broad principles for the selection of code points in the code-point repertoire, across the board, for all the Neo-Brāhmī scripts within its ambit.
4.1.1 Inclusion Principles
4.1.1.1 Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters that have been encoded in Unicode for transcription or archival purposes only will not be considered for inclusion in the code-point repertoire.
4.1.1.2 Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.
4.2 Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards that are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations but have been spelt out separately for the sake of clarity.
4.2.1 External Limits of Scope
The code point repertoire for the root zone is a very special case at the top of the protocol hierarchy, so the canvas of characters available for selection as part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart
Out of all the characters needed by the script in question, if a particular character is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially so in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts of the Unicode Consortium.
ii. IDNA Protocol
Unicode, being the character-encoding standard that provides the maximum possible representation of a given script/language, has encoded, as far as possible, all the characters needed by the script. However, domain names being a specialized case, they are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR is the repertoire of characters that will be used to create Root Zone TLDs, which in turn constitute an even more specialized case of domain names. The Root Zone LGR procedure therefore introduces additional exclusions on IDNA’s allowed set of characters. Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), even if allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the MSR.
To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. The IDNA Protocol further narrows this down, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
4.2.1.1 No Punctuation Marks
The TLDs being identifiers, punctuation marks present in Brāhmī-based scripts will not be included.
4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters like BENGALI ISSHAR ৺ (U+09FA), BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), etc. will also not be included.
4.2.1.3 No Rare and Obsolete Characters
There are characters that have been added to Unicode to accommodate rare forms, such as the Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), as well as VOCALIC LL ৡ (U+09E1), and the allographic kāra forms of the latter two symbols: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, in compliance with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the kāra corresponding to VOCALIC RR ৠ (U+09E0), which is VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.
4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.
4.2.1.5 ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and the Appendix (Section 10.1).
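To illustrate the exclusion principles above, the following Python sketch flags the specific code points named in Sections 4.2.1 and 5.2. It is deliberately partial: the table is neither the MSR nor the full LGR repertoire, and `EXCLUDED` and `excluded_code_points` are hypothetical names.

```python
# Code points named as excluded in Sections 4.2.1 and 5.2 (not an exhaustive list).
EXCLUDED = {
    0x09BD: "BENGALI SIGN AVAGRAHA (blocked by the MSR)",
    0x09FA: "BENGALI ISSHAR (symbol)",
    0x09F9: "BENGALI CURRENCY DENOMINATOR SIXTEEN (symbol)",
    0x09E0: "BENGALI LETTER VOCALIC RR (rare)",
    0x098C: "BENGALI LETTER VOCALIC L (rare)",
    0x09E1: "BENGALI LETTER VOCALIC LL (rare)",
    0x09E2: "BENGALI VOWEL SIGN VOCALIC L (rare)",
    0x09E3: "BENGALI VOWEL SIGN VOCALIC LL (rare)",
    0x09D7: "BENGALI AU LENGTH MARK (rare)",
}
# U+09C4 BENGALI VOWEL SIGN VOCALIC RR is deliberately not listed: it is retained.


def excluded_code_points(label):
    """List (code point, reason) for every excluded character in the label."""
    return [(f"U+{ord(ch):04X}", EXCLUDED[ord(ch)])
            for ch in label if ord(ch) in EXCLUDED]
```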
5 Repertoire
The Bangla writing system is represented in Unicode using the Bengali (Bangla) script name as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in Unicode has 93 entries. This section details the code-point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes for inclusion in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26 February 2016 for the disunification of the Bangla and Asamiyā scripts. The BIS, in the 8th meeting of its Indian Language Technologies and Products Sectional Committee, LITD 20, held on 23 August 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 11 is valid for all three languages above.
For each of the code points, language references are given in the last column, titled Reference, of Table 8, titled the “Code Point Repertoire”. For complete coverage of Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced many new complications, except for one particular character. Though only a few representative languages at EGIDS levels 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under Bangla Unicode Points (as given in Unicode 6.3). However, before the details are presented, it is ideal to look at the Bangla Code Point Chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs in the code charts below are based on one of the many fonts designed and are not prescriptive, because there could be some variation in actual fonts, both Unicode-compatible and TrueType ones. Consider the following code point table:
Colour convention:
All characters that are included in the [MSR]: yellow background.
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background.
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background.
Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences that enable the conditional inclusion, in the repertoire, of BENGALI LETTER A and E followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, in turn followed by BENGALI VOWEL SIGN AA, so as to represent the æ sound as in English ‘bat’, ‘cat’, etc.
5.2 Code Point Repertoire Exclusion
There are some characters of the Bangla script that are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.
5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire as a stand-alone code point, since it will never be used alone. It is used only in sequences for three special characters in normalized form: ড়, ঢ় and য়.
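Because Nukta survives only inside these three sequences, a label-level check reduces to looking at the character before every Nukta. A minimal Python sketch, using the hypothetical helper name `nukta_ok`:

```python
NUKTA = "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # DDA, DDHA, YA -> ড়, ঢ়, য়


def nukta_ok(label):
    """Nukta may appear only directly after DDA, DDHA or YA."""
    return all(i > 0 and label[i - 1] in NUKTA_BASES
               for i, ch in enumerate(label) if ch == NUKTA)


print(nukta_ok("\u09AA\u09A1\u09BC\u09BE"))  # পড়া -> True
print(nukta_ok("\u0995\u09BC"))              # KA + NUKTA -> False
```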
5.4.2 The Vowel Sequence
What follows gives the Vowel Sequence and the Consonant Sequence pertinent to Bangla. To help users of other Brāhmī scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra, onuʃʃār (B), a Candrabindu (D) or a Visarga, biʃɔrgo (X). The number of D, B or X that can follow a V in Bangla may not be restricted to one. Going by the rules illustrated in this document, it is clear that formations such as VDD, VBB and VXX are invalid orthographic units. However, it is valid and possible to have formations or sequences such as anusvāra followed by candrabindu on the one hand, and visarga followed by candrabindu on the other, as in হ8াংচা ‘hæŋchā’ and ‘hæŋ’ হ8াঃ respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu exists in Bangla. A vowel can optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phalā. “Ya-phalā” is a presentation form of U+09AF BENGALI LETTER য, or ‘ya’, represented by the sequence <U+09CD, i.e. BENGALI SIGN VIRAMA (Hasanta), U+09AF য BENGALI LETTER YA>. Ya-phalā has a special form, ্য. Again, when combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. ‘aa’ (ā)), it is used for transcribing [æ], as in the “a” of the English word “bat”, written in Bangla as ব্যাট. A vowel sequence admits the following combinations:
5.4.2.1 A Single Vowel
Example: V → অ (Devanāgarī अ)
5.4.2.2 A Vowel with Conditions
A vowel can optionally be followed by Anusvāra [B], Candrabindu [D] or Visarga [X], or by Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX], or by a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].
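The combinations in 5.4.2.1 and 5.4.2.2 can be expressed as a regular expression over the symbols V, B, D, X, H, C and M. The Python sketch below is illustrative only. The symbol table is a small assumed subset, and the names are hypothetical. It follows the DB and DX combinations listed in 5.4.2.2; the prose of 5.4.2 describes the marks in the opposite order, which this sketch does not accept. It also does not check that an H C M tail is the specific ya-phalā sequence after অ or এ.

```python
import re

# Symbol classes used by the sketch (a subset of the repertoire, for illustration).
VOWELS = {chr(cp) for cp in (0x0985, 0x0986, 0x0987, 0x0988, 0x0989, 0x098A,
                             0x098B, 0x098F, 0x0990, 0x0993, 0x0994)}
SIGNS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09CD": "H",
         "\u09AF": "C",   # YA: the only consonant needed for V + H + C + M here
         "\u09BE": "M"}   # AA sign

# 5.4.2.1 / 5.4.2.2: V, optionally followed by DB, DX, B, D, X or H C M.
VOWEL_SEQUENCE = re.compile(r"V(?:DB|DX|B|D|X|HCM)?")


def symbols(unit):
    """Map each code point to its symbol; unknown code points become '?'."""
    return "".join("V" if ch in VOWELS else SIGNS.get(ch, "?") for ch in unit)


def is_vowel_sequence(unit):
    return VOWEL_SEQUENCE.fullmatch(symbols(unit)) is not None


print(is_vowel_sequence("\u0986\u0981"))        # আঁ (VD) -> True
print(is_vowel_sequence("\u0985\u0981\u0981"))  # VDD -> False
```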
5.4.3.2 A Consonant with Conditions
A consonant is optionally followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
5.4.3.5 Subsets
In considering the subsets, we take the combination CHC as a representative example; however, the same applies equally to CHCHC and CHCHCHC [A]. The combination may be followed by M, B, D, X, DB or DX.
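The consonant-sequence and subset patterns of 5.4.3.2 and 5.4.3.5 can be sketched in the same way. In this illustrative Python version, the code point ranges used for C and M are simplifying assumptions: unassigned code points inside the ranges are simply not expected in the input. `symbol` and `classify` are hypothetical names.

```python
import re


def symbol(ch):
    """Map a code point to a WLE symbol (C, M, B, D, X, H) or '?'."""
    cp = ord(ch)
    if 0x0995 <= cp <= 0x09B9 or cp in (0x09F0, 0x09F1):
        return "C"
    if 0x09BE <= cp <= 0x09CC:
        return "M"
    return {0x0982: "B", 0x0981: "D", 0x0983: "X", 0x09CD: "H"}.get(cp, "?")


# 5.4.3.2: C optionally followed by M, B, D, X, H, DB or DX.
CONSONANT_SEQUENCE = re.compile(r"C(?:DB|DX|M|B|D|X|H)?")
# 5.4.3.5: CHC, CHCHC or CHCHCHC, optionally followed by M, B, D, X, DB or DX.
CONJUNCT_SUBSET = re.compile(r"C(?:HC){1,3}(?:DB|DX|M|B|D|X)?")


def classify(unit):
    s = "".join(symbol(ch) for ch in unit)
    if CONSONANT_SEQUENCE.fullmatch(s):
        return "consonant sequence"
    if CONJUNCT_SUBSET.fullmatch(s):
        return "conjunct subset"
    return None


print(classify("\u09B8\u09CD\u09A4\u09CD\u09B0\u09C0"))  # স্ত্রী (CHCHCM) -> conjunct subset
```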
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’.
Note: The condition for KHANDA TA in this context is that the C must be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
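The KHANDA TA condition can be checked locally. Below is a minimal Python sketch using the hypothetical helper name `khanda_ta_ok`. A KHANDA TA that does not follow a Hasanta (as in উৎসব) is left to the other rules.

```python
KHANDA_TA, HASANTA = "\u09CE", "\u09CD"
RAS = {"\u09B0", "\u09F0"}  # Bangla RA, Assamese RA


def khanda_ta_ok(label):
    """When KHANDA TA follows Hasanta, the consonant before the Hasanta must be RA."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] not in RAS:
                return False
    return True


print(khanda_ta_ok("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা -> True
print(khanda_ta_ok("\u0995\u09CD\u09CE"))                          # KA + H + KHANDA TA -> False
```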
5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) are described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full
10 Refer to Rule P in Section 7 Table 16
vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). To render Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound as in English ‘bat’, ‘fat’, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant. Only consonants can carry the Hasanta. But, as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]). The other case concerns the formation of repha and ra-phalā in this script, mentioned in the table above as P. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta +
6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusing variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from the character formations normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. The confusable but not identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with such cases. However, cases that belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviation from the widely accepted norms of Bangla akṣara formation. They can confuse even a careful observer and hence are proposed as variants.
Variants arise in a script when two or more forms are produced with different stored code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, and get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. Nevertheless, this will not be considered a case of variants, because a kāra followed by a kāra is not allowed.
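A further point follows from the NFC requirement discussed in 3.3.6, although the paragraph above does not state it: the two-key spelling never reaches the LGR in the first place. Unicode defines U+09CB as canonically equivalent to U+09C7 U+09BE, and NFC composes the pair. This can be verified with Python’s standard `unicodedata` module:

```python
import unicodedata

KO_ONE_KEY = "\u0995\u09CB"          # KA + VOWEL SIGN O
KO_TWO_KEYS = "\u0995\u09C7\u09BE"   # KA + VOWEL SIGN E + VOWEL SIGN AA

# NFC composes E + AA into O, so both inputs end up as the same string.
print(unicodedata.normalize("NFC", KO_TWO_KEYS) == KO_ONE_KEY)  # True

# The same holds for AU: E + AU LENGTH MARK composes to VOWEL SIGN AU.
print(unicodedata.normalize("NFC", "\u0995\u09C7\u09D7") == "\u0995\u09CC")  # True
```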
6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I
As far as true variants in Bangla are concerned, we may draw attention to cases in which থ (tha, U+09A5) appears with Hasanta in a conjunct with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus
স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus
ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
The above combinations, if written in traditional orthography, can be a little confusing, since the থ (tha) in the conjunct looks like a হ (ha). The conjunct can occur in the initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly, in the belief that it is a হ (ha) U+09B9, increasing the risks in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
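The in-script variant mapping for CASE I can be sketched as follows. This is an illustration under stated assumptions, not the LGR’s variant definition. The hypothetical function `tha_ha_variants` swaps থ and হ only when they follow স or ন plus Hasanta, and returns every combination, including the original label.

```python
HASANTA, THA, HA = "\u09CD", "\u09A5", "\u09B9"
FIRST_MEMBERS = {"\u09B8", "\u09A8"}  # SA, NA


def tha_ha_variants(label):
    """All labels obtained by swapping THA/HA after SA or NA + Hasanta."""
    out = {label}
    for i in range(2, len(label)):
        if (label[i - 1] == HASANTA and label[i - 2] in FIRST_MEMBERS
                and label[i] in (THA, HA)):
            swap = HA if label[i] == THA else THA
            # Add the swapped form of every variant collected so far.
            out |= {v[:i] + swap + v[i + 1:] for v in out}
    return out


sthana = "\u09B8\u09CD\u09A5\u09BE\u09A8"  # স্থান
print(len(tha_ha_variants(sthana)))  # 2
```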
6.2 Cross-Script Variants
A focused cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusions they may cause as labels in web domains. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Although there are certain characters that are somewhat similar, they have not been included here. They are provided in the Appendix (10.3) for reference.
7 Whole Label Evaluation Rules (WLE)
This section provides the WLEs required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can easily be translated into the LGR specification.
11 Unicode uses Oriya for the script, although Odia is now the official term.
12 As used by Unicode, denoting and including both Assamese and Maṇipuri.
Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table in Code Point Repertoire (Section 5.1):
C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ BENGALI LETTER RA WITH MIDDLE DIAGONAL, i.e. Assamese RA).
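The Bangla RA / Assamese RA confusion discussed earlier for repha and ra-phalā can be sketched as a variant-label generator. The Python below is illustrative only, and `ra_variant_labels` is a hypothetical name. Because the text covers both repha (RA + Hasanta, the P sequence) and ra-phalā (Hasanta + RA), the sketch treats an RA as swappable whenever it is adjacent to a Hasanta on either side. Whether Variant Set 4 covers both positions is an assumption here. Under blocked-variant semantics, the returned labels other than the original would not be available for separate allocation.

```python
from itertools import product

BANGLA_RA, ASSAMESE_RA = "\u09B0", "\u09F0"
HASANTA = "\u09CD"


def ra_variant_labels(label):
    """Labels differing only in Bangla/Assamese RA next to a Hasanta (original included)."""
    slots = [i for i, ch in enumerate(label)
             if ch in (BANGLA_RA, ASSAMESE_RA)
             and ((i + 1 < len(label) and label[i + 1] == HASANTA)    # repha
                  or (i > 0 and label[i - 1] == HASANTA))]            # ra-phala
    variants = set()
    for choice in product((BANGLA_RA, ASSAMESE_RA), repeat=len(slots)):
        chars = list(label)
        for i, ra in zip(slots, choice):
            chars[i] = ra
        variants.add("".join(chars))
    return variants


arka = "\u0985\u09B0\u09CD\u0995"  # অর্ক
print(len(ra_variant_labels(arka)))  # 2
```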
Let us now elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, and may not be attested, but there is no harm in including such ligatures, because any user can easily create them.
7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09AF U+09BC U+09BE) (meaning Bank of India). Here two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for a full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered enough to represent the sound intended for the first word. This unique situation is necessitated by the absence of the hyphen, space and Zero Width Non-Joiner from the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to be allowed to follow an H. Permitting it may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.
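The prohibition can be enforced as a simple adjacency check. A minimal Python sketch, with the hypothetical helper name `vowel_after_hasanta`, returns the positions of any independent vowel that directly follows Hasanta:

```python
HASANTA = "\u09CD"
INDEPENDENT_VOWELS = {chr(cp) for cp in (0x0985, 0x0986, 0x0987, 0x0988, 0x0989,
                                         0x098A, 0x098B, 0x098F, 0x0990, 0x0993, 0x0994)}


def vowel_after_hasanta(label):
    """Indices where an independent vowel directly follows Hasanta (disallowed)."""
    return [i for i in range(1, len(label))
            if label[i] in INDEPENDENT_VOWELS and label[i - 1] == HASANTA]
```

Applied to the NFC form of the Bank of India example above, it reports one offending position (ই after ফ্). Dropping that Hasanta, the form most users accept, clears the label.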
7.2 Additional Examples from Bangla ABNF
Below are a few examples that help in understanding some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.
ঃক ःक: As can be seen, such a combination will automatically result in a “golu”, or dotted circle, marking it as an invalid formation. This is an intrinsic property of the
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What Has Happened So Far in Terms of Script Reforms”. Paper presented at the face-to-face meeting jointly held by the Bangla Academy, Dhaka, and ICANN at the Bangla Academy, Dhaka, on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0, Core Specification, Chapter 12, p. 473.
10 Appendix-I
10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language/script, certain restriction rules apply. In other words, in a given language some of the formalism’s structures do not necessarily apply. Restriction rules are set in place to take care of such cases and fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:
1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, such as the word য়্যাব্বড়ো (yæbbɔṛo) from the poem ‘Lichuchor’ by Kazi Nazrul Islam. There are also instances of it in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).
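The label-initial part of rule 1 can be sketched as a lookup on the first character. This Python sketch is illustrative only, and `initial_violation` is a hypothetical name. It covers ৎ, ঞ and ঙ. It does not restrict য়, and it makes no exception for the ৎস case mentioned above, so ৎসেরিং would be rejected.

```python
NOT_LABEL_INITIAL = {
    "\u09CE": "KHANDA TA",  # ৎ
    "\u099E": "NYA",        # ঞ
    "\u0999": "NGA",        # ঙ
}


def initial_violation(label):
    """Name of the disallowed label-initial character, or None if the start is fine."""
    return NOT_LABEL_INITIAL.get(label[:1])


print(initial_violation("\u0999\u09BE"))  # NGA
```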
10.2 ‘Sylheti Nagarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, which was widely in use during the 1880s. There were several other names for Sylheti
13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.
Nagarī or Siloti (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).
Table 17: The Script Table of Sylheti Nagarī or Siloti
10.3 Confusable Code Points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.
ক k: ক্ক (ক্+ক), ক্ট (ক্+ট), ক্ত (ক্+ত), ক্ত্র (ক্+ত্+র), ক্ত্ব (ক্+ত্+ব), ক্ন (ক্+ন), ক্ব (ক্+ব), ক্ম (ক্+ম), ক্য (ক্+য), ক্র (ক্+র), ক্ষ (ক্+ষ), ক্ষ্ণ (ক্+ষ্+ণ), ক্ষ্ম (ক্+ষ্+ম), ক্ষ্ব (ক্+ষ্+ব), ক্ষ্য (ক্+ষ্+য), ক্স (ক্+স), ঙ্ক (ঙ্+ক), (+ক্+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g: গ্গ (গ্+গ), গ্দ (গ্+দ), গ্ধ (গ্+ধ), গ্ন (গ্+ন), গ্ব (গ্+ব), গ্ম (গ্+ম), গ্য (গ্+য), গ্র (গ্+র), গ্ল (গ্+ল), ঙ্গ (ঙ্+গ), র্ঙ্গ (র্+ঙ্+গ)
ণ n: ণ্ট (ণ্+ট), ণ্ঠ (ণ্+ঠ), ণ্ড (ণ্+ড), ণ্ড্র (ণ্+ড্+র), ণ্ঢ (ণ্+ঢ), ণ্ণ (ণ্+ণ), ণ্য (ণ্+য), ণ্ব (ণ্+ব), ক্ষ্ণ (ক্+ষ্+ণ), ষ্ণ (ষ্+ণ), (+ণ)
3.3 Notable Features of Bangla Script [150]
The Bangla writing system has certain features that show how it has to be written and how typesetting in Bangla can be done. This section is followed by a section that explains the code points (and fixed code point sequences) that show certain distinctive characteristics of Bangla and make up the repertoire. The subsequent sections also cover the ‘akshar’-formation rules (ABNF) showing character classes, Whole Label Evaluation (WLE) and context rules, as well as in-script and cross-script variants. Here we present some basic features of the script and its pronunciation. The Bangla script is an alpha-syllabic writing system in which all
consonants are assumed to carry an accompanying ‘inherent’ vowel (theoretically before or after each consonant). It varies between ɔ and o depending on the position of the consonant in the word. At times, these ‘assumed’ or ‘inherent’ vowels are not pronounced at all [142].
Vowels can be written as independent letters or by using a variety of diacritical marks, which are written above, below, before, after, or both before and after the consonant they follow in pronunciation [105].
When consonants occur together in clusters, special conjunct letters are formed. In printed Bangla, many of these consonantal clusters or conjoined consonants are in use. The letters for the consonants other than the final one in the group are generally reduced. But there are a few special conjunct characters that are compounds of the consonant characters, e.g. ক (k) + ষ (ṣ) = ক্ষ (kṣ),
ঞ (ñ) + জ (j) = ঞ্জ (ñj), জ (j) + ঞ (ñ) = জ্ঞ (jñ), হ (h) + ম (m) = হ্ম (hm). There are other issues as well: র as the second member of a cluster is reduced to a secondary symbol, e.g.
প (p) + র (r) = প্র (pr), ষ (ṣ) + ট (ṭ) + র (r) = ষ্ট্র (ṣṭr) (as in উষ্ট্র uṣṭra “camel”). য (y), when used as a primary symbol, represents jɔ in Bangla. But its secondary symbol (allograph), jɔ-phala, has two phonetic values. When added to the initial consonant of a word, it is the vowel æ (as in শ্যামল (syamala) “green”, র‍্যাপার
(ryapara) “wrapper”, etc.). But after a non-initial consonant, it just doubles that consonant in
pronunciation (as in কার্য, ধার্য, etc.). The র (r) + য (y) combination has two
shape of the second member is changed, e.g. দ্ধ (ddh), গ্ধ (gdh) and ন্ধ (ndh)
respectively. The solitary example of র (r) + ঋ (ṛ) = র্ঋ (as in নৈর্ঋত nairṛta “southwest”), used mostly in Classical borrowings, shows the secondary symbol of a consonant followed by the primary symbol of a vowel. The inherent vowel applies only to the final consonant of the cluster.
The Bangla script has at least fifty-two primary symbols and quite a few allographs (positional variants of them), corresponding to forty-four phonemes, or functional speech sounds (7 oral and 7 nasal vowels and 30 consonants) (150), with some obvious redundancies; in one of the first phonemic analyses, the number was thought to be thirty-five phonemes [140].
As mentioned above, in Bangla several graphemic symbols have secondary shapes, technically called ‘allographs’, with a complementary distribution in each case. These graphs or markings are generally added at the following positions of the primary symbol [113], in the following manner:
As for the complementary distribution of vowel letters (word- or syllable-initial) and vowel Mātrās, which is relevant for the ABNF, let us consider the following. Besides some simple vowel modifiers called ‘Kārs’ in Bangla (also referred to as Mātrā in the other Neo-Brāhmī LGR documents), there are some combinatory modifiers of Bangla vowels with certain consonants. For example, whereas
আ U+0986 BENGALI LETTER AA is substituted by
া U+09BE BENGALI VOWEL SIGN AA,
ই U+0987 BENGALI LETTER I is substituted by
pre-posed ি U+09BF BENGALI VOWEL SIGN I,
ঈ U+0988 BENGALI LETTER II is substituted by
ী U+09C0 BENGALI VOWEL SIGN II, or
উ U+0989 BENGALI LETTER U is substituted by
ু U+09C1 BENGALI VOWEL SIGN U by marking below the primary
grapheme, there are some special vowel modifiers of উ, as in the following combined letters: গু (gu), rather than writing গ (g) + ু (u);
ভ (bh) + র (r) (ভ্রূ bhrū “eyebrow”), শ (ś) + র (r) (শ্রূ śrū), ঋ (ṛ) after হ (h) (হৃ hṛ), etc.
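The one-to-one correspondence between independent vowel letters and their kārs can be written as a table. The Python sketch below is illustrative. `KAR` and `consonant_plus_vowel` are hypothetical names, and the mapping covers the vowels whose letters appear in the repertoire discussion. In storage, the pre-posed ি still follows the consonant; only the display is reordered.

```python
# Independent vowel letter -> dependent vowel sign (kār) used after a consonant.
KAR = {
    "\u0986": "\u09BE",  # আ -> া
    "\u0987": "\u09BF",  # ই -> ি (shown before the consonant, stored after it)
    "\u0988": "\u09C0",  # ঈ -> ী
    "\u0989": "\u09C1",  # উ -> ু
    "\u098A": "\u09C2",  # ঊ -> ূ
    "\u098B": "\u09C3",  # ঋ -> ৃ
    "\u098F": "\u09C7",  # এ -> ে
    "\u0990": "\u09C8",  # ঐ -> ৈ
    "\u0993": "\u09CB",  # ও -> ো
    "\u0994": "\u09CC",  # ঔ -> ৌ
}


def consonant_plus_vowel(consonant, vowel):
    """Spell consonant + vowel; অ has no kār because it is the inherent vowel."""
    if vowel == "\u0985":
        return consonant
    return consonant + KAR[vowel]


print([f"U+{ord(c):04X}" for c in consonant_plus_vowel("\u0995", "\u0987")])
# ['U+0995', 'U+09BF']
```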
There have been many notable contributions to simplifying and modifying Bangla spellings and combinatory techniques, especially by scholars such as Pabitra Sarkar (1992) [134]. These efforts have attempted to reduce the number of allographs of both vowels and consonants in clusters, and they have been widely accepted in the printing of school texts in both Bangladesh and West Bengal [151, 152]. As of now, two systems, the old (traditional) and the new, coexist and operate in different domains.
However, in preparing this LGR document, the aim has been to consider the widely used and usable sequences and combinations, and their variations across the sister scripts belonging to the basket of Brāhmī writing systems. The Bangla Academy, Dhaka, published the Standard Bangla Spelling Rules in 1992, following the recommendations of a committee constituted through a workshop jointly organized by the Jatīya Śikṣākrama and Pathyapustaka Board in 1988. A thoroughly revised edition of the Rules was published in September 2012⁶. After the establishment of the Bangla Akademi of West Bengal in 1986, its first President, Annadasankar Ray (1904-2002), in his inaugural address gave a direction for the standardization of the Bangla alphabet/script and the spelling system, and clearly argued that they would not blindly follow the Sanskritic model of conventional grammar. A broad list of proposals was sent to experts on Bangla, and a broad agreement was reached on the ‘homogenization of Bangla spelling’ by 1988. Based on opinions received from different quarters, a unanimous list of ‘rules’ was agreed upon. It was published in a ‘Spelling Dictionary’ titled Ākādemi Bānāna Abhidhāna (1997), which was more comprehensive than ‘The University of Calcutta proposals’ made in 1936. Along with the ‘rationalization’ of spellings, another step was taken to make the writing system easier to read by making the symbols used, both single and combined, more ‘transparent’. These reforms were originally suggested by Sarkar (1987, first published in 1978) [134][153], who used the terms Swaccha (‘transparent’) and Aswaccha (‘opaque’ or non-transparent), and even added Ardha Swaccha (‘half transparent’) in between the two. Some sample examples are: Transparent: ন্ন (nn), প্ত (pt), স্ত (st), where both members of the cluster can be recognized.
6 Bangla Academy. 2012. Bāṅlā Ekaḍemī Pramita Bāṅlā Bānānera Niyama (Bangla Academy Standard Bangla Spelling Rules). Dhaka: Bangla Academy.
There were, in fact, two types of proposals. One concerned the shape of the letters: those of consonant + vowel (CV) combinations and of conjuncts, which are consonant + consonant combinations. There were further complex shapes, i.e. those of consonant + consonant + (consonant +) vowel (CC(C)V) signs, as in প্রু (pru) or স্ক্রু (skru). Some decisions in this area were necessary because a few of the CC(C) symbols represented complexities that made learning them difficult for children. The other type dealt with the spellings of words only, without any reference to the shapes of the letters in which they were written. The basic objective here was ‘one word, one spelling’ to the greatest extent possible [151].
Below we give a statement of the most salient changes that affect the consonant + vowel combinations [153].
a. The variants of the short u (ু, উ-কার hrasva u-kāra) vowel sign have been brought down to one, i.e. ু. So the traditional ligated form of gu is now written as গু, with the regular detached sign; similarly ru > রু, śu > শু, hu > হু, and therefore, for a cluster + short u sign, ntu > ন্তু (ন + ্ + ত + উ) and stu > স্তু (স + ্ + ত + উ).
b. The variants of long u (ূ, দীর্ঘ ঊ-কার dīrgha u-kāra) have also been reduced: rū > রূ; bhrū > ভ্রূ (ভ bh + ্ + র r + ঊ ū); drū > দ্রূ (দ d + ্ + র r + ঊ ū); śrū > শ্রূ (শ ś + ্ + র r + ঊ ū).
c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been brought down to one: hṛ > হৃ.
Regarding consonant + consonant + (consonant)… + (vowel) clusters, the Paschimbanga Bangla Akademi proposed transparent or semi-transparent shapes for clusters to the extent admissible in the Bangla writing system. Some examples will clarify the proposal (a slash means that the traditional cluster shape precedes it while the Bangla Akademi innovation follows) [153].
3.3.1. The Consonants
As per the traditional classification, Bangla consonants are categorized according to their phonetic properties, especially in terms of place and manner of articulation [107]. There are five ‘Varga’ (pronounced ‘Barga’ in Bangla) groups (sets or classes), distinguished by place of articulation, and one non-‘varga’ group [105]. Each varga, which corresponds to stops at a certain place of articulation, contains a series of five consonants classified as per their phonetic qualities (i.e. manner of articulation), beginning from unvoiced and unaspirated to voiced and aspirated (in the fourth column), and finally ending with a homorganic or corresponding nasal [107]. Consider the following table:
3.3.2. The Implicit Vowel Killer: Hasanta (called ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts)
As stated earlier, all consonants are pronounced in isolation with an implicit vowel (central-back ɔ in Bangla, as the neutral vowel) assumed to be associated with them [121]. The terms ‘Hasanta’ (= ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts) and ‘Virāma’⁷ (= ‘Da ri’ in Bangla), as preferred in UNICODE (cf. Unicode 3.0 and above), are used in this report to denote the character that marks the absence of this inherent vowel. It may be noted that the term virāma has been adopted in UNICODE in a sense that differs from its traditional grammatical definition, and hence it requires some explanation here. Given the importance of the document, this note should be part of this LGR document so that anybody referring to it can know the proper grammatical explanation of the term. Because a special sign is needed whenever this implicit vowel is stripped off, the symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster, one can, in common parlance, “kill” its vowel and create conjuncts. In this manner, conjunct characters can generally be written by joining two to four consonants. In rare cases, this process can join up to five consonants. However, the maximum number of consonants joining to form one akṣara⁸ can only be bounded empirically: this observation is based on the CIIL-Emille Corpora of Bangla words [132 & 133] as seen in print these days. Given the mixture of scripts and languages on the web, the possibility that someone may want a generic Top Level Domain [gTLD] with more than the observed maximum cannot be ruled out. This can be the case when a foreign-language word that admits a large number of consonants is transliterated into Bangla. Hence, in the Bangla LGR work, this limit will not be enforced.
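As an illustration of the conjunct-forming mechanism described above, the following minimal Python sketch joins consonants by inserting the Hasanta (U+09CD) between them, giving the code point pattern C(HC)*. The helper names `join_with_hasanta` and `count_consonants` are hypothetical, and, in line with the decision above, no maximum cluster length is imposed.

```python
import unicodedata

HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA (Hasanta)

def join_with_hasanta(*consonants: str) -> str:
    """Build the sequence C H C H ... C for the given consonants (no length limit)."""
    return HASANTA.join(consonants)

def count_consonants(cluster: str) -> int:
    """Count the letters (general category Lo) in a cluster; the Hasanta itself is Mn."""
    return sum(1 for ch in cluster if unicodedata.category(ch) == "Lo")

# ক + ্ + ত gives the conjunct ক্ত (kt)
kt = join_with_hasanta("\u0995", "\u09A4")
print([f"U+{ord(ch):04X}" for ch in kt])  # ['U+0995', 'U+09CD', 'U+09A4']

# স + ্ + ত + ্ + র gives স্ত্র (str), a three-consonant cluster
print(count_consonants(join_with_hasanta("\u09B8", "\u09A4", "\u09B0")))  # 3
```

How the cluster is finally rendered (a full ligature or a visible Hasanta) is left to the font; the stored sequence is the same either way.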
3.3.3. Vowels
Separate symbols exist for all ‘Swara’ or vowels in Bangla, which are pronounced independently either at the beginning of the word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign, called ‘kār’ in Bangla or Mātrā in Nāgarī,⁹ is attached to the consonant. Since the consonant has this built-in neutral vowel at the end, there are equivalent kāras (Mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:
7 Virāma as used here is also a misnomer according to the Indian grammatical traditions. Nowhere is mere absence of a vowel marked as virāma; Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
8 This term needs to be disambiguated: akṣara also means ‘syllable’ in Indian grammatical traditions.
9 Although the term ‘Mātrā’ in Bangla stands for an altogether different concept, viz. the top bar placed over a letter, typically found in Hindi and Bangla but missing in Gujarati.
3.3.4. The Anusvāra onuʃʃār (ং, U+0982)
The Anusvāra, or onuʃʃār in Bangla, at times represents a homorganic nasal, but not always. It replaces a conjunct group of ‘Nasal Consonant + Hasanta + Consonant’ where the second consonant belongs to the velar varga or set, as in লংকা. But it often also appears in such combinations when a non-velar is the last member of the combination, as in ল্যাংটা “naked” or ল্যাংচা “a kind of sweet; to limp”. Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined writing symbol representing the corresponding nasal consonant of the particular set. Although Modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal, in Bangla it is clearly demarcated where one must use the Anusvāra and where it has to be a conjunct cluster with a nasal as the first or the second component.
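The varga structure and the velar case described above can be sketched as a small lookup table. This is illustrative only: the function names are hypothetical, the rewrite covers just the velar pattern ঙ + Hasanta + velar stop → ং + velar stop (as in লঙ্কা / লংকা), and it does not imply that the LGR treats the two spellings as variants.

```python
HASANTA, ANUSVARA = "\u09CD", "\u0982"

# The five vargas, each ordered from the unvoiced unaspirated stop to the homorganic nasal.
VARGAS = {
    "velar": "কখগঘঙ",
    "palatal": "চছজঝঞ",
    "retroflex": "টঠডঢণ",
    "dental": "তথদধন",
    "labial": "পফবভম",
}

def homorganic_nasal(consonant: str):
    """Return the nasal closing the consonant's varga, or None for non-varga letters."""
    for row in VARGAS.values():
        if consonant in row:
            return row[-1]
    return None

def velar_cluster_to_anusvara(word: str) -> str:
    """Rewrite ঙ + Hasanta + velar stop as ং + velar stop (the velar case only)."""
    out, i = [], 0
    while i < len(word):
        if (word[i] == "ঙ" and i + 2 < len(word)
                and word[i + 1] == HASANTA and word[i + 2] in VARGAS["velar"][:4]):
            out.append(ANUSVARA)
            i += 2  # skip ঙ and the Hasanta; the stop is copied on the next pass
        else:
            out.append(word[i])
            i += 1
    return "".join(out)

print(homorganic_nasal("ট"))              # ণ
print(velar_cluster_to_anusvara("লঙ্কা"))  # লংকা
```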
3.3.5. Nasalization: Candrabindu (ঁ, U+0981)
Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cãd ‘moon’ (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].
3.3.6. Nukta (়, U+09BC)
The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi). The term and the concept of nukta are borrowed in Bangla. The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC); RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, the Bangla letters য় YYA, ড় RRA and ঢ় RRHA have to be represented in this LGR by the sequences YA + Nukta (U+09AF + U+09BC), DDA + Nukta (U+09A1 + U+09BC) and DDHA + Nukta (U+09A2 + U+09BC) instead of the single code points YYA (U+09DF), RRA (U+09DC) and RRHA (U+09DD), although the use of ‘Nukta’ is otherwise completely unnatural in Bangla.
It is noted that in the current Unicode Standard chart these characters are listed as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA Protocol through a set of procedures developed by the IETF. Even though the Unicode Standard also prescribes methods to produce these three characters as atomic characters (for example 09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y], each as a single keystroke), the IDNA protocol requires that we treat them as conjunct characters and then allocate codes for these in the Unicode Bengali Block.
It may be noted that there could be sporadic attempts or cases of writing Muslim names, Urdu poetic words and Perso-Arabic loan words with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph), only for the sake of correct pronunciation and for maintaining the sanctity of the loan word. These attempts amount to making the Bangla writing system work like the IPA script. The practice is, however, not in use in printed Bangla.
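The NFC requirement described above can be checked with Python's standard `unicodedata` module. The sketch below shows that each of the three precomposed letters normalizes to its base consonant + Nukta sequence; `is_nfc` is a hypothetical helper name.

```python
import unicodedata

# Precomposed letters listed in the Unicode Bengali block
PRECOMPOSED = {"RRA": "\u09DC", "RRHA": "\u09DD", "YYA": "\u09DF"}

for name, ch in PRECOMPOSED.items():
    nfc = unicodedata.normalize("NFC", ch)
    print(name, [f"U+{ord(c):04X}" for c in nfc])
# RRA ['U+09A1', 'U+09BC']
# RRHA ['U+09A2', 'U+09BC']
# YYA ['U+09AF', 'U+09BC']

def is_nfc(label: str) -> bool:
    """True if the label is already in NFC, as IDNA requires."""
    return unicodedata.normalize("NFC", label) == label

print(is_nfc("\u09AA\u09A1\u09BC\u09BE"))  # পড়া written as DDA + Nukta: True
print(is_nfc("\u09AA\u09DC\u09BE"))        # পড়া written with precomposed RRA: False
```

Because these letters are composition exclusions, NFC never recomposes the two-code-point sequence, which is why the LGR can only list the sequence form.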
3.3.7. Visarga biʃɔrgo (ঃ, U+0983) and Avagraha (ঽ, U+09BD)
The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loan words borrowed from Sanskrit and represents a sound very close to h. One could quote as an example দুঃখ duhkho “sorrow, unhappiness” (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in Bangla. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো’পরাণি), and it is now rarely used, even in other languages using the Bangla script. The Avagraha is not part of the LGR repertoire: it has been decided not to retain Avagraha (ঽ) (U+09BD) because it is blocked in TLDs as per the Maximal Starting Repertoire (MSR). Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.
type in the following manner: র + ্ + য; but for র‍্য (ra with ya-phalā) the sequence would be র + ZWJ + ্ + য [154]. In other words, ZWJ is used in rendering words that demand a ya-phalā after ra, which otherwise cannot be typed (rendered), because the same order of ra + hasanta + antastha ja occurs in the medial and/or final position. Interestingly, ra + hasanta + antastha ja is used to type a repha on the consonant য (antastha ja), as in কার্য (kaarjo). In order to get a ya-phalā after the consonant র (ra), it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার.
The use of ZWJ/ZWNJ has been ruled out from the root zone by the [Procedure]. Used in Bangla to create alternate renderings, the insertion of these two signs can affect searching as well as NLP. The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly restricted and the Hasanta joining the two consonants participating in the conjunct formation needs to be explicitly shown.
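As a minimal sketch (an inference from the two paragraphs above, not a rule stated in them), the only code point that separates the repha sequence from the ra + ya-phalā sequence is ZWJ, so a root-zone check that rejects joiners leaves only the repha form representable. `contains_joiner` is a hypothetical helper name.

```python
ZWNJ, ZWJ = "\u200C", "\u200D"
RA, HASANTA, YA = "\u09B0", "\u09CD", "\u09AF"

repha_on_ya = RA + HASANTA + YA            # repha on য, as in কার্য
ra_with_yaphala = RA + ZWJ + HASANTA + YA  # ya-phalā after র, as in the word for "wrapper"

def contains_joiner(label: str) -> bool:
    """True if the label contains ZWJ or ZWNJ, neither of which is permitted in the root zone."""
    return ZWJ in label or ZWNJ in label

print(contains_joiner(repha_on_ya))      # False
print(contains_joiner(ra_with_yaphala))  # True
```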
3.3.9. Use of Ya-phalā-ā Sequences
There are two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E).
To render a ya-phalā after অ and এ, it is necessary to type U+09CD Hasanta plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of ya-phalā and ā-kāra is used to elicit the æ sound, as in English acid অ্যাসিড, association অ্যাসোসিয়েশন, ‘bat’ ব্যাট, ‘fat’ ফ্যাট, ‘mat’ ম্যাট, ‘cap’ ক্যাপ, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as the ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta. But as we see here, Bangla ends up with a deviant feature in its orthography, in which Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
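The exception described above can be sketched as a check that allows a Hasanta after an independent vowel only inside the sequence অ/এ + Hasanta + য + া. This is a minimal sketch, not the normative WLE rule: the function name is hypothetical, and the vowel range U+0985..U+0994 is an approximation that includes a few unassigned code points.

```python
HASANTA, YA, AA = "\u09CD", "\u09AF", "\u09BE"
YAPHALA_VOWELS = {"\u0985", "\u098F"}  # অ, এ
# Independent vowels U+0985..U+0994 (a few code points in this range are unassigned)
INDEPENDENT_VOWELS = {chr(cp) for cp in range(0x0985, 0x0995)}

def hasanta_after_vowel_ok(label: str) -> bool:
    """Allow Hasanta after an independent vowel only inside V1 + Hasanta + YA + AA."""
    for i, ch in enumerate(label):
        if ch == HASANTA and i > 0 and label[i - 1] in INDEPENDENT_VOWELS:
            if label[i - 1] not in YAPHALA_VOWELS or label[i + 1:i + 3] != YA + AA:
                return False
    return True

print(hasanta_after_vowel_ok("অ্যাসিড"))  # True
print(hasanta_after_vowel_ok("ই্যা"))     # False
```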
Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka “the sun”); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra “cycle”). The point is that in both cases the slot for ra could be the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases. The LGR notes this concern about the two RAs in disguise: it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may lead to spoofing and other kinds of cyber irregularities. These two code points are classed as (blocking) variants because fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of the following Hasanta. The difference between the two RAs is distinguishable only by looking at their Unicode values. Therefore, labels such as অর্ক arka, শীর্ষ śīrṣa ‘top, apex’, অভ্র abhra ‘cloud, the sky’ and শ্রম śrama ‘physical labour’ could be extremely dangerous, as web users may never verify the digital content (the labels) against its Unicode code point values. This point is made explicitly with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47), which follow. Moreover, it
4. Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) has been formed by members with experience in Linguistics (especially NLP and Computational Linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, each with its own LGR. However, an attempt is made to keep the fundamental philosophy behind building these LGRs consistent across all the Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about the Bangla LGR code points.
4.1. Guiding Principles
The NBGP adopts the following broad principles for the selection of code points in the code point repertoire, across the board, for all the Neo-Brāhmī scripts within its ambit.
4.1.1. Inclusion Principles
4.1.1.1. Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters that have been encoded in Unicode for transcription or archival purposes only will not be considered for inclusion in the code point repertoire.
4.1.1.2. Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.
4.2. Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards that are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.
4.2.1. External Limits of Scope
The code point repertoire for the root zone is a very special case at the top of the protocol hierarchy, so the canvas of characters available for selection as part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart
Out of all the characters needed by the script in question, a character that is not encoded in Unicode cannot be incorporated in the code point repertoire. Such cases are quite rare, especially in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol
Unicode, being the character-encoding standard that provides the maximum possible representation of a given script/language, has encoded as far as possible all the characters needed by the script. However, domain names are a specialized case governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR is the repertoire of characters that will be used to create Root Zone TLDs, which in turn constitute an even more specialized case of domain names, so the Root Zone LGR procedure introduces additional exclusions on IDNA's allowed set of characters. Example: Bangla Sign Avagraha ঽ (U+09BD), even if allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the MSR.
To sum up, the restrictions start off by admitting only characters that are part of the code block of the given script/language. The IDNA Protocol further narrows this down, and finally an additional filter, the Maximal Starting Repertoire, restricts the character set associated with the given language even more.
4.2.1.1. No Punctuation Marks
Since TLDs are identifiers, the punctuation marks present in Brāhmī-based scripts will not be included.
4.2.1.2. No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters like BANGLA ISSHAR ৺ (U+09FA), BANGLA CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), etc. will also not be included.
4.2.1.3. No Rare and Obsolete Characters
Some characters have been added to Unicode to accommodate rare forms, such as Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), as well as VOCALIC LL ৡ (U+09E1) and the allographic kāra forms of the latter two symbols: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, in compliance with the Conservatism principle laid down in the Root Zone LGR procedure. However, the kāra corresponding to VOCALIC RR ৠ (U+09E0), which is VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in Bangla in certain limited borrowed or Sanskritic words and is therefore retained.
4.2.1.4. No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle laid down in the Root Zone LGR procedure.
4.2.1.5. ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and the Appendix (Section 10.1).
5. Repertoire
The Bangla writing system is represented in UNICODE using the Bengali (Bangla) script name as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in UNICODE has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in its 8th Meeting of the Indian Language Technologies and Products Sectional Committee LITD 20, held on 23rd Aug 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the UNICODE Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 11 is valid for all three languages above.
For each of the code points, language references have been given in the last column, titled Reference, of Table 8, titled the “Code Point Repertoire”. For entire coverage of Bangla code points, references from Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced many new complications, except for one particular character. Though only a few representative languages under EGIDS scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under Bangla Unicode Points (as given in UNICODE 6.3). However, before the details are presented, it is useful to look at the Bangla Code Point Chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs given below in the code charts are based on one of the many fonts designed and are not prescriptive, because there could be some variation in actual fonts, both UNICODE-compatible and TrueType ones. Consider the following code point table:
Colour convention:
All characters that are included in the [MSR]: yellow background
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
Not PVALID in IDNA2008 or ineligible for the root zone (digits, hyphen): white background
Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences which enable the conditional inclusion in the repertoire of Bangla LETTER A and E followed by Bangla SIGN VIRAMA and Bangla LETTER YA, again followed by Bangla VOWEL SIGN AA, so that the æ sound, as in English ‘bat’, ‘cat’, etc., can be represented.
5.2. Code Point Repertoire Exclusion
Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire in this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.
5.3. Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it will never be used alone. It is used only in sequences, for the three special characters ড়, ঢ়, য় in normalized form.
5.4.2. The Vowel Sequence
In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To help users of other Brāhmī scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra onuʃʃār (B), a Candrabindu (D) or a Visarga biʃɔrgo (X). The number of D, B or X that can follow a V in Bangla may not be restricted to one. Going by the rules illustrated in the document, formations such as VDD, VBB and VXX are clearly invalid orthographic units. However, it is valid and possible to have formations or sequences such as an anusvāra followed by a candrabindu on the one hand and a visarga followed by a candrabindu on the other, as in হ্যাংচা ‘hænchā’ and হ্যাঃ ‘hæn’, respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu also exists in Bangla. A vowel can optionally be followed by a combination of Hasanta/Virāma [H] and Consonant [C] to form a ya-phalā. Ya-phalā is a presentation form of U+09AF, the Bangla letter য or ‘ya’, represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla SIGN Hasanta or VIRAMA), U+09AF য BENGALI LETTER YA>. Ya-phalā has a special form (্য). When further combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. ‘aa’ (ā)), it is used for transcribing [æ], as in the “a” of the English word “bat”, written in Bangla as ব্যাট. A vowel sequence admits the following combinations:
5.4.2.1. A Single Vowel
Example: V: অ (अ)
5.4.2.2. A Vowel with Conditions
A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virāma) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].
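The vowel-sequence options listed above translate directly into a regular expression. In this sketch the character classes are approximate code point ranges chosen for illustration (an assumption, not the LGR repertoire), and the optional tail follows the [DB]/[DX] ordering of 5.4.2.2.

```python
import re

# Approximate code point classes (an assumption for this sketch, not the LGR repertoire)
V = "[\u0985-\u0994]"   # independent vowels
C = "[\u0995-\u09B9]"   # consonants
M = "[\u09BE-\u09CC]"   # dependent vowel signs (kāra)
B, D, X, H = "\u0982", "\u0981", "\u0983", "\u09CD"

# V optionally followed by B | D | X | DB | DX | H C M, as listed in 5.4.2.2
VOWEL_SEQUENCE = re.compile(f"{V}(?:{B}|{D}|{X}|{D}{B}|{D}{X}|{H}{C}{M})?")

def is_vowel_sequence(s: str) -> bool:
    return VOWEL_SEQUENCE.fullmatch(s) is not None

print(is_vowel_sequence("\u0986\u0981\u0982"))        # আঁং (V D B): True
print(is_vowel_sequence("\u0986\u0982\u0982"))        # আংং (V B B): False
print(is_vowel_sequence("\u0985\u09CD\u09AF\u09BE"))  # অ্যা (V H C M): True
```

In a full LGR the H C M branch would be narrowed to the sequence S (অ/এ + Hasanta + য + া) defined in Section 7; the general form is kept here to mirror the wording of 5.4.2.2.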
5.4.3.2. A Consonant with Conditions
A consonant is optionally followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virāma) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
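The single-consonant rule above, together with the CHC, CHCHC, ... cluster form covered by the Subsets rule (5.4.3.5), can be sketched as two regular-expression alternatives. The character ranges are approximations (an assumption), and because Section 3.3.2 says the empirical maximum cluster length is not enforced, the `(?:HC)+` repetition is left unbounded.

```python
import re

C = "[\u0995-\u09B9]"   # consonants (approximate range, an assumption)
M = "[\u09BE-\u09CC]"   # dependent vowel signs (approximate range, an assumption)
B, D, X, H = "\u0982", "\u0981", "\u0983", "\u09CD"

# 5.4.3.2: C optionally followed by M | B | D | X | H | DB | DX
SINGLE = f"{C}(?:{M}|{B}|{D}|{X}|{H}|{D}{B}|{D}{X})?"
# 5.4.3.5: CHC, CHCHC, ... optionally followed by M | B | D | X | DB | DX
CLUSTER = f"{C}(?:{H}{C})+(?:{M}|{B}|{D}|{X}|{D}{B}|{D}{X})?"
CONSONANT_SEQUENCE = re.compile(f"{CLUSTER}|{SINGLE}")

def is_consonant_sequence(s: str) -> bool:
    return CONSONANT_SEQUENCE.fullmatch(s) is not None

print(is_consonant_sequence("\u09B8\u09CD\u09A4\u09CD\u09B0\u09C0"))  # স্ত্রী: True
print(is_consonant_sequence("\u0995\u09BE\u09BE"))                    # কা + া: False
```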
5.4.3.5. Subsets
While considering its subsets, we take the combination CHC as a representative example; the same applies equally to CHCHC and CHCHCHC. [A] The combination may be followed by M, B, D, X, DB or DX.
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’. Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
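The KHANDA TA condition in the note above can be sketched as a simple scan: whenever ৎ (U+09CE) directly follows a Hasanta, the consonant before that Hasanta must be one of the two RAs. The function name is hypothetical, and the check covers only this one context.

```python
KHANDA_TA, HASANTA = "\u09CE", "\u09CD"
RAS = {"\u09B0", "\u09F0"}  # Bangla RA, Assamese RA

def khanda_ta_ok(label: str) -> bool:
    """When ৎ follows a Hasanta, the consonant before that Hasanta must be an RA."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i > 0 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] not in RAS:
                return False
    return True

print(khanda_ta_ok("ভর্ৎসনা"))  # True
print(khanda_ta_ok("ক্ৎ"))     # False
```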
5.4.5. Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BANGLA LETTER A and U+098F এ BANGLA LETTER E). To render a ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of ya-phalā and ā-kāra is used to elicit the æ sound, as in English ‘bat’, ‘fat’, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as the ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta. But as we see here, Bangla ends up with a deviant feature in its orthography, in which Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
The other case concerns the formation of repha and ra-phalā in this script, mentioned in the table above as P.¹⁰ Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta +
10 Refer to Rule P in Section 7, Table 16.
6. Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusing variants into two groups:
Group 1: Confusing due to pure visual similarity
Group 2: Confusing due to deviation from character formations as normally perceived by the larger linguistic community
For Group 1, any identical code points are defined as variants. Confusable but non-identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with such cases. However, cases belonging to Group 2 are proposed as variants. These are not cases of mere visual similarity, as they involve deviations from the widely accepted norms of Bangla akṣara formation. They can confuse even a careful observer and hence are proposed as variants.
Variants arise in a script when two or more forms are built from different stored code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or obtain the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two kos (কো), one typed with a single key and the other with two different keys. However, this will not be considered a case of variants, because a kāra followed by a kāra is not allowed.
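The two ways of typing কো described above can also be compared after normalization. This short sketch uses Python's `unicodedata` to show that NFC, which IDNA requires, composes e-kāra + ā-kāra into the single o-kāra (U+09CB), so the two-key spelling never reaches the LGR as a separate label.

```python
import unicodedata

one_key = "\u0995\u09CB"         # ক + ো (single o-kāra, U+09CB)
two_keys = "\u0995\u09C7\u09BE"  # ক + ে + া (e-kāra, then ā-kāra)

print(one_key == two_keys)                                # False: different code points
print(unicodedata.normalize("NFC", two_keys) == one_key)  # True: NFC composes them
```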
6.1. In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I
As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) follows a Hasanta in a conjunct with স (sa, U+09B8) or ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus
স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus
ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
The above combinations, if written in traditional orthography, can be a little confusing, as the থ (tha) in the conjunct appears like a হ (ha). The conjunct can occur in the initial, medial or final position (as shown below in example no. 1). It can also be typed wrongly by someone who takes it for a হ (ha, U+09B9), increasing the risks in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
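Variant labels of the kind described in this section can be generated mechanically. The minimal sketch below (all names hypothetical) applies two swap rules: the থ/হ pair of Case I above, when it follows স or ন + Hasanta, and the র/ৰ pair discussed earlier in connection with repha and ra-phalā (Variant Set 4). For the RA rule it assumes that a Hasanta on either side of the RA counts as the relevant context; the normative context in Variant Set 4 may be narrower.

```python
HASANTA = "\u09CD"
THA, HA = "\u09A5", "\u09B9"
BANGLA_RA, ASSAMESE_RA = "\u09B0", "\u09F0"

def tha_ha_context(label: str, i: int) -> bool:
    """থ or হ directly after স or ন + Hasanta (Case I)."""
    return i >= 2 and label[i - 1] == HASANTA and label[i - 2] in ("\u09B8", "\u09A8")

def ra_context(label: str, i: int) -> bool:
    """র or ৰ with a Hasanta on either side (an assumed reading of Variant Set 4)."""
    return (i + 1 < len(label) and label[i + 1] == HASANTA) or (i > 0 and label[i - 1] == HASANTA)

# Each rule: (swap table, context predicate)
RULES = [
    ({THA: HA, HA: THA}, tha_ha_context),
    ({BANGLA_RA: ASSAMESE_RA, ASSAMESE_RA: BANGLA_RA}, ra_context),
]

def variants(label: str) -> set:
    """Return every other label reachable by applying any subset of the swaps."""
    swaps = [(i, table[ch]) for i, ch in enumerate(label)
             for table, in_context in RULES if ch in table and in_context(label, i)]
    results = {label}
    for i, replacement in swaps:
        # each swappable position doubles the set: keep it or replace it
        results |= {v[:i] + replacement + v[i + 1:] for v in results}
    return results - {label}

print(variants("স্থান"))  # {'স্হান'}
print(variants("অর্ক"))   # {'অৰ্ক'}
```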
6.2. Cross Script Variants
A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusions they may cause as labels on the web. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Certain other characters are somewhat similar but have not been included here; they are provided in the Appendix (10.3) for reference.
7. Whole Label Evaluation Rules (WLE)
This section provides the WLEs required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can be easily translated into the LGR specifications.
11 Unicode uses Oriya for the script, although Odia is now the official term.
12 As used by Unicode, denoting and including both Assamese and Maṇipuri.
Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table in Code Point Repertoire (Section 5.1):
C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9) or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ, BENGALI LETTER A) or 098F (এ, BENGALI LETTER E), H is 09CD (্, BENGALI SIGN VIRAMA), C1 is 09AF (য, BENGALI LETTER YA) and M1 is 09BE (া, BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র, BENGALI LETTER RA) or 09F0 (ৰ, ASSAMESE LETTER RA)
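To apply the WLE rules, a label first has to be mapped to the symbol alphabet listed above. This is a minimal sketch under stated assumptions: the code point ranges are approximations rather than the Table 8 repertoire, a consonant followed by Nukta (ড়, ঢ়, য়) is counted as one C, and `wle_symbols` is a hypothetical name.

```python
NUKTA = "\u09BC"
SINGLE_SYMBOLS = {0x0982: "B", 0x0981: "D", 0x0983: "X", 0x09CD: "H", 0x09CE: "Z"}

def wle_symbols(label: str) -> str:
    """Map a label to its WLE symbol string over C, M, V, B, D, X, H, Z."""
    out, i = [], 0
    while i < len(label):
        cp = ord(label[i])
        if 0x0985 <= cp <= 0x0994:
            sym = "V"
        elif 0x0995 <= cp <= 0x09B9 or cp == 0x09F0:  # includes the Assamese RA
            sym = "C"
            if i + 1 < len(label) and label[i + 1] == NUKTA:
                i += 1  # consonant + Nukta is one C
        elif 0x09BE <= cp <= 0x09CC:
            sym = "M"
        elif cp in SINGLE_SYMBOLS:
            sym = SINGLE_SYMBOLS[cp]
        else:
            raise ValueError(f"U+{cp:04X} is not covered by this sketch")
        out.append(sym)
        i += 1
    return "".join(out)

print(wle_symbols("ভর্ৎসনা"))  # CCHZCCM
print(wle_symbols("চাঁদ"))     # CMDC
```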
Now let us elaborate on each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, and may not be attested, but there is no harm in adding such ligatures, because any user can easily create them.
7.1.1. Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋk ʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09AF U+09BC U+09BE) (meaning Bank of India). Here two different words are joined together, the first ending with an H (অফ্) and the second beginning with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered enough to represent the sound intended for the first word. This unique situation is necessitated by the absence of the hyphen, the space and the Zero Width Non-Joiner from the set of characters permitted in the Root Zone repertoire. Otherwise, V never needs to be allowed after an H. Permitting it may create a perceptual similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP.
In future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.
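The prohibition above reduces to a check on adjacent code points. This minimal sketch (the function name is hypothetical, and the vowel range U+0985..U+0994 is an approximation) flags the Bank of India example when it contains the Hasanta after ফ and accepts the form without it.

```python
HASANTA = "\u09CD"
INDEPENDENT_VOWELS = {chr(cp) for cp in range(0x0985, 0x0995)}  # approximate range

def has_vowel_after_hasanta(label: str) -> bool:
    """True if an independent vowel (V) immediately follows a Hasanta (H)."""
    return any(prev == HASANTA and cur in INDEPENDENT_VOWELS
               for prev, cur in zip(label, label[1:]))

# The Bank of India example, code point by code point as listed above
with_h = ("\u09AC\u09CD\u09AF\u09BE\u0999\u09CD\u0995"   # ব্যাঙ্ক
          "\u0985\u09AB\u09CD"                           # অফ্
          "\u0987\u09A8\u09CD\u09A1\u09BF\u09AF\u09BC\u09BE")  # ইন্ডিয়া
without_h = with_h.replace("\u09AB\u09CD", "\u09AB")     # অফ without the Hasanta

print(has_vowel_after_hasanta(with_h))     # True: prohibited
print(has_vowel_after_hasanta(without_h))  # False
```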
7.2. Additional Examples from Bangla ABNF
Below are a few examples that help explain some of the rules the ABNF puts in place. They are given for reference only and are not meant to be comprehensive.
ঃক (ःक): As can be seen, such a combination automatically results in a “golu” or dotted circle, marking it as an invalid formation. This is an intrinsic property of the
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink, and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What Has Happened So Far in Terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0 – Core Specification, Chapter 12, p. 473.
10. Appendix-I
10.1. Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language/script, certain restriction rules apply. In other words, some of the formalism's structures do not necessarily apply in a given language. To take care of such cases, restriction rules are set in place; they help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:
1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, such as the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ by Kazi Nazrul Islam. There are also instances of it in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).
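Rule 1 above can be sketched as a check on the first code point of a label. This sketch encodes only the three letters the rule disallows (ৎ, ঞ, ঙ); it deliberately does not reject an initial য়, since the text cites attested exceptions, and `initial_ok` is a hypothetical name.

```python
DISALLOWED_INITIAL = {"\u09CE", "\u099E", "\u0999"}  # ৎ, ঞ, ঙ

def initial_ok(label: str) -> bool:
    """False for an empty label or one starting with a letter barred by rule 1."""
    return bool(label) and label[0] not in DISALLOWED_INITIAL

print(initial_ok("ৎসেরিং"))  # False: the transliteration above is not a valid label
print(initial_ok("কার্য"))    # True
```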
10.2. ‘Sylheti Nagarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, and widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Others ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).
13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.
Table 17 – The Script Table of Sylheti Nagarī or Siloti
10.3. Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
Vowels can be written as independent letters or by using a variety of diacritical marks, which are written above, below, before, after, or both before and after the consonant they follow in pronunciation [105].
When consonants occur together in clusters, special conjunct letters are formed. In printed Bangla, many of these consonantal clusters or conjoined consonants are in use. The letters for the consonants other than the final one in the group are generally reduced. But there are a few special conjunct characters that are compounds of the consonant characters, e.g. ক (k) + ষ (s) = ক্ষ (ks), ঞ (n) + জ (j) = ঞ্জ (nj), জ (j) + ঞ (n) = জ্ঞ (jn), হ (h) + ম (m) = হ্ম (hm). There are other issues as well: র as the second member of a cluster is reduced to a secondary symbol, e.g.
(p)+র(r)=A(pr)B(s)+C(t)+র(r)=D(str) (as inউD ustra ldquocamelrdquo)য (y)whenusedas a primary symbol represents jɔ in Bangla But its secondary symbol(allograph) jɔ-phala has two phonetic values When added to the initialconsonant in a word it is a vowel aelig (as in শGামল (syamala) ldquogreenrdquo র Gাপার
(ryapara)ldquowrapperrdquoetc)Butafteranon-initialconsonant it justdoublesit in
pronunciation (as in কাযH ধাযH etc) The I(r)+য(y) combination has two
shape of the second member is changed, e.g. দ্ধ (ddh), গ্ধ (gdh) and ন্ধ (ndh), respectively. The solitary example of র্ (r) + ঋ (ṛ) = র্ঋ (as in নৈর্ঋত nairṛta “southwest”), used mostly in Classical borrowings, shows the use of the secondary symbol of a consonant followed by the primary symbol of a vowel. The inherent vowel applies only to the final consonant of the cluster.
The Bangla script has at least fifty-two primary symbols and quite a few allographs (positional variants of them), corresponding to forty-four phonemes (7 oral and 7 nasal vowels and 30 consonants) (150), or functional speech sounds, with some obvious redundancies, although in one of the first phonemic analyses the number was thought to be thirty-five phonemes [140].
As mentioned above, in Bangla several graphemic symbols have secondary shapes, technically called ‘allographs’, with a complementary distribution in each case. These graphs or markings are generally added to the following positions of the primary symbol [113] in the following manner:
As for the complementary distribution of vowel letters (word- or syllable-initial) and vowel Mātrās, which is relevant for the ABNF, let us consider the following. Besides some simple vowel modifiers called ‘Kārs’ in Bangla (also referred to as Mātrā in the other LGR documents of Neo-Brāhmī), there are some combinatory modifiers of Bangla vowels with certain consonants. For example, whereas
আ U+0986 BENGALI LETTER AA is substituted by া U+09BE BENGALI VOWEL SIGN AA,
ই U+0987 BENGALI LETTER I is substituted by the pre-posed ি U+09BF BENGALI VOWEL SIGN I,
ঈ U+0988 BENGALI LETTER II is substituted by ী U+09C0 BENGALI VOWEL SIGN II, or
উ U+0989 BENGALI LETTER U is substituted by ু U+09C1 BENGALI VOWEL SIGN U, marked below the primary grapheme,
there are some special vowel modifiers of উ, as in the following combined letters: the special shape of গু (gu), rather than গ (g) with a regular ু (u); ভ (bh) + র (r) (ভ্রূ bhrū “eyebrow”); শ (ś) + র (r) (শ্রু śru); ঋ (ṛ) after হ (h) (হৃ hṛ), etc.
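The complementary distribution described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the LGR: it covers only the four letter/sign pairs listed above, approximates “consonant” as the range U+0995 to U+09B9, and the helper names `is_consonant` and `attach_vowel` are hypothetical. Special ligature shapes such as গু are produced by the font at rendering time and are not modelled.

```python
# Independent vowel letter -> kāra (vowel sign) used after a consonant,
# for the four pairs listed above.
VOWEL_TO_SIGN = {
    "\u0986": "\u09BE",  # আ BENGALI LETTER AA -> া BENGALI VOWEL SIGN AA
    "\u0987": "\u09BF",  # ই BENGALI LETTER I  -> ি BENGALI VOWEL SIGN I (displayed pre-posed)
    "\u0988": "\u09C0",  # ঈ BENGALI LETTER II -> ী BENGALI VOWEL SIGN II
    "\u0989": "\u09C1",  # উ BENGALI LETTER U  -> ু BENGALI VOWEL SIGN U (displayed below)
}


def is_consonant(ch: str) -> bool:
    """Rough test (assumption): treats U+0995..U+09B9 as consonants."""
    return "\u0995" <= ch <= "\u09B9"


def attach_vowel(prefix: str, vowel: str) -> str:
    """Append `vowel` after `prefix` in the form its position requires.

    After a consonant the kāra is used; word-initially or after a vowel the
    independent letter is kept. Strings are in logical (storage) order.
    """
    if prefix and is_consonant(prefix[-1]):
        return prefix + VOWEL_TO_SIGN[vowel]
    return prefix + vowel
```

For example, `attach_vowel("গ", "উ")` gives গু, while `attach_vowel("", "উ")` keeps উ. Note that ি is stored after the consonant even though it is displayed before it.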
There have been many notable contributions to simplifying and modifying Bangla spellings and combinatory techniques, especially by scholars such as Pabitra Sarkar (1992) [134]. In this, there has been an attempt to reduce the number of allographs of both vowels and consonants in clusters, and it has been widely accepted in the printing of school texts in both Bangladesh and West Bengal [151, 152]. As of now, two systems, the old (traditional) and the new, go on side by side, operative in different domains.
However, in the preparation of this LGR document, the aim has been to consider the widely used and usable sequences and combinations, and their variations across the sister scripts belonging to the basket of Brāhmī writing systems. Bangla Academy, Dhaka published the Standard Bangla Spelling Rules in 1992, following the recommendations of a committee constituted through a workshop jointly organized by the Jatīya Śikṣākrama and Pāṭhyapustaka Board in 1988. A thoroughly revised edition of the Rules was published in September 2012.⁶ After the establishment of the Bangla Akademi of West Bengal in 1986, its first President, Annadasankar Ray (1904-2002), in his inaugural address gave a direction for the standardization of the Bangla alphabet, script and spelling system, and clearly argued that they would not blindly follow the Sanskritic model of conventional grammar. A broad list of proposals was sent to experts on Bangla, and a broad agreement was reached on the ‘homogenization of Bangla spelling’ by 1988. Based on opinions received from different quarters, a unanimous list of ‘rules’ was agreed upon. This was published as a ‘Spelling Dictionary’ titled Ākādemi Bānāna Abhidhāna (1997), which was obviously more comprehensive than ‘The University of Calcutta proposals’ made in 1936. Along with the ‘rationalization’ of spellings, another step was taken to make the writing system easier to read by making the symbols used, both single and combined ones, more ‘transparent’. These reforms were originally suggested by Sarkar (1987, first published in 1978) [134][153], where he used the terms Swaccha (‘Transparent’) and Aswaccha (‘Opaque’ or non-transparent), even adding Ardha Swaccha (‘half transparent’) in between the two. Some sample examples are: Transparent: ন্ন (nn), প্ত (pt), স্ত (st), where both members of the cluster can be recognized.
6 Bangla Academy. 2012. Bāṅlā Ekāḍemī Pramita Bāṅlā Bānānera Niyama (Bangla Academy Standard Bangla Spelling Rules). Dhaka: Bangla Academy.
There were, in fact, two types of proposals. One concerned the shape of the letters: those of consonant + vowel (CV) combinations and of conjuncts, which are consonant + consonant combinations. There were further complex shapes, i.e. those of consonant + consonant + (consonant +) vowel (CC(C)V) signs, as in প্রু (pru) or স্ক্রু (skru). Some decisions in this area were necessary because a few of the CC(C) symbols represented complexities that made learning them difficult for children. The other type dealt with the spellings of words only, without any reference to the shapes of the letters in which they were written. The basic objective here was ‘one word, one spelling’ to the greatest extent possible [151].
Below we place a statement of the most salient changes that affect the consonant + vowel combinations [153]:
a. The variants of the short u (উ-কার, hrasva u-kāra) vowel sign have been brought down to one, i.e. ু. So gu (গু) is now written with the regular ু sign attached to গ. Similarly, ru (রু), śu (শু) and hu (হু) take the regular sign, and therefore so do cluster + short u sign combinations such as ntu (ন + ্ + ত + উ) and stu (স + ্ + ত + উ).
b. The variants of long u (দীর্ঘ ঊ-কার, dīrgha ū-kāra) have also been reduced to the regular ূ sign: rū (রূ), bhrū (ভ bh + ্ + র r + ঊ ū), drū (দ d + ্ + র r + ঊ ū) and śrū (শ ś + ্ + র r + ঊ ū).
c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been brought down to one: hṛ (হৃ) is now written with the regular ৃ sign attached to হ.
Regarding consonant + consonant + (consonant)… + (vowel) clusters, the Paschimbanga Bangla Akademi proposed transparent or semi-transparent shapes for clusters to the extent admissible in the Bangla writing system. Some examples will clarify the proposal (a slash means that the traditional cluster shape precedes it, while the Bangla Akademi innovation follows it) [153].
3.3.1 The Consonants
As per the traditional classification, Bangla consonants are categorized according to their phonetic properties, especially in terms of place and manner of articulation [107]. There are five ‘Vargas’ (pronounced ‘Barga’ in Bangla), or groups (sets or classes), distinguished by place of articulation, and one non-‘varga’ group [105]. Each Varga, which corresponds to stops at a certain place of articulation, contains a series of five consonants classified as per their phonetic qualities (i.e. manner of articulation), beginning from unvoiced and unaspirated, through voiced and aspirated (in the fourth column), and finally ending with a homorganic or corresponding nasal [107]. Consider the following table:
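The five vargas map onto contiguous runs of the Unicode Bengali block, which the following sketch makes explicit. It is an illustration only; the names `VARGA_STARTS` and `varga_grid` are hypothetical. The one irregularity is that U+09A9 is unassigned, so the labial row begins at U+09AA rather than continuing the sequence.

```python
import unicodedata

# First member of each varga; each row then runs for five code points.
# U+09A9 is unassigned, so the labial row starts at U+09AA.
VARGA_STARTS = {
    "velar": 0x0995,      # ক
    "palatal": 0x099A,    # চ
    "retroflex": 0x099F,  # ট
    "dental": 0x09A4,     # ত
    "labial": 0x09AA,     # প
}


def varga_grid() -> dict[str, list[str]]:
    """Five consonants per varga: unvoiced unaspirated, unvoiced aspirated,
    voiced unaspirated, voiced aspirated, homorganic nasal."""
    return {name: [chr(start + i) for i in range(5)]
            for name, start in VARGA_STARTS.items()}


if __name__ == "__main__":
    for name, row in varga_grid().items():
        print(f"{name:<10} {' '.join(row)}  nasal = {unicodedata.name(row[-1])}")
```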
3.3.2 The Implicit Vowel Killer Hasanta (called ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts)
As stated earlier, all consonants are pronounced in isolation with an implicit vowel (central-back ɔ in Bangla, the neutral vowel) assumed to be associated with them [121]. The terms ‘Hasanta’ (= ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts) and ‘Virāma’⁷ (= ‘Dāṛi’ in Bangla), the latter as preferred in Unicode (cf. Unicode 3.0 and above), have been used in this report to denote the character that marks the absence of this inherent vowel. It may be noted that the term virāma has been adopted in Unicode in a sense that is different from its traditional grammatical definition, and hence it requires some explanation here. Considering the importance of the document, this note should be a part of this LGR document, so that anybody referring to it is able to know the proper grammatical explanation of the term. Because a special sign is needed whenever this implicit vowel is stripped off, the symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster, one could, in common parlance, “kill” its vowel and create conjuncts. In this manner, conjunct characters can generally be written by joining two to four consonants. In rare cases, this process can join up to five consonants. However, the notion of a maximum number of consonants joining to form one akṣara⁸ is to be bounded empirically. This is an observation based on the CIIL-EMILLE Corpora of Bangla words [132 & 133] as seen in print these days. Given the mixture of scripts and languages happening on the web, the possibility that one may want a generic Top Level Domain [gTLD] which has more than the observed maximum cannot be ruled out. This can be the case when a foreign-language word which admits a large number of consonants is transliterated into Bangla. Hence, in the Bangla LGR work, this limit will not be enforced.
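In storage, a conjunct is simply consonants separated by U+09CD; the following sketch shows this and, in line with the decision above, enforces no maximum cluster length. The helper names `conjunct` and `cluster_size` are hypothetical, and whether a given sequence renders as a single ligature depends on the font.

```python
HASANTA = "\u09CD"  # ্ BENGALI SIGN VIRAMA (Hasanta)


def conjunct(*consonants: str) -> str:
    """Join consonants with Hasanta so that only the last keeps its inherent vowel.

    No maximum cluster length is enforced.
    """
    if not consonants:
        raise ValueError("at least one consonant is required")
    return HASANTA.join(consonants)


def cluster_size(cluster: str) -> int:
    """Number of consonants in a single Hasanta-joined cluster."""
    return cluster.count(HASANTA) + 1


ksha = conjunct("\u0995", "\u09B7")             # ক + ্ + ষ = ক্ষ
shtra = conjunct("\u09B7", "\u099F", "\u09B0")  # ষ + ্ + ট + ্ + র = ষ্ট্র
```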
3.3.3 Vowels
Separate symbols exist for all ‘Swaras’, or vowels, in Bangla, which are pronounced independently either at the beginning of the word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign, called ‘kār’ in Bangla or Mātrā in Nāgarī⁹, is attached to the consonant. Since the consonant has this built-in neutral vowel at the end, there are equivalent kāras (Mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:
7 Virāma as used here is also a misnomer according to the Indian grammatical traditions. Nowhere is the mere absence of a vowel marked as virāma; Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
8 This term needs to be disambiguated: Akṣara also means ‘syllable’ in Indian grammatical traditions.
9 Although the term ‘Mātrā’ in Bangla stands for an altogether different concept, viz. the top bar placed over a letter, typically available in Hindi and Bangla but missing in Gujarati.
3.3.4 The Anusvāra onuʃʃār (ং, U+0982)
The Anusvāra, or onuʃʃār in Bangla, at times represents a homorganic nasal, but not always. It replaces a conjunct group of ‘Nasal Consonant + Hasanta + Consonant’ where the second consonant belongs to the velar varga or set, as in লংকা. But it often also appears in such combinations when a non-velar is the last member of the combination, as in ল্যাংটা “naked” or ল্যাংচা “a kind of sweet; to limp”. Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined writing symbol representing the corresponding nasal consonant of the particular set. Although Modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal, in Bangla it is clearly demarcated where one must use the Anusvāra and where it has to be a conjunct cluster with a nasal as the first or the second component.
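The anusvāra spelling and the nasal-conjunct spelling of the same word are distinct code point sequences, and Unicode normalization does not treat them as equivalent, because ং (U+0982) has no decomposition. The sketch below checks this with Python's standard `unicodedata` module; the word pair লংকা / লঙ্কা is used for illustration and the helper `same_after_nfc` is hypothetical.

```python
import unicodedata

ANUSVARA = "\u0982"                     # ং BENGALI SIGN ANUSVARA
NGA, HASANTA, KA = "\u0999", "\u09CD", "\u0995"
LA, AA = "\u09B2", "\u09BE"

with_anusvara = LA + ANUSVARA + KA + AA        # লংকা
with_conjunct = LA + NGA + HASANTA + KA + AA   # লঙ্কা


def same_after_nfc(a: str, b: str) -> bool:
    """True if the two strings are identical after Unicode NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_after_nfc(with_anusvara, with_conjunct))  # False
```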
3.3.5 Nasalization: Candrabindu (ঁ, U+0981)
Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cā̃d ‘moon’ (U+099A U+09BE U+0981 U+09A6). This sign, with a dot inside the half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].
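Code point listings such as the one for চাঁদ are easy to reproduce, and doing so confirms that Candrabindu is stored after the vowel sign it nasalizes. A minimal helper, hypothetical and not part of the LGR tooling:

```python
def code_points(text: str) -> str:
    """Space-separated U+XXXX values of `text`, in storage order."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


print(code_points("\u099A\u09BE\u0981\u09A6"))  # চাঁদ
# prints: U+099A U+09BE U+0981 U+09A6
```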
3.3.6 Nukta (়, U+09BC)
The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts, such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi). The term and the concept of nukta are borrowed in Bangla. The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, the Bangla letters য় YYA, ড় RRA and ঢ় RRHA have to be represented in this LGR by the sequences YA + Nukta (U+09AF + U+09BC), DDA + Nukta (U+09A1 + U+09BC) and DDHA + Nukta (U+09A2 + U+09BC), instead of the single code points YYA (U+09DF), RRA (U+09DC) and RRHA (U+09DD), although the use of ‘Nukta’ is otherwise completely unnatural in Bangla. It is noted that in the current Unicode Standard chart these characters are listed as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA Protocol, through a set of procedures developed by the IETF. Even though the Unicode Standard also provides these three characters as atomic characters (for example U+09DC for ড় [r], U+09DD for ঢ় [rh] and U+09DF for য় [y], each typed as a single keystroke), the IDNA protocol requires that they be represented as base consonant + nukta sequences drawn from the Unicode Bengali block.
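The effect of the composition exclusions can be checked directly with Python's standard `unicodedata` module: normalizing each atomic letter to NFC yields the base consonant + nukta sequence, and the sequence itself is left unchanged. The `ATOMIC` table and `as_code_points` helper are illustrative names only.

```python
import unicodedata

# Atomic letters that NFC never produces (composition exclusions).
ATOMIC = {"RRA": "\u09DC", "RRHA": "\u09DD", "YYA": "\u09DF"}


def as_code_points(text: str) -> str:
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


for name, ch in ATOMIC.items():
    nfc = unicodedata.normalize("NFC", ch)
    print(f"{name}: {as_code_points(ch)} -> NFC {as_code_points(nfc)}")
# RRA: U+09DC -> NFC U+09A1 U+09BC
# RRHA: U+09DD -> NFC U+09A2 U+09BC
# YYA: U+09DF -> NFC U+09AF U+09BC
```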
It may be noted that there could be sporadic attempts or cases of writing Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph), only for the sake of correct pronunciation and to maintain the sanctity of the loanword. This is akin to making the Bangla writing system work like the IPA script. It is, however, not in use in printed Bangla.
3.3.7 Visarga biʃɔrgo (ঃ, U+0983) and Avagraha (ঽ, U+09BD)
The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. One could quote as an example দুঃখ duhkho “sorrow”, “unhappiness” (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in Bangla. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো’পরাণি). It is now rarely used, even in other languages using the Bangla script. In the case of the LGR, the Avagraha is not part of the repertoire: it has been decided not to retain Avagraha (ঽ, U+09BD) because it is blocked in TLDs as per the Maximal Starting Repertoire (MSR). Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.
type in the following manner: র + ্ + য, but for র‍্য the sequence would be র + ZWJ + ্ + য [154]. In other words, ZWJ is used in the rendering of words demanding ya-phalā after ra, which is otherwise not possible to type (render), because the same order ra + hasanta + antastha ja is used in the medial and/or final position. Interestingly, ra + hasanta + antastha ja is used to type repha on the consonant antastha ja, as in কার্য (kaarjo). In order to get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার.
The use of ZWJ/ZWNJ has been ruled out from the root zone by the [Procedure]. These two signs are used in Bangla to create alternate renderings, and their insertion can affect searching as well as NLP. The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly restricted and the Hasanta joining the two consonants participating in the conjunct formation needs to be explicitly shown.
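The two spellings discussed above differ only by the invisible joiner, as this sketch shows. Read together with the exclusion of ZWJ and ZWNJ from the root zone, it suggests that the ya-phalā-after-ra rendering (as in র‍্যাপার) cannot be expressed in a root-zone label; that consequence is our reading of the two statements above rather than a rule quoted from the [Procedure]. The helper `uses_joiner` is hypothetical.

```python
RA, YA, HASANTA, AA = "\u09B0", "\u09AF", "\u09CD", "\u09BE"
ZWJ, ZWNJ = "\u200D", "\u200C"

# র + ্ + য: rendered as repha over ya (as in কার্য).
repha_over_ya = RA + HASANTA + YA
# র + ZWJ + ্ + য: ya-phalā after ra (as in র‍্যাপার).
ra_ya_phala = RA + ZWJ + HASANTA + YA


def uses_joiner(label: str) -> bool:
    """True if the label contains ZWJ or ZWNJ, neither of which the root zone allows."""
    return ZWJ in label or ZWNJ in label
```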
3.3.9 Use of Ya-phalā-ā
There are two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E); these are the Ya-phalā-ā sequences.
For rendering Ya-phalā after অ and এ, it is necessary to type U+09CD Hasanta plus U+09AF ya, preceded by the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English ‘acid’ অ্যাসিড, ‘association’ অ্যাসোসিয়েশন, ‘bat’ ব্যাট, ‘fat’ ফ্যাট, ‘mat’ ম্যাট, ‘cap’ ক্যাপ, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant. Only consonants can have the Hasanta marked. But as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
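Because Hasanta after an independent vowel is otherwise unknown, a checker can accept it only inside the two æ ligatures. The sketch below does this with Python regular expressions; the pattern names and `vowel_hasanta_ok` are hypothetical, and the vowel range U+0985 to U+0994 is a deliberately loose approximation of the independent vowels.

```python
import re

# The only permitted vowel + Hasanta contexts: অ/এ + ্ + য + া (অ্যা, এ্যা).
AE_LIGATURE = re.compile("[\u0985\u098F]\u09CD\u09AF\u09BE")
# Any independent vowel (approximated as U+0985..U+0994) directly followed by Hasanta.
VOWEL_THEN_HASANTA = re.compile("[\u0985-\u0994]\u09CD")


def vowel_hasanta_ok(label: str) -> bool:
    """Return True if every vowel + Hasanta occurrence starts an æ ligature."""
    for m in VOWEL_THEN_HASANTA.finditer(label):
        if not AE_LIGATURE.match(label, m.start()):
            return False
    return True
```

For instance, অ্যাসিড passes, while a hypothetical ই + ্ + য + া is rejected.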
Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka “the sun”); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra “cycle”). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed/preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases. The LGR makes a note of this point of concern with respect to the two RAs in disguise, as it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may consequently lead to concerns related to spoofing and other kinds of cyber irregularities. The motive for classing these two code points as (blocking) variants is that fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of a following Hasanta. The difference between the RAs is only distinguishable if one looks at their Unicode values. Therefore, labels such as অর্ক arka, শীর্ষ śīrṣa ‘top, apex’, অভ্র abhra ‘cloud, the sky’ and শ্রম śrama ‘physical labour’ could be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode code points. This point is made explicit with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47), which follow. Moreover, it
4 Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) has been formed by members having experience in Linguistics (especially NLP and Computational Linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR assigned to each. However, an attempt is made to ensure that the fundamental philosophy behind building those LGRs is consistent across all the Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about Bangla LGR code points.
4.1 Guiding Principles
The NBGP adopts the following broad principles for the selection of code points in the code point repertoire, across the board for all the Neo-Brāhmī scripts within its ambit.
4.1.1 Inclusion Principles
4.1.1.1 Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription or archival purposes only will not be considered for inclusion in the code point repertoire.
4.1.1.2 Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.
4.2 Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards which are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.
4.2.1 External Limits of Scope
The code point repertoire for the root zone being a very special case at the top of protocol hierarchies, the canvas of characters available for selection as part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart: Out of all the characters that are needed by the script in question, if a particular character is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially so in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol: Unicode being the character-encoding standard providing the maximum possible representation of a given script/language, it has encoded, as far as possible, all the characters needed by the script. However, domain names being a specialized case, they are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR): The Root Zone LGR being the repertoire of characters which are going to be used for the creation of root zone TLDs, which in turn constitute an even more specialized case of domain names, the Root Zone LGR procedure introduces additional exclusions on IDNA's allowed set of characters. Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), even if allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the MSR.
To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. The IDNA Protocol further narrows this down, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
4.2.1.1 No Punctuation Marks
The TLDs being identifiers, punctuation marks present in Brāhmī-based scripts will not be included.
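The three successive filters of Section 4.2.1 can be pictured as set operations over the Bengali block, as in the sketch below. Stage ii is only an approximation of IDNA2008: it applies two of the RFC 5892 tests (the LetterDigits categories and the Unstable normalization test) and ignores the rest. The stage iii set is a hypothetical stand-in (Avagraha plus the Bengali digits, which the colour convention marks as ineligible for the root zone), not the actual MSR. All function names are illustrative.

```python
import unicodedata

BENGALI_BLOCK = range(0x0980, 0x0A00)
LETTER_DIGIT_CATEGORIES = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def in_unicode_chart(cp: int) -> bool:
    """Stage i: the code point is assigned in the Bengali block."""
    return cp in BENGALI_BLOCK and unicodedata.category(chr(cp)) != "Cn"


def roughly_idna_valid(cp: int) -> bool:
    """Stage ii (approximation): RFC 5892 LetterDigits and Unstable tests only."""
    ch = chr(cp)
    stable = unicodedata.normalize(
        "NFKC", unicodedata.normalize("NFKC", ch).casefold()) == ch
    return stable and unicodedata.category(ch) in LETTER_DIGIT_CATEGORIES


# Stage iii: hypothetical exclusions for illustration, NOT the actual MSR list.
MSR_EXCLUDED = {0x09BD} | set(range(0x09E6, 0x09F0))  # Avagraha, digits ০-৯


def repertoire_candidates() -> set[int]:
    chart = {cp for cp in BENGALI_BLOCK if in_unicode_chart(cp)}
    idna = {cp for cp in chart if roughly_idna_valid(cp)}
    return idna - MSR_EXCLUDED
```

With these rules, U+09DC (RRA) already drops out at stage ii because it is not NFC-stable, and U+09F9 and U+09FA drop out because they are not letters or digits, whereas Avagraha survives stage ii and is only removed at stage iii, matching the example in item iii.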
4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters, like BENGALI ISSHAR ৺ (U+09FA), BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), etc., will also not be included.
4.2.1.3 No Rare and Obsolete Characters
There are characters which have been added to Unicode to accommodate rare forms, such as the Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), as well as VOCALIC LL ৡ (U+09E1), and the allographic kāra forms of the latter two symbols, VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, which complies with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla, the kāra corresponding to VOCALIC RR ৠ (U+09E0), namely VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words, and is therefore retained.
4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.
4.2.1.5 ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).
5 Repertoire
The Bangla writing system is represented in Unicode using the Bengali (Bangla) script name as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in Unicode has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to be included in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in the 8th Meeting of its Indian Language Technologies and Products Sectional Committee, LITD 20, held on 23rd August 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 11 is valid for all three languages named above.
For each of the code points, language references have been given in the last column, titled Reference, of Table 8, titled the “Code Point Repertoire”. For entire coverage of the Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced many new complications, except for one particular character. Though only a few representative languages under EGIDS Scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under Bangla Unicode Points (as given in Unicode 6.3). However, before the details are presented, it is ideal to look at the Bangla Code Point Chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs given below in the code charts are based on one of the many fonts designed, and are not prescriptive, because there could be some variations in actual fonts, both Unicode-compatible and TrueType ones. Consider the following code point table:
Colour convention:
All characters that are included in the [MSR]: yellow background.
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background.
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background.
Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences which enable the conditional inclusion in the repertoire of BENGALI LETTER A and E followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, again followed by BENGALI VOWEL SIGN AA, in order to represent the æ sound, as in English ‘bat’, ‘cat’, etc.
5.2 Code Point Repertoire Exclusion
There are some characters of the Bangla script that find a place in Unicode but have not been included in the repertoire in the LGR proposal. The reason for excluding ঌ (U+098C) and ৗ (U+09D7) is that they are rare and obsolete characters.
5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it will never be used alone. It will be used only in sequences, for the three special characters ড়, ঢ় and য় in normalized form.
5.4.2 The Vowel Sequence
In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To facilitate understanding for users of other Brāhmī scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra onuʃʃār (B), Candrabindu (D) or Visarga biʃɔrgo (X). The number of D, B or X which can follow a V in Bangla may not be restricted to one. Going by the rules illustrated in the document, it is clear that formations such as VDD, VBB and VXX are invalid orthographic units. However, it is valid and possible to have formations or sequences such as candrabindu followed by anusvāra on the one hand, and candrabindu followed by visarga on the other, as in হ্যাঁংচা ‘hænchā’ and হ্যাঁঃ ‘hæn’ respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu thus exists in Bangla. A vowel can optionally be followed by a combination of Hasanta/Virama [H] + Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF য BENGALI LETTER YA (‘ya’), represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Hasanta), U+09AF য BENGALI LETTER YA>. Ya-phalā has a special form, ্য. Again, when combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. ‘aa’ (ā)), it is used for transcribing [æ], as in the “a” of the English word “bat”, written in Bangla as ব্যাট. A vowel sequence admits the following combinations:
5.4.2.1 A Single Vowel
Example: V অ (Devanāgarī अ)
5.4.2.2 A Vowel with Conditions
A vowel can optionally be followed by Anusvāra [B], or Candrabindu [D], or Visarga [X], or Candrabindu + Anusvāra [DB], or Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].
5.4.3.2 A Consonant with Conditions
A consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], or Anusvāra [B], or Candrabindu [D], or Visarga [X], or Hasanta (also known as Virama) [H], or Candrabindu + Anusvāra [DB], or Candrabindu + Visarga [DX].
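The vowel-sequence and consonant-sequence conditions of 5.4.2.2 and 5.4.3.2 translate naturally into regular expressions, as in the sketch below. It is a hedged illustration, not the LGR's normative form: the character classes are our own reading of the repertoire (ঌ is omitted per 5.2, and ড় ঢ় য় are taken as base + nukta per 3.3.6), the function names are hypothetical, and the whole-label rules of Section 7 (including the narrower S sequence) are not applied here.

```python
import re

V = "[\u0985-\u098B\u098F\u0990\u0993\u0994]"           # independent vowels
C = ("(?:[\u09A1\u09A2\u09AF]\u09BC"                     # ড় ঢ় য় as base + nukta
     "|[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0\u09F1])")
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"            # kāras
B, D, X, H = "\u0982", "\u0981", "\u0983", "\u09CD"      # ং ঁ ঃ ্

# 5.4.2.2: V optionally followed by B | X | D | DB | DX | H C M
VOWEL_SEQ = re.compile(f"{V}(?:[{B}{X}]|{D}[{B}{X}]?|{H}{C}{M})?")
# 5.4.3.2: C optionally followed by M | B | X | H | D | DB | DX
CONSONANT_SEQ = re.compile(f"{C}(?:{M}|[{B}{X}]|{H}|{D}[{B}{X}]?)?")


def is_vowel_sequence(s: str) -> bool:
    return VOWEL_SEQ.fullmatch(s) is not None


def is_consonant_sequence(s: str) -> bool:
    return CONSONANT_SEQ.fullmatch(s) is not None
```

For example, অ, অ্যা and অঁং are accepted as vowel sequences, while অংং (VBB) is rejected.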
5.4.3.5 Subsets
While considering its subsets, as a representative example we will consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’.
Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
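The KHANDA TA condition can be expressed as a simple context check. This sketch assumes the condition applies whenever ৎ (U+09CE) directly follows Hasanta, in which case the consonant before that Hasanta must be র or ৰ; the function name is hypothetical.

```python
HASANTA, KHANDA_TA = "\u09CD", "\u09CE"
RAS = {"\u09B0", "\u09F0"}  # Bangla র and Assamese ৰ


def khanda_ta_context_ok(label: str) -> bool:
    """C H Z condition: a Khanda Ta after Hasanta must have RA before that Hasanta."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i > 0 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] not in RAS:
                return False
    return True
```

ভর্ৎসনা passes, whereas ক + ্ + ৎ fails; a Khanda Ta that does not follow Hasanta (as in হঠাৎ) is not affected by this check.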
5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S in the first instance. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). For rendering Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya, preceded by the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English ‘bat’, ‘fat’, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant. Only consonants can have the Hasanta marked. But as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]). The other case refers to the formation of repha and ra-phalā in the said script, mentioned in the table above as P. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta +
10 Refer to Rule P in Section 7, Table 16.
6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusable variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from the character formations normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. The confusable but not identical cases are not proposed, as there is another panel (the String Similarity Assessment Panel) entrusted to deal with such cases. However, cases which belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Bangla akṣara formation. These can cause confusion even to a careful observer, and hence they are being proposed as variants.
Variants arise in a script when two or more forms look the same but are stored with different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or type e-kāra and ā-kāra as two separate keys, getting the same result. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered as a case of variant, because a kāra followed by a kāra is not allowed.
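There is a further reason why the two ways of typing কো need no variant treatment: U+09CB BENGALI VOWEL SIGN O canonically decomposes to U+09C7 + U+09BE, so the two-key sequence is not in NFC, and normalizing it yields the one-key form required by IDNA (Section 3.3.6). The sketch below shows this with Python's standard `unicodedata` module; it complements, rather than replaces, the kāra-after-kāra argument given above.

```python
import unicodedata

KA = "\u0995"
one_key = KA + "\u09CB"              # ক + ো (BENGALI VOWEL SIGN O)
two_keys = KA + "\u09C7" + "\u09BE"  # ক + ে (VOWEL SIGN E) + া (VOWEL SIGN AA)

print(one_key == two_keys)                                # False
print(unicodedata.normalize("NFC", two_keys) == one_key)  # True
```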
6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I: As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) with Hasanta appears in a conjunct with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
The above combinations, if written in traditional orthography, could be a little confusing, since the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in the initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly, in the belief that it was a হ (ha, U+09B9), increasing the risks in label writing and identification.
Examples: 1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
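Both in-script variant rules discussed so far are context-dependent swaps, so one small engine can generate the variant labels for each. The sketch below is illustrative only. CASE I swaps থ/হ after স্ or ন্. For Variant Set 4 (র/ৰ), it assumes that both the repha context (RA before Hasanta) and the ra-phalā context (RA after Hasanta) count, since the risky labels cited in Section 3 include both; the normative rule may be narrower. All names are hypothetical.

```python
HASANTA = "\u09CD"


def after_sa_or_na_hasanta(label: str, i: int) -> bool:
    """CASE I context: position i follows স্ or ন্."""
    return i >= 2 and label[i - 1] == HASANTA and label[i - 2] in "\u09B8\u09A8"


def next_to_hasanta(label: str, i: int) -> bool:
    """Assumed Variant Set 4 context: Hasanta immediately after or before i."""
    return (i + 1 < len(label) and label[i + 1] == HASANTA) or (
        i > 0 and label[i - 1] == HASANTA)


# (a, b, context): a and b are interchangeable where context(label, i) holds.
RULES = [
    ("\u09A5", "\u09B9", after_sa_or_na_hasanta),  # থ / হ
    ("\u09B0", "\u09F0", next_to_hasanta),         # র / ৰ
]


def variants(label: str) -> set[str]:
    """All labels reachable by applying the swaps in RULES, excluding `label`."""
    swaps = {}
    for a, b, context in RULES:
        for i, ch in enumerate(label):
            if ch in (a, b) and context(label, i):
                swaps[i] = b if ch == a else a
    results = {label}
    for i, replacement in swaps.items():
        results |= {v[:i] + replacement + v[i + 1:] for v in results}
    results.discard(label)
    return results
```

For example, স্থান yields the single variant স্হান, and অর্ক yields অৰ্ক, while রবি, with no adjacent Hasanta, yields none.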
6.2 Cross Script Variants
A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusions they may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Although there are certain characters which are somewhat similar, they have not been included here; they have been provided in the Appendix (10.2) for reference.
7 Whole Label Evaluation Rules (WLE)
This section provides the WLE rules that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications.
11 Unicode uses Oriya for the script, although Odia is now the official term.
12 As used by Unicode, denoting and including both Assamese and Maṇipuri.
Below are the symbols used in the WLE rules for each Indic Syllabic Category, as mentioned in the table provided in the Code Point Repertoire (Section 5.1):
C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9) or the (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ, BENGALI LETTER A) or 098F (এ, BENGALI LETTER E), H is 09CD (্, BENGALI SIGN VIRAMA), C1 is 09AF (য, BENGALI LETTER YA) and M1 is 09BE (া, BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র, BENGALI LETTER RA) or 09F0 (ৰ, BENGALI LETTER RA WITH MIDDLE DIAGONAL, the Assamese ra)
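To apply the WLE rules mechanically, each label is first mapped to a string of the symbols above. The sketch below is a hypothetical helper, not the LGR's XML encoding: its code point assignments are our reading of Section 5.1 (ঌ is left out per 5.2), Nukta is absorbed into the preceding C so that ড় ঢ় য় count as single consonants, and anything unmapped is shown as `?`. Recognizing the S and P sequences is left to later rules.

```python
SYMBOL: dict[str, str] = {}
NUKTA = "\u09BC"


def _assign(symbol: str, code_points) -> None:
    for cp in code_points:
        SYMBOL[chr(cp)] = symbol


_assign("V", [*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994])
_assign("C", [*range(0x0995, 0x09A9), *range(0x09AA, 0x09B1), 0x09B2,
              *range(0x09B6, 0x09BA), 0x09F0, 0x09F1])
_assign("M", [*range(0x09BE, 0x09C5), 0x09C7, 0x09C8, 0x09CB, 0x09CC])
SYMBOL.update({"\u0982": "B", "\u0981": "D", "\u0983": "X",
               "\u09CD": "H", "\u09CE": "Z"})


def to_symbols(label: str) -> str:
    """Map a label to its WLE symbol string, absorbing Nukta into the preceding C."""
    out = []
    for ch in label:
        if ch == NUKTA and out and out[-1] == "C":
            continue  # ড় ঢ় য় are C + Nukta sequences
        out.append(SYMBOL.get(ch, "?"))
    return "".join(out)
```

For example, চাঁদ maps to `CMDC`, the æ ligature অ্যা to `VHCM`, and ভর্ৎসনা to `CCHZCCM`.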
Now let us elaborate on each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in adding such ligatures: any user can easily create them, even if they are not attested combinations.
7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋk ʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09AF U+09BC U+09BE) (meaning Bank of India). This is the case where two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered enough for full representation of the sound intended for that word. This is a unique situation, necessitated by the lack of a hyphen, a space or the Zero Width Non-Joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptual similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.
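The prohibition amounts to a check on adjacent code point pairs. A minimal sketch with a hypothetical function name, using the Bank of India label above (with য় written as য + nukta):

```python
HASANTA = "\u09CD"
INDEPENDENT_VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B"
                         "\u098F\u0990\u0993\u0994")


def vowel_after_hasanta(label: str) -> bool:
    """True if an independent vowel directly follows Hasanta (prohibited by the NBGP)."""
    return any(a == HASANTA and b in INDEPENDENT_VOWELS
               for a, b in zip(label, label[1:]))


# ব্যাঙ্ক + অফ্ + ইন্ডিয়া: the Hasanta on ফ is followed by ই.
BANK_OF_INDIA = ("\u09AC\u09CD\u09AF\u09BE\u0999\u09CD\u0995"
                 "\u0985\u09AB\u09CD"
                 "\u0987\u09A8\u09CD\u09A1\u09BF\u09AF\u09BC\u09BE")
```

Here `vowel_after_hasanta(BANK_OF_INDIA)` is true, so the label is rejected; dropping the Hasanta after ফ, as the majority usage described above permits, makes it acceptable under this rule.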
7.2. Additional Examples from Bangla ABNF
Below are a few examples that help in understanding some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.
ঃক ःक As can be seen, such a combination automatically results in a “golu”, or dotted circle, marking it as an invalid formation. This is an intrinsic property of the
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka; Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow; Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum; Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS); Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka; Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka; Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka.
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What Has Happened So Far in Terms of Script Reforms”. Paper presented at the face-to-face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0: Core Specification, Chapter 12, p. 473.
10. Appendix-I
10.1. Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature. When it is applied to a specific language or script, certain restriction rules apply; in other words, some of the Formalism structures do not necessarily apply to a given language. Restriction rules are set in place to take care of such cases, and they help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:
1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and to the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not normally allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, e.g. the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichu chor’ written by Kazi Nazrul Islam. There are also instances of word-initial ya in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, when Chinese and Japanese names are transliterated into Bangla, one does come across Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).
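Rule 1 amounts to a check on the first code point of a label. The sketch below covers only the three characters the rule explicitly bars (ৎ, ঞ, ঙ). Word-initial য় is left out because the text discusses it without stating an outright ban. The function name is hypothetical.

```python
import unicodedata

# Code points that rule 1 bars from the start of a label: ৎ, ঞ, ঙ.
NOT_INITIAL = {"\u09CE", "\u099E", "\u0999"}

def initial_violation(label: str):
    """Return the Unicode name of a barred initial character, else None."""
    if label and label[0] in NOT_INITIAL:
        return unicodedata.name(label[0])
    return None

print(initial_violation("\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"))  # ৎসেরিং (Tsering)
print(initial_violation("\u0995\u09B2\u09AE"))                    # কলম
```

The first call reports `BENGALI LETTER KHANDA TA`. This shows that the transliteration pattern mentioned above would be rejected by the rule as stated.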
10.2. ‘Sylheti Nagarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924), which was used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar and was widely in use during the 1880s. Sylheti Nagarī or Siloti (129) has had several other names, such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Others ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).
13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.
Table 17: The Script Table of Sylheti Nagarī or Siloti
10.3. Confusable code points
The following code points were analysed. They were found to be either (a) distinguishable or (b) confusable, but not to a degree that warrants defining them as variant code points.
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
As mentioned above, several graphemic symbols in Bangla have secondary shapes, technically called ‘allographs’, each with a complementary distribution. These graphs or markings are generally added to the following positions of the primary symbol [113], in the following manner:
Let us now consider the complementary distribution of vowel letters (word- or syllable-initial) and vowel Mātrās, which is relevant for the ABNF. Besides some simple vowel modifiers called ‘kārs’ in Bangla (also referred to as Mātrā in the other Neo-Brāhmī LGR documents), there are some combinatory modifiers of Bangla vowels with certain consonants. For example, whereas
আ U+0986 BENGALI LETTER AA is substituted by
া U+09BE BENGALI VOWEL SIGN AA;
ই U+0987 BENGALI LETTER I is substituted by
pre-posed ি U+09BF BENGALI VOWEL SIGN I;
ঈ U+0988 BENGALI LETTER II is substituted by
ী U+09C0 BENGALI VOWEL SIGN II; or
উ U+0989 BENGALI LETTER U is substituted by
ু U+09C1 BENGALI VOWEL SIGN U, marked below the primary grapheme,
there are some special vowel modifiers of উ in combined letters such as গু (gu), rather than writing it as গ (g) + ু (u);
ভ (bh) + র (r) (ভ্রূ bhrū, “eyebrow”); শ (ś) + র (r) (শ্রূ śrū); ঋ (ṛ) after হ (h) (হৃ hṛ); etc.
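The substitution of an independent vowel by its kāra, described above, is a fixed one-to-one mapping in the Unicode Bengali block. The sketch below records that mapping and checks it against the Unicode character names. The dictionary and function names are hypothetical. অ has no entry because it is the inherent vowel of every consonant.

```python
import unicodedata

# Independent vowel (V) -> dependent vowel sign (kāra, M).
VOWEL_TO_KARA = {
    "\u0986": "\u09BE",  # AA
    "\u0987": "\u09BF",  # I (pre-posed when rendered)
    "\u0988": "\u09C0",  # II
    "\u0989": "\u09C1",  # U
    "\u098A": "\u09C2",  # UU
    "\u098B": "\u09C3",  # VOCALIC R
    "\u098F": "\u09C7",  # E
    "\u0990": "\u09C8",  # AI
    "\u0993": "\u09CB",  # O
    "\u0994": "\u09CC",  # AU
}

def attach(consonant: str, vowel: str) -> str:
    """Spell consonant + vowel; অ (U+0985) adds nothing, being the inherent vowel."""
    if vowel == "\u0985":
        return consonant
    return consonant + VOWEL_TO_KARA[vowel]

for v, k in VOWEL_TO_KARA.items():
    print(unicodedata.name(v), "->", unicodedata.name(k))
```

Each printed pair differs only in `LETTER` versus `VOWEL SIGN`. For example, `attach("\u0995", "\u0987")` yields কি, stored as KA followed by VOWEL SIGN I, even though the kāra is drawn to the left of the consonant.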
There have been many notable contributions to simplifying and modifying Bangla spellings and combinatory techniques, especially by scholars such as Pabitra Sarkar (1992) [134]. These efforts have sought to reduce the number of allographs of both vowels and consonants in clusters, and the reduced forms have been widely accepted in the printing of school texts in both Bangladesh and West Bengal [151, 152]. As of now, two systems, the old (traditional) and the new, coexist and operate in different domains.
In preparing this LGR document, however, the aim has been to consider the widely used and usable sequences and combinations, and their variations across the sister scripts belonging to the basket of Brāhmī writing systems. The Bangla Academy, Dhaka published the Standard Bangla Spelling Rules in 1992, following the recommendations of a committee constituted through a workshop jointly organized by the Jatīya Shikshakrama and Pathyapustaka Board in 1988. A thoroughly revised edition of the Rules was published in September 2012.⁶ The Bangla Akademi of West Bengal was established in 1986. In his inaugural address, its first President, Annadasankar Ray (1904-2002), gave a direction for the standardization of the Bangla alphabet, script and spelling system, and argued clearly that the Akademi would not blindly follow the Sanskritic model of conventional grammar. A broad list of proposals was sent to experts on Bangla, and by 1988 a broad agreement had been reached on the ‘homogenization of Bangla spelling’. Based on opinions received from different quarters, a unanimous list of ‘rules’ was agreed upon. It was published as a ‘Spelling Dictionary’ titled Ākādemi Bānāna Abhidhāna (1997), which was considerably more comprehensive than ‘The University of Calcutta proposals’ of 1936. Along with the ‘rationalization’ of spellings, another step was taken to make the writing system easier to read by making the symbols in use, both single and combined, more ‘transparent’. These reforms were originally suggested by Sarkar (1987, first published in 1978) [134] [153]. He used the terms Swaccha (‘Transparent’) and Aswaccha (‘Opaque’, or non-transparent), and even added Ardha Swaccha (‘half transparent’) between the two. Some examples of transparent clusters, in which both members of the cluster can be recognized, are ন্ন (nn), প্ত (pt) and স্ত (st).
6 Bangla Academy. 2012. Bāṅlā Ekāḍemī Pramita Bāṅlā Bānānera Niyama (Bangla Academy Standard Bangla Spelling Rules). Dhaka: Bangla Academy.
There were in fact two types of proposals. The first concerned the shapes of letters: consonant + vowel (CV) combinations and conjuncts, i.e. consonant + consonant combinations. There were also more complex shapes, namely consonant + consonant (+ consonant) + vowel (CC(C)V) signs, as in প্রু (pru) or স্ক্রু (skru). Some decisions in this area were necessary because a few of the CC(C) symbols were so complex that children found them difficult to learn. The second type dealt only with the spellings of words, without reference to the shapes of the letters in which they were written. Its basic objective was ‘one word, one spelling’, to the greatest extent possible [151].
Below is a statement of the most salient changes affecting consonant + vowel combinations [153]:
a. The variants of the short u (হ্রস্ব উ-কার, hrasva u-kāra) vowel sign have been brought down to one, i.e. ু. So গু (gu) is now written as গ + ু. Similarly, রু (ru) > র + ু, শু (śu) > শ + ু and হু (hu) > হ + ু, and therefore, for a cluster + short u sign, ন্তু (ntu) > (ন + ্ + ত + উ) and স্তু (stu) > (স + ্ + ত + উ).
b. The variants of long u (দীর্ঘ ঊ-কার, dīrgha ū-kāra) have also been reduced: রূ (rū) > র + ূ; ভ্রূ (bhrū) > (ভ bh + ্ + র r + ঊ ū); দ্রূ (drū) > (দ d + ্ + র r + ঊ ū); শ্রূ (śrū) > (শ ś + ্ + র r + ঊ ū).
c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been brought down to one: হৃ (hṛ) > হ + ৃ.
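The reforms listed above change how these combinations are drawn, not how they are encoded. In Unicode, the traditional and the transparent shapes of গু are the same two-code-point sequence, and the font determines which shape appears. The hypothetical helper below spells a string out as code points with their Unicode names. It shows that the sequences behind the reformed forms are fixed regardless of font.

```python
import unicodedata

def spell_out(text: str) -> list:
    """List each code point of text with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

# গু, ভ্রূ and হৃ: one encoding each, whichever glyph shape the font draws.
for form in ("\u0997\u09C1", "\u09AD\u09CD\u09B0\u09C2", "\u09B9\u09C3"):
    print(form, spell_out(form))
```

For example, `spell_out("\u0997\u09C1")` returns `['U+0997 BENGALI LETTER GA', 'U+09C1 BENGALI VOWEL SIGN U']`. A label therefore stores the same code points whether a reader sees the old or the new shape.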
Regarding consonant + consonant + (consonant) … + (vowel) clusters, the Paschimbanga Bangla Akademi proposed transparent or semi-transparent shapes for clusters, to the extent admissible in the Bangla writing system. Some examples will clarify the proposal. In them, the traditional cluster shape precedes the slash and the Bangla Akademi innovation follows it [153].
3.3.1. The Consonants
According to traditional classification, Bangla consonants are categorized by their phonetic properties, especially place and manner of articulation [107]. There are five ‘vargas’ (pronounced ‘barga’ in Bangla), or groups (sets or classes), distinguished by place of articulation, and one non-‘varga’ group [105]. Each varga corresponds to the stops at a certain place of articulation. It contains a series of five consonants, ordered by their phonetic qualities (i.e. manner of articulation) from unvoiced and unaspirated to voiced and aspirated (in the fourth column), and ending with a homorganic or corresponding nasal [107]. Consider the following table:
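The five vargas described above map onto consecutive code points in the Unicode Bengali block. The sketch below rebuilds the varga rows from that layout. It assumes that each varga occupies five consecutive code points and that the labial row starts at U+09AA, because U+09A9 is unassigned. The place labels and function name are illustrative.

```python
import unicodedata

# Assumption of this sketch: each varga is five consecutive code points;
# U+09A9 is unassigned, so the pa-varga starts at U+09AA.
VARGA_STARTS = (0x0995, 0x099A, 0x099F, 0x09A4, 0x09AA)
PLACES = ("velar", "palatal", "retroflex", "dental", "labial")

def vargas() -> dict:
    """Map each place of articulation to its five consonants."""
    return {place: [chr(cp) for cp in range(start, start + 5)]
            for place, start in zip(PLACES, VARGA_STARTS)}

for place, row in vargas().items():
    # The last member of each row is the homorganic nasal.
    print(place, " ".join(row), "| nasal:", unicodedata.name(row[-1]))
```

The velar and palatal rows end in ঙ (NGA) and ঞ (NYA). These are the two nasals that Appendix-I rule 1 bars from label-initial position.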
3.3.2. The Implicit Vowel Killer Hasanta (called ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts)
As stated earlier, every consonant is pronounced in isolation with an implicit vowel associated with it (the central-back ɔ, the neutral vowel in Bangla) [121]. This report uses the terms ‘Hasanta’ (= ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts) and ‘Virāma’⁷ (= ‘Dari’ in Bangla), the latter as preferred in Unicode (cf. Unicode 3.0 and above), to denote the character that marks the absence of this inherent vowel. Unicode uses the term virama in a sense that differs from its traditional grammatical definition, so some explanation is required here. Given the importance of this document, the note should form part of the LGR, so that anyone referring to it can find the proper grammatical explanation of the term. A special sign is needed whenever the implicit vowel is stripped off, and this sign is known as the Hasanta (= Halant) (U+09CD). Placing the Hasanta under the first consonant of a combination or cluster “kills” its vowel, in common parlance, and creates a conjunct. In this way, conjunct characters can generally be written by joining two to four consonants, and in rare cases up to five. The notion of a maximum number of consonants joining to form one aksara⁸ must, however, be bounded empirically. This observation is based on the CIIL-Emille Corpora of Bangla words [132 & 133], as seen in print these days. Given the mixture of scripts and languages on the web, someone may well want a generic Top Level Domain [gTLD] with more than the observed maximum. This can happen, for instance, when a foreign-language word that admits a large number of consonants is transliterated into Bangla. Hence this limit will not be enforced in the Bangla LGR.
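Although the LGR deliberately sets no upper limit, the length of a conjunct is easy to measure as the longest run of consonants joined by Hasanta, C(HC)*. The sketch below does this, for example for corpus statistics of the kind cited above. Its consonant set (assigned letters U+0995..U+09B9 plus the Assamese RA and WA) is an assumption, and the function name is hypothetical.

```python
import unicodedata

HASANTA = "\u09CD"
# Consonants assumed for this sketch: assigned letters U+0995..U+09B9,
# plus Assamese RA (U+09F0) and WA (U+09F1).
CONSONANTS = {chr(cp) for cp in range(0x0995, 0x09BA)
              if unicodedata.category(chr(cp)) == "Lo"} | {"\u09F0", "\u09F1"}

def longest_cluster(label: str) -> int:
    """Length of the longest C(HC)* run, i.e. consonants joined by Hasanta."""
    best = run = 0
    prev = ""
    for ch in label:
        if ch in CONSONANTS:
            # Extend the run only if this consonant follows C+Hasanta.
            run = run + 1 if prev == HASANTA and run > 0 else 1
            best = max(best, run)
        elif ch != HASANTA:
            run = 0  # any other character ends the cluster
        prev = ch
    return best

print(longest_cluster("\u09B8\u09CD\u0995\u09CD\u09B0\u09C1"))  # স্ক্রু
```

For স্ক্রু (skru) it reports 3. Because the Bangla LGR imposes no maximum, such a measure would only be informative and would never be a validity test.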
3.3.3. Vowels
Separate symbols exist for all ‘Swara’, or vowels, in Bangla. They are pronounced independently, either at the beginning of a word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign, called ‘kār’ in Bangla or Mātrā in Nāgarī⁹, is attached to the consonant. Since the consonant already carries the built-in neutral vowel, there are equivalent kāras (Mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:
7 Virāma as used here is also a misnomer according to the Indian grammatical traditions, in which the mere absence of a vowel is nowhere marked as virama. Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
8 This term needs to be disambiguated: in Indian grammatical traditions, aksara also means ‘syllable’.
9 In Bangla, however, the term ‘Mātrā’ stands for an altogether different concept, viz. the top bar placed over a letter, which is found in Hindi and Bangla but missing in Gujarati.
3.3.4. The Anusvāra, onuʃʃār (ং - U+0982)
The Anusvāra, or onuʃʃār in Bangla, sometimes represents a homorganic nasal, but not always. It replaces a conjunct group of ‘Nasal Consonant + Hasanta + Consonant’ where the second consonant belongs to the velar varga, as in লংকা. It often also appears in such combinations when the last member is not a velar, as in ল্যাংটা “naked” or ল্যাংচা “a kind of sweet; to limp”. Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined written form using the corresponding nasal consonant of the particular set. Modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal. In Bangla, by contrast, it is clearly demarcated where one must use the Anusvāra and where a conjunct cluster with a nasal as the first or second component is required.
3.3.5. Nasalization: Candrabindu (ঁ - U+0981)
Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cãd ‘moon’ (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].
3.3.6. Nukta (় - U+09BC)
The nukta sign does not exist in Bangla orthography. It is predominantly used in many other Brāhmī-derived scripts, such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi), and both the term and the concept of nukta are borrowed into Bangla. The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, the Bangla letters য় YYA, ড় RRA and ঢ় RHA must be represented in this LGR by the sequences YA + Nukta (U+09AF + U+09BC), DDA + Nukta (U+09A1 + U+09BC) and DDHA + Nukta (U+09A2 + U+09BC), instead of the single code points YYA (U+09DF), RRA (U+09DC) and RHA (U+09DD). This is so even though the use of ‘Nukta’ is otherwise completely unnatural in Bangla.
In the current Unicode Standard chart, these characters are listed as additional consonants. Under the LGR Procedure, however, these decisions depend on the IDNA Protocol, through a set of procedures developed by the IETF. The Unicode Standard also provides for these three characters as atomic characters (for example 09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y], each typed as a single keystroke). The IDNA protocol nevertheless requires that we treat them as conjunct characters and then allocate codes for these in the Unicode Bengali Block.
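The composition exclusions mentioned above can be observed directly with Python's standard `unicodedata` module. NFC decomposes each atomic letter into base + nukta and never recomposes it, which is why this LGR lists the two-code-point sequences. The sketch below only illustrates that normalization step; it does not reproduce the LGR's own processing.

```python
import unicodedata

ATOMIC = {"\u09DC": "RRA", "\u09DD": "RHA", "\u09DF": "YYA"}

for ch, name in ATOMIC.items():
    nfc = unicodedata.normalize("NFC", ch)
    # Each atomic letter comes out of NFC as base consonant + NUKTA (U+09BC).
    print(name, " ".join(f"U+{ord(c):04X}" for c in nfc))
```

This prints `RRA U+09A1 U+09BC`, `RHA U+09A2 U+09BC` and `YYA U+09AF U+09BC`. Normalizing the two-code-point sequences leaves them unchanged, so the sequence form is the only one that survives NFC.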
There may be sporadic attempts to write Muslim names, Urdu poetic words and Perso-Arabic loan words with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph). Such spellings serve only to indicate the correct pronunciation and to preserve the integrity of the loan word, which amounts to using the Bangla writing system like the IPA. The practice is not, however, found in written or printed Bangla.
3.3.7. Visarga, biʃɔrgo (ঃ - U+0983) and Avagraha (ঽ - U+09BD)
The Visarga, biʃɔrgo (U+0983), is frequently used in Bangla loan words borrowed from Sanskrit and represents a sound very close to h, as in দুঃখ duhkho “sorrow”, “unhappiness” (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in the Bangla script. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি is re-written as নরো’পরাণি), and it is now rarely used even in the other languages that use the Bangla script. The Avagraha is blocked in TLDs by the Maximal Starting Repertoire (MSR), so it has been decided not to retain Avagraha (ঽ) (U+09BD) in the LGR repertoire. Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.
type in the following manner: র + ্ + য. For র‍্য (ya-phalā after ra), however, the sequence is র + ZWJ + ্ + য [154]. In other words, ZWJ is needed to render words that require ya-phalā after ra. Without it, such words cannot be typed (rendered), because the sequence ra + hasanta + antastha ja is the same in medial and/or final position. The same sequence ra + hasanta + antastha ja is used to type repha on the consonant antastha ja, as in কার্য (kaarjo). To get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার.
The use of ZWJ and ZWNJ has been ruled out of the root zone by the [Procedure]. In Bangla, these two signs are used to create alternate renderings, and inserting them can affect searching as well as NLP. The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly prevented. In such cases, the Hasanta joining the two consonants of the conjunct must be shown explicitly.
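The consequence of the ZWJ/ZWNJ exclusion for ya-phalā after ra can be shown with the two spellings discussed above. The sketch below is illustrative only, and `has_joiner` is a hypothetical name. It detects the joiners, then removes ZWJ from র‍্যাপার. What remains is the plain ra + hasanta + ya sequence that also encodes the repha in কার্য.

```python
ZWJ, ZWNJ = "\u200D", "\u200C"

# র‍্যাপার: RA + ZWJ + VIRAMA + YA + AA + PA + AA + RA (ya-phalā after ra)
rapar = "\u09B0\u200D\u09CD\u09AF\u09BE\u09AA\u09BE\u09B0"
# কার্য: KA + AA + RA + VIRAMA + YA (repha on ya)
karjo = "\u0995\u09BE\u09B0\u09CD\u09AF"

def has_joiner(label: str) -> bool:
    """True if the label contains ZWJ or ZWNJ, which the root zone excludes."""
    return ZWJ in label or ZWNJ in label

stripped = rapar.replace(ZWJ, "")
print(has_joiner(rapar), has_joiner(karjo))  # only the ya-phalā spelling needs ZWJ
print("\u09B0\u09CD\u09AF" in stripped)      # without ZWJ: the repha sequence
```

Both lines print `True` for the check that matters. The ya-phalā-after-ra spelling cannot be expressed in a root-zone label, since removing the joiner turns it into the repha sequence.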
3.3.9. Use of Ya-phalā-ā
Ya-phalā-ā sequences are the two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ - BENGALI LETTER A and U+098F এ - BENGALI LETTER E).
To render Ya-phalā after অ and এ, one must type U+09CD Hasanta plus U+09AF ya after the vowel. This is a purely ligatural entity: the addition of Ya-phalā and ā-kāra is used to produce the æ sound, as in English acid অ্যাসিড, association অ্যাসোসিয়েশন, ‘bat’ ব্যাট, ‘fat’ ফ্যাট, ‘mat’ ম্যাট, ‘cap’ ক্যাপ, etc. By its nature, the Brāhmī script does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry it. As seen here, however, Bangla orthography has a deviant feature: in the ligatures অ্যা and এ্যা, Hasanta comes immediately after a vowel (cf. Unicode 10.0, p. 473 [100]).
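Because Hasanta may follow a full vowel only inside the two æ sequences অ্যা and এ্যা (the S symbol of Section 7), any other vowel + Hasanta combination can be rejected. The sketch below is a minimal check under that reading. The function name and vowel set are assumptions, and only the V1 H C1 M1 form of S is modelled, not the S1/S2 sequences of Table 9.

```python
HASANTA = "\u09CD"
VOWELS = {chr(cp) for cp in (0x0985, 0x0986, 0x0987, 0x0988, 0x0989, 0x098A,
                             0x098B, 0x098F, 0x0990, 0x0993, 0x0994)}
# The only permitted vowel + Hasanta contexts: অ্যা and এ্যা (V1 H C1 M1).
S_SEQUENCES = ("\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE")

def vowel_hasanta_ok(label: str) -> bool:
    """Reject any Hasanta after a full vowel unless it starts an S sequence."""
    for i in range(1, len(label)):
        if label[i] == HASANTA and label[i - 1] in VOWELS:
            if label[i - 1:i + 3] not in S_SEQUENCES:
                return False
    return True

print(vowel_hasanta_ok("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড
print(vowel_hasanta_ok("\u0987\u09CD\u09AF\u09BE"))                    # ই্যা
```

অ্যাসিড passes, while ই + Hasanta + য + া fails because ই is not one of the two permitted vowels. ব্যাট is unaffected, since its Hasanta follows a consonant.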
When it co-occurs with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka “the sun”), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra “cycle”). In both cases, the ra slot could be filled by the Bangla ra র (U+09B0) or by the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), while the shapes of repha and ra-phalā remain the same. The LGR notes this concern about the two RAs in disguise: in a label so generated, they would be completely impossible to distinguish with the naked eye, which may lead to spoofing and other kinds of cyber irregularities. The motive for classing these two code points as (blocking) variants is that fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). This is the justification for Variant Set 4, though only in the context of a following Hasanta. The two RAs can be told apart only by looking at their Unicode values. Labels such as অর্ক arka, শীর্ষ śīrṣa ‘top, apex’, অভ্র abhra ‘cloud, the sky’ and শ্রম śrama ‘physical labour’ could therefore be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode code points. This point is made explicitly with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47), which follow. Moreover, it
4. Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) is made up of members with experience in Linguistics (especially NLP and Computational Linguistics), Literature, Language, History and Epigraphy. Under the NBGP, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, each to be assigned a separate LGR. An attempt is nevertheless made to keep the fundamental philosophy behind these LGRs consistent across all the Brāhmī-derived scripts. The present LGR will cater to multiple languages at EGIDS levels 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about Bangla LGR code points.
4.1. Guiding Principles
The NBGP adopts the following broad principles for selecting code points for the code point repertoire, across the board for all the Neo-Brāhmī scripts within its ambit.
4.1.1. Inclusion Principles
4.1.1.1. Modern Usage
Every character proposed should be in everyday use by a particular linguistic community. Characters encoded in Unicode only for transcription or archival purposes will not be considered for inclusion in the code point repertoire.
4.1.1.2. Unambiguous Use
Linguists should have an unambiguous understanding of how every proposed character is used in the language.
4.2. Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These limits consist of the protocols or standards that are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limits, but they are spelt out separately for the sake of clarity.
4.2.1. External Limits of Scope
The code point repertoire for the root zone is a very special case at the top of the protocol hierarchy. The set of characters available for selection into the Root Zone code point repertoire is therefore already constrained by the various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart
If a character needed by the script in question is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts of the Unicode Consortium.
ii. IDNA Protocol
As the character-encoding standard that aims at the fullest possible representation of a given script/language, Unicode has encoded, as far as possible, all the characters needed by the script. Domain names, however, are a specialized case, governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR is the repertoire of characters to be used for creating Root Zone TLDs, which are an even more specialized case of domain names. The Root Zone LGR procedure therefore introduces additional exclusions on the set of characters allowed by IDNA. Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), although allowed by the IDNA protocol, is not permitted in the Root Zone repertoire under the MSR.
To sum up, the first restriction admits only the characters in the code block of the given script/language. The IDNA Protocol then narrows this set down. Finally, the Maximal Starting Repertoire acts as an additional filter that restricts the character set for the given language even further.
4.2.1.1. No Punctuation Marks
Since TLDs are identifiers, the punctuation marks present in Brāhmī-based scripts will not be included.
4.2.1.2. No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters, such as BENGALI ISSHAR ৺ (U+09FA) and BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), will also not be included.
4.2.1.3. No Rare and Obsolete Characters
Some characters have been added to Unicode to accommodate rare forms. Examples are the Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), as well as VOCALIC LL ৡ (U+09E1) and the allographic kāra forms of the latter two symbols, VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, in compliance with the Conservatism principle laid down in the Root Zone LGR procedure. In Bangla, however, the kāra corresponding to VOCALIC RR ৠ (U+09E0), namely VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain borrowed or Sanskritic words and is therefore retained.
4.2.1.4. No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also consistent with the Letter principle laid down in the Root Zone LGR procedure.
4.2.1.5. ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).
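The three successive filters of 4.2.1, together with the exclusions of 4.2.1.2-4.2.1.3, can be sketched as set operations. The IDNA2008 step below is only a rough approximation of RFC 5892, keeping letters, marks and digits whose NFKC form is unchanged. The real derivation has further rules. The third filter lists only the exclusions named in this document, so the result is a superset of the real repertoire. All variable names are illustrative.

```python
import unicodedata

# Filter 1: assigned code points of the Unicode Bengali block (U+0980..U+09FF).
block = {chr(cp) for cp in range(0x0980, 0x0A00)
         if unicodedata.category(chr(cp)) != "Cn"}

# Filter 2 (approximation of IDNA2008 PVALID, RFC 5892): letters, marks and
# digits that NFKC leaves unchanged. Symbols like ISSHAR (So) drop out here.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Lm", "Mn", "Mc", "Nd"}
idna_ok = {ch for ch in block
           if unicodedata.category(ch) in LETTER_DIGITS
           and unicodedata.normalize("NFKC", ch) == ch}

# Filter 3: MSR / LGR exclusions named in this document (partial list):
# Avagraha, vocalic L/LL/RR forms, AU length mark, standalone Nukta, digits.
named_exclusions = ({"\u09BD", "\u098C", "\u09E0", "\u09E1", "\u09E2",
                     "\u09E3", "\u09D7", "\u09BC"}
                    | {chr(cp) for cp in range(0x09E6, 0x09F0)})
repertoire = idna_ok - named_exclusions

print(len(block), len(idna_ok), len(repertoire))
```

Avagraha survives the IDNA approximation but is removed by the MSR step, matching the example in 4.2.1. The atomic YYA (U+09DF) is already dropped by the IDNA step, because NFKC changes it.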
5. Repertoire
The Bangla writing system is represented in Unicode under the Bengali (Bangla) script name, as enumerated in ISO 15924. It corresponds to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in Unicode has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes for inclusion in the Bangla LGR. It may be mentioned here that on 26th February 2016 the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) for the dis-unification of the Bangla and Asamiyā scripts. At its 8th Meeting, held on 23rd Aug 2017, the BIS Indian Language Technologies and Products Sectional Committee LITD 20 decided to refer to ISO the proposal for recognition of the Assamese script in ISO/IEC 10646. Until the Unicode Consortium takes any further action, the Code Point Repertoire under Table 11 will be assumed to be valid for all three languages above.
For each code point, language references are given in the last column, titled Reference, of Table 8, titled “Code Point Repertoire”. To cover all Bangla code points, references are given for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya. Kokborok written in the Bangla script is not known to introduce many new complications, except for one particular character. Only a few representative languages at EGIDS levels 1-4 have been chosen for referencing. Together, however, they cover all the code points required by all the languages the NBGP has considered, as listed under Bangla Unicode Points (as given in Unicode 6.3). Before the details are presented, it is useful to look at the Bangla code point chart from the Maximal Starting Repertoire [MSR] Version 3. The shapes of the reference glyphs in the code charts below are based on one of many available fonts and are not prescriptive, since actual fonts, both Unicode-compatible and TrueType ones, may vary. Consider the following code point table:
Colour convention:
All characters that are included in the [MSR]: yellow background
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background
Apart from the individual code points above, the Neo-Brāhmī Generation Panel also proposes some specific sequences for the repertoire. They conditionally include BENGALI LETTER A or E, followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA and then BENGALI VOWEL SIGN AA, so as to represent the æ sound as in English ‘bat’, ‘cat’, etc.
5.2. Code Point Repertoire Exclusion
Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete.
5.3. Code point not used alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire because it is never used alone. It is used only within sequences, in normalized form, for the three special characters ড় ঢ় য়.
5.4.2. The Vowel Sequence
The Vowel Sequence and the Consonant Sequence pertinent to Bangla are given below. To help users of other Brāhmī scripts, Devanāgarī equivalents are provided wherever necessary. A vowel sequence is made up of a single vowel, which may optionally be followed by an Anusvāra, onuʃʃār (B), a Candrabindu (D) or a Visarga, biʃɔrgo (X). The number of D, B or X following a V in Bangla need not be restricted to one. Going by the rules illustrated in this document, however, formations such as VDD, VBB and VXX are clearly invalid orthographic units. On the other hand, it is valid and possible to have an anusvāra followed by a candrabindu, or a visarga followed by a candrabindu, as in হ্যাংচা ‘hænchā’ and হ্যাঃ ‘hæn’ respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu also exists in Bangla. A vowel can optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phalā. “Ya-phalā” is a presentation form of U+09AF BENGALI LETTER YA (য, ‘ya’), represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Hasanta), U+09AF য BENGALI LETTER YA>, and it has a special form, ্য. When further combined with U+09BE া BENGALI VOWEL SIGN AA (‘aa’, ā), it is used for transcribing [æ], as in the “a” of the English word “bat”, written in Bangla as ব্যাট. A Vowel sequence admits the following combinations:
5.4.2.1. A Single Vowel
Example: V → অ (Devanāgarī अ)
5.4.2.2. A Vowel with Conditions
A Vowel can optionally be followed by an Anusvāra [B], a Candrabindu [D], a Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX]. Alternatively, it can be followed by the combination of Hasanta (or Virama) [H], Consonant [C] and kāra (Mātrā) [M].
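Rules 5.4.2.1 and 5.4.2.2 together form a small regular language over WLE symbols. The sketch below encodes exactly the alternatives listed in 5.4.2.2 as a regular expression over symbol strings such as `VDB`. It is an illustration under that reading, not the ABNF of Section 5.4.1, and the pattern and function names are hypothetical.

```python
import re

# 5.4.2.1-5.4.2.2 over WLE symbol strings: a V, optionally followed by exactly
# one of DB, DX, the Ya-phalā tail HCM, B, D or X.
VOWEL_SEQUENCE = re.compile(r"V(?:DB|DX|HCM|B|D|X)?")

def is_vowel_sequence(symbols: str) -> bool:
    """True if the whole symbol string is a valid vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(symbols) is not None

for s in ("V", "VB", "VDB", "VDX", "VHCM", "VDD", "VBB", "VXX"):
    print(s, is_vowel_sequence(s))
```

The last three forms, VDD, VBB and VXX, are rejected, in line with the statement in 5.4.2 that they are invalid orthographic units.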
5.4.3.2. A Consonant with Conditions
A Consonant is optionally followed by a dependent vowel sign, kāra (Mātrā) [M], or by an Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
5.4.3.5. Subsets
As a representative example of the subsets, we consider only the combination CHC; the same applies equally to CHCHC and CHCHCHC. [A] The combination may be followed by M, B, D, X, DB or DX.
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’. Note: In this context of KHANDA TA, the C must be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
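The KHANDA TA condition in the note above is a context rule: when ৎ follows a Hasanta, the consonant before that Hasanta must be one of the two RAs. The sketch below checks this condition only. The function name is hypothetical, and other restrictions on ৎ, such as the label-initial ban in Appendix-I, are not modelled.

```python
HASANTA = "\u09CD"
KHANDA_TA = "\u09CE"
RAS = {"\u09B0", "\u09F0"}  # Bangla RA, Assamese RA

def khanda_ta_context_ok(label: str) -> bool:
    """A Hasanta before ৎ is allowed only when that Hasanta follows a RA."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] not in RAS:
                return False
    return True

print(khanda_ta_context_ok("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা
print(khanda_ta_context_ok("\u0995\u09CD\u09CE"))                          # ক্ৎ
```

ভর্ৎসনা passes and ক + ্ + ৎ fails. A word like বিদ্যুৎ, where ৎ follows a kāra rather than a Hasanta, is not affected by this condition.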
5.4.5. Special Cases S and P
The two special cases involving sequences, referred to as S and P in Table 16 under Section 7, are described briefly here. Let us take up S first. There are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ - BENGALI LETTER A and U+098F এ - BENGALI LETTER E). To render Ya-phalā after অ and এ, one must type U+09CD plus U+09AF ya after the vowel. This is a purely ligatural entity: the addition of Ya-phalā and ā-kāra is used to produce the æ sound, as in English ‘bat’, ‘fat’, etc. By its nature, the Brāhmī script does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry it. As seen here, however, Bangla orthography has a deviant feature: in the ligatures অ্যা and এ্যা, Hasanta comes immediately after a vowel (cf. Unicode 10.0, p. 473 [100]). The other case, referred to as P in the table above, concerns the formation of repha and ra-phalā in this script. When it co-occurs with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta +
10 Refer to Rule P in Section 7, Table 16.
6. Variants
This section deals with variants in the Bangla script. The NBGP divides these confusable variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from the character formations normally perceived by the larger linguistic community.
For Group 1, only identical code points are defined as variants. Confusable but non-identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with such cases. Cases belonging to Group 2, however, are proposed as variants. They involve more than mere visual similarity, since they deviate from the widely accepted norms of Bangla aksara formation. They can confuse even a careful observer and are therefore proposed as variants.
Variants arise in a script when two or more forms with different stored code points look the same. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two forms of ko (কো), one typed with a single key and the other with two different keys. This is nevertheless not treated as a case of variants, because a kāra followed by another kāra is not allowed.
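The two-key spelling of কো described above is also handled by the NFC requirement that IDNA imposes (Section 3.3.6): VOWEL SIGN O (U+09CB) is the canonical composition of VOWEL SIGN E + VOWEL SIGN AA, and it is not a composition exclusion. The sketch below uses Python's standard `unicodedata` module to show that normalization folds the two-key form into the one-key form. The LGR's ban on a kāra following a kāra would reject any unnormalized input in any case.

```python
import unicodedata

one_key = "\u0995\u09CB"         # KA + VOWEL SIGN O
two_keys = "\u0995\u09C7\u09BE"  # KA + VOWEL SIGN E + VOWEL SIGN AA

print(one_key == two_keys)                                # distinct code points
print(unicodedata.normalize("NFC", two_keys) == one_key)  # same after NFC
```

The first comparison prints `False` and the second `True`. After NFC there is only one stored form of কো, so no variant relation is needed for it.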
6.1. In-Script Variants
We propose two cases of true in-script variants in the Bangla script.
CASE I
Among true variants in Bangla, we draw attention to cases in which থ (tha, U+09A5) forms a conjunct via Hasanta with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus
স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus
ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
When written in traditional orthography, the above combinations can be somewhat confusing, because the থ (tha) in the conjunct looks like a হ (ha). The conjunct can occur in initial, medial or final position (as shown below in example 1). A user may also mistype it as হ (ha) U+09B9, which increases the risks in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
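One common way to detect whether two labels collide under a variant relation is to reduce both to a canonical key. The sketch below does this for Case I. In the contexts সa + Hasanta and na + Hasanta, HA is folded into THA, so that স্থান and স্হান produce the same key. This technique is an illustrative assumption, not the LGR's mechanism. The normative variant mappings and their dispositions are defined in the LGR XML file, and the function names here are hypothetical.

```python
HASANTA = "\u09CD"
THA, HA = "\u09A5", "\u09B9"
HOSTS = {"\u09B8", "\u09A8"}  # SA, NA

def variant_key(label: str) -> str:
    """Fold HA into THA when it follows SA+Hasanta or NA+Hasanta (Case I)."""
    chars = list(label)
    for i in range(2, len(chars)):
        if chars[i] == HA and chars[i - 1] == HASANTA and chars[i - 2] in HOSTS:
            chars[i] = THA
    return "".join(chars)

def are_variants(a: str, b: str) -> bool:
    return variant_key(a) == variant_key(b)

print(are_variants("\u09B8\u09CD\u09A5\u09BE\u09A8",
                   "\u09B8\u09CD\u09B9\u09BE\u09A8"))  # স্থান vs স্হান
print(are_variants("\u09A5\u09BE\u09A8", "\u09B9\u09BE\u09A8"))  # থান vs হান
```

The first pair collides and the second does not. Outside the conjunct context, থ and হ are visually distinct and are not treated as variants.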
6.2. Cross-Script Variants
A focused cross-script study has been carried out for Bangla against sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), with attention to the visual and technical confusions these scripts may cause as labels on the web. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The NBGP proposes the following characters as variants. Certain somewhat similar characters have not been included here; they are listed in the Appendix (10.2) for reference.
7. Whole Label Evaluation Rules (WLE)
This section provides the WLE rules that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can be easily translated into the LGR specification.
11 Unicode uses ‘Oriya’ for the script, although ‘Odia’ is now the official term.
12 As used by Unicode, denoting and including both Assamese and Maṇipuri.
Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table in Code Point Repertoire (Section 5.1):
C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), i.e. the (æ) Ya-phalā sequences (V1 H C1 M1), where V1 is either 0985 (অ - BENGALI LETTER A) or 098F (এ - BENGALI LETTER E), H is 09CD (্ - BENGALI SIGN VIRAMA), C1 is 09AF (য - BENGALI LETTER YA) and M1 is 09BE (া - BENGALI VOWEL SIGN AA). S1 and S2 are valid even though they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র - BENGALI LETTER RA) or 09F0 (ৰ - BENGALI LETTER RA WITH MIDDLE DIAGONAL, the Assamese ra).
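Because Ra-Hasanta (P) renders identically with র (U+09B0) and ৰ (U+09F0), the variant relation behind Variant Set 4 can be sketched as swapping one RA for the other wherever it touches a Hasanta. This is a hypothetical sketch: it assumes both the repha context (RA + Hasanta) and the ra-phalā context (Hasanta + RA), since the repha/ra-phalā examples given in this proposal cover both; the normative contexts are those of Variant Set 4 in the XML file, and with a blocked disposition the labels returned here would be withheld rather than allocated.

```python
BANGLA_RA = "\u09B0"    # র
ASSAMESE_RA = "\u09F0"  # ৰ
HASANTA = "\u09CD"
OTHER_RA = {BANGLA_RA: ASSAMESE_RA, ASSAMESE_RA: BANGLA_RA}


def ra_positions(label):
    """Indices of an RA forming a repha (RA + Hasanta) or ra-phalā (Hasanta + RA)."""
    return [
        i for i, ch in enumerate(label)
        if ch in OTHER_RA
        and (label[i + 1:i + 2] == HASANTA or (i > 0 and label[i - 1] == HASANTA))
    ]


def ra_variants(label):
    """All labels reachable by swapping র/ৰ at those positions (original excluded)."""
    results = {label}
    for i in ra_positions(label):
        results |= {v[:i] + OTHER_RA[v[i]] + v[i + 1:] for v in results}
    results.discard(label)
    return sorted(results)


print(ra_variants("\u0985\u09B0\u09CD\u0995"))  # variants of অর্ক (arka): ['অৰ্ক']
```

An RA that is not adjacent to a Hasanta, as in রাম, produces no variant, because in that position the two letters are visually distinct.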
Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, and may not be attested, but there is no harm in allowing such ligatures, because any user can easily create them.
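Before any of these rules can be checked, a label has to be mapped to the symbols above. The sketch below does this with code point ranges taken from the Unicode Bengali block minus the exclusions named in Sections 4.2.1.3 and 5.2. The ranges, the inclusion of ৰ and ৱ as consonants, the treatment of ড়/ঢ়/য় (letter + Nukta) as a single C, and the name `categorize` are assumptions for illustration; Table 8 and the XML file remain normative.

```python
def _span(first, last):
    """Characters from code point `first` to `last`, inclusive."""
    return {chr(cp) for cp in range(first, last + 1)}


# Assumed repertoire ranges (illustrative only).
CATEGORY = {}
for ch in _span(0x0985, 0x098B) | {"\u098F", "\u0990", "\u0993", "\u0994"}:
    CATEGORY[ch] = "V"  # independent vowels (ঌ U+098C excluded)
for ch in (_span(0x0995, 0x09A8) | _span(0x09AA, 0x09B0) | {"\u09B2"}
           | _span(0x09B6, 0x09B9) | {"\u09F0", "\u09F1"}):
    CATEGORY[ch] = "C"  # consonants, including Assamese ৰ and ৱ
for ch in _span(0x09BE, 0x09C4) | {"\u09C7", "\u09C8", "\u09CB", "\u09CC"}:
    CATEGORY[ch] = "M"  # kāras (dependent vowel signs)
CATEGORY.update({"\u0982": "B", "\u0981": "D", "\u0983": "X",
                 "\u09CD": "H", "\u09CE": "Z"})

NUKTA = "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # ড, ঢ, য


def categorize(label):
    """Map a label to its WLE symbol string."""
    symbols = []
    prev = ""
    for ch in label:
        if ch == NUKTA and prev in NUKTA_BASES:
            prev = ch  # ড়, ঢ়, য় stay a single C
            continue
        if ch not in CATEGORY:
            raise ValueError(f"U+{ord(ch):04X} is outside this sketch's repertoire")
        symbols.append(CATEGORY[ch])
        prev = ch
    return "".join(symbols)


print(categorize("\u09AC\u09BE\u0982\u09B2\u09BE"))  # বাংলা -> CMBCM
```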
7.1.1. Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋk ʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09AF U+09BC U+09BE), meaning ‘Bank of India’. Here two different words are joined together, the first of which ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. By and large, however, the form of the first word without an H (U+09CD) is considered enough to represent the sound intended for that word. This situation arises only because the hyphen, the space and the Zero Width Non-Joiner are absent from the permissible set of characters in the Root Zone repertoire; otherwise, V never needs to follow an H. Permitting it may create a perceptual similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.
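The prohibition can be checked mechanically: a label is rejected if any independent vowel (V) immediately follows Hasanta (H). A minimal sketch with a hypothetical function name; the vowel set is an assumption drawn from the repertoire ranges, with ঌ (U+098C) left out as in Section 5.2.

```python
HASANTA = "\u09CD"
# Independent vowels অ..ঋ, এ, ঐ, ও, ঔ (assumed repertoire).
VOWELS = {chr(cp) for cp in range(0x0985, 0x098C)} | {"\u098F", "\u0990", "\u0993", "\u0994"}


def v_follows_h(label):
    """True if some independent vowel comes right after a Hasanta."""
    return any(a == HASANTA and b in VOWELS for a, b in zip(label, label[1:]))


# ব্যাঙ্কঅফ্ইন্ডিয়া, with and without the Hasanta after ফ
with_h = ("\u09AC\u09CD\u09AF\u09BE\u0999\u09CD\u0995\u0985\u09AB\u09CD"
          "\u0987\u09A8\u09CD\u09A1\u09BF\u09AF\u09BC\u09BE")
without_h = with_h.replace("\u09AB\u09CD\u0987", "\u09AB\u0987")
print(v_follows_h(with_h), v_follows_h(without_h))  # True False
```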
7.2. Additional Examples from Bangla ABNF
Below are a few examples which help one understand some of the rules the ABNF puts in place. They are given for reference only and are not meant to be comprehensive.
ঃক ःक As can be seen, such a combination automatically results in a “golu” or dotted circle, marking it as an invalid formation. This is an intrinsic property of the
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink, and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What Has Happened So Far in Terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0 – Core Specification. Chapter 12, p. 473.
10. Appendix-I
10.1. Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, some of the formalism's structures do not necessarily apply to a given language. Restriction rules are set in place to take care of such cases, and they help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:
1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and to the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, such as the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across the possibility of Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).
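Rule 1 translates into a check on the first code point of a label. A minimal sketch with a hypothetical function name: it rejects ৎ, ঞ and ঙ in initial position as the rule states, and deliberately does not reject য়, whose initial use is attested above. Note that under the rule as stated, a transliteration such as ৎসেরিং would also be rejected.

```python
KHANDA_TA = "\u09CE"  # ৎ
NYA = "\u099E"        # ঞ
NGA = "\u0999"        # ঙ
FORBIDDEN_INITIALS = {KHANDA_TA, NYA, NGA}


def bad_initial(label):
    """True if the label starts with a character barred from initial position."""
    return bool(label) and label[0] in FORBIDDEN_INITIALS


print(bad_initial("\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"))  # ৎসেরিং: True
print(bad_initial("\u09AF\u09BC\u09BE"))                    # য়া: False
```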
10.2. ‘Sylheti Nāgarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924: Kthi) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, which was widely in use during the 1880s. There were several other names of Sylheti
13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri have not been covered in this section.
Nāgarī or Siloti (129), such as ‘Jalalabada Nāgarī’, ‘Fula (flower) Nāgarī’, ‘Muslim Nāgarī’ or ‘Muhammad Nāgarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here (138).
Table 17 – The Script Table of Sylheti Nāgarī or Siloti
10.3. Confusable Code Points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.
ক (k): ক্ক (ক্+ক), ক্ট (ক্+ট), ক্ত (ক্+ত), ক্ত্র (ক্+ত্+র), ক্ত্ব (ক্+ত্+ব), ক্ন (ক্+ন), ক্ব (ক্+ব), ক্ম (ক্+ম), ক্য (ক্+য), ক্র (ক্+র), ক্ষ (ক্+ষ), ক্ষ্ণ (ক্+ষ্+ণ), ক্ষ্ম (ক্+ষ্+ম), ক্ষ্ব (ক্+ষ্+ব), ক্ষ্য (ক্+ষ্+য), ক্স (ক্+স), ঙ্ক (ঙ্+ক), (…+ক্+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ (g): গ্গ (গ্+গ), গ্দ (গ্+দ), গ্ধ (গ্+ধ), গ্ন (গ্+ন), গ্ব (গ্+ব), গ্ম (গ্+ম), গ্য (গ্+য), গ্র (গ্+র), গ্ল (গ্+ল), ঙ্গ (ঙ্+গ), (…+ঙ্+গ)
ণ (ṇ): ণ্ট (ণ্+ট), ণ্ঠ (ণ্+ঠ), ণ্ড (ণ্+ড), ণ্ড্র (ণ্+ড্+র), ণ্ঢ (ণ্+ঢ), ণ্ণ (ণ্+ণ), ণ্য (ণ্+য), ণ্ব (ণ্+ব), ক্ষ্ণ (ক্+ষ্+ণ), ষ্ণ (ষ্+ণ), (…+ণ)
ভ (bh) + র (r) (ভ্রূ bhrū ‘eyebrow’), শ (ś) + র (r) (শ্রূ śrū), ঋ (ṛ) after হ (h) (হৃ hṛ), etc.
There have been many notable contributions to simplifying and modifying Bangla spellings and combinatory techniques, especially by scholars such as Pabitra Sarkar (1992) [134]. These attempt to reduce the number of allographs of both vowels and consonants in clusters, and the reduced forms have been widely accepted in the printing of school texts in both Bangladesh and West Bengal [151, 152]. As of now, two systems, the old (traditional) and the new, coexist, each operative in different domains.
However, in preparing this LGR document, the aim has been to consider the widely used and usable sequences and combinations, and their variations across the sister scripts belonging to the basket of Brāhmī writing systems. Bangla Academy, Dhaka published Standard Bangla Spelling Rules in 1992, following the recommendations of a committee constituted through a workshop jointly organized by the Jātīya Śikṣākrama and Pathyapustaka Board in 1988. A thoroughly revised edition of the Rules was published in September 2012⁶.
After the establishment of the Bangla Akademi of West Bengal in 1986, its first President, Annadasankar Ray (1904-2002), in his inaugural address gave a direction for the standardization of the Bangla alphabet, script and spelling system, and clearly argued that they would not blindly follow the Sanskritic model of conventional grammar. A broad list of proposals was sent to experts on Bangla, and a broad agreement on the ‘homogenization of Bangla spelling’ was reached by 1988. Based on opinions received from different quarters, a unanimous list of ‘rules’ was agreed upon. This was published as a ‘Spelling Dictionary’ titled Ākādemi Bānāna Abhidhāna (1997), which was obviously more comprehensive than ‘The University of Calcutta proposals’ made in 1936. Along with the ‘rationalization’ of spellings, another step was taken to make the writing system easier to read by making the symbols used, both single and combined ones, more ‘transparent’. These reforms were originally suggested by Sarkar (1987, first published in 1978) [134] [153], where he used the terms Swaccha (‘Transparent’) and Aswaccha (‘Opaque’ or non-transparent), even adding Ardha Swaccha (‘half transparent’) in between the two. Some sample examples are:
Transparent: ন্ন (nn), প্ত (pt), স্ত (st), where both members of the cluster can be recognized.
6 Bangla Academy. 2012. Bāṅlā Ekaḍemī Pramita Bāṅlā Bānānera Niyama (Bangla Academy Standard Bangla Spelling Rules). Dhaka: Bangla Academy.
There were, in fact, two types of proposals. One concerned the shapes of the letters: those of consonant + vowel (CV) combinations and of conjuncts, which are consonant + consonant combinations. There were further complex shapes, i.e. those of consonant + consonant + (consonant +) vowel (CC(C)V) signs, as in প্রু (pru) or স্ক্রু (skru). Some decisions in this area were necessary because a few of the CC(C) symbols represented complexities that made them difficult for children to learn. The other type dealt with the spellings of words only, without any reference to the shapes of the letters in which they were written. The basic objective here was ‘one word, one spelling’ to the greatest extent possible [151].
Below we give a statement of the most salient changes that affect consonant + vowel combinations [153]:
a. The variants of the short u (ু, উ-কার, hrasva u-kāra) vowel sign have been brought down to one, i.e. ু. So gu is now written গ + ু; similarly ru > র + ু, śu > শ + ু and hu > হ + ু, and therefore, for a cluster + short u sign, ntu > ন + ্ + ত + ু and stu > স + ্ + ত + ু.
b. The variants of long u (দীর্ঘ ঊ-কার, dīrgha u-kāra) have also been reduced: rū > র + ূ, bhrū > ভ (bh) + ্ + র (r) + ূ (ū), drū > দ (d) + ্ + র (r) + ূ (ū), śrū > শ (ś) + ্ + র (r) + ূ (ū).
c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been brought down to one: hṛ > হ + ৃ.
Regarding consonant + consonant + (consonant) … + (vowel) clusters, the Paschimbanga Bangla Akademi proposed transparent or semi-transparent shapes for clusters to the extent admissible in the Bangla writing system. Some examples will clarify the proposal (a slash means that the traditional cluster shape precedes it, while the Bangla Akademi innovation follows) [153].
3.3.1. The Consonants
As per traditional classification, Bangla consonants are categorized according to their phonetic properties, especially in terms of place and manner of articulation [107]. There are five ‘Varga’ (pronounced ‘Barga’ in Bangla) groups (sets or classes), distinguished by place of articulation, and one non-‘varga’ group [105]. Each Varga, which corresponds to the stops at a certain place of articulation, contains a series of five consonants classified by their phonetic qualities (i.e. manner of articulation), beginning from unvoiced and unaspirated, through voiced and aspirated (in the fourth column), and finally ending with a homorganic or corresponding nasal [107]. Consider the following table:
3.3.2. The Implicit Vowel Killer Hasanta (called ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts)
As stated earlier, all consonants are pronounced in isolation with an implicit vowel (central-back ɔ in Bangla, as the neutral vowel) assumed to be associated with them [121]. The terms ‘Hasanta’ (= ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts) and ‘Virāma’⁷ (= ‘Dãṛi’ in Bangla), the latter preferred in Unicode (cf. Unicode 3.0 and above), have been used in this report to denote the character that marks the absence of this inherent vowel. It may be noted that the term virama has been adopted in Unicode in a sense that differs from the traditional grammatical definition, and hence it requires some explanation here. Considering the importance of the document, this note should be a part of this LGR document, so that anybody referring to it can know the proper grammatical explanation of the term. Because a special sign is needed whenever this implicit vowel is stripped off, the symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster, one can, in common parlance, “kill” its vowel and create conjuncts. In this manner, conjunct characters can generally be written by joining two to four consonants; in rare cases, this process can join up to five consonants. However, the notion of a maximum number of consonants joining to form one akṣara⁸ is to be bounded empirically. This is an observation based on the CIIL-EMILLE Corpora of Bangla words [132 & 133] as seen in print these days. Given the mixture of scripts and languages on the web, the possibility that someone may want a generic Top Level Domain [gTLD] with more than the observed maximum cannot be ruled out. This can be the case when a foreign-language word which admits a large number of consonants is transliterated into Bangla. Hence, in the Bangla LGR work, this limit will not be enforced.
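Since no limit is enforced, cluster length is only something to measure, for example when comparing candidate labels against the corpus observation above. The sketch below counts the consonants in the longest C(HC)* run of a label; the consonant set and the function name are assumptions for illustration, and the LGR itself contains no such rule.

```python
HASANTA = "\u09CD"
# Assumed consonant set: ক..ন, প..র, ল, শ..হ, plus Assamese ৰ and ৱ.
CONSONANTS = ({chr(cp) for cp in range(0x0995, 0x09A9)}
              | {chr(cp) for cp in range(0x09AA, 0x09B1)}
              | {"\u09B2", "\u09B6", "\u09B7", "\u09B8", "\u09B9", "\u09F0", "\u09F1"})


def longest_cluster(label):
    """Number of consonants in the longest C(HC)* conjunct run of the label."""
    best = run = 0
    i = 0
    while i < len(label):
        if label[i] in CONSONANTS:
            run += 1
            best = max(best, run)
            # The run continues only if Hasanta + another consonant follows.
            if (i + 2 < len(label) and label[i + 1] == HASANTA
                    and label[i + 2] in CONSONANTS):
                i += 2
                continue
        run = 0
        i += 1
    return best


print(longest_cluster("\u09B8\u09CD\u09A4\u09CD\u09B0"))  # স্ত্র: 3
```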
3.3.3. Vowels
Separate symbols exist for all ‘Swara’ or vowels in Bangla, which are pronounced independently, either at the beginning of a word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign called ‘kār’ in Bangla, or Mātrā in Nāgarī⁹, is attached to the consonant. Since the consonant has this built-in neutral vowel at the end, there are equivalent kāras (Mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:
7 Virāma as used here is also a misnomer according to the Indian grammatical traditions. Nowhere is the mere absence of a vowel marked as virāma; Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
8 This term needs to be disambiguated: akṣara also means ‘syllable’ in the Indian grammatical traditions.
9 Although the term ‘Mātrā’ in Bangla stands for an altogether different concept, viz. the top bar placed over a letter, typically found in Hindi and Bangla but missing in Gujarati.
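The vowel/kāra correlation of Section 3.3.3 can be cross-checked against Unicode character names, since each Bengali independent vowel and its kāra share a name suffix (BENGALI LETTER AA and BENGALI VOWEL SIGN AA, for instance). A minimal sketch with Python's standard `unicodedata` module; relying on this naming pattern is an assumption that holds for the vowels listed here, and the function name is hypothetical.

```python
import unicodedata


def kara_for(vowel):
    """Return the kāra of an independent Bengali vowel, or None if Unicode has none."""
    name = unicodedata.name(vowel)  # e.g. 'BENGALI LETTER AA'
    if not name.startswith("BENGALI LETTER "):
        raise ValueError(f"not a Bengali letter: {name}")
    try:
        return unicodedata.lookup(name.replace("BENGALI LETTER", "BENGALI VOWEL SIGN", 1))
    except KeyError:
        return None  # e.g. অ: the inherent vowel has no kāra


VOWELS = "\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994"
for v in VOWELS:
    kara = kara_for(v)
    print(v, "(none)" if kara is None else f"U+{ord(kara):04X}")
```

Only অ prints "(none)", matching the statement that every vowel except অ has a kāra.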
3.3.4. The Anusvāra onuʃʃār (ং - U+0982)
The Anusvāra, or onuʃʃār in Bangla, at times represents a homorganic nasal, but not always. It replaces a conjunct group of ‘Nasal Consonant + Hasanta + Consonant’ where the second consonant belongs to the velar varga or set, as in লংকা. But it often also appears in such combinations when a non-velar is the last member, as in ল্যাংটা ‘naked’ or ল্যাংচা ‘a kind of sweet; to limp’. Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined writing symbol representing the corresponding nasal consonant of the particular set. Although Modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal, in Bangla it is clearly demarcated where one must use the Anusvāra and where it has to be a conjunct cluster with a nasal as the first or the second component.
3.3.5. Nasalization: Candrabindu (ঁ - U+0981)
Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cãd ‘moon’ (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].
3.3.6. Nukta (় - U+09BC)
The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi). The term and the concept of nukta are borrowed in Bangla. The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, the Bangla letters য় YYA, ড় RRA and ঢ় RHA have to be represented in this LGR by the sequences YA + Nukta (U+09AF + U+09BC), DDA + Nukta (U+09A1 + U+09BC) and DDHA + Nukta (U+09A2 + U+09BC), instead of the single code points YYA (U+09DF), RRA (U+09DC) and RHA (U+09DD), although the use of ‘Nukta’ is otherwise completely unnatural in Bangla.
It is noted that in the current Unicode Standard chart these characters are listed as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA Protocol, through a set of procedures developed by the IETF. Even though the Unicode Standard also provides these three characters as atomic characters (09DC for ড় [ṛ], 09DD for ঢ় [ṛh] and 09DF for য় [y], each typed with a single keystroke), the IDNA protocol requires that we treat them as sequences and encode them with the code points of their components in the Unicode Bengali block.
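The effect of the composition exclusions can be observed directly with Python's standard `unicodedata` module: normalizing each atomic letter to NFC yields the base letter followed by Nukta, and NFC never recomposes them. A short sketch; the dictionary name is illustrative.

```python
import unicodedata

ATOMIC_LETTERS = {
    "\u09DC": "BENGALI LETTER RRA",  # ড়
    "\u09DD": "BENGALI LETTER RHA",  # ঢ়
    "\u09DF": "BENGALI LETTER YYA",  # য়
}

for ch, name in ATOMIC_LETTERS.items():
    assert unicodedata.name(ch) == name
    nfc = unicodedata.normalize("NFC", ch)
    print(name, "->", " + ".join(f"U+{ord(c):04X}" for c in nfc))
# BENGALI LETTER RRA -> U+09A1 + U+09BC
# BENGALI LETTER RHA -> U+09A2 + U+09BC
# BENGALI LETTER YYA -> U+09AF + U+09BC
```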
It may be noted that there could be sporadic attempts to write Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph), only for the sake of correct pronunciation and to maintain the sanctity of the loanword. Such usage amounts to making the Bangla writing system work like the IPA script. It is, however, not used in printed Bangla.
3.3.7. Visarga biʃɔrgo (ঃ - U+0983) and Avagraha (ঽ - U+09BD)
The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. One could quote as an example দুঃখ duhkho ‘sorrow’, ‘unhappiness’ (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in Bangla. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো’পরাণি), and it is now rarely used, even in other languages using the Bangla script. The Avagraha is not part of the repertoire of this LGR: it has been decided not to retain it (ঽ, U+09BD), because it is blocked in TLDs as per the Maximal Starting Repertoire (MSR). Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.
type in the following manner: র + ্ + য, but for ra followed by ya-phalā the sequence would be র + ZWJ + ্ + য [154]. In other words, ZWJ is used in the rendering of words demanding ya-phalā after ra, which is otherwise not possible to type (render), because the same order of ra + hasanta + antastha ja in medial and/or final position produces a repha. Indeed, ra + hasanta + antastha ja is exactly what is typed to place a repha on the consonant antastha ja, as in কার্য (kaarjo). To get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার.
The use of ZWJ and ZWNJ has been ruled out of the root zone by the [Procedure]. In Bangla these two signs are used to create alternate renderings, and their insertion can affect searching as well as NLP. The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly prevented and the Hasanta joining the two consonants needs to be shown explicitly.
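In IDNA2008 the two joiners are CONTEXTJ code points (RFC 5892), and the root zone excludes them altogether, so a label-level check is simple. The sketch below (hypothetical function name) also shows the practical consequence described above: once the ZWJ is removed from র‍্যাপার, the remaining sequence is the one that renders with a repha.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER


def has_joiner_controls(label):
    """True if the label contains ZWNJ or ZWJ, which the root zone excludes."""
    return ZWNJ in label or ZWJ in label


# র + ZWJ + ্ + য + া + প + া + র: ra followed by ya-phalā
wrapper = "\u09B0" + ZWJ + "\u09CD\u09AF\u09BE\u09AA\u09BE\u09B0"
print(has_joiner_controls(wrapper))                   # True: not allowed in the root zone
print(has_joiner_controls(wrapper.replace(ZWJ, "")))  # False, but this sequence renders with a repha
```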
3.3.9. Use of Ya-phalā
The Ya-phalā sequences are the two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ - BENGALI LETTER A and U+098F এ - BENGALI LETTER E).
To render Ya-phalā after অ or এ, it is necessary to type U+09CD Hasanta plus U+09AF ya, preceded by the said vowel. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English acid অ্যাসিড, association অ্যাসোসিয়েশন, ‘bat’ ব্যাট, ‘fat’ ফ্যাট, ‘mat’ ম্যাট, ‘cap’ ক্যাপ, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry it. But, as we see here, Bangla ends up with a deviant feature in its orthography, in which Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
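Because the S sequences are the only place where Hasanta may follow a vowel, a checker can accept a Hasanta after অ or এ only when it opens a complete অ্যা or এ্যা. A minimal sketch under that reading of the S symbol in Section 7; the function name is hypothetical and other context rules are not modelled.

```python
A = "\u0985"  # অ
E = "\u098F"  # এ
HASANTA = "\u09CD"
YA = "\u09AF"
AA = "\u09BE"
S_SEQUENCES = {A + HASANTA + YA + AA, E + HASANTA + YA + AA}  # অ্যা, এ্যা


def hasanta_after_vowel_ok(label):
    """Every Hasanta directly after অ or এ must open a complete S sequence."""
    for i, ch in enumerate(label):
        if ch in (A, E) and label[i + 1:i + 2] == HASANTA:
            if label[i:i + 4] not in S_SEQUENCES:
                return False
    return True


print(hasanta_after_vowel_ok("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড: True
print(hasanta_after_vowel_ok("\u0985\u09CD\u09AF"))                          # অ্য: False
```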
Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka ‘the sun’); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra ‘cycle’). The point is that in both cases the slot for ra could hold either the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases. The LGR notes this concern about the two RAs in disguise: it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may consequently lead to spoofing and other kinds of cyber irregularities. The motive for classing these two code points as (blocking) variants is that fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of Hasanta. The difference between the RAs is distinguishable only by looking at their Unicode values. Therefore, labels such as অর্ক arka, শীর্ষ śīrṣa ‘top, apex’, অভ্র abhra ‘cloud, the sky’ and শ্রম śrama ‘physical labour’ could be extremely dangerous, as the web user may never verify the digital content (the labels) against their Unicode code points. This point is made explicitly with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47) that are to follow. Moreover, it
4. Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) has been formed by members with experience in Linguistics (especially NLP and Computational Linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR for each. However, an attempt is made to keep the fundamental philosophy behind building those LGRs consistent across all the Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about Bangla LGR code points.
4.1. Guiding Principles
The NBGP adopts the following broad principles for selecting code points for the code point repertoire, across the board for all the Neo-Brāhmī scripts within its ambit.
4.1.1. Inclusion Principles
4.1.1.1. Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription or archival purposes only will not be considered for inclusion in the code point repertoire.
4.1.1.2. Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.
4.2. Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards which are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.
4.2.1. External Limits of Scope
The code point repertoire for the root zone is a very special case at the top of the protocol hierarchy, so the canvas of characters available for selection as part of the Root Zone code point repertoire is already constrained by the protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart: Out of all the characters needed by the script in question, a character that is not encoded in Unicode cannot be incorporated in the code point repertoire. Such cases are quite rare, especially so in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol: Unicode, being the character-encoding standard for providing the maximum possible representation of a given script/language, has encoded as far as possible all the characters needed by the script. Domain names, however, are a specialized case and are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR): The Root Zone LGR is the repertoire of characters that will be used to create Root Zone TLDs, which in turn constitute an even more specialized case of domain names; the Root Zone LGR procedure therefore introduces additional exclusions on IDNA's allowed set of characters. Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), even though allowed by the IDNA protocol, is not permitted in the Root Zone Repertoire as per the MSR.
To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. The IDNA protocol further narrows this down, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
4.2.1.1. No Punctuation Marks
The TLDs being identifiers, punctuation marks present in Brāhmī-based scripts will not be included.
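The three successive filters of Section 4.2.1 compose naturally as set filters, each removing what the previous layer still allowed. The sketch below is a toy: its IDNA2008 and MSR predicates only know the example characters named in Sections 4.2.1 and 4.2.1.2 (U+09F9 and U+09FA are symbols and hence DISALLOWED in IDNA2008; U+09BD is PVALID but outside the MSR), not the full normative tables, and all names are hypothetical.

```python
BENGALI_BLOCK = {chr(cp) for cp in range(0x0980, 0x0A00)}
IDNA_DISALLOWED_EXAMPLES = {"\u09F9", "\u09FA"}  # ৹, ৺: symbols, DISALLOWED in IDNA2008
MSR_EXCLUDED_EXAMPLES = {"\u09BD"}               # ঽ: PVALID but not in the MSR


def successive_filters(candidates, filters):
    """Apply each (name, keep) filter in turn; report what each layer removes."""
    allowed = set(candidates)
    removed = {}
    for name, keep in filters:
        removed[name] = {ch for ch in allowed if not keep(ch)}
        allowed -= removed[name]
    return allowed, removed


FILTERS = [
    ("Unicode block", lambda ch: ch in BENGALI_BLOCK),
    ("IDNA2008", lambda ch: ch not in IDNA_DISALLOWED_EXAMPLES),
    ("MSR", lambda ch: ch not in MSR_EXCLUDED_EXAMPLES),
]

candidates = ["\u0995", "\u09BD", "\u09FA", "\u0915"]  # ক, ঽ, ৺, Devanagari क
allowed, removed = successive_filters(candidates, FILTERS)
print(sorted(allowed), {k: sorted(v) for k, v in removed.items()})
```

Each layer removes exactly one candidate here: क at the block filter, ৺ at IDNA2008 and ঽ at the MSR, leaving only ক.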
4.2.1.2. No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters like BENGALI ISSHAR ৺ (U+09FA), BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), etc. will also not be included.
4.2.1.3. No Rare and Obsolete Characters
Some characters have been added to Unicode to accommodate rare forms, such as the Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), as well as VOCALIC LL ৡ (U+09E1) and the allographic kāra forms of the latter two symbols: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, in compliance with the Conservatism principle laid down in the Root Zone LGR procedure. However, in Bangla the kāra corresponding to VOCALIC RR ৠ (U+09E0), i.e. VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.
4.2.1.4. No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle laid down in the Root Zone LGR procedure.
4.2.1.5. ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).
5. Repertoire
The Bangla writing system is represented in Unicode under the Bengali (Bangla) script name as enumerated in ISO 15924, and serves languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in Unicode has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in its 8th meeting of the Indian Language Technologies and Products Sectional Committee LITD 20, held on 23rd August 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 11 is valid for all three languages above.
For each of the code points, language references are given in the last column, titled Reference, of Table 8, titled the “Code Point Repertoire”. For complete coverage of Bangla code points, references from Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced any new complications except for one particular character. Though only a few representative languages under EGIDS scale 1-4 have been chosen for referencing, together they cover all the code points required by all the languages that the NBGP has considered, as given under the Bangla Unicode code points (as in Unicode 6.3). Before the details are presented, however, it is useful to look at the Bangla code point chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the reference glyph shapes in the code charts below are based on one of the many available fonts and are not prescriptive, because actual fonts, both Unicode-compatible and TrueType ones, may vary. Consider the following code point table:
Colour convention:
All characters that are included in the [MSR]: yellow background
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background
Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes specific sequences for the repertoire that conditionally include BENGALI LETTER A or E, followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, again followed by BENGALI VOWEL SIGN AA, to represent the æ sound as in English ‘bat’, ‘cat’, etc.
5.2. Code Point Repertoire Exclusion
Some characters of the Bangla script that are encoded in Unicode have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.
5.3. Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see Section 3.3.6) is excluded from the repertoire as a standalone code point, since it is never used alone. It is used only in sequences, to form the three special characters ড়, ঢ় and য় in normalized form.
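Since Nukta enters the repertoire only through these three sequences, a checker can reject any Nukta that does not directly follow DDA, DDHA or YA. A minimal sketch with a hypothetical function name:

```python
NUKTA = "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # ড, ঢ, য -> ড়, ঢ়, য়


def nukta_ok(label):
    """True if every Nukta in the label directly follows DDA, DDHA or YA."""
    return all(
        i > 0 and label[i - 1] in NUKTA_BASES
        for i, ch in enumerate(label)
        if ch == NUKTA
    )


print(nukta_ok("\u09AF\u09BC\u09BE"))  # য়া: True
print(nukta_ok("\u0995\u09BC"))        # ক + Nukta: False
```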
5.4.2. The Vowel Sequence
In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To help users of other Brāhmī scripts, Devanāgarī equivalents are provided wherever necessary. A vowel sequence is made up of a single vowel, which may optionally be followed by an Anusvāra onuʃʃār (B), a Candrabindu (D) or a Visarga biʃɔrgo (X). The number of D, B or X which can follow a V in Bangla need not be restricted to one in every combination, but going by the rules illustrated in this document, formations such as VDD, VBB and VXX are clearly invalid orthographic units. However, it is valid and possible to have sequences such as an anusvāra followed by a candrabindu on the one hand, and a visarga followed by a candrabindu on the other, as in হ্যাংচা ‘hænchā’ and হ্যাঃ ‘hæn’ respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu also exists in Bangla. A vowel can optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF BENGALI LETTER YA (য, ‘ya’), represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Hasanta), U+09AF য BENGALI LETTER YA>, and has the special form ্য. When further combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. ‘aa’ (ā)), it is used for transcribing [æ], as in the ‘a’ of the English word ‘bat’, written in Bangla as ব্যাট. A vowel sequence admits the following combinations:
5.4.2.1. A Single Vowel
Example: V: অ (अ)
5.4.2.2. A Vowel with Conditions
A vowel can optionally be followed by an Anusvāra [B], a Candrabindu [D], a Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by a Consonant [C] followed by a kāra (Mātrā) [M].
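Stated over the WLE symbols of Section 7, the combinations of Section 5.4.2.2 form a small regular language. The sketch below checks a symbol string (such as one produced by mapping a label's code points to C, V, M, B, D, X, H) against exactly the combinations quoted above; it is an illustration, not the normative ABNF of Section 5.4.1, and the names are hypothetical.

```python
import re

# 5.4.2.2: V, optionally followed by exactly one of B, D, X, DB, DX or H C M.
VOWEL_SEQUENCE = re.compile(r"V(?:DB|DX|B|D|X|HCM)?")


def is_vowel_sequence(symbols):
    """True if a WLE symbol string is a valid vowel sequence, e.g. 'VDB'."""
    return VOWEL_SEQUENCE.fullmatch(symbols) is not None


for s in ["V", "VB", "VDX", "VHCM", "VDD", "VBB", "VXX"]:
    print(s, is_vowel_sequence(s))
# True for V, VB, VDX and VHCM; False for VDD, VBB and VXX
```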
5.4.3.2. A Consonant with Conditions
A consonant can optionally be followed by a dependent vowel sign, kāra (Mātrā) [M], an Anusvāra [B], a Candrabindu [D], a Visarga [X], a Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
5.4.3.5. Subsets
While considering its subsets, we take the combination CHC as a representative example; however, the same applies equally to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.
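The consonant combinations of Sections 5.4.3.2 and 5.4.3.5 can be sketched the same way over WLE symbol strings. Two assumptions are worth flagging: the cluster repetition (HC)* is left unbounded because Section 3.3.2 says no cluster-length limit is enforced, and only the endings quoted in these two subsections are accepted (a bare trailing H only after a single C), so this is a subset of the full ABNF in Section 5.4.1.

```python
import re

# 5.4.3.2: C plus one optional M, B, D, X, H, DB or DX.
# 5.4.3.5: CHC, CHCHC, ... plus one optional M, B, D, X, DB or DX.
CONSONANT_SEQUENCE = re.compile(r"C(?:HC)*(?:DB|DX|M|B|D|X)?|CH")


def is_consonant_sequence(symbols):
    """True if a WLE symbol string matches the quoted consonant combinations."""
    return CONSONANT_SEQUENCE.fullmatch(symbols) is not None


for s in ["C", "CH", "CHCM", "CHCHCHCDX", "CHCH", "CDD"]:
    print(s, is_consonant_sequence(s))
# True for C, CH, CHCM and CHCHCHCDX; False for CHCH and CDD
```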
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’.
Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
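The KHANDA TA note can be read as a context check: whenever ৎ directly follows a Hasanta, the consonant before that Hasanta must be one of the two RAs. A minimal sketch under that reading (function name hypothetical); ৎ without a preceding Hasanta, as in উৎস, is left to the other rules.

```python
HASANTA = "\u09CD"
KHANDA_TA = "\u09CE"               # ৎ
RA_LETTERS = {"\u09B0", "\u09F0"}  # র (Bangla), ৰ (Assamese)


def khanda_ta_context_ok(label):
    """A Hasanta directly before ৎ must itself follow an RA (র্ৎ or ৰ্ৎ)."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i > 0 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] not in RA_LETTERS:
                return False
    return True


print(khanda_ta_context_ok("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা: True
print(khanda_ta_context_ok("\u0995\u09CD\u09CE"))                          # ক্ৎ: False
```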
5.4.5. Special Cases: S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) are briefly described here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full
10 Refer to Rule P in Section 7, Table 16.
vowel (U+0985 অ - BENGALI LETTER A and U+098F এ - BENGALI LETTER E). To render Ya-phalā after অ or এ, it is necessary to type U+09CD plus U+09AF ya, preceded by the said vowel. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English ‘bat’, ‘fat’, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry it. But, as we see here, Bangla ends up with a deviant feature in its orthography, in which Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
The other case concerns the formation of repha and ra-phalā in this script, referred to as P in the table above. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta +
P rarr Ra-Hasanta(C2H)whereC2iseither09B0(র-BENGALILETTERRA)or 09F0 (ৰ - ASSAMESE LETTER RA
Now letuselaborateeach rulewithexamples from the scriptkeeping inmindthe Bangla Assamese and Manipuri communities Some combinations ofcharactersmayseemunrealisticorrareinusagebutthereisnoharminaddingsuch ligaturesbecause it ispossible tocreate thembyanyusereasilybutmaynotbeattestedcombinations
49
711 Case of V Preceded by H There could be cases involvingmulti-word domains where Vmay need to beallowedtofollowanHegব8াtঅuইিvয়া baeligŋkʌv ɪndiə (U+09ACU+09CDU+09AFU+09BEU+0999U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1U+09BFU+09DFU+09BE)(meaningBankofIndia)ThisisthecasewheretwodifferentwordsarejoinedtogetherfirstofwhichendswithanH(অu)andthesecondwordbeginswithaV(ইিvয়া)Somesectionsofthe linguistic community require the explicit presence of H for fullrepresentation of the sound intended However by and large the form of thefirstwordwithoutanH(U+09CD)isconsideredenoughforfullrepresentationofthesoundintendedforthefirstwordThis isauniquesituationnecessitatedbythelackofhyphenspaceortheZeroWidthNon-joinercharacterinthepermissiblesetofcharactersintheRootzonerepertoire Otherwise V is never required to be allowed to follow an HPermittingthismaycreateaperceptivesimilaritybetweentwolabels(withandwithout H) for majority of the linguistic communities hence this is explicitlyprohibitedbytheNBGPIn future if required depending on the prevailing requirements from thecommunitythefutureNBGPmayconsiderrevisitingthisrule
7.2 Additional Examples from Bangla ABNF
Below are a few examples which help one understand some of the rules the ABNF puts in place. These are given for reference purposes only and are not meant to be comprehensive.
ঃক (Devanāgarī ःक): As can be seen, such a combination will automatically result in a “golu” or a dotted circle, marking it as an invalid formation. This is an intrinsic property of the
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink, and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What Has Happened So Far in Terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0, Core Specification, Chapter 12, p. 473.
10 Appendix-I
10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the Formalism structures do not necessarily apply. To take care of such cases, restriction rules are set in place. These restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:
1. Khaṇḍa ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although we can cite a couple of native examples, such as the word য়্যাব্বড়ো (yæbbɔṛo) from the poem ‘Lichuchor’ written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across the possibility of Khaṇḍa ta (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).
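Rule 1 comes down to a check on the first code point of a label. The Python sketch below is illustrative only and is not the panel's LGR encoding. The names `FORBIDDEN_INITIAL`, `DEPENDENT_SIGNS` and `violates_initial_rule` are hypothetical. The set of dependent signs is an assumption, included to mirror the ঃক example of Section 7.2. The exceptions discussed above for য় and ৎস are deliberately not modelled.

```python
# Hypothetical sketch of Appendix rule 1 (not the official LGR encoding).
FORBIDDEN_INITIAL = {
    "\u09CE",  # ৎ BENGALI LETTER KHANDA TA
    "\u099E",  # ঞ BENGALI LETTER NYA
    "\u0999",  # ঙ BENGALI LETTER NGA
}

# Assumed set of dependent signs that can never start a label:
# candrabindu, anusvara, visarga, nukta, and U+09BE..U+09CD
# (vowel signs plus the virama/Hasanta; unassigned slots in that range are harmless).
DEPENDENT_SIGNS = {"\u0981", "\u0982", "\u0983", "\u09BC"} | {
    chr(cp) for cp in range(0x09BE, 0x09CE)
}


def violates_initial_rule(label: str) -> bool:
    """Return True if the label starts with a code point barred in first position."""
    if not label:
        return False
    first = label[0]
    return first in FORBIDDEN_INITIAL or first in DEPENDENT_SIGNS
```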
10.2 ‘Sylheti Nagarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924: Kthi), which was used by accountants (perhaps of the Kāyastha community) in eastern Uttar Pradesh and Bihar and was widely in use during the 1880s. There were several other names of Sylheti
¹³ This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri have not been covered in this section.
Nagarī or Siloti (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).
Table 17: The Script Table of Sylheti Nagarī or Siloti
10.3 Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.
[Table of confusable conjunct forms, grouped under the base letters ক (k), গ (g) and ণ (n); conjunct glyphs not reproduced]
There were in fact two types of proposals. One concerned the shapes of letters: those of consonant + vowel (CV) combinations and of conjuncts, which are consonant + consonant combinations. There were further complex shapes, i.e. those of consonant + consonant + (consonant +) vowel (CC(C)V) signs, as in প্রু (pru) or স্ক্রু (skru). Some decisions in this area were necessary because a few of the CC(C) symbols represented complexities that made learning them difficult for children. The other type dealt with the spellings of words only, without any reference to the shapes of the letters in which they were written. The basic objective here was ‘one word, one spelling’ to the greatest extent possible [151].
Below we place a statement of the most salient changes that affect consonant + vowel combinations [153].
a. The variants of the short u (ু, উ-কার, hrasva u-kāra) vowel sign have been brought down to one, i.e. the detached ু. So the ligated form of gu (গু) is now written as গ + ু; similarly ru (রু) > র + ু, śu (শু) > শ + ু and hu (হু) > হ + ু. Therefore, for cluster + short u sign, ntu (ন্তু) > (ন + ্ + ত + উ) and stu (স্তু) > (স + ্ + ত + উ).
b. The variants of long u (দীর্ঘ ঊ-কার, dīrgha u-kāra) have also been reduced: rū (রূ) > র + ূ; bhrū (ভ্রূ) > (ভ bh + ্ + র r + ঊ ū); drū (দ্রূ) > (দ d + ্ + র r + ঊ ū); śrū (শ্রূ) > (শ ś + ্ + র r + ঊ ū).
c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been brought down to one: hṛ (হৃ) > হ + ৃ.
Regarding consonant + consonant + (consonant)… + (vowel) clusters, the Paschimbanga Bangla Akademi proposed transparent or semi-transparent shapes for clusters, to the extent admissible in the Bangla writing system. Some examples will clarify the proposal (a slash means that the traditional cluster shape precedes it, while the Bangla Akademi innovation follows it) [153].
3.3.1 The Consonants
As per traditional classification, Bangla consonants are categorized according to their phonetic properties, especially in terms of place and manner of articulation [107]. There are five ‘Varga’ (pronounced ‘Barga’ in Bangla) or groups (sets or classes), distinguished by place of articulation, and one non-‘varga’ group [105]. Each Varga corresponds to the stops at a certain place of articulation. It contains a series of five consonants classified by their phonetic qualities (i.e. manner of articulation). The series begins with the unvoiced unaspirated consonant, proceeds to the voiced aspirated one (in the fourth column), and finally ends with a homorganic or corresponding nasal [107]. Consider the following table:
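The table itself is not reproduced in this copy. As a minimal sketch, separate from the panel's table, the five vargas happen to occupy contiguous runs of five in the Unicode Bengali block. The one wrinkle is that U+09A9 is unassigned, so the labial run starts at U+09AA. The names `VARGA_STARTS`, `VARGAS` and `varga_of` are hypothetical.

```python
# Start code point of each varga; each run of five follows the manner order
# described above: unvoiced unaspirated, unvoiced aspirated, voiced unaspirated,
# voiced aspirated, nasal.
VARGA_STARTS = {
    "velar": 0x0995,      # ক খ গ ঘ ঙ
    "palatal": 0x099A,    # চ ছ জ ঝ ঞ
    "retroflex": 0x099F,  # ট ঠ ড ঢ ণ
    "dental": 0x09A4,     # ত থ দ ধ ন
    "labial": 0x09AA,     # প ফ ব ভ ম (U+09A9 is unassigned)
}

VARGAS = {
    name: [chr(start + i) for i in range(5)]
    for name, start in VARGA_STARTS.items()
}


def varga_of(ch: str):
    """Return (varga name, column 0-4) for a varga consonant, else None."""
    for name, letters in VARGAS.items():
        if ch in letters:
            return name, letters.index(ch)
    return None
```

A lookup of this kind is what a rule such as Appendix rule 1 (no varga nasal ঙ at the start of a label) would build on.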
3.3.2 The Implicit Vowel Killer: Hasanta (called ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts)
As stated earlier, all consonants are pronounced in isolation with an implicit vowel assumed to be associated with them (the central-back ɔ in Bangla, as the neutral vowel) [121]. Two terms have been used in this report to denote the character that marks the absence of this inherent vowel. The first is ‘Hasanta’ (= ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts). The second is ‘Virāma’⁷ (= ‘Dari’ in Bangla), the term preferred in UNICODE (cf. Unicode 3.0 and above). It may be noted that the term virāma has been adopted in UNICODE in a sense that differs from its traditional grammatical definition, and hence it requires some explanation here. Considering the importance of the document, this note should be a part of this LGR document, so that anybody referring to it can learn the proper grammatical explanation of the term. Because a special sign is needed whenever this implicit vowel is stripped off, the symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster, one can, in common parlance, “kill” its vowel and create conjuncts. In this manner, conjunct characters can generally be written by joining two to four consonants; in rare cases, this process can join up to five consonants. However, the maximum number of consonants joining to form one akṣara⁸ can only be bounded empirically. This observation is based on the CIIL-EMILLE Corpora of Bangla words [132 & 133], as seen in print these days. Given the mixture of scripts and languages found on the web, one may want a generic Top Level Domain [gTLD] with more than the observed maximum, and this possibility cannot be ruled out. This can happen when a foreign-language word that admits a large number of consonants is transliterated into Bangla. Hence, in the Bangla LGR work, this limit will not be enforced.
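The mechanism described above can be sketched as follows: a conjunct is stored as consonants separated by Hasanta (U+09CD), and the ligature is left to the font. This is a minimal illustration; `join_conjunct` is a hypothetical helper. In keeping with the text, it imposes no maximum cluster length.

```python
HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA (Hasanta)


def join_conjunct(consonants: list[str]) -> str:
    """Join consonants into one conjunct by placing Hasanta between them.

    No upper bound on the number of consonants is imposed, mirroring the
    statement that the Bangla LGR does not enforce a maximum.
    """
    if not consonants:
        raise ValueError("at least one consonant is required")
    return HASANTA.join(consonants)
```

For example, `join_conjunct(["স", "ত", "র"])` yields the stored sequence স + ্ + ত + ্ + র, which a Bangla font displays as the single cluster স্ত্র.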
3.3.3 Vowels
Separate symbols exist for all ‘Swara’ or vowels in Bangla; these symbols are used when a vowel is pronounced independently, either at the beginning of a word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign is attached to the consonant; this sign is called ‘kār’ in Bangla or Mātrā in Nāgarī⁹. Since the consonant carries this built-in neutral vowel at the end, there are equivalent kāras (Mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:
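Since the correlation table is not reproduced here, the following sketch lists the standard Unicode pairing of independent vowels with their kāras for the common Bangla vowels. The dictionary and the `syllable` helper are hypothetical names for illustration. অ maps to nothing because its sound is the consonant's inherent vowel.

```python
# Independent vowel -> dependent vowel sign (kāra) in the Bengali block.
VOWEL_TO_KARA = {
    "\u0986": "\u09BE",  # আ -> া
    "\u0987": "\u09BF",  # ই -> ি
    "\u0988": "\u09C0",  # ঈ -> ী
    "\u0989": "\u09C1",  # উ -> ু
    "\u098A": "\u09C2",  # ঊ -> ূ
    "\u098B": "\u09C3",  # ঋ -> ৃ
    "\u098F": "\u09C7",  # এ -> ে
    "\u0990": "\u09C8",  # ঐ -> ৈ
    "\u0993": "\u09CB",  # ও -> ো
    "\u0994": "\u09CC",  # ঔ -> ৌ
}
A_INHERENT = "\u0985"  # অ has no kāra


def syllable(consonant: str, vowel: str) -> str:
    """Attach the kāra for `vowel` to `consonant`, in logical (stored) order."""
    if vowel == A_INHERENT:
        return consonant
    return consonant + VOWEL_TO_KARA[vowel]
```

Kāras such as ি and ে are displayed to the left of the consonant, but they are always stored after it. For this reason `syllable("ক", "ই")` returns ক + ি.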
⁷ Virāma as used here is also a misnomer according to the Indian grammatical traditions. Nowhere is the mere absence of a vowel marked as virāma. Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
⁸ This term needs to be disambiguated: akṣara also means ‘syllable’ in the Indian grammatical traditions.
⁹ Although the term ‘Mātrā’ in Bangla stands for an altogether different concept, viz. the top bar placed over a letter, which is typical of Hindi and Bangla but missing in Gujarati.
3.3.4 The Anusvāra onuʃʃār (ং, U+0982)
The Anusvāra or onuʃʃār in Bangla at times represents a homorganic nasal, but not always. It replaces a conjunct group of ‘Nasal Consonant + Hasanta + Consonant’ where the second consonant belongs to the velar varga or set, as in লংকা. But it also often appears in such combinations when the last member is a non-velar, as in ল্যাংটা “naked” or ল্যাংচা “a kind of sweet; to limp”. Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined writing symbol representing the corresponding nasal consonant of the particular set. Modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal. In Bangla, by contrast, it is clearly demarcated where one must use the Anusvāra and where one must use a conjunct cluster with a nasal as the first or the second component.
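The velar case above can be illustrated as a spelling rewrite, in which ঙ + Hasanta before a velar stop becomes ং (লঙ্কা → লংকা). This is only a sketch of the orthographic equivalence the paragraph describes. It is not an LGR rule, and it does not claim that the two spellings are variants in this LGR. `prefer_anusvara` is a hypothetical name. Non-velar cases such as ল্যাংটা are left alone, because they are not derived from a ঙ cluster.

```python
import re

# ঙ + Hasanta immediately before a velar stop (ক খ গ ঘ), e.g. the ঙ্ক in লঙ্কা.
VELAR_NASAL_CLUSTER = re.compile("\u0999\u09CD(?=[\u0995-\u0998])")


def prefer_anusvara(word: str) -> str:
    """Rewrite ঙ + ্ before a velar stop as ং (anusvāra)."""
    return VELAR_NASAL_CLUSTER.sub("\u0982", word)
```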
3.3.5 Nasalization: Candrabindu (ঁ, U+0981)
Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cãd ‘moon’ (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].
3.3.6 Nukta (়, U+09BC)
The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi). The term and the concept of nukta are borrowed in Bangla. The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, this LGR has to represent the Bangla letters য় YYA, ড় RRA and ঢ় RHA by the sequences YA + Nukta (U+09AF + U+09BC), DDA + Nukta (U+09A1 + U+09BC) and DDHA + Nukta (U+09A2 + U+09BC). The single code points YYA (U+09DF), RRA (U+09DC) and RHA (U+09DD) cannot be used, even though the use of ‘Nukta’ is otherwise completely unnatural in Bangla.
It is noted that the current Unicode Standard chart lists these characters as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA Protocol, through a set of procedures developed by the IETF. The Unicode Standard does provide these three characters as atomic characters, for example 09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y], each entered as a single keystroke. Even so, the IDNA protocol requires that we treat them as base consonant + nukta sequences of existing code points in the Unicode Bengali block.
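The effect of the composition exclusions can be checked with Python's standard `unicodedata` module. NFC decomposes each precomposed letter into base consonant + NUKTA and does not recompose it. The helper name `nfc_code_points` is hypothetical.

```python
import unicodedata


def nfc_code_points(text: str) -> list[str]:
    """Return the code points of `text` after NFC normalization, as U+XXXX."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", text)]


# The three precomposed letters are composition exclusions, so NFC
# decomposes them into base consonant + NUKTA and never recomposes them.
for name, ch in [("RRA", "\u09DC"), ("RHA", "\u09DD"), ("YYA", "\u09DF")]:
    print(name, nfc_code_points(ch))
# RRA ['U+09A1', 'U+09BC']
# RHA ['U+09A2', 'U+09BC']
# YYA ['U+09AF', 'U+09BC']
```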
It may be noted that there have been sporadic attempts to write Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph). This was done only for the sake of correct pronunciation and to maintain the sanctity of the loanword, in effect using the Bangla writing system like the IPA script. The practice is, however, not in use in printed Bangla writing.
3.3.7 Visarga biʃɔrgo (ঃ, U+0983) and Avagraha (ঽ, U+09BD)
The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. One could quote as an example দুঃখ duhkho “sorrow”, “unhappiness” (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in the Bangla script. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো’পরাণি). It is now rarely used, even in other languages written in the Bangla script. In the case of the LGR, the Avagraha is not part of the repertoire: it has been decided not to retain Avagraha ঽ (U+09BD) because it is blocked in TLDs as per the Maximal Starting Repertoire (MSR). Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.
type in the following manner: র + ্ + য; but for র‍্য the sequence would be র + ZWJ + ্ + য [154]. In other words, ZWJ is used in the rendering of words that demand a ya-phalā after ra. Without it, such a ya-phalā cannot be typed (rendered) in medial and/or final position, because the plain sequence ra + hasanta + antastha ja has the same order. Interestingly, ra + hasanta + antastha ja is used to type a repha on the consonant antastha ja, as in কার্য (kaarjo). To get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার.
The use of ZWJ/ZWNJ has been ruled out from the root zone by the [Procedure]. In Bangla these two signs are used to create alternate renderings, and inserting them can affect searching as well as NLP. The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly suppressed, so that the Hasanta joining the two consonants is shown explicitly.
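The two ra + ya encodings described above differ only by an invisible ZWJ. The sketch below builds both and flags labels that contain a joiner. The names are hypothetical. The check simply restates the exclusion in the text: under it, the ya-phalā-after-ra form (র‍্য) cannot appear in a root zone label.

```python
ZWJ = "\u200D"   # ZERO WIDTH JOINER
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
RA, HASANTA, YA = "\u09B0", "\u09CD", "\u09AF"

repha_on_ya = RA + HASANTA + YA           # র্য, as in কার্য
ya_phala_on_ra = RA + ZWJ + HASANTA + YA  # র‍্য, as in র‍্যাপার


def has_joiner(label: str) -> bool:
    """True if the label contains ZWJ or ZWNJ, which the root zone excludes."""
    return ZWJ in label or ZWNJ in label
```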
3.3.9 Use of Ya-phalā
Ya-phalā sequences are the two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E).
To render Ya-phalā after অ and এ, it is necessary to type U+09CD Hasanta plus U+09AF ya, preceded by the said vowels. This is a purely ligatural entity: the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English ‘acid’ অ্যাসিড, ‘association’ অ্যাসোসিয়েশন, ‘bat’ ব্যাট, ‘fat’ ফ্যাট, ‘mat’ ম্যাট, ‘cap’ ক্যাপ, etc. The Brāhmī script by nature does not have a Hasanta after a vowel. Hasanta is generally described as the ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant. Only consonants can carry the Hasanta. But, as we see here, Bangla orthography ends up with a deviant feature, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
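The vowel-initial ligature described above is a fixed four-code-point sequence. The sketch below builds it and refuses any vowel other than অ or এ, which follows the V1 restriction stated for symbol S in Section 7. `ae_sequence` is a hypothetical helper.

```python
HASANTA, YA, AA_SIGN = "\u09CD", "\u09AF", "\u09BE"
V1 = ("\u0985", "\u098F")  # অ, এ: the only vowels allowed before Hasanta here


def ae_sequence(vowel: str) -> str:
    """Build the ligature sequence V1 + ্ + য + া (অ্যা or এ্যা)."""
    if vowel not in V1:
        raise ValueError("only অ (U+0985) and এ (U+098F) take this sequence")
    return vowel + HASANTA + YA + AA_SIGN
```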
Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka “the sun”), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra “cycle”). The point is that in both cases the slot for ra could hold either the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), while the shapes of repha and ra-phalā remain the same in both cases. The LGR makes a note of this concern about the two RAs in disguise: in a label so generated, it would be completely impossible to distinguish between them with the naked eye. This may consequently lead to spoofing and other kinds of cyber irregularities. The motive for classing these two code points as (blocking) variants is that fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of a following Hasanta. The two RAs can only be told apart by looking at their Unicode values. Therefore labels such as অর্ক arka, শীর্ষ śīrṣa ‘top, apex’, অভ্র abhra ‘cloud, the sky’ and শ্রম śrama ‘physical labour’ could be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode code points. This point is made explicit with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47), which follow. Moreover, it
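Variant Set 4 can be pictured as generating every label obtained by swapping র and ৰ wherever a RA is followed by Hasanta. This is a minimal sketch, assuming the "following Hasanta" context of Rule P (C2 H) only. Ra-phalā, where the Hasanta precedes the RA, is not covered, and RA in any other position is left unchanged. The function name is hypothetical, and the real LGR expresses variants in XML rather than code.

```python
from itertools import product

RA_BN, RA_AS, HASANTA = "\u09B0", "\u09F0", "\u09CD"


def ra_variants(label: str) -> set[str]:
    """All labels obtained by swapping র/ৰ wherever a RA is followed by Hasanta."""
    slots = [
        i for i, ch in enumerate(label[:-1])
        if ch in (RA_BN, RA_AS) and label[i + 1] == HASANTA
    ]
    results = set()
    # One choice of RA per slot; with no slots this yields just the label itself.
    for choice in product((RA_BN, RA_AS), repeat=len(slots)):
        chars = list(label)
        for i, ra in zip(slots, choice):
            chars[i] = ra
        results.add("".join(chars))
    return results
```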
4 Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) has been formed by members with experience in Linguistics (especially NLP and Computational Linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR for each. However, an attempt is made to keep the fundamental philosophy behind these LGRs consistent across all Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about Bangla LGR code points.
4.1 Guiding Principles
The NBGP adopts the following broad principles for the selection of code points in the code point repertoire, across the board, for all the Neo-Brāhmī scripts within its ambit.
4.1.1 Inclusion Principles
4.1.1.1 Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode only for transcription or archival purposes will not be considered for inclusion in the code point repertoire.
4.1.1.2 Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.
4.2 Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards which are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations but have been spelt out separately for the sake of clarity.
4.2.1 External Limits of Scope
The code point repertoire for the root zone is a very special case, sitting at the top of the protocol hierarchy. The canvas of characters available for the Root Zone code point repertoire is therefore already constrained by the various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart
If a character needed by the script in question is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol
Unicode, as the character-encoding standard, aims at the maximum possible representation of a given script/language, and has encoded, as far as possible, all the characters needed by the script. However, domain names are a specialized case, governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR is the repertoire of characters that will be used to create Root Zone TLDs, which are an even more specialized case of domain names. The Root Zone LGR procedure therefore introduces additional exclusions on IDNA's allowed set of characters. Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), even though it is allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the MSR.
To sum up, the restrictions start off by admitting only those characters that are part of the code block of the given script/language. The IDNA protocol narrows this set further, and finally the Maximal Starting Repertoire acts as an additional filter, restricting the character set associated with the given language even more.
4.2.1.1 No Punctuation Marks
Since TLDs are identifiers, the punctuation marks present in Brāhmī-based scripts will not be included.
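The three successive filters can be pictured as a pipeline of predicates over the Bengali block. This is only a sketch. Filter 2 is a rough stand-in based on Unicode general categories, not the real IDNA2008 PVALID derivation (RFC 5892). Filter 3 uses a hypothetical, deliberately partial exclusion list (only the Avagraha) and drops digits, which the colour convention below marks as ineligible for the root zone. All function names are hypothetical.

```python
import unicodedata


def in_bengali_block(cp: int) -> bool:
    """Filter 1: an assigned code point in the Bengali block (U+0980..U+09FF)."""
    return 0x0980 <= cp <= 0x09FF and unicodedata.category(chr(cp)) != "Cn"


def idna_like(cp: int) -> bool:
    """Filter 2 (approximation only, NOT the IDNA2008 derivation):
    keep letters, combining marks and decimal digits."""
    return unicodedata.category(chr(cp)) in {"Lo", "Mn", "Mc", "Nd"}


# Filter 3: hypothetical, partial MSR exclusion list for illustration.
MSR_EXCLUDED = {0x09BD}  # BENGALI SIGN AVAGRAHA


def msr_like(cp: int) -> bool:
    """Drop the listed exclusions and decimal digits (ineligible for the root zone)."""
    return cp not in MSR_EXCLUDED and unicodedata.category(chr(cp)) != "Nd"


def candidate_repertoire() -> list[int]:
    """Apply the three filters in order over the whole block."""
    return [
        cp for cp in range(0x0980, 0x0A00)
        if in_bengali_block(cp) and idna_like(cp) and msr_like(cp)
    ]
```

Because the filters are applied in order, each stage can only remove code points. This matches the narrowing described above.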
4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters, like BENGALI ISSHAR ৺ (U+09FA) and BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), will also not be included.
4.2.1.3 No Rare and Obsolete Characters
Some characters have been added to Unicode to accommodate rare forms. Examples are the Sanskritic VOCALIC RR ৠ (U+09E0), VOCALIC L ঌ (U+098C) and VOCALIC LL ৡ (U+09E1), together with the allographic -kāra forms of the latter two: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, in compliance with the Conservatism principle laid down in the Root Zone LGR procedure. However, one exception is made in Bangla for VOWEL SIGN VOCALIC RR ৄ (U+09C4), the -kāra corresponding to VOCALIC RR ৠ (U+09E0). It is still in active use in certain borrowed or Sanskritic words and is therefore retained.
4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.
4.2.1.5 ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).
5 Repertoire
The Bangla writing system is represented in UNICODE under the Bengali (Bangla) script name, as enumerated in ISO 15924. The script serves languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in UNICODE has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS's Indian Language Technologies and Products Sectional Committee, LITD 20, held its 8th meeting on 23rd Aug 2017. At that meeting it decided to refer to ISO the proposal for recognition of the Assamese script in ISO/IEC 10646. Until the UNICODE Consortium takes further action, the Code Point Repertoire under Table 11 will be assumed valid for all three languages above.
For each code point, language references are given in the last column, titled ‘Reference’, of Table 8, the “Code Point Repertoire”. To cover the Bangla code points fully, references are given for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya. Kokborok written in the Bangla script is not known to have introduced any new complications, except for one particular character. Only a few representative languages under EGIDS scale 1-4 have been chosen for referencing. Together, however, they cover all the code points required by all the languages that the NBGP has considered, as listed under Bangla Unicode Points (as given in UNICODE 6.3). Before the details are presented, however, it is useful to look at the Bangla Code Point Chart from the Maximal Starting Repertoire [MSR] Version 3. The reference glyph shapes in the code charts below are based on one of the many available fonts and are not prescriptive. Actual fonts, both UNICODE-compatible and TrueType ones, may show some variation. Consider the following code point table:
Colour convention:
All characters that are included in the [MSR]: yellow background
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background
Apart from the individual code points above, the Neo-Brāhmī Generation Panel also proposes some specific sequences for the repertoire. These sequences allow BENGALI LETTER A and E to be conditionally included when followed by BENGALI SIGN VIRAMA, BENGALI LETTER YA and then BENGALI VOWEL SIGN AA. Their purpose is to represent the æ sound, as in English ‘bat’, ‘cat’, etc.
5.2 Code Point Repertoire Exclusion
Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. The reason for excluding ঌ (U+098C) and ৗ (U+09D7) is that they are rare and obsolete characters.
5.3 Code point not used alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it will never be used alone. It will be used only in sequences for three special characters in normalized form: ড়, ঢ় and য়.
5.4.2 The Vowel Sequence
In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To make them easier to follow for users of other Brāhmī scripts, Devanāgarī equivalents are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra onuʃʃār (B), a Candrabindu (D) or a Visarga biʃɔrgo (X). In Bangla, a V need not be followed by only one D, B or X. Going by the rules illustrated in this document, however, formations such as VDD, VBB and VXX are clearly invalid orthographic units. On the other hand, it is valid to combine a candrabindu with an anusvāra, or a candrabindu with a visarga, as in হ্যাংচা ‘hæñchā’ and হ্যাঃ ‘hæñ’ respectively. In other words, a Visarga or Anusvāra (onuʃʃār) can follow a Candrabindu in Bangla.
A vowel can optionally be followed by a combination of Hasanta/Virama [H] + Consonant [C] to form a Ya-phalā. “Ya-phalā” is a presentation form of U+09AF BENGALI LETTER YA (য, ‘ya’), represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Hasanta), U+09AF য BENGALI LETTER YA>. Ya-phalā has a special form, ্য. When combined with U+09BE া BENGALI VOWEL SIGN AA (‘aa’, ā), it is used to transcribe [æ], as in the “a” of the English word “bat”, written in Bangla as ব্যাট. A vowel sequence admits the following combinations:
5.4.2.1 A Single Vowel
Example (V): অ, अ
5.4.2.2 A Vowel with Conditions
A vowel can optionally be followed by one of the following: Anusvāra [B]; Candrabindu [D]; Visarga [X]; Candrabindu + Anusvāra [DB]; Candrabindu + Visarga [DX]; or a combination of Hasanta (or Virama) [H], Consonant [C] and kāra (Mātrā) [M], in that order.
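The combinations in 5.4.2.1 and 5.4.2.2 can be written as one regular expression. This sketch rests on two assumptions. First, V covers only the common independent vowels. Second, the H C M option is limited to the অ/এ + ্ + য + া sequence given for symbol S in Section 7. The names are hypothetical.

```python
import re

V = "\u0985-\u098B\u098F\u0990\u0993\u0994"  # অ..ঋ, এ, ঐ, ও, ঔ (assumed class)
D, B, X = "\u0981", "\u0982", "\u0983"        # candrabindu, anusvara, visarga

# V optionally followed by one of DB, DX, B, D, X; or the ya-phalā + ā-kāra
# sequence, which (per symbol S in Section 7) only follows অ or এ.
VOWEL_SEQUENCE = re.compile(
    f"[{V}](?:{D}{B}|{D}{X}|[{B}{D}{X}])?"
    "|[\u0985\u098F]\u09CD\u09AF\u09BE"
)


def is_vowel_sequence(s: str) -> bool:
    """True if the whole string is one valid vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(s) is not None
```

Under this pattern, for instance, অঁং (V D B) is accepted while অংং (V B B) is rejected, in line with the VBB example above.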
5.4.3.2 A Consonant with Conditions
A consonant can optionally be followed by one of the following: a dependent vowel sign kāra (Mātrā) [M]; Anusvāra [B]; Candrabindu [D]; Visarga [X]; Hasanta (also known as Virama) [H]; Candrabindu + Anusvāra [DB]; or Candrabindu + Visarga [DX].
5.4.3.5 Subsets
When considering the subsets, we will take only the combination CHC as a representative example; however, the same is equally applicable to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.
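The consonant sequence and its CHC, CHCHC, … subsets can likewise be sketched as a regular expression. The sketch assumes the following. The consonant class covers the Bengali consonant letters plus ৰ, and omits ৎ (symbol Z) and ৱ. The nukta letters ড়, ঢ় and য় appear only as base + U+09BC, per 3.3.6. The number of H C repetitions is unbounded, since 3.3.2 states that no maximum is enforced. The names are hypothetical.

```python
import re

# Assumed consonant class: U+0995..U+09B9 minus unassigned slots, plus ৰ.
C = "\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0"
NUKTA_BASES = "\u09A1\u09A2\u09AF"            # ড ঢ য (+ U+09BC -> ড় ঢ় য়)
M = "\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC"   # dependent vowel signs (kāra)
H, D, B, X = "\u09CD", "\u0981", "\u0982", "\u0983"

CONS = f"(?:[{NUKTA_BASES}]\u09BC|[{C}])"

# C (H C)* optionally followed by one of DB, DX, M, B, D, X or a final H.
CONSONANT_SEQUENCE = re.compile(
    f"{CONS}(?:{H}{CONS})*(?:{D}{B}|{D}{X}|[{M}{B}{D}{X}]|{H})?"
)


def is_consonant_sequence(s: str) -> bool:
    """True if the whole string is one valid consonant sequence."""
    return CONSONANT_SEQUENCE.fullmatch(s) is not None
```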
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’.
Note: The condition on KHANDA TA in this context is that the C must be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
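The KHANDA TA condition above can be sketched as a context check. Whenever ৎ directly follows a Hasanta, the consonant before that Hasanta must be one of the two RAs. This is an illustration only. `khanda_ta_context_ok` is a hypothetical name, and the separate rule against ৎ at the start of a label is not included here.

```python
KHANDA_TA, HASANTA = "\u09CE", "\u09CD"
RAS = ("\u09B0", "\u09F0")  # র (Bangla), ৰ (Assamese)


def khanda_ta_context_ok(label: str) -> bool:
    """If ৎ follows a Hasanta, the consonant before that Hasanta must be a RA."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] not in RAS:
                return False
    return True
```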
5.4.5 Special Cases: S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) are briefly described here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full
¹⁰ Refer to Rule P in Section 7, Table 16.
vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). To render Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya, preceded by the said vowels. This is a purely ligatural entity: the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English ‘bat’, ‘fat’, etc. The Brāhmī script by nature does not have a Hasanta after a vowel. Hasanta is generally described as the ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant. Only consonants can carry the Hasanta. But, as we see here, Bangla orthography ends up with a deviant feature, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]). The other case, marked P in the table above, concerns the formation of repha and ra-phalā in the script. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta +
6 Variants
This section discusses the variants in the Bangla script. The NBGP divides these confusing variants into two groups:
Group 1: Confusing due to pure visual similarity
Group 2: Confusing due to deviation from character formations as normally perceived by the larger linguistic community
For Group 1, any identical code points are defined as variants. Cases that are confusable but not identical are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with such cases. However, cases belonging to Group 2 are proposed as variants. These cases involve more than mere visual similarity: they deviate from the widely accepted norms of Bangla akṣara formation. They can confuse even a careful observer and are therefore proposed as variants.
Variants arise in a script when the same visible form can be stored with different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot tell apart the two ko (কো), one typed with a single key and the other with two different keys. However, this will not be considered a case of variants, because a kāra followed by a kāra is not allowed.
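Apart from the kāra + kāra restriction, Unicode normalization already removes this particular ambiguity. The e-kāra + ā-kāra pair is the canonical decomposition of o-kāra (U+09CB), so NFC maps both typings to the same stored string. The short check below uses Python's standard `unicodedata` module; the variable names are illustrative.

```python
import unicodedata

KA, E_SIGN, AA_SIGN, O_SIGN = "\u0995", "\u09C7", "\u09BE", "\u09CB"

typed_two_keys = KA + E_SIGN + AA_SIGN  # ক + ে + া
typed_one_key = KA + O_SIGN             # ক + ো

# NFC composes ে + া into ো, and NFD splits ো back into ে + া,
# so both typings end up as the same normalized label.
assert unicodedata.normalize("NFC", typed_two_keys) == typed_one_key
assert unicodedata.normalize("NFD", typed_one_key) == typed_two_keys
```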
6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I
Among true variants in Bangla, we may draw attention to cases wherein থ (tha, U+09A5) with Hasanta appears in a conjunct with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
If written in traditional orthography, the above combinations could be a little confusing, since the থ (tha) in the conjunct looks like a হ (ha). The conjunct could occur in initial, medial or final position (as shown below in example no. 1). A user could also mistype it as হ (ha, U+09B9), raising the risks in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
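One way to picture CASE I is to fold হ into থ, but only after স or ন + Hasanta, and to treat two distinct labels as variants when their folded forms match. This folding approach is a hypothetical illustration. The actual LGR declares such variants in its XML, not in code, and the function names are assumptions.

```python
HASANTA, THA, HA = "\u09CD", "\u09A5", "\u09B9"
BASES = ("\u09B8", "\u09A8")  # স, ন


def case1_key(label: str) -> str:
    """Fold হ into থ wherever it follows স or ন plus Hasanta (CASE I context)."""
    chars = list(label)
    for i in range(2, len(chars)):
        if chars[i] == HA and chars[i - 1] == HASANTA and chars[i - 2] in BASES:
            chars[i] = THA
    return "".join(chars)


def are_case1_variants(a: str, b: str) -> bool:
    """True if two different labels collapse to the same CASE I key."""
    return a != b and case1_key(a) == case1_key(b)
```

With this key, স্থান and স্হান collapse together, while a plain হা and থা outside the conjunct context stay distinct.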
62 CrossScriptVariantsAcrispcrossscriptstudyforBanglahasbeendonewithrespecttosisterscriptssuchasDevanāgarī Gurmukhı and Odia11(formerly Oriya) keeping in mind the visual andtechnicalconfusionstheymaycauseaslabelsonthewebdomainMoreoverthereisnoin-script variant in Bangla as far as the orthography is concerned The followingcharacters are being proposed by the NBPG as variants Although there are certaincharacters which are somewhat similar they but have not been included here TheyhavebeenprovidedintheAppendix(102)forreference
7 WholeLabelEvaluationRules(WLE) ThissectionprovidestheWLEsthatarerequiredbyallthelanguagesmentionedinsection32whenwritten inBangla12ScriptThe ruleshavebeendrafted in suchawaythattheycanbeeasilytranslatedintotheLGRspecifications
11 Unicode uses Oriya for the script although Odia is now the official term used 12 As used by the Unicode denoting and including both Assamese and Maṇipuri
47
Below are the symbols used in the WLE rules for each of the Indic SyllabicCategoryasmentionedinthetableprovidedinCodepointrepertoire(Section51)
C rarr Consonant
M rarr Kāra(Mātrā)
V rarr Vowel
B rarr Anusvāra
D rarr Candrabindu
X rarr Visarga
H rarr Hasanta
Z rarr KhandaTa
S rarr S1S2(fromTable9)or(ae)Ya-phalā(V1HC1M1)whereV1iseither0985(অ-BENGALILETTERA)or098F(এ-BENGALILETTERE)His09CD(-BENGALISIGNVIRAMA)C1is-09AF(য-BENGALILETTERYA)M1is-09BE(া-BENGALIVOWELSIGNAA)S1 and S2 are valid even they are not allowed by the other context rules
P rarr Ra-Hasanta(C2H)whereC2iseither09B0(র-BENGALILETTERRA)or 09F0 (ৰ - ASSAMESE LETTER RA
Now letuselaborateeach rulewithexamples from the scriptkeeping inmindthe Bangla Assamese and Manipuri communities Some combinations ofcharactersmayseemunrealisticorrareinusagebutthereisnoharminaddingsuch ligaturesbecause it ispossible tocreate thembyanyusereasilybutmaynotbeattestedcombinations
49
711 Case of V Preceded by H There could be cases involvingmulti-word domains where Vmay need to beallowedtofollowanHegব8াtঅuইিvয়া baeligŋkʌv ɪndiə (U+09ACU+09CDU+09AFU+09BEU+0999U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1U+09BFU+09DFU+09BE)(meaningBankofIndia)ThisisthecasewheretwodifferentwordsarejoinedtogetherfirstofwhichendswithanH(অu)andthesecondwordbeginswithaV(ইিvয়া)Somesectionsofthe linguistic community require the explicit presence of H for fullrepresentation of the sound intended However by and large the form of thefirstwordwithoutanH(U+09CD)isconsideredenoughforfullrepresentationofthesoundintendedforthefirstwordThis isauniquesituationnecessitatedbythelackofhyphenspaceortheZeroWidthNon-joinercharacterinthepermissiblesetofcharactersintheRootzonerepertoire Otherwise V is never required to be allowed to follow an HPermittingthismaycreateaperceptivesimilaritybetweentwolabels(withandwithout H) for majority of the linguistic communities hence this is explicitlyprohibitedbytheNBGPIn future if required depending on the prevailing requirements from thecommunitythefutureNBGPmayconsiderrevisitingthisrule
72 AdditionalExamplesfromBanglaABNF
Belowarea fewexampleswhichhelponeunderstandsomeof the rulesABNFputsinplaceThesearejustgivenforreferencepurposesandarenotmeanttobecomprehensive
ঃক ःकAscanbeseensuchcombinationwillresultautomaticallyinaldquogolurdquooradottedcircle marking it as an invalid formation This is an intrinsic property of the
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
10.2 ‘Sylheti Nagarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, and was widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti [129], such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century [138], although some have suggested that it was an invention of the Afghan rulers of Sylhet [137]. Some ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here [138].

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.
Table 17 – The Script Table of Sylheti Nagarī or Siloti
10.3 Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.
[Table of confusable conjunct forms for ক (k), গ (g) and ণ (n)]
3.3.1 The Consonants
As per traditional classification, Bangla consonants are categorized according to their phonetic properties, especially in terms of place and manner of articulation [107]. There are five ‘vargas’ (pronounced ‘barga’ in Bangla), or groups (sets or classes), distinguished by place of articulation, and one non-‘varga’ group [105]. Each varga, which corresponds to stops at a certain place of articulation, contains a series of five consonants classified by their phonetic qualities (i.e. manner of articulation), beginning with unvoiced and unaspirated and going up to voiced and aspirated (in the fourth column), finally ending with a homorganic or corresponding nasal [107]. Consider the following table:
3.3.2 The Implicit Vowel Killer Hasanta (called ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts)
As stated earlier, all consonants are pronounced in isolation with an implicit vowel (central back ɔ in Bangla, as the neutral vowel) assumed to be associated with them [121]. The terms ‘Hasanta’ (= ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts) and ‘Virāma’⁷ (= ‘Dari’ in Bangla), the latter preferred in Unicode (cf. Unicode 3.0 and above), are used in this report to denote the character that marks the absence of this inherent vowel. It may be noted that the term virama has been adopted in Unicode in a sense that differs from its traditional grammatical definition, and hence it requires some explanation here. Considering the importance of the document, this note should be part of this LGR document so that anybody referring to it is able to know the proper grammatical explanation of the term. Because a special sign is needed whenever this implicit vowel is stripped off, the symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster, one can, in common parlance, “kill” its vowel and create conjuncts. In this manner conjunct characters can generally be written by joining two to four consonants. In rare cases this process can join up to five consonants. However, the notion of a maximum number of consonants joining to form one aksara⁸ has to be bounded empirically. This is an observation based on the CIIL-Emille Corpora of Bangla words [132 & 133] as seen in print these days. Given the mixture of scripts and languages on the web, the possibility that one may want a generic Top Level Domain [gTLD] with more than the observed maximum cannot be ruled out. This can be the case when a foreign-language word that admits a large number of consonants is transliterated into Bangla. Hence, in the Bangla LGR work, this limit will not be enforced.
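The conjunct mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the LGR: the helpers `make_conjunct` and `cluster_size` are hypothetical names, and, in keeping with the decision not to enforce a limit, no upper bound on cluster length is applied.

```python
HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA (Hasanta)


def make_conjunct(*consonants: str) -> str:
    """Hypothetical helper: join consonants with Hasanta, e.g. ka + ssa -> ক্ষ."""
    if not consonants:
        raise ValueError("at least one consonant is required")
    return HASANTA.join(consonants)


def cluster_size(cluster: str) -> int:
    """Hypothetical helper: count consonants in a bare cluster of the form C (H C)*."""
    return cluster.count(HASANTA) + 1


KA, SSA, SA, TA, RA = "\u0995", "\u09B7", "\u09B8", "\u09A4", "\u09B0"

ksa = make_conjunct(KA, SSA)       # ক্ষ: two consonants
stra = make_conjunct(SA, TA, RA)   # স্ত্র: three consonants

for cluster in (ksa, stra):
    print(cluster, [f"U+{ord(c):04X}" for c in cluster], cluster_size(cluster))
```

Because the count is derived only from the number of Hasanta signs, a transliterated foreign word with an unusually long cluster is handled the same way as a native two-consonant conjunct.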
3.3.3 Vowels
Separate symbols exist for all ‘swaras’ or vowels in Bangla, which are pronounced independently, either at the beginning of a word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign called ‘kār’ in Bangla (Mātrā in Nāgarī⁹) is attached to the consonant. Since the consonant has this built-in neutral vowel at the end, there are equivalent kāras (Mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:
7 Virāma as used here is also a misnomer according to the Indian grammatical traditions. Nowhere is the mere absence of a vowel marked as virama; Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
8 This term needs to be disambiguated: aksara also means ‘syllable’ in Indian grammatical traditions.
9 The term ‘Matra’ in Bangla stands for an altogether different concept, viz. the top bar placed over a letter, typically found in Hindi and Bangla but missing in Gujarati.
3.3.4 The Anusvāra onuʃʃār (ং - U+0982)
The Anusvāra, or onuʃʃār in Bangla, at times represents a homorganic nasal, but not always. It replaces a conjunct group of ‘Nasal Consonant + Hasanta + Consonant’ where the second consonant belongs to the velar varga (set), as in লংকা. But it often also appears in such combinations where a non-velar is the last member, as in ল্যাংটা “naked” or ল্যাংচা “a kind of sweet; to limp”. Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined writing symbol representing the corresponding nasal consonant of the particular set. Although Modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal, in Bangla it is clearly demarcated where one must use the Anusvāra and where it has to be a conjunct cluster with a nasal as the first or the second component.
3.3.5 Nasalization: Candrabindu (ঁ - U+0981)
Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cãd ‘moon’ (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].
3.3.6 Nukta (় - U+09BC)
The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi). The term and the concept of nukta are borrowed in Bangla. The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, the Bangla letters য় YYA, ড় RRA and ঢ় RRHA have to be represented in this LGR by the sequences YA + Nukta (U+09AF + U+09BC), DDA + Nukta (U+09A1 + U+09BC) and DDHA + Nukta (U+09A2 + U+09BC), instead of the single code points YYA (U+09DF), RRA (U+09DC) and RRHA (U+09DD), although the use of ‘Nukta’ is otherwise completely unnatural in Bangla.
It is noted that in the current Unicode Standard chart these characters are listed as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA Protocol through a set of procedures developed by the IETF. Even though the Unicode Standard also provides these three characters as atomic characters (for example 09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y], each entered as a single keystroke), the IDNA protocol requires that they be treated as sequences of a base consonant plus Nukta, both of which are encoded in the Unicode Bengali block.
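The effect of the composition exclusions can be checked with Python's standard `unicodedata` module. This minimal sketch only demonstrates NFC behaviour; `to_label_form` is a hypothetical helper name, not an LGR API.

```python
import unicodedata

# The three precomposed letters discussed above. All are Unicode
# composition exclusions, so NFC never produces them.
PRECOMPOSED = ["\u09DC", "\u09DD", "\u09DF"]


def to_label_form(text: str) -> str:
    """Hypothetical helper: normalize to NFC, as IDNA (RFC 5891) requires."""
    return unicodedata.normalize("NFC", text)


for ch in PRECOMPOSED:
    nfc = to_label_form(ch)
    print(unicodedata.name(ch), "->", " + ".join(f"U+{ord(c):04X}" for c in nfc))
```

Each precomposed letter comes out as its base consonant followed by U+09BC, and the two-code-point sequences are left unchanged by NFC, which is why the LGR can only list the sequences.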
It may be noted that there could be sporadic attempts to write Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph), solely for the sake of correct pronunciation and to maintain the sanctity of the loanword. Such uses amount to employing the Bangla writing system like the IPA. This practice is, however, not in use in printed Bangla.
3.3.7 Visarga biʃɔrgo (ঃ - U+0983) and Avagraha (ঽ - U+09BD)
The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. One could quote as an example দুঃখ duhkho “sorrow”, “unhappiness” (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in Bangla. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো’পরাণি). It is rarely used now, even in other languages using the Bangla script. The Avagraha is not part of the LGR repertoire: it has been decided not to retain Avagraha (ঽ) (U+09BD) because it is blocked in TLDs as per the Maximal Starting Repertoire (MSR). Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.
type in the following manner: র + ্ + য, but for র‍্য the sequence would be র + ZWJ + ্ + য [154]. In other words, ZWJ is used in rendering words that demand ya-phalā after ra, which is otherwise impossible to type (render), since the plain sequence ra + hasanta + antastha ja in medial and/or final position is already taken. Interestingly, ra + hasanta + antastha ja is used to type repha on the consonant antastha ja, as in কার্য (kaarjo). In order to get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার.
The use of ZWJ/ZWNJ has been ruled out from the root zone by the [Procedure]. Used in Bangla to create alternate renderings, the insertion of these two signs can affect searching as well as NLP. The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly prevented and the Hasanta between the two consonants participating in the conjunct needs to be explicitly shown.
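The consequence for root zone labels can be sketched as follows. This is an illustration under the assumption, stated above, that ZWJ and ZWNJ are excluded from the repertoire; `has_joiner` is a hypothetical helper.

```python
RA, HASANTA, YA = "\u09B0", "\u09CD", "\u09AF"
ZWJ, ZWNJ = "\u200D", "\u200C"

# ra + Hasanta + ya renders as repha over ya, as in কার্য
karjo = "\u0995\u09BE" + RA + HASANTA + YA
# ra + ZWJ + Hasanta + ya requests ya-phala after ra, as in র‍্যাপার
ra_yaphala = RA + ZWJ + HASANTA + YA

JOINERS = {ZWJ, ZWNJ}  # not part of the root zone repertoire


def has_joiner(label: str) -> bool:
    """Hypothetical helper: True if the label contains ZWJ or ZWNJ."""
    return any(ch in JOINERS for ch in label)


print(has_joiner(karjo), has_joiner(ra_yaphala))
# Dropping the joiner collapses the ya-phala request into the repha sequence.
print(ra_yaphala.replace(ZWJ, "") == RA + HASANTA + YA)
```

Since the only encoding that distinguishes the two renderings relies on ZWJ, a ya-phalā after ra cannot be expressed in a root zone label; without the joiner, the same code points yield the repha form.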
3.3.9 Use of Ya-phalā + ā
Ya-phalā + ā sequences are the two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ - BENGALI LETTER A and U+098F এ - BENGALI LETTER E).
To render Ya-phalā after অ and এ, it is necessary to type U+09CD Hasanta plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English acid অ্যাসিড, association অ্যাসোসিয়েশন, ‘bat’ ব্যাট, ‘fat’ ফ্যাট, ‘mat’ ম্যাট, ‘cap’ ক্যাপ, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant; only consonants can have the Hasanta marked. But as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
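The exceptional context can be expressed as a small check: a Hasanta must follow a consonant, unless it is part of the full sequence অ/এ + Hasanta + YA + ā-kāra. This is a minimal sketch under that reading; the function names are hypothetical, and the consonant test uses the Bengali consonant range (which also contains a few unassigned code points) plus the two Assamese letters.

```python
HASANTA, YA, AA, NUKTA = "\u09CD", "\u09AF", "\u09BE", "\u09BC"
V1 = {"\u0985", "\u098F"}  # অ, এ: the only vowels allowed before Hasanta


def is_consonant(ch: str) -> bool:
    # Bengali consonant range plus Assamese RA and WA (hypothetical helper).
    return "\u0995" <= ch <= "\u09B9" or ch in {"\u09F0", "\u09F1"}


def hasanta_contexts_ok(label: str) -> bool:
    """Each Hasanta must follow a consonant (or a Nukta, assumed to sit on a
    consonant), or be part of V1 + Hasanta + YA + AA."""
    for i, ch in enumerate(label):
        if ch != HASANTA:
            continue
        prev = label[i - 1] if i > 0 else ""
        if is_consonant(prev) or prev == NUKTA:
            continue
        if prev in V1 and label[i + 1:i + 3] == YA + AA:
            continue
        return False
    return True


print(hasanta_contexts_ok("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড
print(hasanta_contexts_ok("\u0987\u09CD\u09AF\u09BE"))                    # ই + ্যা
```

The check deliberately demands the ā-kāra as well, so অ + Hasanta + য on its own is rejected.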
Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka “the sun”); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra “cycle”). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases. The LGR makes a note of this point of concern with respect to the two RAs in disguise, as it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may consequently lead to concerns related to spoofing and other kinds of cyber irregularities. The motive for classing these two code points as (blocking) variants is that fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of a following Hasanta. The difference between the RAs is only distinguishable if one looks at their Unicode values. Therefore labels such as অর্ক arka, শীর্ষ śīrṣa ‘top, apex’, অভ্র abhra ‘cloud, the sky’ and শ্রম śrama ‘physical labour’ could be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode code points. This point is made explicit with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47) that follow. Moreover it
4 Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) has been formed by members with experience in Linguistics (especially NLP and Computational Linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR for each. However, an attempt is made to keep the fundamental philosophy behind building those LGRs consistent across all the Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about Bangla LGR code points.
4.1 Guiding Principles
The NBGP adopts the following broad principles for the selection of code points in the code point repertoire, across the board for all the Neo-Brāhmī scripts within its ambit.
4.1.1 Inclusion Principles
4.1.1.1 Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters that have been encoded in Unicode for transcription or archival purposes only will not be considered for inclusion in the code point repertoire.
4.1.1.2 Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.
4.2 Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards which are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.
4.2.1 External Limits of Scope
The code point repertoire for the root zone being a very special case at the top of the protocol hierarchy, the canvas of characters available for selection as part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart
Of all the characters needed by the script in question, if a particular character is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol
Unicode, being the character-encoding standard that provides the maximum possible representation of a given script/language, has encoded as far as possible all the characters needed by the script. However, domain names being a specialized case, they are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR is the repertoire of characters to be used for creating Root Zone TLDs, which in turn constitute an even more specialized case of domain names. The Root Zone LGR procedure therefore introduces additional exclusions on IDNA’s allowed set of characters. Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), even if allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the MSR.
To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. The IDNA Protocol further narrows this down, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
4.2.1.1 No Punctuation Marks
TLDs being identifiers, punctuation marks present in Brāhmī-based scripts will not be included.
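The three successive filters of 4.2.1 can be pictured as set operations applied in order. The sketch below is illustrative only: `successive_filters` is a hypothetical helper, and the two exclusion sets are small placeholder examples, not the real IDNA2008 derived-property table or the MSR.

```python
import unicodedata


def in_bengali_block(ch: str) -> bool:
    """Filter i: the character must be an assigned code point of the Bengali block."""
    return "\u0980" <= ch <= "\u09FF" and unicodedata.name(ch, "") != ""


def successive_filters(candidates, idna_disallowed, msr_excluded):
    """Hypothetical helper: apply filters i, ii and iii in order."""
    stage1 = {ch for ch in candidates if in_bengali_block(ch)}
    stage2 = stage1 - set(idna_disallowed)  # ii: IDNA protocol
    stage3 = stage2 - set(msr_excluded)     # iii: Maximal Starting Repertoire
    return stage3


# Placeholder inputs for illustration only, NOT the real IDNA2008 or MSR tables.
candidates = {"\u0995", "\u09BD", "\u09FA", "\u0915"}  # ক, ঽ, ৺, Devanagari क
idna_disallowed = {"\u09FA"}  # ISSHAR, assumed disallowed for this example
msr_excluded = {"\u09BD"}     # AVAGRAHA, excluded by the MSR as stated above

print(sorted(successive_filters(candidates, idna_disallowed, msr_excluded)))
```

Each stage can only remove characters, never add them, which matches the description of each layer narrowing down the one beneath it.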
4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters, like BENGALI ISSHAR ৺ (U+09FA), BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), etc., will also not be included.
4.2.1.3 No Rare and Obsolete Characters
There are characters which have been added to Unicode to accommodate rare forms, such as the Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), as well as VOCALIC LL ৡ (U+09E1) and the allographic kāra forms of the latter two symbols: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, in compliance with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the kāra corresponding to VOCALIC RR ৠ (U+09E0), which is VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.
4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.
4.2.1.5 ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and the Appendix (Section 10.1).
5 Repertoire
The Bangla writing system is represented in Unicode using the Bengali (Bangla) script name as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in Unicode has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR.
It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in the 8th meeting of its Indian Language Technologies and Products Sectional Committee LITD 20, held on 23rd August 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 11 is valid for all three languages above.
For each of the code points, language references have been given in the last column, titled Reference, of Table 8, the “Code Point Repertoire”. For complete coverage of Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced many new complications, except for one particular character. Though only a few representative languages under EGIDS Scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as listed under Bangla Unicode Points (as given in Unicode 6.3). However, before the details are presented, it is ideal to look at the Bangla code point chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs given below in the code charts are based on one of the many available fonts and are not prescriptive, because there may be some variation across actual fonts, both Unicode-compatible and TrueType ones. Consider the following code point table:
Colour convention:
All characters that are included in the [MSR]: yellow background
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background
Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences that allow the conditional inclusion, in the repertoire, of BENGALI LETTER A and E followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, in turn followed by BENGALI VOWEL SIGN AA, so that the æ sound, as in English ‘bat’, ‘cat’, etc., can be written.
5.2 Code Point Repertoire Exclusion
Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.
5.3 Code point not used alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire as a standalone code point, since it is never used alone. It is used only in sequences, to form the three special characters ড়, ঢ়, য় in normalized form.
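Because the Nukta is admitted only inside these three sequences, a label-level check reduces to looking at the character immediately before each Nukta. This is a minimal sketch of that reading; `nukta_contexts_ok` is a hypothetical helper, not an LGR construct.

```python
NUKTA = "\u09BC"
# YA, DDA, DDHA: the only bases that take Nukta here (য়, ড়, ঢ়)
NUKTA_BASES = {"\u09AF", "\u09A1", "\u09A2"}


def nukta_contexts_ok(label: str) -> bool:
    """Reject a Nukta that is label-initial or follows anything but YA/DDA/DDHA."""
    for i, ch in enumerate(label):
        if ch == NUKTA and (i == 0 or label[i - 1] not in NUKTA_BASES):
            return False
    return True


print(nukta_contexts_ok("\u09AA\u09A1\u09BC\u09BE"))  # পড়া
print(nukta_contexts_ok("\u0995\u09BC"))              # ka + Nukta
```

This also covers the sporadic nukta-under-ক/খ/গ/জ/ফ spellings mentioned in 3.3.6: they are rejected, since those bases are not in the allowed set.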
5.4.2 The Vowel Sequence
In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To make them easier to follow for users of other Brāhmī scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra onuʃʃār (B), a Candrabindu (D) or a Visarga biʃɔrgo (X). The number of D, B or X that can follow a V in Bangla may not be restricted to one. Going by the rules illustrated in the document, formations such as VDD, VBB and VXX are clearly invalid orthographic units. However, it is valid and possible to have formations or sequences such as a chandrabindu followed by an anusvara on the one hand, and a chandrabindu followed by a visarga on the other, as in হ্যাঁংচা ‘hænchā’ and হ্যাঁঃ ‘hæn’ respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu exists in Bangla.
A vowel can optionally be followed by the combination Hasanta/Virama [H] + Consonant [C] to form a Ya-phalā. “Ya-phalā is a presentation form of U+09AF Bangla letter য or ‘ya’, represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Hasanta), U+09AF য BENGALI LETTER YA>.” Ya-phalā has a special form, ্য. Again, when combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. ‘aa’ (ā)), it is used for transcribing [æ], as in the “a” of the English word “bat”, written in Bangla as ব্যাট. A vowel sequence admits the following combinations:
5.4.2.1 A Single Vowel
Example (V): অ (Devanāgarī: अ)
5.4.2.2 A Vowel with Conditions
A Vowel can optionally be followed by an Anusvāra [B], a Candrabindu [D], a Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by a Consonant [C] followed by a kāra (Mātrā) [M].
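The vowel-sequence alternatives above can be sketched as a regular expression over code points. This is an assumption-laden illustration, not the LGR's ABNF: the H C M tail is narrowed to the Ya-phalā sequence (অ/এ + Hasanta + YA + ā-kāra) as defined for S in Section 7, ঌ is left out as excluded in 5.2, and `is_vowel_sequence` is a hypothetical helper.

```python
import re

V = "[\u0985-\u098B\u098F\u0990\u0993\u0994]"   # অ-ঋ, এ, ঐ, ও, ঔ (ঌ excluded)
B, D, X = "\u0982", "\u0981", "\u0983"          # Anusvara, Candrabindu, Visarga
YA_PHALA_AA = "[\u0985\u098F]\u09CD\u09AF\u09BE"  # V1 H C1 M1: অ্যা, এ্যা

# V, optionally followed by B, X, D, DB or DX; or the Ya-phala sequence.
VOWEL_SEQUENCE = re.compile(f"(?:{V}(?:{B}|{X}|{D}[{B}{X}]?)?|{YA_PHALA_AA})")


def is_vowel_sequence(s: str) -> bool:
    return VOWEL_SEQUENCE.fullmatch(s) is not None


print(is_vowel_sequence("\u0985\u0981\u0982"))  # V D B: allowed
print(is_vowel_sequence("\u0985\u0981\u0981"))  # V D D: invalid
```

Encoding DB and DX as “D, then optionally B or X” makes VDD, VBB and VXX fail automatically, matching the invalid formations named in 5.4.2.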
5.4.3.2 A Consonant with Conditions
A Consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], an Anusvāra [B], a Candrabindu [D], a Visarga [X], a Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
5.4.3.5 Subsets
In considering its subsets, we take the combination CHC as a representative example; however, the same is equally applicable to CHCHC and CHCHCHC [A]. The combination may be followed by M, B, D, X, DB or DX.
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’.
Note: In this context, the condition on KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
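The consonant alternatives of 5.4.3.2 and 5.4.3.5, together with the Khanda Ta condition, can be sketched in the same style. This is a rough illustration only: it encodes the alternatives exactly as listed, so a kāra followed by a sign (as in কাং) falls outside this sketch, and the full ABNF (Section 5.4.1 and Appendix 10.1) governs such cases. Cluster length is left unbounded, following 3.3.2. All names are hypothetical.

```python
import re

C = "(?:[\u09A1\u09A2\u09AF]\u09BC|[\u0995-\u09B9\u09F0\u09F1])"  # incl. ড় ঢ় য়
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"                    # kāras
H, B, D, X = "\u09CD", "\u0982", "\u0981", "\u0983"
SIGNS = f"(?:{B}|{X}|{D}[{B}{X}]?)"                               # B, X, D, DB, DX

# 5.4.3.2: C optionally followed by M, B, D, X, H, DB or DX
SINGLE = re.compile(f"{C}(?:{M}|{H}|{SIGNS})?")
# 5.4.3.5: C (H C)+ optionally followed by M, B, D, X, DB or DX
CLUSTER = re.compile(f"{C}(?:{H}{C})+(?:{M}|{SIGNS})?")

RAS = ("\u09B0", "\u09F0")  # র, ৰ
KHANDA_TA = "\u09CE"


def is_consonant_sequence(s: str) -> bool:
    return SINGLE.fullmatch(s) is not None or CLUSTER.fullmatch(s) is not None


def khanda_ta_ok(label: str) -> bool:
    """A Hasanta directly before KHANDA TA must itself follow RA (র or ৰ)."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == H:
            if i < 2 or label[i - 2] not in RAS:
                return False
    return True


print(is_consonant_sequence("\u09B8\u09CD\u09A4\u09CD\u09B0\u09BF"))  # স্ত্রি
print(khanda_ta_ok("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))      # ভর্ৎসনা
```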
5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ - BANGLA LETTER A and U+098F এ - BANGLA LETTER E). To render Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English ‘bat’, ‘fat’, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant; only consonants can have the Hasanta marked. But as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
Another case concerns the formation of repha and ra-phalā in the said script, mentioned in the table above as P¹⁰. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta +

10 Refer to Rule P in Section 7, Table 16.
6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusable variants into two groups:
Group 1: confusing due to pure visual similarity.
Group 2: confusing due to deviation from character formations as normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. Confusable but non-identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted to deal with such cases. Cases belonging to Group 2, however, are proposed to be treated as variants. These are not cases of mere visual similarity, as they involve some deviation from the widely accepted norms of Bangla akshara formation. They can confuse even a careful observer and are hence being proposed as variants.
Variants arise in a script when two or more identical-looking forms are stored with different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, and get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot tell apart the two forms of ko (কো), one typed with a single key and the other with two different keys. However, this will not be considered a case of variants, because a kāra followed by a kāra is not allowed.
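The ko (কো) example can be checked directly. Besides the kāra-after-kāra rule stated above, Unicode NFC (a property of the Unicode Standard, not something this proposal relies on) already composes e-kāra + ā-kāra into o-kāra U+09CB, so the two-key form never reaches the root zone as a distinct label. A minimal sketch; `has_kara_after_kara` is a hypothetical helper.

```python
import unicodedata

KA, E_KAR, AA_KAR, O_KAR = "\u0995", "\u09C7", "\u09BE", "\u09CB"
KARAS = set("\u09BE\u09BF\u09C0\u09C1\u09C2\u09C3\u09C4\u09C7\u09C8\u09CB\u09CC")

one_key = KA + O_KAR            # কো typed as ka + o-kāra
two_keys = KA + E_KAR + AA_KAR  # কো typed as ka + e-kāra + ā-kāra


def has_kara_after_kara(label: str) -> bool:
    """True if two dependent vowel signs (kāras) are adjacent."""
    return any(a in KARAS and b in KARAS for a, b in zip(label, label[1:]))


# NFC composes e-kāra + ā-kāra into the single o-kāra
print(unicodedata.normalize("NFC", two_keys) == one_key)
print(has_kara_after_kara(two_keys), has_kara_after_kara(one_key))
```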
6.1 In-Script Variants
Nevertheless, we propose two cases of true in-script variants in the Bangla script.
CASE I
As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) with Hasanta appears in a conjunct with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
The above combinations, if written in traditional orthography, could be a little confusing, since the থ (tha) in the conjunct looks like a হ (ha). The conjunct could be in initial, medial or final position (as shown below in example no. 1). It could also be mistyped by someone who takes it for a হ (ha, U+09B9), increasing the risks in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
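Variant generation for CASE I can be sketched as swapping tha and ha wherever they follow sa or na + Hasanta, and collecting every combination. This is an illustration under the assumption that each such site varies independently; `in_script_variants` is a hypothetical helper, not the LGR's XML variant mechanism.

```python
SA, NA, HASANTA, THA, HA = "\u09B8", "\u09A8", "\u09CD", "\u09A5", "\u09B9"
SWAP = {THA: HA, HA: THA}


def in_script_variants(label: str) -> set:
    """Return the label plus every label obtained by swapping tha <-> ha
    wherever it follows sa/na + Hasanta (CASE I)."""
    sites = [
        i for i in range(2, len(label))
        if label[i] in SWAP and label[i - 1] == HASANTA and label[i - 2] in (SA, NA)
    ]
    results = {label}
    for i in sites:
        for existing in list(results):
            chars = list(existing)
            chars[i] = SWAP[chars[i]]
            results.add("".join(chars))
    return results


sthana = "\u09B8\u09CD\u09A5\u09BE\u09A8"  # স্থান
print(sorted(in_script_variants(sthana)))
```

A label with k such sites yields 2^k labels; the surrounding sa/na + Hasanta context is never altered, so every generated label stays within the same contexts.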
6.2 Cross-Script Variants
A cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusions they may cause as labels in the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Although certain other characters are somewhat similar, they have not been included here; they are provided in the Appendix (10.3) for reference.
7 Whole Label Evaluation Rules (WLE)
This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications.
11 Unicode uses Oriya for the script, although Odia is now the official term.
12 As used by Unicode, denoting and including both Assamese and Maṇipuri.
Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table in Code Point Repertoire (Section 5.1):
C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), or the (æ) Ya-phalā sequence (V1 H C1 M1), where V1 is either 0985 (অ - BENGALI LETTER A) or 098F (এ - BENGALI LETTER E), H is 09CD (্ - BENGALI SIGN VIRAMA), C1 is 09AF (য - BENGALI LETTER YA) and M1 is 09BE (া - BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র - BENGALI LETTER RA) or 09F0 (ৰ - ASSAMESE LETTER RA)
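To make the symbol table concrete, the sketch below maps a label to its WLE symbol string. Several assumptions apply: S1 and S2 from Table 9 are not modelled (only the Ya-phalā form of S is), a Nukta is folded into the preceding C, and `wle_symbols` is a hypothetical helper rather than part of the LGR XML.

```python
V_SET = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")
M_SET = set("\u09BE\u09BF\u09C0\u09C1\u09C2\u09C3\u09C4\u09C7\u09C8\u09CB\u09CC")
SIMPLE = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09CD": "H", "\u09CE": "Z"}
NUKTA, HASANTA, YA, AA = "\u09BC", "\u09CD", "\u09AF", "\u09BE"
RAS = {"\u09B0", "\u09F0"}  # C2 in P
V1 = {"\u0985", "\u098F"}   # V1 in S


def is_consonant(ch: str) -> bool:
    return "\u0995" <= ch <= "\u09B9" or ch in {"\u09F0", "\u09F1"}


def wle_symbols(label: str) -> str:
    """Hypothetical helper: map a label to its WLE symbol string."""
    out, i = [], 0
    while i < len(label):
        ch = label[i]
        if ch in V1 and label[i + 1:i + 4] == HASANTA + YA + AA:
            out.append("S")  # (æ) Ya-phala: V1 H C1 M1
            i += 4
        elif ch in RAS and label[i + 1:i + 2] == HASANTA:
            out.append("P")  # Ra-Hasanta: C2 H
            i += 2
        elif ch == NUKTA:
            i += 1           # folded into the preceding C
        elif is_consonant(ch):
            out.append("C")
            i += 1
        elif ch in V_SET:
            out.append("V")
            i += 1
        elif ch in M_SET:
            out.append("M")
            i += 1
        elif ch in SIMPLE:
            out.append(SIMPLE[ch])
            i += 1
        else:
            raise ValueError(f"U+{ord(ch):04X} is outside this sketch's repertoire")
    return "".join(out)


print(wle_symbols("\u0985\u09B0\u09CD\u0995"))  # অর্ক
```

Checking S and P before the single-character classes matters: otherwise অ্যা would be read as V H C M and trip the rule against Hasanta after a vowel.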
Let us now elaborate on each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in allowing such ligatures: any user can easily create them, even if they are not attested combinations.
7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋk ʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09AF U+09BC U+09BE) (meaning ‘Bank of India’). Here two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H to fully represent the intended sound. By and large, however, the form of the first word without an H (U+09CD) is considered sufficient to represent the sound intended for that word. This situation arises only because no hyphen, space or Zero Width Non-Joiner is available in the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to follow an H. Permitting this could make two labels (with and without H) look alike to the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.
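The prohibition reduces to a pairwise check on adjacent code points. This minimal sketch, with the hypothetical helper `has_vowel_after_hasanta`, shows that the Bank of India form with অফ্ is rejected while the form without the final Hasanta passes this particular rule.

```python
HASANTA = "\u09CD"
INDEPENDENT_VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")


def has_vowel_after_hasanta(label: str) -> bool:
    """True if an independent vowel (V) directly follows a Hasanta (H)."""
    return any(a == HASANTA and b in INDEPENDENT_VOWELS for a, b in zip(label, label[1:]))


bank = "\u09AC\u09CD\u09AF\u09BE\u0999\u09CD\u0995"           # ব্যাঙ্ক
of_with_h, of_without_h = "\u0985\u09AB\u09CD", "\u0985\u09AB"  # অফ্ / অফ
india = "\u0987\u09A8\u09CD\u09A1\u09BF\u09AF\u09BC\u09BE"     # ইন্ডিয়া

print(has_vowel_after_hasanta(bank + of_with_h + india))     # prohibited form
print(has_vowel_after_hasanta(bank + of_without_h + india))  # passes this rule
```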
7.2 Additional Examples from Bangla ABNF
Below are a few examples that help explain some of the rules the ABNF puts in place. They are given for reference only and are not meant to be comprehensive.
ঃক ःक: As can be seen, such a combination automatically results in a "golu", or dotted circle, marking it as an invalid formation. This is an intrinsic property of the
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink, and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. "What has happened So Far in Terms of Script Reforms". Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0: Core Specification, Chapter 12, p. 473.
10 Appendix-I
10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature. When it is applied to a specific language or script, certain restriction rules apply; in other words, some of the Formalism structures do not necessarily apply to a given language. Restriction rules are set in place to handle such cases, and they help to fine-tune the ABNF. In the case of Bangla13 in particular, the following rules apply:
1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Bangla does not normally allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, such as the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichuchor' by Kazi Nazrul Islam. Word-initial ya also occurs in names, mostly of foreign origin: Yaqub, for instance, may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, transliterations of some Chinese and Japanese names into Bangla have also produced Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).
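Restriction 1 can be sketched in Python as a check on the first code point. The set below covers only the three letters named in the rule (ৎ, ঞ, ঙ). Word-initial য় is deliberately not rejected, since the text cites attested uses. The function name is hypothetical.

```python
import unicodedata

# Letters that restriction 1 forbids at the start of a label.
FORBIDDEN_INITIAL = {"\u09ce", "\u099e", "\u0999"}  # ৎ KHANDA TA, ঞ NYA, ঙ NGA


def initial_violation(label: str):
    """Return the Unicode name of a forbidden initial letter, or None if the start is acceptable."""
    if label and label[0] in FORBIDDEN_INITIAL:
        return unicodedata.name(label[0])
    return None


# ৎসেরিং (Tsering) starts with Khanda Ta
print(initial_violation("\u09ce\u09b8\u09c7\u09b0\u09bf\u0982"))  # BENGALI LETTER KHANDA TA
```

Under this strict reading, the Tsering transliteration mentioned above would be rejected as a root zone label.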
10.2 'Sylheti Nagari lipi' or 'Siloti'
This version of the Bangla script resembles the 'Kaithī' script (ISO 15924) used by accountants (perhaps of the Kāyastha community) in Eastern Uttar Pradesh and Bihar, which was widely in use during the 1880s. There were several other names of Sylheti
13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri are not covered in this section.
Nagari or Siloti (129), such as 'Jalalabada Nagari', 'Fula (flower) Nagari', 'Muslim Nagari' or 'Muhammad Nagari'. Shah Jalal is said to have brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was invented by the Afghan rulers of Sylhet (137). Others credit the Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).
Table 17 – The Script Table of Sylheti Nagari or Siloti
10.3 Confusable code points
The following code points were analysed. Each was found to be either (a) distinguishable or (b) confusable, but not enough to be defined as a variant code point.
ক k: ক্ক (ক্+ক), ক্ট (ক্+ট), ক্ত (ক্+ত), ক্ত্র (ক্+ত্+র), ক্ত্ব (ক্+ত্+ব), ক্ন (ক্+ন), ক্ব (ক্+ব), ক্ম (ক্+ম), ক্য (ক্+য), ক্র (ক্+র), ক্ষ (ক্+ষ), ক্ষ্ণ (ক্+ষ্+ণ), ক্ষ্ম (ক্+ষ্+ম), ক্ষ্ব (ক্+ষ্+ব), ক্ষ্য (ক্+ষ্+য), ক্স (ক্+স), ঙ্ক (ঙ্+ক), (+ক্+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g: গ্গ (গ্+গ), গ্দ (গ্+দ), গ্ধ (গ্+ধ), গ্ন (গ্+ন), গ্ব (গ্+ব), গ্ম (গ্+ম), গ্য (গ্+য), গ্র (গ্+র), গ্ল (গ্+ল), ঙ্গ (ঙ্+গ), ( +ঙ্+গ)
ণ n: ণ্ট (ণ্+ট), ণ্ঠ (ণ্+ঠ), ণ্ড (ণ্+ড), ণ্ড্র (ণ্+ড্+র), ণ্ঢ (ণ্+ঢ), ণ্ণ (ণ্+ণ), ণ্য (ণ্+য), ণ্ব (ণ্+ব), ক্ষ্ণ (ক্+ষ্+ণ), ষ্ণ (ষ্+ণ), (+ণ)
3.3.2 The Implicit Vowel Killer Hasanta (called 'Halant' or 'Halanta' in other Brāhmī-based scripts)
As stated earlier, all consonants are pronounced in isolation with an implicit vowel assumed to be associated with them; in Bangla this neutral vowel is the central-back ɔ [121]. This report uses two terms for the character that marks the absence of this inherent vowel. The first is 'Hasanta' (= 'Halant' or 'Halanta' in other Brāhmī-based scripts). The second is 'Virāma'7 (= 'Dari' in Bangla), which is preferred in UNICODE (cf. Unicode 3.0 and above). The term virama has been adopted in UNICODE in a sense different from its traditional grammatical definition, and it therefore requires some explanation. Given the importance of this document, that note belongs in the LGR itself, so that anybody referring to it can find the proper grammatical explanation of the term. Because a special sign is needed whenever the implicit vowel is stripped off, the symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster, one can, in common parlance, "kill" its vowel and create conjuncts. In this manner, conjunct characters are generally written by joining two to four consonants, and in rare cases up to five. However, the maximum number of consonants that can join to form one aksara8 can only be bounded empirically. This observation is based on the CIIL-Emille Corpora of Bangla words [132 & 133] as seen in print these days. Given the mixture of scripts and languages on the web, someone may want a generic Top Level Domain [gTLD] that exceeds the observed maximum. This can happen, for example, when a foreign-language word that admits a large number of consonants is transliterated into Bangla. Hence this limit will not be enforced in the Bangla LGR.
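A conjunct is encoded as consonants joined by Hasanta (C H C H C ...), so its size can be read off the code point sequence. The Python sketch below is a hypothetical helper, not part of the LGR. It reports the largest cluster in a label. Since the LGR enforces no maximum, such a measure would serve only for inspection. The consonant range is an assumption covering U+0995 to U+09B9 plus ৰ and ৱ.

```python
import re

C = "[\u0995-\u09b9\u09f0\u09f1]"  # assumed consonant range (includes a few unassigned points)
NUKTA = "\u09bc?"                    # optional nukta, for ড় ঢ় য় in NFC form
H = "\u09cd"                         # Hasanta
CLUSTER = re.compile(f"{C}{NUKTA}(?:{H}{C}{NUKTA})*")


def max_cluster_size(label: str) -> int:
    """Largest number of consonants joined by Hasanta in any cluster of the label."""
    sizes = [m.group().count(H) + 1 for m in CLUSTER.finditer(label)]
    return max(sizes, default=0)


print(max_cluster_size("\u0995\u09cd\u09b7\u09cd\u09ae"))  # ক্ষ্ম -> 3
```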
3.3.3 Vowels
Separate symbols exist for all 'Swara', or vowels, in Bangla. They are pronounced independently, either at the beginning of a word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign is attached to the consonant; it is called 'kār' in Bangla and Mātrā in Nāgarī9. Because the consonant has this built-in neutral vowel at the end, there are equivalent kāras (Mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:
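The vowel-to-kāra correspondence can be written as a lookup table. In this minimal Python sketch (the names are hypothetical), অ has no entry because its sound is the inherent vowel. ৠ is listed only because its kāra ৄ is retained (Section 4.2.1.3), even though the vowel itself is excluded. The loop at the end cross-checks each pair against the Unicode character names.

```python
import unicodedata

# Independent vowel -> dependent vowel sign (kāra).
VOWEL_TO_KARA = {
    "\u0986": "\u09be",  # আ -> া
    "\u0987": "\u09bf",  # ই -> ি
    "\u0988": "\u09c0",  # ঈ -> ী
    "\u0989": "\u09c1",  # উ -> ু
    "\u098a": "\u09c2",  # ঊ -> ূ
    "\u098b": "\u09c3",  # ঋ -> ৃ
    "\u09e0": "\u09c4",  # ৠ -> ৄ (vowel excluded, kāra retained)
    "\u098f": "\u09c7",  # এ -> ে
    "\u0990": "\u09c8",  # ঐ -> ৈ
    "\u0993": "\u09cb",  # ও -> ো
    "\u0994": "\u09cc",  # ঔ -> ৌ
}


def with_kara(consonant: str, vowel: str) -> str:
    """Attach the kāra for `vowel` to `consonant`; অ adds nothing (inherent vowel)."""
    if vowel == "\u0985":
        return consonant
    return consonant + VOWEL_TO_KARA[vowel]


# Sanity check: each kāra's Unicode name mirrors its vowel's name.
for v, k in VOWEL_TO_KARA.items():
    assert unicodedata.name(v).replace("LETTER", "VOWEL SIGN") == unicodedata.name(k)
```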
7 Virāma as used here is also a misnomer according to the Indian grammatical traditions, which nowhere mark the mere absence of a vowel as virama. Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
8 This term needs to be disambiguated: Aksara also means 'syllable' in Indian grammatical traditions.
9 In Bangla, however, the term 'Matra' stands for an altogether different concept, viz. the top bar placed over a letter, typically present in Hindi and Bangla but missing in Gujarati.
3.3.4 The Anusvāra onuʃʃār (ং - U+0982)
The Anusvāra, or onuʃʃār, in Bangla sometimes represents a homorganic nasal, but not always. It replaces a conjunct group of 'Nasal Consonant + Hasanta + Consonant' where the second consonant belongs to the Velar varga, or set, as in লংকা. It also often appears in such combinations when the last member is a non-velar, as in ল্যাংটা "naked" or ল্যাংচা "a kind of sweet; to limp". Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined written form using the corresponding nasal consonant of the particular set. Modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal. In Bangla, by contrast, it is clearly demarcated where one must use the Anusvāra and where one must use a conjunct cluster with a nasal as its first or second component.
3.3.5 Nasalization: Candrabindu (ঁ - U+0981)
Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cãd 'moon' (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].
3.3.6 Nukta (় - U+09BC)
The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi), and both the term and the concept of nukta are borrowed in Bangla. The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, this LGR has to represent the Bangla letters য় YYA, ড় RRA and ঢ় RRHA by sequences rather than by the single code points YYA (U+09DF), RRA (U+09DC) and RRHA (U+09DD). The sequences are YA + Nukta (U+09AF + U+09BC), DDA + Nukta (U+09A1 + U+09BC) and DDHA + Nukta (U+09A2 + U+09BC). This is necessary even though the use of 'Nukta' is otherwise completely unnatural in Bangla.
The current Unicode Standard chart lists these characters as additional consonants. Under the LGR Procedure, however, these decisions depend on the IDNA Protocol, through a set of procedures developed by the IETF. The Unicode Standard also provides these three characters as atomic code points, each typed as a single keystroke: 09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y]. Even so, the IDNA protocol requires that we treat them as two-code-point sequences of a consonant plus Nukta, both of which are allocated in the Unicode Bengali block.
There may be sporadic attempts to write Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph). These are made only for the sake of correct pronunciation and to preserve the sanctity of the loanword, in effect making the Bangla writing system work like the IPA. This usage is not found in printed Bangla.
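The effect of the composition exclusions can be observed with Python's standard `unicodedata` module: normalizing each atomic letter to NFC yields the two-code-point sequence that this LGR uses. This is a short sketch; the table name is illustrative.

```python
import unicodedata

# RRA, RRHA and YYA are composition exclusions: NFC decomposes them into
# base consonant + NUKTA (U+09BC) and never recomposes them.
ATOMIC_TO_NFC = {
    "\u09dc": "\u09a1\u09bc",  # RRA  -> DDA  + NUKTA
    "\u09dd": "\u09a2\u09bc",  # RRHA -> DDHA + NUKTA
    "\u09df": "\u09af\u09bc",  # YYA  -> YA   + NUKTA
}

for atomic, expected in ATOMIC_TO_NFC.items():
    nfc = unicodedata.normalize("NFC", atomic)
    assert nfc == expected
    print(f"U+{ord(atomic):04X} ->", " ".join(f"U+{ord(c):04X}" for c in nfc))
# U+09DC -> U+09A1 U+09BC
# U+09DD -> U+09A2 U+09BC
# U+09DF -> U+09AF U+09BC
```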
3.3.7 Visarga biʃɔrgo (ঃ - U+0983) and Avagraha (ঽ - U+09BD)
The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. An example is দুঃখ duhkho "sorrow, unhappiness" (U+09A6 U+09C1 U+0983 U+0996).
The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in Bangla. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি is re-written as নরো'পরাণি), and it is now rarely used, even in other languages written in the Bangla script. The Avagraha (ঽ, U+09BD) is not part of the repertoire of this LGR, because it is blocked in TLDs as per the Maximal Starting Repertoire (MSR). Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.
type in the following manner: র + ্ + য, but for র‍্য the sequence would be র + ZWJ + ্ + য [154]. In other words, ZWJ is needed to render words that require a ya-phalā after ra. Otherwise such words cannot be typed (rendered), because ra + hasanta + antastha ja in the medial and/or final position already means something else: it types a repha on the consonant antastha ja, as in কার্য (kārjo). To get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার.
The [Procedure] rules out the use of ZWJ and ZWNJ in the root zone. In Bangla these two signs are used to create alternate renderings, and inserting them can affect searching as well as NLP. The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) to explicitly suppress the default conjunct formation, so that the Hasanta between the two consonants is shown explicitly.
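The consequence for the root zone can be shown on code points. In this minimal Python sketch (the names are hypothetical), র্য and র‍্য differ only by ZWJ. Once ZWJ and ZWNJ are disallowed, the ra + ya-phalā spelling cannot be entered, and only the repha spelling remains.

```python
ZWJ, ZWNJ = "\u200d", "\u200c"
RA, HASANTA, YA = "\u09b0", "\u09cd", "\u09af"

repha_on_ya = RA + HASANTA + YA             # repha over ya, as in কার্য (kārjo)
ra_with_ya_phala = RA + ZWJ + HASANTA + YA  # ra followed by ya-phalā, as in র‍্যাপার


def has_joiner_controls(label: str) -> bool:
    """True if the label contains ZWJ or ZWNJ, neither of which the root zone permits."""
    return ZWJ in label or ZWNJ in label


# Without the joiner, both spellings collapse to the same code point sequence.
assert ra_with_ya_phala.replace(ZWJ, "") == repha_on_ya
print(has_joiner_controls(ra_with_ya_phala), has_joiner_controls(repha_on_ya))  # True False
```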
3.3.9 Use of Ya-phalā
The Ya-phalā sequences অ্যা and এ্যা are the two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ - BENGALI LETTER A and U+098F এ - BENGALI LETTER E).
To render Ya-phalā after অ and এ, it is necessary to type U+09CD Hasanta plus U+09AF ya after the vowel. This is a purely ligatural entity. The addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English 'acid' অ্যাসিড, 'association' অ্যাসোসিয়েশন, 'bat' ব্যাট, 'fat' ফ্যাট, 'mat' ম্যাট, 'cap' ক্যাপ, etc. By nature, the Brāhmī script does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry it. As we see here, however, Bangla orthography ends up with a deviant feature in which Hasanta comes immediately after a vowel, in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
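The exception can be made precise. A Hasanta may follow an independent vowel only inside the sequence অ/এ + ্ + য + া, which is the symbol S of Section 7. Below is a minimal Python sketch under that reading. The function name is hypothetical, and the vowel range deliberately spans the whole independent-vowel area of the block.

```python
import re

HASANTA = "\u09cd"
# S: V1 H C1 M1 with V1 in {অ, এ}, C1 = য and M1 = া
S_SEQUENCE = re.compile("[\u0985\u098f]\u09cd\u09af\u09be")
# Any independent vowel (U+0985..U+0994, ৠ, ৡ) directly followed by Hasanta
VOWEL_THEN_H = re.compile("[\u0985-\u0994\u09e0\u09e1]" + HASANTA)


def hasanta_after_vowel_ok(label: str) -> bool:
    """True if every Hasanta that directly follows an independent vowel starts an S sequence."""
    s_starts = {m.start() for m in S_SEQUENCE.finditer(label)}
    return all(m.start() in s_starts for m in VOWEL_THEN_H.finditer(label))


print(hasanta_after_vowel_ok("\u0985\u09cd\u09af\u09be\u09b8\u09bf\u09a1"))  # অ্যাসিড -> True
```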
Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka "the sun"), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra "cycle"). In both cases the ra slot can be filled by Bangla ra র (U+09B0) or Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), and the shapes of repha and ra-phalā stay the same either way. The LGR notes this concern about the two RAs in disguise: in a label so generated, it would be completely impossible to tell them apart with the naked eye. This may lead to spoofing and other kinds of cyber irregularities. The two CPs are classed as (blocking) variants because fully rendered labels may mask the distinction between Bangla ra র (U+09B0) and Assamese ra ৰ (U+09F0). That is the justification for Variant Set 4, though only in the context of a following Hasanta. The two RAs can only be told apart by looking at their Unicode values. Labels such as অর্ক arka, শীর্ষ śīrṣa "top, apex", অভ্র abhra "cloud, the sky" and শ্রম śrama "physical labour" could therefore be extremely dangerous, since the web user may never check the digital content (the labels) against its Unicode code points. This point is made explicitly with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE Symbols, p. 47), which are to follow. Moreover, it
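One way to picture the blocking variants is to enumerate every label reachable by swapping র and ৰ. The Python sketch below assumes that the variant context is an RA immediately followed by Hasanta (the symbol P = C2 H). The text also points out that ra-phalā (RA after Hasanta) looks identical, so the normative context should be taken from Variant Set 4 in the XML LGR. The function name is hypothetical.

```python
from itertools import product

HASANTA = "\u09cd"
RA_VARIANTS = {"\u09b0": "\u09f0", "\u09f0": "\u09b0"}  # র <-> ৰ


def ra_hasanta_variants(label: str) -> set:
    """All other labels obtained by swapping র/ৰ wherever the RA is immediately followed by Hasanta."""
    slots = [i for i, ch in enumerate(label[:-1])
             if ch in RA_VARIANTS and label[i + 1] == HASANTA]
    results = set()
    for choice in product((False, True), repeat=len(slots)):
        chars = list(label)
        for i, swap in zip(slots, choice):
            if swap:
                chars[i] = RA_VARIANTS[chars[i]]
        results.add("".join(chars))
    results.discard(label)  # the label itself is not its own variant
    return results


print(ra_hasanta_variants("\u0985\u09b0\u09cd\u0995"))  # অর্ক -> {'অৰ্ক'}
```

A label with n such RAs yields 2^n - 1 variants. Under the blocking policy described above, all of them would be unavailable once the original is delegated.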
4 Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) is made up of members with experience in Linguistics (especially NLP and computational linguistics), Literature, Language, History and Epigraphy. Under the NBGP, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, each with its own LGR. However, an attempt is made to keep the fundamental philosophy behind building those LGRs consistent across all the Brāhmī-derived scripts. The present LGR will cater to multiple languages at EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about the Bangla LGR code points.
4.1 Guiding Principles
The NBGP applies the following broad principles for selecting code points for the code point repertoire, across the board, for all the Neo-Brāhmī scripts within its ambit.
4.1.1 Inclusion Principles
4.1.1.1 Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters that have been encoded in Unicode only for transcription or archival purposes will not be considered for inclusion in the code point repertoire.
4.1.1.2 Unambiguous Use
Linguists should have an unambiguous understanding of how every proposed character is used in the language.
4.2 Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards that are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but they are spelt out separately for the sake of clarity.
4.2.1 External Limits of Scope
The code point repertoire for the root zone is a very special case at the top of the protocol hierarchy. The characters available for selection as part of the Root Zone code point repertoire are therefore already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart: If a character needed by the script in question is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts of the Unicode Consortium.
ii. IDNA Protocol: Unicode is the character-encoding standard that aims at the fullest possible representation of a given script or language, and it has encoded, as far as possible, all the characters the script needs. Domain names, however, are a specialized case governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR): The Root Zone LGR is the repertoire of characters that will be used to create Root Zone TLDs, an even more specialized case of domain names. The Root Zone LGR procedure therefore introduces additional exclusions on IDNA's allowed set of characters. Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD) is allowed by the IDNA protocol, but it is not permitted in the Root Zone Repertoire as per the MSR.
To sum up, the restrictions start by admitting only characters that are part of the code block of the given script or language. The IDNA Protocol narrows this down further. Finally, the Maximal Starting Repertoire acts as an additional filter that restricts the character set for the given language even more.
4.2.1.1 No Punctuation Marks
Since TLDs are identifiers, the punctuation marks present in Brāhmī-based scripts will not be included.
4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters will also not be included, e.g. BENGALI ISSHAR ৺ (U+09FA) and BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9).
4.2.1.3 No Rare and Obsolete Characters
Some characters have been added to Unicode to accommodate rare forms. These include the Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), and also VOCALIC LL ৡ (U+09E1). They also include the allographic kāra forms of the latter two symbols: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, in compliance with the Conservatism principle laid down in the Root Zone LGR procedure. There is one exception. The kāra corresponding to VOCALIC RR ৠ (U+09E0), i.e. VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain borrowed or Sanskritic words in Bangla and is therefore retained.
4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This also accords with the Letter principle laid down in the Root Zone LGR procedure.
4.2.1.5 ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).
5 Repertoire
In UNICODE, the Bangla writing system is represented under the Bengali (Bangla) script name, as enumerated in ISO 15924. It serves languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in UNICODE has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes for inclusion in the Bangla LGR.
On 26th February 2016, the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) for the dis-unification of the Bangla and Asamiyā scripts. At its 8th Meeting, held on 23rd August 2017, the BIS Indian Language Technologies and Products Sectional Committee LITD 20 decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the UNICODE Consortium takes further action, the Code Point Repertoire under Table 11 will be assumed valid for all three languages mentioned above.
Language references for each code point are given in the last column, titled Reference, of Table 8, "Code Point Repertoire". For full coverage of the Bangla code points, references to Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced new complications, except for one particular character. Only a few representative languages at EGIDS Scale 1-4 have been chosen for referencing. Together, however, they cover all the code points required by all the languages the NBGP has considered, as listed under Bangla Unicode Points (as given in UNICODE 6.3). Before the details are presented, it is useful to look at the Bangla Code Point Chart from the Maximal Starting Repertoire [MSR] Version 3. The reference glyphs in the code charts below are based on one of many fonts and are not prescriptive, since actual fonts, both UNICODE-compatible and TrueType ones, may vary. Consider the following code point table:
Colour convention:
All characters that are included in the [MSR] - Yellow background
PVALID in IDNA2008 but excluded from the [MSR] - Pinkish background
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen) - White background
Apart from the individual code points above, the Neo-Brāhmī Generation Panel also proposes some specific sequences for the repertoire. These allow BENGALI LETTER A and E to be followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, and then by BENGALI VOWEL SIGN AA, under certain conditions. This enables the æ sound, as in English 'bat', 'cat' etc.
5.2 Code Point Repertoire Exclusion
Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete.
5.3 Code point not used alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire as a standalone code point, since it is never used alone. In normalized form, it appears only as part of the sequences for the three special characters ড় ঢ় য়.
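The two repertoire rules above can be sketched as a pre-check in Python: individual code points are excluded, and nukta is allowed only inside ড় ঢ় য়. The exclusion set is partial and is assumed from Sections 3.3.7, 4.2.1.3 and 5.2; the authoritative list is the repertoire table. The names are hypothetical.

```python
# Partial, assumed exclusion list: ঌ ৗ (5.2), ঽ (3.3.7), ৠ ৡ ৢ ৣ (4.2.1.3)
EXCLUDED = {"\u098c", "\u09d7", "\u09bd", "\u09e0", "\u09e1", "\u09e2", "\u09e3"}
NUKTA = "\u09bc"
NUKTA_BASES = {"\u09a1", "\u09a2", "\u09af"}  # ড ঢ য


def repertoire_problems(label: str) -> list:
    """List reasons why the label fails these two checks; an empty list means it passes."""
    problems = [f"excluded code point U+{ord(ch):04X}" for ch in label if ch in EXCLUDED]
    for i, ch in enumerate(label):
        if ch == NUKTA and (i == 0 or label[i - 1] not in NUKTA_BASES):
            problems.append(f"NUKTA at index {i} does not follow DDA, DDHA or YA")
    return problems


print(repertoire_problems("\u098c"))  # ['excluded code point U+098C']
```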
5.4.2 The Vowel Sequence
The Vowel Sequence and the Consonant Sequence pertinent to Bangla are given below. To help users of other Brāhmī scripts, Devanāgarī equivalents are provided wherever necessary. A vowel sequence is made up of a single vowel, which may optionally be followed by an Anusvāra onuʃʃār (B), a Candrabindu (D) or a Visarga biʃɔrgo (X). A V in Bangla may be followed by more than one of D, B or X. Going by the rules illustrated in this document, however, formations such as VDD, VBB and VXX are clearly invalid orthographic units. It is valid to have an anusvāra followed by a candrabindu, and likewise a visarga followed by a candrabindu, as in হ্যাংচা 'hænchā' and হ্যাঃ 'hæn' respectively. A Visarga or Anusvāra (onuʃʃār) following a Candrabindu is also possible in Bangla. A vowel can optionally be followed by Hasanta/Virama [H] + Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF, the Bangla letter য or 'ya'. It is represented by the sequence <U+09CD (BENGALI SIGN VIRAMA, i.e. the Bangla Hasanta), U+09AF (য, BENGALI LETTER YA)>, and has the special form ্য. When combined further with U+09BE া BENGALI VOWEL SIGN AA (i.e. 'aa' (ā)), it is used to transcribe [æ], as in the "a" of the English word "bat", written in Bangla as ব্যাট. A Vowel Sequence admits the following combinations:
5.4.2.1 A Single Vowel

Example: V → অ (अ)
5.4.2.2 A Vowel with Conditions
A Vowel can optionally be followed by one of the following: Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].
5.4.3.2 A Consonant with Conditions
A Consonant is optionally followed by one of the following: a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB], or Candrabindu + Visarga [DX].
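Once a label has been mapped to WLE symbols (Section 7), the productions quoted in 5.4.2.1, 5.4.2.2 and 5.4.3.2 can be expressed as regular expressions over the symbol string. This is a minimal Python sketch. It covers only these productions, not the full ABNF, and the names are hypothetical.

```python
import re

# Regular expressions over WLE symbol strings such as "VDB" or "CM".
VOWEL_SEQUENCE = re.compile(r"V(?:B|D|X|DB|DX|HCM)?")       # 5.4.2.1 and 5.4.2.2
CONSONANT_SEQUENCE = re.compile(r"C(?:M|B|D|X|H|DB|DX)?")   # 5.4.3.2


def is_vowel_sequence(symbols: str) -> bool:
    return VOWEL_SEQUENCE.fullmatch(symbols) is not None


def is_consonant_sequence(symbols: str) -> bool:
    return CONSONANT_SEQUENCE.fullmatch(symbols) is not None


print(is_vowel_sequence("VDB"), is_vowel_sequence("VBB"))  # True False
```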
5.4.3.5 Subsets
When considering its subsets, we take the combination CHC as the only representative example; the same applies equally to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) "scolding".
Note: In this context of KHANDA TA, the C must be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
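The subset rule and the Khanda Ta condition can be sketched together in Python. `CLUSTER_SEQUENCE` works on WLE symbol strings. In line with Section 3.3.2, it does not cap the number of H C repetitions, even though the text only lists forms up to CHCHCHC. `khanda_ta_context_ok` works on code points. Both names are hypothetical.

```python
import re

# [A]: C H C, C H C H C, ... optionally followed by M, B, D, X, DB or DX.
CLUSTER_SEQUENCE = re.compile(r"C(?:HC)+(?:M|B|D|X|DB|DX)?")


def is_cluster_sequence(symbols: str) -> bool:
    return CLUSTER_SEQUENCE.fullmatch(symbols) is not None


RAS = {"\u09b0", "\u09f0"}              # র (Bangla), ৰ (Assamese)
HASANTA, KHANDA_TA = "\u09cd", "\u09ce"


def khanda_ta_context_ok(label: str) -> bool:
    """False if some Hasanta + ৎ pair is not preceded by one of the two RAs."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] not in RAS:
                return False
    return True


print(khanda_ta_context_ok("\u09ad\u09b0\u09cd\u09ce\u09b8\u09a8\u09be"))  # ভর্ৎসনা -> True
```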
5.4.5 Special Cases S and P
Two special cases involving sequences, referred to as S and P in Table 16 under Section 7, are briefly described here. Let us take up S first. Notably, there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full
10 Refer to Rule P in Section 7 Table 16
vowel (U+0985 অ - BANGLA LETTER A and U+098F এ - BANGLA LETTER E). To render Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya after the vowel. This is a purely ligatural entity. The addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English 'bat', 'fat' etc. By nature, the Brāhmī script does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry it. As we see here, however, Bangla orthography ends up with a deviant feature in which Hasanta comes immediately after a vowel, in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
The other case concerns the formation of repha and ra-phalā in this script, shown as P in the table above. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta +
6 Variants
This section discusses the variants in the Bangla script. The NBGP divides these confusable variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from the character formations normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. Confusable but non-identical cases are not proposed, because another panel (the String Similarity Assessment Panel) is entrusted with such cases. Cases belonging to Group 2, however, are proposed as variants. These are not cases of mere visual similarity, since they involve deviations from the widely accepted norms of Bangla akshar formation. They can confuse even a careful observer and are therefore being proposed as variants.
Variants arise in a script when two or more identical-looking forms are stored as different code point sequences. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot tell apart the two forms of কো (ko), one typed with a single key and the other with two different keys. This is nevertheless not considered a case of variants, because a kāra followed by a kāra is not allowed.
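Assume labels are NFC-normalized, as IDNA requires (Section 3.3.6). Then the two spellings of কো never reach the root zone as distinct labels, because U+09CB BENGALI VOWEL SIGN O canonically decomposes into U+09C7 + U+09BE. The same holds for ৌ (U+09CC = U+09C7 + U+09D7). Below is a minimal Python check with the standard `unicodedata` module; the variable names are illustrative.

```python
import unicodedata

KA, E_KARA, AA_KARA, O_KARA = "\u0995", "\u09c7", "\u09be", "\u09cb"

one_key = KA + O_KARA              # কো typed with the single o-kāra key
two_keys = KA + E_KARA + AA_KARA   # the same sound typed as ে + া

# NFC composes ে + া into ো; NFD splits ো back into ে + া.
assert unicodedata.normalize("NFC", two_keys) == one_key
assert unicodedata.normalize("NFD", one_key) == two_keys
# Likewise ে + ৗ (AU length mark, U+09D7) composes to ৌ (U+09CC).
assert unicodedata.normalize("NFC", KA + E_KARA + "\u09d7") == KA + "\u09cc"
```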
6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I
As far as true variants in Bangla are concerned, attention may be drawn to the cases in which থ (tha, U+09A5) forms a conjunct, through Hasanta, with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus
স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus
ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
Written in traditional orthography, the above combinations can be somewhat confusing, because the থ (tha) in the conjunct looks like a হ (ha). The conjunct may occur in initial, medial or final position (as shown below in example 1). A user may also type it wrongly as a হ (ha) U+09B9, which increases the risk of errors in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
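The proposed in-script variant can be sketched as a generator that swaps থ and হ wherever they follow স or ন plus Hasanta. This minimal Python sketch assumes that every such position is an independent variant site. The function name is hypothetical, and the normative mapping is the one in the XML LGR.

```python
from itertools import product

HASANTA = "\u09cd"
SWAP = {"\u09a5": "\u09b9", "\u09b9": "\u09a5"}  # থ <-> হ
FIRST_MEMBERS = {"\u09b8", "\u09a8"}            # স, ন


def tha_ha_variants(label: str) -> set:
    """All other labels reachable by swapping থ/হ in স্থ~স্হ and ন্থ~ন্হ conjuncts."""
    slots = [i for i in range(2, len(label))
             if label[i] in SWAP and label[i - 1] == HASANTA
             and label[i - 2] in FIRST_MEMBERS]
    out = set()
    for flips in product((False, True), repeat=len(slots)):
        chars = list(label)
        for i, flip in zip(slots, flips):
            if flip:
                chars[i] = SWAP[chars[i]]
        out.add("".join(chars))
    out.discard(label)  # the label itself is not its own variant
    return out


print(tha_ha_variants("\u09b8\u09cd\u09a5\u09be\u09a8"))  # স্থান -> {'স্হান'}
```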
6.2 Cross Script Variants
A focused cross-script study of Bangla has been carried out against sister scripts such as Devanāgarī, Gurmukhī and Odia11 (formerly Oriya), with attention to the visual and technical confusions they may cause as labels in web domains. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The NBGP proposes the following characters as variants. Certain other characters are somewhat similar but have not been included here; they are listed in the Appendix (10.3) for reference.
7 Whole Label Evaluation Rules (WLE)
This section provides the WLEs required by all the languages mentioned in Section 3.2 when they are written in the Bangla12 script. The rules have been drafted so that they can easily be translated into the LGR specifications.
11 Unicode uses Oriya for the script, although Odia is now the official term.
12 As used by Unicode, denoting and including both Assamese and Maṇipuri.
Below are the symbols used in the WLE rules for each of the Indic SyllabicCategoryasmentionedinthetableprovidedinCodepointrepertoire(Section51)
C rarr Consonant
M rarr Kāra(Mātrā)
V rarr Vowel
B rarr Anusvāra
D rarr Candrabindu
X rarr Visarga
H rarr Hasanta
Z rarr KhandaTa
S rarr S1S2(fromTable9)or(ae)Ya-phalā(V1HC1M1)whereV1iseither0985(অ-BENGALILETTERA)or098F(এ-BENGALILETTERE)His09CD(-BENGALISIGNVIRAMA)C1is-09AF(য-BENGALILETTERYA)M1is-09BE(া-BENGALIVOWELSIGNAA)S1 and S2 are valid even they are not allowed by the other context rules
P rarr Ra-Hasanta(C2H)whereC2iseither09B0(র-BENGALILETTERRA)or 09F0 (ৰ - ASSAMESE LETTER RA
Now letuselaborateeach rulewithexamples from the scriptkeeping inmindthe Bangla Assamese and Manipuri communities Some combinations ofcharactersmayseemunrealisticorrareinusagebutthereisnoharminaddingsuch ligaturesbecause it ispossible tocreate thembyanyusereasilybutmaynotbeattestedcombinations
49
711 Case of V Preceded by H There could be cases involvingmulti-word domains where Vmay need to beallowedtofollowanHegব8াtঅuইিvয়া baeligŋkʌv ɪndiə (U+09ACU+09CDU+09AFU+09BEU+0999U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1U+09BFU+09DFU+09BE)(meaningBankofIndia)ThisisthecasewheretwodifferentwordsarejoinedtogetherfirstofwhichendswithanH(অu)andthesecondwordbeginswithaV(ইিvয়া)Somesectionsofthe linguistic community require the explicit presence of H for fullrepresentation of the sound intended However by and large the form of thefirstwordwithoutanH(U+09CD)isconsideredenoughforfullrepresentationofthesoundintendedforthefirstwordThis isauniquesituationnecessitatedbythelackofhyphenspaceortheZeroWidthNon-joinercharacterinthepermissiblesetofcharactersintheRootzonerepertoire Otherwise V is never required to be allowed to follow an HPermittingthismaycreateaperceptivesimilaritybetweentwolabels(withandwithout H) for majority of the linguistic communities hence this is explicitlyprohibitedbytheNBGPIn future if required depending on the prevailing requirements from thecommunitythefutureNBGPmayconsiderrevisitingthisrule
72 AdditionalExamplesfromBanglaABNF
Belowarea fewexampleswhichhelponeunderstandsomeof the rulesABNFputsinplaceThesearejustgivenforreferencepurposesandarenotmeanttobecomprehensive
ঃক ःकAscanbeseensuchcombinationwillresultautomaticallyinaldquogolurdquooradottedcircle marking it as an invalid formation This is an intrinsic property of the
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
10.2 'Sylheti Nagarī lipi' or 'Siloti'
This version of the Bangla script resembles the 'Kaithī' script (ISO 15924), which was used by accountants (perhaps by the Kāyastha community) in eastern Uttar Pradesh and Bihar and was widely in use during the 1880s. Sylheti Nagarī or Siloti (129) had several other names, such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' and 'Muhammad Nagarī'. Shah Jalal is said to have brought the script with him to Sylhet in the 13th-14th century (138), although some suggest that it was an invention of the Afghan rulers of Sylhet (137). Others credit the Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).
13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri have not been covered in this section.
Table 17 – The Script Table of Sylheti Nagarī or Siloti
10.3 Confusable code points
The following code points were analysed. The conclusion was that each is either (a) distinguishable or (b) confusable, but not enough to be defined as a variant code point.
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
3.3.4 The Anusvāra onuʃʃār (ং - U+0982)
The Anusvāra or onuʃʃār in Bangla at times represents a homorganic nasal, but not always. It replaces a conjunct group of 'Nasal Consonant + Hasanta + Consonant' where the second consonant belongs to the Velar varga or set, as in লংকা. It often also appears in such combinations when the last member is a non-velar, as in ল্যাংটা "naked" or ল্যাংচা "a kind of sweet; to limp". Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined writing symbol representing the corresponding nasal consonant of the particular set. Modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal. Bangla, by contrast, clearly demarcates where the Anusvāra must be used and where a conjunct cluster must be used, with a nasal as the first or the second component.
3.3.5 Nasalization: Candrabindu (ঁ - U+0981)
Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cãd 'moon' (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].
3.3.6 Nukta (় - U+09BC)
The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi); both the term and the concept of nukta are borrowed into Bangla. The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, this LGR has to represent the Bangla letters য় YYA, ড় RRA and ঢ় RRHA by the sequences YA + Nukta (U+09AF + U+09BC), DDA + Nukta (U+09A1 + U+09BC) and DDHA + Nukta (U+09A2 + U+09BC). It cannot use the single code points YYA (U+09DF), RRA (U+09DC) and RRHA (U+09DD), even though the use of 'Nukta' is otherwise completely unnatural in Bangla.
The current Unicode Standard chart lists these characters as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA Protocol, through a set of procedures developed by the IETF. The Unicode Standard also provides these three characters as atomic characters, each typed as a single keystroke (for example 09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y]). Even so, the IDNA protocol requires that we treat them as conjunct characters, encoded as sequences of code points already allocated in the Unicode Bengali block.
There may be sporadic cases of Muslim names, Urdu poetic words and Perso-Arabic loanwords being written with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph). This is done only to indicate the correct pronunciation and to preserve the sanctity of the loanword, which amounts to making the Bangla writing system work like the IPA script. The practice is, however, not found in printed Bangla.
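Python's standard unicodedata module can confirm the effect of the NFC composition exclusions described in Section 3.3.6. This is a minimal sketch. The dictionary name ATOMIC_TO_SEQUENCE is ours. The expected sequences are the ones stated in the text.

```python
import unicodedata

# Atomic letters (composition exclusions) and the NFC sequence each becomes.
ATOMIC_TO_SEQUENCE = {
    "\u09DC": "\u09A1\u09BC",  # ড় -> DDA + NUKTA
    "\u09DD": "\u09A2\u09BC",  # ঢ় -> DDHA + NUKTA
    "\u09DF": "\u09AF\u09BC",  # য় -> YA + NUKTA
}

for atomic, sequence in ATOMIC_TO_SEQUENCE.items():
    # NFC decomposes the atomic form and never recomposes it...
    assert unicodedata.normalize("NFC", atomic) == sequence
    # ...and the two-code-point sequence is already stable under NFC.
    assert unicodedata.normalize("NFC", sequence) == sequence
    print(f"U+{ord(atomic):04X} ->", " ".join(f"U+{ord(c):04X}" for c in sequence))
```

In practice, a label typed with U+09DC can only be compared with this LGR's repertoire after it has been normalized.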
3.3.7 Visarga biʃɔrgo (ঃ - U+0983) and Avagraha (ঽ - U+09BD)
The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. An example is দুঃখ duhkho "sorrow", "unhappiness" (U+09A6 U+09C1 U+0983 U+0996).
The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in Bangla. It is gradually being replaced by an upper comma (e.g., নরোঽপরাণি re-written as নরো'পরাণি), and it is now rarely used even in other languages that use the Bangla script. Avagraha (ঽ, U+09BD) is therefore not retained in the repertoire of this LGR, because the Maximal Starting Repertoire (MSR) blocks it in TLDs. Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.
type in the following manner: র + ্ + য; but for র‍্য the sequence would be র + ZWJ + ্ + য [154]. In other words, ZWJ is used in the rendering of words that need a ya-phalā after ra. Without it, such words cannot be typed (rendered), because the plain order ra + hasanta + antastha ja, in medial and/or final position, is what types repha on the consonant antastha ja, as in কার্য (kaarjo). To get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার.
The use of ZWJ/ZWNJ has been ruled out of the root zone by the [Procedure]. In Bangla these two signs are used to create alternate renderings, so inserting them can affect searching as well as NLP. The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly prevented, so that the Hasanta joining the two consonants of the conjunct is shown explicitly.
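Plain code point strings can illustrate the two spellings of ra followed by ya, and what the exclusion of the joiners means for them. This is a sketch only. The sequences follow the description above, and has_joiner is a hypothetical helper, not part of any LGR tool.

```python
RA, VIRAMA, YA, AA = "\u09B0", "\u09CD", "\u09AF", "\u09BE"
ZWJ, ZWNJ = "\u200D", "\u200C"

# র + ্ + য renders as repha on ya (as in কার্য).
repha_on_ya = RA + VIRAMA + YA
# র + ZWJ + ্ + য renders as ra with ya-phala (as in র‍্যাপার).
ra_with_ya_phala = RA + ZWJ + VIRAMA + YA


def has_joiner(label: str) -> bool:
    """True if the label contains ZWJ or ZWNJ, which the root zone rules out."""
    return ZWJ in label or ZWNJ in label


print(has_joiner(repha_on_ya + AA))       # False
print(has_joiner(ra_with_ya_phala + AA))  # True
```

Under this reading, only the repha spelling can occur in a root zone label; the ra + ya-phalā form cannot be expressed at all.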
3.3.9 Use of Ya-phalā
a) Ya-phalā sequences: There are two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ - BENGALI LETTER A and U+098F এ - BENGALI LETTER E).
To render Ya-phalā after অ and এ, U+09CD Hasanta plus U+09AF ya must be typed after the said vowels. This is a purely ligatural entity: the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English acid অ্যাসিড, association অ্যাসোসিয়েশন, 'bat' ব্যাট, 'fat' ফ্যাট, 'mat' ম্যাট, 'cap' ক্যাপ, etc. By nature, the Brāhmī script does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta mark. As we see here, however, Bangla ends up with a deviant feature in its orthography: Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
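The ligatures অ্যা and এ্যা are the only places where Hasanta directly follows an independent vowel. The check below sketches that constraint. The function names are ours. Using the U+0985 to U+0994 range to detect independent vowels is an assumption made for illustration.

```python
VIRAMA, YA, AA = "\u09CD", "\u09AF", "\u09BE"
VOWELS_BEFORE_YA_PHALA = {"\u0985", "\u098F"}  # অ, এ


def is_independent_vowel(ch: str) -> bool:
    # Independent vowels occupy U+0985..U+0994 in the Bengali block.
    return "\u0985" <= ch <= "\u0994"


def vowel_hasanta_ok(label: str) -> bool:
    """Accept Hasanta after an independent vowel only as অ্যা or এ্যা."""
    for i in range(1, len(label)):
        if label[i] == VIRAMA and is_independent_vowel(label[i - 1]):
            if label[i - 1] not in VOWELS_BEFORE_YA_PHALA:
                return False
            if label[i + 1:i + 3] != YA + AA:
                return False
    return True


acid = "\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"  # অ্যাসিড
print(vowel_hasanta_ok(acid))                        # True
print(vowel_hasanta_ok("\u0987\u09CD\u09AF\u09BE"))  # False: ই + Hasanta
```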
Owing to its co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g., র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka "the sun"), and ra-phalā = C + Hasanta + ra (e.g., ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra "cycle"). In both cases, the slot for ra could be filled by Bangla ra র (U+09B0) or Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), while the shapes of repha and ra-phalā remain the same. The LGR notes this concern about the two RAs in disguise: it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may lead to spoofing and other kinds of cyber irregularities. The two code points are classed as (blocking) variants because fully rendered labels may mask the distinction between Bangla ra র (U+09B0) and Assamese ra ৰ (U+09F0). This is the justification for Variant Set 4, though only in the context of a following Hasanta. The difference between the RAs can only be seen by looking at their Unicode values. Labels such as অর্ক arka, শীর্ষ śīrṣa 'top, apex', অভ্র abhra 'cloud, the sky' and শ্রম śrama 'physical labour' could therefore be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode code points. This point is made explicit with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47), which follow. Moreover, it
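Variant Set 4 can be pictured as swapping র and ৰ in Hasanta contexts and collecting the resulting labels. This is a hypothetical sketch, not the LGR XML variant mechanism, and the function name is ours. By default it swaps only an RA immediately followed by Hasanta, which matches symbol P (Ra-Hasanta) in Section 7. The optional flag also covers the ra-phalā position. Treating that position as a variant context is an assumption the text leaves open.

```python
from itertools import product

BANGLA_RA, ASSAMESE_RA, HASANTA = "\u09B0", "\u09F0", "\u09CD"
SWAP = {BANGLA_RA: ASSAMESE_RA, ASSAMESE_RA: BANGLA_RA}


def ra_variants(label: str, include_ra_phala: bool = False) -> set:
    """Labels obtained by swapping র <-> ৰ in Hasanta contexts.

    By default only RA + Hasanta (repha, symbol P) is considered; set
    include_ra_phala=True to also swap an RA that directly follows a Hasanta.
    """
    slots = []
    for i, ch in enumerate(label):
        if ch not in SWAP:
            continue
        followed = label[i + 1:i + 2] == HASANTA
        preceded = include_ra_phala and i > 0 and label[i - 1] == HASANTA
        if followed or preceded:
            slots.append(i)
    variants = set()
    # Enumerate every combination of swapped / unswapped slots.
    for flips in product((False, True), repeat=len(slots)):
        chars = list(label)
        for i, flip in zip(slots, flips):
            if flip:
                chars[i] = SWAP[chars[i]]
        variants.add("".join(chars))
    variants.discard(label)  # the original label is not its own variant
    return variants


arka = "\u0985\u09B0\u09CD\u0995"  # অর্ক
print(len(ra_variants(arka)))  # 1
```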
4 Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) has been formed by members with experience in Linguistics (especially NLP and Computational Linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR assigned to each. However, an attempt is made to keep the fundamental philosophy behind building those LGRs consistent across all the Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about the Bangla LGR code points.
4.1 Guiding Principles
The NBGP adopts the following broad principles for selecting code points for the code point repertoire, across the board, for all the Neo-Brāhmī scripts within its ambit.
4.1.1 Inclusion Principles
4.1.1.1 Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters that have been encoded in Unicode for transcription or archival purposes only will not be considered for inclusion in the code point repertoire.
4.1.1.2 Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.
4.2 Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These limits consist of protocols or standards that are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but they have been spelt out separately for the sake of clarity.
4.2.1 External Limits of Scope
The code point repertoire for the root zone is a very special case at the top of the protocol hierarchy. The set of characters available for selection as part of the Root Zone code point repertoire is therefore already constrained by the various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart
If a character needed by the script in question is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially in the Bangla-Asamiyā-Manipuri Writing System, given the elaborate and exhaustive character inclusion efforts of the Unicode Consortium.
ii. IDNA Protocol
Unicode is the character-encoding standard that aims at the fullest possible representation of a given script/language, and it has encoded, as far as possible, all the characters the script needs. Domain names, however, are a specialized case governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR is the repertoire of characters that will be used to create Root Zone TLDs, which are an even more specialized case of domain names. The Root Zone LGR procedure therefore introduces additional exclusions on IDNA's allowed set of characters. Example: Bangla Sign Avagraha ঽ (U+09BD), even though the IDNA protocol allows it, is not permitted in the Root Zone Repertoire as per the MSR.
To sum up, the restrictions start by admitting only those characters that are part of the code block of the given script/language. The IDNA Protocol narrows this set further, and the Maximal Starting Repertoire then restricts the character set associated with the given language even more.
4.2.1.1 No Punctuation Marks
TLDs are identifiers, so the punctuation marks present in Brāhmī-based scripts will not be included.
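The three successive filters of Section 4.2.1 can be pictured as a small pipeline. The sketch below is only an illustration. The sample sets are small, hand-picked subsets, not the full data from RFC 5892 or the MSR. The IDNA2008 examples are the composition-excluded letters plus a few currency and symbol characters, and the MSR example is Avagraha. A code point that is not rejected here is therefore not thereby part of the repertoire.

```python
import unicodedata

# Illustrative, deliberately partial data; the real sets come from the Unicode
# charts, the IDNA2008 derived properties (RFC 5892) and the MSR.
BENGALI_BLOCK = range(0x0980, 0x0A00)
IDNA_DISALLOWED_SAMPLE = {0x09DC, 0x09DD, 0x09DF, 0x09F2, 0x09F3, 0x09F9, 0x09FA}
MSR_EXCLUDED_SAMPLE = {0x09BD}  # BENGALI SIGN AVAGRAHA


def filter_code_point(cp: int) -> tuple:
    """Apply the three successive filters of Section 4.2.1 in order."""
    # i. Encoded in Unicode, inside the Bengali block
    if cp not in BENGALI_BLOCK or unicodedata.category(chr(cp)) == "Cn":
        return False, "not an encoded Bengali-block character"
    # ii. IDNA protocol
    if cp in IDNA_DISALLOWED_SAMPLE:
        return False, "not PVALID under IDNA2008"
    # iii. Maximal Starting Repertoire
    if cp in MSR_EXCLUDED_SAMPLE:
        return False, "excluded by the MSR"
    return True, "not rejected by these sample filters"


for cp in (0x0995, 0x09FA, 0x09BD, 0x0915):
    print(f"U+{cp:04X}", filter_code_point(cp))
```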
4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures, and other iconic characters such as BANGLA ISSHAR ৺ (U+09FA) and BANGLA CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9) will also not be included.
4.2.1.3 No Rare and Obsolete Characters
Some characters have been added to Unicode to accommodate rare forms, such as Sanskritic VOCALIC RR ৠ (U+09E0), VOCALIC L ঌ (U+098C) and VOCALIC LL ৡ (U+09E1), together with the allographic -kāra forms of the latter two: VOWEL SIGN VOCALIC L (U+09E2) and VOWEL SIGN VOCALIC LL (U+09E3). All such characters are excluded, in compliance with the Conservatism principle laid down in the Root Zone LGR procedure. However, the -kāra corresponding to VOCALIC RR ৠ (U+09E0), namely VOWEL SIGN VOCALIC RR (U+09C4), is still in active use in Bangla in certain borrowed or Sanskritic words and is therefore retained.
4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle laid down in the Root Zone LGR procedure.
4.2.1.5 ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).
5 Repertoire
The Bangla Writing System is represented in Unicode using the Bengali (Bangla) script name as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in Unicode has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes for inclusion in the Bangla LGR.
The Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for the dis-unification of the Bangla and Asamiyā scripts. At the 8th Meeting of its Indian Language Technologies and Products Sectional Committee, LITD 20, held on 23rd Aug 2017, the BIS decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, the Code Point Repertoire under Table 11 will be assumed to be valid for all three languages above.
For each code point, language references are given in the last column, titled "Reference", of Table 8, titled "Code Point Repertoire". To cover all Bangla code points, references are given for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya. Kokborok written in the Bangla script is not known to have introduced many new complications, except for one particular character. Only a few representative languages under EGIDS scale 1-4 have been chosen for referencing. Together, however, they cover all the code points required by all the languages the NBGP has considered, as listed under Bangla Unicode Points (as given in Unicode 6.3). Before the details are presented, it is useful to look at the Bangla Code Point Chart from the Maximal Starting Repertoire [MSR] Version 3. The shapes of the reference glyphs in the code charts below are based on one of many available fonts and are not prescriptive, since actual fonts, both Unicode-compatible and TrueType ones, may vary. Consider the following code point table.
Colour convention:
All characters that are included in the [MSR] - yellow background
PVALID in IDNA2008 but excluded from the [MSR] - pinkish background
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen) - white background
Apart from the individual code points above, the Neo-Brāhmī Generation Panel also proposes some specific sequences for the repertoire. These sequences allow conditional inclusion of Bangla LETTER A and E followed by Bangla SIGN VIRAMA and Bangla LETTER YA, in turn followed by Bangla VOWEL SIGN AA. They make it possible to represent the æ sound, as in English 'bat', 'cat', etc.
5.2 Code Point Repertoire Exclusion
Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.
5.3 Code point not used alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire as a standalone code point, since it is never used alone. It is used only within sequences, to write the three special characters ড়, ঢ় and য় in normalized form.
5.4.2 The Vowel Sequence
The Vowel Sequence and the Consonant Sequence pertinent to Bangla are given below. To help users of other Brāhmī scripts, Devanāgarī equivalents are provided wherever necessary.
A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra onuʃʃār (B), a Candrabindu (D) or a Visarga biʃɔrgo (X). The number of D, B or X that can follow a V in Bangla may not be restricted to one. By the rules illustrated in this document, formations such as VDD, VBB and VXX are clearly invalid orthographic units. However, it is valid and possible to have an anusvāra followed by a candrabindu on the one hand, and a visarga followed by a candrabindu on the other, as in হ্যাংচা 'hænchā' and 'hæn' হ্যাঃ respectively. A Visarga or Anusvāra (onuʃʃār) can also follow a Candrabindu in Bangla.
A Vowel can optionally be followed by a combination of Hasanta/Virama [H] + Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF, the Bangla letter য or 'ya', represented by the sequence <U+09CD ্ BENGALI SIGN VIRAMA (Bangla SIGN Hasanta), U+09AF য BENGALI LETTER YA>. Ya-phalā has a special form, ্য. When combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. 'aa' (ā)), it is used to transcribe [æ], as in the "a" of the English word "bat", written in Bangla as ব্যাট. A Vowel-sequence admits the following combinations:
5.4.2.1 A Single Vowel
Example: V: অ (Devanāgarī अ)
5.4.2.2 A Vowel with Conditions
A Vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].
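Sections 5.4.2.1 and 5.4.2.2 can be read as a small regular language over the WLE symbols (V, B, D, X, H, C, M). The sketch below encodes only the orderings that 5.4.2.2 lists (DB and DX). It does not include the anusvāra + candrabindu and visarga + candrabindu orderings mentioned in the prose of 5.4.2. The pattern and function names are ours.

```python
import re

# Section 5.4.2 over WLE symbol strings: V, optionally followed by
# B, D, X, DB, DX, or the sequence H C M.
VOWEL_SEQUENCE = re.compile(r"V(?:DB|DX|B|D|X|HCM)?")


def is_vowel_sequence(symbols: str) -> bool:
    return VOWEL_SEQUENCE.fullmatch(symbols) is not None


for s in ("V", "VB", "VDX", "VHCM", "VDD", "VBB", "VXX"):
    print(s, is_vowel_sequence(s))
```

The first four strings match. VDD, VBB and VXX are rejected, in line with the text.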
5.4.3.2 A Consonant with Conditions
A Consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
5.4.3.5 Subsets
For the subsets, we take the combination CHC as the representative example; the same applies equally to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.
Note: In this context, KHANDA TA requires the C to be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
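The consonant rules of 5.4.3.2 and 5.4.3.5, together with the KHANDA TA condition in the note above, can be sketched as follows. The patterns work on WLE symbol strings and the KHANDA TA check works on actual code points. All names here are hypothetical, and the sketch covers only the combinations listed in these two subsections.

```python
import re

HASANTA, KHANDA_TA = "\u09CD", "\u09CE"
RA_FORMS = {"\u09B0", "\u09F0"}  # র (Bangla), ৰ (Assamese)

# 5.4.3.2: a consonant optionally followed by one of M, B, D, X, H, DB, DX.
SINGLE = re.compile(r"C(?:DB|DX|M|B|D|X|H)?")
# 5.4.3.5: CHC, CHCHC or CHCHCHC, optionally followed by M, B, D, X, DB or DX.
CLUSTER = re.compile(r"C(?:HC){1,3}(?:DB|DX|M|B|D|X)?")


def is_consonant_sequence(symbols: str) -> bool:
    return bool(SINGLE.fullmatch(symbols) or CLUSTER.fullmatch(symbols))


def khanda_ta_context_ok(label: str) -> bool:
    """Where ৎ follows a Hasanta, the consonant before that Hasanta must be an RA."""
    for i in range(1, len(label)):
        if label[i] == KHANDA_TA and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] not in RA_FORMS:
                return False
    return True


print(is_consonant_sequence("CHCM"), is_consonant_sequence("CHCHCHCHC"))  # True False
bhartsana = "\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"  # ভর্ৎসনা
print(khanda_ta_context_ok(bhartsana))  # True
```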
5.4.5 Special Cases: S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) are described briefly here. Let us take up S first. There are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ - BANGLA LETTER A and U+098F এ - BANGLA LETTER E). To render Ya-phalā after অ and এ, U+09CD plus U+09AF ya must be typed after the said vowels. This is a purely ligatural entity: the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English 'bat', 'fat', etc. By nature, the Brāhmī script does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta mark. As we see here, however, Bangla ends up with a deviant feature in its orthography: Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
The other case concerns the formation of repha and ra-phalā in the script, mentioned in the table above as P. Owing to its co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g., র্ক, i.e. ra + Hasanta +
10 Refer to Rule P in Section 7, Table 16
6 Variants
This section discusses the variants in the Bangla script. The NBGP divides these confusing variants into two groups:
Group 1: Confusing due to pure visual similarity
Group 2: Confusing due to deviation from character formations as normally perceived by the larger linguistic community
For Group 1, any identical code points are defined as variants. Confusable but non-identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with such cases. Cases belonging to Group 2, however, are proposed to be treated as variants. These cases are not a matter of mere visual similarity, as they involve deviations from the widely accepted norms of Bangla akshara formation. Since they can confuse even a careful observer, they are proposed as variants.
Variants arise in a script when two or more forms look the same but are stored with different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot tell apart the two ko (কো), one typed with a single key and the other typed with two different keys. Even so, this will not be considered a case of variants, because a kāra followed by a kāra is not allowed.
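IDN labels must be in NFC (Section 3.3.6), and under NFC the two-key spelling of ো is canonically equivalent to the single-key one and normalizes to it. Python's unicodedata module can check this; the variable names below are ours. The same holds for ৌ, whose second half is the AU length mark U+09D7.

```python
import unicodedata

KA, E_KAR, AA_KAR, O_KAR = "\u0995", "\u09C7", "\u09BE", "\u09CB"
AU_LENGTH_MARK, AU_KAR = "\u09D7", "\u09CC"

typed_once = KA + O_KAR             # কো entered with one key
typed_twice = KA + E_KAR + AA_KAR   # ে and া entered as two keys

print(typed_once == typed_twice)  # False: different code point sequences
print(unicodedata.normalize("NFC", typed_twice) == typed_once)  # True
print(unicodedata.normalize("NFC", KA + E_KAR + AU_LENGTH_MARK) == KA + AU_KAR)  # True
```

So once labels are normalized, the two typings reduce to a single stored form.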
6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I
Among true variants in Bangla, attention may be drawn to cases where থ (tha, U+09A5) forms a conjunct, through Hasanta, with স (sa, U+09B8) or ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus
স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus
ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
Written in traditional orthography, the above combinations can be somewhat confusing, because the থ (tha) in the conjunct looks like a হ (ha). The conjunct may occur in initial, medial or final position (as shown below in example no. 1). It may also be typed wrongly as হ (ha) U+09B9 by someone who takes it for a ha, which increases the risks in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
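The Case I pairing can be sketched as a function that swaps থ and হ wherever one of them closes a স/ন + Hasanta conjunct. This is a hypothetical helper, not the LGR variant machinery. It flips every eligible position at once, which is enough to show the pairing; a complete implementation would enumerate each combination of positions separately.

```python
HASANTA, THA, HA = "\u09CD", "\u09A5", "\u09B9"
FIRST_MEMBERS = {"\u09B8", "\u09A8"}  # স, ন
SWAP = {THA: HA, HA: THA}


def tha_ha_variant(label: str) -> str:
    """Swap থ <-> হ wherever it closes a স/ন + Hasanta conjunct (Case I)."""
    chars = list(label)
    for i in range(2, len(chars)):
        # Only chars[i] is changed, so the context checks below stay valid.
        if chars[i] in SWAP and chars[i - 1] == HASANTA and chars[i - 2] in FIRST_MEMBERS:
            chars[i] = SWAP[chars[i]]
    return "".join(chars)


sthana = "\u09B8\u09CD\u09A5\u09BE\u09A8"  # স্থান
print([f"U+{ord(c):04X}" for c in tha_ha_variant(sthana)])
```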
6.2 Cross-Script Variants
A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels in web domains. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The NBGP proposes the following characters as variants. Some other characters are somewhat similar but have not been included here; they are listed in the Appendix (10.3) for reference.
7 Whole Label Evaluation Rules (WLE)
This section provides the WLEs required by all the languages mentioned in Section 3.2 when they are written in the Bangla¹² script. The rules have been drafted so that they can easily be translated into the LGR specifications.
11 Unicode uses Oriya for the script, although Odia is now the official term.
12 As used by Unicode, denoting and including both Assamese and Maṇipuri.
Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table in the Code Point Repertoire (Section 5.1).
C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9) or (a, e) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ - BENGALI LETTER A) or 098F (এ - BENGALI LETTER E), H is 09CD (্ - BENGALI SIGN VIRAMA), C1 is 09AF (য - BENGALI LETTER YA) and M1 is 09BE (া - BENGALI VOWEL SIGN AA). S1 and S2 are valid even if the other context rules do not allow them.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র - BENGALI LETTER RA) or 09F0 (ৰ - ASSAMESE LETTER RA)
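A first step in evaluating these rules is to map each code point of a label to its WLE symbol. The mapping below is a hypothetical sketch built from code point ranges in the Bengali block. The normative assignment is the Indic Syllabic Category column of the Code Point Repertoire (Section 5.1). The sketch does not detect the composite symbols S and P.

```python
NUKTA = "\u09BC"
# Hypothetical symbol assignment for this sketch only.
SINGLE_SYMBOLS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09CD": "H", "\u09CE": "Z"}


def wle_symbol(ch: str) -> str:
    """Map one Bengali-block code point to its WLE symbol (C, M, V, B, D, X, H, Z)."""
    cp = ord(ch)
    if ch in SINGLE_SYMBOLS:
        return SINGLE_SYMBOLS[ch]
    if 0x0985 <= cp <= 0x0994:
        return "V"
    if 0x0995 <= cp <= 0x09B9 or cp == 0x09F0:
        return "C"
    if 0x09BE <= cp <= 0x09CC:
        return "M"
    raise ValueError(f"U+{cp:04X} is outside this sketch's repertoire")


def to_symbols(label: str) -> str:
    # A Nukta is folded into the consonant before it (e.g. ড + ় counts as one C).
    return "".join(wle_symbol(ch) for ch in label if ch != NUKTA)


print(to_symbols("\u099A\u09BE\u0981\u09A6"))  # চাঁদ -> CMDC
print(to_symbols("\u0985\u09B0\u09CD\u0995"))  # অর্ক -> VCHC
```

The resulting symbol strings could then be checked against patterns such as those in Sections 5.4.2 and 5.4.3.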
Let us now elaborate each rule with examples from the script, keeping the Bangla, Assamese and Manipuri communities in mind. Some combinations of characters may seem unrealistic or rare, and some may not be attested at all. There is no harm in covering such ligatures, however, because any user can easily create them.
7.1.1 Case of V Preceded by H
Multi-word domains may seem to require that V be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋk ʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09AF U+09BC U+09BE) (meaning Bank of India). Here two different words are joined together: the first ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H to represent the intended sound fully. By and large, however, the first word without an H (U+09CD) is considered enough to represent its intended sound. This unique situation arises because the hyphen, the space and the Zero Width Non-Joiner are all absent from the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to follow an H. Permitting it may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities, so the NBGP explicitly prohibits it.
Depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.
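The prohibition on V directly after H reduces to a check on adjacent code points. The sketch below is hypothetical. It treats U+0985 to U+0994 as the independent-vowel range and tests the Bank of India example with and without the H after ফ.

```python
HASANTA = "\u09CD"


def is_independent_vowel(ch: str) -> bool:
    # Independent vowels occupy U+0985..U+0994 in the Bengali block.
    return "\u0985" <= ch <= "\u0994"


def vowel_follows_hasanta(label: str) -> bool:
    """True if any independent vowel (V) directly follows a Hasanta (H)."""
    return any(
        prev == HASANTA and is_independent_vowel(ch)
        for prev, ch in zip(label, label[1:])
    )


# ব্যাঙ্কঅফ্ইন্ডিয়া (with H after ফ) and the same label without that H.
with_h = "\u09AC\u09CD\u09AF\u09BE\u0999\u09CD\u0995\u0985\u09AB\u09CD\u0987\u09A8\u09CD\u09A1\u09BF\u09AF\u09BC\u09BE"
without_h = with_h.replace("\u09AB\u09CD\u0987", "\u09AB\u0987")
print(vowel_follows_hasanta(with_h))     # True  -> rejected
print(vowel_follows_hasanta(without_h))  # False -> allowed
```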
7.2 Additional Examples from Bangla ABNF
Below are a few examples that help to explain some of the rules the ABNF puts in place. They are given for reference only and are not meant to be comprehensive.
ঃক (ःक): Such a combination automatically produces a "golu", or dotted circle, which marks it as an invalid formation. This is an intrinsic property of the
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti
13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section
58
Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)
Table17ndashTheScriptTableofSylhetiNagarıorSiloti
103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
334TheAnusvāraonuʃʃār(ং-U+0982)TheAnusvāra or onuʃʃār inBangla at times represents a homorganic nasal but notalwaysItreplacesaconjunctgroupofalsquoNasalConsonant+Hasanta+ConsonantrsquowherethesecondconsonantbelongstotheVelarvargaorsetasinলংকাButitoftenappearsalso for such combinations involving non-velars appearing as the lastmember of thecombinationasinলGাংটা ldquonakedrdquoorলGাংচা ldquoakindofsweettolimprdquoBeforeanon-vargaconsonant the Anusvara represents a nasal sound that may have an alternativeconjoined writing symbol representing the corresponding nasal consonant of theparticularsetAlthoughModernHindiMarathiandKonkaniprefertheanusvāratothecorrespondingHalf-nasal inBangla it isclearlydemarcatedastowhereonemustusetheAnusvāraandwhere ithastobeaconjunctclusterwithanasalasthefirstorthesecondcomponent
335NasalizationCandrabindu(-U+0981)
Candrabindu denotes nasalization of the preceding vowel as in চাদ cad lsquomoonrsquo(U+099AU+09BEU+0981U+09A6)Thissignwithadotinsidethehalf-moonmarkisusedasnasalizationmarkerinmanyBrahmı-basedscripts[143]
336Nukta(-U+09BC)ThenuktasigndoesnotexistinBanglaorthographyItispredominantlyusedinmanyBrahmıderivedscriptssuchasDevanagarı(forHindiBodoMaithiliSantaliKashmiriandSindhiThetermandtheconceptofnuktaareborrowedinBanglaTheIDNAProtocol(RFC5891)statesthatIDNsmustbeinUnicodeNormalizationFormC (NFC) RFC 7940 applies this requirement to LGRs The definition of NFC in theUnicodeStandardcontainsanumberofcompositionexclusionsAsaresulttheBanglalettersয় YYAড় RRAandঢ় RRHAhavetoberepresentedinthethisLGRbyusingthesequences (YA +Nukta U+9AF + U+09BC) (DDA + Nukta U+9A1 + U+09BC) and(DDHA+NuktaU+9A2+U+09BC)insteadofthesinglecodepointsYYA(U+9DF)RRA(U+09DC) andRRHA (U+09DD) although the useof lsquoNuktarsquo is otherwise completelyunnaturalinBanglaIt is noted that in the current Unicode Standard chart these characters are listed asadditionalconsonantsAspertheLGRProcedurehoweverthesedecisionsdependontheIDNAProtocolthroughasetofprodeduresdevelopedbytheIETFEventhoughtheUnicode Standard also prescribesmethods to produce these three characters both asatomiccharacters (forexample09DC forড় [r]09DD forঢ় [rh] and09DFasয় [y]assinglekeystroke)theIDNAprotocolrequiresthatwetreatthemasconjunctcharactersandthenallocatecodesfortheseintheUnicodeBengaliBlock
18
ItmaybenotedthattherecouldbesporadicattemptsorcasesofwritingMuslimnamesUrdupoeticwordsandPerso-Arabicloanwordswithnuktaunderক(k)খ(kh)গ(g)জ(j) and ফ (ph) only for the sake of correct pronunciation and for maintaining thesanctityoftheloanwordThesewerealsolikeusingBanglawritingsystemtoworkliketheIPAscriptItishowevernotinuseinBanglawritinginprinting
337Visargabiʃɔrgo(ঃ-U+0983)andAvagraha(ঽ-U+09BD)
TheVisargabiʃɔrgoU+0983 is frequentlyused inBangla loanwordsborrowed fromSanskritandrepresentsasoundveryclosetohOnecouldquoteasanexampleদঃখduhkholdquosorrowrsquorsquoldquounhappinessrsquorsquo(U+0926U+0941U+0983U+0916)The Avagraha ঽ (U+09BD) is mainly used in Sanskrit Pali Prakrt or Maithili textswritteninBanglaItisgraduallybeingreplacedbyanuppercomma(egনেরাঽপরািণre-writtenasনেরাrsquoপরািণ)ItisrarelyusednoweveninotherlanguagesusingBanglascriptIncaseofLGRtheAvagrahaisnotpartoftherepertoireIthasbeendecidedthereforenottoretainAvagraha(ঽ)(U+09BD)becauseitisblockedinTLDsaspertheMaximalStartingRepertoire(MSR)PleaseseeAppendixIIinsection11foracompletelistofBanglaconsonantsandtheirallographs
type in the following mannermdashর++য but for র G the sequence would be
র+ZWJ++য [154] In other words ZWJ is used in the rendering of wordsdemanding ya-phalā after ra which is otherwise not possible to type (render)due to the same order of ra+hasanta+antastha ja in the medial andor finalposition Interestingly ra+hasanta+antastha ja is used to type repha on theconsonant -antasthaja as inকায6 (kaarjo) In order to get a ya-phalā after the
consonant -ra it is therefore obligatory to use ZWJ after -ra as in র Gাপার
TheuseofZWJZWNJhavebeenruledoutfromtherootzonebythe[Procedure]Usedin Bangla to create alternate renderings the insertion of these two signs can affectsearchingaswellasNLPTheZeroWidthNon-joiner(ZWNJ)isaninvisiblecharacterusedincertaincases(afterHasanta)wheredefaultconjunctformationistobeexplicitlyrestrictedandtheHasantajoiningthetwoconsonantsparticipatingintheconjunctformationneedstobeexplicitlyshown
3.3.9 Use of Ya-phalā
There are two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E).
For rendering Ya-phalā after অ and এ, it is necessary to type U+09CD Hasanta plus U+09AF ya preceded by the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the [æ] sound, as in English ‘acid’ অ্যাসিড, ‘association’ অ্যাসোসিয়েশন, ‘bat’ ব্যাট, ‘fat’ ফ্যাট, ‘mat’ ম্যাট, ‘cap’ ক্যাপ, etc. The Brāhmī script by nature does not have a Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant; only consonants can carry the Hasanta. But, as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
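A minimal Python sketch of the restriction just described, assuming a label is checked character by character: a Hasanta may follow an independent vowel only when that vowel is অ or এ and the Hasanta is followed by য and া. The function name `hasanta_after_vowel_ok` and the use of the whole U+0985 to U+0994 range as "independent vowels" are simplifying assumptions, not the LGR's own definitions.

```python
HASANTA = "\u09cd"   # BENGALI SIGN VIRAMA
YA = "\u09af"        # BENGALI LETTER YA
AA_SIGN = "\u09be"   # BENGALI VOWEL SIGN AA
ALLOWED_VOWELS = {"\u0985", "\u098f"}  # অ (A) and এ (E)
# Simplification: treat the whole U+0985..U+0994 range as independent vowels.
INDEPENDENT_VOWELS = {chr(cp) for cp in range(0x0985, 0x0995)}


def hasanta_after_vowel_ok(label: str) -> bool:
    """Return False if a Hasanta follows an independent vowel anywhere
    other than in the sequences অ + ্ + য + া or এ + ্ + য + া."""
    for i, ch in enumerate(label):
        if ch != HASANTA or i == 0 or label[i - 1] not in INDEPENDENT_VOWELS:
            continue  # not a Hasanta that directly follows a vowel
        if label[i - 1] not in ALLOWED_VOWELS:
            return False
        if label[i + 1:i + 3] != YA + AA_SIGN:
            return False
    return True


print(hasanta_after_vowel_ok("\u0985\u09cd\u09af\u09be\u09b8\u09bf\u09a1"))  # অ্যাসিড: True
print(hasanta_after_vowel_ok("\u0986\u09cd\u09af\u09be"))  # Hasanta after আ: False
```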
Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka ‘the sun’); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra ‘cycle’). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases. The LGR makes a note of this point of concern with respect to the two RAs in disguise, as it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may consequently lead to concerns related to spoofing and other kinds of cyber irregularities. The motive to class these two code points as (blocking) variants is that fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of a following Hasanta. The difference between the RAs is only distinguishable if one looks into their Unicode values. Therefore labels such as অর্ক arka, শীর্ষ śīrṣa ‘top, apex’, অভ্র abhra ‘cloud, the sky’ and শ্রম śrama ‘physical labour’ could be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode code points. This point is made explicitly with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47) that are to follow. Moreover, it
4 Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) has been formed by members having experience in Linguistics (especially NLP and Computational Linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR assigned to each. However, an attempt is made to ensure that the fundamental philosophy behind building those LGRs is consistent across all Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about the Bangla LGR code points.
4.1 Guiding Principles
The NBGP adopts the following broad principles for the selection of code points in the code point repertoire, across the board for all the Neo-Brāhmī scripts within its ambit.
4.1.1 Inclusion Principles
4.1.1.1 Modern Usage
Every character proposed should be in everyday usage by a particular linguistic community. Characters which have been encoded in Unicode for transcription or archival purposes only will not be considered for inclusion in the code point repertoire.
4.1.1.2 Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.
4.2 Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards which are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.
4.2.1 External Limits of Scope
The code point repertoire for the root zone being a very special case at the top of protocol hierarchies, the canvas of available characters for selection as part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart
Out of all the characters that are needed by the script in question, if a particular character is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially so in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol
Unicode, being the character encoding standard, aims at the maximum possible representation of a given script/language and has encoded, as far as possible, all the characters needed by the script. However, domain names being a specialized case, they are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR being the repertoire of characters which are going to be used for the creation of Root Zone TLDs, which in turn constitute an even more specialized case of domain names, the Root Zone LGR procedure introduces additional exclusions on IDNA's allowed set of characters. Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), even if allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the MSR.
To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. The IDNA protocol further narrows this down, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
4.2.1.1 No Punctuation Marks
The TLDs being identifiers, punctuation marks present in Brāhmī-based scripts will not be included.
4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters like BENGALI ISSHAR ৺ (U+09FA), BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), etc. will also not be included.
4.2.1.3 No Rare and Obsolete Characters
There are characters which have been added to Unicode to accommodate rare forms, such as the Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), as well as VOCALIC LL ৡ (U+09E1) and the allographic -kāra forms of the latter two symbols: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, which complies with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the -kāra corresponding to VOCALIC RR ৠ (U+09E0), which is VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.
4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.
4.2.1.5 ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and the Appendix (Section 10.1).
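The successive filtering described in 4.2.1 can be pictured with a small Python sketch built on the standard `unicodedata` module. Two stages are deliberate approximations: the letter-or-mark stage simply keeps letters and marks by General Category (which also drops digits, symbols such as ISSHAR, and punctuation), and `MSR_EXCLUDED` is a hypothetical stand-in containing only the Avagraha example from the text. The published IDNA2008 derived properties and the MSR tables remain authoritative. The last set holds the characters the text itself excludes in 4.2.1.3, 5.2 and 5.3.

```python
import unicodedata

BENGALI_BLOCK = range(0x0980, 0x0A00)
# Hypothetical stand-in for the MSR exclusions (only the example given in the text).
MSR_EXCLUDED = {0x09BD}  # BENGALI SIGN AVAGRAHA
# Characters excluded by the NBGP: rare/obsolete (4.2.1.3, 5.2) and the bare nukta (5.3).
NBGP_EXCLUDED = {0x098C, 0x09E0, 0x09E1, 0x09E2, 0x09E3, 0x09D7, 0x09BC}


def assigned(cp: int) -> bool:
    """Stage 1: the code point is encoded in Unicode (not unassigned)."""
    return unicodedata.category(chr(cp)) != "Cn"


def letter_or_mark(cp: int) -> bool:
    """Stage 2 (approximation of IDNA and root-zone limits): letters and marks only."""
    return unicodedata.category(chr(cp))[0] in "LM"


def candidate_repertoire() -> list[int]:
    """Apply the filters in order: block, letter/mark, MSR stand-in, NBGP exclusions."""
    return [
        cp for cp in BENGALI_BLOCK
        if assigned(cp) and letter_or_mark(cp)
        and cp not in MSR_EXCLUDED and cp not in NBGP_EXCLUDED
    ]


repertoire = candidate_repertoire()
print(len(repertoire), "candidate code points")
```

The exact count printed depends on the Unicode version bundled with the Python interpreter, which is one more reason to treat the result as illustrative only.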
5 Repertoire
The Bangla writing system is represented in Unicode using the Bengali (Bangla) script name as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in Unicode has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in its 8th Meeting of the Indian Language Technologies and Products Sectional Committee (LITD 20), held on 23rd August 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 11 is valid for all three languages mentioned above.
For each of the code points, language references have been given in the last column, titled ‘Reference’, of Table 8, titled “Code Point Repertoire”. For complete coverage of Bangla code points, references from Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced any new complications, except for one particular character. Though only a few representative languages under EGIDS scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under Bangla Unicode Points (as given in Unicode 6.3). Before the details are presented, however, it is useful to look at the Bangla Code Point Chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs given below in the code charts are based on one of the many fonts designed and are not prescriptive, because there could be some variation in actual fonts, both Unicode-compatible and TrueType ones. Consider the following code point table.
Colour convention:
All characters that are included in the [MSR]: yellow background
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background
Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences which enable the conditional inclusion in the repertoire of BENGALI LETTER A and E followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, again followed by BENGALI VOWEL SIGN AA, so as to represent the [æ] sound as in English ‘bat’, ‘cat’, etc.
5.2 Code Point Repertoire Exclusion
There are some characters of the Bangla script that find a place in Unicode but have not been included in the repertoire in the LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.
5.3 Code point not used alone
BENGALI SIGN NUKTA (U+09BC) (see 3.3.6) is excluded from the repertoire since it will never be used alone. It will be used in sequences for three special characters in normalized form: ড়, ঢ়, য়.
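Why the nukta only ever appears inside a sequence can be checked with Python's standard `unicodedata` module. The precomposed letters ড় (U+09DC), ঢ় (U+09DD) and য় (U+09DF) are Unicode composition exclusions, so Normalization Form C (which IDNA2008 requires for labels) always yields base letter + U+09BC. The helper `has_bare_nukta` is a hypothetical illustration of rejecting a nukta that is not attached to ড, ঢ or য.

```python
import unicodedata

NUKTA = "\u09bc"
NUKTA_BASES = {"\u09a1", "\u09a2", "\u09af"}  # ড, ঢ, য


def code_points(text: str) -> str:
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


for precomposed in ("\u09dc", "\u09dd", "\u09df"):  # ড় ঢ় য়
    nfc = unicodedata.normalize("NFC", precomposed)
    print(code_points(precomposed), "->", code_points(nfc))
# U+09DC -> U+09A1 U+09BC
# U+09DD -> U+09A2 U+09BC
# U+09DF -> U+09AF U+09BC


def has_bare_nukta(label: str) -> bool:
    """True if, after NFC, a nukta appears anywhere except right after ড, ঢ or য."""
    label = unicodedata.normalize("NFC", label)
    return any(ch == NUKTA and (i == 0 or label[i - 1] not in NUKTA_BASES)
               for i, ch in enumerate(label))
```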
5.4.2 The Vowel Sequence
In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To facilitate understanding for users of other Brāhmī scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra onuʃʃār (B), a Candrabindu (D) or a Visarga biʃɔrgo (X). The number of signs from D, B or X which can follow a V in Bangla is not necessarily restricted to one. Going by the rules illustrated in the document, however, it is clear that formations such as VDD, VBB and VXX are invalid orthographic units. On the other hand, it is valid and possible to have formations or sequences such as an anusvara followed by a chandrabindu, and a visarga followed by a chandrabindu, as in হ্যাংচা ‘hænchā’ and হ্যাঃ ‘hæn’ respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu also exists in Bangla. A vowel can optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF, Bangla letter য or ‘ya’, represented by the sequence <U+09CD (BENGALI SIGN VIRAMA, also called Hasanta), U+09AF য BENGALI LETTER YA>. Ya-phalā has a special form, ্য. Again, when combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. ‘aa’ (ā)), it is used for transcribing [æ], as in the “a” of the English word “bat”, written in Bangla as ব্যাট. A vowel sequence admits the following combinations:
5.4.2.1 A Single Vowel
Example: V অ (Devanāgarī अ)
5.4.2.2 A Vowel with Conditions
A Vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].
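The vowel-sequence patterns of 5.4.2.1 and 5.4.2.2 can be sketched in Python by first mapping each character to its symbol (V, B, D, X, H, C, M) and then matching the symbol string with a regular expression. The coarse code point ranges used in `symbol` are simplifying assumptions for illustration, not the repertoire of Table 8, and the pattern follows the list in 5.4.2.2 literally.

```python
import re

SIGNS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09cd": "H"}


def symbol(ch: str) -> str:
    """Map one character to a WLE symbol using coarse, assumed ranges."""
    cp = ord(ch)
    if ch in SIGNS:
        return SIGNS[ch]
    if 0x0985 <= cp <= 0x0994:
        return "V"  # independent vowels
    if 0x0995 <= cp <= 0x09B9 or cp in (0x09F0, 0x09F1):
        return "C"  # consonants, plus the two Assamese letters
    if 0x09BE <= cp <= 0x09CC:
        return "M"  # dependent vowel signs (kāra)
    return "?"


# 5.4.2.1 and 5.4.2.2: V, optionally followed by B, D, X, DB, DX or H C M.
VOWEL_SEQUENCE = re.compile(r"V(?:B|D|X|DB|DX|HCM)?")


def is_vowel_sequence(text: str) -> bool:
    return VOWEL_SEQUENCE.fullmatch("".join(map(symbol, text))) is not None


print(is_vowel_sequence("\u0985\u0981\u0982"))  # অঁং (VDB): True
print(is_vowel_sequence("\u0985\u0982\u0982"))  # অংং (VBB): False
```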
5.4.3.2 A Consonant with Conditions
A Consonant is optionally followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
5.4.3.5 Subsets
While considering its subsets, as a representative example we will consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC [A]. The combination may be followed by M, B, D, X, DB or DX.
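Read literally, 5.4.3.2 and 5.4.3.5 give two patterns for a consonant cluster: a single C with one optional sign (M, B, D, X, H, DB or DX), or two to four consonants joined by Hasanta (CHC, CHCHC, CHCHCHC) with an optional M, B, D, X, DB or DX. The Python sketch below encodes only these two patterns, again with coarse, assumed character ranges. Combinations not stated in these two subsections (for instance a kāra followed by Anusvāra, as in বাং) are not covered, so this is not a whole-label validator.

```python
import re

SIGNS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09cd": "H"}


def symbol(ch: str) -> str:
    """Map one character to C, M, B, D, X or H (coarse, assumed ranges)."""
    cp = ord(ch)
    if ch in SIGNS:
        return SIGNS[ch]
    if 0x0995 <= cp <= 0x09B9 or cp in (0x09F0, 0x09F1):
        return "C"
    if 0x09BE <= cp <= 0x09CC:
        return "M"
    return "?"


CONSONANT_SEQUENCE = re.compile(
    r"C(?:M|B|D|X|H|DB|DX)?"             # 5.4.3.2: single consonant
    r"|C(?:HC){1,3}(?:M|B|D|X|DB|DX)?"   # 5.4.3.5: CHC, CHCHC, CHCHCHC
)


def is_consonant_sequence(text: str) -> bool:
    return CONSONANT_SEQUENCE.fullmatch("".join(map(symbol, text))) is not None


print(is_consonant_sequence("\u09b8\u09cd\u09a5\u09be"))  # স্থা (CHCM): True
print(is_consonant_sequence("\u0995\u09be\u09be"))        # কাা (CMM): False
```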
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’.
Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
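The KHANDA TA condition above can be sketched as a small context check in Python: wherever a Hasanta sits directly before ৎ (U+09CE), the character before that Hasanta must be র (U+09B0) or ৰ (U+09F0). The function name is hypothetical, and the sketch covers only this CHZ context, not the other rules involving ৎ.

```python
KHANDA_TA = "\u09ce"
HASANTA = "\u09cd"
RAS = {"\u09b0", "\u09f0"}  # Bangla RA, Assamese RA


def khanda_ta_context_ok(label: str) -> bool:
    """In a C + Hasanta + ৎ sequence, require C to be one of the two RAs."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i > 0 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] not in RAS:
                return False
    return True


print(khanda_ta_context_ok("\u09ad\u09b0\u09cd\u09ce\u09b8\u09a8\u09be"))  # ভর্ৎসনা: True
print(khanda_ta_context_ok("\u0995\u09cd\u09ce"))                          # ক্ৎ: False
```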
5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S in the first instance. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). For rendering Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya preceded by the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the [æ] sound as in English ‘bat’, ‘fat’, etc. The Brāhmī script by nature does not have a Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant; only consonants can carry the Hasanta. But, as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]). The other case refers to the formation of repha and ra-phalā in the said script, mentioned in the table above as P. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta +
10 Refer to Rule P in Section 7, Table 16.
6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusing variants in two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from character formations as normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. Confusable but not identical cases are not proposed, as there is another panel (the String Similarity Assessment Panel) entrusted with dealing with such cases. However, cases which belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Bangla akshara formation. They can cause confusion even to a careful observer and are hence being proposed as variants.
Variants arise in a script when two or more identical-looking forms are produced from different stored code point sequences. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant at one go, or obtain the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variants, because a kāra followed by a kāra is not allowed.
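The two ways of typing কো can be compared directly with Python's `unicodedata` module. IDNA2008 requires labels to be in Unicode Normalization Form C, and under NFC the two-key spelling (U+09C7 + U+09BE) composes to the single O-kāra U+09CB. The helper `kara_after_kara` is a hypothetical check for the "kāra followed by kāra" restriction on un-normalized input, using U+09BE to U+09CC as an assumed kāra range.

```python
import unicodedata

KA = "\u0995"
single_key = KA + "\u09cb"       # কো with O-kāra (U+09CB)
two_keys = KA + "\u09c7\u09be"   # কো as e-kāra + ā-kāra


def kara_after_kara(text: str) -> bool:
    """True if two dependent vowel signs (assumed range U+09BE..U+09CC) are adjacent."""
    is_kara = [0x09BE <= ord(ch) <= 0x09CC for ch in text]
    return any(a and b for a, b in zip(is_kara, is_kara[1:]))


print(single_key == two_keys)                                  # False
print(unicodedata.normalize("NFC", two_keys) == single_key)    # True
print(kara_after_kara(two_keys), kara_after_kara(single_key))  # True False
```

Normalizing before validation therefore removes the two-key spelling altogether, while the kāra-after-kāra check would reject it if it ever reached the rules un-normalized.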
6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I: As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) with Hasanta appears in a conjunct with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
The above combinations, if written in traditional orthography, could be a little confusing, since the থ (tha) in the conjunct looks like a হ (ha). The conjunct could occur in the initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly in the belief that it is a হ (ha, U+09B9), increasing the risks in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
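CASE I can be illustrated by a Python sketch that, for a given label, generates every label obtained by swapping থ (U+09A5) and হ (U+09B9) wherever they directly follow স or ন plus Hasanta. This only enumerates candidate variant labels under that reading of 6.1; how the variants are then dispositioned (for example, blocked) is governed by the LGR's variant rules and is not modelled here. The function name is hypothetical.

```python
HASANTA = "\u09cd"
THA, HA = "\u09a5", "\u09b9"
BASES = {"\u09b8", "\u09a8"}  # স (sa) and ন (na)
SWAP = {THA: HA, HA: THA}


def tha_ha_variants(label: str) -> set[str]:
    """Return the label together with every label obtained by swapping থ/হ
    in the contexts স + Hasanta + _ and ন + Hasanta + _."""
    results = {label}
    for i in range(2, len(label)):
        if label[i] in SWAP and label[i - 1] == HASANTA and label[i - 2] in BASES:
            # Earlier swaps never touch position i, so label[i] is still current.
            results |= {v[:i] + SWAP[label[i]] + v[i + 1:] for v in results}
    return results


for variant in sorted(tha_ha_variants("\u09b8\u09cd\u09a5\u09be\u09a8")):  # স্থান
    print(variant)
```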
6.2 Cross-Script Variants
A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Although there are certain characters which are somewhat similar, they have not been included here; they have been provided in the Appendix (10.3) for reference.
7 Whole Label Evaluation Rules (WLE)
This section provides the WLE rules that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications.
11 Unicode uses Oriya for the script, although Odia is now the official term.
12 As used by Unicode, denoting and including both Assamese and Maṇipuri.
Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table provided under the code point repertoire (Section 5.1):
C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1 or S2 (from Table 9), i.e. the (æ) Ya-phalā sequence (V1 H C1 M1), where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA)
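To show how the symbols above might be applied, here is a Python sketch that rewrites a label as a string of WLE symbols, collapsing the S sequence (V1 H C1 M1) and the P sequence (C2 H) before classifying single characters. The greedy left-to-right matching and the coarse code point ranges in `base_symbol` are assumptions made for the illustration, not part of the WLE definition.

```python
import re

SIGNS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09cd": "H", "\u09ce": "Z"}

S_PATTERN = re.compile("[\u0985\u098f]\u09cd\u09af\u09be")  # V1 H C1 M1
P_PATTERN = re.compile("[\u09b0\u09f0]\u09cd")              # C2 H


def base_symbol(ch: str) -> str:
    """Classify a single character (coarse, assumed ranges)."""
    cp = ord(ch)
    if ch in SIGNS:
        return SIGNS[ch]
    if 0x0985 <= cp <= 0x0994:
        return "V"
    if 0x0995 <= cp <= 0x09B9 or cp in (0x09F0, 0x09F1):
        return "C"
    if 0x09BE <= cp <= 0x09CC:
        return "M"
    return "?"


def to_symbols(label: str) -> str:
    """Map a label to WLE symbols, matching S and P greedily from the left."""
    out, i = [], 0
    while i < len(label):
        if m := S_PATTERN.match(label, i):
            out.append("S")
            i = m.end()
        elif m := P_PATTERN.match(label, i):
            out.append("P")
            i = m.end()
        else:
            out.append(base_symbol(label[i]))
            i += 1
    return "".join(out)


print(to_symbols("\u0985\u09cd\u09af\u09be\u09b8\u09bf\u09a1"))  # অ্যাসিড -> SCMC
print(to_symbols("\u0985\u09b0\u09cd\u0995"))                    # অর্ক -> VPC
```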
Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, and may not be attested, but there is no harm in admitting such ligatures because any user can easily create them.
7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋk ʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning ‘Bank of India’. Here two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for a full representation of the intended sound. By and large, however, the form of the first word without an H (U+09CD) is considered enough to represent the intended sound. This is a unique situation, necessitated by the lack of hyphen, space or Zero Width Non-Joiner in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptive similarity between two labels (with and without H) for the majority of linguistic communities; hence this is explicitly prohibited by the NBGP. In future, if required by the prevailing needs of the community, the NBGP may consider revisiting this rule.
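The prohibition can be expressed as a one-line Python check over adjacent character pairs. The sketch treats the U+0985 to U+0994 range as independent vowels (a simplification) and uses the ‘Bank of India’ label from the text, with the code points exactly as listed there, written with and without the Hasanta after ফ. The function name is illustrative.

```python
HASANTA = "\u09cd"
# Simplification: the whole U+0985..U+0994 range counts as independent vowels.
INDEPENDENT_VOWELS = {chr(cp) for cp in range(0x0985, 0x0995)}


def vowel_follows_hasanta(label: str) -> bool:
    """True if an independent vowel directly follows a Hasanta (prohibited)."""
    return any(a == HASANTA and b in INDEPENDENT_VOWELS
               for a, b in zip(label, label[1:]))


# ব্যাঙ্কঅফ্ইন্ডিয়া (with H after ফ) and the same label without that H
with_h = ("\u09ac\u09cd\u09af\u09be\u0999\u09cd\u0995\u0985\u09ab\u09cd"
          "\u0987\u09a8\u09cd\u09a1\u09bf\u09df\u09be")
without_h = with_h.replace("\u09ab\u09cd\u0987", "\u09ab\u0987")
print(vowel_follows_hasanta(with_h), vowel_follows_hasanta(without_h))  # True False
```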
7.2 Additional Examples from Bangla ABNF
Below are a few examples which help one understand some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.
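The first example given here, a label beginning with Visarga, is also ruled out at the protocol level: IDNA2008 (RFC 5891, Section 4.2.3.2) forbids a label from beginning with a combining mark. A minimal Python check, using the General Category from the standard `unicodedata` module, might look like this; the function name is illustrative.

```python
import unicodedata


def starts_with_combining_mark(label: str) -> bool:
    """True if the first character has General Category Mn, Mc or Me."""
    return bool(label) and unicodedata.category(label[0]).startswith("M")


print(starts_with_combining_mark("\u0983\u0995"))  # ঃক -> True
print(starts_with_combining_mark("\u0995\u0983"))  # কঃ -> False
```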
ঃক (ःक): As can be seen, such a combination will automatically result in a “golu” or dotted circle, marking it as an invalid formation. This is an intrinsic property of the
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink, and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What Has Happened So Far in Terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0, Core Specification, Chapter 12, p. 473.
10 Appendix-I
10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language/script, certain restriction rules apply. In other words, in a given language some of the Formalism structures do not necessarily apply. To take care of such cases, restriction rules are set in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:
1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although we can cite a couple of native examples, for instance the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ written by Kazi Nazrul Islam. There are, however, instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across the possibility of Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).
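Rule 1 can be sketched as a check on the first character of a label. Only ৎ, ঞ and ঙ, which the rule names for IDN labels, are placed in the hypothetical `NO_INITIAL` set; য় is left out because the text cites attested word-initial uses, and the ৎস case from recent transliterations is not given special treatment, so ৎসেরিং is rejected here exactly as the rule is stated.

```python
NO_INITIAL = {
    "\u09ce",  # ৎ KHANDA TA
    "\u099e",  # ঞ NYA
    "\u0999",  # ঙ NGA
}


def bad_initial(label: str) -> bool:
    """True if the label begins with a character that rule 1 forbids initially."""
    return label[:1] in NO_INITIAL


print(bad_initial("\u09ce\u09b8\u09c7\u09b0\u09bf\u0982"))  # ৎসেরিং -> True
print(bad_initial("\u09ac\u09be\u0982\u09b2\u09be"))        # বাংলা -> False
```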
10.2 ‘Sylheti Nagarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).
13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri have not been covered in this section.
Table 17: The Script Table of Sylheti Nagarī or Siloti
10.3 Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.
[Table: confusable conjunct forms grouped under ক (k), গ (g) and ণ (n), each listed with its component letters]
5421 ASingleVowel
ExamplesV অअ
5422 AVowelwithConditionsAVowelcanoptionallybefollowedbyAnusvāra[B]orCandrabindu[D]orVisarga[X]or Candrabindu+ Anusvāra [DB] or Candrabindu+ Visarga [DX] or combination ofHasanta(orVirama)[H]followedbyConsonant[C]followedbykāra(Mātrā)[M]
5432 AConsonantwithConditionsAConsonantoptionallyfollowedbydependentvowelsignkāra(Mātrā)[M]orAnusvāra [B] or Candrabindu [D] or Visarga [X] or Hasanta (also known asVirama)[H]orCandrabindu+Anusvāra[DB]orCandrabindu+ Visarga[DX]
5435 Subsets While considering its subsets as a representative example we will consider thecombinationCHConlyhoweverthesameisequallyapplicabletoCHCHCandCHCHCHC[A]ThecombinationmaybefollowedbyMBDXDBorDX
Example র++ৎ =ৎHasinভৎHসনা (bhartsanā) scoldingNoteTheconditionsinthiscontextofKHANDATAarethattheCshouldbeeitherRAU+09B0(র)(usedinBangla)orRAU+09F0(ৰ)(usedinAssamese)
545 SpecialCasesSandPTwospecialcasesinvolvingSequences(referredtoasSandPinTable16underSection7)couldbedescribedbrieflyhereLetustakeupSinthefirstinstanceItisnoteworthythat thereare two instances inBanglawhereHasanta(U+09CD) isprecededbya full
10 Refer to Rule P in Section 7 Table 16
42
vowel (U+0985 অ - BANGLA LETTER A and U+098F এ - BANGLA LETTER E) ForrenderingYa-phalāfollowedbyঅandএitisnecessarytotypeU+09CDplusU+09AFyapreceded by the said vowels This is a purely ligatural entity and the addition ofYa-phalāandā-kāraisusedtoelicittheaeligsoundasinEnglishlsquobatrsquolsquofatrsquoetcTheBrahmıscriptbynaturedoesnothaveHasantaafteravowelHasantaisgenerallydescribedaslsquovowel killerrsquo although it actually indicates absence of a vowel after the markedconsonant Only the consonants can have the Hasanta marked But as we see hereBanglaendsupwithadeviantfeatureintheorthographyhereinwhichHasantacomesimmediatelyafteravowelinligaturesঅGাandএGা(CfUnicode100p473[100])Another case refers to the formation of repha and ra-phalā in the said script andmentioned in the tableaboveasPOwing toco-occurrencewithHASANTARAeitherlosesitsownimplicitvowel(REPHA)orsuppressestheimplicitvoweloftheprecedingconsonant(RA-PHALA )Forinstancerepha=ra+Hasanta+C(egকHiera+Hasanta+
6 VariantsThissection talksabout thevariants in theBanglascriptTheNBGPcategorizes theseconfusinglyvariantsintwogroupsGroup1ConfusingduetopurevisualsimilarityGroup2ConfusingduetodeviationfromnormallyperceivedcharacterformationsbylargerlinguisticcommunityForGroup1anyidenticalcodepointsaredefinedasvariantsTheconfusablebutnotidenticalcasesarenotproposedasthereisanotherpanel(Stringsimilarityassessmentpanel)entrustedtodealwithsuchcasesHowevercaseswhichbelongtoGroup2areproposedtobeconsideredasvariantsThesecasesarenotofmerevisualsimilarityasthey involve some deviations from the widely accepted norms of Bangla Aksharformations These can cause confusion even to a careful observer and hence beingproposedasvariantsThe variants are generated in a script when two or more forms are formed withdifferent storage or code points In Bangla the e-kāra ā-kāra and the o-kāra havedifferentcodepointsOnecantypeowithaconsonantatonegoandthesamebytypinge-kāra and ā-kāra as two separate keys getting the same results A reader cannotdifferentiatebetween the twoko (েকা)one typedwitha singlekeyand theotheronetypedwithtwodifferentkeysMoreoverthiswillnotbeconsideredasacaseofvariantbecauseakārafollowedbyakāraisnotallowed
61 InScriptVariantsHoweverweproposetwocasesoftruein-scriptvariantsinBanglascriptCASEIAs far as true variants in Bangla are concernedwemay drawour attention to caseswhereinHasantawith(U+09A5)থ(tha)appearsasconjunctwith(U+09B8)স(sa)and(U+09A8)ন(na)1 স+Hasanta+থ(U+09B8+U+09CD+U+09A5)versus
স +Hasanta+হ(U+09B8+U+09CD+U+09B9)
2 ন+Hasanta+থ(U+09A8+U+09CD+U+09A5)versus
ন+Hasanta+হ(U+09A8+U+09CD+U+09B9)
44
Theabovecombinationsifwrittenintraditionalorthographycouldbelittleconfusingwheretheথ(tha)inconjunctappearslikeaহ(ha)Theconjunctcouldbeintheinitialmedialorfinalpositions(asshownbelowinegno1)Itcouldbetypedwrongaswellthinking itwas aহ (ha)U+09B9 increasing the chances of risks in labelwriting andidentificationExamples1 acuteandসহ(asinacuteানsthānaacuteলsthulaাacuteGsvāsthyaঅacuteায়ীasthāyı)
62 CrossScriptVariantsAcrispcrossscriptstudyforBanglahasbeendonewithrespecttosisterscriptssuchasDevanāgarī Gurmukhı and Odia11(formerly Oriya) keeping in mind the visual andtechnicalconfusionstheymaycauseaslabelsonthewebdomainMoreoverthereisnoin-script variant in Bangla as far as the orthography is concerned The followingcharacters are being proposed by the NBPG as variants Although there are certaincharacters which are somewhat similar they but have not been included here TheyhavebeenprovidedintheAppendix(102)forreference
7 WholeLabelEvaluationRules(WLE) ThissectionprovidestheWLEsthatarerequiredbyallthelanguagesmentionedinsection32whenwritten inBangla12ScriptThe ruleshavebeendrafted in suchawaythattheycanbeeasilytranslatedintotheLGRspecifications
11 Unicode uses Oriya for the script although Odia is now the official term used 12 As used by the Unicode denoting and including both Assamese and Maṇipuri
47
Below are the symbols used in the WLE rules for each of the Indic SyllabicCategoryasmentionedinthetableprovidedinCodepointrepertoire(Section51)
C rarr Consonant
M rarr Kāra(Mātrā)
V rarr Vowel
B rarr Anusvāra
D rarr Candrabindu
X rarr Visarga
H rarr Hasanta
Z rarr KhandaTa
S rarr S1S2(fromTable9)or(ae)Ya-phalā(V1HC1M1)whereV1iseither0985(অ-BENGALILETTERA)or098F(এ-BENGALILETTERE)His09CD(-BENGALISIGNVIRAMA)C1is-09AF(য-BENGALILETTERYA)M1is-09BE(া-BENGALIVOWELSIGNAA)S1 and S2 are valid even they are not allowed by the other context rules
P rarr Ra-Hasanta(C2H)whereC2iseither09B0(র-BENGALILETTERRA)or 09F0 (ৰ - ASSAMESE LETTER RA
Now letuselaborateeach rulewithexamples from the scriptkeeping inmindthe Bangla Assamese and Manipuri communities Some combinations ofcharactersmayseemunrealisticorrareinusagebutthereisnoharminaddingsuch ligaturesbecause it ispossible tocreate thembyanyusereasilybutmaynotbeattestedcombinations
49
711 Case of V Preceded by H There could be cases involvingmulti-word domains where Vmay need to beallowedtofollowanHegব8াtঅuইিvয়া baeligŋkʌv ɪndiə (U+09ACU+09CDU+09AFU+09BEU+0999U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1U+09BFU+09DFU+09BE)(meaningBankofIndia)ThisisthecasewheretwodifferentwordsarejoinedtogetherfirstofwhichendswithanH(অu)andthesecondwordbeginswithaV(ইিvয়া)Somesectionsofthe linguistic community require the explicit presence of H for fullrepresentation of the sound intended However by and large the form of thefirstwordwithoutanH(U+09CD)isconsideredenoughforfullrepresentationofthesoundintendedforthefirstwordThis isauniquesituationnecessitatedbythelackofhyphenspaceortheZeroWidthNon-joinercharacterinthepermissiblesetofcharactersintheRootzonerepertoire Otherwise V is never required to be allowed to follow an HPermittingthismaycreateaperceptivesimilaritybetweentwolabels(withandwithout H) for majority of the linguistic communities hence this is explicitlyprohibitedbytheNBGPIn future if required depending on the prevailing requirements from thecommunitythefutureNBGPmayconsiderrevisitingthisrule
72 AdditionalExamplesfromBanglaABNF
Belowarea fewexampleswhichhelponeunderstandsomeof the rulesABNFputsinplaceThesearejustgivenforreferencepurposesandarenotmeanttobecomprehensive
ঃক ःकAscanbeseensuchcombinationwillresultautomaticallyinaldquogolurdquooradottedcircle marking it as an invalid formation This is an intrinsic property of the
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
10.2. 'Sylheti Nāgarī lipi' or 'Siloti'
This version of Bangla script resembles the 'Kaithī' script (ISO 15924: Kthi) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, and widely in use during the 1880s. There were several other names for Sylheti Nāgarī or Siloti (129), such as 'Jalalabada Nāgarī', 'Fula (flower) Nāgarī', 'Muslim Nāgarī' or 'Muhammad Nāgarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th–14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Others ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here (138).

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.
Table 17: The Script Table of Sylheti Nāgarī or Siloti
10.3. Confusable Code Points
The following code points were analysed and found to be either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
type in the following manner: র + ্ + য; but for র‍্য the sequence would be র + ZWJ + ্ + য [154]. In other words, ZWJ is used in rendering words that demand a ya-phalā after ra, which otherwise cannot be typed (rendered), because the same order of ra + hasanta + antastha ja in medial and/or final position already produces something else: ra + hasanta + antastha ja is used to type a repha on the consonant antastha ja, as in কার্য (kaarjo). To get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার.
The use of ZWJ and ZWNJ has been ruled out from the root zone by the [Procedure]. Because they are used in Bangla to create alternate renderings, the insertion of these two signs can affect searching as well as NLP. The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly suppressed and the Hasanta joining the two consonants that would otherwise form the conjunct needs to be explicitly shown.
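At the code point level the contrast can be sketched as follows. This is a minimal Python illustration, not part of the LGR; the helper `contains_joiner` is a hypothetical name. It builds the repha-on-ya sequence and the ZWJ sequence described above, and shows how a label containing either joiner could be screened out, since neither ZWJ nor ZWNJ is available in the root zone.

```python
ZWJ = "\u200D"   # ZERO WIDTH JOINER
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
RA, HASANTA, YA = "\u09B0", "\u09CD", "\u09AF"

# ra + hasanta + ya: renders as a repha on ya (as in kārya)
repha_on_ya = RA + HASANTA + YA
# ra + ZWJ + hasanta + ya: renders as ra followed by ya-phalā
ra_with_ya_phala = RA + ZWJ + HASANTA + YA


def contains_joiner(label: str) -> bool:
    """Return True if the label contains ZWJ or ZWNJ (hypothetical pre-check)."""
    return ZWJ in label or ZWNJ in label


for name, seq in [("repha on ya", repha_on_ya), ("ra + ya-phala", ra_with_ya_phala)]:
    code_points = " ".join(f"U+{ord(c):04X}" for c in seq)
    print(f"{name}: {code_points} joiner={contains_joiner(seq)}")
```

The two sequences differ only by U+200D, which is exactly why the ZWJ form cannot be expressed once joiners are excluded.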
3.3.9. Use of Ya-phalā
(a) Ya-phalā-ā sequences: There are two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ - BENGALI LETTER A and U+098F এ - BENGALI LETTER E).
To render Ya-phalā after অ or এ, it is necessary to type U+09CD Hasanta plus U+09AF ya immediately after the vowel. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English 'acid' অ্যাসিড, 'association' অ্যাসোসিয়েশন, 'bat' ব্যাট, 'fat' ফ্যাট, 'mat' ম্যাট, 'cap' ক্যাপ, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant; only consonants can carry the Hasanta mark. But, as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka 'the sun'), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra 'cycle'). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases. The LGR makes a note of this point of concern with respect to the two RAs in disguise, as it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may consequently lead to concerns related to spoofing and other kinds of cyber irregularities. The motive for classing these two code points as (blocking) variants is that fully rendered labels may mask the distinction between Bangla ra র (U+09B0) and Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of a following Hasanta. The difference between the RAs is only distinguishable if one looks into their Unicode values. Therefore, labels such as অর্ক arka, শীর্ষ śīrṣa 'top, apex', অভ্র abhra 'cloud, the sky' and শ্রম śrama 'physical labour' could be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode code points. This point is made explicitly with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47) that are to follow. Moreover, it
4. Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) has been formed by members with experience in Linguistics (especially NLP and Computational Linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR assigned to each. However, an attempt is made to ensure that the fundamental philosophy behind building those LGRs is consistent across all the Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use Bangla script. The following guiding principles are used in making decisions about Bangla LGR code points.
4.1. Guiding Principles
The NBGP adopts the following broad principles for the selection of code points in the code point repertoire, across the board, for all the Neo-Brāhmī scripts within its ambit.
4.1.1. Inclusion Principles
4.1.1.1. Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription purposes only, or for archival purposes, will not be considered for inclusion in the code point repertoire.
4.1.1.2. Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.
4.2. Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards which are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.
4.2.1. External Limits of Scope
The code point repertoire for the root zone being a very special case at the top of protocol hierarchies, the canvas of available characters for selection as part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart: Out of all the characters that are needed by the script in question, if a particular character is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially so in the Bangla-Asamiyā-Manipuri Writing System, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol: Unicode, being the character-encoding standard for providing the maximum possible representation of a given script/language, has encoded as far as possible all the characters needed by the script. However, domain names being a specialized case, they are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR): The Root Zone LGR being the repertoire of characters which are going to be used for the creation of Root Zone TLDs, which in turn constitute an even more specialized case of domain names, the Root Zone LGR procedure introduces additional exclusions on IDNA's allowed set of characters. Example: BANGLA SIGN AVAGRAHA ঽ (U+09BD), even though allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the MSR.
To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. The IDNA Protocol further narrows this down, and finally an additional filter, in the form of the Maximal Starting Repertoire, restricts the character set associated with the given language even more.
4.2.1.1. No Punctuation Marks
The TLDs being identifiers, punctuation marks present in Brāhmī-based scripts will not be included.
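The three successive filters of 4.2.1 can be pictured as a pipeline in which each layer can only remove candidates. In the Python sketch below, the Unicode filter uses the real `unicodedata.category` check (category `Cn` means unassigned), while the IDNA2008 and MSR sets are small hypothetical stand-ins chosen for illustration; the real sets are defined by IDNA2008 and the MSR, not here.

```python
import unicodedata

# Hypothetical stand-ins for illustration only; the real PVALID set comes from
# IDNA2008 and the real MSR is published separately by ICANN.
HYPOTHETICAL_IDNA_PVALID = {"\u0985", "\u0995", "\u09BD", "\u09CD"}
HYPOTHETICAL_MSR = {"\u0985", "\u0995", "\u09CD"}


def is_encoded(ch: str) -> bool:
    """Filter i: the code point must be assigned in Unicode (category != Cn)."""
    return unicodedata.category(ch) != "Cn"


FILTERS = [
    ("Unicode", is_encoded),
    ("IDNA2008", lambda ch: ch in HYPOTHETICAL_IDNA_PVALID),
    ("MSR", lambda ch: ch in HYPOTHETICAL_MSR),
]


def classify(candidates: str) -> dict:
    """Report, for each candidate, the first filter that removes it."""
    result = {}
    for ch in candidates:
        result[ch] = "accepted"
        for name, keep in FILTERS:
            if not keep(ch):
                result[ch] = f"excluded by {name}"
                break
    return result


# U+0985 LETTER A, U+09BD AVAGRAHA, U+09A9 (unassigned in the Bengali block)
for ch, verdict in classify("\u0985\u09BD\u09A9").items():
    print(f"U+{ord(ch):04X}: {verdict}")
```

Ordering the filters this way mirrors the text: a character that fails an earlier layer is never even considered by a later one.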
4.2.1.2. No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters like BANGLA ISSHAR ৺ (U+09FA), BANGLA CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), etc. will also not be included.
4.2.1.3. No Rare and Obsolete Characters
There are characters which have been added to Unicode to accommodate rare forms, such as Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), as well as VOCALIC LL ৡ (U+09E1) and the allographic kāra forms of the latter two symbols: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, which complies with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the kāra corresponding to VOCALIC RR ৠ (U+09E0), which is VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.
4.2.1.4. No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.
4.2.1.5. ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).
5. Repertoire
The Bangla writing system is represented in UNICODE using the Bengali (Bangla) script name as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in UNICODE has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in the 8th Meeting of its Indian Language Technologies and Products Sectional Committee, LITD 20, held on 23rd August 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the UNICODE Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 11 is valid for all three languages above.
For each of the code points, language references have been given in the last column, titled Reference, under Table 8, titled the "Code Point Repertoire". For entire coverage of Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in Bangla script is not known to have introduced many new complications, except for one particular character. Though only a few representative languages under EGIDS Scale 1–4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under Bangla Unicode Points (as given in UNICODE 6.3). However, before the details are presented, it is ideal to look at the Bangla Code Point Chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs given below in the code charts are based on one of the many fonts designed and are not prescriptive, because there could be some variation in actual fonts, both UNICODE-compatible and TrueType ones. Consider the following code point table:
Colour convention:
Yellow background: all characters that are included in the [MSR].
Pinkish background: PVALID in IDNA2008 but excluded from the [MSR].
White background: not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen).
Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences which enable the conditional inclusion, in the repertoire, of Bangla LETTER A or E followed by Bangla SIGN VIRAMA and Bangla LETTER YA, again followed by Bangla VOWEL SIGN AA, so as to allow the æ sound as in English 'bat', 'cat', etc.
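One way to recognise these conditional sequences in an NFC-normalised label is a simple pattern match. The Python sketch below is illustrative only; the pattern and the function name `ae_sequence_offsets` are assumptions, not taken from the LGR XML file.

```python
import re

# LETTER A (U+0985) or LETTER E (U+098F), then VIRAMA (U+09CD),
# LETTER YA (U+09AF) and VOWEL SIGN AA (U+09BE).
AE_SEQUENCE = re.compile("[\u0985\u098F]\u09CD\u09AF\u09BE")


def ae_sequence_offsets(label: str) -> list:
    """Return the start offset of every A/E + ya-phala + aa sequence in the label."""
    return [m.start() for m in AE_SEQUENCE.finditer(label)]


acid = "\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"  # অ্যাসিড ('acid')
bat = "\u09AC\u09CD\u09AF\u09BE\u099F"                # ব্যাট ('bat'): ya-phalā on a consonant
print(ae_sequence_offsets(acid), ae_sequence_offsets(bat))
```

Only the vowel-initial form needs the special sequence; ya-phalā after a consonant, as in ব্যাট, is already covered by the ordinary consonant rules.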
5.2. Code Point Repertoire Exclusion
There are some characters of the Bangla script that find a place in Unicode but have not been included in the repertoire in this LGR proposal. The reason for excluding ঌ (U+098C) and ৗ (U+09D7) is that they are rare and obsolete characters.
5.3. Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it will never be used alone. It will be used in sequence in three special characters in normalized form: ড়, ঢ়, য়.
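Because IDNA2008 requires labels to be in Unicode Normalization Form C (NFC), the precomposed letters ড়, ঢ় and য় (U+09DC, U+09DD, U+09DF) never survive as single code points: they are composition exclusions, so NFC always yields base letter + NUKTA. The Python sketch below, using only the standard `unicodedata` module, illustrates why NUKTA appears only inside such sequences; the helper name `nfc_code_points` is hypothetical.

```python
import unicodedata


def nfc_code_points(text: str) -> list:
    """Return the code points of the NFC form of text."""
    return [ord(c) for c in unicodedata.normalize("NFC", text)]


for precomposed in ("\u09DC", "\u09DD", "\u09DF"):  # RRA, RHA, YYA
    parts = " ".join(f"U+{cp:04X}" for cp in nfc_code_points(precomposed))
    print(f"U+{ord(precomposed):04X} -> {parts}")
# U+09DC -> U+09A1 U+09BC
# U+09DD -> U+09A2 U+09BC
# U+09DF -> U+09AF U+09BC
```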
5.4.2. The Vowel Sequence
In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To facilitate understanding for users of other Brāhmī scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra, onuʃʃār (B), a Candrabindu (D) or a Visarga, biʃɔrgo (X). More than one of D, B or X may follow a V in Bangla; however, going by the rules illustrated in this document, formations such as VDD, VBB and VXX are invalid orthographic units. It is valid and possible to have sequences such as a candrabindu followed by an anusvāra on the one hand, and a candrabindu followed by a visarga on the other, as in হ্যাংচা 'hæŋchā' and হ্যাঃ 'hæŋ' respectively: the possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu exists in Bangla. A vowel can optionally be followed by a combination of Hasanta/Virama [H] + Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF, Bangla letter য or 'ya', represented by the sequence <U+09CD (BENGALI SIGN VIRAMA, also called Hasanta), U+09AF য BENGALI LETTER YA>. Ya-phalā has a special form, ্য. When combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. 'aa' (ā)), it is used for transcribing [æ], as in the 'a' of the English word 'bat', written in Bangla as ব্যাট. A vowel sequence admits the following combinations:
5.4.2.1. A Single Vowel
Example: V অ (Devanāgarī अ)
5.4.2.2. A Vowel with Conditions
A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].
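Read over a string of WLE symbols (one symbol per code point, as defined in Section 7), 5.4.2.1 and 5.4.2.2 amount to a small regular language. The Python sketch below is an illustration under stated assumptions, not the LGR's own encoding: it covers only the alternatives listed above and treats HCM generically, leaving the specific a/e + ya + aa restriction to rule S.

```python
import re

# 5.4.2.1 / 5.4.2.2: V, optionally followed by B, D, X, DB, DX or H C M.
VOWEL_SEQUENCE = re.compile(r"V(?:DB|DX|B|D|X|HCM)?")


def is_vowel_sequence(symbols: str) -> bool:
    """True if the whole symbol string is one valid vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(symbols) is not None


for s in ("V", "VB", "VDB", "VDX", "VHCM", "VDD", "VBB", "VXX", "VBD"):
    print(s, is_vowel_sequence(s))
```

Using `fullmatch` rather than `match` is what makes VDD, VBB and VXX fail: the optional group may consume at most one of the listed alternatives.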
5.4.3.2. A Consonant with Conditions
A consonant is optionally followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
5.4.3.5. Subsets
In considering its subsets, as a representative example we will consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.
Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
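The consonant rules can be sketched the same way over WLE symbols. Again, this is a hypothetical Python rendering of only the alternatives listed in 5.4.3.2 and 5.4.3.5 [A]; the other sub-rules of 5.4.3, which are not reproduced in this section, may widen what is accepted, so a rejection here means "not covered by these two rules" rather than "invalid".

```python
import re

# 5.4.3.2: C optionally followed by M, B, D, X, H, DB or DX.
CONSONANT_WITH_CONDITIONS = re.compile(r"C(?:DB|DX|M|B|D|X|H)?")
# 5.4.3.5 [A]: CHC, CHCHC or CHCHCHC, optionally followed by M, B, D, X, DB or DX.
CONJUNCT = re.compile(r"C(?:HC){1,3}(?:DB|DX|M|B|D|X)?")


def matches(pattern, symbols: str) -> bool:
    """True if the whole symbol string matches the given compiled pattern."""
    return pattern.fullmatch(symbols) is not None


print(matches(CONSONANT_WITH_CONDITIONS, "CM"))   # consonant + kāra
print(matches(CONSONANT_WITH_CONDITIONS, "CDD"))  # repeated candrabindu
print(matches(CONJUNCT, "CHCM"))                  # conjunct followed by a kāra
print(matches(CONJUNCT, "CHCHCHCHC"))             # four Hasantas: beyond the stated subsets
```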
5.4.5. Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ - BANGLA LETTER A and U+098F এ - BANGLA LETTER E). To render Ya-phalā after অ or এ, it is necessary to type U+09CD plus U+09AF ya immediately after the vowel. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant; only consonants can carry the Hasanta mark. But, as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]). Another case refers to the formation of repha and ra-phalā in the said script, mentioned in the table above as P. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta +

10 Refer to Rule P in Section 7, Table 16.
6. Variants
This section talks about the variants in the Bangla script. The NBGP categorizes these confusable variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from character formations as normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. The confusable but not identical cases are not proposed, as there is another panel (the String Similarity Assessment Panel) entrusted to deal with such cases. However, cases which belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Bangla akshara formation. They can cause confusion even to a careful observer and are hence being proposed as variants.
Variants are generated in a script when two or more forms are formed with different storage or code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant at one go, or obtain the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variant, because a kāra followed by a kāra is not allowed.
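The equivalence of the two ways of typing কো can be checked directly with Python's standard `unicodedata` module. VOWEL SIGN O (U+09CB) is canonically equivalent to VOWEL SIGN E + VOWEL SIGN AA (U+09C7 U+09BE), so NFC, which IDNA2008 requires for labels, folds the two-key form into the one-key form before any LGR rule sees it. This is offered as a complementary observation; the document's own reason is the prohibition on a kāra following a kāra.

```python
import unicodedata

one_key = "\u0995\u09CB"         # KA + VOWEL SIGN O
two_keys = "\u0995\u09C7\u09BE"  # KA + VOWEL SIGN E + VOWEL SIGN AA

print(unicodedata.normalize("NFC", two_keys) == one_key)  # True
print(unicodedata.normalize("NFD", one_key) == two_keys)  # True
```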
6.1. In-Script Variants
However, we propose two cases of true in-script variants in Bangla script.
CASE I: As far as true variants in Bangla are concerned, we may draw our attention to cases wherein থ (tha, U+09A5) with Hasanta appears as a conjunct with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus
স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus
ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
The above combinations, if written in traditional orthography, could be a little confusing, as the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly, thinking it was a হ (ha) U+09B9, increasing the risks in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
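A sketch of how the CASE I pairs could be used to enumerate variant labels follows. It is a hypothetical Python illustration: it treats each pair as an interchangeable three-code-point sequence, and since this section does not state whether these variants are blocked or allocatable, it only lists them. The names `VARIANT_PAIRS` and `variant_labels` are assumptions.

```python
from itertools import product

# CASE I: sa/na + Hasanta + tha  <->  sa/na + Hasanta + ha
VARIANT_PAIRS = {
    "\u09B8\u09CD\u09A5": "\u09B8\u09CD\u09B9",  # স্থ -> স্হ
    "\u09A8\u09CD\u09A5": "\u09A8\u09CD\u09B9",  # ন্থ -> ন্হ
}
# Make the mapping symmetric so either form generates the other.
VARIANT_PAIRS.update({v: k for k, v in list(VARIANT_PAIRS.items())})


def variant_labels(label: str) -> set:
    """Return every other label obtained by swapping any subset of CASE I conjuncts."""
    options, i = [], 0
    while i < len(label):
        chunk = label[i:i + 3]
        if chunk in VARIANT_PAIRS:
            options.append((chunk, VARIANT_PAIRS[chunk]))  # swappable conjunct
            i += 3
        else:
            options.append((label[i],))  # fixed code point
            i += 1
    return {"".join(choice) for choice in product(*options)} - {label}


sthana = "\u09B8\u09CD\u09A5\u09BE\u09A8"  # স্থান
print(variant_labels(sthana))
```

A label with n such conjuncts yields 2^n - 1 variants, which stays small for realistic TLD lengths.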
6.2. Cross-Script Variants
A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusions they may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Although there are certain characters which are somewhat similar, they have not been included here; they have been provided in the Appendix (10.3) for reference.
7. Whole Label Evaluation Rules (WLE)
This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications.
11 Unicode uses Oriya for the script, although Odia is now the official term.
12 As used by Unicode, denoting and including both Assamese and Maṇipuri.
Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table in Code Point Repertoire (Section 5.1):
C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), or the (a/e) Ya-phalā sequence (V1 H C1 M1), where V1 is either 0985 (অ - BENGALI LETTER A) or 098F (এ - BENGALI LETTER E), H is 09CD (্ - BENGALI SIGN VIRAMA), C1 is 09AF (য - BENGALI LETTER YA) and M1 is 09BE (া - BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র - BENGALI LETTER RA) or 09F0 (ৰ - ASSAMESE LETTER RA)
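To apply the rules mechanically, a label first has to be turned into a string of these symbols. The Python sketch below is a hypothetical per-code-point classifier: the ranges are an assumed reading of the Bengali block, not a copy of the document's code point table, and it deliberately does not build the multi-code-point symbols S and P, which a real implementation would need to recognise first. NUKTA, which has no symbol of its own, is simply skipped so that NFC forms such as ড + NUKTA classify as a single C.

```python
def wle_symbol(ch: str) -> str:
    """Map one Bengali code point to its WLE symbol (hypothetical ranges)."""
    cp = ord(ch)
    singles = {0x0981: "D", 0x0982: "B", 0x0983: "X", 0x09CD: "H", 0x09CE: "Z"}
    if cp in singles:
        return singles[cp]
    if 0x0985 <= cp <= 0x0994:
        return "V"
    if 0x0995 <= cp <= 0x09B9 or cp in (0x09DC, 0x09DD, 0x09DF, 0x09F0, 0x09F1):
        return "C"
    if 0x09BE <= cp <= 0x09CC:
        return "M"
    raise ValueError(f"U+{cp:04X} has no WLE symbol in this sketch")


def symbolize(label: str) -> str:
    """Concatenate WLE symbols for a label, skipping NUKTA (U+09BC)."""
    return "".join(wle_symbol(ch) for ch in label if ch != "\u09BC")


print(symbolize("\u0985\u09B0\u09CD\u0995"))        # অর্ক -> VCHC
print(symbolize("\u09AC\u09CD\u09AF\u09BE\u099F"))  # ব্যাট -> CHCMC
```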
Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in adding such ligatures: any user can easily create them, even though they may not be attested combinations.
7.1.1. Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE) (meaning 'Bank of India'). This is the case where two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered enough for full representation of the sound intended for the first word. This is a unique situation, necessitated by the lack of hyphen, space or the Zero Width Non-Joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of linguistic communities; hence it is explicitly prohibited by the NBGP. In future, if required by the prevailing needs of the community, a future NBGP may consider revisiting this rule.
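The prohibition can be checked by scanning adjacent code point pairs. In this hypothetical Python sketch, an independent vowel (assumed here to be U+0985 to U+0994) immediately after Hasanta makes the label fail; the test input reuses the code point sequence quoted above, once with and once without the Hasanta after ফ.

```python
HASANTA = 0x09CD
INDEPENDENT_VOWELS = range(0x0985, 0x0995)  # U+0985..U+0994 (assumed V range)


def has_vowel_after_hasanta(label: str) -> bool:
    """True if any independent vowel immediately follows Hasanta."""
    cps = [ord(c) for c in label]
    return any(a == HASANTA and b in INDEPENDENT_VOWELS for a, b in zip(cps, cps[1:]))


bank_of_india = "".join(chr(cp) for cp in (
    0x09AC, 0x09CD, 0x09AF, 0x09BE, 0x0999, 0x09CD, 0x0995, 0x0985, 0x09AB,
    0x09CD, 0x0987, 0x09A8, 0x09CD, 0x09A1, 0x09BF, 0x09DF, 0x09BE,
))
without_h = bank_of_india.replace("\u09AB\u09CD\u0987", "\u09AB\u0987")

print(has_vowel_after_hasanta(bank_of_india))  # True: rejected under 7.1.1
print(has_vowel_after_hasanta(without_h))      # False
```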
7.2. Additional Examples from Bangla ABNF
Below are a few examples which help one understand some of the rules the ABNF puts in place. These are given for reference purposes only and are not meant to be comprehensive.
ঃক (Devanāgarī ःक): As can be seen, such a combination will automatically result in a “golu” or a dotted circle, marking it as an invalid formation. This is an intrinsic property of the
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka; Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow; Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum; Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS); Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka; Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka; Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What Has Happened So Far in Terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at the Bangla Academy, Dhaka on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0, Core Specification, Chapter 12, p. 473.
10. Appendix-I
10.1. Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the Formalism structures do not necessarily apply. To take care of such cases, restriction rules are set in place. These restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:
1. Khaṇḍa Ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, but we can cite a couple of native examples, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across the possibility of Khaṇḍa Ta (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).
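Rule 1 can be sketched as a check on the first character of a label. This is an illustration only: the function name `initial_violation` is hypothetical, and the sketch encodes just the three letters the rule bars (ৎ, ঞ, ঙ). It deliberately leaves য় alone, because the text above cites attested exceptions, and it rejects a ৎস-initial transliteration such as ৎসেরিং because the rule, as stated, admits no exception for it.

```python
from typing import Optional

# Letters that rule 1 above bars from the start of a label.
NOT_INITIAL = {"\u09CE", "\u099E", "\u0999"}  # ৎ KHANDA TA, ঞ NYA, ঙ NGA

def initial_violation(label: str) -> Optional[str]:
    """Return the offending first character, or None if rule 1 is satisfied."""
    if label and label[0] in NOT_INITIAL:
        return label[0]
    return None

TSERING = "\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"  # ৎসেরিং
print(initial_violation(TSERING) == "\u09CE")  # True: rejected under rule 1
print(initial_violation("\u0995\u09BE"))       # None: কা is fine
```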
10.2. ‘Sylheti Nagarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924: Kthi) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, and widely in use during the 1880s. There were several other names of Sylheti
13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri have not been covered in this section.
Nagarī or Siloti (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal had brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).
Table 17: The Script Table of Sylheti Nagarī or Siloti
10.3. Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.
ক (k): ক্ক, ক্ট, ক্ত, ক্ত্র, ক্ত্ব, ক্ন, ক্ব, ক্ম, ক্য, ক্র, ক্ষ, ক্ষ্ণ, ক্ষ্ম, ক্ষ্ব, ক্ষ্য, ক্স, ঙ্ক, [illegible]
[further forms for this row illegible]
গ (g): গ্গ, গ্দ, গ্ধ, গ্ন, গ্ব, গ্ম, গ্য, গ্র, গ্ল, ঙ্গ, [illegible]
ণ (n): ণ্ট, ণ্ঠ, ণ্ড, ণ্ড্র, ণ্ঢ, ণ্ণ, ণ্য, ণ্ব, ক্ষ্ণ, ষ্ণ, [illegible]
For rendering Ya-phalā followed by অ and এ, it is necessary to type U+09CD Hasanta plus U+09AF ya, preceded by the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English ‘acid’ অ্যাসিড, ‘association’ অ্যাসোসিয়েশন, ‘bat’ ব্যাট, ‘fat’ ফ্যাট, ‘mat’ ম্যাট, ‘cap’ ক্যাপ, etc. The Brāhmī script by nature does not have a Hasanta after a vowel. Hasanta is generally described as the ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant. Only consonants can have the Hasanta marked. But, as we see here, Bangla ends up with a deviant feature in its orthography, in which Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (Cf. Unicode 10.0, p. 473 [100]).
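The deviant feature described above suggests a simple check: a Hasanta may follow a vowel only when it begins the অ্যা or এ্যা ligature. This is a minimal sketch under that assumption; the name `hasanta_after_vowel_ok` and the vowel set in the comment are illustrative, not drawn from the LGR.

```python
VIRAMA = "\u09CD"
# Independent vowels assumed for this sketch.
VOWELS = {chr(cp) for cp in range(0x0985, 0x098C)} | {"\u098F", "\u0990", "\u0993", "\u0994"}
AE_VOWELS = {"\u0985", "\u098F"}  # অ, এ
YA_AA = "\u09AF\u09BE"            # য + া, completing the Ya-phala with aa-kara

def hasanta_after_vowel_ok(label: str) -> bool:
    """True if every Hasanta that follows a vowel is part of অ্যা or এ্যা."""
    for i in range(1, len(label)):
        if label[i] == VIRAMA and label[i - 1] in VOWELS:
            if label[i - 1] not in AE_VOWELS or label[i + 1:i + 3] != YA_AA:
                return False
    return True

print(hasanta_after_vowel_ok("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড: True
print(hasanta_after_vowel_ok("\u0987\u09CD\u09AF\u09BE"))                    # ই্যা: False
```

A Hasanta after a consonant (as in ব্যাট) is outside the scope of this check and is left to the consonant rules.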
Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka ‘the sun’), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra ‘cycle’). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases. The LGR makes a note of this point of concern with respect to the two RAs in disguise, as it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may consequently lead to concerns related to spoofing and other kinds of cyber irregularities. The motive for classing these two CPs as (blocking) variants is that fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). This provides the justification for Variant Set 4, though only in the context of a following Hasanta. The difference between the RAs is only distinguishable if one looks at their Unicode values. Therefore, labels such as অর্ক arka, শীর্ষ śīrṣa ‘top, apex’, অভ্র abhra ‘cloud, the sky’ and শ্রম śrama ‘physical labour’ could be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode code points. This point is made explicitly with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47) that are to follow. Moreover, it
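To make the variant relationship concrete, the sketch below enumerates the labels that would fall into the same variant set as a given label if র and ৰ are swapped where RA is followed by a Hasanta (the P sequence). This is an assumption-laden illustration: `ra_variants` is a hypothetical helper, it does not model the LGR XML variant dispositions, and the optional `include_ra_phala` flag (also hypothetical) extends the swap to ra-phalā (H + RA), since the text flags forms such as শ্রম as well.

```python
from itertools import product

BANGLA_RA, ASSAMESE_RA, VIRAMA = "\u09B0", "\u09F0", "\u09CD"
SWAP = {BANGLA_RA: ASSAMESE_RA, ASSAMESE_RA: BANGLA_RA}

def ra_variants(label: str, include_ra_phala: bool = False) -> set:
    """Labels obtained by swapping র/ৰ in the chosen Hasanta contexts (original excluded)."""
    slots = []
    for i, ch in enumerate(label):
        if ch not in SWAP:
            continue
        followed_by_h = i + 1 < len(label) and label[i + 1] == VIRAMA   # repha / P
        preceded_by_h = i > 0 and label[i - 1] == VIRAMA                # ra-phala
        if followed_by_h or (include_ra_phala and preceded_by_h):
            slots.append(i)
    variants = set()
    for flips in product((False, True), repeat=len(slots)):
        chars = list(label)
        for i, flip in zip(slots, flips):
            if flip:
                chars[i] = SWAP[chars[i]]
        variants.add("".join(chars))
    variants.discard(label)
    return variants

print(ra_variants("\u0985\u09B0\u09CD\u0995"))  # অর্ক -> {'অৰ্ক'}
```

Swapping only in the Hasanta context keeps plain র and ৰ (e.g. in রস) distinct, in line with the restriction of Variant Set 4 to that context.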
4. Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) has been formed by members having experience in Linguistics (especially NLP and computational linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR assigned to each. However, an attempt is made to ensure that the fundamental philosophy behind building those LGRs is consistent across all the Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about the Bangla LGR code points.
4.1. Guiding Principles
The NBGP adopts the following broad principles for the selection of code points in the code point repertoire, across the board, for all the Neo-Brāhmī scripts within its ambit.
4.1.1. Inclusion Principles
4.1.1.1. Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription or archival purposes only will not be considered for inclusion in the code point repertoire.
4.1.1.2. Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.
4.2. Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards which are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.
4.2.1. External Limits of Scope
The code point repertoire for the root zone being a very special case at the top of the protocol hierarchy, the canvas of characters available for selection as part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart
Out of all the characters that are needed by the script in question, if a particular character is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially so in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol
Unicode, being the character-encoding standard providing the maximum possible representation of a given script/language, has encoded as far as possible all the characters needed by the script. However, the domain name being a specialized case, it is governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR being the repertoire of characters which are going to be used for the creation of Root Zone TLDs, which in turn constitute an even more specialized case of domain names, the Root Zone LGR procedure introduces additional exclusions on IDNA’s allowed set of characters. Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), even if allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the MSR.
To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. The IDNA protocol further narrows this down, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
4.2.1.1. No Punctuation Marks
The TLDs being identifiers, punctuation marks present in Brāhmī-based scripts will not be included.
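The three successive filters can be pictured as a small pipeline. The sketch below is only an approximation: stage 2 stands in for IDNA2008 PVALID by checking the RFC 5892 "LetterDigits" general categories plus NFKC stability, and omits the protocol's other exceptions and context rules; stage 3 uses a hypothetical, illustrative exclusion list (AVAGRAHA and the Bengali digits), not the actual MSR. The name `filter_stage` is likewise hypothetical.

```python
import unicodedata

# Approximation of IDNA2008 "LetterDigits" (RFC 5892); not the full PVALID derivation.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}
# Illustrative MSR exclusions only: AVAGRAHA and the Bengali digits U+09E6..U+09EF.
MSR_EXCLUDED = {"\u09BD"} | {chr(cp) for cp in range(0x09E6, 0x09F0)}

def filter_stage(ch: str) -> str:
    """Report the first filter (Unicode block, IDNA, MSR) that rejects ch."""
    if not 0x0980 <= ord(ch) <= 0x09FF:
        return "rejected: outside the Bengali block"
    stable = unicodedata.normalize("NFKC", ch) == ch
    if unicodedata.category(ch) not in LETTER_DIGITS or not stable:
        return "rejected: not PVALID (approximate IDNA2008 check)"
    if ch in MSR_EXCLUDED:
        return "rejected: excluded by the MSR (illustrative list)"
    return "accepted"

for ch in ("\u0995", "\u0915", "\u09FA", "\u09DC", "\u09BD"):
    print(f"U+{ord(ch):04X}", filter_stage(ch))
```

Running it shows each filter at work: ক passes, Devanāgarī क fails the block test, ISSHAR (a symbol) and the precomposed ড় (not NFKC-stable) fail the IDNA approximation, and AVAGRAHA passes IDNA but is removed by the MSR stage.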
4.2.1.2. No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters, like BENGALI ISSHAR ৺ (U+09FA) and BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), will also not be included.
4.2.1.3. No Rare and Obsolete Characters
There are characters which have been added to Unicode to accommodate rare forms, such as the Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), as well as VOCALIC LL ৡ (U+09E1) and the allographic -kāra forms of the latter two symbols, VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, which complies with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the -kāra corresponding to VOCALIC RR ৠ (U+09E0), which is VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.
4.2.1.4. No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.
4.2.1.5. ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).
5. Repertoire
The Bangla writing system is represented in Unicode using the Bengali (Bangla) script name as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in Unicode has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26 February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in the 8th meeting of its Indian Language Technologies and Products Sectional Committee (LITD 20), held on 23 August 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 11 is valid for all three languages above.
For each of the code points, language references have been given in the last column, titled ‘Reference’, of Table 8, titled the “Code Point Repertoire”. For entire coverage of the Bangla code points, references to Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced many new complications, except for one particular character. Though only a few representative languages under EGIDS scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under Bangla Unicode Points (as given in Unicode 6.3). However, before the details are presented, it is useful to look at the Bangla code point chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs given below in the code charts are based on one of the many fonts designed and are not prescriptive, because there could be some variations in actual fonts, both Unicode-compatible and TrueType ones. Consider the following code point table.
Colour convention:
All characters that are included in the [MSR] - Yellow background
PVALID in IDNA2008 but excluded from the [MSR] - Pinkish background
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen) - White background
Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences which enable the conditional inclusion of BENGALI LETTER A and E followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, again followed by BENGALI VOWEL SIGN AA, in the repertoire, enabling the æ sound as in English ‘bat’, ‘cat’, etc.
5.2. Code Point Repertoire Exclusion
There are some characters of the Bangla script that find a place in Unicode but have not been included in the repertoire in the LGR proposal. The reason for excluding ঌ (U+098C) and ৗ (U+09D7) is that they are rare and obsolete characters.
5.3. Code point not used alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it will never be used alone. It will be used in sequences for three special characters in normalized form: ড়, ঢ় and য়.
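Why the NUKTA appears only inside sequences can be seen with Unicode normalization: the precomposed letters U+09DC, U+09DD and U+09DF are composition exclusions, so NFC (the form IDNA2008 requires for labels) always stores them as consonant + NUKTA. The helper `nfc_code_points` below is a hypothetical name used only for this demonstration with Python's standard `unicodedata` module.

```python
import unicodedata

def nfc_code_points(text: str) -> list:
    """Code points of text after NFC normalization, as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", text)]

for letter in ("\u09DC", "\u09DD", "\u09DF"):  # precomposed ড় ঢ় য়
    print(unicodedata.name(letter), nfc_code_points(letter))
# BENGALI LETTER RRA ['U+09A1', 'U+09BC']
# BENGALI LETTER RHA ['U+09A2', 'U+09BC']
# BENGALI LETTER YYA ['U+09AF', 'U+09BC']
```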
5.4.2. The Vowel Sequence
In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To facilitate understanding for users of other Brāhmī scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra, onuʃʃār (B), a Candrabindu (D) or a Visarga, biʃɔrgo (X). The number of D, B or X which can follow a V in Bangla may not be restricted to one. Going by the rules illustrated in this document, it is clear that formations such as VDD, VBB and VXX are invalid orthographic units. However, it is valid and possible to have sequences combining a Candrabindu with an Anusvāra on the one hand and a Candrabindu with a Visarga on the other, as in হ্যাঁংচা ‘hænchā’ and হ্যাঁঃ ‘hæn’ respectively: a Visarga or Anusvāra can follow a Candrabindu in Bangla. A vowel can also optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF, the Bangla letter য or ‘ya’, represented by the sequence <U+09CD (্) BENGALI SIGN VIRAMA, i.e. the Bangla sign Hasanta, U+09AF (য) BENGALI LETTER YA>. Ya-phalā has a special form (্য). Again, when combined with U+09BE (া) BENGALI VOWEL SIGN AA (i.e. ‘aa’ (ā)), it is used for transcribing [æ], as in the ‘a’ of the English word ‘bat’, written in Bangla as ব্যাট. A Vowel-sequence admits the following combinations:
5.4.2.1. A Single Vowel
Example: V অ (Devanāgarī अ)
5.4.2.2. A Vowel with Conditions
A Vowel can optionally be followed by an Anusvāra [B], a Candrabindu [D], a Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by a Consonant [C] followed by a kāra (Mātrā) [M].
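Read over WLE symbol strings, sections 5.4.2.1 and 5.4.2.2 amount to a small regular expression. The sketch assumes the label has already been mapped to symbols (V, B, D, X, H, C, M); `is_vowel_sequence` is a hypothetical helper, not part of the LGR XML.

```python
import re

# V, optionally followed by B, D, X, DB, DX or H C M (sections 5.4.2.1 and 5.4.2.2).
VOWEL_SEQUENCE = re.compile(r"V(?:DB|DX|HCM|B|D|X)?")

def is_vowel_sequence(symbols: str) -> bool:
    """True if a WLE symbol string is exactly one vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(symbols) is not None

for s in ("V", "VB", "VDB", "VDX", "VHCM", "VDD", "VBB", "VXX", "VHC"):
    print(s, is_vowel_sequence(s))
```

The doubled forms VDD, VBB and VXX, which the text calls invalid orthographic units, are rejected because the optional group can match only once.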
5.4.3.2. A Consonant with Conditions
A Consonant may optionally be followed by a dependent vowel sign kāra (Mātrā) [M], an Anusvāra [B], a Candrabindu [D], a Visarga [X], a Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
5.4.3.5. Subsets
While considering its subsets, as a representative example we will consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.
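As a sketch only, the two consonant patterns quoted in 5.4.3.2 and 5.4.3.5 [A] can be written as regular expressions over WLE symbol strings. This encodes just those two statements; other parts of the ABNF that are not reproduced here may admit further combinations, so a rejection by this hypothetical `is_consonant_sequence` helper should not be read as a rejection by the LGR.

```python
import re

# 5.4.3.2: C optionally followed by M, B, D, X, H, DB or DX.
SINGLE_CONSONANT = r"C(?:DB|DX|M|B|D|X|H)?"
# 5.4.3.5 [A]: CHC, CHCHC or CHCHCHC, optionally followed by M, B, D, X, DB or DX.
CONJUNCT = r"C(?:HC){1,3}(?:DB|DX|M|B|D|X)?"
CONSONANT_SEQUENCE = re.compile(f"(?:{SINGLE_CONSONANT})|(?:{CONJUNCT})")

def is_consonant_sequence(symbols: str) -> bool:
    """True if the symbol string matches one of the two quoted patterns."""
    return CONSONANT_SEQUENCE.fullmatch(symbols) is not None

for s in ("C", "CM", "CH", "CDB", "CHCM", "CHCHCHCDX", "CMM", "CHCHCHCHC"):
    print(s, is_consonant_sequence(s))
```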
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’.
Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
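The Khaṇḍa Ta condition in the note above can be sketched as a context check: whenever ৎ follows a Hasanta, the character before that Hasanta must be র or ৰ. The function name `khanda_ta_context_ok` is hypothetical, and the sketch checks only this context, not the label-initial restriction in the Appendix.

```python
VIRAMA, KHANDA_TA = "\u09CD", "\u09CE"
RAS = {"\u09B0", "\u09F0"}  # র (Bangla), ৰ (Assamese)

def khanda_ta_context_ok(label: str) -> bool:
    """True if every ৎ that follows a Hasanta has র or ৰ before that Hasanta."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == VIRAMA:
            if i < 2 or label[i - 2] not in RAS:
                return False
    return True

print(khanda_ta_context_ok("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা: True
print(khanda_ta_context_ok("\u0995\u09CD\u09CE"))                          # ক্ৎ: False
```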
5.4.5. Special Cases: S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S in the first instance. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ - BENGALI LETTER A and U+098F এ - BENGALI LETTER E). For rendering Ya-phalā followed by অ and এ, it is necessary to type U+09CD plus U+09AF ya, preceded by the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound as in English ‘bat’, ‘fat’, etc. The Brāhmī script by nature does not have a Hasanta after a vowel. Hasanta is generally described as the ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant. Only consonants can have the Hasanta marked. But, as we see here, Bangla ends up with a deviant feature in its orthography, in which Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (Cf. Unicode 10.0, p. 473 [100]). Another case refers to the formation of repha and ra-phalā in the said script, mentioned in the table above as P¹⁰. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta +
10 Refer to Rule P in Section 7, Table 16.
6. Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusable variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from character formations as normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. Confusable but non-identical cases are not proposed, as there is another panel (the String Similarity Assessment Panel) entrusted to deal with such cases. However, cases which belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Bangla akṣara formation. These can cause confusion even to a careful observer, and hence are being proposed as variants.
Variants arise in a script when two or more forms are produced with different storage or code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variant, because a kāra followed by a kāra is not allowed.
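The কো example can be checked directly. Under NFC normalization (which IDNA2008 requires for labels), the two-key spelling ক + ে + া composes to the single-key ক + ো, because U+09CB canonically decomposes to U+09C7 U+09BE. An un-normalized two-key form would in any case be caught by the kāra-after-kāra prohibition, sketched here with a hypothetical `has_kara_after_kara` helper and an assumed kāra set.

```python
import unicodedata

ONE_KEY = "\u0995\u09CB"          # ক + ো (BENGALI VOWEL SIGN O)
TWO_KEYS = "\u0995\u09C7\u09BE"   # ক + ে (E) + া (AA), typed separately

# Kāras assumed for this sketch: U+09BE..U+09C4, U+09C7, U+09C8, U+09CB, U+09CC.
KARAS = {chr(cp) for cp in range(0x09BE, 0x09C5)} | {"\u09C7", "\u09C8", "\u09CB", "\u09CC"}

def has_kara_after_kara(label: str) -> bool:
    """True if two kāras are adjacent in the label."""
    return any(a in KARAS and b in KARAS for a, b in zip(label, label[1:]))

print(unicodedata.normalize("NFC", TWO_KEYS) == ONE_KEY)         # True
print(has_kara_after_kara(TWO_KEYS), has_kara_after_kara(ONE_KEY))  # True False
```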
6.1. In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I
As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) with Hasanta appears as a conjunct with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus
স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus
ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
The above combinations, if written in traditional orthography, could be a little confusing, as the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly, in the belief that it was a হ (ha) U+09B9, increasing the risks in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
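A minimal sketch of how the CASE I pairs could be enumerated for a label: every থ or হ that directly follows স + Hasanta or ন + Hasanta is a swap slot, and all combinations of swaps are returned. The helper `tha_ha_variants` is hypothetical, and the sketch does not decide the variant disposition (blocked or allocatable), which this section does not state.

```python
VIRAMA = "\u09CD"
BASES = {"\u09B8", "\u09A8"}                     # স, ন
SWAP = {"\u09A5": "\u09B9", "\u09B9": "\u09A5"}  # থ <-> হ

def tha_ha_variants(label: str) -> set:
    """Labels that differ only by থ/হ in স্থ/স্হ or ন্থ/ন্হ (original excluded)."""
    slots = [i for i in range(2, len(label))
             if label[i] in SWAP and label[i - 1] == VIRAMA and label[i - 2] in BASES]
    results = {label}
    for i in slots:
        # add a swapped copy of every label collected so far
        results |= {v[:i] + SWAP[v[i]] + v[i + 1:] for v in results}
    results.discard(label)
    return results

print(tha_ha_variants("\u09B8\u09CD\u09A5\u09BE\u09A8"))  # স্থান -> {'স্হান'}
```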
6.2. Cross-Script Variants
A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Although there are certain characters which are somewhat similar, they have not been included here; they have been provided in the Appendix (10.3) for reference.
7. Whole Label Evaluation Rules (WLE)
This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications.
11 Unicode uses Oriya for the script, although Odia is now the official term used.
12 As used by Unicode, denoting and including both Assamese and Maṇipuri.
5421 ASingleVowel
ExamplesV অअ
5422 AVowelwithConditionsAVowelcanoptionallybefollowedbyAnusvāra[B]orCandrabindu[D]orVisarga[X]or Candrabindu+ Anusvāra [DB] or Candrabindu+ Visarga [DX] or combination ofHasanta(orVirama)[H]followedbyConsonant[C]followedbykāra(Mātrā)[M]
5432 AConsonantwithConditionsAConsonantoptionallyfollowedbydependentvowelsignkāra(Mātrā)[M]orAnusvāra [B] or Candrabindu [D] or Visarga [X] or Hasanta (also known asVirama)[H]orCandrabindu+Anusvāra[DB]orCandrabindu+ Visarga[DX]
5435 Subsets While considering its subsets as a representative example we will consider thecombinationCHConlyhoweverthesameisequallyapplicabletoCHCHCandCHCHCHC[A]ThecombinationmaybefollowedbyMBDXDBorDX
Example র++ৎ =ৎHasinভৎHসনা (bhartsanā) scoldingNoteTheconditionsinthiscontextofKHANDATAarethattheCshouldbeeitherRAU+09B0(র)(usedinBangla)orRAU+09F0(ৰ)(usedinAssamese)
545 SpecialCasesSandPTwospecialcasesinvolvingSequences(referredtoasSandPinTable16underSection7)couldbedescribedbrieflyhereLetustakeupSinthefirstinstanceItisnoteworthythat thereare two instances inBanglawhereHasanta(U+09CD) isprecededbya full
10 Refer to Rule P in Section 7 Table 16
42
vowel (U+0985 অ - BANGLA LETTER A and U+098F এ - BANGLA LETTER E) ForrenderingYa-phalāfollowedbyঅandএitisnecessarytotypeU+09CDplusU+09AFyapreceded by the said vowels This is a purely ligatural entity and the addition ofYa-phalāandā-kāraisusedtoelicittheaeligsoundasinEnglishlsquobatrsquolsquofatrsquoetcTheBrahmıscriptbynaturedoesnothaveHasantaafteravowelHasantaisgenerallydescribedaslsquovowel killerrsquo although it actually indicates absence of a vowel after the markedconsonant Only the consonants can have the Hasanta marked But as we see hereBanglaendsupwithadeviantfeatureintheorthographyhereinwhichHasantacomesimmediatelyafteravowelinligaturesঅGাandএGা(CfUnicode100p473[100])Another case refers to the formation of repha and ra-phalā in the said script andmentioned in the tableaboveasPOwing toco-occurrencewithHASANTARAeitherlosesitsownimplicitvowel(REPHA)orsuppressestheimplicitvoweloftheprecedingconsonant(RA-PHALA )Forinstancerepha=ra+Hasanta+C(egকHiera+Hasanta+
6 VariantsThissection talksabout thevariants in theBanglascriptTheNBGPcategorizes theseconfusinglyvariantsintwogroupsGroup1ConfusingduetopurevisualsimilarityGroup2ConfusingduetodeviationfromnormallyperceivedcharacterformationsbylargerlinguisticcommunityForGroup1anyidenticalcodepointsaredefinedasvariantsTheconfusablebutnotidenticalcasesarenotproposedasthereisanotherpanel(Stringsimilarityassessmentpanel)entrustedtodealwithsuchcasesHowevercaseswhichbelongtoGroup2areproposedtobeconsideredasvariantsThesecasesarenotofmerevisualsimilarityasthey involve some deviations from the widely accepted norms of Bangla Aksharformations These can cause confusion even to a careful observer and hence beingproposedasvariantsThe variants are generated in a script when two or more forms are formed withdifferent storage or code points In Bangla the e-kāra ā-kāra and the o-kāra havedifferentcodepointsOnecantypeowithaconsonantatonegoandthesamebytypinge-kāra and ā-kāra as two separate keys getting the same results A reader cannotdifferentiatebetween the twoko (েকা)one typedwitha singlekeyand theotheronetypedwithtwodifferentkeysMoreoverthiswillnotbeconsideredasacaseofvariantbecauseakārafollowedbyakāraisnotallowed
61 InScriptVariantsHoweverweproposetwocasesoftruein-scriptvariantsinBanglascriptCASEIAs far as true variants in Bangla are concernedwemay drawour attention to caseswhereinHasantawith(U+09A5)থ(tha)appearsasconjunctwith(U+09B8)স(sa)and(U+09A8)ন(na)1 স+Hasanta+থ(U+09B8+U+09CD+U+09A5)versus
স +Hasanta+হ(U+09B8+U+09CD+U+09B9)
2 ন+Hasanta+থ(U+09A8+U+09CD+U+09A5)versus
ন+Hasanta+হ(U+09A8+U+09CD+U+09B9)
44
Theabovecombinationsifwrittenintraditionalorthographycouldbelittleconfusingwheretheথ(tha)inconjunctappearslikeaহ(ha)Theconjunctcouldbeintheinitialmedialorfinalpositions(asshownbelowinegno1)Itcouldbetypedwrongaswellthinking itwas aহ (ha)U+09B9 increasing the chances of risks in labelwriting andidentificationExamples1 acuteandসহ(asinacuteানsthānaacuteলsthulaাacuteGsvāsthyaঅacuteায়ীasthāyı)
62 CrossScriptVariantsAcrispcrossscriptstudyforBanglahasbeendonewithrespecttosisterscriptssuchasDevanāgarī Gurmukhı and Odia11(formerly Oriya) keeping in mind the visual andtechnicalconfusionstheymaycauseaslabelsonthewebdomainMoreoverthereisnoin-script variant in Bangla as far as the orthography is concerned The followingcharacters are being proposed by the NBPG as variants Although there are certaincharacters which are somewhat similar they but have not been included here TheyhavebeenprovidedintheAppendix(102)forreference
7 WholeLabelEvaluationRules(WLE) ThissectionprovidestheWLEsthatarerequiredbyallthelanguagesmentionedinsection32whenwritten inBangla12ScriptThe ruleshavebeendrafted in suchawaythattheycanbeeasilytranslatedintotheLGRspecifications
11 Unicode uses Oriya for the script although Odia is now the official term used 12 As used by the Unicode denoting and including both Assamese and Maṇipuri
47
Below are the symbols used in the WLE rules for each of the Indic SyllabicCategoryasmentionedinthetableprovidedinCodepointrepertoire(Section51)
C rarr Consonant
M rarr Kāra(Mātrā)
V rarr Vowel
B rarr Anusvāra
D rarr Candrabindu
X rarr Visarga
H rarr Hasanta
Z rarr KhandaTa
S rarr S1S2(fromTable9)or(ae)Ya-phalā(V1HC1M1)whereV1iseither0985(অ-BENGALILETTERA)or098F(এ-BENGALILETTERE)His09CD(-BENGALISIGNVIRAMA)C1is-09AF(য-BENGALILETTERYA)M1is-09BE(া-BENGALIVOWELSIGNAA)S1 and S2 are valid even they are not allowed by the other context rules
P rarr Ra-Hasanta(C2H)whereC2iseither09B0(র-BENGALILETTERRA)or 09F0 (ৰ - ASSAMESE LETTER RA
Now letuselaborateeach rulewithexamples from the scriptkeeping inmindthe Bangla Assamese and Manipuri communities Some combinations ofcharactersmayseemunrealisticorrareinusagebutthereisnoharminaddingsuch ligaturesbecause it ispossible tocreate thembyanyusereasilybutmaynotbeattestedcombinations
49
711 Case of V Preceded by H There could be cases involvingmulti-word domains where Vmay need to beallowedtofollowanHegব8াtঅuইিvয়া baeligŋkʌv ɪndiə (U+09ACU+09CDU+09AFU+09BEU+0999U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1U+09BFU+09DFU+09BE)(meaningBankofIndia)ThisisthecasewheretwodifferentwordsarejoinedtogetherfirstofwhichendswithanH(অu)andthesecondwordbeginswithaV(ইিvয়া)Somesectionsofthe linguistic community require the explicit presence of H for fullrepresentation of the sound intended However by and large the form of thefirstwordwithoutanH(U+09CD)isconsideredenoughforfullrepresentationofthesoundintendedforthefirstwordThis isauniquesituationnecessitatedbythelackofhyphenspaceortheZeroWidthNon-joinercharacterinthepermissiblesetofcharactersintheRootzonerepertoire Otherwise V is never required to be allowed to follow an HPermittingthismaycreateaperceptivesimilaritybetweentwolabels(withandwithout H) for majority of the linguistic communities hence this is explicitlyprohibitedbytheNBGPIn future if required depending on the prevailing requirements from thecommunitythefutureNBGPmayconsiderrevisitingthisrule
72 AdditionalExamplesfromBanglaABNF
Belowarea fewexampleswhichhelponeunderstandsomeof the rulesABNFputsinplaceThesearejustgivenforreferencepurposesandarenotmeanttobecomprehensive
ঃক ःकAscanbeseensuchcombinationwillresultautomaticallyinaldquogolurdquooradottedcircle marking it as an invalid formation This is an intrinsic property of the
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti
13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section
58
Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)
Table17ndashTheScriptTableofSylhetiNagarıorSiloti
103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
4.2.1 External Limits of Scope
The code point repertoire for the root zone is a very special case at the top of the protocol hierarchy: the set of characters available for selection as part of the Root Zone code point repertoire is already constrained by the protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart: Of all the characters needed by the script in question, a character that is not encoded in Unicode cannot be incorporated in the code point repertoire. Such cases are quite rare, especially so in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. The IDNA Protocol: Unicode, being the character-encoding standard that provides the maximum possible representation of a given script/language, has encoded as far as possible all the characters needed by the script. However, domain names are a specialized case and are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. The Maximal Starting Repertoire (MSR): The Root Zone LGR is the repertoire of characters that will be used for creating Root Zone TLDs, which in turn constitute an even more specialized case of domain names. The Root Zone LGR procedure therefore introduces additional exclusions on the set of characters allowed by IDNA. Example: BANGLA SIGN AVAGRAHA ঽ (U+09BD), even though allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the MSR.
To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. The IDNA protocol narrows this down further, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
4.2.1.1 No Punctuation Marks
TLDs being identifiers, punctuation marks present in Brāhmī-based scripts will not be included.
4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters like BANGLA ISSHAR ৺ (U+09FA) and BANGLA CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9) will also not be included.
4.2.1.3 No Rare and Obsolete Characters
There are characters which have been added to Unicode to accommodate rare forms, such as Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L “ঌ” (U+098C), as well as VOCALIC LL ৡ (U+09E1) and the allographic kāra forms of the latter two symbols, VOWEL SIGN VOCALIC L (U+09E2) and VOWEL SIGN VOCALIC LL (U+09E3). All such characters are excluded, in compliance with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the kāra corresponding to VOCALIC RR ৠ (U+09E0), which is VOWEL SIGN VOCALIC RR “ৄ” (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.
4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for Classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.
4.2.1.5 ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).
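The category-based exclusions of Sections 4.2.1.1 to 4.2.1.3 can be approximated as a filter over the Bengali block. The Python sketch below is illustrative only: it assumes that Unicode general categories (from the standard `unicodedata` module) are a fair stand-in for "punctuation, symbols and digits", and it adds an explicit set of rare characters taken from the text above and from Section 5.2. It does not compute IDNA2008 derived properties or the MSR, so its output is a superset of the proposed repertoire (for example, it still keeps NUKTA and AVAGRAHA). The function name `candidate_code_points` is hypothetical.

```python
import unicodedata

# Rare and obsolete characters named in Sections 4.2.1.3 and 5.2 of this proposal.
RARE_OR_OBSOLETE = {
    0x098C,  # BENGALI LETTER VOCALIC L
    0x09D7,  # BENGALI AU LENGTH MARK
    0x09E0,  # BENGALI LETTER VOCALIC RR
    0x09E1,  # BENGALI LETTER VOCALIC LL
    0x09E2,  # BENGALI VOWEL SIGN VOCALIC L
    0x09E3,  # BENGALI VOWEL SIGN VOCALIC LL
}


def candidate_code_points(start: int = 0x0980, end: int = 0x09FF) -> list[int]:
    """Hypothetical filter: keep assigned letters and marks, drop the listed rare ones."""
    kept = []
    for cp in range(start, end + 1):
        category = unicodedata.category(chr(cp))
        if category == "Cn":
            continue  # unassigned in the Unicode version bundled with Python
        if category[0] not in ("L", "M"):
            continue  # punctuation (P*), symbols (S*) and digits/numbers (N*) are dropped
        if cp in RARE_OR_OBSOLETE:
            continue
        kept.append(cp)
    return kept


if __name__ == "__main__":
    for cp in candidate_code_points():
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```

Under this approximation ISSHAR (category So) and CURRENCY DENOMINATOR SIXTEEN (category No) fall out through the category test, while VOWEL SIGN VOCALIC RR (U+09C4, category Mn) survives, matching the retention described above.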
5 Repertoire
The Bangla writing system is represented in Unicode using the Bengali (Bangla) script name, as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in Unicode has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in its 8th Meeting of the Indian Language Technologies and Products Sectional Committee LITD 20, held on 23rd August 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 11 is valid for all three languages above.
For each of the code points, language references have been given in the last column, titled Reference, of Table 8, titled the “Code Point Repertoire”. For complete coverage of the Bangla code points, references from Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced many new complications, except for one particular character. Though only a few representative languages under EGIDS Scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under Bangla Unicode Points (as given in Unicode 6.3). However, before the details are presented, it is ideal to look at the Bangla Code Point Chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs given below in the code charts are based on one of the many fonts designed and are not prescriptive, because there could be some variation in actual fonts (both Unicode-compatible and TrueType ones). Consider the following code point table.
Colour convention:
All characters that are included in the [MSR]: yellow background
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background
Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences that enable the conditional inclusion in the repertoire of BANGLA LETTER A and E followed by BANGLA SIGN VIRAMA and BANGLA LETTER YA, again followed by BANGLA VOWEL SIGN AA, so that the æ sound, as in English ‘bat’, ‘cat’, etc., can be represented.
5.2 Code Point Repertoire Exclusion
There are some characters of the Bangla script that find a place in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.
5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it will never be used alone. It will be used only in sequences, for the three special characters ড় ঢ় য় in normalized form.
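Why NUKTA appears only inside sequences can be checked with Python's standard `unicodedata` module. IDNA2008 labels must be in Normalization Form C, and the precomposed letters ড় ঢ় য় (U+09DC, U+09DD, U+09DF) are Unicode composition exclusions, so NFC maps each of them to a base consonant followed by U+09BC. This snippet only demonstrates the normalization behaviour; the helper name `codepoints` is illustrative and not part of the LGR.

```python
import unicodedata

# Precomposed nukta letters and the sequences NFC turns them into.
# All three are composition exclusions, so NFC never recomposes them.
NUKTA = "\u09BC"
PRECOMPOSED_TO_SEQUENCE = {
    "\u09DC": "\u09A1" + NUKTA,  # BENGALI LETTER RRA -> DDA + NUKTA
    "\u09DD": "\u09A2" + NUKTA,  # BENGALI LETTER RHA -> DDHA + NUKTA
    "\u09DF": "\u09AF" + NUKTA,  # BENGALI LETTER YYA -> YA + NUKTA
}


def codepoints(text: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


for precomposed, sequence in PRECOMPOSED_TO_SEQUENCE.items():
    normalized = unicodedata.normalize("NFC", precomposed)
    print(f"{codepoints(precomposed)} -> {codepoints(normalized)}")
    assert normalized == sequence
```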
5.4.2 The Vowel Sequence
In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To facilitate understanding for users of other Brāhmī scripts, equivalents in Devanāgarī are provided wherever necessary.
A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra (onuʃʃār) (B), a Candrabindu (D) or a Visarga (biʃɔrgo) (X). The number of D, B or X that can follow a V in Bangla is not necessarily restricted to one. However, going by the rules illustrated in this document, formations such as VDD, VBB and VXX are clearly invalid orthographic units, whereas it is valid and possible to have sequences such as an anusvara followed by a chandrabindu on the one hand and a visarga followed by a chandrabindu on the other, as in হ্যাংচা ‘hænchā’ and হ্যাঃ ‘hæn’ respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu exists in Bangla.
A vowel can also optionally be followed by a combination of Hasanta/Virama [H] + Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF, Bangla letter য or ‘ya’, represented by the sequence <U+09CD BENGALI SIGN VIRAMA (i.e. Bangla SIGN Hasanta or VIRAMA), U+09AF য BENGALI LETTER YA>. Ya-phalā has a special form, ্য. When it is further combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. ‘aa’ (ā)), it is used for transcribing [æ], as in the “a” of the English word “bat”, written in Bangla as ব্যাট.
A vowel sequence admits the following combinations:
5.4.2.1 A Single Vowel
Example: V অ (Devanāgarī अ)
5.4.2.2 A Vowel with Conditions
A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by a Consonant [C] followed by a kāra (Mātrā) [M].
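The alternatives listed above can be read as a small regular expression over WLE symbols. The Python sketch below is a hypothetical illustration, not the LGR's XML rules: it assumes simplified code point ranges for each class (the authoritative assignment is the Indic Syllabic Category column of the repertoire table), maps a string to symbols, and checks whether it forms exactly one vowel sequence. It does not cover the consonant-sequence rules.

```python
import re

# Hypothetical class tables for this sketch only; the LGR's own tables are authoritative.
VOWELS = {chr(cp) for cp in range(0x0985, 0x0995)} - {"\u098C", "\u098D", "\u098E", "\u0991", "\u0992"}
SIGNS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09CD": "H"}


def classify(text: str) -> str:
    """Map each code point to a WLE symbol (V, C, M, B, D, X, H) or '?' if unknown."""
    symbols = []
    for ch in text:
        cp = ord(ch)
        if ch in VOWELS:
            symbols.append("V")
        elif ch in SIGNS:
            symbols.append(SIGNS[ch])
        elif 0x0995 <= cp <= 0x09B9 or cp in (0x09F0, 0x09F1):
            symbols.append("C")
        elif 0x09BE <= cp <= 0x09CC:
            symbols.append("M")
        else:
            symbols.append("?")
    return "".join(symbols)


# V optionally followed by exactly one of: B, D, X, DB, DX, or H C M (Section 5.4.2.2).
VOWEL_SEQUENCE = re.compile(r"V(?:DB|DX|B|D|X|HCM)?")


def is_vowel_sequence(text: str) -> bool:
    return VOWEL_SEQUENCE.fullmatch(classify(text)) is not None
```

With these assumptions অং (VB) and অঁং (VDB) are accepted, while অংং (VBB) is rejected, in line with the invalid formations named in Section 5.4.2.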
5.4.3.2 A Consonant with Conditions
A consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
5.4.3.5 Subsets
While considering its subsets, as a representative example we will consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC. [A] The combination may be followed by M, B, D, X, DB or DX.
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’.
Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
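One reading of the note above is: whenever KHANDA TA (U+09CE) follows a Hasanta, the consonant before that Hasanta must be one of the two RA forms. The Python sketch below encodes only that reading; the function name `khanda_ta_context_ok` is hypothetical, and the label-initial restriction on ৎ is a separate rule (Appendix, Section 10.1).

```python
HASANTA = "\u09CD"
KHANDA_TA = "\u09CE"
RA_FORMS = {"\u09B0", "\u09F0"}  # BENGALI LETTER RA; U+09F0, used as Assamese RA


def khanda_ta_context_ok(label: str) -> bool:
    """Hypothetical check: a KHANDA TA preceded by HASANTA needs RA right before that HASANTA."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] not in RA_FORMS:
                return False
    return True


print(khanda_ta_context_ok("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা: True
print(khanda_ta_context_ok("\u0995\u09CD\u09CE"))                          # ক্ৎ: False
```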
5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) are briefly described here. Let us take up S in the first instance. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BANGLA LETTER A and U+098F এ BANGLA LETTER E). For rendering Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English ‘bat’, ‘fat’, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as the ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta mark. But as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]). The other case refers to the formation of repha and ra-phalā in the said script, mentioned in the table above as P. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta +
10 Refer to Rule P in Section 7, Table 16.
6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusing variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from the character formations normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. Confusable but non-identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with dealing with such cases. However, cases which belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviation from the widely accepted norms of Bangla akshara formation. They can cause confusion even to a careful observer and are hence proposed as variants.
Variants are generated in a script when two or more forms are formed with different storage, i.e. different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or obtain the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variant, because a kāra followed by a kāra is not allowed.
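A further observation, not stated in the proposal, is consistent with this decision: U+09CB (o-kāra) has a canonical decomposition into U+09C7 U+09BE (e-kāra + ā-kāra), and IDNA2008 only accepts labels that are already in NFC, so the two-key sequence cannot occur in a valid label. The Python check below uses only the standard `unicodedata` module.

```python
import unicodedata

KA = "\u0995"       # BENGALI LETTER KA
E_KARA = "\u09C7"   # BENGALI VOWEL SIGN E
AA_KARA = "\u09BE"  # BENGALI VOWEL SIGN AA
O_KARA = "\u09CB"   # BENGALI VOWEL SIGN O

two_keys = KA + E_KARA + AA_KARA  # ko typed as e-kara + aa-kara
one_key = KA + O_KARA             # ko typed with the single o-kara key

# U+09CB canonically decomposes to U+09C7 U+09BE, so NFC merges the two-key input.
print(unicodedata.normalize("NFC", two_keys) == one_key)  # True
print(unicodedata.normalize("NFD", one_key) == two_keys)  # True
```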
6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I: As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) forms a conjunct through Hasanta with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus
স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus
ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
The above combinations, if written in traditional orthography, could be a little confusing, since the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in initial, medial or final position (as shown below in example no. 1). It could also be mistyped by a user who thinks it is a হ (ha, U+09B9), increasing the risks in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
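The two variant pairs of CASE I can be expressed as a swap of THA and HA wherever they follow স or ন plus Hasanta. The Python sketch below enumerates every label reachable by such swaps. It is a hypothetical helper (`in_script_variants` is not from the proposal) and makes no claim about how the LGR XML declares the variant type or disposition.

```python
# Hypothetical helper; the variant pairs come from CASE I above.
HASANTA = "\u09CD"
THA, HA = "\u09A5", "\u09B9"
SA, NA = "\u09B8", "\u09A8"
SWAP = {THA: HA, HA: THA}


def in_script_variants(label: str) -> set[str]:
    """Return all labels obtained by swapping THA <-> HA after SA+HASANTA or NA+HASANTA."""
    positions = [
        i for i in range(2, len(label))
        if label[i] in SWAP and label[i - 1] == HASANTA and label[i - 2] in (SA, NA)
    ]
    results = set()
    # Every non-empty subset of swappable positions yields one variant label.
    for mask in range(1, 2 ** len(positions)):
        chars = list(label)
        for bit, pos in enumerate(positions):
            if mask & (1 << bit):
                chars[pos] = SWAP[chars[pos]]
        results.add("".join(chars))
    return results


print(in_script_variants("\u09B8\u09CD\u09A5\u09BE\u09A8"))  # variants of স্থান
```

Enumerating subsets keeps the sketch simple; a label with k swappable positions yields 2^k - 1 variants, which stays small for realistic label lengths.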
6.2 Cross-Script Variants
A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusions they may cause as labels in the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Although certain other characters are somewhat similar, they have not been included here; they have been provided in the Appendix (10.2) for reference.
7 Whole Label Evaluation Rules (WLE)
This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can easily be translated into the LGR specifications.
11 Unicode uses Oriya for the script, although Odia is now the official term used.
12 As used by Unicode, denoting and including both Assamese and Maṇipuri.
Below are the symbols used in the WLE rules for each Indic Syllabic Category, as mentioned in the table provided in Code Point Repertoire (Section 5.1).
C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9) or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ, BENGALI LETTER A) or 098F (এ, BENGALI LETTER E), H is 09CD (্, BENGALI SIGN VIRAMA), C1 is 09AF (য, BENGALI LETTER YA) and M1 is 09BE (া, BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র, BENGALI LETTER RA) or 09F0 (ৰ, ASSAMESE LETTER RA).
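The S and P definitions above are exact code point sequences, so they translate directly into predicates. The Python sketch below checks whether a given sequence matches the Ya-phalā form of S (V1 H C1 M1) or P (C2 H). The function names are hypothetical, and S1/S2 from Table 9 are not covered because that table is not reproduced here.

```python
HASANTA = "\u09CD"         # H: BENGALI SIGN VIRAMA
V1 = {"\u0985", "\u098F"}  # BENGALI LETTER A, BENGALI LETTER E
C1 = "\u09AF"              # BENGALI LETTER YA
M1 = "\u09BE"              # BENGALI VOWEL SIGN AA
C2 = {"\u09B0", "\u09F0"}  # BENGALI LETTER RA; U+09F0, used as Assamese RA


def is_special_s(seq: str) -> bool:
    """True if seq is exactly V1 H C1 M1, the (ae) Ya-phala sequence of rule S."""
    return len(seq) == 4 and seq[0] in V1 and seq[1] == HASANTA and seq[2] == C1 and seq[3] == M1


def is_special_p(seq: str) -> bool:
    """True if seq is exactly C2 H, the Ra-Hasanta sequence of rule P."""
    return len(seq) == 2 and seq[0] in C2 and seq[1] == HASANTA


print(is_special_s("\u0985\u09CD\u09AF\u09BE"))  # অ্যা
print(is_special_p("\u09B0\u09CD"))              # র্
```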
Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in adding such ligatures, because any user can easily create them even though they may not be attested combinations.
7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE) (meaning ‘Bank of India’). This is a case where two words are joined together, the first ending with an H (অফ্) and the second beginning with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered enough to represent the intended sound. This is a unique situation, necessitated by the absence of the hyphen, space and Zero Width Non-Joiner from the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to follow an H. Permitting this may create a perceptible similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, if required by the prevailing needs of the community, a future NBGP may consider revisiting this rule.
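The prohibition can be checked with a single pass over adjacent code point pairs. The Python sketch below is a hypothetical illustration (the function name is not from the proposal): it uses the “Bank of India” code points exactly as listed above, and shows that dropping the H after ফ makes the label pass this particular rule. Other WLE rules and NFC are out of scope here.

```python
HASANTA = "\u09CD"
# Independent vowels U+0985..U+0994 (unassigned positions in the range are harmless here).
VOWELS = {chr(cp) for cp in range(0x0985, 0x0995)}


def has_vowel_after_hasanta(label: str) -> bool:
    """True if any independent vowel directly follows HASANTA, which the NBGP prohibits."""
    return any(a == HASANTA and b in VOWELS for a, b in zip(label, label[1:]))


# Code points listed in Section 7.1.1 for 'Bank of India'.
BANK_OF_INDIA = "".join(chr(cp) for cp in (
    0x09AC, 0x09CD, 0x09AF, 0x09BE, 0x0999, 0x09CD, 0x0995,  # ব্যাঙ্ক
    0x0985, 0x09AB, 0x09CD,                                  # অফ্, ending with H
    0x0987, 0x09A8, 0x09CD, 0x09A1, 0x09BF, 0x09DF, 0x09BE,  # ইন্ডিয়া, starting with V
))
WITHOUT_H = BANK_OF_INDIA.replace("\u09AB\u09CD\u0987", "\u09AB\u0987")

print(has_vowel_after_hasanta(BANK_OF_INDIA))  # True: rejected
print(has_vowel_after_hasanta(WITHOUT_H))      # False: allowed by this rule
```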
7.2 Additional Examples from Bangla ABNF
Below are a few examples which help one understand some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.
ঃক (Devanāgarī ःक): As can be seen, such a combination will automatically result in a “golu”, or dotted circle, marking it as an invalid formation. This is an intrinsic property of the
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink, and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka
10 Appendix-I
10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language/script, certain restriction rules apply. In other words, in a given language some of the formalism's structures do not necessarily apply. To take care of such cases, restriction rules are put in place, and these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:
1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not normally allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, e.g. the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across the possibility of Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, e.g. ৎসেরিং (Tsering).
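Rule 1 amounts to a check on the first code point of a label. The Python sketch below encodes only the hard restriction as stated (ৎ, ঞ and ঙ may not start a label); it deliberately does not model the descriptive remarks about initial য় or the transliterated ৎস case. The name `initial_letter_ok` is hypothetical.

```python
DISALLOWED_INITIAL = {
    "\u09CE",  # BENGALI LETTER KHANDA TA (ৎ)
    "\u099E",  # BENGALI LETTER NYA (ঞ)
    "\u0999",  # BENGALI LETTER NGA (ঙ)
}


def initial_letter_ok(label: str) -> bool:
    """Hypothetical check for rule 1: a non-empty label must not start with ৎ, ঞ or ঙ."""
    return bool(label) and label[0] not in DISALLOWED_INITIAL


print(initial_letter_ok("\u09AC\u09BE\u0982\u09B2\u09BE"))        # বাংলা
print(initial_letter_ok("\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"))  # ৎসেরিং
```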
10.2 ‘Sylheti Nagarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924: Kthi), used by accountants (perhaps of the Kāyastha community) in eastern Uttar Pradesh and Bihar, where it was widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Others ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here (138).
13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.
Table 17: The Script Table of Sylheti Nagarī or Siloti
10.3 Confusable Code Points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
4212 NoSymbolsandAbbreviationsAbbreviations weights and measures and other such iconic characters like BANGLAISSHAR(U+09FA)BANGLACURRENCYDENOMINATORSIXTEEN৹(U+09F9)etcwillalsonotbeincluded4213 NoRareandObsoleteCharactersThere are characterswhich have been added toUnicode to accommodate rare formssuchasSanskriticVOCALICRRৠ(U+09E0)andVOCALICLldquoঌrdquo (U+098C)aswellasVOCALICLLৡ(U+09E1)andtheallographicndashkaraformsofthelattertwosymbols-VOWELSIGNVOCALICL(U+09E2)andVOWELSIGNVOCALICLLldquo(U+09E3)Allsuch charactersareexcludedwhich complieswith theConservatismprincipleas laiddownintheRootZoneLGRprocedureHoweverinBanglathe-karacorrespondingtoVOCALICRRৠ(U+09E0)whichisVOWELSIGNVOCALICRRldquordquo(U+09C4)isstill inactiveuseincertainlimitedborrowedorSanskriticwordsandarethereforeretained4214 NoStressMarkersofClassicalSanskritandVedicStressmarkers for classical Sanskrit will not be included This is also in consonancewiththeLetterprincipleaslaiddownintheRootZoneLGRprocedure4215 ABNFThe Augmented Backus-Naur Formalism (ABNF) is described in Section 541 andAppendix(Section101)
5 Repertoire
The Bangla writing system is represented in Unicode using the Bengali (Bangla) script name, as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla, Asamiyā and Manipuri in Unicode has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26 February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in the 8th Meeting of the Indian Language Technologies and Products Sectional Committee LITD 20, held on 23 August 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, the Code Point Repertoire under Table 11 will be assumed to be valid for all three languages above.
For each of the code points, language references are given in the last column, titled ‘Reference’, of Table 8, the “Code Point Repertoire”. For complete coverage of Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to introduce any new complications, except for one particular character. Although only a few representative languages under EGIDS scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under Bangla Unicode Points (as given in Unicode 6.3). Before the details are presented, however, it is helpful to look at the Bangla code point chart from the Maximal Starting Repertoire [MSR] Version 3. Note that the shapes of the reference glyphs in the code charts below are based on one of the many available fonts and are not prescriptive, because actual fonts, both Unicode-compatible and TrueType ones, may vary. Consider the following code point table.
Colour convention:
All characters that are included in the [MSR]: yellow background.
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background.
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background.
Apart from the individual code points above, the Neo-Brāhmī Generation Panel also proposes some specific sequences for the repertoire, which enable the conditional inclusion of BENGALI LETTER A or E followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, in turn followed by BENGALI VOWEL SIGN AA, so that the æ sound, as in English ‘bat’ and ‘cat’, can be written.
5.2 Code Point Repertoire Exclusion
Some characters of the Bangla script that are encoded in Unicode have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.
5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire as a standalone code point, since it is never used alone. It is used only in sequences, for the three special characters ড় ঢ় য় in normalized form.
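Why NUKTA is still needed in sequences can be seen from Unicode normalization: the precomposed letters ড়, ঢ় and য় (U+09DC, U+09DD, U+09DF) are composition exclusions, so their NFC form is always the base consonant followed by U+09BC. The sketch below uses only Python's standard `unicodedata` module to show this, and adds a hypothetical check, `bare_nukta`, that flags a nukta not preceded by ড, ঢ or য; that check is an illustration, not a rule quoted from the LGR.

```python
import unicodedata

NUKTA = "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # ড, ঢ, য


def nfc_code_points(text: str) -> list[str]:
    """Code points of the NFC form of text, as 'U+XXXX' strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", text)]


def bare_nukta(label: str) -> bool:
    """Hypothetical check: True if a nukta appears without one of its three bases."""
    return any(ch == NUKTA and (i == 0 or label[i - 1] not in NUKTA_BASES)
               for i, ch in enumerate(label))


for precomposed in ("\u09DC", "\u09DD", "\u09DF"):  # ড় ঢ় য়
    print(unicodedata.name(precomposed), nfc_code_points(precomposed))
# BENGALI LETTER RRA ['U+09A1', 'U+09BC'], and likewise for RHA and YYA
```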
5.4.2 The Vowel Sequence
In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To help users of other Brahmi scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel, which may optionally be followed by an Anusvāra (onuʃʃār) (B), a Candrabindu (D) or a Visarga (biʃɔrgo) (X). The number of D, B or X signs that can follow a V in Bangla is not necessarily restricted to one. Going by the rules illustrated in this document, however, formations such as VDD, VBB and VXX are invalid orthographic units. On the other hand, it is valid to have a candrabindu followed by an anusvāra, or a candrabindu followed by a visarga, as in হ্যাঁংচা ‘hænchā’ and হ্যাঁঃ ‘hæn’ respectively; that is, a Visarga or Anusvāra (onuʃʃār) can follow a Candrabindu in Bangla. A vowel can also optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF য BENGALI LETTER YA (‘ya’), represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla Hasanta), U+09AF য BENGALI LETTER YA>, and has the special form ্য. When further combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. ‘aa’ (ā)), it is used to transcribe [æ], as in the ‘a’ of the English word ‘bat’, written in Bangla as ব্যাট. A vowel sequence admits the following combinations:
5.4.2.1 A Single Vowel
Example: V: অ (Devanāgarī अ)
5.4.2.2 A Vowel with Conditions
A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by a Consonant [C] followed by a kāra (Mātrā) [M].
5.4.3.2 A Consonant with Conditions
A consonant is optionally followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
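The two patterns in 5.4.2.2 and 5.4.3.2 can be prototyped as regular expressions over WLE symbols. This is a minimal sketch under stated assumptions: the symbol table is simplified and hypothetical (consonants are taken as the range KA..HA), and each pattern encodes only the alternatives listed in these two subsections, not the full ABNF of Section 5.4.1 and Appendix 10.1.

```python
import re

VOWELS = {chr(cp) for cp in (0x0985, 0x0986, 0x0987, 0x0988, 0x0989, 0x098A,
                             0x098B, 0x098F, 0x0990, 0x0993, 0x0994)}
VOWEL_SIGNS = {chr(cp) for cp in (0x09BE, 0x09BF, 0x09C0, 0x09C1, 0x09C2, 0x09C3,
                                  0x09C4, 0x09C7, 0x09C8, 0x09CB, 0x09CC)}
SIGNS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09CD": "H"}


def symbol(ch: str) -> str:
    """Map one code point to its WLE symbol (simplified table, for illustration)."""
    if ch in VOWELS:
        return "V"
    if ch in VOWEL_SIGNS:
        return "M"
    if ch in SIGNS:
        return SIGNS[ch]
    if "\u0995" <= ch <= "\u09B9":  # KA..HA; unassigned gaps are ignored here
        return "C"
    return "?"


# 5.4.2.2: V, optionally followed by B, D, X, DB, DX or H C M.
VOWEL_SEQUENCE = re.compile(r"V(?:B|D|X|DB|DX|HCM)?")
# 5.4.3.2: C, optionally followed by M, B, D, X, H, DB or DX.
CONSONANT_SEQUENCE = re.compile(r"C(?:M|B|D|X|H|DB|DX)?")


def matches(pattern, text: str) -> bool:
    """True if the whole string, read as WLE symbols, matches the pattern."""
    return pattern.fullmatch("".join(symbol(c) for c in text)) is not None


print(matches(VOWEL_SEQUENCE, "\u0986\u0981\u0982"))      # আঁং: V D B -> True
print(matches(VOWEL_SEQUENCE, "\u0985\u0982\u0982"))      # অংং: V B B -> False
print(matches(CONSONANT_SEQUENCE, "\u0995\u0981\u0983"))  # কঁঃ: C D X -> True
```

Working on symbol strings rather than raw code points keeps each pattern a line long, in the spirit of the ABNF notation.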
5.4.3.5 Subsets
As a representative example of its subsets, we consider only the combination CHC; however, the same applies equally to CHCHC and CHCHCHC [A]. The combination may be followed by M, B, D, X, DB or DX.
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā), ‘scolding’.
Note: The condition in this context of KHANDA TA is that the C must be either RA U+09B0 (র), used in Bangla, or RA U+09F0 (ৰ), used in Assamese.
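The KHANDA TA note above can be expressed as a context check. In this sketch the helper name `khanda_ta_context_ok` is hypothetical; it only tests that whenever ৎ follows Hasanta, the consonant before the Hasanta is র or ৰ.

```python
VIRAMA = "\u09CD"
KHANDA_TA = "\u09CE"
RA_FORMS = {"\u09B0", "\u09F0"}  # র (Bangla), ৰ (Assamese)


def khanda_ta_context_ok(label: str) -> bool:
    """Check the KHANDA TA note: in a C + H + ৎ cluster, C must be a RA."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == VIRAMA:
            if i < 2 or label[i - 2] not in RA_FORMS:
                return False
    return True


bhartsana = "\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"  # ভর্ৎসনা
print(khanda_ta_context_ok(bhartsana))             # True
print(khanda_ta_context_ok("\u0995\u09CD\u09CE"))  # ক্ৎ -> False
```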
5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) are described briefly here. Let us take up S first. There are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). To render Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to produce the æ sound, as in English ‘bat’, ‘fat’, etc. The Brāhmī script by nature does not have a Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry it. As we see here, however, Bangla ends up with a deviant feature in its orthography, in which Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]). The other case, mentioned in the table above as P, concerns the formation of repha and ra-phalā in the script. Owing to its co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta +
10 Refer to Rule P in Section 7, Table 16.
6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusable variants into two groups:
Group 1: confusing due to pure visual similarity.
Group 2: confusing due to deviation from the character formations normally perceived by the larger linguistic community.
For Group 1, identical code points are defined as variants. Confusable but non-identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with such cases. However, cases which belong to Group 2 are proposed to be treated as variants. These are not cases of mere visual similarity, as they involve deviations from the widely accepted norms of Bangla akshara formation; they can confuse even a careful observer and are hence proposed as variants.
Variants arise in a script when two or more visually identical forms are encoded with different code points or code point sequences. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two renderings of ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variants, because a kāra followed by a kāra is not allowed.
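The ko example can be checked directly with Python's standard `unicodedata` module. U+09CB BENGALI VOWEL SIGN O is canonically equivalent to U+09C7 + U+09BE, and IDNA2008 requires labels to be in Normalization Form C, so the two-key spelling cannot be registered as a distinct label. The helper `has_kara_after_kara` is a hypothetical illustration of the "kāra followed by kāra" prohibition, not code from the LGR.

```python
import unicodedata

KA = "\u0995"
ko_one_key = KA + "\u09CB"         # ক + BENGALI VOWEL SIGN O
ko_two_keys = KA + "\u09C7\u09BE"  # ক + VOWEL SIGN E + VOWEL SIGN AA

# Dependent vowel signs (kāra) of the Bengali block: U+09BE..U+09CC minus unassigned.
KARAS = {chr(cp) for cp in range(0x09BE, 0x09CD)} - {"\u09C5", "\u09C6", "\u09C9", "\u09CA"}


def has_kara_after_kara(label: str) -> bool:
    """Hypothetical check: True if two vowel signs are adjacent."""
    return any(a in KARAS and b in KARAS for a, b in zip(label, label[1:]))


print(ko_one_key == ko_two_keys)                                # False
print(unicodedata.normalize("NFC", ko_two_keys) == ko_one_key)  # True
print(unicodedata.is_normalized("NFC", ko_two_keys))            # False
print(has_kara_after_kara(ko_two_keys))                         # True
```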
6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I: As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) appears in a conjunct after স (sa, U+09B8) or ন (na, U+09A8) plus Hasanta:
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
When written in traditional orthography, the above combinations can be confusing, since the থ (tha) in the conjunct looks like a হ (ha). The conjunct can occur in initial, medial or final position (see example 1 below). It can also be mistyped as হ (ha, U+09B9), increasing the risks in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
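The two CASE I pairs can be turned into a variant generator. The sketch below is an assumption-laden illustration: `in_script_variants` is a hypothetical helper, and it swaps থ and হ only where they directly follow স or ন plus Hasanta, returning every combination when a label contains several such conjuncts.

```python
VIRAMA = "\u09CD"
THA, HA = "\u09A5", "\u09B9"
BASES = {"\u09B8", "\u09A8"}  # স (sa), ন (na)
SWAP = {THA: HA, HA: THA}


def in_script_variants(label: str) -> set[str]:
    """All labels reachable by swapping tha <-> ha after sa/na + Hasanta."""
    positions = [i for i in range(2, len(label))
                 if label[i] in SWAP and label[i - 1] == VIRAMA and label[i - 2] in BASES]
    variants = {label}
    for i in positions:
        # Each existing variant spawns a copy with position i flipped.
        variants |= {v[:i] + SWAP[v[i]] + v[i + 1:] for v in variants}
    return variants


sthana = "\u09B8\u09CD\u09A5\u09BE\u09A8"  # স্থান
print(len(in_script_variants(sthana)))     # 2: স্থান and স্হান
```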
6.2 Cross-Script Variants
A thorough cross-script study for Bangla has been carried out with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusions they may cause as labels in the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Certain characters which are somewhat similar have not been included here; they are provided in the Appendix (10.3) for reference.
7 Whole Label Evaluation Rules (WLE)
This section provides the WLE rules that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can easily be translated into the LGR specification.
11 Unicode uses ‘Oriya’ for the script, although ‘Odia’ is now the official term.
12 As used by Unicode, denoting and including both Assamese and Maṇipuri.
Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table in the Code Point Repertoire (Section 5.1):
C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), or (æ) Ya-phalā (V1 H C1 M1), where V1 is either U+0985 (অ BENGALI LETTER A) or U+098F (এ BENGALI LETTER E), H is U+09CD (্ BENGALI SIGN VIRAMA), C1 is U+09AF (য BENGALI LETTER YA) and M1 is U+09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even though they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either U+09B0 (র BENGALI LETTER RA) or U+09F0 (ৰ BENGALI LETTER RA WITH MIDDLE DIAGONAL, the Assamese ra).
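To see how these symbols fit together, the following sketch maps a label to a string of WLE symbols, recognizing the S and P sequences before falling back to single code points. The table is a simplified, hypothetical subset of the repertoire (consonants taken as KA..HA plus ৰ and ৱ), not the Code Point Repertoire itself.

```python
import re

VOWELS = (0x0985, 0x0986, 0x0987, 0x0988, 0x0989, 0x098A,
          0x098B, 0x098F, 0x0990, 0x0993, 0x0994)
KARAS = (0x09BE, 0x09BF, 0x09C0, 0x09C1, 0x09C2, 0x09C3,
         0x09C4, 0x09C7, 0x09C8, 0x09CB, 0x09CC)

# Simplified, hypothetical symbol table for the WLE symbols of Section 7.
SYMBOLS = {chr(cp): "V" for cp in VOWELS}
SYMBOLS.update({chr(cp): "M" for cp in KARAS})
SYMBOLS.update({"\u0982": "B", "\u0981": "D", "\u0983": "X",
                "\u09CD": "H", "\u09CE": "Z"})
for cp in list(range(0x0995, 0x09BA)) + [0x09F0, 0x09F1]:  # KA..HA, ৰ, ৱ
    SYMBOLS.setdefault(chr(cp), "C")

# S: (অ|এ) + Hasanta + YA + AA;  P: (র|ৰ) + Hasanta.
SPECIAL = re.compile("(?P<S>[\u0985\u098F]\u09CD\u09AF\u09BE)|(?P<P>[\u09B0\u09F0]\u09CD)")


def to_symbols(label: str) -> str:
    """Rewrite a label as WLE symbols, matching S and P before single code points."""
    out, i = [], 0
    while i < len(label):
        m = SPECIAL.match(label, i)
        if m:
            out.append(m.lastgroup)  # "S" or "P"
            i = m.end()
        else:
            out.append(SYMBOLS.get(label[i], "?"))
            i += 1
    return "".join(out)


print(to_symbols("\u0985\u09CD\u09AF\u09BE\u09AA"))              # অ্যাপ -> SC
print(to_symbols("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা -> CPZCCM
```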
Now let us elaborate on each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in allowing such ligatures: any user can easily create them, even if they are not attested combinations.
7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া (bæŋkʌv ɪndiə; U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09AF U+09BC U+09BE), meaning ‘Bank of India’. Here two different words are joined together, the first of which ends with an H (অফ্) while the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for a full representation of the intended sound. By and large, however, the first word without an H (U+09CD) is considered enough to represent the intended sound. This situation arises only because the hyphen, the space and the ZERO WIDTH NON-JOINER are not in the permissible set of characters of the Root Zone repertoire; otherwise, V never needs to follow an H. Permitting it may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities, hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.
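The prohibition in 7.1.1 reduces to a check on adjacent code points. This minimal sketch assumes the vowel set shown and reuses the ‘Bank of India’ sequence above, with য় written in NFC as U+09AF U+09BC; the helper `v_follows_h` is hypothetical.

```python
VOWELS = {chr(cp) for cp in (0x0985, 0x0986, 0x0987, 0x0988, 0x0989, 0x098A,
                             0x098B, 0x098F, 0x0990, 0x0993, 0x0994)}
VIRAMA = "\u09CD"


def v_follows_h(label: str) -> bool:
    """True if an independent vowel directly follows Hasanta (prohibited by 7.1.1)."""
    return any(a == VIRAMA and b in VOWELS for a, b in zip(label, label[1:]))


# 'Bank of India' from Section 7.1.1, with য় in NFC (U+09AF U+09BC).
with_h = "".join(map(chr, [0x09AC, 0x09CD, 0x09AF, 0x09BE, 0x0999, 0x09CD, 0x0995,
                           0x0985, 0x09AB, 0x09CD, 0x0987, 0x09A8, 0x09CD, 0x09A1,
                           0x09BF, 0x09AF, 0x09BC, 0x09BE]))
without_h = with_h.replace("\u09AB\u09CD\u0987", "\u09AB\u0987")
print(v_follows_h(with_h))     # True: label rejected
print(v_follows_h(without_h))  # False
```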
7.2 Additional Examples from the Bangla ABNF
Below are a few examples that help in understanding some of the rules the ABNF puts in place. They are given for reference only and are not meant to be comprehensive.
ঃক (ःक): As can be seen, such a combination automatically results in a ‘golu’, or dotted circle, marking it as an invalid formation. This is an intrinsic property of the
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What Has Happened So Far in Terms of Script Reforms”. Paper presented at the face-to-face meeting jointly held by the Bangla Academy, Dhaka, and ICANN at the Bangla Academy, Dhaka, on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0: Core Specification, Chapter 12, p. 473.
10 Appendix-I
10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the formalism's structures do not necessarily apply. To take care of such cases, restriction rules are put in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:
1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and to the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).
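Rule 1, together with the ‘golu’ example in 7.2, can be sketched as a check on the first code point. Besides ৎ, ঞ and ঙ, the hypothetical helper `initial_ok` rejects any label that starts with a combining mark, which RFC 5891 (Section 4.2.3.2) already forbids for IDNA2008 labels. The rule on initial য় is not encoded, since the text above describes exceptions to it.

```python
import unicodedata

# ৎ KHANDA TA, ঞ NYA, ঙ NGA: not allowed label-initially (rule 1 above).
FORBIDDEN_INITIAL = {"\u09CE", "\u099E", "\u0999"}


def initial_ok(label: str) -> bool:
    """Hypothetical check of the first code point of a label."""
    if not label:
        return False
    first = label[0]
    # Combining marks (general category Mn, Mc or Me) cannot start a label.
    if unicodedata.category(first).startswith("M"):
        return False
    return first not in FORBIDDEN_INITIAL


print(initial_ok("\u0983\u0995"))  # ঃক -> False (VISARGA is category Mc)
print(initial_ok("\u0995"))        # ক -> True
```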
10.2 ‘Sylheti Nagarī Lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924: Kthi), which was used by accountants (perhaps of the Kāyastha community) in eastern Uttar Pradesh and Bihar and was widely in use during the 1880s. There were several other names of Sylheti
13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri are not covered in this section.
Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)
Table17ndashTheScriptTableofSylhetiNagarıorSiloti
103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
For each of the code points language references have been given in the last columntitledReferenceunderTable8titledtheldquoCodePointRepertoirerdquoForentirecoverageofBanglacodepointsreferencesofBanglaAsamiyā(Assamese)Manipuri(Meitei)andBishnupriya are given Kokborok written in Bangla script is not known to haveintroducedmanynewcomplicationsexceptforoneparticularcharacterThoughonlyafewrepresentativelanguagesunderEGIDSScale1-4havebeenchosenforreferencingthey together cover all the code-points required for all the languages that NBGP hasconsideredasgivenunderBanglaUnicodePoints(asgiveninUNICODE63)Howeverbefore thedetailsarepresented it is ideal to lookat theBanglaCodePointChartfromMaximalStartingRepertoire[MSR]Version3Itmaybenotedthattheshapesofthereferenceglyphsgivenbelowinthecodechartsarebasedononeofthemanyfontsdesigned and are not prescriptive because there could be some variations in actualfonts ndash both UNICODE-compatible and True-Type ones Consider the following Codepointtable
ColourconventionAllcharactersthatareincludedinthe[MSR]-YellowbackgroundPVALIDinIDNA2008butexcludedfromthe[MSR]-PinkishbackgroundNot PVALID in IDNA2008 or are ineligiblefor the root zone (digits hyphen) - Whitebackground
Apart from the above individual code-points the Neo-Brāhmī Generation Panel alsoproposes some specific sequences which enable conditional inclusion of the BanglaLETTER A and E followed by Bangla SIGN VIRAMA and Bangla LETTER YA againfollowed by Bangla VOWEL SIGN AA in the repertoire for enabling inclusion of aeligsoundasinEnglishlsquobatrsquolsquocatrsquoetc
52 CodePointRepertoireExclusionTherearesomecharactersoftheBanglascriptthatfindplaceintheUnicodebuthavenot been included in the repertoire in the LGR proposal The reason for excludingঌ(U+098C)andৗ(U+09D7)isthattheyarerareandobsoletecharacters
53 CodepointnotusedaloneBENGALI SIGN NUKTA U+09BC (See 336) is excluded from repertoire since it will never be used alone It will be used as sequence in three special characters in normalized form for ড় ঢ় য়
542 TheVowelSequenceInwhatfollowstheVowelSequenceandtheConsonantSequencepertinenttoBanglaare given To facilitate understanding of other Brahmi script users equivalents inDevanāgarīareprovidedwherevernecessaryAvowelsequenceismadeupofasinglevowelItmaybefollowedbutnotnecessarily(optionally)byanAnusvāraonuʃʃār(B)Candrabindu(D)oraVisargabiʃɔrgo(X)ThenumberofDBorXwhichcanfollowaV inBanglamaynotberestrictedtooneGoingbytherules illustratedinthedocument it isclearthat formationssuchasVDDVBBandVXXare invalidorthographicunitsHowever it isvalidandpossible tohaveformationsorsequencessuchasanusvarafollowedbyachandrabinduononehandandvisarga followed by a chandrabindu on the other as in হ8াংচা lsquohaelignchārsquo and lsquohaelignrsquo হ8াঃrespectivelyThepossibilityof aVisarga orAnusvāra (onuʃʃār) followingaCandrabinduexists inBangla Vowel can optionally be followed by a combination of Hasanta Virama [H]Consonant [C] to formaYa-phala ldquoYa-phala isapresentation formofU+09AFBanglaletterযorlsquoyarsquoRepresentedbythesequenceltU+09CDieBENGALISIGNVIRAMABangla SIGNHasanta or VIRAMA U+09AF -য BENGALI LETTER YAgt Ya-phala has aspecialformয়AgainwhencombinedwithU+09BEাBENGALIVOWELSIGNAA(ielsquoaarsquo(ā))itisusedfortranscribing[aelig]asintheldquoardquointheEnglishwordldquobatrdquowritteninBanglaasব8াটAVowel-sequenceadmitsthefollowingcombinations
5421 ASingleVowel
ExamplesV অअ
5422 AVowelwithConditionsAVowelcanoptionallybefollowedbyAnusvāra[B]orCandrabindu[D]orVisarga[X]or Candrabindu+ Anusvāra [DB] or Candrabindu+ Visarga [DX] or combination ofHasanta(orVirama)[H]followedbyConsonant[C]followedbykāra(Mātrā)[M]
5432 AConsonantwithConditionsAConsonantoptionallyfollowedbydependentvowelsignkāra(Mātrā)[M]orAnusvāra [B] or Candrabindu [D] or Visarga [X] or Hasanta (also known asVirama)[H]orCandrabindu+Anusvāra[DB]orCandrabindu+ Visarga[DX]
5435 Subsets While considering its subsets as a representative example we will consider thecombinationCHConlyhoweverthesameisequallyapplicabletoCHCHCandCHCHCHC[A]ThecombinationmaybefollowedbyMBDXDBorDX
Example র++ৎ =ৎHasinভৎHসনা (bhartsanā) scoldingNoteTheconditionsinthiscontextofKHANDATAarethattheCshouldbeeitherRAU+09B0(র)(usedinBangla)orRAU+09F0(ৰ)(usedinAssamese)
545 SpecialCasesSandPTwospecialcasesinvolvingSequences(referredtoasSandPinTable16underSection7)couldbedescribedbrieflyhereLetustakeupSinthefirstinstanceItisnoteworthythat thereare two instances inBanglawhereHasanta(U+09CD) isprecededbya full
10 Refer to Rule P in Section 7 Table 16
42
vowel (U+0985 অ - BANGLA LETTER A and U+098F এ - BANGLA LETTER E) ForrenderingYa-phalāfollowedbyঅandএitisnecessarytotypeU+09CDplusU+09AFyapreceded by the said vowels This is a purely ligatural entity and the addition ofYa-phalāandā-kāraisusedtoelicittheaeligsoundasinEnglishlsquobatrsquolsquofatrsquoetcTheBrahmıscriptbynaturedoesnothaveHasantaafteravowelHasantaisgenerallydescribedaslsquovowel killerrsquo although it actually indicates absence of a vowel after the markedconsonant Only the consonants can have the Hasanta marked But as we see hereBanglaendsupwithadeviantfeatureintheorthographyhereinwhichHasantacomesimmediatelyafteravowelinligaturesঅGাandএGা(CfUnicode100p473[100])Another case refers to the formation of repha and ra-phalā in the said script andmentioned in the tableaboveasPOwing toco-occurrencewithHASANTARAeitherlosesitsownimplicitvowel(REPHA)orsuppressestheimplicitvoweloftheprecedingconsonant(RA-PHALA )Forinstancerepha=ra+Hasanta+C(egকHiera+Hasanta+
6 VariantsThissection talksabout thevariants in theBanglascriptTheNBGPcategorizes theseconfusinglyvariantsintwogroupsGroup1ConfusingduetopurevisualsimilarityGroup2ConfusingduetodeviationfromnormallyperceivedcharacterformationsbylargerlinguisticcommunityForGroup1anyidenticalcodepointsaredefinedasvariantsTheconfusablebutnotidenticalcasesarenotproposedasthereisanotherpanel(Stringsimilarityassessmentpanel)entrustedtodealwithsuchcasesHowevercaseswhichbelongtoGroup2areproposedtobeconsideredasvariantsThesecasesarenotofmerevisualsimilarityasthey involve some deviations from the widely accepted norms of Bangla Aksharformations These can cause confusion even to a careful observer and hence beingproposedasvariantsThe variants are generated in a script when two or more forms are formed withdifferent storage or code points In Bangla the e-kāra ā-kāra and the o-kāra havedifferentcodepointsOnecantypeowithaconsonantatonegoandthesamebytypinge-kāra and ā-kāra as two separate keys getting the same results A reader cannotdifferentiatebetween the twoko (েকা)one typedwitha singlekeyand theotheronetypedwithtwodifferentkeysMoreoverthiswillnotbeconsideredasacaseofvariantbecauseakārafollowedbyakāraisnotallowed
61 InScriptVariantsHoweverweproposetwocasesoftruein-scriptvariantsinBanglascriptCASEIAs far as true variants in Bangla are concernedwemay drawour attention to caseswhereinHasantawith(U+09A5)থ(tha)appearsasconjunctwith(U+09B8)স(sa)and(U+09A8)ন(na)1 স+Hasanta+থ(U+09B8+U+09CD+U+09A5)versus
স +Hasanta+হ(U+09B8+U+09CD+U+09B9)
2 ন+Hasanta+থ(U+09A8+U+09CD+U+09A5)versus
ন+Hasanta+হ(U+09A8+U+09CD+U+09B9)
44
Theabovecombinationsifwrittenintraditionalorthographycouldbelittleconfusingwheretheথ(tha)inconjunctappearslikeaহ(ha)Theconjunctcouldbeintheinitialmedialorfinalpositions(asshownbelowinegno1)Itcouldbetypedwrongaswellthinking itwas aহ (ha)U+09B9 increasing the chances of risks in labelwriting andidentificationExamples1 acuteandসহ(asinacuteানsthānaacuteলsthulaাacuteGsvāsthyaঅacuteায়ীasthāyı)
62 CrossScriptVariantsAcrispcrossscriptstudyforBanglahasbeendonewithrespecttosisterscriptssuchasDevanāgarī Gurmukhı and Odia11(formerly Oriya) keeping in mind the visual andtechnicalconfusionstheymaycauseaslabelsonthewebdomainMoreoverthereisnoin-script variant in Bangla as far as the orthography is concerned The followingcharacters are being proposed by the NBPG as variants Although there are certaincharacters which are somewhat similar they but have not been included here TheyhavebeenprovidedintheAppendix(102)forreference
7 WholeLabelEvaluationRules(WLE) ThissectionprovidestheWLEsthatarerequiredbyallthelanguagesmentionedinsection32whenwritten inBangla12ScriptThe ruleshavebeendrafted in suchawaythattheycanbeeasilytranslatedintotheLGRspecifications
11 Unicode uses Oriya for the script although Odia is now the official term used 12 As used by the Unicode denoting and including both Assamese and Maṇipuri
47
Below are the symbols used in the WLE rules for each of the Indic SyllabicCategoryasmentionedinthetableprovidedinCodepointrepertoire(Section51)
C rarr Consonant
M rarr Kāra(Mātrā)
V rarr Vowel
B rarr Anusvāra
D rarr Candrabindu
X rarr Visarga
H rarr Hasanta
Z rarr KhandaTa
S rarr S1S2(fromTable9)or(ae)Ya-phalā(V1HC1M1)whereV1iseither0985(অ-BENGALILETTERA)or098F(এ-BENGALILETTERE)His09CD(-BENGALISIGNVIRAMA)C1is-09AF(য-BENGALILETTERYA)M1is-09BE(া-BENGALIVOWELSIGNAA)S1 and S2 are valid even they are not allowed by the other context rules
P rarr Ra-Hasanta(C2H)whereC2iseither09B0(র-BENGALILETTERRA)or 09F0 (ৰ - ASSAMESE LETTER RA
Now letuselaborateeach rulewithexamples from the scriptkeeping inmindthe Bangla Assamese and Manipuri communities Some combinations ofcharactersmayseemunrealisticorrareinusagebutthereisnoharminaddingsuch ligaturesbecause it ispossible tocreate thembyanyusereasilybutmaynotbeattestedcombinations
49
711 Case of V Preceded by H There could be cases involvingmulti-word domains where Vmay need to beallowedtofollowanHegব8াtঅuইিvয়া baeligŋkʌv ɪndiə (U+09ACU+09CDU+09AFU+09BEU+0999U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1U+09BFU+09DFU+09BE)(meaningBankofIndia)ThisisthecasewheretwodifferentwordsarejoinedtogetherfirstofwhichendswithanH(অu)andthesecondwordbeginswithaV(ইিvয়া)Somesectionsofthe linguistic community require the explicit presence of H for fullrepresentation of the sound intended However by and large the form of thefirstwordwithoutanH(U+09CD)isconsideredenoughforfullrepresentationofthesoundintendedforthefirstwordThis isauniquesituationnecessitatedbythelackofhyphenspaceortheZeroWidthNon-joinercharacterinthepermissiblesetofcharactersintheRootzonerepertoire Otherwise V is never required to be allowed to follow an HPermittingthismaycreateaperceptivesimilaritybetweentwolabels(withandwithout H) for majority of the linguistic communities hence this is explicitlyprohibitedbytheNBGPIn future if required depending on the prevailing requirements from thecommunitythefutureNBGPmayconsiderrevisitingthisrule
72 AdditionalExamplesfromBanglaABNF
Belowarea fewexampleswhichhelponeunderstandsomeof the rulesABNFputsinplaceThesearejustgivenforreferencepurposesandarenotmeanttobecomprehensive
ঃক ःकAscanbeseensuchcombinationwillresultautomaticallyinaldquogolurdquooradottedcircle marking it as an invalid formation This is an intrinsic property of the
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti
13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section
58
Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)
Table17ndashTheScriptTableofSylhetiNagarıorSiloti
103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
ColourconventionAllcharactersthatareincludedinthe[MSR]-YellowbackgroundPVALIDinIDNA2008butexcludedfromthe[MSR]-PinkishbackgroundNot PVALID in IDNA2008 or are ineligiblefor the root zone (digits hyphen) - Whitebackground
Apart from the above individual code-points the Neo-Brāhmī Generation Panel alsoproposes some specific sequences which enable conditional inclusion of the BanglaLETTER A and E followed by Bangla SIGN VIRAMA and Bangla LETTER YA againfollowed by Bangla VOWEL SIGN AA in the repertoire for enabling inclusion of aeligsoundasinEnglishlsquobatrsquolsquocatrsquoetc
52 CodePointRepertoireExclusionTherearesomecharactersoftheBanglascriptthatfindplaceintheUnicodebuthavenot been included in the repertoire in the LGR proposal The reason for excludingঌ(U+098C)andৗ(U+09D7)isthattheyarerareandobsoletecharacters
53 CodepointnotusedaloneBENGALI SIGN NUKTA U+09BC (See 336) is excluded from repertoire since it will never be used alone It will be used as sequence in three special characters in normalized form for ড় ঢ় য়
542 TheVowelSequenceInwhatfollowstheVowelSequenceandtheConsonantSequencepertinenttoBanglaare given To facilitate understanding of other Brahmi script users equivalents inDevanāgarīareprovidedwherevernecessaryAvowelsequenceismadeupofasinglevowelItmaybefollowedbutnotnecessarily(optionally)byanAnusvāraonuʃʃār(B)Candrabindu(D)oraVisargabiʃɔrgo(X)ThenumberofDBorXwhichcanfollowaV inBanglamaynotberestrictedtooneGoingbytherules illustratedinthedocument it isclearthat formationssuchasVDDVBBandVXXare invalidorthographicunitsHowever it isvalidandpossible tohaveformationsorsequencessuchasanusvarafollowedbyachandrabinduononehandandvisarga followed by a chandrabindu on the other as in হ8াংচা lsquohaelignchārsquo and lsquohaelignrsquo হ8াঃrespectivelyThepossibilityof aVisarga orAnusvāra (onuʃʃār) followingaCandrabinduexists inBangla Vowel can optionally be followed by a combination of Hasanta Virama [H]Consonant [C] to formaYa-phala ldquoYa-phala isapresentation formofU+09AFBanglaletterযorlsquoyarsquoRepresentedbythesequenceltU+09CDieBENGALISIGNVIRAMABangla SIGNHasanta or VIRAMA U+09AF -য BENGALI LETTER YAgt Ya-phala has aspecialformয়AgainwhencombinedwithU+09BEাBENGALIVOWELSIGNAA(ielsquoaarsquo(ā))itisusedfortranscribing[aelig]asintheldquoardquointheEnglishwordldquobatrdquowritteninBanglaasব8াটAVowel-sequenceadmitsthefollowingcombinations
5.4.2.1 A Single Vowel
Examples: V অ अ
5.4.2.2 A Vowel with Conditions
A Vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].
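The vowel sequence of Section 5.4.2.2 can be expressed as a regular expression. The Python sketch below is illustrative only: the character classes V, C and M are assumed subsets of the repertoire, and the general H C M branch is broader than rule S in Section 7, which restricts it to অ্যা and এ্যা.

```python
import re

# Character classes are assumptions for illustration, not the normative LGR.
V = "[\u0985-\u098B\u098F\u0990\u0993\u0994]"                       # vowels
C = "[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0\u09F1]"   # consonants
M = "[\u09BE-\u09C3\u09C7\u09C8\u09CB\u09CC]"                       # kāras
B, D, X, H = "\u0982", "\u0981", "\u0983", "\u09CD"

# V, optionally followed by B | D | X | DB | DX | H C M  (Section 5.4.2.2)
VOWEL_SEQUENCE = re.compile(f"{V}(?:{B}|{D}(?:{B}|{X})?|{X}|{H}{C}{M})?")


def is_vowel_sequence(s: str) -> bool:
    """True if s is exactly one vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(s) is not None
```

With this pattern, VDD, VBB and VXX fail to match, because each sign may appear at most once after the vowel.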
5.4.3.2 A Consonant with Conditions
A Consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
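Under the same assumed character classes, the consonant sequence might be sketched as follows. It implements only the single optional follower listed in this subsection. The nukta is allowed after any C here; its restriction to ড, ঢ and য (Section 5.3) would be checked separately.

```python
import re

C = "[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0\u09F1]"  # assumed
NUKTA = "\u09BC"  # only as part of ড়, ঢ়, য় (base check done elsewhere)
M = "[\u09BE-\u09C3\u09C7\u09C8\u09CB\u09CC]"                      # assumed
B, D, X, H = "\u0982", "\u0981", "\u0983", "\u09CD"

# C, optionally followed by exactly one of M | B | D | X | H | DB | DX
CONSONANT_SEQUENCE = re.compile(
    f"{C}{NUKTA}?(?:{M}|{B}|{D}(?:{B}|{X})?|{X}|{H})?"
)


def is_consonant_sequence(s: str) -> bool:
    """True if s is exactly one consonant sequence as listed in 5.4.3.2."""
    return CONSONANT_SEQUENCE.fullmatch(s) is not None
```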
5.4.3.5 Subsets
In considering the subsets, we take the combination CHC as a representative example; the same applies equally to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’.
Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
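A sketch of the CHC subsets and the KHANDA TA condition, again with assumed character classes and hypothetical function names. `is_conjunct` accepts one to three H C links followed by an optional M, B, D, X, DB or DX, and `khanda_ta_ok` rejects a Hasanta + ৎ that is not preceded by র or ৰ.

```python
import re

C = "[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0\u09F1]"  # assumed
M = "[\u09BE-\u09C3\u09C7\u09C8\u09CB\u09CC]"                      # assumed
B, D, X, H = "\u0982", "\u0981", "\u0983", "\u09CD"
KHANDA_TA = "\u09CE"
RA = {"\u09B0", "\u09F0"}  # র (Bangla), ৰ (Assamese)

# CHC, CHCHC or CHCHCHC, optionally followed by M | B | D | X | DB | DX
CONJUNCT = re.compile(f"{C}(?:{H}{C}){{1,3}}(?:{M}|{B}|{D}(?:{B}|{X})?|{X})?")


def is_conjunct(s: str) -> bool:
    return CONJUNCT.fullmatch(s) is not None


def khanda_ta_ok(label: str) -> bool:
    """A Hasanta directly before ৎ must itself follow র or ৰ."""
    for i, ch in enumerate(label):
        if ch == H and label[i + 1:i + 2] == KHANDA_TA:
            if i == 0 or label[i - 1] not in RA:
                return False
    return True
```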
5.4.5 Special Cases: S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) are described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). To render Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF (ya) after the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English ‘bat’, ‘fat’, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta. But, as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
The other case refers to the formation of repha and ra-phalā in the script, mentioned in the table above as P.¹⁰ Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta +
10 Refer to Rule P in Section 7, Table 16.
6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusable variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from character formations as normally perceived by the larger linguistic community.
For Group 1, only identical code points are defined as variants. Confusable but non-identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with such cases. However, cases which belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviation from the widely accepted norms of Bangla akshar formation. They can confuse even a careful observer and hence are being proposed as variants.
Variants arise in a script when two or more identical-looking forms are stored with different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o-kāra with a consonant in one go, or obtain the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variants, because a kāra followed by another kāra is not allowed.
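The o-kāra case can be illustrated with Python's standard `unicodedata` module. U+09CB (ো) canonically decomposes to U+09C7 U+09BE, so NFC normalization (which IDNA2008 requires of U-labels) maps the two-key spelling to the single code point. If un-normalized input were ever evaluated, the two-key form would be a kāra followed by a kāra (M M), which the rules do not allow. The kāra set below is an assumption for illustration.

```python
import unicodedata

# Assumed kāra set, for illustration only.
KARAS = set("\u09BE\u09BF\u09C0\u09C1\u09C2\u09C3\u09C7\u09C8\u09CB\u09CC")

typed_once = "\u0995\u09CB"         # ক + ো  (o-kāra, one key)
typed_twice = "\u0995\u09C7\u09BE"  # ক + ে + া  (e-kāra then ā-kāra)


def has_kara_after_kara(label: str) -> bool:
    """True if a kāra directly follows another kāra (M M)."""
    return any(a in KARAS and b in KARAS for a, b in zip(label, label[1:]))


print(unicodedata.normalize("NFC", typed_twice) == typed_once)  # True
print(has_kara_after_kara(typed_twice))                          # True
print(has_kara_after_kara(typed_once))                           # False
```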
6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I: As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) appears in a Hasanta conjunct with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus
স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus
ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
The above combinations, if written in traditional orthography, can be confusing, since the থ (tha) in the conjunct looks like a হ (ha). The conjunct can occur in initial, medial or final position (as shown below in example no. 1). It could also be mistyped as হ (ha, U+09B9) by a user who takes it for a ha, increasing the risk of errors in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
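Variant generation for Case I might be sketched as below. This is an assumption about how the pairs could be applied, not the LGR's normative variant mechanism (which is expressed in the XML file): every occurrence of স্থ/স্হ or ন্থ/ন্হ may be swapped independently, so a label with n such conjuncts has 2^n variant labels, the original included.

```python
# Hypothetical variant generator for the Case I pairs in Section 6.1.
SA, NA, THA, HA, H = "\u09B8", "\u09A8", "\u09A5", "\u09B9", "\u09CD"
PAIRS = [(SA + H + THA, SA + H + HA), (NA + H + THA, NA + H + HA)]

SWAP = {}
for a, b in PAIRS:
    SWAP[a], SWAP[b] = b, a


def variant_labels(label: str) -> set:
    """Return the label plus every label obtained by swapping any subset
    of its স্থ/স্হ and ন্থ/ন্হ occurrences."""
    results = {""}
    i = 0
    while i < len(label):
        chunk = label[i:i + 3]
        if chunk in SWAP:
            results = {r + c for r in results for c in (chunk, SWAP[chunk])}
            i += 3
        else:
            results = {r + label[i] for r in results}
            i += 1
    return results


sthana = "\u09B8\u09CD\u09A5\u09BE\u09A8"  # স্থান
print(sorted(variant_labels(sthana)))     # স্থান and স্হান
```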
6.2 Cross-Script Variants
A focused cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels in web domains. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Certain other characters are somewhat similar but have not been included here; they are provided in the Appendix (10.3) for reference.
7 Whole Label Evaluation Rules (WLE)
This section provides the WLE rules that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications.
11 Unicode uses Oriya for the script, although Odia is now the official term.
12 As used by Unicode, denoting and including both Assamese and Maṇipuri.
Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table provided under Code Point Repertoire (Section 5.1):
C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9) or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ - BENGALI LETTER A) or 098F (এ - BENGALI LETTER E), H is 09CD (্ - BENGALI SIGN VIRAMA), C1 is 09AF (য - BENGALI LETTER YA) and M1 is 09BE (া - BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র - BENGALI LETTER RA) or 09F0 (ৰ - ASSAMESE LETTER RA)
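To make the symbols concrete, the following Python sketch maps a label to a WLE symbol string. The category table is an assumed subset of the repertoire (the normative categories are in the table of Section 5.1 and the LGR XML file). S is matched as the four-code-point sequence V1 H C1 M1, P as RA + Hasanta, and the nukta is absorbed into its base consonant.

```python
# Assumed subset of the repertoire; the normative data is the LGR XML file.
CATEGORY = {}
for cp in (0x0985, 0x0986, 0x0987, 0x0988, 0x0989, 0x098A, 0x098B,
           0x098F, 0x0990, 0x0993, 0x0994):
    CATEGORY[chr(cp)] = "V"
for cp in list(range(0x0995, 0x09BA)) + [0x09F0, 0x09F1]:
    if cp not in (0x09A9, 0x09B1, 0x09B3, 0x09B4, 0x09B5):  # unassigned
        CATEGORY[chr(cp)] = "C"
for cp in (0x09BE, 0x09BF, 0x09C0, 0x09C1, 0x09C2, 0x09C3,
           0x09C7, 0x09C8, 0x09CB, 0x09CC):
    CATEGORY[chr(cp)] = "M"
CATEGORY.update({"\u0982": "B", "\u0981": "D", "\u0983": "X",
                 "\u09CD": "H", "\u09CE": "Z"})

NUKTA, HASANTA = "\u09BC", "\u09CD"
S_SEQUENCES = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা, এ্যা
RA = {"\u09B0", "\u09F0"}                                              # C2


def to_symbols(label: str) -> str:
    """Map a label to WLE symbols; '?' marks code points outside this subset."""
    out, i = [], 0
    while i < len(label):
        if label[i:i + 4] in S_SEQUENCES:
            out.append("S")
            i += 4
        elif label[i] in RA and label[i + 1:i + 2] == HASANTA:
            out.append("P")
            i += 2
        else:
            if label[i] != NUKTA:  # nukta is absorbed into ড়, ঢ়, য়
                out.append(CATEGORY.get(label[i], "?"))
            i += 1
    return "".join(out)


print(to_symbols("\u09AC\u09BE\u0982\u09B2\u09BE"))              # বাংলা: CMBCM
print(to_symbols("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা: CPZCCM
```

Note that ra-phalā (e.g. প্র) is not tokenized as P here, because the RA follows the Hasanta rather than preceding it.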
Now let us elaborate on each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in allowing such ligatures: any user can easily create them, even if they are not attested combinations.
7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where a V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া, bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning ‘Bank of India’. This is the case where two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. By and large, however, the form of the first word without the H (U+09CD) is considered sufficient to represent the sound intended for that word. This situation arises only because the hyphen, the space and the Zero Width Non-Joiner are absent from the permissible set of characters in the Root Zone repertoire; otherwise, a V never needs to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities, hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.
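The prohibition can be checked with a simple scan for H followed by V. In the sketch below the vowel set is assumed and `has_vowel_after_hasanta` is a hypothetical name; the test label is built from the code points listed above, with and without the H after ফ.

```python
import unicodedata

HASANTA = "\u09CD"
# Assumed independent-vowel set, for illustration only.
VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")


def has_vowel_after_hasanta(label: str) -> bool:
    """True if an independent vowel directly follows Hasanta (H V)."""
    s = unicodedata.normalize("NFC", label)
    return any(a == HASANTA and b in VOWELS for a, b in zip(s, s[1:]))


# 'Bank of India' as spelled in Section 7.1.1.
with_h = "".join(chr(cp) for cp in (
    0x09AC, 0x09CD, 0x09AF, 0x09BE, 0x0999, 0x09CD, 0x0995, 0x0985, 0x09AB,
    0x09CD, 0x0987, 0x09A8, 0x09CD, 0x09A1, 0x09BF, 0x09DF, 0x09BE))
without_h = with_h.replace("\u09AB\u09CD", "\u09AB")

print(has_vowel_after_hasanta(with_h))     # True: rejected
print(has_vowel_after_hasanta(without_h))  # False
```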
7.2 Additional Examples from Bangla ABNF
Below are a few examples which help one understand some of the rules that the ABNF puts in place. These are given for reference purposes only and are not meant to be comprehensive.
ঃক (Devanāgarī ःक): As can be seen, such a combination automatically results in a “golu”, or dotted circle, marking it as an invalid formation. This is an intrinsic property of the
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink, and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What has happened So Far In terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0, Core Specification, Chapter 12, p. 473.
10 Appendix-I
10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the Formalism's structures do not necessarily apply. To take care of such cases, restriction rules are set in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:
1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and to the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Bangla does not normally allow ya (য়) at the beginning of a word either, but we can cite a couple of native exceptions, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichu Chor’ written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across the possibility of Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).
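A sketch of the initial-position restriction in rule 1, assuming labels in NFC and using the hypothetical name `initial_ok`. It also rejects a label that begins with a combining mark, such as the visarga-first ঃক of Section 7.2; IDNA2008 (RFC 5891) independently forbids a label from beginning with a combining mark. As rule 1 is written, ৎসেরিং is rejected while য়াকুব is accepted.

```python
import unicodedata

FORBIDDEN_INITIAL = {"\u09CE", "\u099E", "\u0999"}  # ৎ, ঞ, ঙ (rule 1)


def initial_ok(label: str) -> bool:
    """Reject an empty label, a label starting with ৎ, ঞ or ঙ, or one starting
    with a combining mark (general category Mn, Mc or Me, e.g. visarga)."""
    s = unicodedata.normalize("NFC", label)
    if not s or s[0] in FORBIDDEN_INITIAL:
        return False
    return not unicodedata.category(s[0]).startswith("M")
```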
10.2 ‘Sylheti Nagarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924: Kthi), used by accountants (perhaps of the Kāyastha community) in Eastern Uttar Pradesh and Bihar, which was widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some suggest that it was an invention of the Afghan rulers of Sylhet (137). Others ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).
13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.
Table 17: The Script Table of Sylheti Nagarī or Siloti
10.3 Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
Apart from the above individual code-points the Neo-Brāhmī Generation Panel alsoproposes some specific sequences which enable conditional inclusion of the BanglaLETTER A and E followed by Bangla SIGN VIRAMA and Bangla LETTER YA againfollowed by Bangla VOWEL SIGN AA in the repertoire for enabling inclusion of aeligsoundasinEnglishlsquobatrsquolsquocatrsquoetc
52 CodePointRepertoireExclusionTherearesomecharactersoftheBanglascriptthatfindplaceintheUnicodebuthavenot been included in the repertoire in the LGR proposal The reason for excludingঌ(U+098C)andৗ(U+09D7)isthattheyarerareandobsoletecharacters
53 CodepointnotusedaloneBENGALI SIGN NUKTA U+09BC (See 336) is excluded from repertoire since it will never be used alone It will be used as sequence in three special characters in normalized form for ড় ঢ় য়
542 TheVowelSequenceInwhatfollowstheVowelSequenceandtheConsonantSequencepertinenttoBanglaare given To facilitate understanding of other Brahmi script users equivalents inDevanāgarīareprovidedwherevernecessaryAvowelsequenceismadeupofasinglevowelItmaybefollowedbutnotnecessarily(optionally)byanAnusvāraonuʃʃār(B)Candrabindu(D)oraVisargabiʃɔrgo(X)ThenumberofDBorXwhichcanfollowaV inBanglamaynotberestrictedtooneGoingbytherules illustratedinthedocument it isclearthat formationssuchasVDDVBBandVXXare invalidorthographicunitsHowever it isvalidandpossible tohaveformationsorsequencessuchasanusvarafollowedbyachandrabinduononehandandvisarga followed by a chandrabindu on the other as in হ8াংচা lsquohaelignchārsquo and lsquohaelignrsquo হ8াঃrespectivelyThepossibilityof aVisarga orAnusvāra (onuʃʃār) followingaCandrabinduexists inBangla Vowel can optionally be followed by a combination of Hasanta Virama [H]Consonant [C] to formaYa-phala ldquoYa-phala isapresentation formofU+09AFBanglaletterযorlsquoyarsquoRepresentedbythesequenceltU+09CDieBENGALISIGNVIRAMABangla SIGNHasanta or VIRAMA U+09AF -য BENGALI LETTER YAgt Ya-phala has aspecialformয়AgainwhencombinedwithU+09BEাBENGALIVOWELSIGNAA(ielsquoaarsquo(ā))itisusedfortranscribing[aelig]asintheldquoardquointheEnglishwordldquobatrdquowritteninBanglaasব8াটAVowel-sequenceadmitsthefollowingcombinations
5421 ASingleVowel
ExamplesV অअ
5422 AVowelwithConditionsAVowelcanoptionallybefollowedbyAnusvāra[B]orCandrabindu[D]orVisarga[X]or Candrabindu+ Anusvāra [DB] or Candrabindu+ Visarga [DX] or combination ofHasanta(orVirama)[H]followedbyConsonant[C]followedbykāra(Mātrā)[M]
5432 AConsonantwithConditionsAConsonantoptionallyfollowedbydependentvowelsignkāra(Mātrā)[M]orAnusvāra [B] or Candrabindu [D] or Visarga [X] or Hasanta (also known asVirama)[H]orCandrabindu+Anusvāra[DB]orCandrabindu+ Visarga[DX]
5435 Subsets While considering its subsets as a representative example we will consider thecombinationCHConlyhoweverthesameisequallyapplicabletoCHCHCandCHCHCHC[A]ThecombinationmaybefollowedbyMBDXDBorDX
Example র++ৎ =ৎHasinভৎHসনা (bhartsanā) scoldingNoteTheconditionsinthiscontextofKHANDATAarethattheCshouldbeeitherRAU+09B0(র)(usedinBangla)orRAU+09F0(ৰ)(usedinAssamese)
545 SpecialCasesSandPTwospecialcasesinvolvingSequences(referredtoasSandPinTable16underSection7)couldbedescribedbrieflyhereLetustakeupSinthefirstinstanceItisnoteworthythat thereare two instances inBanglawhereHasanta(U+09CD) isprecededbya full
10 Refer to Rule P in Section 7 Table 16
42
vowel (U+0985 অ - BANGLA LETTER A and U+098F এ - BANGLA LETTER E) ForrenderingYa-phalāfollowedbyঅandএitisnecessarytotypeU+09CDplusU+09AFyapreceded by the said vowels This is a purely ligatural entity and the addition ofYa-phalāandā-kāraisusedtoelicittheaeligsoundasinEnglishlsquobatrsquolsquofatrsquoetcTheBrahmıscriptbynaturedoesnothaveHasantaafteravowelHasantaisgenerallydescribedaslsquovowel killerrsquo although it actually indicates absence of a vowel after the markedconsonant Only the consonants can have the Hasanta marked But as we see hereBanglaendsupwithadeviantfeatureintheorthographyhereinwhichHasantacomesimmediatelyafteravowelinligaturesঅGাandএGা(CfUnicode100p473[100])Another case refers to the formation of repha and ra-phalā in the said script andmentioned in the tableaboveasPOwing toco-occurrencewithHASANTARAeitherlosesitsownimplicitvowel(REPHA)orsuppressestheimplicitvoweloftheprecedingconsonant(RA-PHALA )Forinstancerepha=ra+Hasanta+C(egকHiera+Hasanta+
6 VariantsThissection talksabout thevariants in theBanglascriptTheNBGPcategorizes theseconfusinglyvariantsintwogroupsGroup1ConfusingduetopurevisualsimilarityGroup2ConfusingduetodeviationfromnormallyperceivedcharacterformationsbylargerlinguisticcommunityForGroup1anyidenticalcodepointsaredefinedasvariantsTheconfusablebutnotidenticalcasesarenotproposedasthereisanotherpanel(Stringsimilarityassessmentpanel)entrustedtodealwithsuchcasesHowevercaseswhichbelongtoGroup2areproposedtobeconsideredasvariantsThesecasesarenotofmerevisualsimilarityasthey involve some deviations from the widely accepted norms of Bangla Aksharformations These can cause confusion even to a careful observer and hence beingproposedasvariantsThe variants are generated in a script when two or more forms are formed withdifferent storage or code points In Bangla the e-kāra ā-kāra and the o-kāra havedifferentcodepointsOnecantypeowithaconsonantatonegoandthesamebytypinge-kāra and ā-kāra as two separate keys getting the same results A reader cannotdifferentiatebetween the twoko (েকা)one typedwitha singlekeyand theotheronetypedwithtwodifferentkeysMoreoverthiswillnotbeconsideredasacaseofvariantbecauseakārafollowedbyakāraisnotallowed
61 InScriptVariantsHoweverweproposetwocasesoftruein-scriptvariantsinBanglascriptCASEIAs far as true variants in Bangla are concernedwemay drawour attention to caseswhereinHasantawith(U+09A5)থ(tha)appearsasconjunctwith(U+09B8)স(sa)and(U+09A8)ন(na)1 স+Hasanta+থ(U+09B8+U+09CD+U+09A5)versus
স +Hasanta+হ(U+09B8+U+09CD+U+09B9)
2 ন+Hasanta+থ(U+09A8+U+09CD+U+09A5)versus
ন+Hasanta+হ(U+09A8+U+09CD+U+09B9)
44
Theabovecombinationsifwrittenintraditionalorthographycouldbelittleconfusingwheretheথ(tha)inconjunctappearslikeaহ(ha)Theconjunctcouldbeintheinitialmedialorfinalpositions(asshownbelowinegno1)Itcouldbetypedwrongaswellthinking itwas aহ (ha)U+09B9 increasing the chances of risks in labelwriting andidentificationExamples1 acuteandসহ(asinacuteানsthānaacuteলsthulaাacuteGsvāsthyaঅacuteায়ীasthāyı)
62 CrossScriptVariantsAcrispcrossscriptstudyforBanglahasbeendonewithrespecttosisterscriptssuchasDevanāgarī Gurmukhı and Odia11(formerly Oriya) keeping in mind the visual andtechnicalconfusionstheymaycauseaslabelsonthewebdomainMoreoverthereisnoin-script variant in Bangla as far as the orthography is concerned The followingcharacters are being proposed by the NBPG as variants Although there are certaincharacters which are somewhat similar they but have not been included here TheyhavebeenprovidedintheAppendix(102)forreference
7 WholeLabelEvaluationRules(WLE) ThissectionprovidestheWLEsthatarerequiredbyallthelanguagesmentionedinsection32whenwritten inBangla12ScriptThe ruleshavebeendrafted in suchawaythattheycanbeeasilytranslatedintotheLGRspecifications
11 Unicode uses Oriya for the script although Odia is now the official term used 12 As used by the Unicode denoting and including both Assamese and Maṇipuri
47
Below are the symbols used in the WLE rules for each of the Indic SyllabicCategoryasmentionedinthetableprovidedinCodepointrepertoire(Section51)
C rarr Consonant
M rarr Kāra(Mātrā)
V rarr Vowel
B rarr Anusvāra
D rarr Candrabindu
X rarr Visarga
H rarr Hasanta
Z rarr KhandaTa
S rarr S1S2(fromTable9)or(ae)Ya-phalā(V1HC1M1)whereV1iseither0985(অ-BENGALILETTERA)or098F(এ-BENGALILETTERE)His09CD(-BENGALISIGNVIRAMA)C1is-09AF(য-BENGALILETTERYA)M1is-09BE(া-BENGALIVOWELSIGNAA)S1 and S2 are valid even they are not allowed by the other context rules
P rarr Ra-Hasanta(C2H)whereC2iseither09B0(র-BENGALILETTERRA)or 09F0 (ৰ - ASSAMESE LETTER RA
Now letuselaborateeach rulewithexamples from the scriptkeeping inmindthe Bangla Assamese and Manipuri communities Some combinations ofcharactersmayseemunrealisticorrareinusagebutthereisnoharminaddingsuch ligaturesbecause it ispossible tocreate thembyanyusereasilybutmaynotbeattestedcombinations
49
711 Case of V Preceded by H There could be cases involvingmulti-word domains where Vmay need to beallowedtofollowanHegব8াtঅuইিvয়া baeligŋkʌv ɪndiə (U+09ACU+09CDU+09AFU+09BEU+0999U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1U+09BFU+09DFU+09BE)(meaningBankofIndia)ThisisthecasewheretwodifferentwordsarejoinedtogetherfirstofwhichendswithanH(অu)andthesecondwordbeginswithaV(ইিvয়া)Somesectionsofthe linguistic community require the explicit presence of H for fullrepresentation of the sound intended However by and large the form of thefirstwordwithoutanH(U+09CD)isconsideredenoughforfullrepresentationofthesoundintendedforthefirstwordThis isauniquesituationnecessitatedbythelackofhyphenspaceortheZeroWidthNon-joinercharacterinthepermissiblesetofcharactersintheRootzonerepertoire Otherwise V is never required to be allowed to follow an HPermittingthismaycreateaperceptivesimilaritybetweentwolabels(withandwithout H) for majority of the linguistic communities hence this is explicitlyprohibitedbytheNBGPIn future if required depending on the prevailing requirements from thecommunitythefutureNBGPmayconsiderrevisitingthisrule
72 AdditionalExamplesfromBanglaABNF
Belowarea fewexampleswhichhelponeunderstandsomeof the rulesABNFputsinplaceThesearejustgivenforreferencepurposesandarenotmeanttobecomprehensive
ঃক ःकAscanbeseensuchcombinationwillresultautomaticallyinaldquogolurdquooradottedcircle marking it as an invalid formation This is an intrinsic property of the
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti
13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section
58
Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)
Table17ndashTheScriptTableofSylhetiNagarıorSiloti
103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
Apart from the above individual code-points the Neo-Brāhmī Generation Panel alsoproposes some specific sequences which enable conditional inclusion of the BanglaLETTER A and E followed by Bangla SIGN VIRAMA and Bangla LETTER YA againfollowed by Bangla VOWEL SIGN AA in the repertoire for enabling inclusion of aeligsoundasinEnglishlsquobatrsquolsquocatrsquoetc
52 CodePointRepertoireExclusionTherearesomecharactersoftheBanglascriptthatfindplaceintheUnicodebuthavenot been included in the repertoire in the LGR proposal The reason for excludingঌ(U+098C)andৗ(U+09D7)isthattheyarerareandobsoletecharacters
53 CodepointnotusedaloneBENGALI SIGN NUKTA U+09BC (See 336) is excluded from repertoire since it will never be used alone It will be used as sequence in three special characters in normalized form for ড় ঢ় য়
542 TheVowelSequenceInwhatfollowstheVowelSequenceandtheConsonantSequencepertinenttoBanglaare given To facilitate understanding of other Brahmi script users equivalents inDevanāgarīareprovidedwherevernecessaryAvowelsequenceismadeupofasinglevowelItmaybefollowedbutnotnecessarily(optionally)byanAnusvāraonuʃʃār(B)Candrabindu(D)oraVisargabiʃɔrgo(X)ThenumberofDBorXwhichcanfollowaV inBanglamaynotberestrictedtooneGoingbytherules illustratedinthedocument it isclearthat formationssuchasVDDVBBandVXXare invalidorthographicunitsHowever it isvalidandpossible tohaveformationsorsequencessuchasanusvarafollowedbyachandrabinduononehandandvisarga followed by a chandrabindu on the other as in হ8াংচা lsquohaelignchārsquo and lsquohaelignrsquo হ8াঃrespectivelyThepossibilityof aVisarga orAnusvāra (onuʃʃār) followingaCandrabinduexists inBangla Vowel can optionally be followed by a combination of Hasanta Virama [H]Consonant [C] to formaYa-phala ldquoYa-phala isapresentation formofU+09AFBanglaletterযorlsquoyarsquoRepresentedbythesequenceltU+09CDieBENGALISIGNVIRAMABangla SIGNHasanta or VIRAMA U+09AF -য BENGALI LETTER YAgt Ya-phala has aspecialformয়AgainwhencombinedwithU+09BEাBENGALIVOWELSIGNAA(ielsquoaarsquo(ā))itisusedfortranscribing[aelig]asintheldquoardquointheEnglishwordldquobatrdquowritteninBanglaasব8াটAVowel-sequenceadmitsthefollowingcombinations
5421 ASingleVowel
ExamplesV অअ
5422 AVowelwithConditionsAVowelcanoptionallybefollowedbyAnusvāra[B]orCandrabindu[D]orVisarga[X]or Candrabindu+ Anusvāra [DB] or Candrabindu+ Visarga [DX] or combination ofHasanta(orVirama)[H]followedbyConsonant[C]followedbykāra(Mātrā)[M]
5432 AConsonantwithConditionsAConsonantoptionallyfollowedbydependentvowelsignkāra(Mātrā)[M]orAnusvāra [B] or Candrabindu [D] or Visarga [X] or Hasanta (also known asVirama)[H]orCandrabindu+Anusvāra[DB]orCandrabindu+ Visarga[DX]
5435 Subsets While considering its subsets as a representative example we will consider thecombinationCHConlyhoweverthesameisequallyapplicabletoCHCHCandCHCHCHC[A]ThecombinationmaybefollowedbyMBDXDBorDX
Example র++ৎ =ৎHasinভৎHসনা (bhartsanā) scoldingNoteTheconditionsinthiscontextofKHANDATAarethattheCshouldbeeitherRAU+09B0(র)(usedinBangla)orRAU+09F0(ৰ)(usedinAssamese)
545 SpecialCasesSandPTwospecialcasesinvolvingSequences(referredtoasSandPinTable16underSection7)couldbedescribedbrieflyhereLetustakeupSinthefirstinstanceItisnoteworthythat thereare two instances inBanglawhereHasanta(U+09CD) isprecededbya full
10 Refer to Rule P in Section 7 Table 16
42
vowel (U+0985 অ - BANGLA LETTER A and U+098F এ - BANGLA LETTER E) ForrenderingYa-phalāfollowedbyঅandএitisnecessarytotypeU+09CDplusU+09AFyapreceded by the said vowels This is a purely ligatural entity and the addition ofYa-phalāandā-kāraisusedtoelicittheaeligsoundasinEnglishlsquobatrsquolsquofatrsquoetcTheBrahmıscriptbynaturedoesnothaveHasantaafteravowelHasantaisgenerallydescribedaslsquovowel killerrsquo although it actually indicates absence of a vowel after the markedconsonant Only the consonants can have the Hasanta marked But as we see hereBanglaendsupwithadeviantfeatureintheorthographyhereinwhichHasantacomesimmediatelyafteravowelinligaturesঅGাandএGা(CfUnicode100p473[100])Another case refers to the formation of repha and ra-phalā in the said script andmentioned in the tableaboveasPOwing toco-occurrencewithHASANTARAeitherlosesitsownimplicitvowel(REPHA)orsuppressestheimplicitvoweloftheprecedingconsonant(RA-PHALA )Forinstancerepha=ra+Hasanta+C(egকHiera+Hasanta+
6 VariantsThissection talksabout thevariants in theBanglascriptTheNBGPcategorizes theseconfusinglyvariantsintwogroupsGroup1ConfusingduetopurevisualsimilarityGroup2ConfusingduetodeviationfromnormallyperceivedcharacterformationsbylargerlinguisticcommunityForGroup1anyidenticalcodepointsaredefinedasvariantsTheconfusablebutnotidenticalcasesarenotproposedasthereisanotherpanel(Stringsimilarityassessmentpanel)entrustedtodealwithsuchcasesHowevercaseswhichbelongtoGroup2areproposedtobeconsideredasvariantsThesecasesarenotofmerevisualsimilarityasthey involve some deviations from the widely accepted norms of Bangla Aksharformations These can cause confusion even to a careful observer and hence beingproposedasvariantsThe variants are generated in a script when two or more forms are formed withdifferent storage or code points In Bangla the e-kāra ā-kāra and the o-kāra havedifferentcodepointsOnecantypeowithaconsonantatonegoandthesamebytypinge-kāra and ā-kāra as two separate keys getting the same results A reader cannotdifferentiatebetween the twoko (েকা)one typedwitha singlekeyand theotheronetypedwithtwodifferentkeysMoreoverthiswillnotbeconsideredasacaseofvariantbecauseakārafollowedbyakāraisnotallowed
61 InScriptVariantsHoweverweproposetwocasesoftruein-scriptvariantsinBanglascriptCASEIAs far as true variants in Bangla are concernedwemay drawour attention to caseswhereinHasantawith(U+09A5)থ(tha)appearsasconjunctwith(U+09B8)স(sa)and(U+09A8)ন(na)1 স+Hasanta+থ(U+09B8+U+09CD+U+09A5)versus
স +Hasanta+হ(U+09B8+U+09CD+U+09B9)
2 ন+Hasanta+থ(U+09A8+U+09CD+U+09A5)versus
ন+Hasanta+হ(U+09A8+U+09CD+U+09B9)
44
Theabovecombinationsifwrittenintraditionalorthographycouldbelittleconfusingwheretheথ(tha)inconjunctappearslikeaহ(ha)Theconjunctcouldbeintheinitialmedialorfinalpositions(asshownbelowinegno1)Itcouldbetypedwrongaswellthinking itwas aহ (ha)U+09B9 increasing the chances of risks in labelwriting andidentificationExamples1 acuteandসহ(asinacuteানsthānaacuteলsthulaাacuteGsvāsthyaঅacuteায়ীasthāyı)
62 CrossScriptVariantsAcrispcrossscriptstudyforBanglahasbeendonewithrespecttosisterscriptssuchasDevanāgarī Gurmukhı and Odia11(formerly Oriya) keeping in mind the visual andtechnicalconfusionstheymaycauseaslabelsonthewebdomainMoreoverthereisnoin-script variant in Bangla as far as the orthography is concerned The followingcharacters are being proposed by the NBPG as variants Although there are certaincharacters which are somewhat similar they but have not been included here TheyhavebeenprovidedintheAppendix(102)forreference
7 WholeLabelEvaluationRules(WLE) ThissectionprovidestheWLEsthatarerequiredbyallthelanguagesmentionedinsection32whenwritten inBangla12ScriptThe ruleshavebeendrafted in suchawaythattheycanbeeasilytranslatedintotheLGRspecifications
11 Unicode uses Oriya for the script although Odia is now the official term used 12 As used by the Unicode denoting and including both Assamese and Maṇipuri
47
Below are the symbols used in the WLE rules for each of the Indic SyllabicCategoryasmentionedinthetableprovidedinCodepointrepertoire(Section51)
C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9) or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ, BENGALI LETTER A) or 098F (এ, BENGALI LETTER E), H is 09CD (্, BENGALI SIGN VIRAMA), C1 is 09AF (য, BENGALI LETTER YA) and M1 is 09BE (া, BENGALI VOWEL SIGN AA). S1 and S2 are valid even when they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র, BENGALI LETTER RA) or 09F0 (ৰ, BENGALI LETTER RA WITH MIDDLE DIAGONAL, the Assamese ra).
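To make the symbol list concrete, here is a minimal sketch that rewrites a label as a string of WLE symbols. The code point sets are an assumption (roughly the Bengali block ranges minus unassigned code points and ঌ), not the normative repertoire, which lives in the LGR XML file. `to_symbols` is a hypothetical helper; it recognises the S sequences but does not emit P, because Ra-Hasanta also fits the ordinary C H analysis.

```python
# Assumed code point sets: an approximation, not the normative LGR repertoire.
UNASSIGNED = {0x098D, 0x098E, 0x0991, 0x0992, 0x09A9, 0x09B1, 0x09B3, 0x09B4, 0x09B5}
VOWELS = {chr(c) for c in range(0x0985, 0x0995) if c not in UNASSIGNED and c != 0x098C}
CONSONANTS = {chr(c) for c in range(0x0995, 0x09BA) if c not in UNASSIGNED} | {"\u09F0", "\u09F1"}
MATRAS = {chr(c) for c in (0x09BE, 0x09BF, 0x09C0, 0x09C1, 0x09C2, 0x09C3,
                           0x09C4, 0x09C7, 0x09C8, 0x09CB, 0x09CC)}
SIGNS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09CD": "H", "\u09CE": "Z"}
NUKTA = "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # ড, ঢ, য (Section 5.3)
# S sequences: V1 H C1 M1 with V1 = অ or এ, H = ্, C1 = য, M1 = া
S_SEQUENCES = {v + "\u09CD\u09AF\u09BE" for v in ("\u0985", "\u098F")}


def to_symbols(label: str) -> str:
    """Map a label to its WLE symbol string, e.g. 'কা' -> 'CM'."""
    out: list[str] = []
    i = 0
    while i < len(label):
        if label[i:i + 4] in S_SEQUENCES:
            out.append("S")
            i += 4
            continue
        ch = label[i]
        if ch in VOWELS:
            out.append("V")
        elif ch in CONSONANTS:
            out.append("C")
        elif ch in MATRAS:
            out.append("M")
        elif ch in SIGNS:
            out.append(SIGNS[ch])
        elif ch == NUKTA and i > 0 and label[i - 1] in NUKTA_BASES and out[-1] == "C":
            pass  # nukta merges into the preceding ড, ঢ or য; no new symbol
        else:
            raise ValueError(f"U+{ord(ch):04X} is outside this sketch's repertoire")
        i += 1
    return "".join(out)


print(to_symbols("\u09AC\u09CD\u09AF\u09BE\u099F"))  # ব্যাট
```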
Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in allowing such ligatures: any user can easily create them, even though they may not be attested combinations.
7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া /bæŋk ʌv ɪndiə/ (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning “Bank of India”. Here two different words are joined together: the first ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered sufficient to represent the intended sound. This situation arises only because hyphen, space and the Zero Width Non-Joiner are absent from the permissible set of characters in the Root Zone repertoire; otherwise, V never needs to follow an H. Permitting it could make two labels (with and without H) look alike to the majority of the linguistic communities, hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.
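The prohibition on V following H reduces to a check on adjacent code points. This sketch assumes the independent-vowel set U+0985–U+0994 (minus unassigned code points and ঌ) and reuses the “Bank of India” code points listed above; the function name `has_vowel_after_hasanta` is hypothetical.

```python
VIRAMA = "\u09CD"
# Assumed independent-vowel set: U+0985..U+0994 minus unassigned code points and ঌ.
VOWELS = {chr(c) for c in range(0x0985, 0x0995)} - {"\u098C", "\u098D", "\u098E", "\u0991", "\u0992"}


def has_vowel_after_hasanta(label: str) -> bool:
    """True if an independent vowel (V) immediately follows Hasanta (H)."""
    return any(a == VIRAMA and b in VOWELS for a, b in zip(label, label[1:]))


# "Bank of India", built from the code points listed in Section 7.1.1.
with_h = "".join(chr(c) for c in (
    0x09AC, 0x09CD, 0x09AF, 0x09BE, 0x0999, 0x09CD, 0x0995, 0x0985,
    0x09AB, 0x09CD, 0x0987, 0x09A8, 0x09CD, 0x09A1, 0x09BF, 0x09DF, 0x09BE,
))
# The same label with the Hasanta after ফ removed (the form the NBGP accepts).
without_h = with_h.replace("\u09AB\u09CD\u0987", "\u09AB\u0987")

print(has_vowel_after_hasanta(with_h), has_vowel_after_hasanta(without_h))
```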
7.2 Additional Examples from Bangla ABNF
Below are a few examples which help one understand some of the rules the ABNF puts in place. These are given for reference purposes only and are not meant to be comprehensive.
ঃক ःक: As can be seen, such a combination will automatically result in a “golu” or dotted circle, marking it as an invalid formation. This is an intrinsic property of the
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What has happened So Far In terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0 – Core Specification, Chapter 12, p. 473.
10 Appendix-I
10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the Formalism structures do not necessarily apply. To take care of such cases, restriction rules are set in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:
1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although we can cite a couple of native examples, such as the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across the possibility of Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).
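Rule 1 can be sketched as a check on the first code point of a label. Besides ৎ, ঞ and ঙ, this hypothetical helper also rejects a label that begins with a combining sign (Unicode general category M*), which corresponds to the dotted-circle case shown in Section 7.2 (ঃক). It deliberately does not encode the য় and ৎস observations above; that is left to the LGR itself.

```python
import unicodedata

# Letters that rule 1 forbids at the start of a label: ৎ, ঞ, ঙ
FORBIDDEN_INITIAL = {"\u09CE", "\u099E", "\u0999"}


def initial_position_ok(label: str) -> bool:
    """Hypothetical check of the initial-position restrictions."""
    if not label:
        return False
    first = label[0]
    if first in FORBIDDEN_INITIAL:
        return False
    # Kāra, Anusvāra, Candrabindu, Visarga, Hasanta and Nukta are all in
    # general category Mn or Mc, so none of them may start a label.
    return not unicodedata.category(first).startswith("M")


print(initial_position_ok("\u0995"), initial_position_ok("\u0983\u0995"))  # ক, ঃক
```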
10.2 ‘Sylheti Nagarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924: Kthi), used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, and widely in use during the 1880s. There were several other names of Sylheti Nagarī or Siloti (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th–14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to the Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).
13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri are not covered in this section.
Table 17 – The Script Table of Sylheti Nagarī or Siloti
10.3 Confusable code points
The following code points were analysed and found to be either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.
ক (k): ক্ক (ক+ক), ক্ট (ক+ট), ক্ত (ক+ত), ক্ত্র (ক+ত+র), ক্ত্ব (ক+ত+ব), ক্ন (ক+ন), ক্ব (ক+ব), ক্ম (ক+ম), ক্য (ক+য), ক্র (ক+র), ক্ষ (ক+ষ), ক্ষ্ণ (ক+ষ+ণ), ক্ষ্ম (ক+ষ+ম), ক্ষ্ব (ক+ষ+ব), ক্ষ্য (ক+ষ+য), ক্স (ক+স), ঙ্ক (ঙ+ক), (…+ক+র)
গ (g): গ্গ (গ+গ), গ্দ (গ+দ), গ্ধ (গ+ধ), গ্ন (গ+ন), গ্ব (গ+ব), গ্ম (গ+ম), গ্য (গ+য), গ্র (গ+র), গ্ল (গ+ল), ঙ্গ (ঙ+গ), (…+ঙ+গ)
ণ (ṇ): ণ্ট (ণ+ট), ণ্ঠ (ণ+ঠ), ণ্ড (ণ+ড), ণ্ড্র (ণ+ড+র), ণ্ঢ (ণ+ঢ), ণ্ণ (ণ+ণ), ণ্য (ণ+য), ণ্ব (ণ+ব), ক্ষ্ণ (ক+ষ+ণ), ষ্ণ (ষ+ণ), (…+ণ)
Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences which enable the conditional inclusion in the repertoire of BENGALI LETTER A and E followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, again followed by BENGALI VOWEL SIGN AA, to represent the æ sound as in English ‘bat’, ‘cat’, etc.
5.2 Code Point Repertoire Exclusion
There are some characters of the Bangla script that are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.
5.3 Code point not used alone
BENGALI SIGN NUKTA U+09BC (see Section 3.3.6) is excluded from the repertoire as a standalone code point, since it is never used alone. It is used only in sequences forming three special characters in normalized form: ড়, ঢ় and য়.
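Python's standard `unicodedata` module can be used to confirm the normalization behaviour described here. The precomposed letters U+09DC, U+09DD and U+09DF are Unicode composition exclusions, so NFC leaves each of them as base consonant + NUKTA, which is why NUKTA appears only inside these sequences. This is an illustration, not part of the LGR.

```python
import unicodedata

NUKTA = "\u09BC"
PRECOMPOSED = {
    "\u09DC": "\u09A1",  # ড় -> ড + nukta
    "\u09DD": "\u09A2",  # ঢ় -> ঢ + nukta
    "\u09DF": "\u09AF",  # য় -> য + nukta
}

for precomposed, base in PRECOMPOSED.items():
    nfc = unicodedata.normalize("NFC", precomposed)
    # Composition exclusions are never recomposed, so NFC stays decomposed.
    assert nfc == base + NUKTA
    print(f"U+{ord(precomposed):04X} ->", " ".join(f"U+{ord(c):04X}" for c in nfc))
```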
5.4.2 The Vowel Sequence
In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To facilitate understanding for users of other Brāhmī scripts, Devanāgarī equivalents are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra (onuʃʃār) (B), a Candrabindu (D) or a Visarga (biʃɔrgo) (X). More than one D, B or X may follow a V in Bangla, but not the same sign twice: going by the rules illustrated in this document, formations such as VDD, VBB and VXX are invalid orthographic units. However, it is valid to combine a Candrabindu with an Anusvāra on the one hand, and with a Visarga on the other, as in হ্যাংচা ‘hænchā’ and হ্যাঃ ‘hæn’ respectively; that is, a Visarga or Anusvāra (onuʃʃār) may follow a Candrabindu in Bangla.
A vowel can optionally be followed by a combination of Hasanta (Virama) [H] and Consonant [C] to form a Ya-phalā. “Ya-phalā is a presentation form of U+09AF BENGALI LETTER YA (য, ‘ya’), represented by the sequence <U+09CD ্ BENGALI SIGN VIRAMA (Bangla Hasanta), U+09AF য BENGALI LETTER YA>.” Ya-phalā has a special form (্য). When it is further combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. ‘aa’ (ā)), it is used for transcribing [æ], as in the “a” of the English word “bat”, written in Bangla as ব্যাট. A vowel sequence admits the following combinations:
5.4.2.1 A Single Vowel
Example: V: অ (Devanāgarī अ)
5.4.2.2 A Vowel with Conditions
A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].
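Read as a pattern over the WLE symbols, the options listed in 5.4.2.2 become a small regular expression. This is a sketch of that reading, not the LGR's own rule syntax, and `is_vowel_sequence` is a hypothetical name; it also shows that VDD, VBB and VXX are rejected, as stated in 5.4.2.

```python
import re

# V optionally followed by DB, DX, B, D, X, or the Ya-phalā tail H C M.
VOWEL_SEQUENCE = re.compile(r"V(?:DB|DX|B|D|X|HCM)?")


def is_vowel_sequence(symbols: str) -> bool:
    """True if the whole symbol string is one permitted vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(symbols) is not None


for s in ["V", "VB", "VDB", "VDX", "VHCM", "VDD", "VBB", "VXX"]:
    print(s, is_vowel_sequence(s))
```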
5.4.3.2 A Consonant with Conditions
A consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
5.4.3.5 Subsets
While considering its subsets, as a representative example we will consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.
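Sections 5.4.3.2 and 5.4.3.5 can be combined into one pattern over WLE symbols. This is a hedged sketch covering only the options stated in those two subsections; combinations defined elsewhere in the proposal (for example a kāra followed by Candrabindu) are not modelled here, and `is_consonant_sequence` is a hypothetical name.

```python
import re

CONSONANT_SEQUENCE = re.compile(
    r"C(?:"
    r"(?:HC){1,3}(?:DB|DX|M|B|D|X)?"  # 5.4.3.5: CHC, CHCHC, CHCHCHC, then [A]
    r"|(?:DB|DX|M|B|D|X|H)?"          # 5.4.3.2: single consonant with conditions
    r")"
)


def is_consonant_sequence(symbols: str) -> bool:
    """True if the whole symbol string matches one of the listed consonant forms."""
    return CONSONANT_SEQUENCE.fullmatch(symbols) is not None


for s in ["C", "CH", "CDB", "CHCM", "CHCHCHCX", "CHCH", "CHCHCHCHC"]:
    print(s, is_consonant_sequence(s))
```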
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’.
Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র, used in Bangla) or RA U+09F0 (ৰ, used in Assamese).
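The KHANDA TA condition in the note above can be sketched as a context check: wherever ৎ follows a Hasanta, the code point before that Hasanta must be র or ৰ. The function name is hypothetical, and this sketch only checks the C H Z context; it does not decide whether ৎ is acceptable in other positions.

```python
VIRAMA = "\u09CD"
KHANDA_TA = "\u09CE"
RA_FORMS = {"\u09B0", "\u09F0"}  # র (Bangla) and ৰ (Assamese)


def khanda_ta_context_ok(label: str) -> bool:
    """In a C H Z context, require C to be RA (U+09B0 or U+09F0)."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == VIRAMA:
            if i < 2 or label[i - 2] not in RA_FORMS:
                return False
    return True


print(khanda_ta_context_ok("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা
print(khanda_ta_context_ok("\u0995\u09CD\u09CE"))  # ক + ্ + ৎ
```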
5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) are described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). For rendering Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF (ya) preceded by the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound as in English ‘bat’, ‘fat’, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant, and only consonants can bear it. But, as we see here, Bangla ends up with a deviant feature in its orthography, in which Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]). Another case refers to the formation of repha and ra-phalā in the script, mentioned in the table above as P. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta +
10 Refer to Rule P in Section 7, Table 16.
6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusing variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from character formations as normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. Confusable but non-identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted to deal with such cases. However, cases belonging to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve deviations from the widely accepted norms of Bangla Akshara formation. They can cause confusion even to a careful observer and hence are being proposed as variants.
Variants arise in a script when two or more identical-looking forms are stored with different code points. In Bangla, e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variant, because a kāra followed by a kāra is not allowed.
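Unicode normalization also collapses the two spellings of ko described above. Assuming labels are NFC-normalized before evaluation (IDNA2008 requires U-labels to be in NFC), the two-key sequence ে + া is canonically equivalent to ো (U+09CB) and is composed into it, so the two-key form never reaches the variant stage. A small check with Python's standard `unicodedata` module:

```python
import unicodedata

KA = "\u0995"
O_SIGN = "\u09CB"                     # ো, typed with one key
E_SIGN, AA_SIGN = "\u09C7", "\u09BE"  # ে + া, typed with two keys

one_key = KA + O_SIGN
two_keys = KA + E_SIGN + AA_SIGN

# The stored code point sequences differ...
print(one_key == two_keys)
# ...but NFC composes ে + া into ো, so the normalized forms are identical.
print(unicodedata.normalize("NFC", two_keys) == one_key)
```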
61 InScriptVariantsHoweverweproposetwocasesoftruein-scriptvariantsinBanglascriptCASEIAs far as true variants in Bangla are concernedwemay drawour attention to caseswhereinHasantawith(U+09A5)থ(tha)appearsasconjunctwith(U+09B8)স(sa)and(U+09A8)ন(na)1 স+Hasanta+থ(U+09B8+U+09CD+U+09A5)versus
স +Hasanta+হ(U+09B8+U+09CD+U+09B9)
2 ন+Hasanta+থ(U+09A8+U+09CD+U+09A5)versus
ন+Hasanta+হ(U+09A8+U+09CD+U+09B9)
44
Theabovecombinationsifwrittenintraditionalorthographycouldbelittleconfusingwheretheথ(tha)inconjunctappearslikeaহ(ha)Theconjunctcouldbeintheinitialmedialorfinalpositions(asshownbelowinegno1)Itcouldbetypedwrongaswellthinking itwas aহ (ha)U+09B9 increasing the chances of risks in labelwriting andidentificationExamples1 acuteandসহ(asinacuteানsthānaacuteলsthulaাacuteGsvāsthyaঅacuteায়ীasthāyı)
62 CrossScriptVariantsAcrispcrossscriptstudyforBanglahasbeendonewithrespecttosisterscriptssuchasDevanāgarī Gurmukhı and Odia11(formerly Oriya) keeping in mind the visual andtechnicalconfusionstheymaycauseaslabelsonthewebdomainMoreoverthereisnoin-script variant in Bangla as far as the orthography is concerned The followingcharacters are being proposed by the NBPG as variants Although there are certaincharacters which are somewhat similar they but have not been included here TheyhavebeenprovidedintheAppendix(102)forreference
7 WholeLabelEvaluationRules(WLE) ThissectionprovidestheWLEsthatarerequiredbyallthelanguagesmentionedinsection32whenwritten inBangla12ScriptThe ruleshavebeendrafted in suchawaythattheycanbeeasilytranslatedintotheLGRspecifications
11 Unicode uses Oriya for the script although Odia is now the official term used 12 As used by the Unicode denoting and including both Assamese and Maṇipuri
47
Below are the symbols used in the WLE rules for each of the Indic SyllabicCategoryasmentionedinthetableprovidedinCodepointrepertoire(Section51)
C rarr Consonant
M rarr Kāra(Mātrā)
V rarr Vowel
B rarr Anusvāra
D rarr Candrabindu
X rarr Visarga
H rarr Hasanta
Z rarr KhandaTa
S rarr S1S2(fromTable9)or(ae)Ya-phalā(V1HC1M1)whereV1iseither0985(অ-BENGALILETTERA)or098F(এ-BENGALILETTERE)His09CD(-BENGALISIGNVIRAMA)C1is-09AF(য-BENGALILETTERYA)M1is-09BE(া-BENGALIVOWELSIGNAA)S1 and S2 are valid even they are not allowed by the other context rules
P rarr Ra-Hasanta(C2H)whereC2iseither09B0(র-BENGALILETTERRA)or 09F0 (ৰ - ASSAMESE LETTER RA
Now letuselaborateeach rulewithexamples from the scriptkeeping inmindthe Bangla Assamese and Manipuri communities Some combinations ofcharactersmayseemunrealisticorrareinusagebutthereisnoharminaddingsuch ligaturesbecause it ispossible tocreate thembyanyusereasilybutmaynotbeattestedcombinations
49
711 Case of V Preceded by H There could be cases involvingmulti-word domains where Vmay need to beallowedtofollowanHegব8াtঅuইিvয়া baeligŋkʌv ɪndiə (U+09ACU+09CDU+09AFU+09BEU+0999U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1U+09BFU+09DFU+09BE)(meaningBankofIndia)ThisisthecasewheretwodifferentwordsarejoinedtogetherfirstofwhichendswithanH(অu)andthesecondwordbeginswithaV(ইিvয়া)Somesectionsofthe linguistic community require the explicit presence of H for fullrepresentation of the sound intended However by and large the form of thefirstwordwithoutanH(U+09CD)isconsideredenoughforfullrepresentationofthesoundintendedforthefirstwordThis isauniquesituationnecessitatedbythelackofhyphenspaceortheZeroWidthNon-joinercharacterinthepermissiblesetofcharactersintheRootzonerepertoire Otherwise V is never required to be allowed to follow an HPermittingthismaycreateaperceptivesimilaritybetweentwolabels(withandwithout H) for majority of the linguistic communities hence this is explicitlyprohibitedbytheNBGPIn future if required depending on the prevailing requirements from thecommunitythefutureNBGPmayconsiderrevisitingthisrule
72 AdditionalExamplesfromBanglaABNF
Belowarea fewexampleswhichhelponeunderstandsomeof the rulesABNFputsinplaceThesearejustgivenforreferencepurposesandarenotmeanttobecomprehensive
ঃক ःकAscanbeseensuchcombinationwillresultautomaticallyinaldquogolurdquooradottedcircle marking it as an invalid formation This is an intrinsic property of the
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti
13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section
58
Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)
Table17ndashTheScriptTableofSylhetiNagarıorSiloti
103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
Apart from the above individual code-points the Neo-Brāhmī Generation Panel alsoproposes some specific sequences which enable conditional inclusion of the BanglaLETTER A and E followed by Bangla SIGN VIRAMA and Bangla LETTER YA againfollowed by Bangla VOWEL SIGN AA in the repertoire for enabling inclusion of aeligsoundasinEnglishlsquobatrsquolsquocatrsquoetc
52 CodePointRepertoireExclusionTherearesomecharactersoftheBanglascriptthatfindplaceintheUnicodebuthavenot been included in the repertoire in the LGR proposal The reason for excludingঌ(U+098C)andৗ(U+09D7)isthattheyarerareandobsoletecharacters
53 CodepointnotusedaloneBENGALI SIGN NUKTA U+09BC (See 336) is excluded from repertoire since it will never be used alone It will be used as sequence in three special characters in normalized form for ড় ঢ় য়
542 TheVowelSequenceInwhatfollowstheVowelSequenceandtheConsonantSequencepertinenttoBanglaare given To facilitate understanding of other Brahmi script users equivalents inDevanāgarīareprovidedwherevernecessaryAvowelsequenceismadeupofasinglevowelItmaybefollowedbutnotnecessarily(optionally)byanAnusvāraonuʃʃār(B)Candrabindu(D)oraVisargabiʃɔrgo(X)ThenumberofDBorXwhichcanfollowaV inBanglamaynotberestrictedtooneGoingbytherules illustratedinthedocument it isclearthat formationssuchasVDDVBBandVXXare invalidorthographicunitsHowever it isvalidandpossible tohaveformationsorsequencessuchasanusvarafollowedbyachandrabinduononehandandvisarga followed by a chandrabindu on the other as in হ8াংচা lsquohaelignchārsquo and lsquohaelignrsquo হ8াঃrespectivelyThepossibilityof aVisarga orAnusvāra (onuʃʃār) followingaCandrabinduexists inBangla Vowel can optionally be followed by a combination of Hasanta Virama [H]Consonant [C] to formaYa-phala ldquoYa-phala isapresentation formofU+09AFBanglaletterযorlsquoyarsquoRepresentedbythesequenceltU+09CDieBENGALISIGNVIRAMABangla SIGNHasanta or VIRAMA U+09AF -য BENGALI LETTER YAgt Ya-phala has aspecialformয়AgainwhencombinedwithU+09BEাBENGALIVOWELSIGNAA(ielsquoaarsquo(ā))itisusedfortranscribing[aelig]asintheldquoardquointheEnglishwordldquobatrdquowritteninBanglaasব8াটAVowel-sequenceadmitsthefollowingcombinations
5421 ASingleVowel
ExamplesV অअ
5422 AVowelwithConditionsAVowelcanoptionallybefollowedbyAnusvāra[B]orCandrabindu[D]orVisarga[X]or Candrabindu+ Anusvāra [DB] or Candrabindu+ Visarga [DX] or combination ofHasanta(orVirama)[H]followedbyConsonant[C]followedbykāra(Mātrā)[M]
5432 AConsonantwithConditionsAConsonantoptionallyfollowedbydependentvowelsignkāra(Mātrā)[M]orAnusvāra [B] or Candrabindu [D] or Visarga [X] or Hasanta (also known asVirama)[H]orCandrabindu+Anusvāra[DB]orCandrabindu+ Visarga[DX]
5435 Subsets While considering its subsets as a representative example we will consider thecombinationCHConlyhoweverthesameisequallyapplicabletoCHCHCandCHCHCHC[A]ThecombinationmaybefollowedbyMBDXDBorDX
Example র++ৎ =ৎHasinভৎHসনা (bhartsanā) scoldingNoteTheconditionsinthiscontextofKHANDATAarethattheCshouldbeeitherRAU+09B0(র)(usedinBangla)orRAU+09F0(ৰ)(usedinAssamese)
545 SpecialCasesSandPTwospecialcasesinvolvingSequences(referredtoasSandPinTable16underSection7)couldbedescribedbrieflyhereLetustakeupSinthefirstinstanceItisnoteworthythat thereare two instances inBanglawhereHasanta(U+09CD) isprecededbya full
10 Refer to Rule P in Section 7 Table 16
42
vowel (U+0985 অ - BANGLA LETTER A and U+098F এ - BANGLA LETTER E) ForrenderingYa-phalāfollowedbyঅandএitisnecessarytotypeU+09CDplusU+09AFyapreceded by the said vowels This is a purely ligatural entity and the addition ofYa-phalāandā-kāraisusedtoelicittheaeligsoundasinEnglishlsquobatrsquolsquofatrsquoetcTheBrahmıscriptbynaturedoesnothaveHasantaafteravowelHasantaisgenerallydescribedaslsquovowel killerrsquo although it actually indicates absence of a vowel after the markedconsonant Only the consonants can have the Hasanta marked But as we see hereBanglaendsupwithadeviantfeatureintheorthographyhereinwhichHasantacomesimmediatelyafteravowelinligaturesঅGাandএGা(CfUnicode100p473[100])Another case refers to the formation of repha and ra-phalā in the said script andmentioned in the tableaboveasPOwing toco-occurrencewithHASANTARAeitherlosesitsownimplicitvowel(REPHA)orsuppressestheimplicitvoweloftheprecedingconsonant(RA-PHALA )Forinstancerepha=ra+Hasanta+C(egকHiera+Hasanta+
6 VariantsThissection talksabout thevariants in theBanglascriptTheNBGPcategorizes theseconfusinglyvariantsintwogroupsGroup1ConfusingduetopurevisualsimilarityGroup2ConfusingduetodeviationfromnormallyperceivedcharacterformationsbylargerlinguisticcommunityForGroup1anyidenticalcodepointsaredefinedasvariantsTheconfusablebutnotidenticalcasesarenotproposedasthereisanotherpanel(Stringsimilarityassessmentpanel)entrustedtodealwithsuchcasesHowevercaseswhichbelongtoGroup2areproposedtobeconsideredasvariantsThesecasesarenotofmerevisualsimilarityasthey involve some deviations from the widely accepted norms of Bangla Aksharformations These can cause confusion even to a careful observer and hence beingproposedasvariantsThe variants are generated in a script when two or more forms are formed withdifferent storage or code points In Bangla the e-kāra ā-kāra and the o-kāra havedifferentcodepointsOnecantypeowithaconsonantatonegoandthesamebytypinge-kāra and ā-kāra as two separate keys getting the same results A reader cannotdifferentiatebetween the twoko (েকা)one typedwitha singlekeyand theotheronetypedwithtwodifferentkeysMoreoverthiswillnotbeconsideredasacaseofvariantbecauseakārafollowedbyakāraisnotallowed
61 InScriptVariantsHoweverweproposetwocasesoftruein-scriptvariantsinBanglascriptCASEIAs far as true variants in Bangla are concernedwemay drawour attention to caseswhereinHasantawith(U+09A5)থ(tha)appearsasconjunctwith(U+09B8)স(sa)and(U+09A8)ন(na)1 স+Hasanta+থ(U+09B8+U+09CD+U+09A5)versus
স +Hasanta+হ(U+09B8+U+09CD+U+09B9)
2 ন+Hasanta+থ(U+09A8+U+09CD+U+09A5)versus
ন+Hasanta+হ(U+09A8+U+09CD+U+09B9)
44
Theabovecombinationsifwrittenintraditionalorthographycouldbelittleconfusingwheretheথ(tha)inconjunctappearslikeaহ(ha)Theconjunctcouldbeintheinitialmedialorfinalpositions(asshownbelowinegno1)Itcouldbetypedwrongaswellthinking itwas aহ (ha)U+09B9 increasing the chances of risks in labelwriting andidentificationExamples1 acuteandসহ(asinacuteানsthānaacuteলsthulaাacuteGsvāsthyaঅacuteায়ীasthāyı)
62 CrossScriptVariantsAcrispcrossscriptstudyforBanglahasbeendonewithrespecttosisterscriptssuchasDevanāgarī Gurmukhı and Odia11(formerly Oriya) keeping in mind the visual andtechnicalconfusionstheymaycauseaslabelsonthewebdomainMoreoverthereisnoin-script variant in Bangla as far as the orthography is concerned The followingcharacters are being proposed by the NBPG as variants Although there are certaincharacters which are somewhat similar they but have not been included here TheyhavebeenprovidedintheAppendix(102)forreference
7 WholeLabelEvaluationRules(WLE) ThissectionprovidestheWLEsthatarerequiredbyallthelanguagesmentionedinsection32whenwritten inBangla12ScriptThe ruleshavebeendrafted in suchawaythattheycanbeeasilytranslatedintotheLGRspecifications
11 Unicode uses Oriya for the script although Odia is now the official term used 12 As used by the Unicode denoting and including both Assamese and Maṇipuri
47
Below are the symbols used in the WLE rules for each of the Indic SyllabicCategoryasmentionedinthetableprovidedinCodepointrepertoire(Section51)
C rarr Consonant
M rarr Kāra(Mātrā)
V rarr Vowel
B rarr Anusvāra
D rarr Candrabindu
X rarr Visarga
H rarr Hasanta
Z rarr KhandaTa
S rarr S1S2(fromTable9)or(ae)Ya-phalā(V1HC1M1)whereV1iseither0985(অ-BENGALILETTERA)or098F(এ-BENGALILETTERE)His09CD(-BENGALISIGNVIRAMA)C1is-09AF(য-BENGALILETTERYA)M1is-09BE(া-BENGALIVOWELSIGNAA)S1 and S2 are valid even they are not allowed by the other context rules
P rarr Ra-Hasanta(C2H)whereC2iseither09B0(র-BENGALILETTERRA)or 09F0 (ৰ - ASSAMESE LETTER RA
Now letuselaborateeach rulewithexamples from the scriptkeeping inmindthe Bangla Assamese and Manipuri communities Some combinations ofcharactersmayseemunrealisticorrareinusagebutthereisnoharminaddingsuch ligaturesbecause it ispossible tocreate thembyanyusereasilybutmaynotbeattestedcombinations
49
711 Case of V Preceded by H There could be cases involvingmulti-word domains where Vmay need to beallowedtofollowanHegব8াtঅuইিvয়া baeligŋkʌv ɪndiə (U+09ACU+09CDU+09AFU+09BEU+0999U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1U+09BFU+09DFU+09BE)(meaningBankofIndia)ThisisthecasewheretwodifferentwordsarejoinedtogetherfirstofwhichendswithanH(অu)andthesecondwordbeginswithaV(ইিvয়া)Somesectionsofthe linguistic community require the explicit presence of H for fullrepresentation of the sound intended However by and large the form of thefirstwordwithoutanH(U+09CD)isconsideredenoughforfullrepresentationofthesoundintendedforthefirstwordThis isauniquesituationnecessitatedbythelackofhyphenspaceortheZeroWidthNon-joinercharacterinthepermissiblesetofcharactersintheRootzonerepertoire Otherwise V is never required to be allowed to follow an HPermittingthismaycreateaperceptivesimilaritybetweentwolabels(withandwithout H) for majority of the linguistic communities hence this is explicitlyprohibitedbytheNBGPIn future if required depending on the prevailing requirements from thecommunitythefutureNBGPmayconsiderrevisitingthisrule
72 AdditionalExamplesfromBanglaABNF
Belowarea fewexampleswhichhelponeunderstandsomeof the rulesABNFputsinplaceThesearejustgivenforreferencepurposesandarenotmeanttobecomprehensive
ঃক ःकAscanbeseensuchcombinationwillresultautomaticallyinaldquogolurdquooradottedcircle marking it as an invalid formation This is an intrinsic property of the
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti
13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section
58
Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)
Table17ndashTheScriptTableofSylhetiNagarıorSiloti
103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences. These enable the conditional inclusion in the repertoire of BENGALI LETTER A and BENGALI LETTER E followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, again followed by BENGALI VOWEL SIGN AA. The purpose is to represent the æ sound, as in English ‘bat’, ‘cat’, etc.
5.2. Code Point Repertoire Exclusion
Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.
5.3. Code point not used alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire because it is never used alone. It is used only within sequences, as part of the normalized form of three special characters: ড়, ঢ় and য়.
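The NUKTA point can be checked with Python's standard `unicodedata` module. ড়, ঢ় and য় (U+09DC, U+09DD, U+09DF) are Unicode composition exclusions, so NFC leaves each of them as base letter + NUKTA. The `nukta_context_ok` helper is hypothetical. It assumes the rule can be expressed as "after NFC, NUKTA may only follow ড, ঢ or য". It is a sketch, not the XML rule from the LGR file.

```python
import unicodedata

NUKTA = "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # ড ঢ য

# ড় ঢ় য় are composition exclusions: NFC keeps them as base letter + NUKTA.
for ch in ("\u09DC", "\u09DD", "\u09DF"):
    nfc = unicodedata.normalize("NFC", ch)
    print(f"U+{ord(ch):04X} ->", " ".join(f"U+{ord(c):04X}" for c in nfc))


def nukta_context_ok(label: str) -> bool:
    """Hypothetical check: after NFC, NUKTA may only follow ড, ঢ or য."""
    label = unicodedata.normalize("NFC", label)
    for i, ch in enumerate(label):
        if ch == NUKTA and (i == 0 or label[i - 1] not in NUKTA_BASES):
            return False
    return True
```

The loop prints `U+09DC -> U+09A1 U+09BC`, `U+09DD -> U+09A2 U+09BC` and `U+09DF -> U+09AF U+09BC`. This is why a standalone NUKTA never needs to be in the repertoire.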
5.4.2. The Vowel Sequence
The Vowel Sequence and the Consonant Sequence pertinent to Bangla are given below. Devanāgarī equivalents are provided wherever necessary, to help users of other Brāhmī scripts.
A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra, onuʃʃār (B), a Candrabindu (D) or a Visarga, biʃɔrgo (X). More than one of D, B or X may follow a V in Bangla, but not the same sign twice: going by the rules illustrated in this document, formations such as VDD, VBB and VXX are invalid orthographic units. It is, however, valid to have sequences such as an anusvāra followed by a candrabindu on the one hand and a visarga followed by a candrabindu on the other, as in হ্যাংচা ‘hænchā’ and হ্যাঃ ‘hæn’ respectively. A Visarga or Anusvāra (onuʃʃār) may also follow a Candrabindu in Bangla.
A vowel can optionally be followed by a combination of Hasanta (Virama) [H] and Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF BENGALI LETTER YA (য, ‘ya’). It is represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla Hasanta), U+09AF য BENGALI LETTER YA> and has a special form, ্য. When combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. ‘aa’ (ā)), it is used for transcribing [æ], as in the “a” of the English word “bat”, written in Bangla as ব্যাট. A Vowel sequence admits the following combinations:
5.4.2.1. A Single Vowel
Example: V অ (अ)
5.4.2.2. A Vowel with Conditions
A Vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by a Consonant [C] followed by a kāra (Mātrā) [M].
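The combinations in 5.4.2.2 translate naturally into a regular expression. The Python sketch below assumes simplified character classes: independent vowels without ঌ, the basic consonants, and the common kāras. The normative classes are those in the XML LGR file, not these. The sketch follows the formal list above, so only the orders DB and DX are accepted after a vowel.

```python
import re

# Assumed character classes (hypothetical, not the normative repertoire).
V = "[\u0985-\u098B\u098F\u0990\u0993\u0994]"          # independent vowels, ঌ excluded
C = "[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9]"  # consonants
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"          # dependent vowel signs (kāra)
B, D, X, H = "\u0982", "\u0981", "\u0983", "\u09CD"    # anusvāra, candrabindu, visarga, hasanta

# 5.4.2.2: V, optionally followed by exactly one of DB | DX | B | D | X | HCM.
VOWEL_SEQUENCE = re.compile(f"{V}(?:{D}{B}|{D}{X}|{B}|{D}|{X}|{H}{C}{M})?")


def is_vowel_sequence(s: str) -> bool:
    return VOWEL_SEQUENCE.fullmatch(s) is not None


print(is_vowel_sequence("\u0985"))                    # অ: True
print(is_vowel_sequence("\u0985\u0981\u0982"))        # V D B: True
print(is_vowel_sequence("\u0985\u09CD\u09AF\u09BE"))  # অ্যা (V H C M): True
print(is_vowel_sequence("\u0985\u0982\u0982"))        # V B B: False
```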
5.4.3.2. A Consonant with Conditions
A Consonant can optionally be followed by a dependent vowel sign, kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
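Sub-rule 5.4.3.2 can be sketched the same way. The character classes below are assumptions and do not come from the normative repertoire. Only the seven single alternatives listed above are modelled. Longer combinations, such as a kāra followed by anusvāra, fall outside this particular sub-rule and are not handled here.

```python
import re

# Assumed character classes (hypothetical, not the normative repertoire).
C = "[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9]"  # consonants
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"          # dependent vowel signs (kāra)
B, D, X, H = "\u0982", "\u0981", "\u0983", "\u09CD"    # anusvāra, candrabindu, visarga, hasanta

# 5.4.3.2: C, optionally followed by exactly one of DB | DX | M | B | D | X | H.
CONSONANT_SEQUENCE = re.compile(f"{C}(?:{D}{B}|{D}{X}|{M}|{B}|{D}|{X}|{H})?")


def is_consonant_sequence(s: str) -> bool:
    return CONSONANT_SEQUENCE.fullmatch(s) is not None


print(is_consonant_sequence("\u0995\u09BE"))        # কা (C M): True
print(is_consonant_sequence("\u0995\u0981\u0982"))  # C D B: True
print(is_consonant_sequence("\u0995\u09BE\u09BE"))  # C M M: False
```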
5.4.3.5. Subsets
In considering its subsets, we take the combination CHC as a representative example; the same applies equally to CHCHC and CHCHCHC [A]. The combination may be followed by M, B, D, X, DB or DX.
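The subset rule describes a consonant followed by one to three (Hasanta, Consonant) pairs, with an optional trailing sign. That maps onto a bounded repetition in a regular expression. This Python sketch again assumes simplified, hypothetical character classes.

```python
import re

# Assumed character classes (hypothetical, not the normative repertoire).
C = "[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9]"  # consonants
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"          # dependent vowel signs (kāra)
B, D, X, H = "\u0982", "\u0981", "\u0983", "\u09CD"    # anusvāra, candrabindu, visarga, hasanta

# 5.4.3.5: CHC, CHCHC or CHCHCHC (C plus one to three "H C" pairs),
# optionally followed by exactly one of DB | DX | M | B | D | X.
CONJUNCT = re.compile(f"{C}(?:{H}{C}){{1,3}}(?:{D}{B}|{D}{X}|{M}|{B}|{D}|{X})?")


def is_conjunct_sequence(s: str) -> bool:
    return CONJUNCT.fullmatch(s) is not None


print(is_conjunct_sequence("\u0995\u09CD\u09B7"))              # ক্ষ (CHC): True
print(is_conjunct_sequence("\u0995\u09CD\u09B7\u09CD\u09AE"))  # ক্ষ্ম (CHCHC): True
print(is_conjunct_sequence("\u09B8\u09CD\u09A5\u09BE"))        # স্থা (CHC + M): True
print(is_conjunct_sequence("\u0995"))                          # lone C: False
```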
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’.
Note: In this context of KHANDA TA, the condition is that the C should be either RA U+09B0 (র, used in Bangla) or RA U+09F0 (ৰ, used in Assamese).
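The KHANDA TA condition in the note above amounts to a look-behind check. Wherever ৎ follows a Hasanta, the consonant before that Hasanta must be U+09B0 or U+09F0. The function name is hypothetical. It is a Python sketch of the note's wording, not the LGR's own context rule.

```python
HASANTA, KHANDA_TA = "\u09CD", "\u09CE"
RA_BENGALI, RA_ASSAMESE = "\u09B0", "\u09F0"  # র, ৰ


def khanda_ta_context_ok(label: str) -> bool:
    """Hypothetical check: wherever KHANDA TA follows HASANTA, the consonant
    before that HASANTA must be RA (U+09B0) or RA (U+09F0)."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] not in (RA_BENGALI, RA_ASSAMESE):
                return False
    return True


word = "\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"  # ভর্ৎসনা (bhartsanā)
print(khanda_ta_context_ok(word))                  # True
print(khanda_ta_context_ok("\u0995\u09CD\u09CE"))  # ক + ্ + ৎ: False
```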
5.4.5. Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). To render Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya, preceded by the said vowels. This is a purely ligatural entity: the addition of Ya-phalā and ā-kāra is used to produce the æ sound, as in English ‘bat’, ‘fat’, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant; only consonants can carry the Hasanta mark. But, as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
The other case refers to the formation of repha and ra-phalā in the script, mentioned in the table above as P. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta +
10 Refer to Rule P in Section 7, Table 16.
6. Variants
This section discusses variants in the Bangla script. The NBGP categorizes these confusing variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from character formations as normally perceived by the larger linguistic community.
For Group 1, identical code points are defined as variants. Confusable but non-identical cases are not proposed, because another panel (the String Similarity Assessment Panel) is entrusted with such cases. Cases belonging to Group 2, however, are proposed to be treated as variants. They are not cases of mere visual similarity, since they involve some deviation from the widely accepted norms of Bangla akshara formation. They can confuse even a careful observer and are therefore proposed as variants.
Variants arise in a script when two or more forms are created with different storage, i.e. different code points. In Bangla, the e-kāra, the ā-kāra and the o-kāra have different code points. One can type o with a consonant in one go, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot distinguish between the two forms of ko (কো), one typed with a single key and the other with two different keys. However, this will not be considered a case of variants, because a kāra followed by a kāra is not allowed.
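The ko (কো) example can be reproduced with Python's `unicodedata`. The two-key spelling is a different code point sequence. Under IDNA2008, U-labels must be in Unicode Normalization Form C (NFC), and NFC composes E-kāra + ā-kāra back into O-kāra. The un-normalized spelling is also caught by the "no kāra after a kāra" condition stated above. The `KARAS` set and the helper name are illustrative assumptions.

```python
import unicodedata

one_key = "\u0995\u09CB"         # ক + BENGALI VOWEL SIGN O
two_keys = "\u0995\u09C7\u09BE"  # ক + BENGALI VOWEL SIGN E + BENGALI VOWEL SIGN AA

# Dependent vowel signs (kāra); an assumed set, not the normative repertoire.
KARAS = set("\u09BE\u09BF\u09C0\u09C1\u09C2\u09C3\u09C4\u09C7\u09C8\u09CB\u09CC")


def has_kara_after_kara(label: str) -> bool:
    """True if any kāra is immediately followed by another kāra."""
    return any(a in KARAS and b in KARAS for a, b in zip(label, label[1:]))


print(one_key == two_keys)                                # False: different code points
print(unicodedata.normalize("NFC", two_keys) == one_key)  # True: NFC composes E + AA into O
print(has_kara_after_kara(two_keys))                      # True: kāra followed by kāra
```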
6.1. In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I
As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) is joined by Hasanta in a conjunct with স (sa, U+09B8) or ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus
স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus
ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
Written in traditional orthography, the above combinations can be somewhat confusing, since the থ (tha) in the conjunct looks like a হ (ha). The conjunct can occur in initial, medial or final position (as shown below in example no. 1). It may also be typed wrongly as হ (ha, U+09B9) by someone who takes it for that letter, which increases the risk of errors in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
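The two Case I pairs can be used to enumerate a label's variant set. In an LGR, variants are declared in the XML format of RFC 7940. The Python sketch below only shows the combinatorics: every occurrence of either member of a pair is a choice point, so a label with n such occurrences yields 2ⁿ labels, the original included. The data layout and function name are hypothetical.

```python
from itertools import product

# The two in-script variant pairs of Section 6.1, as code point sequences.
VARIANT_PAIRS = [
    ("\u09B8\u09CD\u09A5", "\u09B8\u09CD\u09B9"),  # স্থ ~ স্হ
    ("\u09A8\u09CD\u09A5", "\u09A8\u09CD\u09B9"),  # ন্থ ~ ন্হ
]
_LOOKUP = {seq: pair for pair in VARIANT_PAIRS for seq in pair}


def variant_labels(label: str) -> set:
    """Enumerate every label reachable by swapping members of a variant pair.

    The result includes the original label itself.
    """
    choices, i = [], 0
    while i < len(label):
        segment = label[i:i + 3]
        if segment in _LOOKUP:
            choices.append(_LOOKUP[segment])  # either member may appear here
            i += 3
        else:
            choices.append((label[i],))       # ordinary code point, no choice
            i += 1
    return {"".join(parts) for parts in product(*choices)}


sthana = "\u09B8\u09CD\u09A5\u09BE\u09A8"  # স্থান (sthāna)
print(sorted(variant_labels(sthana)))      # স্থান and its variant স্হান
```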
6.2. Cross-Script Variants
A crisp cross-script study for Bangla has been carried out with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya). The study kept in mind the visual and technical confusion these scripts may cause as labels in the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are proposed by the NBGP as variants. Certain other characters that are somewhat similar have not been included here; they are provided in the Appendix (10.2) for reference.
7. Whole Label Evaluation Rules (WLE)
This section provides the WLE rules required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can easily be translated into the LGR specification.
11 Unicode uses ‘Oriya’ for the script, although ‘Odia’ is now the official term.
12 As used by Unicode, denoting and including both Assamese and Maṇipuri.
Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table in Code Point Repertoire (Section 5.1):
C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9) or the (æ) Ya-phalā sequence (V1 H C1 M1), where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ, the Assamese RA, encoded as BENGALI LETTER RA WITH MIDDLE DIAGONAL).
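The symbol table can be turned into a classifier that rewrites a label as a symbol string, which is the form the WLE rules are stated in. This Python sketch is built on assumptions. The code point ranges are a simplified stand-in for the Section 5.1 table. Only the (æ) Ya-phalā form of S is recognised, because S1 and S2 from Table 9 are not reproduced here. Any unmapped code point becomes "?". All names are hypothetical.

```python
# Assumed code point -> WLE symbol mapping (a sketch, not the Section 5.1 table).
SYMBOLS = {}
for cp in [*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994]:
    SYMBOLS[chr(cp)] = "V"
for cp in [*range(0x0995, 0x09A9), *range(0x09AA, 0x09B1), 0x09B2,
           *range(0x09B6, 0x09BA), 0x09F0, 0x09F1]:
    SYMBOLS[chr(cp)] = "C"
for cp in [*range(0x09BE, 0x09C5), 0x09C7, 0x09C8, 0x09CB, 0x09CC]:
    SYMBOLS[chr(cp)] = "M"
SYMBOLS.update({"\u0982": "B", "\u0981": "D", "\u0983": "X",
                "\u09CD": "H", "\u09CE": "Z"})

AE_SEQUENCES = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা, এ্যা (S)
RA_HASANTA = {"\u09B0\u09CD", "\u09F0\u09CD"}                            # র্, ৰ্ (P)


def to_symbols(label: str) -> str:
    """Rewrite a label as WLE symbols, matching S and P before single code points."""
    out, i = [], 0
    while i < len(label):
        if label[i:i + 4] in AE_SEQUENCES:
            out.append("S")
            i += 4
        elif label[i:i + 2] in RA_HASANTA:
            out.append("P")
            i += 2
        else:
            out.append(SYMBOLS.get(label[i], "?"))
            i += 1
    return "".join(out)


print(to_symbols("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা -> CPZCCM
print(to_symbols("\u09AC\u09CD\u09AF\u09BE\u099F"))              # ব্যাট -> CHCMC
```

Ya-phalā after a consonant, as in ব্যাট, stays an ordinary C H C M run. Only the vowel-initial forms অ্যা and এ্যা collapse to S.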
Let us now elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, and may not be attested. There is no harm in including such ligatures, however, since any user can easily create them.
7.1.1. Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning ‘Bank of India’. Here two different words are joined together: the first ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H to fully represent the intended sound. By and large, however, the first word without an H (U+09CD) is considered sufficient to represent its intended sound. This unique situation arises because the hyphen, the space and the Zero Width Non-Joiner are not in the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to follow an H. Permitting it may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities, so the NBGP explicitly prohibits it. In future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.
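The prohibition reduces to a pairwise scan: no independent vowel may immediately follow U+09CD. The Python sketch below uses the code point sequence given above for ‘Bank of India’. The vowel set and helper name are assumptions. The Ya-phalā sequence অ্যা is unaffected, since there the Hasanta follows the vowel rather than preceding it.

```python
HASANTA = "\u09CD"
# Assumed set of independent vowels (hypothetical, not the normative repertoire).
INDEPENDENT_VOWELS = {chr(cp) for cp in [*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994]}


def violates_v_after_h(label: str) -> bool:
    """True if an independent vowel (V) immediately follows a Hasanta (H)."""
    return any(a == HASANTA and b in INDEPENDENT_VOWELS for a, b in zip(label, label[1:]))


bank_of_india = "".join(chr(cp) for cp in [
    0x09AC, 0x09CD, 0x09AF, 0x09BE, 0x0999, 0x09CD, 0x0995, 0x0985,
    0x09AB, 0x09CD, 0x0987, 0x09A8, 0x09CD, 0x09A1, 0x09BF, 0x09DF, 0x09BE,
])
print(violates_v_after_h(bank_of_india))  # True: ফ্ (U+09AB U+09CD) followed by ই (U+0987)

without_h = bank_of_india.replace("\u09AB\u09CD\u0987", "\u09AB\u0987")
print(violates_v_after_h(without_h))      # False: the form without H is permitted
```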
7.2. Additional Examples from Bangla ABNF
Below are a few examples that help one understand some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.
ঃক (ःक): As can be seen, such a combination automatically results in a “golu” or dotted circle, marking it as an invalid formation. This is an intrinsic property of the
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka; Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow; Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum; Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS); Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka; Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka; Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka.
10. Appendix-I
10.1. Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and certain restriction rules apply when it is used for a specific language or script. In other words, some of the formalism's structures do not necessarily apply in a given language. Restriction rules are put in place to handle such cases and to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti
13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section
58
Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)
Table17ndashTheScriptTableofSylhetiNagarıorSiloti
103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
Apart from the above individual code-points the Neo-Brāhmī Generation Panel alsoproposes some specific sequences which enable conditional inclusion of the BanglaLETTER A and E followed by Bangla SIGN VIRAMA and Bangla LETTER YA againfollowed by Bangla VOWEL SIGN AA in the repertoire for enabling inclusion of aeligsoundasinEnglishlsquobatrsquolsquocatrsquoetc
52 CodePointRepertoireExclusionTherearesomecharactersoftheBanglascriptthatfindplaceintheUnicodebuthavenot been included in the repertoire in the LGR proposal The reason for excludingঌ(U+098C)andৗ(U+09D7)isthattheyarerareandobsoletecharacters
53 CodepointnotusedaloneBENGALI SIGN NUKTA U+09BC (See 336) is excluded from repertoire since it will never be used alone It will be used as sequence in three special characters in normalized form for ড় ঢ় য়
542 TheVowelSequenceInwhatfollowstheVowelSequenceandtheConsonantSequencepertinenttoBanglaare given To facilitate understanding of other Brahmi script users equivalents inDevanāgarīareprovidedwherevernecessaryAvowelsequenceismadeupofasinglevowelItmaybefollowedbutnotnecessarily(optionally)byanAnusvāraonuʃʃār(B)Candrabindu(D)oraVisargabiʃɔrgo(X)ThenumberofDBorXwhichcanfollowaV inBanglamaynotberestrictedtooneGoingbytherules illustratedinthedocument it isclearthat formationssuchasVDDVBBandVXXare invalidorthographicunitsHowever it isvalidandpossible tohaveformationsorsequencessuchasanusvarafollowedbyachandrabinduononehandandvisarga followed by a chandrabindu on the other as in হ8াংচা lsquohaelignchārsquo and lsquohaelignrsquo হ8াঃrespectivelyThepossibilityof aVisarga orAnusvāra (onuʃʃār) followingaCandrabinduexists inBangla Vowel can optionally be followed by a combination of Hasanta Virama [H]Consonant [C] to formaYa-phala ldquoYa-phala isapresentation formofU+09AFBanglaletterযorlsquoyarsquoRepresentedbythesequenceltU+09CDieBENGALISIGNVIRAMABangla SIGNHasanta or VIRAMA U+09AF -য BENGALI LETTER YAgt Ya-phala has aspecialformয়AgainwhencombinedwithU+09BEাBENGALIVOWELSIGNAA(ielsquoaarsquo(ā))itisusedfortranscribing[aelig]asintheldquoardquointheEnglishwordldquobatrdquowritteninBanglaasব8াটAVowel-sequenceadmitsthefollowingcombinations
5421 ASingleVowel
ExamplesV অअ
5422 AVowelwithConditionsAVowelcanoptionallybefollowedbyAnusvāra[B]orCandrabindu[D]orVisarga[X]or Candrabindu+ Anusvāra [DB] or Candrabindu+ Visarga [DX] or combination ofHasanta(orVirama)[H]followedbyConsonant[C]followedbykāra(Mātrā)[M]
5432 AConsonantwithConditionsAConsonantoptionallyfollowedbydependentvowelsignkāra(Mātrā)[M]orAnusvāra [B] or Candrabindu [D] or Visarga [X] or Hasanta (also known asVirama)[H]orCandrabindu+Anusvāra[DB]orCandrabindu+ Visarga[DX]
5435 Subsets While considering its subsets as a representative example we will consider thecombinationCHConlyhoweverthesameisequallyapplicabletoCHCHCandCHCHCHC[A]ThecombinationmaybefollowedbyMBDXDBorDX
Example র++ৎ =ৎHasinভৎHসনা (bhartsanā) scoldingNoteTheconditionsinthiscontextofKHANDATAarethattheCshouldbeeitherRAU+09B0(র)(usedinBangla)orRAU+09F0(ৰ)(usedinAssamese)
545 SpecialCasesSandPTwospecialcasesinvolvingSequences(referredtoasSandPinTable16underSection7)couldbedescribedbrieflyhereLetustakeupSinthefirstinstanceItisnoteworthythat thereare two instances inBanglawhereHasanta(U+09CD) isprecededbya full
10 Refer to Rule P in Section 7 Table 16
42
vowel (U+0985 অ - BANGLA LETTER A and U+098F এ - BANGLA LETTER E) ForrenderingYa-phalāfollowedbyঅandএitisnecessarytotypeU+09CDplusU+09AFyapreceded by the said vowels This is a purely ligatural entity and the addition ofYa-phalāandā-kāraisusedtoelicittheaeligsoundasinEnglishlsquobatrsquolsquofatrsquoetcTheBrahmıscriptbynaturedoesnothaveHasantaafteravowelHasantaisgenerallydescribedaslsquovowel killerrsquo although it actually indicates absence of a vowel after the markedconsonant Only the consonants can have the Hasanta marked But as we see hereBanglaendsupwithadeviantfeatureintheorthographyhereinwhichHasantacomesimmediatelyafteravowelinligaturesঅGাandএGা(CfUnicode100p473[100])Another case refers to the formation of repha and ra-phalā in the said script andmentioned in the tableaboveasPOwing toco-occurrencewithHASANTARAeitherlosesitsownimplicitvowel(REPHA)orsuppressestheimplicitvoweloftheprecedingconsonant(RA-PHALA )Forinstancerepha=ra+Hasanta+C(egকHiera+Hasanta+
6 VariantsThissection talksabout thevariants in theBanglascriptTheNBGPcategorizes theseconfusinglyvariantsintwogroupsGroup1ConfusingduetopurevisualsimilarityGroup2ConfusingduetodeviationfromnormallyperceivedcharacterformationsbylargerlinguisticcommunityForGroup1anyidenticalcodepointsaredefinedasvariantsTheconfusablebutnotidenticalcasesarenotproposedasthereisanotherpanel(Stringsimilarityassessmentpanel)entrustedtodealwithsuchcasesHowevercaseswhichbelongtoGroup2areproposedtobeconsideredasvariantsThesecasesarenotofmerevisualsimilarityasthey involve some deviations from the widely accepted norms of Bangla Aksharformations These can cause confusion even to a careful observer and hence beingproposedasvariantsThe variants are generated in a script when two or more forms are formed withdifferent storage or code points In Bangla the e-kāra ā-kāra and the o-kāra havedifferentcodepointsOnecantypeowithaconsonantatonegoandthesamebytypinge-kāra and ā-kāra as two separate keys getting the same results A reader cannotdifferentiatebetween the twoko (েকা)one typedwitha singlekeyand theotheronetypedwithtwodifferentkeysMoreoverthiswillnotbeconsideredasacaseofvariantbecauseakārafollowedbyakāraisnotallowed
61 InScriptVariantsHoweverweproposetwocasesoftruein-scriptvariantsinBanglascriptCASEIAs far as true variants in Bangla are concernedwemay drawour attention to caseswhereinHasantawith(U+09A5)থ(tha)appearsasconjunctwith(U+09B8)স(sa)and(U+09A8)ন(na)1 স+Hasanta+থ(U+09B8+U+09CD+U+09A5)versus
স +Hasanta+হ(U+09B8+U+09CD+U+09B9)
2 ন+Hasanta+থ(U+09A8+U+09CD+U+09A5)versus
ন+Hasanta+হ(U+09A8+U+09CD+U+09B9)
44
Theabovecombinationsifwrittenintraditionalorthographycouldbelittleconfusingwheretheথ(tha)inconjunctappearslikeaহ(ha)Theconjunctcouldbeintheinitialmedialorfinalpositions(asshownbelowinegno1)Itcouldbetypedwrongaswellthinking itwas aহ (ha)U+09B9 increasing the chances of risks in labelwriting andidentificationExamples1 acuteandসহ(asinacuteানsthānaacuteলsthulaাacuteGsvāsthyaঅacuteায়ীasthāyı)
62 CrossScriptVariantsAcrispcrossscriptstudyforBanglahasbeendonewithrespecttosisterscriptssuchasDevanāgarī Gurmukhı and Odia11(formerly Oriya) keeping in mind the visual andtechnicalconfusionstheymaycauseaslabelsonthewebdomainMoreoverthereisnoin-script variant in Bangla as far as the orthography is concerned The followingcharacters are being proposed by the NBPG as variants Although there are certaincharacters which are somewhat similar they but have not been included here TheyhavebeenprovidedintheAppendix(102)forreference
7 WholeLabelEvaluationRules(WLE) ThissectionprovidestheWLEsthatarerequiredbyallthelanguagesmentionedinsection32whenwritten inBangla12ScriptThe ruleshavebeendrafted in suchawaythattheycanbeeasilytranslatedintotheLGRspecifications
11 Unicode uses Oriya for the script although Odia is now the official term used 12 As used by the Unicode denoting and including both Assamese and Maṇipuri
47
Below are the symbols used in the WLE rules for each of the Indic SyllabicCategoryasmentionedinthetableprovidedinCodepointrepertoire(Section51)
C rarr Consonant
M rarr Kāra(Mātrā)
V rarr Vowel
B rarr Anusvāra
D rarr Candrabindu
X rarr Visarga
H rarr Hasanta
Z rarr KhandaTa
S rarr S1S2(fromTable9)or(ae)Ya-phalā(V1HC1M1)whereV1iseither0985(অ-BENGALILETTERA)or098F(এ-BENGALILETTERE)His09CD(-BENGALISIGNVIRAMA)C1is-09AF(য-BENGALILETTERYA)M1is-09BE(া-BENGALIVOWELSIGNAA)S1 and S2 are valid even they are not allowed by the other context rules
P rarr Ra-Hasanta(C2H)whereC2iseither09B0(র-BENGALILETTERRA)or 09F0 (ৰ - ASSAMESE LETTER RA
Now letuselaborateeach rulewithexamples from the scriptkeeping inmindthe Bangla Assamese and Manipuri communities Some combinations ofcharactersmayseemunrealisticorrareinusagebutthereisnoharminaddingsuch ligaturesbecause it ispossible tocreate thembyanyusereasilybutmaynotbeattestedcombinations
49
711 Case of V Preceded by H There could be cases involvingmulti-word domains where Vmay need to beallowedtofollowanHegব8াtঅuইিvয়া baeligŋkʌv ɪndiə (U+09ACU+09CDU+09AFU+09BEU+0999U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1U+09BFU+09DFU+09BE)(meaningBankofIndia)ThisisthecasewheretwodifferentwordsarejoinedtogetherfirstofwhichendswithanH(অu)andthesecondwordbeginswithaV(ইিvয়া)Somesectionsofthe linguistic community require the explicit presence of H for fullrepresentation of the sound intended However by and large the form of thefirstwordwithoutanH(U+09CD)isconsideredenoughforfullrepresentationofthesoundintendedforthefirstwordThis isauniquesituationnecessitatedbythelackofhyphenspaceortheZeroWidthNon-joinercharacterinthepermissiblesetofcharactersintheRootzonerepertoire Otherwise V is never required to be allowed to follow an HPermittingthismaycreateaperceptivesimilaritybetweentwolabels(withandwithout H) for majority of the linguistic communities hence this is explicitlyprohibitedbytheNBGPIn future if required depending on the prevailing requirements from thecommunitythefutureNBGPmayconsiderrevisitingthisrule
72 AdditionalExamplesfromBanglaABNF
Belowarea fewexampleswhichhelponeunderstandsomeof the rulesABNFputsinplaceThesearejustgivenforreferencepurposesandarenotmeanttobecomprehensive
ঃক ःकAscanbeseensuchcombinationwillresultautomaticallyinaldquogolurdquooradottedcircle marking it as an invalid formation This is an intrinsic property of the
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti
13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section
58
Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)
Table17ndashTheScriptTableofSylhetiNagarıorSiloti
103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
Apart from the above individual code-points the Neo-Brāhmī Generation Panel alsoproposes some specific sequences which enable conditional inclusion of the BanglaLETTER A and E followed by Bangla SIGN VIRAMA and Bangla LETTER YA againfollowed by Bangla VOWEL SIGN AA in the repertoire for enabling inclusion of aeligsoundasinEnglishlsquobatrsquolsquocatrsquoetc
52 CodePointRepertoireExclusionTherearesomecharactersoftheBanglascriptthatfindplaceintheUnicodebuthavenot been included in the repertoire in the LGR proposal The reason for excludingঌ(U+098C)andৗ(U+09D7)isthattheyarerareandobsoletecharacters
53 CodepointnotusedaloneBENGALI SIGN NUKTA U+09BC (See 336) is excluded from repertoire since it will never be used alone It will be used as sequence in three special characters in normalized form for ড় ঢ় য়
542 TheVowelSequenceInwhatfollowstheVowelSequenceandtheConsonantSequencepertinenttoBanglaare given To facilitate understanding of other Brahmi script users equivalents inDevanāgarīareprovidedwherevernecessaryAvowelsequenceismadeupofasinglevowelItmaybefollowedbutnotnecessarily(optionally)byanAnusvāraonuʃʃār(B)Candrabindu(D)oraVisargabiʃɔrgo(X)ThenumberofDBorXwhichcanfollowaV inBanglamaynotberestrictedtooneGoingbytherules illustratedinthedocument it isclearthat formationssuchasVDDVBBandVXXare invalidorthographicunitsHowever it isvalidandpossible tohaveformationsorsequencessuchasanusvarafollowedbyachandrabinduononehandandvisarga followed by a chandrabindu on the other as in হ8াংচা lsquohaelignchārsquo and lsquohaelignrsquo হ8াঃrespectivelyThepossibilityof aVisarga orAnusvāra (onuʃʃār) followingaCandrabinduexists inBangla Vowel can optionally be followed by a combination of Hasanta Virama [H]Consonant [C] to formaYa-phala ldquoYa-phala isapresentation formofU+09AFBanglaletterযorlsquoyarsquoRepresentedbythesequenceltU+09CDieBENGALISIGNVIRAMABangla SIGNHasanta or VIRAMA U+09AF -য BENGALI LETTER YAgt Ya-phala has aspecialformয়AgainwhencombinedwithU+09BEাBENGALIVOWELSIGNAA(ielsquoaarsquo(ā))itisusedfortranscribing[aelig]asintheldquoardquointheEnglishwordldquobatrdquowritteninBanglaasব8াটAVowel-sequenceadmitsthefollowingcombinations
5421 ASingleVowel
ExamplesV অअ
5422 AVowelwithConditionsAVowelcanoptionallybefollowedbyAnusvāra[B]orCandrabindu[D]orVisarga[X]or Candrabindu+ Anusvāra [DB] or Candrabindu+ Visarga [DX] or combination ofHasanta(orVirama)[H]followedbyConsonant[C]followedbykāra(Mātrā)[M]
5432 AConsonantwithConditionsAConsonantoptionallyfollowedbydependentvowelsignkāra(Mātrā)[M]orAnusvāra [B] or Candrabindu [D] or Visarga [X] or Hasanta (also known asVirama)[H]orCandrabindu+Anusvāra[DB]orCandrabindu+ Visarga[DX]
5435 Subsets While considering its subsets as a representative example we will consider thecombinationCHConlyhoweverthesameisequallyapplicabletoCHCHCandCHCHCHC[A]ThecombinationmaybefollowedbyMBDXDBorDX
Example র++ৎ =ৎHasinভৎHসনা (bhartsanā) scoldingNoteTheconditionsinthiscontextofKHANDATAarethattheCshouldbeeitherRAU+09B0(র)(usedinBangla)orRAU+09F0(ৰ)(usedinAssamese)
545 SpecialCasesSandPTwospecialcasesinvolvingSequences(referredtoasSandPinTable16underSection7)couldbedescribedbrieflyhereLetustakeupSinthefirstinstanceItisnoteworthythat thereare two instances inBanglawhereHasanta(U+09CD) isprecededbya full
10 Refer to Rule P in Section 7 Table 16
42
vowel (U+0985 অ - BANGLA LETTER A and U+098F এ - BANGLA LETTER E) ForrenderingYa-phalāfollowedbyঅandএitisnecessarytotypeU+09CDplusU+09AFyapreceded by the said vowels This is a purely ligatural entity and the addition ofYa-phalāandā-kāraisusedtoelicittheaeligsoundasinEnglishlsquobatrsquolsquofatrsquoetcTheBrahmıscriptbynaturedoesnothaveHasantaafteravowelHasantaisgenerallydescribedaslsquovowel killerrsquo although it actually indicates absence of a vowel after the markedconsonant Only the consonants can have the Hasanta marked But as we see hereBanglaendsupwithadeviantfeatureintheorthographyhereinwhichHasantacomesimmediatelyafteravowelinligaturesঅGাandএGা(CfUnicode100p473[100])Another case refers to the formation of repha and ra-phalā in the said script andmentioned in the tableaboveasPOwing toco-occurrencewithHASANTARAeitherlosesitsownimplicitvowel(REPHA)orsuppressestheimplicitvoweloftheprecedingconsonant(RA-PHALA )Forinstancerepha=ra+Hasanta+C(egকHiera+Hasanta+
6 VariantsThissection talksabout thevariants in theBanglascriptTheNBGPcategorizes theseconfusinglyvariantsintwogroupsGroup1ConfusingduetopurevisualsimilarityGroup2ConfusingduetodeviationfromnormallyperceivedcharacterformationsbylargerlinguisticcommunityForGroup1anyidenticalcodepointsaredefinedasvariantsTheconfusablebutnotidenticalcasesarenotproposedasthereisanotherpanel(Stringsimilarityassessmentpanel)entrustedtodealwithsuchcasesHowevercaseswhichbelongtoGroup2areproposedtobeconsideredasvariantsThesecasesarenotofmerevisualsimilarityasthey involve some deviations from the widely accepted norms of Bangla Aksharformations These can cause confusion even to a careful observer and hence beingproposedasvariantsThe variants are generated in a script when two or more forms are formed withdifferent storage or code points In Bangla the e-kāra ā-kāra and the o-kāra havedifferentcodepointsOnecantypeowithaconsonantatonegoandthesamebytypinge-kāra and ā-kāra as two separate keys getting the same results A reader cannotdifferentiatebetween the twoko (েকা)one typedwitha singlekeyand theotheronetypedwithtwodifferentkeysMoreoverthiswillnotbeconsideredasacaseofvariantbecauseakārafollowedbyakāraisnotallowed
6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I: As far as true variants in Bangla are concerned, we may draw our attention to cases wherein থ (tha, U+09A5) is joined through Hasanta to a preceding স (sa, U+09B8) or ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
The above combinations, if written in traditional orthography, could be a little confusing, as the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in the initial, medial or final position (as shown below in example 1). It could also be mistyped as হ (ha, U+09B9), increasing the risk of errors in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
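The two pairs of CASE I can be read as a symmetric variant mapping. The sketch below is our own simplification (the names `VARIANT_PAIRS` and `variant_labels` are hypothetical; an actual LGR declares variants per code point sequence in its XML file). It enumerates every label reachable by swapping a থ-conjunct for the matching হ-conjunct, or the reverse:

```python
VIRAMA = "\u09CD"          # ্ Hasanta
THA, HA = "\u09A5", "\u09B9"

# স্থ <-> স্হ and ন্থ <-> ন্হ, stored in both directions
VARIANT_PAIRS = {
    "\u09B8" + VIRAMA + THA: "\u09B8" + VIRAMA + HA,
    "\u09A8" + VIRAMA + THA: "\u09A8" + VIRAMA + HA,
}
VARIANT_PAIRS.update({dst: src for src, dst in list(VARIANT_PAIRS.items())})


def variant_labels(label: str) -> set:
    """Return all labels reachable by variant swaps, excluding the label itself."""
    seen = {label}
    frontier = [label]
    while frontier:
        current = frontier.pop()
        for src, dst in VARIANT_PAIRS.items():
            start = current.find(src)
            while start != -1:
                # Swap exactly one occurrence; combinations are reached via the frontier.
                candidate = current[:start] + dst + current[start + len(src):]
                if candidate not in seen:
                    seen.add(candidate)
                    frontier.append(candidate)
                start = current.find(src, start + 1)
    seen.discard(label)
    return seen


print(variant_labels("\u09B8\u09CD\u09A5\u09BE\u09A8"))  # স্থান -> {'স্হান'}
```

A label with two such conjuncts yields three variants (either one swapped, or both).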
6.2 Cross-Script Variants
A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusions they may cause as domain labels. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Although there are certain characters which are somewhat similar, they have not been included here; they have been provided in the Appendix (10.2) for reference.
7 Whole Label Evaluation Rules (WLE)
This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications.
11 Unicode uses 'Oriya' for the script, although 'Odia' is now the official term.
12 As used by Unicode, denoting and including both Assamese and Maṇipuri.
Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table in Code Point Repertoire (Section 5.1):
C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9) or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ - BENGALI LETTER A) or 098F (এ - BENGALI LETTER E), H is 09CD (্ - BENGALI SIGN VIRAMA), C1 is 09AF (য - BENGALI LETTER YA) and M1 is 09BE (া - BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র - BENGALI LETTER RA) or 09F0 (ৰ - BENGALI LETTER RA WITH MIDDLE DIAGONAL, the Assamese ra)
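The Ya-phalā form of S and the sequence P are fixed code point patterns, so they can be located with regular expressions. This is a sketch only: `find_special_sequences` is a hypothetical helper, and S1 and S2 from Table 9 are not covered because that table is not reproduced here.

```python
import re

# S (æ): Ya-phalā after a full vowel, V1 H C1 M1 = অ/এ + ্ + য + া
YA_PHALA_AE = re.compile("[\u0985\u098F]\u09CD\u09AF\u09BE")
# P: Ra-Hasanta, C2 H = র or ৰ followed by ্
RA_HASANTA = re.compile("[\u09B0\u09F0]\u09CD")


def find_special_sequences(label: str) -> dict:
    """Return the start offsets (in code points) of S and P sequences in a label."""
    return {
        "S": [m.start() for m in YA_PHALA_AE.finditer(label)],
        "P": [m.start() for m in RA_HASANTA.finditer(label)],
    }


bhartsana = "\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"  # ভর্ৎসনা
print(find_special_sequences(bhartsana))                   # {'S': [], 'P': [1]}
print(find_special_sequences("\u098F\u09CD\u09AF\u09BE"))  # {'S': [0], 'P': []}
```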
Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in adding such ligatures, because any user can easily create them even though they may not be attested combinations.
7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE) (meaning 'Bank of India'). Here two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered enough to represent the sound intended for the first word. This is a unique situation, necessitated by the absence of the hyphen, the space and the Zero Width Non-Joiner from the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to follow an H. Permitting it may create a perceptible similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.
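The prohibition can be checked by scanning adjacent code point pairs. A minimal sketch: `violates_v_after_h` is a hypothetical name, and the vowel set is assumed from the Unicode Bengali block (ৠ U+09E0 and ৡ U+09E1 are left out), not taken from the LGR's own repertoire table.

```python
HASANTA = "\u09CD"  # ্ BENGALI SIGN VIRAMA
# Assumption: independent vowels U+0985..U+0994 minus ঌ (U+098C, excluded in
# Section 5.2) and the unassigned U+098D, U+098E, U+0991, U+0992.
VOWELS = {chr(cp) for cp in range(0x0985, 0x0995)} - {
    "\u098C", "\u098D", "\u098E", "\u0991", "\u0992"
}


def violates_v_after_h(label: str) -> bool:
    """True if an independent vowel (V) immediately follows a Hasanta (H)."""
    return any(prev == HASANTA and cur in VOWELS for prev, cur in zip(label, label[1:]))


# ব্যাঙ্কঅফ্ইন্ডিয়া, with the Hasanta on ফ (U+09AB U+09CD) before ই (U+0987)
with_h = "\u09AC\u09CD\u09AF\u09BE\u0999\u09CD\u0995\u0985\u09AB\u09CD\u0987\u09A8\u09CD\u09A1\u09BF\u09DF\u09BE"
# The same label with the first word written without its final Hasanta
without_h = with_h.replace("\u09AB\u09CD\u0987", "\u09AB\u0987")

print(violates_v_after_h(with_h))     # True: rejected under this rule
print(violates_v_after_h(without_h))  # False
```

The Ya-phalā sequence S (V H C M) is unaffected, since there the Hasanta is followed by a consonant.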
7.2 Additional Examples from Bangla ABNF
Below are a few examples which help one understand some of the rules the ABNF puts in place. These are given for reference purposes only and are not meant to be comprehensive.
ঃক (Devanāgarī ःक): As can be seen, such a combination will automatically result in a "golu", or dotted circle, marking it as an invalid formation. This is an intrinsic property of the
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink, and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. "What has happened So Far in Terms of Script Reforms". Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0: Core Specification, Chapter 12, p. 473.
10 Appendix-I
10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the Formalism structures do not necessarily apply. To take care of such cases, restriction rules are set in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:
1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, but we can cite a couple of native examples, for instance the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichuchor' written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across the possibility of Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).
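Rule 1 amounts to a check on the first code point of a label. A minimal sketch with a hypothetical helper name; it encodes only the three characters the rule names, so য় is not blocked, and ৎসেরিং (Tsering) is rejected, which is the tension the rule's own discussion points out.

```python
# Section 10.1, rule 1: these may not begin a Bangla label.
FORBIDDEN_INITIAL = {
    "\u09CE",  # ৎ BENGALI LETTER KHANDA TA
    "\u099E",  # ঞ BENGALI LETTER NYA
    "\u0999",  # ঙ BENGALI LETTER NGA
}


def starts_with_forbidden(label: str) -> bool:
    """True if the first code point of the label is barred in initial position."""
    return bool(label) and label[0] in FORBIDDEN_INITIAL


print(starts_with_forbidden("\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"))  # ৎসেরিং -> True
print(starts_with_forbidden("\u09AF\u09BC\u09BE\u0995\u09C1\u09AC"))  # য়াকুব (NFC) -> False
```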
13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.
10.2 'Sylheti Nagarī lipi' or 'Siloti'
This version of the Bangla script resembles the 'Kaithī' script (ISO 12954) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, and was widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti (129), such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalala brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).
Table 17: The Script Table of Sylheti Nagarī or Siloti
10.3 Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable but not enough to be defined as variant code points.
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences which enable the conditional inclusion in the repertoire of BENGALI LETTER A and E followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, again followed by BENGALI VOWEL SIGN AA, so as to represent the æ sound as in English 'bat', 'cat', etc.
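The two conditionally included sequences can be spelled out with Python's standard `unicodedata` module. This is only an illustration; `describe` is a hypothetical helper.

```python
import unicodedata


def describe(text: str) -> list:
    """List each code point of a string as 'U+XXXX NAME', in storage order."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# The two conditionally included sequences for the æ sound
A_YA_AA = "\u0985\u09CD\u09AF\u09BE"  # অ্যা
E_YA_AA = "\u098F\u09CD\u09AF\u09BE"  # এ্যা

for seq in (A_YA_AA, E_YA_AA):
    # NFC leaves both sequences unchanged, so they are stored exactly as typed.
    assert unicodedata.normalize("NFC", seq) == seq
    print("\n".join(describe(seq)))
```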
5.2 Code Point Repertoire Exclusion
There are some characters of the Bangla script that are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.
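A sketch of an exclusion check, assuming it is applied after NFC (the helper name is hypothetical). Normalizing first matters because ৗ also appears in the canonical decomposition of ৌ (U+09CC = U+09C7 + U+09D7); an au-kāra typed in two parts composes back to U+09CC and is not caught by the exclusion.

```python
import unicodedata

# Section 5.2 exclusions
EXCLUDED = {
    "\u098C",  # ঌ BENGALI LETTER VOCALIC L
    "\u09D7",  # ৗ BENGALI AU LENGTH MARK
}


def uses_excluded(label: str) -> bool:
    """True if the NFC form of the label still contains an excluded code point."""
    return any(ch in EXCLUDED for ch in unicodedata.normalize("NFC", label))


print(uses_excluded("\u098C"))              # True
print(uses_excluded("\u0995\u09C7\u09D7"))  # False: NFC turns ে + ৗ into ৌ (U+09CC)
```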
5.3 Code point not used alone
BENGALI SIGN NUKTA U+09BC (see Section 3.3.6) is excluded from the repertoire since it will never be used alone. It will be used as part of a sequence in three special characters in normalized form: ড় ঢ় য়.
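Why the nukta still appears in labels: ড় (U+09DC), ঢ় (U+09DD) and য় (U+09DF) are Unicode composition exclusions, so NFC stores them as base letter + U+09BC. The sketch below (hypothetical names) confirms this and flags a nukta that is not attached to one of those three bases.

```python
import unicodedata

NUKTA = "\u09BC"
# Precomposed letter -> base letter it decomposes to (with NUKTA)
PRECOMPOSED = {
    "\u09DC": "\u09A1",  # ড় RRA = ড DDA + NUKTA
    "\u09DD": "\u09A2",  # ঢ় RHA = ঢ DDHA + NUKTA
    "\u09DF": "\u09AF",  # য় YYA = য YA + NUKTA
}

for precomposed, base in PRECOMPOSED.items():
    # Composition exclusions: NFC keeps the decomposed base + nukta form.
    assert unicodedata.normalize("NFC", precomposed) == base + NUKTA


def has_lone_nukta(label: str) -> bool:
    """True if, after NFC, a nukta is not preceded by ড, ঢ or য."""
    nfc = unicodedata.normalize("NFC", label)
    bases = set(PRECOMPOSED.values())
    return any(ch == NUKTA and (i == 0 or nfc[i - 1] not in bases)
               for i, ch in enumerate(nfc))


print(has_lone_nukta("\u09DC"))        # False
print(has_lone_nukta("\u0995\u09BC"))  # True: ক + nukta is not one of the three
```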
5.4.2 The Vowel Sequence
In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To facilitate understanding for users of other Brahmi scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra, onuʃʃār (B), a Candrabindu (D) or a Visarga, biʃɔrgo (X). The number of D, B or X which can follow a V in Bangla may not be restricted to one. Going by the rules illustrated in the document, it is clear that formations such as VDD, VBB and VXX are invalid orthographic units. However, it is valid and possible to have formations or sequences such as anusvāra followed by a candrabindu on the one hand and visarga followed by a candrabindu on the other, as in হ্যাংচা 'hænchā' and হ্যাঃ 'hæn' respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu exists in Bangla.
A Vowel can optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phalā. "Ya-phalā is a presentation form of U+09AF, Bangla letter য or 'ya', represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla sign Hasanta or Virama), U+09AF য BENGALI LETTER YA>." Ya-phalā has a special form, ্য. Again, when combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. 'aa' (ā)), it is used for transcribing [æ], as in the "a" of the English word "bat", written in Bangla as ব্যাট. A Vowel-sequence admits the following combinations:
5.4.2.1 A Single Vowel
Example: V অ (Devanāgarī अ)
5.4.2.2 A Vowel with Conditions
A Vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].
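The vowel-sequence rule can be sketched as a regular expression. Two assumptions, stated plainly: the vowel class is taken from the Unicode Bengali block rather than the LGR's own table, and the H C M branch is narrowed to the Ya-phalā sequence S from Section 7. The name `is_vowel_sequence` is hypothetical.

```python
import re

# Assumed classes (Unicode Bengali block): independent vowels অ..ঋ, এ, ঐ, ও, ঔ
V = "[\u0985-\u098B\u098F\u0990\u0993\u0994]"
B, D, X = "\u0982", "\u0981", "\u0983"   # Anusvāra, Candrabindu, Visarga
S = "[\u0985\u098F]\u09CD\u09AF\u09BE"    # æ Ya-phalā: অ/এ + ্ + য + া

# V optionally followed by B, D, X, DB or DX; the H C M branch appears only as S.
VOWEL_SEQUENCE = re.compile(f"(?:{S}|{V}(?:{D}{B}|{D}{X}|{B}|{D}|{X})?)")


def is_vowel_sequence(text: str) -> bool:
    """True if the whole string is one vowel sequence under the rule above."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None


print(is_vowel_sequence("\u0985\u0981\u0982"))  # অ + D + B -> True
print(is_vowel_sequence("\u0985\u0982\u0982"))  # VBB -> False
```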
5.4.3.2 A Consonant with Conditions
A Consonant is optionally followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
5.4.3.5 Subsets
While considering its subsets, as a representative example we will consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.
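Sections 5.4.3.2 and 5.4.3.5 can be sketched as two regular expressions. The consonant and kāra classes below are assumptions drawn from the Unicode Bengali block (with ড়, ঢ়, য় in their NFC form of base + nukta), not the LGR's own repertoire table; the function names are hypothetical.

```python
import re

# Assumed consonant class: ড়/ঢ়/য় as base + nukta, else ক..ন, প..র, ল, শ..হ, ৰ, ৱ
CONS = (
    "(?:[\u09A1\u09A2\u09AF]\u09BC"
    "|[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0\u09F1])"
)
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"  # dependent vowel signs (kāra)
B, D, X, H = "\u0982", "\u0981", "\u0983", "\u09CD"

# 5.4.3.2: C optionally followed by M, B, D, X, H, DB or DX
CONSONANT_WITH_CONDITIONS = re.compile(f"{CONS}(?:{D}{B}|{D}{X}|{M}|{B}|{D}|{X}|{H})?")
# 5.4.3.5: CHC, CHCHC or CHCHCHC, optionally followed by M, B, D, X, DB or DX
CONJUNCT = re.compile(f"{CONS}(?:{H}{CONS}){{1,3}}(?:{D}{B}|{D}{X}|{M}|{B}|{D}|{X})?")


def is_consonant_unit(text: str) -> bool:
    return CONSONANT_WITH_CONDITIONS.fullmatch(text) is not None


def is_conjunct(text: str) -> bool:
    return CONJUNCT.fullmatch(text) is not None


print(is_conjunct("\u09B8\u09CD\u09A4\u09CD\u09B0\u09C0"))  # স্ত্রী (CHCHC + M) -> True
print(is_conjunct("\u0995\u09CD"))                          # ক্ alone -> False
```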
Example: র + ্ + ৎ = র্ৎ as in ভর্ৎসনা (bhartsanā) 'scolding'.
Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
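One reading of the Note, sketched in Python: wherever ৎ directly follows a Hasanta, the consonant before that Hasanta must be র or ৰ. The function name is hypothetical, and this interpretation of the context rule is our assumption.

```python
HASANTA = "\u09CD"
KHANDA_TA = "\u09CE"
RA_LETTERS = {"\u09B0", "\u09F0"}  # র (Bangla) and ৰ (Assamese)


def khanda_ta_context_ok(label: str) -> bool:
    """Where ৎ follows a Hasanta, require the consonant before that Hasanta to be RA."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] not in RA_LETTERS:
                return False
    return True


print(khanda_ta_context_ok("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা -> True
print(khanda_ta_context_ok("\u0995\u09CD\u09CE"))                          # ক + ্ + ৎ -> False
```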
5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S in the first instance. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full
10 Refer to Rule P in Section 7 Table 16
42
vowel (U+0985 অ - BANGLA LETTER A and U+098F এ - BANGLA LETTER E) ForrenderingYa-phalāfollowedbyঅandএitisnecessarytotypeU+09CDplusU+09AFyapreceded by the said vowels This is a purely ligatural entity and the addition ofYa-phalāandā-kāraisusedtoelicittheaeligsoundasinEnglishlsquobatrsquolsquofatrsquoetcTheBrahmıscriptbynaturedoesnothaveHasantaafteravowelHasantaisgenerallydescribedaslsquovowel killerrsquo although it actually indicates absence of a vowel after the markedconsonant Only the consonants can have the Hasanta marked But as we see hereBanglaendsupwithadeviantfeatureintheorthographyhereinwhichHasantacomesimmediatelyafteravowelinligaturesঅGাandএGা(CfUnicode100p473[100])Another case refers to the formation of repha and ra-phalā in the said script andmentioned in the tableaboveasPOwing toco-occurrencewithHASANTARAeitherlosesitsownimplicitvowel(REPHA)orsuppressestheimplicitvoweloftheprecedingconsonant(RA-PHALA )Forinstancerepha=ra+Hasanta+C(egকHiera+Hasanta+
6 VariantsThissection talksabout thevariants in theBanglascriptTheNBGPcategorizes theseconfusinglyvariantsintwogroupsGroup1ConfusingduetopurevisualsimilarityGroup2ConfusingduetodeviationfromnormallyperceivedcharacterformationsbylargerlinguisticcommunityForGroup1anyidenticalcodepointsaredefinedasvariantsTheconfusablebutnotidenticalcasesarenotproposedasthereisanotherpanel(Stringsimilarityassessmentpanel)entrustedtodealwithsuchcasesHowevercaseswhichbelongtoGroup2areproposedtobeconsideredasvariantsThesecasesarenotofmerevisualsimilarityasthey involve some deviations from the widely accepted norms of Bangla Aksharformations These can cause confusion even to a careful observer and hence beingproposedasvariantsThe variants are generated in a script when two or more forms are formed withdifferent storage or code points In Bangla the e-kāra ā-kāra and the o-kāra havedifferentcodepointsOnecantypeowithaconsonantatonegoandthesamebytypinge-kāra and ā-kāra as two separate keys getting the same results A reader cannotdifferentiatebetween the twoko (েকা)one typedwitha singlekeyand theotheronetypedwithtwodifferentkeysMoreoverthiswillnotbeconsideredasacaseofvariantbecauseakārafollowedbyakāraisnotallowed
61 InScriptVariantsHoweverweproposetwocasesoftruein-scriptvariantsinBanglascriptCASEIAs far as true variants in Bangla are concernedwemay drawour attention to caseswhereinHasantawith(U+09A5)থ(tha)appearsasconjunctwith(U+09B8)স(sa)and(U+09A8)ন(na)1 স+Hasanta+থ(U+09B8+U+09CD+U+09A5)versus
স +Hasanta+হ(U+09B8+U+09CD+U+09B9)
2 ন+Hasanta+থ(U+09A8+U+09CD+U+09A5)versus
ন+Hasanta+হ(U+09A8+U+09CD+U+09B9)
44
Theabovecombinationsifwrittenintraditionalorthographycouldbelittleconfusingwheretheথ(tha)inconjunctappearslikeaহ(ha)Theconjunctcouldbeintheinitialmedialorfinalpositions(asshownbelowinegno1)Itcouldbetypedwrongaswellthinking itwas aহ (ha)U+09B9 increasing the chances of risks in labelwriting andidentificationExamples1 acuteandসহ(asinacuteানsthānaacuteলsthulaাacuteGsvāsthyaঅacuteায়ীasthāyı)
62 CrossScriptVariantsAcrispcrossscriptstudyforBanglahasbeendonewithrespecttosisterscriptssuchasDevanāgarī Gurmukhı and Odia11(formerly Oriya) keeping in mind the visual andtechnicalconfusionstheymaycauseaslabelsonthewebdomainMoreoverthereisnoin-script variant in Bangla as far as the orthography is concerned The followingcharacters are being proposed by the NBPG as variants Although there are certaincharacters which are somewhat similar they but have not been included here TheyhavebeenprovidedintheAppendix(102)forreference
7 WholeLabelEvaluationRules(WLE) ThissectionprovidestheWLEsthatarerequiredbyallthelanguagesmentionedinsection32whenwritten inBangla12ScriptThe ruleshavebeendrafted in suchawaythattheycanbeeasilytranslatedintotheLGRspecifications
11 Unicode uses Oriya for the script although Odia is now the official term used 12 As used by the Unicode denoting and including both Assamese and Maṇipuri
47
Below are the symbols used in the WLE rules for each of the Indic SyllabicCategoryasmentionedinthetableprovidedinCodepointrepertoire(Section51)
C rarr Consonant
M rarr Kāra(Mātrā)
V rarr Vowel
B rarr Anusvāra
D rarr Candrabindu
X rarr Visarga
H rarr Hasanta
Z rarr KhandaTa
S rarr S1S2(fromTable9)or(ae)Ya-phalā(V1HC1M1)whereV1iseither0985(অ-BENGALILETTERA)or098F(এ-BENGALILETTERE)His09CD(-BENGALISIGNVIRAMA)C1is-09AF(য-BENGALILETTERYA)M1is-09BE(া-BENGALIVOWELSIGNAA)S1 and S2 are valid even they are not allowed by the other context rules
P rarr Ra-Hasanta(C2H)whereC2iseither09B0(র-BENGALILETTERRA)or 09F0 (ৰ - ASSAMESE LETTER RA
Now letuselaborateeach rulewithexamples from the scriptkeeping inmindthe Bangla Assamese and Manipuri communities Some combinations ofcharactersmayseemunrealisticorrareinusagebutthereisnoharminaddingsuch ligaturesbecause it ispossible tocreate thembyanyusereasilybutmaynotbeattestedcombinations
49
711 Case of V Preceded by H There could be cases involvingmulti-word domains where Vmay need to beallowedtofollowanHegব8াtঅuইিvয়া baeligŋkʌv ɪndiə (U+09ACU+09CDU+09AFU+09BEU+0999U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1U+09BFU+09DFU+09BE)(meaningBankofIndia)ThisisthecasewheretwodifferentwordsarejoinedtogetherfirstofwhichendswithanH(অu)andthesecondwordbeginswithaV(ইিvয়া)Somesectionsofthe linguistic community require the explicit presence of H for fullrepresentation of the sound intended However by and large the form of thefirstwordwithoutanH(U+09CD)isconsideredenoughforfullrepresentationofthesoundintendedforthefirstwordThis isauniquesituationnecessitatedbythelackofhyphenspaceortheZeroWidthNon-joinercharacterinthepermissiblesetofcharactersintheRootzonerepertoire Otherwise V is never required to be allowed to follow an HPermittingthismaycreateaperceptivesimilaritybetweentwolabels(withandwithout H) for majority of the linguistic communities hence this is explicitlyprohibitedbytheNBGPIn future if required depending on the prevailing requirements from thecommunitythefutureNBGPmayconsiderrevisitingthisrule
72 AdditionalExamplesfromBanglaABNF
Belowarea fewexampleswhichhelponeunderstandsomeof the rulesABNFputsinplaceThesearejustgivenforreferencepurposesandarenotmeanttobecomprehensive
ঃক ःकAscanbeseensuchcombinationwillresultautomaticallyinaldquogolurdquooradottedcircle marking it as an invalid formation This is an intrinsic property of the
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti
13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section
58
Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)
Table17ndashTheScriptTableofSylhetiNagarıorSiloti
103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
Apart from the above individual code-points the Neo-Brāhmī Generation Panel alsoproposes some specific sequences which enable conditional inclusion of the BanglaLETTER A and E followed by Bangla SIGN VIRAMA and Bangla LETTER YA againfollowed by Bangla VOWEL SIGN AA in the repertoire for enabling inclusion of aeligsoundasinEnglishlsquobatrsquolsquocatrsquoetc
52 CodePointRepertoireExclusionTherearesomecharactersoftheBanglascriptthatfindplaceintheUnicodebuthavenot been included in the repertoire in the LGR proposal The reason for excludingঌ(U+098C)andৗ(U+09D7)isthattheyarerareandobsoletecharacters
53 CodepointnotusedaloneBENGALI SIGN NUKTA U+09BC (See 336) is excluded from repertoire since it will never be used alone It will be used as sequence in three special characters in normalized form for ড় ঢ় য়
542 TheVowelSequenceInwhatfollowstheVowelSequenceandtheConsonantSequencepertinenttoBanglaare given To facilitate understanding of other Brahmi script users equivalents inDevanāgarīareprovidedwherevernecessaryAvowelsequenceismadeupofasinglevowelItmaybefollowedbutnotnecessarily(optionally)byanAnusvāraonuʃʃār(B)Candrabindu(D)oraVisargabiʃɔrgo(X)ThenumberofDBorXwhichcanfollowaV inBanglamaynotberestrictedtooneGoingbytherules illustratedinthedocument it isclearthat formationssuchasVDDVBBandVXXare invalidorthographicunitsHowever it isvalidandpossible tohaveformationsorsequencessuchasanusvarafollowedbyachandrabinduononehandandvisarga followed by a chandrabindu on the other as in হ8াংচা lsquohaelignchārsquo and lsquohaelignrsquo হ8াঃrespectivelyThepossibilityof aVisarga orAnusvāra (onuʃʃār) followingaCandrabinduexists inBangla Vowel can optionally be followed by a combination of Hasanta Virama [H]Consonant [C] to formaYa-phala ldquoYa-phala isapresentation formofU+09AFBanglaletterযorlsquoyarsquoRepresentedbythesequenceltU+09CDieBENGALISIGNVIRAMABangla SIGNHasanta or VIRAMA U+09AF -য BENGALI LETTER YAgt Ya-phala has aspecialformয়AgainwhencombinedwithU+09BEাBENGALIVOWELSIGNAA(ielsquoaarsquo(ā))itisusedfortranscribing[aelig]asintheldquoardquointheEnglishwordldquobatrdquowritteninBanglaasব8াটAVowel-sequenceadmitsthefollowingcombinations
5421 ASingleVowel
ExamplesV অअ
5422 AVowelwithConditionsAVowelcanoptionallybefollowedbyAnusvāra[B]orCandrabindu[D]orVisarga[X]or Candrabindu+ Anusvāra [DB] or Candrabindu+ Visarga [DX] or combination ofHasanta(orVirama)[H]followedbyConsonant[C]followedbykāra(Mātrā)[M]
5432 AConsonantwithConditionsAConsonantoptionallyfollowedbydependentvowelsignkāra(Mātrā)[M]orAnusvāra [B] or Candrabindu [D] or Visarga [X] or Hasanta (also known asVirama)[H]orCandrabindu+Anusvāra[DB]orCandrabindu+ Visarga[DX]
5435 Subsets While considering its subsets as a representative example we will consider thecombinationCHConlyhoweverthesameisequallyapplicabletoCHCHCandCHCHCHC[A]ThecombinationmaybefollowedbyMBDXDBorDX
Example র++ৎ =ৎHasinভৎHসনা (bhartsanā) scoldingNoteTheconditionsinthiscontextofKHANDATAarethattheCshouldbeeitherRAU+09B0(র)(usedinBangla)orRAU+09F0(ৰ)(usedinAssamese)
545 SpecialCasesSandPTwospecialcasesinvolvingSequences(referredtoasSandPinTable16underSection7)couldbedescribedbrieflyhereLetustakeupSinthefirstinstanceItisnoteworthythat thereare two instances inBanglawhereHasanta(U+09CD) isprecededbya full
10 Refer to Rule P in Section 7 Table 16
42
vowel (U+0985 অ - BANGLA LETTER A and U+098F এ - BANGLA LETTER E) ForrenderingYa-phalāfollowedbyঅandএitisnecessarytotypeU+09CDplusU+09AFyapreceded by the said vowels This is a purely ligatural entity and the addition ofYa-phalāandā-kāraisusedtoelicittheaeligsoundasinEnglishlsquobatrsquolsquofatrsquoetcTheBrahmıscriptbynaturedoesnothaveHasantaafteravowelHasantaisgenerallydescribedaslsquovowel killerrsquo although it actually indicates absence of a vowel after the markedconsonant Only the consonants can have the Hasanta marked But as we see hereBanglaendsupwithadeviantfeatureintheorthographyhereinwhichHasantacomesimmediatelyafteravowelinligaturesঅGাandএGা(CfUnicode100p473[100])Another case refers to the formation of repha and ra-phalā in the said script andmentioned in the tableaboveasPOwing toco-occurrencewithHASANTARAeitherlosesitsownimplicitvowel(REPHA)orsuppressestheimplicitvoweloftheprecedingconsonant(RA-PHALA )Forinstancerepha=ra+Hasanta+C(egকHiera+Hasanta+
6 VariantsThissection talksabout thevariants in theBanglascriptTheNBGPcategorizes theseconfusinglyvariantsintwogroupsGroup1ConfusingduetopurevisualsimilarityGroup2ConfusingduetodeviationfromnormallyperceivedcharacterformationsbylargerlinguisticcommunityForGroup1anyidenticalcodepointsaredefinedasvariantsTheconfusablebutnotidenticalcasesarenotproposedasthereisanotherpanel(Stringsimilarityassessmentpanel)entrustedtodealwithsuchcasesHowevercaseswhichbelongtoGroup2areproposedtobeconsideredasvariantsThesecasesarenotofmerevisualsimilarityasthey involve some deviations from the widely accepted norms of Bangla Aksharformations These can cause confusion even to a careful observer and hence beingproposedasvariantsThe variants are generated in a script when two or more forms are formed withdifferent storage or code points In Bangla the e-kāra ā-kāra and the o-kāra havedifferentcodepointsOnecantypeowithaconsonantatonegoandthesamebytypinge-kāra and ā-kāra as two separate keys getting the same results A reader cannotdifferentiatebetween the twoko (েকা)one typedwitha singlekeyand theotheronetypedwithtwodifferentkeysMoreoverthiswillnotbeconsideredasacaseofvariantbecauseakārafollowedbyakāraisnotallowed
61 InScriptVariantsHoweverweproposetwocasesoftruein-scriptvariantsinBanglascriptCASEIAs far as true variants in Bangla are concernedwemay drawour attention to caseswhereinHasantawith(U+09A5)থ(tha)appearsasconjunctwith(U+09B8)স(sa)and(U+09A8)ন(na)1 স+Hasanta+থ(U+09B8+U+09CD+U+09A5)versus
স +Hasanta+হ(U+09B8+U+09CD+U+09B9)
2 ন+Hasanta+থ(U+09A8+U+09CD+U+09A5)versus
ন+Hasanta+হ(U+09A8+U+09CD+U+09B9)
44
Theabovecombinationsifwrittenintraditionalorthographycouldbelittleconfusingwheretheথ(tha)inconjunctappearslikeaহ(ha)Theconjunctcouldbeintheinitialmedialorfinalpositions(asshownbelowinegno1)Itcouldbetypedwrongaswellthinking itwas aহ (ha)U+09B9 increasing the chances of risks in labelwriting andidentificationExamples1 acuteandসহ(asinacuteানsthānaacuteলsthulaাacuteGsvāsthyaঅacuteায়ীasthāyı)
62 CrossScriptVariantsAcrispcrossscriptstudyforBanglahasbeendonewithrespecttosisterscriptssuchasDevanāgarī Gurmukhı and Odia11(formerly Oriya) keeping in mind the visual andtechnicalconfusionstheymaycauseaslabelsonthewebdomainMoreoverthereisnoin-script variant in Bangla as far as the orthography is concerned The followingcharacters are being proposed by the NBPG as variants Although there are certaincharacters which are somewhat similar they but have not been included here TheyhavebeenprovidedintheAppendix(102)forreference
7 WholeLabelEvaluationRules(WLE) ThissectionprovidestheWLEsthatarerequiredbyallthelanguagesmentionedinsection32whenwritten inBangla12ScriptThe ruleshavebeendrafted in suchawaythattheycanbeeasilytranslatedintotheLGRspecifications
11 Unicode uses Oriya for the script although Odia is now the official term used 12 As used by the Unicode denoting and including both Assamese and Maṇipuri
47
Below are the symbols used in the WLE rules for each of the Indic SyllabicCategoryasmentionedinthetableprovidedinCodepointrepertoire(Section51)
C rarr Consonant
M rarr Kāra(Mātrā)
V rarr Vowel
B rarr Anusvāra
D rarr Candrabindu
X rarr Visarga
H rarr Hasanta
Z rarr KhandaTa
S rarr S1S2(fromTable9)or(ae)Ya-phalā(V1HC1M1)whereV1iseither0985(অ-BENGALILETTERA)or098F(এ-BENGALILETTERE)His09CD(-BENGALISIGNVIRAMA)C1is-09AF(য-BENGALILETTERYA)M1is-09BE(া-BENGALIVOWELSIGNAA)S1 and S2 are valid even they are not allowed by the other context rules
P rarr Ra-Hasanta(C2H)whereC2iseither09B0(র-BENGALILETTERRA)or 09F0 (ৰ - ASSAMESE LETTER RA
Now letuselaborateeach rulewithexamples from the scriptkeeping inmindthe Bangla Assamese and Manipuri communities Some combinations ofcharactersmayseemunrealisticorrareinusagebutthereisnoharminaddingsuch ligaturesbecause it ispossible tocreate thembyanyusereasilybutmaynotbeattestedcombinations
49
711 Case of V Preceded by H There could be cases involvingmulti-word domains where Vmay need to beallowedtofollowanHegব8াtঅuইিvয়া baeligŋkʌv ɪndiə (U+09ACU+09CDU+09AFU+09BEU+0999U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1U+09BFU+09DFU+09BE)(meaningBankofIndia)ThisisthecasewheretwodifferentwordsarejoinedtogetherfirstofwhichendswithanH(অu)andthesecondwordbeginswithaV(ইিvয়া)Somesectionsofthe linguistic community require the explicit presence of H for fullrepresentation of the sound intended However by and large the form of thefirstwordwithoutanH(U+09CD)isconsideredenoughforfullrepresentationofthesoundintendedforthefirstwordThis isauniquesituationnecessitatedbythelackofhyphenspaceortheZeroWidthNon-joinercharacterinthepermissiblesetofcharactersintheRootzonerepertoire Otherwise V is never required to be allowed to follow an HPermittingthismaycreateaperceptivesimilaritybetweentwolabels(withandwithout H) for majority of the linguistic communities hence this is explicitlyprohibitedbytheNBGPIn future if required depending on the prevailing requirements from thecommunitythefutureNBGPmayconsiderrevisitingthisrule
72 AdditionalExamplesfromBanglaABNF
Belowarea fewexampleswhichhelponeunderstandsomeof the rulesABNFputsinplaceThesearejustgivenforreferencepurposesandarenotmeanttobecomprehensive
ঃক ःकAscanbeseensuchcombinationwillresultautomaticallyinaldquogolurdquooradottedcircle marking it as an invalid formation This is an intrinsic property of the
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
13 This section specifically addresses restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri are not covered in this section.
10.2. ‘Sylheti Nagarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924: Kthi), used by accountants (perhaps by the Kāyastha community) in eastern Uttar Pradesh and Bihar and widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th–14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).
Table 17 – The Script Table of Sylheti Nagarī or Siloti
10.3. Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not to the extent of being defined as variant code points.
Base letter: conjuncts examined (component sequence)
ক (k): ক্ক (ক্+ক), ক্ট (ক্+ট), ক্ত (ক্+ত), ক্ত্র (ক্+ত্+র), ক্ত্ব (ক্+ত্+ব), ক্ন (ক্+ন), ক্ব (ক্+ব), ক্ম (ক্+ম), ক্য (ক্+য), ক্র (ক্+র), ক্ষ (ক্+ষ), ক্ষ্ণ (ক্+ষ্+ণ), ক্ষ্ম (ক্+ষ্+ম), ক্ষ্ব (ক্+ষ্+ব), ক্ষ্য (ক্+ষ্+য), ক্স (ক্+স), ঙ্ক (ঙ্+ক), (…+ক্+র)
[further forms ending in ত, র, ব and ক; their glyphs and leading components are not recoverable from the source]
গ (g): গ্গ (গ্+গ), গ্দ (গ্+দ), গ্ধ (গ্+ধ), গ্ন (গ্+ন), গ্ব (গ্+ব), গ্ম (গ্+ম), গ্য (গ্+য), গ্র (গ্+র), গ্ল (গ্+ল), ঙ্গ (ঙ্+গ), (…+ঙ্+গ)
ণ (n): ণ্ট (ণ্+ট), ণ্ঠ (ণ্+ঠ), ণ্ড (ণ্+ড), ণ্ড্র (ণ্+ড্+র), ণ্ঢ (ণ্+ঢ), ণ্ণ (ণ্+ণ), ণ্য (ণ্+য), ণ্ব (ণ্+ব), ক্ষ্ণ (ক্+ষ্+ণ), ষ্ণ (ষ্+ণ), (…+ণ)
Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences which enable the conditional inclusion in the repertoire of BENGALI LETTER A and E followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, again followed by BENGALI VOWEL SIGN AA, so as to represent the æ sound as in English ‘bat’, ‘cat’, etc.
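The two sequences proposed here can be assembled from their parts. This Python sketch only builds the sequences and spells out each code point with its Unicode name using the standard `unicodedata` module; the constant and function names are illustrative, not taken from the LGR.

```python
import unicodedata

# Parts of the proposed [æ] sequences: LETTER A or LETTER E,
# then SIGN VIRAMA + LETTER YA (Ya-phalā), then VOWEL SIGN AA.
BASE_VOWELS = ("\u0985", "\u098F")
VIRAMA, YA, AA = "\u09CD", "\u09AF", "\u09BE"

AE_SEQUENCES = [vowel + VIRAMA + YA + AA for vowel in BASE_VOWELS]


def describe(seq: str) -> str:
    """Spell out a sequence as 'U+XXXX NAME' items joined by ' + '."""
    return " + ".join(f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq)


for seq in AE_SEQUENCES:
    print(seq, "=", describe(seq))
```

For অ্যা this prints U+0985 BENGALI LETTER A + U+09CD BENGALI SIGN VIRAMA + U+09AF BENGALI LETTER YA + U+09BE BENGALI VOWEL SIGN AA. Both sequences are already in NFC, so normalization leaves them unchanged.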
5.2. Code Point Repertoire Exclusion
Some characters of the Bangla script that are encoded in Unicode have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.
5.3. Code point not used alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it is never used alone. It is used only within sequences, in the normalized form of three special characters: ড়, ঢ় and য়.
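The phrase “in normalized form” can be checked directly: U+09DC (ড়), U+09DD (ঢ়) and U+09DF (য়) are Unicode composition exclusions, so NFC rewrites each of them as base consonant + NUKTA. The Python sketch below, using only the standard `unicodedata` module, shows this and adds a hypothetical `nukta_ok` check (our illustration, not an LGR rule) that accepts NUKTA only after ড, ঢ or য.

```python
import unicodedata

NUKTA = "\u09BC"
# Precomposed RRA, RHA and YYA: composition exclusions in Unicode.
PRECOMPOSED = ["\u09DC", "\u09DD", "\u09DF"]
# Base consonants DDA, DDHA and YA that may carry the NUKTA.
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}


def code_points(text: str) -> str:
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def nukta_ok(label: str) -> bool:
    """True if every NUKTA in the NFC form of the label follows ড, ঢ or য."""
    s = unicodedata.normalize("NFC", label)
    return all(i > 0 and s[i - 1] in NUKTA_BASES
               for i, ch in enumerate(s) if ch == NUKTA)


for ch in PRECOMPOSED:
    print(code_points(ch), "->", code_points(unicodedata.normalize("NFC", ch)))
```

The loop prints `U+09DC -> U+09A1 U+09BC`, and likewise for the other two letters.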
5.4.2. The Vowel Sequence
In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To facilitate understanding for users of other Brahmi scripts, equivalents in Devanāgarī are provided wherever necessary.
A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra, onuʃʃār (B), a Candrabindu (D) or a Visarga, biʃɔrgo (X). More than one of D, B and X may follow a V in Bangla, but not the same sign twice: going by the rules illustrated in this document, formations such as VDD, VBB and VXX are invalid orthographic units. It is, however, valid to combine a chandrabindu with an anusvāra on the one hand, and with a visarga on the other, as in হ্যাংচা ‘hænchā’ and হ্যাঃ ‘hæn’ respectively; that is, a Visarga or an Anusvāra (onuʃʃār) may follow a Candrabindu in Bangla.
A Vowel can also optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phalā. “Ya-phalā is a presentation form of U+09AF BENGALI LETTER YA (য, ‘ya’), represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Hasanta), U+09AF BENGALI LETTER YA>.” Ya-phalā has a special form, ্য. When it is further combined with U+09BE BENGALI VOWEL SIGN AA (‘aa’, ā), it is used for transcribing [æ], as in the “a” of the English word “bat”, written in Bangla as ব্যাট. A Vowel sequence admits the following combinations:
5.4.2.1. A Single Vowel
Example: V: অ (अ)
5.4.2.2. A Vowel with Conditions
A Vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].
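Sections 5.4.2.1 and 5.4.2.2 together describe a small regular language over the category symbols used in Section 7. A minimal Python sketch, assuming the label has already been mapped to those symbols (that mapping is not shown here), could check it with a regular expression:

```python
import re

# A vowel sequence: V, optionally followed by exactly one of
# B, D, X, DB, DX or the Hasanta-Consonant-Kāra group HCM.
VOWEL_SEQUENCE = re.compile(r"V(?:DB|DX|HCM|B|D|X)?")


def is_vowel_sequence(symbols: str) -> bool:
    return VOWEL_SEQUENCE.fullmatch(symbols) is not None


for s in ["V", "VB", "VDB", "VDX", "VHCM", "VDD", "VBB", "VXX"]:
    print(s, is_vowel_sequence(s))
```

In line with 5.4.2, the doubled forms VDD, VBB and VXX are rejected while the listed combinations are accepted.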
5.4.3.2. A Consonant with Conditions
A Consonant is optionally followed by a dependent vowel sign, kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
5.4.3.5. Subsets
In considering the subsets, we take the combination CHC as a representative example; however, the same is equally applicable to CHCHC and CHCHCHC [A]. The combination may be followed by M, B, D, X, DB or DX.
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’.
Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
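The consonant-sequence shapes in 5.4.3.2 and 5.4.3.5, together with the Khanda Ta condition in the note above, can be sketched in Python as below. It covers only the shapes listed in these two subsections, not the full sequence grammar; the regular expression runs over category symbols rather than code points, and the function names are hypothetical.

```python
import re

# 5.4.3.2: C optionally followed by one of M, B, D, X, H, DB, DX.
SINGLE = r"C(?:DB|DX|M|B|D|X|H)?"
# 5.4.3.5: CHC, CHCHC or CHCHCHC, optionally followed by M, B, D, X, DB or DX.
CLUSTER = r"C(?:HC){1,3}(?:DB|DX|M|B|D|X)?"
CONSONANT_SEQUENCE = re.compile(f"{SINGLE}|{CLUSTER}")

KHANDA_TA, HASANTA = "\u09CE", "\u09CD"
RA_FORMS = {"\u09B0", "\u09F0"}  # র (Bangla) and ৰ (Assamese)


def is_consonant_sequence(symbols: str) -> bool:
    return CONSONANT_SEQUENCE.fullmatch(symbols) is not None


def khanda_ta_context_ok(label: str) -> bool:
    """Where ৎ follows a Hasanta, the consonant before the Hasanta must be র or ৰ."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i > 0 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] not in RA_FORMS:
                return False
    return True


bhartsana = "\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"  # ভর্ৎসনা
print(is_consonant_sequence("CHCHCM"), khanda_ta_context_ok(bhartsana))
```

A word-final ৎ with no preceding Hasanta (as in হঠাৎ) is not constrained by this check.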
5.4.5. Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) are described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ - BENGALI LETTER A and U+098F এ - BENGALI LETTER E). To render Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English ‘bat’, ‘fat’, etc. By nature, the Brāhmī script does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta mark. But as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
10 Refer to Rule P in Section 7, Table 16.
Another case concerns the formation of repha and ra-phalā in the said script, mentioned in the table above as P. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta +
6. Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusable variants into two groups:
Group 1: confusing due to pure visual similarity.
Group 2: confusing due to deviation from the character formations normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. Confusable but non-identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with such cases. However, cases belonging to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve deviations from the widely accepted norms of Bangla akshara formation. They can confuse even a careful observer and hence are proposed as variants.
Variants arise in a script when two or more forms that look the same are stored with different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or obtain the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two renderings of ko (কো), one typed with a single key and the other with two different keys. However, this will not be considered a case of variants, because a kāra followed by a kāra is not allowed.
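Assuming labels are compared in NFC, the normalization form IDNA2008 requires for U-labels, the two ways of typing কো described above differ in storage but become identical after normalization, because U+09CB BENGALI VOWEL SIGN O canonically decomposes to U+09C7 + U+09BE. This observation about Unicode normalization is ours, not part of the NBGP's reasoning; the Python sketch below uses only the standard `unicodedata` module.

```python
import unicodedata

KA = "\u0995"
E_KAR, AA_KAR, O_KAR = "\u09C7", "\u09BE", "\u09CB"

single_key = KA + O_KAR          # কো typed with VOWEL SIGN O
two_keys = KA + E_KAR + AA_KAR   # কো typed as VOWEL SIGN E + VOWEL SIGN AA

print(single_key == two_keys)                                # False: different code points
print(unicodedata.normalize("NFC", two_keys) == single_key)  # True: NFC composes them
```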
6.1. In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I: As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) is joined by Hasanta to স (sa, U+09B8) or ন (na, U+09A8) in a conjunct:
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
The above combinations, if written in traditional orthography, could be a little confusing, as the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in the initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly, in the belief that it is a হ (ha, U+09B9), increasing the risks in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
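In an LGR, a variant pair means that a label containing one member also yields labels with the other member substituted. A minimal Python sketch of that expansion for the two pairs in 6.1 follows; `VARIANT_PAIRS` and `variant_labels` are hypothetical names, and the sketch ignores variant types and dispositions, which the LGR XML would specify.

```python
from itertools import product

# The two in-script variant pairs of Section 6.1 (tha vs ha after Hasanta).
VARIANT_PAIRS = [
    ("\u09B8\u09CD\u09A5", "\u09B8\u09CD\u09B9"),  # স্থ ~ স্হ
    ("\u09A8\u09CD\u09A5", "\u09A8\u09CD\u09B9"),  # ন্থ ~ ন্হ
]
SEQ_LEN = 3  # every sequence above is three code points long

# Map each member of a pair to the full pair, so either member triggers a swap.
LOOKUP = {member: pair for pair in VARIANT_PAIRS for member in pair}


def variant_labels(label: str) -> set:
    """Return the label together with every label produced by swapping pair members."""
    options, i = [], 0
    while i < len(label):
        chunk = label[i:i + SEQ_LEN]
        if chunk in LOOKUP:
            options.append(LOOKUP[chunk])
            i += SEQ_LEN
        else:
            options.append((label[i],))
            i += 1
    return {"".join(choice) for choice in product(*options)}


sthana = "\u09B8\u09CD\u09A5\u09BE\u09A8"  # স্থান
print(sorted(variant_labels(sthana)))
```

A label with n occurrences of the pairs yields 2^n labels, so স্থান produces itself and স্হান.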
6.2. Cross-Script Variants
A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusions they may cause as domain labels on the web. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Although certain other characters are somewhat similar, they have not been included here; they have been provided in the Appendix (10.2) for reference.
7. Whole Label Evaluation Rules (WLE)
This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can easily be translated into the LGR specifications.
11 Unicode uses Oriya for the script, although Odia is now the official term.
12 As used by Unicode, denoting and including both Assamese and Maṇipuri.
Below are the symbols used in the WLE rules for each Indic Syllabic Category, as listed in the table provided under Code point repertoire (Section 5.1).
C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9) or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ - BENGALI LETTER A) or 098F (এ - BENGALI LETTER E), H is 09CD (্ - BENGALI SIGN VIRAMA), C1 is 09AF (য - BENGALI LETTER YA) and M1 is 09BE (া - BENGALI VOWEL SIGN AA). S1 and S2 are valid even though they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র - BENGALI LETTER RA) or 09F0 (ৰ - BENGALI LETTER RA WITH MIDDLE DIAGONAL, the Assamese ra).
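To apply the WLE rules, each code point of a label has to be mapped to one of these symbols. The Python sketch below does that under stated assumptions: the code point ranges are taken from the Unicode Bengali block and are our approximation, not a copy of the Section 5.1 repertoire table; S and P are recognized greedily before single code points; and a NUKTA after ড, ঢ or য is folded into the preceding consonant. Code points outside the assumed table raise `KeyError`.

```python
# Hypothetical symbol table approximating the repertoire with Bengali-block ranges.
SYMBOL_RANGES = [
    ("V", [(0x0985, 0x098B), (0x098F, 0x0990), (0x0993, 0x0994)]),
    ("C", [(0x0995, 0x09A8), (0x09AA, 0x09B0), (0x09B2, 0x09B2),
           (0x09B6, 0x09B9), (0x09DC, 0x09DD), (0x09DF, 0x09DF),
           (0x09F0, 0x09F1)]),
    ("M", [(0x09BE, 0x09C4), (0x09C7, 0x09C8), (0x09CB, 0x09CC)]),
    ("D", [(0x0981, 0x0981)]),  # Candrabindu
    ("B", [(0x0982, 0x0982)]),  # Anusvāra
    ("X", [(0x0983, 0x0983)]),  # Visarga
    ("H", [(0x09CD, 0x09CD)]),  # Hasanta (Virama)
    ("Z", [(0x09CE, 0x09CE)]),  # Khanda Ta
]
SYMBOL = {chr(cp): sym
          for sym, spans in SYMBOL_RANGES
          for lo, hi in spans
          for cp in range(lo, hi + 1)}

NUKTA = "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}   # ড, ঢ, য
S_SEQUENCES = {"\u0985\u09CD\u09AF\u09BE",    # অ্যা (V1 H C1 M1)
               "\u098F\u09CD\u09AF\u09BE"}    # এ্যা
P_SEQUENCES = {"\u09B0\u09CD", "\u09F0\u09CD"}  # র্ and ৰ্ (C2 H)


def to_symbols(label: str) -> str:
    """Map a label to its WLE symbol string, e.g. বাংলা -> 'CMBCM'."""
    out, i = [], 0
    while i < len(label):
        if label[i:i + 4] in S_SEQUENCES:
            out.append("S")
            i += 4
        elif label[i:i + 2] in P_SEQUENCES:
            out.append("P")
            i += 2
        elif label[i] == NUKTA and i > 0 and label[i - 1] in NUKTA_BASES:
            i += 1  # NUKTA folded into the preceding consonant
        else:
            out.append(SYMBOL[label[i]])
            i += 1
    return "".join(out)
```

With this mapping, ভর্ৎসনা becomes `CPZCCM`, which rule checks can then match against.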
Now let us elaborate on each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in admitting such ligatures: any user can easily create them, even if they are not attested combinations.
7.1.1. Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning ‘Bank of India’. Here two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for a full representation of the intended sound. By and large, however, the form of the first word without an H (U+09CD) is considered enough to represent the intended sound. This is a unique situation, necessitated by the absence of the hyphen, space and Zero Width Non-Joiner characters from the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to follow an H. Permitting it may create a perceptual similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In the future, depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.
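The prohibition can be sketched as a scan for a Hasanta immediately followed by an independent vowel. In the Python sketch below, the vowel set (U+0985–U+098B, U+098F, U+0990, U+0993, U+0994) is our assumption drawn from the Unicode Bengali block, and the function name is hypothetical; the test label is built from the code points listed above.

```python
HASANTA = "\u09CD"
# Independent vowels (assumed set): অ আ ই ঈ উ ঊ ঋ এ ঐ ও ঔ
INDEPENDENT_VOWELS = {chr(cp) for cp in
                      (*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994)}


def has_v_after_h(label: str) -> bool:
    """True if some Hasanta is immediately followed by an independent vowel."""
    return any(a == HASANTA and b in INDEPENDENT_VOWELS
               for a, b in zip(label, label[1:]))


CODE_POINTS = ("09AC 09CD 09AF 09BE 0999 09CD 0995 0985 09AB 09CD "
               "0987 09A8 09CD 09A1 09BF 09DF 09BE")
with_h = "".join(chr(int(cp, 16)) for cp in CODE_POINTS.split())
without_h = with_h.replace("\u09AB\u09CD", "\u09AB")  # অফ্ -> অফ

print(has_v_after_h(with_h), has_v_after_h(without_h))  # True False
```

The S sequences (অ্যা, এ্যা) place H after a V rather than a V after an H, so this check does not affect them.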
7.2. Additional Examples from Bangla ABNF
Below are a few examples which help one understand some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.
ঃক ःकAscanbeseensuchcombinationwillresultautomaticallyinaldquogolurdquooradottedcircle marking it as an invalid formation This is an intrinsic property of the
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti
13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section
58
Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)
Table17ndashTheScriptTableofSylhetiNagarıorSiloti
103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
52 CodePointRepertoireExclusionTherearesomecharactersoftheBanglascriptthatfindplaceintheUnicodebuthavenot been included in the repertoire in the LGR proposal The reason for excludingঌ(U+098C)andৗ(U+09D7)isthattheyarerareandobsoletecharacters
53 CodepointnotusedaloneBENGALI SIGN NUKTA U+09BC (See 336) is excluded from repertoire since it will never be used alone It will be used as sequence in three special characters in normalized form for ড় ঢ় য়
542 TheVowelSequenceInwhatfollowstheVowelSequenceandtheConsonantSequencepertinenttoBanglaare given To facilitate understanding of other Brahmi script users equivalents inDevanāgarīareprovidedwherevernecessaryAvowelsequenceismadeupofasinglevowelItmaybefollowedbutnotnecessarily(optionally)byanAnusvāraonuʃʃār(B)Candrabindu(D)oraVisargabiʃɔrgo(X)ThenumberofDBorXwhichcanfollowaV inBanglamaynotberestrictedtooneGoingbytherules illustratedinthedocument it isclearthat formationssuchasVDDVBBandVXXare invalidorthographicunitsHowever it isvalidandpossible tohaveformationsorsequencessuchasanusvarafollowedbyachandrabinduononehandandvisarga followed by a chandrabindu on the other as in হ8াংচা lsquohaelignchārsquo and lsquohaelignrsquo হ8াঃrespectivelyThepossibilityof aVisarga orAnusvāra (onuʃʃār) followingaCandrabinduexists inBangla Vowel can optionally be followed by a combination of Hasanta Virama [H]Consonant [C] to formaYa-phala ldquoYa-phala isapresentation formofU+09AFBanglaletterযorlsquoyarsquoRepresentedbythesequenceltU+09CDieBENGALISIGNVIRAMABangla SIGNHasanta or VIRAMA U+09AF -য BENGALI LETTER YAgt Ya-phala has aspecialformয়AgainwhencombinedwithU+09BEাBENGALIVOWELSIGNAA(ielsquoaarsquo(ā))itisusedfortranscribing[aelig]asintheldquoardquointheEnglishwordldquobatrdquowritteninBanglaasব8াটAVowel-sequenceadmitsthefollowingcombinations
5421 ASingleVowel
ExamplesV অअ
5422 AVowelwithConditionsAVowelcanoptionallybefollowedbyAnusvāra[B]orCandrabindu[D]orVisarga[X]or Candrabindu+ Anusvāra [DB] or Candrabindu+ Visarga [DX] or combination ofHasanta(orVirama)[H]followedbyConsonant[C]followedbykāra(Mātrā)[M]
5432 AConsonantwithConditionsAConsonantoptionallyfollowedbydependentvowelsignkāra(Mātrā)[M]orAnusvāra [B] or Candrabindu [D] or Visarga [X] or Hasanta (also known asVirama)[H]orCandrabindu+Anusvāra[DB]orCandrabindu+ Visarga[DX]
5435 Subsets While considering its subsets as a representative example we will consider thecombinationCHConlyhoweverthesameisequallyapplicabletoCHCHCandCHCHCHC[A]ThecombinationmaybefollowedbyMBDXDBorDX
Example র++ৎ =ৎHasinভৎHসনা (bhartsanā) scoldingNoteTheconditionsinthiscontextofKHANDATAarethattheCshouldbeeitherRAU+09B0(র)(usedinBangla)orRAU+09F0(ৰ)(usedinAssamese)
545 SpecialCasesSandPTwospecialcasesinvolvingSequences(referredtoasSandPinTable16underSection7)couldbedescribedbrieflyhereLetustakeupSinthefirstinstanceItisnoteworthythat thereare two instances inBanglawhereHasanta(U+09CD) isprecededbya full
10 Refer to Rule P in Section 7 Table 16
42
vowel (U+0985 অ - BANGLA LETTER A and U+098F এ - BANGLA LETTER E) ForrenderingYa-phalāfollowedbyঅandএitisnecessarytotypeU+09CDplusU+09AFyapreceded by the said vowels This is a purely ligatural entity and the addition ofYa-phalāandā-kāraisusedtoelicittheaeligsoundasinEnglishlsquobatrsquolsquofatrsquoetcTheBrahmıscriptbynaturedoesnothaveHasantaafteravowelHasantaisgenerallydescribedaslsquovowel killerrsquo although it actually indicates absence of a vowel after the markedconsonant Only the consonants can have the Hasanta marked But as we see hereBanglaendsupwithadeviantfeatureintheorthographyhereinwhichHasantacomesimmediatelyafteravowelinligaturesঅGাandএGা(CfUnicode100p473[100])Another case refers to the formation of repha and ra-phalā in the said script andmentioned in the tableaboveasPOwing toco-occurrencewithHASANTARAeitherlosesitsownimplicitvowel(REPHA)orsuppressestheimplicitvoweloftheprecedingconsonant(RA-PHALA )Forinstancerepha=ra+Hasanta+C(egকHiera+Hasanta+
6 VariantsThissection talksabout thevariants in theBanglascriptTheNBGPcategorizes theseconfusinglyvariantsintwogroupsGroup1ConfusingduetopurevisualsimilarityGroup2ConfusingduetodeviationfromnormallyperceivedcharacterformationsbylargerlinguisticcommunityForGroup1anyidenticalcodepointsaredefinedasvariantsTheconfusablebutnotidenticalcasesarenotproposedasthereisanotherpanel(Stringsimilarityassessmentpanel)entrustedtodealwithsuchcasesHowevercaseswhichbelongtoGroup2areproposedtobeconsideredasvariantsThesecasesarenotofmerevisualsimilarityasthey involve some deviations from the widely accepted norms of Bangla Aksharformations These can cause confusion even to a careful observer and hence beingproposedasvariantsThe variants are generated in a script when two or more forms are formed withdifferent storage or code points In Bangla the e-kāra ā-kāra and the o-kāra havedifferentcodepointsOnecantypeowithaconsonantatonegoandthesamebytypinge-kāra and ā-kāra as two separate keys getting the same results A reader cannotdifferentiatebetween the twoko (েকা)one typedwitha singlekeyand theotheronetypedwithtwodifferentkeysMoreoverthiswillnotbeconsideredasacaseofvariantbecauseakārafollowedbyakāraisnotallowed
61 InScriptVariantsHoweverweproposetwocasesoftruein-scriptvariantsinBanglascriptCASEIAs far as true variants in Bangla are concernedwemay drawour attention to caseswhereinHasantawith(U+09A5)থ(tha)appearsasconjunctwith(U+09B8)স(sa)and(U+09A8)ন(na)1 স+Hasanta+থ(U+09B8+U+09CD+U+09A5)versus
স +Hasanta+হ(U+09B8+U+09CD+U+09B9)
2 ন+Hasanta+থ(U+09A8+U+09CD+U+09A5)versus
ন+Hasanta+হ(U+09A8+U+09CD+U+09B9)
44
Theabovecombinationsifwrittenintraditionalorthographycouldbelittleconfusingwheretheথ(tha)inconjunctappearslikeaহ(ha)Theconjunctcouldbeintheinitialmedialorfinalpositions(asshownbelowinegno1)Itcouldbetypedwrongaswellthinking itwas aহ (ha)U+09B9 increasing the chances of risks in labelwriting andidentificationExamples1 acuteandসহ(asinacuteানsthānaacuteলsthulaাacuteGsvāsthyaঅacuteায়ীasthāyı)
62 CrossScriptVariantsAcrispcrossscriptstudyforBanglahasbeendonewithrespecttosisterscriptssuchasDevanāgarī Gurmukhı and Odia11(formerly Oriya) keeping in mind the visual andtechnicalconfusionstheymaycauseaslabelsonthewebdomainMoreoverthereisnoin-script variant in Bangla as far as the orthography is concerned The followingcharacters are being proposed by the NBPG as variants Although there are certaincharacters which are somewhat similar they but have not been included here TheyhavebeenprovidedintheAppendix(102)forreference
7 WholeLabelEvaluationRules(WLE) ThissectionprovidestheWLEsthatarerequiredbyallthelanguagesmentionedinsection32whenwritten inBangla12ScriptThe ruleshavebeendrafted in suchawaythattheycanbeeasilytranslatedintotheLGRspecifications
11 Unicode uses Oriya for the script although Odia is now the official term used 12 As used by the Unicode denoting and including both Assamese and Maṇipuri
47
Below are the symbols used in the WLE rules for each of the Indic SyllabicCategoryasmentionedinthetableprovidedinCodepointrepertoire(Section51)
C rarr Consonant
M rarr Kāra(Mātrā)
V rarr Vowel
B rarr Anusvāra
D rarr Candrabindu
X rarr Visarga
H rarr Hasanta
Z rarr KhandaTa
S rarr S1S2(fromTable9)or(ae)Ya-phalā(V1HC1M1)whereV1iseither0985(অ-BENGALILETTERA)or098F(এ-BENGALILETTERE)His09CD(-BENGALISIGNVIRAMA)C1is-09AF(য-BENGALILETTERYA)M1is-09BE(া-BENGALIVOWELSIGNAA)S1 and S2 are valid even they are not allowed by the other context rules
P rarr Ra-Hasanta(C2H)whereC2iseither09B0(র-BENGALILETTERRA)or 09F0 (ৰ - ASSAMESE LETTER RA
Now letuselaborateeach rulewithexamples from the scriptkeeping inmindthe Bangla Assamese and Manipuri communities Some combinations ofcharactersmayseemunrealisticorrareinusagebutthereisnoharminaddingsuch ligaturesbecause it ispossible tocreate thembyanyusereasilybutmaynotbeattestedcombinations
49
711 Case of V Preceded by H There could be cases involvingmulti-word domains where Vmay need to beallowedtofollowanHegব8াtঅuইিvয়া baeligŋkʌv ɪndiə (U+09ACU+09CDU+09AFU+09BEU+0999U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1U+09BFU+09DFU+09BE)(meaningBankofIndia)ThisisthecasewheretwodifferentwordsarejoinedtogetherfirstofwhichendswithanH(অu)andthesecondwordbeginswithaV(ইিvয়া)Somesectionsofthe linguistic community require the explicit presence of H for fullrepresentation of the sound intended However by and large the form of thefirstwordwithoutanH(U+09CD)isconsideredenoughforfullrepresentationofthesoundintendedforthefirstwordThis isauniquesituationnecessitatedbythelackofhyphenspaceortheZeroWidthNon-joinercharacterinthepermissiblesetofcharactersintheRootzonerepertoire Otherwise V is never required to be allowed to follow an HPermittingthismaycreateaperceptivesimilaritybetweentwolabels(withandwithout H) for majority of the linguistic communities hence this is explicitlyprohibitedbytheNBGPIn future if required depending on the prevailing requirements from thecommunitythefutureNBGPmayconsiderrevisitingthisrule
72 AdditionalExamplesfromBanglaABNF
Belowarea fewexampleswhichhelponeunderstandsomeof the rulesABNFputsinplaceThesearejustgivenforreferencepurposesandarenotmeanttobecomprehensive
ঃক ःकAscanbeseensuchcombinationwillresultautomaticallyinaldquogolurdquooradottedcircle marking it as an invalid formation This is an intrinsic property of the
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti
13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section
58
Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)
Table17ndashTheScriptTableofSylhetiNagarıorSiloti
103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
53 CodepointnotusedaloneBENGALI SIGN NUKTA U+09BC (See 336) is excluded from repertoire since it will never be used alone It will be used as sequence in three special characters in normalized form for ড় ঢ় য়
542 TheVowelSequenceInwhatfollowstheVowelSequenceandtheConsonantSequencepertinenttoBanglaare given To facilitate understanding of other Brahmi script users equivalents inDevanāgarīareprovidedwherevernecessaryAvowelsequenceismadeupofasinglevowelItmaybefollowedbutnotnecessarily(optionally)byanAnusvāraonuʃʃār(B)Candrabindu(D)oraVisargabiʃɔrgo(X)ThenumberofDBorXwhichcanfollowaV inBanglamaynotberestrictedtooneGoingbytherules illustratedinthedocument it isclearthat formationssuchasVDDVBBandVXXare invalidorthographicunitsHowever it isvalidandpossible tohaveformationsorsequencessuchasanusvarafollowedbyachandrabinduononehandandvisarga followed by a chandrabindu on the other as in হ8াংচা lsquohaelignchārsquo and lsquohaelignrsquo হ8াঃrespectivelyThepossibilityof aVisarga orAnusvāra (onuʃʃār) followingaCandrabinduexists inBangla Vowel can optionally be followed by a combination of Hasanta Virama [H]Consonant [C] to formaYa-phala ldquoYa-phala isapresentation formofU+09AFBanglaletterযorlsquoyarsquoRepresentedbythesequenceltU+09CDieBENGALISIGNVIRAMABangla SIGNHasanta or VIRAMA U+09AF -য BENGALI LETTER YAgt Ya-phala has aspecialformয়AgainwhencombinedwithU+09BEাBENGALIVOWELSIGNAA(ielsquoaarsquo(ā))itisusedfortranscribing[aelig]asintheldquoardquointheEnglishwordldquobatrdquowritteninBanglaasব8াটAVowel-sequenceadmitsthefollowingcombinations
5421 ASingleVowel
ExamplesV অअ
5422 AVowelwithConditionsAVowelcanoptionallybefollowedbyAnusvāra[B]orCandrabindu[D]orVisarga[X]or Candrabindu+ Anusvāra [DB] or Candrabindu+ Visarga [DX] or combination ofHasanta(orVirama)[H]followedbyConsonant[C]followedbykāra(Mātrā)[M]
5432 AConsonantwithConditionsAConsonantoptionallyfollowedbydependentvowelsignkāra(Mātrā)[M]orAnusvāra [B] or Candrabindu [D] or Visarga [X] or Hasanta (also known asVirama)[H]orCandrabindu+Anusvāra[DB]orCandrabindu+ Visarga[DX]
5435 Subsets While considering its subsets as a representative example we will consider thecombinationCHConlyhoweverthesameisequallyapplicabletoCHCHCandCHCHCHC[A]ThecombinationmaybefollowedbyMBDXDBorDX
Example র++ৎ =ৎHasinভৎHসনা (bhartsanā) scoldingNoteTheconditionsinthiscontextofKHANDATAarethattheCshouldbeeitherRAU+09B0(র)(usedinBangla)orRAU+09F0(ৰ)(usedinAssamese)
545 SpecialCasesSandPTwospecialcasesinvolvingSequences(referredtoasSandPinTable16underSection7)couldbedescribedbrieflyhereLetustakeupSinthefirstinstanceItisnoteworthythat thereare two instances inBanglawhereHasanta(U+09CD) isprecededbya full
10 Refer to Rule P in Section 7 Table 16
42
vowel (U+0985 অ - BANGLA LETTER A and U+098F এ - BANGLA LETTER E) ForrenderingYa-phalāfollowedbyঅandএitisnecessarytotypeU+09CDplusU+09AFyapreceded by the said vowels This is a purely ligatural entity and the addition ofYa-phalāandā-kāraisusedtoelicittheaeligsoundasinEnglishlsquobatrsquolsquofatrsquoetcTheBrahmıscriptbynaturedoesnothaveHasantaafteravowelHasantaisgenerallydescribedaslsquovowel killerrsquo although it actually indicates absence of a vowel after the markedconsonant Only the consonants can have the Hasanta marked But as we see hereBanglaendsupwithadeviantfeatureintheorthographyhereinwhichHasantacomesimmediatelyafteravowelinligaturesঅGাandএGা(CfUnicode100p473[100])Another case refers to the formation of repha and ra-phalā in the said script andmentioned in the tableaboveasPOwing toco-occurrencewithHASANTARAeitherlosesitsownimplicitvowel(REPHA)orsuppressestheimplicitvoweloftheprecedingconsonant(RA-PHALA )Forinstancerepha=ra+Hasanta+C(egকHiera+Hasanta+
6 VariantsThissection talksabout thevariants in theBanglascriptTheNBGPcategorizes theseconfusinglyvariantsintwogroupsGroup1ConfusingduetopurevisualsimilarityGroup2ConfusingduetodeviationfromnormallyperceivedcharacterformationsbylargerlinguisticcommunityForGroup1anyidenticalcodepointsaredefinedasvariantsTheconfusablebutnotidenticalcasesarenotproposedasthereisanotherpanel(Stringsimilarityassessmentpanel)entrustedtodealwithsuchcasesHowevercaseswhichbelongtoGroup2areproposedtobeconsideredasvariantsThesecasesarenotofmerevisualsimilarityasthey involve some deviations from the widely accepted norms of Bangla Aksharformations These can cause confusion even to a careful observer and hence beingproposedasvariantsThe variants are generated in a script when two or more forms are formed withdifferent storage or code points In Bangla the e-kāra ā-kāra and the o-kāra havedifferentcodepointsOnecantypeowithaconsonantatonegoandthesamebytypinge-kāra and ā-kāra as two separate keys getting the same results A reader cannotdifferentiatebetween the twoko (েকা)one typedwitha singlekeyand theotheronetypedwithtwodifferentkeysMoreoverthiswillnotbeconsideredasacaseofvariantbecauseakārafollowedbyakāraisnotallowed
6.1. In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I
As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) appears in a conjunct after স (sa, U+09B8) or ন (na, U+09A8) plus Hasanta:
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
The above combinations, if written in traditional orthography, could be a little confusing, as the থ (tha) in the conjunct looks like a হ (ha). The conjunct could be in initial, medial or final position (as shown below in Example 1). It could also be mistyped as হ (ha, U+09B9), increasing the risks in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthula, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
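The two CASE I pairs can be illustrated with a small Python sketch. It assumes a label is a plain string of Bengali code points and enumerates every label reachable by swapping থ (U+09A5) and হ (U+09B9) wherever they follow স or ন plus Hasanta. The function name `variant_labels` is hypothetical. In the actual LGR these pairs would presumably be declared as sequence variants in the XML file rather than computed in code.

```python
from itertools import product

HASANTA = "\u09CD"
THA, HA = "\u09A5", "\u09B9"  # থ, হ
LEFT = {"\u09B8", "\u09A8"}   # স, ন: first members of the CASE I conjuncts


def variant_labels(label: str) -> set[str]:
    """Hypothetical helper: every label obtained by swapping THA <-> HA where
    it follows SA or NA plus Hasanta. The original label is included."""
    slots = [
        i for i in range(2, len(label))
        if label[i] in (THA, HA) and label[i - 1] == HASANTA and label[i - 2] in LEFT
    ]
    results = set()
    # Each slot independently takes THA or HA: 2 ** len(slots) labels in total.
    for choice in product((THA, HA), repeat=len(slots)):
        chars = list(label)
        for i, ch in zip(slots, choice):
            chars[i] = ch
        results.add("".join(chars))
    return results


sthana = "\u09B8\u09CD\u09A5\u09BE\u09A8"  # স্থান
print(len(variant_labels(sthana)))  # 2: স্থান and স্হান
```

A label with no qualifying conjunct yields only itself. A label with two such conjuncts yields four labels, which shows how variant sets can grow combinatorially.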
6.2. Cross-Script Variants
A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusions they may cause as labels in the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Certain other characters are somewhat similar but have not been included here; they are provided in the Appendix (10.3) for reference.
7. Whole Label Evaluation Rules (WLE)
This section provides the WLE rules that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications.
11 Unicode uses ‘Oriya’ for the script, although ‘Odia’ is now the official term.
12 As used by Unicode, denoting and including both Assamese and Maṇipuri.
Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table in the Code point repertoire (Section 5.1):
C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9) or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ - BENGALI LETTER A) or 098F (এ - BENGALI LETTER E), H is 09CD (্ - BENGALI SIGN VIRAMA), C1 is 09AF (য - BENGALI LETTER YA) and M1 is 09BE (া - BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র - BENGALI LETTER RA) or 09F0 (ৰ - ASSAMESE LETTER RA)
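To illustrate how these symbols might be assigned mechanically, the Python sketch below does three things:
- maps Bengali-block code points to C, V, M, B, D, X, H and Z;
- collapses the two Ya-phalā-after-vowel sequences (অ্যা, এ্যা) into S;
- reports where a P sequence (র or ৰ followed by Hasanta) starts.

The code point ranges are an approximation of the Bengali block, not the repertoire of Section 5.1. The names `symbol`, `to_symbols` and `p_positions` are hypothetical.

```python
import unicodedata

# Hypothetical, simplified assignment of WLE symbols to Bengali-block code
# points; the authoritative table is the code point repertoire (Section 5.1).
SINGLE = {
    "\u0981": "D",  # Candrabindu
    "\u0982": "B",  # Anusvara
    "\u0983": "X",  # Visarga
    "\u09CD": "H",  # Hasanta (Virama)
    "\u09CE": "Z",  # Khanda Ta
}
RANGES = (
    ("V", ((0x0985, 0x0994), (0x09E0, 0x09E1))),
    ("C", ((0x0995, 0x09B9), (0x09DC, 0x09DD), (0x09DF, 0x09DF), (0x09F0, 0x09F1))),
    ("M", ((0x09BE, 0x09C4), (0x09C7, 0x09C8), (0x09CB, 0x09CC), (0x09E2, 0x09E3))),
)
# S: Ya-phala after a full vowel, V1 H C1 M1 (অ্যা or এ্যা).
S_SEQUENCES = ("\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE")
# P: Ra-Hasanta, C2 H with C2 = U+09B0 or U+09F0.
P_SEQUENCES = ("\u09B0\u09CD", "\u09F0\u09CD")


def symbol(ch: str) -> str:
    """Map a single code point to its WLE symbol letter."""
    if unicodedata.category(ch) == "Cn":
        raise ValueError(f"U+{ord(ch):04X} is unassigned")
    if ch in SINGLE:
        return SINGLE[ch]
    cp = ord(ch)
    for sym, spans in RANGES:
        if any(lo <= cp <= hi for lo, hi in spans):
            return sym
    raise ValueError(f"U+{cp:04X} is outside this sketch's repertoire")


def to_symbols(label: str) -> str:
    """Rewrite a label as WLE symbols, collapsing each S sequence to 'S'."""
    out, i = [], 0
    while i < len(label):
        if label.startswith(S_SEQUENCES, i):
            out.append("S")
            i += len(S_SEQUENCES[0])  # both S sequences are 4 code points long
        else:
            out.append(symbol(label[i]))
            i += 1
    return "".join(out)


def p_positions(label: str) -> list[int]:
    """Indices where a P (Ra-Hasanta) sequence starts."""
    return [i for i in range(len(label) - 1) if label[i:i + 2] in P_SEQUENCES]
```

With these definitions, ভর্ৎসনা maps to `CCHZCCM` and অ্যাক maps to `SC`. P is only located, not collapsed, because how P interacts with the other rules is not spelled out in this section.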
Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in adding such ligatures: any user can easily create them, even though they may not be attested combinations.
7.1.1. Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning ‘Bank of India’. Here two different words are joined together: the first ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered sufficient to represent its intended sound. This unique situation arises because no hyphen, space or Zero Width Non-Joiner is in the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to follow an H. Permitting this may create a perceptual similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.
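The prohibition can be expressed as a simple adjacency test. The Python sketch below assumes that the independent vowels listed in it stand in for V. It returns the positions where a vowel directly follows Hasanta. Applied to the ‘Bank of India’ sequence above, it flags ই after অফ্. The same string with that Hasanta removed passes this particular check. The name `v_after_h` is hypothetical.

```python
HASANTA = "\u09CD"
# Independent vowels standing in for V (a simplification of Section 5.1).
INDEPENDENT_VOWELS = set(
    "\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098C"
    "\u098F\u0990\u0993\u0994\u09E0\u09E1"
)


def v_after_h(label: str) -> list[int]:
    """Hypothetical helper: indices of vowels that directly follow Hasanta."""
    return [
        i for i in range(1, len(label))
        if label[i] in INDEPENDENT_VOWELS and label[i - 1] == HASANTA
    ]


# 'Bank of India' exactly as listed in 7.1.1.
bank_of_india = "".join(chr(cp) for cp in (
    0x09AC, 0x09CD, 0x09AF, 0x09BE, 0x0999, 0x09CD, 0x0995, 0x0985,
    0x09AB, 0x09CD, 0x0987, 0x09A8, 0x09CD, 0x09A1, 0x09BF, 0x09DF, 0x09BE,
))
without_h = bank_of_india[:9] + bank_of_india[10:]  # drop the Hasanta of অফ্

print(v_after_h(bank_of_india))  # [10]: ই directly after the Hasanta of অফ্
print(v_after_h(without_h))      # []
```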
7.2. Additional Examples from Bangla ABNF
Below are a few examples which help one understand some of the rules the ABNF puts in place. They are given purely for reference and are not meant to be comprehensive.
ঃক (Devanāgarī ःक): As can be seen, such a combination will automatically result in a “golu”, or dotted circle, marking it as an invalid formation. This is an intrinsic property of the
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink, and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What has happened So Far In terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0 – Core Specification, Chapter 12, p. 473.
10. Appendix-I
10.1. Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the Formalism's structures do not necessarily apply. To take care of such cases, restriction rules are set in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:
1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although we can cite a couple of native exceptions, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ written by Kazi Nazrul Islam. There are also instances of it in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).
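Rule 1 can be sketched as a check on the first code point of a label. The Python below is illustrative only, and `initial_ok` is a hypothetical name. It encodes just ৎ, ঞ and ঙ from Rule 1; য় is deliberately left allowed, given the exceptions discussed above. It also rejects a leading dependent sign, which IDNA2008 forbids anyway and which corresponds to the dotted-circle case ঃক in Section 7.2.

```python
import unicodedata

# Rule 1 of Appendix 10.1: these may not begin a label.
DISALLOWED_INITIAL = {
    "\u09CE",  # ৎ KHANDA TA
    "\u099E",  # ঞ NYA
    "\u0999",  # ঙ NGA
}


def initial_ok(label: str) -> bool:
    """Hypothetical helper: check the first code point of a label."""
    if not label:
        return False
    first = label[0]
    if first in DISALLOWED_INITIAL:
        return False
    # A dependent sign (General Category Mn/Mc), e.g. Visarga U+0983 in ঃক,
    # cannot start a label either.
    return not unicodedata.category(first).startswith("M")
```

Under this sketch ৎসেরিং (Tsering) is rejected, as Rule 1 states, even though the text notes it as an emerging possibility. য়াকুব passes.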
13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri have not been covered in this section.
10.2. ‘Sylheti Nagarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924), which was used by accountants (perhaps of the Kāyastha community) in eastern Uttar Pradesh and Bihar and was widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th–14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Others ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).
Table 17 – The Script Table of Sylheti Nagarī or Siloti
10.3. Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
5.4.2. The Vowel Sequence
In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To help users of other Brāhmī scripts, Devanāgarī equivalents are provided wherever necessary. A vowel sequence is made up of a single vowel, which may optionally be followed by an Anusvāra (onuʃʃār) [B], a Candrabindu [D] or a Visarga (biʃɔrgo) [X]. The number of D, B or X following a V in Bangla is not necessarily restricted to one, although, going by the rules illustrated in this document, formations such as VDD, VBB and VXX are invalid orthographic units. However, it is valid and possible to have sequences combining a Candrabindu with an Anusvāra on the one hand and with a Visarga on the other, as in হ্যাংচা ‘hænchā’ and হ্যাঃ ‘hæn’ respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu exists in Bangla.
A vowel can also optionally be followed by a combination of Hasanta (Virama) [H] and Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF, the Bangla letter য or ‘ya’, represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla Hasanta), U+09AF য BENGALI LETTER YA>. Ya-phalā has a special form য়. When combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. ‘aa’ (ā)), it is used for transcribing [æ], as in the “a” of the English word “bat”, written in Bangla as ব্যাট. A vowel sequence admits the following combinations:
5.4.2.1. A Single Vowel
Example: V: অ (Devanāgarī अ)
5.4.2.2. A Vowel with Conditions
A Vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].
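Sections 5.4.2.1 and 5.4.2.2 together amount to a small regular language over the WLE symbols. The Python sketch below assumes a label has already been converted into symbol letters (V, B, D, X, H, C, M). The pattern and the name `is_vowel_sequence` are our own illustration. The extra condition that the H C M case is valid only for অ/এ + য + া is not modelled.

```python
import re

# V, optionally followed by B, D, X, DB, DX, or H C M (the Ya-phala case).
VOWEL_SEQUENCE = re.compile(r"V(?:DB|DX|B|D|X|HCM)?")


def is_vowel_sequence(symbols: str) -> bool:
    """True if a string of WLE symbol letters is a valid vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(symbols) is not None


for s in ("V", "VB", "VDB", "VDX", "VHCM", "VDD", "VBB", "VXX"):
    print(s, is_vowel_sequence(s))
```

The last three candidates (VDD, VBB, VXX) print `False`, matching the invalid formations named in 5.4.2.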
5.4.3.2. A Consonant with Conditions
A Consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
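Read literally, 5.4.3.2 also reduces to a regular expression over WLE symbols. In this Python sketch, `is_consonant_with_conditions` is a hypothetical name, and the consonant may carry at most one of the listed followers. Longer chains, such as a kāra followed by an Anusvāra, go beyond what this paragraph states and are not modelled here.

```python
import re

# C, optionally followed by exactly one of M, B, D, X, H, DB or DX.
CONSONANT_WITH_CONDITIONS = re.compile(r"C(?:DB|DX|M|B|D|X|H)?")


def is_consonant_with_conditions(symbols: str) -> bool:
    """True if a string of WLE symbol letters matches 5.4.3.2 as read literally."""
    return CONSONANT_WITH_CONDITIONS.fullmatch(symbols) is not None
```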
5.4.3.5. Subsets
While considering its subsets, as a representative example we will consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC. [A] The combination may be followed by M, B, D, X, DB or DX.
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’.
Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
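The CHC subsets and the Khaṇḍata condition can be sketched in Python as follows. `CONJUNCT` is a regular expression over WLE symbols for CHC, CHCHC and CHCHCHC, with the optional endings listed in 5.4.3.5. `khanda_ta_ok` checks, on raw code points, that any ৎ following a Hasanta is preceded by র (U+09B0) or ৰ (U+09F0). Both names are hypothetical. The sketch does not model how Z fits into the C positions of the subset pattern.

```python
import re

HASANTA, KHANDA_TA = "\u09CD", "\u09CE"
RA_FORMS = {"\u09B0", "\u09F0"}  # র (Bangla), ৰ (Assamese)

# CHC, CHCHC or CHCHCHC, optionally followed by M, B, D, X, DB or DX.
CONJUNCT = re.compile(r"C(?:HC){1,3}(?:DB|DX|M|B|D|X)?")


def khanda_ta_ok(label: str) -> bool:
    """Every Khanda Ta that follows a Hasanta must have RA before that Hasanta."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] not in RA_FORMS:
                return False
    return True


bhartsana = "\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"  # ভর্ৎসনা
print(khanda_ta_ok(bhartsana))             # True
print(khanda_ta_ok("\u0995\u09CD\u09CE"))  # False: ক্ৎ
print(bool(CONJUNCT.fullmatch("CHCHCM")))  # True
```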
5.4.5. Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) are described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full
10 Refer to Rule P in Section 7, Table 16.
42
vowel (U+0985 অ - BANGLA LETTER A and U+098F এ - BANGLA LETTER E) ForrenderingYa-phalāfollowedbyঅandএitisnecessarytotypeU+09CDplusU+09AFyapreceded by the said vowels This is a purely ligatural entity and the addition ofYa-phalāandā-kāraisusedtoelicittheaeligsoundasinEnglishlsquobatrsquolsquofatrsquoetcTheBrahmıscriptbynaturedoesnothaveHasantaafteravowelHasantaisgenerallydescribedaslsquovowel killerrsquo although it actually indicates absence of a vowel after the markedconsonant Only the consonants can have the Hasanta marked But as we see hereBanglaendsupwithadeviantfeatureintheorthographyhereinwhichHasantacomesimmediatelyafteravowelinligaturesঅGাandএGা(CfUnicode100p473[100])Another case refers to the formation of repha and ra-phalā in the said script andmentioned in the tableaboveasPOwing toco-occurrencewithHASANTARAeitherlosesitsownimplicitvowel(REPHA)orsuppressestheimplicitvoweloftheprecedingconsonant(RA-PHALA )Forinstancerepha=ra+Hasanta+C(egকHiera+Hasanta+
6 VariantsThissection talksabout thevariants in theBanglascriptTheNBGPcategorizes theseconfusinglyvariantsintwogroupsGroup1ConfusingduetopurevisualsimilarityGroup2ConfusingduetodeviationfromnormallyperceivedcharacterformationsbylargerlinguisticcommunityForGroup1anyidenticalcodepointsaredefinedasvariantsTheconfusablebutnotidenticalcasesarenotproposedasthereisanotherpanel(Stringsimilarityassessmentpanel)entrustedtodealwithsuchcasesHowevercaseswhichbelongtoGroup2areproposedtobeconsideredasvariantsThesecasesarenotofmerevisualsimilarityasthey involve some deviations from the widely accepted norms of Bangla Aksharformations These can cause confusion even to a careful observer and hence beingproposedasvariantsThe variants are generated in a script when two or more forms are formed withdifferent storage or code points In Bangla the e-kāra ā-kāra and the o-kāra havedifferentcodepointsOnecantypeowithaconsonantatonegoandthesamebytypinge-kāra and ā-kāra as two separate keys getting the same results A reader cannotdifferentiatebetween the twoko (েকা)one typedwitha singlekeyand theotheronetypedwithtwodifferentkeysMoreoverthiswillnotbeconsideredasacaseofvariantbecauseakārafollowedbyakāraisnotallowed
61 InScriptVariantsHoweverweproposetwocasesoftruein-scriptvariantsinBanglascriptCASEIAs far as true variants in Bangla are concernedwemay drawour attention to caseswhereinHasantawith(U+09A5)থ(tha)appearsasconjunctwith(U+09B8)স(sa)and(U+09A8)ন(na)1 স+Hasanta+থ(U+09B8+U+09CD+U+09A5)versus
স +Hasanta+হ(U+09B8+U+09CD+U+09B9)
2 ন+Hasanta+থ(U+09A8+U+09CD+U+09A5)versus
ন+Hasanta+হ(U+09A8+U+09CD+U+09B9)
44
Theabovecombinationsifwrittenintraditionalorthographycouldbelittleconfusingwheretheথ(tha)inconjunctappearslikeaহ(ha)Theconjunctcouldbeintheinitialmedialorfinalpositions(asshownbelowinegno1)Itcouldbetypedwrongaswellthinking itwas aহ (ha)U+09B9 increasing the chances of risks in labelwriting andidentificationExamples1 acuteandসহ(asinacuteানsthānaacuteলsthulaাacuteGsvāsthyaঅacuteায়ীasthāyı)
62 CrossScriptVariantsAcrispcrossscriptstudyforBanglahasbeendonewithrespecttosisterscriptssuchasDevanāgarī Gurmukhı and Odia11(formerly Oriya) keeping in mind the visual andtechnicalconfusionstheymaycauseaslabelsonthewebdomainMoreoverthereisnoin-script variant in Bangla as far as the orthography is concerned The followingcharacters are being proposed by the NBPG as variants Although there are certaincharacters which are somewhat similar they but have not been included here TheyhavebeenprovidedintheAppendix(102)forreference
7 WholeLabelEvaluationRules(WLE) ThissectionprovidestheWLEsthatarerequiredbyallthelanguagesmentionedinsection32whenwritten inBangla12ScriptThe ruleshavebeendrafted in suchawaythattheycanbeeasilytranslatedintotheLGRspecifications
11 Unicode uses Oriya for the script although Odia is now the official term used 12 As used by the Unicode denoting and including both Assamese and Maṇipuri
47
Below are the symbols used in the WLE rules for each of the Indic SyllabicCategoryasmentionedinthetableprovidedinCodepointrepertoire(Section51)
C rarr Consonant
M rarr Kāra(Mātrā)
V rarr Vowel
B rarr Anusvāra
D rarr Candrabindu
X rarr Visarga
H rarr Hasanta
Z rarr KhandaTa
S rarr S1S2(fromTable9)or(ae)Ya-phalā(V1HC1M1)whereV1iseither0985(অ-BENGALILETTERA)or098F(এ-BENGALILETTERE)His09CD(-BENGALISIGNVIRAMA)C1is-09AF(য-BENGALILETTERYA)M1is-09BE(া-BENGALIVOWELSIGNAA)S1 and S2 are valid even they are not allowed by the other context rules
P rarr Ra-Hasanta(C2H)whereC2iseither09B0(র-BENGALILETTERRA)or 09F0 (ৰ - ASSAMESE LETTER RA
Now letuselaborateeach rulewithexamples from the scriptkeeping inmindthe Bangla Assamese and Manipuri communities Some combinations ofcharactersmayseemunrealisticorrareinusagebutthereisnoharminaddingsuch ligaturesbecause it ispossible tocreate thembyanyusereasilybutmaynotbeattestedcombinations
49
711 Case of V Preceded by H There could be cases involvingmulti-word domains where Vmay need to beallowedtofollowanHegব8াtঅuইিvয়া baeligŋkʌv ɪndiə (U+09ACU+09CDU+09AFU+09BEU+0999U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1U+09BFU+09DFU+09BE)(meaningBankofIndia)ThisisthecasewheretwodifferentwordsarejoinedtogetherfirstofwhichendswithanH(অu)andthesecondwordbeginswithaV(ইিvয়া)Somesectionsofthe linguistic community require the explicit presence of H for fullrepresentation of the sound intended However by and large the form of thefirstwordwithoutanH(U+09CD)isconsideredenoughforfullrepresentationofthesoundintendedforthefirstwordThis isauniquesituationnecessitatedbythelackofhyphenspaceortheZeroWidthNon-joinercharacterinthepermissiblesetofcharactersintheRootzonerepertoire Otherwise V is never required to be allowed to follow an HPermittingthismaycreateaperceptivesimilaritybetweentwolabels(withandwithout H) for majority of the linguistic communities hence this is explicitlyprohibitedbytheNBGPIn future if required depending on the prevailing requirements from thecommunitythefutureNBGPmayconsiderrevisitingthisrule
72 AdditionalExamplesfromBanglaABNF
Belowarea fewexampleswhichhelponeunderstandsomeof the rulesABNFputsinplaceThesearejustgivenforreferencepurposesandarenotmeanttobecomprehensive
ঃক ःकAscanbeseensuchcombinationwillresultautomaticallyinaldquogolurdquooradottedcircle marking it as an invalid formation This is an intrinsic property of the
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti
13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section
58
Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)
Table17ndashTheScriptTableofSylhetiNagarıorSiloti
103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
5432 AConsonantwithConditionsAConsonantoptionallyfollowedbydependentvowelsignkāra(Mātrā)[M]orAnusvāra [B] or Candrabindu [D] or Visarga [X] or Hasanta (also known asVirama)[H]orCandrabindu+Anusvāra[DB]orCandrabindu+ Visarga[DX]
5435 Subsets While considering its subsets as a representative example we will consider thecombinationCHConlyhoweverthesameisequallyapplicabletoCHCHCandCHCHCHC[A]ThecombinationmaybefollowedbyMBDXDBorDX
Example র++ৎ =ৎHasinভৎHসনা (bhartsanā) scoldingNoteTheconditionsinthiscontextofKHANDATAarethattheCshouldbeeitherRAU+09B0(র)(usedinBangla)orRAU+09F0(ৰ)(usedinAssamese)
545 SpecialCasesSandPTwospecialcasesinvolvingSequences(referredtoasSandPinTable16underSection7)couldbedescribedbrieflyhereLetustakeupSinthefirstinstanceItisnoteworthythat thereare two instances inBanglawhereHasanta(U+09CD) isprecededbya full
10 Refer to Rule P in Section 7 Table 16
42
vowel (U+0985 অ - BANGLA LETTER A and U+098F এ - BANGLA LETTER E) ForrenderingYa-phalāfollowedbyঅandএitisnecessarytotypeU+09CDplusU+09AFyapreceded by the said vowels This is a purely ligatural entity and the addition ofYa-phalāandā-kāraisusedtoelicittheaeligsoundasinEnglishlsquobatrsquolsquofatrsquoetcTheBrahmıscriptbynaturedoesnothaveHasantaafteravowelHasantaisgenerallydescribedaslsquovowel killerrsquo although it actually indicates absence of a vowel after the markedconsonant Only the consonants can have the Hasanta marked But as we see hereBanglaendsupwithadeviantfeatureintheorthographyhereinwhichHasantacomesimmediatelyafteravowelinligaturesঅGাandএGা(CfUnicode100p473[100])Another case refers to the formation of repha and ra-phalā in the said script andmentioned in the tableaboveasPOwing toco-occurrencewithHASANTARAeitherlosesitsownimplicitvowel(REPHA)orsuppressestheimplicitvoweloftheprecedingconsonant(RA-PHALA )Forinstancerepha=ra+Hasanta+C(egকHiera+Hasanta+
6 VariantsThissection talksabout thevariants in theBanglascriptTheNBGPcategorizes theseconfusinglyvariantsintwogroupsGroup1ConfusingduetopurevisualsimilarityGroup2ConfusingduetodeviationfromnormallyperceivedcharacterformationsbylargerlinguisticcommunityForGroup1anyidenticalcodepointsaredefinedasvariantsTheconfusablebutnotidenticalcasesarenotproposedasthereisanotherpanel(Stringsimilarityassessmentpanel)entrustedtodealwithsuchcasesHowevercaseswhichbelongtoGroup2areproposedtobeconsideredasvariantsThesecasesarenotofmerevisualsimilarityasthey involve some deviations from the widely accepted norms of Bangla Aksharformations These can cause confusion even to a careful observer and hence beingproposedasvariantsThe variants are generated in a script when two or more forms are formed withdifferent storage or code points In Bangla the e-kāra ā-kāra and the o-kāra havedifferentcodepointsOnecantypeowithaconsonantatonegoandthesamebytypinge-kāra and ā-kāra as two separate keys getting the same results A reader cannotdifferentiatebetween the twoko (েকা)one typedwitha singlekeyand theotheronetypedwithtwodifferentkeysMoreoverthiswillnotbeconsideredasacaseofvariantbecauseakārafollowedbyakāraisnotallowed
61 InScriptVariantsHoweverweproposetwocasesoftruein-scriptvariantsinBanglascriptCASEIAs far as true variants in Bangla are concernedwemay drawour attention to caseswhereinHasantawith(U+09A5)থ(tha)appearsasconjunctwith(U+09B8)স(sa)and(U+09A8)ন(na)1 স+Hasanta+থ(U+09B8+U+09CD+U+09A5)versus
স +Hasanta+হ(U+09B8+U+09CD+U+09B9)
2 ন+Hasanta+থ(U+09A8+U+09CD+U+09A5)versus
ন+Hasanta+হ(U+09A8+U+09CD+U+09B9)
44
Theabovecombinationsifwrittenintraditionalorthographycouldbelittleconfusingwheretheথ(tha)inconjunctappearslikeaহ(ha)Theconjunctcouldbeintheinitialmedialorfinalpositions(asshownbelowinegno1)Itcouldbetypedwrongaswellthinking itwas aহ (ha)U+09B9 increasing the chances of risks in labelwriting andidentificationExamples1 acuteandসহ(asinacuteানsthānaacuteলsthulaাacuteGsvāsthyaঅacuteায়ীasthāyı)
62 CrossScriptVariantsAcrispcrossscriptstudyforBanglahasbeendonewithrespecttosisterscriptssuchasDevanāgarī Gurmukhı and Odia11(formerly Oriya) keeping in mind the visual andtechnicalconfusionstheymaycauseaslabelsonthewebdomainMoreoverthereisnoin-script variant in Bangla as far as the orthography is concerned The followingcharacters are being proposed by the NBPG as variants Although there are certaincharacters which are somewhat similar they but have not been included here TheyhavebeenprovidedintheAppendix(102)forreference
7 WholeLabelEvaluationRules(WLE) ThissectionprovidestheWLEsthatarerequiredbyallthelanguagesmentionedinsection32whenwritten inBangla12ScriptThe ruleshavebeendrafted in suchawaythattheycanbeeasilytranslatedintotheLGRspecifications
11 Unicode uses Oriya for the script although Odia is now the official term used 12 As used by the Unicode denoting and including both Assamese and Maṇipuri
47
Below are the symbols used in the WLE rules for each of the Indic SyllabicCategoryasmentionedinthetableprovidedinCodepointrepertoire(Section51)
C rarr Consonant
M rarr Kāra(Mātrā)
V rarr Vowel
B rarr Anusvāra
D rarr Candrabindu
X rarr Visarga
H rarr Hasanta
Z rarr KhandaTa
S rarr S1S2(fromTable9)or(ae)Ya-phalā(V1HC1M1)whereV1iseither0985(অ-BENGALILETTERA)or098F(এ-BENGALILETTERE)His09CD(-BENGALISIGNVIRAMA)C1is-09AF(য-BENGALILETTERYA)M1is-09BE(া-BENGALIVOWELSIGNAA)S1 and S2 are valid even they are not allowed by the other context rules
P rarr Ra-Hasanta(C2H)whereC2iseither09B0(র-BENGALILETTERRA)or 09F0 (ৰ - ASSAMESE LETTER RA
Now letuselaborateeach rulewithexamples from the scriptkeeping inmindthe Bangla Assamese and Manipuri communities Some combinations ofcharactersmayseemunrealisticorrareinusagebutthereisnoharminaddingsuch ligaturesbecause it ispossible tocreate thembyanyusereasilybutmaynotbeattestedcombinations
49
711 Case of V Preceded by H There could be cases involvingmulti-word domains where Vmay need to beallowedtofollowanHegব8াtঅuইিvয়া baeligŋkʌv ɪndiə (U+09ACU+09CDU+09AFU+09BEU+0999U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1U+09BFU+09DFU+09BE)(meaningBankofIndia)ThisisthecasewheretwodifferentwordsarejoinedtogetherfirstofwhichendswithanH(অu)andthesecondwordbeginswithaV(ইিvয়া)Somesectionsofthe linguistic community require the explicit presence of H for fullrepresentation of the sound intended However by and large the form of thefirstwordwithoutanH(U+09CD)isconsideredenoughforfullrepresentationofthesoundintendedforthefirstwordThis isauniquesituationnecessitatedbythelackofhyphenspaceortheZeroWidthNon-joinercharacterinthepermissiblesetofcharactersintheRootzonerepertoire Otherwise V is never required to be allowed to follow an HPermittingthismaycreateaperceptivesimilaritybetweentwolabels(withandwithout H) for majority of the linguistic communities hence this is explicitlyprohibitedbytheNBGPIn future if required depending on the prevailing requirements from thecommunitythefutureNBGPmayconsiderrevisitingthisrule
72 AdditionalExamplesfromBanglaABNF
Belowarea fewexampleswhichhelponeunderstandsomeof the rulesABNFputsinplaceThesearejustgivenforreferencepurposesandarenotmeanttobecomprehensive
ঃক ःकAscanbeseensuchcombinationwillresultautomaticallyinaldquogolurdquooradottedcircle marking it as an invalid formation This is an intrinsic property of the
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti
13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section
58
Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)
Table17ndashTheScriptTableofSylhetiNagarıorSiloti
103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
5435 Subsets While considering its subsets as a representative example we will consider thecombinationCHConlyhoweverthesameisequallyapplicabletoCHCHCandCHCHCHC[A]ThecombinationmaybefollowedbyMBDXDBorDX
Example র++ৎ =ৎHasinভৎHসনা (bhartsanā) scoldingNoteTheconditionsinthiscontextofKHANDATAarethattheCshouldbeeitherRAU+09B0(র)(usedinBangla)orRAU+09F0(ৰ)(usedinAssamese)
545 SpecialCasesSandPTwospecialcasesinvolvingSequences(referredtoasSandPinTable16underSection7)couldbedescribedbrieflyhereLetustakeupSinthefirstinstanceItisnoteworthythat thereare two instances inBanglawhereHasanta(U+09CD) isprecededbya full
10 Refer to Rule P in Section 7 Table 16
42
vowel (U+0985 অ - BANGLA LETTER A and U+098F এ - BANGLA LETTER E) ForrenderingYa-phalāfollowedbyঅandএitisnecessarytotypeU+09CDplusU+09AFyapreceded by the said vowels This is a purely ligatural entity and the addition ofYa-phalāandā-kāraisusedtoelicittheaeligsoundasinEnglishlsquobatrsquolsquofatrsquoetcTheBrahmıscriptbynaturedoesnothaveHasantaafteravowelHasantaisgenerallydescribedaslsquovowel killerrsquo although it actually indicates absence of a vowel after the markedconsonant Only the consonants can have the Hasanta marked But as we see hereBanglaendsupwithadeviantfeatureintheorthographyhereinwhichHasantacomesimmediatelyafteravowelinligaturesঅGাandএGা(CfUnicode100p473[100])Another case refers to the formation of repha and ra-phalā in the said script andmentioned in the tableaboveasPOwing toco-occurrencewithHASANTARAeitherlosesitsownimplicitvowel(REPHA)orsuppressestheimplicitvoweloftheprecedingconsonant(RA-PHALA )Forinstancerepha=ra+Hasanta+C(egকHiera+Hasanta+
6. Variants

This section discusses the variants in the Bangla script. The NBGP categorizes these confusable variants into two groups:

Group 1: confusing due to pure visual similarity.
Group 2: confusing due to deviation from character formations as normally perceived by the larger linguistic community.

For Group 1, any identical code points are defined as variants. Confusable but non-identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted to deal with such cases. However, cases belonging to Group 2 are proposed to be considered as variants. These are not cases of mere visual similarity, as they involve some deviation from the widely accepted norms of Bangla akshara formation. They can confuse even a careful observer, and hence they are proposed as variants.

Variants arise in a script when two or more identical-looking forms are stored with different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two forms of ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variant, because a kāra followed by another kāra is not allowed.
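The ko example can be checked with Python's standard `unicodedata` module. U+09CB BENGALI VOWEL SIGN O has the canonical decomposition U+09C7 U+09BE, so Unicode Normalization Form C (NFC) composes the two-key form back into the single code point. IDNA2008 (RFC 5891) requires labels to be in NFC, which is consistent with the statement that a kāra followed by another kāra does not occur in a label. This is an illustration only, not part of the LGR.

```python
import unicodedata

KA = "\u0995"      # ক BENGALI LETTER KA
E_KAR = "\u09C7"   # ে BENGALI VOWEL SIGN E
AA_KAR = "\u09BE"  # া BENGALI VOWEL SIGN AA
O_KAR = "\u09CB"   # ো BENGALI VOWEL SIGN O

single_key = KA + O_KAR          # ko typed with the o-kāra key
two_keys = KA + E_KAR + AA_KAR   # ko typed as e-kāra followed by ā-kāra


def nfc(s: str) -> str:
    """Return the NFC (canonically composed) form of s."""
    return unicodedata.normalize("NFC", s)


print(unicodedata.decomposition(O_KAR))  # 09C7 09BE
print(single_key == two_keys)            # False: different code point sequences
print(nfc(two_keys) == single_key)       # True: NFC composes them into U+09CB
```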
6.1. In-Script Variants

However, we propose two cases of true in-script variants in the Bangla script.

CASE I: As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) forms a conjunct, via Hasanta, with স (sa, U+09B8) or ন (na, U+09A8):

1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus
   স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus
   ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
If written in traditional orthography, the above combinations could be a little confusing, since the থ (tha) in the conjunct looks like a হ (ha). The conjunct could occur in initial, medial or final position (as shown below in example 1). A user could also mistype it as হ (ha, U+09B9), taking it to be one, which increases the risk of errors in label writing and identification.

Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
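To show what CASE I means for label comparison, the sketch below enumerates a label together with every label obtained by swapping স্থ/স্হ or ন্থ/ন্হ. It is a hypothetical helper, not the proposal's mechanism. In an actual LGR the variant mappings are declared in the XML file (RFC 7940), and each variant label also receives a disposition such as blocked. This sketch only lists the candidates.

```python
from itertools import product

HASANTA = "\u09CD"
# CASE I pairs: (sa | na) + Hasanta + tha  <->  (sa | na) + Hasanta + ha
CASE_I = {
    "\u09B8" + HASANTA + "\u09A5": "\u09B8" + HASANTA + "\u09B9",  # স্থ <-> স্হ
    "\u09A8" + HASANTA + "\u09A5": "\u09A8" + HASANTA + "\u09B9",  # ন্থ <-> ন্হ
}
# Variant relations are symmetric, so add the reverse mappings too.
VARIANT_MAP = {**CASE_I, **{v: k for k, v in CASE_I.items()}}


def variant_labels(label: str) -> set[str]:
    """Enumerate the label and all its CASE I variants (hypothetical helper)."""
    segments = []
    i = 0
    while i < len(label):
        chunk = label[i:i + 3]
        if chunk in VARIANT_MAP:
            segments.append((chunk, VARIANT_MAP[chunk]))  # either form may appear
            i += 3
        else:
            segments.append((label[i],))  # code point with no variant
            i += 1
    # One output label per combination of choices across the segments.
    return {"".join(choice) for choice in product(*segments)}


sthana = "\u09B8\u09CD\u09A5\u09BE\u09A8"  # স্থান
print(len(variant_labels(sthana)))  # 2
```

A label with two CASE I conjuncts yields four labels, since each occurrence can be swapped independently.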
6.2. Cross-Script Variants

A concise cross-script study for Bangla has been carried out with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as domain labels on the web. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Although certain other characters are somewhat similar, they have not been included here; they are provided in the Appendix (10.3) for reference.
7. Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications.
11 Unicode uses Oriya for the script, although Odia is now the official term.
12 As used by Unicode, denoting and including both Assamese and Maṇipuri.
Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table in the Code Point Repertoire (Section 5.1):

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ BENGALI LETTER RA WITH MIDDLE DIAGONAL, the Assamese ra).
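To make the notation concrete, a label can be mapped to a string of these symbols before the rules are applied. The Python sketch below does that. Its code point sets are an assumption drawn from the Unicode Bengali block rather than the normative repertoire of Section 5.1. It also does not recognise the multi-code-point symbols S and P.

```python
# Illustrative sketch: code point sets are an ASSUMPTION taken from the Unicode
# Bengali block, not the normative repertoire of Section 5.1.
def _chars(*ranges: tuple[int, int]) -> set[str]:
    """Build a set of characters from inclusive code point ranges."""
    return {chr(cp) for lo, hi in ranges for cp in range(lo, hi + 1)}


SYMBOL_SETS = {
    "C": _chars((0x0995, 0x09A8), (0x09AA, 0x09B0), (0x09B2, 0x09B2),
                (0x09B6, 0x09B9), (0x09F0, 0x09F1)),
    "V": _chars((0x0985, 0x098B), (0x098F, 0x0990), (0x0993, 0x0994)),
    "M": _chars((0x09BE, 0x09C3), (0x09C7, 0x09C8), (0x09CB, 0x09CC)),
    "B": {"\u0982"},  # Anusvāra
    "D": {"\u0981"},  # Candrabindu
    "X": {"\u0983"},  # Visarga
    "H": {"\u09CD"},  # Hasanta (virama)
    "Z": {"\u09CE"},  # Khanda Ta
}


def to_symbols(label: str) -> str:
    """Map each code point to its WLE symbol; S and P are not recognised here."""
    out = []
    for ch in label:
        for sym, chars in SYMBOL_SETS.items():
            if ch in chars:
                out.append(sym)
                break
        else:  # no set matched this code point
            raise ValueError(f"U+{ord(ch):04X} is outside the illustrative sets")
    return "".join(out)


print(to_symbols("\u09B8\u09CD\u09A5\u09BE\u09A8"))  # স্থান -> CHCMC
```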
Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in adding such ligatures: any user can easily create them, even though they may not be attested combinations.
7.1.1. Case of V Preceded by H

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া /bæŋkʌv ɪndiə/ (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning ‘Bank of India’. Here two different words are joined together: the first ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered enough to represent the sound intended for that word. This unique situation arises because the hyphen, the space and the Zero Width Non-Joiner are not in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to follow an H. Permitting this may make two labels (with and without H) look similar to the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, if the prevailing requirements of the community call for it, the NBGP may consider revisiting this rule.
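The prohibition can be sketched as a scan for an independent vowel directly after a Hasanta, applied to the ‘Bank of India’ code points listed above. The vowel set is an illustrative assumption, not the LGR repertoire, and the function name is hypothetical.

```python
HASANTA = "\u09CD"  # ্ BENGALI SIGN VIRAMA (Hasanta)
# Assumption: independent vowels of the Unicode Bengali block, for illustration only.
VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")


def vowel_follows_hasanta(label: str) -> bool:
    """True if some independent vowel (V) directly follows a Hasanta (H)."""
    return any(a == HASANTA and b in VOWELS for a, b in zip(label, label[1:]))


# ব্যাঙ্কঅফ্ইন্ডিয়া, built from the code points listed in the text
with_h = "".join(chr(cp) for cp in [
    0x09AC, 0x09CD, 0x09AF, 0x09BE, 0x0999, 0x09CD, 0x0995, 0x0985,
    0x09AB, 0x09CD, 0x0987, 0x09A8, 0x09CD, 0x09A1, 0x09BF, 0x09DF, 0x09BE])
# Same label with the H after ফ dropped, i.e. the form the text says most readers accept
without_h = with_h.replace("\u09AB\u09CD\u0987", "\u09AB\u0987")

print(vowel_follows_hasanta(with_h))     # True: rejected under 7.1.1
print(vowel_follows_hasanta(without_h))  # False
```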
7.2. Additional Examples from Bangla ABNF

Below are a few examples that help one understand some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.

ঃক   ःक
As can be seen, such a combination will automatically result in a “golu”, or dotted circle, marking it as an invalid formation. This is an intrinsic property of the
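IDNA2008 (RFC 5891) independently forbids a label from beginning with a combining mark. That is one reason ঃক fails regardless of any Bangla-specific rule: the visarga (U+0983) has General_Category Mc. Below is a minimal sketch of that generic check using Python's `unicodedata` module. The function name is hypothetical.

```python
import unicodedata


def starts_with_combining_mark(label: str) -> bool:
    """True if the first code point has General_Category Mn, Mc or Me."""
    return bool(label) and unicodedata.category(label[0]).startswith("M")


print(unicodedata.category("\u0983"))              # Mc (BENGALI SIGN VISARGA)
print(starts_with_combining_mark("\u0983\u0995"))  # ঃক -> True
print(starts_with_combining_mark("\u0995\u0983"))  # কঃ -> False
```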
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink, and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What Has Happened So Far in Terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0 – Core Specification, Chapter 12, p. 473.
10. Appendix-I
10.1. Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the formalism's structures do not necessarily apply. To take care of such cases, restriction rules are set in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:
1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not generally allow ya (য়) at the beginning of a word either, although native examples can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across the possibility of Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).
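Rule 1 reduces to a check on the first code point of a label. This is a minimal sketch with a hypothetical function name, covering only the three letters the rule names. Under it, ৎসেরিং (Tsering) would be rejected, even though the text notes that such transliterations are beginning to appear.

```python
from typing import Optional

# Letters that rule 1 forbids at the start of a label (hypothetical encoding of the rule).
FORBIDDEN_INITIAL = {
    "\u09CE",  # ৎ BENGALI LETTER KHANDA TA
    "\u099E",  # ঞ BENGALI LETTER NYA
    "\u0999",  # ঙ BENGALI LETTER NGA
}


def forbidden_initial(label: str) -> Optional[str]:
    """Return the offending first character, or None if rule 1 is satisfied."""
    if label and label[0] in FORBIDDEN_INITIAL:
        return label[0]
    return None


tsering = "\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"  # ৎসেরিং
print(forbidden_initial(tsering) == "\u09CE")  # True: starts with Khanda Ta
print(forbidden_initial("\u0995\u0999"))       # None: ঙ is not initial here
```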
13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri are not covered in this section.

10.2. ‘Sylheti Nagarī lipi’ or ‘Siloti’

This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924: Kthi) used by accountants (perhaps by the Kāyastha community) in eastern Uttar Pradesh and Bihar, which was widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th–14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Others ascribe the credit to Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).
Table 17 – The Script Table of Sylheti Nagarī or Siloti
10.3. Confusable Code Points

The following code points were analysed and found to be either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
6 VariantsThissection talksabout thevariants in theBanglascriptTheNBGPcategorizes theseconfusinglyvariantsintwogroupsGroup1ConfusingduetopurevisualsimilarityGroup2ConfusingduetodeviationfromnormallyperceivedcharacterformationsbylargerlinguisticcommunityForGroup1anyidenticalcodepointsaredefinedasvariantsTheconfusablebutnotidenticalcasesarenotproposedasthereisanotherpanel(Stringsimilarityassessmentpanel)entrustedtodealwithsuchcasesHowevercaseswhichbelongtoGroup2areproposedtobeconsideredasvariantsThesecasesarenotofmerevisualsimilarityasthey involve some deviations from the widely accepted norms of Bangla Aksharformations These can cause confusion even to a careful observer and hence beingproposedasvariantsThe variants are generated in a script when two or more forms are formed withdifferent storage or code points In Bangla the e-kāra ā-kāra and the o-kāra havedifferentcodepointsOnecantypeowithaconsonantatonegoandthesamebytypinge-kāra and ā-kāra as two separate keys getting the same results A reader cannotdifferentiatebetween the twoko (েকা)one typedwitha singlekeyand theotheronetypedwithtwodifferentkeysMoreoverthiswillnotbeconsideredasacaseofvariantbecauseakārafollowedbyakāraisnotallowed
61 InScriptVariantsHoweverweproposetwocasesoftruein-scriptvariantsinBanglascriptCASEIAs far as true variants in Bangla are concernedwemay drawour attention to caseswhereinHasantawith(U+09A5)থ(tha)appearsasconjunctwith(U+09B8)স(sa)and(U+09A8)ন(na)1 স+Hasanta+থ(U+09B8+U+09CD+U+09A5)versus
স +Hasanta+হ(U+09B8+U+09CD+U+09B9)
2 ন+Hasanta+থ(U+09A8+U+09CD+U+09A5)versus
ন+Hasanta+হ(U+09A8+U+09CD+U+09B9)
44
Theabovecombinationsifwrittenintraditionalorthographycouldbelittleconfusingwheretheথ(tha)inconjunctappearslikeaহ(ha)Theconjunctcouldbeintheinitialmedialorfinalpositions(asshownbelowinegno1)Itcouldbetypedwrongaswellthinking itwas aহ (ha)U+09B9 increasing the chances of risks in labelwriting andidentificationExamples1 acuteandসহ(asinacuteানsthānaacuteলsthulaাacuteGsvāsthyaঅacuteায়ীasthāyı)
62 CrossScriptVariantsAcrispcrossscriptstudyforBanglahasbeendonewithrespecttosisterscriptssuchasDevanāgarī Gurmukhı and Odia11(formerly Oriya) keeping in mind the visual andtechnicalconfusionstheymaycauseaslabelsonthewebdomainMoreoverthereisnoin-script variant in Bangla as far as the orthography is concerned The followingcharacters are being proposed by the NBPG as variants Although there are certaincharacters which are somewhat similar they but have not been included here TheyhavebeenprovidedintheAppendix(102)forreference
7 WholeLabelEvaluationRules(WLE) ThissectionprovidestheWLEsthatarerequiredbyallthelanguagesmentionedinsection32whenwritten inBangla12ScriptThe ruleshavebeendrafted in suchawaythattheycanbeeasilytranslatedintotheLGRspecifications
11 Unicode uses ‘Oriya’ for the script, although ‘Odia’ is now the official term.
12 As used by Unicode, denoting and including both Assamese and Maṇipuri.
Below are the symbols used in the WLE rules for each Indic Syllabic Category, as listed in the table in the Code point repertoire (Section 5.1):
C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khaṇḍata
S → S1, S2 (from Table 9) or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ - BENGALI LETTER A) or 098F (এ - BENGALI LETTER E), H is 09CD (্ - BENGALI SIGN VIRAMA), C1 is 09AF (য - BENGALI LETTER YA) and M1 is 09BE (া - BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র - BENGALI LETTER RA) or 09F0 (ৰ - BENGALI LETTER RA WITH MIDDLE DIAGONAL, the Assamese ra).
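As an aid to reading the rules that follow, this sketch maps single code points to the symbols C, M, V, B, D, X, H and Z and turns a label into a symbol string. The block ranges are an assumption made for illustration (they include a few unassigned code points and omit characters such as the nukta); the authoritative classes are those in the Section 5.1 table. S and P are multi-code-point sequences, so they are not produced by this per-code-point mapping.

```python
# Hypothetical, simplified mapping from Bengali code points to WLE symbols.
SPECIAL = {
    "\u0981": "D",  # Candrabindu
    "\u0982": "B",  # Anusvāra
    "\u0983": "X",  # Visarga
    "\u09CD": "H",  # Hasanta (BENGALI SIGN VIRAMA)
    "\u09CE": "Z",  # Khaṇḍata
}


def wle_symbol(ch: str) -> str:
    """Return the WLE symbol for one code point, using approximate block ranges."""
    if ch in SPECIAL:
        return SPECIAL[ch]
    cp = ord(ch)
    if 0x0985 <= cp <= 0x0994:
        return "V"  # independent vowels অ .. ঔ
    if 0x0995 <= cp <= 0x09B9 or ch in ("\u09F0", "\u09F1"):
        return "C"  # consonants, including Assamese ra and wa
    if 0x09BE <= cp <= 0x09CC:
        return "M"  # dependent vowel signs (kāras)
    raise ValueError(f"U+{cp:04X} is outside this simplified sketch")


def symbol_string(label: str) -> str:
    """Concatenate the WLE symbols of every code point in the label."""
    return "".join(wle_symbol(ch) for ch in label)


print(symbol_string("\u09B8\u09CD\u09A5\u09BE\u09A8"))  # স্থান -> CHCMC
```

Whole-label rules can then be phrased as patterns over this symbol string, which is one reason such symbol tables translate readily into LGR context rules.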
Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in adding such ligatures: any user can easily create them, even if they are not attested combinations.
7.1.1. Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning "Bank of India". Here two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H to fully represent the intended sound. By and large, however, the form of that first word without an H (U+09CD) is considered sufficient to represent its intended sound. This situation arises only because the hyphen, the space and the Zero Width Non-Joiner are absent from the permissible set of characters in the Root Zone repertoire; otherwise, V never needs to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.
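The prohibition above reduces to a simple adjacency test. The following hypothetical Python sketch rejects a label in which an independent vowel (V) immediately follows Hasanta (H); the function names are illustrative, and treating the block range U+0985 to U+0994 as V is an approximation. It uses the code point sequence of the example label given in this section.

```python
HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA (Hasanta)


def is_independent_vowel(ch: str) -> bool:
    # Approximation: the range অ (U+0985) to ঔ (U+0994), which also
    # contains a few unassigned code points.
    return 0x0985 <= ord(ch) <= 0x0994


def violates_h_before_v(label: str) -> bool:
    """True if an independent vowel (V) immediately follows a Hasanta (H)."""
    return any(a == HASANTA and is_independent_vowel(b) for a, b in zip(label, label[1:]))


BANK_OF_INDIA = (
    "\u09AC\u09CD\u09AF\u09BE\u0999\u09CD\u0995"  # ব্যাঙ্ক
    "\u0985\u09AB\u09CD"                          # অফ্, ends with H
    "\u0987\u09A8\u09CD\u09A1\u09BF\u09DF\u09BE"  # ইন্ডিয়া, begins with V
)
WITHOUT_H = BANK_OF_INDIA.replace("\u09AB\u09CD\u0987", "\u09AB\u0987")

print(violates_h_before_v(BANK_OF_INDIA))  # True: prohibited under this rule
print(violates_h_before_v(WITHOUT_H))      # False: the H-less form passes this rule
```

The sketch checks only this one rule; other whole-label rules would still apply to the H-less form.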
7.2. Additional Examples from Bangla ABNF
Below are a few examples that help one understand some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.
ঃক ःक (visarga followed by ka, in Bangla and in Devanāgarī): As can be seen, such a combination automatically results in a “golu” or dotted circle, marking it as an invalid formation. This is an intrinsic property of the
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink, and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What has happened So Far in Terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0 – Core Specification, Chapter 12, p. 473.
10. Appendix-I
10.1. Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the Formalism structures do not necessarily apply. To take care of such cases, restriction rules are set in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:
1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and to the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla generally does not allow ya (য়) at the beginning of a word either, although a few native examples can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichu Chor’ written by Kazi Nazrul Islam. There are also instances of it in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).
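A check for restriction rule 1 only needs to inspect the first code point of a label. The hypothetical Python sketch below enforces just the three characters the rule lists (ৎ, ঞ, ঙ); the function name `initial_violation` is illustrative. It deliberately does not model the discussion of initial ya (য়), which the text treats as a tendency with exceptions, so whether a Tsering-style ৎস label should be admitted is left to the rule itself.

```python
from typing import Optional

# Characters that restriction rule 1 lists as not allowed at the start of a label.
DISALLOWED_INITIAL = {
    "\u09CE": "BENGALI LETTER KHANDA TA (ৎ)",
    "\u099E": "BENGALI LETTER NYA (ঞ)",
    "\u0999": "BENGALI LETTER NGA (ঙ)",
}


def initial_violation(label: str) -> Optional[str]:
    """Return the name of a disallowed label-initial character, or None if the start is allowed."""
    if not label:
        return None
    return DISALLOWED_INITIAL.get(label[0])


print(initial_violation("\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"))  # ৎসেরিং -> BENGALI LETTER KHANDA TA (ৎ)
print(initial_violation("\u09B8\u09CD\u09A5\u09BE\u09A8"))        # স্থান -> None
```

Returning the offending character's name rather than a bare boolean is a design choice that makes rejection messages easier to report.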
10.2. ‘Sylheti Nagarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924) used by accountants (perhaps by the Kāyastha community) in eastern Uttar Pradesh and Bihar, and widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).
13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri have not been covered in this section.
Table 17 – The Script Table of Sylheti Nagarī or Siloti
10.3. Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.
[Table: conjunct forms built on ক (k), গ (g) and ণ (n) judged distinguishable or insufficiently confusable; the conjunct glyphs could not be recovered from the source.]
Theabovecombinationsifwrittenintraditionalorthographycouldbelittleconfusingwheretheথ(tha)inconjunctappearslikeaহ(ha)Theconjunctcouldbeintheinitialmedialorfinalpositions(asshownbelowinegno1)Itcouldbetypedwrongaswellthinking itwas aহ (ha)U+09B9 increasing the chances of risks in labelwriting andidentificationExamples1 acuteandসহ(asinacuteানsthānaacuteলsthulaাacuteGsvāsthyaঅacuteায়ীasthāyı)
62 CrossScriptVariantsAcrispcrossscriptstudyforBanglahasbeendonewithrespecttosisterscriptssuchasDevanāgarī Gurmukhı and Odia11(formerly Oriya) keeping in mind the visual andtechnicalconfusionstheymaycauseaslabelsonthewebdomainMoreoverthereisnoin-script variant in Bangla as far as the orthography is concerned The followingcharacters are being proposed by the NBPG as variants Although there are certaincharacters which are somewhat similar they but have not been included here TheyhavebeenprovidedintheAppendix(102)forreference
7 WholeLabelEvaluationRules(WLE) ThissectionprovidestheWLEsthatarerequiredbyallthelanguagesmentionedinsection32whenwritten inBangla12ScriptThe ruleshavebeendrafted in suchawaythattheycanbeeasilytranslatedintotheLGRspecifications
11 Unicode uses Oriya for the script although Odia is now the official term used 12 As used by the Unicode denoting and including both Assamese and Maṇipuri
47
Below are the symbols used in the WLE rules for each of the Indic SyllabicCategoryasmentionedinthetableprovidedinCodepointrepertoire(Section51)
C rarr Consonant
M rarr Kāra(Mātrā)
V rarr Vowel
B rarr Anusvāra
D rarr Candrabindu
X rarr Visarga
H rarr Hasanta
Z rarr KhandaTa
S rarr S1S2(fromTable9)or(ae)Ya-phalā(V1HC1M1)whereV1iseither0985(অ-BENGALILETTERA)or098F(এ-BENGALILETTERE)His09CD(-BENGALISIGNVIRAMA)C1is-09AF(য-BENGALILETTERYA)M1is-09BE(া-BENGALIVOWELSIGNAA)S1 and S2 are valid even they are not allowed by the other context rules
P rarr Ra-Hasanta(C2H)whereC2iseither09B0(র-BENGALILETTERRA)or 09F0 (ৰ - ASSAMESE LETTER RA
Now letuselaborateeach rulewithexamples from the scriptkeeping inmindthe Bangla Assamese and Manipuri communities Some combinations ofcharactersmayseemunrealisticorrareinusagebutthereisnoharminaddingsuch ligaturesbecause it ispossible tocreate thembyanyusereasilybutmaynotbeattestedcombinations
49
711 Case of V Preceded by H There could be cases involvingmulti-word domains where Vmay need to beallowedtofollowanHegব8াtঅuইিvয়া baeligŋkʌv ɪndiə (U+09ACU+09CDU+09AFU+09BEU+0999U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1U+09BFU+09DFU+09BE)(meaningBankofIndia)ThisisthecasewheretwodifferentwordsarejoinedtogetherfirstofwhichendswithanH(অu)andthesecondwordbeginswithaV(ইিvয়া)Somesectionsofthe linguistic community require the explicit presence of H for fullrepresentation of the sound intended However by and large the form of thefirstwordwithoutanH(U+09CD)isconsideredenoughforfullrepresentationofthesoundintendedforthefirstwordThis isauniquesituationnecessitatedbythelackofhyphenspaceortheZeroWidthNon-joinercharacterinthepermissiblesetofcharactersintheRootzonerepertoire Otherwise V is never required to be allowed to follow an HPermittingthismaycreateaperceptivesimilaritybetweentwolabels(withandwithout H) for majority of the linguistic communities hence this is explicitlyprohibitedbytheNBGPIn future if required depending on the prevailing requirements from thecommunitythefutureNBGPmayconsiderrevisitingthisrule
72 AdditionalExamplesfromBanglaABNF
Belowarea fewexampleswhichhelponeunderstandsomeof the rulesABNFputsinplaceThesearejustgivenforreferencepurposesandarenotmeanttobecomprehensive
ঃক ःकAscanbeseensuchcombinationwillresultautomaticallyinaldquogolurdquooradottedcircle marking it as an invalid formation This is an intrinsic property of the
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti
13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section
58
Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)
Table17ndashTheScriptTableofSylhetiNagarıorSiloti
103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
62 CrossScriptVariantsAcrispcrossscriptstudyforBanglahasbeendonewithrespecttosisterscriptssuchasDevanāgarī Gurmukhı and Odia11(formerly Oriya) keeping in mind the visual andtechnicalconfusionstheymaycauseaslabelsonthewebdomainMoreoverthereisnoin-script variant in Bangla as far as the orthography is concerned The followingcharacters are being proposed by the NBPG as variants Although there are certaincharacters which are somewhat similar they but have not been included here TheyhavebeenprovidedintheAppendix(102)forreference
7 WholeLabelEvaluationRules(WLE) ThissectionprovidestheWLEsthatarerequiredbyallthelanguagesmentionedinsection32whenwritten inBangla12ScriptThe ruleshavebeendrafted in suchawaythattheycanbeeasilytranslatedintotheLGRspecifications
11 Unicode uses Oriya for the script although Odia is now the official term used 12 As used by the Unicode denoting and including both Assamese and Maṇipuri
47
Below are the symbols used in the WLE rules for each of the Indic SyllabicCategoryasmentionedinthetableprovidedinCodepointrepertoire(Section51)
C rarr Consonant
M rarr Kāra(Mātrā)
V rarr Vowel
B rarr Anusvāra
D rarr Candrabindu
X rarr Visarga
H rarr Hasanta
Z rarr KhandaTa
S rarr S1S2(fromTable9)or(ae)Ya-phalā(V1HC1M1)whereV1iseither0985(অ-BENGALILETTERA)or098F(এ-BENGALILETTERE)His09CD(-BENGALISIGNVIRAMA)C1is-09AF(য-BENGALILETTERYA)M1is-09BE(া-BENGALIVOWELSIGNAA)S1 and S2 are valid even they are not allowed by the other context rules
P rarr Ra-Hasanta(C2H)whereC2iseither09B0(র-BENGALILETTERRA)or 09F0 (ৰ - ASSAMESE LETTER RA
Now letuselaborateeach rulewithexamples from the scriptkeeping inmindthe Bangla Assamese and Manipuri communities Some combinations ofcharactersmayseemunrealisticorrareinusagebutthereisnoharminaddingsuch ligaturesbecause it ispossible tocreate thembyanyusereasilybutmaynotbeattestedcombinations
49
711 Case of V Preceded by H There could be cases involvingmulti-word domains where Vmay need to beallowedtofollowanHegব8াtঅuইিvয়া baeligŋkʌv ɪndiə (U+09ACU+09CDU+09AFU+09BEU+0999U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1U+09BFU+09DFU+09BE)(meaningBankofIndia)ThisisthecasewheretwodifferentwordsarejoinedtogetherfirstofwhichendswithanH(অu)andthesecondwordbeginswithaV(ইিvয়া)Somesectionsofthe linguistic community require the explicit presence of H for fullrepresentation of the sound intended However by and large the form of thefirstwordwithoutanH(U+09CD)isconsideredenoughforfullrepresentationofthesoundintendedforthefirstwordThis isauniquesituationnecessitatedbythelackofhyphenspaceortheZeroWidthNon-joinercharacterinthepermissiblesetofcharactersintheRootzonerepertoire Otherwise V is never required to be allowed to follow an HPermittingthismaycreateaperceptivesimilaritybetweentwolabels(withandwithout H) for majority of the linguistic communities hence this is explicitlyprohibitedbytheNBGPIn future if required depending on the prevailing requirements from thecommunitythefutureNBGPmayconsiderrevisitingthisrule
72 AdditionalExamplesfromBanglaABNF
Belowarea fewexampleswhichhelponeunderstandsomeof the rulesABNFputsinplaceThesearejustgivenforreferencepurposesandarenotmeanttobecomprehensive
ঃক ःकAscanbeseensuchcombinationwillresultautomaticallyinaldquogolurdquooradottedcircle marking it as an invalid formation This is an intrinsic property of the
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti
13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section
58
Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)
Table17ndashTheScriptTableofSylhetiNagarıorSiloti
103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
62 CrossScriptVariantsAcrispcrossscriptstudyforBanglahasbeendonewithrespecttosisterscriptssuchasDevanāgarī Gurmukhı and Odia11(formerly Oriya) keeping in mind the visual andtechnicalconfusionstheymaycauseaslabelsonthewebdomainMoreoverthereisnoin-script variant in Bangla as far as the orthography is concerned The followingcharacters are being proposed by the NBPG as variants Although there are certaincharacters which are somewhat similar they but have not been included here TheyhavebeenprovidedintheAppendix(102)forreference
7 WholeLabelEvaluationRules(WLE) ThissectionprovidestheWLEsthatarerequiredbyallthelanguagesmentionedinsection32whenwritten inBangla12ScriptThe ruleshavebeendrafted in suchawaythattheycanbeeasilytranslatedintotheLGRspecifications
11 Unicode uses Oriya for the script although Odia is now the official term used 12 As used by the Unicode denoting and including both Assamese and Maṇipuri
47
Below are the symbols used in the WLE rules for each of the Indic SyllabicCategoryasmentionedinthetableprovidedinCodepointrepertoire(Section51)
C rarr Consonant
M rarr Kāra(Mātrā)
V rarr Vowel
B rarr Anusvāra
D rarr Candrabindu
X rarr Visarga
H rarr Hasanta
Z rarr KhandaTa
S rarr S1S2(fromTable9)or(ae)Ya-phalā(V1HC1M1)whereV1iseither0985(অ-BENGALILETTERA)or098F(এ-BENGALILETTERE)His09CD(-BENGALISIGNVIRAMA)C1is-09AF(য-BENGALILETTERYA)M1is-09BE(া-BENGALIVOWELSIGNAA)S1 and S2 are valid even they are not allowed by the other context rules
P rarr Ra-Hasanta(C2H)whereC2iseither09B0(র-BENGALILETTERRA)or 09F0 (ৰ - ASSAMESE LETTER RA
Now letuselaborateeach rulewithexamples from the scriptkeeping inmindthe Bangla Assamese and Manipuri communities Some combinations ofcharactersmayseemunrealisticorrareinusagebutthereisnoharminaddingsuch ligaturesbecause it ispossible tocreate thembyanyusereasilybutmaynotbeattestedcombinations
49
711 Case of V Preceded by H There could be cases involvingmulti-word domains where Vmay need to beallowedtofollowanHegব8াtঅuইিvয়া baeligŋkʌv ɪndiə (U+09ACU+09CDU+09AFU+09BEU+0999U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1U+09BFU+09DFU+09BE)(meaningBankofIndia)ThisisthecasewheretwodifferentwordsarejoinedtogetherfirstofwhichendswithanH(অu)andthesecondwordbeginswithaV(ইিvয়া)Somesectionsofthe linguistic community require the explicit presence of H for fullrepresentation of the sound intended However by and large the form of thefirstwordwithoutanH(U+09CD)isconsideredenoughforfullrepresentationofthesoundintendedforthefirstwordThis isauniquesituationnecessitatedbythelackofhyphenspaceortheZeroWidthNon-joinercharacterinthepermissiblesetofcharactersintheRootzonerepertoire Otherwise V is never required to be allowed to follow an HPermittingthismaycreateaperceptivesimilaritybetweentwolabels(withandwithout H) for majority of the linguistic communities hence this is explicitlyprohibitedbytheNBGPIn future if required depending on the prevailing requirements from thecommunitythefutureNBGPmayconsiderrevisitingthisrule
72 AdditionalExamplesfromBanglaABNF
Belowarea fewexampleswhichhelponeunderstandsomeof the rulesABNFputsinplaceThesearejustgivenforreferencepurposesandarenotmeanttobecomprehensive
ঃক ःकAscanbeseensuchcombinationwillresultautomaticallyinaldquogolurdquooradottedcircle marking it as an invalid formation This is an intrinsic property of the
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti
13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section
58
Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)
Table17ndashTheScriptTableofSylhetiNagarıorSiloti
103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
Below are the symbols used in the WLE rules for each of the Indic SyllabicCategoryasmentionedinthetableprovidedinCodepointrepertoire(Section51)
C rarr Consonant
M rarr Kāra(Mātrā)
V rarr Vowel
B rarr Anusvāra
D rarr Candrabindu
X rarr Visarga
H rarr Hasanta
Z rarr KhandaTa
S rarr S1S2(fromTable9)or(ae)Ya-phalā(V1HC1M1)whereV1iseither0985(অ-BENGALILETTERA)or098F(এ-BENGALILETTERE)His09CD(-BENGALISIGNVIRAMA)C1is-09AF(য-BENGALILETTERYA)M1is-09BE(া-BENGALIVOWELSIGNAA)S1 and S2 are valid even they are not allowed by the other context rules
P rarr Ra-Hasanta(C2H)whereC2iseither09B0(র-BENGALILETTERRA)or 09F0 (ৰ - ASSAMESE LETTER RA
Now letuselaborateeach rulewithexamples from the scriptkeeping inmindthe Bangla Assamese and Manipuri communities Some combinations ofcharactersmayseemunrealisticorrareinusagebutthereisnoharminaddingsuch ligaturesbecause it ispossible tocreate thembyanyusereasilybutmaynotbeattestedcombinations
49
711 Case of V Preceded by H There could be cases involvingmulti-word domains where Vmay need to beallowedtofollowanHegব8াtঅuইিvয়া baeligŋkʌv ɪndiə (U+09ACU+09CDU+09AFU+09BEU+0999U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1U+09BFU+09DFU+09BE)(meaningBankofIndia)ThisisthecasewheretwodifferentwordsarejoinedtogetherfirstofwhichendswithanH(অu)andthesecondwordbeginswithaV(ইিvয়া)Somesectionsofthe linguistic community require the explicit presence of H for fullrepresentation of the sound intended However by and large the form of thefirstwordwithoutanH(U+09CD)isconsideredenoughforfullrepresentationofthesoundintendedforthefirstwordThis isauniquesituationnecessitatedbythelackofhyphenspaceortheZeroWidthNon-joinercharacterinthepermissiblesetofcharactersintheRootzonerepertoire Otherwise V is never required to be allowed to follow an HPermittingthismaycreateaperceptivesimilaritybetweentwolabels(withandwithout H) for majority of the linguistic communities hence this is explicitlyprohibitedbytheNBGPIn future if required depending on the prevailing requirements from thecommunitythefutureNBGPmayconsiderrevisitingthisrule
72 AdditionalExamplesfromBanglaABNF
Belowarea fewexampleswhichhelponeunderstandsomeof the rulesABNFputsinplaceThesearejustgivenforreferencepurposesandarenotmeanttobecomprehensive
ঃক ःकAscanbeseensuchcombinationwillresultautomaticallyinaldquogolurdquooradottedcircle marking it as an invalid formation This is an intrinsic property of the
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
10.2. ‘Sylheti Nagarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924: Kthi) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar – widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th–14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Others ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).
13 This section specifically addresses restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri are not covered in this section.
Table 17 – The Script Table of Sylheti Nagarī or Siloti
10.3. Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not to the extent of being defined as variant code points.
ক (k): ক্ক, ক্ট, ক্ত, ক্ত্র, ক্ত্ব, ক্ন, ক্ব, ক্ম, ক্য, ক্র, ক্ষ, ক্ষ্ণ, ক্ষ্ম, ক্ষ্ব, ক্ষ্য, ক্স, ঙ্ক
গ (g): গ্গ, গ্দ, গ্ধ, গ্ন, গ্ব, গ্ম, গ্য, গ্র, গ্ল, ঙ্গ
ণ (ṇ): ণ্ট, ণ্ঠ, ণ্ড, ণ্ড্র, ণ্ঢ, ণ্ণ, ণ্য, ণ্ব, ক্ষ্ণ, ষ্ণ
Now let us elaborate on each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in admitting such ligatures: any user can easily create them, even if they are not attested combinations.
7.1.1. Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্ক অফ্ ইন্ডিয়া (bæŋk ʌv ɪndiə) (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE) (meaning ‘Bank of India’). This is the case where two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered sufficient to represent the intended sound of that word. This unique situation arises because neither the hyphen, the space, nor the Zero Width Non-Joiner is in the set of characters permitted in the Root Zone repertoire. Otherwise, V is never required to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence this is explicitly prohibited by the NBGP. In future, if the prevailing requirements of the community demand it, a future NBGP may consider revisiting this rule.
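The prohibition on V following H can be sketched as a scan over adjacent code-point pairs. In this minimal Python sketch the names `has_vowel_after_virama` and `from_code_points` are hypothetical, and V is assumed to mean every independent vowel letter of the Unicode Bengali block; the actual repertoire is defined by the LGR itself and may exclude some of these. The test label is built from the code points listed for ব্যাঙ্ক অফ্ ইন্ডিয়া in 7.1.1.

```python
VIRAMA = "\u09CD"  # H: BENGALI SIGN VIRAMA (hasanta)

# V: assumed to be every independent vowel letter in the Unicode Bengali block
# (U+0985..U+098C, U+098F, U+0990, U+0993, U+0994, U+09E0, U+09E1).
INDEPENDENT_VOWELS = frozenset(
    chr(cp)
    for cp in [*range(0x0985, 0x098D), 0x098F, 0x0990, 0x0993, 0x0994, 0x09E0, 0x09E1]
)


def has_vowel_after_virama(label: str) -> bool:
    """Return True if an independent vowel (V) directly follows a virama (H)."""
    return any(
        prev == VIRAMA and cur in INDEPENDENT_VOWELS
        for prev, cur in zip(label, label[1:])
    )


def from_code_points(*cps: int) -> str:
    """Build a string from a sequence of code points."""
    return "".join(chr(cp) for cp in cps)


# "Bank of India" written as one label, using the code points listed in 7.1.1
with_h = from_code_points(
    0x09AC, 0x09CD, 0x09AF, 0x09BE, 0x0999, 0x09CD, 0x0995,  # ব্যাঙ্ক
    0x0985, 0x09AB, 0x09CD,                                  # অফ্ (ends in H)
    0x0987, 0x09A8, 0x09CD, 0x09A1, 0x09BF, 0x09DF, 0x09BE,  # ইন্ডিয়া (starts with V)
)
# The same label with the word-final H of অফ্ removed
without_h = with_h.replace("\u09AB\u09CD\u0987", "\u09AB\u0987")

if __name__ == "__main__":
    print(has_vowel_after_virama(with_h))     # True: V follows H, so prohibited
    print(has_vowel_after_virama(without_h))  # False: passes this check
```

Dropping the word-final H of অফ্, the form most of the linguistic community accepts, makes the label pass this one check.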
7.2. Additional Examples from Bangla ABNF
Below are a few examples that help one understand some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.
ঃক ःक
As can be seen, such a combination automatically results in a “golu”, or dotted circle, marking it as an invalid formation. This is an intrinsic property of the
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti
13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section
58
Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)
Table17ndashTheScriptTableofSylhetiNagarıorSiloti
103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
711 Case of V Preceded by H There could be cases involvingmulti-word domains where Vmay need to beallowedtofollowanHegব8াtঅuইিvয়া baeligŋkʌv ɪndiə (U+09ACU+09CDU+09AFU+09BEU+0999U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1U+09BFU+09DFU+09BE)(meaningBankofIndia)ThisisthecasewheretwodifferentwordsarejoinedtogetherfirstofwhichendswithanH(অu)andthesecondwordbeginswithaV(ইিvয়া)Somesectionsofthe linguistic community require the explicit presence of H for fullrepresentation of the sound intended However by and large the form of thefirstwordwithoutanH(U+09CD)isconsideredenoughforfullrepresentationofthesoundintendedforthefirstwordThis isauniquesituationnecessitatedbythelackofhyphenspaceortheZeroWidthNon-joinercharacterinthepermissiblesetofcharactersintheRootzonerepertoire Otherwise V is never required to be allowed to follow an HPermittingthismaycreateaperceptivesimilaritybetweentwolabels(withandwithout H) for majority of the linguistic communities hence this is explicitlyprohibitedbytheNBGPIn future if required depending on the prevailing requirements from thecommunitythefutureNBGPmayconsiderrevisitingthisrule
72 AdditionalExamplesfromBanglaABNF
Belowarea fewexampleswhichhelponeunderstandsomeof the rulesABNFputsinplaceThesearejustgivenforreferencepurposesandarenotmeanttobecomprehensive
ঃক ःकAscanbeseensuchcombinationwillresultautomaticallyinaldquogolurdquooradottedcircle marking it as an invalid formation This is an intrinsic property of the
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti
13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section
58
Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)
Table17ndashTheScriptTableofSylhetiNagarıorSiloti
103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti
13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section
58
Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)
Table17ndashTheScriptTableofSylhetiNagarıorSiloti
103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti
13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section
58
Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)
Table17ndashTheScriptTableofSylhetiNagarıorSiloti
103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti
13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section
58
Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)
Table17ndashTheScriptTableofSylhetiNagarıorSiloti
103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
JanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti
13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section
58
Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)
Table17ndashTheScriptTableofSylhetiNagarıorSiloti
103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti
13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section
58
Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)
Table17ndashTheScriptTableofSylhetiNagarıorSiloti
103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints
ক k Uuml(p+ক)raquo(p+ট)(p+ত)Yacute(p+y+র)(p+y+ব)THORN(p+ন)szlig(p+ব)agrave(p+ম)ক8(p+য)aacute(p+র)acirc(p+ষ)atilde(p+frac12+ণ)auml(p+frac12+ম)aring(p+frac12+ব)acirc8(p+frac12+য)aelig(p+স)t(ccedil+ক)egrave(+p+র)
amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)
গ g ecirc(micro+গ)euml(micro+দ)acute(micro+ধ)igrave(micro+ন)iacute(micro+ব)icirc(micro+ম)গ8(micro+য)iuml(micro+র)eth(micro+ল)ntilde(ccedil+গ)ntilde6 ( +ccedil+গ)
ণ n ouml(Agrave+ট)iquest(Agrave+ঠ)divide(Agrave+ড)oslash(Agrave+Atilde+র)AElig(Agrave+ঢ)ugrave(Agrave+ণ)ণ8(Agrave+য)uacute(Agrave+ব)atilde(p+frac12+ণ)ucirc(frac12+ণ)uuml(+ণ)
[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018
[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473
57
10 Appendix-I
101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply
1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াwেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াxব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexampleyেসিরং(Tsering)
102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti
13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section
58
Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)
Table 17 – The Script Table of Sylheti Nagarī or Siloti
10.3. Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.
ক (k): ক্ক (ক+ক), ক্ট (ক+ট), ক্ত (ক+ত), ক্ত্র (ক+ত+র), ক্ত্ব (ক+ত+ব), ক্ন (ক+ন), ক্ব (ক+ব), ক্ম (ক+ম), ক্য (ক+য), ক্র (ক+র), ক্ষ (ক+ষ), ক্ষ্ণ (ক+ষ+ণ), ক্ষ্ম (ক+ষ+ম), ক্ষ্ব (ক+ষ+ব), ক্ষ্য (ক+ষ+য), ক্স (ক+স), ঙ্ক (ঙ+ক), ( +ক+র)
গ (g): গ্গ (গ+গ), গ্দ (গ+দ), গ্ধ (গ+ধ), গ্ন (গ+ন), গ্ব (গ+ব), গ্ম (গ+ম), গ্য (গ+য), গ্র (গ+র), গ্ল (গ+ল), ঙ্গ (ঙ+গ), ( +ঙ+গ)
ণ (ṇ): ণ্ট (ণ+ট), ণ্ঠ (ণ+ঠ), ণ্ড (ণ+ড), ণ্ড্র (ণ+ড+র), ণ্ঢ (ণ+ঢ), ণ্ণ (ণ+ণ), ণ্য (ণ+য), ণ্ব (ণ+ব), ক্ষ্ণ (ক+ষ+ণ), ষ্ণ (ষ+ণ), ( +ণ)
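In Unicode, the component notation used in the list above (for example ক+ষ+ণ) corresponds to consonants joined by BENGALI SIGN VIRAMA (U+09CD, hasanta). The minimal Python sketch below shows that encoding. The helper names `conjunct` and `components` are hypothetical and are not part of the LGR.

The sketch illustrates that clusters such as ক্ষ্ণ and ষ্ণ are distinct code point sequences. Any confusability between them is therefore a matter of rendered shape, not of encoding.

```python
from typing import List

VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA (hasanta)


def conjunct(*consonants: str) -> str:
    """Encode a conjunct by joining its consonants with the virama,
    e.g. conjunct("ক", "ষ", "ণ") -> "ক্ষ্ণ"."""
    if not consonants:
        raise ValueError("at least one consonant is required")
    return VIRAMA.join(consonants)


def components(cluster: str) -> List[str]:
    """Split a virama-joined cluster back into its consonants."""
    return cluster.split(VIRAMA)


if __name__ == "__main__":
    ksna = conjunct("\u0995", "\u09B7", "\u09A3")  # ক+ষ+ণ -> ক্ষ্ণ
    ssna = conjunct("\u09B7", "\u09A3")            # ষ+ণ -> ষ্ণ
    # ['U+0995', 'U+09CD', 'U+09B7', 'U+09CD', 'U+09A3']
    print([f"U+{ord(c):04X}" for c in ksna])
    # False True
    print(ksna == ssna, components(ksna) == ["\u0995", "\u09B7", "\u09A3"])
```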