
"Computing support for Pakistani Languages, Challenges & Practices" by Dr. Sarmad Hussain

Oct 20, 2014

This presentation was shared at the PAS Digital Marketing Conference "Dig-It 2.0".
Session name: Urdu Internet - Leveraging Technologies
Presentation: Computing support for Pakistani Languages, Challenges & Practices
Speaker: Dr. Sarmad Hussain, Professor and Head, Center for Language Engineering, University of Engineering and Technology, Pakistan
Transcript
Page 1: "Computing support for Pakistani Languages, Challenges & Practices" by Dr. Sarmad Hussain

www.cle.org.pk 1

Computing Support for Pakistani Languages – Challenges and Practice

Sarmad Hussain

Center for Language Engineering

Al-Khawarizmi Institute of Computer Science

University of Engineering and Technology

Lahore

[email protected]

Unlocking Information for Human Development

www.CLE.org.pk

Page 2

Need

ICTs promise significant socio-economic impact

Impact depends on the size of the population that can use ICTs

180 million citizens need access

66+ languages

10% understand English

58% literate

11% have access to computers

70% have access to mobile phones

ITU ICT Development Index (IDI): Pakistan ranked 127 of 155 nations

Human Language Technology is necessary to bridge the gap

Page 3

Languages of Pakistan

Percent Population of Pakistan by Mother Tongue

Area    Urdu   Punjabi  Sindhi  Pushto  Balochi  Saraiki  Others (60)
Total   7.57   44.15    14.1    15.42   3.57     10.53    4.66
Rural   1.48   42.51    16.46   18.06   3.99     12.97    4.53
Urban   20.22  47.56    9.20    9.94    2.69     5.46     4.93
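Each row of the mother-tongue table sums to 100%. If the Total row is the population-weighted mix of the Rural and Urban rows (an assumption; the slide does not say so), every language column should give the same rural weight w in Total = w × Rural + (1 - w) × Urban. A minimal Python check; the variable and function names are illustrative only:

```python
# Percent of population by mother tongue, copied from the table above.
LANGUAGES = ["Urdu", "Punjabi", "Sindhi", "Pushto", "Balochi", "Saraiki", "Others"]
TOTAL = [7.57, 44.15, 14.1, 15.42, 3.57, 10.53, 4.66]
RURAL = [1.48, 42.51, 16.46, 18.06, 3.99, 12.97, 4.53]
URBAN = [20.22, 47.56, 9.20, 9.94, 2.69, 5.46, 4.93]


def implied_rural_share(total: float, rural: float, urban: float) -> float:
    """Solve total = w * rural + (1 - w) * urban for the rural weight w."""
    return (urban - total) / (urban - rural)


# One implied weight per language column.
shares = {
    lang: implied_rural_share(t, r, u)
    for lang, t, r, u in zip(LANGUAGES, TOTAL, RURAL, URBAN)
}

for lang, w in shares.items():
    print(f"{lang:<8} w = {w:.3f}")
```

Every column gives w close to 0.675, so the three rows are consistent with roughly two-thirds of the counted population being rural. Balochi drifts slightly (about 0.677) because its small percentages carry relatively more rounding error.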

Page 4

Languages of Pakistan


Percent Population of Pakistan by Mother Tongue

Economic

Socio-cultural

Page 5

Languages of Pakistan


Languages of Pakistan in Danger (UNESCO)

Vulnerable

Definitely endangered

Severely endangered

Percent Population of Pakistan by Mother Tongue

Economic

Socio-cultural

Page 6

How?

USE

Human Language Technology

Linguistic Research

Standards

Applications

Materials

Training

Relevant Content Access

Relevant Content Generation

Adoption

Page 7

Human Language Technology – Bridging Barriers

• Interfacing
• Assisting
• Enabling
• Empowering

Page 8

و سخرالشمس والقمر (Arabic: "and He subjected the sun and the moon")

Interfacing
– Character Set
  • Input Methods
  • Writing
  • Collation
– Terminology Translation

Language Technology
– Applications
  • Fonts
  • Keyboards, Keypads and Other Input Methods
  • Collation Methods
  • Localized Platform

Standards
– National
– International
  • ISO 639 (language codes)
  • ISO 3166 (country codes)
  • ISO 10646/Unicode (character set)
– Platforms: Computers and Phones
  • Linux/Unix and Symbian
  • Microsoft Windows and Windows Phone
  • iOS – iPad, iPhone, MacBook, …
  • Google – Gmail, Docs, …, Android
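The Interfacing list pairs a character set with collation because encoding alone does not give a usable sort order. Sorting Urdu words by raw Unicode code point does not follow the Urdu alphabet: letters such as پ (U+067E) and ٹ (U+0679) have higher code points than the basic Arabic letters ب (U+0628) and ت (U+062A). A minimal Python sketch, assuming one commonly used ordering of the basic Urdu letters and a simple per-letter rank. It is not a standard collation algorithm and ignores diacritics, hamza forms and aspirated letters; the names URDU_ALPHABET and urdu_sort_key are hypothetical:

```python
# Hypothetical ordering of basic Urdu letters (an assumption for illustration;
# real collation standards also handle diacritics, hamza forms and aspirates).
URDU_ALPHABET = "ابپتٹثجچحخدڈذرڑزژسشصضطظعغفقکگلمنوہھءیے"

# Position of each letter in the assumed alphabet.
RANK = {letter: i for i, letter in enumerate(URDU_ALPHABET)}


def urdu_sort_key(word: str) -> list:
    """Map each character to its alphabet position; characters not in the
    assumed alphabet sort after all listed letters, by code point."""
    return [RANK.get(ch, len(URDU_ALPHABET) + ord(ch)) for ch in word]


words = ["ٹوپی", "پانی", "باغ", "تارا"]

print(sorted(words))                     # raw code-point order
print(sorted(words, key=urdu_sort_key))  # order by the assumed Urdu alphabet
```

Code-point order places پانی last, after ٹوپی, while the letter ranking places it directly after باغ, as an Urdu dictionary would. A production system would rely on a full collation specification rather than a single-letter rank table.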

Page 9

Software Localization

SeaMonkey Navigator

OpenOffice.org Writer

Page 10

Terminology and Content

Page 11

Assisting

• Text
  – Assistive input/auto-complete methods
  – Thesaurus, Spelling and Grammar Checking
  – Machine Translation, Language Identification, Text Summarization …
• Speech
  – Speech Recognition
  – Text to Speech
  – Emotion Detection, …
• Image
  – Optical Character Recognition – www.UrduOCR.net
  – Handwriting Recognition

Page 12

Page 13

Page 14

Enabling

• Hybrid
  – Online Content Sharing Tools – CMS, Social Networks
  – Screen Readers
  – Book Readers
  – Text-based Search Engines
  – Dialogue Systems
  – Speech to Speech Translation
  – Multi-modal Search Engines

Page 15

Dialogue System

Page 16

Empowering

• ICT for ICT – Focused on infrastructure
• ICT for Development – Focused on content and applications
• ICT for Human Development – Focused on participatory process

Page 17

Page 18

LANGUAGE AND ICT TRAINING

[Chart: percent of teachers showing a preference for Urdu vs. English, for software and for training material, before and after training]

Page 19

LANGUAGE AND ICT TRAINING

Icons

Icon Identification by Students (F = female, M = male)

Urdu: F 691, M 656, sub-total 1347 (64%)

English: F 132, M 198, sub-total 330 (16%)

English Transliterated into Urdu: F 150, M 183, sub-total 333 (16%)

Didn't Recognize: F 49, M 40, sub-total 89 (4%)

Total: 2099
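The sub-totals and rounded percentages in the icon identification table follow directly from the female and male counts. A minimal Python check; the dictionary keys reuse the table's column labels, and the variable names are illustrative:

```python
# Female (F) and male (M) counts per response, copied from the table above.
COUNTS = {
    "Urdu": (691, 656),
    "English": (132, 198),
    "English Transliterated into Urdu": (150, 183),
    "Didn't Recognize": (49, 40),
}

# Sub-total per response column, then the grand total over all responses.
subtotals = {label: f + m for label, (f, m) in COUNTS.items()}
grand_total = sum(subtotals.values())

# Share of all responses, rounded to a whole percent as on the slide.
percent = {label: round(100 * n / grand_total) for label, n in subtotals.items()}

print(grand_total)
for label in COUNTS:
    print(f"{label}: {subtotals[label]} ({percent[label]}%)")
```

The rounded shares add up to 100 here, but with other counts independent rounding can make them sum to 99 or 101.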

Page 20

ACCESSING INFO ONLINE

Students, by Language Used (Urdu / English / Total)

Female: 44 / 2 / 46

Male: 45 / 2 / 47

Total: 89 / 4 / 93

Participant (English / Urdu)

Students: 0 / 138

Teachers: 5 / 13

Total: 5 / 151

Preferred Language for Setting a Homepage

Language Preference for Searching on the Internet

Page 21

LANGUAGE IN ONLINE COMMUNICATION

[Pie chart: language used in 1467 emails and 363 chats; segments of 89%, 9%, 1% and 2%; legend: Urdu, English, Punjabi, Others]

Page 22

LANGUAGE FOR CONTENT DEVELOPMENT

Language of Website by Website Competition Category (Urdu / English / Total)

School Website (by 10 School Teacher Teams): 9 / 1 / 10

Local Village Website (by 10 School Student Teams) [1]: 8 / 0 / 8

Open Category (Individual Students): 38 / 0 / 38

Total: 55 / 1 / 56

[1] One school did not participate, and one school website was disqualified because the team received significant external assistance.

Page 23

CONTENT

Page 24

Development Process of Human Language Technology

Core Linguistic Analysis and Definition

Detailed Linguistic Analysis

Development of Localization Utilities

Linguistic Data Collection

Annotation of Linguistic Data

Localization of Existing Applications

Development of Linguistic Utilities

Extension of Localization Applications

Development of Advanced HLT Application

Publishing Language Computing Standards

Publishing Data Annotations Schema

Publishing Annotated Linguistic Resources

Select Language

Page 25

Status of Human Language Technology: URDU

[Diagram: the development process stages listed on Page 24, shaded by level of support for Urdu. Legend: Reasonable Support, Some Support, Minimal Support]

Page 26

Status of Human Language Technology: SINDHI

[Diagram: the development process stages listed on Page 24, shaded by level of support for Sindhi. Legend: Reasonable Support, Some Support, Minimal Support]

Page 27

Status of Human Language Technology: PUSHTO

[Diagram: the development process stages listed on Page 24, shaded by level of support for Pushto. Legend: Reasonable Support, Some Support, Minimal Support]

Page 28

Status of Human Language Technology: PUNJABI

[Diagram: the development process stages listed on Page 24, shaded by level of support for Punjabi. Legend: Reasonable Support, Some Support, Minimal Support]

Page 29

Status of Human Language Technology: BALOCHI

[Diagram: the development process stages listed on Page 24, shaded by level of support for Balochi. Legend: Reasonable Support, Some Support, Minimal Support]

Page 30

Status of Human Language Technology: SARAIKI

[Diagram: the development process stages listed on Page 24, shaded by level of support for Saraiki. Legend: Reasonable Support, Some Support, Minimal Support]

Page 31

Status of Human Language Technology: OTHERS

[Diagram: the development process stages listed on Page 24, shaded by level of support for other languages of Pakistan. Legend: Reasonable Support, Some Support, Minimal Support]

Page 32