The Role of Technology in Language Learning Rachel Edita Roxas, PhD
Transcript
Page 1: Lecture 3 dr rachel edita roxas

The Role of Technology in Language Learning

Rachel Edita Roxas, PhD

Page 2: Lecture 3 dr rachel edita roxas

Aim of This Presentation

• To present the state of the art in computational linguistics, language documentation, and empathic computing as applied to the development of language learning applications in the Philippines

• To draw implications from these developments for future research, policy-making, and teaching and pedagogy particularly in the context of the current debate on the implementation of mother tongue-based multilingual education in the country

Page 3: Lecture 3 dr rachel edita roxas

• Why is human language technology (or natural language processing) challenging?

• Why is human language computationally challenging?

• How did we deal with these questions in our HLT research in the Philippines?

Page 4: Lecture 3 dr rachel edita roxas
Page 5: Lecture 3 dr rachel edita roxas

Fruit flies like a banana

Page 6: Lecture 3 dr rachel edita roxas

•The students greeted the teachers when they arrived.

Page 7: Lecture 3 dr rachel edita roxas
Page 8: Lecture 3 dr rachel edita roxas

User Profile

Page 9: Lecture 3 dr rachel edita roxas

Ambiguity
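
The two readings of "Fruit flies like a banana" can be made concrete by counting parse trees. The sketch below is illustrative only: the toy grammar and lexicon are assumptions made for this example, not part of the lecture, and count_parses is a hypothetical helper that fills a standard CKY chart over binary rules.

```python
from collections import defaultdict

# Toy lexicon (assumed for illustration): each word may carry several categories.
LEXICON = {
    "fruit": {"N", "NP"},   # a noun, or a bare noun phrase on its own
    "flies": {"N", "V"},    # the insects, or the verb "to fly"
    "like": {"V", "P"},     # the verb "to like", or the preposition
    "a": {"Det"},
    "banana": {"N"},
}

# Toy binary rules (assumed): parent -> left right
RULES = [
    ("S", "NP", "VP"),
    ("NP", "N", "N"),      # compound noun: "fruit flies"
    ("NP", "Det", "N"),    # "a banana"
    ("VP", "V", "NP"),     # "like a banana"
    ("VP", "V", "PP"),     # "flies like a banana"
    ("PP", "P", "NP"),
]

def count_parses(words, start="S"):
    """Count the distinct parse trees for `words` with a CKY chart."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(int))  # (i, j) -> category -> count
    for i, word in enumerate(words):
        for category in LEXICON.get(word, ()):
            chart[(i, i + 1)][category] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for parent, left, right in RULES:
                    left_count = chart[(i, k)].get(left, 0)
                    right_count = chart[(k, j)].get(right, 0)
                    if left_count and right_count:
                        chart[(i, j)][parent] += left_count * right_count
    return chart[(0, n)].get(start, 0)

print(count_parses("fruit flies like a banana".split()))  # 2
```

One tree treats "fruit flies" as the subject of "like"; the other treats "fruit" as the subject of "flies". A program sees both unless it has further knowledge to choose between them.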

Page 10: Lecture 3 dr rachel edita roxas

• Multi-modality: text, audio, video

• Multi-disciplinary

Page 11: Lecture 3 dr rachel edita roxas

• How do we build the data?

• Where do we get the data?

Page 12: Lecture 3 dr rachel edita roxas

Known and Specified

Known but Unspecified

Unknown and Unspecified

Page 13: Lecture 3 dr rachel edita roxas

• Building Philippine language resources: grammar, lexicon, morphological information, corpora

• Manual Construction: Rule-based

• Automatic Methods for language resource extraction/generation: Example-based

• Some applications: Domain-specific

Page 14: Lecture 3 dr rachel edita roxas

Language Resources

• Text: various linguistic levels such as lexical, syntactic and semantic
– The Philippine Corpus
– English-Filipino Lexicon
– Filipino WordNet
– Philippine Component of the International Corpus of English

• Speech Processing

• Video: Sign Language Processing

Page 15: Lecture 3 dr rachel edita roxas

The Philippine Corpus

• Initial work on the manual collection of documents on Philippine languages has been done by Dita, Roxas, and Inventado (2009) for four major Philippine languages namely, Tagalog, Cebuano, Ilocano and Hiligaynon with 250,000 words each, and the Filipino sign language with 7,000 signs.

• Work is currently under way on another four Philippine languages.

• The future goal is to be able to collect the corpora for other Philippine languages.

Page 16: Lecture 3 dr rachel edita roxas

The Philippine Corpus

• The Philippine language corpora are also accessible online. Online availability is of great importance to those interested in Philippine languages, both locally and internationally, and it allows native speakers of these languages from all over the world to extend the corpora.

Page 17: Lecture 3 dr rachel edita roxas

The Philippine Corpus

• An important aspect of building the Philippine corpus, as with other corpus-building endeavors, is storing the data and keeping track of both the data and the processes performed on it. Existing technologies now make digitization possible, which makes storing and tracking the data much easier.

• Moreover, Internet connectivity makes the data accessible to anyone on the web. Palito, an online corpus management system, was therefore designed to use these technologies for corpus building.

Page 18: Lecture 3 dr rachel edita roxas

Screenshot of Palito’s Front Page:

ccs.dlsu.edu.ph:8086/Palito

Page 19: Lecture 3 dr rachel edita roxas

Sample of the Document Browsing Feature in Palito

Page 20: Lecture 3 dr rachel edita roxas

Sample of the Word Frequency Counting Feature of Palito

Page 21: Lecture 3 dr rachel edita roxas

Sample Concordancer Result in Palito
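
The word frequency and concordancer features shown in these screenshots can be sketched in a few lines. This is a minimal illustration, not Palito's implementation (which the slides do not describe): the function names, the simple tokenizer, and the sample Tagalog sentences are all assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into words, keeping hyphenated forms."""
    return re.findall(r"\w+(?:-\w+)*", text.lower())

def word_frequencies(text):
    """Return a Counter mapping each word to its number of occurrences."""
    return Counter(tokenize(text))

def concordance(text, keyword, width=2):
    """Return (left context, keyword, right context) for every hit of keyword."""
    tokens = tokenize(text)
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits

# Invented sample text, for illustration only.
sample = "Ang bata ay kumain ng mangga. Ang bata ay natulog."
print(word_frequencies(sample).most_common(3))
for left, hit, right in concordance(sample, "bata", width=1):
    print(f"{left:>10} [{hit}] {right}")
```

A keyword-in-context listing like this lets a linguist see at a glance which words typically surround a given form in the corpus.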

Page 22: Lecture 3 dr rachel edita roxas

Sample Video with Gloss and Transcription Viewing in Palito

Page 23: Lecture 3 dr rachel edita roxas

The Philippine Corpus

• An unexplored but equally challenging area is the collection of historical documents that will allow research on the development of the Philippine languages through the centuries (Roxas, 2007).

• An interesting historical source is the Doctrina Christiana, published in 1593 and the first work ever published in the country, which shows the translation of religious material into the local Philippine script (the Alibata) and into Spanish.

• Current digitization efforts include scanned pages of the document.

Page 24: Lecture 3 dr rachel edita roxas

A Page from Doctrina Christiana

Page 25: Lecture 3 dr rachel edita roxas

An English-Filipino Lexicon

• Currently, there exists an English-Filipino lexicon initially based on the English-Filipino dictionary of the Komisyon sa Wikang Filipino and augmented with new words by Lim et al. (2007).

• It contains 23,520 English and 20,540 Filipino word senses with information on the part of speech and co-occurring words.

Page 26: Lecture 3 dr rachel edita roxas

Filipino WordNet

• WordNet is a large collection of words in a language which are grouped into sets of synonyms called synsets, with each synset expressing a distinct concept.

• Initial work on a Filipino WordNet has been done by Borra, Pease, Roxas, and Dita (2010), relating Filipino to the English WordNet of Fellbaum (1998). A challenge in building a new WordNet arises when a concept in the new language has no counterpart among the existing synsets. In Filipino in particular, the word hilamos, which means to wash one’s face, does not have an equivalent English synset, and is represented as a hyponym of hugas, which means to wash (Borra, et al, 2010).

Page 27: Lecture 3 dr rachel edita roxas

The Philippine Component of the International Corpus of English (ICE-PHI)

• Compiled by Bautista, Lising, and Dayag and released in 2004

• About one million words distributed almost evenly across 500 texts with specified categories

• Approximately 2000 words per text with some being composite to reach the 2000-word minimum

• Samples from the English spoken or written by adults aged 18 and above and who received formal education through the medium of English up to the postsecondary level

Page 28: Lecture 3 dr rachel edita roxas

Current State of ICE-PHI

1. Since around June of 2008, the ICE-PHI team has been processing the lexical corpus. It was automatically tagged using the program MakeTag 1.0, with approximately 90% accuracy, but it still requires manual verification to achieve 99-100% accuracy.

2. Before the verbs in ten percent of ICE-PHI were analyzed for a grammar of the verb in Philippine English by Borlongan (2010), words bearing the tags V ‘verb’ and AUX ‘auxiliary’ were carefully verified. The tags were verified using the ICE Corpus Utility Program (ICECUP) 3.1.

Page 29: Lecture 3 dr rachel edita roxas

Studies Using ICE-PHI

• A number of studies have stemmed from the analysis of ICE-PHI – individual analyses of the component and analyses comparing ICE-PHI with other ICE components (British, Hong Kong, Indian, New Zealand, and Singapore).

• Two recently published studies making use of ICE-PHI are that of Bautista (2008) on the validation of Philippine English grammatical features and Borlongan (2008) on tag questions in Philippine English. Both studies juxtapose Philippine English with other Englishes.

Page 30: Lecture 3 dr rachel edita roxas

Speech Processing

• Speech processing studies include automatic speech recognition (or speech-to-text) and text-to-speech. Studies that use the Filipino Speech Corpus (FSC) include speech-to-text (Cayaban, Climaco, Espina, & Guevara, 2001; Corpus, Liampo, Co, & Guevara, 2001; dela Vega, Co, & Guevara, 2002; Sagum, Ensomo, Tan, & Guevara, 2003; Tantan, Tan, & Guevara, 2003) and text-to-speech (Cayaban, et al., 2001; Co & Guevara, 2003; Corpus, et al., 2001; Espina, Tan, & Guevara, 2002; Tupas, Co, & Guevara, 2002).

Page 31: Lecture 3 dr rachel edita roxas

Speech Processing

• Other Filipino speech processing applications that were developed without the use of FSC include PinoyTalk (Casas, Rivera, Tan, & Villamil, 2004) and Tagapagsalita (Aralar, Coloso, Moneda, Ilao, & Cu, 2008), which use Filipino voice recordings as their corpus. Speaker identification and verification applications were also developed using a small corpus of 10 speakers, each with five recordings of their individual passwords (Go, Manza, Realeza, & Ting, 2001; Jacinto, Nario, See, & Umali, 2002).

Page 32: Lecture 3 dr rachel edita roxas

Speech Processing

• Ebarvia, Bayona, de Leon, Lopez, Guevara, Calingacion, and Naval (2008) developed a system that automatically recognizes emotions such as anger, boredom, happiness and satisfaction using an actual call center database. Chua, De Guia, Li, and Rojas (2009), on the other hand, came up with an application that recognizes emotions such as happiness, sadness, anger, fear, surprise, disgust, and neutral, using a corpus of 10,500 acted-emotion Filipino speech recordings.

Page 33: Lecture 3 dr rachel edita roxas

Sign Language Processing

• The Filipino Sign Language (FSL) is also included in attempts to come up with a corpus on Philippine languages.

• Work has been done on processing this data such as in FSL number recognition by Sandjaja and Marcos (2009) using color-coded gloves for feature extraction using digital signal processing.

Page 34: Lecture 3 dr rachel edita roxas

Color-coded Glove for FSL Number Recognition

Page 35: Lecture 3 dr rachel edita roxas

Language Applications for Teaching

• Instructional Aids

• Applications on Reading Comprehension

• Applications on Composition Writing

Page 36: Lecture 3 dr rachel edita roxas

• Automatic detection of code switching: ERDT Project from June 15, 2011 (for one year)

• Initial scope of the study: Textual information

• Next steps: audio

Page 37: Lecture 3 dr rachel edita roxas

Spel:IT

• SpeL:IT is a courseware that aids children with specific language impairment to differentiate and recall similar sounding words.

• It has 16 stories covering 46 similar-sounding CVC (consonant-vowel-consonant) words, presented through visual and auditory illustrations that can be replayed as often as needed. The stories end with lesson drills, practice activities and story assessment.

Page 38: Lecture 3 dr rachel edita roxas

Sample Screen at the Level of Word Recognition in Spel:IT

Page 39: Lecture 3 dr rachel edita roxas

SalinLahi

SalinLahi is an interactive learning environment for Filipino language learning for children six to eight years old.

The users interact with Popoy, a boy within the same age bracket, as well as his other family members and friends, and join him in his activities as the users go through the lessons.

Page 40: Lecture 3 dr rachel edita roxas

Interaction of SalinLahi’s User with Popoy

Page 41: Lecture 3 dr rachel edita roxas

Interaction of SalinLahi’s User with Popoy’s Family

Page 42: Lecture 3 dr rachel edita roxas

SalinLahi

• There are eleven lessons on basic Filipino and each lesson uses images, animation, interactive components, and audio and culminates with interactive exercises where feedback is immediately provided automatically. The application also keeps track of students’ progress.

Page 43: Lecture 3 dr rachel edita roxas

Popsicle

Popsicle is an intelligent tutoring system whose primary function is tutoring learners of English as a second language. It is software that identifies and corrects language errors committed by students while they are learning English.

Page 44: Lecture 3 dr rachel edita roxas

Popsicle

The software initially assesses the grammatical competence of the learner based on an input essay document composed by the user, identifies the grammatical errors in the document, provides feedback and suggestions in natural language, and generates lessons on grammar that are tailor-fit to the individual needs of the learner. The evaluation uses the zone of proximal development (ZPD) in determining the level of user that the system has to consider in marking the input essay.

Page 45: Lecture 3 dr rachel edita roxas

Zone of Proximal Development for an English Composition in Popsicle

Page 46: Lecture 3 dr rachel edita roxas

MesCH

• Fajardo, Di, Novenario, and Yu developed MesCH (Measurement System for Children’s Reading Comprehension) in 2008. The software accepts children’s stories and automatically generates multiple-choice questions to test the child’s reading comprehension.

• The program rephrases parts of the story into four wh-questions (who, what, when, where), sequence questions (which came first), and vocabulary questions.

Page 47: Lecture 3 dr rachel edita roxas

MesCH

• For example, given the sentence Slimy tadpoles came out from the eggs, the system generates the following possible stems:

1. What came out from the eggs?

2. Where did the slimy tadpoles come out?

3. In the sentence, “Slimy tadpoles came out from the eggs,” what does the verb “came out” mean?

4. In the sentence, “Slimy tadpoles came out from the eggs,” what does the adjective “slimy” mean?
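
The four stems above follow fixed patterns, which can be sketched with simple templates. This is only an illustration under strong assumptions: the sentence is annotated by hand in a hypothetical Clause record, whereas a real system such as MesCH would have to derive these parts automatically, and the slides do not describe how it does so.

```python
from dataclasses import dataclass

@dataclass
class Clause:
    """Hand-annotated parts of one story sentence (assumed input format)."""
    subject: str     # e.g. "Slimy tadpoles"
    verb_past: str   # verb as it appears in the story, e.g. "came out"
    verb_base: str   # base form used after "did", e.g. "come out"
    place: str       # e.g. "from the eggs"
    adjective: str   # e.g. "slimy"

def generate_stems(clause):
    """Build question stems from templates filled with the clause parts."""
    sentence = f"{clause.subject} {clause.verb_past} {clause.place}"
    return [
        f"What {clause.verb_past} {clause.place}?",
        f"Where did the {clause.subject.lower()} {clause.verb_base}?",
        f'In the sentence, "{sentence}," what does the verb '
        f'"{clause.verb_past}" mean?',
        f'In the sentence, "{sentence}," what does the adjective '
        f'"{clause.adjective}" mean?',
    ]

tadpoles = Clause("Slimy tadpoles", "came out", "come out", "from the eggs", "slimy")
for stem in generate_stems(tadpoles):
    print(stem)
```

The templates reproduce the wording of the four stems on this slide; the hard part in practice is the analysis that fills the record, not the template step.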

Page 48: Lecture 3 dr rachel edita roxas

MesCH

• The system considers principles in instructional assessment such as the formulation of four wh-questions and the construction of distractors through the use of entries in WordNet that relate with the correct answer.
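
One way to pick distractors that "relate with the correct answer" is to take words that share a parent concept with it. The sketch below assumes that strategy; the slides do not say which WordNet relation MesCH uses. The HYPERNYM table is a tiny invented stand-in, not real WordNet data, and distractors is a hypothetical helper.

```python
# Invented miniature word -> parent-concept table (not real WordNet data).
HYPERNYM = {
    "tadpole": "young animal",
    "kitten": "young animal",
    "puppy": "young animal",
    "duckling": "young animal",
    "banana": "fruit",
}

def distractors(answer, k=3):
    """Return up to k words that share the answer's parent concept."""
    parent = HYPERNYM.get(answer)
    if parent is None:
        return []
    siblings = sorted(
        word for word, p in HYPERNYM.items() if p == parent and word != answer
    )
    return siblings[:k]

print(distractors("tadpole"))  # ['duckling', 'kitten', 'puppy']
```

Sibling concepts are plausible but wrong, which is what a multiple-choice distractor needs to be.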

Page 49: Lecture 3 dr rachel edita roxas

HelloPol

HelloPol is a system wherein the user can dialogue in English with the system within the political domain (Alimario, Cabrera, Ching, Sia, & Tan, 2003). The main objective of the system is to provide answers to the users’ questions such that the user would not have an idea that it is actually a program he/she is conversing with. Thus, answers or replies of the system should be as natural as possible and should not be repetitive.

Page 50: Lecture 3 dr rachel edita roxas

HelloPol

For the system to perform as such it has been fed with political news articles and information extraction has been integrated into the system to automatically extract relevant information from the articles into a more structured type of representation for use in the question-answering system.

The user may ask factoid questions (who, what, when, where) and the program answers these by referring to the database of information.
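
The idea of answering factoid questions from structured records can be sketched as follows. Everything here is assumed for illustration: the FACTS records stand in for the output of information extraction, the names in them are placeholders rather than real news content, and the word-overlap matching in answer is a simple stand-in for whatever HelloPol actually does.

```python
# Placeholder records standing in for facts extracted from news articles.
FACTS = [
    {"who": "Senator A", "what": "signed the budget bill",
     "when": "on Monday", "where": "in Manila"},
    {"who": "Mayor B", "what": "opened the new bridge",
     "when": "on Friday", "where": "in Cebu"},
]

def answer(question):
    """Answer a who/what/when/where question from the best-matching record."""
    words = question.lower().rstrip("?").split()
    if not words or words[0] not in ("who", "what", "when", "where"):
        return None
    slot = words[0]
    query = set(words[1:])
    best, best_score = None, 0
    for fact in FACTS:
        # Score a record by how many question words appear in its other slots.
        others = " ".join(v for key, v in fact.items() if key != slot)
        score = len(query & set(others.lower().split()))
        if score > best_score:
            best, best_score = fact, score
    return best[slot] if best else None

print(answer("Who signed the budget bill?"))       # Senator A
print(answer("Where was the new bridge opened?"))  # in Cebu
```

The wh-word selects which slot to return, and the rest of the question selects which record to read it from.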

Page 51: Lecture 3 dr rachel edita roxas

Picture Books

• Picture Books generates stories for children from an input picture containing the background and a set of characters and object stickers (Solis, Siy, Tabirao, & Ong, 2009).

• The child chooses the stickers and the system associates these to a theme and a (manually-created) ontology which are then used to generate a fable-type of story.

Page 52: Lecture 3 dr rachel edita roxas

Screenshot of Picture Books’ Story Window

Page 53: Lecture 3 dr rachel edita roxas

Automatic Essay Evaluator

• Another application developed for composition writing is the automatic essay evaluator. The evaluator, which was developed by Cruz, Escutin, Estioko, and Plaza in 2003, evaluates large collections of essay-type documents using the latent semantic analysis (LSA) technique (Cruz, et al., 2003).

• Rule-based natural language parsing is used for the grammar checking of the input sentences, while LSA is used to evaluate the content. The system was trained on corpora containing pre-graded essays gathered from a particular high school class, which were graded by at least two human teachers according to three criteria: (1) Mechanics, (2) organization, and (3) content.
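
The content-scoring idea, comparing a new essay with pre-graded ones, can be sketched with cosine similarity over word counts. Note that this deliberately simplified sketch is not LSA itself: LSA would first reduce the term-document matrix with a singular value decomposition, a step omitted here. The function names and the two sample essays and grades are invented for illustration.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Represent a text as a bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict_grade(essay, graded_essays):
    """Return the grade of the most similar pre-graded essay."""
    vec = term_vector(essay)
    text, grade = max(graded_essays, key=lambda pair: cosine(vec, term_vector(pair[0])))
    return grade

# Invented training data: (essay text, teacher's grade).
graded = [
    ("the water cycle moves water through evaporation and rain", 90),
    ("my dog likes to play in the park", 60),
]
print(predict_grade("rain and evaporation are part of the water cycle", graded))  # 90
```

With the SVD step added, essays that use different but related words would also score as similar, which is the main reason LSA is preferred over raw word overlap.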

Page 54: Lecture 3 dr rachel edita roxas

Empathic Computing

• Empathic Research at the College of Computer Studies, De La Salle University

• Empathic computing is a marriage of many disciplines, notably affective computing, digital signal processing, social signal processing, sensor-rich ambient intelligence, ubiquitous computing, and machine learning.

• http://cehci.dlsu.edu.ph

Page 55: Lecture 3 dr rachel edita roxas

Objective:

It aims to build human-centered systems, with emphasis on feedback based on its user’s emotion, and system-initiated response, i.e. for assistance and support.

Page 56: Lecture 3 dr rachel edita roxas

Unique emotion & behavior

Unique mirroring feedback

Page 57: Lecture 3 dr rachel edita roxas

• Emotion Modeling.
– Software that automatically recognizes human emotion.
– Considers a person’s voice to determine emotion.
– Considers a person’s non-verbal cues to determine emotions.

Page 58: Lecture 3 dr rachel edita roxas

• Emotion detection from verbal and non-verbal cues

• Audio and video files can be used to detect emotions.

• Uses Digital Signal Processing

• (Cu, et al, 2010)

Page 59: Lecture 3 dr rachel edita roxas

Filipino Laughter

Page 60: Lecture 3 dr rachel edita roxas

Pinoy Laughter is interesting.

Page 61: Lecture 3 dr rachel edita roxas

5 Kinds of Laughter

• Natutuwa

• Kinikilig

• Nasasabik

• Nahihiya

• Mapanakit

Page 62: Lecture 3 dr rachel edita roxas

Recognition of Emotion

• 73% accurate using audio information.

• It seems that audio information is more reliable than facial features in identifying the emotion carried by laughter.

• Restraint was noted (laughter not freely expressed)

• In the Filipino context, laughter is sometimes used to mask negative emotions.

Page 63: Lecture 3 dr rachel edita roxas

Study of Rapport in Dialogues

• Rapport in Dialogues

• Point attention to the facial expressions of the members in the dialogue, the movement of the body, its position, the arms, gestures of the hands, the eye gaze, head movement and its tilting

• (Data was collected between a Filipino and a Japanese in Japan. Both used English as communication medium during this interaction.)

Page 64: Lecture 3 dr rachel edita roxas

Affective Mirroring

• A psychological phenomenon called affective mirroring occurs during an interaction.

• Typically, there is high rapport when the person you talk with imitates your body position, gestures, movement and expression during an interaction.

Page 65: Lecture 3 dr rachel edita roxas

Implications for Future Research

• Applications such as bilingual/multilingual translators, Philippine languages speech-to-text and text-to-speech systems for mobile and low-cost devices, speech training software, and dialogue analysis for data mining are just some prospects for future research for those involved in computation.

• As more applications are being made available for the languages that currently have corpora or available data, other languages should likewise be on the agenda of researchers working on computational and corpus linguistics. Of utmost importance is the documentation of endangered languages.

Page 66: Lecture 3 dr rachel edita roxas

Implications for Future Research

• Reference grammars of Philippine languages have been impressively prepared recently, like those of Daguman (2004) and Dita (2004), following in the footsteps of Schachter and Otanes (1972). But perhaps a more corpus-based, corpus-driven approach should be explored by Philippine linguists.

Page 67: Lecture 3 dr rachel edita roxas

Implications for Policy-Making

• Computational linguistics, language documentation, and the development of language applications should be a priority among policy-makers in education, research, and even government in particular. More sectors of the government should be more supportive of the endeavors of those working on this field of research.

Page 68: Lecture 3 dr rachel edita roxas

Implications for Policy-Making

• Those who make policies in education should also look into how these resources and applications could be mapped, ultimately, to the achievement of the goals of education in the country.

• And when one talks about technology in a developing country, one certainly has to touch on the issue of financing the technologies developed. There is obviously a big gap between what is ideal and what is realistic: no matter how advanced these technologies are, if the financial and material resources of a specific school are not enough, it will not be able to reap the benefits of technological progress.

Page 69: Lecture 3 dr rachel edita roxas

Implications for Teaching and Pedagogy

• Curriculum and materials developers should look into how these resources and applications could be integrated – and even streamlined – in the curriculum and textbooks of their respective schools and teaching contexts.

Page 70: Lecture 3 dr rachel edita roxas

• Classroom teachers can be as creative as they can be in using these resources and applications in the delivery of their instruction. The corpus could easily provide authentic examples in teaching languages, particularly in the teaching of less-documented languages and language varieties, as in the case of Philippine English versus “Standard” or American English. Teachers should also be easily aided by the assessment and evaluation tools when they provide evaluation to their students.

Page 71: Lecture 3 dr rachel edita roxas

Implications for Teaching and Pedagogy

• But it is also important that teachers, most especially those already in service, be trained on how to make use of these resources and applications in their teaching contexts. Admittedly, the advances afforded by new technology are not so easily adopted, especially by those who are used to more traditional instructional techniques.

Page 72: Lecture 3 dr rachel edita roxas

TED lecture: the Birth of a Word

• Deb Roy, The Birth of a Word (TED)

Page 73: Lecture 3 dr rachel edita roxas

The Role of Technology in Language Learning

Rachel Edita Roxas, PhD