Final presentation

Natural Language Processing

Can artificial systems acquire language the same way children do?

Andrea Hill – COMP 670

Introduction

I studied first and second language acquisition earlier in my college career and have always been interested in how we process language.

Natural language processing and machine translation are still burgeoning fields.

Can we teach a machine to ‘learn’ as we do?

Not a unique idea

Alan Turing, 1950:

“Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child's? If this were then subjected to an appropriate course of education one would obtain the adult brain.”

Introducing HAL

Meet Hal. Like any 18-month-old toddler, he likes bananas, toys and playing in the park. He especially enjoys bedtime stories.

But while other children are flesh and blood, Hal is a chain of algorithms — a computer program that is being raised as a child and taught to speak through experiential learning in the same way as human children.

- Indiana Tribune, August 2001

Hal has already passed an adaptation of the Turing test, in which he exhibited the language skills of a 15-month-old. His creators believe he will pass the Turing test in about a decade.

Where to begin

1. Collect a large amount of data on children's utterances. These should be categorized by the age of the child so that we can see how quickly children come to handle more complicated structures.

2. Analyze the utterances in terms of their grammatical elements. Perceived ungrammaticalities will be flagged. This will eventually help us to see if there are specific structures that pose particular challenges.

3. A second stage will involve looking at what input elicited the children’s utterances. Is the child repeating elements in the initial question, either verb conjugations or syntax? Do certain words form a semantic grouping?
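The three steps above could be organized in code roughly as follows. This is a minimal sketch, assuming a hypothetical Utterance record (the child's age in months, the utterance text, the adult input that elicited it, and a manual ungrammaticality flag); the six-month age bands and the word-overlap measure of repetition are illustrative choices, not part of the original plan.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Utterance:
    # Hypothetical record for one child utterance; the field names are illustrative.
    age_months: int
    text: str
    prompt: str            # adult input that elicited the utterance (step 3)
    flagged: bool = False  # marked ungrammatical during analysis (step 2)


def words(s):
    """Lower-case word tokens, ignoring punctuation."""
    return re.findall(r"[a-z']+", s.lower())


def group_by_age(utterances, band_months=6):
    """Step 1: bucket utterances into age bands keyed by the band's starting month."""
    groups = defaultdict(list)
    for u in utterances:
        groups[(u.age_months // band_months) * band_months].append(u)
    return dict(sorted(groups.items()))


def flagged_rate(group):
    """Step 2: share of utterances in a group that were flagged as ungrammatical."""
    return sum(u.flagged for u in group) / len(group)


def prompt_overlap(u):
    """Step 3: fraction of the child's words that also occur in the eliciting input."""
    child = words(u.text)
    if not child:
        return 0.0
    prompt = set(words(u.prompt))
    return sum(w in prompt for w in child) / len(child)
```

A high overlap score would suggest the child is reusing material from the question, which is the kind of pattern step 3 is meant to surface.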

Implementation

Use XML, creating a custom DTD to describe the grammar.

I quickly saw how daunting this task was (a separate DTD would also be needed for each language).

In retrospect, a different project could have used the input to drive the creation of the DTD (i.e., the grammar itself) rather than attempting to formulate a grammar that encompassed all possible input.
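As a minimal sketch of that input-driven alternative, the Python below reads utterances already annotated in XML and writes one DTD element declaration per tag, listing every sequence of child elements actually observed. The tag names (S, NP, VP, N, V) and the function infer_dtd are hypothetical. The sketch assumes an element is either always a text leaf or always has element children, and it does not factor overlapping alternatives, so a strict validator may reject some generated content models as non-deterministic.

```python
import xml.etree.ElementTree as ET


def infer_dtd(utterances_xml):
    """Derive DTD element declarations from the structures seen in annotated input."""
    # tag -> set of observed child-tag sequences (dict keeps first-seen tag order)
    observed = {}
    for text in utterances_xml:
        root = ET.fromstring(text)
        for elem in root.iter():  # every element, in document order
            seq = tuple(child.tag for child in elem)  # direct children only
            observed.setdefault(elem.tag, set()).add(seq)

    lines = []
    for tag, seqs in observed.items():
        nonempty = sorted(s for s in seqs if s)
        if not nonempty:
            # No element children ever seen: treat the element as a text leaf.
            model = "(#PCDATA)"
        else:
            groups = ["(" + ",".join(s) + ")" for s in nonempty]
            # One observed sequence -> plain sequence; several -> a choice between them.
            model = groups[0] if len(groups) == 1 else "(" + "|".join(groups) + ")"
        lines.append(f"<!ELEMENT {tag} {model}>")
    return "\n".join(lines)
```

Fed the annotated utterances <S><NP><N>doggie</N></NP><VP><V>go</V></VP></S> and <S><VP><V>want</V><NP><N>cookie</N></NP></VP></S>, it produces declarations such as <!ELEMENT S ((NP,VP)|(VP))> and <!ELEMENT VP ((V)|(V,NP))>, so the grammar grows only as far as the input requires.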

How do you ‘create a grammar’?

This is a fundamental question when it comes to child language acquisition.

There are two main theories as to how children learn language:

First, the child starts with a more or less blank slate, hears parents talk, mimics them, says things, hears corrections, and learns the language.

Second, the child starts with universal grammar as an innate endowment, already there, with all its parametrized options, and has only to hear how the options need to be set in his/her particular language to attain the full grammar of that language.

- Universal Language and Linguistics

Children create utterances they have never heard before, so they cannot simply be mimicking others.

There are certain classes of errors that children make, and others they never make. This implies some underlying rules:

Ask Jabba if the boy [who is watching Mickey] is happy.

a. The structure-dependent response: Is [the boy who is watching Mickey] _ happy?

b. The non-structure-dependent response: *Is [the boy who _ watching Mickey] is happy?

There were actually four different responses returned from the children:

Is [the boy who is watching Mickey] happy? 38%
*Is [the boy who _ watching Mickey] is happy? 0%
*Is [the boy who is watching Mickey] is happy? 58%
*Is [the boy who is watching Mickey], is he happy? 22%
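The contrast between responses a and b can be made concrete with two toy rules over plain strings. This is an illustrative sketch, not the procedure used in the study: linear_question fronts the first 'is' it meets, while structure_dependent_question fronts the 'is' that follows the bracketed subject, with the bracketing supplied by hand rather than by a parser.

```python
def linear_question(sentence):
    """Structure-independent rule: front the first 'is' in the word string."""
    words = sentence.split()
    i = words.index("is")
    rest = words[:i] + words[i + 1:]
    return " ".join(["Is"] + rest) + "?"


def structure_dependent_question(bracketed):
    """Structure-dependent rule: front the 'is' that follows the [subject] constituent.

    The subject is marked by hand with square brackets, e.g.
    "[the boy who is watching Mickey] is happy"; any 'is' inside it is left alone.
    """
    subject, predicate = bracketed[1:].split("] ", 1)
    words = predicate.split()
    if words[0] != "is":
        raise ValueError("expected the main-clause 'is' right after the subject")
    return " ".join(["Is", subject] + words[1:]) + "?"
```

Applied to "the boy who is watching Mickey is happy", the linear rule returns "Is the boy who watching Mickey is happy?", the form no child produced, whereas the structure-dependent rule returns "Is the boy who is watching Mickey happy?".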

Universal Grammar

Universal grammar is a theory of linguistics postulating principles of grammar shared by all languages, thought to be innate to humans. It attempts to explain language acquisition in general, not describe specific languages.

- en.wikipedia.org/wiki/Universal_grammar


Principles and parameters

The central idea of principles and parameters is that a person's syntactic knowledge can be modelled with two formal mechanisms:

A finite set of fundamental principles that are common to all languages; e.g., that a sentence must always have a subject, even if it is not overtly pronounced.

A finite set of parameters that determine syntactic variability amongst languages; e.g., a binary parameter that determines whether or not the subject of a sentence must be overtly pronounced (this example is sometimes referred to as the Pro-drop parameter).
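A minimal sketch of the two mechanisms, using the pro-drop example: the principle is built into the data type (every Clause has a subject slot, which may be empty), and a single binary parameter decides whether an empty slot is allowed. The names Clause, licensed, ENGLISH and ITALIAN are hypothetical; English and Italian are used because English requires overt subjects while Italian allows them to be dropped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clause:
    # Principle (shared by all languages): every clause has a subject position.
    # The slot always exists here; None means it is present but not pronounced.
    subject: Optional[str]
    verb: str


# Parameters: binary settings that vary across languages (only pro-drop shown).
ENGLISH = {"pro_drop": False}
ITALIAN = {"pro_drop": True}


def licensed(clause, params):
    """True if the clause is allowed under the given parameter settings."""
    if clause.subject is None:
        # An unpronounced subject is allowed only where pro-drop is switched on.
        return params["pro_drop"]
    return True
```

For example, licensed(Clause(None, "parla"), ITALIAN) is True, while licensed(Clause(None, "speaks"), ENGLISH) is False.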

New design

Refer to the framework to identify the various parameters.

Create a DTD generic enough to allow for the various parameter settings.

Default settings will be identified. Only with positive input would alternative parameter values be set. This is the concept known as markedness.
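The markedness idea can be sketched as a learner that starts from default values and moves a parameter to its marked value only when the input contains positive evidence for it. Everything here is hypothetical: the parameter names, the cue names in TRIGGERS, and especially which value counts as the default, which is an open empirical question that this sketch does not settle.

```python
from dataclasses import dataclass, field

# Hypothetical default (unmarked) values; placeholders, not established facts.
DEFAULTS = {"pro_drop": False, "verb_before_object": True}

# Hypothetical cues: an input feature that counts as positive evidence for a marked value.
TRIGGERS = {
    "null_subject": ("pro_drop", True),
    "object_before_verb": ("verb_before_object", False),
}


@dataclass
class Learner:
    # Each learner starts from its own copy of the defaults.
    params: dict = field(default_factory=lambda: dict(DEFAULTS))

    def observe(self, cues):
        """Set a parameter to its marked value only on positive evidence."""
        for cue in cues:
            if cue in TRIGGERS:
                name, marked_value = TRIGGERS[cue]
                self.params[name] = marked_value
        # The absence of a cue is never treated as evidence, so nothing reverts to a default.
```

Because only positive evidence moves a setting, a learner that never hears a null subject simply keeps pro_drop at its default.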

Benefits?

Builds on a widely accepted framework

Less biased toward input from a limited number of individuals

Easily extensible

Still not a novel idea

Close to 20 years ago, Vivian Cook wrote a paper on PAL (Program for Acquiring Language), a parser that would use Government and Binding theory to parse input.

PAL takes in some data from a user, sets certain parameters based on previous input, and then attempts to parse new input using these rules. It offers specific feedback on how the input was parsed, as well as whether it conformed to the parameter settings or not.
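A rough sketch of the PAL-style loop described above: a parameter is set from earlier input, and each new clause is then analysed and reported as conforming or not. This is not Cook's program; the word-order parameter verb_before_object, the (word, role) tagging, and the majority-vote setting rule are assumptions chosen for brevity, and every clause is assumed to contain a verb and an object.

```python
from collections import Counter

# A clause is a list of (word, role) pairs with roles "S", "V", "O" (hypothetical tagging).


def verb_before_object(clause):
    """True if the verb precedes the object in the clause."""
    roles = [role for _, role in clause]
    return roles.index("V") < roles.index("O")


def set_parameters(training):
    """Set the hypothetical word-order parameter from the majority of earlier input."""
    votes = Counter(verb_before_object(c) for c in training)
    # Ties fall back to verb-before-object; this default is an assumption.
    return {"verb_before_object": votes[True] >= votes[False]}


def parse_with_feedback(clause, params):
    """Report how a new clause was analysed and whether it fits the current settings."""
    observed = verb_before_object(clause)
    return {
        "roles": " ".join(role for _, role in clause),
        "order": "VO" if observed else "OV",
        "conforms": observed == params["verb_before_object"],
    }
```

After training on "Mary eats apples" and "John reads books" (both VO), the clause "Mary apples eats" is reported with roles "S O V", order "OV", and conforms False.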

Conclusion

HAL has seen some success.

PAL worked within its domain.

By looking at how children acquire language, we may be able to develop a system that can ‘learn’ by modifying its parameters based on the input it receives.

More studies will eventually show us whether “artificial systems can acquire language the same way children do”.
