School of Computing, FACULTY OF ENGINEERING
Natural Language Processing aka Computational Linguistics aka Text Analytics: Introduction and overview
Eric Atwell, Language Research Group
(with thanks to Katja Markert, Marti Hearst, and other contributors)
• Thanks to many others for much of the material; particularly…
• Katja Markert, Reader, School of Computing, Leeds University http://www.comp.leeds.ac.uk/markert http://www.comp.leeds.ac.uk/lng
• Marti Hearst, Associate Professor, School of Information, University of California at Berkeley http://www.ischool.berkeley.edu/people/faculty/martihearst http://courses.ischool.berkeley.edu/i256/f06/sched.html
Today
Module Objectives
Why NLP is difficult: language is a complex system
How to solve it? Corpus-based machine-learning approaches
Motivation: applications of “The Language Machine”
Objectives
On completion of this module, students should be able to:
• understand theory and terminology of empirical modelling of natural language;
• understand and use algorithms, resources and techniques for implementing and evaluating NLP systems;
• be familiar with some of the main language engineering and text analytics application areas;
• appreciate why unrestricted natural language processing is still a major research task.
Goals of this Module
Learn about the problems and possibilities of natural language analysis:
• What are the major issues?
• What are the major solutions?
• How well do they work?
• How do they work?
At the end you should:
• Agree that language is subtle and interesting!
• Feel some ownership over the algorithms
• Be able to assess NLP problems
• Know which solutions to apply when, and how
• Be able to read research papers in the field
Why is NLP difficult?
Computers are not brains
• There is evidence that much of language understanding is built into the human brain
Computers do not socialize
• Much of language is about communicating with people
Key problems:
• Representation of meaning
• Language presupposes knowledge about the world
• Language is ambiguous: a message can have many interpretations
• Language presupposes communication between people
2001: A Space Odyssey (1968)
Dave Bowman: “Open the pod bay doors, HAL”
HAL 9000: “I’m sorry Dave. I’m afraid I can’t do that.”
Hidden Structure
English plural pronunciation
• Toy + s → toyz ; add z
• Book + s → books ; add s
• Church + s → churchiz ; add iz
• Box + s → boxiz ; add iz
• Sheep + s → sheep ; add nothing
What about new words?
• Bach + ’s → baXs ; why not baXiz?
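The hidden rule behind these examples can be sketched in code. This is a minimal illustration, assuming the ending depends on the last sound of the stem rather than its spelling. The sound classes, their ad hoc symbols, and the function names are assumptions made for this sketch, not a full phonology of English.

```python
# Hypothetical, simplified sound classes (ad hoc symbols, not IPA).
SIBILANTS = {"s", "z", "sh", "zh", "ch", "j"}   # hissing and hushing sounds
VOICELESS = {"p", "t", "k", "f", "th", "x"}     # voiceless sounds that are not sibilants

# Memorized exceptions bypass the rule entirely.
IRREGULAR = {"sheep": "sheep"}


def plural_suffix(final_sound: str) -> str:
    """Choose the plural ending from the last sound of the stem."""
    if final_sound in SIBILANTS:
        return "iz"
    if final_sound in VOICELESS:
        return "s"
    return "z"  # vowels and voiced consonants


def pluralize(word: str, final_sound: str) -> str:
    """Return the stem plus its ending, or the memorized irregular form."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    return word + "+" + plural_suffix(final_sound)


# (word, last sound) pairs; "box" ends in the sound s, "Bach" in the sound X.
EXAMPLES = [("toy", "oy"), ("book", "k"), ("church", "ch"),
            ("box", "s"), ("sheep", "p"), ("Bach", "x")]

for word, final_sound in EXAMPLES:
    print(word, "->", pluralize(word, final_sound))
```

Under these assumptions "Bach" gets s rather than iz, because its final sound X is voiceless but not a sibilant. Speakers apply the rule to a sound that ordinary English words do not even end in.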
Language subtleties
Adjective order and placement
• A big black dog
• A big black scary dog
• A big scary dog
• A scary big dog
• A black big dog (sounds wrong)
Antonyms
• Which sizes go together?
• Big and little
• Big and small
• Large and small
• Large and little (sounds wrong)
World Knowledge is subtle
He arrived at the lecture.
He chuckled at the lecture.
He arrived drunk.
He chuckled drunk.
He chuckled his way through the lecture.
He arrived his way through the lecture.
Words are ambiguous: multiple functions and meanings
I know that.
I know that block.
I know that blocks the sun.
I know that block blocks the sun.
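A rough way to see the size of the problem is to list every tag sequence a dictionary allows for these sentences. The sketch below assumes a tiny hand-made lexicon. The tag names (PRON, DET, COMP, NOUN, VERB) and the entries are illustrative assumptions, not a standard tagset.

```python
from itertools import product

# Hypothetical toy lexicon: each word maps to the categories it might have.
LEXICON = {
    "i": {"PRON"},
    "know": {"VERB"},
    "that": {"PRON", "DET", "COMP"},   # pronoun, determiner, or complementizer
    "block": {"NOUN", "VERB"},
    "blocks": {"NOUN", "VERB"},
    "the": {"DET"},
    "sun": {"NOUN"},
}


def candidate_taggings(sentence: str):
    """Return every tag sequence the lexicon permits, ignoring grammar."""
    words = sentence.rstrip(".").split()
    options = [sorted(LEXICON[word.lower()]) for word in words]
    return [list(zip(words, tags)) for tags in product(*options)]


SENTENCES = [
    "I know that.",
    "I know that block.",
    "I know that blocks the sun.",
    "I know that block blocks the sun.",
]

for sentence in SENTENCES:
    print(len(candidate_taggings(sentence)), sentence)
```

With this lexicon the four sentences have 3, 6, 6 and 12 candidate taggings. A human reader settles on one reading for each sentence without noticing the alternatives; a tagger has to choose it explicitly.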
How can a machine understand these differences?
• Get the cat with the gloves.
How can a machine understand these differences?
• Get the sock from the cat with the gloves.
• Get the glove from the cat with the socks.
How can a machine understand these differences?
• Decorate the cake with the frosting.
• Decorate the cake with the kids.
• Throw out the cake with the frosting.
• Throw out the cake with the kids.
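The ambiguity in these examples is structural: "with the ..." can attach to the verb or to the nearest noun phrase. The sketch below counts the parse trees a toy grammar allows, using a CYK-style chart. The grammar rules, category names and lexicon are assumptions made for illustration only, and cover just the "get" and "decorate" sentences.

```python
from collections import defaultdict

# Hypothetical toy grammar in binary form: (parent, left child, right child).
BINARY_RULES = [
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # the PP modifies the action
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # the PP modifies the noun phrase
    ("PP", "P", "NP"),
]

LEXICON = {
    "get": "V", "decorate": "V",
    "the": "Det",
    "cat": "N", "gloves": "N", "glove": "N", "sock": "N", "socks": "N",
    "cake": "N", "frosting": "N", "kids": "N",
    "with": "P", "from": "P",
}


def count_parses(sentence: str, root: str = "VP") -> int:
    """Count the distinct parse trees for an imperative sentence."""
    words = sentence.lower().rstrip(".").split()
    n = len(words)
    chart = defaultdict(int)  # (start, end, category) -> number of trees
    for i, word in enumerate(words):
        chart[i, i + 1, LEXICON[word]] = 1
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[start, end, parent] += (
                        chart[start, mid, left] * chart[mid, end, right]
                    )
    return chart[0, n, root]


for sentence in [
    "Get the cat with the gloves.",
    "Get the sock from the cat with the gloves.",
    "Decorate the cake with the frosting.",
    "Decorate the cake with the kids.",
]:
    print(count_parses(sentence), sentence)
```

Under this grammar the first sentence has 2 parses and the second has 5. Both "decorate" sentences have 2, so syntax alone cannot tell frosting from kids; picking the intended reading needs knowledge about the world.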
News Headline Ambiguity
Iraqi Head Seeks Arms
Juvenile Court to Try Shooting Defendant
Teacher Strikes Idle Kids
Kids Make Nutritious Snacks
British Left Waffles on Falkland Islands
Red Tape Holds Up New Bridges
Bush Wins on Budget, but More Lies Ahead
Hospitals are Sued by 7 Foot Doctors
(Headlines leave out punctuation and function-words)
Lynne Truss, 2003. Eats, Shoots & Leaves: The Zero Tolerance Approach to Punctuation
The Role of Memorization
Children learn words quickly
• Around age two they learn about 1 word every 2 hours.
• (Or 9 words/day)
• Often only need one exposure to associate meaning with word
• Can make mistakes, e.g., overgeneralization
“I goed to the store.”
• Exactly how they do this is still under study
Adult vocabulary
• Typical adult: about 60,000 words
• Literate adults: about twice that.
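As a back-of-envelope check, the learning rate and the adult vocabulary figure can be compared directly. The sketch below assumes, purely for illustration, that the rate of 9 words/day stays constant for 18 years; the slides make no such claim, and the variable names are hypothetical.

```python
WORDS_PER_DAY = 9      # from the slide
HOURS_PER_WORD = 2     # from the slide: about 1 word every 2 hours
YEARS_OF_LEARNING = 18 # assumption: constant rate over 18 years

# 9 words/day at 1 word per 2 hours implies about 18 learning hours per day.
implied_hours_per_day = WORDS_PER_DAY * HOURS_PER_WORD

# Total words learned if the rate never changed.
estimated_vocabulary = WORDS_PER_DAY * 365 * YEARS_OF_LEARNING

print(implied_hours_per_day)   # 18
print(estimated_vocabulary)    # 59130
```

Under that assumption the total comes to 59,130 words, the same order of magnitude as the 60,000 quoted for a typical adult.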
The Role of Memorization
Dogs can do word association too!
• Rico, a border collie in Germany
• Knows the names of each of 100 toys
• Can retrieve items called out to him with over 90% accuracy.
• Can also learn and remember the names of unfamiliar toys after just one encounter, putting him on a par with a three-year-old child.