Natural Language Processing for Underground Communications Dan Klein MURI Kickoff, 11/20/2009

Jan 08, 2016

Page 1: Natural Language Processing for Underground Communications

Natural Language Processing for Underground Communications

Dan Klein

MURI Kickoff, 11/20/2009

Page 2: Natural Language Processing for Underground Communications

Underground Communications

Example Data

Page 3: Natural Language Processing for Underground Communications

Underground Communications

Example Data, Manual Extraction

Page 4: Natural Language Processing for Underground Communications

Processing: Information Extraction

Page 5: Natural Language Processing for Underground Communications

Observation Graphs

http://www.spam-reklama.ru/contact.html

http://www.rossmail.ru/offline.htm

http://www.fax-reklama.ru/contact.html

http://www.f-mail.ru/kontact/

Page 6: Natural Language Processing for Underground Communications

Underlying Entities and Relations

Person 1211
  Alias: Steakcap
  ICQ: 598199837
  Location: France

Referral
  From: Person 2133
  To: Person 1211
  Product: 3319

Person 2133
  Alias: Thunderelvi
  ICQ: 787659871
  Location: USA

Product 3319
  Type: FB Harvester
  Contact: 709-324-0989

Person 9876
  Alias: Zakar
  ICQ: 234150301
  Email: zakar@e-...

Employee
  Person: Person 9876
  Product: 5621
  Role: Developer

Product 5621
  Type: Spam Sender
  Contact: 495-210-4423

Extraction Goal
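
The entity and relation records on this slide can be read as a small typed schema. The following is a minimal sketch of one way to represent them, not the representation used in the talk: the class names, field names, and the `describe` helper are assumptions made for illustration, and only the field values come from the slide. The truncated email address for Person 9876 is left unset rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    id: int
    alias: str
    icq: str
    location: Optional[str] = None
    email: Optional[str] = None


@dataclass(frozen=True)
class Product:
    id: int
    type: str
    contact: str


@dataclass(frozen=True)
class Referral:
    from_person: int  # Person id
    to_person: int    # Person id
    product: int      # Product id


@dataclass(frozen=True)
class Employee:
    person: int   # Person id
    product: int  # Product id
    role: str


# Values copied from the slide; the dictionary layout is an assumption.
people = {
    1211: Person(1211, "Steakcap", "598199837", location="France"),
    2133: Person(2133, "Thunderelvi", "787659871", location="USA"),
    9876: Person(9876, "Zakar", "234150301"),  # email is truncated in the source
}
products = {
    3319: Product(3319, "FB Harvester", "709-324-0989"),
    5621: Product(5621, "Spam Sender", "495-210-4423"),
}
referrals = [Referral(from_person=2133, to_person=1211, product=3319)]
employees = [Employee(person=9876, product=5621, role="Developer")]


def describe(r: Referral) -> str:
    """Hypothetical helper: render a referral using aliases and product type."""
    src = people[r.from_person].alias
    dst = people[r.to_person].alias
    return f"{src} -> {dst} ({products[r.product].type})"
```

Keeping relations as separate records that point at entity ids, rather than nesting them inside the entities, mirrors the slide's layout and makes it straightforward to merge entities later without rewriting the relations.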

Page 7: Natural Language Processing for Underground Communications

Existing NLP Tasks

Page 8: Natural Language Processing for Underground Communications

Discourse Structure

sign deliver vote

Page 9: Natural Language Processing for Underground Communications

General Approach

Page 10: Natural Language Processing for Underground Communications
Page 11: Natural Language Processing for Underground Communications
Page 12: Natural Language Processing for Underground Communications
Page 13: Natural Language Processing for Underground Communications
Page 14: Natural Language Processing for Underground Communications

An Entity Reference Model

Our Existing Approach

Page 15: Natural Language Processing for Underground Communications
Page 16: Natural Language Processing for Underground Communications
Page 17: Natural Language Processing for Underground Communications

Adding Semantic Knowledge

America Online company

Our Current Work

Page 18: Natural Language Processing for Underground Communications

Evaluation: Reference

[Chart: MUC F1 (cluster similarity), unsupervised vs. supervised systems. Systems compared: Unsupervised Baseline; Bengtson & Roth 08; Preliminary Current Work.]

Does it Work?
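
For readers unfamiliar with the metric on this slide, here is a minimal sketch of the link-based MUC score, assuming the slide uses the standard definition: recall sums, over each gold cluster, its size minus the number of pieces the system output splits it into, divided by its size minus one; precision swaps the roles of gold and system clusters. The function names and the toy clusters are illustrative assumptions, not data from the talk.

```python
def muc_recall(key, response):
    """Link-based MUC recall of `response` clusters against `key` clusters.

    Both arguments are lists of sets of mention ids.
    """
    # Map every mention to the index of the response cluster containing it.
    mention_to_cluster = {}
    for idx, cluster in enumerate(response):
        for mention in cluster:
            mention_to_cluster[mention] = idx

    numerator = 0
    denominator = 0
    for cluster in key:
        pieces = set()
        missing = 0
        for mention in cluster:
            if mention in mention_to_cluster:
                pieces.add(mention_to_cluster[mention])
            else:
                missing += 1  # an unmatched mention counts as its own piece
        numerator += len(cluster) - (len(pieces) + missing)
        denominator += len(cluster) - 1
    return numerator / denominator if denominator else 0.0


def muc_f1(key, response):
    """Harmonic mean of MUC precision and recall."""
    recall = muc_recall(key, response)
    precision = muc_recall(response, key)  # precision is recall with roles swapped
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Toy example: one gold entity with four mentions, split in two by the system.
gold = [{"A", "B", "C", "D"}]
system = [{"A", "B"}, {"C", "D"}]
score = muc_f1(gold, system)  # recall 2/3, precision 1, F1 0.8
```

Because the score counts missing links rather than misplaced mentions, singleton clusters contribute nothing to either the numerator or the denominator.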

Page 19: Natural Language Processing for Underground Communications

Cross-Document Identity

What's Coming Up
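
To make the cross-document identity problem concrete, here is a deliberately simple sketch that merges records sharing an exact identifier using union-find. This is a rule-based stand-in and not the unsupervised statistical approach the talk proposes. The `resolve` function, the record layout, and the second alias spelling are hypothetical; only the alias and ICQ values for Steakcap and Thunderelvi come from the slides.

```python
def resolve(records, fields=("icq", "email")):
    """Group record indices that share an exact value in any of `fields`."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}  # (field, value) -> index of first record carrying it
    for i, record in enumerate(records):
        for field in fields:
            value = record.get(field)
            if not value:
                continue
            key = (field, value)
            if key in first_seen:
                union(i, first_seen[key])
            else:
                first_seen[key] = i

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Mentions as they might appear in separate documents.
mentions = [
    {"alias": "Steakcap", "icq": "598199837"},
    {"alias": "steak_cap", "icq": "598199837"},  # hypothetical alias variant
    {"alias": "Thunderelvi", "icq": "787659871"},
]
groups = resolve(mentions)  # [[0, 1], [2]]
```

Exact matching of this kind fails as soon as identifiers are missing, obfuscated, or shared, which is part of why the talk argues for models that extract and deduplicate jointly.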

Page 20: Natural Language Processing for Underground Communications

Extracting Global Entities

Page 21: Natural Language Processing for Underground Communications

Underlying Entities and Relations

Person 1211
  Alias: Steakcap
  ICQ: 598199837
  Location: France

Referral
  From: Person 2133
  To: Person 1211
  Product: 3319

Person 2133
  Alias: Thunderelvi
  ICQ: 787659871
  Location: USA

Product 3319
  Type: FB Harvester
  Contact: 709-324-0989

Person 9876
  Alias: Zakar
  ICQ: 234150301
  Email: zakar@e-...

Employee
  Person: Person 9876
  Product: 5621
  Role: Developer

Product 5621
  Type: Spam Sender
  Contact: 495-210-4423

Subsequent Goals

Page 22: Natural Language Processing for Underground Communications

Summary

Goal: systems which simultaneously extract and dedupe
  Train in an unsupervised / discovery manner
  Requires: both new statistical machinery and good models of underlying domain structure (transactions, etc.)
  Requires: processing domain-specific language (domain adaptation, grammar induction)

Evaluation: are the entities and relations correct?
  First steps: measure general approach on newswire, etc., where we know the right answers
  Also: evaluate on underground network data

Near term: increased accuracy in identity resolution, begin to extract simple relations, better basic analysis

Page 23: Natural Language Processing for Underground Communications

Thanks!