Subword and spatiotemporal models for identifying actionable information in Haitian Kreyol

Jul 18, 2015

Robert Munro
Page 1: Subword and spatiotemporal models for identifying actionable information in Haitian Kreyol

Subword and spatiotemporal models for identifying actionable information in Haitian Kreyol

Robert Munro

Stanford University

CoNLL 2011

Munro, Robert. "Subword and spatiotemporal models for identifying actionable information in Haitian Kreyol." Proceedings of the Fifteenth Conference on Computational Natural Language Learning. Association for Computational Linguistics, 2011.

http://www.robertmunro.com/research/munro11kreyol.pdf

Page 2

January 12, 2010: the Haiti earthquake

Page 3

Messages start streaming in

Page 5

80,000 messages

Page 11

Crowdsourced (Mission 4636)

Page 12

Feedback

US Marines

◦ “Saving lives every day.”

FEMA

◦ “The most comprehensive and up-to-date map available to the humanitarian community.”

World Food Program

◦ “We delivered food to an informal camp of 2500 people that you identified for us.”

Page 13

Sudden-onset language processing

Page 17

Prioritization

Only 2% of messages were ‘actionable’:

◦ An identifiable location

◦ Medical, search and rescue (S+R), water, clustered food requests, security, unaccompanied children.

How can we prioritize the actionable items in the original Haitian Kreyol?

Can we leverage the models for sparser information sources?

Page 18

Evaluation data

Mission 4636: 40,811 text messages sent to a free number, ‘4636’, in Haiti (predominantly in Haitian Kreyol, with translations, UN-defined categories, and geolocation).

Radio Station: 7,528 text messages sent to a Haitian radio station.

Twitter: 63,195 Haiti-related tweets.

Page 19

Variation is the norm

Kreyol: mesi, mèsi, mèci, meci (French: merci), all meaning ‘thank you’

Abbrv.   Full form    Pattern   Meaning
s’on     se yon       sVn       is a
avèn     avèk nou     VvVn      with us
relem    rele mwen    relem     call me
wap      ou ap        uVp       you are
map      mwen ap      map       I will be
zanmim   zanmi mwen   zanmim    my friend
lavel    lave li      lavel     to wash (it)
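Exact word features treat every spelling of ‘mesi’ as an unrelated token. As a hedged illustration of why subword information helps, the sketch below uses two generic techniques (accent folding and character bigrams), not the subword model used in the paper; the function names are hypothetical.

```python
import unicodedata


def fold_accents(word: str) -> str:
    """Lowercase and strip diacritics: NFD splits 'è' into 'e' + a combining grave."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def char_ngrams(word: str, n: int = 2) -> set:
    """Character n-grams, with '<' and '>' marking the word boundaries."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Fraction of n-grams shared by two spellings (1.0 means identical sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    reference = char_ngrams(fold_accents("mesi"))
    for variant in ["mesi", "mèsi", "mèci", "meci", "merci"]:
        folded = fold_accents(variant)
        overlap = jaccard(reference, char_ngrams(folded))
        print(f"{variant:>6} -> {folded:<6} bigram overlap with 'mesi': {overlap:.2f}")
```

Accent folding alone merges mèsi with mesi and mèci with meci. The s/c alternation and the French merci still share some bigrams with mesi, which gives partial credit that whole-word features cannot.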

Page 20

Evaluation data

Page 21

Combining spatio-temporal models

Page 22

Streaming architecture

Build from initial items

[Figure: Model along a time axis]

Page 23

Streaming architecture

Predict (and evaluate) on incoming items

◦ (penalty for training)

[Figure: Model along a time axis]

Page 24

Streaming architecture

Repeat / retrain

[Figure: Model along a time axis]
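The build / predict / retrain cycle can be sketched as a loop over time-ordered epochs. This is a minimal sketch under stated assumptions: the item format, `streaming_evaluation`, and the toy `train_keyword_model` (a stand-in for the streaming MaxEnt used in the paper) are hypothetical, and the training-time penalty noted on the slides is not modeled. Counts are pooled over every epoch after the first.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical item format: (message text, gold "is actionable" label).
Item = Tuple[str, bool]
Model = Callable[[str], bool]


def streaming_evaluation(
    epochs: Sequence[List[Item]],
    train: Callable[[List[Item]], Model],
) -> Tuple[float, float, float]:
    """Train on epochs 1..k-1, predict epoch k, then retrain with epoch k added.

    The first epoch only builds the initial model, so scoring starts at the
    second. Returns pooled (precision, recall, F1) for the positive class.
    """
    seen: List[Item] = list(epochs[0])          # build from initial items
    tp = fp = fn = 0
    for epoch in epochs[1:]:
        model = train(seen)                     # (re)train on everything seen so far
        for text, gold in epoch:                # predict (and evaluate) on incoming items
            pred = model(text)
            tp += int(pred and gold)
            fp += int(pred and not gold)
            fn += int(gold and not pred)
        seen.extend(epoch)                      # repeat: the new items join the training set
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def train_keyword_model(items: List[Item]) -> Model:
    """Toy learner (hypothetical): flag messages sharing a word with any past positive."""
    cues = {w for text, gold in items if gold for w in text.split()}
    return lambda text: any(w in cues for w in text.split())


if __name__ == "__main__":
    toy_epochs = [
        [("nou bezwen dlo", True), ("mesi anpil", False)],
        [("bezwen manje", True), ("bonjou", False)],
        [("nou pa gen dlo", True), ("mesi", False)],
    ]
    print(streaming_evaluation(toy_epochs, train_keyword_model))
```

Because every prediction is made by a model that has not yet seen that item, the pooled score is an out-of-sample estimate, matching the predict-then-retrain order on the slides.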

Page 29

Features

G : Words and ngrams

W : Subword patterns

P : Source of the message

T : Time received

C : Categories (c0,...,47)

L : Location (longitude and latitude)

L : Has-location (a location is written in the message)
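A streaming MaxEnt model consumes sparse binary features, so one way to encode the feature groups above is as prefixed strings. This is a hypothetical sketch: the day-bucket encoding of T, the `sender` identifier, and `extract_features` are assumptions. W (whose patterns come from Munro and Manning 2010), C, and coordinate-based L features are omitted.

```python
from datetime import datetime
from typing import Set

# Assumption: time is bucketed as whole days since the earthquake (January 12, 2010).
EARTHQUAKE = datetime(2010, 1, 12)


def extract_features(text: str, sender: str, received: datetime,
                     has_location: bool) -> Set[str]:
    """Sparse binary features, each prefixed with its feature-group letter."""
    words = text.lower().split()
    feats = {f"G:{w}" for w in words}                            # G: words
    feats |= {f"G:{a}_{b}" for a, b in zip(words, words[1:])}    # G: word bigrams
    feats.add(f"P:{sender}")                                     # P: source (e.g. phone number)
    feats.add(f"T:day{(received - EARTHQUAKE).days}")            # T: time received
    if has_location:
        feats.add("L:has-location")                              # L: has-location
    return feats


if __name__ == "__main__":
    print(sorted(extract_features("Nou bezwen dlo", "sender-42",
                                  datetime(2010, 1, 14, 9, 30), True)))
```

Prefixing by group keeps identical strings from different groups (a word and a sender ID, say) from colliding, and makes it easy to switch a group on or off for ablations like the G / G, T / G, T, W, P comparisons.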

Page 30

Features

Subword models

◦ Full-forms and normalizations (Munro and Manning 2010)

Abbrv.   Full form    Pattern   Meaning
s’on     se yon       sVn       is a
avèn     avèk nou     VvVn      with us
relem    rele mwen    relem     call me
wap      ou ap        uVp       you are
map      mwen ap      map       I will be
zanmim   zanmi mwen   zanmim    my friend
lavel    lave li      lavel     to wash (it)
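The full-form idea can be illustrated with a lookup over the seven rows of the table. This is a hedged stand-in, not the learned subword model of Munro and Manning 2010: the table is hard-coded, and `expand_full_forms` and `subword_features` are hypothetical names. Keeping both the surface token and its expansion lets "relem" and "rele mwen" share the features "rele" and "mwen".

```python
import unicodedata
from typing import List, Set

# Abbreviation -> full form, taken from the rows of the slide's table.
FULL_FORMS = {
    "s’on": "se yon",
    "avèn": "avèk nou",
    "relem": "rele mwen",
    "wap": "ou ap",
    "map": "mwen ap",
    "zanmim": "zanmi mwen",
    "lavel": "lave li",
}


def tokens(text: str) -> List[str]:
    """Lowercased tokens in NFC form, so 'è' typed either way matches the table keys."""
    return unicodedata.normalize("NFC", text.lower()).split()


def expand_full_forms(text: str) -> List[str]:
    """Replace known abbreviations with their full forms; other tokens pass through."""
    out: List[str] = []
    for tok in tokens(text):
        out.extend(FULL_FORMS.get(tok, tok).split())
    return out


def subword_features(text: str) -> Set[str]:
    """Keep both the surface tokens and their expansions as 'W:' features."""
    return {f"W:{t}" for t in tokens(text) + expand_full_forms(text)}


if __name__ == "__main__":
    print(expand_full_forms("relem zanmim"))
    print(sorted(subword_features("map vini")))
```

A fixed table only covers spellings seen in advance; the point of a learned subword model is to generalize to unseen variants.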

Page 31

Time / Space / Source

Timestamp (in place of discounting)

Phone number of sender

Spatial tile-membership
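The slides do not spell out the tiling scheme. A minimal sketch, assuming a square latitude/longitude grid at several resolutions, turns each location into one tile-membership feature per resolution; the tile sizes and `tile_ids` are hypothetical.

```python
import math
from typing import List, Sequence

# Assumed tile edge lengths in degrees, from coarse to fine (hypothetical values).
TILE_SIZES = (1.0, 0.1, 0.01)


def tile_ids(lat: float, lon: float, sizes: Sequence[float] = TILE_SIZES) -> List[str]:
    """One tile-membership feature per resolution; nearby messages share coarse tiles."""
    ids = []
    for size in sizes:
        row = math.floor(lat / size)   # floor, not int(), so negative longitudes bin correctly
        col = math.floor(lon / size)
        ids.append(f"L:{size}:{row}:{col}")
    return ids


if __name__ == "__main__":
    print(tile_ids(18.54, -72.34))
    print(tile_ids(18.60, -72.30))
```

Coarse tiles let sparse requests from the same area pool their evidence. Fine tiles can still separate distinct sites, which matters for spotting clusters of requests such as the clustered food requests counted as actionable.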

Page 33

Additional streaming models

Message contains an identifiable location

Prediction for 47 categories

[Figure: four Model panels, each along a time axis]

Page 35

Final streaming model

Prediction for ‘is actionable’

[Figure: four Model panels, each along a time axis]

Combines features with predictions from the Category and Has-Location models
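The two-tier idea can be sketched as stacking: first-tier models (here has-location and a single category) output probabilities, and those are appended to the message features for the final ‘is actionable’ model. This is a minimal sketch, assuming a plain-SGD binary logistic regression (binary MaxEnt) and hypothetical toy data and names. It is not the paper's implementation.

```python
import math
from typing import Dict, List, Tuple

Features = Dict[str, float]
Row = Tuple[str, int, int, int]   # hypothetical: (text, has_location, water_request, actionable)


class BinaryMaxEnt:
    """Minimal binary MaxEnt (logistic regression) trained with plain SGD."""

    def __init__(self, epochs: int = 50, lr: float = 0.5) -> None:
        self.weights: Features = {}
        self.bias = 0.0
        self.epochs = epochs
        self.lr = lr

    def prob(self, feats: Features) -> float:
        """P(label = 1 | feats); z is clipped to avoid math.exp overflow."""
        z = self.bias + sum(self.weights.get(f, 0.0) * v for f, v in feats.items())
        z = max(-30.0, min(30.0, z))
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, data: List[Tuple[Features, int]]) -> "BinaryMaxEnt":
        """Stochastic gradient ascent on the log-likelihood."""
        for _ in range(self.epochs):
            for feats, label in data:
                err = label - self.prob(feats)
                self.bias += self.lr * err
                for f, v in feats.items():
                    self.weights[f] = self.weights.get(f, 0.0) + self.lr * err * v
        return self


def bag_of_words(text: str) -> Features:
    return {f"G:{w}": 1.0 for w in text.lower().split()}


def stack(feats: Features, tier1: Dict[str, BinaryMaxEnt]) -> Features:
    """Tier-2 input: the original features plus one probability per tier-1 model."""
    out = dict(feats)
    for name, model in tier1.items():
        out[f"pred:{name}"] = model.prob(feats)
    return out


def train_two_tier(rows: List[Row]) -> Tuple[Dict[str, BinaryMaxEnt], BinaryMaxEnt]:
    X = [bag_of_words(text) for text, *_ in rows]
    tier1 = {
        "has-location": BinaryMaxEnt().fit([(x, r[1]) for x, r in zip(X, rows)]),
        "cat:water": BinaryMaxEnt().fit([(x, r[2]) for x, r in zip(X, rows)]),
    }
    final = BinaryMaxEnt().fit([(stack(x, tier1), r[3]) for x, r in zip(X, rows)])
    return tier1, final


def score(text: str, tier1: Dict[str, BinaryMaxEnt], final: BinaryMaxEnt) -> float:
    return final.prob(stack(bag_of_words(text), tier1))


TOY: List[Row] = [
    ("nou bezwen dlo nan delmas", 1, 1, 1),
    ("bezwen dlo", 0, 1, 0),
    ("mesi anpil", 0, 0, 0),
    ("mesi nan carrefour", 1, 0, 0),
]

if __name__ == "__main__":
    tier1, final = train_two_tier(TOY)
    for text, *_ in TOY:
        print(f"{text!r}: P(actionable) = {score(text, tier1, final):.2f}")
```

For brevity, the toy run feeds the final model in-sample tier-1 predictions. In a streaming setup like the one on the slides, tier-1 predictions for incoming items come from models that have not yet seen those items, which avoids the optimistic bias of in-sample stacking.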

Page 36

Evaluation

100 training epochs

◦ Calculated on predictions over epochs 2-100

Comparison of the two-tier architecture with Oracle ‘has-location’ and ‘Categories’

Identification of actionable messages in Radio Station and Twitter messages (full results in paper)
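The results slides report F-scores (the abstract quotes F=0.21, 0.33, 0.86 and 0.90-0.97). For reference, here is the standard balanced F-measure, assuming true-positive, false-positive and false-negative counts ($TP_k$, $FP_k$, $FN_k$ for epoch $k$) are pooled over the scored epochs 2-100. The slides do not say whether scores are pooled or averaged per epoch.

```latex
P = \frac{\sum_{k=2}^{100} TP_k}{\sum_{k=2}^{100} \left( TP_k + FP_k \right)}, \qquad
R = \frac{\sum_{k=2}^{100} TP_k}{\sum_{k=2}^{100} \left( TP_k + FN_k \right)}, \qquad
F_1 = \frac{2 P R}{P + R}
```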

Page 37

Feature-based improvements (F-score)

Subwords and Source (G, T, W, P): 0.326

Temporal feature (G, T): 0.252

Words/Ngrams (G): 0.207

Page 38

Combined models (F-score)

All Features/Models: 0.855

Spatial clusters (L): 0.756

Words/Ngrams (G): 0.207

Page 39

Outperforming the oracle (F-score)

Location (two-tier prediction): 0.310

Location (oracle): 0.274

Words/Ngrams (G): 0.207

Page 40

Negative results from filtering (F-score)

Oracle true-neg filtering: 0.428

All Features/Models: 0.855

Words/Ngrams (G): 0.207

Page 41

Social media (F-score)

Twitter: 0.969

Radio station: 0.904

All Features/Models (Mission 4636): 0.855

Page 42

Conclusions (usability)

Subword and spatio-temporal models can give a 10-fold increase in prioritization

Adding multi-tiered streaming models can give a 50-fold increase in prioritization

Cross-domain adaptation is possible for need(le)-in-haystack information extraction from social media

Page 43

Questions?

Page 44

Appendix: abstract

Crisis-affected populations are often able to maintain digital communications but in a sudden-onset crisis any aid organizations will have the least free resources to process such communications. Information that aid agencies can actually act on, ‘actionable’ information, will be sparse so there is great potential to (semi)automatically identify actionable communications. However, there are hurdles as the languages spoken will often be underresourced, have orthographic variation, and the precise definition of ‘actionable’ will be response-specific and evolving. We present a novel system that addresses this, drawing on 40,000 emergency text messages sent in Haiti following the January 12, 2010 earthquake, predominantly in Haitian Kreyol. We show that keyword/ngram-based models using streaming MaxEnt achieve up to F=0.21 accuracy. Further, we find current state-of-the-art subword models increase this substantially to F=0.33 accuracy, while modeling the spatial, temporal, topic and source contexts of the messages can increase this to a very accurate F=0.86 over direct text messages and F=0.90-0.97 over social media, making it a viable strategy for message prioritization.