The Role of CNL and AMR in Scalable Abstractive Summarization for Multilingual Media Monitoring Normunds Grūzītis and Guntis Bārzdiņš University of Latvia, IMCS National information agency LETA 5th Workshop on Controlled Natural Language, 25–26 July 2016, Aberdeen, Scotland
Large-scale media monitoring
BBC Monitoring journalists translate from 30 languages into English and follow 400 social media accounts every day.
A monitoring journalist typically monitors 4 TV channels and several online sources simultaneously. This is about the maximum that any person can cope with mentally and physically. The required human effort thus scales linearly with the number of monitored sources.
Monitoring journalists constantly need to be on the lookout for more sources and follow important stories—but as it is, they are tied down with mundane, routine monitoring tasks.
Monitoring 250 video channels results in a daily buffer of 2.5 TB, a weekly buffer of 19 TB, and an annual buffer of 1 PB.
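The buffer figures above follow from the daily rate; a quick back-of-the-envelope check (assuming a constant ingest rate, with the slide's weekly figure apparently rounded up):

```python
# Sanity check of the buffer sizes quoted above, assuming a constant
# ingest rate of 2.5 TB/day for the 250 monitored channels.
daily_tb = 2.5

weekly_tb = daily_tb * 7            # 17.5 TB, quoted as ~19 TB
annual_pb = daily_tb * 365 / 1000   # ~0.91 PB, quoted as ~1 PB

print(f"weekly: {weekly_tb:.1f} TB, annual: {annual_pb:.2f} PB")
```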
SUMMA – Scalable Understanding of Multilingual MediA
Identify people, places, events of interest
Discover trends, emerging events, crucial new stories
H2020 grant No. 688139
Timeline
Storyline
Event-based multi-document summarization: storyline highlights across a set of related stories
unrestricted
sort of CNL? (templates)
• Extractive summarization selects representative sentences from the input documents
• Abstractive summarization builds a semantic representation from which a summary is generated
• What semantic representation?
Sentence A: I saw Joe’s dog, which was running in the garden.
Sentence B: The dog was chasing a cat.
Summary: Joe’s dog was chasing a cat in the garden.
Liu F., Flanigan J., Thomson S., Sadeh N., Smith N.A. Toward Abstractive Summarization Using Semantic Representations. NAACL 2015
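The pipeline of Liu et al. (2015) can be sketched in a few lines: sentence AMRs become sets of (source, role, target) triples, concept merging is set union over shared concept labels, and the summary is a subgraph read off the merged graph. The triples and the reachability-based selection step below are a hand-made simplification for illustration, not the paper's actual ILP-based method:

```python
# Toy abstractive-summarization sketch in the spirit of Liu et al. (2015).
# Sentence AMRs as hand-made (source, role, target) triples:
sentence_a = {("see-01", "ARG0", "i"), ("see-01", "ARG1", "dog"),
              ("dog", "poss", "joe"), ("run-02", "ARG0", "dog"),
              ("run-02", "location", "garden")}
sentence_b = {("chase-01", "ARG0", "dog"), ("chase-01", "ARG1", "cat")}

# Concept merging: identical concept labels across sentences become one
# node, so the union of the triple sets is the merged document graph.
document_graph = sentence_a | sentence_b

def subgraph(graph, roots):
    """Keep the triples reachable from the chosen root nodes."""
    keep, frontier = set(), set(roots)
    while frontier:
        node = frontier.pop()
        for s, r, t in graph:
            if s == node and (s, r, t) not in keep:
                keep.add((s, r, t))
                frontier.add(t)
    return keep

# Selecting the chase-01 event yields the core of the summary
# "Joe's dog was chasing a cat" (the garden would need run-02 too).
summary = subgraph(document_graph, ["chase-01"])
```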
Abstractive summarization
AMR – Abstract Meaning Representation
• A semantic representation aimed at large-scale human annotation
• A practical, replicable amount of abstraction
• Captures many aspects of meaning in a single simple data structure
• Aims to abstract away from (English) syntax
• Rooted, labeled graphs
• Makes heavy use of PropBank framesets
• An actual sembank of nearly 50K sentences
• Sentences paired with their whole-sentence, logical meanings
AMR – Abstract Meaning Representation
• A form of AMR has been around for a long time (Langkilde and Knight, 1998)
• It has changed a lot since then: PropBank, DBpedia, etc.
• Banarescu et al. (2013) – the fundamentals of the current AMR annotation scheme
• Uses the PENMAN notation (Bateman, 1990)
• A way of representing a directed labeled graph in a simple tree-like form
• Easy to read and write (for a human), and to traverse (for a program)
• From semantic role labelling (SRL) to whole-sentence representation
AMR – Abstract Meaning Representation
• Nodes are variables labelled by concepts
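The PENMAN claims above (easy for a human to read and write, easy for a program to traverse) can be illustrated with a tiny hand-rolled reader; the example sentence, the tokenizer, and the dict-based node shape are our own simplification, and real work would use a dedicated AMR library:

```python
# Minimal PENMAN reader: '(' var '/' concept (:role value)* ')'.
import re

def parse(tokens):
    """Recursively parse one node from the token stream (consumed in place)."""
    assert tokens.pop(0) == "("
    var = tokens.pop(0)
    assert tokens.pop(0) == "/"
    node = {"var": var, "concept": tokens.pop(0), "roles": []}
    while tokens[0] != ")":
        role = tokens.pop(0)                                  # e.g. ':ARG0'
        value = parse(tokens) if tokens[0] == "(" else tokens.pop(0)
        node["roles"].append((role, value))
    tokens.pop(0)                                             # consume ')'
    return node

def read_penman(text):
    # Split into parens, slashes, and everything else.
    return parse(re.findall(r"[()/]|[^\s()/]+", text))

amr = read_penman("(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-01 :ARG0 b))")
# The :ARG0 of go-01 is the bare variable "b" -- a re-entrancy, i.e. the
# same boy node, which is what makes this a graph rather than a tree.
```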
Natural Language Understanding
• While it has recently been shown that the CNL approach can be scaled up..
• Embedded CNLs allowing for CNL-based domain-specific information extraction
• CNL as an efficient and user-friendly interface for Big Data end-point querying
• CNL for bootstrapping robust NL interfaces
• High-level CNL for legal sources
• ..use cases like media monitoring are not limited to a particular domain; the input sources vary from newswire texts to TV and radio transcripts to user-generated content in social networks
• In the era of Big Data, there is a dominating view that Deep Learning is the only way to cope with robust and scalable NLU
• NLU cannot be approached by CNLs, or grammars in general (?)
SemEval 2016 Task 8 on AMR parsing
1. Riga (University of Latvia / LETA): 0.6196
2. CAMR (Brandeis University / Boulder Learning Inc. / Rensselaer Polytechnic Institute): 0.6195
3. ICL-HD (Ruprecht-Karls-Universität Heidelberg): 0.6005
4. UCL+Sheffield (University College London / University of Sheffield): 0.5983
5. M2L (Kyoto University): 0.5952
6. CMU (Carnegie Mellon University / University of Washington): 0.5636
7. CU-NLP (OK Robot Go Ltd. / University of Colorado): 0.5566
8. UofR (University of Rochester): 0.4985
9. MeaningFactory (University of Groningen): 0.4702*
10. CLIP@UMD (University of Maryland): 0.4370
11. DynamicPower (National Institute for Japanese Language and Linguistics): 0.3706*
* Did not use AMR training data
NLG from AMR
• The potential of grammar-based and CNL approaches becomes obvious in the opposite direction
• e.g. in the generation of story highlights from summarized (pruned) AMR graphs
• Text generation from AMR is still recognized as a future task
• An unexplored niche for grammars and CNLs
• GF, for instance, as an excellent framework for implementing multilingual AMR verbalizers
• Issue: AMR to AST mapping
Pourdamghani N., Gao Y., Hermjakob U., Knight K. Aligning English Strings with Abstract Meaning Representation Graphs. EMNLP 2014
Butler A. Deterministic natural language generation from meaning representations for machine translation. NAACL 2016 Workshop on Semantics-Driven Machine Translation
Pourdamghani N., Knight K., Hermjakob U. Generating English from Abstract Meaning Representations. INLG 2016 (to appear)
Flanigan J., Dyer C., Smith N.A., Carbonell J. Generation from Abstract Meaning Representation using Tree Transducers. NAACL 2016
NLG from AMR
• Butler A. 2016. Deterministic natural language generation from meaning representations for machine translation. NAACL Workshop on Semantics-Driven Machine Translation
• Converts PENMAN-style representations to Penn-style trees
• Uses Tregex and Tsurgeon utilities which are a part of the Stanford NLP library
• Covers a wide range of constructions
• A simple example: “Girls see a boy.”
AMR to GF conversion: first experiment
“Girls see a boy.”
(x2 (see-01 (:ARG0 (x1 girl)) (:ARG1 (x4 boy))))
adjoin (Cl (VP @)) with PB-frame
move ARG0 under Cl
move ARG1 under VP
adjoin (NP a_Quant singularNum (CN @)) with ARG0/1
excise var
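To make the last rule above concrete, here is a toy re-implementation of the `excise var` step, assuming the aligned AMR is held as a nested list and that variables are the `xN` token-position labels shown above; the actual experiment uses Tregex/Tsurgeon over Penn-style trees, so this is only an illustrative sketch:

```python
# Toy "excise var" over nested lists: an xN variable node wrapping a single
# child is removed by splicing that child into its place.

def excise_vars(tree):
    if isinstance(tree, str):
        return tree
    head, *children = tree
    if head.startswith("x") and head[1:].isdigit() and len(children) == 1:
        return excise_vars(children[0])          # drop the variable wrapper
    return [head] + [excise_vars(c) for c in children]

aligned = ["x2", ["see-01", [":ARG0", ["x1", "girl"]],
                            [":ARG1", ["x4", "boy"]]]]
print(excise_vars(aligned))
# -> ['see-01', [':ARG0', 'girl'], [':ARG1', 'boy']]
```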
AMR to GF conversion: first experiment
“The boy sees the two pretty girls.”
(x3 (see-01 (:ARG0 (x2 boy)) (:ARG1 (x7 (girl (:quant 2) (:mod (x6 pretty)))))))
mkCN : A ⟶ N ⟶ CN
mkNum : Digits ⟶ Num
mkDigits : Str ⟶ Digits
move mod under CN
replace Num with quant
adjoin (Num (Digits @)) with quant
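A minimal sketch of how the quant/mod rules above could instantiate the GF resource-grammar constructors, generating the abstract syntax term at the string level; the helper `np_term` and the lexicon identifiers `girl_N`, `pretty_A`, `the_Quant` follow RGL naming conventions but are our illustrative guess, not the experiment's actual mapping code:

```python
# Sketch: render a GF abstract-syntax NP for an AMR noun node with
# optional :quant and :mod attributes, using mkCN / mkNum / mkDigits.

def np_term(noun, quant=None, mod=None):
    cn = f"{noun}_N"
    if mod is not None:
        cn = f"(mkCN {mod}_A {cn})"                # move :mod under CN
    if quant is not None:
        num = f'(mkNum (mkDigits "{quant}"))'      # replace Num with :quant
    else:
        num = "singularNum"
    return f"(mkNP the_Quant {num} {cn})"

print(np_term("girl", quant=2, mod="pretty"))
# -> (mkNP the_Quant (mkNum (mkDigits "2")) (mkCN pretty_A girl_N))
```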
Story headlines: Templates? Application grammar? CNL?
Multilingual Headlines Generator (a GF toy example by Jose P. Moreno)
http://grammaticalframework.org/demos/multilingual_headlines.html
Conclusion
• There is potential for cooperation with the DL folks in both NLU and NLG
• Especially in NLG which is recognized among the next problems to “solve” by DL
• Especially in domain specific use cases that can be approached by CNL
• AMR to text issues to be addressed: number, time, co-references, articles, concepts and WSD (for multilingual NLG), named entities, reification; the management of transformation rules