Towards improving automatic text summaries
Abdelkrime ARIES
Supervisors: Pr. Zegour & Pr. Hidouci
Research Group: D3 Team
École nationale Supérieure d’Informatique (ESI, ex. INI), Algeria
LCSI laboratory mid-term seminars: April 19th, 2016
Plan
1 Problem statement
2 Extractive methods
3 Abstractive methods
4 Demo
5 Thank you
Problem statement
Problem statement: Motivation
Why should we summarize?
Saving reading time
Showing content on small devices
Facilitating document selection
Helping in search
Introduction: Summarization classification
Following [1, 2], summarization can be classified by:
Input document
  Source size: Single-document, Multi-document
  Specificity: Domain-specific, General
  Form: Scale, Genre
Purpose
  Audience: Generic, Query-oriented
  Usage: Indicative, Informative
  Expansiveness: Background, Just-the-news
Output document
  Derivation: Extract, Abstract
  Conventionality: Fixed, Floating
  Partiality: Neutral, Evaluative
Problem statement: Extractive vs. Abstractive
Extractive:
+ Fast, with fewer resources (CPU + data)
+ Can easily be applied to many languages (statistical)
- Incoherent text
- Only pertinent sentences are selected, and they may have no relation to one another
Abstractive:
+ Good text presentation
+ Redundancy can be dealt with
- Slow, requiring a lot of resources (CPU + data)
- Hard to implement (language dependent)
Problem statement: Multi-Lingual systems
Process more than one language.
Language-independent application:
  Fully independent
  Partially independent
Also, there are cross-lingual systems.
Problem statement: Objectives
Create a multi-lingual system.
Introduce abstractive methods.
Improve our method [3].
Improve readability and coherence.
Extractive methods
AllSummarizer as example
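Extractive methods: AllSummarizer
Input: document(s)
Pre-processing: Normalizer, Segmenter, Stemmer, Stop-word eliminator
  Produces the list of sentences and the list of pre-processed words for each sentence
Processing: Clustering, Learning, Scoring
  Clustering produces the list of clusters
  Learning estimates P(f|C)
  Scoring produces the sentences' scores
Extraction: Extraction, ReOrdering
  Extraction takes the sentences' scores and the summary size, and produces the list of the highest-scored sentences
  ReOrdering produces the reordered sentences
Output: Summary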
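The pipeline stages above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the AllSummarizer implementation: the stop-word list, the cosine similarity, the clustering rule, the scoring formula and all function names are hypothetical choices made for the example, only the unigram feature is used, and the stemmer is omitted.

```python
import math
import re
from collections import Counter

# Toy stop-word list (assumption; a real system uses per-language lists).
STOP_WORDS = {"a", "an", "the", "of", "is", "in", "and", "to"}

def preprocess(text):
    """Segment into sentences, then normalize, tokenize and drop stop words."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [[w for w in re.findall(r"\w+", s.lower()) if w not in STOP_WORDS]
             for s in sentences]
    return sentences, words

def cosine(a, b):
    """Cosine similarity between two bags of words (assumed similarity measure)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

def cluster(words, threshold):
    """Assumed rule: one cluster per sentence, holding every sentence similar to it."""
    return [[j for j, other in enumerate(words) if cosine(ws, other) >= threshold]
            for ws in words]

def learn(words, clusters):
    """Estimate P(f|C) for the unigram feature as relative term frequency per cluster."""
    models = []
    for members in clusters:
        counts = Counter(w for i in members for w in words[i])
        total = sum(counts.values())
        models.append({w: c / total for w, c in counts.items()} if total else {})
    return models

def score(sentence_words, models):
    """Assumed score: product over clusters of 1 + summed P(f|C) of the sentence's terms."""
    result = 1.0
    for model in models:
        result *= 1.0 + sum(model.get(w, 0.0) for w in sentence_words)
    return result

def summarize(text, size=2, threshold=0.2):
    sentences, words = preprocess(text)
    models = learn(words, cluster(words, threshold))
    scores = [score(ws, models) for ws in words]
    # Extraction: keep the `size` highest-scored sentences.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:size]
    # ReOrdering: restore the original document order.
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(text, size=2)` returns two sentences in their original order. Here the summary size is counted in sentences, which is a simplification.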
Extractive methods: Improvements
Several improvements have been made to the original AllSummarizer system [3]:
1 Adding more features alongside the unigram and bigram term frequencies: sentence position, sentence length with stop words, and sentence length without stop words.
2 Adding more languages to the preprocessing task (27 languages): Arabic, Bulgarian, Catalan, Czech, German, Greek, English, Spanish, Basque, Persian, Finnish, French, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Dutch, Nynorsk, Norwegian, Portuguese, Romanian, Russian, Swedish, Thai, Turkish and Chinese.
Extractive methods: Improvements
3 Testing the summarizer with more than 40 languages (default preprocessing was used for languages without a dedicated preprocessing task).
4 Fixing the problem of redundant sentences (especially in multi-document summarization). This was done by calculating the similarity between the last added sentence and the candidate sentence, then judging whether they are similar using the clustering threshold.
5 Estimating the threshold and the features for each language (multi- and single-document summarization). For more information, see our participation in the MultiLing 2015 workshop (SIGDIAL conference) [4].
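The redundancy check described in item 4 might look like the sketch below. It assumes cosine similarity over bags of words and a summary size counted in sentences. The function names are hypothetical and this is not the actual AllSummarizer code. Following the description, each candidate is compared only with the last sentence added to the summary.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bags of words (assumed similarity measure)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

def select_non_redundant(ranked, threshold, size):
    """Pick up to `size` sentences from `ranked` (token lists, best score first).

    A candidate is skipped when its similarity to the last added sentence
    reaches the clustering threshold. Returns the indices of kept sentences.
    """
    kept = []
    for idx, words in enumerate(ranked):
        if len(kept) == size:
            break
        if kept and cosine(ranked[kept[-1]], words) >= threshold:
            continue  # too similar to the last added sentence
        kept.append(idx)
    return kept
```

For example, with `ranked = [["cats", "chase", "mice"], ["cats", "chase", "mice", "fast"], ["weather", "nice"]]`, a threshold of 0.5 and a size of 2, the second sentence is skipped as redundant and the function returns `[0, 2]`.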
Extractive methods: Links
Take a look: https://github.com/kariminf/AllSummarizer
Test it: allsummarizer-kariminf.rhcloud.com
So ...
Little has been done; more remains to be done.
Always remember: summarizing saves time.
Bibliography
[1] E. Hovy and C.-Y. Lin, “Automated text summarization and the SUMMARIST system,” in Proceedings of a workshop on held at Baltimore, Maryland: October 13-15, 1998. Association for Computational Linguistics, 1998, pp. 197–214.
[2] K. Sparck Jones, “Automatic summarising: factors and directions,” in Advances in automatic text summarisation. Cambridge, MA: MIT Press, 1999.
[3] A. Aries, H. Oufaida, and O. Nouali, “Using clustering and a modified classification algorithm for automatic text summarization,” ser. Proc. SPIE, vol. 8658, 2013, pp. 865811–865811-9. [Online]. Available: http://dx.doi.org/10.1117/12.2004001
[4] A. Aries, D. E. Zegour, and K. W. Hidouci, “AllSummarizer system at MultiLing 2015: Multilingual single and multi-document summarization,” in Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Prague, Czech Republic: Association for Computational Linguistics, September 2015, pp. 237–244. [Online]. Available: http://aclweb.org/anthology/W15-4634