INFuture2009: Digital Resources and Knowledge Sharing, 4-7 November 2009
FF & FER
Comparative Analysis of Automatic Term and Collocation Extraction
Sanja Seljan, Bojana Dalbelo Bašić, Jan Šnajder, Davor Delač, Matija Šamec-Gjurin, Dina Crnec
Faculty of Humanities and Social Sciences, Department of Information Sciences
Faculty of Electrical Engineering and Computing
Overview
I. Introduction
– Reasons for extraction
II. Research
– Resources & tools
– Extracted lists
III. Evaluation
– Precision, recall, F-measure
IV. Conclusion
I. Introduction
• Monolingual and multilingual resources
– Helpful
– Integrated
– Require human intervention
• EU pre-accession activities
– Speed up + consistency
• Used in further research and practice
• List:
– Terms (Member State, European Union)
– Collocations (adopt a/the resolution, decided as follows)
– Multi-word units (depend on, well-being)
• Term extraction process:
– Term extraction (term acquisition) - identification
– Term recognition - verification
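The two-step process above (identification, then verification) can be illustrated with a minimal Python sketch. It is not the method used by TermeX, SDL MultiTerm Extract or NooJ: the frequency threshold, the function names `extract_candidates` and `recognize_terms`, and the sample text are assumptions made for illustration, and the validated list only stands in for the human verification step.

```python
import re
from collections import Counter


def extract_candidates(text, n_values=(2, 3), min_freq=2):
    """Step 1 (identification): collect frequent word n-grams as candidates.

    Simplification: n-grams are counted across sentence boundaries.
    """
    # Lowercased word tokens; hyphenated words such as "well-being" stay whole.
    tokens = re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text.lower())
    counts = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return {gram for gram, freq in counts.items() if freq >= min_freq}


def recognize_terms(candidates, validated_terms):
    """Step 2 (verification): keep only candidates confirmed by a validated list."""
    validated = {term.lower() for term in validated_terms}
    return candidates & validated


# Hypothetical sample text built around the example terms from the slide.
text = (
    "Each Member State shall adopt the resolution. "
    "The Member State informs the European Union. "
    "The European Union may adopt the resolution."
)
candidates = extract_candidates(text)
terms = recognize_terms(candidates, ["Member State", "European Union"])
print(sorted(terms))
```

Step 1 deliberately over-generates (it also returns fragments such as "adopt the"), which is why a verification step is needed before a candidate is accepted as a term.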
II. Research
• Resources
– 10 documents (legislation, Croatian-English)
• Tools
– TermeX tool (FER) – list A
– SDL MultiTerm Extract + NooJ (FF) – list B
• Reference list
– Evaluation – reference list
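Evaluating an extracted list against the reference list comes down to set comparison. The sketch below computes precision, recall and the balanced F-measure (F1) under the assumption of exact, case-insensitive matching; the function name `evaluate` and the sample lists are illustrative and are not results from the study.

```python
def evaluate(extracted, reference):
    """Return (precision, recall, F1) of an extracted list against a reference list."""
    # Normalise both lists so that matching is exact but case-insensitive.
    extracted = {term.strip().lower() for term in extracted}
    reference = {term.strip().lower() for term in reference}

    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up data: four reference entries and a three-item extracted list.
reference_list = [
    "Member State",
    "European Union",
    "adopt the resolution",
    "decided as follows",
]
list_a = ["member state", "european union", "official journal"]

precision, recall, f1 = evaluate(list_a, reference_list)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

The same function can be called once per tool (list A, list B) so that both are scored against the same reference list.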
Reference list
• 470 terms and collocations
• Exclude unigrams
• Balance between lexical coverage, adequacy,