INFuture2009: Digital Resources and Knowledge Sharing, 4-7 November 2009
FF & FER
Comparative Analysis of Automatic Term and Collocation Extraction
Sanja Seljan, Bojana Dalbelo Bašić, Jan Šnajder, Davor Delač, Matija Šamec-Gjurin, Dina Crnec
Faculty of Humanities and Social Sciences, Department of Information Sciences
Faculty of Electrical Engineering and Computing

Overview
I. Introduction
  – Reasons for extraction
II. Research
  – Resources & tools
  – Extracted lists
III. Evaluation
  – Precision, recall, F-measure
IV. Conclusion

I. Introduction
• Monolingual and multilingual resources
  – Helpful
  – Integrated
  – Require human intervention
• EU pre-accession activities
  – Speed-up and consistency
• Used in further research and practice
• The extracted lists include:
  – Terms (Member State, European Union)
  – Collocations (adopt a/the resolution, decided as follows)
  – Multi-word units (depend on, well-being)
• Term extraction process:
  – Term extraction (term acquisition): identification
  – Term recognition: verification
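The slides do not say how candidates are identified in either tool. As a minimal sketch, assuming a simple association-measure approach (pointwise mutual information, PMI, over adjacent word pairs; a standard technique, not necessarily the one TermeX or SDL MultiTerm Extract uses), the two steps could look like this. The function names `bigram_candidates` and `verify` and the `min_freq` threshold are hypothetical, and the `accepted` set merely stands in for the human verifier.

```python
import math
from collections import Counter


def bigram_candidates(tokens, min_freq=2):
    """Identification step (hypothetical helper): rank adjacent word pairs
    by pointwise mutual information (PMI)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scored = []
    for (w1, w2), f12 in bigrams.items():
        if f12 < min_freq:
            continue  # skip rare pairs: PMI is unreliable at low frequencies
        # PMI approximated as log2(f(w1,w2) * N / (f(w1) * f(w2))),
        # using the token count N for both unigram and bigram estimates
        pmi = math.log2(f12 * n / (unigrams[w1] * unigrams[w2]))
        scored.append(((w1, w2), pmi))
    # highest association first; ties keep their order of first appearance
    return sorted(scored, key=lambda item: item[1], reverse=True)


def verify(candidates, accepted):
    """Recognition step (hypothetical helper): keep only candidates that a
    verifier accepts; the `accepted` set stands in for a human expert."""
    return [pair for pair, _ in candidates if " ".join(pair) in accepted]
```

For example, on the tokens of "the member state shall inform the commission and the member state shall adopt the resolution", `bigram_candidates` ranks "member state" and "state shall" equally, above "the member", and `verify` with `{"member state"}` keeps only "member state". This illustrates why the verification step is needed: a purely statistical score rates the non-term "state shall" as highly as the term "member state".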

II. Research
• Resources
  – 10 documents: legislation, Croatian-English
• Tools
  – TermeX (FER): list A
  – SDL MultiTerm Extract + NooJ (FF): list B
• Reference list
  – Used to evaluate the extracted lists
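The slides name precision, recall and F-measure as the evaluation metrics but do not give the matching rules. A minimal sketch of scoring an extracted list E (list A or list B) against the reference list R, with P = |E ∩ R| / |E|, R = |E ∩ R| / |R| and F1 = 2PR / (P + R), assuming exact, case-insensitive matching of whole entries and the balanced F-measure; the function name `evaluate` is hypothetical:

```python
def evaluate(extracted, reference):
    """Score an extracted list against a reference list.

    Assumes case-insensitive exact string matching; returns
    (precision, recall, F1), where F1 is the harmonic mean of P and R.
    """
    extracted = {item.lower() for item in extracted}
    reference = {item.lower() for item in reference}
    hits = len(extracted & reference)  # extracted items also in the reference
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    if hits == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With a toy extracted list ["Member State", "European Union", "official journal", "depend on"] and a toy reference list ["member state", "european union", "adopt a resolution"] (illustrative entries, not the study's data), two of the four extracted items match, giving P = 0.5, R = 2/3 and F1 = 4/7. Looser matching (e.g. lemmatized forms for Croatian) would change these counts.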

Reference list
• 470 terms and collocations
• Exclude unigrams
• Balance between lexical coverage, adequacy,