Collocation Extraction Based on Syntactic Criteria Violeta Seretan Department of Translation Technology Faculty of Translation and Interpreting University of Geneva September 2013 Recent Advances in Natural Language Processing, 7-13 September 2013, Hissar, Bulgaria
Collocation Extraction Based on Syntactic Criteria
Violeta Seretan
Department of Translation Technology
Faculty of Translation and Interpreting
University of Geneva
September 2013
Recent Advances in Natural Language Processing, 7-13 September 2013, Hissar, Bulgaria
Acknowledgment
Language Technology Laboratory, Department of Linguistics, University of Geneva
Eric Wehrli Luka Nerima Paola Merlo
Outline On collocations Importance Features Need for analysis Syntax-based extractors Evaluation
I chose to run for the presidency at this moment in history because I believe deeply that we cannot solve the challenges of our time unless we solve them together
I tender my heartfelt gratitude to all of them, while taking full responsibility for all errors . . .
[Mel’čuk 1998, 23]
Collocations
“In all kinds of texts, collocations are indispensable elements with which our utterances are very largely made”
[Kjellmer 1987, 10]
Collocations
“Collocation is the way words combine in a language to produce natural-sounding speech and writing”
[Lea and Runcie 2002, vii]
‘Appendix to the Grammar’
“Knowledge that will account for speakers’ ability to construct and understand phrases and expressions in their language which are not covered by the grammar, the lexicon, and the principles of compositional semantics”
[Fillmore et al. 1988, 504]
Only available to native speakers
“Advanced learners of second language have great difficulty with nativelike collocation and idiomaticity. Many grammatical sentences generated by language learners sound unnatural and foreign.”
[Ellis 2008]
More examples
EN open air
FR plein air ‘full’
RO aer liber ‘free’
More examples
EN ask a question
IT fare una domanda, ES hacer una pregunta ‘make’
RO a pune o întrebare, FR poser une question ‘put’
More examples
EN error occurred
FR erreur s’est produite ‘produced itself’
More examples
EN cheat death
FR frôler la mort ‘brush’
More examples
EN reach an agreement
FR parvenir à un accord ‘arrive, get to’
IT trovare un accordo ‘find’
More examples
EN money laundering
FR blanchiment d’argent ‘whitening’
IT lavaggio di denaro ‘washing’
Even more examples
EN narrow majority
FR courte majorité ‘short’
EN bring to justice
FR traduire en justice ‘translate’
RO a urmări în justiție ‘follow, track’
EN
story breaks
strike a deal
in sharp contrast
draw criticism
entertain hope
experience difficulty
foot the bill
meet requirement
fine weather
deep impression
serious injury
Even more examples
Automatically extracted collocation equivalents [Seretan and Wehrli 2007]
Importance
“Collocations make up the lion’s share of the phraseme inventory, and thus they deserve our special attention.”
Syntactic flexibility
make proposal
A proposal for the financing of the variable costs will be made to the Committee . . .
submit proposal
A joint proposal which addressed such elements as notification, consultations, conciliation and mediation, arbitration, panel procedures, technical assistance, adoption of panel reports and GATT’s surveillance of their implementation was submitted on behalf of fourteen participants.
Syntactic flexibility
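paragraphe dispose ‘paragraph – provides’
Notant en outre que le paragraphe 5 de l’Acte final reprenant les résultats des Négociations commerciales multilatérales du Cycle d’Uruguay (ci-après dénommés respectivement l’“Acte final” et le “Cycle d’Uruguay”) dispose que . . .
‘Noting further that paragraph 5 of the Final Act embodying the results of the Uruguay Round of Multilateral Trade Negotiations (hereinafter referred to as the “Final Act” and the “Uruguay Round”, respectively) provides that . . .’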
Identification approaches
1. Approaches based on linear proximity
Definition:
“Collocation is the cooccurrence of two or more words within a short space of each other in a text. The usual measure of proximity is a maximum of four words intervening.” [Sinclair 1991, 170]
2. Approaches based on structural proximity
Definition:
“lexically and/or pragmatically constrained recurrent co-occurrences of at least two lexical items which are in a direct syntactic relation with each other” [Bartsch 2004, 76]
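The linear-proximity definition can be made concrete with a small sketch. The function below is only an illustration of the "maximum of four words intervening" criterion, not the procedure of any extractor cited in these slides; the name `window_pairs` and the parameter `max_intervening` are assumptions introduced here. It is applied to the "make proposal" sentence from the syntactic flexibility slide.

```python
from collections import Counter

def window_pairs(tokens, max_intervening=4):
    """Yield ordered word pairs with at most `max_intervening` words between them."""
    for i, left in enumerate(tokens):
        # Partners sit at positions i+1 .. i+1+max_intervening (bounded by the text end).
        for j in range(i + 1, min(i + 2 + max_intervening, len(tokens))):
            yield left, tokens[j]

tokens = "a proposal for the financing of the variable costs will be made".split()
counts = Counter(window_pairs(tokens))

# Nine words separate "proposal" and "made", so the window never pairs them.
print(("proposal", "made") in counts)  # False
# Adjacent or nearby words are paired regardless of any syntactic relation.
print(("be", "made") in counts)        # True
```

A purely linear window therefore misses the passivized collocation while still proposing pairs such as "be made", which is the motivation for the structural definition.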
Long-distance dependencies
donner – exemple ‘give – example’
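Le visionnaire a donné, lors de sa conférence magistrale, à l’occasion de la remise du Prix Latsis 2011 aux différents lauréats, le mercredi 30 novembre 2011, dans la salle Piaget de l’Université de Genève à Uni-Dufour, devenu trop petite pour accueillir le monde scientifique et le public venus de tous les coins et recoins de la Suisse, l’exemple du professeur et ancien président sénégalais le poète Léopold Sédar Senghor qui maîtrisait autant la culture du pays de Marianne que la langue de Molière avec perfection pour devenir le premier Noir membre de l’Académie française.
‘The visionary gave, during his keynote lecture, on the occasion of the awarding of the 2011 Latsis Prize to the various laureates, on Wednesday 30 November 2011, in the Piaget room of the University of Geneva at Uni-Dufour, which had become too small to accommodate the scientific community and the public who had come from every corner of Switzerland, the example of the professor and former Senegalese president, the poet Léopold Sédar Senghor, who mastered the culture of the land of Marianne as much as the language of Molière to perfection, to become the first Black member of the Académie française.’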
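To see how far this example exceeds a four-word window, the distance between the verb and its object can be measured directly. The snippet is a rough sketch that assumes plain whitespace tokenization and uses only the part of the sentence up to the object noun; the variable names are illustrative.

```python
# Sentence from the slide, cut after the object noun; whitespace tokenization is an assumption.
sentence = (
    "Le visionnaire a donné, lors de sa conférence magistrale, à l'occasion "
    "de la remise du Prix Latsis 2011 aux différents lauréats, le mercredi "
    "30 novembre 2011, dans la salle Piaget de l'Université de Genève à "
    "Uni-Dufour, devenu trop petite pour accueillir le monde scientifique "
    "et le public venus de tous les coins et recoins de la Suisse, "
    "l'exemple du professeur"
)
tokens = sentence.split()

# Positions of the verb form "donné" and of the object "l'exemple".
verb = next(i for i, t in enumerate(tokens) if t.startswith("donné"))
noun = next(i for i, t in enumerate(tokens) if t.startswith("l'exemple"))

intervening = noun - verb - 1
print(intervening)       # number of tokens between verb and object
print(intervening > 4)   # True: far outside a four-word window
```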
The multilingual challenge
German
“Some properties of the German language make the task of extracting V-N collocations from German text corpora more difficult than for English corpora.” [Breidt 1993, 77]
“the assumption that a “semantic agent [...] is principally used before the verb” and a “semantic object [...] is used after it” as described in Smadja (1991a:180) does not hold for German. Therefore, complicated parsing is necessary to distinguish subject-verb from object-verb combinations.” [Breidt 1993, 77]
Solution
“Ideally, in order to identify lexical relations in a corpus one would need to first parse it to verify that the words are used in a single phrase structure. However, in practice, free-style texts contain a great deal of nonstandard features over which automatic parsers would fail. [...] This fact is being seriously challenged by current research [...] and might not be true in the near future.” [Smadja 1993, 151]
“with recent significant increases in parsing efficiency and accuracy, there is no reason why explicit parse information should not be used” [Pearce 2002, 1530]
Lin (1998, 1999) – English
dependency parser (unspecified)
sentences shorter than 25 words
9.7% errors in top results
6 types: N-D, N-A, N-N, V-N, N-V, V-Adv
Wu and Zhou (2003); Lu and Zhou (2004) – English, Chinese
syntactic parser (NLPWin, Microsoft Research)
7.85% errors in top results
3 types: V-O, N-A, V-Adv
Villada Moirón (2005) – Dutch
dependency parser (Alpino)
sentences shorter than 20 words
many PP-attachment errors; parser only used for chunking
2 types: P-N-P and PP-V
Orliac and Dillinger (2003) – English
deep parser (Logos)
limited grammatical coverage (does not handle relatives)
3 types: S-V, V-O, V-P-N
Others
Zinsmeister and Heid (2003), Schulte im Walde (2003) – German, statistical parser (LoPar)
Charest et al. (2007) – French, dependency parser (Antidote)
The LATL syntax-based collocation extractor: FipsCo
Goldman et al. (2001) – English, French
deep parser (Fips)
broad grammatical coverage
many types of collocation configurations
FipsCo precedes many syntax-based extractors, and overcomes limitations pertaining to parsing robustness, precision, coverage, as well as limitations regarding the list of supported syntactic types.
It has been developed mainly as a CAT tool for WTO translators in the project “Linguistic Analysis and Collocation Extraction”.
Initially available for English and French, it was further extended to Spanish, Italian, Greek, German and Romanian.
Fips - Key facts
Constituent
simplified X-bar structure [XP L X R] (no intermediate level)
X – lexical head (A, N, V, D, P, Conj, ...)
L/R – lists of left/right subconstituents
Manually-built lexica
detailed morphosyntactic and semantic information: selectional properties, subcategorization information, syntactico-semantic features likely to influence the syntactic analysis
Algorithm - main operations
Project: assignment of constituent structures to lexical entries
Merge: combination of adjacent constituents
Move: creation of chains by linking surface positions of “moved” constituents to their corresponding canonical positions.
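The [XP L X R] constituent shape can be pictured as a small recursive data structure. The sketch below is only an illustration of that shape, not the Fips implementation: the class name `Constituent`, its fields, and the `bracketed` helper are assumptions introduced here, as is the choice to analyse "the requirement" with the determiner as head.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Constituent:
    """One [XP L X R] node: a lexical head X with left and right subconstituents."""
    category: str                     # category of the head: "N", "V", "A", "D", "P", ...
    head: str                         # the lexical head X
    left: List["Constituent"] = field(default_factory=list)   # L
    right: List["Constituent"] = field(default_factory=list)  # R

    def bracketed(self) -> str:
        """Render the node as a labelled bracketing, left subconstituents first."""
        parts = (
            [c.bracketed() for c in self.left]
            + [self.head]
            + [c.bracketed() for c in self.right]
        )
        return f"[{self.category}P " + " ".join(parts) + "]"

# "meet the requirement": a VP whose right list holds the object phrase.
requirement = Constituent("N", "requirement")
obj = Constituent("D", "the", right=[requirement])
vp = Constituent("V", "meet", right=[obj])
print(vp.bracketed())  # [VP meet [DP the [NP requirement]]]
```

There is no intermediate X' level in this representation: every node is a maximal projection holding its head and two flat lists.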
Method – Stage 1: Candidate selection
1. Lexical filter:
rule out auxiliary and modal verbs, proper nouns, common nouns representing titles (Mr.)
2. Structural filter:
predicate-argument relation in the arguments table of predicates
combinations <X, head of item in L/R> in a given syntactic relation, e.g., head-modifier, noun-adjective in FP (functional phrase)
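The two filters can be sketched as a traversal that pairs each head X with the heads of its left and right subconstituents. This is a simplified illustration under stated assumptions, not FipsCo's code: the `Node` class, the `EXCLUDED_LEMMAS` stop list, and the `RELATIONS` table are hypothetical, the toy tree omits determiner layers, and proper-noun filtering and the predicate arguments table are not modelled.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

@dataclass
class Node:
    category: str
    head: str
    left: List["Node"] = field(default_factory=list)
    right: List["Node"] = field(default_factory=list)

# Hypothetical stand-in for the lexical filter (auxiliaries, modals, titles).
EXCLUDED_LEMMAS = {"be", "have", "will", "can", "must", "Mr."}

# Hypothetical inventory: (head category, dependent category, side) -> relation label.
RELATIONS = {
    ("V", "N", "right"): "verb-object",
    ("V", "N", "left"): "subject-verb",
    ("N", "A", "left"): "adjective-noun",
}

def candidates(node: Node) -> Iterator[Tuple[str, str, str]]:
    """Yield (relation, head, dependent head) for every licensed pair in the tree."""
    for side, children in (("left", node.left), ("right", node.right)):
        for child in children:
            relation = RELATIONS.get((node.category, child.category, side))
            allowed = node.head not in EXCLUDED_LEMMAS and child.head not in EXCLUDED_LEMMAS
            if relation and allowed:
                yield relation, node.head, child.head
            yield from candidates(child)  # descend into the subconstituent

tree = Node(
    "V", "meet",
    left=[Node("N", "proposal", left=[Node("A", "new")])],
    right=[Node("N", "requirement", left=[Node("A", "strict")])],
)
for candidate in candidates(tree):
    print(candidate)
```

Because pairs are read off the structure rather than off word positions, the distance between the two words in the surface string plays no role in selection.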
Syntactic patterns
adjective-noun               heavy smoker
noun-[predicate]-adjective   effort [be] devoted
noun-noun                    suicide attack
noun-preposition-noun        round of negotiations
noun-preposition             inquiry into
adjective-preposition        crazy about
subject-verb                 war breaks
verb-object                  meet requirement
verb-preposition-argument    bring to boil
verb-preposition             point out
adverb-verb                  fully support
adverb-adjective             highly important
Results – Syntactic environments
passivization:
I see that amendments to the report by Mr Méndez de Vigo and Mr Leinen have been tabled on this subject.
relativization:
The communication devotes no attention to the impact the newly announced policy measures will have on the candidate countries.
interrogation:
What impact do you expect this to have on reducing our deficit and our level of imports?
cleft constructions:
It is a very pressing issue that Mr Sacrédeus is addressing.
coordinated clauses:
This motion implies that somehow the current income tax laws on alimony and maintenance payments are unfair, contribute to the problem and therefore should be amended.
Conclusion
Parsing technologies, traditionally seen as inappropriate for large-scale processing of corpora, are today the main ingredient for accurate collocation extraction.
The strong syntactic filter applied on the source text reduces the amount of data to process in the subsequent step to almost one quarter.
Parsing is the solution to the combinatorial explosion problem in the task of identifying longer collocations in text (e.g., be a major turning point, to stand in stark contrast).
Sabine Bartsch. 2004. Structural and Functional Properties of Collocations in English. A Corpus Study of Lexical and Pragmatic Constraints on Lexical Cooccurrence. Gunter Narr Verlag, Tübingen.
Elisabeth Breidt. 1993. Extraction of V-N-collocations from text corpora: A feasibility study for German. In Proceedings of the Workshop on Very Large Corpora: Academic and Industrial Perspectives, pages 74–83, Columbus, USA.
Simon Charest, Eric Brunelle, Jean Fontaine, and Bertrand Pelletier. 2007. Élaboration automatique d’un dictionnaire de cooccurrences grand public. In Actes de la 14e conférence sur le Traitement Automatique des Langues Naturelles (TALN 2007), pages 283–292, Toulouse, France, June.
Nick Ellis. 2008. Phraseology: The periphery and the heart of language. In Fanny Meunier and Sylviane Granger, editors, Phraseology in Foreign Language Learning and Teaching, pages 1–13. John Benjamins, Amsterdam/Philadelphia.
Stefan Evert. 2004. The Statistics of Word Cooccurrences: Word Pairs and Collocations. Ph.D. thesis, University of Stuttgart.
Charles Fillmore, Paul Kay, and Catherine O’Connor. 1988. Regularity and idiomaticity in grammatical constructions: The case of let alone. Language, 64(3):501–538.
Jean-Philippe Goldman, Luka Nerima, and Eric Wehrli. 2001. Collocation extraction using a syntactic parser. In Proceedings of the ACL Workshop on Collocation: Computational Extraction, Analysis and Exploitation, pages 61–66, Toulouse, France.
Dirk Heylen, Kerry G. Maxwell, and Marc Verhagen. 1994. Lexical functions and machine translation. In Proceedings of the 15th International Conference on Computational Linguistics (COLING 1994), pages 1240–1244, Kyoto, Japan.
Seonho Kim, Zooil Yang, Mansuk Song, and Jung-Ho Ahn. 1999. Retrieving collocations from Korean text. In Proceedings of the 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pages 71–81, Maryland, USA.
Göran Kjellmer. 1987. Aspects of English collocations. In Willem Meijs, editor, Corpus Linguistics and Beyond, pages 133–140. Rodopi, Amsterdam.
Philipp Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In Proceedings of The Tenth Machine Translation Summit (MT Summit X), pages 79–86, Phuket, Thailand, September.
Diana Lea and Moira Runcie, editors. 2002. Oxford Collocations Dictionary for Students of English. Oxford University Press, Oxford.
Dekang Lin. 1998. Extracting collocations from text corpora. In First Workshop on Computational Terminology, pages 57–63, Montreal, Canada.
Dekang Lin. 1999. Automatic identification of non-compositional phrases. In Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics, pages 317–324, Morristown, NJ, USA.
Yajuan Lu and Ming Zhou. 2004. Collocation translation acquisition using monolingual corpora. In Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL’04), pages 167–174, Barcelona, Spain.
Igor Mel’čuk. 1998. Collocations and lexical functions. In Anthony P. Cowie, editor, Phraseology. Theory, Analysis, and Applications, pages 23–53. Clarendon Press, Oxford.
Athina Michou and Violeta Seretan. 2009. A tool for multi-word expression extraction in Modern Greek using syntactic parsing. In Proceedings of the Demonstrations Session at EACL 2009, pages 45–48, Athens, Greece, April. Association for Computational Linguistics.
Brigitte Orliac and Mike Dillinger. 2003. Collocation extraction for machine translation. In Proceedings of Machine Translation Summit IX, pages 292–298, New Orleans, Louisiana, USA.
Darren Pearce. 2002. A comparative evaluation of collocation extraction techniques. In Third International Conference on Language Resources and Evaluation, pages 1530–1536, Las Palmas, Spain.
Violeta Seretan and Eric Wehrli. 2007. Collocation translation based on sentence alignment and parsing. In Proceedings of TALN 2007, Toulouse, France.
Violeta Seretan and Eric Wehrli. 2009. Multilingual collocation extraction with a syntactic parser. Language Resources and Evaluation, 43(1):71–85.
Violeta Seretan and Eric Wehrli. 2010. Extending a multilingual symbolic parser to Romanian. In Dan Tufiş and Corina Forăscu, editors, Multilinguality and Interoperability in Language Processing with Emphasis on Romanian. Romanian Academy Publishing House.
Violeta Seretan and Eric Wehrli. 2011. FipsCoView: On-line visualisation of collocations extracted from multilingual parallel corpora. In Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World, pages 125–127, Portland, Oregon, USA, June. Association for Computational Linguistics.
Violeta Seretan. 2011. A collocation-driven approach to text summarization. In Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles (TALN 2011), pages 9–14, Montpellier, France.
John Sinclair. 1991. Corpus, Concordance, Collocation. Oxford University Press, Oxford.
Frank Smadja. 1993. Retrieving collocations from text: Xtract. Computational Linguistics, 19(1):143–177.
María Begoña Villada Moirón. 2005. Data-driven identification of fixed expressions and their modifiability. Ph.D. thesis, University of Groningen.
Eric Wehrli, Luka Nerima, and Yves Scherrer. 2009. Deep linguistic multilingual translation and bilingual dictionaries. In Proceedings of the Fourth Workshop on Statistical Machine Translation, pages 90–94, Athens, Greece.
Eric Wehrli, Violeta Seretan, and Luka Nerima. 2010. Sentence analysis and collocation identification. In Proceedings of the Workshop on Multiword Expressions: from Theory to Applications (MWE 2010), pages 27–35, Beijing, China.
Eric Wehrli. 2003. Translation of words in context. In Proceedings of Machine Translation Summit IX, pages 502–504, New Orleans, Louisiana, USA.
Eric Wehrli. 2006. TwicPen: Hand-held scanner and translation software for non-native readers. In Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions, pages 61–64, Sydney, Australia.
Eric Wehrli. 2007. Fips, a “deep” linguistic multilingual parser. In ACL 2007 Workshop on Deep Linguistic Processing, pages 120–127, Prague, Czech Republic.
Hua Wu and Ming Zhou. 2003. Synonymous collocation extraction using translation information. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2003), pages 120–127, Sapporo, Japan.
David Yarowsky. 1993. One sense per collocation. In Proceedings of ARPA Human Language Technology Workshop, pages 266–271, Princeton.