CKL --- Center for Computational Linguistics
Project MŠMT LC536 (LC05)
Univerzita Karlova v Praze, ÚFAL MFF
Západočeská univerzita Plzeň, KKY FAV
Masarykova Univerzita Brno, FI
Ústav pro jazyk český AV ČR Praha
http://www.centrumkomputacnilingvistiky.cz
10:00 Introduction to the Center, history, results (Jan Hajic)
10:25 Charles University research and results (Jan Hajic)
10:40 Break
11:00 Institute for Czech Language research and results (Karel Oliva)
11:15 Masaryk University research and results (Karel Pala)
11:30 University of West Bohemia research and results
The Center
Goals:
– Research in all areas of computational linguistics and speech
– Close cooperation in speech and language
– Create annotated data
– Algorithms and SW tools for NL analysis and generation
– Create and integrate lexical resources
History of the Center
Former Center for Computational Linguistics (program MŠMT LN)
– 2000-2004
– UK, ÚJČ, ZČU: fundamental research type (B)
Now: Center for Computational Linguistics
– (again) fundamental research, MŠMT LC
– Masaryk University in Brno added, now 4
The sites (1)
UK Praha (ÚFAL MFF / Charles University)
– Formal language theory and algorithms
– SW tools for NLU / NLG
– Raw and annotated data (incl. parallel)
ZČU Plzeň, KKY FAV (University of West Bohemia in Pilsen)
– Speech recognition and TTS
– Data collection and annotation
ÚJČ AV ČR (Institute of the Czech Language, Academy of Sciences of the CR)
– Digitization of historical data
– Lexical databases
2005
Start of work, after some “gap”
– Apr. 1, 2005 – three months vacuum
– [Got back the name…]
– Reduced budget for 2005 (300k €)
– Cooperation:
  – EU grant proposals
  – Continuing work on Malach (U.S.)
  – Start of the PIRE NSF project (JHU, Brown Univ.)
2006
First full year
– Prague Dependency Treebank v2.0 finished (published at LDC)
– Speech reconstruction project (UK, specification with PIRE/JHU)
– Lexical issues (UK, MU, ÚJČ)
– Speech (ASR, TTS - ZČU)
– IR – CLEF test collection, CLEF shared task, 1st part
– Digitization of historical material (ÚJČ)
– Start of EU Integrated project “Companions”: UK, ZČU
– More international cooperation: EU, USA (JHU, Brown, Univ. of Pennsylvania)
– Organization of Treebanks and Linguistic Theories, Dec. 2006 (UK)
– 40 “results” in the government database (“RIV”)
2007
Mid-project
– Lexical resources, new Czech language lexical database (MU+ÚJČ)
– Added more students for English work, translation
  – English annotation specification, annotation (ZČU, UK)
– Integration of ASR and TTS with NLU/NLG (UK, ZČU)
  – In the “Companions” project
– SW tools for analysis and generation
  – Speech, language (UK, MU, ZČU)
– International collaboration
  – EU (3 projects in the 6th FP: UK, UK+ZČU), USA (UK, UK+ZČU)
– Local organisation of ACL 2007 and EMNLP 2007
  – Still (2011) holds record in attendance (~1100 participants)
– Semantics
  – Detection of plagiarism (MU)
  – NLU (UK, MU), NLG (UK)
– New algorithms for ASR
  – Prosody, language modeling, speech reconstruction
– Data acquisition, annotation, corpus tools
– Research (incl. data annotation) for machine translation
  – The TectoMT SW and data platform
– Theoretical formal linguistics, language usage
2009
Should have been the last year of CKL…
– Application for extension for 2010-11
  – Granted for 2010
– Research: English data, MT, ASR, Dialog
  – Work on the parallel Czech-English treebank (PTB)
  – Companions project: integration work
    – Tight cooperation between UK and ZČU
  – PIRE project – workshops, students from US at UK
  – Euromatrix EU project on MT extended (-2012)
– Organization of the CoNLL 2009 shared task
– Organization of session at FET 2009 (EU
2010
Last fully-funded year: ext. to 2011 granted in Nov.
– Continuation of research along the same lines
  – Wrap-up in data annotation: PCEDT, PDTSx
  – Departures of people due to uncertainty
– International cooperation:
  – Companions project finished (Nov. 2010)
  – PIRE continuing towards 2011, EuromatrixPlus renewed (UK)
  – New projects in 2010:
    – Univ. of Pennsylvania – discourse representation, annotation (UK)
    – Khresmoi (EU IP) – medical IR and IE, UK
    – Faust (STREP, machine translation, UK)
    – META-NET network of excellence in MT / data sharing
Most valued publications
Papers
– Semi-supervised POS tagging (EACL 2009)
  – Best results in POS tagging so far, incl. English
  – Now taggers available in 5 languages
– Extension of HVS Semantic Parser by Allowing Left-Right Branching (ICASSP 2008)
  – New result, drawing from S. Young’s work
– Large-scale Semantic Networks: Annotation and Evaluation
  – NAACL 2009; in cooperation with Google Research (Zurich, K.
  – Overall task and system description
Book
– Valenční slovník českých sloves (Valency Lexicon of Czech Verbs, Karolinum Press)
  – Electronic version available
Most valued data
Corpora (language databases, publicly available)
– Prague Dependency Treebank 2.0, Linguistic Data Consortium 2006
– Prague Czech-English Dependency Treebank, to appear in 2011
  – Penn Treebank & translation to Czech, with semantic annotation
– Syntactic dependency parser “MST” (Czech)
  – With Univ. of Pennsylvania
– Improved Czech ASR and Emotional TTS
  – Used in the Companions project
– NLG and Dialogue Manager w/knowledge base
  – Also for the Companions project
– The TectoMT SW and data handling platform
  – MT, dialogue systems (now any NLU/NLG processing ->
Only in 2005/6 – need for renewal
– Small indirect costs (< 12%, contribution of inst.)
“intangible” benefits
– (Sub)teams, even across institutions, flexible assignment of people to projects,
– dissertations, one assoc. professor promotion
The Center had to work under certain “restrictions”
Employment of graduate students, postdocs, supervision of graduate students
– Now at all four sites (2009: 10/4/9/1)
Requirement: at least on site… → Check
Requirement: Participation of students (Bc./Mgr./Ph.D.)
Students - after graduation - went to (e.g.)…
– Petr Němec (UK): TextKernel, Hol.; Kiril Ribarov (UK): ČEZ
– Jan Romportl, Aleš Pražák: SpeechTech (spinoff, ZČU)
– Vladimír Kadlec (MU Brno): Acision (GB)
– Petr Pajas (UK): Google (Zurich)
– Václav Novák (UK): Ministry of Interior, then a small startup
– Former CKL (LN, 00-04): M. Čmejrek, J. Cuřín (UK): IBM Research
USA
– Malach (till 2007; UK, ZČU): USC, JHU, IBM, UMD
– PIRE: speech recognition and machine translation (UK, indirectly ZČU): JHU, Brown Univ.
– Discourse: Univ. of Pennsylvania
– Treebanking: Univ. of Colorado → Check