How would I like to see ACL conferences develop and change in the next five years? Ted Pedersen Department of Computer Science University of Minnesota, Duluth http://www.d.umn.edu/~tpederse June 22, 2011
May 11, 2015
More papers with reproducible results...
Why?
If we are going to have highly empirical papers where progress is demonstrated via tables of results, then those results must be reproducible by the reader (and the author) to be believable
Are we doing science?
Other benefits...
Empiricism is not a matter of faith (Pedersen), Computational Linguistics, Volume 34, Number 3, pp. 465-470, September 2008.
http://aclweb.org/anthology-new/J/J08/J08-3010.pdf
Great Progress!
Replicability is a specific criterion in reviews
Software and data submissions to ACL 2011!
1,146 submissions: 84 w/software, 117 w/data
292 accepted: 30 w/software, 35 w/data
Software/data included in Proceedings USB!!
4 w/software+data, 13 w/software, 17 w/data
258/292 = 88% with neither software nor data
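As a quick check on the counts above, the rates can be worked out in a few lines. This is only a sketch: the numbers are copied from the slide, while the variable names and the `share` helper are illustrative assumptions.

```python
# Counts copied from the slide; names are illustrative only.
submitted = {"total": 1146, "software": 84, "data": 117}
accepted = {"total": 292, "software": 30, "data": 35}
neither_accepted = 258  # accepted papers with neither software nor data


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


rates = {
    "submitted with software": share(submitted["software"], submitted["total"]),
    "submitted with data": share(submitted["data"], submitted["total"]),
    "accepted with software": share(accepted["software"], accepted["total"]),
    "accepted with data": share(accepted["data"], accepted["total"]),
    "accepted with neither": share(neither_accepted, accepted["total"]),
}

for label, pct in rates.items():
    print(f"{label}: {pct}%")
```

Roughly 7% of submissions came with software and 10% with data, so the 88% "neither" figure among accepted papers mirrors a low rate at submission time.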
Relatively low submission rate for data and code ...
Already available, no need to do it?
Hard to anonymize existing or released code...
Just can't do it?
Restrictions on data and code?
Data and code aren't ready for public display...
Empirical Evaluation...
Randomly selected 10 of the 164 long papers
9 of 10 empirical
Reviewed papers to determine degree of replicability
Software available?
Data available?
Description self-contained and complete?
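The slides do not say how the ten papers were drawn. One way to make such a selection repeatable by others is to sample with a fixed, published seed. The sketch below assumes the 164 long papers are simply numbered 1 to 164; the seed value and the function name `pick_papers` are hypothetical.

```python
import random

NUM_LONG_PAPERS = 164  # from the slide
SAMPLE_SIZE = 10       # from the slide
SEED = 2011            # hypothetical; any fixed, published value would do


def pick_papers(seed: int = SEED) -> list[int]:
    """Return SAMPLE_SIZE distinct paper numbers in 1..NUM_LONG_PAPERS."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    return sorted(rng.sample(range(1, NUM_LONG_PAPERS + 1), SAMPLE_SIZE))


selection = pick_papers()
print(selection)
```

Anyone who runs this with the same seed on the same Python version gets the same ten paper numbers, so the sample itself can be verified.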
Replicability (1-5)
Will members of the ACL community be able to reproduce or verify the results in this paper?
5 = could easily reproduce the results.
4 = could mostly reproduce the results, but there may be some variation because of sample variance or minor variations in their interpretation of the protocol or method.
3 = could reproduce the results with some difficulty. The settings of parameters are underspecified or subjectively determined; the training/evaluation data are not widely available.
2 = would be hard pressed to reproduce the results. The contribution depends on data that are simply not available outside the author's institution or consortium; not enough details are provided.
1 = could not reproduce the results here no matter how hard they tried.
A Table of Results
Data?            Code?          Description?   Comparison?   Claim          Score
3rd party dist.  3rd party + ?  Complete?      self          self-improve   3
3rd party dist.  3rd party + ?  Complete?      self          self-improve   3
3rd party dist.  No             Parameters?    self          self-improve   2
Closed           No             See elsewhere  self          self-improve   1
Private sharing  3rd party + ?  Complete?      self          self-improve   2
Shared task      No             See elsewhere  Shared task   best ever!     1
Shared task      3rd party + ?  Complete       Shared task   Lower cost     4
Private sharing  No             Complete?      Pub. results  best ever!     1
Private sharing  3rd party + ?  Parameters?    Pub. results  Improve over   2
N/A              N/A            Complete       Theoretical   Improve scope  N/A
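The scores in the table above can be tallied with a short script. This is a sketch only: the scores are read off the table in row order, and the variable names are illustrative.

```python
from collections import Counter
from statistics import mean

# Replicability scores from the table, one per reviewed paper.
# None marks the theoretical paper, which was scored N/A.
scores = [3, 3, 2, 1, 2, 1, 4, 1, 2, None]

scored = [s for s in scores if s is not None]
distribution = Counter(scored)          # how many papers got each score
average = round(mean(scored), 2)        # mean over the scored papers only
at_most_two = sum(1 for s in scored if s <= 2)

print(sorted(distribution.items()))
print(f"mean score: {average}")
print(f"{at_most_two} of {len(scored)} scored papers rated 2 or lower")
```

By this tally, six of the nine empirical papers fall at 2 or below on the 1-5 replicability scale, and none reaches 5.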
A Few Generalizations...
We use data from 3rd parties and shared tasks
Still some private sharing and private data :(
1 of 10 submitted data (partial)
We use 3rd party code as a starting point...
...but don't provide extensions (3rd party + ?) :(
0 of 10 submitted software
Descriptions are often incomplete
...and this is why we need software and data
New Age Empiricism
Lots of self improvement
Can't anonymize software? Agreed.
How anonymous are submissions in the first place?
Web searches, plagiarism detectors, etc. often reveal authors anyway
"We expand on ground breaking work by Zigglebottom, 1999..." (thus spake Zigglebottom)
Drop blind submissions
Improving Our Reviewing Process (Mani), Computational Linguistics, Volume 37, Number 1, March 2011.
(related, e.g., advocates signed reviews)
Expect More. Reward More.
Weight replicability higher for accept/reject decisions and best paper awards.
Drop blind submissions, enable more transparent review of papers and software/data.
Continue initiatives to encourage submission of software/data and enable distribution
Nice work ACL 2011!
Be careful of domains where data is by definition not sharable