Pedersen acl2011-business-meeting

How would I like to see ACL conferences develop and change in the next five years?

Ted Pedersen
Department of Computer Science
University of Minnesota, Duluth
http://www.d.umn.edu/~tpederse

June 22, 2011


More papers with reproducible results...

Why?

If we are going to have highly empirical papers where progress is demonstrated via tables of results, then those results must be reproducible by the reader (and the author) to be believable.

Are we doing science?
Other benefits...

Empiricism is not a matter of faith (Pedersen), Computational Linguistics, Volume 34, Number 3, pp. 465-470, September 2008.

http://aclweb.org/anthology-new/J/J08/J08-3010.pdf


Great Progress!

Replicability is a specific review criterion
Software and data submissions to ACL 2011!

1,146 submissions: 84 w/software, 117 w/data

292 accepted: 30 w/software, 35 w/data
Software/data included on the Proceedings USB!!

4 w/software+data, 13 w/software, 17 w/data
258/292 = 88% with neither software nor data
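The percentages on this slide follow from simple arithmetic. The following is a minimal Python sketch. It assumes that the 4, 13, and 17 accepted papers are non-overlapping groups, which is what makes 292 - 34 = 258 work out. The variable names and the `pct` helper are illustrative only.

```python
# Counts as reported on the slides (ACL 2011).
submissions = 1146
submitted_with_software = 84
submitted_with_data = 117

accepted = 292
# Assumption: these three groups of accepted papers do not overlap.
accepted_software_and_data = 4
accepted_software = 13
accepted_data = 17


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Accepted papers that came with software, data, or both.
with_anything = accepted_software_and_data + accepted_software + accepted_data
neither = accepted - with_anything

print(f"Submissions with software: {pct(submitted_with_software, submissions)}%")
print(f"Submissions with data:     {pct(submitted_with_data, submissions)}%")
print(f"Accepted with neither:     {neither}/{accepted} = {pct(neither, accepted)}%")
```

The submission rates come out to 7.3% with software and 10.2% with data. The accepted papers with neither come out to 258/292 = 88.4%, which the slide rounds to 88%.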

Relatively low submission rate for data and code...

Already available, no need to do it?
Hard to anonymize existing or released code...

Just can't do it?
Restrictions on data and code?
Data and code aren't ready for public display...

Empirical Evaluation...

Randomly selected 10 of the 164 long papers
9 of 10 empirical

Reviewed papers to determine degree of replicability

Software available?
Data available?
Description self-contained and complete?

Replicability (1-5)

Will members of the ACL community be able to reproduce or verify the results in this paper?

5 = could easily reproduce the results.

4 = could mostly reproduce the results, but there may be some variation because of sample variance or minor variations in their interpretation of the protocol or method.

3 = could reproduce the results with some difficulty. The settings of parameters are underspecified or subjectively determined; the training/evaluation data are not widely available.

2 = would be hard pressed to reproduce the results. The contribution depends on data that are simply not available outside the author's institution or consortium; not enough details are provided.

1 = could not reproduce the results here no matter how hard they tried.

A Table of Results

Data?           | Code?         | Description?  | Comparison?  | Claim         | Score
3rd party dist. | 3rd party + ? | Complete?     | self         | self-improve  | 3
3rd party dist. | 3rd party + ? | Complete?     | self         | self-improve  | 3
3rd party dist. | No            | Parameters?   | self         | self-improve  | 2
Closed          | No            | See elsewhere | self         | self-improve  | 1
Private sharing | 3rd party + ? | Complete?     | self         | self-improve  | 2
Shared task     | No            | See elsewhere | Shared task  | best ever!    | 1
Shared task     | 3rd party + ? | Complete      | Shared task  | Lower cost    | 4
Private sharing | No            | Complete?     | Pub. results | best ever!    | 1
Private sharing | 3rd party + ? | Parameters?   | Pub. results | Improve over  | 2
N/A             | N/A           | Complete      | Theoretical  | Improve scope | N/A
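Each sampled paper received a single 1-5 replicability score, so the table can be summarized mechanically. The following is a minimal Python sketch. It assumes the scores are transcribed in row order and treats the theoretical paper's N/A as unscored.

```python
from collections import Counter
from statistics import mean

# Replicability scores (1-5) transcribed from the table above, one per
# sampled paper; None marks the theoretical paper that was not scored.
scores = [3, 3, 2, 1, 2, 1, 4, 1, 2, None]

scored = [s for s in scores if s is not None]
distribution = Counter(scored)

print(f"Scored papers: {len(scored)} of {len(scores)}")
# List how many papers received each score, from 5 down to 1.
for score in range(5, 0, -1):
    print(f"  score {score}: {distribution.get(score, 0)}")
print(f"Mean score: {mean(scored):.2f}")
```

The output shows that none of the nine scored papers reached a 5 ("could easily reproduce the results"). The mean sits just above 2.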

A Few Generalizations...

We use data from 3rd parties and shared tasks

Still some private sharing and private data :(
1 of 10 submitted data (partial)

We use 3rd party code as a starting point...
...but don't provide extensions (3rd party + ?) :(
0 of 10 submitted software

Descriptions are often incomplete... and this is why we need software and data

New Age Empiricism
Lots of self-improvement

Can't anonymize software? Agreed.

How anonymous are submissions in the first place?
Web searches, plagiarism detectors, etc. often reveal authors anyway.
"We expand on groundbreaking work by Zigglebottom, 1999..." (thus spake Zigglebottom)

Drop blind submissions

Improving Our Reviewing Process (Mani), Computational Linguistics, Volume 37, Number 1, March 2011.
(related; e.g., advocates signed reviews)

Expect More. Reward More.

Weight replicability higher for accept/reject decisions and best paper awards.

Drop blind submissions, enable more transparent review of papers and software/data.

Continue initiatives to encourage submission of software/data and enable distribution.

Nice work, ACL 2011!
Be careful of domains where data is by definition not sharable.
