Victoria Muehleisen Waseda University

Introducing the SILS Learner Corpus Victoria Muehleisen Waseda University.

Dec 31, 2015

SILS = School of International Liberal Studies at Waseda University in Tokyo.

Waseda was founded in 1882, but SILS was only started in April 2004.

The SILS curriculum is mainly taught in English.

The majority of students are Japanese who have been educated in Japan, but a growing number come from other (mainly Asian) countries and/or have been educated outside of Japan.

Based on the results of a placement test (TOEFL-PBT), about 2/3 of entering students take extra classes in English reading and listening.

ALL students (regardless of English ability) take English writing courses.

There are three levels of writing class, and students are placed by means of an in-house placement test.

All students must complete the Advanced Level before graduating: most do this within the first three semesters, before study abroad.

We are collecting the essays from the required writing classes for the SILS corpus.

In the first few weeks of their first writing class, the corpus project is explained to students, and they are asked for their permission.

Those who agree also fill out a survey about their language background.

All essays for the writing classes are submitted on-line, so after permission has been given, the teachers and students don’t have to do anything else. The essays are automatically collected.

At any time until they graduate, students can ask for particular essays to be excluded, or even for all of their essays to be removed (but no one has done this so far).

The essays are downloaded class-by-class throughout the semester.

The background survey data and essays are entered by graduate student workers into a custom-made database.

The background data include gender, age, TOEFL score, native language(s), etc.

They also cover where students have lived and studied, and what languages they used in these contexts.

Each semester, we make a class list for all the students in each class who are participating in the project.

The class lists are used to organize the data entry.

The database also includes detailed information about the assignments the students were given.

Students upload their essays using their preferred word-processing program (usually a version of Word, but some as plain text).

After we download the essays, we use cut-and-paste to put them into the database, where they are stored as plain text (Unicode).

Entering essays is a slow procedure! But we can’t change the way the essays are submitted for the courses, and we need to be sure that we only include essays by students who have given permission.

When putting the essays into the database, some formatting is lost (e.g., margins, font), but we make sure to keep some kinds: paragraph breaks, font styles (italic, bold, underline).

We also have ways to describe tables or pictures which are removed.

The title, essay body, and references are put into separate sections. Students’ names are removed, of course.

Both first drafts and second drafts (when available) are included in the database.
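
As a rough illustration of this layout, one essay entry could be modelled as follows. This is only a sketch: the class name, every field name, and the sample values are hypothetical and are not taken from the actual SILS database.

```python
from dataclasses import dataclass


@dataclass
class EssayRecord:
    """One draft of one essay. All names here are hypothetical."""
    student_id: str       # anonymous ID stored in place of the student's name
    assignment_id: str    # links to the assignment description
    draft: int            # 1 = first draft, 2 = second draft
    title: str            # kept separate from the body
    body: str             # plain-text essay body
    references: str = ""  # reference list; empty when nothing is cited

    def word_count(self) -> int:
        # Count whitespace-separated tokens in the body only,
        # so titles and reference lists do not inflate the total.
        return len(self.body.split())


first = EssayRecord(
    student_id="S0001",
    assignment_id="A03",
    draft=1,
    title="School Uniforms",
    body="Uniforms save time in the morning.",
)
print(first.word_count())
```

Keeping the title and references outside the body means that word counts and searches can be limited to the students' own running text.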

There are no plans to annotate the whole corpus for errors or POS, but we may try it with small sub-corpora at some future time.

After three semesters (Fall 2005, Spring 2006, and Fall 2006), we have 2,800 first drafts and more than 5,000 essays in total when both first and second drafts are included.

The total number of words is around 1,650,000 for first drafts only, and around 3,180,000 for first and second drafts combined.
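
These totals imply some rough averages. The short calculation below uses only the rounded figures reported here (2,800 first drafts, more than 5,000 essays, about 1,650,000 and 3,180,000 words), so the results are approximations; the variable names are just for illustration.

```python
first_drafts = 2_800
all_essays = 5_000             # reported as "more than 5,000": a lower bound
first_draft_words = 1_650_000  # approximate
all_words = 3_180_000          # approximate

# Average length of a first draft.
avg_first = first_draft_words / first_drafts

# Words and (minimum) number of second drafts, by subtraction.
second_draft_words = all_words - first_draft_words
min_second_drafts = all_essays - first_drafts

# Because the true number of second drafts is larger than this minimum,
# the true average second-draft length is at most this value.
max_avg_second = second_draft_words / min_second_drafts

print(round(avg_first))       # 589
print(second_draft_words)     # 1530000
print(round(max_avg_second))  # 695
```

So a typical first draft runs to a little under 600 words, and the second drafts average somewhat under 700 words.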

There are essays by about 700 different students.

Most of these have Japanese as their native language, but there are also 39 students whose native language is Chinese, 33 for Korean, 13 for English, and 6 for others.

We are currently inputting the essays from the Spring 2007 semester (which starts in April and runs through the end of July).

We can output a tailor-made corpus created using the variables mentioned already.

For example, we can create a corpus of all the essays written by women whose native language is Chinese.

We can make a corpus of first drafts of a particular assignment and compare it to the second drafts.

We can even make a corpus of essays written for the advanced class in Fall 2006 by students with Japanese as a native language who started out in the intermediate class in spring and who went to high school in Japan.
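
The idea of building a tailor-made corpus from these variables can be sketched as a simple filter over essay metadata. This is a minimal illustration, not the project's database interface: the `select` helper, the dictionary keys, and the sample records are all invented for the example.

```python
from typing import Callable, Dict, List

Essay = Dict[str, object]             # hypothetical metadata record
Condition = Callable[[Essay], bool]


def select(essays: List[Essay], *conditions: Condition) -> List[Essay]:
    """Return the essays that satisfy every condition."""
    return [e for e in essays if all(cond(e) for cond in conditions)]


# Invented sample records; real entries would carry many more variables.
essays: List[Essay] = [
    {"id": 1, "gender": "F", "native_language": "Chinese", "draft": 1},
    {"id": 2, "gender": "M", "native_language": "Chinese", "draft": 1},
    {"id": 3, "gender": "F", "native_language": "Japanese", "draft": 2},
    {"id": 4, "gender": "F", "native_language": "Chinese", "draft": 2},
]

# All essays written by women whose native language is Chinese.
sub_corpus = select(
    essays,
    lambda e: e["gender"] == "F",
    lambda e: e["native_language"] == "Chinese",
)
ids = [e["id"] for e in sub_corpus]
print(ids)  # [1, 4]
```

More specific sub-corpora, such as the advanced-class example above, would simply add further conditions (semester, class level, starting level, place of schooling) to the same call.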

Examining the effectiveness of the curriculum and materials used in the writing classes, e.g., students' use of quotation and paraphrasing, which are emphasized in our writing courses, and differences between first and second drafts, to see how much and what students actually change.
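
One possible way to see what changes between two drafts is a word-level comparison. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; this is an assumed approach for illustration, not a method the project has adopted, and the function name and sample sentences are invented.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def changed_words(first: str, second: str) -> List[Tuple[str, str, str]]:
    """List word-level edits as (operation, old words, new words)."""
    a, b = first.split(), second.split()
    matcher = SequenceMatcher(None, a, b)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only replace / insert / delete
            changes.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


first_draft = "I think that corpus is useful"
second_draft = "I believe that the corpus is useful"

changes = changed_words(first_draft, second_draft)
print(changes)
# [('replace', 'think', 'believe'), ('insert', '', 'the')]
```

Counting such operations over many draft pairs would give a rough measure of how much students revise, while the changed words themselves show what kind of revision it is.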

We also plan to look at students’ overuse and underuse of collocations found in academic writing.

The extensive language background data should also make the corpus useful for people studying L1 influence in L2 writing.

At some point, data from the corpus will probably be publicly available, but I don’t know when. (It’s not clear who at the university would have to approve use of the data outside of Waseda.)

[email protected]

http://www.f.waseda.jp/vicky/learner/index.html

A research report describing the creation of the corpus will be available on-line soon. Please check the website above for details.