Automated Essay Scoring: Giving Voice to Grievances and Exploring New Contexts Elizabeth Edwards PhD Candidate, Graduate Teaching Assistant Washington State University

Feb 25, 2016

Automated Essay Scoring: Giving Voice to Grievances and Exploring New Contexts

Elizabeth Edwards
PhD Candidate, Graduate Teaching Assistant
Washington State University

A Need to Define Terms and Situations
With efficiency and profit overriding the AES debate, what argumentative options are available to composition studies?

What areas of compromise and consideration exist, how are they discussed, and in which directions do they propel debates about AES?

The Complexity of Writing
Vojak et al.: writing is a socially situated activity, functionally and formally diverse, [and] a meaning-making activity that can be conveyed in multiple modalities (98).

Ericsson: [c]onsidering the meaning of meaning is vitally important ... because most people believe that conveying meaning is the most important goal of the written word (30).

Anson: [i]nferencing ... provide[s] the connective tissue between assertions and yielding meaning and interpretation (42). Inferencing can't be replicated by a computer (39), without turn[ing words] into mindless bits of linguistic code (48).

The Commodification of Writing
Haswell: data mining allows entrepreneurs to develop front ends and sets of documentation that make their systems friendly, that is, easy to use, cheap, and accurate (24).

Burstein and Marcu: machines can build model[s to] look at trends across 1,200 essays (458).

Kukich: AES might ... elucidate many of the features that characterize good and bad writing, and many of the linguistic, [and] cognitive ... skills that underlie the human capacity for ... writing (22).

Subtracting Humans (Teachers) from the Equation
Kukich: using e-rater clearly relieves a significant portion of the load on human scoring experts (25).

Rudner and Gagne: computer scoring can be faster, reduce costs, increase accuracy, and eliminate concerns about rater consistency and fatigue (1).

Shermis et al.: [o]bviously computers do not understand written messages in the same way that humans do (403), but assert that alternative technologies achieve similar results (403).

Mild Concessions from AES Supporters
Klobucar et al.: automated essay scoring might fit within a larger ecology as one among a family of assessment techniques supporting the development of digitally enhanced literacy (105).

Klobucar et al.: [W]e must guard against the possibility that the machine will misjudge a writing feature and that students will be wrongly counseled (114).

Messick: validity should be plural, not singular (37) and issues with construct validity are not just errors or bugs in the experiment; they are the issues with the entire test or system in the first place (41).

Byrne et al. write in an ethical postscript that machines [can] not detect other subtleties of writing such as irony, metaphor, puns, connotation, or other rhetorical devices (33).

Options for Reframing the AES Debate
Whithaus: composition researchers and teachers need to step back from a discourse of rejection ... to make finer ... distinctions among types of software and their uses (170). Finer distinctions will allow a situation by situation consideration of how software is used and its impact on writing pedagogy (176).

Condon: when assessment and prompt design occur at a national level, both almost certainly have little to do with local curricula, and they may well be inappropriate for a local student population (214).

Condon: machine scoring's principle advantage, economy, comes at too great a cost ... Machine scoring simply cannot compete economically, as long as we consider all the costs of employing it (217, original emphasis).

Works Cited (1)
Anson, Chris M. "Can't Touch This: Reflections on the Servitude of Computers as Readers." Ericsson, Patricia Freitag, and Haswell, Richard H., Eds. Machine Scoring of Student Essays: Truth and Consequences. Logan, UT: Utah State University Press, 2006. 38-56. Print.
Burstein, Jill and Daniel Marcu. "A Machine Learning Approach for Identification of Thesis and Conclusion Statements in Student Essays." Computers and the Humanities 12.3 (2004): 455-467.
Byrne, Roxanne et al. "eGrader, a software application that automatically scores student essays: with a postscript on the ethical complexities." Journal of Systemics, Cybernetics & Informatics 8.6 (2010): 30-35.
Condon, William. "Why Less Is Not More: What We Lose by Letting a Computer Score Writing Samples." Ericsson, Patricia Freitag, and Haswell, Richard H., Eds. Machine Scoring of Student Essays: Truth and Consequences. Logan, UT: Utah State University Press, 2006. 211-220. Print.

Works Cited (2)
Ericsson, Patricia Freitag. "The Meaning of Meaning: Is a Paragraph More Than an Equation?" Ericsson, Patricia Freitag, and Haswell, Richard H., Eds. Machine Scoring of Student Essays: Truth and Consequences. Logan, UT: Utah State University Press, 2006. 28-37. Print.
Haswell, Richard H. "Automatons and Automated Scoring: Drudges, Black Boxes, and Dei Ex Machina." Machine Scoring of Student Essays: Truth and Consequences. Logan, UT: Utah State University Press, 2006. 57-78. Print.
Klobucar, A., et al. "Automated scoring in context: Rapid assessment for placed students." Assessing Writing 18 (2013): 62-84.
Kukich, Karen. "Beyond Automated Essay Scoring." IEEE Intelligent Systems 15.5 (2000): 22-27.
McCurry, Doug. "Can machine scoring deal with broad and open writing tests as well as human readers?" Assessing Writing 15.2 (2010): 118-129.

Works Cited (3)
Messick, S. "Test validity: A matter of consequences." Social Indicators Research 45 (1998): 35-44.
Rudner, Lawrence, and Phill Gagne. "An Overview of Three Approaches to Scoring Written Essays by Computer." Practical Assessment, Research, and Evaluation 7.26 (2001).
Shermis, Mark D., et al. "Applications of Computers in Assessment and Analysis of Writing." Handbook of Writing Research. Ed. Charles A. MacArthur, Steve Graham, & Jill Fitzgerald. New York: Guilford Press, 2005. 403-416.
Vojak, Colleen, et al. "New Spaces and Old Places: An Analysis of Writing Assessment Software." Computers and Composition 28 (2011): 97-111.
Whithaus, Carl. "Always Already: Automated Essay Scoring and Grammar-Checkers in College Writing Courses." Ericsson, Patricia Freitag, and Haswell, Richard H., Eds. Machine Scoring of Student Essays: Truth and Consequences. Logan, UT: Utah State University Press, 2006. 166-176. Print.

When Machines Grade, Who Reads?
Automation and the Production and Consumption of Writing
Mike Edwards
@preterite #cwcon #g6

Deautomation / Automation / Capital / Labor / Economics / Technology / Futures

de-automated labor of reading
didactics
modeling
finding topics
enculturation
self-discovery
inquiry
analysis
engaging difficulty
apprenticeship
to count and quantize and match patterns

AES systems don't understand what they read
replace labor-intensive processes with capital-intensive processes
purpose of essay as labor
purpose of essay as capital
purpose of assessment as labor
purpose of assessment as capital
no increased value for student learning

William & Flora Hewlett ASAP prize
$200,000
O'Reilly Strata Conference
Natural Language Processing
capital cannot idle
administrators like quantification
technological capital: 1x investment
pedagogical labor: ongoing investment

College English 1947: Arthur Coon, "An Economic X Marks the Spot."
Computers & Composition 1994: Carolyn Dowling, "The Ongoing Difficulty of Writing."
over the past few decades, student essays have gotten longer
labor's value measured by time and volume

Aggregation Problem (Piero Sraffa)
Cambridge Capital Controversy
Keynesian, Marxian, Neoclassical
investigations of AES algorithms: who produces, distributes, uses, re-produces; how value is appropriated in assessment

Piketty, Thomas. Capital in the Twenty-First Century. Cambridge, MA: Belknap, 2014.
a redistribution of income away from labor and toward holders of capital
technological innovation and replacing labor with capital contributes to rising inequality
technological casualization of the labor of writing instruction

labor is objectively scarce
as time-metered (C-M-C) commodity
what are the comparative measures of increase in G and R for students & instructors?
[email protected]

The Machine Has Spoken
Automated Invention in an Era of Automated Assessment
Matthew Frye ([email protected])
PhD Student, Washington State University
Machine Loyalist
-Greetings (0:15)

Overview
Short, qualitative study (12 participants)
What happens when machines write and humans evaluate?
Are our programs showing progress towards "understanding" writing?
-What I did
-Scare quotes because I don't want to debate issues of who/what is doing the "understanding" or what "understanding" really is, just whether or not we see evidence of machines approximating it

Overview

Re: Perelman's Basic Automatic BS Essay Language (BABEL) Generator

We've gotten really good at showing that we can fool the machine evaluations

Can the machines fool us?
Chronicle of Higher Ed. April 28, 2014

-Brief discussion of Narrative Science and the Quill program (0:45)

A Framework
http://mylearningnetwork.com/wp-content/uploads/2014/04/bloom-taxonomy-pyramid.png

Bloom's Revised Taxonomy (Anderson et al., 2000)

If we're going to trust automation, we want it to behave similar to a human
-Taxonomy
-Emphasize "similar to" because a machine is not a human (insert Fry and/or Neo?)

We like when machines

Identify words (ctrl F)
Spell Check
Hartley & Tynjälä 2001. Four levels of composition programs
Level 1: Simple word processors. Can identify words and sentences. Biggest feature is word-wrapping. Corresponds with Bloom's Remembering/Identification
Level 2: MS Word, ca. 1990s. Can identify adherence to and deviance from formulaic language and structures. (Spelling and grammar check.) Corresponds with Bloom's Understanding/Comprehension
Level 3: Modern Word and concordancers. Can do deeper analysis of writing for themes, trends, structures, etc. Can't really create anything, and can't evaluate beyond the parameters set out by the user. Corresponds with Bloom's Applying and Analyzing steps.
Level 4: Mythical composing environments. Just being proposed and theorized at the time of Hartley & Tynjälä's writing. Something like Scrivener.

We're OK when machines

Identify words (ctrl F)
Spell Check
AutoComplete
Word Suggestion
Meta-analysis

But

Identify words (ctrl F)
Spell Check
AutoComplete
Word Suggestion
Meta-analysis
The items below do not an evaluation make!!
-Machines can do first four levels, but evaluation is beyond them
-First four levels are not accretive. They do not suddenly manifest in the ability to evaluate
-We need a new way of understanding what's happening between Analyzing and Evaluating

A Framework
The current was too strong!
The currant was too strong!
[Article + noun] [verb] [article + indirect object]

Current! Too strong!
We can see a lot of grief over statements like these
Two are easily understood by machines. Humans understand all three.
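A toy sketch of the contrast on this slide, assuming a hypothetical six-word lexicon and a single "accepted" part-of-speech pattern. Neither comes from the talk, and the pattern is a simplification of the bracketed template on the slide. A reader that only checks structure accepts both full sentences, including the semantically odd "currant" one, and rejects the fragment that a human reads without trouble.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration): word -> part of speech.
LEXICON = {
    "the": "ART",
    "current": "NOUN",
    "currant": "NOUN",
    "was": "VERB",
    "too": "ADV",
    "strong": "ADJ",
}

# Hypothetical "accepted linguistic structure": article, noun, verb, adverb, adjective.
ACCEPTED_PATTERN = ["ART", "NOUN", "VERB", "ADV", "ADJ"]


def tag(sentence):
    """Lowercase the sentence, split it into words, and look up each word's tag."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [LEXICON.get(word, "UNK") for word in words]


def fits_accepted_structure(sentence):
    """True when the tag sequence matches the accepted pattern exactly.

    Meaning is never consulted: 'current' and 'currant' are both just NOUN.
    """
    return tag(sentence) == ACCEPTED_PATTERN


if __name__ == "__main__":
    statements = [
        "The current was too strong!",
        "The currant was too strong!",
        "Current! Too strong!",
    ]
    for statement in statements:
        print(statement, "->", fits_accepted_structure(statement))
```

The design choice is deliberate: the checker has no access to meaning, so it cannot tell that a dried fruit is an unlikely thing to be "too strong," and it cannot recover the clear sense of the fragment.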

Radical Restructuring
According to [the radical restructuring view of the expert/novice shift], the novice does not simply have an impoverished knowledge base compared to that of the expert; the novice has a different theory, different in terms of its structure, in the domain of phenomena it explains, and in its individual concepts. (Vosniadou, S. & Brewer, W. F., 1987, p. 54)
-Vosniadou and Brewer's expansion of Piagetian learning theory provides a good lens to approach the problem. It isn't that computers know less about writing, it's that the theory/construct of good writing is different from what experts/humans have.

Radical Restructuring
According to [the radical restructuring view of the human/computer shift], the computer does not simply have an impoverished knowledge base compared to that of the human; the computer has a different theory, different in terms of its structure, in the domain of phenomena it explains, and in its individual concepts. (Vosniadou, S. & Brewer, W. F., 1987, p. 54)
Humans and computers read the statements differently
Humans read for the meaning
Computers read for deviance from accepted linguistic structures

A Study
Two Passages
One written by a human, one by computer
Journalistic report genre
Local issues
Twelve readers
Experts (composition teachers)
Novices (people who are literate)
-Background of passages and participant pool

Left Passage: Baseball
Putnam City North falls to Stillwater 6-4 in Spite of Brantley's performance

Nathan Brantley did all he could to give Putnam City North a boost, but it wasn't enough to get past Stillwater, as Putnam City North lost 6-4 in seven innings at Sante Fe on Wednesday.

It was a good day at the plate for Putnam City North's Brantley. Brantley went 2-3, drove in one and scored one run. He singled in the second and sixth innings.

Hunter Heffington had an impressive outing against Stillwater's lineup. Heffington held Stillwater hitless over 1 2/3 innings, allowed no earned runs, walked none and struck out two.

The top of the second saw Putnam City North take an early lead, 1-0. Brantley singled to ignite Putnam City North's offensive. Blake Seibert singled, bringing home Heffington.

Stillwater went up for good in the third, scoring three runs on a two-run double by Joe Smith and an error.

Stillwater built upon its lead with three runs in the fifth. Scott Williams started the inning with a double, plating Dan Johnson and Eric Welch. That was followed up by George Segal's double, plating Williams.

Three runs in the top of the sixth helped Putnam City North close its deficit to 6-4. An RBI single by Brantley, a groundout by Heffington, and a steal of home by Brantley triggered Putnam City North's comeback. Chuckie Lundeen struck out to end the Putnam City North threat.

-Passages included just in case handouts fail

Right Passage: School Performance
East City Senior High School

East City Senior High School, part of the Community Schools of East City district, is located in East City, Ind. The school reports enrolling 860 students in grades nine through 12, and it has 63 teachers on staff.

East City Senior High School is above the state average but below the district average in terms of the percentage of students eligible for free or reduced-price lunches. On average, 43 percent of students in Indiana are eligible for free or reduced-price lunch programs, whereas 52 percent of East City Senior High School students do. At the district level, 62 percent of students are eligible.

ProPublica's analysis found that all too often, states and schools provide poor students fewer educational programs like Advanced Placement, gifted and talented programs, and advanced math and science classes. Studies have linked participation in these programs with better outcomes later in life. Our analysis uses free and reduced-price lunch to estimate poverty at schools. We based our findings on the most comprehensive data set of access to advanced classes and special programs in U.S. public schools known as the Civil Rights Data Set released by the U.S. Department of Education Office for Civil Rights.

East City Senior High School offers 10 AP courses, and 8 percent of students participate in those classes.

The school's pass rate for AP exams is the same as the district's, both at 23 percent.

A school's AP pass rate is determed by the number of students who both sat for A exams and passed some or all of those exams.

East City Senior High School's enrollment rates in chemistry, physics, and advanced math subject areas are 4 percent, 2 percent and 7 percent, respectively. Gifted and talented at the school has an enrollment rate of 23 percent.

West City Community High School, in West City, Ind., is a lower-poverty school than East City Senior High School, with 4 percent of its students qualifying for free or reduced-price lunch. The school offers 20 AP courses, and 38 percent of students are enrolled in those classes.

Qualitative Data
Readers read passages
Q1: Which was written by a machine?
Q2: Why do you think that was the computer-written passage?
Q3: Do you have any comments about one passage or the other?
-I wanted to prime participants towards explaining their choice, rather than using the analysis to think through the problem, as my probing questions might have influenced their responses

Responses
Expert 1
This seems like a trick

Responses
Expert 1
Chose Passage 1 because the paragraphs were all roughly the same length, which seemed more mechanical.

Responses
Expert 3
Questioned whether both were meant to be AP style
Passage 2 is not AP: too much text, royal we, no inverted pyramid
Personal pronouns in the second, not newspapery
Complex sentence structure [in schools article] marks it as human

Responses
Novice 1
The last reason is that, though both are mostly a regurgitation of statistics, the [baseball] article is set up to tell a story. 'This guy did this, then that guy did that, which was great, but unfortunately not enough to win.' The [school article] was just a jumble of stats that had no theme or coherence and the last paragraph seemed completely unrelated.

Responses
Novice 5
The [school] passage contains what I believe to be numerous grammatical errors, which to my knowledge many programs are able to detect and eliminate. The first passage appears to be grammatically correct, but reads exceptionally flat and linear. This is either terrible sports writing or a heartless robot.

Responses
Expert 4
The title of the first one, "in spite of," seems very human to me.

Themes
Reader      Primary Concern(s)
Expert 1    Sentence Length/Mechanical Style
Expert 2    Genre Awareness
Expert 3    Style and Formatting
Expert 4    Human lexical bundles
Expert 5    Stilted language
Expert 6    Only identification
Expert 7    Only identification
Novice 1    Thesis
Novice 2    Filler/Errors
Novice 3    Numerical text
Novice 4    Length
Novice 5    Errors
-Primary concerns tended to be surface issues, but no other categorical signals towards machine writing

Themes
Experts and novices tended to identify one trait that marked a passage as human or computer written
Spent much of analysis justifying that identification
Mirrors earlier research that found human writing evaluators make a judgment early in a piece, then justify (cf. Weigle, 2002)
-Humans read linearly, every new idea is tested against what's already been read
-I know Weigle discusses this, but I can't find the original research

Good Guess
Group              % Correct    % Incorrect
Experts (n = 7)    57%          43%
Novices (n = 5)    40%          60%
Total              50%          50%
-Analyses and justifications are as good as a coin-toss for either group
-Large % differences within groups are a difference of one participant

Stats: Reader Comments
                         Baseball                                 School Performance
Words                    226                                      342
Sentences                16                                       16
Word/Sentence            15.2 (max = 37, min = 7, sd = 7.4)       23.1 (max = 41, min = 13, sd = 8.3)
T-units                  17                                       19
Word/t-unit              14.3 (max = 23, min = 7, sd = 5.0)       18.9 (max = 41, min = 7, sd = 9.8)
Paragraphs               7                                        8
Sentence/Para            2 (max = 3, min = 1, sd = .95)           2 (max = 4, min = 1, sd = 1.1)
Passive Voice %          6% (1)                                   0%
Pronouns                 7                                        5
Prepositional Phrases    24 (1.5/sentence)                        38 (2.4/sentence)
-Structurally, these are very similar

Stats: Word Rarity
               Baseball       School Performance
Top 1000       58.6% (136)    55.5% (202)
Top 3000       70.7% (164)    73.9% (269)
Less Common    29.3% (68)     26.1% (95)
*1000 and 3000 Most Common US English words from Paul Noll's online ESL Resources (paulnoll.com/books/clear-english)
Counts include proper nouns
Rarity counted by word frequency in newspapers/magazines
No significant difference in vocabulary levels
-Structurally, these are very similar

The Machine Has Spoken
Raise your hand if you think Putnam City North was written by a machine

The Machine Has Spoken
Raise your hand if you think East City Senior High School was written by a machine

The Machine Has Spoken
...and it was talking about East City Senior High School
-Can the experts in the room get it?

The Machine Has Spoken
Emerging Questions
If machines can write like humans, can we train them to read better?
If machines can write meaningful prose, what is the purpose of the human writer?
-The study raised more questions than it answered, but it revealed that writing instructors will need to be ready for this tool on the level that we needed to be ready for easy internet access twenty years ago
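The "coin-toss" reading of the Good Guess table can be checked with a little arithmetic. The sketch below assumes the reported percentages correspond to 4 of 7 experts and 2 of 5 novices answering correctly; those counts are inferred from the percentages and are not stated in the talk. It then asks how likely a pure guesser would be to do at least that well, and shows how far one participant moves a group's percentage.

```python
from math import comb

# Counts inferred from the reported percentages (an assumption, not given in the talk):
# 57% of 7 experts -> 4 correct, 40% of 5 novices -> 2 correct.
GROUPS = {"Experts": (4, 7), "Novices": (2, 5)}


def percent(correct, n):
    """Percentage correct, rounded to a whole number as on the slide."""
    return round(100 * correct / n)


def prob_at_least(correct, n, p=0.5):
    """Probability of `correct` or more right answers out of n when guessing with chance p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(correct, n + 1))


def one_participant_swing(correct, n):
    """Percentages if one fewer or one more participant had answered correctly."""
    return percent(correct - 1, n), percent(correct + 1, n)


total_correct = sum(correct for correct, _ in GROUPS.values())
total_n = sum(n for _, n in GROUPS.values())

if __name__ == "__main__":
    for name, (correct, n) in GROUPS.items():
        low, high = one_participant_swing(correct, n)
        print(
            f"{name}: {percent(correct, n)}% correct; "
            f"P(at least {correct} of {n} by guessing) = {prob_at_least(correct, n):.2f}; "
            f"one participant either way gives {low}% or {high}%"
        )
    print(f"Total: {percent(total_correct, total_n)}% correct ({total_correct} of {total_n})")
```

With groups this small, the exact binomial sum is easier to trust than a normal approximation, which is why the sketch sticks to `math.comb`.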

Aporia
I have no answers
More complications and questions than anything

What does this mean for teachers of writing? For doers of writing?

Dilbert.com (Jun 4, 2014)

How complex is the construct of writing, really?

The programmers seem to be getting close

How will access to this technology spread?

Monetary cost
Availability
Pirated/Unauthorized copies
Knock-off software

References
Anderson, L. W., Krathwohl, D. R., Airasian, P. W., Cruikshank, K. A., Mayer, R. E., Pintrich, P. R., Raths, J., Wittrock, M. C. (2000). A Taxonomy for Learning, Teaching, and Assessing: A revision of Bloom's Taxonomy of Educational Objectives. New York: Pearson, Allyn & Bacon.
Hartley, J. & Tynjälä, P. (2001). New technology, writing, and learning. In G. Rijlaarsdam (Series ed.) & P. Tynjälä, L. Mason & K. Lonka (Volume eds.), Studies in Writing, Volume 7, Writing as a Learning tool: Integrating Theory and Practice. 161-182.
Kolowich, S. (2014). Writing instructor, skeptical of automated grading, pits machine vs. machine. Chronicle of Higher Education. Retrieved from http://chronicle.com/article/Writing-Instructor-Skeptical/146211/
Vosniadou, S. & Brewer, W. F. (1987). Theories of knowledge restructuring in development. Review of Educational Research, 57(1). 51-67.
Weigle, S. C. (2002). Assessing Writing. Cambridge: Cambridge University Press.