A Comparison of the Quality of Data-driven Programming Hint Generation Algorithms

Thomas W. Price 1, Rui Zhi 1, Yihuan Dong 1, Tiffany Barnes 1, Nicholas Lytle 1, Veronica Cateté 1, Benjamin Paaßen 2

1 North Carolina State University
2 Bielefeld University

June 27th, 2019 - AIED

PRICE ET AL. – COMPARISON OF DATA-DRIVEN HINT GENERATION ALGORITHMS


Programming Hints


iSnap (Price 2017)

1. On-demand

2. Next-step, edit-based

3. Data-driven


iSnap (Price 2017) ITAP (Rivers 2017)

In the domain of programming, access to hints can:
◦ Improve post-test performance and efficiency (Corbett 2001)
◦ Improve future performance (under some circumstances) (Marwan 2019, forthcoming)

Data-driven techniques could make hints scalable and adaptive:
◦ Since 2008, over 25 papers on data-driven programming hints
◦ Evaluations focus on availability of hints, not quality (e.g. Peddycord 2014; Rivers 2017)

Not all programming hints are created equal (Price 2017):
◦ The quality of data-driven programming hints can vary considerably
◦ Even one low-quality hint can deter students from requesting future hints

Proposed Contributions

1. Methods: QUALITYSCORE, a validated and reusable procedure for comparing the quality of hint generation approaches
2. Results:
   a) An evaluation of six hint generation algorithms on multiple datasets and multiple programming languages
   b) Insight into current strengths and limitations of these algorithms
3. Data: All data and code needed to rate a new algorithm, available at: go.ncsu.edu/hint-quality-data

Data-Driven Hint Generation Algorithms
OVERVIEW OF THE SIX ALGORITHMS COMPARED

Outline: Algorithms, Methods, Results, Discussion

Data-driven Hint Generation

[Figure: Solution space for one problem, shown as a t-SNE embedding of iSnap data (Paaßen 2018). Labeled elements: traces (student attempts), snapshots, start, solutions, and the hint request (purple).]

Inputs:
◦ Correct Solutions (training data)
◦ Hint Request (purple)

Outputs:
◦ Next suggested snapshot/edit

Graph-based Approaches:
◦ Follow prior students’ paths to a solution

1. NSNLS: Next Step of Nearest Learner Solution (Gross 2014)
   a) Find the closest partial student solution
   b) Suggest the next step
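The two NSNLS steps can be sketched roughly as follows. This is an illustrative sketch only, not the implementation from Gross 2014: it assumes each trace is a list of code snapshots stored as strings, and it uses `difflib.SequenceMatcher` as a stand-in similarity measure. The names `similarity` and `next_step_hint` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional


def similarity(a: str, b: str) -> float:
    """Stand-in similarity between two snapshots (1.0 means identical)."""
    return SequenceMatcher(None, a, b).ratio()


def next_step_hint(request: str, traces: list[list[str]]) -> Optional[str]:
    """Return the snapshot that followed the most similar prior snapshot."""
    best_score, best_next = -1.0, None
    for trace in traces:
        # Only snapshots that have a successor can supply a next step.
        for current, following in zip(trace, trace[1:]):
            score = similarity(request, current)
            if score > best_score:
                best_score, best_next = score, following
    return best_next


# Hypothetical traces from two prior students working on the same problem.
traces = [
    ["def f(s):", "def f(s):\n    return s[0]", "def f(s):\n    return s[0] + s[-1]"],
    ["def f(s):", "def f(s):\n    x = s", "def f(s):\n    x = s\n    return x[0] + x[-1]"],
]
hint = next_step_hint("def f(s):\n    return s[0]", traces)
```

Here the request is identical to a snapshot in the first trace, so the suggested next step is that student's following snapshot. A real system would compare program structure rather than raw text.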

2. CTD: Contextual Tree Decomposition (Price 2016)
   a) Decompose the source code into subtrees
      ◦ E.g. all code inside a given if-statement
   b) For each subtree, construct the solution space; suggest an edit
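The decomposition step (a) might look something like the sketch below, which uses Python's standard `ast` module (Python 3.9+ for `ast.unparse`). It covers only the splitting of code into subtrees, not the construction of a solution space for each one, and it is not the CTD implementation from Price 2016. The function name `decompose` and the sample code are assumptions made for illustration.

```python
import ast


def decompose(source: str) -> list[tuple[str, str]]:
    """Split code into (context, subtree source) pairs, one per statement body."""
    subtrees = []
    for node in ast.walk(ast.parse(source)):
        # A block-like node (function, if, loop) owns a list of child statements.
        body = getattr(node, "body", None)
        if isinstance(body, list) and not isinstance(node, ast.Module):
            context = type(node).__name__  # e.g. "If" or "For"
            code = "\n".join(ast.unparse(stmt) for stmt in body)
            subtrees.append((context, code))
    return subtrees


# Hypothetical student code with one if-statement inside a function.
student_code = """
def first_and_last(s):
    if len(s) > 0:
        return s[0] + s[-1]
    return ''
"""
parts = decompose(student_code)
```

Each pair names the enclosing construct and the code inside it, so the body of the if-statement can be compared against other students' if-statement bodies independently of the rest of the program.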

3. ITAP (Rivers 2017)
   a) Identify the closest solution
   b) Select a target state
   c) Suggest a single edit

Solution-based Approaches:

4. TR-ER (Zimmerman 2015)
5. SourceCheck (Price 2017)
   a) Identify the closest solution
   b) Suggest edits to get closer to that solution
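The general "closest solution, then edits toward it" idea can be sketched as below. This is not the SourceCheck algorithm itself, which the slides do not specify in detail: the sketch assumes code is already tokenized into lists of strings and substitutes `difflib.SequenceMatcher` for the real distance measure. The names `closest_solution` and `edits_toward` are hypothetical.

```python
from difflib import SequenceMatcher


def closest_solution(tokens: list[str], solutions: list[list[str]]) -> list[str]:
    """Pick the correct solution most similar to the student's tokens."""
    return max(solutions, key=lambda sol: SequenceMatcher(None, tokens, sol).ratio())


def edits_toward(tokens: list[str], target: list[str]) -> list[tuple[str, list[str], list[str]]]:
    """List (operation, old tokens, new tokens) edits that turn tokens into target."""
    matcher = SequenceMatcher(None, tokens, target)
    return [
        (op, tokens[i1:i2], target[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


# Hypothetical student code and two hypothetical correct solutions.
student = ["return", "s", "[", "1", "]"]
solutions = [
    ["return", "s", "[", "0", "]", "+", "s", "[", "-1", "]"],
    ["print", "(", "s", ")"],
]
target = closest_solution(student, solutions)
hints = edits_toward(student, target)
```

Each element of `hints` is a candidate edit-based hint. An algorithm following this pattern still has to decide which of the candidate edits to show the student.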

Machine Learning Approaches:

6. Continuous Hint Factory (CHF) (Paaßen 2018)
   a) Predicts how successful students would edit their code

Method: QUALITYSCORE
REUSABLE QUALITY METRIC FOR DATA-DRIVEN HINT GENERATION

Data

iSnap (Price 2017)
◦ Novice programming environment
◦ On-demand data-driven hints
◦ 120 non-CS majors
◦ Fall 2016 and Spring 2017
◦ 2 iSnap assignments
  ◦ 10-13 lines of code
  ◦ Loops, conditionals, variables, procedures
◦ Extracted 47 hint requests
  ◦ One per student per problem
  ◦ 23-24 per problem

ITAP (Rivers 2017)
◦ ITS for Python programming
◦ On-demand data-driven hints
◦ 89 students in introductory CS
◦ Spring 2017
◦ 5 Python assignments
  ◦ 2-5 lines of code
  ◦ Loops, variables, string operations, arithmetic
◦ Extracted 51 hint requests
  ◦ Up to two per student per problem
  ◦ 7-14 per problem

QUALITYSCORE Calculation

1. 3 tutors independently generated Gold Standard hints for each hint request (e.g. Piech 2015)
   ◦ Any hint voted valid by 2 out of 3 tutors is included in the Gold Standard
2. An algorithm generates hints for each hint request
   ◦ It assigns a confidence weight to each hint it generates, with the weights summing to 1
3. Keep only hints which match a Gold Standard hint
4. QUALITYSCORE is the sum of the weights of the remaining hints

Example:
◦ Gold Standard hints: A, B
◦ Algorithm hints: A, B, C
◦ Weights: 1/3, 1/3, 1/3
◦ QUALITYSCORE: 1/3 + 1/3 = 2/3 ≈ 0.67

[Figure: code example for a Python function firstAndLast(s) containing "return s[1] + s[]".]
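The calculation can be sketched in a few lines. This is a minimal sketch under simplifying assumptions: hints are identified by plain labels, and "matching" a Gold Standard hint means having the same label, whereas the actual procedure compares the edits each hint suggests. The function name `quality_score` is hypothetical.

```python
from fractions import Fraction


def quality_score(weighted_hints: dict[str, Fraction], gold_standard: set[str]) -> Fraction:
    """Sum the confidence weights of generated hints that match the gold standard."""
    total = sum(weighted_hints.values())
    if total != 1:
        raise ValueError("confidence weights must sum to 1")
    return sum(
        (weight for hint, weight in weighted_hints.items() if hint in gold_standard),
        Fraction(0),
    )


# The slide's example: the gold standard contains hints A and B,
# and the algorithm generates A, B, and C with equal weight.
generated = {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)}
score = quality_score(generated, {"A", "B"})
print(float(score))  # about 0.67
```

Using `Fraction` keeps the weights exact, so the check that they sum to 1 does not fail on floating-point rounding.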

Partial Matches

A hint is a partial match to the gold standard when:
1. The hint suggests a subset of the edits of a gold standard hint
2. At least one of these edits adds code

Examples (Gold Standard vs Generated Hint):
◦ return 'Hello World' vs return __
◦ repeat(x * 4) vs repeat(__ * __)
◦ return __ + __ vs return __ BinOp __
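The two partial-match conditions can be expressed as a small predicate. The `Edit` record below is a hypothetical representation invented for this sketch (the slides do not define how edits are stored), and it assumes that only `"insert"` edits count as adding code.

```python
from typing import NamedTuple


class Edit(NamedTuple):
    """Hypothetical edit record: an operation applied at a location."""
    op: str        # "insert" or "delete"
    location: str  # where in the code the edit applies
    value: str     # the code added, or "" for a deletion


def is_partial_match(generated: set[Edit], gold: set[Edit]) -> bool:
    """True if generated suggests a subset of gold's edits and adds some code."""
    suggests_subset = bool(generated) and generated <= gold
    adds_code = any(edit.op == "insert" for edit in generated)
    return suggests_subset and adds_code


# Hypothetical gold standard hint: add a return statement with a string literal.
gold = {
    Edit("insert", "body[0]", "return __"),
    Edit("insert", "body[0].value", "'Hello World'"),
}
generated = {Edit("insert", "body[0]", "return __")}
deletion_only = {Edit("delete", "body[1]", "")}
```

Under this definition a full match also satisfies the partial-match test, since a set is a subset of itself.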

Validating the QUALITYSCORE

Why not just have the tutors rate hints directly (e.g. Price 2017)?
◦ Advantage of QUALITYSCORE: We can scale this approach to any number of hint generation algorithms
◦ Concern: Does the QUALITYSCORE reflect human quality judgements?

Validation: Used QUALITYSCORE to rate 252 hints on the iSnap dataset, and asked 3 human tutors to do the same and come to consensus:
◦ Agreement (Cohen’s kappa) between QUALITYSCORE and consensus: 0.78
◦ Agreement between each human tutor and consensus: 0.76, 0.78, 0.85
◦ Conclusion: QUALITYSCORE is as valid as a single human rater
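For readers unfamiliar with the agreement statistic, Cohen's kappa compares observed agreement between two raters with the agreement expected by chance. The sketch below implements the standard formula; the label lists are hypothetical and are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical label
    return (observed - expected) / (1 - expected)


# Hypothetical labels for ten hints (not data from the study).
metric = ["valid", "valid", "invalid", "invalid", "valid",
          "invalid", "valid", "invalid", "invalid", "valid"]
consensus = ["valid", "valid", "invalid", "invalid", "valid",
             "invalid", "invalid", "invalid", "invalid", "valid"]
kappa = cohens_kappa(metric, consensus)
```

In this made-up example the raters agree on 9 of 10 hints and chance agreement is 0.5, giving a kappa of 0.8.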

Results
COMPARISON OF HINT GENERATION ALGORITHM QUALITY

Significant differences in ratings across algorithms (p < 0.001, both datasets):
◦ iSnap (full or partial): TR-ER < NSNLS, CHF < CTD < SourceCheck < Tutors
◦ Python (full matches): TR-ER, CTD < CHF, NSNLS < SourceCheck, ITAP < Tutors
◦ Python (partial matches): TR-ER < NSNLS, CHF < CTD, SourceCheck < ITAP, Tutors

Performance is consistent across the two problems in the iSnap dataset.

Performance is not consistent across problems in the ITAP dataset.

What makes hint generation hard?

Some hint requests had lower-quality hints across algorithms. Why?

Hypotheses: Hint generation is more difficult for…
◦ Large Code: The more code a student has written
  ◦ ✅ Supported: rs = 0.376 (iSnap) and 0.389 (ITAP); p < 0.01
◦ Divergent Code: The more unique a student’s code is compared to others’
  ◦ ✅ Supported: rs = 0.356 (iSnap) and 0.432 (ITAP); p < 0.01
◦ Few Correct Hints: The fewer Gold Standard hints there are
  ◦ ❌ Not supported: No significant correlation
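Assuming rs here denotes Spearman's rank correlation coefficient, a correlation of this kind can be computed as sketched below. The data and the difficulty measure are hypothetical, chosen only to show the mechanics; the slides do not state exactly which per-request measure was correlated with code size.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: code size per hint request and a difficulty measure
# (for example, 1 minus the mean QUALITYSCORE across algorithms).
code_size = [3, 5, 8, 8, 12, 15]
difficulty = [0.2, 0.1, 0.5, 0.4, 0.6, 0.9]
rs = spearman(code_size, difficulty)
```

A rank-based correlation suits this setting because it only assumes a monotonic relationship between code size and difficulty, not a linear one.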

What makes algorithms perform poorly?

Some algorithms performed worse across hint requests. Why?

Hypotheses: Algorithms perform worse due to…
◦ Unfiltered Hints: Algorithms suggest too many hints
  ◦ ✅ Supported: rs = 0.437 (iSnap) and 0.487 (ITAP); p < 0.001
  ◦ Algorithms generated more hints for larger code; humans did not
◦ Incorrect or Unhelpful Deletions: Many hints suggest deleting code only
  ◦ ✅ Supported: Only 2.8% of generated deletion hints matched the gold standard
  ◦ The best-performing algorithms did not suggest deletions (SourceCheck, ITAP)

Discussion

Top-performing Algorithms

SourceCheck (iSnap) and ITAP (Python) performed the best
◦ These algorithms were designed for their respective datasets
◦ However, SourceCheck still performs well on Python and outperforms its predecessor CTD

The ranking of the algorithms is consistent across datasets

Algorithms vs Human Tutors

Algorithms are beginning to approach human-quality hints
◦ ITAP performed 84% as well as human tutors on the Python dataset
◦ However, this is only for the simpler dataset, counting partial matches

More complex assignments remain difficult
◦ SourceCheck performed only half as well as human tutors on the iSnap dataset
◦ These assignments were longer (10-13 LOC vs 2-5) and more complex

Improving Hint Quality

Address current weaknesses:
◦ More emphasis on selecting the right hint when multiple can be generated
  ◦ Also suggested in prior work (Price 2017)
◦ Avoid hints to delete without adding code

Recognize when a hint is unlikely to be high quality
◦ E.g., when the student’s code is unique

Evaluate the quality of new and existing algorithms

Thank You! Questions?

Contact: [email protected]
◦ Have a programming dataset with hint requests?
◦ Have a hint generation algorithm you would like to evaluate?
◦ Data available: go.ncsu.edu/hint-quality-data

Secret Bonus Slides™

Gold Standard Hints

Criteria for next-step hints:
• Valid
• Useful
• Not confusing
• One edit (if possible)

[Figure: from the code history at the hint request, Tutor 1 writes hints H1.1-H1.3, Tutor 2 writes H2.1-H2.2, and Tutor 3 writes H3.1-H3.4.]

Each tutor rates each other tutor’s hints.

Any hint with at least 2 votes becomes part of the gold standard.

[Figure: grid of tutor votes (✓) for each hint.]
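The voting rule can be sketched as a simple filter. The vote data below is hypothetical, and the sketch assumes that a tutor's authorship of a hint counts as that tutor's own vote for it, which the slides do not state explicitly. The function name `gold_standard` is likewise an illustration.

```python
def gold_standard(votes: dict[str, set[str]], min_votes: int = 2) -> set[str]:
    """Keep hints that at least min_votes tutors consider valid."""
    return {hint for hint, voters in votes.items() if len(voters) >= min_votes}


# Hypothetical votes: each hint maps to the tutors who judged it valid.
# Assumption: the tutor who wrote a hint is counted among its voters.
votes = {
    "H1.1": {"Tutor 1", "Tutor 2", "Tutor 3"},
    "H1.2": {"Tutor 1"},
    "H2.1": {"Tutor 2", "Tutor 3"},
    "H3.1": {"Tutor 3", "Tutor 1"},
    "H3.2": {"Tutor 3"},
}
accepted = gold_standard(votes)
```

With three tutors, a threshold of 2 is a simple majority, so a hint that only its own author endorses is left out.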