Maximizing Correctness with Minimal User Effort to Learn Data Transformations

Jan 27, 2017

Bo Wu
Transcript
Page 1

Maximizing Correctness with Minimal User Effort to Learn Data Transformations

Bo Wu and Craig Knoblock

University of Southern California

Department of Computer Science

Page 2

Art website Buyer

Page 3

Dimension of artworks

Page 4

Programming by Example

Video from the official Excel YouTube channel (https://www.youtube.com/watch?v=YPG8PAQQ894)

Page 5

Too Many Records

Page 6

Overconfident Users

Users are often too confident to examine the results thoroughly

Page 7

Variations

Page 8

Problem

Enable the users of PBE systems to achieve maximal correctness with minimal effort on large datasets

Help users identify at least one incorrect record in every iteration, with minimal effort, on large datasets

Page 9

Approach Overview

Entire dataset

Raw               Transformed
10“ H x 8” W      10
H: 58 x W:25”     58
12”H x 9”W        12
11”H x 6”         11
…                 …
30 x 46”          30 x 46

Random sampling → Sampled records

Raw               Transformed
10“ H x 8” W      10
11”H x 6”         11
…                 …
30 x 46”          30 x 46

Verifying records

Raw               Transformed
11”H x 6”         11
30 x 46”          30 x 46
…                 …

Sorting and color-coding

Raw               Transformed
30 x 46”          30 x 46
11”H x 6”         11
…                 …
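
The flow on this slide (sample, transform, verify, sort) can be sketched roughly as follows. This is only an illustration under stated assumptions: `toy_program` and `toy_suspicion` are hypothetical stand-ins for the learned transformation and the verifier, and the records use plain ASCII inch marks.

```python
import random

def sample_records(records, k, seed=0):
    """Draw up to k records uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))

def recommend(records, transform, suspicion, k, seed=0):
    """Sample, transform, then sort so the most suspicious rows come first."""
    sampled = sample_records(records, k, seed)
    rows = [(raw, transform(raw)) for raw in sampled]
    return sorted(rows, key=lambda row: suspicion(row[0], row[1]), reverse=True)

def toy_program(raw):
    # Hypothetical learned program: keep the text before the first inch mark.
    return raw.split('"')[0].strip()

def toy_suspicion(raw, out):
    # Hypothetical verifier: an output that is not a plain number looks wrong.
    return 0.0 if out.isdigit() else 1.0

records = ['10" H x 8" W', '12"H x 9"W', '11"H x 6"', '30 x 46"']
for raw, out in recommend(records, toy_program, toy_suspicion, k=4):
    print(f"{raw!r:>16} -> {out!r}")
```

With all four records sampled, the row for `30 x 46"` is listed first because its output is the only one that is not a bare number.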

Page 10

Learning from users’ feedback

Page 11

Verifying Records

• First, recommend records causing runtime errors
– Records that cause the program to exit abnormally
• Second, recommend potentially incorrect records
– Learn a binary meta-classifier

Ex:

Input: 2008 Mitsubishi Galant ES $7500 (Sylmar CA) pic

Raw               Transformed
11”H x 6”         11
30 x 46”          30 x 46
…                 …
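
One way to collect the records behind the first bullet is to run the program on each record and catch any abnormal exit. The sketch below assumes the learned program is an ordinary Python callable; `text_before_price` is a hypothetical program invented for illustration, not the one from the talk.

```python
def run_program(program, records):
    """Apply `program` to every record, separating outputs from failures."""
    transformed, failed = [], []
    for raw in records:
        try:
            transformed.append((raw, program(raw)))
        except Exception as err:  # any abnormal exit counts as a runtime error
            failed.append((raw, type(err).__name__))
    return transformed, failed

def text_before_price(raw):
    # Hypothetical program: keep what precedes " - $"; raises ValueError if absent.
    return raw[: raw.index(" - $")]

records = [
    "1998 Honda Civic 12k miles s. Auto. - $3800 (Arcadia)",
    "2008 Mitsubishi Galant ES $7500 (Sylmar CA) pic",
]
transformed, failed = run_program(text_before_price, records)
print(failed)
```

Records in `failed` would be recommended first; the rest go on to the meta-classifier.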

Page 12

Learning the Meta-classifier

[Figure: three groups of base classifiers: Similarity (cs1 to cs4), Program agreement (cp1 to cp4) and Format ambiguity (cf1 to cf4). A subset of them (cs3, cs4, cp2, cf1) is combined with weights w1 to w4 into the meta-classifier.]
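
A minimal sketch of how weighted base classifiers could be combined, assuming each base classifier votes +1 (incorrect) or -1 (correct) and the meta-classifier takes the sign of the weighted sum. The two toy classifiers and their weights are hypothetical; the slide does not say how the weights are learned.

```python
def meta_classify(row, classifiers, weights):
    """Weighted vote over base classifiers; returns (is_incorrect, score)."""
    score = sum(w * clf(row) for clf, w in zip(classifiers, weights))
    return score > 0, score

# Hypothetical base classifiers over a (raw, transformed) row.
def not_a_number(row):
    return -1 if row[1].isdigit() else 1

def long_output(row):
    return 1 if len(row[1]) > 3 else -1

classifiers = [not_a_number, long_output]
weights = [0.5, 0.25]  # assumed weights, for illustration only

print(meta_classify(('30 x 46"', "30 x 46"), classifiers, weights))
print(meta_classify(('11"H x 6"', "11"), classifiers, weights))
```

The score can also serve as a ranking signal, not only as a yes/no decision.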

Page 13

Evaluation

• The recommendation contains incorrect records

Page 14

Evaluation

• The recommendation can place incorrect records on top

Page 15

User study

Experiment setup:
• 5 scenarios with 4000 records per scenario
• 10 graduate students divided into two groups

Page 16

Summary and Future Work

• Summary
– Sample records
– Identify incorrect/questionable records
– Allow user to refine the recommendation
– Color-code the results

• Future work
– Show histograms of the data
– Translate the program to readable natural text

Page 17

Questions?

Data and system available at https://github.com/areshand/Web-Karma

Page 18

Types of Classifiers

• Classifier based on distance
• Classifier based on agreement of programs
• Classifier based on format ambiguity
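
Of the three, program agreement is the most direct to illustrate. The sketch below assumes that several candidate programs consistent with the examples are available and scores a record by the fraction of them that disagree with the majority output. The three lambda programs are hypothetical, and the actual feature used in the system may be defined differently.

```python
from collections import Counter

def disagreement(raw, programs):
    """Fraction of candidate programs that differ from the majority output."""
    outputs = []
    for program in programs:
        try:
            outputs.append(program(raw))
        except Exception:
            outputs.append(None)  # a crash is treated as its own outcome
    majority_count = Counter(outputs).most_common(1)[0][1]
    return 1 - majority_count / len(outputs)

# Hypothetical candidate programs that all map '11"H x 6"' to '11'.
programs = [
    lambda raw: raw.split('"')[0],
    lambda raw: raw.split("H")[0].strip(' "'),
    lambda raw: raw.split(" ")[0].strip('"H'),
]

print(disagreement('11"H x 6"', programs))
print(disagreement('30 x 46"', programs))
```

A record on which the candidates split is a natural one to show the user, since at least one plausible program must be wrong about it.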

Page 19

Learning from various past results

Raw                                                    Transformed
26" H x 24" W x 12.5                                   26
Framed at 21.75" H x 24.25” W                          21
12" H x 9"                                             12

Raw                                                    Transformed
Ravage 2099#24 (November, 1994)                        November, 1994
Gambit III#1 (September, 1997)                         September, 1997
(comic) Spidey Super Stories#12/2 (September, 1975)    comic

Labels in the figure: Examples, Incorrect records, Correct records

Page 20

Sorting Records

Checking transformed records: runtime errors?
Yes: Rank records using #failed_subprograms
No: Rank records using meta-classifier output

Record                                                   #failed_subprograms
2008 Mitsubishi Galant ES $7500 (Sylmar CA) pic          3
1998 Honda Civic 12k miles s. Auto. - $3800 (Arcadia)    2
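
The two-branch ranking can be expressed as a single sort key. This sketch assumes that records with runtime errors are listed ahead of all others, that more failed subprograms rank higher, and that a larger meta-classifier score means "more likely incorrect". The field names and the two meta scores are hypothetical.

```python
def sort_for_review(rows):
    """Order rows: runtime-error records first, then by meta-classifier score."""
    def key(row):
        if row["failed_subprograms"] > 0:
            return (0, -row["failed_subprograms"])  # Yes branch
        return (1, -row["meta_score"])              # No branch
    return sorted(rows, key=key)

rows = [
    {"raw": '11"H x 6"', "failed_subprograms": 0, "meta_score": -0.4},
    {"raw": "1998 Honda Civic 12k miles s. Auto. - $3800 (Arcadia)",
     "failed_subprograms": 2, "meta_score": 0.0},
    {"raw": '30 x 46"', "failed_subprograms": 0, "meta_score": 0.9},
    {"raw": "2008 Mitsubishi Galant ES $7500 (Sylmar CA) pic",
     "failed_subprograms": 3, "meta_score": 0.0},
]
for row in sort_for_review(rows):
    print(row["raw"])
```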