
Honeydew Progress

Honeydew Team (3.30)


What we did last week

• Implemented time prediction
  – TimePropertyClassifier
  – TimePropertyGenerator

• Debugging

• Some running results

• Slightly modified the test cases
  – dummy classifier
  – added some boundary tests


Modifications

1. generateTimeProperty is realized as the HoneydewTime constructor in HoneydewTime.java, which takes formatedTime as its input.

2. Added HoneydewTimeTest.java to test the functions in HoneydewTime.

3. Deleted getBestHour, getBestMinute, and getBestSlot from ITimePropertyGenerator.java
  – These three functions are useful only in test cases.
  – At this point, ITimePropertyGenerator is better suited to be an interface than an abstract class.

4. Changed the return type of getBestTime() to String, formatted as "hh:mm AM".

5. Added getBestHour, getBestMinute, and getBestSlot to TimePropertyGenerator, with String as the return type, for testing.

6. Implemented the functions in TimePropertyGenerator.java.

7. Renamed HOUR to HOUR_VALUE to distinguish it from HOUR in PROPERTY_NAMES.
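Items 3 to 5 split the design in two. The interface exposes only getBestTime(), and the test-oriented String accessors live on the concrete class. The sketch below illustrates that split under stated assumptions. The method names come from the list above, but every method body is hypothetical, and so is the int-based internal state. The slides do not show how HoneydewTime parses formatedTime, so here the generator is built directly from an hour and a minute. It also assumes that "slot" means the AM/PM half of the day.

```java
import java.util.Locale;

// Hypothetical sketch: method names come from the slides; all bodies and the
// int-based internal state are assumptions for illustration.
interface ITimePropertyGenerator {
    // Best predicted time, formatted as "hh:mm AM" or "hh:mm PM".
    String getBestTime();
}

class TimePropertyGenerator implements ITimePropertyGenerator {
    private final int bestHour24; // assumed internal representation, 0-23
    private final int bestMinute; // 0-59

    TimePropertyGenerator(int bestHour24, int bestMinute) {
        if (bestHour24 < 0 || bestHour24 > 23 || bestMinute < 0 || bestMinute > 59) {
            throw new IllegalArgumentException("hour or minute out of range");
        }
        this.bestHour24 = bestHour24;
        this.bestMinute = bestMinute;
    }

    // Test-oriented accessors that return String, as in item 5.
    String getBestHour() {
        int h = bestHour24 % 12;
        return String.format(Locale.ROOT, "%02d", h == 0 ? 12 : h); // 12-hour clock
    }

    String getBestMinute() {
        return String.format(Locale.ROOT, "%02d", bestMinute);
    }

    // Assumption: "slot" is the AM/PM half of the day.
    String getBestSlot() {
        return bestHour24 < 12 ? "AM" : "PM";
    }

    @Override
    public String getBestTime() {
        return getBestHour() + ":" + getBestMinute() + " " + getBestSlot();
    }
}

class TimePropertyGeneratorDemo {
    public static void main(String[] args) {
        System.out.println(new TimePropertyGenerator(9, 30).getBestTime());  // 09:30 AM
        System.out.println(new TimePropertyGenerator(15, 45).getBestTime()); // 03:45 PM
        System.out.println(new TimePropertyGenerator(0, 5).getBestTime());   // 12:05 AM
    }
}
```

Because callers depend only on the interface, the test accessors can change without touching ITimePropertyGenerator.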


Running results



Plan for next week

• Experiments

• Analysis of results

• Refactoring


Tickets


Experimental Results for TEA


TEA Evaluation Wrapper

• TEA is stateful
  – Extract the sent time as the reference time
  – Use the current time if no sent time is found
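The wrapper's reference-time rule can be sketched as follows. The sketch assumes the raw email is a string whose header block carries an RFC 1123-style Date: header. The class ReferenceTime and its methods are hypothetical names, not the project's code. The current time is passed in as `now` so that the fallback stays testable.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical sketch of the wrapper's reference-time rule; class and method
// names are illustrative, not the project's actual code.
class ReferenceTime {
    // Sent time from the Date: header if it parses, otherwise the supplied "now".
    static ZonedDateTime referenceTime(String rawEmail, ZonedDateTime now) {
        return extractSentTime(rawEmail).orElse(now);
    }

    // Scans the header block (up to the first blank line) for a Date: header
    // in RFC 1123 form, e.g. "Fri, 3 Aug 2007 09:15:00 -0400".
    static Optional<ZonedDateTime> extractSentTime(String rawEmail) {
        for (String line : rawEmail.split("\r?\n")) {
            if (line.isEmpty()) {
                break; // a blank line ends the headers
            }
            if (line.regionMatches(true, 0, "Date:", 0, 5)) {
                String value = line.substring(5).trim();
                try {
                    return Optional.of(ZonedDateTime.parse(value, DateTimeFormatter.RFC_1123_DATE_TIME));
                } catch (DateTimeParseException e) {
                    return Optional.empty(); // unparsable date: caller falls back to "now"
                }
            }
        }
        return Optional.empty();
    }
}
```

In production the caller would pass ZonedDateTime.now(). This simple parser rejects a trailing zone comment such as "(EDT)", and in that case it falls back to the current time.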


Evaluation Heuristics

• Heuristic I: pick the earliest time

• Heuristic II: pick a random time from the recognized time expressions

• Heuristic III: perfect oracle (upper bound): check each recognized time expression against the ground truth, and count the email as correct if any of them matches
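The three heuristics can be sketched over TEA-style strings such as "20070803T1100??", where '?' marks an unresolved digit. Two points are assumptions, not the project's documented method. First, a prediction and a label match when every position is equal or either side is '?'. This appears consistent with the correct predictions listed for Heuristic I, such as "20070803T??????" against "20070803T100000". Second, "earliest" is read as chronologically earliest. The class name Heuristics is hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Random;

// Hypothetical sketch of the three evaluation heuristics over TEA-style
// yyyyMMddTHHmmss strings, where '?' marks an unresolved digit.
class Heuristics {
    // Assumed matching rule: each position is equal, or either side is '?'.
    static boolean matches(String prediction, String label) {
        if (prediction.length() != label.length()) {
            return false;
        }
        for (int i = 0; i < prediction.length(); i++) {
            char p = prediction.charAt(i);
            char l = label.charAt(i);
            if (p != l && p != '?' && l != '?') {
                return false;
            }
        }
        return true;
    }

    // Heuristic I: chronologically earliest expression. Fixed-width strings
    // sort chronologically; '?' (0x3F) sorts after digits, so unresolved
    // fields are treated as "late".
    static Optional<String> earliest(List<String> extracted) {
        return extracted.stream().min(Comparator.naturalOrder());
    }

    // Heuristic II: one expression chosen uniformly at random.
    static Optional<String> random(List<String> extracted, Random rng) {
        if (extracted.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(extracted.get(rng.nextInt(extracted.size())));
    }

    // Heuristic III: oracle upper bound; correct if any expression matches.
    static boolean oracle(List<String> extracted, String label) {
        return extracted.stream().anyMatch(e -> matches(e, label));
    }
}
```

Heuristic III is an upper bound because it can never score lower than Heuristic I or II on the same email.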


Email Corpus

• 172 emails from last year’s experiment

• 320 more emails

• Manually went through all the emails and labeled 215 of them


Experiment Setup

• Skip emails that cause a segmentation fault in the TEA package
  – TEA ran successfully (no segmentation fault) on 165 emails

• Report accuracy as #emails correctly labeled by TEA / #total emails


Experimental Results

                                    Heuristic I   Heuristic II   Heuristic III
#emails correctly labeled by TEA         19           30.33            80
Accuracy (%)                           11.5152       18.3818         48.4848

Heuristic II results are averaged over 100 runs.
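As a check, apply the accuracy definition from the setup with the 165 emails TEA processed as the denominator. This reproduces the reported percentages. The fractional count for Heuristic II comes from its 100-run average.

```latex
\text{Accuracy} = \frac{\#\text{emails correctly labeled by TEA}}{\#\text{total emails}} \times 100\%,
\qquad \#\text{total emails} = 165

\frac{19}{165} \approx 11.52\%, \qquad
\frac{30.33}{165} \approx 18.38\%, \qquad
\frac{80}{165} \approx 48.48\%
```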


Correct Predictions (Heuristic I)

• prediction = 20070803T1100??, label = 20070803T110000
• prediction = 20070803T??????, label = 20070803T100000
• prediction = 20050418T??????, label = 20050418T154500
• prediction = 20050404T1100??, label = 20050404T110000
• prediction = 20050418T??????, label = 20050418T120000
• prediction = 2007????T??????, label = 20070814T090000
• prediction = 2007????T??????, label = 20070814T090000
• prediction = 20071119T??????, label = 20071119T113000
• prediction = 20050427T0930??, label = 20050427T093000


Correct Predictions (Heuristic I, continued)

• prediction = 20080319T??????, label = 20080319T160000
• prediction = 20050524T??????, label = 20050524T140000
• prediction = 20050511T??????, label = 20050511T103000
• prediction = 20050428T??????, label = 20050428T151500
• prediction = 20080229T??????, label = 20080229T1330??
• prediction = 20050512T??????, label = 20050512T0900??
• prediction = 2007????T??????, label = 20070814T090000
• prediction = 20070905T1100??, label = 20070905T110000
• prediction = 20080306T1000??, label = 20080306T1000??
• prediction = 20070830T1100??, label = 20070830T1100??


Error Analysis

• 8 types of errors:
  – 1. No time expression extracted
  – 2. Wrong year
  – 3. Wrong month
  – 4. Wrong day
  – 5. Wrong hour
  – 6. Wrong minute
  – 7. Wrong second
  – 8. Misc
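The error typing can be sketched for a single prediction/label pair in yyyyMMddTHHmmss form. Several choices here are assumptions: the field offsets follow that layout, a '?' on either side is not counted as an error, a null prediction means nothing was extracted, and strings that do not fit the layout fall under misc. The class name ErrorTypes is hypothetical. One pair can produce several error types at once.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the 8-way error typing for one prediction/label pair.
class ErrorTypes {
    enum ErrorType { NONE, YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, MISC }

    // {start, end} character offsets of each field in "yyyyMMddTHHmmss".
    private static final int[][] FIELDS = {{0, 4}, {4, 6}, {6, 8}, {9, 11}, {11, 13}, {13, 15}};
    private static final ErrorType[] FIELD_TYPES = {
        ErrorType.YEAR, ErrorType.MONTH, ErrorType.DAY,
        ErrorType.HOUR, ErrorType.MINUTE, ErrorType.SECOND
    };

    // prediction == null means TEA extracted no time expression (type 1).
    // An empty result means the prediction matches the label.
    static Set<ErrorType> classify(String prediction, String label) {
        if (prediction == null) {
            return EnumSet.of(ErrorType.NONE);
        }
        String p = prediction.toUpperCase(Locale.ROOT);
        String l = label.toUpperCase(Locale.ROOT);
        if (p.length() != 15 || l.length() != 15 || p.charAt(8) != 'T' || l.charAt(8) != 'T') {
            return EnumSet.of(ErrorType.MISC); // assumed: malformed strings are "misc"
        }
        Set<ErrorType> errors = EnumSet.noneOf(ErrorType.class);
        for (int f = 0; f < FIELDS.length; f++) {
            for (int i = FIELDS[f][0]; i < FIELDS[f][1]; i++) {
                char pc = p.charAt(i);
                char lc = l.charAt(i);
                if (pc != lc && pc != '?' && lc != '?') {
                    errors.add(FIELD_TYPES[f]); // one differing digit marks the whole field
                    break;
                }
            }
        }
        return errors;
    }
}
```

Under these assumptions, the type 7 example for Heuristic I (20050517t155632 and 20050521t120000) is also counted as a day, hour, and minute error.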


Heuristic I: number of errors by type

Type   1 (none)   2 (year)   3 (month)   4 (day)   5 (hour)   6 (minute)   7 (second)   8 (misc)
#         33         19          31          97        42          36            1           0


More Details (Heuristic I)

• Only 1 error of type 7 (second):
  – 20050517t155632 vs. 20050521t120000

• Zero errors of type 8 (misc):
  – The other 7 types cover all cases

• A single instance may contribute errors to more than one type


Error Analysis for Heuristic III

• For each meeting email:
  – If one of the extracted time expressions matches the time label, count the email as correct
  – If no match is found, analyze each extracted time expression to obtain error statistics
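The per-email procedure above can be sketched as follows, assuming the same wildcard rule and yyyyMMddTHHmmss layout as in the other sketches. The class OracleErrorStats is hypothetical. Every extracted expression of an unmatched email contributes its own errors, so the per-type totals can exceed the number of emails. The hour count of 276 in the Heuristic III table exceeds the 165 evaluated emails for this reason.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the Heuristic III error bookkeeping. The wildcard
// rule and the yyyyMMddTHHmmss field offsets are assumptions.
class OracleErrorStats {
    enum ErrorType { NONE, YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, MISC }

    private static final int[][] FIELDS = {{0, 4}, {4, 6}, {6, 8}, {9, 11}, {11, 13}, {13, 15}};
    private static final ErrorType[] FIELD_TYPES = {
        ErrorType.YEAR, ErrorType.MONTH, ErrorType.DAY,
        ErrorType.HOUR, ErrorType.MINUTE, ErrorType.SECOND
    };

    final Map<ErrorType, Integer> counts = new EnumMap<>(ErrorType.class);
    int correct = 0;

    // Error types for one expression; an empty set means it matches the label.
    static Set<ErrorType> fieldErrors(String expr, String label) {
        String p = expr.toUpperCase(Locale.ROOT);
        String l = label.toUpperCase(Locale.ROOT);
        if (p.length() != 15 || l.length() != 15) {
            return EnumSet.of(ErrorType.MISC);
        }
        Set<ErrorType> errors = EnumSet.noneOf(ErrorType.class);
        for (int f = 0; f < FIELDS.length; f++) {
            for (int i = FIELDS[f][0]; i < FIELDS[f][1]; i++) {
                char pc = p.charAt(i);
                char lc = l.charAt(i);
                if (pc != lc && pc != '?' && lc != '?') {
                    errors.add(FIELD_TYPES[f]);
                    break;
                }
            }
        }
        return errors;
    }

    // One meeting email: correct if any expression matches the label;
    // otherwise every extracted expression adds its error types.
    void addEmail(List<String> extracted, String label) {
        if (extracted.isEmpty()) {
            counts.merge(ErrorType.NONE, 1, Integer::sum);
            return;
        }
        for (String expr : extracted) {
            if (fieldErrors(expr, label).isEmpty()) {
                correct++;
                return;
            }
        }
        for (String expr : extracted) {
            for (ErrorType t : fieldErrors(expr, label)) {
                counts.merge(t, 1, Integer::sum);
            }
        }
    }
}
```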


Heuristic III: number of errors by type

Type   1 (none)   2 (year)   3 (month)   4 (day)   5 (hour)   6 (minute)   7 (second)   8 (misc)
#         33        137         147         267       276         235           95           0


Case Closed for TEA

• Questions?


Thank You