
Honeydew Progress

Honeydew Team (3.30)

What we did last week

• Implement Time Prediction
– TimePropertyClassifier
– TimePropertyGenerator

• Debug

• Some running results

• Slightly modify test cases
– dummyclassifier classifier
– add some boundary tests

Modifications

• 1. generateTimeProperty is implemented as the HoneydewTime constructor in HoneydewTime.java
– where the input arguments are formatedTime
• 2. Add HoneydewTimeTest.java to test the functions in HoneydewTime
• 3. Delete getBestHour, getBestMinute and getBestSlot from ITimePropertyGenerator.java
– Since these three functions are useful only in test cases
– At this point, ITimePropertyGenerator is more suitable as an interface than as an abstract class
• 4. Modify the getBestTime() return type to String, formatted as "hh:mm AM"
• 5. Add getBestHour, getBestMinute and getBestSlot to TimePropertyGenerator, with String as the return type, for testing
• 6. Implement the functions in TimePropertyGenerator.java
• 7. HOUR is renamed to HOUR_VALUE, making it distinct from HOUR in PROPERTY_NAMES
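The "hh:mm AM" string format from item 4 could be produced roughly as follows. This is only a sketch: the class BestTimeFormat and its format method are hypothetical names, not the project's actual TimePropertyGenerator code, and it assumes the hour is available as a 24-hour value.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper; not the actual TimePropertyGenerator code.
final class BestTimeFormat {
    // "hh" is the 12-hour clock hour (01-12) and "a" is the AM/PM marker.
    // Locale.US is fixed so the marker is rendered as "AM"/"PM".
    private static final DateTimeFormatter HH_MM_AMPM =
            DateTimeFormatter.ofPattern("hh:mm a", Locale.US);

    // Assumption: the caller supplies a 24-hour hour (0-23) and a minute (0-59).
    static String format(int hour24, int minute) {
        return LocalTime.of(hour24, minute).format(HH_MM_AMPM);
    }
}
```

With this pattern, midnight is rendered with hour 12 rather than 00, which is worth covering in the boundary tests.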

Running results


Next week plan

• Experiments

• Analysis on results

• Refactoring

Tickets

Experiment Result of TEA

TEA Evaluation Wrapper

• TEA is stateful
– Extract the sent time as the reference time
– Use the current time if no sent time is found
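The fallback rule for the reference time can be sketched as below. The class and method names are hypothetical and the TEA package itself is not called; the sketch only assumes that the sent time, when present, has already been parsed into an Instant.

```java
import java.time.Clock;
import java.time.Instant;
import java.util.Optional;

// Hypothetical wrapper helper; the real TEA wrapper API is not shown in the slides.
final class ReferenceTime {
    // Use the email's sent time when one was extracted,
    // otherwise fall back to the current time from the given clock.
    static Instant resolve(Optional<Instant> sentTime, Clock clock) {
        return sentTime.orElseGet(clock::instant);
    }
}
```

Passing a Clock instead of calling Instant.now() directly keeps the fallback testable with a fixed clock.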

Evaluation Heuristics

• Heuristic I: pick the earliest time

• Heuristic II: pick a random time from the recognized time expressions

• Heuristic III: perfect oracle (upper bound). Check each recognized time expression against the ground truth
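The three heuristics can be sketched as selection functions over the time expressions recognized in one email. All names here are hypothetical. The sketch assumes candidates are yyyyMMdd'T'HHmmss strings, so that string order equals chronological order when every field is specified, and it leaves the comparison against the ground truth to a caller-supplied predicate.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.function.BiPredicate;

// Hypothetical sketch of the three evaluation heuristics.
final class SelectionHeuristics {
    // Heuristic I: pick the earliest candidate.
    // Assumption: fully specified timestamps, so lexicographic order is chronological.
    static Optional<String> earliest(List<String> candidates) {
        return candidates.isEmpty()
                ? Optional.empty()
                : Optional.of(Collections.min(candidates));
    }

    // Heuristic II: pick one candidate uniformly at random.
    static Optional<String> random(List<String> candidates, Random rng) {
        return candidates.isEmpty()
                ? Optional.empty()
                : Optional.of(candidates.get(rng.nextInt(candidates.size())));
    }

    // Heuristic III (oracle): correct if any candidate matches the ground-truth label.
    static boolean oracle(List<String> candidates, String label,
                          BiPredicate<String, String> matches) {
        return candidates.stream().anyMatch(c -> matches.test(c, label));
    }
}
```

Because Heuristic II is random, its score has to be averaged over repeated runs, while the oracle gives an upper bound for any selection rule over the same candidates.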

Email Corpus

• 172 emails from last year’s experiment

• 320 more emails

• Manually went through all the emails, and labeled 215 of them

Experiment Setup

• Skip emails that cause a segmentation fault in the TEA package
– TEA ran successfully on 165 emails (no segmentation fault)

• Report accuracy (#emails correctly labeled by TEA/#total emails)

Experimental Result

                                   Heuristic I   Heuristic II   Heuristic III
#emails correctly labeled by TEA   19            30.33          80
Accuracy (%)                       11.5152       18.3818        48.4848

The result for Heuristic II is a 100-run average.
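The accuracy figures are consistent with dividing each count by the 165 emails on which TEA ran successfully. A small sketch of that arithmetic follows; the class name is hypothetical, and the count is a double because the Heuristic II value is an average.

```java
// Hypothetical helper that reproduces the accuracy arithmetic.
final class Accuracy {
    // Accuracy in percent: correctly labeled emails over total emails.
    static double percent(double correct, int total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        return 100.0 * correct / total;
    }
}
```

For example, 19 correct out of 165 gives 1900 / 165, about 11.5152 percent, and 80 out of 165 gives about 48.4848 percent.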

Correct Predictions (Heuristic I)

• prediction = 20070803T1100??, label = 20070803T110000
• prediction = 20070803T??????, label = 20070803T100000
• prediction = 20050418T??????, label = 20050418T154500
• prediction = 20050404T1100??, label = 20050404T110000
• prediction = 20050418T??????, label = 20050418T120000
• prediction = 2007????T??????, label = 20070814T090000
• prediction = 2007????T??????, label = 20070814T090000
• prediction = 20071119T??????, label = 20071119T113000
• prediction = 20050427T0930??, label = 20050427T093000

Correct Predictions (Heuristic I), continued

• prediction = 20080319T??????, label = 20080319T160000
• prediction = 20050524T??????, label = 20050524T140000
• prediction = 20050511T??????, label = 20050511T103000
• prediction = 20050428T??????, label = 20050428T151500
• prediction = 20080229T??????, label = 20080229T1330??
• prediction = 20050512T??????, label = 20050512T0900??
• prediction = 2007????T??????, label = 20070814T090000
• prediction = 20070905T1100??, label = 20070905T110000
• prediction = 20080306T1000??, label = 20080306T1000??
• prediction = 20070830T1100??, label = 20070830T1100??
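The pairs listed as correct suggest that "?" marks an unspecified digit that is compatible with any value. The slides do not state the rule, so the following is an inferred sketch with hypothetical names: two timestamps match when every position is either equal or unspecified on at least one side.

```java
// Hypothetical matcher; the wildcard rule is inferred from the listed pairs.
final class WildcardMatch {
    static boolean matches(String prediction, String label) {
        if (prediction.length() != label.length()) {
            return false;
        }
        for (int i = 0; i < label.length(); i++) {
            // Upper-case so that a lower-case 't' separator still matches 'T'.
            char p = Character.toUpperCase(prediction.charAt(i));
            char l = Character.toUpperCase(label.charAt(i));
            // A '?' on either side leaves that position unconstrained.
            if (p != '?' && l != '?' && p != l) {
                return false;
            }
        }
        return true;
    }
}
```

Under this rule a prediction such as 2007????T?????? counts as correct whenever the year agrees, which is a lenient criterion to keep in mind when reading the accuracy numbers.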

Error Analysis

• 8 types of errors:
– No time expression extracted
– Wrong year
– Wrong month
– Wrong day
– Wrong hour
– Wrong minute
– Wrong second
– Misc

Heuristic I

Type   1 (none)   2 (year)   3 (month)   4 (day)   5 (hour)   6 (minute)   7 (second)   8 (misc)
#      33         19         31          97        42         36           1            0

More Details (heuristic I)

• Only 1 error of type 7 (second):
– 20050517t155632 20050521t120000

• Zero errors of type 8 (misc):
– The other 7 types cover all cases

• One instance may contribute errors to more than one type
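A per-field comparison shows how one instance can contribute to several error types. This is a sketch under assumptions rather than the evaluation code that was used: the names are hypothetical, timestamps are assumed to be 15-character yyyyMMdd'T'HHmmss strings, "?" is treated as unspecified, and mapping malformed input to the misc type is a guess.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical classifier for the eight error types.
final class ErrorTypes {
    enum Type { NONE, YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, MISC }

    // {start, end} offsets of each field in yyyyMMdd'T'HHmmss.
    private static final int[][] SPANS =
            {{0, 4}, {4, 6}, {6, 8}, {9, 11}, {11, 13}, {13, 15}};
    private static final Type[] FIELDS =
            {Type.YEAR, Type.MONTH, Type.DAY, Type.HOUR, Type.MINUTE, Type.SECOND};

    // prediction == null means no time expression was extracted (type 1).
    static Set<Type> classify(String prediction, String label) {
        if (prediction == null) {
            return EnumSet.of(Type.NONE);
        }
        if (prediction.length() != 15 || label.length() != 15) {
            return EnumSet.of(Type.MISC); // assumption: malformed input counts as misc
        }
        EnumSet<Type> errors = EnumSet.noneOf(Type.class);
        for (int f = 0; f < FIELDS.length; f++) {
            for (int i = SPANS[f][0]; i < SPANS[f][1]; i++) {
                char p = prediction.charAt(i);
                char l = label.charAt(i);
                if (p != '?' && l != '?' && p != l) {
                    errors.add(FIELDS[f]);
                    break;
                }
            }
        }
        return errors; // empty set means the prediction is compatible with the label
    }
}
```

Applied to the pair 20050517t155632 and 20050521t120000, this sketch reports day, hour, minute and second errors at once.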

Error Analysis for Heuristic III

• For each meeting email
– If one of the extracted time expressions matches the time label, count it as correct
– If no match is found, analyze each extracted time expression to obtain error statistics
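The procedure above can be sketched as a running tally. The names are hypothetical, and the sketch assumes 15-character yyyyMMdd'T'HHmmss strings with "?" as an unspecified digit. Since every extracted expression in an unmatched email adds its own errors, the per-type counts can exceed the number of emails.

```java
import java.util.List;

// Hypothetical tally; field order and matching rule are assumptions.
final class OracleTally {
    // {start, end} offsets of year, month, day, hour, minute, second.
    private static final int[][] SPANS =
            {{0, 4}, {4, 6}, {6, 8}, {9, 11}, {11, 13}, {13, 15}};

    int correct;                          // emails with at least one matching expression
    int nothingExtracted;                 // type 1: no time expression extracted
    final int[] fieldErrors = new int[6]; // types 2..7, one slot per field

    void addEmail(List<String> extracted, String label) {
        if (extracted.isEmpty()) {
            nothingExtracted++;
            return;
        }
        for (String e : extracted) {
            if (!anyTrue(mismatches(e, label))) {
                correct++;
                return;
            }
        }
        // No expression matched: each expression contributes its own errors.
        for (String e : extracted) {
            boolean[] wrong = mismatches(e, label);
            for (int f = 0; f < wrong.length; f++) {
                if (wrong[f]) {
                    fieldErrors[f]++;
                }
            }
        }
    }

    private static boolean[] mismatches(String expr, String label) {
        boolean[] wrong = new boolean[SPANS.length];
        for (int f = 0; f < SPANS.length; f++) {
            for (int i = SPANS[f][0]; i < SPANS[f][1]; i++) {
                char p = expr.charAt(i);
                char l = label.charAt(i);
                if (p != '?' && l != '?' && p != l) {
                    wrong[f] = true;
                }
            }
        }
        return wrong;
    }

    private static boolean anyTrue(boolean[] flags) {
        for (boolean b : flags) {
            if (b) {
                return true;
            }
        }
        return false;
    }
}
```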

Heuristic III

Type   1 (none)   2 (year)   3 (month)   4 (day)   5 (hour)   6 (minute)   7 (second)   8 (misc)
#      33         137        147         267       276        235          95           0

Case Closed for TEA

• Questions?

Thank You
