1
Fair Use Agreement
This agreement covers the use of this presentation; please read carefully.
• You may freely use these slides for teaching, if:
• You send me an email telling me the class number/university in advance.
• My name and email address appear on the first slide (if you are using all or most of the slides), or on each slide (if you are just taking a few slides).
• You may freely use these slides for a conference presentation, if:
• You send me an email telling me the conference name in advance.
• My name appears on each slide you use.
• You may not use these slides for tutorials, or in a published work (tech report/conference paper/thesis/journal, etc.). If you wish to do this, email me first; it is highly likely I will grant you permission.
(C) {Ken Ueno, Eamonn Keogh, Xiaopeng Xi}, University of California, Riverside
2
Anytime Classification Using the Nearest Neighbor Algorithm with Applications to Stream Mining

Ken Ueno, Toshiba Corporation, Japan (Visiting PostDoc Researcher at UC Riverside)
Xiaopeng Xi, Eamonn Keogh, University of California, Riverside, U.S.A.
Dah-Jye Lee, Brigham Young University, U.S.A.
Draft ver. 12/12/2006
3
Outline of the Talk
1. Motivation & Background: usefulness of the anytime nearest neighbor classifier for real-world applications, including fish shape recognition.
2. Anytime Nearest Neighbor Classifier (ANNC)
3. SimpleRank, the critical ordering method for ANNC: how can we convert a conventional nearest neighbor classifier into the anytime version? What is the critical intuition?
4. Empirical Evaluations
5. Conclusion
4
Case Study: Fish Recognition (Application for a Video Monitoring System)
Time intervals tend to vary among fish appearances
[Figure: accuracy (%) vs. number of instances seen before interruption, S (0 to 3000), comparing a Random Test against a SimpleRank Test; two panels correspond to interruption after 2.0 sec and 27.0 sec, with accuracy between 98% and 100%.]
Preliminary experiments with Rotation-Robust DTW [Keogh 05]
When will it be finished? Challenges for Data Mining in Real World Applications:
• Accuracy / speed trade-off
• Limited memory space
• Real-time processing
• A best-so-far answer available anytime?
Example applications: medical diagnosis, fish migration / biological shape recognition, motion search, multimedia intelligence.
6
Anytime Algorithms
Trading execution time for quality of results:
• A best-so-far answer is always available.
• The quality of the answer improves with execution time.
• Users may suspend the process during execution and continue it if needed.
[Figure: quality of solution vs. time. After a short setup time S, a current (best-so-far) solution exists at every moment; the user can (1) suspend, (2) peek at the results, and (3) continue if desired.]
7
Anytime Characteristics
• Interruptability: after some small amount of setup time, the algorithm can be stopped at any time and provide an answer.
• Monotonicity: the quality of the result is a non-decreasing function of computation time.
• Diminishing returns: the improvement in solution quality is largest at the early stages of computation, and diminishes over time.
• Measurable quality: the quality of an approximate result can be determined.
• Preemptability: the algorithm can be suspended and resumed with minimal overhead.
[Zilberstein and Russell 95]
8
Bumble Bee’s Anytime Strategy
Lars Chittka, Adrian G. Dyer, Fiola Bock, Anna Dornhaus, Nature Vol.424, 24 Jul 2003, p.388
To survive, I must make the best possible judgment for finding real nectar, like "anytime learning"!
“Bumblebees can choose wisely or rapidly, but not both at once.”
Big Question: How can we make classifiers wiser / more rapid like bees?
9
Nearest Neighbor Classifiers
Anytime Algorithm + Lazy Learning. Reasons:
• To the best of our knowledge, there is no "Anytime Nearest Neighbor Classifier" so far.
• Nearest neighbor classifiers are inherently compatible with similarity measures.
• They easily handle time series data by using DTW.
• They are robust and accurate.
10
Nearest Neighbor Classifiers
An instance-based, lazy classification algorithm built on training exemplars.
It gives an unknown instance the class label of its closest training exemplar, based on a certain distance measure.
For k-Nearest Neighbor (k-NN), the answer is given by voting:
\hat{c}(x_q) = \arg\max_{v \in V} \sum_{i=1}^{k} \delta(v, f(x_i)), \qquad \delta(a, b) = \begin{cases} 1 & \text{if } a = b \\ 0 & \text{otherwise} \end{cases}

where:
x_q : a query instance
x_1, …, x_k : the k nearest instances
\hat{c}(x_q) : estimated class of x_q
V : a set of class labels
k : number of nearest neighbors
How can we convert it into an anytime algorithm?
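As a concrete illustration, the voting rule above might be sketched in Python as follows; the function names (`knn_classify`, `euclidean`) and the toy data are illustrative assumptions, not from the talk.

```python
from collections import Counter
import math

def euclidean(a, b):
    # Plain Euclidean distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, labels, query, k=3):
    """Return the majority class among the k nearest training exemplars."""
    # Rank all training instances by distance to the query.
    ranked = sorted(range(len(train)), key=lambda i: euclidean(train[i], query))
    # Vote among the k closest instances.
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy example: two well-separated 2-D classes.
X = [(0, 0), (0, 1), (5, 5), (5, 6)]
y = ["A", "A", "B", "B"]
print(knn_classify(X, y, (0.2, 0.4), k=3))  # "A"
```

Note that this eager formulation must see every training instance before it can answer, which is exactly what the anytime version relaxes.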
11
Designing the anytime Nearest Neighbor

Function [best_match_class] = Anytime_Classifier(Database, Index, O)
    best_match_val = inf;
    best_match_class = undefined;
    % Initial step
    For p = 1 to number_of_classes(Database)
        D = distance(Database.object(Index_p), O);
        If D < best_match_val
            best_match_val = D;
            best_match_class = Database.class_label(Index_p);
        End
    End
    Disp('The algorithm can now be interrupted');
    % Interruptible step (constant time per iteration)
    p = number_of_classes(Database) + 1;
    While (user_has_not_interrupted AND p < max(Index))
        D = distance(Database.object(Index_p), O);
        If D < best_match_val
            best_match_val = D;
            best_match_class = Database.class_label(Index_p);
        End
        p = p + 1;
        user_has_not_interrupted = test_for_user_interrupt;
    End

Plug-in design for any ordering method
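The pseudocode on this slide can be rendered as a runnable Python sketch, with user interruption simulated by an optional step budget rather than a real interrupt signal; every name beyond those in the pseudocode is an assumption for illustration.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def anytime_classifier(objects, class_labels, order, query,
                       num_classes, budget=None):
    """Anytime 1-NN: scan training objects in the given order; the
    best-so-far label is a valid answer whenever the scan stops."""
    best_val = float("inf")
    best_class = None
    for step, idx in enumerate(order):
        d = euclidean(objects[idx], query)
        if d < best_val:
            best_val = d
            best_class = class_labels[idx]
        # Initial step: always see at least one instance per class.
        # Interruptible step: afterwards, stop whenever the budget runs out.
        if budget is not None and step + 1 >= max(num_classes, budget):
            break  # "interrupted": return the best-so-far answer
    return best_class

X = [(0, 0), (5, 5), (0.3, 0.1), (5.2, 5.1)]
y = ["A", "B", "A", "B"]
order = [0, 1, 2, 3]  # e.g. produced by an ordering method such as SimpleRank
print(anytime_classifier(X, y, order, (0.2, 0.2), num_classes=2, budget=2))
```

Because the ordering is supplied as a parameter, any ranking method can be plugged in, matching the slide's plug-in design.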
12
Tentative Solutions for Good Ordering

Ordering the training data is critical. Which critical points matter for classification: best first, or worst last? Either way, put non-critical points last.
Numerosity reduction can partially provide good ordering solutions; its problem is very similar to the ordering problem for anytime algorithms. Both rely on leave-one-out (k = 1) within the training data.
• Numerosity reduction: S must be decidable before classification (static).
• Anytime preprocessing: S does not need to be decidable before classification (dynamic).
Key point: whether the interruption time S is static or dynamic.
13
JF: a two-class classification problem

[Figure: scatter plots of the 2-D Gaussian ball dataset over [-2, 2] x [-2, 2]; Class A lies inside a circle around the mean, Class B outside.]
Hard to classify correctly because of the round shape.
We need a non-linear and fast-enough classifier.
Each coordinate is drawn from a Gaussian,

f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x - m)^2}{2\sigma^2}},

and points are labeled by

\mathrm{class}(x, y) = \begin{cases} \text{Class A} & \text{if } (x - \mathrm{mean}(x))^2 + (y - \mathrm{mean}(y))^2 \le r^2 \\ \text{Class B} & \text{otherwise} \end{cases}
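Assuming the construction above (standard-normal coordinates, labels from a circular boundary of radius r around the sample mean), the JF data might be generated like this; n, r, and the seed are illustrative choices.

```python
import math
import random

def make_jf(n=1000, r=1.0, seed=0):
    """Two-class 'Gaussian ball' data: Class A inside the circle of
    radius r centred on the sample mean, Class B outside it."""
    rng = random.Random(seed)
    pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    # Centre the circular decision boundary on the empirical mean.
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    labels = ["A" if (x - mx) ** 2 + (y - my) ** 2 <= r ** 2 else "B"
              for x, y in pts]
    return pts, labels

pts, labels = make_jf(n=500)
print(sorted(set(labels)))  # both classes are present
```

The round boundary is what makes JF hard for linear classifiers but easy to describe, which is why it serves as the running example.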
14
We cannot use DP for the JF problem

[Figure: tessellations I, II, III built step by step.]

Dynamic Programming (DP) builds ans(n) from ans(n-1), so it is only locally optimal. Ideal tessellations depend heavily on the entire feature space; a good ordering should capture the entire classification boundaries in the early stage.
15
Numerosity Reduction

Scoring strategies similar to numerosity reduction:
• Random Ranking (baseline)
• DROP algorithms [Wilson and Martinez 00]: weighting based on enemies/associates for nearest neighbor
• NaïveRank algorithm: sorting based on leave-one-out with 1-Nearest Neighbor
16
SimpleRank Ordering
\mathrm{rank}(x) = \sum_{j} \begin{cases} 1 & \text{if } \mathrm{class}(x) = \mathrm{class}(x_j) \\ -2 / (\mathit{num\_of\_class} - 1) & \text{otherwise} \end{cases}

where the sum runs over the instances x_j for which x is the (leave-one-out) nearest neighbor.
Observation 1: penalize an instance that is close to instances with a different class label.
Observation 2: adjust the penalty weights with regard to the number of classes.

Anytime framework + SimpleRank, based on the NaïveRank algorithm [Xi and Keogh 06] (sorting by leave-one-out with 1-Nearest Neighbor):
1. Order training instances by the unimportance measure.
2. Sort in reverse order.
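A possible leave-one-out implementation of the SimpleRank scoring, under the assumption stated above that each instance is scored by the instances it serves as nearest neighbor for (+1 for a same-class instance, -2/(C-1) otherwise); all names and the toy data are illustrative.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def simple_rank_order(X, y):
    """Order training instances for anytime 1-NN: leave-one-out, each
    instance scored by the instances it is nearest neighbor of
    (+1 same class, -2/(C-1) different class), best scores first."""
    n = len(X)
    C = len(set(y))
    score = [0.0] * n
    for j in range(n):
        # Leave-one-out nearest neighbor of x_j.
        nn = min((i for i in range(n) if i != j),
                 key=lambda i: euclidean(X[i], X[j]))
        if y[nn] == y[j]:
            score[nn] += 1.0
        else:
            score[nn] -= 2.0 / (C - 1)
    # Highest-scoring (most useful) instances come first in the ordering.
    return sorted(range(n), key=lambda i: score[i], reverse=True)

X = [(0, 0), (0.1, 0), (5, 5), (5, 5.1), (2.5, 2.5)]
y = ["A", "A", "B", "B", "A"]
print(simple_rank_order(X, y))  # [1, 0, 2, 3, 4]
```

In this toy run the point at (2.5, 2.5) ends up last: it is nobody's nearest neighbor, so it contributes nothing early in an interrupted scan.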
17
How SimpleRank works: the ranking process on the JF dataset by SimpleRank
Movie ( T = 1 … 50 )
SimpleRank vs. Random Rank (baseline)

[Movie frames: the ranking process on the JF dataset, shown as Voronoi tessellations over [-2, 2] x [-2, 2]; at T = 10 the wrong-class estimation area is marked.]
18
Empirical Evaluations
Name              | # classes | # features | # instances | Evaluation     | Data Type
JF                |     2     |      2     |     20,000  | 2,000/18,000   | Real (synthetic)
Australian Credit |     2     |     14     |        690  | 10-fold CV     | Mixed
Letter            |    26     |     16     |     20,000  | 5,000/15,000   | Real
Pen Digits        |    10     |     16     |     10,992  | 7,494/3,498    | Real
Forest Cover Type |     7     |     54     |    581,012  | 11,340/569,672 | Mixed
Ionosphere        |     2     |     34     |        351  | 10-fold CV     | Real
Voting Records    |     2     |     16     |        435  | 10-fold CV     | Boolean
Two Patterns      |     4     |    128     |      5,000  | 1,000/4,000    | Time series
Leaf              |     6     |    150     |        442  | 10-fold CV     | Time series
Face              |    16     |    131     |      2,231  | 1,113/1,118    | Time series
All of the datasets are public and available to everyone (UCI ICS Machine Learning Data Archive, UCI KDD Data Archive, UCR Time Series Data Mining Archive), allowing fair evaluations based on diverse kinds of datasets.
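The accuracy-versus-interruption curves reported on the following slides can be reproduced, in spirit, by re-evaluating 1-NN on each prefix of an ordering. This naive sketch (all names and toy data illustrative) recomputes from scratch at every cutoff and is meant only to show the evaluation protocol, not an efficient implementation.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def accuracy_vs_S(train_X, train_y, order, test_X, test_y):
    """For each prefix length S of the ordering, report 1-NN accuracy
    when the scan is interrupted after seeing S training instances."""
    curve = []
    for S in range(1, len(order) + 1):
        prefix = order[:S]
        correct = 0
        for q, true_label in zip(test_X, test_y):
            # 1-NN restricted to the first S instances of the ordering.
            nn = min(prefix, key=lambda i: euclidean(train_X[i], q))
            correct += (train_y[nn] == true_label)
        curve.append(correct / len(test_X))
    return curve

train_X = [(0, 0), (5, 5), (0.2, 0.1), (5.1, 4.9)]
train_y = ["A", "B", "A", "B"]
test_X = [(0.1, 0.1), (5, 5.2)]
test_y = ["A", "B"]
print(accuracy_vs_S(train_X, train_y, [0, 1, 2, 3], test_X, test_y))
```

A good ordering shows a steep early rise in such a curve, which is exactly the diminishing-returns property an anytime classifier should have.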
19
K=1: Voting Records (10-fold Cross Validation, Euclidean)

[Figure: accuracy (%) vs. number of instances seen before interruption, S (0 to 350), comparing RandomRank, SimpleRank, and BestDrop tests; accuracy ranges from about 90% to 100%.]
20
K=1: Forest Cover Type
[Figure: accuracy (%) vs. number of instances seen before interruption (0 to 12,000), comparing SimpleRank (k=1) and Random Rank (k=1); accuracy ranges from 30% to 70%.]
21
K=1, 3, 5: Australian Credit

[Figure: accuracy (%) vs. number of instances seen before interruption (0 to 600) on the Australian Credit dataset for K=1, K=3, and K=5 (10-fold CV, Euclidean); accuracy ranges from 40% to about 90%.]

Preliminary results from our experiments
22
K=1: Two Patterns (Time Series Data)
23
Future Research Directions
• Make ordering + sorting much faster: O(n log n) for sorting + α
• Handling concept drift
• Showing confidence
24
Conclusion and Summary
Our contributions:
• A new framework for the Anytime Nearest Neighbor classifier.
• SimpleRank: a quite simple but critically good ordering method.
So far our method has achieved the highest accuracy across diverse datasets, and we have demonstrated its usefulness for shape recognition in stream video mining.
Good job! This is the best-so-far ordering method suited to the anytime Nearest Neighbor!
25
Acknowledgments
Dr. Agenor Mafra-Neto, ISCA Technologies, Inc.
Dr. Geoffrey Webb, Monash University
Dr. Ying Yang, Monash University
Dr. Dennis Shiozawa, BYU
Dr. Xiaoqian Xua, BYU
Dr. Pengcheng Zhana, BYU
Dr. Robert Schoenberger, Agris-Schoen Vision Systems, Inc.
Jill Brady, UCR
NSF grant IIS-0237918