Transcript
Slide 1
Candidate: Parisa Rashidi. Advisor: Diane J. Cook.
Slide 2
Agenda: Introduction; Challenges; Solutions; Sequence mining; Stream
mining; Transfer learning; Active learning; Results; Conclusions &
future directions.
Slide 3
Smart Homes: sensors & actuators integrated into everyday objects;
knowledge acquisition about the inhabitant.
[Diagram: an Agent and its Environment, linked by Percepts (sensors)
and Actions (controllers).]
Slide 4
Applications: energy efficiency; security; achieving more comfort;
monitoring well-being of residents. In-home monitoring: monitor daily
activities; check for anomalies; help by giving prompts and cues.
Slide 5
Activity Recognition: a vital component of smart homes. Recognizing
activities from a stream of sensor events.
[Figure labels: An Activity (sequence of sensor events); A Sensor Event.]
Slide 7
Why is it difficult? Human activity is erratic and complex:
discontinuous (interrupting events); step order might vary each time;
inter-subject and intra-subject variability. The algorithm should be
scalable. Data annotation is costly and laborious. Training for each
new space?
Slide 8
Unsolved Challenges. Many methods have been proposed: hidden Markov
models, conditional random fields, naive Bayes, ... Current methods
make many simplifying assumptions; are mostly supervised (the data
annotation problem); even if unsupervised, are trained from scratch for
each new setting; and ignore activity variations or interruptions.
Slide 10
Our Solutions. Discovering complex activities: sequence mining.
Discovering activities from a stream: stream sequence mining.
Transferring activity models to new spaces: transfer learning.
Guiding activity annotation: active learning.
Slide 12
Sequence Mining. Sequence: an ordered set of items. Examples: speech
(sequence of phonemes); DNA sequence (AAGCTACGTAA); network (sequence
of packets); our data (sequence of sensor events). Goal: finding
repetitive sequential patterns in data. Many methods have been
proposed: GSP, PrefixSpan, SPADE, ...
Slide 13
Activity Sequence Mining. Problem: the data is a single sequence with
no boundaries, unlike transaction data. We are looking for activity
sequence patterns with discontinuous steps and with variations of the
same activity.

Transaction ID   Items
1                {Milk, Egg, Bread}
2                {Bread, Beer}
3                {Soap, Milk, Egg}
(Transaction data: item-set boundaries.)

MDMDACDF
(Sensor event sequence: no boundaries!)
Slide 14
From Sequence Mining to Activity Recognition. Find activity patterns:
Discontinuous Varied Sequence Mining (DVSM); Continuous, varied Order,
Multi Threshold (COM). Cluster similar patterns: the cluster centroid
is a representative activity. Recognize activities: Hidden Markov
Model.
Slide 15
DVSM finds general patterns and their variations in several
iterations. During each iteration it finds patterns of increasing
length (extending by prefix and suffix at each iteration) and checks
whether a pattern is a variation of a general pattern. At the end of
each iteration it retains only the interesting patterns according to
the MDL principle.
[Figure: pattern instances {b,x,a}, {a,b,q}, {a,u,b} of a general
pattern; labels: Continuity, Compression.]
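To make the idea of discontinuous, order-varied instances concrete, here is a minimal sketch. It is not the DVSM algorithm: it only scans a sensor event sequence for windows that contain every event of a given general pattern in any order, tolerating interrupting events. The function name, the `max_span` limit, and the continuity ratio (pattern size divided by window length) are assumptions made for illustration.

```python
def find_instances(events, pattern, max_span):
    """Return (start, end, continuity) for non-overlapping windows of at most
    `max_span` events that begin with a pattern event and contain every event
    of `pattern`, in any order, possibly with interrupting events."""
    needed = set(pattern)
    instances = []
    start = 0
    while start < len(events):
        if events[start] not in needed:
            start += 1
            continue
        seen = set()
        match_end = None
        for end in range(start, min(start + max_span, len(events))):
            if events[end] in needed:
                seen.add(events[end])
            if seen == needed:
                match_end = end
                break
        if match_end is None:
            start += 1
        else:
            span = match_end - start + 1
            # Illustrative continuity: 1.0 means no interrupting events
            instances.append((start, match_end, len(needed) / span))
            start = match_end + 1
    return instances


# The three instances from the slide, separated by unrelated events "z"
events = ["b", "x", "a", "z", "a", "b", "q", "z", "a", "u", "b"]
instances = find_instances(events, pattern=["a", "b"], max_span=3)
```

Here {b,x,a} and {a,u,b} are found as interrupted instances, while the "a, b" inside {a,b,q} is found as a fully continuous one.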
Slide 16
DVSM prunes patterns/variations with low compression values (highly
discontinuous; infrequent). Prunes non-maximal patterns. Prunes
irrelevant variations using mutual information and sensor
[Figure labels: Continuity; Pattern; Variations; Instances; Events.]
Slide 17
Improve DVSM: COM. Sensor frequencies differ across regions of the
home and across types of sensor (the rare item problem). A global
min-support doesn't work! Use multiple support thresholds.
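The rare item problem can be illustrated with a short sketch. It assumes each sensor gets its own minimum support, set to a fraction of that sensor's own frequency with a fixed floor, and that a pattern is judged against the lowest threshold among its sensors. This follows the general multiple-minimum-support idea and is not COM's actual formula; `beta`, `floor`, and the sensor names are hypothetical.

```python
from collections import Counter


def per_sensor_min_support(events, beta=0.5, floor=2):
    """Give every sensor its own threshold: beta * its frequency, at least `floor`."""
    counts = Counter(events)
    return {sensor: max(beta * count, floor) for sensor, count in counts.items()}


def is_frequent(pattern, pattern_count, thresholds):
    """Judge a pattern against the lowest threshold among the sensors it contains."""
    return pattern_count >= min(thresholds[sensor] for sensor in pattern)


# A busy kitchen motion sensor next to a rarely triggered medicine cabinet sensor
events = ["kitchen_motion"] * 100 + ["medicine_cabinet"] * 6
thresholds = per_sensor_min_support(events)

rare_ok = is_frequent(("medicine_cabinet",), 4, thresholds)
busy_ok = is_frequent(("kitchen_motion",), 4, thresholds)
```

With a single global threshold tuned for the kitchen sensor, the medicine cabinet pattern would never qualify; with per-sensor thresholds it does, while four kitchen events still do not.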
Slide 18
Clustering: grouping similar objects together. There are many
different clustering methods: partition based (k-Means), hierarchical
(CURE), density based (DBSCAN), model based (EM).
Slide 19
Similarity Measure. How is similarity determined? Our activity
similarity measure:
Total Similarity = Start Time Similarity + Duration Similarity +
Structure Similarity + Location Similarity
Slide 20
Activity Recognition: basically a sequence classification problem.
Different from ordinary classification problems: variable length
records; order. Probabilistic methods are the most widely used: Markov
chains, hidden Markov models, dynamic Bayesian networks, conditional
random fields.
Slide 21
Hidden Markov Model: a statistical model with the Markovian property;
a number of observed & hidden variables and their transition
probabilities. We automatically build the HMM from cluster centroids.
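As an illustration of how an HMM turns sensor events (observations) into activity labels (hidden states), here is a minimal Viterbi decoding sketch. The two activities, the three sensors, and all probabilities are made-up numbers; building the model automatically from cluster centroids, as the slide describes, is not reproduced here.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            column[s] = max(
                (prev[p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)
    # Trace the best path backwards from the most likely final state
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


states = ["cooking", "sleeping"]
start_p = {"cooking": 0.5, "sleeping": 0.5}
trans_p = {
    "cooking": {"cooking": 0.8, "sleeping": 0.2},
    "sleeping": {"cooking": 0.2, "sleeping": 0.8},
}
emit_p = {
    "cooking": {"stove": 0.6, "fridge": 0.35, "bed": 0.05},
    "sleeping": {"stove": 0.05, "fridge": 0.05, "bed": 0.9},
}
labels = viterbi(["stove", "fridge", "bed", "bed"], states, start_p, trans_p, emit_p)
```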
Slide 23
Stream Mining. Many emerging applications: IP network traffic,
scientific data. Process data as it arrives: we cannot store all the
data; one pass; approximate and randomized answers (e.g. a relaxed
support threshold). Some proposed methods: frequent itemset mining
(lossy counting [Manku 2002], SpaceSaving algorithm [Metwally 2005],
...); frequent sequence mining (SPEED algorithm [Raissi 2005], ...).
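The one-pass, approximate style of stream mining can be sketched with lossy counting over single items. This is a simplified reading of the algorithm cited on the slide, written for illustration: the stream is split into buckets of width ceil(1/epsilon), and entries whose count plus maximum possible undercount no longer exceeds the current bucket id are dropped at each bucket boundary. The function and variable names are mine.

```python
import math


def lossy_counting(stream, epsilon):
    """One pass over the stream; stored counts undercount by at most epsilon * n."""
    width = math.ceil(1 / epsilon)          # bucket width
    entries = {}                            # item -> [count, max possible undercount]
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)       # id of the current bucket
        if item in entries:
            entries[item][0] += 1
        else:
            entries[item] = [1, bucket - 1]
        if n % width == 0:                  # bucket boundary: prune rare entries
            entries = {k: v for k, v in entries.items() if v[0] + v[1] > bucket}
    return entries, n


def frequent_items(entries, n, support, epsilon):
    """Report items using the relaxed threshold (support - epsilon) * n."""
    return {k for k, (count, _) in entries.items() if count >= (support - epsilon) * n}


stream = ["a"] * 6 + list("bcde") + ["a"] * 6 + list("fghi")
entries, n = lossy_counting(stream, epsilon=0.1)
result = frequent_items(entries, n, support=0.5, epsilon=0.1)
```

Only the entry for "a" survives the two pruning steps, so memory stays small while the frequent item is still reported.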
Slide 24
Tilted Time Model. Uses a set of time-tilted windows to keep the
frequency of items: finer details for more recent time frames, coarser
details for older time frames; history shifts into older time frames
as data arrives.
[Figure labels: month, day, hour.]
* C. Giannella, J. Han, J. Pei, X. Yan, and P. S. Yu, Mining Frequent
Patterns in Data Streams at Multiple Time Granularities. MIT Press,
2003, ch. 3.
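A rough sketch of the shifting behaviour, assuming a simplified scheme: counts arrive at the finest granularity, and once a level has collected a full unit (for example 24 hourly counts) they collapse into one slot of the next coarser level. The class name and the default capacities (24 hours, 31 days, 12 months) are assumptions for illustration; the cited chapter defines the actual window schemes.

```python
class TiltedTimeWindow:
    """Keep recent counts at fine granularity and older counts at coarser ones."""

    def __init__(self, capacities=(24, 31, 12)):
        # capacities[i]: how many slots of level i make one slot of level i + 1;
        # for the coarsest level it is the number of slots retained.
        self.capacities = capacities
        self.levels = [[] for _ in capacities]   # most recent slot last

    def add(self, count, level=0):
        slots = self.levels[level]
        slots.append(count)
        if level + 1 == len(self.levels):
            if len(slots) > self.capacities[level]:
                slots.pop(0)                     # coarsest level: forget the oldest slot
        elif len(slots) == self.capacities[level]:
            # A full unit of this level collapses into a single coarser slot
            self.add(sum(slots), level + 1)
            slots.clear()


window = TiltedTimeWindow(capacities=(4, 2))
for _ in range(5):
    window.add(1)
after_five = [list(level) for level in window.levels]
for _ in range(7):
    window.add(1)
after_twelve = [list(level) for level in window.levels]
```

After five unit counts the first four have been merged into one coarse slot; after twelve, the oldest coarse slot has been dropped.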
Slide 25
Tilted Time Model. Two thresholds: a minimum support and a maximum
support error. An itemset can be frequent, sub-frequent, or
infrequent. Pruning itemsets (tail pruning).
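The three-way split can be written down as a small function. The boundaries below follow my reading of the tilted-time approach cited on the previous slide (frequent at or above the minimum support, sub-frequent between the maximum support error and the minimum support, infrequent below); the parameter names `sigma` and `epsilon` are assumptions, since the slide's own symbols did not survive.

```python
def classify_itemset(support, sigma, epsilon):
    """Classify an itemset by its relative support.

    sigma: minimum support; epsilon: maximum support error (epsilon < sigma).
    """
    if support >= sigma:
        return "frequent"
    if support >= epsilon:
        return "sub-frequent"
    return "infrequent"


labels = [classify_itemset(s, sigma=0.10, epsilon=0.02) for s in (0.25, 0.05, 0.01)]
```

Sub-frequent itemsets are kept because they may become frequent later; infrequent ones are candidates for pruning.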
Slide 26
StreamCOM: extending COM into a stream mining method using the tilted
time model.
[Diagram: COM + Tilted Time Model = StreamCOM.]
Slide 27
Discovering Patterns. Finds general patterns and their variations in
several iterations. During each iteration it finds patterns of
increasing length (extending by prefix and suffix at each iteration)
and checks whether a pattern is a variation of a general pattern. At
the end of each iteration it retains only the interesting patterns
according to the MDL principle.
[Figure: instances {b,x,c,a}, {a,b,q}, {a,u,b}; labels: General
Pattern, Variation.]
Slide 28
Interesting Patterns.
General pattern a: classified as interesting, sub-interesting, or
otherwise uninteresting by comparing T(a) with thresholds subscripted g.
Variation a_i: classified as interesting, sub-interesting, or
otherwise uninteresting by comparing T(a_i) with thresholds
subscripted v, which the slide labels as the average compression of
all variations.
Slide 29
Pruning Patterns: tail pruning.
[Figure labels: General Pattern; Variation.]
Slide 30
Tail Pruning: to reduce the number of frequency records in the
tilted-time windows, prune old frequency records of an itemset.
Slide 32
Transfer Learning: apply skills learned in previous tasks to novel
tasks. Examples: chess to checkers; math to CS.
[Figure: Traditional ML vs. Transfer Learning, each shown with
training items and test items.]
Slide 33
Transfer Learning Methods (decision tree):
Labeled target data?
  Yes: Inductive Transfer Learning. Labeled source data?
    Yes: Multi-Task Learning
    No: Self Taught Learning
  No: Non-Inductive Transfer Learning. Labeled source data?
    Yes: Transductive Transfer Learning. Same domains?
      Yes: Sample Selection/Covariate Shift
      No: Domain Adaptation
    No: Unsupervised Transfer Learning
* S. J. Pan and Q. Yang, "A Survey on Transfer Learning," IEEE TKDE,
vol. 22, no. 10, pp. 1345-1359, Oct. 2010.
Slide 34
Why in Smart Homes? Why transfer learning? Supervised methods require
annotation. Unsupervised methods require lots of data.
Slide 35
Our Transfer Learning Solutions. Activity transfer: transfer from one
resident to another; different residents, space layouts, sensors;
transfer from a single physical source to a target; transfer from
multiple physical sources to a target. Domain selection.
Slide 36
Multi Resident Transfer Learning
1. Find interesting target patterns using DVSM.
2. Cluster discovered patterns.
3. Map cluster centroids to source activities.
Slide 37
Multi Home Transfer Learning (MHTL)
1. Find activity models in both spaces. Source: extract activity
model. Target: location based mining, incremental clustering. Activity
consolidation, sensor selection.
2. Map activity models from source to target: map sensors; map
activities.
3. Map labels.
4. Use labels for recognition!
Slide 41
Domain Selection. Our previous works assumed all sources are equal.
Not all sources are equal: some sources are more equal! Select the top
N sources. Efficiency: do not use all sources. Accuracy: negative
transfer effect.
"Some animals are more equal..." (George Orwell, Animal Farm)
Slide 42
Domain Similarity: how to measure the difference between two
distributions?
Slide 43
Domain Similarity. Conventional similarity measures: Kullback-Leibler
divergence (KL), Jensen-Shannon divergence (JSD), L1 or Lp norms.
Kifer et al. [2004] proposed the H distance. Later, Ben-David et al.
[2007] proved that computing it is exactly the problem of minimizing
the empirical risk of a classifier that discriminates between
instances drawn from the two domains!
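The classifier view of the distance can be sketched as follows. It assumes one-dimensional samples of equal size from each domain, uses the simplest possible classifier (a single threshold), and converts its best empirical error `err` into a distance with 2 * (1 - 2 * err), the form I recall from Ben-David et al. The function name and the restriction to threshold classifiers are assumptions for illustration, not the method used in the talk.

```python
def proxy_h_distance(source, target):
    """Estimate domain divergence from the best 1-D threshold classifier that
    tries to tell source points (label 0) from target points (label 1)."""
    labeled = [(x, 0) for x in source] + [(x, 1) for x in target]
    values = sorted({x for x, _ in labeled})
    # Candidate thresholds: below every point, and midway between neighbours
    thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_err = 0.5
    for t in thresholds:
        wrong = sum((x > t) != bool(y) for x, y in labeled)
        err = wrong / len(labeled)
        best_err = min(best_err, err, 1 - err)   # 1 - err is the flipped classifier
    return 2 * (1 - 2 * best_err)


far_apart = proxy_h_distance([0, 1, 2], [10, 11, 12])
identical = proxy_h_distance([1, 2, 3], [1, 2, 3])
```

Domains that a classifier separates perfectly get the maximum distance of 2; domains it cannot tell apart get 0.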
Slide 44
Demonstration of H Distance. H-distance: 0.1, small!
* Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira.
Analysis of representations for domain adaptation. In NIPS, 2007.
Slide 46
Our Domain Selection Method. Find the similarity of domains
activity-wise. Overall similarity: the average activity-wise
similarity. Select the top n sources.
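The selection step above can be sketched in a few lines. The sketch assumes the activity-wise similarities to the target have already been computed and are given as a nested dictionary; the source names, activity names, and scores are invented for the example.

```python
def select_top_sources(activity_similarity, n):
    """activity_similarity: {source: {activity: similarity to the target}}.

    Returns the n sources with the highest average activity-wise similarity,
    together with every source's overall score.
    """
    overall = {
        source: sum(scores.values()) / len(scores)
        for source, scores in activity_similarity.items()
    }
    ranked = sorted(overall, key=overall.get, reverse=True)
    return ranked[:n], overall


similarity = {
    "home_A": {"cooking": 0.9, "sleeping": 0.7},
    "home_B": {"cooking": 0.2, "sleeping": 0.4},
    "home_C": {"cooking": 0.6, "sleeping": 0.6},
}
chosen, overall = select_top_sources(similarity, n=2)
```

Dropping the least similar source (here home_B) saves computation and limits the risk of negative transfer.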
Slide 48
Active Learning. The learning algorithm can query for the label of a
point: ask the oracle! Proposed methods: uncertainty sampling,
committee based, ...
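Uncertainty sampling, the first method named above, can be sketched as picking the unlabeled point the current model is least sure about. The toy one-dimensional model and its decision boundary at 5.0 are assumptions made only so the example runs; any model that returns class probabilities could take its place.

```python
import math


def least_confident(unlabeled, predict_proba):
    """Return the unlabeled point whose most likely label has the lowest probability."""
    return min(unlabeled, key=lambda x: max(predict_proba(x)))


def toy_predict_proba(x, boundary=5.0):
    """Hypothetical two-class model: probability of class 1 rises as x passes the boundary."""
    p = 1 / (1 + math.exp(-(x - boundary)))
    return [1 - p, p]


pool = [0.5, 4.8, 9.0]
query = least_confident(pool, toy_predict_proba)   # this point is sent to the oracle
```

The point nearest the decision boundary is chosen, since a label for it is expected to change the model the most.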
Slide 49
A Problem! Traditional active learning methods ask overly specific
queries:
What is the class label if (sex = female) and (age = 39) and (chest
pain type = 3) and (serum cholesterol = 150.2 mg/dL) and (fasting
blood sugar = 150 mg/dL) ... and (electrocardiographic result = 1) and
(maximum heart rate achieved = 126) and (exercise induced angina = 90)
and (heart old peak = 2.3) and (number of major vessels colored by
fluoroscopy = 3)?
vs.
What is the class label if (age > 65) and (chest pain type = 3) and
(serum cholesterol > 240 mg/dL)?
Slide 50
Template Based Queries. Select the most informative instances. Select
friends (+) and enemies (-). Select relevant and weakly relevant
features. Build a template query using the relevant and weakly
relevant features.
Slide 51
RIQY: Rule Induced active learning QuerY method. Select the most
informative instances. Select friends (+) and enemies (-). Use rule
induction to build generic queries.
Slide 52
Details: the most informative instance.
Slide 54
Can we discover activities? DVSM vs. COM
Slide 55
Activity Discovery: confusion matrix for various activities in
apartment 1.
Slide 56
Some Discovered Patterns
Slide 57
StreamCOM: taking medication activity.
Slide 58
Transferring Activities
Slide 59
Transferring Activities
Slide 60
What about active learning? Datasets: Wisconsin breast cancer dataset
(UCI repository); Kyoto smart apartment dataset (CASAS).
Slide 61
Conclusions. Two novel sequence mining methods: DVSM, COM. A novel
stream data mining method: StreamCOM. A couple of transfer learning
methods: between residents; between one/multiple smart homes; source
selection. Two novel active learning methods: template based active
learning; RIQY.
Slide 62
Future Work. Anomaly detection in sequences. Exploiting more temporal
information: order of activities. Change detection in patterns.
Slide 63
Publications (published/accepted)
Parisa Rashidi and Diane J. Cook. Mining and Monitoring Patterns of
Daily Routines for Assisted Living in Real World Settings. Proceedings
of International Health Informatics Conference (IHI). 2010.
Parisa Rashidi and Diane J. Cook. Transferring learned activities in
smart environments between different residents. Proceedings of
International Conference on Intelligent Environments (IE), volume 2,
pages 185-192. Springer-Verlag, 2009.
Parisa Rashidi and Diane J. Cook. Multi Home Transfer Learning for
Resident Activity Discovery and Recognition. Proceedings of
International Workshop on Knowledge Discovery from Sensor Data (KDD),
pages 53-63, 2010.
Parisa Rashidi and Diane J. Cook. Home to home transfer learning.
Proceedings of AAAI Plan, Activity, Intention Recognition Workshop
(AAAI), 2010.
Slide 64
Publications (published/accepted)
Parisa Rashidi and Diane J. Cook. Transferring Learned Activities and
Cues between Different Residential Spaces. Journal of Pervasive and
Mobile Computing (PMC). March 2010.
Maureen Schmitter-Edgecombe, Parisa Rashidi, Diane J. Cook, and Larry
Holder. Discovering and Tracking Activities for Assisted Living. The
American Journal of Geriatric Psychiatry. In Press, 2010.
Parisa Rashidi, Diane J. Cook, Larry Holder, and Maureen
Schmitter-Edgecombe. Discovering Activities to Recognize and Track in
a Smart Environment. IEEE Transactions on Knowledge and Data
Engineering (TKDE). In Press, 2010.
Parisa Rashidi and Diane J. Cook. Mining Sensor Streams for
Discovering Human Activity Patterns Over Time. Proceedings of
International Conference on Data Mining (ICDM), 2010.
Slide 65
Publications (submitted)
Parisa Rashidi and Diane J. Cook. Domain Selection and Adaptation in
Smart Homes. ICOST 2011, January 2011. Submitted.
Parisa Rashidi and Diane J. Cook. Template Based Active Learning. AAAI
2011, February 2011. Submitted.
Parisa Rashidi and Diane J. Cook. Ask Me Better Questions. Rule
Induction Based Active Learning. KDD 2011, February 2011. Submitted.
Slide 66
Publications (invited/to be submitted)
Parisa Rashidi and Diane J. Cook. Mining and Monitoring Patterns of
Daily Routines for Assisted Living in Real World Settings. ACM
Transactions special issue on Intelligent Systems for Health
Informatics. Invited. April 2011.
Parisa Rashidi and Diane J. Cook. Generic Active Learning Queries.
TKDE or JMLR. May 2011. To be submitted.