Visual Perception for Human-Computer Interaction
WS 2008/09
Dr. Rainer Stiefelhagen, Dr. Edgar Seemann
Interactive Systems Laboratories, Universität Karlsruhe (TH)
http://isl.ira.uka.de/VisionHCICourse
Edgar Seemann, 19.12.08
cv:hci - Visual Perception for Human-Computer Interaction · 2008-12-27 · Computer Vision for Human-Computer Interaction Research Group, Universität Karlsruhe (TH)
• Note that space complexity is quite important for clustering large data sets
• Example: 100 000 data points
  – Standard distance matrix contains 10^5 * 10^5 = 10^10 entries
  -> ~40 GB if one entry has 32 bit
  -> Does your PC have enough RAM?
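The arithmetic behind the 40 GB figure can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the second function assumes that a symmetric distance matrix is stored as its upper triangle only, which the slide does not mention.

```python
def distance_matrix_bytes(n_points, bytes_per_entry=4):
    """Memory for a full n x n distance matrix (32-bit entries by default)."""
    return n_points * n_points * bytes_per_entry


def condensed_matrix_bytes(n_points, bytes_per_entry=4):
    """Memory if only the entries with i < j are stored (assumption:
    the distance is symmetric, so the upper triangle is sufficient)."""
    return n_points * (n_points - 1) // 2 * bytes_per_entry


n = 100_000
full = distance_matrix_bytes(n)
half = condensed_matrix_bytes(n)
print(f"full matrix: {full / 1e9:.1f} GB, upper triangle: {half / 1e9:.1f} GB")
```

Even the triangular storage needs roughly half of that, which is still far beyond the RAM of a typical PC at the time of the lecture.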
Edgar Seemann, 15.12.08 57
Computer Vision for Human-Computer Interaction Research Group, Universität Karlsruhe (TH), cv:hci
Clustering Hierarchy
• Agglomerative clustering produces a hierarchy
• Difficult question: where to stop?
  – Ideally, clusters should be visually compact.
• But
  – Distance value depends on feature dimensionality.
  – Appropriate ratio #features/#clusters depends on data set and interest point detector.
⇒ Needs to be selected for each detector/descriptor combination!
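The "where to stop?" question can be made concrete with a small sketch. The code below is a naive agglomerative clustering in plain Python that stops merging once the closest pair of clusters is farther apart than a threshold. Average linkage, Euclidean distance, and the names `agglomerate` and `stop_distance` are assumptions for illustration, not choices taken from the slides.

```python
import math


def average_link(a, b, points):
    """Mean Euclidean distance between all pairs drawn from clusters a and b."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate(points, stop_distance):
    """Merge the two closest clusters until the closest pair is farther
    apart than stop_distance. Returns clusters as lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = average_link(clusters[p], clusters[q], points)
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        if d > stop_distance:
            break  # the stopping criterion: remaining clusters are not compact
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


# Toy 2D "descriptors": two visually compact groups.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0)]
tight = agglomerate(features, stop_distance=1.0)
loose = agglomerate(features, stop_distance=10.0)
print(tight, loose)
```

With the small threshold the two groups stay separate; with the large one everything collapses into a single cluster. Because distances grow with descriptor dimensionality, a threshold that works for one descriptor will generally not transfer to another.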
Visual Vocabulary
• Vocabulary size ~10000 clusters
• Probabilistic votes decide whether a part is important or not
Learning Spatial Structure: “Star”-Model
Implicit Shape Model - Representation
1. Learn appearance codebook
  – Extract local features at interest points
  – Agglomerative clustering ⇒ codebook
2. Learn spatial distributions
  – Match codebook to training images
  – Record matching positions on object
• Sparse representation of the object appearance
Training: Spatial Occurrence (Star-Model)
1. Record spatial occurrence
  – Match codebook to training images
  – Record occurrence distributions with respect to object center
  – Location (x, y) and scale
[Figure: spatial occurrence distributions over (x, y, s), forming the Star-Model]
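A minimal sketch of the recording step, under stated assumptions: each training match is given as a tuple of codebook index, feature position, feature scale, object center and object scale, and offsets are normalized by the feature scale so they can be rescaled at detection time. The tuple layout, the normalization and the name `record_occurrences` are illustrative and not taken from the slides.

```python
from collections import defaultdict


def record_occurrences(training_matches):
    """Build a table: codebook index -> list of (dx, dy, ds).

    dx, dy: offset from the feature to the object center, in units of the
            feature scale (assumed normalization).
    ds:     object scale relative to the feature scale.
    """
    occurrences = defaultdict(list)
    for idx, (fx, fy), f_scale, (cx, cy), obj_scale in training_matches:
        dx = (cx - fx) / f_scale
        dy = (cy - fy) / f_scale
        ds = obj_scale / f_scale
        occurrences[idx].append((dx, dy, ds))
    return occurrences


# Hypothetical matches on one training object centered at (50, 50).
matches = [
    (7, (40.0, 60.0), 2.0, (50.0, 50.0), 1.0),
    (7, (60.0, 60.0), 2.0, (50.0, 50.0), 1.0),
    (3, (50.0, 20.0), 4.0, (50.0, 50.0), 1.0),
]
occ = record_occurrences(matches)
print(dict(occ))
```

Codebook entry 7 was seen on both sides of the object, so it ends up with two occurrences that point toward the center from opposite directions. This is the "star" structure: every entry only knows its position relative to the center, not relative to other entries.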
Occurrence Distribution
• For each codebook entry, we obtain a non-parametric probability distribution of its position relative to the object center
• With
  – c_i a codebook entry
  – λ = (λ_x, λ_y, λ_s) the relative position and scale
Remember: Generalized Hough Transform [Ballard81]
• Choose reference point for the contour (e.g. center)
• For each point on the contour remember where it is located w.r.t. the reference point
• Remember radius r and angle φ relative to the contour tangent
• Recognition: whenever you find a contour point, calculate the tangent angle and ‘vote’ for all possible reference points
• Instead of reference point, can also vote for transformation
⇒ The same idea can be used with local features!
Slide credit: Bernt Schiele
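The voting idea can be illustrated with a simplified, translation-only sketch. It stores Cartesian offsets indexed by the quantized tangent angle instead of the (r, φ) pair measured relative to the tangent, so it does not handle rotation. The bin count and all function names are assumptions made for this example.

```python
import math
from collections import defaultdict

N_BINS = 36  # assumed quantization of the tangent angle (10 degree steps)


def angle_bin(theta):
    """Quantize an angle in radians into one of N_BINS bins."""
    return int((theta % (2 * math.pi)) / (2 * math.pi) * N_BINS) % N_BINS


def build_r_table(contour, reference):
    """Training: for each contour point (x, y, tangent angle), remember the
    offset to the reference point, indexed by the quantized tangent angle."""
    table = defaultdict(list)
    rx, ry = reference
    for x, y, theta in contour:
        table[angle_bin(theta)].append((rx - x, ry - y))
    return table


def vote(edge_points, table):
    """Recognition: every edge point votes for all reference points that are
    compatible with its tangent angle."""
    accumulator = defaultdict(int)
    for x, y, theta in edge_points:
        for dx, dy in table.get(angle_bin(theta), []):
            accumulator[(x + dx, y + dy)] += 1
    return accumulator


# Toy contour: four points of a square, reference point at its center.
model = [(0, 0, 0.0), (2, 0, 0.0), (2, 2, math.pi / 2), (0, 2, math.pi / 2)]
table = build_r_table(model, reference=(1, 1))

# The same shape translated by (10, 5).
scene = [(x + 10, y + 5, theta) for x, y, theta in model]
accumulator = vote(scene, table)
peak = max(accumulator, key=accumulator.get)
print(peak, accumulator[peak])
```

Each scene point casts several votes because points sharing a tangent angle are indistinguishable, but only the true reference position collects a vote from every point.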
Generalized Hough Transform
• For every feature, store possible “occurrences”
  – Object identity
  – Pose
  – Relative position
• For new image, let the matched features vote for possible object positions
Probabilistic Gen. Hough Transform
• Exact correspondences → Prob. match to object part
• NN matching → Soft matching
• Feature location on obj. → Part location distribution
• Uniform votes → Probabilistic vote weighting
• Quantized Hough array → Continuous Hough space
Detection Procedure
Recognition: ISM Detection Procedure
[Figure: ISM detection pipeline; panels: Image, Probabilistic Voting, 3D Voting Space (axes x, y, s), Back-Projection, Segmentation, Detection Confidences (0.7, 0.5)]
Probabilistic Formulation
• Descriptor contribution:
• With
  – e an extracted image descriptor
  – l the position of the descriptor in the image
• Marginalization over all found descriptors:
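The formulas on this slide did not survive extraction, so the following is only a sketch of how such probabilistic voting is commonly set up, not a reconstruction of the slide. It assumes that a descriptor e is soft-matched to several codebook entries with uniform weight p(c_i | e), and that each entry distributes its weight uniformly over its stored occurrences. The data layout and the name `cast_votes` are hypothetical.

```python
def cast_votes(features, occurrences):
    """Return weighted votes (x, y, s, weight) for the object center.

    features:    list of (x, y, scale, matches); matches is the list of
                 codebook indices the descriptor was soft-matched to.
    occurrences: codebook index -> list of (dx, dy, ds) from training,
                 with offsets in units of the feature scale (assumption).
    """
    votes = []
    for x, y, scale, matches in features:
        if not matches:
            continue
        p_match = 1.0 / len(matches)       # assumed uniform p(c_i | e)
        for idx in matches:
            entries = occurrences.get(idx, [])
            if not entries:
                continue
            p_occ = 1.0 / len(entries)     # assumed uniform p(lambda | c_i)
            for dx, dy, ds in entries:
                votes.append((x + dx * scale, y + dy * scale,
                              ds * scale, p_match * p_occ))
    return votes


# Hypothetical occurrence table and one descriptor found at l = (40, 60).
occurrences = {7: [(5.0, -5.0, 0.5), (-5.0, -5.0, 0.5)], 3: [(0.0, 7.5, 0.25)]}
features = [(40.0, 60.0, 2.0, [7, 3])]
votes = cast_votes(features, occurrences)
for v in votes:
    print(v)
```

The weights cast by one descriptor sum to one, so a descriptor that matches many ambiguous codebook entries spreads its influence thinly, while a distinctive one concentrates it. Summing the votes of all descriptors near a point of the (x, y, s) space corresponds to the marginalization step.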
Scale Voting: Efficient Computation
• Mean-Shift formulation for refinement
• Scale-adaptive balloon density estimator
[Figure: panels Scale votes, Binned accum. array, Candidate maxima, Refinement (MSME); axes x, y, s]
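The two-stage search in the figure (binned accumulator for candidate maxima, then mean-shift refinement) can be sketched as follows. This is a simplified illustration: it uses a flat box-shaped window whose size grows with the current scale estimate as a stand-in for the scale-adaptive balloon estimator, and the bin sizes, radii and function names are assumptions.

```python
from collections import defaultdict


def coarse_maximum(votes, bin_xy=20.0, bin_s=0.25):
    """Accumulate weighted votes in a binned (x, y, s) array and return
    the center of the heaviest bin as a candidate maximum."""
    acc = defaultdict(float)
    for x, y, s, w in votes:
        acc[(int(x // bin_xy), int(y // bin_xy), int(s // bin_s))] += w
    bx, by, bs = max(acc, key=acc.get)
    return ((bx + 0.5) * bin_xy, (by + 0.5) * bin_xy, (bs + 0.5) * bin_s)


def mean_shift(votes, start, radius_xy=10.0, radius_s=0.25, max_iter=20):
    """Refine a candidate by moving it to the weighted mean of the votes
    inside a window that is scaled by the current scale estimate."""
    x, y, s = start
    for _ in range(max_iter):
        near = [v for v in votes
                if abs(v[0] - x) <= radius_xy * s
                and abs(v[1] - y) <= radius_xy * s
                and abs(v[2] - s) <= radius_s * s]
        total = sum(v[3] for v in near)
        if total == 0.0:
            break
        new = tuple(sum(v[k] * v[3] for v in near) / total for k in range(3))
        if new == (x, y, s):
            break  # converged
        x, y, s = new
    return x, y, s


# Three consistent votes plus one outlier, each as (x, y, s, weight).
votes = [(50.0, 50.0, 1.0, 0.25), (52.0, 49.0, 1.0, 0.25),
         (48.0, 51.0, 1.1, 0.25), (120.0, 20.0, 0.5, 0.25)]
candidate = coarse_maximum(votes)
refined = mean_shift(votes, candidate)
print(candidate, refined)
```

The binned array only has to be good enough to find a starting point; the refinement then works on the continuous vote positions, so the final estimate is not limited by the bin size.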
Figure-Ground Segmentation
Occurrence distributions
• Adding local segmentation masks
[Figure: spatial occurrence distributions over (x, y, s) + local figure-ground labels]
Figure-Ground Segmentation
• Influence of descriptor on an object hypothesis:
• Figure probability for a hypothesis:
[Formula annotations: segmentation information; influence on object hypothesis]
Figure-Ground Segmentation
• Final segmentation value:
Overlapping hypotheses
Minimum Description Length (MDL) Reasoning
• Savings term:
  – S_area: #pixels N in segmentation
  – S_model: model cost, assumed constant
  – S_error: estimate of error
• Error term:
• Overlapping hypotheses:
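The formulas for the savings and error terms are missing from the transcript. The sketch below therefore rests on two loudly stated assumptions: that the savings are a weighted combination S_area - k_model * S_model - k_error * S_error, and that the error is estimated as the sum over the segmentation's pixels of (1 - p(figure)). The weights, defaults and function names are hypothetical, and the treatment of overlapping hypotheses is not covered.

```python
def error_term(figure_probs):
    """Assumed error estimate: sum over pixels of (1 - p(pixel = figure))."""
    return sum(1.0 - p for p in figure_probs)


def savings(figure_probs, model_cost=1.0, k_model=1.0, k_error=1.0):
    """Assumed savings of one hypothesis: pixels explained, minus the
    (constant) model cost, minus the weighted error estimate."""
    s_area = len(figure_probs)            # number of pixels N in segmentation
    s_error = error_term(figure_probs)
    return s_area - k_model * model_cost - k_error * s_error


# Per-pixel figure probabilities of two hypothetical segmentations.
confident = [0.9, 0.8, 1.0, 0.5]
doubtful = [0.3, 0.2, 0.4, 0.1]
print(savings(confident), savings(doubtful))
```

Under these assumptions a hypothesis whose segmentation is supported by high figure probabilities yields larger savings than one of the same area with weak support, which is the property the verification stage relies on.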
MDL based Verification
• Secondary hypotheses
  – Desired property of algorithm! ⇒ robustness to occlusion
• Standard solution: reject based on bounding box
  ⇒ Problematic - may lead to missing detections!
  ⇒ Use segmentations to resolve ambiguities instead
Leibe, Leonardis, Schiele, ‘04
Extensions and Evaluation
Outline
1. Image Descriptors and Interest Points
  – Measure the influence of interest region extraction
  – Evaluate the robustness of local image descriptions
2. Body Articulations
3. Cross-Articulation Learning
4. Discriminative Hypothesis Verification
5. Instance-Specific Models
Interest Point Detectors
• Interest point detectors sample different image regions
• Unclear which sampling is most informative for pedestrian detection
[Figure: example detections for Harris, Harris-Laplace (scale-invariant), DoG (scale-invariant), Hessian-Laplace (scale-invariant)]
• Object shape more important than actual pixel values
• Shape generalizes better