Cats and Dogs
Omkar M Parkhi 1,2   Andrea Vedaldi 1   Andrew Zisserman 1   C. V. Jawahar 2
1 Department of Engineering Science, University of Oxford
2 International Institute of Information Technology, Hyderabad, India
E-mail: {omkar,vedaldi,az}@robots.ox.ac.uk, [email protected]
This research is funded by UKIERI, EU Project AXES ICT-269980 and ERC grant VisRec no. 228180.

I. Problem and Contributions
• Introducing a new annotated dataset covering 37 different breeds of cats and dogs
• Fine-grained categorization of cats and dogs
• State-of-the-art results on the MSR ASIRRA challenge

I. The Oxford-IIIT Pet Dataset
• 7,349 images of cats and dogs
• Collected from various sources on the Internet
• 37 different categories: 12 cat breeds and 25 dog breeds
• Approx. 200 images/category with manual annotations
• Annotations for an image include:
  • Species (cat or dog)
  • Breed
  • Tight bounding box around the pet head
  • Pixel-level foreground/background masks (trimaps)
Dataset can be downloaded from: http://www.robots.ox.ac.uk/~vgg/data/pets.html

[Figure: Example Annotations. Cat: Bengal; Dog: Pug]
[Figure: Dataset Examples]

Previously on Cats and Dogs..
• Our previous work [Parkhi et al. ICCV 2011] investigated the problem of detecting deformable animals.
• The central idea was to detect a stable, distinctive part of the animal and localize the body using the clues from that part.
• A deformable parts model was used to detect the distinctive part and GrabCut segmentation was used to localize the object.
• In this work, we release a dataset helpful for evaluating the performance of such methods and tackle the problem of multiclass classification.
The Truth About Cats and Dogs, ICCV 2011

II. Model for Pet Classification
• The breed of a pet affects its shape, size, fur type and color
• These attributes are modeled by combinations of shape and appearance features

II.1 Shape Model:
• The heads of the pets are captured by deformable part models
• Constellation of HOG + LBP parts
• Two head models, one for cats and one for dogs, trained separately
• Detection scores used for species classification
• Cat Vs. Dog classification accuracy of 94.21% achieved

II.2 Appearance Model:
The texture of the fur is captured by a bag of words model:
• Multi-scale dense SIFT features
• Vocabulary of 4000 visual words using K-Means
• Spatial histograms with varied layouts
• Features computed on the entire image as well as on body parts of the animal obtained by automatic segmentation

II.2a Automatic Segmentation:
• The pet body (foreground) is segmented using GrabCut
• GrabCut is initialized from superpixels of an image obtained from Berkeley UCM
• Superpixels seed GMMs depending upon classification scores [Chai et al. ICCV'11]
• SIFT-BoW, size and location of a superpixel used as features
• Head detection also assists GMM seeding [Parkhi et al. ICCV'11]
• Berkeley Edge Detector response provides pairwise potentials

II.2b Spatial Layouts:
[Figure panels: Image Layout; Image + Head Layout; Image + Head + Body Layout]

II.2c Spatial Histogram Layouts
The spatial histograms use a variety of layouts that latch in different ways onto the head location and body segmentation.

Classification accuracies for different layouts (multi-class classification accuracy)
| Layout | Cats Vs. Dogs | Cats (12) | Dogs (25) | Combined (37) |
| Image | 82.56% | 52.01% | 40.59% | 39.64% |
| Image+Head | 85.06% | 60.37% | 52.10% | 51.23% |
| Image+Head+Body | 87.78% | 64.27% | 54.31% | 54.05% |
| Image+Head+Body (Ground Truth) | 88.68% | 66.12% | 57.29% | 56.60% |

Conclusions on the various layouts:
• Using features on body parts improves performance over spatial BoW methods.
• Better localization of the pet body improves the performance.

III. Combining Models:
Species and breed are predicted by combining the head detector scores and the appearance features in two ways:
• Hierarchical: The head detector scores are used to decide between cat and dog; then the appearance features are fed to a linear SVM for breed classification
• Flat: The head detector responses and appearance features are concatenated to jointly decide species and breed

Classification accuracies for different combination strategies and feature combinations
| Layout | Cats Vs. Dogs | Hierarchical (37) | Flat (37) |
| Image | 94.88% | 42.29% | 43.30% |
| Image+Head | 95.07% | 52.78% | 54.03% |
| Image+Head+Body | 94.89% | 55.26% | 56.68% |
| Image+Head+Body (Ground Truth) | 95.37% | 57.77% | 59.21% |

Conclusions:
• Combining shape with appearance improves accuracy significantly in both species and breed classification
• Flat classification is more accurate than the hierarchical method

Confusion matrix for the 37-class classification problem (Image+Head+Body Layout with Flat classification method)
| No. | Breed | Accuracy |
| 1 | Abyssinian | 35.7% |
| 2 | Bengal | 39.0% |
| 3 | Birman | 77.0% |
| 4 | Bombay | 81.8% |
| 5 | British Shorthair | 69.0% |
| 6 | Egyptian Mau | 71.1% |
| 7 | Maine Coon | 60.0% |
| 8 | Persian | 64.0% |
| 9 | Ragdoll | 51.0% |
| 10 | Russian Blue | 46.0% |
| 11 | Siamese | 70.0% |
| 12 | Sphynx | 82.0% |
| 13 | Am. Bulldog | 52.0% |
| 14 | Am. Pit Bull Terrier | 4.0% |
| 15 | Basset Hound | 62.0% |
| 16 | Beagle | 33.0% |
| 17 | Boxer | 38.4% |
| 18 | Chihuahua | 20.0% |
| 19 | Eng. Cocker Spaniel | 29.0% |
| 20 | Eng. Setter | 43.0% |
| 21 | German Shorthaired | 80.0% |
| 22 | Great Pyrenees | 70.0% |
| 23 | Havanese | 51.0% |
| 24 | Japanese Chin | 82.0% |
| 25 | Keeshond | 75.8% |
| 26 | Leonberger | 53.0% |
| 27 | Miniature Pinscher | 39.0% |
| 28 | Newfoundland | 82.0% |
| 29 | Pomeranian | 28.0% |
| 30 | Pug | 85.0% |
| 31 | Saint Bernard | 59.0% |
| 32 | Samoyed | 91.0% |
| 33 | Scottish Terrier | 66.7% |
| 34 | Shiba Inu | 57.0% |
| 35 | Staff. Bull Terrier | 37.1% |
| 36 | Wheaten Terrier | 53.0% |
| 37 | Yorkshire Terrier | 50.0% |

Failure Cases:
Top row: Bengal cats (right) classified as Egyptian Mau (left)
Bottom row: English Setter (right) classified as English Cocker Spaniel (left)

Segmentation: Qualitative Results (Oxford IIIT Pet Dataset)
[Figure]

Segmentation: Quantitative Results (Oxford IIIT Pet Dataset)
| Method | Segmentation Accuracy |
| All Foreground | 45% |
| Parkhi et al. (ICCV 2011) | 61% |
| This paper | 65% |

ASIRRA Challenge
• Asirra (Animal Species Image Recognition for Restricting Access)
• Introduced by Microsoft Research to provide alternatives to text-based CAPTCHAs
• 3 million pictures of cats and dogs from Petfinder.com
• Test: given a number of such images, separate cats from dogs
• 25,000 images are available to evaluate the system

| Method | Classification Accuracy | Break-in Probability |
| [Golle et al.] | 82.7% | 9.2% |
| This paper (Shape Only) | 92.7% | 42% |

Comparison with other Datasets
• Multi-class classification framework from the software package VLFeat (www.vlfeat.org) evaluated on 3 different datasets.
• SIFT-BoW spatial histogram features with kernel approximations and a linear SVM in a 1 Vs. All classification setting

| Dataset | Classification Accuracy |
| UCSD-Caltech Birds | 6.91% |
| Oxford-IIIT Pet Dataset | 38.45% |
| Oxford Flowers 102 | 53.71% |
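
The bag-of-words spatial histograms mentioned in the appearance model can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the toy 2-D descriptors, the three-word vocabulary, the three-region layout (whole image, head box, everything outside the head box) and the L1 normalisation are all assumptions chosen for illustration. The poster itself uses dense SIFT descriptors, a 4000-word K-Means vocabulary and several layouts.

```python
import numpy as np


def quantize(descriptors, vocabulary):
    """Assign each descriptor to its nearest visual word (Euclidean distance)."""
    # Squared distances via ||d||^2 - 2 d.v + ||v||^2, shape (n_desc, n_words).
    d2 = (
        (descriptors ** 2).sum(axis=1, keepdims=True)
        - 2.0 * descriptors @ vocabulary.T
        + (vocabulary ** 2).sum(axis=1)
    )
    return d2.argmin(axis=1)


def region_histogram(words, mask, vocab_size):
    """L1-normalised visual-word histogram of the descriptors selected by mask."""
    hist = np.bincount(words[mask], minlength=vocab_size).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist


def layout_histogram(words, xy, head_box, vocab_size):
    """Concatenate whole-image, head and non-head histograms.

    This three-region layout is a hypothetical stand-in for the poster's
    Image + Head layout; head_box is (x0, y0, x1, y1) in pixels.
    """
    x0, y0, x1, y1 = head_box
    in_head = (
        (xy[:, 0] >= x0) & (xy[:, 0] < x1) & (xy[:, 1] >= y0) & (xy[:, 1] < y1)
    )
    everything = np.ones(len(words), dtype=bool)
    return np.concatenate([
        region_histogram(words, everything, vocab_size),
        region_histogram(words, in_head, vocab_size),
        region_histogram(words, ~in_head, vocab_size),
    ])


# Toy data: 2-D "descriptors" and a 3-word vocabulary (illustrative only).
vocabulary = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
descriptors = np.array([[1.0, 0.0], [9.0, 1.0], [0.0, 9.0], [0.0, 1.0]])
xy = np.array([[5, 5], [50, 50], [6, 6], [80, 80]])  # descriptor locations

words = quantize(descriptors, vocabulary)
feature = layout_histogram(words, xy, head_box=(0, 0, 10, 10), vocab_size=3)
print(feature)
```

The resulting vector has one histogram per region, so its length grows with the number of regions in the layout. In this sketch a segmentation mask could replace the box test to add a body region in the same way.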
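
The difference between the hierarchical and flat ways of combining head detector scores with appearance features can be sketched as follows. This is a toy illustration under stated assumptions, not the poster's trained system: the linear weights are hand-set rather than learned by an SVM, there are only two cat breeds and two dog breeds, and every name (`predict_hierarchical`, `predict_flat`, `breeds_of`, `species_of`) is hypothetical.

```python
import numpy as np


def predict_hierarchical(head_scores, appearance, breed_w, breed_b, breeds_of):
    """Two stages: species from head detector scores, then breed from appearance.

    Breed scores are linear in the appearance features and are only compared
    among the breeds of the species chosen in the first stage.
    """
    species = int(np.argmax(head_scores))  # 0 = cat, 1 = dog
    candidates = breeds_of[species]
    scores = breed_w[candidates] @ appearance + breed_b[candidates]
    return species, int(candidates[np.argmax(scores)])


def predict_flat(head_scores, appearance, joint_w, joint_b, species_of):
    """One stage: concatenate both feature types and score all breeds jointly."""
    x = np.concatenate([head_scores, appearance])
    breed = int(np.argmax(joint_w @ x + joint_b))
    return int(species_of[breed]), breed


# Toy setup (assumption): breeds 0-1 are cats, breeds 2-3 are dogs.
breeds_of = {0: np.array([0, 1]), 1: np.array([2, 3])}
species_of = np.array([0, 0, 1, 1])

# Hand-set linear models standing in for trained linear SVMs.
breed_w = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0],
                    [1.0, 1.0, 0.0]])
breed_b = np.zeros(4)
head_w = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
joint_w = np.hstack([head_w, breed_w])  # weights over [head scores, appearance]
joint_b = np.zeros(4)

head_scores = np.array([0.2, 0.9])      # cat detector score, dog detector score
appearance = np.array([0.1, 0.8, 0.3])  # e.g. a tiny spatial histogram

hier = predict_hierarchical(head_scores, appearance, breed_w, breed_b, breeds_of)
flat = predict_flat(head_scores, appearance, joint_w, joint_b, species_of)
print(hier, flat)
```

In the hierarchical variant a wrong species decision cannot be corrected by the breed stage, whereas the flat variant lets appearance evidence influence the species implicitly. That may be one reason for the poster's finding that flat classification is more accurate, although the sketch itself does not demonstrate it.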