Unsupervised Learning of Object Landmarks through Conditional Image Generation

Tomas Jakab* (1), Ankush Gupta* (1), Hakan Bilen (2), Andrea Vedaldi (1,3)
*equal contribution
(1) Visual Geometry Group (VGG), University of Oxford
(2) University of Edinburgh
(3) Facebook AI Research, London

http://www.robots.ox.ac.uk/~vgg/research/unsupervised_landmarks/

1. OVERVIEW

“Unsupervised discovery of semantically stable landmarks for visual objects”

CONTRIBUTIONS
§ Object landmark discovery without manual annotations. Outperform state-of-the-art facial landmark detection methods using a simple method.
§ Learn from synthetically warped images / videos directly. Applicable to a variety of datasets without modification: faces, humans, 3D objects, digits.
§ Method factorizes object appearance and geometry; transfer style / pose.

2. METHOD

DISTILLING GEOMETRY
“SUBTRACT” pairs of images which share appearance, but differ in object pose / geometry.

§ Videos: Frames from a video of an object.
§ Synthetically Warped Images: Thin-plate spline warped versions of a single image.

[Figure: training input / output. Labels: source, target, reconstruction, landmarks, (content loss).]

3. RESULTS

HUMAN FACES
[Figure: unsupervised landmarks (N = 10); regressed landmarks (linear regression). AFLW Dataset (train: synthetic warps).]
[Figure: VOXCELEB Dataset (train: video frames); unsupervised landmarks (N = 20).]
[Plot: MAFL facial landmark detection. Axis: IoD normalised %-MSE (0 to 9). Supervised methods: TCDCN, Zhang [2016]; MTCNN, Zhang [2013]. Unsupervised methods: Zhang [2018] (w/o equiv.); Thewlis [2017]; Thewlis [2017] frames; Shu [2018]; Wiles [2018]; Zhang [2018] (w/ equiv.); Ours (30 kpts); Ours (50 kpts).]
[Plot: sample efficiency for supervised regression. Axis: n supervised examples (1, 5, 10, 100, 500, 1000, 5000, 19000); error axis 0 to 18. Curves: Thewlis [2017]; Ours.]
[Plot: replace keypoint bottleneck with FC-layer. Axis 0 to 35. Bars: d = 60; d = 20; d = 10; Ours (K = 30).]

HUMAN POSE
[Figure: BBCPose Dataset; unsupervised landmarks, regressed landmarks.]
[Figure: Human3.6M Dataset; unsupervised landmarks.]

3D OBJECTS
[Figure: SmallNORB Dataset; azimuth, elevation, lighting, shape / instance.]

4. DISENTANGLING STYLE & GEOMETRY

[Figure labels: different style; “source”; geometry; “target”; output; style; geometry; reconstruction.]

Financial support was provided by the UK EPSRC CDT in Autonomous Intelligent Machines and Systems Grant EP/L015987/2, EPSRC Programme Grant Seebibyte EP/M013774/1, ERC 677195-IDIU, and the Clarendon Fund scholarship.
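
The evaluation named in the results panels (linear regression from unsupervised landmarks to annotated landmarks, scored by an IoD-normalised error) can be sketched roughly as follows. This is a minimal illustration, not the authors' code. The function names, the bias term, and the eye indices (0 and 1) are assumptions, as is the exact metric: the poster labels it "%-MSE" without giving a formula, so the sketch uses the mean point-to-point distance divided by the inter-ocular distance (IoD), in percent. The data is synthetic and only checks that the pieces fit together.

```python
import numpy as np

def _design_matrix(unsup):
    """Flatten (n, K, 2) landmarks to (n, 2K) and append a bias column."""
    n = unsup.shape[0]
    return np.hstack([unsup.reshape(n, -1), np.ones((n, 1))])

def fit_linear_regressor(unsup, annotated):
    """Least-squares linear map from unsupervised to annotated landmarks.

    unsup:     (n, K, 2) unsupervised landmark coordinates
    annotated: (n, M, 2) manually annotated landmark coordinates
    Returns a weight matrix of shape (2K + 1, 2M).
    """
    X = _design_matrix(unsup)
    Y = annotated.reshape(annotated.shape[0], -1)
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def predict(W, unsup):
    """Apply the fitted map; returns (n, M, 2) regressed landmarks."""
    return (_design_matrix(unsup) @ W).reshape(unsup.shape[0], -1, 2)

def iod_normalised_error(pred, target, left_eye=0, right_eye=1):
    """Mean point-to-point error as a percentage of inter-ocular distance.

    left_eye / right_eye are assumed indices into the annotated landmarks.
    """
    iod = np.linalg.norm(target[:, left_eye] - target[:, right_eye], axis=-1)
    per_image = np.linalg.norm(pred - target, axis=-1).mean(axis=1)
    return 100.0 * np.mean(per_image / iod)

# Synthetic check: annotated landmarks are an exact affine function of the
# unsupervised ones, so the regressor should recover them almost perfectly.
rng = np.random.default_rng(0)
unsup = rng.normal(size=(200, 10, 2))            # K = 10 unsupervised landmarks
A = rng.normal(size=(20, 10))                    # hypothetical ground-truth map
annotated = (unsup.reshape(200, -1) @ A + 0.5).reshape(200, 5, 2)  # M = 5

W = fit_linear_regressor(unsup[:150], annotated[:150])
pred = predict(W, unsup[150:])
error = iod_normalised_error(pred, annotated[150:])
```

Because the regressor is linear and fitted on a small labelled set, a low error here would indicate that the unsupervised landmarks already carry the geometric information; the regressor itself adds little capacity.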