Message from the General and Program Chairs
Welcome to Santa Rosa, CA, and the 17th edition of the Winter Conference on Applications of Computer Vision (WACV), jointly sponsored by the IEEE Computer Society and the IEEE Biometrics Council. WACV is the premier outlet for research advances in applications of computer vision technology.
WACV 2017 spans four days, with a three-day, two-track core program in which authors present each accepted paper as a short oral and a poster. In addition, we have keynote talks and social functions, as well as several co-located events, including three workshops, two tutorials, a Ph.D. forum, and demo sessions. As at last year’s conference, the core program runs two parallel oral sessions of 5-minute talks.
We used the Conference Management Toolkit (CMT) provided by Microsoft Research to manage the submission and selection of papers. To select papers for the program, we invited 27 researchers to act as Area Chairs (ACs) and recruited 275 experienced reviewers from the broader computer vision community. We received 320 original, unpublished full-paper submissions to the main conference. The Program Chairs (PCs) assigned the papers to the ACs, who recommended reviewers. Every paper was reviewed by at least three reviewers. Papers authored by PCs and General Chairs (GCs) were handled so as to avoid conflicts of interest, and ACs were excluded from any decisions on papers from their research groups, affiliated institutions, or collaborators. After the reviews were received, authors were offered an opportunity to submit a rebuttal. The ACs made initial recommendations based on the reviews, rebuttals, and reviewer discussions. In a few cases, the PCs discussed papers with the ACs to arrive at a final decision. Of the 320 full papers submitted, 144 high-quality papers were accepted into the final program (~45% acceptance rate).
The proceedings of WACV 2017 are provided online before, during, and after the conference to all registered attendees. Like last year, there will not be USB proceedings, so participants are encouraged to download the proceedings before the conference. All papers in the main conference and associated workshops will be made available through the IEEE Computer Society Digital Library and IEEE Xplore.
The main conference also includes three keynote speakers: Dr. Richard Szeliski from Facebook & Univ. of Washington, Prof. Marc Pollefeys from Microsoft Research & ETH Zurich, and Prof. Tamara Berg from Shopagon Inc. & UNC-Chapel Hill.
We wish to thank all members of the Organizing Committee, the Area Chairs, reviewers, authors, and the CMT for the immense amount of hard work and professionalism that went into making WACV 2017 a first-rate conference on the applications of computer vision. Our thanks also go to the organizers of past WACV meetings and the steering committee for their helpful advice and support.
We are grateful to our Silver Sponsors, Cognex and Kitware, and our Bronze Sponsors, Adobe, Disney Research, Amazon, Verisk Analytics, and Google, for their generous support.
Finally, we invite the attendees to be Sonomads for a few days and enjoy Sonoma County’s art, wine, and coffee.
Gérard Medioni, David Michael, Sudeep Sarkar (General Co-Chairs)
Michael S. Brown, Rogério Feris, Conrad Sanderson, Matthew Turk (Program Co-Chairs)
Organizing Committee & Area Chairs
WACV 2017 Organizing Committee
General Chairs: Gérard Medioni, Sudeep Sarkar, David Michael
Program Chairs: Michael S. Brown, Conrad Sanderson, Matthew Turk, Rogério Feris
Steering Committee: Anthony Hoogs, Bryan Morse, Terrance Boult, Bir Bhanu, Fatih Porikli
Workshops Chair: Jiwen Lu
Tutorials Chair: Xiaoming Liu
Finance Chair: Terrance Boult
Publications Chairs: Eric Mortensen, Revathy Narasimhan
Web Chair: Fillipe Souza
Demos Chair: Tal Hassner
PhD Forum Chair: Song Wang
Publicity Chair: Ajay Kumar
WACV 2017 Area Chairs
Teofilo de Campos, Liangliang Cao, Peter Carr, Kristin Dana, Victor Fragoso, Danna Gurari,
Bohyung Han, Mehrtash Harandi, Tal Hassner, Wong Yong Kang, Seon Joo Kim, Adriana Kovashka,
Laura Leal-Taixé, Mohammad Mahoor, Scott McCloskey, Chris McCool, Vlad Morariu, Fatih Porikli,
Andrea Prati, Brian Price, Behjat Siddiquie, Kevin Smith, Matt Turek, Xiaoyu Wang,
Arnold Wiliem, Guoying Zhao, Wenyi Zhao
Monday, March 27 Program
Sunday, March 26
1900–2100 Registration (Alexander Valley Foyer)
Monday, March 27
0800–1100 Registration (Alexander Valley Foyer)
0850–0900 Welcome by the General Chairs (Dry Creek Valley & Russian River Valley)
0900–1000 Oral 1A: Segmentation, Tracking (Dry Creek Valley)
Chair: Larry Davis (Univ. of Maryland)
Format (5 min. short presentation)
1. Deep Salient Object Detection by Integrating Multi-Level
Cues, Jing Zhang, Yuchao Dai, Fatih Porikli
2. Multi-Planar Fitting in an Indoor Manhattan World,
Seongdo Kim, Roberto Manduchi
3. Universal Skin Detection Without Color Information,
Abhijit Sarkar, Amos Lynn Abbott, Zachary Doerzaph
4. Recurrent Fully Convolutional Networks for Video
Segmentation, Sepehr Valipour, Mennatullah Siam, Martin
Jagersand, Nilanjan Ray
5. Learning Spatial Transforms for Refining Object Segment
Proposals, Haoyang Zhang, Xuming He, Fatih Porikli
6. Repeated Pattern Detection Using CNN Activations, Louis
Lettry, Michal Perdoch, Kenneth Vanhoey, Luc Van Gool
7. Deep Context Modeling for Semantic Segmentation, Kien
Thanh, Clinton Fookes, Sridha Sridharan
8. 3D Semantic Segmentation of Modular Furniture Using
rjMCMC, Ishrat Badami, Manu Tom, Markus Mathias,
Bastian Leibe
9. PASCAL Boundaries: A Semantic Boundary Dataset With a
Deep Semantic Boundary Detector, Vittal Premachandran,
Boyan Bonev, Xiaochen Lian, Alan Yuille
10. Can Affordances Guide Object Decomposition Into
Semantically Meaningful Parts?, Safoura Rezapour Lakani,
Antonio J. Rodriguez-Sanchez, Justus Piater
11. Solving Occlusion Problem in Pedestrian Detection by
Constructing Discriminative Part Layers, Cong Cao, Yu
Wang, Jien Kato, Guanwen Zhang, Kenji Mase
12. Unifying Registration Based Tracking: A Case Study With
Structural Similarity, Abhineet Singh, Mennatullah Siam,
Martin Jagersand
0900–1000 Oral 1B: Action Recognition (Russian River Valley)
Chair: François Brémond (INRIA Sophia Antipolis)
Format (5 min. short presentation)
1. Deep Moving Poselets for Video Based Action
Recognition, Effrosyni Mavroudi, Lingling Tao, René Vidal
2. First-Person Action Decomposition and Zero-Shot
Learning, Yun C. Zhang, Yin Li, James M. Rehg
3. Higher-Order Pooling of CNN Features via Kernel
Linearization for Action Recognition, Anoop Cherian, Piotr
Koniusz, Stephen Gould
4. Semi-Coupled Two-Stream Fusion ConvNets for Action
Recognition at Extremely Low Resolutions, Jiawei Chen,
Jonathan Wu, Janusz Konrad, Prakash Ishwar
5. On Geometric Features for Skeleton-Based Action
Recognition Using Multilayer LSTM Networks, Songyang
Zhang, Xiaoming Liu, Jun Xiao
6. Real-Time Online Action Detection Forests Using Spatio-
Temporal Contexts, Seungryul Baek, Kwang In Kim, Tae-
Kyun Kim
7. Ordered Pooling of Optical Flow Sequences for Action
Recognition, Jue Wang, Anoop Cherian, Fatih Porikli
8. Two Stream LSTM: A Deep Fusion Framework for Human
Action Recognition, Harshala Gammulle, Simon Denman,
Sridha Sridharan, Clinton Fookes
9. Multi-Camera Action Dataset for Cross-Camera Action
Recognition Benchmarking, Wenhui Li, Yongkang Wong,
An-An Liu, Yang Li, Yu-Ting Su, Mohan Kankanhalli
10. Efficient Action Detection in Untrimmed Videos via Multi-
Task Learning, Yi Zhu, Shawn Newsam
11. Learning Discriminative Features via Label Consistent
Neural Network, Zhuolin Jiang, Yaming Wang, Larry Davis,
Walter Andrews, Viktor Rozgic
12. Recognition of Group Activities in Videos Based on Single-
and Two-Person Descriptors, Stéphane Lathuilière,
Georgios Evangelidis, Radu Horaud
1000–1045 Morning Break (Alexander Valley)
1045–1150 Oral 2A: Computational Photography, 3D Modeling, Remote Sensing, Gesture (Dry Creek Valley)
Chair: Larry Davis (Univ. of Maryland)
Format (5 min. short presentation)
1. Quantitative Analysis of Automatic Image Cropping
Algorithms: A Dataset and Comparative Study, Yi-Ling
Chen, Tzu-Wei Huang, Kai-Han Chang, Yu-Chen Tsai,
Hwann-Tzong Chen, Bing-Yu Chen
2. Joint Regression and Ranking for Image Enhancement,
Parag Shridhar Chandakkar, Baoxin Li
3. Material Classification Under Natural Illumination Using
Reflectance Maps, Stamatios Georgoulis, Vincent
Vanweddingen, Marc Proesmans, Luc Van Gool
4. Dense Batch Non-Rigid Structure From Motion in a
Second, Vladislav Golyanik, Didier Stricker
5. Global Model With Local Interpretation for Dynamic Shape
Reconstruction, Antonio Agudo, Francesc Moreno-Noguer
6. Occlusions Are Fleeting - Texture Is Forever: Moving Past
Brightness Constancy, Christopher Ham, Surya Singh,
Simon Lucey
7. Accurate 3D Reconstruction of Dynamic Scenes From
Monocular Image Sequences With Severe Occlusions,
Vladislav Golyanik, Torben Fetzer, Didier Stricker
8. Patchwork Stereo: Scalable, Structure-Aware 3D
Reconstruction in Man-Made Environments, Amine Bourki,
Martin de La Gorce, Renaud Marlet, Nikos Komodakis
9. Calibration Technique for Underwater Active Oneshot
Scanning System With Static Pattern Projector and
Multiple Cameras, Hiroshi Kawasaki, Hideaki Nakai,
Hirohisa Baba, Ryusuke Sagawa, Ryo Furukawa
10. Fast Deep Vehicle Detection in Aerial Images, Lars Wilko
Sommer, Tobias Schuchert, Jürgen Beyerer
11. Beyond Spatial Auto-Regressive Models: Predicting
Housing Prices With Satellite Imagery, Archith J. Bency,
Swati Rallapalli, Raghu K. Ganti, Mudhakar Srivatsa, B. S.
Manjunath
12. Robust Hand Gestural Interaction for Smartphone Based
AR/VR Applications, Shreyash Mohatta, Ramakrishna
Perla, Gaurav Gupta, Ehtesham Hassan, Ramya
Hebbalaguppe
13. Spatial-Temporal Motion Field Analysis for Pixelwise Crack
Detection on Concrete Surfaces, Subhajit Chaudhury, Gaku
Nakano, Jun Takada, Akihiko Iketani
1045–1150 Oral 2B: Scene Understanding, Motion Processing (Russian River Valley)
Chair: François Brémond (INRIA Sophia Antipolis)
Format (5 min. short presentation)
1. 2-Line Exhaustive Searching for Real-Time Vanishing Point
Estimation in Manhattan World, Xiaohu Lu, Jian Yao,
Haoang Li, Yahui Liu, Xiaofeng Zhang
2. Pano2CAD: Room Layout From a Single Panorama Image,
Jiu Xu, Björn Stenger, Tommi Kerola, Tony Tung
3. A Multi-View RGB-D Approach for Human Pose Estimation
in Operating Rooms, Abdolrahim Kadkhodamohammadi,
Afshin Gangi, Michel de Mathelin, Nicolas Padoy
4. Real Estate Image Classification, Jawadul Hasan Bappy,
Joseph R. Barr, Nani Narayanan Srinivasan, Amit K. Roy-
Chowdhury
5. Learn How to Choose: Independent Detectors Versus
Composite Visual Phrase, Guy Rosenthal, Ariel Shamir,
Leonid Sigal
6. Temporal Robust Features for Violence Detection, Daniel
Moreira, Sandra Avila, Mauricio Perez, Daniel Moraes,
Vanessa Testoni, Eduardo Valle, Siome Goldenstein,
Anderson Rocha
7. SAMP: Shape and Motion Priors for 4D Vehicle
Reconstruction, Francis Engelmann, Jörg Stückler, Bastian
Leibe
8. Predicting the Perceptual Demands of Urban Driving With
Video Regression, Luke Palmer, Alina Bialkowski, Gabriel J.
Brostow, Jonas Ambeck-Madsen, Nilli Lavie
9. Optimal Threshold and LoG Based Feature Identification
and Tracking of Bat Flapping Flight, Yousi Lin, Yang Xu, Hui
Chen, Matthew J. Bender, Amos Lynn Abbott, Rolf Müller
10. Fast Semi Dense Epipolar Flow Estimation, Matthieu
Garrigues, Antoine Manzanera
11. Global Consistency Priors for Joint Part-Based Object
Tracking and Image Segmentation, Oliver Müller, Bodo
Rosenhahn
12. Joint Epipolar Tracking (JET): Simultaneous Optimization
of Epipolar Geometry and Feature Correspondences,
Henry Bradler, Matthias Ochs, Nolang Fanani, Rudolf
Mester
13. Computing Egomotion With Local Loop Closures for
Egocentric Videos, Suvam Patra, Himanshu Aggarwal,
Himani Arora, Subhashis Banerjee, Chetan Arora
1150–1300 Lunch (On your own)
1230–1730 Registration (Alexander Valley Foyer)
1300–1500 Tutorial (details at the end of the Monday program)
1700–1730 Afternoon Break (Alexander Valley)
1730–1830 Keynote Session (Dry Creek Valley)
Keynote Talk: 3D Reconstruction for Image-Based Rendering, Richard Szeliski (Facebook & Univ. of Washington)
Abstract: The reconstruction of 3D scenes and their appearance from imagery is one of the longest-standing problems in computer vision. Originally developed to support robotics and artificial intelligence applications, it has found some of its most widespread use in the support of interactive 3D scene visualization. One of the keys to this success has been the melding of 3D geometric and photometric reconstruction with a heavy re-use of the original imagery, which produces more realistic rendering than a pure 3D model-driven approach. In this talk, I give a retrospective of two decades of research in this area, touching on topics such as sparse and dense 3D reconstruction, the fundamental concepts in image-based rendering and computational photography, applications to virtual reality, as well as ongoing research in the areas of layered decompositions and 3D-enabled video stabilization.
1830–1930 Dinner (Alexander Valley)
1930–2130 Demos (Alexander Valley)
Instant Immersion of Brands in Videos, Brunno Attorre, Bill Marino (Uru, Inc.)
Visual Intelligence for Fashion, Jayaguru Panda, Naveen Sinha, Labhesh Patel (Abzooba India InfoTech)
Regression of 3D Morphable Face Models Using a Deep CNN, Anh Tuan Tran (Univ. of Southern California)
1930–2130 Exhibits (Alexander Valley)
Amazon • Zillow
1930–2130 Poster Session 1 (Alexander Valley)
Posters for Oral Sessions 1A, 1B, 2A, and 2B.
1930–2130 PhD Forum 1 (Alexander Valley)
Ph.D. Forum Presenters:
1. Unaiza Ahsan
2. Daniel Hernández
3. Yanyang Gu
4. Arun CS Kumar
5. Julius Schöning
6. Tomas Hodan
7. Chi-Hao Wu
8. Jiaping Zhao
Tutorial: Local 3D Vision, Multiview Geometry, Video Tracking and Visual Servoing for Robot Arm and Hand Motion Control
Organizers: Martin Jagersand, Mona Gridseth, Abhineet Singh, Mennatullah Siam, Camilo Perez, Oscar Ramirez
Time: 1300–1500
Location: Russian River Valley
Description: Robot vision is significantly different from general computer vision. While general vision often aims to reconstruct the whole 3D model or identify all objects, guiding robot motion towards a target typically requires tracking only one or a few specific features. Research on human hand-eye coordination shows that when solving arm and hand manipulation tasks we acquire minimal representations of very specific information rather than a global scene model. For robot vision, minimal but provably sufficient representations can be built from individual projective geometry constraints. Computationally, these constraints are formulated on tracked geometric features (points, lines, regions, etc.) and solved through visual servoing. The tutorial covers the uncalibrated formulation of multi-view geometry, video tracking, and visual servoing, and shows how to use several ROS software packages to design robot vision systems.
Tuesday, March 28 Program
Tuesday, March 28
0800–1100 Registration (Alexander Valley Foyer)
0850–0900 Announcements (Dry Creek Valley & Russian River Valley)
0900–1000 Oral 3A: Statistical Methods, Object Recognition (Dry Creek Valley)
Chair: Scott McCloskey (Honeywell)
Format (5 min. short presentation)
1. Cyclical Learning Rates for Training Neural Networks,
Leslie N. Smith
2. Guaranteed Parameter Estimation for Discrete Energy
Minimization, Mengtian Li, Daniel Huber
3. Solving Robust Regularization Problems Using Iteratively
Re-Weighted Least Squares, Khurrum Aftab Kiani, Tom
Drummond
4. Detecting Social Insects in Videos With Spatiotemporal
Regularization, N. Rich Nguyen, Min C. Shin
5. From Affine Rank Minimization Solution to Sparse
Modeling, Iman Abbasnejad, Sridha Sridharan, Simon
Denman, Clinton Fookes, Simon Lucey
6. Learning Attributes From Human Gaze, Nils Murrugarra-
Llerena, Adriana Kovashka
7. Multi-Task Curriculum Transfer Deep Learning of Clothing
Attributes, Qi Dong, Shaogang Gong, Xiatian Zhu
8. Deep Learning Logo Detection With Data Expansion by
Synthesising Context, Hang Su, Xiatian Zhu, Shaogang
Gong
9. Boosted Convolutional Neural Networks (BCNN) for
Pedestrian Detection, Chi-Hao Wu, Weihao Gan, De Lan,
C.-C. Jay Kuo
10. Improved Deep Learning of Object Category Using Pose
Information, Jiaping Zhao, Laurent Itti
11. Learning to Recognize Objects by Retaining Other Factors
of Variation, Jiaping Zhao, Chin-kai Chang, Laurent Itti
12. Artistic Movement Recognition by Boosted Fusion of Color
Structure and Topographic Description, Corneliu Florea,
Cosmin Ţoca, Fabian Gieske
0900–1000 Oral 3B: Security, Vision for Aerial, Multimedia (Russian River Valley)
Chair: Nicolas Padoy (Univ. of Strasbourg)
Format (5 min. short presentation)
1. Plug-And-Play CNN for Crowd Motion Analysis: An
Application to Abnormal Event Detection, Mahdyar
Ravanbakhsh, Moin Nabi, Hossein Mousavi, Enver
Sangineto, Nicu Sebe
2. Deep Heterogeneous Feature Fusion for Template-Based
Face Recognition, Navaneeth Bodla, Jingxiao Zheng,
Hongyu Xu, Jun-Cheng Chen, Carlos Castillo, Rama
Chellappa
3. Integrated Global-Local Metric Learning for Person Re-
Identification, Jing Zhang, Xu Zhao
4. Multi-Shot Person Re-Identification Using Part
Appearance Mixture, Furqan M. Khan, François Brémond
5. Active Online Anomaly Detection Using Dirichlet Process
Mixture Model and Gaussian Process Classification,
Jagannadan Varadarajan, Ramanathan Subramanian,
Narendra Ahuja, Pierre Moulin, Jean-Marc Odobez
6. Flowdometry: An Optical Flow and Deep Learning Based
Approach to Visual Odometry, Peter Muller, Andreas
Savakis
7. PCA Based Computation of Illumination-Invariant Space
for Road Detection, Taeyoung Kim, Yu-Wing Tai, Sung-Eui
Yoon
8. Road Detection Using Convolutional Neural Networks,
Aparajit Narayan, Elio Tuci, Frédéric Labrosse, Muhanad H.
Mohammed Alkilabi
9. Providing Video Annotations in Multimedia Containers for
Visualization and Research, Julius Schöning, Patrick Faion,
Gunther Heidemann, Ulf Krumnack
10. Detecting Sexually Provocative Images, Debashis Ganguly,
Mohammad Hasanzadeh Mofrad, Adriana Kovashka
11. Complex Event Recognition From Images With Few
Training Examples, Unaiza Ahsan, Chen Sun, James Hays,
Irfan Essa
12. High Level Concepts for Affective Understanding of
Images, Afsheen Rafaqat Ali, Usman Shahid, Mohsen Ali,
Jeffrey Ho
1000–1045 Morning Break (Alexander Valley)
1045–1145 Oral 4A: Vision Systems (Dry Creek Valley)
Chair: Scott McCloskey (Honeywell)
Format (5 min. short presentation)
1. Assessment of Peanut Pod Maturity, Ekta Bindlish, Amos
Lynn Abbott, Maria Balota
2. X-Ray Scattering Image Classification Using Deep
Learning, Boyu Wang, Kevin Yager, Dantong Yu, Minh
Nguyen
3. A Deep Learning Frame-Work for Recognizing
Developmental Disorders, Pushkar Shukla, Tanu Gupta,
Aradhya Saini, Priyanka Singh, Raman Balasubramanian
4. When Was That Made?, Sirion Vittayakorn, Alexander C.
Berg, Tamara L. Berg
5. Telecom Inventory Management via Object Recognition
and Localisation on Google Street View Images, Ramya
Hebbalaguppe, Gaurav Garg, Ehtesham Hassan, Hiranmay
Ghosh, Ankit Verma
6. Deep Object Ranking for Template Matching, Jean-
Philippe Mercier, Ludovic Trottier, Philippe Giguère, Brahim
Chaib-draa
7. A Deep Learning Paradigm for Detection of Harmful Algal
Blooms, Arun CS Kumar, Suchendra M. Bhandarkar
8. Crime Mapping From Satellite Imagery via Deep Learning,
Alameen Najjar, Shun’ichi Kaneko, Yoshikazu Miyanaga
9. Robust Road Marking Detection and Recognition Using
Density-Based Grouping and Machine Learning
Techniques, Oleksandr Bailo, Seokju Lee, Francois Rameau,
Jae Shin Yoon, In So Kweon
10. Beacon-Guided Structure From Motion for Smartphone-
Based Navigation, Tatsuya Ishihara, Jayakorn
Vongkulbhisal, Kris M. Kitani, Chieko Asakawa
11. Hardware-Centric Vision Processing for Mobile IoT
Environment Exploiting Approximate Graph Cut in
Resistor Grid, Yeongjae Choi, Jun-Seok Park, Lee-Sup Kim
12. Exploring Local Context for Multi-Target Tracking in Wide
Area Aerial Surveillance, Bor-Jeng Chen, Gérard Medioni
1045–1145 Oral 4B: Medical, Vision for Graphics & Robotics, Open Source API (Russian River Valley)
Chair: Nicolas Padoy (Univ. of Strasbourg)
Format (5 min. short presentation)
1. Melanoma Detection Based on Mahalanobis Distance
Learning and Constrained Graph Regularized Nonnegative
Matrix Factorization, Yanyang Gu, Jun Zhou, Bin Qian
2. Size and Texture-Based Classification of Lung Tumors
With 3D CNNs, ZhiHao Luo, Marcus A. Brubaker, Michael
Brudno
3. 3D-Brain Segmentation Using Deep Neural Network and
Gaussian Mixture Model, Duy M. Nguyen, Huy T. Vu, Huy
Q. Ung, Binh T. Nguyen
4. Ultrasound Tracking Using ProbeSight: Camera Pose
Estimation Relative to External Anatomy by Inverse
Rendering of a Prior High-Resolution 3D Surface Map,
Jihang Wang, Chengqian Che, John Galeotti, Samantha
Horvath, Vijay Gorantla, George Stetten
5. Center-Focusing Multi-Task CNN With Injected Features
for Classification of Glioma Nuclear Images, Veda Murthy,
Le Hou, Dimitris Samaras, Tahsin M. Kurc, Joel H. Saltz
6. Densification of Semi-Dense Reconstructions for Novel
View Generation of Live Scenes, Domagoj Baričević, Tobias
Höllerer, Matthew Turk
7. Texture Attribute Synthesis and Transfer Using Feed-
Forward CNNs, Thomas Irmer, Tobias Glasmachers,
Subhransu Maji
8. A Statistical Approach to Continuous Self-Calibrating Eye
Gaze Tracking for Head-Mounted Virtual Reality Systems,
Subarna Tripathi, Brian Guenter
9. Sparse Dictionary Learning for Identifying Grasp
Locations, Ludovic Trottier, Philippe Giguère, Brahim Chaib-
draa
10. T-LESS: An RGB-D Dataset for 6D Pose Estimation of
Texture-Less Objects, Tomáš Hodaň, Pavel Haluza, Štěpán
Obdržálek, Jiri Matas, Manolis Lourakis, Xenophon Zabulis
11. Gaussian Mixture Models for Temporal Depth Fusion,
Cevahir Cigla, Roland Brockers, Larry Matthies
12. An Open-Source Platform for Underwater Image and
Video Analytics, Matthew Dawkins, Linus Sherrill, Keith
Fieldhouse, Anthony Hoogs, Benjamin Richards, David
Zhang, Lakshman Prasad, Kresimir Williams, Nathan
Lauffenburger, Gaoang Wang
1145–1300 Lunch (On your own)
1230–1730 Registration (Alexander Valley Foyer)
1300–1700 Tutorial (details at the end of the Tuesday program)
1700–1730 Afternoon Break (Alexander Valley)
1730–1830 Keynote Session (Dry Creek Valley)
Keynote Talk: Computer Vision for Mixed Reality, Marc Pollefeys (Microsoft Research & ETH Zurich)
Abstract: This is a golden age for computer vision. Research breakthroughs are leaving the lab and getting into users’ hands in record time. Computer vision now plays a pivotal role in many advances benefitting society, such as autonomous vehicles, improved biometric security, and medical imaging. But out of all these innovations, one stands out as having the potential to completely upend how we access information and communicate with each other: mixed reality. Spurred by recent developments in SLAM, 3D reconstruction, gesture recognition, scene understanding, and power-efficient embedded computing, we’re already experiencing it in the form of groundbreaking products like Microsoft HoloLens. In this talk I will present some of the key computer vision components that are essential for enabling compelling mixed reality experiences on HoloLens and also discuss some of the unique features that HoloLens offers as an experimental platform for computer vision researchers.
1830–1845 Best Paper Awards (Dry Creek Valley)
1845–1930 Dinner (Alexander Valley)
1930–2130 Demos (Alexander Valley)
Instant Immersion of Brands in Videos, Brunno Attorre, Bill Marino (Uru, Inc.)
Visual Intelligence for Fashion, Jayaguru Panda, Naveen Sinha, Labhesh Patel (Abzooba India InfoTech)
Regression of 3D Morphable Face Models Using a Deep CNN, Anh Tuan Tran (Univ. of Southern California)
1930–2130 Exhibits (Alexander Valley)
Amazon • Zillow
1930–2130 Poster Session 2 (Alexander Valley)
Posters for Oral Sessions 3A, 3B, 4A, and 4B.
1930–2130 PhD Forum 2 (Alexander Valley)
Ph.D. Forum Presenters:
1. Haoang Li
2. Yi Zhu
3. Archith John Bency
4. Nguyen Van Dinh
5. Shagan Sah
6. Jiawei Chen
7. Abhijit Sarkar
Tutorial: Understanding the In-Camera Image Processing Pipeline for Computer Vision
Organizer: Michael S. Brown
Time: 1300–1700
Location: Russian River Valley
Description: Image processing and computer vision algorithms often treat a camera as a light-measurement device, where pixel intensities represent meaningful physical measurements of the imaged scene. However, modern digital cameras are anything but light-measuring devices, with a wide range of on-board processing, including scene relighting (dynamic light optimization), white balance, and various color rendering options (e.g., landscape, portrait, vivid). This on-board processing is often how camera manufacturers distinguish themselves from competitors, resulting in two different cameras producing noticeably different output images (sRGB) for the same scene. This raises the question of whether meaningful values can be obtained from camera objects. This tutorial gives an overview of the basics of color theory and the camera imaging pipeline, and discusses various methods for reversing this processing to obtain meaningful physical values from digital photographs.
Wednesday, March 29 Program
Wednesday, March 29
0800–1100 Registration (Alexander Valley Foyer)
0850–0900 Announcements (Dry Creek Valley & Russian River Valley)
0900–1000 Oral 5A: Object Recognition 2, Large Scale Systems (Dry Creek Valley)
Chair: Terry Boult (Univ. of Colorado Colorado Springs)
Format (5 min. short presentation)
1. Describing Unseen Classes by Exemplars: Zero-Shot
Learning Using Grouped Simile Ensemble, Yang Long, Ling
Shao
2. Deep Multi-Modal Vehicle Detection in Aerial ISR Imagery,
Wesam Sakla, Goran Konjevod, T. Nathan Mundhenk
3. Subcategory-Aware Convolutional Neural Networks for
Object Proposals and Detection, Yu Xiang, Wongun Choi,
Yuanqing Lin, Silvio Savarese
4. StuffNet: Using ‘Stuff’ to Improve Object Detection,
Samarth Manoj Brahmbhatt, Henrik I. Christensen, James
Hays
5. Towards Fine-Grained Open Zero-Shot Learning: Inferring
Unseen Visual Features From Attributes, Yang Long, Li Liu,
Ling Shao
6. Fused DNN: A Deep Neural Network Fusion Approach to
Fast and Robust Pedestrian Detection, Xianzhi Du, Mostafa
El-Khamy, Jungwon Lee, Larry Davis
7. Fast Pedestrian Detection via Random Projection Features
With Shape Prior, Yun Zhao, Zejian Yuan, Dapeng Chen, Jie
Lyu, Tie Liu
8. Enriched Deep Recurrent Visual Attention Model for
Multiple Object Recognition, Artsiom Ablavatski, Shijian
Lu, Jianfei Cai
9. Box Refinement: Object Proposal Enhancement and
Pruning, Siyang Li, Heming Zhang, Junting Zhang, Yuzhuo
Ren, C.-C. Jay Kuo
10. Semantic Text Summarization of Long Videos, Shagan
Sah, Sourabh Kulhare, Allison Gray, Subhashini
Venugopalan, Emily Prud'hommeaux, Raymond Ptucha
11. Unsupervised Joint Mining of Deep Features and Image
Labels for Large-Scale Radiology Image Annotation and
Scene Recognition, Xiaosong Wang, Le Lu, Hoo-chang Shin,
Lauren Kim, Mohammadhadi Bagheri, Isabella Nogues,
Jianhua Yao, Ronald M. Summers
0900–0955 Oral 5B: Industrial Inspection, VR & AR, Stereo, Evaluation (Russian River Valley)
Chair: Tom Drummond (Monash Univ.)
Format (5 min. short presentation)
1. Probabilistic Surface Inference for Industrial Inspection
Planning, Mahsa Mohammadikaji, Stephan Bergmann,
Stephan Irgenfried, Jürgen Beyerer, Carsten Dachsbacher,
Heinz Wörn
2. Spatio-Temporal Anomaly Detection for Industrial Robots
Through Prediction in Unsupervised Feature Space, Asim
Munawar, Phongtharin Vinayavekhin, Giovanni De Magistris
3. Automatic Defect Recognition in X-Ray Testing Using
Computer Vision, Domingo Mery, Carlos Arteta
4. X-Ray PoseNet: 6 DoF Pose Estimation for Mobile X-Ray
Devices, Mai Bui, Shadi Albarqouni, Michael Schrapp, Nassir
Navab, Slobodan Ilic
5. Crack Segmentation by Leveraging Multiple Frames of
Varying Illumination, Stephen J. Schmugge, Lance Rice,
John Lindberg, Robert Grizzi, Chris Joffe, Min C. Shin
6. GPU-Accelerated Real-Time Stixel Computation, Daniel
Hernandez-Juarez, Antonio Espinosa, Juan Carlos Moure,
David Vázquez, Antonio López
7. Model-Driven Simulations for Computer Vision, VSR
Veeravasarapu, Constantin Rothkopf, Ramesh Visvanathan
8. Automatic Calibration of a Multiple-Projector Spherical
Fish Tank VR Display, Qian Zhou, Gregor Miller, Kai Wu,
Daniela Correa, Sidney Fels
9. Transfer Learning and Deep Feature Extraction for
Planktonic Image Data Sets, Eric C. Orenstein, Oscar
Beijbom
10. Fast and Robust Eyelid Outline and Aperture Detection in
Real-World Scenarios, Wolfgang Fuhl, Thiago Santini,
Enkelejda Kasneci
11. On Crater Verification Using Mislocalized Crater Regions,
Ebrahim Emami, George Bebis, Ara Nefian, Terry Fong
1000–1045 Morning Break (Alexander Valley)
1045–1145 Oral 6A: Face Processing, Biometrics, Image Compression, HCI (Dry Creek Valley)
Chair: Terry Boult (Univ. of Colorado Colorado Springs)
Format (5 min. short presentation)
1. Robust 3D Patch-Based Face Hallucination, Chengchao Qu,
Christian Herrmann, Eduardo Monari, Tobias Schuchert,
Jürgen Beyerer
2. Dictionary Alignment for Low-Resolution and
Heterogeneous Face Recognition, Sivaram Prasad
Mudunuri, Soma Biswas
3. Pose-Robust Face Verification by Exploiting Competing
Tasks, Boyu Lu, Jingxiao Zheng, Jun-Cheng Chen, Rama
Chellappa
4. Deep Feature Consistent Variational Autoencoder, Xianxu
Hou, Linlin Shen, Ke Sun, Guoping Qiu
5. Egocentric Height Estimation, Jessica Finocchiaro, Aisha
Urooj Khan, Ali Borji
6. Gender-From-Iris or Gender-From-Mascara?, Andrey
Kuehlkamp, Benedict Becker, Kevin Bowyer
7. ContlensNet: Robust Iris Contact Lens Detection Using
Deep Convolutional Neural Networks, Raghavendra
Ramachandra, Kiran B. Raja, Christoph Busch
8. Breathing Rate Monitoring During Sleep From a Depth
Camera Under Real-Life Conditions, Manuel Martinez,
Rainer Stiefelhagen
9. Writer Identification in Noisy Handwritten Documents,
Karl Ni, Patrick Callier, Bradley Hatch
10. Image Set Classification Using Sparse Bayesian
Regression, Mohammed E. Fathy, Rama Chellappa
11. Bandwidth Limited Object Recognition in High Resolution
Imagery, Laura Lopez-Fuentes, Andrew D. Bagdanov, Joost
van de Weijer, Harald Skinnemoen
12. Personalized Image Aesthetic Quality Assessment by Joint
Regression and Ranking, Kayoung Park, Seunghoon Hong,
Mooyeol Baek, Bohyung Han
1045–1140 Oral 6B: Human Motion, Image Indexing, Vision Systems (Russian River Valley)
Chair: Tom Drummond (Monash Univ.)
Format (5 min. short presentation)
1. Deep Spatio-Temporal Features for Multimodal Emotion
Recognition, Dung Nguyen, Kien Thanh, Sridha Sridharan,
Afsane Ghasemi, David Dean, Clinton Fookes
2. Human Pose Estimation Using Deep Structure Guided
Learning, Baole Ai, Yu Zhou, Yao Yu, Sidan Du
3. Switching Linear Inverse-Regression Model for Tracking
Head Pose, Vincent Drouard, Silèye Ba, Radu Horaud
4. Deep Image Set Hashing, Jie Feng, Svebor Karaman, Shih-
Fu Chang
5. Learning Effective Binary Descriptors via Cross Entropy,
Liu Liu, Hairong Qi
6. Convolutional Sparse and Low-Rank Coding-Based Rain
Streak Removal, He Zhang, Vishal M. Patel
7. Fast, Accurate, Small-Scale 3D Scene Capture Using a
Low-Cost Depth Sensor, Nicole Carey, Justin Werfel,
Radhika Nagpal
8. Who Moved My Cheese? Automatic Annotation of Rodent
Behaviors With Convolutional Neural Networks,
Zhongzheng Ren, Adriana Noronha, Annie Vogel Ciernia,
Yong Jae Lee
9. Temporally Coded Illumination for Rolling Shutter Motion
De-Blurring, Scott McCloskey, Sharath Venkatesha
10. Text-Edge-Box: An Object Proposal Approach for Scene
Texts Localization, Dinh Nguyen, Lu Shijian, Nizar Ouarti,
Mounir Mokhtari
11. Distance Penalization and Fusion for Person Re-Identification,
Behzad Mirmahboub, Mohamed Lamine Mekhalfi, Vittorio Murino
1145–1300 Lunch (On your own)
1230–1730 Registration (Alexander Valley Foyer)
1700–1730 Afternoon Break (Alexander Valley)
1730–1830 Keynote Session (Dry Creek Valley)
Keynote Talk: Image Description & Beyond..., Tamara Berg (Shopagon Inc. & UNC-Chapel Hill)
Abstract: Much of everyday language and discourse concerns the visual world around us, making understanding the relationship between the physical world and language describing that world an important challenge problem for AI. Comprehending the complex and subtle interplay between the visual and linguistic domains will have broad applicability toward inferring human-like understanding of images, producing natural human-robot interactions, and grounding natural language. In computer vision, along with improvements in deep learning based visual recognition, there has been an explosion of recent interest in methods to automatically generate natural language outputs for images and videos. In this talk I will describe our group's efforts to understand and produce relevant natural language about images, from developing early methods to generate complete and human-like image descriptions, to modeling how people interpret and describe image content, to moving beyond general image descriptions toward more focused natural language, such as referring expressions and question-answering.
1830–1845 Closing Remarks (Dry Creek Valley)
1845–1930 Dinner (Alexander Valley)
1930–2130 Demos (Alexander Valley)
Instant Immersion of Brands in Videos, Brunno Attorre, Bill Marino (Uru, Inc.)
Visual Intelligence for Fashion, Jayaguru Panda, Naveen Sinha, Labhesh Patel (Abzooba India InfoTech)
Regression of 3D Morphable Face Models Using a Deep CNN, Anh Tuan Tran (Univ. of Southern California)
1930–2130 Exhibits (Alexander Valley)
Amazon • Zillow
1930–2130 Poster Session 3 (Alexander Valley)
Posters for Oral Sessions 5A, 5B, 6A, and 6B.
Thursday, March 30 Workshops
Thursday, March 30
0800–1100 Registration (Alexander Valley Foyer)
1015–1045 Morning Break (Alexander Valley)
1230–1730 Registration (Alexander Valley Foyer)
Automated Analysis of Video Data for Wildlife Surveillance
Organizers: Benjamin Richards, Anthony Hoogs, David Kriegman
Location: Russian River Valley
Schedule: Full Day
0900 Overview of NOAA Fisheries Strategic Initiative on
Automated Image Analysis, Benjamin Richards
1000 VIAME: Open-Source Software for Video and Image
Analysis in the Marine Environment, Anthony Hoogs
1030 Morning Break
1045 Deep Learning for All: Managing and Analyzing
Underwater and Remote Sensing Imagery on the Web
Using BisQue, Dmitry V. Fedorov, Kristian G. Kvilekval,
B. S. Manjunath, Brandon M. Doheny, Sarah R.
Sampson, Robert J. Miller
1115 TBA
1145 TBA
1215 Lunch (On your own)
1315 TBA
1345 TBA
1415 TBA
1445 TBA
Large-Scale Soft Biometrics
Organizers: Yongxin Ge, Xin Feng, Xiuzhuang Zhou, Li Geng
Location: Dry Creek Valley I
Schedule: Half Day (Morning)
0900 Opening Remarks
0910 Soft Biometrics in Online Social Networks: A Case
Study on Twitter User Gender Recognition, Li Geng, Ke
Zhang, Xinzhou Wei, Xin Feng
0930 Online Cost Efficient Customer Recognition System for
Retail Analytics, Yilin Song, Yuanyi Xue, Chenge Li, Xuan
Zhao, Sixuan Liu, Xiaona Zhuo, Kangjin Zhang, Bo Yan,
Xiaoran Ning, Yao Wang, Xin Feng
0950 An Intelligent Building Occupancy Detection System
Based on Sparse Auto-Encoder, Zhi Liu, Jie Zhang, Li
Geng
1010 Morning Break
1030 Automatic Video Annotation System for Archival
Sports Video, Yuanyi Xue, Yilin Song, Chenge Li, An-ti
Chiang, Xiaoran Ning
1050 A Phase Field Variational Model With Arctangent
Regularization for Saliency Detection, Meng Li, Xing
Liu, Liming Tang
1110 A Variable Exponent p-Laplace Variational Model
Preserving Texture for Image Interpolation, Zhan Yi,
Yongxin Ge
1130 Conclusions & Future Work
Human Activity Analysis With Highly Diverse Cameras
Organizers: Hideo Saito, Yoichi Sato, Ryo Yonetani, Yuko Ozasa, Kris M. Kitani
Location: Dry Creek Valley II
Schedule: Half Day (Morning)
0900 Opening Remarks
S1: Keynote Session 1 (0910-0940)
0910 Keynote Talk: Activity Recognition From Persons'
Viewpoint and Robots' Viewpoint, Michael S. Ryoo
(Indiana Univ. Bloomington)
S2: Paper Session 1 (0940-1025)
0940 Measuring Grasp Posture Using an Embedded Camera,
Naoaki Kashiwagi, Yuta Sugiura, Natsuki Miyata,
Mitsunori Tada, Maki Sugimoto, Hideo Saito
0955 Gaze Estimation Based on Eyeball-Head Dynamics,
Ikuhisa Mitsugami, Yamato Okinaka, Yasushi Yagi
1010 Speaker Identification Based on Integrated Face
Direction in a Group Conversation, Naoto Ienaga, Yuko
Ozasa, Hideo Saito
1025 Break
S3: Keynote Session 2 (1040-1110)
1040 Keynote Talk: Advances in Automating Analysis of
Highway and Driver Video Image Data: Managing Low
and Variable Image Quality, David Kuehn, Charles Fay
(FHWA Exploratory Advanced Research (EAR) Program)
S4: Paper Session 2 (1110-1155)
1110 Action Recognition in Still Images Using Word
Embeddings From Natural Language Descriptions,
Karan Sharma, Arun CS Kumar, Suchendra M.
Bhandarkar
1125 Investigation of Customer Behavior Analysis Based on
Top-View Depth Camera, Junpei Yamamoto, Katsufumi
Inoue, Michifumi Yoshioka
1140 Measurement of Eyeball Rotational Movements in the
Dark Environment, Kiyoshi Hoshino, Nayuta Ono
S5: Keynote Session 3 (1155-1225)
1155 Keynote Talk: Weakly-Supervised Activity Localization,
Yong-Jae Lee (Univ. of California, Davis)
1225 Closing Remarks