Visual Information Retrieval in Endoscopic Video Archives
Jennifer Roldan Carlos, Mathias Lux, Xavier Giro-i-Nieto, Pia Munoz & Nektarios Anagnostopoulos
Motivation
• Surgery videos are taken every day
• Operations rooms are fully booked
• Many procedures already involve video
• Storing videos is / will be required by law
Amount of Videos
• 8-10 h of operations per room and day • say 6 hours excluding setup, etc.
• 5-6 days a week
• 1,560 h of video per year and operating room (OR)
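The yearly figure matches the conservative end of the estimate above. A minimal sketch of the arithmetic, assuming 52 operating weeks per year (the week count and the function name are assumptions, not stated on the slide):

```python
def yearly_video_hours(hours_per_day: float, days_per_week: int,
                       weeks_per_year: int = 52) -> float:
    """Hours of video recorded per operating room and year."""
    # weeks_per_year = 52 is an assumption; the slide only gives the total.
    return hours_per_day * days_per_week * weeks_per_year


# 6 usable hours per day, at 5 and at 6 operating days per week.
five_day_week = yearly_video_hours(6, 5)
six_day_week = yearly_video_hours(6, 6)
print(five_day_week, six_day_week)
```

With 5 operating days per week this reproduces the 1,560 h on the slide; 6 days per week would give a correspondingly higher total.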
Use Case of Re-finding Frames
• Surgeons take "shots" • for documentation, for patients, for discussion
• Shots are intentionally framed • and make for excellent representative images
Approach
• Temporal sampling: every 5th frame
• Indexing and search based on • a set of global features • or a localized global feature
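The temporal sampling step can be sketched as a small generator. This is an illustration only: `sample_frames` is a hypothetical helper, and any iterable of decoded frames is assumed as input, since the slides do not say how the videos were decoded.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

Frame = TypeVar("Frame")


def sample_frames(frames: Iterable[Frame], step: int = 5) -> Iterator[Tuple[int, Frame]]:
    """Yield (original_index, frame) for every `step`-th frame, starting at frame 0."""
    if step < 1:
        raise ValueError("step must be at least 1")
    for index, frame in enumerate(frames):
        if index % step == 0:
            # Keep the original index so a hit can be mapped back to the video.
            yield index, frame


# Stand-in for a decoded video: integers instead of images.
sampled = list(sample_frames(range(12), step=5))
print(sampled)
```

Keeping the original frame index alongside each sampled frame makes it possible to jump from a search result back to the matching position in the video.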
Late Fusion for Global Features
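Late fusion combines the result lists of several features after each feature has been searched on its own. The slides do not state which fusion rule was used, so the sketch below shows one common variant as an assumption: min-max normalize the distances per feature, then average them. All names and the example distances are made up for illustration.

```python
from typing import Dict, List


def min_max_normalize(distances: Dict[str, float]) -> Dict[str, float]:
    """Scale one feature's distances to [0, 1] so different features are comparable."""
    lowest, highest = min(distances.values()), max(distances.values())
    span = highest - lowest
    return {frame_id: (d - lowest) / span if span > 0 else 0.0
            for frame_id, d in distances.items()}


def late_fusion(per_feature: List[Dict[str, float]]) -> List[str]:
    """Rank frame ids by the mean of their normalized distances (smaller is better)."""
    if not per_feature:
        return []
    normalized = [min_max_normalize(d) for d in per_feature]
    # Only fuse frames that every feature has scored.
    common = set(normalized[0]).intersection(*normalized[1:])
    fused = {frame_id: sum(n[frame_id] for n in normalized) / len(normalized)
             for frame_id in common}
    # Break ties by frame id to keep the ranking deterministic.
    return sorted(fused, key=lambda frame_id: (fused[frame_id], frame_id))


# Made-up distances of three frames to one query, for two global features.
phog_distances = {"a": 10.0, "b": 12.0, "c": 30.0}
cedd_distances = {"a": 2.0, "b": 0.0, "c": 8.0}
ranking = late_fusion([phog_distances, cedd_distances])
print(ranking)
```

Normalizing before averaging keeps a feature with numerically large distances from dominating the fused ranking.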
Features Employed
• Pyramid HOG • extensive and large texture feature
• Color and Edge Directivity Descriptor • compact and well performing joint histogram
• SIMPLE • CEDD descriptors of patches at SURF key points
Data Set
• 33 hours of video • from actual procedures focusing on laparoscopy
• 1,276 videos in total • 593,446 frames after temporal sampling
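As a plausibility check, the numbers above imply a source frame rate of roughly 25 frames per second. The sketch below derives that value from the slide's figures; it is not a stated fact, and the result is approximate because "33 hours" is rounded.

```python
HOURS_OF_VIDEO = 33        # from the slide (rounded)
SAMPLED_FRAMES = 593_446   # from the slide, after temporal sampling
SAMPLING_STEP = 5          # every 5th frame is kept


def implied_frame_rate(hours: float, sampled_frames: int, step: int) -> float:
    """Estimate the original frames per second from the sampled frame count."""
    original_frames = sampled_frames * step
    return original_frames / (hours * 3600)


fps = implied_frame_rate(HOURS_OF_VIDEO, SAMPLED_FRAMES, SAMPLING_STEP)
print(f"implied frame rate: {fps:.2f} fps")
```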
Example Results - SIMPLE
Evaluation – Re-Finding in Numbers
• Randomly selected more than 700 shots
• Excluding tests, white balance and out-of-patient shots
• Resulting in 600 sample queries
Evaluation – Re-Finding in Numbers
• Hypothesis I: every 5th frame is enough to re-find images.
• Hypothesis II: There is a noticeable difference between global and local features.
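One way to read Hypothesis I: with every 5th frame indexed, a shot is never more than two frames away from an indexed frame, so a visually near-identical frame should exist in the index. The sketch below illustrates that offset; the helper name is hypothetical and the end of the video is ignored for simplicity.

```python
def nearest_sampled_index(frame_index: int, step: int = 5) -> int:
    """Return the index of the sampled frame closest to `frame_index`."""
    lower = (frame_index // step) * step
    upper = lower + step  # may lie past the last frame; ignored in this sketch
    if frame_index - lower <= upper - frame_index:
        return lower
    return upper


# Largest distance between any frame and its nearest sampled frame.
max_offset = max(abs(i - nearest_sampled_index(i)) for i in range(1000))
print(max_offset)
```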
Evaluation – User Study
• Exploratory study, thinking aloud test
• Interactive web page presented to users • ten cases with all available shots as queries • three non-labeled search engines
Evaluation – User Study
• Population drawn from our projects • experts in processing endoscopic videos • well aware of the requirements surgeons have stated
• Task was to ... • browse diverse results and • voice drawbacks and benefits
Findings
• Sampling every 5th frame works (with headroom)
• Study participants noted that • late fusion works as expected and yields interesting results besides near duplicates • SIMPLE works better for semantically similar content, i.e. translated instruments, etc.
Conclusions
• The system does not utilize • domain-dependent methods and heuristics • methods that are demanding in run-time and storage
• Still, it works out for the use case as a • candidate support system for surgeons • baseline to start on interactive video retrieval for laparoscopy.
Future Work
• Salient contours of images • focus on being robust against lighting and noise
Future Work
credits for feature & images: Chryssanthi Iakovidou
Time for questions?
Mathias Lux, Associate Professor @ Klagenfurt University, Austria
Thanks go to Jennifer Roldan Carlos, Xavier Giro-i-Nieto, Pia Munoz & Nektarios Anagnostopoulos