Visual Information Retrieval in Endoscopic Video Archives

Jennifer Roldan Carlos, Mathias Lux, Xavier Giro-i-Nieto, Pia Munoz & Nektarios Anagnostopoulos

Motivation

•  Surgery videos are taken every day

•  Operating rooms are fully booked

•  Many procedures already involve video

•  Storing videos is, or will be, required by law

Amount of Videos

•  8-10 h of operations per room and day
    •  say 6 hours excluding setup, etc.

•  5-6 days a week

•  1,560 h of video per year and operating room (OR), i.e. 6 h × 5 days × 52 weeks
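The yearly figure can be reproduced from the numbers on this slide. The sketch below assumes 52 working weeks per year, which the slides do not state explicitly; the function and constant names are purely illustrative.

```python
# Back-of-the-envelope estimate of recorded video per operating room (OR) and year.
# ASSUMPTION: 52 working weeks per year; the slides give only hours/day and days/week.
HOURS_PER_DAY = 6      # usable recording time per day, from the slide
WEEKS_PER_YEAR = 52    # assumed, not stated in the slides


def hours_per_year(days_per_week: int) -> int:
    """Hours of video recorded per OR and year for a given working week."""
    return HOURS_PER_DAY * days_per_week * WEEKS_PER_YEAR


low = hours_per_year(5)   # 5-day week
high = hours_per_year(6)  # 6-day week
print(low, high)
```

Under these assumptions the 5-day week gives the 1,560 h quoted above, and a 6-day week would raise the estimate further.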

Use Case of Re-finding Frames

•  Surgeons take “shots”
    •  for documentation, for patients, for discussion

•  Shots are intentionally framed
    •  and make for excellent representative images

Approach

•  Temporal sampling: every 5th frame

•  Indexing and search based on
    •  a set of global features
    •  or a localized global feature
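Temporal sampling as described above can be sketched in a few lines. This is a minimal illustration, not the code used in the talk: `sample_every_nth` and `sampled_count` are hypothetical helper names, and the frames here are plain integers standing in for decoded images.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def sample_every_nth(frames: Iterable[T], step: int = 5) -> Iterator[Tuple[int, T]]:
    """Yield (frame_index, frame) for every `step`-th frame, starting at frame 0."""
    if step < 1:
        raise ValueError("step must be >= 1")
    for index, frame in enumerate(frames):
        if index % step == 0:
            yield index, frame


def sampled_count(total_frames: int, step: int = 5) -> int:
    """Number of frames kept by the sampling above: ceil(total_frames / step)."""
    return -(-total_frames // step)


# Stand-in "video" of 12 frames; only frames 0, 5 and 10 would be indexed.
kept = [index for index, _ in sample_every_nth(range(12))]
print(kept)
```

Keeping the original frame index alongside each sampled frame makes it possible to jump back to the exact position in the video once a match is found.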

Late Fusion for Global Features
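The slides do not say which fusion rule is used. As one possible illustration of late fusion, the sketch below merges the ranked result lists of several features with a Borda count; the choice of Borda count, the function name `borda_fuse`, and the frame identifiers are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def borda_fuse(rankings: Sequence[Sequence[str]], top_k: int = 10) -> List[str]:
    """Fuse ranked result lists (one per feature) into a single ranking.

    ASSUMPTION: Borda count is used here only as an example of late fusion;
    the talk does not specify its fusion rule.
    """
    scores: Dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, frame_id in enumerate(ranking):
            # Best rank earns n points, the last rank earns 1 point.
            scores[frame_id] += n - position
    # Highest score first; ties are broken by frame id to keep the output stable.
    fused = sorted(scores, key=lambda frame_id: (-scores[frame_id], frame_id))
    return fused[:top_k]


# Hypothetical result lists from two global features for the same query.
texture_hits = ["f10", "f15", "f20"]
color_hits = ["f15", "f10", "f30"]
print(borda_fuse([texture_hits, color_hits]))
```

Because fusion happens on the result lists rather than on the descriptors, each feature can keep its own index and distance function.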

Features Employed

•  Pyramid HOG
    •  extensive and large texture feature

•  Color and Edge Directivity Descriptor (CEDD)
    •  compact and well performing joint histogram

•  SIMPLE
    •  CEDD descriptors of patches at SURF key points
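Histogram descriptors such as these are typically searched by ranking indexed frames by their distance to the query descriptor. The sketch below does a linear scan with a Tanimoto-based distance, which is commonly paired with CEDD; whether the system in this talk uses that distance is an assumption, and the 4-bin descriptors are toy values, not real feature vectors.

```python
from typing import Dict, List, Sequence, Tuple


def tanimoto_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - Tanimoto coefficient; 0.0 means identical direction and magnitude."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    ab = sum(x * y for x, y in zip(a, b))
    aa = sum(x * x for x in a)
    bb = sum(y * y for y in b)
    denominator = aa + bb - ab
    if denominator == 0:
        # Both descriptors are all-zero; treat them as identical.
        return 0.0
    return 1.0 - ab / denominator


def search(query: Sequence[float],
           index: Dict[str, Sequence[float]],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Linear scan over the index, returning the top_k closest frames."""
    scored = sorted((tanimoto_distance(query, d), frame_id)
                    for frame_id, d in index.items())
    return [(frame_id, distance) for distance, frame_id in scored[:top_k]]


# Toy index of sampled frames (hypothetical ids and values).
index = {"f0": [1, 0, 2, 0], "f5": [0, 3, 0, 1], "f10": [1, 0, 2, 1]}
print(search([1, 0, 2, 0], index))
```

A linear scan is the simplest baseline; for an archive of several hundred thousand frames an approximate index would likely be needed in practice.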

Data Set

•  33 hours of video
    •  from actual procedures focusing on laparoscopy

•  1,276 videos in total
    •  593,446 frames after temporal sampling

Example Results - SIMPLE

Evaluation – Re-Finding in Numbers

•  Randomly selected more than 700 shots

•  Excluding test, white-balance and out-of-patient shots

•  Resulting in 600 sample queries


•  Hypothesis I: Every 5th frame is enough to re-find images.

•  Hypothesis II: There is a noticeable difference between global and local features.
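One way to put numbers on Hypothesis I is to count how often a query shot leads back to a sampled frame next to its true position. The sketch below assumes that the source video and frame index of each shot are known, which the slides do not confirm; `FrameRef`, `is_refound` and `refind_rate` are hypothetical names.

```python
from typing import Dict, List, Tuple

FrameRef = Tuple[str, int]  # (video id, frame index); hypothetical identifiers


def is_refound(source: FrameRef, ranked: List[FrameRef],
               step: int = 5, top_k: int = 1) -> bool:
    """True if a top_k result is the sampled frame nearest to the shot's position.

    With every `step`-th frame indexed, the nearest sampled frame is at most
    step // 2 frames away from the shot.
    """
    video, frame = source
    tolerance = step // 2
    return any(v == video and abs(f - frame) <= tolerance
               for v, f in ranked[:top_k])


def refind_rate(sources: Dict[str, FrameRef],
                rankings: Dict[str, List[FrameRef]],
                step: int = 5, top_k: int = 1) -> float:
    """Fraction of queries whose source position is re-found within top_k."""
    if not sources:
        raise ValueError("no queries given")
    hits = sum(is_refound(src, rankings.get(query_id, []), step, top_k)
               for query_id, src in sources.items())
    return hits / len(sources)


# Two made-up queries with their true positions and the returned rankings.
sources = {"q1": ("v1", 7), "q2": ("v2", 101)}
rankings = {"q1": [("v1", 5), ("v1", 10)], "q2": [("v3", 40), ("v2", 100)]}
print(refind_rate(sources, rankings, top_k=1), refind_rate(sources, rankings, top_k=2))
```

Running the same measurement once per feature (global features versus the localized feature) would give a direct comparison for Hypothesis II.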


Evaluation – User Study

•  Exploratory study, thinking aloud test

•  Interactive web page presented to users
    •  ten cases with all available shots as queries
    •  three non-labeled search engines



•  Population drawn from our projects
    •  experts in processing endoscopic videos
    •  well aware of the requirements voiced by surgeons

•  Task was to ...
    •  browse diverse results and
    •  voice drawbacks and benefits

Findings

•  Sampling every 5th frame works (with headroom)

•  Study participants noted that
    •  late fusion works as expected and yields interesting results besides near duplicates
    •  SIMPLE works better for semantically similar content, e.g. translated instruments

Conclusions

•  The system does not utilize
    •  domain-dependent methods and heuristics
    •  runtime- and storage-demanding methods

•  Still, it works for the use case as a
    •  candidate support system for surgeons
    •  baseline to start on interactive video retrieval for laparoscopy

Future Work

•  Salient contours of images
    •  focus on being robust against lighting and noise

Future Work

credits for feature & images: Chryssanthi Iakovidou


Time for questions?

Mathias Lux, Associate Professor @ Klagenfurt University, Austria


Thanks go to Jennifer Roldan Carlos, Xavier Giro-i-Nieto, Pia Munoz & Nektarios Anagnostopoulos
