International Journal of Computer Assisted Radiology and Surgery (2019) 14:2165–2176
https://doi.org/10.1007/s11548-019-02030-z
ORIGINAL ARTICLE
Moving object tracking in clinical scenarios: application to cardiac surgery and cerebral aneurysm clipping
Sarada Prasad Dakua1 · Julien Abinahed1 · Ayman Zakaria1 · Shidin Balakrishnan1 · Georges Younes1 · Nikhil Navkar1 · Abdulla Al-Ansari1 · Xiaojun Zhai3 · Faycal Bensaali2 · Abbes Amira4
Received: 28 January 2019 / Accepted: 3 July 2019 / Published online: 15 July 2019
© The Author(s) 2019
Abstract
Background and objectives: Surgical procedures such as laparoscopic and robotic surgeries are popular because they are minimally invasive in nature and use miniaturized surgical instruments through small incisions. Tracking the instruments (graspers, needle drivers) and the field of view from the stereoscopic camera during surgery could further help surgeons remain focused and reduce the probability of mistakes. Tracking is commonly applied in computerized video surveillance, traffic monitoring, military surveillance systems, and vehicle navigation. Despite numerous efforts over the last few years, object tracking still remains an open research problem, mainly due to motion blur, image noise, lack of image texture, and occlusion. Most existing object tracking methods are time-consuming and less accurate when the input video contains a high volume of information and a large number of instruments.
Methods: This paper presents a variational framework to track the motion of moving objects in surgery videos. The key contributions are as follows: (1) a denoising method using stochastic resonance in the maximal overlap discrete wavelet transform is proposed, and (2) a robust energy functional based on the Bhattacharyya coefficient is developed to match the target region in the first frame of the input sequence with the subsequent frames using a similarity metric. A modified affine transformation-based registration is used to estimate the motion of the features, followed by an active contour-based segmentation method to converge the contour resulting from the registration process.
Results and conclusion: The proposed method has been implemented on publicly available databases; the results are found satisfactory. Overlap index (OI) is used to evaluate the tracking performance, and the maximum OI is found to be 76% and 88% on private and public data sequences, respectively.
Keywords: Cerebral aneurysm · Segmentation · Object tracking · Heart surgery · Brain aneurysm clipping · Level sets
Introduction
Looking at the steep rise in cardiac diseases, bona fide treatment including surgery is necessary to prevent their rise and avoid sudden cardiac death [1]. Similarly, cerebral
✉ Sarada Prasad Dakua, [email protected]
1 Department of Surgery, Hamad Medical Corporation, Doha, Qatar
2 Department of Electrical Engineering, Qatar University, Doha, Qatar
3 School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK
4 Faculty of Computing, Engineering and Media, De Montfort University, Leicester, UK
aneurysm (CA) is one of the most devastating cerebrovascular diseases of the adult population worldwide, causing subarachnoid hemorrhage, intracerebral hematoma, and other complications leading to a high mortality rate [2]. Surgery is considered an efficient modality for patients with cardiac complications and ruptured cerebral aneurysms. Tracking could be considered a support for treatment and planning in robotic surgery, laparoscopic surgery, and medical education. During robotic or laparoscopic surgery, surgeons concentrate on the procedure to avoid even slight possible mortality and morbidity and usually get stressed. In this scenario, motion tracking of the tools and viewing the desired operating field may be considered two supportive pillars to augment the treatment and improve the success rate.
Clinical requirements in surgery
Many factors contribute to the successful outcome of a surgery, specifically minimally invasive surgery (MIS). These include technical factors, such as an in-depth understanding of the relevant anatomy, a clear understanding of the steps involved in the procedure, and well-honed surgical skills and tool manipulation, as well as anthropomorphic factors such as operating team chemistry and dynamics. To a certain degree, MIS surgeons can advance their anatomy knowledge and procedural understanding through reading and surgical videos; however, other technical skills such as tool manipulation and positioning, which are crucial to the successful outcome of the surgery [3,4], are more complex, nuanced, and time dependent to develop due to restricted vision, limited working space, and loss of visual cues and tactile feedback [5]. Quality and adequacy of surgical proficiency directly impact intra-operative and postoperative outcomes [6]. The existing "apprenticeship" model of training in surgery provides limited and time-consuming opportunities to gain the required technical competencies. In its current form, the assessment of surgical proficiency is heavily reliant on subject-matter experts and subjective assessments [3]. Thus, surgical training and planning could benefit greatly from the visual support provided by instrument/motion tracking, by providing benchmarked metrics for continued objective and constructive assessment of the highest standards of surgical skills, and by lowering the risk of false tool trajectories and orientations [7] and of faulty alignment of implants and placement of screws [8].
Such augmented visual support for both surgical training and planning could be provided through object/motion tracking of the tools (such as scope, scissors, etc.) by providing objective assessment, benchmarking, and automated feedback on metrics such as path length/deviation, economy and smoothness of hand movements, depth perception, rotational orientation, changes in instrument velocity, and time [9]. Zhao et al. [10] report that intra-operative tracking/detection of surgical instruments can provide important information to monitor instruments for operational navigation in MIS, especially in robotic minimally invasive surgeries (RMIS). Thus, based on the above, the perceived impact of tool tracking/positioning on surgical training and intra-operative guidance leads to (a) ensured patient safety via proficient tool movements and avoidance of critical tissue structures and (b) facilitation of a smooth and efficient invasive procedure [11]. This is crucial in surgery, as by continuously charting the location, movement, speed, and acceleration of the different surgical instruments in the operating field, the surgeon is continuously aware of the whereabouts of the instruments in relation to the patient's vital organs, blood vessels, and nerves during surgery. For surgical training, it objectively helps assess surgical performance and differentiate between an expert and a novice surgeon, such that optimal training can then be provided to the novice to ensure the highest levels of patient care [3]. Therefore, precise positioning of the tools remains pivotal in minimally invasive surgical procedures [12], highlighting the need for object tracking via its impact on surgical training and intra-operative guidance.
Kobayashi et al. [13] applied surgical navigation techniques and tool tracking to renal artery dissection within the robot-assisted partial nephrectomy procedure and found that inefficient tool movements involving "insert," "pull," and "rotate" motions, as well as the time to visualize and dissect the artery, were significantly improved owing to improved visualization and control over the tool and anatomy. Pediatric orthopedic surgeons found an increase in accuracy and a reduction in operating time when using image-guided surgical robotic systems to overcome the inaccuracies of hand-controlled tool positioning [14]; these robots achieve this by providing information about surgical tools or implants relative to a target organ (bone). In urology, motion tracking can greatly assist in outpatient procedures such as MRI- and ultrasound-guided prostate biopsy, allowing the surgeon to accurately position and invade suspicious malignant zones for a tissue sample [15]. In interventional radiology, motion tracking can help track guide-wires during endovascular interventions and radiation therapy [16]. In addition to these, applications of surgical navigation systems and tool tracking/motion analysis are being explored in many other surgical fields, including ear-nose-and-throat (ENT) surgery [7], craniomaxillofacial surgery [17], cardiothoracic surgery [18], and orthopedic surgery [19].
Related work
The literature on motion tracking is rich; a few recent methods are included in this paper. Kim and Park [20] present a strategy based on edge information to assist object-based video coding, motion estimation, and motion compensation for MPEG-4 and MPEG-7, utilizing human visual perception to provide edge information. However, the method critically depends on its ability to establish correct correspondences between points on the model edges and edge pixels in an image; this is a non-trivial problem, especially in the presence of large inter-frame motions and cluttered environments. Subudhi et al. [21] propose a two-step method, spatio-temporal spatial segmentation and temporal segmentation, that uses a Markov random field (MRF) model and maximum a posteriori probability (MAP) estimation. Duffner and Garcia [22] present an algorithm for real-time single-object tracking, where a detector makes use of the generalized Hough transform with color and gradient descriptors; a probabilistic segmentation method is used for the foreground and background color distributions. However, it is computationally expensive, especially when the number
of parameters is large. It could also be erroneous, because gradient information usually leads to error when the noise level is high. Li et al. [23] suggest a method within the correlation framework (CF) that models a tracker maximizing the margin between the target and the surrounding background by exploiting background information effectively. They propose to train a CF by multilevel scale supervision, which aims to make the CF sensitive to target scale variation. The two individual modules are then integrated into one framework, simplifying the tracking model. However, the computational load and efficiency are still two major concerns. Mahalingam et al. [24] propose a fuzzy morphological filter and blob detection-based method for object tracking. However, its performance deteriorates in the presence of noise, lack of illumination, and occlusion. Zhang et al. [25] propose a correlation particle filter (CPF) that combines a correlation filter and a particle filter. However, this tracker is still unable to deal with scale variation and partial occlusion. Yang et al. [26] present a method to analyze frames extracted from videos using kernelized correlation filters (KCF) and background subtraction (BS) (KCF-BS) to plot the 3D trajectory of a cabbage butterfly. The KCF-BS algorithm is used to track the butterfly in video frames and obtain the coordinates of the target centroid in two videos. However, it is noticed that the target sometimes gets lost and the method is unable to re-detect or recognize the target when the target motion is fast. Du et al. [27] propose an object tracking method for satellite videos that fuses a KCF tracker with a three-frame difference algorithm. Although the method reports interesting results, it takes a long time to perform. Liu et al. [29] propose a correlation filter-based tracker that consists of multiple position detections and alternate templates. The detection position is repositioned according to the speed of the target estimated by an optical flow method, and the alternate template is stored with a template update mechanism. However, this method fails to perform if the size of each target is too small compared with the entire image, or if the target and the background are very similar. Liu et al. [30] propose a method integrating a histogram of oriented gradients, an RGB histogram, and a motion histogram into a novel statistical model to track the target in unmanned aerial vehicle-captured videos. However, it fails to perform in occluded scenes.
Du et al. [31] present a method based on iterative graph seeking. Usually, superpixel-based methods use mid-level visual cues to represent target parts, where local appearance variations are exploited by the superpixel representation. These methods have three sequential steps: (A) target part selection, (B) target part matching, and (C) target state estimation. Step (A) selects candidate target parts from the background, in step (B) a local appearance model associates parts between consecutive frames, and in step (C) the target state (center pixel location and size of the target) is estimated based on majority voting of the matching results. This method integrates target part selection, part matching, and state estimation in a unified energy minimization framework. It incorporates structural information in local part variations using a global constraint. Although the reported results are promising, when target part selection and target part matching are combined with the correlation filter, the estimation of the target takes a long time to converge due to scale variation and partial occlusion, which are bound to happen in surgery scenarios. Furthermore, when the noise level in the input frames is high (for instance, in cardiac cine MRI data), the method would certainly struggle to perform. We intend to address these issues through our proposed method. Moreover, if the literature above is carefully observed, noise has always been an issue in most of the methods. Therefore, in our proposed method, we first denoise the input frames. The target region in the first frame is chosen by a level set (LS) function, and then the foreground and background models are generated. The foreground and background distributions are determined using the models in subsequent frames, and the motion of the pixels from the region of interest is estimated through a registration framework. Additionally, the selected region contour in the current frame is registered with the subsequent frame. Finally, segmentation is applied to refine the contour generated during registration, and the contour is updated.
The paper is organized as follows: "Methodology and data" describes the denoising stage; "Target rendering" presents the approach for target rendering (including region selection and developing models); "Registration" defines a method for motion estimation through registration; "Segmentation" presents the segmentation; "Results and discussion" provides the results, while "Conclusions and future work" concludes the paper.
Methodology and data
The method is illustrated in Fig. 1. First, the input frame is denoised to minimize the negative impact of noise on subsequent steps. The target region is then selected, followed by the development of foreground and background models for motion estimation through a registration framework. Finally, the rough contour generated in the registration step is further refined (by a proper segmentation method) and the contour is updated on subsequent frames.
Denoising of image sequences
Over the years, most methods have addressed noisy and cluttered medical images mostly by filtering, which results in significant degradation of image quality. One of the efficient approaches that counters noise and constructively utilizes noise
Fig. 1 Block diagram describing the proposed method: input frame → denoising → target region selection and initialization of LS → registration → segmentation → object tracked
is stochastic resonance (SR) [33]. SR occurs if the signal-to-noise ratio (SNR) and the input/output correlation have a well-marked maximum at a certain noise level. Unlike very low or high noise intensities, moderate ones allow the signal to cross the threshold, giving maximum SNR at some optimum noise level. In the bistable SR model, upon addition of zero-mean Gaussian noise, the pixel is transferred from a weak signal state to a strong signal state, which is modeled by the Brownian motion of a particle placed in a double-well potential system. The state at which the performance metrics are found optimum can be considered the stable state providing maximum SNR. There have already been many attempts to use SR in different domains such as the Fourier and spatial domains [34]; however, we have chosen the maximal overlap discrete wavelet transform (MODWT) [36] because of some of its key advantages: (1) MODWT can handle any sample size, (2) the smooth and detail coefficients of MODWT multiresolution analysis are associated with zero-phase filters, (3) it is translation invariant, and (4) it produces a more asymptotically efficient wavelet variance estimator than the DWT.
Maximal overlap discrete wavelet transform
Generally, the DWT is defined by $\psi_{j,k}(t) = 2^{j/2}\,\psi(2^j t - k)$, $j, k \in \mathbb{Z}$, where $\psi$ is a compactly supported real-valued function and $\int_{-\infty}^{\infty} \psi(t)\,dt = 0$. The MODWT is evaluated using the dilation equations $\phi(t) = \sqrt{2}\sum_k l_k\,\phi(2t - k)$ and $\psi(t) = \sqrt{2}\sum_k h_k\,\phi(2t - k)$, where $\phi$ is the father wavelet defining the low-pass filter coefficients and $\psi$ is the mother wavelet defining the high-pass filter coefficients: $l_k = \sqrt{2}\int_{-\infty}^{\infty} \phi(t)\,\phi(2t - k)\,dt$ and $h_k = \sqrt{2}\int_{-\infty}^{\infty} \psi(t)\,\psi(2t - k)\,dt$.
Denoising by MODWT
In this methodology, the 2D MODWT is applied to the $M \times N$ image $I$. Applying SR to the approximation and detail coefficients, the stochastically enhanced (tuned) coefficient sets in the MODWT domain are obtained as $W^s_\psi(l, p, q)_{SR}$ and $W_\phi(l_0, p, q)_{SR}$. The SR dynamics are defined as $\frac{dx}{dt} = \left[ax - ex^3\right] + B\sin\omega t + \sqrt{D}\,\xi(t)$, where $\sqrt{D}\,\xi(t)$ and $B\sin\omega t$ represent the noise and the input, respectively; these are replaced by the MODWT sub-band coefficients. The noise term is the factor that produces SR; maximization of the SNR occurs at the double-well parameter $a$. Implementation of SR on digital images necessitates solving the stochastic differential equation using the Euler–Maruyama method [35], which gives the iterative discrete equation:

$$x(n+1) = x(n) + \Delta t \left[ \left( a\,x(n) - e\,x^3(n) \right) + \mathrm{Input}(n) \right] \quad (1)$$
where $a$ and $e$ are the bistable parameters, and $n$ and $\Delta t$ represent the iteration index and the sampling time, respectively. $\mathrm{Input}$ denotes the sequence of input signal and noise, with the initial condition $x(0) = 0$. The final stochastic simulation is obtained after some predefined number of iterations.
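The iterative scheme of Eq. (1) is straightforward to implement; below is a minimal sketch that applies the update independently to a list of sub-band coefficients (the values of `a`, `e`, `dt`, and `n_iter` are illustrative placeholders, not the tuned values used in the paper):

```python
def sr_iterate(coeffs, a=2.0, e=0.5, dt=0.007, n_iter=200):
    """Euler-Maruyama iteration of the bistable SR model, Eq. (1):
    x(n+1) = x(n) + dt * [a*x(n) - e*x(n)**3 + input(n)],
    applied independently to each wavelet coefficient."""
    out = []
    for c in coeffs:
        x = 0.0  # initial condition x(0) = 0
        for _ in range(n_iter):
            x = x + dt * ((a * x - e * x ** 3) + c)
        out.append(x)
    return out

# Tune a small set of sub-band coefficients; the sign of each input
# decides which well of the double-well potential the state settles in.
tuned = sr_iterate([0.1, -0.3, 0.8])
```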
Given the tuned (enhanced and stabilized) set of wavelet coefficients ($X_\phi(l_0, p, q)$ and $X^s_\psi(l, p, q)$), the denoised image $I_{\mathrm{denoised}}$ in the spatial domain is obtained by the inverse maximal overlap discrete wavelet transform (IMODWT) as:

$$I_{\mathrm{denoised}} = \frac{1}{\sqrt{MN}} \sum_{p} \sum_{q} X_\phi(l_0, p, q)\,\phi_{l_0,p,q}(i, j) + \frac{1}{\sqrt{MN}} \sum_{s \in (H,V,D)} \sum_{l = l_0} \sum_{p} \sum_{q} X^s_\psi(l, p, q)\,\psi^s_{l,p,q}(i, j)$$
The double-well parameters $a$ and $e$ are determined from the SNR by differentiating the SNR with respect to $a$ and equating to zero; in this way, the SNR is maximized, resulting in $a = 2\sigma_0^2$ for maximum SNR, where $\sigma_0$ is the noise level administered to the input image. The maximum possible value of the restoring force $R = B\sin\omega t$ follows from the gradient of the bistable potential function $U(x)$: $R = -\frac{dU}{dx} = -ax + ex^3$, and $\frac{dR}{dx} = -a + 3ex^2 = 0$ gives $x = \sqrt{a/3e}$. At this value, the maximum force is $\sqrt{4a^3/27e}$, so $B\sin\omega t < \sqrt{4a^3/27e}$. Maximizing the left term (keeping $B = 1$) gives $e < 4a^3/27$. To obtain the parameter values, we consider $a = w \times 2\sigma_0^2$ and $e = z \times \sqrt{4a^3/27}$, where $w$ and $z$ are weight parameters for $a$ and $e$. Initially, $w$ is an experimentally chosen constant that later
becomes dependent on the standard deviation of the input image, while $z$ is a number less than 1 to ensure the sub-threshold condition of the signal. In this way, the noise in the input image is countered and maximum information from the image is retained.
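The parameter selection above can be sketched as follows (a minimal illustration; the weights `w` and `z` are illustrative defaults, with `z < 1` enforcing the sub-threshold condition):

```python
import math

def bistable_params(sigma0, w=1.0, z=0.9):
    """Double-well parameters from the noise level sigma0:
    a = w * 2*sigma0**2 (the value maximizing the SNR), and
    e = z * sqrt(4*a**3/27), scaled by z < 1 to keep the
    restoring force below its maximum sqrt(4*a**3/27)."""
    a = w * 2.0 * sigma0 ** 2
    e = z * math.sqrt(4.0 * a ** 3 / 27.0)
    return a, e

a, e = bistable_params(0.5)
```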
Target rendering
Target region selection, or target rendering [28,37], is the initial step in this motion tracking. Then the features (such as intensity, color, edge, texture, etc.) that can appropriately describe the target are selected. The notations used in target rendering are: $fs$, the feature space; $r$, the number of features; $fd$, the foreground distribution (by the features); and $bd$, the background distribution. The region is initialized on the first frame and represented by a level set function $\phi$ because of its flexibility in choosing the contour. The distributions of the foreground ($\phi \geq 0$) and background ($-th < \phi < 0$, where $th$ is the threshold restricting the region of interest to a small area) regions are represented by $fg(\phi)$ and $bg(\phi)$, respectively, and are matched with $fd$ and $bd$. Next, the foreground and background models are generated. Suppose the pixels $\{x_{f,i}\}_{i=1,\ldots,n_f}$ and $\{x_{b,i}\}_{i=1,\ldots,n_b}$ fall in the foreground and background regions; the function $z: \mathbb{R}^2 \to \{1, \ldots, r\}$ can be used to map a pixel $x_i$ into the bin $b(x_i)$ in feature space. The probability of the feature space in the models is $fd_{fs} = \frac{1}{n_f}\sum_{i=1}^{n_f} \delta\left[b(x_{f,i}) - fs\right]$ and $bd_{fs} = \frac{1}{n_b}\sum_{i=1}^{n_b} \delta\left[b(x_{b,i}) - fs\right]$, where $\delta$ is the Kronecker delta function and $n_f$ and $n_b$ are the numbers of pixels in the foreground and background, respectively. The foreground and background distributions in the candidate region of the current frame ($-th < \phi < 0$) are obtained as:

$$fg(\phi) = \frac{1}{F_f} \sum_{i=1}^{n} H(\phi(x_i))\,\delta\left[b(x_i) - fs\right] \quad \text{and} \quad bg(\phi) = \frac{1}{F_b} \sum_{i=1}^{n} \left(1 - H(\phi(x_i))\right)\delta\left[b(x_i) - fs\right] \quad (2)$$

where $H(\cdot)$ is the Heaviside function selecting the foreground region, and $F_f$ and $F_b$ are the normalization factors.
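Equation (2) amounts to a pair of normalized feature histograms gated by the Heaviside function of the level set. A minimal single-feature (intensity) sketch, where the uniform binning and the bin count are arbitrary choices for illustration:

```python
def heaviside(v):
    """H(v) = 1 for v >= 0 (foreground), 0 otherwise (background)."""
    return 1.0 if v >= 0 else 0.0

def region_histograms(pixels, phi, n_bins=8):
    """Eq. (2): fg[fs] = (1/Ff) * sum_i H(phi(x_i)) * delta[b(x_i) - fs],
    bg[fs] = (1/Fb) * sum_i (1 - H(phi(x_i))) * delta[b(x_i) - fs].
    `pixels` are intensities in [0, 1); `phi` gives the level-set value
    at each pixel."""
    fg = [0.0] * n_bins
    bg = [0.0] * n_bins
    for x, p in zip(pixels, phi):
        b = min(int(x * n_bins), n_bins - 1)  # bin index b(x_i)
        h = heaviside(p)
        fg[b] += h
        bg[b] += 1.0 - h
    ff, fb = sum(fg) or 1.0, sum(bg) or 1.0  # normalization factors Ff, Fb
    return [v / ff for v in fg], [v / fb for v in bg]

# Two bright pixels inside the contour (phi >= 0), two dark ones outside
fg, bg = region_histograms([0.1, 0.15, 0.9, 0.85], [1.0, 0.5, -0.2, -0.1])
```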
Registration
Registration of the target in the first frame with the subsequent frame is performed to estimate the affine deformation of the target. We determine the foreground and background distributions in the frames and match them with the respective foreground and background models. We use the Bhattacharyya metric [38] because it is computationally fast and has been used in face recognition for years. Additionally, it has a straightforward geometric interpretation: since it is the cosine of the angle between $fd$ and $fg(\phi)$, or between $bd$ and $bg(\phi)$, a higher value of the coefficient indicates a better match between the candidate and target models. Thus, our similarity distance measure is:

$$En_1(\phi) = \sum_{fs=1}^{r} \left( \sqrt{fg_{fs}(\phi)\,fd_{fs}} + \gamma \sqrt{bg_{fs}(\phi)\,bd_{fs}} \right) \quad (3)$$

where $\gamma$ is the weight balancing the contributions of the foreground and the background in the matching.
For deformation estimation, we have proposed a simple and efficient framework as follows. Suppose that in the current frame $\phi_0$ is the initial position of the target and the contour is obtained by $\phi = 0$. The probabilities $fg(\phi_0) = \{fg_{fs}(\phi_0)\}_{fs=1,\ldots,r}$ and $bg(\phi_0) = \{bg_{fs}(\phi_0)\}_{fs=1,\ldots,r}$ are computed. Applying Taylor's expansion:

$$En_1(\phi) = \frac{1}{2}\left( \sum_{fs=1}^{r} \sqrt{fg_{fs}(\phi_0)\,fd_{fs}} + \sum_{fs=1}^{r} fg_{fs}(\phi) \sqrt{\frac{fd_{fs}}{fg_{fs}(\phi_0)}} \right) + \frac{\gamma}{2}\left( \sum_{fs=1}^{r} \sqrt{bg_{fs}(\phi_0)\,bd_{fs}} + \sum_{fs=1}^{r} bg_{fs}(\phi) \sqrt{\frac{bd_{fs}}{bg_{fs}(\phi_0)}} \right) \quad (4)$$
Substituting Eq. (2) into (4), we get:

$$En_1(\phi) = \frac{1}{2}\left( \sum_{fs=1}^{r} \sqrt{fg_{fs}(\phi_0)\,fd_{fs}} + \frac{1}{F_f} \sum_{i=1}^{n} h_{f,i}\,H(\phi(x_i)) \right) + \frac{\gamma}{2}\left( \sum_{fs=1}^{r} \sqrt{bg_{fs}(\phi_0)\,bd_{fs}} + \frac{1}{F_b} \sum_{i=1}^{n} h_{b,i}\left(1 - H(\phi(x_i))\right) \right) \quad (5)$$

where the weights, which play a pivotal role in detecting the new centroid of the target, are $h_{f,i} = \sum_{fs=1}^{r} \sqrt{\frac{fd_{fs}}{fg_{fs}(\phi_0)}}\,\delta\left[b(x_i) - fs\right]$ and $h_{b,i} = \sum_{fs=1}^{r} \sqrt{\frac{bd_{fs}}{bg_{fs}(\phi_0)}}\,\delta\left[b(x_i) - fs\right]$. A higher value of the Bhattacharyya coefficient can be obtained by maximizing (5), which is a function of location $x$ and contour.
Furthermore, we consider the foreground and background intensity as an additional feature. Suppose the first frame, $u_0(x, y)$, consists of two concentric regions ($u_0^i$, $u_0^o$), meaning the input image contains more than one intensity label. This certainly makes determining a smooth contour initialization and deformation challenging because of the varying intensities. Therefore, we integrate both local and global image information into the energy term in order to make it perform as a perfect step detector with respect to the initialization of the contour. The energy term is defined as:

$$En_2 = \lambda_1 E_G + \lambda_2 E_L + E_R \quad (6)$$

where $\lambda_1$ and $\lambda_2$ are fixed constants; $E_G$, $E_L$, and $E_R$ are the global, local, and regularization terms, respectively
(containing the respective image information). $E_R$ controls the boundary smoothness. The local term is defined as

$$E_L = \int_{\phi_0} \frac{\left(g_k * u_0(x, y) - u_0(x, y) - d_2(x, y)\right)^2}{d_2(x, y)^2}\,dx\,dy \quad (7)$$

where $g_k$ is an averaging filter of size $k \times k$, and $d_1$ and $d_2$ are the intensity averages of the difference image $g_k * u_0(x, y) - u_0(x, y)$ inside and outside the evolving curve $C$, respectively. The global term is

$$E_G = \int_{\phi_0} \frac{\left(u_0(x, y) - c_2(x, y)\right)^2}{c_2(x, y)^2}\,dx\,dy \quad (8)$$
where $c_1$ and $c_2$ represent the average intensity of $u_0(x, y)$ inside and outside $C$, respectively. $c_1$ and $c_2$ are approximated by a weighted average of the image intensity $u_0(p, q)$, where $(p, q)$ is a neighborhood of $(x, y)$. This means $c_1(x, y)$ and $c_2(x, y)$ are spatially varying; we formulate them as

$$c_1(x, y) = \frac{\int_\Omega g_k\left((x, y) - (p, q)\right)u_0(p, q)\,H(\phi(p, q))\,dp\,dq}{\int_\Omega g_k\left((x, y) - (p, q)\right)H(\phi(p, q))\,dp\,dq}, \qquad c_2(x, y) = \frac{\int_\Omega g_k\left((x, y) - (p, q)\right)u_0(p, q)\left(1 - H(\phi(p, q))\right)dp\,dq}{\int_\Omega g_k\left((x, y) - (p, q)\right)\left(1 - H(\phi(p, q))\right)dp\,dq}.$$

We use the conventional regularizing term $E_R$, which includes a penalty on the total length of the edge contour for a given segmentation and another penalty on the total area of the foreground region found by the segmentation. The energy term therefore becomes:
$$\begin{aligned} En_2(\phi) = {}& \mu \int_\Omega \delta(\phi)\,|\nabla\phi|\,dx\,dy + \nu \int_\Omega H(\phi(x, y))\,dx\,dy \\ &+ \lambda_1 \int_\Omega \frac{\left(u_0(x, y) - c_1(x, y)\right)^2 H(\phi(x, y))}{c_1(x, y)^2}\,dx\,dy + \lambda_1 \int_\Omega \frac{\left(u_0(x, y) - c_2(x, y)\right)^2 \left(1 - H(\phi(x, y))\right)}{c_2(x, y)^2}\,dx\,dy \\ &+ \lambda_2 \int_\Omega \frac{\left(g_k * u_0(x, y) - d_1(x, y)\right)^2 H(\phi(x, y))}{d_1(x, y)^2}\,dx\,dy + \lambda_2 \int_\Omega \frac{\left(g_k * u_0(x, y) - d_2(x, y)\right)^2 \left(1 - H(\phi(x, y))\right)}{d_2(x, y)^2}\,dx\,dy \end{aligned} \quad (9)$$
Equation (9) has to be maximized to obtain a higher Bhattacharyya coefficient. The similarity distance measure now becomes:

$$En(\phi) = En_1(\phi) + En_2(\phi) \quad (10)$$

We model the motion of the target as an affine transformation by introducing a warp in (10):

$$\mathbf{x}' = h(\mathbf{x}, \Delta T) = \begin{pmatrix} 1 + fg_1 & fg_3 & fg_5 \\ fg_2 & 1 + fg_4 & fg_6 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} \quad (11)$$
The column vector characterizes the change in pose. Substituting (11) into (10) and omitting the terms that are not a function of the incremental warp $\Delta T$, we obtain:

$$En(\phi) = \frac{1}{2F_f} \sum_{i=1}^{n} H\left(\phi(h(\mathbf{x}, \Delta T))\right) w_{f,i} + \frac{\gamma}{2F_b} \sum_{i=1}^{n} \left(1 - H\left(\phi(h(\mathbf{x}, \Delta T))\right)\right) w_{b,i} \quad (12)$$

As $\Delta T$ tends to 0, the estimation converges. In this way, the registration step iteratively estimates the shape change until convergence.
Segmentation
Since the tracker in the registration stage is still not able to extract the target contour properly, the registration result needs to be refined through segmentation. To do this, we optimize $\phi$ in Eq. (10), because the equation is a function of $\phi$; in other words, $\frac{\partial En(\phi(x_i))}{\partial \phi(x_i)} = 0$. This is solved by the well-known steepest-ascent method: $\frac{\partial En(\phi(x_i), t)}{\partial t} = \frac{\partial En(\phi(x_i))}{\partial \phi(x_i)}$. We obtain:

$$\begin{aligned} \frac{\partial \phi(x, y, t)}{\partial t} = {}& \delta_\epsilon(\phi) \left[ \mu\,\nabla \cdot \left( \frac{\nabla\phi}{|\nabla\phi|} \right) - \nu + \lambda_1 \left( \frac{\left(u_0(x, y) - c_2(x, y)\right)^2}{c_2(x, y)^2} - \frac{\left(u_0(x, y) - c_1(x, y)\right)^2}{c_1(x, y)^2} \right) \right] \\ &+ \lambda_2 \left( \frac{\left(g_k * u_0(x, y) - d_2(x, y)\right)^2}{d_2(x, y)^2} - \frac{\left(g_k * u_0(x, y) - d_1(x, y)\right)^2}{d_1(x, y)^2} \right) + \frac{\Delta t}{2}\,\delta_\epsilon(\phi) \left( \frac{1}{F_f} h_{f,i} - \gamma \frac{1}{F_b} h_{b,i} \right) \end{aligned} \quad (13)$$

$$\frac{\delta_\epsilon(\phi)}{|\nabla\phi|} \frac{\partial \phi}{\partial \vec{n}} = 0 \ \text{on} \ \partial\Omega \quad (14)$$

where $H$ and $\delta_\epsilon$ represent the Heaviside function and the Dirac measure, respectively; $\frac{\partial \phi}{\partial \vec{n}}$ and $\vec{n}$ denote the normal derivative of $\phi$ at the boundary and the exterior normal to the boundary, respectively. Finally, the target is updated on the subsequent frames.
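The steepest-ascent evolution drives φ along the energy gradient until the update vanishes. A generic one-variable sketch of the scheme (the toy energy and step size are illustrative, not Eq. (10) itself):

```python
def steepest_ascent(grad, x0=0.0, dt=0.01, n_iter=1000, tol=1e-8):
    """Iterate x <- x + dt * grad(x) until the update falls below tol,
    mirroring the evolution d(phi)/dt = dEn/d(phi) behind Eq. (13)."""
    x = x0
    for _ in range(n_iter):
        step = dt * grad(x)
        x += step
        if abs(step) < tol:
            break
    return x

# Toy concave energy En(x) = -(x - 2)**2 has gradient -2*(x - 2);
# ascent converges to the maximizer x = 2
x_star = steepest_ascent(lambda x: -2.0 * (x - 2.0))
```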
Fig. 2 a–d Input frames in a video sequence to be denoised. e, f Results of denoising
Data
The datasets used in this work are obtained from private sources such as Hamad Medical Corporation (30 data sequences) and public sources such as Sunnybrook [32] (45 data sequences) and VOT 2015 [40] (60 data sequences). The Sunnybrook Cardiac Data (SCD) consist of cine MRI data from a mixed set of patients and pathologies: healthy, hypertrophy, heart failure with infarction, and heart failure without infarction. A subset of this dataset was first used in the automated myocardium segmentation challenge from short-axis MRI. The VOT 2015 sequences are chosen from a large pool of sequences including ALOV, OTB, non-tracking datasets, Computer Vision Online, Professor Bob Fisher's Image Database, Videezy, the Center for Research in Computer Vision at the University of Central Florida, USA, the NYU Center for Genomics and Systems Biology, Data Wrangling, the Open Access Directory, and the Learning and Recognition in Vision Group, INRIA. The initial pool of sequences was created by combining the sequences from all the sources. After removal of duplicate sequences, grayscale sequences, and sequences containing objects with an area smaller than 400 pixels, the final sequences were obtained; more details can be found on the Web site [45].
Results and discussion
Results
The proposed method is implemented on both private and public databases as described earlier. The qualitative results of denoising are provided in Fig. 2. We have quantitatively compared the proposed denoising method with the Fourier-based approach because of its huge popularity [34]. The perceptual quality measurement (PQM) [41] is provided in Fig. 3, which shows a greater value in the case of MODWT, suggesting its higher efficacy; in this figure, m denotes the mass of the particle that moves under the stochastic condition. For denoising of the input images, the initial values of Δt and z are taken as 0.007 and 0.000027, respectively. To determine the quality of the denoised image, we have calculated the distribution separation measure (DSM), which estimates the degree of image quality. The DSM is defined as [34]: $\mathrm{DSM} = \left|\mu_T^E - \mu_B^E\right| - \left|\mu_T^O - \mu_B^O\right|$, where $\mu_T^E$ and $\mu_T^O$ are the means of the selected target regions of the denoised and original images, respectively, and $\mu_B^E$ and $\mu_B^O$ are the means of the selected background regions of the denoised and original images, respectively. The higher the value of DSM, the better the quality. It is observed that the value of DSM is maximum at iteration 200 and then starts decreasing; therefore, this iteration is considered optimal.
These denoised frames are then used in the subsequent steps of the proposed method. As mentioned earlier, we have included image sequences of cardiac surgery and of clipping of ruptured cerebral aneurysms in this work. We have also tested our method on cardiac cine MRI datasets at both high and low contrast levels to highlight the method's capability under varying intensities. The performance results on these datasets are provided in Figs. 4 and 5. We have chosen different scenarios for the cerebral aneurysm surgical procedure (clipping): one is to track the scissors'
Fig. 3 a Perceptual quality measures by Fourier (Method 2) and MODWT (Method 1); m on the x-axis denotes the mass of the particle that moves under the stochastic condition. b Energy convergence comparison of three methods (Paragios et al., Du et al., and the proposed method)
Fig. 4 a–d Ground truth frames. e–h Tracking of the left ventricle in low-contrast cine magnetic resonance imaging (low-contrast CMRI) during cardiac surgery
or clippers' movement, and the other is to focus on the operating field during surgery, where multiple tools are used by the surgeons. It is important to track the motion of the scissors in order to minimize the damage caused by their movement. Besides tool tracking, capturing or tracking the operating field is also important; it helps the surgeon concentrate on the tools used during the surgery and the impacted tissues of interest. The results are given in Figs. 6 and 7. We have also tested the proposed method on the VOT 2015 datasets and found satisfactory results, as can be observed in Fig. 8. We have included this particular dataset in this paper to emphasize that its foreground is not very significantly different from the background, as is the case in medical data sequences. Usually, medical data are blurry (either reddish or grayish) and lack contrast, as can be observed from the figures. In this scenario, a single contour surrounding the tools could easily be overlooked; therefore, purely for the user's (surgeon's) convenience, we have added a blue line surrounding the red line in the tracking results. While calculating the accuracy, only the red line is taken into consideration. In order to determine the segmentation accuracy, we have used the Dice coefficient (DC), which may be defined as
Fig. 5 Tracking of left ventricle in high-contrast cine magnetic
resonance imaging (high-contrast CMRI) during cardiac surgery
Fig. 6 Tracking of the operating field with multiple objects
during cerebral aneurysm clipping
[44]: DC = 2 × |X ∩ Y| / (|X| + |Y|), where X and Y are two point sets. The average segmentation accuracy on the 3-T machine is 94%, whereas in the case of 7 T it is found to be 96%. The proposed method has performed as expected, which can be verified from the results provided in the "Results" section. We have optimized the algorithm and code; the average time taken to perform tracking is less than 25–30 s, at an average rate of 24 frames per second. We have also compared the performance of the proposed method with other similar methods ([31,42]); the proposed method converges faster than the other methods (Fig. 3b). We have also calculated the overlap index (OI) [43] to determine the overlap between the resulting target contour and the actual boundary. We have found it highest in the case of the proposed method compared with the others, as can be observed from Table 1.
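The Dice coefficient used above is straightforward to compute on binary segmentation masks; the following is a small sketch under the assumption that X and Y are represented as boolean arrays of the same shape (the function name and the empty-mask convention are ours):

```python
import numpy as np

def dice_coefficient(x, y):
    """Dice coefficient DC = 2|X n Y| / (|X| + |Y|)
    for two binary masks (the point sets X and Y)."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    denom = x.sum() + y.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(x, y).sum() / denom
```

A DC of 1 means the tracked contour and the ground truth coincide exactly; the 94% and 96% figures quoted above correspond to DC values of 0.94 and 0.96.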
Fig. 7 Tracking of the operating field with multiple objects
during cardiac surgery
Fig. 8 a–d Tracked frames in a video sequence (VOT 2015). e–f
Corresponding ground truth sequences
Discussion
The values of the bistable system parameters play a crucial role in the process of denoising using SR. The expression for SR on any dataset contains additive terms in multiples of w and a subtractive term in multiples of z. It is observed that images with low contrast and a low dynamic range require larger values of w, while those with relatively more contrast, covering an appreciable gray-level range, require smaller values of w for proper denoising. The behaviour of Δt has been found to be similar to that of w. It is also perceived that w is inversely proportional to the overall variance, which signifies the contrast of the input image. The optimization process leads us to the optimum value of w; the value of z should be less than 1 so that the condition e < √(4a³/27) holds, assuring that the system is bistable and the signal is sub-threshold, so that SR is applicable. We prefer a very small value of this factor to remain well within the allowable range of e. Finally, we have noticed that the varying segmentation accuracy depends on the quality of the input data sequence. The MRI data obtained from the 7-T machine give better accuracy than those from the 3-T machine.

Table 1 Overlap index comparison of different methods on hospital and VOT 2015 datasets

Hospital and SCD datasets:

Method                 FOV-CA (%)   Scissors-CA (%)   Low-contrast CMRI (%)   High-contrast CMRI (%)
Paragios et al. [42]   64           65                64                      65
Du et al. [31]         67           69                68                      72
Proposed method        72           74                73                      76

VOT 2015 dataset:

Method                 Rabbit (%)   Shaking (%)   Racing (%)   Octopus (%)
Paragios et al. [42]   69           70            68           71
Du et al. [31]         72           75            70           76
Proposed method        83           85            82           88

The results of the proposed method are shown in bold
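The sub-threshold condition above can be checked numerically; the sketch below assumes the standard double-well potential with a unit quartic coefficient, so that the threshold amplitude is √(4a³/27) (the function name and the unit-coefficient assumption are ours, not the authors'):

```python
import math

def is_subthreshold(e, a):
    """Check the sub-threshold condition e < sqrt(4*a^3/27) for a
    bistable double-well system (unit quartic coefficient assumed);
    only when it holds is stochastic resonance applicable."""
    threshold = math.sqrt(4.0 * a ** 3 / 27.0)
    return e < threshold
```

For a = 1, the threshold is √(4/27) ≈ 0.385, which is why a very small value of the factor keeps e comfortably within the allowable range.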
Conclusions and future work
A variational framework has been presented to track the motion of moving objects and the field of view in surgery sequences. We have presented a method that uses SR to denoise the input frames and a combined registration–segmentation framework to conduct motion tracking. We have introduced a robust similarity metric and an efficient energy functional in this framework. Despite the fact that the input data contain varying illumination, motion blur, lack of image texture, occlusion, and fast object movements, the performance of the proposed method is found to be quite satisfactory. In future, we intend to evaluate the method extensively and quantitatively so that it is well tested before being tried in clinical practice.
Acknowledgements Open Access funding provided by the Qatar National Library. This work was partly supported by NPRP Grant #NPRP 5-792-2-328 from the Qatar National Research Fund (a member of Qatar Foundation).

Compliance with ethical standards

Conflict of interest The authors declare that they have no conflict of interest.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
1. Di Salvo TG, Acker MA, Dec GW, Byrne JG (2010) Mitral valve surgery in advanced heart failure. J Am Coll Cardiol 55:271–282
2. Zhai X, Eslami M, Hussein ES, Filali MS, Shalaby ST, Amira A, Bensaali F, Dakua S, Abinahed J, Al-Ansari A, Ahmed AZ (2018) Real-time automated image segmentation technique for cerebral aneurysm on reconfigurable system on chip. J Comput Sci 27:35–45
3. Ganni S, Botden SMBI, Chmarra M (2018) A software-based tool for video motion tracking in the surgical skills assessment landscape. Surg Endosc 32:2994
4. Jakimowicz JJ, Buzink S (2015) Training curriculum in minimal access surgery. In: Francis N, Fingerhut A, Bergamaschi R, Motson R (eds) Training in minimal access surgery. Springer, London, pp 15–34
5. Feng C, Haniffa H, Rozenblit JW, Peng J, Hamilton AJ, Salkini M (2006) Surgical training and performance assessment using a motion tracking system. In: International Mediterranean modelling multiconference, I3M, pp 647–652
6. Carroll SM, Kennedy AM, Traynor O, Gallagher AG (2009) Objective assessment of surgical performance and its impact on a national selection programme of candidates for higher surgical training in plastic surgery. J Plast Reconstr Aesthet Surg 62:1543–1549
7. Pruliere-Escabasse V, Coste A (2010) Image-guided sinus surgery. Eur Ann Otorhinolaryngol Head Neck Dis 127:33–39
8. Tjardes T, Shafizadeh S, Rixen D, Paffrath T, Bouillon B, Steinhausen ES, Baethis H (2010) Image-guided spine surgery: state of the art and future directions. Eur Spine J 19:25–45
9. Shaharan S, Nugent E, Ryan DM, Traynor O, Neary P, Buckley D (2016) Basic surgical skill retention: can Patriot motion tracking system provide an objective measurement for it? J Surg Educ 73:245–9
10. Zhao Z, Voros S, Weng Y, Chang F, Li R (2017) Tracking-by-detection of surgical instruments in minimally invasive surgery via the convolutional neural network deep learning-based method. Comput Assist Surg 22:26–35
11. Zhang M, Wu B, Ye C, Wang Y, Duan J, Zhang X, Zhang N (2019) Multiple instruments motion trajectory tracking in optical surgical navigation. Opt Express 27:15827–15845
12. Berry D (2009) Percutaneous aortic valve replacement: an important advance in cardiology. Eur Heart J 30:2167–2169
13. Kobayashi S, Cho B, Huaulme A, Tatsugami K, Honda H, Jannin P, Hashizume M, Eto M (2019) Assessment of surgical skills by using surgical navigation in robot-assisted partial nephrectomy. Int J Comput Assist Radiol Surg. https://doi.org/10.1007/s11548-019-01980-8
14. Docquier PL, Paul L, TranDuy K (2016) Surgical navigation in paediatric orthopaedics. EFORT Open Rev 1:152–159
15. Tadayyon H, Lasso A, Kaushal A, Guion P, Fichtinger G (2011) Target motion tracking in MRI-guided transrectal robotic prostate biopsy. IEEE Trans Biomed Eng 58:3135–42
16. Ozkan E, Tanner C, Kastelic M, Mattausch O, Makhinya M, Goksel O (2017) Robust motion tracking in liver from 2D ultrasound images using supporters. Int J Comput Assist Radiol Surg 12:941–950
17. Liu TJ, Ko AT, Tang YB, Lai HS, Chien HF, Hsieh TM (2016) Clinical application of different surgical navigation systems in complex craniomaxillofacial surgery: the use of multisurface 3-dimensional images and a 2-plane reference system. Ann Plast Surg 76:411–9
18. Engelhardt S, Simone RD, Al-Maisary S, Kolb S, Karck M, Meinzer HP, Wolf I (2016) Accuracy evaluation of a mitral valve surgery assistance system based on optical tracking. Int J Comput Assist Radiol Surg 11:1891–904
19. Niehaus R, Schilter D, Fornaciari P, Weinand C, Boyd M, Ziswiler M, Ehrendorfer S (2017) Experience of total knee arthroplasty using a novel navigation system within the surgical field. Knee 24:518–524
20. Kim BG, Park DJ (2004) Unsupervised video object segmentation and tracking based on new edge features. Pattern Recognit Lett 25:1731–1742
21. Subudhi BN, Nanda PK, Ghosh A (2011) A change information based fast algorithm for video object detection and tracking. IEEE Trans Circuits Syst Video Technol 21:993–1004
22. Duffner S, Garcia C (2017) Fast pixel wise adaptive visual tracking of non rigid objects. IEEE Trans Image Process 26:2368–2380
23. Li J, Zhou X, Chan S, Chen S (2017) Robust object tracking via large margin and scale adaptive correlation filter. IEEE Access 6:12642–12655
24. Mahalingam T, Subramoniam M (2018) A robust single and multiple moving object detection, tracking and classification. Appl Comput Inf. https://doi.org/10.1016/j.aci.2018.01.001
25. Zhang T, Liu S, Xu C, Liu B, Yang M (2018) Correlation particle filter for visual tracking. IEEE Trans Image Process 27:2676–2687
26. Yang yang G, Dong-jian H, Cong L (2018) Target tracking and 3D trajectory acquisition of cabbage butterfly based on the KCF-BS algorithm. Sci Rep Nat 8:9622
27. Du B, Sun Y, Cai S, Wu C, Du Q (2018) Object tracking in satellite videos by fusing the kernel correlation filter and the three-frame-difference algorithm. IEEE Trans Image Process 15:168–1821
28. Ning J, Zhang L, Zhang D, Yu W (2013) Joint registration and active contour segmentation for object tracking. IEEE Trans Circuits Syst Video Technol 23:1589–1597
29. Liu G, Liu S, Muhammad K, Sangaiah A, Doctor F (2018) Object tracking in vary lighting conditions for fog based intelligent surveillance of public spaces. IEEE Access 6:29283–29296
30. Liu S, Feng Y (2018) Real-time fast moving object tracking in severely degraded videos captured by unmanned aerial vehicle. Int J Adv Robot Syst SAGE 11:1–10
31. Du D, Wen L, Qi H, Huang Q, Tian Q, Lyu S (2018) Iterative graph seeking for object tracking. IEEE Trans Image Process 27:1809–1821
32. Radau P, Lu Y, Connelly K, Paul G, Dick AJ, Wright GA (2009) Evaluation framework for algorithms segmenting short axis cardiac MRI. MIDAS J Cardiac MR Left Ventricle Segm Chall
33. Dakua S, Abinahed J, Al-Ansari A (2018) A PCA based approach for brain aneurysm segmentation. J Multi Dimens Syst Signal Process 29:257–277
34. Rallabandi V, Roy P (2010) MRI enhancement using stochastic resonance in Fourier domain. Magn Reson Imaging 28:1361–1373
35. vom Scheidt J, Gard TC Introduction to stochastic differential equations. Pure and applied mathematics 114, XI, 234 pp. Marcel Dekker Inc., New York. ISBN 0-8247-7776-X
36. Yao SJ, Song YH, Zhang LZ, Cheng XY (2000) MODWT and networks for short-term electrical load forecasting. Energy Convers Manag 41:1975–1988
37. Comaniciu D, Ramesh V, Meer P (2003) Kernel-based object tracking. IEEE Trans Pattern Anal Mach Intell 25:564–577
38. Vezzetti E, Marcolin F (2015) Similarity measures for face recognition. Bentham Books, Sharjah, United Arab Emirates. ISBN: 978-1-68108-045-1
39. Erdem CE, Sankur B, Tekalp AM (2004) Performance measures for video object segmentation and tracking. IEEE Trans Image Process 13:937–951
40. Matej K, Matas J, Leonardis A, Felsberg M, Cehovin L (2015) The visual object tracking VOT 2015 challenge results. In: IEEE international conference on computer vision workshop, pp 564–586
41. Wang Z, Sheikh HR, Bovik AC (2002) No-reference perceptual quality assessment of JPEG compressed images. In: Proceedings of IEEE international conference on image processing, pp 477–480
42. Paragios N, Deriche R (2000) Geodesic active contours and level sets for the detection and tracking of moving objects. IEEE Trans Pattern Anal Mach Intell 22:266–280
43. Rosenfield GH, Fitzpatrick-Lins K (1986) A coefficient of agreement as a measure of thematic classification accuracy. Photogramm Eng Remote Sens 52:223–227
44. Shi F, Yang Q, Guo X, Qureshi T, Tian Z, Miao H, Dey D, Li D, Fan Z (2019) Vessel wall segmentation using convolutional neural networks. IEEE Trans Biomed Eng. https://doi.org/10.1109/TBME.2019.2896972
45. http://www.votchallenge.net/vot2015/dataset.html
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.