International Journal of Computer Assisted Radiology and Surgery (2019) 14:2165–2176
https://doi.org/10.1007/s11548-019-02030-z
ORIGINAL ARTICLE
Moving object tracking in clinical scenarios: application to cardiac surgery and cerebral aneurysm clipping
Sarada Prasad Dakua1 · Julien Abinahed1 · Ayman Zakaria1 · Shidin Balakrishnan1 · Georges Younes1 · Nikhil Navkar1 · Abdulla Al-Ansari1 · Xiaojun Zhai3 · Faycal Bensaali2 · Abbes Amira4
Received: 28 January 2019 / Accepted: 3 July 2019 / Published online: 15 July 2019
© The Author(s) 2019
Abstract
Background and objectives: Surgical procedures such as laparoscopic and robotic surgeries are popular because they are minimally invasive in nature and use miniaturized surgical instruments through small incisions. Tracking the instruments (graspers, needle drivers) and the field of view from the stereoscopic camera during surgery could further help surgeons remain focused and reduce the probability of mistakes. Tracking is commonly applied in computerized video surveillance, traffic monitoring, military surveillance systems, and vehicle navigation. Despite numerous efforts over the last few years, object tracking still remains an open research problem, mainly due to motion blur, image noise, lack of image texture, and occlusion. Most existing object tracking methods are time-consuming and less accurate when the input video contains a high volume of information and a large number of instruments.
Methods: This paper presents a variational framework to track the motion of moving objects in surgery videos. The key contributions are as follows: (1) a denoising method using stochastic resonance in the maximal overlap discrete wavelet transform is proposed, and (2) a robust energy functional based on the Bhattacharyya coefficient is developed to match the target region in the first frame of the input sequence with the subsequent frames using a similarity metric. A modified affine transformation-based registration is used to estimate the motion of the features, followed by an active contour-based segmentation method to converge the contour resulting from the registration process.
Results and conclusion: The proposed method has been implemented on publicly available databases; the results are found satisfactory. Overlap index (OI) is used to evaluate the tracking performance, and the maximum OI is found to be 76% and 88% on private and public data sequences, respectively.
Keywords: Cerebral aneurysm · Segmentation · Object tracking · Heart surgery · Brain aneurysm clipping · Level sets
Introduction
Looking at the steep rise in cardiac diseases, bona fide treatment including surgery is necessary to prevent their rise and avoid sudden cardiac death [1]. Similarly, cerebral
✉ Sarada Prasad Dakua, [email protected]
1 Department of Surgery, Hamad Medical Corporation, Doha, Qatar
2 Department of Electrical Engineering, Qatar University, Doha, Qatar
3 School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK
4 Faculty of Computing, Engineering and Media, De Montfort University, Leicester, UK
aneurysm (CA) is one of the most devastating cerebrovascular diseases of the adult population worldwide, causing subarachnoid hemorrhage, intracerebral hematoma, and other complications leading to a high mortality rate [2]. Surgery is considered an efficient modality for patients with cardiac complications and ruptured cerebral aneurysms. Tracking could be considered a support for treatment and planning in robotic surgery, laparoscopic surgery, and medical education. During robotic or laparoscopic surgery, surgeons concentrate on the procedure to avoid even slight possible mortality and morbidity and usually get stressed. In this scenario, motion tracking of the tools and viewing the desired operating field may be considered two supportive pillars to augment the treatment and improve the success rate.
Clinical requirements in surgery
Many factors contribute to the successful outcome of a surgery, specifically minimally invasive surgery (MIS). These include technical factors, such as an in-depth understanding of the relevant anatomy, a clear understanding of the steps involved in the procedure, and well-honed surgical skills and tool manipulation, as well as anthropomorphic factors such as operating team chemistry and dynamics. To a certain degree, MIS surgeons can advance their anatomy knowledge and procedural understanding through reading and surgical videos; however, other technical skills such as tool manipulation and positioning, which are crucial to the successful outcome of the surgery [3,4], are more complex, nuanced, and time dependent to develop due to restricted vision, limited working space, and loss of visual cues and tactile feedback [5]. Quality and adequacy of surgical proficiency directly impact intra-operative and postoperative outcomes [6]. The existing "apprenticeship" model of training in surgery provides limited and time-consuming opportunities to gain the required technical competencies. In its current form, the assessment of surgical proficiency is heavily reliant on subject-matter experts and subjective assessments [3]. Thus, surgical training and planning could benefit greatly from the visual support provided by instrument/motion tracking, by providing benchmarked metrics for continued objective and constructive assessment of the highest standards of surgical skills, and by lowering the risk of false tool trajectories and orientations [7] and of faulty alignment of implants and placement of screws [8].
Such augmented visual support for both surgical training and planning could be provided through object/motion tracking of the tools (such as scope, scissors, etc.) by providing objective assessment, benchmarking, and automated feedback on metrics such as path length/deviation, economy and smoothness of hand movements, depth perception, rotational orientation, changes in instrument velocity, and time [9]. Zhao et al. [10] report that intra-operative tracking/detection of surgical instruments can provide important information to monitor instruments for operational navigation in MIS, especially in robotic minimally invasive surgeries (RMIS). Thus, based on the above, the perceived impact of tool tracking/positioning on surgical training and intra-operative guidance leads to (a) ensured patient safety via proficient tool movements and avoidance of critical tissue structures and (b) facilitation of a smooth and efficient invasive procedure [11]. This is crucial in surgery, as by continuously charting the location, movement, speed, and acceleration of the different surgical instruments in the operating field, the surgeon is continuously aware of the whereabouts of the instruments in relation to the patient's vital organs, blood vessels, and nerves during surgery. For surgical training, it objectively helps assess surgical performance and differentiate between an expert and a novice surgeon, such that optimal training can then be provided to the novice to ensure the highest levels of patient care [3]. Therefore, precise positioning of the tools remains pivotal in minimally invasive surgical procedures [12], highlighting the need for object tracking via its impact on surgical training and intra-operative guidance.
Kobayashi et al. [13] applied surgical navigation techniques and tool tracking to renal artery dissection within the robot-assisted partial nephrectomy procedure and found that inefficient tool movements involving "insert," "pull," and "rotate" motions, as well as the time to visualize and dissect the artery, were significantly improved owing to improved visualization and control over the tool and anatomy. Pediatric orthopedic surgeons found an increase in accuracy and a reduction in operating time when using image-guided surgical robotic systems to overcome the inaccuracies of hand-controlled tool positioning [14]; these robots achieve this by providing information about surgical tools or implants relative to a target organ (bone). In urology, motion tracking can greatly assist in outpatient procedures such as MRI- and ultrasound-guided prostate biopsy, allowing the surgeon to accurately position and invade suspicious malignant zones for a tissue sample [15]. In interventional radiology, motion tracking can help track guide-wires during endovascular interventions and radiation therapy [16]. In addition to these, applications of surgical navigation systems and tool tracking/motion analysis are being explored in many other surgical fields, including ear-nose-and-throat (ENT) surgery [7], craniomaxillofacial surgery [17], cardiothoracic surgery [18], and orthopedic surgery [19].
Related work
The literature on motion tracking is rich; a few recent methods are included in this paper. Kim and Park [20] present a strategy based on edge information to assist object-based video coding, motion estimation, and motion compensation for MPEG-4 and MPEG-7, utilizing human visual perception to provide edge information. However, the method critically depends on its ability to establish correct correspondences between points on the model edges and edge pixels in an image; this is a non-trivial problem, especially in the presence of large inter-frame motions and cluttered environments. Subudhi et al. [21] propose a two-step method, spatio-temporal spatial segmentation and temporal segmentation, that uses a Markov random field (MRF) model and maximum a posteriori probability (MAP) estimation. Duffner and Garcia [22] present an algorithm for real-time single-object tracking, where a detector makes use of the generalized Hough transform with color and gradient descriptors; a probabilistic segmentation method is used for the foreground and background color distributions. However, it is computationally expensive, especially when the number
of parameters is large. It could also be erroneous, because gradient information usually leads to error when the noise level is high. Li et al. [23] suggest a method within the correlation framework (CF) that models a tracker maximizing the margin between the target and the surrounding background by exploiting background information effectively. They propose to train a CF by multilevel scale supervision, which aims to make the CF sensitive to target scale variation. The two individual modules are then integrated into one framework, simplifying the tracking model. However, the computational load and efficiency are still two major concerns. Mahalingam et al. [24] propose a fuzzy morphological filter and blob detection-based method for object tracking. However, its performance deteriorates in the presence of noise, lack of illumination, and occlusion. Zhang et al. [25] propose a correlation particle filter (CPF) that combines a correlation filter and a particle filter. However, this tracker is still unable to deal with scale variation and partial occlusion. Yang et al. [26] present a method to analyze frames extracted from videos using kernelized correlation filters (KCF) and background subtraction (BS) (KCF-BS) to plot the 3D trajectory of a cabbage butterfly. The KCF-BS algorithm is used to track the butterfly in video frames and obtain the coordinates of the target centroid in two videos. However, it is noticed that the target sometimes gets lost and the method is unable to re-detect or recognize the target when the target motion is fast. Du et al. [27] propose an object tracking method for satellite videos that fuses a KCF tracker with a three-frame difference algorithm. Although the method reports interesting results, it takes a long time to perform. Liu et al. [29] propose a correlation filter-based tracker that consists of multiple position detections and alternate templates. The detection position is repositioned according to the speed of the target estimated by an optical flow method, and the alternate template is stored with a template update mechanism. However, this method fails to perform if the size of each target is too small compared with the entire image, or if the target and the background are very similar. Liu et al. [30] propose a method integrating a histogram of oriented gradients, an RGB histogram, and a motion histogram into a novel statistical model to track the target in unmanned aerial vehicle-captured videos. However, it fails to perform in occluded scenes.
Du et al. [31] present a method based on iterative graph seeking. Usually, superpixel-based methods use mid-level visual cues to represent target parts, where local appearance variations are exploited by the superpixel representation. These methods have three sequential steps: (A) target part selection, (B) target part matching, and (C) target state estimation. Step (A) selects candidate target parts from the background, in step (B) a local appearance model associates parts between consecutive frames, and in step (C) the target state (center pixel location and size of the target) is estimated based on majority voting of the matching results. This method integrates target part selection, part matching, and state estimation in a unified energy minimization framework. It incorporates structural information in local part variations using a global constraint. Although the reported results are promising, when target part selection and target part matching are combined with the correlation filter, the estimation of the target takes a long time to converge due to scale variation and partial occlusion, which are bound to happen in surgery scenarios. Furthermore, when the noise level in the input frames is high (for instance, in cardiac cine MRI data), the method would certainly struggle to perform. We intend to address these issues through our proposed method. Moreover, if the literature above is carefully observed, noise has always been an issue in most of the methods. Therefore, in our proposed method, we first denoise the input frames. The target region in the first frame is chosen by a level set (LS) function, and then the foreground and background models are generated. The foreground and background distributions are determined using the models in subsequent frames, and the motion of the pixels from the region of interest is estimated through a registration framework. Additionally, the selected region contour in the current frame is registered with the subsequent frame. Finally, segmentation is applied to refine the contour generated during registration, and the contour is updated.
The paper is organized as follows: "Methodology and data" describes the denoising stage; "Target rendering" presents the approach for target rendering (including region selection and developing models); "Registration" defines a method for motion estimation through registration; "Segmentation" presents the segmentation; "Results and discussion" provides the results, while "Conclusions and future work" concludes the paper.
Methodology and data
The method is illustrated in Fig. 1. First, the input frame is denoised to minimize the negative impact of noise on subsequent steps. The target region is then selected, followed by the development of foreground and background models for motion estimation through a registration framework. Finally, the rough contour generated in the registration step is further refined (by a proper segmentation method) and the contour is updated on subsequent frames.
Denoising of image sequences
Over the years, most methods have addressed noisy and cluttered medical images mostly by filtering, which results in significant degradation of image quality. One of the efficient approaches that counters noise and constructively utilizes noise
Fig. 1 Block diagram describing the proposed method: input frame → denoising → target region selection and initialization of LS → registration → segmentation → object tracked
is stochastic resonance (SR) [33]. SR occurs if the signal-to-noise ratio (SNR) and the input/output correlation have a well-marked maximum at a certain noise level. Unlike very low or high noise intensities, moderate ones allow the signal to cross the threshold, giving maximum SNR at some optimum noise level. In the bistable SR model, upon addition of zero-mean Gaussian noise, the pixel is transferred from a weak signal state to a strong signal state, which is modeled by the Brownian motion of a particle placed in a double-well potential system. The state at which the performance metrics are found optimum can be considered the stable state providing maximum SNR. There have already been many attempts to use SR in different domains such as the Fourier and spatial domains [34]; however, we have chosen the maximal overlap discrete wavelet transform (MODWT) [36] because of some of its key advantages: (1) MODWT can handle any sample size, (2) the smooth and detail coefficients of MODWT multiresolution analysis are associated with zero-phase filters, (3) it is translation invariant, and (4) it produces a more asymptotically efficient wavelet variance estimator than the DWT.
Maximal overlap discrete wavelet transform
Generally, the DWT is defined by $\psi_{j,k}(t) = 2^{j/2}\,\psi(2^j t - k)$, $j, k \in \mathbb{Z}$, where $\psi$ is a compactly supported real-valued function and $\int_{-\infty}^{\infty} \psi(t)\,dt = 0$. The MODWT is evaluated using the dilation equations $\phi(t) = \sqrt{2}\sum_k l_k\,\phi(2t - k)$ and $\psi(t) = \sqrt{2}\sum_k h_k\,\phi(2t - k)$, where $\phi$ is the father wavelet defining the low-pass filter coefficients and $\psi$ is the mother wavelet defining the high-pass filter coefficients: $l_k = \sqrt{2}\int_{-\infty}^{\infty} \phi(t)\,\phi(2t - k)\,dt$ and $h_k = \sqrt{2}\int_{-\infty}^{\infty} \psi(t)\,\psi(2t - k)\,dt$.
Denoising by MODWT
In this methodology, the 2D MODWT is applied to the $M \times N$ image $I$. Applying SR to the approximation and detail coefficients, the stochastically enhanced (tuned) coefficient sets in the MODWT domain are obtained as $W^s_\psi(l, p, q)_{SR}$ and $W_\phi(l_0, p, q)_{SR}$. The SR dynamics are defined as $\frac{dx}{dt} = \left[ax - ex^3\right] + B\sin\omega t + \sqrt{D}\,\xi(t)$, where $\sqrt{D}\,\xi(t)$ and $B\sin\omega t$ represent the noise and the input, respectively; these are replaced by the MODWT sub-band coefficients. The noise term is the factor that produces SR; maximization of the SNR occurs at the double-well parameter $a$. Implementation of SR on digital images necessitates solving the stochastic differential equation using the Euler–Maruyama method [35], which gives the iterative discrete equation:

$$x(n+1) = x(n) + \Delta t \left[ \left( a\,x(n) - e\,x^3(n) \right) + \mathrm{Input}(n) \right] \quad (1)$$
where $a$ and $e$ are the bistable parameters, and $n$ and $\Delta t$ represent the iteration index and the sampling time, respectively. $\mathrm{Input}$ denotes the sequence of input signal and noise, with the initial condition $x(0) = 0$. The final stochastic simulation is obtained after some predefined number of iterations.
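The iterative scheme of Eq. (1) is straightforward to implement; below is a minimal sketch that applies the update independently to a list of sub-band coefficients (the values of `a`, `e`, `dt`, and `n_iter` are illustrative placeholders, not the tuned values used in the paper):

```python
def sr_iterate(coeffs, a=2.0, e=0.5, dt=0.007, n_iter=200):
    """Euler-Maruyama iteration of the bistable SR model, Eq. (1):
    x(n+1) = x(n) + dt * [a*x(n) - e*x(n)**3 + input(n)],
    applied independently to each wavelet coefficient."""
    out = []
    for c in coeffs:
        x = 0.0  # initial condition x(0) = 0
        for _ in range(n_iter):
            x = x + dt * ((a * x - e * x ** 3) + c)
        out.append(x)
    return out

# Tune a small set of sub-band coefficients; the sign of each input
# decides which well of the double-well potential the state settles in.
tuned = sr_iterate([0.1, -0.3, 0.8])
```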
Given the tuned (enhanced and stabilized) set of wavelet coefficients ($X_\phi(l_0, p, q)$ and $X^s_\psi(l, p, q)$), the denoised image $I_{\mathrm{denoised}}$ in the spatial domain is obtained by the inverse maximal overlap discrete wavelet transform (IMODWT) as:

$$I_{\mathrm{denoised}} = \frac{1}{\sqrt{MN}} \sum_{p} \sum_{q} X_\phi(l_0, p, q)\,\phi_{l_0,p,q}(i, j) + \frac{1}{\sqrt{MN}} \sum_{s \in (H,V,D)} \sum_{l = l_0} \sum_{p} \sum_{q} X^s_\psi(l, p, q)\,\psi^s_{l,p,q}(i, j)$$
The double-well parameters $a$ and $e$ are determined from the SNR by differentiating the SNR with respect to $a$ and equating to zero; in this way, the SNR is maximized, resulting in $a = 2\sigma_0^2$ for maximum SNR, where $\sigma_0$ is the noise level administered to the input image. The maximum possible value of the restoring force $R = B\sin\omega t$ follows from the gradient of the bistable potential function $U(x)$: $R = -\frac{dU}{dx} = -ax + ex^3$, and $\frac{dR}{dx} = -a + 3ex^2 = 0$ gives $x = \sqrt{a/3e}$. At this value, the maximum force is $\sqrt{4a^3/27e}$, so $B\sin\omega t < \sqrt{4a^3/27e}$. Maximizing the left term (keeping $B = 1$) gives $e < 4a^3/27$. To obtain the parameter values, we consider $a = w \times 2\sigma_0^2$ and $e = z \times \sqrt{4a^3/27}$, where $w$ and $z$ are weight parameters for $a$ and $e$. Initially, $w$ is an experimentally chosen constant that later
becomes dependent on the standard deviation of the input image, while $z$ is a number less than 1 to ensure the sub-threshold condition of the signal. In this way, the noise in the input image is countered and maximum information from the image is retained.
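The parameter selection above can be sketched as follows (a minimal illustration; the weights `w` and `z` are illustrative defaults, with `z < 1` enforcing the sub-threshold condition):

```python
import math

def bistable_params(sigma0, w=1.0, z=0.9):
    """Double-well parameters from the noise level sigma0:
    a = w * 2*sigma0**2 (the value maximizing the SNR), and
    e = z * sqrt(4*a**3/27), scaled by z < 1 to keep the
    restoring force below its maximum sqrt(4*a**3/27)."""
    a = w * 2.0 * sigma0 ** 2
    e = z * math.sqrt(4.0 * a ** 3 / 27.0)
    return a, e

a, e = bistable_params(0.5)
```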
Target rendering
Target region selection, or target rendering [28,37], is the initial step in this motion tracking. Then the features (such as intensity, color, edge, texture, etc.) that can appropriately describe the target are selected. The notations used in target rendering are: $fs$, the feature space; $r$, the number of features; $fd$, the foreground distribution (by the features); and $bd$, the background distribution. The region is initialized on the first frame and represented by a level set function $\phi$ because of its flexibility in choosing the contour. The distributions of the foreground ($\phi \geq 0$) and background ($-th < \phi < 0$, where $th$ is the threshold restricting the region of interest to a small area) regions are represented by $fg(\phi)$ and $bg(\phi)$, respectively, and are matched with $fd$ and $bd$. Next, the foreground and background models are generated. Suppose the pixels $\{x_{f,i}\}_{i=1,\ldots,n_f}$ and $\{x_{b,i}\}_{i=1,\ldots,n_b}$ fall in the foreground and background regions; the function $z: \mathbb{R}^2 \to \{1, \ldots, r\}$ can be used to map a pixel $x_i$ into the bin $b(x_i)$ in feature space. The probability of the feature space in the models is $fd_{fs} = \frac{1}{n_f}\sum_{i=1}^{n_f} \delta\left[b(x_{f,i}) - fs\right]$ and $bd_{fs} = \frac{1}{n_b}\sum_{i=1}^{n_b} \delta\left[b(x_{b,i}) - fs\right]$, where $\delta$ is the Kronecker delta function and $n_f$ and $n_b$ are the numbers of pixels in the foreground and background, respectively. The foreground and background distributions in the candidate region of the current frame ($-th < \phi < 0$) are obtained as:

$$fg(\phi) = \frac{1}{F_f} \sum_{i=1}^{n} H(\phi(x_i))\,\delta\left[b(x_i) - fs\right] \quad \text{and} \quad bg(\phi) = \frac{1}{F_b} \sum_{i=1}^{n} \left(1 - H(\phi(x_i))\right)\delta\left[b(x_i) - fs\right] \quad (2)$$

where $H(\cdot)$ is the Heaviside function selecting the foreground region, and $F_f$ and $F_b$ are the normalization factors.
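Equation (2) amounts to a pair of normalized feature histograms gated by the Heaviside function of the level set. A minimal single-feature (intensity) sketch, where the uniform binning and the bin count are arbitrary choices for illustration:

```python
def heaviside(v):
    """H(v) = 1 for v >= 0 (foreground), 0 otherwise (background)."""
    return 1.0 if v >= 0 else 0.0

def region_histograms(pixels, phi, n_bins=8):
    """Eq. (2): fg[fs] = (1/Ff) * sum_i H(phi(x_i)) * delta[b(x_i) - fs],
    bg[fs] = (1/Fb) * sum_i (1 - H(phi(x_i))) * delta[b(x_i) - fs].
    `pixels` are intensities in [0, 1); `phi` gives the level-set value
    at each pixel."""
    fg = [0.0] * n_bins
    bg = [0.0] * n_bins
    for x, p in zip(pixels, phi):
        b = min(int(x * n_bins), n_bins - 1)  # bin index b(x_i)
        h = heaviside(p)
        fg[b] += h
        bg[b] += 1.0 - h
    ff, fb = sum(fg) or 1.0, sum(bg) or 1.0  # normalization factors Ff, Fb
    return [v / ff for v in fg], [v / fb for v in bg]

# Two bright pixels inside the contour (phi >= 0), two dark ones outside
fg, bg = region_histograms([0.1, 0.15, 0.9, 0.85], [1.0, 0.5, -0.2, -0.1])
```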
Registration
Registration of the target in the first frame with the subsequent frame is performed to estimate the affine deformation of the target. We determine the foreground and background distributions in the frames and match them with the respective foreground and background models. We use the Bhattacharyya metric [38] because it is computationally fast and has been used in face recognition for years. Additionally, it has a straightforward geometric interpretation: since it is the cosine of the angle between $fd$ and $fg(\phi)$, or between $bd$ and $bg(\phi)$, a higher value of the coefficient indicates a better match between the candidate and target models. Thus, our similarity distance measure is:

$$En_1(\phi) = \sum_{fs=1}^{r} \left( \sqrt{fg_{fs}(\phi)\,fd_{fs}} + \gamma \sqrt{bg_{fs}(\phi)\,bd_{fs}} \right) \quad (3)$$

where $\gamma$ is the weight balancing the contributions of the foreground and the background in the matching.
For deformation estimation, we have proposed a simple and efficient framework as follows. Suppose that in the current frame $\phi_0$ is the initial position of the target and the contour is obtained by $\phi = 0$. The probabilities $fg(\phi_0) = \{fg_{fs}(\phi_0)\}_{fs=1,\ldots,r}$ and $bg(\phi_0) = \{bg_{fs}(\phi_0)\}_{fs=1,\ldots,r}$ are computed. Applying Taylor's expansion:

$$En_1(\phi) = \frac{1}{2}\left( \sum_{fs=1}^{r} \sqrt{fg_{fs}(\phi_0)\,fd_{fs}} + \sum_{fs=1}^{r} fg_{fs}(\phi) \sqrt{\frac{fd_{fs}}{fg_{fs}(\phi_0)}} \right) + \frac{\gamma}{2}\left( \sum_{fs=1}^{r} \sqrt{bg_{fs}(\phi_0)\,bd_{fs}} + \sum_{fs=1}^{r} bg_{fs}(\phi) \sqrt{\frac{bd_{fs}}{bg_{fs}(\phi_0)}} \right) \quad (4)$$
Substituting Eq. (2) into (4), we get:

$$En_1(\phi) = \frac{1}{2}\left( \sum_{fs=1}^{r} \sqrt{fg_{fs}(\phi_0)\,fd_{fs}} + \frac{1}{F_f} \sum_{i=1}^{n} h_{f,i}\,H(\phi(x_i)) \right) + \frac{\gamma}{2}\left( \sum_{fs=1}^{r} \sqrt{bg_{fs}(\phi_0)\,bd_{fs}} + \frac{1}{F_b} \sum_{i=1}^{n} h_{b,i}\left(1 - H(\phi(x_i))\right) \right) \quad (5)$$

where the weights, which play a pivotal role in detecting the new centroid of the target, are $h_{f,i} = \sum_{fs=1}^{r} \sqrt{\frac{fd_{fs}}{fg_{fs}(\phi_0)}}\,\delta\left[b(x_i) - fs\right]$ and $h_{b,i} = \sum_{fs=1}^{r} \sqrt{\frac{bd_{fs}}{bg_{fs}(\phi_0)}}\,\delta\left[b(x_i) - fs\right]$. A higher value of the Bhattacharyya coefficient can be obtained by maximizing (5), which is a function of location $x$ and contour.
Furthermore, we consider the foreground and background intensity as an additional feature. Suppose the first frame, $u_0(x, y)$, consists of two concentric regions ($u_0^i$, $u_0^o$), meaning the input image contains more than one intensity label. This certainly makes determining a smooth contour initialization and deformation challenging because of the varying intensities. Therefore, we integrate both local and global image information into the energy term in order to make it perform as a perfect step detector with respect to the initialization of the contour. The energy term is defined as:

$$En_2 = \lambda_1 E_G + \lambda_2 E_L + E_R \quad (6)$$

where $\lambda_1$ and $\lambda_2$ are fixed constants; $E_G$, $E_L$, and $E_R$ are the global, local, and regularization terms, respectively
(containing the respective image information). $E_R$ controls the boundary smoothness. The local term is defined as

$$E_L = \int_{\phi_0} \frac{\left(g_k * u_0(x, y) - u_0(x, y) - d_2(x, y)\right)^2}{d_2(x, y)^2}\,dx\,dy \quad (7)$$

where $g_k$ is an averaging filter of size $k \times k$, and $d_1$ and $d_2$ are the intensity averages of the difference image $g_k * u_0(x, y) - u_0(x, y)$ inside and outside the evolving curve $C$, respectively. The global term is

$$E_G = \int_{\phi_0} \frac{\left(u_0(x, y) - c_2(x, y)\right)^2}{c_2(x, y)^2}\,dx\,dy \quad (8)$$
where $c_1$ and $c_2$ represent the average intensity of $u_0(x, y)$ inside and outside $C$, respectively. $c_1$ and $c_2$ are approximated by a weighted average of the image intensity $u_0(p, q)$, where $(p, q)$ is a neighborhood of $(x, y)$. This means $c_1(x, y)$ and $c_2(x, y)$ are spatially varying; we formulate them as

$$c_1(x, y) = \frac{\int_\Omega g_k\left((x, y) - (p, q)\right)u_0(p, q)\,H(\phi(p, q))\,dp\,dq}{\int_\Omega g_k\left((x, y) - (p, q)\right)H(\phi(p, q))\,dp\,dq}, \qquad c_2(x, y) = \frac{\int_\Omega g_k\left((x, y) - (p, q)\right)u_0(p, q)\left(1 - H(\phi(p, q))\right)dp\,dq}{\int_\Omega g_k\left((x, y) - (p, q)\right)\left(1 - H(\phi(p, q))\right)dp\,dq}.$$

We use the conventional regularizing term $E_R$, which includes a penalty on the total length of the edge contour for a given segmentation and another penalty on the total area of the foreground region found by the segmentation. The energy term therefore becomes:
$$\begin{aligned} En_2(\phi) = {}& \mu \int_\Omega \delta(\phi)\,|\nabla\phi|\,dx\,dy + \nu \int_\Omega H(\phi(x, y))\,dx\,dy \\ &+ \lambda_1 \int_\Omega \frac{\left(u_0(x, y) - c_1(x, y)\right)^2 H(\phi(x, y))}{c_1(x, y)^2}\,dx\,dy + \lambda_1 \int_\Omega \frac{\left(u_0(x, y) - c_2(x, y)\right)^2 \left(1 - H(\phi(x, y))\right)}{c_2(x, y)^2}\,dx\,dy \\ &+ \lambda_2 \int_\Omega \frac{\left(g_k * u_0(x, y) - d_1(x, y)\right)^2 H(\phi(x, y))}{d_1(x, y)^2}\,dx\,dy + \lambda_2 \int_\Omega \frac{\left(g_k * u_0(x, y) - d_2(x, y)\right)^2 \left(1 - H(\phi(x, y))\right)}{d_2(x, y)^2}\,dx\,dy \end{aligned} \quad (9)$$
Equation (9) has to be maximized to obtain a higher Bhattacharyya coefficient. The similarity distance measure now becomes:

$$En(\phi) = En_1(\phi) + En_2(\phi) \quad (10)$$

We model the motion of the target as an affine transformation by introducing a warp in (10):

$$\mathbf{x}' = h(\mathbf{x}, \Delta T) = \begin{pmatrix} 1 + fg_1 & fg_3 & fg_5 \\ fg_2 & 1 + fg_4 & fg_6 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} \quad (11)$$
The column vector characterizes the change in pose. Substituting (11) into (10) and omitting the terms that are not a function of the incremental warp $\Delta T$, we obtain:

$$En(\phi) = \frac{1}{2F_f} \sum_{i=1}^{n} H\left(\phi(h(\mathbf{x}, \Delta T))\right) w_{f,i} + \frac{\gamma}{2F_b} \sum_{i=1}^{n} \left(1 - H\left(\phi(h(\mathbf{x}, \Delta T))\right)\right) w_{b,i} \quad (12)$$

As $\Delta T$ tends to 0, the estimation converges. In this way, the registration step iteratively estimates the shape change until convergence.
Segmentation
Since the tracker in the registration stage is still not able to extract the target contour properly, the registration result needs to be refined through segmentation. To do this, we optimize $\phi$ in Eq. (10), because the equation is a function of $\phi$; in other words, $\frac{\partial En(\phi(x_i))}{\partial \phi(x_i)} = 0$. This is solved by the well-known steepest-ascent method: $\frac{\partial En(\phi(x_i), t)}{\partial t} = \frac{\partial En(\phi(x_i))}{\partial \phi(x_i)}$. We obtain:

$$\begin{aligned} \frac{\partial \phi(x, y, t)}{\partial t} = {}& \delta_\epsilon(\phi) \left[ \mu\,\nabla \cdot \left( \frac{\nabla\phi}{|\nabla\phi|} \right) - \nu + \lambda_1 \left( \frac{\left(u_0(x, y) - c_2(x, y)\right)^2}{c_2(x, y)^2} - \frac{\left(u_0(x, y) - c_1(x, y)\right)^2}{c_1(x, y)^2} \right) \right] \\ &+ \lambda_2 \left( \frac{\left(g_k * u_0(x, y) - d_2(x, y)\right)^2}{d_2(x, y)^2} - \frac{\left(g_k * u_0(x, y) - d_1(x, y)\right)^2}{d_1(x, y)^2} \right) + \frac{\Delta t}{2}\,\delta_\epsilon(\phi) \left( \frac{1}{F_f} h_{f,i} - \gamma \frac{1}{F_b} h_{b,i} \right) \end{aligned} \quad (13)$$

$$\frac{\delta_\epsilon(\phi)}{|\nabla\phi|} \frac{\partial \phi}{\partial \vec{n}} = 0 \ \text{on} \ \partial\Omega \quad (14)$$

where $H$ and $\delta_\epsilon$ represent the Heaviside function and the Dirac measure, respectively; $\frac{\partial \phi}{\partial \vec{n}}$ and $\vec{n}$ denote the normal derivative of $\phi$ at the boundary and the exterior normal to the boundary, respectively. Finally, the target is updated on the subsequent frames.
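The steepest-ascent evolution drives φ along the energy gradient until the update vanishes. A generic one-variable sketch of the scheme (the toy energy and step size are illustrative, not Eq. (10) itself):

```python
def steepest_ascent(grad, x0=0.0, dt=0.01, n_iter=1000, tol=1e-8):
    """Iterate x <- x + dt * grad(x) until the update falls below tol,
    mirroring the evolution d(phi)/dt = dEn/d(phi) behind Eq. (13)."""
    x = x0
    for _ in range(n_iter):
        step = dt * grad(x)
        x += step
        if abs(step) < tol:
            break
    return x

# Toy concave energy En(x) = -(x - 2)**2 has gradient -2*(x - 2);
# ascent converges to the maximizer x = 2
x_star = steepest_ascent(lambda x: -2.0 * (x - 2.0))
```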
Fig. 2 a–d Input frames in a video sequence to be denoised. e, f Results of denoising
Data
The datasets used in this work are obtained from private sources such as Hamad Medical Corporation (30 data sequences) and public sources such as Sunnybrook [32] (45 data sequences) and VOT 2015 [40] (60 data sequences). The Sunnybrook Cardiac Data (SCD) consist of cine MRI data from a mixed set of patients and pathologies: healthy, hypertrophy, heart failure with infarction, and heart failure without infarction. A subset of this dataset was first used in the automated myocardium segmentation challenge from short-axis MRI. The VOT 2015 sequences are chosen from a large pool of sequences including ALOV, OTB, non-tracking datasets, Computer Vision Online, Professor Bob Fisher's Image Database, Videezy, the Center for Research in Computer Vision at the University of Central Florida, USA, the NYU Center for Genomics and Systems Biology, Data Wrangling, the Open Access Directory, and the Learning and Recognition in Vision Group, INRIA. The initial pool of sequences was created by combining the sequences from all the sources. After removal of duplicate sequences, grayscale sequences, and sequences containing objects with an area smaller than 400 pixels, the final sequences were obtained; more details can be found on the Web site [45].
Results and discussion
Results
The proposed method is implemented on both private and public databases as described earlier. The qualitative results of denoising are provided in Fig. 2. We have quantitatively compared the proposed denoising method with the Fourier-based approach because of its huge popularity [34]. The perceptual quality measurement (PQM) [41] is provided in Fig. 3, which shows a greater value in the case of MODWT, suggesting its higher efficacy; in this figure, m denotes the mass of the particle that moves under the stochastic condition. For denoising of the input images, the initial values of Δt and z are taken as 0.007 and 0.000027, respectively. To determine the quality of the denoised image, we have calculated the distribution separation measure (DSM), which estimates the degree of image quality. The DSM is defined as [34]: $\mathrm{DSM} = \left|\mu_T^E - \mu_B^E\right| - \left|\mu_T^O - \mu_B^O\right|$, where $\mu_T^E$ and $\mu_T^O$ are the means of the selected target regions of the denoised and original images, respectively, and $\mu_B^E$ and $\mu_B^O$ are the means of the selected background regions of the denoised and original images, respectively. The higher the value of DSM, the better the quality. It is observed that the value of DSM is maximum at iteration 200 and then starts decreasing; therefore, this iteration is considered optimal.
These denoised frames are then used in the subsequent steps of the proposed method. As mentioned earlier, we have included image sequences of cardiac surgery and of clipping of ruptured cerebral aneurysms in this work. We have also tested our method on cardiac cine MRI datasets at both high and low contrast levels to highlight the method's capability under varying intensities. The performance results on these datasets are provided in Figs. 4 and 5. We have chosen different scenarios for the cerebral aneurysm surgical procedure (clipping): one is to track the scissors'
Fig. 3 a Perceptual quality measures by Fourier (Method 2) and MODWT (Method 1); m on the x-axis denotes the mass of the particle that moves under the stochastic condition. b Energy convergence comparison of three methods (Paragios et al., Du et al., and the proposed method)
Fig. 4 a–d Ground truth frames. e–h Tracking of the left ventricle in low-contrast cine magnetic resonance imaging (low-contrast CMRI) during cardiac surgery
or clippers' movement, and the other is to focus on the operating field during surgery, where multiple tools are used by the surgeons. It is important to track the motion of the scissors in order to minimize the damage caused by their movement. Besides tool tracking, capturing or tracking the operating field is also important; it helps the surgeon concentrate on the tools used during the surgery and the impacted tissues of interest. The results are given in Figs. 6 and 7. We have also tested the proposed method on the VOT 2015 datasets and found satisfactory results, as can be observed in Fig. 8. We have included this particular dataset in this paper to emphasize that its foreground is not very significantly different from the background, as is the case in medical data sequences. Usually, medical data are blurry (either reddish or grayish) and lack contrast, as can be observed from the figures. In this scenario, a single contour surrounding the tools could easily be overlooked; therefore, purely for the user's (surgeon's) convenience, we have added a blue line surrounding the red line in the tracking results. While calculating the accuracy, only the red line is taken into consideration. In order to determine the segmentation accuracy, we have used the Dice coefficient (DC), which may be defined as
Fig. 5 Tracking of left ventricle in high-contrast cine magnetic
resonance imaging (high-contrast CMRI) during cardiac surgery
Fig. 6 Tracking of the operating field with multiple objects
during cerebral aneurysm clipping
[44]: DC = 2 × |X ∩ Y| / (|X| + |Y|), where X and Y are two point sets. The average segmentation accuracy on the 3-T machine is 94%, whereas in the case of 7 T it is found to be 96%. The proposed method has performed as expected, which can be verified from the results provided in the "Results" section. We have optimized the algorithm and code; the average time taken to perform tracking is less than 25–30 s, at an average rate of 24 frames per second. We have also compared the performance of the proposed method with other similar methods ([31,42]); the proposed method converges faster than the other methods (Fig. 3b). We have also calculated the overlap index (OI) [43] to determine the overlap between the resulting target contour and the actual boundary. We have found it highest in the case of the proposed method compared with the others, as can be observed from Table 1.
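The Dice coefficient used above is straightforward to compute on binary segmentation masks; the following is a small sketch under the assumption that X and Y are represented as boolean arrays of the same shape (the function name and the empty-mask convention are ours):

```python
import numpy as np

def dice_coefficient(x, y):
    """Dice coefficient DC = 2|X n Y| / (|X| + |Y|)
    for two binary masks (the point sets X and Y)."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    denom = x.sum() + y.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(x, y).sum() / denom
```

A DC of 1 means the tracked contour and the ground truth coincide exactly; the 94% and 96% figures quoted above correspond to DC values of 0.94 and 0.96.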
Fig. 7 Tracking of the operating field with multiple objects
during cardiac surgery
Fig. 8 a–d Tracked frames in a video sequence (VOT 2015). e–f
Corresponding ground truth sequences
Discussion
The values of the bistable system parameters play a crucial role in the process of denoising using SR. The expression for SR on any dataset contains additive terms in multiples of w and a subtractive term in multiples of z. It is observed that images with low contrast and a low dynamic range require larger values of w, while those with relatively more contrast, covering an appreciable gray-level range, require smaller values of w for proper denoising. The behaviour of Δt has been found to be similar to that of w. It is also perceived that w is inversely proportional to the overall variance, which signifies the contrast of the input image. The optimization process leads us to the optimum value of w; the value of z should be less than 1 so that the condition e < √(4a³/27) holds, assuring that the system is bistable and the signal is sub-threshold, so that SR is applicable. We prefer a very small value of this factor to remain well within the allowable range of e. Finally, we have noticed that the varying segmentation accuracy depends on the quality of the input data sequence. The MRI data obtained from the 7-T machine give better accuracy than those from the 3-T machine.

Table 1 Overlap index comparison of different methods on hospital and VOT 2015 datasets

Hospital and SCD datasets:

Method                 FOV-CA (%)   Scissors-CA (%)   Low-contrast CMRI (%)   High-contrast CMRI (%)
Paragios et al. [42]   64           65                64                      65
Du et al. [31]         67           69                68                      72
Proposed method        72           74                73                      76

VOT 2015 dataset:

Method                 Rabbit (%)   Shaking (%)   Racing (%)   Octopus (%)
Paragios et al. [42]   69           70            68           71
Du et al. [31]         72           75            70           76
Proposed method        83           85            82           88

The results of the proposed method are shown in bold
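The sub-threshold condition above can be checked numerically; the sketch below assumes the standard double-well potential with a unit quartic coefficient, so that the threshold amplitude is √(4a³/27) (the function name and the unit-coefficient assumption are ours, not the authors'):

```python
import math

def is_subthreshold(e, a):
    """Check the sub-threshold condition e < sqrt(4*a^3/27) for a
    bistable double-well system (unit quartic coefficient assumed);
    only when it holds is stochastic resonance applicable."""
    threshold = math.sqrt(4.0 * a ** 3 / 27.0)
    return e < threshold
```

For a = 1, the threshold is √(4/27) ≈ 0.385, which is why a very small value of the factor keeps e comfortably within the allowable range.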
Conclusions and future work
A variational framework has been presented to track the motion of moving objects and the field of view in surgery sequences. We have presented a method that uses SR to denoise the input frames and a combined registration–segmentation framework to conduct motion tracking. We have introduced a robust similarity metric and an efficient energy functional in this framework. Despite the fact that the input data contain varying illumination, motion blur, lack of image texture, occlusion, and fast object movements, the performance of the proposed method is found to be quite satisfactory. In future, we intend to evaluate the method extensively and quantitatively so that it is well tested before being tried in clinical practice.
Acknowledgements Open Access funding provided by the Qatar National Library. This work was partly supported by NPRP Grant #NPRP 5-792-2-328 from the Qatar National Research Fund (a member of Qatar Foundation).

Compliance with ethical standards

Conflict of interest The authors declare that they have no conflict of interest.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
1. Di Salvo TG, Acker MA, Dec GW, Byrne JG (2010) Mitral valve surgery in advanced heart failure. J Am Coll Cardiol 55:271–282
2. Zhai X, Eslami M, Hussein ES, Filali MS, Shalaby ST, Amira A, Bensaali F, Dakua S, Abinahed J, Al-Ansari A, Ahmed AZ (2018) Real-time automated image segmentation technique for cerebral aneurysm on reconfigurable system on chip. J Comput Sci 27:35–45
3. Ganni S, Botden SMBI, Chmarra M (2018) A software-based tool for video motion tracking in the surgical skills assessment landscape. Surg Endosc 32:2994
4. Jakimowicz JJ, Buzink S (2015) Training curriculum in minimal access surgery. In: Francis N, Fingerhut A, Bergamaschi R, Motson R (eds) Training in minimal access surgery. Springer, London, pp 15–34
5. Feng C, Haniffa H, Rozenblit JW, Peng J, Hamilton AJ, Salkini M (2006) Surgical training and performance assessment using a motion tracking system. In: International Mediterranean modelling multiconference, I3M, pp 647–652
6. Carroll SM, Kennedy AM, Traynor O, Gallagher AG (2009) Objective assessment of surgical performance and its impact on a national selection programme of candidates for higher surgical training in plastic surgery. J Plast Reconstr Aesthet Surg 62:1543–1549
7. Pruliere-Escabasse V, Coste A (2010) Image-guided sinus surgery. Eur Ann Otorhinolaryngol Head Neck Dis 127:33–39
8. Tjardes T, Shafizadeh S, Rixen D, Paffrath T, Bouillon B, Steinhausen ES, Baethis H (2010) Image-guided spine surgery: state of the art and future directions. Eur Spine J 19:25–45
9. Shaharan S, Nugent E, Ryan DM, Traynor O, Neary P, Buckley D (2016) Basic surgical skill retention: can Patriot motion tracking system provide an objective measurement for it? J Surg Educ 73:245–9
10. Zhao Z, Voros S, Weng Y, Chang F, Li R (2017) Tracking-by-detection of surgical instruments in minimally invasive surgery via the convolutional neural network deep learning-based method. Comput Assist Surg 22:26–35
11. Zhang M, Wu B, Ye C, Wang Y, Duan J, Zhang X, Zhang N (2019) Multiple instruments motion trajectory tracking in optical surgical navigation. Opt Express 27:15827–15845
12. Berry D (2009) Percutaneous aortic valve replacement: an important advance in cardiology. Eur Heart J 30:2167–2169
13. Kobayashi S, Cho B, Huaulme A, Tatsugami K, Honda H, Jannin P, Hashizume M, Eto M (2019) Assessment of surgical skills by using surgical navigation in robot-assisted partial nephrectomy. Int J Comput Assist Radiol Surg. https://doi.org/10.1007/s11548-019-01980-8
14. Docquier PL, Paul L, TranDuy K (2016) Surgical navigation in paediatric orthopaedics. EFORT Open Rev 1:152–159
15. Tadayyon H, Lasso A, Kaushal A, Guion P, Fichtinger G (2011) Target motion tracking in MRI-guided transrectal robotic prostate biopsy. IEEE Trans Biomed Eng 58:3135–42
16. Ozkan E, Tanner C, Kastelic M, Mattausch O, Makhinya M, Goksel O (2017) Robust motion tracking in liver from 2D ultrasound images using supporters. Int J Comput Assist Radiol Surg 12:941–950
17. Liu TJ, Ko AT, Tang YB, Lai HS, Chien HF, Hsieh TM (2016) Clinical application of different surgical navigation systems in complex craniomaxillofacial surgery: the use of multisurface 3-dimensional images and a 2-plane reference system. Ann Plast Surg 76:411–9
18. Engelhardt S, Simone RD, Al-Maisary S, Kolb S, Karck M, Meinzer HP, Wolf I (2016) Accuracy evaluation of a mitral valve surgery assistance system based on optical tracking. Int J Comput Assist Radiol Surg 11:1891–904
19. Niehaus R, Schilter D, Fornaciari P, Weinand C, Boyd M, Ziswiler M, Ehrendorfer S (2017) Experience of total knee arthroplasty using a novel navigation system within the surgical field. Knee 24:518–524
20. Kim BG, Park DJ (2004) Unsupervised video object segmentation and tracking based on new edge features. Pattern Recognit Lett 25:1731–1742
21. Subudhi BN, Nanda PK, Ghosh A (2011) A change information based fast algorithm for video object detection and tracking. IEEE Trans Circuits Syst Video Technol 21:993–1004
22. Duffner S, Garcia C (2017) Fast pixel wise adaptive visual tracking of non rigid objects. IEEE Trans Image Process 26:2368–2380
23. Li J, Zhou X, Chan S, Chen S (2017) Robust object tracking via large margin and scale adaptive correlation filter. IEEE Access 6:12642–12655
24. Mahalingam T, Subramoniam M (2018) A robust single and multiple moving object detection, tracking and classification. Appl Comput Inf. https://doi.org/10.1016/j.aci.2018.01.001
25. Zhang T, Liu S, Xu C, Liu B, Yang M (2018) Correlation particle filter for visual tracking. IEEE Trans Image Process 27:2676–2687
26. Yang yang G, Dong-jian H, Cong L (2018) Target tracking and 3D trajectory acquisition of cabbage butterfly based on the KCF-BS algorithm. Sci Rep Nat 8:9622
27. Du B, Sun Y, Cai S, Wu C, Du Q (2018) Object tracking in satellite videos by fusing the kernel correlation filter and the three-frame-difference algorithm. IEEE Trans Image Process 15:168–1821
28. Ning J, Zhang L, Zhang D, Yu W (2013) Joint registration and active contour segmentation for object tracking. IEEE Trans Circuits Syst Video Technol 23:1589–1597
29. Liu G, Liu S, Muhammad K, Sangaiah A, Doctor F (2018) Object tracking in vary lighting conditions for fog based intelligent surveillance of public spaces. IEEE Access 6:29283–29296
30. Liu S, Feng Y (2018) Real-time fast moving object tracking in severely degraded videos captured by unmanned aerial vehicle. Int J Adv Robot Syst SAGE 11:1–10
31. Du D, Wen L, Qi H, Huang Q, Tian Q, Lyu S (2018) Iterative graph seeking for object tracking. IEEE Trans Image Process 27:1809–1821
32. Radau P, Lu Y, Connelly K, Paul G, Dick AJ, Wright GA (2009) Evaluation framework for algorithms segmenting short axis cardiac MRI. MIDAS J Cardiac MR Left Ventricle Segm Chall
33. Dakua S, Abinahed J, Al-Ansari A (2018) A PCA based approach for brain aneurysm segmentation. J Multi Dimens Syst Signal Process 29:257–277
34. Rallabandi V, Roy P (2010) MRI enhancement using stochastic resonance in Fourier domain. Magn Reson Imaging 28:1361–1373
35. vom Scheidt J, Gard TC Introduction to stochastic differential equations. Pure and applied mathematics 114, XI, 234 pp. Marcel Dekker Inc., New York. ISBN 0-8247-7776-X
36. Yao SJ, Song YH, Zhang LZ, Cheng XY (2000) MODWT and networks for short-term electrical load forecasting. Energy Convers Manag 41:1975–1988
37. Comaniciu D, Ramesh V, Meer P (2003) Kernel-based object tracking. IEEE Trans Pattern Anal Mach Intell 25:564–577
38. Vezzetti E, Marcolin F (2015) Similarity measures for face recognition. Bentham Books, Sharjah, United Arab Emirates. ISBN: 978-1-68108-045-1
39. Erdem CE, Sankur B, Tekalp AM (2004) Performance measures for video object segmentation and tracking. IEEE Trans Image Process 13:937–951
40. Matej K, Matas J, Leonardis A, Felsberg M, Cehovin L (2015) The visual object tracking VOT 2015 challenge results. In: IEEE international conference on computer vision workshop, pp 564–586
41. Wang Z, Sheikh HR, Bovik AC (2002) No-reference perceptual quality assessment of JPEG compressed images. In: Proceedings of IEEE international conference on image processing, pp 477–480
42. Paragios N, Deriche R (2000) Geodesic active contours and level sets for the detection and tracking of moving objects. IEEE Trans Pattern Anal Mach Intell 22:266–280
43. Rosenfield GH, Fitzpatrick-Lins K (1986) A coefficient of agreement as a measure of thematic classification accuracy. Photogramm Eng Remote Sens 52:223–227
44. Shi F, Yang Q, Guo X, Qureshi T, Tian Z, Miao H, Dey D, Li D, Fan Z (2019) Vessel wall segmentation using convolutional neural networks. IEEE Trans Biomed Eng. https://doi.org/10.1109/TBME.2019.2896972
45. http://www.votchallenge.net/vot2015/dataset.html
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.