Aquatic Debris Monitoring Using Smartphone-Based Robotic Sensors

Yu Wang1, Rui Tan2, Guoliang Xing1, Jianxun Wang3, Xiaobo Tan3, Xiaoming Liu1, and Xiangmao Chang1,4

1Department of Computer Science and Engineering, Michigan State University, USA
2Advanced Digital Sciences Center, Illinois at Singapore
3Department of Electrical and Computer Engineering, Michigan State University, USA
4College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, China
Abstract—Monitoring aquatic debris is of great importance to the protection of ecosystems, marine life, human health, and water transport. This paper presents the design and implementation of SOAR – a vision-based surveillance robot system that integrates an off-the-shelf Android smartphone and a gliding robotic fish for debris monitoring. SOAR features real-time debris detection and coverage-based rotation scheduling algorithms. The image processing algorithms for debris detection are specifically designed to address the unique challenges of aquatic environments. The rotation scheduling algorithm provides effective coverage of sporadic debris arrivals despite the camera's limited angular view. Moreover, SOAR is able to dynamically offload computation-intensive processing tasks to the cloud for battery power conservation. We have implemented a SOAR prototype and conducted extensive experimental evaluation. The results show that SOAR can accurately detect debris in the presence of various environment and system dynamics, and that the rotation scheduling algorithm enables SOAR to capture debris arrivals with reduced energy consumption.

Keywords—Robotic sensor; aquatic debris; smartphone; computer vision; object detection
I. INTRODUCTION
Aquatic debris – human-created waste found in water environments – has emerged as a grave environmental issue. The 2011 Japan tsunami released about one million tons of debris that headed toward North America [1], and some has drifted to the U.S. West Coast, as shown in Figure 1(a). Inland waters also face severe threats from debris. Over 15 scenic lakes in New Jersey still suffer from debris caused by Hurricane Sandy after a year of cleaning [2]. The debris fields pose numerous potential risks to aquatic ecosystems, marine life, human health, and water transport. For instance, debris has led to a loss of 4 to 10 million crabs a year in Louisiana [3], and has caused damage such as propeller entanglement to 58% of the fishing boats in an Oregon port [4]. It is thus imperative to monitor debris arrivals and alert the authorities to take preventive actions against the potential risks.
Opportunistic spotting by beach-goers or fishermen is often the only viable solution for small-scale debris monitoring. However, this approach is labor-intensive and unreliable. An alternative approach is in situ visual survey using patrol boats [10]. However, it is costly and can only cover a limited period of time. More advanced methods involve remote sensing technologies, e.g., balloon-borne cameras [20] and satellite imaging [22]. The former is only effective for one-off and short-term monitoring of highly concentrated debris fields that have already been detected, and the latter often has high operational cost and falls short in monitoring resolution. Recently, autonomous underwater vehicles (AUVs) [14] [31] have been used for various underwater sensing tasks. However, AUV platforms often have high manufacturing costs (over $50,000 per unit [25]). The limitations of these remote sensing and AUV-based approaches make them cost prohibitive for monitoring spatially and temporally scattered debris fields with small-sized objects. For instance, the debris from the 2011 Japan tsunami was expected to arrive dispersedly along the U.S. West Coast over two years, from spring 2012 to late 2014 [1].

Fig. 1. (a) Debris from the Japan tsunami arriving at the U.S. West Coast, 2012 (Photo Credit: Scripps Institution of Oceanography [5]); (b) SOAR prototype integrating a Samsung Galaxy Nexus smartphone in a water-proof enclosure with a gliding robotic fish [6].
This paper presents SOAR (SmartphOne-based Aquatic Robot), a low-cost, vision-based surveillance robot system that integrates an off-the-shelf smartphone and a robotic fish platform. Figure 1(b) shows a prototype of SOAR built with a gliding robotic fish developed in our previous work [6] and a Samsung Galaxy Nexus smartphone. Several salient advantages of smartphones and gliding robotic fish make their integration a promising platform for debris monitoring. First, recent smartphones are powerful enough to execute advanced computer vision (CV) algorithms that process the images captured by the camera to detect debris objects. Meanwhile, the price of smartphones has dropped drastically in the last five years. Many low-end Android phones (e.g., the LG Optimus Net with an 800 MHz CPU and 2 GB memory) cost only about $100 [7]. Second, besides visual sensing, various built-in sensing modalities such as GPS and the accelerometer can be used to facilitate the navigation and control of the robot and enable situation awareness that improves debris detection performance. Third, the long-range communication capability (3G/4G) of smartphones makes it possible to leverage the cloud to increase the robot's intelligence and reduce energy consumption by offloading intensive computation. Lastly, as a commercial off-the-shelf platform, the smartphone provides an integrated sensing system and diverse system configurations, which can meet the requirements of a wide spectrum of embedded applications. Moreover, it offers user-friendly programming environments and extensive library support, which greatly accelerate the development process. The gliding robotic fish, a low-cost aquatic mobile platform with high maneuverability in rotation and orientation maintenance, provides SOAR the mobility to adapt to the dynamics of debris and water environments. Owing to these features, SOAR represents an unprecedented vision-based, cloud-enabled, low-cost, yet intelligent aquatic mobile sensing platform for debris monitoring.
However, the design of SOAR faces several unique challenges associated with aquatic debris monitoring. First, due to the impact of waves, SOAR cannot acquire a stable camera view, which makes it highly difficult to reliably recognize debris objects. A possible solution is image registration [19], which aligns multiple images into a common coordinate system. However, water environments often lack detectable features, such as sharp corners, that are commonly used for image registration. Second, SOAR is powered by small batteries due to constraints on form factor and cost budget, while both the aquatic movement of the robot and the image processing on the smartphone incur high energy consumption. Lastly, debris arrivals are often sporadic over a large geographic region [10] [12], making them highly difficult to capture using smartphone cameras that typically have a limited field of view. To address these challenges, this paper makes the following contributions:
1) We develop several lightweight CV algorithms to address the inherent dynamics in aquatic debris detection, including an image registration algorithm that extracts the horizon line above water and uses it to register the images to mitigate the impact of camera shaking, and an adaptive background subtraction algorithm for reliable detection of debris objects.

2) We propose a novel approach to dynamically offloading the computation-intensive CV tasks to the cloud. The offloading decisions are made to minimize the system energy consumption based on in situ measurements of wireless link speed and robot acceleration.

3) We analyze the coverage of sporadic and uncertain debris arrivals based on geometric models. Using the analytical debris arriving probability, we design a robot rotation scheduling algorithm that minimizes the movement energy consumption while maintaining a desired level of debris coverage performance.
We implement a prototype of SOAR and evaluate it through extensive testbed experiments and trace-driven simulations. The results show that SOAR can accurately detect debris in the presence of various dynamics and maintain a satisfactory level of debris arrival coverage while significantly reducing the energy consumption of robot movement.
The rest of this paper is organized as follows. Section II reviews related work. Section III overviews SOAR. Section IV presents the vision-based debris detection algorithms. Section V presents the coverage-based rotation scheduling algorithm. Section VI describes the implementation of SOAR. Section VII presents the evaluation results. Section VIII concludes.
II. RELATED WORK
Several research efforts have explored the integration of cameras with low-power wireless sensing platforms. Cyclops [23] integrates a CMOS imager hosted by a MICA2 mote [11]. It can perform object detection using a naive background subtraction method. In [31], a low-end camera module is installed on an AUV for navigation. However, these camera-based platforms can only conduct simple image processing tasks due to the resource constraints of motes. Recently, mobile sensing based on smartphones has received increasing research interest due to their rich computation, communication, and storage resources. The study in [32] designs a driving safety alert system that can detect dangerous driving behaviors using both the front- and rear-facing cameras of a smartphone. In [16], a barcode-based visual communication scheme is implemented on smartphones. This paper aims to design an aquatic debris surveillance robot that utilizes the built-in camera, inertial sensors, and other resources on a smartphone. Different from existing vision-based systems, we need to deal with several unique challenges in aquatic debris monitoring, such as camera shaking and sporadic debris arrivals.
Extracting the foreground objects from a sequence of video frames is a fundamental CV task. Background subtraction [19] is a widely adopted approach, which, however, often incurs significant computation overhead on resource-constrained devices. In [26], compressive sensing is applied for background subtraction to reduce computation overhead. In [33], an adaptive background model is proposed to trade off the object detection performance and the computation overhead of background subtraction. These approaches assume a static camera view, and hence cannot be readily applied to debris detection in water environments where the camera is constantly shaking due to waves. This paper develops a collection of vision-based detection algorithms that are specifically designed for background subtraction in dynamic water environments and optimized for smartphone platforms.
Several approaches have been proposed to maintain monitoring coverage with cameras. In [13], the placement of static cameras is determined according to a floor plan. A camera network deployment approach is developed in [17] to minimize the coverage overlap between neighboring cameras. Different from these approaches that focus on static cameras, we exploit the controllable mobility of the robot to increase the likelihood of capturing debris objects. Moreover, unlike previous studies (e.g., [9]) that use mobile cameras to cover fixed locations of interest, this paper aims to reduce the miss rate in covering sporadic debris arrivals.
III. OVERVIEW OF SOAR
SOAR consists of an off-the-shelf Android smartphone and a gliding robotic fish. The smartphone is loaded with an app that implements the CV, movement scheduling, and cloud communication algorithms. The gliding robotic fish is capable of moving in water by beating its tail, which is driven by a servo motor. The motor is manipulated by a programmable control board, which can communicate with the smartphone through either a USB cable or short-range wireless links such as ZigBee. Various closed-loop motion control algorithms based on the smartphone's built-in inertial sensor readings can be implemented on either the fish control board or the smartphone.
SOAR is designed to operate on the water surface and monitor floating debris in nearshore aquatic environments, such as public recreational beaches, where wireless (cellular or WiFi) coverage is available. We focus on monitoring static or slow-moving on-water objects, and filter out other objects such as boats and swimmers based on their estimated speed. When a long shoreline needs to be monitored, multiple SOAR nodes can be deployed dispersedly to form barrier coverage. In this case, the number of needed nodes is the ratio of the length of the monitored shoreline to the coverage range of the smartphone's built-in camera. In this paper, we focus on the design of the debris detection and mobility scheduling algorithms running on a single SOAR node. The sensing results of multiple nodes can be sent back to a central server via the long-range communication interface on the smartphones for fusion and human inspection. SOAR has a limited sensing area due to the angular view of the built-in camera on the smartphone1, which makes it difficult to capture debris arrivals that are likely sporadic [10] [12]. Mobility can be exploited to address this challenge. The gliding robotic fish is capable of both rotating and moving forward. As a rotation can be achieved by beating the fish tail once, it consumes much less energy than moving forward, which requires continuous tail beats. Thus, this paper exploits the rotation mobility of the robotic fish and assumes that SOAR remains relatively stationary in water. In still water or slow water current, feedback motion control can maintain SOAR's station. In the presence of fast water current, an anchor can be used together with motion control to reduce the energy consumption of maintaining a stationary position.
After the initial deployment, SOAR begins a debris surveillance process consisting of multiple monitoring rounds. In each round, SOAR executes a rotation schedule, which includes the camera orientation and an associated monitoring time interval. Specifically, SOAR rotates to the desired orientation at the beginning of a round and starts to take images at a certain rate, which is determined by a sleep/wake duty cycle. For each image, SOAR uses several CV algorithms to detect the existence of debris objects in real time. Between two image captures, SOAR sleeps to save energy. At the end of a round, SOAR computes the rotation schedule for the next round based on the detection results to ensure a desired level of debris coverage. SOAR is designed to achieve long-term (up to a few months) autonomous debris monitoring. In addition to duty cycling, it adopts a novel offloading approach that leverages the cloud for battery power conservation. Specifically, SOAR comprises the following two major information processing components.
Real-time debris detection: This component aims to extract debris objects from the taken images. It consists of three lightweight image processing modules, i.e., image registration, background subtraction, and debris identification, which can effectively deal with various environment and system dynamics such as camera shaking and internal camera noise. Specifically, SOAR first registers each frame by exploiting the unique features of aquatic environments, e.g., the coastline for inland waters and the horizon line for marine scenarios. Then, background subtraction in HSV color space is performed on the registered frame to identify the foreground pixel candidates. Finally, the foreground is passed to debris identification for noise removal and debris recognition. At runtime, SOAR minimizes the battery power consumption by determining whether the above image processing tasks should be executed locally or entirely/partially offloaded to the cloud, depending on the current network condition, e.g., cellular network availability and link speed.

1Extra optical components like a fisheye lens can be used to broaden the camera view. However, the integration of these components into SOAR would complicate the system design. In particular, additional complex and energy-consuming image processing algorithms (e.g., distortion rectification) are often needed.
Coverage-based rotation scheduling: On the completion of a monitoring round, SOAR analyzes the debris coverage performance based on the estimated debris movement orientation and the surveillance history. It then adaptively configures the camera orientation and monitoring time interval for the next round. Because of the limited energy supply and power-consuming movement in water environments, SOAR must efficiently adjust its orientation while maintaining a desired level of debris coverage performance. To this end, we propose a scheduling algorithm that minimizes the rotation energy consumption in a round by dynamically configuring the rotation schedule, subject to a specified upper bound on the miss coverage rate for debris arrivals.
IV. REAL-TIME DEBRIS DETECTION
The image processing pipeline of SOAR is illustrated in Figure 2. Although it is based on a collection of elementary CV algorithms, it is non-trivial to optimize these computation-intensive synergistic algorithms for smartphones given the limited resources and the stringent requirement on system lifetime. Specifically, SOAR consists of the following image processing components. Image registration aligns consecutive frames to mitigate the impact of camera shaking caused by waves. In this paper, we focus on the marine environment as an example, although our techniques can adapt to other environments. In a marine scenario, the horizon line can be used as a reference to register the frames. To deal with the high computation overhead of horizon line extraction, SOAR offloads a portion of the computation to the cloud based on the network link speed and the shaking level indicated by the smartphone's inertial sensor readings. The registered frames are then compared with a background model to extract the foreground. Lastly, debris identification removes the salt-and-pepper noise from the foreground image and then identifies the debris objects.
A. Horizon-based Image Registration
Image registration is the process of aligning images taken at different time instants into one coordinate system [19]. In debris detection, image registration is necessary to mitigate the impact of camera shaking caused by waves, such that subsequent pixel-wise processing can be executed properly. Registration is performed by establishing correspondence between images based on their distinguishable features. However, a key challenge is that there are few detectable image features in typical water environments that can guide the image registration. A novelty of our approach is to leverage the horizon line, which segments the sky and water areas, for image registration, as shown in Figure 2(a).
Fig. 2. Real-time debris object detection.
We employ the Hough transform [18] to extract the horizon line. The Hough transform has been widely used to identify the positions of lines in an image. We assume that the majority of an image is either sky or water area, which are separated by the horizon line. As the sky and water areas typically have distinct colors [12], the Hough transform is able to extract the horizon line accurately. For each frame, we first convert it to a grayscale image and detect edges using a Sobel operator [19]. The Sobel operator detects an edge point according to the local intensity gradient magnitude. The Hough transform then finds the horizon line through a voting process based on the number of edge points that each candidate line passes through. The result of the Hough transform is a line expressed as $r = x \cos\varphi + y \sin\varphi$ in the Cartesian coordinate system originated at the bottom-left corner of an image. The horizon line is parameterized by $r$ and $\varphi$, which are the distance from the horizon line to the origin and the angle between the horizon line's orthogonal direction and the x-axis, respectively. An illustration of the extracted horizon line is shown in Figure 2(a).
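As a concrete illustration, the Sobel-plus-Hough extraction described above could be sketched with OpenCV's Java bindings (the library SOAR itself uses via JNI, cf. Section VI). The gradient threshold of 60 and the Hough vote threshold of 100 are illustrative assumptions rather than values from the paper, and note that OpenCV places the image origin at the top-left corner rather than the bottom-left corner used in the text.

```java
import org.opencv.core.Core;
import org.opencv.core.CvType;
import org.opencv.core.Mat;
import org.opencv.imgproc.Imgproc;

public final class HorizonExtractor {
    /** Extracts the dominant line (assumed to be the horizon) as {r, phi}. */
    static double[] extractHorizon(Mat frameBgr) {
        Mat gray = new Mat(), gx = new Mat(), gy = new Mat(), mag = new Mat(), edges = new Mat();
        Imgproc.cvtColor(frameBgr, gray, Imgproc.COLOR_BGR2GRAY);
        Imgproc.Sobel(gray, gx, CvType.CV_32F, 1, 0);   // horizontal gradient
        Imgproc.Sobel(gray, gy, CvType.CV_32F, 0, 1);   // vertical gradient
        Core.magnitude(gx, gy, mag);                    // local intensity gradient magnitude
        mag.convertTo(edges, CvType.CV_8U);
        Imgproc.threshold(edges, edges, 60, 255, Imgproc.THRESH_BINARY); // keep edge points
        Mat lines = new Mat();
        // Vote over candidate lines at 1-pixel / 1-degree resolution; 100 votes minimum.
        Imgproc.HoughLines(edges, lines, 1, Math.PI / 180, 100);
        if (lines.rows() == 0) return null;             // no line found; caller must fall back
        return lines.get(0, 0);                         // {r, phi}: r = x*cos(phi) + y*sin(phi)
    }
}
```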
Based on the extracted horizon line, we register each video frame to mitigate the impact of camera shaking. Specifically, for two consecutive frames, we register the successor frame against the registered predecessor by aligning their extracted horizon lines. Let $\Delta y$ denote the vertical shift at the midpoint of the horizon lines, and $\eta$ denote the angle by which the horizon line rotates between these two frames. We assume that the midpoints of the horizon lines in the two frames correspond to the same point in the world plane. This assumption is motivated by the observation that the closed-loop motion control algorithms can maintain the robot's orientation and position by adaptive course correction. Therefore, $\Delta y$ and $\eta$ define the affine transform between these two consecutive frames. First, we shift the successor frame to align the midpoint of its horizon line with that of the registered predecessor frame by $x' = x$ and $y' = y + \Delta y$, where $(x, y)$ are the coordinates of a pixel in the unregistered frame, and $(x', y')$ are the corresponding coordinates of $(x, y)$ after shifting. Then, we rotate the frame by
$$\begin{bmatrix} x_r \\ y_r \end{bmatrix} = \begin{bmatrix} \cos\eta & \sin\eta & x_0(1-\cos\eta) - y_0\sin\eta \\ -\sin\eta & \cos\eta & x_0\sin\eta - y_0(1-\cos\eta) \end{bmatrix} \cdot \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix},$$
where $(x_r, y_r)$ denote the coordinates of the underlying pixel in the reference frame, and $(x_0, y_0)$ denote the coordinates of the horizon line midpoint in the successor frame after shifting, which serves as the center of rotation. For those pixels without correspondents in the original unregistered frame due to the rotation, we adopt bilinear interpolation to fill in their color data (i.e., RGB).
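As an illustration, the shift-then-rotate transform above could be realized with two OpenCV warps, as sketched below under the assumption that the horizon midpoint, shift, and angle have already been estimated. warpAffine's default bilinear interpolation matches the paper's choice for filling in missing pixels; the sign convention of getRotationMatrix2D (counter-clockwise, top-left origin) may need flipping relative to the paper's bottom-left coordinate system.

```java
import org.opencv.core.CvType;
import org.opencv.core.Mat;
import org.opencv.core.Point;
import org.opencv.imgproc.Imgproc;

public final class FrameRegistrar {
    /** Registers a frame given the vertical shift dy (pixels) and rotation eta (radians)
     *  between its horizon line and the registered predecessor's. */
    static Mat registerFrame(Mat frame, double dy, double etaRad, Point horizonMidpoint) {
        // Step 1: vertical shift y' = y + dy (identity affine with a translation in y).
        Mat shift = Mat.eye(2, 3, CvType.CV_64F);
        shift.put(1, 2, dy);
        Mat shifted = new Mat();
        Imgproc.warpAffine(frame, shifted, shift, frame.size());
        // Step 2: rotate by eta about the shifted horizon midpoint (x0, y0).
        Mat rot = Imgproc.getRotationMatrix2D(horizonMidpoint, Math.toDegrees(etaRad), 1.0);
        Mat registered = new Mat();
        Imgproc.warpAffine(shifted, registered, rot, frame.size());  // bilinear by default
        return registered;
    }
}
```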
B. Background Subtraction
To reduce the energy consumption of image processing, we adopt a lightweight background subtraction approach to detecting foreground debris objects. We first convert the representation of an image to the HSV (Hue, Saturation, and Value) model. In HSV, hue represents the color, saturation is the dominance of hue, and value indicates the lightness of the color. The HSV representation is robust to illumination changes and hence more effective in interpreting color features in the presence of reflection in water environments.
In our approach, the background of a pixel is represented by a Gaussian mixture model (GMM) [19]. The GMM comprises K three-dimensional Gaussians, where each Gaussian characterizes the three channels of the HSV color space. When a new frame is available, each pixel is compared with its background model. If the HSV vector of a pixel in the new frame does not fall within a certain range of any mean vector of the K Gaussians, this pixel is considered a foreground pixel candidate; otherwise, it is classified as background. Therefore, the color difference between the foreground and background affects the classification accuracy. In our implementation, the range is chosen to be 2.5 times the standard deviation of the corresponding Gaussian. Under this setting, the image segmentation can tolerate minor inaccuracy introduced by noise and accommodate certain environmental condition changes.
Any labeled pixel will be used to update the GMM. In the GMM, each Gaussian has a weight that characterizes the fraction of historical pixels supporting it. If none of the existing K Gaussians match this pixel (i.e., the pixel has been classified as foreground), the distribution with the lowest weight is replaced by a new Gaussian, which is assigned a large variance, a small weight, and a mean vector equal to this newly arrived pixel; otherwise, the weight, mean, and variance vectors of the matched Gaussian are updated based on the new pixel value and a user-specified learning rate. The details of the GMM update can be found in [29]. This scheme allows the GMM to adapt to environmental condition changes, enhancing the robustness of background subtraction.
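OpenCV ships a GMM-based background subtractor (BackgroundSubtractorMOG2, which follows the update scheme of [29]) that can stand in for the model described above; a minimal sketch follows. The learning rate of 0.01 is an illustrative assumption, while the mixture count and the 2.5-sigma match threshold follow the text (MOG2's varThreshold is a squared distance, hence 2.5² = 6.25).

```java
import org.opencv.core.Mat;
import org.opencv.imgproc.Imgproc;
import org.opencv.video.BackgroundSubtractorMOG2;
import org.opencv.video.Video;

public final class DebrisBackground {
    final BackgroundSubtractorMOG2 bg = Video.createBackgroundSubtractorMOG2();

    DebrisBackground() {
        bg.setNMixtures(3);            // K = 3 Gaussians per pixel (cf. Section VII-A3)
        bg.setVarThreshold(2.5 * 2.5); // match within 2.5 standard deviations
    }

    /** Classifies pixels against the per-pixel GMM and updates the model in one call. */
    Mat subtract(Mat registeredFrame) {
        Mat hsv = new Mat(), fgMask = new Mat();
        Imgproc.cvtColor(registeredFrame, hsv, Imgproc.COLOR_BGR2HSV); // HSV resists illumination change
        bg.apply(hsv, fgMask, 0.01);   // 0.01 = assumed user-specified learning rate
        return fgMask;                 // binary foreground mask
    }
}
```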
We now discuss the impact of the number of Gaussians (i.e., K) in the GMM on the background subtraction performance. As shown in [29], a background model with K = 1 can only deal with a static background. Therefore, more Gaussians are needed in aquatic debris detection to account for the dynamics from camera shaking, reflection, and noise. A larger K can enrich the information maintained by the GMM and hence improve its robustness. However, it also imposes additional computation overhead on the smartphone. In Section VII-A3, we will evaluate the trade-off between system overhead and detection performance through experiments, which guides the setting of K. We note that a large K may be required to describe environments with a more complex and dynamic background. Our approach can be easily extended to employ existing online algorithms that maintain a GMM with a variable K adaptive to background changes, such as the reversible jump Markov chain Monte Carlo method [24] and its variant [30] that has been used for video processing.
C. Debris Identification
The binarized foreground image contains randomly distributed noise pixels, as depicted in Figure 2(b). Because the background subtraction is conducted in a pixel-wise manner, the labeling of foreground and background can be affected by camera noise, resulting in false foreground pixels. To deal with these noise pixels, we adopt the opening operation from image morphology [19]. The opening operation, which consists of erosion followed by dilation, eliminates the noise pixels through erosion while preserving the true foreground through dilation. After the noise removal, we employ region growing to identify the debris objects in the foreground image. It uses the foreground pixels as the initial seed points and forms connected regions that represent the candidate debris objects by merging nearby foreground pixels.
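The denoise-then-group step could look as follows in OpenCV Java; findContours is used here as a stand-in for the region growing described above (both yield connected foreground regions), and the 3×3 kernel and minimum-area cutoff are illustrative assumptions.

```java
import org.opencv.core.Mat;
import org.opencv.core.MatOfPoint;
import org.opencv.core.Rect;
import org.opencv.core.Size;
import org.opencv.imgproc.Imgproc;
import java.util.ArrayList;
import java.util.List;

public final class DebrisIdentifier {
    /** Denoises the binary foreground and returns bounding boxes of candidate debris objects. */
    static List<Rect> identifyDebris(Mat fgMask, double minArea) {
        Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(3, 3));
        Mat clean = new Mat();
        // Opening = erosion then dilation: removes salt-and-pepper noise, keeps true foreground.
        Imgproc.morphologyEx(fgMask, clean, Imgproc.MORPH_OPEN, kernel);
        // Group the surviving foreground pixels into connected regions.
        List<MatOfPoint> contours = new ArrayList<>();
        Imgproc.findContours(clean, contours, new Mat(), Imgproc.RETR_EXTERNAL,
                             Imgproc.CHAIN_APPROX_SIMPLE);
        List<Rect> objects = new ArrayList<>();
        for (MatOfPoint c : contours) {
            if (Imgproc.contourArea(c) >= minArea)  // drop small regions (false alarms / distant objects)
                objects.add(Imgproc.boundingRect(c));
        }
        return objects;
    }
}
```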
We note that the extracted foreground candidate objects may contain objects that move actively, e.g., boats and swimmers. SOAR adopts a threshold-based approach to filter out these non-debris objects. Specifically, the robot estimates the object movement speed based on the pinhole camera projection model (cf. Section V-D). When the estimated speed is higher than a threshold, SOAR saves the image and periodically transmits it back to a central server for human inspection. The threshold on speed is chosen to exclude the non-debris objects with active mobility. Similar heuristics based on object moving orientations can be applied to improve the accuracy of debris recognition. We use another threshold-based method to remove small objects. A small object size in the image usually indicates a false alarm or a distant debris object. Ignoring them does not affect the system accuracy since distant real debris objects will likely be detected when they approach closer to the camera. Note that the shape of a debris object has little impact on the debris detection performance, as the false negatives in detection are mainly caused by high color similarity between foreground and background and long distance between object and camera.
D. Dynamic Task Offloading
A key advantage of smartphone-based robots lies in their capability to leverage the abundant resources of the cloud. To prolong the smartphone battery lifetime, SOAR dynamically offloads the entire/partial image processing to the cloud when there is network coverage. As the typical coverage of a cellular tower is up to several miles [28], cellular networks can be available on nearshore water surfaces. The offloading decision mainly depends on two factors: 1) the overhead of the image processing algorithms, e.g., the power consumption, when they are executed locally; and 2) the wireless network condition, e.g., the uploading speed, which determines the energy consumption for uploading images to the cloud. In our design, SOAR has three offloading schemes, i.e., cloud, local, and hybrid processing, and dynamically chooses the scheme with the lowest energy consumption. The energy consumption of these schemes is analyzed as follows.
Fig. 3. Energy consumption per frame on a Samsung Galaxy Nexus under the cloud, hybrid, and local processing schemes, versus WiFi link speed γ (Mbps).
Cloud and local processing schemes: We first analyze the energy consumption of the local and cloud processing schemes, where all the processing of the original frame is conducted either in the cloud or on the phone. When a new frame is available, SOAR checks the network (3G/4G/WiFi) link speed and estimates the delay to upload this frame. The energy consumption for uploading the entire image is given by $e_{cloud} = p_c(\gamma) \cdot s / \gamma$, where $s$ is the frame size, $\gamma$ is the measured link speed, and $p_c(\gamma)$ is the uploading power consumption under the link speed $\gamma$. Alternatively, the frame can be processed on the phone. Let $t_l$ denote the delay to process a frame locally, including image registration, background subtraction, and debris identification. Our measurements show that these modules have similar CPU power consumption, which is denoted by $p_l$. The energy drain for processing a frame on the phone is thus given by $e_{local} = p_l \cdot t_l$.
Hybrid processing scheme: In addition to the above two options, we propose a hybrid solution that offloads the Hough transform in image registration to the cloud and conducts the rest of the processing locally. This design is motivated by the observation that the Hough transform incurs nearly 70% of the processing overhead for a frame (cf. Section VII-A1). Moreover, as the energy consumption for offloading is largely determined by the uploading volume, we propose to upload a rectangular part of the original frame that contains the horizon line. As the camera is shaking, the selection is based on the horizon line in the reference frame and the accelerometer readings. Specifically, we adopt the accumulated linear vertical acceleration (denoted by $\sum a_z$) over the time duration from the predecessor reference frame to the current frame to quantify the camera shaking. In theory, if $\sum a_z < 0$, the horizon line will shift upward in the current frame; otherwise, it will shift downward. Therefore, from the sign of $\sum a_z$, we can estimate whether the horizon line in the current frame is in the upper or lower part divided by the horizon line in the reference frame. We verify this hypothesis using 20 video sequences, each consisting of 30 frames. The results show that this hypothesis holds for 576 frames out of all 600 frames. In our hybrid scheme, the rectangular part to be uploaded has the original frame width and a height (denoted by $h_c$) spanning from the horizon line midpoint in the unregistered reference frame to one of the frame edges, depending on the sign of $\sum a_z$. Let $h_0$ denote the original frame height, and $t_h$ denote the delay to conduct the Hough transform locally. The energy consumption for hybrid processing is given by $e_{hybrid} = \frac{h_c}{h_0} e_{cloud} + \frac{t_l - t_h}{t_l} e_{local}$. Note that this formula ignores the low-probability cases where the above hypothesis does not hold. In these cases, the cloud will fail to identify the horizon line and the original frame will be processed locally.
By comparing $e_{local}$, $e_{hybrid}$, and $e_{cloud}$, SOAR chooses the scheme with the lowest energy consumption. All the parameters except the height $h_c$ in the hybrid scheme and the link speed $\gamma$ can be obtained from offline measurements. In our current prototype, we use WiFi to upload video frames to the cloud, although the implementation can be easily extended to cellular networks. We measure the power consumption of a Samsung Galaxy Nexus using an Agilent 34411A multimeter while the smartphone is uploading video frames under various WiFi link speed settings and while it is locally processing the frames. Figure 3 plots the energy consumption per frame under the three schemes. Note that for the hybrid scheme, we set $h_c/h_0 = 50\%$. We can observe that when the link speed is high, it is preferable to offload the entire/partial image processing to the cloud for energy conservation.
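Putting the three energy models together, the per-frame offloading decision reduces to a three-way comparison; a minimal sketch follows, with all parameters other than γ and hc assumed to come from the offline measurements described above.

```java
/** Per-frame offloading decision (Section IV-D): pick the scheme with the
 *  lowest estimated energy, following e_cloud, e_local, and e_hybrid above. */
public final class OffloadPlanner {
    enum Scheme { LOCAL, HYBRID, CLOUD }

    static Scheme choose(double s,      // frame size (bits)
                         double gamma,  // measured link speed (bits/s)
                         double pc,     // uploading power at speed gamma (W)
                         double pl,     // local CPU power (W)
                         double tl,     // local delay of the whole pipeline (s)
                         double th,     // local delay of the Hough transform alone (s)
                         double hc, double h0) {  // uploaded strip height, frame height
        double eCloud  = pc * s / gamma;            // upload the entire frame
        double eLocal  = pl * tl;                   // process everything on the phone
        double eHybrid = (hc / h0) * eCloud         // upload only the horizon strip
                       + ((tl - th) / tl) * eLocal; // run all but the Hough transform locally
        if (eLocal <= eHybrid && eLocal <= eCloud) return Scheme.LOCAL;
        return (eHybrid <= eCloud) ? Scheme.HYBRID : Scheme.CLOUD;
    }
}
```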
V. COVERAGE-BASED ROTATION SCHEDULING
In this section, we first introduce the camera sensing and debris arrival models, and analyze the effectiveness of covering debris arrivals. We then present a rotation scheduling algorithm that aims to minimize the rotation energy consumption while maintaining a desired coverage rate for debris arrivals.
A. Camera and Debris Models
The field of view (FOV) is a widely used concept to characterize a camera's sensing area, in which any object with dimensions larger than pre-defined thresholds can be reliably detected from the taken images. A camera's FOV is typically represented by a sector originating at the camera's pinhole with an angular view α and a radius R [19], where α is hardware-dependent and R can be measured through experiments for a specific application. For instance, in pedestrian detection, the camera's FOV has a radius of up to 40 meters [15]. For the objects within the FOV, image sensing is insensitive to their dimensions as long as they are larger than a certain size. For instance, our on-water experiments show that any floating object with a cross-sectional area over 0.28 m² (e.g., a five-gallon water bottle) can be reliably detected 40 meters away from a smartphone's camera. We note that the FOV is a conservative approximation of the camera's sensing area because objects outside the FOV may also be detected by the camera. This approximation simplifies the analysis of the coverage of debris arrivals. In Section V-B, we will show that the rotation scheduling algorithm does not depend on the value of R.

Since we focus on nearshore debris arrival monitoring, the surveillance region for SOAR is defined as the semi-circular area originating at the robot with a radius of R. We define the camera orientation by its optical axis. Because of the limited angular coverage of the FOV, SOAR needs to constantly adjust its orientation to maintain a desired coverage level for debris arrivals over the defined surveillance region. In this paper, we assume that the adjustments of camera orientation only occur at time instants that are multiples of a fixed time interval unit, which is referred to as a slot.
In the nearshore water environments, debris objects
oftenpassively drift with water current that moves straightly in
a
Fig. 4. Illustration of FOV, surveillance region, thickness,
debris arrivingangle β, debris movement orientation θ and its
estimation, and the cut-offregion.
fixed direction [12]. We set up a Cartesian coordinate
systemwith x-axis, y-axis, and z-axis, which are parallel,
horizontallyperpendicular, and vertically perpendicular to the
shoreline,respectively. We assume that the positions of debris
objects onthe water surface at a given time instant follow the
Poissonpoint process, which has been verified in previous studies
[10][12]. Our analysis (omitted due to space limit) shows that
thenumber of debris objects that arrive at the surveillance
regionfrontier, as illustrated in Figure 4, follows the Poisson
process.A Poisson process is characterized by an arrival rate,
denotedby λ, and a time interval, denoted by τ . The λ is the
expectednumber of arriving debris objects at the surveillance
regionduring a slot, and τ is the number of slots. The
probabilitythat there is at least one object arriving at the
surveillanceregion during the interval τ is
$$P_a = 1 - e^{-\lambda\tau}. \quad (1)$$
As the occurrences of debris objects are rare events, the Poisson process has a small λ. In Section V-D, we will describe an approach to estimating λ based on the historical detection results, and envision a cloud-based approach that leverages publicly available information about aquatic debris.
B. Debris Arrival Coverage
In the rotation scheduling, an arriving debris object is covered by SOAR if its arrival position at the surveillance region frontier falls into the camera's FOV. A rotation schedule aims to cover as many newly arriving debris objects as possible. Accordingly, our rotation scheduling algorithm controls the orientation of SOAR based on statistics of the historical debris arrival observations. Therefore, it is important to model the arrivals of debris objects at the surveillance region. We note that whether a debris object covered by the camera's FOV can eventually be extracted from the taken images is determined by the performance of the CV algorithms (cf. Section IV). Nonetheless, it is desirable to maximize the coverage rate, which improves the overall debris monitoring performance. We define the coverage rate for debris arrivals provided by SOAR as the ratio of the number of trajectories hitting the camera's FOV frontier to that hitting the surveillance region frontier. However, one immediate challenge is that lines are uncountable. We employ thickness from geometric probability [27] to measure the set of trajectories. Specifically, the thickness of an arc frontier, denoted by T, is defined as the length of its projection onto the line perpendicular to the debris movement orientation, as illustrated in Figure 4.
The thickness allows us to directly evaluate the debris coverage performance. We first analyze the probability of miss coverage based on thickness. Let θ denote the debris movement orientation, which is the angle between the debris' trajectory and the x-axis. Let β denote the debris arriving angle, which is the angle between the radius at the debris' arrival position on the surveillance region frontier and the x-axis. Both θ and β are illustrated in Figure 4. To simplify the discussion, we discretize the possible arriving angles by 1°, and define the arrival frontier as the 1° arc centered at the arrival position. Let $T(\beta)$ and $T_0$ denote the thickness of the arrival frontier and the thickness of the whole surveillance region frontier, respectively. Based on the geometric relationship, we have $T(\beta) = 2R \sin 0.5° \cos(\beta - \theta)$ and $T_0 = (1 + \sin\theta)R$. Therefore, the probability that a debris object hits the arrival frontier defined by β, conditioned on its trajectory intersecting the surveillance region frontier, is given by $g(\beta, \theta) = T(\beta)/T_0$, which is independent of R. As the probability that at least one debris object arrives at the surveillance region is $P_a$, given in Equation (1), the probability that at least one debris object reaches the arrival frontier defined by β (denoted by $P_a(\beta)$) is $P_a(\beta) = P_a \cdot T(\beta)/T_0 = P_a \cdot g(\beta, \theta)$. We note that if β is not covered by the FOV during an interval τ, $P_a(\beta)$ represents the probability of missing debris arrivals. We then derive the miss coverage rate, which will be used to schedule the rotation of SOAR. Let $t_r(\beta)$ denote the end time instant of the most recent slot when β was covered by the camera's FOV, and t denote the end time instant of the interval $[t_r(\beta), t]$, during which the arrival frontier defined by β remains uncovered. The miss coverage rate at arriving angle β and time t, denoted by $\omega(\beta, t \mid t_r(\beta), \theta)$, is defined as the probability of missing debris arrivals during the interval $[t_r(\beta), t]$. Formally,
$$\omega(\beta, t \mid t_r(\beta), \theta) = P_a(\beta)\big|_{\tau = t - t_r(\beta)} = \left(1 - e^{-\lambda(t - t_r(\beta))}\right) \cdot g(\beta, \theta). \quad (2)$$
Note that the miss coverage rate does not depend on the specific value of R, but only on the ratio of thicknesses, i.e., $g(\beta, \theta)$. Thus, the coverage-based rotation scheduling algorithm presented in Section V-C will not rely on R.
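For concreteness, Equation (2) translates directly into code; the sketch below measures time in slots and drops the common factor R, since the text shows it cancels.

```java
public final class Coverage {
    /** Miss coverage rate omega(beta, t | tr, theta) of Equation (2).
     *  Angles in radians, times in slots; R cancels and is omitted. */
    static double missCoverageRate(double beta, double t, double tr,
                                   double theta, double lambda) {
        double tArc = 2.0 * Math.sin(Math.toRadians(0.5)) * Math.cos(beta - theta); // T(beta)/R
        double t0 = 1.0 + Math.sin(theta);                                          // T0/R
        double pa = 1.0 - Math.exp(-lambda * (t - tr));                             // Eq. (1)
        return pa * (tArc / t0);                                                    // Pa(beta) = Pa * g(beta, theta)
    }
}
```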
C. Coverage-based Rotation Scheduling
Because of the limited angular coverage and energy budget, the rotation of SOAR needs to be carefully scheduled to achieve the desired coverage rate for debris arrivals. In this section, we formulate the problem of coverage-based robot rotation scheduling. Our objective is to schedule the next camera orientation and the corresponding monitoring interval to minimize the overall energy consumption, subject to an upper-bounded miss coverage rate at all arrival angles.
As the rotation energy consumption is approximately proportional to the orientation change [21], we adopt the changed orientation to characterize the energy consumed in rotation. Let $\tilde{\theta}$ denote the estimated debris movement orientation (cf. Section V-D). Suppose the current camera orientation and time instant are $\beta_0'$ and $t'$. The next rotation schedule, including the camera orientation $\beta_0$ and the end time instant $t$ of the monitoring interval, minimizes the average rotation rate $v$:
$$v = \frac{|\beta_0 - \beta_0'|}{t - t'}, \quad (3)$$
Fig. 5. Pinhole camera projection model: (a) projection onto the y-z plane; (b) projection onto the x-z plane.
subject to
$$\omega(\beta, t \mid t_r(\beta), \tilde{\theta}) < \xi, \quad \forall \beta \in [0, \beta_0 - \alpha/2] \cup [\beta_0 + \alpha/2, \pi], \quad (4)$$
where ξ is a user-specified maximally tolerable miss coverage rate. The constraint in Equation (4) upper-bounds the miss coverage rate ω at each uncovered β. Note that any β within the camera's FOV, i.e., $\beta \in [\beta_0 - \alpha/2, \beta_0 + \alpha/2]$, has zero miss coverage rate. The above problem can be efficiently solved by searching a set of discrete candidate orientations. During surveillance, SOAR keeps a map of ω at each β, and updates it before each new scheduling decision to account for system dynamics, including the error in orientation adjustment and the updated $\tilde{\theta}$ (cf. Section V-D) and λ. With the updated map of ω, SOAR adaptively schedules the next orientation and the associated monitoring interval.
Equation (4) ensures an upper bound on ω for uncovered arriving angles. Given the debris movement orientation θ, there always exists a cut-off region, as illustrated in Figure 4. Specifically, the arrival position of a debris object will never fall into the frontier of this cut-off region. To avoid unnecessary rotation, the surveillance region can exclude this cut-off region. Our analysis shows that v is reduced by about 25% after excluding this cut-off region when θ = π/3.
D. Debris Movement and Arrival Estimation
From Equation (2), the miss coverage rate ω depends on the debris movement orientation θ. Before deployment, SOAR can be configured with a coarsely estimated θ from prior knowledge about the water movement direction in the deployment region. Once SOAR detects a debris object, θ can be accurately estimated based on the pinhole camera projection model and the positions of the object in the images. Figure 5 shows the pinhole projection model. Specifically, Figure 5(a) illustrates how real-world distance along the y-axis is projected onto the vertical axis h in the image. It can be obtained that $|y_1 - y_2| = fH \cdot |1/h_2 - 1/h_1|$, where $y_1$ and $y_2$ are the distances between SOAR and the debris object in the two frames; $h_1$ and $h_2$ are the vertical pixel distance equivalents of $y_1$ and $y_2$; $f$ is the camera focal length; and $H$ is the mounting height of the smartphone on SOAR. Similarly, Figure 5(b) illustrates how real-world distance along the x-axis corresponds to the horizontal axis d. The following relationship holds: $|x_1 - x_2| = H \cdot |d_2/h_2 - d_1/h_1|$, where $x_1$, $x_2$, $d_1$, and $d_2$ are measures analogous to $y_1$, $y_2$, $h_1$, and $h_2$. Based on the geometric relation shown in Figure 4, the estimated debris movement orientation $\tilde{\theta}$ is given by
$$\tilde{\theta} = \arctan\left|\frac{y_1 - y_2}{x_1 - x_2}\right| = \arctan\left(f \cdot \left|\frac{h_1 - h_2}{d_2 h_1 - d_1 h_2}\right|\right) + \theta_r, \quad (5)$$
where $\theta_r$ is the heading direction of SOAR, which can be obtained from the smartphone's built-in digital compass. We will evaluate the accuracy of $\tilde{\theta}$ in Section VII-A2. Moreover, the object movement speed can be estimated by $\left(|x_1 - x_2|^2 + |y_1 - y_2|^2\right)^{1/2} / |t_1 - t_2|$, where $t_1$ and $t_2$ are the time instants when the object is at $(x_1, y_1, 0)$ and $(x_2, y_2, 0)$, respectively.
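Equation (5) and the speed estimate are simple enough to transcribe directly; the sketch below assumes pixel coordinates (h, d) for the two detections, a focal length f in pixels, and the compass heading in radians.

```java
public final class DebrisMotionEstimator {
    /** Estimated movement orientation, Equation (5). thetaR from the digital compass. */
    static double estimateTheta(double h1, double d1, double h2, double d2,
                                double f, double thetaR) {
        return Math.atan(f * Math.abs((h1 - h2) / (d2 * h1 - d1 * h2))) + thetaR;
    }

    /** Speed from two detections at times t1, t2, using the pinhole relations above. */
    static double estimateSpeed(double h1, double d1, double h2, double d2,
                                double f, double H, double t1, double t2) {
        double dy = f * H * Math.abs(1.0 / h2 - 1.0 / h1); // |y1 - y2|
        double dx = H * Math.abs(d2 / h2 - d1 / h1);       // |x1 - x2|
        return Math.hypot(dx, dy) / Math.abs(t1 - t2);
    }
}
```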
We then describe two approaches to estimating the debris arrival rate λ. First, λ can be estimated based on the historical detection results of SOAR. Suppose the robot detects n debris objects in the rotation schedule specified by $\beta_0$ and t. As discussed in Section V-B, the probability that an arriving debris object is covered by the camera's FOV, conditioned on it arriving at the surveillance region, is given by the ratio of their thicknesses, i.e., $T_{FOV}/T_0$. Hence, the expected number of debris arrivals at the whole surveillance region during one slot, i.e., λ, can be estimated by $nT_0 / (P_d T_{FOV} (t - t'))$, where $P_d$ is the lower bound on the detection probability for the FOV and $t - t'$ counts the monitoring slots. Second, the long-range communication capability of SOAR makes it possible to exploit available web resources to estimate λ. For instance, the Marine Debris Clearinghouse [8] is a representative online tool for aquatic debris tracking. Based on satellite images, on-site reports, and aquatic field simulations, it can estimate the intensity of incoming debris over large areas.
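The first, detection-driven estimator is a one-line computation; a sketch follows, with thicknesses passed in pre-divided by R (which cancels, as in Section V-B).

```java
public final class ArrivalRateEstimator {
    /** Arrival-rate estimate from one round's detections (Section V-D):
     *  lambda ~= n * T0 / (Pd * T_FOV * (t - tPrev)), times in slots. */
    static double estimateLambda(int n, double t0, double tFov, double pd,
                                 double t, double tPrev) {
        return n * t0 / (pd * tFov * (t - tPrev));
    }
}
```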
VI. SYSTEM PROTOTYPE
We have built a proof-of-concept prototype of SOAR for evaluation. The vision-based debris detection algorithms presented in Section IV are fully implemented on smartphone platforms running Android 4.3 Jelly Bean. System evaluation is mainly conducted on two representative handsets, a Samsung Galaxy Nexus (referred to as Galaxy) and a Google Nexus 4 (referred to as Nexus). The implementation takes about 1.99 MB of storage on the phone, and requires about 10.2 MB of RAM when the frame resolution is set to 720×480. When a new frame is available, SOAR checks the current WiFi condition using the Android API WifiInfo.getLinkSpeed(). Based on the measured link speed and horizon line position, it determines whether this newly arrived frame is locally processed or entirely/partially uploaded to the cloud, following the scheme proposed in Section IV-D. Our initial implementation of the image processing modules in Java incurred extensive delays on the current Android system. To boost the frame processing performance, we use OpenCV libraries and interface them with Java using the Java Native Interface on Android. In particular, we adopt OpenCV's native implementation of the Hough transform, which is more efficient than other implementations in our tests.
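As a sketch, the link-speed check mentioned above might look as follows on Android; WifiInfo.getLinkSpeed() is the API the paper names, and the surrounding plumbing is an illustrative assumption.

```java
import android.content.Context;
import android.net.wifi.WifiManager;

public final class LinkSpeedProbe {
    /** Reads the current WiFi link speed (Mbps) to drive the offloading decision. */
    static int currentLinkSpeedMbps(Context ctx) {
        WifiManager wm = (WifiManager) ctx.getApplicationContext()
                                          .getSystemService(Context.WIFI_SERVICE);
        return wm.getConnectionInfo().getLinkSpeed();  // WifiInfo.getLinkSpeed()
    }
}
```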
We integrate the smartphone with a gliding robotic fish developed in our previous work. The fish platform weighs 9 kilograms and represents a hybrid of underwater gliders and robotic fish, with the advantages of both. It is equipped with a ZigBee radio for wireless communication, two 75 W·h onboard batteries, and a circuit board for mobility control, localization, and navigation. On the control board, we implement a closed-loop proportional-integral-derivative (PID) controller that adaptively configures the tail beat based on the discrepancy between the scheduled and actual orientations. In our current implementation, we use a host computer to relay the communication between the smartphone (using WiFi) and the fish (using ZigBee). In the future, we will establish a direct connection between them, and migrate the PID controller to the smartphone to reduce the physical size and cost of the fish control board. Figure 1(b) depicts our prototype system, which integrates a Galaxy in a water-proof enclosure with a gliding robotic fish.
VII. PERFORMANCE EVALUATION
We evaluate SOAR through testbed experiments and simulations based on data traces collected from the prototype. The testbed experiments validate the feasibility of SOAR by evaluating the computation overhead, the effectiveness of each module, and the overall performance of a fully integrated SOAR. The simulations extensively evaluate the performance of SOAR under wide ranges of parameter settings.
A. Testbed Experiments
We evaluate SOAR in controlled lab experiments to fully understand its performance. We conduct extensive experiments using our prototype in a 15 feet × 10 feet water tank in our lab. Along one side of the tank, we vertically place a piece of white foam board above the water surface to imitate the sky area. The line where the foam board intersects the water surface produces an artificial horizon line. In the experiments, we set the camera frame rate to 0.25 fps. This setting is based on the fact that debris usually has a slow drifting speed and hence does not require a rapidly updated background model. Moreover, a low frame rate helps prolong the battery lifetime. The GMM comprises 3 three-dimensional Gaussians (i.e., K = 3), unless otherwise specified. We test the debris detection performance under various environmental conditions and experimental settings, including different camera shaking levels, with and without registration, and different settings of the GMM. For each scenario, we conduct 9 runs of experiments. For each run, we calculate the detection probability as the ratio of frames with correct detections to the total frames with debris, and the false alarm rate as the ratio of frames with false detections to the total frames without debris.
1) Overhead on Smartphone Hardware: We first evaluate the overhead of the vision-based detection algorithms on smartphone platforms. Specifically, we measure the computation delay of each image processing module, i.e., image registration, background subtraction, and debris identification, on Galaxy and Nexus, respectively. Galaxy has a 1.2 GHz dual-core processor and 1 GB memory, and Nexus has a 1.5 GHz quad-core processor and 2 GB memory. They are representative mid- and high-end mainstream smartphone platforms. The computation delay is measured as the elapsed time using the Android API System.nanoTime(). The results are plotted in Figure 6. We can see that background subtraction takes the least time, followed by debris identification combined with debris movement orientation estimation. Image registration incurs the longest delay. A breakdown shows that this long delay is mostly caused by the Hough transform. The overall computation delay is within 3.7 and 3.3 seconds on Galaxy and Nexus, respectively, which well meets the real-time requirement of debris monitoring, as debris arrivals are typically sporadic [10] [12].
Fig. 6. Execution time of image processing modules.

Fig. 7. CDF and average of the relative estimation error for θ.

Fig. 8. Impact of the number of mixed Gaussians K.
Fig. 9. Sample background subtraction outputs for approaches with/without horizon-based image registration: (a) original frame, with the extracted horizon lines of the current and predecessor frames; (b) Gaussian with the largest weight; (c) without registration; (d) with registration.
2) Accuracy of Debris Movement Orientation Estimation: We then evaluate the debris movement orientation estimation presented in Section V-D. Initially, the debris movement orientation θ is unknown to SOAR. After a debris object is successfully detected in two frames, SOAR can estimate θ based on Equation (5). In the experiments, an object (a can) is fastened to a rope. We drag the object using the rope to simulate the movement of a debris object. We define the relative estimation error as $|\theta - \tilde{\theta}|/\theta$, where $\tilde{\theta}$ is the estimated orientation from Equation (5) and θ is the groundtruth obtained using a protractor. Figure 7 plots the cumulative distribution function (CDF) and the average of the relative estimation error for the approaches with and without image registration. We can see that the approach with registration can accurately estimate θ, with an average relative estimation error of only about 7%. In contrast, the approach without registration results in an average relative estimation error of around 43%.
3) Impact of Mixed Gaussians: We now evaluate the impact of the number of mixed Gaussians (i.e., K) on the detection performance. Section IV-B discusses the trade-off between the detection performance and the system overhead caused by the setting of K. The detection probability and execution time on Galaxy are plotted in Figure 8. We can see that the detection probability increases with K. Moreover, Figure 10 evaluates the false alarm rate versus K, where the error bar represents the standard deviation. It can be observed that the false alarm rate decreases with K. A larger K imposes heavier computation overhead in both image segmentation and background model update. When the GMM adopts 2 Gaussians, the computation delay is about 30% of that with 6 Gaussians. From the figures, we also find that the setting K = 3 achieves a satisfactory trade-off between detection performance and computation delay. Therefore, we set K = 3 in the other testbed experiments.
4) Effectiveness of Image Registration: As discussed in Section IV-B, the background subtraction is conducted in a pixel-wise manner, hence its performance is sensitive to camera shaking. Figure 9 shows a sample of background subtraction. Specifically, Figure 9(a) is the original frame, where the red and black dashed lines represent the extracted horizon lines for this frame and the registered predecessor frame, respectively. Figure 9(b) shows the background model, where each pixel is the mean vector of the Gaussian with the largest weight in the GMM. Figure 9(c) is the result of background subtraction without image registration. Figure 9(d) is the result with image registration before subtraction. We can see that our horizon-based registration effectively mitigates the impact of camera shaking, and hence the detection algorithm can more accurately pinpoint the foreground object location in the image. Figure 11 plots the detection probability for the approaches with and without image registration under different camera shaking levels. In the experiments, we generate different levels of waves by controlling a feed pump connected to the tank. The $a_z$ reported in Figure 11 is the average linear vertical acceleration (i.e., excluding gravity) measured by the built-in accelerometer on the smartphone, and hence characterizes the camera shaking level. We can see that image registration not only improves the average detection performance, but also decreases the variance in detection probability in the presence of camera shaking. It effectively mitigates the impact of shaking, leading to a smaller degradation in detection probability as camera shaking increases.
5) Integrated Evaluation: In this set of experiments, all modules of SOAR (vision-based debris detection, θ estimation, rotation scheduling, and PID-controlled orientation adjustment) are integrated and evaluated. Similar to the previous experiments, we drag a can to simulate a debris object. The debris movement orientation θ is 0.156π, which is unknown to the system before deployment. We set the slot duration to 1 minute. At time t = 0, SOAR is deployed perpendicularly to the tank length. At time t = 1 min, it starts the first monitoring round as discussed in Section III.
Fig. 10. False alarm rate versus number of mixed Gaussians K.

Fig. 11. Impact of image registration and shaking level.

Fig. 12. Simulated / expected / actual orientations in the integrated evaluation.

Fig. 13. Simulated / actual monitoring intervals in the integrated evaluation.

Fig. 14. Impact of frame rate on debris detection probability.

Fig. 15. Maximum ω and average rotation rate v versus index of scheduling rounds.
In this experiment, SOAR achieves an 83.3% detection probability and a 5.8% average relative θ estimation error. We further study the performance of the rotation scheduling and orientation adjustment of SOAR. It is unlikely for SOAR to rotate to the exact scheduled orientation due to complex fluid dynamics and compass inaccuracy. This orientation adjustment error and minor inaccuracy in θ estimation affect the rotation scheduling for the next round. For evaluation, we compare the actual rotations of SOAR with the real-time scheduled rotations during this experiment (referred to as the expected schedule), and with those in an ideal simulation (referred to as the simulated schedule) where we assume both orientation adjustment and θ estimation are accurate and thus feed the scheduling algorithm with groundtruth. We note that the differences between the actual rotations and the expected schedule characterize the performance of the PID-controlled orientation adjustment. The deviations between the expected schedule and the simulated schedule indicate the robustness of the rotation scheduling algorithm to the control and estimation errors. The results are shown in Figure 12 and Figure 13. We find that the orientation adjustment errors vary for different expected orientations due to different levels of fluid obstruction. For instance, in the 3rd scheduling round, SOAR is subject to a higher level of fluid obstruction when it targets an expected orientation perpendicular to the water movement direction. Our PID controller can generally maintain an orientation adjustment error lower than 15°. Moreover, as shown in Figure 13, the expected monitoring intervals follow the simulated schedule well, because the temporal scheduling is mainly determined by the debris arriving intensity over time.
B. Trace-driven Simulations
To evaluate SOAR more extensively under realistic settings, we conduct simulations driven by real data traces collected using our prototype in the water tank. The data include error traces of SOAR orientation adjustments and video traces of debris arrivals. First, the error traces of SOAR orientation adjustments are collected using our prototype system. The camera orientation is represented by the heading direction of SOAR. We collect the error traces by measuring the discrepancy between the desired orientation and the actual heading direction. Second, we use our prototype to collect two distinct sets of video traces of an arriving bottle. In our prototype, the phone is mounted about 3 inches above the water surface. Trace 1 is collected in a calm environment and has 1495 frames in total. Trace 2, with a total of 1275 frames, is collected in the presence of persistent waves generated by the feed pump. To provide ground truth, the foreground debris object in each frame is manually labeled.
In the simulations, SOAR is deployed to monitor debris objects arriving in a semi-circular surveillance region. The arrival rate λ is set to 9 unless otherwise specified. The debris movement orientation is θ = π/3, and we assume that θ is known to the robot. We set the FOV angular coverage α to 5π/18 based on our measurements of the Galaxy Nexus. Initially, the robot is deployed perpendicular to the shoreline. It is allowed to adjust its orientation after the first slot. For each orientation adjustment, the actual direction is set according to the collected error traces and is thus subject to discrepancies from the desired orientation.
1) Impact of Frame Rate on Detection Performance: Figure 14 plots the detection probability versus frame rate. The detection probability increases with the frame rate, because a higher frame rate allows the GMM to be updated in a more timely manner and thus to track changes in environmental conditions. However, the improvement gained by increasing the frame rate is fairly limited, for two reasons. First, debris objects usually drift slowly with the water current, so the GMM can be updated at a low rate.
[Fig. 16. Impact of arrival rate λ on average rotation rate v and total monitoring time, for λ = 4.5 and λ = 9.0.]
[Fig. 17. Impact of estimation error in debris movement orientation θ: maximum ω versus index of scheduling rounds, for ε(θ) = 0%, 5%, 10%, and 15%.]
[Fig. 18. Projected lifetime and daily energy consumption of SOAR versus duty cycle (fish rotation, phone sleep, phone wake; local and hybrid schemes).]
Second, our horizon-based image registration effectively mitigates the impact of camera shaking, which is the dominant factor affecting debris detection. Hence, a low frame rate can achieve satisfactory debris detection performance. Moreover, in terms of system overhead, a low frame rate is desirable for smartphone platforms, which have constrained resources and energy supply.
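The interaction between frame rate and model freshness can be illustrated with OpenCV's MOG2 background subtractor (an implementation of Zivkovic's adaptive GMM [33]), used here only as a stand-in for SOAR's detection pipeline; the file name and parameters are illustrative.

    import cv2

    cap = cv2.VideoCapture("trace1.avi")   # hypothetical debris video trace
    gmm = cv2.createBackgroundSubtractorMOG2(history=100, detectShadows=False)

    frame_idx, stride = 0, 10              # subsample frames to emulate a low rate
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame_idx += 1
        if frame_idx % stride:             # slow-drifting debris tolerates skipped frames
            continue
        # Horizon-based image registration would be applied here before the update.
        mask = gmm.apply(frame)            # updates the per-pixel GMM, returns foreground
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN,
                                cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)))
        # Connected components over `mask` would yield candidate debris blobs.
    cap.release()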
2) Coverage Effectiveness: We compare the coverage effectiveness of our approach with that of a heuristic baseline that uniformly scans the surveillance region. Specifically, the baseline evenly partitions the semi-circular surveillance region into ⌈π/α⌉ sub-regions, and the robot scans the sub-regions sequentially, making one orientation adjustment per slot. For our approach, the desired upper bound on the miss coverage rate is set to 0.3. We evaluate coverage effectiveness by examining the maximum miss coverage rate (denoted by ω_m) among all arriving angles; ω_m characterizes the worst-case debris coverage performance of a rotation schedule. Figure 15 plots ω_m versus the index of scheduling rounds. Our approach guarantees the upper bound on the miss coverage rate, since it adaptively allocates surveillance slots based on the ω at each β, whereas the uniform scan cannot bound ω_m. Moreover, we evaluate the average rotation rate v, where a larger v indicates higher power consumption. Figure 15 also plots v versus the index of scheduling rounds. As the uniform scan continuously adjusts the orientation, it consumes more power than our approach.
3) Impact of Arrival Rate: Figure 16 plots the average rotation rate v versus the index of scheduling rounds under different settings of the debris arrival rate λ. The robot has a higher v when λ is higher, consistent with the intuition that the robot needs to rotate faster when debris arrives more frequently. Figure 16 also plots the total monitoring time versus the index of scheduling rounds. For a higher λ, the rotation scheduling must be conducted more frequently to meet the upper bound on the miss coverage rate ω, since the ω at uncovered arriving angles increases with λ. Under a lower λ, the robot is scheduled to rotate less frequently, resulting in a lower v and a longer monitoring interval at each orientation. Overall, the results in Figure 16 demonstrate that our scheduling algorithm adaptively controls the robot rotation to achieve the desired coverage performance while minimizing energy consumption.
4) Impact of Estimation Errors: This set of simulations evaluates the impact of estimation errors in the debris movement orientation θ on debris coverage performance. Let ε(·) denote the relative estimation error with respect to ground truth. Figure 17 plots the maximum miss coverage rate ω_m versus the index of scheduling rounds under various estimation error levels for θ. The estimation error in θ affects the scheduling of the camera orientation, as θ determines the distribution of the debris arriving probability at the frontier of the surveillance region. From the figure, we can see that our rotation scheduling algorithm generally maintains ω_m below the desired upper bound of 0.3 as long as the relative estimation error for θ is below 15%. As shown in Section VII-A, ε(θ) is only about 7%; thus, SOAR can tolerate practical inaccuracy in estimating θ. Moreover, our analysis (omitted due to space limits) shows that SOAR exhibits similar tolerance to inaccuracy in estimating the debris arrival rate λ.
5) Projected Lifetime: Finally, we evaluate the lifetime of SOAR based on its power consumption profile. The major energy consumers of SOAR are the smartphone during the wake periods and the fish rotation. According to our results from the integrated evaluation (cf. Figure 13), a monitoring round lasts 5 minutes on average, and SOAR rotates to the scheduled orientation within 15 seconds. We can thus upper-bound the daily (12 hours of daytime) energy consumption for fish rotation as (12 × 60/5) × (15/3600) × p_r W·h, where p_r is the battery power consumption of fish rotation. The energy drain of the smartphone can be calculated using offline power consumption measurements and the duty cycle. The total battery capacity of SOAR is 170 W·h, comprising a 13.5 W·h backup battery and two 75 W·h main batteries on the fish, plus a 6.48 W·h battery on the smartphone. Figure 18 plots the projected lifetime under various duty cycle settings when all the CV tasks are conducted on the smartphone. The duty cycle is defined as the ratio of wake time to total time. As expected, the lifetime decreases as the duty cycle increases. Note that a low duty cycle decreases the temporal sensing granularity of SOAR: during a 5-minute monitoring interval, SOAR can capture and process about 20 and 50 frames under 20% and 50% duty cycles, respectively. The breakdown of SOAR's daily energy consumption is also shown in Figure 18; the majority of the energy is consumed by the wake periods and the fish rotation. Figure 18 also plots the projected lifetime when the smartphone runs the hybrid scheme over a 2 Mbps link. The hybrid scheme reduces the smartphone's power consumption during the wake periods by offloading the compute-intensive Hough transform to the cloud, improving the lifetime by 9.1% to 21.5% across duty cycles.
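The lifetime projection reduces to simple arithmetic. The sketch below reproduces the calculation; the battery capacities follow the text, while the power draws (p_rot, p_wake, p_sleep) are illustrative placeholders rather than our measured values.

    ROUNDS_PER_DAY = 12 * 60 / 5   # 12 daytime hours, one monitoring round per 5 min
    ROTATION_HOURS = 15 / 3600     # each rotation completes within 15 s
    CAPACITY_WH = 13.5 + 2 * 75 + 6.48  # backup + two main + phone batteries (~170 W·h)

    def daily_energy_wh(duty_cycle, p_rot=5.0, p_wake=1.5, p_sleep=0.02):
        # Daily energy (W·h) = fish rotation energy + phone wake/sleep energy.
        rotation = ROUNDS_PER_DAY * ROTATION_HOURS * p_rot
        phone = 12 * (duty_cycle * p_wake + (1 - duty_cycle) * p_sleep)
        return rotation + phone

    for dc in (0.1, 0.2, 0.3, 0.4):
        e = daily_energy_wh(dc)
        print(f"duty cycle {dc:.0%}: {e:.2f} W·h/day, ~{CAPACITY_WH / e:.0f} days")

With these placeholder power numbers the projected lifetimes fall in the tens of days, in line with the trend in Figure 18; the hybrid scheme would simply lower the effective p_wake.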
VIII. CONCLUSION AND FUTURE WORK
This paper presents SOAR – a new vision-based robotic sensor system designed for aquatic debris monitoring. SOAR integrates an off-the-shelf Android smartphone and a gliding robotic fish. The vision-based debris detection algorithms of SOAR effectively deal with various dynamics such as camera shaking and reflection. A rotation scheduling algorithm adaptively guides the rotation of SOAR to capture images of arriving debris objects. Moreover, SOAR dynamically offloads all or part of the image processing to the cloud for energy conservation. Testbed experiments and extensive simulations based on a prototype system show that SOAR provides robust debris detection performance, meets the real-time requirements of smartphone platforms, and efficiently covers sporadic debris arrivals.
In future work, we plan to deploy SOAR in an inland lake and evaluate it under various conditions, such as different debris flow speeds and lighting conditions. Moreover, we will develop multi-SOAR coordination schemes, including the fusion of images taken by different robots and collaborative movement/rotation scheduling algorithms, for increased spatiotemporal coverage.
ACKNOWLEDGMENTS
This work was supported in part by the U.S. National Science Foundation under grants CNS-1218475, IIS-0916720, CNS-1059373, IIS-1319602, CCF-1331852, and IIP-1343413, and by the National Natural Science Foundation of China (Grant No. 61202350). We thank Tian Hao, Dennis Phillips, and Zheyun Feng for their help with Android development, platform building, and image processing algorithms. We also thank our shepherd Dr. Bodhi Priyantha and the anonymous reviewers for their valuable feedback.
REFERENCES
[1] 2011 Japan tsunami. http://marinedebris.noaa.gov/tsunamidebris.
[2] Hurricane Sandy. http://usat.ly/LWoxTI.
[3] Marine debris impacts. http://water.epa.gov/type/oceb/marinedebris.
[4] Problem with marine debris. http://waterboards.ca.gov/water issues.
[5] Project Kaisei. http://projectkaisei.org.
[6] Gliding robotic fish. http://nbcnews.to/1fGAomj.
[7] LG Optimus Net. http://amzn.to/Y4BvO9.
[8] NOAA Clearinghouse. http://clearinghouse.marinedebris.noaa.gov.
[9] J. Ai and A. Abouzeid, "Coverage by directional sensors in randomly deployed wireless sensor networks," Journal of Combinatorial Optimization, vol. 11, no. 1, pp. 21–41, 2006.
[10] R. C. Boland and M. J. Donohue, "Marine debris accumulation in the nearshore marine habitat of the endangered Hawaiian monk seal," Marine Pollution Bulletin, vol. 46, no. 11, pp. 1385–1394, 2003.
[11] Crossbow Technology, MICA2 and TelosB datasheets.
[12] J. Davies, J. Baxter, M. Bradley, D. Connor, J. Khan, E. Murray, W. Sanderson, C. Turnbull, and M. Vincent, Marine Monitoring Handbook. Joint Nature Conservation Committee, 2001.
[13] U. M. Erdem and S. Sclaroff, "Automated camera layout to satisfy task-specific and floor plan-specific coverage requirements," Computer Vision and Image Understanding, vol. 103, no. 3, pp. 156–169, 2006.
[14] C. C. Eriksen, T. J. Osse, R. D. Light, T. Wen, T. W. Lehman, P. L. Sabin, J. W. Ballard, and A. M. Chiodi, "Seaglider: a long-range autonomous underwater vehicle for oceanographic research," IEEE Journal of Oceanic Engineering, vol. 26, no. 4, pp. 424–436, 2001.
[15] T. Gandhi and M. M. Trivedi, "Pedestrian protection systems: issues, survey, and challenges," IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 3, pp. 413–430, 2007.
[16] T. Hao, R. Zhou, and G. Xing, "COBRA: color barcode streaming for smartphone systems," in Proceedings of the 10th International Conference on Mobile Systems, Applications, and Services (MobiSys), 2012, pp. 85–98.
[17] M. Hoffmann, M. Wittke, J. Hähner, and C. Müller-Schloer, "Spatial partitioning in self-organizing smart camera systems," Journal of Selected Topics in Signal Processing, vol. 2, no. 4, pp. 480–492, 2008.
[18] J. Illingworth and J. Kittler, "A survey of the Hough transform," Computer Vision, Graphics, and Image Processing, vol. 44, no. 1, pp. 87–116, 1988.
[19] A. K. Jain, Fundamentals of Digital Image Processing. Prentice Hall, 1989, vol. 3.
[20] S. Kako, A. Isobe, and S. Magome, "Low altitude remote-sensing method to monitor marine and beach litter of various colors using a balloon equipped with a digital camera," Marine Pollution Bulletin, vol. 64, no. 6, pp. 1156–1162, 2012.
[21] N. E. Leonard and J. G. Graver, "Model-based feedback control of autonomous underwater gliders," IEEE Journal of Oceanic Engineering, vol. 26, no. 4, pp. 633–645, 2001.
[22] T. Mace, "At-sea detection of marine debris: overview of technologies, processes, issues, and options," Marine Pollution Bulletin, vol. 65, no. 1, pp. 23–27, 2012.
[23] M. Rahimi, R. Baer, O. I. Iroezi, J. C. Garcia, J. Warrior, D. Estrin, and M. Srivastava, "Cyclops: in situ image sensing and interpretation in wireless sensor networks," in Proceedings of the 3rd International Conference on Embedded Networked Sensor Systems (SenSys), 2005, pp. 192–204.
[24] S. Richardson and P. J. Green, "On Bayesian analysis of mixtures with an unknown number of components," Journal of the Royal Statistical Society, vol. 59, no. 4, pp. 731–792, 1997.
[25] D. L. Rudnick, R. E. Davis, C. C. Eriksen, D. M. Fratantoni, and M. J. Perry, "Underwater gliders for ocean research," Marine Technology Society Journal, vol. 38, no. 2, pp. 73–84, 2004.
[26] Y. Shen, W. Hu, J. Liu, M. Yang, B. Wei, and C. T. Chou, "Efficient background subtraction for real-time tracking in embedded camera network," in Proceedings of the 10th International Conference on Embedded Networked Sensor Systems (SenSys), 2012, pp. 295–308.
[27] H. Solomon, Geometric Probability. Society for Industrial and Applied Mathematics, 1978, vol. 28.
[28] C. Song, Z. Qu, N. Blumm, and A.-L. Barabási, "Limits of predictability in human mobility," Science, vol. 327, no. 5968, pp. 1018–1021, 2010.
[29] C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," in Proceedings of the 13th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1999, pp. 246–252.
[30] R. Tan, H. Huo, J. Qian, and T. Fang, "Traffic video segmentation using adaptive-K Gaussian mixture model," in Advances in Machine Vision, Image Processing, and Pattern Analysis, 2006, pp. 125–134.
[31] I. Vasilescu, K. Kotay, D. Rus, M. Dunbabin, and P. Corke, "Data collection, storage, and retrieval with an underwater sensor network," in Proceedings of the 3rd International Conference on Embedded Networked Sensor Systems (SenSys), 2005, pp. 154–165.
[32] C.-W. You, N. D. Lane, F. Chen, R. Wang, Z. Chen, T. J. Bao, Y. Cheng, M. Lin, L. Torresani, and A. T. Campbell, "CarSafe app: alerting drowsy and distracted drivers using dual cameras on smartphones," in Proceedings of the 11th International Conference on Mobile Systems, Applications, and Services (MobiSys), 2013, pp. 13–26.
[33] Z. Zivkovic, "Improved adaptive Gaussian mixture model for background subtraction," in Proceedings of the 17th International Conference on Pattern Recognition (ICPR), 2004, pp. 28–31.