LAVEA: Latency-aware Video Analytics on Edge Computing Platform

Shanhe Yi (College of William and Mary), Zijiang Hao (College of William and Mary), Qingyang Zhang (Wayne State University / Anhui University, China), Quan Zhang (Wayne State University), Weisong Shi (Wayne State University), Qun Li (College of William and Mary)

ABSTRACT
Along the trend pushing computation from the network core to the edge, where most of the data are generated, edge computing has shown its potential in reducing response time, lowering bandwidth usage, improving energy efficiency, and so on. At the same time, low-latency video analytics is becoming more and more important for applications in public safety, counter-terrorism, self-driving cars, VR/AR, etc. As those tasks are either computation intensive or bandwidth hungry, edge computing fits in well here with its ability to flexibly utilize computation and bandwidth from and between each layer. In this paper, we present LAVEA, a system built on top of an edge computing platform, which offloads computation between clients and edge nodes and collaborates with nearby edge nodes, to provide low-latency video analytics at places closer to the users. We have adopted an edge-first design, formulated an optimization problem for offloading task selection, and prioritized offloading requests received at the edge node to minimize the response time. In case of a saturating workload on the front edge node, we have designed and compared various task placement schemes that are tailored for inter-edge collaboration. We have implemented and evaluated our system. Our results reveal that the client-edge configuration has a speedup ranging from 1.3x to 4x (1.2x to 1.7x) against running in local (client-cloud configuration). The proposed shortest scheduling latency first scheme yields the best overall task placement performance for inter-edge collaboration.
CCS CONCEPTS
• Networks → Cloud computing; • Computing methodologies → Object recognition; • Software and its engineering → Publish-subscribe / event-based architectures;

KEYWORDS
computation offloading, edge computing

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
SEC '17, San Jose / Silicon Valley, CA, USA
© 2017 ACM. 978-1-4503-5087-7/17/10...$15.00
DOI: 10.1145/3132211.3134459

ACM Reference format:
Shanhe Yi, Zijiang Hao, Qingyang Zhang, Quan Zhang, Weisong Shi, and Qun Li. 2017. LAVEA: Latency-aware Video Analytics on Edge Computing Platform. In Proceedings of SEC '17, San Jose / Silicon Valley, CA, USA, October 12–14, 2017, 13 pages.
DOI: 10.1145/3132211.3134459

1 INTRODUCTION
Edge computing (also termed fog computing [4], cloudlets [28], MEC [24], etc.) has brought us better opportunities to achieve the ultimate goal of a world with pervasive computation [28]. This new computing paradigm is proposed to overcome the inherent problems of cloud computing and provide support to the emerging Internet of Things (IoT) [14, 33, 37]. Typically, when using the cloud, all the data generated shall be uploaded to the cloud data center before processing. However, considering that nowadays a huge amount of data is being intensively generated at the edge of the network, transferring data at such scale to the distant cloud for processing will add burdens to the network and lead to unacceptable response time, especially for latency-sensitive applications.
More specifically, as for edge computing, we aim to provide edge analytics, which focuses on data analytics at or near the places (the network edge) where data is generated [30]. Data analytics done at the edge of the network has many benefits, such as gathering more client-side information, cutting short the response time, saving network bandwidth, lowering the peak workload to the cloud, and so on. Among the many edge analytics applications, in this paper we focus on delivering video analytics at the edge. The ability to provide low-latency video analytics is critical for applications in the fields of public safety, counter-terrorism, self-driving cars, VR/AR, etc. [32]. In video edge analytics applications, we consider typical client devices such as mobile phones, body-worn cameras or dash cameras mounted on vehicles, web cameras at toll stations or highway checkpoints, security cameras in public places, or even video captured by UAVs [35]. For example, in "Amber Alert", our system can automate and speed up the search for objects of interest by vehicle recognition, vehicle license plate recognition, and face recognition, utilizing various web cameras deployed at highway entrances, or dash cameras or smartphone cameras mounted on cars.

Simply uploading all the captured video or redirecting video feeds to the cloud cannot meet the requirements of latency-sensitive applications, because the computer vision algorithms involved in
object tracking, object detection, object recognition, face and optical character recognition (OCR) are either computation intensive or bandwidth hungry. In addressing these problems, mobile cloud computing (MCC) was proposed to run heavy tasks on resource-rich cloud nodes to improve response time or energy cost. This technique utilizes both the mobile device and the cloud for computation. An appropriate partition of tasks that makes a trade-off between local and remote execution can speed up the computation and preserve mobile energy at the same time [7, 13, 15, 21, 31]. However, there are still concerns about the cloud regarding the limited bandwidth, the unpredictable latency, and abrupt service outages. Existing work has explored adding intermediate servers (cloudlets) between the mobile client and the cloud. Cloudlet is an early implementation of a cloud-like edge computing platform with virtual machine (VM) techniques. The edge computing platform in our work has a different design on top of lightweight OS-level virtualization, which is modular: easy to deploy, manage, and scale. Compared to VMs, OS-level virtualization provides resource isolation at a much lower cost. The adoption of the container technique leads to a serverless platform where the end user can deploy and enable the edge computing platform on heterogeneous devices with minimal effort. The user programs (scripts or executable binaries) are encapsulated in containers, which provide resource isolation, self-contained packaging, anywhere deployment, and easy-to-configure clustering. The end user only needs to register events of interest and provide corresponding handler functions to our system, which automatically handles the events behind the scenes.

In this paper, we consider a 3-tier mobile-edge-cloud deployment and put most of our effort into the mobile-edge and inter-edge side design. To demonstrate the effectiveness of our edge computing platform, we have built the Latency-Aware Video Edge Analytics (LAVEA) system. We divide the response time minimization problem into three sub-problems. First, we select client tasks that benefit from being offloaded to the edge node in terms of reduced time cost. We formulated this problem as a mathematical optimization problem to choose offloading tasks and allocate bandwidth among clients. Unlike existing work in mobile cloud computing, we cannot assume that the edge node is as powerful as a cloud node that can process all tasks instantly. Therefore, we account for the increasing resource contention and response time when more and more tasks are running on the edge node by adding latency constraints to the optimization problem. Second, upon receiving offloading task requests at each epoch, the edge node runs these tasks in an order that minimizes the makespan. However, the offloaded tasks cannot be started when the corresponding inputs are not ready. To address this problem, we employed a classic two-stage job shop model and adapted Johnson's rule with a topological ordering constraint in a heuristic to prioritize the tasks. Last, we enable inter-edge collaboration, leveraging nearby edge nodes to reduce the overall task completion time. We have investigated several task placement schemes that are tailored for inter-edge collaboration. The findings provided us insights that led to an efficient prediction-based task placement scheme.
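To make the two-stage job shop idea above concrete, the classic (unadapted) Johnson's rule can be sketched as follows. Each offloaded task first occupies the uplink (stage 1: input transmission) and then the edge CPU (stage 2: execution). The task names and stage times below are made up for illustration, and the paper's actual heuristic additionally enforces a topological ordering constraint, which this sketch omits.

```python
# Johnson's rule for a two-stage flow shop: tasks whose stage-1 time is
# smaller than their stage-2 time go first (ascending stage-1 time); the
# rest go last (descending stage-2 time). This minimizes the makespan.

def johnson_order(tasks):
    """tasks: list of (name, t1, t2). Returns task names in Johnson's order."""
    group1 = sorted((t for t in tasks if t[1] < t[2]), key=lambda t: t[1])
    group2 = sorted((t for t in tasks if t[1] >= t[2]), key=lambda t: -t[2])
    return [t[0] for t in group1 + group2]

def makespan(tasks, order):
    """Completion time of the last task for a given processing order."""
    by_name = {t[0]: t for t in tasks}
    end1 = end2 = 0.0
    for name in order:
        _, t1, t2 = by_name[name]
        end1 += t1                   # uplink transmits inputs sequentially
        end2 = max(end2, end1) + t2  # CPU starts once the input has arrived
    return end2

# Illustrative (name, upload_s, exec_s) triples, not measured values:
tasks = [("detect", 2.0, 5.0), ("ocr", 4.0, 1.0), ("analyze", 3.0, 3.0)]
order = johnson_order(tasks)
```

Here `johnson_order` yields `["detect", "analyze", "ocr"]`, whose makespan (11 s) beats, e.g., the reversed order (15 s), showing why request ordering at the edge-front matters.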

In summary, we make the following contributions:

• We have designed an edge computing platform based on a serverless architecture, which is able to provide flexible computation offloading to nearby clients to speed up computation-intensive and delay-sensitive applications. Our implementation is lightweight-virtualized, event-based, modular, and easy to deploy and manage on either edge or cloud nodes.

• We have formulated an optimization problem for offloading task selection and prioritized offloading requests to minimize the response time. The task selection problem co-optimizes the offloading decision and bandwidth allocation and is constrained by the latency requirement, which can be tuned to adapt to the workload on the edge node. The task prioritizing is modeled as a two-stage job shop problem, and a heuristic with a topological ordering constraint is proposed.

• We have evaluated several task placement schemes for inter-edge collaboration and proposed a prediction-based method which efficiently estimates the response time.

2 BACKGROUND AND MOTIVATION
In this section, we briefly introduce the background of edge computing and relevant techniques, present our observations from preliminary measurements, and discuss the scenarios that motivate us.

Figure 1: An overview of the edge computing environment (WiFi base stations with co-located edge computing servers connected over a LAN; a remote cloud reachable over a WAN).

2.1 Edge Computing Network
In this paper, we consider an edge computing network as shown in Figure 1, in which we focus on two types of nodes: the client node (called client for short) and the edge server node (called edge, edge node, or edge server for short). We assume that clients are one hop away from the edge server via wired or wireless links. When a client connects to the edge node, we implicitly indicate that the client will first connect to the


corresponding access point (AP), using cable or wireless, and then utilize the services provided by the co-located edge node. In a sparse edge node deployment, a client will only connect to one of the available edge nodes nearby at a certain location, while in a dense deployment, a client may have multiple choices in selecting among multiple edge servers for services. Implicitly, we assume that there is a remote cloud node which can be reached via the wide area network (WAN).

To understand the factors that impact the feasibility of realizing practical edge computing systems, we have performed several preliminary measurements on existing networks and show the results in Fig. 2 and Fig. 3. In these experiments, we measured the latency and bandwidth of combinations of client nodes with different network interfaces connecting to edge (or cloud) nodes. Based on the bandwidth measurements, all clients benefit from utilizing a wire-connected or advanced-wireless (802.11ac, 5 GHz) edge computing node. In terms of latency, wire-connected edge nodes are the best, while the 5 GHz wireless edge computing nodes have larger means and variances in latency than the cloud node in the closest region, due to the intrinsic nature of wireless channels. Therefore, in this paper we pragmatically assume that edge nodes are connected to APs via cables to deliver services with better latency and bandwidth than the cloud. In such a setup, the cloud node can be considered a backup computing node, which will be utilized only when the edge node is saturated and experiences a long response time.
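The paper does not describe its measurement tooling; one rough sketch of how per-link RTT statistics like those in Fig. 2 could be collected is to time repeated TCP handshakes against each edge or cloud endpoint. The function name and endpoints here are illustrative assumptions.

```python
# Estimate RTT mean/stdev to a host by timing TCP connection setup.
# Failed probes are dropped rather than counted as infinite latency.
import socket
import statistics
import time

def tcp_rtt_ms(host, port, samples=10, timeout=2.0):
    rtts = []
    for _ in range(samples):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # connection established; close immediately
        except OSError:
            continue  # unreachable probe: skip this sample
        rtts.append((time.perf_counter() - start) * 1000.0)
    if len(rtts) < 2:
        return (None, None)
    return (statistics.mean(rtts), statistics.stdev(rtts))
```

Repeating this from wired, 2.4 GHz, and 5 GHz clients against each target would reproduce the mean/variance comparison the text draws between edge and cloud nodes.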

Figure 2: Round trip time between client and edge/cloud. (RTT in ms for wired, WiFi 2.4 GHz, and WiFi 5 GHz clients against a wired edge, WiFi 5 GHz edge, WiFi 2.4 GHz edge, EC2 east, and EC2 west.)

2.2 Serverless Architecture
Serverless architecture, or Function as a Service (FaaS), such as AWS Lambda, Google Cloud Functions, and Azure Functions, is an agile solution for developers to build cloud computing services without the heavy lifting of managing cloud instances. To use AWS Lambda as an example: AWS Lambda is an event-based micro-service framework in which a user-supplied Lambda function, as the application logic, is executed in response to a corresponding event. The AWS cloud takes care of the provisioning and resource management for running Lambda functions. The first time a Lambda function is created, a container is built and launched based on the configurations provided. Each container will also be provided a small

Figure 3: Bandwidth between client and edge/cloud. (Bandwidth in Mbps for wired, WiFi 2.4 GHz, and WiFi 5 GHz clients against a wired edge, WiFi 5 GHz edge, WiFi 2.4 GHz edge, EC2 east, and EC2 west.)

disk space as a transient cache across multiple invocations. AWS has its own way to run Lambda functions, either reusing an existing container or creating a new one. Recently, AWS Lambda@Edge [1] allows using serverless functions at AWS edge locations in response to CDN events to apply moderate computations. We strongly advocate the adoption of serverless architecture at the edge computing layers, as it naturally solves two important problems for edge computing: 1) the serverless programming model greatly reduces the burden on users or developers in developing, deploying, and managing edge applications, as there is no need to understand the complex underlying procedures to run the applications or do the heavy lifting of distributed system management; 2) the functions are flexible to run on either edge or cloud, which lowers the barrier to edge-cloud interoperability and federation. Recent work has shown the potential of such an architecture in low-latency video processing tasks [11] and distributed computing tasks [20], and there have been research efforts on incorporating serverless architecture in edge computing [8].
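The register-events-and-handlers model described above can be sketched as a tiny in-process event bus. The names (`EdgeRuntime`, `on`, `emit`) and the event string are illustrative assumptions, not LAVEA's or AWS Lambda's actual API.

```python
# A minimal event-based runtime: users register handler functions for
# events of interest; the platform dispatches events to all handlers.
from collections import defaultdict

class EdgeRuntime:
    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event):
        """Decorator that registers a handler function for an event."""
        def decorator(fn):
            self._handlers[event].append(fn)
            return fn
        return decorator

    def emit(self, event, payload):
        """Invoke every handler registered for the event; collect results."""
        return [fn(payload) for fn in self._handlers[event]]

runtime = EdgeRuntime()

@runtime.on("frame.captured")  # hypothetical event name
def detect_plate(frame):
    return f"plate-detection on {frame}"

results = runtime.emit("frame.captured", "frame-001")
```

The platform, not the user, decides where and in which container each handler runs, which is the property the text argues lowers the edge-cloud interoperability barrier.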

2.3 Video Edge Analytics for Public Safety
Video surveillance is of great importance for public safety. Besides the "Amber Alert" example, there are many other applications in this field. For example, security cameras deployed at public places (e.g., the airport) can quickly spot unattended bags [42]; police with body-worn cameras can identify suspects and suspicious vehicles while approaching; and so on. Because those scenarios are urgent and critical, the applications need to provide the quickest responses with best effort. However, most tasks in video analytics are undoubtedly computationally intensive [26]. When running directly on resource-constrained mobile clients or IoT devices, the latency in computation, battery drain (if battery-powered), or even heat dissipation will eventually ruin the user experience, failing to achieve the performance goals of the applications. If running on cloud nodes, transferring large volumes of multimedia data will incur unacceptable transmission latency and additional bandwidth cost. Proposed as a dedicated solution, the deployment of an edge computing platform enables the quickest responses to these video analytics tasks, which require both low latency and high bandwidth.

In this paper, we mainly focus on building a video edge analytics platform, and we demonstrate our platform using the application


of Automated License Plate Recognition (ALPR). Even though we integrate a specific application, our edge platform is a general design and can be extended for other applications with little modification. An ALPR system usually has four stages: 1) image acquisition, 2) license plate extraction, 3) license plate analysis, and 4) character recognition [2, 10]. Each of the stages involves various computer vision, pattern recognition, and machine learning algorithms. Migrating the execution of some algorithms to a powerful edge/cloud node can significantly reduce the response time [34]. However, offloaded tasks require intermediate data, application state variables, and corresponding configurations to be uploaded. Some of the algorithms produce a large amount of intermediate data, which will add delay to the whole processing time if offloaded to a remote cloud. We believe that a carefully designed edge computing platform will help ALPR systems expand to more resource-constrained devices at more locations and provide better response time at the same time.
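The four ALPR stages form a linear pipeline; the sketch below shows that structure with stub stage functions standing in for the real vision algorithms (the stage outputs are fabricated placeholders, not OpenALPR's actual data types).

```python
# A stub ALPR pipeline mirroring the four stages named in the text.
def acquire(source):
    """1) Image acquisition: grab a frame from a camera source."""
    return {"frame": source}

def extract_plate(img):
    """2) License plate extraction: locate the plate region."""
    img["plate_region"] = "region"
    return img

def analyze_plate(img):
    """3) License plate analysis: deskew and segment characters."""
    img["chars"] = ["A", "B", "1", "2"]
    return img

def recognize(img):
    """4) Character recognition (OCR): read the segmented characters."""
    return "".join(img["chars"])

def alpr(source):
    data = source
    for stage in (acquire, extract_plate, analyze_plate, recognize):
        data = stage(data)
    return data

plate = alpr("camera-0")
```

Because each stage is a separate function with an explicit input and output, any suffix of the pipeline can be moved to the edge node, which is exactly the per-task offloading granularity the system exploits.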

3 LAVEA SYSTEM DESIGN
In this section, we present our system design. First, we discuss our design goals. Then, we overview our system design and introduce several important edge computing services.

3.1 Design Goals

• Latency: The ability to provide low-latency services is recognized as one of the essential requirements of edge computing system design.

• Flexibility: The edge computing system should be able to flexibly utilize the hierarchical resources from client nodes, nearby edge nodes, and remote cloud nodes.

• Edge-first: By edge-first, we mean that the edge computing platform is the first choice of computation offloading target.

3.2 System Overview
LAVEA is intrinsically an edge computing platform which supports low-latency video processing. The main components are the edge computing node and the edge client. Whenever a client is running tasks and a nearby edge computing node is available, a task can be decided to run either locally or remotely. We present the architecture of our edge computing platform in Figure 4.

3.2.1 Edge Computing Node. In LAVEA, the edge computing node provides edge computing services to the mobile devices nearby. The edge computing node attached to the same access point or base station as clients is called the edge-front. By deploying edge computing nodes with access points or base stations, we ensure that edge computing service can be as ubiquitous as Internet access. Multiple edge computing nodes can collaborate, and the edge-front always serves as the master, in charge of the coordination with other edge nodes and cloud nodes. As shown in Figure 4, we use a lightweight virtualization technique to provide resource allocation and isolation to different clients. Any client can submit tasks to the platform via client APIs. The platform is responsible for shaping workloads, managing queue priorities, and scheduling tasks. Those functions are implemented via internal APIs provided by multiple micro-services, such as the queueing service, scheduling service, data store service, etc. We introduce several important services later in this section.

3.2.2 Edge Client. Since most edge clients are either resource-constrained devices or need to accommodate requests from a large number of clients, an edge client usually runs lightweight data processing tasks locally and offloads heavy tasks to a nearby edge computing node. In LAVEA, the edge client has a thin-client design to make sure all clients can run it without introducing too much overhead. For low-end devices, there is only one worker to make progress on the assigned job. The most important parts of the client node design are the profiler and the offloading controller, acting as participants in the corresponding profiler service and offloading service. With the profiler and offloading controller, a client can provide offloading information to the edge-front node and fulfill offloading decisions received.

3.3 Edge Computing Services
3.3.1 Profiler Service. Similar to [7, 21, 31], our system uses a profiler to collect task performance information on various devices, since it is difficult to derive an analytic model that accurately captures the behavior of the whole system. However, we have found that the execution of video processing tasks is relatively stable (when the input and algorithmic configurations are given), and a profiler can be used to collect relevant metrics. Therefore, we add a profiling phase to the deployment of every new type of client device and edge device. The profiler executes instrumented tasks multiple times with different inputs and configurations on the device and measures metrics including, but not limited to, execution time, input/output data size, etc. The time-stamped logs are gathered to build the task execution graph for specific tasks, inputs, configurations, and devices. The profiler service collects this information, on which LAVEA relies for offloading decisions.
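A minimal profiler in the spirit described above might look like the following: run a task several times on sample inputs and record execution time and input/output sizes. The function name, record fields, and sample task are our own illustrative choices.

```python
# Profile a task by repeated timed execution over representative inputs.
import statistics
import time

def profile_task(task, sample_inputs, repeats=3):
    records = []
    for inp in sample_inputs:
        times = []
        out = None
        for _ in range(repeats):
            start = time.perf_counter()
            out = task(inp)
            times.append(time.perf_counter() - start)
        records.append({
            "input_size": len(inp),          # bytes/items of input
            "output_size": len(out),         # bytes/items of output
            "mean_time_s": statistics.mean(times),
        })
    return records

# Profile a toy "task" on two inputs of different sizes:
profile = profile_task(lambda s: s.upper(), ["abc", "abcdef"])
```

Records like these, keyed additionally by device type and task configuration, are the kind of inputs an offloading optimizer can consume.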

3.3.2 Monitoring Service. Unlike the profiler service, which gathers pre-run-time execution information on pre-defined inputs and configurations, the monitoring service is used to continuously monitor and collect run-time information, such as the network and system load, from not only the clients but also nearby edge nodes. Monitoring the network between client and edge-front is necessary, since most edge clients are connected to the edge-front server via wireless links whose conditions change from time to time. Therefore, we need to constantly monitor the wireless link to estimate the bandwidth and the latency. Monitoring the system load on the edge client enables flexible workload shaping and task offloading from the client to the edge. This information is also broadcast among nearby edge nodes. When an edge-front node is saturated or unstable, some tasks will be assigned to nearby edge nodes according to the system load, the network bandwidth, and the network delay between edge nodes, as long as there is still a benefit over assigning the tasks to the cloud node.
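One common way to track a changing wireless link, consistent with the continuous monitoring described above, is an exponentially weighted moving average (EWMA) that smooths noisy per-probe samples. The class name, fields, and smoothing factor are assumptions for illustration; the paper does not specify its estimator.

```python
# Smooth noisy latency/bandwidth probes with an EWMA so one outlier
# probe does not flip the offloading decision.
class LinkMonitor:
    def __init__(self, alpha=0.2):
        self.alpha = alpha    # weight of the newest sample
        self.rtt_ms = None
        self.bw_mbps = None

    def update(self, rtt_ms, bw_mbps):
        if self.rtt_ms is None:   # first sample seeds the estimate
            self.rtt_ms, self.bw_mbps = rtt_ms, bw_mbps
        else:
            a = self.alpha
            self.rtt_ms = a * rtt_ms + (1 - a) * self.rtt_ms
            self.bw_mbps = a * bw_mbps + (1 - a) * self.bw_mbps

mon = LinkMonitor()
mon.update(10.0, 50.0)
mon.update(20.0, 40.0)  # a spike moves the estimate only slightly
```

After the second probe the RTT estimate is 12 ms rather than 20 ms, so a single bad probe degrades the link estimate gradually instead of abruptly.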

3.3.3 Offloading Service. The offloading controller tracks tasks running locally at the client and exchanges information with the offloading service running on the edge-front server. The variables gathered by the profiler and monitoring services are used as inputs to the offloading decision problem, which is formulated as an optimization problem to minimize the response time. Every time


Figure 4: The architecture of the edge computing platform. (An edge computing node runs containers on OS-level virtualization under a container manager on the host OS, hosting the data store, offloading, queueing, scheduling, monitoring, and profiler services behind an edge-front gateway, with a workload optimizer, queue prioritizer, task scheduler, and worker pools; edge clients such as security cameras, dash cameras, smartphones, tablets, and laptops attach via an access point and run the platform SDK with a profiler, an offloading controller, a task scheduler, and a local worker stack.)

when a new client registers itself to the offloading service, after the edge-front node has collected enough prerequisite information and statistics, the optimization problem is solved again, and the updated offloading decisions are sent to all the clients. Periodically, the offloading service also re-solves the optimization problem and updates the offloading decisions with its clients.

4 EDGE-FRONT OFFLOADING
In this section, we describe how we select tasks of a job to run remotely on the edge server in order to minimize the response time.

We consider selecting tasks to run on the edge as a computation offloading problem. Traditional offloading problems concern offloading schemes between clients and remote, powerful cloud servers. In the literature [7, 21, 31], those system models usually assume the task will be finished instantly once it is offloaded to the server. However, we argue that this assumption does not hold in an edge computing environment, as we need to consider the various delays at the server side, especially when many clients are sending offloading requests. We call it edge-front computation offloading from the perspective of the client:

• Tasks will only be offloaded from a client to the nearest edge node, which we call the edge-front.

• The underlying scheduling and processing is agnostic to clients.

• When a mobile node is disconnected from any edge node or even cloud node, it will resort to local execution of all the tasks.

We assume that the edge node is wire-connected to the access point, which indicates that the outgoing traffic can go through the edge node with no additional cost. The only difference between offloading a task to an edge node and to a cloud node is that the task running on the edge node may experience resource contention and scheduling delay, while we assume a task offloaded to the cloud node will get enough resources and be scheduled to run immediately. In the light-workload case, if there is any response time reduction when a task is offloaded to the cloud, then we know that there is definitely a benefit when this task is offloaded to the edge. The reasons are: 1) an edge server is as responsive as a server in the cloud data center; 2) running a task on an edge server experiences shorter data transmission delay, as the client-edge link has much larger bandwidth than the edge-cloud link, which is usually limited and imbalanced by the Internet service providers (ISPs). Therefore, in this section we focus on task offloading only between client and edge server, and we discuss integrating nearby edge nodes for the heavy-workload scenario in the next section.
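The reasoning above reduces to a simple break-even check: offloading a task pays off when local execution takes longer than shipping the input plus the remote execution time and any scheduling delay. The function below is a back-of-the-envelope sketch with illustrative numbers, not the paper's actual optimization formulation.

```python
# Break-even test for offloading a single task.
def should_offload(t_local_s, input_mb, bw_mbps, t_remote_s, queue_delay_s=0.0):
    transfer_s = input_mb * 8.0 / bw_mbps  # MB -> Mb, divided by link rate
    return t_local_s > transfer_s + t_remote_s + queue_delay_s

# A 2 s local task with a 1 MB input over a 100 Mbps client-edge link,
# taking 0.5 s on the edge: transfer (0.08 s) + execution is well under 2 s.
decision = should_offload(2.0, 1.0, 100.0, 0.5)
```

The `queue_delay_s` term is what distinguishes the edge from the idealized cloud: as the edge node saturates, scheduling delay grows and flips the decision, which is exactly why the paper adds latency constraints to the optimization.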

4.1 Task Offloading System Model and Problem Formulation

In this paper, we call a running instance of the application a job, which is usually a set of tasks. The job is the unit of work that a user submits to our system, while the task is the unit of work on which our system makes scheduling and optimization decisions. The tasks generated from each application will be queued and processed either locally or remotely. By remotely, we mean running the task on an edge node. For simplicity, we consider that all clients are running


instances of applications processing the same kind of jobs, which is typically the case in our edge application scenario. However, our system can easily extend to heterogeneous applications running on each client device.

We choose to work at the granularity of tasks, since tasks are modularized and can be flexibly combined either to achieve fast processing or to form a workflow with high accuracy. In our ALPR application, each task is usually a common computer vision algorithm or library routine. For example, we have analyzed an open-source ALPR project called OpenALPR [22] and illustrate its task graph in Fig. 5.

Figure 5: The task graph of OpenALPR. (Input images or video frames pass through MotionDetection and PlateDetection; frames with no plate detected exit early, while each plate candidate is processed by PlateCharacterAnalysis and CharacterRecognition, and ResultGeneration produces the output.)

Then we consider that there are N clients and only one edge server connected, as shown in Fig. 1. This edge server could be a single server or a cluster of servers. Each client i, i ∈ [1, N], will process the upcoming job upon request, e.g., recognizing the license plates in video streams. Usually those jobs generate heavy computation tasks and could benefit from offloading some of them to the edge server. Without loss of generality, we use a graph of tasks to represent the complex task relations inside a job, which is essentially similar to the method call graph in [7], but at a coarser granularity. For a certain kind of job, we start with its directed acyclic graph (DAG) G = (V, E), which gives the task execution sequence. The weight of each vertex v ∈ V is the computation or memory cost of a task (c_v), while the weight of each edge e = (u, v), u, v ∈ V, e ∈ E, represents the data size of the intermediate results (d_{uv}). Thus our offloading problem can be taken as a graph partition problem, in which we need to assign a directed graph of tasks to different computing nodes (local, edge or cloud) so as to minimize a certain cost. In this paper, we primarily try to minimize the job finish time.
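The task DAG described above can be sketched in a few lines. The sketch below models the OpenALPR pipeline of Fig. 5 as a DAG with vertex costs c_v and edge data sizes d_uv, and derives a valid execution sequence via Kahn's topological sort; all numeric values are illustrative, not profiled measurements.

```python
# A small sketch of the job model: a directed acyclic task graph with
# vertex weights c_v (computation cost) and edge weights d_uv
# (intermediate data size). Task names follow the OpenALPR pipeline of
# Fig. 5; all numbers are illustrative.
cost = {                       # c_v per task
    "motion_detection": 2.0,
    "plate_detection": 8.0,
    "plate_char_analysis": 4.0,
    "char_recognition": 6.0,
    "result_generation": 1.0,
}
data = {                       # d_uv per edge (u, v)
    ("motion_detection", "plate_detection"): 1.5,
    ("plate_detection", "plate_char_analysis"): 0.4,
    ("plate_char_analysis", "char_recognition"): 0.2,
    ("char_recognition", "result_generation"): 0.1,
}

def topological_order(cost, data):
    """Kahn's algorithm: a valid execution sequence for the DAG."""
    indeg = {v: 0 for v in cost}
    for _, v in data:
        indeg[v] += 1
    ready = [v for v in cost if indeg[v] == 0]
    order = []
    while ready:
        u = ready.pop(0)
        order.append(u)
        for (a, b) in data:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    ready.append(b)
    return order

print(topological_order(cost, data))
```

Any assignment of these vertices to local or edge execution is one partition of the graph; the equations below attach a cost to such a partition.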

The remote response time includes the communication delay, the network transmission delay of sending data to the edge server, and the execution time on that server. We use an indicator I_{vi} ∈ {0, 1} for all v ∈ V and for all i ∈ [1, N]. If I_{vi} = 1, then the task v at client i will run locally; otherwise it will run on the remote edge server. For the tasks running locally, the total execution time for client i is

\[ T_i^{\mathrm{local}} = \sum_{v \in V} I_{vi} \frac{c_v}{p_i} \tag{1} \]

where p_i is the processor speed of client i. Similarly, we use

\[ \bar{T}_i^{\mathrm{local}} = \sum_{v \in V} (1 - I_{vi}) \frac{c_v}{p_i} \tag{2} \]

to represent the execution time of running the offloaded tasks locally. For the network, when there is an offloading decision, the client needs to upload the intermediate data (outputs of the previous task, application status variables, configurations, etc.) to the edge server in order to continue the computation. The network delay is modeled as

\[ T_i^{\mathrm{net}} = \sum_{(u,v) \in E} (I_{ui} - I_{vi}) \frac{d_{uv}}{r_i} + \beta_i \tag{3} \]

where r_i is the bandwidth assigned to this client, and β_i is the communication latency, which can be estimated using the round trip time between client i and the edge server.

For each client, the remote execution time is

\[ T_i^{\mathrm{remote}} = \sum_{v \in V} (1 - I_{vi}) \frac{c_v}{p_0} \tag{4} \]

where p_0 is the processor speed of the edge server. Then our offloading task selection problem can be formulated as

\[ \min_{I_i,\, r_i} \sum_{i=1}^{N} \left( T_i^{\mathrm{local}} + T_i^{\mathrm{net}} + T_i^{\mathrm{remote}} \right) \tag{5} \]

The offloading task selection is represented by the indicator matrix I. This optimization problem is subject to the following constraints:

• The total bandwidth:
\[ \mathrm{s.t.} \quad \sum_{i=1}^{N} r_i \le R \tag{6} \]

• Like existing work, we restrict the data flow to avoid the ping-pong effect, in which intermediate data is transmitted back and forth between the client and the edge server:
\[ \mathrm{s.t.} \quad I_{vi} \le I_{ui}, \;\forall e(u,v) \in E, \;\forall i \in [1, N] \tag{7} \]

• Unlike existing offloading frameworks for mobile cloud computing, we take the resource contention or scheduling delay at the edge side into consideration by adding an end-to-end delay constraint:
\[ \mathrm{s.t.} \quad \bar{T}_i^{\mathrm{local}} - \left( T_i^{\mathrm{net}} + T_i^{\mathrm{remote}} \right) > \tau, \;\forall i \in [1, N] \tag{8} \]
where τ can be tuned to avoid selecting borderline tasks that, if offloaded, will get no gain due to the resource contention or scheduling delay at the edge.


4.2 Optimization Solver

The proposed optimization is a mixed integer non-linear programming (MINLP) problem, where the integer variables stand for the offloading decisions and the continuous variables stand for the bandwidth allocation. To solve this optimization problem, we start by relaxing the integer constraints and solve the non-linear programming version of the problem using the Sequential Quadratic Programming (SQP) method, a constrained nonlinear optimization method. This solution is optimal without considering the integer constraints. Starting from this optimal solution, we optionally employ the branch and bound (B&B) method to search for the optimal integer solution, or simply do an exhaustive search when the number of clients and the number of tasks of each job are small.
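The exhaustive-search fallback for small instances can be sketched as follows. The sketch assumes a single client with fixed bandwidth, enumerates all indicator assignments that satisfy the data-flow constraint (7), and picks the one minimizing objective (5); the task costs, processor speeds and bandwidth are illustrative, not measured values.

```python
# Sketch of the exhaustive-search fallback in Section 4.2 for small
# instances: enumerate all assignments I satisfying constraint (7)
# and pick the one minimizing Eq. (5). Single client, fixed r;
# all numbers are illustrative.
from itertools import product

tasks = ["motion", "plate_det", "ocr"]          # topological order
cost = {"motion": 2.0, "plate_det": 8.0, "ocr": 6.0}
data = {("motion", "plate_det"): 1.5, ("plate_det", "ocr"): 0.4}
p_client, p_edge, r, beta = 1.0, 10.0, 5.0, 0.1

def objective(I):
    t_local = sum(cost[v] / p_client for v in tasks if I[v])                 # Eq. (1)
    t_net = sum((I[u] - I[v]) * d / r for (u, v), d in data.items()) + beta  # Eq. (3)
    t_remote = sum(cost[v] / p_edge for v in tasks if not I[v])              # Eq. (4)
    return t_local + t_net + t_remote

best = None
for bits in product([0, 1], repeat=len(tasks)):
    I = dict(zip(tasks, bits))
    # Constraint (7): once offloaded, stay offloaded (I_v <= I_u).
    if any(I[v] > I[u] for (u, v) in data):
        continue
    t = objective(I)
    if best is None or t < best[0]:
        best = (t, I)

print(best[1])   # which tasks to keep local (1) vs. offload (0)
```

With the fast edge processor assumed here, the search offloads the whole chain; with a slower edge or thinner link, the split point moves back toward the client.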

4.3 Prioritizing the Edge Task Queue

The offloading strategy produced by the task selection optimizes the "flow" time of each type of job. At each time epoch during run time, the edge-front node receives a large number of offloaded tasks from the clients. Originally, we followed the first-come-first-serve rule to accommodate all the client requests: for each request at the head of the task queue, the edge-front server first checks if the input or intermediate data (e.g., images or videos) is available at the edge; otherwise the server waits. This scheme is easy to implement, but substantial computation is wasted if the network I/O is busy with a large file and no task is ready for processing. Therefore, we improve the task scheduling with a task queue prioritizer, which maintains a task sequence that minimizes the makespan of all offloading task requests received at a certain time epoch. The edge node can execute a task only when the input data has been fully received or the depended-on tasks have finished execution. We therefore consider that an offloaded task has to go through two stages: the first stage is the retrieval of input or intermediate data and state variables; the second stage is the execution of the task.

We model our scheduling problem using the two-stage flow shop model and apply Johnson's rule [19]. This scheme is optimal, minimizing the makespan, when the number of stages is two. Nevertheless, this model only fits the case where all submitted job requests are independent and have no priorities. When considering task dependencies, a successor can only start after its predecessor finishes. By enforcing the topological ordering constraints, the problem can be solved optimally using the B&B method [5]. However, this solution hardly scales with the number of tasks. In this case, we adapt the method in [3]: we group tasks with dependencies and execute all tasks in a group sequentially. The basic idea is applying Johnson's rule at two levels. The first level decides the sequence of tasks within each group; the difference in our problem is that we need to decide the best sequence among all valid topological orderings. The bottom level is then a job shop scheduling problem over grouped jobs (i.e., groups of tasks with dependencies in topological ordering), in which we can utilize Johnson's rule directly.
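For independent tasks, the two-stage case can be sketched directly. The snippet below implements Johnson's rule for a two-machine flow shop, with stage 1 standing for data retrieval (I/O) and stage 2 for task execution (CPU); the job names and times are made-up examples.

```python
# A minimal sketch of Johnson's rule for the two-stage flow shop used
# by the task queue prioritizer: stage 1 is data retrieval (I/O) and
# stage 2 is task execution (CPU). Times below are illustrative.

def johnson_order(jobs):
    """jobs: {name: (io_time, cpu_time)}. Johnson's rule: jobs with
    io < cpu go first in increasing io time; the rest go last in
    decreasing cpu time. Minimizes two-machine flow shop makespan."""
    first = sorted((n for n, (io, cpu) in jobs.items() if io < cpu),
                   key=lambda n: jobs[n][0])
    last = sorted((n for n, (io, cpu) in jobs.items() if io >= cpu),
                  key=lambda n: jobs[n][1], reverse=True)
    return first + last

def makespan(order, jobs):
    """Completion time of the last job: stage 2 of a job starts only
    after its own stage 1 and the previous job's stage 2 finish."""
    io_done = cpu_done = 0
    for n in order:
        io, cpu = jobs[n]
        io_done += io                       # stage 1 runs back to back
        cpu_done = max(cpu_done, io_done) + cpu
    return cpu_done

jobs = {"A": (3, 6), "B": (7, 2), "C": (1, 4), "D": (5, 5)}
order = johnson_order(jobs)
print(order, makespan(order, jobs))
```

For this instance the rule yields the order C, A, D, B, which a brute-force check over all permutations confirms is makespan-optimal.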

4.4 Workload Optimizer

If the workload is overwhelming and the edge-front server is saturated, the task queue will become unstable and the response time will accumulate indefinitely. There are several measures we can take to address this problem. First, we can adjust the image/video resolution in client-side configurations, which offers a good trade-off between speed and accuracy. Second, by constraining the task offloading problem, we can retain more computation tasks at the client side. Third, if there are nearby edge nodes which are favored in terms of latency, bandwidth and computation, we can further offload tasks to those edge nodes; we investigate this case, with performance improvement considerations, in Section 5. Last, we can always redirect tasks to the remote cloud, just like offloading in MCC.

5 INTER-EDGE COLLABORATION

In this section, we improve our edge-first design by taking into consideration the case when the incoming workload saturates the edge-front node. We first discuss our motivation for providing such an option and list the corresponding challenges. Then we introduce several collaboration schemes we have proposed and investigated.

5.1 Motivation and Challenges

The resources of an edge computing node are much richer than those of client nodes, but are relatively limited compared to cloud nodes. While serving an increasing number of nearby client nodes, the edge-front node will eventually be overloaded and become non-responsive to new requests. As a baseline, we can optionally choose to offload further requests to the remote cloud. We assume that the remote cloud has unlimited resources and is capable of handling all the requests. However, when running tasks remotely in the cloud, the application has to bear unpredictable latency and limited bandwidth, which is not the best choice, especially when there are other nearby edge nodes that can accommodate those tasks. We assume that when all available nearby edge nodes are exhausted, the mobile-edge-cloud computing paradigm will simply fall back to the mobile cloud computing paradigm; the fallback design is not in the scope of this paper. In this paper, we mainly investigate inter-edge collaboration with the prime purpose of alleviating the burden on the edge-front node.

When the edge-front node is saturated with requests, it can collaborate with nearby edge nodes by placing some tasks on these not-so-busy edge nodes, such that all the tasks can get scheduled in a reasonable time. This is slightly different from balancing the workload among the edge nodes and the edge-front node, in that the goal of inter-edge collaboration is to better serve the client nodes with submitted requests, rather than simply making the workload balanced. For example, an edge-front node that is not overloaded does not need to place any tasks on the nearby edge nodes, even when they are idle.

The challenges of inter-edge collaboration are two-fold: 1) we need to design a proper inter-edge task placement scheme that fulfills our goal of reducing the workload on the edge-front node while offloading a proper amount of workload to the qualified edge nodes; 2) the task placement scheme should be lightweight, scalable and easy to implement.


5.2 Inter-Edge Task Placement Schemes

We have investigated three task placement schemes for inter-edge collaboration:

• Shortest Transmission Time First (STTF)
• Shortest Queue Length First (SQLF)
• Shortest Scheduling Latency First (SSLF)

The STTF task placement scheme tends to place tasks on the edge node that has the shortest estimated latency for the edge-front node to transfer the tasks. The edge-front node maintains a table to record the latency of transmitting data to each available edge node. Periodic re-calibration is necessary, because the network condition between the edge-front node and the other edge nodes may vary from time to time.
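The latency table above can be sketched as follows. The paper only states that the table is periodically re-calibrated; the exponential moving average used here for smoothing new measurements is an assumption for illustration, not LAVEA's documented method, and the node names and latencies are made up.

```python
# A minimal sketch of the STTF latency table. The exponential moving
# average used to fold in new measurements is an illustrative
# assumption; the paper only specifies periodic re-calibration.

class LatencyTable:
    def __init__(self, alpha=0.3):
        self.alpha = alpha          # weight of the newest measurement
        self.latency = {}           # edge node -> smoothed transfer latency (ms)

    def update(self, node, measured_ms):
        """Re-calibrate the entry for one edge node."""
        old = self.latency.get(node)
        self.latency[node] = (measured_ms if old is None
                              else self.alpha * measured_ms + (1 - self.alpha) * old)

    def sttf_target(self):
        """STTF: pick the node with the shortest estimated latency."""
        return min(self.latency, key=self.latency.get)

table = LatencyTable()
for node, ms in [("edge1", 10), ("edge2", 20), ("edge3", 100), ("edge1", 14)]:
    table.update(node, ms)
print(table.sttf_target())
```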

The SQLF task placement scheme, on the other hand, tends to transfer tasks from the edge-front node to the edge node which has the least number of tasks queued at the time of query. When the edge-front node is saturated with requests, it will first query all the available edge nodes about their current task queue length, and then transfer tasks to the edge node that reports the smallest value.

The SSLF task placement scheme tends to transmit tasks from the edge-front node to the edge node that is predicted to have the shortest response time. The response time is the interval between the time when the edge-front node submits a task to an available edge node and the time when it receives the result of the task from that edge node. In the SQLF scheme, the edge-front node has to keep querying the edge nodes about their queue lengths, which may cause performance issues and a large volume of queries when the number of nodes scales up. We have therefore designed a novel method for the edge-front node to measure the scheduling latency efficiently. During the measurement phase, before the edge-front node chooses a task placement target, it sends a request message to each available edge node, which appends a special task to the tail of its task queue. When the special task is executed, the edge node simply sends a response message back to the edge-front node, which records the response time. In this way, the edge-front node maintains a series of response times for each available edge node. When the edge-front node is saturated, it starts to reassign tasks to the edge node having the shortest response time. Unlike the STTF and SQLF task placement schemes, which choose the target edge node based on the current or most recent measurements, the SSLF scheme predicts the current response time for each edge node by applying regression analysis to the response time series recorded so far. The reason is that the edge nodes are also receiving task requests from client nodes, and their local workload may vary from time to time, so the most recent response time cannot serve as a good predictor of the current response time for an edge node. As the real-world local workload on each edge node usually follows a certain pattern or trend, applying regression analysis to the recorded response times is a good way to estimate the current response time. To this end, the edge-front node records response time measurements from each edge node and offloads tasks to the edge node that is predicted to have the shortest current response time. Once the edge-front node starts to place tasks on a certain edge node, the estimation is updated by piggybacking on the redirected tasks, which lowers the measurement overhead.
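The SSLF predictor can be sketched with the simple linear regression that the evaluation in Section 6.6 mentions: fit a least-squares line to each node's recorded response times and extrapolate it to the current time. The node names and sample series below are made-up data for illustration.

```python
# A minimal sketch of the SSLF predictor: least-squares linear
# regression over each edge node's recorded (timestamp, response_time)
# samples, extrapolated to "now". Data points are illustrative.

def predict_response(samples, now):
    """samples: list of (timestamp, response_time). Returns the
    least-squares linear extrapolation at time `now`."""
    n = len(samples)
    sx = sum(t for t, _ in samples)
    sy = sum(y for _, y in samples)
    sxx = sum(t * t for t, _ in samples)
    sxy = sum(t * y for t, y in samples)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope * now + intercept

history = {
    "edge1": [(0, 1.0), (1, 1.4), (2, 1.8), (3, 2.2)],   # load trending up
    "edge2": [(0, 2.0), (1, 1.8), (2, 1.6), (3, 1.4)],   # load trending down
}
# SSLF: place tasks on the node with the shortest predicted response time.
target = min(history, key=lambda n: predict_response(history[n], now=4))
print(target)
```

Note how the prediction, unlike the most recent sample alone, favors the node whose load is trending down.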

Each of the task placement schemes described above has advantages and disadvantages. The STTF scheme can quickly reduce the workload on the edge-front node, but tasks may be placed on an edge node which already has an intensive workload, as STTF gathers no information about the workload on the target. The SQLF scheme works well when the network latency and bandwidth are stable among all the available edge nodes; when the network overheads are highly variant, this scheme fails to factor in the network condition and always chooses the edge node with the lowest workload, and placing an intensive workload over a link with high network overhead can even deteriorate the performance, as the workload needs to be measured frequently. The SSLF scheme estimates the response time of each edge node by following the task-offloading process, and the response time is a good indicator of which edge node should be chosen as the target of task placement in terms of both workload and network overhead. The SSLF scheme is thus a good trade-off between the previous two schemes. However, the regression analysis may introduce a large error into the predicted response time if inappropriate models are selected. We believe that the decision of which task placement scheme to employ for good system performance should always give proper consideration to the workload and network conditions. We evaluate these three schemes through a case study in the next section.

6 SYSTEM IMPLEMENTATION AND PERFORMANCE EVALUATION

In this section, we first brief the implementation details of our system. Next, we introduce our evaluation setup and present the results of our evaluations.

6.1 Implementation Details

Our implementation aims at a serverless edge architecture. As shown in the system architecture of Fig. 4, our implementation is based on Docker containers for the benefits of quick deployment and easy management. Every component has been dockerized, and its deployment is greatly simplified via distributing pre-built images; the creation and destruction of Docker instances is much faster than that of VM instances. Inspired by IBM OpenWhisk [18], each worker container contains an action proxy, which uses Python to run any scripts, or compile and execute any binary executable. The worker containers communicate with each other using a message queue, and all inputs/outputs are serialized as JSON. However, we do not serialize images or videos into JSON; instead we pass their path references in shared storage. The task queue is implemented using Redis, as it is in-memory and has very good performance. The end user only needs to 1) deploy our edge computing platform on heterogeneous devices with just a click, 2) define the events of interest using the provided API, and 3) provide a function (scripts or a binary executable) to process such events. The function we have implemented uses the open source project OpenALPR [22] as the task payload for workers.
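The worker loop described above can be sketched as follows. The real system uses a Redis-backed task queue, dockerized workers and the OpenALPR payload; in this sketch a plain Python list stands in for Redis, the handler is a stub, and all names and paths are illustrative assumptions.

```python
# A minimal sketch of the worker/action-proxy loop of Section 6.1.
# A plain list stands in for the Redis task queue; the handler is a
# stub instead of the OpenALPR payload. Names/paths are illustrative.
import json

task_queue = []                     # stand-in for a Redis list

def submit(task_queue, name, image_path):
    # Inputs/outputs are JSON; images stay in shared storage and only
    # their path reference is passed around.
    task_queue.append(json.dumps({"task": name, "input": image_path}))

def worker_step(task_queue, handlers):
    """Pop one task (FIFO), run its handler, return the JSON result."""
    if not task_queue:
        return None
    msg = json.loads(task_queue.pop(0))
    result = handlers[msg["task"]](msg["input"])
    return json.dumps({"task": msg["task"], "output": result})

handlers = {"plate_detection": lambda path: {"plates": 1, "source": path}}
submit(task_queue, "plate_detection", "/shared/frames/0001.jpg")
print(worker_step(task_queue, handlers))
```

With Redis, `submit` would map to `LPUSH` and the pop inside `worker_step` to a blocking `BRPOP`, so idle workers sleep instead of polling.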


6.2 Evaluation Setup

6.2.1 Testbed. We have built a testbed consisting of four edge computing nodes. One of the edge nodes is the edge-front node, which is directly connected to a wireless router using a cable. The other three nodes are set up as nearby edge computing nodes for the evaluation of inter-edge collaboration. These four machines have the same hardware specifications: they all have a quad-core CPU and 4 GB main memory. The three nearby edge nodes are directly connected to the edge-front node through network cables. We make use of two types of Raspberry Pi (RPi) nodes as clients: one type is the RPi 2, which is wired to the router, while the other type is the RPi 3, which is connected to the router using its built-in 2.4 GHz WiFi.

6.2.2 Datasets. We have employed three datasets for the evaluations. One dataset is the Caltech Vision Group 2001 testing database, in which the car rear image resolution (126 images at 896x592) is adequate for license plate recognition [25]. Another dataset is a self-collected 4K video containing rear license plates, taken on an Android smartphone and converted into videos of different resolutions (640x480, 960x720, 1280x960 and 1600x1200). The other dataset, used in the inter-edge collaboration evaluation, contains 22 car images with various resolutions ranging from 405x540 pixels to 2514x1210 pixels (file size 316 KB to 2.85 MB). The task requests use the car images as input in a round-robin way: one car image for each task request.

6.3 Task Profiler

Besides the round trip time and bandwidth benchmarks we have presented in Fig. 2 and Fig. 3 to characterize the edge computing network, we have profiled the OpenALPR application on various client, edge and cloud nodes.
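Such a profiler can be sketched as a thin timing wrapper around the pipeline stages. The stage functions below are stubs standing in for the OpenALPR tasks, and the `profile` helper is an illustrative assumption rather than LAVEA's actual profiler interface.

```python
# A tiny sketch of the per-task profiler: run a frame through the
# pipeline and record wall-clock time per stage, as used to produce
# Figs. 6-9. Stage functions are stubs for the OpenALPR tasks.
import time

def profile(stages, frame):
    """Run `frame` through the pipeline, timing each stage in ms."""
    timings = {}
    out = frame
    for name, fn in stages:
        start = time.perf_counter()
        out = fn(out)
        timings[name] = (time.perf_counter() - start) * 1000.0
    return out, timings

stages = [
    ("MotionDetection", lambda f: f),   # stubs: pass the frame through
    ("PlateDetection", lambda f: f),
    ("PlateAnalysis", lambda f: f),
    ("OCR", lambda f: f),
]
out, timings = profile(stages, frame="frame-0001")
print(sorted(timings))
```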

Figure 6: OpenALPR profile result of client type 1 (RPi 2, quad-core 0.9 GHz). (Bar chart of per-task execution time in ms for MotionDetection, PlateDetection, PlateAnalysis and OCR, on workload 1 at 896x592 and workload 2 at 640x480, 960x720, 1280x960 and 1600x1200.)

In this experiment, we use both dataset 1 (workload 1) and dataset 2 (workload 2) at various resolutions. The execution times for each task are shown in Fig. 6, Fig. 7, Fig. 8 and Fig. 9. The results indicate that by utilizing an edge node, we can get a comparable amount of computation power close to the clients for computation-intensive tasks. Another observation is that, due to the uneven optimizations on heterogeneous CPU architectures, some tasks are better kept local, while some others should be

Figure 7: OpenALPR profile result of client type 2 (RPi 3, quad-core 1.2 GHz). (Same per-task execution time layout as Fig. 6.)

Figure 8: OpenALPR profile result of a type of edge node (i7 quad-core, 2.30 GHz). (Same per-task execution time layout as Fig. 6.)

Figure 9: OpenALPR profile result of a type of cloud node (AWS EC2 t2.large, Xeon dual-core 2.40 GHz). (Same per-task execution time layout as Fig. 6.)

offloaded to an edge computing node. This observation justifies the need for computation offloading between clients and edge nodes.


Figure 10: The comparison of task selection impacts on edge offloading and cloud offloading for wired clients (RPi 2). (Response time per frame per client in seconds for Client-edge opt, Client only, Edge only, Client-cloud opt and Cloud only, on workload 2 at 640x480, 960x720, 1280x960 and 1600x1200.)

Figure 11: The comparison of task selection impacts on edge offloading and cloud offloading for 2.4 GHz wireless clients (RPi 3). (Same layout as Fig. 10.)

6.4 Offloading Task Selection

To understand how much the execution time can be reduced by splitting tasks between the client and the edge, or between the client and the cloud, we design an experiment with workloads generated from dataset 2 in two scenarios: 1) one edge node provides service to three wired client nodes that have the best network latency and bandwidth; 2) one edge node provides service to three wireless 2.4 GHz client nodes that have high-variance latency and relatively low bandwidth. The result of the first case is very straightforward: the clients simply upload all the input data and run all the tasks on the edge node in edge offloading, or on the cloud node in cloud offloading, as shown in Fig. 10. This is mainly because the Ethernet cable stably provides the lowest latency and highest bandwidth, which makes offloading to the edge very rewarding. We did not evaluate 5 GHz wireless clients, since this interface is not supported on our client hardware, but we anticipate results similar to the wired case. We plot the result of a 2.4 GHz wireless client node offloading to an edge node or a remote cloud node in the second case in Fig. 11. Overall, the results show that by offloading tasks to an edge computing platform, the application we chose experienced a speedup of up to 4.0x in the wired client-edge configuration compared to local execution, and up to 1.7x compared to a similar client-cloud configuration. For clients with a 2.4 GHz wireless interface, the speedup is up to 1.3x in the client-edge configuration compared to local execution, and up to 1.2x compared to a similar client-cloud configuration.

Figure 12: The comparison result of three task prioritizing schemes. (Response time in seconds versus the number of task offloading requests, 5 to 35, for our scheme, SIOF and LCPUL.)

6.5 Edge-front Task Queue Prioritizing

To evaluate the performance of the task queue prioritizing, we collect statistical results from our profiler service and monitoring service on various workloads for simulation. We choose the simulation method because we can freely set up the numbers and types of client and edge nodes, overcoming the limitation of our current testbed to evaluate more complex deployments. We add two simple schemes as baselines: 1) shortest I/O first (SIOF), sorting all the tasks by the time cost of the network transmission; 2) longest CPU last (LCPUL), sorting all the tasks by the time cost of the processing on the edge node. In the simulation, based on the combinations of client device types, workloads and offloading decisions, we have in total seven types of jobs to run on the edge node. We increase the total number of jobs, distribute them evenly among the seven types, and report the makespan time in Fig. 12. The results show that LCPUL is the worst among the three schemes, and that our scheme outperforms the shortest I/O first scheme.

6.6 Inter-Edge Collaboration

We also evaluate the performance of the three task placement schemes (i.e., STTF, SQLF and SSLF) discussed in Section 5 through a controlled experiment on our testbed. For evaluation purposes, we configure the network in the edge computing system as follows. The first edge node, denoted as "edge node 1" afterwards, has 10 ms RTT and 40 Mbps bandwidth to the edge-front node. The second edge node, "edge node 2", has 20 ms RTT and 20 Mbps bandwidth to the edge-front node. The third edge node, "edge node 3", has 100 ms RTT and 2 Mbps bandwidth to the edge-front node. Thus we emulate a situation where the three edge nodes range in distance to the edge-front node from near to far.


Figure 13: Performance with no task placement scheme. (Throughput in tasks/sec over 12 minutes for the edge-front node and edge nodes 1-3.)

Figure 14: Performance of STTF. (Same layout as Fig. 13.)

Figure 15: Performance of SQLF. (Same layout as Fig. 13.)

Figure 16: Performance of SSLF. (Same layout as Fig. 13.)

We use the third dataset to synthesize a workload as follows. In the first 4 minutes, the edge-front node receives 5 task requests per second, edge node 1 receives 4 task requests per second, edge node 2 receives 3 task requests per second, and edge node 3 receives 2 task requests per second. No task arrives at any of the edge nodes after the first 4 minutes. For the SSLF task placement scheme, we implement a simple linear regression to predict the scheduling latency of the task being transmitted, since the workload we have injected is uniformly distributed.

Fig. 13 illustrates the throughput on each edge node when no task placement scheme is enabled on the edge-front node. The edge-front node has the heaviest workload, and it takes about 12.36 minutes to finish all the tasks. We consider this result as our baseline.

Fig. 14 shows the throughput result of the STTF scheme. In this case, the edge-front node only transmits tasks to edge node 1, because edge node 1 has the highest bandwidth and the shortest RTT to the edge-front node. Fig. 17 reveals that the edge-front node transmits 120 tasks to edge node 1 and no tasks to the other edge nodes. As edge node 1 has a heavier workload than edge node 2 and edge node 3, the STTF scheme brings limited improvement to the system performance: the edge-front node takes about 11.29 minutes to finish all the tasks. Fig. 15 illustrates the throughput result of the SQLF

scheme. This scheme works better than the STTF scheme, because the edge-front node transmits more tasks to less-saturated edge nodes, efficiently reducing the workload on the edge-front node. However, the edge-front node tends to transmit many tasks to edge node 3 at the beginning, which has the lowest bandwidth and the longest RTT to the edge-front node. As such, the task placement may incur more delay than expected. From Fig. 17, the edge-front node transmits 0 tasks to edge node 1, 132 tasks to edge node 2, and 152 tasks to edge node 3. The edge-front node takes about 9.6 minutes to finish all the tasks.

Fig. 16 demonstrates the throughput result of the SSLF scheme. This scheme considers both the transmission time of the task being placed and the waiting time in the queue on the target edge node, and therefore achieves the best performance of the three. As mentioned, edge node 1 has the lowest transmission overhead but the heaviest workload among the three edge nodes, while edge node 3 has the lightest workload but the highest transmission overhead; in contrast, edge node 2 has modest transmission overhead and modest workload. The SSLF scheme takes all these situations into consideration and places the largest number of tasks on edge node 2. As shown in Fig. 17, the edge-front node transmits 4 tasks to edge node 1, 152 tasks to edge node 2, and 148 tasks to edge node 3 when working with the SSLF scheme. The edge-front node


takes about 9.36 minutes to finish all the tasks, which is the best result among the three schemes. We infer that the SSLF scheme would further improve the task completion time under tougher network conditions and workloads.

Figure 17: Numbers of tasks placed by the edge-front node. (Tasks placed on edge nodes 1-3 under STTF, SQLF and SSLF.)

7 RELATED WORK

The emergence of edge computing has drawn attention due to its capability to reshape the landscape of IoT, mobile computing and cloud computing [6, 14, 32, 33, 36-38]. Satyanarayanan [29] has described the origin of edge computing, also known as fog computing [4], cloudlets [28], mobile edge computing [24], and so on. Here we review several relevant research fields related to video edge analytics, including distributed data processing and computation offloading in various computing paradigms.

7.1 Distributed Data Processing

Distributed data processing has a close relationship to edge analytics, in the sense that those data processing platforms [9, 39] and underlying techniques [16, 23, 27] can be easily deployed on a cluster of edge nodes. In this paper, we pay special attention to distributed image/video data processing systems. VideoStorm [40] made insightful observations on vision-related algorithms and proposed a resource-quality trade-off with multi-dimensional configurations (e.g., video resolution, frame rate, sampling rate, sliding window size, etc.). The resource-quality profiles are generated offline, and an online scheduler is built to allocate resources to queries to optimize the utility of quality and latency. Their work is complementary to ours, in that we do not consider the trade-off between quality and latency goals via adaptive configurations. Vigil [42] is a wireless video surveillance system that leverages edge computing nodes, with an emphasis on content-aware frame selection in a scenario where multiple web cameras are at the same location, to optimize the bandwidth utilization; this is orthogonal to the problems we have addressed here. Firework [41] is a computing paradigm for big data processing in collaborative edge environments, which is complementary to our work in terms of shared data view and programming interface.

While there are ongoing efforts investigating the adaptation, improvement and optimization of existing distributed data processing techniques on edge computing platforms, we focus more on task/application-level queue management and scheduling, and leave the underlying resource negotiation and process scheduling to the container cluster engine.

7.2 Computation Offloading

Computation offloading (a.k.a. cyber foraging [28]) has been proposed to improve resource utilization, response time and energy consumption in various computing environments [7, 13, 21, 31]. Work [17] has quantified the impact of edge computing on mobile applications and found that edge computing can significantly improve response time and energy consumption for mobile devices through offloading via both WiFi and LTE networks. Mocha [34] has investigated how a two-stage face recognition task from a mobile device can be accelerated by cloudlet and cloud. In their design, clients simply capture an image and send it to the cloudlet; the optimal task partition can be easily achieved, as the pipeline has only two stages. In LAVEA, our application is more complicated, with multiple stages, and we leverage client-edge offloading and other techniques to improve resource utilization and optimize response time.

8 DISCUSSIONS AND LIMITATIONS

In this section, we discuss alternative design options, and point out current limitations and future work that can improve the system.

Measurement-based Offloading. In this paper, we use measurement-based offloading (static offloading), i.e., the offloading decisions are based on the outcome of periodic measurements. We consider this one of the limitations of our implementation, as stated in [15], and several dynamic computation offloading schemes have been proposed [12]. We plan to improve the measurement-based offloading in future work.

Video Streaming. Our current data processing is image-based, which is one of the limitations of our implementation. The input is either in the format of an image or a video stream, which is read into frames and sent out. We believe that utilizing existing video streaming techniques between our system components for data sharing will further improve the system performance and open more potential opportunities for optimization.

Discovering Edge Nodes. There are different ways for the edge-front node to discover the available edge nodes nearby. For example, every edge node intending to serve as a collaborator may open a designated port so that the edge-front node can periodically scan the network and discover the available edge nodes. This is called the "pull-based" method. In contrast, there is also a "push-based" method, in which the edge-front node opens a designated port and every edge node intending to serve as a collaborator registers itself to the edge-front node. When the network is at a large scale, the pull-based method usually performs poorly, because the edge-front node may not be able to discover an available edge node in a short time. For this reason, the edge node discovery is implemented in a push-based manner, which guarantees good performance regardless of the network scale.
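The push-based method described above can be sketched as a small registry: the edge-front listens on a designated port and each collaborator pushes one registration message on startup. The port number, message format and field names below are illustrative assumptions, not LAVEA's wire protocol.

```python
import json
import socket
import threading

REGISTRY_PORT = 9090  # hypothetical designated port on the edge-front

class EdgeFrontRegistry:
    """Push-based discovery sketch: collaborators register themselves."""

    def __init__(self):
        self.collaborators = {}  # node_id -> (host, advertised capacity)
        self._lock = threading.Lock()

    def handle_registration(self, message, host):
        """Record one registration message pushed by a collaborator."""
        info = json.loads(message)
        with self._lock:
            self.collaborators[info["node_id"]] = (host, info.get("capacity", 1))

    def serve(self):
        """Listen on the designated port; each datagram is one registration."""
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.bind(("", REGISTRY_PORT))
            while True:
                data, addr = sock.recvfrom(4096)
                self.handle_registration(data.decode(), addr[0])
```

The design choice mirrors the text: registration cost is borne once per collaborator instead of the edge-front repeatedly scanning the whole network, so the discovery latency does not grow with network scale.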

9 CONCLUSION

In this paper, we have investigated providing video analytics services to latency-sensitive applications in edge computing environments.

LAVEA: Latency-aware Video Analytics on Edge Computing Platform. SEC '17, October 12–14, 2017, San Jose / Silicon Valley, CA, USA

As a result, we have built LAVEA, a low-latency video edge analytics system, which collaborates nearby client, edge and remote cloud nodes, and transforms video feeds into semantic information at places closer to the users in early stages. We have utilized an edge-front design and formulated an optimization problem for offloading task selection and prioritized the task queue to minimize the response time. Our results indicate that, by offloading tasks to the closest edge node, the client-edge configuration has a speedup ranging from 1.3x to 4x (1.2x to 1.7x) against running in local (client-cloud) configuration under various network conditions and workloads. In case of a saturating workload on the front edge node, we have proposed and compared various task placement schemes that are tailored for inter-edge collaboration. The proposed prediction-based shortest scheduling latency first task placement scheme considers both the transmission time of the task and the waiting time in the queue, and achieves better overall performance than the other schemes.

REFERENCES
[1] Amazon Web Services. 2017. AWS Lambda@Edge. http://docs.aws.amazon.com/lambda/latest/dg/lambda-edge.html. (2017).
[2] Christos-Nikolaos E. Anagnostopoulos, Ioannis E. Anagnostopoulos, Ioannis D. Psoroulas, Vassili Loumos, and Eleftherios Kayafas. 2008. License plate recognition from still images and video sequences: A survey. IEEE Transactions on Intelligent Transportation Systems 9, 3 (2008), 377–391.
[3] K. R. Baker. 1990. Scheduling groups of jobs in the two-machine flow shop. Mathematical and Computer Modelling 13, 3 (1990), 29–36.
[4] Flavio Bonomi, Rodolfo Milito, Jiang Zhu, and Sateesh Addepalli. 2012. Fog computing and its role in the internet of things. In Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing. ACM, 13–16.
[5] Peter Brucker, Bernd Jurisch, and Bernd Sievers. 1994. A branch and bound algorithm for the job-shop scheduling problem. Discrete Applied Mathematics 49, 1 (1994), 107–127.
[6] Yu Cao, Songqing Chen, Peng Hou, and Donald Brown. 2015. FAST: A fog computing assisted distributed analytics system to monitor fall for stroke mitigation. In Networking, Architecture and Storage (NAS), 2015 IEEE International Conference on. IEEE, 2–11.
[7] Eduardo Cuervo, Aruna Balasubramanian, Dae-ki Cho, Alec Wolman, Stefan Saroiu, Ranveer Chandra, and Paramvir Bahl. 2010. MAUI: Making Smartphones Last Longer with Code Offload. In Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services (MobiSys '10). ACM, New York, NY, USA, 49–62. DOI: https://doi.org/10.1145/1814433.1814441
[8] Eyal de Lara, Carolina S. Gomes, Steve Langridge, S. Hossein Mortazavi, and Meysam Roodi. 2016. Hierarchical Serverless Computing for the Mobile Edge. In Edge Computing (SEC), IEEE/ACM Symposium on. IEEE, 109–110.
[9] Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (2008), 107–113.
[10] Shan Du, Mahmoud Ibrahim, Mohamed Shehata, and Wael Badawy. 2013. Automatic license plate recognition (ALPR): A state-of-the-art review. IEEE Transactions on Circuits and Systems for Video Technology 23, 2 (2013), 311–325.
[11] Sadjad Fouladi, Riad S. Wahby, Brennan Shacklett, Karthikeyan Vasuki Balasubramaniam, William Zeng, Rahul Bhalerao, Anirudh Sivaraman, George Porter, and Keith Winstein. 2017. Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 363–376. https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/fouladi
[12] Wei Gao, Yong Li, Haoyang Lu, Ting Wang, and Cong Liu. 2014. On exploiting dynamic execution patterns for workload offloading in mobile cloud applications. In Network Protocols (ICNP), 2014 IEEE 22nd International Conference on. IEEE, 1–12.
[13] Mark S. Gordon, Davoud Anoushe Jamshidi, Scott A. Mahlke, Zhuoqing Morley Mao, and Xu Chen. 2012. COMET: Code Offload by Migrating Execution Transparently. In OSDI, Vol. 12. 93–106.
[14] Zijiang Hao, Ed Novak, Shanhe Yi, and Qun Li. 2017. Challenges and Software Architecture for Fog Computing. IEEE Internet Computing (2017).
[15] Mohammed A. Hassan, Kshitiz Bhattarai, Qi Wei, and Songqing Chen. 2014. POMAC: Properly Offloading Mobile Applications to Clouds. In 6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 14).
[16] Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy H. Katz, Scott Shenker, and Ion Stoica. 2011. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. In NSDI, Vol. 11. 22–22.
[17] Wenlu Hu, Ying Gao, Kiryong Ha, Junjue Wang, Brandon Amos, Zhuo Chen, Padmanabhan Pillai, and Mahadev Satyanarayanan. 2016. Quantifying the Impact of Edge Computing on Mobile Applications. In Proceedings of the 7th ACM SIGOPS Asia-Pacific Workshop on Systems. ACM, 5.
[18] IBM. 2017. Apache OpenWhisk. http://openwhisk.org. (April 2017).
[19] Selmer Martin Johnson. 1954. Optimal two- and three-stage production schedules with setup times included. Naval Research Logistics Quarterly 1, 1 (1954), 61–68.
[20] Eric Jonas, Shivaram Venkataraman, Ion Stoica, and Benjamin Recht. 2017. Occupy the Cloud: Distributed computing for the 99%. arXiv preprint arXiv:1702.04024 (2017).
[21] Ryan Newton, Sivan Toledo, Lewis Girod, Hari Balakrishnan, and Samuel Madden. 2009. Wishbone: Profile-based Partitioning for Sensornet Applications. In NSDI, Vol. 9. 395–408.
[22] OpenALPR. 2017. OpenALPR – Automatic License Plate Recognition. http://www.openalpr.com. (April 2017).
[23] Kay Ousterhout, Patrick Wendell, Matei Zaharia, and Ion Stoica. 2013. Sparrow: distributed, low latency scheduling. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 69–84.
[24] M. Patel, B. Naughton, C. Chan, N. Sprecher, S. Abeta, A. Neal, and others. 2014. Mobile-edge computing introductory technical white paper. White Paper, Mobile-edge Computing (MEC) Industry Initiative (2014).
[25] Brad Philip and Paul Updike. 2001. Caltech Vision Group 2001 testing database. http://www.vision.caltech.edu/html-files/archive.html. (2001).
[26] Kari Pulli, Anatoly Baksheev, Kirill Kornyakov, and Victor Eruhimov. 2012. Real-time computer vision with OpenCV. Commun. ACM 55, 6 (2012), 61–69.
[27] Jeff Rasley, Konstantinos Karanasos, Srikanth Kandula, Rodrigo Fonseca, Milan Vojnovic, and Sriram Rao. 2016. Efficient queue management for cluster scheduling. In Proceedings of the Eleventh European Conference on Computer Systems. ACM, 36.
[28] Mahadev Satyanarayanan. 2001. Pervasive computing: Vision and challenges. IEEE Personal Communications 8, 4 (2001), 10–17.
[29] Mahadev Satyanarayanan. 2017. The Emergence of Edge Computing. Computer 50, 1 (2017), 30–39.
[30] Mahadev Satyanarayanan, Pieter Simoens, Yu Xiao, Padmanabhan Pillai, Zhuo Chen, Kiryong Ha, Wenlu Hu, and Brandon Amos. 2015. Edge analytics in the internet of things. IEEE Pervasive Computing 14, 2 (2015), 24–31.
[31] Cong Shi, Karim Habak, Pranesh Pandurangan, Mostafa Ammar, Mayur Naik, and Ellen Zegura. 2014. COSMOS: computation offloading as a service for mobile devices. In Proceedings of the 15th ACM International Symposium on Mobile Ad Hoc Networking and Computing. ACM, 287–296.
[32] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu. 2016. Edge Computing: Vision and Challenges. IEEE Internet of Things Journal 3, 5 (Oct 2016), 637–646. DOI: https://doi.org/10.1109/JIOT.2016.2579198
[33] Weisong Shi and Schahram Dustdar. 2016. The Promise of Edge Computing. Computer 49, 5 (2016), 78–81.
[34] Tolga Soyata, Rajani Muraleedharan, Colin Funai, Minseok Kwon, and Wendi Heinzelman. 2012. Cloud-Vision: Real-time face recognition using a mobile-cloudlet-cloud acceleration architecture. In Computers and Communications (ISCC), 2012 IEEE Symposium on. IEEE, 000059–000066.
[35] Xiaoli Wang, Aakanksha Chowdhery, and Mung Chiang. 2016. SkyEyes: adaptive video streaming from UAVs. In Proceedings of the 3rd Workshop on Hot Topics in Wireless. ACM, 2–6.
[36] Shanhe Yi, Zijiang Hao, Zhengrui Qin, and Qun Li. 2015. Fog Computing: Platform and Applications. In Hot Topics in Web Systems and Technologies (HotWeb), 2015 Third IEEE Workshop on. IEEE, 73–78.
[37] Shanhe Yi, Cheng Li, and Qun Li. 2015. A Survey of Fog Computing: Concepts, Applications and Issues. In Proceedings of the 2015 Workshop on Mobile Big Data (Mobidata '15). ACM, 37–42.
[38] Shanhe Yi, Zhengrui Qin, and Qun Li. 2015. Security and privacy issues of fog computing: A survey. In Wireless Algorithms, Systems, and Applications. Springer, 685–695.
[39] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: cluster computing with working sets. HotCloud 10 (2010), 10–10.
[40] Haoyu Zhang, Ganesh Ananthanarayanan, Peter Bodik, Matthai Philipose, Paramvir Bahl, and Michael J. Freedman. 2017. Live Video Analytics at Scale with Approximation and Delay-Tolerance. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 377–392. https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/zhang
[41] Q. Zhang, X. Zhang, Q. Zhang, W. Shi, and H. Zhong. 2016. Firework: Big Data Sharing and Processing in Collaborative Edge Environment. In 2016 Fourth IEEE Workshop on Hot Topics in Web Systems and Technologies (HotWeb). 20–25. DOI: https://doi.org/10.1109/HotWeb.2016.12
[42] Tan Zhang, Aakanksha Chowdhery, Paramvir Victor Bahl, Kyle Jamieson, and Suman Banerjee. 2015. The design and implementation of a wireless video surveillance system. In Proceedings of the 21st Annual International Conference on Mobile Computing and Networking. ACM, 426–438.


SEC '17, October 12–14, 2017, San Jose / Silicon Valley, CA, USA. S. Yi et al.

object tracking, object detection, object recognition, face recognition and optical character recognition (OCR) are either computation intensive or bandwidth hungry. In addressing these problems, mobile cloud computing (MCC) was proposed to run heavy tasks on resource-rich cloud nodes to improve response time or energy cost. This technique utilizes both the mobile device and the cloud for computation. An appropriate partition of tasks that makes a trade-off between local and remote execution can speed up the computation and preserve mobile energy at the same time [7, 13, 15, 21, 31]. However, there are still concerns about the cloud: the limited bandwidth, the unpredictable latency and the abrupt service outage. Existing work has explored adding intermediate servers (cloudlets) between the mobile client and the cloud. Cloudlet is an early implementation of a cloud-like edge computing platform using virtual machine (VM) techniques. The edge computing platform in our work has a different design, on top of lightweight OS-level virtualization, which is modular and easy to deploy, manage and scale. Compared to VMs, OS-level virtualization provides resource isolation at a much lower cost. The adoption of the container technique leads to a serverless platform, where the end user can deploy and enable the edge computing platform on heterogeneous devices with minimal effort. The user programs (scripts or executable binaries) are encapsulated in containers, which provide resource isolation, self-contained packaging, anywhere deployment and easy-to-configure clustering. The end user only needs to register events of interest and provide corresponding handler functions to our system, which automatically handles the events behind the scenes.

In this paper, we consider a 3-tier mobile-edge-cloud deployment, and we put most of our effort into the mobile-edge side and inter-edge side design. To demonstrate the effectiveness of our edge computing platform, we have built the Latency-Aware Video Edge Analytics (LAVEA) system. We divide the response time minimization problem into three sub-problems. First, we select client tasks that benefit from being offloaded to the edge node in reducing time cost. We formulate this problem as a mathematical optimization problem to choose offloading tasks and allocate bandwidth among clients. Unlike existing work in mobile cloud computing, we cannot make the assumption that the edge node is as powerful as a cloud node, which can process all the tasks instantly. Therefore, we consider the increasing resource contention and response time when more and more tasks are running on the edge node, by adding latency constraints to the optimization problem. Second, upon receiving offloading task requests at each epoch, the edge node runs these tasks in an order that minimizes the makespan. However, the offloaded tasks cannot be started when the corresponding inputs are not ready. To address this problem, we employ a classic two-stage job shop model and adapt Johnson's rule with a topological ordering constraint in a heuristic to prioritize the tasks. Last, we enable inter-edge collaboration, leveraging nearby edge nodes to reduce the overall task completion time. We have investigated several task placement schemes that are tailored for inter-edge collaboration. The findings provided us insights that lead to an efficient prediction-based task placement scheme.
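The second sub-problem builds on the two-stage job shop model. Classic Johnson's rule, which the system adapts, can be sketched as follows; this sketch omits the paper's topological-ordering constraint, and the job names and times are illustrative only. In LAVEA's setting, stage 1 would be input transmission and stage 2 computation on the edge node.

```python
def johnsons_rule(jobs):
    """Order two-stage jobs to minimize makespan (classic Johnson's rule).

    jobs: list of (job_id, stage1_time, stage2_time). Jobs whose first-stage
    time is smaller go first in increasing stage-1 order; the rest go last
    in decreasing stage-2 order.
    """
    first = sorted((j for j in jobs if j[1] < j[2]), key=lambda j: j[1])
    second = sorted((j for j in jobs if j[1] >= j[2]), key=lambda j: -j[2])
    return [j[0] for j in first + second]

def makespan(order, times):
    """Completion time of the last job given an order and {id: (t1, t2)}."""
    end1 = end2 = 0.0
    for jid in order:
        t1, t2 = times[jid]
        end1 += t1                   # stage 1 (e.g., transmission) is sequential
        end2 = max(end2, end1) + t2  # stage 2 cannot start before its input arrives
    return end2
```

For example, jobs A(3, 2), B(1, 4), C(2, 3) are ordered B, C, A, which finishes earlier than, say, A, C, B, because short uploads are front-loaded so the compute stage is never starved.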

In summary, we make the following contributions:

• We have designed an edge computing platform based on a serverless architecture, which is able to provide flexible computation offloading to nearby clients to speed up computation-intensive and delay-sensitive applications. Our implementation is lightweight-virtualized, event-based, modular, and easy to deploy and manage on either edge or cloud nodes.

• We have formulated an optimization problem for offloading task selection and prioritized offloading requests to minimize the response time. The task selection problem co-optimizes the offloading decision and bandwidth allocation, and is constrained by the latency requirement, which can be tuned to adapt to the workload on the edge node for offloading. The task prioritizing is modeled as a two-stage job shop problem, and a heuristic is proposed with the topological ordering constraint.

• We have evaluated several task placement schemes for inter-edge collaboration and proposed a prediction-based method which efficiently estimates the response time.

2 BACKGROUND AND MOTIVATION

In this section, we briefly introduce the background of edge computing and relevant techniques, present our observations from preliminary measurements, and discuss the scenarios that motivate us.

[Figure 1: An overview of the edge computing environment. Clients attach to WiFi base stations whose co-located edge computing servers are interconnected over a LAN, while a remote cloud is reachable over the WAN.]

2.1 Edge Computing Network

In this paper, we consider an edge computing network as shown in Figure 1, in which we focus on two types of nodes: the client node (in this paper we call it client for short) and the edge server node (in this paper we call it edge, edge node or edge server for short). We assume that clients are one hop away from the edge server via wired or wireless links. When a client connects to the edge node, we implicitly indicate that the client will first connect to the


corresponding access point (AP) using a cable or a wireless link, and then utilize the services provided by the co-located edge node. In a sparse edge node deployment, a client will only connect to one of the available edge nodes nearby at a certain location, while in a dense deployment, a client may have multiple choices in selecting among multiple edge servers for services. Implicitly, we assume that there is a remote cloud node which can be reached via the wide area network (WAN).

To understand the factors that impact the feasibility of realizing practical edge computing systems, we have performed several preliminary measurements on existing networks and show the results in Fig. 2 and Fig. 3. In these experiments, we measured the latency and bandwidth of combinations of client nodes with different network interfaces connecting to edge (or cloud) nodes. Based on the measurements of bandwidth, all clients benefit from utilizing a wire-connected or advanced-wireless (802.11ac, 5 GHz) edge computing node. In terms of latency, wire-connected edge nodes are the best, while the 5 GHz wireless edge computing nodes have larger means and variances in latency compared to the cloud node in the closest region, due to the intrinsic nature of wireless channels. Therefore, in this paper we pragmatically assume that edge nodes are connected to APs via cables to deliver services with better latency and bandwidth than the cloud. In such a setup, the cloud node can be considered as a backup computing node, which will be utilized only when the edge node is saturated and experiences a long response time.
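A rough latency probe in the spirit of these measurements could look like the following. This is an illustrative sketch that times TCP connection setup to estimate the RTT mean and variance; it is not the authors' measurement methodology.

```python
import socket
import time

def measure_rtt(host, port, samples=5, timeout=1.0):
    """Estimate RTT (ms) to an edge/cloud node by timing TCP connect.
    Returns mean and variance over successful probes, or None if all fail."""
    rtts = []
    for _ in range(samples):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                rtts.append((time.monotonic() - start) * 1000.0)
        except OSError:
            continue  # drop failed probes instead of aborting
    if not rtts:
        return None
    mean = sum(rtts) / len(rtts)
    var = sum((r - mean) ** 2 for r in rtts) / len(rtts)
    return {"mean_ms": mean, "var_ms2": var}
```

Reporting the variance alongside the mean matters here: the text's observation is precisely that 5 GHz wireless edges can beat the cloud on mean RTT yet lose on variance.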

[Figure 2: Round trip time (RTT, in ms) between client and edge/cloud, for wired, WiFi 2.4 GHz and WiFi 5 GHz clients against a wired edge, a WiFi 5 GHz edge, a WiFi 2.4 GHz edge, EC2 east and EC2 west.]

2.2 Serverless Architecture

Serverless architecture, or Function as a Service (FaaS), such as AWS Lambda, Google Cloud Functions and Azure Functions, is an agile solution for developers to build cloud computing services without the heavy lifting of managing cloud instances. Taking AWS Lambda as an example, AWS Lambda is an event-based micro-service framework in which a user-supplied Lambda function, as the application logic, will be executed in response to the corresponding event. The AWS cloud takes care of the provisioning and resource management for running Lambda functions. The first time a Lambda function is created, a container will be built and launched based on the configurations provided. Each container will also be provided a small

[Figure 3: Bandwidth (in Mbps) between client and edge/cloud, for wired, WiFi 2.4 GHz and WiFi 5 GHz clients against a wired edge, a WiFi 5 GHz edge, a WiFi 2.4 GHz edge, EC2 east and EC2 west.]

disk space as a transient cache across multiple invocations. AWS has its own way of running Lambda functions, either reusing an existing container or creating a new one. Recently, AWS Lambda@Edge [1] has allowed serverless functions to run at AWS edge locations in response to CDN events to apply moderate computation. We strongly advocate the adoption of serverless architecture at the edge computing layer, as serverless architecture naturally solves two important problems for edge computing: 1) the serverless programming model greatly reduces the burden on users or developers in developing, deploying and managing edge applications, as there is no need to understand the complex underlying procedures to run the applications, or to do the heavy lifting of distributed system management; 2) the functions are flexible to run on either edge or cloud, which lowers the barrier of edge-cloud inter-operability and federation. Recent works have shown the potential of such an architecture in low-latency video processing tasks [11] and distributed computing tasks [20], and there have been research efforts to incorporate serverless architecture in edge computing [8].
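The register-and-trigger pattern that FaaS platforms expose can be sketched in a few lines. The decorator name, event type and handler below are illustrative assumptions, not an actual Lambda API; they show only the shape of the model in which users supply handlers and the platform dispatches events.

```python
# Global handler table: event type -> list of registered functions.
_handlers = {}

def on_event(event_type):
    """Register a function to run whenever events of event_type arrive."""
    def register(fn):
        _handlers.setdefault(event_type, []).append(fn)
        return fn
    return register

def emit(event_type, payload):
    """Platform side: invoke every handler registered for this event type."""
    return [fn(payload) for fn in _handlers.get(event_type, [])]

@on_event("frame.uploaded")  # hypothetical event name
def detect_plate(payload):
    # User-supplied application logic; the platform handles everything else.
    return f"detecting plates in {payload['frame_id']}"
```

Because a handler carries no placement assumption, the same registration can be satisfied by a container on the edge or in the cloud, which is the inter-operability point made above.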

2.3 Video Edge Analytics for Public Safety

Video surveillance is of great importance for public safety. Besides the "Amber Alert" example, there are many other applications in this field. For example, security cameras deployed at public places (e.g., the airport) can quickly spot unattended bags [42]; police with body-worn cameras can identify suspects and suspicious vehicles while approaching; and so on. Because those scenarios are urgent and critical, the applications need to provide the quickest responses with best effort. However, most tasks in video analytics are undoubtedly computationally intensive [26]. When running on resource-constrained mobile clients or IoT devices directly, the latency in computation, battery drain (if battery-powered) or even heat dissipation will eventually ruin the user experience, failing to achieve the performance goals of the applications. If running on cloud nodes, transferring large volumes of multimedia data will incur unacceptable transmission latency and additional bandwidth cost. Being proposed as a dedicated solution, the deployment of an edge computing platform enables the quickest responses to these video analytics tasks, which require both low latency and high bandwidth.

In this paper, we mainly focus on building a video edge analytics platform, and we demonstrate our platform using the application


of Automated License Plate Recognition (ALPR). Even though we integrate a specific application, our edge platform is a general design and can be extended to other applications with little modification. An ALPR system usually has four stages: 1) image acquisition; 2) license plate extraction; 3) license plate analysis; and 4) character recognition [2, 10]. Each of the stages involves various computer vision, pattern recognition and machine learning algorithms. Migrating the execution of some algorithms to a powerful edge/cloud node can significantly reduce the response time [34]. However, offloaded tasks require intermediate data, application state variables and corresponding configurations to be uploaded. Some of the algorithms produce large amounts of intermediate data, which will add delay to the whole processing time if offloaded to the remote cloud. We believe that a carefully designed edge computing platform will assist an ALPR system to expand to more resource-constrained devices at more locations, and to provide better response time at the same time.
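The four-stage ALPR flow above can be sketched as a chain of per-stage functions, each a candidate unit for offloading. The stage bodies below are placeholders, since the real stages wrap computer vision algorithms (e.g., from OpenALPR); only the pipeline structure is the point.

```python
def acquire_image(source):
    return {"frame": source}                              # 1) image acquisition

def extract_plate(state):
    state["plate_region"] = f"region-of:{state['frame']}" # 2) plate extraction
    return state

def analyze_plate(state):
    state["characters"] = ["A", "B", "C", "1", "2", "3"]  # 3) plate analysis
    return state

def recognize_characters(state):
    state["plate_text"] = "".join(state["characters"])    # 4) character recognition
    return state

def alpr_pipeline(source, stages=(acquire_image, extract_plate,
                                  analyze_plate, recognize_characters)):
    """Run the stages in order; each stage's output feeds the next one's input."""
    state = source
    for stage in stages:
        state = stage(state)
    return state
```

Since each stage's output is the next stage's input, cutting the chain at any boundary yields a local/remote split, and the size of the state dictionary at the cut is exactly the intermediate data that must be uploaded.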

3 LAVEA SYSTEM DESIGN

In this section, we present our system design. First, we discuss our design goals. Then we give an overview of our system design and introduce several important edge computing services.

3.1 Design Goals

• Latency: The ability to provide low-latency services is recognized as one of the essential requirements of edge computing system design.
• Flexibility: The edge computing system should be able to flexibly utilize the hierarchical resources from client nodes, nearby edge nodes and remote cloud nodes.
• Edge-first: By edge-first, we mean that the edge computing platform is the first choice of our computation offloading target.

3.2 System Overview

LAVEA is intrinsically an edge computing platform which supports low-latency video processing. The main components are the edge computing node and the edge client. Whenever a client is running tasks and the nearby edge computing node is available, a task can be decided to run either locally or remotely. We present the architecture of our edge computing platform in Figure 4.

3.2.1 Edge Computing Node. In LAVEA, the edge computing node provides edge computing services to the mobile devices nearby. The edge computing node attached to the same access point or base station as the clients is called the edge-front. By deploying edge computing nodes with access points or base stations, we ensure that the edge computing service can be as ubiquitous as Internet access. Multiple edge computing nodes can collaborate, and the edge-front will always serve as the master and be in charge of the coordination with other edge nodes and cloud nodes. As shown in Figure 4, we use the lightweight virtualization technique to provide resource allocation and isolation to different clients. Any client can submit tasks to the platform via client APIs. The platform will be responsible for shaping the workload, managing queue priorities and scheduling tasks. Those functions are implemented via internal APIs provided by multiple micro-services, such as the queueing service, scheduling service, data store service, etc. We will introduce several important services later in this section.

3.2.2 Edge Client. Since most edge clients are either resource-constrained devices or need to accommodate requests from a large number of clients, an edge client usually runs lightweight data processing tasks locally and offloads heavy tasks to the edge computing node nearby. In LAVEA, the edge client has a thin-client design to make sure all the clients can run it without introducing too much overhead. For low-end devices, there is only one worker to make progress on the assigned job. The most important parts of the client node design are the profiler and the offloading controller, acting as participants in the corresponding profiler service and offloading service. With the profiler and offloading controller, a client can provide offloading information to the edge-front node and fulfill the offloading decisions received.

3.3 Edge Computing Services

3.3.1 Profiler Service. Similar to [7, 21, 31], our system uses a profiler to collect task performance information on various devices, since it is difficult to derive an analytic model to accurately capture the behavior of the whole system. However, we have found that the execution of video processing tasks is relatively stable (when the input and algorithmic configurations are given), and a profiler can be used to collect the relevant metrics. Therefore, we add a profiling phase to the deployment of every new type of client device and edge device. The profiler will execute instrumented tasks multiple times with different inputs and configurations on the device, and measure metrics including but not limited to execution time, input/output data size, etc. The time-stamped logs will be gathered to build the task execution graph for specific tasks, inputs, configurations and devices. The profiler service will collect this information, on which LAVEA relies for offloading decisions.
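The profiling phase described above can be sketched as repeatedly timing an instrumented task over sample inputs. The function and record fields below are our own simplification of the profiler service, not its actual interface.

```python
import time
from statistics import mean

def profile_task(task_fn, inputs, repeats=3):
    """Run a task over sample inputs and collect per-input metrics
    (execution time, input/output data sizes), one record per input."""
    records = []
    for inp in inputs:
        durations, out_size = [], 0
        for _ in range(repeats):
            start = time.perf_counter()
            out = task_fn(inp)
            durations.append(time.perf_counter() - start)
            out_size = len(repr(out))  # crude proxy for output data size
        records.append({
            "input_size": len(repr(inp)),
            "output_size": out_size,
            "mean_time_s": mean(durations),
        })
    return records
```

Repeating each input several times and keeping the mean reflects the observation above that video processing tasks are stable for a fixed input and configuration, so a small number of profiled runs is predictive of run-time behavior.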

3.3.2 Monitoring Service. Unlike the profiler service, which gathers pre-run-time execution information on pre-defined inputs and configurations, the monitoring service is used to continuously monitor and collect run-time information, such as the network and system load, not only from the clients but also from nearby edge nodes. Monitoring the network between the client and the edge-front is necessary, since most edge clients are connected to the edge-front server via wireless links, and the condition of a wireless link changes from time to time. Therefore, we need to constantly monitor the wireless link to estimate the bandwidth and the latency. Monitoring the system load on the edge client enables flexible workload shaping and task offloading from the client to the edge. This information is also broadcast among nearby edge nodes. When an edge-front node is saturated or unstable, some tasks will be assigned to nearby edge nodes according to the system load, the network bandwidth and the network delay between edge nodes, as long as there is still a benefit over assigning tasks to the cloud node.

3.3.3 Offloading Service. The offloading controller will track tasks running locally at the client and exchange information with the offloading service running on the edge-front server. The variables gathered by the profiler and monitoring services will be used as inputs to the offloading decision problem, which is formulated as an optimization problem to minimize the response time. Every time


[Figure 4: The architecture of the edge computing platform. An edge computing node runs containers on OS-level virtualization (Docker engine) hosting the data store service (HDFS, SQL, KV store), offloading service, queueing service, scheduling service and monitoring service, along with a workload optimizer, queue prioritizer and task scheduler with worker pools, behind the edge-front gateway. Clients (security cameras, dash cameras, smartphones and tablets, laptops) connect through an access point and run the edge computing platform SDK with the application, a profiler, an offloading controller and a local worker stack with its own task scheduler.]

when a new client registers itself to the offloading service, and after the edge-front node collects enough prerequisite information and statistics, the optimization problem is solved again and the updated offloading decisions will be sent to all the clients. Periodically, the offloading service also re-solves the optimization problem and updates the offloading decisions with its clients.

4 EDGE-FRONT OFFLOADING

In this section, we describe how we select the tasks of a job to run remotely on the edge server in order to minimize the response time.

We consider selecting tasks to run on the edge as a computation offloading problem. Traditional offloading problems concern offloading schemes between clients and remote, powerful cloud servers. In the literature [7, 21, 31], those system models usually assume the task will be finished instantly once it is offloaded to the server. However, we argue that this assumption will not hold in the edge computing environment, as we need to consider the various delays at the server side, especially when many clients are sending offloading requests. We call it edge-front computation offloading from the perspective of the client:

• Tasks will only be offloaded from a client to the nearest edge node, which we call the edge-front.

• The underlying scheduling and processing are agnostic to clients.

• When a mobile node is disconnected from any edge node or even the cloud node, it will resort to local execution of all the tasks.

We assume that the edge node is wire-connected to the access point, which indicates that the out-going traffic can go through the edge node at no additional cost. The only difference between offloading a task to the edge node and to the cloud node is that the task running on the edge node may experience resource contention and scheduling delay, while we assume a task offloaded to the cloud node will get enough resources and be scheduled to run immediately. In the light workload case, if there is any response time reduction when a task is offloaded to the cloud, then we know that there is definitely a benefit when this task is offloaded to the edge. The reasons are: 1) an edge server is as responsive as the server in the cloud data center; 2) running a task on the edge server experiences a shorter data transmission delay, as the client-edge link has a much larger bandwidth than the edge-cloud link, which is usually limited and imbalanced by the Internet service providers (ISPs). Therefore, in this section we focus on task offloading only between the client and the edge server, and we will discuss integrating nearby edge nodes for the heavy workload scenario in the next section.

4.1 Task Offloading System Model and Problem Formulation

In this paper, we call a running instance of the application a job, which is usually a set of tasks. The job is the unit of work that a user submits to our system, while the task is the unit of work for our system to make scheduling and optimization decisions. The tasks generated from each application will be queued and processed either locally or remotely. By remotely, we mean running the task on an edge node. For simplicity, we consider that all clients are running


instances of applications processing the same kind of jobs, which is typically the case in our edge application scenario. However, our system can easily extend to heterogeneous applications running on each client device.

We choose to work at the granularity of tasks, since those tasks are modularized and can be flexibly combined to either achieve speedy processing or form a workflow with high accuracy. In our ALPR application, each task is usually a common computer vision algorithm or library. For example, we have analyzed an open source ALPR project called OpenALPR [22] and illustrate its task graph in Fig. 5.

[Figure: task-graph diagram omitted. Input video frames or image stills flow through Motion Detection, Plate Detection, Plate Character Analysis and Character Recognition to Result Generation, with an early exit when no plate is detected.]

Figure 5: The task graph of OpenALPR.

Then we consider there are N clients and only one edge server connected, as shown in Fig. 1. This edge server could be a single server or a cluster of servers. Each client i, i ∈ [1, N], will process the upcoming job upon request, e.g., recognizing the license plates in video streams. Usually those jobs will generate heavy computation tasks and could benefit from offloading some of them to the edge server. Without loss of generality, we use a graph of tasks to represent the complex task relations inside a job, which is essentially similar to the method call graph in [7], but at a coarser granularity. For a certain kind of job, we start with its directed acyclic graph (DAG) G = (V, E), which gives the task execution sequence. The weight of each vertex v ∈ V is the computation or memory cost of a task (c_v), while the weight of each edge e = (u, v), u, v ∈ V, e ∈ E, represents the data size of the intermediate results (d_{uv}). Thus our offloading problem can be taken as a graph partition problem, in which we need to assign a directed graph of tasks to different computing nodes (local, edge or cloud) with the purpose of minimizing

a certain cost. In this paper, we primarily try to minimize the job finish time.
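To make the model concrete, the DAG and cost notation above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the task names loosely mirror Fig. 5, and every numeric cost, speed, bandwidth and latency below is a made-up assumption.

```python
# Minimal sketch of the Section 4.1 job model: a task DAG with per-vertex
# compute costs c_v and per-edge intermediate data sizes d_uv, plus a helper
# that evaluates one client's response time under an offloading indicator I
# (I[v] = 1 means task v runs locally, 0 means it runs on the edge).
# All numbers are illustrative assumptions, not profiled values.

TASKS = {  # c_v: compute cost of each task (arbitrary units)
    "motion_detect": 2.0, "plate_detect": 8.0,
    "char_analysis": 3.0, "char_recog": 5.0,
}
EDGES = {  # d_uv: intermediate data size on each DAG edge (MB)
    ("motion_detect", "plate_detect"): 1.5,
    ("plate_detect", "char_analysis"): 0.2,
    ("char_analysis", "char_recog"): 0.1,
}

def response_time(I, p_client, p_edge, r, beta):
    """Sum of the local, network and remote terms (Eqs. (1), (3), (4))."""
    t_local = sum(c / p_client for v, c in TASKS.items() if I[v] == 1)
    t_remote = sum(c / p_edge for v, c in TASKS.items() if I[v] == 0)
    # Only cut edges (local u -> remote v) transfer data; constraint (7)
    # rules out remote -> local transitions, so I[u] - I[v] is 0 or 1.
    t_net = sum((I[u] - I[v]) * d / r for (u, v), d in EDGES.items()) + beta
    return t_local + t_net + t_remote

# Offload everything after motion detection to a 10x faster edge node.
I = {"motion_detect": 1, "plate_detect": 0, "char_analysis": 0, "char_recog": 0}
print(response_time(I, p_client=1.0, p_edge=10.0, r=5.0, beta=0.05))  # about 3.95
```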

The remote response time includes the communication delay, the network transmission delay of sending data to the edge server, and the execution time on that server. We use an indicator I_{v,i} ∈ {0, 1} for all v ∈ V and for all i ∈ [1, N]. If I_{v,i} = 1, then the task v at client i will run locally; otherwise, it will run on the remote edge server. For those tasks running locally, the total execution time for client i is the sum

\[ T_i^{local} = \sum_{v \in V} I_{v,i} \, \frac{c_v}{p_i} \tag{1} \]

where p_i is the processor speed of client i. Similarly, we use

\[ \bar{T}_i^{local} = \sum_{v \in V} (1 - I_{v,i}) \, \frac{c_v}{p_i} \tag{2} \]

to represent the execution time of running the offloaded tasks locally. For the network, when there is an offloading decision, the client needs to upload the intermediate data (outputs of the previous task, application status variables, configurations, etc.) to the edge server in order to continue the computation. The network delay is modeled as

\[ T_i^{net} = \sum_{(u,v) \in E} (I_{u,i} - I_{v,i}) \, \frac{d_{uv}}{r_i} + \beta_i \tag{3} \]

where r_i is the bandwidth assigned for this client, and β_i is the communication latency, which can be estimated using the round trip time between client i and the edge server.

For each client, the remote execution time is

\[ T_i^{remote} = \sum_{v \in V} (1 - I_{v,i}) \, \frac{c_v}{p_0} \tag{4} \]

where p_0 is the processor speed of the edge server. Then our offloading task selection problem can be formulated as

\[ \min_{\{I_i\},\{r_i\}} \; \sum_{i=1}^{N} \left( T_i^{local} + T_i^{net} + T_i^{remote} \right) \tag{5} \]

The offloading task selection is represented by the indicator matrix I. This optimization problem is subject to the following constraints:

• The total bandwidth:
\[ \text{s.t.} \quad \sum_{i=1}^{N} r_i \le R \tag{6} \]

• Like existing work, we restrict the data flow to avoid the ping-pong effect, in which intermediate data is transmitted back and forth between the client and the edge server:
\[ \text{s.t.} \quad I_{v,i} \le I_{u,i}, \quad \forall e(u,v) \in E, \; \forall i \in [1, N] \tag{7} \]

• Unlike existing offloading frameworks for mobile cloud computing, we take the resource contention or scheduling delay at the edge side into consideration by adding an end-to-end delay constraint:
\[ \text{s.t.} \quad \bar{T}_i^{local} - (T_i^{net} + T_i^{remote}) > \tau, \quad \forall i \in [1, N] \tag{8} \]
where τ can be tuned to avoid selecting borderline tasks that, if offloaded, will get no gain due to the resource contention or scheduling delay at the edge.


4.2 Optimization Solver

The proposed optimization is a mixed integer non-linear programming (MINLP) problem, where the integer variables stand for the offloading decisions and the continuous variables stand for the bandwidth allocation. To solve this optimization problem, we start from relaxing the integer constraints and solve the non-linear programming version of the problem using the Sequential Quadratic Programming (SQP) method, a constrained nonlinear optimization method. This solution is optimal without considering the integer constraints. Starting from this optimal solution, we optionally employ the branch and bound (B&B) method to search for the optimal integer solution, or simply do an exhaustive search when the number of clients and the number of tasks of each job are small.
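For small instances, the exhaustive search mentioned above is easy to write down. The toy sketch below (an illustration under assumed costs, not our SQP plus branch-and-bound implementation) enumerates all 0/1 indicator vectors for a three-task chain, discards those violating the ping-pong constraint (7), and keeps the assignment with the smallest per-client objective; the bandwidth share r is taken as fixed rather than optimized.

```python
from itertools import product

# Toy exhaustive-search solver for the offloading selection of Section 4.1.
# Task names, costs and data sizes are illustrative assumptions.

tasks = ["detect", "analyze", "recognize"]             # chain-shaped DAG
c = {"detect": 4.0, "analyze": 2.0, "recognize": 6.0}  # compute costs c_v
d = {("detect", "analyze"): 1.0, ("analyze", "recognize"): 0.5}  # data d_uv

def solve(p_client, p_edge, r, beta):
    best = None
    for bits in product([0, 1], repeat=len(tasks)):
        I = dict(zip(tasks, bits))
        # Constraint (7): once a task runs remotely (I=0), all successors do too.
        if any(I[v] > I[u] for (u, v) in d):
            continue
        t = (sum(cv / p_client for v, cv in c.items() if I[v]) +
             sum(cv / p_edge for v, cv in c.items() if not I[v]) +
             sum((I[u] - I[v]) * d[(u, v)] / r for (u, v) in d) + beta)
        if best is None or t < best[0]:
            best = (t, I)
    return best

t, I = solve(p_client=1.0, p_edge=10.0, r=2.0, beta=0.1)
print(t, I)  # with a 10x faster edge, running everything remotely wins here
```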

4.3 Prioritizing Edge Task Queue

The offloading strategy produced by the task selection optimizes the "flow" time of each type of job. At each time epoch during the run time, the edge-front node receives a large number of offloaded tasks from the clients. Originally, we follow the first-come-first-serve rule to accommodate all the client requests. For each request at the head of the task queue, the edge-front server first checks if the input or intermediate data (e.g., images or videos) is available at the edge; otherwise, the server waits. This scheme is easy to implement, but substantial computation is wasted if the network I/O is busy with a large file and there is no task that is ready for processing. Therefore, we improve the task scheduling with a task queue prioritizer, which maintains a task sequence that minimizes the makespan for the task scheduling of all offloading task requests received at a certain time epoch, since the edge node can execute a task only when the input data has been fully received or the depended-on tasks have finished execution. We consider that an offloaded task has to go through two stages: the first stage is the retrieval of input or intermediate data and state variables; the second stage is the execution of the task.

We model our scheduling problem using the flow job shop model and apply Johnson's rule [19]. This scheme is optimal, and the makespan is minimized, when the number of stages is two. Nevertheless, this model only fits the case where all submitted job requests are independent and have no priorities. When considering task dependencies, a successor can only start after its predecessor finishes. By enforcing the topological ordering constraints, the problem can be solved optimally using the B&B method [5]. However, this solution hardly scales with the number of tasks. In this case, we adapt the method in [3], grouping tasks with dependencies and executing all tasks in a group sequentially. The basic idea is applying Johnson's rule at two levels. The first level is to decide the sequence of tasks within each group; the difference in our problem is that we need to decide the best sequence among all valid topological orderings. Then the bottom level is a job shop scheduling problem in terms of grouped jobs (i.e., a group of tasks with dependencies in topological ordering), in which we can utilize Johnson's rule directly.
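For the independent two-stage case above, Johnson's rule has a compact form: tasks whose first-stage (data retrieval) time does not exceed their second-stage (execution) time go to the front in increasing retrieval time, and the rest go to the back in decreasing execution time. Below is a small sketch with made-up task times, together with a two-stage makespan simulator; it is illustrative only and ignores the dependency-grouping extension.

```python
# Johnson's rule for a two-machine flow shop: stage 1 is input-data retrieval
# (network I/O), stage 2 is task execution (CPU). Task times are made up.

def johnson_order(tasks):
    """tasks: list of (name, io_time, cpu_time). Returns the scheduled order."""
    front = sorted((t for t in tasks if t[1] <= t[2]), key=lambda t: t[1])
    back = sorted((t for t in tasks if t[1] > t[2]),
                  key=lambda t: t[2], reverse=True)
    return front + back

def makespan(order):
    """Simulate both stages: I/O is sequential; a task's execution starts when
    its I/O is done and the CPU has finished the previous task."""
    io_done = cpu_done = 0.0
    for _, io, cpu in order:
        io_done += io
        cpu_done = max(cpu_done, io_done) + cpu
    return cpu_done

tasks = [("a", 3, 6), ("b", 7, 2), ("c", 1, 4), ("d", 5, 5)]
order = johnson_order(tasks)
print([t[0] for t in order])             # ['c', 'a', 'd', 'b']
print(makespan(order), makespan(tasks))  # 18.0 vs 21.0 for arrival order
```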

4.4 Workload Optimizer

If the workload is overwhelming and the edge-front server is saturated, the task queue will be unstable and the response time will

accumulate indefinitely. There are several measures we can take to address this problem. First, we can adjust the image/video resolution in client-side configurations, which makes a good trade-off between speed and accuracy. Second, by constraining the task offloading problem, we can restrain more computation tasks at the client side. Third, if there are nearby edge nodes which are favored in terms of latency, bandwidth and computation, we can further offload tasks to those nearby edge nodes; we have investigated this case with performance improvement considerations in Section 5. Last, we can always redirect tasks to the remote cloud, just like offloading in MCC.

5 INTER-EDGE COLLABORATION

In this section, we improve our edge-first design by taking into consideration the case when the incoming workload saturates our edge-front node. We will first discuss our motivation for providing such an option and list the corresponding challenges. Then we will introduce several collaboration schemes we have proposed and investigated.

5.1 Motivation and Challenges

The resources of an edge computing node are much richer than those of client nodes, but are relatively limited compared to cloud nodes. While serving an increasing number of nearby client nodes, the edge-front node will eventually be overloaded and become non-responsive to new requests. As a baseline, we can optionally choose to offload further requests to the remote cloud. We assume that the remote cloud has unlimited resources and is capable of handling all the requests. However, when running tasks remotely in the cloud, the application needs to bear with unpredictable latency and limited bandwidth, which is not the best choice, especially when there are other nearby edge nodes that can accommodate those tasks. We assume that under the condition when all available nearby edge nodes are exhausted, the mobile-edge-cloud computing paradigm will simply fall back to the mobile cloud computing paradigm. The fallback design is not in the scope of this paper. In this paper, we mainly investigate the inter-edge collaboration with the prime purpose of alleviating the burden on the edge-front node.

When the edge-front node is saturated with requests, it can collaborate with nearby edge nodes by placing some tasks on these not-so-busy edge nodes, such that all the tasks can get scheduled in a reasonable time. This is slightly different from balancing the workload among the edge nodes and the edge-front node, in that the goal of inter-edge collaboration is to better serve the client nodes with submitted requests, rather than simply making the workload balanced. For example, an edge-front node that is not overloaded does not need to place any tasks on the nearby edge nodes, even when they are idle.

The challenges of inter-edge collaboration are two-fold: 1) we need to design a proper inter-edge task placement scheme that fulfills our goal of reducing the workload on the edge-front node while offloading a proper amount of workload to the qualified edge nodes; 2) the task placement scheme should be lightweight, scalable and easy to implement.


5.2 Inter-Edge Task Placement Schemes

We have investigated three task placement schemes for inter-edge collaboration:

• Shortest Transmission Time First (STTF)
• Shortest Queue Length First (SQLF)
• Shortest Scheduling Latency First (SSLF)

The STTF task placement scheme tends to place tasks on the edge node that has the shortest estimated latency for the edge-front node to transfer the tasks. The edge-front node maintains a table to record the latency of transmitting data to each available edge node. Periodical re-calibration is necessary, because the network condition between the edge-front node and other edge nodes may vary from time to time.

The SQLF task placement scheme, on the other hand, tends to transfer tasks from the edge-front node to the edge node which has the least number of tasks queued at the time of query. When the edge-front node is saturated with requests, it will first query all the available edge nodes about their current task queue lengths, and then transfer tasks to the edge node that has the shortest value reported.

The SSLF task placement scheme tends to transmit tasks from the edge-front node to the edge node that is predicted to have the shortest response time. The response time is the time interval between the time when the edge-front node submits a task to an available edge node and the time when it receives the result of the task from that edge node. Unlike the SQLF task placement scheme, in which the edge-front node keeps querying the edge nodes about their queue lengths (which may cause performance issues when the number of nodes scales up and result in a large volume of queries), we have designed a novel method for the edge-front node to measure the scheduling latency efficiently. During the measurement phase, before the edge-front node chooses a task placement target, it sends a request message to each available edge node, which appends a special task to the tail of its task queue. When the special task is executed, the edge node simply sends a response message back. The edge-front node receives the response message and records the response time. In this way, the edge-front node maintains a series of response times for each available edge node. When the edge-front node is saturated, it will start to reassign tasks to the edge node having the shortest response time. Unlike the STTF and SQLF task placement schemes, which choose the target edge node based on the current or most recent measurements, the SSLF scheme predicts the current response time of each edge node by applying regression analysis to the response-time series recorded so far. The reason is that the edge nodes are also receiving task requests from client nodes, and their local workload may vary from time to time, so the most recent response time cannot serve as a good predictor of the current response time of the edge nodes. As the local workload on each edge node in the real world usually follows a certain pattern or trend, applying regression analysis to the recorded response times is a good way to estimate the current response time. To this end, the edge-front node records measurements of response times from each edge node and offloads tasks to the edge node that is predicted to have the least current response time. Once the edge-front node starts to place tasks on a certain edge node, the

estimation will be updated using piggybacking on the redirected tasks, which lowers the measurement overhead.

Each of the task placement schemes described above has advantages and disadvantages. For instance, the STTF scheme can quickly reduce the workload on the edge-front node, but there is a chance that tasks may be placed on an edge node which already has an intensive workload, as the STTF scheme gathers no information about the workload on the target. The SQLF scheme works well when the network latency and bandwidth are stable among all the available edge nodes. When the network overheads are highly variant, this scheme fails to factor in the network condition and always chooses the edge node with the lowest workload. When an intensive workload is placed under a high network overhead, this scheme potentially deteriorates the performance, as it needs to measure the workload frequently. The SSLF task placement scheme estimates the response time of each edge node by following the task-offloading process, and the response time is a good indicator of which edge node should be chosen as the target of task placement in terms of workload and network overhead. The SSLF scheme is a good trade-off between the previous two schemes. However, the regression analysis may introduce a large error into the predicted response time if inappropriate models are selected. We believe that the decision of which task placement scheme should be employed for achieving good system performance should always give proper consideration to the workload and network conditions. We evaluate these three schemes through a case study in the next section.
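The three selection policies can be summarized in a few lines of code. The sketch below is a simplified illustration under assumed inputs: the per-node transmission latency, reported queue length and recorded response-time series are given, and SSLF extrapolates one step ahead with an ordinary least-squares line (our implementation's regression model may differ).

```python
# Illustrative target-selection policies for STTF, SQLF and SSLF.
# Node statistics and names below are hypothetical examples.

def sttf(nodes):
    """STTF: node with the shortest measured transmission latency."""
    return min(nodes, key=lambda n: n["tx_latency"])

def sqlf(nodes):
    """SQLF: node reporting the shortest task queue at query time."""
    return min(nodes, key=lambda n: n["queue_len"])

def sslf(nodes):
    """SSLF: node with the smallest *predicted* response time, extrapolated
    from its recorded response-time series by simple linear regression."""
    def predict(history):
        n = len(history)
        xs = range(n)
        mx, my = (n - 1) / 2, sum(history) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, history))
        var = sum((x - mx) ** 2 for x in xs)
        slope = cov / var if var else 0.0
        return my + slope * (n - mx)  # extrapolate one step ahead
    return min(nodes, key=lambda n: predict(n["rt_series"]))

nodes = [
    {"name": "edge1", "tx_latency": 10, "queue_len": 40, "rt_series": [2.0, 2.5, 3.0]},
    {"name": "edge2", "tx_latency": 20, "queue_len": 30, "rt_series": [2.2, 2.1, 2.0]},
    {"name": "edge3", "tx_latency": 100, "queue_len": 20, "rt_series": [4.0, 4.0, 4.1]},
]
print(sttf(nodes)["name"], sqlf(nodes)["name"], sslf(nodes)["name"])
```

Note how the three policies can disagree on the same snapshot: STTF favors the closest node, SQLF the emptiest one, while SSLF favors the node whose response times are trending down.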

6 SYSTEM IMPLEMENTATION AND PERFORMANCE EVALUATION

In this section, we first brief the implementation details of our system. Next, we introduce our evaluation setup and present the results of our evaluations.

6.1 Implementation Details

Our implementation aims at a serverless edge architecture. As shown in the system architecture of Fig. 4, our implementation is based on Docker containers for the benefits of quick deployment and easy management. Every component has been dockerized, and its deployment is greatly simplified via distributing pre-built images. The creation and destruction of Docker instances is much faster than that of VM instances. Inspired by IBM OpenWhisk [18], each worker container contains an action proxy, which uses Python to run any scripts or compile and execute any binary executable. The worker container communicates with others using a message queue, as all the inputs/outputs will be jsonified. However, we do not jsonify image/video data; instead, we use its path reference in shared storage. The task queue is implemented using Redis, as it is in-memory and has very good performance. The end user only needs to: 1) deploy our edge computing platform on heterogeneous devices with just a click; 2) define the events of interest using the provided API; 3) provide a function (scripts or binary executable) to process such events. The function we have implemented uses the open source project OpenALPR [22] as the task payload for workers.
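The message flow above can be sketched as follows. This is a simplified stand-in, not our actual wire format: a plain Python list plays the role of the Redis list (the real system would use LPUSH/BRPOP against a Redis server), and all field and function names are hypothetical. The key point it illustrates is that only small, jsonifiable values travel in the message, while images/videos are passed as path references into shared storage.

```python
import json

# Stand-in for the jsonified task messages passed between workers.
queue = []  # a plain list in place of the Redis list used by the real system

def submit(task_name, input_path, params):
    msg = json.dumps({
        "task": task_name,
        "input": input_path,   # path reference in shared storage, not raw bytes
        "params": params,      # small, jsonifiable arguments only
    })
    queue.insert(0, msg)       # plays the role of LPUSH

def worker_step(handlers):
    msg = json.loads(queue.pop())  # plays the role of (blocking) BRPOP
    return handlers[msg["task"]](msg["input"], msg["params"])

submit("plate_detect", "/shared/frames/000123.jpg", {"region": "us"})
result = worker_step({"plate_detect": lambda path, p: f"detected in {path}"})
print(result)  # detected in /shared/frames/000123.jpg
```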


6.2 Evaluation Setup

6.2.1 Testbed. We have built a testbed consisting of four edge computing nodes. One of the edge nodes is the edge-front node, which is directly connected to a wireless router using a cable. The other three nodes are set as nearby edge computing nodes for the evaluation of inter-edge collaboration. These four machines have the same hardware specifications: they all have a quad-core CPU and 4 GB main memory. The three nearby edge nodes are directly connected to the edge-front node through a network cable. We make use of two types of Raspberry Pi (RPi) nodes as clients: one type is the RPi 2, which is wired to the router, while the other type is the RPi 3, which is connected to the router using its built-in 2.4 GHz WiFi.

6.2.2 Datasets. We have employed three datasets for the evaluations. One dataset is the Caltech Vision Group 2001 testing database, in which the car rear image resolution (126 images with resolution 896x592) is adequate for license plate recognition [25]. Another dataset is a self-collected 4K video containing rear license plates, taken on an Android smartphone and converted into videos of different resolutions (640x480, 960x720, 1280x960 and 1600x1200). The other dataset, used in the inter-edge collaboration evaluation, contains 22 car images with various resolutions ranging from 405x540 pixels to 2514x1210 pixels (file size 316 KB to 2.85 MB). The task requests use the car images as input in a round-robin way: one car image for each task request.

6.3 Task Profiler

Besides the round trip time and bandwidth benchmarks we have presented in Fig. 2 and Fig. 3 to characterize the edge computing network, we have profiled the OpenALPR application on various client, edge and cloud nodes.

[Figure: bar chart omitted; it reports per-task execution times in ms (Motion Detection, Plate Detection, Plate Analysis, OCR) for workload 1 at 896x592 and workload 2 at 640x480, 960x720, 1280x960 and 1600x1200.]

Figure 6: OpenALPR profile result of client type 1 (RPi 2, quad-core 0.9 GHz).

In this experiment, we use both dataset 1 (workload 1) and dataset 2 (workload 2) at various resolutions. The execution times of the tasks are shown in Fig. 6, Fig. 7, Fig. 8 and Fig. 9. The results indicate that by utilizing an edge node, we can get a comparable amount of computation power close to the clients for computation-intensive tasks. Another observation is that, due to the uneven optimizations on heterogeneous CPU architectures, some tasks are better kept local, while some others should be

[Figure: bar chart omitted; same task breakdown and workloads as Fig. 6.]

Figure 7: OpenALPR profile result of client type 2 (RPi 3, quad-core 1.2 GHz).

[Figure: bar chart omitted; same task breakdown and workloads as Fig. 6.]

Figure 8: OpenALPR profile result of a type of edge node (i7, quad-core 2.30 GHz).

[Figure: bar chart omitted; same task breakdown and workloads as Fig. 6.]

Figure 9: OpenALPR profile of a type of cloud node (AWS EC2 t2.large, Xeon dual-core 2.40 GHz).

offloaded to the edge computing node. This observation justifies the need for computation offloading between clients and edge nodes.


[Figure: bar chart omitted; it compares response time per frame per client (s) for Client-edge opt, Client only, Edge only, Client-cloud opt and Cloud only on workload 2 at 640x480, 960x720, 1280x960 and 1600x1200.]

Figure 10: The comparison of task selection impacts on edge offloading and cloud offloading for wired clients (RPi 2).

[Figure: bar chart omitted; same configurations and workloads as Fig. 10.]

Figure 11: The comparison of task selection impacts on edge offloading and cloud offloading for 2.4 GHz wireless clients (RPi 3).

6.4 Offloading Task Selection

To understand how much the execution time can be reduced by splitting tasks between the client and the edge, or between the client and the cloud, we design an experiment with workloads generated from dataset 2 under two scenarios: 1) one edge node provides service to three wired client nodes that have the best network latency and bandwidth; 2) one edge node provides service to three wireless 2.4 GHz client nodes that have latency with high variance and relatively low bandwidth. The result of the first case is very straightforward: the clients simply upload all the input data and run all the tasks on the edge node in edge offloading, or the cloud node in cloud offloading, as shown in Fig. 10. This is mainly because using an Ethernet cable can stably provide the lowest latency and highest bandwidth, which makes offloading to the edge very rewarding. We did not evaluate 5 GHz wireless clients, since this interface is not supported on our client hardware, while we anticipate similar results as in the wired case. We plot the result of a 2.4 GHz wireless client node with offloading to an edge node or a remote cloud node in the second case in Fig. 11. Overall, the results show that by offloading tasks to an edge computing platform,

the application we chose experienced a speedup of up to 4.0x on the wired client-edge configuration compared to local execution, and up to 1.7x compared to a similar client-cloud configuration. For clients with a 2.4 GHz wireless interface, the speedup is up to 1.3x on the client-edge configuration compared to local execution, and up to 1.2x compared to a similar client-cloud configuration.

[Figure: line chart omitted; it plots response time (s) against the number of task offloading requests (5 to 35) for our scheme, SIOF and LCPUL.]

Figure 12: The comparison result of three task prioritizing schemes.

6.5 Edge-front Task Queue Prioritizing

To evaluate the performance of the task queue prioritizing, we collect statistical results from our profiler service and monitoring service on various workloads for simulation. We choose the simulation method because we can freely set up the numbers and types of client and edge nodes, to overcome the limitation of our current testbed and evaluate more complex deployments. We add two simple schemes as baselines: 1) shortest I/O first (SIOF), sorting all the tasks by the time cost of the network transmission; 2) longest CPU last (LCPUL), sorting all the tasks by the time cost of the processing on the edge node. In the simulation, based on the combination of client device types, workloads and offloading decisions, we have in total seven types of jobs to run on the edge node. We increase the total number of jobs, distribute them evenly among the seven types, and report the makespan time in Fig. 12. The results show that LCPUL is the worst among the three schemes, and our scheme outperforms the shortest-job-first scheme.

6.6 Inter-Edge Collaboration

We also evaluate the performance of the three task placement schemes (i.e., STTF, SQLF and SSLF) discussed in Section 5 through a controlled experiment on our testbed. For evaluation purposes, we configure the network in the edge computing system as follows. The first edge node, denoted as "edge node 1" afterwards, has 10 ms RTT and 40 Mbps bandwidth to the edge-front node. The second edge node, "edge node 2", has 20 ms RTT and 20 Mbps bandwidth to the edge-front node. The third edge node, "edge node 3", has 100 ms RTT and 2 Mbps bandwidth to the edge-front node. Thus we emulate the situation where the three edge nodes are at distances from the edge-front node ranging from near to far.


[Figure: line chart omitted; it plots throughput (tasks/sec) over 12 minutes for the edge-front node and edge nodes 1-3.]

Figure 13: Performance with no task placement scheme.

[Figure: line chart omitted; same axes and nodes as Fig. 13.]

Figure 14: Performance of STTF.

[Figure: line chart omitted; same axes and nodes as Fig. 13.]

Figure 15: Performance of SQLF.

[Figure: line chart omitted; same axes and nodes as Fig. 13.]

Figure 16: Performance of SSLF.

We use the third dataset to synthesize a workload as follows. In the first 4 minutes, the edge-front node receives 5 task requests per second, edge node 1 receives 4 task requests per second, edge node 2 receives 3 task requests per second, and edge node 3 receives 2 task requests per second, respectively. No task comes to any of the edge nodes after the first 4 minutes. For the SSLF task placement scheme, we implement a simple linear regression to predict the scheduling latency of the task being transmitted, since the workload we have injected is uniformly distributed.

Fig. 13 illustrates the throughput on each edge node when no task placement scheme is enabled on the edge-front node. The edge-front node has the heaviest workload, and it takes about 12.36 minutes to finish all the tasks. We consider this result as our baseline.

Fig. 14 shows the throughput result of the STTF scheme. In this case, the edge-front node only transmits tasks to edge node 1, because edge node 1 has the highest bandwidth and the shortest RTT to the edge-front node. Fig. 17 reveals that the edge-front node transmits 120 tasks to edge node 1 and no tasks to the other edge nodes. As edge node 1 has a heavier workload than edge node 2 and edge node 3, the STTF scheme has limited improvement on the system performance: the edge-front node takes about 11.29 minutes to finish all the tasks. Fig. 15 illustrates the throughput result of the SQLF

scheme. This scheme works better than the STTF scheme, because the edge-front node transmits more tasks to less-saturated edge nodes, efficiently reducing the workload on the edge-front node. However, the edge-front node tends to transmit many tasks to edge node 3 at the beginning, which has the lowest bandwidth and the longest RTT to the edge-front node. As such, the task placement may incur more delay than expected. From Fig. 17, the edge-front node transmits 0 tasks to edge node 1, 132 tasks to edge node 2, and 152 tasks to edge node 3. The edge-front node takes about 9.6 minutes to finish all the tasks.

Fig. 16 demonstrates the throughput result of the SSLF scheme. This scheme considers both the transmission time of the task being placed and the waiting time in the queue on the target edge node, and therefore achieves the best performance of the three. As mentioned, edge node 1 has the lowest transmission overhead but the heaviest workload among the three edge nodes, while edge node 3 has the lightest workload but the highest transmission overhead. In contrast, edge node 2 has modest transmission overhead and modest workload. The SSLF scheme takes all these situations into consideration and places the most tasks on edge node 2. As shown in Fig. 17, the edge-front node transmits 4 tasks to edge node 1, 152 tasks to edge node 2, and 148 tasks to edge node 3 when working with the SSLF scheme. The edge-front node


takes about 9.36 minutes to finish all the tasks, which is the best result among the three schemes. We infer that the third scheme will further improve the task completion time if tougher network conditions and workloads are considered.

[Figure: bar chart omitted; it shows the number of tasks placed on edge nodes 1-3 under STTF, SQLF and SSLF.]

Figure 17: Numbers of tasks placed by the edge-front node.

7 RELATED WORK

The emergence of edge computing has drawn attention due to its capability to reshape the landscape of IoT, mobile computing and cloud computing [6, 14, 32, 33, 36-38]. Satyanarayanan [29] has briefed the origin of edge computing, also known as fog computing [4], cloudlet [28], mobile edge computing [24], and so on. Here we review several relevant research fields towards edge video analytics, including distributed data processing and computation offloading in various computing paradigms.

7.1 Distributed Data Processing

Distributed data processing has a close relationship to edge analytics, in the sense that those data processing platforms [9, 39] and underlying techniques [16, 23, 27] can be easily deployed on a cluster of edge nodes. In this paper, we pay special attention to distributed image/video data processing systems. VideoStorm [40] made insightful observations on vision-related algorithms and proposed resource-quality trade-offs with multi-dimensional configurations (e.g., video resolution, frame rate, sampling rate, sliding window size, etc.). The resource-quality profiles are generated offline, and an online scheduler is built to allocate resources to queries to optimize the utility of quality and latency. Their work is complementary to ours, in that we do not consider the trade-off between quality and latency goals via adaptive configurations. Vigil [42] is a wireless video surveillance system that leverages edge computing nodes with an emphasis on content-aware frame selection in a scenario where multiple web cameras are at the same location, to optimize the bandwidth utilization, which is orthogonal to the problems we have addressed here. Firework [41] is a computing paradigm for big data processing in collaborative edge environments, which is complementary to our work in terms of shared data view and programming interface.

While there should be more on-going efforts for investigating the adaptation, improvement and optimization of existing distributed

data processing techniques on the edge computing platform, we focus more on the task/application-level queue management and scheduling, and leave all the underlying resource negotiation and process scheduling to the container cluster engine.

7.2 Computation Offloading

Computation offloading (a.k.a. cyber foraging [28]) has been proposed to improve resource utilization, response time and energy consumption in various computing environments [7, 13, 21, 31]. Work [17] has quantified the impact of edge computing on mobile applications, and found that edge computing can improve response time and energy consumption significantly for mobile devices through offloading via both WiFi and LTE networks. Mocha [34] has investigated how a two-stage face recognition task from a mobile device can be accelerated by a cloudlet and the cloud. In their design, clients simply capture images and send them to the cloudlet. The optimal task partition can be easily achieved, as it has only two stages. In LAVEA, our application is more complicated, with multiple stages, and we leverage client-edge offloading and other techniques to improve the resource utilization and optimize the response time.

8 DISCUSSIONS AND LIMITATIONS
In this section, we will discuss alternative design options, and point out current limitations and future work that can improve the system.

Measurement-based Offloading. In this paper, we are using measurement-based offloading (static offloading), i.e., the offloading decisions are based on the outcome of periodic measurements. We consider this as one of the limitations of our implementation, as stated in [15], and several dynamic computation offloading schemes have been proposed [12]. We are planning to improve the measurement-based offloading in future work.

Video Streaming. Our current data processing is image-based, which is one of the limitations of our implementation. The input is either in the format of images or a video stream, which is read into frames and sent out. We believe that utilizing existing video streaming techniques between our system components for data sharing will further improve the system performance and open more potential opportunities for optimization.

Discovering Edge Nodes. There are different ways for the edge-front node to discover the available edge nodes nearby. For example, every edge node intending to serve as a collaborator may open a designated port so that the edge-front node can periodically scan the network and discover the available edge nodes. This is called the "pull-based" method. In contrast, there is also a "push-based" method, in which the edge-front node opens a designated port and every edge node intending to serve as a collaborator registers itself to the edge-front node. When the network is at a large scale, the pull-based method usually performs poorly, because the edge-front node may not be able to discover an available edge node in a short time. For this reason, the edge node discovery is implemented in a push-based manner, which guarantees good performance regardless of the network scale.
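A push-based registration of this kind can be sketched with a plain TCP exchange. This is a minimal illustration only: the message format is made up and the paper does not specify LAVEA's actual wire protocol or port.

```python
# Minimal push-based discovery sketch: each collaborator edge node
# registers itself with the edge-front node, which listens on a
# designated port. The JSON message format here is hypothetical.
import json
import socket
import threading

collaborators = []        # edge-front's view of available edge nodes
ready = threading.Event()
addr = {}

def edge_front():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))        # OS picks a free "designated" port
    addr["port"] = srv.getsockname()[1]
    srv.listen(1)
    ready.set()                       # front is now accepting registrations
    conn, _ = srv.accept()
    msg = json.loads(conn.recv(1024).decode())
    collaborators.append(msg["node_id"])   # record the registrant
    conn.close()
    srv.close()

def edge_node(node_id):
    # A collaborator pushes its registration to the edge-front node.
    with socket.create_connection(("127.0.0.1", addr["port"])) as cli:
        cli.sendall(json.dumps({"node_id": node_id}).encode())

t = threading.Thread(target=edge_front)
t.start()
ready.wait()
edge_node("edge-node-1")
t.join()
print(collaborators)
```

The push direction means the edge-front does no scanning; its knowledge of collaborators grows as registrations arrive, independent of network size.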

9 CONCLUSION
In this paper, we have investigated providing video analytics services to latency-sensitive applications in an edge computing environment.

LAVEA: Latency-aware Video Analytics on Edge Computing Platform. SEC '17, October 12–14, 2017, San Jose / Silicon Valley, CA, USA.

As a result, we have built LAVEA, a low-latency video edge analytics system, which collaborates with nearby client, edge and remote cloud nodes, and transforms video feeds into semantic information at places closer to the users in early stages. We have utilized an edge-front design, formulated an optimization problem for offloading task selection, and prioritized the task queue to minimize the response time. Our results indicate that, by offloading tasks to the closest edge node, the client-edge configuration has a speedup ranging from 1.3x to 4x (1.2x to 1.7x) against running locally (against the client-cloud configuration) under various network conditions and workloads. In case of a saturating workload on the front edge node, we have proposed and compared various task placement schemes that are tailored for inter-edge collaboration. The proposed prediction-based shortest scheduling latency first task placement scheme considers both the transmission time of the task and the waiting time in the queue, and delivers better overall performance than the other schemes.

REFERENCES
[1] Amazon Web Service. 2017. AWS Lambda@Edge. http://docs.aws.amazon.com/lambda/latest/dg/lambda-edge.html. (2017).
[2] Christos-Nikolaos E. Anagnostopoulos, Ioannis E. Anagnostopoulos, Ioannis D. Psoroulas, Vassili Loumos, and Eleftherios Kayafas. 2008. License plate recognition from still images and video sequences: A survey. IEEE Transactions on Intelligent Transportation Systems 9, 3 (2008), 377–391.
[3] K.R. Baker. 1990. Scheduling groups of jobs in the two-machine flow shop. Mathematical and Computer Modelling 13, 3 (1990), 29–36.
[4] Flavio Bonomi, Rodolfo Milito, Jiang Zhu, and Sateesh Addepalli. 2012. Fog computing and its role in the internet of things. In Proceedings of the first edition of the MCC workshop on Mobile cloud computing. ACM, 13–16.
[5] Peter Brucker, Bernd Jurisch, and Bernd Sievers. 1994. A branch and bound algorithm for the job-shop scheduling problem. Discrete Applied Mathematics 49, 1 (1994), 107–127.
[6] Yu Cao, Songqing Chen, Peng Hou, and Donald Brown. 2015. FAST: A fog computing assisted distributed analytics system to monitor fall for stroke mitigation. In Networking, Architecture and Storage (NAS), 2015 IEEE International Conference on. IEEE, 2–11.
[7] Eduardo Cuervo, Aruna Balasubramanian, Dae-ki Cho, Alec Wolman, Stefan Saroiu, Ranveer Chandra, and Paramvir Bahl. 2010. MAUI: Making Smartphones Last Longer with Code Offload. In Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services (MobiSys '10). ACM, New York, NY, USA, 49–62. DOI: https://doi.org/10.1145/1814433.1814441
[8] Eyal de Lara, Carolina S. Gomes, Steve Langridge, S. Hossein Mortazavi, and Meysam Roodi. 2016. Hierarchical Serverless Computing for the Mobile Edge. In Edge Computing (SEC), IEEE/ACM Symposium on. IEEE, 109–110.
[9] Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (2008), 107–113.
[10] Shan Du, Mahmoud Ibrahim, Mohamed Shehata, and Wael Badawy. 2013. Automatic license plate recognition (ALPR): A state-of-the-art review. IEEE Transactions on Circuits and Systems for Video Technology 23, 2 (2013), 311–325.
[11] Sadjad Fouladi, Riad S. Wahby, Brennan Shacklett, Karthikeyan Vasuki Balasubramaniam, William Zeng, Rahul Bhalerao, Anirudh Sivaraman, George Porter, and Keith Winstein. 2017. Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 363–376. https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/fouladi
[12] Wei Gao, Yong Li, Haoyang Lu, Ting Wang, and Cong Liu. 2014. On exploiting dynamic execution patterns for workload offloading in mobile cloud applications. In Network Protocols (ICNP), 2014 IEEE 22nd International Conference on. IEEE, 1–12.
[13] Mark S. Gordon, Davoud Anoushe Jamshidi, Scott A. Mahlke, Zhuoqing Morley Mao, and Xu Chen. 2012. COMET: Code Offload by Migrating Execution Transparently. In OSDI, Vol. 12. 93–106.
[14] Zijiang Hao, Ed Novak, Shanhe Yi, and Qun Li. 2017. Challenges and Software Architecture for Fog Computing. IEEE Internet Computing (2017).
[15] Mohammed A. Hassan, Kshitiz Bhattarai, Qi Wei, and Songqing Chen. 2014. POMAC: Properly Offloading Mobile Applications to Clouds. In 6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 14).
[16] Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy H. Katz, Scott Shenker, and Ion Stoica. 2011. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. In NSDI, Vol. 11. 22–22.
[17] Wenlu Hu, Ying Gao, Kiryong Ha, Junjue Wang, Brandon Amos, Zhuo Chen, Padmanabhan Pillai, and Mahadev Satyanarayanan. 2016. Quantifying the Impact of Edge Computing on Mobile Applications. In Proceedings of the 7th ACM SIGOPS Asia-Pacific Workshop on Systems. ACM, 5.
[18] IBM. 2017. Apache OpenWhisk. http://openwhisk.org. (April 2017).
[19] Selmer Martin Johnson. 1954. Optimal two- and three-stage production schedules with setup times included. Naval Research Logistics Quarterly 1, 1 (1954), 61–68.
[20] Eric Jonas, Shivaram Venkataraman, Ion Stoica, and Benjamin Recht. 2017. Occupy the Cloud: Distributed computing for the 99%. arXiv preprint arXiv:1702.04024 (2017).
[21] Ryan Newton, Sivan Toledo, Lewis Girod, Hari Balakrishnan, and Samuel Madden. 2009. Wishbone: Profile-based Partitioning for Sensornet Applications. In NSDI, Vol. 9. 395–408.
[22] OpenALPR. 2017. OpenALPR – Automatic License Plate Recognition. http://www.openalpr.com. (April 2017).
[23] Kay Ousterhout, Patrick Wendell, Matei Zaharia, and Ion Stoica. 2013. Sparrow: distributed, low latency scheduling. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 69–84.
[24] M. Patel, B. Naughton, C. Chan, N. Sprecher, S. Abeta, A. Neal, and others. 2014. Mobile-edge computing introductory technical white paper. White Paper, Mobile-edge Computing (MEC) industry initiative (2014).
[25] Brad Philip and Paul Updike. 2001. Caltech Vision Group 2001 testing database. http://www.vision.caltech.edu/html-files/archive.html. (2001).
[26] Kari Pulli, Anatoly Baksheev, Kirill Kornyakov, and Victor Eruhimov. 2012. Real-time computer vision with OpenCV. Commun. ACM 55, 6 (2012), 61–69.
[27] Jeff Rasley, Konstantinos Karanasos, Srikanth Kandula, Rodrigo Fonseca, Milan Vojnovic, and Sriram Rao. 2016. Efficient queue management for cluster scheduling. In Proceedings of the Eleventh European Conference on Computer Systems. ACM, 36.
[28] Mahadev Satyanarayanan. 2001. Pervasive computing: Vision and challenges. IEEE Personal Communications 8, 4 (2001), 10–17.
[29] Mahadev Satyanarayanan. 2017. The Emergence of Edge Computing. Computer 50, 1 (2017), 30–39.
[30] Mahadev Satyanarayanan, Pieter Simoens, Yu Xiao, Padmanabhan Pillai, Zhuo Chen, Kiryong Ha, Wenlu Hu, and Brandon Amos. 2015. Edge analytics in the internet of things. IEEE Pervasive Computing 14, 2 (2015), 24–31.
[31] Cong Shi, Karim Habak, Pranesh Pandurangan, Mostafa Ammar, Mayur Naik, and Ellen Zegura. 2014. COSMOS: computation offloading as a service for mobile devices. In Proceedings of the 15th ACM international symposium on Mobile ad hoc networking and computing. ACM, 287–296.
[32] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu. 2016. Edge Computing: Vision and Challenges. IEEE Internet of Things Journal 3, 5 (Oct 2016), 637–646. DOI: https://doi.org/10.1109/JIOT.2016.2579198
[33] Weisong Shi and Schahram Dustdar. 2016. The Promise of Edge Computing. Computer 49, 5 (2016), 78–81.
[34] Tolga Soyata, Rajani Muraleedharan, Colin Funai, Minseok Kwon, and Wendi Heinzelman. 2012. Cloud-Vision: Real-time face recognition using a mobile-cloudlet-cloud acceleration architecture. In Computers and Communications (ISCC), 2012 IEEE Symposium on. IEEE, 000059–000066.
[35] Xiaoli Wang, Aakanksha Chowdhery, and Mung Chiang. 2016. SkyEyes: adaptive video streaming from UAVs. In Proceedings of the 3rd Workshop on Hot Topics in Wireless. ACM, 2–6.
[36] Shanhe Yi, Zijiang Hao, Zhengrui Qin, and Qun Li. 2015. Fog Computing: Platform and Applications. In Hot Topics in Web Systems and Technologies (HotWeb), 2015 Third IEEE Workshop on. IEEE, 73–78.
[37] Shanhe Yi, Cheng Li, and Qun Li. 2015. A Survey of Fog Computing: Concepts, Applications and Issues. In Proceedings of the 2015 Workshop on Mobile Big Data (Mobidata '15). ACM, 37–42.
[38] Shanhe Yi, Zhengrui Qin, and Qun Li. 2015. Security and privacy issues of fog computing: A survey. In Wireless Algorithms, Systems, and Applications. Springer, 685–695.
[39] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: cluster computing with working sets. HotCloud 10 (2010), 10–10.
[40] Haoyu Zhang, Ganesh Ananthanarayanan, Peter Bodik, Matthai Philipose, Paramvir Bahl, and Michael J. Freedman. 2017. Live Video Analytics at Scale with Approximation and Delay-Tolerance. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 377–392. https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/zhang
[41] Q. Zhang, X. Zhang, Q. Zhang, W. Shi, and H. Zhong. 2016. Firework: Big Data Sharing and Processing in Collaborative Edge Environment. In 2016 Fourth IEEE Workshop on Hot Topics in Web Systems and Technologies (HotWeb). 20–25. DOI: https://doi.org/10.1109/HotWeb.2016.12
[42] Tan Zhang, Aakanksha Chowdhery, Paramvir Victor Bahl, Kyle Jamieson, and Suman Banerjee. 2015. The design and implementation of a wireless video surveillance system. In Proceedings of the 21st Annual International Conference on Mobile Computing and Networking. ACM, 426–438.



corresponding access points (APs) using cable or wireless, and then utilize the services provided by the co-located edge node. In a sparse edge node deployment, a client will only connect to one of the available edge nodes nearby at a certain location, while in a dense deployment, a client may have multiple choices in selecting among multiple edge servers for services. Implicitly, we assume that there is a remote cloud node which can be reached via the wide area network (WAN).

To understand the factors that impact the feasibility of realizing practical edge computing systems, we have performed several preliminary measurements on existing networks and shown the results in Fig. 2 and Fig. 3. In these experiments, we measured the latency and bandwidth of combinations of client nodes with different network interfaces connecting to edge (or cloud) nodes. Based on the measurements of bandwidth, all clients benefit from utilizing a wire-connected or advanced-wireless (802.11ac, 5GHz) edge computing node. In terms of latency, wire-connected edge nodes are the best, while the 5GHz wireless edge computing nodes have larger means and variances in latency compared to the cloud node in the closest region, due to the intrinsic nature of wireless channels. Therefore, in this paper we pragmatically assume that edge nodes are connected to APs via cables to deliver services with better latency and bandwidth than the cloud. In such a setup, the cloud node can be considered as a backup computing node, which will be utilized only when the edge node is saturated and experiences a long response time.

Figure 2: Round trip time (RTT, in ms) between client and edge/cloud, for wired, WiFi 2.4GHz and WiFi 5GHz clients connecting to a wired edge, WiFi 5GHz edge, WiFi 2.4GHz edge, EC2 East and EC2 West.

2.2 Serverless Architecture
Serverless architecture, or Function as a Service (FaaS), such as AWS Lambda, Google Cloud Functions and Azure Functions, is an agile solution for developers to build cloud computing services without the heavy lifting of managing cloud instances. Taking AWS Lambda as an example: AWS Lambda is an event-based micro-service framework in which a user-supplied Lambda function, as the application logic, will be executed in response to the corresponding event. The AWS cloud will take care of the provisioning and resource management for running Lambda functions. The first time a Lambda function is created, a container will be built and launched based on the configurations provided. Each container will also be provided a small

Figure 3: Bandwidth (in Mbps) between client and edge/cloud, for the same client and server combinations as in Figure 2.

disk space as transient cache during multiple invocations. AWS has its own way to run Lambda functions, either reusing an existing container or creating a new one. Recently, AWS Lambda@Edge [1] allows using serverless functions at AWS edge locations in response to CDN events to apply moderate computations. We strongly advocate the adoption of serverless architecture at the edge computing layers, as serverless architecture naturally solves two important problems for edge computing: 1) the serverless programming model greatly reduces the burden on users or developers in developing, deploying and managing edge applications, as there is no need to understand the complex underlying procedures to run the applications, or the heavy lifting of distributed system management; 2) the functions are flexible to run on either edge or cloud, which lowers the barrier of edge-cloud interoperability and federation. Recent works have shown the potential of such architecture in low-latency video processing tasks [11] and distributed computing tasks [20], and there have been research efforts on incorporating serverless architecture in edge computing [8].
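As a minimal illustration of the event-driven FaaS model described above, a handler can be a plain function the platform invokes per event. This is a sketch, not the actual AWS Lambda API: the event fields and function name are hypothetical.

```python
# Minimal sketch of an event-driven, FaaS-style handler (hypothetical
# event schema; not the actual AWS Lambda API). The platform, not the
# developer, provisions the container that runs this function and may
# reuse the container across invocations.
import json

def handler(event, context=None):
    """Invoked by the platform whenever a matching event arrives."""
    frame_id = event.get("frame_id")
    action = event.get("action", "noop")
    # Application logic goes here, e.g., dispatching a vision task.
    result = {"frame_id": frame_id, "action": action, "status": "ok"}
    return json.dumps(result)

# A platform runtime would deserialize the event and call the handler:
print(handler({"frame_id": 42, "action": "plate_detection"}))
```

The developer supplies only the function body; container creation, reuse, and scaling are the platform's concern, which is exactly what makes the model attractive at the edge.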

2.3 Video Edge Analytics for Public Safety
Video surveillance is of great importance for public safety. Besides the "Amber Alert" example, there are many other applications in this field. For example, security cameras deployed at public places (e.g., the airport) can quickly spot unattended bags [42]; police with body-worn cameras can identify suspects and suspicious vehicles while approaching; and so on. Because those scenarios are urgent and critical, the applications need to provide the quickest responses with best effort. However, most tasks in video analytics are undoubtedly computationally intensive [26]. When running on resource-constrained mobile clients or IoT devices directly, the latency in computation, battery drain (if battery-powered) or even heat dissipation will eventually ruin the user experience, failing to achieve the performance goals of the applications. If running on cloud nodes, transferring large volumes of multimedia data will incur unacceptable transmission latency and additional bandwidth cost. Being proposed as a dedicated solution, the deployment of an edge computing platform enables the quickest responses to these video analytics tasks, which require both low latency and high bandwidth.

In this paper, we mainly focus on building a video edge analytics platform, and we demonstrate our platform using the application

SEC '17, October 12–14, 2017, San Jose / Silicon Valley, CA, USA. S. Yi et al.

of Automated License Plate Recognition (ALPR). Even though we integrate a specific application, our edge platform is a general design and can be extended to other applications with little modification. An ALPR system usually has four stages: 1) image acquisition, 2) license plate extraction, 3) license plate analysis, and 4) character recognition [2, 10]. Each of the stages involves various computer vision, pattern recognition and machine learning algorithms. Migrating the execution of some algorithms to a powerful edge/cloud node can significantly reduce the response time [34]. However, offloaded tasks require intermediate data, application state variables and corresponding configurations to be uploaded. Some of the algorithms produce a large amount of intermediate data, which will add delay to the whole processing time if offloaded to a remote cloud. We believe that a carefully designed edge computing platform will assist ALPR systems to expand to more resource-constrained devices at more locations, and provide better response time at the same time.
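To make the four-stage structure concrete, the stages can be modeled as separately offloadable functions chained on intermediate data. This is a simplified sketch: the stage bodies are placeholders, not OpenALPR's actual API.

```python
# Simplified sketch of a four-stage ALPR pipeline; each stage is a
# separate task so that any suffix of the chain can be offloaded.
# Stage bodies are placeholders, not OpenALPR's actual API.

def acquire_image(source):
    return {"frame": source, "meta": {}}

def extract_plate(frame):
    # e.g., edge detection and region proposals in a real system
    return {"plate_region": f"region_of({frame['frame']})"}

def analyze_plate(plate):
    # e.g., deskewing and character segmentation
    return {"characters": ["A", "B", "C", "1", "2", "3"]}

def recognize_characters(analysis):
    # e.g., per-character OCR
    return "".join(analysis["characters"])

PIPELINE = [acquire_image, extract_plate, analyze_plate, recognize_characters]

def run_pipeline(source, offload_from=None):
    """Run stages locally up to `offload_from`; from that index on, the
    intermediate result would instead be shipped to the edge node."""
    data = source
    for i, stage in enumerate(PIPELINE):
        if offload_from is not None and i >= offload_from:
            # In a system like LAVEA, `data` (the intermediate result,
            # state variables, configurations) is uploaded here and the
            # remaining stages run on the edge node.
            pass
        data = stage(data)
    return data

print(run_pipeline("camera0"))
```

The cut point matters precisely because each stage's intermediate output has a different size: cutting after a stage that emits a small result keeps the upload cheap.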

3 LAVEA SYSTEM DESIGN
In this section, we present our system design. First, we will discuss our design goals. Then, we will overview our system design and introduce several important edge computing services.

3.1 Design Goals
• Latency: The ability to provide low-latency services is recognized as one of the essential requirements of edge computing system design.
• Flexibility: The edge computing system should be able to flexibly utilize the hierarchical resources from client nodes, nearby edge nodes and remote cloud nodes.
• Edge-first: By edge-first, we mean that the edge computing platform is the first choice of our computation offloading target.

3.2 System Overview
LAVEA is intrinsically an edge computing platform which supports low-latency video processing. The main components are the edge computing node and the edge client. Whenever a client is running tasks and the nearby edge computing node is available, a task can be decided to run either locally or remotely. We present the architecture of our edge computing platform in Figure 4.

3.2.1 Edge Computing Node. In LAVEA, the edge computing node provides edge computing services to the mobile devices nearby. The edge computing node attached to the same access point or base station as clients is called the edge-front. By deploying edge computing nodes with access points or base stations, we ensure that edge computing service can be as ubiquitous as Internet access. Multiple edge computing nodes can collaborate, and the edge-front will always serve as the master and be in charge of the coordination with other edge nodes and cloud nodes. As shown in Figure 4, we use a lightweight virtualization technique to provide resource allocation and isolation to different clients. Any client can submit tasks to the platform via client APIs. The platform will be responsible for shaping workload, managing queue priorities and scheduling tasks. Those functions are implemented via internal APIs provided by multiple micro-services, such as the queueing service, scheduling service, data store service, etc. We will introduce several important services later in this section.

3.2.2 Edge Client. Since most edge clients are either resource-constrained devices or need to accommodate requests from a large number of clients, an edge client usually runs lightweight data processing tasks locally and offloads heavy tasks to the edge computing node nearby. In LAVEA, the edge client has a thin client design to make sure all the clients can run it without introducing too much overhead. For low-end devices, there is only one worker to make progress on the assigned job. The most important parts of the client node design are the profiler and the offloading controller, acting as participants in the corresponding profiler service and offloading service. With the profiler and offloading controller, a client can provide offloading information to the edge-front node and fulfill the offloading decisions received.

3.3 Edge Computing Services
3.3.1 Profiler Service. Similar to [7, 21, 31], our system uses a profiler to collect task performance information on various devices, since it is difficult to derive an analytic model to accurately capture the behavior of the whole system. However, we have found that the execution of video processing tasks is relatively stable (when input and algorithmic configurations are given), and a profiler can be used to collect relevant metrics. Therefore, we add a profiling phase to the deployment of every new type of client device and edge device. The profiler will execute instrumented tasks multiple times with different inputs and configurations on the device, and measure metrics including but not limited to execution time, input/output data size, etc. The time-stamped logs will be gathered to build the task execution graph for specific tasks, inputs, configurations and devices. The profiler service will collect this information, on which LAVEA relies for offloading decisions.
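A profiler of this kind can be sketched as a wrapper that runs an instrumented task repeatedly and records per-run metrics. This is a minimal sketch; the function and record field names are illustrative, not LAVEA's actual interface.

```python
# Minimal profiler sketch: run a task several times with different
# inputs and record execution time plus input/output sizes.
# Names are illustrative, not LAVEA's actual interface.
import time

def profile_task(task, inputs, repeats=3):
    records = []
    for inp in inputs:
        for _ in range(repeats):
            start = time.perf_counter()
            out = task(inp)
            elapsed = time.perf_counter() - start
            records.append({
                "exec_time_s": elapsed,   # wall-clock execution time
                "input_size": len(inp),   # bytes in
                "output_size": len(out),  # bytes out
            })
    return records

# Example: profile a toy "task" that doubles its input bytes.
logs = profile_task(lambda b: b * 2, [b"x" * 100, b"x" * 1000])
avg = sum(r["exec_time_s"] for r in logs) / len(logs)
print(f"{len(logs)} runs, mean exec time {avg:.6f}s")
```

Aggregating such records per (task, input configuration, device) triple yields exactly the kind of profile the offloading solver later consumes as the cost terms c_v and d_uv.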

3.3.2 Monitoring Service. Unlike the profiler service, which gathers pre-run-time execution information on pre-defined inputs and configurations, the monitoring service is used to continuously monitor and collect run-time information, such as the network and system load, from not only the clients but also nearby edge nodes. Monitoring the network between client and edge-front is necessary, since most edge clients are connected to the edge-front server via wireless links, and the condition of a wireless link changes from time to time. Therefore, we need to constantly monitor the wireless link to estimate the bandwidth and the latency. Monitoring system load on the edge client enables flexible workload shaping and task offloading from client to the edge. This information is also broadcast among nearby edge nodes. When an edge-front node is saturated or unstable, some tasks will be assigned to nearby edge nodes, according to the system load, the network bandwidth and the network delay between edge nodes, as long as this is still more beneficial than assigning the tasks to the cloud node.
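Since instantaneous wireless measurements are noisy, one common way to maintain such run-time estimates is an exponentially weighted moving average (EWMA) over the periodic samples. The sketch below shows one plausible estimator; the paper does not specify LAVEA's actual smoothing scheme, and the numbers are illustrative.

```python
# EWMA-based link estimator sketch: smooth noisy periodic samples of
# RTT and bandwidth into stable estimates. One plausible approach;
# not necessarily LAVEA's actual estimator.

class LinkEstimator:
    def __init__(self, alpha=0.2):
        self.alpha = alpha      # weight given to the newest sample
        self.rtt_ms = None      # smoothed round-trip time estimate
        self.bw_mbps = None     # smoothed bandwidth estimate

    def _ewma(self, old, sample):
        if old is None:         # first sample initializes the estimate
            return sample
        return (1 - self.alpha) * old + self.alpha * sample

    def update(self, rtt_sample_ms, bw_sample_mbps):
        self.rtt_ms = self._ewma(self.rtt_ms, rtt_sample_ms)
        self.bw_mbps = self._ewma(self.bw_mbps, bw_sample_mbps)

est = LinkEstimator()
for rtt, bw in [(12.0, 80.0), (30.0, 60.0), (14.0, 85.0)]:
    est.update(rtt, bw)
print(round(est.rtt_ms, 2), round(est.bw_mbps, 2))
```

A small alpha keeps the estimate stable against one-off wireless spikes while still tracking genuine shifts in link quality, which is what the offloading solver needs for its r_i and beta_i inputs.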

3.3.3 Offloading Service. The offloading controller will track tasks running locally at the client, and exchange information with the offloading service running on the edge-front server. The variables gathered by the profiler and monitoring services will be used as inputs to the offloading decision problem, which is formulated as an optimization problem to minimize the response time. Every time


Figure 4: The architecture of the edge computing platform. (Diagram: the edge computing node runs on a host or host cluster, with a container manager (Docker Engine) providing OS-level virtualization; workers, a task queue, a task scheduler, a workload optimizer and a queue prioritizer are coordinated through micro-services (a data store service backed by HDFS/SQL/KV store, plus offloading, queueing, scheduling, monitoring and profiler services) behind an edge-front gateway exposing the edge computing platform API; edge clients (security camera, dash camera, smartphone and tablet, laptop) connect through an access point and run the platform SDK and client API with a profiler, an offloading controller, a local worker stack and a task scheduler.)

when a new client registers itself to the offloading service, after the edge-front node collects enough prerequisite information and statistics, the optimization problem is solved again and the updated offloading decisions will be sent to all the clients. Periodically, the offloading service also solves the optimization problem and updates the offloading decisions with its clients.

4 EDGE-FRONT OFFLOADING
In this section, we describe how we select tasks of a job to run remotely on the edge server, in order to minimize the response time.

We consider selecting tasks to run on the edge as a computation offloading problem. Traditional offloading problems concern offloading schemes between clients and remote, powerful cloud servers. In the literature [7, 21, 31], those system models usually assume the task will be instantly finished remotely once the task is offloaded to the server. However, we argue that this assumption will not hold in the edge computing environment, as we need to consider the various delays at the server side, especially when lots of clients are sending offloading requests. We call it edge-front computation offloading from the perspective of the client:

• Tasks will only be offloaded from a client to the nearest edge node, which we call the edge-front.

• The underlying scheduling and processing are agnostic to clients.

• When a mobile node is disconnected from any edge node or even the cloud node, it will resort to local execution of all the tasks.

We assume that the edge node is wire-connected to the access point, which indicates that out-going traffic can go through the edge node with no additional cost. The only difference between offloading a task to an edge node and to a cloud node is that the task running on the edge node may experience resource contention and scheduling delay, while we assume a task offloaded to the cloud node will get enough resources and be scheduled to run immediately. In the light workload case, if there is any response time reduction when a task is offloaded to the cloud, then we know that there is definitely a benefit when this task is offloaded to the edge. The reasons are: 1) an edge server is as responsive as the server in the cloud data center; 2) running a task on an edge server experiences shorter data transmission delay, as the client-edge link has much larger bandwidth than the edge-cloud link, which is usually limited and imbalanced by the Internet service providers (ISPs). Therefore, in this section we focus on the task offloading only between client and edge server, and we will discuss integrating nearby edge nodes for the heavy workload scenario in the next section.

4.1 Task Offloading System Model and Problem Formulation
In this paper, we call a running instance of the application a job, which is usually a set of tasks. The job is the unit of work that the user submits to our system, while the task is the unit of work for our system to make scheduling and optimization decisions. The tasks generated from each application will be queued and processed either locally or remotely. By remotely, we mean running the task on an edge node. For simplicity, we consider that all clients are running


instances of applications processing the same kind of jobs, which is typically the case in our edge application scenario. However, our system can easily extend to heterogeneous applications running on each client device.

We choose to work at the granularity of tasks, since those tasks are modularized and can be flexibly combined to either achieve speedy processing or form a workflow with high accuracy. In our ALPR application, each task is usually a common computer vision algorithm or library. For example, we have analyzed an open-source ALPR project called OpenALPR [22], and illustrate its task graph in Fig. 5.

Figure 5: The task graph of OpenALPR. (Input images or video frames flow through motion detection to plate detection, with image stills going to plate detection directly; frames with no plate detected go straight to the output, while each plate candidate goes through plate character analysis and character recognition before result generation.)

Then, we consider that there are N clients and only one edge server connected, as shown in Fig. 1. This edge server could be a single server or a cluster of servers. Each client i, i ∈ [1, N], will process the upcoming job upon request, e.g., recognizing the license plates in video streams. Usually those jobs will generate heavy computation tasks and could benefit from offloading some of them to the edge server. Without loss of generality, we use a graph of tasks to represent the complex task relations inside a job, which is essentially similar to the method call graph in [7], but at a coarser granularity. For a certain kind of job, we start with its directed acyclic graph (DAG) G = (V, E), which gives the task execution sequence. Each vertex v ∈ V has a weight representing the computation or memory cost of a task (c_v), while each edge e = (u, v), u, v ∈ V, e ∈ E, has a weight representing the data size of the intermediate results (d_{uv}). Thus, our offloading problem can be taken as a graph partition problem, in which we need to assign a directed graph of tasks to different computing nodes (local, edge or cloud) with the purpose of minimizing

a certain cost. In this paper, we primarily try to minimize the job finish time.
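The task DAG G = (V, E) with per-vertex compute costs c_v and per-edge intermediate data sizes d_uv can be represented directly in code. The sketch below uses made-up costs for an ALPR-like job, not OpenALPR's measured profile.

```python
# Illustrative task DAG for an ALPR-like job: vertex weights are task
# compute costs (c_v), edge weights are intermediate data sizes (d_uv).
# All numbers are made up for illustration.

task_cost = {                 # c_v, in arbitrary compute units
    "motion_detection": 2.0,
    "plate_detection": 8.0,
    "char_analysis": 3.0,
    "char_recognition": 5.0,
    "result_generation": 0.5,
}

edge_data = {                 # d_uv, in KB of intermediate data
    ("motion_detection", "plate_detection"): 400.0,
    ("plate_detection", "char_analysis"): 30.0,
    ("char_analysis", "char_recognition"): 10.0,
    ("char_recognition", "result_generation"): 1.0,
}

def topo_order(costs, edges):
    """Kahn's algorithm: the execution order implied by the DAG."""
    indeg = {v: 0 for v in costs}
    for _, v in edges:
        indeg[v] += 1
    order = []
    ready = [v for v, d in indeg.items() if d == 0]
    while ready:
        u = ready.pop()
        order.append(u)
        for a, b in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    ready.append(b)
    return order

print(topo_order(task_cost, edge_data))
```

Partitioning then amounts to labeling each vertex local or remote; the c_v weights price the execution terms and the d_uv weights price the upload crossing the cut.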

The remote response time includes the communication delay, the network transmission delay of sending data to the edge server, and the execution time on that server. We use an indicator $I_{v,i} \in \{0, 1\}$ for all $v \in V$ and for all $i \in [1, N]$. If $I_{v,i} = 1$, then the task $v$ at client $i$ will run locally; otherwise, it will run on the remote edge server. For those tasks running locally, the total execution time for client $i$ is the sum

    T_i^{local} = \sum_{v \in V} I_{v,i} \, c_v / p_i    (1)

where $p_i$ is the processor speed of client $i$. Similarly, we use

    \bar{T}_i^{local} = \sum_{v \in V} (1 - I_{v,i}) \, c_v / p_i    (2)

to represent the execution time of running the offloaded tasks locally. As for the network, when there is an offloading decision, the client needs to upload the intermediate data (outputs of the previous task, application status variables, configurations, etc.) to the edge server in order to continue the computation. The network delay is modeled as

    T_i^{net} = \sum_{(u,v) \in E} (I_{u,i} - I_{v,i}) \, d_{uv} / r_i + \beta_i    (3)

where $r_i$ is the bandwidth assigned to this client and $\beta_i$ is the communication latency, which can be estimated using the round trip time between client $i$ and the edge server.

For each client, the remote execution time is

    T_i^{remote} = \sum_{v \in V} (1 - I_{v,i}) \, (c_v / p_0)    (4)

where $p_0$ is the processor speed of the edge server. Then our offloading task selection problem can be formulated as

    \min_{I_i, r_i} \sum_{i=1}^{N} \left( T_i^{local} + T_i^{net} + T_i^{remote} \right)    (5)

The offloading task selection is represented by the indicator matrix $I$. This optimization problem is subject to the following constraints:

• The total bandwidth:

    s.t. \sum_{i=1}^{N} r_i \le R    (6)

bull Like existing work we restrict the data ow to avoid ping-pong eect in which intermediate data is transmied backand forth between client and edge serverst Ivi le Iui foralle(uv) isin Eforalli isin [1N ] (7)

bull Unlike exiting ooading frameworks for mobile cloud com-puting we take the resource contention or schedule delayat the edge side into consideration by adding a end-to-enddelay constraint

st Tlocali minus (Tneti +T r emote

i ) gt τ foralli isin [1N ] (8)where τ can be tuned to avoid selecting borderline tasksthat if ooaded will get no gain due to the resource con-tention or schedule delay at the edge

LAVEA: Latency-aware Video Analytics on Edge Computing Platform. SEC '17, October 12-14, 2017, San Jose / Silicon Valley, CA, USA

4.2 Optimization Solver
The proposed optimization is a mixed integer non-linear programming (MINLP) problem, where the integer variables stand for the offloading decisions and the continuous variables stand for the bandwidth allocation. To solve this optimization problem, we start by relaxing the integer constraints and solve the non-linear programming version of the problem using the Sequential Quadratic Programming method, a constrained nonlinear optimization method. This solution is optimal without considering the integer constraints. Starting from this optimal solution, we optionally employ the branch and bound (B&B) method to search for the optimal integer solution, or simply do an exhaustive search when the number of clients and the number of tasks of each job are small.
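For a small instance, the exhaustive-search fallback is straightforward to sketch. The snippet below is a minimal illustration, not the paper's solver: it assumes a single client, a fixed bandwidth share, and hypothetical task names and costs, and it charges the latency β once per offloaded job. It enumerates the indicator assignments that satisfy the no-ping-pong constraint (7) and picks the one minimizing objective (5).

```python
import itertools

# Hypothetical per-task costs for one client job (names and numbers are
# illustrative, not measured): c_v in megacycles, d_uv in KB.
tasks = ["capture", "decode", "detect", "ocr"]
cost = {"capture": 0.0, "decode": 50.0, "detect": 400.0, "ocr": 300.0}
edges = {("capture", "decode"): 2000.0,   # raw frame upload
         ("decode", "detect"): 200.0,
         ("detect", "ocr"): 20.0}

p_client, p_edge = 1.0, 8.0    # processor speeds p_i and p_0
r, beta = 100.0, 0.02          # assigned bandwidth r_i (KB/ms), latency beta_i

def job_finish_time(I):
    """Objective (5) for a single client: T_local + T_net + T_remote.
    I[v] = 1 means task v runs locally (the indicator I_{v,i})."""
    t_local = sum(I[v] * cost[v] / p_client for v in tasks)        # Eq. (1)
    t_remote = sum((1 - I[v]) * cost[v] / p_edge for v in tasks)   # Eq. (4)
    # Eq. (3): data crosses the uplink wherever the indicator drops 1 -> 0.
    t_net = sum((I[u] - I[v]) * d / r for (u, v), d in edges.items())
    offloads = any(I[v] == 0 for v in tasks)
    return t_local + t_remote + (t_net + beta if offloads else 0.0)

def valid(I):
    # Constraint (7): no ping-pong; once offloaded, successors stay remote.
    return all(I[v] <= I[u] for (u, v) in edges)

def assignments():
    # "capture" is pinned local; enumerate the rest exhaustively.
    for bits in itertools.product([0, 1], repeat=len(tasks) - 1):
        yield {"capture": 1, **dict(zip(tasks[1:], bits))}

best = min((a for a in assignments() if valid(a)), key=job_finish_time)
print(best, round(job_finish_time(best), 2))
```

With these toy numbers the fast edge processor makes full offloading win despite the raw-frame upload; shrinking the bandwidth `r` shifts the optimum back toward local execution.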

4.3 Prioritizing Edge Task Queue
The offloading strategy produced by the task selection optimizes the "flow" time of each type of job. At each time epoch during the run time, the edge-front node receives a large number of offloaded tasks from the clients. Originally, we follow the first-come-first-serve rule to accommodate all the client requests. For each request at the head of the task queue, the edge-front server first checks if the input or intermediate data (e.g., images or videos) is available at the edge; otherwise the server waits. This scheme is easy to implement, but substantial computation is wasted if the network I/O is busy with a large file and no task is ready for processing. Therefore, we improve the task scheduling with a task queue prioritizer that maintains a task sequence minimizing the makespan of all offloading task requests received at a certain time epoch, since the edge node can execute a task only when the input data has been fully received or the depended-on tasks have finished execution. We consider that an offloaded task has to go through two stages: the first stage is the retrieval of input or intermediate data and state variables; the second stage is the execution of the task.

We model our scheduling problem using the flow shop model and apply Johnson's rule [19]. This scheme is optimal, and the makespan is minimized, when the number of stages is two. Nevertheless, this model only fits the case where all submitted job requests are independent and have no priorities. When considering task dependencies, a successor can only start after its predecessor finishes. By enforcing the topological ordering constraints, the problem can be solved optimally using the B&B method [5]. However, this solution hardly scales with the number of tasks. In this case, we adapt the method in [3], group tasks with dependencies, and execute all tasks in a group sequentially. The basic idea is applying Johnson's rule at two levels. The first level decides the sequence of tasks within each group; the difference in our problem is that we need to decide the best sequence among all valid topological orderings. The bottom level is then a job shop scheduling problem in terms of grouped jobs (i.e., a group of tasks with dependencies in topological ordering), in which we can utilize Johnson's rule directly.
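A minimal sketch of Johnson's rule for the two-stage case described above (task names and stage times are hypothetical; stage 1 is the retrieval of input data, stage 2 is task execution):

```python
def johnson_order(jobs):
    """Johnson's rule for a two-stage flow shop: jobs with stage-1 time
    <= stage-2 time go first in increasing stage-1 order; the rest go
    last in decreasing stage-2 order. Optimal when there are two stages."""
    first = sorted((n for n, (a, b) in jobs.items() if a <= b),
                   key=lambda n: jobs[n][0])
    last = sorted((n for n, (a, b) in jobs.items() if a > b),
                  key=lambda n: jobs[n][1], reverse=True)
    return first + last

def makespan(jobs, order):
    """Stage 1 (data retrieval) is sequential on the network; stage 2
    (execution) starts once the input is ready and the CPU is free."""
    stage1 = stage2 = 0.0
    for n in order:
        a, b = jobs[n]
        stage1 += a
        stage2 = max(stage1, stage2) + b
    return stage2

# Hypothetical offloaded tasks: (input-transfer ms, execution ms).
jobs = {"A": (30, 80), "B": (120, 40), "C": (20, 60), "D": (60, 20)}
print(johnson_order(jobs), makespan(jobs, johnson_order(jobs)))
```

On this toy input, the first-come-first-serve order A, B, C, D yields a makespan of 270 ms, while Johnson's sequence C, A, B, D finishes in 250 ms.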

4.4 Workload Optimizer
If the workload is overwhelming and the edge-front server is saturated, the task queue will be unstable and the response time will accumulate indefinitely. There are several measures we can take to address this problem. First, we can adjust the image/video resolution in client-side configurations, which makes a good trade-off between speed and accuracy. Second, by constraining the task offloading problem, we can restrain more computation tasks at the client side. Third, if there are nearby edge nodes which are favored in terms of latency, bandwidth, and computation, we can further offload tasks to those nearby edge nodes; we have investigated this case with performance improvement considerations in Section 5. Last, we can always redirect tasks to the remote cloud, just like offloading in MCC.

5 INTER-EDGE COLLABORATION
In this section, we improve our edge-first design by taking into consideration the case when the incoming workload saturates our edge-front node. We first discuss our motivation for providing such an option and list the corresponding challenges. We then introduce several collaboration schemes we have proposed and investigated.

5.1 Motivation and Challenges
The resources of an edge computing node are much richer than those of client nodes, but are relatively limited compared to cloud nodes. While serving an increasing number of nearby client nodes, the edge-front node will eventually be overloaded and become non-responsive to new requests. As a baseline, we can optionally choose to offload further requests to the remote cloud. We assume that the remote cloud has unlimited resources and is capable of handling all the requests. However, when running tasks remotely in the cloud, the application has to bear unpredictable latency and limited bandwidth, which is not the best choice, especially when there are other nearby edge nodes that can accommodate those tasks. We assume that when all available edge nodes nearby are exhausted, the mobile-edge-cloud computing paradigm will simply fall back to the mobile cloud computing paradigm; the fallback design is not in the scope of this paper. In this paper, we mainly investigate inter-edge collaboration with the prime purpose of alleviating the burden on the edge-front node.

When the edge-front node is saturated with requests, it can collaborate with nearby edge nodes by placing some tasks on these not-so-busy edge nodes, such that all the tasks can get scheduled in a reasonable time. This is slightly different from balancing the workload among the edge nodes and the edge-front node, in that the goal of inter-edge collaboration is to better serve the client nodes with submitted requests rather than simply making the workload balanced. For example, an edge-front node that is not overloaded does not need to place any tasks on the nearby edge nodes, even when they are idle.

The challenges of inter-edge collaboration are two-fold: 1) we need to design a proper inter-edge task placement scheme that fulfills our goal of reducing the workload on the edge-front node while offloading a proper amount of workload to the qualified edge nodes; 2) the task placement scheme should be lightweight, scalable, and easy to implement.


5.2 Inter-Edge Task Placement Schemes
We have investigated three task placement schemes for inter-edge collaboration:

• Shortest Transmission Time First (STTF)
• Shortest Queue Length First (SQLF)
• Shortest Scheduling Latency First (SSLF)

The STTF task placement scheme tends to place tasks on the edge node that has the shortest estimated latency for the edge-front node to transfer the tasks. The edge-front node maintains a table to record the latency of transmitting data to each available edge node. Periodic re-calibration is necessary because the network condition between the edge-front node and the other edge nodes may vary from time to time.

The SQLF task placement scheme, on the other hand, tends to transfer tasks from the edge-front node to the edge node which has the least number of tasks queued at the time of query. When the edge-front node is saturated with requests, it will first query all the available edge nodes about their current task queue length, and then transfer tasks to the edge node that reports the shortest value.

The SSLF task placement scheme tends to transmit tasks from the edge-front node to the edge node that is predicted to have the shortest response time. The response time is the time interval between the time when the edge-front node submits a task to an available edge node and the time when it receives the result of the task from that edge node. Unlike the SQLF task placement scheme, in which the edge-front node keeps querying the edge nodes about their queue length (which may cause performance issues when the number of nodes scales up, resulting in a large volume of queries), we have designed a novel method for the edge-front node to measure the scheduling latency efficiently. During the measurement phase, before the edge-front node chooses a task placement target, it sends a request message to each available edge node, which appends a special task to the tail of its task queue. When the special task is executed, the edge node simply sends a response message to the edge-front node, which records the response time. In this way, the edge-front node maintains a series of response times for each available edge node. When the edge-front node is saturated, it will start to reassign tasks to the edge node having the shortest response time. Unlike the STTF and SQLF task placement schemes, which choose the target edge node based on the current or most recent measurements, the SSLF scheme predicts the current response time for each edge node by applying regression analysis to the response time series recorded so far. The reason is that the edge nodes are also receiving task requests from client nodes, and their local workload may vary from time to time, so the most recent response time cannot serve as a good predictor of the current response time for the edge nodes. As the local workload in the real world on each edge node usually follows a certain pattern or trend, applying regression analysis to the recorded response times is a good way to estimate the current response time. To this end, we record measurements of response times from each edge node and offload tasks to the edge node that is predicted to have the least current response time. Once the edge-front node starts to place tasks on a certain edge node, the estimation is updated by piggybacking on the redirected tasks, which lowers the overhead of measuring.
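The prediction step can be illustrated with an ordinary least-squares line fitted per edge node. This is only a sketch with made-up probe samples, standing in for whatever regression model the edge-front node actually employs:

```python
def predict_response_time(samples, t_now):
    """Least-squares linear fit over (timestamp, response_time) probe
    samples recorded for one edge node, extrapolated to time t_now."""
    n = len(samples)
    sx = sum(t for t, _ in samples); sy = sum(r for _, r in samples)
    sxx = sum(t * t for t, _ in samples); sxy = sum(t * r for t, r in samples)
    denom = n * sxx - sx * sx
    if denom == 0:                      # all probes at the same instant
        return sy / n
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept + slope * t_now

def pick_sslf_target(history, t_now):
    """history: {edge_node: [(timestamp, response_time), ...]}.
    Returns the node predicted to answer fastest right now."""
    return min(history, key=lambda n: predict_response_time(history[n], t_now))

# Hypothetical probe histories (seconds).
history = {
    "edge1": [(0, 1.0), (1, 1.2), (2, 1.4)],   # load trending up
    "edge2": [(0, 2.2), (1, 1.9), (2, 1.6)],   # load trending down
}
print(pick_sslf_target(history, t_now=4))
```

In this toy history the most recent probe favors edge1 (1.4 s vs. 1.6 s), but the fitted trends predict edge2 will answer faster now, which is exactly the case the SSLF scheme is designed to catch.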

Each of the task placement schemes described above has advantages and disadvantages. For instance, the STTF scheme can quickly reduce the workload on the edge-front node, but tasks may be placed on an edge node which already has an intensive workload, as the STTF scheme gathers no information about the workload on the target. The SQLF scheme works well when the network latency and bandwidth are stable among all the available edge nodes. When the network overheads are highly variable, this scheme fails to factor in the network condition and always chooses the edge node with the lowest workload; when an intensive workload is placed under a high network overhead, this scheme potentially deteriorates the performance, as it needs to measure the workload frequently. The SSLF task placement scheme estimates the response time of each edge node by following the task-offloading process, and the response time is a good indicator of which edge node should be chosen as the target of task placement in terms of both workload and network overhead. The SSLF scheme is thus a good trade-off between the previous two schemes. However, the regression analysis may introduce a large error into the predicted response time if inappropriate models are selected. We believe that the decision of which task placement scheme to employ for achieving good system performance should always give proper consideration to the workload and network conditions. We evaluate those three schemes through a case study in the next section.

6 SYSTEM IMPLEMENTATION AND PERFORMANCE EVALUATION

In this section, we first describe the implementation details of our system. Next, we introduce our evaluation setup and present the results of our evaluations.

6.1 Implementation Details
Our implementation aims at a serverless edge architecture. As shown in the system architecture of Fig. 4, our implementation is based on Docker containers for the benefits of quick deployment and easy management. Every component has been dockerized, and its deployment is greatly simplified via distributing pre-built images; the creation and destruction of Docker instances is much faster than that of VM instances. Inspired by IBM OpenWhisk [18], each worker container contains an action proxy which uses Python to run any scripts or compile and execute any binary executable. The worker container communicates with others using a message queue, as all the inputs/outputs are JSON-ified. However, we do not JSON-ify image/video data; instead we use its path reference in shared storage. The task queue is implemented using Redis, as it is in-memory and has very good performance. The end user only needs to 1) deploy our edge computing platform on heterogeneous devices with just a click, 2) define the events of interest using the provided API, and 3) provide a function (scripts or a binary executable) to process such events. The function we have implemented uses the open source project OpenALPR [22] as the task payload for workers.
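To make the message format concrete, here is a minimal sketch of the JSON-ified task message and the worker's dispatch step. A Python deque stands in for the Redis list, and the action name, arguments, and shared-storage path are all hypothetical:

```python
import json
from collections import deque

# In-memory stand-in for the Redis list used as the task queue; the real
# system would push/pop against a shared Redis instance instead.
task_queue = deque()

def submit(action, args, frame_path):
    """All inputs are JSON-ified; bulky image/video data stays in shared
    storage and only its path travels in the message."""
    task_queue.append(json.dumps({
        "action": action,           # name of the user-provided function
        "args": args,
        "input_ref": frame_path,    # path in shared storage, not raw bytes
    }))

def worker_step(handlers):
    """One action-proxy iteration: pop a message, dispatch to its handler."""
    msg = json.loads(task_queue.popleft())
    return handlers[msg["action"]](msg["input_ref"], **msg["args"])

# Hypothetical handler standing in for the OpenALPR-based plate task.
handlers = {"plate_detect": lambda ref, region: f"detected:{region}:{ref}"}
submit("plate_detect", {"region": "us"}, "/shared/frames/000017.jpg")
print(worker_step(handlers))
```

Keeping only the path in the message keeps queue entries small regardless of frame size, which is the point of the shared-storage design.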


6.2 Evaluation Setup
6.2.1 Testbed. We have built a testbed consisting of four edge computing nodes. One of the edge nodes is the edge-front node, which is directly connected to a wireless router using a cable. The other three nodes are set as nearby edge computing nodes for the evaluation of inter-edge collaboration. These four machines have the same hardware specifications: they all have a quad-core CPU and 4 GB main memory. The three nearby edge nodes are directly connected to the edge-front node through a network cable. We make use of two types of Raspberry Pi (RPi) nodes as clients: one type is the RPi 2, which is wired to the router, while the other type is the RPi 3, which is connected to the router using its built-in 2.4 GHz WiFi.

6.2.2 Datasets. We have employed three datasets for the evaluations. One dataset is the Caltech Vision Group 2001 testing database, in which the car rear image resolution (126 images at 896x592) is adequate for license plate recognition [25]. Another dataset is a self-collected 4K video containing rear license plates, taken on an Android smartphone and converted into videos of different resolutions (640x480, 960x720, 1280x960, and 1600x1200). The other dataset, used in the inter-edge collaboration evaluation, contains 22 car images with various resolutions ranging from 405x540 pixels to 2514x1210 pixels (file size 316 KB to 2.85 MB). The task requests use the car images as input in a round-robin way: one car image for each task request.

6.3 Task Profiler
Besides the round trip time and bandwidth benchmarks we have presented in Fig. 2 and Fig. 3 to characterize the edge computing network, we have profiled the OpenALPR application on various client, edge, and cloud nodes.

Figure 6: OpenALPR profile result of client type 1 (RPi2, quad-core 0.9 GHz). (Execution time in ms for MotionDetection, PlateDetection, PlateAnalysis, and OCR, over workload 1 at 896x592 and workload 2 at 640x480, 960x720, 1280x960, and 1600x1200.)

In this experiment, we use both dataset 1 (workload 1) and dataset 2 (workload 2) at various resolutions. The execution time for each task is shown in Fig. 6, Fig. 7, Fig. 8, and Fig. 9. The results indicate that by utilizing an edge node, we can get a comparable amount of computation power close to the clients for computation-intensive tasks. Another observation is that, due to the uneven optimizations on heterogeneous CPU architectures, some tasks are better kept local while others should be offloaded to an edge computing node. This observation justifies the need for computation offloading between clients and edge nodes.

Figure 7: OpenALPR profile result of client type 2 (RPi3, quad-core 1.2 GHz).

Figure 8: OpenALPR profile result of a type of edge node (i7, quad-core 2.30 GHz).

Figure 9: OpenALPR profile result of a type of cloud node (AWS EC2 t2.large, Xeon dual-core 2.40 GHz).


Figure 10: The comparison of task selection impacts on edge offloading and cloud offloading for wired clients (RPi2). (Response time per frame per client in seconds, for Client-edge opt, Client only, Edge only, Client-cloud opt, and Cloud only, over the workload 2 resolutions.)

Figure 11: The comparison of task selection impacts on edge offloading and cloud offloading for 2.4 GHz wireless clients (RPi3). (Same metrics and workloads as Fig. 10.)

6.4 Offloading Task Selection
To understand how much the execution time can be reduced by splitting tasks between the client and the edge, or between the client and the cloud, we design an experiment with workloads generated from dataset 2 in two scenarios: 1) one edge node provides service to three wired client nodes that have the best network latency and bandwidth; 2) one edge node provides service to three wireless 2.4 GHz client nodes that have high-variance latency and relatively low bandwidth. The result of the first case is straightforward: the clients simply upload all the input data and run all the tasks on the edge node for edge offloading, or on the cloud node for cloud offloading, as shown in Fig. 10. This is mainly because the Ethernet cable stably provides the lowest latency and highest bandwidth, which makes offloading to the edge very rewarding. We did not evaluate 5 GHz wireless clients, since this interface is not supported on our client hardware, but we anticipate results similar to the wired case. We plot the result of a 2.4 GHz wireless client node offloading to an edge node or a remote cloud node in the second case in Fig. 11. Overall, the results show that by offloading tasks to an edge computing platform, the application we chose experienced a speedup of up to 4.0x in the wired client-edge configuration compared to local execution, and up to 1.7x compared to a similar client-cloud configuration. For clients with a 2.4 GHz wireless interface, the speedup is up to 1.3x in the client-edge configuration compared to local execution, and up to 1.2x compared to a similar client-cloud configuration.

Figure 12: The comparison result of three task prioritizing schemes. (Response time in seconds vs. number of task offloading requests, from 5 to 35, for our scheme, SIOF, and LCPUL.)

6.5 Edge-front Task Queue Prioritizing
To evaluate the performance of the task queue prioritizing, we collect statistical results from our profiler service and monitoring service on various workloads for simulation. We choose the simulation method because we can freely set up the numbers and types of client and edge nodes, overcoming the limitations of our current testbed and allowing us to evaluate more complex deployments. We add two simple schemes as baselines: 1) shortest I/O first (SIOF), sorting all the tasks by the time cost of the network transmission; 2) longest CPU last (LCPUL), sorting all the tasks by the time cost of the processing on the edge node. In the simulation, based on the combination of client device types, workloads, and offloading decisions, we have in total seven types of jobs to run on the edge node. We increase the total number of jobs, distribute them evenly among the seven types, and report the makespan time in Fig. 12. The results show that LCPUL is the worst among the three schemes, and our scheme outperforms the shortest job first scheme.
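The two baseline orderings, and the two-stage makespan they are judged by, can be sketched in a few lines. The task times below are hypothetical; a Johnson-style sequence, as in Section 4.3, stands in for our prioritizer:

```python
# Hypothetical offloaded tasks: (network-I/O time, edge CPU time) in seconds.
tasks = {"A": (3.0, 8.0), "B": (12.0, 4.0), "C": (2.0, 6.0), "D": (6.0, 2.0)}

def makespan(order):
    """Two-stage recurrence: I/O is sequential; execution starts once the
    input is fully received and the previous task has finished executing."""
    io_done = cpu_done = 0.0
    for name in order:
        io, cpu = tasks[name]
        io_done += io
        cpu_done = max(io_done, cpu_done) + cpu
    return cpu_done

siof = sorted(tasks, key=lambda n: tasks[n][0])      # shortest I/O first
lcpul = sorted(tasks, key=lambda n: tasks[n][1])     # longest CPU goes last
johnson = (sorted((n for n in tasks if tasks[n][0] <= tasks[n][1]),
                  key=lambda n: tasks[n][0]) +
           sorted((n for n in tasks if tasks[n][0] > tasks[n][1]),
                  key=lambda n: tasks[n][1], reverse=True))
for name, order in [("SIOF", siof), ("LCPUL", lcpul), ("Johnson", johnson)]:
    print(name, order, makespan(order))
```

On this toy workload LCPUL is worst (36 s) and the Johnson ordering (25 s) beats SIOF (27 s), matching the trend reported in Fig. 12.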

6.6 Inter-Edge Collaboration
We also evaluate the performance of the three task placement schemes (i.e., STTF, SQLF, and SSLF) discussed in Section 5 through a controlled experiment on our testbed. For evaluation purposes, we configure the network in the edge computing system as follows. The first edge node, denoted as "edge node 1" afterwards, has 10 ms RTT and 40 Mbps bandwidth to the edge-front node. The second edge node, "edge node 2", has 20 ms RTT and 20 Mbps bandwidth to the edge-front node. The third edge node, "edge node 3", has 100 ms RTT and 2 Mbps bandwidth to the edge-front node. Thus, we emulate a situation where the three edge nodes are at distances from the edge-front node ranging from near to far.


Figure 13: Performance with no task placement scheme.

Figure 14: Performance of STTF.

Figure 15: Performance of SQLF.

Figure 16: Performance of SSLF.

(Each of Figs. 13-16 plots throughput in tasks/sec over 12 minutes for the edge-front node and edge nodes 1-3.)

We use the third dataset to synthesize a workload as follows. In the first 4 minutes, the edge-front node receives 5 task requests per second, edge node 1 receives 4 task requests per second, edge node 2 receives 3 task requests per second, and edge node 3 receives 2 task requests per second, respectively. No task comes to any of the edge nodes after the first 4 minutes. For the SSLF task placement scheme, we implement a simple linear regression to predict the scheduling latency of the task being transmitted, since the workload we have injected is uniformly distributed.

Fig. 13 illustrates the throughput on each edge node when no task placement scheme is enabled on the edge-front node. The edge-front node has the heaviest workload, and it takes about 12.36 minutes to finish all the tasks. We consider this result as our baseline.

Fig. 14 shows the throughput result of the STTF scheme. In this case, the edge-front node only transmits tasks to edge node 1, because edge node 1 has the highest bandwidth and the shortest RTT to the edge-front node. Fig. 17 reveals that the edge-front node transmits 120 tasks to edge node 1 and no tasks to the other edge nodes. As edge node 1 has a heavier workload than edge node 2 and edge node 3, the STTF scheme provides limited improvement on the system performance: the edge-front node takes about 11.29 minutes to finish all the tasks. Fig. 15 illustrates the throughput result of the SQLF scheme. This scheme works better than the STTF scheme because the edge-front node transmits more tasks to less-saturated edge nodes, efficiently reducing the workload on the edge-front node. However, the edge-front node tends to transmit many tasks to edge node 3 at the beginning, which has the lowest bandwidth and the longest RTT to the edge-front node. As such, the task placement may incur more delay than expected. From Fig. 17, the edge-front node transmits 0 tasks to edge node 1, 132 tasks to edge node 2, and 152 tasks to edge node 3. The edge-front node takes about 9.6 minutes to finish all the tasks.

Fig. 16 demonstrates the throughput result of the SSLF scheme. This scheme considers both the transmission time of the task being placed and the waiting time in the queue on the target edge node, and therefore achieves the best performance of the three. As mentioned, edge node 1 has the lowest transmission overhead but the heaviest workload among the three edge nodes, while edge node 3 has the lightest workload but the highest transmission overhead. In contrast, edge node 2 has modest transmission overhead and a modest workload. The SSLF scheme takes all these situations into consideration and places the most tasks on edge node 2. As shown in Fig. 17, the edge-front node transmits 4 tasks to edge node 1, 152 tasks to edge node 2, and 148 tasks to edge node 3 when working with the SSLF scheme. The edge-front node


takes about 9.36 minutes to finish all the tasks, which is the best result among the three schemes. We infer that this third scheme would further improve the task completion time if tougher network conditions and workloads were considered.

Figure 17: Numbers of tasks placed by the edge-front node. (Tasks placed on edge nodes 1-3 under STTF, SQLF, and SSLF.)

7 RELATED WORK
The emergence of edge computing has drawn attention due to its capability to reshape the landscape of IoT, mobile computing, and cloud computing [6, 14, 32, 33, 36-38]. Satyanarayanan [29] has described the origin of edge computing, also known as fog computing [4], cloudlet [28], mobile edge computing [24], and so on. Here we review several research fields relevant to edge video analytics, including distributed data processing and computation offloading in various computing paradigms.

7.1 Distributed Data Processing
Distributed data processing is closely related to edge analytics in the sense that those data processing platforms [9, 39] and underlying techniques [16, 23, 27] can be easily deployed on a cluster of edge nodes. In this paper, we pay special attention to distributed image/video data processing systems. VideoStorm [40] made insightful observations on vision-related algorithms and proposed a resource-quality trade-off with multi-dimensional configurations (e.g., video resolution, frame rate, sampling rate, sliding window size, etc.). The resource-quality profiles are generated offline, and an online scheduler is built to allocate resources to queries to optimize the utility of quality and latency. Their work is complementary to ours in that we do not consider the trade-off between quality and latency goals via adaptive configurations. Vigil [42] is a wireless video surveillance system that leverages edge computing nodes, with an emphasis on content-aware frame selection in a scenario where multiple web cameras are at the same location, to optimize the bandwidth utilization, which is orthogonal to the problems we have addressed here. Firework [41] is a computing paradigm for big data processing in collaborative edge environments, which is complementary to our work in terms of shared data view and programming interface.

While there should be more on-going efforts investigating the adaptation, improvement, and optimization of existing distributed data processing techniques on edge computing platforms, we focus more on the task/application-level queue management and scheduling, and leave all the underlying resource negotiating and process scheduling to the container cluster engine.

7.2 Computation Offloading
Computation offloading (a.k.a. cyber foraging [28]) has been proposed to improve resource utilization, response time, and energy consumption in various computing environments [7, 13, 21, 31]. Work [17] has quantified the impact of edge computing on mobile applications and found that edge computing can significantly improve response time and energy consumption for mobile devices through offloading via both WiFi and LTE networks. Mocha [34] has investigated how a two-stage face recognition task from a mobile device can be accelerated by cloudlet and cloud. In their design, clients simply capture an image and send it to the cloudlet; the optimal task partition can be easily achieved, as there are only two stages. In LAVEA, our application is more complicated, with multiple stages, and we leverage client-edge offloading and other techniques to improve the resource utilization and optimize the response time.

8 DISCUSSIONS AND LIMITATIONS
In this section, we discuss alternative design options, and point out current limitations and future work that can improve the system.

Measurement-based Offloading. In this paper, we use measurement-based offloading (static offloading), i.e., the offloading decisions are based on the outcome of periodic measurements. We consider this one of the limitations of our implementation, as stated in [15], and several dynamic computation offloading schemes have been proposed [12]. We plan to improve the measurement-based offloading in future work.

Video Streaming. Our current data processing is image-based, which is one of the limitations of our implementation: the input is either in the format of images or a video stream, which is read into frames and sent out. We believe that utilizing existing video streaming techniques between our system components for data sharing will further improve the system performance and open more opportunities for optimization.

Discovering Edge Nodes. There are different ways for the edge-front node to discover the available edge nodes nearby. For example, every edge node intending to serve as a collaborator may open a designated port so that the edge-front node can periodically scan the network and discover the available edge nodes; this is called the "pull-based" method. In contrast, there is also a "push-based" method, in which the edge-front node opens a designated port and every edge node intending to serve as a collaborator registers with the edge-front node. When the network is at a large scale, the pull-based method usually performs poorly, because the edge-front node may not be able to discover an available edge node in a short time. For this reason, our edge node discovery is implemented with the push-based method, which guarantees good performance regardless of the network scale.
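A minimal sketch of the push-based method: collaborator edge nodes register (and periodically renew) with the edge-front node, which expires entries that miss their deadline. The addresses and the TTL below are illustrative only, not taken from the implementation:

```python
import threading

class EdgeFrontRegistry:
    """Push-based discovery sketch: edge nodes call register() to announce
    themselves; entries not renewed within `ttl` seconds are expired."""
    def __init__(self, ttl=3.0):
        self.ttl = ttl
        self._nodes = {}            # addr -> last renewal timestamp
        self._lock = threading.Lock()

    def register(self, addr, now):  # called by each collaborating edge node
        with self._lock:
            self._nodes[addr] = now

    def alive(self, now):           # collaborators usable right now
        with self._lock:
            return [a for a, seen in self._nodes.items()
                    if now - seen <= self.ttl]

reg = EdgeFrontRegistry(ttl=3.0)
reg.register("10.0.0.2:7070", now=0.0)
reg.register("10.0.0.3:7070", now=0.0)
reg.register("10.0.0.2:7070", now=5.0)   # only node 2 renews
print(sorted(reg.alive(now=6.0)))        # node 3 has expired
```

Timestamps are passed in explicitly here for clarity; a real registry would use a monotonic clock and a listening socket instead.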

9 CONCLUSION
In this paper, we have investigated providing video analytics services to latency-sensitive applications in an edge computing environment.


As a result, we have built LAVEA, a low-latency video edge analytics system, which collaborates nearby client, edge, and remote cloud nodes, and turns video feeds into semantic information at places closer to the users, in early stages. We have utilized an edge-front design and formulated an optimization problem for offloading task selection, and prioritized the task queue to minimize the response time. Our results indicate that by offloading tasks to the closest edge node, the client-edge configuration has a speedup ranging from 1.3x to 4x (1.2x to 1.7x) against running locally (against the client-cloud configuration) under various network conditions and workloads. In case of a saturating workload on the front edge node, we have proposed and compared various task placement schemes that are tailored for inter-edge collaboration. The proposed prediction-based shortest scheduling latency first task placement scheme considers both the transmission time of the task and the waiting time in the queue, and delivers better overall performance than the other schemes.

REFERENCES
[1] Amazon Web Services. 2017. AWS Lambda@Edge. http://docs.aws.amazon.com/lambda/latest/dg/lambda-edge.html. (2017).
[2] Christos-Nikolaos E. Anagnostopoulos, Ioannis E. Anagnostopoulos, Ioannis D. Psoroulas, Vassili Loumos, and Eleftherios Kayafas. 2008. License plate recognition from still images and video sequences: A survey. IEEE Transactions on Intelligent Transportation Systems 9, 3 (2008), 377–391.
[3] K. R. Baker. 1990. Scheduling groups of jobs in the two-machine flow shop. Mathematical and Computer Modelling 13, 3 (1990), 29–36.
[4] Flavio Bonomi, Rodolfo Milito, Jiang Zhu, and Sateesh Addepalli. 2012. Fog computing and its role in the internet of things. In Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing. ACM, 13–16.
[5] Peter Brucker, Bernd Jurisch, and Bernd Sievers. 1994. A branch and bound algorithm for the job-shop scheduling problem. Discrete Applied Mathematics 49, 1 (1994), 107–127.
[6] Yu Cao, Songqing Chen, Peng Hou, and Donald Brown. 2015. FAST: A fog computing assisted distributed analytics system to monitor fall for stroke mitigation. In Networking, Architecture and Storage (NAS), 2015 IEEE International Conference on. IEEE, 2–11.
[7] Eduardo Cuervo, Aruna Balasubramanian, Dae-ki Cho, Alec Wolman, Stefan Saroiu, Ranveer Chandra, and Paramvir Bahl. 2010. MAUI: Making Smartphones Last Longer with Code Offload. In Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services (MobiSys '10). ACM, New York, NY, USA, 49–62. DOI: https://doi.org/10.1145/1814433.1814441
[8] Eyal de Lara, Carolina S. Gomes, Steve Langridge, S. Hossein Mortazavi, and Meysam Roodi. 2016. Hierarchical Serverless Computing for the Mobile Edge. In Edge Computing (SEC), IEEE/ACM Symposium on. IEEE, 109–110.
[9] Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (2008), 107–113.
[10] Shan Du, Mahmoud Ibrahim, Mohamed Shehata, and Wael Badawy. 2013. Automatic license plate recognition (ALPR): A state-of-the-art review. IEEE Transactions on Circuits and Systems for Video Technology 23, 2 (2013), 311–325.
[11] Sadjad Fouladi, Riad S. Wahby, Brennan Shacklett, Karthikeyan Vasuki Balasubramaniam, William Zeng, Rahul Bhalerao, Anirudh Sivaraman, George Porter, and Keith Winstein. 2017. Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 363–376. https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/fouladi
[12] Wei Gao, Yong Li, Haoyang Lu, Ting Wang, and Cong Liu. 2014. On exploiting dynamic execution patterns for workload offloading in mobile cloud applications. In Network Protocols (ICNP), 2014 IEEE 22nd International Conference on. IEEE, 1–12.
[13] Mark S. Gordon, Davoud Anoushe Jamshidi, Scott A. Mahlke, Zhuoqing Morley Mao, and Xu Chen. 2012. COMET: Code Offload by Migrating Execution Transparently. In OSDI, Vol. 12. 93–106.
[14] Zijiang Hao, Ed Novak, Shanhe Yi, and Qun Li. 2017. Challenges and Software Architecture for Fog Computing. IEEE Internet Computing (2017).
[15] Mohammed A. Hassan, Kshitiz Bhattarai, Qi Wei, and Songqing Chen. 2014. POMAC: Properly Offloading Mobile Applications to Clouds. In 6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 14).
[16] Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy H. Katz, Scott Shenker, and Ion Stoica. 2011. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. In NSDI, Vol. 11. 22–22.
[17] Wenlu Hu, Ying Gao, Kiryong Ha, Junjue Wang, Brandon Amos, Zhuo Chen, Padmanabhan Pillai, and Mahadev Satyanarayanan. 2016. Quantifying the Impact of Edge Computing on Mobile Applications. In Proceedings of the 7th ACM SIGOPS Asia-Pacific Workshop on Systems. ACM, 5.
[18] IBM. 2017. Apache OpenWhisk. http://openwhisk.org. (April 2017).
[19] Selmer Martin Johnson. 1954. Optimal two- and three-stage production schedules with setup times included. Naval Research Logistics Quarterly 1, 1 (1954), 61–68.
[20] Eric Jonas, Shivaram Venkataraman, Ion Stoica, and Benjamin Recht. 2017. Occupy the Cloud: Distributed Computing for the 99%. arXiv preprint arXiv:1702.04024 (2017).
[21] Ryan Newton, Sivan Toledo, Lewis Girod, Hari Balakrishnan, and Samuel Madden. 2009. Wishbone: Profile-based Partitioning for Sensornet Applications. In NSDI, Vol. 9. 395–408.
[22] OpenALPR. 2017. OpenALPR – Automatic License Plate Recognition. http://www.openalpr.com. (April 2017).
[23] Kay Ousterhout, Patrick Wendell, Matei Zaharia, and Ion Stoica. 2013. Sparrow: distributed, low latency scheduling. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 69–84.
[24] M. Patel, B. Naughton, C. Chan, N. Sprecher, S. Abeta, A. Neal, and others. 2014. Mobile-edge computing introductory technical white paper. White Paper, Mobile-edge Computing (MEC) Industry Initiative (2014).
[25] Brad Philip and Paul Updike. 2001. Caltech Vision Group 2001 testing database. http://www.vision.caltech.edu/html-files/archive.html. (2001).
[26] Kari Pulli, Anatoly Baksheev, Kirill Kornyakov, and Victor Eruhimov. 2012. Real-time computer vision with OpenCV. Commun. ACM 55, 6 (2012), 61–69.
[27] Jeff Rasley, Konstantinos Karanasos, Srikanth Kandula, Rodrigo Fonseca, Milan Vojnovic, and Sriram Rao. 2016. Efficient queue management for cluster scheduling. In Proceedings of the Eleventh European Conference on Computer Systems. ACM, 36.
[28] Mahadev Satyanarayanan. 2001. Pervasive computing: Vision and challenges. IEEE Personal Communications 8, 4 (2001), 10–17.
[29] Mahadev Satyanarayanan. 2017. The Emergence of Edge Computing. Computer 50, 1 (2017), 30–39.
[30] Mahadev Satyanarayanan, Pieter Simoens, Yu Xiao, Padmanabhan Pillai, Zhuo Chen, Kiryong Ha, Wenlu Hu, and Brandon Amos. 2015. Edge analytics in the internet of things. IEEE Pervasive Computing 14, 2 (2015), 24–31.
[31] Cong Shi, Karim Habak, Pranesh Pandurangan, Mostafa Ammar, Mayur Naik, and Ellen Zegura. 2014. COSMOS: computation offloading as a service for mobile devices. In Proceedings of the 15th ACM International Symposium on Mobile Ad Hoc Networking and Computing. ACM, 287–296.
[32] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu. 2016. Edge Computing: Vision and Challenges. IEEE Internet of Things Journal 3, 5 (Oct 2016), 637–646. DOI: https://doi.org/10.1109/JIOT.2016.2579198
[33] Weisong Shi and Schahram Dustdar. 2016. The Promise of Edge Computing. Computer 49, 5 (2016), 78–81.
[34] Tolga Soyata, Rajani Muraleedharan, Colin Funai, Minseok Kwon, and Wendi Heinzelman. 2012. Cloud-Vision: Real-time face recognition using a mobile-cloudlet-cloud acceleration architecture. In Computers and Communications (ISCC), 2012 IEEE Symposium on. IEEE, 000059–000066.
[35] Xiaoli Wang, Aakanksha Chowdhery, and Mung Chiang. 2016. SkyEyes: adaptive video streaming from UAVs. In Proceedings of the 3rd Workshop on Hot Topics in Wireless. ACM, 2–6.
[36] Shanhe Yi, Zijiang Hao, Zhengrui Qin, and Qun Li. 2015. Fog Computing: Platform and Applications. In Hot Topics in Web Systems and Technologies (HotWeb), 2015 Third IEEE Workshop on. IEEE, 73–78.
[37] Shanhe Yi, Cheng Li, and Qun Li. 2015. A Survey of Fog Computing: Concepts, Applications and Issues. In Proceedings of the 2015 Workshop on Mobile Big Data (Mobidata '15). ACM, 37–42.
[38] Shanhe Yi, Zhengrui Qin, and Qun Li. 2015. Security and privacy issues of fog computing: A survey. In Wireless Algorithms, Systems, and Applications. Springer, 685–695.
[39] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: cluster computing with working sets. HotCloud 10 (2010), 10–10.
[40] Haoyu Zhang, Ganesh Ananthanarayanan, Peter Bodik, Matthai Philipose, Paramvir Bahl, and Michael J. Freedman. 2017. Live Video Analytics at Scale with Approximation and Delay-Tolerance. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 377–392. https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/zhang
[41] Q. Zhang, X. Zhang, Q. Zhang, W. Shi, and H. Zhong. 2016. Firework: Big Data Sharing and Processing in Collaborative Edge Environment. In 2016 Fourth IEEE Workshop on Hot Topics in Web Systems and Technologies (HotWeb). 20–25. DOI: https://doi.org/10.1109/HotWeb.2016.12
[42] Tan Zhang, Aakanksha Chowdhery, Paramvir Victor Bahl, Kyle Jamieson, and Suman Banerjee. 2015. The design and implementation of a wireless video surveillance system. In Proceedings of the 21st Annual International Conference on Mobile Computing and Networking. ACM, 426–438.


of Automated License Plate Recognition (ALPR). Even though we integrate a specific application, our edge platform is a general design and can be extended to other applications with little modification. An ALPR system usually has four stages: 1) image acquisition, 2) license plate extraction, 3) license plate analysis, and 4) character recognition [2, 10]. Each of the stages involves various computer vision, pattern recognition and machine learning algorithms. Migrating the execution of some algorithms to a powerful edge/cloud node can significantly reduce the response time [34]. However, offloaded tasks require intermediate data, application state variables and corresponding configurations to be uploaded. Some of the algorithms produce a large amount of intermediate data, which will add delay to the whole processing time if offloaded to the remote cloud. We believe that a carefully designed edge computing platform will help ALPR systems expand onto more resource-constrained devices at more locations, while providing better response time at the same time.

3 LAVEA SYSTEM DESIGN
In this section, we present our system design. First, we will discuss our design goals. Then we will overview our system design and introduce several important edge computing services.

3.1 Design Goals
• Latency: The ability to provide low-latency services is recognized as one of the essential requirements of edge computing system design.
• Flexibility: The edge computing system should be able to flexibly utilize the hierarchical resources from client nodes, nearby edge nodes and remote cloud nodes.
• Edge-first: By edge-first, we mean that the edge computing platform is the first choice of our computation offloading target.

3.2 System Overview
LAVEA is intrinsically an edge computing platform which supports low-latency video processing. The main components are the edge computing node and the edge client. Whenever a client is running tasks and the nearby edge computing node is available, a task can be decided to run either locally or remotely. We present the architecture of our edge computing platform in Figure 4.

3.2.1 Edge Computing Node. In LAVEA, the edge computing node provides edge computing services to the mobile devices nearby. The edge computing node attached to the same access point or base station as the clients is called the edge-front. By deploying edge computing nodes with access points or base stations, we ensure that edge computing service can be as ubiquitous as Internet access. Multiple edge computing nodes can collaborate, and the edge-front will always serve as the master and be in charge of the coordination with other edge nodes and cloud nodes. As shown in Figure 4, we use a lightweight virtualization technique to provide resource allocation and isolation to different clients. Any client can submit tasks to the platform via client APIs. The platform will be responsible for shaping workload, managing queue priorities and scheduling tasks. Those functions are implemented via internal APIs provided by multiple micro-services, such as the queueing service, scheduling service, data store service, etc. We will introduce several important services later in this section.

3.2.2 Edge Client. Since most edge clients are either resource-constrained devices or need to accommodate requests from a large number of clients, an edge client usually runs lightweight data processing tasks locally and offloads heavy tasks to the edge computing node nearby. In LAVEA, the edge client has a thin client design to make sure all the clients can run it without introducing too much overhead. For low-end devices, there is only one worker to make progress on the assigned job. The most important parts of the client node design are the profiler and the offloading controller, acting as participants in the corresponding profiler service and offloading service. With the profiler and offloading controller, a client can provide offloading information to the edge-front node and fulfill the offloading decisions received.

3.3 Edge Computing Services
3.3.1 Profiler Service. Similar to [7, 21, 31], our system uses a profiler to collect task performance information on various devices, since it is difficult to derive an analytic model that accurately captures the behavior of the whole system. However, we have found that the execution of video processing tasks is relatively stable (when input and algorithmic configurations are given), and a profiler can be used to collect relevant metrics. Therefore, we add a profiling phase to the deployment of every new type of client device and edge device. The profiler will execute instrumented tasks multiple times with different inputs and configurations on the device, and measure metrics including but not limited to execution time, input/output data size, etc. The time-stamped logs will be gathered to build the task execution graph for specific tasks, inputs, configurations and devices. The profiler service will collect this information, on which LAVEA relies for offloading decisions.
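A minimal sketch of such a profiling pass is shown below; `profile_task`, the dummy detection stage, and the log fields are illustrative, not LAVEA's actual API:

```python
import time

def profile_task(task_fn, inputs, configs, runs=3):
    """Run an instrumented task under each (input, config) pair several
    times, logging execution time and input/output data sizes."""
    records = []
    for inp in inputs:
        for cfg in configs:
            for _ in range(runs):
                start = time.perf_counter()
                out = task_fn(inp, cfg)
                records.append({
                    "ts": time.time(),          # time-stamped log entry
                    "config": cfg,
                    "in_size": len(inp),
                    "out_size": len(out),
                    "exec_time": time.perf_counter() - start,
                })
    return records

# Example: profile a dummy "plate detection" stage on two frame sizes.
fake_detect = lambda frame, cfg: frame[:cfg["scale"]]
logs = profile_task(fake_detect, [b"x" * 100, b"x" * 1000], [{"scale": 10}], runs=2)
```

Repeated runs per configuration let the profiler average out transient system noise before the measurements feed the offloading decision.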

3.3.2 Monitoring Service. Unlike the profiler service, which gathers pre-run-time execution information on pre-defined inputs and configurations, the monitoring service is used to continuously monitor and collect run-time information, such as the network and system load, from not only the clients but also nearby edge nodes. Monitoring the network between the client and the edge-front is necessary, since most edge clients are connected to the edge-front server via wireless links, whose conditions change from time to time. Therefore, we need to constantly monitor the wireless link to estimate the bandwidth and the latency. Monitoring the system load on the edge client enables flexible workload shaping and task offloading from the client to the edge. This information is also broadcast among nearby edge nodes. When an edge-front node is saturated or unstable, some tasks will be assigned to nearby edge nodes according to the system load, the network bandwidth and the network delay between edge nodes, as long as there is still benefit, instead of assigning tasks to the cloud node.
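One plausible way for such a monitor to smooth noisy wireless-link measurements is an exponentially weighted moving average; the class below is an illustrative sketch under that assumption, not LAVEA's actual implementation (the weight `alpha` and the probe fields are made up):

```python
class LinkMonitor:
    """Smooth periodic bandwidth/latency probes of a client-edge link
    with an exponentially weighted moving average (EWMA)."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha      # weight of the newest sample
        self.bandwidth = None   # estimated bandwidth (bytes/s)
        self.latency = None     # estimated round-trip latency (s)

    def update(self, sample_bw, sample_lat):
        a = self.alpha
        # Seed with the first sample, then blend new samples in.
        self.bandwidth = sample_bw if self.bandwidth is None else \
            a * sample_bw + (1 - a) * self.bandwidth
        self.latency = sample_lat if self.latency is None else \
            a * sample_lat + (1 - a) * self.latency

m = LinkMonitor(alpha=0.5)
m.update(100.0, 1.0)
m.update(200.0, 3.0)   # estimates now blend both probes
```

The smoothed estimates, rather than single noisy probes, would then feed the bandwidth and latency terms of the offloading decision.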

3.3.3 Offloading Service. The offloading controller tracks tasks running locally at the client and exchanges information with the offloading service running on the edge-front server. The variables gathered by the profiler and monitoring services will be used as inputs to the offloading decision problem, which is formulated as an optimization problem to minimize the response time. Every time


Figure 4: The architecture of the edge computing platform.

when a new client registers itself to the offloading service, after the edge-front node collects enough prerequisite information and statistics, the optimization problem is solved again and the updated offloading decisions will be sent to all the clients. Periodically, the offloading service also solves the optimization problem and updates the offloading decisions with its clients.

4 EDGE-FRONT OFFLOADING
In this section, we describe how we select tasks of a job to run remotely on the edge server in order to minimize the response time.

We consider selecting tasks to run on the edge as a computation offloading problem. Traditional offloading problems concern offloading schemes between clients and remote, powerful cloud servers. In the literature [7, 21, 31], the system models usually assume that a task will be instantly finished remotely once it is offloaded to the server. However, we argue that this assumption does not hold in the edge computing environment, as we need to consider the various delays at the server side, especially when many clients are sending offloading requests. We call it edge-front computation offloading from the perspective of the client:

• Tasks will only be offloaded from a client to the nearest edge node, which we call the edge-front.
• The underlying scheduling and processing is agnostic to clients.
• When a mobile node is disconnected from any edge node or even the cloud node, it will resort to local execution of all the tasks.

We assume that the edge node is wire-connected to the access point, which indicates that the outgoing traffic can go through the edge node with no additional cost. The only difference between offloading a task to the edge node and to the cloud node is that the task running on the edge node may experience resource contention and scheduling delay, while we assume a task offloaded to the cloud node will get enough resources and be scheduled to run immediately. In the light workload case, if there is any response time reduction when a task is offloaded to the cloud, then we know there is definitely benefit when this task is offloaded to the edge. The reasons are: 1) an edge server is as responsive as a server in the cloud data center; 2) running a task on the edge server experiences shorter data transmission delay, as the client-edge link has much larger bandwidth than the edge-cloud link, which is usually limited and imbalanced by the Internet service providers (ISPs). Therefore, in this section we focus on task offloading only between the client and the edge server, and we will discuss integrating nearby edge nodes for the heavy workload scenario in the next section.

4.1 Task Offloading System Model and Problem Formulation
In this paper, we call a running instance of the application a job, which is usually a set of tasks. The job is the unit of work that a user submits to our system, while the task is the unit of work for our system to make scheduling and optimization decisions. The tasks generated from each application will be queued and processed either locally or remotely. By remotely, we mean running the task on an edge node. For simplicity, we consider that all clients are running


instances of applications processing the same kind of jobs, which is typically the case in our edge application scenario. However, our system can easily extend to heterogeneous applications running on each client device.

We choose to work at the granularity of tasks, since those tasks are modularized and can be flexibly combined to either achieve fast processing or form a workflow with high accuracy. In our ALPR application, each task is usually a common computer vision algorithm or library. For example, we have analyzed an open source ALPR project called OpenALPR [22] and illustrate its task graph in Fig. 5.

Figure 5: The task graph of OpenALPR (Input → Motion Detection → Plate Detection → Plate Character Analysis → Character Recognition → Result Generation).

Then we consider there are $N$ clients and only one edge server connected, as shown in Fig. 1. This edge server could be a single server or a cluster of servers. Each client $i$, $i \in [1, N]$, will process the upcoming job upon request, e.g., recognizing the license plates in video streams. Usually those jobs will generate heavy computation tasks and could benefit from offloading some of them to the edge server. Without loss of generality, we use a graph of tasks to represent the complex task relations inside a job, which is essentially similar to the method call graph in [7], but at a coarser granularity. For a certain kind of job, we start with its directed acyclic graph (DAG) $G = (V, E)$, which gives the task execution sequence. Each vertex $v \in V$ has a weight $c_v$, the computation or memory cost of a task, while each edge $e = (u, v)$, $u, v \in V$, $e \in E$, has a weight $d_{uv}$ representing the data size of intermediate results. Thus our offloading problem can be taken as a graph partition problem, in which we need to assign a directed graph of tasks to different computing nodes (local, edge or cloud) with the purpose of minimizing a certain cost. In this paper, we primarily try to minimize the job finish time.

The remote response time includes the communication delay, the network transmission delay of sending data to the edge server, and the execution time on that server. We use an indicator $I_{vi} \in \{0, 1\}$ for all $v \in V$ and all $i \in [1, N]$. If $I_{vi} = 1$, then task $v$ at client $i$ will run locally; otherwise it will run on the remote edge server. For those tasks running locally, the total execution time for client $i$ is the sum

$$T^{local}_i = \sum_{v \in V} I_{vi} \frac{c_v}{p_i} \quad (1)$$

where $p_i$ is the processor speed of client $i$. Similarly, we use

$$\bar{T}^{local}_i = \sum_{v \in V} (1 - I_{vi}) \frac{c_v}{p_i} \quad (2)$$

to represent the execution time of running the offloaded tasks locally. For the network, when there is an offloading decision, the client needs to upload the intermediate data (outputs of the previous task, application status variables, configurations, etc.) to the edge server in order to continue the computation. The network delay is modeled as

$$T^{net}_i = \sum_{(u,v) \in E} (I_{ui} - I_{vi}) \frac{d_{uv}}{r_i} + \beta_i \quad (3)$$

where $r_i$ is the bandwidth assigned for this client and $\beta_i$ is the communication latency, which can be estimated using the round trip time between client $i$ and the edge server.

For each client, the remote execution time is

$$T^{remote}_i = \sum_{v \in V} (1 - I_{vi}) \frac{c_v}{p_0} \quad (4)$$

where $p_0$ is the processor speed of the edge server. Then our offloading task selection problem can be formulated as

$$\min_{I_i, r_i} \sum_{i=1}^{N} \left( T^{local}_i + T^{net}_i + T^{remote}_i \right) \quad (5)$$

The offloading task selection is represented by the indicator matrix $I$. This optimization problem is subject to the following constraints:

• The total bandwidth:

$$\text{s.t.} \quad \sum_{i=1}^{N} r_i \le R \quad (6)$$

• Like existing work, we restrict the data flow to avoid the ping-pong effect, in which intermediate data is transmitted back and forth between the client and the edge server:

$$\text{s.t.} \quad I_{vi} \le I_{ui}, \quad \forall e(u,v) \in E, \forall i \in [1, N] \quad (7)$$

• Unlike existing offloading frameworks for mobile cloud computing, we take the resource contention or scheduling delay at the edge side into consideration by adding an end-to-end delay constraint:

$$\text{s.t.} \quad \bar{T}^{local}_i - (T^{net}_i + T^{remote}_i) > \tau, \quad \forall i \in [1, N] \quad (8)$$

where $\tau$ can be tuned to avoid selecting borderline tasks that, if offloaded, will get no gain due to the resource contention or scheduling delay at the edge.


4.2 Optimization Solver
The proposed optimization is a mixed integer non-linear programming (MINLP) problem, where the integer variables stand for the offloading decisions and the continuous variables stand for the bandwidth allocation. To solve this optimization problem, we start by relaxing the integer constraints and solve the non-linear programming version of the problem using the Sequential Quadratic Programming method, a constrained nonlinear optimization method. This solution is optimal without considering the integer constraints. Starting from this optimal solution, we optionally employ the branch and bound (B&B) method to search for the optimal integer solution, or simply do an exhaustive search when the number of clients and the number of tasks of each job are small.
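To make the exhaustive-search case concrete, consider one client with a chain-shaped task graph: constraint (7) then reduces the indicator assignment to a single cut point (a local prefix followed by a remote suffix), so objective (5) can be evaluated directly. A minimal sketch with made-up profile numbers (the relaxation and B&B machinery for the general multi-client case is omitted):

```python
# Hypothetical profile for one client and a 4-task chain.
costs = [4.0, 20.0, 12.0, 6.0]      # c_v: computation cost of each task
data = [10.0, 8.0, 2.0, 0.5, 0.0]   # data[k]: bytes crossing the link if the cut is before task k
p_client, p_edge = 1.0, 5.0         # processor speeds p_i and p_0
r, beta = 2.0, 1.0                  # assigned bandwidth r_i and link latency beta_i
n = len(costs)

def finish_time(k):
    """Objective (5) for one client when tasks [0, k) run locally and
    tasks [k, n) run on the edge; k == n means everything stays local."""
    t_local = sum(c / p_client for c in costs[:k])              # Eq. (1)
    t_net = (data[k] / r + beta) if k < n else 0.0              # Eq. (3)
    t_remote = sum(c / p_edge for c in costs[k:])               # Eq. (4)
    return t_local + t_net + t_remote

# Exhaustive search over all valid cut points (feasible for small jobs).
best_k = min(range(n + 1), key=finish_time)
```

With these numbers the fast edge processor and large client-edge bandwidth make full offloading (`best_k == 0`) the winner; a slower link or cheaper tasks would push the cut point toward local execution.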

4.3 Prioritizing the Edge Task Queue
The offloading strategy produced by the task selection optimizes the "flow" time of each type of job. At each time epoch during run time, the edge-front node receives a large number of offloaded tasks from the clients. Originally, we followed the first-come-first-serve rule to accommodate all the client requests: for each request at the head of the task queue, the edge-front server first checks whether the input or intermediate data (e.g., images or videos) is available at the edge; otherwise the server waits. This scheme is easy to implement, but substantial computation is wasted if the network I/O is busy with a large file and there is no task that is ready for processing. Therefore, we improve the task scheduling with a task queue prioritizer, which maintains a task sequence that minimizes the makespan for the scheduling of all offloading task requests received at a certain time epoch, since the edge node can execute a task only when its input data has been fully received or the tasks it depends on have finished execution. We consider that an offloaded task has to go through two stages: the first stage is the retrieval of input or intermediate data and state variables; the second stage is the execution of the task.

We model our scheduling problem using the flow shop model and apply Johnson's rule [19]. This scheme is optimal and the makespan is minimized when the number of stages is two. Nevertheless, this model only fits the case where all submitted job requests are independent and have no priorities. When considering task dependencies, a successor can only start after its predecessor finishes. By enforcing the topological ordering constraints, the problem can be solved optimally using the B&B method [5]. However, this solution hardly scales with the number of tasks. In this case, we adapt the method in [3]: we group tasks with dependencies and execute all tasks in a group sequentially. The basic idea is to apply Johnson's rule at two levels. The first level decides the sequence of tasks within each group; the difference in our problem is that we need to decide the best sequence among all valid topological orderings. The bottom level is then a flow shop scheduling problem in terms of grouped jobs (i.e., groups of tasks with dependencies in topological ordering), in which we can utilize Johnson's rule directly.
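For the independent-task case, the two-stage Johnson's rule can be sketched in a few lines; the (fetch time, execution time) job tuples below are illustrative:

```python
def johnsons_rule(jobs):
    """Order two-stage jobs (fetch_time, exec_time) to minimize makespan.

    Johnson's rule: jobs whose first stage is shorter go first, in
    increasing fetch time; the rest go last, in decreasing exec time."""
    front = sorted((j for j in jobs if j[0] <= j[1]), key=lambda j: j[0])
    back = sorted((j for j in jobs if j[0] > j[1]), key=lambda j: j[1], reverse=True)
    return front + back

def makespan(seq):
    """Completion time when stage 1 (data retrieval) and stage 2 (CPU)
    run as a pipeline: a task executes only after its data arrives."""
    t1 = t2 = 0.0
    for fetch, exec_time in seq:
        t1 += fetch                      # network finishes fetching this job
        t2 = max(t2, t1) + exec_time     # CPU starts once data and CPU are ready
    return t2

jobs = [(3, 2), (1, 4), (2, 1)]          # (fetch, exec) pairs
order = johnsons_rule(jobs)
```

For this toy input the prioritized order finishes in 8 time units versus 10 for first-come-first-serve, because a short fetch is moved up front so the CPU is never idle waiting on the network.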

4.4 Workload Optimizer
If the workload is overwhelming and the edge-front server is saturated, the task queue will be unstable and the response time will accumulate indefinitely. There are several measures we can take to address this problem. First, we can adjust the image/video resolution in client-side configurations, which makes a good trade-off between speed and accuracy. Second, by constraining the task offloading problem, we can keep more computation tasks at the client side. Third, if there are nearby edge nodes which are favored in terms of latency, bandwidth and computation, we can further offload tasks to those nodes; we investigate this case with performance improvement considerations in Section 5. Last, we can always redirect tasks to the remote cloud, just like offloading in MCC.

5 INTER-EDGE COLLABORATION
In this section, we improve our edge-first design by taking into consideration the case when the incoming workload saturates our edge-front node. We will first discuss our motivation for providing such an option and list the corresponding challenges. Then we will introduce several collaboration schemes we have proposed and investigated.

5.1 Motivation and Challenges
The resources of an edge computing node are much richer than those of client nodes, but are relatively limited compared to cloud nodes. While serving an increasing number of client nodes nearby, the edge-front node will eventually be overloaded and become non-responsive to new requests. As a baseline, we can optionally choose to offload further requests to the remote cloud. We assume that the remote cloud has unlimited resources and is capable of handling all the requests. However, when running tasks remotely in the cloud, the application needs to bear unpredictable latency and limited bandwidth, which is not the best choice, especially when there are other nearby edge nodes that can accommodate those tasks. We assume that when all available edge nodes nearby are exhausted, the mobile-edge-cloud computing paradigm will simply fall back to the mobile cloud computing paradigm; the fallback design is not in the scope of this paper. In this paper, we mainly investigate inter-edge collaboration with the prime purpose of alleviating the burden on the edge-front node.

When the edge-front node is saturated with requests, it can collaborate with nearby edge nodes by placing some tasks on these not-so-busy edge nodes, such that all the tasks can get scheduled in a reasonable time. This is slightly different from balancing the workload among the edge nodes and the edge-front node, in that the goal of inter-edge collaboration is to better serve the client nodes with submitted requests, rather than simply making the workload balanced. For example, an edge-front node that is not overloaded does not need to place any tasks on the nearby edge nodes, even when they are idle.

The challenges of inter-edge collaboration are two-fold: 1) we need to design a proper inter-edge task placement scheme that fulfills our goal of reducing the workload on the edge-front node while offloading a proper amount of workload to the qualified edge nodes; 2) the task placement scheme should be lightweight, scalable and easy to implement.


5.2 Inter-Edge Task Placement Schemes
We have investigated three task placement schemes for inter-edge collaboration:

• Shortest Transmission Time First (STTF)
• Shortest Queue Length First (SQLF)
• Shortest Scheduling Latency First (SSLF)

The STTF task placement scheme tends to place tasks on the edge node that has the shortest estimated latency for the edge-front node to transfer the tasks. The edge-front node maintains a table recording the latency of transmitting data to each available edge node. Periodic re-calibration is necessary, because the network condition between the edge-front node and other edge nodes may vary from time to time.

The SQLF task placement scheme, on the other hand, tends to transfer tasks from the edge-front node to the edge node which has the least number of tasks queued at the time of query. When the edge-front node is saturated with requests, it will first query all the available edge nodes about their current task queue lengths, and then transfer tasks to the edge node that reports the shortest value.

The SSLF task placement scheme tends to transmit tasks from the edge-front node to the edge node that is predicted to have the shortest response time. The response time is the interval between the time when the edge-front node submits a task to an available edge node and the time when it receives the result of that task from the edge node. Unlike the SQLF task placement scheme, in which the edge-front node keeps querying the edge nodes about their queue lengths, which may cause performance issues and a large volume of queries as the number of nodes scales up, we have designed a novel method for the edge-front node to measure the scheduling latency efficiently. During the measurement phase, before the edge-front node chooses a task placement target, it sends a request message to each available edge node, which appends a special task to the tail of its task queue. When the special task is executed, the edge node simply sends a response message back to the edge-front node, which records the response time. In this way, the edge-front node maintains a periodically refreshed series of response times for each available edge node. When the edge-front node is saturated, it starts to reassign tasks to the edge node having the shortest response time. Unlike the STTF and SQLF task placement schemes, which choose the target edge node based on the current or most recent measurements, the SSLF scheme predicts the current response time for each edge node by applying regression analysis to the response-time series recorded so far. The reason is that the edge nodes also receive task requests from client nodes, so their local workloads may vary from time to time, and the most recent response time cannot serve as a good predictor of the current response time. As the local workload on each edge node usually follows a certain pattern or trend in the real world, applying regression analysis to the recorded response times is a good way to estimate the current response time. To this end, the edge-front node keeps the recorded response-time measurements from each edge node and offloads tasks to the edge node that is predicted to have the least current response time. Once the edge-front node starts to place tasks on a certain edge node, the estimation is updated by piggybacking on the redirected tasks, which lowers the measurement overhead.
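The probe-based measurement described above can be sketched as follows. This is an illustrative skeleton, not the system's code: the class name, the window size, and the `send_probe` callback are our own assumptions.

```python
import collections
import time

class ResponseTimeTracker:
    """Keeps a sliding window of probe response times per edge node.
    The probe is the 'special task' appended to the tail of a node's
    task queue; when it runs, the node replies and we record the
    elapsed time (queueing delay plus network round trip)."""

    def __init__(self, window=16):
        self.history = collections.defaultdict(
            lambda: collections.deque(maxlen=window))

    def probe(self, node, send_probe):
        # send_probe(node) blocks until the special task at the tail
        # of the node's queue executes and the response comes back.
        start = time.monotonic()
        send_probe(node)
        self.history[node].append(time.monotonic() - start)

    def record(self, node, elapsed):
        # Piggybacked measurement carried back by a redirected task.
        self.history[node].append(elapsed)

tracker = ResponseTimeTracker()
tracker.record("edge1", 0.12)   # piggybacked samples, in seconds
tracker.record("edge1", 0.15)
# tracker.history["edge1"] is the series the regression consumes
```

The bounded deque keeps only recent samples, so old workload conditions age out of the regression input.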

Each of the task placement schemes described above has advantages and disadvantages. For instance, the STTF scheme can quickly reduce the workload on the edge-front node, but tasks may be placed on an edge node which already has an intensive workload, as the STTF scheme gathers no information about the workload on the target. The SQLF scheme works well when the network latency and bandwidth are stable among all the available edge nodes. When the network overheads are highly variant, this scheme fails to factor in the network condition and always chooses the edge node with the lowest workload; placing an intensive workload over a high-overhead network link can deteriorate performance, and the scheme also needs to measure the workload frequently. The SSLF task placement scheme estimates the response time of each edge node by following the task-offloading process, and the response time is a good indicator of which edge node should be chosen as the target of task placement, in terms of both workload and network overhead. The SSLF scheme is thus a good trade-off between the previous two schemes. However, the regression analysis may introduce a large error into the predicted response time if an inappropriate model is selected. We believe that the decision of which task placement scheme should be employed for good system performance should always give proper consideration to the workload and network conditions. We evaluate these three schemes through a case study in the next section.

6 SYSTEM IMPLEMENTATION AND PERFORMANCE EVALUATION

In this section, we first describe the implementation details of our system. Next, we introduce our evaluation setup and present the results of our evaluations.

6.1 Implementation Details
Our implementation aims at a serverless edge architecture. As shown in the system architecture of Fig. 4, our implementation is based on Docker containers for the benefits of quick deployment and easy management. Every component has been dockerized, and its deployment is greatly simplified via distributing pre-built images. The creation and destruction of Docker instances is much faster than that of VM instances. Inspired by IBM OpenWhisk [18], each worker container contains an action proxy which uses Python to run any script, or to compile and execute any binary executable. The worker container communicates with others using a message queue, as all the inputs/outputs are JSONified. However, we don't JSONify images/videos; instead, we use their path references in shared storage. The task queue is implemented using Redis, as it is in-memory and has very good performance. The end user only needs to 1) deploy our edge computing platform on heterogeneous devices with just a click, 2) define the events of interest using the provided API, and 3) provide a function (script or binary executable) to process such events. The function we have implemented uses the open-source project OpenALPR [22] as the task payload for workers.
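A minimal sketch of the worker loop around such a queue might look like the following; the JSON field names and the `handler` callback are our own placeholders (not the system's actual identifiers), and a plain in-memory queue stands in for the Redis list.

```python
import json
import queue

def worker_loop(task_queue, handler):
    """One worker's main loop: pop a JSONified task from the shared
    queue, run the handler on it, and collect a JSONified result.
    Large image/video payloads are passed by path reference into
    shared storage rather than embedded in the JSON."""
    results = []
    while not task_queue.empty():
        task = json.loads(task_queue.get())       # e.g. BLPOP on Redis
        output = handler(task["input_path"])      # e.g. invoke OpenALPR
        results.append(json.dumps(
            {"task_id": task["task_id"], "result": output}))
    return results

# In-memory stand-in for the Redis-backed queue used by the real system:
q = queue.Queue()
q.put(json.dumps({"task_id": 1, "input_path": "/shared/frame1.jpg"}))
out = worker_loop(q, handler=lambda path: f"plate-from-{path}")
```

Passing only the path keeps the message small, which matters when frames are megabytes each.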

LAVEA: Latency-aware Video Analytics on Edge Computing Platform. SEC '17, October 12–14, 2017, San Jose / Silicon Valley, CA, USA

6.2 Evaluation Setup
6.2.1 Testbed. We have built a testbed consisting of four edge computing nodes. One of the edge nodes is the edge-front node, which is directly connected to a wireless router using a cable. The other three nodes are set up as nearby edge computing nodes for the evaluation of inter-edge collaboration. These four machines have the same hardware specifications: they all have a quad-core CPU and 4 GB of main memory. The three nearby edge nodes are directly connected to the edge-front node through network cables. We make use of two types of Raspberry Pi (RPi) nodes as clients: one type is the RPi 2, which is wired to the router, while the other type is the RPi 3, which is connected to the router using its built-in 2.4 GHz WiFi.

6.2.2 Datasets. We have employed three datasets for the evaluations. One dataset is the Caltech Vision Group 2001 testing database, in which the car rear image resolution (126 images at 896x592) is adequate for license plate recognition [25]. Another dataset is a self-collected 4K video containing rear license plates, taken on an Android smartphone and converted into videos of different resolutions (640x480, 960x720, 1280x960, and 1600x1200). The other dataset, used in the inter-edge collaboration evaluation, contains 22 car images with various resolutions ranging from 405x540 pixels to 2514x1210 pixels (file size 316 KB to 2.85 MB). The task requests use the car images as input in a round-robin way, one car image per task request.

6.3 Task Profiler
Besides the round-trip time and bandwidth benchmarks we have presented in Fig. 2 and Fig. 3 to characterize the edge computing network, we have profiled the OpenALPR application on various client, edge, and cloud nodes.

Figure 6: OpenALPR profile result of client type 1 (RPi 2, quad-core 0.9 GHz). [Bar chart: per-stage execution time in ms (MotionDetection, PlateDetection, PlateAnalysis, OCR) for workload 1 (896x592) and workload 2 (640x480, 960x720, 1280x960, 1600x1200).]

In this experiment, we use both dataset 1 (workload 1) and dataset 2 (workload 2) at various resolutions. The execution times for the tasks are shown in Fig. 6, Fig. 7, Fig. 8, and Fig. 9. The results indicate that by utilizing an edge node, we can get a comparable amount of computation power close to the clients for computation-intensive tasks. Another observation is that, due to the uneven optimizations on heterogeneous CPU architectures, some tasks are better kept local while some others should be

Figure 7: OpenALPR profile result of client type 2 (RPi 3, quad-core 1.2 GHz). [Bar chart: per-stage execution time in ms for the same workloads as Fig. 6.]

Figure 8: OpenALPR profile result of a type of edge node (i7 quad-core 2.30 GHz). [Bar chart: per-stage execution time in ms for the same workloads as Fig. 6.]

Figure 9: OpenALPR profile of a type of cloud node (AWS EC2 t2.large, Xeon dual-core 2.40 GHz). [Bar chart: per-stage execution time in ms for the same workloads as Fig. 6.]

offloaded to the edge computing node. This observation justifies the need for computation offloading between clients and edge nodes.


Figure 10: The comparison of task selection impacts on edge offloading and cloud offloading for wired clients (RPi 2). [Bar chart: response time per frame per client (s) across workload 2 resolutions, for Client-edge opt, Client only, Edge only, Client-cloud opt, and Cloud only.]

Figure 11: The comparison of task selection impacts on edge offloading and cloud offloading for 2.4 GHz wireless clients (RPi 3). [Bar chart: response time per frame per client (s) across workload 2 resolutions, for the same five configurations as Fig. 10.]

6.4 Offloading Task Selection
To understand how much the execution time can be reduced by splitting tasks between the client and the edge, or between the client and the cloud, we design an experiment with workloads generated from dataset 2 under two scenarios: 1) one edge node provides service to three wired client nodes that have the best network latency and bandwidth; 2) one edge node provides service to three wireless 2.4 GHz client nodes that have high-variance latency and relatively low bandwidth. The result of the first case is very straightforward: the clients simply upload all the input data and run all the tasks on the edge node in edge offloading, or on the cloud node in cloud offloading, as shown in Fig. 10. This is mainly because using an Ethernet cable stably provides the lowest latency and highest bandwidth, which makes offloading to the edge very rewarding. We didn't evaluate 5 GHz wireless clients, since this interface is not supported on our client hardware, but we anticipate results similar to the wired case. We plot the result of a 2.4 GHz wireless client node offloading to an edge node or a remote cloud node in the second case in Fig. 11. Overall, the results show that by offloading tasks to an edge computing platform,

the application we chose experienced a speedup of up to 4.0x in the wired client-edge configuration compared to local execution, and up to 1.7x compared to a similar client-cloud configuration. For clients with a 2.4 GHz wireless interface, the speedup is up to 1.3x in the client-edge configuration compared to local execution, and up to 1.2x compared to a similar client-cloud configuration.

Figure 12: The comparison result of three task prioritizing schemes. [Line chart: response time (s) versus the number of task offloading requests (5–35), for our scheme, SIOF, and LCPUL.]

6.5 Edge-front Task Queue Prioritizing
To evaluate the performance of the task queue prioritizing, we collect statistical results from our profiler service and monitoring service on various workloads for simulation. We choose the simulation method because we can freely set up the numbers and types of client and edge nodes, overcoming the limitation of our current testbed and letting us evaluate more complex deployments. We add two simple schemes as baselines: 1) shortest I/O first (SIOF), which sorts all the tasks by the time cost of the network transmission; 2) longest CPU last (LCPUL), which sorts all the tasks by the time cost of the processing on the edge node. In the simulation, based on the combinations of client device types, workloads, and offloading decisions, we have in total seven types of jobs to run on the edge node. We increase the total number of jobs, distribute them evenly among the seven types, and report the makespan time in Fig. 12. The results show that LCPUL is the worst among the three schemes, and that our scheme outperforms the shortest job first scheme.
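The intuition behind these orderings can be illustrated with a toy two-stage makespan simulation, where each job is a (transmission, CPU) pair and the two stages pipeline across jobs. The job costs below are made-up numbers, and the Johnson's-rule [19] ordering is shown only as a classical reference point for two-stage scheduling, not as the scheme evaluated in Fig. 12.

```python
def makespan(jobs):
    """Two-stage pipeline makespan: each job is (io_time, cpu_time).
    Transmission (stage 1) is sequential; processing (stage 2) for a
    job starts once its transmission and the previous job's
    processing have both finished."""
    io_done = cpu_done = 0.0
    for io, cpu in jobs:
        io_done += io
        cpu_done = max(cpu_done, io_done) + cpu
    return cpu_done

jobs = [(1, 4), (5, 3), (3, 1)]   # hypothetical (transmission, CPU) costs

siof = sorted(jobs, key=lambda j: j[0])       # shortest I/O first
lcpul = sorted(jobs, key=lambda j: j[1])      # longest CPU last
# Johnson's rule for two-machine flow shops: jobs with io < cpu first,
# in increasing io; then the rest, in decreasing cpu.
johnson = (sorted((j for j in jobs if j[0] < j[1]), key=lambda j: j[0]) +
           sorted((j for j in jobs if j[0] >= j[1]), key=lambda j: -j[1]))
# makespans: 12 (SIOF), 15 (LCPUL), 10 (Johnson)
```

Even on three jobs, LCPUL trails the others, which matches the relative ordering observed in Fig. 12.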

6.6 Inter-Edge Collaboration
We also evaluate the performance of the three task placement schemes (i.e., STTF, SQLF, and SSLF) discussed in Section 5 through a controlled experiment on our testbed. For evaluation purposes, we configure the network in the edge computing system as follows. The first edge node, denoted as "edge node 1" afterwards, has 10 ms RTT and 40 Mbps bandwidth to the edge-front node. The second edge node, "edge node 2", has 20 ms RTT and 20 Mbps bandwidth to the edge-front node. The third edge node, "edge node 3", has 100 ms RTT and 2 Mbps bandwidth to the edge-front node. Thus, we emulate the situation where the three edge nodes sit at distances from near to far from the edge-front node.


Figure 13: Performance with no task placement scheme. [Line chart: throughput (tasks/sec, 0–3) over time (0–12 min) for the edge-front node and edge nodes 1–3.]

Figure 14: Performance of STTF. [Line chart: throughput (tasks/sec, 0–3) over time (0–12 min) for the edge-front node and edge nodes 1–3.]

Figure 15: Performance of SQLF. [Line chart: throughput (tasks/sec, 0–3) over time (0–12 min) for the edge-front node and edge nodes 1–3.]

Figure 16: Performance of SSLF. [Line chart: throughput (tasks/sec, 0–3) over time (0–12 min) for the edge-front node and edge nodes 1–3.]

We use the third dataset to synthesize a workload as follows. In the first 4 minutes, the edge-front node receives 5 task requests per second, edge node 1 receives 4 task requests per second, edge node 2 receives 3 task requests per second, and edge node 3 receives 2 task requests per second, respectively. No task comes to any of the edge nodes after the first 4 minutes. For the SSLF task placement scheme, we implement a simple linear regression to predict the scheduling latency of the task being transmitted, since the workload we have injected is uniformly distributed.
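A simple linear regression predictor of this kind can be sketched as follows; the choice of probe index as the regressor and one-step extrapolation are our assumptions, as the paper does not specify the exact feature.

```python
def predict_next(series):
    """Least-squares line through (i, y_i) for the recorded response
    times; returns the extrapolated value at the next index."""
    n = len(series)
    if n < 2:
        return series[-1] if series else 0.0
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(series) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, series))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept + slope * n   # predicted response time at step n

def pick_sslf_target(histories):
    """Choose the edge node with the smallest predicted response time."""
    return min(histories, key=lambda node: predict_next(histories[node]))

histories = {"edge1": [0.2, 0.3, 0.4],   # trending up
             "edge2": [0.5, 0.4, 0.3]}   # trending down
# predict_next gives 0.5 for edge1 and 0.2 for edge2, so edge2 wins
```

Unlike picking the most recent sample, the fitted trend favors edge2 here even though both nodes' latest measurements are close.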

Fig. 13 illustrates the throughput on each edge node when no task placement scheme is enabled on the edge-front node. The edge-front node has the heaviest workload, and it takes about 12.36 minutes to finish all the tasks. We consider this result as our baseline.

Fig. 14 shows the throughput result of the STTF scheme. In this case, the edge-front node only transmits tasks to edge node 1, because edge node 1 has the highest bandwidth and the shortest RTT to the edge-front node. Fig. 17 reveals that the edge-front node transmits 120 tasks to edge node 1 and no tasks to the other edge nodes. As edge node 1 has a heavier workload than edge node 2 and edge node 3, the STTF scheme has limited improvement on the system performance: the edge-front node takes about 11.29 minutes to finish all the tasks. Fig. 15 illustrates the throughput result of the SQLF

scheme. This scheme works better than the STTF scheme, because the edge-front node transmits more tasks to less-saturated edge nodes, efficiently reducing the workload on the edge-front node. However, the edge-front node tends to transmit many tasks to edge node 3 at the beginning, which has the lowest bandwidth and the longest RTT to the edge-front node. As such, the task placement may incur more delay than expected. From Fig. 17, the edge-front node transmits 0 tasks to edge node 1, 132 tasks to edge node 2, and 152 tasks to edge node 3. The edge-front node takes about 9.6 minutes to finish all the tasks.

Fig. 16 demonstrates the throughput result of the SSLF scheme. This scheme considers both the transmission time of the task being placed and the waiting time in the queue on the target edge node, and therefore achieves the best performance of the three. As mentioned, edge node 1 has the lowest transmission overhead but the heaviest workload among the three edge nodes, while edge node 3 has the lightest workload but the highest transmission overhead. In contrast, edge node 2 has modest transmission overhead and a modest workload. The SSLF scheme takes all these situations into consideration and places the largest number of tasks on edge node 2. As shown in Fig. 17, the edge-front node transmits 4 tasks to edge node 1, 152 tasks to edge node 2, and 148 tasks to edge node 3 when working with the SSLF scheme. The edge-front node


takes about 9.36 minutes to finish all the tasks, which is the best result among the three schemes. We infer that this third scheme will further improve the task completion time if tougher network conditions and workloads are considered.

Figure 17: Numbers of tasks placed by the edge-front node. [Bar chart: tasks placed on edge nodes 1–3 (0–150) under STTF, SQLF, and SSLF.]

7 RELATEDWORKe emergence of edge computing has drawn aentions due to itscapabilities to reshape the land surface of IoTs mobile computingand cloud computing [6 14 32 33 36ndash38] Satyanarayanan [29]has briefed the origin of edge computing also known as fog comput-ing [4] cloudlet [28] mobile edge computing [24] and so on Herewe will review several relevant research elds towards video edgeanalytics including distributed data processing and computationooading in various computing paradigms

7.1 Distributed Data Processing
Distributed data processing has a close relationship to edge analytics, in the sense that those data processing platforms [9, 39] and the underlying techniques [16, 23, 27] can be easily deployed on a cluster of edge nodes. In this paper, we pay special attention to distributed image/video data processing systems. VideoStorm [40] made insightful observations on vision-related algorithms and proposed a resource-quality trade-off with multi-dimensional configurations (e.g., video resolution, frame rate, sampling rate, sliding window size, etc.). The resource-quality profiles are generated offline, and an online scheduler is built to allocate resources to queries to optimize the utility of quality and latency. Their work is complementary to ours, in that we do not consider the trade-off between quality and latency goals via adaptive configurations. Vigil [42] is a wireless video surveillance system that leverages edge computing nodes, with an emphasis on content-aware frame selection in a scenario where multiple web cameras are at the same location, to optimize bandwidth utilization, which is orthogonal to the problems we have addressed here. Firework [41] is a computing paradigm for big data processing in collaborative edge environments, which is complementary to our work in terms of shared data view and programming interface.

While there should be more on-going efforts for investigating the adaptation, improvement, and optimization of existing distributed data processing techniques on edge computing platforms, we focus more on the task/application-level queue management and scheduling, and leave all the underlying resource negotiation and process scheduling to the container cluster engine.

7.2 Computation Offloading
Computation offloading (a.k.a. cyber foraging [28]) has been proposed to improve resource utilization, response time, and energy consumption in various computing environments [7, 13, 21, 31]. Work [17] has quantified the impact of edge computing on mobile applications, and found that edge computing can significantly improve response time and energy consumption for mobile devices through offloading via both WiFi and LTE networks. Mocha [34] has investigated how a two-stage face recognition task from a mobile device can be accelerated by cloudlet and cloud. In their design, clients simply capture images and send them to the cloudlet; the optimal task partition can be easily achieved, as the task has only two stages. In LAVEA, our application is more complicated, with multiple stages, and we leverage client-edge offloading and other techniques to improve resource utilization and optimize the response time.

8 DISCUSSIONS AND LIMITATIONS
In this section, we discuss alternative design options, and point out current limitations and future work that can improve the system.

Measurement-based Offloading. In this paper, we use measurement-based offloading (static offloading), i.e., the offloading decisions are based on the outcome of periodic measurements. We consider this one of the limitations of our implementation, as stated in [15], and several dynamic computation offloading schemes have been proposed [12]. We plan to improve the measurement-based offloading in future work.

Video Streaming. Our current data processing is image-based, which is one of the limitations of our implementation. The input is either in the format of images or a video stream, which is read into frames and sent out. We believe that utilizing existing video streaming techniques between our system components for data sharing would further improve the system performance and open more opportunities for optimization.

Discovering Edge Nodes. There are different ways for the edge-front node to discover the available edge nodes nearby. For example, every edge node intending to serve as a collaborator may open a designated port, so that the edge-front node can periodically scan the network and discover the available edge nodes. This is called the "pull-based" method. In contrast, there is also a "push-based" method, in which the edge-front node opens a designated port, and every edge node intending to serve as a collaborator registers itself to the edge-front node. When the network is at a large scale, the pull-based method usually performs poorly, because the edge-front node may not be able to discover an available edge node in a short time. For this reason, our edge node discovery is implemented in a push-based manner, which guarantees good performance regardless of the network scale.
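The push-based registration on the edge-front side can be sketched as follows; the class, the heartbeat TTL, and the address format are invented for illustration (the actual message format is not specified above).

```python
import threading
import time

class EdgeFrontRegistry:
    """Push-based discovery: the edge-front node listens on a known
    port; collaborator edge nodes register themselves, optionally
    refreshing a heartbeat so that stale nodes can be expired."""

    def __init__(self, ttl=30.0):
        self.ttl = ttl
        self._nodes = {}           # address -> last heartbeat time
        self._lock = threading.Lock()

    def register(self, address):
        # Called when an edge node pushes a registration message.
        with self._lock:
            self._nodes[address] = time.monotonic()

    def available(self):
        # Nodes whose heartbeat is still fresh.
        now = time.monotonic()
        with self._lock:
            return [a for a, t in self._nodes.items() if now - t < self.ttl]

registry = EdgeFrontRegistry()
registry.register("10.0.0.2:7000")   # edge node announces itself
registry.register("10.0.0.3:7000")
# registry.available() lists both nodes, with no network scan needed
```

The edge-front node does O(1) work per registration instead of scanning the whole subnet, which is why the push approach scales with network size.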

9 CONCLUSION
In this paper, we have investigated providing video analytics services to latency-sensitive applications in an edge computing environment.


As a result, we have built LAVEA, a low-latency video edge analytics system, which collaborates with nearby client, edge, and remote cloud nodes, and transforms video feeds into semantic information at places closer to the users, in early stages. We have utilized an edge-front design, formulated an optimization problem for offloading task selection, and prioritized the task queue to minimize the response time. Our results indicate that, by offloading tasks to the closest edge node, the client-edge configuration achieves a speedup ranging from 1.3x to 4x (1.2x to 1.7x) against running locally (against the client-cloud configuration) under various network conditions and workloads. In case of a saturating workload on the front edge node, we have proposed and compared various task placement schemes that are tailored for inter-edge collaboration. The proposed prediction-based shortest scheduling latency first task placement scheme considers both the transmission time of the task and the waiting time in the queue, and delivers better overall performance than the other schemes.

REFERENCES
[1] Amazon Web Services. 2017. AWS Lambda@Edge. http://docs.aws.amazon.com/lambda/latest/dg/lambda-edge.html. (2017).
[2] Christos-Nikolaos E. Anagnostopoulos, Ioannis E. Anagnostopoulos, Ioannis D. Psoroulas, Vassili Loumos, and Eleftherios Kayafas. 2008. License plate recognition from still images and video sequences: A survey. IEEE Transactions on Intelligent Transportation Systems 9, 3 (2008), 377–391.
[3] K. R. Baker. 1990. Scheduling groups of jobs in the two-machine flow shop. Mathematical and Computer Modelling 13, 3 (1990), 29–36.
[4] Flavio Bonomi, Rodolfo Milito, Jiang Zhu, and Sateesh Addepalli. 2012. Fog computing and its role in the internet of things. In Proceedings of the first edition of the MCC workshop on Mobile cloud computing. ACM, 13–16.
[5] Peter Brucker, Bernd Jurisch, and Bernd Sievers. 1994. A branch and bound algorithm for the job-shop scheduling problem. Discrete Applied Mathematics 49, 1 (1994), 107–127.
[6] Yu Cao, Songqing Chen, Peng Hou, and Donald Brown. 2015. FAST: A fog computing assisted distributed analytics system to monitor fall for stroke mitigation. In Networking, Architecture and Storage (NAS), 2015 IEEE International Conference on. IEEE, 2–11.
[7] Eduardo Cuervo, Aruna Balasubramanian, Dae-ki Cho, Alec Wolman, Stefan Saroiu, Ranveer Chandra, and Paramvir Bahl. 2010. MAUI: Making Smartphones Last Longer with Code Offload. In Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services (MobiSys '10). ACM, New York, NY, USA, 49–62. DOI: https://doi.org/10.1145/1814433.1814441
[8] Eyal de Lara, Carolina S. Gomes, Steve Langridge, S. Hossein Mortazavi, and Meysam Roodi. 2016. Hierarchical Serverless Computing for the Mobile Edge. In Edge Computing (SEC), IEEE/ACM Symposium on. IEEE, 109–110.
[9] Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (2008), 107–113.
[10] Shan Du, Mahmoud Ibrahim, Mohamed Shehata, and Wael Badawy. 2013. Automatic license plate recognition (ALPR): A state-of-the-art review. IEEE Transactions on Circuits and Systems for Video Technology 23, 2 (2013), 311–325.
[11] Sadjad Fouladi, Riad S. Wahby, Brennan Shacklett, Karthikeyan Vasuki Balasubramaniam, William Zeng, Rahul Bhalerao, Anirudh Sivaraman, George Porter, and Keith Winstein. 2017. Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 363–376.
[12] Wei Gao, Yong Li, Haoyang Lu, Ting Wang, and Cong Liu. 2014. On exploiting dynamic execution patterns for workload offloading in mobile cloud applications. In Network Protocols (ICNP), 2014 IEEE 22nd International Conference on. IEEE, 1–12.
[13] Mark S. Gordon, Davoud Anoushe Jamshidi, Scott A. Mahlke, Zhuoqing Morley Mao, and Xu Chen. 2012. COMET: Code Offload by Migrating Execution Transparently. In OSDI, Vol. 12. 93–106.
[14] Zijiang Hao, Ed Novak, Shanhe Yi, and Qun Li. 2017. Challenges and Software Architecture for Fog Computing. Internet Computing (2017).
[15] Mohammed A. Hassan, Kshitiz Bhattarai, Qi Wei, and Songqing Chen. 2014. POMAC: Properly Offloading Mobile Applications to Clouds. In 6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 14).
[16] Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy H. Katz, Scott Shenker, and Ion Stoica. 2011. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. In NSDI, Vol. 11. 22–22.
[17] Wenlu Hu, Ying Gao, Kiryong Ha, Junjue Wang, Brandon Amos, Zhuo Chen, Padmanabhan Pillai, and Mahadev Satyanarayanan. 2016. Quantifying the Impact of Edge Computing on Mobile Applications. In Proceedings of the 7th ACM SIGOPS Asia-Pacific Workshop on Systems. ACM, 5.
[18] IBM. 2017. Apache OpenWhisk. http://openwhisk.org. (April 2017).
[19] Selmer Martin Johnson. 1954. Optimal two- and three-stage production schedules with setup times included. Naval Research Logistics Quarterly 1, 1 (1954), 61–68.
[20] Eric Jonas, Shivaram Venkataraman, Ion Stoica, and Benjamin Recht. 2017. Occupy the Cloud: Distributed computing for the 99%. arXiv preprint arXiv:1702.04024 (2017).
[21] Ryan Newton, Sivan Toledo, Lewis Girod, Hari Balakrishnan, and Samuel Madden. 2009. Wishbone: Profile-based Partitioning for Sensornet Applications. In NSDI, Vol. 9. 395–408.
[22] OpenALPR. 2017. OpenALPR – Automatic License Plate Recognition. http://www.openalpr.com. (April 2017).
[23] Kay Ousterhout, Patrick Wendell, Matei Zaharia, and Ion Stoica. 2013. Sparrow: distributed, low latency scheduling. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 69–84.
[24] M. Patel, B. Naughton, C. Chan, N. Sprecher, S. Abeta, A. Neal, and others. 2014. Mobile-edge computing introductory technical white paper. White Paper, Mobile-edge Computing (MEC) industry initiative (2014).
[25] Brad Philip and Paul Updike. 2001. Caltech Vision Group 2001 testing database. http://www.vision.caltech.edu/html-files/archive.html. (2001).
[26] Kari Pulli, Anatoly Baksheev, Kirill Kornyakov, and Victor Eruhimov. 2012. Real-time computer vision with OpenCV. Commun. ACM 55, 6 (2012), 61–69.
[27] Jeff Rasley, Konstantinos Karanasos, Srikanth Kandula, Rodrigo Fonseca, Milan Vojnovic, and Sriram Rao. 2016. Efficient queue management for cluster scheduling. In Proceedings of the Eleventh European Conference on Computer Systems. ACM, 36.
[28] Mahadev Satyanarayanan. 2001. Pervasive computing: Vision and challenges. IEEE Personal Communications 8, 4 (2001), 10–17.
[29] Mahadev Satyanarayanan. 2017. The Emergence of Edge Computing. Computer 50, 1 (2017), 30–39.
[30] Mahadev Satyanarayanan, Pieter Simoens, Yu Xiao, Padmanabhan Pillai, Zhuo Chen, Kiryong Ha, Wenlu Hu, and Brandon Amos. 2015. Edge analytics in the internet of things. IEEE Pervasive Computing 14, 2 (2015), 24–31.
[31] Cong Shi, Karim Habak, Pranesh Pandurangan, Mostafa Ammar, Mayur Naik, and Ellen Zegura. 2014. COSMOS: computation offloading as a service for mobile devices. In Proceedings of the 15th ACM international symposium on Mobile ad hoc networking and computing. ACM, 287–296.
[32] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu. 2016. Edge Computing: Vision and Challenges. IEEE Internet of Things Journal 3, 5 (Oct 2016), 637–646. DOI: https://doi.org/10.1109/JIOT.2016.2579198
[33] Weisong Shi and Schahram Dustdar. 2016. The Promise of Edge Computing. Computer 49, 5 (2016), 78–81.
[34] Tolga Soyata, Rajani Muraleedharan, Colin Funai, Minseok Kwon, and Wendi Heinzelman. 2012. Cloud-Vision: Real-time face recognition using a mobile-cloudlet-cloud acceleration architecture. In Computers and Communications (ISCC), 2012 IEEE Symposium on. IEEE, 000059–000066.
[35] Xiaoli Wang, Aakanksha Chowdhery, and Mung Chiang. 2016. SkyEyes: adaptive video streaming from UAVs. In Proceedings of the 3rd Workshop on Hot Topics in Wireless. ACM, 2–6.
[36] Shanhe Yi, Zijiang Hao, Zhengrui Qin, and Qun Li. 2015. Fog Computing: Platform and Applications. In Hot Topics in Web Systems and Technologies (HotWeb), 2015 Third IEEE Workshop on. IEEE, 73–78.
[37] Shanhe Yi, Cheng Li, and Qun Li. 2015. A Survey of Fog Computing: Concepts, Applications and Issues. In Proceedings of the 2015 Workshop on Mobile Big Data (Mobidata '15). ACM, 37–42.
[38] Shanhe Yi, Zhengrui Qin, and Qun Li. 2015. Security and privacy issues of fog computing: A survey. In Wireless Algorithms, Systems, and Applications. Springer, 685–695.
[39] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: cluster computing with working sets. HotCloud 10 (2010), 10–10.
[40] Haoyu Zhang, Ganesh Ananthanarayanan, Peter Bodik, Matthai Philipose, Paramvir Bahl, and Michael J. Freedman. 2017. Live Video Analytics at Scale with Approximation and Delay-Tolerance. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 377–392.
[41] Q. Zhang, X. Zhang, Q. Zhang, W. Shi, and H. Zhong. 2016. Firework: Big Data Sharing and Processing in Collaborative Edge Environment. In 2016 Fourth IEEE Workshop on Hot Topics in Web Systems and Technologies (HotWeb). 20–25. DOI: https://doi.org/10.1109/HotWeb.2016.12
[42] Tan Zhang, Aakanksha Chowdhery, Paramvir Victor Bahl, Kyle Jamieson, and Suman Banerjee. 2015. The design and implementation of a wireless video surveillance system. In Proceedings of the 21st Annual International Conference on Mobile Computing and Networking. ACM, 426–438.



Figure 4: The architecture of the edge computing platform. [Diagram: an edge computing node runs OS-level virtualization (Docker Engine) on the host OS, hosting the edge-front gateway services in containers: data store service (HDFS, SQL, KV store), offloading service, queueing service (task queue, queue prioritizer), scheduling service (task scheduler, workers), monitoring service, profiler service, and workload optimizer, exposed through the edge computing platform API and an internal API. Clients (security cameras, dash cameras, smartphones and tablets, laptops) connect through an access point and run the edge computing platform SDK/client API, with an application, a profiler/offloading controller, and a local worker stack (task scheduler and workers) in their OS or container.]

When a new client registers itself to the offloading service, after the edge-front node collects enough prerequisite information and statistics, the optimization problem is solved again and the updated offloading decisions are sent to all the clients. Periodically, the offloading service also solves the optimization problem and updates the offloading decisions with its clients.

4 EDGE-FRONT OFFLOADING

In this section, we describe how we select tasks of a job to run remotely on the edge server in order to minimize the response time.

We consider selecting tasks to run on the edge as a computation offloading problem. Traditional offloading problems concern offloading schemes between clients and remote, powerful cloud servers. In the literature [7, 21, 31], those system models usually assume the task is instantly finished remotely once it is offloaded to the server. However, we argue that this assumption will not hold in an edge computing environment, as we need to consider the various delays at the server side, especially when many clients are sending offloading requests. We call it edge-front computation offloading from the perspective of the client:

• Tasks will only be offloaded from a client to the nearest edge node, which we call the edge front.

• The underlying scheduling and processing is agnostic to clients.

• When a mobile node is disconnected from any edge node, or even the cloud, it resorts to local execution of all the tasks.

We assume that the edge node is wire-connected to the access point, which indicates that outgoing traffic can go through the edge node with no additional cost. The only difference between offloading a task to an edge node and to a cloud node is that the task running on the edge node may experience resource contention and scheduling delay, while we assume a task offloaded to a cloud node will get enough resources and be scheduled to run immediately. In the light-workload case, if there is any response time reduction when a task is offloaded to the cloud, then there is definitely a benefit when this task is offloaded to the edge. The reasons are: 1) an edge server is as responsive as a server in the cloud data center; 2) running a task on an edge server experiences shorter data transmission delay, as the client-edge link has much larger bandwidth than the edge-cloud link, which is usually limited and imbalanced by the Internet service providers (ISPs). Therefore, in this section we focus on task offloading only between the client and the edge server, and we will discuss integrating nearby edge nodes for the heavy-workload scenario in the next section.

4.1 Task Offloading System Model and Problem Formulation

In this paper, we call a running instance of an application a job, which is usually a set of tasks. The job is the unit of work that a user submits to our system, while the task is the unit of work for which our system makes scheduling and optimization decisions. The tasks generated from each application will be queued and processed either locally or remotely; by remotely, we mean running the task on an edge node. For simplicity, we consider that all clients are running


instances of applications processing the same kind of jobs, which is typically the case in our edge application scenario. However, our system can easily extend to heterogeneous applications running on each client device.

We choose to work at the granularity of tasks, since tasks are modularized and can be flexibly combined to either achieve speedy processing or form a workflow with high accuracy. In our ALPR application, each task is usually a common computer vision algorithm or library routine. For example, we have analyzed an open source ALPR project called OpenALPR [22] and illustrate its task graph in Fig. 5.

[Figure 5 omitted: the task graph takes an image or video frame as input; Motion Detection emits a motion region (or an image still) to Plate Detection; if no plate is detected the job ends, otherwise each plate candidate goes through Plate Character Analysis and Character Recognition before Result Generation produces the output.]

Figure 5: The task graph of OpenALPR

Then we consider there are N clients and only one edge server connected, as shown in Fig. 1. This edge server could be a single server or a cluster of servers. Each client $i$, $i \in [1, N]$, will process the upcoming job upon request, e.g., recognizing the license plates in video streams. Usually those jobs will generate heavy computation tasks and could benefit from offloading some of them to the edge server. Without loss of generality, we use a graph of tasks to represent the complex task relations inside a job, which is essentially similar to the method call graph in [7] but at a coarser granularity. For a certain kind of job, we start with its directed acyclic graph (DAG) $G = (V, E)$, which gives the task execution sequence. Each vertex $v \in V$ is weighted by the computation or memory cost of a task ($c_v$), while each edge $e = (u, v)$, $u, v \in V$, $e \in E$, is weighted by the data size of the intermediate results ($d_{uv}$). Thus our offloading problem can be taken as a graph partition problem, in which we need to assign a directed graph of tasks to different computing nodes (local, edge, or cloud) with the purpose of minimizing a certain cost. In this paper, we primarily try to minimize the job finish time.
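As an illustration, the task graph abstraction above can be encoded directly. The sketch below uses hypothetical task names and made-up cost/data values modeled loosely on the OpenALPR pipeline of Fig. 5 (none of these numbers come from the paper's measurements):

```python
# Hypothetical encoding of a job's task DAG G = (V, E):
# vertex weights c_v (computation cost) and edge weights d_uv
# (intermediate data size). All values are illustrative.
cost = {                     # c_v, e.g., in Mcycles
    "motion_detection": 50,
    "plate_detection": 400,
    "plate_analysis": 120,
    "ocr": 200,
}
data = {                     # d_uv, e.g., in KB
    ("motion_detection", "plate_detection"): 300,
    ("plate_detection", "plate_analysis"): 40,
    ("plate_analysis", "ocr"): 20,
}

def topological_order(cost, data):
    """Kahn's algorithm: return tasks in a valid execution order."""
    indeg = {v: 0 for v in cost}
    for _, v in data:
        indeg[v] += 1
    ready = [v for v, d in indeg.items() if d == 0]
    order = []
    while ready:
        u = ready.pop()
        order.append(u)
        for (a, b) in data:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    ready.append(b)
    return order

print(topological_order(cost, data))
# -> ['motion_detection', 'plate_detection', 'plate_analysis', 'ocr']
```

A valid topological order of this DAG is exactly the task execution sequence the formulation below partitions between client and edge.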

The remote response time includes the communication delay, the network transmission delay of sending data to the edge server, and the execution time on that server. We use an indicator $I_{vi} \in \{0, 1\}$ for all $v \in V$ and all $i \in [1, N]$: if $I_{vi} = 1$, the task $v$ at client $i$ will run locally; otherwise it will run on the remote edge server. For the tasks running locally, the total execution time for client $i$ is the sum

$$T_i^{local} = \sum_{v \in V} I_{vi} \frac{c_v}{p_i} \qquad (1)$$

where $p_i$ is the processor speed of client $i$. Similarly, we use

$$\bar{T}_i^{local} = \sum_{v \in V} (1 - I_{vi}) \frac{c_v}{p_i} \qquad (2)$$

to represent the execution time of running the offloaded tasks locally. For the network, when there is an offloading decision, the client needs to upload the intermediate data (outputs of previous tasks, application status variables, configurations, etc.) to the edge server in order to continue the computation. The network delay is modeled as

$$T_i^{net} = \sum_{(u,v) \in E} (I_{ui} - I_{vi}) \frac{d_{uv}}{r_i} + \beta_i \qquad (3)$$

where $r_i$ is the bandwidth assigned to this client and $\beta_i$ is the communication latency, which can be estimated using the round trip time between client $i$ and the edge server.

For each client, the remote execution time is

$$T_i^{remote} = \sum_{v \in V} (1 - I_{vi}) \frac{c_v}{p_0} \qquad (4)$$

where $p_0$ is the processor speed of the edge server. Then our offloading task selection problem can be formulated as

$$\min_{I_i, r_i} \sum_{i=1}^{N} \left( T_i^{local} + T_i^{net} + T_i^{remote} \right) \qquad (5)$$

The offloading task selection is represented by the indicator matrix $I$. This optimization problem is subject to the following constraints:

• The total bandwidth:
$$\text{s.t.} \quad \sum_{i=1}^{N} r_i \le R \qquad (6)$$

• Like existing work, we restrict the data flow to avoid the ping-pong effect, in which intermediate data is transmitted back and forth between the client and the edge server:
$$\text{s.t.} \quad I_{vi} \le I_{ui}, \quad \forall e(u,v) \in E, \ \forall i \in [1, N] \qquad (7)$$

• Unlike existing offloading frameworks for mobile cloud computing, we take the resource contention or scheduling delay at the edge side into consideration by adding an end-to-end delay constraint:
$$\text{s.t.} \quad \bar{T}_i^{local} - (T_i^{net} + T_i^{remote}) > \tau, \quad \forall i \in [1, N] \qquad (8)$$
where $\tau$ can be tuned to avoid selecting borderline tasks that, if offloaded, would yield no gain due to the resource contention or scheduling delay at the edge.


4.2 Optimization Solver

The proposed optimization is a mixed integer non-linear programming (MINLP) problem, where the integer variables stand for the offloading decisions and the continuous variables stand for the bandwidth allocation. To solve this optimization problem, we start by relaxing the integer constraints and solving the non-linear programming version of the problem using the Sequential Quadratic Programming method, a constrained nonlinear optimization method. This solution is optimal without considering the integer constraints. Starting from this optimal solution, we optionally employ the branch and bound (B&B) method to search for the optimal integer solution, or simply do an exhaustive search when the number of clients and the number of tasks in each job are small.
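For intuition, a minimal sketch of the exhaustive-search fallback is shown below for a single client with a fixed bandwidth allocation (so the continuous variables drop out). It enumerates indicator vectors, filters by the ping-pong constraint (7), and scores each with Eqs. (1), (3) and (4); all names and numbers are illustrative, not the paper's implementation:

```python
from itertools import product

def job_finish_time(I, c, d, edges, p_cli, p_edge, r, beta):
    """Objective (5) for one client: local + network + remote time.
    I[v] = 1 means task v runs locally (per Eqs. (1), (3), (4));
    beta is the client-edge communication latency, added as in Eq. (3)."""
    t_local = sum(I[v] * c[v] / p_cli for v in c)
    t_remote = sum((1 - I[v]) * c[v] / p_edge for v in c)
    t_net = sum((I[u] - I[v]) * d[(u, v)] / r for (u, v) in edges) + beta
    return t_local + t_net + t_remote

def best_offloading(c, d, edges, p_cli, p_edge, r, beta):
    """Enumerate all indicator vectors, keep those satisfying the
    no-ping-pong constraint (7), and return the minimizer."""
    tasks = list(c)
    best = None
    for bits in product([0, 1], repeat=len(tasks)):
        I = dict(zip(tasks, bits))
        if any(I[v] > I[u] for (u, v) in edges):  # constraint (7)
            continue
        t = job_finish_time(I, c, d, edges, p_cli, p_edge, r, beta)
        if best is None or t < best[0]:
            best = (t, I)
    return best

# Toy two-task chain: a fast edge server (p_edge >> p_cli) makes
# offloading everything the best choice here.
print(best_offloading({"a": 10, "b": 100}, {("a", "b"): 50},
                      [("a", "b")], p_cli=1, p_edge=10, r=10, beta=0.1))
```

The B&B refinement described above would replace the full enumeration with a search tree over the relaxed solution; the brute force shown here is only viable for small task graphs, as the paper notes.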

4.3 Prioritizing the Edge Task Queue

The offloading strategy produced by the task selection optimizes the "flow" time of each type of job. At each time epoch during run time, the edge-front node receives a large number of offloaded tasks from the clients. Originally, we followed the first-come-first-serve rule to accommodate all the client requests: for each request at the head of the task queue, the edge-front server first checks if the input or intermediate data (e.g., images or videos) is available at the edge; otherwise the server waits. This scheme is easy to implement, but substantial computation is wasted if the network I/O is busy with a large file and no task is ready for processing. Therefore, we improve the task scheduling with a task queue prioritizer, which maintains a task sequence minimizing the makespan of all offloading task requests received at a certain time epoch, since the edge node can execute a task only when its input data has been fully received and the tasks it depends on have finished execution. We consider that an offloaded task has to go through two stages: the first stage is the retrieval of the input or intermediate data and state variables; the second stage is the execution of the task.

We model our scheduling problem using the flow job shop model and apply Johnson's rule [19]. This scheme is optimal, and the makespan is minimized, when the number of stages is two. Nevertheless, this model only fits the case where all submitted job requests are independent and have no priorities. When considering task dependencies, a successor can only start after its predecessor finishes. By enforcing the topological ordering constraints, the problem can be solved optimally using the B&B method [5]. However, this solution hardly scales with the number of tasks. In this case, we adapt the method in [3]: we group tasks with dependencies and execute all tasks in a group sequentially. The basic idea is to apply Johnson's rule at two levels. The first level decides the sequence of tasks within each group; the difference in our problem is that we need to decide the best sequence among all valid topological orderings. The bottom level is then a job shop scheduling problem over grouped jobs (i.e., groups of tasks with dependencies in topological ordering), to which we can apply Johnson's rule directly.
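A compact sketch of the two-stage case (independent tasks, no dependencies) follows; stage 1 is data retrieval over the network and stage 2 is execution, as above. The job names and times are invented for illustration:

```python
def johnson_order(jobs):
    """Johnson's rule for a two-machine flow shop.
    jobs: {name: (t_stage1, t_stage2)}. Jobs whose stage-1 time is at
    most their stage-2 time go first in ascending stage-1 order; the
    rest go last in descending stage-2 order. The resulting sequence
    minimizes the makespan for two stages."""
    first = sorted((n for n, (s1, s2) in jobs.items() if s1 <= s2),
                   key=lambda n: jobs[n][0])
    last = sorted((n for n, (s1, s2) in jobs.items() if s1 > s2),
                  key=lambda n: jobs[n][1], reverse=True)
    return first + last

def makespan(seq, jobs):
    """Completion time of the last job: stage 2 (CPU) can start only
    after its own stage-1 transfer and the previous stage-2 finish."""
    t1 = t2 = 0
    for n in seq:
        s1, s2 = jobs[n]
        t1 += s1               # stage 1 (network) runs sequentially
        t2 = max(t2, t1) + s2  # stage 2 (CPU) waits for the data
    return t2

jobs = {"A": (3, 6), "B": (5, 2), "C": (1, 2)}
print(johnson_order(jobs))                    # -> ['C', 'A', 'B']
print(makespan(johnson_order(jobs), jobs))    # -> 12
```

For these three jobs, every other permutation yields a makespan of 13 or more, matching the optimality claim for the two-stage case.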

4.4 Workload Optimizer

If the workload is overwhelming and the edge-front server is saturated, the task queue will be unstable and the response time will accumulate indefinitely. There are several measures we can take to address this problem. First, we can adjust the image/video resolution in client-side configurations, which makes a good trade-off between speed and accuracy. Second, by constraining the task offloading problem, we can keep more computation tasks at the client side. Third, if there are nearby edge nodes that are favored in terms of latency, bandwidth and computation, we can further offload tasks to those nodes; we investigate this case with performance improvement considerations in Section 5. Last, we can always redirect tasks to the remote cloud, just like offloading in MCC.

5 INTER-EDGE COLLABORATION

In this section, we improve our edge-first design by taking into consideration the case when the incoming workload saturates our edge-front node. We first discuss our motivation for providing such an option and list the corresponding challenges. Then we introduce several collaboration schemes we have proposed and investigated.

5.1 Motivation and Challenges

The resources of an edge computing node are much richer than those of client nodes, but are relatively limited compared to cloud nodes. While serving an increasing number of nearby client nodes, the edge-front node will eventually be overloaded and become non-responsive to new requests. As a baseline, we can optionally choose to offload further requests to the remote cloud. We assume that the remote cloud has unlimited resources and is capable of handling all the requests. However, when running tasks remotely in the cloud, the application needs to bear with unpredictable latency and limited bandwidth, which is not the best choice, especially when there are other nearby edge nodes that can accommodate those tasks. We assume that when all available nearby edge nodes are exhausted, the mobile-edge-cloud computing paradigm simply falls back to the mobile cloud computing paradigm; the fallback design is not in the scope of this paper. In this paper, we mainly investigate inter-edge collaboration with the prime purpose of alleviating the burden on the edge-front node.

When the edge-front node is saturated with requests, it can collaborate with nearby edge nodes by placing some tasks on these not-so-busy edge nodes, such that all the tasks can get scheduled in a reasonable time. This is slightly different from balancing the workload among the edge nodes and the edge-front node, in that the goal of inter-edge collaboration is to better serve the client nodes with submitted requests rather than simply making the workload balanced. For example, an edge-front node that is not overloaded does not need to place any tasks on the nearby edge nodes, even when they are idle.

The challenges of inter-edge collaboration are two-fold: 1) we need to design a proper inter-edge task placement scheme that fulfills our goal of reducing the workload on the edge-front node while offloading a proper amount of workload to qualified edge nodes; 2) the task placement scheme should be lightweight, scalable and easy to implement.


5.2 Inter-Edge Task Placement Schemes

We have investigated three task placement schemes for inter-edge collaboration:

• Shortest Transmission Time First (STTF)
• Shortest Queue Length First (SQLF)
• Shortest Scheduling Latency First (SSLF)

The STTF task placement scheme tends to place tasks on the edge node that has the shortest estimated latency for the edge-front node to transfer the tasks. The edge-front node maintains a table recording the latency of transmitting data to each available edge node. Periodic re-calibration is necessary because the network condition between the edge-front node and the other edge nodes may vary from time to time.

The SQLF task placement scheme, on the other hand, tends to transfer tasks from the edge-front node to the edge node which has the fewest tasks queued at the time of query. When the edge-front node is saturated with requests, it first queries all the available edge nodes about their current task queue lengths, and then transfers tasks to the edge node that reports the shortest value.

The SSLF task placement scheme tends to transmit tasks from the edge-front node to the edge node that is predicted to have the shortest response time. The response time is the interval between the time when the edge-front node submits a task to an available edge node and the time when it receives the result of the task from that edge node. Unlike in the SQLF task placement scheme, where the edge-front node keeps querying the edge nodes about their queue lengths, which may cause performance issues as the number of nodes scales up and result in a large volume of queries, we have designed a novel method for the edge-front node to measure the scheduling latency efficiently. During the measurement phase, before the edge-front node chooses a task placement target, it sends a request message to each available edge node, which appends a special task to the tail of its task queue. When the special task is executed, the edge node simply sends a response message back, and the edge-front node records the response time. In this way, the edge-front node periodically maintains a series of response times for each available edge node. When the edge-front node is saturated, it starts to reassign tasks to the edge node having the shortest response time. Unlike the STTF and SQLF task assignment schemes, which choose the target edge node based on the current or most recent measurements, the SSLF scheme predicts the current response time for each edge node by applying regression analysis to the response time series recorded so far. The reason is that the edge nodes are also receiving task requests from client nodes, and their local workload may vary from time to time, so the most recent response time cannot serve as a good predictor of the current response time. As the real-world local workload on each edge node usually follows a certain pattern or trend, applying regression analysis to the recorded response times is a good way to estimate the current response time. To this end, we record measurements of response times from each edge node and offload tasks to the edge node that is predicted to have the least current response time. Once the edge-front node starts to place tasks on a certain edge node, the estimation is updated by piggybacking on the redirected tasks, which lowers the measurement overhead.
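The prediction step can be sketched as an ordinary least-squares fit per node, extrapolated to the present (the evaluation in Section 6 uses a simple linear regression); the node names and sample values here are hypothetical:

```python
def predict_response_time(samples, now):
    """Least-squares linear fit of response time vs. probe time for one
    edge node, extrapolated to `now`. samples: list of (t, rt) pairs."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:                      # all probes at the same instant
        return mean_r
    slope = sum((t - mean_t) * (r - mean_r) for t, r in samples) / var
    return mean_r + slope * (now - mean_t)

def pick_target(history, now):
    """Choose the edge node predicted to respond fastest right now.
    history: {node_name: [(t, response_time), ...]}"""
    return min(history, key=lambda n: predict_response_time(history[n], now))

# Hypothetical probe series: edge1's response time is trending up,
# edge2's is trending down, so SSLF would pick edge2 at time 3.
history = {"edge1": [(0, 1.0), (1, 1.2), (2, 1.4)],
           "edge2": [(0, 2.0), (1, 1.5), (2, 1.0)]}
print(pick_target(history, 3))   # -> edge2
```

This captures why the trend matters: edge2's most recent sample (1.0 s) is higher than edge1's oldest (1.0 s at t=0), yet its downward trend makes it the better target.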

Each of the task placement schemes described above has advantages and disadvantages. For instance, the STTF scheme can quickly reduce the workload on the edge-front node, but tasks may be placed on an edge node which already has an intensive workload, as the STTF scheme gathers no information about the workload on the target. The SQLF scheme works well when the network latency and bandwidth are stable among all the available edge nodes; when the network overheads are highly variant, this scheme fails to factor in the network condition, as it always chooses the edge node with the lowest workload. When an intensive workload is placed under a high network overhead, this scheme potentially deteriorates the performance, as it needs to measure the workload frequently. The SSLF task placement scheme estimates the response time of each edge node by following the task-offloading process, and the response time is a good indicator of which edge node should be chosen as the target of task placement in terms of both workload and network overhead. The SSLF scheme strikes a good trade-off between the previous two schemes. However, the regression analysis may introduce a large error into the predicted response time if inappropriate models are selected. We believe that the decision of which task placement scheme to employ for good system performance should always give proper consideration to the workload and network conditions. We evaluate these three schemes through a case study in the next section.

6 SYSTEM IMPLEMENTATION AND PERFORMANCE EVALUATION

In this section, we first brief the implementation details of building our system. Next, we introduce our evaluation setup and present the results of our evaluations.

6.1 Implementation Details

Our implementation aims at a serverless edge architecture. As shown in the system architecture of Fig. 4, our implementation is based on Docker containers for the benefits of quick deployment and easy management. Every component has been dockerized, and its deployment is greatly simplified via distributing pre-built images; the creation and destruction of Docker instances is much faster than that of VM instances. Inspired by IBM OpenWhisk [18], each worker container contains an action proxy, which uses Python to run any scripts or compile and execute any binary executable. The worker container communicates with others using a message queue, as all the inputs/outputs are JSON-serialized. However, we do not JSON-serialize images/videos; we use their path references in shared storage instead. The task queue is implemented using Redis, as it is in-memory and has very good performance. The end user only needs to 1) deploy our edge computing platform on heterogeneous devices with just one click, 2) define the events of interest using the provided API, and 3) provide a function (scripts or a binary executable) to process such events. The function we have implemented uses the open source project OpenALPR [22] as the task payload for workers.
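The message flow can be sketched as follows. A `collections.deque` stands in for the Redis list so the sketch runs without a Redis server (in the actual system the queue is a Redis list); the message fields and paths below are illustrative, not LAVEA's real schema:

```python
import json
from collections import deque

# Stand-in for the Redis task queue; keeps the sketch self-contained.
task_queue = deque()

def submit(task_name, input_path):
    """Producer: enqueue a JSON-serialized task message. Large inputs
    (images/videos) stay in shared storage; only the path reference is
    serialized, as described above."""
    task_queue.append(json.dumps({"task": task_name, "input": input_path}))

def work_one():
    """Worker: pop one message and hand it to the action proxy
    (represented here by a formatted string)."""
    msg = json.loads(task_queue.popleft())
    return f"ran {msg['task']} on {msg['input']}"

submit("plate_detection", "/shared/frames/0001.jpg")
print(work_one())   # -> ran plate_detection on /shared/frames/0001.jpg
```

With Redis the append/popleft pair would become list push/pop operations against a shared server, giving multiple worker containers access to the same queue.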


6.2 Evaluation Setup

6.2.1 Testbed. We have built a testbed consisting of four edge computing nodes. One of the edge nodes is the edge-front node, which is directly connected to a wireless router using a cable. The other three nodes are set up as nearby edge computing nodes for the evaluation of inter-edge collaboration. These four machines have the same hardware specifications: they all have a quad-core CPU and 4 GB main memory. The three nearby edge nodes are directly connected to the edge-front node through a network cable. We make use of two types of Raspberry Pi (RPi) nodes as clients: one type is the RPi 2, which is wired to the router, while the other is the RPi 3, which is connected to the router using its built-in 2.4 GHz WiFi.

6.2.2 Datasets. We have employed three datasets for the evaluations. One dataset is the Caltech Vision Group 2001 testing database, in which the car rear image resolution (126 images at 896x592) is adequate for license plate recognition [25]. Another dataset is a self-collected 4K video containing rear license plates, taken on an Android smartphone and converted into videos of different resolutions (640x480, 960x720, 1280x960 and 1600x1200). The other dataset, used in the inter-edge collaboration evaluation, contains 22 car images with various resolutions ranging from 405x540 pixels to 2514x1210 pixels (file size 316 KB to 2.85 MB). The task requests use the car images as input in a round-robin way, one car image per task request.

6.3 Task Profiler

Besides the round trip time and bandwidth benchmarks we have presented in Fig. 2 and Fig. 3 to characterize the edge computing network, we have profiled the OpenALPR application on various client, edge and cloud nodes.

[Figure 6 omitted: bar chart of per-task execution times (MotionDetection, PlateDetection, PlateAnalysis, OCR; 0-800 ms) on workload 1 (896x592) and workload 2 (640x480, 960x720, 1280x960, 1600x1200).]

Figure 6: OpenALPR profile result of client type 1 (RPi2, quad-core 0.9 GHz)

In this experiment we use both dataset 1 (workload 1) and dataset 2 (workload 2) at various resolutions. The execution times for the tasks are shown in Fig. 6, Fig. 7, Fig. 8 and Fig. 9. The results indicate that by utilizing an edge node, we can get a comparable amount of computation power close to the clients for computation-intensive tasks. Another observation is that, due to the uneven optimizations on heterogeneous CPU architectures, some tasks are better kept local while some others should be

[Figure 7 omitted: per-task execution times (0-800 ms) for workloads 1 and 2 on the RPi3 client.]

Figure 7: OpenALPR profile result of client type 2 (RPi3, quad-core 1.2 GHz)

[Figure 8 omitted: per-task execution times (0-800 ms) for workloads 1 and 2 on the edge node.]

Figure 8: OpenALPR profile result of a type of edge node (i7 quad-core, 2.30 GHz)

[Figure 9 omitted: per-task execution times (0-800 ms) for workloads 1 and 2 on the cloud node.]

Figure 9: OpenALPR profile result of a type of cloud node (AWS EC2 t2.large, Xeon dual-core 2.40 GHz)

offloaded to the edge computing node. This observation justifies the need for computation offloading between clients and edge nodes.


[Figure 10 omitted: response time per frame per client (0-4 s) for workload 2 at each resolution, comparing Client-edge opt, Client only, Edge only, Client-cloud opt, and Cloud only.]

Figure 10: The comparison of task selection impacts on edge offloading and cloud offloading for wired clients (RPi2)

[Figure 11 omitted: response time per frame per client (0-4 s) for workload 2 at each resolution under the same five configurations.]

Figure 11: The comparison of task selection impacts on edge offloading and cloud offloading for 2.4 GHz wireless clients (RPi3)

6.4 Offloading Task Selection

To understand how much the execution time can be reduced by splitting tasks between the client and the edge, or between the client and the cloud, we design an experiment with workloads generated from dataset 2 in two scenarios: 1) one edge node provides service to three wired client nodes that have the best network latency and bandwidth; 2) one edge node provides service to three wireless 2.4 GHz client nodes that have high-variance latency and relatively low bandwidth. The result of the first case is very straightforward: the clients simply upload all the input data and run all the tasks on the edge node in edge offloading, or on the cloud node in cloud offloading, as shown in Fig. 10. This is mainly because the Ethernet cable stably provides the lowest latency and highest bandwidth, which makes offloading to the edge very rewarding. We did not evaluate 5 GHz wireless clients, since this interface is not supported on our client hardware, but we anticipate results similar to the wired case. We plot the result of a 2.4 GHz wireless client node offloading to an edge node or a remote cloud node in the second case in Fig. 11. Overall, the results show that by offloading tasks to an edge computing platform, the application we chose experienced a speedup of up to 4.0x on the wired client-edge configuration compared to local execution, and up to 1.7x compared to a similar client-cloud configuration. For clients with a 2.4 GHz wireless interface, the speedup is up to 1.3x on the client-edge configuration compared to local execution, and up to 1.2x compared to a similar client-cloud configuration.

[Figure 12 omitted: response time (s) against the number of task offloading requests (5 to 35) for our scheme, SIOF and LCPUL.]

Figure 12: The comparison result of three task prioritizing schemes

6.5 Edge-front Task Queue Prioritizing

To evaluate the performance of the task queue prioritizing, we collect statistical results from our profiler service and monitoring service on various workloads for simulation. We choose the simulation method because we can freely set up the numbers and types of client and edge nodes, overcoming the limitations of our current testbed and allowing us to evaluate more complex deployments. We add two simple schemes as baselines: 1) shortest I/O first (SIOF), which sorts all the tasks by the time cost of the network transmission; 2) longest CPU last (LCPUL), which sorts all the tasks by the time cost of the processing on the edge node. In the simulation, based on the combination of client device types, workloads and offloading decisions, we have in total seven types of jobs to run on the edge node. We increase the total number of jobs, distribute them evenly among the seven types, and report the makespan time in Fig. 12. The results show that LCPUL is the worst among the three schemes and that our scheme outperforms the SIOF scheme.

6.6 Inter-Edge Collaboration

We also evaluate the performance of the three task placement schemes (i.e., STTF, SQLF and SSLF) discussed in Section 5 through a controlled experiment on our testbed. For evaluation purposes, we configure the network in the edge computing system as follows. The first edge node, denoted as "edge node 1" hereafter, has 10 ms RTT and 40 Mbps bandwidth to the edge-front node. The second edge node, "edge node 2", has 20 ms RTT and 20 Mbps bandwidth to the edge-front node. The third edge node, "edge node 3", has 100 ms RTT and 2 Mbps bandwidth to the edge-front node. Thus we emulate a situation where the three edge nodes are at distances from the edge-front node ranging from near to far.


[Figures 13-16 omitted: each plots throughput (tasks/sec, 0-3) over time (0-12 min) on the edge-front node and edge nodes 1-3.]

Figure 13: Performance with no task placement scheme

Figure 14: Performance of STTF

Figure 15: Performance of SQLF

Figure 16: Performance of SSLF

We use the third dataset to synthesize a workload as follows. In the first 4 minutes, the edge-front node receives 5 task requests per second, edge node 1 receives 4 task requests per second, edge node 2 receives 3 task requests per second, and edge node 3 receives 2 task requests per second, respectively. No task comes to any of the edge nodes after the first 4 minutes. For the SSLF task placement scheme, we implement a simple linear regression to predict the scheduling latency of the task being transmitted, since the workload we inject is uniformly distributed.

Fig. 13 illustrates the throughput on each edge node when no task placement scheme is enabled on the edge-front node. The edge-front node has the heaviest workload, and it takes about 12.36 minutes to finish all the tasks. We consider this result as our baseline.

Fig. 14 shows the throughput result of the STTF scheme. In this case, the edge-front node only transmits tasks to edge node 1, because edge node 1 has the highest bandwidth and the shortest RTT to the edge-front node. Fig. 17 reveals that the edge-front node transmits 120 tasks to edge node 1 and no tasks to the other edge nodes. As edge node 1 has a heavier workload than edge node 2 and edge node 3, the STTF scheme has limited improvement on the system performance: the edge-front node takes about 11.29 minutes to finish all the tasks. Fig. 15 illustrates the throughput result of the SQLF

scheme. This scheme works better than the STTF scheme, because the edge-front node transmits more tasks to less-saturated edge nodes, efficiently reducing the workload on the edge-front node. However, the edge-front node tends to transmit many tasks to edge node 3 at the beginning, which has the lowest bandwidth and the longest RTT to the edge-front node. As such, the task placement may incur more delay than expected. From Fig. 17, the edge-front node transmits 0 tasks to edge node 1, 132 tasks to edge node 2, and 152 tasks to edge node 3. The edge-front node takes about 9.6 minutes to finish all the tasks.

Fig. 16 demonstrates the throughput result of the SSLF scheme. This scheme considers both the transmission time of the task being placed and the waiting time in the queue on the target edge node, and therefore achieves the best performance of the three. As mentioned, edge node 1 has the lowest transmission overhead but the heaviest workload among the three edge nodes, while edge node 3 has the lightest workload but the highest transmission overhead. In contrast, edge node 2 has a modest transmission overhead and a modest workload. The SSLF scheme takes all these situations into consideration and places the largest number of tasks on edge node 2. As shown in Fig. 17, the edge-front node transmits 4 tasks to edge node 1, 152 tasks to edge node 2, and 148 tasks to edge node 3 when working with the SSLF scheme. The edge-front node


takes about 9.36 minutes to finish all the tasks, which is the best result among the three schemes. We infer that the SSLF scheme would further improve the task completion time if tougher network conditions and workloads were considered.

[Figure 17 omitted: bar chart of the number of tasks the edge-front node placed on edge nodes 1-3 (0-150) under STTF, SQLF and SSLF.]

Figure 17: Numbers of tasks placed by the edge-front node

7 RELATED WORK

The emergence of edge computing has drawn attention due to its capability to reshape the landscape of IoT, mobile computing and cloud computing [6, 14, 32, 33, 36-38]. Satyanarayanan [29] has briefed the origin of edge computing, also known as fog computing [4], cloudlet [28], mobile edge computing [24], and so on. Here we review several research fields relevant to video edge analytics, including distributed data processing and computation offloading in various computing paradigms.

7.1 Distributed Data Processing

Distributed data processing has a close relationship to edge analytics, in the sense that those data processing platforms [9, 39] and underlying techniques [16, 23, 27] can be easily deployed on a cluster of edge nodes. In this paper, we pay special attention to distributed image/video data processing systems. VideoStorm [40] made insightful observations on vision-related algorithms and proposed a resource-quality trade-off with multi-dimensional configurations (e.g., video resolution, frame rate, sampling rate, sliding window size, etc.); the resource-quality profiles are generated offline, and an online scheduler allocates resources to queries to optimize the utility of quality and latency. Their work is complementary to ours, in that we do not consider the trade-off between quality and latency goals via adaptive configurations. Vigil [42] is a wireless video surveillance system that leverages edge computing nodes, with an emphasis on content-aware frame selection in a scenario where multiple web cameras are at the same location, to optimize bandwidth utilization; this is orthogonal to the problems we address here. Firework [41] is a computing paradigm for big data processing in collaborative edge environments, which is complementary to our work in terms of the shared data view and programming interface.

While there should be more on-going efforts investigating the adaptation, improvement and optimization of existing distributed data processing techniques on edge computing platforms, we focus on task/application-level queue management and scheduling, and leave the underlying resource negotiation and process scheduling to the container cluster engine.

7.2 Computation Offloading

Computation offloading (a.k.a. cyber foraging [28]) has been proposed to improve resource utilization, response time and energy consumption in various computing environments [7, 13, 21, 31]. Work [17] has quantified the impact of edge computing on mobile applications and found that edge computing can improve response time and energy consumption significantly for mobile devices through offloading via both WiFi and LTE networks. Mocha [34] has investigated how a two-stage face recognition task from a mobile device can be accelerated by cloudlet and cloud. In their design, clients simply capture images and send them to the cloudlet. The optimal task partition can be easily achieved as the task has only two stages. In LAVEA, our application is more complicated, with multiple stages, and we leverage client-edge offloading and other techniques to improve resource utilization and optimize the response time.

8 DISCUSSIONS AND LIMITATIONS

In this section, we discuss alternative design options, and point out current limitations and future work that can improve the system.

Measurement-based Offloading. In this paper, we use measurement-based (static) offloading, i.e., the offloading decisions are based on the outcome of periodic measurements. We consider this one of the limitations of our implementation, as stated in [15], and several dynamic computation offloading schemes have been proposed [12]. We plan to improve the measurement-based offloading in future work.

Video Streaming. Our current data processing is image-based, which is one of the limitations of our implementation. The input is either in the format of images or a video stream, which is read into frames and sent out. We believe that utilizing existing video streaming techniques between our system components for data sharing will further improve the system performance and open more potential opportunities for optimization.

Discovering Edge Nodes. There are different ways for the edge-front node to discover the available edge nodes nearby. For example, every edge node intending to serve as a collaborator may open a designated port, so that the edge-front node can periodically scan the network and discover the available edge nodes. This is called the "pull-based" method. In contrast, there is also a "push-based" method, in which the edge-front node opens a designated port and every edge node intending to serve as a collaborator registers with the edge-front node. When the network is large in scale, the pull-based method usually performs poorly, because the edge-front node may not be able to discover an available edge node in a short time. For this reason, the edge node discovery is implemented in a push-based manner, which guarantees good performance regardless of the network scale.
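As an illustration of the push-based method, the sketch below has each collaborator node open a connection to the edge-front node's designated port and push a registration message. This is a minimal stand-in, not LAVEA's actual discovery code: the class name `EdgeRegistry` and the JSON message schema are our own assumptions.

```python
import json
import socket
import threading
import time

class EdgeRegistry:
    """Edge-front side: accept registration messages pushed by edge nodes."""

    def __init__(self, host="127.0.0.1"):
        self.nodes = {}            # node_id -> (address, last_seen timestamp)
        self._lock = threading.Lock()
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self._sock.bind((host, 0))     # port 0: let the OS pick one for the demo
        self._sock.listen()
        self.port = self._sock.getsockname()[1]
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        # Accept registrations one at a time; enough for this sketch.
        while True:
            conn, addr = self._sock.accept()
            with conn:
                msg = json.loads(conn.recv(4096).decode())
                with self._lock:
                    self.nodes[msg["node_id"]] = (addr[0], time.time())

def register(node_id, registry_host, registry_port):
    """Collaborator side: push a registration to the edge-front node."""
    with socket.create_connection((registry_host, registry_port)) as s:
        s.sendall(json.dumps({"node_id": node_id}).encode())

registry = EdgeRegistry()
register("edge-node-1", "127.0.0.1", registry.port)
time.sleep(0.3)                    # let the server thread process the message
print(sorted(registry.nodes))      # -> ['edge-node-1']
```

Because nodes announce themselves, the edge-front node's view of available collaborators stays current without any network scanning, which is what makes the push-based method scale.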

9 CONCLUSION

In this paper, we have investigated providing video analytics services to latency-sensitive applications in the edge computing environment.

LAVEA: Latency-aware Video Analytics on Edge Computing Platform. SEC '17, October 12-14, 2017, San Jose / Silicon Valley, CA, USA

As a result, we have built LAVEA, a low-latency video edge analytics system, which collaborates with nearby client, edge and remote cloud nodes, and transfers video feeds into semantic information at places closer to the users in early stages. We have utilized an edge-front design and formulated an optimization problem for offloading task selection, and prioritized the task queue to minimize the response time. Our results indicate that, by offloading tasks to the closest edge node, the client-edge configuration has a speedup ranging from 1.3x to 4x (1.2x to 1.7x) against running locally (the client-cloud configuration) under various network conditions and workloads. In the case of a saturating workload on the edge-front node, we have proposed and compared various task placement schemes that are tailored for inter-edge collaboration. The proposed prediction-based shortest scheduling latency first task placement scheme considers both the transmission time of the task and the waiting time in the queue, and delivers better overall performance than the other schemes.

REFERENCES
[1] Amazon Web Services. 2017. AWS Lambda@Edge. http://docs.aws.amazon.com/lambda/latest/dg/lambda-edge.html. (2017).
[2] Christos-Nikolaos E. Anagnostopoulos, Ioannis E. Anagnostopoulos, Ioannis D. Psoroulas, Vassili Loumos, and Eleftherios Kayafas. 2008. License plate recognition from still images and video sequences: A survey. IEEE Transactions on Intelligent Transportation Systems 9, 3 (2008), 377-391.
[3] K. R. Baker. 1990. Scheduling groups of jobs in the two-machine flow shop. Mathematical and Computer Modelling 13, 3 (1990), 29-36.
[4] Flavio Bonomi, Rodolfo Milito, Jiang Zhu, and Sateesh Addepalli. 2012. Fog computing and its role in the internet of things. In Proceedings of the first edition of the MCC workshop on Mobile cloud computing. ACM, 13-16.
[5] Peter Brucker, Bernd Jurisch, and Bernd Sievers. 1994. A branch and bound algorithm for the job-shop scheduling problem. Discrete Applied Mathematics 49, 1 (1994), 107-127.
[6] Yu Cao, Songqing Chen, Peng Hou, and Donald Brown. 2015. FAST: A fog computing assisted distributed analytics system to monitor fall for stroke mitigation. In Networking, Architecture and Storage (NAS), 2015 IEEE International Conference on. IEEE, 2-11.
[7] Eduardo Cuervo, Aruna Balasubramanian, Dae-ki Cho, Alec Wolman, Stefan Saroiu, Ranveer Chandra, and Paramvir Bahl. 2010. MAUI: Making Smartphones Last Longer with Code Offload. In Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services (MobiSys '10). ACM, New York, NY, USA, 49-62. DOI: https://doi.org/10.1145/1814433.1814441
[8] Eyal de Lara, Carolina S. Gomes, Steve Langridge, S. Hossein Mortazavi, and Meysam Roodi. 2016. Hierarchical Serverless Computing for the Mobile Edge. In Edge Computing (SEC), IEEE/ACM Symposium on. IEEE, 109-110.
[9] Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (2008), 107-113.
[10] Shan Du, Mahmoud Ibrahim, Mohamed Shehata, and Wael Badawy. 2013. Automatic license plate recognition (ALPR): A state-of-the-art review. IEEE Transactions on Circuits and Systems for Video Technology 23, 2 (2013), 311-325.
[11] Sadjad Fouladi, Riad S. Wahby, Brennan Shacklett, Karthikeyan Vasuki Balasubramaniam, William Zeng, Rahul Bhalerao, Anirudh Sivaraman, George Porter, and Keith Winstein. 2017. Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 363-376. https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/fouladi
[12] Wei Gao, Yong Li, Haoyang Lu, Ting Wang, and Cong Liu. 2014. On exploiting dynamic execution patterns for workload offloading in mobile cloud applications. In Network Protocols (ICNP), 2014 IEEE 22nd International Conference on. IEEE, 1-12.
[13] Mark S. Gordon, Davoud Anoushe Jamshidi, Scott A. Mahlke, Zhuoqing Morley Mao, and Xu Chen. 2012. COMET: Code Offload by Migrating Execution Transparently. In OSDI, Vol. 12. 93-106.
[14] Zijiang Hao, Ed Novak, Shanhe Yi, and Qun Li. 2017. Challenges and Software Architecture for Fog Computing. IEEE Internet Computing (2017).
[15] Mohammed A. Hassan, Kshitiz Bhattarai, Qi Wei, and Songqing Chen. 2014. POMAC: Properly Offloading Mobile Applications to Clouds. In 6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 14).
[16] Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy H. Katz, Scott Shenker, and Ion Stoica. 2011. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. In NSDI, Vol. 11. 22-22.
[17] Wenlu Hu, Ying Gao, Kiryong Ha, Junjue Wang, Brandon Amos, Zhuo Chen, Padmanabhan Pillai, and Mahadev Satyanarayanan. 2016. Quantifying the Impact of Edge Computing on Mobile Applications. In Proceedings of the 7th ACM SIGOPS Asia-Pacific Workshop on Systems. ACM, 5.
[18] IBM. 2017. Apache OpenWhisk. http://openwhisk.org. (April 2017).
[19] Selmer Martin Johnson. 1954. Optimal two- and three-stage production schedules with setup times included. Naval Research Logistics Quarterly 1, 1 (1954), 61-68.
[20] Eric Jonas, Shivaram Venkataraman, Ion Stoica, and Benjamin Recht. 2017. Occupy the Cloud: Distributed computing for the 99%. arXiv preprint arXiv:1702.04024 (2017).
[21] Ryan Newton, Sivan Toledo, Lewis Girod, Hari Balakrishnan, and Samuel Madden. 2009. Wishbone: Profile-based Partitioning for Sensornet Applications. In NSDI, Vol. 9. 395-408.
[22] OpenALPR. 2017. OpenALPR - Automatic License Plate Recognition. http://www.openalpr.com. (April 2017).
[23] Kay Ousterhout, Patrick Wendell, Matei Zaharia, and Ion Stoica. 2013. Sparrow: distributed, low latency scheduling. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 69-84.
[24] M. Patel, B. Naughton, C. Chan, N. Sprecher, S. Abeta, A. Neal, and others. 2014. Mobile-edge computing introductory technical white paper. White Paper, Mobile-edge Computing (MEC) industry initiative (2014).
[25] Brad Philip and Paul Updike. 2001. Caltech Vision Group 2001 testing database. http://www.vision.caltech.edu/html-files/archive.html. (2001).
[26] Kari Pulli, Anatoly Baksheev, Kirill Kornyakov, and Victor Eruhimov. 2012. Real-time computer vision with OpenCV. Commun. ACM 55, 6 (2012), 61-69.
[27] Jeff Rasley, Konstantinos Karanasos, Srikanth Kandula, Rodrigo Fonseca, Milan Vojnovic, and Sriram Rao. 2016. Efficient queue management for cluster scheduling. In Proceedings of the Eleventh European Conference on Computer Systems. ACM, 36.
[28] Mahadev Satyanarayanan. 2001. Pervasive computing: Vision and challenges. IEEE Personal Communications 8, 4 (2001), 10-17.
[29] Mahadev Satyanarayanan. 2017. The Emergence of Edge Computing. Computer 50, 1 (2017), 30-39.
[30] Mahadev Satyanarayanan, Pieter Simoens, Yu Xiao, Padmanabhan Pillai, Zhuo Chen, Kiryong Ha, Wenlu Hu, and Brandon Amos. 2015. Edge analytics in the internet of things. IEEE Pervasive Computing 14, 2 (2015), 24-31.
[31] Cong Shi, Karim Habak, Pranesh Pandurangan, Mostafa Ammar, Mayur Naik, and Ellen Zegura. 2014. Cosmos: computation offloading as a service for mobile devices. In Proceedings of the 15th ACM international symposium on Mobile ad hoc networking and computing. ACM, 287-296.
[32] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu. 2016. Edge Computing: Vision and Challenges. IEEE Internet of Things Journal 3, 5 (Oct 2016), 637-646. DOI: https://doi.org/10.1109/JIOT.2016.2579198
[33] Weisong Shi and Schahram Dustdar. 2016. The Promise of Edge Computing. Computer 49, 5 (2016), 78-81.
[34] Tolga Soyata, Rajani Muraleedharan, Colin Funai, Minseok Kwon, and Wendi Heinzelman. 2012. Cloud-Vision: Real-time face recognition using a mobile-cloudlet-cloud acceleration architecture. In Computers and Communications (ISCC), 2012 IEEE Symposium on. IEEE, 000059-000066.
[35] Xiaoli Wang, Aakanksha Chowdhery, and Mung Chiang. 2016. SkyEyes: adaptive video streaming from UAVs. In Proceedings of the 3rd Workshop on Hot Topics in Wireless. ACM, 2-6.
[36] Shanhe Yi, Zijiang Hao, Zhengrui Qin, and Qun Li. 2015. Fog Computing: Platform and Applications. In Hot Topics in Web Systems and Technologies (HotWeb), 2015 Third IEEE Workshop on. IEEE, 73-78.
[37] Shanhe Yi, Cheng Li, and Qun Li. 2015. A Survey of Fog Computing: Concepts, Applications and Issues. In Proceedings of the 2015 Workshop on Mobile Big Data (Mobidata '15). ACM, 37-42.
[38] Shanhe Yi, Zhengrui Qin, and Qun Li. 2015. Security and privacy issues of fog computing: A survey. In Wireless Algorithms, Systems, and Applications. Springer, 685-695.
[39] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: cluster computing with working sets. HotCloud 10 (2010), 10-10.
[40] Haoyu Zhang, Ganesh Ananthanarayanan, Peter Bodik, Matthai Philipose, Paramvir Bahl, and Michael J. Freedman. 2017. Live Video Analytics at Scale with Approximation and Delay-Tolerance. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 377-392. https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/zhang
[41] Q. Zhang, X. Zhang, Q. Zhang, W. Shi, and H. Zhong. 2016. Firework: Big Data Sharing and Processing in Collaborative Edge Environment. In 2016 Fourth IEEE Workshop on Hot Topics in Web Systems and Technologies (HotWeb). 20-25. DOI: https://doi.org/10.1109/HotWeb.2016.12
[42] Tan Zhang, Aakanksha Chowdhery, Paramvir Victor Bahl, Kyle Jamieson, and Suman Banerjee. 2015. The design and implementation of a wireless video surveillance system. In Proceedings of the 21st Annual International Conference on Mobile Computing and Networking. ACM, 426-438.

SEC '17, October 12-14, 2017, San Jose / Silicon Valley, CA, USA. S. Yi et al.

instances of applications processing the same kind of jobs, which is typically the case in our edge application scenario. However, our system can easily extend to heterogeneous applications running on each client device.

We choose to work at the granularity of tasks, since those tasks are modularized and can be flexibly combined to either achieve speedy processing or form a workflow with high accuracy. In our ALPR application, each task is usually a common computer vision algorithm or library. For example, we have analyzed an open source ALPR project called OpenALPR [22] and illustrate its task graph in Fig. 5.

Figure 5: The task graph of OpenALPR. [Flow: an image or video frame input goes through Motion Detection and Plate Detection; if no plate is detected, the job produces its output directly; otherwise, each plate candidate goes through Plate Character Analysis and Character Recognition before Result Generation produces the output.]

Then we consider that there are N clients and only one edge server connected, as shown in Fig. 1. This edge server could be a single server or a cluster of servers. Each client i, i ∈ [1, N], will process the upcoming job upon request, e.g., recognizing the license plates in video streams. Usually, those jobs will generate heavy computation tasks and could benefit from offloading some of them to the edge server. Without loss of generality, we use a graph of tasks to represent the complex task relations inside a job, which is essentially similar to the method call graph in [7], but at a coarser granularity. For a certain kind of job, we start with its directed acyclic graph (DAG) G = (V, E), which gives the task execution sequence. Each vertex v ∈ V has a weight c_v, the computation or memory cost of a task, while each edge e = (u, v), u, v ∈ V, e ∈ E, has a weight d_uv, the data size of the intermediate results. Thus our offloading problem can be taken as a graph partition problem, in which we need to assign a directed graph of tasks to different computing nodes (local, edge or cloud) with the purpose of minimizing a certain cost. In this paper, we primarily try to minimize the job finish time.
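The job DAG described above can be represented with plain adjacency structures. The sketch below uses the OpenALPR pipeline of Fig. 5 for the task names, but the numeric costs c_v and data sizes d_uv are made-up placeholders, not profiled values from the paper.

```python
# Vertex weights c_v: computation cost per task (units are illustrative).
cost = {
    "motion_detection": 5.0,
    "plate_detection": 20.0,
    "plate_char_analysis": 8.0,
    "char_recognition": 12.0,
    "result_generation": 1.0,
}
# Edge weights d_uv: size of the intermediate result on each edge (e.g., KB).
data = {
    ("motion_detection", "plate_detection"): 300.0,
    ("plate_detection", "plate_char_analysis"): 40.0,
    ("plate_char_analysis", "char_recognition"): 10.0,
    ("char_recognition", "result_generation"): 1.0,
}

def topo_order(cost, data):
    """Kahn's algorithm: a valid execution sequence for the task DAG."""
    indeg = {v: 0 for v in cost}
    for (_, v) in data:
        indeg[v] += 1
    ready = [v for v, deg in indeg.items() if deg == 0]
    order = []
    while ready:
        u = ready.pop()
        order.append(u)
        for (a, b) in data:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    ready.append(b)
    return order

print(topo_order(cost, data)[0])   # -> motion_detection
```

Any assignment of these vertices to local, edge or cloud nodes is a partition of the DAG, and the cost model in Eqs. (1)-(4) scores each partition.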

The remote response time includes the communication delay, the network transmission delay of sending data to the edge server, and the execution time on that server. We use an indicator $I_{vi} \in \{0, 1\}$ for all $v \in V$ and all $i \in [1, N]$. If $I_{vi} = 1$, then task $v$ at client $i$ will run locally; otherwise it will run on the remote edge server. For those tasks running locally, the total execution time for client $i$ is the sum

$$T_i^{local} = \sum_{v \in V} I_{vi} \frac{c_v}{p_i} \quad (1)$$

where $p_i$ is the processor speed of client $i$. Similarly, we use

$$\bar{T}_i^{local} = \sum_{v \in V} (1 - I_{vi}) \frac{c_v}{p_i} \quad (2)$$

to represent the execution time of running the offloaded tasks locally. For the network, when there is an offloading decision, the client needs to upload the intermediate data (outputs of the previous task, application status variables, configurations, etc.) to the edge server in order to continue the computation. The network delay is modeled as

$$T_i^{net} = \sum_{(u,v) \in E} (I_{ui} - I_{vi}) \frac{d_{uv}}{r_i} + \beta_i \quad (3)$$

where $r_i$ is the bandwidth assigned for this client and $\beta_i$ is the communication latency, which can be estimated using the round trip time between client $i$ and the edge server.

For each client, the remote execution time is

$$T_i^{remote} = \sum_{v \in V} (1 - I_{vi}) \frac{c_v}{p_0} \quad (4)$$

where $p_0$ is the processor speed of the edge server.

Then our offloading task selection problem can be formulated as

$$\min_{I_i, r_i} \sum_{i=1}^{N} \left( T_i^{local} + T_i^{net} + T_i^{remote} \right) \quad (5)$$

The offloading task selection is represented by the indicator matrix $I$. This optimization problem is subject to the following constraints:

• The total bandwidth:

$$\text{s.t.} \quad \sum_{i=1}^{N} r_i \le R \quad (6)$$

• Like existing work, we restrict the data flow to avoid the ping-pong effect, in which intermediate data is transmitted back and forth between client and edge server:

$$\text{s.t.} \quad I_{vi} \le I_{ui}, \quad \forall e(u,v) \in E, \forall i \in [1, N] \quad (7)$$

• Unlike existing offloading frameworks for mobile cloud computing, we take the resource contention or schedule delay at the edge side into consideration by adding an end-to-end delay constraint:

$$\text{s.t.} \quad \bar{T}_i^{local} - (T_i^{net} + T_i^{remote}) > \tau, \quad \forall i \in [1, N] \quad (8)$$

where $\tau$ can be tuned to avoid selecting borderline tasks that, if offloaded, will get no gain due to the resource contention or schedule delay at the edge.


4.2 Optimization Solver

The proposed optimization is a mixed integer non-linear programming (MINLP) problem, where the integer variables stand for the offloading decisions and the continuous variables stand for the bandwidth allocation. To solve this optimization problem, we start by relaxing the integer constraints and solve the non-linear programming version of the problem using the Sequential Quadratic Programming (SQP) method, a constrained nonlinear optimization method. This solution is optimal without considering the integer constraints. Starting from this optimal solution, we optionally employ the branch and bound (B&B) method to search for the optimal integer solution, or simply do an exhaustive search when the number of clients and the number of tasks of each job are small.
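For small instances, the exhaustive search mentioned above can be sketched as follows for a single client. The cost terms mirror Eqs. (1)-(5) and the no-ping-pong constraint (7), but the chain-shaped three-task DAG, the fixed bandwidth, and all numeric values are illustrative assumptions, not LAVEA's profiled parameters.

```python
from itertools import product

# Hypothetical single-client instance: a chain of three tasks.
tasks = ["detect", "analyze", "ocr"]                 # execution order
c = {"detect": 20.0, "analyze": 8.0, "ocr": 12.0}    # c_v: task costs
d = {("detect", "analyze"): 40.0, ("analyze", "ocr"): 10.0}  # d_uv: data sizes
p_client, p_edge = 1.0, 4.0    # processor speeds p_i and p_0
r, beta = 5.0, 0.05            # assigned bandwidth r_i and link latency beta_i

def feasible(I):
    """Constraint (7): once a task runs remotely, its successors stay remote."""
    return all(I[v] <= I[u] for (u, v) in d)

def job_time(I):
    """Objective of Eq. (5) for one client: local + network + remote time."""
    t_local = sum(c[v] / p_client for v in tasks if I[v])
    t_remote = sum(c[v] / p_edge for v in tasks if not I[v])
    t_net = sum((I[u] - I[v]) * d[(u, v)] / r for (u, v) in d)
    if any(not I[v] for v in tasks):   # beta_i applies once if anything offloads
        t_net += beta
    return t_local + t_net + t_remote

assignments = ({t: b for t, b in zip(tasks, bits)}
               for bits in product([1, 0], repeat=len(tasks)))
best = min((I for I in assignments if feasible(I)), key=job_time)
print(best)   # -> {'detect': 0, 'analyze': 0, 'ocr': 0}: the 4x-faster edge wins
```

With N clients and the shared-bandwidth constraint (6), the same enumeration runs inside an outer search over the bandwidth split, which is why B&B on the relaxed SQP solution is preferred once the instance grows.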

4.3 Prioritizing Edge Task Queue

The offloading strategy produced by the task selection optimizes the "flow" time of each type of job. At each time epoch during run time, the edge-front node receives a large number of offloaded tasks from the clients. Originally, we follow the first-come-first-serve rule to accommodate all the client requests. For each request at the head of the task queue, the edge-front server first checks whether the input or intermediate data (e.g., images or videos) is available at the edge; otherwise, the server waits. This scheme is easy to implement, but substantial computation is wasted if the network I/O is busy with a large file and no task is ready for processing. Therefore, we improve the task scheduling with a task queue prioritizer that maintains a task sequence minimizing the makespan for the scheduling of all offloading task requests received at a certain time epoch, since the edge node can execute a task only when its input data has been fully received or its depended-on tasks have finished execution. We consider that an offloaded task has to go through two stages: the first stage is the retrieval of input or intermediate data and state variables; the second stage is the execution of the task.

We model our scheduling problem using the flow job shop model and apply Johnson's rule [19]. This scheme is optimal, and the makespan is minimized, when the number of stages is two. Nevertheless, this model only fits the case where all submitted job requests are independent and have no priorities. When considering task dependencies, a successor can only start after its predecessor finishes. By enforcing the topological ordering constraints, the problem can be solved optimally using the B&B method [5]. However, this solution hardly scales with the number of tasks. In this case, we adapt the method in [3], grouping tasks with dependencies and executing all tasks in a group sequentially. The basic idea is applying Johnson's rule on two levels. The first level is to decide the sequence of tasks within each group; the difference in our problem is that we need to decide the best sequence among all valid topological orderings. Then the bottom level is a job shop scheduling problem in terms of grouped jobs (i.e., a group of tasks with dependencies in topological ordering), in which we can utilize Johnson's rule directly.
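For independent tasks, Johnson's rule for the two-stage model above is easy to state: tasks whose retrieval time is at most their execution time go first in ascending retrieval time, the rest go last in descending execution time. The sketch below uses made-up task times, not measurements from the paper.

```python
def johnson_order(tasks):
    """tasks: list of (name, retrieval_time, execution_time).
    Returns a makespan-minimizing order for independent two-stage tasks."""
    front = sorted((t for t in tasks if t[1] <= t[2]), key=lambda t: t[1])
    back = sorted((t for t in tasks if t[1] > t[2]), key=lambda t: -t[2])
    return [t[0] for t in front + back]

def makespan(tasks, order):
    """Simulate the two stages: sequential data retrieval, then execution."""
    by_name = {t[0]: t for t in tasks}
    t1 = t2 = 0.0
    for name in order:
        _, retrieve, execute = by_name[name]
        t1 += retrieve                 # stage 1: the network I/O is sequential
        t2 = max(t2, t1) + execute     # stage 2: execution waits for its input
    return t2

tasks = [("a", 3, 6), ("b", 7, 2), ("c", 1, 4), ("d", 5, 5)]
order = johnson_order(tasks)
print(order, makespan(tasks, order))   # -> ['c', 'a', 'd', 'b'] 18.0
```

For comparison, the FCFS order a, b, c, d yields a makespan of 21 on the same tasks, which is exactly the kind of wasted time the prioritizer avoids.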

4.4 Workload Optimizer

If the workload is overwhelming and the edge-front server is saturated, the task queue will be unstable and the response time will accumulate indefinitely. There are several measures we can take to address this problem. First, we can adjust the image/video resolution in the client-side configurations, which makes a good trade-off between speed and accuracy. Second, by constraining the task offloading problem, we can keep more computation tasks at the client side. Third, if there are nearby edge nodes that are favored in terms of latency, bandwidth and computation, we can further offload tasks to those nearby edge nodes; we have investigated this case, with performance improvement considerations, in Section 5. Last, we can always redirect tasks to the remote cloud, just like offloading in MCC.

5 INTER-EDGE COLLABORATION

In this section, we improve our edge-first design by taking into consideration the case when the incoming workload saturates our edge-front node. We first discuss our motivation for providing such an option and list the corresponding challenges. Then we introduce several collaboration schemes we have proposed and investigated.

5.1 Motivation and Challenges

The resources of an edge computing node are much richer than those of client nodes, but are relatively limited compared to cloud nodes. While serving an increasing number of nearby client nodes, the edge-front node will eventually be overloaded and become non-responsive to new requests. As a baseline, we can optionally choose to offload further requests to the remote cloud. We assume that the remote cloud has unlimited resources and is capable of handling all the requests. However, when running tasks remotely in the cloud, the application needs to bear unpredictable latency and limited bandwidth, which is not the best choice, especially when there are other nearby edge nodes that can accommodate those tasks. We assume that, when all available edge nodes nearby are exhausted, the mobile-edge-cloud computing paradigm will simply fall back to the mobile cloud computing paradigm; the fallback design is not in the scope of this paper. In this paper, we mainly investigate inter-edge collaboration with the prime purpose of alleviating the burden on the edge-front node.

When the edge-front node is saturated with requests, it can collaborate with nearby edge nodes by placing some tasks on these not-so-busy edge nodes, such that all the tasks can get scheduled in a reasonable time. This is slightly different from balancing the workload among the edge nodes and the edge-front node, in that the goal of inter-edge collaboration is to better serve the client nodes with submitted requests, rather than simply making the workload balanced. For example, an edge-front node that is not overloaded does not need to place any tasks on the nearby edge nodes, even when they are idle.

The challenges of inter-edge collaboration are two-fold: 1) we need to design a proper inter-edge task placement scheme that fulfills our goal of reducing the workload on the edge-front node while offloading a proper amount of workload to the qualified edge nodes; 2) the task placement scheme should be lightweight, scalable and easy to implement.


5.2 Inter-Edge Task Placement Schemes

We have investigated three task placement schemes for inter-edge collaboration:

• Shortest Transmission Time First (STTF)
• Shortest Queue Length First (SQLF)
• Shortest Scheduling Latency First (SSLF)

The STTF task placement scheme tends to place tasks on the edge node that has the shortest estimated latency for the edge-front node to transfer the tasks. The edge-front node maintains a table recording the latency of transmitting data to each available edge node. Periodic re-calibration is necessary, because the network condition between the edge-front node and the other edge nodes may vary from time to time.

The SQLF task placement scheme, on the other hand, tends to transfer tasks from the edge-front node to the edge node that has the least number of tasks queued at the time of query. When the edge-front node is saturated with requests, it first queries all the available edge nodes about their current task queue lengths, and then transfers tasks to the edge node that reports the shortest value.

The SSLF task placement scheme tends to transmit tasks from the edge-front node to the edge node that is predicted to have the shortest response time. The response time is the time interval between the time when the edge-front node submits a task to an available edge node and the time when it receives the result of the task from that edge node. In the SQLF task placement scheme, the edge-front node keeps querying the edge nodes about their queue lengths, which may cause performance issues when the number of nodes scales up and results in a large volume of queries. We have therefore designed a novel method for the edge-front node to measure the scheduling latency efficiently. During the measurement phase, before the edge-front node chooses a task placement target, it sends a request message to each available edge node, which appends a special task to the tail of its task queue. When the special task is executed, the edge node simply sends a response message to the edge-front node, which records the response time. In this way, the edge-front node periodically maintains a series of response times for each available edge node. When the edge-front node is saturated, it starts to reassign tasks to the edge node having the shortest response time. Unlike the STTF and SQLF task placement schemes, which choose the target edge node based on the current or most recent measurements, the SSLF scheme predicts the current response time for each edge node by applying regression analysis to the response time series recorded so far. The reason is that the edge nodes are also receiving task requests from client nodes, and their local workload may vary from time to time, so the most recent response time cannot serve as a good predictor of the current response time for the edge nodes. As the local workload on each edge node in the real world usually follows a certain pattern or trend, applying regression analysis to the recorded response times is a good way to estimate the current response time. To this end, we keep the recorded measurements of response times from each edge node and offload tasks to the edge node that is predicted to have the least current response time. Once the edge-front node starts to place tasks on a certain edge node, the estimation is updated by piggybacking on the redirected tasks, which lowers the measurement overhead.
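The prediction step of SSLF can be sketched with a simple least-squares line fitted to each node's recorded response-time series and extrapolated one step ahead. Plain linear regression is our illustrative model choice here (the paper leaves the regression model open), and the node names and latency numbers are made up.

```python
def predict_next(series):
    """Fit y = a*t + b over t = 0..n-1 by least squares; return value at t = n."""
    n = len(series)
    if n < 2:
        return series[-1]
    mean_t = (n - 1) / 2
    mean_y = sum(series) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(series))
    var = sum((t - mean_t) ** 2 for t in range(n))
    a = cov / var                  # slope: trend of the node's workload
    b = mean_y - a * mean_t
    return a * n + b

def pick_sslf_target(history):
    """history: {node: [recorded response times]}; pick the node whose
    *predicted* response time is lowest, not the most recent one."""
    return min(history, key=lambda node: predict_next(history[node]))

history = {
    "edge-1": [120, 130, 140, 150],   # steadily getting busier -> predicts 160
    "edge-2": [200, 180, 160, 140],   # draining its queue      -> predicts 120
}
print(pick_sslf_target(history))      # -> edge-2
```

Note that picking by the most recent value alone would also choose edge-2 here, but if the trends were only slightly different the extrapolation and the last sample could disagree, which is exactly the case SSLF's regression is meant to handle.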

Each of the task placement schemes described above has advantages and disadvantages. For instance, the STTF scheme can quickly reduce the workload on the edge-front node, but there is a chance that tasks may be placed on an edge node that already has an intensive workload, as the STTF scheme gathers no information about the workload on the target. The SQLF scheme works well when the network latency and bandwidth are stable among all the available edge nodes; when the network overheads are highly variant, this scheme fails to factor in the network condition and always chooses the edge node with the lowest workload, and when an intensive workload is placed under a high network overhead, this scheme potentially deteriorates the performance, as it needs to measure the workload frequently. The SSLF task placement scheme estimates the response time of each edge node by following the task-offloading process, and the response time is a good indicator of which edge node should be chosen as the target of task placement in terms of both workload and network overhead. The SSLF scheme is thus a good trade-off between the previous two schemes. However, the regression analysis may introduce a large error into the predicted response time if inappropriate models are selected. We believe that the decision of which task placement scheme to employ for good system performance should always give proper consideration to the workload and network conditions. We evaluate these three schemes through a case study in the next section.

6 SYSTEM IMPLEMENTATION AND PERFORMANCE EVALUATION

In this section, we first brief the implementation details of building our system. Next, we introduce our evaluation setup and present the results of our evaluations.

6.1 Implementation Details

Our implementation aims at a serverless edge architecture. As shown in the system architecture of Fig. 4, our implementation is based on Docker containers, for the benefits of quick deployment and easy management. Every component has been dockerized, and its deployment is greatly simplified via distributing pre-built images. The creation and destruction of Docker instances is much faster than that of VM instances. Inspired by IBM OpenWhisk [18], each worker container contains an action proxy, which uses Python to run any scripts or to compile and execute any binary executable. The worker container communicates with others using a message queue, as all the inputs/outputs are JSONified; however, we do not JSONify images/videos, and instead use their path references in shared storage. The task queue is implemented using Redis, as it is in-memory and has very good performance. The end user only needs to 1) deploy our edge computing platform on heterogeneous devices with just a click, 2) define the events of interest using the provided API, and 3) provide a function (scripts or a binary executable) to process such events. The function we have implemented uses the open source project OpenALPR [22] as the task payload for workers.
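The worker/task-queue message flow described above can be sketched as follows. To keep the example self-contained, a `deque` stands in for the Redis list (with redis-py, the push and pop would be `LPUSH` and `BRPOP` against a shared server), and the field names in the task message are our assumptions, not LAVEA's actual schema.

```python
import json
from collections import deque

task_queue = deque()   # stand-in for the Redis list holding JSONified tasks

def submit(task_name, image_path):
    """Producer side: enqueue a JSONified task description. Large blobs are
    passed by path reference in shared storage, not inlined in the message."""
    msg = {"task": task_name, "input_path": image_path}
    task_queue.appendleft(json.dumps(msg))      # LPUSH equivalent

def worker_step(handlers):
    """Worker side: pop one task (BRPOP equivalent, non-blocking here) and
    dispatch it to the registered handler, as the action proxy would."""
    msg = json.loads(task_queue.pop())
    return handlers[msg["task"]](msg["input_path"])

submit("plate_detection", "/shared/frames/frame_0001.jpg")
result = worker_step({"plate_detection": lambda path: f"plates in {path}"})
print(result)   # -> plates in /shared/frames/frame_0001.jpg
```

Passing only path references keeps the queue messages small, which matters because every offloaded task crosses the network between containers.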


6.2 Evaluation Setup

6.2.1 Testbed. We have built a testbed consisting of four edge computing nodes. One of the edge nodes is the edge-front node, which is directly connected to a wireless router using a cable. The other three nodes are set as nearby edge computing nodes for the evaluation of inter-edge collaboration. These four machines have the same hardware specifications: they all have a quad-core CPU and 4 GB of main memory. The three nearby edge nodes are directly connected to the edge-front node through a network cable. We make use of two types of Raspberry Pi (RPi) nodes as clients: one type is the RPi 2, which is wired to the router, while the other type is the RPi 3, which is connected to the router using its built-in 2.4 GHz WiFi.

6.2.2 Datasets. We have employed three datasets for the evaluations. One dataset is the Caltech Vision Group 2001 testing database, in which the car rear image resolution (126 images with resolution 896x592) is adequate for license plate recognition [25]. Another dataset is a self-collected 4K video containing rear license plates, taken on an Android smartphone and converted into videos of different resolutions (640x480, 960x720, 1280x960 and 1600x1200). The other dataset, used in the inter-edge collaboration evaluation, contains 22 car images with various resolutions ranging from 405x540 pixels to 2514x1210 pixels (file size 316 KB to 2.85 MB). The task requests use the car images as input in a round-robin way, one car image per task request.

6.3 Task Profiler

Besides the round-trip time and bandwidth benchmarks we have presented in Fig. 2 and Fig. 3 to characterize the edge computing network, we have profiled the OpenALPR application on various client, edge and cloud nodes.
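Conceptually, the per-stage profiling behind Figures 6 to 9 amounts to timing each pipeline stage in isolation. A minimal sketch, with placeholder stage functions rather than the actual OpenALPR internals:

```python
import time

def profile_stages(stages, frame):
    """Run each named stage on the frame, feeding each stage's output to
    the next, and record the wall-clock time of each stage in milliseconds."""
    profile = {}
    for name, fn in stages:
        start = time.perf_counter()
        frame = fn(frame)
        profile[name] = (time.perf_counter() - start) * 1000.0
    return profile

# Placeholder stages standing in for the OpenALPR pipeline.
stages = [
    ("MotionDetection", lambda f: f),
    ("PlateDetection",  lambda f: f),
    ("PlateAnalysis",   lambda f: f),
    ("OCR",             lambda f: f),
]
profile = profile_stages(stages, frame="fake-frame")
```

Feeding the resulting per-stage, per-device profiles into the offloading optimizer is what lets the solver compare keeping a stage local against shipping it to the edge.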

Figure 6: OpenALPR profile result of client type 1 (RPi2, quad-core 0.9 GHz). [Bar chart: execution time (ms) of MotionDetection, PlateDetection, PlateAnalysis and OCR, for workload 1 (896x592) and workload 2 (640x480, 960x720, 1280x960, 1600x1200).]

In this experiment, we use both dataset 1 (workload 1) and dataset 2 (workload 2) at various resolutions. The execution time for each task is shown in Fig. 6, Fig. 7, Fig. 8 and Fig. 9. The results indicate that by utilizing an edge node, we can get a comparable amount of computation power close to the clients for computation-intensive tasks. Another observation is that, due to the uneven optimizations on heterogeneous CPU architectures, some tasks are better kept local while some others should be

Figure 7: OpenALPR profile result of client type 2 (RPi3, quad-core 1.2 GHz). [Bar chart: per-stage execution time (ms) for the same workloads as Fig. 6.]

Figure 8: OpenALPR profile result of a type of edge node (i7, quad-core 2.30 GHz). [Bar chart: per-stage execution time (ms) for the same workloads as Fig. 6.]

Figure 9: OpenALPR profile of a type of cloud node (AWS EC2 t2.large, Xeon dual-core 2.40 GHz). [Bar chart: per-stage execution time (ms) for the same workloads as Fig. 6.]

offloaded to an edge computing node. This observation justifies the need for computation offloading between clients and edge nodes.

SEC '17, October 12–14, 2017, San Jose / Silicon Valley, CA, USA. S. Yi et al.

Figure 10: The comparison of task selection impacts on edge offloading and cloud offloading for wired clients (RPi2). [Bar chart: response time per frame per client (s) for Client-edge opt, Client only, Edge only, Client-cloud opt and Cloud only, at resolutions 640x480 to 1600x1200 (workload 2).]

Figure 11: The comparison of task selection impacts on edge offloading and cloud offloading for 2.4 GHz wireless clients (RPi3). [Bar chart: same configurations and workloads as Fig. 10.]

6.4 Offloading Task Selection

To understand how much the execution time can be reduced by splitting tasks between the client and the edge, or between the client and the cloud, we design an experiment with workloads generated from dataset 2 in two scenarios: 1) one edge node provides service to three wired client nodes that have the best network latency and bandwidth; 2) one edge node provides service to three wireless 2.4 GHz client nodes that have latency with high variance and relatively low bandwidth. The result of the first case is very straightforward: the clients simply upload all the input data and run all the tasks on the edge node in edge offloading, or on the cloud node in cloud offloading, as shown in Fig. 10. This is mainly because an Ethernet cable can stably provide the lowest latency and highest bandwidth, which makes offloading to the edge very rewarding. We did not evaluate 5 GHz wireless clients, since this interface is not supported on our client hardware, but we anticipate similar results as in the wired case. We plot the result of a 2.4 GHz wireless client node offloading to an edge node or a remote cloud node in the second case in Fig. 11. Overall, the results show that by offloading tasks to an edge computing platform, the application we chose experienced a speedup of up to 4.0x in the wired client-edge configuration compared to local execution, and up to 1.7x compared to a similar client-cloud configuration. For clients with a 2.4 GHz wireless interface, the speedup is up to 1.3x in the client-edge configuration compared to local execution, and up to 1.2x compared to a similar client-cloud configuration.

Figure 12: The comparison result of three task prioritizing schemes. [Line chart: response time (s) vs. number of task offloading requests (5 to 35) for our scheme, SIOF and LCPUL.]

6.5 Edge-front Task Queue Prioritizing

To evaluate the performance of the task queue prioritizing, we collect statistical results from our profiler service and monitoring service on various workloads for simulation. We choose the simulation method because we can freely set up the numbers and types of client and edge nodes, overcoming the limitation of our current testbed and allowing us to evaluate more complex deployments. We add two simple schemes as baselines: 1) shortest I/O first (SIOF), which sorts all the tasks by the time cost of the network transmission; 2) longest CPU last (LCPUL), which sorts all the tasks by the time cost of the processing on the edge node. In the simulation, based on the combination of client device types, workloads and offloading decisions, we have in total seven types of jobs to run on the edge node. We increase the total number of jobs, distribute them evenly among the seven types, and report the makespan time in Fig. 12. The result shows that LCPUL is the worst among the three schemes, and our scheme outperforms the shortest-job-first scheme.

6.6 Inter-Edge Collaboration

We also evaluate the performance of the three task placement schemes (i.e., STTF, SQLF and SSLF) discussed in Section 5 through a controlled experiment on our testbed. For evaluation purposes, we configure the network in the edge computing system as follows. The first edge node, denoted as "edge node 1" afterwards, has 10 ms RTT and 40 Mbps bandwidth to the edge-front node. The second edge node, "edge node 2", has 20 ms RTT and 20 Mbps bandwidth to the edge-front node. The third edge node, "edge node 3", has 100 ms RTT and 2 Mbps bandwidth to the edge-front node. Thus, we emulate the situation where the three edge nodes sit at distances from the edge-front node ranging from near to far.


Figure 13: Performance with no task placement scheme. [Line chart: throughput (tasks/sec) over time (0 to 12 min) for the edge-front node and edge nodes 1 to 3.]

Figure 14: Performance of STTF. [Line chart: same axes and nodes as Fig. 13.]

Figure 15: Performance of SQLF. [Line chart: same axes and nodes as Fig. 13.]

Figure 16: Performance of SSLF. [Line chart: same axes and nodes as Fig. 13.]

We use the third dataset to synthesize a workload as follows. In the first 4 minutes, the edge-front node receives 5 task requests per second, edge node 1 receives 4 task requests per second, edge node 2 receives 3 task requests per second, and edge node 3 receives 2 task requests per second, respectively. No task comes to any of the edge nodes after the first 4 minutes. For the SSLF task placement scheme, we implement a simple linear regression to predict the scheduling latency of the task being transmitted, since the workload we have injected is uniformly distributed.
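A simple linear regression of this kind can be sketched as an ordinary least-squares fit over the recorded (timestamp, response time) samples for a node; this is an illustration in pure Python, not LAVEA's actual predictor.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b over the recorded series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def predict_latency(history, now):
    """Predict the current scheduling latency from past (time, latency) samples."""
    a, b = fit_line([t for t, _ in history], [l for _, l in history])
    return a * now + b

# Hypothetical samples for one edge node: (seconds since start, latency in s).
history = [(0, 1.0), (60, 1.2), (120, 1.4), (180, 1.6)]
pred = predict_latency(history, now=240)
```

With the uniform workload above, the response-time series grows roughly linearly while requests accumulate, so a first-order fit is a reasonable predictor.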

Fig. 13 illustrates the throughput on each edge node when no task placement scheme is enabled on the edge-front node. The edge-front node has the heaviest workload, and it takes about 12.36 minutes to finish all the tasks. We consider this result as our baseline.

Fig. 14 shows the throughput result of the STTF scheme. In this case, the edge-front node only transmits tasks to edge node 1, because edge node 1 has the highest bandwidth and the shortest RTT to the edge-front node. Fig. 17 reveals that the edge-front node transmits 120 tasks to edge node 1 and no tasks to the other edge nodes. As edge node 1 has a heavier workload than edge node 2 and edge node 3, the STTF scheme has limited improvement on the system performance: the edge-front node takes about 11.29 minutes to finish all the tasks. Fig. 15 illustrates the throughput result of the SQLF scheme. This scheme works better than the STTF scheme, because the edge-front node transmits more tasks to less-saturated edge nodes, efficiently reducing the workload on the edge-front node. However, the edge-front node tends to transmit many tasks to edge node 3 at the beginning, which has the lowest bandwidth and the longest RTT to the edge-front node. As such, the task placement may incur more delay than expected. From Fig. 17, the edge-front node transmits 0 tasks to edge node 1, 132 tasks to edge node 2, and 152 tasks to edge node 3. The edge-front node takes about 9.6 minutes to finish all the tasks.

Fig. 16 demonstrates the throughput result of the SSLF scheme. This scheme considers both the transmission time of the task being placed and the waiting time in the queue on the target edge node, and therefore achieves the best performance of the three. As mentioned, edge node 1 has the lowest transmission overhead but the heaviest workload among the three edge nodes, while edge node 3 has the lightest workload but the highest transmission overhead. In contrast, edge node 2 has modest transmission overhead and a modest workload. The SSLF scheme takes all these situations into consideration and places the largest number of tasks on edge node 2. As shown in Fig. 17, the edge-front node transmits 4 tasks to edge node 1, 152 tasks to edge node 2, and 148 tasks to edge node 3 when working with the SSLF scheme. The edge-front node takes about 9.36 minutes to finish all the tasks, which is the best result among the three schemes. We infer that the third scheme will further improve the task completion time if tougher network conditions and workloads are considered.

Figure 17: Numbers of tasks placed by the edge-front node. [Bar chart: tasks placed on edge nodes 1 to 3 under STTF, SQLF and SSLF.]

7 RELATED WORK

The emergence of edge computing has drawn attention due to its capabilities to reshape the landscape of IoT, mobile computing and cloud computing [6, 14, 32, 33, 36-38]. Satyanarayanan [29] has briefed the origin of edge computing, also known as fog computing [4], cloudlet [28], mobile edge computing [24], and so on. Here we review several relevant research fields related to video edge analytics, including distributed data processing and computation offloading in various computing paradigms.

7.1 Distributed Data Processing

Distributed data processing has a close relationship to edge analytics, in the sense that those data processing platforms [9, 39] and underlying techniques [16, 23, 27] can be easily deployed on a cluster of edge nodes. In this paper, we pay special attention to distributed image/video data processing systems. VideoStorm [40] made insightful observations on vision-related algorithms and proposed a resource-quality trade-off with multi-dimensional configurations (e.g., video resolution, frame rate, sampling rate, sliding window size, etc.). The resource-quality profiles are generated offline, and an online scheduler is built to allocate resources to queries to optimize the utility of quality and latency. Their work is complementary to ours, in that we do not consider the trade-off between quality and latency goals via adaptive configurations. Vigil [42] is a wireless video surveillance system that leverages edge computing nodes, with an emphasis on content-aware frame selection in a scenario where multiple web cameras are at the same location, to optimize bandwidth utilization, which is orthogonal to the problems we have addressed here. Firework [41] is a computing paradigm for big data processing in collaborative edge environments, which is complementary to our work in terms of shared data view and programming interface.

While there should be more on-going efforts investigating the adaptation, improvement and optimization of existing distributed data processing techniques on edge computing platforms, we focus more on task/application-level queue management and scheduling, and leave all the underlying resource negotiating and process scheduling to the container cluster engine.

7.2 Computation Offloading

Computation offloading (a.k.a. cyber foraging [28]) has been proposed to improve resource utilization, response time and energy consumption in various computing environments [7, 13, 21, 31]. Work [17] has quantified the impact of edge computing on mobile applications, and found that edge computing can significantly improve response time and energy consumption for mobile devices through offloading via both WiFi and LTE networks. Mocha [34] has investigated how a two-stage face recognition task from a mobile device can be accelerated by cloudlet and cloud. In their design, clients simply capture images and send them to the cloudlet. The optimal task partition can be easily achieved, as it has only two stages. In LAVEA, our application is more complicated, with multiple stages, and we leverage client-edge offloading and other techniques to improve resource utilization and optimize the response time.

8 DISCUSSIONS AND LIMITATIONS

In this section, we discuss alternative design options, and point out current limitations and future work that can improve the system.

Measurement-based Offloading. In this paper, we use measurement-based offloading (static offloading), i.e., the offloading decisions are based on the outcome of periodic measurements. We consider this one of the limitations of our implementation, as stated in [15], and several dynamic computation offloading schemes have been proposed [12]. We plan to improve on the measurement-based offloading in future work.

Video Streaming. Our current data processing is image-based, which is one of the limitations of our implementation. The input is either in the format of images, or a video stream which is read into frames and sent out. We believe that utilizing existing video streaming techniques between our system components for data sharing will further improve the system performance and open more potential opportunities for optimization.

Discovering Edge Nodes. There are different ways for the edge-front node to discover the available edge nodes nearby. For example, every edge node intending to serve as a collaborator may open a designated port, so that the edge-front node can periodically scan the network and discover the available edge nodes. This is called the "pull-based" method. In contrast, there is also a "push-based" method, in which the edge-front node opens a designated port, and every edge node intending to serve as a collaborator registers with the edge-front node. When the network is at a large scale, the pull-based method usually performs poorly, because the edge-front node may not be able to discover an available edge node in a short time. For this reason, our edge node discovery is implemented with the push-based method, which guarantees good performance regardless of the network scale.
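A push-based registry of this kind can be sketched as follows; the class and method names are illustrative, not LAVEA's API, and the heartbeat TTL is an assumed parameter.

```python
import time

class EdgeRegistry:
    """Push-based discovery: collaborating edge nodes register themselves
    with the edge-front node and refresh their entry with heartbeats."""

    def __init__(self, ttl=30.0):
        self.ttl = ttl     # drop nodes whose last heartbeat is older than ttl
        self.nodes = {}    # node_id -> last heartbeat timestamp

    def register(self, node_id, now=None):
        """Called on each registration or heartbeat from a collaborator."""
        self.nodes[node_id] = time.time() if now is None else now

    def alive(self, now=None):
        """Nodes whose heartbeat is still fresh; candidates for placement."""
        now = time.time() if now is None else now
        return [n for n, ts in self.nodes.items() if now - ts <= self.ttl]

registry = EdgeRegistry(ttl=30.0)
registry.register("edge-node-1", now=0.0)
registry.register("edge-node-2", now=25.0)
alive = registry.alive(now=40.0)  # edge-node-1's heartbeat has expired
```

Since each collaborator initiates its own registration, the edge-front node's cost grows with the number of willing collaborators rather than with the size of the network to scan, which is why the push-based method scales better.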

9 CONCLUSION

In this paper, we have investigated providing video analytics services to latency-sensitive applications in an edge computing environment.


As a result, we have built LAVEA, a low-latency video edge analytics system, which collaborates nearby client, edge and remote cloud nodes, and transforms video feeds into semantic information at places closer to the users, in early stages. We have utilized an edge-front design, formulated an optimization problem for offloading task selection, and prioritized the task queue to minimize the response time. Our results indicate that by offloading tasks to the closest edge node, the client-edge configuration has a speedup ranging from 1.3x to 4x (1.2x to 1.7x) against running locally (client-cloud), under various network conditions and workloads. In case of a saturating workload on the front edge node, we have proposed and compared various task placement schemes that are tailored for inter-edge collaboration. The proposed prediction-based shortest scheduling latency first task placement scheme considers both the transmission time of the task and the waiting time in the queue, and delivers better overall performance than the other schemes.

REFERENCES
[1] Amazon Web Services. 2017. AWS Lambda@Edge. http://docs.aws.amazon.com/lambda/latest/dg/lambda-edge.html (2017).
[2] Christos-Nikolaos E. Anagnostopoulos, Ioannis E. Anagnostopoulos, Ioannis D. Psoroulas, Vassili Loumos, and Eleftherios Kayafas. 2008. License plate recognition from still images and video sequences: A survey. IEEE Transactions on Intelligent Transportation Systems 9, 3 (2008), 377-391.
[3] K. R. Baker. 1990. Scheduling groups of jobs in the two-machine flow shop. Mathematical and Computer Modelling 13, 3 (1990), 29-36.
[4] Flavio Bonomi, Rodolfo Milito, Jiang Zhu, and Sateesh Addepalli. 2012. Fog computing and its role in the internet of things. In Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing. ACM, 13-16.
[5] Peter Brucker, Bernd Jurisch, and Bernd Sievers. 1994. A branch and bound algorithm for the job-shop scheduling problem. Discrete Applied Mathematics 49, 1 (1994), 107-127.
[6] Yu Cao, Songqing Chen, Peng Hou, and Donald Brown. 2015. FAST: A fog computing assisted distributed analytics system to monitor fall for stroke mitigation. In Networking, Architecture and Storage (NAS), 2015 IEEE International Conference on. IEEE, 2-11.
[7] Eduardo Cuervo, Aruna Balasubramanian, Dae-ki Cho, Alec Wolman, Stefan Saroiu, Ranveer Chandra, and Paramvir Bahl. 2010. MAUI: Making Smartphones Last Longer with Code Offload. In Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services (MobiSys '10). ACM, New York, NY, USA, 49-62. DOI: https://doi.org/10.1145/1814433.1814441
[8] Eyal de Lara, Carolina S. Gomes, Steve Langridge, S. Hossein Mortazavi, and Meysam Roodi. 2016. Hierarchical Serverless Computing for the Mobile Edge. In Edge Computing (SEC), IEEE/ACM Symposium on. IEEE, 109-110.
[9] Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (2008), 107-113.
[10] Shan Du, Mahmoud Ibrahim, Mohamed Shehata, and Wael Badawy. 2013. Automatic license plate recognition (ALPR): A state-of-the-art review. IEEE Transactions on Circuits and Systems for Video Technology 23, 2 (2013), 311-325.
[11] Sadjad Fouladi, Riad S. Wahby, Brennan Shacklett, Karthikeyan Vasuki Balasubramaniam, William Zeng, Rahul Bhalerao, Anirudh Sivaraman, George Porter, and Keith Winstein. 2017. Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 363-376. https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/fouladi
[12] Wei Gao, Yong Li, Haoyang Lu, Ting Wang, and Cong Liu. 2014. On exploiting dynamic execution patterns for workload offloading in mobile cloud applications. In Network Protocols (ICNP), 2014 IEEE 22nd International Conference on. IEEE, 1-12.
[13] Mark S. Gordon, Davoud Anoushe Jamshidi, Scott A. Mahlke, Zhuoqing Morley Mao, and Xu Chen. 2012. COMET: Code Offload by Migrating Execution Transparently. In OSDI, Vol. 12. 93-106.
[14] Zijiang Hao, Ed Novak, Shanhe Yi, and Qun Li. 2017. Challenges and Software Architecture for Fog Computing. Internet Computing (2017).
[15] Mohammed A. Hassan, Kshitiz Bhattarai, Qi Wei, and Songqing Chen. 2014. POMAC: Properly Offloading Mobile Applications to Clouds. In 6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 14).
[16] Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy H. Katz, Scott Shenker, and Ion Stoica. 2011. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. In NSDI, Vol. 11. 22-22.
[17] Wenlu Hu, Ying Gao, Kiryong Ha, Junjue Wang, Brandon Amos, Zhuo Chen, Padmanabhan Pillai, and Mahadev Satyanarayanan. 2016. Quantifying the Impact of Edge Computing on Mobile Applications. In Proceedings of the 7th ACM SIGOPS Asia-Pacific Workshop on Systems. ACM, 5.
[18] IBM. 2017. Apache OpenWhisk. http://openwhisk.org (April 2017).
[19] Selmer Martin Johnson. 1954. Optimal two- and three-stage production schedules with setup times included. Naval Research Logistics Quarterly 1, 1 (1954), 61-68.
[20] Eric Jonas, Shivaram Venkataraman, Ion Stoica, and Benjamin Recht. 2017. Occupy the Cloud: Distributed computing for the 99%. arXiv preprint arXiv:1702.04024 (2017).
[21] Ryan Newton, Sivan Toledo, Lewis Girod, Hari Balakrishnan, and Samuel Madden. 2009. Wishbone: Profile-based Partitioning for Sensornet Applications. In NSDI, Vol. 9. 395-408.
[22] OpenALPR. 2017. OpenALPR - Automatic License Plate Recognition. http://www.openalpr.com (April 2017).
[23] Kay Ousterhout, Patrick Wendell, Matei Zaharia, and Ion Stoica. 2013. Sparrow: distributed, low latency scheduling. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 69-84.
[24] M. Patel, B. Naughton, C. Chan, N. Sprecher, S. Abeta, A. Neal, and others. 2014. Mobile-edge computing introductory technical white paper. White Paper, Mobile-edge Computing (MEC) Industry Initiative (2014).
[25] Brad Philip and Paul Updike. 2001. Caltech Vision Group 2001 testing database. http://www.vision.caltech.edu/html-files/archive.html (2001).
[26] Kari Pulli, Anatoly Baksheev, Kirill Kornyakov, and Victor Eruhimov. 2012. Real-time computer vision with OpenCV. Commun. ACM 55, 6 (2012), 61-69.
[27] Jeff Rasley, Konstantinos Karanasos, Srikanth Kandula, Rodrigo Fonseca, Milan Vojnovic, and Sriram Rao. 2016. Efficient queue management for cluster scheduling. In Proceedings of the Eleventh European Conference on Computer Systems. ACM, 36.
[28] Mahadev Satyanarayanan. 2001. Pervasive computing: Vision and challenges. IEEE Personal Communications 8, 4 (2001), 10-17.
[29] Mahadev Satyanarayanan. 2017. The Emergence of Edge Computing. Computer 50, 1 (2017), 30-39.
[30] Mahadev Satyanarayanan, Pieter Simoens, Yu Xiao, Padmanabhan Pillai, Zhuo Chen, Kiryong Ha, Wenlu Hu, and Brandon Amos. 2015. Edge analytics in the internet of things. IEEE Pervasive Computing 14, 2 (2015), 24-31.
[31] Cong Shi, Karim Habak, Pranesh Pandurangan, Mostafa Ammar, Mayur Naik, and Ellen Zegura. 2014. COSMOS: computation offloading as a service for mobile devices. In Proceedings of the 15th ACM International Symposium on Mobile Ad Hoc Networking and Computing. ACM, 287-296.
[32] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu. 2016. Edge Computing: Vision and Challenges. IEEE Internet of Things Journal 3, 5 (Oct 2016), 637-646. DOI: https://doi.org/10.1109/JIOT.2016.2579198
[33] Weisong Shi and Schahram Dustdar. 2016. The Promise of Edge Computing. Computer 49, 5 (2016), 78-81.
[34] Tolga Soyata, Rajani Muraleedharan, Colin Funai, Minseok Kwon, and Wendi Heinzelman. 2012. Cloud-Vision: Real-time face recognition using a mobile-cloudlet-cloud acceleration architecture. In Computers and Communications (ISCC), 2012 IEEE Symposium on. IEEE, 000059-000066.
[35] Xiaoli Wang, Aakanksha Chowdhery, and Mung Chiang. 2016. SkyEyes: adaptive video streaming from UAVs. In Proceedings of the 3rd Workshop on Hot Topics in Wireless. ACM, 2-6.
[36] Shanhe Yi, Zijiang Hao, Zhengrui Qin, and Qun Li. 2015. Fog Computing: Platform and Applications. In Hot Topics in Web Systems and Technologies (HotWeb), 2015 Third IEEE Workshop on. IEEE, 73-78.
[37] Shanhe Yi, Cheng Li, and Qun Li. 2015. A Survey of Fog Computing: Concepts, Applications and Issues. In Proceedings of the 2015 Workshop on Mobile Big Data (Mobidata '15). ACM, 37-42.
[38] Shanhe Yi, Zhengrui Qin, and Qun Li. 2015. Security and privacy issues of fog computing: A survey. In Wireless Algorithms, Systems, and Applications. Springer, 685-695.
[39] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: cluster computing with working sets. HotCloud 10 (2010), 10-10.
[40] Haoyu Zhang, Ganesh Ananthanarayanan, Peter Bodik, Matthai Philipose, Paramvir Bahl, and Michael J. Freedman. 2017. Live Video Analytics at Scale with Approximation and Delay-Tolerance. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 377-392. https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/zhang
[41] Q. Zhang, X. Zhang, Q. Zhang, W. Shi, and H. Zhong. 2016. Firework: Big Data Sharing and Processing in Collaborative Edge Environment. In 2016 Fourth IEEE Workshop on Hot Topics in Web Systems and Technologies (HotWeb). 20-25. DOI: https://doi.org/10.1109/HotWeb.2016.12
[42] Tan Zhang, Aakanksha Chowdhery, Paramvir Victor Bahl, Kyle Jamieson, and Suman Banerjee. 2015. The design and implementation of a wireless video surveillance system. In Proceedings of the 21st Annual International Conference on Mobile Computing and Networking. ACM, 426-438.


4.2 Optimization Solver

The proposed optimization is a mixed integer non-linear programming (MINLP) problem, where the integer variables stand for the offloading decisions and the continuous variables stand for the bandwidth allocation. To solve this optimization problem, we start by relaxing the integer constraints and solving the non-linear programming version of the problem using the Sequential Quadratic Programming method, a constrained nonlinear optimization method. This solution is optimal without considering the integer constraints. Starting from this optimal solution, we optionally employ the branch and bound (B&B) method to search for the optimal integer solution, or simply do an exhaustive search when the number of clients and the number of tasks of each job are small.
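For small instances, the exhaustive search mentioned above can be sketched directly: enumerate every binary offloading vector and keep the one minimizing the modeled response time. The cost model below is a deliberately simplified illustration (sequential stages, fixed per-task costs), not the paper's exact formulation.

```python
from itertools import product

def response_time(decisions, local_cost, edge_cost, transfer_cost):
    """Toy model: a task kept local (d=0) pays its local execution cost;
    an offloaded task (d=1) pays transfer plus edge execution."""
    total = 0.0
    for d, lc, ec, tc in zip(decisions, local_cost, edge_cost, transfer_cost):
        total += lc if d == 0 else tc + ec
    return total

def exhaustive_search(local_cost, edge_cost, transfer_cost):
    """Enumerate all 2^n offloading vectors and return the best one."""
    n = len(local_cost)
    best = min(product((0, 1), repeat=n),
               key=lambda d: response_time(d, local_cost, edge_cost, transfer_cost))
    return best, response_time(best, local_cost, edge_cost, transfer_cost)

# Three tasks: the first is cheap locally, the others benefit from offloading.
best, cost = exhaustive_search(local_cost=[1.0, 8.0, 6.0],
                               edge_cost=[0.5, 2.0, 1.5],
                               transfer_cost=[2.0, 1.0, 1.0])
```

The search space is 2^n per job, which is exactly why the exhaustive route is reserved for small numbers of clients and tasks, with relaxation plus B&B handling the rest.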

4.3 Prioritizing Edge Task Queue

The offloading strategy produced by the task selection optimizes the "flow" time of each type of job. At each time epoch during run time, the edge-front node receives a large number of offloaded tasks from the clients. Originally, we followed the first-come-first-serve rule to accommodate all the client requests. For each request at the head of the task queue, the edge-front server first checks if the input or intermediate data (e.g., images or videos) is available at the edge; otherwise, the server waits. This scheme is easy to implement, but substantial computation is wasted if the network I/O is busy with a large file and no task is ready for processing. Therefore, we improve the task scheduling with a task queue prioritizer, which maintains a task sequence minimizing the makespan of all offloading task requests received at a certain time epoch, since the edge node can execute a task only when its input data has been fully received or its depended-on tasks have finished execution. We consider that an offloaded task has to go through two stages: the first stage is the retrieval of input or intermediate data and state variables; the second stage is the execution of the task.

We model our scheduling problem using the flow job shop model and apply Johnson's rule [19]. This scheme is optimal, minimizing the makespan, when the number of stages is two. Nevertheless, this model only fits the case where all submitted job requests are independent and have no priorities. When considering task dependencies, a successor can only start after its predecessor finishes. By enforcing the topological ordering constraints, the problem can be solved optimally using the B&B method [5]. However, this solution hardly scales with the number of tasks. In this case, we adapt the method in [3]: we group tasks with dependencies and execute all tasks in a group sequentially. The basic idea is to apply Johnson's rule at two levels. The first level decides the sequence of tasks within each group; the difference in our problem is that we need to decide the best sequence among all valid topological orderings. The bottom level is then a job shop scheduling problem in terms of grouped jobs (i.e., groups of tasks with dependencies in topological ordering), in which we can utilize Johnson's rule directly.
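For the independent-task case, Johnson's rule on the two-stage model (stage 1: data retrieval over the network; stage 2: execution) can be sketched as follows. Tasks whose retrieval time is shorter than their execution time go first, in ascending retrieval order; the rest go last, in descending execution order. The task tuples here are illustrative.

```python
def johnsons_rule(tasks):
    """tasks: list of (task_id, io_time, cpu_time). Returns the task order
    that minimizes makespan on the two-stage (retrieval, execution) pipeline."""
    first = sorted((t for t in tasks if t[1] < t[2]), key=lambda t: t[1])
    last = sorted((t for t in tasks if t[1] >= t[2]),
                  key=lambda t: t[2], reverse=True)
    return [t[0] for t in first + last]

def makespan(tasks, order):
    """Simulate the two-stage pipeline for a given task order."""
    by_id = {t[0]: t for t in tasks}
    io_done = cpu_done = 0.0
    for tid in order:
        _, io, cpu = by_id[tid]
        io_done += io                             # retrieval is sequential on the NIC
        cpu_done = max(cpu_done, io_done) + cpu   # execution waits for the data
    return cpu_done

tasks = [("a", 3, 6), ("b", 5, 2), ("c", 1, 2), ("d", 6, 6)]
order = johnsons_rule(tasks)
```

On this toy instance the Johnson order beats first-come-first-serve, illustrating why front-loading transmission-light tasks keeps the CPU from idling behind the network.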

4.4 Workload Optimizer

If the workload is overwhelming and the edge-front server is saturated, the task queue will be unstable and the response time will accumulate indefinitely. There are several measures we can take to address this problem. First, we can adjust the image/video resolution in client-side configurations, which makes a good trade-off between speed and accuracy. Second, by constraining the task offloading problem, we can retain more computation tasks at the client side. Third, if there are nearby edge nodes which are favored in terms of latency, bandwidth and computation, we can further offload tasks to those nearby edge nodes. We have investigated this case with performance improvement considerations in Section 5. Last, we can always redirect tasks to the remote cloud, just like offloading in MCC.

5 INTER-EDGE COLLABORATION

In this section, we improve our edge-first design by taking into consideration the case when the incoming workload saturates our edge-front node. We first discuss our motivation for providing such an option and list the corresponding challenges. Then we introduce several collaboration schemes we have proposed and investigated.

5.1 Motivation and Challenges

The resources of an edge computing node are much richer than those of client nodes, but are relatively limited compared to cloud nodes. While serving an increasing number of nearby client nodes, the edge-front node will eventually be overloaded and become non-responsive to new requests. As a baseline, we can optionally choose to offload further requests to the remote cloud. We assume that the remote cloud has unlimited resources and is capable of handling all the requests. However, when running tasks remotely in the cloud, the application has to bear unpredictable latency and limited bandwidth, which is not the best choice, especially when there are other nearby edge nodes that can accommodate those tasks. We assume that, when all available nearby edge nodes are exhausted, the mobile-edge-cloud computing paradigm will simply fall back to the mobile cloud computing paradigm; the fallback design is not in the scope of this paper. In this paper, we mainly investigate inter-edge collaboration, with the prime purpose of alleviating the burden on the edge-front node.

When the edge-front node is saturated with requests, it can collaborate with nearby edge nodes by placing some tasks on these not-so-busy edge nodes, such that all the tasks can get scheduled in a reasonable time. This is slightly different from balancing the workload among the edge nodes and the edge-front node, in that the goal of inter-edge collaboration is to better serve the client nodes with submitted requests, rather than simply making the workload balanced. For example, an edge-front node that is not overloaded does not need to place any tasks on the nearby edge nodes, even when they are idle.

The challenges of inter-edge collaboration are two-fold: 1) we need to design a proper inter-edge task placement scheme that fulfills our goal of reducing the workload on the edge-front node while offloading a proper amount of workload to the qualified edge nodes; 2) the task placement scheme should be lightweight, scalable, and easy to implement.

SEC '17, October 12–14, 2017, San Jose / Silicon Valley, CA, USA. S. Yi et al.

5.2 Inter-Edge Task Placement Schemes
We have investigated three task placement schemes for inter-edge collaboration:

• Shortest Transmission Time First (STTF)
• Shortest Queue Length First (SQLF)
• Shortest Scheduling Latency First (SSLF)

The STTF task placement scheme tends to place tasks on the edge node that has the shortest estimated latency for the edge-front node to transfer the tasks. The edge-front node maintains a table to record the latency of transmitting data to each available edge node. Periodical re-calibration is necessary, because the network condition between the edge-front node and the other edge nodes may vary from time to time.

The SQLF task placement scheme, on the other hand, tends to transfer tasks from the edge-front node to the edge node which has the least number of tasks queued at the time of query. When the edge-front node is saturated with requests, it will first query all the available edge nodes about their current task queue lengths, and then transfer tasks to the edge node that reports the shortest value.
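As a concrete sketch, both measurement-driven choices reduce to a min-selection over a table the edge-front node maintains. The node names and measurements below are hypothetical, not from our testbed:

```python
# Sketch (not the paper's exact implementation) of STTF and SQLF target
# selection on the edge-front node.

def pick_sttf(transfer_latency_ms):
    """STTF: the edge node with the shortest estimated transfer latency."""
    return min(transfer_latency_ms, key=transfer_latency_ms.get)

def pick_sqlf(queue_lengths):
    """SQLF: the edge node reporting the fewest queued tasks."""
    return min(queue_lengths, key=queue_lengths.get)

latency_ms = {"edge1": 12.0, "edge2": 25.0, "edge3": 110.0}  # hypothetical
queue_len = {"edge1": 40, "edge2": 12, "edge3": 3}           # hypothetical

print(pick_sttf(latency_ms))  # edge1: cheapest to reach
print(pick_sqlf(queue_len))   # edge3: least loaded
```

Note that the two pickers can disagree, which is exactly the tension the SSLF scheme below tries to resolve.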

The SSLF task placement scheme tends to transmit tasks from the edge-front node to the edge node that is predicted to have the shortest response time. The response time is the time interval between the time when the edge-front node submits a task to an available edge node and the time when it receives the result of the task from that edge node. Unlike the SQLF task placement scheme, in which the edge-front node keeps querying the edge nodes about their queue lengths (which may cause performance issues and a large volume of queries as the number of nodes scales up), we have designed a novel method for the edge-front node to measure the scheduling latency efficiently. During the measurement phase, before the edge-front node chooses a task placement target, it sends a request message to each available edge node, which appends a special task to the tail of its task queue. When the special task is executed, the edge node simply sends a response message back, and the edge-front node records the response time. In this way, the edge-front node maintains a series of response times for each available edge node. When the edge-front node is saturated, it will start to reassign tasks to the edge node having the shortest response time. Unlike the STTF and SQLF schemes, which choose the target edge node based on the current or most recent measurements, the SSLF scheme predicts the current response time for each edge node by applying regression analysis to the response time series recorded so far. The reason is that the edge nodes are also receiving task requests from client nodes, and their local workloads may vary from time to time, so the most recent response time cannot serve as a good predictor of the current response time. As the real-world local workload on each edge node usually follows a certain pattern or trend, applying regression analysis to the recorded response times is a good way to estimate the current response time. To this end, the edge-front node keeps the recorded response-time measurements from each edge node and offloads tasks to the edge node that is predicted to have the least current response time. Once the edge-front node starts to place tasks on a certain edge node, the estimation is updated by piggybacking on the redirected tasks, which lowers the measurement overhead.
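The prediction step can be sketched as follows, under the simple assumption of a linear trend (the same assumption our evaluation in Section 6.6 makes); the function and variable names are ours, not from the system's code:

```python
# SSLF sketch: fit a least-squares line to each node's recorded
# response-time series and extrapolate one step ahead, then pick the
# node with the smallest predicted response time.

def predict_next(series):
    """Extrapolate the next value of a series via a least-squares linear fit."""
    n = len(series)
    mean_x = (n - 1) / 2.0
    mean_y = sum(series) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    slope = sxy / sxx if sxx else 0.0
    return mean_y + slope * (n - mean_x)  # predicted value at index n

def pick_sslf(response_history):
    """SSLF: the edge node with the smallest predicted response time."""
    return min(response_history,
               key=lambda node: predict_next(response_history[node]))

history = {                       # hypothetical response times in seconds
    "edge1": [0.8, 0.9, 1.1],     # trending up: node getting busier
    "edge2": [1.5, 1.3, 1.1],     # trending down: node freeing up
}
print(pick_sslf(history))  # edge2, even though both last reported 1.1 s
```

The example illustrates why the trend matters: both nodes most recently reported the same response time, but the regression prefers the one whose load is falling.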

Each of the task placement schemes described above has advantages and disadvantages. For instance, the STTF scheme can quickly reduce the workload on the edge-front node, but there is a chance that tasks may be placed on an edge node which already has an intensive workload, as the STTF scheme gathers no information about the workload on the target. The SQLF scheme works well when the network latency and bandwidth are stable among all the available edge nodes. When the network overheads are highly variant, this scheme fails to factor in the network condition and always chooses the edge node with the lowest workload. When an intensive workload is placed under a high network overhead, this scheme potentially deteriorates the performance, as it needs to measure the workload frequently. The SSLF task placement scheme estimates the response time of each edge node by following the task-offloading process, and the response time is a good indicator of which edge node should be chosen as the target of task placement in terms of both workload and network overhead. The SSLF scheme is thus a good trade-off between the previous two schemes. However, the regression analysis may introduce a large error into the predicted response time if inappropriate models are selected. We believe that the decision of which task placement scheme should be employed for good system performance should always give proper consideration to the workload and network conditions. We evaluate these three schemes through a case study in the next section.

6 SYSTEM IMPLEMENTATION AND PERFORMANCE EVALUATION

In this section, we first describe the implementation details of our system. Next, we introduce our evaluation setup and present the results of our evaluations.

6.1 Implementation Details
Our implementation aims at a serverless edge architecture. As shown in the system architecture of Fig. 4, our implementation is based on Docker containers for the benefits of quick deployment and easy management. Every component has been dockerized, and its deployment is greatly simplified by distributing pre-built images. The creation and destruction of Docker instances is much faster than that of VM instances. Inspired by IBM OpenWhisk [18], each worker container contains an action proxy, which uses Python to run any scripts, or to compile and execute any binary executable. The worker container communicates with others using a message queue, as all the inputs/outputs are JSONified. However, we do not JSONify images/videos; we use their path references in shared storage instead. The task queue is implemented using Redis, as it is in-memory and has very good performance. The end user only needs to 1) deploy our edge computing platform on heterogeneous devices with just a click, 2) define the events of interest using the provided API, and 3) provide a function (scripts or a binary executable) to process such events. The function we have implemented uses the open source project OpenALPR [22] as the task payload for workers.
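The message convention above can be sketched as follows. We stand in for the Redis list with an in-memory deque (a real deployment would use LPUSH/BRPOP against Redis), and the event name and storage path are hypothetical:

```python
# Sketch of the worker message format: inputs/outputs are JSONified,
# but the image itself is passed as a path reference into shared
# storage rather than embedded in the message.
import json
from collections import deque

task_queue = deque()  # in-memory stand-in for a Redis list

def submit(event, frame_path):
    # Analogous to LPUSH on the Redis task queue.
    task_queue.appendleft(json.dumps({"event": event, "input": frame_path}))

def next_task():
    # Analogous to BRPOP (non-blocking here).
    return json.loads(task_queue.pop())

submit("vehicle_detected", "/shared/frames/0001.jpg")
task = next_task()
print(task["input"])  # /shared/frames/0001.jpg
```

Keeping only a path reference in the message is what keeps the queue cheap to serialize while large frames stay in shared storage.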


6.2 Evaluation Setup
6.2.1 Testbed. We have built a testbed consisting of four edge computing nodes. One of the edge nodes is the edge-front node, which is directly connected to a wireless router using a cable. The other three nodes are set up as nearby edge computing nodes for the evaluation of inter-edge collaboration. These four machines have the same hardware specifications: they all have a quad-core CPU and 4 GB of main memory. The three nearby edge nodes are directly connected to the edge-front node through network cables. We make use of two types of Raspberry Pi (RPi) nodes as clients: one type is the RPi 2, which is wired to the router, while the other type is the RPi 3, which is connected to the router using its built-in 2.4 GHz WiFi.

6.2.2 Datasets. We have employed three datasets for the evaluations. One dataset is the Caltech Vision Group 2001 testing database, in which the car rear image resolution (126 images at 896x592) is adequate for license plate recognition [25]. Another dataset is a self-collected 4K video containing rear license plates, taken on an Android smartphone and converted into videos of different resolutions (640x480, 960x720, 1280x960, and 1600x1200). The other dataset, used in the inter-edge collaboration evaluation, contains 22 car images with various resolutions ranging from 405x540 pixels to 2514x1210 pixels (file size 31.6 KB to 2.85 MB). The task requests use the car images as input in a round-robin way, one car image for each task request.

6.3 Task Profiler
Besides the round-trip time and bandwidth benchmarks we have presented in Fig. 2 and Fig. 3 to characterize the edge computing network, we have profiled the OpenALPR application on various client, edge, and cloud nodes.

Figure 6: OpenALPR profile result of client type 1 (RPi 2, quad-core 0.9 GHz). Per-stage execution time in ms (motion detection, plate detection, plate analysis, OCR) for workload 1 (896x592) and workload 2 (640x480 to 1600x1200).

In this experiment, we use both dataset 1 (workload 1) and dataset 2 (workload 2) at various resolutions. The execution times for the tasks are shown in Fig. 6, Fig. 7, Fig. 8, and Fig. 9. The results indicate that by utilizing an edge node, we can get a comparable amount of computation power close to the clients for computation-intensive tasks. Another observation is that, due to the uneven optimizations on heterogeneous CPU architectures, some tasks are better kept local while some others should be

Figure 7: OpenALPR profile result of client type 2 (RPi 3, quad-core 1.2 GHz), with the same workloads and stages as Fig. 6.

Figure 8: OpenALPR profile result of a type of edge node (i7 quad-core 2.30 GHz), with the same workloads and stages as Fig. 6.

Figure 9: OpenALPR profile result of a type of cloud node (AWS EC2 t2.large, Xeon dual-core 2.40 GHz), with the same workloads and stages as Fig. 6.

offloaded to the edge computing node. This observation justifies the need for computation offloading between clients and edge nodes.


Figure 10: The comparison of task selection impacts on edge offloading and cloud offloading for wired clients (RPi 2). Response time per frame per client (s) for client-edge opt, client only, edge only, client-cloud opt, and cloud only, across workload 2 resolutions.

Figure 11: The comparison of task selection impacts on edge offloading and cloud offloading for 2.4 GHz wireless clients (RPi 3), with the same configurations as Fig. 10.

6.4 Offloading Task Selection
To understand how much the execution time can be reduced by splitting tasks between the client and the edge, or between the client and the cloud, we design an experiment with workloads generated from dataset 2 under two scenarios: 1) one edge node provides service to three wired client nodes that have the best network latency and bandwidth; 2) one edge node provides service to three wireless 2.4 GHz client nodes that have high-variance latency and relatively low bandwidth. The result of the first case is very straightforward: the clients simply upload all the input data and run all the tasks on the edge node (in edge offloading) or the cloud node (in cloud offloading), as shown in Fig. 10. This is mainly because the Ethernet cable stably provides the lowest latency and highest bandwidth, which makes offloading to the edge very rewarding. We did not evaluate 5 GHz wireless clients, since this interface is not supported on our client hardware, but we anticipate results similar to the wired case. We plot the result of a 2.4 GHz wireless client node offloading to an edge node or a remote cloud node in the second case in Fig. 11. Overall, the results show that by offloading tasks to an edge computing platform,

the application we chose experienced a speedup of up to 4.0x in the wired client-edge configuration compared to local execution, and up to 1.7x compared to a similar client-cloud configuration. For clients with a 2.4 GHz wireless interface, the speedup is up to 1.3x in the client-edge configuration compared to local execution, and up to 1.2x compared to a similar client-cloud configuration.

Figure 12: The comparison result of three task prioritizing schemes (our scheme, SIOF, LCPUL): response time (s) vs. number of task offloading requests.

6.5 Edge-front Task Queue Prioritizing
To evaluate the performance of the task queue prioritizing, we collect the statistical results from our profiler service and monitoring service under various workloads for simulation. We choose the simulation method because we can freely set up the numbers and types of client and edge nodes, overcoming the limitation of our current testbed to evaluate more complex deployments. We add two simple schemes as baselines: 1) shortest I/O first (SIOF), sorting all the tasks by the time cost of the network transmission; 2) longest CPU last (LCPUL), sorting all the tasks by the time cost of the processing on the edge node. In the simulation, based on the combinations of client device types, workloads, and offloading decisions, we have in total seven types of jobs to run on the edge node. We increase the total number of jobs, distribute them evenly among the seven types, and report the makespan time in Fig. 12. The result shows that LCPUL is the worst among the three schemes, and our scheme outperforms the SIOF scheme.
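For illustration (this is not our simulator), the two baseline orderings reduce to sorting jobs by a single cost field; the job ids and costs below are made up:

```python
# Baseline orderings used in the task-queue prioritizing comparison.

def siof(jobs):
    """Shortest I/O first: ascending by network transmission time."""
    return sorted(jobs, key=lambda j: j["net"])

def lcpul(jobs):
    """Longest CPU last: ascending by edge processing time."""
    return sorted(jobs, key=lambda j: j["cpu"])

jobs = [
    {"id": 1, "net": 0.30, "cpu": 0.90},  # hypothetical costs in seconds
    {"id": 2, "net": 0.05, "cpu": 1.20},
    {"id": 3, "net": 0.60, "cpu": 0.40},
]
print([j["id"] for j in siof(jobs)])   # [2, 1, 3]
print([j["id"] for j in lcpul(jobs)])  # [3, 1, 2]
```

Because each baseline considers only one cost dimension, a job that is cheap to transmit but expensive to process (or vice versa) can be ordered badly, which is the gap our prioritizing scheme targets.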

6.6 Inter-Edge Collaboration
We also evaluate the performance of the three task placement schemes (i.e., STTF, SQLF, and SSLF) discussed in Section 5 through a controlled experiment on our testbed. For evaluation purposes, we configure the network in the edge computing system as follows. The first edge node, denoted as "edge node 1" afterwards, has 10 ms RTT and 40 Mbps bandwidth to the edge-front node. The second edge node, "edge node 2", has 20 ms RTT and 20 Mbps bandwidth to the edge-front node. The third edge node, "edge node 3", has 100 ms RTT and 2 Mbps bandwidth to the edge-front node. Thus, we emulate a situation where the three edge nodes sit at distances from near to far from the edge-front node.


Figure 13: Performance with no task placement scheme. Throughput (tasks/sec) over time (min) for the edge-front node and edge nodes 1–3.

Figure 14: Performance of STTF. Throughput (tasks/sec) over time (min) for the edge-front node and edge nodes 1–3.

Figure 15: Performance of SQLF. Throughput (tasks/sec) over time (min) for the edge-front node and edge nodes 1–3.

Figure 16: Performance of SSLF. Throughput (tasks/sec) over time (min) for the edge-front node and edge nodes 1–3.

We use the third dataset to synthesize a workload as follows. In the first 4 minutes, the edge-front node receives 5 task requests per second, edge node 1 receives 4 task requests per second, edge node 2 receives 3 task requests per second, and edge node 3 receives 2 task requests per second. No task comes to any of the edge nodes after the first 4 minutes. For the SSLF task placement scheme, we implement a simple linear regression to predict the scheduling latency of the task being transmitted, since the workload we have injected is uniformly distributed.

Fig. 13 illustrates the throughput on each edge node when no task placement scheme is enabled on the edge-front node. The edge-front node has the heaviest workload, and it takes about 12.36 minutes to finish all the tasks. We consider this result our baseline.

Fig. 14 shows the throughput result of the STTF scheme. In this case, the edge-front node only transmits tasks to edge node 1, because edge node 1 has the highest bandwidth and the shortest RTT to the edge-front node. Fig. 17 reveals that the edge-front node transmits 120 tasks to edge node 1 and no tasks to the other edge nodes. As edge node 1 has a heavier workload than edge node 2 and edge node 3, the STTF scheme has limited improvement on the system performance: the edge-front node takes about 11.29 minutes to finish all the tasks. Fig. 15 illustrates the throughput result of the SQLF

scheme. This scheme works better than the STTF scheme, because the edge-front node transmits more tasks to less-saturated edge nodes, efficiently reducing the workload on the edge-front node. However, the edge-front node tends to transmit many tasks to edge node 3 at the beginning, which has the lowest bandwidth and the longest RTT to the edge-front node. As such, the task placement may incur more delay than expected. From Fig. 17, the edge-front node transmits 0 tasks to edge node 1, 132 tasks to edge node 2, and 152 tasks to edge node 3. The edge-front node takes about 9.6 minutes to finish all the tasks.

Fig. 16 demonstrates the throughput result of the SSLF scheme. This scheme considers both the transmission time of the task being placed and the waiting time in the queue on the target edge node, and therefore achieves the best performance of the three. As mentioned, edge node 1 has the lowest transmission overhead but the heaviest workload among the three edge nodes, while edge node 3 has the lightest workload but the highest transmission overhead. In contrast, edge node 2 has modest transmission overhead and a modest workload. The SSLF scheme takes all these situations into consideration and places the largest number of tasks on edge node 2. As shown in Fig. 17, the edge-front node transmits 4 tasks to edge node 1, 152 tasks to edge node 2, and 148 tasks to edge node 3 when working with the SSLF scheme. The edge-front node


takes about 9.36 minutes to finish all the tasks, which is the best result among the three schemes. We infer that the SSLF scheme will further improve the task completion time when tougher network conditions and workloads are considered.

Figure 17: Numbers of tasks placed by the edge-front node on edge nodes 1–3 under STTF, SQLF, and SSLF.

7 RELATED WORK
The emergence of edge computing has drawn attention due to its capability to reshape the landscape of IoT, mobile computing, and cloud computing [6, 14, 32, 33, 36–38]. Satyanarayanan [29] has described the origin of edge computing, also known as fog computing [4], cloudlet [28], mobile edge computing [24], and so on. Here we review several research fields relevant to video edge analytics, including distributed data processing and computation offloading in various computing paradigms.

7.1 Distributed Data Processing
Distributed data processing has a close relationship to edge analytics, in the sense that those data processing platforms [9, 39] and underlying techniques [16, 23, 27] can be easily deployed on a cluster of edge nodes. In this paper, we pay special attention to distributed image/video data processing systems. VideoStorm [40] made insightful observations on vision-related algorithms and proposed a resource-quality trade-off with multi-dimensional configurations (e.g., video resolution, frame rate, sampling rate, sliding window size, etc.). The resource-quality profiles are generated offline, and an online scheduler is built to allocate resources to queries to optimize the utility of quality and latency. Their work is complementary to ours, in that we do not consider the trade-off between quality and latency goals via adaptive configurations. Vigil [42] is a wireless video surveillance system that leveraged edge computing nodes with an emphasis on content-aware frame selection, in a scenario where multiple web cameras at the same location optimize the bandwidth utilization, which is orthogonal to the problems we have addressed here. Firework [41] is a computing paradigm for big data processing in collaborative edge environments, which is complementary to our work in terms of shared data view and programming interface.

While there should be more ongoing efforts investigating the adaptation, improvement, and optimization of existing distributed data processing techniques on edge computing platforms, we focus more on the task/application-level queue management and scheduling, and leave all the underlying resource negotiation and process scheduling to the container cluster engine.

7.2 Computation Offloading
Computation offloading (a.k.a. cyber foraging [28]) has been proposed to improve resource utilization, response time, and energy consumption in various computing environments [7, 13, 21, 31]. Work [17] has quantified the impact of edge computing on mobile applications and found that edge computing can improve response time and energy consumption significantly for mobile devices, through offloading via both WiFi and LTE networks. Mocha [34] has investigated how a two-stage face recognition task from a mobile device can be accelerated by a cloudlet and the cloud. In their design, clients simply capture images and send them to the cloudlet. The optimal task partition can be easily achieved, as it has only two stages. In LAVEA, our application is more complicated, with multiple stages, and we leverage client-edge offloading and other techniques to improve the resource utilization and optimize the response time.

8 DISCUSSIONS AND LIMITATIONS
In this section, we discuss alternative design options, and point out current limitations and future work that can improve the system.

Measurement-based Offloading. In this paper, we use measurement-based offloading (static offloading), i.e., the offloading decisions are based on the outcomes of periodic measurements. We consider this one of the limitations of our implementation, as stated in [15], and several dynamic computation offloading schemes have been proposed [12]. We plan to improve the measurement-based offloading in future work.

Video Streaming. Our current data processing is image-based, which is one of the limitations of our implementation. The input is either in the format of images or a video stream, which is read into frames and sent out. We believe that utilizing existing video streaming techniques between our system components for data sharing will further improve the system performance and open more opportunities for optimization.

Discovering Edge Nodes. There are different ways for the edge-front node to discover the available edge nodes nearby. For example, every edge node intending to serve as a collaborator may open a designated port, so that the edge-front node can periodically scan the network and discover the available edge nodes. This is called the "pull-based" method. In contrast, there is also a "push-based" method, in which the edge-front node opens a designated port and every edge node intending to serve as a collaborator registers with the edge-front node. When the network is at a large scale, the pull-based method usually performs poorly, because the edge-front node may not be able to discover an available edge node in a short time. For this reason, our edge node discovery is implemented in a push-based manner, which guarantees good performance regardless of the network scale.
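A minimal sketch of the push-based registry the edge-front node could keep is shown below; the class name, addresses, and freshness window are hypothetical, not from our implementation:

```python
# Push-based discovery sketch: collaborators register themselves with
# the edge-front node, which treats a registration as stale after a
# time-to-live window.
import time

class EdgeFrontRegistry:
    def __init__(self, ttl=30.0):
        self.ttl = ttl
        self.nodes = {}  # node address -> last registration timestamp

    def register(self, addr, now=None):
        # Called by an edge node that wants to serve as a collaborator.
        self.nodes[addr] = time.time() if now is None else now

    def available(self, now=None):
        # Only nodes that re-registered within the TTL count as available.
        now = time.time() if now is None else now
        return [a for a, t in self.nodes.items() if now - t <= self.ttl]

reg = EdgeFrontRegistry(ttl=30.0)
reg.register("10.0.0.2", now=0.0)
reg.register("10.0.0.3", now=20.0)
print(reg.available(now=40.0))  # only 10.0.0.3 is still fresh
```

With periodic re-registration, the edge-front node's view stays current without it ever scanning the network, which is why the cost does not grow with the network scale.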

9 CONCLUSION
In this paper, we have investigated providing video analytics services to latency-sensitive applications in an edge computing environment.


As a result, we have built LAVEA, a low-latency video edge analytics system, which collaborates nearby client, edge, and remote cloud nodes, and transforms video feeds into semantic information at places closer to the users, in early stages. We have utilized an edge-front design, formulated an optimization problem for offloading task selection, and prioritized the task queue to minimize the response time. Our results indicate that by offloading tasks to the closest edge node, the client-edge configuration has a speedup ranging from 1.3x to 4x (1.2x to 1.7x) against running locally (against the client-cloud configuration) under various network conditions and workloads. In case of a saturating workload on the front edge node, we have proposed and compared various task placement schemes that are tailored for inter-edge collaboration. The proposed prediction-based shortest scheduling latency first task placement scheme considers both the transmission time of the task and the waiting time in the queue, and delivers better overall performance than the other schemes.

REFERENCES
[1] Amazon Web Service. 2017. AWS Lambda@Edge. http://docs.aws.amazon.com/lambda/latest/dg/lambda-edge.html. (2017).
[2] Christos-Nikolaos E. Anagnostopoulos, Ioannis E. Anagnostopoulos, Ioannis D. Psoroulas, Vassili Loumos, and Eleftherios Kayafas. 2008. License plate recognition from still images and video sequences: A survey. IEEE Transactions on Intelligent Transportation Systems 9, 3 (2008), 377–391.
[3] K.R. Baker. 1990. Scheduling groups of jobs in the two-machine flow shop. Mathematical and Computer Modelling 13, 3 (1990), 29–36.
[4] Flavio Bonomi, Rodolfo Milito, Jiang Zhu, and Sateesh Addepalli. 2012. Fog computing and its role in the internet of things. In Proceedings of the first edition of the MCC workshop on Mobile cloud computing. ACM, 13–16.
[5] Peter Brucker, Bernd Jurisch, and Bernd Sievers. 1994. A branch and bound algorithm for the job-shop scheduling problem. Discrete Applied Mathematics 49, 1 (1994), 107–127.
[6] Yu Cao, Songqing Chen, Peng Hou, and Donald Brown. 2015. FAST: A fog computing assisted distributed analytics system to monitor fall for stroke mitigation. In Networking, Architecture and Storage (NAS), 2015 IEEE International Conference on. IEEE, 2–11.
[7] Eduardo Cuervo, Aruna Balasubramanian, Dae-ki Cho, Alec Wolman, Stefan Saroiu, Ranveer Chandra, and Paramvir Bahl. 2010. MAUI: Making Smartphones Last Longer with Code Offload. In Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services (MobiSys '10). ACM, New York, NY, USA, 49–62. DOI: https://doi.org/10.1145/1814433.1814441
[8] Eyal de Lara, Carolina S. Gomes, Steve Langridge, S. Hossein Mortazavi, and Meysam Roodi. 2016. Hierarchical Serverless Computing for the Mobile Edge. In Edge Computing (SEC), IEEE/ACM Symposium on. IEEE, 109–110.
[9] Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (2008), 107–113.
[10] Shan Du, Mahmoud Ibrahim, Mohamed Shehata, and Wael Badawy. 2013. Automatic license plate recognition (ALPR): A state-of-the-art review. IEEE Transactions on Circuits and Systems for Video Technology 23, 2 (2013), 311–325.
[11] Sadjad Fouladi, Riad S. Wahby, Brennan Shacklett, Karthikeyan Vasuki Balasubramaniam, William Zeng, Rahul Bhalerao, Anirudh Sivaraman, George Porter, and Keith Winstein. 2017. Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 363–376. https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/fouladi
[12] Wei Gao, Yong Li, Haoyang Lu, Ting Wang, and Cong Liu. 2014. On exploiting dynamic execution patterns for workload offloading in mobile cloud applications. In Network Protocols (ICNP), 2014 IEEE 22nd International Conference on. IEEE, 1–12.
[13] Mark S. Gordon, Davoud Anoushe Jamshidi, Scott A. Mahlke, Zhuoqing Morley Mao, and Xu Chen. 2012. COMET: Code Offload by Migrating Execution Transparently. In OSDI, Vol. 12. 93–106.
[14] Zijiang Hao, Ed Novak, Shanhe Yi, and Qun Li. 2017. Challenges and Software Architecture for Fog Computing. Internet Computing (2017).
[15] Mohammed A. Hassan, Kshitiz Bhattarai, Qi Wei, and Songqing Chen. 2014. POMAC: Properly Offloading Mobile Applications to Clouds. In 6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 14).
[16] Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy H. Katz, Scott Shenker, and Ion Stoica. 2011. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. In NSDI, Vol. 11. 22–22.
[17] Wenlu Hu, Ying Gao, Kiryong Ha, Junjue Wang, Brandon Amos, Zhuo Chen, Padmanabhan Pillai, and Mahadev Satyanarayanan. 2016. Quantifying the Impact of Edge Computing on Mobile Applications. In Proceedings of the 7th ACM SIGOPS Asia-Pacific Workshop on Systems. ACM, 5.
[18] IBM. 2017. Apache OpenWhisk. http://openwhisk.org. (April 2017).
[19] Selmer Martin Johnson. 1954. Optimal two- and three-stage production schedules with setup times included. Naval Research Logistics Quarterly 1, 1 (1954), 61–68.
[20] Eric Jonas, Shivaram Venkataraman, Ion Stoica, and Benjamin Recht. 2017. Occupy the Cloud: Distributed computing for the 99%. arXiv preprint arXiv:1702.04024 (2017).
[21] Ryan Newton, Sivan Toledo, Lewis Girod, Hari Balakrishnan, and Samuel Madden. 2009. Wishbone: Profile-based Partitioning for Sensornet Applications. In NSDI, Vol. 9. 395–408.
[22] OpenALPR. 2017. OpenALPR – Automatic License Plate Recognition. http://www.openalpr.com. (April 2017).
[23] Kay Ousterhout, Patrick Wendell, Matei Zaharia, and Ion Stoica. 2013. Sparrow: distributed, low latency scheduling. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 69–84.
[24] M. Patel, B. Naughton, C. Chan, N. Sprecher, S. Abeta, A. Neal, and others. 2014. Mobile-edge computing introductory technical white paper. White Paper, Mobile-edge Computing (MEC) industry initiative (2014).
[25] Brad Philip and Paul Updike. 2001. Caltech Vision Group 2001 testing database. http://www.vision.caltech.edu/html-files/archive.html (2001).
[26] Kari Pulli, Anatoly Baksheev, Kirill Kornyakov, and Victor Eruhimov. 2012. Real-time computer vision with OpenCV. Commun. ACM 55, 6 (2012), 61–69.
[27] Jeff Rasley, Konstantinos Karanasos, Srikanth Kandula, Rodrigo Fonseca, Milan Vojnovic, and Sriram Rao. 2016. Efficient queue management for cluster scheduling. In Proceedings of the Eleventh European Conference on Computer Systems. ACM, 36.
[28] Mahadev Satyanarayanan. 2001. Pervasive computing: Vision and challenges. IEEE Personal Communications 8, 4 (2001), 10–17.
[29] Mahadev Satyanarayanan. 2017. The Emergence of Edge Computing. Computer 50, 1 (2017), 30–39.
[30] Mahadev Satyanarayanan, Pieter Simoens, Yu Xiao, Padmanabhan Pillai, Zhuo Chen, Kiryong Ha, Wenlu Hu, and Brandon Amos. 2015. Edge analytics in the internet of things. IEEE Pervasive Computing 14, 2 (2015), 24–31.
[31] Cong Shi, Karim Habak, Pranesh Pandurangan, Mostafa Ammar, Mayur Naik, and Ellen Zegura. 2014. Cosmos: computation offloading as a service for mobile devices. In Proceedings of the 15th ACM international symposium on Mobile ad hoc networking and computing. ACM, 287–296.
[32] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu. 2016. Edge Computing: Vision and Challenges. IEEE Internet of Things Journal 3, 5 (Oct 2016), 637–646. DOI: https://doi.org/10.1109/JIOT.2016.2579198
[33] Weisong Shi and Schahram Dustdar. 2016. The Promise of Edge Computing. Computer 49, 5 (2016), 78–81.
[34] Tolga Soyata, Rajani Muraleedharan, Colin Funai, Minseok Kwon, and Wendi Heinzelman. 2012. Cloud-Vision: Real-time face recognition using a mobile-cloudlet-cloud acceleration architecture. In Computers and Communications (ISCC), 2012 IEEE Symposium on. IEEE, 000059–000066.
[35] Xiaoli Wang, Aakanksha Chowdhery, and Mung Chiang. 2016. SkyEyes: adaptive video streaming from UAVs. In Proceedings of the 3rd Workshop on Hot Topics in Wireless. ACM, 2–6.
[36] Shanhe Yi, Zijiang Hao, Zhengrui Qin, and Qun Li. 2015. Fog Computing: Platform and Applications. In Hot Topics in Web Systems and Technologies (HotWeb), 2015 Third IEEE Workshop on. IEEE, 73–78.
[37] Shanhe Yi, Cheng Li, and Qun Li. 2015. A Survey of Fog Computing: Concepts, Applications and Issues. In Proceedings of the 2015 Workshop on Mobile Big Data (Mobidata '15). ACM, 37–42.
[38] Shanhe Yi, Zhengrui Qin, and Qun Li. 2015. Security and privacy issues of fog computing: A survey. In Wireless Algorithms, Systems, and Applications. Springer, 685–695.
[39] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: cluster computing with working sets. HotCloud 10 (2010), 10–10.
[40] Haoyu Zhang, Ganesh Ananthanarayanan, Peter Bodik, Matthai Philipose, Paramvir Bahl, and Michael J. Freedman. 2017. Live Video Analytics at Scale with Approximation and Delay-Tolerance. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 377–392. https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/zhang
[41] Q. Zhang, X. Zhang, Q. Zhang, W. Shi, and H. Zhong. 2016. Firework: Big Data Sharing and Processing in Collaborative Edge Environment. In 2016 Fourth IEEE Workshop on Hot Topics in Web Systems and Technologies (HotWeb). 20–25. DOI: https://doi.org/10.1109/HotWeb.2016.12
[42] Tan Zhang, Aakanksha Chowdhery, Paramvir Victor Bahl, Kyle Jamieson, and Suman Banerjee. 2015. The design and implementation of a wireless video surveillance system. In Proceedings of the 21st Annual International Conference on Mobile Computing and Networking. ACM, 426–438.


SEC '17, October 12–14, 2017, San Jose / Silicon Valley, CA, USA · S. Yi et al.

5.2 Inter-Edge Task Placement Schemes

We have investigated three task placement schemes for inter-edge collaboration:

• Shortest Transmission Time First (STTF)
• Shortest Queue Length First (SQLF)
• Shortest Scheduling Latency First (SSLF)

The STTF task placement scheme tends to place tasks on the edge node that has the shortest estimated latency for the edge-front node to transfer the tasks. The edge-front node maintains a table to record the latency of transmitting data to each available edge node. Periodic re-calibration is necessary, because the network condition between the edge-front node and the other edge nodes may vary from time to time.

The SQLF task placement scheme, on the other hand, tends to transfer tasks from the edge-front node to the edge node which has the least number of tasks queued at the time of query. When the edge-front node is saturated with requests, it first queries all the available edge nodes about their current task queue length, and then transfers tasks to the edge node that reported the shortest value.
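Both selection rules above reduce to an argmin over one per-node measurement. A minimal sketch (node names and numbers below are hypothetical, not taken from the paper):

```python
def pick_sttf(transmission_ms):
    """STTF: choose the edge node with the shortest (periodically
    re-calibrated) estimated transmission latency."""
    return min(transmission_ms, key=transmission_ms.get)

def pick_sqlf(queue_lengths):
    """SQLF: choose the edge node reporting the shortest task queue
    at the time of query."""
    return min(queue_lengths, key=queue_lengths.get)

# Hypothetical measurements for three collaborating edge nodes.
transmission_ms = {"edge1": 12.0, "edge2": 25.0, "edge3": 110.0}
queue_lengths = {"edge1": 40, "edge2": 15, "edge3": 3}
```

Note that the two rules can disagree: with the sample numbers above, STTF picks the nearest node (`edge1`) while SQLF picks the least-loaded one (`edge3`).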

The SSLF task placement scheme tends to transmit tasks from the edge-front node to the edge node that is predicted to have the shortest response time. The response time is the interval between the time when the edge-front node submits a task to an available edge node and the time when it receives the result of that task from the edge node. In the SQLF scheme, the edge-front node keeps querying the edge nodes about their queue lengths, which may cause performance issues when the number of nodes scales up, as it results in a large volume of queries. We have therefore designed a novel method for the edge-front node to measure the scheduling latency efficiently. During the measurement phase, before the edge-front node chooses a task placement target, it sends a request message to each available edge node, which appends a special task to the tail of its task queue. When the special task is executed, the edge node simply sends a response message back to the edge-front node, which records the response time. By repeating this periodically, the edge-front node maintains a series of response times for each available edge node. When the edge-front node is saturated, it starts to reassign tasks to the edge node having the shortest response time. Unlike the STTF and SQLF schemes, which choose the target edge node based on the current or most recent measurements, the SSLF scheme predicts the current response time of each edge node by applying regression analysis to the response time series recorded so far. The reason is that the edge nodes also receive task requests from their own client nodes, and their local workloads may vary from time to time, so the most recent response time alone cannot serve as a good predictor of the current response time. As the real-world local workload on each edge node usually follows a certain pattern or trend, applying regression analysis to the recorded response times is a good way to estimate the current response time. To this end, the edge-front node keeps recorded measurements of response times from each edge node and offloads tasks to the edge node that is predicted to have the least current response time. Once the edge-front node starts to place tasks on a certain edge node, the estimation is updated by piggybacking on the redirected tasks, which lowers the measurement overhead.
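The prediction step can be sketched as follows, assuming the simple linear regression that Section 6.6 reports using (ordinary least squares over each node's recorded `(timestamp, response_time)` probes; the history values below are hypothetical):

```python
def predict_response_time(samples, now):
    """Fit response_time = a + b*t by ordinary least squares over the
    recorded (timestamp, response_time) probes, then extrapolate to `now`."""
    n = len(samples)
    ts = [t for t, _ in samples]
    ys = [y for _, y in samples]
    mt, my = sum(ts) / n, sum(ys) / n
    sxx = sum((t - mt) ** 2 for t in ts)
    b = 0.0 if sxx == 0 else sum((t - mt) * (y - my) for t, y in samples) / sxx
    a = my - b * mt
    return a + b * now

def pick_sslf(history, now):
    """SSLF: offload to the node with the smallest predicted response time."""
    return min(history, key=lambda node: predict_response_time(history[node], now))
```

With a rising trend on one node and a falling trend on another, SSLF follows the extrapolated trend rather than the most recent sample, which is exactly why it can outperform a purely reactive choice.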

Each of the task placement schemes described above has advantages and disadvantages. For instance, the STTF scheme can quickly reduce the workload on the edge-front node, but tasks may be placed on an edge node which already has an intensive workload, as STTF gathers no information about the workload on the target. The SQLF scheme works well when the network latency and bandwidth are stable among all the available edge nodes; when the network overheads are highly variant, this scheme fails to factor in the network condition and always chooses the edge node with the lowest workload. When an intensive workload is placed over a high-overhead network path, this scheme can even deteriorate the performance, as it needs to measure the workload frequently. The SSLF task placement scheme estimates the response time of each edge node by following the task-offloading process, and the response time is a good indicator of which edge node should be chosen as the placement target in terms of both workload and network overhead. The SSLF scheme thus strikes a good trade-off between the previous two schemes. However, the regression analysis may introduce a large error into the predicted response time if an inappropriate model is selected. We believe that the decision of which task placement scheme to employ for good system performance should always give proper consideration to the workload and network conditions. We evaluate these three schemes through a case study in the next section.

6 SYSTEM IMPLEMENTATION AND PERFORMANCE EVALUATION

In this section, we first describe the implementation details of our system. Next, we introduce our evaluation setup and present the results of our evaluations.

6.1 Implementation Details

Our implementation aims at a serverless edge architecture. As shown in the system architecture of Fig. 4, our implementation is based on Docker containers for the benefits of quick deployment and easy management. Every component has been dockerized, and its deployment is greatly simplified via distributing pre-built images; the creation and destruction of Docker instances is much faster than that of VM instances. Inspired by IBM OpenWhisk [18], each worker container contains an action proxy, which uses Python to run any script, or to compile and execute any binary executable. The worker container communicates with the others using a message queue, and all the inputs/outputs are JSONified. However, we do not JSONify image/video data; instead, we pass its path reference in shared storage. The task queue is implemented using Redis, as it is in-memory and has very good performance. The end user only needs to 1) deploy our edge computing platform on heterogeneous devices with just a click, 2) define the events of interest using the provided API, and 3) provide a function (a script or binary executable) to process such events. The function we have implemented uses the open source project OpenALPR [22] as the task payload for the workers.
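The message flow just described can be sketched as below. This is a stdlib-only sketch, not the actual implementation: a `deque` stands in for the Redis list (in production, `LPUSH`/`BRPOP` on a Redis client would take its place), and a toy `alpr` action stands in for the real OpenALPR payload. Note how only the shared-storage path of the image travels through the queue, not the image bytes:

```python
import json
from collections import deque

def submit(task_queue, action_name, image_path):
    """Enqueue a JSONified task message; bulky image/video data stays in
    shared storage and only its path reference travels through the queue."""
    task_queue.appendleft(json.dumps({"action": action_name,
                                      "input": {"path": image_path}}))

def worker_step(task_queue, actions):
    """One iteration of the action-proxy loop: pop a task message, dispatch
    to the registered function, and return the JSONified result message."""
    msg = json.loads(task_queue.pop())
    result = actions[msg["action"]](msg["input"]["path"])
    return json.dumps({"action": msg["action"], "result": result})

# Usage sketch: register a toy action and process one request.
q = deque()
submit(q, "alpr", "/shared/car1.jpg")
reply = worker_step(q, {"alpr": lambda path: "plate-from-" + path})
```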


6.2 Evaluation Setup

6.2.1 Testbed. We have built a testbed consisting of four edge computing nodes. One of the edge nodes is the edge-front node, which is directly connected to a wireless router using a cable. The other three nodes are set as nearby edge computing nodes for the evaluation of inter-edge collaboration. These four machines have the same hardware specifications: they all have a quad-core CPU and 4 GB of main memory. The three nearby edge nodes are directly connected to the edge-front node through a network cable. We make use of two types of Raspberry Pi (RPi) nodes as clients: one type is the RPi 2, which is wired to the router, while the other type is the RPi 3, which is connected to the router using its built-in 2.4 GHz WiFi.

6.2.2 Datasets. We have employed three datasets for the evaluations. One dataset is the Caltech Vision Group 2001 testing database, in which the car rear image resolution (126 images at 896x592) is adequate for license plate recognition [25]. Another dataset is a self-collected 4K video containing rear license plates, taken on an Android smartphone and converted into videos of different resolutions (640x480, 960x720, 1280x960, and 1600x1200). The other dataset, used in the inter-edge collaboration evaluation, contains 22 car images with resolutions ranging from 405x540 pixels to 2514x1210 pixels (file sizes from 31.6 KB to 2.85 MB). The task requests use the car images as input in a round-robin way: one car image for each task request.

6.3 Task Profiler

Besides the round-trip time and bandwidth benchmarks we have presented in Fig. 2 and Fig. 3 to characterize the edge computing network, we have profiled the OpenALPR application on various client, edge, and cloud nodes.

[Figure omitted: grouped bars of per-stage execution time (0 to 800 ms) for MotionDetection, PlateDetection, PlateAnalysis, and OCR, on workload 1 (896x592) and workload 2 (640x480, 960x720, 1280x960, 1600x1200).]

Figure 6: OpenALPR profile result of client type 1 (RPi 2, quad-core 0.9 GHz)

In this experiment, we use both dataset 1 (workload 1) and dataset 2 (workload 2) at various resolutions. The execution times for the tasks are shown in Fig. 6, Fig. 7, Fig. 8, and Fig. 9. The results indicate that by utilizing an edge node, we can get a comparable amount of computation power close to the clients for computation-intensive tasks. Another observation is that, due to the uneven optimizations on heterogeneous CPU architectures, some tasks are better to keep local while some others should be

[Figure omitted: per-stage execution-time bars, as in Fig. 6.]

Figure 7: OpenALPR profile result of client type 2 (RPi 3, quad-core 1.2 GHz)

[Figure omitted: per-stage execution-time bars, as in Fig. 6.]

Figure 8: OpenALPR profile result of a type of edge node (i7 quad-core, 2.30 GHz)

[Figure omitted: per-stage execution-time bars, as in Fig. 6.]

Figure 9: OpenALPR profile of a type of cloud node (AWS EC2 t2.large, Xeon dual-core 2.40 GHz)

offloaded to the edge computing node. This observation justifies the need for computation offloading between clients and edge nodes.
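The per-task decision implied by this observation can be sketched as comparing local execution against transfer-plus-edge execution. This is a simplified illustration, not the paper's full Section 4 formulation, and all parameter values are hypothetical:

```python
def should_offload(local_ms, edge_ms, input_kbytes, bandwidth_mbps, rtt_ms):
    """Offload a task iff the transfer cost plus the edge execution time
    beats local execution. Transfer time approximates payload bits over
    link bandwidth plus one round trip (1 KB treated as 1000 B)."""
    transfer_ms = rtt_ms + (input_kbytes * 8.0) / bandwidth_mbps
    return transfer_ms + edge_ms < local_ms

# Hypothetical numbers: a 300 KB frame, slow local CPU, fast edge CPU.
fast_link = should_offload(local_ms=800, edge_ms=100, input_kbytes=300,
                           bandwidth_mbps=40, rtt_ms=10)   # offload pays off
slow_link = should_offload(local_ms=800, edge_ms=100, input_kbytes=300,
                           bandwidth_mbps=2, rtt_ms=100)   # keep it local
```

Under the fast link the transfer costs about 70 ms, so offloading wins; under the slow link the transfer alone costs about 1300 ms, so the task should stay local.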


[Figure omitted: response time per frame per client (0 to 4 s) for Client-edge opt, Client only, Edge only, Client-cloud opt, and Cloud only, on workload 2 at 640x480 through 1600x1200.]

Figure 10: The comparison of task selection impacts on edge offloading and cloud offloading for wired clients (RPi 2)

[Figure omitted: response time per frame per client, as in Fig. 10.]

Figure 11: The comparison of task selection impacts on edge offloading and cloud offloading for 2.4 GHz wireless clients (RPi 3)

6.4 Offloading Task Selection

To understand how much the execution time can be reduced by splitting tasks between the client and the edge, or between the client and the cloud, we design an experiment with workloads generated from dataset 2 on two scenario setups: 1) one edge node provides service to three wired client nodes, which have the best network latency and bandwidth; 2) one edge node provides service to three 2.4 GHz wireless client nodes, which have high-variance latency and relatively low bandwidth. The result of the first case is very straightforward: the clients simply upload all the input data and run all the tasks on the edge node in edge offloading, or on the cloud node in cloud offloading, as shown in Fig. 10. This is mainly because the Ethernet cable stably provides the lowest latency and highest bandwidth, which makes offloading to the edge very rewarding. We did not evaluate 5 GHz wireless clients, since this interface is not supported on our client hardware, but we anticipate results similar to the wired case. We plot the result of a 2.4 GHz wireless client node offloading to an edge node or a remote cloud node in the second case in Fig. 11. Overall, the results show that by offloading tasks to an edge computing platform, the application we chose experienced a speedup of up to 4.0x in the wired client-edge configuration compared to local execution, and up to 1.7x compared to a similar client-cloud configuration. For clients with a 2.4 GHz wireless interface, the speedup is up to 1.3x in the client-edge configuration compared to local execution, and up to 1.2x compared to a similar client-cloud configuration.

[Figure omitted: response time (s) vs. number of task offloading requests (5 to 35) for Our scheme, SIOF, and LCPUL.]

Figure 12: The comparison result of three task prioritizing schemes

6.5 Edge-front Task Queue Prioritizing

To evaluate the performance of task queue prioritizing, we collect statistical results from our profiler service and monitoring service on various workloads for simulation. We choose the simulation method because we can freely set up the numbers and types of client and edge nodes, overcoming the limitations of our current testbed to evaluate more complex deployments. We add two simple schemes as baselines: 1) shortest I/O first (SIOF), sorting all the tasks by the time cost of network transmission; 2) longest CPU last (LCPUL), sorting all the tasks by the time cost of processing on the edge node. In the simulation, based on the combinations of client device types, workloads, and offloading decisions, we have in total seven types of jobs to run on the edge node. We increase the total number of jobs, distribute them evenly among the seven types, and report the makespan time in Fig. 12. The result shows that LCPUL is the worst among the three schemes, and that our scheme outperforms the shortest job first scheme.
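Why ordering matters at all can be illustrated with a two-machine flow-shop model, where each job occupies the network link (transfer) and then the edge CPU (compute); Johnson's rule [19] gives the optimal order in this model. This is only a sketch of the intuition with hypothetical job costs, not the paper's actual Section 4.3 prioritizing scheme:

```python
def makespan(jobs):
    """Two-machine flow-shop makespan: jobs are (transfer, compute) pairs.
    Transfers happen sequentially on the link; a job's computation starts
    only after its own transfer finishes and the CPU is free."""
    link_free = cpu_free = 0.0
    for transfer, compute in jobs:
        link_free += transfer                       # job fully received
        cpu_free = max(cpu_free, link_free) + compute
    return cpu_free

def johnson_order(jobs):
    """Johnson's rule: jobs with transfer <= compute go first in increasing
    transfer order; the rest go last in decreasing compute order."""
    head = sorted((j for j in jobs if j[0] <= j[1]), key=lambda j: j[0])
    tail = sorted((j for j in jobs if j[0] > j[1]), key=lambda j: -j[1])
    return head + tail

# Hypothetical jobs (transfer time, compute time):
jobs = [(2, 1), (3, 5), (4, 4)]
siof = sorted(jobs)  # shortest I/O first: order purely by transfer time
```

Here SIOF yields a makespan of 14 time units, while Johnson's rule, by pushing the short-compute job to the end where its transfer overlaps earlier computation, finishes in 13.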

6.6 Inter-Edge Collaboration

We also evaluate the performance of the three task placement schemes (i.e., STTF, SQLF, and SSLF) discussed in Section 5 through a controlled experiment on our testbed. For evaluation purposes, we configure the network in the edge computing system as follows. The first edge node, denoted as "edge node 1" afterwards, has 10 ms RTT and 40 Mbps bandwidth to the edge-front node. The second edge node, "edge node 2", has 20 ms RTT and 20 Mbps bandwidth to the edge-front node. The third edge node, "edge node 3", has 100 ms RTT and 2 Mbps bandwidth to the edge-front node. Thus, we emulate the situation where the three edge nodes are at distances from the edge-front node ranging from near to far.


[Figure omitted: throughput (tasks/sec, 0 to 3) over time (0 to 12 min) on the edge-front node and edge nodes 1, 2, and 3.]

Figure 13: Performance with no task placement scheme

[Figure omitted: throughput over time, as in Fig. 13.]

Figure 14: Performance of STTF

[Figure omitted: throughput over time, as in Fig. 13.]

Figure 15: Performance of SQLF

[Figure omitted: throughput over time, as in Fig. 13.]

Figure 16: Performance of SSLF

We use the third dataset to synthesize a workload as follows. In the first 4 minutes, the edge-front node receives 5 task requests per second, edge node 1 receives 4 task requests per second, edge node 2 receives 3 task requests per second, and edge node 3 receives 2 task requests per second, respectively. No task arrives at any of the edge nodes after the first 4 minutes. For the SSLF task placement scheme, we implement a simple linear regression to predict the scheduling latency of the task being transmitted, since the workload we have injected is uniformly distributed.

Fig. 13 illustrates the throughput on each edge node when no task placement scheme is enabled on the edge-front node. The edge-front node has the heaviest workload, and it takes about 12.36 minutes to finish all the tasks. We consider this result as our baseline.

Fig. 14 shows the throughput result of the STTF scheme. In this case, the edge-front node only transmits tasks to edge node 1, because edge node 1 has the highest bandwidth and the shortest RTT to the edge-front node. Fig. 17 reveals that the edge-front node transmits 120 tasks to edge node 1 and no tasks to the other edge nodes. As edge node 1 has a heavier workload than edge node 2 and edge node 3, the STTF scheme has limited improvement on the system performance: the edge-front node takes about 11.29 minutes to finish all the tasks. Fig. 15 illustrates the throughput result of the SQLF scheme. This scheme works better than the STTF scheme, because the edge-front node transmits more tasks to less-saturated edge nodes, efficiently reducing the workload on the edge-front node. However, the edge-front node tends to transmit many tasks to edge node 3 at the beginning, which has the lowest bandwidth and the longest RTT to the edge-front node. As such, the task placement may incur more delay than expected. From Fig. 17, the edge-front node transmits 0 tasks to edge node 1, 132 tasks to edge node 2, and 152 tasks to edge node 3. The edge-front node takes about 9.6 minutes to finish all the tasks.

Fig. 16 demonstrates the throughput result of the SSLF scheme. This scheme considers both the transmission time of the task being placed and the waiting time in the queue on the target edge node, and therefore achieves the best performance of the three. As mentioned, edge node 1 has the lowest transmission overhead but the heaviest workload among the three edge nodes, while edge node 3 has the lightest workload but the highest transmission overhead. In contrast, edge node 2 has modest transmission overhead and a modest workload. The SSLF scheme takes all these situations into consideration and places the largest number of tasks on edge node 2. As shown in Fig. 17, the edge-front node transmits 4 tasks to edge node 1, 152 tasks to edge node 2, and 148 tasks to edge node 3 when working with the SSLF scheme. The edge-front node


takes about 9.36 minutes to finish all the tasks, which is the best result among the three schemes. We infer that this scheme will further improve the task completion time if tougher network conditions and workloads are considered.

[Figure omitted: number of tasks placed (0 to 150) on edge nodes 1, 2, and 3 under STTF, SQLF, and SSLF.]

Figure 17: Numbers of tasks placed by the edge-front node

7 RELATED WORK

The emergence of edge computing has drawn attention due to its capability to reshape the landscape of IoT, mobile computing, and cloud computing [6, 14, 32, 33, 36-38]. Satyanarayanan [29] has described the origin of edge computing, which is also known as fog computing [4], cloudlets [28], mobile edge computing [24], and so on. Here we review several research fields relevant to video edge analytics, including distributed data processing and computation offloading in various computing paradigms.

7.1 Distributed Data Processing

Distributed data processing has a close relationship to edge analytics, in the sense that those data processing platforms [9, 39] and their underlying techniques [16, 23, 27] can be easily deployed on a cluster of edge nodes. In this paper, we pay special attention to distributed image/video data processing systems. VideoStorm [40] made insightful observations on vision-related algorithms and proposed a resource-quality trade-off with multi-dimensional configurations (e.g., video resolution, frame rate, sampling rate, sliding window size, etc.). The resource-quality profiles are generated offline, and an online scheduler is built to allocate resources to queries to optimize the utility of quality and latency. Their work is complementary to ours, in that we do not consider the trade-off between quality and latency goals via adaptive configurations. Vigil [42] is a wireless video surveillance system that leverages edge computing nodes, with an emphasis on content-aware frame selection in a scenario where multiple web cameras are at the same location, to optimize bandwidth utilization; this is orthogonal to the problems we have addressed here. Firework [41] is a computing paradigm for big data processing in collaborative edge environments, which is complementary to our work in terms of its shared data view and programming interface.

While there should be more ongoing efforts investigating the adaptation, improvement, and optimization of existing distributed data processing techniques on edge computing platforms, we focus more on task/application-level queue management and scheduling, and leave all the underlying resource negotiating and process scheduling to the container cluster engine.

7.2 Computation Offloading

Computation offloading (a.k.a. cyber foraging [28]) has been proposed to improve resource utilization, response time, and energy consumption in various computing environments [7, 13, 21, 31]. Work [17] has quantified the impact of edge computing on mobile applications, and found that edge computing can significantly improve response time and energy consumption for mobile devices through offloading via both WiFi and LTE networks. MOCHA [34] has investigated how a two-stage face recognition task on a mobile device can be accelerated by cloudlet and cloud. In their design, clients simply capture an image and send it to the cloudlet; the optimal task partition can be easily achieved, as there are only two stages. In LAVEA, our application is more complicated, with multiple stages, and we leverage client-edge offloading and other techniques to improve resource utilization and optimize the response time.

8 DISCUSSIONS AND LIMITATIONS

In this section, we discuss alternative design options, and point out current limitations and future work that can improve the system.

Measurement-based Offloading. In this paper, we use measurement-based offloading (static offloading), i.e., the offloading decisions are based on the outcome of periodic measurements. We consider this one of the limitations of our implementation, as stated in [15], and several dynamic computation offloading schemes have been proposed [12]. We plan to improve the measurement-based offloading in future work.

Video Streaming. Our current data processing is image-based, which is one of the limitations of our implementation. The input is either in the format of images or a video stream, which is read into frames and sent out. We believe that utilizing existing video streaming techniques between our system components for data sharing would further improve the system performance and open more potential opportunities for optimization.

Discovering Edge Nodes. There are different ways for the edge-front node to discover the available edge nodes nearby. For example, every edge node intending to serve as a collaborator may open a designated port, so that the edge-front node can periodically scan the network and discover the available edge nodes. This is called the "pull-based" method. In contrast, there is also a "push-based" method, in which the edge-front node opens a designated port and every edge node intending to serve as a collaborator registers with the edge-front node. When the network is at a large scale, the pull-based method usually performs poorly, because the edge-front node may not be able to discover an available edge node in a short time. For this reason, our edge node discovery is implemented with the push-based method, which guarantees good performance regardless of the network scale.
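A minimal sketch of the push-based method might look as follows; the JSON message format and field names here are hypothetical illustrations, as the paper does not specify the wire protocol:

```python
import json
import socket
import threading

class EdgeFrontRegistry:
    """Edge-front side of push-based discovery: listen on a designated
    port and record every edge node that registers itself."""

    def __init__(self, host="127.0.0.1", port=0):
        self.nodes = {}                  # node_id -> (ip, service_port)
        self._lock = threading.Lock()
        self._srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self._srv.bind((host, port))     # port=0 lets the OS pick a free port
        self._srv.listen()
        self.port = self._srv.getsockname()[1]
        threading.Thread(target=self._accept_loop, daemon=True).start()

    def _accept_loop(self):
        while True:
            conn, addr = self._srv.accept()
            with conn:
                data = b""
                while True:              # read until the edge node closes
                    chunk = conn.recv(1024)
                    if not chunk:
                        break
                    data += chunk
                msg = json.loads(data.decode())
                with self._lock:
                    self.nodes[msg["node_id"]] = (addr[0], msg["service_port"])

def register(front_host, front_port, node_id, service_port):
    """Edge-node side: push one registration message to the edge-front node."""
    with socket.create_connection((front_host, front_port)) as s:
        s.sendall(json.dumps({"node_id": node_id,
                              "service_port": service_port}).encode())
```

Because every collaborator announces itself, the edge-front node never has to scan the network, which is why the push-based method scales with the number of nodes rather than with the size of the address space.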

9 CONCLUSION

In this paper, we have investigated providing video analytics services to latency-sensitive applications in an edge computing environment.


As a result, we have built LAVEA, a low-latency video edge analytics system, which collaborates nearby client, edge, and remote cloud nodes, and transforms video feeds into semantic information at places closer to the users, in early stages. We have utilized an edge-front design, formulated an optimization problem for offloading task selection, and prioritized the task queue to minimize the response time. Our results indicate that, by offloading tasks to the closest edge node, the client-edge configuration has a speedup ranging from 1.3x to 4x (1.2x to 1.7x) against running locally (against the client-cloud configuration) under various network conditions and workloads. In the case of a saturating workload on the front edge node, we have proposed and compared various task placement schemes that are tailored for inter-edge collaboration. The proposed prediction-based shortest scheduling latency first task placement scheme considers both the transmission time of the task and the waiting time in the queue, and yields better overall performance than the other schemes.

REFERENCES
[1] Amazon Web Services. 2017. AWS Lambda@Edge. http://docs.aws.amazon.com/lambda/latest/dg/lambda-edge.html. (2017).
[2] Christos-Nikolaos E. Anagnostopoulos, Ioannis E. Anagnostopoulos, Ioannis D. Psoroulas, Vassili Loumos, and Eleftherios Kayafas. 2008. License plate recognition from still images and video sequences: A survey. IEEE Transactions on Intelligent Transportation Systems 9, 3 (2008), 377–391.
[3] K. R. Baker. 1990. Scheduling groups of jobs in the two-machine flow shop. Mathematical and Computer Modelling 13, 3 (1990), 29–36.
[4] Flavio Bonomi, Rodolfo Milito, Jiang Zhu, and Sateesh Addepalli. 2012. Fog computing and its role in the internet of things. In Proceedings of the first edition of the MCC workshop on Mobile cloud computing. ACM, 13–16.
[5] Peter Brucker, Bernd Jurisch, and Bernd Sievers. 1994. A branch and bound algorithm for the job-shop scheduling problem. Discrete Applied Mathematics 49, 1 (1994), 107–127.
[6] Yu Cao, Songqing Chen, Peng Hou, and Donald Brown. 2015. FAST: A fog computing assisted distributed analytics system to monitor fall for stroke mitigation. In Networking, Architecture and Storage (NAS), 2015 IEEE International Conference on. IEEE, 2–11.
[7] Eduardo Cuervo, Aruna Balasubramanian, Dae-ki Cho, Alec Wolman, Stefan Saroiu, Ranveer Chandra, and Paramvir Bahl. 2010. MAUI: Making Smartphones Last Longer with Code Offload. In Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services (MobiSys '10). ACM, New York, NY, USA, 49–62. DOI: https://doi.org/10.1145/1814433.1814441
[8] Eyal de Lara, Carolina S. Gomes, Steve Langridge, S. Hossein Mortazavi, and Meysam Roodi. 2016. Hierarchical Serverless Computing for the Mobile Edge. In Edge Computing (SEC), IEEE/ACM Symposium on. IEEE, 109–110.
[9] Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (2008), 107–113.
[10] Shan Du, Mahmoud Ibrahim, Mohamed Shehata, and Wael Badawy. 2013. Automatic license plate recognition (ALPR): A state-of-the-art review. IEEE Transactions on Circuits and Systems for Video Technology 23, 2 (2013), 311–325.
[11] Sadjad Fouladi, Riad S. Wahby, Brennan Shacklett, Karthikeyan Vasuki Balasubramaniam, William Zeng, Rahul Bhalerao, Anirudh Sivaraman, George Porter, and Keith Winstein. 2017. Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 363–376. https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/fouladi
[12] Wei Gao, Yong Li, Haoyang Lu, Ting Wang, and Cong Liu. 2014. On exploiting dynamic execution patterns for workload offloading in mobile cloud applications. In Network Protocols (ICNP), 2014 IEEE 22nd International Conference on. IEEE, 1–12.
[13] Mark S. Gordon, Davoud Anoushe Jamshidi, Scott A. Mahlke, Zhuoqing Morley Mao, and Xu Chen. 2012. COMET: Code Offload by Migrating Execution Transparently. In OSDI, Vol. 12. 93–106.
[14] Zijiang Hao, Ed Novak, Shanhe Yi, and Qun Li. 2017. Challenges and Software Architecture for Fog Computing. Internet Computing (2017).
[15] Mohammed A. Hassan, Kshitiz Bhattarai, Qi Wei, and Songqing Chen. 2014. POMAC: Properly Offloading Mobile Applications to Clouds. In 6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 14).
[16] Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy H. Katz, Scott Shenker, and Ion Stoica. 2011. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. In NSDI, Vol. 11. 22–22.
[17] Wenlu Hu, Ying Gao, Kiryong Ha, Junjue Wang, Brandon Amos, Zhuo Chen, Padmanabhan Pillai, and Mahadev Satyanarayanan. 2016. Quantifying the Impact of Edge Computing on Mobile Applications. In Proceedings of the 7th ACM SIGOPS Asia-Pacific Workshop on Systems. ACM, 5.
[18] IBM. 2017. Apache OpenWhisk. http://openwhisk.org. (April 2017).
[19] Selmer Martin Johnson. 1954. Optimal two- and three-stage production schedules with setup times included. Naval Research Logistics Quarterly 1, 1 (1954), 61–68.
[20] Eric Jonas, Shivaram Venkataraman, Ion Stoica, and Benjamin Recht. 2017. Occupy the Cloud: Distributed computing for the 99%. arXiv preprint arXiv:1702.04024 (2017).
[21] Ryan Newton, Sivan Toledo, Lewis Girod, Hari Balakrishnan, and Samuel Madden. 2009. Wishbone: Profile-based Partitioning for Sensornet Applications. In NSDI, Vol. 9. 395–408.
[22] OpenALPR. 2017. OpenALPR – Automatic License Plate Recognition. http://www.openalpr.com. (April 2017).
[23] Kay Ousterhout, Patrick Wendell, Matei Zaharia, and Ion Stoica. 2013. Sparrow: distributed, low latency scheduling. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 69–84.
[24] M. Patel, B. Naughton, C. Chan, N. Sprecher, S. Abeta, A. Neal, and others. 2014. Mobile-edge computing introductory technical white paper. White Paper, Mobile-edge Computing (MEC) industry initiative (2014).
[25] Brad Philip and Paul Updike. 2001. Caltech Vision Group 2001 testing database. http://www.vision.caltech.edu/html-files/archive.html. (2001).
[26] Kari Pulli, Anatoly Baksheev, Kirill Kornyakov, and Victor Eruhimov. 2012. Real-time computer vision with OpenCV. Commun. ACM 55, 6 (2012), 61–69.
[27] Jeff Rasley, Konstantinos Karanasos, Srikanth Kandula, Rodrigo Fonseca, Milan Vojnovic, and Sriram Rao. 2016. Efficient queue management for cluster scheduling. In Proceedings of the Eleventh European Conference on Computer Systems. ACM, 36.
[28] Mahadev Satyanarayanan. 2001. Pervasive computing: Vision and challenges. IEEE Personal Communications 8, 4 (2001), 10–17.
[29] Mahadev Satyanarayanan. 2017. The Emergence of Edge Computing. Computer 50, 1 (2017), 30–39.
[30] Mahadev Satyanarayanan, Pieter Simoens, Yu Xiao, Padmanabhan Pillai, Zhuo Chen, Kiryong Ha, Wenlu Hu, and Brandon Amos. 2015. Edge analytics in the internet of things. IEEE Pervasive Computing 14, 2 (2015), 24–31.
[31] Cong Shi, Karim Habak, Pranesh Pandurangan, Mostafa Ammar, Mayur Naik, and Ellen Zegura. 2014. COSMOS: computation offloading as a service for mobile devices. In Proceedings of the 15th ACM international symposium on Mobile ad hoc networking and computing. ACM, 287–296.
[32] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu. 2016. Edge Computing: Vision and Challenges. IEEE Internet of Things Journal 3, 5 (Oct 2016), 637–646. DOI: https://doi.org/10.1109/JIOT.2016.2579198
[33] Weisong Shi and Schahram Dustdar. 2016. The Promise of Edge Computing. Computer 49, 5 (2016), 78–81.
[34] Tolga Soyata, Rajani Muraleedharan, Colin Funai, Minseok Kwon, and Wendi Heinzelman. 2012. Cloud-Vision: Real-time face recognition using a mobile-cloudlet-cloud acceleration architecture. In Computers and Communications (ISCC), 2012 IEEE Symposium on. IEEE, 000059–000066.
[35] Xiaoli Wang, Aakanksha Chowdhery, and Mung Chiang. 2016. SkyEyes: adaptive video streaming from UAVs. In Proceedings of the 3rd Workshop on Hot Topics in Wireless. ACM, 2–6.
[36] Shanhe Yi, Zijiang Hao, Zhengrui Qin, and Qun Li. 2015. Fog Computing: Platform and Applications. In Hot Topics in Web Systems and Technologies (HotWeb), 2015 Third IEEE Workshop on. IEEE, 73–78.
[37] Shanhe Yi, Cheng Li, and Qun Li. 2015. A Survey of Fog Computing: Concepts, Applications and Issues. In Proceedings of the 2015 Workshop on Mobile Big Data (Mobidata '15). ACM, 37–42.
[38] Shanhe Yi, Zhengrui Qin, and Qun Li. 2015. Security and privacy issues of fog computing: A survey. In Wireless Algorithms, Systems, and Applications. Springer, 685–695.
[39] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: cluster computing with working sets. HotCloud 10 (2010), 10–10.
[40] Haoyu Zhang, Ganesh Ananthanarayanan, Peter Bodik, Matthai Philipose, Paramvir Bahl, and Michael J. Freedman. 2017. Live Video Analytics at Scale with Approximation and Delay-Tolerance. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 377–392. https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/zhang
[41] Q. Zhang, X. Zhang, Q. Zhang, W. Shi, and H. Zhong. 2016. Firework: Big Data Sharing and Processing in Collaborative Edge Environment. In 2016 Fourth IEEE Workshop on Hot Topics in Web Systems and Technologies (HotWeb). 20–25. DOI: https://doi.org/10.1109/HotWeb.2016.12

[42] Tan Zhang Aakanksha Chowdhery Paramvir Victor Bahl Kyle Jamieson andSuman Banerjee 2015 e design and implementation of a wireless videosurveillance system In Proceedings of the 21st Annual International Conferenceon Mobile Computing and Networking ACM 426ndash438


LAVEA: Latency-aware Video Analytics on Edge Computing Platform. SEC '17, October 12–14, 2017, San Jose / Silicon Valley, CA, USA

6.2 Evaluation Setup

6.2.1 Testbed. We have built a testbed consisting of four edge computing nodes. One of the edge nodes is the edge-front node, which is directly connected to a wireless router using a cable. The other three nodes are set up as nearby edge computing nodes for the evaluation of inter-edge collaboration. These four machines have the same hardware specifications: they all have a quad-core CPU and 4 GB main memory. The three nearby edge nodes are directly connected to the edge-front node through a network cable. We make use of two types of Raspberry Pi (RPi) nodes as clients: one type is RPi 2, which is wired to the router, while the other type is RPi 3, which is connected to the router using its built-in 2.4 GHz WiFi.

6.2.2 Datasets. We have employed three datasets for the evaluations. One dataset is the Caltech Vision Group 2001 testing database, in which the car rear image resolution (126 images at 896x592) is adequate for license plate recognition [25]. Another dataset is a self-collected 4K video containing rear license plates, taken on an Android smartphone and converted into videos of different resolutions (640x480, 960x720, 1280x960 and 1600x1200). The other dataset, used in the inter-edge collaboration evaluation, contains 22 car images with resolutions ranging from 405x540 pixels to 2514x1210 pixels (file size 316 KB to 2.85 MB). The task requests use the car images as input in a round-robin way: one car image for each task request.

6.3 Task Profiler

Besides the round-trip time and bandwidth benchmarks we have presented in Fig. 2 and Fig. 3 to characterize the edge computing network, we have profiled the OpenALPR application on various client, edge and cloud nodes.

Figure 6: OpenALPR profile result of client type 1 (RPi 2, quad-core 0.9 GHz). [Bar chart: execution time (ms) per task (MotionDetection, PlateDetection, PlateAnalysis, OCR) for workload 1 (896x592) and workload 2 (640x480 to 1600x1200).]

In this experiment we use both dataset 1 (workload 1) and dataset 2 (workload 2) at various resolutions. The execution time for each task is shown in Fig. 6, Fig. 7, Fig. 8 and Fig. 9. The results indicate that by utilizing an edge node, we can get a comparable amount of computation power close to the clients for computation-intensive tasks. Another observation is that, due to the uneven optimizations on heterogeneous CPU architectures, some tasks are better kept local while some others should be

Figure 7: OpenALPR profile result of client type 2 (RPi 3, quad-core 1.2 GHz). [Bar chart: execution time (ms) per task (MotionDetection, PlateDetection, PlateAnalysis, OCR) for workloads 1 and 2.]

Figure 8: OpenALPR profile result of a type of edge node (i7 quad-core 2.30 GHz). [Bar chart: execution time (ms) per task (MotionDetection, PlateDetection, PlateAnalysis, OCR) for workloads 1 and 2.]

Figure 9: OpenALPR profile of a type of cloud node (AWS EC2 t2.large, Xeon dual-core 2.40 GHz). [Bar chart: execution time (ms) per task (MotionDetection, PlateDetection, PlateAnalysis, OCR) for workloads 1 and 2.]

offloaded to the edge computing node. This observation justified the need for computation offloading between clients and edge nodes.
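The per-stage profiles above can be gathered with a small timing harness. The sketch below is illustrative only, not LAVEA's actual profiler service: the four stage functions are empty placeholders standing in for the OpenALPR stages shown in Fig. 6 to Fig. 9.

```python
import time

def profile_pipeline(tasks, frame):
    """Time each pipeline stage on the local node. `tasks` maps a stage
    name to a callable; the output of one stage feeds the next. Returns
    per-stage execution time in milliseconds."""
    profile = {}
    data = frame
    for name, fn in tasks.items():
        start = time.perf_counter()
        data = fn(data)
        profile[name] = (time.perf_counter() - start) * 1000.0
    return profile

# Placeholder stand-ins for the four OpenALPR stages; a real profiler
# would call the actual vision routines on a sample frame.
stages = {
    "MotionDetection": lambda d: d,
    "PlateDetection": lambda d: d,
    "PlateAnalysis": lambda d: d,
    "OCR": lambda d: d,
}
profile = profile_pipeline(stages, frame=b"raw-frame-bytes")
```

Running the same harness on the client, the edge node and the cloud node yields the per-node profiles that the offloading decision draws on.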


Figure 10: The comparison of task selection impacts on edge offloading and cloud offloading for wired clients (RPi 2). [Response time per frame per client (s) for Client-edge opt, Client only, Edge only, Client-cloud opt and Cloud only over workload 2 at four resolutions.]

Figure 11: The comparison of task selection impacts on edge offloading and cloud offloading for 2.4 GHz wireless clients (RPi 3). [Response time per frame per client (s) for Client-edge opt, Client only, Edge only, Client-cloud opt and Cloud only over workload 2 at four resolutions.]

6.4 Offloading Task Selection

To understand how much the execution time can be reduced by splitting tasks between the client and the edge, or between the client and the cloud, we design an experiment with workloads generated from dataset 2 on two scenarios: 1) one edge node provides service to three wired client nodes that have the best network latency and bandwidth; 2) one edge node provides service to three wireless 2.4 GHz client nodes that have high-variance latency and relatively low bandwidth. The result of the first case is very straightforward: the clients simply upload all the input data and run all the tasks on the edge node in edge offloading, or on the cloud node in cloud offloading, as shown in Fig. 10. This is mainly because an Ethernet cable stably provides the lowest latency and highest bandwidth, which makes offloading to the edge very rewarding. We did not evaluate 5 GHz wireless clients since this interface is not supported on our client hardware, but we anticipate results similar to the wired case. We plot the result of a 2.4 GHz wireless client node offloading to an edge node or a remote cloud node in the second case in Fig. 11. Overall, the results show that by offloading tasks to an edge computing platform, the application we chose experienced a speedup of up to 4.0x in the wired client-edge configuration compared to local execution, and up to 1.7x compared to a similar client-cloud configuration. For clients with a 2.4 GHz wireless interface, the speedup is up to 1.3x in the client-edge configuration compared to local execution, and up to 1.2x compared to a similar client-cloud configuration.
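The full offloading task selection is formulated as an optimization problem in Section 4. For intuition only, a simplified version for a strictly linear pipeline reduces to picking the hand-off stage that minimizes total response time; the numbers below are invented for illustration, not measured values.

```python
def best_split(local_ms, remote_ms, upload_ms):
    """For a linear pipeline of n stages, choose the stage index k at
    which to hand off: stages [0, k) run locally, the intermediate data
    produced before stage k is uploaded (upload_ms[k]), and stages
    [k, n) run remotely. k = n means run fully local.
    Returns (k, total_ms)."""
    n = len(local_ms)
    best_k, best_t = n, sum(local_ms)  # fully-local baseline
    for k in range(n):
        t = sum(local_ms[:k]) + upload_ms[k] + sum(remote_ms[k:])
        if t < best_t:
            best_k, best_t = k, t
    return best_k, best_t

# Wired client: cheap upload and a much faster edge node, so the best
# split is to offload everything (k = 0), mirroring the Fig. 10 result.
k, t = best_split(local_ms=[200, 400, 150, 250],
                  remote_ms=[20, 40, 15, 25],
                  upload_ms=[30, 25, 10, 5])
```

On a slow wireless link the upload costs grow, and the same search can land on a partial split or fully-local execution instead.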

Figure 12: The comparison result of three task prioritizing schemes. [Response time (s) vs. number of task offloading requests (5 to 35) for our scheme, SIOF and LCPUL.]

6.5 Edge-front Task Queue Prioritizing

To evaluate the performance of the task queue prioritizing, we collect statistical results from our profiler service and monitoring service on various workloads for simulation. We choose simulation because we can freely set up the numbers and types of client and edge nodes, overcoming the limitation of our current testbed to evaluate more complex deployments. We add two simple schemes as baselines: 1) shortest IO first (SIOF), sorting all the tasks by the time cost of the network transmission; 2) longest CPU last (LCPUL), sorting all the tasks by the time cost of the processing on the edge node. In the simulation, based on the combination of client device types, workloads and offloading decisions, we have in total seven types of jobs to run on the edge node. We increase the total number of jobs, distribute them evenly among the seven types, and report the makespan time in Fig. 12. The result shows that LCPUL is the worst among the three schemes, and our scheme outperforms the shortest job first scheme.
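The two baselines can be expressed as sort keys over each job's (IO time, CPU time) pair: with transfers arriving serially and feeding a single worker, the edge node behaves like a two-machine flow shop, which is why IO-aware ordering matters. A minimal sketch with invented job costs:

```python
def makespan(jobs, order):
    """Single edge node: transfers are serial, and a job's CPU work can
    start only after its transfer finishes and the CPU is free (a
    two-machine flow shop). Returns the finish time of the last job."""
    io_done = cpu_done = 0.0
    for i in order:
        io_done += jobs[i]["io"]
        cpu_done = max(cpu_done, io_done) + jobs[i]["cpu"]
    return cpu_done

jobs = [{"io": 3, "cpu": 1}, {"io": 1, "cpu": 4}, {"io": 2, "cpu": 2}]
# SIOF: shortest IO first (ascending network transmission time).
siof = sorted(range(len(jobs)), key=lambda i: jobs[i]["io"])
# LCPUL: longest CPU last (ascending CPU time puts the longest last).
lcpul = sorted(range(len(jobs)), key=lambda i: jobs[i]["cpu"])
```

With these costs SIOF finishes at time 8 while LCPUL finishes at time 11, matching the paper's observation that LCPUL orders jobs poorly.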

6.6 Inter-Edge Collaboration

We also evaluate the performance of the three task placement schemes (i.e., STTF, SQLF and SSLF) discussed in Section 5 through a controlled experiment on our testbed. For evaluation purposes, we configure the network in the edge computing system as follows. The first edge node, denoted as "edge node 1" afterwards, has 10 ms RTT and 40 Mbps bandwidth to the edge-front node. The second edge node, "edge node 2", has 20 ms RTT and 20 Mbps bandwidth to the edge-front node. The third edge node, "edge node 3", has 100 ms RTT and 2 Mbps bandwidth to the edge-front node. Thus we emulate a situation where the three edge nodes range in distance to the edge-front node from near to far.


Figure 13: Performance with no task placement scheme. [Throughput (tasks/sec, 0 to 3) over time (0 to 12 min) for the edge-front node and edge nodes 1, 2 and 3.]

Figure 14: Performance of STTF. [Throughput (tasks/sec, 0 to 3) over time (0 to 12 min) for the edge-front node and edge nodes 1, 2 and 3.]

Figure 15: Performance of SQLF. [Throughput (tasks/sec, 0 to 3) over time (0 to 12 min) for the edge-front node and edge nodes 1, 2 and 3.]

Figure 16: Performance of SSLF. [Throughput (tasks/sec, 0 to 3) over time (0 to 12 min) for the edge-front node and edge nodes 1, 2 and 3.]

We use the third dataset to synthesize a workload as follows. In the first 4 minutes, the edge-front node receives 5 task requests per second, edge node 1 receives 4 task requests per second, edge node 2 receives 3 task requests per second, and edge node 3 receives 2 task requests per second, respectively. No task arrives at any of the edge nodes after the first 4 minutes. For the SSLF task placement scheme, we implement a simple linear regression to predict the scheduling latency of the task being transmitted, since the workload we have injected is uniformly distributed.
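The paper does not give the regression details; one plausible minimal form is a least-squares line fitted over recent (queue length, observed scheduling latency) samples from a target edge node, then extrapolated to the current queue length. The sample numbers below are invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y ~ a*x + b. Assumes at least two
    distinct x values. Used here as a stand-in for the scheduling
    latency predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Hypothetical samples: queue lengths 1..4 with observed latencies.
a, b = fit_line([1, 2, 3, 4], [2.0, 4.0, 6.0, 8.0])
predicted = a * 5 + b  # estimated scheduling latency at queue length 5
```

Because the injected workload is uniform, a linear extrapolation of this kind tracks the queueing delay well enough for placement decisions.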

Fig. 13 illustrates the throughput on each edge node when no task placement scheme is enabled on the edge-front node. The edge-front node has the heaviest workload, and it takes about 12.36 minutes to finish all the tasks. We consider this result as our baseline.

Fig. 14 shows the throughput result of the STTF scheme. In this case, the edge-front node only transmits tasks to edge node 1, because edge node 1 has the highest bandwidth and the shortest RTT to the edge-front node. Fig. 17 reveals that the edge-front node transmits 120 tasks to edge node 1 and no tasks to the other edge nodes. As edge node 1 has a heavier workload than edge node 2 and edge node 3, the STTF scheme offers limited improvement on the system performance: the edge-front node takes about 11.29 minutes to finish all the tasks. Fig. 15 illustrates the throughput result of the SQLF

scheme. This scheme works better than the STTF scheme, because the edge-front node transmits more tasks to less-saturated edge nodes, efficiently reducing the workload on the edge-front node. However, the edge-front node intends to transmit many tasks to edge node 3 at the beginning, which has the lowest bandwidth and the longest RTT to the edge-front node. As such, the task placement may incur more delay than expected. From Fig. 17, the edge-front node transmits 0 tasks to edge node 1, 132 tasks to edge node 2, and 152 tasks to edge node 3. The edge-front node takes about 9.6 minutes to finish all the tasks.

Fig. 16 demonstrates the throughput result of the SSLF scheme. This scheme considers both the transmission time of the task being placed and the waiting time in the queue on the target edge node, and therefore achieves the best performance of the three. As mentioned, edge node 1 has the lowest transmission overhead but the heaviest workload among the three edge nodes, while edge node 3 has the lightest workload but the highest transmission overhead. In contrast, edge node 2 has a modest transmission overhead and a modest workload. The SSLF scheme takes all these situations into consideration and places the most tasks on edge node 2. As shown in Fig. 17, the edge-front node transmits 4 tasks to edge node 1, 152 tasks to edge node 2, and 148 tasks to edge node 3 when working with the SSLF scheme. The edge-front node


takes about 9.36 minutes to finish all the tasks, which is the best result among the three schemes. We infer that the third scheme will further improve the task completion time if tougher network conditions and workloads are considered.
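The three placement rules differ only in the cost each one minimizes. The sketch below mirrors the testbed's RTT and bandwidth figures, but the queue lengths and the 0.5 s-per-task latency predictor are invented for illustration; with them, the three schemes pick the three different nodes seen in the experiment.

```python
def pick_targets(edges, task_bytes, predict):
    """Return the edge chosen by each placement scheme. Each edge dict
    has rtt (s), bw (bytes/s) and queue (pending tasks); `predict` maps
    a queue length to an estimated queueing delay in seconds."""
    tx = lambda e: e["rtt"] + task_bytes / e["bw"]  # transmission time
    sttf = min(edges, key=tx)                        # shortest transmission time first
    sqlf = min(edges, key=lambda e: e["queue"])      # shortest queue length first
    sslf = min(edges, key=lambda e: tx(e) + predict(e["queue"]))
    return sttf["name"], sqlf["name"], sslf["name"]

# RTT/bandwidth mirror the testbed (10 ms/40 Mbps, 20 ms/20 Mbps,
# 100 ms/2 Mbps); queues and the predictor are hypothetical.
edges = [
    {"name": "edge1", "rtt": 0.010, "bw": 5.0e6, "queue": 16},
    {"name": "edge2", "rtt": 0.020, "bw": 2.5e6, "queue": 12},
    {"name": "edge3", "rtt": 0.100, "bw": 2.5e5, "queue": 8},
]
choices = pick_targets(edges, task_bytes=1.0e6, predict=lambda q: 0.5 * q)
```

Here STTF picks edge node 1 (cheapest link), SQLF picks edge node 3 (shortest queue), and SSLF picks edge node 2, whose combined transmission and queueing cost is lowest.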

Figure 17: Numbers of tasks placed by the edge-front node. [Bar chart: tasks placed (0 to 150) on edge nodes 1, 2 and 3 under STTF, SQLF and SSLF.]

7 RELATED WORK

The emergence of edge computing has drawn attention due to its capability to reshape the landscape of IoT, mobile computing and cloud computing [6, 14, 32, 33, 36-38]. Satyanarayanan [29] has briefed the origin of edge computing, also known as fog computing [4], cloudlet [28], mobile edge computing [24], and so on. Here we review several relevant research fields related to video edge analytics, including distributed data processing and computation offloading in various computing paradigms.

7.1 Distributed Data Processing

Distributed data processing has a close relationship to edge analytics, in the sense that those data processing platforms [9, 39] and underlying techniques [16, 23, 27] can be easily deployed on a cluster of edge nodes. In this paper we pay special attention to distributed image/video data processing systems. VideoStorm [40] made insightful observations on vision-related algorithms and proposed a resource-quality trade-off with multi-dimensional configurations (e.g., video resolution, frame rate, sampling rate, sliding window size, etc.). The resource-quality profiles are generated offline, and an online scheduler is built to allocate resources to queries to optimize the utility of quality and latency. Their work is complementary to ours in that we do not consider the trade-off between quality and latency goals via adaptive configurations. Vigil [42] is a wireless video surveillance system that leveraged edge computing nodes with an emphasis on content-aware frame selection, in a scenario where multiple web cameras are at the same location, to optimize bandwidth utilization; this is orthogonal to the problems we have addressed here. Firework [41] is a computing paradigm for big data processing in collaborative edge environments, which is complementary to our work in terms of shared data view and programming interface.

While there should be more ongoing efforts investigating the adaptation, improvement and optimization of existing distributed data processing techniques on edge computing platforms, we focus more on task/application-level queue management and scheduling, and leave the underlying resource negotiating and process scheduling to the container cluster engine.

7.2 Computation Offloading

Computation offloading (a.k.a. cyber foraging [28]) has been proposed to improve resource utilization, response time and energy consumption in various computing environments [7, 13, 21, 31]. Work [17] has quantified the impact of edge computing on mobile applications and found that edge computing can improve response time and energy consumption significantly for mobile devices through offloading via both WiFi and LTE networks. Mocha [34] has investigated how a two-stage face recognition task from a mobile device can be accelerated by cloudlet and cloud. In their design, clients simply capture an image and send it to the cloudlet; the optimal task partition can be easily achieved as it has only two stages. In LAVEA, our application is more complicated, with multiple stages, and we leverage client-edge offloading and other techniques to improve resource utilization and optimize the response time.

8 DISCUSSIONS AND LIMITATIONS

In this section we discuss alternative design options, and point out current limitations and future work that can improve the system.

Measurement-based Offloading. In this paper we use measurement-based offloading (static offloading), i.e., the offloading decisions are based on the outcome of periodic measurements. We consider this one of the limitations of our implementation, as stated in [15], and several dynamic computation offloading schemes have been proposed [12]. We plan to improve the measurement-based offloading in future work.

Video Streaming. Our current data processing is image-based, which is one of the limitations of our implementation. The input is either in the format of images or a video stream, which is read into frames and sent out. We believe that utilizing existing video streaming techniques between our system components for data sharing would further improve the system performance and open more potential opportunities for optimization.

Discovering Edge Nodes. There are different ways for the edge-front node to discover the available edge nodes nearby. For example, every edge node intending to serve as a collaborator may open a designated port, so that the edge-front node can periodically scan the network and discover the available edge nodes. This is called the "pull-based" method. In contrast, there is also a "push-based" method, in which the edge-front node opens a designated port and every edge node intending to serve as a collaborator registers with the edge-front node. When the network is large in scale, the pull-based method usually performs poorly, because the edge-front node may not be able to discover an available edge node in a short time. For this reason, our edge node discovery is implemented with the push-based method, which guarantees good performance regardless of the network scale.
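A minimal sketch of the push-based method is below. The TTL-based liveness rule is an invented detail (the paper does not describe how stale registrations are handled): each collaborating node registers itself and refreshes its entry with heartbeats, and the edge-front node drops entries older than the TTL.

```python
class EdgeFrontRegistry:
    """Push-based discovery sketch: edge nodes register with the
    edge-front node; entries not refreshed within `ttl` seconds are
    considered dead. Timestamps are passed in explicitly to keep the
    example deterministic."""

    def __init__(self, ttl=30.0):
        self.ttl = ttl
        self.nodes = {}  # node name -> time of last heartbeat

    def register(self, name, now):
        # Called by the edge node itself (registration or heartbeat).
        self.nodes[name] = now

    def alive(self, now):
        # Nodes whose last heartbeat is within the TTL, sorted by name.
        return sorted(n for n, t in self.nodes.items() if now - t <= self.ttl)

reg = EdgeFrontRegistry(ttl=30.0)
reg.register("edge1", now=0.0)
reg.register("edge2", now=10.0)
alive = reg.alive(now=35.0)  # edge1's heartbeat has expired by now
```

A real deployment would replace the explicit timestamps with wall-clock time and carry the node's address and capabilities in the registration message.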

9 CONCLUSION

In this paper we have investigated providing video analytics services to latency-sensitive applications in an edge computing environment. As a result, we have built LAVEA, a low-latency video edge analytics system, which collaborates nearby client, edge and remote cloud nodes, and transforms video feeds into semantic information at places closer to the users in early stages. We have utilized an edge-front design, formulated an optimization problem for offloading task selection, and prioritized the task queue to minimize the response time. Our results indicate that by offloading tasks to the closest edge node, the client-edge configuration has a speedup ranging from 1.3x to 4x (1.2x to 1.7x) against running locally (against the client-cloud configuration) under various network conditions and workloads. In case of a saturating workload on the front edge node, we have proposed and compared various task placement schemes that are tailored for inter-edge collaboration. The proposed prediction-based shortest scheduling latency first task placement scheme considers both the transmission time of the task and the waiting time in the queue, and delivers better overall performance than the other schemes.

REFERENCES

[1] Amazon Web Services. 2017. AWS Lambda@Edge. http://docs.aws.amazon.com/lambda/latest/dg/lambda-edge.html. (2017).

[2] Christos-Nikolaos E. Anagnostopoulos, Ioannis E. Anagnostopoulos, Ioannis D. Psoroulas, Vassili Loumos, and Eleftherios Kayafas. 2008. License plate recognition from still images and video sequences: A survey. IEEE Transactions on Intelligent Transportation Systems 9, 3 (2008), 377–391.

[3] K. R. Baker. 1990. Scheduling groups of jobs in the two-machine flow shop. Mathematical and Computer Modelling 13, 3 (1990), 29–36.

[4] Flavio Bonomi, Rodolfo Milito, Jiang Zhu, and Sateesh Addepalli. 2012. Fog computing and its role in the internet of things. In Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing. ACM, 13–16.

[5] Peter Brucker, Bernd Jurisch, and Bernd Sievers. 1994. A branch and bound algorithm for the job-shop scheduling problem. Discrete Applied Mathematics 49, 1 (1994), 107–127.

[6] Yu Cao, Songqing Chen, Peng Hou, and Donald Brown. 2015. FAST: A fog computing assisted distributed analytics system to monitor fall for stroke mitigation. In Networking, Architecture and Storage (NAS), 2015 IEEE International Conference on. IEEE, 2–11.

[7] Eduardo Cuervo, Aruna Balasubramanian, Dae-ki Cho, Alec Wolman, Stefan Saroiu, Ranveer Chandra, and Paramvir Bahl. 2010. MAUI: Making smartphones last longer with code offload. In Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services (MobiSys '10). ACM, New York, NY, USA, 49–62. DOI: https://doi.org/10.1145/1814433.1814441

[8] Eyal de Lara, Carolina S. Gomes, Steve Langridge, S. Hossein Mortazavi, and Meysam Roodi. 2016. Hierarchical serverless computing for the mobile edge. In Edge Computing (SEC), IEEE/ACM Symposium on. IEEE, 109–110.

[9] Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: Simplified data processing on large clusters. Commun. ACM 51, 1 (2008), 107–113.

[10] Shan Du, Mahmoud Ibrahim, Mohamed Shehata, and Wael Badawy. 2013. Automatic license plate recognition (ALPR): A state-of-the-art review. IEEE Transactions on Circuits and Systems for Video Technology 23, 2 (2013), 311–325.

[11] Sadjad Fouladi, Riad S. Wahby, Brennan Shacklett, Karthikeyan Vasuki Balasubramaniam, William Zeng, Rahul Bhalerao, Anirudh Sivaraman, George Porter, and Keith Winstein. 2017. Encoding, fast and slow: Low-latency video processing using thousands of tiny threads. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 363–376. https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/fouladi

[12] Wei Gao, Yong Li, Haoyang Lu, Ting Wang, and Cong Liu. 2014. On exploiting dynamic execution patterns for workload offloading in mobile cloud applications. In Network Protocols (ICNP), 2014 IEEE 22nd International Conference on. IEEE, 1–12.

[13] Mark S. Gordon, Davoud Anoushe Jamshidi, Scott A. Mahlke, Zhuoqing Morley Mao, and Xu Chen. 2012. COMET: Code offload by migrating execution transparently. In OSDI, Vol. 12. 93–106.

[14] Zijiang Hao, Ed Novak, Shanhe Yi, and Qun Li. 2017. Challenges and software architecture for fog computing. IEEE Internet Computing (2017).

[15] Mohammed A. Hassan, Kshitiz Bhattarai, Qi Wei, and Songqing Chen. 2014. POMAC: Properly offloading mobile applications to clouds. In 6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 14).

[16] Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy H. Katz, Scott Shenker, and Ion Stoica. 2011. Mesos: A platform for fine-grained resource sharing in the data center. In NSDI, Vol. 11. 22–22.

[17] Wenlu Hu, Ying Gao, Kiryong Ha, Junjue Wang, Brandon Amos, Zhuo Chen, Padmanabhan Pillai, and Mahadev Satyanarayanan. 2016. Quantifying the impact of edge computing on mobile applications. In Proceedings of the 7th ACM SIGOPS Asia-Pacific Workshop on Systems. ACM, 5.

[18] IBM. 2017. Apache OpenWhisk. http://openwhisk.org. (April 2017).

[19] Selmer Martin Johnson. 1954. Optimal two- and three-stage production schedules with setup times included. Naval Research Logistics Quarterly 1, 1 (1954), 61–68.

[20] Eric Jonas, Shivaram Venkataraman, Ion Stoica, and Benjamin Recht. 2017. Occupy the cloud: Distributed computing for the 99%. arXiv preprint arXiv:1702.04024 (2017).

[21] Ryan Newton, Sivan Toledo, Lewis Girod, Hari Balakrishnan, and Samuel Madden. 2009. Wishbone: Profile-based partitioning for sensornet applications. In NSDI, Vol. 9. 395–408.

[22] OpenALPR. 2017. OpenALPR – Automatic License Plate Recognition. http://www.openalpr.com. (April 2017).

[23] Kay Ousterhout, Patrick Wendell, Matei Zaharia, and Ion Stoica. 2013. Sparrow: Distributed, low latency scheduling. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 69–84.

[24] M. Patel, B. Naughton, C. Chan, N. Sprecher, S. Abeta, A. Neal, and others. 2014. Mobile-edge computing: Introductory technical white paper. White Paper, Mobile-edge Computing (MEC) Industry Initiative (2014).

[25] Brad Philip and Paul Updike. 2001. Caltech Vision Group 2001 testing database. http://www.vision.caltech.edu/html-files/archive.html. (2001).

[26] Kari Pulli, Anatoly Baksheev, Kirill Kornyakov, and Victor Eruhimov. 2012. Real-time computer vision with OpenCV. Commun. ACM 55, 6 (2012), 61–69.

[27] Jeff Rasley, Konstantinos Karanasos, Srikanth Kandula, Rodrigo Fonseca, Milan Vojnovic, and Sriram Rao. 2016. Efficient queue management for cluster scheduling. In Proceedings of the Eleventh European Conference on Computer Systems. ACM, 36.

[28] Mahadev Satyanarayanan. 2001. Pervasive computing: Vision and challenges. IEEE Personal Communications 8, 4 (2001), 10–17.

[29] Mahadev Satyanarayanan. 2017. The emergence of edge computing. Computer 50, 1 (2017), 30–39.

[30] Mahadev Satyanarayanan, Pieter Simoens, Yu Xiao, Padmanabhan Pillai, Zhuo Chen, Kiryong Ha, Wenlu Hu, and Brandon Amos. 2015. Edge analytics in the internet of things. IEEE Pervasive Computing 14, 2 (2015), 24–31.

[31] Cong Shi, Karim Habak, Pranesh Pandurangan, Mostafa Ammar, Mayur Naik, and Ellen Zegura. 2014. COSMOS: Computation offloading as a service for mobile devices. In Proceedings of the 15th ACM International Symposium on Mobile Ad Hoc Networking and Computing. ACM, 287–296.

[32] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu. 2016. Edge computing: Vision and challenges. IEEE Internet of Things Journal 3, 5 (Oct 2016), 637–646. DOI: https://doi.org/10.1109/JIOT.2016.2579198

[33] Weisong Shi and Schahram Dustdar. 2016. The promise of edge computing. Computer 49, 5 (2016), 78–81.

[34] Tolga Soyata, Rajani Muraleedharan, Colin Funai, Minseok Kwon, and Wendi Heinzelman. 2012. Cloud-Vision: Real-time face recognition using a mobile-cloudlet-cloud acceleration architecture. In Computers and Communications (ISCC), 2012 IEEE Symposium on. IEEE, 000059–000066.

[35] Xiaoli Wang, Aakanksha Chowdhery, and Mung Chiang. 2016. SkyEyes: Adaptive video streaming from UAVs. In Proceedings of the 3rd Workshop on Hot Topics in Wireless. ACM, 2–6.

[36] Shanhe Yi, Zijiang Hao, Zhengrui Qin, and Qun Li. 2015. Fog computing: Platform and applications. In Hot Topics in Web Systems and Technologies (HotWeb), 2015 Third IEEE Workshop on. IEEE, 73–78.

[37] Shanhe Yi, Cheng Li, and Qun Li. 2015. A survey of fog computing: Concepts, applications and issues. In Proceedings of the 2015 Workshop on Mobile Big Data (Mobidata '15). ACM, 37–42.

[38] Shanhe Yi, Zhengrui Qin, and Qun Li. 2015. Security and privacy issues of fog computing: A survey. In Wireless Algorithms, Systems, and Applications. Springer, 685–695.

[39] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: Cluster computing with working sets. HotCloud 10 (2010), 10–10.

[40] Haoyu Zhang, Ganesh Ananthanarayanan, Peter Bodik, Matthai Philipose, Paramvir Bahl, and Michael J. Freedman. 2017. Live video analytics at scale with approximation and delay-tolerance. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 377–392. https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/zhang

[41] Q. Zhang, X. Zhang, Q. Zhang, W. Shi, and H. Zhong. 2016. Firework: Big data sharing and processing in collaborative edge environment. In 2016 Fourth IEEE Workshop on Hot Topics in Web Systems and Technologies (HotWeb). 20–25. DOI: https://doi.org/10.1109/HotWeb.2016.12

[42] Tan Zhang, Aakanksha Chowdhery, Paramvir Victor Bahl, Kyle Jamieson, and Suman Banerjee. 2015. The design and implementation of a wireless video surveillance system. In Proceedings of the 21st Annual International Conference on Mobile Computing and Networking. ACM, 426–438.


SEC rsquo17 October 12ndash14 2017 San Jose Silicon Valley CA USA S Yi et al

640x480workload2

960x720workload2

1280x960workload2

1600x1200workload2

01

23

4R

esp

onse

Tim

e p

er

fram

e p

er

clie

nt(

s) Client-edge optClient onlyEdge onlyClient-cloud optCloud only

Figure 10 e comparison of task selection impacts on edgeoloading and cloud oloading for wired clients (RPi2)

[Figure 11: The comparison of task selection impacts on edge offloading and cloud offloading for 2.4 GHz wireless clients (RPi3). Response time per frame per client (s) vs. input resolution (640x480 to 1600x1200, workload 2), for Client-edge opt, Client only, Edge only, Client-cloud opt, and Cloud only.]

6.4 Offloading Task Selection

To understand how much the execution time can be reduced by splitting tasks between the client and the edge, or between the client and the cloud, we design an experiment with workloads generated from dataset 2 on two setups: 1) one edge node provides service to three wired client nodes that have the best network latency and bandwidth; 2) one edge node provides service to three wireless 2.4 GHz client nodes that have latency with high variance and relatively low bandwidth. The result of the first case is very straightforward: the clients simply upload all the input data and run all the tasks on the edge node in edge offloading, or on the cloud node in cloud offloading, as shown in Fig. 10. This is mainly because the Ethernet cable stably provides the lowest latency and highest bandwidth, which makes offloading to the edge very rewarding. We did not evaluate 5 GHz wireless clients since this interface is not supported on our client hardware, but we anticipate results similar to the wired case. We plot the result of a 2.4 GHz wireless client node offloading to an edge node or a remote cloud node in the second case in Fig. 11. Overall, the results show that by offloading tasks to an edge computing platform,

the application we chose experienced a speedup of up to 4.0x in the wired client-edge configuration compared to local execution, and up to 1.7x compared to a similar client-cloud configuration. For clients with a 2.4 GHz wireless interface, the speedup is up to 1.3x in the client-edge configuration compared to local execution, and up to 1.2x compared to a similar client-cloud configuration.
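To make the task-splitting decision concrete, it can be viewed as choosing a split point in a linear pipeline: stages before the split run on the client, the output of the last local stage is shipped once, and the remaining stages run on the edge (or cloud). The sketch below is our own minimal illustration under that single-split assumption; the function name, cost model, and parameters are ours, not LAVEA's actual formulation from Section 4.1.

```python
# Hypothetical sketch: pick the split point of a linear pipeline that
# minimizes total response time. Costs are in seconds, sizes in Mb,
# bandwidth in Mb/s. All names here are illustrative assumptions.

def best_split(input_size, client_cost, edge_cost, out_size, bandwidth, rtt):
    """Return (split, time): stages [0, split) run on the client, the data
    crossing the split is uploaded once, and stages [split, n) run remotely."""
    n = len(client_cost)
    best_s, best_t = 0, float("inf")
    for s in range(n + 1):
        local = sum(client_cost[:s])
        # Data crossing the split: the raw input when nothing runs locally,
        # otherwise the output of the last locally executed stage.
        data = input_size if s == 0 else out_size[s - 1]
        transfer = (rtt + data / bandwidth) if s < n else 0.0
        remote = sum(edge_cost[s:])
        total = local + transfer + remote
        if total < best_t:
            best_s, best_t = s, total
    return best_s, best_t
```

For instance, with a cheap intermediate representation (small `out_size[0]`), offloading everything after the first stage beats both all-local and all-remote execution, mirroring why the wired clients in Fig. 10 favor full offloading while slower links favor partial splits.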

[Figure 12: The comparison result of three task prioritizing schemes. Response time (s) vs. number of task offloading requests (5–35), for Our scheme, SIOF, and LCPUL.]

6.5 Edge-front Task Queue Prioritizing

To evaluate the performance of task queue prioritizing, we collect statistical results from our profiler service and monitoring service on various workloads for simulation. We choose the simulation method because we can freely set up the numbers and types of client and edge nodes, overcoming the limitation of our current testbed to evaluate more complex deployments. We add two simple schemes as baselines: 1) shortest IO first (SIOF), sorting all the tasks by the time cost of network transmission; 2) longest CPU last (LCPUL), sorting all the tasks by the time cost of processing on the edge node. In the simulation, based on the combination of client device types, workloads and offloading decisions, we have in total seven types of jobs to run on the edge node. We increase the total number of jobs, distribute them evenly among the seven types, and report the makespan time in Fig. 12. The result shows that LCPUL is the worst among the three schemes, and that our scheme outperforms the SIOF scheme.
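The makespan being compared can be reproduced with a toy two-stage (transfer, then compute) flow-shop model. The sketch below is our own simplification, not the paper's simulator: `makespan` computes the completion time of a given job order, and `johnson_order` applies Johnson's rule [19], which the paper cites, as a plausible stand-in for "our scheme".

```python
# Two-stage flow-shop makespan: each job must first be transmitted over a
# shared link (stage 1), then processed on the edge node's CPU (stage 2).

def makespan(jobs):
    """Completion time of a job order. jobs: list of (io_time, cpu_time)."""
    t_net = t_cpu = 0.0
    for io, cpu in jobs:
        t_net += io                      # transfers are serialized on the link
        t_cpu = max(t_cpu, t_net) + cpu  # compute waits for this job's transfer
    return t_cpu

def johnson_order(jobs):
    """Johnson's rule: jobs with io <= cpu first (ascending io), then the
    rest by descending cpu. Optimal for the two-stage model above."""
    front = sorted((j for j in jobs if j[0] <= j[1]), key=lambda j: j[0])
    back = sorted((j for j in jobs if j[0] > j[1]), key=lambda j: -j[1])
    return front + back
```

Sorting purely by IO time (SIOF) or pushing long CPU jobs last (LCPUL) only looks at one of the two stages, which is consistent with both baselines losing to a scheme that accounts for the overlap between transmission and processing.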

6.6 Inter-Edge Collaboration

We also evaluate the performance of the three task placement schemes (i.e., STTF, SQLF and SSLF) discussed in Section 5 through a controlled experiment on our testbed. For evaluation purposes, we configure the network in the edge computing system as follows. The first edge node, denoted as "edge node 1" afterwards, has 10 ms RTT and 40 Mbps bandwidth to the edge-front node. The second edge node, "edge node 2", has 20 ms RTT and 20 Mbps bandwidth to the edge-front node. The third edge node, "edge node 3", has 100 ms RTT and 2 Mbps bandwidth to the edge-front node. Thus, we emulate the situation where the three edge nodes range in distance from the edge-front node, from near to far.


[Figure 13: Performance with no task placement scheme. Throughput (tasks/sec, 0–3) vs. time (0–12 min) for the edge-front node and edge nodes 1–3.]

[Figure 14: Performance of STTF. Throughput (tasks/sec, 0–3) vs. time (0–12 min) for the edge-front node and edge nodes 1–3.]

[Figure 15: Performance of SQLF. Throughput (tasks/sec, 0–3) vs. time (0–12 min) for the edge-front node and edge nodes 1–3.]

[Figure 16: Performance of SSLF. Throughput (tasks/sec, 0–3) vs. time (0–12 min) for the edge-front node and edge nodes 1–3.]

We use the third dataset to synthesize a workload as follows. In the first 4 minutes, the edge-front node receives 5 task requests per second, edge node 1 receives 4 task requests per second, edge node 2 receives 3 task requests per second, and edge node 3 receives 2 task requests per second. No task arrives at any of the edge nodes after the first 4 minutes. For the SSLF task placement scheme, we implement a simple linear regression to predict the scheduling latency of the task being transmitted, since the workload we inject is uniformly distributed.
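A minimal sketch of such a predictor, assuming scheduling latency is modeled as a linear function of a single observed feature (e.g. the target node's queue length); the feature choice and function names are our assumptions, not the paper's exact regressor:

```python
# Ordinary least squares for y = a*x + b, fit on observed
# (queue_length, scheduling_latency) pairs reported by the monitor.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    return a, my - a * mx          # slope, intercept

def predict(model, queue_len):
    a, b = model
    return a * queue_len + b
```

With a uniform task mix, a single linear feature is a defensible choice; a heavier-tailed workload would likely need per-job-type features or a richer model.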

Fig. 13 illustrates the throughput on each edge node when no task placement scheme is enabled on the edge-front node. The edge-front node has the heaviest workload, and it takes about 12.36 minutes to finish all the tasks. We consider this result as our baseline.

Fig. 14 shows the throughput result of the STTF scheme. In this case, the edge-front node only transmits tasks to edge node 1, because edge node 1 has the highest bandwidth and the shortest RTT to the edge-front node. Fig. 17 reveals that the edge-front node transmits 120 tasks to edge node 1 and no tasks to the other edge nodes. As edge node 1 has a heavier workload than edge node 2 and edge node 3, the STTF scheme yields only limited improvement on the system performance: the edge-front node takes about 11.29 minutes to finish all the tasks. Fig. 15 illustrates the throughput result of the SQLF

scheme. This scheme works better than the STTF scheme because the edge-front node transmits more tasks to less-saturated edge nodes, efficiently reducing the workload on the edge-front node. However, the edge-front node tends to transmit many tasks to edge node 3 at the beginning, which has the lowest bandwidth and the longest RTT to the edge-front node. As such, the task placement may incur more delay than expected. From Fig. 17, the edge-front node transmits no tasks to edge node 1, 132 tasks to edge node 2, and 152 tasks to edge node 3. The edge-front node takes about 9.6 minutes to finish all the tasks.

Fig. 16 demonstrates the throughput result of the SSLF scheme. This scheme considers both the transmission time of the task being placed and the waiting time in the queue on the target edge node, and therefore achieves the best performance of the three. As mentioned, edge node 1 has the lowest transmission overhead but the heaviest workload among the three edge nodes, while edge node 3 has the lightest workload but the highest transmission overhead. In contrast, edge node 2 has modest transmission overhead and modest workload. The SSLF scheme takes all these situations into consideration and places the most tasks on edge node 2. As shown in Fig. 17, the edge-front node transmits 4 tasks to edge node 1, 152 tasks to edge node 2, and 148 tasks to edge node 3 when working with the SSLF scheme. The edge-front node


takes about 9.36 minutes to finish all the tasks, which is the best result among the three schemes. We infer that the third scheme will further improve the task completion time when tougher network conditions and workloads are considered.
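The three policies compared above can be summarized as selection functions over candidate nodes. This is our own condensed sketch rather than the paper's implementation; the dictionary fields (`rtt`, `bw`, `queue_len`, `predicted_wait`) are assumed names for the quantities each policy consults.

```python
# Times in seconds, bandwidth in Mb/s, task_size in Mb.

def sttf(candidates, task_size):
    """Shortest Transmission Time First: minimize rtt + size/bandwidth."""
    return min(candidates, key=lambda c: c["rtt"] + task_size / c["bw"])

def sqlf(candidates, task_size):
    """Shortest Queue Length First. (task_size unused; uniform interface.)"""
    return min(candidates, key=lambda c: c["queue_len"])

def sslf(candidates, task_size):
    """Shortest Scheduling Latency First: transmission time plus the
    predicted waiting time in the target node's queue."""
    return min(candidates,
               key=lambda c: c["rtt"] + task_size / c["bw"]
                             + c["predicted_wait"])
```

Plugging in the experiment's link parameters shows the observed behavior directly: STTF always picks the nearest node, SQLF the emptiest node, and SSLF balances the two and favors the middle node.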

[Figure 17: Numbers of tasks placed by the edge-front node on edge nodes 1–3 under STTF, SQLF, and SSLF.]

7 RELATED WORK

The emergence of edge computing has drawn attention due to its capability to reshape the landscape of IoT, mobile computing and cloud computing [6, 14, 32, 33, 36–38]. Satyanarayanan [29] has briefed the origin of edge computing, also known as fog computing [4], cloudlet [28], mobile edge computing [24], and so on. Here we review several research fields relevant to video edge analytics, including distributed data processing and computation offloading in various computing paradigms.

7.1 Distributed Data Processing

Distributed data processing is closely related to edge analytics in the sense that those data processing platforms [9, 39] and underlying techniques [16, 23, 27] can be easily deployed on a cluster of edge nodes. In this paper, we pay special attention to distributed image/video data processing systems. VideoStorm [40] made insightful observations on vision-related algorithms and proposed a resource-quality trade-off with multi-dimensional configurations (e.g., video resolution, frame rate, sampling rate, sliding window size, etc.). The resource-quality profiles are generated offline, and an online scheduler is built to allocate resources to queries to optimize the utility of quality and latency. Their work is complementary to ours in that we do not consider the trade-off between quality and latency goals via adaptive configurations. Vigil [42] is a wireless video surveillance system that leverages edge computing nodes with an emphasis on content-aware frame selection, in a scenario where multiple web cameras at the same location optimize bandwidth utilization, which is orthogonal to the problems we address here. Firework [41] is a computing paradigm for big data processing in a collaborative edge environment, which is complementary to our work in terms of shared data view and programming interface.

While there should be more on-going efforts for investigating the adaptation, improvement and optimization of existing distributed data processing techniques on edge computing platforms, we focus on task/application-level queue management and scheduling, and leave the underlying resource negotiation and process scheduling to the container cluster engine.

7.2 Computation Offloading

Computation offloading (a.k.a. cyber foraging [28]) has been proposed to improve resource utilization, response time and energy consumption in various computing environments [7, 13, 21, 31]. The work in [17] has quantified the impact of edge computing on mobile applications, and found that edge computing can significantly improve response time and energy consumption for mobile devices through offloading via both WiFi and LTE networks. Mocha [34] has investigated how a two-stage face recognition task from a mobile device can be accelerated by cloudlet and cloud. In their design, clients simply capture images and send them to the cloudlet. The optimal task partition can be easily achieved as it has only two stages. In LAVEA, our application is more complicated, with multiple stages, and we leverage client-edge offloading and other techniques to improve resource utilization and optimize the response time.

8 DISCUSSIONS AND LIMITATIONS

In this section, we discuss alternative design options, and point out current limitations and future work that can improve the system.

Measurement-based Offloading. In this paper, we use measurement-based (static) offloading, i.e., the offloading decisions are based on the outcome of periodic measurements. We consider this one of the limitations of our implementation, as stated in [15], and several dynamic computation offloading schemes have been proposed [12]. We plan to improve the measurement-based offloading in future work.

Video Streaming. Our current data processing is image-based, which is one of the limitations of our implementation. The input is either in the format of images or a video stream, which is read into frames and sent out. We believe that utilizing existing video streaming techniques between our system components for data sharing will further improve the system performance and open more potential opportunities for optimization.

Discovering Edge Nodes. There are different ways for the edge-front node to discover the available edge nodes nearby. For example, every edge node intending to serve as a collaborator may open a designated port, so that the edge-front node can periodically scan the network and discover the available edge nodes. This is called the "pull-based" method. In contrast, there is also a "push-based" method, in which the edge-front node opens a designated port and every edge node intending to serve as a collaborator registers with the edge-front node. When the network is at a large scale, the pull-based method usually performs poorly, because the edge-front node may not be able to discover an available edge node in a short time. For this reason, our edge node discovery is implemented with the push-based method, which guarantees good performance regardless of the network scale.
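A push-based registry along these lines might look like the following sketch; the class name, TTL-based liveness check, and heartbeat handling are our assumptions, since the paper does not detail the registration protocol (the network transport over the designated port is omitted).

```python
import time

class EdgeRegistry:
    """Push-based discovery sketch: collaborator edge nodes register
    themselves with the edge-front node and refresh via heartbeats."""

    def __init__(self, ttl=30.0):
        self.ttl = ttl       # seconds before a silent node is considered gone
        self.nodes = {}      # addr -> timestamp of last registration/heartbeat

    def register(self, addr):
        """Called on initial registration and on every heartbeat."""
        self.nodes[addr] = time.monotonic()

    def alive(self):
        """Collaborators whose last heartbeat is within the TTL window."""
        now = time.monotonic()
        return [a for a, t in self.nodes.items() if now - t <= self.ttl]
```

Because nodes announce themselves, the edge-front node's work stays proportional to the number of willing collaborators rather than the size of the address space it would otherwise have to scan.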

9 CONCLUSION

In this paper, we have investigated providing video analytics services to latency-sensitive applications in an edge computing environment.


As a result, we have built LAVEA, a low-latency video edge analytics system, which collaborates nearby client, edge and remote cloud nodes, and transforms video feeds into semantic information at places closer to the users in early stages. We have utilized an edge-front design, formulated an optimization problem for offloading task selection, and prioritized the task queue to minimize the response time. Our results indicate that by offloading tasks to the closest edge node, the client-edge configuration has a speedup ranging from 1.3x to 4x (1.2x to 1.7x) against running locally (client-cloud configuration) under various network conditions and workloads. In case of a saturating workload on the front edge node, we have proposed and compared various task placement schemes that are tailored for inter-edge collaboration. The proposed prediction-based shortest scheduling latency first task placement scheme considers both the transmission time of the task and the waiting time in the queue, and delivers better overall performance than the other schemes.

REFERENCES
[1] Amazon Web Services. 2017. AWS Lambda@Edge. https://docs.aws.amazon.com/lambda/latest/dg/lambda-edge.html. (2017).
[2] Christos-Nikolaos E. Anagnostopoulos, Ioannis E. Anagnostopoulos, Ioannis D. Psoroulas, Vassili Loumos, and Eleftherios Kayafas. 2008. License plate recognition from still images and video sequences: A survey. IEEE Transactions on Intelligent Transportation Systems 9, 3 (2008), 377–391.
[3] K. R. Baker. 1990. Scheduling groups of jobs in the two-machine flow shop. Mathematical and Computer Modelling 13, 3 (1990), 29–36.
[4] Flavio Bonomi, Rodolfo Milito, Jiang Zhu, and Sateesh Addepalli. 2012. Fog computing and its role in the internet of things. In Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing. ACM, 13–16.
[5] Peter Brucker, Bernd Jurisch, and Bernd Sievers. 1994. A branch and bound algorithm for the job-shop scheduling problem. Discrete Applied Mathematics 49, 1 (1994), 107–127.
[6] Yu Cao, Songqing Chen, Peng Hou, and Donald Brown. 2015. FAST: A fog computing assisted distributed analytics system to monitor fall for stroke mitigation. In Networking, Architecture and Storage (NAS), 2015 IEEE International Conference on. IEEE, 2–11.
[7] Eduardo Cuervo, Aruna Balasubramanian, Dae-ki Cho, Alec Wolman, Stefan Saroiu, Ranveer Chandra, and Paramvir Bahl. 2010. MAUI: Making smartphones last longer with code offload. In Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services (MobiSys '10). ACM, New York, NY, USA, 49–62. DOI: https://doi.org/10.1145/1814433.1814441
[8] Eyal de Lara, Carolina S. Gomes, Steve Langridge, S. Hossein Mortazavi, and Meysam Roodi. 2016. Hierarchical serverless computing for the mobile edge. In Edge Computing (SEC), IEEE/ACM Symposium on. IEEE, 109–110.
[9] Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (2008), 107–113.
[10] Shan Du, Mahmoud Ibrahim, Mohamed Shehata, and Wael Badawy. 2013. Automatic license plate recognition (ALPR): A state-of-the-art review. IEEE Transactions on Circuits and Systems for Video Technology 23, 2 (2013), 311–325.
[11] Sadjad Fouladi, Riad S. Wahby, Brennan Shacklett, Karthikeyan Vasuki Balasubramaniam, William Zeng, Rahul Bhalerao, Anirudh Sivaraman, George Porter, and Keith Winstein. 2017. Encoding, fast and slow: Low-latency video processing using thousands of tiny threads. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 363–376. https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/fouladi
[12] Wei Gao, Yong Li, Haoyang Lu, Ting Wang, and Cong Liu. 2014. On exploiting dynamic execution patterns for workload offloading in mobile cloud applications. In Network Protocols (ICNP), 2014 IEEE 22nd International Conference on. IEEE, 1–12.
[13] Mark S. Gordon, Davoud Anoushe Jamshidi, Scott A. Mahlke, Zhuoqing Morley Mao, and Xu Chen. 2012. COMET: Code offload by migrating execution transparently. In OSDI, Vol. 12. 93–106.
[14] Zijiang Hao, Ed Novak, Shanhe Yi, and Qun Li. 2017. Challenges and software architecture for fog computing. IEEE Internet Computing (2017).
[15] Mohammed A. Hassan, Kshitiz Bhattarai, Qi Wei, and Songqing Chen. 2014. POMAC: Properly offloading mobile applications to clouds. In 6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 14).
[16] Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy H. Katz, Scott Shenker, and Ion Stoica. 2011. Mesos: A platform for fine-grained resource sharing in the data center. In NSDI, Vol. 11. 22–22.
[17] Wenlu Hu, Ying Gao, Kiryong Ha, Junjue Wang, Brandon Amos, Zhuo Chen, Padmanabhan Pillai, and Mahadev Satyanarayanan. 2016. Quantifying the impact of edge computing on mobile applications. In Proceedings of the 7th ACM SIGOPS Asia-Pacific Workshop on Systems. ACM, 5.
[18] IBM. 2017. Apache OpenWhisk. http://openwhisk.org. (April 2017).
[19] Selmer Martin Johnson. 1954. Optimal two- and three-stage production schedules with setup times included. Naval Research Logistics Quarterly 1, 1 (1954), 61–68.
[20] Eric Jonas, Shivaram Venkataraman, Ion Stoica, and Benjamin Recht. 2017. Occupy the cloud: Distributed computing for the 99%. arXiv preprint arXiv:1702.04024 (2017).
[21] Ryan Newton, Sivan Toledo, Lewis Girod, Hari Balakrishnan, and Samuel Madden. 2009. Wishbone: Profile-based partitioning for sensornet applications. In NSDI, Vol. 9. 395–408.
[22] OpenALPR. 2017. OpenALPR – Automatic License Plate Recognition. http://www.openalpr.com. (April 2017).
[23] Kay Ousterhout, Patrick Wendell, Matei Zaharia, and Ion Stoica. 2013. Sparrow: distributed, low latency scheduling. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 69–84.
[24] M. Patel, B. Naughton, C. Chan, N. Sprecher, S. Abeta, A. Neal, and others. 2014. Mobile-edge computing introductory technical white paper. White Paper, Mobile-edge Computing (MEC) Industry Initiative (2014).
[25] Brad Philip and Paul Updike. 2001. Caltech Vision Group 2001 testing database. http://www.vision.caltech.edu/html-files/archive.html. (2001).
[26] Kari Pulli, Anatoly Baksheev, Kirill Kornyakov, and Victor Eruhimov. 2012. Real-time computer vision with OpenCV. Commun. ACM 55, 6 (2012), 61–69.
[27] Jeff Rasley, Konstantinos Karanasos, Srikanth Kandula, Rodrigo Fonseca, Milan Vojnovic, and Sriram Rao. 2016. Efficient queue management for cluster scheduling. In Proceedings of the Eleventh European Conference on Computer Systems. ACM, 36.
[28] Mahadev Satyanarayanan. 2001. Pervasive computing: Vision and challenges. IEEE Personal Communications 8, 4 (2001), 10–17.
[29] Mahadev Satyanarayanan. 2017. The emergence of edge computing. Computer 50, 1 (2017), 30–39.
[30] Mahadev Satyanarayanan, Pieter Simoens, Yu Xiao, Padmanabhan Pillai, Zhuo Chen, Kiryong Ha, Wenlu Hu, and Brandon Amos. 2015. Edge analytics in the internet of things. IEEE Pervasive Computing 14, 2 (2015), 24–31.
[31] Cong Shi, Karim Habak, Pranesh Pandurangan, Mostafa Ammar, Mayur Naik, and Ellen Zegura. 2014. COSMOS: Computation offloading as a service for mobile devices. In Proceedings of the 15th ACM International Symposium on Mobile Ad Hoc Networking and Computing. ACM, 287–296.
[32] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu. 2016. Edge computing: Vision and challenges. IEEE Internet of Things Journal 3, 5 (Oct. 2016), 637–646. DOI: https://doi.org/10.1109/JIOT.2016.2579198
[33] Weisong Shi and Schahram Dustdar. 2016. The promise of edge computing. Computer 49, 5 (2016), 78–81.
[34] Tolga Soyata, Rajani Muraleedharan, Colin Funai, Minseok Kwon, and Wendi Heinzelman. 2012. Cloud-Vision: Real-time face recognition using a mobile-cloudlet-cloud acceleration architecture. In Computers and Communications (ISCC), 2012 IEEE Symposium on. IEEE, 000059–000066.
[35] Xiaoli Wang, Aakanksha Chowdhery, and Mung Chiang. 2016. SkyEyes: adaptive video streaming from UAVs. In Proceedings of the 3rd Workshop on Hot Topics in Wireless. ACM, 2–6.
[36] Shanhe Yi, Zijiang Hao, Zhengrui Qin, and Qun Li. 2015. Fog computing: Platform and applications. In Hot Topics in Web Systems and Technologies (HotWeb), 2015 Third IEEE Workshop on. IEEE, 73–78.
[37] Shanhe Yi, Cheng Li, and Qun Li. 2015. A survey of fog computing: Concepts, applications and issues. In Proceedings of the 2015 Workshop on Mobile Big Data (Mobidata '15). ACM, 37–42.
[38] Shanhe Yi, Zhengrui Qin, and Qun Li. 2015. Security and privacy issues of fog computing: A survey. In Wireless Algorithms, Systems, and Applications. Springer, 685–695.
[39] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: cluster computing with working sets. HotCloud 10 (2010), 10–10.
[40] Haoyu Zhang, Ganesh Ananthanarayanan, Peter Bodik, Matthai Philipose, Paramvir Bahl, and Michael J. Freedman. 2017. Live video analytics at scale with approximation and delay-tolerance. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 377–392. https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/zhang
[41] Q. Zhang, X. Zhang, Q. Zhang, W. Shi, and H. Zhong. 2016. Firework: Big data sharing and processing in collaborative edge environment. In 2016 Fourth IEEE Workshop on Hot Topics in Web Systems and Technologies (HotWeb). 20–25. DOI: https://doi.org/10.1109/HotWeb.2016.12
[42] Tan Zhang, Aakanksha Chowdhery, Paramvir Victor Bahl, Kyle Jamieson, and Suman Banerjee. 2015. The design and implementation of a wireless video surveillance system. In Proceedings of the 21st Annual International Conference on Mobile Computing and Networking. ACM, 426–438.


[5] Peter Brucker Bernd Jurisch and Bernd Sievers 1994 A branch and boundalgorithm for the job-shop scheduling problem Discrete applied mathematics 491 (1994) 107ndash127

[6] Yu Cao Songqing Chen Peng Hou and Donald Brown 2015 FAST A fog com-puting assisted distributed analytics system to monitor fall for stroke mitigationIn Networking Architecture and Storage (NAS) 2015 IEEE International Conferenceon IEEE 2ndash11

[7] Eduardo Cuervo Aruna Balasubramanian Dae-ki Cho Alec Wolman StefanSaroiu Ranveer Chandra and Paramvir Bahl 2010 MAUI Making SmartphonesLast Longer with Code Ooad In Proceedings of the 8th International Conferenceon Mobile Systems Applications and Services (MobiSys rsquo10) ACM New York NYUSA 49ndash62 DOIhpsdoiorg10114518144331814441

[8] Eyal de Lara Carolina S Gomes Steve Langridge S Hossein Mortazavi andMeysam Roodi 2016 Hierarchical Serverless Computing for the Mobile EdgeIn Edge Computing (SEC) IEEEACM Symposium on IEEE 109ndash110

[9] Jerey Dean and Sanjay Ghemawat 2008 MapReduce simplied data processingon large clusters Commun ACM 51 1 (2008) 107ndash113

[10] Shan Du Mahmoud Ibrahim Mohamed Shehata and Wael Badawy 2013 Auto-matic license plate recognition (ALPR) A state-of-the-art review IEEE Transac-tions on circuits and systems for video technology 23 2 (2013) 311ndash325

[11] Sadjad Fouladi Riad S Wahby Brennan Shackle Karthikeyan Vasuki Balasubra-maniam William Zeng Rahul Bhalerao Anirudh Sivaraman George Porter andKeith Winstein 2017 Encoding Fast and Slow Low-Latency Video ProcessingUsing ousands of Tiny reads In 14th USENIX Symposium on NetworkedSystems Design and Implementation (NSDI 17) USENIX Association BostonMA 363ndash376 hpswwwusenixorgconferencensdi17technical-sessionspresentationfouladi

[12] Wei Gao Yong Li Haoyang Lu Ting Wang and Cong Liu 2014 On exploitingdynamic execution paerns for workload ooading in mobile cloud applicationsIn Network Protocols (ICNP) 2014 IEEE 22nd International Conference on IEEE1ndash12

[13] Mark S Gordon Davoud Anoushe Jamshidi Sco A Mahlke Zhuoqing Mor-ley Mao and Xu Chen 2012 COMET Code Ooad by Migrating ExecutionTransparently In OSDI Vol 12 93ndash106

[14] Zijiang Hao Ed Novak Shanhe Yi andn Li 2017 Challenges and SowareArchitecture for Fog Computing Internet Computing (2017)

[15] Mohammed A Hassan Kshitiz Bhaarai Qi Wei and Songqing Chen 2014POMAC Properly Ooading Mobile Applications to Clouds In 6th USENIXWorkshop on Hot Topics in Cloud Computing (HotCloud 14)

[16] Benjamin Hindman Andy Konwinski Matei Zaharia Ali Ghodsi Anthony DJoseph Randy H Katz Sco Shenker and Ion Stoica 2011 Mesos A Platformfor Fine-Grained Resource Sharing in the Data Center In NSDI Vol 11 22ndash22

[17] Wenlu Hu Ying Gao Kiryong Ha Junjue Wang Brandon Amos Zhuo ChenPadmanabhan Pillai andMahadev Satyanarayanan 2016antifying the Impactof Edge Computing onMobile Applications In Proceedings of the 7th ACM SIGOPSAsia-Pacic Workshop on Systems ACM 5

[18] IBM 2017 Apache OpenWhisk hpopenwhiskorg (April 2017)[19] Selmer Martin Johnson 1954 Optimal two-and three-stage production schedules

with setup times included Naval research logistics quarterly 1 1 (1954) 61ndash68[20] Eric Jonas Shivaram Venkataraman Ion Stoica and Benjamin Recht 2017

Occupy the Cloud Distributed computing for the 99 arXiv preprintarXiv170204024 (2017)

[21] Ryan Newton Sivan Toledo Lewis Girod Hari Balakrishnan and Samuel Mad-den 2009 Wishbone Prole-based Partitioning for Sensornet Applications InNSDI Vol 9 395ndash408

[22] OpenALPR 2017 OpenALPR ndash Automatic License Plate Recognition hpwwwopenalprcom (April 2017)

[23] Kay Ousterhout Patrick Wendell Matei Zaharia and Ion Stoica 2013 Sparrowdistributed low latency scheduling In Proceedings of the Twenty-Fourth ACMSymposium on Operating Systems Principles ACM 69ndash84

[24] M Patel B Naughton C Chan N Sprecher S Abeta A Neal and others 2014Mobile-edge computing introductory technical white paper White Paper Mobile-edge Computing (MEC) industry initiative (2014)

[25] Brad Philip and Paul Updike 2001 Caltech Vision Group 2001 testing databasehpwwwvisioncaltecheduhtml-lesarchivehtml (2001)

[26] Kari Pulli Anatoly Baksheev Kirill Kornyakov and Victor Eruhimov 2012 Real-time computer vision with OpenCV Commun ACM 55 6 (2012) 61ndash69

[27] Je Rasley Konstantinos Karanasos Srikanth Kandula Rodrigo Fonseca MilanVojnovic and Sriram Rao 2016 Ecient queue management for cluster sched-uling In Proceedings of the Eleventh European Conference on Computer SystemsACM 36

[28] Mahadev Satyanarayanan 2001 Pervasive computing Vision and challengesIEEE Personal communications 8 4 (2001) 10ndash17

[29] Mahadev Satyanarayanan 2017 e Emergence of Edge Computing Computer50 1 (2017) 30ndash39

[30] Mahadev Satyanarayanan Pieter Simoens Yu Xiao Padmanabhan Pillai ZhuoChen Kiryong Ha Wenlu Hu and Brandon Amos 2015 Edge analytics in theinternet of things IEEE Pervasive Computing 14 2 (2015) 24ndash31

[31] Cong Shi Karim Habak Pranesh Pandurangan Mostafa Ammar Mayur Naikand Ellen Zegura 2014 Cosmos computation ooading as a service for mobiledevices In Proceedings of the 15th ACM international symposium on Mobile adhoc networking and computing ACM 287ndash296

[32] W Shi J Cao Q Zhang Y Li and L Xu 2016 Edge Computing Visionand Challenges IEEE Internet of ings Journal 3 5 (Oct 2016) 637ndash646 DOIhpsdoiorg101109JIOT20162579198

[33] Weisong Shi and Schahram Dustdar 2016 e Promise of Edge ComputingComputer 49 5 (2016) 78ndash81

[34] Tolga Soyata Rajani Muraleedharan Colin Funai Minseok Kwon and WendiHeinzelman 2012 Cloud-Vision Real-time face recognition using a mobile-cloudlet-cloud acceleration architecture In Computers and Communications(ISCC) 2012 IEEE Symposium on IEEE 000059ndash000066

[35] Xiaoli Wang Aakanksha Chowdhery andMung Chiang 2016 SkyEyes adaptivevideo streaming from UAVs In Proceedings of the 3rd Workshop on Hot Topics inWireless ACM 2ndash6

[36] Shanhe Yi Zijiang Hao Zhengrui Qin andn Li 2015 Fog Computing Plat-form and Applications In Hot Topics in Web Systems and Technologies (HotWeb)2015 ird IEEE Workshop on IEEE 73ndash78

[37] Shanhe Yi Cheng Li and n Li 2015 A Survey of Fog Computing ConceptsApplications and Issues In Proceedings of the 2015 Workshop on Mobile Big DataMobidata rsquo15 ACM 37ndash42

[38] Shanhe Yi Zhengrui Qin andn Li 2015 Security and privacy issues of fogcomputing A survey InWireless Algorithms Systems and Applications Springer685ndash695

[39] Matei Zaharia Mosharaf Chowdhury Michael J Franklin Sco Shenker and IonStoica 2010 Spark cluster computing with working sets HotCloud 10 (2010)10ndash10

[40] Haoyu Zhang Ganesh Ananthanarayanan Peter Bodik Mahai PhiliposeParamvir Bahl and Michael J Freedman 2017 Live Video Analytics at Scale withApproximation and Delay-Tolerance In 14th USENIX Symposium on NetworkedSystems Design and Implementation (NSDI 17) USENIX Association BostonMA 377ndash392 hpswwwusenixorgconferencensdi17technical-sessionspresentationzhang

[41] Q Zhang X Zhang Q Zhang W Shi and H Zhong 2016 Firework Big DataSharing and Processing in Collaborative Edge Environment In 2016 Fourth IEEEWorkshop on Hot Topics in Web Systems and Technologies (HotWeb) 20ndash25 DOIhpsdoiorg101109HotWeb201612

[42] Tan Zhang Aakanksha Chowdhery Paramvir Victor Bahl Kyle Jamieson andSuman Banerjee 2015 e design and implementation of a wireless videosurveillance system In Proceedings of the 21st Annual International Conferenceon Mobile Computing and Networking ACM 426ndash438

  • Abstract
  • 1 Introduction
  • 2 Background and Motivation
    • 21 Edge Computing Network
    • 22 Serverless Architecture
    • 23 Video Edge Analytics for Public Safety
      • 3 LAVEA System Design
        • 31 Design Goals
        • 32 System Overview
        • 33 Edge Computing Services
          • 4 Edge-front Offloading
            • 41 Task Offloading System Model and Problem Formulation
            • 42 Optimization Solver
            • 43 Prioritizing Edge Task Queue
            • 44 Workload Optimizer
              • 5 Inter-edge Collaboration
                • 51 Motivation and Challenges
                • 52 Inter-Edge Task Placement Schemes
                  • 6 System Implementation and Performance Evaluation
                    • 61 Implementation Details
                    • 62 Evaluation Setup
                    • 63 Task Profiler
                    • 64 Offloading Task Selection
                    • 65 Edge-front Task Queue Prioritizing
                    • 66 Inter-Edge Collaboration
                      • 7 Related Work
                        • 71 Distributed Data Processing
                        • 72 Computation Offloading
                          • 8 Discussions and Limitations
                          • 9 Conclusion
                          • References
Page 12: LAVEA: Latency-aware Video Analytics on Edge Computing Platformweisongshi.org/papers/yi17-LAVEA.pdf · 2020. 8. 19. · LAVEA: Latency-aware Video Analytics on Edge Computing Platform

SEC '17, October 12–14, 2017, San Jose / Silicon Valley, CA, USA. S. Yi et al.

takes about 9.36 minutes to finish all the tasks, which is the best result among the three schemes. We infer that the third scheme will further improve the task completion time when tougher network conditions and workloads are considered.

Figure 17: Numbers of tasks placed by the edge-front node (bar chart comparing the STTF, SQLF and SSLF schemes across edge nodes 1–3).

7 RELATED WORK
The emergence of edge computing has drawn attention due to its capability to reshape the landscape of IoT, mobile computing and cloud computing [6, 14, 32, 33, 36–38]. Satyanarayanan [29] has recounted the origin of edge computing, also known as fog computing [4], cloudlet [28], mobile edge computing [24] and so on. Here we review several research fields relevant to video edge analytics, including distributed data processing and computation offloading in various computing paradigms.

7.1 Distributed Data Processing
Distributed data processing is closely related to edge analytics in the sense that those data processing platforms [9, 39] and underlying techniques [16, 23, 27] can be easily deployed on a cluster of edge nodes. In this paper, we pay special attention to distributed image/video data processing systems. VideoStorm [40] made insightful observations on vision-related algorithms and proposed a resource-quality trade-off with multi-dimensional configurations (e.g., video resolution, frame rate, sampling rate, sliding window size, etc.). The resource-quality profiles are generated offline, and an online scheduler is built to allocate resources to queries so as to optimize the utility of quality and latency. Their work is complementary to ours in that we do not consider the trade-off between quality and latency goals via adaptive configurations. Vigil [42] is a wireless video surveillance system that leverages edge computing nodes with emphasis on content-aware frame selection in a scenario where multiple web cameras are at the same location, so as to optimize bandwidth utilization, which is orthogonal to the problems we have addressed here. Firework [41] is a computing paradigm for big data processing in collaborative edge environments, which is complementary to our work in terms of shared data view and programming interface.

While there should be more on-going efforts investigating the adaptation, improvement and optimization of existing distributed data processing techniques on edge computing platforms, we focus more on the task/application-level queue management and scheduling, and leave all the underlying resource negotiation and process scheduling to the container cluster engine.

7.2 Computation Offloading
Computation offloading (a.k.a. cyber foraging [28]) has been proposed to improve resource utilization, response time and energy consumption in various computing environments [7, 13, 21, 31]. Hu et al. [17] have quantified the impact of edge computing on mobile applications and found that edge computing can significantly improve response time and energy consumption for mobile devices through offloading via both WiFi and LTE networks. Mocha [34] has investigated how a two-stage face recognition task from a mobile device can be accelerated by cloudlet and cloud. In their design, clients simply capture images and send them to the cloudlet. The optimal task partition can be easily achieved there, as the pipeline has only two stages. In LAVEA, our application is more complicated, with multiple stages, and we leverage client-edge offloading and other techniques to improve resource utilization and optimize the response time.

8 DISCUSSIONS AND LIMITATIONS
In this section, we discuss alternative design options, and point out current limitations and future work that can improve the system.

Measurement-based Offloading. In this paper, we use measurement-based (static) offloading, i.e., the offloading decisions are based on the outcome of periodic measurements. We consider this one of the limitations of our implementation, as stated in [15], and several dynamic computation offloading schemes have been proposed [12]. We plan to improve the measurement-based offloading in future work.
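As a rough illustration of the static policy, the decision rule can be sketched as follows. This is a minimal sketch only: the function names, the EWMA smoothing of bandwidth probes, and all concrete parameters are our own assumptions, not LAVEA's actual implementation.

```python
def should_offload(input_bytes, local_time_s, edge_time_s, bandwidth_bps):
    """Static rule: offload iff the measured transfer time plus remote
    execution time beats local execution time. All inputs come from
    periodic measurements, not from runtime adaptation."""
    transfer_s = input_bytes * 8 / bandwidth_bps
    return transfer_s + edge_time_s < local_time_s


class BandwidthProbe:
    """Exponentially weighted moving average over periodic bandwidth
    probe samples; the smoothing factor alpha is an assumed value."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha
        self.estimate_bps = None

    def update(self, sample_bps):
        # First sample seeds the estimate; later samples are blended in.
        if self.estimate_bps is None:
            self.estimate_bps = float(sample_bps)
        else:
            self.estimate_bps = ((1 - self.alpha) * self.estimate_bps
                                 + self.alpha * sample_bps)
        return self.estimate_bps
```

For example, a 1 MB frame over a 100 Mbps link adds 0.08 s of transfer time, so offloading pays off only when the remote speedup exceeds that overhead; over a 1 Mbps link the 8 s transfer dominates and the task stays local.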

Video Streaming. Our current data processing is image-based, which is one of the limitations of our implementation. The input is either in the format of an image, or a video stream which is read into frames and sent out. We believe that utilizing existing video streaming techniques between our system components for data sharing will further improve system performance and open more opportunities for optimization.

Discovering Edge Nodes. There are different ways for the edge-front node to discover the available edge nodes nearby. For example, every edge node intending to serve as a collaborator may open a designated port so that the edge-front node can periodically scan the network and discover the available edge nodes; this is the "pull-based" method. In contrast, in the "push-based" method, the edge-front node opens a designated port and every edge node intending to serve as a collaborator registers with the edge-front node. When the network is at a large scale, the pull-based method usually performs poorly, because the edge-front node may not be able to discover an available edge node in a short time. For this reason, our edge node discovery is implemented in the push-based manner, which guarantees good performance regardless of the network scale.
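The push-based registration described above can be sketched with plain sockets as follows. The port number, message format and function names are our own assumptions for illustration; the sketch only shows the direction of the exchange: the collaborator initiates, the edge-front node listens.

```python
import json
import socket

REGISTRY_PORT = 9099  # hypothetical designated port on the edge-front node


def serve_one_registration(host="127.0.0.1", port=REGISTRY_PORT, ready=None):
    """Edge-front node side: accept one registration pushed by a
    collaborator edge node, record its endpoint and acknowledge."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    if ready is not None:
        ready.set()  # signal that the registry is accepting connections
    conn, addr = srv.accept()
    record = json.loads(conn.recv(4096).decode())
    record["addr"] = addr[0]  # remember where the collaborator pushed from
    conn.sendall(b"ACK")
    conn.close()
    srv.close()
    return record


def register_with_front(host="127.0.0.1", port=REGISTRY_PORT,
                        node_id="edge-1", service_port=7000):
    """Collaborator side: push this node's id and service port to the
    edge-front node and wait for the acknowledgement."""
    with socket.create_connection((host, port)) as s:
        s.sendall(json.dumps({"id": node_id, "port": service_port}).encode())
        return s.recv(16).decode()
```

Because each collaborator announces itself exactly once, the edge-front node's discovery cost stays constant per node, which is why this direction scales better than periodic network scans.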

9 CONCLUSION
In this paper, we have investigated providing video analytics services to latency-sensitive applications in an edge computing environment.


As a result, we have built LAVEA, a low-latency video edge analytics system, which collaborates nearby client, edge and remote cloud nodes, and transforms video feeds into semantic information at places closer to the users in early stages. We have utilized an edge-front design, and formulated an optimization problem for offloading task selection and prioritized the task queue to minimize the response time. Our results indicate that, by offloading tasks to the closest edge node, the client-edge configuration achieves a speedup ranging from 1.3x to 4x (1.2x to 1.7x) against running in local (client-cloud) mode under various network conditions and workloads. In case of a saturating workload on the front edge node, we have proposed and compared various task placement schemes that are tailored for inter-edge collaboration. The proposed prediction-based shortest scheduling latency first task placement scheme considers both the transmission time of the task and the waiting time in the queue, and delivers better overall performance than the other schemes.

REFERENCES
[1] Amazon Web Services. 2017. AWS Lambda@Edge. http://docs.aws.amazon.com/lambda/latest/dg/lambda-edge.html. (2017).
[2] Christos-Nikolaos E. Anagnostopoulos, Ioannis E. Anagnostopoulos, Ioannis D. Psoroulas, Vassili Loumos, and Eleftherios Kayafas. 2008. License plate recognition from still images and video sequences: A survey. IEEE Transactions on Intelligent Transportation Systems 9, 3 (2008), 377–391.
[3] K. R. Baker. 1990. Scheduling groups of jobs in the two-machine flow shop. Mathematical and Computer Modelling 13, 3 (1990), 29–36.
[4] Flavio Bonomi, Rodolfo Milito, Jiang Zhu, and Sateesh Addepalli. 2012. Fog computing and its role in the internet of things. In Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing. ACM, 13–16.
[5] Peter Brucker, Bernd Jurisch, and Bernd Sievers. 1994. A branch and bound algorithm for the job-shop scheduling problem. Discrete Applied Mathematics 49, 1 (1994), 107–127.
[6] Yu Cao, Songqing Chen, Peng Hou, and Donald Brown. 2015. FAST: A fog computing assisted distributed analytics system to monitor fall for stroke mitigation. In Networking, Architecture and Storage (NAS), 2015 IEEE International Conference on. IEEE, 2–11.
[7] Eduardo Cuervo, Aruna Balasubramanian, Dae-ki Cho, Alec Wolman, Stefan Saroiu, Ranveer Chandra, and Paramvir Bahl. 2010. MAUI: Making Smartphones Last Longer with Code Offload. In Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services (MobiSys '10). ACM, New York, NY, USA, 49–62. DOI: https://doi.org/10.1145/1814433.1814441
[8] Eyal de Lara, Carolina S. Gomes, Steve Langridge, S. Hossein Mortazavi, and Meysam Roodi. 2016. Hierarchical Serverless Computing for the Mobile Edge. In Edge Computing (SEC), IEEE/ACM Symposium on. IEEE, 109–110.
[9] Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (2008), 107–113.
[10] Shan Du, Mahmoud Ibrahim, Mohamed Shehata, and Wael Badawy. 2013. Automatic license plate recognition (ALPR): A state-of-the-art review. IEEE Transactions on Circuits and Systems for Video Technology 23, 2 (2013), 311–325.
[11] Sadjad Fouladi, Riad S. Wahby, Brennan Shacklett, Karthikeyan Vasuki Balasubramaniam, William Zeng, Rahul Bhalerao, Anirudh Sivaraman, George Porter, and Keith Winstein. 2017. Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 363–376. https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/fouladi
[12] Wei Gao, Yong Li, Haoyang Lu, Ting Wang, and Cong Liu. 2014. On exploiting dynamic execution patterns for workload offloading in mobile cloud applications. In Network Protocols (ICNP), 2014 IEEE 22nd International Conference on. IEEE, 1–12.
[13] Mark S. Gordon, Davoud Anoushe Jamshidi, Scott A. Mahlke, Zhuoqing Morley Mao, and Xu Chen. 2012. COMET: Code Offload by Migrating Execution Transparently. In OSDI, Vol. 12. 93–106.
[14] Zijiang Hao, Ed Novak, Shanhe Yi, and Qun Li. 2017. Challenges and Software Architecture for Fog Computing. Internet Computing (2017).
[15] Mohammed A. Hassan, Kshitiz Bhattarai, Qi Wei, and Songqing Chen. 2014. POMAC: Properly Offloading Mobile Applications to Clouds. In 6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 14).
[16] Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy H. Katz, Scott Shenker, and Ion Stoica. 2011. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. In NSDI, Vol. 11. 22–22.
[17] Wenlu Hu, Ying Gao, Kiryong Ha, Junjue Wang, Brandon Amos, Zhuo Chen, Padmanabhan Pillai, and Mahadev Satyanarayanan. 2016. Quantifying the Impact of Edge Computing on Mobile Applications. In Proceedings of the 7th ACM SIGOPS Asia-Pacific Workshop on Systems. ACM, 5.
[18] IBM. 2017. Apache OpenWhisk. http://openwhisk.org. (April 2017).
[19] Selmer Martin Johnson. 1954. Optimal two- and three-stage production schedules with setup times included. Naval Research Logistics Quarterly 1, 1 (1954), 61–68.
[20] Eric Jonas, Shivaram Venkataraman, Ion Stoica, and Benjamin Recht. 2017. Occupy the Cloud: Distributed computing for the 99%. arXiv preprint arXiv:1702.04024 (2017).
[21] Ryan Newton, Sivan Toledo, Lewis Girod, Hari Balakrishnan, and Samuel Madden. 2009. Wishbone: Profile-based Partitioning for Sensornet Applications. In NSDI, Vol. 9. 395–408.
[22] OpenALPR. 2017. OpenALPR – Automatic License Plate Recognition. http://www.openalpr.com. (April 2017).
[23] Kay Ousterhout, Patrick Wendell, Matei Zaharia, and Ion Stoica. 2013. Sparrow: distributed, low latency scheduling. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 69–84.
[24] M. Patel, B. Naughton, C. Chan, N. Sprecher, S. Abeta, A. Neal, and others. 2014. Mobile-edge computing introductory technical white paper. White Paper, Mobile-edge Computing (MEC) industry initiative (2014).
[25] Brad Philip and Paul Updike. 2001. Caltech Vision Group 2001 testing database. http://www.vision.caltech.edu/html-files/archive.html. (2001).
[26] Kari Pulli, Anatoly Baksheev, Kirill Kornyakov, and Victor Eruhimov. 2012. Real-time computer vision with OpenCV. Commun. ACM 55, 6 (2012), 61–69.
[27] Jeff Rasley, Konstantinos Karanasos, Srikanth Kandula, Rodrigo Fonseca, Milan Vojnovic, and Sriram Rao. 2016. Efficient queue management for cluster scheduling. In Proceedings of the Eleventh European Conference on Computer Systems. ACM, 36.
[28] Mahadev Satyanarayanan. 2001. Pervasive computing: Vision and challenges. IEEE Personal Communications 8, 4 (2001), 10–17.
[29] Mahadev Satyanarayanan. 2017. The Emergence of Edge Computing. Computer 50, 1 (2017), 30–39.
[30] Mahadev Satyanarayanan, Pieter Simoens, Yu Xiao, Padmanabhan Pillai, Zhuo Chen, Kiryong Ha, Wenlu Hu, and Brandon Amos. 2015. Edge analytics in the internet of things. IEEE Pervasive Computing 14, 2 (2015), 24–31.
[31] Cong Shi, Karim Habak, Pranesh Pandurangan, Mostafa Ammar, Mayur Naik, and Ellen Zegura. 2014. COSMOS: computation offloading as a service for mobile devices. In Proceedings of the 15th ACM International Symposium on Mobile Ad Hoc Networking and Computing. ACM, 287–296.
[32] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu. 2016. Edge Computing: Vision and Challenges. IEEE Internet of Things Journal 3, 5 (Oct 2016), 637–646. DOI: https://doi.org/10.1109/JIOT.2016.2579198
[33] Weisong Shi and Schahram Dustdar. 2016. The Promise of Edge Computing. Computer 49, 5 (2016), 78–81.
[34] Tolga Soyata, Rajani Muraleedharan, Colin Funai, Minseok Kwon, and Wendi Heinzelman. 2012. Cloud-Vision: Real-time face recognition using a mobile-cloudlet-cloud acceleration architecture. In Computers and Communications (ISCC), 2012 IEEE Symposium on. IEEE, 000059–000066.
[35] Xiaoli Wang, Aakanksha Chowdhery, and Mung Chiang. 2016. SkyEyes: adaptive video streaming from UAVs. In Proceedings of the 3rd Workshop on Hot Topics in Wireless. ACM, 2–6.
[36] Shanhe Yi, Zijiang Hao, Zhengrui Qin, and Qun Li. 2015. Fog Computing: Platform and Applications. In Hot Topics in Web Systems and Technologies (HotWeb), 2015 Third IEEE Workshop on. IEEE, 73–78.
[37] Shanhe Yi, Cheng Li, and Qun Li. 2015. A Survey of Fog Computing: Concepts, Applications and Issues. In Proceedings of the 2015 Workshop on Mobile Big Data (Mobidata '15). ACM, 37–42.
[38] Shanhe Yi, Zhengrui Qin, and Qun Li. 2015. Security and privacy issues of fog computing: A survey. In Wireless Algorithms, Systems, and Applications. Springer, 685–695.
[39] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: cluster computing with working sets. HotCloud 10 (2010), 10–10.
[40] Haoyu Zhang, Ganesh Ananthanarayanan, Peter Bodik, Matthai Philipose, Paramvir Bahl, and Michael J. Freedman. 2017. Live Video Analytics at Scale with Approximation and Delay-Tolerance. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 377–392. https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/zhang
[41] Q. Zhang, X. Zhang, Q. Zhang, W. Shi, and H. Zhong. 2016. Firework: Big Data Sharing and Processing in Collaborative Edge Environment. In 2016 Fourth IEEE Workshop on Hot Topics in Web Systems and Technologies (HotWeb). 20–25. DOI: https://doi.org/10.1109/HotWeb.2016.12
[42] Tan Zhang, Aakanksha Chowdhery, Paramvir Victor Bahl, Kyle Jamieson, and Suman Banerjee. 2015. The design and implementation of a wireless video surveillance system. In Proceedings of the 21st Annual International Conference on Mobile Computing and Networking. ACM, 426–438.

  • Abstract
  • 1 Introduction
  • 2 Background and Motivation
    • 21 Edge Computing Network
    • 22 Serverless Architecture
    • 23 Video Edge Analytics for Public Safety
      • 3 LAVEA System Design
        • 31 Design Goals
        • 32 System Overview
        • 33 Edge Computing Services
          • 4 Edge-front Offloading
            • 41 Task Offloading System Model and Problem Formulation
            • 42 Optimization Solver
            • 43 Prioritizing Edge Task Queue
            • 44 Workload Optimizer
              • 5 Inter-edge Collaboration
                • 51 Motivation and Challenges
                • 52 Inter-Edge Task Placement Schemes
                  • 6 System Implementation and Performance Evaluation
                    • 61 Implementation Details
                    • 62 Evaluation Setup
                    • 63 Task Profiler
                    • 64 Offloading Task Selection
                    • 65 Edge-front Task Queue Prioritizing
                    • 66 Inter-Edge Collaboration
                      • 7 Related Work
                        • 71 Distributed Data Processing
                        • 72 Computation Offloading
                          • 8 Discussions and Limitations
                          • 9 Conclusion
                          • References
Page 13: LAVEA: Latency-aware Video Analytics on Edge Computing Platformweisongshi.org/papers/yi17-LAVEA.pdf · 2020. 8. 19. · LAVEA: Latency-aware Video Analytics on Edge Computing Platform

LAVEA Latency-aware Video Analytics on Edge Computing Platform SEC rsquo17 October 12ndash14 2017 San Jose Silicon Valley CA USA

As a result we have built LAVEA a low-latency video edge analyticsystem which collaborates nearby client edge and remote cloudnodes and transfers video feeds into semantic information at placescloser to the users in early stages We have utilized an edge-frontdesign and formulated an optimization problem for ooading taskselection and prioritized task queue to minimize the response timeOur result indicated that by ooading tasks to the closest edgenode the client-edge conguration has speedup range from 13x to4x (12x to 17x) against running in local (client-cloud) under variousnetwork conditions and workloads In case of a saturating workloadon the front edge node we have proposed and compared varioustask placement schemes that are tailed for inter-edge collaboratione proposed prediction-based shortest scheduling latency rsttask placement scheme considers both the transmission time ofthe task and the waiting time in the queue and outputs overallperformance beer than the other schemes

REFERENCES[1] Amazon Web Service 2017 AWS LambdaEdge hpdocsawsamazoncom

lambdalatestdglambda-edgehtml (2017)[2] Christos-Nikolaos E Anagnostopoulos Ioannis E Anagnostopoulos Ioannis D

Psoroulas Vassili Loumos and Eleherios Kayafas 2008 License plate recog-nition from still images and video sequences A survey IEEE Transactions onintelligent transportation systems 9 3 (2008) 377ndash391

[3] KR Baker 1990 Scheduling groups of jobs in the two-machine ow shop Math-ematical and Computer Modelling 13 3 (1990) 29ndash36

[4] Flavio Bonomi Rodolfo Milito Jiang Zhu and Sateesh Addepalli 2012 Fogcomputing and its role in the internet of things In Proceedings of the rst editionof the MCC workshop on Mobile cloud computing ACM 13ndash16

[5] Peter Brucker Bernd Jurisch and Bernd Sievers 1994 A branch and boundalgorithm for the job-shop scheduling problem Discrete applied mathematics 491 (1994) 107ndash127

[6] Yu Cao Songqing Chen Peng Hou and Donald Brown 2015 FAST A fog com-puting assisted distributed analytics system to monitor fall for stroke mitigationIn Networking Architecture and Storage (NAS) 2015 IEEE International Conferenceon IEEE 2ndash11

[7] Eduardo Cuervo Aruna Balasubramanian Dae-ki Cho Alec Wolman StefanSaroiu Ranveer Chandra and Paramvir Bahl 2010 MAUI Making SmartphonesLast Longer with Code Ooad In Proceedings of the 8th International Conferenceon Mobile Systems Applications and Services (MobiSys rsquo10) ACM New York NYUSA 49ndash62 DOIhpsdoiorg10114518144331814441

[8] Eyal de Lara Carolina S Gomes Steve Langridge S Hossein Mortazavi andMeysam Roodi 2016 Hierarchical Serverless Computing for the Mobile EdgeIn Edge Computing (SEC) IEEEACM Symposium on IEEE 109ndash110

[9] Jerey Dean and Sanjay Ghemawat 2008 MapReduce simplied data processingon large clusters Commun ACM 51 1 (2008) 107ndash113

[10] Shan Du Mahmoud Ibrahim Mohamed Shehata and Wael Badawy 2013 Auto-matic license plate recognition (ALPR) A state-of-the-art review IEEE Transac-tions on circuits and systems for video technology 23 2 (2013) 311ndash325

[11] Sadjad Fouladi Riad S Wahby Brennan Shackle Karthikeyan Vasuki Balasubra-maniam William Zeng Rahul Bhalerao Anirudh Sivaraman George Porter andKeith Winstein 2017 Encoding Fast and Slow Low-Latency Video ProcessingUsing ousands of Tiny reads In 14th USENIX Symposium on NetworkedSystems Design and Implementation (NSDI 17) USENIX Association BostonMA 363ndash376 hpswwwusenixorgconferencensdi17technical-sessionspresentationfouladi

[12] Wei Gao Yong Li Haoyang Lu Ting Wang and Cong Liu 2014 On exploitingdynamic execution paerns for workload ooading in mobile cloud applicationsIn Network Protocols (ICNP) 2014 IEEE 22nd International Conference on IEEE1ndash12

[13] Mark S Gordon Davoud Anoushe Jamshidi Sco A Mahlke Zhuoqing Mor-ley Mao and Xu Chen 2012 COMET Code Ooad by Migrating ExecutionTransparently In OSDI Vol 12 93ndash106

[14] Zijiang Hao Ed Novak Shanhe Yi andn Li 2017 Challenges and SowareArchitecture for Fog Computing Internet Computing (2017)

[15] Mohammed A Hassan Kshitiz Bhaarai Qi Wei and Songqing Chen 2014POMAC Properly Ooading Mobile Applications to Clouds In 6th USENIXWorkshop on Hot Topics in Cloud Computing (HotCloud 14)

[16] Benjamin Hindman Andy Konwinski Matei Zaharia Ali Ghodsi Anthony DJoseph Randy H Katz Sco Shenker and Ion Stoica 2011 Mesos A Platformfor Fine-Grained Resource Sharing in the Data Center In NSDI Vol 11 22ndash22

[17] Wenlu Hu, Ying Gao, Kiryong Ha, Junjue Wang, Brandon Amos, Zhuo Chen, Padmanabhan Pillai, and Mahadev Satyanarayanan. 2016. Quantifying the Impact of Edge Computing on Mobile Applications. In Proceedings of the 7th ACM SIGOPS Asia-Pacific Workshop on Systems. ACM, 5.

[18] IBM. 2017. Apache OpenWhisk. http://openwhisk.org (April 2017).

[19] Selmer Martin Johnson. 1954. Optimal two- and three-stage production schedules with setup times included. Naval Research Logistics Quarterly 1, 1 (1954), 61–68.

[20] Eric Jonas, Shivaram Venkataraman, Ion Stoica, and Benjamin Recht. 2017. Occupy the Cloud: Distributed computing for the 99%. arXiv preprint arXiv:1702.04024 (2017).

[21] Ryan Newton, Sivan Toledo, Lewis Girod, Hari Balakrishnan, and Samuel Madden. 2009. Wishbone: Profile-based Partitioning for Sensornet Applications. In NSDI, Vol. 9. 395–408.

[22] OpenALPR. 2017. OpenALPR – Automatic License Plate Recognition. http://www.openalpr.com (April 2017).

[23] Kay Ousterhout, Patrick Wendell, Matei Zaharia, and Ion Stoica. 2013. Sparrow: distributed, low latency scheduling. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 69–84.

[24] M. Patel, B. Naughton, C. Chan, N. Sprecher, S. Abeta, A. Neal, and others. 2014. Mobile-edge computing introductory technical white paper. White Paper, Mobile-edge Computing (MEC) industry initiative (2014).

[25] Brad Philip and Paul Updike. 2001. Caltech Vision Group 2001 testing database. http://www.vision.caltech.edu/html-files/archive.html (2001).

[26] Kari Pulli, Anatoly Baksheev, Kirill Kornyakov, and Victor Eruhimov. 2012. Real-time computer vision with OpenCV. Commun. ACM 55, 6 (2012), 61–69.

[27] Jeff Rasley, Konstantinos Karanasos, Srikanth Kandula, Rodrigo Fonseca, Milan Vojnovic, and Sriram Rao. 2016. Efficient queue management for cluster scheduling. In Proceedings of the Eleventh European Conference on Computer Systems. ACM, 36.

[28] Mahadev Satyanarayanan. 2001. Pervasive computing: Vision and challenges. IEEE Personal Communications 8, 4 (2001), 10–17.

[29] Mahadev Satyanarayanan. 2017. The Emergence of Edge Computing. Computer 50, 1 (2017), 30–39.

[30] Mahadev Satyanarayanan, Pieter Simoens, Yu Xiao, Padmanabhan Pillai, Zhuo Chen, Kiryong Ha, Wenlu Hu, and Brandon Amos. 2015. Edge analytics in the internet of things. IEEE Pervasive Computing 14, 2 (2015), 24–31.

[31] Cong Shi, Karim Habak, Pranesh Pandurangan, Mostafa Ammar, Mayur Naik, and Ellen Zegura. 2014. COSMOS: computation offloading as a service for mobile devices. In Proceedings of the 15th ACM International Symposium on Mobile Ad Hoc Networking and Computing. ACM, 287–296.

[32] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu. 2016. Edge Computing: Vision and Challenges. IEEE Internet of Things Journal 3, 5 (Oct 2016), 637–646. DOI: https://doi.org/10.1109/JIOT.2016.2579198

[33] Weisong Shi and Schahram Dustdar. 2016. The Promise of Edge Computing. Computer 49, 5 (2016), 78–81.

[34] Tolga Soyata, Rajani Muraleedharan, Colin Funai, Minseok Kwon, and Wendi Heinzelman. 2012. Cloud-Vision: Real-time face recognition using a mobile-cloudlet-cloud acceleration architecture. In Computers and Communications (ISCC), 2012 IEEE Symposium on. IEEE, 000059–000066.

[35] Xiaoli Wang, Aakanksha Chowdhery, and Mung Chiang. 2016. SkyEyes: adaptive video streaming from UAVs. In Proceedings of the 3rd Workshop on Hot Topics in Wireless. ACM, 2–6.

[36] Shanhe Yi, Zijiang Hao, Zhengrui Qin, and Qun Li. 2015. Fog Computing: Platform and Applications. In Hot Topics in Web Systems and Technologies (HotWeb), 2015 Third IEEE Workshop on. IEEE, 73–78.

[37] Shanhe Yi, Cheng Li, and Qun Li. 2015. A Survey of Fog Computing: Concepts, Applications and Issues. In Proceedings of the 2015 Workshop on Mobile Big Data (Mobidata '15). ACM, 37–42.

[38] Shanhe Yi, Zhengrui Qin, and Qun Li. 2015. Security and privacy issues of fog computing: A survey. In Wireless Algorithms, Systems, and Applications. Springer, 685–695.

[39] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: cluster computing with working sets. HotCloud 10 (2010), 10–10.

[40] Haoyu Zhang, Ganesh Ananthanarayanan, Peter Bodik, Matthai Philipose, Paramvir Bahl, and Michael J. Freedman. 2017. Live Video Analytics at Scale with Approximation and Delay-Tolerance. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 377–392. https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/zhang

[41] Q. Zhang, X. Zhang, Q. Zhang, W. Shi, and H. Zhong. 2016. Firework: Big Data Sharing and Processing in Collaborative Edge Environment. In 2016 Fourth IEEE Workshop on Hot Topics in Web Systems and Technologies (HotWeb). 20–25. DOI: https://doi.org/10.1109/HotWeb.2016.12

[42] Tan Zhang, Aakanksha Chowdhery, Paramvir Victor Bahl, Kyle Jamieson, and Suman Banerjee. 2015. The design and implementation of a wireless video surveillance system. In Proceedings of the 21st Annual International Conference on Mobile Computing and Networking. ACM, 426–438.
