
1262 IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 62, NO. 2, FEBRUARY 2015

A Hardware Platform for Evaluating Low-Energy Multiprocessor Embedded Systems Based on COTS Devices

Mohammad Salehi and Alireza Ejlali

Abstract—Embedded systems are usually energy constrained. Moreover, in these systems, increased productivity and reduced time to market are essential for product success. To design complex embedded systems while reducing development time and cost, there is a strong tendency to use commercial off-the-shelf (COTS) devices. At the system level, dynamic voltage and frequency scaling (DVFS) is one of the most effective techniques for energy reduction. Nonetheless, many widely used COTS processors either do not have DVFS or apply DVFS only to processor cores. In this paper, an easy-to-implement COTS-based evaluation platform for low-energy embedded systems is presented. To achieve energy savings, DVFS is provided for the whole microcontroller (including the core, phase-locked loop, memory, and I/O). In addition, facilities are provided for experimenting with fault-tolerance techniques. The platform is equipped with energy measurement and debugging equipment. Physical experiments show that applying DVFS to the whole microcontroller provides up to 47% and 12% energy savings compared with the sole use of dynamic power management and with applying DVFS only to the core, respectively. Although the platform is designed for ARM-based embedded systems, our approach is general and can be applied to other types of systems.

Index Terms—Embedded systems, energy management, hardware platform.

I. INTRODUCTION

Embedded systems are ubiquitous, and demand for these systems is growing steadily. A wide range of embedded systems are battery operated. Because many of these systems cannot have their batteries frequently recharged or replaced, they are highly energy constrained [1]–[3]. Therefore, low energy consumption has become one of the major design objectives for these systems. Examples include mobile robots and handheld devices such as personal digital assistants, cell phones, and portable medical care devices. Furthermore, the complexity of embedded systems is increasing as the number of parts and the number and types of interactions among them grow [3], [4]. Therefore, embedded system designers are increasingly required to design complex embedded systems with several design objectives.

Manuscript received October 27, 2013; revised March 23, 2014 and June 15, 2014; accepted July 21, 2014. Date of publication August 26, 2014; date of current version January 7, 2015.

The authors are with the Department of Computer Engineering, Sharif University of Technology, Tehran 11365-11155, Iran (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are availableonline at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIE.2014.2352215

To cope with today's highly competitive embedded systems markets and time-to-market pressure, and to deliver correct-the-first-time products with multiple system requirements, the use of commercial off-the-shelf (COTS) devices [3], [5]–[7] is very beneficial in designing embedded systems. Some vendors offer reconfigurable hardware solutions to accelerate the design process and provide a variety of programmable logic device (PLD)-based evaluation kits (e.g., Xilinx [8] and many others). However, rather than focusing on embedded systems, these platforms are intended for functionally testing the SoC or ASIC devices to be produced. Embedded systems usually consist of a microcontroller that contains a microprocessor integrated with memory elements and peripherals in a single chip [4]–[7]. Reference [5] has reported a laboratory activity on a microcontroller-based platform. Reference [25] has presented a prototyping platform for ARM-based embedded systems. However, these platforms do not provide facilities to experiment with energy management techniques. Reference [23] has presented a platform for dynamic voltage and frequency scaling (DVFS) [11] in an ARM-based processor. However, this work exploits DVFS only for the processor (and not for the other parts, e.g., the phase-locked loop (PLL), memory, and I/O).

In this paper, to meet the design requirements of multiobjective embedded systems, we propose a hardware platform for experimenting with energy management techniques (i.e., dynamic power management (DPM) [12] and DVFS) (see Section III) and fault-tolerance techniques (see Section VI).

Compared with previous related works that proposed platforms for embedded systems, our platform:

1) provides DVFS capability for the microcontrollers, including not only the processor cores but also PLL, memory, and I/O; it should be noted that many existing designs either do not have DVFS or apply DVFS only to processor cores [11], [13], [14], [23], whereas our study in this paper (see Section V) shows that applying DVFS to PLL, memory, and I/O is quite effective;

2) includes circuitry to accurately and separately measure the energy/power consumption of different parts of the microcontroller, including the processor core, PLL, memory, and I/O; this makes it possible to determine the most energy-consuming part for a given application;

3) is general and based on an ARM-based COTS microcontroller; hence, it can be used for a wide range of existing microcontrollers (e.g., [13], [14], and [18]–[20]) and many other COTS devices.

0278-0046 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


SALEHI AND EJLALI: HARDWARE PLATFORM TO EVALUATE EMBEDDED SYSTEMS BASED ON COTS DEVICES 1263

Another advantage of the proposed platform is that it is suitable for research into energy management techniques in parallel processing. Since the proposed platform is general and capable of implementing various design techniques, and since it supports parallel processing (it uses two ARM7-based processors and one AVR-based processor that can operate in parallel), it can be useful for analyzing many design techniques that exploit parallelism in energy management (e.g., [1], [2], [12], and [22]).

Furthermore, our experiments yielded five new observations that could provide useful information for embedded system designers:

1) The high-to-low voltage scaling delay is greater than the low-to-high delay (by about 45% for the processor core and PLL and about 110% for memory and I/O).

2) Voltage and frequency scaling is very effective in reducing power consumption not only for the processor core but also for the other parts of the microcontroller, including PLL, memory, and I/O.

3) Although PLL, memory, and I/O consume less power than the processor core, their energy consumption is comparable to that of the core.

4) Although the PLL contributes very little to the total power consumption, it is always operational, so its energy consumption is comparable with that of the other parts.

5) Applying DVFS to the whole microcontroller results in considerable energy savings compared with the sole use of DPM or with applying DVFS only to the processor core.

The remainder of this paper is organized as follows. In Section II, the architecture of the proposed hardware platform is described. The proposed energy management units and techniques are presented in Section III. In Section IV, the power measurement, debug, and test units are described. Experimental results are given in Section V. In Section VI, we explain the capability of the proposed platform for experimenting with fault-tolerance techniques. Finally, we conclude this paper and describe future work in Section VII.

II. HARDWARE PLATFORM DESIGN

ARM7TDMI is the most widely used COTS processor in contemporary embedded systems because it is a low-cost, high-performance, and versatile processor [4], [6]. Many vendors (e.g., [9], [13], and [14]) combine the ARM7TDMI (hereafter ARM7) processor with internal memory devices and a wide range of peripherals on a single chip to obtain a microcontroller. It is noteworthy that the computational power of ARM7 is quite sufficient for the majority of embedded applications. For example, ARM7 can easily execute all benchmarks in the MiBench benchmark suite [1], [21]. ARM7 can also execute fairly complex operating systems (e.g., Real-Time Executive for Multiprocessor Systems (RTEMS) [1], [26] and Keil RTX [27]). Nevertheless, for highly computation-intensive applications, the performance of ARM7 might not be adequate. In this case, it should be noted that our proposed platform is not inherently dependent on ARM7. Indeed, any processor (e.g., i.MX27 [18] and PXA270 [20]) whose operating frequency can be changed and whose supply voltage can vary within an allowed range can be used similarly in our design.

Fig. 1. ARM7-based microcontroller architecture [9].

A. Architecture Overview

Our ARM7-based platform is built around a member of the AT91SAM7x series of microcontrollers [9]. The architecture of this microcontroller series is shown in Fig. 1. The microcontroller is composed of an ARM7 processor core, a system controller, memory elements, and peripheral devices. Most ARM7-based microcontrollers adopt a similar architecture, e.g., [18]–[20]. As shown in Fig. 1, the microcontroller consists of Flash, ROM, and SRAM internal memory devices connected via the memory controller, and a wide range of peripherals, including a universal synchronous/asynchronous receiver–transmitter (USART), serial peripheral interface (SPI), analog-to-digital converter (ADC), universal serial bus (USB), Ethernet medium access control, controller area network (CAN), two-wire interface (TWI), synchronous serial controller (SSC), real-time timer (RTT), and pulsewidth modulation controller (PWMC). Most I/O lines of the peripherals are multiplexed with the parallel I/O (PIO) controller. Each PIO line may be assigned to a peripheral or used as general-purpose I/O. These features give designers flexibility and ensure effective use of the components.

B. Platform Architecture

The architecture and physical implementation of the hardware platform are shown in Fig. 2(a) and (b), respectively. The platform contains two AT91SAM7x256 microcontrollers connected via a bus. Based on the facilities provided by the AT91SAM7x series, this bus can easily be configured as SPI, UART, CAN, or a 16-bit parallel bus. The AT91SAM7x256 contains an ARM7TDMI processor with in-circuit emulation (ICE) and debug communication channel support, 64-KB internal SRAM, and 256-KB internal Flash memory. Two controllable power supplies are included on the board to power the peripherals and the processor core of each microcontroller. The power supplies receive commands from the processors and control the power applied to each part of the microcontrollers (see Section III-B). The use of separate supply voltages not only helps conduct experiments with various DVFS schemes (where different supply voltages can be applied to each processor separately) but also allows one processor to be shut off to switch to a single-processor configuration. Users are also free to choose arbitrary DVFS or DPM schemes. The platform is also equipped with circuitry to measure the current drawn by the processor cores, PLLs, Flash memory devices, and I/O peripherals. The power consumption of each part is obtained from its measured current and its supply voltage (the voltages are set by the controllable power supplies and reported to the measurement unit). In addition, the execution time of the applications run by the processors is reported so that the consumed energy can be computed. The measurement data are sent to the host computer through the data logging port. Two debugging ports [RS232 and Joint Test Action Group (JTAG) ports] provide debug capabilities for each microcontroller. JTAG is also used for ICE (see Section IV-B) and fault injection (see Section VI). After the target system has been designed and evaluated, the platform can be customized for a specific application.

Fig. 2. Hardware platform. (a) Block diagram. (b) Implementation.

III. ENERGY MANAGEMENT UNITS

To manage energy consumption, DVFS [11] and DPM [12] have been used effectively. DVFS varies the components' voltage and, hence, frequency based on the system workload and other run-time factors. DPM selectively turns off system components when they are idle. The AT91SAM7x (like many microcontrollers such as [18]–[20]) supports only DPM (it controls only the processor and peripheral clocks) and cannot exploit DVFS (it does not provide a variable supply voltage to its processor core and peripherals). In the following sections, we first explain how DPM can be employed (as an existing capability of most COTS microcontrollers), and then we introduce a methodology for adding DVFS capability to microcontrollers that are not DVFS enabled.

A. DPM

The AT91SAM7x optimizes power consumption by controlling (enabling/disabling or scaling) the clocks of the processor and peripherals. The block diagram of the power management controller is shown in Fig. 3(b). It uses the clock outputs [see Fig. 3(a)] to supply the processor, USB, and peripheral clocks and the master clock, which is the clock provided to the memory controller and all the peripherals. Table I summarizes the power management techniques that can be used for different parts of the microcontroller. As shown in Fig. 3, the master clock can be generated by scaling one of the clocks provided by the clock generator. A low-frequency clock can be provided to the whole device by selecting the slow clock, or the power consumption of the PLL can be saved by selecting the main clock. The processor power consumption can be reduced by switching off the processor clock when the processor enters idle mode while waiting for an interrupt. The processor clock is automatically re-enabled when the device is reset or when any interrupt occurs. To reduce the power of each peripheral, the user can individually enable and disable its clock (i.e., gate the master clock to that peripheral) through the peripheral clock controller.

Fig. 3. Power management unit. (a) Clock generator. (b) Power management controller [9].

TABLE I. POWER MANAGEMENT TECHNIQUES IN AT91SAM7X

B. DVFS

DPM usually has only two operational states for system components, namely active and idle. The active power consumption of a clock-enabled component, denoted by $P_{\mathrm{Active}}$, is determined by its operating frequency and supply voltage as [1]

$$P_{\mathrm{Active}} = I_{\mathrm{Leakage}}V + C_{\mathrm{eff}}V^{2}f \tag{1}$$

where $I_{\mathrm{Leakage}}V$ is the static leakage power, and $C_{\mathrm{eff}}V^{2}f$ is the dynamic power consumption ($C_{\mathrm{eff}}$ is the effective switched capacitance). The dynamic power consumption can be effectively eliminated by putting the component into the idle state by disabling its clock [12]. With special hardware support and under software control, frequency scaling of system components can be used to exploit idle times for power saving. The active energy consumed by executing a task with $N$ cycles at frequency $f$ can be computed as $P_{\mathrm{Active}}N/f$. As a result, although frequency scaling reduces the dynamic power consumption linearly, it has no effect on the static leakage power consumption. Furthermore, the static energy consumed for a given computation increases, because lowering the clock frequency lengthens the task execution time. Hence, reduced energy consumption cannot be achieved by frequency scaling alone. Frequency scaling can be highly effective when employed in conjunction with voltage scaling [1], [11]. Voltage scaling techniques employ software-controlled adjustable voltage regulators to set the supply voltage of the processor core and clock-enabled components. Software-controlled clock generators and voltage regulators allow the system to use DVFS. The basic idea behind DVFS techniques is to determine the minimum frequency that satisfies all timing constraints and then to set the lowest possible voltage that supports this speed [1], [11].

TABLE II. POWER REQUIREMENTS IN AT91SAM7X

Fig. 4. Power supply setup. (a) Typical power supply. (b) Proposed controllable power supply.
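The claim that frequency scaling alone cannot reduce energy can be checked numerically with (1) and the expression $P_{\mathrm{Active}}N/f$. The Python sketch below is illustrative only: the leakage current, effective capacitance, and task length are hypothetical values, and the (1.95 V, 61 MHz) and (1.65 V, 36 MHz) operating points are borrowed from the processor-core levels reported in Section V.

```python
"""Active energy per (1): E = P_Active * N / f, with P_Active = I_leak*V + C_eff*V^2*f.

All device parameters below are hypothetical illustration values.
"""

I_LEAK = 2e-3     # hypothetical leakage current (A)
C_EFF = 0.3e-9    # hypothetical effective switched capacitance (F)
N = 50e6          # hypothetical task length (cycles)


def active_power(v, f):
    """Active power (W) of a clock-enabled component at voltage v (V) and frequency f (Hz)."""
    return I_LEAK * v + C_EFF * v**2 * f


def active_energy(n_cycles, v, f):
    """Energy (J) to execute n_cycles, i.e., P_Active * N / f."""
    return active_power(v, f) * n_cycles / f


e_full = active_energy(N, 1.95, 61e6)       # highest voltage-frequency level
e_freq_only = active_energy(N, 1.95, 36e6)  # frequency lowered, voltage unchanged
e_dvfs = active_energy(N, 1.65, 36e6)       # frequency and voltage lowered together

for label, e in [("f_max, V_max", e_full), ("frequency only", e_freq_only), ("DVFS", e_dvfs)]:
    print(f"{label:15s} {e * 1e3:7.2f} mJ")
```

Lowering $f$ at a fixed 1.95 V leaves the dynamic term $C_{\mathrm{eff}}V^{2}N$ unchanged and stretches the leakage term, so the total rises; lowering the voltage as well shrinks the dynamic term, which dominates for these made-up parameters.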

According to (1) and assuming a linear relationship between frequency and voltage [1], [11], the combined effect of voltage and frequency scaling reduces the active power consumption in proportion to $V^3$ and the energy consumption in proportion to $V^2$. Therefore, by scaling both the voltage and the frequency, the energy can be significantly reduced. However, this does not come for free, because a tradeoff exists between speed and energy consumption [1].
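The $V^3$ and $V^2$ proportionalities follow directly from (1). As a worked sketch, write the linear voltage–frequency assumption as $f = kV$ for some constant $k$ (an idealization; $k$ is introduced here only for the derivation) and split $P_{\mathrm{Active}}N/f$ into its dynamic and leakage parts:

```latex
\begin{aligned}
P_{\mathrm{dyn}}  &= C_{\mathrm{eff}} V^{2} f = k\,C_{\mathrm{eff}} V^{3},\\
E_{\mathrm{dyn}}  &= P_{\mathrm{dyn}}\,\frac{N}{f} = C_{\mathrm{eff}} V^{2} N,\\
E_{\mathrm{leak}} &= I_{\mathrm{Leakage}} V\,\frac{N}{f} = \frac{I_{\mathrm{Leakage}}\,N}{k}.
\end{aligned}
```

For example, lowering the core voltage from 1.95 to 1.65 V multiplies the dynamic energy per task by $(1.65/1.95)^2 \approx 0.72$, a reduction of roughly 28%, while under this idealization the leakage energy per task stays the same; the $V^2$ law therefore applies to the dynamic part of the energy.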

The AT91SAM7x microcontrollers have six power supply pins and a built-in (fixed-output) voltage regulator, allowing the device to support a 3.3-V single-supply mode. The power specifications of the power supply pins are shown in Table II. Fig. 4(a) shows the schematic of a typical single-power-supply mode, where 3.3-V power is supplied via a dc/dc voltage converter to VFLASH, VIO, and VIN. The input of the built-in voltage regulator is connected to the 3.3-V voltage source (i.e., the VIN pin), and its output (i.e., the VOUT pin) supplies a fixed 1.8-V voltage to the VCORE and VPLL pins. As Table II shows, the supply voltages of the USB transceiver, Flash memory, and I/O lines can range from 3.0 to 3.6 V, and those of the processor core and PLL can range from 1.65 to 1.95 V. This makes it possible to vary the supply voltages rather than use just a single fixed voltage.

Fig. 5. Proposed controllable power supply schematic.

To provide voltage scaling capability for this device, the dc/dc converter and embedded voltage regulator in Fig. 4(a) are replaced with the controllable power supply in Fig. 4(b), which feeds the power pins with variable voltages. As shown in Fig. 4(b), variable supply voltages are provided to the power inputs of the microcontroller, except for the input of the embedded voltage regulator, which remains unconnected to disable the internal regulator. The schematic of the proposed controllable power supply, which provides a dynamically scalable supply voltage, is shown in Fig. 5. In this architecture, an adjustable version of a low-dropout linear voltage regulator (e.g., LM1117) is used. This regulator can provide an output voltage from 1.25 to 13.8 V using only two external resistors (i.e., $R_{\mathrm{ref}}$ and $R_{\mathrm{adj}}$ in Fig. 5). It maintains a 1.25-V reference voltage $V_{\mathrm{REF}}$ between the output pin and the adjust pin. As shown in Fig. 5, this voltage is applied across the resistor $R_{\mathrm{ref}}$ to produce a constant current that flows through the adjustment resistor $R_{\mathrm{adj}}$ and fixes the output voltage $V_{\mathrm{out}}$ at the desired level as

$$V_{\mathrm{out}} = V_{\mathrm{REF}}\left(1 + \frac{R_{\mathrm{adj}}}{R_{\mathrm{ref}}}\right) + I_{\mathrm{adj}}R_{\mathrm{adj}} \tag{2}$$

where $I_{\mathrm{adj}}$ is the adjust-pin current.

Based on (2), to set $V_{\mathrm{out}}$ to a new voltage level, the adjustment resistor $R_{\mathrm{adj}}$ must be changed. To adjust this resistor dynamically, a digital potentiometer (e.g., AD8403) is used; it provides a digitally controlled variable resistor that performs the same adjustment function as a potentiometer or variable resistor. Since we aim to control the voltages of the four power pins of the AT91SAM7x256 [see Fig. 4(a)], a digital potentiometer with four independent variable resistors is used. Each resistor can be set separately by a digital code transferred into the device. The code is loaded into the device via the standard three-wire SPI digital interface. The data bits clocked into the device are decoded to determine the resistor and its value.
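To make the procedure concrete, the Python sketch below inverts (2) to obtain $R_{\mathrm{adj}} = (V_{\mathrm{out}} - V_{\mathrm{REF}})/(V_{\mathrm{REF}}/R_{\mathrm{ref}} + I_{\mathrm{adj}})$, maps that resistance to an 8-bit wiper code, and packs the serial word. It is a sketch under stated assumptions, not the authors' design: $R_{\mathrm{ref}} = 240\ \Omega$, a 1-kΩ potentiometer with 50-Ω wiper resistance, the linear model $R(D) = (D/256)R_{AB} + R_W$, $I_{\mathrm{adj}} = 60\ \mu$A, and a 10-bit word of 2 address bits followed by 8 data bits are all assumed values to be checked against the LM1117 and AD8403 datasheets.

```python
# Sketch: choose a digital-potentiometer code so that the adjustable LDO of
# Fig. 5 produces a target voltage, using (2). Component values are assumptions.

V_REF = 1.25      # reference voltage between output and adjust pin (from the text)
I_ADJ = 60e-6     # assumed adjust-pin current (A); check the regulator datasheet
R_REF = 240.0     # hypothetical fixed resistor R_ref (ohm)
R_AB = 1_000.0    # assumed end-to-end potentiometer resistance (ohm)
R_WIPER = 50.0    # assumed wiper resistance (ohm)


def vout(r_adj):
    """Regulator output voltage per (2)."""
    return V_REF * (1 + r_adj / R_REF) + I_ADJ * r_adj


def r_adj_for(v_target):
    """Invert (2): R_adj = (V_out - V_REF) / (V_REF / R_REF + I_ADJ)."""
    return (v_target - V_REF) / (V_REF / R_REF + I_ADJ)


def wiper_code(r_target):
    """Nearest 8-bit code under the assumed linear model R(D) = D/256 * R_AB + R_W."""
    code = round((r_target - R_WIPER) * 256 / R_AB)
    if not 0 <= code <= 255:
        raise ValueError(f"{r_target:.1f} ohm is outside the potentiometer range")
    return code


def spi_word(channel, code):
    """Assumed 10-bit serial word: 2 address bits (resistor select), then 8 data bits."""
    if not 0 <= channel <= 3:
        raise ValueError("four resistors: channels 0-3")
    return (channel << 8) | code


code = wiper_code(r_adj_for(1.80))
achieved = vout(code / 256 * R_AB + R_WIPER)
print(f"code={code}, word=0x{spi_word(0, code):03X}, Vout={achieved:.3f} V")
```

With these assumed values one code step moves $V_{\mathrm{out}}$ by roughly 20 mV, fine enough for the 50-mV and 100-mV steps used in Section V, and an 8-bit code is consistent with the 256 voltage levels mentioned there.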

In summary, to dynamically scale the supply voltage of a power pin of the microcontroller at run time, the microcontroller loads into the digital potentiometer a digital code indicating the resistor and its desired value; once the adjustment resistor changes, the voltage regulator's output is scaled to the desired voltage. Therefore, with the proposed architecture, the microcontroller can dynamically set the voltages of the peripheral and processor core power pins at run time. More generally, the proposed technique can be used to provide scalable voltages for COTS devices whose supply voltage can vary within a range.

Fig. 6. Executing two tasks. (a) On a single-processor system. (b) On a dual-processor system.

C. Opportunities Offered by Parallel Processing

Since the proposed platform has multiple processing units (i.e., two ARM7-based microcontrollers and one AVR-based microcontroller) and facilities for energy/power management (i.e., DVFS and DPM), it can be used to investigate the energy management opportunities that parallel processing may offer. To give insight into this issue, we provide an example illustrating that, when DVFS is used to execute parallel tasks, a two-processor system consumes less energy than a single-processor system. Suppose that the slack time available to execute two tasks $T_1$ and $T_2$ (with $N_1$ and $N_2$ CPU cycles) is $S$. Fig. 6 shows how the tasks are executed on a single processor [see Fig. 6(a)] and on two processors [see Fig. 6(b)]. In Fig. 6, $N_1/f_{\max}$ and $N_2/f_{\max}$ are the execution times of $T_1$ and $T_2$, respectively, at the maximum frequency $f_{\max}$. For the single-processor system [see Fig. 6(a)], the minimum possible frequency, which stretches the two tasks as long as possible and gives the minimum energy consumption, can be calculated as

$$f_{\mathrm{SP}} = \frac{N_1/f_{\max} + N_2/f_{\max}}{S} \tag{3}$$

Similarly, for the dual-processor system [see Fig. 6(b)], the minimum possible frequencies to execute $T_1$ and $T_2$ (i.e., $f_{\mathrm{DP},1}$ and $f_{\mathrm{DP},2}$, respectively) that give the minimum energy consumption can be calculated as

$$f_{\mathrm{DP},1} = \frac{N_1/f_{\max}}{S}, \qquad f_{\mathrm{DP},2} = \frac{N_2/f_{\max}}{S} \tag{4}$$

Using (1), which gives the active power consumption $P_{\mathrm{Active}}$, and considering that the energy consumed by executing a task with $N$ cycles at frequency $f$ is $P_{\mathrm{Active}}N/f$, the minimum energy consumption of the single-processor system [see Fig. 6(a)] can be written as follows ($V_{\mathrm{SP}}$ is the minimum voltage that allows $f_{\mathrm{SP}}$):

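$$E_{\mathrm{SP}} = \left(I_{\mathrm{Leakage}}\frac{V_{\mathrm{SP}}}{f_{\mathrm{SP}}} + C_{\mathrm{eff}}V_{\mathrm{SP}}^{2}\right)N_1 + \left(I_{\mathrm{Leakage}}\frac{V_{\mathrm{SP}}}{f_{\mathrm{SP}}} + C_{\mathrm{eff}}V_{\mathrm{SP}}^{2}\right)N_2 \tag{5}$$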

Fig. 7. Power measurement setup.

Fig. 8. Debug and test schematic.

Similarly, the minimum energy consumption of the dual-processor system can be written as follows ($V_{\mathrm{DP},1}$ and $V_{\mathrm{DP},2}$ are the minimum voltages that allow $f_{\mathrm{DP},1}$ and $f_{\mathrm{DP},2}$, respectively):

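$$E_{\mathrm{DP}} = \left(I_{\mathrm{Leakage}}\frac{V_{\mathrm{DP},1}}{f_{\mathrm{DP},1}} + C_{\mathrm{eff}}V_{\mathrm{DP},1}^{2}\right)N_1 + \left(I_{\mathrm{Leakage}}\frac{V_{\mathrm{DP},2}}{f_{\mathrm{DP},2}} + C_{\mathrm{eff}}V_{\mathrm{DP},2}^{2}\right)N_2 \tag{6}$$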

From (3) and (4), it follows that $f_{\mathrm{DP},1} < f_{\mathrm{SP}}$ and $f_{\mathrm{DP},2} < f_{\mathrm{SP}}$ (for $N_1, N_2 \neq 0$). Therefore, the minimum voltages used in the dual-processor system can be lower than the minimum voltage used in the single-processor system, i.e., $V_{\mathrm{DP},1} < V_{\mathrm{SP}}$ and $V_{\mathrm{DP},2} < V_{\mathrm{SP}}$. In addition, assuming an almost linear relationship between voltage and frequency [1], [3], [11], we can write $V_{\mathrm{SP}}/f_{\mathrm{SP}} \approx V_{\mathrm{DP},1}/f_{\mathrm{DP},1} \approx V_{\mathrm{DP},2}/f_{\mathrm{DP},2}$, so the leakage terms in (5) and (6) are approximately equal, whereas each dynamic term in (6) is smaller than its counterpart in (5). Therefore, from (5) and (6), it can be concluded that $E_{\mathrm{DP}} < E_{\mathrm{SP}}$. This means that when DVFS is used to execute parallel tasks, a dual-processor system can provide more energy savings than a single-processor system.
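The conclusion $E_{\mathrm{DP}} < E_{\mathrm{SP}}$ can also be checked numerically. The sketch below evaluates the per-task terms of (5) and (6) under the same idealized proportional voltage–frequency model, anchored (as an assumption) at the 1.95-V/61-MHz level. It uses absolute frequencies ($f = N/S$, i.e., $f_{\max}$ times the ratios in (3) and (4)) and hypothetical leakage and capacitance values, and, like the derivation above, it is an idealized check that ignores the supply-voltage limits of Table II.

```python
# Idealized numerical check of (3)-(6). Assumptions (not from the paper):
# proportional V-f model anchored at 1.95 V / 61 MHz, hypothetical I_LEAK and C_EFF.
F_MAX, V_MAX = 61e6, 1.95
I_LEAK, C_EFF = 2e-3, 0.3e-9  # hypothetical leakage current (A) and capacitance (F)


def voltage_for(f):
    """Minimum voltage for frequency f under the model V = V_MAX * f / F_MAX."""
    return V_MAX * f / F_MAX


def task_energy(n_cycles, f):
    """Per-task term of (5) and (6): (I_Leakage * V / f + C_eff * V^2) * N."""
    if not 0 < f <= F_MAX:
        raise ValueError("frequency out of range")
    v = voltage_for(f)
    return (I_LEAK * v / f + C_EFF * v**2) * n_cycles


def e_single(n1, n2, slack):
    f_sp = (n1 + n2) / slack  # both tasks share one processor within slack S
    return task_energy(n1, f_sp) + task_energy(n2, f_sp)


def e_dual(n1, n2, slack):
    # each task runs on its own processor and may use the full slack S
    return task_energy(n1, n1 / slack) + task_energy(n2, n2 / slack)


n1, n2, slack = 30e6, 20e6, 1.0
print(f"E_SP = {e_single(n1, n2, slack) * 1e3:.1f} mJ, "
      f"E_DP = {e_dual(n1, n2, slack) * 1e3:.1f} mJ")
```

Because $V/f$ is constant in this model, the leakage energy is identical in both cases and the entire difference comes from the $C_{\mathrm{eff}}V^2N$ terms, mirroring the argument in the text.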

IV. POWER MEASUREMENT, DEBUG, AND TEST UNITS

A. Power Measurement Unit

To equip the platform for power measurement, a resistor is placed between each microcontroller power pin and the power supply line, and the voltage drop across the resistor is measured. This value gives the current drawn by the power pin. The power measurement setup is shown in Fig. 7. Because the current drawn by each power pin of the microcontroller is less than 100 mA, the resulting voltage drop is too small to be digitized directly by the ADC of the microcontrollers, so it is amplified using an operational amplifier. The amplified value is digitized by a 10-bit ADC, and the data are sent to the host computer.
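The conversion from an ADC sample to current, power, and energy can be sketched as below. The shunt resistance, amplifier gain, ADC reference voltage, and sampling period are hypothetical; only the 10-bit resolution, the sub-100-mA current range, and the use of the reported supply voltage come from the text. With these assumptions the full-scale current is 3.3 V / 20 / 1 Ω = 165 mA, which covers the stated range.

```python
# Hypothetical measurement-chain parameters (only the 10-bit ADC is from the text).
R_SHUNT = 1.0     # shunt resistor between supply line and power pin (ohm)
GAIN = 20.0       # op-amp voltage gain
V_ADC_REF = 3.3   # ADC full-scale reference (V)
ADC_MAX = 1023    # largest 10-bit code


def current_from_code(code):
    """Current (A) drawn by a power pin, from one ADC sample of the amplified shunt voltage."""
    v_amplified = code * V_ADC_REF / ADC_MAX
    v_shunt = v_amplified / GAIN
    return v_shunt / R_SHUNT


def pin_energy(codes, v_supply, dt):
    """Energy (J) of one power pin: sum of V * I * dt over equally spaced samples.

    v_supply is the voltage set (and reported) by the controllable power supply.
    """
    return sum(v_supply * current_from_code(c) * dt for c in codes)


samples = [310] * 1000  # e.g., 1 s of readings taken every 1 ms
print(f"I = {current_from_code(310) * 1e3:.1f} mA")
print(f"E = {pin_energy(samples, 1.8, 1e-3):.3f} J")
```

In practice the sample window would be aligned with the execution time reported by the processors, so that the integrated energy corresponds to one application run.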

B. Debug Units

The AT91SAM7x microcontrollers have a number of debug and test features, shown as a block diagram in Fig. 8. The UART debug unit provides a two-pin (i.e., TXD and RXD) UART interface that can be employed for various purposes, e.g., debugging, tracing the running application, and uploading an application into internal SRAM. A general JTAG/ICE port (see [9]) is employed for common operations, such as loading program code, and for standard debugging functions, such as single-stepping through programs. IEEE 1149.1 JTAG boundary scan allows pin-level access to IEEE 1149.1 JTAG-compliant devices independent of the device packaging technology and is commonly used for test purposes. In a test environment with multiple on-board devices, a number of JTAG-compliant devices are connected to form a single scan chain, and test vectors are generated, transferred, and interpreted by a tester.

TABLE III. POWER SUPPLY REQUIREMENTS FOR SOME WIDELY USED MICROCONTROLLERS

Fig. 9. Experimental setup and monitoring. (a) Setup. (b) Voltage of I/O. (c) Voltage of the processor. Coupling: ac.

V. EXPERIMENTAL RESULTS

A survey of some widely used ARM-based microcontrollers suggests that most of them permit the power supply pins to be fed by a wide range of voltages, as shown in Table III. This provides the opportunity to employ the proposed controllable power supply (see Section III-B) with them to achieve energy savings. In addition, all of the processors in Table III offer a number of modes to manage power in the system. These modes vary widely in the level of power savings and the level of functionality. For instance, the LPC11U6x series [14] provides four power modes, namely, Sleep, Deep-sleep, Power-down, and Deep power-down, and the PXA270 [20] provides Turbo mode (i.e., low-latency operation), Run mode (i.e., normal full-function mode), Idle and Deep-idle modes (which allow stopping and resuming the CPU clock), Standby mode (all PLLs are disabled), Sleep mode (only I/Os are kept powered), and Deep-sleep mode (I/Os are powered down). To the best of our knowledge, most current embedded processors provide power management only by controlling the clocks of the processor core and peripherals, and only a few of them (e.g., [13] and [14]) provide variable supply voltages. As discussed in Section III-B, lowering the clock frequency alone is not effective for energy saving; simultaneous frequency and voltage scaling is required for this purpose.

Fig. 10. High-to-low and low-to-high voltage scaling delays. (a) I/O. (b) Processor. Coupling: ac.

Fig. 9(a) shows the experimental setup, which includes an oscilloscope (for displaying voltages), a JTAG device (for programming), and USB and RS232 connections (for data transfer). In this platform, there are four different voltages (see Fig. 4) for: 1) the processor; 2) the PLL; 3) the I/O peripherals; and 4) the memory. These voltages can vary independently, and each can be set regardless of the others. In the experiments, the processor and PLL voltages could take any value from the set {1.65, 1.70, 1.75, 1.80, 1.85, 1.90, 1.95} V, and the I/O and memory voltages could take any value from the set {3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6} V. As in [1], [11], and [23], we use a set of voltage–frequency pairs to perform DVFS. Each voltage has a corresponding frequency level; hence, there are seven levels, i.e., {36, 40, 45, 51, 55, 58, 61} MHz. The frequency corresponding to each voltage was determined empirically by measuring the highest frequency at which the processor still worked correctly and then subtracting a 5% safety margin (similar to [23]). Note that this measurement is carried out only once (by the board development team); end users simply use the provided set and do not need to repeat the measurements (although they can do so if required). Although we use only seven voltage levels, the platform can provide 256 voltage levels. As an example of how the four voltages can vary independently, Fig. 9(b) and (c) show the voltages of the I/O and the processor when switching between 3.2, 3.3, and 3.4 V and between 1.75, 1.8, and 1.85 V, respectively.
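The one-time characterization step described above (measure the highest working frequency at each voltage, then keep a 5% safety margin) can be sketched as follows. The measured frequencies in the example are hypothetical placeholders, not the board's actual measurements, and the function names are illustrative. Frequencies are kept in integer kilohertz so that the rounding, which is always downward and therefore conservative, is exact.

```python
# Supply levels used for the processor and PLL in the experiments (V).
CORE_VOLTAGES = (1.65, 1.70, 1.75, 1.80, 1.85, 1.90, 1.95)


def derate_khz(f_max_khz, margin_pct=5):
    """Apply the safety margin, rounding down so the result never exceeds
    (100 - margin_pct)% of the measured maximum frequency."""
    return f_max_khz * (100 - margin_pct) // 100


def build_vf_table(measured_fmax_khz):
    """Pair each voltage with its derated frequency (one-time characterization)."""
    if len(measured_fmax_khz) != len(CORE_VOLTAGES):
        raise ValueError("need one measurement per voltage level")
    table = [(v, derate_khz(f)) for v, f in zip(CORE_VOLTAGES, measured_fmax_khz)]
    # DVFS relies on frequency not decreasing as the voltage rises.
    freqs = [f for _, f in table]
    if freqs != sorted(freqs):
        raise ValueError("derated frequencies must not decrease with voltage")
    return table


# Hypothetical bench measurements (kHz), one per voltage level.
measured = [40000, 44000, 48000, 52000, 56000, 60000, 64000]
vf_table = build_vf_table(measured)
```

End users would then only load the resulting table; the measurement loop itself is needed again only if the board or the safety margin changes.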

We conducted a set of experiments to analyze the voltage scaling delay in the proposed platform. For example, Fig. 10 shows a timing diagram of voltage scaling between two consecutive voltage levels, i.e., 1.75 and 1.8 V for the processor core and 3.2 and 3.3 V for the I/O. In Fig. 10, the high-to-low voltage scaling delay is 34 and 118 μs, and the low-to-high voltage scaling delay is 23 and 55 μs for the processor core and the I/O, respectively. We obtained almost the same results for the other voltage levels. An interesting observation from these experiments is that the high-to-low voltage scaling delay is greater than the low-to-high voltage scaling delay (by about 45% for the processor core and PLL and by about 110% for the memory and I/O). To analyze the power consumption of the different parts of the microcontroller (the processor core, PLL, memory, and I/O) at different voltage levels, we executed a matrix multiplication task on the Keil RTX operating system [27]. This task multiplies two randomly generated matrices and sends the result to the host computer via USB. The power consumption results in Fig. 11 show that, for all parts, a lower supply voltage leads to lower power consumption. In addition, Fig. 11 shows that voltage scaling is very effective in reducing the power consumption of both the processor and the other parts of the microcontroller.
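The relative difference between the two scaling directions follows directly from the delays read from Fig. 10, as the short Python sketch below shows. For this single pair of levels it gives roughly 48% for the processor core and 115% for the I/O, close to the approximate figures quoted above. The `transition_overhead_us` helper is a hypothetical extension: it assumes that a jump across several levels costs one measured single-step delay per level crossed, which the measurements above do not establish.

```python
# Single-step scaling delays read from Fig. 10, in microseconds.
DELAY_US = {
    "core": {"down": 34, "up": 23},   # between 1.8 V and 1.75 V
    "io": {"down": 118, "up": 55},    # between 3.3 V and 3.2 V
}


def down_vs_up_excess(rail):
    """Percentage by which a high-to-low step is slower than a low-to-high step."""
    d = DELAY_US[rail]
    return 100.0 * (d["down"] - d["up"]) / d["up"]


def transition_overhead_us(rail, from_level, to_level):
    """Hypothetical estimate for a jump between level indices: one measured
    single-step delay per level crossed (not validated on the platform)."""
    steps = abs(to_level - from_level)
    direction = "down" if to_level < from_level else "up"
    return steps * DELAY_US[rail][direction]
```

A scheduler that switches levels often could use such an estimate to decide whether a short idle interval is long enough to justify a voltage change at all.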

Fig. 11. Power consumption of AT91SAM7x. (a) Processor and PLL. (b) Memory and I/O.

Fig. 12. Contribution of parts of AT91SAM7x in: (a) power consumption, (b) execution time, and (c) energy consumption.

Another set of experiments was performed on the MiBench benchmarks [21] (as real applications) to determine the contribution of each part of the microcontroller to the total power consumption, execution time, and energy consumption. The results are shown in Fig. 12. In this experiment, 1.8 V is used for the processor and PLL, and 3.3 V is used for the memory and I/O. As the PLL is always operational during application execution, it is not included in Fig. 12(b); when calculating energy consumption [in Fig. 12(c)], the PLL is charged for the applications' entire execution time. From Fig. 12, we make two main observations: 1) although the power consumption of the PLL, memory, and I/O is lower than that of the processor, their energy consumption is comparable with that of the processor; and 2) although the PLL contributes very little to the total power, its energy consumption is, in most cases, comparable with that of the other parts, because it is always operational.
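The second observation follows from E = P × t: a part with small power can still accumulate comparable energy if it is active for longer. The Python sketch below applies the accounting rule used for Fig. 12(c), in which the PLL is charged for the whole execution time; all power and time values are hypothetical, chosen only to illustrate the effect.

```python
def energy_breakdown_mj(power_mw, active_s, exec_time_s):
    """Per-part energy in millijoules (mW x s = mJ).

    power_mw:    average power of each part while it is active, in mW
    active_s:    time each part is active, in s; a part missing from this map
                 (here, the always-on PLL) is charged for the whole execution time
    exec_time_s: total application execution time, in s
    """
    return {part: p * active_s.get(part, exec_time_s) for part, p in power_mw.items()}


# Hypothetical values for one benchmark run.
power = {"processor": 60.0, "pll": 8.0, "memory": 20.0, "io": 15.0}
active = {"processor": 0.5, "memory": 0.5, "io": 0.4}
energy = energy_breakdown_mj(power, active, exec_time_s=1.2)

pll_power_share = power["pll"] / sum(power.values())
pll_energy_share = energy["pll"] / sum(energy.values())
```

With these numbers the PLL draws under 8% of the total power but accounts for about 17% of the energy, more than the I/O.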

To evaluate the effectiveness of applying voltage scaling to the whole microcontroller, we measured and compared the energy consumption of the microcontroller under three types of energy management techniques.

1) Dynamic power management (DPM): When there is idle time, the microcontroller enters the low-power mode provided by the microcontroller [9], in which the memory is in standby (it is not accessed at all), the processor core is idle (its clock is switched off), the main clock runs at 500 Hz, and all peripheral clocks are deactivated.

2) Core voltage and frequency scaling (CVFS): DVFS is used only for the processor core, and DPM is used for the other parts. In this case, the processor frequency is set to the slowest frequency (and its corresponding voltage) necessary to finish the application, selected from the set of available voltage–frequency pairs.

3) Microcontroller voltage and frequency scaling (MVFS): DVFS is used for the whole microcontroller, including the processor core, PLL, memory, and I/O.
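The selection rule described for CVFS (the slowest available pair that still lets the application finish) can be sketched in Python as follows. The sketch assumes that the seven processor voltages pair with the seven frequencies in ascending order and that the application's cycle count and deadline are known in advance; `slowest_feasible_pair` is a hypothetical helper, not code from the platform.

```python
# Voltage-frequency pairs (V, MHz), assuming voltages and frequencies pair in ascending order.
VF_PAIRS = [(1.65, 36), (1.70, 40), (1.75, 45), (1.80, 51),
            (1.85, 55), (1.90, 58), (1.95, 61)]


def slowest_feasible_pair(cycles, deadline_s, pairs=VF_PAIRS):
    """Return the lowest (V, MHz) pair that still completes `cycles` by the deadline.

    Returns None when even the highest frequency cannot meet the deadline.
    """
    for volt, mhz in sorted(pairs, key=lambda p: p[1]):
        if cycles / (mhz * 1e6) <= deadline_s:
            return volt, mhz
    return None
```

For example, a hypothetical 45-million-cycle task with a 1-s deadline would be assigned the 45-MHz, 1.75-V pair, because the two slower pairs would miss the deadline.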

TABLE IV
ENERGY CONSUMPTION (IN MILLIJOULES) OF DPM, CVFS, AND MVFS

TABLE V
ENERGY CONSUMPTION (IN MILLIJOULES) FOR EXECUTING THE DUPLICATION TECHNIQUE ON A SINGLE PROCESSOR OR ON TWO PROCESSORS

In this experiment, we analyzed the MiBench benchmarks; the results are shown in Table IV. For these applications, MVFS provides average energy savings of about 35% and 11% (at least 31% and 10%) compared with using DPM alone and using CVFS, respectively. In this experiment, we did not fix a voltage–frequency pair for any benchmark. Rather, we executed each benchmark with all seven voltage–frequency pairs, and the average results are reported in Table IV.
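The savings quoted above are relative reductions in energy. A minimal Python sketch of that computation follows, assuming per-benchmark energies that have already been averaged over the seven voltage–frequency pairs, as in Table IV; the benchmark names and energy values in the example are hypothetical, not entries from Table IV.

```python
def mean_over_pairs(energies_mj):
    """Average the energy of one benchmark measured at each voltage-frequency pair."""
    return sum(energies_mj) / len(energies_mj)


def mvfs_savings(baseline_mj, mvfs_mj):
    """Average and minimum relative energy saving (%) of MVFS over a baseline.

    Both arguments map benchmark name -> energy in mJ (each value already
    averaged over the voltage-frequency pairs).
    """
    savings = [100.0 * (1.0 - mvfs_mj[name] / baseline_mj[name]) for name in baseline_mj]
    return sum(savings) / len(savings), min(savings)


# Hypothetical energies (mJ) for two benchmarks.
dpm = {"bench_a": 100.0, "bench_b": 200.0}
mvfs = {"bench_a": 60.0, "bench_b": 150.0}
avg_saving, min_saving = mvfs_savings(dpm, mvfs)
```

Reporting the minimum alongside the mean, as the text does, shows that the average is not driven by a single favorable benchmark.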

We conducted another set of experiments to show how the proposed platform can be used to exploit parallelism in energy management. As an example, consider the duplication technique [2], in which each task is executed twice to detect possible errors. These two executions of each task can be performed on a single processor in series [see Fig. 6(a)] or on two processors in parallel [see Fig. 6(b)]. As Table V shows, for this example, parallel processing on two processors provides on average 25% (up to 29%) energy saving compared with implementing the technique on a single processor (the reason is discussed in Section III-C). To implement this application, we used the RTX operating system [27]; other embedded operating systems (e.g., RTEMS [26]) could also be used with the platform. We then developed the source code of the application, using the MailBox feature of RTX [27] for message passing and synchronization. MailBox can operate over common communication protocols (e.g., SPI, UART, and CAN) that are supported by the platform (see Section II). For this experiment, we used UART for MailBox. Finally, we used Keil [27] (the compiler for RTX) to compile the source code and to load the object files into the platform through JTAG.
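One plausible way to see why the parallel arrangement can save energy is sketched below in Python: running both copies on one processor doubles the cycles that must fit before the deadline, forcing a higher voltage–frequency pair, whereas each of two processors needs only half that throughput. This is an illustrative model, not the analysis of Section III-C; it assumes dynamic energy proportional to V² per cycle, pairs the seven voltages with the seven frequencies in ascending order, uses a hypothetical capacitance, and ignores static power, idle power, and the UART message-passing overhead, so its numbers should not be compared with Table V.

```python
# Voltage-frequency pairs (V, MHz), assuming voltages and frequencies pair in ascending order.
VF_PAIRS = [(1.65, 36), (1.70, 40), (1.75, 45), (1.80, 51),
            (1.85, 55), (1.90, 58), (1.95, 61)]


def lowest_pair(cycles, deadline_s):
    """Slowest pair that executes `cycles` within `deadline_s` seconds."""
    for volt, mhz in VF_PAIRS:                  # listed in ascending frequency
        if cycles / (mhz * 1e6) <= deadline_s:
            return volt, mhz
    raise ValueError("deadline cannot be met at the highest frequency")


def duplication_energy(cycles, deadline_s, parallel=False, c_eff=1e-9):
    """Dynamic energy (J) of executing one task twice for error detection.

    serial:   both copies run back to back on one processor, so 2 * cycles
              must fit within the deadline
    parallel: each processor runs one copy, so only `cycles` must fit
    """
    if parallel:
        volt, _ = lowest_pair(cycles, deadline_s)
        return 2 * c_eff * volt ** 2 * cycles   # two processors, one copy each
    volt, _ = lowest_pair(2 * cycles, deadline_s)
    return c_eff * volt ** 2 * (2 * cycles)     # one processor, both copies


serial = duplication_energy(25e6, 1.0)
parallel = duplication_energy(25e6, 1.0, parallel=True)
```

For a hypothetical 25-million-cycle task with a 1-s deadline, the serial schedule needs the 51-MHz (1.80-V) pair, while each parallel copy can run at 36 MHz (1.65 V), giving about 16% lower dynamic energy in this model.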

VI. EXTENSIONS AND FUTURE WORK

The proposed platform can provide an experimental setup for different research projects. For example, as a direction for future work, the platform can be used to experiment with fault-tolerance techniques by means of the following facilities.

1) The two microcontrollers are connected so that each can interrupt, restart, and turn the other on or off.

2) Each microcontroller can access the internal parts of the other via JTAG. This is helpful for implementing fault detection mechanisms that require comparing parts of one processor with their counterparts in the other.


3) There are interconnections to transfer data, internal states, and checkpoints between the microcontrollers.

4) A third, smaller microcontroller on the platform can be used (for example, as a voter [15]) to implement fault-tolerance techniques.

These facilities make it possible to implement fault-tolerance techniques such as standby sparing [1], duplication [2], and “2 out of 2” hardware redundancy with a voter [15]. The platform can also be used to implement software fault-tolerance techniques such as result checking [2] and N-version programming [10]. Such software mechanisms usually require communication and synchronization between the processors [28], which the proposed platform supports.
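As a rough illustration of the checks such a voter could perform, the Python sketch below shows duplication with comparison (the 2-out-of-2 case, which detects but cannot locate an error) and a majority voter of the kind used with N-version programming. The function names are hypothetical, and a certified voter such as the SIL-4 design in [15] involves far more than this comparison logic.

```python
from collections import Counter


def compare_2oo2(result_a, result_b):
    """Duplication with comparison: True if both copies agree.

    False only signals that an error was detected; with two copies there is
    no way to tell which one is wrong.
    """
    return result_a == result_b


def majority_vote(results):
    """Return the value produced by a strict majority of versions, else None."""
    if not results:
        raise ValueError("no results to vote on")
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) // 2 else None
```

On the platform, the two main microcontrollers would produce the results and the third microcontroller could run the comparison, but the partitioning shown here is only an assumption.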

The debugging features (see Section IV-B) can also be used for fault injection [16]. For example, JTAG lets us change processor registers, flags, and data memory arbitrarily at run time. This can be used to inject soft errors that are caused by transient faults (e.g., single event upsets [17]) and that flip one or more memory bits [1], [2], [17].
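The effect of such an injected soft error can be modeled as flipping one bit of a memory word. The Python sketch below operates on a plain list standing in for data memory; on the platform, the corresponding read-modify-write would be issued through the JTAG debug interface, which is not modeled here. The function names and the 32-bit word width are assumptions for illustration.

```python
import random


def flip_bit(word, bit, width=32):
    """Return `word` with one bit inverted, modeling a single event upset."""
    if not 0 <= bit < width:
        raise ValueError("bit index out of range")
    return (word ^ (1 << bit)) & ((1 << width) - 1)


def inject_random_seu(memory, rng=None, width=32):
    """Flip one random bit of one random word in a list-based memory image.

    Returns (address, bit) so that an injection campaign can log where the
    fault was placed and later correlate it with the observed outcome.
    """
    if rng is None:
        rng = random.Random()
    addr = rng.randrange(len(memory))
    bit = rng.randrange(width)
    memory[addr] = flip_bit(memory[addr], bit, width)
    return addr, bit
```

Passing a seeded `random.Random` makes a campaign reproducible, so the same fault locations can be replayed against different fault-tolerance configurations.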

Another possible extension is to adopt a motherboard–daughterboard architecture for the board, so that it can be used for other microcontrollers with only slight changes.

VII. CONCLUSION

This paper has presented a hardware platform that consists of two ARM-based microcontrollers, each fed separately by variable voltages. This platform is well suited for evaluating embedded systems with low-energy and fault-tolerance requirements. In this platform, we provide DVFS capability for the whole microcontroller (including the processor core, PLL, memory, and I/O). Physical experiments show that applying DVFS to the whole microcontroller is considerably more efficient in reducing power/energy consumption than applying DVFS only to the processor core or using the power-down policies currently used by most embedded processors. In addition, the platform is equipped with accurate energy/power measurement units, debugging ports, and facilities for evaluating fault-tolerance techniques. Although the platform is designed for ARM-based microcontrollers, the approach is general, and other COTS devices and embedded processors can be used similarly in the design of the platform.

REFERENCES

[1] A. Ejlali, B. M. Al-Hashimi, and P. Eles, “Low-energy standby-sparing for hard real-time systems,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 31, no. 3, pp. 329–342, Mar. 2012.

[2] S. Aminzadeh and A. Ejlali, “A comparative study of system-level energy management methods for fault-tolerant hard real-time systems,” IEEE Trans. Comput., vol. 60, no. 9, pp. 1288–1299, Sep. 2011.

[3] A. Malinowski and H. Yu, “Comparison of embedded system design for industrial applications,” IEEE Trans. Ind. Informat., vol. 7, no. 2, pp. 244–254, May 2011.

[4] J. Henkel and S. Parameswaran, Designing Embedded Processors: A Low Power Perspective. Berlin, Germany: Springer-Verlag, 2007.

[5] P. Marti, M. Velasco, J. M. Fuertes, A. Camacho, and G. Buttazzo, “Design of an embedded control system laboratory experiment,” IEEE Trans. Ind. Electron., vol. 57, no. 10, pp. 3297–3307, Oct. 2010.

[6] T. Yang, G. Zhang, and X. Hu, “System design of current transformer accuracy tester based on ARM,” in Proc. 8th IEEE Conf. Ind. Electron. Appl., Jun. 19–21, 2013, pp. 634–639.

[7] H. Guzman-Miranda, L. Sterpone, M. Violante, M. A. Aguirre, and M. Gutierrez-Rizo, “Coping with the obsolescence of safety- or mission-critical embedded systems using FPGAs,” IEEE Trans. Ind. Electron., vol. 58, no. 3, pp. 814–821, Mar. 2011.

[8] Virtex-6 FPGA ML605 Evaluation Kit, Xilinx, San Jose, CA, USA, 2012.

[9] ARM-based Flash MCU SAM7x Series, Atmel Corp., San Jose, CA, USA, Feb. 11, 2014.

[10] R.-T. Wang, “A dependent model for fault tolerant software systems during debugging,” IEEE Trans. Rel., vol. 61, no. 2, pp. 504–515, Jun. 2012.

[11] J. Pouwelse, K. Langendoen, and H. Sips, “Dynamic voltage scaling on a low-power microprocessor,” in Proc. 7th ACM Int. Conf. MobiCom Netw., 2001, pp. 251–259.

[12] Y. S. Hwang and K. S. Chung, “Dynamic power management technique for multicore based embedded mobile devices,” IEEE Trans. Ind. Informat., vol. 9, no. 3, pp. 1601–1612, Aug. 2013.

[13] STM32L15x: Ultra-Low-Power 32-Bit MCU ARM-Based Cortex-M3, STMicroelectronics, Geneva, Switzerland, Nov. 2013.

[14] LPC11U6x 32-Bit ARM Cortex-M0+ Microcontroller, NXP Semiconductors, Eindhoven, The Netherlands, Mar. 2014.

[15] M. Idirin, X. Aizpurua, A. Villaro, J. Legarda, and J. Melendez, “Implementation details and safety analysis of a microcontroller-based SIL-4 software voter,” IEEE Trans. Ind. Electron., vol. 58, no. 3, pp. 822–829, Mar. 2011.

[16] M. Portela-Garcia, C. Lopez-Ongil, M. Garcia-Valderas, and L. Entrena, “Fault injection in modern microprocessors using on-chip debugging infrastructures,” IEEE Trans. Dependable Secure Comput., vol. 8, no. 2, pp. 308–314, Mar./Apr. 2011.

[17] M. Grosso, H. Guzman-Miranda, and M. A. Aguirre, “Exploiting fault model correlations to accelerate SEU sensitivity assessment,” IEEE Trans. Ind. Informat., vol. 9, no. 1, pp. 142–148, Feb. 2013.

[18] i.MX27 and i.MX27L Multimedia Applications Processor, Freescale Semiconductor Inc., Austin, TX, USA, 2011.

[19] High-Performance, Low-Power System-on-Chip with SDRAM and Digital Audio, Cirrus Logic, Inc., Austin, TX, USA, 2011.

[20] Marvell PXA270 Processor: Electrical, Mechanical, Thermal Specification, Marvell, Santa Clara, CA, USA, 2009.

[21] M. R. Guthaus et al., “MiBench: A free, commercially representative embedded benchmark suite,” in Proc. IEEE Int. Workshop Workload Characterization, Dec. 2001, pp. 3–14.

[22] J. Castrillon, R. Leupers, and G. Ascheid, “MAPS: Mapping concurrent dataflow applications to heterogeneous MPSoCs,” IEEE Trans. Ind. Informat., vol. 9, no. 1, pp. 527–545, Feb. 2013.

[23] T. Phatrapornnant and M. J. Pont, “Reducing jitter in embedded systems employing a time-triggered software architecture and dynamic voltage scaling,” IEEE Trans. Comput., vol. 55, no. 2, pp. 113–124, Feb. 2006.

[24] H. Guo, K.-S. Low, and H.-A. Nguyen, “Optimizing the localization of a wireless sensor network in real time based on a low-cost microcontroller,” IEEE Trans. Ind. Electron., vol. 58, no. 3, pp. 741–749, Mar. 2011.

[25] R. Wang and S. Yang, “The design of a rapid prototype platform for ARM based embedded system,” IEEE Trans. Consum. Electron., vol. 50, no. 2, pp. 746–751, May 2004.

[26] RTEMS Operating System, 2010. [Online]. Available: http://www.rtems.com

[27] RTX Real-Time Operating System, 2013. [Online]. Available: http://www.keil.com

[28] Y. Jiang et al., “Bayesian-network-based reliability analysis of PLC systems,” IEEE Trans. Ind. Electron., vol. 60, no. 11, pp. 5325–5336, Nov. 2013.

Mohammad Salehi received the M.S. degree in computer engineering from Sharif University of Technology, Tehran, Iran, in 2010, where he is currently working toward the Ph.D. degree in computer engineering.

His current research interests include embedded systems, low-power design, and the tradeoff between fault tolerance and energy efficiency in real-time systems.

Alireza Ejlali received the Ph.D. degree in computer engineering from Sharif University of Technology, Tehran, Iran, in 2006.

He is currently an Associate Professor of computer engineering with Sharif University of Technology, where he is also the Director of the Computer Architecture Group and the Embedded Systems Research Laboratory, Department of Computer Engineering. His current research interests include low-power design, real-time embedded systems, and fault-tolerant embedded systems.