Abstract Power management is becoming an increasingly critical component of modern enterprise computing environments. The traditional drive for higher performance has influenced trends towards consolidation and higher densities, artifacts enabled by virtualization and new small form factor server blades. The resulting effect has been increased power and cooling requirements in data centers, which elevate ownership costs and put more pressure on rack and enclosure densities. To address these issues, we exploit a fundamental characteristic of data centers: “platform heterogeneity”. This heterogeneity stems from the architectural and management-capability variations of the underlying platforms. We define an intelligent heterogeneity-aware load management (HALM) system that leverages heterogeneity characteristics to provide two data center level benefits: (i) power efficient allocations of workloads to the best fitting platforms and (ii) improved overall performance in a power constrained environment. Our infrastructure relies upon platform and workload descriptors as well as a novel analytical prediction layer that accurately predicts workload power/performance across different platform architectures and power management capabilities. Our allocation scheme
R. Nathuji (✉) · K. Schwan
Georgia Institute of Technology, Atlanta, GA 30032, USA
e-mail: [email protected]
E. Gorbatov
Intel Corporation, Hillsboro, OR 97124, USA
e-mail: [email protected]
achieves on average 20% improvements in power efficiency for representative heterogeneous data center configurations, and reduces performance degradation by up to 18% when power budgeting must be performed. These results highlight the significant potential of heterogeneity-aware management.
Keywords Power management · Distributed resource management · Heterogeneous systems
1 Introduction
Power management has become a critical component of modern computing systems, pervading both mobile and enterprise environments. Power consumption is a particularly significant issue in data centers, stimulating a variety of research for server systems [2]. Increased performance requirements in data centers have resulted in elevated densities enabled via consolidation and reduced server form factors. This has in turn created challenges in provisioning the necessary power and cooling capacities. For example, current data centers allocate nearly 60 Amps per rack, a limit that is likely to become prohibitive for future high density rack configurations such as blade servers, even if the accompanying cooling issues can be solved [24]. In addition, a 30,000 square foot data center with a power consumption of 10 MW requires a cooling system which costs $2–$5 million [21]. In such a system, the cost of running the air conditioning equipment alone can reach $4–$8 million a year [24]. Coupled with the elevated electricity costs from high performance servers, these effects can substantially affect the operating costs of a data center. Overall, these trends in power/cooling delivery and cost highlight the need for power and thermal management support in data centers.
Previous work on server management has focused on managing heat during thermal events [21] or utilizing platform power management support, such as processor frequency scaling, for power budgeting [9, 18, 24]. In this paper, we approach the problem of managing data centers from a different perspective by considering how to intelligently allocate workloads amongst heterogeneous platforms in a manner that (i) improves data center power-efficiency while preserving/satisfying workload performance requirements, and (ii) meets data-center-level power budgets with minimal impact on workload performance. Typically, data centers statically allocate platform resources to applications based upon peak load characteristics in order to maintain isolation and provide performance guarantees. With the continuing growth in capabilities of virtualization solutions (e.g., Xen [1] and VMware [25]), the necessity of such offline provisioning is removed. Indeed, by allowing for flexible and dynamic migration of workloads across physical resources [6], the use of virtualization in future data centers enables a new avenue of management and optimization. Our approach begins to leverage some of these capabilities to enhance power efficiency by taking advantage of the ability to assign virtualized applications to varying sets of underlying hardware platforms based upon performance needs and power consumption characteristics.
Throughout their lifetimes, data centers continually upgrade servers due to failures, capacity increases, and migrations to new form factors [12]. Over time, this leads to data centers comprised of a range of heterogeneous platforms with differences in component technologies; power, performance and thermal characteristics; and power management capabilities. When provisioning resources to workloads in these heterogeneous environments, power efficiency can vary significantly based on the particular allocation. For example, by assigning a memory bound workload to a platform that can perform dynamic voltage and frequency scaling (DVFS), run-time power consumption can be reduced with minimal impact to performance [19]. We propose a novel heterogeneity-aware load management (HALM) architecture to achieve this power-friendly behavior in data centers.
Allocating power and cooling resources is another significant challenge in the modern data center. Though clearly beneficial for transient power delivery and cooling issues, power budgeting solutions can also be useful in the provisioning of these resources. Traditionally, power and cooling have been allocated based on the nameplate rating of the system power supply or its maximum output power. However, a fully utilized server with a typical configuration will see its electrical load between 60%–75% of the nameplate rating with most enterprise workloads. Therefore, providing power and cooling capacity based on these worst case assumptions results in either over allocation of power and cooling capacity or underutilization of server rack space, leading to increased capital costs and underutilized data centers. Allocating power and cooling capacity based on the average workload behavior within a server and across a data center allows significantly increased densities, but requires dynamic protection mechanisms that can limit server power consumption when demand temporarily exceeds available capacity. These mechanisms have been recently proposed in the literature and explored in the industry [8]. While very effective in limiting power and protecting the infrastructure, they may result in nontrivial degradation of peak performance, especially when the power constraint is too prohibitive. In this paper we illustrate how HALM can lessen the performance impact of data center power budgeting strategies.
Intelligent mapping of applications to underlying platforms is dependent upon the availability of relevant information about workloads and hardware resources. As part of HALM, we extend the use of workload and platform descriptors for this purpose, which are then used by a predictor component that estimates the achievable performance and power savings across the different platforms in the data center. These predictions are finally used by an allocation layer that maps workloads to a specific type of platform. This overall infrastructure is evaluated using data center configurations consisting of variations upon four distinct platforms. In summary, the main contributions of our HALM system are: (i) a platform heterogeneity-aware power management infrastructure that improves data center power efficiency under workload performance constraints and limited data center power budgets; (ii) an allocation infrastructure that uses workload and platform descriptors to perform mappings of hardware to virtualized workloads; and (iii) an intelligent load shedding policy to dynamically meet transient changes in power consumption limits. Evaluations of our system performed on state-of-the-art platforms, including Intel® Core™ microarchitecture based hardware, demonstrate the benefits of exploiting platform heterogeneity for power management.
2 Motivation
2.1 Data center composition and exploiting heterogeneity
Data center deployments are inherently heterogeneous. Upgrade cycles and replacement of failed components and systems contribute to this heterogeneity. In addition, new processor and memory architectures appear every few years, and reliability requirements are becoming ever more stringent. The effect of these trends is reflected by a recent survey of data center managers that found that 90% of the facilities are expected to upgrade their compute and storage infrastructure in the next two years. Figure 1(a) shows a distribution of different systems in a representative enterprise data
Fig. 1 Data center heterogeneity and management benefits
center. As the figure shows, the data center contains nine different generations of systems that have either (1) different processor architectures, cores and frequencies; (2) varying memory capacity and interconnect speeds; or (3) different I/O capabilities. While all systems support the same software stack, they have very different and often asymmetric performance and power characteristics.
Traditionally, the non-uniformity of systems in a data center has been characterized by different levels of performance and power consumption. However, recently, another dimension has been added to this heterogeneity because server platforms are beginning to offer rich thermal and power management capabilities. Processors support DVFS and aggressive sleep states to conserve CPU power. New memory power management implementations allow different DRAM devices to go to lower power states when inactive, and enable bandwidth throttling for thermal protection. Server power supplies exhibit different conversion efficiencies under different loads, directly impacting the overall power efficiency of the system. Since power efficiency has become an important thrust in enterprise systems, we expect component and platform vendors to continue introducing new power and thermal management capabilities into their products, including I/O and system buses, chipsets, and network and disk interfaces, making future platforms even more heterogeneous.
Previous work has proposed different approaches for energy-efficient workload allocation in clusters, but none have accounted for system level power management and thermal characteristics. Therefore, the workload allocations proposed by previous approaches will yield less than ideal results since they are completely unaware of power and thermal management effects on system performance and power consumption. To illustrate this phenomenon, we experimentally compare two dual processor systems, A and B, running two different workloads, as shown in Fig. 1(b). The differences between the two systems are in the power supply unit (PSU) and processor power management capabilities. System A has a less efficient power supply at light load and has processors with limited power management support. System B, on the other hand, has a high efficiency power supply across all loads and processors that support a rich set of power management capabilities. We measure power consumption on these platforms using two different synthetic workloads: one with full utilization (W1) and one with a very low level of utilization (W2). W1 consumes about the same amount of power on both platforms. However, allocating the low-utilization W2 to system A leads to very power inefficient execution. Since A does not support power management and has low PSU efficiency at light load, its total system power is more than 50 W higher than that of system B. Thus, while both systems meet the performance demand of both workloads, heterogeneity-aware resource allocation can decrease total power by more than 10%, translating into millions of dollars in savings for large data centers. As this example shows, a full knowledge of system power and supported power management features is required to efficiently allocate workloads. Our HALM system is designed to provide such functionality.
2.2 Benefits of heterogeneity-aware management
To further motivate the need for and benefits of heterogeneity-aware management in data centers, we perform two opportunity studies. The first study considers the possible benefits of allocating workloads by matching system capabilities and workload execution characteristics to reduce a data center’s power profile while also meeting workload performance demands. We analyze an example of running a set of workloads in a data center configuration with four unique types of platforms described later in the paper, each with different power/performance characteristics. The set of workloads includes ten computational benchmarks (swim, bzip2, mesa, gcc, mcf, art, applu, vortex, sixtrack, and lucas from SPEC CPU2000) and one transaction-oriented workload (SPECjbb2005). We generate all subsets of four from these eleven benchmarks and compare three allocation policies for each of the subsets in Fig. 2(a). The ‘worst case’ allocation distributes the benchmarks across platforms to maximize power consumption, ‘random’ allocates workloads to platforms randomly, and ‘optimal’ distributes the workloads to minimize power consumption. For each workload, we allocate as many systems of a given type as necessary to meet workload throughput requirements. Subsets
Fig. 2 Opportunity analysis of heterogeneity-aware management
that have benchmarks with more homogeneous behavior, i.e. similar processor and memory usage behavior, appear on the left side of the graph, while subsets with more heterogeneous benchmarks appear on the right. As can be seen from the figure, subsets of workloads with more heterogeneous behavior can substantially benefit from heterogeneity-aware resource allocation. Averaging across all subsets, the optimal policy can reduce total power by 18% when compared to random allocation and by 34% over worst-case allocation, without compromising workload performance.
The second opportunity study considers how the aggregate throughput of a set of workloads varies within a given power budget based upon allocations. In particular, we assume that we have one of each of our four unique platforms and again generate subsets of four workloads from a set of SPEC CPU2000 benchmarks. For each subset, we calculate the minimum, average, and best case throughput across all permutations of possible allocations of the four workloads onto the four platforms. Figure 2(b) provides the results, where each scenario is normalized by the minimum throughput value to provide fair comparisons. We find that on average, the best case allocation provides a 23% improvement in performance over the random allocation, and a 48% improvement compared to the worst-case. These results highlight the relationship between allocation decisions and performance when a power budget must be imposed.
Summarizing, HALM addresses the power benefits of heterogeneity-aware allocation for two cases: (1) when there is no power budget and (2) when such a budget must be imposed temporarily due to power delivery or cooling constraints or as part of a power provisioning strategy [8].
3 Scalable enterprise and data center management
Our previous discussions have motivated the need to augment the behavior of data centers to improve manageability by leveraging the heterogeneity in platform capabilities. HALM extends this support with its heterogeneity-aware workload allocation infrastructure that utilizes the flexibility of rapidly developing virtualization technologies. Virtualization attempts to provide capabilities and abstractions that significantly impact the landscape of enterprise management. For example, there is active work to ensure performance isolation benefits, where it will be possible to run multiple virtual machines (VMs) within a given physical platform without interference among applications [15]. Currently, VMs can coexist on a platform with negligible performance interference as long as resources are not overcommitted. Approaches that allow for resource pools and reservations as well as dynamic resource sharing and reclamation can aid in providing isolation even when systems are over-provisioned. Secondly, by encapsulating application state within well defined virtual machines, migration of workloads among resources can be performed easily and efficiently. A more powerful contribution of virtualization, however, is the ability to combine multiple resources across physical boundaries to create virtual platforms for applications, providing a scalable enterprise environment. HALM assumes the existence of this flexible and powerful virtualization support.
The usage pattern of data centers is becoming increasingly service-oriented, where applications and workloads may be submitted dynamically by subscribers/clients. When managing these types of applications, certain management actions, such as allocation decisions, happen at a coarse granularity, with finer adjustments being made at runtime to address transient issues such as reduced power budgets. One can imagine how such a data center might be managed with the typically used assignment approaches. At some infrequent interval, the pool of applications and service level agreements (SLAs) that specify their required performance, in metrics such as throughput or response time, are compiled. Applications are then assigned to platforms using a simple load balancing scheme based upon utilization or queue lengths, possibly even accounting for differences in the performance of the systems [26], so that SLAs are met. When load must be reduced to address power budgeting requirements, load might be shed from workloads in a similarly random or round robin fashion. This approach clearly leaves room for improvement, since it does not consider power or platform differences in any way. HALM addresses this weakness by performing heterogeneity aware allocations as well as intelligent load shedding.
The HALM architecture can be organized into three major components: (1) platform/workload descriptors, (2) a power/performance predictor, and (3) an allocator, as shown in Fig. 3(a). We use platform and workload descriptors to inform our workload allocator of the differences amongst workloads and platforms. These descriptor inputs are utilized by the predictor to determine: (1) the relative performance of workloads on different types of platforms, and (2) the power savings achievable from platform power management mechanisms. Coupled with coarse platform power consumption information (obtained via online power monitoring), the allocator then performs the assignments of workloads to the available resources.
The purpose of platform descriptors is to convey information regarding the hardware and power management capabilities of a machine. A platform descriptor is made up of multiple modules, representing different system components, as shown in Fig. 3(b). Each module specifies the type of component to which it refers, such as processor, memory subsystem, or power supply. Within each of these modules, various component parameters are defined. For example, a module describing the processor component may have attributes like its microarchitecture family, frequency, and available management support. Workload descriptors are also structured in modules, headed with attribute declarations. Within each module, a list of values for that attribute is provided. As workload attributes often vary with the platform on which a workload executes, our descriptor design allows multiple attribute definitions, where each definition is predicated with component parameter values that correlate back to platform descriptors. Figure 3(b) illustrates the structure of the resulting workload descriptor. We further explain the meaning of the MPI (memory accesses per instruction) and CPICORE (core cycles per instruction) attributes in subsequent sections.
Platform descriptor information can be provided in a variety of ways. It can be made readily available using platform support such as ACPI [13], and possibly also with some administrative input. To provide the required workload descriptors, we profile workloads on a minimal set of orthogonal platforms, with mutually exclusive component types. We then use an analytical prediction approach to project workload characteristics on all available platforms. As we discuss in Sect. 5, this approach provides accurate predictions that scale with increased amounts of heterogeneity.
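As a concrete illustration, the module/attribute structure described above might be represented as follows. This is a hypothetical sketch only: the field names, numeric values, and lookup logic are our own illustration and are not taken from the paper's actual descriptor format.

```python
# Hypothetical platform descriptor: modules per component, with
# component parameters inside each module (illustrative values).
platform_descriptor = {
    "processor": {"family": "Core", "frequency_ghz": 3.0, "dvfs": True},
    "memory": {"type": "FBD-DDR2", "bus_mhz": 533, "capacity_gb": 8},
    "power_supply": {"efficiency_light_load": 0.85},
}

# Hypothetical workload descriptor: because workload attributes vary with
# the platform, each attribute holds a list of (predicate, value) pairs,
# where the predicate references platform descriptor parameters.
workload_descriptor = {
    "MPI": [({"memory.type": "FBD-DDR2"}, 0.004),
            ({"memory.type": "DDR2-400"}, 0.005)],
    "CPI_CORE": [({"processor.family": "Core"}, 0.9),
                 ({"processor.family": "NetBurst"}, 1.6)],
}

def _get(platform, dotted_key):
    """Resolve a 'module.parameter' key against a platform descriptor."""
    module, param = dotted_key.split(".")
    return platform[module][param]

def lookup(workload, attr, platform):
    """Return the first attribute value whose predicate matches the platform."""
    for predicate, value in workload[attr]:
        if all(_get(platform, k) == v for k, v in predicate.items()):
            return value
    raise KeyError(attr)
```

For the platform above, `lookup(workload_descriptor, "MPI", platform_descriptor)` resolves to the FBD-DDR2 definition, mirroring how a workload attribute definition "correlates back" to platform parameters.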
4 Methodology
4.1 Platform hardware
Our hardware setup consists of four types of rack mounted server platforms summarized in Fig. 4(a), where LLC denotes last-level cache size. All four types of platforms contain standard components and typical configurations that entered production cycles. In our experiments, Linux was installed on all systems for measurement of various attributes (e.g. CPI, MPI, etc.) as well as performance. We validated that the performance results matched those with Xen using a subset of workloads and platforms, but performed the majority of our experiments in a non-virtualized environment to have better access to the performance counters used to measure other workload attributes.
The platform names in this paper are based on their processor code names. All four platforms are dual-processor systems. Woodcrest, Sossaman, and Dempsey are CMP dual-core processors, and Irwindale is a 2-way SMT processor supporting Hyper-Threading Technology. All platforms have 8 GB of memory. Woodcrest and Dempsey support Fully Buffered DIMM (FBD) memory with a 533 MHz DDR2 bus, while Sossaman and Irwindale support unregistered DDR2 400 MHz memory. Woodcrest and Dempsey
Table 1 Levels of heterogeneity in our experimental platforms
have dual FSB architectures with two branches to memory and two channels per branch.
All four types of systems are heterogeneous in the sense that each has a unique combination of processor architecture and memory subsystem. If we assume that Intel Core microarchitecture/Pentium® M and NetBurst constitute two types of processors, and LLC-4 MB/FSB-1066/FBD-533 and LLC-2 MB/FSB-800/DDR2-400 constitute two types of memory, all four platforms can be mapped as having unique processor/memory architecture combinations. Note that all four platforms also have vastly different power and performance characteristics. For example, the Intel Core microarchitecture is superior to NetBurst both in terms of performance and power efficiency. FBD based memory, on the other hand, provides higher throughput in our systems at the expense of elevated power consumption due to increased DDR2 bus speed and the power requirements of the Advanced Memory Buffer (AMB) on the buffered DIMMs. The four platforms occupy separate quadrants of a heterogeneity space with dimensions of microarchitecture heterogeneity and memory subsystem heterogeneity, as shown in Fig. 4(b). We refer to this initial level of heterogeneity as “across-platform heterogeneity”. However, in addition to this, all these server platforms also support chip-level DVFS.
This leads to a second degree of heterogeneity, where one type of platform can have instances in a data center that are configured to operate at different frequencies. We refer to this as “within-platform heterogeneity”. As process variations increasingly result in the binning of produced chips into different operating points, this within-platform heterogeneity becomes an inherent property of the general data center landscape. Finally, many of these platforms may incorporate some processor dynamic power management (DPM) techniques that adaptively alter platform behavior at runtime. This creates a third source of heterogeneity, “DPM-capability heterogeneity”, where platforms with built-in DPM hooks exhibit different power/performance characteristics from the ones with no DPM capabilities. In Table 1, we show how these three levels of heterogeneity quickly escalate the number of distinct platform configurations in a data center scenario.
All experimental power measurements are performed using the Extech 380801 power analyzer. The power is measured at the wall and represents total AC power consumption of the entire system. The power numbers presented in this paper are obtained by averaging the instantaneous system power consumption over the entire run of each workload. Our assumption is that infrastructure support for monitoring power consumption will be utilized to obtain this type of workload specific power characteristics online, instead of parameterized models. For example, all power supplies which adhere to the latest power supply monitoring interface (PSMI) specification support out-of-band current/voltage sampling, allowing for per platform AC power monitoring reflected by our actual power measurements.
4.2 Application model
When power managing computing environments, improvements can be attained with a variety of approaches. In this work, we consider two scenarios. The first assumes a lack of budgeting constraints, concentrating on a workload allocation that reduces power consumption while maintaining baseline application performance. In other words, we maximize the performance per watt, while holding performance constant. The second addresses power budgeting by performing load shedding to reduce power consumption while minimizing performance impact to workloads. We consider application performance in terms of throughput, or the rate at which transaction operations are performed. Therefore, it is not the execution time of each transaction that defines performance, but the rate at which multiple transactions can be sustained. This type of model is representative of applications such as transaction based web services or payroll systems.
The goal of HALM is to evaluate the power-efficiency tradeoffs of assigning a workload to a variety of platforms. Since the performance capabilities of each platform are different, the execution time to perform a single operable unit, or atomic transaction, varies across them. As previously mentioned, virtualization technologies can help to extend the physical resources dedicated to applications when necessary to maintain performance by increasing the number of platforms used to execute transactions. In particular, transactions can be distributed amongst nodes until the desired throughput is reached.
For our analysis, we consider applications that mimic the high performance computational applications common to data center environments and also heavily exercise the power hungry components of server platforms, the processor and memory. Two aspects of these workloads are captured in our experimental analysis. First, these workloads are inherently transactional, such as the previous financial payroll example or the processing of risk analysis models across different inputs common to investment banking. Second, with the ability to incorporate large amounts of memory into platforms at relatively low costs, these applications often execute mostly from memory, with little or no I/O being performed. Though I/O such as network use can play a significant role in multi-tier enterprise applications, we leave consideration of such characteristics to future work. To realize our application model, while also providing deterministic and repeatable behavior for our experimentation, we utilize benchmarks from the SPEC CPU2000 suite as representative examples of transaction instances. SPEC benchmarks allow for the isolation of processor and memory components, while also generating different memory loads. Indeed, many SPEC benchmarks exhibit significant measured memory bandwidth of 5–8 GB/sec on our systems. In order to provide an unbiased workload set, we include all SPEC benchmarks in our experiments. For each application, we specify an SLA in terms of required transaction processing rate, equal to the throughput achievable on the Woodcrest platform.
5 Workload behavior estimation
The power/performance predictor component of our HALM framework can be implemented in multiple ways. For example, one can profile a set of microbenchmarks on all platform configurations and develop statistical mapping functions across these configurations. However, as the platform types and heterogeneity increase, the overhead of such approaches can be prohibitive. Instead, we develop a predictor that relies on the architectural platform properties and adjusts its predictions based on the heterogeneity specifications. We refer to this model as the “Blocking Factor (BF) Model”. The BF model simply decomposes execution cycles into CPU cycles and memory cycles. CPU cycles represent the execution with a perfect last-level cache (LLC), while memory cycles capture the finite cache effects. This model is similar to the “overlap model” described by Chou et al. [5]. With the BF model, the CPI (cycles per instruction) of a workload can be represented as in (1). Here CPICORE represents the CPI with a perfect LLC. This term is independent from the underlying memory subsystem. CPIMEM accounts for the additional cycles spent for memory accesses with a finite-sized cache:
CPI = CPICORE + CPIMEM. (1)
The CPIMEM term can be expanded into architecture and workload specific characteristics. Based on this, the CPI of a platform at a specific frequency f1 can be expressed as in (2). Here, MPI is the memory accesses per instruction, which is dependent on the workload and the LLC size; L is the average memory latency, which varies based upon the memory subsystem specifications; and BF is the blocking factor that accounts for the overlapping concurrent execution during memory accesses, which is a characteristic of the workload:
CPI(f1) = CPICORE(f1) + MPI · L(f1) · BF(f1). (2)
Using variants of (2), performance prediction can be performed relatively easily for within-platform heterogeneity, as well as across-platform heterogeneity. For within-platform heterogeneity, the frequency-dependent components of (2) are scaled with frequency to predict workload performance at a different frequency setting. The top chart in Fig. 5 provides results for an example of this type of prediction with an orthogonal platform (Sossaman). The figure contains the actual measured performance for our workloads together with the predicted performance.

In the latter case of across-platform heterogeneity, the natural decoupling of the microarchitectural and memory subsystem differences in the BF model enables us to estimate application performance on a platform lying on a different corner of the memory and microarchitecture heterogeneity space. Among our four experimental platforms, two “orthogonal platforms”, which span two opposite corners of the platform heterogeneity quadrants in Fig. 4(b), can be used to predict performance on a third “derived platform”. The lower chart in Fig. 5 shows the prediction results for the Woodcrest platform, whose performance is “derived” using the CPICORE and CPIMEM characteristics of the orthogonal platforms (Sossaman and Dempsey respectively). Overall, for the orthogonal platforms, the BF model can very accurately predict performance with an average prediction error of 2%. For the derived platforms, our predictor can track actual execution times very well, though with an increased average prediction error of 20%. In the following sections, we show that this performance prediction methodology provides sufficient accuracy to represent workload behavior and allows HALM to achieve close to optimal allocations. Further details of this prediction methodology can be found in our previous work [22].
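The two prediction paths can be sketched in a few lines. All numeric values below are illustrative placeholders, not measurements from the paper, and the frequency-scaling assumption (wall-clock memory latency stays fixed, so its cost in core cycles grows with core frequency) is one plausible reading of "scaling the frequency-dependent components" rather than the paper's stated rule.

```python
def cpi(cpi_core, mpi, latency_cycles, bf):
    """Equation (2): CPI = CPICORE + MPI * L * BF."""
    return cpi_core + mpi * latency_cycles * bf

def predict_within_platform(cpi_core, mpi, latency_cycles_f1, bf, f1, f2):
    """Within-platform heterogeneity: assume memory latency is fixed in
    wall-clock time, so its cycle cost scales with core frequency f2/f1."""
    return cpi(cpi_core, mpi, latency_cycles_f1 * (f2 / f1), bf)

def predict_derived(cpi_core_uarch_twin, mpi, latency_mem_twin, bf):
    """Across-platform heterogeneity: combine CPICORE taken from the
    orthogonal platform sharing the microarchitecture with the memory
    term taken from the one sharing the memory subsystem."""
    return cpi(cpi_core_uarch_twin, mpi, latency_mem_twin, bf)

# Illustrative numbers only:
base = cpi(0.9, 0.004, 200, 0.6)                                  # 1.38
scaled = predict_within_platform(0.9, 0.004, 200, 0.6, 2.0, 3.0)  # 1.62
```

The derived-platform call is what the Woodcrest example above does in spirit: core behavior from Sossaman, memory behavior from Dempsey.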
The final heterogeneity type supported by our predictor is DPM-capability heterogeneity. For this, we consider a platform that enables DVFS during memory-bound execution regions of an application. We implement this functionality as part of OS power management, based on prior work [14]. To incorporate DPM awareness, we extend the predictor component to estimate the potential power savings that can be attained when executing a workload on a DPM-enabled platform. Experimental results show that there is a strong correlation between the MPI of a workload and its power saving potential. Therefore, we utilize the MPI attribute in the workload descriptors to predict the power saving potentials of workloads on DPM-enabled platforms. Figure 6 shows that our MPI-based prediction approach effectively captures the power saving potentials of different workloads and successfully differentiates applications that can benefit significantly from being allocated to a DPM-enabled machine. As we describe in Sect. 6.1, we use this predictor to choose workloads that should be assigned to the DPM-enabled platforms.

Fig. 6 Power saving predictions for DPM-enabled platforms
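One simple way to realize an MPI-based savings predictor is to fit a line to profiled (MPI, measured savings) pairs and evaluate it for new workloads. The least-squares fit below is an illustrative stand-in, not the paper's actual predictor, and all names are assumptions.

```python
# Hedged sketch of an MPI-driven DPM savings predictor: fit a linear
# model over profiled (MPI, fractional power savings) observations,
# then use it to score unprofiled workloads.

def fit_linear(points):
    """Least-squares fit of y = a*x + b over (x, y) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def predict_dpm_savings(mpi, a, b):
    # Clamp at zero: a workload cannot have negative savings.
    return max(0.0, a * mpi + b)
```

Fitting on a handful of profiled workloads per DPM-enabled platform type would then let the allocator rank candidates by predicted savings without profiling every application.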
6 Management policies
6.1 HALM allocation policy
After processing the workload and platform descriptors, and utilizing our BF model for performance prediction, the next step is to perform allocation of resources to a set of applications in a data center. Evaluations are based on a greedy policy for allocating workloads. In particular, with each application i, we associate a cost metric for executing on each platform type k. Workloads are then ordered into a scheduling queue based on their maximum cost metric across all platform types. The allocator then performs application assignment based on this queue, where applications with higher worst-case costs have priority. The platform type chosen for an application is a function of this cost metric across the available platforms as well as the estimated DPM benefits. As a cost metric for our policy, we define N_i,k, the number of platforms of type k required to execute a workload i. This value is clearly a function of both the performance capabilities of the platform and the SLA requirement of the workload. N_i,k can be analytically defined, given the transaction-based application model utilized in our work. For each application i, the service level agreement (SLA) specifies that X_i transactions should be performed every Y_i time units. If t_i,k is the execution time of a transaction of application i on platform k, the resulting number of platforms required to achieve the SLA can be expressed with (3):
N_i,k = ⌈ X_i / ⌊ Y_i / t_i,k ⌋ ⌉. (3)
The t_i,k values are provided by the performance predictor. It should be noted that there is a discretization in N_i,k, due to the fact that individual atomic transactions cannot be parallelized across multiple platforms. Thanks to this inherent discretization, N_i,k is better able to absorb prediction errors, making it a strong choice as a cost metric (other possible metrics are discussed and defined in our previous work [22]). Given the use of N_i,k as our cost metric, our allocation approach first determines the platform types for which (1) there are enough available systems to allocate the workload and (2) the cost metric is minimized. We then use DPM savings to determine whether a more power efficient platform alternative should be used among those with the same cost value. In other words, if there are multiple platform types for which an application has the same N_i,k value, we utilize a DPM-specific threshold to decide whether or not it should be scheduled to a DPM-enabled platform type. As we demonstrate in the following section, this threshold-based approach can be effective in identifying workloads that can take advantage of DPM capabilities.
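The cost metric of (3) and the greedy allocation described above might be sketched as follows. All names, the data layout, and the threshold value are assumptions for illustration; the sketch also assumes t_i,k < Y_i so the floor in (3) is nonzero.

```python
import math

def n_required(X, Y, t_ik):
    # Equation (3): atomic transactions cannot be split across
    # platforms, hence the ceiling/floor discretization.
    return math.ceil(X / math.floor(Y / t_ik))

def allocate(workloads, platforms, avail, dpm_threshold=0.05):
    """Greedy HALM-style allocator (sketch).

    workloads : dicts with SLA 'X', 'Y', predicted per-type
                transaction times 't', optional 'dpm_savings'
    platforms : dicts with a 'dpm' capability flag
    avail     : mutable list of free machine counts per type
    """
    types = range(len(platforms))

    def cost(w, k):
        return n_required(w['X'], w['Y'], w['t'][k])

    # Higher worst-case cost across types => higher priority.
    order = sorted(enumerate(workloads),
                   key=lambda iw: -max(cost(iw[1], k) for k in types))
    plan = {}
    for i, w in order:
        feasible = [(cost(w, k), k) for k in types
                    if avail[k] >= cost(w, k)]
        if not feasible:
            continue  # cannot place this workload
        best = min(c for c, _ in feasible)
        tied = [k for c, k in feasible if c == best]
        choice = tied[0]
        # Among cost-tied types, prefer a DPM-enabled platform only
        # for workloads whose predicted savings clear the threshold.
        for k in tied:
            if platforms[k]['dpm'] and w.get('dpm_savings', 0.0) > dpm_threshold:
                choice = k
                break
        avail[choice] -= best
        plan[i] = (choice, best)
    return plan
```

For example, a workload with X = 10 transactions per Y = 5 time units needs ⌈10/⌊5/1.0⌋⌉ = 2 machines of a type with t = 1.0, but only 1 machine of a faster type with t = 0.5, so the greedy pass selects the faster type.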
6.2 HALM power budgeting policy
In order to address transient power delivery or cooling issues, it is sometimes necessary to temporarily reduce power consumption in a data center. To provide this mechanism, we develop a load shedding policy based upon an existing workload allocation scheme. The goal of the policy is to reduce the amount of resources provided to applications in order to meet a power budget while still allowing all workloads to make some progress. In other words, application performance may be degraded compared to prescribed SLAs, but all applications achieve some fraction of their SLA.
Our power budgeting policy is, again, a greedy approach. For all applications with resources that can be shed, i.e. applications that utilize more than one platform, we define a power-efficiency metric as the throughput per Watt provided by each resource. Afterwards, the resources with minimal power efficiency are shed until the power budget is met. As our experimental results demonstrate, this simple metric allows for improved performance when power budgeting must be performed, as well as better fairness across workloads in terms of the performance degradation experienced.
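The greedy shedding loop can be sketched as below. The data layout and names are assumptions; the only constraint taken from the policy description is that a workload's last remaining platform is never shed, so every application keeps making progress.

```python
# Illustrative greedy load shedding: repeatedly shed the least
# power-efficient resource (throughput per Watt) among workloads
# holding more than one platform, until the budget is met.

def shed_to_budget(resources, budget):
    """resources: list of (workload_id, throughput, watts), one entry
    per allocated platform; returns the entries to shed."""
    active = list(resources)
    shed = []

    def power(rs):
        return sum(w for _, _, w in rs)

    def count(rs, wid):
        return sum(1 for i, _, _ in rs if i == wid)

    while power(active) > budget:
        # Candidates: resources of workloads holding >1 platform.
        cand = [r for r in active if count(active, r[0]) > 1]
        if not cand:
            break  # every workload is down to one platform
        victim = min(cand, key=lambda r: r[1] / r[2])  # worst thr/Watt
        active.remove(victim)
        shed.append(victim)
    return shed
```

Shedding by worst throughput-per-Watt first is what yields both the performance and the fairness benefits reported in Sect. 7.2: the resources removed are those contributing the least useful work per unit of power.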
7 Experimental evaluation
7.1 Increasing power efficiency
In order to evaluate our heterogeneity-aware allocation approach, we perform power and performance measurements of our SPEC-based representative transactional workloads across each type of platform. To scale these results to the number of platforms present in data centers, this measured data is extrapolated analytically using a data center allocation simulator which combines real power and performance data, prediction output, and allocation policy definitions to calculate power efficiency in various data center configurations. In the simulator, we provide the output of the predictor as input to the allocation policy. We always assume that the platforms which are profiled are the 2 GHz Sossaman platform and the 3.7 GHz Dempsey system. Since we assume the workload attributes are profiled accurately on these systems, for fairness we also assume that for these two platforms performance data is obtained via profiling as well and is therefore known perfectly. We then consider three different scenarios: (1) all other platform performance information is known perfectly (oracle); (2) our BF model is used to predict performance for the remainder of platforms as described in Sect. 5 (BF model); (3) incorporating a simple statistical regression approach (Stat. Est.). For this regression method, we profile a subset of applications across all platforms to obtain linear performance prediction models parameterized by variables that can be obtained by profiling a workload on the 2 GHz Sossaman and 3.7 GHz Dempsey systems (CPI, MPI, etc.). The regression models can then be used to predict performance of any application. The baseline allocation scheme we compare against is a random one, since it closely approximates the common round-robin or utilization-based approach.
The efficiency improvements achievable in a data center are also dependent upon the system's current mix of applications. To obtain our results, we randomly pick applications and allocate them using the random approach until no more workloads can be scheduled. Using the resulting set of workloads, we then evaluate power consumption when using our prediction and allocation policies, and compare them against the random allocation result. This is repeated a hundred times for each of our data points.
We first look at the benefits achieved with HALM in data center configurations with across-platform and within-platform heterogeneity but no DPM support. In particular, we include the four base platforms, Woodcrest, Sossaman, Dempsey, and Irwindale, as well as the frequency variations of the platforms. We create data center configurations with equal numbers of each type of system. Trends are consistent across various data center sizes, so for brevity, we include here only results with 1000 platforms of each type. The resulting data center has 13 types of platforms, and power consumptions vary with allocation as shown in Fig. 7(a). The first interesting observation is that platform heterogeneity allows us to achieve improved benefits over a simple random approach. Indeed, we see improvements of 22% with perfect knowledge and 21% using our BF-based prediction compared to a random allocation policy. We also observe a significant difference between the statistical and analytical prediction schemes. The regression approach is unable to scale in terms of accuracy with increased heterogeneity, whereas the BF approach achieves close to optimal power savings.
In order to evaluate how well our allocator can exploit DPM support, we extend the thirteen-platform-type configuration with an additional Woodcrest 3 GHz platform which provides DPM support. We again find that our BF prediction method can provide improved aggregate savings across all machines over the statistical approach, as shown in Fig. 7(b). To more closely determine our ability to exploit DPM mechanisms, we also evaluate the power consumption of the thousand DPM-enabled platforms (all of which are active). We find that our BF model based allocation is able to improve the power efficiency of these platforms by 3.3%. This illustrates the potential of HALM to provide additional benefits when platforms vary in the power management they support.
7.2 Maximizing performance under power budgets
As a second benefit of HALM, we evaluate the ability to perform load shedding when power budgeting must be performed. In particular, our goal is to maximize performance obtained in the data center while observing power budgets. For the purposes of this paper, we consider transient power budgeting where workloads are not (re-)migrated, but instead, based upon an existing allocation, resources are temporarily withheld from applications and placed into low power states. For comparison, we consider three such initial allocations from the previous section: one based upon a random allocation, one based upon perfect oracle performance information, and finally an allocation based upon prediction with our BF approach. For each allocation, we consider two load shedding policies: one which randomly selects an application and sheds resources (i.e. a single compute node) if possible, and our greedy "smart" policy described in Sect. 6.2.
Figure 8(a) provides the aggregate performance across all workloads in the data center for different power budgets. The performance results are normalized with respect to the maximum achievable performance at the maximum unconstrained power budget (100%). The figure shows how the performance of the different allocation policies decreases with decreasing power budgets. The range of applied power budgets is limited to 50%, as there is no possible allocation that meets the budget below this point. The six scenarios considered are random shedding based upon a random allocation (rand-rand), our load shedding policy based upon a random allocation (smart-rand), similarly the two shedding approaches based upon the oracle-based allocation (rand-oracle and smart-oracle), and finally, shedding based upon a BF prediction based allocation (rand-BF and smart-BF). We see multiple interesting trends. First, the intrinsic benefits of the heterogeneity-aware allocations towards budgeting are apparent in the figure: performance does not begin to degrade until lower power budgets compared to a random allocation scheme. We also see that, given a particular allocation, our shedding policy provides performance benefits across the set of power budgets. Moreover, again, we find that our BF prediction model behaves very close to an oracle-based allocation when our load shedding policy is used. Overall, we see benefits of up to 18% in performance degradation compared to a random load shedding policy based upon a random allocation. Figure 8(b) evaluates the performance degradations for the six approaches by also taking the fairness of load shedding into account. Here, we show the aggregate data center performance as the harmonic mean of the individual workload throughputs, which is a commonly used metric for evaluating fairness [20]. We see from the figure that a random allocation always exhibits poor performance regardless of the load shedding policy used. A heterogeneity-aware allocation, on the other hand, provides improved fairness, particularly when combined with our load shedding policy.
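The fairness metric used above is simply the harmonic mean of per-workload throughputs, which penalizes allocations that starve any single workload:

```python
# Harmonic mean of per-workload throughputs: dominated by the
# worst-off workload, so balanced degradation scores higher than
# unfair degradation with the same arithmetic mean.

def harmonic_mean(throughputs):
    return len(throughputs) / sum(1.0 / t for t in throughputs)

# (0.8, 0.8) and (1.0, 0.6) have the same arithmetic mean of 0.8,
# but the balanced pair has harmonic mean 0.8 vs. 0.75 for the
# unfair pair.
```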
8 Related work
HALM builds upon existing work and extends the state of the art in power management research. A variety of mechanisms exist to provide power and thermal management support within a single platform. Brooks and Martonosi proposed mechanisms for the enforcement of thermal thresholds on the processor [3], focusing on single platform characteristics as opposed to the data center level management achieved with HALM. Processor frequency and voltage scaling based upon memory access behavior has been shown to successfully provide power savings with minimal impact on applications. Resulting solutions include hardware-based approaches [19] and OS-level techniques that set processor modes based on predicted application behavior [14]. HALM is designed to manage load while being aware of such underlying power management occurring in server platforms. Power budgeting of SMP systems with a performance loss minimization objective has also been implemented via CPU throttling [16]. Other budgeting solutions extend platform support for fine-grain server power limiting [18]. The power budgeting achieved with HALM is based upon the use of resource allocation to reduce power consumption across multiple systems, as opposed to throttling the performance of individual components.
At the data center level, incorporating temperature awareness into workload placement has been proposed by Moore et al. [21], along with emulation environments for studies of the thermal implications of power management [11]. HALM can use these thermal-aware strategies to perform power budgeting based upon data center temperature characteristics. Chase et al. discuss how to reduce power consumption in data centers by turning servers on and off based on demand [4]. Utilizing this type of cluster reconfiguration in conjunction with DVFS [7] and the use of spare servers [23] has been investigated as well. As opposed to these approaches, HALM attempts to reduce power consumption by intelligently managing workloads across heterogeneous servers. Enforcing power budgets within data centers by allocating power in a non-uniform manner across nodes has been shown to be an effective management technique [9]. Techniques for enforcing power budgets at blade enclosure granularities have also been discussed [24]. HALM budgets aggregate power consumption via resource allocation, without assigning per-server power budgets as in these previous approaches.
Heterogeneity has been considered to some degree in prior work, including the evaluation of heterogeneous multicore architectures with different core complexities [17]. In comparison, HALM considers platform-level heterogeneity as opposed to processor asymmetry. In cluster environments, a scheduling approach for power control has been proposed for processors with varying fixed frequencies and voltages [10]. HALM supports heterogeneity across additional dimensions, such as power management capabilities and memory. A power-efficient web server with intelligent request distribution in heterogeneous clusters is another example which considers leveraging heterogeneity in enterprise systems [12]. HALM goes beyond these methods by considering not just the differences in platforms' performance capabilities, but also in their power management capabilities.
9 Conclusions and future work
Power management in data center environments has become an important area of research, in part because power delivery and cooling limitations are quickly becoming a bottleneck in the provisioning of performance required by increasingly demanding applications. This paper makes use of the management flexibility afforded by virtualization solutions to develop a heterogeneity-aware load management (HALM) system. HALM improves power management capabilities by exploiting the natural heterogeneity of platforms in data centers, including differences in the dynamic power management support that may be available. We introduce a three-phase approach to mapping workloads to underlying resources to improve power efficiency, consisting of structured platform and workload descriptors, a prediction component to estimate the performance and power characteristics of various workload-to-platform mappings, and finally an allocator which utilizes policies and prediction results to make allocation decisions. We also evaluate a load shedding policy based upon the resulting allocations to improve performance when power budgeting must be performed.
Our results underscore two major conclusions. First, we show that by intelligently considering the varying power management capabilities of platforms, the ability of these systems to obtain power savings using their management mechanisms can be vastly improved when compared to other assignment models. Using representative data center configurations consisting of older P4-based platforms up to Intel Core microarchitecture based systems, we find that our allocation architecture can improve power efficiency by 20% on average. In addition, our results show that by performing intelligent load shedding when power budgets must be observed, 18% improvements in performance degradation can be obtained when using HALM.
In this paper, we present the beginning of our investigation into exploiting platform heterogeneity and emerging virtualization support to improve the power characteristics of enterprise computing environments. As future work, we plan to integrate the management tradeoffs and lessons learned from this work into virtualization-layer management applications. This includes the consideration of distributed virtualized workloads such as tiered web services, where different components may be appropriate for each layer, including heterogeneous I/O devices. The results presented in this paper support the potential of this area of research for power managing heterogeneous computing systems.
References
1. Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Neugebauer, R., Pratt, I., Warfield, A.: Xen and the art of virtualization. In: Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), 2003
2. Bianchini, R., Rajamony, R.: Power and energy management for server systems. IEEE Comput. 37(11), 68–76 (2004)
3. Brooks, D., Martonosi, M.: Dynamic thermal management for high-performance microprocessors. In: Proceedings of the 7th International Symposium on High-Performance Computer Architecture (HPCA), January 2001
4. Chase, J., Anderson, D., Thakar, P., Vahdat, A., Doyle, R.: Managing energy and server resources in hosting centers. In: Proceedings of the 18th Symposium on Operating Systems Principles (SOSP), October 2001
6. Clark, C., Fraser, K., Hand, S., Hansen, J.G., Jul, E., Limpach, C., Pratt, I., Warfield, A.: Live migration of virtual machines. In: Proceedings of the 2nd ACM/USENIX Symposium on Networked Systems Design and Implementation (NSDI), May 2005
7. Elnozahy, E.N., Kistler, M., Rajamony, R.: Energy-efficient server clusters. In: Proceedings of the Workshop on Power-Aware Computing Systems, February 2002
8. Fan, X., Weber, W.-D., Barroso, L.: Power provisioning for a warehouse-sized computer. In: Proceedings of the International Symposium on Computer Architecture (ISCA), June 2007
9. Femal, M., Freeh, V.: Boosting data center performance through non-uniform power allocation. In: Proceedings of the Second International Conference on Autonomic Computing (ICAC), 2005
10. Ghiasi, S., Keller, T., Rawson, F.: Scheduling for heterogeneous processors in server systems. In: Proceedings of the International Conference on Computing Frontiers, 2005
11. Heath, T., Centeno, A.P., George, P., Ramos, L., Jaluria, Y., Bianchini, R.: Mercury and Freon: Temperature emulation and management in server systems. In: Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 2006
12. Heath, T., Diniz, B., Carrera, E.V., Meira, W. Jr., Bianchini, R.: Energy conservation in heterogeneous server clusters. In: Proceedings of the 10th Symposium on Principles and Practice of Parallel Programming (PPoPP), 2005
13. Hewlett-Packard, Intel, Microsoft, Phoenix, and Toshiba: Advanced configuration and power interface specification. http://www.acpi.info (2004)
14. Isci, C., Contreras, G., Martonosi, M.: Live, runtime phase monitoring and prediction on real systems with application to dynamic power management. In: Proceedings of the 39th International Symposium on Microarchitecture (MICRO-39), December 2006
15. Koh, Y., Knauerhase, R., Brett, P., Bowman, M., Wen, Z., Pu, C.: An analysis of performance interference effects in virtual environments. In: Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2007
16. Kotla, R., Ghiasi, S., Keller, T., Rawson, F.: Scheduling processor voltage and frequency in server and cluster systems. In: Proceedings of the Workshop on High-Performance, Power-Aware Computing (HP-PAC), 2005
17. Kumar, R., Tullsen, D., Ranganathan, P., Jouppi, N., Farkas, K.: Single-ISA heterogeneous multi-core architectures for multithreaded workload performance. In: Proceedings of the International Symposium on Computer Architecture (ISCA), June 2004
18. Lefurgy, C., Wang, X., Ware, M.: Server-level power control. In: Proceedings of the IEEE International Conference on Autonomic Computing (ICAC), June 2007
19. Li, H., Cher, C., Vijaykumar, T., Roy, K.: VSV: L2-miss-driven variable supply-voltage scaling for low power. In: Proceedings of the IEEE International Symposium on Microarchitecture (MICRO-36), 2003
20. Luo, K., Gummaraju, J., Franklin, M.: Balancing throughput and fairness in SMT processors. In: Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), November 2001
21. Moore, J., Chase, J., Ranganathan, P., Sharma, R.: Making scheduling "cool": Temperature-aware workload placement in data centers. In: Proceedings of USENIX '05, June 2005
22. Nathuji, R., Isci, C., Gorbatov, E.: Exploiting platform heterogeneity for power efficient data centers. In: Proceedings of the IEEE International Conference on Autonomic Computing (ICAC), June 2007
23. Rajamani, K., Lefurgy, C.: On evaluating request-distribution schemes for saving energy in server clusters. In: Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), March 2003
24. Ranganathan, P., Leech, P., Irwin, D., Chase, J.: Ensemble-level power management for dense blade servers. In: Proceedings of the International Symposium on Computer Architecture (ISCA), 2006
25. Sugerman, J., Venkitachalam, G., Lim, B.-H.: Virtualizing I/O devices on VMware Workstation's hosted virtual machine monitor. In: Proceedings of the USENIX Annual Technical Conference, 2001
26. Zhang, W.: Linux virtual server for scalable network services. In: Ottawa Linux Symposium, 2000
Ripal Nathuji is a Ph.D. candidate in the School of Electrical and Computer Engineering at the Georgia Institute of Technology. He previously received his M.S. in Computer Engineering from Texas A&M University and his B.S. in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology. His current research focuses on system-level power management of computing systems, with applications to virtualized enterprise server environments.
Canturk Isci is a senior member of technical staff at VMware. His research interests include power-aware computing systems and workload-adaptive dynamic management. He has a Ph.D. and an M.A. in electrical engineering from Princeton University, an M.S. in VLSI system design from the University of Westminster, London, UK, and a B.S. in electrical engineering from Bilkent University, Ankara, Turkey.
Eugene Gorbatov is a researcher at Intel's Energy Efficient Systems lab. His current research focuses on platform power management, particularly the interaction of different platform components and system software.
Karsten Schwan is a professor in the College of Computing at the Georgia Institute of Technology, and is also the Director of the Center for Experimental Research in Computer Systems (CERCS). He obtained his M.S. and Ph.D. degrees from Carnegie-Mellon University in Pittsburgh, Pennsylvania, where he began his research in high performance computing, addressing operating and programming systems support for the Cm* multiprocessor. His current work ranges from topics in operating and communication systems, to middleware, to parallel and distributed applications, focusing on information-intensive distributed applications in the enterprise domain and in the high performance domain.