Virtual Melting Temperature: Managing Server Load to Minimize Cooling Overhead with Phase Change Materials Matt Skach 1 , Manish Arora 2,3 , Dean Tullsen 3 , Lingjia Tang 1 , Jason Mars 1 University of Michigan 1 -- Advanced Micro Devices, Inc. 2 -- UC San Diego 3 ISCA ‘18
Datacenters
[Images: Facebook Ireland datacenter, Facebook datacenter]
Huge warehouses full of servers that host the internet and the cloud
Datacenter Cooling
● Heat must be removed to prevent:
○ Overheating
○ Thermal downclocking
○ Component failure
● Store energy in a solid-to-liquid phase change
● Commercial paraffin wax offers the best properties of currently available PCMs (Skach, 2015)
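As a rough illustration of how much energy a solid-to-liquid phase change can store, the sketch below multiplies wax mass by latent heat of fusion. The latent heat figure (about 200 kJ/kg, a typical textbook value for paraffin) and the function names are assumptions for illustration; the slides give no numbers for the wax used.

```python
# Illustrative values only; neither the constant nor the function names come from the slides.
LATENT_HEAT_J_PER_KG = 200_000.0  # ASSUMPTION: ~200 kJ/kg latent heat of fusion for paraffin


def stored_energy_joules(wax_mass_kg: float, melted_fraction: float) -> float:
    """Energy absorbed by melting a fraction of the wax (latent heat only)."""
    if not 0.0 <= melted_fraction <= 1.0:
        raise ValueError("melted_fraction must be between 0 and 1")
    return wax_mass_kg * melted_fraction * LATENT_HEAT_J_PER_KG


def hours_of_absorption(wax_mass_kg: float, excess_heat_watts: float) -> float:
    """How long fully solid wax can absorb a constant excess heat flow before it is all melted."""
    if excess_heat_watts <= 0:
        raise ValueError("excess_heat_watts must be positive")
    return stored_energy_joules(wax_mass_kg, 1.0) / excess_heat_watts / 3600.0
```

Under these assumed values, 1 kg of wax absorbing a constant 50 W of excess heat would take a little over an hour to melt completely. Sensible heat (warming the wax before and after melting) is ignored.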
The problem with passive TTS
Thermal Time Shifting (TTS):
● Paraffin has a limited range of melting temperatures
● Melting temperature cannot be changed
● Power and temperature profiles vary over the lifetime of servers
Virtual Melting Temperature
● Datacenters need more flexibility
● Create a “virtual” melting temperature separate from the actual melting temperature
Test Infrastructure
● 2U High Throughput Server
● 2-day Google Workload trace divided between 5 datacenter workloads
Test Methodology
● 5 common datacenter workloads
1. Web Search
2. Data Caching
3. Video Encoding
4. Virus Scan
5. Clustering
● Consider a datacenter where all are colocated
○ Contention mitigation techniques applied (e.g. Bubble-Up (Mars, 2011) and Protean Code (Laurenzano, 2014))
Baseline: Load Balancing Schedulers
● Round Robin and Coolest First
● Problem: Average cluster temperature is too low to melt wax
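The two baseline policies can be sketched as follows. This is a toy model, not the schedulers used in the talk: server names, the fixed per-job temperature rise, and the helper `place_coolest_first` are all hypothetical.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def round_robin(servers: List[str]) -> Iterator[str]:
    """Yield servers in a fixed rotation, ignoring temperature."""
    return cycle(servers)


def coolest_first(temps_c: Dict[str, float]) -> str:
    """Pick the server with the lowest current temperature."""
    return min(temps_c, key=temps_c.get)


def place_coolest_first(temps_c: Dict[str, float], job_rise_c: List[float]) -> Dict[str, float]:
    """Place jobs one by one on the coolest server.

    ASSUMPTION (toy model): each job raises its server's temperature by a fixed amount.
    """
    temps = dict(temps_c)
    for rise in job_rise_c:
        target = coolest_first(temps)
        temps[target] += rise
    return temps


if __name__ == "__main__":
    start = {"a": 30.0, "b": 30.0, "c": 30.0}
    print(place_coolest_first(start, [4.0] * 6))
```

In this toy run, six identical jobs end up spread evenly, two per server. Both policies flatten the temperature across the cluster, which is why the average can stay below the wax melting point.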
Thermal Aware VMT (VMT-TA)
● Categorize jobs based upon thermal characteristics
○ Binary classification: Would they melt significant wax in isolation?
● Grouping Value (GV): Controllable ratio of group size
○ Proportional to hot group size
● Locate ‘hot jobs’ together in ‘hot group’ to melt wax
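A minimal sketch of the grouping idea above, under stated assumptions: the slides say only that GV is proportional to hot group size, so reading GV as the percentage of servers in the hot group is a guess made here for illustration. The `Job` class, `hot_group_size`, and `place_jobs` are hypothetical names, and the rotation within each group is a placeholder policy.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    name: str
    is_hot: bool  # True if the job would melt significant wax in isolation


def hot_group_size(num_servers: int, gv: float) -> int:
    # ASSUMPTION: GV is interpreted as the percentage of servers in the hot group.
    size = round(num_servers * gv / 100.0)
    return max(0, min(num_servers, size))


def place_jobs(jobs: List[Job], servers: List[str], gv: float) -> Dict[str, str]:
    """Send hot jobs to the hot group and all other jobs to the cold group."""
    if not servers:
        raise ValueError("need at least one server")
    n_hot = hot_group_size(len(servers), gv)
    hot_group, cold_group = servers[:n_hot], servers[n_hot:]
    counters = {"hot": 0, "cold": 0}
    placement: Dict[str, str] = {}
    for job in jobs:
        # Fall back to the other group if the preferred one is empty.
        if job.is_hot:
            label = "hot" if hot_group else "cold"
        else:
            label = "cold" if cold_group else "hot"
        group = hot_group if label == "hot" else cold_group
        # Rotate within the chosen group (placeholder policy).
        placement[job.name] = group[counters[label] % len(group)]
        counters[label] += 1
    return placement
```

Concentrating hot jobs on a subset of servers raises the temperature of that subset above the melting point even when the cluster-wide average would not reach it.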
Thermal Aware VMT Results
● Hot Group sized to melt wax during peak hours
● Balance between melting wax too soon and not melting enough wax
○ GV=24: Hot group is too big
○ GV=22: Hot group is just right
○ GV=20: Hot group is too small
Wax Aware VMT (VMT-WA)
● Begin with same setup as VMT-TA
● When wax in hot group is fully melted, expand hot group
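The expansion rule can be sketched as a single check. How wax state is measured, how many servers are added per expansion, and the "fully melted" threshold are not given in the slides; `melted_fraction`, `step`, and `full` below are hypothetical parameters.

```python
from typing import List


def expand_hot_group(
    n_hot: int,
    melted_fraction: List[float],
    step: int = 1,        # ASSUMPTION: servers added per expansion
    full: float = 0.999,  # ASSUMPTION: threshold for "fully melted"
) -> int:
    """Return the new hot-group size.

    Servers 0..n_hot-1 form the hot group; melted_fraction[i] is an
    estimate of how much of server i's wax has melted (0.0 to 1.0).
    """
    hot = melted_fraction[:n_hot]
    if hot and all(f >= full for f in hot):
        # All wax in the hot group is melted: pull in more servers,
        # capped at the cluster size.
        return min(len(melted_fraction), n_hot + step)
    return n_hot
```

Calling this periodically lets a hot group that starts slightly too small grow on its own as its wax saturates, instead of relying on a GV chosen in advance.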
Wax Aware VMT Results
● Hot Group slightly too small: automatically expands during peak load
● Both VMT-TA and VMT-WA work well at the ideal GV
● VMT-WA offers much more flexibility for unpredictable load
[Figure panels: Smaller Hot Group, Bigger Hot Group]
Summary
● VMT stores thermal energy when passive TTS alone cannot
○ Reduces maximum cooling load of a diurnal workload
○ Configurable for varying datacenter power and load levels
● VMT-enabled thermal energy storage can:
○ Reduce cooling system size by 12%
○ Or allow up to 14% more servers under the same cooling budget
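One back-of-envelope way to see that the two figures are consistent, assuming peak cooling load scales linearly with server count (an assumption made here, not a derivation from the talk): if each server needs 12% less peak cooling, the same budget covers 1 / (1 - 0.12) times as many servers.

```python
def extra_servers_fraction(peak_cooling_reduction: float) -> float:
    """Extra servers supported by a fixed cooling budget.

    ASSUMPTION: peak cooling load is proportional to the number of servers.
    """
    if not 0.0 <= peak_cooling_reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - peak_cooling_reduction) - 1.0


if __name__ == "__main__":
    print(f"{extra_servers_fraction(0.12):.1%}")
```

Under that linear assumption a 12% reduction gives roughly 13.6% more servers, in line with the "up to 14%" on the slide.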