NRG-Loops: Adjusting Power from within Applications
Melanie Kambadur*+, Martha Kim*
*Columbia University, New York, NY USA
+Oscar Health Insurance, New York, NY USA

NRG-Loops: Adjusting Power from within Applications · arcade.cs.columbia.edu/nrgloops-cgo16-slides.pdf

Dec 03, 2018

Page 1:

NRG-Loops: Adjusting Power from within Applications

Melanie Kambadur*+, Martha Kim*  *Columbia University, New York, NY USA

+Oscar Health Insurance, New York, NY USA

Page 2:

Once, power/performance tradeoffs were set at HW design time…

Power efficiency evolution

Low Freq.: - Less power, - Slower runtime
High Freq.: + More power, + Faster runtime

Page 3:

Once, power/performance tradeoffs were set at HW design time…

Power efficiency evolution

Low Freq.: - Less power, - Slower runtime
High Freq.: + More power, + Faster runtime
+ Med. power, + Faster runtime

Page 4:

Once, power/performance tradeoffs were set at HW design time…

Power efficiency evolution

Low Freq.: - Less power, - Slower runtime
High Freq.: + More power, + Faster runtime
+ Med. power, + Faster runtime
Specialized HW: -- Power, ++ Runtime

Page 5:

The next big thing was tunable “knobs”

Power efficiency evolution

Dynamic Frequency Tuning (DFS/DVFS)

Page 6:

The next big thing was tunable “knobs”

Power efficiency evolution

Dynamic Frequency Tuning (DFS/DVFS)

CPU Idle Modes

Page 7:

The next big thing was tunable “knobs”

Power efficiency evolution

Dynamic Frequency Tuning (DFS/DVFS)

CPU Idle Modes

Asymmetric Multicores

Page 8:

How do we use these HW knobs for SW power & energy efficiency?

Moving power efficiency up the stack

Dynamic Frequency Tuning (DFS/DVFS)

CPU Idle Modes

Asymmetric Multicores

Page 9:

Using HW knobs for SW energy efficiency

Most SW energy efficiency solutions expose “hints” to the OS, which then tunes HW knobs.

    func foo _high_power_ {    // High freq.
      // some code
    }
    func bar _low_power_ {     // Low freq.
      // some code
    }
    func baz _high_power_ {    // High freq.
      // some code
    }

Page 10:

Using HW knobs for SW energy efficiency

Most SW energy efficiency solutions expose “hints” to the OS, which then tunes HW knobs.

    class Foo _high_power_ {    // High freq.
      // some code
    }
    class Bar _low_power_ {     // Low freq.
      // some code
    }
    class Baz _high_power_ {    // High freq.
      // some code
    }

Page 11:

Using HW knobs for SW energy efficiency

Most SW energy efficiency solutions expose “hints” to the OS, which then tunes HW knobs.

    class Foo _high_power_ {
      // some code
    }
    class Bar _low_power_ {
      // some code
    }
    class Baz _high_power_ {
      // some code
    }

Idle some cores

Page 12:

STOP using HW knobs for SW energy efficiency

Most SW energy efficiency solutions expose “hints” to the OS, which then tunes HW knobs.

• Hard to manage HW power when multiple programs give hints simultaneously
• HW can predict idle periods better → sub-cycle DVFS tuning?
• In practice, most HW tuning increases runtime to save power, so it can't save energy during SW active periods.

Page 13:

Mobile app example

[Chart: Power (% Total) vs. Time (s)]

Segment of a mobile game that takes 10 s.

Page 14:

Mobile app example

[Chart: Power (% Total) vs. Time (s)]

Too much power! Want the app to consume <= 80% power.

Page 15:

Option 1: Let HW handle it with DVFS

[Chart: Power (% Total) vs. Time (s); annotation: 25% increase]

Power ok now. But you get a slowdown.

Page 16:

Option 2: Compiler/Language Smart DVFS

[Chart: Power (% Total) vs. Time (s); annotation: 10% increase]

Power ok now. Still get a slowdown.

Page 17:

Option 2: Compiler/Language Smart DVFS

[Chart: Power (% Total) vs. Time (s); annotation: 10% increase]

Power ok now. Still get a slowdown.

Moreover, must slow ALL apps on the same core.
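Pages 13 to 17 describe a 10 s segment and annotate each option with an increase. Assuming those increases refer to the segment's runtime (the charts themselves are not recoverable here), the slowdown works out to:

$$T_{\text{DVFS}} = 10\,\text{s} \times (1 + 0.25) = 12.5\,\text{s}, \qquad T_{\text{smart DVFS}} = 10\,\text{s} \times (1 + 0.10) = 11\,\text{s}$$

Both options meet the 80% power target only by stretching the segment.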

Page 18:

Option 3: Trade functionality for power

[Chart: Power (% Total) vs. Time (s); series: Total, Advertisement]

Banner ad:

Page 19:

Option 3: Trade functionality for power

[Chart: Power (% Total) vs. Time (s); series: Total, Advertisement]

Banner ad: Ad is responsible for power spike

Page 20:

Option 3: Trade functionality for power

[Chart: Power (% Total) vs. Time (s); series: Total, Advertisement]

Banner ad: Pause the ad, maintain power budget with no time delay

Page 21:

NRG-Loops: SW-Only Power Management

• Instead of having software manage power via hardware knobs, have software manage power via software knobs.

Page 22:

NRG-Loops: SW-Only Power Management

• Instead of having software manage power via hardware knobs, have software manage power via software knobs.

HW Knobs:
• DVFS
• Idle cores
• Asymmetric multicore

SW Knobs:
• Adjust caching strategy
• Reduce thread count
• Estimate mathematical function
• Stop computation early and dump memory

Page 23:

NRG-Loops: SW-Only Power Management

• C++ language extension to tune SW power through SW knobs.
• Measures hardware power + energy and enables programs to trade functionality or accuracy ONLY when runtime power or energy budgets are exceeded.
• Can work concurrently with HW power solutions.

Page 24:

NRG-Loops ADAPT

Concise syntax adds only a few lines of code to existing programs:

    NRG_ADAPT_FOR ( int i=0; i<MAX_ADS ; ++i &&
                    NRG_AVG_P<=POWER_LIMIT ) {
        // run ad normally
    } NRG_ALTERNATE {
        usleep ( PAUSE_TIME );
    }

Page 25:

NRG-Loops ADAPT

Same code as page 24. Highlighted: Loop pragma (NRG_ADAPT_FOR).

Page 26:

NRG-Loops ADAPT

Same code as page 24. Highlighted: Original loop bounds (int i=0; i<MAX_ADS ; ++i).

Page 27:

NRG-Loops ADAPT

Same code as page 24. Highlighted: Concatenate power/energy goals (&& NRG_AVG_P<=POWER_LIMIT).

Page 28:

NRG-Loops ADAPT

Same code as page 24. Highlighted: Original loop body (// run ad normally).

Page 29:

NRG-Loops ADAPT

Same code as page 24. Highlighted: Enter if power budget exceeded (NRG_ALTERNATE).

Page 30:

NRG-Loops ADAPT

Same code as page 24. Highlighted: Alternate, low-power loop body (the usleep ( PAUSE_TIME ) call).
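The NRG_ADAPT_FOR control flow walked through on pages 24 to 30 can be emulated in plain C++. This is a minimal sketch, not the NRG-Loops implementation: `TracePowerMeter` and `adapt_for` are hypothetical names, power readings are replayed from a recorded trace instead of read from RAPL, and the budget is assumed to be re-checked once per iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a hardware power meter: replays a recorded trace
// of average-power samples in watts (the last sample repeats once exhausted).
struct TracePowerMeter {
    explicit TracePowerMeter(std::vector<double> trace) : samples(std::move(trace)) {}

    double average_power() {
        assert(!samples.empty());
        std::size_t idx = next < samples.size() ? next : samples.size() - 1;
        ++next;
        return samples[idx];
    }

    std::vector<double> samples;
    std::size_t next = 0;
};

// Sketch of NRG_ADAPT_FOR semantics: every iteration still runs, but the
// alternate (low-power) body replaces the normal body whenever the measured
// average power exceeds the limit. Returns how many alternates ran.
template <typename Body, typename Alternate>
int adapt_for(int n, TracePowerMeter& meter, double power_limit_w,
              Body body, Alternate alternate) {
    assert(n >= 0);
    int alternates = 0;
    for (int i = 0; i < n; ++i) {
        if (meter.average_power() <= power_limit_w) {
            body(i);       // e.g. "// run ad normally"
        } else {
            alternate(i);  // e.g. usleep(PAUSE_TIME) inside NRG_ALTERNATE
            ++alternates;
        }
    }
    return alternates;
}
```

For example, with an 80 W limit and the trace {70, 85, 90, 75}, the second and third iterations take the alternate path. The loop never shortens in this sketch; only the body chosen per iteration changes, which matches the ad-pausing scenario on page 20.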

Page 31:

Other types of NRG-Loops

    NRG_TRUNCATE_FOR (int i=0; i<N; ++i &&
                      NRG_TOT_E <= FOO_ENERGY) {
        // original loop body
    }

    NRG_PROB_PERF_FOR (int i=0; i<N; ++i &&
                       NRG_TOT_E <= FOO_ENERGY;
                       PROB_SKIP=0.1) {
        // original loop body
    }

    NRG_AUTO_PERF_FOR (int i=0; i<N; ++i &&
                       NRG_TOT_E <= FOO_ENERGY) {
        // original loop body
    }

Highlighted (NRG_TRUNCATE_FOR): Do work until the NRG condition is met, i.e., stop once total energy reaches FOO_ENERGY.
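NRG_TRUNCATE_FOR ends the loop early once the energy goal is reached. The sketch below emulates that behavior under stated assumptions: `truncate_for` is a hypothetical name, each iteration's energy cost is reported by the body itself (a simulation in place of hardware meters), and the budget is checked before each iteration, as in an ordinary `for` condition.

```cpp
#include <cassert>

// Sketch of NRG_TRUNCATE_FOR semantics: run iterations while the energy spent
// so far is within budget; stop at the first check that fails.
// `body(i)` returns the joules that iteration consumed (simulated).
template <typename Body>
int truncate_for(int n, double energy_budget_j, Body body) {
    assert(n >= 0);
    double spent_j = 0.0;
    int executed = 0;
    for (int i = 0; i < n && spent_j <= energy_budget_j; ++i) {
        spent_j += body(i);
        ++executed;
    }
    return executed;  // iterations actually run; the rest are truncated
}
```

With a 5 J budget and 2 J per iteration, three iterations run (0, 2, then 4 J spent at each check) and the total ends at 6 J. Checking before each iteration means the budget can be overshot by at most one iteration's cost.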

Page 32:

Other types of NRG-Loops

Same code as page 31. Highlighted (NRG_PROB_PERF_FOR): Once the condition is met, do the work 9/10 times (PROB_SKIP=0.1).
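A probabilistic variant can be sketched the same way. Assumptions, none of them from the slides: `prob_perf_for` is a hypothetical name, skipped iterations cost no energy, and a seeded `std::mt19937` with `std::bernoulli_distribution` stands in for whatever random source NRG-Loops uses, so runs are reproducible.

```cpp
#include <cassert>
#include <random>

// Sketch of NRG_PROB_PERF_FOR semantics: until the energy budget is exceeded,
// every iteration runs; afterwards each iteration is skipped with probability
// prob_skip (0.1 on the slide, i.e. work is done roughly 9 times out of 10).
// `body(i)` returns the joules that iteration consumed (simulated).
template <typename Body>
int prob_perf_for(int n, double energy_budget_j, double prob_skip,
                  unsigned seed, Body body) {
    assert(n >= 0 && prob_skip >= 0.0 && prob_skip <= 1.0);
    std::mt19937 rng(seed);
    std::bernoulli_distribution skip(prob_skip);
    double spent_j = 0.0;
    int executed = 0;
    for (int i = 0; i < n; ++i) {
        if (spent_j > energy_budget_j && skip(rng)) {
            continue;  // perforated: iteration dropped, no energy spent
        }
        spent_j += body(i);
        ++executed;
    }
    return executed;
}
```

Unlike truncation, this keeps making progress past the budget, just at a reduced rate, so the result degrades gradually instead of stopping outright.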

Page 33:

Other types of NRG-Loops

Same code as page 31. Highlighted (NRG_AUTO_PERF_FOR): Do work, skipping an estimated number of iterations to exactly match FOO_ENERGY.
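The slides do not say how NRG_AUTO_PERF_FOR estimates how many iterations to skip. One plausible scheme, offered purely as an assumption: measure the average energy per iteration over a short warm-up, compute how many of the remaining iterations the leftover budget can afford, and run that many spread evenly across the remaining index range. `auto_perf_for` and its `warmup` parameter are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of an NRG_AUTO_PERF_FOR-style loop. `body(i)` returns the
// joules iteration i consumed (simulated). Returns the executed indices.
template <typename Body>
std::vector<int> auto_perf_for(int n, double energy_budget_j, int warmup,
                               Body body) {
    assert(n >= 0 && warmup > 0);
    std::vector<int> executed;
    double spent_j = 0.0;
    int i = 0;
    // 1. Warm-up: run the first iterations unconditionally to estimate cost.
    for (; i < n && i < warmup; ++i) {
        spent_j += body(i);
        executed.push_back(i);
    }
    const int remaining = n - i;
    if (remaining == 0) return executed;
    // 2. Estimate how many remaining iterations the leftover budget affords.
    const double per_iter_j = spent_j / executed.size();
    const double left_j = energy_budget_j - spent_j;
    long long affordable = remaining;  // zero-cost iterations: run them all
    if (per_iter_j > 0.0) {
        const double est = left_j / per_iter_j;
        if (est <= 0.0) affordable = 0;
        else if (est < remaining) affordable = static_cast<long long>(est);
    }
    // 3. Run `affordable` iterations spaced evenly over [i, n).
    for (long long k = 0; k < affordable; ++k) {
        const int idx = i + static_cast<int>(k * remaining / affordable);
        spent_j += body(idx);
        executed.push_back(idx);
    }
    return executed;
}
```

With 100 iterations at 1 J each, a 50 J budget, and a 10-iteration warm-up, this runs 50 iterations. The total lands exactly on the budget here only because every simulated iteration costs the same; with varying costs the estimate can over- or undershoot.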

Page 34:

NRG-Loop Helpers

    NRG_AUDIT {
        foo();    // any code here
    } NRG_USAGE (NRG_USAGE_INFO* foo_usage);

    float foo_energy = foo_usage->energy;
    float foo_average_power = foo_usage->average_power;
    float foo_wall_time = foo_usage->wall_time;

    NRG_AVG_P <= 50.0
    NRG_TOT_E <= foo_energy
    NRG_AVG_P <= 0.5*SYS_MAX_POWER

Capture the energy, power, and runtime use of any code.
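An NRG_AUDIT-style measurement boils down to sampling a cumulative energy counter and a clock around a block of code. A minimal sketch: `UsageInfo` mirrors the three field names on the slide, while `audit` and its caller-supplied `read_joules` reader are hypothetical (a real tool would read a hardware meter such as RAPL). It uses `double` where the slide declares `float`.

```cpp
#include <cassert>
#include <chrono>

// Mirrors the three NRG_USAGE_INFO fields shown on the slide.
struct UsageInfo {
    double energy;         // joules
    double average_power;  // watts
    double wall_time;      // seconds
};

// Sketch of an NRG_AUDIT-style helper: run `code`, then report energy, average
// power, and wall time. `read_joules` returns a cumulative energy reading.
template <typename Code, typename EnergyReader>
UsageInfo audit(Code code, EnergyReader read_joules) {
    const auto t0 = std::chrono::steady_clock::now();
    const double e0 = read_joules();
    code();
    const double e1 = read_joules();
    const auto t1 = std::chrono::steady_clock::now();
    const double secs = std::chrono::duration<double>(t1 - t0).count();
    assert(secs >= 0.0);  // steady_clock never goes backwards
    UsageInfo u{e1 - e0, 0.0, secs};
    u.average_power = secs > 0.0 ? u.energy / secs : 0.0;
    return u;
}
```

The measured `energy` can then serve directly as a relative budget, the way `foo_energy` is used in `NRG_TOT_E <= foo_energy`.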

Page 35:

NRG-Loop Helpers

Same NRG_AUDIT / NRG_USAGE code as page 34, with the budget expressions annotated:

    NRG_AVG_P <= 50.0                  // Watts
    NRG_TOT_E <= foo_energy            // Relative to foo()
    NRG_AVG_P <= 0.5*SYS_MAX_POWER     // Relative to TDP

Different options to set energy/power budgets.
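The three budget styles all reduce to comparing one measured quantity against a limit. A sketch under that reading; `Budget`, `absolute_power`, `energy_like`, `fraction_of_tdp`, and `within` are hypothetical names, not NRG-Loops API.

```cpp
#include <cassert>

// Hypothetical representation of the three budget styles on the slide.
struct Budget {
    enum class Kind { AveragePower, TotalEnergy } kind;
    double limit;  // watts for AveragePower, joules for TotalEnergy
};

// NRG_AVG_P <= 50.0 : absolute limit in watts.
inline Budget absolute_power(double watts) {
    return Budget{Budget::Kind::AveragePower, watts};
}

// NRG_TOT_E <= foo_energy : relative to the audited cost of foo().
inline Budget energy_like(double audited_energy_j) {
    return Budget{Budget::Kind::TotalEnergy, audited_energy_j};
}

// NRG_AVG_P <= 0.5*SYS_MAX_POWER : a fraction of the platform maximum (TDP).
inline Budget fraction_of_tdp(double fraction, double sys_max_power_w) {
    assert(fraction > 0.0 && fraction <= 1.0);
    return Budget{Budget::Kind::AveragePower, fraction * sys_max_power_w};
}

// True while the current measurements satisfy the budget.
inline bool within(const Budget& b, double avg_power_w, double total_energy_j) {
    return b.kind == Budget::Kind::AveragePower ? avg_power_w <= b.limit
                                                : total_energy_j <= b.limit;
}
```

Relative budgets such as the TDP fraction keep the same source code meaningful across machines with different power envelopes.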

Page 36:

NRG-Loops Implementation

• No adjustments to the OS; used commodity HW with built-in power meters (Intel RAPL)
  – Measure CPU + cache power + estimate DRAM power
• Implementation *should have* been trivial, but unfortunately wasn't
  – Socket-level power meters make attributing power to processes tricky
  – Small, overflowing counters
  – To minimize monitoring overhead, we have one monitoring thread even for multithreaded programs
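Small, overflowing counters are usually handled with modular arithmetic. A sketch, assuming a 32-bit raw energy counter (the width of the energy field in Intel's RAPL energy-status MSRs) and a caller-supplied joules-per-tick scale (on real hardware it comes from a separate units register; here it is a plain parameter). The helper name is hypothetical, and the approach only works if the counter is sampled often enough to wrap at most once between reads.

```cpp
#include <cassert>
#include <cstdint>

// Energy consumed between two raw readings of a 32-bit counter. Unsigned
// subtraction wraps modulo 2^32, so a single overflow between reads is
// handled automatically; two or more overflows cannot be detected.
inline double energy_delta_j(std::uint32_t prev_raw, std::uint32_t now_raw,
                             double joules_per_tick) {
    assert(joules_per_tick > 0.0);
    const std::uint32_t ticks = now_raw - prev_raw;
    return static_cast<double>(ticks) * joules_per_tick;
}
```

Accumulating these deltas into a 64-bit or floating-point running total gives a counter that effectively never overflows from the program's point of view.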

Page 37:

NRG-Loops Implementation

Same bullets as page 36, plus:

• Profiling power & energy goals adds <1% overhead.
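The "one monitoring thread even for multithreaded programs" design can be pictured as a single background sampler that publishes its latest reading through atomics, so worker threads only pay for a cheap load. This illustrates the design, not the NRG-Loops code: `PowerMonitor` and its methods are hypothetical, the energy source is a caller-supplied function, and the sampling period is an arbitrary parameter.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Sketch of a single process-wide sampler thread.
class PowerMonitor {
public:
    PowerMonitor(std::function<double()> read_joules,
                 std::chrono::milliseconds period)
        : read_joules_(std::move(read_joules)), period_(period) {}

    ~PowerMonitor() { stop(); }

    void start() {
        assert(!worker_.joinable());
        running_.store(true);
        worker_ = std::thread([this] {
            do {  // always take at least one sample
                latest_j_.store(read_joules_());
                samples_.fetch_add(1);
                std::this_thread::sleep_for(period_);
            } while (running_.load());
        });
    }

    void stop() {
        running_.store(false);
        if (worker_.joinable()) worker_.join();
    }

    double latest_joules() const { return latest_j_.load(); }
    int samples() const { return samples_.load(); }

private:
    std::function<double()> read_joules_;
    std::chrono::milliseconds period_;
    std::atomic<bool> running_{false};
    std::atomic<double> latest_j_{0.0};
    std::atomic<int> samples_{0};
    std::thread worker_;
};
```

Keeping a single sampler bounds the monitoring cost regardless of how many worker threads the program runs, which is the trade-off the slide describes.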


Page 39:

Results: NRG_ADAPT Minesweeper

When advertisements are consuming too much energy, force them to occasionally pause, decreasing net game-plus-ad energy.

Page 40:

Results: NRG_ADAPT Parallel Programs

Reduce software thread count to keep within a power budget.

[Chart: Energy Relative to Uncapped]
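The slides report results here rather than code, but the thread-count knob is easy to picture as a small feedback rule evaluated between parallel phases. A minimal sketch, assuming a measured average power per phase and a 10% hysteresis band chosen only for illustration; `next_thread_count` is hypothetical, and the slides do not describe the controller actually used.

```cpp
#include <cassert>

// Hypothetical one-step controller for the "reduce thread count" knob:
// shed a thread when measured power is over budget, add one back when
// there is comfortable headroom, otherwise hold steady.
inline int next_thread_count(int current, int max_threads,
                             double measured_w, double budget_w,
                             double headroom = 0.9) {
    assert(current >= 1 && current <= max_threads);
    if (measured_w > budget_w && current > 1) return current - 1;
    if (measured_w < headroom * budget_w && current < max_threads) return current + 1;
    return current;
}
```

For example, at 120 W against a 100 W budget with 8 threads, it returns 7. The dead band between 90% and 100% of the budget keeps the count from oscillating every phase.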

Page 41:

Questions?

• Melanie Kambadur ([email protected])
• Martha Kim ([email protected])

NRG-Loops: Adjusting Power from within Applications

Page 42:
Page 43:

Results: NRG_PERFORATE

Set a target energy budget and drop frames to meet it.
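NRG_PERFORATE's dropping policy is not spelled out on the slide. One simple policy, assumed here for illustration: spread the target evenly across frames and drop any frame whose estimated cost would push cumulative energy past the pro-rated allowance. The function name and the use of the previous frame's cost as the estimate are hypothetical choices.

```cpp
#include <cassert>

// Hypothetical sketch of frame perforation against a fixed energy target:
// frame k may only render if, after paying the estimated cost, total energy
// stays within the pro-rated allowance budget * (k + 1) / n.
// `render(k)` returns the joules frame k actually consumed (simulated).
template <typename Render>
int perforate_frames(int n, double energy_budget_j, double initial_estimate_j,
                     Render render, double* spent_out = nullptr) {
    assert(n > 0 && initial_estimate_j >= 0.0);
    double spent_j = 0.0;
    double estimate_j = initial_estimate_j;
    int rendered = 0;
    for (int k = 0; k < n; ++k) {
        const double allowance_j = energy_budget_j * (k + 1) / n;
        if (spent_j + estimate_j > allowance_j) continue;  // drop this frame
        estimate_j = render(k);  // latest measurement becomes next estimate
        spent_j += estimate_j;
        ++rendered;
    }
    if (spent_out) *spent_out = spent_j;
    return rendered;
}
```

With a 15 J target over 10 frames that each cost 2 J, this policy renders 7 frames and spends 14 J, dropping frames 0, 4, and 8. Pro-rating the allowance spreads the dropped frames across the run instead of cutting off the tail.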