Multimedia VLSI Lab.
The 43rd International Symposium on Computer Architecture (Session 10B: Memory 2)
Energy Efficient Data Encoding in DRAM channels exploiting Data Value Similarity
Hoseok Seol, Wongyu Shin, Jaemin Jang, Jungwhan Choi, Jinwoong Suh, Lee-Sup Kim
Department of Electrical Engineering
Outline
1. Introduction
2. BD-Encoding
3. Evaluation Results
4. Conclusion
Modern DRAM Interface
DRAM off-chip data bus consumes significant energy.
Data Bus Energy: Switching + Termination (dominant)
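Modern DRAMs introduce asymmetric termination, in which only one logic level draws termination current.
⇒ Pseudo Open Drain (POD): DDR4, GDDR4/5 (terminated to VDDQ, so driving 0 draws current)
⇒ Low Voltage Swing Terminated Logic (LVSTL): LPDDR4 (terminated to VSSQ, so driving 1 draws current)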
[Figure: < Center Tapped >, < POD >, and < LVSTL > termination schemes, shown driving bit 0 and bit 1]
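To make the asymmetric-termination point concrete, the sketch below counts how many lines of one data word sit at the level that draws termination current under each scheme. This is a simplified model and rests on our own assumptions. Each POD line driving 0 and each LVSTL line driving 1 is assumed to draw the same static current. Center-tapped termination is ignored. The function name `termination_active_bits` is hypothetical.

```python
def termination_active_bits(word: int, width: int, scheme: str) -> int:
    """Count lines drawing termination current for one transfer (simplified model).

    Assumption: POD is terminated to VDDQ, so every line driving 0 draws current;
    LVSTL is terminated to VSSQ, so every line driving 1 draws current.
    """
    ones = bin(word & ((1 << width) - 1)).count("1")
    if scheme == "LVSTL":
        return ones
    if scheme == "POD":
        return width - ones
    raise ValueError(f"unsupported scheme: {scheme}")


word = 0b11101010
print(termination_active_bits(word, 8, "LVSTL"))  # 5
print(termination_active_bits(word, 8, "POD"))    # 3
```

Under LVSTL the count equals the Hamming weight, which is why the next slide's example uses an LVSTL interface.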
Hamming Weight & Interface Energy
Hamming Weight: number of 1’s in a string of bits.
Decreasing the Hamming weight reduces both the termination and switching energy.
We propose a novel data encoding to reduce data bus energy.
Ex) LVSTL interface
Data: “11101010” ⇒ Encoding ⇒ Data: “00000010”
                       Before encoding   After encoding
Hamming Weight         5                 1
Switching Activity     6                 2
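The example's numbers can be reproduced with the sketch below. We assume that "switching activity" counts the transitions seen when the 8 bits are driven one after another on a single line, and that the line idles at 0 before the first bit. That reading reproduces the slide's values of 6 and 2, but the slide does not state it. The helper names are ours.

```python
def hamming_weight(bits: str) -> int:
    """Number of 1s in a bit string such as '11101010'."""
    return bits.count("1")


def switching_activity(bits: str, idle: str = "0") -> int:
    """Transitions when `bits` are driven serially on one line.

    Assumption: the line rests at `idle` before the first bit is sent.
    """
    transitions = 0
    prev = idle
    for b in bits:
        if b != prev:
            transitions += 1
        prev = b
    return transitions


for data in ("11101010", "00000010"):
    print(data, hamming_weight(data), switching_activity(data))
# 11101010 5 6
# 00000010 1 2
```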
Bitwise Difference (BD) Encoding
Observation: Similar data words are sent over the
DRAM data bus.
Key Idea: Transfer the bit-wise difference between the
current data word and the most similar recently transferred data word.
Energy Reduction: 58.3% of termination and 45.3% of
switching energy.
2. BD-Encoding
Motivation
< Energy dissipation in DRAM sub-system >
(Micron DDR4 Power Calculator, DDR4-2133)
Component      Proportion [%]
Stand-by       43.1
ACT/PRE        21.5
RD/WR          14.2
Termination    14.1
Switching      7.0
Energy dissipated in the DDR4 data bus:
Termination (14.1%) + Switching Activity (7.0%) = 21.1% of the DRAM sub-system energy
Observation: Data Value Similarity
Transfer   libquantum                mcf
1          38 ad b3 00 18 83 24 00   18 67 df aa aa 2a 00 00
2          58 ad b3 00 18 83 24 00   01 00 00 00 00 00 00 00
3          78 ad b3 00 18 83 24 00   98 53 b8 aa aa 2a 00 00
4          98 ad b3 00 18 83 24 00   08 63 b8 aa aa 2a 00 00
5          a8 ad b3 00 18 83 24 00   00 00 00 00 00 00 00 00
6          c8 ad b3 00 18 83 24 00   00 27 bd aa aa 2a 00 00
Observation: Similar data words are sent over the DRAM
data bus.
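As a quick check on the libquantum column, the sketch below XORs each 8-byte word with the word sent just before it and counts the differing bits. The bytes are taken in the order listed in the table. The helper names are ours.

```python
LIBQUANTUM = [
    "38 ad b3 00 18 83 24 00",
    "58 ad b3 00 18 83 24 00",
    "78 ad b3 00 18 83 24 00",
    "98 ad b3 00 18 83 24 00",
    "a8 ad b3 00 18 83 24 00",
    "c8 ad b3 00 18 83 24 00",
]


def popcount(data: bytes) -> int:
    """Total number of 1 bits in a byte string."""
    return sum(bin(b).count("1") for b in data)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


words = [bytes.fromhex(row) for row in LIBQUANTUM]  # fromhex skips the spaces
raw = [popcount(w) for w in words]
diffs = [popcount(xor_bytes(prev, cur)) for prev, cur in zip(words, words[1:])]
print("raw Hamming weights:", raw)                   # [20, 20, 21, 20, 20, 20]
print("bits differing from previous word:", diffs)   # [2, 1, 3, 2, 2]
```

Each word carries about 20 ones but differs from its predecessor in only 1 to 3 bits. This gap is what BD-encoding exploits.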
Observation: Data Value Similarity
All SPEC 2006 workloads exhibit Data Value Similarity.
The probability that similar data occurs among the 64 most recently
transferred data words is 72% across SPEC 2006 workloads.
[Figure: < Probability of 90% data matching among 64 recent data words >; y-axis: Probability [%]]
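The kind of measurement behind this figure can be sketched as follows. This is not the authors' methodology, and the specifics are our assumptions. We assume 64-bit data words. We read "90% data matching" as at least 90% of the bits agreeing with at least one of the 64 most recent words. For 64-bit words that means at least 58 matching bits, or at most 6 differing bits. The function name `similar_fraction` is hypothetical.

```python
import math
from collections import deque


def similar_fraction(words, window=64, width=64, match_ratio=0.9):
    """Fraction of words 'similar' to at least one of the previous `window` words.

    Assumption: 'similar' means at least `match_ratio` of the `width` bits are equal.
    """
    max_diff = width - math.ceil(match_ratio * width)  # 64 bits -> at most 6 differing bits
    recent = deque(maxlen=window)  # sliding window of recently transferred words
    hits = 0
    for w in words:
        if any(bin(w ^ r).count("1") <= max_diff for r in recent):
            hits += 1
        recent.append(w)
    return hits / len(words) if words else 0.0


trace = [int.from_bytes(bytes.fromhex(r), "big") for r in (
    "38 ad b3 00 18 83 24 00",
    "58 ad b3 00 18 83 24 00",
    "78 ad b3 00 18 83 24 00",
)]
print(similar_fraction(trace))  # 2/3: the first word has no history to match
```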
Bitwise Difference Coder
< Overall Structure of BD-coder >
Both the encoder and the decoder keep identical tables of recently transferred data.
When transferring a data word, the encoder searches its table for the most similar data word.
If a similar word exists, it transfers 1) the bitwise difference and 2) the table index.
If not, it transfers the original data.
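A minimal software model of the encoder/decoder pair described above follows. It is a sketch, and several details are our assumptions rather than the slides':

- Similarity is measured as the Hamming weight of the XOR.
- A table entry is used only if the XOR has fewer 1s than the raw word.
- An index of `None` stands for "original data sent". In hardware this information would travel on the index line.
- Both tables are updated FIFO-style with the original word after every transfer.

All class and method names are hypothetical.

```python
from collections import deque
from typing import Optional, Tuple


def popcount(x: int) -> int:
    return bin(x).count("1")


class BDTable:
    """Recent-data table; encoder and decoder each hold an identical copy."""

    def __init__(self, entries: int = 64):
        self.data = deque(maxlen=entries)

    def update(self, word: int) -> None:
        # Assumed policy: append the original word, evicting the oldest when full.
        self.data.append(word)


class BDEncoder:
    def __init__(self, entries: int = 64):
        self.table = BDTable(entries)

    def encode(self, word: int) -> Tuple[int, Optional[int]]:
        """Return (bus_value, index); index is None when the raw word is sent."""
        best_idx, best_diff = None, word  # fallback: send the word unchanged
        for i, entry in enumerate(self.table.data):
            diff = word ^ entry  # bitwise difference to this table entry
            if popcount(diff) < popcount(best_diff):
                best_idx, best_diff = i, diff
        self.table.update(word)
        return best_diff, best_idx


class BDDecoder:
    def __init__(self, entries: int = 64):
        self.table = BDTable(entries)

    def decode(self, bus_value: int, index: Optional[int]) -> int:
        # XOR with the same entry the encoder used restores the original word.
        word = bus_value if index is None else bus_value ^ self.table.data[index]
        self.table.update(word)
        return word


enc, dec = BDEncoder(), BDDecoder()
for w in (0b00011001, 0b11101000):  # prime both tables with the next slide's entries
    dec.decode(*enc.encode(w))
bus, idx = enc.encode(0b11101010)
print(f"{bus:08b}", idx)  # 00000010 1
assert dec.decode(bus, idx) == 0b11101010
```

The usage lines reuse the 8-bit values from the next slide's example. The model itself does not depend on the word width. The decoder stays in sync only because it applies exactly the same table updates as the encoder, in the same order.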
Example of BD-encoding
Data to transfer: 11101010
Data table entries shown: 00011001, 11101000
W/O encoding: transfer the data as-is ⇒ 11101010
BD-encoding: transfer the XOR with the most similar entry (11101000) ⇒ 11101010 XOR 11101000 = 00000010
                       W/O encoding   BD-encoding
Hamming Weight         5              1
Switching Activity     6              2
Hardware Overheads
Coder (data table 64 entries)
Area: 0.044% of commodity DDR4
Latency: 2.3ns (Transmitter), 0.7ns (Receiver)
Energy: 7pJ (Transmitter), 2pJ (Receiver)
Designed in a 65nm logic process
Index Line
A single extra line per 8 data lines.
Can be shared with the DBI / DM pins in DDR4.
3. Evaluation Results
Methodology
Component    Parameters
Processor    gem5, x86, 3.3GHz
Caches       L1 I-cache: 32KB, 4-way
             L1 D-cache: 64KB, 4-way
             L2 cache: 2MB, 8-way
DRAM         DDR4-2133, 8GB
Interface    Pseudo Open Drain (DDR4)
Workloads    SPEC CPU 2006
Termination Energy Calculation: Micron DDR4 Power Calculator
Switching Energy Calculation: $E = CV^2$, channel capacitance: 15 [pF]
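To put the E = CV^2 entry in numbers, the sketch below multiplies a transition count by CV^2 using the listed 15 pF channel capacitance. The 1.2 V swing is our assumption, taken from DDR4's nominal VDDQ. The effective swing on a terminated POD line may differ. The function and constant names are hypothetical.

```python
CHANNEL_CAPACITANCE_F = 15e-12  # 15 pF, from the methodology table
ASSUMED_SWING_V = 1.2           # assumption: DDR4 nominal VDDQ


def switching_energy_j(transitions: int,
                       c: float = CHANNEL_CAPACITANCE_F,
                       v: float = ASSUMED_SWING_V) -> float:
    """Switching energy in joules, charging E = C * V^2 per transition as on the slide."""
    return transitions * c * v ** 2


# 6 vs. 2 transitions, as in the Hamming-weight example (one line, one 8-bit burst)
for n in (6, 2):
    print(n, f"{switching_energy_j(n) * 1e12:.1f} pJ")
# 6 129.6 pJ
# 2 43.2 pJ
```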
Comparison Points
Data Bus Inversion [M. Stan, TVLSI '95]
⇒ Transfer the inverted data if the inverted word has a smaller Hamming weight.
⇒ Adopted in commodity DRAMs (GDDR4/5, DDR4, LPDDR4)
Power Protocol [K. Basu, MICRO'02], Frequent Value Encoding [J. Yang, ISLPED'01]
⇒ Transfer the table index instead of the data when the current data word is identical
to a recently transferred one.
Variable Length Value Encoder [D. Suresh, ICCD'05]
⇒ Transfer the table index instead of the data when the current data word partially
matches a recently transferred one.
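For reference, the DBI rule quoted above can be sketched for one 8-bit lane as follows. We follow the slide's wording and invert when the inverted byte has the lower Hamming weight. Sending ties (4 ones) uninverted is our assumption. The function names are ours. DDR4's actual DBI polarity and tie handling are defined by the JEDEC standard, not by this sketch.

```python
from typing import Tuple


def dbi_encode(byte: int) -> Tuple[int, int]:
    """Return (bus_byte, dbi_flag) for one 8-bit lane.

    Sends the inverted byte when it has fewer 1s (rule as stated on the slide).
    Assumption: on a tie (4 ones) the byte is sent as-is.
    """
    inverted = ~byte & 0xFF
    if bin(inverted).count("1") < bin(byte).count("1"):
        return inverted, 1
    return byte, 0


def dbi_decode(bus_byte: int, dbi_flag: int) -> int:
    """Undo the inversion when the DBI flag is set."""
    return (~bus_byte & 0xFF) if dbi_flag else bus_byte


bus, flag = dbi_encode(0b11101010)
print(f"{bus:08b}", flag)  # 00010101 1
```

With DBI, the slide-4 data word still carries three 1s plus an asserted DBI flag. BD-encoding sends a single 1 when a close table entry exists.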
Hamming Weight Reduction
BD-Encoding decreases the Hamming weight in all
workloads (smallest reduction: bzip, 29%).
The reduction grows with the number of table entries
(28-58% for 1-64 table entries).
[Figure: < Hamming Weight Reduction Rate of BD-Encoding >; Reduction Rate [%] per workload, for 1, 8, and 64 table entries]
Comparison to Prior Works
BD-encoding reduces the termination energy by 58.3% and the
switching energy by 45.3%.
Similar data occurs much more often than identical data, so
BD-encoding outperforms Power Protocol and VALVE.
< Energy Reduction Rate > (Reduction Rate [%])
                      DBI    PP_64   VALVE_64   Proposed Work_64
Termination Energy    12.4   25.5    34.8       58.3
Switching Energy      10.9   20.7    25.8       45.3
DBI: Data Bus Inversion
PP: Power Protocol
VALVE: Variable Length Value Encoder
Interface Energy Reduction
< Interface Energy including Coder Hardware >
BD-encoding reduces the overall interface energy, including the
coder hardware energy, by 24-47.6% for 1-64 table entries.
The optimum is 32 entries: beyond that, the overheads of the
index line and coder hardware outweigh the additional data bus savings.
[Figure: relative interface energy [%] versus the baseline, broken down into Data Bus, Coder, and Index line]
Number of entries       1    2     4     8     16    32    64
Energy reduction [%]    24   30.4  36.8  41.9  45.6  47.6  47.1
4. Conclusion
Conclusion
Reducing the Hamming weight decreases both the termination and switching energy.
Data Value Similarity: Similar data words are sent over the
DRAM data bus.
Bitwise Difference Encoding: Transfer the bit-wise difference
between the current data word and the most similar data
word recently transferred.
Evaluation Results: Reduce 58.3% of termination and