Farhan Mohamed Ali (W2-1), Jigar Vora (W2-2), Sonali Kapoor (W2-3), Avni Jhunjhunwala (W2-4). Presentation 12: MAD MAC 525, 26th April, 2006, Short Final Presentation (W2). Project Objective: Design a crucial part of a GPU called the Multiply Accumulate Unit (MAC) which will revolutionize graphics. Design Manager: Zack Menegakis

Dec 19, 2015

Farhan Mohamed Ali (W2-1), Jigar Vora (W2-2), Sonali Kapoor (W2-3), Avni Jhunjhunwala (W2-4)

Presentation 12

MAD MAC 525

26th April, 2006

Short Final Presentation

W2

Project Objective: Design a crucial part of a GPU called the Multiply Accumulate Unit (MAC) which will revolutionize graphics.

Design Manager: Zack Menegakis

Agenda

• Marketing (Jigar)

• Project Description (Farhan)

• Algorithmic Description (Farhan)

• Design Process (Sonali)

• Floorplan Evolution (Sonali)

• Layout (Avni)

• Design Specifications (Avni)

• Conclusion (Jigar)

MARKETING

• Application of product: HDR rendering in gaming graphics

• Why HDR? Used in games like Far Cry

• Optimized for speed (chosen because of the market)

• Competition: possible barriers to entry if we enter the market

MAD MAC and HDR

• What is HDR?

• Show animation explaining concept

MAD MAC and HDR

• MAD MAC accelerates FP16 blending to enable true HDR graphics

• What is HDR?

• HDR = High Dynamic Range

• Dynamic range is defined as the ratio of the largest value of a signal to the lowest measurable value

• Dynamic range of luminance in real-world scenes can be 100,000 : 1

• With HDR rendering, pixel intensities are allowed to extend beyond the [0..1] range of traditional graphics

• Nature isn’t clamped to [0..1] and neither should CG be

• In lay terms:

• Bright things can be really bright

• Dark things can be really dark

• And the details can be seen in both
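
To put the dynamic-range definition above in numbers, the sketch below compares an 8-bit channel with the 16-bit floating-point (half) format. It is only an illustration: the helper name `dynamic_range` is made up here, and the half-float figure counts normalized values only.

```python
# Dynamic range = largest value / smallest measurable (non-zero) value.

def dynamic_range(largest, smallest):
    """Ratio of the largest to the smallest non-zero representable value."""
    return largest / smallest

# 8-bit channel: the non-zero codes run from 1 to 255.
ldr = dynamic_range(255, 1)

# 16-bit half float: 10 fraction bits, 5 exponent bits, bias 15.
half_max = (2 - 2**-10) * 2**15      # largest finite value (65504)
half_min_normal = 2.0**-14           # smallest normalized value
hdr = dynamic_range(half_max, half_min_normal)

print(f"8-bit channel: {ldr:,.0f} : 1")
print(f"half float:    {hdr:,.0f} : 1")
print("covers a 100,000:1 scene:", ldr >= 100_000, hdr >= 100_000)
```

Under these assumptions the 8-bit channel falls well short of a 100,000 : 1 scene, while the half format exceeds it by several orders of magnitude.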

PROJECT DESCRIPTION

• Multiply Accumulate unit (MAC)

• Executes the function A*B + C on 16-bit floating-point inputs. Inputs use the OpenEXR format.

• Multiply and add proceed in parallel, which greatly speeds up the operation

• Rounding is performed only once, so the result is more accurate than separate multiply and add operations

• Also known as:

– Fused Multiply Add (FMA)

– Multiply Add (MAD/MADD) in graphics shader programs

• Many applications benefit from a fast FMA

– Graphics: HDR rendering, blending and shader ops

– DSPs: computing vector dot-products in digital filters

– Fast division, square root: eliminates extra hardware

• Available in many newer CPUs and DSPs because it’s so cool

• One ring (circuit) to rule them all!
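
The single-rounding point can be seen with a small experiment. The sketch below is not the MAD MAC hardware. It uses Python's `struct` half-precision format (`"e"`) to round values to 16 bits, with double precision standing in for the exact intermediate result; for the particular inputs chosen here that intermediate is exact. The helper name `to_half` is made up for this example.

```python
import struct

def to_half(x):
    """Round a Python float to the nearest 16-bit half value (ties to even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

a = to_half(1 + 2**-10)
b = to_half(1 + 2**-9)
c = to_half(-1.0)

# Separate multiply, then add: the product is rounded before the addition.
separate = to_half(to_half(a * b) + c)

# Fused: the unrounded product feeds the adder, with one rounding at the end.
# (Double precision holds a*b + c exactly for these particular inputs.)
fused = to_half(a * b + c)

print("separate:", separate)
print("fused:   ", fused)
print("difference:", fused - separate)
```

With these inputs the separate version loses the low-order 2^-19 term of the product when the intermediate is rounded, while the fused version keeps it.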

ALGORITHMIC DESCRIPTION

• Step through entire process

• Multiply and align occur concurrently; C is always aligned to A*B

• Outputs go to adder, normalize, round, overflow checker and output register
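
The flow above (multiply, align, add, normalize, round, overflow check) can be sketched as a small behavioral model. This is an illustration under stated assumptions, not the team's Verilog: it handles finite inputs only, treats denormal inputs and results as zero, and uses Python's unbounded integers to align both operands to a common weight instead of the fixed-width "align C to A*B" shifter. All function names (`f2h`, `h2f`, `unpack`, `mac`) are hypothetical.

```python
import struct

BIAS, FRAC = 15, 10   # half format: 5-bit exponent (bias 15), 10-bit fraction

def f2h(x):
    """Float -> 16-bit half bit pattern (demo helper)."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def h2f(h):
    """16-bit half bit pattern -> float (demo helper)."""
    return struct.unpack("<e", struct.pack("<H", h))[0]

def unpack(h):
    """Return (sign, biased exponent, 11-bit significand with hidden 1)."""
    exp = (h >> FRAC) & 0x1F
    sig = 0 if exp == 0 else (1 << FRAC) | (h & 0x3FF)   # denormals -> 0
    return h >> 15, exp, sig

def mac(a, b, c):
    """A*B + C on half bit patterns; finite inputs only, one rounding."""
    sa, ea, ma = unpack(a)
    sb, eb, mb = unpack(b)
    sc, ec, mc = unpack(c)
    # Multiply: 11 x 11 -> up to 22-bit product whose LSB weighs 2**lsb_p.
    prod, lsb_p = ma * mb, (ea - BIAS) + (eb - BIAS) - 2 * FRAC
    # Align: express C and the product with a common LSB weight.
    lsb_c = (ec - BIAS) - FRAC
    lsb = min(lsb_p, lsb_c)
    vp = (prod << (lsb_p - lsb)) * (-1 if sa ^ sb else 1)
    vc = (mc << (lsb_c - lsb)) * (-1 if sc else 1)
    # Add / subtract (exact, because Python integers do not overflow).
    total = vp + vc
    if total == 0:
        return 0
    sign, mag = int(total < 0), abs(total)
    # Normalize: shift so that 11 significant bits remain.
    shift = mag.bit_length() - (FRAC + 1)
    if shift > 0:
        sig, rem = mag >> shift, mag & ((1 << shift) - 1)
        half = 1 << (shift - 1)
        # Round to nearest, ties to even (the only rounding step).
        if rem > half or (rem == half and sig & 1):
            sig += 1
        if sig >> (FRAC + 1):            # rounding carried into a 12th bit
            sig, shift = sig >> 1, shift + 1
    else:
        sig = mag << -shift
    exp = lsb + shift + FRAC + BIAS
    # Overflow check; results below the normal range flush to zero.
    if exp >= 31:
        return (sign << 15) | 0x7C00
    if exp <= 0:
        return sign << 15
    return (sign << 15) | (exp << FRAC) | (sig & 0x3FF)
```

For example, `h2f(mac(f2h(2.0), f2h(3.0), f2h(1.0)))` evaluates to `7.0`, and a product that exceeds the half range, such as `mac(f2h(60000.0), f2h(2.0), 0)`, returns the infinity pattern `0x7C00`.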

Block Diagram

[Figure: block diagram. Three 16-bit inputs load RegArray A, RegArray B and RegArray C. Blocks shown: Multiplier, Exp Calc, Align, Adder/Subtractor, Control Logic & Sign Determination, Leading 0 Anticipator, Normalize, Round, Ovf Checker. The result leaves through the 16-bit output register Reg Y.]

IMPLEMENTATION

• Implementation of each module: how and why we chose a particular method with the goal of speed in mind (multiplier, adder)

Design Decisions (contd.):

• Multiplier Implementation

– 11 x 11 Carry-Save Multiplier

– Reasons:

• Fast because it avoids having a ripple carry in every stage

• Enables compact layout
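
The "no ripple carry in every stage" idea can be sketched at word level. The code below is a minimal illustration of carry-save accumulation, not the team's schematic: each partial-product row is folded into a (sum, carry) pair with a 3:2 compressor, and the only carry-propagating addition happens once at the end. The function names and the row-by-row ordering are assumptions made for this example.

```python
def csa(x, y, z):
    """3:2 compressor on whole words: x + y + z == s + c, with no carry ripple."""
    s = x ^ y ^ z                               # bitwise sum
    c = ((x & y) | (x & z) | (y & z)) << 1      # bitwise carries, moved up one place
    return s, c

def carry_save_multiply(a, b, width=11):
    """Unsigned width x width multiply using carry-save accumulation."""
    s, c = 0, 0
    for i in range(width):
        pp = (a << i) if (b >> i) & 1 else 0    # partial-product row i
        s, c = csa(s, c, pp)                    # carries are saved, not propagated
    return s + c                                # one carry-propagate add at the end

print(carry_save_multiply(1025, 1026))
```

The invariant is that `s + c` always equals the sum of the rows folded in so far, so each stage has the delay of a single full adder regardless of word width.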

Design Process

• Verilog -> Schematic -> Layout

– Behavioral -> Structural Verilog

– Transistors/gates -> Full Schematic

– Gate/Component Layout -> Top Level

• Transistor count fluctuated from 20,200 to 12,800

• Major design decisions

– Decided against implementing denormal arithmetic because it would increase the complexity of the project beyond the scope of the class

– Rounding performed only once, at the end

– Picked nPass over Tgate in the normalize shifter

– Adder: variable-length carry select -> Han-Carlson binary tree adder
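
For readers unfamiliar with the Han-Carlson structure, here is a bit-level sketch of the textbook scheme: a Kogge-Stone prefix tree over the odd bit positions, with one extra stage before and after it to cover the even positions. It is not derived from the team's schematic; the function name and the default 16-bit width are assumptions for illustration.

```python
def han_carlson_add(a, b, n=16):
    """Add two n-bit unsigned ints; returns (sum mod 2**n, carry_out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]      # generate bits
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]    # propagate bits
    G, P = g[:], p[:]

    def combine(hi, lo):
        """Prefix operator: merge a (G, P) group with the group below it."""
        return hi[0] | (hi[1] & lo[0]), hi[1] & lo[1]

    # First stage: each odd bit absorbs its even neighbour.
    for i in range(1, n, 2):
        G[i], P[i] = combine((G[i], P[i]), (G[i - 1], P[i - 1]))

    # Middle stages: Kogge-Stone prefix tree over the odd bits only.
    d = 2
    while d < n:
        G0, P0 = G[:], P[:]                  # all cells in a stage update together
        for i in range(d + 1, n, 2):
            G[i], P[i] = combine((G0[i], P0[i]), (G0[i - d], P0[i - d]))
        d *= 2

    # Last stage: each even bit picks up the finished prefix of the odd bit below.
    for i in range(2, n, 2):
        G[i], P[i] = combine((g[i], p[i]), (G[i - 1], P[i - 1]))

    carry_in = [0] + G[:-1]                  # carry into bit i is G[i-1]
    total = sum((p[i] ^ carry_in[i]) << i for i in range(n))
    return total, G[n - 1]

print(han_carlson_add(0x1234, 0x0FFF))
```

Compared with a full Kogge-Stone tree, this arrangement uses roughly half the prefix cells and wiring at the cost of one extra logic stage, which is the usual argument for it in area-constrained layouts.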

VERIFICATION OF DESIGN

Verilog simulations (show outputs)

– Overview

– How/why it works

– Behavioral/Structural

Explain why we couldn’t get a high-level simulator and how we tested our Verilog design.

SCHEMATICS

• Show schematics of major blocks: adder, multiplier, and top-level

• HOW WE VERIFIED: analog simulation

Top Level Schematic

Multiplier Schematic

Adder Schematic

FLOORPLAN EVOLUTION

• Initial floorplan

• How it evolved (with animation): why and how we changed it

Main Floorplan

[Figure: floorplan. Blocks shown: Reg A, Reg B, Reg C, Multiplier, Align C, ExpCalc, Adder, Ld Zero, Normalize, Round, Reg Y, and three Pipeline Reg blocks.]

Floorplan

Full Chip Layout

[Figure: full chip layout with labeled regions: Exponent, Align, Zero, Adder, Multiplier, Normalize, Round, Ovf.]

Pipelining

• Initially planned 5-6 pipeline stages

• Reduced to 4 pipeline stages – made possible by implementing fast carry lookahead adders in critical path modules (adder and multiplier)

Pipelining Stages

[Figure: pipeline stage diagram. Blocks shown: Reg A, Reg B, Reg C, Multiplier, Align C, ExpCalc, Adder, Ld Zero, Normalize, Round, Overflow checker, Reg Y, separated by Pipeline Reg blocks.]

LAYOUT

• Final Layout

• Layout of large blocks such as multiplier, adder and normalize

Layout Decisions

• 3 standard cell heights

• Uniform-width Vdd and ground rails

• Wider Vdd and ground rails in power-hungry modules

• Max of 8 flip flops per clock pulse generator

• Metal directionality

Multiplier Layout with pipelining

Adder Layout

Normalize Layout

FINAL LAYOUT

Design Specifications

• Worst case delay = 2.25ns

• Long buses are all buffered (not tested yet)

• Estimated clocking speed = 400 MHz

• Height by width = 193.86 um * 301.545 um

• Area = 58,458 um^2

• Aspect ratio = 1:1.55

• Total transistor density = 0.22 transistors/um^2
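
The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers from the presentation (the 12,730 transistor total comes from the area table); the variable names are made up, and reading the density as transistors per square micron is an assumption that the numbers happen to support.

```python
height_um, width_um = 193.86, 301.545   # die dimensions from the slide
worst_case_delay_ns = 2.25              # worst-case delay from the slide
transistors = 12_730                    # total from the area table

area_um2 = height_um * width_um
aspect = width_um / height_um
max_clock_mhz = 1e3 / worst_case_delay_ns   # 1 / delay, expressed in MHz
density = transistors / area_um2

print(f"area         = {area_um2:,.0f} um^2")
print(f"aspect ratio = 1:{aspect:.3f}")
print(f"1 / delay    = {max_clock_mhz:.0f} MHz")
print(f"density      = {density:.3f} transistors/um^2")
```

The area and density agree with the slide after rounding, the aspect ratio comes out near 1.555, and a 2.25 ns worst-case delay would allow roughly 444 MHz, so the 400 MHz estimate leaves some margin.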

Layout densities

• Active : 14.05%

• Poly : 9.25%

• Metal 1 : 33.89%

• Metal 2 : 18.00%

• Metal 3 : 14.99%

• Metal 4 : 6.29%

Layer Masks - Poly

Layer Masks – Metal 1

Layer Masks – Metal 2

Layer Masks – Metal 3

Layer Masks – Metal 4

Module                  Schematic Power (mW, 350 MHz)  Layout Power (mW)  Schematic Delay  Layout Delay
Multiplier              2.97                           N/A                3.38 ns          N/A
Multiplier w/ pipeline  ??                             ??                 1.9 ns           2.25 ns
Exponents               1.608                          2.21               1.01 ns          1.2 ns
Align                   0.094                          0.113              480 ps           637 ps
Adder                   8.48                           9.73               1.34 ns          1.7 ns
Leading 0               0.232                          0.857              506 ps           551 ps
Normalize               1.458                          1.546              407 ps           437 ps
Round                   0.631                          1.21               864 ps           986 ps
OvfCheck                0.13                           0.19               453 ps           475 ps
Registers               ??                             ??                 179 ps           193 ps
Total                   ??                             ??                 -                -

Module                  Area (um^2)  Transistor Count  Transistor Density
Multiplier w/ pipeline  20,388       4,496             0.22
Exponents               5,163        738               0.14
Align                   3,995        500               0.13
Adder                   13,202       3,174             0.24
Leading 0               1,253        364               0.29
Normalize               3,190        942               0.3
Round                   1,802        494               0.28
OvfCheck                200          70                0.35
Registers, etc          N/A          1,948             N/A
Total                   58,458       12,730            0.22

Conclusion

• More marketing

• Summarize chip functionality

• Extending applications of chip

Comments?