PS2 Programming Optimisations. George Bain, SCEE Technology Group. KRI 2003, March 21-22, 2003, Moscow, Russia.
Page 1: George Bain SCEE Technology Group

George Bain - PS2 Programming Optimisations 1KRI 2003

PS2 Programming Optimisations

George Bain

SCEE Technology Group

March 21-22, 2003

Moscow, Russia

Page 2: George Bain SCEE Technology Group

Topics

• Performance Analyser
• DMA Transfers
• Vector Units
• Graphics Synthesizer
• EE Core: CPU
• File loading

Page 3: George Bain SCEE Technology Group

Performance Analyser

• Capture snapshot of
  – EE (Core, Bus, Vu0, and Vu1)
  – GIF and GS
• 7 frames of bus activity
• Identify bottlenecks!
• Also used as a Dev Kit

Page 4: George Bain SCEE Technology Group

PS2 Memory

• CPU: 32MB RDRAM, 16K Instruction, 8K Data, 16K Scratchpad
• Graphics Synthesizer: 4MB Embedded, 8K Frame, 8K Texture
• Vector Unit 0: 4K Data, 4K Instruction
• Vector Unit 1: 16K Data, 16K Instruction

Page 5: George Bain SCEE Technology Group

DMA

• 128bit Main Data BUS running at 150 MHz
• 32MB of RDRAM
• EE RDRAM to Device = 2.4GB/Sec
• 10 DMA Channels connected to EE devices
• DMAC controls data transfer to devices
• Data transferred in 16byte units (QuadWord)
• Data must be aligned on 128bit boundary
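The 2.4GB/Sec figure follows from the bus width and clock quoted on this slide: 16 bytes per cycle at 150 MHz. The sketch below reproduces that arithmetic and shows one way to check the 128bit alignment rule before handing a buffer to the DMAC. The function names are illustrative only and do not come from any SCE library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Peak bus bandwidth in bytes per second: (width in bits / 8) * clock in Hz. */
static uint64_t bus_bytes_per_sec(uint32_t width_bits, uint64_t clock_hz)
{
    assert(width_bits % 8 == 0);
    return (uint64_t)(width_bits / 8) * clock_hz;
}

/* Nonzero if the address sits on a 16-byte (one QuadWord) boundary. */
static int is_qword_aligned(const void *p)
{
    return ((uintptr_t)p & 0xF) == 0;
}

/* Number of QuadWords needed to hold n bytes, rounded up. */
static size_t bytes_to_qwords(size_t n)
{
    return (n + 15) / 16;
}
```

With the slide's values, `bus_bytes_per_sec(128, 150000000)` gives 2,400,000,000 bytes per second.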

Page 6: George Bain SCEE Technology Group

DMA Controller

[Diagram: the DMAC on the 128bit Bus, connecting Memory (32MB), EE Core (FPU, cache), VU0 and VU1 (via their VIFs), IPU, SIF, and GIF to GS (4MB)]

• Controls data transfers between main memory or SPR to EE devices
• Handles arbitration between different DMA channels
• Processes DMA Tags
• Stall control and MFIFO are available for DMA packets

Page 7: George Bain SCEE Technology Group

Checking End of DMA Transfer

• DMA.STR Register polling
• CPU BC0F Polling

[Diagram: Main BUS activity under each polling method]

Page 8: George Bain SCEE Technology Group

Cycle Stealing

• Cycle Stealing ON or OFF?
  – Release is time between two DMA slices
  – Allow more time for CPU to access the main bus
  – However it slows down overall DMA transfer

[Diagram: Main Bus Activity with Cycle Stealing: Release Cycle, VIF DMA Slice, GIF DMA Slice, Release Cycle, GIF DMA Slice, VIF DMA Slice, Release Cycle]

Page 9: George Bain SCEE Technology Group

Memory FIFO

• MFIFO can buffer DMA packets if stall occurs on Drain DMA channel
  – when VU1 or GS becomes the bottleneck

• Avoid Data Cache and perform memory writes to 16K SPR

• Scratchpad DMA provides maximum DMA transfer speed to Memory FIFO

• Reduce main memory consumption

Page 10: George Bain SCEE Technology Group

GS FIFO

• What can cause the GS FIFO to become full?
  – Large primitives such as a full screen sprite
  – Multiple texture passes

[Diagram: timeline rows for VIF1 DMA, VU1 Run, and GIF to GS FIFO, annotated "(GS FIFO full)", "GS Pixel Engines Busy", and "GS FIFO requests data from GIF"]

Page 11: George Bain SCEE Technology Group

Draining MFIFO with VIF1

• What can cause the MFIFO to become full?
  1. If GS FIFO is full, GIF doesn’t request any data
  2. XGKICK instruction will stall VU1
  3. VIF1 stalls on sync related instructions such as MSCNT and FLUSHA

[Diagram: data flow SPR → MFIFO → VIF1 → VU1 → GIF → GS]

Page 12: George Bain SCEE Technology Group

Geometry and Texture Syncing

• 1.2 GB/Sec Bandwidth to GS
• PATH1 for Geometry and PATH3 for Textures

[Diagram: MAIN RAM on the MAIN BUS reaches the GIF by three routes: PATH 1 from VU1 MEM via VU1, PATH 2 via the VIF1 FIFO, and PATH 3 via the GIF FIFO; the GIF feeds the GS]

Page 13: George Bain SCEE Technology Group

Texture Transfer Paths

• PATH2
  – Advantages
    • Easy to transfer textures and set other GS registers
    • No geometry and texture data sync problems
  – Disadvantages
    • PATH1 will stall if PATH2 is still in progress
• PATH3
  – Advantages
    • Parallel DMA transfers through VIF1 and GIF channels
    • GIF can operate in 2 different modes when using IMAGE mode
    • Avoids PATH1 stalls when operating GIF in IMT mode
  – Disadvantages
    • Sometimes difficult to synchronize geometry and texture data

Page 14: George Bain SCEE Technology Group

GIF in Intermittent Mode

• What are the benefits?
  – Allows texture transfers via the GIF while VIF1 and VU1 continue to process data
• What are some things I should consider?
  – IMT Mode is good when loading large texture blocks
  – If GIF is constantly being occupied by PATH1 then texture transfer via PATH3 is reduced
  – Can’t draw and transfer textures at same time!
  – Batch textures together to limit overhead!

Page 15: George Bain SCEE Technology Group

GIF IMT Mode OFF

[Diagram: timeline of GIF DMA (Texture, Geometry, Complete), VIF1 DMA, and VU1 (Running, Stalling) with IMT mode off]

Page 16: George Bain SCEE Technology Group

GIF IMT Mode ON

[Diagram: timeline of GIF DMA (Texture, Geometry), VIF1 DMA, and VU1 (Running, No XGKICK Stall) with IMT mode on]

Page 17: George Bain SCEE Technology Group

Packing Texture Data

• Pack 4-Bit and 8-Bit texture data
  – 32-Bit textures provide maximum transfer speed
  – 4/8-Bit textures must be converted by the GS
• Consider the transfer speed and block layouts
  – 16 and 32-Bit pixel modes have very similar speeds

Format   Size W   Size H   PATH3 MB/S   PATH2 MB/S
32-Bit   256      256      1070         1090
16-Bit   256      256      1050         1075
8-Bit    256      256      785          800
4-Bit    256      256      380          385
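The MB/S figures on this slide describe raw throughput, but a smaller format also moves fewer bytes. The sketch below turns a rate into a per-texture transfer time. It assumes 1 MB means 1,000,000 bytes, which the slide does not state, and the function names are illustrative.

```c
#include <assert.h>

/* Bytes occupied by a w x h texture at the given bits per texel. */
static unsigned long texture_bytes(unsigned w, unsigned h, unsigned bpp)
{
    assert(bpp == 4 || bpp == 8 || bpp == 16 || bpp == 32);
    return (unsigned long)w * h * bpp / 8;
}

/* Transfer time in microseconds.
   Assumption: 1 MB = 1,000,000 bytes, so bytes / (MB per second)
   comes out directly in microseconds. */
static double transfer_us(unsigned long bytes, double mb_per_sec)
{
    assert(mb_per_sec > 0.0);
    return (double)bytes / mb_per_sec;
}
```

With the PATH3 figures, a 256 x 256 32-Bit texture (262,144 bytes at 1070 MB/S) takes roughly 245 microseconds, while the 8-Bit version (65,536 bytes at 785 MB/S) takes roughly 83, so the packed format still arrives sooner despite the lower rate.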

Page 18: George Bain SCEE Technology Group

VCL Tool

• Application that simplifies Vu1 Programming
• Available for Linux and Windows
• Generates VSM source code
• Handles many tasks
  – Dual Pipeline processing
  – Loop unrolling
  – Register allocation
  – Instruction scheduling

Page 19: George Bain SCEE Technology Group

Vu0 Usage

• Transferring Data to Vu0
  – Over the Cop2 connection you can transfer 1QW in 2 cycles
  – By DMA transfer you can transfer 1QW in 4 cycles
• Processing Data with Vu0
  – Vu0 running Micro code
  – Triple Buffer Scratchpad memory
    • Transfer data to Block A
    • Process Block A and Transfer Block B
    • Drain Block A, Process B, Transfer C
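The triple-buffer rotation above can be sketched as a small scheduling function that reports which block is being filled, processed, and drained at each step. This is an illustration of the rotation only: the names are hypothetical and no DMA or Vu0 work is actually issued.

```c
#include <assert.h>

enum { NUM_BLOCKS = 3 };   /* Block A = 0, Block B = 1, Block C = 2 */

/* Block index assigned to each role at one step; -1 means the role is idle. */
struct roles { int transfer; int process; int drain; };

/* Step 0 is the first transfer. A chunk is transferred at step k,
   processed at step k + 1, and drained at step k + 2. */
static struct roles roles_at(int step, int total_chunks)
{
    struct roles r;
    assert(step >= 0 && total_chunks > 0);
    r.transfer = (step < total_chunks) ? step % NUM_BLOCKS : -1;
    r.process  = (step >= 1 && step - 1 < total_chunks) ? (step - 1) % NUM_BLOCKS : -1;
    r.drain    = (step >= 2 && step - 2 < total_chunks) ? (step - 2) % NUM_BLOCKS : -1;
    return r;
}
```

At step 2 this yields drain A, process B, transfer C, matching the last bullet. At step 3, Block A is free again and receives the next transfer.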

Page 20: George Bain SCEE Technology Group

Geometry Data Transfer

• Reduce memory consumption and bandwidth
  – Remember Vector Unit register VF00.w = 1.0

4QW Per Vertex          3QW Per Vertex
1.0f  Z     Y   X       A   B   G  R
1.0f  1.0f  T   S       Ny  Nx  T  S
A     B     G   R       Nz  Z   Y  X
1.0f  Nz    Ny  Nx
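The saving comes from dropping the constant 1.0f words, which the VU can take from VF00.w instead, and using the freed slots for the normal. A minimal sketch of both layouts as C structs follows. The type names are hypothetical, and storing colour as 32-bit integers is an assumption, since the slide only shows that each component occupies one word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { float x, y, z, w; } qword_f;          /* one QuadWord of floats */
typedef struct { uint32_t r, g, b, a; } qword_rgba;    /* assumed integer colour */

/* 4QW layout: every attribute padded to a full QuadWord with 1.0f. */
typedef struct {
    qword_f    pos;   /* X  Y  Z  1.0f */
    qword_f    st;    /* S  T  1.0f 1.0f */
    qword_rgba col;   /* R  G  B  A */
    qword_f    nrm;   /* Nx Ny Nz 1.0f */
} vertex_4qw;

/* 3QW layout: the normal fills the slots that held constants. */
typedef struct {
    qword_rgba col;          /* R  G  B  A */
    float s, t, nx, ny;      /* S  T  Nx Ny */
    float x, y, z, nz;       /* X  Y  Z  Nz */
} vertex_3qw;

/* Repack one vertex; the 1.0f constants are simply not copied. */
static vertex_3qw pack_vertex(const vertex_4qw *v)
{
    vertex_3qw p;
    assert(v != NULL);
    p.col = v->col;
    p.s = v->st.x;   p.t = v->st.y;
    p.nx = v->nrm.x; p.ny = v->nrm.y;
    p.x = v->pos.x;  p.y = v->pos.y;  p.z = v->pos.z;
    p.nz = v->nrm.z;
    return p;
}
```

The packed struct is 48 bytes against 64, a 25% reduction in memory and DMA traffic per vertex.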

Page 21: George Bain SCEE Technology Group

Compress Geometry Data

• use the VIF to convert integer to float
• use the VU to convert integer to float

Vector      Unpack Mode   VU Instruction
X,Y,Z       16 Bit        ITOF0
S,T         16 Bit        ITOF12
RGBA        8 Bit         ITOF0
Nx,Ny,Nz    16 Bit        ITOF15

Compress 4 QW to 1.25 QW
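The ITOF instructions named on this slide treat an integer as fixed point with 0, 12, or 15 fractional bits and produce a float. The sketch below emulates that effect in portable C and adds up the compressed vertex size from the unpack modes listed. It is a CPU-side illustration with made-up function names, not VU code.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates the effect of ITOF<n>: a signed fixed-point integer with
   frac_bits fractional bits becomes a float (value / 2^frac_bits). */
static float itof(int32_t v, unsigned frac_bits)
{
    assert(frac_bits < 31);
    return (float)v / (float)(1u << frac_bits);
}

/* Bytes per compressed vertex for the unpack modes on the slide. */
static unsigned compressed_vertex_bytes(void)
{
    return 3 * 2    /* X,Y,Z     16 Bit each */
         + 2 * 2    /* S,T       16 Bit each */
         + 4 * 1    /* RGBA       8 Bit each */
         + 3 * 2;   /* Nx,Ny,Nz  16 Bit each */
}
```

The total is 20 bytes, which is 1.25 QuadWords of 16 bytes and agrees with the slide. For example, `itof(16384, 15)` gives 0.5f, the kind of value a normal component would hold.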

Page 22: George Bain SCEE Technology Group

GS Frame Buffers

• Total of 4 MB of Embedded DRAM
• Draw, Display, Z and Texture Buffers
• What are some recommended buffer sizes?
  – PAL (512 x 512), NTSC (512 x 448)
  – Progressive scan support with full height buffers
• 2-Circuits of the GS to reduce interlace flicker
  – alpha blend odd/even fields at no cost

Page 23: George Bain SCEE Technology Group

GS Capabilities

• Bandwidth
  – Massive total of 48 GB/Sec
  – Frame Buffer 38.4 GB/Sec
  – Texture Buffer 9.6 GB/Sec
• Drawing Speed
  – 16 Pixel for non-textured (2.4 Gpixels/Sec)
    • 75M Flat shaded Triangles/Sec
  – 8 Pixel for textured (1.2 Gpixels/Sec)
    • 37.5M Textured and Gouraud shaded Triangles/Sec
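The drawing-speed figures are consistent with 16 or 8 pixels per cycle at a nominal 150 MHz clock. That clock value is inferred from the slide's numbers and is not stated on it. The sketch below checks the arithmetic and derives the triangle size implied by the peak triangle rates. Function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Peak fill rate: pixels written per cycle times clock frequency. */
static uint64_t pixels_per_sec(unsigned pixels_per_cycle, uint64_t clock_hz)
{
    return (uint64_t)pixels_per_cycle * clock_hz;
}

/* Average triangle area, in pixels, at which the peak fill rate and
   the peak triangle rate are reached together. */
static uint64_t pixels_per_triangle(uint64_t fill_rate, uint64_t tris_per_sec)
{
    assert(tris_per_sec > 0);
    return fill_rate / tris_per_sec;
}
```

Both quoted pairs (2.4 Gpixels/Sec with 75M triangles, 1.2 Gpixels/Sec with 37.5M triangles) work out to 32 pixels per triangle.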

Page 24: George Bain SCEE Technology Group

GS Pipeline

[Diagram: Emotion Engine → Host IF → Set-up and Rasterizing → Pixel Pipeline x 16 → Memory IF → VRAM 4MB (Frame Buffer, Texture Buffer) at 48 GB/Sec; PCRTC → Video Out]

Page 25: George Bain SCEE Technology Group

GS Frame/Z Cache

[Diagram: Frame 32x32 (4K) and Z 32x32 (4K)]

• Quick Page refills!
  – 8192bits per cycle
  – 8K page buffer refilled in 8 GS cycles

Page 26: George Bain SCEE Technology Group

Reducing Frame Page Misses

• Fill rate is roughly constant if varying height
• Wide Primitives will cause page misses
  – Use 32 Pixel wide strips to reduce page misses
• Rarely drop below 1Gpixel/Sec if miss occurs
• Primitives using textures greater than a page size are usually more of a problem
• 8Bit texture page is 128x64
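One way to apply the 32 Pixel strip advice is to cut a wide primitive into vertical strips before submitting it. The sketch below splits a horizontal range into strips that never cross a 32-pixel column boundary. Aligning the cuts to multiples of 32 is an assumption made here for illustration, and the names are hypothetical.

```c
#include <assert.h>

typedef struct { int x0, x1; } strip;   /* half-open pixel range [x0, x1) */

/* Splits [x0, x1) into strips at most 32 pixels wide, cut at multiples
   of 32. Returns the number of strips written (at most max_strips). */
static int split_into_strips(int x0, int x1, strip *out, int max_strips)
{
    int n = 0;
    assert(x0 >= 0 && x1 >= x0 && out != 0);
    while (x0 < x1 && n < max_strips) {
        int next = (x0 / 32 + 1) * 32;   /* next column boundary */
        if (next > x1)
            next = x1;
        out[n].x0 = x0;
        out[n].x1 = next;
        n++;
        x0 = next;
    }
    return n;
}
```

A 640 pixel wide full screen sprite becomes 20 strips, each of which would then be drawn top to bottom as its own sprite.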

Page 27: George Bain SCEE Technology Group

Texture Fill Rates

• Texture Page misses have biggest effect
  – Subdivide large texture co-ordinate ranges
  – Keep mip-maps in the same page
• Texture reduction reduces the fill rate
  – 32 pixel wide strips won’t increase performance
  – Texel read becomes bottleneck
• Texture expansion doesn’t affect fill rate

Page 28: George Bain SCEE Technology Group

Fill Rate VS Triangle Size

[Chart: fill rate (axis 0 to 1500) against triangle size (2x2, 4x4, 8x8, 16x16, 32x32, 64x64, 128x128, 256x256); series: Untextured, Textured*]

*Texture is on cache without reducing size

Page 29: George Bain SCEE Technology Group

Level Of Detail

• Make better use of LOD!
  – 5000 polygon model may result in just 50 visible pixels once projected onto the screen
  – there’s also no point having detailed textures that are going to be shrunk so much
• Mip Mapping
  – Improve visual quality
  – Mip maps in different pages can cause multiple texture cache reloads

Page 30: George Bain SCEE Technology Group

Multi-Pass Rendering

• GS Alpha Blend operation is free!
• Maximum textured fill rate is 1.2G Pixels/Sec
  – Limit number of passes (4 passes = 300M P/S)
• Fur rendering
  – Reduce passes when object in distance
• Bump-mapping is possible
  – Technique requires full screen passes
• Back face cull to reduce GS stalls
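The pass limit is simple division: each extra pass over the same pixels divides the useful fill rate. The sketch below reproduces the slide's 4-pass figure and converts it into a per-frame pixel budget. The frame rate is a hypothetical input, not a value from the slide, and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Useful fill rate when every pixel is drawn `passes` times. */
static uint64_t effective_fill_rate(uint64_t peak_pixels_per_sec, unsigned passes)
{
    assert(passes > 0);
    return peak_pixels_per_sec / passes;
}

/* Pixels available per frame at a given frame rate (caller's assumption). */
static uint64_t pixels_per_frame(uint64_t fill_rate, unsigned fps)
{
    assert(fps > 0);
    return fill_rate / fps;
}
```

At the slide's 1.2G Pixels/Sec, four passes leave 300M pixels per second. At an assumed 60 frames per second that is a budget of 5M multi-pass pixels per frame.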

Page 31: George Bain SCEE Technology Group

GS Fog

[Chart: fill rate (axis 0 to 1200) against triangle size (2x2, 4x4, 8x8, 16x16, 32x32, 64x64, 128x128, 256x256); series: Textured*, Texture*+Fog]

*Texture is on cache without reducing size

Page 32: George Bain SCEE Technology Group

Alternative Fog

• Technique 1
  – 1st pass draw a textured polygon
  – 2nd pass alpha blend gouraud shaded polygon
• Technique 2
  – Post-process and perspective correct fogging
  – Move bits 8-15 of Z-Buffer into Alpha of Draw Buffer
  – Alpha blend full screen gouraud shaded polygon onto Draw Buffer
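The bit movement in Technique 2 can be shown on the CPU, although on hardware it would presumably be done by the GS itself rather than by code like this. The sketch assumes 32-bit pixels with alpha in bits 24-31. That layout and the function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extracts bits 8-15 of a Z value for use as a fog factor. */
static uint8_t fog_alpha_from_z(uint32_t z)
{
    return (uint8_t)((z >> 8) & 0xFF);
}

/* Writes the fog factor into the alpha byte of each draw-buffer pixel.
   Assumption: alpha occupies bits 24-31 of a 32-bit pixel. */
static void z_to_alpha(uint32_t *draw, const uint32_t *zbuf, size_t n)
{
    size_t i;
    assert(draw != NULL && zbuf != NULL);
    for (i = 0; i < n; i++) {
        uint32_t a = fog_alpha_from_z(zbuf[i]);
        draw[i] = (draw[i] & 0x00FFFFFFu) | (a << 24);
    }
}
```

Once alpha holds a depth-derived value, the full screen gouraud shaded polygon in the fog colour can be blended using that destination alpha.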

Page 33: George Bain SCEE Technology Group

CPU Optimisations

• Emotion Engine Core
  – FPU (Coprocessor 1)
  – Vu0 (Coprocessor 2)
  – 16K Instruction Cache
  – 8K Data Cache
  – 16K Scratch-Pad Memory
• Instruction Set
  – 64Bit MIPS III and some MIPS IV
  – 128Bit Multi-Media

Page 34: George Bain SCEE Technology Group

Multi-Media Instructions

• 128-Bit Multi-Media Instructions
• Parallel Processing
  – 64 bits x2, 32 bits x4, 16 bits x8, 8 bits x16
• Image format conversions
• Sound decompressing
• Pack DMA packets
  – Convert PACKED mode to REGLIST mode
  – Smaller data, faster DMA transfers!
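To make the "16 bits x8" case concrete, the sketch below emulates a 128-bit parallel add over eight 16-bit lanes in portable C. On the EE a single multimedia instruction would perform all eight additions at once. The type and function names here are made up for illustration, and the wrapping behaviour on overflow is a choice of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit value viewed as eight independent 16-bit lanes. */
typedef struct { uint16_t h[8]; } vec128_h;

/* Lane-wise add; each lane wraps on overflow without affecting its neighbours. */
static vec128_h padd_h(vec128_h a, vec128_h b)
{
    vec128_h r;
    int i;
    for (i = 0; i < 8; i++)
        r.h[i] = (uint16_t)(a.h[i] + b.h[i]);
    return r;
}
```

The same lane view applies to the other widths on the slide: two 64-bit, four 32-bit, or sixteen 8-bit lanes per 128-bit register.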

Page 35: George Bain SCEE Technology Group

Use of Data Cache

• Data Suitable for the Data Cache
  – Data that is frequently read or written repeatedly
  – Data with a high degree of locality
• Don’t use Data Cache for
  – Data that gets used only once
  – Big chunks of data larger than 8K

Page 36: George Bain SCEE Technology Group

Reduce Cache Misses

• Prefetch instruction to load data beforehand
• Reduce the size of your code for I$
• Use Uncached memory for data r/w only once
• Performance Counter Lib to measure misses
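A prefetch is issued some distance ahead of the data currently being consumed so that the load overlaps with useful work. The sketch below uses the GCC and Clang builtin `__builtin_prefetch` as a stand-in for the prefetch instruction. The lookahead of 16 floats (one 64-byte line) is an assumed tuning value, not a figure from the slide.

```c
#include <assert.h>
#include <stddef.h>

enum { LOOKAHEAD = 16 };   /* assumed: 16 floats = 64 bytes ahead */

/* Sums an array, requesting the next block of data once per block. */
static float sum_with_prefetch(const float *data, size_t n)
{
    float sum = 0.0f;
    size_t i;
    assert(data != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if ((i % LOOKAHEAD) == 0 && i + LOOKAHEAD < n)
            __builtin_prefetch(&data[i + LOOKAHEAD], 0, 0);  /* read, no reuse */
        sum += data[i];
    }
    return sum;
}
```

The right distance depends on how much work the loop does per element, which is where measuring misses with the performance counters comes in.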

Page 37: George Bain SCEE Technology Group

Scratchpad Memory

• 16K of high-speed memory (access directly)
• 2 dedicated DMA Channels (toSPR/fromSPR)
• SPR DMA provides best throughput
  – 100% Occupy and 85% Send
• Data Suitable for the SPR
  – Frequently used data where speed is a priority
  – Big chunks of data can be Double Buffered on SPR memory
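Double buffering on the SPR means splitting the 16K into two halves: one is processed while the other is filled. The sketch below shows the control flow on ordinary memory, with `memcpy` standing in for the toSPR DMA channel, so the two steps run one after the other here rather than in parallel. All names and the 8K split are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SPR_BYTES = 16 * 1024, HALF = SPR_BYTES / 2 };

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Sums all bytes of src by streaming it through two 8K halves. */
static uint32_t sum_double_buffered(const uint8_t *src, size_t n)
{
    static uint8_t spr[SPR_BYTES];       /* stand-in for scratchpad memory */
    uint32_t sum = 0;
    size_t off = 0, len, i;
    int cur = 0;

    assert(src != NULL);
    len = min_size(n, (size_t)HALF);
    memcpy(spr, src, len);               /* prime the first half */
    while (len > 0) {
        size_t next_off = off + len;
        size_t next_len = min_size(n - next_off, (size_t)HALF);
        /* On hardware this transfer would overlap with the loop below. */
        memcpy(spr + (size_t)(cur ^ 1) * HALF, src + next_off, next_len);
        for (i = 0; i < len; i++)
            sum += spr[(size_t)cur * HALF + i];
        off = next_off;
        len = next_len;
        cur ^= 1;                        /* swap halves */
    }
    return sum;
}
```

On the real machine the copy would be a DMA kicked before the processing loop and waited on after it, which is what hides the transfer time.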

Page 38: George Bain SCEE Technology Group

CD/DVD Optimisations

• Align destination buffer on 64 Bytes
  – Increase performance by 25%!
• Combine files into a PAK file to reduce the number of files
• Avoid seeking when you could be reading
• Load the most data you can per read
  – Combine IOP modules and load into EE
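A destination buffer aligned on 64 Bytes can be obtained with the standard C11 allocator, as sketched below. The PS2 toolchain may offer its own aligned allocation call, so treat `aligned_alloc` here as a portable stand-in and the helper names as hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Rounds n up to the next multiple of 64. */
static size_t round_up_64(size_t n)
{
    return (n + 63) & ~(size_t)63;
}

/* Allocates a read buffer whose address is a multiple of 64.
   The size is rounded up because C11 aligned_alloc expects a size
   that is a multiple of the alignment. Release with free(). */
static void *alloc_read_buffer(size_t n)
{
    assert(n > 0);
    return aligned_alloc(64, round_up_64(n));
}
```

Rounding the size up also leaves room for reads that are issued in whole 64-byte units.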

Page 39: George Bain SCEE Technology Group

Summary

• PA will push developers to the limit!
• Parallel Texture and Geometry Transfer
• DMA is flexible and very powerful!
• Take into consideration GS page sizes
• Vector Unit 0 and Scratchpad memory
• Check assembler output of generated code

Page 40: George Bain SCEE Technology Group

Contact Information

[email protected]

• Website for Licensed Developers
  – www.ps2-pro.com
• SCEE DevStation 2003
  – www.devstation.scee.com