Top Banner
ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang
25

ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

Jan 03, 2016

Download

Documents

Norman Small
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

ARM for Wireless Applications

ARM11 Microarchitecture

On the ARMv6

Connie Wang

Page 2: ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

Advanced RISC Machines

• >75% of market for 32-bit RISC microprocessors

• ARM11 Design led by Ian Devereux

Page 3: ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

Demands of Wireless Applications

• High performance

• Low power

• Small size

• Cost

Page 4: ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

RISC for Wireless

• Strengths:– Clock rate

– Pipelining

• Weaknesses:– High code density

– Power consumption

Page 5: ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

ARM11 for Wireless

• Strengths Enhanced:– Clock rate

• Optimized interrupt and exception handling

• Minimized context switch cost

• Instruction set for media

– Pipelining• Decoupled for high

bandwidth• Retire before execution

• Weaknesses Reduced:– High code density

• ISA extensions

• Optional application specific and/or VFP coprocessors

– Power consumption• Architecture and

instructions reduce clock rate

• Clock gate control

Page 6: ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

ARM11 Microarchitecture

• First implementation of ARMv6 architecture• 8-stage pipeline• 64-bit datapaths• Frequency: up to 750 MHz, 350 – 500+ MHz

worst case. 400 – 1,200 Dhrystone MIPS• Power: 0.4 mW/MHz worst case: 0.13µm 1.2V• Will be released to licensees in Q4 2002

Page 7: ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

ARMv6

• Media support: SIMD extensions

• Improved interrupt latency

• ISA extensions THUMB, DSP, Jazelle

• 100% backwards compatibility to ARMv5

Page 8: ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

THUMB Instruction Set

• 32-bit performance for 16-bit systems• 32-bit instructions re-coded to 16-bit op-

codes• 32-bit ROM stores 2 THUMB instructions

per word• Decompressed in pipeline to ARM

instruction equivalents• Improves code density by 35%

Page 9: ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

DSP Instruction Set

• Application accelerator for Digital Signal Processor performance

• Can load/store registers by pairs

• 16x16 or 32x16 MAC in one cycle

• Utilized in MAC pipeline

Page 10: ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

Jazelle Instruction Set

• Support for entering/exiting Java applications

• Fetches/decodes Java bytecodes, maintains a Java operand stack

• Creates a state that imitates a Java processor

• OS controls low-cost switch between Java and ARM/THUMB states

Page 11: ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

SIMD Instruction Set

• Parallel processing of 2x16-bit or 4x8-bit operands

• Four new Greater than or Equal to status bits (GE[3:0]) for MAC calculations

• Eliminates need for very high clock frequencies and hardware accelerators

• 2 – 4 x performance improvement for multimedia applications

Page 12: ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

Synchronization and Sharing Data

• Load-/store- Exclusive instructions (LDREX/STREX) support semaphores– Consolidates old Swap instruction and

necessary semaphore implementation

• Virtual Memory System Architecture v6 ID’s separate caches– Cache hierarchy and ordering rules

Page 13: ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

Bit/Byte Order Support

• E-bit for current endian setting of core– Set/cleared with SETEND instruction

• REV* instructions reverse bytes for unaligned data support– REV – reverses a word– REV16 – reverses both halfwords– REVSH – reverses high order halfword + sign

extend halfword

Page 14: ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

Exception and Interrupt Improvement

• Imperative for real-time tasks wherein low latency is critical

• F1 bit in CP15 register 1 designates: 0: Max performance mode, or1: Low interrupt latency mode to allow interrupts

• VE bit enables vectored interrupts to core– Direct vs. external-> system -> vector address

• A-bit aborts all unaligned accesses• U-bit (with clear A-bit) allows unaligned hardware

access

Page 15: ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

Mode Changing and Stack Improvements

• CPSID/CPSIE instructions allow changing between modes with interrupt disable/enable

• Save Return State (SRS) saves registers and state of current mode onto stack of target mode

• Return From Exception (RFE) loads registers and state of saved mode

• Reduces exception handling overhead

Page 16: ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

8-Stage Pipeline

• Single-issue• Dynamic branch prediction is 64-entry directly

mapped BTB• 64-bit data paths: read 2 registers in 1 clock• Loads/stores done in background• Out-of-order completion: can retire instructions

before execution• ALU processed in parallel with data cache access• MAC processed in lock-step with ALU

Page 17: ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

Prefetch

L1 memory access requires 2 cycles

Page 18: ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

Decode

Decode instruction bits and allocate stack

Page 19: ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

Issue Instruction

Load operands from registers

Page 20: ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

ALU and MAC

• ALU pipeline– Shift bits– Arithmetic and logical

operations– Save state and registers

• 3-stage MAC – Can issue a 16x16

operation per cycle– Processed with ALU

pipeline

Page 21: ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

Data Cache Access

• Map memory address• Data cache load/store

requires 2 cycles

Page 22: ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

Writeback

Write results of instructions to designated memory, cache, or register

Page 23: ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

8-Stage Pipeline

Diagram by Devereau:7

Page 24: ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

Power-saving features

• >95% of registers clock gated

• WFI instruction: wait for interrupt: can disable entire clock network

• Reduced clock cycles and use of transistors

Page 25: ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

Conclusions

• ARM11 will be implemented as a family of cores – Designed for maximum performance in

wireless multimedia – A new standard in efficiency and power for

embedded applications