ARM big.LITTLE Technology Advanced Seminar – Computer Engineering Philipp Gsching 08.12.2015 1

Jun 21, 2020


ARM big.LITTLE Technology

Advanced Seminar – Computer Engineering Philipp Gsching 08.12.2015


1. Introduction
2. ARM Architecture
   1. Instruction Set
   2. Microarchitecture
   3. CPUs
3. big.LITTLE
   1. Cache Coherency
   2. Distributed Virtual Memory
   3. Performance
4. Conclusion


Smartphone/Tablet use cases:

1. Idle most of the time → low-power CPU

2. High-performance requirements → high-performance CPU

Difficult to achieve with one CPU


Idea: ARM big.LITTLE

Fusing a low-power and a high-performance CPU in one chip

LITTLE: • OS • UI • Internet • E-Mail • …

big: • Gaming • HD videos • Rich Web Services • …

[Figure: a LITTLE cluster (Cortex A53, cores 1–4, L2 Cache) and a big cluster (Cortex A57, cores 1–4, L2 Cache) joined by the Cache Coherent Interconnect]


Basics


Advanced RISC Machines

Founded: 1990 by Acorn, Apple and VLSI

Origin: Microcontrollers / Embedded Systems

Business model: design and licensing of Intellectual Property (IP)

Revenue: 1.2 billion USD (Intel: 55.8 billion USD)

Employees: 3,300 (Intel: 106,700)

Market share: > 90% (2014, smartphone/tablet)


ARM Instruction Set:

RISC (Reduced Instruction Set Computing)


RISC (ARM)

MOV r2, #8
MUL r1, r1, r2
ADD r0, r0, r1
ADD r0, r0, #4
LDR r3, [r0]
ADD r3, r3, #1
STR r3, [r0]

CISC (IA-32)

ADD $1, 4(%eax, %ebx, 8)


Not strictly RISC

ARM Instruction Set: RISC (Reduced Instruction Set Computing)

16 general purpose registers + 2 status registers

32-bit fixed-size instructions

Condition Codes for (almost) all instructions

Barrel Shifter for ALU

16-bit fixed-size THUMB instructions

Digital Signal Processing (DSP) instructions

Cryptography Extension Instructions


RISC (ARM)

MOV r2, #8
MUL r1, r1, r2
ADD r0, r0, r1
ADD r0, r0, #4
LDR r3, [r0]
ADD r3, r3, #1
STR r3, [r0]

Shorter ARM sequence using the barrel shifter and pre-indexed addressing:

ADD r0, r0, r1, LSL #3
LDR r3, [r0, #4]!
ADD r3, #1
STR r3, [r0]

CISC (IA-32)

ADD $1, 4(%eax, %ebx, 8)

Microcode
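The equivalence of the instruction sequences on this slide can be checked with a small model. The following is only a sketch: memory is modelled as a Python dict, registers as local variables, and the function names are made up for illustration. It suggests that both ARM sequences and the single IA-32 instruction increment the word at address base + index * 8 + 4.

```python
def naive_arm(mem, r0, r1):
    """Seven-instruction ARM sequence, one statement per instruction."""
    r2 = 8            # MOV r2, #8
    r1 = r1 * r2      # MUL r1, r1, r2
    r0 = r0 + r1      # ADD r0, r0, r1
    r0 = r0 + 4       # ADD r0, r0, #4
    r3 = mem[r0]      # LDR r3, [r0]
    r3 = r3 + 1       # ADD r3, r3, #1
    mem[r0] = r3      # STR r3, [r0]
    return mem


def shifted_arm(mem, r0, r1):
    """Four-instruction ARM sequence using the barrel shifter."""
    r0 = r0 + (r1 << 3)   # ADD r0, r0, r1, LSL #3
    r0 = r0 + 4           # LDR r3, [r0, #4]!  (address written back to r0)
    r3 = mem[r0]          #   ... and the load itself
    r3 = r3 + 1           # ADD r3, #1
    mem[r0] = r3          # STR r3, [r0]
    return mem


def ia32(mem, eax, ebx):
    """Single CISC instruction: ADD $1, 4(%eax, %ebx, 8)."""
    mem[eax + ebx * 8 + 4] += 1
    return mem


# Hypothetical values: base address 100, index 3, so the target address is 128.
results = [f({128: 41}, 100, 3) for f in (naive_arm, shifted_arm, ia32)]
print(results)
```

All three calls leave the same memory contents, which is the point of the comparison: the work is the same, only the number of instructions differs.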


Instruction Set Architecture (ISA) has no significant impact on performance and power consumption

[Figure: Average Power (normalized to A8), technology-independent: scaled to 1 GHz and a 45 nm process. Processors compared: A8 (ARM, 0.6 GHz, 65 nm, iPhone 4), A15 (ARM, 1.66 GHz, 32 nm, Galaxy S4), Atom (x86, 1.66 GHz, 45 nm, Netbook), i7 (x86, 3.4 GHz, 32 nm, Desktop)]


ARM Microarchitecture:

Technology-node and feature size

Voltage and Frequency Scaling

Power-domains

Clock-gating

Power-modes

Pipelining

Caches

SoC (System-On-A-Chip) design

Technology-node and feature size: reducing capacitance

Voltage and Frequency Scaling: dynamically adjusting supply voltage and clock speed according to need
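Why lowering voltage and clock together pays off can be illustrated with the standard first-order model of dynamic CMOS power, P = a · C · V² · f. The sketch below uses this textbook formula; the capacitance, voltage and frequency values are hypothetical and are not taken from the slides.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz, activity=1.0):
    """First-order dynamic power model: P = a * C * V^2 * f (in watts)."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical operating points for one core (illustrative numbers only).
fast = dynamic_power(1e-9, 1.2, 1.9e9)   # high voltage, high clock
slow = dynamic_power(1e-9, 0.9, 1.0e9)   # lower voltage, lower clock

print(f"fast: {fast:.2f} W, slow: {slow:.2f} W, ratio: {slow / fast:.2f}")
```

Because the voltage enters quadratically, reducing it alongside the frequency saves more power than frequency scaling alone.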

Power-domains: power supply for different sections of the core can be turned on/off independently

Clock-gating: clock for different sections of the core can be turned on/off independently

Power-modes: predefined low-power modes utilizing the above-mentioned features

Pipelining: reducing idle time of different parts of the core

Caches: reducing time- and power-intensive accesses to main memory

SoC (System-On-A-Chip) design: adjusting all components of a processor to one another

ARM's emphasis is on power consumption and size

Momentum for mobile market


[Figure: SoC block diagram. A cluster contains Core 1 (with instruction and data µTLBs and instruction and data L1 caches), an MMU with TLB, the Snoop Control Unit (SCU), a shared L2 cache and a bus arbiter]

Cortex A53: 8-stage (integer), in-order

Cortex A57: 15-stage (integer), out-of-order


                          LITTLE            big
CPU                       Cortex A53        Cortex A57
64-bit                    Yes               Yes
Cores                     1 – 4             1 – 4
Frequency*                1.3 GHz           1.9 GHz
L1 Cache                  8 – 64 kB         48/32 kB
L2 Cache                  128 – 2,048 kB    512 – 2,048 kB
Pipeline integer depth    8                 15
Out-of-order              No                Yes
Performance               2.3 DMIPS/MHz     4.1 DMIPS/MHz
Technology node*          20 nm             20 nm
Core size*                0.70 mm²          2.05 mm²
Cluster size*             4.58 mm²          15.10 mm²

* Values for SoC Samsung Exynos 5433 (Galaxy Note 4)
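The per-MHz figures in the table can be turned into peak throughput per core and per cluster. The sketch below simply multiplies DMIPS/MHz by the quoted clock; the dictionary layout and function name are illustrative, and real workloads will not scale perfectly across cores.

```python
# Values copied from the table above (Exynos 5433 clocks).
CORES = {
    "Cortex A53": {"dmips_per_mhz": 2.3, "mhz": 1300},
    "Cortex A57": {"dmips_per_mhz": 4.1, "mhz": 1900},
}


def peak_dmips(name, n_cores=1):
    """Peak DMIPS assuming ideal scaling over n_cores (an idealisation)."""
    core = CORES[name]
    return core["dmips_per_mhz"] * core["mhz"] * n_cores


for name in CORES:
    print(name, round(peak_dmips(name)), round(peak_dmips(name, 4)))
```

Under these assumptions one A57 core delivers roughly 2.6 times the throughput of one A53 core at the listed frequencies.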

[Figure: Cortex-A Power Consumption. Power Consumption (W, 0 to 8) versus Frequency (MHz, 400 to 1900) for A53 (1 Core), A53 (4 Cores), A57 (1 Core) and A57 (4 Cores). SoC: Samsung Exynos 5433 (Galaxy Note 4)]

Heterogeneous multi-processing

Connecting two heterogeneous clusters…

[Figure: LITTLE cluster (Cortex A53, cores 1–4, L2 Cache) and big cluster (Cortex A57, cores 1–4, L2 Cache)]

Binary compatible

[Figure: the L2 Cache of each cluster (LITTLE Cortex A53, big Cortex A57) exposes an AXI interface]

AXI = Advanced eXtensible Interface

AXI channels: Read_Address, Read_Data, Write_Address, Write_Data, Write_Ack

ACE channels (in addition to the AXI channels): C_Address, C_Data, C_Response

ACE = AXI Coherency Extension

[Figure: both clusters attach through their AXI/ACE interfaces to the Cache Coherent Interconnect]

[Figure: SoC overview showing the Cache Coherent Interconnect, the two CPU clusters (cores 1–4 and an L2 Cache each), GPU, BUS, Memory Controller, Display and Periphery]

Coherency States

          Valid                            Invalid
          Unique          Shared
Dirty     Unique Dirty    Shared Dirty     Invalid
Clean     Unique Clean    Shared Clean

Analogous to the MOESI protocol: Modified, Owned, Exclusive, Shared, Invalid
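The correspondence between the five ACE states and MOESI can be written down as a lookup table. The mapping below follows the usual reading of the two protocols (dirty and unique corresponds to Modified, dirty and shared to Owned, and so on); the helper function and its argument names are illustrative only.

```python
# ACE cache-line state -> closest MOESI state.
ACE_TO_MOESI = {
    "UniqueDirty": "Modified",
    "SharedDirty": "Owned",
    "UniqueClean": "Exclusive",
    "SharedClean": "Shared",
    "Invalid": "Invalid",
}


def ace_state(valid, unique=False, dirty=False):
    """Build the ACE state name from the three attributes in the table."""
    if not valid:
        return "Invalid"
    return ("Unique" if unique else "Shared") + ("Dirty" if dirty else "Clean")


state = ace_state(valid=True, unique=False, dirty=True)
print(state, "->", ACE_TO_MOESI[state])
```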

[Figure: LITTLE (Cortex A53) and big (Cortex A57) clusters, each with a cache, attached via AXI/ACE to the Cache Coherent Interconnect (CCI). The slides annotate cache line A with a subscript: Au = unique, As = shared; A′ marks the modified value.]

1. LITTLE load(A)
2. CCI snoop(A)
3. big resp(miss)
4. CCI load_mem(A) to main memory
5. CCI return(A); the LITTLE cache now holds Au
6. big load(A)
7. CCI snoop(A)
8. LITTLE resp(hit) return(A); the LITTLE copy becomes As
9. CCI return(A); both caches now hold As
10. LITTLE makeUnique(A), because core 1 wants to modify A
11. CCI invalidate(A)
12. big invalidated resp(ack); the big cache no longer holds A
13. CCI resp(isUnique); the LITTLE copy is Au again
14. LITTLE store(A); the LITTLE cache holds A′u
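The fourteen steps can be replayed with a toy model. This is a deliberately simplified sketch, not the CCI-400 implementation: each cluster has a single cache, only the unique/shared distinction is tracked (no dirty bit, no write-back), and the class and method names are invented for illustration.

```python
class Cluster:
    """One CPU cluster with a single cache: address -> [state, value]."""

    def __init__(self, name):
        self.name = name
        self.cache = {}


class CCI:
    """Toy interconnect that snoops the other clusters before going to memory."""

    def __init__(self, memory, clusters):
        self.memory = memory
        self.clusters = clusters

    def _others(self, requester):
        return [c for c in self.clusters if c is not requester]

    def load(self, requester, addr):
        for other in self._others(requester):       # steps 2 and 7: snoop
            if addr in other.cache:                  # step 8: hit
                other.cache[addr][0] = "shared"
                value = other.cache[addr][1]
                requester.cache[addr] = ["shared", value]
                return value                         # step 9
        value = self.memory[addr]                    # steps 3-4: miss, read memory
        requester.cache[addr] = ["unique", value]    # step 5
        return value

    def store(self, requester, addr, value):
        # The requester must already hold the line (as in the walk-through).
        if requester.cache[addr][0] != "unique":     # step 10: makeUnique
            for other in self._others(requester):    # steps 11-12: invalidate
                other.cache.pop(addr, None)
        requester.cache[addr] = ["unique", value]    # steps 13-14


little, big = Cluster("LITTLE"), Cluster("big")
cci = CCI({"A": 1}, [little, big])
cci.load(little, "A")        # LITTLE: unique
cci.load(big, "A")           # both: shared
cci.store(little, "A", 2)    # LITTLE: unique and modified, big: invalid
print(little.cache, big.cache)
```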

Distributed Virtual Memory (DVM):

• Threads on different cores share the same virtual memory

• Core A causes a change in the page table → core B's TLB entry is out of date

• Core A issues an invalidation message → CCI broadcasts the TLB entry invalidation → Core B invalidates its TLB entry

TLBs are read-only → DVM messages can only invalidate entries (they can't fetch entries)

[Figure: core A and core B in different clusters (LITTLE Cortex A53, big Cortex A57), each cluster with a TLB and an L2 Cache, attached via AXI/ACE to the Cache Coherent Interconnect]
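The invalidation flow described in the bullets can be sketched as follows. This is an assumption-laden toy: the page table is a dict from virtual page to physical frame, the broadcast is a plain loop standing in for the CCI, and all names are hypothetical.

```python
class Core:
    """A core with a private TLB that caches page-table entries."""

    def __init__(self, name):
        self.name = name
        self.tlb = {}

    def translate(self, page_table, vpage):
        if vpage not in self.tlb:                 # TLB miss: walk the page table
            self.tlb[vpage] = page_table[vpage]
        return self.tlb[vpage]

    def invalidate(self, vpage):
        self.tlb.pop(vpage, None)                 # DVM can only drop entries


def remap(page_table, cores, vpage, new_frame):
    """Change a mapping and broadcast the TLB invalidation to all cores."""
    page_table[vpage] = new_frame
    for core in cores:                            # stands in for the CCI broadcast
        core.invalidate(vpage)


page_table = {0x10: 0xA0}
core_a, core_b = Core("A"), Core("B")
core_b.translate(page_table, 0x10)                # core B caches the old mapping
remap(page_table, [core_a, core_b], 0x10, 0xB0)   # core A changes the page table
print(hex(core_b.translate(page_table, 0x10)))    # refetched after invalidation
```

Without the broadcast in `remap`, core B would keep translating through its stale TLB entry.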

ACE Performance:

ACE clock can be an integer fraction of the CPU clock (including 1:1)
16 simultaneous write commands per cluster
8 simultaneous read commands per core

Snoop Performance:

SCU is clocked with the CPU clock
8 simultaneous snoops per cluster
Snoop response after:
13 cycles (L2 hit)
16 cycles (L1 hit)
6 cycles (cache miss)

Queue of 8 transactions, 16-cycle wait
Each transaction returns one cache line
64-byte cache line width
128-bit data channel width (16 bytes) → 4 cycles per cache line

32 kB L1 transfer: $\frac{32\,\text{kB}}{16\,\text{B}} + 16 = 2{,}064$ cycles

2 MB L2 transfer: $\frac{2\,\text{MB}}{16\,\text{B}} + 13 = 131{,}085$ cycles
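The transfer estimate can be reproduced in a few lines. The sketch assumes binary units (1 kB = 1,024 bytes), one 16-byte beat per cycle, and the snoop latencies quoted earlier (16 cycles for an L1 hit, 13 for an L2 hit); the function names are illustrative.

```python
def transfer_cycles(size_bytes, latency_cycles, channel_bytes=16):
    """Cycles to stream size_bytes over the data channel after the initial wait."""
    return size_bytes // channel_bytes + latency_cycles


def cycles_to_us(cycles, clock_hz):
    """Convert a cycle count to microseconds at the given clock."""
    return cycles / clock_hz * 1e6


l1_cycles = transfer_cycles(32 * 1024, latency_cycles=16)        # 2064
l2_cycles = transfer_cycles(2 * 1024 * 1024, latency_cycles=13)  # 131085

# At the A53 clock of 1.3 GHz these are roughly 1.6 µs and 101 µs.
print(l1_cycles, round(cycles_to_us(l1_cycles, 1.3e9), 2))
print(l2_cycles, round(cycles_to_us(l2_cycles, 1.3e9), 2))
```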

            A53 @ 1.3 GHz    E5520 @ 2.26 GHz
32 kB:      ~ 1.5 µs         ~ 6.5 µs
2 MB:       ~ 100 µs         ~ 30 µs

[Figure: Samsung Exynos 5433 (Galaxy Note 4). Power Consumption in W (logarithmic axis, 0.01 to 10) versus Performance in DMIPS (920 to 41,480) for big.LITTLE, Cortex A53 and Cortex A57. Annotations across the slide builds: power advantage ~ 70% and ~ 55%; performance advantage ~ 25%; thermal barrier at ~7 W (Temp. > 40-50 °C); idle/low-power advantage]

• Smaller performance advantage (up to 25%) for high-performance applications
  TDP usually too high for 8 cores at maximum frequency

• Significant power advantages (up to 70%) for high-efficiency applications
  Better performance/power for the entire system
  Low-power idle (and background apps) not available to high-performance CPUs

Heterogeneous CPUs are possible

big.medIUM.LITTLE: Helio X20 SoC

Sources and References:

Papers

E. Blem, J. Menon, T. Vijayaraghavan, K. Sankaralingam. (2015, March). ISA Wars: Understanding the Relevance of ISA being RISC or CISC to Performance, Power, and Energy on Modern Architectures. ACM Transactions on Computer Systems. Vol. 33, No. 1, Article 3. Available: http://tocs.acm.org/

T. Mitra. (2014). Energy-Efficient Computing with Heterogeneous Multi-Cores. Presented at International Symposium on Integrated Circuits (ISIC).

V. Villebonnet, G. Da Costa, L. Lefevre, J.-M. Pierson, P. Stolf. (2014). Towards Generalizing "Big.Little" for Energy Proportional HPC and Cloud Infrastructures. Presented at IEEE Fourth International Conference on Big Data and Cloud Computing.

S. Yoo, Y. Shim, S. Lee, S.-A. Lee, J. Kim. (2015, October). A case for bad big.LITTLE switching: How to scale power-performance in SI-HMP. Presented at Hotpower’15, Monterey, CA, USA.

ARM Technical Reference Manuals and publications

ARMv7 Architecture Reference Manual, ARM Ltd., Cherry Hinton, Cambridge, 2014.

ARMv8 Architecture Reference Manual, ARM Ltd., Cherry Hinton, Cambridge, 2015.

CoreLink CCI-400 Cache Coherent Interconnect Technical Reference Manual, ARM Ltd., Cherry Hinton, Cambridge, 2012.

AMBA AXI and ACE Protocol Specification, ARM Ltd., Cherry Hinton, Cambridge, 2013.

Introduction to AMBA 4 ACE and big.LITTLE Processing Technology, ARM Ltd., Cherry Hinton, Cambridge, 2013.

ARM Cortex-A53 MPCore Processor Technical Reference Manual, ARM Ltd., Cherry Hinton, Cambridge, 2014.

ARM Cortex-A57 MPCore Processor Technical Reference Manual, ARM Ltd., Cherry Hinton, Cambridge, 2014.

big.LITTLE Technology: The Future of Mobile, ARM Ltd., Cherry Hinton, Cambridge, 2013.

Internet

A. Frumusanu, R. Smith. (2015, February). ARM A53/A57/T760 investigated - Samsung Galaxy Note 4 Exynos Review. Available: http://www.anandtech.com/show/8718/the-samsung-galaxy-note-4-exynos-review/

B. Sigoure. (2010, November). How long does it take to make a context switch? Available: http://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html

Other

H.-D. Cho, K. Chung, T. Kim. (2012, February). Benefits of the big.LITTLE Architecture. Samsung Electronics, Seoul.

K. Yu. (2012). big.LITTLE Switchers – Evaluation on Exynos.bl Processor. Presented at 2012 Korea Linux Forum. Available: http://events.linuxfoundation.org/images/stories/pdf/klf2012_yu.pdf

[Figure: the Condition Code field (instruction bits 31–28) is compared with the flags N, Z, C, V, Q of the Status Register (bits 31–27)]

e.g. 0000 := Zero flag (Z) is set
     0001 := Zero flag (Z) is clear

CMP r4, r5          ; (r4 – r5) == 0 ?
ADDEQ r1, r2, r3    ; if equal: r1 := r2 + r3
ADDNE r1, r2, r4    ; else: r1 := r2 + r4

Not-executed instruction takes up 1 cycle
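The conditional-execution example can be mimicked in a few lines. This sketch models only the two condition codes shown on the slide (0000 and 0001) and only the Z flag; the function names are hypothetical.

```python
def condition_passes(cond_bits, z_flag):
    """Evaluate the condition field against the Z flag (EQ and NE only)."""
    if cond_bits == 0b0000:      # EQ: Zero flag is set
        return z_flag
    if cond_bits == 0b0001:      # NE: Zero flag is clear
        return not z_flag
    raise ValueError("condition code not modelled in this sketch")


def select_sum(r2, r3, r4, r5):
    z_flag = (r4 - r5) == 0                   # CMP r4, r5
    r1 = None
    if condition_passes(0b0000, z_flag):      # ADDEQ r1, r2, r3
        r1 = r2 + r3
    if condition_passes(0b0001, z_flag):      # ADDNE r1, r2, r4
        r1 = r2 + r4
    return r1


print(select_sum(10, 1, 5, 5), select_sum(10, 1, 5, 6))
```

Both ADD instructions are fetched in either case; exactly one of them takes effect, which is why no branch is needed.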


[Figure: Registers r0 – r15 supply Operand A and Operand B to the ALU; one operand passes through the Barrel Shifter before the ALU produces the Result]

MOV r4, #2               ; binary: 0010
ADD r5, r4, r4, LSL #1   ; r5 := r4 + (r4 << 1)
                         ; r5 := 0010 + 0100
                         ; r5 := 2 + 4

Shift operation: +1 cycle

E5520

Clock: 2.26 GHz (Turbo: 2.53 GHz)
Cores: 4 (capable of hyperthreading)
Cache: 8 MB (L1 64 kB per core, L2 256 kB per core, 8 MB shared)
TDP: 80 W
Node: 45 nm
Year: 2009

[Figure] From CCI-400 Cache Coherent Interconnect Reference Manual