Dr. Y. Narasimha Murthy, Ph.D.
ARM Processors - Architecture
INTRODUCTION: The ARM processor was originally developed at Acorn Computers Limited of Cambridge, England, between 1983 and 1985. It was the first RISC microprocessor developed for commercial use and has some significant differences from subsequent RISC architectures. In 1990 ARM Limited was established as a separate company specifically to widen the exploitation of ARM technology, and it has become a market leader for low-power and cost-sensitive embedded applications.
The basic reason behind the origin of the ARM processor was that the 16-bit CISC microprocessors available in 1983 were slower than standard memory parts. They also had instructions that took many clock cycles to complete (in some cases, many hundreds of clock cycles), resulting in very long interrupt latencies. Because of these limitations in the commercial microprocessor offerings, the design of a proprietary microprocessor was considered, and so the ARM chip emerged.
In fact, ARM does not manufacture microprocessors. It is an IP (intellectual property) company that designs processor cores and licenses them to other companies for fabrication; for example, ARM microprocessors are manufactured by Intel, Texas Instruments, Samsung and many other fab companies. In other words, ARM only grants licenses to companies that want to manufacture ARM-based CPUs or System-on-Chip products. The ARM processor is supported by a toolkit which includes an instruction set emulator for hardware modeling, software testing and benchmarking, an assembler, C and C++ compilers, a linker and a symbolic debugger.
The two main types of licenses offered by ARM are the Implementation License and the Architecture License. The implementation license provides the complete information required to design and manufacture integrated circuits containing an ARM processor core. Cores are offered in two forms: soft cores and hard cores. A hard core is optimized for a specific manufacturing process, while a soft core can be used in any process but is less optimized. The architecture license enables the licensee to develop their own processors compliant with the ARM ISA.
Unique features of ARM Processors
From the above example it is clear that the branch-based version needs a total of 7 instructions (including 2 conditional branches and 2 unconditional jumps). Implementation using conditional execution therefore generates shorter code and increases execution speed, because removing the branches avoids the pipeline refills they cause. With conditional execution, an instruction has its normal effect only if the status flags satisfy the condition specified in the instruction; otherwise the instruction acts as a NOP.
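Conditional execution can be modelled in C to make this rule concrete. The sketch below is an illustration, not ARM's specification text: `flags_t`, `cond_passed`, `cmp_flags` and `gcd_conditional` are hypothetical names. `cond_passed` evaluates the standard ARM condition codes against the N, Z, C and V flags, and `gcd_conditional` mimics a four-instruction Euclid GCD loop (CMP, SUBGT, SUBLT, BNE) in which each subtraction is skipped, like a NOP, whenever its condition fails. The GCD loop is a commonly used illustration of conditional execution and is not necessarily the example the text refers to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the N, Z, C, V condition flags in the CPSR. */
typedef struct { bool n, z, c, v; } flags_t;

/* ARM condition field values, in encoding order (0x0 = EQ ... 0xE = AL). */
enum cond { EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE, AL };

/* True if an instruction with condition cc executes; false means NOP. */
static bool cond_passed(enum cond cc, flags_t f)
{
    switch (cc) {
    case EQ: return f.z;
    case NE: return !f.z;
    case CS: return f.c;
    case CC: return !f.c;
    case MI: return f.n;
    case PL: return !f.n;
    case VS: return f.v;
    case VC: return !f.v;
    case HI: return f.c && !f.z;
    case LS: return !f.c || f.z;
    case GE: return f.n == f.v;
    case LT: return f.n != f.v;
    case GT: return !f.z && (f.n == f.v);
    case LE: return f.z || (f.n != f.v);
    case AL: return true;
    }
    assert(!"invalid condition code");
    return false;
}

/* Flags set by CMP a, b, i.e. by the subtraction a - b. */
static flags_t cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    flags_t f;
    f.n = (r >> 31) & 1u;
    f.z = (r == 0);
    f.c = (a >= b);                          /* carry = NOT borrow */
    f.v = (((a ^ b) & (a ^ r)) >> 31) & 1u;  /* signed overflow */
    return f;
}

/* Models:  gcd  CMP   r0, r1
                 SUBGT r0, r0, r1
                 SUBLT r1, r1, r0
                 BNE   gcd
   Operands are assumed to be positive values below 2^31. */
static uint32_t gcd_conditional(uint32_t r0, uint32_t r1)
{
    assert(r0 != 0 && r1 != 0);  /* a zero operand would loop forever */
    flags_t f;
    do {
        f = cmp_flags(r0, r1);                   /* CMP sets the flags once */
        if (cond_passed(GT, f)) r0 = r0 - r1;    /* SUBGT */
        if (cond_passed(LT, f)) r1 = r1 - r0;    /* SUBLT */
    } while (cond_passed(NE, f));                /* BNE */
    return r0;
}
```

Because neither SUBGT nor SUBLT sets the flags, both are tested against the flags from the same CMP, and at most one of them has any effect on each pass.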
Another unusual architectural feature of ARM is that explicit shift instructions are not provided. Instead, the second operand of arithmetic, logic and move instructions, whether a register or an immediate value, can be shifted (or, for an immediate value, rotated) by a prescribed amount before being used in an operation. Consider the following ARM instruction with r1 = 3 and r2 = 5:
ADD r0, r1, r2, LSL #3 ; r0 = r1 + (8 x r2) = 3 + (8 x 5) = 43
Consider a MOV instruction with r1 = 168 and r2 = 3:
MOV r0, r1, LSR r2 ; shift the binary value of 168 three places to the right
168 = 0000 0000 0000 0000 0000 0000 1010 1000
Shifted 3 places to the right, this becomes
0000 0000 0000 0000 0000 0000 0001 0101
r0 = 21
For unsigned (non-negative) values, LSR #3 is the same as dividing by 2^3 = 8 and discarding the remainder (168 / 8 = 21).
In this way, shift operations are performed implicitly as part of other instructions.
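The effect of the shifted operand can be sketched in C. `shift_op`, `arm_add` and `arm_mov` are hypothetical helpers; the model handles only shift amounts from 0 to 31 and ignores the shifter carry-out that real hardware produces, so treat it as an illustration of the arithmetic rather than a complete model of the barrel shifter.

```c
#include <assert.h>
#include <stdint.h>

enum shift_type { LSL, LSR, ASR, ROR };

/* Shifts operand 2 before the ALU uses it (amounts 0..31 only). */
static uint32_t shift_op(uint32_t value, enum shift_type type, unsigned amount)
{
    assert(amount < 32);
    if (amount == 0)
        return value;
    switch (type) {
    case LSL: return value << amount;               /* fill with zeros on the right */
    case LSR: return value >> amount;               /* fill with zeros on the left  */
    case ASR: {                                     /* copy the sign bit on the left */
        uint32_t fill = (value & 0x80000000u) ? ~(UINT32_MAX >> amount) : 0u;
        return (value >> amount) | fill;
    }
    case ROR: return (value >> amount) | (value << (32 - amount));
    }
    return value;
}

/* ADD rd, rn, rm, <shift> #amount */
static uint32_t arm_add(uint32_t rn, uint32_t rm, enum shift_type t, unsigned amount)
{
    return rn + shift_op(rm, t, amount);
}

/* MOV rd, rm, <shift> amount */
static uint32_t arm_mov(uint32_t rm, enum shift_type t, unsigned amount)
{
    return shift_op(rm, t, amount);
}
```

With the values used in the text, `arm_add(3, 5, LSL, 3)` gives 43 and `arm_mov(168, LSR, 3)` gives 21.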
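Although there are several multiply instructions for use in signal processing applications, the classic ARM architecture (for example the ARM7TDMI) has no hardware divide instruction, so division must be implemented in software. (Later cores such as the Cortex-M3 and Cortex-M4 add the SDIV and UDIV instructions.)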
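One common way to implement division in software is restoring (shift-and-subtract) division, which needs only shifts, comparisons and subtractions. The sketch below is a generic illustration, not the routine that any particular ARM compiler or runtime library uses; `udiv32` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned 32-bit division producing one quotient bit per step.
   Returns the quotient; stores the remainder in *rem if rem is not NULL. */
static uint32_t udiv32(uint32_t n, uint32_t d, uint32_t *rem)
{
    assert(d != 0);          /* division by zero is not handled */
    uint64_t r = 0;          /* 64 bits so that r << 1 cannot overflow */
    uint32_t q = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);   /* bring down the next dividend bit */
        if (r >= d) {                     /* subtract if the divisor fits */
            r -= d;
            q |= 1u << i;                 /* and set this quotient bit */
        }
    }
    if (rem != NULL)
        *rem = (uint32_t)r;
    return q;
}
```

The loop always runs 32 times, which is why software division is considerably slower than a single multiply instruction.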
ARM was one of the first architectures to implement load-store multiple instructions. These can
transfer multiple registers between memory and processor in a single instruction.
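The behaviour of a store/load multiple with the increment-after addressing mode and write-back (STMIA/LDMIA) can be sketched as follows. `machine_t`, `stmia` and `ldmia` are hypothetical; memory is modelled as a small word array, and the sketch assumes that the base register is not itself in the register list (a case the architecture treats specially).

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64

/* Hypothetical register file and word-addressed memory. */
typedef struct {
    uint32_t r[16];
    uint32_t mem[MEM_WORDS];
} machine_t;

/* STMIA rb!, {reglist}: the lowest-numbered register goes to the lowest
   address; the address increments after each store; returns the new base. */
static uint32_t stmia(machine_t *m, unsigned base_reg, uint16_t reglist)
{
    assert(base_reg < 16 && reglist != 0 && !(reglist & (1u << base_reg)));
    uint32_t addr = m->r[base_reg];
    for (unsigned i = 0; i < 16; i++) {
        if (reglist & (1u << i)) {
            assert(addr % 4 == 0 && addr / 4 < MEM_WORDS);
            m->mem[addr / 4] = m->r[i];
            addr += 4;
        }
    }
    m->r[base_reg] = addr;   /* write-back */
    return addr;
}

/* LDMIA rb!, {reglist}: the mirror image of stmia. Loading r15 would
   also change the PC on a real core; that is not modelled here. */
static uint32_t ldmia(machine_t *m, unsigned base_reg, uint16_t reglist)
{
    assert(base_reg < 16 && reglist != 0 && !(reglist & (1u << base_reg)));
    uint32_t addr = m->r[base_reg];
    for (unsigned i = 0; i < 16; i++) {
        if (reglist & (1u << i)) {
            assert(addr % 4 == 0 && addr / 4 < MEM_WORDS);
            m->r[i] = m->mem[addr / 4];
            addr += 4;
        }
    }
    m->r[base_reg] = addr;   /* write-back */
    return addr;
}
```

For example, `stmia(&m, 0, 0x000E)` corresponds to STMIA r0!, {r1-r3}: three words are written and r0 advances by 12.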
ARM processors include an inline barrel shifter that pre-processes one of the input registers before it reaches the ALU. The barrel shifter helps in executing arithmetic operations such as multiplication by a constant: for example, ADD r0, r1, r1, LSL #2 computes r0 = 5 x r1 in a single instruction.
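The constant multiplications that a shifted operand makes possible can be checked in C. The helper names are hypothetical, and each comment shows the ARM instruction sequence whose arithmetic the C expression reproduces; a compiler may or may not choose these exact sequences.

```c
#include <assert.h>
#include <stdint.h>

/* ADD r0, r1, r1, LSL #2   ->  r0 = r1 + 4*r1 = 5*r1 */
static uint32_t times5(uint32_t x) { return x + (x << 2); }

/* RSB r0, r1, r1, LSL #3   ->  r0 = 8*r1 - r1 = 7*r1 */
static uint32_t times7(uint32_t x) { return (x << 3) - x; }

/* ADD r0, r1, r1, LSL #1   ->  r0 = 3*r1
   MOV r0, r0, LSL #2       ->  r0 = 12*r1 */
static uint32_t times12(uint32_t x)
{
    uint32_t t = x + (x << 1);
    return t << 2;
}
```

All three wrap modulo 2^32 exactly as a 32-bit MUL would, so they give the same low 32 bits as a true multiplication.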
The simplicity of the architecture reduces the overhead of each instruction, allowing the clock cycle time to be shortened.
ARM7TDMI-S Processor: The ARM7TDMI-S processor is a member of the ARM family of
general-purpose 32-bit microprocessors. The ARM family offers high performance for very low-
This single instruction saves three program steps, because BXJ performs three operations. First, it checks the condition. If the condition is true, it loads the branch target address into the PC. Finally, it sets the Java (Jazelle) state and takes the branch.
CPSR in Cortex Processors
Do Not Modify (DNM): these bits must not be modified by software.
The IT execution state bits
IT[7:5] encodes the base condition code for the current IT block, if any. It contains b000 when no
IT block is active.
IT[4:0] encodes the number of instructions that are to be conditionally executed, and whether the
condition for each is the base condition code or the inverse of the base condition code. It contains
b00000 when no IT block is active.
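The IT bits can be read as a small state machine. The sketch below treats the eight IT bits as a single byte (in the CPSR they are held in two separate fields) and follows the advance rule described in the ARM Architecture Reference Manual: after each instruction in the block, IT[4:0] shifts left by one, and the state is cleared once IT[2:0] is zero. An IT block is active while IT[3:0] is non-zero. The helper names are hypothetical. Under this encoding, the byte 0x06 corresponds to an `ITTE EQ` block, giving the conditions EQ (0x0), EQ, NE (0x1).

```c
#include <assert.h>
#include <stdint.h>

/* Condition applied to the current instruction: IT[7:5] is the base
   condition and IT[4] selects it or its inverse, so IT[7:4] is the
   full 4-bit condition code. */
static unsigned it_current_cond(uint8_t itstate) { return itstate >> 4; }

/* An IT block is active while IT[3:0] is non-zero. */
static int it_active(uint8_t itstate) { return (itstate & 0x0F) != 0; }

/* Advance after each instruction in the block. */
static uint8_t it_advance(uint8_t itstate)
{
    if ((itstate & 0x07) == 0)
        return 0;   /* that was the last instruction: block ends */
    /* keep IT[7:5], shift IT[4:0] left by one */
    return (uint8_t)((itstate & 0xE0) | ((itstate << 1) & 0x1F));
}

/* Lists the condition of each instruction in the block (at most 4). */
static int it_block_conditions(uint8_t itstate, unsigned conds[4])
{
    int n = 0;
    while (it_active(itstate)) {
        assert(n < 4);
        conds[n++] = it_current_cond(itstate);
        itstate = it_advance(itstate);
    }
    return n;
}
```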
SPSR Register: The SPSR is used to store the current value of the CPSR when an exception occurs so that it can be restored after handling the exception. Each exception handling mode can
access its own SPSR. User mode and System mode do not have an SPSR because they are not exception handling modes.
Processor Modes: There are seven processor modes: six privileged modes (abort, fast interrupt request, interrupt request, supervisor, system and undefined) and one unprivileged mode called user mode.
i. The processor enters abort mode when there is a failed attempt to access memory.
ii. Fast interrupt request and iii. interrupt request modes correspond to the two interrupt levels available on the ARM processor.
The seven exception types are:
i. Reset
ii. Undefined instruction
iii. Software interrupt (SWI)
iv. Pre-fetch abort (instruction fetch memory fault)
v. Data abort (data access memory fault)
vi. IRQ (normal interrupt)
vii. FIQ (fast interrupt request)
When an exception occurs, the processor performs the following sequence of actions:
• It changes to the operating mode corresponding to the particular exception.
• It saves the address of the instruction following the exception entry instruction in r14 of the
new mode.
• It saves the old value of the CPSR in the SPSR of the new mode.
• It disables IRQs by setting bit 7 of the CPSR and, if the exception is a fast interrupt, disables
further fast interrupts by setting bit 6 of the CPSR.
• It forces the PC to begin executing at the relevant vector address
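The entry sequence above can be sketched in C. `core_t` and `take_exception` are hypothetical; the model banks r14 and the SPSR by indexing arrays with the 5-bit mode number (real hardware shares r14 between User and System mode and has no SPSR for them), and the caller supplies the return address because the exact value stored in r14 depends on the exception type. The mode numbers and the I (bit 7) and F (bit 6) mask bits are the standard CPSR encodings.

```c
#include <assert.h>
#include <stdint.h>

#define CPSR_MODE_MASK 0x1Fu
#define CPSR_F_BIT     (1u << 6)   /* 1 = FIQ disabled */
#define CPSR_I_BIT     (1u << 7)   /* 1 = IRQ disabled */

/* CPSR mode field values. */
enum mode { USR = 0x10, FIQ = 0x11, IRQ = 0x12, SVC = 0x13,
            ABT = 0x17, UND = 0x1B, SYS = 0x1F };

/* Hypothetical, simplified core state. */
typedef struct {
    uint32_t cpsr;
    uint32_t pc;
    uint32_t r14[32];    /* banked link register, indexed by mode number */
    uint32_t spsr[32];   /* banked SPSR, indexed by mode number */
} core_t;

static void take_exception(core_t *c, enum mode new_mode,
                           uint32_t return_addr, uint32_t vector)
{
    assert(new_mode != USR && new_mode != SYS);   /* not exception modes */
    uint32_t old_cpsr = c->cpsr;

    c->cpsr = (old_cpsr & ~CPSR_MODE_MASK) | (uint32_t)new_mode; /* change mode */
    c->r14[new_mode] = return_addr;                              /* save return address */
    c->spsr[new_mode] = old_cpsr;                                /* save old CPSR */
    c->cpsr |= CPSR_I_BIT;                                       /* disable IRQs */
    if (new_mode == FIQ)
        c->cpsr |= CPSR_F_BIT;                                   /* and FIQs for an FIQ */
    c->pc = vector;                                              /* jump to the vector */
}
```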
Exception / Interrupt      Name    Address      High Address
Reset                      RESET   0x00000000   0xFFFF0000
Undefined Instruction      UNDEF   0x00000004   0xFFFF0004
Software Interrupt         SWI     0x00000008   0xFFFF0008
Pre-fetch Abort            PABT    0x0000000C   0xFFFF000C
Data Abort                 DABT    0x00000010   0xFFFF0010
Interrupt Request          IRQ     0x00000018   0xFFFF0018
Fast Interrupt Request     FIQ     0x0000001C   0xFFFF001C
The exception vector table shown above gives the address to which the processor jumps when each exception or interrupt occurs. Each vector table entry contains a form of branch instruction pointing to the start of a specific handler routine.
Note that address 0x00000014 is missing from the table above. This location was used on earlier ARM processors, which operated within a 26-bit address space, to trap load or store addresses that fell outside the address space. These traps were referred to as 'address exceptions'. Since 32-bit ARMs do not generate addresses that fall outside their 32-bit address space, address exceptions have no role in the current architecture and the vector address 0x00000014 is unused.
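Some ARM processors can also place the vector table at an alternative base address, which is why the table has two address columns (Address and High Address). Which location is used depends on the type and configuration of the ARM processor (for example, the V bit of the CP15 control register on cores that support high vectors).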
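Selecting between the two table locations amounts to choosing a base address and adding the offset from the table. In this sketch, `vector_address` is a hypothetical helper, and the `high_vectors` flag stands in for whatever configuration mechanism a particular core uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offsets from the vector table; 0x14 is the unused former
   address-exception slot, so it has no entry here. */
enum vector_offset {
    VEC_RESET = 0x00, VEC_UNDEF = 0x04, VEC_SWI = 0x08,
    VEC_PABT  = 0x0C, VEC_DABT  = 0x10, VEC_IRQ = 0x18, VEC_FIQ = 0x1C
};

static uint32_t vector_address(enum vector_offset off, bool high_vectors)
{
    uint32_t base = high_vectors ? 0xFFFF0000u : 0x00000000u;
    return base + (uint32_t)off;
}
```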
The reset vector is the location of the first instruction executed by the processor when power is applied. This instruction branches to the initialization code.
The undefined instruction vector is used when the processor cannot decode an instruction.
The software interrupt vector is used when a SWI instruction is executed. The SWI instruction is frequently used as the mechanism to invoke an operating system routine.
The pre-fetch abort vector is used when the processor attempts to fetch an instruction from an address without the correct access permissions. The actual abort occurs in the decode stage.
The data abort vector is similar to a pre-fetch abort but is raised when an instruction attempts to access data memory without the correct access permissions.
The interrupt request vector is used by external hardware to interrupt the normal execution flow of the processor. It can only be raised if IRQs are not masked in the CPSR.
The Thumb Programmer's Model: After reset, ARM cores start executing ARM instructions. The normal way they switch to executing Thumb instructions is by executing a Branch and Exchange instruction (BX). The Thumb instruction set is a subset of the ARM instruction set, and its instructions operate on a restricted view of the ARM registers, i.e. not all registers are freely available in Thumb mode. Most Thumb instructions can access only registers r0-r7 (the low registers) and the special-purpose registers r13-r15; the high registers r8-r12 are accessible only through a few instructions such as MOV, ADD and CMP.
r13 is used as a stack pointer.
r14 is used as the link register.
r15 is the program counter (PC).
The CPSR condition code flags are set by arithmetic and logical operations and control conditional branching.
encodings for VFP instructions, whereas ARMv6-M (Cortex-M0/M0+) only uses Thumb-2 in
the form of a handful of 4-byte system instructions.
ARM-Thumb transfer instructions:
(i) BX Rm ; Thumb version of branch exchange: pc = Rm & 0xfffffffe, T = Rm[0]
(ii) BLX Rm ; Thumb version of branch exchange with link: pc = Rm & 0xfffffffe, T = Rm[0],
     lr = (address of next instruction after BLX) + 1
Example 1: ARM code
    CODE32                 ; word aligned
    LDR r0, =thumbCode+1   ; address(thumbCode) = 0x00009000, so r0 = 0x00009001
    BLX r0                 ; branch to Thumb code and switch to Thumb mode
Example 2: Thumb code
    CODE16                 ; halfword aligned
thumbCode
    ADD r1, #1
    BX lr                  ; return to ARM code and ARM mode
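The state-switching rule of BX and the Thumb BLX can be written out in C. `branch_result_t`, `bx` and `blx_thumb` are hypothetical names; the code only reproduces the pc, T and lr equations given for these instructions and does not model the rest of the processor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t pc;     /* branch target with bit 0 cleared */
    bool thumb;      /* new value of the T bit */
} branch_result_t;

/* BX Rm: pc = Rm & 0xfffffffe, T = Rm[0] */
static branch_result_t bx(uint32_t rm)
{
    branch_result_t r = { rm & 0xFFFFFFFEu, (rm & 1u) != 0 };
    return r;
}

/* Thumb BLX Rm: as BX, and lr = (address of next instruction) + 1,
   so that a later BX lr returns to Thumb state. */
static branch_result_t blx_thumb(uint32_t rm, uint32_t next_insn, uint32_t *lr)
{
    *lr = next_insn + 1;
    return bx(rm);
}
```

With r0 = 0x00009001 as in Example 1, `bx(0x00009001)` yields pc = 0x00009000 with the T bit set, i.e. execution continues at thumbCode in Thumb state.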
Co-Processor Interface: The ARM7 supports up to 16 logical coprocessors. The coprocessor concept is mainly aimed at improving the performance of the ARM processor. Each coprocessor can have up to 16 private registers of any size; they are not limited to 32 bits.
Coprocessors use a load/store architecture.
The ARM7TDMI coprocessor interface is based on "bus watching".
The coprocessor is attached to the bus over which the ARM instruction stream flows into the ARM, and the coprocessor copies the instructions into an internal pipeline that is similar to the ARM instruction pipeline.
There are three handshake signals between the ARM and the coprocessor before instructions are executed:
(i) CPI (from ARM to all coprocessors): coprocessor instruction. Indicates that the ARM has identified a coprocessor instruction and wishes to execute it.
(ii) CPA (from coprocessor to ARM): coprocessor absent. Tells the ARM that there is no coprocessor present that is able to execute the current instruction.
(iii) CPB (from coprocessor to ARM): coprocessor busy. Tells the ARM that the coprocessor cannot yet begin executing the instruction.
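The decision the ARM makes from these handshake signals can be sketched as follows. The function and enum names are hypothetical and the signals are modelled as active-high booleans. Following the ARM7TDMI behaviour, if no coprocessor accepts the instruction (CPA asserted) the ARM takes the undefined instruction trap, and if the coprocessor is present but busy (CPB asserted) the ARM busy-waits until it is ready.

```c
#include <assert.h>
#include <stdbool.h>

enum cp_outcome { CP_UNDEFINED_TRAP, CP_BUSY_WAIT, CP_EXECUTE };

/* Outcome after the ARM has signalled CPI for a coprocessor instruction. */
static enum cp_outcome coprocessor_handshake(bool cpa, bool cpb)
{
    if (cpa)
        return CP_UNDEFINED_TRAP;  /* no coprocessor claims the instruction */
    if (cpb)
        return CP_BUSY_WAIT;       /* present but busy: wait and sample again */
    return CP_EXECUTE;             /* present and ready: execution proceeds */
}
```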
The timing is such that the ARM and co-processor must generate their respective signals
The Cortex processor family is subdivided into three different profiles: Cortex-A, Cortex-M and Cortex-R. Each profile is optimized for a different segment of embedded systems applications.
A denotes Application, M denotes Microcontroller and R denotes Real-time. The Cortex-A profile has been designed as a high-end application processor. Cortex-A processors are capable of running feature-rich operating systems such as WinRT and Linux. The key applications for Cortex-A are consumer electronics such as smartphones, tablet computers and set-top boxes.
Unlike earlier ARM CPUs, the Cortex-M processor family is designed specifically for use within
a small microcontroller.
The Cortex-M processor comes in five variants: Cortex-M0, Cortex-M0+, Cortex-M1, Cortex-M3 and Cortex-M4. The Cortex-M0 and Cortex-M0+ are the smallest processors in the family.
This helps manufacturers to design low-cost, low-power devices that can replace existing 8-bit microcontrollers while still offering 32-bit performance.
The Cortex-M1 has many of the same features as the Cortex-M0 but has been designed as a "soft core" to run inside a Field Programmable Gate Array (FPGA) device.
Of these variants, the highest-performing member of the Cortex-M family is the Cortex-M4. It has all the features of the Cortex-M3 and adds support for digital signal processing (DSP) instructions; it can also include hardware floating-point support for single-precision calculations.
The third Cortex profile is Cortex-R. This is the real-time profile, which delivers a high-performance processor at the heart of an application-specific device.
Very often a Cortex-R processor forms part of a “system-on-chip” design that is focused on a
specific task such as hard disk drive (HDD) control, automotive engine management, and
medical devices. The Arm Cortex-R real-time processors offer high-performance computing
solutions for embedded systems where reliability, high availability, fault tolerance and/or
deterministic real-time responses are needed.
Cortex-R processors are used in products where performance requirements and timing deadlines
must always be met.
In addition, Cortex-R processors are used in electronic systems which must be functionally safe
to avoid hazardous situations, for example, in medical applications or autonomous systems.