Embedded Hardware Foundation
Content
CPU
Bus
Memory
I/O
Design, develop and debug
1. CPU
I/O programming
  Busy/wait
  Interrupt-driven
Supervisor mode, exceptions, traps
Co-processor
Memory System
  Cache
  Memory management
Performance and power consumption
I/O devices
Usually includes some non-digital component.
Typical digital interface to CPU:
[Figure: the CPU connects to the device's status register and data register, which sit in front of the device mechanism.]
Application: 8251 UART
Universal asynchronous receiver/transmitter (UART): provides serial communication.
8251 functions are integrated into the standard PC interface chip.
Allows many communication parameters to be programmed.
8251 CPU interface
[Figure: the CPU exchanges an 8-bit status value and 8-bit data with the 8251, which transmits and receives (xmit/rcv) on the serial port.]
Programming I/O
Two types of instructions can support I/O:
  special-purpose I/O instructions;
  memory-mapped load/store instructions.
Intel x86 provides in, out instructions.
Most other CPUs use memory-mapped I/O.
I/O instructions do not preclude memory-mapped I/O.
ARM memory-mapped I/O
Define location for device:
  DEV1 EQU 0x1000
Read/write code:
  LDR r1,=DEV1   ; set up device address
  LDR r0,[r1]    ; read DEV1
  MOV r0,#8      ; set up value to write
  STR r0,[r1]    ; write value to device
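peek and poke (Using C)
  /* Read the byte at a device register address. */
  int peek(volatile char *location) {
      return *location;
  }

  /* Write a byte to a device register address. */
  void poke(volatile char *location, char newval) {
      *location = newval;
  }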
Busy/wait output
Simplest way to program device.
Use instructions to test when device is ready.
  char *mystring = "hello, world.";
  char *current_char;

  current_char = mystring;
  while (*current_char != '\0') {
      while (peek(OUT_STATUS) != 0)
          ;                             /* wait until the device is ready */
      poke(OUT_CHAR, *current_char);    /* send the character */
      current_char++;
  }
Simultaneous busy/wait input and output
  while (TRUE) {
      /* read */
      while (peek(IN_STATUS) == 0)
          ;                          /* wait until a character has arrived */
      achar = (char)peek(IN_DATA);
      /* write */
      while (peek(OUT_STATUS) != 0)
          ;                          /* wait until the output device is ready */
      poke(OUT_DATA, achar);
  }
Interrupt I/O
Busy/wait is very inefficient.
  CPU can't do other work while testing device.
  Hard to do simultaneous I/O.
Interrupts allow a device to change the flow of control in the CPU.
  Causes subroutine call to handle device.
Interrupt interface
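[Figure: interrupt interface. The device's status and data registers connect to the CPU over the data/address lines; the device raises intr request and the CPU answers with intr ack. The CPU's PC and IR are shown.]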
Interrupt behavior
Based on subroutine call mechanism.
Interrupt forces next instruction to be a subroutine call to a predetermined location.
  Return address is saved to resume executing foreground program.
Interrupt physical interface
CPU and device are connected by CPU bus.
CPU and device handshake:
  device asserts interrupt request;
  CPU asserts interrupt acknowledge when it can handle the interrupt.
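Example: interrupt-driven input and output
  void input_handler();
  void output_handler();

  int main(void) {
      while (TRUE) {
          if (gotchar) {
              while (peek(OUT_STATUS) != 0)
                  ;                      /* wait until the output device is ready */
              poke(OUT_DATA, achar);     /* echo the character read by input_handler */
              gotchar = FALSE;
          }
      }
  }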
Example: character I/O handlers
  void input_handler() {
      achar = peek(IN_DATA);   /* read the character */
      gotchar = TRUE;          /* signal the foreground program */
      poke(IN_STATUS, 0);      /* reset the device status */
  }

  void output_handler() {
  }
Example: interrupt I/O with buffers
Queue for characters:
[Figure: character queue with head and tail pointers, shown empty and then holding the single character a.]
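Buffer-based input handler
  void input_handler() {
      char achar;
      if (full_buffer())
          error = 1;                       /* no room: record the overflow */
      else {
          achar = peek(IN_DATA);           /* read the character */
          add_char(achar);                 /* add it to the queue */
      }
      if (nchars == 1) {                   /* buffer was empty: restart output */
          poke(OUT_DATA, remove_char());
          poke(OUT_STATUS, 1);
      }
  }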
Buffer-based output handler
  void output_handler() {
      if (!empty_buffer()) {
          poke(OUT_DATA, remove_char());   /* send character */
          poke(OUT_STATUS, 1);             /* turn device on */
      }
  }
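The handlers call add_char(), remove_char(), full_buffer() and empty_buffer() without showing them. A minimal sketch of one way to write them as a fixed-size circular queue follows. The capacity BUF_SIZE, the array name io_buf and the head/tail indexing are assumptions made for illustration; only nchars and the four function names come from the slides.

```c
#include <assert.h>

#define BUF_SIZE 8              /* assumed capacity; the slides do not give one */

static char io_buf[BUF_SIZE];   /* storage for queued characters (hypothetical name) */
static int head = 0;            /* index of the oldest character */
static int tail = 0;            /* index of the next free slot */
static int nchars = 0;          /* number of characters currently queued */

int empty_buffer(void) { return nchars == 0; }
int full_buffer(void)  { return nchars == BUF_SIZE; }

/* Append a character at the tail; the caller checks full_buffer() first. */
void add_char(char c) {
    assert(!full_buffer());
    io_buf[tail] = c;
    tail = (tail + 1) % BUF_SIZE;   /* wrap around at the end of the array */
    nchars++;
}

/* Remove and return the oldest character; the caller checks empty_buffer() first. */
char remove_char(void) {
    char c;
    assert(!empty_buffer());
    c = io_buf[head];
    head = (head + 1) % BUF_SIZE;
    nchars--;
    return c;
}
```

On real hardware the queue is shared between the two handlers, so updates to nchars would likely need to run with interrupts disabled; the sketch leaves that out.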
Priorities and vectors
Two mechanisms allow us to make interrupts more specific:
  Priorities determine what interrupt gets CPU first.
  Vectors determine what code is called for each type of interrupt.
Mechanisms are orthogonal: most CPUs provide both.
Prioritized interrupts
[Figure: devices 1 through n drive interrupt request lines L1 through Ln into the CPU; the CPU returns a shared interrupt acknowledge.]
Interrupt prioritization
Masking: interrupt with priority lower than current priority is not recognized until pending interrupt is complete.
Non-maskable interrupt (NMI): highest-priority, never masked. Often used for power-down.
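The masking rule can be written as a small predicate. This is a sketch under stated assumptions: priorities are integers where a larger number means higher priority, NUM_LEVELS and NMI_LEVEL are made-up constants, and a request at the same level as the running code is treated as masked.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_LEVELS 8             /* assumed number of maskable priority levels */
#define NMI_LEVEL  NUM_LEVELS    /* assumed: NMI modeled as one level above all others */

/* Returns true if a request at request_level is recognized while the CPU
   is running at current_level. */
bool interrupt_recognized(int request_level, int current_level) {
    assert(request_level >= 0 && request_level <= NMI_LEVEL);
    if (request_level == NMI_LEVEL)
        return true;                        /* NMI is never masked */
    return request_level > current_level;   /* lower or equal priority stays pending */
}
```

A request that is not recognized is not lost; it stays pending until the current handler finishes.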
Interrupt vectors
Allow different devices to be handled by different code.
Interrupt vector table:
[Figure: the interrupt vector selects an entry (handler 0 to handler 3) relative to the head of the interrupt vector table.]
Interrupt vector acquisition
[Figure: the device sends an interrupt request to the CPU; the CPU returns an interrupt ack; the device then supplies the vector.]
[Figure: sequence diagram between :CPU and :device: receive request, receive ack, receive vector.]
Interrupt sequence
CPU checks pending interrupt requests and acknowledges the one of highest priority.
Device receives acknowledgement and sends vector.
CPU locates the handler using vector as index of interrupt table and calls the handler.
Software processes request.
CPU restores state to foreground program.
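The sequence above can be modeled in software with a table of function pointers. This is an illustrative sketch, not how any particular CPU implements it: the pending array, the rule that a device's index is both its priority and its vector, and the handler bodies are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_VECTORS 4

typedef void (*handler_t)(void);

static int last_handled = -1;   /* records which handler ran, for illustration only */

static void handler0(void) { last_handled = 0; }
static void handler1(void) { last_handled = 1; }
static void handler2(void) { last_handled = 2; }
static void handler3(void) { last_handled = 3; }

/* Interrupt vector table: entry i holds the handler for vector i. */
static handler_t vector_table[NUM_VECTORS] = {
    handler0, handler1, handler2, handler3
};

/* pending[i] is nonzero while device i is requesting an interrupt.
   Assumption: a higher index means a higher priority.
   Returns the vector that was serviced, or -1 if nothing was pending. */
int service_highest_pending(int pending[NUM_VECTORS]) {
    for (int dev = NUM_VECTORS - 1; dev >= 0; dev--) {
        if (pending[dev]) {
            int vector = dev;          /* the device sends its vector */
            pending[dev] = 0;          /* acknowledge the request */
            assert(vector_table[vector] != NULL);
            vector_table[vector]();    /* index the table and call the handler */
            return vector;
        }
    }
    return -1;
}
```

With devices 0 and 2 pending, the first call services vector 2 and the second services vector 0.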
Sources of interrupt overhead
Handler execution time.
Interrupt mechanism overhead.
Register save/restore.
Pipeline-related penalties.
Cache-related penalties.
ARM interrupts
ARM7 supports two types of interrupts:
  Fast interrupt requests (FIQs).
  Interrupt requests (IRQs).
Interrupt table starts at location 0.
ARM interrupt procedure
CPU actions:
  Save PC.
  Copy CPSR to SPSR.
  Force bits in CPSR to record interrupt.
  Force PC to vector.
Handler responsibilities:
  Restore proper PC.
  Restore CPSR from SPSR.
  Clear interrupt disable flags.
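The CPU actions and handler responsibilities can be traced on a small software model of IRQ entry and return. This is a simplified sketch: the struct arm_model_t and both function names are hypothetical, and the model ignores the Thumb bit, FIQ, and the other banked registers. The constants (IRQ mode bits 0x12, I bit at bit 7, IRQ vector at 0x18) follow the classic ARM architecture.

```c
#include <assert.h>
#include <stdint.h>

#define MODE_MASK  0x1Fu        /* CPSR bits 4..0 hold the processor mode */
#define MODE_USR   0x10u
#define MODE_IRQ   0x12u
#define I_BIT      (1u << 7)    /* set = IRQs disabled */
#define IRQ_VECTOR 0x18u        /* IRQ entry in the table that starts at location 0 */

typedef struct {
    uint32_t pc;        /* address of the next instruction to execute */
    uint32_t cpsr;      /* current program status register */
    uint32_t lr_irq;    /* banked link register for IRQ mode */
    uint32_t spsr_irq;  /* saved program status register for IRQ mode */
} arm_model_t;

/* CPU actions when an IRQ is taken. */
void irq_enter(arm_model_t *cpu) {
    assert((cpu->cpsr & I_BIT) == 0);      /* an IRQ is only taken while enabled */
    cpu->lr_irq = cpu->pc + 4;             /* save PC (next instruction + 4) */
    cpu->spsr_irq = cpu->cpsr;             /* copy CPSR to SPSR */
    cpu->cpsr = (cpu->cpsr & ~MODE_MASK) | MODE_IRQ | I_BIT;  /* record interrupt */
    cpu->pc = IRQ_VECTOR;                  /* force PC to vector */
}

/* Handler exit, mirroring SUBS PC, LR, #4. */
void irq_return(arm_model_t *cpu) {
    cpu->pc = cpu->lr_irq - 4;   /* restore proper PC */
    cpu->cpsr = cpu->spsr_irq;   /* restore CPSR, which also clears the I bit */
}
```

Restoring the CPSR from the SPSR is what clears the interrupt disable flag here, because the saved copy was taken before the I bit was set.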
Exception and Trap
Exception: internally detected error.
  Exceptions are synchronous with instructions but unpredictable.
  Build exception mechanism on top of interrupt mechanism.
  Exceptions are usually prioritized and vectorized.
Trap (software interrupt): an exception generated by an instruction.
  Call supervisor mode.
Supervisor mode
May want to provide protective barriers between programs. Avoid memory corruption.
Need supervisor mode to manage the various programs.
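ARM CPU modes
Processor mode                        Description
User mode (User, usr)                 Normal program execution
Fast interrupt mode (FIQ, fiq)        High-speed data transfer and channel processing
External interrupt mode (IRQ, irq)    General-purpose interrupt handling
Supervisor mode (Supervisor, svc)     Protected mode used by the operating system
Data abort mode (Abort, abt)          Virtual memory and memory protection
Undefined instruction mode (Undefined, und)   Software emulation of hardware co-processors
System mode (System, sys)             Runs privileged operating system tasks
The FIQ, IRQ, Supervisor, Abort and Undefined modes are the exception modes.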
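ARM CPU modes (cont'd)
SWI (software interrupt) instruction
Format: SWI{<cond>} immed_24
The SWI instruction performs a system call: the processor enters Supervisor mode, the CPSR is saved to the Supervisor-mode SPSR, and execution continues from address 0x08.
<immed_24> is interpreted by the system.
Instruction encoding:
  bits 31-28   condition code
  bits 27-24   1111
  bits 23-0    24-bit immediate (interpreted by the system)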
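A system-call dispatcher typically reads the 24-bit field back out of the instruction word. The sketch below decodes the three fields of the encoding; the function names are hypothetical, and the example word 0xEF000011 is SWI 0x11 with the "always" condition (0xE).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if bits 27..24 are 1111, which marks an SWI instruction. */
bool is_swi(uint32_t instr) {
    return ((instr >> 24) & 0xFu) == 0xFu;
}

/* Condition code: bits 31..28. */
uint32_t swi_cond(uint32_t instr) {
    return instr >> 28;
}

/* immed_24: bits 23..0, the value the system interprets. */
uint32_t swi_number(uint32_t instr) {
    assert(is_swi(instr));
    return instr & 0x00FFFFFFu;
}
```

For 0xEF000011, swi_cond() gives 0xE and swi_number() gives 0x11.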
Co-processor
Co-processor: added function unit that is called by instruction.
  Floating-point units are often structured as co-processors.
ARM allows up to 16 designer-selected co-processors.
  Floating-point co-processor uses units 1 and 2.
Memory System
  Cache
  Memory Management Unit
Cache
Small amount of fast memory
Sits between normal main memory and CPU
May be located on CPU chip or module
Cache operation - overview
CPU requests contents of memory location
Check cache for this data
If present, get from cache (fast)
If not present, read required block from main memory to cache
Then deliver from cache to CPU
Cache includes tags to identify which block of main memory is in each cache slot
Cache operation
Many main memory locations are mapped onto one cache entry.
May have caches for:
  instructions;
  data;
  data + instructions (unified).
Memory access time is no longer deterministic.
Cache organizations
Direct-mapped: each memory location maps onto exactly one cache entry.
Fully-associative: any memory location can be stored anywhere in the cache (almost never implemented).
N-way set-associative: each memory location maps to one set and can be stored in any of the n entries (ways) of that set.
[Figure: mapping of main-memory blocks 0-15 onto cache blocks 0-7: (a) fully associative mapping, (b) direct mapping, (c) set-associative mapping with the cache divided into sets 0-3.]
Example
Cache of 64 kByte
  Cache block of 4 bytes
  i.e. cache is 16k (2^14) lines of 4 bytes
16 MBytes main memory
  24 bit address (2^24 = 16M)
  2^22 blocks; 2^8 blocks will be mapped into one cache line on the average
Direct-mapped cache
Each block of main memory maps to only one cache line
  i.e. if a block is in cache, it must be in one specific place
Address is in two parts
  Least Significant w bits identify unique word
  Most Significant s bits specify one memory block
The MSBs are split into a cache line field r and a tag of s-r (most significant)
Direct Mapping Address Structure
  Tag (s-r)   Line or Slot (r)   Word (w)
  8 bits      14 bits            2 bits
24 bit address
2 bit word identifier (4 byte block)
22 bit block identifier
  8 bit tag (=22-14)
  14 bit slot or line
No two blocks in the same line have the same Tag field
Check contents of cache by finding line and checking Tag
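The 8/14/2 split can be checked with a few shifts and masks. This is a minimal sketch; the struct cache_addr_t and the function split_address() are illustrative names, and the field widths are the ones from the example (24-bit address, 4-byte blocks, 16k lines).

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS 2     /* w: selects a byte within the 4-byte block */
#define LINE_BITS 14    /* r: selects one of 2^14 cache lines */
#define TAG_BITS  8     /* s - r: distinguishes blocks that share a line */

typedef struct {
    uint32_t tag;
    uint32_t line;
    uint32_t word;
} cache_addr_t;

/* Split a 24-bit address into its tag, line and word fields. */
cache_addr_t split_address(uint32_t addr) {
    cache_addr_t f;
    assert(addr < (1u << (WORD_BITS + LINE_BITS + TAG_BITS)));
    f.word = addr & ((1u << WORD_BITS) - 1);
    f.line = (addr >> WORD_BITS) & ((1u << LINE_BITS) - 1);
    f.tag  = addr >> (WORD_BITS + LINE_BITS);
    return f;
}
```

For address 0x16339C this gives tag 0x16, line 0x0CE7 and word 0. A lookup would index the cache with the line field and report a hit only if the tag stored there equals 0x16.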
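Direct-mapped cache
[Figure: the address is split into tag, index and offset; the index selects a cache block holding a valid bit (1), a tag (0xabcd) and data bytes; the stored tag is compared (=) with the address tag to produce hit, and the offset selects the byte returned as value.]
Fully-associative cache
Set-associative cache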
Write operations
Write-through: immediately copy write to main memory.
Write-back: write to main memory only when location is removed from cache.
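The difference between the two policies shows up in how often main memory is written. The sketch below models a tiny direct-mapped cache with one byte per line and counts memory writes under each policy. The sizes NUM_LINES and MEM_WORDS, the type names and the memory_writes counter are all assumptions made for illustration; reads are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES 4     /* assumed cache size, one byte per line */
#define MEM_WORDS 16    /* assumed main memory size */

typedef struct {
    bool valid;
    bool dirty;         /* line holds data not yet written to memory */
    uint32_t tag;
    uint8_t data;
} line_t;

typedef struct {
    line_t lines[NUM_LINES];
    uint8_t memory[MEM_WORDS];
    bool write_back;    /* true: write-back, false: write-through */
    int memory_writes;  /* number of writes that reached main memory */
} cache_t;

void cache_init(cache_t *c, bool write_back) {
    for (int i = 0; i < NUM_LINES; i++)
        c->lines[i] = (line_t){ false, false, 0, 0 };
    for (int i = 0; i < MEM_WORDS; i++)
        c->memory[i] = 0;
    c->write_back = write_back;
    c->memory_writes = 0;
}

void cache_write(cache_t *c, uint32_t addr, uint8_t value) {
    assert(addr < MEM_WORDS);
    uint32_t index = addr % NUM_LINES;
    uint32_t tag = addr / NUM_LINES;
    line_t *l = &c->lines[index];

    /* Write-back: a dirty line is copied to memory only when it is evicted. */
    if (l->valid && l->dirty && l->tag != tag) {
        c->memory[l->tag * NUM_LINES + index] = l->data;
        c->memory_writes++;
    }
    l->valid = true;
    l->tag = tag;
    l->data = value;
    if (c->write_back) {
        l->dirty = true;
    } else {
        c->memory[addr] = value;    /* write-through: copy immediately */
        c->memory_writes++;
        l->dirty = false;
    }
}
```

Three writes to the same address cost three memory writes with write-through and none with write-back; the write-back copy reaches memory only when a conflicting address evicts the line.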
Memory management units
Memory management unit (MMU) translates addresses:
[Figure: the CPU issues a logical address to the memory management unit, which sends the physical address to main memory.]
Memory management tasks
Allows programs to move in physical memory during execution.
Allows virtual memory:
  memory images kept in secondary storage;
  images returned to main memory on demand during execution.
Page fault: request for location not resident in memory.
Address translation
Requires some sort of register/table to allow arbitrary mappings of logical to physical addresses.
Two basic schemes: segmented; paged.
Segmentation and paging can be combined (x86).
Segments and pages
[Figure: a memory map showing segment 1, segment 2, page 1 and page 2.]
Segment address translation
[Figure: the segment base address is added (+) to the logical address to form the physical address; a range check against the segment lower bound and segment upper bound raises a range error.]
Page address translation
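[Figure: the logical address is split into a page number and an offset; the page number selects the base of page i, which is concatenated with the offset to form the physical address.]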
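Both translation schemes fit in a few lines of C. This is a sketch under stated assumptions: the segment is described by a base and a size rather than explicit lower and upper bounds, pages are 4 KB (PAGE_BITS = 12), the page table is a flat array of frame numbers, and all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BITS 12    /* assumed page size of 4 KB */

typedef struct {
    uint32_t base;      /* physical address where the segment starts */
    uint32_t size;      /* segment length in bytes */
} segment_t;

/* Segment translation: add the base, then range-check the result.
   Returns false on a range error and leaves *physical untouched. */
bool segment_translate(const segment_t *seg, uint32_t logical, uint32_t *physical) {
    if (logical >= seg->size)
        return false;                    /* range error */
    *physical = seg->base + logical;
    return true;
}

/* Page translation: the page number indexes the table, and the page base
   is concatenated with the unchanged offset. */
uint32_t page_translate(const uint32_t *page_table, uint32_t num_pages, uint32_t logical) {
    uint32_t page = logical >> PAGE_BITS;
    uint32_t offset = logical & ((1u << PAGE_BITS) - 1);
    assert(page < num_pages);
    return (page_table[page] << PAGE_BITS) | offset;
}
```

Segments need an adder and a bounds check because they can start anywhere and have any length; pages need only concatenation because every page starts on a page-size boundary.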
Page table organizations
flat tree
page descriptor
pagedescriptor
Caching address translations
Large translation tables require main memory access.
TLB: cache for address translation. Typically small.
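A TLB can be sketched as a small fully-associative array that is searched before the page table. The entry count, the round-robin replacement and the hit/miss counters below are assumptions for illustration; real TLBs do the search in parallel hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 4   /* assumed size: TLBs are typically small */
#define PAGE_BITS   12  /* assumed page size of 4 KB */

typedef struct {
    bool valid;
    uint32_t page;      /* logical page number */
    uint32_t frame;     /* physical frame number */
} tlb_entry_t;

typedef struct {
    tlb_entry_t entry[TLB_ENTRIES];
    int next;           /* round-robin replacement pointer */
    int hits;
    int misses;
} tlb_t;

void tlb_init(tlb_t *t) {
    for (int i = 0; i < TLB_ENTRIES; i++)
        t->entry[i].valid = false;
    t->next = 0;
    t->hits = 0;
    t->misses = 0;
}

/* Translate a logical address, consulting the TLB before the page table. */
uint32_t tlb_translate(tlb_t *t, const uint32_t *page_table,
                       uint32_t num_pages, uint32_t logical) {
    uint32_t page = logical >> PAGE_BITS;
    uint32_t offset = logical & ((1u << PAGE_BITS) - 1);
    uint32_t frame;
    assert(page < num_pages);
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (t->entry[i].valid && t->entry[i].page == page) {
            t->hits++;                       /* no page-table access needed */
            return (t->entry[i].frame << PAGE_BITS) | offset;
        }
    }
    t->misses++;
    frame = page_table[page];                /* models the main-memory access */
    t->entry[t->next] = (tlb_entry_t){ true, page, frame };
    t->next = (t->next + 1) % TLB_ENTRIES;
    return (frame << PAGE_BITS) | offset;
}
```

The second access to the same page is a hit and skips the table lookup, which is the saving the TLB exists to provide.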
ARM memory management
Memory region types:
  section: 1 Mbyte block;
  large page: 64 kbytes;
  small page: 4 kbytes.
An address is marked as section-mapped or page-mapped.
Two-level translation scheme.
CPU performance and power consumption
Example: Intel XScale core
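2. Bus
Bus: the mechanism by which the CPU communicates with memory and devices
  A set of related wires
  A protocol for communication between components
Four-cycle handshake protocol
Bus master
  The device that initiates a bus transfer, such as the CPU or a DMA controller
A basic bus connection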
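DMA
DMA: Direct Memory Access
  Allows bus reads and writes that are not controlled by the CPU.
  A DMA transfer is controlled by a DMA controller, which requests control of the bus from the CPU. Once it has control, the DMA controller performs reads and writes directly between the device and memory.
Bus connection with a DMA controller
  Additional bus signals:
    bus request
    bus grant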
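The request/grant exchange can be modeled in a few lines. This is only a software sketch of the ownership handoff: the type bus_t, the grants counter and the function names are hypothetical, and a real controller moves data in bus cycles rather than in a C loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { BUS_OWNER_CPU, BUS_OWNER_DMA } bus_owner_t;

typedef struct {
    bus_owner_t owner;  /* which master currently drives the bus */
    unsigned grants;    /* how many times the CPU has granted the bus */
} bus_t;

/* The DMA controller asserts bus request; the CPU answers with bus grant. */
void bus_request(bus_t *bus) {
    assert(bus->owner == BUS_OWNER_CPU);
    bus->owner = BUS_OWNER_DMA;
    bus->grants++;
}

/* The DMA controller hands the bus back to the CPU. */
void bus_release(bus_t *bus) {
    bus->owner = BUS_OWNER_CPU;
}

/* Move count bytes from the device to memory without CPU involvement. */
void dma_transfer(bus_t *bus, const uint8_t *device, uint8_t *memory, size_t count) {
    bus_request(bus);
    for (size_t i = 0; i < count; i++)
        memory[i] = device[i];      /* direct device-to-memory copy */
    bus_release(bus);
}
```

The CPU's only part in the transfer is granting the bus; while the controller owns the bus the CPU can keep running as long as it does not need the bus itself.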
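Bridges
  High-speed bus and low-speed bus
Bus interconnection
  A high-speed bus provides a wider data connection
  Low-speed devices reduce cost
  A bridge allows the buses to operate independently, providing some parallelism in I/O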
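ARM bus: AMBA
AMBA: Advanced Microcontroller Bus Architecture
Version 2.0 of the AMBA standard defines three buses:
  AHB (Advanced High-performance Bus)
  ASB (Advanced System Bus)
  APB (Advanced Peripheral Bus)
A typical AMBA-based system
  A typical AMBA-based microcontroller uses an AHB or ASB bus plus an APB bus.
  ASB is the older system bus; AHB was introduced later to improve support for higher performance, synthesis, and timing verification.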
3. Memory
RAM
  SRAM
  DRAM
ROM
  PROM, EPROM, EEPROM
  Flash ROM
Two roles of Flash in embedded systems (boot ROM, hard disk)
4. I/O
Watchdog timer
A/D & D/A Converter
LCD
LED
Touch screen
Keyboard
USB
...
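Watchdog timer
The watchdog timer is a component that helps an embedded microprocessor recover from a deadlocked state. It is a characteristic component of embedded systems.
In a well-designed system, the software periodically monitors or resets the watchdog timer. If the software and devices are working properly, the watchdog timer is reset regularly. When the software or devices fail, the watchdog timer is not reset, so it keeps counting until it overflows and generates an interrupt that resets the CPU.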
Watchdog timer
Watchdog timer is periodically reset by system timer.
If watchdog is not reset, it generates an interrupt to reset the host.
[Figure: the host CPU sends reset to the watchdog timer; the watchdog timer sends an interrupt to the host CPU.]
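The watchdog's behavior can be modeled as a counter that software must clear before it reaches a limit. This is a sketch with assumed names: watchdog_t, the timeout field and the function names are illustrative, and real watchdogs are hardware peripherals programmed through device registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t count;         /* ticks since the last reset by software */
    uint32_t timeout;       /* count at which the watchdog fires */
    bool reset_asserted;    /* true once the watchdog has fired */
} watchdog_t;

void watchdog_init(watchdog_t *wd, uint32_t timeout) {
    assert(timeout > 0);
    wd->count = 0;
    wd->timeout = timeout;
    wd->reset_asserted = false;
}

/* Called periodically by healthy software to restart the count. */
void watchdog_kick(watchdog_t *wd) {
    wd->count = 0;
}

/* Called once per clock tick; fires when the count reaches the timeout. */
void watchdog_tick(watchdog_t *wd) {
    if (wd->reset_asserted)
        return;
    if (++wd->count >= wd->timeout)
        wd->reset_asserted = true;  /* would interrupt and reset the host CPU */
}
```

With a timeout of 3, the sequence tick, tick, kick, tick, tick leaves the watchdog quiet, and one more tick without a kick makes it fire.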