Chapter 4 Interrupts and Exceptions
Introduction
- An interrupt is an event that alters the sequence of instructions executed by a processor
  - It corresponds to electrical signals generated by HW circuits both inside and outside the CPU
- Interrupts: asynchronous interrupts
  - Generated by HW devices (e.g., internal timers and I/O devices) at arbitrary times
- Exceptions: synchronous interrupts
  - Produced by the CPU control unit only after completion of an executing instruction
  - E.g., divide-by-0, page faults
Role of Interrupt/Exception Signals
- When an interrupt/exception signal occurs, the CPU
  - Saves the current program counter (eip and cs) on the Kernel Mode stack
  - Places the address of the interrupt handler (IH) into the program counter
- The code executed in an IH is not a process
  - It is a kernel control path that runs on behalf of the same process that was running when the signal occurred
Interrupt/Exception Handler Requirements
- As short as possible
  - Defer as much processing as possible
  - E.g., a block of data arrives on a network line
  - Top-half vs. bottom-half
- Nested interrupt handling
  - Should be allowed as much as possible to keep I/O devices busy
  - Interrupt handlers in Linux need not be reentrant
    - While an IH is executing, the corresponding interrupt line is masked out on all processors
    - The same IH is never invoked concurrently to service a nested interrupt
- Maskable interrupts
  - Some critical regions cannot allow interrupts
  - Such regions should be limited as much as possible
Interrupts and Exceptions
Interrupts Definition
- Maskable interrupts
  - All IRQs issued by I/O devices
  - Can be in 2 states: masked or unmasked
- Nonmaskable interrupts
  - Critical events such as HW failures
  - Always recognized by the CPU
Exceptions Definition
- Processor-detected exceptions: generated when the CPU detects an anomalous condition while executing an instruction
  - Faults: the saved eip is the address of the instruction that caused the fault
    - The same instruction is re-executed after the IH returns
    - Usage: e.g., the page fault handler
  - Traps: the saved eip is the address of the instruction after the one that caused the trap
    - Main usage: debugging (e.g., reaching a breakpoint)
  - Aborts: a serious error; the control unit may be unable to determine the exact instruction that caused it
    - The affected process is terminated
- Programmed exceptions: occur at the request of the programmer
  - Triggered by the int, int3, into, and bound instructions
  - Handled by the control unit as traps
  - Often called SW interrupts
  - Usage: to implement system calls and to notify a debugger of a specific event
Interrupt or Exception Vector
- Each interrupt or exception is identified by a number from 0 to 255
  - Such a number is called its vector
- The vectors of nonmaskable interrupts and exceptions are fixed
- The vectors of maskable interrupts can be altered by programming the Interrupt Controller
IRQs
- Each HW device controller capable of issuing interrupts has an output line called the IRQ (Interrupt ReQuest) line
- All existing IRQ lines are connected to the input pins of the Interrupt Controller
- The Interrupt Controller (IC) repeats the following:
  - Monitors the IRQ lines, checking for raised signals
  - If a raised signal is detected on an IRQ line:
    1. Converts the signal into a corresponding vector
    2. Stores the vector in an IC I/O port, for the CPU to read
    3. Sends a signal to the CPU's INTR pin (i.e., issues an interrupt)
    4. Waits until the CPU acknowledges the interrupt by writing into one of the Programmable Interrupt Controller (PIC) I/O ports
    5. Clears the INTR line
  - Goes back to the monitoring step
[Figure: I/O Interrupt Handling. HARDWARE: Device 1 and Device 2 share line IRQn into the PIC, which raises INT. SOFTWARE (interrupt handler): IDT[32+n] -> IRQn_interrupt() -> do_IRQ(n) -> interrupt service routine 1, interrupt service routine 2]
IRQ Lines
- The first IRQ line is IRQ0
  - The number of available IRQ lines is limited to 15 for now
- Intel default vector for IRQn = n + 32
  - The mapping between IRQs and vectors can be modified by issuing suitable I/O instructions to the IC ports
- The PIC can be told to stop issuing interrupts that refer to a given IRQ line
  - Disabled interrupts are not lost but delayed
- Selectively enabling/disabling IRQs is not the same as globally masking/unmasking interrupts
  - When the IF flag of the eflags register is clear, maskable interrupts are temporarily ignored by the CPU
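The default IRQ-to-vector mapping and the two independent levels of masking can be sketched in a few lines of C. This is only an illustrative user-space model: `irq_to_vector()`, `irq_reaches_cpu()` and `NUM_PIC_IRQS` are hypothetical names, and `FIRST_EXTERNAL_VECTOR` is assumed to be 32 to match the Intel default above.

```c
#include <stdbool.h>

#define FIRST_EXTERNAL_VECTOR 32   /* assumed: Intel default base vector */
#define NUM_PIC_IRQS          16   /* assumed: IRQ0..IRQ15 on two 8259As */

/* Default vector for IRQ n, or -1 if n is not a valid PIC line. */
static int irq_to_vector(int irq)
{
    if (irq < 0 || irq >= NUM_PIC_IRQS)
        return -1;
    return irq + FIRST_EXTERNAL_VECTOR;
}

/*
 * An interrupt is delivered only if BOTH levels allow it:
 *  - the line is not disabled at the PIC (bit clear in the mask), and
 *  - the CPU's IF flag is set (maskable interrupts globally enabled).
 */
static bool irq_reaches_cpu(unsigned pic_mask, bool if_flag, int irq)
{
    bool line_enabled = (pic_mask & (1u << irq)) == 0;
    return line_enabled && if_flag;
}
```

For example, `irq_to_vector(1)` gives 33 for the keyboard line, and an IRQ whose mask bit is set never reaches the CPU even when IF is set.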
Homework Practice
- How do you find out the IRQ assignment of your Linux PC?
  - Ans: read /proc/interrupts
Exceptions
- The 80x86 issues about 20 different exceptions
- Each exception type is associated with a dedicated exception handler
- For some exceptions, the CPU also generates a HW error code and pushes it on the Kernel Mode stack before jumping to the exception handler
- An exception handler usually sends a Unix signal to the process
- Exception vectors 20-31 are reserved by Intel
Interrupt Vectors
- IRQ vector assignment
  - Vector assignment range: 32-238
  - Vector 128 is reserved for the system call exception

Vector range            Use
0-19 (0x0-0x13)         Nonmaskable interrupts and exceptions
20-31 (0x14-0x1f)       Intel-reserved
32-127 (0x20-0x7f)      External interrupts (IRQs)
128 (0x80)              System call exception
129-238 (0x81-0xee)     External interrupts (IRQs)
239 (0xef)              Local APIC timer interrupt
240-250 (0xf0-0xfa)     Reserved by Linux for future use
251-255 (0xfb-0xff)     Interprocessor interrupts
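The vector ranges in the table can be expressed as a small lookup function. This is a sketch for study purposes only; the enum and the function name `classify_vector()` are hypothetical and do not appear in the kernel.

```c
/* Hypothetical classification of the 256 vectors, following the table above. */
enum vector_class {
    VEC_EXCEPTION_OR_NMI,   /* 0-19    */
    VEC_INTEL_RESERVED,     /* 20-31   */
    VEC_EXTERNAL_IRQ,       /* 32-127 and 129-238 */
    VEC_SYSTEM_CALL,        /* 128     */
    VEC_LOCAL_APIC_TIMER,   /* 239     */
    VEC_LINUX_RESERVED,     /* 240-250 */
    VEC_INTERPROCESSOR,     /* 251-255 */
    VEC_INVALID             /* outside 0-255 */
};

static enum vector_class classify_vector(int v)
{
    if (v < 0 || v > 255) return VEC_INVALID;
    if (v <= 19)          return VEC_EXCEPTION_OR_NMI;
    if (v <= 31)          return VEC_INTEL_RESERVED;
    if (v == 128)         return VEC_SYSTEM_CALL;   /* carved out of the IRQ range */
    if (v <= 238)         return VEC_EXTERNAL_IRQ;
    if (v == 239)         return VEC_LOCAL_APIC_TIMER;
    if (v <= 250)         return VEC_LINUX_RESERVED;
    return VEC_INTERPROCESSOR;
}
```

Note how vector 128 must be tested before the general 32-238 range, which mirrors the way the system call vector is skipped when IRQ gates are installed.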
#    Exception                      Handler                          Signal
0    Divide error                   divide_error()                   SIGFPE
1    Debug                          debug()                          SIGTRAP
2    NMI                            nmi()                            None
3    Breakpoint                     int3()                           SIGTRAP
4    Overflow                       overflow()                       SIGSEGV
5    Bounds check                   bounds()                         SIGSEGV
6    Invalid opcode                 invalid_op()                     SIGILL
7    Device not available           device_not_available()           SIGSEGV
8    Double fault                   double_fault()                   SIGSEGV
9    Coprocessor segment overrun    coprocessor_segment_overrun()    SIGFPE
10   Invalid TSS                    invalid_tss()                    SIGSEGV
11   Segment not present            segment_not_present()            SIGBUS
12   Stack exception                stack_segment()                  SIGBUS
13   General protection             general_protection()             SIGSEGV
14   Page Fault                     page_fault()                     SIGSEGV
15   Intel reserved                 None                             None
16   Floating-point error           coprocessor_error()              SIGFPE
17   Alignment check                alignment_check()                SIGBUS
18   Machine check                  machine_check()                  None
19   SIMD floating point            simd_coprocessor_error()         SIGFPE
Review Slide
- Interrupts? Exceptions?
- Interrupt handler? Requirements?
- Maskable vs. nonmaskable interrupts?
- Processor-detected exceptions?
  - Faults, traps, aborts
- Programmed exceptions?
  - SW interrupts?
- Interrupt vector? Range? Vector assignment?
- Interrupt controller processing steps?

Review Slide
- Intel default vector for IRQn?
- Disabled interrupts? Masked interrupts?
- Number of exceptions defined for Intel?
- Homework #3: User-mode vs. kernel-mode stack
  - Required for EOS new students
  - Optional for others. Not graded.
  - 忠毅: please present your report next week
Interrupt Descriptor Table
- The IDT associates each interrupt (or exception) vector with one interrupt handler
- The IDT must be properly initialized before the kernel enables interrupts
- Each entry in the IDT is an 8-byte descriptor
  - At most 256 x 8 = 2048 bytes are required to store the IDT
- The idtr register stores the base address of the IDT
- The P bit indicates whether the segment is currently in memory
- 3 types of descriptors in the IDT (type field in bits 40-43)
  - Task Gate (Linux uses it only for the double fault exception, vector 8)
  - Interrupt Gate: before jumping to the proper segment, the CPU clears the IF flag, disabling maskable interrupts
  - Trap Gate: before jumping to the proper segment, the CPU does not modify the IF flag
[Figure: formats of the three 64-bit gate descriptors]

Task Gate Descriptor
  bits 63-48: reserved | 47: P | 46-45: DPL | 44-40: 0 0 1 0 1 | 39-32: reserved
  bits 31-16: TSS segment selector | 15-0: reserved

Interrupt Gate Descriptor
  bits 63-48: offset (16-31) | 47: P | 46-45: DPL | 44-40: 0 1 1 1 0 | 39-37: 0 0 0 | 36-32: reserved
  bits 31-16: segment selector | 15-0: offset (0-15)

Trap Gate Descriptor
  bits 63-48: offset (16-31) | 47: P | 46-45: DPL | 44-40: 0 1 1 1 1 | 39-37: 0 0 0 | 36-32: reserved
  bits 31-16: segment selector | 15-0: offset (0-15)
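The bit layout of interrupt and trap gates can be made concrete by packing one into a 64-bit integer. The sketch below is a user-space illustration, not kernel code: `pack_gate()` and the accessor names are hypothetical, and the selector value used in the usage note is only an example.

```c
#include <stdint.h>

/* 4-bit type field (bits 40-43); bit 44 is always 0 for these gates. */
enum gate_type { GATE_TASK = 0x5, GATE_INTERRUPT = 0xE, GATE_TRAP = 0xF };

/*
 * Build an interrupt or trap gate descriptor with P = 1.
 * (For a task gate the offset fields are reserved; pass offset = 0.)
 */
static uint64_t pack_gate(uint32_t offset, uint16_t selector,
                          enum gate_type type, unsigned dpl)
{
    uint64_t d = 0;
    d |= (uint64_t)(offset & 0xFFFFu);        /* bits 0-15:  offset (0-15)    */
    d |= (uint64_t)selector << 16;            /* bits 16-31: segment selector */
    d |= (uint64_t)(type & 0xFu) << 40;       /* bits 40-43: gate type        */
    d |= (uint64_t)(dpl & 0x3u) << 45;        /* bits 45-46: DPL              */
    d |= (uint64_t)1 << 47;                   /* bit 47:     P (present)      */
    d |= (uint64_t)(offset >> 16) << 48;      /* bits 48-63: offset (16-31)   */
    return d;
}

static unsigned gate_dpl(uint64_t d)     { return (unsigned)((d >> 45) & 0x3u); }
static unsigned gate_type_of(uint64_t d) { return (unsigned)((d >> 40) & 0xFu); }

/* Reassemble the 32-bit handler offset from its two halves. */
static uint32_t gate_offset(uint64_t d)
{
    return (uint32_t)((d & 0xFFFFu) | ((d >> 48) << 16));
}
```

With a handler at 0xC0101234 and an example selector of 0x0060, an interrupt gate with DPL 0 packs to 0xC0108E0000601234: the byte 0x8E is P = 1, DPL = 00, followed by 0 1 1 1 0.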
HW Handling of Interrupts (Exceptions)
- Between instructions, the control unit of the CPU checks whether an interrupt or exception has occurred. If so, it:
  1. Determines the vector i (0 <= i <= 255) associated with the interrupt (exception)
  2. Reads the i-th entry of the IDT
  3. Obtains the IH address (entry's segment selector -> gdtr -> GDT -> segment base address)
  4. Checks the privilege level by comparing the CPL in cs with the DPL of the IH's segment
  5. Switches to the right stack (after checking the privilege level)
  6. If a fault has occurred, loads cs and eip with the address of the instruction that caused the fault
  7. Saves the contents of eflags, cs, and eip on the stack
  8. Loads cs and eip with the address of the IH routine
Interrupt Handler Return Path
1. Load the cs, eip, and eflags registers with the values stored on the stack
   - If a HW error code was pushed on the stack on top of eip, it must be popped before taking the return path
2. Check whether the CPL of the ISR's cs equals the CPL of the restored cs. If so, the return is complete
3. Otherwise, load ss and esp from the stack and return to the stack associated with the old privilege level
4. Take care of the return-to-user-mode case to avoid using wrong segment selectors
Nested Execution of IHs
- Linux does not allow process switching during an interrupt handler routine
  - But an interrupt handler may be interrupted by another one
  - The current process does not change during nested IHs
- The only exception that may be raised in Kernel Mode is the Page Fault exception
  - All other exceptions should be raised only in User Mode
  - Otherwise (raised in Kernel Mode), they cause a kernel panic
- The page fault exception handler may suspend the current process (until the requested page is in memory)
  - A context switch is possible inside this handler
- Interrupts raised by I/O devices do not refer to data structures specific to the current process
Nested Execution of IHs
- Interrupt handlers must not cause page faults
  - No exception handler may preempt an interrupt handler
  - No context switch takes place inside an interrupt handler
- Reasons for nested execution of IHs
  - To improve the throughput of PICs and device controllers
    - Until the CPU acks an interrupt, both the PIC and the device controller are blocked
  - To implement an interrupt model without priority levels
    - An interrupt handler can be preempted by another one
IDT Initialization
- The base address of the IDT should be loaded into idtr before the kernel enables interrupts
      lidt idt_descr                  # (arch/i386/kernel/head.S)
      idt_descr:
          .word IDT_ENTRIES*8-1       # idt contains 256 entries
          .long idt_table
- The int instruction allows a User Mode process to issue an interrupt signal with any vector from 0 to 255
  - To block illegal int instructions from a User Mode process, set the DPL of the gate descriptor to 0
  - When a User Mode process issues such an int, its CPL (3) > DPL (0), so a "general protection" exception is raised
- In a few cases, a User Mode process must be able to issue a programmed exception
  - Set the DPL of the gate descriptor to 3
Interrupt, Trap, System Gates
- The Intel IDT provides 3 types of descriptors
  - Task, interrupt, and trap gate descriptors
- Linux's classification
  - Interrupt gate (DPL = 0)
    - Cannot be accessed by a User Mode process
    - All Linux interrupt handlers use this one
  - System gate (DPL = 3)
    - An Intel trap gate that can be accessed by a User Mode process
    - Vectors 3 (int3), 4 (into), 5 (bound), 128 (int $0x80)
  - Trap gate (DPL = 0)
    - An Intel trap gate that cannot be accessed by a User Mode process
    - Most Linux exception handlers use this one
IDT Operations
- set_intr_gate(n, addr)
  - Inserts an interrupt gate in the n-th IDT entry
  - Segment selector: kernel code's selector; offset: addr; DPL: 0
- set_system_gate(n, addr)
  - Inserts a trap gate in the n-th IDT entry
  - Segment selector: kernel code's selector; offset: addr; DPL: 3
- set_trap_gate(n, addr)
  - Inserts a trap gate in the n-th IDT entry
  - Segment selector: kernel code's selector; offset: addr; DPL: 0
- Code trace: trap_init()
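The difference between the three Linux gate classes comes down to two properties: whether the CPU clears IF on entry, and the DPL that decides whether a User Mode `int` instruction may use the gate. The following sketch models just those two properties; the struct and all function names are hypothetical and only loosely mirror set_intr_gate(), set_system_gate() and set_trap_gate().

```c
#include <stdbool.h>

enum intel_gate { INTEL_INTERRUPT_GATE, INTEL_TRAP_GATE };

/* Minimal model of a gate: Intel type, DPL, and whether IF is cleared on entry. */
struct gate {
    enum intel_gate type;
    unsigned dpl;
    bool clears_if;
};

static struct gate make_intr_gate(void)   /* Linux interrupt gate: DPL 0 */
{
    return (struct gate){ INTEL_INTERRUPT_GATE, 0, true };
}

static struct gate make_system_gate(void) /* Linux system gate: trap gate, DPL 3 */
{
    return (struct gate){ INTEL_TRAP_GATE, 3, false };
}

static struct gate make_trap_gate(void)   /* Linux trap gate: DPL 0 */
{
    return (struct gate){ INTEL_TRAP_GATE, 0, false };
}

/*
 * A software "int n" is permitted only if CPL <= DPL of the gate;
 * otherwise the CPU raises a general protection exception instead.
 */
static bool int_instruction_allowed(unsigned cpl, struct gate g)
{
    return cpl <= g.dpl;
}
```

In this model a User Mode process (CPL 3) passes the check only for system gates, which is why vectors 3, 4, 5 and 128 are installed that way.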
IDT Preliminary Initialization
- The IDT is first initialized and used by the BIOS
- Once Linux takes over (protected mode), the IDT is initialized again by Linux
  - idt_table: 256 entries
- During kernel initialization
  - setup_idt() fills all entries in idt_table with ignore_int()
    - arch/i386/kernel/head.S
  - ignore_int()
    - Saves registers on the stack
    - Calls printk()
    - Restores registers from the stack
    - Executes iret to resume
- Second initialization: the kernel replaces some entries with real interrupt handlers
  - trap_init()
Review Slide
- IDT? Number of entries in the IDT? Size of each entry? Base address of the IDT?
- Types of descriptors in the IDT?
- The only kernel exception?
- How to block an illegal interrupt from a User Mode process?
- How to enable a User Mode process to issue a programmed exception?
- Linux interrupt descriptor classification?
  - Interrupt gate, System gate, Trap gate?

Review Slide
- set_intr_gate(), set_system_gate(), set_trap_gate()?
Exception Handling

Introduction
- Most exceptions issued by the CPU are interpreted by Linux as error conditions
  - A signal is sent to the current process
  - If no signal handler is set for that signal, the current process is aborted
  - Special case: page fault exception
- Exception handler steps:
  - Save registers on the Kernel Mode stack
  - Call a high-level C function to handle the exception
  - Exit from the handler by jumping to ret_from_exception()
- Code trace: page_fault exception
  - arch/i386/kernel/entry.S
  - arch/i386/kernel/traps.c
Exception Handler Registration

void __init trap_init(void)
{
    …
    set_trap_gate(0, &divide_error);
    set_intr_gate(1, &debug);
    set_intr_gate(2, &nmi);
    set_system_gate(3, &int3);      /* int3-5 can be called from all */
    set_system_gate(4, &overflow);
    set_system_gate(5, &bounds);
    set_trap_gate(6, &invalid_op);
    set_trap_gate(7, &device_not_available);
    set_task_gate(8, GDT_ENTRY_DOUBLEFAULT_TSS);
    set_trap_gate(9, &coprocessor_segment_overrun);
    set_trap_gate(10, &invalid_TSS);
    set_trap_gate(11, &segment_not_present);
    set_trap_gate(12, &stack_segment);
    set_trap_gate(13, &general_protection);
    set_intr_gate(14, &page_fault);
    set_trap_gate(15, &spurious_interrupt_bug);
    set_trap_gate(16, &coprocessor_error);
    set_trap_gate(17, &alignment_check);
    set_trap_gate(19, &simd_coprocessor_error);

    set_system_gate(SYSCALL_VECTOR, &system_call);

    set_call_gate(&default_ldt[0], lcall7);
    set_call_gate(&default_ldt[4], lcall27);

    cpu_init();
    trap_init_hook();
}
Entering/Leaving Exception Handler
- A high-level C handler often stores the error code and vector in the task_struct and sends a suitable signal to the current process
      current->tss.error_code = error_code;
      current->tss.trap_no = vector;
      force_sig(sig_num, current);
  - Code trace: do_general_protection()
- The current process takes care of the signal right after the exception handler terminates
  - The signal is processed by the process's signal handler
  - If no handler is available, the kernel handles it and kills the process
- When the C exception handler returns, control goes to
      addl $8, %esp
      jmp ret_from_exception
Interrupt Handling

Introduction
- No signal is sent to the process for interrupts
  - A signal is sent to the process for exceptions
- The interrupt handler for a device is part of the device's driver
- Interrupt types:
  - I/O interrupts: to handle I/O devices
  - Timer interrupts: Chapter 6
    - Self-reading material
  - Interprocessor interrupts: to interrupt another CPU in a multiprocessor (MP) system
I/O Interrupt Handling
- An I/O IH should be capable of servicing several devices at the same time
  - Several devices may share the same IRQ
  - Refer to Table 4.3 in the following slides
- IRQ sharing
  - One interrupt handler executes several ISRs
  - Each ISR is related to a single device sharing this IRQ line
  - Every ISR on the line is executed when an interrupt occurs, because it is not known in advance which device raised it
- IRQ dynamic allocation
  - An IRQ line is associated with a device only when the device is accessed
  - E.g., floppy disk device
  - The same IRQ vector may be used by several devices, but not at the same time
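IRQ sharing amounts to walking a list of ISRs and letting each one check its own device. The sketch below is a user-space model of that idea under simplifying assumptions: `struct isr_node`, `run_shared_isrs()` and `demo_isr()` are hypothetical names, and each "device" is reduced to an int flag that is nonzero while it has an interrupt pending.

```c
#include <stddef.h>

/* One registered ISR on a shared line. */
struct isr_node {
    int (*handler)(int irq, void *dev_id);  /* returns 1 if its device raised the IRQ */
    void *dev_id;                           /* identifies the device to the handler   */
    struct isr_node *next;
};

/* Call every ISR registered on the line; return how many claimed the interrupt. */
static int run_shared_isrs(int irq, struct isr_node *head)
{
    int handled = 0;
    for (struct isr_node *n = head; n != NULL; n = n->next)
        handled += n->handler(irq, n->dev_id);
    return handled;
}

/* Hypothetical device ISR: dev_id points to the device's "pending" flag. */
static int demo_isr(int irq, void *dev_id)
{
    int *pending = dev_id;
    (void)irq;
    if (!*pending)
        return 0;       /* not our device: do nothing */
    *pending = 0;       /* service the device and clear the request */
    return 1;
}
```

Each ISR must be able to tell quickly whether its own device needs attention, since it is invoked on every interrupt on the shared line.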
Sample: IRQ Assignment to I/O Devices

IRQ  INT  Device               IRQ  INT  Device
0    32   Timer                10   42   Network interface
1    33   Keyboard             11   43   USB, sound card
2    34   PIC cascading        12   44   PS/2 mouse
3    35   2nd serial port      13   45   Math coprocessor
4    36   1st serial port      14   46   EIDE disk controller, 1st chain
6    38   Floppy disk          15   47   EIDE disk controller, 2nd chain
8    40   System clock
Interrupt Handler Structure
- Linux divides the actions in an IH into 3 classes
  - Critical, noncritical, noncritical deferrable
- Critical
  - E.g., acking an interrupt to the PIC so it can accept another interrupt on the same IRQ line
  - Executed in the IH, with maskable interrupts disabled
- Noncritical
  - E.g., updating data structures accessed only by the processor
  - Should be finished quickly
  - Executed in the IH, with maskable interrupts enabled
- Noncritical deferrable
  - E.g., copying buffer content into the address space of some process
  - Can be delayed for a long time
  - Executed outside the IH, in the so-called bottom half
Interrupt Vectors
- Some devices must be statically connected to specific IRQ lines
  - Internal timer: IRQ0
  - Slave 8259A PIC: IRQ2
  - External math coprocessor: IRQ13
- 3 ways to select a line for IRQ-configurable devices
  - By setting HW jumpers
  - By a utility program shipped with the device
  - By a HW protocol executed at system startup
Interrupt Handler Implementation

I/O Interrupt Handler Tasks
1. Save the IRQ value and register contents on the Kernel Mode stack
2. Send an ack to the PIC that is servicing the IRQ line, allowing it to issue further interrupts
3. Execute the ISRs associated with all devices sharing this IRQ
4. Terminate by jumping to ret_from_intr()
typedef struct irq_desc {
    unsigned int status;            /* IRQ line status, next slide */
    hw_irq_controller *handler;
    struct irqaction *action;       /* IRQ action ISR list */
    unsigned int depth;             /* nested irq disables */
    unsigned int irq_count;         /* For detecting broken interrupts */
    unsigned int irqs_unhandled;
    spinlock_t lock;
} ____cacheline_aligned irq_desc_t;

extern irq_desc_t irq_desc[NR_IRQS];    /* global variable */

typedef struct hw_interrupt_type hw_irq_controller;

struct hw_interrupt_type {
    const char *typename;
    unsigned int (*startup)(unsigned int irq);
    void (*shutdown)(unsigned int irq);
    void (*enable)(unsigned int irq);
    void (*disable)(unsigned int irq);
    void (*ack)(unsigned int irq);
    void (*end)(unsigned int irq);
    void (*set_affinity)(unsigned int irq, cpumask_t dest);
};
[Figure: IRQ Descriptors. The irq_desc array (labeled 0 ... i ... 224) holds irq_desc_t elements; element i points to a hw_interrupt_type and to a chain of irqaction structures]
IRQ Status Listing

/*
 * IRQ line status.
 */
#define IRQ_INPROGRESS  1     /* IRQ handler active - do not enter! */
#define IRQ_DISABLED    2     /* IRQ disabled - do not enter! */
#define IRQ_PENDING     4     /* IRQ pending - replay on enable */
#define IRQ_REPLAY      8     /* IRQ has been replayed but not acked yet */
#define IRQ_AUTODETECT  16    /* IRQ is being autodetected */
#define IRQ_WAITING     32    /* IRQ not yet seen - for autodetection */
#define IRQ_LEVEL       64    /* IRQ level triggered */
#define IRQ_MASKED      128   /* IRQ masked - shouldn't be seen again */
#define IRQ_PER_CPU     256   /* IRQ is per CPU */
.data
ENTRY(interrupt)
.text

vector=0
ENTRY(irq_entries_start)
.rept NR_IRQS
    ALIGN
1:  pushl $vector-256
    jmp common_interrupt
.data
    .long 1b
.text
vector=vector+1
.endr

    ALIGN
common_interrupt:
    SAVE_ALL
    call do_IRQ
    jmp ret_from_intr

#define BUILD_INTERRUPT(name, nr)   \
ENTRY(name)                         \
    pushl $nr-256;                  \
    SAVE_ALL                        \
    call smp_/**/name;              \
    jmp ret_from_intr;

/* The include is where all of the SMP etc. interrupts come from */
#include "entry_arch.h"

ENTRY(divide_error)
    pushl $0                        # no error code
    pushl $do_divide_error
    ALIGN
error_code:
    pushl %ds
    pushl %eax
    xorl %eax, %eax
    pushl %edx
    decl %eax                       # eax = -1
    pushl %ecx
    pushl %ebx
    cld
    movl %es, %ecx
    movl ORIG_EAX(%esp), %esi       # get the error code
    movl ES(%esp), %edi             # get the function address
    movl %eax, ORIG_EAX(%esp)
    movl %ecx, ES(%esp)
    movl %esp, %edx
    pushl %esi                      # push the error code
    pushl %edx                      # push the pt_regs pointer
    movl $(__USER_DS), %edx
    movl %edx, %ds
    movl %edx, %es
    call *%edi
    addl $8, %esp
    jmp ret_from_exception
irq_desc_t irq_desc[NR_IRQS] __cacheline_aligned = {
    [0 ... NR_IRQS-1] = {
        .handler = &no_irq_type,
        .lock = SPIN_LOCK_UNLOCKED
    }
};

asmlinkage void __init start_kernel(void)
{
    …
    sort_main_extable();
    trap_init();
    rcu_init();
    init_IRQ();
    …
}

void __init init_IRQ(void)
{
    int i;

    pre_intr_init_hook();

    /* Install an interrupt gate for every external vector except the system call */
    for (i = 0; i < (NR_VECTORS - FIRST_EXTERNAL_VECTOR); i++) {
        int vector = FIRST_EXTERNAL_VECTOR + i;
        if (i >= NR_IRQS)
            break;
        if (vector != SYSCALL_VECTOR)
            set_intr_gate(vector, interrupt[i]);
    }
    intr_init_hook();
    setup_timer();
    …
}

void __init pre_intr_init_hook(void)
{
    init_ISA_irqs();
}

void __init init_ISA_irqs(void)
{
    int i;

    init_8259A(0);
    for (i = 0; i < NR_IRQS; i++) {
        irq_desc[i].status = IRQ_DISABLED;
        irq_desc[i].action = 0;
        irq_desc[i].depth = 1;

        if (i < 16) {
            /* the 16 legacy lines are driven by the 8259A PICs */
            irq_desc[i].handler = &i8259A_irq_type;
        } else {
            irq_desc[i].handler = &no_irq_type;
        }
    }
}

static struct hw_interrupt_type i8259A_irq_type = {
    "XT-PIC",
    startup_8259A_irq,
    shutdown_8259A_irq,
    enable_8259A_irq,
    disable_8259A_irq,
    mask_and_ack_8259A,
    end_8259A_irq,
    NULL
};
asmlinkage unsigned int do_IRQ(struct pt_regs regs)
{
    int irq = regs.orig_eax & 0xff;     /* high bits used in ret_from_ code */
    irq_desc_t *desc = irq_desc + irq;

    irq_enter();
    kstat_this_cpu.irqs[irq]++;
    spin_lock(&desc->lock);
    desc->handler->ack(irq);
    status = desc->status & ~(IRQ_REPLAY | IRQ_WAITING);
    status |= IRQ_PENDING;              /* we _want_ to handle it */

    for (;;) {
        irqreturn_t action_ret;
        spin_unlock(&desc->lock);
        …
        action_ret = handle_IRQ_event(irq, &regs, action);
        …
        spin_lock(&desc->lock);
        desc->status &= ~IRQ_PENDING;
    }
    desc->status &= ~IRQ_INPROGRESS;

out:
    desc->handler->end(irq);
    spin_unlock(&desc->lock);
    irq_exit();
    return 1;
}

asmlinkage int handle_IRQ_event(unsigned int irq,
        struct pt_regs *regs, struct irqaction *action)
{
    int status = 1;     /* Force the "do bottom halves" bit */
    int retval = 0;

    if (!(action->flags & SA_INTERRUPT))
        local_irq_enable();     // RA

    do {
        status |= action->flags;
        retval |= action->handler(irq, action->dev_id, regs);
        action = action->next;
    } while (action);

    if (status & SA_SAMPLE_RANDOM)
        add_interrupt_randomness(irq);

    local_irq_disable();        // RA
    return retval;
}
Registering Interrupt Service Routine
- Drivers can register an IH and enable a given interrupt line via
      int request_irq(unsigned int irq,
                      irqreturn_t (*handler)(int, void *, struct pt_regs *),
                      unsigned long irqflags,
                      const char *devname,
                      void *dev_id);
  - irq: the interrupt line number to allocate
    - For legacy PC devices, this value is hard-coded
    - For most other devices, it is probed or determined dynamically
  - handler: pointer to the actual ISR
  - irqflags: discussed in the next slide
  - devname: an ASCII text representation of the device, such as "keyboard"
  - dev_id: used as a unique cookie when this line is shared
    - A common practice is to pass the driver's device structure
irqflags Options
- irqflags may be either 0 or a bit mask of one or more of the following flags
- SA_INTERRUPT
  - The given IH is a fast IH: it runs with all interrupts disabled on the local processor
  - By default (without this flag), all interrupts are enabled except the interrupt lines of any running handlers
- SA_SAMPLE_RANDOM
  - Interrupts generated by this device should contribute to the kernel random pool
  - Used on devices with non-deterministic interrupt intervals
- SA_SHIRQ
  - The interrupt line can be shared among multiple ISRs
request_irq Usage
- To request an interrupt line and install a handler
      if (request_irq(irqn, my_interrupt, SA_SHIRQ, "my-device", dev)) {
          printk(KERN_ERR "my_device: cannot register IRQ %d\n", irqn);
          return -EIO;
      }
  - This call may block, so it cannot be called from interrupt context or other situations where code cannot block
  - A return value of 0 means the handler was successfully installed
- To free an interrupt line, call
      void free_irq(unsigned int irq, void *dev_id);
  - If the line is not shared, it removes the handler and disables the line
  - Otherwise, only the handler is removed; the line is disabled when the last handler is removed
  - dev_id is used to uniquely identify an interrupt handler
  - This call must be made from process context
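The sharing rule behind request_irq() and free_irq() can be tried out in user space. The sketch below is a simplified model, not the kernel implementation: every `demo_`/`DEMO_` name is hypothetical, `DEMO_SHARED` stands in for SA_SHIRQ, and hardware enabling/disabling of the line is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_SHARED  0x1    /* stands in for SA_SHIRQ */
#define DEMO_NR_IRQS 16     /* assumed number of lines in this model */

struct demo_action {
    unsigned long flags;
    void *dev_id;               /* unique cookie identifying the handler */
    struct demo_action *next;
};

static struct demo_action *demo_lines[DEMO_NR_IRQS];

/* Returns 0 on success, or a negative errno value on failure. */
static int demo_request_irq(unsigned irq, unsigned long flags, void *dev_id)
{
    struct demo_action **p, *new_action;

    if (irq >= DEMO_NR_IRQS)
        return -EINVAL;
    /* A line already in use can be joined only if both sides allow sharing. */
    if (demo_lines[irq] && !(demo_lines[irq]->flags & flags & DEMO_SHARED))
        return -EBUSY;

    new_action = malloc(sizeof(*new_action));
    if (!new_action)
        return -ENOMEM;
    new_action->flags = flags;
    new_action->dev_id = dev_id;
    new_action->next = NULL;

    for (p = &demo_lines[irq]; *p; p = &(*p)->next)
        ;                       /* walk to the tail of the list */
    *p = new_action;
    return 0;
}

/* Remove the handler identified by dev_id; the line is free once the list is empty. */
static void demo_free_irq(unsigned irq, void *dev_id)
{
    struct demo_action **p;

    if (irq >= DEMO_NR_IRQS)
        return;
    for (p = &demo_lines[irq]; *p; p = &(*p)->next) {
        if ((*p)->dev_id == dev_id) {
            struct demo_action *victim = *p;
            *p = victim->next;
            free(victim);
            return;
        }
    }
}
```

This also shows why dev_id must be unique on a shared line: it is the only thing that tells the free routine which handler to unlink.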
int request_irq(unsigned int irq,
        irqreturn_t (*handler)(int, void *, struct pt_regs *),
        unsigned long irqflags, const char *devname, void *dev_id)
{
    int retval;
    struct irqaction *action;

    if (irq >= NR_IRQS)
        return -EINVAL;
    if (!handler)
        return -EINVAL;

    action = (struct irqaction *)kmalloc(sizeof(struct irqaction), GFP_ATOMIC);
    if (!action)
        return -ENOMEM;

    action->handler = handler;
    action->flags = irqflags;
    action->mask = 0;
    action->name = devname;
    action->next = NULL;
    action->dev_id = dev_id;

    retval = setup_irq(irq, action);
    if (retval)
        kfree(action);
    return retval;
}

int setup_irq(unsigned int irq, struct irqaction *new)
{
    irq_desc_t *desc = irq_desc + irq;
    struct irqaction *old, **p;
    unsigned long flags;
    int shared = 0;

    if (desc->handler == &no_irq_type)
        return -ENOSYS;

    spin_lock_irqsave(&desc->lock, flags);
    p = &desc->action;
    if ((old = *p) != NULL) {
        /* Can't share interrupts unless both agree to */
        if (!(old->flags & new->flags & SA_SHIRQ)) {
            spin_unlock_irqrestore(&desc->lock, flags);
            return -EBUSY;
        }

        /* add new interrupt at end of irq queue */
        do {
            p = &old->next;
            old = *p;
        } while (old);
        shared = 1;
    }

    *p = new;
    if (!shared) {
        desc->depth = 0;
        desc->status &= ~(IRQ_DISABLED | IRQ_AUTODETECT |
                          IRQ_WAITING | IRQ_INPROGRESS);
        desc->handler->startup(irq);
    }
    spin_unlock_irqrestore(&desc->lock, flags);

    register_irq_proc(irq);
    return 0;
}
Processing Steps in Detail
1. A device issues an interrupt by sending an electric signal to the interrupt controller
2. If the interrupt line is enabled (it can be disabled), the IC sends the interrupt to the processor
3. If interrupts are not disabled in the processor, it immediately stops its current execution
4. It disables the interrupt system // RA: where does this take place?
5. It jumps to a predefined memory location, selected by the vector, and executes the code there (entry code)
6. The entry code saves the IRQ number and current register values on the stack and calls do_IRQ()
7. do_IRQ() acks receipt of the interrupt and disables interrupt delivery on this IRQ line
8. do_IRQ() calls handle_IRQ_event() to execute the registered ISRs
9. do_IRQ() returns to the entry code
10. The entry code jumps to ret_from_intr()
ret_from_intr()
- It is written in assembly code
- It first checks whether a reschedule is pending (need_resched)
  - If need_resched is set and the kernel is returning to user space, schedule() is called
  - If need_resched is set and the kernel is returning to kernel space, schedule() is called only if preempt_count == 0
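The rescheduling decision described above is a small truth table, written out here in C for clarity. The real logic is in assembly; `should_call_schedule()` is a hypothetical name used only for this sketch.

```c
#include <stdbool.h>

/*
 * Decide whether schedule() runs on the way out of an interrupt:
 *  - never, if no reschedule is pending;
 *  - always, when returning to user space;
 *  - when returning to kernel space, only if the kernel is preemptible
 *    at this point (preempt_count == 0).
 */
static bool should_call_schedule(bool need_resched, bool returning_to_user,
                                 int preempt_count)
{
    if (!need_resched)
        return false;
    if (returning_to_user)
        return true;
    return preempt_count == 0;
}
```

A nonzero preempt_count therefore defers the reschedule when the interrupt arrived while the kernel was in a non-preemptible region.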
Review Slide
- Which exception does not generate a signal to the process?
- Exception handler initialization? Processing steps?
- Types of interrupts?
  - I/O, timer, interprocessor?
- IRQ sharing? IRQ dynamic allocation?
- Linux classification of actions in an IH?
  - Critical, Noncritical, Noncritical Deferrable
- 3 ways to select an IRQ line for a configurable device?
  - HW jumpers, utility program, HW protocol
- Interrupt handler initialization? Processing steps?

Review Slide
- How to register an ISR?
  - request_irq() usage? Parameters?
    - irqline, routine, flags, devname, dev_id?
  - Flags usage?
    - SA_INTERRUPT, SA_SAMPLE_RANDOM, SA_SHIRQ
  - free_irq() usage?
- RA: Study the usage of SA_SAMPLE_RANDOM
  - How it affects the random-number generator
- Homework #4: IDT Table Initialization
  - Required for everyone
  - Mail your report to the TA before the deadline
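8259A PIC

8259A PIC History
- The PIC used in the IBM PC and its compatibles is the Intel 8259A chip
  - A single 8259A can connect up to 8 interrupt sources, but 2 or more 8259A chips can be cascaded, up to 8 of them, so up to 64 interrupt sources can be connected
- The early IBM PC/XT had only one 8259A, but the designers soon realized that this was not enough, so the IBM PC/AT increased the number of 8259As to 2
  - One is called the Master, the other the Slave
  - The Slave is cascaded onto the Master
- Today most PCs have 2 8259As and can accept up to 15 interrupts
- Individual interrupt sources can be masked through the 8259A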
8259A Architecture
- An 8259A chip has the following internal registers
  - Interrupt Mask Register (IMR)
    - Filters out masked interrupts
  - Interrupt Request Register (IRR)
    - Temporarily holds interrupts that have not yet been processed further
  - In Service Register (ISR)
    - While an interrupt is being handled by the CPU, it is held in the ISR
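More on 8259A PIC
- The 8259A also has a unit called the Priority Resolver
  - When several interrupts occur at the same time, the Priority Resolver passes the one with the highest priority to the CPU first
- The Pentium and later CPUs integrate an Advanced Programmable Interrupt Controller (APIC)
  - For backward compatibility, however, even machines with an APIC still have 8259As
- On current motherboards, the 8259As are provided by the southbridge chip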
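Interrupt Control on SMP
- When Intel considered how to build SMP on IA-32, the original 8259A interrupt controller proved inadequate
- On SMP, one must consider how interrupt signals from external devices are delivered to a suitable CPU, as well as the problem of IPIs (Inter-Processor Interrupts)
- Starting with the Pentium, Intel integrated an APIC into the CPU. On SMP, the motherboard has a global APIC (at least one; some motherboards have several IO-APICs to distribute interrupt signals better)
  - It receives interrupt signals from peripherals and distributes them to the CPUs; this global APIC is called the IO-APIC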
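8259A Processing Flow (1/2)
1. When an interrupt request arrives at the IMR on one of the lines IR0 to IR7, the IMR first checks whether that IR is masked. If so, the request is discarded; otherwise, it is placed in the IRR
2. The request stays in the IRR until it can be processed further. When the time comes, the Priority Resolver picks the highest-priority interrupt among those held in the IRR and passes it to the CPU. The lower the IR number, the higher the priority (IR0, the timer, has the highest priority)
3. The 8259A notifies the CPU that an interrupt has arrived by sending an INTR (Interrupt Request) signal. On receiving it, the CPU pauses before executing the next instruction and sends an INTA (Interrupt Acknowledge) signal to the 8259A
4. On receiving this signal, the 8259A immediately sets the bit in the ISR that corresponds to this interrupt and resets the corresponding bit in the IRR, indicating that the interrupt is being handled by the CPU rather than waiting for it
5. The CPU then sends a second INTA signal to the 8259A, asking for the interrupt vector of this request, which is a number from 0 to 255
6. The 8259A computes the vector number by adding the interrupt request number to the configured base vector (initialized through the initialization command word ICW2) and places it on the data bus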
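8259A Processing Flow (2/2)
After reading the interrupt vector from the data bus, the CPU looks up the corresponding interrupt service routine (ISR routine) in the IDT.
If the 8259A's End of Interrupt (EOI) notification is set to manual mode, the ISR routine must send an EOI to the 8259A when it finishes.
When the 8259A receives the EOI, the bit for this request in the ISR is reset.
If EOI is set to automatic mode, the 8259A resets the bit for this request in the ISR as soon as it receives the second INTA signal.
Meanwhile, new requests may arrive and be placed in the IRR. If any of them has a higher priority than every interrupt currently in the ISR, it is processed immediately by the steps above. Otherwise it waits in the IRR until the higher-priority interrupts in the ISR have been handled, that is, until their ISR bits are reset.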
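The processing flow can be sketched as a small user-space model. This is a hypothetical simulation, not kernel or hardware code: struct pic_model and the pic_* functions are names invented for illustration, and only a single chip in fixed-priority mode with manual EOI is modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model of one 8259A (illustration only). */
struct pic_model {
    uint8_t imr;         /* bit n = 1: line IRn is masked */
    uint8_t irr;         /* requests waiting for the CPU */
    uint8_t isr;         /* requests currently being serviced */
    uint8_t vector_base; /* base vector, as programmed through ICW2 */
};

/* Step 1: a request on line ir is dropped if masked, else latched in IRR. */
static void pic_raise(struct pic_model *p, unsigned ir)
{
    assert(ir < 8);
    if (!(p->imr & (1u << ir)))
        p->irr |= (uint8_t)(1u << ir);
}

/* Steps 2-6: find the highest-priority pending line (lowest IR number).
 * It is delivered only if nothing of equal or higher priority is in
 * service; its bit then moves from IRR to ISR and the vector is returned.
 * Returns -1 when nothing can be delivered. */
static int pic_acknowledge(struct pic_model *p)
{
    for (unsigned ir = 0; ir < 8; ir++) {
        uint8_t bit = (uint8_t)(1u << ir);
        if (p->isr & bit)
            return -1;              /* blocked by an in-service request */
        if (p->irr & bit) {
            p->irr &= (uint8_t)~bit;
            p->isr |= bit;
            return p->vector_base + (int)ir;
        }
    }
    return -1;
}

/* Manual EOI: reset the in-service bit so lower priorities can proceed. */
static void pic_eoi(struct pic_model *p, unsigned ir)
{
    assert(ir < 8);
    p->isr &= (uint8_t)~(1u << ir);
}
```

With IR1 masked and an illustrative base vector of 0x20, raising IR3 and IR4 and calling pic_acknowledge() yields vector 0x23; IR4 then waits in the IRR until pic_eoi() resets the ISR bit of IR3.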
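IRQ2 / IRQ9 Redirection
Why is IRQ2 redirected to IRQ9? The reason is compatibility.
The IBM PC/AT added a second 8259A in cascade, which makes 7 more IRQs available. The original 8259A is called the Master PIC and the new one the Slave PIC.
Because the CPU has only one interrupt line, the Slave PIC has to be cascaded onto the Master PIC, where it occupies IRQ2. Devices that used IRQ2 on the IBM PC/XT could therefore no longer use it.
To solve this problem, the designers picked IRQ9 on the Slave PIC and required software to redirect the old IRQ2 to IRQ9, which means the ISR routine for IRQ9 must call the ISR routine for IRQ2.
A device formerly wired to IRQ2 is now wired to IRQ9, and software only needs an added IRQ9 ISR to stay compatible with existing systems. At the time this IRQ9 ISR was supplied by the BIOS, which guaranteed compatibility at the root.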
I/O Port & Address
/* arch/i386/mach-generic/io_ports.h
 * Machine specific IO port address definition for generic. */

/* i8259A PIC registers */
#define PIC_MASTER_CMD   0x20
#define PIC_MASTER_IMR   0x21
#define PIC_MASTER_ISR   PIC_MASTER_CMD
#define PIC_MASTER_POLL  PIC_MASTER_ISR
#define PIC_MASTER_OCW3  PIC_MASTER_ISR
#define PIC_SLAVE_CMD    0xa0
#define PIC_SLAVE_IMR    0xa1

/* i8259A PIC related value */
#define PIC_CASCADE_IR       2
#define MASTER_ICW4_DEFAULT  0x01
#define SLAVE_ICW4_DEFAULT   0x01
#define PIC_ICW4_AEOI        2

Each 8259A chip has 2 I/O ports through which it is controlled
    Master 8259A: 0x20, 0x21
    Slave 8259A: 0xA0, 0xA1
Two kinds of commands can be written to the 8259A
    Initialization Command Word (ICW): initializes the 8259A chip
    Operation Command Word (OCW): issues commands to the 8259A to control it
Linux 8259A Interrupt Handler
/* linux-2.6.14.1/arch/i386/kernel/i8259.c */
static struct hw_interrupt_type i8259A_irq_type = {
.typename = "XT-PIC",
.startup = startup_8259A_irq,
.shutdown = shutdown_8259A_irq,
.enable = enable_8259A_irq,
.disable = disable_8259A_irq,
.ack = mask_and_ack_8259A,
.end = end_8259A_irq,
};

/* This contains the irq mask for both 8259A irq controllers, */
unsigned int cached_irq_mask = 0xffff;
startup_8259A_irq and shutdown_8259A_irq (arch/i386/kernel/i8259.c)
unsigned int startup_8259A_irq(unsigned int irq)
{
    enable_8259A_irq(irq);
    return 0;
}

#define shutdown_8259A_irq disable_8259A_irq
enable_8259A_irq (arch/i386/kernel/i8259.c)
void enable_8259A_irq(unsigned int irq)
{
    unsigned int mask = ~(1 << irq);   // low 16 bits are 11101111 11111111b if irq = 12
    unsigned long flags;

    spin_lock_irqsave(&i8259A_lock, flags);
    cached_irq_mask &= mask;           // 00110011 00111000b (original cached_irq_mask)
                                       // 11101111 11111111b (mask)
                                       // 00100011 00111000b (new cached_irq_mask)
    if (irq & 8)                       // whether irq >= 8
        outb(cached_slave_mask, PIC_SLAVE_IMR);
    else
        outb(cached_master_mask, PIC_MASTER_IMR);
    spin_unlock_irqrestore(&i8259A_lock, flags);
}
disable_8259A_irq (arch/i386/kernel/i8259.c)
void disable_8259A_irq(unsigned int irq)
{
    unsigned int mask = 1 << irq;
    unsigned long flags;

    spin_lock_irqsave(&i8259A_lock, flags);
    cached_irq_mask |= mask;
    if (irq & 8)
        outb(cached_slave_mask, PIC_SLAVE_IMR);
    else
        outb(cached_master_mask, PIC_MASTER_IMR);
    spin_unlock_irqrestore(&i8259A_lock, flags);
}
include/asm-i386/mach-default/io_ports.h
/* i8259A PIC registers */
#define PIC_MASTER_CMD   0x20
#define PIC_MASTER_IMR   0x21
#define PIC_MASTER_ISR   PIC_MASTER_CMD
#define PIC_MASTER_POLL  PIC_MASTER_ISR
#define PIC_MASTER_OCW3  PIC_MASTER_ISR
#define PIC_SLAVE_CMD    0xa0
#define PIC_SLAVE_IMR    0xa1
include/asm-i386/i8259.h
extern unsigned int cached_irq_mask;

#define __byte(x,y) (((unsigned char *) &(y))[x])
#define cached_master_mask (__byte(0, cached_irq_mask))
#define cached_slave_mask (__byte(1, cached_irq_mask))
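The mask arithmetic can be tried in user space. The sketch below is an assumption-laden model, not the kernel code: irq_mask stands in for cached_irq_mask, the model_* names are invented, and the two bytes are extracted with shifts instead of the __byte() pointer cast (which relies on x86 being little-endian). No port I/O is performed; each function only returns the byte that would be written to the IMR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for cached_irq_mask: bit n set = IRQ n masked. */
static uint16_t irq_mask = 0xffff;

/* Byte-order-independent equivalents of cached_master_mask / cached_slave_mask. */
static uint8_t master_mask(void) { return (uint8_t)(irq_mask & 0xff); }
static uint8_t slave_mask(void)  { return (uint8_t)(irq_mask >> 8); }

/* Clear the bit to unmask the line, as enable_8259A_irq() does.
 * Returns the byte that would go to the IMR of the affected chip. */
static uint8_t model_enable(unsigned irq)
{
    assert(irq < 16);
    irq_mask &= (uint16_t)~(1u << irq);
    return (irq & 8) ? slave_mask() : master_mask();
}

/* Set the bit to mask the line, as disable_8259A_irq() does. */
static uint8_t model_disable(unsigned irq)
{
    assert(irq < 16);
    irq_mask |= (uint16_t)(1u << irq);
    return (irq & 8) ? slave_mask() : master_mask();
}
```

Starting from the example mask 0x3338 (00110011 00111000b), model_enable(12) leaves 0x2338 and returns the slave byte 0x23.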
/*
 * Not all IRQs can be routed through the IO-APIC, eg. on certain (older)
 * boards the timer interrupt is not really connected to any IO-APIC pin,
 * it's fed to the master 8259A's IR0 line only.
 *
 * Any '1' bit in this mask means the IRQ is routed through the IO-APIC.
 * this 'mixed mode' IRQ handling costs nothing because it's only used
 * at IRQ setup time.
 */
void disable_8259A_irq(unsigned int irq)
{
    unsigned int mask = 1 << irq;
    unsigned long flags;

    // make operations on the master & slave 8259A mutually exclusive
    // (for SMP systems?)
    spin_lock_irqsave(&i8259A_lock, flags);
    // set the corresponding bit to 1 to disable this IRQ line
    cached_irq_mask |= mask;
    // check whether irq >= 8
    if (irq & 8)
        // store slave IRQ mask
        outb(cached_slave_mask, PIC_SLAVE_IMR);
    else
        // store master IRQ mask
        outb(cached_master_mask, PIC_MASTER_IMR);
    spin_unlock_irqrestore(&i8259A_lock, flags);
}
static void mask_and_ack_8259A(unsigned int irq)   // send EOI to the PIC to signal the end of interrupt service
{
    unsigned int irqmask = 1 << irq;
    unsigned long flags;

    spin_lock_irqsave(&i8259A_lock, flags);
    if (cached_irq_mask & irqmask)   // is the given IRQ line already masked?
        // the 8259A raised this interrupt to the CPU even though the
        // corresponding IMR bit is 1, so it is a spurious interrupt
        goto spurious_8259A_irq;
    cached_irq_mask |= irqmask;

handle_real_irq:
    if (irq & 8) {   // slave
        inb(PIC_SLAVE_IMR);   /* DUMMY - (do we need this?) */
        // mask this IRQ line
        outb(cached_slave_mask, PIC_SLAVE_IMR);
        // write 0x60+(irq&7): 'Specific EOI' for slave IRQ (irq&7)
        outb(0x60+(irq&7), PIC_SLAVE_CMD);   /* 'Specific EOI' to slave */
        // then write 0x60+PIC_CASCADE_IR: 'Specific EOI' for master IRQ2
        outb(0x60+PIC_CASCADE_IR, PIC_MASTER_CMD);   /* 'Specific EOI' to master-IRQ2 */
    } else {   // master
        inb(PIC_MASTER_IMR);   /* DUMMY - (do we need this?) */
        outb(cached_master_mask, PIC_MASTER_IMR);
        outb(0x60+irq, PIC_MASTER_CMD);   /* 'Specific EOI to master */
    }
    spin_unlock_irqrestore(&i8259A_lock, flags);
    return;
spurious_8259A_irq:
    /*
     * this is the slow path - should happen rarely.
     */
    if (i8259A_irq_real(irq))
        /*
         * oops, the IRQ _is_ in service according to the
         * 8259A - not spurious, go handle it.
         */
        goto handle_real_irq;

    {
        static int spurious_irq_mask;
        /*
         * At this point we can be sure the IRQ is spurious,
         * lets ACK and report it. [once per IRQ]
         */
        if (!(spurious_irq_mask & irqmask)) {   // has this spurious IRQ already been reported?
            printk(KERN_DEBUG "spurious 8259A interrupt: IRQ%d.\n", irq);
            spurious_irq_mask |= irqmask;
        }
        atomic_inc(&irq_err_count);   // increment irq_err_count
        /*
         * Theoretically we do not have to handle this IRQ,
         * but in Linux this does not cause problems and is
         * simpler for us.
         */
        // in Linux, handling a spurious IRQ the same way as a real IRQ causes no problems
        goto handle_real_irq;
    }
}
/*
 * This function assumes to be called rarely. Switching between
 * 8259A registers is slow.
 * This has to be protected by the irq controller spinlock
 * before being called.
 */
static inline int i8259A_irq_real(unsigned int irq)
{
    int value;
    int irqmask = 1<<irq;

    if (irq < 8) {   // master
        // the IRR is selected by default, so write OCW3 = 0x0B to switch to the ISR
        outb(0x0B,PIC_MASTER_CMD);   /* ISR register */
        // is this interrupt really being serviced by the CPU?
        value = inb(PIC_MASTER_CMD) & irqmask;
        outb(0x0A,PIC_MASTER_CMD);   /* back to the IRR register */
        return value;
    }
    // slave
    outb(0x0B,PIC_SLAVE_CMD);   /* ISR register */
    value = inb(PIC_SLAVE_CMD) & (irqmask >> 8);
    outb(0x0A,PIC_SLAVE_CMD);   /* back to the IRR register */
    return value;
}
static void end_8259A_irq (unsigned int irq)
{
    // check whether the IRQ is disabled or in progress
    if (!(irq_desc[irq].status & (IRQ_DISABLED|IRQ_INPROGRESS)) &&
        irq_desc[irq].action)
        enable_8259A_irq(irq);
}
Interrupt Control Interface
Control Interfaces
Purpose: to allow disabling the interrupt system for the current CPU, or masking out an interrupt line for the entire machine
Disable/enable interrupts locally for the current processor:
    local_irq_disable();
    local_irq_enable();
    local_irq_save(flags);      // save the current state, then disable
    local_irq_restore(flags);   // restore the saved state
Control Interfaces (2)
Disable only a specific interrupt line for the entire system
    disable_irq(unsigned int irq);
        Waits until any currently executing handler completes
    disable_irq_nosync(unsigned int irq);
        Does not wait
    enable_irq(unsigned int irq);
        If disable_irq() is called twice, only the 2nd enable_irq() actually enables the interrupt line
    synchronize_irq(unsigned int irq);
        Waits for a specific IH to exit, if executing, before returning
Status checking
    irqs_disabled()
        returns nonzero if the interrupt system on the local CPU is disabled, or 0 otherwise
    in_interrupt()
        returns nonzero if the kernel is in interrupt context (including in an IH or BH)
        returns zero if the kernel is in process context
    in_irq()
        returns nonzero if the kernel is executing an interrupt handler
disable_irq_nosync (1/2)
<LINUX SRC>/kernel/irq/manage.c
void disable_irq_nosync(unsigned int irq)
{
    // get the IRQ descriptor we are going to disable
    irq_desc_t *desc = irq_desc + irq;
    unsigned long flags;
    // acquire lock
    spin_lock_irqsave(&desc->lock, flags);
disable_irq_nosync (2/2)
    // disable IRQ
    if (!desc->depth++) {
        desc->status |= IRQ_DISABLED;
        desc->handler->disable(irq);
    }
    // release lock
    spin_unlock_irqrestore(&desc->lock, flags);
}
disable_irq
<LINUX SRC>/kernel/irq/manage.c
void disable_irq(unsigned int irq)
{
    // get the IRQ descriptor we are going to disable
    irq_desc_t *desc = irq_desc + irq;
    disable_irq_nosync(irq);
    // let current IRQ handler finish
    if (desc->action)
        synchronize_irq(irq);
}
synchronize_irq
<LINUX SRC>/kernel/irq/manage.c
#ifdef CONFIG_SMP
void synchronize_irq(unsigned int irq)
{
    struct irq_desc *desc = irq_desc + irq;
    while (desc->status & IRQ_INPROGRESS)
        cpu_relax();
}
#endif

#ifndef CONFIG_SMP
# define synchronize_irq(irq) barrier()
#endif
enable_irq (1/4)
<LINUX SRC>/kernel/irq/manage.c
void enable_irq(unsigned int irq)
{
    // get the IRQ descriptor we are going to enable
    irq_desc_t *desc = irq_desc + irq;
    unsigned long flags;
    // acquire lock
    spin_lock_irqsave(&desc->lock, flags);
enable_irq (2/4)
    switch (desc->depth) {
    // cannot enable IRQ when its depth = 0
    case 0:
        WARN_ON(1);
        break;
enable_irq (3/4)
    case 1: {
        // clear IRQ_DISABLED bit in desc->status
        unsigned int status = desc->status & ~IRQ_DISABLED;

        desc->status = status;
        if ((status & (IRQ_PENDING | IRQ_REPLAY)) == IRQ_PENDING) {
            desc->status = status | IRQ_REPLAY;
            hw_resend_irq(desc->handler,irq);
        }
        desc->handler->enable(irq);
        /* fall-through */
    }
enable_irq (4/4)
    default:
        desc->depth--;
    }
    // release lock
    spin_unlock_irqrestore(&desc->lock, flags);
}
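The nesting rule for disable_irq()/enable_irq() (only the last enable_irq() really enables the line) comes from the depth counter. A minimal user-space sketch, assuming a hypothetical struct irq_model that keeps only the depth and a flag standing in for the controller's mask bit; locking, status bits and resend are left out:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature of an IRQ descriptor (illustration only). */
struct irq_model {
    unsigned depth;     /* number of outstanding disables */
    bool line_masked;   /* what the interrupt controller would see */
};

static void model_disable_irq(struct irq_model *d)
{
    assert(d);
    if (d->depth++ == 0)        /* only the first disable touches hardware */
        d->line_masked = true;
}

/* Returns false for an unbalanced enable (the kernel warns in that case). */
static bool model_enable_irq(struct irq_model *d)
{
    assert(d);
    switch (d->depth) {
    case 0:
        return false;
    case 1:
        d->line_masked = false; /* the last enable unmasks the line */
        /* fall through */
    default:
        d->depth--;
    }
    return true;
}
```

After two calls to model_disable_irq(), the first model_enable_irq() only lowers depth from 2 to 1 and the line stays masked; the second call unmasks it.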
hw_resend_irq
#ifdef CONFIG_X86_IO_APIC
static inline void hw_resend_irq(struct hw_interrupt_type *h, unsigned int i)
{
    if (IO_APIC_IRQ(i))
        // write io_apic_vector into APIC
        send_IPI_self(IO_APIC_VECTOR(i));
}
#endif

#ifndef CONFIG_X86_IO_APIC
static inline void hw_resend_irq(struct hw_interrupt_type *h, unsigned int i) {}
#endif
setup_irq (1/2)
int setup_irq(unsigned int irq, struct irqaction *new)
{
    struct irq_desc *desc = irq_desc + irq;
    struct irqaction *old, **p;
    int shared = 0;
    ...
    p = &desc->action;
    if ((old = *p) != NULL) {
        ...
        shared = 1;
    }
setup_irq (2/2)
    *p = new;
    if (!shared) {
        desc->depth = 0;
        desc->status &= ~(IRQ_DISABLED | IRQ_AUTODETECT |
                          IRQ_WAITING | IRQ_INPROGRESS);
        if (desc->handler->startup)
            desc->handler->startup(irq);
        else
            desc->handler->enable(irq);
    }
    ...
    return 0;
}
Mask/Unmask IRQs
local_irq_disable()
    #define local_irq_disable() __asm__ __volatile__("cli": : :"memory")
local_irq_enable()
    #define local_irq_enable() __asm__ __volatile__("sti": : :"memory")
RA: __volatile__
Review Slide
IH return value?
    IRQ_NONE, IRQ_HANDLED
When an IRQ line is shared, how does an IH ack the requesting device?
Interrupt context?
    Sleep? Stack?
I/O IH processing steps?
local_irq_disable(), local_irq_enable()?
disable_irq()? disable_irq_nosync()? irqs_disabled()? in_interrupt()?
Writing Interrupt Service Routine
Introduction
A typical declaration of an ISR
    static irqreturn_t intr_handler(int irq, void *dev_id, struct pt_regs *regs)
    irq: the IRQ line it is servicing
    dev_id: a generic pointer, the same dev_id that was given to request_irq()
    regs: processor registers prior to servicing the interrupt
Return value
    IRQ_NONE: the ISR detects an interrupt for which its device was not the originator
    IRQ_HANDLED: otherwise
At a minimum, most ISRs need to acknowledge to the device that they received the interrupt
When a line is shared by multiple ISRs, the kernel invokes each registered handler in sequence
    A HW device should have a status register its ISR can check
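Shared-line dispatch can be sketched in user space. Everything below is hypothetical and only mimics the shape of the kernel interface: struct model_dev plays a device whose status register says whether it raised the interrupt, and model_dispatch() walks the handler chain the way the kernel walks registered actions, OR-ing the return values.

```c
#include <assert.h>
#include <stddef.h>

enum model_irqreturn { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

/* Hypothetical device: status_pending mimics an interrupt-status register. */
struct model_dev {
    int status_pending;
    int serviced;
};

struct model_action {
    enum model_irqreturn (*handler)(int irq, void *dev_id);
    void *dev_id;               /* tells the handler which device to check */
    struct model_action *next;
};

/* An ISR on a shared line first checks whether its own device interrupted. */
static enum model_irqreturn model_isr(int irq, void *dev_id)
{
    struct model_dev *dev = dev_id;
    (void)irq;
    if (!dev->status_pending)
        return MODEL_IRQ_NONE;  /* not our device */
    dev->status_pending = 0;    /* acknowledge the device */
    dev->serviced++;
    return MODEL_IRQ_HANDLED;
}

/* Invoke every registered handler in sequence and combine the results. */
static int model_dispatch(int irq, struct model_action *action)
{
    int retval = 0;
    assert(action != NULL);
    do {
        retval |= action->handler(irq, action->dev_id);
        action = action->next;
    } while (action);
    return retval;
}
```

With two devices on one line and only the second one pending, the first handler returns MODEL_IRQ_NONE, the second acknowledges its device, and the combined result is MODEL_IRQ_HANDLED.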
Example: RTC Interrupt Service Routine
When the RTC driver loads, rtc_init() is invoked to initialize the driver
static int __init rtc_init(void)
{
    …
    if (request_irq(rtc_irq, rtc_interrupt, SA_INTERRUPT, "rtc", (void *)&rtc_port)) {
        printk(KERN_ERR "rtc: cannot register IRQ %d\n", rtc_irq);
        return -EIO;
    }
    …
}
rtc_interrupt runs with all interrupts disabled (SA_INTERRUPT)
rtc_irq = IRQ8 on PC
irqreturn_t rtc_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    // Can be an alarm interrupt, update complete interrupt, or a periodic interrupt.
    // We store the status in the low byte and the number of interrupts received since
    // the last read in the remainder of rtc_irq_data.
    spin_lock (&rtc_lock);
    rtc_irq_data += 0x100;
    rtc_irq_data &= ~0xff;
    if (is_hpet_enabled()) {
        rtc_irq_data |= (unsigned long)irq & 0xF0;
    } else {
        rtc_irq_data |= (CMOS_READ(RTC_INTR_FLAGS) & 0xF0);
    }
    if (rtc_status & RTC_TIMER_ON)
        mod_timer(&rtc_irq_timer, jiffies + HZ/rtc_freq + 2*HZ/100);
    spin_unlock (&rtc_lock);

    spin_lock(&rtc_task_lock);
    if (rtc_callback)
        rtc_callback->func(rtc_callback->private_data);
    spin_unlock(&rtc_task_lock);
    wake_up_interruptible(&rtc_wait);
    kill_fasync (&rtc_async_queue, SIGIO, POLL_IN);
    return IRQ_HANDLED;
}
Interrupt Context
When executing an interrupt handler or bottom half, the kernel is in interrupt context
Interrupt context cannot sleep
    Process context can
    This limits the functions that can be called from an interrupt handler
Interrupt context does not receive its own stack
    It shares the kernel stack of the process it interrupts
    If no process is running, it uses the idle task’s stack
Code trace: keyboard ISR (IRQ1)
Code trace: mouse ISR (IRQ12)
Mouse & Keyboard Interrupt Handler
魏淳航
/proc/interrupts
I8042
PS/2 mouse and keyboard controller
This microcontroller is hidden within the motherboard’s chipset, which integrates many microcontrollers in a single package.
Four 8-bit registers: status (read), control (write), input (write), output (read). They use I/O ports 0x60 and 0x64.
[Figure: 8042 chip with its registers SR (0x64), IR (0x60), OR (0x60), CR (0x64)]
I8042 Architecture
Initial Steps
1. Init i8042 driver
2. Set interface: Serio
3. Init mouse driver
4. Connect mouse to interface
5. Call request irq
6. Start mouse
int __init i8042_init(void)
{
    …
    // initialize controller
    i8042_aux_values.irq = I8042_AUX_IRQ;   // 12
    i8042_kbd_values.irq = I8042_KBD_IRQ;   // 1
    if (!i8042_noaux && !i8042_check_aux(&i8042_aux_values)) {
        // check if aux is available
        if (!i8042_nomux && !i8042_check_mux(&i8042_aux_values)) {
            // check if mux is available
            for (i = 0; i < 4; i++) {
                i8042_init_mux_values(i8042_mux_values + i, i8042_mux_port + i, i);
                i8042_port_register(i8042_mux_values + i, i8042_mux_port + i);
            }
        } else {
            i8042_port_register(&i8042_aux_values, &i8042_aux_port);
        }
    }
    i8042_port_register(&i8042_kbd_values, &i8042_kbd_port);
}
drivers/input/serio/i8042.c
Structure of SERIO
static struct i8042_values i8042_aux_values = {
    .irqen = I8042_CTR_AUXINT,     // 0x02
    .disable = I8042_CTR_AUXDIS,   // 0x20
    .name = "AUX",
    .mux = -1,
};

static struct serio i8042_aux_port =
{
    .type = SERIO_8042,
    .write = i8042_aux_write,
    .open = i8042_open,
    .close = i8042_close,
    .driver = &i8042_aux_values,
    .name = "i8042 Aux Port",
    .phys = I8042_AUX_PHYS_DESC,
};   // others are NULL

struct serio {
    void *private;
    void *driver;
    char *name;
    char *phys;
    unsigned short idbus;
    unsigned short idvendor;
    unsigned short idproduct;
    unsigned short idversion;
    unsigned long type;
    unsigned long event;
    int (*write)(struct serio *, unsigned char);
    int (*open)(struct serio *);
    void (*close)(struct serio *);
    struct serio_dev *dev;
    struct list_head node;
};
static int __init i8042_port_register(struct i8042_values *values, struct serio *port)
{
    values->exists = 1;
    i8042_ctr &= ~values->disable;
    if (i8042_command(&i8042_ctr, I8042_CMD_CTL_WCTR)) {   // enable mouse or keyboard
        printk(KERN_WARNING "i8042.c: Can't write CTR while registering.\n");
        values->exists = 0;
        return -1;
    }
    printk(KERN_INFO "serio: i8042 %s port at %#lx,%#lx irq %d\n",
           values->name, (unsigned long) I8042_DATA_REG,
           (unsigned long) I8042_COMMAND_REG, values->irq);
    serio_register_port(port);
    return 0;
}
Add to serio_list
void __serio_register_port(struct serio *serio)
{
    list_add_tail(&serio->node, &serio_list);
    serio_find_dev(serio);
}

static void serio_find_dev(struct serio *serio)
{
    struct serio_dev *dev;

    list_for_each_entry(dev, &serio_dev_list, node) {
        if (serio->dev)
            break;
        if (dev->connect)
            dev->connect(serio, dev);
    }
}
Initial Steps
1. Init i8042 driver
2. Set interface: Serio
3. Init mouse driver
4. Connect mouse to interface
5. Call request irq
6. Start mouse
drivers/input/mouse/psmouse-base.c
int __init psmouse_init(void)
{
    psmouse_parse_proto();
    serio_register_device(&psmouse_dev);
    return 0;
}

void serio_register_device(struct serio_dev *dev)
{
    struct serio *serio;
    down(&serio_sem);
    list_add_tail(&dev->node, &serio_dev_list);
    list_for_each_entry(serio, &serio_list, node)
        if (!serio->dev && dev->connect)
            dev->connect(serio, dev);
    up(&serio_sem);
}

static struct serio_dev psmouse_dev = {
    .interrupt = psmouse_interrupt,
    .connect = psmouse_connect,
    .reconnect = psmouse_reconnect,
    .disconnect = psmouse_disconnect,
    .cleanup = psmouse_cleanup,
};
psmouse_connect()
static void psmouse_connect(struct serio *serio, struct serio_dev *dev)
{
    ...
    if (serio->type != SERIO_8042)   // check if serio type is SERIO_8042
        return;
    if (serio_open(serio, dev)) {    // request irq
        kfree(psmouse);
        serio->private = NULL;
        return;
    }
    if (psmouse_probe(psmouse) < 0) {   // handshake: get ack from mouse and device ID (0x00)
        serio_close(serio);
        kfree(psmouse);
        serio->private = NULL;
        return;
    }
    psmouse->protocol_handler = psmouse_process_byte;   // mouse event handler
    psmouse_activate(psmouse);   // reset counter of mouse and enable it
}
serio_open(): request irq
int serio_open(struct serio *serio, struct serio_dev *dev)
{
    serio->dev = dev;
    if (serio->open && serio->open(serio)) {
        serio->dev = NULL;
        return -1;
    }
    return 0;
}

static int i8042_open(struct serio *port)
{
    struct i8042_values *values = port->driver;
    if (request_irq(values->irq, i8042_interrupt, SA_SHIRQ, "i8042", i8042_request_irq_cookie)) {
        goto irq_fail;
    }
    …
}

static struct serio i8042_aux_port =
{
    .type = SERIO_8042,
    .write = i8042_aux_write,
    .open = i8042_open,
    .close = i8042_close,
    .driver = &i8042_aux_values,
    .name = "i8042 Aux Port",
    .phys = I8042_AUX_PHYS_DESC,
};   // others are NULL
Mouse Interrupt Handler
1. i8042_interrupt(): get data and flags from 8042
2. psmouse_interrupt()
3. psmouse_process_byte(): handle the packets
i8042_interrupt
static irqreturn_t i8042_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    unsigned int dfl;
    …
    spin_lock_irqsave(&i8042_lock, flags);
    str = i8042_read_status();
    // if the 8042 output buffer has data, read it and save it to "data"
    if (str & I8042_STR_OBF)
        data = i8042_read_data();
    spin_unlock_irqrestore(&i8042_lock, flags);

    // set flags from the 8042 status
    dfl = ((str & I8042_STR_PARITY) ? SERIO_PARITY : 0) |
          ((str & I8042_STR_TIMEOUT) ? SERIO_TIMEOUT : 0);
    … (next page)
i8042_interrupt (continued)
    // check the status register: if the data is of AUX type, call the mouse interrupt
    if (i8042_aux_values.exists && (str & I8042_STR_AUXDATA)) {
        serio_interrupt(&i8042_aux_port, data, dfl, regs);
        goto irq_ret;
    }
    if (!i8042_kbd_values.exists)
        goto irq_ret;
    // else: call the keyboard interrupt
    serio_interrupt(&i8042_kbd_port, data, dfl, regs);
irq_ret:
    ret = 1;
}
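The routing decision made from the status register can be isolated in a small user-space sketch. The MODEL_* bit values below mirror the driver's I8042_STR_* and SERIO_* constants as commonly documented, but they are assumptions here, as are all model_* names. Returning MODEL_NONE when the output buffer is empty is also an assumption, since the slides elide that path.

```c
#include <assert.h>

/* Assumed status-register bits (stand-ins for I8042_STR_*). */
#define MODEL_STR_OBF      0x01   /* output buffer full: data available */
#define MODEL_STR_AUXDATA  0x20   /* data came from the AUX (mouse) port */
#define MODEL_STR_TIMEOUT  0x40
#define MODEL_STR_PARITY   0x80

/* Assumed serio flag values (stand-ins for SERIO_TIMEOUT / SERIO_PARITY). */
#define MODEL_SERIO_TIMEOUT 1
#define MODEL_SERIO_PARITY  2

enum model_target { MODEL_NONE, MODEL_MOUSE, MODEL_KEYBOARD };

/* Translate status bits into the flags handed to serio_interrupt(). */
static unsigned model_flags(unsigned char str)
{
    return ((str & MODEL_STR_PARITY) ? MODEL_SERIO_PARITY : 0) |
           ((str & MODEL_STR_TIMEOUT) ? MODEL_SERIO_TIMEOUT : 0);
}

/* Decide which port's handler receives the byte. */
static enum model_target model_route(unsigned char str, int aux_exists, int kbd_exists)
{
    if (!(str & MODEL_STR_OBF))
        return MODEL_NONE;              /* assumption: nothing to deliver */
    if (aux_exists && (str & MODEL_STR_AUXDATA))
        return MODEL_MOUSE;
    if (!kbd_exists)
        return MODEL_NONE;
    return MODEL_KEYBOARD;
}
```

Under these assumptions a status byte of 0x21 with both ports registered routes to the mouse handler, while 0x01 routes to the keyboard handler.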
serio_interrupt
irqreturn_t serio_interrupt(struct serio *serio, unsigned char data, unsigned int flags, struct pt_regs *regs)
{
    …
    if (serio->dev && serio->dev->interrupt)
        ret = serio->dev->interrupt(serio, data, flags, regs);
    …
    return ret;
}

static struct serio_dev psmouse_dev = {
    .interrupt = psmouse_interrupt,
    .connect = psmouse_connect,
    .reconnect = psmouse_reconnect,
    .disconnect = psmouse_disconnect,
    .cleanup = psmouse_cleanup,
};
Mouse Data Packets
The standard PS/2 mouse sends movement (and button) information to the host using the following 3-byte packet (4)
Byte 2 (3) is the amount of movement that has occurred since the last movement data packet was sent to the host.
psmouse_interrupt
static irqreturn_t psmouse_interrupt(struct serio *serio, unsigned char data, unsigned int flags, struct pt_regs *regs)
{
    // check flags
    // check mouse state
    …
    if (psmouse->state == PSMOUSE_ACTIVATED && psmouse->pktcnt &&
        time_after(jiffies, psmouse->last + HZ/2)) {
        printk(KERN_WARNING "psmouse.c: %s at %s lost synchronization, throwing %d bytes away.\n",
               psmouse->name, psmouse->phys, psmouse->pktcnt);
        psmouse->pktcnt = 0;
    }
    psmouse->last = jiffies;
    psmouse->packet[psmouse->pktcnt++] = data;
    rc = psmouse->protocol_handler(psmouse, regs);
    …
    return IRQ_HANDLED;
}
psmouse_process_byte()
static psmouse_ret_t psmouse_process_byte(struct psmouse *psmouse, struct pt_regs *regs)
{
    struct input_dev *dev = &psmouse->dev;
    unsigned char *packet = psmouse->packet;

    if (psmouse->pktcnt < 3 + (psmouse->type >= PSMOUSE_GENPS))
        return PSMOUSE_GOOD_DATA;
psmouse_process_byte() (continued)
    input_report_key(dev, BTN_LEFT, packet[0] & 1);
    input_report_key(dev, BTN_MIDDLE, (packet[0] >> 2) & 1);
    input_report_key(dev, BTN_RIGHT, (packet[0] >> 1) & 1);

    input_report_rel(dev, REL_X, packet[1] ? (int) packet[1] - (int) ((packet[0] << 4) & 0x100) : 0);
    input_report_rel(dev, REL_Y, packet[2] ? (int) ((packet[0] << 3) & 0x100) - (int) packet[2] : 0);
    return PSMOUSE_FULL_PACKET;
}
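The bit arithmetic in psmouse_process_byte() can be checked outside the kernel. The sketch below copies the expressions for buttons and relative motion into a hypothetical ps2_decode() function; struct ps2_report and the function name are invented for illustration, and only the basic 3-byte packet is handled.

```c
#include <assert.h>

/* Hypothetical decoded form of one 3-byte PS/2 packet. */
struct ps2_report {
    int left, middle, right;   /* button states, 0 or 1 */
    int dx, dy;                /* relative motion as reported to the input layer */
};

static struct ps2_report ps2_decode(const unsigned char packet[3])
{
    struct ps2_report r;

    assert(packet);
    r.left   = packet[0] & 1;
    r.right  = (packet[0] >> 1) & 1;
    r.middle = (packet[0] >> 2) & 1;
    /* Bit 4 of byte 0 is the X sign: shifted to bit 8 it subtracts 256. */
    r.dx = packet[1] ? (int)packet[1] - (int)((packet[0] << 4) & 0x100) : 0;
    /* Bit 5 of byte 0 is the Y sign; the result is negated because the
     * driver reports Y growing in the opposite direction to the device. */
    r.dy = packet[2] ? (int)((packet[0] << 3) & 0x100) - (int)packet[2] : 0;
    return r;
}
```

For the packet {0x18, 0xFB, 0x00} the X sign bit is set, so dx is 251 - 256 = -5 and dy is 0.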
static inline void input_report_key(struct input_dev *dev, unsigned int code, int value)
{
    input_event(dev, EV_KEY, code, !!value);
}

static inline void input_report_rel(struct input_dev *dev, unsigned int code, int value)
{
    input_event(dev, EV_REL, code, value);
}
input_event(): choose a handler from dev->h_list to handle the event
Linux kernel - 2.6.14
1. Init i8042 driver
2. Set interface: Serio
3. Init keyboard driver
4. Connect keyboard to interface
5. Call request irq
6. Start keyboard interrupt
drivers/input/keyboard/atkbd.c
int __init atkbd_init(void)
{
    serio_register_device(&atkbd_dev);
    return 0;
}

void serio_register_device(struct serio_dev *dev)
{
    struct serio *serio;
    down(&serio_sem);
    list_add_tail(&dev->node, &serio_dev_list);
    list_for_each_entry(serio, &serio_list, node)
        if (!serio->dev && dev->connect)
            dev->connect(serio, dev);
    up(&serio_sem);
}

static struct serio_dev atkbd_dev = {
    .interrupt = atkbd_interrupt,
    .connect = atkbd_connect,
    .reconnect = atkbd_reconnect,
    .disconnect = atkbd_disconnect,
    .cleanup = atkbd_cleanup,
};
Start Keyboard Interrupt
static irqreturn_t atkbd_interrupt(struct serio *serio, unsigned char data, unsigned int flags, struct pt_regs *regs)
{
    // check flags
    // check keyboard state
    …
    unsigned int code = data;
    ……
    value = atkbd->release ? 0 :
        (1 + (!atkbd_softrepeat && test_bit(atkbd->keycode[code], atkbd->dev.key)));
    ……
    atkbd_report_key(&atkbd->dev, regs, atkbd->keycode[code], value);
}
static void atkbd_report_key(struct input_dev *dev, struct pt_regs *regs, int code, int value)
{
    …..
    input_event(dev, EV_KEY, code, value);
    ……
}
Bottom Half and Deferring Work
Why Bottom Half?
IHs (top halves) have the following properties (requirements)
    IHs need to run as quickly as possible
    IHs run with some (or all) interrupt levels disabled
    IHs are often time-critical and deal with HW
    IHs do not run in process context and cannot block
No hard and fast rules exist about what work to perform where
    Research work needed
Bottom halves defer work until later
    “Later” is often simply “not now”
    Often, bottom halves run immediately after the interrupt returns
    They run with all interrupts enabled
A World of Bottom Halves
Multiple mechanisms are available for implementing a bottom half
    softirq, tasklet, work queues
softirq: (available since 2.3)
    A set of 32 statically defined bottom halves that can run simultaneously on any processor
        Even 2 of the same type can run concurrently
    Used when performance is critical
    Must be registered statically at compile-time
tasklet: (available since 2.3)
    Built on top of softirqs
    Two different tasklets can run simultaneously on different processors
        But 2 of the same type cannot run simultaneously
    Used most of the time for its ease and flexibility
    Code can dynamically register tasklets
work queues: (available since 2.5)
    Queue work to be performed later in process context
Softirqs
Softirqs are rarely used
    tasklets are used most of the time
Statically allocated at compile-time
Related code: kernel/softirq.c
struct softirq_action
{
    void (*action)(struct softirq_action *);   // function to run
    void *data;                                // data to pass to function
};
static struct softirq_action softirq_vec[32];
In 2.6.7 kernel, only 6 softirqs are used
enum
{
    HI_SOFTIRQ=0,
    TIMER_SOFTIRQ,     // [code trace]
    NET_TX_SOFTIRQ,
    NET_RX_SOFTIRQ,
    SCSI_SOFTIRQ,
    TASKLET_SOFTIRQ
};
The Softirq Handler
The prototype of a softirq handler:
    void softirq_handler(struct softirq_action *)
Example:
    my_softirq = &softirq_vec[0];
    my_softirq->action(my_softirq);
    Passing the whole structure lets softirq_action change in the future without requiring a change to every softirq handler
A softirq never preempts another softirq
    It can only be preempted by an interrupt handler
    Another softirq (even one of the same type) can run simultaneously on another processor
Executing Softirqs
A softirq must be raised before it is executed
    At a suitable later time, pending softirqs run
Pending softirqs are checked for and executed in the following places:
    After processing a HW interrupt
    By the ksoftirqd kernel thread
    By code that explicitly checks for and executes pending softirqs (e.g. the networking subsystem)
    They all call do_softirq() to execute softirqs
Saving Registers for Exception Handler
struct pt_regs {
    long ebx;
    long ecx;
    long edx;
    long esi;
    long edi;
    long ebp;
    long eax;
    int xds;
    int xes;
    long orig_eax;
    long eip;
    int xcs;
    long eflags;
    long esp;
    int xss;
};

IRQn_interrupt:
    pushl $n-256
    jmp common_interrupt

common_interrupt:
    SAVE_ALL
    call do_IRQ
    jmp ret_from_intr

SAVE_ALL:
    cld
    push %es
    push %ds
    pushl %eax
    pushl %ebp
    pushl %edi
    pushl %esi
    pushl %edx
    pushl %ecx
    pushl %ebx
    movl $__KERNEL_DS, %edx
    movl %edx, %ds
    movl %edx, %es

[Figure: Kernel Mode stack after SAVE_ALL, from higher to lower addresses: xss, esp, eflags, xcs, eip, orig_eax, xes, xds, eax, ebp, edi, esi, edx, ecx, ebx; ESP points at ebx]
asmlinkage unsigned int do_IRQ(struct pt_regs regs)
{
    int irq = regs.orig_eax & 0xff;   /* high bits used in ret_from_ code */
    irq_desc_t *desc = irq_desc + irq;
    struct irqaction * action;
    unsigned int status;

    irq_enter();
    kstat_this_cpu.irqs[irq]++;
    spin_lock(&desc->lock);
    desc->handler->ack(irq);
    status = desc->status & ~(IRQ_REPLAY | IRQ_WAITING);
    status |= IRQ_PENDING;   /* we _want_ to handle it */
    for (;;) {
        irqreturn_t action_ret;
        spin_unlock(&desc->lock);
        …
        action_ret = handle_IRQ_event(irq, &regs, action);
        …
        spin_lock(&desc->lock);
        desc->status &= ~IRQ_PENDING;
    }
    desc->status &= ~IRQ_INPROGRESS;
out:
    desc->handler->end(irq);
    spin_unlock(&desc->lock);
    irq_exit();
    return 1;
}

asmlinkage int handle_IRQ_event(unsigned int irq, struct pt_regs *regs, struct irqaction *action)
{
    int status = 1;   /* Force the "do bottom halves" bit */
    int retval = 0;

    if (!(action->flags & SA_INTERRUPT))
        local_irq_enable();
    do {
        status |= action->flags;
        retval |= action->handler(irq, action->dev_id, regs);
        action = action->next;
    } while (action);
    if (status & SA_SAMPLE_RANDOM)
        add_interrupt_randomness(irq);
    local_irq_disable();
    return retval;
}
#define irq_exit() \
do { \
    preempt_count() -= IRQ_EXIT_OFFSET; \
    if (!in_interrupt() && softirq_pending(smp_processor_id())) \
        do_softirq(); \
    preempt_enable_no_resched(); \
} while (0)

static inline int netif_rx_ni(struct sk_buff *skb)
{
    int err = netif_rx(skb);
    if (softirq_pending(smp_processor_id()))
        do_softirq();
    return err;
}

static int ksoftirqd(void * __bind_cpu)
{
    current->flags |= PF_NOFREEZE;
    set_current_state(TASK_INTERRUPTIBLE);
    ….
        do_softirq();
    }
    __set_current_state(TASK_RUNNING);
    return 0;
    …
}

asmlinkage void do_softirq(void)
{
    unsigned long flags;
    struct thread_info *curctx;
    union irq_ctx *irqctx;
    u32 *isp;

    if (in_interrupt())
        return;
    local_irq_save(flags);
    if (local_softirq_pending()) {
        curctx = current_thread_info();
        irqctx = softirq_ctx[smp_processor_id()];
        irqctx->tinfo.task = curctx->task;
        irqctx->tinfo.previous_esp = current_stack_pointer();

        /* build the stack frame on the softirq stack */
        isp = (u32*) ((char*)irqctx + sizeof(*irqctx));
        asm volatile(
            " xchgl %%ebx,%%esp \n"
            " call __do_softirq \n"
            " movl %%ebx,%%esp \n"
            : "=b"(isp)
            : "0"(isp)
            : "memory", "cc", "edx", "ecx", "eax"
        );
    }
    local_irq_restore(flags);
}
do_softirq()
游家慶
do_softirq()
Finish the jobs deferred to bottom halves in ISRs
1. Get pending list from current CPU’s irq_stat[cpu].member
2. Invoke __do_softirq() if there are some pending jobs
3. Restore local irq and leave do_softirq()
__do_softirq() (1/2)
Finish the jobs deferred to bottom halves in ISRs
1. Get pending list from current CPU’s irq_stat[cpu].member
2. Disable bottom half
3. Clear irq_stat[cpu].member
4. Enable irq
5. Carry out pending jobs until all jobs are done
__do_softirq() (2/2)
6. Disable irq
7. Get pending list from current CPU’s irq_stat[cpu].member
   (steps 3 to 7 can be repeated as necessary, up to 10 times; the limit is set in MAX_SOFTIRQ_RESTART)
8. Defer the remaining pending jobs if kernel thread should stop, invoke another do_softirq() otherwise.
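The restart limit in steps 3 to 7 can be seen in a user-space sketch. This is a hypothetical model, not the kernel routine: model_pending stands in for the per-CPU pending mask, interrupt enabling and disabling are omitted, and the hand-off in step 8 is reduced to setting the model_deferred flag. The two sample handlers exist only to exercise the loop.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAX_RESTART 10   /* mirrors the MAX_SOFTIRQ_RESTART limit */
#define MODEL_NR_SOFTIRQS 32

typedef void (*model_handler)(void);

static model_handler model_vec[MODEL_NR_SOFTIRQS];
static uint32_t model_pending;  /* stand-in for the per-CPU pending mask */
static int model_deferred;      /* set when leftover work is handed off */
static int model_runs;          /* counts handler invocations */

static void model_raise(unsigned nr)
{
    assert(nr < MODEL_NR_SOFTIRQS);
    model_pending |= (uint32_t)1 << nr;
}

/* Sample handlers: one finishes, one keeps re-raising itself. */
static void model_once(void)         { model_runs++; }
static void model_self_raising(void) { model_runs++; model_raise(3); }

static void model_do_softirq(void)
{
    int restart = MODEL_MAX_RESTART;
    uint32_t pending = model_pending;       /* step 1: snapshot */

    model_deferred = 0;
    while (pending) {
        model_pending = 0;                  /* step 3: clear before running */
        for (unsigned nr = 0; pending; nr++, pending >>= 1)
            if ((pending & 1) && model_vec[nr])
                model_vec[nr]();            /* step 5: lowest index first */
        pending = model_pending;            /* step 7: anything re-raised? */
        if (pending && --restart == 0) {
            model_deferred = 1;             /* step 8: stop looping here */
            break;
        }
    }
}
```

A handler that re-raises itself on every run is executed 10 times in one call to model_do_softirq(); after that the model stops with the bit still pending and model_deferred set.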
When to invoke do_softirq()?
The local_bh_enable macro re-enables the softirqs
do_IRQ() finishes handling an I/O interrupt
smp_apic_timer_interrupt() finishes handling a local timer interrupt
One of the special ksoftirqd_CPUn kernel threads is awoken
A packet is received on a network card

Using Softirqs (1/2)
Currently, only the networking and SCSI subsystems directly use softirqs
Kernel timers and tasklets are built on top of softirqs
Index assignment:
  Before using a softirq, you must declare its index at compile time via the enum in slide-64
  Softirqs with a lower numerical index (higher priority) execute first
Register handler:
  A softirq handler is registered at run time via open_softirq()

void open_softirq(int nr, void (*action)(struct softirq_action*), void *data)
{
    softirq_vec[nr].data = data;
    softirq_vec[nr].action = action;
}

Using Softirqs (2/2)
Softirqs run with interrupts enabled and cannot sleep
  When a handler runs, softirqs on the current processor are disabled
  Another CPU can still execute softirqs
  Proper locking is therefore needed in softirq handlers
  As a result, most softirq handlers resort to per-processor data
Raising a softirq
  Call raise_softirq(NET_TX_SOFTIRQ), for example
  Softirqs are often raised from within interrupt handlers
  When done processing interrupts, the kernel invokes do_softirq()
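
The register-then-raise pattern can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the sim_ functions mimic the shape of open_softirq() and raise_softirq() but are not the kernel implementations, and the index names SIM_HI, SIM_NET_TX and SIM_TASKLET are hypothetical stand-ins for the real enum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical indices; a lower index runs first. */
enum { SIM_HI, SIM_NET_TX, SIM_TASKLET, SIM_NR };

struct sim_softirq_action {
    void (*action)(struct sim_softirq_action *);
    void *data;
};

static struct sim_softirq_action sim_vec[SIM_NR]; /* models softirq_vec[] */
static uint32_t sim_pending_mask;                 /* models the per-CPU pending bits */

/* Register a handler for index nr, in the style of open_softirq(). */
static void sim_open_softirq(int nr,
                             void (*action)(struct sim_softirq_action *),
                             void *data)
{
    assert(nr >= 0 && nr < SIM_NR);
    sim_vec[nr].data = data;
    sim_vec[nr].action = action;
}

/* Mark index nr pending, in the style of raise_softirq(). */
static void sim_raise_softirq(int nr)
{
    assert(nr >= 0 && nr < SIM_NR);
    sim_pending_mask |= (uint32_t)1 << nr;
}

/* Run every pending handler, lowest index first; returns how many ran. */
static int sim_run_pending(void)
{
    uint32_t snapshot = sim_pending_mask;
    int ran = 0;

    sim_pending_mask = 0;
    for (int nr = 0; nr < SIM_NR; nr++) {
        if ((snapshot & ((uint32_t)1 << nr)) && sim_vec[nr].action) {
            sim_vec[nr].action(&sim_vec[nr]);
            ran++;
        }
    }
    return ran;
}

/* Example handler (hypothetical): appends the tag passed via data to a log. */
static char sim_log[8];
static size_t sim_log_len;

static void sim_log_handler(struct sim_softirq_action *a)
{
    if (sim_log_len + 1 < sizeof sim_log)
        sim_log[sim_log_len++] = *(const char *)a->data;
}
```

Raising SIM_TASKLET before SIM_HI still runs the SIM_HI handler first, because execution order follows the index and not the order in which the softirqs were raised.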

Review Slide
Why bottom halves?
Available BH mechanisms?
  softirqs, tasklets, work queues
In 2.6.7, # of used softirqs?
When and where are pending softirqs checked and executed?
do_softirq()? open_softirq()? raise_softirq()?
HW#5: Study the usage of preempt_count()
  Deadline: 03/27 (mail your report to TA)
  No class on 03/27
  In-class presentation on 04/10 by 林凱立
  Sample solution

Tasklets Usage

Tasklet Implementation
Tasklets are implemented on top of softirqs
  HI_SOFTIRQ, TASKLET_SOFTIRQ
  The former runs prior to the latter

struct tasklet_struct
{
    struct tasklet_struct *next;  // next tasklet in the list
    unsigned long state;          // state of the tasklet
    atomic_t count;               // reference counter: 0 == enabled, !0 == disabled
    void (*func)(unsigned long);  // handler function
    unsigned long data;           // args to handler function
};

enum
{
    TASKLET_STATE_SCHED,  /* Tasklet is scheduled for execution */
    TASKLET_STATE_RUN     /* Tasklet is running (SMP only) */
};

#define DECLARE_TASKLET(name, func, data) \
    struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(0), func, data }

#define DECLARE_TASKLET_DISABLED(name, func, data) \
    struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data }

Scheduling Tasklets
Scheduled tasklets (or raised softirqs) are stored in 2 per-processor structures
  tasklet_vec (regular tasklets)
  tasklet_hi_vec (high-priority tasklets)
Tasklets are scheduled via tasklet_schedule() and tasklet_hi_schedule()

static inline void tasklet_schedule(struct tasklet_struct *t)
{
    if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state))
        __tasklet_schedule(t);
}

void fastcall __tasklet_schedule(struct tasklet_struct *t)
{
    unsigned long flags;

    local_irq_save(flags);
    t->next = __get_cpu_var(tasklet_vec).list;
    __get_cpu_var(tasklet_vec).list = t;
    raise_softirq_irqoff(TASKLET_SOFTIRQ);
    local_irq_restore(flags);
}

Execute Tasklets
void __init softirq_init(void)
{
    open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL);
    open_softirq(HI_SOFTIRQ, tasklet_hi_action, NULL);
}

static void tasklet_action(struct softirq_action *a)
{
    struct tasklet_struct *list;

    local_irq_disable();
    list = __get_cpu_var(tasklet_vec).list;
    __get_cpu_var(tasklet_vec).list = NULL;
    local_irq_enable();

    while (list) {
        struct tasklet_struct *t = list;
        list = list->next;
        if (tasklet_trylock(t)) {
            if (!atomic_read(&t->count)) {
                if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
                    BUG();
                t->func(t->data);
                tasklet_unlock(t);
                continue;
            }
            tasklet_unlock(t);
        }

        local_irq_disable();
        t->next = __get_cpu_var(tasklet_vec).list;
        __get_cpu_var(tasklet_vec).list = t;
        __raise_softirq_irqoff(TASKLET_SOFTIRQ);
        local_irq_enable();
    }
}
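
The interplay of tasklet_schedule() and tasklet_action() can be modelled in user space. The sketch below is a single-threaded simplification, assuming plain integers where the kernel uses atomic bit operations and per-CPU lists; every sim_ name is hypothetical. It keeps the two behaviours the code above implements: a tasklet already marked as scheduled is not queued twice, and a disabled tasklet (nonzero count) is put back on the list instead of being run.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_STATE_SCHED 0x1UL  /* stands in for the TASKLET_STATE_SCHED bit */

struct sim_tasklet {
    struct sim_tasklet *next;
    unsigned long state;
    int count;                      /* 0 == enabled, nonzero == disabled */
    void (*func)(unsigned long);
    unsigned long data;
};

static struct sim_tasklet *sim_list;  /* stands in for the per-CPU tasklet_vec list */

/* Queue t unless it is already scheduled. */
static void sim_tasklet_schedule(struct sim_tasklet *t)
{
    assert(t != NULL);
    if (t->state & SIM_STATE_SCHED)
        return;                     /* already queued: scheduling again is a no-op */
    t->state |= SIM_STATE_SCHED;
    t->next = sim_list;
    sim_list = t;
}

/* Detach the list and run every enabled tasklet; returns how many ran. */
static int sim_tasklet_action(void)
{
    struct sim_tasklet *list = sim_list;
    int ran = 0;

    sim_list = NULL;
    while (list) {
        struct sim_tasklet *t = list;
        list = list->next;
        if (t->count == 0) {
            t->state &= ~SIM_STATE_SCHED;
            t->func(t->data);
            ran++;
            continue;
        }
        /* Disabled: leave it scheduled and requeue it for a later pass. */
        t->next = sim_list;
        sim_list = t;
    }
    return ran;
}

/* Example handler (hypothetical): accumulates its argument. */
static unsigned long sim_total;
static void sim_add(unsigned long data) { sim_total += data; }
```

Scheduling the same enabled tasklet twice before sim_tasklet_action() runs invokes its handler once, and a tasklet whose count is nonzero stays on sim_list until it is enabled.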

Softirq & Tasklet Concurrency
Two instances of the same tasklet never run concurrently

#ifdef CONFIG_SMP
static inline int tasklet_trylock(struct tasklet_struct *t)
{
    return !test_and_set_bit(TASKLET_STATE_RUN, &(t)->state);
}
#else
#define tasklet_trylock(t) 1
#endif
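
The test-and-set idea behind tasklet_trylock() can be reproduced with C11 atomics. This is a sketch, not the kernel's implementation: atomic_fetch_or() plays the role of test_and_set_bit(), and the bit position and the sim_ names are assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>

#define SIM_STATE_RUN_BIT 1  /* hypothetical bit index standing in for TASKLET_STATE_RUN */

/* Atomically set the RUN bit; succeed only if it was previously clear. */
static int sim_trylock(atomic_ulong *state)
{
    unsigned long mask = 1UL << SIM_STATE_RUN_BIT;
    unsigned long old = atomic_fetch_or(state, mask);
    return !(old & mask);
}

/* Clear the RUN bit again once the handler has finished. */
static void sim_unlock(atomic_ulong *state)
{
    unsigned long mask = 1UL << SIM_STATE_RUN_BIT;
    unsigned long old = atomic_fetch_and(state, ~mask);
    assert(old & mask);  /* unlocking a tasklet that was not running is a bug */
}
```

Because the read and the write of the bit happen in one atomic step, only one of two CPUs racing on the same tasklet sees the bit clear, so only one of them runs the handler.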

Using Tasklets
A tasklet can be declared statically or dynamically
  DECLARE_TASKLET(name, func, data)
  DECLARE_TASKLET_DISABLED(name, func, data)
Writing a tasklet handler
  void tasklet_handler(unsigned long data), for example
  A tasklet handler cannot sleep
  It runs with all interrupts enabled
  Two instances of the same tasklet never run concurrently
  If the same tasklet is scheduled again before it actually runs, it still runs only once
Disable / kill a tasklet
  tasklet_disable()
  tasklet_disable_nosync()
  tasklet_kill()

ksoftirqd
Most commonly, the kernel processes softirqs on return from handling an interrupt
  In interrupt context
However, softirqs may be raised at very high rates
  Sometimes, they reactivate themselves
  This may lead to starvation of user programs
Kernel solution
  When softirqs grow excessively, the kernel wakes up a family of kernel threads
  They run at the lowest possible priority
  One thread per processor, named ksoftirqd/n
  static int ksoftirqd(void * __bind_cpu) [code]

Work Queues

Introduction
Work queues defer work into a kernel thread
  Runs in process context
  Schedulable and can sleep
  These threads are called worker threads
Default worker threads are called events/n
  n is the processor number
  Unless there is a need to create its own thread, most drivers defer work to the default worker thread

struct workqueue_struct {
    struct cpu_workqueue_struct cpu_wq[NR_CPUS];
    const char *name;
    struct list_head list;  /* Empty if single thread */
};

More Data Structures
struct cpu_workqueue_struct {
    spinlock_t lock;

    long remove_sequence;  /* Least-recently added (next to run) */
    long insert_sequence;  /* Next to add */

    struct list_head worklist;
    wait_queue_head_t more_work;
    wait_queue_head_t work_done;

    struct workqueue_struct *wq;
    task_t *thread;

    int run_depth;  /* Detect run_workqueue() recursion depth */
} ____cacheline_aligned;

#define create_workqueue(name) __create_workqueue((name), 0)

struct workqueue_struct *__create_workqueue(const char *name, int singlethread)
{
    int cpu, destroy = 0;
    struct workqueue_struct *wq;
    struct task_struct *p;

    wq = kmalloc(sizeof(*wq), GFP_KERNEL);
    if (!wq)
        return NULL;
    memset(wq, 0, sizeof(*wq));

    wq->name = name;
    lock_cpu_hotplug();
    if (singlethread) {
        …
    } else {
        spin_lock(&workqueue_lock);
        list_add(&wq->list, &workqueues);
        spin_unlock(&workqueue_lock);
        for_each_online_cpu(cpu) {
            p = create_workqueue_thread(wq, cpu);
            ….
        }

static struct task_struct *create_workqueue_thread(struct workqueue_struct *wq, int cpu)
{
    struct cpu_workqueue_struct *cwq = wq->cpu_wq + cpu;
    struct task_struct *p;

    spin_lock_init(&cwq->lock);
    cwq->wq = wq;
    cwq->thread = NULL;
    cwq->insert_sequence = 0;
    cwq->remove_sequence = 0;
    INIT_LIST_HEAD(&cwq->worklist);
    init_waitqueue_head(&cwq->more_work);
    init_waitqueue_head(&cwq->work_done);

    if (is_single_threaded(wq))
        p = kthread_create(worker_thread, cwq, "%s", wq->name);
    else
        p = kthread_create(worker_thread, cwq, "%s/%d", wq->name, cpu);
    if (IS_ERR(p))
        return NULL;
    cwq->thread = p;
    return p;
}

static int worker_thread(void *__cwq)
{
    struct cpu_workqueue_struct *cwq = __cwq;
    DECLARE_WAITQUEUE(wait, current);
    struct k_sigaction sa;
    sigset_t blocked;

    current->flags |= PF_NOFREEZE;

    set_user_nice(current, -10);

    /* Block and flush all signals */
    sigfillset(&blocked);
    sigprocmask(SIG_BLOCK, &blocked, NULL);
    flush_signals(current);

    /* SIG_IGN makes children autoreap: see do_notify_parent(). */
    sa.sa.sa_handler = SIG_IGN;
    sa.sa.sa_flags = 0;
    siginitset(&sa.sa.sa_mask, sigmask(SIGCHLD));
    do_sigaction(SIGCHLD, &sa, (struct k_sigaction *)0);

    set_current_state(TASK_INTERRUPTIBLE);
    while (!kthread_should_stop()) {
        add_wait_queue(&cwq->more_work, &wait);
        if (list_empty(&cwq->worklist))
            schedule();
        else
            __set_current_state(TASK_RUNNING);
        remove_wait_queue(&cwq->more_work, &wait);

        if (!list_empty(&cwq->worklist))
            run_workqueue(cwq);
        set_current_state(TASK_INTERRUPTIBLE);
    }
    __set_current_state(TASK_RUNNING);
    return 0;
}

Wait Queues
Wait queues have several uses in the kernel
  Especially for interrupt handling, process synchronization, and timing
A process wishing to wait for a specific event places itself in the proper wait queue and relinquishes control
Each wait queue is identified by a wait queue head (wait_queue_head_t)
  Wait queues are modified by interrupt handlers and major kernel functions
  Protected by a spinlock
Each element is of type wait_queue_t
  Each entry represents a sleeping process
  Exclusive processes: selectively woken up
  Nonexclusive processes: always woken up

Data Structures
struct __wait_queue_head {
    spinlock_t lock;
    struct list_head task_list;
};
typedef struct __wait_queue_head wait_queue_head_t;

struct __wait_queue {
    unsigned int flags;
#define WQ_FLAG_EXCLUSIVE 0x01
    struct task_struct * task;
    wait_queue_func_t func;
    struct list_head task_list;
};

worker_thread()
set_current_state(TASK_INTERRUPTIBLE);
  Mark the thread as sleeping
add_wait_queue(&cwq->more_work, &wait);
  Add this thread to a wait queue
if (list_empty(&cwq->worklist)) schedule()
  Do a context switch and sleep
else __set_current_state(TASK_RUNNING);
  The thread does not go to sleep
remove_wait_queue(&cwq->more_work, &wait);
  Dequeue itself from the wait queue
if (!list_empty(&cwq->worklist)) run_workqueue(cwq);
  Perform the deferred work

Work Item
struct work_struct {
    unsigned long pending;    // is this work pending?
    struct list_head entry;   // linked list of all work
    void (*func)(void *);     // handler function
    void *data;               // argument to handler
    void *wq_data;            // used internally
    struct timer_list timer;  // timer used by delayed work queues
};

run_workqueue()
while (!list_empty(&cwq->worklist)) {
  Check whether the worklist is empty; if it is not:
struct work_struct *work = list_entry(cwq->worklist.next, struct work_struct, entry);
  Obtain one work item
void (*f) (void *) = work->func;
  Obtain the handler function
void *data = work->data;
  Obtain the argument to this handler function
list_del_init(cwq->worklist.next);
  Remove the work item
f(data);
  Execute the handler function
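
The dequeue-and-call loop of run_workqueue() can be sketched in user space. This version is an illustration under assumptions: it uses a plain singly linked FIFO instead of list_head, a plain int instead of the atomic pending bit, and it clears pending before calling the handler, a step the slide does not show. All sim_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct sim_work {
    int pending;              /* nonzero while the item is queued */
    struct sim_work *next;
    void (*func)(void *);     /* handler function */
    void *data;               /* argument to handler */
};

struct sim_workqueue {
    struct sim_work *head, *tail;
};

/* Append w to the queue; returns 1 if queued, 0 if it was already pending. */
static int sim_queue_work(struct sim_workqueue *wq, struct sim_work *w)
{
    assert(wq != NULL && w != NULL);
    if (w->pending)
        return 0;
    w->pending = 1;
    w->next = NULL;
    if (wq->tail)
        wq->tail->next = w;
    else
        wq->head = w;
    wq->tail = w;
    return 1;
}

/* Drain the queue in FIFO order; returns the number of handlers called. */
static int sim_run_workqueue(struct sim_workqueue *wq)
{
    int ran = 0;

    while (wq->head) {
        struct sim_work *w = wq->head;   /* obtain one work item */
        void (*f)(void *) = w->func;     /* obtain handler function */
        void *data = w->data;            /* obtain its argument */

        wq->head = w->next;              /* remove the work item */
        if (!wq->head)
            wq->tail = NULL;
        w->pending = 0;                  /* assumption: cleared before the call */
        f(data);                         /* execute handler function */
        ran++;
    }
    return ran;
}

/* Example handler (hypothetical): increments the int that data points to. */
static void sim_increment(void *data) { ++*(int *)data; }
```

Queueing an item that is already pending is rejected, which mirrors the test_and_set_bit(0, &work->pending) check that queue_work() performs before adding a work item.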

Using Work Queues
Create work to defer
  DECLARE_WORK(xyz, void (*abc)(void *), void *def);
  It statically creates a work_struct structure named xyz, with handler abc and data def
Write the work queue handler
  void work_handler(void *data), for example
  It runs in process context
Schedule work
  On the default event queue: schedule_work(&work);
  schedule_delayed_work(&work, delay);

static struct workqueue_struct *keventd_wq;

int fastcall schedule_work(struct work_struct *work)
{
    return queue_work(keventd_wq, work);
}

int fastcall schedule_delayed_work(struct work_struct *work, unsigned long delay)
{
    return queue_delayed_work(keventd_wq, work, delay);
}

int fastcall queue_delayed_work(struct workqueue_struct *wq, struct work_struct *work, unsigned long delay)
{
    int ret = 0;
    struct timer_list *timer = &work->timer;

    if (!test_and_set_bit(0, &work->pending)) {
        work->wq_data = wq;
        timer->expires = jiffies + delay;
        timer->data = (unsigned long)work;
        timer->function = delayed_work_timer_fn;
        add_timer(timer);
        ret = 1;
    }
    return ret;
}

/* We queue the work to the CPU it was submitted, but there is no guarantee that it will be processed by that CPU. */
int fastcall queue_work(struct workqueue_struct *wq, struct work_struct *work)
{
    int ret = 0, cpu = get_cpu();

    if (!test_and_set_bit(0, &work->pending)) {
        if (unlikely(is_single_threaded(wq)))
            cpu = 0;
        BUG_ON(!list_empty(&work->entry));
        __queue_work(wq->cpu_wq + cpu, work);
        ret = 1;
    }
    put_cpu();
    return ret;
}

void init_workqueues(void)
{
    hotcpu_notifier(workqueue_cpu_callback, 0);
    keventd_wq = create_workqueue("events");
    BUG_ON(!keventd_wq);
}

int default_wake_function(wait_queue_t *curr, unsigned mode, int sync, void *key)
{
    task_t *p = curr->task;
    return try_to_wake_up(p, mode, sync);
}

#define wake_up(x) __wake_up(x, TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, 1, NULL)

void fastcall __wake_up(wait_queue_head_t *q, unsigned int mode, int nr_exclusive, void *key)
{
    unsigned long flags;

    spin_lock_irqsave(&q->lock, flags);
    __wake_up_common(q, mode, nr_exclusive, 0, key);
    spin_unlock_irqrestore(&q->lock, flags);
}

#define list_for_each_safe(pos, n, head) \
    for (pos = (head)->next, n = pos->next; pos != (head); \
         pos = n, n = pos->next)

RA: try_to_wake_up() [TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE, 1 or 0 or nr]

static void __wake_up_common(wait_queue_head_t *q, unsigned int mode, int nr_exclusive, int sync, void *key)
{
    struct list_head *tmp, *next;

    list_for_each_safe(tmp, next, &q->task_list) {
        wait_queue_t *curr;
        unsigned flags;

        curr = list_entry(tmp, wait_queue_t, task_list);
        flags = curr->flags;
        if (curr->func(curr, mode, sync, key) &&
            (flags & WQ_FLAG_EXCLUSIVE) &&
            !--nr_exclusive)
            break;
    }
}

#define list_entry(ptr, type, member) \
    container_of(ptr, type, member)

#define container_of(ptr, type, member) ({ \
    const typeof( ((type *)0)->member ) *__mptr = (ptr); \
    (type *)( (char *)__mptr - offsetof(type,member) );})
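
Two ideas from the code above, recovering the enclosing structure from an embedded list node and stopping after nr_exclusive exclusive waiters, can be tried out in user space. This sketch makes several assumptions: sim_container_of is a portable variant without the GNU typeof type check, the list is singly linked and circular, and every wake-up is taken to succeed, whereas the kernel calls curr->func and checks its return value. All sim_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_FLAG_EXCLUSIVE 0x01

/* Portable container_of: subtract the member offset from the member address. */
#define sim_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct sim_list { struct sim_list *next; };

struct sim_waiter {
    unsigned int flags;
    int awake;
    struct sim_list link;  /* embedded list node, like task_list */
};

/*
 * Walk the circular list starting after head and wake each waiter in order.
 * Stop once nr_exclusive exclusive waiters have been woken.
 * Returns the total number of waiters woken.
 */
static int sim_wake_up(struct sim_list *head, int nr_exclusive)
{
    int woken = 0;

    for (struct sim_list *pos = head->next; pos != head; pos = pos->next) {
        struct sim_waiter *w = sim_container_of(pos, struct sim_waiter, link);

        assert(&w->link == pos);  /* the recovered struct must contain this node */
        w->awake = 1;
        woken++;
        if ((w->flags & SIM_FLAG_EXCLUSIVE) && !--nr_exclusive)
            break;
    }
    return woken;
}
```

With one nonexclusive waiter queued ahead of two exclusive ones, sim_wake_up(&head, 1) wakes the nonexclusive waiter and the first exclusive waiter, then stops.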

Summary
Choices for bottom halves
  softirqs, tasklets, work queues
Softirqs provide the least serialization
  Only used when scalability is a concern
Tasklets are used if the code is not finely threaded
Work queues process work items in process context
  Easiest to use

Disabling Bottom Halves
local_bh_disable()
  To disable all bottom halves (softirqs and tasklets)
local_bh_enable()
  To enable bottom halves
  If nested, only the last call enables

local_bh_disable()
local_bh_disable() disables all bottom halves on the local CPU, except work queues
  It disables local bottom halves by incrementing preempt_count
local_bh_enable() enables local bottom halves by decrementing preempt_count, then checks for any pending softirq

#define local_bh_disable() \
    do { preempt_count() += SOFTIRQ_OFFSET; barrier(); } while (0)

local_bh_enable()
local_bh_enable() enables local bottom halves by decrementing preempt_count, and runs any pending bottom halves if it is not in interrupt context

void local_bh_enable(void)
{
    __local_bh_enable();
    if (unlikely(!in_interrupt() && local_softirq_pending()))
        invoke_softirq();
}

Usage of preempt_count
Preemption markers
  preempt_disable and preempt_enable operate on a defined int, preempt_count, stored in each thread_info
  Bits 8-15 are the softirq count
    max # of softirqs: 256
  OFFSET
    SOFTIRQ_OFFSET : 0x00000100
    SOFTIRQ_MASK : 0x0000ff00
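
The effect of SOFTIRQ_OFFSET and SOFTIRQ_MASK on the counter can be checked with a few lines of user-space C. This is a sketch: the two constants are the ones quoted above, while sim_preempt_count and the sim_ helpers are hypothetical stand-ins for the per-thread counter and the kernel macros.

```c
#include <assert.h>

#define SOFTIRQ_OFFSET 0x00000100u  /* value quoted on the slide */
#define SOFTIRQ_MASK   0x0000ff00u  /* value quoted on the slide */

static unsigned int sim_preempt_count;  /* stands in for preempt_count in thread_info */

/* Nested disable: add one unit to the softirq field (bits 8-15). */
static void sim_local_bh_disable(void)
{
    /* The 8-bit field must not overflow into the next field. */
    assert((sim_preempt_count & SOFTIRQ_MASK) != SOFTIRQ_MASK);
    sim_preempt_count += SOFTIRQ_OFFSET;
}

/* Matching enable: remove one unit from the softirq field. */
static void sim_local_bh_enable(void)
{
    assert(sim_preempt_count & SOFTIRQ_MASK);  /* unbalanced enable is a bug */
    sim_preempt_count -= SOFTIRQ_OFFSET;
}

/* Nonzero while bottom halves are disabled. */
static unsigned int sim_softirq_count(void)
{
    return sim_preempt_count & SOFTIRQ_MASK;
}

/* Current nesting depth of bottom-half disabling. */
static unsigned int sim_bh_depth(void)
{
    return sim_softirq_count() >> 8;
}
```

After two nested sim_local_bh_disable() calls the counter reads 0x200, and bottom halves count as enabled again only after the second sim_local_bh_enable().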

irq_exit()
#define irq_exit() \
do { \
    preempt_count() -= IRQ_EXIT_OFFSET; \
    if (!in_interrupt() && softirq_pending(smp_processor_id())) \
        do_softirq(); \
} while (0)

in_interrupt() examines preempt_count to check whether the CPU is in hardirq or softirq context
local_bh_disable() is mostly used in drivers

asmlinkage void __do_softirq(void)
{
    pending = local_softirq_pending();
    local_bh_disable();
    …
    /* handle softirqs, restarting up to MAX_SOFTIRQ_RESTART times */
    …
    __local_bh_enable();
}

Review Slide
tasklet IRQ? DECLARE_TASKLET? DECLARE_TASKLET_DISABLED? tasklet_action()?
ksoftirqd()?
Work queue usage?
workqueue_struct? cpu_workqueue_struct? work_struct?
worker_thread()? run_workqueue()? schedule_work()?
MP1: Provide timer & keyboard ISRs for the eos_x86 operating system

Return from Interrupts and Exceptions
朱宗賢

Introduction
The following things must be handled before terminating an interrupt or exception handler
  # of kernel control paths being concurrently executed
    If there is just one, the CPU switches back to user mode
  Pending process switch requests
    If TIF_NEED_RESCHED is set, call schedule()
  Pending signals
    If a signal is sent to the current process, it must be handled

Related Terminating Functions
ret_from_exception()
  Terminates all exceptions except 0x80 ones
ret_from_intr()
  Terminates interrupt handlers
ret_from_sys_call()
  Terminates system calls (0x80 programmed exception)
ret_from_fork()
  Terminates fork(), vfork(), or clone() system calls

Return from Interrupts and Exceptions
[Figure: flowchart of the return path. Entry points: ret_from_exception, ret_from_intr, ret_from_sys_call, ret_from_fork (calls schedule_tail()). Decision nodes: Nested kernel control paths? Virtual v86 mode? System call tracing? (syscall_trace()) Need reschedule? (schedule()) Pending signals? (do_signal(), save_v86_state()). Labels: tracesys_exit, reschedule, signal_return, v86_signal_return, restore_all (restore hardware context).]

Returning from Interrupt
The return path from an interrupt is much more complicated than the entry path
It is a good place to do other tasks that are unrelated to the interrupt but need to be done fairly frequently
These include checking for pending signals and checking whether a reschedule is needed

General Implementation Issues
Number of kernel control paths being concurrently executed
Pending process switch requests
Pending signals

Exiting from Interrupt Handling

Return from System Call
Disable interrupts first, so that the tests that follow are guaranteed to be atomic
Check the pending work-to-be-done flags in the thread information
  syscall trace active
  resumption notification requested
  signal pending
  rescheduling necessary

Returning from Exceptions and Interrupts
We have to determine whether the CPU was already running in kernel mode before the interrupt
  Kernel mode / user mode / vm86 mode
If so, we are dealing with a nested interrupt and want to finish processing it as quickly as possible

// entry.S
ret_from_exception:
    preempt_stop
ret_from_intr:
    GET_THREAD_INFO(%ebp)
    movl EFLAGS(%esp), %eax         # mix EFLAGS and CS
    movb CS(%esp), %al
    testl $(VM_MASK | 3), %eax
    jz resume_kernel                # returning to
ENTRY(resume_userspace)
    cli                             # make sure we don't miss an interrupt
                                    # setting need_resched or sigpending
                                    # between sampling and the iret
    movl TI_flags(%ebp), %ecx
    andl $_TIF_WORK_MASK, %ecx      # is there any work to be done on
                                    # int/exception return?
    jne work_pending
    jmp restore_all

// entry.S
# system call handler stub
ENTRY(system_call)
    …
syscall_call:
    call *sys_call_table(,%eax,4)
    movl %eax,EAX(%esp)             # store the return value
syscall_exit:
    cli                             # make sure we don't miss an interrupt
                                    # setting need_resched or sigpending
                                    # between sampling and the iret
    movl TI_flags(%ebp), %ecx
    testw $_TIF_ALLWORK_MASK, %cx   # current->work
    jne syscall_exit_work
restore_all:
    RESTORE_ALL
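
The "mix EFLAGS and CS" test at ret_from_intr can be written out in C to show what the three instructions compute. This is a sketch: the function name and parameters are hypothetical, and SIM_VM_MASK assumes the x86 EFLAGS.VM bit (bit 17). The low byte of the saved EFLAGS is replaced by the low byte of the saved CS, so a single test covers both the vm86 bit and the privilege level held in the two low bits of CS.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_VM_MASK 0x00020000u  /* assumption: EFLAGS.VM, bit 17 on x86 */

/*
 * Returns nonzero when the interrupted context was user mode or vm86 mode,
 * i.e. when the assembly would fall through to resume_userspace.
 */
static int sim_returning_to_user(uint32_t saved_eflags, uint16_t saved_cs)
{
    uint32_t mixed;

    /* The VM bit must survive overwriting the low byte with CS. */
    assert((SIM_VM_MASK & 0xffu) == 0);

    /* movl EFLAGS(%esp),%eax ; movb CS(%esp),%al */
    mixed = (saved_eflags & ~0xffu) | (saved_cs & 0xffu);

    /* testl $(VM_MASK | 3),%eax */
    return (mixed & (SIM_VM_MASK | 3u)) != 0;
}
```

A code segment selector whose two low bits are 0 with the VM flag clear yields 0 (the resume_kernel case); a selector with privilege level 3, or any saved EFLAGS with the VM flag set, yields 1.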

Dealing with Pending Signals
Check the VM_MASK bit in the flags register
  (Kernel / VM86 mode)
Call do_notify_resume()
There is an extra complication if a signal was found to be pending while the processor was running in virtual 8086 mode before the interrupt
  It copies the saved values from the stack to the vm86_info field of the thread structure

Reschedule Current Process
If there is a pending process switch request, the kernel must perform process scheduling; otherwise, control is returned to the current process
If the current process cannot continue after the interrupt, work_resched is invoked

Return from Fork
The ret_from_fork function is executed by the child process right after its creation through a fork(), vfork(), or clone() system call
schedule_tail(): It is relevant only in the SMP case. It tries to find a suitable CPU on which to run the process just switched out.

// entry.S
work_resched:
    call schedule
    cli
    movl TI_flags(%ebp), %ecx
    andl $_TIF_WORK_MASK, %ecx
    jz restore_all
    testb $_TIF_NEED_RESCHED, %cl
    jnz work_resched

work_notifysig:                     # deal with pending signals and
                                    # notify-resume requests
    testl $VM_MASK, EFLAGS(%esp)
    movl %esp, %eax
    jne work_notifysig_v86          # returning to kernel-space or
                                    # vm86-space
    xorl %edx, %edx
    call do_notify_resume
    jmp restore_all

// entry.S
ENTRY(ret_from_fork)
    pushl %eax
    call schedule_tail
    GET_THREAD_INFO(%ebp)
    popl %eax
    jmp syscall_exit