A race-cure case study
A look at how some standard software tools can illuminate what is happening inside Linux

Dec 19, 2015



Our recent ‘race’ example

• Our ‘cmosram.c’ device-driver included a ‘race condition’ in its ‘read()’ and ‘write()’ functions, since accessing any CMOS memory-location is a two-step operation, and thus is a ‘critical section’ in our code:

outb( reg_id, 0x70 );

datum = inb( 0x71 );

• Once the first step in this sequence is taken, the second step must follow before any other code touches these two i/o-ports


No interventions!

• To guarantee the integrity of each access to CMOS memory, we must prohibit every possibility that another control-thread may intervene and access that same i/o-port

• The main ways in which an intervention by another ‘thread’ might happen are:
  – The current CPU could get ‘interrupted’; or
  – Another CPU could access the same i/o-port
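To see concretely why such an intervention is harmful, here is a minimal user-space model (not driver code). The two i/o-ports are replaced by a hypothetical `struct cmos_model`, and the interleaving that an interrupt handler or a second CPU could produce is written out step by step, so the outcome is deterministic. The register numbers and stored values are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the CMOS index/data port pair (0x70/0x71). */
struct cmos_model {
    uint8_t index;      /* value most recently written to "port 0x70" */
    uint8_t ram[256];   /* the CMOS memory-locations (sized generously) */
};

/* Step 1: select a CMOS register, like outb( reg_id, 0x70 ). */
void model_select(struct cmos_model *c, uint8_t reg_id)
{
    c->index = reg_id;
}

/* Step 2: read the selected register, like inb( 0x71 ). */
uint8_t model_read_data(const struct cmos_model *c)
{
    return c->ram[c->index];
}

/* The two steps with nothing in between: the intended behavior. */
uint8_t read_alone(struct cmos_model *c, uint8_t reg_id)
{
    model_select(c, reg_id);
    return model_read_data(c);
}

/* Thread A's two steps with thread B's complete access squeezed in
 * between, as an interrupt or a second CPU could do. Returns A's result. */
uint8_t read_with_intervention(struct cmos_model *c, uint8_t reg_a, uint8_t reg_b)
{
    model_select(c, reg_a);      /* A: step 1 */
    model_select(c, reg_b);      /* B: step 1 (the intervention) */
    (void)model_read_data(c);    /* B: step 2 */
    return model_read_data(c);   /* A: step 2 now reads reg_b's data */
}
```

With `ram[0x00] = 0x30` and `ram[0x04] = 0x12`, `read_alone(&c, 0x00)` yields 0x30, but `read_with_intervention(&c, 0x00, 0x04)` yields 0x12: thread A asked for register 0x00, silently received register 0x04's contents, and nothing signals the error.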


Linux’s solution

• Linux provides a function that an LKM can call which is designed to ensure ‘exclusive access’ to a CMOS memory-location:

datum = rtc_cmos_read( reg_id );

• By using this function, a programmer does not have to expend time and mental effort analyzing the race-condition and devising a suitable ‘cure’ for it


But how does it work?

• As computer science students, we are not satisfied with just using convenient ‘black-box’ solutions which we don’t understand

• Such purported ‘solutions’ may not always accomplish everything that they claim – if they perform correctly today, they still may fail in some way in the future (if hardware changes); we don’t want to be helpless!


Is ‘open source’ enough?

• In theory we could try to track down the actual behavior of the ‘rtc_cmos_read()’ function, by reading Linux’s source-code

• But is that really a practical approach?

• In some cases the answer might be ‘yes’, but in other situations it might be ‘no’!

• Life is short, and the kernel source-files are very numerous – with many layers


‘LXR’ can help

• The Linux Cross-Reference tool offers a way to automate searching kernel source

• This tool is online (see our website’s link under ‘Resources’) and it is hosted on a server in Norway:

http://lxr.linux.no/

• Here you just click on “Browse the Code”


From: <arch/i386/kernel/time.c>

unsigned char rtc_cmos_read(unsigned char addr)
{
    unsigned char val;

    lock_cmos_prefix( addr );
    outb_p( addr, RTC_PORT(0) );
    val = inb_p( RTC_PORT(1) );
    lock_cmos_suffix( addr );
    return val;
}
EXPORT_SYMBOL( rtc_cmos_read );


Another approach…

• There is an alternative to searching kernel source files -- which may well be faster

• We can use some standard command-line tools, including ‘objdump’ and ‘grep’

• In this approach, we look at the compiled kernel’s object-file, named ‘vmlinux’, found normally in the ‘/usr/src/linux’ subdirectory

• That file can be parsed with ‘objdump’!


‘objdump’ can disassemble

• Change the current working directory:

$ cd /usr/src/linux

• Then, to disassemble the ‘vmlinux’ kernel file, we can use this command:

$ objdump -d vmlinux

• But the amount of output will be huge, so it’s hard to find the part we’re interested in


‘grep’ can do filtering

• If we want to see the ‘rtc_cmos_read’ code we could use ‘grep’ to eliminate irrelevant parts of the disassembly-output:

$ objdump -d vmlinux | grep rtc_cmos_read

• But we still see too many lines of output (because the ‘rtc_cmos_read()’ function gets called at many places in the kernel)


‘System.map’

• We can use a special textfile, located in the ‘/boot’ directory, which tells us where each kernel-symbol (not only the ‘exported’ ones) will reside at run-time in the virtual address-space

• You can use ‘cat’ to look at this textfile:

$ cat /boot/System.map

• And you can use ‘grep’ to find only the symbol you care about:

$ cat /boot/System.map | grep rtc_cmos_read


Example on our machines

$ cat /boot/System.map-2.6.22.5cslabs | grep rtc_cmos_read

c0105574 T rtc_cmos_read
c029b8a8 r __ksymtab_rtc_cmos_read
c02a0bff r __kstrtab_rtc_cmos_read

Note that the usual ‘symbolic link’ is missing from the ‘/boot’ directory on our class and lab machines, so you have to type a longer name

With superuser privileges this could be fixed using the ‘ln -s’ command:

root# ln -s System.map-2.6.22.5cslabs /boot/System.map
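Each System.map line has three fields: a hex address, a one-letter symbol type in ‘nm’ notation (uppercase for global symbols, lowercase for local ones; T/t code, D/d initialized data, B/b uninitialized data, R/r read-only data), and the symbol’s name. A minimal C sketch of the lookup that ‘grep’ performs for us; the `struct sym` type and both function names are hypothetical. Unlike ‘grep’, which matched all three lines above as substrings, this sketch matches the whole name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed System.map entry (hypothetical type, for illustration). */
struct sym {
    unsigned long addr;   /* run-time virtual address */
    char type;            /* nm-style symbol type letter */
    char name[128];       /* symbol name */
};

/* Parse a line like "c0105574 T rtc_cmos_read"; returns 1 on success. */
int parse_sysmap_line(const char *line, struct sym *out)
{
    return sscanf(line, "%lx %c %127s", &out->addr, &out->type, out->name) == 3;
}

/* Whole-name match: index of the matching entry, or -1 if absent. */
int find_symbol(const struct sym *tab, int n, const char *name)
{
    for (int i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return i;
    return -1;
}
```

Fed the three lines shown above, `find_symbol(tab, 3, "rtc_cmos_read")` returns the entry with address 0xc0105574 and type ‘T’ (global code).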


Now we know where to look…

• From the ‘System.map’ we learn where in the kernel our ‘rtc_cmos_read()’ function will reside

• We can ‘extract’ that function’s code, for study purposes, using these steps:
  – Save the complete ‘vmlinux’ disassembly
  – Use ‘grep’ to find its starting-address
  – Use ‘vi’ to delete earlier and later instructions


• Step 1: saving the ‘vmlinux’ disassembly

$ objdump -d /usr/src/linux/vmlinux > ~/vmlinux.asm

• Step 2: finding our function’s entry-point

$ cat ~/vmlinux.asm | grep -n c0105574


What we discover

Find the line that shows this virtual address (with a colon appended), and have ‘grep’ tell us which line-number it’s on:

$ cat vmlinux.asm | grep -n c0105574:

6812:c0105574: 53 push %ebx

OK, here’s that line, and the ‘6812’ in front of it is its line-number


Use a text-editor

• Remove all the lines in your ‘vmlinux.asm’ textfile whose line-numbers precede 6812

• Scroll down to find where your function ends (i.e., find its return-instruction ‘ret’):

c01055b7: c3 ret

• Delete all the lines that follow the ‘return’
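The two editing steps (delete everything before the entry-point line, delete everything after the first ‘ret’) amount to a simple scan. A hypothetical C sketch, assuming objdump’s layout of an address with a colon, a tab, the instruction bytes, a tab, then the mnemonic. It stops at the first ‘ret’, which is right for rtc_cmos_read but would cut short a function with more than one return-point.

```c
#include <assert.h>
#include <string.h>

/* Index of the listing line that begins with "<addr>:", or -1. */
int find_entry(const char *const lines[], int n, const char *addr)
{
    size_t len = strlen(addr);
    for (int i = 0; i < n; i++) {
        const char *p = lines[i];
        while (*p == ' ')
            p++;                               /* tolerate leading padding */
        if (strncmp(p, addr, len) == 0 && p[len] == ':')
            return i;
    }
    return -1;
}

/* Starting at 'from', index of the first line whose mnemonic is "ret", or -1. */
int find_ret(const char *const lines[], int n, int from)
{
    for (int i = from; i < n; i++) {
        const char *tab = strrchr(lines[i], '\t');
        if (tab == NULL)
            continue;                          /* no mnemonic field on this line */
        const char *m = tab + 1;
        while (*m == ' ')
            m++;
        if (strncmp(m, "ret", 3) == 0 &&
            (m[3] == '\0' || m[3] == ' ' || m[3] == '\t'))
            return i;
    }
    return -1;
}
```

Keeping only the lines from `find_entry(...)` through `find_ret(...)` is exactly what the ‘vi’ deletions accomplish by hand.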


The complete function

c0105574 <rtc_cmos_read>:
c0105574:  53                   push   %ebx
c0105575:  9c                   pushf
c0105576:  5b                   pop    %ebx
c0105577:  fa                   cli
c0105578:  64 8b 15 08 20 30 c0 mov    %fs:0xc0302008,%edx
c010557f:  0f b6 c8             movzbl %al,%ecx
c0105582:  42                   inc    %edx
c0105583:  c1 e2 08             shl    $0x8,%edx
c0105586:  09 ca                or     %ecx,%edx
c0105588:  a1 3c 99 30 c0       mov    0xc030993c,%eax
c010558d:  85 c0                test   %eax,%eax
c010558f:  75 f7                jne    c0105588 <rtc_cmos_read+0x14>
c0105591:  f0 0f b1 15 3c 99 30 lock cmpxchg %edx,0xc030993c
c0105598:  c0
c0105599:  85 c0                test   %eax,%eax
c010559b:  75 eb                jne    c0105588 <rtc_cmos_read+0x14>
c010559d:  88 c8                mov    %cl,%al
c010559f:  e6 70                out    %al,$0x70
c01055a1:  e6 80                out    %al,$0x80
c01055a3:  e4 71                in     $0x71,%al
c01055a5:  e6 80                out    %al,$0x80
c01055a7:  c7 05 3c 99 30 c0 00 movl   $0x0,0xc030993c
c01055ae:  00 00 00
c01055b1:  53                   push   %ebx
c01055b2:  9d                   popf
c01055b3:  0f b6 c0             movzbl %al,%eax
c01055b6:  5b                   pop    %ebx
c01055b7:  c3                   ret


Some ‘magic’ numbers

• There are some hexadecimal constants in this code-disassembly which we probably will not understand without more research:
  – This memory-address: 0xc030993c
  – This i/o-port address: 0x80
  – This memory-address: %fs:0xc0302008

• There’s also a jump-target, but we do have some help in deciphering what it means:

jne c0105588 <rtc_cmos_read+0x14>
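The label tells us the target lies 0x14 bytes past the start of rtc_cmos_read: 0xc0105588 − 0xc0105574 = 0x14. objdump finds it by naming the nearest preceding symbol and the distance from it. The same lookup can be sketched in C against a table of System.map entries sorted by address; the first table entry in the usage below (`hypothetical_earlier_func`) is made up, included only to show the "nearest preceding" rule.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct ksym {
    unsigned long addr;
    const char *name;
};

/* tab[] must be sorted by address. Writes "name" or "name+0xOFF" into buf,
 * using the nearest symbol at or below addr, as objdump does for jump-targets.
 * Returns 0 if addr lies below every symbol in the table. */
int symbolize(const struct ksym *tab, int n, unsigned long addr,
              char *buf, size_t size)
{
    int best = -1;
    for (int i = 0; i < n && tab[i].addr <= addr; i++)
        best = i;
    if (best < 0)
        return 0;
    unsigned long off = addr - tab[best].addr;
    if (off == 0)
        snprintf(buf, size, "%s", tab[best].name);
    else
        snprintf(buf, size, "%s+0x%lx", tab[best].name, off);
    return 1;
}
```

With the table `{ {0xc0105500, "hypothetical_earlier_func"}, {0xc0105574, "rtc_cmos_read"} }`, the address 0xc0105588 is rendered as `rtc_cmos_read+0x14`, matching the disassembly.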


The ‘cmpxchg’ instruction

• The ‘cmpxchg’ instruction performs these CPU actions in a single operation:

cmpxchg source, destination

– The destination-operand is compared with the accumulator-register’s value, and the eflags-bits are adjusted to reflect this comparison’s result

– If ZF is set, the value of the source-operand is copied to the destination-operand; otherwise, the destination operand is copied to the accumulator register

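• A ‘lock’ prefix prevents other CPUs from accessing that memory-location until the whole instruction has completed, so the compare and the exchange happen as one indivisible step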
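The rules above can be written as a small sequential C model, with the return value playing the role of ZF. This model is deliberately not atomic; it only spells out what the instruction computes. For comparison, C11’s `atomic_compare_exchange_strong` has the same "store on match, otherwise report the current value" behavior, and on x86 compilers typically implement it with a ‘lock cmpxchg’ instruction. The function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Sequential model (NOT atomic) of 'cmpxchg src, dst' with EAX as the
 * accumulator. The return value plays the role of ZF. */
bool model_cmpxchg(uint32_t src, uint32_t *dst, uint32_t *eax)
{
    if (*dst == *eax) {   /* compare destination with accumulator */
        *dst = src;       /* equal (ZF=1): source goes into destination */
        return true;
    }
    *eax = *dst;          /* not equal (ZF=0): destination goes into accumulator */
    return false;
}

/* C11 counterpart: on failure it also copies the current value into *expected. */
bool c11_cmpxchg(_Atomic uint32_t *dst, uint32_t *expected, uint32_t desired)
{
    return atomic_compare_exchange_strong(dst, expected, desired);
}
```

Starting from a zero destination, a first call with EAX = 0 succeeds and stores the source value; a second call with EAX = 0 fails and leaves the stored value in EAX, which is exactly what the ‘test %eax,%eax’ after the instruction inspects.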


‘spinlock’

Before the code’s ‘critical section’ we have this:

c0105588:  a1 3c 99 30 c0       mov    0xc030993c,%eax
c010558d:  85 c0                test   %eax,%eax
c010558f:  75 f7                jne    c0105588 <rtc_cmos_read+0x14>
c0105591:  f0 0f b1 15 3c 99 30 lock cmpxchg %edx,0xc030993c
c0105598:  c0
c0105599:  85 c0                test   %eax,%eax
c010559b:  75 eb                jne    c0105588 <rtc_cmos_read+0x14>

Then we have the function’s ‘critical section’ of code:

c010559d:  88 c8                mov    %cl,%al
c010559f:  e6 70                out    %al,$0x70
c01055a1:  e6 80                out    %al,$0x80
c01055a3:  e4 71                in     $0x71,%al
c01055a5:  e6 80                out    %al,$0x80

And then after the code’s ‘critical section’ we have this:

c01055a7:  c7 05 3c 99 30 c0 00 movl   $0x0,0xc030993c

I/O-port 0x80 has an ‘undefined’ system function, so an output to it serves only as a brief time-delay
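The spin-loop above can be sketched in user-space C with C11 atomics. This is a model, not the kernel’s code: `spin_acquire()` and `spin_release()` are hypothetical names, and the sketch leaves out the ‘pushf; pop %ebx; cli’ and ‘push %ebx; popf’ instructions that surround the real loop, because a user program cannot disable interrupts.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Acquire: 'owner' must be nonzero, because 0 means "unlocked". */
void spin_acquire(_Atomic uint32_t *lock, uint32_t owner)
{
    assert(owner != 0);
    for (;;) {
        /* mov lock,%eax; test %eax,%eax; jne : wait until it looks free */
        while (atomic_load_explicit(lock, memory_order_relaxed) != 0)
            ;
        /* lock cmpxchg %edx,lock : store owner only if lock is still 0 */
        uint32_t expected = 0;
        if (atomic_compare_exchange_strong_explicit(lock, &expected, owner,
                                                    memory_order_acquire,
                                                    memory_order_relaxed))
            return;
        /* test %eax,%eax; jne : another CPU got there first, so retry */
    }
}

/* Release: movl $0x0,lock */
void spin_release(_Atomic uint32_t *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

Waiting with plain reads and attempting the locked instruction only when the lock looks free (a ‘test-and-test-and-set’ design) avoids hammering the shared memory-location with locked operations while another CPU holds the lock.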


The ‘System-map’ again

• The ‘System.map’ shows what the other mysterious memory-addresses mean:

$ cat /boot/System.map-2.6.22.5cslabs | grep c030993c
c030993c B cmos_lock

$ cat /boot/System.map-2.6.22.5cslabs | grep c0302008
c0302008 D per_cpu__cpu_number

• We see that memory-address c030993c has the label ‘cmos_lock’ (supporting our previous conclusion about a ‘spinlock’); also we get a ‘clue’ about 0xc0302008


What is ‘per_cpu’ data?

• With SMP systems there is often a need for each CPU to have its own version of some program-variable’s value

• One example: each CPU needs a unique identification-number (used in scheduling tasks for ‘load-balancing’ and respecting ‘processor-affinity’, and keeping track of which CPU now owns a particular ‘lock’)

• That’s what ‘per_cpu__cpu_number’ is
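Reading the first instructions of rtc_cmos_read with this name in mind, the value stored into ‘cmos_lock’ is ((per_cpu__cpu_number + 1) << 8) | addr, so the lock records both which CPU owns it and which CMOS register is being accessed. The ‘+ 1’ keeps the value nonzero even for CPU 0, since 0 means ‘unlocked’. A small sketch of that encoding (function names hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* movzbl %al,%ecx; mov %fs:per_cpu__cpu_number,%edx; inc %edx;
 * shl $0x8,%edx; or %ecx,%edx */
uint32_t cmos_lock_value(uint32_t cpu_number, uint8_t addr)
{
    return ((cpu_number + 1) << 8) | addr;
}

/* Recover the owning CPU from a nonzero lock value. */
uint32_t lock_owner_cpu(uint32_t value)
{
    assert(value != 0);   /* 0 means nobody holds the lock */
    return (value >> 8) - 1;
}

/* Recover the CMOS register-number being accessed. */
uint8_t lock_cmos_addr(uint32_t value)
{
    return (uint8_t)(value & 0xFF);
}
```

For example, CPU 1 reading CMOS register 0x0B would store 0x20B, and CPU 0 reading register 0x0A would store 0x10A.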


Role of segmentation

• Linux has a clever way of allowing CPUs to access their ‘per_cpu’ variables using the same name for different locations

• This can be arranged by exploiting the CPU’s memory-segmentation architecture

• The FS segment-register is used by the kernel to reference identically-named, but differently positioned, storage-locations
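The idea can be modeled in a few lines of C: every CPU executes the same instruction with the same offset (like %fs:0xc0302008), but each CPU’s FS segment has a different base, so the reference lands in that CPU’s private copy. This is a simplified sketch, not the kernel’s layout: the two-CPU machine, the area size, the offset, and the base values are all hypothetical, and a word-indexed array stands in for memory.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS           2    /* hypothetical two-CPU machine */
#define AREA_WORDS        16   /* hypothetical size of one per-CPU area */
#define CPU_NUMBER_OFFSET 2    /* hypothetical offset of 'cpu_number' */

/* Simulated memory holding one private per-CPU area for each CPU. */
static uint32_t sim_memory[NR_CPUS * AREA_WORDS];

/* Hypothetical per-CPU FS segment bases (word indexes into sim_memory). */
static const uint32_t fs_base[NR_CPUS] = { 0, AREA_WORDS };

/* Like an '%fs:offset' reference executed on 'cpu':
 * same offset everywhere, but a different base on each CPU. */
uint32_t *fs_ref(int cpu, uint32_t offset)
{
    assert(cpu >= 0 && cpu < NR_CPUS && offset < AREA_WORDS);
    return &sim_memory[fs_base[cpu] + offset];
}

/* Boot-time setup: store each CPU's number in its own copy. */
void setup_per_cpu(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        *fs_ref(cpu, CPU_NUMBER_OFFSET) = (uint32_t)cpu;
}
```

After `setup_per_cpu()`, the identical expression `*fs_ref(cpu, CPU_NUMBER_OFFSET)` yields 0 on CPU 0 and 1 on CPU 1, because the two references resolve to different storage-locations.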


Each CPU has its own GDT

• The Operating System sets up a Global Descriptor Table for each CPU; it’s an array of memory-segment descriptors:

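Upper doubleword (bits 63..32), from bit 63 down to bit 32:
  segment-base[ 31..24 ] | G | D | segment-limit[ 19..16 ] | segment access-rights | segment-base[ 23..16 ]

Lower doubleword (bits 31..0), from bit 31 down to bit 0:
  segment-base[ 15..0 ] | segment-limit[ 15..0 ]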

‘segment-base’ tells where the memory-area begins, ‘segment-limit’ tells how far the memory-area extends, and ‘access rights’ specifies how the memory-area will be used by the CPU (e.g., user or kernel)
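Since the base and limit are scattered across the quadword, reading a descriptor means reassembling the pieces. A minimal C sketch, assuming the standard x86 bit positions (base[15..0] in bits 31..16, base[23..16] in bits 39..32, access-rights in bits 47..40, limit[19..16] in bits 51..48, D in bit 54, G in bit 55, base[31..24] in bits 63..56). The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct seg_desc {
    uint32_t base;     /* 32-bit segment-base */
    uint32_t limit;    /* raw 20-bit segment-limit */
    uint8_t access;    /* access-rights byte (bits 47..40) */
    unsigned g : 1;    /* granularity: limit counts 4KB pages when set */
    unsigned d : 1;    /* default operand-size: 32-bit when set */
};

struct seg_desc decode_descriptor(uint64_t q)
{
    struct seg_desc s;
    s.base = (uint32_t)((q >> 16) & 0xFFFF)          /* base[15..0]   */
           | (uint32_t)((q >> 32) & 0xFF) << 16      /* base[23..16]  */
           | (uint32_t)((q >> 56) & 0xFF) << 24;     /* base[31..24]  */
    s.limit = (uint32_t)(q & 0xFFFF)                 /* limit[15..0]  */
            | (uint32_t)((q >> 48) & 0xF) << 16;     /* limit[19..16] */
    s.access = (uint8_t)((q >> 40) & 0xFF);
    s.g = (unsigned)((q >> 55) & 1);
    s.d = (unsigned)((q >> 54) & 1);
    return s;
}

/* Highest valid offset in the segment, after applying granularity. */
uint32_t effective_limit(struct seg_desc s)
{
    return s.g ? (s.limit << 12) | 0xFFF : s.limit;
}
```

For instance, 0x00CF9A000000FFFF (a typical flat 4GB kernel code-segment descriptor) decodes to base 0, raw limit 0xFFFFF with G set (so the segment extends to 0xFFFFFFFF), and access-rights 0x9A; a made-up descriptor 0xC140922345670FFF decodes to base 0xC1234567 and limit 0xFFF.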


In-class exercise #1

• Install our ‘dram.c’ device-driver, so you can run our ‘showgdt.cpp’ application

• You will see a CPU’s memory-descriptors (displayed as quadwords in hex format)

• You will probably see a slightly different table when you run ‘showgdt’ again – if Linux schedules it on a different CPU


What’s in register FS?

• You can use our ‘newinfo.cpp’ utility to quickly create an LKM that displays the values in the CPU’s segment-registers:

// using 'global variables' simplifies the inline assembly language
unsigned short _cs, _ds, _es, _fs, _gs, _ss;   // global variables

// 'buf' is the output buffer supplied by the /proc file-system
int my_get_info( char *buf, char **start, off_t off, int count )
{
    int len;

    asm(" mov %cs, _cs \n mov %ds, _ds ");
    len = sprintf( buf, "CS=%04X DS=%04X \n", _cs, _ds );
    return len;
}
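If you would rather avoid global variables, GCC and Clang ‘extended asm’ output operands let the compiler choose a register and hand back the value. A hedged user-space sketch (x86 only, function names hypothetical): run as an ordinary program it shows user-mode selectors, so the low two bits (RPL) of CS are 3 here, whereas inside an LKM they would be 0.

```c
#include <assert.h>
#include <stdio.h>

#if defined(__i386__) || defined(__x86_64__)
/* Output operand "=r": the compiler picks the destination register. */
unsigned short read_cs(void)
{
    unsigned short v;
    __asm__ volatile ("mov %%cs, %0" : "=r"(v));
    return v;
}

unsigned short read_ds(void)
{
    unsigned short v;
    __asm__ volatile ("mov %%ds, %0" : "=r"(v));
    return v;
}

unsigned short read_fs(void)
{
    unsigned short v;
    __asm__ volatile ("mov %%fs, %0" : "=r"(v));
    return v;
}
#else
/* Not an x86 CPU: there are no segment registers, so report zeros. */
unsigned short read_cs(void) { return 0; }
unsigned short read_ds(void) { return 0; }
unsigned short read_fs(void) { return 0; }
#endif

/* Format the selectors the way my_get_info() does. */
int format_sregs(char *buf, size_t size)
{
    return snprintf(buf, size, "CS=%04X DS=%04X FS=%04X\n",
                    (unsigned)read_cs(), (unsigned)read_ds(),
                    (unsigned)read_fs());
}
```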


In-class exercise #2

• Use the value in the FS segment-register to look up that segment’s ‘base-address’ (the base-address differs from one CPU to another)

• Convert the ‘virtual’ base-address to its corresponding ‘physical’ base-address

• Use our ‘fileview’ utility to look at what’s stored in physical memory at those spots

• Check the location: %fs:0xc0302008
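Two small calculations are involved in this exercise. First, the value in FS is a selector, not an address: bits 15..3 give the descriptor’s index in the table (each descriptor is 8 bytes), bit 2 chooses GDT (0) or LDT (1), and bits 1..0 are the requested privilege level. Second, %fs:0xc0302008 names the linear address FS-base + 0xc0302008, computed modulo 2^32 on a 32-bit CPU. A C sketch, where the selector 0x00D8 and the base values in the usage are hypothetical examples, not values read from our machines:

```c
#include <assert.h>
#include <stdint.h>

struct selector_fields {
    unsigned index;   /* bits 15..3: which descriptor in the table */
    unsigned ti;      /* bit 2: table indicator, 0 = GDT, 1 = LDT */
    unsigned rpl;     /* bits 1..0: requested privilege level */
};

struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti = (sel >> 2) & 1u;
    f.rpl = sel & 3u;
    return f;
}

/* Byte offset of the selected descriptor within the GDT (8 bytes each). */
unsigned gdt_byte_offset(uint16_t sel)
{
    return (unsigned)(sel >> 3) * 8u;
}

/* %fs:offset refers to linear address (FS base + offset), modulo 2^32. */
uint32_t linear_address(uint32_t seg_base, uint32_t offset)
{
    return (uint32_t)(seg_base + offset);
}
```

A selector of 0x00D8 would pick GDT entry 27 at privilege level 0. With an FS base of 0x40000000, the sum 0x40000000 + 0xc0302008 wraps around to linear address 0x00302008.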


‘virtual-to-physical’

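• If a kernel virtual address (one at or above 0xC0000000) is not in the ‘high’ area (i.e., if it’s below 0xF8000000), then it is easy to calculate its physical address by doing a simple subtraction

Virtual address-space (4GB):
  0x00000000 .. 0xBFFFFFFF   user space (3GB)
  0xC0000000 .. 0xF7FFFFFF   kernel space: subtract 0xC0000000 from the virtual address to get the physical address
  0xF8000000 .. 0xFFFFFFFF   kernel space: High Memory Area (the subtraction does NOT work here)
Kernel space (0xC0000000 .. 0xFFFFFFFF) is 1GB in total.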
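The subtraction rule fits in a few lines of C. A minimal sketch using only the two boundaries given above; `KERNEL_BASE`, `HMA_START` and `kvirt_to_phys()` are hypothetical names. Addresses in user space or in the High Memory Area are rejected, since this simple offset does not describe how they are mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KERNEL_BASE 0xC0000000u   /* start of kernel space (from the slide) */
#define HMA_START   0xF8000000u   /* start of the High Memory Area (from the slide) */

/* Convert a directly-mapped kernel virtual address to a physical address.
 * Returns false for user-space or HMA addresses. */
bool kvirt_to_phys(uint32_t virt, uint32_t *phys)
{
    if (virt < KERNEL_BASE || virt >= HMA_START)
        return false;
    *phys = virt - KERNEL_BASE;
    return true;
}
```

For example, the per-CPU location 0xc0302008 corresponds to physical address 0x00302008, while 0xF8000000 and any user-space address such as 0x08048000 are rejected.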