Browsing Linux Kernel
Linux Day – May 6, 2007
Motaz K. Saad
Department of Computer Science
http://motaz.saad.googlepages.com
Linux Directory Structure
Structure of a Linux Based Operating System
Kernel
Get an in-depth understanding of the Linux operating system: Kernel
• The Linux kernel acts as the interface between the hardware and the rest of the operating system.
• The Linux kernel also contains device drivers, which are specific to the hardware peripherals that you are using.
• The kernel is responsible for allocating resources (memory and CPU time).
• It keeps track of which applications are using which files.
• It enforces security, that is, what each user is allowed to do on the operating system.
The Linux kernel version numbers
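• Under the traditional numbering scheme (used up to the 2.5 series), a kernel whose second (minor) number is even (for example 2.0.30) is a stable, released kernel.
• A kernel whose second (minor) number is odd (for example 2.1.42) is a development kernel.
• The kernel source can be downloaded from kernel.org.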
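The even/odd rule comes down to the parity of the second (minor) number in the version string. A minimal sketch in C, assuming a plain "major.minor.patch" string; the helper name kernel_is_stable is hypothetical and not part of any kernel or library API:

```c
#include <stdio.h>

/*
 * Hypothetical helper (illustration only): classify a "major.minor.patch"
 * version string under the traditional even/odd rule.
 * Returns 1 for a stable kernel, 0 for a development kernel,
 * and -1 if the string cannot be parsed.
 */
int kernel_is_stable(const char *version)
{
    int major, minor, patch;

    /* All three numeric fields must be present */
    if (sscanf(version, "%d.%d.%d", &major, &minor, &patch) != 3)
        return -1;

    /* Even minor number: stable; odd minor number: development */
    return (minor % 2 == 0) ? 1 : 0;
}
```

Calling kernel_is_stable("2.0.30") classifies the release as stable, while "2.1.42" is classified as a development kernel.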
How Are the Kernel Sources Arranged?
• The source tree is usually unpacked under /usr/src/linux-2.6.x
• arch:
– The arch subdirectory contains all of the architecture-specific kernel code.
– It has further subdirectories, one per supported architecture, for example i386, mips, and alpha.
• include:
– The include subdirectory contains most of the include files needed to build the kernel code.
– It too has further subdirectories, including one for every supported architecture.
– The include/asm subdirectory is a soft link to the real include directory needed for this architecture, for example include/asm-i386.
– To change architectures you need to edit the kernel makefile and rerun the Linux kernel configuration program.
• init:
– This directory contains the initialization code for the kernel, and it is a very good place to start looking at how the kernel works.
• mm:
– This directory contains all of the memory management code.
– The architecture-specific memory management code lives in arch/*/mm/, for example arch/i386/mm/fault.c.
• drivers:
– All of the system's device drivers live in this directory.
– They are further subdivided into classes of device driver, for example block.
• ipc:
– This directory contains the kernel's inter-process communication code.
• modules:
– This is simply a directory used to hold built modules.
• fs:
– All of the file system code. This is further subdivided into directories, one per supported file system, for example vfat, ext2, and ext3.
• kernel:
– The main kernel code. Again, the architecture-specific kernel code is in arch/*/kernel.
• net:
– The kernel's networking code.
• lib:
– This directory contains the kernel's library code.
– The architecture-specific library code can be found in arch/*/lib/.
• Documentation:
– The kernel's documentation.
– Do not forget to read it.
Where to Start Looking?
• System Startup and Initialization
• Memory Management
• Kernel
• PCI
• Interprocess Communication
• Interrupt Handling
• Device Drivers
• File Systems
• Network
• Modules
System Startup and Initialization
• On an Intel based system, the kernel starts when either loadlin.exe or LILO has loaded the kernel into memory and passed control to it.
• Look in arch/i386/kernel/head.S for this part. head.S does some architecture-specific setup and then jumps to the start_kernel() routine in init/main.c (older kernels called this routine main()).
Memory Management
• This code is mostly in mm but the architecture specific code is in arch/*/mm.
• The page fault handling code is in mm/memory.c and the memory mapping and page cache code is in mm/filemap.c.
• The buffer cache is implemented in fs/buffer.c and the swap cache in mm/swap_state.c and mm/swapfile.c.
Kernel
• Most of the relevant generic code is in kernel, with the architecture-specific code in arch/*/kernel.
• The scheduler is in kernel/sched.c and the fork code is in kernel/fork.c.
• The bottom half handling code is in include/linux/interrupt.h.
• The task_struct data structure can be found in include/linux/sched.h.
PCI
• The PCI pseudo driver is in drivers/pci/pci.c with the system wide definitions in include/linux/pci.h.
• Each architecture has some specific PCI BIOS code; Alpha AXP's is in arch/alpha/kernel/bios32.c.
Interprocess Communication
• The System V IPC code is all in ipc. All System V IPC objects include an ipc_perm data structure, which can be found in include/linux/ipc.h.
• System V messages are implemented in ipc/msg.c, shared memory in ipc/shm.c and semaphores in ipc/sem.c.
• Pipes are implemented in fs/pipe.c.
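The kernel's pipe code is what a process reaches through the pipe(), write() and read() system calls. A minimal user-space sketch, assuming a POSIX system and a message short enough to fit in the pipe buffer; the helper name pipe_roundtrip is hypothetical:

```c
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): push msg through a pipe and
 * read it back into out (outlen must be at least 1).
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t outlen)
{
    int fds[2];                 /* fds[0] = read end, fds[1] = write end */
    size_t len = strlen(msg);
    ssize_t n;

    if (pipe(fds) == -1)
        return -1;

    /* Write into the pipe; the kernel buffers the data */
    if (write(fds[1], msg, len) != (ssize_t)len) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);              /* no more writers: reader will see EOF */

    /* Read the buffered data back out of the kernel */
    n = read(fds[0], out, outlen - 1);
    close(fds[0]);
    if (n >= 0)
        out[n] = '\0';
    return n;
}
```

In a real program the two ends would normally belong to different processes (for example a parent and a forked child); a single process is used here only to keep the sketch short.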
Interrupt Handling
• The kernel's interrupt handling code is almost all microprocessor (and often platform) specific.
• The Intel interrupt handling code is in arch/i386/kernel/irq.c and its definitions in include/asm-i386/irq.h.
Device Drivers
• Most of the lines of the Linux kernel's source code are in its device drivers.
• All of Linux's device driver sources are held in drivers, but these are further broken out by type:
– /cdrom : All of the CDROM code for Linux. It is here that the special CDROM devices (such as Soundblaster CDROM) can be found. Note that the IDE CD driver is ide-cd.c in drivers/block and that the SCSI CD driver is sr.c in drivers/scsi.
– /pci : These are the sources for the PCI pseudo-driver. A good place to look at how the PCI subsystem is mapped and initialized. The Alpha AXP PCI fixup code is also worth looking at in arch/alpha/kernel/bios32.c.
– /scsi : This is where to find all of the SCSI code as well as all of the drivers for the SCSI devices supported by Linux.
File Systems
• The sources for the EXT2 file system are all in the fs/ext2/ directory with data structure definitions in include/linux/ext2_fs.h, ext2_fs_i.h and ext2_fs_sb.h.
• The Virtual File System data structures are described in include/linux/fs.h and the code is in fs/*.
• The buffer cache is implemented in fs/buffer.c along with the update kernel daemon.
Data Structure of ext3 fs: ext3_fs.h (excerpt)
/*
 * Structure of an inode on the disk
 */
struct ext3_inode {
	__le16	i_mode;		/* File mode */
	__le16	i_uid;		/* Low 16 bits of Owner Uid */
	__le32	i_size;		/* Size in bytes */
	__le32	i_atime;	/* Access time */
	__le32	i_ctime;	/* Creation time */
	__le32	i_mtime;	/* Modification time */
	__le32	i_dtime;	/* Deletion Time */
	__le16	i_gid;		/* Low 16 bits of Group Id */
	__le16	i_links_count;	/* Links count */
	__le32	i_blocks;	/* Blocks count */
	__le32	i_flags;	/* File flags */
Here is the description of the traditional FAT entry in the current Windows 95 filesystem (from docs):
struct directory { // Short 8.3 names
	unsigned char name[8];		// file name
	unsigned char ext[3];		// file extension
	unsigned char attr;		// attribute byte
	unsigned char lcase;		// Case for base and extension
	unsigned char ctime_ms;		// Creation time, milliseconds
	unsigned char ctime[2];		// Creation time
	unsigned char cdate[2];		// Creation date
	unsigned char adate[2];		// Last access date
	unsigned char reserved[2];	// reserved values (ignored)
	unsigned char time[2];		// time stamp
	unsigned char date[2];		// date stamp
	unsigned char start[2];		// starting cluster number
	unsigned char size[4];		// size of the file
};
FAT fs data structure: msdos_fs.h
struct msdos_dir_entry {
	__u8	name[8],ext[3];	/* name and extension */
	__u8	attr;		/* attribute bits */
	__u8	lcase;		/* Case for base and extension */
	__u8	ctime_cs;	/* Creation time, centiseconds (0-199) */
	__le16	ctime;		/* Creation time */
	__le16	cdate;		/* Creation date */
	__le16	adate;		/* Last access date */
	__le16	starthi;	/* High 16 bits of cluster in FAT32 */
	__le16	time,date,start;/* time, date and first cluster */
	__le32	size;		/* file size (in bytes) */
};
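To see how the starthi and start fields combine into a FAT32 cluster number, here is a minimal user-space sketch. The struct fat_dir_entry mirror and the helpers le16_at and fat_first_cluster are hypothetical names introduced for illustration; the sketch assumes a raw 32-byte directory entry and a common ABI on which the mirror struct has no padding:

```c
#include <stdint.h>

/*
 * User-space mirror of the on-disk entry (assumption for illustration;
 * kernel code uses __u8/__le16/__le32). On common ABIs this is 32 bytes.
 */
struct fat_dir_entry {
    uint8_t  name[8], ext[3];   /* name and extension */
    uint8_t  attr;              /* attribute bits */
    uint8_t  lcase;             /* case for base and extension */
    uint8_t  ctime_cs;          /* creation time, centiseconds */
    uint16_t ctime, cdate, adate;
    uint16_t starthi;           /* high 16 bits of cluster (offset 20) */
    uint16_t time, date, start; /* start = low 16 bits (offset 26) */
    uint32_t size;              /* file size in bytes */
};

/* Read a little-endian 16-bit value one byte at a time (host-independent) */
static uint16_t le16_at(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Combine the high and low halves into the first cluster number */
uint32_t fat_first_cluster(const uint8_t raw[32])
{
    uint32_t hi = le16_at(raw + 20);    /* starthi */
    uint32_t lo = le16_at(raw + 26);    /* start */

    return (hi << 16) | lo;
}
```

Reading the fields byte by byte avoids depending on the host byte order, which is the same concern the __le16 and __le32 annotations express in the kernel structure.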
Network
• The networking code is kept in net with most of the include files in include/net.
• The BSD socket code is in net/socket.c and the IP version 4 INET socket code is in net/ipv4/af_inet.c.
• The generic protocol support code (including the sk_buff handling routines) is in net/core with the TCP/IP networking code in net/ipv4 and net/ipv6. The network device drivers are in drivers/net.
TCP Finite State Machine (FSM)
• net/ipv4/tcp.c
– TCP_SYN_SENT
– TCP_SYN_RECV
– TCP_ESTABLISHED
– TCP_FIN_WAIT1
– TCP_FIN_WAIT2
– TCP_CLOSING
– TCP_TIME_WAIT
– TCP_CLOSE_WAIT
– TCP_LAST_ACK
– TCP_CLOSE
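The state names above can be tied together with a small transition function. This is a simplified sketch of the textbook TCP state diagram (RFC 793), not the kernel's implementation: the DEMO_ names and the event set are hypothetical, and the passive-open path through TCP_SYN_RECV is omitted.

```c
/* Hypothetical state and event names, used only for this illustration */
enum demo_state {
    DEMO_CLOSE, DEMO_SYN_SENT, DEMO_ESTABLISHED, DEMO_FIN_WAIT1,
    DEMO_FIN_WAIT2, DEMO_CLOSING, DEMO_TIME_WAIT, DEMO_CLOSE_WAIT,
    DEMO_LAST_ACK
};

enum demo_event {
    DEMO_EV_ACTIVE_OPEN,    /* application calls connect() */
    DEMO_EV_RCV_SYN_ACK,    /* SYN+ACK received from the peer */
    DEMO_EV_APP_CLOSE,      /* application closes the socket */
    DEMO_EV_RCV_ACK,        /* ACK of our FIN received */
    DEMO_EV_RCV_FIN,        /* FIN received from the peer */
    DEMO_EV_TIMEOUT         /* TIME_WAIT timer expired */
};

/* Return the next state; unknown combinations leave the state unchanged */
enum demo_state demo_tcp_next(enum demo_state s, enum demo_event e)
{
    switch (s) {
    case DEMO_CLOSE:
        if (e == DEMO_EV_ACTIVE_OPEN) return DEMO_SYN_SENT;
        break;
    case DEMO_SYN_SENT:
        if (e == DEMO_EV_RCV_SYN_ACK) return DEMO_ESTABLISHED;
        break;
    case DEMO_ESTABLISHED:
        if (e == DEMO_EV_APP_CLOSE) return DEMO_FIN_WAIT1;  /* active close */
        if (e == DEMO_EV_RCV_FIN)   return DEMO_CLOSE_WAIT; /* passive close */
        break;
    case DEMO_FIN_WAIT1:
        if (e == DEMO_EV_RCV_ACK) return DEMO_FIN_WAIT2;
        if (e == DEMO_EV_RCV_FIN) return DEMO_CLOSING;      /* simultaneous close */
        break;
    case DEMO_FIN_WAIT2:
        if (e == DEMO_EV_RCV_FIN) return DEMO_TIME_WAIT;
        break;
    case DEMO_CLOSING:
        if (e == DEMO_EV_RCV_ACK) return DEMO_TIME_WAIT;
        break;
    case DEMO_TIME_WAIT:
        if (e == DEMO_EV_TIMEOUT) return DEMO_CLOSE;
        break;
    case DEMO_CLOSE_WAIT:
        if (e == DEMO_EV_APP_CLOSE) return DEMO_LAST_ACK;
        break;
    case DEMO_LAST_ACK:
        if (e == DEMO_EV_RCV_ACK) return DEMO_CLOSE;
        break;
    }
    return s;
}
```

Feeding the events for an active open followed by an active close walks the connection through SYN_SENT, ESTABLISHED, FIN_WAIT1, FIN_WAIT2 and TIME_WAIT before it returns to CLOSE.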
Big-Endian & Little-Endian
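• Big-endian: the most significant byte is stored at the lowest address.
• Little-endian: the least significant byte is stored at the lowest address.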
Endianness in networking
• Networks generally use big-endian order.
• In fact, the Internet Protocol (IP) defines a standard big-endian network byte order.
• This byte order is used for all numeric values in the packet headers and by many higher-level protocols and file formats that are designed for use over IP.
• The Berkeley sockets API defines a set of functions to convert 16- and 32-bit integers to and from network byte order:
– The htonl (host-to-network-long) and htons (host-to-network-short) functions convert 32-bit and 16-bit values respectively from machine (host) to network order.
– The ntohl and ntohs functions convert from network to host order.
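The conversion functions can be tried from user space. A minimal sketch, assuming a POSIX system that provides <arpa/inet.h>; the helper names put_port and get_port are hypothetical:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a 16-bit port in buf in network byte order */
void put_port(uint8_t buf[2], uint16_t port)
{
    uint16_t wire = htons(port);        /* host order -> network order */

    memcpy(buf, &wire, sizeof wire);
}

/* Hypothetical helper: read a 16-bit port stored in network byte order */
uint16_t get_port(const uint8_t buf[2])
{
    uint16_t wire;

    memcpy(&wire, buf, sizeof wire);
    return ntohs(wire);                 /* network order -> host order */
}
```

Whatever the host byte order, put_port(buf, 0x1F90) leaves 0x1F in buf[0] and 0x90 in buf[1], because network byte order always puts the most significant byte first.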
Determining the byte order
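#define BIG_ENDIAN 0
#define LITTLE_ENDIAN 1
/* Returns LITTLE_ENDIAN or BIG_ENDIAN for the machine the code runs on. */
int TestByteOrder(void)
{
	short int word = 0x0001;	/* only the low-order byte is non-zero */
	char *byte = (char *) &word;	/* points at the lowest-addressed byte */
	return byte[0] ? LITTLE_ENDIAN : BIG_ENDIAN;
}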
Endianness in the kernel: include/linux/ip.h
struct iphdr {
#if defined(__LITTLE_ENDIAN_BITFIELD)
	__u8	ihl:4,
		version:4;
#elif defined (__BIG_ENDIAN_BITFIELD)
	__u8	version:4,
		ihl:4;
#else
#error	"Please fix <asm/byteorder.h>"
#endif
	__u8	tos;
	__be16	tot_len;
	__be16	id;
	__be16	frag_off;
	__u8	ttl;
	__u8	protocol;
	__u16	check;
	__be32	saddr;
	__be32	daddr;
	/*The options start here. */
};
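The two bitfield orderings exist because the compiler, not the wire format, decides where bitfields sit inside a byte. On the wire the version is always the high nibble of the first byte and ihl the low nibble. A minimal sketch that extracts these fields with shifts and masks from a raw packet buffer, so that it does not depend on bitfield layout; the function names are hypothetical:

```c
#include <stdint.h>

/* Version: high 4 bits of the first header byte */
unsigned ip_version(const uint8_t *pkt)
{
    return pkt[0] >> 4;
}

/* Header length in bytes: ihl (low 4 bits) counts 32-bit words */
unsigned ip_header_len(const uint8_t *pkt)
{
    return (pkt[0] & 0x0Fu) * 4u;
}

/* Total length: 16-bit big-endian value at byte offsets 2 and 3 */
uint16_t ip_total_len(const uint8_t *pkt)
{
    return (uint16_t)((pkt[2] << 8) | pkt[3]);
}
```

For a typical IPv4 header without options the first byte is 0x45, giving version 4 and a header length of 20 bytes.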
Endianness in TCP Header: include/linux/tcp.h (excerpt)
struct tcphdr {
	__u16	source;
	__u16	dest;
	__u32	seq;
	__u32	ack_seq;
#if defined(__LITTLE_ENDIAN_BITFIELD)
	__u16	res1:4,
		doff:4,
		fin:1,
		syn:1,
		rst:1,
		psh:1,
		ack:1,
		urg:1,
		ece:1,
		cwr:1;
#elif defined(__BIG_ENDIAN_BITFIELD)
	__u16	doff:4,
		res1:4,
		cwr:1,
		ece:1,
		urg:1,
		ack:1,
		psh:1,
		rst:1,
		syn:1,
		fin:1;
Modules
• The kernel module code is partially in the kernel and partially in the modules package. The kernel code is all in kernel/module.c, with the data structures and the kernel daemon kerneld messages in include/linux/module.h and include/linux/kerneld.h respectively.
• You may want to look at the structure of an ELF object file in include/linux/elf.h.
Thank you!