Linux Internals Training - Minh, Inc · Day 1 Morning Lecture - Introduction to Linux GNU Project/GPL Licensing Evolution of Linux & Development Model Device Identities in Linux -
Overview
Understanding virtual memory, process concepts, IPC, file systems (EXT2)
Understanding shell programming
Understanding the boot process
Understanding cross compilation and installing Linux on embedded hardware
Understanding application development for embedded systems
Duration
Five days - 40 hours (8 hours a day)
50% lecture, 50% practical labs.
Audience
Professional software developers
People supporting embedded and medium-scale products.
Prerequisite
Knowledge of C programming is required; all examples are provided in the C programming language.
C training slides can be browsed at http://www.minhinc.com/training/c/advance-c-slides.php
Pdf document can be downloaded from http://www.minhinc.com/training/advance-c-slides.pdf
GNU Project/GPL Licensing
Evolution of Linux & Development Model
Device Identities in Linux - Partitioning Scheme
/dev files
Major Minor device number
mknod system call
Lecture - Introduction to Kernel
History of Linux
Types of Kernel
The Linux kernel
Kernel Architecture
Lecture - Shell commands & Shell
Basic Shell commands
Bash Shell Essentials
- Introduction
- Process
- Redirection
- Shell Programming
- Programming Commands
- Advance Shell Programming
- Function
- Array
- I/O Redirection and file descriptor
- Local and Global variables
- Conditional Execution
Creating Makefiles
Lecture
Lecture sessions are course content presentations by the trainer. Any source code example related to the topic will be demonstrated, including execution of the binaries. Complete lecture material can be downloaded from http://www.minhinc.com/training/advance-li-slides.pdf
Labs
mknod
Write a Makefile to compile a file
Create a static library using a Makefile
Create a dynamic library using a Makefile
Write an application using the static library and dynamic library generated
Day 2 Morning
Lecture - Creating Libraries
Creating Static Library
- Using Static Library
Creating Shared Library
- Using Shared Library
Defining and Creating secondary memory areas
Memory allocation & deallocation system calls: malloc, calloc, alloca, free
Demand Paging defined
Process Organization in Memory
Address Translation and page fault handling
Virtual Memory Management
Day 3 Afternoon
Lab
Implement late binding
Create a hard link
Create a soft link
Write a program to print the stat structure for both a hard link and a soft link. Illustrate which fields differ
Create a child process and validate that all open descriptors are copied to the child process.
- Seek in the file from the parent and check that the child's descriptor has also moved.
Day 4 Morning
Lecture - Multi Thread Programming
Creating multiple threads
Parent synchronization with other threads
Write a multithreaded application and check whether global variables are shared.
- Protect them using semaphores
Day 5 Morning
Lecture - Network Programming
TCP Server Client Programming
UDP Server Client Programming
Netlink socket interface
Lecture - Programming and Debugging tools
strace: Tracing system calls
ltrace: Tracing library calls
Tools used to detect memory access errors and memory leakage in Linux: mtrace
Using gdb and ddd utilities
Core dump analysis etc.
Lecture - Device Driver Introduction
Introduction
Kernel modules
Character device drivers
Block device drivers
Hardware and Interrupt Handling
Linux Internals Essentials - Training Course
Minh, Inc.
DISCLAIMER
Text of this document is written in Bembo Std Otf (13 pt) font.
Code parts are written in Consolas (10 pt) font.
This training material is provided through Minh, Inc., B'lore, India.
Pdf version of this document is available at http://www.minhinc.com/training/advance-li-slides.pdf
For suggestion(s) or complaint(s) write to us at [email protected]
Document modified on Sep-30-2019
Day 1 Morning
1. Introduction to Linux
GNU Project/GPL Licensing
Evolution of Linux & Development Model
Device Identities in Linux - Partitioning Scheme
a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any changes.
b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License.
c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.)
* 1991: The Linux kernel is publicly announced on 25 August by the 21-year-old Finnish student Linus Benedict Torvalds.[13]
* 1992: The Linux kernel is relicensed under the GNU GPL. The first Linux distributions are created.
* 1993: Over 100 developers work on the Linux kernel. With their assistance the kernel is adapted to the GNU environment, which creates a large spectrum of application types for Linux. The oldest currently (as of 2015) existing Linux distribution, Slackware, is released for the first time. Later in the same year, the Debian project is established. Today it is the largest community distribution.
* 1994: Torvalds judges all components of the kernel to be fully matured: he releases version 1.0 of Linux. The XFree86 project contributes a graphical user interface (GUI). Commercial Linux distribution makers Red Hat and SUSE publish version 1.0 of their Linux distributions.
* 1995: Linux is ported to the DEC Alpha and to the Sun SPARC. Over the following years it is ported to an ever greater number of platforms.
* 1996: Version 2.0 of the Linux kernel is released. The kernel can now serve several processors at the same time using symmetric multiprocessing (SMP), and thereby becomes a serious alternative for many companies.
* 1998: Many major companies such as IBM, Compaq and Oracle announce their support for Linux. The Cathedral and the Bazaar is first published as an essay (later as a book), resulting in Netscape publicly releasing the source code to its Netscape Communicator web browser suite. Netscape's actions and crediting of the essay[50] bring Linux's open source development model to the attention of the popular technical press. In addition a group of programmers begins developing the graphical user interface KDE.
* 1999: A group of developers begins work on the graphical environment GNOME, destined to become a free replacement for KDE, which at the time depends on the then proprietary Qt toolkit. During the year IBM announces an extensive project for the support of Linux.
* 2000: Dell announces that it is now the No. 2 provider of Linux-based systems worldwide and the first major manufacturer to offer Linux across its full product line.
* 2002: The media reports that "Microsoft killed Dell Linux".[52]
* 2004: The XFree86 team splits up and joins with the existing X standards body to form the X.Org Foundation, which results in a substantially faster development of the X server for Linux.
* 2005: The project openSUSE begins a free distribution from Novell's community. Also the project OpenOffice.org introduces version 2.0, which starts supporting OASIS OpenDocument standards.
* 2006: Oracle releases its own distribution of Red Hat Enterprise Linux. Novell and Microsoft announce cooperation for a better interoperability and mutual patent protection.
* 2007: Dell starts distributing laptops with Ubuntu pre-installed on them.
* 2009: RedHat's market capitalization equals Sun's, interpreted as a symbolic moment for the "Linux-based economy".[53]
* 2011: Version 3.0 of the Linux kernel is released.
* 2012: The aggregate Linux server market revenue exceeds that of the rest of the Unix market.[54]
* 2013: Google's Linux-based Android claims 75% of the smartphone market share, in terms of the number of phones shipped.[55]
* 2014: Ubuntu claims 22,000,000 users.[56]
* 2015: Version 4.0 of the Linux kernel is released.
Devices come in two flavours:
- A character device represents a hardware device that reads or writes a serial stream of data bytes. Examples are serial and parallel ports, tape drives, terminal devices, and sound cards.
- A block device represents a hardware device that reads or writes data in fixed-size blocks. Unlike a character device, a block device provides random access to the data stored on the device. A disk drive is an example of a block device.
Linux identifies devices using two numbers: the major device number and the minor device number.
The major device number generally identifies a driver, whereas the minor number identifies a device controlled by that driver, so the actual device is identified by the major:minor combination. A device can be a master or a slave. Masters are identified with 1, 2, 3... and slaves with 65, 66, 67...
For each device there is a device file (device entry) in the file system. The cp, rm and mv commands work on a device file as on a regular file; data is transferred to and from the actual device through the device driver. Use mknod to create the file entry for a device.
$ mknod ./lp0 c 6 0
lp0 - path to the device file
c - character device (b for block device)
6 - major device number, driver id
0 - minor device number (master)
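The major:minor identity described above can also be read back from an existing device file. The following is a minimal sketch, not part of the course material: it assumes GNU coreutils stat (whose %t and %T formats print the major and minor numbers in hexadecimal), and the function name dev_id is made up for this illustration.

```bash
#!/bin/bash
# Print the type (c or b) and the decimal major and minor numbers of a device file.
# Assumption: GNU coreutils stat, where %t is the major and %T the minor number in hex.
dev_id() {
    local path=$1 type
    if [ -c "$path" ]; then
        type=c                      # character device
    elif [ -b "$path" ]; then
        type=b                      # block device
    else
        echo "$path: not a device file" >&2
        return 1
    fi
    set -- $(stat -c '%t %T' "$path")    # now $1 = major (hex), $2 = minor (hex)
    echo "$type $((0x$1)) $((0x$2))"     # convert hex to decimal
}

dev_id /dev/null
```

On a typical Linux system /dev/null is the character device with major 1 and minor 3, so the three printed fields are exactly the last three arguments one would pass to mknod (as root) to recreate that entry under another name.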
What's a Kernel?
- AKA: executive, system monitor.
- Controls and mediates access to hardware.
- Implements and supports fundamental abstractions:
  - Processes, files, devices etc.
- Schedules / allocates system resources:
  - Memory, CPU, disk, descriptors, etc.
- Enforces security and protection.
- Responds to user requests for service (system calls).
- Etc...
Kernel Design Goals
- Performance: efficiency, speed.
  - Utilize resources to capacity with low overhead.
- Stability: robustness, resilience.
  - Uptime, graceful degradation.
- Capability: features, flexibility, compatibility.
- Security, protection.
  - Protect users from each other & system from bad users.
- Portability.
- Extensibility.
Day 1 Morning
2. Introduction to Kernel
History of Linux
Types of kernel
The Linux Kernel
Kernel Architecture
Types of Kernel
- Monolithic.
- Layered.
- Modularized.
- Micro-kernel.
- Virtual machine.
A monolithic kernel is a kernel in which all services (file systems, VFS, device drivers, etc.) and the core functionality (scheduling, memory allocation, etc.) form a tightly knit group sharing the same address space. This is the direct opposite of a microkernel.
In a monolithic kernel architecture the entire operating system runs in kernel space, in supervisor mode. Unlike other architectures, the monolithic kernel alone defines a high-level virtual interface over the computer hardware, with a set of primitives or system calls that implement all operating system services, such as process management, concurrency and memory management, together with one or more device drivers as modules.
A microkernel isolates core functionality from system services and device drivers (which are basically just system services). For instance, the VFS (virtual file system) and block device file systems (e.g. minixfs) are separate processes that run outside the kernel's address space and use IPC to communicate with the kernel, other services and user processes. In short, what is a module in Linux is a service, that is, an isolated process, in a microkernel.
Recent versions of Windows, on the other hand, use a hybrid kernel.
A hybrid kernel is a kernel architecture that combines aspects of the microkernel and monolithic kernel architectures. The category is controversial because of its similarity to the monolithic kernel; the term has been dismissed by some as simple marketing. The traditional kernel categories are monolithic kernels and microkernels (with nanokernels and exokernels seen as more extreme versions of microkernels).
Linux Source Tree
linux/arch
- Subdirectories for each current port.
- Each contains kernel, lib, mm, boot and other directories whose contents override code stubs in architecture independent code.
- lib directory contains highly-optimized common utility routines such as memcpy, checksums, etc.
- arch directory as of 2.4:
  - alpha, arm, i386, ia64, m68k, mips, mips64.
  - ppc, s390, sh, sparc, sparc64.
linux/drivers
- Largest amount of code in the kernel tree (~1.5M).
- device, bus, platform and general directories.
- drivers/char - n_tty.c is the default line discipline.
- drivers/block - elevator.c, genhd.c, linear.c, ll_rw_blk.c, raidN.c.
- drivers/net - specific drivers and general routines Space.c and net_init.c.
- drivers/scsi - scsi_*.c files are generic; sd.c (disk), sr.c (CD-ROM), st.c (tape), sg.c (generic).
- General:
  - cdrom, ide, isdn, parport, pcmcia, pnp, sound, telephony, video.
- Buses:
  - fc4, i2c, nubus, pci, sbus, tc, usb.
- Platforms:
  - acorn, macintosh, s390, sgi.
linux/include
- Header info needed both by the kernel and user apps.
- Usually linked to /usr/include/linux.
- Kernel-only portions guarded by #ifdefs:
  #ifdef __KERNEL__
  /* kernel stuff */
  #endif
- Other directories:
  - math-emu, net, pcmcia, scsi, video.
linux/init
- Just two files: version.c, main.c.
- version.c - contains the version banner that prints at boot.
- main.c - architecture-independent boot code.
- start_kernel is the primary entry point.
linux/ipc
- System V IPC facilities.
- If disabled at compile-time, util.c exports stubs that simply return -ENOSYS.
- One file for each facility:
  - sem.c - semaphores.
  - shm.c - shared memory.
  - msg.c - message queues.
linux/kernel
- The core kernel code.
- sched.c - "the main kernel file":
  - scheduler, wait queues, timers, alarms, task queues.
- Process control:
  - fork.c, exec.c, signal.c, exit.c etc...
- Kernel module support:
  - kmod.c, ksyms.c, module.c.
- Other operations:
  - time.c, resource.c, dma.c, softirq.c, itimer.c.
  - printk.c, info.c, panic.c, sysctl.c, sys.c.
linux/lib
- Kernel code cannot call standard C library routines.
- Files:
  - brlock.c - "Big Reader" spinlocks.
  - cmdline.c - kernel command line parsing routines.
  - errno.c - global definition of errno.
  - inflate.c - "gunzip" part of gzip.c used during boot.
  - string.c - portable string code.
    - Usually replaced by optimized, architecture-dependent routines.
  - vsprintf.c - libc replacement.
3. Shell commands & Shell
Basic Shell commands
Bash Shell Essentials
- Introduction
- Process
- Shell Programming
- Programming commands
- Advance Shell Programming
- Function
- Array
- I/O Redirection and file descriptor
- Local and Global variables
- Conditional Execution
Creating Makefiles
Shell structure
Shell scripting has four components
1) Kernel
2) Shell Process
3) Command Process
4) Redirectors, Pipes, Filters etc.
Kernel does
- I/O management
- Process management
- File management
- Memory management
 -----------          -----------------            -------------
 |  User   | ------>  |  Linux Shell  | ---------> |  Kernel   |
 -----------          -----------------            -------------
                              |
                              V
                     -------------------
                     | command process |
                     -------------------
Shells
NOTE: To find your shell, type the following command
$ echo $SHELL
Syntax: command-name --help
Syntax: man command-name
Syntax: info command-name
$ date --help
$ ls --help | more
$ man ls
$ info bash
NOTE: In MS-DOS, you get help by using the /? switch or by typing help command, as in
C:\> dir /?
C:\> date /?
C:\> help time
C:\> help date
C:\> help
Linux Command
$ date
$ who
$ pwd
$ ls
$ cat > myfile
$ more myfile
$ mv sales
Process
A process is a program (a command given by the user) running to perform some job. When you start a process, Linux gives it a number called the PID (process ID). PID 0 is reserved for the kernel; the kernel's default limit is 32768, so user processes get PIDs from 1 up to 32767, and the current limit can be read from /proc/sys/kernel/pid_max.
$ ls -lR is a command, or request, to list the files in your current directory and all of its subdirectories.
Why Process required
Linux is a multi-user, multitasking OS, which means you can run two or more processes simultaneously. For example, to find how many files you have on your system you may give a command like
$ ls / -R | wc -l
This command takes a lot of time to go through all the files on your system, so you can run it in the background by giving the command
$ ls / -R | wc -l &
The ampersand (&) at the end of the command tells the shell to start the command (ls / -R | wc -l), run it in the background and take the next command immediately. An instance of a running command is called a process, and the number printed by the shell is called the process ID (PID); this PID can be used to refer to that specific running process.
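Inside a script the PID of a background command is not printed, but the shell keeps it in the $! variable. The following minimal sketch shows how a script might start a command with &, record its PID and wait for it to finish; the function name run_bg is made up for this illustration.

```bash
#!/bin/bash
# Start a command in the background, report its PID, then wait for it.
run_bg() {
    "$@" &                 # '&' starts the command in the background
    local pid=$!           # $! holds the PID of the most recent background command
    echo "started PID $pid"
    wait "$pid"            # block until that process finishes; returns its exit status
    echo "exit status $?"
}

run_bg sleep 1
```

The PID printed here is the same number the interactive shell prints after a command ending in &, and it can be passed to commands such as kill or wait to refer to that specific process.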
Redirection of Standard output/input or Input - Output redirection
(1) > Redirector Symbol (truncate to zero length and write)
Syntax: Linux-command > filename
$ ls > myfiles
(2) >> Redirector Symbol (append)
Syntax: Linux-command >> filename
$ date >> myfiles
(3) < Redirector Symbol
Syntax: Linux-command < filename
Takes input for Linux-command from a file instead of the keyboard. For example, to take input for the cat command give
$ cat < myfiles
Pipes
A pipe is a way to connect the output of one program to the input of another program without any temporary file.
A pipe is a temporary in-memory buffer where the output of one command is held and then passed as the input to a second command. Pipes are used to run two or more commands from the same command line.
Syntax: command1 | command2
Filter
A filter is a command that takes its input from standard input (for example from a pipe), transforms or restricts it, and writes the result to standard output.
$ tail -n +20 < hotel.txt | head -n 30 > hlist
Here head is a filter that takes its input from the tail command (tail starts selecting from line number 20 of the given file, hotel.txt) and whose output is redirected to the 'hlist' file. tail -n +20 is the POSIX spelling of the older tail +20.
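The tail and head combination above can be wrapped into a small reusable filter. This is a sketch only; the function name line_range and its argument order are assumptions made for this illustration.

```bash
#!/bin/bash
# Print lines FIRST..LAST of FILE by chaining two filters with a pipe.
line_range() {
    local file=$1 first=$2 last=$3
    # tail -n +N starts output at line N; head -n K keeps the first K lines of that
    tail -n +"$first" < "$file" | head -n "$((last - first + 1))"
}
```

For example, line_range hotel.txt 20 49 > hlist would write the same 30 lines as the tail/head command line shown above, assuming hotel.txt has at least 49 lines.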
Introduction to Shell Programming
A shell program is a series of Linux commands.
Variables in Linux
To process data, the shell process keeps values in variables.
1) System variables - created and maintained by Linux itself. By convention these variables are written in CAPITAL LETTERS.
2) User defined variables (UDV) - created and maintained by the user. By convention these variables are written in lowercase letters.
$ echo $USERNAME
$ echo $HOME
Caution: Do not modify system variables; this can sometimes create problems.
User Defined Variable
Syntax: variablename=value
NOTE: Here 'value' is assigned to the given 'variablename'. The value must be on the right side of the = sign, with no spaces around the = sign. For example
$ no=10 # this is ok
$ 10=no # Error, NOT ok, value must be on the right side of the = sign.
To define a variable called 'vech' having value Bus
$ vech=Bus
To define a variable called n having value 10
$ n=10
You can define a NULL variable as follows (a NULL variable is a variable which has no value at the time of definition). For example
$ vech=
$ vech=""
Try to print its value with $ echo $vech; nothing will be shown because the variable has no value, i.e. it is a NULL variable.
To print or access a variable use the following syntax
Syntax: $variablename
For example, to print the contents of the variable 'vech'
$ echo $vech
OR
Syntax: chmod 777 shell-script-name
(2) Run our script as
Syntax: ./your-shell-program-name
For example
$ ./first
OR /bin/sh your-shell-program-name
For example
$ bash first
$ /bin/sh first
To run the script by name alone, either give the complete path to the script file or add its directory to the PATH variable.
Commands Related with Shell Programming
(1) echo [options] [string, variables...]
Displays text or the value of variables on screen.
Options
-n Do not output the trailing new line.
-e Enable interpretation of the following backslash-escaped characters in the strings:
\a alert (bell)
\b backspace
\c suppress trailing new line
\n new line
\r carriage return
\t horizontal tab
\\ backslash
For example
$ echo -e "An apple a day keeps away \a\t\tdoctor"
(2) More about Quotes
There are three types of quotes
" i.e. Double quotes
' i.e. Single quotes
` i.e. Back quote
1. "Double quotes" - Characters enclosed in double quotes lose their special meaning (except \, $ and `).
2. 'Single quotes' - Text enclosed in single quotes remains unchanged.
3. `Back quote` - Executes the enclosed command and substitutes its output.
For example
$ echo "Today is date"
This can't print a message with today's date.
$ echo "Today is `date`".
Now it will print today's date as, Today is Tue Jan ....
Note that the `date` statement uses back quotes (see also the Shell Arithmetic NOTE).
(3) Shell Arithmetic
Used to perform arithmetic operations. For example
$ expr 1 + 3
$ expr 2 - 1
$ expr 10 / 2
$ expr 20 % 3 # remainder, read as 20 mod 3; the remainder is 2
$ expr 10 \* 3 # multiplication, use \* and not * since * is a wild card
$ echo `expr 6 + 3`
For the last statement note the following points
1) First, before the expr keyword we used the ` (back quote) sign, not the ' (single quote) sign. The back quote is generally found on the key under tilde (~) on PC keyboards, above the TAB key.
2) Second, the expr expression also ends with `, i.e. a back quote.
3) Here expr 6 + 3 is evaluated to 9, then the echo command prints 9 as the sum.
4) If you use double quotes or single quotes, it will NOT work. For example
$ echo "expr 6 + 3" # It will print expr 6 + 3
$ echo 'expr 6 + 3' # It will also print expr 6 + 3
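Besides the external expr command, POSIX shells such as bash can evaluate integer arithmetic themselves with the $(( )) expansion, and $( ) can be used in place of back quotes. The sketch below puts the two styles side by side; the function names mul_expr and mul_shell are made up for this illustration.

```bash
#!/bin/bash
# The same multiplication done by the external expr command and by the shell itself.
mul_expr()  { expr "$1" \* "$2"; }      # expr is a separate program; * must be escaped
mul_shell() { echo $(( $1 * $2 )); }    # $(( )) is evaluated by the shell; no escaping needed

echo "expr:  10 * 3 = $(mul_expr 10 3)"
echo "shell: 10 * 3 = $(mul_shell 10 3)"
echo "shell: 20 mod 3 = $((20 % 3))"
```

Both forms handle integers only. The $(( )) form avoids starting a new process for every calculation, which matters inside loops.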
Exit Status
In Linux, every command returns a value when it finishes:
- if the return value is zero (0), the command was successful;
- if the return value is nonzero (>0), the command was not successful or there was some sort of error executing the command/shell script.
This value is known as the exit status of the command. To determine the exit status, use the shell's $? variable. For example
$ rm unknow1file
rm: cannot remove 'unknow1file': No such file or directory
and after that if you give the command
$ echo $?
it will print a nonzero value (>0) to indicate an error. Now give the commands
$ ls
$ echo $?
It will print 0 to indicate that the command was successful.
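Because every command overwrites $?, a script normally copies it into a variable immediately after the command of interest. A minimal sketch of that habit follows; the function name report is an assumption made for this illustration.

```bash
#!/bin/bash
# Run a command quietly and report success or failure from its exit status.
report() {
    "$@" > /dev/null 2>&1
    local status=$?          # save $? at once; the next command would overwrite it
    if [ "$status" -eq 0 ]; then
        echo "ok: $*"
    else
        echo "failed ($status): $*"
    fi
    return "$status"         # pass the original exit status on to the caller
}

report ls /
```

Calling report rm unknow1file in a directory without such a file would print a line starting with "failed", followed by the nonzero status that rm returned.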
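If-then-fi for decision making in shell script
Start the calculator with
$ bc
and type 5 + 2 as follows
5+2
7
7 is the response of bc, i.e. the sum of 5 and 2. You can also try
5-2
5/2
Now see what happens if you type 5 > 2 as follows
5>2
1
bc prints 1 when a comparison is true and 0 when it is false. The shell uses the opposite convention for exit status: 0 means true (success) and nonzero means false.
Syntax:
if condition
then
    command1 if condition is true or if exit status of condition is 0 (zero)
    ...
    ...
fi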
test command or [ expr ]
The test command or [ expr ] is used to see if an expression is true; if it is true it returns zero (0), otherwise it returns nonzero (>0) for false.
Syntax: test expression OR [ expression ]
Now we will write a script that determines whether the given argument is a positive number. Write the script as follows
$ cat > ispostive
#!/bin/sh
#
# Script to see whether argument is positive
#
if test $1 -gt 0
then
    echo "$1 number is positive"
fi
Or
$ cat > ispostive
#!/bin/sh
#
# Script to see whether argument is positive
#
if [ $1 -gt 0 ]
then
    echo "$1 number is positive"
fi
test or [ expr ] works with
1. Integers (numbers without a decimal point)
2. File types
3. Character strings
For mathematics use the following operators in shell scripts
NOTE: == is equal, != is not equal.
For string comparisons use
The shell can also test for file and directory types
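The three kinds of tests can be tried out with a few standard operators of the test command: -ge and -lt for integers, -z and -n for strings, -d and -f for files. The sketch below is illustrative only; the function name describe and its three arguments are assumptions.

```bash
#!/bin/bash
# describe NUMBER STRING PATH: one integer test, one string test, one file test.
describe() {
    # Integer comparison: -eq -ne -lt -le -gt -ge
    if [ "$1" -ge 10 ]; then echo "number: $1 is at least 10"; fi
    if [ "$1" -lt 10 ]; then echo "number: $1 is below 10"; fi
    # String tests: -z (empty), -n (not empty), = and !=
    if [ -z "$2" ]; then echo "string: empty"; fi
    if [ -n "$2" ]; then echo "string: '$2'"; fi
    # File tests: -d (directory), -f (regular file)
    if [ -d "$3" ]; then echo "file: $3 is a directory"; fi
    if [ -f "$3" ]; then echo "file: $3 is a regular file"; fi
}

describe 12 "" /
```

Quoting the arguments ("$1", "$2", "$3") keeps the tests well-formed even when an argument is empty or contains spaces.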
if...else...fi
If the given condition is true then command1 is executed, otherwise command2 is executed.
Syntax:
if condition
then
    command1 if condition is true or if exit status of condition is 0 (zero)
    ...
    ...
else
    command2 if condition is false or if exit status of condition is >0 (nonzero)
    ...
    ...
fi
$ cat > isnump_n
#!/bin/sh
#
# Script to see whether argument is positive or negative
#
if [ $# -eq 0 ]
then
    echo "$0 : You must supply one integer"
    exit 1
fi
if test $1 -gt 0
then
    echo "$1 number is positive"
else
    echo "$1 number is negative"
fi
Multilevel if-then-else
Syntax:
if condition
then
    condition is zero (true - 0)
    execute all commands up to elif statement
elif condition1
then
    condition1 is zero (true - 0)
    execute all commands up to elif statement
elif condition2
then
    condition2 is zero (true - 0)
    execute all commands up to else statement
else
    None of the above condition, condition1, condition2 are true (i.e. all of the above nonzero or false)
    execute all commands up to fi
fi
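As a concrete instance of the multilevel syntax, the sketch below extends the positive/negative check so that zero and non-numeric input get their own branches. The function name sign_of is made up for this illustration.

```bash
#!/bin/bash
# Classify one argument with a multilevel if-then-elif-else.
sign_of() {
    if [ "$1" -gt 0 ]; then
        echo "positive"
    elif [ "$1" -lt 0 ]; then
        echo "negative"
    elif [ "$1" -eq 0 ]; then
        echo "zero"
    else
        # reached when every test above failed, e.g. because $1 is not an integer
        echo "not an integer"
    fi 2>/dev/null      # hide the error message test prints for non-integer input
}

sign_of 5
sign_of -2
sign_of 0
sign_of abc
```

The conditions are tried from top to bottom and only the first branch whose condition returns 0 is executed.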
for loop
Syntax:
for { variable name } in { list }
do
    execute once for each item in the list until the list is finished
    (and repeat all statements between do and done)
done
Suppose,
$ cat > testfor
for i in 1 2 3 4 5
do
    echo "Welcome $i times"
done
Run it as,
$ chmod +x testfor
$ ./testfor
while loop
Syntax:
while [ condition ]
do
    command1
    command2
    command3
    ..
    ....
done
$ cat > nt1
#!/bin/sh
#
# Script to test while statement
#
if [ $# -eq 0 ]
then
    echo "Error - Number missing from command line argument"
    echo "Syntax : $0 number"
    echo " Use to print multiplication table for given number"
    exit 1
fi
n=$1
i=1
while [ $i -le 10 ]
do
    echo "$n * $i = `expr $i \* $n`"
    i=`expr $i + 1`
done
The case Statement
The case statement is a good alternative to the multilevel if-then-else-fi statement. It enables you to match several values against one variable, and it is easier to read and write.
Syntax:
case $variable-name in
    pattern1) command
              ..
              command;;
    pattern2) command
              ..
              command;;
    patternN) command
              ..
              command;;
    *)        command
              ..
              command;;
esac
The $variable-name is compared against the patterns until a match is found. The shell then executes all the statements up to the two semicolons that are next to each other. The default is *) and it is executed if no match is found. For example, create the script as follows
$ cat > car
#
# if no vehicle name is given
# i.e. -z $1 is defined and it is NULL
#
# if no command line arg
if [ -z "$1" ]
then
    rental="*** Unknown vehicle ***"
elif [ -n "$1" ]
then
    # otherwise make first arg as rental
    rental=$1
fi
case $rental in
    "car") echo "For $rental Rs.20 per k/m";;
    "van") echo "For $rental Rs.10 per k/m";;
    "jeep") echo "For $rental Rs.5 per k/m";;
    "bicycle") echo "For $rental 20 paisa per k/m";;
    *) echo "Sorry, I can not get a $rental for you";;
esac
Save it by pressing CTRL+D
$ chmod +x car
$ ./car van
$ ./car car
$ ./car Maruti-800
The read Statement
Used to get input from the keyboard and store it in variables.
Syntax: read variable1 variable2 ... variableN
Create the script as
$ cat > sayH
#
# Script to read your name from key-board
#
echo "Your first name please:"
read fname
echo "Hello $fname, Lets be friend!"
Run it as follows
$ chmod +x sayH
$ ./sayH
Filename Shorthand or meta Characters (i.e. wild cards)
*, ? and [...] are such shorthand characters.
* Matches any string or group of characters.
For example
$ ls *      will show all files
$ ls a*     will show all files whose name starts with the letter 'a'
$ ls *.c    will show all files having the extension .c
$ ls ut*.c  will show all files having the extension .c whose name starts with the two letters 'ut'
? Matches any single character.
For example
$ ls ?      will show all files whose name is one character long
$ ls fo?    will show all files whose name is 3 characters long and begins with fo
[...] Matches any one of the enclosed characters.
For example
$ ls [abc]* will show all files beginning with the letter a, b or c
[..-..] A pair of characters separated by a minus sign denotes a range.
For example
$ ls /bin/[a-c]* will show all file names beginning with the letter a, b or c, like
/bin/arch /bin/awk /bin/bsh /bin/chmod /bin/cp
/bin/ash /bin/basename /bin/cat /bin/chown /bin/cpio
/bin/ash.static /bin/bash /bin/chgrp /bin/consolechars /bin/csh
But
$ ls /bin/[!a-o]*
$ ls /bin/[^a-o]*
If the first character following the [ is a ! or a ^, any character not enclosed is matched, so these commands show the file names that do not begin with a letter from a to o.
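The wild cards can be tried safely in a scratch directory. The sketch below creates a few empty files and lets the shell expand each pattern; the function name glob_demo and the file names are made up for this illustration, and LC_ALL=C is set so that the sort order and the [a-c] range behave the same on every system.

```bash
#!/bin/bash
# Expand several wild-card patterns against a known set of file names.
glob_demo() {
    local dir
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        export LC_ALL=C              # plain ASCII ordering and ranges
        touch apple bat cat dog.c emu.c fo1 fo22 x
        # The quoted label is printed as-is; the unquoted pattern is expanded by the shell.
        echo '*.c ->' *.c            # names ending in .c
        echo 'fo? ->' fo?            # fo followed by exactly one character
        echo '? ->' ?                # names one character long
        echo '[a-c]* ->' [a-c]*      # names starting with a, b or c
        echo '[!a-c]* ->' [!a-c]*    # names NOT starting with a, b or c
    )
    rm -rf "$dir"
}

glob_demo
```

The [!...] form is the portable spelling of negation; [^...] is also accepted by bash.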
command1;command2
To run two commands from one command line.
For example
$ date;who
will print today's date followed by the users who are currently logged in.
/dev/null - Used to discard unwanted output of a program
Syntax: command > /dev/null
For example
$ ls > /dev/null
The output of this command is not shown on screen; it is sent to this special file. The /dev directory contains other device files. The files in this directory mostly represent peripheral devices such as disks (e.g. floppy disks), sound cards, line printers etc.
Local and Global Shell variable (export command)
Normally all our variables are local. A local variable can be used only in the same shell; if you load another copy of the shell (by typing /bin/bash at the $ prompt), the new shell ignores all the old shell's variables. Consider the following example
$ vech=Bus
$ echo $vech
Bus
$ /bin/bash
$ echo $vech
NOTE: Empty line printed
$ vech=Car
$ echo $vech
Car
$ exit
$ echo $vech
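The export command named in the heading turns a local variable into a global one by copying it into the environment of every child process started afterwards. The following is a minimal sketch of the difference; the helper name child_sees is an assumption made for this illustration.

```bash
#!/bin/bash
# Show what a child shell sees before and after export.
child_sees() { sh -c 'echo "[$vech]"'; }   # single quotes: $vech is expanded by the child shell

vech=Bus
before=$(child_sees)     # vech is local to this shell, so the child prints []
export vech
after=$(child_sees)      # vech is now in the environment, so the child prints [Bus]
sh -c 'vech=Car'         # a change made inside the child...
echo "before export: $before"
echo "after export:  $after"
echo "parent still has: $vech"   # ...never travels back to the parent
```

Export works in one direction only: children receive a copy of the variable, and nothing they do to that copy changes the parent's value.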
Conditional execution i.e. && and ||
The control operators are && (read as AND) and || (read as OR). An AND list has the
Syntax: command1 && command2
Here command2 is executed if, and only if, command1 returns an exit status of zero. An OR list has the
Syntax: command1 || command2
Here command2 is executed if, and only if, command1 returns a non-zero exit status. You can use both as follows
command1 && command2 (if exit status is zero) || command3 (if exit status is non-zero)
Here if command1 is executed successfully then the shell will run command2, and if command1 is not successful then command3 is executed. For example
$ rm myf && echo File is removed successfully || echo File is not removed
If the file (myf) is removed successfully (exit status is zero) then the "echo File is removed successfully" statement is executed, otherwise the "echo File is not removed" statement is executed (since the exit status is non-zero).
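One detail worth knowing about the combined form: command3 also runs when command1 succeeds but command2 fails, so it is not an exact replacement for if...else. The sketch below makes this visible by using true and false as stand-ins for real commands; the function names chain and branch are made up for this illustration.

```bash
#!/bin/bash
# $1 and $2 are the commands "true" or "false", standing in for command1 and command2.
chain() {
    $1 && { echo "command2 runs"; $2; } || echo "command3 runs"
}
branch() {
    if $1; then echo "command2 runs"; $2; else echo "command3 runs"; fi
}

echo "--- chain true false"
chain true false      # command1 succeeds, command2 fails: command3 runs as well
echo "--- branch true false"
branch true false     # with if...else only the first branch runs
```

In the rm example above this does not matter, because echo practically never fails; with a command2 that can fail, if...else is the safer choice.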
A function is a series of instructions/commands that performs a particular activity in the shell. To define a function use the following syntax:
function-name ( )
{
    command1
    command2
    .....
    ...
    commandN
    return
}
Where function-name is the name of your function, which executes these commands. A return statement terminates the function. For example, type SayHello() at the $ prompt as follows:
$ SayHello()
{
echo "Hello $LOGNAME, Have nice computing"
return
}
$ SayHello
Hello xxxxx, Have nice computing
To make a function available at login time, define it in /etc/bashrc (as root) or in ~/.bashrc.
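I/O Redirection and file descriptors
$ cat > myf
This is my file
^D
The above command sends the output of the cat command to the file myf. Redirection can send the standard output (file descriptor 1) and the standard error (file descriptor 2) of a command to a file, and can make a command read its standard input (file descriptor 0) from a file. In the session below, "> tmp1" alone captures only standard output, so the error message still appears on the terminal; adding "2>&1" sends standard error to the same file.
[sc@localhost ~]$ rm > tmp1
rm: missing operand
Try 'rm --help' for more information.
[sc@localhost ~]$ cat tmp1
[sc@localhost ~]$ rm > tmp1 2>&1
[sc@localhost ~]$ cat tmp1
rm: missing operand
Try 'rm --help' for more information.
[sc@localhost ~]$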
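What "> file 2>&1" amounts to at the system call level can be sketched with open() and dup2(). This is a minimal sketch, not the shell's actual code; the function name run_redirected() is hypothetical and the "command" is reduced to two write() calls.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Do in a child process what the shell does for:  command > path 2>&1
 * Returns 0 on success, -1 on failure.
 */
int run_redirected(const char *path)
{
    pid_t pid = fork();
    int status;

    if (pid == -1)
        return -1;

    if (pid == 0) {                                    /* child */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd == -1)
            _exit(1);
        if (dup2(fd, STDOUT_FILENO) == -1)             /* "> path" */
            _exit(1);
        if (dup2(STDOUT_FILENO, STDERR_FILENO) == -1)  /* "2>&1"   */
            _exit(1);
        if (fd > STDERR_FILENO)
            close(fd);                                 /* original fd no longer needed */
        if (write(STDOUT_FILENO, "to stdout\n", 10) != 10)
            _exit(1);
        if (write(STDERR_FILENO, "to stderr\n", 10) != 10)
            _exit(1);
        _exit(0);
    }

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The order of the two dup2() calls matters in the same way the order of "> tmp1" and "2>&1" matters on the command line: descriptor 2 is made a copy of whatever descriptor 1 refers to at that moment.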
Creating Makefiles
Constituents of a makefile
* Rules
* Variables
* Directives
  - Inclusion of another makefile
  - Conditional directives
* Comments
  - Text that follows the # symbol is treated as a comment
  - To include # literally, prefix it with \

* Explicit rules
  - Explicitly specify the prerequisites for a specific target
* Implicit rules
  - Take advantage of the knowledge make has about known patterns of files (e.g., .c, .cpp, .o, .s)
  - Further classified into pattern rules and suffix rules
Variables
Predefined
o Some commonly used variables predefined by GNU make:
  CC, CFLAGS, CPPFLAGS, LDFLAGS, and the automatic variables $@, $^, $<
  $@  name of the target
  $<  name of the first prerequisite
  $^  names of all prerequisites
  Example:
  foo1.o: foo1.c foo1.h
  	gcc -c $<
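User defined
# simply expanded ("const"): the right-hand side is evaluated once, at the point of assignment
ABC := 10
# recursively expanded ("non const"): the right-hand side is re-evaluated each time ABC is used
ABC = 10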
Command line variables
Variables can be defined or redefined from the command line:
$ make
$ make VAR1=abc VAR2=xyz
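Use the override directive so that unwanted command-line redefinitions of a variable are ignored, e.g.
override VAR1=dummy
VAR2=
All:
	echo VAR1 = $(VAR1)
	echo VAR2 = $(VAR2)
With this makefile, "make VAR1=abc VAR2=xyz" still prints VAR1 = dummy, while VAR2 takes the command-line value xyz.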
* Warning function
  - Very useful for debugging
  - Can be placed anywhere in a makefile
    $(warning TARGET not defined)
    outputs in the format <filename>:<linenum>: TARGET not defined
* Shell function
  - Can be used to invoke any external program
    today := $(shell date)
Dynamic Loading and Unloading
This functionality is available under Linux through the dlopen function:
dlopen ("libtest.so", RTLD_LAZY)
The second parameter is a flag that indicates how to bind symbols in the shared library. Include the <dlfcn.h> header file and link with the -ldl option to pick up the libdl library.
Both dlopen and dlsym return NULL if they do not succeed. In that event, you can call dlerror (with no parameters) to obtain a human-readable error message describing the problem.
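The calls above can be combined into a small helper. This is a sketch under assumptions: the function name call_unary() is hypothetical, and the looked-up symbol is assumed to have the type double (*)(double), as cos() in the C math library does.

```c
#include <dlfcn.h>
#include <stdio.h>

/*
 * Load `library`, look up `symbol` as a function of type double(double),
 * call it with `arg` and store the result in *out.
 * Returns 0 on success, -1 on failure (the reason is printed via dlerror()).
 */
int call_unary(const char *library, const char *symbol, double arg, double *out)
{
    void *handle;
    double (*fn)(double);

    handle = dlopen(library, RTLD_LAZY);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    /* Store the void * returned by dlsym() into the function pointer. */
    *(void **)(&fn) = dlsym(handle, symbol);
    if (fn == NULL) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(handle);
        return -1;
    }

    *out = fn(arg);
    dlclose(handle);          /* unload once the library is no longer needed */
    return 0;
}
```

For example, call_unary("libm.so.6", "cos", 0.0, &v) would load the math library on a typical glibc system; the library file name is an assumption about the target system. On older C libraries the program must be linked with -ldl.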
C++ file linking to C shared library
If you're writing the code in your shared library in C++, you will probably want to declare those functions and variables that you plan to access elsewhere with the extern "C" linkage specifier:
extern "C" void foo ();
This prevents the C++ compiler from mangling the function name, which would change the function's name from foo to a different, funny-looking name that encodes extra information about the function. A C compiler will not mangle names; it will use whichever name you give to your function or variable.
The start_kernel() function
Linux Boot flow
Booting Sequence
1. Turn on the power
2. CPU jumps to the address of the BIOS (0xFFFF0)
3. BIOS runs POST (Power-On Self Test)
4. Find bootable devices
5. Load and execute the boot sector from the MBR
6. Load the OS

BIOS refers to the software code run by a computer when it is first powered on. It is a program embedded on a chip, and its primary function is to recognize and control the various devices that make up the computer.
MBR (Master Boot Record)
- The OS is booted from a hard disk, where the Master Boot Record (MBR) contains the primary boot loader
- The MBR is a 512-byte sector, located in the first sector on the disk (sector 1 of cylinder 0, head 0)
- After the MBR is loaded into RAM, the BIOS yields control to it
- The first 446 bytes are the primary boot loader, which contains both executable code and error message text
- The next sixty-four bytes are the partition table, which contains a record for each of four partitions
- The MBR ends with two bytes that are defined as the magic number (0xAA55). The magic number serves as a validation check of the MBR
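The 446 + 64 + 2 byte layout can be checked with a few lines of C. This is a sketch: the function and macro names are hypothetical, and the detail that the partition type byte sits at offset 4 of each 16-byte entry comes from the classic PC partition table format, not from the slides.

```c
#include <stdint.h>

#define MBR_SIZE          512
#define MBR_BOOT_CODE     446   /* bytes of primary boot loader code */
#define MBR_PART_ENTRIES  4
#define MBR_PART_SIZE     16    /* 4 entries * 16 bytes = 64 bytes */

/* Returns 1 if the 512-byte sector ends with the 0xAA55 magic number. */
int mbr_is_valid(const uint8_t *sector)
{
    /* Stored little-endian: byte 510 is 0x55, byte 511 is 0xAA. */
    return sector[510] == 0x55 && sector[511] == 0xAA;
}

/* Returns the partition type byte of entry `index` (0..3), or -1 if out of range. */
int mbr_partition_type(const uint8_t *sector, int index)
{
    if (index < 0 || index >= MBR_PART_ENTRIES)
        return -1;
    /* Assumed layout: the type byte is at offset 4 inside each entry. */
    return sector[MBR_BOOT_CODE + index * MBR_PART_SIZE + 4];
}
```

A caller would first read the first 512 bytes of the disk into a buffer and pass that buffer to both functions; a type byte of 0 conventionally marks an unused entry.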
Boot Loader
- The boot loader (or kernel loader) loads the compressed kernel image (zImage) into memory and passes it the boot arguments; the image decompresses itself and then calls the kernel's start_kernel() function.
- Optionally, it also loads an initial RAM disk.
- GRUB and LILO are the most popular Linux boot loaders.

List of boot loaders
bootman, GRUB, LILO, NTLDR, XOSL, BootX, loadlin, Gujin, Boot Camp, Syslinux, GAG
GRUB Boot Loader
- GRUB is an operating-system-independent boot loader
- A multi-boot software package from GNU
- Flexible command line interface
- File system access
- Supports multiple executable formats
- Supports diskless systems
- Can download the OS from the network

GRUB Boot Process
1. The BIOS finds a bootable device (hard disk) and transfers control to the master boot record.
2. The MBR contains GRUB Stage 1. Given the small size of the MBR, Stage 1 just loads the next stage of GRUB.
3. GRUB Stage 1.5 is located in the first 30 kilobytes of hard disk immediately following the MBR. Stage 1.5 loads Stage 2.
4. GRUB Stage 2 receives control and displays the GRUB boot menu (where the user can manually specify the boot parameters).
5. GRUB loads the user-selected (or default) kernel into memory and passes control on to the kernel.
GRUB Config File
LILO: LInux LOader
- A versatile boot manager that supports:
  - Choice of Linux kernels.
  - Boot time kernel parameters.
  - Booting non-Linux kernels.
  - A variety of configurations.
- Characteristics:
  - Lives in MBR or partition boot sector.
  - Has no knowledge of filesystem structure so...
  - Builds a sector "map file" (block map) to find kernel.
- /sbin/lilo - "map installer".
- /etc/lilo.conf is the LILO configuration file.
Kernel Booting, Init process
The kernel executes the init program, which becomes the init process (pid 1).
- Init is the root/parent of all processes executing on Linux
- The first process that init starts is the script /etc/rc.d/rc.sysinit
- Based on the appropriate run-level, scripts are executed to start various processes to run the system and make it functional
- Init is responsible for starting system processes as defined in the /etc/inittab file
- Init typically starts multiple instances of "getty", which wait for console logins and then spawn the user's shell process
- Upon shutdown, init controls the sequence and processes for shutdown

Process ID   Description
0            The Scheduler
1            The init process
2            kflushd
3            kupdate
4            kpiod
5            kswapd
6            mdrecoveryd
6. The File System
   Virtual File system & its role
   Files associated with a process
   proc file system
   System calls
The File System
Filesystems are containers of files, that are stored, probably in a directory tree, together with attributes, like size, owner, creation date and the like. A filesystem has a type. It defines how things are arranged on the disk. For example, one has the types minix, ext2, reiserfs, iso9660, vfat, hfs.
Inode
An (in-core) inode contains the metadata of a file: its serial number, its protection (mode), its owner, its size, the times of last access, last status change and last modification, etc. It also points to the superblock of the filesystem the file is in, the methods for this file, and the dentries (names) for this file.
The user-space stat structure provides a similar interface.
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
int stat (const char *path, struct stat *buf);
int fstat (int fd, struct stat *buf);
int lstat (const char *path, struct stat *buf);

struct stat {
    dev_t st_dev;         /* ID of device containing file */
    ino_t st_ino;         /* inode number */
    mode_t st_mode;       /* permissions */
    nlink_t st_nlink;     /* number of hard links */
    uid_t st_uid;         /* user ID of owner */
    gid_t st_gid;         /* group ID of owner */
    dev_t st_rdev;        /* device ID (if special file) */
    off_t st_size;        /* total size in bytes */
    blksize_t st_blksize; /* blocksize for filesystem I/O */
    blkcnt_t st_blocks;   /* number of blocks allocated */
    time_t st_atime;      /* last access time */
    time_t st_mtime;      /* last modification time */
    time_t st_ctime;      /* last status change time */
};
lstat() is identical to stat(), except that if pathname is a symbolic link, then it returns information about the link itself, not the file that it refers to.
fstat() is identical to stat(), except that the file about which information is to be retrieved is specified by the file descriptor fd.
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>

int main (int argc, char *argv[])
{
    struct stat sb;
    int ret;

    if (argc < 2) {
        fprintf (stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    ret = stat (argv[1], &sb);
    if (ret) {
        perror ("stat");
        return 1;
    }
    /* off_t may be wider than long, so cast explicitly */
    printf ("%s is %lld bytes\n", argv[1], (long long) sb.st_size);
    return 0;
}
The following mask values are defined for the file type of the st_mode field:
S_IFMT     0170000   bit mask for the file type bit field
S_IFSOCK   0140000   socket
S_IFLNK    0120000   symbolic link
S_IFREG    0100000   regular file
S_IFBLK    0060000   block device
S_IFDIR    0040000   directory
S_IFCHR    0020000   character device
S_IFIFO    0010000   FIFO

Thus, to test for a regular file (for example), one could write:
stat(pathname, &sb);
if ((sb.st_mode & S_IFMT) == S_IFREG) {
    /* Handle regular file */
}
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Print the file type of each pathname given on the command line. */
int
main(int argc, char *argv[])
{
    int i;
    struct stat buf;
    char *ptr;

    for (i = 1; i < argc; i++) {
        printf("%s: ", argv[i]);
        /* lstat() so that symbolic links are reported, not followed */
        if (lstat(argv[i], &buf) < 0) {
            perror("lstat error");   /* stands in for err_ret() from apue.h */
            continue;
        }
        if (S_ISREG(buf.st_mode))
            ptr = "regular";
        else if (S_ISDIR(buf.st_mode))
            ptr = "directory";
        else if (S_ISCHR(buf.st_mode))
            ptr = "character special";
        else if (S_ISBLK(buf.st_mode))
            ptr = "block special";
        else if (S_ISFIFO(buf.st_mode))
            ptr = "fifo";
        else if (S_ISLNK(buf.st_mode))
            ptr = "symbolic link";
        else if (S_ISSOCK(buf.st_mode))
            ptr = "socket";
        else
            ptr = "** unknown mode **";
        printf("%s\n", ptr);
    }
    exit(0);
}
Printing all fields
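#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(void)
{
    struct stat fst;
    struct tm *Time;
    int fd;

    fd = open("testfile", O_RDONLY);
    if (fd == -1 || fstat(fd, &fst) == -1) {
        perror("testfile");
        return 1;
    }
    printf("Listing the details of the file\n");
    /* the stat field types vary between systems, so cast before printing */
    printf("The inode no of the file is %lu\n", (unsigned long) fst.st_ino);
    printf("The device ID of the file is %lu\n", (unsigned long) fst.st_dev);
    printf("The block size of the file system is %ld\n", (long) fst.st_blksize);
    printf("The user ID is %ld\n", (long) fst.st_uid);
    printf("The group ID is %ld\n", (long) fst.st_gid);
    printf("Access time is %ld\n", (long) fst.st_atime);
    printf("Status change time is %ld\n", (long) fst.st_ctime);
    printf("Modification time is %ld\n", (long) fst.st_mtime);
    Time = localtime(&fst.st_atime);
    if (Time != NULL)
        printf("Access time (local): %s", asctime(Time));
    close(fd);
    return 0;
}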
Permissions
While the stat calls can be used to obtain the permission values for a given file, two other system calls set those values:
#include <sys/types.h>
#include <sys/stat.h>
int chmod (const char *path, mode_t mode);
int fchmod (int fd, mode_t mode);

Example chmod
int ret;
/*
 * Set 'map.png' in the current directory to
 * owner-readable and -writable. This is the
 * same as 'chmod 600 ./map.png'.
 */
ret = chmod ("./map.png", S_IRUSR | S_IWUSR);
if (ret)
    perror ("chmod");
Ownership
In the stat structure, the st_uid and st_gid fields provide the file's owner and group, respectively. Three system calls allow a user to change those two values:
#include <sys/types.h>
#include <unistd.h>
int chown (const char *path, uid_t owner, gid_t group);
int lchown (const char *path, uid_t owner, gid_t group);
int fchown (int fd, uid_t owner, gid_t group);

struct group *gr;
int ret;
/*
 * getgrnam() returns information on a group
 * given its name (declared in <grp.h>).
 */
gr = getgrnam ("officers");
if (!gr) {
    /* likely an invalid group */
    perror ("getgrnam");
    return 1;
}
/* set manifest.txt's group to 'officers'; -1 leaves the owner unchanged */
ret = chown ("manifest.txt", -1, gr->gr_gid);
if (ret)
    perror ("chown");
Reading a Directory's Contents
A directory stream is represented by a DIR object, obtained with opendir():
#include <sys/types.h>
#include <dirent.h>
DIR * opendir (const char *name);

To obtain the file descriptor behind a given directory stream:
#define _BSD_SOURCE /* or _SVID_SOURCE; newer glibc versions use _DEFAULT_SOURCE */
#include <sys/types.h>
#include <dirent.h>
int dirfd (DIR *dir);
Reading from a directory stream
Once you have created a directory stream with opendir(), your program can begin reading entries from the directory. To do this, use readdir(), which returns entries one by one from a given DIR object:
#include <sys/types.h>
#include <dirent.h>
struct dirent * readdir (DIR *dir);

A successful call to readdir() returns the next entry in the directory represented by dir. The dirent structure represents a directory entry. Defined in <dirent.h>, on Linux, its definition is:
struct dirent {
    ino_t d_ino;              /* inode number */
    off_t d_off;              /* offset to the next dirent */
    unsigned short d_reclen;  /* length of this record */
    unsigned char d_type;     /* type of file */
    char d_name[256];         /* filename */
};
Applications successively invoke readdir(), obtaining each file in the directory, until they find the file they are searching for or until the entire directory is read, at which time readdir() returns NULL.
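To close the DIR*:
int closedir (DIR *dir);

#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * find_file_in_dir - searches the directory 'path' for a
 * file named 'file'.
 *
 * Returns 0 if 'file' exists in 'path' and a nonzero
 * value otherwise.
 */
int find_file_in_dir (const char *path, const char *file)
{
    struct dirent *entry;
    int ret = 1;
    DIR *dir;

    dir = opendir (path);
    if (dir == NULL) {
        perror ("opendir");
        return ret;
    }
    errno = 0;
    while ((entry = readdir (dir)) != NULL) {
        if (strcmp (entry->d_name, file) == 0) {
            ret = 0;
            break;
        }
    }
    /* readdir() returns NULL both at end of directory and on error */
    if (errno && !entry)
        perror ("readdir");
    closedir (dir);
    return ret;
}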
System calls for reading directory contents
The previously discussed functions for reading the contents of directories are standardized by POSIX and provided by the C library. Internally, these functions use one of two system calls, readdir() and getdents(), which are provided here for completeness:
#include <errno.h>
/*
 * Not defined for user space: need to
 * use the _syscall3() macro to access.
 * (Where the _syscallN macros are no longer available,
 * syscall(2) serves the same purpose.)
 */
int readdir (unsigned int fd, struct dirent *dirp, unsigned int count);
int getdents (unsigned int fd, struct dirent *dirp, unsigned int count);
Links
A link is essentially just a name in a list (a directory) that points at an inode, so there is no reason why multiple links to the same inode could not exist. That is, a single inode (and thus a single file) could be referenced from, say, both /etc/customs and /var/run/ledger.

Hard Link
Files can have 0, 1, or many links. Most files have a link count of 1 (that is, they are pointed at by a single directory entry), but some files have 2 or even more links. These are called hard links.
The link() system call, one of the original Unix system calls, and now standardized by POSIX, creates a new link for an existing file:
#include <unistd.h>
int link (const char *oldpath, const char *newpath);

int ret;
/*
 * create a new directory entry,
 * '/home/kidd/privateer', that points at
 * the same inode as '/home/kidd/pirate'
 */
ret = link ("/home/kidd/pirate", "/home/kidd/privateer");
if (ret)
    perror ("link");
Symbolic Links
Symbolic links, also known as symlinks or soft links, are similar to hard links in that both point at files in the filesystem. The symbolic link differs, however, in that it is not merely an additional directory entry, but a special type of file altogether. This special file contains the pathname for a different file, called the symbolic link's target. At runtime, on the fly, the kernel substitutes this pathname for the symbolic link's pathname (unless using the various l versions of system calls, such as lstat(), which operate on the link itself, and not the target).
Soft links, unlike hard links, can span filesystems. They can also point at a target that does not exist; such a link is called a dangling soft link.
int ret;
/*
 * create a symbolic link,
 * '/home/kidd/privateer', that
 * points at '/home/kidd/pirate'
 */
ret = symlink ("/home/kidd/pirate", "/home/kidd/privateer");
if (ret)
    perror ("symlink");
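The difference between the two kinds of link shows up in what stat() and lstat() report. The following is a minimal sketch; the function name link_demo() and its parameters are hypothetical, and the three pathnames are assumed not to exist yet.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create `file`, a hard link `hard` and a symbolic link `soft` to it.
 * On success returns 0, stores the hard-link count of `file` in *nlink
 * and stores 1 in *is_symlink if `soft` is a symbolic link.
 */
int link_demo(const char *file, const char *hard, const char *soft,
              long *nlink, int *is_symlink)
{
    struct stat sb;
    FILE *fp = fopen(file, "w");

    if (fp == NULL)
        return -1;
    fclose(fp);

    if (link(file, hard) == -1)        /* link(oldpath, newpath) */
        return -1;
    if (symlink(file, soft) == -1)     /* symlink(target, linkpath) */
        return -1;

    if (stat(file, &sb) == -1)
        return -1;
    *nlink = (long) sb.st_nlink;       /* two directory entries, one inode */

    if (lstat(soft, &sb) == -1)        /* lstat() looks at the link itself */
        return -1;
    *is_symlink = S_ISLNK(sb.st_mode) ? 1 : 0;
    return 0;
}
```

After a successful call the link count is 2: the hard link added a second directory entry for the same inode, while the symbolic link is a separate file with its own inode and does not change the count.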
- The Linux kernel implements the concept of Virtual File System (VFS, originally Virtual Filesystem Switch), so that it is (to a large degree) possible to separate actual "low-level" filesystem code from the rest of the kernel.
- The VFS is more of an interface than an actual complete file system.
- An important role of the VFS is to perform what are called "standard actions". For example, the function lseek() is not actually implemented by any file system, as the function of lseek() is provided by a "standard action" of the VFS.
- Two important native filesystems in the Linux environment are ext2 and the proc file system.

Four main objects in VFS API: superblock, dentries, inodes, files
- The kernel keeps track of files using in-core inodes ("index nodes"), usually derived by the low-level filesystem from on-disk inodes.
- A file may have several names, and there is a layer of dentries ("directory entries") that represent pathnames, speeding up the lookup operation.
- Several processes may have the same file open for reading or writing, and file structures contain the required information such as the current file position.
- Access to a filesystem starts by mounting it. This operation takes a filesystem type (like ext2, vfat, iso9660, nfs) and a device and produces the in-core superblock that contains the information required for operations on the filesystem; a third ingredient, the mount point, specifies what pathname refers to the root of the filesystem.
Auxiliary objects
- We have filesystem types, used to connect the name of the filesystem to the routines for setting it up (at mount time) or tearing it down (at umount time).
- A struct vfsmount represents a subtree in the big file hierarchy, basically a pair (device, mountpoint).
- A struct nameidata represents the result of a lookup.
- A struct address_space gives the mapping between the blocks in a file and blocks on disk. It is needed for I/O.
Filesystem type registration
The struct is of type struct file_system_type. Here the 2.2.17 version:
struct file_system_type {
    const char *name;
    int fs_flags;
    struct super_block *(*read_super) (struct super_block *, void *, int);
    struct file_system_type *next;
};
The call register_filesystem() hangs this struct in the chain with head file_systems, and unregister_filesystem() removes it again. Accesses to this chain are protected by the spinlock file_systems_lock. There are no other writers. The main reader is of course the mount() system call (via get_fs_type()). Other readers are get_filesystem_list(), used for /proc/filesystems, and the sysfs system call. The code is in fs/filesystems.c.
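The registration chain can be mimicked in user space with a singly linked list. This is a toy sketch, not the kernel code: every toy_ name is hypothetical, the struct carries only the name and next fields, and the spinlock that protects the real chain is left out.

```c
#include <stddef.h>
#include <string.h>

/* Toy user-space stand-in for the kernel's struct file_system_type. */
struct toy_fs_type {
    const char *name;
    struct toy_fs_type *next;
};

static struct toy_fs_type *toy_file_systems;   /* head of the chain */

/* Return the link that points at the entry called `name`, or the final NULL link. */
static struct toy_fs_type **toy_find(const char *name)
{
    struct toy_fs_type **p;

    for (p = &toy_file_systems; *p != NULL; p = &(*p)->next)
        if (strcmp((*p)->name, name) == 0)
            break;
    return p;
}

/* Append fs to the chain. Returns 0, or -1 if the name is already taken. */
int toy_register_filesystem(struct toy_fs_type *fs)
{
    struct toy_fs_type **p = toy_find(fs->name);

    if (*p != NULL)
        return -1;
    fs->next = NULL;
    *p = fs;
    return 0;
}

/* Remove fs from the chain. Returns 0, or -1 if it was not registered. */
int toy_unregister_filesystem(struct toy_fs_type *fs)
{
    struct toy_fs_type **p;

    for (p = &toy_file_systems; *p != NULL; p = &(*p)->next) {
        if (*p == fs) {
            *p = fs->next;
            fs->next = NULL;
            return 0;
        }
    }
    return -1;
}

/* What a mount would do first: map a type name to its registered struct. */
struct toy_fs_type *toy_get_fs_type(const char *name)
{
    return *toy_find(name);
}
```

Walking the list through a pointer to the link, rather than a pointer to the node, lets the same loop serve lookup, append and removal without special-casing the head of the chain.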
(In 2.4 there was no kill_sb() , and the role of get_sb() was taken by read_super() . The final parameter of get_sb() and the lock_class_key fields are present since 2.6.18.)
name
Here the filesystem type gives its name ("tue"), so that the kernel can find it when someone does mount -t tue /dev/foo /dir

get_sb
At mount time the kernel calls the fstype->get_sb() routine that initializes things and sets up a superblock. Typically this is a 1-line routine that calls one of get_sb_bdev, get_sb_single, get_sb_nodev, get_sb_pseudo.

kill_sb
At umount time the kernel calls the fstype->kill_sb() routine to clean up. Typically one of kill_block_super, kill_anon_super, kill_litter_super.
Example of the use of owner - sysfs
There exists a strange SYSV system call sysfs that will return (i) a sequence number given a filesystem type, (ii) a filesystem type given a sequence number, and (iii) the total number of filesystem types registered now. This call is not supported by libc or glibc.
These sequence numbers are rather meaningless since they may change at any moment. But this means that one can get a snapshot of the list of filesystem types without looking at /proc/filesystems. For example, the program
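#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>

/* define the 3-arg version of sysfs(); the original used the old
   _syscall3() macro, here syscall(2) is used instead
   (SYS_sysfs is not defined on every architecture) */
static int sysfs(int option, unsigned int fsindex, char *buf)
{
    return syscall(SYS_sysfs, option, fsindex, buf);
}

/* define the 1-arg version of sysfs() */
static int sysfs1(int i)
{
    return sysfs(i, 0, NULL);
}

int main(void)
{
    int i, tot;
    char buf[100];   /* how long is a filesystem type name?? */

    tot = sysfs1(3);
    if (tot == -1) {
        perror("sysfs(3)");
        exit(1);
    }
    for (i = 0; i < tot; i++) {
        if (sysfs(2, i, buf)) {
            perror("sysfs(2)");
            exit(1);
        }
        printf("%2d: %s\n", i, buf);
    }
    return 0;
}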
might give output like
 0: ext2
 1: minix
 2: romfs
 3: msdos
 4: vfat
 5: proc
 6: nfs
 7: smbfs
 8: iso9660

Mounting
The mount system call attaches a filesystem to the big file hierarchy at some indicated point. Ingredients needed:
(i) a device that carries the filesystem (disk, partition, floppy, CDROM, SmartMedia card, ...),
(ii) a directory where the filesystem on that device must be attached,
(iii) a filesystem type.
The code for sys_mount() is found in fs/namespace.c and fs/super.c. The connection with the filesystem type name is made in do_kern_mount():
    struct file_system_type *type = get_fs_type(fstype);
    struct super_block *sb;
    if (!type)
        return ERR_PTR(-ENODEV);
    sb = type->get_sb(type, flags, name, data);
and this is the only call of the get_sb() routine.

The code for sys_umount() is found in fs/namespace.c and fs/super.c. The counterpart of the just quoted code is the cleanup in deactivate_super():
    fs->kill_sb(s);
and this is the only call of the kill_sb() routine.
The superblock
The superblock gives global information on a filesystem: the device on which it lives, its block size, its type, the dentry of the root of the filesystem, the methods it has, etc.
struct super_block {
    dev_t s_dev;
    unsigned long s_blocksize;
    struct file_system_type *s_type;
    struct super_operations *s_op;
    struct dentry *s_root;
    ...
}
struct super_operations {
    struct inode *(*alloc_inode)(struct super_block *sb);
    void (*destroy_inode)(struct inode *);
    void (*read_inode) (struct inode *);
    void (*dirty_inode) (struct inode *);
    void (*write_inode) (struct inode *, int);
    void (*put_inode) (struct inode *);
    void (*drop_inode) (struct inode *);
    void (*delete_inode) (struct inode *);
    void (*put_super) (struct super_block *);
    void (*write_super) (struct super_block *);
    int (*sync_fs)(struct super_block *sb, int wait);
    void (*write_super_lockfs) (struct super_block *);
    void (*unlockfs) (struct super_block *);
    int (*statfs) (struct super_block *, struct statfs *);
    int (*remount_fs) (struct super_block *, int *, char *);
    void (*clear_inode) (struct inode *);
    void (*umount_begin) (struct super_block *);
    int (*show_options)(struct seq_file *, struct vfsmount *);
};
This is enough to get started: the dentry of the root directory tells us the inode of this root directory (and in particular its i_ino ), and sb->s_op->read_inode(inode) will read this inode from disk. Now inode->i_op->lookup() allows us to find names in the root directory, etc.Each superblock is on six lists, with links through the fields s_list , s_dirty , s_io , s_anon , s_files , s_instances , respectively.
The super_blocks list
All superblocks are collected in a list super_blocks with links in the fields s_list. This list is protected by the spinlock sb_lock. The main use is in super.c:get_super() or user_get_super() to find the superblock for a given block device. (Both routines are identical, except that one takes a bdev, the other a dev_t.) This list is also used in various places where all superblocks must be sync'ed or all dirty inodes must be written out.

The fs_supers list
All superblocks of a given type are collected in a list headed by the fs_supers field of the struct file_system_type, with links in the fields s_instances. This list is also protected by the spinlock sb_lock.

The file list
All open files belonging to a given superblock are chained in a list headed by the s_files field of the superblock, with links in the fields f_list of the files. These lists are protected by the spinlock files_lock. This list is used for example in fs_may_remount_ro() to check that there are no files currently open for writing.

The list of anonymous dentries
Normally, all dentries are connected to root. However, when NFS filehandles are used this need not be the case. Dentries that are roots of subtrees potentially unconnected to root are chained in a list headed by the s_anon field of the superblock, with links in the fields d_hash. These lists are protected by the spinlock dcache_lock. They are grown in dcache.c:d_alloc_anon() and shrunk in super.c:generic_shutdown_super().

The inode lists s_dirty, s_io
Lists of inodes to be written out. These lists are headed at the s_dirty (resp. s_io) field of the superblock, with links in the fields i_list. These lists are protected by the spinlock inode_lock. See fs/fs-writeback.c.
Inodes
An (in-core) inode contains the metadata of a file: its serial number, its protection (mode), its owner, its size, the times of last access, last status change and last modification, etc. It also points to the superblock of the filesystem the file is in, the methods for this file, and the dentries (names) for this file.
struct inode {
    unsigned long i_ino;
    umode_t i_mode;
    uid_t i_uid;
    gid_t i_gid;
    kdev_t i_rdev;
    loff_t i_size;
    struct timespec i_atime;
    struct timespec i_ctime;
    struct timespec i_mtime;
    struct super_block *i_sb;
    ...
}

Dentries
The dentries encode the filesystem tree structure, the names of the files. Thus, the main parts of a dentry are the inode (if any) that belongs to it, the name (the final part of the pathname), and the parent (the name of the containing directory). There are also the superblock, the methods, a list of subdirectories, etc.
struct dentry {
    struct inode *d_inode;
    struct dentry *d_parent;
    struct qstr d_name;
    struct super_block *d_sb;
    struct dentry_operations *d_op;
    struct list_head d_subdirs;
    ...
}
struct dentry_operations {
    int (*d_revalidate)(struct dentry *, int);
    int (*d_hash) (struct dentry *, struct qstr *);
    int (*d_compare) (struct dentry *, struct qstr *, struct qstr *);
    int (*d_delete)(struct dentry *);
    void (*d_release)(struct dentry *);
    void (*d_iput)(struct dentry *, struct inode *);
};
Each dentry is on five lists, with links through the fields d_hash , d_lru , d_child , d_subdirs , d_alias .
Files
File structures represent open files, that is, an inode together with a current (reading/writing) offset. The offset can be set by the lseek() system call. Note that instead of a pointer to the inode we have a pointer to the dentry; that means that the name used to open a file is known. In particular system calls like getcwd() are possible.
struct file {
    struct dentry *f_dentry;
    struct vfsmount *f_vfsmnt;
    struct file_operations *f_op;
    mode_t f_mode;
    loff_t f_pos;
    struct fown_struct f_owner;
    unsigned int f_uid, f_gid;
    unsigned long f_version;
    ...
}
Here the f_owner field gives the owner to use for async I/O signals.
Each file is in two lists, with links through the fields f_list , f_ep_links .
f_list
The list with links through f_list was discussed above. It is the list of all files belonging to a given superblock. There is a second use: the tty driver collects all files that are opened instances of a tty in a list headed by tty->tty_files with links through the file field f_list. Conversely, these files point back at the tty via their field private_data. (This field private_data is also used elsewhere. For example, the proc code uses it to attach a struct seq_file to a file.)

The event poll list
All event poll items belonging to a given file are collected in a list with head f_ep_links, protected by the file field f_ep_lock. (For event poll stuff, see epoll_ctl(2).)

struct vfsmount
A struct vfsmount describes a mount. The definition lives in mount.h:
struct vfsmount {
    struct list_head mnt_hash;
    struct vfsmount *mnt_parent;     /* fs we are mounted on */
    struct dentry *mnt_mountpoint;   /* dentry of mountpoint */
    struct dentry *mnt_root;         /* root of the mounted tree */
    struct super_block *mnt_sb;      /* pointer to superblock */
    struct list_head mnt_mounts;     /* list of children, anchored here */
    struct list_head mnt_child;      /* and going through their mnt_child */
    atomic_t mnt_count;
    int mnt_flags;
    char *mnt_devname;               /* Name of device e.g. /dev/dsk/hda1 */
    struct list_head mnt_list;
};
fs_struct
A struct fs_struct determines the interpretation of pathnames referred to by a process (and also, somewhat illogically, contains the umask). The typical reference is current->fs. The definition includes the fields root, pwd and altroot.
The semantics of root and pwd are clear. It remains to discuss altroot.
There are two normal cases for handling the descriptors after a fork.
1. The parent waits for the child to complete. In this case, the parent does not need to do anything with its descriptors. When the child terminates, any of the shared descriptors that the child read from or wrote to will have their file offsets updated accordingly.
2. Both the parent and the child go their own ways. Here, after the fork, the parent closes the descriptors that it doesn't need, and the child does the same thing. This way, neither interferes with the other's open descriptors. This scenario is often found with network servers.
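The first case can be observed directly: after the child has written through a shared descriptor, the parent's file offset has moved too. A minimal sketch follows; the function name offset_after_child_write() is hypothetical.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Open `path`, fork, let the child write `len` bytes, wait for it,
 * and return the parent's file offset afterwards (or -1 on error).
 * Parent and child share one open file, so the offset seen by the
 * parent has moved by `len` although the parent never wrote anything.
 */
off_t offset_after_child_write(const char *path, const char *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    pid_t pid;
    off_t pos;

    if (fd == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }
    if (pid == 0) {                       /* child: write and leave */
        ssize_t n = write(fd, data, len);
        _exit(n == (ssize_t) len ? 0 : 1);
    }

    waitpid(pid, NULL, 0);                /* case 1: the parent waits */
    pos = lseek(fd, 0, SEEK_CUR);         /* query the current offset */
    close(fd);
    return pos;
}
```

If the parent wrote to the descriptor after the wait, its data would land after the child's, which is why shell constructs that append the output of several commands to one file work without extra coordination.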
Besides the open files, numerous other properties of the parent are inherited by the child:
* Real user ID, real group ID, effective user ID, and effective group ID
* Supplementary group IDs
* Process group ID
* Session ID
* Controlling terminal
* The set-user-ID and set-group-ID flags
* Current working directory
* Root directory
* File mode creation mask
* Signal mask and dispositions
* The close-on-exec flag for any open file descriptors
* Environment
* Attached shared memory segments
* Memory mappings
* Resource limits
The differences between the parent and child are
* The return values from fork are different.
* The process IDs are different.
* The two processes have different parent process IDs: the parent process ID of the child is the parent; the parent process ID of the parent doesn't change.
* The child's tms_utime, tms_stime, tms_cutime, and tms_cstime values are set to 0.
* File locks set by the parent are not inherited by the child.
* Pending alarms are cleared for the child.
* The set of pending signals for the child is set to the empty set.
/proc is a window into the running Linux kernel. Files in the /proc file system don't correspond to actual files on a physical device. Instead, they are magic objects that behave like files but provide access to parameters, data structures, and statistics in the kernel. The "contents" of these files are not always fixed blocks of data, as ordinary file contents are. Instead, they are generated on the fly by the Linux kernel when you read from the file.
You can also change the configuration of the running kernel by writing to certain files in the /proc file system.
Let's look at an example:
% ls -l /proc/version
-r--r--r-- 1 root root 0 Jan 17 18:09 /proc/version
The size is 0 because the contents are generated by the kernel.

$ mount
none on /proc type proc (rw)
The device name "none" reveals that this is not a file system on disk.
Extracting Information from /proc
#include <stdio.h>
#include <string.h>

/* Returns the clock speed of the system's CPU in MHz, as reported by
   /proc/cpuinfo. On a multiprocessor machine, returns the speed of
   the first CPU. On error returns zero. */
float get_cpu_clock_speed ()
{
    FILE* fp;
    char buffer[1024];
    size_t bytes_read;
    char* match;
    float clock_speed;

    /* Read the entire contents of /proc/cpuinfo into the buffer. */
    fp = fopen ("/proc/cpuinfo", "r");
    if (fp == NULL)
        return 0;
    bytes_read = fread (buffer, 1, sizeof (buffer), fp);
    fclose (fp);
    /* Bail if read failed or if buffer isn't big enough. */
    if (bytes_read == 0 || bytes_read == sizeof (buffer))
        return 0;
    /* NUL-terminate the text. */
    buffer[bytes_read] = '\0';
    /* Locate the line that starts with "cpu MHz". */
    match = strstr (buffer, "cpu MHz");
    if (match == NULL)
        return 0;
    /* Parse the line to extract the clock speed. */
    if (sscanf (match, "cpu MHz : %f", &clock_speed) != 1)
        return 0;
    return clock_speed;
}
#include <fcntl.h>
int fcntl(int fd, int cmd);
int fcntl(int fd, int cmd, long arg);
int fcntl(int fd, int cmd, struct flock *lock);
Returns: depends on cmd if OK (see following), -1 on error

For record locking, cmd is F_GETLK, F_SETLK or F_SETLKW.
struct flock {
    short l_type;    /* F_RDLCK, F_WRLCK, or F_UNLCK */
    short l_whence;  /* SEEK_SET, SEEK_CUR, or SEEK_END */
    off_t l_start;   /* offset in bytes, relative to l_whence */
    off_t l_len;     /* length, in bytes; 0 means lock to EOF */
    pid_t l_pid;     /* returned with F_GETLK */
};
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd, retval;
    pid_t pid;
    struct flock lockc = {0}, lockp = {0};

    if ((fd = open("testlock", O_WRONLY)) == -1) {
        perror("open testlock");
        return 1;
    }
    lockp.l_type = F_WRLCK;
    lockp.l_whence = SEEK_SET;
    lockp.l_start = 10;
    lockp.l_len = 15;
    /* Parent is locking the file */
    if ((retval = fcntl(fd, F_SETLK, &lockp)) == -1)
        perror("parent write lock");
    printf("retval is %d\n", retval);

    if ((pid = fork()) == 0) {
        lockc.l_type = F_WRLCK;
        lockc.l_whence = SEEK_SET;
        lockc.l_start = 40;
        lockc.l_len = 55;
        /* Child is locking the file */
        if ((retval = fcntl(fd, F_SETLK, &lockc)) == -1)
            perror("Child write lock");
        printf("retval is %d\n", retval);
        printf("Child Process over\n");
    } else {
        sleep(3);
        lockp.l_type = F_UNLCK;
        lockp.l_whence = SEEK_SET;
        lockp.l_start = 10;
        lockp.l_len = 15;
        /* Parent is unlocking the file */
        if ((retval = fcntl(fd, F_SETLK, &lockp)) == -1)
            perror("parent unlock");
        printf("Parent Process over\n");
    }
    return 0;
}
Both processes try to take a read lock and both succeed; try again with a write lock.
#include <stdio.h>
#include <fcntl.h>

printf("retval is %d\n", retval);
printf("process %ld has locked this section\n", (long)lockc.l_pid);
printf("lock type %d\n", lockc.l_type);
printf("whence %d\n", lockc.l_whence);
printf("start %ld\n", (long)lockc.l_start);
printf("length is %ld\n", (long)lockc.l_len);
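The l_pid field returned by F_GETLK can be exercised end to end. The sketch below is an illustration only: the helper name child_sees_parent_lock and the scratch file under /tmp are assumptions, not part of the course material. The parent write-locks a region, and the child, which does not inherit the lock, asks the kernel who holds it.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the parent write-locks bytes 10..24 of a scratch
   file; the child asks with F_GETLK who holds a conflicting lock.
   Returns 1 if the child saw the parent's PID in l_pid, 0 if it did not,
   -1 on error. */
int child_sees_parent_lock(void)
{
    char path[] = "/tmp/locktestXXXXXX";   /* scratch file (assumption) */
    struct flock lk = {0};
    int fd, status;
    pid_t pid;

    if ((fd = mkstemp(path)) == -1)
        return -1;
    unlink(path);                           /* file disappears on close */

    lk.l_type = F_WRLCK;
    lk.l_whence = SEEK_SET;
    lk.l_start = 10;
    lk.l_len = 15;
    if (fcntl(fd, F_SETLK, &lk) == -1) {
        close(fd);
        return -1;
    }

    if ((pid = fork()) == -1) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: record locks are not inherited, so this probe conflicts
           with the parent's lock. */
        struct flock probe = {0};
        probe.l_type = F_WRLCK;
        probe.l_whence = SEEK_SET;
        probe.l_start = 10;
        probe.l_len = 15;
        if (fcntl(fd, F_GETLK, &probe) == -1)
            _exit(2);
        _exit(probe.l_type != F_UNLCK && probe.l_pid == getppid() ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) == -1) {
        close(fd);
        return -1;
    }
    close(fd);                              /* also releases the lock */
    if (!WIFEXITED(status) || WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

F_GETLK never takes the lock; it only reports the first lock that would block the request, or sets l_type to F_UNLCK when nothing conflicts.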
Various directories and files in /proc
1) /proc/<number>            # one directory per running process
2) /proc/self                # the current process
3) /proc/cpuinfo
4) /proc/devices
5) /proc/pci                 # summary of devices connected to the PCI bus
6) /proc/tty/driver/serial   # serial ports
7) /proc/sys/kernel          # kernel information
8) /proc/meminfo             # system's memory usage
9) /proc/filesystems         # file systems supported by the kernel
10) /proc/mounts             # all mounted file systems
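As a small illustration of the /proc/self entry in the list above: on Linux it is a symbolic link whose target is the PID of the process that reads it. The sketch below assumes /proc is mounted; the helper name pid_from_proc_self is made up for this example.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: returns the PID that the /proc/self symbolic link
   points to, or -1 on error. */
long pid_from_proc_self(void)
{
    char target[32];
    ssize_t len = readlink("/proc/self", target, sizeof target - 1);

    if (len == -1)
        return -1;
    target[len] = '\0';          /* readlink() does not NUL-terminate */
    return strtol(target, NULL, 10);
}
```

The returned value should match getpid(), so files such as /proc/self/status describe the calling process without it having to build the path from its own PID.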
1. fcntl Record Locking
2. lockf
SYNOPSIS
#include <unistd.h>
int lockf(int fd, int cmd, off_t len);
lockf - apply, test or remove a POSIX lock on an open file
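DEADLOCK: the parent holds the lock and waits for the child, while the child blocks in lockf() waiting for the lock. Avoid the deadlock by using F_TLOCK in the child's lockf() call.
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd;
    pid_t pid;

    /* Assumed setup (not shown on the slide): the parent opens the file
       and locks its first 10 bytes before forking. */
    if ((fd = open("testlock", O_RDWR)) == -1) {
        perror("open testlock");
        return 1;
    }
    if (lockf(fd, F_LOCK, 10) == -1)
        perror("parent lockf failed");

    if ((pid = fork()) == 0) {
        if (lockf(fd, F_LOCK, 10) == -1)   /* child blocks here: deadlock! */
            perror("lockf failed");
        puts("The child process over");
    } else {
        wait(0);
        printf("Process %d is over\n", getpid());
    }
    return 0;
}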
3. access
#include <unistd.h>
int access(const char *pathname, int mode);

access() checks whether the process would be allowed to read, write or test for existence of the file (or other file system object) whose name is pathname. If pathname is a symbolic link, the permissions of the file referred to by this symbolic link are tested.
mode is a mask consisting of one or more of R_OK, W_OK, X_OK and F_OK.
R_OK, W_OK and X_OK request checking whether the file exists and has read, write and execute permissions, respectively. F_OK just requests checking for the existence of the file.

#include <errno.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    char *path;
    int ret;

    if (argc < 2) {
        fprintf(stderr, "usage: %s pathname\n", argv[0]);
        return 1;
    }
    path = argv[1];
    ret = access(path, F_OK);   /* check whether the file exists */
    if (ret == 0)
        printf("%s file exists\n", path);
    else
        perror(path);
    return 0;
}
4. open, creat
#include <fcntl.h>
int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);
int creat(const char *pathname, mode_t mode);
5. dup, dup2
#include <unistd.h>
int dup(int oldfd);
int dup2(int oldfd, int newfd);
dup() and dup2() create a copy of the file descriptor oldfd.
After a successful return from dup() or dup2(), the old and new file descriptors may be used interchangeably. They refer to the same open file description and thus share the file offset and file status flags; for example, if the file offset is modified by using lseek(2) on one of the descriptors, the offset is also changed for the other.

The two descriptors do not share file descriptor flags (the close-on-exec flag). The close-on-exec flag (FD_CLOEXEC; see fcntl(2)) for the duplicate descriptor is off.
dup() uses the lowest-numbered unused descriptor for the new descriptor.
dup2() makes newfd be the copy of oldfd, closing newfd first if necessary.
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd, newfd;

    if ((fd = creat("testfile", 0666)) == -1) {
        perror("Creat failed");
        exit(1);
    }
    printf("Descriptor is %d\n", fd);
    newfd = dup2(fd, 5);   /* try with stdout */
    printf("New Descriptor is %d\n", newfd);
    printf("The PID is %d\n", getpid());
    pause();   /* keep the process alive so /proc/<PID>/fd can be inspected */
    close(fd);
    close(newfd);
    return 0;
}
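The shared file offset described above can be observed directly. This is a minimal sketch; the helper name offset_seen_through_dup and the scratch file under /tmp are assumptions made for the illustration. It writes through one descriptor and queries the offset through its duplicate.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: writes 5 bytes through one descriptor and reports
   the file offset seen through its duplicate. Returns -1 on error. */
long offset_seen_through_dup(void)
{
    char path[] = "/tmp/duptestXXXXXX";    /* scratch file (assumption) */
    int fd, copy;
    off_t pos;

    if ((fd = mkstemp(path)) == -1)
        return -1;
    unlink(path);                           /* file disappears on close */

    if ((copy = dup(fd)) == -1) {
        close(fd);
        return -1;
    }
    if (write(fd, "hello", 5) != 5) {
        close(fd);
        close(copy);
        return -1;
    }
    pos = lseek(copy, 0, SEEK_CUR);         /* query the offset via the copy */
    close(fd);
    close(copy);
    return (long)pos;
}
```

The function should return 5, since both descriptors refer to one open file description. Two independent open() calls on the same path would each keep their own offset instead.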
Using fcntl to create a copy
#include <stdio.h>
#include <fcntl.h>
main()
{
int fd, fd1, newfd;
fd = open("temp", O_RDWR | O_CREAT, 0666);
printf("The file descriptor is %d", fd);
#include <sys/mman.h>
void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);
int munmap(void *start, size_t length);

The mmap() function asks to map length bytes starting at offset offset from the file (or other object) specified by the file descriptor fd into memory, preferably at address start. This latter address is a hint only, and is usually specified as 0. The actual place where the object is mapped is returned by mmap().

The prot argument describes the desired memory protection (and must not conflict with the open mode of the file). It is either PROT_NONE or the bitwise OR of one or more of the other PROT_* flags.
PROT_EXEC    Pages may be executed.
PROT_READ    Pages may be read.
PROT_WRITE   Pages may be written.
PROT_NONE    Pages may not be accessed.

The flags parameter specifies the type of the mapped object, mapping options and whether modifications made to the mapped copy of the page are private to the process or are to be shared with other references. It has the following bits:

MAP_FIXED    Do not select a different address than the one specified. If the memory region specified by start and length overlaps pages of any existing mapping, the overlapped part of the existing mapping is discarded.
MAP_SHARED   Share this mapping with all other processes that map this object. Storing to the region is equivalent to writing to the file.
MAP_PRIVATE  Create a private copy-on-write mapping. Stores to the region do not affect the original file. It is unspecified whether changes made to the file after the mmap() call are visible in the mapped region.
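The difference between MAP_SHARED and MAP_PRIVATE can be checked with a short experiment. The sketch below is illustrative only: the helper name byte_in_file_after_store and the scratch file under /tmp are assumptions. It stores one byte through a mapping and then reads the file with pread() to see whether the store reached it.

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: stores `ch` into byte 0 of a 4-byte scratch file
   ("aaaa") through a mapping created with `flags` (MAP_SHARED or
   MAP_PRIVATE), then returns the byte that pread() finds in the file.
   Returns -1 on error. */
int byte_in_file_after_store(int flags, char ch)
{
    char path[] = "/tmp/mmaptestXXXXXX";    /* scratch file (assumption) */
    char in_file;
    char *map;
    int fd, result;

    if ((fd = mkstemp(path)) == -1)
        return -1;
    unlink(path);
    if (write(fd, "aaaa", 4) != 4) {
        close(fd);
        return -1;
    }

    map = mmap(NULL, 4, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    map[0] = ch;                    /* store through the mapping */
    msync(map, 4, MS_SYNC);         /* flush a shared mapping to the file */

    if (pread(fd, &in_file, 1, 0) != 1)
        result = -1;
    else
        result = in_file;

    munmap(map, 4);
    close(fd);
    return result;
}
```

With MAP_SHARED the file should contain the stored byte; with MAP_PRIVATE the file should still hold the original 'a', because the store went to a private copy of the page.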
// GPIO setup macros. Always use INP_GPIO(x) before using OUT_GPIO(x) or SET_GPIO_ALT(x,y)
#define INP_GPIO(g) *(gpio+((g)/10)) &= ~(7<<(((g)%10)*3))
#define OUT_GPIO(g) *(gpio+((g)/10)) |= (1<<(((g)%10)*3))
#define SET_GPIO_ALT(g,a) *(gpio+(((g)/10))) |= (((a)<=3?(a)+4:(a)==4?3:2)<<(((g)%10)*3))

//#define GPIO_SET *(gpio+7)   // sets bits which are 1, ignores bits which are 0
//#define GPIO_CLR *(gpio+10)  // clears bits which are 1, ignores bits which are 0
// Temporarily hard-wired for pin 4 (bit mask 0x10). The set and clear
// registers are written directly; no read-modify-write is needed.
#define GPIO_SET *(volatile unsigned int*)(gpio+7) = 0x10   // sets bits which are 1, ignores bits which are 0
#define GPIO_CLR *(volatile unsigned int*)(gpio+10) = 0x10  // clears bits which are 1, ignores bits which are 0

// Reads the level of pin g without modifying the register.
#define GPIO_READ(g) (*(gpio + 13) & (1<<(g)))

void setup_io();

int main(int argc, char **argv)
{
    int g, rep;

    // Set up gpio pointer for direct register access
    setup_io();

    // set GPIO pin 4 as output
    // INP_GPIO(7); // must use INP_GPIO before we can use OUT_GPIO
    INP_GPIO(4);    // must use INP_GPIO before we can use OUT_GPIO
    // OUT_GPIO(7);
    OUT_GPIO(4);

    // flash LED on and off 10 times
    for (rep = 0; rep < 10; rep++) {
        // GPIO_SET = (1 << 7);
        printf("setting\n");
        GPIO_SET;
        sleep(1);
        // GPIO_CLR = (1 << 7);
        printf("resetting\n");
        GPIO_CLR;
        sleep(1);
    }
    return 0;
} // main

// Set up a memory region to access GPIO
void setup_io()
{
    /* open /dev/mem */
    if ((mem_fd = open("/dev/mem", O_RDWR|O_SYNC)) < 0) {
        printf("can't open /dev/mem\n");
        exit(-1);
    }

    /* mmap GPIO */
    gpio_map = mmap(
        NULL,                   // Any address in our space will do
        BLOCK_SIZE,             // Map length
        PROT_READ|PROT_WRITE,   // Enable reading & writing to mapped memory
        MAP_SHARED,             // Shared with other processes
        mem_fd,                 // File to map
        GPIO_BASE               // Offset to GPIO peripheral
    );

    close(mem_fd);              // No need to keep mem_fd open after mmap

    if (gpio_map == MAP_FAILED) {
        perror("mmap error");   // errno is set by mmap
        exit(-1);
    }

    // Always use volatile pointer!
    gpio = (volatile unsigned *)gpio_map;
} // setup_io()
mount -a [-fFnrsvw] [-t vfstype] [-O optlist]
mount [-fnrsvw] [-o options [,...]] device | dir
mount [-fnrsvw] [-t vfstype] [-o options] device dir
Mount a file system
All files accessible in a Unix system are arranged in one big tree, the file hierarchy, rooted at /. These files can be spread out over several devices. The mount command serves to attach the file system found on some device to the big file tree. Conversely, the umount(8) command will detach it again.
The standard form of the mount command is
mount -t type device dir

#include <sys/mount.h>
#include <stdio.h>
main()
{
int ret;
/* mount() returns 0 on success and -1 on error; it does not return a descriptor */
ret = mount("/dev/fd0", "/mnt/floppy", "ext2", MS_NOSUID, NULL);
if (ret != -1)
    printf("Floppy mounted successfully\n");
printf("Changing Directory to floppy\n");
#include <unistd.h>
ssize_t pread(int fd, void *buf, size_t count, off_t offset);
ssize_t pwrite(int fd, const void *buf, size_t count, off_t offset);
pread, pwrite - read from or write to a file descriptor at a given offset

pread() reads up to count bytes from file descriptor fd at offset offset (from the start of the file) into the buffer starting at buf. The file offset is not changed.
pwrite() writes up to count bytes from the buffer starting at buf to the file descriptor fd at offset offset. The file offset is not changed.
The file referenced by fd must be capable of seeking.
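#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd1, fd2;
    ssize_t n;
    char ch[1024];

    if ((fd1 = open("/etc/passwd", O_RDONLY)) == -1) {
        perror("Unable to open source");
        exit(1);
    }
    n = pread(fd1, ch, 100, 100);        /* up to 100 bytes from offset 100 */
    if (n > 0)
        printf("%.*s\n", (int)n, ch);    /* ch is not NUL-terminated */
    close(fd1);

    if ((fd2 = open("newfile", O_WRONLY | O_CREAT, 0666)) == -1) {
        perror("Unable to open target");
        exit(1);
    }
    pwrite(fd2, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX", 40, 500);
    /* same offset again: the Y's overwrite the X's */
    pwrite(fd2, "YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY", 40, 500);
    close(fd2);
    return 0;
}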
7. Process Management
Process Defined
Process Descriptor Structures in the kernel
Process States
Process Scheduling
Process Creation
System calls related to process management
Day 3 Morning
- A program is a file in the file system; a process is that object code in execution: an active, alive, running program.
- Processes are more than just assembly language; they consist of data, resources, state, and a virtualized computer.
- A process uses many resources like memory space, CPU, files, etc., during its lifetime.
- A process contains threads, belongs to a process group and has a parent process. A process group belongs to a session. A session may have a controlling terminal (tty), and at most one process group (the foreground process group) is attached to that terminal at a time. The remaining process groups are background process groups.
- A process is the unit that the kernel schedules onto the processor for execution. The main thread of a process is the actual entity that gets scheduled to the CPU. The kernel maintains a separate copy of the registers and various other data structures for each process.
- In a multiprocessing environment, the register values saved in the context of a process are loaded into the actual registers when its execution resumes.
- A process is an entry in the task vector, and is an instance of task_struct.

Process Structure
* Every process is represented by a task_struct data structure.
* This structure is quite large and complex.
* Whenever a new process is created, a new task_struct structure is created by the kernel, and the complete process information is maintained by the structure.
* When a process is terminated, the corresponding structure is removed.
* The task_struct structures are linked together in a doubly linked list.
* Solaris uses proc structure to manage processes.
task_struct task[256];
struct task_struct {
    volatile long state;    /* -1 unrunnable, 0 runnable, >0 stopped */
    void *stack;
    atomic_t usage;
    unsigned int flags;     /* per process flags, defined below */
    unsigned int ptrace;

#ifdef CONFIG_SMP
    struct llist_node wake_entry;
    int on_cpu;
    struct task_struct *last_wakee;
    unsigned long wakee_flips;
    unsigned long wakee_flip_decay_ts;

    int wake_cpu;
#endif
    int on_rq;

    int prio, static_prio, normal_prio;
    unsigned int rt_priority;
    const struct sched_class *sched_class;
    struct sched_entity se;
    struct sched_rt_entity rt;

#ifdef CONFIG_PREEMPT_NOTIFIERS
    /* list of struct preempt_notifier: */
    struct hlist_head preempt_notifiers;
#endif
    /*
     * fpu_counter contains the number of consecutive context switches
     * that the FPU is used. If this is over a threshold, the lazy fpu
     * saving becomes unlazy to save the trap. This is an unsigned char
     * so that after 256 times the counter wraps and the behavior turns
     * lazy again; this to deal with bursty apps that only use FPU for
     * a short time
     */
    unsigned char fpu_counter;
#ifdef CONFIG_BLK_DEV_IO_TRACE
    unsigned int btrace_seq;
#endif
    unsigned int policy;
    int nr_cpus_allowed;
    cpumask_t cpus_allowed;

    struct mm_struct *mm, *active_mm;
#ifdef CONFIG_COMPAT_BRK
    unsigned brk_randomized:1;
#endif
#if defined(SPLIT_RSS_COUNTING)
    struct task_rss_stat rss_stat;
#endif
    /* task state */
    int exit_state;
    int exit_code, exit_signal;
    int pdeath_signal;      /* The signal sent when the parent dies */
    unsigned int jobctl;    /* JOBCTL_*, siglock protected */
    ....
In order to run UNIX, the computer hardware must provide two modes of execution:
- kernel mode
- user mode

Some computers have more than two execution modes.
- e.g. Intel x86 processors provide four privilege levels (rings 0 to 3).
Each process has virtual address space; references to virtual memory are translated to physical memory locations using set of address translation maps.
Scheduling (Kernel perspective)
* The kernel keeps track of a process's creation time as well as the CPU time that it consumes during its lifetime.
* The clock used for this is a combination of software and hardware setup.
* It is independent of CPU frequency.
* The unit of a clock tick is the jiffy. The system's interactive response depends on the clock frequency.
- For example: a jiffy may be 10 ms (100 Hz) or 1 ms (1000 Hz) depending on the implementation.
* On each clock tick, the kernel updates the amount of time that the current process has spent in system mode and in user mode.
* Linux also supports process-specific interval timers: processes can use system calls to set up timers that send signals to themselves when the timers expire. These timers can be single-shot or periodic.
Process Scheduling
* The job of a scheduler is to select the most deserving process to run out of all of the runnable processes in the run queue.
* Implements fair scheduling to avoid starvation.
* Implements a suitable scheduling policy.
* Updates the state of the processes on every clock tick (jiffy).
* Policy - FIFO, Round Robin, Shortest Job First, FILO, priority based, etc.
* Priority - a higher-priority process will be allowed to run.
* Pre-emptive and non-preemptive scheduling.
* rt_priority - many UNIX variants support a real-time scheduling priority range.

Priority Range
Scheduling priorities (in a typical UNIX system) have integer values between 0 and 127, with smaller numbers meaning higher priorities.
* For Solaris: 0 to 169
* For Linux: 0 to 139
Process Scheduling: Linux
* The Linux kernel implements two separate priority ranges.
* The first is the nice value, a number from -20 to 19 with a default of zero. Larger nice values correspond to a lower priority.
* A process with a nice value of -20 receives the maximum time slice, whereas a process with a nice value of 19 receives the minimum time slice.
* Time slice: minimum 10 ms, default 150 ms and maximum 300 ms.
* The second range is the real-time priority.
* By default, it ranges from zero to 99.
* All real-time processes are at a higher priority than normal processes.
* Linux implements real-time priorities in accordance with POSIX.

* Linux provides two real-time scheduling policies, SCHED_FIFO and SCHED_RR.
* The normal, non-real-time scheduling policy is SCHED_OTHER.
* SCHED_FIFO runs without time slices, so a process can run until it blocks, explicitly yields the processor, or is preempted by a higher-priority real-time process.
* SCHED_RR is identical to SCHED_FIFO except that each process can only run until it exhausts a predetermined time slice.

Scheduler System Calls
nice()                      Set a process's nice value
sched_setscheduler()        Set a process's scheduling policy
sched_getscheduler()        Get a process's scheduling policy
sched_setparam()            Set a process's real-time priority
sched_getparam()            Get a process's real-time priority
sched_get_priority_max()    Get the maximum real-time priority
sched_get_priority_min()    Get the minimum real-time priority
sched_rr_get_interval()     Get a process's timeslice value
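Two of the calls in the table above can be combined to ask the running system for the priority range of a policy. This is a minimal sketch; the helper name rt_priority_range is an assumption made for the illustration, and the values returned depend on the kernel in use.

```c
#include <sched.h>

/* Hypothetical helper: stores the lowest and highest static priority that
   the system accepts for `policy` (SCHED_FIFO, SCHED_RR or SCHED_OTHER).
   Returns 0 on success, -1 if either query fails. */
int rt_priority_range(int policy, int *min, int *max)
{
    *min = sched_get_priority_min(policy);
    *max = sched_get_priority_max(policy);
    return (*min == -1 || *max == -1) ? -1 : 0;
}
```

Querying the range at run time, rather than hard-coding numbers, is the portable approach before calling sched_setscheduler(). For SCHED_OTHER both values are 0 on Linux, because normal processes are ordered by their nice value instead.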
Process Creation
A parent process creates child processes, which in turn create other processes, forming a tree of processes.
Resource sharing (possible models)
* Parent and children share all resources.
* Children share a subset of the parent's resources.
* Parent and child share no resources.
Execution
* Parent and children execute concurrently.
* Parent waits until children terminate.
Address space
* Child is a duplicate of the parent.
* Child has a program loaded into it.
* All statements after the fork() system call in a program are executed by two processes: the original process that called fork(), plus the new process that is created by fork().

main()
{
    printf("Hello fork: %d\n", fork());
}

- Hello fork: 0 (printed by the child)
- Hello fork: x, with x > 0 (printed by the parent; x is the child's PID)
- Hello fork: -1 (printed once, by the parent, if fork() failed)

Parent and Child
pid = fork();
if (pid == 0) {
    /* child code */
} else {
    /* parent code */
    wait(0);    /* or */
    waitpid(pid, ....);
}
Zombie State and Orphan Process
* When a child process exits, its exit status has to be delivered to the parent process.
* Until the parent collects that status with wait() or waitpid() (for example because it is busy or suspended), the kernel keeps the terminated child's entry in the process table.
* Such a state is called Zombie.
* If the parent exits before the child, the child becomes an orphan process and is adopted by the init process, which collects its exit status.
Copy on Write (COW)
* Instead of copying the address space of the parent, UNIX uses the COW technique for economical use of memory pages.
* The parent's address space is not copied; it is shared by both the parent and the child process, but the memory pages are marked as write protected.
* If the parent or the child modifies a page, the kernel makes a private copy of just that page for the writing process.
* Advantage: the kernel can defer or avoid copying the parent process's address space.
execl
To run a new program in a process, you use one of the "exec" family of calls (such as "execl") and specify the following:
* the pathname of the program to run
* the name of the program
* each parameter to the program
* (char *)0 or NULL as the last parameter to specify the end of the parameter list
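The four items above map directly onto the arguments of execl(). The sketch below is illustrative: the helper name run_shell_exit_code is an assumption, and it presumes that /bin/sh exists on the system. A child replaces itself with the shell, and the parent collects the shell's exit code.

```c
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs `/bin/sh -c "exit 7"` in a child process and
   returns the child's exit code, or -1 on error. */
int run_shell_exit_code(void)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        execl("/bin/sh",        /* pathname of the program to run */
              "sh",             /* name of the program (argv[0]) */
              "-c", "exit 7",   /* parameters to the program */
              (char *)NULL);    /* end of the parameter list */
        _exit(127);             /* reached only if execl() failed */
    }
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

execl() returns only on failure, which is why the line after it can unconditionally report an error.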
Text Portion
* The user context consists of the portions accessible to the process while running in user mode.
* The text portion of a process contains the actual machine instructions that are executed by the hardware.
* When a program is executed by the OS, the text portion is read into memory from its disk file, unless the OS supports shared text and a copy of the program is already being executed.

Data Portion
* The data portion contains the program's data. It is possible for this to be divided into 3 pieces.
* Initialized read-only data contains elements that are initialized by the program and are read only while the process is executing.
* Initialized read-write data contains data elements that are initialized by the program and may have their values modified during execution of the process.
* Uninitialized data contains data elements that are not initialized by the program but are set to zero before execution starts.
* The heap is used while a process is running to allocate more data space dynamically to the process.

Stack Portion
* The stack is used dynamically while the process is running to contain the stack frames that are used by many programming languages.
* The stack frames contain the return address linkage for each function call and also the data elements required by a function.
* A gap is shown between heap and stack to indicate that many OSes leave some room between these 2 portions, so that both can grow dynamically.

Kernel Context
* The kernel context of a process is maintained by and accessible only to the kernel. This area contains info that the kernel needs to keep track of the process and to stop and restart the process while other processes are allowed to execute.
Daemon Process
Introduction
* Daemon processes start during system startup.
- Mostly started by the initialization script /etc/rc
* They frequently spawn other processes to handle service requests.
* Wait for an event to occur.
* Perform some specified task on a periodic basis (cron job).
* Perform the requested service and wait.
- Example: print server

Characteristics
* Executed as a background process
* Orphan process
* No controlling terminal
* Run with super user privileges
* Process group leaders
* Session leaders
How to daemonize
1. Call umask to set the file mode creation mask to a known value, usually 0.
2. Call fork and have the parent exit. The child inherits the process group ID of the parent but gets a new process ID, so we're guaranteed that the child is not a process group leader. This is a prerequisite for the call to setsid that is done next.
3. Call setsid to create a new session. Three things happen: the process (a) becomes the leader of a new session, (b) becomes the leader of a new process group, and (c) is disassociated from its controlling terminal.
4. Change the current working directory to the root directory. The current working directory inherited from the parent could be on a mounted file system, which could then not be unmounted while the daemon runs.
5. Unneeded file descriptors should be closed. This prevents the daemon from holding open any descriptors that it may have inherited from its parent (which could be a shell or some other process).
6. Some daemons open file descriptors 0, 1, and 2 to /dev/null so that any library routines that try to read from standard input or write to standard output or standard error will have no effect.
$ ps -axj #to get all daemon process, does not have terminal
#include "apue.h"
#include <syslog.h>
#include <fcntl.h>
#include <sys/resource.h>

void
daemonize(const char *cmd)
{
    int i, fd0, fd1, fd2;
    pid_t pid;
    struct rlimit rl;
    struct sigaction sa;

    /*
     * Clear file creation mask.
     */
    umask(0);

    /*
     * Get maximum number of file descriptors.
     */
    if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
        err_quit("%s: can't get file limit", cmd);

    /*
     * Become a session leader to lose controlling TTY.
     */
    if ((pid = fork()) < 0)
        err_quit("%s: can't fork", cmd);
    else if (pid != 0) /* parent */
        exit(0);
    setsid();

    /*
     * Ensure future opens won't allocate controlling TTYs.
     */
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGHUP, &sa, NULL) < 0)
        err_quit("%s: can't ignore SIGHUP", cmd);
    if ((pid = fork()) < 0)
        err_quit("%s: can't fork", cmd);
    else if (pid != 0) /* parent */
        exit(0);

    /*
     * Change the current working directory to the root so
     * we won't prevent file systems from being unmounted.
     */
    if (chdir("/") < 0)
        err_quit("%s: can't change directory to /", cmd);

    /*
     * Close all open file descriptors.
     */
    if (rl.rlim_max == RLIM_INFINITY)
        rl.rlim_max = 1024;
    for (i = 0; i < rl.rlim_max; i++)
        close(i);

    /*
     * Attach file descriptors 0, 1, and 2 to /dev/null.
     */
    fd0 = open("/dev/null", O_RDWR);
    fd1 = dup(0);
    fd2 = dup(0);

    /*
     * Initialize the log file.
     */
    openlog(cmd, LOG_CONS, LOG_DAEMON);
    if (fd0 != 0 || fd1 != 1 || fd2 != 2) {
        syslog(LOG_ERR, "unexpected file descriptors %d %d %d",
            fd0, fd1, fd2);
        exit(1);
    }
}
1. wait, waitpid
#include <sys/types.h>
#include <sys/wait.h>
pid_t wait(int *status);
pid_t waitpid(pid_t pid, int *status, int options);
int waitid(idtype_t idtype, id_t id, siginfo_t *infop, int options);
wait, waitpid - wait for process to change state

A state change is considered to be: the child terminated; the child was stopped by a signal; or the child was resumed by a signal. In the case of a terminated child, performing a wait allows the system to release the resources associated with the child; if a wait is not performed, then the terminated child remains in a "zombie" state.
If a child has already changed state, then these calls return immediately. Otherwise they block until either a child changes state or a signal handler interrupts the call (assuming that system calls are not automatically restarted using the SA_RESTART flag of sigaction(2)).
The call wait(&status) is equivalent to:
waitpid(-1, &status, 0);

The value of pid can be:
< -1   meaning wait for any child process whose process group ID is equal to the absolute value of pid.
-1     meaning wait for any child process.
0      meaning wait for any child process whose process group ID is equal to that of the calling process.
> 0    meaning wait for the child whose process ID is equal to the value of pid.
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int i = 0;
    pid_t pid;

    printf("Ready to fork\n");
    pid = fork();
    if (pid == 0) {
        printf("Child starts\n");
        for (i = 0; i < 1000; i++)
            printf("%d\t", i);
        printf("Child ends\n");
        /* sleep(30); */   /* uncomment this, and remove the wait below, to get an orphaned child */
    } else {
        wait(0);           /* comment this out and sleep instead to get the child as a zombie */
        printf("Parent process\n");
    }
    return 0;
}
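The status word filled in by waitpid() also tells the parent whether the child was ended by a signal. The following is a minimal sketch; the helper name signal_that_ended_child is an assumption made for the illustration. It uses the WIFSIGNALED and WTERMSIG macros from <sys/wait.h>.

```c
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that sends `sig` to itself, waits
   for that specific child, and returns the number of the signal that
   terminated it. Returns 0 if the child exited normally, -1 on error. */
int signal_that_ended_child(int sig)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        raise(sig);
        _exit(0);           /* reached only if the signal was not fatal */
    }
    if (waitpid(pid, &status, 0) != pid)    /* pid > 0: wait for this child */
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return 0;
}
```

Passing SIGKILL should yield SIGKILL, since that signal cannot be caught or ignored, while passing 0 sends no signal and the child exits normally.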
Address Translation and page fault handling
Demand Paging defined
Process Organization in Memory
Factors to be considered while designing secondary memory
Latency, Throughput and Bandwidth
Latency: amount of time for a single operation to execute.
Throughput: rate at which operations get executed, normally expressed as operations/second. In sequential processing, throughput = 1/latency.
Bandwidth: total rate at which data moves between processor and memory; the product of throughput and data width.
Pipelining, Parallelism and Pre-charging
Memory systems can be pipelined in the same way that processors are pipelined, allowing operations to overlap execution to improve throughput.
Many memory technologies require a certain delay (idle time) between operations to pre-charge circuitry for the next access.
Attaching multiple memories to the processor's memory bus allows parallelism. This increases the rate at which memory is accessed without increasing the pin count of the processor.

Two kinds of systems support parallelism: replicated and banked.
Replicated memory provides multiple copies of the entire memory. A store needs to write into all copies, so stores are more expensive than loads.
Banked memory divides or interleaves the data across the memories.
Example:What is the bandwidth of a memory system with a latency of 40 ns that transfers 1 byte per operation and is pipelined to allow 4 operations to overlap execution (assume no pipelining overhead ) ?
Dividing latency 40 ns by number of overlapped operations ( 4 ) gives a rate of 1 operation per 10 ns as the throughput of the memory system. At 1 byte of data per operation, this gives a bandwidth of 100 Mbyte/sec.
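The arithmetic of this example can be written as a small function. This is a sketch only; the function name bandwidth_bytes_per_sec and its parameter names are assumptions, and it models the idealised case stated in the example (no pipelining overhead).

```c
/* Hypothetical helper: bandwidth in bytes per second of a memory system
   with the given latency (in ns) that overlaps `overlapped_ops`
   operations and moves `bytes_per_op` bytes per operation. */
double bandwidth_bytes_per_sec(double latency_ns, int overlapped_ops,
                               double bytes_per_op)
{
    double ns_per_op = latency_ns / overlapped_ops;   /* 40 / 4 = 10 ns */
    double ops_per_sec = 1e9 / ns_per_op;             /* throughput */
    return ops_per_sec * bytes_per_op;                /* bandwidth */
}
```

For the numbers above, bandwidth_bytes_per_sec(40.0, 4, 1.0) gives 1e8 bytes per second, that is 100 Mbyte/sec.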
Levels in the Memory Hierarchy
Cache:
1. Generally implemented using SRAM.
2. Uses hardware to keep track of the addresses stored in it.
3. Tends to be small (capacity).
4. Small block sizes (32 to 128 bytes).

Main Memory:
1. Generally implemented using DRAM.
2. Uses software to keep track of addresses.
3. Larger capacity (a few MB to several gigabytes).
4. Larger block sizes (several kilobytes).

Virtual Memory:
1. Implemented using disks.
2. Contains all of the data in the memory system.
Some terminology...
Hit: when an address is found at a given level of the hierarchy.
Miss: when an address is NOT found at a given level of the hierarchy.
Hit Rate: % of references that reach a given level and result in hits.
Miss Rate: % of references that reach a given level and result in misses.
Note: Hit Rate + Miss Rate = 100% ALWAYS.

When a miss occurs, a BLOCK of data is brought in from a lower level into the current level of the hierarchy. As time progresses, the current level may fill up and run out of free space. A block must be removed to accommodate the new block. This is called eviction or replacement. The method used to decide which block to remove is called the replacement policy.
To simplify evicting data blocks, many memory systems maintain a property called inclusion: the presence of an address at a given level of a memory hierarchy GUARANTEES that the address is present in ALL LOWER LEVELS of the memory system.
Computing average access times in a memory hierarchy...
If we know the hit rate and access time (time to complete a request that hits) for each level in the hierarchy, we can compute the average access time of the memory hierarchy. For each level in the hierarchy, the average access time is
(T_hit x P_hit) + (T_miss x P_miss)
where
T_hit = time to resolve requests that hit in the level
P_hit = hit rate of the level, expressed as a probability
T_miss = average access time of the level below this one
P_miss = miss rate of the level
Note that the hit rate of the lowest level is 100%, so we start at the bottom and compute the average access time of each level upwards in the hierarchy.
Example:
A memory system contains a cache, a DRAM and a virtual store. The access time of the cache is 5 ns with a hit rate of 80%, whereas the access time of the DRAM is 100 ns with a 99.5% hit rate. The access time of the virtual store is 10 ms. What is the average access time of the hierarchy?
We start at the bottom and work upwards:
The hit rate of the virtual store is always 100%.
Average access time for requests that reach the DRAM
= (100 ns x 0.995) + (10 ms x 0.005) = 50,099.5 ns
Average access time for requests that reach the cache (which is ALL REQUESTS!)
= (5 ns x 0.80) + (50,099.5 ns x 0.20) = 10,024 ns
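The bottom-up computation used in this example generalises to any number of levels. The sketch below is an illustration; the struct and function names (level, average_access_ns) are assumptions. Levels are listed from the top (cache) to the bottom (virtual store), and the loop walks them in reverse.

```c
#include <stddef.h>

/* Hypothetical description of one level of the hierarchy. */
struct level {
    double access_ns;   /* T_hit: time to resolve a request that hits */
    double hit_rate;    /* P_hit: as a probability, 1.0 for the lowest level */
};

/* Hypothetical helper: average access time in ns of a hierarchy whose
   levels are given from top (index 0) to bottom (index n - 1). */
double average_access_ns(const struct level *levels, size_t n)
{
    double avg = 0.0;   /* average access time of the level below */
    size_t i = n;

    /* Start at the bottom and work upwards:
       avg = T_hit * P_hit + T_miss * P_miss, with T_miss = previous avg. */
    while (i-- > 0)
        avg = levels[i].access_ns * levels[i].hit_rate
            + avg * (1.0 - levels[i].hit_rate);
    return avg;
}
```

With the three levels from the example (5 ns at 0.80, 100 ns at 0.995, and 10 ms, written as 1e7 ns, at 1.0) the function gives about 10,023.9 ns, which the example rounds to 10,024 ns.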
SRAM and DRAM Chips
These have the same basic structure (shown in next slide).
Data is stored in a rectangular array of bit cells, each holding 1 bit. To read data from the array, half of the address to be read (generally the high-order bits) is fed into a decoder. The decoder asserts (drives high) the word line corresponding to the value of its input bits, which causes all of the bit cells in the corresponding row to drive their values onto the bit lines that they are connected to.

The other half of the address is then used as an input to a multiplexer that selects the appropriate bit line and drives its output onto the output pins of the chip.

To store data on the chip, the same process is used, except that the value to be written is driven on the appropriate bit line and written into the selected bit cell.
The malloc() function allocates size bytes and returns a pointer to the allocated memory. The memory is not initialized. If size is 0, then malloc() returns either NULL, or a unique pointer value that can later be successfully passed to free().
The free() function frees the memory space pointed to by ptr, which must have been returned by a previous call to malloc(), calloc(), or realloc().
The calloc() function allocates memory for an array of nmemb elements of size bytes each and returns a pointer to the allocated memory. The memory is set to zero. If nmemb or size is 0, then calloc() returns either NULL, or a unique pointer value that can later be successfully passed to free().
The realloc() function changes the size of the memory block pointed to by ptr to size bytes. The contents will be unchanged in the range from the start of the region up to the minimum of the old and new sizes.
#include <alloca.h>
void *alloca(size_t size);
DESCRIPTION
The alloca() function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed when the function that called alloca() returns to its caller.
Virtual Memory
Each program has a virtual address space, which is the set of addresses that the program uses for load and store operations.
The physical address space is the set of addresses used to reference locations in main memory.
The virtual address space is divided into pages some of which reside inside a page frame ( slots in main memory ) while others reside on the disk. Pages are always aligned on a multiple of the page size so that the addresses never overlap.
The terms virtual page and physical page are used to describe a page of data in the virtual and physical address spaces respectively.
Pages that have been loaded into main memory are said to have been mapped.
Virtual memory allows a computer to act as if its main memory were much larger than it actually is.
Address Translation and page fault handling
Demand Paging defined
Process Organization in Memory
When a program references a virtual address, it cannot tell, except by timing the latency of the operation, whether the virtual address was resident in the main memory or whether it had to be fetched from disk.
This makes it possible for the computer to shuffle pages in and out of the main memory exactly like data is brought in and out of the cache.
Address Translation
Programs running on systems with Virtual Memory use Virtual Addresses as the arguments to load and store instructions.
The main memory uses Physical Addresses to record locations where data is actually stored.
Whenever a program uses a Virtual Address, this must be converted into a Physical Address and this process is known as Address Translation.
When a program accesses a memory location, the O.S accesses a Page Table, which is a data structure that contains the mapping of the virtual address to the physical address.
If the virtual page is mapped ( present in memory ) then the physical address is retrieved and the operation proceeds.
If the virtual page is NOT mapped, then a page fault occurs and the O.S fetches the page from the hard disk, loading it into a page frame, and updating the page table with the new translation. Once the page has been read into memory from disk, and the page table updated, the physical address of the page can be determined and the memory reference completed.
If all the page frames already contain data, one of them must be evicted to the disk to make room for the incoming data. The replacement policies used to select the page that is evicted are similar to the ones for set-associative caches.
Because both virtual and physical pages are always aligned on a multiple of their size, the page table does not need to keep track of the full virtual or physical address of a page that is mapped. Instead virtual addresses are divided into a Virtual Page Number or VPN and a set of bits that describe an offset from the start of the virtual page to the virtual address. Similarly, the physical pages are divided into Physical Page Numbers or PPN and an offset from the start of the physical page to the physical address.
The virtual and physical pages in a given system are generally the same size, so the number of offset bits (log2 of the page size) is the same for virtual and physical addresses.
The VPN and PPN may be of different lengths. For example, on 64-bit systems, the virtual addresses are generally much longer than physical addresses.
The page table is accessed using the virtual page frame number as an offset.
Virtual page frame 5 would be the 6th element of the table (0 is the first element).
To translate a virtual address into a physical one, the processor must first work out the virtual address's page frame number and the offset within that virtual page. By making the page size a power of 2 this can be easily done by masking and shifting. Assuming a page size of 0x2000 bytes (which is decimal 8192) and an address of 0x2194 in process Y's virtual address space, the processor would translate that address into offset 0x194 into virtual page frame number 1.
V - Valid; if set, this PTE is valid.
FOE - "Fault on Execute"; whenever an attempt to execute instructions in this page occurs, the processor reports a page fault and passes control to the operating system.
FOW - "Fault on Write"; as above, but page fault on an attempt to write to this page.
FOR - "Fault on Read"; as above, but page fault on an attempt to read from this page.
ASM - Address Space Match; this is used when the operating system wishes to clear only some of the entries from the Translation Buffer.
KRE - Code running in kernel mode can read this page.
URE - Code running in user mode can read this page.
GH - Granularity hint; used when mapping an entire block with a single Translation Buffer entry rather than many.
KWE - Code running in kernel mode can write to this page.
UWE - Code running in user mode can write to this page.
page frame number - For PTEs with the V bit set, this field contains the physical Page Frame Number for this PTE. For invalid PTEs, if this field is not zero, it contains information about where the page is in the swap file.
The following two bits are defined and used by Linux:
_PAGE_DIRTY - if set, the page needs to be written out to the swap file.
_PAGE_ACCESSED - used by Linux to mark a page as having been accessed.
TLB, Translation Lookaside Buffers
A major disadvantage of using page tables is that a page table must be accessed for every memory reference. On a system with a single-level page table, this doubles the number of memory accesses, since each load or store operation requires one memory reference to access the appropriate page table entry and one to perform the actual load/store. This greatly increases the latency of a memory reference.
The problem is even greater with multi-level page tables, because multiple references are required to traverse the page table. To reduce this penalty, CPUs that incorporate virtual memory use Translation Lookaside Buffers (TLBs) that act as caches for the page table. Whenever a program performs a memory reference, the virtual address is sent to the TLB to determine if it contains a translation for that address. If so, the TLB returns the physical address and the memory reference continues.
If not, a TLB miss occurs and the system searches the page table for a translation. Some systems provide hardware support for handling a TLB miss, while others require the OS to access the page table in software.
1. Hit in the TLB: the TLB contains the translation and the physical address is returned immediately.
2. TLB miss, but page mapped: the system accesses the page table in memory to find the translation for the virtual address, copies that translation into the TLB, and completes the memory reference.
3. TLB miss and page not mapped: the system accesses the page table and finds that the page is not mapped. This results in a page fault. The O.S loads the page's data from disk in the same manner as a virtual memory system that does not contain a TLB.
TLB misses and page faults are handled very differently by the O.S because of the difference in the amount of time it takes to resolve each event.
TLB misses generally take a short time to resolve if the page is mapped, normally a few hundred cycles, so the user program can simply wait for the miss to complete.
TLB misses that result in a page fault can take a few milliseconds which is the amount of time slice generally given to a process. Therefore, a page fault can trigger a context switch through invoking the scheduler while the page fault is being resolved.
TLB Entry
TLBs are organized similarly to caches, having an associativity and a number of sets. While cache sizes are typically described in bytes, TLB sizes are given as the number of entries or translations they contain, since the amount of space taken up by each entry is mostly irrelevant to the performance of the system.
Thus a 128-entry, 4-way set-associative TLB would have 32 sets, each containing 4 entries.
The TLB entry contains the VPN of the page that it is a translation for, which is compared to the VPN of the address of a memory reference to determine if a hit has occurred.
Like a cache's tag array entry, bits of the VPN used to select an entry in the TLB are omitted to save space. All the bits of the PPN are stored however, since they may differ from the corresponding bits in the VPN.
Demand Paging
As there is much less physical memory than virtual memory, the operating system must be careful that it does not use the physical memory inefficiently. One way to save physical memory is to only load virtual pages that are currently being used by the executing program.
This technique of only loading virtual pages into memory as they are accessed is known as demand paging.
When a process attempts to access a virtual address that is not currently in memory, the processor cannot find a page table entry for the virtual page referenced. For example, in the previous figure there is no entry in process X's page table for virtual page frame number 2, so if process X attempts to read from an address within virtual page frame number 2 the processor cannot translate the address into a physical one. At this point the processor notifies the operating system that a page fault has occurred.
If the faulting virtual address is invalid this means that the process has attempted to access a virtual address that it should not have. Maybe the application has gone wrong in some way, for example writing to random addresses in memory. In this case the operating system will terminate it, protecting the other processes in the system from this rogue process.
If the faulting virtual address was valid but the page that it refers to is not currently in memory, the operating system must bring the appropriate page into memory from the image on disk.
The fetched page is written into a free physical page frame and an entry for the virtual page frame number is added to the process's page table. The process is then restarted at the machine instruction where the memory fault occurred. This time the virtual memory access is made, the processor can make the virtual to physical address translation, and so the process continues to run.
Linux uses demand paging to load executable images into a process's virtual memory. Whenever a command is executed, the file containing it is opened and its contents are mapped into the process's virtual memory. This is done by modifying the data structures describing this process's memory map and is known as memory mapping.
However, only the first part of the image is actually brought into physical memory. The rest of the image is left on disk. As the image executes, it generates page faults and Linux uses the process's memory map in order to determine which parts of the image to bring into memory for execution.
9. Multi Thread Programming
Creating multiple threads
Parent synchronization with other Thread
System calls
Introduction
* A thread is a sequential flow of control through a program.
* If a process is defined as a program in execution, then a thread is defined as a function in execution.
* If a thread is created, it will execute a specified function.
* Two types of threading:
- Single threading
- Multi threading
POSIX Thread
The threads created within a process share:
* instructions of the process
* process address space and data
* open file descriptors
* signal handlers (dispositions)
* current working directory, uid and gid
Each created thread maintains its own:
* thread identification number (tid)
* pc, sp, set of registers
* stack
* signal mask
* priority
* scheduling policy
Advantages of Threads:
Takes less time for:
* Creation of a new thread
* Termination of a thread
* Communication between threads, which is also easier.
There are two broad categories of thread implementation:
1. User level threads (ULT)
2. Kernel level threads (KLT, also called kernel-supported threads or lightweight processes)
Thread management
* With user level threads, thread management is done by the application and the kernel is not aware of the existence of threads.
* The thread library contains code for creating and destroying threads, passing messages and data between threads, scheduling thread execution, and saving and restoring thread contexts.
* The threads of the application are allocated to a single process managed by the kernel.
* All the activity takes place in user space and within a single process. The kernel continues to schedule the process as a unit and assigns a single execution state to that process.
ULT
Advantages:
* Thread switching does not require kernel mode.
* Scheduling can be application specific.
* Can run on any OS.
Disadvantages:
* When a thread executes a blocking system call, not only is that thread blocked, but all the threads within the process are blocked.
KLT
Kernel Level Threads:
* Thread management is done by the kernel
- Advantage: If one thread in a process is blocked, the kernel can schedule another thread of the same process.
Day 4 Morning
- Disadvantage: Transfer of control from one thread to another within the same process requires a mode switch to the kernel
Advantages of Multi Threading
* Improve application responsiveness
* Use multiprocessors more efficiently
* Improve program structure
* Use fewer system resources
* Specific applications in uniprocessor machines
Applications
* A file server on a LAN
* Graphical User Interfaces (GUIs)
* Web applications
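The parent waits in pthread_join() for its child threads to finish.
Hello Thread Example
#include <stdio.h>
#include <pthread.h>

/* A thread start routine takes a void * argument and returns void *. */
void *thread_function (void *arg)
{
    printf ("Hello POSIX Thread\n");
    /* pthread_t is an opaque type; the cast is for display only. */
    printf ("Thread id: %lu\n", (unsigned long) pthread_self ());
    return NULL;
}

int main (void)
{
    pthread_t mythread;
    pthread_create (&mythread, NULL, thread_function, NULL);
    pthread_join (mythread, NULL);
    return 0;
}
$ cc thread.c -lpthread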
The pthread_create() function shall create a new thread, with attributes specified by attr, within a process. If attr is NULL, the default attributes shall be used. If the attributes specified by attr are modified later, the thread's attributes shall not be affected. Upon successful completion, pthread_create() shall store the ID of the created thread in the location referenced by thread.
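#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>

void *thread_fun(void *arg);
char message[] = "hello";   /* declaration not shown on the slide; initial value assumed */

int main()
{
    int res;
    pthread_t a_thread;
    void *thread_result;

    res = pthread_create(&a_thread, NULL, thread_fun, (void *)message);
    if (res != 0) {
        /* pthread functions return the error number instead of setting errno */
        fprintf(stderr, "unable to create thread: %s\n", strerror(res));
        exit(1);
    }
    printf("waiting for thread to finish\n");
    /* Thread joining, catch exit value from the thread */
    res = pthread_join(a_thread, &thread_result);
    printf("thread joined, it returned %s\n", (char *)thread_result);
    printf("Message is now %s\n", message);
    exit(0);
}

void *thread_fun(void *arg)
{
    printf("thread fun, arg is %s\n", (char *)arg);
    sleep(3);
    strcpy(message, "bye");
    /* exit with return value */
    pthread_exit("thank you");
}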
2. pthread_key_create
pthread_key_create - thread-specific data key creation
#include <pthread.h>
int pthread_key_create(pthread_key_t *key, void (*destructor)(void*));
The pthread_key_create() function shall create a thread-specific data key visible to all threads in the process. Key values provided by pthread_key_create() are opaque objects used to locate thread-specific data. Although the same key value may be used by different threads, the values bound to the key by pthread_setspecific() are maintained on a per-thread basis and persist for the life of the calling thread.
Upon key creation, the value NULL shall be associated with the new key in all active threads. Upon thread creation, the value NULL shall be associated with all defined keys in the new thread.
/* Close the log file pointer THREAD_LOG. */
void close_thread_log (void* thread_log)
{
    fclose ((FILE*) thread_log);
}

void* thread_function (void* args)
{
    char thread_log_filename[32];
    FILE* thread_log;
    /* Generate the filename for this thread's log file. */
    snprintf (thread_log_filename, sizeof thread_log_filename, "thread%d.log", (int) pthread_self ());
    /* Open the log file. */
    thread_log = fopen (thread_log_filename, "w");
    /* Store the file pointer in thread-specific data under thread_log_key. */
    pthread_setspecific (thread_log_key, thread_log);
    write_to_thread_log ("Thread starting.");
    /* Do work here... */
    return NULL;
}

main ()
{
    int i;
    pthread_t threads[5];
    /* Create a key to associate thread log file pointers in
       thread-specific data. Use close_thread_log to clean up the file
       pointers. */
The pthread_mutex_destroy() function shall destroy the mutex object referenced by mutex; the mutex object becomes, in effect, uninitialized. An implementation may cause pthread_mutex_destroy() to set the object referenced by mutex to an invalid value. A destroyed mutex object can be reinitialized using pthread_mutex_init(); the results of otherwise referencing the object after it has been destroyed are undefined.
It shall be safe to destroy an initialized mutex that is unlocked. Attempting to destroy a locked mutex results in undefined behavior.
The pthread_mutex_init() function shall initialize the mutex referenced by mutex with attributes specified by attr. If attr is NULL, the default mutex attributes are used; the effect shall be the same as passing the address of a default mutex attributes object. Upon successful initialization, the state of the mutex becomes initialized and unlocked.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>

void *thread_fun(void *arg);
pthread_mutex_t work_mutex;   /* protects work_area and time_to_exit */
char work_area[1024];
int time_to_exit = 0;

int main()
{
    int res;
    pthread_t a_thread;
    void *thread_result;

    res = pthread_mutex_init(&work_mutex, NULL);   /* initialize mutex, default attr */
    res = pthread_create(&a_thread, NULL, thread_fun, NULL);
    pthread_mutex_lock(&work_mutex);               /* the main thread owns the buffer first */
    printf("input some text, enter end to finish\n");
    while (!time_to_exit) {
        fgets(work_area, 1024, stdin);
        /* unlock: the worker thread is waiting for the buffer */
        pthread_mutex_unlock(&work_mutex);
        while (1) {
            pthread_mutex_lock(&work_mutex);       /* lock, it is our turn */
            if (work_area[0] != '\0') {            /* worker has not consumed the text yet */
                pthread_mutex_unlock(&work_mutex);
                sleep(1);
            } else
                break;
        }
    }
    pthread_mutex_unlock(&work_mutex);
    printf("waiting for thread to finish\n");
    res = pthread_join(a_thread, &thread_result);
    printf("thread joined, it returned %s\n", (char *)thread_result);
    pthread_mutex_destroy(&work_mutex);
    exit(0);
}

void *thread_fun(void *arg)
{
    sleep(1);                                      /* let the main thread read some data */
    pthread_mutex_lock(&work_mutex);
    while (strncmp("end", work_area, 3) != 0) {
        printf("you entered %d characters\n", (int)strlen(work_area) - 1);
        work_area[0] = '\0';                       /* mark the buffer as consumed */
        pthread_mutex_unlock(&work_mutex);
        sleep(1);                                  /* let the main thread do its job */
        pthread_mutex_lock(&work_mutex);
        while (work_area[0] == '\0') {             /* nothing new yet */
            pthread_mutex_unlock(&work_mutex);
            sleep(1);
            pthread_mutex_lock(&work_mutex);
        }
    }
    time_to_exit = 1;
    work_area[0] = '\0';
    pthread_mutex_unlock(&work_mutex);
    pthread_exit("done");   /* end of the listing is missing on the slide; return value assumed */
}
On the command line a pipe is represented as "|"
* It can be used in the shell to link two or more commands
- For example ls -Rl | wc
* The two ends of a pipe are represented as a set of two descriptors.
* A pipe is used to communicate between related processes (common ancestor). Normally, a pipe is created by a process, that process calls fork, and the pipe is used between the parent and the child.
* Half duplex
* Data is passed in order.
* A pipe uses a circular buffer in the kernel with limited capacity (64 KiB by default on current Linux); it does not store data persistently.
* The read and write system calls are blocking calls by default: read blocks on an empty pipe and write blocks on a full one.
Two way Communication
* Create two pipes, say fd1 and fd2.
* Four descriptors for each process (fd1[0], fd1[1], fd2[0], fd2[1])
* Parent closes read end of fd1 and write end of fd2
- close(fd1[0]); close(fd2[1]);
* Child closes read end of fd2 and write end of fd1
- close(fd2[0]); close(fd1[1]);
Pipe : Advantages & Disadvantages
Advantages:
* Simplest form of IPC
* Persistence in process level
* Can be used in shell
Disadvantages:
* Cannot be used to communicate between unrelated processes
popen and pclose Functions
The function popen does a fork and exec to execute the cmdstring and returns a standard I/O file pointer.
If type is "r", the file pointer is connected to the standard output of cmdstring.
If type is "w", the file pointer is connected to the standard input of cmdstring.
#include <stdio.h>
FILE *popen(const char *cmdstring, const char *type);
Returns: file pointer if OK, NULL on error
int pclose(FILE *fp);
Returns: termination status of cmdstring, or -1 on error
* A FIFO is created on a file system as a special file
* It can be used to communicate between unrelated processes
* It can be reused.
* It persists till the file is deleted.
FIFO Creation
* A FIFO can be created in a shell by using the mknod or mkfifo command.
- mknod myfifo p
- mkfifo -m a=rw myfifo
* In a C program the mknod system call or the mkfifo library function can be used.
- int mkfifo (const char *file_name, mode_t mode);
- int mknod (const char *file_name, mode_t mode, dev_t dev);
* Ex: mknod("./MYFIFO", S_IFIFO|0666, 0);
Using FIFO
* Once a FIFO is created, either from a shell or through a program, the file related system calls (open, read, write, select, close etc.) are used to access the FIFO.
* For example: Process 1 may open a FIFO in write only mode and write some data.
* Process 2 may open the FIFO in read only mode, read the data and display it on the monitor.
FIFO: Disadvantages
* Data cannot be broadcast to multiple receivers.
* If there are multiple receivers, there is no way to direct data to a specific reader or vice versa.
* Cannot be used across a network
* Less secure than a pipe, since any process with valid access permission can access the data.
* Cannot store data
* No message boundaries. Data is treated as a stream of bytes.
unlink(FIFO1); //removing fifo from /tmp
unlink(FIFO2);
exit(0);
}
The common communication channel between a user space program and the kernel is given by the system calls.
But there is a different channel, that of the signals, used both between user processes and from the kernel to a user process.
Sending Signals
A program can signal a different program using the kill() system call with prototype
int kill(pid_t pid, int sig);
This will send the signal with number sig to the process with process ID pid. Signal numbers are small positive integers.
Signal     Value     Action  Comment
-------------------------------------------------
SIGHUP     1         Term    Hangup detected on controlling terminal or death of controlling process
SIGINT     2         Term    Interrupt from keyboard
SIGQUIT    3         Core    Quit from keyboard
SIGILL     4         Core    Illegal Instruction
SIGABRT    6         Core    Abort signal from abort(3)
SIGFPE     8         Core    Floating point exception
SIGKILL    9         Term    Kill signal
SIGSEGV    11        Core    Invalid memory reference
SIGPIPE    13        Term    Broken pipe: write to pipe with no readers
SIGALRM    14        Term    Timer signal from alarm(2)
SIGTERM    15        Term    Termination signal
SIGUSR1    30,10,16  Term    User-defined signal 1
SIGUSR2    31,12,17  Term    User-defined signal 2
SIGCHLD    20,17,18  Ign     Child stopped or terminated
SIGCONT    19,18,25  Cont    Continue if stopped
SIGSTOP    17,19,23  Stop    Stop process
SIGTSTP    18,20,24  Stop    Stop typed at terminal
SIGTTIN    21,21,26  Stop    Terminal input for background process
SIGTTOU    22,22,27  Stop    Terminal output for background process
The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored.
Signals not in the POSIX.1-1990 standard but described in SUSv2 and POSIX.1-2001.
Signal     Value     Action  Comment
--------------------------------------------------------------
SIGBUS     10,7,10   Core    Bus error (bad memory access)
SIGPOLL              Term    Pollable event (Sys V). Synonym for SIGIO
SIGPROF    27,27,29  Term    Profiling timer expired
SIGSYS     12,31,12  Core    Bad argument to routine (SVr4)
SIGTRAP    5         Core    Trace/breakpoint trap
SIGURG     16,23,21  Ign     Urgent condition on socket (4.2BSD)
SIGVTALRM  26,26,28  Term    Virtual alarm clock (4.2BSD)
SIGXCPU    24,24,30  Core    CPU time limit exceeded (4.2BSD)
SIGXFSZ    25,25,31  Core    File size limit exceeded (4.2BSD)
Various other signals.
Signal     Value     Action  Comment
-------------------------------------------------
SIGIOT     6         Core    IOT trap. A synonym for SIGABRT
SIGEMT     7,-,7     Term
SIGSTKFLT  -,16,-    Term    Stack fault on coprocessor (unused)
SIGIO      23,29,22  Term    I/O now possible (4.2BSD)
SIGCLD     -,-,18    Ign     A synonym for SIGCHLD
SIGPWR     29,30,19  Term    Power failure (System V)
SIGINFO    29,-,-            A synonym for SIGPWR
SIGLOST    -,-,-     Term    File lock lost (unused)
SIGWINCH   28,28,20  Ign     Window resize signal (4.3BSD, Sun)
SIGUNUSED  -,31,-    Core    Synonymous with SIGSYS
Blocking signals
Each process has a list (bitmask) of currently blocked signals. When a signal is blocked, it is not delivered (that is, no signal handling routine is called), but remains pending.
The sigprocmask() system call serves to change the list of blocked signals. See sigprocmask(2).
The sigpending() system call reveals what signals are (blocked and) pending.
The sigsuspend() system call temporarily replaces the signal mask and suspends the calling process until a signal is delivered.
When a signal is blocked, it remains pending, even when otherwise the process would ignore it.
wait and SIGCHLD
Whenever the child changes state (it exits, crashes, traps, stops, continues), and in particular when it dies, the parent is sent a SIGCHLD signal.
If the parent handles it, the parent can use the system call wait() or waitpid() (there are a few variations) to learn about the status of its stopped or deceased children. In the case of a deceased child, as soon as a status has been reported, the zombie vanishes.
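If the parent is not interested, it can say so explicitly (before the fork), for example by setting the disposition of SIGCHLD to SIG_IGN with signal() or sigaction(). As a result it will not hear about deceased children, and children will not be transformed into zombies. The default action for SIGCHLD is also to ignore the signal, but with the default disposition a terminated child remains a zombie until the parent waits for it.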
Returning from a signal handler
When the program was interrupted by a signal, its status (including all integer and floating point registers) was saved, to be restored just before execution continues at the point of interruption.
This means that the return from the signal handler is more complicated than an arbitrary procedure return - the saved state must be restored.
To this end, the kernel arranges that the return from the signal handler causes a jump
#include <stdio.h>
#include <signal.h>
#include <unistd.h>

void sig_fun(int);

int main(void)
{
    struct sigaction signalact;

    signalact.sa_handler = sig_fun;
    sigemptyset(&signalact.sa_mask);
    signalact.sa_flags = 0;
    sigaction(SIGINT, &signalact, 0);   /* install the handler for Ctrl-C */
    while (1) {
        printf("hello world\n");
        sleep(1);
    }
}

/* printf() is not async-signal-safe; it is used here only to keep the demo short. */
void sig_fun(int signal)
{
    printf("Hi, I got signal: %d\n", signal);
}
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

static void sighandler(int);

int main(void)
{
    int status;
    pid_t parentpid, childpid;
    sigset_t block, old;

    /* Prepare the sighandler routine to catch SIGUSR1 and SIGUSR2.
       Both handlers are installed before fork() so the child inherits them. */
    if (signal(SIGUSR1, sighandler) == SIG_ERR)
        printf("Parent: Unable to create handler for SIGUSR1\n");
    if (signal(SIGUSR2, sighandler) == SIG_ERR)
        printf("Child: Unable to create handler for SIGUSR2\n");

    /* Block SIGUSR2 so it cannot be delivered before the child waits for it. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR2);
    sigprocmask(SIG_BLOCK, &block, &old);

    parentpid = getpid();
    if ((childpid = fork()) == -1) {
        perror("fork");
        return 1;
    }
    if (childpid == 0) {
        kill(parentpid, SIGUSR1);   /* raise the SIGUSR1 signal */
        printf("Hi, child, I am here .............!\n");
        /* Child process waits for a signal */
        printf("child, waiting for signal\n");
        sigsuspend(&old);           /* atomically unblock SIGUSR2 and wait */
        printf("child done %d\n", getpid());
    } else {
        kill(childpid, SIGUSR2);    /* raise the SIGUSR2 signal */
        printf("Parent: waiting for child to terminate.....\n");
        wait(&status);              /* Parent waiting for the child termination */
        printf("parent done %d\n", getpid());
    }
    return 0;
}

/* printf() is not async-signal-safe; acceptable only in a short demo. */
static void sighandler(int signo)
{
    switch (signo) {
    case SIGUSR1:   /* Incoming SIGUSR1 signal */
        printf("Parent: Received SIGUSR1\n");
        break;
    case SIGUSR2:   /* Incoming SIGUSR2 signal */
        printf("Received SIGUSR2\n");
        break;
    default:
        printf("This should not be printed\n");
    }
    return;
}
Introduction
* Sys V IPC is implemented as a single unit.
* System V IPC provides three mechanisms, namely:
- Message Queues
- Shared Memory
- Semaphores
* The objects persist until they are explicitly deleted or the system is rebooted.
Common Attributes
Each IPC object has the following attributes:
* key
* id
* Owner
* Permission
* Size
- Message queue - used bytes, number of messages
- Shared memory - size, number of attaches, status
- Semaphore - number of semaphores in a set
* The ipc_perm structure holds the common attributes of the resources.
System Limitations
$ ipcs -l
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 32768
max total shared memory (kbytes) = 8388608
min seg size (bytes) = 1
------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767
------ Messages: Limits --------
max queues system wide = 16
max size of message (bytes) = 8192
default max size of queue (bytes) = 16384
Get a Key
* If we wish to communicate between different processes using an IPC resource, the first step is to agree on a shared unique identifier (key).
* The simplest form of the identifier is a number. It can be generated for a given mechanism by using the ftok library function, which derives the key from the name of an existing file and a project id.
* Apart from the creator, other processes that want to communicate with the creator process must agree on the key value.
* Syntax: key_t ftok (const char *filename, int id);
Get an id
The syntax for a get function is:
int xxxget (key_t key, int xxxflg); (xxx may be msg, shm or sem; shmget and semget take an additional size or nsems argument)
If successful, it returns an identifier; otherwise it returns -1 for error.
The key can be generated in three different ways:
- from the ftok library function
- by choosing some static positive integer value
- by using the IPC_PRIVATE macro
Flags commonly used with this function are IPC_CREAT and IPC_EXCL.
Control an Object
The syntax for the control function is:
int xxxctl (int xxxid, int cmd, struct xxxid_ds *buffer); (xxx may be msg or shm; semctl additionally takes a semnum argument)
If successful, the xxxctl function returns zero, otherwise it returns -1.
The command argument may be
IPC_STAT
IPC_SET
IPC_RMID
Message Queues
* A message queue overcomes FIFO limitations by storing data and preserving message boundaries.
* Create a message queue
* Send message(s) to the queue
* Any process that has permission to access the queue can retrieve message(s).
* Remove the message queue.
Each queue has the following msqid_ds structure associated with it:
struct msqid_ds {
    struct ipc_perm msg_perm;
    msgqnum_t msg_qnum;    /* # of messages on queue */
    msglen_t msg_qbytes;   /* max # of bytes on queue */
    pid_t msg_lspid;       /* pid of last msgsnd() */
    pid_t msg_lrpid;       /* pid of last msgrcv() */
    time_t msg_stime;      /* last-msgsnd() time */
    time_t msg_rtime;      /* last-msgrcv() time */
    time_t msg_ctime;      /* last-change time */
    ....
};
msgget
* int msgget (key_t key, int msgflg);
* The first argument key can be passed from the return value of the ftok function or made IPC_PRIVATE.
* To create a message queue, IPC_CREAT ORed with the access permission is set for the msgflg argument.
* Ex: msgid = msgget (key, IPC_CREAT | 0744);
      msgid = msgget (key, 0);
msgsnd
* The syntax of the function is:
* int msgsnd (int msqid, const void *msgp, size_t msgsz, int msgflg);
* Arguments:
- message queue ID
- address of the message structure
- size of the message text (not counting the mtype field)
- message flag: 0 or IPC_NOWAIT
struct mymesg {
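#include <sys/ipc.h>
#include <sys/types.h>
#include <sys/msg.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

struct message {
    long mtype;          /* message type, must be > 0 */
    char mtext[50];      /* message text */
};

int main(void)
{
    struct message m1;
    int msgid;

    /* create the queue with key 1, or open it if it already exists */
    if ((msgid = msgget(1, 0666 | IPC_CREAT)) == -1) {
        perror("msgget");
        exit(1);
    }
    m1.mtype = getpid();    /* tag the message with the sender's pid */
    printf("Process id of the current process is:%ld\n", (long)getpid());
    printf("Enter the message you want to send to the queue\n");
    if (fgets(m1.mtext, sizeof m1.mtext, stdin) == NULL)
        exit(1);
    /* msgsz counts only mtext, not the mtype field */
    if (msgsnd(msgid, &m1, sizeof m1.mtext, 0) == -1) {
        perror("msgsnd");
        exit(1);
    }
    printf("Message successfully sent\n");
    return 0;
}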
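/* Receiver: uses the same headers and struct message as the sender */
int main(void)
{
    struct message m1;
    int msgid;

    if ((msgid = msgget(1, 0666 | IPC_CREAT)) == -1) {
        perror("msgget");
        exit(1);
    }
    /* msgtype 0: take the first message on the queue, whatever its type */
    if (msgrcv(msgid, &m1, sizeof m1.mtext, 0, MSG_NOERROR) == -1) {
        perror("msgrcv");
        exit(1);
    }
    printf("Message received from the process whose pid is:%ld\n", m1.mtype);
    printf("And the message is:%s\n", m1.mtext);
    return 0;
}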
    long mtype;          /* positive message type */
    char mtext[512];     /* message data, of length nbytes */
};
msgrcv
Syntax of the function is:
ssize_t msgrcv (int msqid, void *msgp, size_t msgsz, long msgtype, int msgflg);
The msgtype argument selects which message is retrieved:
 0   - retrieve the first message on the queue (FIFO order)
 +ve - retrieve the first message whose type equals msgtype
 -ve - retrieve the first message of the lowest type that is <= the absolute value of msgtype
On success, msgrcv returns the number of bytes actually copied into the message text.
Destroying a Message Queue
There are several ways:
* From the command line, using one of
- $ ipcrm msg msqid
- $ ipcrm -q msqid
- $ ipcrm -Q msgkey
* Using a system call
- msgctl (msgid, IPC_RMID, 0);
Limitations
* Message queues are effective when a small amount of data is transferred.
* They are very expensive for large transfers.
* During sending and receiving, the message is copied from the user buffer into a kernel buffer and vice versa.
* So each message transfer involves two data copy operations, which results in poor performance.
* A message in a queue cannot be reused: once received, it is removed from the queue.
Message send tests.c
Shared Memory
* Very flexible and easy to use.
* The fastest IPC mechanism, because data is not copied between user and kernel buffers.
* Shared memory is used to provide access to
- Global variables
- Shared libraries
- Word processors
- Multi-player gaming environments
- HTTP daemons
- Other programs written in languages like Perl, C, etc.
Shared Memory: Data Structures
The data structures used in shared memory are
* shmid_ds
* ipc_perm
* shminfo
* shm_info
* shmid_kernel
ipc_perm Structure
struct ipc_perm {
    __key_t __key;               /* Key */
    __uid_t uid;                 /* Owner's user ID */
    __gid_t gid;                 /* Owner's group ID */
    __uid_t cuid;                /* Creator's user ID */
    __gid_t cgid;                /* Creator's group ID */
    unsigned short int mode;     /* r/w permission */
    unsigned short int __seq;    /* Sequence number */
};
printf("Enter the data you want to write into shared memory");
fgets(msg,1024,stdin);
pos = strlen(msg);
strcpy(msg+pos-1,"World");    /* overwrites the trailing newline left by fgets */
printf("Data successfully written");
* Creating shared memory
* Connecting to the memory & obtaining a pointer to the memory
* Reading/Writing & changing access mode to the memory
* Detaching from memory
* Deleting the shared segment
shmat
* Used to attach the created shared memory segment to a process address space.
* void *shmat(int shmid, const void *shmaddr, int shmflg);
* Example: data = shmat(shmid, (void *)0, 0);
* On success a pointer to the segment is returned, and the process can read or write the segment through it. On error (void *)-1 is returned.
Reading / Writing to Shared Memory
* Reading or writing shared memory is the easiest part.
* Data is written to shared memory through the pointer, just as with normal memory.
* Eg. Read: printf("SHM contents : %s", data);
* Eg. Write: printf("Enter a String : "); scanf(" %[^\n]", data);
shmdt and shmctl
* An attached shared memory segment is detached by passing the address returned by shmat to shmdt.
* Syntax: int shmdt(const void *shmaddr);
* To remove the shared memory segment, call: shmctl(shmid, IPC_RMID, NULL);
* Both functions return 0 on success and -1 on error.
Shared Memory: Pseudo Code
* shmid = shmget (key, 1024, IPC_CREAT|0744);
* void *shmat (int shmid, const void *shmaddr, int shmflg);
  pass SHM_RDONLY as shmflg to attach the segment read-only, else 0
* data = shmat (shmid, (void *)0, 0);
* int shmdt (const void *shmaddr);
* int shmctl (shmid, IPC_RMID, NULL);
Limitations
* Data can only be read or written in place. Append is not allowed.
* Race condition
- Many processes can access the shared memory, and any modification made by one process is visible to all the others. Because the segment is a shared resource, the developer must implement a proper locking mechanism to prevent race conditions.
Semaphores
* A process that wants to use a shared object "locks" it by asking the semaphore to decrement its counter.
* Depending on the current value of the counter, the semaphore either carries out the operation immediately or makes the process wait until the operation becomes possible.
* If the current value of the counter is > 0, the decrement is possible. Otherwise, the process has to wait.
System V IPC: Semaphores
* System V provides semaphore sets, each of which can contain a number of semaphores. It is up to the user to decide the number of semaphores in the set.
* Each semaphore in the set can be a binary or a counting semaphore. Each semaphore can be used to control access to one resource by changing the value of the semaphore count.
Semaphore: Initialization
union semun {
    int val;                     // value for SETVAL
    struct semid_ds *buf;        // buffer for IPC_STAT, IPC_SET
    unsigned short int *array;   // array for GETALL, SETALL
};
union semun arg;
semid = semget (key, 1, IPC_CREAT | 0644);
arg.val = 1;   /* 1 for a binary semaphore, > 1 for a counting semaphore */
semctl (semid, 0, SETVAL, arg);
#include <sys/types.h>
#include <sys/sem.h>
#include <sys/ipc.h>
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;       /* pointer, as required for GETALL/SETALL */
    struct seminfo *__buff;
};

void *th_fun(void *);

union semun u;
key_t key;
int pid, sid;
struct sembuf su, sl;

int main(void)
{
    pthread_t t1, t2, t3, t4;

    /* key is the global key_t; a local unsigned short would truncate it */
    key = ftok("semaphore.c", 100);
    sid = semget(key, 1, IPC_CREAT | 0666);
    printf("semaphore created by %d\n", getpid());
    u.val = 2;
    semctl(sid, 0, SETVAL, u);
    printf("Semaphore initialized to %d\n", u.val);
11. Sockets
An Overview
System calls related to
- TCP
- UDP
A socket is an abstraction of a communication endpoint. Just as they would use file descriptors to access files, applications use socket descriptors to access sockets. Socket descriptors are implemented as file descriptors in the UNIX System. Indeed, many of the functions that deal with file descriptors, such as read and write, will work with a socket descriptor.
To create a socket, we call the socket function.
#include <sys/socket.h>
int socket(int domain, int type, int protocol);
Returns: file (socket) descriptor if OK, -1 on error
The socket() call is similar to the open() system call. File descriptor functions behave as follows when applied to a socket:
close - deallocates the socket
dup, dup2 - duplicates the file descriptor as normal
fchdir - fails with errno set to ENOTDIR
fchmod - unspecified
fchown - implementation defined
fcntl - some commands supported, including F_DUPFD, F_DUPFD_CLOEXEC, F_GETFD, F_GETFL, F_GETOWN, F_SETFD, F_SETFL, and F_SETOWN
fdatasync, fsync - implementation defined
fstat - some stat structure members supported, but how left up to the implementation
ftruncate - unspecified
ioctl - some commands work, depending on underlying device driver
lseek - implementation defined (usually fails with errno set to ESPIPE)
mmap - unspecified
poll - works as expected
pread and pwrite - fails with errno set to ESPIPE
read and readv - equivalent to recv without any flags
select - works as expected
write and writev - equivalent to send without any flags
#include <sys/socket.h>
int shutdown(int sockfd, int how);
If how is SHUT_RD, then reading from the socket is disabled. If how is SHUT_WR, then we can't use the socket for transmitting data. We can use SHUT_RDWR to disable both data transmission and reception.
Given that we can close a socket, why is shutdown needed? There are several reasons. First, close will deallocate the network endpoint only when the last active reference is closed. If we duplicate the socket (with dup, for example), the socket won't be deallocated until we close the last file descriptor referring to it. The shutdown function allows us to deactivate a socket independently of the number of active file descriptors referencing it. Second, it is sometimes convenient to shut a socket down in one direction only. For example, we can shut a socket down for writing if we want the process we are communicating with to be able to tell when we are done transmitting data, while still allowing us to use the socket to receive data sent to us by the process.
Byte Ordering
The TCP/IP protocol suite uses big-endian byte order.
#include <arpa/inet.h>
uint32_t htonl(uint32_t hostint32);   Returns: 32-bit integer in network byte order
uint16_t htons(uint16_t hostint16);   Returns: 16-bit integer in network byte order
uint32_t ntohl(uint32_t netint32);    Returns: 32-bit integer in host byte order
uint16_t ntohs(uint16_t netint16);    Returns: 16-bit integer in host byte order
struct sockaddr_in {
    sa_family_t sin_family;     /* address family */
    in_port_t sin_port;         /* port number */
    struct in_addr sin_addr;    /* IPv4 address */
};
inet_ntop - network to presentation
#include <arpa/inet.h>
const char *inet_ntop(int domain, const void *restrict addr, char *restrict str, socklen_t size);
Returns: pointer to address string on success, NULL on error
int inet_pton(int domain, const char *restrict str, void *restrict addr);
Returns: 1 on success, 0 if the format is invalid, or -1 on error
Address Look Up
The following functions iterate over the host database on the machine:
#include <netdb.h>
struct hostent *gethostent(void);   Returns: pointer if OK, NULL on error
void sethostent(int stayopen);
void endhostent(void);
Services are represented by the port number portion of the address. Each service is offered on a unique, well-known port number. We can map a service name to a port number with getservbyname, map a port number to a service name with getservbyport, or scan the services database sequentially with getservent.
* sockfd - the socket file descriptor returned by socket().
* addr - a pointer to a struct sockaddr that contains information about IP address and port number.
* len - set to sizeof (struct sockaddr)
int connect (int sockfd, struct sockaddr *serv_addr, int addrlen);
* sockfd - the socket file descriptor returned by socket().
* serv_addr - a struct sockaddr containing the destination port and IP address.
* addrlen - set to sizeof (struct sockaddr).
int listen (int sockfd, int backlog);
* sockfd - the socket file descriptor returned by socket().
* backlog - the number of connections allowed on the incoming queue.
* backlog should not be zero, since a server expects connections from clients.
* The listen function converts an unconnected socket into a passive socket.
* Successful execution of listen indicates that the kernel will accept incoming connection requests directed to this socket.
int accept (int sockfd, struct sockaddr *addr, socklen_t *addrlen);
/* A child process is created for accepting connections */
printf("Waiting for connection.............");
pid=fork();
while(1)
{
    if(pid==0)
    {
        if((nsd=accept(sd,(struct sockaddr *)&client,&length))==-1)
        {
            perror("accept");
            exit(1);
        }
        printf("Got connection from client:%s",inet_ntoa(client.sin_addr));
        /* the child then takes care of receive and send for this client */
        if((dat=recv(nsd,message,40,0))==-1)
        {
            perror("recv");
            exit(1);
        }
        message[dat]='\0';
        printf("Data received is : %s",message);
        printf("Enter the data you want to send to client");
        fgets(message,40,stdin);
        send(nsd,message,40,0);
sockfd - the socket file descriptor returned by socket().
addr - a pointer to a struct sockaddr_in, in which information about the incoming connection, such as IP address and port number, is stored.
addrlen - a local integer variable that should be set to sizeof (struct sockaddr_in) before its address is passed to accept().
close (sockfd);
* The close system call prevents any more reads and writes on the socket. A process on the remote end that then reads from the socket sees end-of-file, and one that writes to it receives an error.
int shutdown (int sockfd, int how);
sockfd - socket file descriptor of the socket to be shut down.
how - if it is
 0 - Further receives are disallowed
 1 - Further sends are disallowed
 2 - Further sends and receives are disallowed.
The shutdown system call gives more control than close (sockfd) over how the socket descriptor is closed.
Both client and server have to use
#include <sys/socket.h>
ssize_t sendto(int sockfd, const void *buf, size_t nbytes, int flags, const struct sockaddr *destaddr, socklen_t destlen);
Returns: number of bytes sent if OK, -1 on error
printf("Enter the message you want to send to server");
fgets(msg,40,stdin);
send(sd,msg,40,0);
printf("Waiting for message from server..............");
n=recv(sd,msg,40,0);
msg[n]='\0';
printf("Message received from server is:%s",msg);
close(sd);
}
if(listen(sd,5)==-1)
{
    perror("listen");
    exit(1);
}
printf("Waiting for connection.............");
if((nsd=accept(sd,(struct sockaddr *)&client,&length))==-1)
{
    perror("accept");
    exit(1);
}
printf("Got connection from client:%s",inet_ntoa(client.sin_addr));
if((dat=recv(nsd,message,40,0))==-1)
{
    perror("recv");
    exit(1);
}
message[dat]='\0';
printf("Data received is : %s",message);
printf("Enter the data you want to send to client");
fgets(message,40,stdin);
send(nsd,message,40,0);
close(sd);
}
printf("Enter the message you want to send to server");
fgets(msg,40,stdin);
/* the datagram is addressed to the server */
if(sendto(sd,msg,40,0,(struct sockaddr *)&server,sizeof(server))==-1)
{
    perror("sendto");
    exit(1);
}
printf("Waiting for message from server..............");
length=sizeof(server);
n=recvfrom(sd,msg,40,0,(struct sockaddr *)&server,&length);
msg[n]='\0';
printf("Message received from server is:%s",msg);
}
printf("Got connection from client:%s",inet_ntoa(client.sin_addr));
message[dat]='\0';
printf("Data received is : %s",message);
printf("Enter the data you want to send to client");
fgets(message,40,stdin);
sendto(sd,message,40,0,(struct sockaddr *)&client,length);
}
netlink - Communication between kernel and userspace (PF_NETLINK)
Netlink is used to transfer information between kernel and userspace processes. It consists of a standard sockets-based interface for userspace processes and an internal kernel API for kernel modules.
Netlink is a datagram-oriented service. Both SOCK_RAW and SOCK_DGRAM are valid values for socket_type. However, the netlink protocol does not distinguish between datagram and raw sockets.
netlink_family selects the kernel module or netlink group to communicate with. The currently assigned netlink families are:
NETLINK_ROUTE Receives routing and link updates and may be used to modify the routing tables (both IPv4 and IPv6), IP addresses, link parameters, neighbour setups, queueing disciplines, traffic classes and packet classifiers
NETLINK_W1 Messages from 1-wire subsystem.
Example creates a NETLINK_ROUTE netlink socket which will listen to the RTMGRP_LINK (network interface create/delete/up/down events) and RTMGRP_IPV4_IFADDR (IPv4 addresses add/delete events) multicast groups.
Example demonstrates how to send a netlink message to the kernel (pid 0). Note that the application must take care of message sequence numbers in order to reliably track acknowledgements.
struct nlmsghdr *nh;    /* The nlmsghdr with payload to send. */
struct sockaddr_nl sa;
struct iovec iov = { (void *) nh, nh->nlmsg_len };
struct msghdr msg;
13. Programming and Debugging tools
strace - Tracing system calls
ltrace - Tracing library calls
Tools used to detect memory access error and memory leakage in Linux
mtrace
Tracing Processes
* strace command
- trace system calls and signals
- strace runs until the given command exits
- It is a useful tool for diagnostics, instruction and debugging
* ptrace system call
- Process trace
1. Trace the Execution of an Executable
$ strace ls
execve("/bin/ls", ["ls"], [/* 21 vars */]) = 0
brk(0)
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap2(NULL, 8192, PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb78c7000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=65354, ...}) = 0
...
...
2. Trace a Specific System Call in an Executable Using Option -e
$ strace -e open ls
open("/etc/ld.so.cache", O_RDONLY) = 3
open("/lib/libselinux.so.1", O_RDONLY) = 3
open("/lib/librt.so.1", O_RDONLY) = 3
3. Execute Strace on a Running Linux Process Using Option -p
$ strace -p 1725 -o output.txt
attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
Could not attach to process. If your uid matches the uid of the target process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
4. Print Relative Time for System Calls Using Option -r
Strace can also print a relative timestamp on entry to each system call, that is, the time elapsed since the start of the previous call, as shown below.
$ strace -r ls
0.000000 execve("/bin/ls", ["ls"], [/* 37 vars */]) = 0
0.000846 brk(0) = 0x8418000
0.000143 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
0.000163 mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb787b000
0.000119 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
0.000123 open("/etc/ld.so.cache", O_RDONLY) = 3
0.000099 fstat64(3, {st_mode=S_IFREG|0644, st_size=67188, ...}) = 0
0.000155 mmap2(NULL, 67188, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb786a000
...
Day 5 Morning
setenv MALLOC_TRACE mtrace.out
4. Run The Program Once
5. View The Data
mtrace <prog name> <output log file name>
mtrace mtrace_test mtrace.out
Assuming the C code at the beginning was the code in mtrace_test.c, the following output would be produced:
Memory not freed:
-----------------
           Address     Size     Caller
0x0000000000501460     0x64  at /array/home/dcurrie/test/mtrace/mtrace_test.c:11
Valgrind
Finding Memory Leaks With Valgrind
example.c
#include <stdlib.h>
int main()
{
    char *x = malloc(100); /* or, in C++, "char *x = new char[100]" */
    x[10] = 'a';
    return 0;
}
$ gcc example.c -o example
$ valgrind --tool=memcheck --leak-check=yes example
==2116== 100 bytes in 1 blocks are definitely lost in loss record 1 of 1
==2116==    at 0x1B900DD0: malloc (vg_replace_malloc.c:131)
==2116==    by 0x804840F: main (in /home/cprogram/example1)
Finding Invalid Pointer Use With Valgrind
$ valgrind --tool=memcheck --leak-check=yes example
results in the following warning
==9814== Invalid write of size 1
==9814==    at 0x804841E: main (example2.c:6)
==9814==  Address 0x1BA3607A is 0 bytes after a block of size 10 alloc'd
==9814==    at 0x1B900DD0: malloc (vg_replace_malloc.c:131)
==9814==    by 0x804840F: main (example2.c:5)