
1

Introduction to Computer Networks

Slides courtesy: T. S. Eugene Ng

2

Organizing Network Functionality

• Many kinds of networking functionality
  – e.g., encoding, framing, routing, addressing, reliability, etc.

• Many different network styles and technologies
  – circuit-switched vs packet-switched, etc.
  – wireless vs wired vs optical, etc.

• Many different applications
  – ftp, email, web, P2P, etc.

• Network architecture
  – How should different pieces be organized?
  – How should different pieces interact?

3

Problem

• A new application has to interface to all existing media
  – adding a new application requires O(m) work, m = number of media

• A new medium requires all existing applications to be modified
  – adding a new medium requires O(a) work, a = number of applications

• Total work in the system is O(ma); eventually it is too much work to add apps/media

• Application end points may not be on the same media!

[Figure: applications (SMTP, SSH, FTP, HTTP) above transmission media (packet radio, coaxial cable, fiber optic)]

4

Solution: Indirection

• Solution: introduce an intermediate layer that provides a single abstraction for various network technologies
  – O(1) work to add an app or a medium
  – Indirection is an often-used technique in computer science

[Figure: applications (SMTP, SSH, NFS, HTTP) above an intermediate layer, which sits above the transmission media (802.11 LAN, coaxial cable, fiber optic)]

5

Network Architecture

• Architecture is not the implementation itself

• Architecture is how to “organize” implementations
  – what interfaces are supported
  – where functionality is implemented

• Architecture is the modular design of the network

6

Software Modularity

Break system into modules:

• Well-defined interfaces give flexibility
  – can change implementation of modules
  – can extend functionality of system by adding new modules

• Interfaces hide information
  – allows for flexibility
  – but can hurt performance

7

Network Modularity

Like software modularity, but with a twist:

• Implementation distributed across routers and hosts

• Must decide both:
  – how to break system into modules
  – where modules are implemented

8

Outline

• Layering
  – how to break network functionality into modules

• The End-to-End Argument
  – where to implement functionality

9

Layering

• Layering is a particular form of modularization

• The system is broken into a vertical hierarchy of logically distinct entities (layers)

• The service provided by one layer is based solely on the service provided by the layer below

• Rigid structure: easy reuse, performance suffers

10

ISO OSI Reference Model

• ISO – International Organization for Standardization
• OSI – Open Systems Interconnection
• Goal: a general open standard
  – allow vendors to enter the market by using their own implementation and protocols

11

ISO OSI Reference Model

• Seven layers
  – Lower two layers are peer-to-peer
  – Network layer involves multiple switches
  – Next four layers are end-to-end

[Figure: Host 1 and Host 2 each run all seven layers (Application, Presentation, Session, Transport, Network, Datalink, Physical); the intermediate switch runs only Network, Datalink, and Physical; the nodes are linked by physical medium A and physical medium B]

12

Layering Solves Problem

• Application layer doesn’t know about anything below the presentation layer, etc.

• Information about network is hidden from higher layers

• This ensures that we only need to implement an application once!

13

Key Concepts

• Service – says what a layer does
  – Ethernet: unreliable subnet unicast/multicast/broadcast datagram service
  – IP: unreliable end-to-end unicast datagram service
  – TCP: reliable end-to-end bi-directional byte stream service
  – Guaranteed bandwidth/latency unicast service

• Service Interface – says how to access the service
  – E.g. UNIX socket interface

• Protocol – says how the service is implemented
  – a set of rules and formats that govern the communication between two peers

14

Physical Layer (1)

• Service: move information between two systems connected by a physical link

• Interface: specifies how to send a bit

• Protocol: coding scheme used to represent a bit, voltage levels, duration of a bit

• Examples: coaxial cable, optical fiber links; transmitters, receivers

15

Datalink Layer (2)

• Service:
  – framing (attach frame separators)
  – send data frames between peers
  – others:
    • arbitrate the access to common physical media
    • per-hop reliable transmission
    • per-hop flow control

• Interface: send a data unit (packet) to a machine connected to the same physical media

• Protocol: layer addresses, implement Medium Access Control (MAC) (e.g., CSMA/CD)…

16

Network Layer (3)

• Service:
  – deliver a packet to specified network destination
  – perform segmentation/reassembly
  – others:
    • packet scheduling
    • buffer management

• Interface: send a packet to a specified destination

• Protocol: define globally unique addresses; construct routing tables

17

Transport Layer (4)

• Service:
  – Multiplexing/demultiplexing
  – optional: error-free and flow-controlled delivery

• Interface: send message to specific destination

• Protocol: implements reliability and flow control

• Examples: TCP and UDP

18

Session Layer (5)

• Service:
  – full-duplex
  – access management (e.g., token control)
  – synchronization (e.g., provide check points for long transfers)

• Interface: depends on service

• Protocol: token management; insert checkpoints, implement roll-back functions

19

Presentation Layer (6)

• Service: convert data between various representations

• Interface: depends on service

• Protocol: define data formats, and rules to convert from one format to another

20

Application Layer (7)

• Service: any service provided to the end user

• Interface: depends on the application

• Protocol: depends on the application

• Examples: FTP, Telnet, WWW browser

21

Who Does What?

[Figure: Host A and Host B each run all seven layers (Application, Presentation, Session, Transport, Network, Datalink, Physical); the Router runs only Network, Datalink, and Physical; all are joined by the physical medium]

22

Logical Communication

• Each layer interacts with the corresponding layer on its peer

[Figure: Host A and Host B each run all seven layers (Application, Presentation, Session, Transport, Network, Datalink, Physical); the Router runs only Network, Datalink, and Physical; all are joined by the physical medium]

23

Physical Communication

• Communication goes down to physical network, then to peer, then up to relevant layer

[Figure: Host A and Host B each run all seven layers (Application, Presentation, Session, Transport, Network, Datalink, Physical); the Router runs only Network, Datalink, and Physical; all are joined by the physical medium]

24

Encapsulation

• A layer can use only the service provided by the layer immediately below it

• Each layer may change the data packet and add a header to it

[Figure: the "data" block shown at each layer of the stack]

25

Example: Postal System

Standard process (historical):
• Write letter
• Drop an addressed letter off in your local mailbox
• Postal service delivers to address
• Addressee reads letter (and perhaps responds)

26

Postal Service as Layered System

Layers:
• Letter writing/reading
• Delivery

Information Hiding:
• Network need not know letter contents
• Customer need not know how the postal network works

Encapsulation:
• Envelope

[Figure: Customer and Post Office layers at each end]

28

Functions of the Layers

Application Layer (telnet, ftp, email, www, AFS)
  – Service: Handles details of application programs.
  – Functions:

Transport Layer (TCP, UDP)
  – Service: Controls delivery of data between hosts.
  – Functions: Connection establishment/termination, error control, flow control, congestion control, etc.

Network Layer (IP, ICMP, OSPF, RIP, BGP)
  – Service: Moves packets inside the network.
  – Functions: Routing, addressing, switching, etc.

(Data) Link Layer (Ethernet, WiFi, T1)
  – Service: Reliable transfer of frames over a link.
  – Functions: Synchronization, error control, flow control, etc.

29

Internet Protocol Architecture

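[Figure: an FTP program on one host (stack: FTP program, TCP, IP, Ethernet driver) and an FTP program on another host (stack: FTP program, TCP, IP, ATM driver), with an intermediate node running IP over an Ethernet driver and an ATM driver. Peer protocols shown: FTP protocol, TCP protocol, IP protocol (on each hop), Ethernet protocol, ATM protocol]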

30

Internet Protocol Architecture

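[Figure: an MPEG server program on one host (stack: MPEG server program, UDP, IP, Ethernet driver) and an MPEG player program on another host (stack: MPEG player program, UDP, IP, ATM driver), with an intermediate node running IP over an Ethernet driver and an ATM driver. Peer protocols shown: RTP protocol, UDP protocol, IP protocol (on each hop), Ethernet protocol, ATM protocol]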

31

Encapsulation

• As data moves down the protocol stack, each protocol adds layer-specific control information.

[Figure: encapsulation down the stack
  Application:     Application header | User data
  TCP:             TCP header | Application data   (TCP segment)
  IP:              IP header | TCP header | Application data   (IP datagram)
  Ethernet driver: Ethernet header | IP header | TCP header | Application data | Ethernet trailer   (Ethernet frame)]

32

Hourglass

Note: Additional protocols like routing protocols (RIP, OSPF) are needed to make IP work

33

Implications of Hourglass

A single Internet layer module:

• Allows all networks to interoperate
  – all network technologies that support IP can exchange packets

• Allows all applications to function on all networks
  – all applications that can run on IP can use any network

• Simultaneous developments above and below IP

34

Reality

• Layering is a convenient way to think about networks
• But layering is often violated
  – Firewalls
  – Transparent caches
  – NAT boxes

35

Summary

• Layering is a good way to organize network functions

• Unified Internet layer decouples apps from networks

• E2E argument argues to keep IP simple

• Be judicious when thinking about adding to the network layer

OSI & Internet protocol suite

36

Where we work?

37

Sockets API

X/Open Transport Interface (XTI)

Two reasons for this design

• The upper three layers handle all the details of the application and know little about communication details (sending and receiving data, etc.)

• The upper three layers form a user process, while the lower four layers are provided as part of the operating system, or kernel.

About kernel

Kernel

• the part of the operating system that is mandatory and common to all other software

• simply the name given to the lowest level of abstraction that is implemented in software

Functionalities of Kernel

• Process Management
• Memory Management
• Device Management
• System Calls

Process Management

• A kernel typically sets up an address space for the process, loads the file containing the code into memory, sets up a stack for the program, and branches to a given location inside the program, thus starting its execution

Memory Management

• The kernel has full access to the system's memory and must allow processes to safely access this memory as they require it.

• Virtual addressing allows the kernel to make a given physical address appear to be another address, the virtual address.

• Virtual address spaces may be different for different processes;

Device Management

• Processes need access to the peripherals connected to the computer, which are controlled by the kernel through device drivers.

• For example, to show the user something on the screen, an application would make a request to the kernel, which would forward the request to its display driver, which is then responsible for actually plotting the character/pixel

System Calls

• A process must be able to access the services provided by the kernel. This is implemented differently by each kernel, but most provide a C library or an API, which in turn invokes the related kernel functions

• Implemented using software simulated interrupts

Programs and Processes

• A program is an executable file residing on disk. A program is read into memory and executed by the kernel

• An executing instance of a program is called a process

• Every process has a unique non-negative identifier called process id (PID)

Process Environment

• What happens when we execute a C program? ./a.out

• How are the command-line arguments passed to the process?

• Memory layout of a process

What happens when we execute a C program?

• int main(int argc, char *argv[]);

• When a C program is executed by the kernel (by one of the exec functions), a special start-up routine is called before the main function is called.

• The executable program file specifies this routine as the starting address for the program.

• This start-up routine takes values from the kernel (the command-line arguments and the environment) and sets things up so that main can be called.

Memory Layout of C Program

• Code – text segment
• Initialized data – data segment
• Uninitialized data – bss segment
• Heap
• Stack

Memory Layout of C Program

• Code – text segment
  – Machine instructions that the CPU executes
  – Sharable
  – Read-only

Memory Layout of C Program

• Initialized data – data segment
  – A variable declared outside any function with a nonzero initial value is stored in the initialized data segment with its initial value.
  – Statically allocated and global data that are initialized with nonzero values live in the data segment.

Memory Layout of C Program

• Uninitialized data – bss segment
  – BSS stands for ‘Block Started by Symbol’.
  – Global and statically allocated data that are initialized to zero by default are kept here.

Memory Layout

• Stack
  – The stack segment is where local (automatic) variables are allocated.
  – Data is pushed onto and popped off the stack following the Last In First Out (LIFO) rule.
  – When a function is called, a stack frame is created and PUSHed onto the top of the stack. This stack frame contains information such as the address from which the function was called and where to jump back to when the function is finished (return address), parameters, local variables, and any other information needed by the invoked function.
  – When a function returns, the stack frame is POPped from the stack. Typically the stack grows downward, meaning that items deeper in the call chain are at numerically lower addresses and toward the heap.

Stack

Memory Layout of C Program

• Heap
  – The heap is where dynamic memory (obtained by malloc(), calloc(), realloc()) comes from.
  – It is typical for the heap to grow upward. This means that successive items that are added to the heap are added at addresses that are numerically greater than previous items.
  – The end of the heap is marked by a pointer known as the break. You cannot reference past the break. You can, however, move the break pointer (via brk() and sbrk() system calls) to a new position to increase the amount of heap memory available.

Environment Variables

• Stored in process memory
• Set of parameters that are inherited from process to process.
• Each program is also passed an environment list, like the argument list.
• The environment list is an array of character pointers, with each pointer containing the address of a null-terminated string of the form name=value.

Environment Variables

Listing all arguments and environment vars

#include <stdio.h>
#include <stdlib.h>

int
main (int argc, char *argv[])
{
  int i;
  char **ptr;
  extern char **environ;

  for (i = 0; i < argc; i++)    /* echo all command-line args */
    printf ("argv[%d]: %s\n", i, argv[i]);
  for (ptr = environ; *ptr != 0; ptr++)    /* and all env strings */
    printf ("%s\n", *ptr);
  exit (0);
}

Functions to access environment variables

Process Control

• Every process has a unique process ID, a non-negative integer.

• Although unique, process IDs are reused. As processes terminate, their IDs become candidates for reuse.

• Process ID 0 is usually the scheduler process and is often known as the swapper.

Process Control

• Process ID 1 is usually the init process and is invoked by the kernel at the end of the bootstrap procedure. This process is responsible for bringing up a UNIX system after the kernel has been bootstrapped.

• The init process never dies. It is a normal user process, not a system process within the kernel, although it does run with super user privileges.

• init becomes the parent process of any orphaned child process.

Process Identifiers

#include <unistd.h>

• pid_t getpid(void);
  Returns: process ID of calling process
• pid_t getppid(void);
  Returns: parent process ID of calling process
• uid_t getuid(void);
  Returns: real user ID of calling process
• uid_t geteuid(void);
  Returns: effective user ID of calling process
• gid_t getgid(void);
  Returns: real group ID of calling process
• gid_t getegid(void);
  Returns: effective group ID of calling process

fork()

• An existing process can create a new one by calling the fork function.
  #include <unistd.h>
  pid_t fork(void);
  Returns: 0 in child, process ID of child in parent, -1 on error
• The new process created by fork is called the child process. This function is called once but returns twice. The only difference in the returns is that the return value in the child is 0, whereas the return value in the parent is the process ID of the new child.

fork()

• Both the child and the parent continue executing with the instruction that follows the call to fork.

• The child is a copy of the parent. For example, the child gets a copy of the parent's data space, heap, and stack. Note that this is a copy for the child; the parent and the child do not share these portions of memory. The parent and the child share the text segment

copy-on-write (COW)

• Current implementations don't perform a complete copy of the parent's data, stack, and heap

• These regions are shared by the parent and the child and have their protection changed by the kernel to read-only

• If either process tries to modify these regions, the kernel then makes a copy of that piece of memory only, typically a "page" in a virtual memory system.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int glob = 6;                   /* global variable */

int
main ()
{
  int var;
  pid_t pid;

  var = 88;
  printf ("Before fork\n");
  if ((pid = fork ()) < 0)
    perror ("fork");            /* print the error that occurred */
  else if (pid == 0)
    {                           /* child: modifies its own copies */
      glob++;
      var++;
      printf ("pid = %d, glob=%d, var=%d\n", (int) getpid (), glob, var);
      exit (0);
    }
  else
    {                           /* parent: its copies are unchanged */
      printf ("pid = %d, glob=%d, var=%d\n", (int) getpid (), glob, var);
      exit (0);
    }
}

fork()

• In general, we never know whether the child starts executing before the parent or vice versa. This depends on the scheduling algorithm used by the kernel.

• To synchronize child and parent, some form of interprocess communication is required.

File sharing between parent and child

• One characteristic of fork is that all file descriptors that are open in the parent are duplicated in the child.

• The parent and the child share a file table entry for every open descriptor.

• Generally, a shell process has three files open for standard input, standard output, and standard error. When a command is executed as a process, it inherits them.

vfork()

• The vfork function is intended to create a new process when the purpose of the new process is to exec a new program

• The vfork function creates the new process, just like fork, without copying the address space of the parent into the child, as the child won't reference that address space

• vfork guarantees that the child runs first, until the child calls exec or exit. When the child calls either of these functions, the parent resumes.

What child inherits?

• Real user ID, real group ID, effective user ID, effective group ID
• Current working directory
• Root directory
• File mode creation mask
• Environment
• Process group ID
• Session ID
• Controlling terminal
• Attached shared memory segments
• Memory mappings
• Resource limits

What values in child are different from parent?

• The return value from fork
• The process IDs are different
• The two processes have different parent process IDs: the parent process ID of the child is the parent; the parent process ID of the parent doesn't change
• The child's tms_utime, tms_stime, tms_cutime, and tms_cstime values are set to 0
• File locks set by the parent are not inherited by the child
• Pending alarms are cleared for the child
• The set of pending signals for the child is set to the empty set

Process Termination

• Normal termination
  – Return from main
  – Calling exit
  – Calling _exit or _Exit
  – Return of the last thread from its start routine
  – Calling pthread_exit from the last thread

• Abnormal termination
  – Calling abort
  – Receipt of a signal
  – Response of the last thread to a cancellation request

Process Termination

• Regardless of how a process terminates, the same code in the kernel is eventually executed. This kernel code closes all the open descriptors for the process, releases the memory that it was using, and the like.

• To be able to notify its parent how it terminated, the child passes an exit status as the argument to one of the exit functions (exit, _exit, or _Exit).

• In the case of an abnormal termination, however, the kernel, not the process, generates a termination status to indicate the reason for the abnormal termination.

• In any case, the parent of the process can obtain the termination status using wait or the waitpid function

Process Termination

• When a process terminates, either normally or abnormally, the kernel notifies the parent by sending the SIGCHLD signal to the parent.

• This signal is the asynchronous notification from the kernel to the parent. The parent can choose to ignore this signal, or it can provide a function that is called when the signal occurs: a signal handler.

• The default action for this signal is that it is ignored.

wait() & waitpid()

• Parent can obtain termination status from kernel using these calls

• A process that calls wait or waitpid can
  – Block, if all of its children are still running
  – Return immediately with the termination status of a child, if a child has terminated and is waiting for its termination status to be fetched
  – Return immediately with an error, if it doesn't have any child processes

Syntax

waitpid()

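#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int
main ()
{
  int i = 0, j = 0;
  pid_t ret;
  int status;

  ret = fork ();
  if (ret == 0)
    {
      for (i = 0; i < 5000; i++)
        printf ("Child: %d\n", i);
      printf ("Child ends\n");
    }
  else
    {
      wait (&status);           /* block until the child terminates */
      printf ("Parent resumes.\n");
      for (j = 0; j < 5000; j++)
        printf ("Parent: %d\n", j);
    }
  return 0;
}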

What happens if parent terminates before child?

• The init process becomes the parent process of any process whose parent terminates (the process is said to have been inherited by init)

• The parent process ID of the surviving process is changed to 1 (the process ID of init). This way, we're guaranteed that every process has a parent.

What happens when a child terminates before its parent ?

• Kernel keeps small amount of information (process ID, the termination status of the process, and the amount of CPU time taken by the process ) until parent asks for it

• a process that has terminated, but whose parent has not yet waited for it, is called a zombie

exec functions

• The fork function creates a new process (the child). The child then typically causes another program to be executed by calling one of the exec functions.

• When a process calls one of the exec functions, that process is completely replaced by the new program, and the new program starts executing at its main function.

• The process ID does not change across an exec, because a new process is not created;

• exec replaces the current process, its text, data, heap, and stack segments with a new program from disk.

#include <unistd.h>

• int execl(const char *pathname, const char *arg0, ... /* (char *)0 */ );
• int execv(const char *pathname, char *const argv[]);
• int execle(const char *pathname, const char *arg0, ... /* (char *)0, char *const envp[] */ );
• int execve(const char *pathname, char *const argv[], char *const envp[]);
• int execlp(const char *filename, const char *arg0, ... /* (char *)0 */ );
• int execvp(const char *filename, char *const argv[]);

Remembering arguments

Function           pathname  filename  Arg list  argv[]  environ  envp[]
execl                 •                   •                 •
execlp                          •         •                 •
execle                •                   •                          •
execv                 •                             •       •
execvp                          •                   •       •
execve                •                             •                •
(letter in name)                p         l         v                e

Example

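• Output: executes the ls command with the -l option

#include <stdio.h>
#include <unistd.h>

int
main ()
{
  execl ("/bin/ls", "ls", "-l", (char *) 0);
  printf ("hello");             /* reached only if execl fails */
  return 1;
}

• Input: a command to execute and its arguments

#include <unistd.h>

int
main (int argc, char **argv)
{
  if (argc < 2)
    return 1;                   /* no command given */
  execvp (argv[1], argv + 1);
  return 1;                     /* reached only if execvp fails */
}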

Signals

• A signal is an asynchronous event which is delivered to a process.

• Asynchronous means that the event can occur at any time
  – may be unrelated to the execution of the process
  – e.g. user types ctrl-C, or the modem hangs

Signals

Name      Description                    Default Action
SIGINT    Interrupt character typed      terminate process
SIGQUIT   Quit character typed (^\)      terminate + create core image
SIGKILL   kill -9                        terminate process
SIGSEGV   Invalid memory reference       terminate + create core image
SIGPIPE   Write on pipe but no reader    terminate process
SIGALRM   alarm() clock ‘rings’          terminate process
SIGUSR1   user-defined signal type       terminate process
SIGUSR2   user-defined signal type       terminate process

• See man 7 signal

Signal Sources

• Terminal-generated signals: SIGINT, SIGQUIT
• Hardware exceptions generate signals: SIGFPE, SIGSEGV
• The kill function allows a process to send any signal to another process or process group
• The kill command allows us to send signals to other processes.
• Software conditions: SIGURG, SIGPIPE, SIGALRM

kill() and raise() functions

• Send a signal to a process (or group of processes).

#include <signal.h>
int kill( pid_t pid, int signo );
int raise( int signo );

• pid > 0: send signal to process pid
• pid == 0: send signal to all processes whose process group ID equals the sender’s pgid
  – e.g. parent kills all children

• Return 0 if ok, -1 on error.

Responding to a Signal

• A process can:
  – Ignore/discard the signal (not possible with SIGKILL or SIGSTOP)
  – Catch the signal and execute a signal handler function, and then possibly resume execution
  – Let the default action apply. Every signal has a default action
• The choice is called the signal disposition

Signal Handler Function

• Specify a signal handler function to deal with a signal type.
• #include <signal.h>
  typedef void Sigfunc(int);   /* my defn */
  Sigfunc *signal( int signo, Sigfunc *handler );
  – signal returns a pointer to a function that takes an int (i.e. it returns a pointer to Sigfunc)
• Returns previous signal disposition if ok, SIG_ERR on error.

Example

#include <signal.h>

void foo( int signo );    /* handler, defined below */

int main()
{
    signal( SIGINT, foo );
    :
    /* do usual things until SIGINT */
    return 0;
}

void foo( int signo )
{
    :    /* deal with SIGINT signal */
    return;    /* return to program */
}

Special Sigfunc * Values

• Value Meaning

SIG_IGN Ignore / discard the signal.

SIG_DFL Use default action to handle signal.

SIG_ERR Returned by signal() as an error.

Signals Overview

• Three phases to processing signals:
  – Signal is generated
    • when the event that causes the signal occurs
  – Signal is delivered
    • a signal is said to be delivered to the process when the process takes action for the signal
  – Signal is pending
    • during the time between generation and delivery, the signal is said to be pending

Signal blocking

• Blocking the delivery of a signal
  – The process tells the kernel which signals to block
  – When such a signal is generated for the process, and the action is not ignore, the signal remains pending until the process either unblocks it or changes the action to ignore

Multiple Signals

• If a blocked signal is generated more than once then in most systems the signal is delivered only once. That is the signal is not queued.

• If many signals of different types are ready to be delivered (e.g. a SIGINT, SIGSEGV, SIGUSR1), they are not delivered in any fixed order.

Signal Sets

• A data type (sigset_t) to represent multiple signals
• #include <signal.h>
  – int sigemptyset(sigset_t *set);
  – int sigfillset(sigset_t *set);
  – int sigaddset(sigset_t *set, int signo);
  – int sigdelset(sigset_t *set, int signo);
  All four return: 0 if OK, -1 on error
  – int sigismember(const sigset_t *set, int signo);
  Returns: 1 if true, 0 if false, -1 on error

sigprocmask()

• A process uses a signal set to create a mask which defines the signals it is blocking from delivery.
  – good for critical sections where you want to block certain signals.

• #include <signal.h>
  int sigprocmask( int how, const sigset_t *set, sigset_t *oldset );

• how – indicates how mask is modified

‘how’ Meanings

• Value Meaning

SIG_BLOCK set signals are added to mask

SIG_UNBLOCK set signals are removed from mask

SIG_SETMASK set becomes new mask

A Critical Code Region

sigset_t newmask, oldmask;

sigemptyset( &newmask );
sigaddset( &newmask, SIGINT );

/* block SIGINT; save old mask */
sigprocmask( SIG_BLOCK, &newmask, &oldmask );

/* critical region of code */

/* reset mask which unblocks SIGINT */
sigprocmask( SIG_SETMASK, &oldmask, NULL );

sigaction()

• Supersedes (more powerful than) signal()
  – sigaction() can be used to code a non-resetting signal()
• #include <signal.h>
  int sigaction( int signo, const struct sigaction *act, struct sigaction *oldact );

sigaction Structure

struct sigaction {
    void (*sa_handler)( int );   /* action to be taken, or SIG_IGN, SIG_DFL */
    sigset_t sa_mask;            /* additional signals to be blocked */
    int sa_flags;                /* modifies action of the signal */
    void (*sa_sigaction)( int, siginfo_t *, void * );
        /* alternate signal handler, used when the SA_SIGINFO flag is set */
};

• sa_flags
  – SA_RESETHAND: reset the disposition to the default (SIG_DFL) when the handler is entered
  – SA_SIGINFO: extra information is passed to the handler (i.e. specifies the use of the “second” handler in the structure)

sigaction() Behavior

• A signo signal causes the sa_handler signal handler to be called.

• While sa_handler executes, the signals in sa_mask are blocked. Any more signo signals are also blocked.

• sa_handler remains installed until it is changed by another sigaction() call. No reset problem.

• sa_sigaction specifies handler if SA_SIGINFO flag is set.

struct siginfo {
    int   si_signo;    /* signal number */
    int   si_errno;    /* if nonzero, errno value from <errno.h> */
    int   si_code;     /* additional info (depends on signal) */
    pid_t si_pid;      /* sending process ID */
    uid_t si_uid;      /* sending process real user ID */
    void *si_addr;     /* address that caused the fault */
    int   si_status;   /* exit value or signal number */
    long  si_band;     /* band number for SIGPOLL */
    /* possibly other fields also */
};

Other POSIX Functions

• sigpending()                  examine pending (blocked) signals
• sigsetjmp(), siglongjmp()     jump functions for use in signal handlers which handle masks correctly
• sigsuspend()                  atomically reset mask and sleep

pause()

• Suspend the calling process until a signal is caught.
• #include <unistd.h>
  int pause(void);
• Returns -1 with errno assigned EINTR.
• pause() only returns after a signal handler has returned.

alarm()

• Set an alarm timer that will ‘ring’ after a specified number of seconds
  – a SIGALRM signal is generated
• #include <unistd.h>
  unsigned int alarm(unsigned int secs);
• Returns 0 or number of seconds until previously set alarm would have ‘rung’.

Some aspects of alarm()

• A process can have at most one alarm timer running at once.

• If alarm() is called when there is an existing alarm set then it returns the number of seconds remaining for the old alarm, and sets the timer to the new alarm value.

• An alarm(0) call causes the previous alarm to be cancelled.

setjmp() and longjmp()

• In C we cannot use goto to jump to a label in another function
  – use setjmp() and longjmp() for those ‘long jumps’

• Uses:
  – error handling which requires a deeply nested function to recover to a higher level (e.g. back to main())
  – coding timeouts with signals

Prototypes

• #include <setjmp.h>
  int setjmp( jmp_buf env );
• Returns 0 if called directly, non-zero if returning from a call to longjmp().
• #include <setjmp.h>
  void longjmp( jmp_buf env, int val );
• In the setjmp() call, env is initialized to information about the current state of the stack.
• The longjmp() call causes the stack to be reset to its env value.
• Execution restarts after the setjmp() call, but this time setjmp() returns val.

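Example

#include <setjmp.h>
#include <stdio.h>

#define MAX 1024    /* line buffer size; value assumed, not given on the slide */

jmp_buf env;    /* global */

void process_line( char *ptr );

int main()
{
    char line[MAX];
    int errval;

    if(( errval = setjmp(env) ) != 0 )
        printf( "error %d: restart\n", errval );
    while( fgets( line, MAX, stdin ) != NULL )
        process_line(line);
    return 0;
}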

continued

:
void process_line( char * ptr )
{
    :
    cmd_add();
    :
}

void cmd_add()
{
    int token;

    token = get_token();
    if( token < 0 )    /* bad error */
        longjmp( env, 1 );
    /* normal processing */
}

int get_token()
{
    if( some error )
        longjmp( env, 2 );
}

Stack Frames before calling longjmp()

[Figure: stack diagram showing the top of the stack and the direction of stack growth. Only the main() stack frame is present; setjmp(env) returns 0 and env records the stack frame info.]

Stack Frames after longjmp()

[Figure: stack diagram showing the top of the stack and the direction of stack growth. The stack holds the frames of main(), process_line(), ..., and cmd_add(); longjmp(env,1) causes the stack frames to be reset.]

What happens if longjmp() is called in signal handler?

• The signal is automatically added to the signal mask (which prevents it from further delivery) when a signal handler is entered. When the signal handler returns, the signal is removed from the mask.

• When longjmp() is called in a signal handler, the handler never returns, so the signal may remain blocked.

siglongjmp & sigsetjmp

• POSIX does not specify whether longjmp will restore the signal context. If you want to save and restore signal masks, use siglongjmp.

• POSIX does not specify whether setjmp will save the signal context. If you want to save signal masks, use sigsetjmp.

• #include <setjmp.h>
• int sigsetjmp(sigjmp_buf env, int savemask);
  Returns: 0 if called directly, nonzero if returning from a call to siglongjmp
• void siglongjmp(sigjmp_buf env, int val);

Inter Process Communication

122

Why do processes communicate?

123

• To share resources
• Client/server paradigms
• Inherently distributed applications
• Reusable software components
• etc.

Types of IPC

• Message Passing
– Pipes, FIFOs, and Message Queues
• Synchronization
– Mutexes, condition variables, read-write locks, file and record locks, and semaphores
• Shared memory
• Remote Procedure Calls
– Solaris doors and Sun RPC

Sharing of information

What is IPC?

• Each process has a private address space. Normally, no process can write to another process’s space. How to get important data from process A to process B?

• Message passing between different processes running on the same operating system is IPC

• Synchronization is required in case of IPC through shared memory or file system

Pipes

• Pipes are the oldest form of UNIX System IPC and are provided by all UNIX systems

• Most commonly used form of IPC
• Historically, they have been half duplex (i.e., data flows in only one direction).
• Because they don't have names, pipes can be used only between processes that have a common ancestor.
– Normally, a pipe is created by a process, that process calls fork, and the pipe is used between the parent and the child.

UNIX Pipes

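[Figure: the parent process p1 hands the info to be shared to the write function; the pipe for p1 and p2 (a FIFO buffer, size = 4096 characters) carries "hello" to the read function in the child process p2, which receives a copy of the info.]

int p[2];
pipe(p);
write(p[1], "hello", size);    /* parent process, p1 */
….
read(p[0], inbuf, size);       /* child process, p2 */
….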

Pipes

• #include <unistd.h>
• int pipe(int fd[2]);
  Returns 0 if OK, else -1
• fd[0] is for reading, fd[1] is for writing

Pipes

• Pipes are rarely used in a single process. They are generally used between parent and child

Pipes

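#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main (void)
{
    int p[2];
    pid_t ret;
    char buf[100];

    pipe (p);                            /* creating pipe */
    ret = fork ();
    if (ret == 0) {
        close (p[0]);                    /* child only writes */
        write (p[1], "hello", 6);        /* writing to parent through pipe */
    }
    if (ret > 0) {
        close (p[1]);                    /* parent only reads */
        read (p[0], buf, 6);             /* reading from child via pipe */
        printf ("Child Said:%s\n", buf); /* printing to stdout */
    }
    return 0;
}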

Pipes: who|sort

stdout

who|sort

• Create a pipe in the parent
• Fork a child
• In the child, duplicate the write end of the pipe onto the standard output descriptor
• Exec the ‘who’ program
• In the parent, wait for the child
• Duplicate the read end of the pipe onto the standard input descriptor
• Exec the ‘sort’ program

who|sort

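#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main (void)
{
    int p[2];
    pid_t ret;

    pipe (p);
    ret = fork ();
    if (ret == 0) {
        close (1);
        dup (p[1]);      /* lowest free descriptor is 1: stdout now writes to the pipe */
        close (p[0]);
        close (p[1]);
        execlp ("who", "who", (char *) 0);
    }
    if (ret > 0) {
        close (0);
        dup (p[0]);      /* lowest free descriptor is 0: stdin now reads from the pipe */
        close (p[1]);    /* so that sort sees end of file when who exits */
        close (p[0]);
        wait (NULL);     /* fine for short output; a full pipe would block who */
        execlp ("sort", "sort", (char *) 0);
    }
    return 0;
}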

dup and dup2 Functions

• #include <unistd.h>
• int dup(int filedes);
• int dup2(int filedes, int filedes2);
  Both return: new file descriptor if OK, -1 on error
• The new file descriptor returned by dup is guaranteed to be the lowest-numbered available file descriptor.
• With dup2, we specify the value of the new descriptor with the filedes2 argument. If filedes2 is already open, it is first closed. If filedes equals filedes2, then dup2 returns filedes2 without closing it.

dup and dup2

Popen

• #include <stdio.h>
• FILE *popen(const char *cmdstring, const char *type);
• Returns: file pointer if OK, NULL on error
• int pclose(FILE *fp);

popen

• popen and pclose together handle
– creating a pipe, forking a child, closing the unused ends of the pipe, executing a shell to run the command, and waiting for the command to terminate (the waiting is done by pclose)
– fp = popen("ls *.c", "r");

Name Spaces

• When two unrelated processes use some type of IPC to exchange information, the IPC object must have a name or identifier of some form

• The set of possible names for a given type of IPC is called its name space

• FIFOs have pathname in the file system as identifier

FIFOs

• Create a FIFO
– #include <sys/types.h>
– #include <sys/stat.h>
– int mkfifo(const char *pathname, mode_t mode);   //returns 0 if OK or -1
• Ex: if( mkfifo("fifo1", 0666) < 0 ) perror("mkfifo");
– mkfifo fails with errno set to EEXIST if the FIFO already exists at the given path

FIFOs

• Once a FIFO is created, it should be opened either for reading or writing
– wfd = open("fifo1", O_WRONLY); or
– FILE *fp = fopen("fifo1", "w");
• A FIFO should not be opened for both reading and writing at the same time (POSIX leaves the result of O_RDWR on a FIFO undefined)
• Unlike a pipe, a FIFO is not deleted as soon as all the processes referring to it exit. It has to be explicitly deleted from the system.
– unlink("fifo1")

FIFOs between parent and child

FIFOs between parent and child

Properties of FIFO

FIFOs between parent and child

Swap these two calls and see

Non-blocking option

• A descriptor can be set non-blocking in one of the two ways

Or
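The two listings for this slide are not present in the transcript. The sketch below shows the two usual ways, under that assumption: passing O_NONBLOCK to open, or setting it with fcntl on a descriptor that is already open. The function names are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Way 1: ask for non-blocking mode when opening (e.g. a FIFO). */
int open_nonblocking(const char *path, int flags)
{
    return open(path, flags | O_NONBLOCK);
}

/* Way 2: turn on O_NONBLOCK for an open descriptor (e.g. a pipe end). */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);   /* keep the flags already set */

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Returns 1 if a 1-byte read on fd fails because no data is available. */
int read_would_block(int fd)
{
    char c;

    return read(fd, &c, 1) == -1 &&
           (errno == EAGAIN || errno == EWOULDBLOCK);
}
```

fcntl is the only option for a pipe, since pipe() returns descriptors that are already open.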

Read and write operations Pipe and FIFO

Writing to pipe/fifo when pipe/fifo is open for reading

• If the data size is less than or equal to PIPE_BUF, the write is atomic, i.e. either all the data is written or no data is written
• If there is no room in the pipe for the requested data (<= PIPE_BUF), by default the write blocks.
– If the O_NONBLOCK option is set, an EAGAIN error is returned
• If the data is > PIPE_BUF and the O_NONBLOCK option is set, then even if only 1 byte of space is available in the pipe, it will write that much data and return
– Atomicity is not guaranteed

Message Queues

• A message queue is a linked list of messages stored within the kernel and identified by a message queue identifier

• Any process with adequate privileges can place the message into the queue and any process with adequate privileges can read from queue

• There is no requirement that some process must be waiting to receive message before sending the message

Message Queues

• Every message queue has following structure in kernel

Message Queues

Permissions

• struct ipc_perm { uid_t uid; /* owner's effective user id */ gid_t gid; /* owner's effective group id */ uid_t cuid; /* creator's effective user id */ gid_t cgid; /* creator's effective group id */ mode_t mode; /* access modes */ . . . };

• Permission Bit– user-read 0400– user-write (alter) 0200 – group-read 0040– group-write (alter) 0020– other-read 0004– other-write (alter) 0002

Message Queues

• First msgget is used to either open an existing queue or create a new queue

• #include <sys/msg.h>
  int msgget(key_t key, int flag);
– Returns: message queue ID if OK, -1 on error

• Key value can be IPC_PRIVATE, a key generated by ftok(), or any key (long integer)

• Flag value must be
– IPC_CREAT if a new queue has to be created
– IPC_CREAT and IPC_EXCL to create a new queue and fail, rather than reference an existing one, if the queue already exists

Key Values

• The server can create a new IPC structure by specifying a key of IPC_PRIVATE
– Kernel generates a unique id
• The client and the server can agree on a key by defining the key in a common header.
• The client and the server can agree on a pathname and project ID and call the function ftok to convert these two values into a key.
– #include <sys/ipc.h>
– key_t ftok(const char *path, int id);
– The path argument must refer to an existing file. Only the lower 8 bits of id are used when generating the key.

Message Queues

• When a new queue is created, the following members of the msqid_ds structure are initialized.
– The ipc_perm structure is initialized
– msg_qnum, msg_lspid, msg_lrpid, msg_stime, and msg_rtime are all set to 0.
– msg_ctime is set to the current time.
– msg_qbytes is set to the system limit.

• On success, msgget returns the non-negative queue ID. This value is then used with the other three message queue functions.

Messages

• Each message is composed of a positive long integer type field, and the actual data bytes. Messages are always placed at the end of the queue.

• Message Template

• Most applications define their own message structure according to the needs of the application

Sending Messages

• #include <sys/msg.h>
  int msgsnd(int msqid, const void *ptr, size_t nbytes, int flag);
• msqid is the id returned by the msgget system call
• The ptr argument is a pointer to a message structure
• nbytes is the length of the user data, i.e. sizeof(struct mesg) - sizeof(long). Length can be zero.
• A flag value of 0 or IPC_NOWAIT can be specified
• msgsnd() blocks until one of the following occurs
– Room exists for the message
– Message queue is removed (EIDRM error is returned)
– Interrupted by a signal (EINTR is returned)

158

Receiving Messages

• #include <sys/msg.h>
  ssize_t msgrcv(int msqid, void *ptr, size_t nbytes, long type, int flag);
• ptr points to the message structure where the message will be stored
• nbytes is the size available in the message structure, excluding sizeof(long)
• type indicates the message desired from the message queue
• flag can be 0, IPC_NOWAIT, or MSG_NOERROR

159

Receiving Messages

• The type argument lets us specify which message we want.
– type == 0: The first message on the queue is returned.
– type > 0: The first message on the queue whose message type equals type is returned.
– type < 0: The first message on the queue whose message type is the lowest value less than or equal to the absolute value of type is returned.
• A nonzero type is used to read the messages in an order other than first in, first out.
– Priority to messages, Multiplexing
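The effect of the type argument can be seen on a private queue. The sketch below queues messages of type 3, 1 and 2 and then receives three times; struct demo_msg, send_typed and receive_order are hypothetical names, and the queue is removed again before returning.

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct demo_msg {        /* application-defined message layout */
    long mtype;          /* message type, must be > 0 */
    char mtext[16];      /* message data */
};

static int send_typed(int qid, long type, const char *text)
{
    struct demo_msg m;

    m.mtype = type;
    strncpy(m.mtext, text, sizeof m.mtext - 1);
    m.mtext[sizeof m.mtext - 1] = '\0';
    return msgsnd(qid, &m, sizeof m.mtext, 0);   /* length excludes mtype */
}

/* Queues types 3, 1, 2 (in that order), then calls msgrcv three times
 * with type_arg and stores the types received in got[].
 * Returns 0 on success, -1 on error. */
int receive_order(long type_arg, long got[3])
{
    struct demo_msg m;
    int i, rc = 0;
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);

    if (qid == -1)
        return -1;
    if (send_typed(qid, 3, "c") == -1 || send_typed(qid, 1, "a") == -1 ||
        send_typed(qid, 2, "b") == -1)
        rc = -1;
    for (i = 0; rc == 0 && i < 3; i++) {
        if (msgrcv(qid, &m, sizeof m.mtext, type_arg, IPC_NOWAIT) == -1)
            rc = -1;
        else
            got[i] = m.mtype;
    }
    msgctl(qid, IPC_RMID, NULL);   /* remove the queue again */
    return rc;
}
```

With type_arg 0 the messages come back in arrival order (3, 1, 2); with type_arg -3 the lowest type is returned first (1, 2, 3), which is how priorities can be implemented.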

160

Receiving Messages

• IPC_NOWAIT flag makes the operation nonblocking, causing msgrcv to return -1 with errno set to ENOMSG if a message of the specified type is not available.

• If IPC_NOWAIT is not specified, the operation blocks until
– a message of the specified type is available,
– the queue is removed from the system (-1 is returned with errno set to EIDRM)
– a signal is caught and the signal handler returns (causing msgrcv to return -1 with errno set to EINTR).

161

Receiving Messages

• If the returned message is larger than nbytes and the MSG_NOERROR bit in flag is set, the message is truncated.
– no notification is given to us that the message was truncated, and the remainder of the message is discarded.
• If the message is too big and MSG_NOERROR is not specified, an error of E2BIG is returned instead (and the message stays on the queue).

162

Control Operations on Message Queues

• #include <sys/msg.h> int msgctl(int msqid, int cmd, struct msqid_ds *buf );

• IPC_STAT: Fetch the msqid_ds structure for this queue, storing it in the structure pointed to by buf.

• IPC_SET: Copy the following fields from the structure pointed to by buf to the msqid_ds structure associated with this queue: msg_perm.uid, msg_perm.gid, msg_perm.mode, and msg_qbytes.

• IPC_RMID: Remove the message queue from the system and any data still on the queue. This removal is immediate.

– Any other process still using the message queue will get an error of EIDRM on its next attempted operation on the queue.

– IPC_SET and IPC_RMID can be executed only by a process whose effective user ID equals msg_perm.cuid or msg_perm.uid, or by a process with superuser privileges

163

Server.c

/* key.h */
#define MSGQ_PATH "/home/students/f2007045/msgq_server.c"

struct my_msgbuf
{
    long mtype;
    char mtext[200];
};

/* server */
#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include "key.h"

int main (void)
{
    struct my_msgbuf buf;
    int msqid;
    key_t key;

    if ((key = ftok (MSGQ_PATH, 'B')) == -1) {
        perror ("ftok");
        exit (1);
    }
    if ((msqid = msgget (key, IPC_CREAT | 0644)) == -1) {
        perror ("msgget");
        exit (1);
    }
    printf ("server: ready to receive messages\n");
    for (;;) {
        /* the length counts only the data part, not the long mtype */
        if (msgrcv (msqid, &buf, sizeof (buf.mtext), 0, 0) == -1) {
            perror ("msgrcv");
            exit (1);
        }
        printf ("server: \"%s\"\n", buf.mtext);
    }
    return 0;
}

164

Client.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include "key.h"    /* MSGQ_PATH and struct my_msgbuf */

int main (void)
{
    struct my_msgbuf buf;
    int msqid;
    key_t key;

    if ((key = ftok (MSGQ_PATH, 'B')) == -1) {
        perror ("ftok");
        exit (1);
    }
    if ((msqid = msgget (key, 0)) == -1) {
        perror ("msgget");
        exit (1);
    }
    printf ("Enter lines of text, ^D to quit:\n");
    buf.mtype = 1;
    /* fgets replaces the unsafe gets; the trailing newline is stripped */
    while (fgets (buf.mtext, sizeof (buf.mtext), stdin) != NULL) {
        buf.mtext[strcspn (buf.mtext, "\n")] = '\0';
        if (msgsnd (msqid, &buf, strlen (buf.mtext) + 1, 0) == -1)
            perror ("msgsnd");
    }
    if (msgctl (msqid, IPC_RMID, NULL) == -1) {
        perror ("msgctl");
        exit (1);
    }
    return 0;
}

165

Multiplexing Messages

• Possibility of deadlock

166

Multiplexing Messages

167

System V Semaphores

• A semaphore is a primitive used to provide synchronization between various processes (or between various threads in a given process)

• Binary Semaphores: a semaphore that can assume only values 0 or 1

• Counting Semaphores: semaphore is initialized to N indicating the number of resources

168

System V Semaphores

• Semaphores are maintained by kernel

169

Semaphore operations

• Create a semaphore and initialize it – should be atomically done

• Wait for a semaphore: this tests the value of the semaphore, waits (blocks) if the value is less than or equal to 0, and then decrements the semaphore value once it is greater than 0 (aka P, lock, wait)
– Testing and decrementing should be a single atomic operation
• Post a semaphore: this increments the semaphore value. If any processes are blocked waiting for this semaphore's value to be greater than 0, one of those processes is woken up (aka V, unlock, signal)

170

Producer Consumer Problem

• Producer produces one item and keeps in buffer.• Consumer removes that item for processing• How to synchronize?

171

Producer Consumer Problem

• Semaphore put controls whether the producer can place an item into the shared buffer

• Semaphore get controls whether the consumer can remove an item from the shared buffer

172

System V Semaphores

• Add one more level of detail by defining “a set of counting semaphores”

• When we say System V semaphore it refers to a set of counting semaphores (max size of set is 25)

173

System V Semaphores

• Kernel maintains the following structure for every set

• Sem structure maintains info about each semaphore. Sem_base contains pointer to an array of these structures

174

System V Semaphores

• Kernel structure for a semaphore set having 2 counting semaphores

175

Creating Semaphores

• The number of semaphores in the set is nsems. If a new set is being created, we must specify nsems. If we are referencing an existing set, we can specify nsems as 0.

• When a new set is created, the following members of the semid_ds structure are initialized.
– The ipc_perm structure
– sem_otime is set to 0.
– sem_ctime is set to the current time.
– sem_nsems is set to nsems.

176

Initializing a semaphore value

• semnum specifies which semaphore in the set (0, 1, 2 …)
• The semun union is used for some commands

• This union typically isn't declared in any system header (for example on Linux); it should be declared in your program
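A minimal sketch of declaring the union and using it with SETVAL follows. The union is named semarg here (a hypothetical name) so the sketch also compiles on systems whose headers already define union semun; init_and_read is a hypothetical helper.

```c
#include <sys/ipc.h>
#include <sys/sem.h>

/* Declared by the program, with the members semctl expects. */
union semarg {
    int val;                  /* for SETVAL */
    struct semid_ds *buf;     /* for IPC_STAT, IPC_SET */
    unsigned short *array;    /* for GETALL, SETALL */
};

/* Creates a private set holding one semaphore, sets it to initial,
 * reads it back with GETVAL and removes the set.
 * Returns the value read, or -1 on error. */
int init_and_read(int initial)
{
    union semarg arg;
    int value;
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    if (id == -1)
        return -1;
    arg.val = initial;
    if (semctl(id, 0, SETVAL, arg) == -1)   /* the union itself, not a pointer */
        value = -1;
    else
        value = semctl(id, 0, GETVAL);      /* semnum 0: first semaphore */
    semctl(id, 0, IPC_RMID);
    return value;
}
```

Creation and initialization are two separate calls here, so they are not atomic.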

177

Testing whether semaphore has been initialized

• When process P1 creates semaphore sem_otime is set to zero.

• When P1 calls semctl to initialize and then semop, sem_otime is set to current time.

• When process P2 checks sem_otime is non zero it understands that semaphore has been initialized.

178

semctl() commands

• IPC_STAT, IPC_SET, IPC_RMID same as in message queues
• GETVAL: Return the value of semval for the member semnum.
• SETVAL: Set the value of semval for the member semnum. The value is specified by arg.val.
• GETPID: Return the value of sempid for the member semnum.
• GETNCNT: Return the value of semncnt for the member semnum.
• GETZCNT: Return the value of semzcnt for the member semnum.
• GETALL: Fetch all the semaphore values in the set. These values are stored in the array pointed to by arg.array.
• SETALL: Set all the semaphore values in the set to the values pointed to by arg.array

179

Semaphore operations

• opsptr points to an array of the following structure

• nops specifies the number of structures in the array
• semop guarantees that either all these operations are done or none are done

180

Semaphore operations

• The operation on each member of the set is specified by the corresponding sem_op value. This value can be negative, 0, or positive.

• If sem_op > 0:
– returning of resources by the process.
– semval += sem_op
– If the SEM_UNDO flag is specified, semadj -= sem_op, i.e. sem_op is subtracted from the semaphore's adjustment value for this process.

181

Semaphore operations

• If sem_op < 0
– obtain resources that the semaphore controls.

• If semval >= |sem_op|
– the resources are available
– semval -= |sem_op|
– If the SEM_UNDO flag is specified, semadj += |sem_op|, i.e. the absolute value of sem_op is added to the semaphore's adjustment value for this process.

182

Semaphore operations

• If semval < |sem_op|
– the resources are not available
– If IPC_NOWAIT is specified, semop returns with an error of EAGAIN.
– If IPC_NOWAIT is not specified, the semncnt value for this semaphore is incremented (since the caller is about to go to sleep), and the calling process is suspended until one of the following occurs.
  • semval >= |sem_op|, i.e. some other process has released some resources. semncnt--
  • The semaphore is removed from the system. In this case, the function returns an error of EIDRM.
  • A signal is caught by the process, and the signal handler returns. The function returns an error of EINTR. semncnt--

183

Semaphore operations

• If sem_op = 0,
– this means that the calling process wants to wait until the semaphore's value becomes 0.
• If the semaphore's value is currently 0, the function returns immediately.
• If the semaphore's value is nonzero, the following conditions apply.
– If IPC_NOWAIT is specified, return is made with an error of EAGAIN.
– If IPC_NOWAIT is not specified, semzcnt++, and the calling process is suspended until one of the following occurs.
  • The semaphore's value becomes 0. semzcnt--
  • The semaphore is removed from the system. In this case, the function returns an error of EIDRM.
  • A signal is caught by the process, and the signal handler returns. The function returns an error of EINTR. semzcnt--

184

Semval adjustment on process termination

• It is a problem if a process terminates while it has resources allocated through a semaphore.

• Whenever we specify the SEM_UNDO flag for a semaphore operation and we allocate resources (a sem_op value less than 0), the kernel remembers how many resources we allocated from that particular semaphore (the absolute value of sem_op).

• When the process terminates, either voluntarily or involuntarily, the kernel checks whether the process has any outstanding semaphore adjustments and, if so, applies the adjustment to the corresponding semaphore value.

• If we set the value of a semaphore using semctl, with either the SETVAL or SETALL commands, the adjustment value for that semaphore in all processes is set to 0.
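The adjustment on termination can be observed with a child that takes a resource and exits without giving it back. This is a sketch under stated assumptions: value_after_child_exit and union semarg are hypothetical names, and the semaphore set is private and removed before returning.

```c
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

union semarg { int val; struct semid_ds *buf; unsigned short *array; };

/* The semaphore starts at 1. A child does sem_op = -1 with the given
 * flags and exits. Returns the value the parent sees afterwards,
 * or -1 on error. */
int value_after_child_exit(short flags)
{
    union semarg arg;
    struct sembuf take;
    int value = -1;
    pid_t pid;
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    if (id == -1)
        return -1;
    arg.val = 1;
    if (semctl(id, 0, SETVAL, arg) != -1) {
        take.sem_num = 0;
        take.sem_op = -1;           /* allocate one resource */
        take.sem_flg = flags;       /* SEM_UNDO or 0 */
        pid = fork();
        if (pid == 0) {
            semop(id, &take, 1);
            _exit(0);               /* terminates while holding the resource */
        }
        if (pid > 0 && waitpid(pid, NULL, 0) == pid)
            value = semctl(id, 0, GETVAL);
    }
    semctl(id, 0, IPC_RMID);
    return value;
}
```

With SEM_UNDO the kernel applies the child's adjustment at exit and the value returns to 1; with flags of 0 the resource stays allocated and the value remains 0.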

185

Producer Consumer

/* producer */
unsigned short val[1];
id = semget (KEY, 1, IPC_CREAT | 0666);
setval.val = 2;
semctl (id, 0, SETVAL, setval);

/* operation 0: wait until the semaphore value is 0 */
operations[0].sem_num = 0;
operations[0].sem_op = 0;
operations[0].sem_flg = 0;
/* operation 1: add 10 to the semaphore value */
operations[1].sem_num = 0;
operations[1].sem_op = 10;
operations[1].sem_flg = 0;
for (;;) {
    retval = semop (id, operations, 2);   /* both operations are done, or none */
    if (retval == 0) {
        printf ("Producer: Adding 10 objects\n");
        getval.array = val;
        semctl (id, 0, GETALL, getval);
        printf ("Sem Val: %d\n", getval.array[0]);
    }
}

/* consumer */
id = semget (KEY, 1, 0666);
operations[0].sem_num = 0;
operations[0].sem_op = -1;                /* take one object */
operations[0].sem_flg = 0;
for (;;) {
    retval = semop (id, operations, 1);
    if (retval == 0) {
        printf ("Consumer: Getting one object from shelf.\n");
        setval.array = val;
        semctl (id, 0, GETALL, setval);
        printf ("Sem Value: %d\n", setval.array[0]);
    }
}

186

Shared Memory

• Shared memory allows two or more processes to share a given region of memory.

• This is the fastest form of IPC, because the data does not need to be copied between the client and the server

187

Message Passing

• Takes 4 copies to transfer data between two processes

188

Shared Memory

• Takes only two steps
• Kernel is not involved in transferring data but it is involved in creating shared memory

189

Memory mapped files

190

Memory mapped files

• prot argument for read-write access is PROT_READ | PROT_WRITE

• Flags must be either MAP_SHARED or MAP_PRIVATE

• MAP_SHARED is used to share memory with other processes

191

Why mmap()?

• It makes file handling easy. We open some file and map that file into our process address space. To write or read from file we don’t have to use read(), write() or lseek()

• Another use is to provide shared memory between unrelated processes

192

Counter Example

• Closing file has no effect on memory mapping

• Memory mappings are propagated to newly created child

193

System V Shared Memory

• For every shared memory segment kernel maintains the following structure

194

System V Shared Memory

• Creating or opening shared memory
– #include <sys/shm.h>
– int shmget(key_t key, size_t size, int flag);
– Size is given as zero if we are referencing an existing shared memory segment
– When a new segment is created, the contents of the segment are initialized with zeros

195

[Figure label: size of memory in bytes]

Attaching shared memory to a process

• Once a shared memory segment has been created, a process attaches it to its address space by calling shmat.

– #include <sys/shm.h>
– void *shmat(int shmid, const void *addr, int flag);
  Returns: pointer to shared memory segment if OK, (void *)-1 on error
• The address in the calling process at which the segment is attached depends on the addr argument
• If addr is 0, the segment is attached at the first available address selected by the kernel. This is the recommended technique.

196

Detaching shared memory from a process

• #include <sys/shm.h>
• int shmdt(const void *addr);
• This does not remove the identifier and its associated data structure from the system.
• The identifier remains in existence until some process (often a server) specifically removes it by calling shmctl with a command of IPC_RMID.

197

shmctl

• #include <sys/shm.h>
• int shmctl(int shmid, int cmd, struct shmid_ds *buf);
• IPC_STAT, IPC_SET same as other XSI IPC.
• IPC_RMID: Remove the shared memory segment from the system. The segment is not removed until the last process using the segment terminates or detaches it.
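The create, attach, detach and remove calls can be strung together as below. To stay inside one process, this sketch attaches the same private segment twice and checks that data written through one address is visible through the other; shm_roundtrip is a hypothetical name and the 4096-byte size is an arbitrary choice.

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Returns 1 if text written through one attachment is read back
 * through the other, 0 if not, -1 on error. */
int shm_roundtrip(const char *text)
{
    int ok = -1;
    char *a, *b;
    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);

    if (id == -1)
        return -1;
    a = shmat(id, 0, 0);    /* addr 0: the kernel selects the address */
    b = shmat(id, 0, 0);    /* second attachment of the same segment */
    if (a != (void *)-1 && b != (void *)-1) {
        /* a new segment is zero-filled, so a[4095] stays '\0' */
        strncpy(a, text, 4095);
        ok = (a != b && strcmp(b, text) == 0);
    }
    if (a != (void *)-1)
        shmdt(a);
    if (b != (void *)-1)
        shmdt(b);
    shmctl(id, IPC_RMID, NULL);   /* removed once nothing is attached */
    return ok;
}
```

Between two processes the pattern is the same: each process calls shmat with the same identifier and sees the same bytes.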

198

Memory Mapping of /dev/zero

• Shared memory can be used between unrelated processes. But if the processes are related, some implementations provide a different technique.

• The device /dev/zero is an infinite source of 0 bytes when read. This device also accepts any data that is written to it, ignoring the data.

• An unnamed memory region is created and is initialized to 0.
• Multiple processes can share this region if a common ancestor specifies the MAP_SHARED flag to mmap.

199

void *area;
int fd;

if ((fd = open("/dev/zero", O_RDWR)) < 0)
    perror("open error");
if ((area = mmap(0, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0)) == MAP_FAILED)
    perror("mmap error");
close(fd);    /* the mapping stays valid after the descriptor is closed */

Anonymous Memory Mapping

• A facility similar to the /dev/zero feature. To use this facility, we specify the MAP_ANON flag to mmap and specify the file descriptor as -1.

• The resulting region is anonymous (since it's not associated with a pathname through a file descriptor) and creates a memory region that can be shared with descendant processes.

• In the call below, we specify the MAP_ANON flag and set the file descriptor to -1.

200

void *area;

if ((area = mmap(0, SIZE, PROT_READ | PROT_WRITE, MAP_ANON | MAP_SHARED, -1, 0)) == MAP_FAILED)
    perror("mmap error");

Shared Memory

• Between unrelated processes:
– XSI or System V shared memory
– can use mmap to map the same file into another process's address space using the MAP_SHARED flag.
• Between related processes
– Memory mapping of /dev/zero
– Anonymous memory mapping

201

• Pipes and FIFOs
• System V Message Queues, Semaphores, Shared Memory

• Posix Message Queues, semaphores, shared memory

202

Effect of fork, exec, _exit on IPC

203

TCP/UDP

TCP/IP

TCP or UDP

• At the internet layer, a destination address identifies a host computer; no further distinction is made regarding which process will receive the datagram

• TCP or UDP add a mechanism that distinguishes among destinations within a given host, allowing multiple processes to send and receive datagrams independently

UDP (User Datagram Protocol)

• UDP provides an unreliable connectionless delivery service

• UDP uses IP to deliver datagrams to the right host.
• UDP uses ports to provide communication services to individual processes.

Ports

• TCP/IP uses an abstract destination point called a protocol port.

• Ports are identified by a positive integer.
• Operating systems provide some mechanism that processes use to specify a port.

Port Numbers

• The port numbers are divided into three ranges by Internet Assigned Numbers Authority

• The well-known ports: 0 through 1023. These port numbers are controlled and assigned by the IANA.

• The registered ports: 1024 through 49151. These are not controlled by the IANA, but the IANA registers and lists the uses of these ports as a convenience to the community.

• The dynamic or private ports, 49152 through 65535. The IANA says nothing about these ports. These are what we call ephemeral ports. (The magic number 49152 is three-fourths of 65536.)
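The three ranges can be written down as a small classifier. This is only an illustration of the boundaries listed above; the enum and function names are hypothetical.

```c
enum port_range {
    PORT_WELL_KNOWN,   /* 0 through 1023 */
    PORT_REGISTERED,   /* 1024 through 49151 */
    PORT_DYNAMIC,      /* 49152 through 65535 (ephemeral ports) */
    PORT_INVALID       /* does not fit in 16 bits */
};

enum port_range classify_port(long port)
{
    if (port < 0 || port > 65535)
        return PORT_INVALID;
    if (port <= 1023)
        return PORT_WELL_KNOWN;
    if (port <= 49151)
        return PORT_REGISTERED;
    return PORT_DYNAMIC;
}
```

For example, port 80 falls in the well-known range, and 49152 (three-fourths of 65536) is the first dynamic port.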

Ports

UDP header

• Header size is 8 bytes
• Lack of reliability: If a datagram reaches its final destination but the checksum detects an error, or if the datagram is dropped in the network, it is not delivered to the UDP socket and is not automatically retransmitted.

• If we want to be certain that a datagram reaches its destination, we can build lots of features into our application: acknowledgments from the other end, timeouts, retransmissions, and the like.

Some standard UDP based services and their ports

TCP: Transmission Control Protocol

• TCP provides connections between clients and servers.
• TCP uses the connection, not the protocol port, as its fundamental abstraction.
• Connections are identified by a pair of endpoints.
– Endpoint means (ip, port)
• TCP provides:
– Connection-oriented
– Reliable
– Full-duplex
– Byte-Stream

Connection-Oriented

• Connection oriented means that a virtual connection is established before any user data is transferred.

• A TCP client establishes a connection with a given server, exchanges data with that server across the connection, and then terminates the connection.

• If the connection cannot be established - the user program is notified.

• If the connection is ever interrupted - the user program(s) is notified.

Reliable

• TCP also provides reliability. When TCP sends data to the other end, it requires an acknowledgment in return.

• If an acknowledgment is not received, TCP automatically retransmits the data and waits a longer amount of time.

• After some number of retransmissions, TCP will give up
– the total amount of time spent trying to send data is typically between 4 and 10 minutes (depending on the implementation).

Reliable

• How can TCP provide reliable transfer if the underlying communication system offers only unreliable packet delivery?

• Answer is positive acknowledgement with retransmission.

Positive Acknowledgement with Retransmission

Positive Acknowledgement with Retransmission

Reliability - duplicates

• An underlying packet delivery system can duplicate packets.
– Duplicates can arise when networks experience high delays that cause premature retransmission.
– Both packets and acknowledgements can be duplicated.
• Duplicate packets are detected by assigning each packet a sequence number and requiring the receiver to remember which sequence numbers it has received.

• To avoid confusion caused by delayed or duplicated acknowledgements, TCP acknowledgement specifies the sequence number of the next octet that the receiver expects to receive.

Byte Stream

• Stream means that the connection is treated as a stream of bytes.
– If payroll data is being sent, there are no boundaries in the stream differentiating employee records
• The user application does not need to package data in individual datagrams (as with UDP).

Buffering

• TCP is responsible for buffering data and determining when it is time to send a datagram.

• It is possible for an application to tell TCP to send the data it has buffered without waiting for a buffer to fill up.

Full Duplex

• TCP provides transfer in both directions.
• To the application program these appear as 2 unrelated data streams, although TCP can piggyback control and data communication by providing control information (such as an ACK) along with user data.

TCP Ports

• Interprocess communication via TCP is achieved with the use of ports (just like UDP).

• UDP ports have no relation to TCP ports (different name spaces).

TCP Segments

• TCP views the data stream as a sequence of bytes that it divides into segments for transmission. Segments carry varying sizes of data.

• The chunk of data that TCP asks IP to deliver is called a TCP segment.

• Each segment contains:
– data bytes from the byte stream
– control information that identifies the data bytes

TCP Segment Format

TCP Segments

• Segments are exchanged to establish connections, transfer data, send acknowledgements, advertise window sizes, and close connections.

• Because TCP uses piggybacking, acknowledgement can be sent along with data
– an acknowledgement traveling from machine A to machine B may travel in the same segment as data traveling from machine A to machine B, even though the acknowledgement refers to data sent from B to A

Flags

• TCP advertises how much data it is willing to accept every time it sends segment by specifying its buffer size in the WINDOW field.

Sliding Window

• TCP uses a specialized sliding window mechanism to solve two important problems

– efficient transmission
– flow control

• The TCP window mechanism makes it possible to send multiple segments before an acknowledgement arrives.

• The TCP form of a sliding window protocol also solves the end-to-end flow control problem, by allowing the receiver to restrict transmission until it has sufficient buffer space to accommodate more data.

TCP Sliding Window

• Three markers are maintained

• octets up to 2 have been sent and acknowledged,
• octets 3 through 6 have been sent but not acknowledged,
• octets 7 through 9 have not been sent but will be sent without delay,
• octets 10 and higher cannot be sent until the window moves

Variable Window Size and Flow Control

• Each acknowledgement contains a window advertisement that specifies how many additional octets of data the receiver is prepared to accept.

• In response to an increased window advertisement, the sender increases the size of its sliding window

• In response to a decreased window advertisement, the sender decreases the size of its window and stops sending octets beyond the boundary.

• In the extreme case, the receiver advertises a window size of zero to stop all transmissions.
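The bookkeeping behind the window can be sketched with the octet example used earlier (octets up to 2 acknowledged, 3 through 6 in flight, 7 through 9 sendable, so a window of 7). The struct and function names are hypothetical, and real TCP uses 32-bit sequence numbers that wrap around, which this sketch ignores.

```c
struct send_window {
    unsigned long last_acked;     /* highest octet acknowledged so far */
    unsigned long next_to_send;   /* first octet that has not been sent */
    unsigned long window;         /* receiver's advertised window, in octets */
};

/* Octets that have been sent but not yet acknowledged. */
unsigned long in_flight(const struct send_window *w)
{
    return w->next_to_send - w->last_acked - 1;
}

/* Octets that may still be sent before an acknowledgement arrives. */
unsigned long usable(const struct send_window *w)
{
    unsigned long used = in_flight(w);

    /* A shrunken or zero window stops transmission. */
    return used >= w->window ? 0 : w->window - used;
}
```

With last_acked = 2, next_to_send = 7 and window = 7, four octets are in flight and three more may be sent; an advertisement of zero makes usable return 0.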

TCP Connection Establishment

• Three-way handshake
• It accomplishes two important functions.
– It guarantees that both sides are ready to transfer data (and that they know they are both ready)
– It allows both sides to agree on initial sequence numbers.
• Sequence numbers are sent and acknowledged during the handshake. Each machine must choose an initial sequence number at random that it will use to identify bytes in the stream it is sending.

TCP Connection Establishment

• When a client requests a connection, it sends a “SYN” segment (a special TCP segment) to the server port.

• SYN stands for synchronize. The SYN message includes the client’s ISN.

• ISN is Initial Sequence Number.

TCP Connection Establishment

• Every TCP segment includes a Sequence Number that refers to the first byte of data included in the segment.

• Every TCP segment includes a Request Number (Acknowledgement Number) that indicates the byte number of the next data that is expected to be received.
– All bytes before this number have already been received.

TCP Connection Establishment

• A server accepts a connection.– Must be looking for new connections!

• A client requests a connection.– Must know where the server is!

Client Starts

• A client starts by sending a SYN segment with the following information:– Client’s ISN (generated pseudo-randomly)– Maximum Receive Window for client.– Optionally (but usually) MSS (largest datagram accepted).– No payload! (Only TCP headers)

Server Response

• When a waiting server sees a new connection request, the server sends back a SYN segment (which also acknowledges the client's SYN) with:
– Server’s ISN (generated pseudo-randomly)
– Request Number is Client ISN+1
– Maximum Receive Window for server.
– Optionally (but usually) MSS
– No payload! (Only TCP headers)

Finally

• When the Server’s SYN is received, the client sends back an ACK with:– Request Number is Server’s ISN+1
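The ISN+1 rule used in both replies can be written down directly. A minimal sketch, assuming a made-up helper name; the only fact it relies on is that TCP sequence numbers are 32 bits wide and wrap around.

```c
#include <stdint.h>

/* Hypothetical helper: the Request (Acknowledgement) Number that answers a SYN.
 * The SYN consumes one sequence number, so the reply asks for ISN + 1.
 * Sequence numbers are 32 bits wide and wrap around modulo 2^32. */
uint32_t ack_for_syn(uint32_t peer_isn)
{
    return peer_isn + 1u;   /* unsigned arithmetic wraps automatically */
}
```

For example, a client ISN of 1000 is answered with Request Number 1001, and an ISN of 0xFFFFFFFF wraps to 0.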

TCP Connection Establishment

• Why is the third message necessary?– HINTS:

• TCP is a reliable service.• IP delivers each TCP segment.• IP is not reliable.

• Why doesn't each connection simply start with initial sequence number 1?

TCP Options

• MSS option. The maximum amount of data that the sender of the option is willing to accept in each TCP segment on this connection.

• Window scale option. The maximum window that either TCP can advertise to the other TCP is 65,535. This option specifies that the advertised window in the TCP header must be scaled (left-shifted) by 0–14 bits, providing a maximum window of almost one gigabyte (65,535 × 2^14).

• Timestamp option. This option is needed for high-speed connections to prevent possible data corruption caused by old, delayed, or duplicated segments.
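The window scale arithmetic is easy to check in code. A small sketch, with a hypothetical function name; it only applies the left shift described above.

```c
#include <stdint.h>

/* Hypothetical helper: effective window given the 16-bit advertised value
 * and the shift count (0..14) agreed through the window scale option. */
uint32_t scaled_window(uint16_t advertised, unsigned shift)
{
    if (shift > 14)
        shift = 14;                      /* the option allows at most 14 */
    return (uint32_t) advertised << shift;
}
```

scaled_window(65535, 14) evaluates to 1,073,725,440 bytes, which is the "almost one gigabyte" figure quoted above.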

TCP Buffers

• Both the client and server allocate buffers to hold incoming and outgoing data– The TCP layer does this.

• Both the client and server announce with every ACK how much buffer space remains (the Window field in a TCP segment).

Send Buffers

• The application gives the TCP layer some data to send.
• The data is put in a send buffer, where it stays until the data is ACK’d.
– it has to stay, as it might need to be sent again!

• The TCP layer won’t accept data from the application unless (or until) there is buffer space.

Connection Termination

• The TCP layer can send a RST segment that terminates a connection if something is wrong.

• Usually the application tells TCP to terminate the connection gracefully with a FIN segment.

Connection Termination

FIN

• Either end of the connection can initiate termination.
• A FIN is sent, which means the application is done sending data.
• The FIN is ACK’d.
• The other end must now send a FIN.
• That FIN must be ACK’d.

Connection Termination

TCP Connection State Diagram

• There are 11 different states defined for a connection– based on the current state and the segment received in that state.

• One reason for showing the state transition diagram is to show the 11 TCP states with their names. These states are displayed by netstat, which is a useful tool when debugging client/server applications

What is the purpose of TIME_WAIT?

• Once a TCP connection has been terminated (the last ACK sent) there is some unfinished business:
– What if the ACK is lost? The last FIN will be resent and it must be ACK’d.
– What if there are lost or duplicated segments that finally reach a new incarnation of the previous connection after a long delay?
• The MSL (maximum segment lifetime) is the maximum amount of time that any given IP datagram can live in a network

Socket Pair

• The socket pair for a TCP connection is the four-tuple that defines the two endpoints of the connection:

– the local IP address, local port, foreign IP address, and foreign port.
• A socket pair uniquely identifies every TCP connection on a network.
• The two values that identify each endpoint, an IP address and a port number, are often called a socket.
• We can extend the concept of a socket pair to UDP, even though UDP is connectionless.

Socket Pair

Writing to TCP Socket

Writing to UDP Socket

Sockets

TCP/IP Model

TCP/IP

• TCP/IP does not include an API definition.
• There are a variety of APIs for use with TCP/IP:
– Sockets
– TLI, XTI
– Winsock
– MacTCP

Functions needed:

• Specify local and remote communication endpoints
• Initiate a connection
• Wait for incoming connection
• Send and receive data
• Terminate a connection gracefully
• Error handling

Berkeley Sockets

• Generic:– support for multiple protocol families.– address representation independence

• Uses existing I/O programming interface as much as possible.– Socket api is similar to file I/O

Socket

• A socket is an abstract representation of a communication endpoint.

• Sockets work with Unix I/O services just like files, pipes & FIFOs.

• Sockets (obviously) have special needs over files:– establishing a connection– specifying communication endpoint addresses

Unix Descriptor Table

Socket Descriptor Data Structure

Creating a Socket

int socket(int family, int type, int protocol);

• family specifies the protocol family (AF_INET for TCP/IP).

• type specifies the type of service (SOCK_STREAM, SOCK_DGRAM).

• protocol specifies the specific protocol (usually 0, which means the default).

socket()

• The socket() system call returns a socket descriptor (small integer) or -1 on error.

• socket() allocates resources needed for a communication endpoint - but it does not deal with endpoint addressing.

Specifying an Endpoint Address

• Remember that the sockets API is generic.
• There must be a generic way to specify endpoint addresses.
• TCP/IP requires an IP address and a port number for each endpoint address.
• Other protocol suites (families) may use other schemes.

Necessary Background Information: POSIX data types

int8_t      signed 8-bit int
uint8_t     unsigned 8-bit int
int16_t     signed 16-bit int
uint16_t    unsigned 16-bit int
int32_t     signed 32-bit int
uint32_t    unsigned 32-bit int

u_char, u_short, u_int, u_long

More POSIX data types

sa_family_t   address family
socklen_t     length of struct
in_addr_t     IPv4 address
in_port_t     IP port number

Generic socket addresses

struct sockaddr {
    uint8_t     sa_len;
    sa_family_t sa_family;
    char        sa_data[14];
};

• sa_family specifies the address type.• sa_data specifies the address value.

AF_INET

• For AF_INET we need:– 16 bit port number – 32 bit IP address

struct sockaddr_in (IPv4)

struct sockaddr_in {
    uint8_t        sin_len;
    sa_family_t    sin_family;
    in_port_t      sin_port;
    struct in_addr sin_addr;
    char           sin_zero[8];
};

A special kind of sockaddr structure, used for IPv4 sockets

struct in_addr

struct in_addr {
    in_addr_t s_addr;
};

Byte Order

Network Byte Order

• Network communication uses big-endian byte order, also known as Network Byte Order (NBO)

• All values stored in a sockaddr_in must be in network byte order.– sin_port a TCP/IP port number.– sin_addr an IP address.

Network Byte Order Functions

‘h’ : host byte order     ‘n’ : network byte order
‘s’ : short (16 bit)      ‘l’ : long (32 bit)

uint16_t htons(uint16_t);
uint16_t ntohs(uint16_t);

uint32_t htonl(uint32_t);
uint32_t ntohl(uint32_t);
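To see what network byte order means for the bytes in memory, here is a small sketch. The helper name is made up for illustration; htons() itself is the standard call listed above.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the first byte that would go on the wire
 * for a 16-bit port number after conversion with htons(). */
unsigned char first_wire_byte(uint16_t host_port)
{
    uint16_t net = htons(host_port);     /* host -> network (big-endian) */
    unsigned char bytes[2];

    memcpy(bytes, &net, sizeof(net));    /* inspect the in-memory layout */
    return bytes[0];                     /* most significant byte comes first */
}
```

For port 0x1234 the first byte is 0x12 on any host, whether the host itself is little-endian or big-endian. Converting back with ntohs() always returns the original value.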

TCP/IP Addresses

• We don’t need to deal with sockaddr structures since we will only deal with a real protocol family.

• We can use sockaddr_in structures.

BUT: The C functions that make up the sockets API expect structures of type sockaddr.

Assigning an address to a socket

• The bind() system call is used to assign an address to an existing socket.

int bind(int sockfd, const struct sockaddr *myaddr, socklen_t addrlen);

• bind returns 0 if successful or -1 on error.
• Note the const: bind() does not modify the address structure.

bind()

• calling bind() assigns the address specified by the sockaddr structure to the socket descriptor.

• You can give bind() a sockaddr_in structure: bind( mysock, (struct sockaddr*) &myaddr, sizeof(myaddr) );

bind() Example

int mysock, err;
struct sockaddr_in myaddr;

mysock = socket(PF_INET, SOCK_STREAM, 0);
myaddr.sin_family = AF_INET;
myaddr.sin_port = htons(portnum);             /* port in network byte order */
myaddr.sin_addr.s_addr = htonl(ipaddress);    /* s_addr holds the 32-bit address */

err = bind(mysock, (struct sockaddr *) &myaddr, sizeof(myaddr));

Uses for bind()

• There are a number of uses for bind():
– Server would like to bind to a well known address (port number).
– Client can bind to a specific port.
– Client can ask the O.S. to assign any available port number.

IPv4 Address Conversion

int inet_aton(const char *, struct in_addr *);

Convert ASCII dotted-decimal IP address to network byte order 32 bit value. Returns 1 on success, 0 on failure.

char *inet_ntoa(struct in_addr);

Convert network byte ordered value to ASCII dotted-decimal (a string).
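A short sketch of how inet_aton() might be wrapped in practice. The wrapper name and its host-order output parameter are assumptions made for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/* Hypothetical helper: parse a dotted-decimal string into a host-order
 * 32-bit address. Returns 1 on success, 0 if the string is not valid. */
int parse_ipv4(const char *text, uint32_t *host_order)
{
    struct in_addr addr;

    if (inet_aton(text, &addr) == 0)
        return 0;                        /* inet_aton reports failure as 0 */
    *host_order = ntohl(addr.s_addr);    /* s_addr is in network byte order */
    return 1;
}
```

Note that the return convention (1 for success, 0 for failure) differs from most socket calls, which return -1 on error.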

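TCP Client Server

[Figure: sequence of socket calls in a TCP client/server exchange]

Server: socket() -> bind() (“well-known” port) -> listen() -> accept() (block until connection) -> read() -> write() -> read() (end-of-file) -> close()

Client: socket() -> connect() (“handshake” with the server) -> write() (data: request) -> read() (data: reply) -> close()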

TCP Client

sd = socket (family, type, protocol);
– family: PF_INET, PF_INET6, PF_UNIX, PF_X25
– type: STREAM, DGRAM, RAW
– protocol: 0, used by RAW socket

connect (sd, server_addr, addr_len);
– server_addr holds the family, the server port # and the server IP address
– the local end gets an ephemeral port and an IP address (chosen by routing)
– three way handshaking

read (sd, buff, mbytes);

write (sd, buff, mbytes);

close (sd);
– disconnect sequence

CONNECT actions
1. socket is valid
2. fill remote endpoint addr/port
3. choose local endpoint addr/port
4. initiate 3-way handshaking

TCP Server

sd = socket (family, type, protocol);

bind (sd, *server_addr, len);
– server_addr holds the family, the well-known port # and INADDR_ANY

listen (sd, backlog);
1. Turn sd from active to passive
2. Queue length

ssd = accept (sd, *cliaddr, *len);
– sd is the LISTEN SOCKET; ssd is the CONNECT SOCKET (same bound port #)
– three way handshaking

read (ssd, *buff, mbytes);

write (ssd, *buff, mbytes);

close (ssd);
– disconnect sequence
– closes socket for R/W, non-blocking, attempts to send unsent data
– socket option SO_LINGER: block until data sent

socket() Create a socket

• family is one of
– PF_INET (IPv4), PF_INET6 (IPv6), PF_LOCAL (local Unix),
– PF_ROUTE (access to routing tables), PF_KEY (encryption)
• type is one of
– SOCK_STREAM (TCP), SOCK_DGRAM (UDP)
– SOCK_RAW (for special IP packets, PING, etc. Must be root)
• protocol is 0 (used for some raw socket options)
• upon success returns socket descriptor
– Integer, like file descriptor
– Return -1 if failure

int socket(int family, int type, int protocol);

connect()Connect to server

• sockfd is socket descriptor from socket()
• servaddr is a pointer to a structure with:
– port number and IP address
– must be specified (unlike bind())
• addrlen is length of structure
• client doesn’t need bind()
– OS will pick ephemeral port
• returns 0 if ok, -1 on error

int connect(int sockfd, const struct sockaddr *servaddr, socklen_t addrlen);

bind() Assign a local protocol address (“name”) to a socket

• sockfd is socket descriptor from socket()
• myaddr is a pointer to address struct with:
– port number and IP address
– if port is 0, then
  • host will pick ephemeral port (very rare for server)
  • How do you know assigned port number?
– if IP address is wildcard: INADDR_ANY (multiple net cards)
  • host kernel will choose IP address
  • INADDR_ constants are defined in <netinet/in.h>
  • INADDR_ constants are in host byte order => htonl(INADDR_ANY)
• addrlen is length of structure
• returns 0 if ok, -1 on error
– EADDRINUSE (“Address already in use”)

int bind(int sockfd, const struct sockaddr *myaddr, socklen_t addrlen);

bind() address and port

Process specifies                Result
IP address       port
wildcard         0               kernel chooses IP addr and port
wildcard         nonzero         kernel chooses IP, process specifies port
local IP addr    0               process specifies IP, kernel chooses port
local IP addr    nonzero         process specifies IP and port

Wildcard specified as INADDR_ANY

listen()Change socket state to TCP server

• Sockets default to active (for a client)
– change to passive so OS will accept connection
• sockfd is socket descriptor from socket()
• backlog is maximum number of connections that the server should queue for this socket
– historically 5
– rarely above 15, even on a moderately busy Web server!

int listen(int sockfd, int backlog);

listen()

• Possibility of SYN flooding attack

accept() Return next completed connection

• sockfd is socket descriptor from socket()
• cliaddr and addrlen return protocol address from client
• returns brand new descriptor, created by OS
• if used with fork(), can create concurrent server

int accept(int sockfd, struct sockaddr *cliaddr, socklen_t *addrlen);

read() and write()

ssize_t read (int sockfd, void *buff, size_t mbytes);
ssize_t write (int sockfd, const void *buff, size_t mbytes);

• Reading and writing bytes (TCP is a byte stream, so a call may transfer fewer bytes than requested)
• Both are system calls
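Because write() on a socket may accept only part of the buffer, callers usually loop. The following is a minimal sketch of such a loop; the name write_all is an assumption for this example (it plays the same role as the Writen wrapper used in the later code on these slides, but is not that function).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all n bytes are accepted.
 * Returns n on success, -1 on error (errno is set by write). */
ssize_t write_all(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    size_t left = n;

    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w <= 0) {
            if (w < 0 && errno == EINTR)
                continue;              /* interrupted by a signal: retry */
            return -1;
        }
        p += w;                        /* short write: advance and go again */
        left -= (size_t) w;
    }
    return (ssize_t) n;
}
```

The same pattern applies to read() when a fixed number of bytes is expected.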

close() Close socket for use

• sockfd is socket descriptor from socket()
• closes socket for reading/writing
– returns (doesn’t block)
– attempts to send any unsent data
– socket option SO_LINGER
  • block until data sent
  • or discard any remaining data
– Returns -1 if error

int close(int sockfd);

Descriptor Reference Counts

• For every socket a reference count is maintained, as to how many processes are accessing that socket

• When close() is called on socket descriptor reference count is decreased by 1

• When close() is called on a socket descriptor, the TCP four-segment termination sequence is initiated only if the reference count goes to zero

getsockname() and getpeername() Functions

• getsockname returns the local endpoint address associated with a socket

• getpeername returns the foreign protocol address associated with a socket

#include <sys/socket.h>
int getsockname(int sockfd, struct sockaddr *localaddr, socklen_t *addrlen);
int getpeername(int sockfd, struct sockaddr *peeraddr, socklen_t *addrlen);

getsockname()

• In a TCP client that does not call bind, getsockname returns the local IP address and local port number assigned to the connection by the kernel.

• After calling bind with a port number of 0, getsockname returns the local port number that was assigned.

• getsockname can be called to obtain the address family of a socket

• In a TCP server that binds the wildcard IP address, once a connection is established with a client (accept returns successfully), the server can call getsockname to obtain the local IP address assigned to the connection.
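The "bind with port 0, then ask" case can be sketched as follows. The helper name is hypothetical, and the loopback address is chosen only to keep the example local; the calls themselves are the ones described above.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a TCP socket to the loopback address with port 0,
 * then ask the kernel which port it picked. Returns the descriptor, or -1. */
int bind_ephemeral(uint16_t *port_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                  /* 0: kernel chooses the port */

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *) &addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);          /* convert back to host order */
    return fd;
}
```

The caller owns the returned descriptor and should close() it when done.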

getpeername()

• When a server is execed by the process that calls accept, the only way the server can obtain the identity of the client is to call getpeername

• inetd server works by execing the respective server’s image

getpeername() : inetd

TCP Echo Client

/* Capitalized calls (Socket, Connect, ...) are error-checking wrapper functions. */
int
main(int argc, char **argv)
{
    int sockfd;
    struct sockaddr_in servaddr;

    if (argc != 2)
        err_quit("usage: tcpcli <IPaddress>");

    sockfd = Socket(PF_INET, SOCK_STREAM, 0);

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(SERV_PORT);
    Inet_pton(AF_INET, argv[1], &servaddr.sin_addr);

    Connect(sockfd, (SA *) &servaddr, sizeof(servaddr));

    str_cli(stdin, sockfd);
    exit(0);
}

str_cli function

void
str_cli(FILE *fp, int sockfd)
{
    char sendline[MAXLINE], recvline[MAXLINE];
    ssize_t n;

    while (Fgets(sendline, MAXLINE, fp) != NULL) {

        Write(sockfd, sendline, strlen(sendline));

        if ( (n = Read(sockfd, recvline, MAXLINE - 1)) == 0)
            err_quit("str_cli: server terminated prematurely");
        recvline[n] = '\0';    /* Read does not null-terminate the data */

        Fputs(recvline, stdout);
    }
}

TCP Concurrent Server

int
main(int argc, char **argv)
{
    int listenfd, connfd;
    pid_t childpid;
    socklen_t clilen;
    struct sockaddr_in cliaddr, servaddr;

    listenfd = Socket(AF_INET, SOCK_STREAM, 0);

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(SERV_PORT);

    Bind(listenfd, (SA *) &servaddr, sizeof(servaddr));

    Listen(listenfd, LISTENQ);

    for ( ; ; ) {
        clilen = sizeof(cliaddr);
        connfd = Accept(listenfd, (SA *) &cliaddr, &clilen);

        if ( (childpid = Fork()) == 0) {    /* child process */
            Close(listenfd);                /* close listening socket */
            str_echo(connfd);               /* process the request */
            exit(0);
        }
        Close(connfd);                      /* parent closes connected socket */
    }
}

str_echo function

void
str_echo(int sockfd)
{
    ssize_t n;
    char buf[MAXLINE];

again:
    while ( (n = read(sockfd, buf, MAXLINE)) > 0)
        Write(sockfd, buf, n);

    if (n < 0 && errno == EINTR)
        goto again;                 /* interrupted by a signal: retry */
    else if (n < 0)
        err_sys("str_echo: read error");
}

TCP Concurrent Server

• Handling zombies
– while ( (pid = waitpid(-1, &stat, WNOHANG)) > 0) in SIGCHLD signal handler
• Handling interrupted system calls
– when writing network programs that catch signals, we must be cognizant of interrupted system calls, and we must handle them
– A slow system call is any system call that can block forever

Handling interrupted system calls

for ( ; ; ) {
    clilen = sizeof (cliaddr);
    if ( (connfd = accept (listenfd, (SA *) &cliaddr, &clilen)) < 0) {
        if (errno == EINTR)
            continue;               /* back to for () */
        else
            err_sys ("accept error");
    }
    /* ... */
}

Connection Abort before accept Returns

• SVR4 and POSIX return an error of EPROTO or ECONNABORTED

• Berkeley-derived kernels never return any error

Termination of Server Process

• FIN is sent to client
• Client TCP sends ACK to server
• What if the client application doesn’t take note of it, and sends data to the server?

SIGPIPE Signal

• When a process writes to a socket that has received an RST, the SIGPIPE signal is sent to the process. The default action of this signal is to terminate the process, so the process must catch the signal to avoid being involuntarily terminated.

Crashing of Server Host

• Nothing is sent to client
• Client will try to reach the host, but will get errors such as ETIMEDOUT, EHOSTUNREACH, ENETUNREACH

Crashing and Rebooting of Server Host

• When client sends packets, server will respond with RST

Shutdown of Server Host

• Init sends SIGTERM to all processes
• Then sends SIGKILL to all processes
• FIN is sent to the client

I/O Multiplexing

• We often need to be able to monitor multiple descriptors:
– a generic TCP client (like telnet)
– need to be able to handle unexpected situations, perhaps a server that shuts down without warning.
– A server that handles both TCP and UDP

Example - generic TCP client

• Input from standard input should be sent to a TCP socket.

• Input from a TCP socket should be sent to standard output.

• How do we know when to check for input from each source?

Generic TCP Client

[Figure: the client process sits between STDIN/STDOUT on one side and the TCP socket on the other.]

Different Solutions

• Use nonblocking I/O.
– use fcntl() to set O_NONBLOCK
• Use alarm and signal handler to interrupt slow system calls.
• Use multiple processes/threads.
• Use functions that support checking of multiple input sources at the same time.

Non blocking I/O

• use fcntl() to set O_NONBLOCK:

int flags;
flags = fcntl(sock, F_GETFL, 0);
fcntl(sock, F_SETFL, flags | O_NONBLOCK);

• Now calls to read() (and other system calls) that would otherwise block will return an error and set errno to EWOULDBLOCK.

while (!done) {
    if ( (n = read(STDIN_FILENO, …)) < 0) {
        if (errno != EWOULDBLOCK)
            /* ERROR */ ;
    } else
        write(tcpsock, …);

    if ( (n = read(tcpsock, …)) < 0) {
        if (errno != EWOULDBLOCK)
            /* ERROR */ ;
    } else
        write(STDOUT_FILENO, …);
}
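The fcntl() step and the EWOULDBLOCK check can be packaged as two small helpers. This is a sketch only: both function names and the -2 "no data yet" sentinel are assumptions made for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: switch a descriptor to nonblocking mode.
 * Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: returns bytes read, 0 on EOF, -1 on a real error,
 * or -2 (made-up sentinel) when no data is available yet. */
ssize_t read_if_ready(int fd, void *buf, size_t n)
{
    ssize_t r = read(fd, buf, n);

    if (r < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
        return -2;                 /* would have blocked: try again later */
    return r;
}
```

EAGAIN is checked alongside EWOULDBLOCK because some systems report the same condition under that name.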

The problem with nonblocking I/O

• Using blocking I/O allows the Operating System to put your program to sleep when nothing is happening (no input). Once input arrives the OS will wake up your program and read() (or whatever) will return.

• With nonblocking I/O the process will waste processor time in a busy-wait

Using alarms

signal(SIGALRM, sig_alrm);
alarm(MAX_TIME);
read(STDIN_FILENO, …);
...

signal(SIGALRM, sig_alrm);
alarm(MAX_TIME);
read(tcpsock, …);
...

Alarming Problem

• What will be the effect on response time ?

• What is the ‘right’ value for MAX_TIME?

Select()

• The select() system call allows us to use blocking I/O on a set of descriptors (file, socket, …).

• For example, we can ask select to notify us when data is available for reading on either STDIN or a TCP socket.

I/O Models

• Blocking
• Non-Blocking
• I/O Multiplexing
• Signal-driven I/O
• Asynchronous I/O

IO Models

• Two phases
– Waiting for the data
– Copying the data

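Blocking I/O

[Figure: blocking I/O model]
• Application: calls recvfrom (system call).
• Kernel, wait for data: no datagram ready, then datagram ready.
• Kernel, copy data from kernel to user: copy datagram, then copy complete.
• recvfrom returns OK and the application processes the datagram.
• The process blocks in the call to recvfrom during both phases.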

Nonblocking I/O

[Figure: nonblocking I/O model]
• Application: calls recvfrom (system call); while no datagram is ready the kernel returns EWOULDBLOCK immediately.
• The process repeatedly calls recvfrom, waiting for an OK return (polling).
• Kernel, wait for data: once the datagram is ready, the next recvfrom copies the data from kernel to user.
• recvfrom returns OK and the application processes the datagram.

I/O multiplexing (select and poll)

[Figure: I/O multiplexing model]
• Application: calls select (system call). The process blocks in the call to select, waiting for one of possibly many sockets to become readable.
• Kernel, wait for data: no datagram ready, then datagram ready; select returns readable.
• Application: calls recvfrom (system call). The process blocks while the data is copied into the application buffer.
• Kernel, copy data from kernel to user: copy datagram, then copy complete.
• recvfrom returns OK and the application processes the datagram.

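Signal-driven I/O (SIGIO)

[Figure: signal-driven I/O model]
• Application: establishes the SIGIO signal handler (sigaction system call), which returns immediately; the process continues executing.
• Kernel, wait for data: when the datagram is ready, the kernel delivers SIGIO.
• Signal handler: calls recvfrom (system call). The process blocks while the data is copied into the application buffer.
• Kernel, copy data from kernel to user: copy datagram, then copy complete.
• recvfrom returns OK and the application processes the datagram.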

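Asynchronous I/O

[Figure: asynchronous I/O model]
• Application: calls aio_read (system call), which returns immediately even though no datagram is ready; the process continues executing.
• Kernel, wait for data: datagram ready.
• Kernel, copy data from kernel to user: copy datagram, then copy complete.
• The kernel delivers the signal specified in aio_read; the signal handler processes the datagram.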

Comparison of the I/O Models

Model                  Phase 1: wait for data            Phase 2: copy data from kernel to user
blocking               initiate, blocked                 blocked, complete
nonblocking            check, check, check, ...          blocked, complete
I/O multiplexing       check, blocked, ready             initiate, blocked, complete
signal-driven I/O      notification                      initiate, blocked, complete
asynchronous I/O       initiate                          notification

• First four models: 1st phase handled differently, 2nd phase handled the same
• Asynchronous I/O: handles both phases

select()

int select(int maxfdp1, fd_set *readset, fd_set *writeset, fd_set *exceptset, struct timeval *timeout);

maxfdp1: highest number assigned to a descriptor, plus one.
readset: set of descriptors we want to read from.
writeset: set of descriptors we want to write to.
exceptset: set of descriptors to watch for exceptions.
timeout: maximum time select should wait

struct timeval

struct timeval {
    long tv_sec;     /* seconds */
    long tv_usec;    /* microseconds */
};

struct timeval max = {1,0};

Condition of select function

• Wait forever: return only when a descriptor is ready (timeout = NULL)
• Wait up to a fixed amount of time: return when a descriptor is ready, but no later than the time given in the timeval
• Do not wait at all: return immediately after checking the descriptors (timeval = 0)
• A wait is normally interrupted if the process catches a signal and returns from the signal handler

• readset => descriptors to check for readability
• writeset => descriptors to check for writability
• exceptset => descriptors to check for two exception conditions:
– arrival of out-of-band data for a socket
– the presence of control status information to be read from the master side of a pseudo terminal

Select Function

Descriptor sets

• fd_set: an array of integers, with each bit in each integer corresponding to a descriptor.

• void FD_ZERO(fd_set *fdset);           /* clear all bits in fdset */
• void FD_SET(int fd, fd_set *fdset);    /* turn on the bit for fd in fdset */
• void FD_CLR(int fd, fd_set *fdset);    /* turn off the bit for fd in fdset */
• int FD_ISSET(int fd, fd_set *fdset);   /* is the bit for fd on in fdset? */

Example of Descriptor sets function

fd_set rset;

FD_ZERO(&rset);      /* all bits off: initialize */
FD_SET(1, &rset);    /* turn on bit for fd 1 */
FD_SET(4, &rset);    /* turn on bit for fd 4 */
FD_SET(5, &rset);    /* turn on bit for fd 5 */

Maxfdp1

• specifies the number of descriptors to be tested.
• Its value is the maximum descriptor to be tested, plus one
– (example: fd 1, 2, 5 => maxfdp1: 6)
• The constant FD_SETSIZE, defined by including <sys/select.h>, is the number of descriptors in the fd_set datatype (1024)
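Building a descriptor set and computing maxfdp1 usually go together. A small sketch, assuming a made-up helper name:

```c
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper: put every descriptor from fds[] into *set and
 * return the value to pass as select()'s first argument (max fd + 1). */
int build_read_set(const int *fds, size_t count, fd_set *set)
{
    int maxfd = -1;
    size_t i;

    FD_ZERO(set);
    for (i = 0; i < count; i++) {
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE)
            continue;              /* outside what an fd_set can represent */
        FD_SET(fds[i], set);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    return maxfd + 1;
}
```

For the descriptors 1, 2 and 5 this returns 6, matching the example above.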

When is the descriptor ready for reading?

• The number of bytes of data in the socket receive buffer is greater than or equal to the current size of the low-water mark for the socket receive buffer. SO_RCVLOWAT socket option. It defaults to 1 for TCP and UDP sockets

• The read half of the connection is closed (i.e., a TCP connection that has received a FIN)

• The socket is a listening socket and the number of completed connections is nonzero.

• A socket error is pending. A read operation on the socket will not block and will return an error (–1) with errno set to the specific error condition.

– These pending errors can also be fetched and cleared by calling getsockopt and specifying the SO_ERROR socket option.

When is the socket ready for writing?

• The number of bytes of available space in the socket send buffer is greater than or equal to the current size of the low-water mark for the socket send buffer, and either the socket is connected or the socket does not require a connection (e.g., UDP)

• The write half of the connection is closed. A write operation on the socket will generate SIGPIPE

• A socket using a non-blocking connect has completed the connection, or the connect has failed

• A socket error is pending. A write operation on the socket will not block and will return an error (–1) with errno set to the specific error condition.

– These pending errors can also be fetched and cleared by calling getsockopt with the SO_ERROR socket option.

When is the socket descriptor returned in exception list?

• A socket has an exception condition pending if there is out-of-band data for the socket

• or the socket is still at the out-of-band mark

Conditions that cause a socket to be ready for select

Condition                                     Readable?   Writable?   Exception?
Data to read                                  •
Read half of the connection closed            •
New connection ready for listening socket     •
Space available for writing                               •
Write half of the connection closed                       •
Pending error                                 •           •
TCP out-of-band data                                                  •

Conditions handled by select in str_cli

[Figure: the client calls select() for readability on either standard input or the socket. Standard input delivers data or EOF. From TCP, the socket delivers data, a FIN (seen as EOF) or an RST (seen as an error).]

Three conditions are handled with the socket

• Peer TCP sends data: the socket becomes readable and read returns a value greater than 0

• Peer TCP sends a FIN (peer process terminates): the socket becomes readable and read returns 0 (end-of-file)

• Peer TCP sends an RST (peer host has crashed and rebooted): the socket becomes readable, read returns -1 and errno contains the specific error code

Implementation of str_cli function using select

void
str_cli(FILE *fp, int sockfd)
{
    int maxfdp1;
    fd_set rset;
    char sendline[MAXLINE], recvline[MAXLINE];

    FD_ZERO(&rset);
    for ( ; ; ) {
        FD_SET(fileno(fp), &rset);
        FD_SET(sockfd, &rset);
        maxfdp1 = max(fileno(fp), sockfd) + 1;

        Select(maxfdp1, &rset, NULL, NULL, NULL);

        if (FD_ISSET(sockfd, &rset)) {      /* socket is readable */
            if (Readline(sockfd, recvline, MAXLINE) == 0)
                err_quit("str_cli: server terminated prematurely");
            Fputs(recvline, stdout);
        }

        if (FD_ISSET(fileno(fp), &rset)) {  /* input is readable */
            if (Fgets(sendline, MAXLINE, fp) == NULL)
                return;                     /* all done */
            Writen(sockfd, sendline, strlen(sendline));
        }
    }    /* for */
}    /* str_cli */

Stop and wait: sends a line to the server and then waits for the reply

[Figure: timeline from time 0 to time 7 showing a single request travelling from client to server, followed by the reply travelling back from server to client.]

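Batch input

[Figure: the pipe between client and server kept full]
Time 7: requests 8, 7, 6, 5 are in flight to the server; replies 1, 2, 3, 4 are in flight to the client
Time 8: requests 9, 8, 7, 6 are in flight to the server; replies 2, 3, 4, 5 are in flight to the client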

Handling batch input

• The problem with our revised str_cli function
– After the handling of an end-of-file on input, the function returns to the main function, that is, the program is terminated.
– However, in batch mode, there are still other requests and replies in the pipe.
• We need a way to close one half of the TCP connection
– send a FIN to the server, telling it we have finished sending data, but leave the socket descriptor open for reading <= shutdown function

Shutdown function

• Close one half of the TCP connection
• close function:
– decrements the descriptor’s reference count and closes the socket only if the count reaches 0; it then terminates both directions of data transfer (reading and writing)
• shutdown function closes just one of them (reading or writing)

Calling shutdown to close half of a TCP connection

[Figure: the client issues write, write, shutdown; its data and FIN travel to the server, whose reads return > 0, > 0 and then 0. The server sends an ACK of data and FIN, then issues write, write, close; its data and FIN travel back to the client, whose reads return > 0, > 0 and then 0. The client sends an ACK of data and FIN.]

Shutdown function

#include <sys/socket.h>
int shutdown(int sockfd, int howto);    /* return: 0 if OK, -1 on error */

• howto argument
– SHUT_RD: read half of the connection closed. No more reads can be issued
– SHUT_WR: write half of the connection closed. Also called half-close. Buffered data will be sent, followed by the termination sequence.
– SHUT_RDWR: both closed
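A half-close is a one-line call, sketched below with a hypothetical wrapper name. After it returns, the descriptor can still be read until the peer sends its own FIN.

```c
#include <sys/socket.h>

/* Hypothetical helper: tell the peer we are done sending, keep reading.
 * Returns 0 on success, -1 on error. */
int finish_sending(int sockfd)
{
    /* SHUT_WR closes only the write half; on TCP, buffered data is sent
     * first and then the FIN follows. */
    return shutdown(sockfd, SHUT_WR);
}
```

The effect can be observed without a network by using a connected pair of local stream sockets from socketpair(): after finish_sending() on one end, a read on the other end returns 0, while data written by that other end still arrives.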

str_cli function using select and shutdown

#include "unp.h"

void
str_cli(FILE *fp, int sockfd)
{
    int maxfdp1, stdineof;
    fd_set rset;
    char sendline[MAXLINE], recvline[MAXLINE];

    stdineof = 0;
    FD_ZERO(&rset);
    for ( ; ; ) {
        if (stdineof == 0)    /* select on standard input for readability */
            FD_SET(fileno(fp), &rset);
        FD_SET(sockfd, &rset);
        maxfdp1 = max(fileno(fp), sockfd) + 1;
        Select(maxfdp1, &rset, NULL, NULL, NULL);

        if (FD_ISSET(sockfd, &rset)) {      /* socket is readable */
            if (Readline(sockfd, recvline, MAXLINE) == 0) {
                if (stdineof == 1)
                    return;                 /* normal termination */
                else
                    err_quit("str_cli: server terminated prematurely");
            }
            Fputs(recvline, stdout);
        }

        if (FD_ISSET(fileno(fp), &rset)) {  /* input is readable */
            if (Fgets(sendline, MAXLINE, fp) == NULL) {
                stdineof = 1;
                Shutdown(sockfd, SHUT_WR);  /* send FIN */
                FD_CLR(fileno(fp), &rset);
                continue;
            }
            Writen(sockfd, sendline, strlen(sendline));
        }
    }
}

TCP echo server

• Single process server that uses select to handle any number of clients, instead of forking one child per client.

Data structure TCP server (1): before first client has established a connection

client[]: [0] = -1, [1] = -1, [2] = -1, ..., [FD_SETSIZE - 1] = -1
rset: fd0 = 0, fd1 = 0, fd2 = 0, fd3 = 1
maxfd + 1 = 4

fd 0 (stdin), 1 (stdout), 2 (stderr)
fd 3 => listening socket fd

Data structure TCP server (2): after first client connection is established

client[]: [0] = 4, [1] = -1, [2] = -1, ..., [FD_SETSIZE - 1] = -1
rset: fd0 = 0, fd1 = 0, fd2 = 0, fd3 = 1, fd4 = 1
maxfd + 1 = 5

* fd3 => listening socket fd
* fd4 => client socket fd

Data structure TCP server (3): after second client connection is established

client[]: [0] = 4, [1] = 5, [2] = -1, ..., [FD_SETSIZE - 1] = -1
rset: fd0 = 0, fd1 = 0, fd2 = 0, fd3 = 1, fd4 = 1, fd5 = 1
maxfd + 1 = 6

* fd3 => listening socket fd
* fd4 => client1 socket fd
* fd5 => client2 socket fd

Data structure TCP server (4): after first client terminates its connection

client[]: [0] = -1, [1] = 5, [2] = -1, ..., [FD_SETSIZE - 1] = -1
rset: fd0 = 0, fd1 = 0, fd2 = 0, fd3 = 1, fd4 = 0, fd5 = 1
maxfd + 1 = 6

* fd3 => listening socket fd
* fd4 => client1 socket fd deleted
* fd5 => client2 socket fd
* maxfd does not change

TCP echo server using single process

#include "unp.h"

int
main(int argc, char **argv)
{
    int i, maxi, maxfd, listenfd, connfd, sockfd;
    int nready, client[FD_SETSIZE];
    ssize_t n;
    fd_set rset, allset;
    char line[MAXLINE];
    socklen_t clilen;
    struct sockaddr_in cliaddr, servaddr;

    listenfd = Socket(AF_INET, SOCK_STREAM, 0);

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(SERV_PORT);

    Bind(listenfd, (SA *) &servaddr, sizeof(servaddr));
    Listen(listenfd, LISTENQ);

    maxfd = listenfd;       /* initialize */
    maxi = -1;              /* index into client[] array */
    for (i = 0; i < FD_SETSIZE; i++)
        client[i] = -1;     /* -1 indicates available entry */
    FD_ZERO(&allset);
    FD_SET(listenfd, &allset);

    for ( ; ; ) {
        rset = allset;      /* structure assignment */
        nready = Select(maxfd + 1, &rset, NULL, NULL, NULL);

        if (FD_ISSET(listenfd, &rset)) {    /* new client connection */
            clilen = sizeof(cliaddr);
            connfd = Accept(listenfd, (SA *) &cliaddr, &clilen);

            for (i = 0; i < FD_SETSIZE; i++)
                if (client[i] < 0) {
                    client[i] = connfd;     /* save descriptor */
                    break;
                }
            if (i == FD_SETSIZE)
                err_quit("too many clients");

            FD_SET(connfd, &allset);        /* add new descriptor to set */
            if (connfd > maxfd)
                maxfd = connfd;             /* maxfd for select */
            if (i > maxi)
                maxi = i;                   /* max index in client[] array */

            if (--nready <= 0)
                continue;                   /* no more readable descriptors */
        }

        for (i = 0; i <= maxi; i++) {       /* check all clients for data */
            if ( (sockfd = client[i]) < 0)
                continue;
            if (FD_ISSET(sockfd, &rset)) {
                if ( (n = Readline(sockfd, line, MAXLINE)) == 0) {
                    /* connection closed by client */
                    Close(sockfd);
                    FD_CLR(sockfd, &allset);
                    client[i] = -1;
                } else
                    Writen(sockfd, line, n);

                if (--nready <= 0)
                    break;                  /* no more readable descriptors */
            }
        }
    }
}

Denial of service attacks

• A malicious client connects to the server, sends 1 byte of data (other than a newline), and then goes to sleep.

=> The server calls readline and blocks waiting for the rest of the line, so it cannot service any other client.

Denial of service attacks

• Solutions
– use nonblocking I/O
– have each client serviced by a separate thread of control (spawn a process or a thread to service each client)
– place a timeout on the I/O operation

pselect function

#include <sys/select.h>
#include <signal.h>
#include <time.h>

int pselect(int maxfdp1, fd_set *readset, fd_set *writeset, fd_set *exceptset,
            const struct timespec *timeout, const sigset_t *sigmask);

pselect function was invented by Posix.1g.

pselect function

• struct timespec {
      time_t tv_sec;    /* seconds */
      long   tv_nsec;   /* nanoseconds */
  };
• sigmask => pointer to a signal mask.

Name and Address Conversions

DNS

RFC 1034RFC 1035

Hierarchical Namespace

Naming Authorities

DNS Record Types

Types

Sample DNS Records

aix      IN  A     192.168.42.2
         IN  AAAA  3ffe:b80:1f8d:2:204:acff:fe17:bf38
         IN  MX    5 aix.unpbook.com.
         IN  MX    10 mailhost.unpbook.com.
aix-4    IN  A     192.168.42.2
aix-6    IN  AAAA  3ffe:b80:1f8d:2:204:acff:fe17:bf38
aix-611  IN  AAAA  fe80::204:acff:fe17:bf38

Resolvers and Name Servers


DNS library functions

gethostbyname

gethostbyaddr

getservbyname

getservbyport

getaddrinfo


gethostbyname

struct hostent *gethostbyname( const char *hostname);

struct hostent is defined in netdb.h:

#include <netdb.h>


struct hostent

struct hostent {
    char  *h_name;        /* official name (canonical) */
    char **h_aliases;     /* other names */
    int    h_addrtype;    /* AF_INET or AF_INET6 */
    int    h_length;      /* address length (4 or 16) */
    char **h_addr_list;   /* array of ptrs to addresses */
};

gethostbyname and errors

• On error, gethostbyname returns NULL.
• gethostbyname sets the global variable h_errno to indicate the exact error:
– HOST_NOT_FOUND
– TRY_AGAIN
– NO_RECOVERY
– NO_DATA
– NO_ADDRESS

Sample code using gethostbyname()

char            *ptr, **pptr;
char             str[INET_ADDRSTRLEN];
struct hostent  *hptr;

while (--argc > 0) {
    ptr = *++argv;
    if ( (hptr = gethostbyname(ptr)) == NULL) {
        err_msg("gethostbyname error for host: %s: %s",
                ptr, hstrerror(h_errno));
        continue;
    }
    printf("official hostname: %s\n", hptr->h_name);

    for (pptr = hptr->h_aliases; *pptr != NULL; pptr++)
        printf("\talias: %s\n", *pptr);

    switch (hptr->h_addrtype) {
    case AF_INET:
        pptr = hptr->h_addr_list;
        for ( ; *pptr != NULL; pptr++)
            printf("\taddress: %s\n",
                   Inet_ntop(hptr->h_addrtype, *pptr, str, sizeof(str)));
        break;

    default:
        err_ret("unknown address type");
        break;
    }
}

gethostbyaddr

• #include <netdb.h>
  struct hostent *gethostbyaddr (const char *addr, socklen_t len, int family);
• The addr argument is not a char*, but is really a pointer to an in_addr structure containing the IPv4 address. len is the size of this structure: 4 for an IPv4 address. The family argument is AF_INET.
• The function gethostbyaddr takes a binary IPv4 address and tries to find the hostname corresponding to that address. This is the reverse of gethostbyname.

getservbyname and getservbyport

• Services are often known by names.
• The mapping from the name to the port number is contained in a file (normally /etc/services).
• If the port number changes, all we need to modify is one line in the /etc/services file instead of having to recompile the applications.

getservbyname

• #include <netdb.h>
  struct servent *getservbyname (const char *servname, const char *protoname);

  struct servent {
      char  *s_name;      /* official service name */
      char **s_aliases;   /* alias list */
      int    s_port;      /* port number, network-byte order */
      char  *s_proto;     /* protocol to use */
  };

• The service name servname must be specified. If a protocol is also specified (protoname is a non-null pointer), then the entry must also have a matching protocol. Some Internet services are provided using either TCP or UDP.

Usage of getservbyname

struct servent *sptr;

sptr = getservbyname("domain", "udp");  /* DNS using UDP */
sptr = getservbyname("ftp", "tcp");     /* FTP using TCP */
sptr = getservbyname("ftp", NULL);      /* FTP using TCP */
sptr = getservbyname("ftp", "udp");     /* this call will fail */

/etc/services file

• freebsd % grep -e ^ftp -e ^domain /etc/services

ftp-data    20/tcp    #File Transfer [Default Data]
ftp         21/tcp    #File Transfer [Control]
domain      53/tcp    #Domain Name Server
domain      53/udp    #Domain Name Server
ftp-agent   574/tcp   #FTP Software Agent System
ftp-agent   574/udp   #FTP Software Agent System
ftps-data   989/tcp   # ftp protocol, data, over TLS/SSL
ftps        990/tcp   # ftp protocol, control, over TLS/SSL

getservbyport

• looks up a service given its port number and an optional protocol
• usage

struct servent *sptr;

sptr = getservbyport (htons (53), "udp");  /* DNS using UDP */
sptr = getservbyport (htons (21), "tcp");  /* FTP using TCP */
sptr = getservbyport (htons (21), NULL);   /* FTP using TCP */
sptr = getservbyport (htons (21), "udp");  /* this call will fail */

getaddrinfo

• The gethostbyname and gethostbyaddr functions only support IPv4; getaddrinfo is protocol-independent.
• handles both
– name-to-address translation
– service-to-port translation
• returns
– sockaddr structures instead of a list of addresses
• hides all the protocol dependencies
• The application deals only with the socket address structures that are filled in by getaddrinfo

getaddrinfo

• #include <netdb.h>
  int getaddrinfo (const char *hostname, const char *service,
                   const struct addrinfo *hints, struct addrinfo **result);

  struct addrinfo {
      int              ai_flags;      /* AI_PASSIVE, AI_CANONNAME */
      int              ai_family;     /* AF_xxx */
      int              ai_socktype;   /* SOCK_xxx */
      int              ai_protocol;   /* 0 or IPPROTO_xxx for IPv4 and IPv6 */
      socklen_t        ai_addrlen;    /* length of ai_addr */
      char            *ai_canonname;  /* ptr to canonical name for host */
      struct sockaddr *ai_addr;       /* ptr to socket address structure */
      struct addrinfo *ai_next;       /* ptr to next structure in linked list */
  };

Hints structure

• hints is either a null pointer or a pointer to an addrinfo structure that the caller fills in with hints about the types of information the caller wants returned.

• The members of the hints structure that can be set by the caller are:
– ai_flags (zero or more AI_XXX values OR'ed together)
– ai_family (an AF_xxx value)
– ai_socktype (a SOCK_xxx value)
– ai_protocol
• For example, if the specified service is provided for both TCP and UDP, set the ai_socktype member of the hints structure to SOCK_DGRAM. The only information returned will be for datagram sockets.

ai_flags

• AI_PASSIVE: The caller will use the socket for a passive open.
• AI_CANONNAME: Tells the function to return the canonical name of the host.
• AI_NUMERICHOST: Prevents any kind of name-to-address mapping; the hostname argument must be an address string.
• AI_NUMERICSERV: Prevents any kind of name-to-service mapping; the service argument must be a decimal port number string.

ai_flags

• AI_V4MAPPED: If specified along with an ai_family of AF_INET6, then returns IPv4-mapped IPv6 addresses corresponding to A records if there are no available AAAA records.
• AI_ALL: If specified along with AI_V4MAPPED, then returns IPv4-mapped IPv6 addresses in addition to any AAAA records belonging to the name.
• AI_ADDRCONFIG: Only looks up addresses for a given IP version if there is at least one interface that is not a loopback interface configured with an IP address of that version.

Result

• linked list of addrinfo structures, linked through the ai_next pointer.

• There are two ways that multiple structures can be returned:
– Multiple IP addresses per hostname; one sockaddr structure for each IP address
– Service is provided for multiple socket types; SOCK_STREAM or SOCK_DGRAM

Usage

• The sockaddr structure in each addrinfo structure is ready for
– a call to socket
– then either a call to connect or sendto (for a client), or bind (for a server).
• The arguments to socket are the members ai_family, ai_socktype, and ai_protocol.
• The second and third arguments to either connect or bind are ai_addr and ai_addrlen.

Usage

struct addrinfo hints, *res;

bzero(&hints, sizeof(hints));
hints.ai_flags = AI_CANONNAME;
hints.ai_family = AF_INET;

getaddrinfo("freebsd4", "domain", &hints, &res);
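A minimal sketch of the full call sequence (fill in hints, call getaddrinfo, walk the ai_next list, release with freeaddrinfo) might look like the following. The helper name count_addrinfo is an assumption for illustration. It sets AI_NUMERICHOST and AI_NUMERICSERV so that no DNS or /etc/services lookup is needed.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: count the addrinfo structures returned for a
 * numeric host/port pair. Returns the count, or -1 if getaddrinfo fails. */
int count_addrinfo(const char *host, const char *serv, int socktype)
{
    struct addrinfo hints, *res, *ai;
    int n = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;  /* no name lookups */
    hints.ai_family = AF_UNSPEC;                       /* IPv4 or IPv6 */
    hints.ai_socktype = socktype;                      /* SOCK_xxx or 0 */

    if (getaddrinfo(host, serv, &hints, &res) != 0)
        return -1;      /* gai_strerror() would describe the error code */

    for (ai = res; ai != NULL; ai = ai->ai_next)
        n++;            /* ai->ai_addr is ready for connect, sendto or bind */

    freeaddrinfo(res);  /* return the dynamically allocated list */
    return n;
}
```

Passing SOCK_DGRAM as socktype restricts the result to datagram sockets, as described for the hints structure.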

Passive sockets

• For a passive socket, the caller specifies the service but not the hostname, and specifies the AI_PASSIVE flag in the hints structure.

• The socket address structures returned should contain an IP address of INADDR_ANY (for IPv4) or IN6ADDR_ANY_INIT (for IPv6).

Errors: gai_strerror

• const char *gai_strerror (int error);

freeaddrinfo

• All storage returned by getaddrinfo (the addrinfo structures, the ai_addr structures, and the ai_canonname string) is obtained dynamically (e.g., from malloc).
• This storage is returned by calling freeaddrinfo:
• void freeaddrinfo (struct addrinfo *ai);

getnameinfo function

• Takes a socket address and returns a character string describing the host and another character string describing the service

int getnameinfo(const struct sockaddr *sockaddr, socklen_t addrlen,
                char *host, size_t hostlen,
                char *serv, size_t servlen, int flags);

Elementary UDP Socket

Contents
• recvfrom and sendto Functions
• UDP Echo Server (main, dg_echo Function)
• UDP Echo Client (main, dg_cli Function)
• Lost Datagrams
• Verifying Received Response
• Server Not Running
• connect Function with UDP
• Lack of Flow Control with UDP
• Determining Outgoing Interface with UDP
• TCP and UDP Echo Server Using select

UDP

• Connectionless, unreliable, datagram protocol
• Popular applications that use UDP:
– DNS (the Domain Name System)
– NFS (the Network File System)
– SNMP (Simple Network Management Protocol)

Figure: Socket functions for UDP client-server.
UDP server: socket(), bind(), recvfrom() (blocks until a datagram is received from a client), process request, sendto().
UDP client: socket(), sendto() (data: request), recvfrom() (data: reply), close().

recvfrom and sendto functions

#include <sys/socket.h>

ssize_t recvfrom(int sockfd, void *buff, size_t nbyte, int flag,
                 struct sockaddr *from, socklen_t *addrlen);

ssize_t sendto(int sockfd, const void *buff, size_t nbyte, int flag,
               const struct sockaddr *to, socklen_t addrlen);

Both return: number of bytes read or written if OK, -1 on error

Sending UDP Datagrams

ssize_t sendto(int sockfd, const void *buff, size_t nbytes, int flags,
               const struct sockaddr *to, socklen_t addrlen);

• sockfd is a UDP socket
• buff is the address of the data (nbytes long)
• to is the address of a sockaddr containing the destination address
• Return value is the number of bytes sent, or -1 on error.

sendto()

• You can send 0 bytes of data!
• Some possible errors:
– EBADF, ENOTSOCK: bad socket descriptor
– EFAULT: bad buffer address
– EMSGSIZE: message too large
– ENOBUFS: system buffers are full

More sendto()

• The return value of sendto() indicates how much data was accepted by the O.S. for sending as a datagram - not how much data made it to the destination.

• There is no error condition that indicates that the destination did not get the data!!!

Receiving UDP Datagrams

ssize_t recvfrom(int sockfd, void *buff, size_t nbytes, int flags,
                 struct sockaddr *from, socklen_t *fromaddrlen);

• sockfd is a UDP socket
• buff is the address of a buffer (nbytes long)
• from is the address of a sockaddr
• Return value is the number of bytes received and put into buff, or -1 on error.

recvfrom()

• If buff is not large enough, any extra data is lost forever...
• You can receive 0 bytes of data!
• The sockaddr at from is filled in with the address of the sender.
• You should set fromaddrlen before calling.
• If from and fromaddrlen are NULL we don't find out who sent the data.
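The two calls can be tried together on the loopback interface. The sketch below is an assumed demo (the name udp_loopback is hypothetical): it binds one UDP socket to an ephemeral loopback port, sends a datagram to it from a second socket, and receives it with recvfrom.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: send one datagram over loopback and receive it.
 * Returns the number of bytes received, or -1 on any error. */
ssize_t udp_loopback(const char *msg, char *buf, size_t buflen)
{
    struct sockaddr_in addr, from;
    socklen_t len = sizeof(addr), fromlen = sizeof(from);
    ssize_t n = -1;
    int rfd, sfd;

    rfd = socket(AF_INET, SOCK_DGRAM, 0);   /* receiving socket */
    sfd = socket(AF_INET, SOCK_DGRAM, 0);   /* sending socket */
    if (rfd < 0 || sfd < 0)
        goto done;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);               /* kernel picks the port */
    if (bind(rfd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        getsockname(rfd, (struct sockaddr *) &addr, &len) < 0)
        goto done;

    if (sendto(sfd, msg, strlen(msg), 0,
               (struct sockaddr *) &addr, sizeof(addr)) < 0)
        goto done;

    /* fromlen is a value-result argument: it is set before the call */
    n = recvfrom(rfd, buf, buflen, 0, (struct sockaddr *) &from, &fromlen);

done:
    if (rfd >= 0)
        close(rfd);
    if (sfd >= 0)
        close(sfd);
    return n;
}
```

After the call, from holds the sender's address and ephemeral port. The recvfrom here has no timeout, so the demo relies on loopback delivery.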

More recvfrom()

• Same errors as sendto, but also:– EINTR: System call interrupted by signal.

• Unless you do something special - recvfrom doesn’t return until there is a datagram available.

Figure: Summary of TCP client-server with two clients. The listening server forks one server child per connection; each client talks to its own server child over its own TCP connection.

Figure: Summary of UDP client-server with two clients. Both clients send datagrams to a single server, which has one socket receive buffer.

UDP Echo client: main Function

#include "unp.h"

int main(int argc, char **argv)
{
    int                 sockfd;
    struct sockaddr_in  servaddr;

    if (argc != 2)
        err_quit("usage: udpcli <IPaddress>");

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(SERV_PORT);
    Inet_pton(AF_INET, argv[1], &servaddr.sin_addr);

    sockfd = Socket(AF_INET, SOCK_DGRAM, 0);

    dg_cli(stdin, sockfd, (SA *) &servaddr, sizeof(servaddr));

    exit(0);
}

UDP Echo Client: dg_cli Function

#include "unp.h"

void dg_cli(FILE *fp, int sockfd, const SA *pservaddr, socklen_t servlen)
{
    int   n;
    char  sendline[MAXLINE], recvline[MAXLINE + 1];

    while (Fgets(sendline, MAXLINE, fp) != NULL) {
        Sendto(sockfd, sendline, strlen(sendline), 0, pservaddr, servlen);

        n = Recvfrom(sockfd, recvline, MAXLINE, 0, NULL, NULL);

        recvline[n] = 0;    /* null terminate */
        Fputs(recvline, stdout);
    }
}

dg_cli function: client processing loop

Lost Datagrams

If a client datagram is lost, the client blocks forever in its call to recvfrom. If the client datagram arrives at the server but the server's reply is lost, the client will again block forever in its call to recvfrom.

The only way to prevent this is to place a timeout on the recvfrom.

Verify Received Response

#include "unp.h"

void dg_cli(FILE *fp, int sockfd, const SA *pservaddr, socklen_t servlen)
{
    int               n;
    char              sendline[MAXLINE], recvline[MAXLINE + 1];
    socklen_t         len;
    struct sockaddr  *preply_addr;

    preply_addr = Malloc(servlen);

    while (Fgets(sendline, MAXLINE, fp) != NULL) {
        Sendto(sockfd, sendline, strlen(sendline), 0, pservaddr, servlen);

        len = servlen;
        n = Recvfrom(sockfd, recvline, MAXLINE, 0, preply_addr, &len);
        if (len != servlen || memcmp(pservaddr, preply_addr, len) != 0) {
            printf("reply from %s (ignored)\n", Sock_ntop(preply_addr, len));
            continue;
        }

        recvline[n] = 0;    /* null terminate */
        Fputs(recvline, stdout);
    }
}

If the server has not bound an IP address to its socket, the kernel chooses the source address for the IP datagram. It is chosen to be the primary IP address of the outgoing interface.

Verify Received Response

Server Not Running

The client blocks forever in the call to recvfrom. The ICMP error is an asynchronous error. The basic rule is that asynchronous errors are not returned for UDP sockets unless the socket has been connected.

connect Function with UDP

This does not result in anything like a TCP connection: there is no three-way handshake. Instead, the kernel just records the IP address and port number of the peer.

With a connected UDP socket, three things change:
1. We can no longer specify the destination IP address and port for an output operation. That is, we do not use sendto but use write or send instead.
2. We do not use recvfrom but read or recv instead.
3. Asynchronous errors are returned to the process for a connected UDP socket.
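A minimal sketch of creating a connected UDP socket follows. The helper name udp_connect_v4 is hypothetical and handles only a dotted-decimal IPv4 address. The connect call sends nothing; it only records the peer.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket and connect it to an IPv4
 * address and port. No packet is sent; the kernel only records the peer.
 * Returns the socket descriptor, or -1 on error. */
int udp_connect_v4(const char *dotted, unsigned short port)
{
    struct sockaddr_in peer;
    int fd;

    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    if (inet_pton(AF_INET, dotted, &peer.sin_addr) != 1)
        return -1;

    if ((fd = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
        return -1;
    if (connect(fd, (struct sockaddr *) &peer, sizeof(peer)) < 0) {
        close(fd);
        return -1;
    }
    return fd;      /* use write/send and read/recv from here on */
}
```

The recorded peer can be read back with getpeername, which fails on an unconnected UDP socket.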

Figure: connect Function with UDP. The connected UDP socket stores the peer IP address and port number from connect. UDP datagrams are exchanged between the application and its peer; a UDP datagram from some other IP address and/or port number is marked "???" in the figure.

Lack of Flow Control with UDP

#include "unp.h"

#define NDG     2000
#define DGLEN   1400

void dg_cli(FILE *fp, int sockfd, const SA *pservaddr, socklen_t servlen)
{
    int   i;
    char  sendline[MAXLINE];

    for (i = 0; i < NDG; i++) {
        Sendto(sockfd, sendline, DGLEN, 0, pservaddr, servlen);
    }
}

dg_cli function that writes a fixed number of datagrams to the server

#include "unp.h"

static void recvfrom_int(int);
static int  count;

void dg_echo(int sockfd, SA *pcliaddr, socklen_t clilen)
{
    socklen_t  len;
    char       mesg[MAXLINE];

    Signal(SIGINT, recvfrom_int);

    for ( ; ; ) {
        len = clilen;
        Recvfrom(sockfd, mesg, MAXLINE, 0, pcliaddr, &len);
        count++;
    }
}

static void recvfrom_int(int signo)
{
    printf("\nreceived %d datagrams\n", count);
    exit(0);
}

Lack of Flow Control with UDP

Some datagrams may be lost before they reach the server's socket: the interface's buffers were full, or they could have been discarded by the sending host.

The counter "dropped due to full socket buffers" indicates how many datagrams were received by UDP but were discarded because the receiving socket's receive queue was full.

The number of datagrams received by the server in this example is nondeterministic. It depends on many factors, such as the network load, the processing load on the client host, and the processing load in the server host.

Solution: fast server, slow client. Increase the size of the socket receive buffer.

Lack of Flow Control with UDP

TCP and UDP Echo Server Using select

#include "unp.h"

int main(int argc, char **argv)
{
    int                 listenfd, connfd, udpfd, nready, maxfdp1;
    char                mesg[MAXLINE];
    pid_t               childpid;
    fd_set              rset;
    ssize_t             n;
    socklen_t           len;
    const int           on = 1;
    struct sockaddr_in  cliaddr, servaddr;
    void                sig_chld(int);

    /* Create listening TCP socket */
    listenfd = Socket(AF_INET, SOCK_STREAM, 0);

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(SERV_PORT);

    Setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
    Bind(listenfd, (SA *) &servaddr, sizeof(servaddr));

    Listen(listenfd, LISTENQ);

    /* Create UDP socket */
    udpfd = Socket(AF_INET, SOCK_DGRAM, 0);

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(SERV_PORT);

    Bind(udpfd, (SA *) &servaddr, sizeof(servaddr));

TCP and UDP Echo Server Using select

    Signal(SIGCHLD, sig_chld);      /* must call waitpid() */

    FD_ZERO(&rset);
    maxfdp1 = max(listenfd, udpfd) + 1;
    for ( ; ; ) {
        FD_SET(listenfd, &rset);
        FD_SET(udpfd, &rset);
        if ( (nready = select(maxfdp1, &rset, NULL, NULL, NULL)) < 0) {
            if (errno == EINTR)
                continue;
            else
                err_sys("select error");
        }

        if (FD_ISSET(listenfd, &rset)) {
            len = sizeof(cliaddr);
            connfd = Accept(listenfd, (SA *) &cliaddr, &len);

            if ( (childpid = fork()) == 0) {    /* child process */
                Close(listenfd);    /* close listening socket */
                str_echo(connfd);   /* process the request */
                exit(0);
            }
            Close(connfd);
        }

TCP and UDP Echo Server Using select

        if (FD_ISSET(udpfd, &rset)) {
            len = sizeof(cliaddr);
            n = Recvfrom(udpfd, mesg, MAXLINE, 0, (SA *) &cliaddr, &len);

            Sendto(udpfd, mesg, n, 0, (SA *) &cliaddr, len);
        }
    }   /* for */
}   /* main */

TCP and UDP Echo Server Using select

Advanced UDP Sockets

When to use UDP instead of TCP?

• Advantages of UDP:
– UDP supports broadcasting and multicasting
– UDP has no connection setup or teardown
• For a two-packet request-reply, TCP needs 8 extra packets to be transmitted
• UDP: RTT + SPT, TCP: 2 * RTT + SPT

When to use UDP instead of TCP?

• Features of TCP not provided by UDP:
– Positive acknowledgments, retransmission of lost packets, duplicate detection, and sequencing of packets reordered by the network
• Sequence numbers, estimated RTO
– Windowed flow control
– Slow start and congestion avoidance
• to determine the current network capacity and to handle periods of congestion

When to use UDP instead of TCP?

• Recommendations:
– UDP must be used for broadcast and multicast applications
• Error control or reliability must be added at the application layer if required
– UDP can be used for simple request-reply applications, but error detection must be built into the application
• Acknowledgments, timeouts, retransmissions
– UDP should not be used for bulk data transfer
• Bulk transfer requires flow control along with error control, which is like replicating TCP at the application layer

Adding Reliability to a UDP Application

• UDP for a request-reply application
– Timeout and retransmission to handle datagrams that are discarded
– Sequence numbers so the client can verify that a reply is for the appropriate request
• Examples which use simple request-reply with reliability:
– DNS resolvers, SNMP agents, TFTP, and RPC

Handling Timeout and Retransmission

• Old-fashioned approach: send a request and wait N seconds for the reply before retransmitting (a linear retransmit timer)

• RTT on a network can vary from fractions of a second on a LAN to many seconds on a WAN.

• Factors affecting the RTT are distance, network speed, and congestion

• Timeout should take into account the actual RTTs that we measure along with the changes in the RTT over time

Retransmission Timeout (RTO) Jacobson's algorithm

• two statistical estimators: srtt is the smoothed RTT estimator and rttvar is the smoothed mean deviation estimator

RTO

• When the retransmission timer expires, an exponential backoff must be used for the next RTO
– For example, if our first RTO is 2 seconds and the reply is not received in this time, then the next RTO is 4 seconds. If there is still no reply, the next RTO is 8 seconds, and then 16, and so on.
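The two estimators and the backoff rule can be sketched as below. The slides do not give the constants, so the gains of 1/8 and 1/4, the factor of 4 on rttvar, the initial values, and the 60-second cap are assumptions based on commonly used values for Jacobson's algorithm. The names rtt_est and rtt_est_* are hypothetical.

```c
/* Hypothetical estimator state; all times are in seconds. */
struct rtt_est {
    double srtt;    /* smoothed RTT estimator */
    double rttvar;  /* smoothed mean deviation estimator */
    double rto;     /* current retransmission timeout */
};

#define RTO_MAX 60.0    /* assumed upper bound on the RTO */

/* Start with no RTT history: srtt = 0, rttvar = 0.75, so RTO = 3 s. */
void rtt_est_init(struct rtt_est *e)
{
    e->srtt = 0.0;
    e->rttvar = 0.75;
    e->rto = e->srtt + 4.0 * e->rttvar;
}

/* Fold one measured RTT into the estimators and recompute the RTO. */
void rtt_est_update(struct rtt_est *e, double measured)
{
    double delta = measured - e->srtt;

    e->srtt += delta / 8.0;                     /* gain 1/8 */
    if (delta < 0.0)
        delta = -delta;
    e->rttvar += (delta - e->rttvar) / 4.0;     /* gain 1/4 */
    e->rto = e->srtt + 4.0 * e->rttvar;
}

/* Retransmission timer expired: double the RTO, up to RTO_MAX. */
void rtt_est_backoff(struct rtt_est *e)
{
    e->rto *= 2.0;
    if (e->rto > RTO_MAX)
        e->rto = RTO_MAX;
}
```

A client would call rtt_est_update after each reply to a request that was sent only once, and rtt_est_backoff each time the timer expires before a reply arrives.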

Retransmission ambiguity problem

• Jacobson's algorithms tell us how to calculate the RTO each time we measure an RTT and how to increase the RTO when we retransmit.

• But, a problem arises when we have to retransmit a packet and then receive a reply. This is called the retransmission ambiguity problem

Retransmission ambiguity problem

Retransmission ambiguity problem: Karn's Algorithm

• The following rules apply whenever a reply is received for a request that was retransmitted:
– If an RTT was measured, do not use it to update the estimators, since we do not know to which request the reply corresponds.
– Since this reply arrived before our retransmission timer expired, reuse this RTO for the next packet. Only when we receive a reply to a request that is not retransmitted will we update the RTT estimators and recalculate the RTO.

Concurrent UDP Servers

• Two different types of servers:
– First is a simple UDP server that reads a client request, sends a reply, and is then finished with the client
• fork a child and let it handle the request
– Second is a UDP server that exchanges multiple datagrams with the client
• Create a new socket for each client, bind an ephemeral port to that socket, and use that socket for all its replies.
• The client looks at the port number of the server's first reply and sends subsequent datagrams for this request to that port.

Concurrency in UDP server that exchanges multiple datagrams with the client

Socket Options

abstraction

• Introduction
• getsockopt and setsockopt functions
• socket state
• Generic socket options
• IPv4 socket options
• ICMPv6 socket options
• IPv6 socket options
• TCP socket options
• fcntl function

Introduction

• Three ways to get and set the options that affect a socket
– getsockopt, setsockopt functions => IPv4 and IPv6 multicasting options
– fcntl function => nonblocking I/O, signal-driven I/O
– ioctl function => chapter 16

getsockopt and setsockopt function

#include <sys/socket.h>
int getsockopt(int sockfd, int level, int optname, void *optval, socklen_t *optlen);
int setsockopt(int sockfd, int level, int optname, const void *optval, socklen_t optlen);

• sockfd => open socket descriptor
• level => code in the system that interprets the option (generic, IPv4, IPv6, TCP)
• optname => the option to fetch or set
• optval => pointer to a variable from which the new value of the option is fetched by setsockopt, or into which the current value of the option is stored by getsockopt
• optlen => the size of the option variable
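As a small illustration of both calls, the sketch below fetches SO_TYPE with getsockopt and sets SO_REUSEADDR with setsockopt. The helper names socket_type and set_reuseaddr are hypothetical.

```c
#include <sys/socket.h>

/* Hypothetical helper: fetch the SO_TYPE option of a socket.
 * Returns SOCK_STREAM, SOCK_DGRAM, and so on, or -1 on error. */
int socket_type(int sockfd)
{
    int type;
    socklen_t len = sizeof(type);   /* value-result: size in, size out */

    if (getsockopt(sockfd, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
        return -1;
    return type;
}

/* Hypothetical helper: turn SO_REUSEADDR on (nonzero) or off (0).
 * Returns 0 if OK, -1 on error. */
int set_reuseaddr(int sockfd, int on)
{
    return setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
}
```

Note the asymmetry: getsockopt takes a pointer to the length so the kernel can report how much it stored, while setsockopt takes the length by value.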

Generic socket option

• SO_BROADCAST => enables or disables the ability of the process to send broadcast messages (datagram sockets only: Ethernet, token ring, ...)
• SO_DEBUG => the kernel keeps track of detailed information about all packets sent or received by TCP (only supported by TCP)
• SO_DONTROUTE => outgoing packets are to bypass the normal routing mechanisms of the underlying protocol
• SO_ERROR => when an error occurs on a socket, the protocol module in a Berkeley-derived kernel sets a variable named so_error for that socket. The process can obtain the value of so_error by fetching the SO_ERROR socket option.
• SO_KEEPALIVE => if no data is exchanged for 2 hours, TCP automatically sends a keepalive probe to the peer.
– Peer response
• ACK (everything OK)
• RST (peer crashed and rebooted): ECONNRESET
• no response: ETIMEDOUT => socket closed
– example: Rlogin, Telnet, ...
– Normally used by servers

SO_KEEPALIVE

SO_LINGER

• SO_LINGER => specifies how the close function operates for a connection-oriented protocol (default: close returns immediately)
– struct linger {
      int l_onoff;    /* 0 = off, nonzero = on */
      int l_linger;   /* linger time, in seconds */
  };
• l_onoff = 0: the option is turned off and l_linger is ignored
• l_onoff = nonzero and l_linger = 0: TCP aborts the connection (sends RST) and discards any data remaining in the send buffer
• l_onoff = nonzero and l_linger = nonzero: the process waits until the remaining data is sent, or until the linger time expires. If the socket has been set nonblocking, it will not wait for the close to complete, even if the linger time is nonzero.
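Setting and reading back the option might be sketched as follows. The helper names set_linger and get_linger are hypothetical.

```c
#include <sys/socket.h>

/* Hypothetical helper: configure SO_LINGER on a socket.
 * on == 0 restores the default; on != 0 with seconds == 0 requests an
 * abortive close (RST); on != 0 with seconds > 0 makes close() wait. */
int set_linger(int sockfd, int on, int seconds)
{
    struct linger lg;

    lg.l_onoff = on;
    lg.l_linger = seconds;
    return setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}

/* Hypothetical helper: read the option back into *lg. Returns 0 or -1. */
int get_linger(int sockfd, struct linger *lg)
{
    socklen_t len = sizeof(*lg);

    return getsockopt(sockfd, SOL_SOCKET, SO_LINGER, lg, &len);
}
```

The option value is a struct linger rather than an int, so optlen must be sizeof(struct linger).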

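SO_LINGER

Figure: Default operation of close: it returns immediately. Client: write, then close, and close returns at once. The data and FIN are queued by TCP and sent to the server, which acknowledges the data and FIN. The server application reads the queued data and FIN, then calls close, which sends the server's FIN.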

SO_LINGER

Figure: Close with SO_LINGER socket option set and l_linger a positive value. Client: write, then close. The data and FIN are queued by TCP and sent to the server; close returns only after the ACK of the data and FIN arrives. The server application reads the queued data and FIN, then calls close, which sends the server's FIN.

SO_LINGER

Figure: Using shutdown to know that peer has received our data. Client: write, then shutdown, then read, which blocks. The data and FIN are queued by TCP and sent to the server, which acknowledges them. The server application reads the queued data and FIN, then calls close; when the server's FIN arrives, the client's read returns 0.

• Another way to know that the peer application has read the data
– use an application-level acknowledgment (application ACK)
– client

    char ack;

    Write(sockfd, data, nbytes);    /* data from client to server */
    n = Read(sockfd, &ack, 1);      /* wait for application-level ACK */

– server

    nbytes = Read(sockfd, buff, sizeof(buff));  /* data from client */
    /* server verifies it received the correct amount of data from the client */
    Write(sockfd, "", 1);           /* server's ACK back to client */

SO_RCVBUF , SO_SNDBUF

• Let us change the default send-buffer and receive-buffer sizes.
– Default TCP send and receive buffer size:
• 4096 bytes
• 8192-61440 bytes
– Default UDP buffer size: 9000 bytes, 40000 bytes
• The SO_RCVBUF option must be set before the connection is established.
– For a client, it should be set before calling connect()
– For a server, it should be set before calling listen()
• The TCP socket buffer size should be at least three times the MSS

SO_RCVLOWAT , SO_SNDLOWAT

• Every socket has a receive low-water mark and a send low-water mark (used by the select function).
• Receive low-water mark:
– the amount of data that must be in the socket receive buffer for select to return "readable"
– Default receive low-water mark: 1 for TCP and UDP
• Send low-water mark:
– the amount of available space that must exist in the socket send buffer for select to return "writable"
– Default send low-water mark: 2048 for TCP
– For UDP, the available space in the send buffer never changes, because UDP does not keep a copy of the datagrams that are sent.

SO_RCVTIMEO, SO_SNDTIMEO

• allow us to place a timeout on socket receives and sends.

• Default disabled

SO_REUSEADDR, SO_REUSEPORT

• Allow a listening server to start and bind its well-known port even if previously established connections exist that use this port as their local port.
• Allow multiple instances of the same server to be started on the same port, as long as each instance binds a different local IP address.
• Allow a single process to bind the same port to multiple sockets, as long as each bind specifies a different local IP address.
• Allow completely duplicate bindings: multicasting

SO_TYPE

• Returns the socket type.
• The returned value is one such as SOCK_STREAM, SOCK_DGRAM, ...

SO_USELOOPBACK

• This option applies only to sockets in the routing domain(AF_ROUTE).

• The socket receives a copy of everything sent on the socket.

IPv4 socket option

• Level => IPPROTO_IP
• IP_HDRINCL => If this option is set for a raw IP socket, we must build our own IP header for all the datagrams that we send on the raw socket.

IPv4 socket option

• IP_OPTIONS=>allows us to set IP option in IPv4 header.(chapter 24)

• IP_RECVDSTADDR=>This socket option causes the destination IP address of a received UDP datagram to be returned as ancillary data by recvmsg.(chapter20)

IP_RECVIF

• Cause the index of the interface on which a UDP datagram is received to be returned as ancillary data by recvmsg.(chapter20)

IP_TOS

• lets us set the type-of-service(TOS) field in IP header for a TCP or UDP socket.

• If we call getsockopt for this option, the current value that would be placed into the TOS(type of service) field in the IP header is returned

IP_TTL

• We can set and fetch the default TTL(time to live field).

ICMPv6 socket option

• This socket option is processed by ICMPv6 and has a level of IPPROTO_ICMPV6.
• ICMP6_FILTER => lets us fetch and set an icmp6_filter structure that specifies which of the 256 possible ICMPv6 message types are passed to the process on a raw socket. (chapter 25)

IPv6 socket option

• These socket options are processed by IPv6 and have a level of IPPROTO_IPV6.

• IPV6_ADDRFORM=>allow a socket to be converted from IPv4 to IPv6 or vice versa.(chapter 10)

• IPV6_CHECKSUM=>specifies the byte offset into the user data of where the checksum field is located.

IPV6_DSTOPTS

• Specifies that any received IPv6 destination options are to be returned as ancillary data by recvmsg.

IPV6_HOPLIMIT

• Setting this option specifies that the received hop limit field be returned as ancillary data by recvmsg.(chapter 20)

• Default off.

IPV6_HOPOPTS

• Setting this option specifies that any received IPv6 hop-by-hop option are to be returned as ancillary data by recvmsg.(chapter 24)

IPV6_NEXTHOP

• This is not a socket option but the type of an ancillary data object that can be specified to sendmsg. This object specifies the next-hop address for a datagram as a socket address structure.(chapter20)

IPV6_PKTINFO

• Setting this option specifies that the following two pieces of information about a received IPv6 datagram are to be returned as ancillary data by recvmsg: the destination IPv6 address and the arriving interface index. (chapter 20)

IPV6_PKTOPTIONS

• Most of the IPv6 socket options assume a UDP socket with the information being passed between the kernel and the application using ancillary data with recvmsg and sendmsg.

• A TCP socket fetches and stores these values using the IPV6_PKTOPTIONS socket option.

IPV6_RTHDR

• Setting this option specifies that a received IPv6 routing header is to be returned as ancillary data by recvmsg.(chapter 24)

• Default off

IPV6_UNICAST_HOPS

• This is similar to the IPv4 IP_TTL.
• Setting the socket option specifies the default hop limit for outgoing datagrams sent on the socket, while fetching the socket option returns the value for the hop limit that the kernel will use for the socket.

TCP socket option

• There are five socket options for TCP, but three are new with Posix.1g and not widely supported.

• Specify the level as IPPROTO_TCP.

TCP_KEEPALIVE

• This is new with Posix.1g.
• It specifies the idle time in seconds for the connection before TCP starts sending keepalive probes.
• Default: 2 hours
• This option is effective only when the SO_KEEPALIVE socket option is enabled.

TCP_MAXRT

• This is new with Posix.1g.
• It specifies the amount of time in seconds before a connection is broken once TCP starts retransmitting data.
– 0: use default
– -1: retransmit forever
– positive value: rounded up to next transmission time

TCP_MAXSEG

• This allows us to fetch or set the maximum segment size (MSS) for a TCP connection.

TCP_NODELAY

• This option disables TCP's Nagle algorithm (the algorithm is enabled by default).
• Purpose of the Nagle algorithm: prevent a connection from having multiple small packets outstanding at any time.
• Small packet => any packet smaller than the MSS.
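As a minimal sketch of how this option might be toggled, the two helpers below wrap setsockopt and getsockopt at level IPPROTO_TCP. The function names set_tcp_nodelay and get_tcp_nodelay are hypothetical and not part of the original slides.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: on != 0 disables the Nagle algorithm,
 * on == 0 re-enables it. Returns 0 if OK, -1 on error. */
int set_tcp_nodelay(int sockfd, int on)
{
    int flag = on ? 1 : 0;
    return setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
}

/* Hypothetical helper: returns 1 if TCP_NODELAY is set (Nagle disabled),
 * 0 if it is clear, -1 on error. */
int get_tcp_nodelay(int sockfd)
{
    int flag = 0;
    socklen_t len = sizeof(flag);

    if (getsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &flag, &len) < 0)
        return -1;
    return flag != 0;
}
```

A client that sends many small, latency-sensitive writes might call set_tcp_nodelay(sockfd, 1) right after creating the socket.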

Nagle algorithm

• Enabled by default.
• Reduces the number of small packets on the WAN.
• If a given connection has outstanding data, then no small packets will be sent on the connection until the existing data is acknowledged.

[Figure: timelines (time axis 0 to 2500) of a client sending "hello!". Panels: Nagle algorithm disabled; Nagle algorithm enabled, where the data goes out as the segments h, el, lo, !]

fcntl function

• fcntl stands for "file control".
• This function performs various descriptor control operations.
• It provides the following features:
– nonblocking I/O (chapter 15)
– signal-driven I/O (chapter 22)
– setting the socket owner to receive the SIGIO signal (chapters 21, 22)

#include <fcntl.h>
int fcntl(int fd, int cmd, ... /* int arg */);
Returns: depends on cmd if OK, -1 on error

O_NONBLOCK: nonblocking I/O
O_ASYNC: signal-driven I/O notification

Nonblocking I/O using fcntl

int flags;

/* set socket nonblocking */
if ((flags = fcntl(fd, F_GETFL, 0)) < 0)
    err_sys("F_GETFL error");
flags |= O_NONBLOCK;
if (fcntl(fd, F_SETFL, flags) < 0)
    err_sys("F_SETFL error");

Each descriptor has a set of file flags that are fetched with the F_GETFL command and set with the F_SETFL command.

Misuse of fcntl

/* wrong way to set socket nonblocking */
if (fcntl(fd, F_SETFL, O_NONBLOCK) < 0)
    err_sys("F_SETFL error");

/* wrong because it also clears all the other file status flags */

Turn off the nonblocking flag

flags &= ~O_NONBLOCK;
if (fcntl(fd, F_SETFL, flags) < 0)
    err_sys("F_SETFL error");
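The fetch, modify, and store sequence can be collected into one helper. This is a sketch only; the name set_nonblocking is hypothetical, and it returns -1 instead of calling the err_sys wrapper used in the slides.

```c
#include <fcntl.h>

/* Hypothetical helper: turn O_NONBLOCK on (on != 0) or off (on == 0)
 * while preserving every other file status flag.
 * Returns 0 if OK, -1 on error. */
int set_nonblocking(int fd, int on)
{
    int flags = fcntl(fd, F_GETFL, 0);   /* fetch the current flags */

    if (flags < 0)
        return -1;
    if (on)
        flags |= O_NONBLOCK;             /* set only this bit */
    else
        flags &= ~O_NONBLOCK;            /* clear only this bit */
    return fcntl(fd, F_SETFL, flags) < 0 ? -1 : 0;
}
```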

F_SETOWN

• The integer arg value can be either a positive value (a process ID) or a negative value (whose absolute value is a process group ID), identifying who is to receive the signal.

• F_GETOWN => fcntl returns the socket owner, either a process ID or a process group ID.

Unix Domain Protocols

Chapter 14

Unix domain protocol

contents

• Introduction
• Unix domain socket address structure
• socketpair
• socket function
• Unix domain stream client-server
• Unix domain datagram client-server
• passing descriptors
• receiving sender credentials

Unix Domain Protocol

• Perform client-server communication on a single host using the same API that is used for clients and servers on different hosts.

• Faster than the Internet protocol suite
– Unix domain sockets only copy data; they have no protocol processing to perform, no network headers to add or remove, no checksums to calculate, no sequence numbers to generate, and no acknowledgments to send.

• The Unix domain protocols are an alternative to the interprocess communication (IPC) methods described

Unix Domain Protocol

• Two types of sockets are provided in the Unix domain:
– stream sockets (similar to TCP)
– datagram sockets (similar to UDP)

• The UNIX domain datagram service is reliable, however. Messages are neither lost nor delivered out of order

Unix Domain Protocol

• Unix domain sockets are used for three reasons:
– Unix domain sockets are often twice as fast as a TCP socket when both peers are on the same host.
– They are used when passing descriptors between processes on the same host.
– Unix domain sockets provide the client's credentials (user ID and group IDs) to the server, which can provide additional security checking.

Unix Domain Protocol

• End point address
– Pathnames within the normal filesystem
– The pathname associated with a Unix domain socket should be an absolute pathname

unix domain socket address structure

• <sys/un.h>

struct sockaddr_un {
    uint8_t     sun_len;
    sa_family_t sun_family;     /* AF_LOCAL */
    char        sun_path[104];  /* null-terminated pathname */
};

• sun_path => must be null terminated
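Because sun_path has a fixed size and must be null terminated, a length check before copying the pathname is a reasonable precaution. The sketch below assumes a hypothetical helper named fill_unix_addr; it uses AF_UNIX, the POSIX name for the family these slides call AF_LOCAL.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: initialize *addr for the given pathname.
 * Returns 0 if OK, or -1 if the pathname (plus its terminating
 * null byte) does not fit in sun_path. */
int fill_unix_addr(struct sockaddr_un *addr, const char *path)
{
    size_t len = strlen(path);

    if (len >= sizeof(addr->sun_path))
        return -1;                          /* would not be null terminated */
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;             /* same family as AF_LOCAL */
    memcpy(addr->sun_path, path, len + 1);  /* copy including the null byte */
    return 0;
}
```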

socketpair Function

• Creates two sockets that are then connected together (only available for Unix domain sockets)

• family must be AF_LOCAL
• protocol must be 0

#include <sys/socket.h>
int socketpair(int family, int type, int protocol, int sockfd[2]);
Returns: 0 if OK, -1 on error

socketpair Function

• Although the socketpair function creates sockets that are connected to each other, the individual sockets don't have names.

• This means that they can't be addressed by unrelated processes.
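A small sketch of the connected pair in action: data written to one end can be read from the other without any bind or connect call. The function name socketpair_echo is hypothetical, and AF_UNIX is used as the POSIX name for AF_LOCAL.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: write msg to one end of a socketpair and read it
 * back from the other end. Returns the number of bytes read, or -1. */
ssize_t socketpair_echo(const char *msg, char *out, size_t outlen)
{
    int sv[2];
    ssize_t n = -1;
    size_t len = strlen(msg);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (write(sv[0], msg, len) == (ssize_t)len)
        n = read(sv[1], out, outlen);   /* the two ends are already connected */
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

In practice the two descriptors are usually shared across a fork, with the parent keeping one end and the child the other.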

unix domain stream client-server

#include "unp.h"

int main(int argc, char **argv)
{
    int listenfd, connfd;
    pid_t childpid;
    socklen_t clilen;
    struct sockaddr_un cliaddr, servaddr;
    void sig_chld(int);

    listenfd = Socket(AF_LOCAL, SOCK_STREAM, 0);

    unlink(UNIXSTR_PATH);
    bzero(&servaddr, sizeof(servaddr));
    servaddr.sun_family = AF_LOCAL;
    strcpy(servaddr.sun_path, UNIXSTR_PATH);

    Bind(listenfd, (SA *) &servaddr, sizeof(servaddr));
    Listen(listenfd, LISTENQ);
    Signal(SIGCHLD, sig_chld);

unix domain stream client-server(2)

    for ( ; ; ) {
        clilen = sizeof(cliaddr);
        if ( (connfd = accept(listenfd, (SA *) &cliaddr, &clilen)) < 0) {
            if (errno == EINTR)
                continue;       /* back to for() */
            else
                err_sys("accept error");
        }

        if ( (childpid = Fork()) == 0) {    /* child process */
            Close(listenfd);    /* close listening socket */
            str_echo(connfd);   /* process the request */
            exit(0);
        }
        Close(connfd);          /* parent closes connected socket */
    }
}

passing descriptors

• Current Unix systems provide a way to pass any open descriptor from one process to any other process (using sendmsg).

• The ability to pass an open file descriptor between processes is powerful. It can lead to different ways of designing client-server applications.

• It allows one process (typically a server) to do everything that is required to open a file (involving such details as translating a network name to a network address, dialing a modem, negotiating locks for the file, etc.) and simply pass back to the calling process a descriptor that can be used with all the I/O functions.

• All the details involved in opening the file or device are hidden from the client.

passing descriptors(2)

1. Create a Unix domain socket (stream or datagram).
2. One process opens a descriptor by calling any of the Unix functions that return a descriptor.
3. The sending process builds a msghdr structure containing the descriptor to be passed and calls sendmsg to send it across the Unix domain socket.
4. The receiving process calls recvmsg to receive the descriptor on the Unix domain socket.

Passing a descriptor is not passing a descriptor number; it involves creating a new descriptor in the receiving process that refers to the same file table entry within the kernel as the descriptor that was sent by the sending process.
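The steps above can be sketched with two small helpers. The names send_one_fd and recv_one_fd are hypothetical; the sketch sizes the control buffer with CMSG_SPACE and locates the header with CMSG_FIRSTHDR, which is one common way to build SCM_RIGHTS ancillary data and not necessarily how the code later in these slides does it.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer for exactly one descriptor; the union keeps the
 * buffer suitably aligned for a cmsghdr structure. */
union fd_ctl {
    struct cmsghdr hdr;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Hypothetical helper: send fd_to_send over the Unix domain socket
 * sock along with one byte of ordinary data. Returns 0 or -1. */
int send_one_fd(int sock, int fd_to_send)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cm;

    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;              /* passing access rights */
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor sent by send_one_fd().
 * Returns the new descriptor in this process, or -1 on error. */
int recv_one_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cm;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
        cm->cmsg_type != SCM_RIGHTS || cm->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;                           /* no descriptor arrived */
    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}
```

The descriptor returned by recv_one_fd generally has a different number from the one that was sent, but both refer to the same open file.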

Passing Descriptor

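Descriptor passing example

[Figure: the mycat program after creating a stream pipe using socketpair, holding descriptors [0] and [1].]

[Figure: the mycat program after invoking the openfile program. mycat calls fork and exec (passing command-line arguments) to run openfile, the two share the stream pipe ends [0] and [1], and openfile passes the descriptor back to mycat.]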

recvmsg and sendmsg

#include <sys/socket.h>

ssize_t recvmsg (int sockfd, struct msghdr *msg, int flags);

ssize_t sendmsg (int sockfd, struct msghdr *msg, int flags);

struct msghdr {
    void         *msg_name;        /* protocol address */
    socklen_t     msg_namelen;     /* size of protocol address */
    struct iovec *msg_iov;         /* scatter/gather array */
    size_t        msg_iovlen;      /* # elements in msg_iov */
    void         *msg_control;     /* ancillary data; must be aligned for a cmsghdr structure */
    socklen_t     msg_controllen;  /* length of ancillary data */
    int           msg_flags;       /* flags returned by recvmsg() */
};

recvmsg and sendmsg

[Figure 13.8: Data structures when recvmsg is called for a UDP socket. A msghdr{} with msg_name, msg_namelen = 16, msg_iov, msg_iovlen = 3, msg_control, msg_controllen = 20, and msg_flags = 0 points to an iovec{} array of three buffers whose iov_len values are 100, 60, and 80.]

recvmsg and sendmsg

[Figure 13.9: Update of Figure 13.8 when recvmsg returns. msg_name now points to a sockaddr_in{} containing 16, AF_INET, 2000, 198.69.10.2, and msg_control points to a cmsghdr{} with cmsg_len = 16, cmsg_level = IPPROTO_IP, and cmsg_type = IP_RECVDSTADDR, followed by the address 206.62.226.35.]

Ancillary Data

• Ancillary data can be sent and received using the msg_control and msg_controllen members of the msghdr structure with the sendmsg and recvmsg functions.

Protocol     cmsg_level    cmsg_type       Description
IPv4         IPPROTO_IP    IP_RECVDSTADDR  receive destination address with UDP datagram
                           IP_RECVIF       receive interface index with UDP datagram
IPv6         IPPROTO_IPV6  IPV6_DSTOPTS    specify / receive destination options
                           IPV6_HOPLIMIT   specify / receive hop limit
                           IPV6_HOPOPTS    specify / receive hop-by-hop options
                           IPV6_NEXTHOP    specify next-hop address
                           IPV6_PKTINFO    specify / receive packet information
                           IPV6_RTHDR      specify / receive routing header
Unix domain  SOL_SOCKET    SCM_RIGHTS      send / receive descriptors
                           SCM_CREDS       send / receive user credentials

Ancillary Data

[Figure 13.12: Ancillary data containing two ancillary data objects. msg_control points to the first object and msg_controllen covers both. Each object is a cmsghdr{} (cmsg_len, cmsg_level, cmsg_type) followed by pad, data, and pad. CMSG_LEN() corresponds to the cmsg_len of one object, and CMSG_SPACE() to the full space that one object occupies.]

Ancillary Data

[Figure 13.13: cmsghdr structure when used with Unix domain sockets. For descriptor passing: cmsg_len = 16, cmsg_level = SOL_SOCKET, cmsg_type = SCM_RIGHTS, followed by the descriptor. For credentials: cmsg_level = SOL_SOCKET, cmsg_type = SCM_CREDS, followed by an fcred{} structure.]

Control Message Header

struct cmsghdr {
    socklen_t cmsg_len;    /* data byte count, including header */
    int       cmsg_level;  /* originating protocol */
    int       cmsg_type;   /* protocol-specific type */
    /* followed by the actual control message data */
};

Control Message Header

• To send a file descriptor:
– Set cmsg_len to the size of the cmsghdr structure, plus the size of an integer (the descriptor).
– The cmsg_level field is set to SOL_SOCKET, and cmsg_type is set to SCM_RIGHTS, to indicate that we are passing access rights. (SCM stands for socket-level control message.)
– Access rights can be passed only across a UNIX domain socket. The descriptor is stored right after the cmsg_type field, using the macro CMSG_DATA to obtain the pointer to this integer.

Control Message Header

#include <stdlib.h>      /* malloc */
#include <sys/socket.h>

/* size of control buffer to send/recv one file descriptor */
#define CONTROLLEN  CMSG_LEN(sizeof(int))

static struct cmsghdr *cmptr = NULL;  /* malloc'ed first time */

/*
 * Pass a file descriptor to another process.
 * If fd<0, then -fd is sent back instead as the error status.
 */
int
send_fd(int fd, int fd_to_send)
{
    struct iovec  iov[1];
    struct msghdr msg;
    char          buf[2];  /* send_fd()/recv_fd() 2-byte protocol */

    iov[0].iov_base = buf;
    iov[0].iov_len  = 2;
    msg.msg_iov     = iov;
    msg.msg_iovlen  = 1;
    msg.msg_name    = NULL;
    msg.msg_namelen = 0;

    if (fd_to_send < 0) {
        msg.msg_control    = NULL;
        msg.msg_controllen = 0;
        buf[1] = -fd_to_send;  /* nonzero status means error */
        if (buf[1] == 0)
            buf[1] = 1;
    } else {
        if (cmptr == NULL && (cmptr = malloc(CONTROLLEN)) == NULL)
            return(-1);
        cmptr->cmsg_level  = SOL_SOCKET;
        cmptr->cmsg_type   = SCM_RIGHTS;
        cmptr->cmsg_len    = CONTROLLEN;
        msg.msg_control    = cmptr;
        msg.msg_controllen = CONTROLLEN;
        *(int *)CMSG_DATA(cmptr) = fd_to_send;  /* the fd to pass */
        buf[1] = 0;            /* zero status means OK */
    }
    buf[0] = 0;                /* null byte flag to recv_fd() */
    if (sendmsg(fd, &msg, 0) != 2)
        return(-1);
    return(0);
}

Control Message Header

#include "apue.h"
#include <sys/socket.h>  /* struct msghdr */

/* size of control buffer to send/recv one file descriptor */
#define CONTROLLEN  CMSG_LEN(sizeof(int))

static struct cmsghdr *cmptr = NULL;  /* malloc'ed first time */

/*
 * Receive a file descriptor from a server process.  Also, any data
 * received is passed to (*userfunc)(STDERR_FILENO, buf, nbytes).
 * We have a 2-byte protocol for receiving the fd from send_fd().
 */
int
recv_fd(int fd, ssize_t (*userfunc)(int, const void *, size_t))
{
    int           newfd, nr, status;
    char         *ptr;
    char          buf[MAXLINE];
    struct iovec  iov[1];
    struct msghdr msg;

    status = -1;
    for ( ; ; ) {
        iov[0].iov_base = buf;
        iov[0].iov_len  = sizeof(buf);
        msg.msg_iov     = iov;
        msg.msg_iovlen  = 1;
        msg.msg_name    = NULL;
        msg.msg_namelen = 0;
        if (cmptr == NULL && (cmptr = malloc(CONTROLLEN)) == NULL)
            return(-1);
        msg.msg_control    = cmptr;
        msg.msg_controllen = CONTROLLEN;
        if ((nr = recvmsg(fd, &msg, 0)) < 0) {
            err_sys("recvmsg error");
        } else if (nr == 0) {
            err_ret("connection closed by server");
            return(-1);
        }

        for (ptr = buf; ptr < &buf[nr]; ) {
            if (*ptr++ == 0) {
                if (ptr != &buf[nr-1])
                    err_dump("message format error");
                status = *ptr & 0xFF;  /* prevent sign extension */
                if (status == 0) {
                    if (msg.msg_controllen != CONTROLLEN)
                        err_dump("status = 0 but no fd");
                    newfd = *(int *)CMSG_DATA(cmptr);
                } else {
                    newfd = -status;
                }
                nr -= 2;
            }
        }
        if (nr > 0 && (*userfunc)(STDERR_FILENO, buf, nr) != nr)
            return(-1);
        if (status >= 0)    /* final data has arrived */
            return(newfd);  /* descriptor, or -status */
    }
}


Ancillary Data

#include "unp.h"

int my_open(const char *, int);

int main(int argc, char **argv)
{
    int fd, n;
    char buff[BUFFSIZE];

    if (argc != 2)
        err_quit("usage: mycat <pathname>");

    if ( (fd = my_open(argv[1], O_RDONLY)) < 0)
        err_sys("cannot open %s", argv[1]);

    while ( (n = Read(fd, buff, BUFFSIZE)) > 0)
        Write(STDOUT_FILENO, buff, n);

    exit(0);
}

mycat program (shown in Figure 14.7)

#include "unp.h"

int
my_open(const char *pathname, int mode)
{
    int fd, sockfd[2], status;
    pid_t childpid;
    char c, argsockfd[10], argmode[10];

    Socketpair(AF_LOCAL, SOCK_STREAM, 0, sockfd);

    if ( (childpid = Fork()) == 0) {    /* child process */
        Close(sockfd[0]);
        snprintf(argsockfd, sizeof(argsockfd), "%d", sockfd[1]);
        snprintf(argmode, sizeof(argmode), "%d", mode);
        execl("./openfile", "openfile", argsockfd, pathname, argmode,
              (char *) NULL);
        err_sys("execl error");
    }

myopen function(1) : open a file and return a descriptor

    /* parent process - wait for the child to terminate */
    Close(sockfd[1]);   /* close the end we don't use */

    Waitpid(childpid, &status, 0);
    if (WIFEXITED(status) == 0)
        err_quit("child did not terminate");
    if ( (status = WEXITSTATUS(status)) == 0)
        Read_fd(sockfd[0], &c, 1, &fd);
    else {
        errno = status; /* set errno value from child's status */
        fd = -1;
    }

    Close(sockfd[0]);
    return(fd);
}

myopen function(2) : open a file and return a descriptor

receiving sender credentials

• User credentials via fcred structure

struct fcred {
    uid_t fc_ruid;               /* real user ID */
    gid_t fc_rgid;               /* real group ID */
    char  fc_login[MAXLOGNAME];  /* setlogin() name */
    uid_t fc_uid;                /* effective user ID */
    short fc_ngroups;            /* number of groups */
    gid_t fc_groups[NGROUPS];    /* supplementary group IDs */
};
#define fc_gid fc_groups[0]      /* effective group ID */

receiving sender credentials(2)

• Usually MAXLOGNAME is 16
• NGROUPS is 16
• fc_ngroups is at least 1

• The credentials are sent as ancillary data when data is sent on a Unix domain socket (only if the receiver of the data has enabled the LOCAL_CREDS socket option).

• On a datagram socket, the credentials accompany every datagram.
• Credentials cannot be sent along with a descriptor.
• Users are not able to forge credentials.

Advanced I/O Functions

Outline

• Socket Timeouts• recv and send Functions• readv and writev Functions• recvmsg and sendmsg Function• Ancillary Data• How much Data is Queued?• Sockets and Standard I/O

Socket Timeouts

• Three ways to place a timeout on an I/O operation involving a socket:
– Call alarm, which generates the SIGALRM signal when the specified time has expired.
– Block waiting for I/O in select, which has a time limit built in, instead of blocking in a call to read or write.
– Use the newer SO_RCVTIMEO and SO_SNDTIMEO socket options.

Connect with a Timeout Using SIGALRM

static void connect_alarm(int);

int
connect_timeo(int sockfd, const SA *saptr, socklen_t salen, int nsec)
{
    Sigfunc *sigfunc;
    int n;

    sigfunc = Signal(SIGALRM, connect_alarm);
    if (alarm(nsec) != 0)
        err_msg("connect_timeo: alarm was already set");

    if ( (n = connect(sockfd, (struct sockaddr *) saptr, salen)) < 0) {
        close(sockfd);
        if (errno == EINTR)
            errno = ETIMEDOUT;
    }
    alarm(0);                   /* turn off the alarm */
    Signal(SIGALRM, sigfunc);   /* restore previous signal handler */
    return(n);
}

static void
connect_alarm(int signo)
{
    return;     /* just interrupt the connect() */
}

recvfrom with a Timeout Using SIGALRM

static void sig_alrm(int);

void
dg_cli(FILE *fp, int sockfd, const SA *pservaddr, socklen_t servlen)
{
    int n;
    char sendline[MAXLINE], recvline[MAXLINE + 1];

    Signal(SIGALRM, sig_alrm);

    while (Fgets(sendline, MAXLINE, fp) != NULL) {
        Sendto(sockfd, sendline, strlen(sendline), 0, pservaddr, servlen);

        alarm(5);
        if ( (n = recvfrom(sockfd, recvline, MAXLINE, 0, NULL, NULL)) < 0) {
            if (errno == EINTR)
                fprintf(stderr, "socket timeout\n");
            else
                err_sys("recvfrom error");
        } else {
            alarm(0);
            recvline[n] = 0;    /* null terminate */
            Fputs(recvline, stdout);
        }
    }
}

static void
sig_alrm(int signo)
{
    return;     /* just interrupt the recvfrom() */
}

recvfrom with a Timeout Using select

int
readable_timeo(int fd, int sec)
{
    fd_set rset;
    struct timeval tv;

    FD_ZERO(&rset);
    FD_SET(fd, &rset);

    tv.tv_sec = sec;
    tv.tv_usec = 0;

    return(select(fd+1, &rset, NULL, NULL, &tv));
        /* > 0 if descriptor is readable */
}

Timeout Using the SO_RCVTIMEO and SO_SNDTIMEO Socket Options

• We set the option once for a descriptor, specifying the timeout value, and this timeout then applies to all read operations (SO_RCVTIMEO) or write operations (SO_SNDTIMEO) on that descriptor.

• We set the option only once, compared to the previous two methods, which required doing something before every operation on which we wanted to place a time limit.

• Neither socket option can be used to set a timeout for a connect.

recvfrom with a Timeout Using the SO_RCVTIMEO Socket Option

int n;
char sendline[MAXLINE], recvline[MAXLINE + 1];
struct timeval tv;

tv.tv_sec = 5;
tv.tv_usec = 0;
Setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

while (Fgets(sendline, MAXLINE, fp) != NULL) {
    Sendto(sockfd, sendline, strlen(sendline), 0, pservaddr, servlen);

    n = recvfrom(sockfd, recvline, MAXLINE, 0, NULL, NULL);
    if (n < 0) {
        if (errno == EWOULDBLOCK) {
            fprintf(stderr, "socket timeout\n");
            continue;
        } else
            err_sys("recvfrom error");
    }
    recvline[n] = 0;    /* null terminate */
    Fputs(recvline, stdout);
}

recv and send Functions

#include <sys/socket.h>

ssize_t recv (int sockfd, void *buff, size_t nbytes, int flags);

ssize_t send (int sockfd, const void *buff, size_t nbytes, int flags);

Flags          Description                          recv  send
MSG_DONTROUTE  bypass routing table lookup                 •
MSG_DONTWAIT   only this operation is nonblocking    •     •
MSG_OOB        send or receive out-of-band data      •     •
MSG_PEEK       peek at incoming message              •
MSG_WAITALL    wait for all the data                 •

readv and writev Functions

– readv and writev let us read into or write from one or more buffers with a single function call.

• These operations are called scatter read and gather write.

#include <sys/uio.h>

ssize_t readv (int filedes, const struct iovec *iov, int iovcnt);

ssize_t writev (int filedes, const struct iovec *iov, int iovcnt);

struct iovec {
    void   *iov_base;  /* starting address of buffer */
    size_t  iov_len;   /* size of buffer */
};

readv and writev Functions

– The readv and writev functions can be used with any descriptor, not just sockets.
– writev is an atomic operation. For a record-based protocol such as UDP, one call to writev generates a single UDP datagram.
– One use of writev relates to the TCP_NODELAY socket option.

• A write of 4 bytes followed by a write of 396 bytes could invoke the Nagle algorithm; a preferred solution is to call writev for the two buffers.
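A gather write of a small header followed by a larger body might look like the sketch below. The function name write_header_and_body is hypothetical; the point is that both buffers are handed to the kernel in a single writev call instead of two separate writes.

```c
#include <sys/uio.h>

/* Hypothetical helper: write hdr followed by body with one writev call.
 * Returns the total number of bytes written, or -1 on error. */
ssize_t write_header_and_body(int fd, const void *hdr, size_t hdrlen,
                              const void *body, size_t bodylen)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)hdr;   /* writev does not modify the buffers */
    iov[0].iov_len  = hdrlen;
    iov[1].iov_base = (void *)body;
    iov[1].iov_len  = bodylen;
    return writev(fd, iov, 2);
}
```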

Nagle’s Algorithm

if there is new data to send
    if the window size >= MSS and available data is >= MSS
        send complete MSS segment now
    else
        if there is unconfirmed data still in the pipe
            enqueue data in the buffer until an acknowledge is received
        else
            send data immediately
        end if
    end if
end if
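The decision logic in the pseudocode can be written as a pure function. This is only an illustrative sketch; the enum and the name nagle_decide are hypothetical, and a real TCP implementation tracks considerably more state.

```c
#include <stddef.h>

enum nagle_action {
    NAGLE_NOTHING,            /* no new data to send */
    NAGLE_SEND_FULL_SEGMENT,  /* send a complete MSS-sized segment now */
    NAGLE_QUEUE,              /* hold data until an ACK arrives */
    NAGLE_SEND_NOW            /* send the small segment immediately */
};

/* Hypothetical sketch of the pseudocode above.
 * window:    current send window in bytes
 * mss:       maximum segment size in bytes
 * available: bytes of new data waiting to be sent
 * unacked:   bytes sent but not yet acknowledged */
enum nagle_action nagle_decide(size_t window, size_t mss,
                               size_t available, size_t unacked)
{
    if (available == 0)
        return NAGLE_NOTHING;
    if (window >= mss && available >= mss)
        return NAGLE_SEND_FULL_SEGMENT;
    if (unacked > 0)
        return NAGLE_QUEUE;
    return NAGLE_SEND_NOW;
}
```

With the earlier example of a 4-byte write followed by a 396-byte write, the first write is sent at once (nothing is unacknowledged), and the second is queued while those 4 bytes remain unacknowledged.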


recvmsg and sendmsg

Flag           Examined by:     Examined by:      Returned by:
               send flags       recv flags        recvmsg msg_flags
               sendto flags     recvfrom flags
               sendmsg flags    recvmsg flags
MSG_DONTROUTE       •
MSG_DONTWAIT        •                •
MSG_PEEK                             •
MSG_WAITALL                          •
MSG_EOR             •                                  •
MSG_OOB             •                •                 •
MSG_BCAST                                              •
MSG_MCAST                                              •
MSG_TRUNC                                              •
MSG_CTRUNC                                             •


How Much Data Is Queued?

• nonblocking I/O
• MSG_PEEK with the MSG_DONTWAIT flag
• FIONREAD command of ioctl

Sockets and Standard I/O

• The standard I/O stream can be used with sockets, but there are a few items to consider.

– A standard I/O stream can be created from any descriptor by calling the fdopen function. Similarly, given a standard I/O stream, we can obtain the corresponding descriptor by calling fileno.

– The problem with the fseek, fsetpos, and rewind functions is that they all call lseek, which fails on a socket.

– The easiest way to handle this read-write problem is to open two standard I/O streams for a given socket: one for reading, and one for writing.

Standard I/O buffers

• Fully buffered: I/O takes place only when the buffer is full, or on fflush() or exit(). The buffer size shown on the slide is 8192 bytes.

• Line buffered: I/O takes place when a newline is encountered, or on fflush() or exit().

• Unbuffered: I/O takes place each time a standard I/O output function is called.

Standard I/O buffers

• Standard error is always unbuffered.
• Standard input and standard output are fully buffered, unless they refer to a terminal device, in which case they are line buffered.
• All other streams are fully buffered unless they refer to a terminal device, in which case they are line buffered.

Sockets and Standard I/O

#include "unp.h"

void
str_echo(int sockfd)
{
    char line[MAXLINE];
    FILE *fpin, *fpout;

    fpin = Fdopen(sockfd, "r");
    fpout = Fdopen(sockfd, "w");

    for ( ; ; ) {
        if (Fgets(line, MAXLINE, fpin) == NULL)
            return;     /* connection closed by other end */

        Fputs(line, fpout);
    }
}

Chapter 12.

Daemon Processes and inetd Superserver

12.1 Introduction

• A daemon is a process that runs in the background and is independent of control from all terminals.

• There are numerous ways to start a daemon:
1. the system initialization scripts ( /etc/rc )
2. the inetd superserver
3. the cron daemon
4. the at command
5. from user terminals

• Since a daemon does not have a controlling terminal, it needs some way to output message when something happens, either normal informational messages, or emergency messages that need to be handled by an administrator.

12.2 syslogd daemon

• Berkeley-derived implementations of syslogd perform the following actions upon startup.

1. The configuration file is read, specifying what to do with each type of log message that the daemon can receive.

2. A Unix domain socket is created and bound to the pathname /var/run/log ( /dev/log on some systems).

3. A UDP socket is created and bound to port 514.

4. The pathname /dev/klog is opened. Any error messages from within the kernel appear as input on this device.

• We could send log messages to the syslogd daemon from our daemons by creating a Unix domain datagram socket and sending our messages to the pathname that the daemon has bound, but an easier interface is the syslog function.

syslogd

[Figure: syslogd with its connections: UDP socket (port 514), Unix domain socket (/dev/log), /dev/klog, filesystem (/var/log/messages), remote syslogd, and console.]

12. 3 syslog function

– the priority argument is a combination of a level and a facility.

– The message is like a format string to printf, with the addition of a %m specification, which is replaced with the error message corresponding to the current value of errno.

Ex) syslog(LOG_INFO|LOG_LOCAL2, "rename(%s, %s): %m", file1, file2);

#include <syslog.h>

void syslog(int priority, const char *message, ...);

12. 3 syslog function

• Log messages have a level between 0 and 7.

level        value  description
LOG_EMERG    0      system is unusable (highest priority)
LOG_ALERT    1      action must be taken immediately
LOG_CRIT     2      critical conditions
LOG_ERR      3      error conditions
LOG_WARNING  4      warning conditions
LOG_NOTICE   5      normal but significant condition (default)
LOG_INFO     6      informational
LOG_DEBUG    7      debug-level message (lowest priority)

Figure 12.1 level of log message.

12. 3 syslog function

• A facility to identify the type of process sending the message.

facility      Description
LOG_AUTH      security / authorization messages
LOG_AUTHPRIV  security / authorization messages (private)
LOG_CRON      cron daemon
LOG_DAEMON    system daemons
LOG_FTP       FTP daemon
LOG_KERN      kernel messages
LOG_LOCAL0    local use
LOG_LOCAL1    local use
LOG_LOCAL2    local use
LOG_LOCAL3    local use
LOG_LOCAL4    local use
LOG_LOCAL5    local use
LOG_LOCAL6    local use
LOG_LOCAL7    local use
LOG_LPR       line printer system
LOG_MAIL      mail system
LOG_NEWS      network news system
LOG_SYSLOG    messages generated internally by syslog
LOG_USER      random user-level messages (default)
LOG_UUCP      UUCP system

Figure 12.2 facility of log messages.

12. 3 syslog function

• openlog and closelog
– openlog can be called before the first call to syslog, and closelog can be called when the application is finished sending log messages.

#include <syslog.h>

void openlog(const char *ident, int options, int facility);

void closelog(void);

options     Description
LOG_CONS    Log to console if cannot send to syslogd daemon
LOG_NDELAY  Do not delay open, create socket now
LOG_PERROR  Log to standard error as well as sending to syslogd daemon
LOG_PID     Log the process ID with each message

Figure 12.3 options for openlog

Unix Login

Unix Login

Process Group

• process group is a collection of one or more processes, usually associated with the same job

• int setpgid(pid_t pid, pid_t pgid);
• pid_t getpgid(pid_t pid);
• It is possible for a process group leader to create a process group, create processes in the group, and then terminate. The process group still exists, as long as at least one process is in the group, regardless of whether the group leader terminates.

Process Groups in a Session

• The processes in a process group are usually placed there by a shell pipeline:
– proc1 | proc2 &
– proc3 | proc4 | proc5

Creating Session

• A process establishes a new session by calling the setsid function

• If the calling process is not a process group leader, this function creates a new session. Three things happen.

– The process becomes the session leader of this new session. (A session leader is the process that creates a session.) The process is the only process in this new session.

– The process becomes the process group leader of a new process group. The new process group ID is the process ID of the calling process.

– The process has no controlling terminal. If the process had a controlling terminal before calling setsid, that association is broken.

setsid

• pid_t setsid(void);
• This function returns an error if the caller is already a process group leader.
• To ensure this is not the case, the usual practice is to call fork and have the parent terminate and the child continue. We are guaranteed that the child is not a process group leader, because the process group ID of the parent is inherited by the child, but the child gets a new process ID. Hence, it is impossible for the child's process ID to equal its inherited process group ID.
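The fork-then-setsid practice can be checked with a short sketch. The function name child_becomes_session_leader is hypothetical; unlike a real daemon, the parent here stays alive and waits so that it can report what the child observed.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: fork a child, have it call setsid(), and report
 * through a pipe whether it became both session leader and process
 * group leader. Returns 1 if so, 0 if not, -1 on error. */
int child_becomes_session_leader(void)
{
    int pfd[2];
    pid_t pid;
    char ok = 0;
    ssize_t n;

    if (pipe(pfd) < 0)
        return -1;
    if ((pid = fork()) < 0)
        return -1;

    if (pid == 0) {                     /* child: never a group leader here */
        close(pfd[0]);
        if (setsid() != (pid_t)-1 &&
            getsid(0) == getpid() &&    /* session ID equals our PID */
            getpgrp() == getpid())      /* process group ID equals our PID */
            ok = 1;
        if (write(pfd[1], &ok, 1) != 1)
            _exit(1);
        _exit(0);
    }

    close(pfd[1]);                      /* parent: collect the result */
    n = read(pfd[0], &ok, 1);
    close(pfd[0]);
    waitpid(pid, NULL, 0);
    return n == 1 ? ok : -1;
}
```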

Controlling Terminal

12.4 daemon_init Function

#include <syslog.h>

#define MAXFD 64

extern int daemon_proc;     /* defined in error.c */

void
daemon_init(const char *pname, int facility)
{
    int i;
    pid_t pid;

    if ( (pid = Fork()) != 0)
        exit(0);            /* parent terminates */

    /* 1st child continues */
    setsid();               /* become session leader */

    Signal(SIGHUP, SIG_IGN);
    if ( (pid = Fork()) != 0)
        exit(0);            /* 1st child terminates */

    /* 2nd child continues */
    daemon_proc = 1;        /* for our err_XXX() functions */

    chdir("/");             /* change working directory */

    umask(0);               /* clear our file mode creation mask */

    for (i = 0; i < MAXFD; i++)
        close(i);

    openlog(pname, LOG_PID, facility);
}

Daemon_init

1. We first call fork and then the parent terminates, and the child continues. If the process was started as a shell command in the foreground, when the parent terminates, the shell thinks the command is done. This automatically runs the child process in the background. Also, the child inherits the process group ID from the parent but gets its own process ID. This guarantees that the child is not a process group leader, which is required for the next call to setsid

2. The process becomes the session leader of the new session, becomes the process group leader of a new process group, and has no controlling terminal

Daemon_init

• We ignore SIGHUP and call fork again. When this function returns, the parent is really the first child and it terminates, leaving the second child running. The purpose of this second fork is to guarantee that the daemon cannot automatically acquire a controlling terminal should it open a terminal device in the future. When a session leader without a controlling terminal opens a terminal device (that is not currently some other session's controlling terminal), the terminal becomes the controlling terminal of the session leader. But by calling fork a second time, we guarantee that the second child is no longer a session leader, so it cannot acquire a controlling terminal. We must ignore SIGHUP because when the session leader terminates (the first child), all processes in the session (our second child) receive the SIGHUP signal.

12.5 inetd Daemon

• Problems on a typical Unix system where every service runs its own daemon:
1. All these daemons contained nearly identical startup code.
2. Each daemon took a slot in the process table, but each daemon was asleep most of the time.

• The inetd daemon fixes both problems.
1. It simplifies writing daemon processes, since most of the startup details are handled by inetd.
2. It allows a single process (inetd) to wait for incoming client requests for multiple services, instead of one process for each service.

12.5 inetd daemon

• Figure 12.7: steps performed by inetd

For each service listed in the /etc/inetd.conf file:
    socket()
    bind()
    listen() (if TCP socket)

Then:
    select() for readability
    accept() (if TCP socket)
    fork()

Parent:
    close connected socket (if TCP)

Child:
    close all descriptors other than socket
    dup socket to descriptors 0, 1, and 2; close socket
    setgid(), setuid() (if user not root)
    exec() server

inetd service specification

• For each service, inetd needs to know:– the socket type and transport protocol– wait/nowait flag.– login name the process should run as.– pathname of real server program.– command line arguments to server program.

• Servers that are expected to deal with frequent requests are typically not run from inetd– mail, web, NFS.

# Syntax for socket-based Internet services:
# <service_name> <socket_type> <proto> <flags> <user> <server_pathname> <args>
#
# comments start with #
echo    stream tcp nowait root internal
echo    dgram  udp wait   root internal
chargen stream tcp nowait root internal
chargen dgram  udp wait   root internal
ftp     stream tcp nowait root /usr/sbin/ftpd ftpd -l
telnet  stream tcp nowait root /usr/sbin/telnetd telnetd
finger  stream tcp nowait root /usr/sbin/fingerd fingerd
# Authentication
auth    stream tcp nowait nobody /usr/sbin/in.identd in.identd -l -e -o
# TFTP
tftp    dgram  udp wait   root /usr/sbin/tftpd tftpd -s /tftpboot

Example /etc/inetd.conf
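The seven fields of an entry can be pulled apart with a few lines of C. The following is only a sketch of how such a line might be parsed, not how inetd itself does it; the struct inetd_entry type and the parse_inetd_line function are hypothetical names introduced for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one inetd.conf entry; field names are illustrative. */
struct inetd_entry {
    char service[32];
    char socktype[16];   /* "stream" or "dgram" */
    char proto[16];      /* "tcp" or "udp" */
    char flags[16];      /* "wait" or "nowait" */
    char user[32];
    char path[128];      /* "internal" or pathname of the real server */
    char args[128];      /* rest of the line: argument list for the server */
};

/* Returns 0 on success, -1 for comments, blank lines, or malformed entries. */
int parse_inetd_line(const char *line, struct inetd_entry *e)
{
    int consumed = 0;

    memset(e, 0, sizeof(*e));
    while (*line == ' ' || *line == '\t')
        line++;
    if (*line == '#' || *line == '\n' || *line == '\0')
        return -1;
    if (sscanf(line, "%31s %15s %15s %15s %31s %127s %n",
               e->service, e->socktype, e->proto, e->flags,
               e->user, e->path, &consumed) < 6)
        return -1;
    /* Everything after the pathname is the argument list. */
    strncpy(e->args, line + consumed, sizeof(e->args) - 1);
    e->args[strcspn(e->args, "\n")] = '\0';
    return 0;
}
```

For the ftp line above, path would hold /usr/sbin/ftpd and args would hold "ftpd -l"; for the "internal" services the argument list is empty.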

wait/nowait

• WAIT specifies that inetd should not look for new clients for the service until the child (the real server) has terminated.

• TCP servers usually specify nowait - this means inetd can start multiple copies of the TCP server program - providing concurrency

• Most UDP services run with inetd told to wait until the child server has died.

Broadcasting

• Many networks support the notion of sending a message from one host to all other hosts on the network.

• A special address called the “broadcast address” is often used.

• Some popular network services are based on broadcasting (YP/NIS, rup, rusers)

• TCP works only with unicast addresses; UDP also supports broadcasting and multicasting.

• Multicasting support is optional in IPv4, but mandatory in IPv6.
• Broadcasting support is not provided in IPv6; if an IPv4 application uses broadcasting, recode it for IPv6 to use multicasting instead.

Type        IPv4    IPv6    TCP    UDP
Unicast     yes     yes     yes    yes
Broadcast   yes     no      no     yes
Multicast   opt.    yes     no     yes

Types of Casting:
– Unicast: one to one
– Anycast: a set to one in a set
– Multicast: a set to all in a set
– Broadcast: all to all

Broadcasting is useful over a LAN only, and only with UDP.

Uses of Broadcasting

• Mainly used for resource discovery (a server is known to exist on the local subnet, but its IP address is not known)
– ARP (Address Resolution Protocol)
  • Broadcast to find the MAC address for a known IP address; the owner of the IP address replies
– BOOTP (Bootstrap Protocol)
  • For a diskless workstation to discover its own IP address, the IP address of a BOOTP server on the network, and a file to be loaded into memory to boot the machine
– NTP (Network Time Protocol)
  • To synchronize time and coordinate time distribution in a large network
– Routing daemons: broadcast their routing tables on the LAN

Broadcast Address Types

• IPv4 address: {netid; subnetid; hostid}
– Subnet-directed Broadcast Address:
  • {netid; subnetid; -1} (-1 means all bits are 1s)
  • Example: netid = 128.7, subnetid = 6, broadcast address = 128.7.6.255
  • Normally, routers do not forward these broadcasts
– All-subnets-directed Broadcast Address:
  • {netid; -1; -1}
  • All subnets on the specified network; very rarely used
– Network-directed Broadcast Address:
  • {netid; -1}
  • Used if a network has no subnetting; almost non-existent
– Limited Broadcast Address:
  • {-1; -1; -1} or 255.255.255.255
  • Must never be forwarded by a router

• Subnet-directed broadcast and limited broadcast are the most common
• Old systems do not understand subnet-directed broadcast
• For protocols like BOOTP, 255.255.255.255 is the only option
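The {netid; subnetid; -1} rule amounts to keeping the network and subnet bits and setting every host bit to 1. A minimal sketch, assuming the subnet mask is known as a dotted-decimal string; subnet_broadcast is a hypothetical helper name, not a library call.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <sys/socket.h>

/* Writes the subnet-directed broadcast address for ip/mask into out
   (dotted decimal). Returns 0 on success, -1 on a parse error. */
int subnet_broadcast(const char *ip, const char *mask, char *out, size_t outlen)
{
    struct in_addr a, m, b;

    if (inet_pton(AF_INET, ip, &a) != 1 || inet_pton(AF_INET, mask, &m) != 1)
        return -1;
    /* Keep netid and subnetid, set every hostid bit to 1. */
    b.s_addr = a.s_addr | ~m.s_addr;
    return inet_ntop(AF_INET, &b, out, (socklen_t)outlen) ? 0 : -1;
}
```

With ip = 128.7.6.5 and mask = 255.255.255.0 this produces 128.7.6.255, the address used in the example above.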

Unicast vs Broadcast

• In unicast, only the peers participate.
• In broadcast, every host on the subnet has to receive the packet and process it up to the transport layer, i.e., through the datalink, IP, and UDP layers.
• Every non-IP host must also receive the frame at the datalink layer.
• If broadcast datagrams arrive at a high rate, this processing can severely affect performance.

Unicast

[Figure: unicast UDP datagram on subnet 128.7.6. The sending application calls sendto with dest IP 128.7.6.5 and dest port 7433. The frame (Enet hdr, IPv4 hdr, UDP hdr, UDP data) carries dest Enet 08:00:20:03:f6:42, frame type 0800, dest IP 128.7.6.5, protocol UDP, dest port 7433. Host 128.7.6.5 (Ethernet address 08:00:20:03:f6:42) accepts the frame and passes it up through its datalink (frame type = 0800), IPv4 (protocol = UDP), and UDP (port = 7433) layers to the receiving application. Host 128.7.6.99 (Ethernet address 02:60:8c:2f:4e:00) is on the same subnet but is not addressed. Both hosts have the subnet broadcast address 128.7.6.255.]

Broadcast

[Figure: broadcast UDP datagram on subnet 128.7.6. The sending application sets the SO_BROADCAST option using setsockopt() and calls sendto with dest IP 128.7.6.255 and dest port 520. The frame carries dest Enet ff:ff:ff:ff:ff:ff, frame type 0800, dest IP 128.7.6.255, protocol UDP, dest port 520. Every host on the subnet receives the frame. On the host with a receiving application bound to port 520, the datagram passes up through the datalink (frame type = 0800), IPv4 (protocol = UDP), and UDP (port = 520) layers. On host 128.7.6.99 the datagram also passes up through the datalink and IPv4 layers and is then discarded.]

Programming Requirements

• The SO_BROADCAST socket option has to be set before sending to a broadcast address

• setsockopt(sockfd, SOL_SOCKET, SO_BROADCAST, &on, sizeof(on));

• IP fragmentation: BSD returns EMSGSIZE if the size of a broadcast datagram exceeds the outgoing MTU
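As a small sketch of the first requirement, the option can be wrapped in a helper and read back with getsockopt to confirm it took effect. The names enable_broadcast and broadcast_enabled are illustrative helpers, not part of the sockets API.

```c
#include <sys/socket.h>

/* Turns on SO_BROADCAST for a datagram socket. Returns 0 on success, -1 on error. */
int enable_broadcast(int sockfd)
{
    const int on = 1;

    return setsockopt(sockfd, SOL_SOCKET, SO_BROADCAST, &on, sizeof(on));
}

/* Reads the option back: 1 if enabled, 0 if disabled, -1 on error. */
int broadcast_enabled(int sockfd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(sockfd, SOL_SOCKET, SO_BROADCAST, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

A client would typically call enable_broadcast once, right after socket(), and before the first sendto to a broadcast address.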

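Race Condition

void dg_cli(…) {
    setsockopt(sockfd, SOL_SOCKET, SO_BROADCAST, &on, sizeof(on));
    signal(SIGALRM, func);
    while (fgets(…) != NULL) {
        sendto(…);
        alarm(1);
        for ( ; ; ) {
            if ((n = recvfrom(…)) < 0) {
                if (errno == EINTR) break;   /* waited long enough for replies */
                else err_sys(…);
            } else {
                recvline[n] = 0;
                sleep(1);
                printf(…);
            }
        }
    }
}

void func(int signo) { return; }

Problem?

- A race condition: when multiple flows of control access shared data, the output depends on the order in which they execute. Here, if SIGALRM is delivered while the process is not blocked in recvfrom (for example, during the sleep(1)), the handler simply returns, nothing interrupts the next recvfrom, and the loop can block forever.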

Solutions to Race Condition

1. By unblocking and blocking SIGALRM

sigemptyset(&sig1);
sigaddset(&sig1, SIGALRM);
signal(SIGALRM, func);
while (fgets(…) != NULL) {
    sendto(…);
    alarm(5);
    for ( ; ; ) {
        sigprocmask(SIG_UNBLOCK, &sig1, NULL);
        n = recvfrom(…);
        sigprocmask(SIG_BLOCK, &sig1, NULL);
        if (n < 0) {
            if (errno == EINTR) break;
            else err_sys(…);
        } else {
            recvline[n] = 0;
            printf(…);
        }
    }
}

void func(…)
{ return; }

• Signal generation and delivery are controlled: SIGALRM can be delivered only around the call to recvfrom.
• The window is reduced, but the problem still persists: the signal can be delivered between the unblocking and the call to recvfrom.

2. pselect can be used: SIGALRM is first blocked, and then pselect is called with an empty signal set as its last argument.

Because pselect installs the signal mask, waits, and restores the mask as one atomic operation, the earlier problem does not persist.

3. Using the non-local goto siglongjmp to jump from the signal handler back to the caller.

signal(SIGALRM, func);
while (fgets(…) != NULL) {
    sendto(…);
    alarm(5);
    for ( ; ; ) {
        if (sigsetjmp(jmpbuf, 1) != 0)
            break;
        n = recvfrom(…);
        recvline[n] = 0;
        printf(…);
    }
}

void func(…) { siglongjmp(jmpbuf, 1); }

4. Using IPC from signal handler to function

void dg_cli(…) {
    setsockopt(…);
    pipe(pipefd);
    FD_ZERO(&rset);
    signal(SIGALRM, func);
    while (fgets(…) != NULL) {
        sendto(…);
        alarm(5);
        for ( ; ; ) {
            FD_SET(sockfd, &rset);
            FD_SET(pipefd[0], &rset);
            if ((n = select(…)) < 0) {
                if (errno == EINTR) continue;
                else err_sys(…);
            }
            if (FD_ISSET(sockfd, &rset)) {
                recvfrom(…);
                printf(…);
            }
            if (FD_ISSET(pipefd[0], &rset)) {
                read(pipefd[0], &n, 1);   /* timer expired */
                break;
            }
        }
    }
}

void func(int signo) { write(pipefd[1], " ", 1); return; }

Multicasting

• IPv4 Class D addresses are multicast addresses
– Range 224.0.0.0 to 239.255.255.255
– The 32-bit Class D address is called the group address

Leading bits and field widths of the IPv4 address classes (32 bits total):

CLASS A:  0        NET-ID (7b)    HOST-ID (24b)
CLASS B:  1 0      NET-ID (14b)   HOST-ID (16b)
CLASS C:  1 1 0    NET-ID (21b)   HOST-ID (8b)
CLASS D:  1 1 1 0  GROUP-ID (28b)

• A mapping from IPv4 multicast addresses to Ethernet addresses is also defined
– High-order 24 bits are always 01:00:5e
– 25th bit is 0
– Low-order 23 bits come from the lowest 23 bits of the multicast group address
– Not one-to-one: many (32) multicast addresses map to a single Ethernet address

• Broadcasting is normally limited to LANs, whereas multicasting can be done in LANs or WANs
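The IPv4-to-Ethernet mapping can be written out directly. The function below is a sketch for illustration (mcast_to_ether is a hypothetical name); it drops the 5 group-address bits that do not fit, which is why 32 groups share each Ethernet address.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>

/* Maps an IPv4 multicast group (dotted decimal) to its Ethernet address,
   written as "xx:xx:xx:xx:xx:xx" (out should hold at least 18 bytes).
   Returns 0 on success, -1 if group is not a class D address. */
int mcast_to_ether(const char *group, char *out, size_t outlen)
{
    struct in_addr a;
    uint32_t h;

    if (inet_pton(AF_INET, group, &a) != 1)
        return -1;
    h = ntohl(a.s_addr);
    if ((h >> 28) != 0xE)          /* class D starts with bits 1110 */
        return -1;
    h &= 0x007FFFFF;               /* keep only the low-order 23 bits */
    snprintf(out, outlen, "01:00:5e:%02x:%02x:%02x",
             (unsigned)(h >> 16), (unsigned)((h >> 8) & 0xFF),
             (unsigned)(h & 0xFF));
    return 0;
}
```

Groups 224.0.1.1 and 224.128.1.1 differ only in a dropped bit, so both map to 01:00:5e:00:01:01; the receiving IP layer has to filter on the full destination address.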

Multicast address

• IPv4 class D address
– 224.0.0.0 ~ 239.255.255.255
– 224.0.0.1: all-hosts group; 224.0.0.2: all-routers group

Multicast Addresses Scope

Multicast Session

• Especially in the case of streaming multimedia, the combination of an IP multicast address (either IPv4 or IPv6) and a transport-layer port (typically UDP) is referred to as a session.

• For example, an audio/video teleconference may comprise two sessions; one for audio and one for video. These sessions almost always use different ports and sometimes also use different groups for flexibility in choice when receiving.

Multicast vs Broadcast

[Figure: multicast UDP datagram on subnet 128.7.6. The sending application calls sendto with dest IP 224.0.1.1 and dest port 123. The frame (Enet hdr, IPv4 hdr, UDP hdr, UDP data) carries dest Enet 01:00:5e:00:01:01, frame type 0800, dest IP 224.0.1.1, protocol UDP, dest port 123. The receiving application has joined 224.0.1.1, so its interface is told to receive 01:00:5e:00:01:01, and the datagram passes up through the datalink (frame type = 0800), IPv4 (protocol = UDP), and UDP (port = 123) layers. Filtering is imperfect in hardware (based on dest Enet) and perfect in software (based on dest IP).]

Multicasting on a WAN

[Figure: five multicast routers, MR1 through MR5, connecting LANs across a WAN.]

Hosts joining a Multicast Group

[Figure: hosts H1 through H5 on the LANs behind the multicast routers; hosts send "join group" requests, and the multicast routers exchange this information using a multicast routing protocol (MRP).]

Sending packets on a WAN

[Figure: the same topology, showing packets sent to the group being forwarded by the multicast routers toward the LANs whose hosts joined the group.]

Multicasting

• Specifically note that:
– All interested multicast routers receive the packets; MR5 does not receive any since there are no interested hosts on its LAN
– Packets are put on a specific LAN only if there are hosts on that LAN to receive them; MR3 only forwards
– Multicast router MR2 both puts packets on its LAN for hosts H2 & H3 and makes a copy of the packets and forwards them to MR3
– This behavior is unique to multicast forwarding

Source-Specific Multicast

• Multicasting on a WAN has been difficult to deploy for several reasons.
– The biggest problem is that the MRP needs to get the data from all the senders, which may be located anywhere in the network, to all the receivers, which may similarly be located anywhere.
– Another large problem is multicast address allocation: there are not enough IPv4 multicast addresses to statically assign them to everyone who wants one, as is done with unicast addresses.

Source-Specific Multicast

• SSM combines the group address with a system's source address, which solves these problems as follows:
– The receivers supply the sender's source address to the routers as part of joining the group.
– This removes the rendezvous problem from the network, as the network now knows exactly where the sender is.
– However, it retains the scaling properties of not requiring the sender to know who all the receivers are. This simplifies multicast routing protocols immensely.

• It redefines the identifier from simply being a multicast group address to being a combination of a unicast source and multicast destination (which SSM calls a channel).

• An SSM session is the combination of source, destination, and port.

struct ip_mreq {
    struct in_addr imr_multiaddr;   /* IPv4 class D multicast addr */
    struct in_addr imr_interface;   /* IPv4 addr of local interface */
};

struct ipv6_mreq {
    struct in6_addr ipv6mr_multiaddr;   /* IPv6 multicast addr */
    unsigned int    ipv6mr_interface;   /* interface index, or 0 */
};

struct group_req {
    unsigned int            gr_interface;   /* interface index, or 0 */
    struct sockaddr_storage gr_group;       /* IPv4 or IPv6 multicast addr */
};

struct ip_mreq_source {
    struct in_addr imr_multiaddr;    /* IPv4 class D multicast addr */
    struct in_addr imr_sourceaddr;   /* IPv4 source addr */
    struct in_addr imr_interface;    /* IPv4 addr of local interface */
};

struct group_source_req {
    unsigned int            gsr_interface;   /* interface index, or 0 */
    struct sockaddr_storage gsr_group;       /* IPv4 or IPv6 multicast addr */
    struct sockaddr_storage gsr_source;      /* IPv4 or IPv6 source addr */
};

Multicast Socket Options

• Use setsockopt() to modify socket options
– IP_ADD_MEMBERSHIP
  • Join a multicast group on a specified local interface
– IP_DROP_MEMBERSHIP
  • Leave a multicast group
– IP_MULTICAST_IF
  • Specify the interface for outgoing multicast datagrams sent on this socket
– IP_MULTICAST_TTL
  • Set the IPv4 TTL for outgoing multicast datagrams (if not specified, default = 1)
– IP_MULTICAST_LOOP
  • Enable or disable local loopback (default is enabled)

Distributed Program Design

• Communication-Oriented Design (the typical sockets approach)
– Design protocol first.
– Build programs that adhere to the protocol.

• Application-Oriented Design (the RPC approach)
– Build application(s).
– Divide programs up and add communication protocols.

RPC: Remote Procedure Call

• Call a procedure (subroutine) that is running on another machine.

• Issues:
– identifying and accessing the remote procedure
– parameters
– return value

Remote Subroutine

[Figure: the client code calls foo(); the body of foo() runs on the server; a protocol carries the call and the return value between them.]

Client:
    blah, blah, blah
    bar = foo(a,b);
    blah, blah, blah

Server:
    int foo(int x, int y ) {
        if (x>100)
            return(y-2);
        else if (x>10)
            return(y-x);
        else
            return(x+y);
    }

Sun RPC

• There are a number of popular RPC specifications.• Sun RPC (ONC RPC) is widely used.• NFS (Network File System) is RPC based.• Rich set of support tools.

Sun RPC Organization

[Figure: a Remote Program contains Procedure 1, Procedure 2, and Procedure 3, which share global data.]

Procedure Arguments

• To reduce the complexity of the interface specification, Sun RPC includes support for a single argument to a remote procedure.*

• Typically the single argument is a structure that contains a number of values.

* Newer versions can handle multiple args.

Procedure Identification

• Each procedure is identified by:
– Hostname (IP Address)
– Program identifier (32 bit integer)
– Procedure identifier (32 bit integer)
– Program Version identifier
  • for testing and migration.

Program Identifiers

• Each remote program has a unique ID.
• Sun divided up the IDs:

0x00000000 - 0x1fffffff   Sun
0x20000000 - 0x3fffffff   SysAdmin
0x40000000 - 0x5fffffff   Transient
0x60000000 - 0xffffffff   Reserved
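As a quick check on which range a chosen program number falls in, the four ranges can be encoded in a small lookup. This is only an illustrative helper (rpc_prog_range is a hypothetical name), not part of the RPC library.

```c
#include <stdint.h>

/* Returns the name of the range a program number falls in,
   using the four ranges Sun defined. */
const char *rpc_prog_range(uint32_t prognum)
{
    if (prognum <= 0x1fffffffu) return "Sun";
    if (prognum <= 0x3fffffffu) return "SysAdmin";
    if (prognum <= 0x5fffffffu) return "Transient";
    return "Reserved";
}
```

For example, 0x20000001 and the decimal number 555555555 both land in the SysAdmin range, which is where locally defined programs belong.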

Procedure Identifiers & Program Version Numbers

• Procedure Identifiers usually start at 1 and are numbered sequentially

• Version Numbers typically start at 1 and are numbered sequentially.

Iterative Server

• Sun RPC specifies that at most one remote procedure within a program can be invoked at any given time.

• If a 2nd procedure is called, the call blocks until the 1st procedure has completed.

Iterative can be good

• Having an iterative server is useful for applications that may share data among procedures.

• Example: database - to avoid insert/delete/modify collisions.

• We can provide concurrency when necessary...

Call Semantics

• What does it mean to call a local procedure?– the procedure is run exactly one time.

• What does it mean to call a remote procedure?– It might not mean "run exactly once"!

Remote Call Semantics

• To act like a local procedure (exactly one invocation per call) - a reliable transport (TCP) is necessary.

• Sun RPC does not support reliable call semantics!
– "At Least Once" Semantics
– "Zero or More" Semantics

Sun RPC Call Semantics

• At Least Once Semantics– if we get a response (a return value)

• Zero or More Semantics– if we don't hear back from the remote subroutine.

Remote Procedure deposit()

deposit(DavesAccount,$100)

• Always remember that you don't know how many times the remote procedure was run!– The net can duplicate the request (UDP).

Network Communication

• The actual network communication is nothing new - it's just TCP/IP.

• Many RPC implementations are built upon the sockets library.– the RPC library does all the work!

• We are just using a different API, the underlying stuff is the same!

Dynamic Port Mapping

• Servers typically do not use well known protocol ports!

• Clients know the Program ID (and host IP address).

• RPC includes support for looking up the port number of a remote program.

Port Lookup Service

• A port lookup service runs on each host that contains RPC servers.

• RPC servers register themselves with this service:– "I'm program 17 and I'm looking for requests on port 1736"

The portmapper

• Each system which will support RPC servers runs a port mapper server that provides a central registry for RPC services.

• Servers tell the port mapper what services they offer.

More on the portmapper

• Clients ask a remote port mapper for the port number corresponding to Remote Program ID.

• The portmapper is itself an RPC server!

• The portmapper is available on a well-known port (111).

Sun RPC Programming

• The RPC library is a collection of tools for automating the creation of RPC clients and servers.

• RPC clients are processes that call remote procedures.

• RPC servers are processes that include procedure(s) that can be called by clients.

RPC Programming

• RPC library– XDR routines– RPC run time library

• call rpc service• register with portmapper• dispatch incoming request to correct procedure

– Program Generator

RPC Run-time Library

• High- and Low-level functions that can be used by clients and servers.

• High-level functions provide simple access to RPC services.

High-level Client Library

int callrpc(char *host,
            u_long prognum,
            u_long versnum,
            u_long procnum,
            xdrproc_t inproc,
            char *in,
            xdrproc_t outproc,
            char *out);

High-Level Server Library

int registerrpc(u_long prognum,
                u_long versnum,
                u_long procnum,
                char *(*procname)(),
                xdrproc_t inproc,
                xdrproc_t outproc);

High-Level Server Library (cont.)

void svc_run();

• svc_run() is a dispatcher.
• A dispatcher waits for incoming connections and invokes the appropriate function to handle each incoming request.

High-Level Library Limitation

• The High-Level RPC library calls support UDP only (no TCP).

• You must use lower-level RPC library functions to use TCP.

• The High-Level library calls do not support any kind of authentication.

Low-level RPC Library

• Full control over all IPC options
– TCP & UDP
– Timeout values
– Asynchronous procedure calls
• Multi-tasking Servers
• Broadcasting

(IPC is InterProcess Communication.)

RPCGEN

• There is a tool for automating the creation of RPC clients and servers.

• The program rpcgen does most of the work for you.
• The input to rpcgen is a protocol definition in the form of a list of remote procedures and parameter types.

RPCGEN

[Figure: rpcgen reads an input file (the protocol description) and generates C source code: client stubs, XDR filters, a header file, and a server skeleton.]

rpcgen Output Files

> rpcgen -C foo.x

foo_clnt.c (client stubs)
foo_svc.c (server main)
foo_xdr.c (xdr filters)
foo.h (shared header file)

Client Creation

> gcc -o fooclient foomain.c foo_clnt.c foo_xdr.c -lnsl

• foomain.c is the client main() (and possibly other functions) that call rpc services via the client stub functions in foo_clnt.c

• The client stubs use the xdr functions.

Server Creation

> gcc -o fooserver fooservices.c foo_svc.c foo_xdr.c -lrpcsvc -lnsl

• fooservices.c contains the definitions of the actual remote procedures.

Example Protocol Definition

struct twonums {
    int a;
    int b;
};

program UIDPROG {
    version UIDVERS {
        int RGETUID(string<20>) = 1;
        string RGETLOGIN( int ) = 2;
        int RADD(twonums) = 3;
    } = 1;
} = 0x20000001;

RPC Programming with rpcgen

Issues:
– Protocol Definition File
– Client Programming
  • Creating an "RPC Handle" to a server
  • Calling client stubs
– Server Programming
  • Writing Remote Procedures

Protocol Definition File

• Description of the interface of the remote procedures.– Almost function prototypes

• Definition of any data structures used in the calls (argument types & return types)

• Can also include shared C code (shared by client and server).

XDR the language

• Remember that XDR data types are not C data types!
– There is a mapping from XDR types to C types; that's most of what rpcgen does.

• Most of the XDR syntax is just like C
– Arrays, strings are different.

XDR Arrays

• Fixed-length arrays look just like C code:

int foo[100]

• Variable-length arrays look like this:

int foo<> or int foo<MAXSIZE>

The implicit maximum size is 2^32 - 1

What gets sent on the network

int x[n]:   x0  x1  x2  . . .  xn-1

int y<m>:   k  y0  y1  y2  . . .  yk-1

k is the actual array size (k ≤ m); it is sent first, followed by the k elements.
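To make the wire layout concrete, here is a hand-written sketch of the encoding of a variable-length int array: a 4-byte count followed by the elements, each a 4-byte big-endian integer as XDR specifies. It is not the XDR library and not what rpcgen generates; encode_int_array is a hypothetical name used only for illustration.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Encodes a variable-length array "int y<m>" holding k elements into buf.
   Returns the number of bytes written, or 0 if buf is too small. */
size_t encode_int_array(const int32_t *y, uint32_t k,
                        unsigned char *buf, size_t buflen)
{
    size_t need = 4 + (size_t)k * 4;
    uint32_t word;
    uint32_t i;

    if (buflen < need)
        return 0;
    word = htonl(k);                       /* element count goes first */
    memcpy(buf, &word, 4);
    for (i = 0; i < k; i++) {
        word = htonl((uint32_t)y[i]);      /* each int is 4 bytes, big-endian */
        memcpy(buf + 4 + (size_t)i * 4, &word, 4);
    }
    return need;
}
```

A fixed-length array int x[n] would be the same loop without the leading count, since both sides already know n.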

XDR String Type

• Looks like a variable-length array:

string s<100>

• What is sent: the length followed by a sequence of ASCII chars:

n  s0  s1  s2  s3  . . .  sn-1

n is the actual string length (sent as an int)
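The string layout can be sketched the same way. One detail the slide does not show, but which the XDR standard requires, is that the characters are padded with zero bytes up to a multiple of 4. The function name encode_xdr_string is an assumption for this illustration; real programs would use the generated XDR filters.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Encodes s as an XDR string: 4-byte length, the characters, zero padding
   to a 4-byte boundary. Returns bytes written, or 0 if buf is too small. */
size_t encode_xdr_string(const char *s, unsigned char *buf, size_t buflen)
{
    size_t n = strlen(s);
    size_t padded = (n + 3) & ~(size_t)3;  /* round up to a multiple of 4 */
    uint32_t word = htonl((uint32_t)n);

    if (buflen < 4 + padded)
        return 0;
    memcpy(buf, &word, 4);                 /* n, the actual string length */
    memcpy(buf + 4, s, n);                 /* s0 ... sn-1, no terminating NUL */
    memset(buf + 4 + n, 0, padded - n);    /* zero padding */
    return 4 + padded;
}
```

Encoding "abc" therefore takes 8 bytes: the length 3, the three characters, and one padding byte.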

Linked Lists!

struct foo {
    int x;
    foo *next;
};

• rpcgen recognizes this as a linked list.
• The generated XDR filter uses xdr_pointer() to encode/decode the stuff pointed to by a pointer.
• Check the online example "linkedlist".

Declaring The Program

program SIMP_PROG {
    version SIMP_VERSION {
        type1 PROC1(operands1) = 1;
        type2 PROC2(operands2) = 2;
    } = 1;
} = 40000000;

[Color code in the original slide distinguished three kinds of tokens: keywords, generated symbolic constants, and names used to generate stub and procedure names.]

Procedure Numbers

• Procedure #0 is created for you automatically.– Start at procedure #1!

• Procedure #0 is a dummy procedure that can help debug things (sort of an RPC ping server).

Procedure Names

Rpcgen converts the procedure name to lower case and appends an underscore and the version number:

rtype PROCNAME(arg)

Client stub:
rtype *procname_1(arg *, CLIENT *);

Server procedure:
rtype *procname_1_svc(arg *, struct svc_req *);
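The naming rule is mechanical enough to write down. The helper below is purely illustrative (rpcgen_name is a hypothetical function, not something rpcgen provides); it follows the new-style convention with the _svc suffix on the server side.

```c
#include <ctype.h>
#include <stdio.h>

/* Builds the client stub name (svc = 0) or the server procedure name (svc = 1)
   for a procedure declared as PROCNAME in version vers.
   Returns 0 on success, -1 if out is too small. */
int rpcgen_name(const char *procname, unsigned vers, int svc,
                char *out, size_t outlen)
{
    size_t i = 0;
    int n;

    if (outlen == 0)
        return -1;
    while (procname[i] != '\0' && i + 1 < outlen) {
        out[i] = (char)tolower((unsigned char)procname[i]);
        i++;
    }
    out[i] = '\0';
    n = snprintf(out + i, outlen - i, "_%u%s", vers, svc ? "_svc" : "");
    return (n < 0 || (size_t)n >= outlen - i) ? -1 : 0;
}
```

For a procedure ADD in version 1 this yields add_1 for the client stub and add_1_svc for the server procedure.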

Program Numbers

• Use something like: 555555555 or 22222222

• You can find the numbers currently used with "rpcinfo -p hostname"

Client Programming

• Create RPC handle. – Establishes the address of the server.

• RPC handle is passed to client stubs (generated by rpcgen).

• Type is CLIENT *

clnt_create

CLIENT *clnt_create(
    char *host,     /* hostname of server */
    u_long prog,    /* program number */
    u_long vers,    /* version number */
    char *proto);   /* can be "tcp" or "udp" */

Calling Client Stubs

• Remember:– Return value is a pointer to what you expect.– Argument is passed as a pointer.– If you are passing a string, you must pass a char**

• When in doubt – look at the ".h" file generated by rpcgen

Server Procedures

• Rpcgen writes most of the server.
• You need to provide the actual remote procedures.
• Look in the ".h" file for prototypes.
• Run "rpcgen -C -Ss" to generate (empty) remote procedures!

Server Function Names

• Old Style (includes AIX): Remote procedure FOO, version 1 is named foo_1()

• New Style (includes Sun,BSD,Linux): Remote procedure FOO, version 1 is named foo_1_svc()

Running rpcgen

• Command line options vary from one OS to another.
• Sun/BSD/Linux: you need to use "-C" to get ANSI C code!
• Rpcgen can help write the files you need to write:
– To generate sample server code: "-Ss"
– To generate sample client code: "-Sc"

Other porting issues

• Shared header file generated by rpcgen may have: #include <rpc/rpc.h>

• Or Not!

RPC without rpcgen

• Can do asynchronous RPC– Callbacks– Single process is both client and server.

• Write your own dispatcher (and provide concurrency)
• Can establish control over many network parameters: protocols, timeouts, resends, etc.

rpcinfo

rpcinfo -p host prints a list of all registered programs on host.

rpcinfo -[ut] host program# makes a call to procedure #0 of the specified RPC program (RPC ping).

u : UDP
t : TCP

Sample Code

• simple – integer add and subtract• ulookup – look up username and uid.• varray – variable length array example.• linkedlist – arg is linked list.

Example simp

• Standalone program simp.c
– Takes 2 integers from the command line and prints out the sum and difference.
– Functions:
    int add( int x, int y );
    int subtract( int x, int y );

Splitting simp.c

• Move the functions add() and subtract() to the server.

• Change simp.c to be an RPC client– Calls stubs add_1() , subtract_1()

• Create server that serves up 2 remote procedures – add_1_svc() and subtract_1_svc()

Protocol Definition: simp.x

struct operands {
    int x;
    int y;
};

program SIMP_PROG {
    version SIMP_VERSION {
        int ADD(operands) = 1;
        int SUB(operands) = 2;
    } = VERSION_NUMBER;
} = 555555555;

rpcgen -C simp.x

[Figure: rpcgen reads simp.x and generates simp_clnt.c (client stubs), simp_xdr.c (XDR filters), simp.h (header file), and simp_svc.c (server skeleton).]

xdr_operands XDR filter

bool_t xdr_operands(XDR *xdrs, operands *objp)
{
    if (!xdr_int(xdrs, &objp->x))
        return (FALSE);
    if (!xdr_int(xdrs, &objp->y))
        return (FALSE);
    return (TRUE);
}

simpclient.c

• This was the main program – is now the client.• Reads 2 ints from the command line.• Creates a RPC handle.• Calls the remote add and subtract procedures.• Prints the results.

simpservice.c

• The server main is in simp_svc.c.
• simpservice.c is what we write: it holds the add and subtract procedures that simp_svc will call when it gets RPC requests.
• The only thing you need to do is to match the name/parameters that simp_svc expects (check simp.h!).
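What the two procedures in simpservice.c might look like is sketched below. This is not the course's actual file: the operands typedef and the struct svc_req declaration are stand-ins for what simp.h and <rpc/rpc.h> would normally provide, repeated here only so the sketch compiles by itself, and the names follow from ADD and SUB in simp.x under the new-style naming.

```c
/* Stand-ins for declarations that rpcgen would place in simp.h and that
   <rpc/rpc.h> provides; assumed here so the sketch is self-contained. */
typedef struct operands { int x; int y; } operands;
struct svc_req;

/* Remote procedure ADD, version 1. The result lives in a static variable
   because the caller uses the returned pointer after this function returns. */
int *add_1_svc(operands *argp, struct svc_req *rqstp)
{
    static int result;

    (void)rqstp;                 /* request details are not needed here */
    result = argp->x + argp->y;
    return &result;
}

/* Remote procedure SUB, version 1. */
int *sub_1_svc(operands *argp, struct svc_req *rqstp)
{
    static int result;

    (void)rqstp;
    result = argp->x - argp->y;
    return &result;
}
```

Both the argument and the return value are passed by pointer, matching the stub conventions described under "Calling Client Stubs".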

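Raw Sockets

[Figure: demultiplexing in the TCP/IP stack. A frame is identified by its MAC address and frame type, an IP datagram by its IP address and protocol field, and a TCP or UDP message by its port number. IP protocol values shown: 1 ICMP, 2 IGMP, 6 TCP, 8 EGP, 17 UDP, 41 IPv6, 89 OSPF. User TCP and user UDP sockets sit above the TCP and UDP stacks and are selected by port (for example 67 for BOOTP/DHCP); raw sockets receive protocols such as ICMP (ping: echo, timestamp), IGMP, and OSPF directly from IP.]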

What can raw sockets do?

• Bypass the TCP/UDP layers
• Read and write ICMP and IGMP packets
– ping, traceroute, multicast routing daemon
• Read and write IP datagrams with an IP protocol field not processed by the kernel
– OSPF
• Send and receive your own IP packets with your own IP header using the IP_HDRINCL socket option
– can build and send TCP and UDP packets
– testing, hacking
– only the superuser can create a raw socket, though
• You need to do all protocol processing at user level

RAW SOCKETS 694

Creating Raw Sockets

• Only the superuser can create a raw socket
• socket(AF_INET, SOCK_RAW, protocol)
– where protocol is one of the constants, IPPROTO_xxx, such as IPPROTO_ICMP.

• bind can be called on the raw socket, but this is rare. This function sets only the local address: There is no concept of a port number with a raw socket.

• connect can be called on the raw socket, but this is rare. This function sets only the foreign address: Again, there is no concept of a port number with a raw socket.


Creating Raw Sockets: IP Header option

• The IP_HDRINCL socket option can be set as follows:

const int on = 1;
if (setsockopt(sockfd, IPPROTO_IP, IP_HDRINCL, &on, sizeof(on)) < 0)
    perror("setsockopt IP_HDRINCL");   /* report the error */


Raw Socket Output

• Normal output is performed by calling sendto or sendmsg and specifying the destination IP address
– write, writev, or send can also be called if the socket has been connected.
• If the IP_HDRINCL option is not set, the kernel prepends the IP header
– The kernel sets the protocol field of the IPv4 header that it builds to the third argument from the call to socket.
• If the IP_HDRINCL option is set, the starting address of the data for the kernel to send specifies the first byte of the IP header.
– The process builds the entire IP header, except: (i) the IPv4 identification field can be set to 0, which tells the kernel to set this value; (ii) the kernel always calculates and stores the IPv4 header checksum; and (iii) IP options may or may not be included
• The kernel fragments raw packets that exceed the outgoing interface MTU.


Raw Socket Input

• Which received IP datagrams does the kernel pass to raw sockets?
• Received UDP packets and received TCP packets are never passed to a raw socket.
– To read these packets, a process must read at the datalink layer
• Most ICMP packets are passed to a raw socket after the kernel has finished processing the ICMP message.
– Except echo request, timestamp request, and address mask request
• All IGMP packets are passed to a raw socket after the kernel has finished processing the IGMP message.
• All IP datagrams with a protocol field that the kernel does not understand are passed to a raw socket.
• If the datagram arrives in fragments, nothing is passed to a raw socket until all fragments have arrived and have been reassembled.


Raw Socket Input

• When the kernel has an IP datagram, all raw sockets for all processes are examined, looking for all matching sockets.
• A copy of the IP datagram is delivered to each matching socket.
• The following tests are performed for each raw socket and only if all three tests are true is the datagram delivered to the socket:
– If a nonzero protocol is specified, the protocol field must match
– If a local IP address is bound to the raw socket by bind, then the destination IP address of the received datagram must match
– If a foreign IP address was specified for the raw socket by connect, then the source IP address of the received datagram must match
• Notice that if a raw socket is created with a protocol of 0, and neither bind nor connect is called, then that socket receives a copy of every raw datagram the kernel passes to raw sockets.
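The three tests can be sketched as a small predicate. This only illustrates the matching rule and is not kernel code: the struct raw_sock type, its field names, and the function name are hypothetical, addresses are plain 32-bit values, and 0 stands for "not set".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one raw socket's filter state. A value of 0
 * means "not set" (protocol 0, no bind, no connect). */
struct raw_sock {
    int      protocol;      /* third argument to socket() */
    uint32_t local_addr;    /* set by bind(), else 0 */
    uint32_t foreign_addr;  /* set by connect(), else 0 */
};

/* Returns true if a datagram with the given protocol field, source
 * address and destination address would be copied to this socket. */
bool raw_sock_matches(const struct raw_sock *s, int proto,
                      uint32_t src, uint32_t dst)
{
    if (s->protocol != 0 && s->protocol != proto)
        return false;       /* test 1: protocol field */
    if (s->local_addr != 0 && s->local_addr != dst)
        return false;       /* test 2: bound local address */
    if (s->foreign_addr != 0 && s->foreign_addr != src)
        return false;       /* test 3: connected foreign address */
    return true;
}
```

A socket with all three fields at 0 passes every test, which corresponds to the last bullet: it receives a copy of every datagram the kernel passes to raw sockets.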


Raw Socket Input

• Whenever a received datagram is passed to a raw IPv4 socket, the entire datagram, including the IP header, is passed to the process

• For a raw IPv6 socket, only the payload (i.e., no IPv6 header or any extension headers) is passed to the socket


Example: Ping Program

• Send an ICMP echo request to some IP address and receive an ICMP echo reply.

• # ping 172.10.1.3
• Ping 172.10.1.3: 56 bytes of data
• Reply from 172.10.1.3: bytes=56 time<10ms ttl=255
• … (4 replies)
• If the host is not active: Request Timeout


ICMP Message

• We set the identifier to the PID of the ping process and increment the sequence number by one for each packet we send

• We store the 8-byte timestamp of when the packet is sent as the optional data. The rules of ICMP require that the identifier, sequence number, and any optional data be returned in the echo reply.

• Storing the timestamp in the packet lets us calculate the RTT when the reply is received.
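The echo request described above can be sketched as follows. This is an illustration under stated assumptions, not the ping program's actual code: the function names are hypothetical, the packet is built byte by byte instead of through a system header struct, and the 8-byte timestamp is assumed to be a 64-bit microsecond count supplied by the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ICMP_ECHO_REQUEST 8   /* ICMP type for echo request */
#define ICMP_HDR_LEN      8   /* type, code, checksum, id, seq */

/* Internet checksum: one's-complement of the one's-complement sum
 * of the data taken as 16-bit big-endian words. */
uint16_t in_cksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t) p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)                       /* odd trailing byte */
        sum += (uint32_t) p[0] << 8;
    while (sum >> 16)                   /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t) ~sum;
}

/* Fills buf (at least 16 bytes) with an echo request and returns its
 * length. id would be the PID, seq the per-packet counter. */
size_t build_echo_request(uint8_t *buf, uint16_t id, uint16_t seq,
                          uint64_t send_time_us)
{
    size_t len = ICMP_HDR_LEN + sizeof(send_time_us);
    uint16_t ck;

    buf[0] = ICMP_ECHO_REQUEST;         /* type */
    buf[1] = 0;                         /* code */
    buf[2] = buf[3] = 0;                /* checksum, zero while summing */
    buf[4] = id >> 8;   buf[5] = id & 0xff;
    buf[6] = seq >> 8;  buf[7] = seq & 0xff;
    memcpy(buf + ICMP_HDR_LEN, &send_time_us, sizeof(send_time_us));
    ck = in_cksum(buf, len);
    buf[2] = ck >> 8;   buf[3] = ck & 0xff;
    return len;
}
```

A receiver can check a packet by running in_cksum over the whole message, checksum field included; a valid message yields 0.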

[Figure: ICMP message format]

ICMP Echo Message

[Figure: ICMP echo request and reply format]

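Ping Program

[Figure: structure of the ping program. main starts the read loop, an infinite receive loop that calls recvfrom and hands each packet to Proc_v4. The SIGALRM handler Sig_Alrm calls Send_v4 to send an echo request once a second.]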

Traceroute Example

• Determines the path IP datagrams follow
• Uses the TTL field (IPv4) or hop limit (IPv6) and two ICMP messages
• One UDP datagram is sent by the host with TTL=1 to the destination
• The 1st hop router sends an ICMP “time exceeded in transit” error
• TTL is increased to 2, and another datagram is sent
• The process repeats until a datagram reaches the destination. The datagrams use a port number not in use on the destination, so that the destination sends an ICMP “port unreachable” error


DATALINK ACCESS 710

Datalink Access

• Uses
– Watch packets on the interface
– Programs can be run as applications rather than as part of the kernel
• Ways to access the datalink
– BSD Packet Filter
– Datalink Provider Interface
– Linux SOCK_PACKET interface
• Public library: libpcap

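BSD Packet Filter

[Figure: BPF architecture. In the kernel, the datalink passes packets to the IPv4 and IPv6 stacks and also to BPF. For each application, BPF applies a filter and stores the matching packets in a buffer, which the application process then reads.]

Writing is not frequent. Why?

Filters: tcp, udp, tcp[15:1] (1 byte starting at offset 15)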

BPF reduces its overhead by

1. Filtering within the kernel
2. Passing only a part of each packet to the application
3. Buffering both reads and writes to reduce the number of system calls

Accessing BPF: open a BPF device, then use ioctl to set its properties: load the filter, set the read timeout, set the buffer size, attach a datalink to the BPF device, enable promiscuous mode, etc.

Linux: SOCK_PACKET

• Superuser privileges are required
• fd = socket(AF_INET, SOCK_PACKET, htons(ETH_P_ALL))
– Other frame types: ETH_P_IP, ETH_P_ARP, ETH_P_IPV6

Disadvantages:
1. No kernel buffering, hence more system calls
2. No device filtering, hence ETH_P_IP will give packets from Ethernet, PPP, SLIP links, and loopback devices

ICMP Format

[Figure: ICMP message format, including the subtype field]

Ping Program

• Create a raw socket to send/receive ICMP echo request and echo reply packets
• Install a SIGALRM handler to process output
– Send echo request packets every t seconds
– Build ICMP packets (type, code, checksum, id, seq, sending timestamp as optional data)
• Enter an infinite loop processing input
– Use recvmsg() to read from the network
– Parse the message and retrieve the ICMP packet
– Print ICMP packet information, e.g., peer IP address, round-trip time

Traceroute program

• Create a UDP socket and bind the source port
– To send probe packets with increasing TTL
– For each TTL value, use a timer to send a probe every three seconds, and send 3 probes in total
• Create a raw socket to receive ICMP packets
– If timeout, print “ *”
– If ICMP “port unreachable”, then terminate
– If ICMP “TTL expired”, then print the hostname of the router and the round-trip time to the router
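Setting the TTL of the probe socket can be sketched as below. The helper names make_probe_socket and get_probe_ttl are hypothetical; the sketch covers only the UDP sending socket for IPv4, not the raw ICMP receiving socket, which needs superuser privileges.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Creates a UDP socket whose outgoing datagrams carry the given TTL.
 * Returns the descriptor, or -1 on error. */
int make_probe_socket(int ttl)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Reads back the TTL currently set on the socket, or -1 on error. */
int get_probe_ttl(int fd)
{
    int ttl = 0;
    socklen_t len = sizeof(ttl);

    if (getsockopt(fd, IPPROTO_IP, IP_TTL, &ttl, &len) < 0)
        return -1;
    return ttl;
}
```

A traceroute loop would call setsockopt again with the next TTL value before each group of three probes, reusing the same socket.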

ISZC462

Lecture#8

Problem 1

• This problem is about implementing a local chat server and client in a system. The server and client will facilitate the communication between multiple users of the system. You should submit client_idno.c and server_idno.c for client and server respectively.
• The chat server supports the following functionalities.
• Suppose users B, C and D have already entered the chat server, and then user A joins the chat. The server will tell all the current chatters B, C and D: ‘A just joined’
– Command: connect <username>
• A can say a message to everyone, “Hello! Everyone!”, or A can whisper a message to C alone, ‘I want to tell a secret to you’. So the server should facilitate one-to-all and one-to-one communication.
– Command: talk * //to talk to all chatters
– Command: talk <username> //to talk to one user
• A can also get the list of all chatters.
– Command: list
• A can disconnect from chat
– Command: disconnect

Problem 2

• The server program should
– start like ./server <path>
– use either FIFOs or message queues for inter-process communication, since it runs within the system
– use the select() call for dealing with multiple users concurrently
• The client program should
– start like ./client <serverpath>
– take care of interpreting commands entered by the user
– process commands until Ctrl-D is pressed. When a user types and then presses <ENTER>, that is the end of one message. But the program will still wait for the next message until the user presses Ctrl-D (EOF for fgets()).
– be capable of sending and receiving simultaneously. Any messages received while the user is typing a message to be sent will simply be flashed on the console.

Problem 3

• A simple TCP based chat server could allow two users to use any TCP client (telnet, for example) to communicate with each other. Consider a single process, single thread server that can support exactly 2 clients at once, the server simply forwards whatever is sent from one client to the other (in both directions). As soon as something is sent from one client it is immediately forwarded to the other client. As soon as either client terminates the connection, the server exits. Provide server code with comments.

Problem 4

1. When the server starts, it reads a file containing a list of domain names to which access is forbidden. When an HTTP request comes to the server, e.g. for http://discovery.bits-pilani.ac.in/index.html, it checks whether the domain name “discovery.bits-pilani.ac.in” exists in the list. If it does, the server sends back HTTP error 403 Forbidden to the client. If not, it forwards the request to the actual server and, when it gets the reply, sends the reply to the client.

2. Your server takes a port number on the command line. It can be an iterative server.

3. Your server will be tested with a browser.

Problem 5

• Suppose you are given the task of testing the validity of links in a given web page. You are expected to test each URL present in the web page and report the result. A URL is of this form:
• http://<domain name>/<directory1>/<directory2>/ … /<filename>
• Testing a URL for validity means testing the existence of the domain name, and the existence of the file in the given path on the remote server.
• To simplify the problem, you can take a list of URLs in a file, one URL per line. Your program takes this file name as a command-line argument. Your program should read each URL and validate it. The result is one of {VALID, INCORRECT DOMAIN, FILE DOESNT EXIST}. Your program should display each URL and its result on one line on the console.

Problem 6

Consider the following network. There are n nodes connected in a ring topology. The communication to any node in the network happens in the clockwise direction, i.e., through the next node. Each node shares a set of files.

The nodes communicate using SUN RPC. When a node joins the network, it invokes connectMe() on the next node and the previous node. The next node and previous node addresses are supplied as command-line arguments. When a node searches for a file, it invokes

void* search(Node n, char* filename)
{
    If search is successful then
        Return the result set
    Else
        return search(nextNode(n), filename);
}

Write the protocol file. Take the help of rpcgen. Develop rpcclient and rpcserver. The demonstration should have all communications printed on the console, indicating the IP, port, file, etc.

ISZC462

Tutorial 2

EC1 solutions

Q1

• Write TCP client and server programs for the following. The connection between client and server is persistent, i.e., multiple requests are sent on the same connection. The client sends N integers to the server. The server sums them up and sends the result back to the client. The server handles clients concurrently. The server also prevents zombie processes from lingering. [10]

Q1 Ans

Protocol:

Client → server: 4 bytes: N, 4 bytes: 1st int, 4 bytes: 2nd int, … until last integer
Server → client: 4 bytes: result

/* Client.c */
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>
#include <netinet/in.h>

void error(char *msg)
{
    perror(msg);
    exit(1);
}

int main(int argc, char *argv[])
{
    int sockfd, portno, n, i, result, N = 0;
    int buf[64];                /* buf[0] = N, then up to 63 integers */
    struct sockaddr_in serv_addr;
    struct hostent *server;

    if (argc < 3) {
        fprintf(stderr, "usage %s hostname port\n", argv[0]);
        exit(1);
    }
    portno = atoi(argv[2]);
    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
        error("ERROR opening socket");
    server = gethostbyname(argv[1]);
    if (server == NULL) {
        fprintf(stderr, "ERROR, no such host\n");
        exit(1);
    }
    bzero((char *) &serv_addr, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    bcopy((char *) server->h_addr_list[0],
          (char *) &serv_addr.sin_addr.s_addr, server->h_length);
    serv_addr.sin_port = htons(portno);
    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0)
        error("ERROR connecting");

    /* Protocol implementation: one request per iteration, same connection */
    printf("Enter number of integers:");
    scanf("%d", &N);
    while (N > 0 && N < 64) {
        buf[0] = N;
        for (i = 0; i < N; i++) {
            printf("Enter number %d:", i + 1);
            scanf("%d", &buf[i + 1]);
        }
        write(sockfd, buf, (N + 1) * 4);
        n = read(sockfd, &result, 4);
        if (n == 0) {
            printf("Server terminated prematurely\n");
            break;
        }
        printf("The result is: %d\n", result);
        printf("Enter number of integers(-1 to exit):");
        scanf("%d", &N);
    }
    close(sockfd);
    return 0;
}

Q1 Ans

/* server.c */
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/socket.h>
#include <netinet/in.h>

void error(char *msg)
{
    perror(msg);
    exit(1);
}

/* Reap every terminated child so that none remains a zombie */
void sigchldhandler(int signo)
{
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;
}

int main(int argc, char *argv[])
{
    int ret, i, N, val, sum, n;
    int sockfd, newsockfd, portno;
    socklen_t clilen;
    struct sockaddr_in serv_addr, cli_addr;

    signal(SIGCHLD, sigchldhandler);
    if (argc < 2) {
        fprintf(stderr, "ERROR, no port provided\n");
        exit(1);
    }
    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
        error("ERROR opening socket");
    bzero((char *) &serv_addr, sizeof(serv_addr));
    portno = atoi(argv[1]);
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = INADDR_ANY;
    serv_addr.sin_port = htons(portno);
    if (bind(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0)
        error("ERROR on binding");
    listen(sockfd, 5);
    for (;;) {
        clilen = sizeof(cli_addr);
        newsockfd = accept(sockfd, (struct sockaddr *) &cli_addr, &clilen);
        if (newsockfd < 0) {
            if (errno == EINTR)
                continue;       /* accept was interrupted by SIGCHLD */
            error("ERROR on accept");
        }
        printf("connection is accepted\n");
        ret = fork();
        if (ret == 0) {         /* child: serve this client */
            close(sockfd);
            n = read(newsockfd, &N, 4);
            while (n > 0) {     /* one request per iteration */
                printf("N=%d\n", N);
                sum = 0;
                for (i = 0; i < N; i++) {
                    n = read(newsockfd, &val, 4);
                    if (n < 0)
                        error("ERROR reading from socket");
                    if (n == 0)
                        exit(0);    /* client closed in mid-request */
                    printf("val[%d]=%d\n", i, val);
                    sum = sum + val;
                }
                printf("sum=%d\n", sum);
                n = write(newsockfd, &sum, 4);
                if (n < 0)
                    error("ERROR writing to socket");
                n = read(newsockfd, &N, 4);
            }
            close(newsockfd);
            return 0;
        } else if (ret > 0) {   /* parent: go back to accept */
            close(newsockfd);
        } else {
            error("ERROR on fork");
        }
    }
}

Q2

1. Write a complete program to implement the shell command ls -l | grep ^d | wc -l that displays the number of subdirectories in the current directory. Use system calls such as exec etc. and pipes for inter-process communication. [8]

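Q2 Ans

#include <stdio.h>
#include <unistd.h>

/* Close both ends of both pipes once the needed ends have been dup2'ed */
static void close_all(int p1[2], int p2[2])
{
    close(p1[0]); close(p1[1]);
    close(p2[0]); close(p2[1]);
}

int main()
{
    int pid, p1[2], p2[2];

    pipe(p1);                       /* grep -> wc */
    pipe(p2);                       /* ls -> grep */
    pid = fork();
    if (pid == 0) {
        pid = fork();
        if (pid > 0) {              /* child: grep ^d */
            dup2(p2[0], 0);
            dup2(p1[1], 1);
            close_all(p1, p2);
            /* no wait() before exec: ls could block on a full pipe */
            execlp("grep", "grep", "^d", (char *) NULL);
        } else if (pid == 0) {      /* grandchild: ls -l */
            dup2(p2[1], 1);
            close_all(p1, p2);
            execlp("ls", "ls", "-l", (char *) NULL);
        }
    } else if (pid > 0) {           /* parent: wc -l */
        dup2(p1[0], 0);
        close_all(p1, p2);
        execlp("wc", "wc", "-l", (char *) NULL);
    }
    perror("fork or exec failed");  /* reached only on failure */
    return 1;
}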

Q3

What is a connected UDP socket? How is it created? What are the advantages of using it?

A connected UDP socket is one for which the UDP layer remembers the association between the local and remote end points. By default this does not happen in UDP. It is achieved by calling connect() on the socket. The advantage is that asynchronous errors from the network are reported to the process. Also, only one destination can communicate with the socket, because the kernel discards datagrams from any other address; this gives some protection against spoofing.
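Creating a connected UDP socket can be sketched as follows. The helper name connected_udp_socket is hypothetical and the sketch handles IPv4 dotted-decimal addresses only.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns a UDP socket connected to ip:port, or -1 on error.
 * No packet is sent: connect() only records the peer in the kernel. */
int connected_udp_socket(const char *ip, unsigned short port)
{
    struct sockaddr_in peer;
    int fd;

    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &peer.sin_addr) != 1)
        return -1;                  /* not a valid IPv4 address */
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *) &peer, sizeof(peer)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After this call the process can use write() or send() without naming the destination, and getpeername() returns the recorded peer.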

Q3

Normally, whenever a socket is closed using the close() system call, the TCP termination sequence is initiated. In a concurrent TCP server, when the parent server process closes the connection socket, TCP termination is not initiated. Why?

close() initiates the termination sequence only if the reference count of the socket descriptor reaches zero. When a new connection comes to the server, a child process is created with fork(), so the reference count of the connection descriptor is 2. If the parent closes the socket, the count becomes 1, so the termination sequence does not start.
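The reference-count behaviour can be observed without a network. The sketch below makes two substitutions, both assumptions for the sake of a self-contained example: it uses a UNIX-domain socketpair() instead of a TCP connection, and dup() instead of fork() to create the second reference to the same socket. The function name is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the peer sees end-of-file only after the LAST descriptor
 * referring to the socket is closed, 0 otherwise, -1 on setup error. */
int eof_only_after_last_close(void)
{
    int sv[2], copy, still_open;
    char c;
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    copy = dup(sv[0]);              /* second reference, as after fork() */
    if (copy < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    fcntl(sv[1], F_SETFL, O_NONBLOCK);

    close(sv[0]);                   /* count drops from 2 to 1 */
    n = read(sv[1], &c, 1);         /* no EOF yet: would block */
    still_open = (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK));

    close(copy);                    /* count reaches 0: socket is closed */
    n = read(sv[1], &c, 1);         /* now read() reports EOF */
    close(sv[1]);
    return still_open && n == 0;
}
```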

Q3

Why is a signal generated for the writer of a FIFO after the reader disappears, but not for the reader of a FIFO after its writer disappears?

Consider cat Bigfile | grep pattern | compute. If some error occurs in compute and it terminates, how will the grep process come to know about it? A filter program such as grep does not know, and has no way of knowing, that its output has been redirected, and the return values of writes to stdout are rarely checked. So the only way to make it stop writing to a broken pipe when compute crashes is a signal (SIGPIPE). The reader needs no signal: once the writer disappears, read() returns 0 (end of file).
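Both sides of this asymmetry can be seen with an ordinary pipe. In the sketch below (function names hypothetical) SIGPIPE is ignored so that the failed write can be inspected instead of killing the process; note that this changes the signal disposition for the whole process.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Writer side: the reader is gone. With SIGPIPE ignored, write() fails
 * and the function returns errno (expected: EPIPE). Returns 0 if the
 * write succeeded, -1 if the pipe could not be created. */
int write_after_reader_gone(void)
{
    int p[2], err = 0;

    if (pipe(p) < 0)
        return -1;
    signal(SIGPIPE, SIG_IGN);       /* default action would kill us */
    close(p[0]);                    /* the reader disappears */
    if (write(p[1], "x", 1) < 0)
        err = errno;
    close(p[1]);
    return err;
}

/* Reader side: the writer is gone. read() just returns 0 (EOF),
 * so no signal is needed. Returns -1 if the pipe could not be created. */
long read_after_writer_gone(void)
{
    int p[2];
    char c;
    long n;

    if (pipe(p) < 0)
        return -1;
    close(p[1]);                    /* the writer disappears */
    n = read(p[0], &c, 1);
    close(p[0]);
    return n;
}
```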

Q3

Write two advantages of using message queues over pipes.

– Message queues preserve message boundaries, whereas pipes are stream based.
– In message queues, messages can be retrieved in any order (for example, by message type), whereas in pipes data is always retrieved in FIFO order.
– Message queues can be operated asynchronously, whereas pipes are strictly synchronous.
– Message queues are full duplex, whereas pipes are half duplex.

Q4

Write a program ‘myprogram’ that takes the executable name and its arguments on the command line and executes it. Don’t use the system() call.

$ myprogram exe arg1 arg2 arg3 ……..argn

#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    execvp(argv[1], argv + 1);      /* returns only on failure */
    perror("execvp");
    return 1;
}

Q4

Write a piece of code that is necessary for creating a shared memory segment and mapping it onto a process.

/* needs <sys/ipc.h> and <sys/shm.h>; key_t key; int shmid; char *data; */
key = ftok("shmget.c", 'R');
if ((shmid = shmget(key, 1024, 0644 | IPC_CREAT)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
}
data = shmat(shmid, (void *) 0, 0);
if (data == (void *) -1) {
    perror("shmat");
    exit(1);
}

Q5

Consider the following program.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int glob = 6;

int main()
{
    int var;
    pid_t pid;

    var = 88;
    if (!fork()) {
        glob++;
        var++;
        printf("Child: pid = %d, glob=%d, var=%d\n", getpid(), glob, var);
    }
    glob++;
    var++;
    printf("pid = %d, glob=%d, var=%d\n", getpid(), glob, var);
    exit(0);
}

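Q5 Ans

Write the output of the above program. Assume appropriate logical pids for parent and child. [3]

Child: pid = 11710, glob=7, var=89
pid = 11710, glob=8, var=90
pid = 11709, glob=7, var=89

The child falls through the if block, so it prints twice. The relative order of the parent's line and the child's lines depends on scheduling.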

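Q5 Ans

Modify the above program such that the child starts printing only after the parent has printed.

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

void usr1_handler(int signo)
{
    return;
}

int glob = 6;

int main()
{
    int var, st;
    pid_t pid;
    sigset_t block, old;

    var = 88;
    /* Install the handler and block SIGUSR1 before fork(), so the signal
       cannot be lost if the parent sends it before the child waits */
    signal(SIGUSR1, usr1_handler);
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &old);
    pid = fork();
    if (pid == 0) {
        glob++;
        var++;
        sigsuspend(&old);           /* atomically unblock and wait */
        printf("pid = %d, glob=%d, var=%d\n", getpid(), glob, var);
    }
    if (pid > 0) {
        glob++;
        var++;
        printf("pid = %d, glob=%d, var=%d\n", getpid(), glob, var);
        fflush(stdout);
        kill(pid, SIGUSR1);         /* let the child proceed */
        wait(&st);
    }
    exit(0);
}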

Q5 Ans

Modify the above program such that the parent waits for the child to exit and prints the child’s status.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int glob = 6;

int main()
{
    int var, st;
    pid_t pid;

    var = 88;
    pid = fork();
    if (pid == 0) {
        glob++;
        var++;
        printf("pid = %d, glob=%d, var=%d\n", getpid(), glob, var);
    }
    if (pid > 0) {
        wait(&st);                  /* block until the child exits */
        glob++;
        var++;
        printf("pid = %d, glob=%d, var=%d\n", getpid(), glob, var);
        if (WIFEXITED(st))
            printf("child exit status = %d\n", WEXITSTATUS(st));
    }
    exit(0);
}
