1 Introduction to Computer Networks Slides courtesy: T. S. Eugene Ng
Page 1: NetWork

Introduction to Computer Networks

Slides courtesy: T. S. Eugene Ng

Page 2: NetWork

Organizing Network Functionality

• Many kinds of networking functionality
  – e.g., encoding, framing, routing, addressing, reliability, etc.

• Many different network styles and technologies
  – circuit-switched vs packet-switched, etc.
  – wireless vs wired vs optical, etc.

• Many different applications
  – ftp, email, web, P2P, etc.

• Network architecture
  – How should different pieces be organized?
  – How should different pieces interact?

Page 3: NetWork

Problem

• A new application has to interface to all existing media
  – adding a new application requires O(m) work, m = number of media

• New media require all existing applications to be modified
  – adding a new medium requires O(a) work, a = number of applications

• Total work in the system is O(ma); eventually there is too much work to add apps/media

• Application end points may not be on the same media!

[Figure: each application (SMTP, SSH, FTP, HTTP) interfaces directly with each transmission medium (packet radio, coaxial cable, fiber optic)]

Page 4: NetWork

Solution: Indirection

• Solution: introduce an intermediate layer that provides a single abstraction for various network technologies
  – O(1) work to add app/media
  – Indirection is an often-used technique in computer science

[Figure: applications (SMTP, SSH, NFS, HTTP) interface only with an intermediate layer, which in turn interfaces with the transmission media (802.11 LAN, coaxial cable, fiber optic)]

Page 5: NetWork

Network Architecture

• Architecture is not the implementation itself

• Architecture is how to “organize” implementations
  – what interfaces are supported
  – where functionality is implemented

• Architecture is the modular design of the network

Page 6: NetWork

Software Modularity

Break system into modules:

• Well-defined interfaces give flexibility
  – can change implementation of modules
  – can extend functionality of system by adding new modules

• Interfaces hide information
  – allows for flexibility
  – but can hurt performance

Page 7: NetWork

Network Modularity

Like software modularity, but with a twist:

• Implementation distributed across routers and hosts

• Must decide both:
  – how to break system into modules
  – where modules are implemented

Page 8: NetWork

Outline

• Layering
  – how to break network functionality into modules

• The End-to-End Argument
  – where to implement functionality

Page 9: NetWork

Layering

• Layering is a particular form of modularization

• The system is broken into a vertical hierarchy of logically distinct entities (layers)

• The service provided by one layer is based solely on the service provided by the layer below

• Rigid structure: easy reuse, performance suffers

Page 10: NetWork

ISO OSI Reference Model

• ISO – International Organization for Standardization
• OSI – Open Systems Interconnection
• Goal: a general open standard
  – allow vendors to enter the market by using their own implementation and protocols

Page 11: NetWork

ISO OSI Reference Model

• Seven layers
  – Lower two layers are peer-to-peer
  – Network layer involves multiple switches
  – Next four layers are end-to-end

[Figure: Host 1 and Host 2 each run all seven layers (Application, Presentation, Session, Transport, Network, Datalink, Physical); an intermediate switch runs only the Network, Datalink, and Physical layers and connects physical medium A to physical medium B]

Page 12: NetWork

Layering Solves Problem

• Application layer doesn’t know about anything below the presentation layer, etc.

• Information about network is hidden from higher layers

• This ensures that we only need to implement an application once!

Page 13: NetWork

Key Concepts

• Service – says what a layer does
  – Ethernet: unreliable subnet unicast/multicast/broadcast datagram service
  – IP: unreliable end-to-end unicast datagram service
  – TCP: reliable end-to-end bi-directional byte stream service
  – Guaranteed bandwidth/latency unicast service

• Service Interface – says how to access the service
  – e.g., UNIX socket interface

• Protocol – says how the service is implemented
  – a set of rules and formats that govern the communication between two peers
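As a small illustration of the socket service interface, the sketch below uses the POSIX socketpair() call to create two connected UNIX-domain stream sockets and passes a message from one end to the other with write() and read(). It is an assumption-laden example rather than part of the slides: the helper name socket_roundtrip is hypothetical, and a real TCP client would instead call socket(AF_INET, SOCK_STREAM, 0) followed by connect().

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write msg into one end of a connected pair of
 * UNIX-domain stream sockets and read it back from the other end.
 * Returns the number of bytes read, or -1 on error. */
ssize_t socket_roundtrip(const char *msg, char *out, size_t outlen)
{
    int sv[2];
    ssize_t n = -1;

    if (outlen == 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    /* The application sees only the service interface: bytes written on
     * one descriptor come out, in order, on the other. */
    if (write(sv[0], msg, strlen(msg)) >= 0) {
        n = read(sv[1], out, outlen - 1);
        if (n >= 0)
            out[n] = '\0';
    }
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

A single read() suffices for this short message; in general a stream socket may return fewer bytes than were written, so real code loops until it has what it needs.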

Page 14: NetWork

Physical Layer (1)

• Service: move information between two systems connected by a physical link

• Interface: specifies how to send a bit

• Protocol: coding scheme used to represent a bit, voltage levels, duration of a bit

• Examples: coaxial cable, optical fiber links; transmitters, receivers
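To make "coding scheme used to represent a bit" concrete, here is a minimal sketch of Manchester encoding, the line code of classic 10 Mbit/s Ethernet. It assumes the IEEE 802.3 convention (a 0 bit is a high-to-low transition, a 1 bit is low-to-high) and represents each half-bit signal level as 0 (low) or 1 (high); the function names are hypothetical.

```c
#include <stdint.h>

/* Encode one byte, most significant bit first, into 16 half-bit levels.
 * IEEE 802.3 convention: bit 1 -> low then high, bit 0 -> high then low.
 * Every bit has a mid-bit transition, which lets the receiver recover
 * the clock. */
void manchester_encode_byte(uint8_t byte, uint8_t levels[16])
{
    for (int i = 0; i < 8; i++) {
        int bit = (byte >> (7 - i)) & 1;
        levels[2 * i]     = bit ? 0 : 1;  /* first half of the bit period */
        levels[2 * i + 1] = bit ? 1 : 0;  /* second half of the bit period */
    }
}

/* Decode 16 half-bit levels back into a byte; returns -1 if some bit
 * period lacks the mandatory mid-bit transition. */
int manchester_decode_byte(const uint8_t levels[16])
{
    int byte = 0;

    for (int i = 0; i < 8; i++) {
        uint8_t first = levels[2 * i], second = levels[2 * i + 1];
        if (first == second)
            return -1;                          /* no transition: coding error */
        byte = (byte << 1) | (second > first);  /* low -> high means 1 */
    }
    return byte;
}
```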

Page 15: NetWork

Datalink Layer (2)

• Service:
  – framing (attach frame separators)
  – send data frames between peers
  – others:
    • arbitrate the access to common physical media
    • per-hop reliable transmission
    • per-hop flow control

• Interface: send a data unit (packet) to a machine connected to the same physical media

• Protocol: layer addresses, implement Medium Access Control (MAC) (e.g., CSMA/CD)…
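Framing needs a way to mark frame boundaries even when the payload itself contains the separator. One common technique is byte stuffing as in PPP's HDLC-like framing (RFC 1662): frames are delimited by the flag byte 0x7E, and any 0x7E or 0x7D inside the payload is sent as the escape byte 0x7D followed by the original byte XOR 0x20. The sketch below follows that convention but omits PPP's address, control, and FCS fields; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define FLAG 0x7E  /* frame separator */
#define ESC  0x7D  /* escape byte */

/* Build a frame: FLAG, stuffed payload, FLAG. `out` must hold
 * 2 * n + 2 bytes (worst case: every byte escaped). Returns frame length. */
size_t frame_encode(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t k = 0;

    out[k++] = FLAG;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == FLAG || in[i] == ESC) {
            out[k++] = ESC;
            out[k++] = in[i] ^ 0x20;   /* stuffed byte can't look like FLAG */
        } else {
            out[k++] = in[i];
        }
    }
    out[k++] = FLAG;
    return k;
}

/* Recover the payload from a frame produced by frame_encode.
 * Returns payload length, or (size_t)-1 if the frame is malformed. */
size_t frame_decode(const uint8_t *frame, size_t n, uint8_t *out)
{
    size_t k = 0;

    if (n < 2 || frame[0] != FLAG || frame[n - 1] != FLAG)
        return (size_t)-1;
    for (size_t i = 1; i < n - 1; i++) {
        if (frame[i] == FLAG)
            return (size_t)-1;         /* unescaped flag inside the frame */
        if (frame[i] == ESC) {
            if (++i >= n - 1)
                return (size_t)-1;     /* escape with nothing after it */
            out[k++] = frame[i] ^ 0x20;
        } else {
            out[k++] = frame[i];
        }
    }
    return k;
}
```

For the payload 01 7E 02 7D, the frame is 7E 01 7D 5E 02 7D 5D 7E: the receiver can find frame boundaries by scanning for 0x7E alone.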

Page 16: NetWork

Network Layer (3)

• Service:
  – deliver a packet to specified network destination
  – perform segmentation/reassembly
  – others:
    • packet scheduling
    • buffer management

• Interface: send a packet to a specified destination

• Protocol: define globally unique addresses; construct routing tables
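Routers consult the routing table with longest-prefix matching: among all entries whose prefix matches the destination address, the most specific (longest) one wins. The sketch below is a linear scan over a small, hypothetical IPv4 table; the prefixes and next-hop names are invented for illustration, and production routers use tries or specialized hardware rather than a linear scan.

```c
#include <stddef.h>
#include <stdint.h>

/* Build an IPv4 address in host byte order from dotted-quad parts. */
#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

struct route {
    uint32_t    prefix;    /* network address, host byte order */
    int         len;       /* prefix length in bits, 0..32 */
    const char *next_hop;  /* hypothetical next-hop name */
};

/* Netmask with the top `len` bits set; len == 0 matches everything. */
static uint32_t mask_of(int len)
{
    return len == 0 ? 0 : 0xFFFFFFFFu << (32 - len);
}

/* Linear-scan longest-prefix match: return the next hop of the most
 * specific matching route, or NULL if no route matches. */
const char *lookup(const struct route *table, size_t n, uint32_t dst)
{
    const struct route *best = NULL;

    for (size_t i = 0; i < n; i++) {
        uint32_t m = mask_of(table[i].len);
        if ((dst & m) == (table[i].prefix & m) &&
            (best == NULL || table[i].len > best->len))
            best = &table[i];
    }
    return best ? best->next_hop : NULL;
}
```

With entries 0.0.0.0/0, 10.0.0.0/8, and 10.1.0.0/16, the address 10.1.2.3 matches all three and goes to the /16 next hop, while 192.168.0.1 only matches the default route.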

Page 17: NetWork

Transport Layer (4)

• Service:
  – Multiplexing/demultiplexing
  – optional: error-free and flow-controlled delivery

• Interface: send message to specific destination

• Protocol: implements reliability and flow control

• Examples: TCP and UDP
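One ingredient of error-free delivery in TCP and UDP is the Internet checksum (RFC 1071): the one's complement of the one's-complement sum of the data taken as 16-bit words. The sketch below computes it over a plain byte buffer; real TCP and UDP also include a pseudo-header in the sum, which is omitted here.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over `len` bytes, treated as big-endian
 * 16-bit words; an odd trailing byte is padded with a zero byte. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;

    /* Fold carries back into the low 16 bits (end-around carry). */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}
```

For the bytes 00 01 f2 03 f4 f5 f6 f7 used as an example in RFC 1071, the folded sum is 0xddf2 and the checksum is 0x220d. A receiver that runs the same function over the data followed by the checksum gets 0 when no error is detected.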

Page 18: NetWork

Session Layer (5)

• Service:
  – full-duplex
  – access management (e.g., token control)
  – synchronization (e.g., provide checkpoints for long transfers)

• Interface: depends on service

• Protocol: token management; insert checkpoints, implement roll-back functions

Page 19: NetWork

Presentation Layer (6)

• Service: convert data between various representations

• Interface: depends on service

• Protocol: define data formats, and rules to convert from one format to another

Page 20: NetWork

Application Layer (7)

• Service: any service provided to the end user

• Interface: depends on the application

• Protocol: depends on the application

• Examples: FTP, Telnet, WWW browser

Page 21: NetWork

Who Does What?

[Figure: Host A and Host B each run all seven layers (Application, Presentation, Session, Transport, Network, Datalink, Physical); a router between them runs only the Network, Datalink, and Physical layers; all are connected by the physical medium]

Page 22: NetWork

Logical Communication

• Each layer interacts with the corresponding layer on its peer

[Figure: layer stacks of Host A, the Router, and Host B, connected by the physical medium]

Page 23: NetWork

Physical Communication

• Communication goes down to the physical network, then to the peer, then up to the relevant layer

[Figure: layer stacks of Host A, the Router, and Host B, connected by the physical medium]

Page 24: NetWork

Encapsulation

• A layer can use only the service provided by the layer immediately below it

• Each layer may change the data packet and add its own header to it

[Figure: data passing down through the layers, each layer adding its header]

Page 25: NetWork

Example: Postal System

Standard process (historical):
• Write letter
• Drop an addressed letter off in your local mailbox
• Postal service delivers to address
• Addressee reads letter (and perhaps responds)

Page 26: NetWork

Postal Service as Layered System

Layers:
• Letter writing/reading
• Delivery

Information Hiding:
• Network need not know letter contents
• Customer need not know how the postal network works

Encapsulation:
• Envelope

[Figure: Customer and Post Office layers at each end]

Page 27: NetWork

Functions of the Layers

Application Layer
  – Service: Handles details of application programs.
  – Examples: telnet, ftp, email, www, AFS

Transport Layer
  – Service: Controls delivery of data between hosts.
  – Functions: Connection establishment/termination, error control, flow control, congestion control, etc.
  – Examples: TCP, UDP

Network Layer
  – Service: Moves packets inside the network.
  – Functions: Routing, addressing, switching, etc.
  – Examples: IP, ICMP, OSPF, RIP, BGP

(Data) Link Layer
  – Service: Reliable transfer of frames over a link.
  – Functions: Synchronization, error control, flow control, etc.
  – Examples: Ethernet, WiFi, T1

Page 28: NetWork

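Internet Protocol Architecture

[Figure: on one host, an FTP program runs over TCP, IP, and an Ethernet driver; a router runs IP over an Ethernet driver and an ATM driver; on the other host, an FTP program runs over TCP, IP, and an ATM driver. The FTP protocol and the TCP protocol run end to end between the hosts, the IP protocol runs hop by hop (host to router, router to host), and the Ethernet protocol and ATM protocol run on the respective links]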

Page 29: NetWork

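Internet Protocol Architecture

[Figure: on one host, an MPEG server program runs over UDP, IP, and an Ethernet driver; a router runs IP over an Ethernet driver and an ATM driver; on the other host, an MPEG player program runs over UDP, IP, and an ATM driver. The RTP protocol and the UDP protocol run end to end between the hosts, the IP protocol runs hop by hop, and the Ethernet protocol and ATM protocol run on the respective links]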

Page 30: NetWork

Encapsulation

• As data moves down the protocol stack, each protocol adds layer-specific control information.

  Application:      [Application header][User data]
  TCP segment:      [TCP header][Application data]
  IP datagram:      [IP header][TCP header][Application data]
  Ethernet frame:   [Ethernet header][IP header][TCP header][Application data][Ethernet trailer]
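The encapsulation steps above can be mimicked with a buffer to which each layer prepends its own header. In this sketch the headers are hypothetical one-byte placeholders ('A', 'T', 'I', 'E') and the Ethernet trailer is a single placeholder byte 'e', not real TCP, IP, or Ethernet formats; only the ordering is meant to be faithful.

```c
#include <stddef.h>
#include <string.h>

/* A packet buffer with headroom so that lower layers can prepend headers
 * without moving the data. No bounds checking: this sketch assumes small
 * payloads (well under 128 bytes). */
struct pkt {
    unsigned char buf[256];
    size_t start;   /* index of the first valid byte */
    size_t end;     /* index one past the last valid byte */
};

static void pkt_init(struct pkt *p, const char *data, size_t n)
{
    p->start = p->end = 128;                  /* leave 128 bytes of headroom */
    memcpy(p->buf + p->start, data, n);
    p->end += n;
}

/* A layer treats everything it received from above as opaque data and
 * puts its own control information in front of it. */
static void pkt_push_header(struct pkt *p, const char *hdr, size_t n)
{
    p->start -= n;
    memcpy(p->buf + p->start, hdr, n);
}

static void pkt_append_trailer(struct pkt *p, const char *trl, size_t n)
{
    memcpy(p->buf + p->end, trl, n);
    p->end += n;
}

/* Sending side: application -> TCP -> IP -> Ethernet driver.
 * Returns the length of the resulting frame. */
size_t encapsulate(struct pkt *p, const char *user_data)
{
    pkt_init(p, user_data, strlen(user_data));
    pkt_push_header(p, "A", 1);     /* application header (placeholder) */
    pkt_push_header(p, "T", 1);     /* TCP header  -> TCP segment */
    pkt_push_header(p, "I", 1);     /* IP header   -> IP datagram */
    pkt_push_header(p, "E", 1);     /* Ethernet header ... */
    pkt_append_trailer(p, "e", 1);  /* ... plus trailer -> Ethernet frame */
    return p->end - p->start;
}
```

Encapsulating "hi" yields the 7-byte frame "EITAhie". The receiver reverses the process: each layer strips its own header (here, by advancing start) before passing the rest up.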

Page 31: NetWork

Hourglass

Note: Additional protocols like routing protocols (RIP, OSPF) are needed to make IP work

Page 32: NetWork

Implications of Hourglass

A single Internet layer module:

• Allows all networks to interoperate
  – all network technologies that support IP can exchange packets

• Allows all applications to function on all networks
  – all applications that can run on IP can use any network

• Simultaneous developments above and below IP

Page 33: NetWork

Reality

• Layering is a convenient way to think about networks
• But layering is often violated
  – Firewalls
  – Transparent caches
  – NAT boxes

Page 34: NetWork

Summary

• Layering is a good way to organize network functions

• Unified Internet layer decouples apps from networks

• E2E argument argues to keep IP simple

• Be judicious when thinking about adding to the network layer

Page 35: NetWork

OSI & Internet protocol suite

Page 36: NetWork

Where do we work?

• Sockets API
• X/Open Transport Interface (XTI)

Page 37: NetWork

Two reasons for this design

• The upper three layers handle all the details of the application and know little about the communication details, e.g., sending and receiving data

• The upper three layers form a user process, while the lower four layers are provided as part of the operating system kernel.

Page 38: NetWork

About kernel

Page 39: NetWork

Kernel

• The part of the operating system that is mandatory and common to all other software

• Simply the name given to the lowest level of abstraction that is implemented in software

Page 40: NetWork

Functionalities of Kernel

• Process Management
• Memory Management
• Device Management
• System Calls

Page 41: NetWork

Process Management

• To start a program, a kernel typically sets up an address space for the process, loads the file containing the code into memory, sets up a stack for the program, and branches to a given location inside the program, thus starting its execution

Page 42: NetWork

Memory Management

• The kernel has full access to the system's memory and must allow processes to safely access this memory as they require it.

• Virtual addressing allows the kernel to make a given physical address appear to be another address, the virtual address.

• Virtual address spaces may be different for different processes.

Page 43: NetWork
Page 44: NetWork

Device Management

• Processes need access to the peripherals connected to the computer, which are controlled by the kernel through device drivers.

• For example, to show the user something on the screen, an application would make a request to the kernel, which would forward the request to its display driver, which is then responsible for actually plotting the character/pixel

Page 45: NetWork

System Calls

• A process must be able to access the services provided by the kernel. This is implemented differently by each kernel, but most provide a C library or an API, which in turn invokes the related kernel functions

• Implemented using software interrupts or trap instructions
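On Linux with glibc, the C library's getpid() is a thin wrapper around the getpid system call, and the generic syscall() function (a Linux/glibc extension, not POSIX) issues the same system call directly. The sketch below assumes such a system, where SYS_getpid is defined in <sys/syscall.h>; the helper name is hypothetical.

```c
#define _GNU_SOURCE            /* expose syscall() in glibc's <unistd.h> */
#include <sys/syscall.h>       /* SYS_getpid (Linux) */
#include <sys/types.h>
#include <unistd.h>

/* Linux/glibc-specific sketch: the library wrapper getpid() and a raw
 * syscall(SYS_getpid) both end up in the same kernel code, so they must
 * agree. Returns 1 if they do. */
int pids_match(void)
{
    pid_t via_library = getpid();              /* C library API */
    long  via_trap    = syscall(SYS_getpid);   /* direct system call */

    return (long)via_library == via_trap;
}
```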

Page 46: NetWork

Programs and Processes

• A program is an executable file residing on disk. A program is read into memory and executed by the kernel

• An executing instance of a program is called a process

• Every process has a unique non-negative identifier called process id (PID)

Page 47: NetWork

Process Environment

• What happens when we execute a C program? ./a.out

• How are the command-line arguments passed to the process?

• Memory layout of a process

Page 48: NetWork

What happens when we execute a C program?

• int main(int argc, char *argv[]);

• When a C program is executed by the kernel by one of the exec functions, a special start-up routine is called before the main function is called.

• The executable program file specifies this routine as the starting address for the program.

• This start-up routine takes values from the kernel (the command-line arguments and the environment) and sets things up so that main is called.

Page 49: NetWork
Page 50: NetWork
Page 51: NetWork

Memory Layout of C Program

• Code – text segment
• Initialized data – data segment
• Uninitialized data – bss segment
• Heap
• Stack

Page 52: NetWork

Memory Layout of C Program

• Code – text segment
  – Machine instructions that the CPU executes
  – Sharable
  – Read-only

Page 53: NetWork

Memory Layout of C Program

• Initialized data – data segment
  – A variable declared outside any function with a non-zero initial value is stored in the initialized data segment together with that value.
  – Statically allocated and global data that are initialized with nonzero values live in the data segment

Page 54: NetWork

Memory Layout of C Program

• Uninitialized data – bss segment
  – BSS stands for ‘Block Started by Symbol’.
  – Global and statically allocated data that are initialized to zero by default are kept here

Page 55: NetWork
Page 56: NetWork

Memory Layout

• Stack
  – The stack segment is where local (automatic) variables are allocated.
  – Data is pushed onto and popped off the stack following the Last In First Out (LIFO) rule.
  – When a function is called, a stack frame is created and PUSHed onto the top of the stack. This stack frame contains information such as the address from which the function was called and where to jump back to when the function is finished (return address), parameters, local variables, and any other information needed by the invoked function.
  – When a function returns, the stack frame is POPped from the stack. Typically the stack grows downward, meaning that items deeper in the call chain are at numerically lower addresses and toward the heap.

Page 57: NetWork

Stack

Page 58: NetWork

Memory Layout of C Program

• Heap
  – The heap is where dynamic memory (obtained by malloc(), calloc(), realloc()) comes from.
  – It is typical for the heap to grow upward. This means that successive items that are added to the heap are added at addresses that are numerically greater than previous items.
  – The end of the heap is marked by a pointer known as the break. You cannot reference past the break. You can, however, move the break pointer (via the brk() system call or the sbrk() library function) to a new position to increase the amount of heap memory available.
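The break can be observed with sbrk(), which moves it by a given increment and returns the previous break; sbrk(0) just reports the current one. This sketch assumes a Linux/glibc-like system where sbrk() is still available (brk() and sbrk() were removed from POSIX, and ordinary programs should use malloc() instead); the helper name is hypothetical.

```c
#define _DEFAULT_SOURCE        /* expose sbrk() in glibc's <unistd.h> */
#include <stdint.h>
#include <unistd.h>

/* Move the break up by `incr` bytes, measure how far it moved, then move
 * it back down. Returns the distance, or -1 if sbrk() fails. */
long break_growth(intptr_t incr)
{
    char *before = sbrk(0);                 /* current break */
    if (sbrk(incr) == (void *)-1)           /* returns the old break on success */
        return -1;
    char *after = sbrk(0);                  /* new break */
    sbrk(-incr);                            /* release the memory again */
    return (long)(after - before);
}
```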

Page 59: NetWork

Environment Variables

• Stored in process memory
• Set of parameters that are inherited from process to process.
• Each program is also passed an environment list, like the argument list.
• The environment list is an array of character pointers, each pointing to a null-terminated string of the form name=value.

Page 60: NetWork

Environment Variables

Page 61: NetWork

Listing all arguments and environment vars

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int i;
    char **ptr;
    extern char **environ;

    for (i = 0; i < argc; i++)                 /* echo all command-line args */
        printf("argv[%d]: %s\n", i, argv[i]);
    for (ptr = environ; *ptr != NULL; ptr++)   /* and all env strings */
        printf("%s\n", *ptr);
    exit(0);
}

Page 62: NetWork

Functions to access environment variables
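The slide's own list of functions is not reproduced here. As an assumption-labeled sketch, the commonly used calls are getenv() (ISO C) and setenv()/unsetenv() (POSIX); the helper below exercises them on a hypothetical variable named DEMO_VAR.

```c
#define _POSIX_C_SOURCE 200809L  /* setenv()/unsetenv() are POSIX */
#include <stdlib.h>
#include <string.h>

/* Set, read, and remove the hypothetical variable DEMO_VAR.
 * Returns 1 if every step behaves as described, 0 otherwise. */
int env_demo(void)
{
    const char *v;

    if (setenv("DEMO_VAR", "hello", 1) != 0)    /* 1: overwrite if present */
        return 0;
    v = getenv("DEMO_VAR");                      /* points into the environment list */
    if (v == NULL || strcmp(v, "hello") != 0)
        return 0;
    if (setenv("DEMO_VAR", "ignored", 0) != 0)   /* 0: keep an existing value */
        return 0;
    v = getenv("DEMO_VAR");
    if (v == NULL || strcmp(v, "hello") != 0)
        return 0;
    if (unsetenv("DEMO_VAR") != 0)               /* remove the name=value string */
        return 0;
    return getenv("DEMO_VAR") == NULL;
}
```

Child processes created after such changes inherit the modified environment list.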

Page 63: NetWork

Process Control

• Every process has a unique process ID, a non-negative integer.

• Although unique, process IDs are reused. As processes terminate, their IDs become candidates for reuse.

• Process ID 0 is usually the scheduler process and is often known as the swapper.

Page 64: NetWork

Process Control

• Process ID 1 is usually the init process and is invoked by the kernel at the end of the bootstrap procedure. This process is responsible for bringing up a UNIX system after the kernel has been bootstrapped.

• The init process never dies. It is a normal user process, not a system process within the kernel, although it does run with super user privileges.

• init becomes the parent process of any orphaned child process.

Page 65: NetWork

Process Identifiers

#include <unistd.h>
• pid_t getpid(void);    Returns: process ID of calling process
• pid_t getppid(void);   Returns: parent process ID of calling process
• uid_t getuid(void);    Returns: real user ID of calling process
• uid_t geteuid(void);   Returns: effective user ID of calling process
• gid_t getgid(void);    Returns: real group ID of calling process
• gid_t getegid(void);   Returns: effective group ID of calling process

Page 66: NetWork

fork()

• An existing process can create a new one by calling the fork function.

#include <unistd.h>
pid_t fork(void);
Returns: 0 in child, process ID of child in parent, -1 on error

• The new process created by fork is called the child process. This function is called once but returns twice. The only difference in the returns is that the return value in the child is 0, whereas the return value in the parent is the process ID of the new child.

Page 67: NetWork

fork()

• Both the child and the parent continue executing with the instruction that follows the call to fork.

• The child is a copy of the parent. For example, the child gets a copy of the parent's data space, heap, and stack. Note that this is a copy for the child; the parent and the child do not share these portions of memory. The parent and the child share the text segment

Page 68: NetWork

copy-on-write (COW)

• Many implementations don't perform a complete copy of the parent's data, stack, and heap

• These regions are shared by the parent and the child and have their protection changed by the kernel to read-only

• If either process tries to modify these regions, the kernel then makes a copy of that piece of memory only, typically a "page" in a virtual memory system.

Page 69: NetWork

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int glob = 6;    /* global variable */

int main(void)
{
    int var;
    pid_t pid;

    var = 88;
    printf("Before fork\n");
    if ((pid = fork()) < 0) {
        perror("fork");          /* print the error that occurred */
        exit(1);
    } else if (pid == 0) {       /* child: modifies its own copies */
        glob++;
        var++;
        printf("pid = %d, glob=%d, var=%d\n", (int)getpid(), glob, var);
        exit(0);
    } else {                     /* parent: its values are unchanged */
        printf("pid = %d, glob=%d, var=%d\n", (int)getpid(), glob, var);
        exit(0);
    }
}

Page 70: NetWork

fork()

• In general, we never know whether the child starts executing before the parent or vice versa. This depends on the scheduling algorithm used by the kernel.

• To synchronize child and parent, some form of interprocess communication is required.
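A pipe is one simple form of interprocess communication that can order the two processes, because read() on an empty pipe blocks until the other side writes. In the sketch below (helper name hypothetical) the parent blocks until the child reports that it has run, then reaps the child with waitpid().

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that writes one byte into a pipe; the parent blocks in
 * read() until that byte arrives. Returns the byte read, or -1 on error. */
int wait_for_child_ready(void)
{
    int fd[2];
    int result = -1;
    char c;
    pid_t pid;

    if (pipe(fd) < 0)
        return -1;
    if ((pid = fork()) < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {                   /* child */
        close(fd[0]);                 /* child only writes */
        (void)write(fd[1], "R", 1);   /* "ready" */
        _exit(0);                     /* skip stdio flushing in the child */
    }
    close(fd[1]);                     /* parent only reads */
    if (read(fd[0], &c, 1) == 1)      /* blocks until the child writes */
        result = (unsigned char)c;
    close(fd[0]);
    waitpid(pid, NULL, 0);            /* reap the child so it is not left a zombie */
    return result;
}
```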

Page 71: NetWork

File sharing between parent and child

• One characteristic of fork is that all file descriptors that are open in the parent are duplicated in the child.

• The parent and the child share a file table entry for every open descriptor.

• Generally, the shell has three files open: standard input, standard output, and standard error. When a command is executed as a process, these descriptors are inherited.
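Because parent and child share the file table entry, they also share the file offset. In the sketch below (hypothetical helper, using an anonymous file from tmpfile()) the child writes first and the parent, after waiting for it, writes second; the parent's data lands after the child's instead of overwriting it, even though the parent never seeks.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child writes "child\n", then the parent writes "parent\n" through the
 * same inherited descriptor; the file is read back into `out`.
 * Returns the number of bytes read back, or -1 on error. */
ssize_t shared_offset_demo(char *out, size_t outlen)
{
    FILE *fp = tmpfile();                 /* anonymous temporary file */
    ssize_t n;
    pid_t pid;
    int fd;

    if (fp == NULL || outlen == 0)
        return -1;
    fd = fileno(fp);
    if ((pid = fork()) < 0) {
        fclose(fp);
        return -1;
    }
    if (pid == 0) {                       /* child: shared offset 0 -> 6 */
        (void)write(fd, "child\n", 6);
        _exit(0);
    }
    waitpid(pid, NULL, 0);                /* parent: let the child go first */
    (void)write(fd, "parent\n", 7);       /* written at the shared offset 6 */

    n = pread(fd, out, outlen - 1, 0);    /* read from offset 0; offset untouched */
    if (n >= 0)
        out[n] = '\0';
    fclose(fp);
    return n;
}
```

The file ends up containing "child\nparent\n" (13 bytes). Had the two processes used separate open() calls on the same file instead, each would have its own offset and the parent would overwrite the child's data.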

Page 72: NetWork
Page 73: NetWork

vfork()

• The vfork function is intended to create a new process when the purpose of the new process is to exec a new program

• The vfork function creates the new process, just like fork, without copying the address space of the parent into the child, as the child won't reference that address space

• vfork guarantees that the child runs first, until the child calls exec or exit. When the child calls either of these functions, the parent resumes.

Page 74: NetWork

What child inherits?

• Real user ID, real group ID, effective user ID, effective group ID
• Current working directory
• Root directory
• File mode creation mask
• Environment
• Process group ID
• Session ID
• Controlling terminal
• Attached shared memory segments
• Memory mappings
• Resource limits

Page 75: NetWork

What values in child are different from parent?

• The return value from fork
• The process IDs are different
• The two processes have different parent process IDs: the parent process ID of the child is the parent's process ID; the parent process ID of the parent doesn't change

• The child's tms_utime, tms_stime, tms_cutime, and tms_cstime values are set to 0

• File locks set by the parent are not inherited by the child
• Pending alarms are cleared for the child
• The set of pending signals for the child is set to the empty set

Page 76: NetWork

Process Termination

• Normal termination
  – Return from main
  – Calling exit
  – Calling _exit or _Exit
  – Return of the last thread from its start routine
  – Calling pthread_exit from the last thread

• Abnormal termination
  – Calling abort
  – Receipt of a signal
  – Response of the last thread to a cancellation request

Page 77: NetWork

Process Termination

• Regardless of how a process terminates, the same code in the kernel is eventually executed. This kernel code closes all the open descriptors for the process, releases the memory that it was using, and the like.

• To be able to notify its parent how it terminated, a child passes an exit status as the argument to one of the exit functions (exit, _exit, and _Exit).

• In the case of an abnormal termination, however, the kernel, not the process, generates a termination status to indicate the reason for the abnormal termination.

• In any case, the parent of the process can obtain the termination status using wait or the waitpid function

Page 78: NetWork

Process Termination

• When a process terminates, either normally or abnormally, the kernel notifies the parent by sending the SIGCHLD signal to the parent.

• This signal is the asynchronous notification from the kernel to the parent. The parent can choose to ignore this signal, or it can provide a function that is called when the signal occurs: a signal handler.

• The default action for this signal is that it is ignored.

Page 79: NetWork

wait() & waitpid()

• Parent can obtain termination status from kernel using these calls

• A process that calls wait or waitpid can
  – block, if all of its children are still running
  – return immediately with the termination status of a child, if a child has terminated and is waiting for its termination status to be fetched
  – return immediately with an error, if it doesn't have any child processes
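The status filled in by wait or waitpid is decoded with the standard macros from <sys/wait.h>: WIFEXITED and WEXITSTATUS for normal termination, WIFSIGNALED and WTERMSIG for abnormal termination. The two hypothetical helpers below fork a child, let it terminate in one of those two ways, and report what the parent sees.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child terminates normally with exit status `code` (0..255). Returns the
 * status the parent decodes with WEXITSTATUS, or -1. */
int child_exit_status(int code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                          /* normal termination */
    if (waitpid(pid, &status, 0) != pid)      /* block until this child ends */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Child is terminated by signal `signo`. Returns the signal number the
 * parent decodes with WTERMSIG, or -1. */
int child_term_signal(int signo)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        kill(getpid(), signo);                /* abnormal termination */
        _exit(0);                             /* reached only if signo did not kill us */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```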

Page 80: NetWork

Syntax

Page 81: NetWork

waitpid()

Page 82: NetWork

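#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int i = 0, j = 0;
    pid_t ret;
    int status;

    ret = fork();
    if (ret == 0) {                      /* child */
        for (i = 0; i < 5000; i++)
            printf("Child: %d\n", i);
        printf("Child ends\n");
    } else {                             /* parent */
        wait(&status);                   /* block until the child terminates */
        printf("Parent resumes.\n");
        for (j = 0; j < 5000; j++)
            printf("Parent: %d\n", j);
    }
    return 0;
}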

Page 83: NetWork

What happens if parent terminates before child?

• The init process becomes the parent process of any process whose parent terminates (the process has been inherited by init)

• The parent process ID of the surviving process is changed to 1 (the process ID of init). This way, we're guaranteed that every process has a parent.

Page 84: NetWork

What happens when a child terminates before its parent ?

• The kernel keeps a small amount of information (process ID, termination status, and the amount of CPU time taken by the process) until the parent asks for it

• A process that has terminated, but whose parent has not yet waited for it, is called a zombie

Page 85: NetWork

exec functions

• The fork function creates a new process (the child); the child often then causes another program to be executed by calling one of the exec functions.

• When a process calls one of the exec functions, that process is completely replaced by the new program, and the new program starts executing at its main function.

• The process ID does not change across an exec, because a new process is not created.

• exec replaces the current process, its text, data, heap, and stack segments with a new program from disk.

Page 86: NetWork

#include <unistd.h>
• int execl(const char *pathname, const char *arg0, ... /* (char *)0 */ );
• int execv(const char *pathname, char *const argv[]);
• int execle(const char *pathname, const char *arg0, ... /* (char *)0, char *const envp[] */ );
• int execve(const char *pathname, char *const argv[], char *const envp[]);
• int execlp(const char *filename, const char *arg0, ... /* (char *)0 */ );
• int execvp(const char *filename, char *const argv[]);

Page 87: NetWork

Remembering arguments

Function          pathname  filename  arg list  argv[]    environ   envp[]
execl             •                   •                   •
execlp                      •         •                   •
execle            •                   •                             •
execv             •                             •         •
execvp                      •                   •         •
execve            •                             •                   •
(letter in name)            p         l         v                   e

Page 88: NetWork

Example

Output: Executes ls command with -l option

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    execl("/bin/ls", "ls", "-l", (char *) 0);
    printf("hello");             /* reached only if execl fails */
    return 1;
}

Page 89: NetWork

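• Input: a command to execute and its arguments

#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        return 1;
    }
    execvp(argv[1], argv + 1);   /* searches PATH for argv[1] */
    perror("execvp");            /* reached only if exec fails */
    return 1;
}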
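The fork, exec, and wait calls are typically combined: the parent forks, the child replaces itself with a new program, and the parent waits for it. This sketch (hypothetical helper) runs a shell command through execlp("sh", "sh", "-c", cmd, (char *)0), so it assumes a POSIX shell named sh can be found on the PATH; exit status 127 is used as the conventional "could not exec" value that shells also report.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `cmd` via sh in a child process and return its exit status,
 * or -1 on error. */
int run_command(const char *cmd)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* The child's text, data, heap, and stack are replaced by sh;
         * its process ID stays the same. */
        execlp("sh", "sh", "-c", cmd, (char *)0);
        _exit(127);                     /* only reached if exec failed */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```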

Page 90: NetWork

Signals

• A signal is an asynchronous event which is delivered to a process.

• Asynchronous means that the event can occur at any time
  – may be unrelated to the execution of the process
  – e.g., user types ctrl-C, or the modem hangs up

Page 91: NetWork

Signals

Name      Description                   Default Action
SIGINT    Interrupt character typed     terminate process
SIGQUIT   Quit character typed (^\)     terminate + create core image
SIGKILL   kill -9                       terminate process
SIGSEGV   Invalid memory reference      terminate + create core image
SIGPIPE   Write on pipe but no reader   terminate process
SIGALRM   alarm() clock ‘rings’         terminate process
SIGUSR1   user-defined signal type      terminate process
SIGUSR2   user-defined signal type      terminate process

• See man 7 signal

Page 92: NetWork

Signal Sources

• Terminal-generated signals: SIGINT, SIGQUIT
• Hardware exceptions generate signals: SIGFPE, SIGSEGV
• The kill function allows a process to send any signal to another process or process group
• The kill command allows us to send signals to other processes.
• Software conditions: SIGURG, SIGPIPE, SIGALRM

Page 93: NetWork

kill() and raise() functions

• Send a signal to a process (or group of processes).

#include <signal.h>
int kill(pid_t pid, int signo);
int raise(int signo);

• pid > 0: send signal to process pid
  pid == 0: send signal to all processes whose process group ID equals the sender's pgid, e.g., parent kills all children

• Return 0 if ok, -1 on error.
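A minimal sketch of sending a signal to oneself: install a handler for SIGUSR1 with signal(), then send the signal with raise() and, equivalently, with kill(getpid(), SIGUSR1). The handler only updates a volatile sig_atomic_t counter (a hypothetical name), the kind of object a handler may safely modify, and re-installs itself in case the system resets the disposition on delivery.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t usr1_count = 0;

static void on_usr1(int signo)
{
    /* Re-install first, in case this system resets the disposition to
     * SIG_DFL on delivery (the old System V signal() semantics). */
    signal(signo, on_usr1);
    usr1_count++;          /* a handler should only touch sig_atomic_t data */
}

/* Send SIGUSR1 to ourselves twice; returns how often the handler ran. */
int self_signal_demo(void)
{
    usr1_count = 0;
    if (signal(SIGUSR1, on_usr1) == SIG_ERR)
        return -1;
    raise(SIGUSR1);                /* handler runs before raise() returns */
    kill(getpid(), SIGUSR1);       /* same effect via kill() */
    signal(SIGUSR1, SIG_DFL);      /* restore the default disposition */
    return usr1_count;
}
```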

Page 94: NetWork

Responding to a Signal

• A process can:
  – ignore/discard the signal (not possible with SIGKILL or SIGSTOP)
  – catch the signal and execute a signal handler function, and then possibly resume execution
  – let the default action apply. Every signal has a default action

• The choice is called the signal disposition

Page 95: NetWork

Signal Handler Function

• Specify a signal handler function to deal with a signal type.

#include <signal.h>
typedef void Sigfunc(int);   /* my defn */
Sigfunc *signal(int signo, Sigfunc *handler);

  – signal returns a pointer to a function that takes an int (i.e., it returns a pointer to Sigfunc)

• Returns previous signal disposition if ok, SIG_ERR on error.

Page 96: NetWork

Example

#include <signal.h>

void foo(int signo);

int main(void)
{
    signal(SIGINT, foo);
    /* ... do usual things until SIGINT ... */
    return 0;
}

void foo(int signo)
{
    /* ... deal with SIGINT signal ... */
    return;    /* return to program */
}

Page 97: NetWork

Special Sigfunc * Values

Value     Meaning
SIG_IGN   Ignore / discard the signal.
SIG_DFL   Use default action to handle signal.
SIG_ERR   Returned by signal() as an error.

Page 98: NetWork

Signals Overview

• Three phases to processing signals:
  – Signal is generated
    • when the event that causes the signal occurs
  – Signal is pending
    • during the time between generation and delivery, the signal is said to be pending
  – Signal is delivered
    • a signal is said to be delivered to the process when the process takes action for the signal

Page 99: NetWork

Signal blocking

• Blocking the delivery of a signal
  – the process tells the kernel which signals to block
  – when such a signal is generated for the process, if the action is not ignore, the signal remains pending until the process either unblocks it or changes the action to ignore

Page 100: NetWork

Multiple Signals

• If a blocked signal is generated more than once, most systems deliver it only once. That is, the signal is not queued.

• If many signals of different types are ready to be delivered (e.g. a SIGINT, SIGSEGV, SIGUSR1), they are not delivered in any fixed order.

Page 101: NetWork

Signal Sets

• A data type (sigset_t) to represent multiple signals

#include <signal.h>
  – int sigemptyset(sigset_t *set);
  – int sigfillset(sigset_t *set);
  – int sigaddset(sigset_t *set, int signo);
  – int sigdelset(sigset_t *set, int signo);
    All four return: 0 if OK, -1 on error
  – int sigismember(const sigset_t *set, int signo);
    Returns: 1 if true, 0 if false, -1 on error

Page 102: NetWork

sigprocmask()

• A process uses a signal set to create a mask which defines the signals it is blocking from delivery.
  – good for critical sections where you want to block certain signals.

#include <signal.h>
int sigprocmask(int how, const sigset_t *set, sigset_t *oldset);

• how – indicates how the mask is modified

Page 103: NetWork

‘how’ Meanings

Value         Meaning
SIG_BLOCK     set signals are added to mask
SIG_UNBLOCK   set signals are removed from mask
SIG_SETMASK   set becomes new mask

Page 104: NetWork

A Critical Code Region

sigset_t newmask, oldmask;

sigemptyset(&newmask);
sigaddset(&newmask, SIGINT);

/* block SIGINT; save old mask */
sigprocmask(SIG_BLOCK, &newmask, &oldmask);

/* critical region of code */

/* reset mask which unblocks SIGINT */
sigprocmask(SIG_SETMASK, &oldmask, NULL);
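The critical-region pattern can be checked with sigpending() and sigismember(): while SIGUSR1 is blocked, a raised SIGUSR1 stays pending instead of being delivered, and it is delivered once the old mask is restored. The sketch below (helper and flag names hypothetical) uses SIGUSR1 rather than SIGINT and installs its handler with sigaction().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t got_usr1 = 0;

static void on_usr1(int signo)
{
    (void)signo;
    got_usr1 = 1;
}

/* Returns 1 if SIGUSR1 stayed pending (undelivered) inside the critical
 * region and was delivered once the old mask was restored; else 0. */
int blocked_signal_demo(void)
{
    struct sigaction sa, old;
    sigset_t newmask, oldmask, pend;
    int was_pending, delivered_early;

    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, &old) < 0)
        return 0;
    got_usr1 = 0;

    sigemptyset(&newmask);
    sigaddset(&newmask, SIGUSR1);
    sigprocmask(SIG_BLOCK, &newmask, &oldmask);   /* enter critical region */

    raise(SIGUSR1);                               /* generated, but blocked */
    sigpending(&pend);                            /* which signals are pending? */
    was_pending = sigismember(&pend, SIGUSR1);
    delivered_early = got_usr1;

    sigprocmask(SIG_SETMASK, &oldmask, NULL);     /* unblock: delivered here */
    sigaction(SIGUSR1, &old, NULL);               /* restore old disposition */

    return was_pending == 1 && delivered_early == 0 && got_usr1 == 1;
}
```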

Page 105: NetWork

sigaction()

• Supersedes (is more powerful than) signal()
  – sigaction() can be used to code a non-resetting signal()

#include <signal.h>
int sigaction(int signo, const struct sigaction *act, struct sigaction *oldact);

Page 106: NetWork

sigaction Structure

struct sigaction {
    void     (*sa_handler)(int);   /* address of signal handler, or SIG_IGN, or SIG_DFL */
    sigset_t sa_mask;              /* additional signals to block */
    int      sa_flags;             /* modifies action of the signal */
    void     (*sa_sigaction)(int, siginfo_t *, void *);
                                   /* alternate signal handler, used when the
                                      SA_SIGINFO flag is set */
};

• sa_flags
  – SA_RESETHAND: reset the disposition to SIG_DFL on entry to the signal handler
  – SA_SIGINFO: extra information is passed to the handler (i.e., the "second" handler in the structure, sa_sigaction, is used)

Page 107: NetWork

sigaction() Behavior

• A signo signal causes the sa_handler signal handler to be called.

• While sa_handler executes, the signals in sa_mask are blocked. Any more signo signals are also blocked.

• sa_handler remains installed until it is changed by another sigaction() call. No reset problem.

• sa_sigaction specifies handler if SA_SIGINFO flag is set.

struct siginfo {
    int    si_signo;   /* signal number */
    int    si_errno;   /* if nonzero, errno value from <errno.h> */
    int    si_code;    /* additional info (depends on signal) */
    pid_t  si_pid;     /* sending process ID */
    uid_t  si_uid;     /* sending process real user ID */
    void  *si_addr;    /* address that caused the fault */
    int    si_status;  /* exit value or signal number */
    long   si_band;    /* band number for SIGPOLL */
    /* possibly other fields also */
};
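A minimal sketch of a three-argument handler installed with sigaction() and SA_SIGINFO, assuming a POSIX system such as Linux. SIGUSR1, the handler info_handler and the function demo_sigaction are illustrative choices, not names from the slides. Note that the handler is stored in sa_sigaction, not sa_handler, because SA_SIGINFO is set.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Written by the handler; volatile sig_atomic_t is safe to set there. */
static volatile sig_atomic_t got_signo = 0;
static volatile sig_atomic_t got_pid_matches = 0;

/* Three-argument handler, selected because SA_SIGINFO is set below. */
static void info_handler(int signo, siginfo_t *info, void *context)
{
    (void)context;
    got_signo = info->si_signo;
    got_pid_matches = (info->si_pid == getpid());   /* sender is us */
}

/* Installs info_handler for SIGUSR1, signals ourselves, restores the old
   disposition and returns the signal number the handler saw. */
int demo_sigaction(void)
{
    struct sigaction act, old;

    memset(&act, 0, sizeof act);
    act.sa_sigaction = info_handler;   /* not sa_handler: SA_SIGINFO is set */
    sigemptyset(&act.sa_mask);
    sigaddset(&act.sa_mask, SIGINT);   /* also block SIGINT while handling */
    act.sa_flags = SA_SIGINFO;

    if (sigaction(SIGUSR1, &act, &old) == -1)
        return 0;
    kill(getpid(), SIGUSR1);           /* unblocked: delivered before kill() returns */
    sigaction(SIGUSR1, &old, NULL);    /* put the previous disposition back */
    return got_signo;
}
```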

Page 108: NetWork

Other POSIX Functions

• sigpending(): examine signals that are blocked and pending

• sigsetjmp(), siglongjmp(): jump functions for use in signal handlers, which handle masks correctly

• sigsuspend(): atomically reset the mask and sleep
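The critical-region pattern and sigpending() can be combined in a short sketch (POSIX assumed; SIGUSR1, on_usr1 and demo_pending are illustrative names). The signal is generated while blocked, shows up in the pending set, and is delivered once the old mask is restored.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t caught = 0;

static void on_usr1(int signo)
{
    (void)signo;
    caught++;
}

/* Returns 1 if SIGUSR1 stayed pending while blocked and was delivered
   exactly once after the old mask was restored. */
int demo_pending(void)
{
    struct sigaction act;
    sigset_t newmask, oldmask, pend;
    int was_pending;

    memset(&act, 0, sizeof act);
    act.sa_handler = on_usr1;
    sigemptyset(&act.sa_mask);
    sigaction(SIGUSR1, &act, NULL);

    sigemptyset(&newmask);
    sigaddset(&newmask, SIGUSR1);
    sigprocmask(SIG_BLOCK, &newmask, &oldmask);   /* enter critical region */

    raise(SIGUSR1);                /* generated, but cannot be delivered yet */
    sigpending(&pend);
    was_pending = sigismember(&pend, SIGUSR1);

    /* Leaving the region: a pending unblocked signal is delivered
       before sigprocmask() returns. */
    sigprocmask(SIG_SETMASK, &oldmask, NULL);
    return was_pending == 1 && caught == 1;
}
```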

Page 109: NetWork

pause()

• Suspend the calling process until a signal is caught.
• #include <unistd.h>

int pause(void);
• Returns -1 with errno set to EINTR.
• pause() returns only after a signal handler has returned.

Page 110: NetWork

alarm()

• Set an alarm timer that will ‘ring’ after a specified number of seconds
– a SIGALRM signal is generated

• #include <unistd.h>
unsigned int alarm(unsigned int seconds);

• Returns 0, or the number of seconds until the previously set alarm would have ‘rung’.

Page 111: NetWork

Some aspects of alarm()

• A process can have at most one alarm timer running at once.

• If alarm() is called when there is an existing alarm set then it returns the number of seconds remaining for the old alarm, and sets the timer to the new alarm value.

• An alarm(0) call causes the previous alarm to be cancelled.
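A small sketch tying alarm() and pause() together, assuming a POSIX system (on_alarm and demo_alarm are illustrative names). Replacing a 10-second alarm returns the time left on it, and pause() then returns -1 with EINTR once the SIGALRM handler has run. It takes about one second to complete.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t rang = 0;

static void on_alarm(int signo)
{
    (void)signo;
    rang = 1;
}

/* Stores the seconds left on the replaced timer in *left; returns 1 if
   pause() came back with -1/EINTR after the handler ran. */
int demo_alarm(unsigned *left)
{
    struct sigaction act;
    int r, err;

    memset(&act, 0, sizeof act);
    act.sa_handler = on_alarm;   /* without a handler SIGALRM would terminate us */
    sigemptyset(&act.sa_mask);
    sigaction(SIGALRM, &act, NULL);

    alarm(10);                   /* first timer */
    *left = alarm(1);            /* only one timer: this replaces it */
    r = pause();                 /* sleeps until the handler returns */
    err = errno;
    alarm(0);                    /* cancel any remaining timer */
    return r == -1 && err == EINTR && rang == 1;
}
```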

Page 112: NetWork

setjmp() and longjmp()

• In C we cannot use goto to jump to a label in another function
– use setjmp() and longjmp() for those ‘long jumps’

• Uses:
– error handling which requires a deeply nested function to recover to a higher level (e.g. back to main())
– coding timeouts with signals

Page 113: NetWork

Prototypes

• #include <setjmp.h>
int setjmp( jmp_buf env );

• Returns 0 if called directly, nonzero if returning from a call to longjmp().

• #include <setjmp.h>
void longjmp( jmp_buf env, int val );

• In the setjmp() call, env is initialized to information about the current state of the stack.
• The longjmp() call causes the stack to be reset to its env value.
• Execution restarts after the setjmp() call, but this time setjmp() returns val (or 1 if val is 0).

Page 114: NetWork

Example

jmp_buf env; /* global */

int main()
{
    char line[MAX];
    int errval;

    if(( errval = setjmp(env) ) != 0 )
        printf( "error %d: restart\n", errval );
    while( fgets( line, MAX, stdin ) != NULL )
        process_line(line);
    return 0;
}

continued

Page 115: NetWork

void process_line( char * ptr )
{
    :
    cmd_add();
    :
}

void cmd_add()
{
    int token;

    token = get_token();
    if( token < 0 )          /* bad error */
        longjmp( env, 1 );
    /* normal processing */
}

int get_token()
{
    if( some error )
        longjmp( env, 2 );
}
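A runnable variant of the example above, as a sketch: lines come from an array instead of stdin, and get_token, cmd_add and run_lines are hypothetical stand-ins for the slide's functions ("-" marks a bad token, "!" a fatal error). The longjmp value is recorded in a global by the hypothetical helper fail() so that setjmp() can sit in a context the C standard allows, and the locals changed after setjmp() are volatile so their values survive the jump.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

static jmp_buf env;            /* global, so any nested function can jump */
static int last_err;           /* value passed to longjmp, for the message */

/* Record the error code and unwind straight back to setjmp(). */
static void fail(int code)
{
    last_err = code;
    longjmp(env, code);
}

/* Hypothetical token reader: "!" is a fatal error, "-" a bad token. */
static int get_token(const char *line)
{
    if (line[0] == '!')
        fail(2);
    return line[0] == '-' ? -1 : 1;
}

static void cmd_add(const char *line)
{
    if (get_token(line) < 0)   /* bad error */
        fail(1);
    /* normal processing */
}

/* Processes lines[0..n-1], restarting after each error at the next
   line; returns the sum of the error codes seen. */
int run_lines(const char *lines[], int n)
{
    volatile int i = 0;        /* volatile: changed after setjmp, read after longjmp */
    volatile int errsum = 0;

    if (setjmp(env) != 0) {    /* nonzero: we arrived here via longjmp */
        printf("error %d: restart\n", last_err);
        errsum += last_err;
        i++;                   /* skip the line that failed */
    }
    for (; i < n; i++)
        cmd_add(lines[i]);
    return errsum;
}
```

For input {"a", "-b", "c", "!d", "e"}, run_lines prints "error 1: restart" and then "error 2: restart", and returns 3.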

Page 116: NetWork

Stack Frames before calling longjmp()

[Figure: main() stack frame at the top of the stack, with the direction of stack growth; setjmp(env) returns 0 and env records the stack frame info]

Page 117: NetWork

Stack Frames after longjmp()

[Figure: stack frames for main(), process_line(), ..., cmd_add(), with the direction of stack growth; longjmp(env, 1) in cmd_add() causes the stack frames to be reset]

Page 118: NetWork

What happens if longjmp() is called in signal handler?

• The signal being caught is automatically added to the signal mask (which prevents its further delivery) when a signal handler is entered. When the signal handler returns, the signal is removed from the mask.

• If longjmp() is called in a signal handler, the handler never returns, so the signal may remain blocked.

Page 119: NetWork

siglongjmp & sigsetjmp

• POSIX does not specify whether longjmp will restore the signal context. If you want to save and restore signal masks, use siglongjmp.

• POSIX does not specify whether setjmp will save the signal context. If you want to save signal masks, use sigsetjmp.

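• #include <setjmp.h>
• int sigsetjmp(sigjmp_buf env, int savemask);
Returns: 0 if called directly, nonzero if returning from a call to siglongjmp
• void siglongjmp(sigjmp_buf env, int val);
• If savemask is nonzero, sigsetjmp also saves the current signal mask in env, and siglongjmp restores it.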
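A sketch of the "timeouts with signals" use case with sigsetjmp(env, 1) and siglongjmp(), assuming a POSIX system. An endless pause() loop stands in for a slow call such as read(), and jmpbuf, canjump, on_alarm and demo_timeout are illustrative names. After jumping out of the handler, SIGALRM is unblocked again because the mask saved by sigsetjmp is restored. It takes about one second.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static sigjmp_buf jmpbuf;
static volatile sig_atomic_t canjump = 0;   /* set once jmpbuf is valid */

static void on_alarm(int signo)
{
    (void)signo;
    if (!canjump)
        return;                  /* too early: jmpbuf not initialized */
    canjump = 0;
    siglongjmp(jmpbuf, 1);       /* leave the handler; saved mask restored */
}

/* Returns 1 if the timeout fired and SIGALRM is unblocked afterwards. */
int demo_timeout(void)
{
    struct sigaction act;
    sigset_t cur;

    memset(&act, 0, sizeof act);
    act.sa_handler = on_alarm;
    sigemptyset(&act.sa_mask);
    sigaction(SIGALRM, &act, NULL);

    if (sigsetjmp(jmpbuf, 1) != 0) {        /* savemask = 1 */
        sigprocmask(SIG_BLOCK, NULL, &cur); /* NULL set: only query the mask */
        return !sigismember(&cur, SIGALRM);
    }
    canjump = 1;
    alarm(1);                    /* the timeout */
    for (;;)
        pause();                 /* stands in for a call that never completes */
}
```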

Page 120: NetWork

Inter Process Communication

Page 121: NetWork

Why do processes communicate?

• To share resources
• Client/server paradigms
• Inherently distributed applications
• Reusable software components
• etc.

Page 122: NetWork

Types of IPC

• Message Passing
– Pipes, FIFOs, and Message Queues

• Synchronization
– Mutexes, condition variables, read-write locks, file and record locks, and semaphores

• Shared memory

• Remote Procedure Calls
– Solaris doors and Sun RPC

Page 123: NetWork

Sharing of information

Page 124: NetWork

What is IPC?

• Each process has a private address space. Normally, no process can write to another process’s space. How to get important data from process A to process B?

• Message passing between different processes running on the same operating system is IPC

• Synchronization is required in case of IPC through shared memory or file system

Page 125: NetWork

Pipes

• Pipes are the oldest form of UNIX System IPC and are provided by all UNIX systems

• Most commonly used form of IPC

• Historically, they have been half duplex (i.e., data flows in only one direction).

• Because they don’t have names, pipes can be used only between processes that have a common ancestor.
– Normally, a pipe is created by a process, that process calls fork, and the pipe is used between the parent and the child.

Page 126: NetWork

UNIX Pipes

[Figure: parent process p1 writes the info to be shared into the pipe for p1 and p2 (a FIFO buffer, size = 4096 characters) with int p[2]; pipe(p); write(p[1], "hello", size); child process p2 gets a copy with read(p[0], inbuf, size)]

Page 127: NetWork

Pipes

• #include <unistd.h>
• int pipe(int fd[2]); returns 0 if OK, else -1
• fd[0] is for reading, fd[1] is for writing

Page 128: NetWork

Pipes

• Pipes are rarely used in a single process. They are generally used between parent and child

Page 129: NetWork

Pipes

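#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main (void)
{
    int p[2];
    pid_t ret;
    char buf[100];

    pipe (p);                         /* creating pipe; fork copies both ends */
    ret = fork ();
    if (ret == 0) {
        close (p[0]);                 /* child only writes */
        write (p[1], "hello", 6);     /* writing to parent through pipe (with '\0') */
        close (p[1]);
        return 0;
    }
    if (ret > 0) {
        close (p[1]);                 /* parent only reads */
        read (p[0], buf, 6);          /* reading from child via pipe */
        printf ("Child Said:%s\n", buf);   /* printing to stdout */
        wait (NULL);
    }
    return 0;
}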

Page 130: NetWork

Pipes: who|sort


Page 131: NetWork

who|sort

• Create a pipe in the parent
• Fork a child
• Duplicate the standard output descriptor to the write end of the pipe
• Exec ‘who’ program
• In the parent, wait for the child.
• Duplicate the standard input descriptor to the read end of the pipe
• Exec ‘sort’ program

Page 132: NetWork

who|sort

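#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main (void)
{
    int p[2];
    pid_t ret;

    pipe (p);
    ret = fork ();
    if (ret == 0) {                   /* child: who writes into the pipe */
        close (1);
        dup (p[1]);                   /* lowest free fd is 1, so stdout -> p[1] */
        close (p[0]);
        close (p[1]);
        execlp ("who", "who", (char *) 0);
    }
    if (ret > 0) {                    /* parent: sort reads from the pipe */
        close (0);
        dup (p[0]);                   /* lowest free fd is 0, so stdin -> p[0] */
        close (p[1]);                 /* otherwise sort never sees EOF */
        close (p[0]);
        wait (NULL);
        execlp ("sort", "sort", (char *) 0);
    }
    return 0;
}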

Page 133: NetWork

dup and dup2 Functions

• #include <unistd.h>
• int dup(int filedes);
• int dup2(int filedes, int filedes2);
Both return: new file descriptor if OK, -1 on error

• The new file descriptor returned by dup is guaranteed to be the lowest-numbered available file descriptor.

• With dup2, we specify the value of the new descriptor with the filedes2 argument. If filedes2 is already open, it is first closed. If filedes equals filedes2, then dup2 returns filedes2 without closing it.
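A sketch of dup2() redirection within one process: stdout (fd 1) is pointed at a pipe, some output is written, stdout is restored, and the text is read back. The function capture_stdout is a hypothetical helper; the same dup/dup2 steps are what the who|sort example relies on.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Temporarily points fd 1 at a pipe, prints a message, restores fd 1 and
   reads the message back into buf. Returns the number of bytes read. */
ssize_t capture_stdout(char *buf, size_t len)
{
    int p[2], saved;
    ssize_t n;

    if (pipe(p) == -1)
        return -1;
    fflush(stdout);                 /* keep earlier buffered output out */
    saved = dup(STDOUT_FILENO);     /* copy of the real stdout */
    dup2(p[1], STDOUT_FILENO);      /* closes fd 1, then makes it a copy of p[1] */
    close(p[1]);

    printf("captured");
    fflush(stdout);                 /* push stdio's buffer into the pipe */

    dup2(saved, STDOUT_FILENO);     /* restore; last write end is now closed */
    close(saved);
    n = read(p[0], buf, len - 1);
    close(p[0]);
    if (n >= 0)
        buf[n] = '\0';
    return n;
}
```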

Page 134: NetWork

dup and dup2

Page 135: NetWork

Popen

• #include <stdio.h>
• FILE *popen(const char *cmdstring, const char *type);
• Returns: file pointer if OK, NULL on error
• int pclose(FILE *fp);

Page 136: NetWork

popen

• popen and pclose do the work of creating a pipe, forking a child, closing the unused ends of the pipe, executing a shell to run the command, and (in pclose) waiting for the command to terminate
– fp = popen("ls *.c", "r");
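A minimal sketch of reading a command's output through popen() with type "r", assuming a POSIX shell is available; first_line_of is a hypothetical helper name. The pclose() return value is the command's termination status.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Runs cmd through the shell and copies its first output line into buf.
   Returns the status from pclose(), or -1 if popen() failed. */
int first_line_of(const char *cmd, char *buf, int len)
{
    FILE *fp = popen(cmd, "r");    /* "r": we read the command's stdout */
    if (fp == NULL)
        return -1;
    if (fgets(buf, len, fp) == NULL)
        buf[0] = '\0';
    return pclose(fp);             /* waits for the child, returns its status */
}
```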

Page 137: NetWork

Name Spaces

• When two unrelated processes use some type of IPC to exchange information, the IPC object must have a name or identifier of some form

• The set of possible names for a given type of IPC is called its name space

• FIFOs use a pathname in the file system as their identifier

Page 138: NetWork

FIFOs

• Create a FIFO
– #include <sys/types.h>
– #include <sys/stat.h>
– int mkfifo(const char *pathname, mode_t mode); //returns 0 if OK or -1

• Ex: if (mkfifo("fifo1", 0666) < 0) perror("mkfifo");
– mkfifo fails with errno EEXIST if a FIFO (or other file) already exists at the given path

Page 139: NetWork

FIFOs

• Once a FIFO is created, it should be opened either for reading or for writing
– wfd = open("fifo1", O_WRONLY); or
– FILE *fp = fopen("fifo1", "w");

• A FIFO should not be opened for reading and writing at the same time (POSIX leaves O_RDWR on a FIFO undefined)

• Unlike a pipe, a FIFO is not deleted as soon as all the processes referring to it exit. It has to be explicitly deleted from the system.
– unlink("fifo1")
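A sketch of a FIFO used between a parent and child, assuming a writable /tmp. The path and the function fifo_roundtrip are illustrative. Each open() blocks until the other end is opened, and the FIFO is unlinked explicitly at the end because it would otherwise stay in the file system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child writes msg into the FIFO at path; parent reads it into buf.
   Returns the bytes read, or -1 on error. */
ssize_t fifo_roundtrip(const char *path, const char *msg, char *buf, size_t len)
{
    ssize_t n;
    pid_t pid;
    int fd;

    if (mkfifo(path, 0666) == -1 && errno != EEXIST)
        return -1;
    if ((pid = fork()) == -1)
        return -1;
    if (pid == 0) {                          /* child: writer */
        int wfd = open(path, O_WRONLY);      /* blocks until a reader opens */
        write(wfd, msg, strlen(msg));
        close(wfd);
        _exit(0);
    }
    fd = open(path, O_RDONLY);               /* blocks until the writer opens */
    n = read(fd, buf, len - 1);
    if (n >= 0)
        buf[n] = '\0';
    close(fd);
    waitpid(pid, NULL, 0);
    unlink(path);                            /* FIFOs persist until unlinked */
    return n;
}
```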

Page 140: NetWork

FIFOs between parent and child

Page 141: NetWork

FIFOs between parent and child

Page 142: NetWork

Properties of FIFO

Page 143: NetWork

FIFOs between parent and child

Swap these two calls and see

Page 144: NetWork

Non-blocking option

• A descriptor can be set non-blocking in one of two ways

Or
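Two common ways to get a non-blocking descriptor are passing O_NONBLOCK to open(), or setting it later with fcntl(); the sketch below assumes the fcntl() route, since pipe() has no flags argument. set_nonblock and read_empty_nonblocking are illustrative names. A read from an empty pipe whose write end is still open then fails with EAGAIN instead of blocking.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Turns on O_NONBLOCK for fd while keeping its other status flags. */
int set_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Returns the errno from reading an empty pipe in non-blocking mode. */
int read_empty_nonblocking(void)
{
    int p[2], err = 0;
    char c;

    if (pipe(p) == -1)
        return -1;
    set_nonblock(p[0]);
    if (read(p[0], &c, 1) == -1)   /* no data, writer still open */
        err = errno;
    close(p[0]);
    close(p[1]);
    return err;
}
```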

Page 145: NetWork

Read and write operations Pipe and FIFO

Page 146: NetWork

Writing to pipe/fifo when pipe/fifo is open for reading

• If the data size is less than or equal to PIPE_BUF, the write is atomic, i.e. either all the data is written or none of it is.

• If there is no room in the pipe for the requested data (≤ PIPE_BUF bytes), by default the write blocks.
– If the O_NONBLOCK option is set, the EAGAIN error is returned.

• If the data is larger than PIPE_BUF and the O_NONBLOCK option is set, then even if only 1 byte of space is available in the pipe, the write transfers that much data and returns (a partial write).
– Atomicity is not guaranteed.

Page 147: NetWork

Message Queues

• A message queue is a linked list of messages stored within the kernel and identified by a message queue identifier

• Any process with adequate privileges can place a message into the queue, and any process with adequate privileges can read from the queue

• There is no requirement that some process be waiting to receive a message before the message is sent

Page 148: NetWork

Message Queues

• Every message queue has the following structure in the kernel

Page 149: NetWork

Message Queues

Page 150: NetWork

Permissions

• struct ipc_perm {
      uid_t  uid;   /* owner's effective user id */
      gid_t  gid;   /* owner's effective group id */
      uid_t  cuid;  /* creator's effective user id */
      gid_t  cgid;  /* creator's effective group id */
      mode_t mode;  /* access modes */
      . . .
  };

• Permission bits
– user-read 0400
– user-write (alter) 0200
– group-read 0040
– group-write (alter) 0020
– other-read 0004
– other-write (alter) 0002

Page 151: NetWork

Message Queues

• First, msgget is used either to open an existing queue or to create a new queue

• #include <sys/msg.h>
int msgget(key_t key, int flag);
– Returns: message queue ID if OK, -1 on error

• The key value can be IPC_PRIVATE, a key generated by ftok(), or any agreed integer key

• The flag value (ORed with the permission bits) is
– 0 to open an existing queue
– IPC_CREAT if a new queue has to be created when none exists for the key
– IPC_CREAT | IPC_EXCL to create a new queue, failing with EEXIST rather than referencing an existing one

Page 152: NetWork

Key Values

• The server can create a new IPC structure by specifying a key of IPC_PRIVATE
– The kernel generates a unique ID

• The client and the server can agree on a key by defining the key in a common header.

• The client and the server can agree on a pathname and project ID and call the function ftok to convert these two values into a key.
– #include <sys/ipc.h>
– key_t ftok(const char *path, int id);
– The path argument must refer to an existing file. Only the lower 8 bits of id are used when generating the key.

Page 153: NetWork

Message Queues

• When a new queue is created, the following members of the msqid_ds structure are initialized.
– The ipc_perm structure is initialized.
– msg_qnum, msg_lspid, msg_lrpid, msg_stime, and msg_rtime are all set to 0.
– msg_ctime is set to the current time.
– msg_qbytes is set to the system limit.

• On success, msgget returns the non-negative queue ID. This value is then used with the other three message queue functions.

Page 154: NetWork

Messages

• Each message is composed of a positive long integer type field, and the actual data bytes. Messages are always placed at the end of the queue.

• Message template

• Most applications define their own message structure according to the needs of the application

Page 155: NetWork

Sending Messages

• #include <sys/msg.h>
int msgsnd(int msqid, const void *ptr, size_t nbytes, int flag);

• msqid is the ID returned by the msgget system call
• The ptr argument is a pointer to a message structure
• nbytes is the length of the user data, i.e. sizeof(struct mesg) - sizeof(long). The length can be zero.
• A flag value of 0 or IPC_NOWAIT can be specified
– With IPC_NOWAIT, msgsnd returns -1 with errno EAGAIN if there is no room for the message
• Without IPC_NOWAIT, msgsnd() blocks until one of the following occurs
– Room exists for the message
– The message queue is removed (EIDRM error is returned)
– It is interrupted by a signal (EINTR is returned)

Page 156: NetWork

Receiving Messages

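• #include <sys/msg.h>
ssize_t msgrcv(int msqid, void *ptr, size_t nbytes, long type, int flag);
Returns: size of the data portion of the message if OK, -1 on error

• ptr points to the message structure where the message will be stored
• nbytes is the size available in the message structure, excluding sizeof(long)
• type indicates which message on the queue is desired
• flag can be 0, IPC_NOWAIT or MSG_NOERROR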

Page 157: NetWork

Receiving Messages

• The type argument lets us specify which message we want.
– type == 0: The first message on the queue is returned.
– type > 0: The first message on the queue whose message type equals type is returned.
– type < 0: The first message on the queue whose message type is the lowest value less than or equal to the absolute value of type is returned.

• A nonzero type is used to read the messages in an order other than first in, first out.
– Priority to messages, Multiplexing
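A sketch of the three type cases in one process, assuming System V message queues are available (IPC_PRIVATE avoids key clashes). struct mymsg, send_typed, recv_type and demo_msg_types are illustrative names. With types 3, 1, 2 queued in that order, requests for 2, -3 and 0 return types 2, 1 and 3.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>

/* Hypothetical application message layout: long type first, then data. */
struct mymsg {
    long mtype;
    char mtext[16];
};

static void send_typed(int qid, long type, const char *text)
{
    struct mymsg m;
    m.mtype = type;
    snprintf(m.mtext, sizeof m.mtext, "%s", text);
    msgsnd(qid, &m, sizeof m.mtext, 0);   /* size excludes mtype */
}

/* Returns the type of the message msgrcv picks for `want`, or -1. */
static long recv_type(int qid, long want)
{
    struct mymsg m;
    if (msgrcv(qid, &m, sizeof m.mtext, want, IPC_NOWAIT) == -1)
        return -1;
    return m.mtype;
}

/* Queues types 3, 1, 2 and records what requests 2, -3, 0 return. */
int demo_msg_types(long got[3])
{
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid == -1)
        return -1;
    send_typed(qid, 3, "three");
    send_typed(qid, 1, "one");
    send_typed(qid, 2, "two");

    got[0] = recv_type(qid, 2);    /* exact match: type 2 */
    got[1] = recv_type(qid, -3);   /* lowest type <= 3 still queued: type 1 */
    got[2] = recv_type(qid, 0);    /* first remaining in the queue: type 3 */

    msgctl(qid, IPC_RMID, NULL);
    return 0;
}
```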


Page 158: NetWork

Receiving Messages

• The IPC_NOWAIT flag makes the operation nonblocking, causing msgrcv to return -1 with errno set to ENOMSG if a message of the specified type is not available.

• If IPC_NOWAIT is not specified, the operation blocks until
– a message of the specified type is available,
– the queue is removed from the system (-1 is returned with errno set to EIDRM), or
– a signal is caught and the signal handler returns (causing msgrcv to return -1 with errno set to EINTR).

Page 159: NetWork

Receiving Messages

• If the returned message is larger than nbytes and the MSG_NOERROR bit in flag is set, the message is truncated.
– No notification is given to us that the message was truncated, and the remainder of the message is discarded.

• If the message is too big and MSG_NOERROR is not specified, an error of E2BIG is returned instead (and the message stays on the queue).

Page 160: NetWork

Control Operations on Message Queues

• #include <sys/msg.h>
int msgctl(int msqid, int cmd, struct msqid_ds *buf);

• IPC_STAT: Fetch the msqid_ds structure for this queue, storing it in the structure pointed to by buf.

• IPC_SET: Copy the following fields from the structure pointed to by buf to the msqid_ds structure associated with this queue: msg_perm.uid, msg_perm.gid, msg_perm.mode, and msg_qbytes.

• IPC_RMID: Remove the message queue from the system and any data still on the queue. This removal is immediate.
– Any other process still using the message queue will get an error of EIDRM on its next attempted operation on the queue.
– IPC_SET and IPC_RMID can be executed only by a process whose effective user ID equals msg_perm.cuid or msg_perm.uid, or by a process with superuser privileges.

Page 161: NetWork

Server.c

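/* key.h */
#define MSGQ_PATH "/home/students/f2007045/msgq_server.c"  /* must be an existing file */
struct my_msgbuf {
    long mtype;
    char mtext[200];
};

/* server.c */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include "key.h"

int main (void)
{
    struct my_msgbuf buf;
    int msqid;
    key_t key;

    if ((key = ftok (MSGQ_PATH, 'B')) == -1) {
        perror ("ftok");
        exit (1);
    }
    if ((msqid = msgget (key, IPC_CREAT | 0644)) == -1) {
        perror ("msgget");
        exit (1);
    }
    printf ("server: ready to receive messages\n");
    for (;;) {
        /* the size argument counts only mtext, not mtype */
        if (msgrcv (msqid, &buf, sizeof (buf.mtext), 0, 0) == -1) {
            perror ("msgrcv");    /* e.g. EIDRM after the client removes the queue */
            exit (1);
        }
        printf ("server: \"%s\"\n", buf.mtext);
    }
    return 0;
}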

Page 162: NetWork

Client.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include "key.h"      /* MSGQ_PATH and struct my_msgbuf */

int main (void)
{
    struct my_msgbuf buf;
    int msqid;
    key_t key;

    if ((key = ftok (MSGQ_PATH, 'B')) == -1) {
        perror ("ftok");
        exit (1);
    }
    if ((msqid = msgget (key, 0)) == -1) {    /* open the server's queue */
        perror ("msgget");
        exit (1);
    }
    printf ("Enter lines of text, ^D to quit:\n");
    buf.mtype = 1;
    while (fgets (buf.mtext, sizeof buf.mtext, stdin) != NULL) {
        buf.mtext[strcspn (buf.mtext, "\n")] = '\0';    /* drop the newline */
        if (msgsnd (msqid, &buf, strlen (buf.mtext) + 1, 0) == -1)
            perror ("msgsnd");
    }
    if (msgctl (msqid, IPC_RMID, NULL) == -1) {
        perror ("msgctl");
        exit (1);
    }
    return 0;
}

Page 163: NetWork

Multiplexing Messages

• Possibility of deadlock

Page 164: NetWork

Multiplexing Messages


Page 165: NetWork

System V Semaphores

• A semaphore is a primitive used to provide synchronization between various processes (or between various threads in a given process)

• Binary Semaphores: a semaphore that can assume only values 0 or 1

• Counting Semaphores: a semaphore initialized to N, indicating the number of available resources

Page 166: NetWork

System V Semaphores

• Semaphores are maintained by the kernel

Page 167: NetWork

Semaphore operations

• Create a semaphore and initialize it
– this should be done atomically

• Wait for a semaphore: this tests the value of the semaphore, waits (blocks) if the value is less than or equal to 0, and then decrements the semaphore value once it is greater than 0 (aka P, lock, wait)
– Testing and decrementing should be a single atomic operation

• Post a semaphore: this increments the semaphore value. If any processes are blocked waiting for this semaphore’s value to be greater than 0, one of those processes is woken up (aka V, unlock, signal)

Page 168: NetWork

Producer Consumer Problem

• Producer produces one item and keeps it in the buffer.
• Consumer removes that item for processing.
• How to synchronize?

Page 169: NetWork

Producer Consumer Problem

• Semaphore put controls whether the producer can place an item into the shared buffer

• Semaphore get controls whether the consumer can remove an item from the shared buffer

Page 170: NetWork

System V Semaphores

• Add one more level of detail by defining “a set of counting semaphores”

• When we say System V semaphore, it refers to a set of counting semaphores (the maximum size of a set is an implementation limit, e.g. 25)

Page 171: NetWork

System V Semaphores

• Kernel maintains the following structure for every set

• The sem structure maintains information about each semaphore; sem_base contains a pointer to an array of these structures

Page 172: NetWork

System V Semaphores

• Kernel structure for a semaphore set having 2 counting semaphores


Page 173: NetWork

Creating Semaphores

• #include <sys/sem.h>
int semget(key_t key, int nsems, int flag);
Returns: semaphore ID if OK, -1 on error

• The number of semaphores in the set is nsems. If a new set is being created, we must specify nsems. If we are referencing an existing set, we can specify nsems as 0.

• When a new set is created, the following members of the semid_ds structure are initialized.
– The ipc_perm structure
– sem_otime is set to 0.
– sem_ctime is set to the current time.
– sem_nsems is set to nsems.

Page 174: NetWork

Initializing a semaphore value

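• #include <sys/sem.h>
int semctl(int semid, int semnum, int cmd, ... /* union semun arg */);

• semnum specifies which semaphore (0, 1, 2, ...)
• The semun union is used for some commands:
union semun {
    int              val;    /* for SETVAL */
    struct semid_ds *buf;    /* for IPC_STAT and IPC_SET */
    unsigned short  *array;  /* for GETALL and SETALL */
};

• On most systems this union is not declared in the system headers, so it must be declared in your program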

Page 175: NetWork

Testing whether a semaphore has been initialized

• When process P1 creates the semaphore, sem_otime is set to zero.

• When P1 calls semctl to initialize the value and then calls semop, sem_otime is set to the current time (by semop).

• When process P2 finds that sem_otime is nonzero, it knows that the semaphore has been initialized.

Page 176: NetWork

semctl() commands

• IPC_STAT, IPC_SET, IPC_RMID: same as in message queues
• GETVAL: Return the value of semval for the member semnum.
• SETVAL: Set the value of semval for the member semnum. The value is specified by arg.val.
• GETPID: Return the value of sempid for the member semnum.
• GETNCNT: Return the value of semncnt for the member semnum.
• GETZCNT: Return the value of semzcnt for the member semnum.
• GETALL: Fetch all the semaphore values in the set. These values are stored in the array pointed to by arg.array.
• SETALL: Set all the semaphore values in the set to the values pointed to by arg.array.

Page 177: NetWork

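Semaphore operations

• #include <sys/sem.h>
int semop(int semid, struct sembuf *opsptr, size_t nops);

• opsptr points to an array of the following structure
struct sembuf {
    unsigned short sem_num;  /* member # in set (0, 1, ..., nsems-1) */
    short          sem_op;   /* operation (negative, 0, or positive) */
    short          sem_flg;  /* IPC_NOWAIT, SEM_UNDO */
};

• nops specifies the number of structures in the array
• semop guarantees that either all these operations are done or none are done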

Page 178: NetWork

Semaphore operations

• The operation on each member of the set is specified by the corresponding sem_op value. This value can be negative, 0, or positive.

• If sem_op > 0:
– the process is returning resources
– semval += sem_op
– If the SEM_UNDO flag is specified, semadj -= sem_op, i.e. sem_op is subtracted from the semaphore's adjustment value for this process.

Page 179: NetWork

Semaphore operations

• If sem_op < 0:
– obtain resources that the semaphore controls.

• If semval >= |sem_op|:
– the resources are available
– semval -= |sem_op|
– If the SEM_UNDO flag is specified, semadj += |sem_op|, i.e. |sem_op| is added to the semaphore's adjustment value for this process.

Page 180: NetWork

Semaphore operations

• If semval < |sem_op|:
– the resources are not available
– If IPC_NOWAIT is specified, semop returns with an error of EAGAIN.
– If IPC_NOWAIT is not specified, the semncnt value for this semaphore is incremented (since the caller is about to go to sleep), and the calling process is suspended until one of the following occurs:
• semval >= |sem_op|, i.e. some other process has released some resources. semncnt--, and the operation proceeds as above.
• The semaphore is removed from the system. In this case, the function returns an error of EIDRM.
• A signal is caught by the process and the signal handler returns. The function returns an error of EINTR, and semncnt--.
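A sketch of P and V on a one-semaphore set, assuming System V semaphores and a system (such as Linux/glibc) whose headers do not already declare union semun. sem_change and demo_semaphore are illustrative names. With the value at 0, a second P with IPC_NOWAIT fails with EAGAIN instead of sleeping.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* Must be declared by the application (see the semctl slide). */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* One semop() on member 0 of set id. */
static int sem_change(int id, short delta, short flags)
{
    struct sembuf op;
    op.sem_num = 0;
    op.sem_op = delta;
    op.sem_flg = flags;
    return semop(id, &op, 1);
}

/* Value 1, P, then P with IPC_NOWAIT, then V. Returns the errno of the
   second P and stores the final value in *final. */
int demo_semaphore(int *final)
{
    union semun arg;
    int id, err = 0;

    id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id == -1)
        return -1;
    arg.val = 1;
    semctl(id, 0, SETVAL, arg);              /* binary semaphore: one resource */

    sem_change(id, -1, 0);                   /* P: 1 -> 0, does not block */
    if (sem_change(id, -1, IPC_NOWAIT) == -1)
        err = errno;                         /* semval < |sem_op|: would sleep */
    sem_change(id, +1, 0);                   /* V: 0 -> 1 */

    *final = semctl(id, 0, GETVAL);
    semctl(id, 0, IPC_RMID);
    return err;
}
```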


Page 181: NetWork

Semaphore operations

• If sem_op = 0:
– the calling process wants to wait until the semaphore's value becomes 0.

• If the semaphore's value is currently 0, the function returns immediately.
• If the semaphore's value is nonzero, the following conditions apply.
– If IPC_NOWAIT is specified, return is made with an error of EAGAIN.
– If IPC_NOWAIT is not specified, semzcnt++, and the calling process is suspended until one of the following occurs:
• The semaphore's value becomes 0. semzcnt--
• The semaphore is removed from the system. In this case, the function returns an error of EIDRM.
• A signal is caught by the process and the signal handler returns. The function returns an error of EINTR, and semzcnt--.

Page 182: NetWork

Semval adjustment on process termination

• It is a problem if a process terminates while it has resources allocated through a semaphore.

• Whenever we specify the SEM_UNDO flag for a semaphore operation and we allocate resources (a sem_op value less than 0), the kernel remembers how many resources we allocated from that particular semaphore (the absolute value of sem_op).

• When the process terminates, either voluntarily or involuntarily, the kernel checks whether the process has any outstanding semaphore adjustments and, if so, applies the adjustment to the corresponding semaphore value.

• If we set the value of a semaphore using semctl, with either the SETVAL or SETALL commands, the adjustment value for that semaphore in all processes is set to 0.
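A sketch of SEM_UNDO, under the same assumptions as the other System V examples (demo_sem_undo is an illustrative name). A child takes 2 of 3 resources with SEM_UNDO, reports the value it sees through its exit status, and exits without releasing them; the kernel then applies the adjustment, so the parent sees 3 again.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

union semun { int val; struct semid_ds *buf; unsigned short *array; };

/* Returns the value after the child exits; *in_child gets the value the
   child saw while holding the resources. */
int demo_sem_undo(int *in_child)
{
    union semun arg;
    struct sembuf op;
    int id, status, value;
    pid_t pid;

    id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id == -1)
        return -1;
    arg.val = 3;                         /* three resources */
    semctl(id, 0, SETVAL, arg);

    pid = fork();
    if (pid == -1) {
        semctl(id, 0, IPC_RMID);
        return -1;
    }
    if (pid == 0) {
        op.sem_num = 0;
        op.sem_op = -2;                  /* allocate two */
        op.sem_flg = SEM_UNDO;           /* kernel remembers the 2 */
        semop(id, &op, 1);
        _exit(semctl(id, 0, GETVAL));    /* exit status 1; no release */
    }
    waitpid(pid, &status, 0);
    *in_child = WEXITSTATUS(status);
    value = semctl(id, 0, GETVAL);       /* adjustment applied at exit */
    semctl(id, 0, IPC_RMID);
    return value;
}
```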


Page 183: NetWork

Producer Consumer

/* Producer (assumes union semun is declared and KEY is defined) */
union semun setval, getval;
struct sembuf operations[2];
unsigned short val[1];
int id, retval;

id = semget (KEY, 1, IPC_CREAT | 0666);
setval.val = 2;
semctl (id, 0, SETVAL, setval);

operations[0].sem_num = 0;
operations[0].sem_op = 0;      /* wait until the shelf is empty (value 0) ... */
operations[0].sem_flg = 0;
operations[1].sem_num = 0;
operations[1].sem_op = 10;     /* ... then add 10 objects, in the same atomic semop */
operations[1].sem_flg = 0;
for (;;) {
    retval = semop (id, operations, 2);
    if (retval == 0) {
        printf ("Producer: Adding 10 objects\n");
        getval.array = val;
        semctl (id, 0, GETALL, getval);
        printf ("Sem Val: %d\n", getval.array[0]);
    }
}

/* Consumer */
id = semget (KEY, 1, 0666);
operations[0].sem_num = 0;
operations[0].sem_op = -1;     /* take one object; blocks while the shelf is empty */
operations[0].sem_flg = 0;
for (;;) {
    retval = semop (id, operations, 1);
    if (retval == 0) {
        printf ("Consumer: Getting one object from shelf.\n");
        setval.array = val;
        semctl (id, 0, GETALL, setval);
        printf ("Sem Value: %d\n", setval.array[0]);
    }
}

Page 184: NetWork

Shared Memory

• Shared memory allows two or more processes to share a given region of memory.

• This is the fastest form of IPC, because the data does not need to be copied between the client and the server


Page 185: NetWork

Message Passing

• Takes 4 copies to transfer data between two processes


Page 186: NetWork

Shared Memory

• Takes only two steps
• The kernel is not involved in transferring the data, but it is involved in creating the shared memory

Page 187: NetWork

Memory mapped files

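• #include <sys/mman.h>
• void *mmap(void *addr, size_t len, int prot, int flag, int fd, off_t off);
Returns: starting address of the mapped region if OK, MAP_FAILED on error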

Page 188: NetWork

Memory mapped files

• The prot argument for read-write access is PROT_READ | PROT_WRITE

• The flag argument must specify either MAP_SHARED or MAP_PRIVATE

• MAP_SHARED is used to share memory with other processes

Page 189: NetWork

Why mmap()?

• It makes file handling easy. We open some file and map that file into our process address space. To write or read from file we don’t have to use read(), write() or lseek()

• Another use is to provide shared memory between unrelated processes


Page 190: NetWork

Counter Example

• Closing file has no effect on memory mapping

• Memory mappings are propagated to newly created child
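A sketch of a file-backed counter that shows both points above, assuming a writable /tmp (the path and mapped_counter are illustrative). The descriptor is closed right after mmap(), the child inherits the MAP_SHARED mapping across fork(), and the parent sees the child's update.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Starts the counter at 10, lets a child add 5, returns what the
   parent then reads (15 expected), or -1 on error. */
int mapped_counter(const char *path)
{
    int fd, *counter, result;
    pid_t pid;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, sizeof(int)) == -1) {   /* file must cover the mapping */
        close(fd);
        return -1;
    }
    counter = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                                /* the mapping stays valid */
    if (counter == MAP_FAILED)
        return -1;

    *counter = 10;
    pid = fork();
    if (pid == 0) {                           /* child inherits the mapping */
        *counter += 5;
        _exit(0);
    }
    if (pid > 0)
        waitpid(pid, NULL, 0);
    result = *counter;                        /* sees the child's write */
    munmap(counter, sizeof(int));
    unlink(path);
    return result;
}
```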


Page 191: NetWork

System V Shared Memory

• For every shared memory segment, the kernel maintains the following structure

Page 192: NetWork

System V Shared Memory

• Creating or opening shared memory
– #include <sys/shm.h>
– int shmget(key_t key, size_t size, int flag);
– size is the size of the memory in bytes; it is given as zero if we are referencing an existing shared memory segment
– When a new segment is created, the contents of the segment are initialized with zeros

Page 193: NetWork

Attaching shared memory to a process

• Once a shared memory segment has been created, a process attaches it to its address space by calling shmat.
– #include <sys/shm.h>
– void *shmat(int shmid, const void *addr, int flag);
Returns: pointer to shared memory segment if OK, (void *)-1 on error

• The address in the calling process at which the segment is attached depends on the addr argument

• If addr is 0, the segment is attached at the first available address selected by the kernel. This is the recommended technique.

Page 194: NetWork

Detaching shared memory from a process

• #include <sys/shm.h>
• int shmdt(void *addr);
• This does not remove the identifier and its associated data structure from the system.
• The identifier remains in existence until some process (often a server) specifically removes it by calling shmctl with a command of IPC_RMID.

Page 195: NetWork

shmctl

• #include <sys/shm.h>
• int shmctl(int shmid, int cmd, struct shmid_ds *buf);
• IPC_STAT, IPC_SET: same as for other XSI IPC.
• IPC_RMID: Remove the shared memory segment from the system. The segment is not removed until the last process using the segment terminates or detaches it.
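A sketch of the whole System V shared memory life cycle (create, attach, share with a child, detach, remove), assuming System V IPC is available; IPC_PRIVATE is used and shm_demo is an illustrative name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child writes a string into a new segment; parent reads it back.
   Returns 1 if the parent saw the child's text, -1 on setup failure. */
int shm_demo(void)
{
    int shmid, ok;
    char *p;
    pid_t pid;

    shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);  /* zero-filled */
    if (shmid == -1)
        return -1;
    p = shmat(shmid, NULL, 0);          /* NULL: the kernel picks the address */
    if (p == (void *)-1) {
        shmctl(shmid, IPC_RMID, NULL);
        return -1;
    }
    pid = fork();                       /* child inherits the attachment */
    if (pid == 0) {
        strcpy(p, "from child");
        shmdt(p);
        _exit(0);
    }
    if (pid > 0)
        waitpid(pid, NULL, 0);
    ok = strcmp(p, "from child") == 0;
    shmdt(p);                           /* detaching keeps the segment ... */
    shmctl(shmid, IPC_RMID, NULL);      /* ... until it is removed explicitly */
    return ok;
}
```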


Page 196: NetWork

Memory Mapping of /dev/zero

• Shared memory can be used between unrelated processes. But if the processes are related, some implementations provide a different technique.

• The device /dev/zero is an infinite source of 0 bytes when read. This device also accepts any data that is written to it, ignoring the data.

• An unnamed memory region is created and is initialized to 0.
• Multiple processes can share this region if a common ancestor specifies the MAP_SHARED flag to mmap.

int fd;
void *area;

if ((fd = open("/dev/zero", O_RDWR)) < 0)
    perror("open error");
if ((area = mmap(0, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0)) == MAP_FAILED)
    perror("mmap error");
close(fd);    /* the mapping remains after the descriptor is closed */

Page 197: NetWork

Anonymous Memory Mapping

• A facility similar to the /dev/zero feature. To use this facility, we specify the MAP_ANON flag to mmap and specify the file descriptor as -1.

• The resulting region is anonymous (since it's not associated with a pathname through a file descriptor) and creates a memory region that can be shared with descendant processes.

void *area;

if ((area = mmap(0, SIZE, PROT_READ | PROT_WRITE, MAP_ANON | MAP_SHARED, -1, 0)) == MAP_FAILED)
    perror("mmap error");

Page 198: NetWork

Shared Memory

• Between unrelated processes:
– XSI (System V) shared memory
– mmap can map the same file into the address spaces of other processes using the MAP_SHARED flag.

• Between related processes:
– Memory mapping of /dev/zero
– Anonymous memory mapping

Page 199: NetWork

• Pipes and FIFOs
• System V message queues, semaphores, shared memory
• POSIX message queues, semaphores, shared memory

Page 200: NetWork

Effect of fork, exec, _exit on IPC


Page 201: NetWork

TCP/UDP

Page 202: NetWork

TCP/IP

Page 203: NetWork

TCP or UDP

• At the internet layer, a destination address identifies a host computer; no further distinction is made regarding which process will receive the datagram

• TCP and UDP add a mechanism that distinguishes among destinations within a given host, allowing multiple processes to send and receive datagrams independently

Page 204: NetWork

UDP (User Datagram Protocol)

• UDP provides an unreliable connectionless delivery service

• UDP uses IP to deliver datagrams to the right host.

• UDP uses ports to provide communication services to individual processes.

Page 205: NetWork

Ports

• TCP/IP uses an abstract destination point called a protocol port.

• Ports are identified by a positive integer.

• Operating systems provide some mechanism that processes use to specify a port.

Page 206: NetWork

Port Numbers

• The port numbers are divided into three ranges by the Internet Assigned Numbers Authority (IANA)

• The well-known ports: 0 through 1023. These port numbers are controlled and assigned by the IANA.

• The registered ports: 1024 through 49151. These are not controlled by the IANA, but the IANA registers and lists the uses of these ports as a convenience to the community.

• The dynamic or private ports, 49152 through 65535. The IANA says nothing about these ports. These are what we call ephemeral ports. (The magic number 49152 is three-fourths of 65536.)
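The three IANA ranges can be written as a small classification function; port_range is an illustrative name, and anything above 65535 is rejected because port numbers are 16-bit values.

```c
#include <assert.h>
#include <string.h>

/* Classifies a port number into the IANA ranges listed above. */
const char *port_range(unsigned port)
{
    if (port <= 1023)
        return "well-known";
    if (port <= 49151)
        return "registered";
    if (port <= 65535)
        return "dynamic/private";    /* ephemeral ports */
    return "invalid";                /* does not fit in 16 bits */
}
```

The boundary 49152 is 65536 * 3 / 4, which is where the dynamic range starts.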

Page 207: NetWork

Ports

Page 208: NetWork

UDP header

• Header size is 8 bytes

• Lack of reliability: If a datagram reaches its final destination but the checksum detects an error, or if the datagram is dropped in the network, it is not delivered to the UDP socket and is not automatically retransmitted.

• If we want to be certain that a datagram reaches its destination, we can build lots of features into our application: acknowledgments from the other end, timeouts, retransmissions, and the like.
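A sketch of connectionless UDP delivery over the loopback interface, assuming 127.0.0.1 is available; udp_loopback is an illustrative name. Binding to port 0 lets the kernel pick an ephemeral port, and getsockname() reveals which one. No connection is set up, and nothing here would retransmit a lost datagram.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sends "ping" from one UDP socket to another on 127.0.0.1 and
   receives it into buf. Returns the datagram length, or -1. */
ssize_t udp_loopback(char *buf, size_t len)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    ssize_t n;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    if (rx == -1 || tx == -1)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                          /* ephemeral port */
    bind(rx, (struct sockaddr *)&addr, sizeof addr);
    getsockname(rx, (struct sockaddr *)&addr, &alen);  /* learn the port */

    sendto(tx, "ping", 4, 0, (struct sockaddr *)&addr, sizeof addr);
    n = recvfrom(rx, buf, len - 1, 0, NULL, NULL);     /* one whole datagram */
    if (n >= 0)
        buf[n] = '\0';
    close(rx);
    close(tx);
    return n;
}
```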

Page 209: NetWork

Some standard UDP based services and their ports

Page 210: NetWork

TCP: Transmission Control Protocol

• TCP provides connections between clients and servers.
• TCP uses the connection, not the protocol port, as its fundamental abstraction.
• Connections are identified by a pair of endpoints.
– Endpoint means (ip, port)
• TCP provides:
– Connection-oriented
– Reliable
– Full-duplex
– Byte-Stream

Page 211: NetWork

Connection-Oriented

• Connection oriented means that a virtual connection is established before any user data is transferred.

• A TCP client establishes a connection with a given server, exchanges data with that server across the connection, and then terminates the connection.

• If the connection cannot be established - the user program is notified.

• If the connection is ever interrupted - the user program(s) is notified.
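A sketch of the connection life cycle over loopback, assuming 127.0.0.1 is available; tcp_exchange is an illustrative name, with a forked child as the client and the parent as the server. The client connects, sends two bytes and closes; the server accepts and reads in a loop until end-of-file, because a byte stream may hand data over in arbitrary pieces.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the number of bytes the server received (2 expected). */
ssize_t tcp_exchange(char *buf, size_t len)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    ssize_t n, total = 0;
    int lfd, cfd;
    pid_t pid;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                                 /* ephemeral port */
    if (lfd == -1 || bind(lfd, (struct sockaddr *)&addr, sizeof addr) == -1
        || listen(lfd, 1) == -1)
        return -1;
    getsockname(lfd, (struct sockaddr *)&addr, &alen);

    pid = fork();
    if (pid == 0) {                                    /* client */
        int s = socket(AF_INET, SOCK_STREAM, 0);
        if (connect(s, (struct sockaddr *)&addr, sizeof addr) == -1)
            _exit(1);                                  /* connect() reports failure */
        write(s, "hi", 2);
        close(s);                                      /* terminate the connection */
        _exit(0);
    }
    cfd = accept(lfd, NULL, NULL);                     /* connection established */
    while (cfd != -1 && (size_t)total < len - 1
           && (n = read(cfd, buf + total, len - 1 - total)) > 0)
        total += n;                                    /* read until EOF */
    buf[total] = '\0';
    if (cfd != -1)
        close(cfd);
    close(lfd);
    if (pid > 0)
        waitpid(pid, NULL, 0);
    return total;
}
```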

Page 212: NetWork

Reliable

• TCP also provides reliability. When TCP sends data to the other end, it requires an acknowledgment in return.

• If an acknowledgment is not received, TCP automatically retransmits the data and waits a longer amount of time.

• After some number of retransmissions, TCP will give up
– the total amount of time spent trying to send data is typically between 4 and 10 minutes (depending on the implementation).

Page 213: NetWork

Reliable

• How can TCP provide reliable transfer if the underlying communication system offers only unreliable packet delivery?

• Answer is positive acknowledgement with retransmission.

Page 214: NetWork

Positive Acknowledgement with Retransmission

Page 215: NetWork

Positive Acknowledgement with Retransmission

Page 216: NetWork

Reliability - duplicates

• The underlying packet delivery system may duplicate packets.
  – Duplicates can arise when networks experience high delays that cause premature retransmission.
  – Both packets and acknowledgements can be duplicated.

• Duplicate packets are detected by assigning each packet a sequence number and requiring the receiver to remember which sequence numbers it has received.

• To avoid confusion caused by delayed or duplicated acknowledgements, a TCP acknowledgement specifies the sequence number of the next octet that the receiver expects to receive.
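A minimal sketch of the receiver side, assuming a simplified receiver that accepts only in-order data (real TCP implementations may also queue out-of-order segments). The names `struct receiver` and `receive_segment` are hypothetical. Because the acknowledgement always names the next expected octet, a duplicated segment simply produces the same acknowledgement again:

```c
#include <stdbool.h>
#include <stdint.h>

/* TCP sequence numbers wrap modulo 2^32, so compare via signed difference. */
static bool seq_leq(uint32_t a, uint32_t b) { return (int32_t)(a - b) <= 0; }

struct receiver {
    uint32_t rcv_nxt;   /* next octet expected; this value is what we ACK */
};

/* Processes a segment carrying octets [seq, seq + len).
 * Returns the acknowledgement number to send back; *duplicate is set
 * when the segment carried no new data. */
uint32_t receive_segment(struct receiver *r, uint32_t seq, uint32_t len, bool *duplicate)
{
    uint32_t end = seq + len;

    *duplicate = false;
    if (seq_leq(end, r->rcv_nxt)) {
        *duplicate = true;          /* every octet was already received */
    } else if (seq_leq(seq, r->rcv_nxt)) {
        r->rcv_nxt = end;           /* in order (or overlapping): accept new octets */
    }
    /* otherwise a gap precedes this segment; this simplified receiver drops it */
    return r->rcv_nxt;              /* cumulative ACK: next octet expected */
}
```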

Page 217: NetWork

Byte Stream

• Stream means that the connection is treated as a stream of bytes.
  – If payroll data is being sent, there are no boundaries in the stream differentiating employee records.
• The user application does not need to package data in individual datagrams (as with UDP).
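Because there are no record boundaries, one write() by the sender may arrive as several read()s, and vice versa. A common remedy is a helper that keeps reading until it has exactly n bytes. The sketch below is a hypothetical `readn`, modeled on the helper of the same name in Stevens's UNIX Network Programming:

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads exactly n bytes from fd unless EOF arrives first.
 * Returns the number of bytes read (less than n only at EOF), or -1 on error. */
ssize_t readn(int fd, void *buf, size_t n)
{
    char *p = buf;
    size_t left = n;

    while (left > 0) {
        ssize_t got = read(fd, p, left);
        if (got < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        if (got == 0)
            break;                  /* EOF: the peer closed the connection */
        p += got;
        left -= (size_t) got;
    }
    return (ssize_t) (n - left);
}
```

An application that sends fixed-size records (such as the payroll example) could call `readn(fd, &record, sizeof(record))` to reassemble each record regardless of how TCP segmented the stream.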

Page 218: NetWork

Buffering

• TCP is responsible for buffering data and determining when it is time to send a segment.

• It is possible for an application to tell TCP to send the data it has buffered without waiting for a buffer to fill up.

Page 219: NetWork

Full Duplex

• TCP provides transfer in both directions.
• To the application program these appear as 2 unrelated data streams, although TCP can piggyback control and data communication by providing control information (such as an ACK) along with user data.

Page 220: NetWork

TCP Ports

• Interprocess communication via TCP is achieved with the use of ports (just like UDP).

• UDP ports have no relation to TCP ports (different name spaces).

Page 221: NetWork

TCP Segments

• TCP views the data stream as a sequence of bytes that it divides into segments for transmission. Segments carry varying sizes of data.

• The chunk of data that TCP asks IP to deliver is called a TCP segment.

• Each segment contains:
  – data bytes from the byte stream
  – control information that identifies the data bytes

Page 222: NetWork

TCP Segment Format
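The segment-format figure is not reproduced here. As a hedged sketch, the fixed 20-byte part of the header can be decoded as follows, assuming the standard field layout (ports, sequence number, acknowledgement number, data offset, flags, window, checksum, urgent pointer). Options beyond byte 20 are not parsed, and the struct and function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

struct tcp_header {
    uint16_t src_port, dst_port;
    uint32_t seq;        /* sequence number of the first data byte */
    uint32_t ack;        /* acknowledgement (request) number */
    uint8_t  data_off;   /* header length in 32-bit words (5..15) */
    uint8_t  flags;      /* CWR ECE URG ACK PSH RST SYN FIN, high bit to low */
    uint16_t window;     /* advertised receive window */
    uint16_t checksum;
    uint16_t urg_ptr;
};

enum { TCP_FIN = 0x01, TCP_SYN = 0x02, TCP_RST = 0x04,
       TCP_PSH = 0x08, TCP_ACK = 0x10, TCP_URG = 0x20 };

/* Header fields are big-endian (network byte order) on the wire. */
static uint16_t get16(const uint8_t *p) { return (uint16_t) (p[0] << 8 | p[1]); }
static uint32_t get32(const uint8_t *p)
{
    return (uint32_t) p[0] << 24 | (uint32_t) p[1] << 16 | (uint32_t) p[2] << 8 | p[3];
}

/* Returns 0 on success, -1 if buf is too short or the data offset is invalid. */
int parse_tcp_header(const uint8_t *buf, size_t len, struct tcp_header *h)
{
    if (len < 20)
        return -1;
    h->src_port = get16(buf);
    h->dst_port = get16(buf + 2);
    h->seq      = get32(buf + 4);
    h->ack      = get32(buf + 8);
    h->data_off = buf[12] >> 4;     /* upper 4 bits of byte 12 */
    h->flags    = buf[13];
    h->window   = get16(buf + 14);
    h->checksum = get16(buf + 16);
    h->urg_ptr  = get16(buf + 18);
    if (h->data_off < 5 || (size_t) h->data_off * 4 > len)
        return -1;
    return 0;
}
```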

Page 223: NetWork

TCP Segments

• Segments are exchanged to establish connections, transfer data, send acknowledgements, advertise window sizes, and close connections.

• Because TCP uses piggybacking, an acknowledgement can be sent along with data.
  – An acknowledgement traveling from machine A to machine B may travel in the same segment as data traveling from machine A to machine B, even though the acknowledgement refers to data sent from B to A.

Page 224: NetWork

Flags

• TCP advertises how much data it is willing to accept every time it sends a segment, by specifying its available buffer space in the WINDOW field.

Page 225: NetWork

Sliding Window

• TCP uses a specialized sliding window mechanism to solve two important problems:
  – efficient transmission
  – flow control

• The TCP window mechanism makes it possible to send multiple segments before an acknowledgement arrives.

• The TCP form of a sliding window protocol also solves the end-to-end flow control problem, by allowing the receiver to restrict transmission until it has sufficient buffer space to accommodate more data.

Page 226: NetWork

TCP Sliding Window

• Three markers are maintained, dividing the octets of the stream into four groups. For example:
  – octets up to 2 have been sent and acknowledged,
  – octets 3 through 6 have been sent but not acknowledged,
  – octets 7 through 9 have not been sent but will be sent without delay,
  – octets 10 and higher cannot be sent until the window moves.
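The three markers can be modeled as three numbers: the first unacknowledged octet, the next octet to send, and the window size measured from the first unacknowledged octet. A minimal sketch reproducing the example above (hypothetical names; sequence-number wraparound is ignored for clarity):

```c
#include <stdint.h>

enum octet_state { ACKED, SENT_UNACKED, SENDABLE, BLOCKED };

struct send_window {
    uint32_t snd_una;   /* first octet sent but not yet acknowledged */
    uint32_t snd_nxt;   /* next octet to be sent */
    uint32_t wnd;       /* window size advertised by the receiver */
};

/* Classifies one octet relative to the three markers. */
enum octet_state classify_octet(const struct send_window *w, uint32_t octet)
{
    if (octet < w->snd_una)
        return ACKED;                       /* left of the window */
    if (octet < w->snd_nxt)
        return SENT_UNACKED;                /* in flight */
    if (octet < w->snd_una + w->wnd)
        return SENDABLE;                    /* inside the window, not yet sent */
    return BLOCKED;                         /* right of the window */
}
```

For the example, the window starts at octet 3 and ends before octet 10, so `struct send_window w = {3, 7, 7}`.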

Page 227: NetWork

Variable Window Size and Flow Control

• Each acknowledgement contains a window advertisement that specifies how many additional octets of data the receiver is prepared to accept.

• In response to an increased window advertisement, the sender increases the size of its sliding window

• In response to a decreased window advertisement, the sender decreases the size of its window and stops sending octets beyond the boundary.

• In the extreme case, the receiver advertises a window size of zero to stop all transmissions.
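On each acknowledgement the sender can recompute how much it may still send: the right edge of the window is the acknowledgement number plus the advertised window, and anything up to that edge that has not yet been sent is usable. A minimal sketch with a hypothetical helper, using 32-bit modular arithmetic for wraparound:

```c
#include <stdint.h>

/* Octets the sender may still transmit, given the latest acknowledgement
 * number, the advertised window, and the next octet to send. */
uint32_t usable_window(uint32_t ack, uint32_t wnd, uint32_t snd_nxt)
{
    uint32_t right_edge = ack + wnd;                 /* first octet beyond the window */
    int32_t usable = (int32_t) (right_edge - snd_nxt);
    return usable > 0 ? (uint32_t) usable : 0;       /* zero or shrunk window: send nothing */
}
```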

Page 228: NetWork

TCP Connection Establishment

• TCP uses a three-way handshake, which accomplishes two important functions:
  – It guarantees that both sides are ready to transfer data (and that they know they are both ready).
  – It allows both sides to agree on initial sequence numbers.
• Sequence numbers are sent and acknowledged during the handshake. Each machine must choose an initial sequence number at random that it will use to identify bytes in the stream it is sending.

Page 229: NetWork

TCP Connection Establishment

• When a client requests a connection, it sends a “SYN” segment (a special TCP segment) to the server port.

• SYN stands for synchronize. The SYN message includes the client’s ISN.

• ISN stands for Initial Sequence Number.

Page 230: NetWork

TCP Connection Establishment

• Every TCP segment includes a Sequence Number that refers to the first byte of data included in the segment.

• Every TCP segment includes a Request Number (Acknowledgement Number) that indicates the byte number of the next data that is expected to be received.
  – All bytes before this number have already been received.

Page 231: NetWork

TCP Connection Establishment

• A server accepts a connection.
  – It must be looking for new connections!

• A client requests a connection.
  – It must know where the server is!

Page 232: NetWork

Client Starts

• A client starts by sending a SYN segment with the following information:
  – Client’s ISN (generated pseudo-randomly)
  – Maximum Receive Window for the client
  – Optionally (but usually) MSS (the largest segment the client will accept)
  – No payload! (Only TCP headers)

Page 233: NetWork

Server Response

• When a waiting server sees a new connection request, the server sends back a SYN segment (with the ACK flag also set) containing:
  – Server’s ISN (generated pseudo-randomly)
  – Request Number equal to the client’s ISN+1
  – Maximum Receive Window for the server
  – Optionally (but usually) MSS
  – No payload! (Only TCP headers)

Page 234: NetWork

Finally

• When the server’s SYN is received, the client sends back an ACK with:
  – Request Number equal to the server’s ISN+1
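The sequence and request numbers carried by the three segments follow directly from the two ISNs. A minimal sketch with a hypothetical `struct segment` holding only the fields relevant here; the SYN itself consumes one sequence number, which is why each side acknowledges the other's ISN+1:

```c
#include <stdint.h>

struct segment {
    int syn, ack_flag;   /* SYN and ACK flags */
    uint32_t seq;        /* sequence number */
    uint32_t ack;        /* request (acknowledgement) number, valid if ack_flag */
};

/* Fills out[0..2] with the SYN, SYN+ACK and ACK of a three-way handshake. */
void three_way_handshake(uint32_t client_isn, uint32_t server_isn, struct segment out[3])
{
    out[0] = (struct segment) { 1, 0, client_isn, 0 };                  /* client -> server: SYN */
    out[1] = (struct segment) { 1, 1, server_isn, client_isn + 1 };     /* server -> client: SYN+ACK */
    out[2] = (struct segment) { 0, 1, client_isn + 1, server_isn + 1 }; /* client -> server: ACK */
}
```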

Page 235: NetWork

TCP Connection Establishment

Page 236: NetWork

TCP Connection Establishment

Page 237: NetWork

TCP Connection Establishment

• Why is the third message necessary?
  – HINTS:
    • TCP is a reliable service.
    • IP delivers each TCP segment.
    • IP is not reliable.

• Why doesn’t each connection start with initial sequence number 1?

Page 238: NetWork

TCP Options

• MSS option. With this option, the TCP sending the SYN announces its maximum segment size: the maximum amount of data that it is willing to accept in each TCP segment on this connection.

• Window scale option. The maximum window that either TCP can advertise to the other TCP is 65,535, because the Window field is 16 bits. This option specifies that the advertised window in the TCP header must be scaled (left-shifted) by 0–14 bits, providing a maximum window of almost one gigabyte (65,535 × 2^14 = 1,073,725,440 bytes).

• Timestamp option. This option is needed for high-speed connections to prevent possible data corruption caused by old, delayed, or duplicated segments.
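The window-scale arithmetic as a minimal sketch: the 16-bit value in the header is shifted left by the negotiated count. The cap at 14 follows RFC 7323, which says a larger received shift count is treated as 14; the helper name is hypothetical.

```c
#include <stdint.h>

/* Effective window in bytes for an advertised 16-bit window and a shift count. */
uint32_t effective_window(uint16_t advertised, unsigned shift)
{
    if (shift > 14)
        shift = 14;                          /* RFC 7323: use 14 for larger values */
    return (uint32_t) advertised << shift;   /* 65535 << 14 still fits in 32 bits */
}
```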

Page 239: NetWork

TCP Buffers

• Both the client and server allocate buffers to hold incoming and outgoing data.
  – The TCP layer does this.

• Both the client and server announce with every ACK how much buffer space remains (the Window field in a TCP segment).
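An application can inspect the buffer sizes allocated for a socket through the standard `SO_RCVBUF` and `SO_SNDBUF` socket options. A minimal sketch with a hypothetical helper; note that these report the total buffer size, whereas the Window field advertises only the space currently free, and some kernels adjust values set through these options (Linux, for example, doubles them):

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Creates a TCP socket and reads its receive and send buffer sizes.
 * Returns 0 on success, -1 on error. */
int tcp_buffer_sizes(int *rcv, int *snd)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    socklen_t len = sizeof(*rcv);
    int rc = getsockopt(fd, SOL_SOCKET, SO_RCVBUF, rcv, &len);
    if (rc == 0) {
        len = sizeof(*snd);
        rc = getsockopt(fd, SOL_SOCKET, SO_SNDBUF, snd, &len);
    }
    close(fd);
    return rc;
}
```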

Page 240: NetWork

Send Buffers

• The application gives the TCP layer some data to send.
• The data is put in a send buffer, where it stays until the data is ACK’d.
  – It has to stay, as it might need to be sent again!

• The TCP layer won’t accept data from the application unless (or until) there is buffer space.
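When the send buffer is full, a blocking write() simply waits; on a nonblocking socket the kernel instead refuses the data with EAGAIN/EWOULDBLOCK. A minimal sketch that makes this visible by writing until the buffer fills (the helper is hypothetical; on a TCP socket this happens once the local send buffer is full because the peer is not reading, and the same behavior can be observed on an AF_UNIX stream socket pair):

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Switches fd to nonblocking mode and writes 1 KB chunks until the kernel
 * refuses more data. Returns the total bytes accepted, or -1 on error. */
long fill_send_buffer(int fd)
{
    char chunk[1024];
    long total = 0;
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    memset(chunk, 'x', sizeof(chunk));

    for (;;) {
        ssize_t n = write(fd, chunk, sizeof(chunk));
        if (n > 0) {
            total += n;                 /* copied into the send buffer */
            continue;
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;               /* no buffer space left */
        return -1;                      /* any other error */
    }
}
```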

Page 241: NetWork

Connection Termination

• The TCP layer can send a RST segment that terminates a connection if something is wrong.

• Usually the application tells TCP to terminate the connection gracefully with a FIN segment.

Page 242: NetWork

Connection Termination

Page 243: NetWork

FIN

• Either end of the connection can initiate termination.
• A FIN is sent, which means the application is done sending data.
• The FIN is ACK’d.
• The other end must now send a FIN.
• That FIN must be ACK’d.
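The sockets API exposes this sequence through shutdown(). A minimal sketch, assuming a connected stream socket: shutdown(fd, SHUT_WR) makes TCP send a FIN while the read half stays open, and read() returns 0 once the peer's FIN arrives (the helper name is hypothetical):

```c
#include <sys/socket.h>
#include <unistd.h>

/* Half-closes fd (we are done sending), then drains whatever the peer still
 * sends until its FIN. Returns the number of bytes drained, or -1 on error. */
long half_close_and_drain(int fd)
{
    char buf[512];
    long total = 0;
    ssize_t n;

    if (shutdown(fd, SHUT_WR) < 0)      /* TCP sends our FIN */
        return -1;
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        total += n;                     /* peer may keep sending after our FIN */
    return n < 0 ? -1 : total;          /* n == 0: the peer's FIN arrived (EOF) */
}
```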

Page 244: NetWork

Connection Termination

Page 245: NetWork

TCP Connection State Diagram

• There are 11 different states defined for a connection; the transitions between them depend on the current state and the segment received in that state.

• One reason for showing the state transition diagram is to show the 11 TCP states with their names. These states are displayed by netstat, which is a useful tool when debugging client/server applications
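A minimal sketch of the 11 states as a C enum, with names in the style used by Stevens and BSD netstat (other tools may spell some differently, e.g. SYN_RECV on Linux). Only the connection-termination transitions are modeled, and the event names are hypothetical; for instance, a FIN and ACK arriving together in FIN_WAIT_1 is not handled:

```c
/* The 11 TCP connection states. */
enum tcp_state {
    CLOSED, LISTEN, SYN_SENT, SYN_RCVD, ESTABLISHED,
    FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT, CLOSE_WAIT, LAST_ACK,
    NUM_TCP_STATES
};

static const char *const tcp_state_names[NUM_TCP_STATES] = {
    "CLOSED", "LISTEN", "SYN_SENT", "SYN_RCVD", "ESTABLISHED",
    "FIN_WAIT_1", "FIN_WAIT_2", "CLOSING", "TIME_WAIT", "CLOSE_WAIT", "LAST_ACK"
};

enum tcp_event { APP_CLOSE, RCV_FIN, RCV_ACK_OF_FIN, TIMER_2MSL };

/* Termination part of the state diagram; unhandled events leave the state unchanged. */
enum tcp_state close_transition(enum tcp_state s, enum tcp_event e)
{
    switch (s) {
    case ESTABLISHED:
        if (e == APP_CLOSE) return FIN_WAIT_1;      /* active close: send FIN */
        if (e == RCV_FIN)   return CLOSE_WAIT;      /* passive close: ACK the FIN */
        break;
    case FIN_WAIT_1:
        if (e == RCV_ACK_OF_FIN) return FIN_WAIT_2;
        if (e == RCV_FIN)        return CLOSING;    /* simultaneous close */
        break;
    case FIN_WAIT_2: if (e == RCV_FIN)        return TIME_WAIT; break;
    case CLOSING:    if (e == RCV_ACK_OF_FIN) return TIME_WAIT; break;
    case TIME_WAIT:  if (e == TIMER_2MSL)     return CLOSED;    break;
    case CLOSE_WAIT: if (e == APP_CLOSE)      return LAST_ACK;  break;  /* send our FIN */
    case LAST_ACK:   if (e == RCV_ACK_OF_FIN) return CLOSED;    break;
    default: break;
    }
    return s;
}
```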

Page 246: NetWork
Page 247: NetWork
Page 248: NetWork

What is the purpose of TIME_WAIT?

• Once a TCP connection has been terminated (the last ACK sent) there is some unfinished business:
  – What if the ACK is lost? The last FIN will be resent and it must be ACK’d.
  – What if lost or duplicated segments from the previous connection finally arrive after a long delay and reach a new incarnation of that connection (same socket pair)?
• The MSL (maximum segment lifetime) is the maximum amount of time that any given IP datagram can live in a network. The end that performs the active close stays in TIME_WAIT for twice the MSL (2MSL), long enough to handle both cases.

Page 249: NetWork

Socket Pair

• The socket pair for a TCP connection is the four-tuple that defines the two endpoints of the connection:
  – the local IP address, local port, foreign IP address, and foreign port.
• A socket pair uniquely identifies every TCP connection on a network.
• The two values that identify each endpoint, an IP address and a port number, are often called a socket.
• We can extend the concept of a socket pair to UDP, even though UDP is connectionless.

Page 250: NetWork

Socket Pair

Page 251: NetWork
Page 252: NetWork
Page 253: NetWork
Page 254: NetWork

Writing to TCP Socket

Page 255: NetWork

Writing to UDP Socket

Page 256: NetWork

Sockets


Page 257: NetWork

TCP/IP Model

Page 258: NetWork

TCP/IP

• TCP/IP does not include an API definition.
• There are a variety of APIs for use with TCP/IP:
  – Sockets
  – TLI, XTI
  – Winsock
  – MacTCP

Page 259: NetWork

Functions needed:

• Specify local and remote communication endpoints
• Initiate a connection
• Wait for incoming connection
• Send and receive data
• Terminate a connection gracefully
• Error handling

Page 260: NetWork

Berkeley Sockets

• Generic:
  – support for multiple protocol families
  – address representation independence

• Uses existing I/O programming interface as much as possible.
  – The socket API is similar to file I/O.

Page 261: NetWork

Socket

• A socket is an abstract representation of a communication endpoint.

• Sockets work with Unix I/O services just like files, pipes & FIFOs.

• Sockets (obviously) have special needs over files:
  – establishing a connection
  – specifying communication endpoint addresses

Page 262: NetWork

Unix Descriptor Table

Page 263: NetWork

Socket Descriptor Data Structure

Page 264: NetWork

Creating a Socket

int socket(int family, int type, int protocol);

• family specifies the protocol family (AF_INET for TCP/IP).

• type specifies the type of service (SOCK_STREAM, SOCK_DGRAM).

• protocol specifies the specific protocol (usually 0, which means the default).

Page 265: NetWork

socket()

• The socket() system call returns a socket descriptor (small integer) or -1 on error.

• socket() allocates resources needed for a communication endpoint - but it does not deal with endpoint addressing.
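A minimal sketch of the call with error reporting (the `make_socket` helper is hypothetical). With protocol 0 the kernel picks the default protocol for the type, TCP for SOCK_STREAM and UDP for SOCK_DGRAM; no address is attached yet, which is bind()'s job:

```c
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Creates an IPv4 endpoint of the given type (SOCK_STREAM or SOCK_DGRAM).
 * Returns the new descriptor, or -1 with errno set. */
int make_socket(int type)
{
    int fd = socket(AF_INET, type, 0);   /* protocol 0: default for the type */
    if (fd < 0)
        perror("socket");                /* e.g. EMFILE: descriptor table full */
    return fd;
}
```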

Page 266: NetWork

Specifying an Endpoint Address

• Remember that the sockets API is generic.
• There must be a generic way to specify endpoint addresses.
• TCP/IP requires an IP address and a port number for each endpoint address.
• Other protocol suites (families) may use other schemes.

Page 267: NetWork

Necessary Background Information: POSIX data types

int8_t     signed 8-bit int
uint8_t    unsigned 8-bit int
int16_t    signed 16-bit int
uint16_t   unsigned 16-bit int
int32_t    signed 32-bit int
uint32_t   unsigned 32-bit int

u_char, u_short, u_int, u_long

Page 268: NetWork

More POSIX data types

sa_family_t   address family
socklen_t     length of struct
in_addr_t     IPv4 address
in_port_t     IP port number

Page 269: NetWork

Generic socket addresses

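struct sockaddr {
    uint8_t     sa_len;       /* length of structure; BSD-derived systems only */
    sa_family_t sa_family;    /* address family, e.g. AF_INET */
    char        sa_data[14];  /* protocol-specific address */
};

• sa_family specifies the address type.
• sa_data specifies the address value.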

Page 270: NetWork

AF_INET

• For AF_INET we need:
  – a 16-bit port number
  – a 32-bit IP address

Page 271: NetWork

struct sockaddr_in (IPv4)

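struct sockaddr_in {
    uint8_t        sin_len;      /* length of structure; BSD-derived systems only */
    sa_family_t    sin_family;   /* AF_INET */
    in_port_t      sin_port;     /* 16-bit port number, network byte order */
    struct in_addr sin_addr;     /* 32-bit IPv4 address, network byte order */
    char           sin_zero[8];  /* unused */
};

A special kind of sockaddr structure, used for IPv4 sockets.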

Page 272: NetWork

struct in_addr

struct in_addr {
    in_addr_t s_addr;   /* 32-bit IPv4 address, network byte order */
};

Page 273: NetWork

Byte Order

Page 274: NetWork

Network Byte Order

• Network communication uses big-endian byte order, also known as Network Byte Order (NBO).

• All values stored in a sockaddr_in must be in network byte order.
  – sin_port: a TCP/IP port number
  – sin_addr: an IP address

Page 275: NetWork

Network Byte Order Functions

‘h’: host byte order    ‘n’: network byte order
‘s’: short (16 bit)     ‘l’: long (32 bit)

uint16_t htons(uint16_t);    uint16_t ntohs(uint16_t);

uint32_t htonl(uint32_t);    uint32_t ntohl(uint32_t);
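A minimal sketch of why these functions exist (the helper names are hypothetical): the host's own in-memory layout of a 16-bit value depends on its endianness, while the bytes produced by htons() always start with the most significant byte:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte first. */
int host_is_little_endian(void)
{
    uint16_t value = 0x0102;
    unsigned char lowest;
    memcpy(&lowest, &value, 1);     /* byte at the lowest address */
    return lowest == 0x02;
}

/* Stores port in buf as it travels on the wire: most significant byte first. */
void port_to_wire(uint16_t port, unsigned char buf[2])
{
    uint16_t net = htons(port);     /* no-op on big-endian hosts, byte swap otherwise */
    memcpy(buf, &net, sizeof(net));
}
```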

Page 276: NetWork

TCP/IP Addresses

• We don’t need to deal with sockaddr structures since we will only deal with a real protocol family.

• We can use sockaddr_in structures.

BUT: The C functions that make up the sockets API expect structures of type sockaddr.

Page 277: NetWork
Page 278: NetWork

Assigning an address to a socket

• The bind() system call is used to assign an address to an existing socket.

int bind(int sockfd, const struct sockaddr *myaddr, socklen_t addrlen);

• bind returns 0 if successful or -1 on error. Note that myaddr is const: bind() does not modify the address structure.

Page 279: NetWork

bind()

• calling bind() assigns the address specified by the sockaddr structure to the socket descriptor.

• You can give bind() a sockaddr_in structure: bind( mysock, (struct sockaddr*) &myaddr, sizeof(myaddr) );

Page 280: NetWork

bind() Example

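#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

int mysock, err;
struct sockaddr_in myaddr;

mysock = socket(PF_INET, SOCK_STREAM, 0);
memset(&myaddr, 0, sizeof(myaddr));         /* clear sin_zero */
myaddr.sin_family = AF_INET;
myaddr.sin_port = htons(portnum);
myaddr.sin_addr.s_addr = htonl(ipaddress);  /* sin_addr is a struct */

err = bind(mysock, (struct sockaddr *) &myaddr, sizeof(myaddr));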

Page 281: NetWork

Uses for bind()

• There are a number of uses for bind():
  – A server would like to bind to a well-known address (port number).
  – A client can bind to a specific port.
  – A client can ask the O.S. to assign any available port number.

Page 282: NetWork

IPv4 Address Conversion

int inet_aton(const char *, struct in_addr *);

Convert ASCII dotted-decimal IP address to network byte order 32 bit value. Returns 1 on success, 0 on failure.

char *inet_ntoa(struct in_addr);

Convert network byte ordered value to ASCII dotted-decimal (a string).
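A short round trip through both functions, as a sketch (the `show_address` helper is hypothetical). inet_ntoa() returns a pointer to a static buffer, so its result must be used or copied before the next call:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>

/* Parses a dotted-decimal string and prints it back. Returns 0 on success. */
int show_address(const char *text)
{
    struct in_addr addr;

    if (inet_aton(text, &addr) == 0) {          /* 0 means the string is invalid */
        fprintf(stderr, "invalid address: %s\n", text);
        return -1;
    }
    printf("%s -> 0x%08x (host order) -> %s\n",
           text, (unsigned) ntohl(addr.s_addr), inet_ntoa(addr));
    return 0;
}
```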

Page 283: NetWork

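TCP Client/Server

Server: socket() → bind() (“well-known” port) → listen() → accept() (blocks until a client connects) → read() → write() → read() → close()

Client: socket() → connect() (connection establishment, the “handshake” with the server’s accept()) → write() → read() → close()

The client’s write() sends the data (request), which the server’s read() receives; the server’s write() sends the data (reply), which the client’s read() receives. The client’s close() produces an end-of-file at the server’s final read(), after which the server calls close().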

Page 284: NetWork

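TCP Client

sd = socket(family, type, protocol);
  – family: PF_INET, PF_INET6, PF_UNIX, PF_X25
  – type: SOCK_STREAM, SOCK_DGRAM, SOCK_RAW
  – protocol: usually 0 (RAW sockets typically specify a protocol)

connect(sd, server_addr, addr_len);
  – server_addr holds the address family, the server’s port # and IP address
  – the kernel picks an ephemeral local port and the local IP address (via routing), then performs the three-way handshake

read(sd, buff, nbytes);
write(sd, buff, nbytes);

close(sd);
  – starts the disconnect sequence

connect() actions:
1. check that the socket is valid
2. fill in the remote endpoint address/port
3. choose the local endpoint address/port
4. initiate the 3-way handshake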

Page 285: NetWork

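TCP Server

sd = socket(family, type, protocol);

bind(sd, server_addr, len);
  – server_addr: address family, well-known port #, IP address (usually INADDR_ANY)

listen(sd, backlog);
  1. turns sd from an active into a passive (listening) socket
  2. backlog sets the queue length

ssd = accept(sd, cliaddr, &len);
  – sd remains the LISTEN socket; ssd is a new CONNECTED socket for this client, returned after the three-way handshake completes
  – cliaddr receives the client’s address family, port and IP address

read(ssd, buff, nbytes);
write(ssd, buff, nbytes);

close(ssd);
  – starts the disconnect sequence
  – closes the socket for reading/writing; non-blocking, attempts to send any unsent data
  – with the socket option SO_LINGER, it can instead block until the data is sent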

Page 286: NetWork

socket() Create a socket

• family is one of
  – PF_INET (IPv4), PF_INET6 (IPv6), PF_LOCAL (local Unix),
  – PF_ROUTE (access to routing tables), PF_KEY (key management, for IPsec)

• type is one of
  – SOCK_STREAM (TCP), SOCK_DGRAM (UDP)
  – SOCK_RAW (for special IP packets, PING, etc.; must be root)

• protocol is 0 (used for some raw socket options)
• Upon success, returns a socket descriptor
  – an integer, like a file descriptor
  – returns -1 on failure

int socket(int family, int type, int protocol);

Page 287: NetWork

connect() Connect to server

• sockfd is the socket descriptor from socket()
• servaddr is a pointer to a structure with:
  – port number and IP address
  – must be specified (unlike bind())

• addrlen is the length of the structure
• The client doesn’t need bind()
  – the OS will pick an ephemeral port
• Returns 0 if OK, -1 on error

int connect(int sockfd, const struct sockaddr *servaddr, socklen_t addrlen);
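Putting the client-side calls together, a minimal sketch of a hypothetical `tcp_connect_ipv4` helper. It parses the address with inet_pton(), as the echo client later in these notes does; no bind() is needed, so the kernel picks the local IP address and an ephemeral port:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connects to ip:port over TCP. Returns the connected descriptor, or -1. */
int tcp_connect_ipv4(const char *ip, uint16_t port)
{
    struct sockaddr_in servaddr;

    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(port);                 /* network byte order */
    if (inet_pton(AF_INET, ip, &servaddr.sin_addr) != 1)
        return -1;                                   /* not a dotted-decimal address */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    /* connect() performs the three-way handshake; it returns 0 on success. */
    if (connect(fd, (struct sockaddr *) &servaddr, sizeof(servaddr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```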

Page 288: NetWork

bind() Assign a local protocol address (“name”) to a socket

• sockfd is the socket descriptor from socket()
• myaddr is a pointer to an address struct with:
  – port number and IP address
  – if the port is 0, then
    • the host will pick an ephemeral port (very rare for a server)
    • How do you know the assigned port number?
  – if the IP address is the wildcard INADDR_ANY (multiple net cards)
    • the host kernel will choose the IP address
    • INADDR_ constants are defined in <netinet/in.h>
    • INADDR_ constants are in host byte order => htonl(INADDR_ANY)

• addrlen is the length of the structure
• Returns 0 if OK, -1 on error
  – EADDRINUSE (“Address already in use”)

int bind(int sockfd, const struct sockaddr *myaddr, socklen_t addrlen);
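The questions above (wildcard address, port 0, and how to learn the assigned port) can be answered in code. A minimal sketch of a hypothetical `tcp_listen_any` helper. It also sets SO_REUSEADDR, a standard option that lets a restarted server bind its port while earlier connections still sit in TIME_WAIT, which would otherwise make bind() fail with EADDRINUSE:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Creates a listening TCP socket on INADDR_ANY. Port 0 lets the kernel pick
 * an ephemeral port; the port actually bound is stored in *bound_port.
 * Returns the listening descriptor, or -1 on error. */
int tcp_listen_any(uint16_t port, int backlog, uint16_t *bound_port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    /* Best effort: allow rebinding while old connections are in TIME_WAIT. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   /* wildcard: any local interface */
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0 ||
        getsockname(fd, (struct sockaddr *) &addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *bound_port = ntohs(addr.sin_port);         /* the port the kernel assigned */
    return fd;
}
```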

Page 289: NetWork

bind() address and port

IP address specified   Port specified   Result
wildcard               0                kernel chooses IP addr and port
wildcard               nonzero          kernel chooses IP, process specifies port
local IP addr          0                process specifies IP, kernel chooses port
local IP addr          nonzero          process specifies IP and port

Wildcard specified as INADDR_ANY

Page 290: NetWork

listen() Change socket state to TCP server

• Sockets default to active (for a client)
  – change to passive so the OS will accept connections

• sockfd is the socket descriptor from socket()
• backlog is the maximum number of connections that the server should queue for this socket
  – historically 5
  – rarely above 15, even on a moderately busy Web server!

int listen(int sockfd, int backlog);

Page 291: NetWork

listen()

Page 292: NetWork

listen()

• Possibility of SYN flooding attack

Page 293: NetWork

accept() Return next completed connection

• sockfd is the socket descriptor from socket()
• cliaddr and addrlen return the protocol address of the client
• Returns a brand new descriptor, created by the OS
• If used with fork(), it can create a concurrent server

int accept(int sockfd, struct sockaddr *cliaddr, socklen_t *addrlen);

Page 294: NetWork

read() and write()

ssize_t read(int sockfd, void *buff, size_t nbytes);
ssize_t write(int sockfd, const void *buff, size_t nbytes);

• Reading and writing data (for TCP, bytes of the stream rather than packets)
• Both are system calls

Page 295: NetWork

close() Close socket for use

• sockfd is the socket descriptor from socket()
• Closes the socket for reading/writing
  – returns immediately (doesn’t block)
  – attempts to send any unsent data
  – socket option SO_LINGER:
    • block until data sent
    • or discard any remaining data
  – returns -1 on error

int close(int sockfd);

Page 296: NetWork

Descriptor Reference Counts

• For every socket a reference count is maintained of how many descriptors refer to it (for example, in both a parent and a child process after fork()).

• When close() is called on a socket descriptor, the reference count is decreased by 1.

• The TCP four-segment termination sequence is initiated only when the reference count goes to zero.

Page 297: NetWork

getsockname() and getpeername() Functions

• getsockname returns the local endpoint address associated with a socket.

• getpeername returns the foreign protocol address associated with a socket.

#include <sys/socket.h>

int getsockname(int sockfd, struct sockaddr *localaddr, socklen_t *addrlen);
int getpeername(int sockfd, struct sockaddr *peeraddr, socklen_t *addrlen);
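These two calls are enough to print the socket pair (four-tuple) of a connected TCP socket. A minimal sketch with a hypothetical `format_socket_pair` helper; for a socket that is not connected, getpeername() fails (typically with ENOTCONN):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Writes "localip:port <-> peerip:port" for a connected IPv4 socket.
 * Returns 0 on success, -1 on error. */
int format_socket_pair(int fd, char *out, size_t outlen)
{
    struct sockaddr_in local, peer;
    socklen_t len = sizeof(local);
    char lip[INET_ADDRSTRLEN], pip[INET_ADDRSTRLEN];

    if (getsockname(fd, (struct sockaddr *) &local, &len) < 0)
        return -1;
    len = sizeof(peer);
    if (getpeername(fd, (struct sockaddr *) &peer, &len) < 0)
        return -1;                          /* e.g. ENOTCONN: no peer */
    if (inet_ntop(AF_INET, &local.sin_addr, lip, sizeof(lip)) == NULL ||
        inet_ntop(AF_INET, &peer.sin_addr, pip, sizeof(pip)) == NULL)
        return -1;
    snprintf(out, outlen, "%s:%u <-> %s:%u",
             lip, (unsigned) ntohs(local.sin_port),
             pip, (unsigned) ntohs(peer.sin_port));
    return 0;
}
```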

Page 298: NetWork

getsockname()

• In a TCP client that does not call bind, getsockname returns the local IP address and local port number assigned to the connection by the kernel.

• After calling bind with a port number of 0, getsockname returns the local port number that was assigned.

• getsockname can be called to obtain the address family of a socket.

• In a TCP server that binds the wildcard IP address, once a connection is established with a client (accept returns successfully), the server can call getsockname to obtain the local IP address assigned to the connection.

Page 299: NetWork

getpeername()

• When a server is execed by the process that calls accept, the only way the server can obtain the identity of the client is to call getpeername

• inetd server works by execing the respective server’s image

Page 300: NetWork

getpeername() : inetd

Page 301: NetWork

TCP Echo Client

#include "unp.h"   /* assumed: Stevens-style wrappers (Socket, Connect, err_quit, ...), SERV_PORT, SA */

int
main(int argc, char **argv)
{
    int sockfd;
    struct sockaddr_in servaddr;

    if (argc != 2)
        err_quit("usage: tcpcli <IPaddress>");

    sockfd = Socket(PF_INET, SOCK_STREAM, 0);

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(SERV_PORT);
    Inet_pton(AF_INET, argv[1], &servaddr.sin_addr);

    Connect(sockfd, (SA *) &servaddr, sizeof(servaddr));

    str_cli(stdin, sockfd);
    exit(0);
}

Page 302: NetWork

str_cli function

void
str_cli(FILE *fp, int sockfd)
{
    char sendline[MAXLINE], recvline[MAXLINE];

    while (Fgets(sendline, MAXLINE, fp) != NULL) {
        Write(sockfd, sendline, strlen(sendline));

        /* Readline null-terminates recvline for Fputs; a plain Read
         * may return a partial line without a terminating '\0'. */
        if (Readline(sockfd, recvline, MAXLINE) == 0)
            err_quit("str_cli: server terminated prematurely");

        Fputs(recvline, stdout);
    }
}

Page 303: NetWork

TCP Concurrent Server

Page 304: NetWork

TCP Concurrent Server

int
main(int argc, char **argv)
{
    int listenfd, connfd;
    pid_t childpid;
    socklen_t clilen;
    struct sockaddr_in cliaddr, servaddr;

    listenfd = Socket(AF_INET, SOCK_STREAM, 0);

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(SERV_PORT);

    Bind(listenfd, (SA *) &servaddr, sizeof(servaddr));

    Listen(listenfd, LISTENQ);

    for ( ; ; ) {
        clilen = sizeof(cliaddr);
        connfd = Accept(listenfd, (SA *) &cliaddr, &clilen);

        if ( (childpid = Fork()) == 0) {   /* child process */
            Close(listenfd);               /* close listening socket */
            str_echo(connfd);              /* process the request */
            exit(0);
        }
        Close(connfd);                     /* parent closes connected socket */
    }
}

Page 305: NetWork

str_echo function

void
str_echo(int sockfd)
{
    ssize_t n;
    char buf[MAXLINE];

again:
    while ( (n = read(sockfd, buf, MAXLINE)) > 0)
        Write(sockfd, buf, n);

    if (n < 0 && errno == EINTR)
        goto again;
    else if (n < 0)
        err_sys("str_echo: read error");
}

Page 306: NetWork

TCP Concurrent Server

• Handling zombies
  – call while ( (pid = waitpid(-1, &stat, WNOHANG)) > 0) in the SIGCHLD signal handler

• Handling interrupted system calls
  – when writing network programs that catch signals, we must be cognizant of interrupted system calls, and we must handle them
  – a slow system call is any system call that can block forever

Page 307: NetWork

Handling interrupted system calls

for ( ; ; ) {
    clilen = sizeof(cliaddr);
    if ( (connfd = accept(listenfd, (SA *) &cliaddr, &clilen)) < 0) {
        if (errno == EINTR)
            continue;       /* back to for () */
        else
            err_sys("accept error");
    }
    /* ... */
}

Page 308: NetWork

Connection Abort before accept Returns

Page 309: NetWork

Connection Abort before accept Returns

• SVR4 and POSIX return an error of EPROTO or ECONNABORTED

• Berkeley-derived kernels handle the aborted connection completely within the kernel, so the server process never sees an error.

Page 310: NetWork

Termination of Server Process

• A FIN is sent to the client.
• The client TCP sends an ACK to the server.
• What if the client application doesn’t take note of it, and sends data to the server?

Page 311: NetWork

SIGPIPE Signal

• When a process writes to a socket that has received an RST, the SIGPIPE signal is sent to the process. The default action of this signal is to terminate the process, so the process must catch or ignore the signal to avoid being involuntarily terminated.

Page 312: NetWork

Crashing of Server Host

• Nothing is sent to the client.
• The client will keep trying to reach the host, but will eventually get errors such as ETIMEDOUT, EHOSTUNREACH, or ENETUNREACH.

Page 313: NetWork

Crashing and Rebooting of Server Host

• The rebooted server host has lost all knowledge of the connection, so when the client sends packets, the server responds with an RST.

Page 314: NetWork

Shutdown of Server Host

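• init sends SIGTERM to all processes.
• Then it sends SIGKILL to all processes that are still running.
• When the server process terminates, its open descriptors are closed, so a FIN is sent to the client.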

Page 315: NetWork

I/O Multiplexing


Page 316: NetWork

I/O Multiplexing

• We often need to be able to monitor multiple descriptors:
  – a generic TCP client (like telnet)
  – the need to handle unexpected situations, perhaps a server that shuts down without warning
  – a server that handles both TCP and UDP

Page 317: NetWork

Example - generic TCP client

• Input from standard input should be sent to a TCP socket.

• Input from a TCP socket should be sent to standard output.

• How do we know when to check for input from each source?

Page 318: NetWork

Generic TCP Client

Figure: the client copies STDIN to a TCP SOCKET and copies the TCP SOCKET to STDOUT.

Page 319: NetWork

Different Solutions

• Use nonblocking I/O.
  – use fcntl() to set O_NONBLOCK

• Use alarm and signal handler to interrupt slow system calls.

• Use multiple processes/threads.
• Use functions that support checking of multiple input sources at the same time.

Page 320: NetWork

Non blocking I/O

• Use fcntl() to set O_NONBLOCK:

int flags;
flags = fcntl(sock, F_GETFL, 0);
fcntl(sock, F_SETFL, flags | O_NONBLOCK);

• Now, when no data is available, calls to read() (and other system calls) return -1 and set errno to EWOULDBLOCK instead of blocking.

Page 321: NetWork

while (!done) {
    if ( (n = read(STDIN_FILENO, …)) < 0) {
        if (errno != EWOULDBLOCK)
            /* ERROR */ ;
    } else
        write(tcpsock, …);

    if ( (n = read(tcpsock, …)) < 0) {
        if (errno != EWOULDBLOCK)
            /* ERROR */ ;
    } else
        write(STDOUT_FILENO, …);
}

Page 322: NetWork

The problem with nonblocking I/O

• Using blocking I/O allows the Operating System to put your program to sleep when nothing is happening (no input). Once input arrives, the OS wakes up your program and read() (or whatever) returns.

• With nonblocking I/O the process wastes processor time in a busy-wait.

Page 323: NetWork

Using alarms

signal(SIGALRM, sig_alrm);
alarm(MAX_TIME);
read(STDIN_FILENO, …);
...

signal(SIGALRM, sig_alrm);
alarm(MAX_TIME);
read(tcpsock, …);
...

Page 324: NetWork

Alarming Problem

• What will be the effect on response time ?

• What is the ‘right’ value for MAX_TIME?

Page 325: NetWork

Select()

• The select() system call allows us to use blocking I/O on a set of descriptors (file, socket, …).

• For example, we can ask select to notify us when data is available for reading on either STDIN or a TCP socket.

Page 326: NetWork

I/O Models

• Blocking
• Non-Blocking
• IO Multiplexing
• Signal-driven IO
• Asynchronous IO

Page 327: NetWork

IO Models

• Two phases
  – Waiting for the data
  – Copying the data

Page 328: NetWork

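Blocking I/O

Figure: the application calls recvfrom (a system call). No datagram is ready, so the process blocks in the call to recvfrom while the kernel waits for data. When a datagram is ready, the kernel copies it from kernel to user space; when the copy completes, recvfrom returns OK and the application processes the datagram.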

Page 329: NetWork

Nonblocking I/O

Figure: the process repeatedly calls recvfrom, waiting for an OK return (polling). While no datagram is ready, each system call returns EWOULDBLOCK immediately. Once a datagram is ready, the next recvfrom blocks while the kernel copies the data from kernel to user space, then returns OK and the application processes the datagram.

Page 330: NetWork

I/O Multiplexing (select and poll)

Figure: the process blocks in a call to select, waiting for one of possibly many sockets to become readable. When a datagram is ready, select returns “readable”; the process then calls recvfrom and blocks while the data is copied from kernel to user space into the application buffer. recvfrom returns OK and the application processes the datagram.

Page 331: NetWork

Signal-Driven I/O (SIGIO)

Figure: the application establishes a SIGIO handler with the sigaction system call, which returns immediately; the process continues executing while the kernel waits for data. When a datagram is ready, the kernel delivers SIGIO; the signal handler calls recvfrom, which blocks while the data is copied from kernel to user space into the application buffer, then returns OK and the datagram is processed.

Page 332: NetWork

Asynchronous I/O

Figure: the application calls aio_read, which returns immediately; the process continues executing while the kernel waits for the data and copies it from kernel to user space. When the copy is complete, the kernel delivers the signal specified in aio_read, and the signal handler processes the datagram.

Page 333: NetWork

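Comparison of the I/O Models

Model               Wait for data (1st phase)           Copy data from kernel to user (2nd phase)
blocking            initiate, blocked                   blocked, complete
nonblocking         check, check, check… (polling)      blocked, complete
I/O multiplexing    check, blocked until ready          initiate, blocked, complete
signal-driven I/O   notification when ready             initiate, blocked, complete
asynchronous I/O    initiate (process continues)        notification when complete

• The first four models handle the 1st phase differently but handle the 2nd phase the same way (blocked in the call to recvfrom).
• Asynchronous I/O handles both phases.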

Page 334: NetWork

Select()

int select(int maxfdp1, fd_set *readset, fd_set *writeset, fd_set *exceptset, struct timeval *timeout);

maxfdp1: the highest-numbered descriptor to be tested, plus one.
readset: set of descriptors we want to read from.
writeset: set of descriptors we want to write to.
exceptset: set of descriptors to watch for exceptions.
timeout: maximum time select should wait.

Page 335: NetWork

struct timeval

struct timeval {
    long tv_sec;    /* seconds */
    long tv_usec;   /* microseconds */
};

struct timeval max = {1, 0};   /* 1 second */

Page 336: NetWork

Condition of select function

• Wait forever: return only when a descriptor is ready (timeout = NULL).

• Wait up to a fixed amount of time: return when a descriptor is ready or when the timeout (a nonzero timeval) expires.

• Do not wait at all: return immediately after checking the descriptors (timeout points to a timeval of 0).

• The waits are normally interrupted if the process catches a signal and returns from the signal handler.
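A minimal sketch of the “wait up to a fixed amount of time” case for a single descriptor (hypothetical helper name). The timeout in milliseconds is split into the tv_sec and tv_usec fields, and maxfdp1 is the descriptor plus one. Because select() modifies the fd_set (and, on some systems, the timeval), both are rebuilt on every call:

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Waits up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error (including EINTR). */
int wait_readable(int fd, long timeout_ms)
{
    fd_set rset;
    struct timeval tv;

    FD_ZERO(&rset);
    FD_SET(fd, &rset);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int n = select(fd + 1, &rset, NULL, NULL, &tv);   /* maxfdp1 = fd + 1 */
    if (n < 0)
        return -1;
    return FD_ISSET(fd, &rset) ? 1 : 0;               /* set is cleared on timeout */
}
```

Passing 0 makes the call a poll; passing a NULL timeval pointer to select() directly would wait forever instead.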

Page 337: NetWork

Select Function

• readset => descriptors to check for readability
• writeset => descriptors to check for writability
• exceptset => descriptors to check for two exception conditions:
  – arrival of out-of-band data for a socket
  – the presence of control status information to be read from the master side of a pseudo-terminal

Page 338: NetWork

Descriptor sets

• fd_set: an array of integers, with each bit in each integer corresponding to a descriptor.

• void FD_ZERO(fd_set *fdset);          /* clear all bits in fdset */
• void FD_SET(int fd, fd_set *fdset);   /* turn on the bit for fd in fdset */
• void FD_CLR(int fd, fd_set *fdset);   /* turn off the bit for fd in fdset */
• int FD_ISSET(int fd, fd_set *fdset);  /* is the bit for fd on in fdset? */

Page 339: NetWork

Example of Descriptor sets function

fd_set rset;

FD_ZERO(&rset);      /* all bits off: initialize */
FD_SET(1, &rset);    /* turn on bit for fd 1 */
FD_SET(4, &rset);    /* turn on bit for fd 4 */
FD_SET(5, &rset);    /* turn on bit for fd 5 */

Page 340: NetWork

Maxfdp1

• Specifies the number of descriptors to be tested.
• Its value is the maximum descriptor to be tested, plus one.
  – (example: fds 1, 2, 5 => maxfdp1 = 6)

• The constant FD_SETSIZE, defined by including <sys/select.h>, is the number of descriptors in the fd_set datatype (commonly 1024).

Page 341: NetWork

When is the descriptor ready for reading?

• The number of bytes of data in the socket receive buffer is greater than or equal to the current size of the low-water mark for the socket receive buffer, which is set with the SO_RCVLOWAT socket option. It defaults to 1 for TCP and UDP sockets.

• The read half of the connection is closed (i.e., a TCP connection that has received a FIN)

• The socket is a listening socket and the number of completed connections is nonzero.

• A socket error is pending. A read operation on the socket will not block and will return an error (–1) with errno set to the specific error condition.

– These pending errors can also be fetched and cleared by calling getsockopt and specifying the SO_ERROR socket option.

Page 342: NetWork

When the socket is ready for writing?

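• The number of bytes of available space in the socket send buffer is greater than or equal to the current size of the low-water mark for the socket send buffer (set with the SO_SNDLOWAT socket option), and either the socket is connected or it does not require a connection (e.g., UDP).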

• The write half of the connection is closed. A write operation on the socket will generate SIGPIPE

• A socket using a non-blocking connect has completed the connection, or the connect has failed

• A socket error is pending. A write operation on the socket will not block and will return an error (–1) with errno set to the specific error condition.

– These pending errors can also be fetched and cleared by calling getsockopt with the SO_ERROR socket option.

Page 343: NetWork

When is a socket returned in the exception set?

• A socket has an exception condition pending if there is out-of-band data for the socket, or the socket is still at the out-of-band mark.

Page 344: NetWork

Conditions that cause a socket to be ready for select

Condition | Readable? | Writable? | Exception?
Data to read | • | |
Read half of the connection closed | • | |
New connection ready for listening socket | • | |
Space available for writing | | • |
Write half of the connection closed | | • |
Pending error | • | • |
TCP out-of-band data | | | •

Page 345: NetWork

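Conditions handled by select in str_cli

[Figure: the client calls select() for readability on either standard input or the socket. Standard input can deliver data or EOF; the socket (the TCP connection to the server) can deliver data, a FIN (EOF), or an RST (error).]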

Page 346: NetWork

Three conditions are handled with the socket

• Peer TCP sends data: the socket becomes readable and read returns a value greater than 0 (the number of bytes of data).

• Peer TCP sends a FIN (peer process terminates): the socket becomes readable and read returns 0 (end-of-file).

• Peer TCP sends an RST (peer host has crashed and rebooted): the socket becomes readable, read returns -1, and errno contains the specific error code.

Page 347: NetWork

Implementation of str_cli function using select

#include "unp.h"

void
str_cli(FILE *fp, int sockfd)
{
    int     maxfdp1;
    fd_set  rset;
    char    sendline[MAXLINE], recvline[MAXLINE];

    FD_ZERO(&rset);
    for ( ; ; ) {
        FD_SET(fileno(fp), &rset);
        FD_SET(sockfd, &rset);
        maxfdp1 = max(fileno(fp), sockfd) + 1;
        Select(maxfdp1, &rset, NULL, NULL, NULL);

        if (FD_ISSET(sockfd, &rset)) {      /* socket is readable */
            if (Readline(sockfd, recvline, MAXLINE) == 0)
                err_quit("str_cli: server terminated prematurely");
            Fputs(recvline, stdout);
        }

        if (FD_ISSET(fileno(fp), &rset)) {  /* input is readable */
            if (Fgets(sendline, MAXLINE, fp) == NULL)
                return;                     /* all done */
            Writen(sockfd, sendline, strlen(sendline));
        }
    }
}

Page 348: NetWork

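Stop and wait: the client sends a line to the server and then waits for the reply

[Figure: timeline from time0 to time7. A single request travels from the client to the server, and then its reply travels back from the server to the client; only one request or reply is in the pipe at any time.]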

Page 349: NetWork

Batch input

[Figure: the pipe between client and server kept full in batch mode. At Time 7, requests 5 to 8 are in flight toward the server while replies 1 to 4 travel toward the client; at Time 8, requests 6 to 9 and replies 2 to 5 are in flight.]

Page 350: NetWork

Handling batch input

• The problem with our revised str_cli function:
– After handling an end-of-file on input, str_cli returns to the main function, which terminates the program.
– However, in batch mode there may still be requests on their way to the server and replies on their way back in the pipe.

• We need a way to close one half of the TCP connection:
– send a FIN to the server, telling it we have finished sending data, but leave the socket descriptor open for reading <= shutdown function

Page 351: NetWork

Shutdown function

• Closes one half of the TCP connection.

• The close function:
– decrements the descriptor's reference count and closes the socket only if the count reaches 0; it terminates both directions of data transfer (reading and writing).

• The shutdown function can close just one of them (reading or writing), and it initiates TCP's normal connection termination regardless of the reference count.

Page 352: NetWork

Calling shutdown to close half of a TCP connection

[Figure: the client calls write, write, then shutdown, which sends a FIN; the server's reads return > 0 for the data and then 0 for the FIN. The server then calls write, write, close; the client's reads return > 0 for that data and then 0 when the server's FIN arrives. Each FIN is acknowledged together with the preceding data.]

Page 353: NetWork

Shutdown function

• #include <sys/socket.h>
  int shutdown(int sockfd, int howto);   /* returns: 0 if OK, -1 on error */

• howto argument:
– SHUT_RD: the read half of the connection is closed. No more reads can be issued on the socket.
– SHUT_WR: the write half of the connection is closed (also called a half-close). Buffered data will be sent, followed by TCP's normal connection termination sequence.
– SHUT_RDWR: both halves are closed.
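A self-contained illustration of a half-close, assuming an AF_UNIX stream socketpair in place of a real TCP connection (so no FIN actually goes on the wire, but the end-of-file seen by read is the same). The helper half_close_demo is hypothetical.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: sv[0] half-closes with SHUT_WR; sv[1] then reads
 * end-of-file, yet can still send data that sv[0] receives.
 * Returns the number of bytes sv[0] read after its half-close, or -1. */
ssize_t half_close_demo(void)
{
    int sv[2];
    char buf[16];
    ssize_t n = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    if (shutdown(sv[0], SHUT_WR) == 0 &&          /* no more writes from sv[0] */
        read(sv[1], buf, sizeof(buf)) == 0 &&     /* peer sees end-of-file */
        write(sv[1], "reply", 5) == 5)            /* peer can still write */
        n = read(sv[0], buf, sizeof(buf));        /* sv[0] can still read */

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

half_close_demo() returns 5: after SHUT_WR, the peer reads end-of-file, yet the half-closed side still receives the 5-byte reply.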

Page 354: NetWork

str_cli function using select and shutdown

#include "unp.h"

void
str_cli(FILE *fp, int sockfd)
{
    int     maxfdp1, stdineof;
    fd_set  rset;
    char    sendline[MAXLINE], recvline[MAXLINE];

    stdineof = 0;
    FD_ZERO(&rset);
    for ( ; ; ) {
        if (stdineof == 0)          /* select on standard input for readability */
            FD_SET(fileno(fp), &rset);
        FD_SET(sockfd, &rset);
        maxfdp1 = max(fileno(fp), sockfd) + 1;
        Select(maxfdp1, &rset, NULL, NULL, NULL);

        if (FD_ISSET(sockfd, &rset)) {      /* socket is readable */
            if (Readline(sockfd, recvline, MAXLINE) == 0) {
                if (stdineof == 1)
                    return;                 /* normal termination */
                else
                    err_quit("str_cli: server terminated prematurely");
            }
            Fputs(recvline, stdout);
        }

        if (FD_ISSET(fileno(fp), &rset)) {  /* input is readable */
            if (Fgets(sendline, MAXLINE, fp) == NULL) {
                stdineof = 1;
                Shutdown(sockfd, SHUT_WR);  /* send FIN */
                FD_CLR(fileno(fp), &rset);
                continue;
            }
            Writen(sockfd, sendline, strlen(sendline));
        }
    }
}

Page 356: NetWork

TCP echo server

• Single process server that uses select to handle any number of clients, instead of forking one child per client.

Page 357: NetWork

Data structure for TCP server (1): before the first client has established a connection

client[]: [0] = -1, [1] = -1, [2] = -1, ..., [FD_SETSIZE-1] = -1

rset: fd0 = 0, fd1 = 0, fd2 = 0, fd3 = 1

maxfd + 1 = 4

fd 0 (stdin), fd 1 (stdout), fd 2 (stderr); fd 3 => listening socket

Page 358: NetWork

Data structure for TCP server (2): after the first client connection is established

client[]: [0] = 4, [1] = -1, [2] = -1, ..., [FD_SETSIZE-1] = -1

rset: fd0 = 0, fd1 = 0, fd2 = 0, fd3 = 1, fd4 = 1

maxfd + 1 = 5

* fd3 => listening socket
* fd4 => client socket

Page 359: NetWork

Data structure for TCP server (3): after the second client connection is established

client[]: [0] = 4, [1] = 5, [2] = -1, ..., [FD_SETSIZE-1] = -1

rset: fd0 = 0, fd1 = 0, fd2 = 0, fd3 = 1, fd4 = 1, fd5 = 1

maxfd + 1 = 6

* fd3 => listening socket
* fd4 => client1 socket
* fd5 => client2 socket

Page 360: NetWork

Data structure for TCP server (4): after the first client terminates its connection

client[]: [0] = -1, [1] = 5, [2] = -1, ..., [FD_SETSIZE-1] = -1

rset: fd0 = 0, fd1 = 0, fd2 = 0, fd3 = 1, fd4 = 0, fd5 = 1

maxfd + 1 = 6 (maxfd does not change)

* fd3 => listening socket
* fd4 => client1 socket, deleted
* fd5 => client2 socket

Page 361: NetWork

TCP echo server using a single process

#include "unp.h"

int
main(int argc, char **argv)
{
    int                 i, maxi, maxfd, listenfd, connfd, sockfd;
    int                 nready, client[FD_SETSIZE];
    ssize_t             n;
    fd_set              rset, allset;
    char                line[MAXLINE];
    socklen_t           clilen;
    struct sockaddr_in  cliaddr, servaddr;

    listenfd = Socket(AF_INET, SOCK_STREAM, 0);

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family      = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port        = htons(SERV_PORT);

    Bind(listenfd, (SA *) &servaddr, sizeof(servaddr));
    Listen(listenfd, LISTENQ);

    maxfd = listenfd;               /* initialize */
    maxi = -1;                      /* index into client[] array */
    for (i = 0; i < FD_SETSIZE; i++)
        client[i] = -1;             /* -1 indicates available entry */
    FD_ZERO(&allset);
    FD_SET(listenfd, &allset);

    for ( ; ; ) {
        rset = allset;              /* structure assignment */
        nready = Select(maxfd + 1, &rset, NULL, NULL, NULL);

        if (FD_ISSET(listenfd, &rset)) {    /* new client connection */
            clilen = sizeof(cliaddr);
            connfd = Accept(listenfd, (SA *) &cliaddr, &clilen);

            for (i = 0; i < FD_SETSIZE; i++)
                if (client[i] < 0) {
                    client[i] = connfd;     /* save descriptor */
                    break;
                }
            if (i == FD_SETSIZE)
                err_quit("too many clients");

            FD_SET(connfd, &allset);        /* add new descriptor to set */
            if (connfd > maxfd)
                maxfd = connfd;             /* for select */
            if (i > maxi)
                maxi = i;                   /* max index in client[] array */

            if (--nready <= 0)
                continue;                   /* no more readable descriptors */
        }

        for (i = 0; i <= maxi; i++) {       /* check all clients for data */
            if ( (sockfd = client[i]) < 0)
                continue;
            if (FD_ISSET(sockfd, &rset)) {
                if ( (n = Readline(sockfd, line, MAXLINE)) == 0) {
                    /* connection closed by client */
                    Close(sockfd);
                    FD_CLR(sockfd, &allset);
                    client[i] = -1;
                } else
                    Writen(sockfd, line, n);

                if (--nready <= 0)
                    break;                  /* no more readable descriptors */
            }
        }
    }
}

Page 364: NetWork

Denial of service attacks

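• If a malicious client connects to the server, sends 1 byte of data (other than a newline), and then goes to sleep, the server calls readline, which blocks waiting for the rest of the line.

=> The server is blocked and cannot service any other client.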

Page 365: NetWork

Denial of service attacks

• Solutions:
– use nonblocking I/O
– have each client serviced by a separate thread of control (spawn a process or a thread to service each client)
– place a timeout on the I/O operations

Page 366: NetWork

pselect function

#include <sys/select.h>
#include <signal.h>
#include <time.h>

int pselect(int maxfdp1, fd_set *readset, fd_set *writeset, fd_set *exceptset,
            const struct timespec *timeout, const sigset_t *sigmask);
        /* returns: count of ready descriptors, 0 on timeout, -1 on error */

The pselect function was invented by POSIX.1g.

Page 367: NetWork

pselect function

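• struct timespec {
      time_t tv_sec;    /* seconds */
      long   tv_nsec;   /* nanoseconds */
  };
  pselect takes a timespec (seconds and nanoseconds) instead of select's timeval (seconds and microseconds).

• sigmask => pointer to a signal mask. pselect installs this mask for the duration of the call and restores the caller's mask when it returns, so unblocking a signal and waiting happen atomically.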
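A sketch of the usual pselect pattern, assuming SIGINT is the signal of interest (a hypothetical choice): the signal is blocked with sigprocmask, and pselect is handed the caller's original mask, so the signal can only be delivered while pselect is actually waiting. wait_readable is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>   /* pipe(), write() for the usage example */

/* Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * SIGINT stays blocked except while pselect() waits, because pselect
 * installs the original mask atomically for the duration of the call.
 * Returns >0 if fd is readable, 0 on timeout, -1 on error (e.g. EINTR). */
int wait_readable(int fd, long timeout_ms)
{
    sigset_t newmask, oldmask;
    fd_set rset;
    struct timespec ts;
    int n, saved;

    sigemptyset(&newmask);
    sigaddset(&newmask, SIGINT);
    if (sigprocmask(SIG_BLOCK, &newmask, &oldmask) < 0)
        return -1;
    /* a real program would test a "signal seen" flag here, free of races */

    FD_ZERO(&rset);
    FD_SET(fd, &rset);
    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (timeout_ms % 1000) * 1000000L;   /* timespec uses nanoseconds */

    n = pselect(fd + 1, &rset, NULL, NULL, &ts, &oldmask);
    saved = errno;

    sigprocmask(SIG_SETMASK, &oldmask, NULL);      /* restore the caller's mask */
    errno = saved;                                 /* keep pselect's errno */
    return n;
}
```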

Page 368: NetWork

Name and Address Conversions

Page 369: NetWork

DNS

RFC 1034, RFC 1035

Page 370: NetWork

Hierarchical Namespace

Page 371: NetWork

Naming Authorities

Page 372: NetWork

DNS Record Types

Page 373: NetWork

Types

Page 374: NetWork

Sample DNS Records

aix      IN A     192.168.42.2
         IN AAAA  3ffe:b80:1f8d:2:204:acff:fe17:bf38
         IN MX    5 aix.unpbook.com.
         IN MX    10 mailhost.unpbook.com.
aix-4    IN A     192.168.42.2
aix-6    IN AAAA  3ffe:b80:1f8d:2:204:acff:fe17:bf38
aix-611  IN AAAA  fe80::204:acff:fe17:bf38

Page 375: NetWork

Resolvers and Name Servers

Page 376: NetWork


DNS library functions

gethostbyname

gethostbyaddr

getservbyname

getservbyport

getaddrinfo

Page 377: NetWork


gethostbyname

#include <netdb.h>

struct hostent *gethostbyname(const char *hostname);
        /* returns: non-null pointer if OK, NULL on error with h_errno set */

struct hostent is defined in <netdb.h>.

Page 378: NetWork


struct hostent

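struct hostent {
    char   *h_name;       /* official (canonical) name of host */
    char  **h_aliases;    /* pointer to array of pointers to alias names */
    int     h_addrtype;   /* host address type: AF_INET or AF_INET6 */
    int     h_length;     /* length of address: 4 or 16 */
    char  **h_addr_list;  /* pointer to array of pointers to addresses */
};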

Page 379: NetWork

struct hostent

Page 380: NetWork

gethostbyname and errors

• On error, gethostbyname returns a null pointer.
• gethostbyname sets the global variable h_errno (not errno) to indicate the exact error:
– HOST_NOT_FOUND
– TRY_AGAIN
– NO_RECOVERY
– NO_DATA (identical to NO_ADDRESS)

Page 381: NetWork

Sample code using gethostbyname()

char            *ptr, **pptr;
char             str[INET_ADDRSTRLEN];
struct hostent  *hptr;

while (--argc > 0) {
    ptr = *++argv;
    if ( (hptr = gethostbyname(ptr)) == NULL) {
        err_msg("gethostbyname error for host: %s: %s",
                ptr, hstrerror(h_errno));
        continue;
    }
    printf("official hostname: %s\n", hptr->h_name);

    for (pptr = hptr->h_aliases; *pptr != NULL; pptr++)
        printf("\talias: %s\n", *pptr);

    switch (hptr->h_addrtype) {
    case AF_INET:
        pptr = hptr->h_addr_list;
        for ( ; *pptr != NULL; pptr++)
            printf("\taddress: %s\n",
                   Inet_ntop(hptr->h_addrtype, *pptr, str, sizeof(str)));
        break;

    default:
        err_ret("unknown address type");
        break;
    }
}

Page 382: NetWork

gethostbyaddr

• #include <netdb.h>
  struct hostent *gethostbyaddr(const char *addr, socklen_t len, int family);

• The addr argument is not really a char *; it is a pointer to an in_addr structure containing the IPv4 address. len is the size of this structure: 4 for an IPv4 address. The family argument is AF_INET.

• gethostbyaddr takes a binary IPv4 address and tries to find the hostname corresponding to that address. This is the reverse of gethostbyname.

Page 383: NetWork

getservbyname and getservbyport

• Services are often known by names.
• The mapping from the name to the port number is contained in a file (normally /etc/services).
• If the port number changes, all we need to modify is one line in the /etc/services file instead of having to recompile the applications.

Page 384: NetWork

getservbyname

• #include <netdb.h>
  struct servent *getservbyname(const char *servname, const char *protoname);

  struct servent {
      char   *s_name;     /* official service name */
      char  **s_aliases;  /* alias list */
      int     s_port;     /* port number, network byte order */
      char   *s_proto;    /* protocol to use */
  };

• The service name servname must be specified. If a protocol is also specified (protoname is a non-null pointer), then the entry must also have a matching protocol. Some Internet services are provided using either TCP or UDP; if protoname is not specified and the service supports multiple protocols, which port number is returned is implementation-dependent.

Page 385: NetWork

Usage of getservbyname

struct servent *sptr;

sptr = getservbyname("domain", "udp");  /* DNS using UDP */
sptr = getservbyname("ftp", "tcp");     /* FTP using TCP */
sptr = getservbyname("ftp", NULL);      /* FTP using TCP */
sptr = getservbyname("ftp", "udp");     /* this call will fail */

Page 386: NetWork

/etc/services file

freebsd % grep -e ^ftp -e ^domain /etc/services
ftp-data    20/tcp   #File Transfer [Default Data]
ftp         21/tcp   #File Transfer [Control]
domain      53/tcp   #Domain Name Server
domain      53/udp   #Domain Name Server
ftp-agent   574/tcp  #FTP Software Agent System
ftp-agent   574/udp  #FTP Software Agent System
ftps-data   989/tcp  # ftp protocol, data, over TLS/SSL
ftps        990/tcp  # ftp protocol, control, over TLS/SSL

Page 387: NetWork

getservbyport

• Looks up a service given its port number and an optional protocol. The port number must be passed in network byte order, hence the htons calls.

• Usage:

struct servent *sptr;

sptr = getservbyport(htons(53), "udp");  /* DNS using UDP */
sptr = getservbyport(htons(21), "tcp");  /* FTP using TCP */
sptr = getservbyport(htons(21), NULL);   /* FTP using TCP */
sptr = getservbyport(htons(21), "udp");  /* this call will fail */

Page 388: NetWork

getaddrinfo

• The gethostbyname and gethostbyaddr functions only support IPv4.

• getaddrinfo handles both
– name-to-address and
– service-to-port translation.

• It returns sockaddr structures instead of a list of addresses.

• It hides all the protocol dependencies: the application deals only with the socket address structures that are filled in by getaddrinfo.

Page 389: NetWork

getaddrinfo

• #include <netdb.h>

  int getaddrinfo(const char *hostname, const char *service,
                  const struct addrinfo *hints, struct addrinfo **result);
          /* returns: 0 if OK, nonzero on error */

struct addrinfo {
    int              ai_flags;      /* AI_PASSIVE, AI_CANONNAME */
    int              ai_family;     /* AF_xxx */
    int              ai_socktype;   /* SOCK_xxx */
    int              ai_protocol;   /* 0 or IPPROTO_xxx for IPv4 and IPv6 */
    socklen_t        ai_addrlen;    /* length of ai_addr */
    char            *ai_canonname;  /* ptr to canonical name for host */
    struct sockaddr *ai_addr;       /* ptr to socket address structure */
    struct addrinfo *ai_next;       /* ptr to next structure in linked list */
};

Page 390: NetWork

Hints structure

• hints is either a null pointer or a pointer to an addrinfo structure that the caller fills in with hints about the types of information the caller wants returned.

• The members of the hints structure that can be set by the caller are:
– ai_flags (zero or more AI_XXX values OR'ed together)
– ai_family (an AF_xxx value)
– ai_socktype (a SOCK_xxx value)
– ai_protocol

• For example, if the specified service is provided for both TCP and UDP, setting the ai_socktype member of the hints structure to SOCK_DGRAM means that only information for datagram sockets is returned.

Page 391: NetWork

ai_flags

• AI_PASSIVE: the caller will use the socket for a passive open.
• AI_CANONNAME: tells the function to return the canonical name of the host.
• AI_NUMERICHOST: prevents any kind of name-to-address mapping; the hostname argument must be an address string.
• AI_NUMERICSERV: prevents any kind of name-to-service mapping; the service argument must be a decimal port number string.

Page 392: NetWork

ai_flags

• AI_V4MAPPED: if specified along with an ai_family of AF_INET6, returns IPv4-mapped IPv6 addresses corresponding to A records if there are no available AAAA records.
• AI_ALL: if specified along with AI_V4MAPPED, returns IPv4-mapped IPv6 addresses in addition to any AAAA records belonging to the name.
• AI_ADDRCONFIG: only looks up addresses for a given IP version if there is at least one non-loopback interface configured with an IP address of that version.

Page 393: NetWork

Result

• A linked list of addrinfo structures, linked through the ai_next pointer.

• There are two ways that multiple structures can be returned:
– Multiple IP addresses per hostname: one structure is returned for each address.
– The service is provided for multiple socket types (SOCK_STREAM and SOCK_DGRAM): one structure is returned for each socket type.
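To see the second source of multiple results, the sketch below (hypothetical helper count_results) leaves ai_socktype at 0 and counts the returned structures per socket type. AI_NUMERICHOST and AI_NUMERICSERV keep the example free of DNS and /etc/services lookups. The exact number of results is implementation-dependent (some implementations also return a SOCK_RAW entry), so only stream and datagram entries are counted.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: resolve a numeric host and service, then count
 * the results for stream and datagram sockets by walking ai_next.
 * Returns 0 on success or the getaddrinfo error code. */
int count_results(const char *host, const char *serv, int *nstream, int *ndgram)
{
    struct addrinfo hints, *res, *ai;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;
    hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;  /* no name lookups */
    /* ai_socktype left at 0: ask for every socket type */

    if ((rc = getaddrinfo(host, serv, &hints, &res)) != 0)
        return rc;

    *nstream = *ndgram = 0;
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        if (ai->ai_socktype == SOCK_STREAM)
            (*nstream)++;
        else if (ai->ai_socktype == SOCK_DGRAM)
            (*ndgram)++;
    }
    freeaddrinfo(res);
    return 0;
}
```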

Page 394: NetWork
Page 395: NetWork
Page 396: NetWork

Usage

• The socket address structure in each addrinfo structure is ready for a call to socket, then either a call to connect or sendto (for a client) or bind (for a server).

• The arguments to socket are the members ai_family, ai_socktype, and ai_protocol.

• The second and third arguments to either connect or bind are ai_addr and ai_addrlen.
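The client-side pattern described above, sketched as a hypothetical helper udp_connect_to: each addrinfo result is tried in turn, its members feed socket() and connect(), and the list is freed afterwards. A UDP socket is used so that connect succeeds without any server running.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: return a UDP socket connected to host:serv,
 * or -1 if no getaddrinfo result could be used. */
int udp_connect_to(const char *host, const char *serv)
{
    struct addrinfo hints, *res, *ai;
    int sockfd = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;        /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_DGRAM;

    if (getaddrinfo(host, serv, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        /* socket() takes ai_family, ai_socktype and ai_protocol ... */
        sockfd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (sockfd < 0)
            continue;
        /* ... and connect() takes ai_addr and ai_addrlen */
        if (connect(sockfd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                      /* success */
        close(sockfd);
        sockfd = -1;
    }
    freeaddrinfo(res);
    return sockfd;
}
```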

Page 397: NetWork

Usage

struct addrinfo hints, *res;

bzero(&hints, sizeof(hints));
hints.ai_flags  = AI_CANONNAME;
hints.ai_family = AF_INET;

getaddrinfo("freebsd4", "domain", &hints, &res);

Page 398: NetWork

Passive sockets

• The caller specifies the service but not the hostname, and sets the AI_PASSIVE flag in the hints structure.

• The socket address structures returned should contain an IP address of INADDR_ANY (for IPv4) or IN6ADDR_ANY_INIT (for IPv6).
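A quick check of the passive-open behavior, assuming IPv4 and a hypothetical helper passive_ipv4_addr: with a NULL hostname and AI_PASSIVE, the returned address should be INADDR_ANY, ready to hand to bind.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: passive IPv4 address for serv with no hostname.
 * Returns 0 and fills *out on success, else the getaddrinfo error code. */
int passive_ipv4_addr(const char *serv, struct sockaddr_in *out)
{
    struct addrinfo hints, *res;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_flags = AI_PASSIVE;        /* the address will be used with bind() */
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;

    if ((rc = getaddrinfo(NULL, serv, &hints, &res)) != 0)   /* hostname is NULL */
        return rc;
    memcpy(out, res->ai_addr, sizeof(*out));  /* AF_INET result: a sockaddr_in */
    freeaddrinfo(res);
    return 0;
}
```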

Page 399: NetWork
Page 400: NetWork

Errors: gai_strerror

• const char *gai_strerror(int error);   /* converts a nonzero getaddrinfo return value into a message string */

Page 401: NetWork

freeaddrinfo

• The storage returned by getaddrinfo (the addrinfo structures, the ai_addr structures, and the ai_canonname string) is obtained dynamically (e.g., from malloc).

• This storage is returned by calling freeaddrinfo:
  void freeaddrinfo(struct addrinfo *ai);
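A sketch of error handling and cleanup together (hypothetical helper check_numeric_host): a nonzero return from getaddrinfo is turned into a message with gai_strerror, and a successful result list is released with freeaddrinfo. AI_NUMERICHOST is used so the example needs no DNS.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: accept host only if it is a numeric address string.
 * Returns 0 on success or getaddrinfo's EAI_xxx error code. */
int check_numeric_host(const char *host)
{
    struct addrinfo hints, *res;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_flags = AI_NUMERICHOST;    /* no DNS: host must be an address string */
    hints.ai_family = AF_UNSPEC;

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        /* the error is in the return value (errno matters only for EAI_SYSTEM) */
        fprintf(stderr, "getaddrinfo(%s): %s\n", host, gai_strerror(rc));
        return rc;
    }
    freeaddrinfo(res);                  /* releases the whole ai_next list */
    return 0;
}
```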

Page 402: NetWork

getnameinfo function

• Takes a socket address and returns a character string describing the host and another character string describing the service.

int getnameinfo(const struct sockaddr *sockaddr, socklen_t addrlen,
                char *host, size_t hostlen, char *serv, size_t servlen, int flags);
        /* returns: 0 if OK, nonzero on error */
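A minimal sketch of getnameinfo, assuming a hypothetical helper numeric_name_info that builds a sockaddr_in and converts it back to strings. NI_NUMERICHOST and NI_NUMERICSERV request numeric strings, so no DNS or services lookup is involved. The length parameters are declared socklen_t here, as in the POSIX prototype.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: IPv4 string + port -> sockaddr_in -> host/serv strings.
 * Returns 0 on success, -1 for a bad address, or an EAI_xxx code. */
int numeric_name_info(const char *ip, unsigned short port,
                      char *host, socklen_t hostlen, char *serv, socklen_t servlen)
{
    struct sockaddr_in sin;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1)
        return -1;

    return getnameinfo((struct sockaddr *) &sin, sizeof(sin),
                       host, hostlen, serv, servlen,
                       NI_NUMERICHOST | NI_NUMERICSERV);
}
```

For 127.0.0.1 and port 80 it fills host with "127.0.0.1" and serv with "80".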

Page 403: NetWork

Elementary UDP Sockets

Page 404: NetWork

Contents

• recvfrom and sendto functions
• UDP echo server (main, dg_echo functions)
• UDP echo client (main, dg_cli functions)
• Lost datagrams
• Verifying received response
• Server not running
• connect function with UDP
• Lack of flow control with UDP
• Determining outgoing interface with UDP
• TCP and UDP echo server using select

Page 405: NetWork

UDP

A connectionless, unreliable datagram protocol. Popular applications using UDP:

• DNS (the Domain Name System)
• NFS (the Network File System)
• SNMP (Simple Network Management Protocol)

Page 406: NetWork

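Socket functions for UDP client-server

UDP server: socket() → bind() → recvfrom() (blocks until a datagram is received from a client) → process request → sendto()

UDP client: socket() → sendto() → recvfrom() → close()

The client's sendto carries data (the request) to the server's recvfrom; the server's sendto carries data (the reply) back to the client's recvfrom.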

Page 407: NetWork

recvfrom and sendto functions

#include <sys/socket.h>

ssize_t recvfrom(int sockfd, void *buff, size_t nbytes, int flags,
                 struct sockaddr *from, socklen_t *addrlen);

ssize_t sendto(int sockfd, const void *buff, size_t nbytes, int flags,
               const struct sockaddr *to, socklen_t addrlen);

        /* Both return: number of bytes read or written if OK, -1 on error */

Page 408: NetWork

Sending UDP Datagrams

ssize_t sendto(int sockfd, const void *buff, size_t nbytes, int flags,
               const struct sockaddr *to, socklen_t addrlen);

• sockfd is a UDP socket.
• buff is the address of the data (nbytes long).
• to is the address of a sockaddr containing the destination address.
• The return value is the number of bytes sent, or -1 on error.

Page 409: NetWork

sendto()

• You can send 0 bytes of data!
• Some possible errors:
– EBADF, ENOTSOCK: bad socket descriptor
– EFAULT: bad buffer address
– EMSGSIZE: message too large
– ENOBUFS: system buffers are full

Page 410: NetWork

More sendto()

• The return value of sendto() indicates how much data was accepted by the O.S. for sending as a datagram - not how much data made it to the destination.

• There is no error condition that indicates that the destination did not get the data!!!

Page 411: NetWork

Receiving UDP Datagrams

ssize_t recvfrom(int sockfd, void *buff, size_t nbytes, int flags,
                 struct sockaddr *from, socklen_t *fromaddrlen);

• sockfd is a UDP socket.
• buff is the address of a buffer (nbytes long).
• from is the address of a sockaddr that receives the sender's address.
• The return value is the number of bytes received and put into buff, or -1 on error.

Page 412: NetWork

recvfrom()

• If buff is not large enough, any extra data is lost forever...
• You can receive 0 bytes of data!
• The sockaddr at from is filled in with the address of the sender.
• You should set fromaddrlen before calling (to the size of the structure at from); on return it holds the size of the address actually stored.
• If from and fromaddrlen are NULL, we don't find out who sent the data.
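The following sketch exercises sendto and recvfrom over the loopback interface, assuming a hypothetical helper udp_roundtrip. The server socket is bound to port 0 so the kernel picks an ephemeral port; the client socket is never bound, so its first sendto binds it implicitly, and recvfrom's from argument then reports that port.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: send msg from a client socket to a server socket on
 * 127.0.0.1, receive it with recvfrom, and check the reported sender.
 * Returns recvfrom's byte count (or -1); *sender_matches is set to 1 if the
 * from address carries the client's port. */
ssize_t udp_roundtrip(const char *msg, char *buf, size_t buflen, int *sender_matches)
{
    int srv = socket(AF_INET, SOCK_DGRAM, 0);
    int cli = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in srvaddr, cliaddr, from;
    socklen_t len;
    ssize_t n = -1;

    *sender_matches = 0;
    memset(&srvaddr, 0, sizeof(srvaddr));
    srvaddr.sin_family = AF_INET;
    srvaddr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    srvaddr.sin_port = htons(0);            /* 0: kernel picks an ephemeral port */
    len = sizeof(srvaddr);

    if (srv >= 0 && cli >= 0 &&
        bind(srv, (struct sockaddr *) &srvaddr, sizeof(srvaddr)) == 0 &&
        getsockname(srv, (struct sockaddr *) &srvaddr, &len) == 0 &&  /* learn the port */
        sendto(cli, msg, strlen(msg), 0,
               (struct sockaddr *) &srvaddr, sizeof(srvaddr)) >= 0) {
        len = sizeof(from);                 /* value-result: set before the call */
        n = recvfrom(srv, buf, buflen, 0, (struct sockaddr *) &from, &len);

        /* sendto() implicitly bound the client; compare its port with from */
        len = sizeof(cliaddr);
        if (n >= 0 && getsockname(cli, (struct sockaddr *) &cliaddr, &len) == 0)
            *sender_matches = (from.sin_port == cliaddr.sin_port);
    }
    if (srv >= 0)
        close(srv);
    if (cli >= 0)
        close(cli);
    return n;
}
```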

Page 413: NetWork

More recvfrom()

• Same errors as sendto, but also:
– EINTR: system call interrupted by a signal.

• Unless you do something special, recvfrom doesn't return until there is a datagram available.

Page 414: NetWork

Summary of TCP client-server with two clients

[Figure: the listening server forks a server child for each connection; each of the two clients has its own TCP connection to its own server child.]

Page 415: NetWork

Summary of UDP client-server with two clients

[Figure: a single UDP server process with one socket; datagrams from both clients are placed in that socket's single receive buffer.]

Page 416: NetWork

UDP Echo Client: main Function

#include "unp.h"

int
main(int argc, char **argv)
{
    int                 sockfd;
    struct sockaddr_in  servaddr;

    if (argc != 2)
        err_quit("usage: udpcli <IPaddress>");

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(SERV_PORT);
    Inet_pton(AF_INET, argv[1], &servaddr.sin_addr);

    sockfd = Socket(AF_INET, SOCK_DGRAM, 0);

    dg_cli(stdin, sockfd, (SA *) &servaddr, sizeof(servaddr));

    exit(0);
}

Page 417: NetWork

UDP Echo Client: dg_cli Function

#include "unp.h"

void
dg_cli(FILE *fp, int sockfd, const SA *pservaddr, socklen_t servlen)
{
    int     n;
    char    sendline[MAXLINE], recvline[MAXLINE + 1];

    while (Fgets(sendline, MAXLINE, fp) != NULL) {
        Sendto(sockfd, sendline, strlen(sendline), 0, pservaddr, servlen);

        n = Recvfrom(sockfd, recvline, MAXLINE, 0, NULL, NULL);

        recvline[n] = 0;        /* null terminate */
        Fputs(recvline, stdout);
    }
}

dg_cli function: client processing loop

Page 418: NetWork

Lost Datagrams

If the client's datagram is lost, or if it arrives at the server but the server's reply is lost, the client blocks forever in its call to recvfrom.

The typical way to prevent this is to place a timeout on the recvfrom.
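One way to place that timeout, sketched here with the SO_RCVTIMEO socket option (alarm or select are common alternatives); recvfrom_timeout and timed_out are hypothetical helper names. When the timer expires, recvfrom returns -1 with errno set to EAGAIN or EWOULDBLOCK.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: recvfrom() that gives up after ms milliseconds. */
ssize_t recvfrom_timeout(int sockfd, void *buf, size_t len, int ms,
                         struct sockaddr *from, socklen_t *fromlen)
{
    struct timeval tv;

    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    if (setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
        return -1;
    return recvfrom(sockfd, buf, len, 0, from, fromlen);
}

/* Hypothetical helper: after a -1 return, 1 if the timeout expired. */
int timed_out(void)
{
    return errno == EAGAIN || errno == EWOULDBLOCK;
}
```

On a timeout the client would typically retransmit its request, which is where the retransmission strategies discussed under Advanced UDP Sockets come in.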

Page 419: NetWork

Verify Received Response

#include "unp.h"

void
dg_cli(FILE *fp, int sockfd, const SA *pservaddr, socklen_t servlen)
{
    int              n;
    char             sendline[MAXLINE], recvline[MAXLINE + 1];
    socklen_t        len;
    struct sockaddr *preply_addr;

    preply_addr = Malloc(servlen);

    while (Fgets(sendline, MAXLINE, fp) != NULL) {
        Sendto(sockfd, sendline, strlen(sendline), 0, pservaddr, servlen);

        len = servlen;
        n = Recvfrom(sockfd, recvline, MAXLINE, 0, preply_addr, &len);
        if (len != servlen || memcmp(pservaddr, preply_addr, len) != 0) {
            printf("reply from %s (ignored)\n", Sock_ntop(preply_addr, len));
            continue;
        }

        recvline[n] = 0;        /* null terminate */
        Fputs(recvline, stdout);
    }
}

If the server has not bound a specific IP address to its socket, the kernel chooses the source address for the reply's IP datagram: it is chosen to be the primary IP address of the outgoing interface. On a multihomed server this can differ from the address the client sent to, in which case this check wrongly ignores a valid reply.

Page 421: NetWork

Server Not Running

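The client blocks forever in its call to recvfrom. The server host responds with an ICMP "port unreachable" message, but this is an asynchronous error: it was caused by sendto, which had already returned successfully. The basic rule is that asynchronous errors are not returned for UDP sockets unless the socket has been connected.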

Page 422: NetWork

connect Function with UDP

This does not result in anything like a TCP connection: there is no three-way handshake. Instead, the kernel just records the IP address and port number of the peer.

With a connected UDP socket, three things change:
1. We can no longer specify the destination IP address and port for an output operation. That is, we do not use sendto but write or send instead.
2. We do not use recvfrom but read or recv instead; only datagrams arriving from the connected peer are returned.
3. Asynchronous errors are returned to the process for a connected UDP socket.
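A sketch of the third change, assuming Linux-like behavior on the loopback interface: a UDP socket is connected to a port where nothing is listening (found with the hypothetical helper unused_udp_port), one datagram is sent with send, and recv then typically reports the ICMP port unreachable as ECONNREFUSED. connected_udp_probe is also hypothetical, and its 200 ms SO_RCVTIMEO is an added safeguard so the example cannot block forever.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: a loopback UDP port that was just released,
 * so (very likely) nothing is bound to it. Returns 0 on failure. */
unsigned short unused_udp_port(void)
{
    struct sockaddr_in a;
    socklen_t len = sizeof(a);
    unsigned short port = 0;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return 0;
    memset(&a, 0, sizeof(a));
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *) &a, sizeof(a)) == 0 &&
        getsockname(fd, (struct sockaddr *) &a, &len) == 0)
        port = ntohs(a.sin_port);
    close(fd);                          /* release it: nothing is bound there now */
    return port;
}

/* Hypothetical helper: connect a UDP socket to 127.0.0.1:port, send one
 * datagram, and wait up to 200 ms with recv(). Returns recv()'s result;
 * on -1, errno says why (ECONNREFUSED if the ICMP error was reported). */
ssize_t connected_udp_probe(unsigned short port)
{
    struct sockaddr_in peer;
    struct timeval tv = {0, 200000};    /* 200 ms */
    char buf[64];
    ssize_t n = -1;
    int saved, fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    peer.sin_port = htons(port);

    /* connect() on UDP only records the peer; no packets are exchanged */
    if (connect(fd, (struct sockaddr *) &peer, sizeof(peer)) == 0 &&
        setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0 &&
        send(fd, "ping", 4, 0) == 4)            /* send() instead of sendto() */
        n = recv(fd, buf, sizeof(buf), 0);      /* recv() instead of recvfrom() */

    saved = errno;
    close(fd);
    errno = saved;                      /* keep recv's errno for the caller */
    return n;
}
```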

Page 423: NetWork

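connect Function with UDP

[Figure: connected UDP socket. The application's UDP layer stores the peer IP address and port # from connect and exchanges UDP datagrams with that peer; a UDP datagram from some other IP address and/or port # is not passed to the connected socket (marked "???").]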

Page 424: NetWork

Lack of Flow Control with UDP

#include "unp.h"

#define NDG     2000    /* datagrams to send */
#define DGLEN   1400    /* length of each datagram */

void
dg_cli(FILE *fp, int sockfd, const SA *pservaddr, socklen_t servlen)
{
    int     i;
    char    sendline[MAXLINE];

    for (i = 0; i < NDG; i++) {
        Sendto(sockfd, sendline, DGLEN, 0, pservaddr, servlen);
    }
}

dg_cli function that writes a fixed number of datagrams to the server

Page 425: NetWork

#include "unp.h"

static void recvfrom_int(int);
static int  count;

void
dg_echo(int sockfd, SA *pcliaddr, socklen_t clilen)
{
    socklen_t   len;
    char        mesg[MAXLINE];

    Signal(SIGINT, recvfrom_int);

    for ( ; ; ) {
        len = clilen;
        Recvfrom(sockfd, mesg, MAXLINE, 0, pcliaddr, &len);

        count++;
    }
}

static void
recvfrom_int(int signo)
{
    printf("\nreceived %d datagrams\n", count);
    exit(0);
}

Page 426: NetWork

Lack of Flow Control with UDP

• Datagrams can also be lost before they reach the server: the interface's buffers may have been full, or they may have been discarded by the sending host.

• The counter "dropped due to full socket buffers" (reported by netstat -s) indicates how many datagrams were received by UDP but were discarded because the receiving socket's receive queue was full.

• The number of datagrams received by the server in this example is nondeterministic. It depends on many factors, such as the network load, the processing load on the client host, and the processing load on the server host.

• Remedies: a server that keeps up with its client (fast server, slow client), or a larger socket receive buffer (SO_RCVBUF); neither adds true flow control to UDP.
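Increasing the receive buffer is a single setsockopt call; the hypothetical helper set_rcvbuf below also reads the value back, because the kernel may adjust the request (Linux, for example, doubles it to allow for bookkeeping overhead and caps it at a system-wide maximum).

```c
#include <sys/socket.h>

/* Hypothetical helper: request a receive buffer of 'bytes' and return the
 * size the kernel actually granted, or -1 on error. */
int set_rcvbuf(int sockfd, int bytes)
{
    int got;
    socklen_t len = sizeof(got);

    if (setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &got, &len) < 0)
        return -1;
    return got;
}
```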

Page 427: NetWork

TCP and UDP Echo Server Using select

#include "unp.h"

int
main(int argc, char **argv)
{
    int                 listenfd, connfd, udpfd, nready, maxfdp1;
    char                mesg[MAXLINE];
    pid_t               childpid;
    fd_set              rset;
    ssize_t             n;
    socklen_t           len;
    const int           on = 1;
    struct sockaddr_in  cliaddr, servaddr;
    void                sig_chld(int);

    /* create listening TCP socket */
    listenfd = Socket(AF_INET, SOCK_STREAM, 0);

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family      = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port        = htons(SERV_PORT);

    Setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
    Bind(listenfd, (SA *) &servaddr, sizeof(servaddr));
    Listen(listenfd, LISTENQ);

    /* create UDP socket */
    udpfd = Socket(AF_INET, SOCK_DGRAM, 0);

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family      = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port        = htons(SERV_PORT);

    Bind(udpfd, (SA *) &servaddr, sizeof(servaddr));

    Signal(SIGCHLD, sig_chld);      /* must call waitpid() */

    FD_ZERO(&rset);
    maxfdp1 = max(listenfd, udpfd) + 1;
    for ( ; ; ) {
        FD_SET(listenfd, &rset);
        FD_SET(udpfd, &rset);
        if ( (nready = select(maxfdp1, &rset, NULL, NULL, NULL)) < 0) {
            if (errno == EINTR)
                continue;           /* back to for() */
            else
                err_sys("select error");
        }

        if (FD_ISSET(listenfd, &rset)) {
            len = sizeof(cliaddr);
            connfd = Accept(listenfd, (SA *) &cliaddr, &len);

            if ( (childpid = fork()) == 0) {    /* child process */
                Close(listenfd);    /* close listening socket */
                str_echo(connfd);   /* process the request */
                exit(0);
            }
            Close(connfd);          /* parent closes connected socket */
        }

        if (FD_ISSET(udpfd, &rset)) {
            len = sizeof(cliaddr);
            n = Recvfrom(udpfd, mesg, MAXLINE, 0, (SA *) &cliaddr, &len);

            Sendto(udpfd, mesg, n, 0, (SA *) &cliaddr, len);
        }
    }
}

Page 431: NetWork

Advanced UDP Sockets

Page 432: NetWork

When to use UDP instead of TCP?

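• Advantages of UDP:
– UDP supports broadcasting and multicasting.
– UDP has no connection setup or teardown.

• For a two-packet request-reply exchange, TCP needs 8 extra packets to be transmitted (for connection setup, teardown, and acknowledgments).

• Minimum transaction time: UDP: RTT + SPT; TCP: 2 × RTT + SPT (RTT = round-trip time, SPT = server processing time).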

Page 433: NetWork

When to use UDP instead of TCP?

• Features of TCP not provided by UDP:
– Positive acknowledgments, retransmission of lost packets, duplicate detection, and sequencing of packets reordered by the network (using sequence numbers and an estimated RTO)
– Windowed flow control
– Slow start and congestion avoidance, to determine the current network capacity and to handle periods of congestion

Page 434: NetWork

When to use UDP instead of TCP?

• Recommendations:
– UDP must be used for broadcast and multicast applications; any error control or reliability that is required must be added at the application layer.
– UDP can be used for simple request-reply applications, but error detection must be built into the application (acknowledgments, timeouts, retransmissions).
– UDP should not be used for bulk data transfer: bulk transfer requires flow control along with error control, which amounts to replicating TCP at the application layer.

Page 435: NetWork

Adding Reliability to a UDP Application

• A UDP request-reply application needs:
– Timeout and retransmission, to handle datagrams that are discarded
– Sequence numbers, so the client can verify that a reply is for the appropriate request

• Examples that use simple request-reply with reliability: DNS resolvers, SNMP agents, TFTP, and RPC

Page 436: NetWork

Handling Timeout and Retransmission

• Old-fashioned approach: send a request and wait a fixed N seconds before retransmitting (a fixed retransmission timer).

• RTT on a network can vary from fractions of a second on a LAN to many seconds on a WAN.

• Factors affecting the RTT are distance, network speed, and congestion

• Timeout should take into account the actual RTTs that we measure along with the changes in the RTT over time

Page 437: NetWork

Retransmission Timeout (RTO): Jacobson's algorithm

• Uses two statistical estimators: srtt, the smoothed RTT estimator, and rttvar, the smoothed mean deviation estimator.
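The update rules behind these two estimators, in the form Stevens gives for Jacobson's algorithm (the gains g = 1/8 and h = 1/4 are the commonly used values), applied each time an RTT is measured:

```latex
\begin{aligned}
\delta &= \mathit{measuredRTT} - \mathit{srtt} \\
\mathit{srtt} &\leftarrow \mathit{srtt} + g\,\delta, & g &= 1/8 \\
\mathit{rttvar} &\leftarrow \mathit{rttvar} + h\,(|\delta| - \mathit{rttvar}), & h &= 1/4 \\
\mathit{RTO} &= \mathit{srtt} + 4\,\mathit{rttvar}
\end{aligned}
```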

Page 438: NetWork

RTO

• When the retransmission timer expires, an exponential backoff must be used for the next RTO.
– For example, if our first RTO is 2 seconds and the reply is not received in this time, then the next RTO is 4 seconds. If there is still no reply, the next RTO is 8 seconds, then 16, and so on.

Page 439: NetWork

Retransmission ambiguity problem

• Jacobson's algorithms tell us how to calculate the RTO each time we measure an RTT and how to increase the RTO when we retransmit.

• But a problem arises when we have to retransmit a packet and then receive a reply: we cannot tell whether the reply corresponds to the original request or to the retransmission. This is called the retransmission ambiguity problem.

Page 440: NetWork

Retransmission ambiguity problem

Page 441: NetWork

Retransmission ambiguity problem: Karn's algorithm

• The following rules apply whenever a reply is received for a request that was retransmitted:
– If an RTT was measured, do not use it to update the estimators, since we do not know to which request the reply corresponds.
– Since this reply arrived before our retransmission timer expired, reuse this RTO for the next packet. Only when we receive a reply to a request that was not retransmitted do we update the RTT estimators and recalculate the RTO.
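A compact sketch that puts the last few slides together: Jacobson's update, exponential backoff, and Karn's rule. The structure and function names are hypothetical, as are the constants: an initial srtt = 0 and rttvar = 0.75 s (giving a 3 s first RTO) and a 2 to 60 s clamp, chosen to resemble the values in Stevens' sample RTT code.

```c
/* Hypothetical RTO estimator: Jacobson's srtt/rttvar update, exponential
 * backoff on timeout, and Karn's rule for replies to retransmitted requests.
 * Initial values and the clamp range are assumptions for illustration. */
#define RTO_MIN 2.0    /* seconds */
#define RTO_MAX 60.0   /* seconds */

struct rtt_est {
    double srtt;     /* smoothed RTT estimator (seconds) */
    double rttvar;   /* smoothed mean deviation estimator (seconds) */
    double rto;      /* current retransmission timeout (seconds) */
    int    nrexmt;   /* retransmissions of the current request */
};

static double clamp_rto(double rto)
{
    if (rto < RTO_MIN)
        return RTO_MIN;
    if (rto > RTO_MAX)
        return RTO_MAX;
    return rto;
}

void rtt_init(struct rtt_est *e)
{
    e->srtt = 0.0;
    e->rttvar = 0.75;
    e->rto = clamp_rto(e->srtt + 4.0 * e->rttvar);   /* 3 s */
    e->nrexmt = 0;
}

/* Call before sending a new request; the current RTO is kept (Karn). */
void rtt_newpack(struct rtt_est *e)
{
    e->nrexmt = 0;
}

/* Retransmission timer expired: double the RTO (exponential backoff). */
double rtt_timeout(struct rtt_est *e)
{
    e->nrexmt++;
    e->rto = clamp_rto(e->rto * 2.0);
    return e->rto;
}

/* A reply arrived 'measured' seconds after the request was sent. */
void rtt_reply(struct rtt_est *e, double measured)
{
    if (e->nrexmt > 0)
        return;   /* Karn: ambiguous sample, keep estimators and current RTO */

    double delta = measured - e->srtt;
    double absdelta = delta < 0 ? -delta : delta;

    e->srtt += delta / 8.0;                      /* g = 1/8 */
    e->rttvar += (absdelta - e->rttvar) / 4.0;   /* h = 1/4 */
    e->rto = clamp_rto(e->srtt + 4.0 * e->rttvar);
}
```

Starting from rtt_init, a 1.0 s sample gives srtt = 0.125, rttvar = 0.8125 and RTO = 3.375 s; a timeout then doubles the RTO to 6.75 s, and a reply to the retransmitted request leaves all three values unchanged.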

Page 442: NetWork

Concurrent UDP Servers

• Two different types of servers:
– The first is a simple UDP server that reads a client request, sends a reply, and is then finished with the client.
• The server can fork a child and let the child handle the request.
– The second is a UDP server that exchanges multiple datagrams with the client.
• The server creates a new socket for each client, binds an ephemeral port to that socket, and uses that socket for all its replies.
• The client must look at the port number of the server's first reply and send subsequent datagrams for this request to that port.

Page 443: NetWork

Concurrency in UDP server that exchanges multiple datagrams with the client

Page 444: NetWork

Socket Options

Page 445: NetWork

Contents

• Introduction
• getsockopt and setsockopt functions
• Socket states
• Generic socket options
• IPv4 socket options
• ICMPv6 socket option
• IPv6 socket options
• TCP socket options
• fcntl function

Page 446: NetWork

Introduction

• Three ways to get and set the socket options that affect a socket:
– getsockopt and setsockopt functions (e.g., IPv4 and IPv6 multicasting options)
– fcntl function (nonblocking I/O, signal-driven I/O)
– ioctl function (chapter 16)

Page 447: NetWork

getsockopt and setsockopt function

#include <sys/socket.h>

int getsockopt(int sockfd, int level, int optname, void *optval, socklen_t *optlen);
int setsockopt(int sockfd, int level, int optname, const void *optval, socklen_t optlen);

Both return: 0 if OK, -1 on error

• sockfd: an open socket descriptor
• level: the code in the system that interprets the option (generic socket code, IPv4, IPv6, TCP, ...)
• optname: the option being fetched or set
• optval: pointer to a variable from which the new value of the option is fetched by setsockopt, or into which the current value of the option is stored by getsockopt
• optlen: the size of the option variable (passed by value to setsockopt, and as a value-result pointer to getsockopt)
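For the common case of an int-valued option, the two calls can be wrapped as below. This is a sketch; get_int_opt and set_int_opt are hypothetical names. Generic options such as SO_TYPE and SO_REUSEADDR use the level SOL_SOCKET.

```c
#include <sys/socket.h>
#include <unistd.h>     /* close(), used by callers */

/* Hypothetical wrappers for int-valued socket options.
 * Both return 0 on success, -1 on error (errno set by the kernel). */
int get_int_opt(int fd, int level, int optname, int *value) {
    socklen_t len = sizeof(*value);   /* value-result: in = buffer size, out = bytes stored */
    return getsockopt(fd, level, optname, value, &len);
}

int set_int_opt(int fd, int level, int optname, int value) {
    return setsockopt(fd, level, optname, &value, sizeof(value));
}
```

For example, get_int_opt(fd, SOL_SOCKET, SO_TYPE, &v) stores SOCK_STREAM in v for a TCP socket.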

Page 448: NetWork

Generic socket option

• SO_BROADCAST: enables or disables the ability of the process to send broadcast messages (supported only for datagram sockets, and only on networks that support broadcasting, such as Ethernet and token ring).

• SO_DEBUG: the kernel keeps track of detailed information about all packets sent or received by TCP (supported only by TCP).

• SO_DONTROUTE: outgoing packets bypass the normal routing mechanisms of the underlying protocol.

• SO_ERROR: when an error occurs on a socket, the protocol module in a Berkeley-derived kernel sets a variable named so_error for that socket. The process can obtain the value of so_error by fetching the SO_ERROR socket option.

Page 449: NetWork

SO_KEEPALIVE

• SO_KEEPALIVE: if no data has been exchanged in either direction for 2 hours, TCP automatically sends a keepalive probe to the peer.
– Peer responses:
• ACK (everything is OK)
• RST (the peer has crashed and rebooted): ECONNRESET
• No response: ETIMEDOUT, and the socket is closed
– Examples: Rlogin, Telnet
– Normally used by servers

Page 450: NetWork

SO_LINGER

• SO_LINGER specifies how the close function operates for a connection-oriented protocol (by default, close returns immediately).

– struct linger {
      int l_onoff;    /* 0 = off, nonzero = on */
      int l_linger;   /* linger time, in seconds */
  };

• l_onoff = 0: the option is turned off and l_linger is ignored.
• l_onoff nonzero and l_linger = 0: TCP aborts the connection when it is closed, sending an RST and discarding any data still remaining in the socket send buffer.
• l_onoff nonzero and l_linger nonzero: the process waits in close until the remaining data has been sent and acknowledged, or until the linger time expires. If the socket has been set nonblocking, close does not wait, even if the linger time is nonzero.
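The three cases map directly onto the two fields of struct linger. A minimal sketch, with a hypothetical helper name:

```c
#include <sys/socket.h>
#include <unistd.h>     /* close(), used by callers */

/* Hypothetical helper: configure SO_LINGER on a TCP socket.
 *   onoff == 0             -> default: close() returns immediately
 *   onoff != 0, secs == 0  -> close() aborts the connection with an RST
 *   onoff != 0, secs  > 0  -> close() waits up to 'secs' seconds for the
 *                             queued data and FIN to be acknowledged
 * Returns 0 on success, -1 on error. */
int set_linger(int fd, int onoff, int secs) {
    struct linger lg;
    lg.l_onoff = onoff;
    lg.l_linger = secs;
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

Calling set_linger(fd, 1, 0) just before close(fd) is the usual way to force an abortive close.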

Page 451: NetWork

SO_LINGER

[Figure: Default operation of close: it returns immediately. The client's write and close return while its data is still queued by TCP on the server; the server application later reads the queued data and FIN and then calls close.]

Page 452: NetWork

SO_LINGER

[Figure: Close with SO_LINGER socket option set and l_linger a positive value. The client's close does not return until its data and FIN have been acknowledged by the server's TCP, which can happen before the server application has read the data.]

Page 453: NetWork

SO_LINGER

[Figure: Using shutdown to know that peer has received our data. The client calls shutdown after its write and then blocks in read; read returns 0 only after the server application has read the queued data and FIN, called close, and its FIN has arrived.]

Page 454: NetWork

• One way to know that the peer application has read the data is to use an application-level acknowledgment, or application ACK.
– Client:
    char ack;
    Write(sockfd, data, nbytes);        /* data from client to server */
    n = Read(sockfd, &ack, 1);          /* wait for application-level ACK */
– Server:
    nbytes = Read(sockfd, buff, sizeof(buff));  /* data from client */
    /* server verifies it received the correct amount of data from the client */
    Write(sockfd, "", 1);               /* server's ACK back to client */

Page 455: NetWork
Page 456: NetWork
Page 457: NetWork

SO_RCVBUF , SO_SNDBUF

• These options let us change the default send-buffer and receive-buffer sizes.

– Default TCP send and receive buffer sizes: 4096 bytes in older implementations; 8192 to 61440 bytes in newer ones.

– Default UDP buffer sizes: about 9000 bytes (send) and 40000 bytes (receive).
• The SO_RCVBUF option must be set before the connection is established:
– For a client, before calling connect().
– For a server, on the listening socket before calling listen().

• The TCP socket buffer size should be at least three times the MSS.
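Because the kernel may adjust the requested size, it is worth reading the value back after setting it. A sketch under that assumption, with a hypothetical helper name, meant to be called on a socket before connect() or listen():

```c
#include <sys/socket.h>
#include <unistd.h>     /* close(), used by callers */

/* Hypothetical helper: request a receive buffer of 'bytes' on a socket that
 * is not yet connected and return the size actually granted (-1 on error).
 * The kernel may round or scale the request (Linux, for example, doubles it
 * to leave room for bookkeeping), so the granted value can differ. */
int request_rcvbuf(int fd, int bytes) {
    int granted;
    socklen_t len = sizeof(granted);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) < 0)
        return -1;
    return granted;
}
```

With an MSS of 1460 bytes, for example, the three-times rule means requesting at least 4380 bytes.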

Page 458: NetWork

SO_RCVLOWAT , SO_SNDLOWAT

• Every socket has a receive low-water mark and a send low-water mark (used by the select function).

• Receive low-water mark: the amount of data that must be in the socket receive buffer for select to return "readable". The default receive low-water mark is 1 for TCP and UDP sockets.

• Send low-water mark: the amount of available space that must exist in the socket send buffer for select to return "writable". The default send low-water mark is 2048 for TCP sockets. For UDP, the available space in the send buffer never changes, because UDP does not keep a copy of the datagrams it sends.

Page 459: NetWork

SO_RCVTIMEO, SO_SNDTIMEO

• These options allow us to place a timeout on socket receives and sends.

• Both are disabled by default.

Page 460: NetWork

SO_REUSEADDR, SO_REUSEPORT

• Allows a listening server to start and bind its well-known port even if previously established connections exist that use this port as their local port.

• Allows multiple instances of the same server to be started on the same port, as long as each instance binds a different local IP address.

• Allows a single process to bind the same port to multiple sockets, as long as each bind specifies a different local IP address.

• Allows completely duplicate bindings (the same IP address and port bound to more than one socket), typically used for multicasting.

Page 461: NetWork

SO_TYPE

• Returns the socket type.
• The returned value is a value such as SOCK_STREAM or SOCK_DGRAM.

Page 462: NetWork

SO_USELOOPBACK

• This option applies only to sockets in the routing domain (AF_ROUTE).

• When it is enabled, the socket receives a copy of everything sent on the socket.

Page 463: NetWork

IPv4 socket option

• Level: IPPROTO_IP
• IP_HDRINCL: if this option is set for a raw IP socket, we must build our own IP header for all the datagrams that we send on the raw socket.

Page 464: NetWork

IPv4 socket option

• IP_OPTIONS: allows us to set IP options in the IPv4 header (chapter 24).

• IP_RECVDSTADDR: causes the destination IP address of a received UDP datagram to be returned as ancillary data by recvmsg (chapter 20).

Page 465: NetWork

IP_RECVIF

• Causes the index of the interface on which a UDP datagram is received to be returned as ancillary data by recvmsg (chapter 20).

Page 466: NetWork

IP_TOS

• Lets us set the type-of-service (TOS) field in the IP header for a TCP or UDP socket.

• If we call getsockopt for this option, the current value that would be placed into the TOS field in the IP header is returned.

Page 467: NetWork

IP_TTL

• We can set and fetch the default TTL (time-to-live field) that the system uses for unicast packets sent on a given socket.
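Setting and reading back the TTL uses the IPPROTO_IP level. A minimal sketch with a hypothetical helper name (IP_TOS is handled the same way):

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>     /* close(), used by callers */

/* Hypothetical helper: set the default TTL for unicast datagrams sent on an
 * IPv4 socket and return the value the kernel will now use (-1 on error). */
int set_ttl(int fd, int ttl) {
    int cur;
    socklen_t len = sizeof(cur);

    if (setsockopt(fd, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl)) < 0)
        return -1;
    if (getsockopt(fd, IPPROTO_IP, IP_TTL, &cur, &len) < 0)
        return -1;
    return cur;
}
```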

Page 468: NetWork

ICMPv6 socket option

• These socket options are processed by ICMPv6 and have a level of IPPROTO_ICMPV6.

• ICMP6_FILTER: lets us fetch and set an icmp6_filter structure that specifies which of the 256 possible ICMPv6 message types are passed to the process on a raw socket (chapter 25).

Page 469: NetWork

IPv6 socket option

• These socket options are processed by IPv6 and have a level of IPPROTO_IPV6.

• IPV6_ADDRFORM: allows a socket to be converted from IPv4 to IPv6 or vice versa (chapter 10).

• IPV6_CHECKSUM: specifies the byte offset into the user data where the checksum field is located.

Page 470: NetWork

IPV6_DSTOPTS

• Specifies that any received IPv6 destination options are to be returned as ancillary data by recvmsg.

Page 471: NetWork

IPV6_HOPLIMIT

• Setting this option specifies that the received hop limit field is to be returned as ancillary data by recvmsg (chapter 20).

• Default off.

Page 472: NetWork

IPV6_HOPOPTS

• Setting this option specifies that any received IPv6 hop-by-hop options are to be returned as ancillary data by recvmsg (chapter 24).

Page 473: NetWork

IPV6_NEXTHOP

• This is not a socket option but the type of an ancillary data object that can be specified to sendmsg. This object specifies the next-hop address for a datagram as a socket address structure (chapter 20).

Page 474: NetWork

IPV6_PKTINFO

• Setting this option specifies that the following two pieces of information about a received IPv6 datagram are to be returned as ancillary data by recvmsg: the destination IPv6 address and the arriving interface index (chapter 20).

Page 475: NetWork

IPV6_PKTOPTIONS

• Most of the IPv6 socket options assume a UDP socket, with the information being passed between the kernel and the application as ancillary data with recvmsg and sendmsg.

• A TCP socket fetches and stores these values using the IPV6_PKTOPTIONS socket option.

Page 476: NetWork

IPV6_RTHDR

• Setting this option specifies that a received IPv6 routing header is to be returned as ancillary data by recvmsg.(chapter 24)

• Default off

Page 477: NetWork

IPV6_UNICAST_HOPS

• This is similar to the IPv4 IP_TTL option.
• Setting it specifies the default hop limit for outgoing datagrams sent on the socket, while fetching the socket option returns the value for the hop limit that the kernel will use for the socket.

Page 478: NetWork

TCP socket option

• There are five socket options for TCP, but three are new with Posix.1g and not widely supported.

• Specify the level as IPPROTO_TCP.

Page 479: NetWork

TCP_KEEPALIVE

• This is new with Posix.1g.
• It specifies the idle time in seconds for the connection before TCP starts sending keepalive probes.
• The default is 2 hours.
• This option is effective only when the SO_KEEPALIVE socket option is enabled.

Page 480: NetWork

TCP_MAXRT

• This is new with Posix.1g.
• It specifies the amount of time in seconds before a connection is broken once TCP starts retransmitting data.
– 0: use the system default
– -1: retransmit forever
– positive value: may be rounded up to the implementation's next retransmission time

Page 481: NetWork

TCP_MAXSEG

• This allows us to fetch or set the maximum segment size (MSS) for a TCP connection.

Page 482: NetWork

TCP_NODELAY

• This option disables TCP's Nagle algorithm (by default, the algorithm is enabled).
• Purpose of the Nagle algorithm: to prevent a connection from having multiple small packets outstanding at any time.

• Small packet: any packet smaller than the MSS.
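Disabling Nagle is a single setsockopt call at the IPPROTO_TCP level. A minimal sketch with a hypothetical helper name:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>    /* TCP_NODELAY */
#include <sys/socket.h>
#include <unistd.h>         /* close(), used by callers */

/* Hypothetical helper: on != 0 disables Nagle's algorithm (small segments
 * are sent immediately); on == 0 re-enables it. Returns 0 or -1. */
int set_nodelay(int fd, int on) {
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}
```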

Page 483: NetWork

Nagle algorithm

• Enabled by default.
• Reduces the number of small packets on a WAN.
• If a given connection has outstanding data, then no small packets will be sent on the connection until the existing data is acknowledged.

Page 484: NetWork

[Figure: Nagle algorithm disabled. The characters of "hello!" are typed 250 ms apart (time axis 0 to 2000 ms) and each is sent immediately as its own segment.]

Page 485: NetWork

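[Figure: Nagle algorithm enabled. With the same typing of "hello!" (time axis 0 to 2500 ms), the segments sent are "h", "el", "lo", and "!", because each new small segment waits until the previous one is acknowledged.]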

Page 486: NetWork

fcntl function

• File control: this function performs various descriptor control operations.
• It provides the following features:
– Nonblocking I/O (chapter 15)
– Signal-driven I/O (chapter 22)
– Setting the socket owner to receive the SIGIO signal (chapters 21, 22)

Page 487: NetWork

#include <fcntl.h>

int fcntl(int fd, int cmd, ... /* int arg */);

Returns: depends on cmd if OK, -1 on error

O_NONBLOCK: nonblocking I/O
O_ASYNC: signal-driven I/O notification

Page 488: NetWork

Nonblocking I/O using fcntl

int flags;

/* set socket nonblocking */
if ((flags = fcntl(fd, F_GETFL, 0)) < 0)
    err_sys("F_GETFL error");
flags |= O_NONBLOCK;
if (fcntl(fd, F_SETFL, flags) < 0)
    err_sys("F_SETFL error");

Each descriptor has a set of file flags that is fetched with the F_GETFL command and set with the F_SETFL command.

Page 489: NetWork

Misuse of fcntl

/* wrong way to set socket nonblocking */
if (fcntl(fd, F_SETFL, O_NONBLOCK) < 0)
    err_sys("F_SETFL error");

/* wrong because it also clears all the other file status flags */

Page 490: NetWork

Turn off the nonblocking flag

flags &= ~O_NONBLOCK;
if (fcntl(fd, F_SETFL, flags) < 0)
    err_sys("F_SETFL error");

Page 491: NetWork

F_SETOWN

• The integer arg value can be either positive (a process ID) or negative (a process group ID), identifying who receives the signal.

• F_GETOWN: fcntl returns the socket owner, either a process ID (positive return value) or a process group ID (negative return value).

Page 492: NetWork
Page 493: NetWork
Page 494: NetWork

Unix Domain Protocols

Page 495: NetWork

Chapter 14

Unix domain protocol

Page 496: NetWork

contents

• Introduction• unix domain socket address structure• socketpair• socket function• unix domain stream client-server• unix domain datagram client-server• passing descriptors• receiving sender credentials

Page 497: NetWork

Unix Domain Protocol

• Unix domain protocols perform client-server communication on a single host, using the same API that is used for client-server communication between different hosts.

• They are faster than the Internet protocol suite: Unix domain sockets only copy data; they have no protocol processing to perform, no network headers to add or remove, no checksums to calculate, no sequence numbers to generate, and no acknowledgments to send.

• The Unix domain protocols are an alternative to the interprocess communication (IPC) methods described

Page 498: NetWork

Unix Domain Protocol

• Two types of sockets are provided in the Unix domain: – stream sockets (similar to TCP) – datagram sockets (similar to UDP).

• The UNIX domain datagram service is reliable, however: messages are neither lost nor delivered out of order.

Page 499: NetWork

Unix Domain Protocol

• Unix domain sockets are used for three reasons:
– Unix domain sockets are often twice as fast as a TCP socket when both peers are on the same host.
– They are used when passing descriptors between processes on the same host.
– They provide the client's credentials (user ID and group IDs) to the server, which can provide additional security checking.

Page 500: NetWork

Unix Domain Protocol

• End point address:
– a pathname within the normal filesystem
– The pathname associated with a Unix domain socket should be an absolute pathname.

Page 501: NetWork

unix domain socket address structure

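• <sys/un.h>

struct sockaddr_un {
    uint8_t      sun_len;
    sa_family_t  sun_family;     /* AF_LOCAL */
    char         sun_path[104];  /* null-terminated pathname */
};

• sun_path must be null terminated.
• The sun_len member exists only on BSD-derived systems; Linux omits it and declares sun_path with 108 bytes.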
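Because sun_path is a fixed-size array, filling it with an unchecked strcpy can overflow it. A defensive sketch follows; the helper name make_unix_addr is hypothetical, and it sizes the check with sizeof so it works with either the 104-byte or the 108-byte layout.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: fill in a Unix domain address for 'path'.
 * Returns -1 instead of truncating when the pathname plus its terminating
 * null byte does not fit in sun_path, 0 on success. */
int make_unix_addr(struct sockaddr_un *addr, const char *path) {
    size_t n = strlen(path);

    if (n >= sizeof(addr->sun_path))
        return -1;
    memset(addr, 0, sizeof(*addr));        /* also zeroes sun_len where it exists */
    addr->sun_family = AF_LOCAL;
    memcpy(addr->sun_path, path, n + 1);   /* copy including the null byte */
    return 0;
}
```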

Page 502: NetWork

socketpair Function

• Creates two sockets that are then connected together (available only for Unix domain sockets).

• family must be AF_LOCAL.
• protocol must be 0.

#include <sys/socket.h>

int socketpair(int family, int type, int protocol, int sockfd[2]);

Returns: 0 if OK, -1 on error

Page 503: NetWork

socketpair Function

• Although the socketpair function creates sockets that are connected to each other, the individual sockets don't have names.

• This means that they can't be addressed by unrelated processes.
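Since the two sockets are already connected, data written on one end can be read immediately from the other. A small sketch of that round trip; socketpair_roundtrip is a hypothetical name:

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: write on one end of a stream pipe created by
 * socketpair() and read it back from the other end.
 * Returns 0 if the round trip works, -1 otherwise. */
int socketpair_roundtrip(void) {
    int sv[2];
    char buf[16];
    ssize_t n;
    int ok;

    if (socketpair(AF_LOCAL, SOCK_STREAM, 0, sv) < 0)
        return -1;
    n = write(sv[0], "ping", 4);             /* either end can write ... */
    if (n == 4)
        n = read(sv[1], buf, sizeof(buf));   /* ... and the other end reads */
    ok = (n == 4 && memcmp(buf, "ping", 4) == 0);
    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

In the usual pattern, the process forks after socketpair(), and parent and child each close the end they do not use, as my_open does later in this chapter.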

Page 504: NetWork

unix domain stream client-server

#include "unp.h"

int
main(int argc, char **argv)
{
    int                 listenfd, connfd;
    pid_t               childpid;
    socklen_t           clilen;
    struct sockaddr_un  cliaddr, servaddr;
    void                sig_chld(int);

    listenfd = Socket(AF_LOCAL, SOCK_STREAM, 0);

    unlink(UNIXSTR_PATH);
    bzero(&servaddr, sizeof(servaddr));
    servaddr.sun_family = AF_LOCAL;
    strcpy(servaddr.sun_path, UNIXSTR_PATH);

    Bind(listenfd, (SA *) &servaddr, sizeof(servaddr));
    Listen(listenfd, LISTENQ);
    Signal(SIGCHLD, sig_chld);

Page 505: NetWork

unix domain stream client-server(2)

    for ( ; ; ) {
        clilen = sizeof(cliaddr);
        if ( (connfd = accept(listenfd, (SA *) &cliaddr, &clilen)) < 0) {
            if (errno == EINTR)
                continue;           /* back to for() */
            else
                err_sys("accept error");
        }

        if ( (childpid = Fork()) == 0) {    /* child process */
            Close(listenfd);        /* close listening socket */
            str_echo(connfd);       /* process the request */
            exit(0);
        }
        Close(connfd);              /* parent closes connected socket */
    }
}

Page 506: NetWork

passing descriptors

• Current Unix systems provide a way to pass any open descriptor from one process to any other process (using sendmsg).

• The ability to pass an open file descriptor between processes is powerful. It can lead to different ways of designing client-server applications.

• It allows one process (typically a server) to do everything that is required to open a file (involving such details as translating a network name to a network address, dialing a modem, negotiating locks for the file, etc.) and simply pass back to the calling process a descriptor that can be used with all the I/O functions.

• All the details involved in opening the file or device are hidden from the client.

Page 507: NetWork

passing descriptors(2)

1. Create a Unix domain socket (stream or datagram).
2. One process opens a descriptor by calling any of the Unix functions that returns a descriptor.
3. The sending process builds a msghdr structure containing the descriptor to be passed and calls sendmsg.
4. The receiving process calls recvmsg to receive the descriptor on the Unix domain socket.

Passing a descriptor is not passing a descriptor number; it involves creating a new descriptor in the receiving process that refers to the same file table entry within the kernel as the descriptor that was sent by the sending process.

Page 508: NetWork

Passing Descriptor

Page 509: NetWork

Descriptor passing example

[Figure: After creating a stream pipe using socketpair (sockfd[0] and sockfd[1])]

Page 510: NetWork

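[Figure: mycat program after invoking openfile program. mycat forks and execs openfile with command-line arguments (one end of the stream pipe, [1], is passed to openfile); openfile sends the opened descriptor back to mycat over the pipe.]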

Page 511: NetWork

recvmsg and sendmsg

#include <sys/socket.h>

ssize_t recvmsg(int sockfd, struct msghdr *msg, int flags);
ssize_t sendmsg(int sockfd, const struct msghdr *msg, int flags);

struct msghdr {
    void          *msg_name;        /* protocol address */
    socklen_t      msg_namelen;     /* size of protocol address */
    struct iovec  *msg_iov;         /* scatter/gather array */
    size_t         msg_iovlen;      /* # elements in msg_iov */
    void          *msg_control;     /* ancillary data; must be aligned for a cmsghdr structure */
    socklen_t      msg_controllen;  /* length of ancillary data */
    int            msg_flags;       /* flags returned by recvmsg() */
};

Page 512: NetWork

recvmsg and sendmsg

[Figure 13.8: Data structures when recvmsg is called for a UDP socket. msghdr{}: msg_namelen = 16, msg_iovlen = 3, msg_controllen = 20, msg_flags = 0; msg_iov points to an array of three iovec{} entries whose iov_len values are 100, 60, and 80.]

Page 513: NetWork

recvmsg and sendmsg

[Figure 13.9: Update of Figure 13.8 when recvmsg returns. msg_name now holds a sockaddr_in{} (length 16, AF_INET, port 2000, address 198.69.10.2), and the control buffer holds one cmsghdr{} with cmsg_len = 16, cmsg_level = IPPROTO_IP, cmsg_type = IP_RECVDSTADDR, whose data is the destination address 206.62.226.35.]

Page 514: NetWork

Ancillary Data

• Ancillary data can be sent and received using the msg_control and msg_controllen members of the msghdr structure with the sendmsg and recvmsg functions.

Protocol      cmsg_level     cmsg_type         Description
IPv4          IPPROTO_IP     IP_RECVDSTADDR    receive destination address with UDP datagram
                             IP_RECVIF         receive interface index with UDP datagram
IPv6          IPPROTO_IPV6   IPV6_DSTOPTS      specify / receive destination options
                             IPV6_HOPLIMIT     specify / receive hop limit
                             IPV6_HOPOPTS      specify / receive hop-by-hop options
                             IPV6_NEXTHOP      specify next-hop address
                             IPV6_PKTINFO      specify / receive packet information
                             IPV6_RTHDR        specify / receive routing header
Unix domain   SOL_SOCKET     SCM_RIGHTS        send / receive descriptors
                             SCM_CREDS         send / receive user credentials

Page 515: NetWork

Ancillary Data

[Figure 13.12: Ancillary data containing two ancillary data objects. msg_control points to the first object and msg_controllen spans both. Each object is a cmsghdr{} (cmsg_len, cmsg_level, cmsg_type) followed by padding and data; cmsg_len equals CMSG_LEN() of the data, and each object occupies CMSG_SPACE() bytes including trailing padding.]

Page 516: NetWork

Ancillary Data

[Figure 13.13: cmsghdr structure when used with Unix domain sockets. For descriptor passing: cmsg_len = 16, cmsg_level = SOL_SOCKET, cmsg_type = SCM_RIGHTS, followed by the descriptor. For credentials: cmsg_level = SOL_SOCKET, cmsg_type = SCM_CREDS, followed by an fcred{} structure.]

Page 517: NetWork

Control Message Header

struct cmsghdr {
    socklen_t  cmsg_len;    /* data byte count, including header */
    int        cmsg_level;  /* originating protocol */
    int        cmsg_type;   /* protocol-specific type */
    /* followed by the actual control message data */
};

Page 518: NetWork

Control Message Header

• To send a file descriptor:
– Set cmsg_len to the size of the cmsghdr structure plus the size of an integer (the descriptor), that is, CMSG_LEN(sizeof(int)).
– Set cmsg_level to SOL_SOCKET and cmsg_type to SCM_RIGHTS to indicate that we are passing access rights. (SCM stands for socket-level control message.)

– Access rights can be passed only across a UNIX domain socket. The descriptor is stored right after the cmsg_type field; the CMSG_DATA macro returns a pointer to this integer.
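The same steps can be written with the CMSG_SPACE, CMSG_FIRSTHDR, and CMSG_NXTHDR macros and a union that keeps the control buffer aligned for struct cmsghdr. This is a sketch with hypothetical helper names (pass_fd, take_fd), not the send_fd/recv_fd functions shown next; it sends one dummy byte of normal data so that the receiver can distinguish the message from end-of-file.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer large enough for one descriptor, aligned for cmsghdr. */
union fd_cmsg {
    struct cmsghdr hdr;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Hypothetical: send 'fd_to_send' over Unix domain socket 'sock'. 0 or -1. */
int pass_fd(int sock, int fd_to_send) {
    union fd_cmsg ctl;
    struct msghdr msg;
    struct iovec iov;
    struct cmsghdr *cm;
    char byte = 0;                          /* one byte of normal data */

    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    iov.iov_base = &byte;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;             /* we are passing access rights */
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical: receive a descriptor; returns the new descriptor or -1. */
int take_fd(int sock) {
    union fd_cmsg ctl;
    struct msghdr msg;
    struct iovec iov;
    struct cmsghdr *cm;
    char byte;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    iov.iov_base = &byte;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    for (cm = CMSG_FIRSTHDR(&msg); cm != NULL; cm = CMSG_NXTHDR(&msg, cm))
        if (cm->cmsg_level == SOL_SOCKET && cm->cmsg_type == SCM_RIGHTS &&
            cm->cmsg_len == CMSG_LEN(sizeof(int)))
            memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}
```

The received descriptor usually has a different number from the one sent, but it refers to the same open file table entry.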

Page 519: NetWork

Control Message Header

#include <stdlib.h>         /* malloc */
#include <sys/socket.h>

/* size of control buffer to send/recv one file descriptor */
#define CONTROLLEN  CMSG_LEN(sizeof(int))

static struct cmsghdr   *cmptr = NULL;  /* malloc'ed first time */

/*
 * Pass a file descriptor to another process.
 * If fd<0, then -fd is sent back instead as the error status.
 */
int
send_fd(int fd, int fd_to_send)
{
    struct iovec    iov[1];
    struct msghdr   msg;
    char            buf[2];     /* send_fd()/recv_fd() 2-byte protocol */

    iov[0].iov_base = buf;
    iov[0].iov_len  = 2;
    msg.msg_iov     = iov;
    msg.msg_iovlen  = 1;
    msg.msg_name    = NULL;
    msg.msg_namelen = 0;

    if (fd_to_send < 0) {
        msg.msg_control    = NULL;
        msg.msg_controllen = 0;
        buf[1] = -fd_to_send;   /* nonzero status means error */
        if (buf[1] == 0)
            buf[1] = 1;
    } else {
        if (cmptr == NULL && (cmptr = malloc(CONTROLLEN)) == NULL)
            return(-1);
        cmptr->cmsg_level  = SOL_SOCKET;
        cmptr->cmsg_type   = SCM_RIGHTS;
        cmptr->cmsg_len    = CONTROLLEN;
        msg.msg_control    = cmptr;
        msg.msg_controllen = CONTROLLEN;
        *(int *)CMSG_DATA(cmptr) = fd_to_send;  /* the fd to pass */
        buf[1] = 0;             /* zero status means OK */
    }
    buf[0] = 0;                 /* null byte flag to recv_fd() */
    if (sendmsg(fd, &msg, 0) != 2)
        return(-1);
    return(0);
}

Page 520: NetWork

Control Message Header

#include "apue.h"
#include <sys/socket.h>     /* struct msghdr */

/* size of control buffer to send/recv one file descriptor */
#define CONTROLLEN  CMSG_LEN(sizeof(int))

static struct cmsghdr   *cmptr = NULL;  /* malloc'ed first time */

/*
 * Receive a file descriptor from a server process.  Also, any data
 * received is passed to (*userfunc)(STDERR_FILENO, buf, nbytes).
 * We have a 2-byte protocol for receiving the fd from send_fd().
 */
int
recv_fd(int fd, ssize_t (*userfunc)(int, const void *, size_t))
{
    int             newfd, nr, status;
    char            *ptr;
    char            buf[MAXLINE];
    struct iovec    iov[1];
    struct msghdr   msg;

    status = -1;
    for ( ; ; ) {
        iov[0].iov_base = buf;
        iov[0].iov_len  = sizeof(buf);
        msg.msg_iov     = iov;
        msg.msg_iovlen  = 1;
        msg.msg_name    = NULL;
        msg.msg_namelen = 0;
        if (cmptr == NULL && (cmptr = malloc(CONTROLLEN)) == NULL)
            return(-1);
        msg.msg_control    = cmptr;
        msg.msg_controllen = CONTROLLEN;
        if ((nr = recvmsg(fd, &msg, 0)) < 0) {
            err_sys("recvmsg error");
        } else if (nr == 0) {
            err_ret("connection closed by server");
            return(-1);
        }

        /* the null byte is next to last; the status byte is last */
        for (ptr = buf; ptr < &buf[nr]; ) {
            if (*ptr++ == 0) {
                if (ptr != &buf[nr-1])
                    err_dump("message format error");
                status = *ptr & 0xFF;   /* prevent sign extension */
                if (status == 0) {
                    if (msg.msg_controllen != CONTROLLEN)
                        err_dump("status = 0 but no fd");
                    newfd = *(int *)CMSG_DATA(cmptr);
                } else {
                    newfd = -status;
                }
                nr -= 2;
            }
        }
        if (nr > 0 && (*userfunc)(STDERR_FILENO, buf, nr) != nr)
            return(-1);
        if (status >= 0)        /* final data has arrived */
            return(newfd);      /* descriptor, or -status */
    }
}


Page 522: NetWork

Ancillary Data

Page 523: NetWork

#include "unp.h"

int     my_open(const char *, int);

int
main(int argc, char **argv)
{
    int     fd, n;
    char    buff[BUFFSIZE];

    if (argc != 2)
        err_quit("usage: mycat <pathname>");

    if ( (fd = my_open(argv[1], O_RDONLY)) < 0)
        err_sys("cannot open %s", argv[1]);

    while ( (n = Read(fd, buff, BUFFSIZE)) > 0)
        Write(STDOUT_FILENO, buff, n);

    exit(0);
}

mycat program (shown in Figure 14.7)

Page 524: NetWork

#include "unp.h"

int
my_open(const char *pathname, int mode)
{
    int     fd, sockfd[2], status;
    pid_t   childpid;
    char    c, argsockfd[10], argmode[10];

    Socketpair(AF_LOCAL, SOCK_STREAM, 0, sockfd);

    if ( (childpid = Fork()) == 0) {    /* child process */
        Close(sockfd[0]);
        snprintf(argsockfd, sizeof(argsockfd), "%d", sockfd[1]);
        snprintf(argmode, sizeof(argmode), "%d", mode);
        execl("./openfile", "openfile", argsockfd, pathname, argmode,
              (char *) NULL);
        err_sys("execl error");
    }

my_open function (1): open a file and return a descriptor

Page 525: NetWork

    /* parent process - wait for the child to terminate */
    Close(sockfd[1]);           /* close the end we don't use */

    Waitpid(childpid, &status, 0);
    if (WIFEXITED(status) == 0)
        err_quit("child did not terminate");
    if ( (status = WEXITSTATUS(status)) == 0)
        Read_fd(sockfd[0], &c, 1, &fd);
    else {
        errno = status;         /* set errno value from child's status */
        fd = -1;
    }

    Close(sockfd[0]);
    return(fd);
}

my_open function (2): open a file and return a descriptor

Page 526: NetWork

receiving sender credentials

• User credentials via fcred structure

struct fcred {
    uid_t  fc_ruid;                 /* real user ID */
    gid_t  fc_rgid;                 /* real group ID */
    char   fc_login[MAXLOGNAME];    /* setlogin() name */
    uid_t  fc_uid;                  /* effective user ID */
    short  fc_ngroups;              /* number of groups */
    gid_t  fc_groups[NGROUPS];      /* supplementary group IDs */
};
#define fc_gid  fc_groups[0]        /* effective group ID */

Page 527: NetWork

receiving sender credentials(2)

• Usually MAXLOGNAME is 16 and NGROUPS is 16.
• fc_ngroups is at least 1.

• The credentials are sent as ancillary data when data is sent on a Unix domain socket, but only if the receiver of the data has enabled the LOCAL_CREDS socket option.

• On a datagram socket, the credentials accompany every datagram.
• Credentials cannot be sent along with a descriptor.
• Users are not able to forge credentials.

Page 528: NetWork

Advanced I/O Functions

Page 529: NetWork

Outline

• Socket Timeouts• recv and send Functions• readv and writev Functions• recvmsg and sendmsg Function• Ancillary Data• How much Data is Queued?• Sockets and Standard I/O

Page 530: NetWork

Socket Timeouts

• Three ways to place a timeout on an I/O operation involving a socket:
– Call alarm, which generates the SIGALRM signal when the specified time has expired.
– Block waiting for I/O in select, which has a time limit built in, instead of blocking in a call to read or write.
– Use the newer SO_RCVTIMEO and SO_SNDTIMEO socket options.

Page 531: NetWork

Connect with a Timeout Using SIGALRM

static void connect_alarm(int);

int
connect_timeo(int sockfd, const SA *saptr, socklen_t salen, int nsec)
{
    Sigfunc *sigfunc;
    int     n;

    sigfunc = Signal(SIGALRM, connect_alarm);
    if (alarm(nsec) != 0)
        err_msg("connect_timeo: alarm was already set");

    if ( (n = connect(sockfd, (struct sockaddr *) saptr, salen)) < 0) {
        close(sockfd);
        if (errno == EINTR)
            errno = ETIMEDOUT;
    }
    alarm(0);                   /* turn off the alarm */
    Signal(SIGALRM, sigfunc);   /* restore previous signal handler */

    return(n);
}

static void
connect_alarm(int signo)
{
    return;                     /* just interrupt the connect() */
}

Page 532: NetWork

recvfrom with a Timeout Using SIGALRM

static void sig_alrm(int);

void
dg_cli(FILE *fp, int sockfd, const SA *pservaddr, socklen_t servlen)
{
    int     n;
    char    sendline[MAXLINE], recvline[MAXLINE + 1];

    Signal(SIGALRM, sig_alrm);

    while (Fgets(sendline, MAXLINE, fp) != NULL) {

        Sendto(sockfd, sendline, strlen(sendline), 0, pservaddr, servlen);

        alarm(5);
        if ( (n = recvfrom(sockfd, recvline, MAXLINE, 0, NULL, NULL)) < 0) {
            if (errno == EINTR)
                fprintf(stderr, "socket timeout\n");
            else
                err_sys("recvfrom error");
        } else {
            alarm(0);
            recvline[n] = 0;    /* null terminate */
            Fputs(recvline, stdout);
        }
    }
}

static void
sig_alrm(int signo)
{
    return;                     /* just interrupt the recvfrom() */
}

Page 533: NetWork

recvfrom with a Timeout Using select

int
readable_timeo(int fd, int sec)
{
    fd_set          rset;
    struct timeval  tv;

    FD_ZERO(&rset);
    FD_SET(fd, &rset);

    tv.tv_sec = sec;
    tv.tv_usec = 0;

    return(select(fd+1, &rset, NULL, NULL, &tv));
        /* > 0 if descriptor is readable, 0 on timeout, -1 on error */
}

Page 534: NetWork

Timeout Using the SO_RCVTIMEO SO_SNDTIMEO Socket Option

• We set the option once for a descriptor, specifying the timeout value, and this timeout then applies to all read operations (SO_RCVTIMEO) or all write operations (SO_SNDTIMEO) on that descriptor.

• We set the option only once, compared to the previous two methods, which required doing something before every operation on which we wanted to place a time limit.

• Neither socket option can be used to set a timeout for connect.
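When the SO_RCVTIMEO timer expires, the blocked read fails with EWOULDBLOCK (equal to EAGAIN on many systems). A minimal sketch; both helper names are hypothetical, and the timeout is given in milliseconds for convenience:

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: set a receive timeout of 'ms' milliseconds. */
int set_rcv_timeout(int fd, long ms) {
    struct timeval tv;
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Hypothetical helper: returns bytes read, 0 on EOF, -2 if the receive
 * timeout expired, or -1 on any other error. */
ssize_t read_or_timeout(int fd, void *buf, size_t len) {
    ssize_t n = read(fd, buf, len);
    if (n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
        return -2;                      /* the SO_RCVTIMEO timer expired */
    return n;
}
```

The timeout is set once and then applies to every subsequent read on the descriptor, with no per-call setup.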

Page 535: NetWork

recvfrom with a Timeout Using the SO_RCVTIMEO Socket Option

    int             n;
    char            sendline[MAXLINE], recvline[MAXLINE + 1];
    struct timeval  tv;

    tv.tv_sec = 5;
    tv.tv_usec = 0;
    Setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    while (Fgets(sendline, MAXLINE, fp) != NULL) {

        Sendto(sockfd, sendline, strlen(sendline), 0, pservaddr, servlen);

        n = recvfrom(sockfd, recvline, MAXLINE, 0, NULL, NULL);
        if (n < 0) {
            if (errno == EWOULDBLOCK) {
                fprintf(stderr, "socket timeout\n");
                continue;
            } else
                err_sys("recvfrom error");
        }

        recvline[n] = 0;        /* null terminate */
        Fputs(recvline, stdout);
    }

Page 536: NetWork

recv and send Functions

#include <sys/socket.h>

ssize_t recv(int sockfd, void *buff, size_t nbytes, int flags);
ssize_t send(int sockfd, const void *buff, size_t nbytes, int flags);

Flags           Description                           recv   send
MSG_DONTROUTE   bypass routing table lookup                  x
MSG_DONTWAIT    only this operation is nonblocking     x      x
MSG_OOB         send or receive out-of-band data       x      x
MSG_PEEK        peek at incoming message               x
MSG_WAITALL     wait for all the data                  x
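MSG_PEEK returns queued data without removing it from the receive buffer, so the next ordinary recv sees the same bytes again. A small sketch with a hypothetical function name:

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: peek at the queued data, then consume it, and check
 * that both calls returned the same bytes. Returns 0 if so, -1 otherwise. */
int peek_then_read(int rfd) {
    char peeked[8], taken[8];
    ssize_t n1, n2;

    n1 = recv(rfd, peeked, sizeof(peeked), MSG_PEEK);  /* data stays queued */
    n2 = recv(rfd, taken, sizeof(taken), 0);           /* now it is removed */
    if (n1 <= 0 || n1 != n2 || memcmp(peeked, taken, (size_t) n1) != 0)
        return -1;
    return 0;
}
```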

Page 537: NetWork

readv and writev Functions

– readv and writev let us read into or write from one or more buffers with a single function call.

• These operations are called scatter read and gather write.

#include <sys/uio.h>

ssize_t readv(int filedes, const struct iovec *iov, int iovcnt);
ssize_t writev(int filedes, const struct iovec *iov, int iovcnt);

struct iovec {
    void    *iov_base;  /* starting address of buffer */
    size_t   iov_len;   /* size of buffer */
};

Page 538: NetWork

readv and writev Functions

– The readv and writev functions can be used with any descriptor, not just sockets.
– writev is an atomic operation. For a record-based protocol such as UDP, one call to writev generates a single UDP datagram.
– One use of writev is related to the TCP_NODELAY socket option:

• A write of 4 bytes followed by a write of 396 bytes could invoke the Nagle algorithm, and a preferred solution is to call writev for the two buffers.
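The 4-byte-header-plus-body case can be sketched as one gather write and one scatter read. The helper names are hypothetical, and the fixed 4-byte header is an assumption made for illustration:

```c
#include <string.h>
#include <sys/socket.h>     /* socketpair(), used in the usage below */
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical: send a 4-byte header and a body with ONE writev call,
 * so the kernel sees a single write. Returns bytes written or -1. */
ssize_t send_record(int fd, const char hdr[4], const char *body, size_t body_len) {
    struct iovec iov[2];
    iov[0].iov_base = (void *) hdr;     /* iov_base is not const-qualified */
    iov[0].iov_len = 4;
    iov[1].iov_base = (void *) body;
    iov[1].iov_len = body_len;
    return writev(fd, iov, 2);
}

/* Hypothetical: scatter read; the first 4 bytes land in hdr, the rest in body. */
ssize_t recv_record(int fd, char hdr[4], char *body, size_t body_len) {
    struct iovec iov[2];
    iov[0].iov_base = hdr;
    iov[0].iov_len = 4;
    iov[1].iov_base = body;
    iov[1].iov_len = body_len;
    return readv(fd, iov, 2);
}
```

Usage: on a stream pipe from socketpair(), send_record(sv[0], "LEN5", "hello", 5) writes 9 bytes in one call, and recv_record on sv[1] places "LEN5" in the header buffer and "hello" in the body buffer.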

Page 539: NetWork

Nagle’s Algorithm

if there is new data to send
    if the window size >= MSS and available data is >= MSS
        send complete MSS segment now
    else
        if there is unconfirmed data still in the pipe
            enqueue data in the buffer until an acknowledge is received
        else
            send data immediately
        end if
    end if
end if
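The decision in the pseudocode can be expressed as a pure C function, which makes each case easy to check. This is a sketch: the enum and parameter names are hypothetical, and real TCP implementations combine this test with other conditions that the sketch ignores.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of one send opportunity. */
enum nagle_action { NAGLE_NOTHING, NAGLE_SEND_FULL_SEGMENT, NAGLE_QUEUE, NAGLE_SEND_NOW };

/* pending: bytes waiting to be sent; window: usable send window;
 * unacked_data: true if previously sent data is still unacknowledged. */
enum nagle_action nagle_decide(size_t pending, size_t window, size_t mss,
                               bool unacked_data) {
    if (pending == 0)
        return NAGLE_NOTHING;                /* no new data to send */
    if (window >= mss && pending >= mss)
        return NAGLE_SEND_FULL_SEGMENT;      /* full-sized segments always go out */
    if (unacked_data)
        return NAGLE_QUEUE;                  /* small segment while data is in flight */
    return NAGLE_SEND_NOW;                   /* nothing outstanding: send it now */
}
```

In the "hello!" example, "h" goes out at once because nothing is outstanding; the next characters get NAGLE_QUEUE until the ACK arrives, which is why they are coalesced into "el" and "lo".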

Page 541: NetWork

recvmsg and sendmsg

Flag            Examined by:          Examined by:          Returned by:
                send, sendto,         recv, recvfrom,       recvmsg msg_flags
                sendmsg flags         recvmsg flags
MSG_DONTROUTE   x
MSG_DONTWAIT    x                     x
MSG_PEEK                              x
MSG_WAITALL                           x
MSG_EOR         x                                           x
MSG_OOB         x                     x                     x
MSG_BCAST                                                   x
MSG_MCAST                                                   x
MSG_TRUNC                                                   x
MSG_CTRUNC                                                  x
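The returned flags are read from msg_flags after recvmsg completes. A sketch that detects MSG_TRUNC when a datagram is larger than the buffer supplied; the function name is hypothetical and the 64-byte internal buffer is an arbitrary choice:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical demo: receive one datagram into at most 'cap' bytes and
 * report whether the kernel set MSG_TRUNC in msg_flags.
 * Returns 1 if truncated, 0 if not, -1 on error. */
int recv_was_truncated(int fd, size_t cap) {
    char buf[64];
    struct iovec iov;
    struct msghdr msg;

    if (cap > sizeof(buf))
        cap = sizeof(buf);
    memset(&msg, 0, sizeof(msg));
    iov.iov_base = buf;
    iov.iov_len = cap;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    if (recvmsg(fd, &msg, 0) < 0)
        return -1;
    return (msg.msg_flags & MSG_TRUNC) ? 1 : 0;   /* filled in on return */
}
```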

Page 542: NetWork

recvmsg and sendmsg

Figure 13.8 Data structures when recvmsg is called for a UDP socket: a msghdr{} with msg_namelen = 16, msg_iovlen = 3 (an iovec{} array describing buffers of 100, 60, and 80 bytes), msg_controllen = 20, and msg_flags = 0.

Page 543: NetWork

recvmsg and sendmsg

Figure 13.9 Update of Figure 13.8 when recvmsg returns: the sockaddr_in{} pointed to by msg_name holds length 16, AF_INET, port 2000, and address 198.69.10.2; the ancillary data holds a cmsghdr{} with cmsg_len 16, cmsg_level IPPROTO_IP, cmsg_type IP_RECVDSTADDR, followed by the datagram's destination address 206.62.226.35.

Page 544: NetWork

Ancillary Data

• Ancillary data can be sent and received using the msg_control and msg_controllen members of the msghdr structure with the sendmsg and recvmsg functions.

Protocol      cmsg_level     cmsg_type         Description
IPv4          IPPROTO_IP     IP_RECVDSTADDR    receive destination address with UDP datagram
                             IP_RECVIF         receive interface index with UDP datagram
IPv6          IPPROTO_IPV6   IPV6_DSTOPTS      specify / receive destination options
                             IPV6_HOPLIMIT     specify / receive hop limit
                             IPV6_HOPOPTS      specify / receive hop-by-hop options
                             IPV6_NEXTHOP      specify next-hop address
                             IPV6_PKTINFO      specify / receive packet information
                             IPV6_RTHDR        specify / receive routing header
Unix domain   SOL_SOCKET     SCM_RIGHTS        send / receive descriptors
                             SCM_CREDS         send / receive user credentials

Page 545: NetWork

Ancillary Data

Figure 13.12 Ancillary data containing two ancillary data objects. msg_control points to the first object. Each object is a cmsghdr{} (cmsg_len, cmsg_level, cmsg_type), padding, its data, and trailing padding, and occupies CMSG_SPACE() bytes; each object's cmsg_len is given by CMSG_LEN(), and msg_controllen covers both objects.

Page 546: NetWork

Ancillary Data

Figure 13.13 cmsghdr structure when used with Unix domain sockets: for descriptor passing, cmsg_len = 16, cmsg_level = SOL_SOCKET, and cmsg_type = SCM_RIGHTS, followed by the descriptor; for credentials, cmsg_level = SOL_SOCKET and cmsg_type = SCM_CREDS, followed by an fcred{} structure.
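A sketch of descriptor passing with an SCM_RIGHTS ancillary data object, assuming a Unix domain socket and the standard CMSG_* macros. send_fd and recv_fd are hypothetical helper names. The union forces the control buffer to be aligned for a cmsghdr, which the msg_control comment in struct msghdr requires. (SCM_CREDS is BSD-specific; Linux uses a different credentials message, so it is not shown.)

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send descriptor fd_to_send over Unix domain socket sock. One byte of
 * normal data accompanies the ancillary data object. Returns 0 or -1. */
int send_fd(int sock, int fd_to_send)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                                  /* aligns buf for a cmsghdr */
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } control;
    struct msghdr msg;
    struct cmsghdr *cmp;

    memset(&msg, 0, sizeof(msg));
    memset(&control, 0, sizeof(control));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    cmp = CMSG_FIRSTHDR(&msg);
    cmp->cmsg_len = CMSG_LEN(sizeof(int));   /* header + one descriptor */
    cmp->cmsg_level = SOL_SOCKET;
    cmp->cmsg_type = SCM_RIGHTS;
    memcpy(CMSG_DATA(cmp), &fd_to_send, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd. Returns the new descriptor or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } control;
    struct msghdr msg;
    struct cmsghdr *cmp;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    cmp = CMSG_FIRSTHDR(&msg);
    if (cmp == NULL || cmp->cmsg_len != CMSG_LEN(sizeof(int)) ||
        cmp->cmsg_level != SOL_SOCKET || cmp->cmsg_type != SCM_RIGHTS)
        return -1;                           /* no descriptor was passed */
    memcpy(&fd, CMSG_DATA(cmp), sizeof(int));
    return fd;
}
```

The received descriptor is a new number in the receiving process that refers to the same open file as the sender's descriptor.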

Page 547: NetWork

How Much Data Is Queued?

• nonblocking I/O

• MSG_PEEK with the MSG_DONTWAIT flag

• the FIONREAD command of ioctl
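A sketch of the last two techniques, assuming a POSIX system where FIONREAD is available through <sys/ioctl.h> (true on Linux and the BSDs). bytes_queued and peek_nonblock are hypothetical names.

```c
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Number of bytes queued for reading on fd (FIONREAD), or -1 on error. */
int bytes_queued(int fd)
{
    int n;

    if (ioctl(fd, FIONREAD, &n) < 0)
        return -1;
    return n;
}

/* Look at queued data without removing it and without blocking.
 * Returns bytes copied, 0 on EOF, or -1 with errno EAGAIN/EWOULDBLOCK
 * when nothing is queued. */
ssize_t peek_nonblock(int fd, void *buf, size_t len)
{
    return recv(fd, buf, len, MSG_PEEK | MSG_DONTWAIT);
}
```

Because MSG_PEEK leaves the data in the receive queue, FIONREAD still reports the same count after the peek.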

Page 548: NetWork

Sockets and Standard I/O

• Standard I/O streams can be used with sockets, but there are a few items to consider.

– A standard I/O stream can be created from any descriptor by calling the fdopen function. Similarly, given a standard I/O stream, we can obtain the corresponding descriptor by calling fileno.

– A stream opened for both reading and writing needs a call to fseek, fsetpos, or rewind (or fflush, after output) between output and input. The problem with fseek, fsetpos, and rewind is that they all call lseek, which fails on a socket.

– The easiest way to handle this read-write problem is to open two standard I/O streams for a given socket: one for reading, and one for writing.

Page 549: NetWork

Standard i/O buffers

• Fully buffered: I/O takes place only when the buffer is full, or when the process calls fflush() or exit(). A common buffer size is 8192 bytes.

• Line buffered: I/O takes place when a newline is encountered, or when the process calls fflush() or exit().

• Unbuffered: I/O takes place each time a standard I/O output function is called.

Page 550: NetWork

Standard i/O buffers

• Standard error is always unbuffered.

• Standard input and standard output are fully buffered, unless they refer to a terminal device, in which case they are line buffered.

• All other streams are fully buffered unless they refer to a terminal device, in which case they are line buffered.

Page 551: NetWork

Sockets and Standard I/O

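#include "unp.h"

void
str_echo(int sockfd)
{
    char  line[MAXLINE];
    FILE *fpin, *fpout;

    fpin = Fdopen(sockfd, "r");     /* stream for reading from the socket */
    fpout = Fdopen(sockfd, "w");    /* separate stream for writing */

    for ( ; ; ) {
        if (Fgets(line, MAXLINE, fpin) == NULL)
            return;                 /* connection closed by other end */

        /* fpout refers to a socket, not a terminal, so it is fully
           buffered: the line is not sent until the buffer fills,
           fflush(fpout) is called, or the process exits. */
        Fputs(line, fpout);
    }
}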

Page 552: NetWork

Chapter 12.

Daemon Processes and inetd Superserver

Page 553: NetWork

12.1 Introduction

• A daemon is a process that runs in the background and is independent of control from all terminals.

• There are numerous ways to start a daemon:
1. the system initialization scripts (/etc/rc)
2. the inetd superserver
3. the cron daemon
4. the at command
5. from user terminals

• Since a daemon does not have a controlling terminal, it needs some way to output messages when something happens, either normal informational messages or emergency messages that need to be handled by an administrator.

Page 554: NetWork

12.2 syslogd daemon

• Berkeley-derived implementations of syslogd perform the following actions upon startup.

1. The configuration file is read, specifying what to do with each type of log message that the daemon can receive.

2. A Unix domain socket is created and bound to the pathname /var/run/log (/dev/log on some systems).

3. A UDP socket is created and bound to port 514.

4. The pathname /dev/klog is opened. Any error messages from within the kernel appear as input on this device.

• We could send log messages to the syslogd daemon from our daemons by creating a Unix domain datagram socket and sending our messages to the pathname that the daemon has bound, but an easier interface is the syslog function.

Page 555: NetWork

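syslogd

Figure: syslogd reads log messages from a UDP socket (port 514), a Unix domain socket (/dev/log), and /dev/klog, and sends them to a file in the filesystem (/var/log/messages), to the console, or to a remote syslogd.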

Page 556: NetWork

12.3 syslog function

– The priority argument is a combination of a level and a facility.

– The message is like a format string to printf, with the addition of a %m specification, which is replaced with the error message corresponding to the current value of errno.

Ex) syslog(LOG_INFO | LOG_LOCAL2, "rename(%s, %s): %m", file1, file2);

#include <syslog.h>

void syslog(int priority, const char *message, . . . );

Page 557: NetWork

12.3 syslog function

• Log messages have a level between 0 and 7.

level          value   description
LOG_EMERG      0       system is unusable (highest priority)
LOG_ALERT      1       action must be taken immediately
LOG_CRIT       2       critical conditions
LOG_ERR        3       error conditions
LOG_WARNING    4       warning conditions
LOG_NOTICE     5       normal but significant condition (default)
LOG_INFO       6       informational
LOG_DEBUG      7       debug-level messages (lowest priority)

Figure 12.1 Level of log messages.

Page 558: NetWork

12.3 syslog function

• A facility identifies the type of process sending the message.

facility       Description
LOG_AUTH       security / authorization messages
LOG_AUTHPRIV   security / authorization messages (private)
LOG_CRON       cron daemon
LOG_DAEMON     system daemons
LOG_FTP        FTP daemon
LOG_KERN       kernel messages
LOG_LOCAL0     local use
LOG_LOCAL1     local use
LOG_LOCAL2     local use
LOG_LOCAL3     local use
LOG_LOCAL4     local use
LOG_LOCAL5     local use
LOG_LOCAL6     local use
LOG_LOCAL7     local use
LOG_LPR        line printer system
LOG_MAIL       mail system
LOG_NEWS       network news system
LOG_SYSLOG     messages generated internally by syslogd
LOG_USER       random user-level messages (default)
LOG_UUCP       UUCP system

Figure 12.2 Facility of log messages.

Page 559: NetWork

12.3 syslog function

• openlog and closelog
– openlog can be called before the first call to syslog, and closelog can be called when the application is finished sending log messages.

#include <syslog.h>

void openlog(const char *ident, int options, int facility);

void closelog(void);

options       Description
LOG_CONS      Log to console if cannot send to syslog daemon
LOG_NDELAY    Do not delay open; create socket now
LOG_PERROR    Log to standard error as well as sending to syslog daemon
LOG_PID       Log the process ID with each message

Figure 12.3 Options for openlog.
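A sketch that puts openlog, syslog with %m, and closelog together, using the rename example from the syslog slide. rename_logged, syslog_demo, and the ident "mydaemon" are hypothetical. If no syslog daemon is listening, the message is simply dropped (LOG_CONS would send it to the console instead).

```c
#include <stdio.h>
#include <syslog.h>

/* Rename file1 to file2; on failure, log through syslog. %m expands to
 * the error message for the current errno. Returns rename's result. */
int rename_logged(const char *file1, const char *file2)
{
    int r = rename(file1, file2);

    if (r < 0)
        syslog(LOG_INFO | LOG_LOCAL2, "rename(%s, %s): %m", file1, file2);
    return r;
}

/* Typical pattern: identify once with openlog, log, then closelog. */
int syslog_demo(void)
{
    int r;

    openlog("mydaemon", LOG_PID, LOG_LOCAL2);  /* prefix: ident and PID */
    r = rename_logged("/nonexistent/a", "/nonexistent/b");
    closelog();
    return r;
}
```

The priority LOG_INFO | LOG_LOCAL2 keeps the level in the low 3 bits and the facility above them, which is why the two can simply be ORed.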

Page 560: NetWork

Unix Login

Page 561: NetWork

Unix Login

Page 562: NetWork

Process Group

• A process group is a collection of one or more processes, usually associated with the same job.

• int setpgid(pid_t pid, pid_t pgid);

• pid_t getpgid(pid_t pid);

• It is possible for a process group leader to create a process group, create processes in the group, and then terminate. The process group still exists, as long as at least one process is in the group, regardless of whether the group leader terminates.
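A sketch of creating a new process group for a child, assuming a POSIX system. Both parent and child call setpgid, a common way to avoid depending on which one runs first after fork; a pipe keeps the child alive until the parent has checked the result. new_group_demo is a hypothetical name.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child and place it in a new process group whose ID equals the
 * child's PID. Returns 1 if getpgid confirms it, 0 if not, -1 on error. */
int new_group_demo(void)
{
    int p[2];
    pid_t pid, pgid;
    char c;

    if (pipe(p) < 0)
        return -1;
    if ((pid = fork()) < 0)
        return -1;
    if (pid == 0) {                 /* child */
        close(p[1]);
        setpgid(0, 0);              /* become leader of group getpid() */
        (void) read(p[0], &c, 1);   /* wait until the parent closes the pipe */
        _exit(0);
    }
    close(p[0]);
    setpgid(pid, pid);              /* parent does the same; either may win */
    pgid = getpgid(pid);
    close(p[1]);                    /* let the child exit */
    waitpid(pid, NULL, 0);
    return pgid == pid;
}
```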

Page 563: NetWork

Process Groups in a Session

• The processes in a process group are usually placed there by a shell pipeline:
– proc1 | proc2 &
– proc3 | proc4 | proc5

Page 564: NetWork

Creating Session

• A process establishes a new session by calling the setsid function

• If the calling process is not a process group leader, this function creates a new session. Three things happen:

– The process becomes the session leader of this new session. (A session leader is the process that creates a session.) The process is the only process in this new session.

– The process becomes the process group leader of a new process group. The new process group ID is the process ID of the calling process.

– The process has no controlling terminal. If the process had a controlling terminal before calling setsid, that association is broken.

Page 565: NetWork

setsid

• pid_t setsid(void);

• This function returns an error if the caller is already a process group leader.

• To ensure this is not the case, the usual practice is to call fork and have the parent terminate and the child continue. We are guaranteed that the child is not a process group leader, because the process group ID of the parent is inherited by the child, but the child gets a new process ID. Hence, it is impossible for the child's process ID to equal its inherited process group ID.
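A sketch of the fork-then-setsid practice, assuming a POSIX/XSI system that provides getsid. The child reports through a pipe whether it became both session leader and process group leader. setsid_demo is a hypothetical name.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the forked child became session and group leader after
 * setsid, 0 if not, -1 on error. */
int setsid_demo(void)
{
    int p[2], result;
    pid_t pid;
    char r = 0;

    if (pipe(p) < 0)
        return -1;
    if ((pid = fork()) < 0)
        return -1;
    if (pid == 0) {                     /* child: never a group leader */
        close(p[0]);
        r = (setsid() == getpid() &&    /* new session ID is our PID */
             getsid(0) == getpid() &&
             getpgrp() == getpid());    /* and so is the new group ID */
        (void) write(p[1], &r, 1);
        _exit(0);
    }
    close(p[1]);
    result = (read(p[0], &r, 1) == 1) ? r : -1;
    close(p[0]);
    waitpid(pid, NULL, 0);
    return result;
}
```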

Page 566: NetWork

Controlling Terminal

Page 567: NetWork

12.4 daemon_init Function

#include "unp.h"            /* Fork() and Signal() wrappers */
#include <syslog.h>
#define MAXFD 64
extern int daemon_proc;     /* defined in error.c */

void
daemon_init(const char *pname, int facility)
{
    int   i;
    pid_t pid;
    if ( (pid = Fork()) != 0)
        exit(0);                  /* parent terminates */
    /* 1st child continues */
    setsid();                     /* become session leader */
    Signal(SIGHUP, SIG_IGN);
    if ( (pid = Fork()) != 0)
        exit(0);                  /* 1st child terminates */
    /* 2nd child continues */
    daemon_proc = 1;              /* for our err_XXX() functions */
    chdir("/");                   /* change working directory */
    umask(0);                     /* clear our file mode creation mask */
    for (i = 0; i < MAXFD; i++)
        close(i);                 /* close inherited descriptors */
    openlog(pname, LOG_PID, facility);
}

Page 568: NetWork

Daemon_init

1. We first call fork and then the parent terminates, and the child continues. If the process was started as a shell command in the foreground, when the parent terminates, the shell thinks the command is done. This automatically runs the child process in the background. Also, the child inherits the process group ID from the parent but gets its own process ID. This guarantees that the child is not a process group leader, which is required for the next call to setsid.

2. The process becomes the session leader of the new session, becomes the process group leader of a new process group, and has no controlling terminal

Page 569: NetWork

Daemon_init

• We ignore SIGHUP and call fork again. When this function returns, the parent is really the first child and it terminates, leaving the second child running. The purpose of this second fork is to guarantee that the daemon cannot automatically acquire a controlling terminal should it open a terminal device in the future. When a session leader without a controlling terminal opens a terminal device (that is not currently some other session's controlling terminal), the terminal becomes the controlling terminal of the session leader. But by calling fork a second time, we guarantee that the second child is no longer a session leader, so it cannot acquire a controlling terminal. We must ignore SIGHUP because when the session leader terminates (the first child), all processes in the session (our second child) receive the SIGHUP signal.

Page 570: NetWork

12.5 inetd Daemon

• On a typical Unix system, many servers (FTP, Telnet, TFTP, and so on) each had their own daemon process waiting for client requests. This had two problems:
1. All these daemons contained nearly identical startup code.
2. Each daemon took a slot in the process table, but each daemon was asleep most of the time.

• The inetd daemon fixes the two problems:
1. It simplifies writing daemon processes, since most of the startup details are handled by inetd.
2. It allows a single process (inetd) to be waiting for incoming client requests for multiple services, instead of one process for each service.

Page 571: NetWork

12.5 inetd daemon

• Figure 12.7 Steps performed by inetd:
– For each service listed in the /etc/inetd.conf file: socket(), bind(), and listen() (if TCP socket).
– select() for readability, then accept() (if TCP socket), then fork().
– child: close all descriptors other than the socket; dup the socket to descriptors 0, 1, and 2; close the socket; setgid() and setuid() (if user not root); exec() the server.
– parent: close the connected socket (if TCP).

Page 572: NetWork

inetd service specification

• For each service, inetd needs to know:
– the socket type and transport protocol
– the wait/nowait flag
– the login name the process should run as
– the pathname of the real server program
– the command-line arguments to the server program

• Servers that are expected to deal with frequent requests, such as mail, web, and NFS servers, are typically not run from inetd.

Page 573: NetWork

# Syntax for socket-based Internet services:
# <service_name> <socket_type> <proto> <flags> <user> <server_pathname> <args>
#
# comments start with #
echo      stream  tcp  nowait  root    internal
echo      dgram   udp  wait    root    internal
chargen   stream  tcp  nowait  root    internal
chargen   dgram   udp  wait    root    internal
ftp       stream  tcp  nowait  root    /usr/sbin/ftpd       ftpd -l
telnet    stream  tcp  nowait  root    /usr/sbin/telnetd    telnetd
finger    stream  tcp  nowait  root    /usr/sbin/fingerd    fingerd
# Authentication
auth      stream  tcp  nowait  nobody  /usr/sbin/in.identd  in.identd -l -e -o
# TFTP
tftp      dgram   udp  wait    root    /usr/sbin/tftpd      tftpd -s /tftpboot

Example /etc/inetd.conf

Page 574: NetWork

wait/nowait

• wait specifies that inetd should not look for new clients for the service until the child (the real server) has terminated.

• TCP servers usually specify nowait: inetd can then start multiple copies of the TCP server program, providing concurrency.

• Most UDP services run with inetd told to wait until the child server has terminated.
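Because inetd has already dup'ed the connected socket onto descriptors 0, 1, and 2 before exec, a server started by inetd (for example as a nowait TCP service) only needs to read and write those descriptors. A sketch of the core of such an echo server; serve_echo is hypothetical, and it takes the descriptors as parameters so it can be tested with pipes. A real server would call serve_echo(STDIN_FILENO, STDOUT_FILENO) from main().

```c
#include <unistd.h>

/* Copy in_fd to out_fd until EOF. Returns 0 on EOF, -1 on error. */
int serve_echo(int in_fd, int out_fd)
{
    char buf[4096];
    ssize_t n;

    while ((n = read(in_fd, buf, sizeof(buf))) > 0) {
        ssize_t off = 0;

        while (off < n) {                   /* handle short writes */
            ssize_t w = write(out_fd, buf + off, n - off);
            if (w < 0)
                return -1;
            off += w;
        }
    }
    return n < 0 ? -1 : 0;                  /* n == 0 means clean EOF */
}
```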

Page 575: NetWork

Broadcasting

• Many networks support the notion of sending a message from one host to all other hosts on the network.

• A special address called the "broadcast address" is often used.

• Some popular network services are based on broadcasting (YP/NIS, rup, rusers).

Page 576: NetWork

Broadcasting

• TCP works only with unicast addresses; UDP also supports broadcasting and multicasting.

• Multicasting support is optional in IPv4, but mandatory in IPv6.

• Broadcasting support is not provided in IPv6; an IPv4 application that uses broadcasting must be recoded to use multicasting instead when it is ported to IPv6.

Type        IPv4       IPv6   TCP   UDP
Unicast     yes        yes    yes   yes
Broadcast   yes        no     no    yes
Multicast   optional   yes    no    yes

Page 577: NetWork

Broadcasting

Types of casting:
• Unicast: one to one
• Anycast: one to one member of a set
• Multicast: one to all members of a set
• Broadcast: one to all hosts on the network

Broadcasting is useful over a LAN only, and only with UDP.

Page 578: NetWork

Uses of Broadcasting

• Mainly used for resource discovery purposes (a server is known to exist on the local subnet, but its IP address is not known)

– ARP (Address Resolution Protocol): broadcast to find the MAC address for a known IP address; the owner of the IP address replies.

– BOOTP (Bootstrap Protocol): lets a diskless workstation discover its own IP address, the IP address of a BOOTP server on the network, and a file to be loaded into memory to boot the machine.

– NTP (Network Time Protocol): synchronizes time and coordinates time distribution in a large network.

– Routing daemons: broadcast their routing tables on the LAN.

Page 579: NetWork

Broadcast Address Types

• IPv4 address: {netid, subnetid, hostid}

– Subnet-directed broadcast address: {netid, subnetid, -1}, where -1 means all bits are 1s. Example: netid = 128.7 and subnetid = 6 give the broadcast address 128.7.6.255. Normally, routers do not forward these broadcasts.

– All-subnets-directed broadcast address: {netid, -1, -1}. Addresses all subnets on the specified network; very rarely used.

– Network-directed broadcast address: {netid, -1}. Used if a network has no subnetting; almost non-existent.

Page 580: NetWork

Broadcast Address Types

– Limited broadcast address: {-1, -1, -1}, or 255.255.255.255. Must never be forwarded by a router.

• Subnet-directed broadcast and limited broadcast are the most common.
• Old systems do not understand subnet-directed broadcast.
• For protocols like BOOTP, 255.255.255.255 is the only option, since the client does not yet know its own address or subnet.
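The subnet-directed broadcast address keeps the netid and subnetid bits (the bits covered by the subnet mask) and sets every hostid bit to 1. A one-line sketch; subnet_broadcast is a hypothetical name, and both arguments and the result are in network byte order, as in struct in_addr (bitwise AND/OR/NOT give the same answer in either byte order).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/* {netid, subnetid, -1}: keep masked bits, set all host bits to 1. */
uint32_t subnet_broadcast(uint32_t addr, uint32_t mask)
{
    return (addr & mask) | ~mask;
}
```

With address 128.7.6.5 and mask 255.255.255.0 this yields 128.7.6.255; with a mask of 0 it degenerates to the limited broadcast address 255.255.255.255.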

Page 581: NetWork

Unicast Vs Broadcast

• In unicast, only the peers participate.
• In broadcast, every host on the subnet has to receive the packet and process it up to the transport layer, i.e., through the data link, IP, and UDP layers.
• Every non-IP host must also receive it at the data link layer.
• If broadcast datagrams arrive at a high rate, this processing can severely affect performance.

Page 582: NetWork

Unicast

Figure: unicast UDP datagram on subnet 128.7.6. The sending application calls sendto() with destination IP 128.7.6.5 and destination port 7433. The Ethernet frame carries destination address 08:00:20:03:f6:42 and frame type 0800, so only the host with that Ethernet address (128.7.6.5) accepts it and passes it up through IPv4 (protocol UDP) and UDP (port 7433) to the receiving application. Another host on the subnet (128.7.6.99, Ethernet 02:60:8c:2f:4e:00) does not receive it. Each host's broadcast address is 128.7.6.255.

Page 583: NetWork

Broadcast

Figure: broadcast UDP datagram on subnet 128.7.6. The sending application calls sendto() with destination IP 128.7.6.255 and destination port 520. The Ethernet destination is ff:ff:ff:ff:ff:ff (frame type 0800), so every host accepts the frame and passes it to IPv4 (protocol UDP) and UDP. The host with an application on port 520 delivers the datagram to it; a host with no application on port 520 discards it in UDP. The sending socket must have the SO_BROADCAST option set using setsockopt().

Page 584: NetWork

Programming Requirements

• The SO_BROADCAST socket option has to be set before sending to a broadcast address:

setsockopt(sockfd, SOL_SOCKET, SO_BROADCAST, &on, sizeof(on));

• IP fragmentation: BSD-derived kernels do not fragment broadcast datagrams; they return EMSGSIZE if the datagram exceeds the outgoing interface MTU.
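A sketch of setting up a broadcasting UDP socket, assuming a POSIX system. broadcast_socket and send_broadcast are hypothetical names; the limited broadcast address is used only as an example destination. If SO_BROADCAST is not set, sendto() to a broadcast address fails with EACCES.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a UDP socket that is allowed to send broadcasts, or return -1. */
int broadcast_socket(void)
{
    int on = 1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_BROADCAST, &on, sizeof(on)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Send len bytes to 255.255.255.255 on the given UDP port. */
ssize_t send_broadcast(int fd, const void *buf, size_t len, unsigned short port)
{
    struct sockaddr_in dst;

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    dst.sin_addr.s_addr = htonl(INADDR_BROADCAST);   /* limited broadcast */
    return sendto(fd, buf, len, 0, (struct sockaddr *) &dst, sizeof(dst));
}
```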

Page 585: NetWork

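Race Condition

void dg_cli(…) {
    setsockopt(sockfd, SOL_SOCKET, SO_BROADCAST, &on, sizeof(on));
    signal(SIGALRM, func);
    while (fgets(…) != NULL) {
        sendto(…);
        alarm(1);
        for ( ; ; ) {
            if ((n = recvfrom(…)) < 0) {
                if (errno == EINTR)
                    break;          /* alarm expired: stop waiting for replies */
                else
                    err_sys(…);
            } else {
                recvline[n] = 0;    /* null terminate */
                sleep(1);
                printf(…);
            }
        }
    }
}
void func(int signo) { return; }    /* just interrupts recvfrom */

Problem?

- A race condition: when multiple processes (here, the main loop and the signal handler) access shared data, the output depends on the execution order. If SIGALRM is delivered while the process is outside recvfrom (for example, during sleep or printf), the handler runs but no recvfrom is interrupted, so the next recvfrom can block forever.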

Page 586: NetWork

Solutions to Race Condition

1. By blocking and unblocking SIGALRM

sigemptyset(&sig1);
sigaddset(&sig1, SIGALRM);
signal(SIGALRM, func);
while (fgets(…) != NULL) {
    sendto(…);
    alarm(5);
    for ( ; ; ) {
        sigprocmask(SIG_UNBLOCK, &sig1, NULL);
        n = recvfrom(…);
        sigprocmask(SIG_BLOCK, &sig1, NULL);
        if (n < 0) {
            if (errno == EINTR)
                break;
            else
                err_sys(…);
        } else {
            recvline[n] = 0;
            printf(…);
        }
    }
}

void func(…) { return; }

Signal generation and delivery are now controlled.

The window is reduced, but the problem still persists: SIGALRM can still be delivered after sigprocmask unblocks it and before recvfrom is called.

Page 587: NetWork

2. pselect can be used: SIGALRM is first blocked, and pselect is then called with an empty signal set as its last argument.

Because pselect sets the signal mask, waits, and restores the mask as one atomic operation, the earlier problem does not persist.

Page 588: NetWork

3. Using the non-local goto siglongjmp to jump from the signal handler back to the caller.

static sigjmp_buf jmpbuf;

signal(SIGALRM, func);
while (fgets(…) != NULL) {
    sendto(…);
    alarm(5);
    for ( ; ; ) {
        if (sigsetjmp(jmpbuf, 1) != 0)
            break;              /* arrived here from siglongjmp: timer expired */
        n = recvfrom(…);
        recvline[n] = 0;
        printf(…);
    }
}

void func(…) { siglongjmp(jmpbuf, 1); }

Page 589: NetWork

4. Using IPC from the signal handler to the function

void dg_cli(…) {
    setsockopt(…);
    pipe(pipefd);
    FD_ZERO(&rset);
    signal(SIGALRM, func);
    while (fgets(…) != NULL) {
        sendto(…);
        alarm(5);
        for ( ; ; ) {
            FD_SET(sockfd, &rset);
            FD_SET(pipefd[0], &rset);
            if ((n = select(…)) < 0) {
                if (errno == EINTR) continue; else err_sys(…);
            }
            if (FD_ISSET(sockfd, &rset)) { recvfrom(…); printf(…); }
            if (FD_ISSET(pipefd[0], &rset)) { read(pipefd[0], &n, 1); break; }  /* timer expired */
        }
    }
}

void func(int signo) { write(pipefd[1], " ", 1); return; }   /* one byte into the pipe */

Page 590: NetWork

Multicasting

• IPv4 Class D addresses are multicast addresses
– Range 224.0.0.0 to 239.255.255.255
– The 32-bit Class D address is called the group address

IPv4 address classes (leading bits, then field widths; 32 bits in total):

CLASS A:  0      NET-ID (7 bits)    HOST-ID (24 bits)
CLASS B:  10     NET-ID (14 bits)   HOST-ID (16 bits)
CLASS C:  110    NET-ID (21 bits)   HOST-ID (8 bits)
CLASS D:  1110   GROUP-ID (28 bits)

Page 591: NetWork

• A mapping from IPv4 multicast addresses to Ethernet addresses is also defined:
– The high-order 24 bits are always 01:00:5e.
– The 25th bit is 0.
– The low-order 23 bits are copied from the lowest 23 bits of the multicast group address.
– The mapping is not one-to-one: many (32) multicast addresses map to a single Ethernet address.

• Broadcasting is normally limited to LANs, whereas multicasting can be done in LANs or WANs.
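A sketch of the address mapping; mcast_to_ether is a hypothetical name and the group address is taken in host byte order. Since the 28-bit group ID loses its top 5 bits, 2^5 = 32 groups share each Ethernet address; for example 224.0.1.1 and 224.128.1.1 both map to 01:00:5e:00:01:01.

```c
#include <stdint.h>

/* Map an IPv4 multicast group (host byte order) to its Ethernet address:
 * 01:00:5e, a 0 bit, then the low-order 23 bits of the group.
 * Returns 0 on success, -1 if group is not a Class D address. */
int mcast_to_ether(uint32_t group, unsigned char mac[6])
{
    if ((group & 0xf0000000u) != 0xe0000000u)  /* leading bits must be 1110 */
        return -1;
    mac[0] = 0x01;
    mac[1] = 0x00;
    mac[2] = 0x5e;
    mac[3] = (group >> 16) & 0x7f;   /* 25th bit forced to 0, next 7 bits */
    mac[4] = (group >> 8) & 0xff;
    mac[5] = group & 0xff;
    return 0;
}
```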

Page 592: NetWork

Multicast Address

• IPv4 Class D address
– 224.0.0.0 to 239.255.255.255
– 224.0.0.1: all-hosts group; 224.0.0.2: all-routers group

Page 593: NetWork

Multicast Addresses Scope

Page 594: NetWork

Multicast Session

• Especially in the case of streaming multimedia, the combination of an IP multicast address (either IPv4 or IPv6) and a transport-layer port (typically UDP) is referred to as a session.

• For example, an audio/video teleconference may comprise two sessions: one for audio and one for video. These sessions almost always use different ports and sometimes also use different groups for flexibility in choice when receiving.

Page 595: NetWork

Multicast vs Broadcast

Figure: multicast UDP datagram on subnet 128.7.6. The sending application calls sendto() with destination IP 224.0.1.1 and destination port 123, so the frame carries destination Ethernet address 01:00:5e:00:01:01 and frame type 0800. The receiving application has joined group 224.0.1.1 and uses port 123, so its host's interface is told to receive 01:00:5e:00:01:01 and passes the datagram up through IPv4 (protocol UDP) and UDP (port 123). Filtering in hardware, based on the destination Ethernet address, is imperfect; filtering in software, based on the destination IP address, is perfect.

Page 596: NetWork

Multicasting on a WAN

Figure: five multicast routers, MR1 through MR5, interconnecting several LANs.

Page 597: NetWork

Hosts joining a Multicast Group

Figure: hosts H1 through H5 on the LANs attached to MR1 through MR5 join the group, and the multicast routers exchange membership information using a multicast routing protocol (MRP).

Page 598: NetWork

Sending packets on a WAN

Figure: packets sent to the group travel from router to router (MR1 through MR5) and are placed only on the LANs where hosts H1 through H5 have joined the group.

Page 599: NetWork

Multicasting

• Specifically note that:
– All interested multicast routers receive the packets; MR5 does not receive any, since there are no interested hosts on its LAN.
– Packets are put on a specific LAN only if there are hosts on that LAN to receive them; MR3 only forwards the packets.
– Multicast router MR2 both puts packets on its LAN for hosts H2 and H3 and makes a copy of the packets and forwards them to MR3.
– This behavior is unique to multicast forwarding.

Page 600: NetWork

Source-Specific Multicast

• Multicasting on a WAN has been difficult to deploy for several reasons.
– The biggest problem is that the multicast routing protocol (MRP) needs to get the data from all the senders, which may be located anywhere in the network, to all the receivers, which may similarly be located anywhere.

– Another large problem is multicast address allocation: there are not enough IPv4 multicast addresses to statically assign them to everyone who wants one, as is done with unicast addresses.

Page 601: NetWork

Source-Specific Multicast

• Source-specific multicast (SSM) combines the group address with a system's source address, which solves these problems as follows:

– The receivers supply the sender's source address to the routers as part of joining the group.

– This removes the rendezvous problem from the network, as the network now knows exactly where the sender is.

– However, it retains the scaling property of not requiring the sender to know who all the receivers are. This simplifies multicast routing protocols immensely.

– Because the source address is unique, a group address only needs to be unique per source, which relieves the address-allocation problem.

• SSM redefines the identifier from simply being a multicast group address to being a combination of a unicast source and a multicast destination, which SSM calls a channel.

• An SSM session is the combination of source, destination, and port.

Page 602: NetWork
Page 603: NetWork
Page 604: NetWork

struct ip_mreq {
    struct in_addr imr_multiaddr;   /* IPv4 class D multicast addr */
    struct in_addr imr_interface;   /* IPv4 addr of local interface */
};

struct ipv6_mreq {
    struct in6_addr ipv6mr_multiaddr;   /* IPv6 multicast addr */
    unsigned int    ipv6mr_interface;   /* interface index, or 0 */
};

struct group_req {
    unsigned int            gr_interface;   /* interface index, or 0 */
    struct sockaddr_storage gr_group;       /* IPv4 or IPv6 multicast addr */
};

Page 605: NetWork

struct ip_mreq_source {
    struct in_addr imr_multiaddr;    /* IPv4 class D multicast addr */
    struct in_addr imr_sourceaddr;   /* IPv4 source addr */
    struct in_addr imr_interface;    /* IPv4 addr of local interface */
};

struct group_source_req {
    unsigned int            gsr_interface;   /* interface index, or 0 */
    struct sockaddr_storage gsr_group;       /* IPv4 or IPv6 multicast addr */
    struct sockaddr_storage gsr_source;      /* IPv4 or IPv6 source addr */
};
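A sketch of joining an SSM channel (source, group) on an IPv4 UDP socket with struct ip_mreq_source and the IP_ADD_SOURCE_MEMBERSHIP option. fill_source_req and ssm_join are hypothetical names; the _DEFAULT_SOURCE define is an assumption needed on glibc to expose the structure. Fields are assigned by name because the member order in system headers can differ from the order shown above (glibc, for instance, places imr_interface before imr_sourceaddr).

```c
#define _DEFAULT_SOURCE          /* glibc: expose struct ip_mreq_source */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Fill m for channel (source, group) on local interface iface.
 * All addresses are in network byte order. */
void fill_source_req(struct ip_mreq_source *m, struct in_addr group,
                     struct in_addr source, struct in_addr iface)
{
    memset(m, 0, sizeof(*m));
    m->imr_multiaddr = group;
    m->imr_sourceaddr = source;
    m->imr_interface = iface;
}

/* Join the channel on UDP socket fd. Returns 0 on success, -1 on error
 * (for example when no multicast-capable interface is available). */
int ssm_join(int fd, struct in_addr group, struct in_addr source,
             struct in_addr iface)
{
    struct ip_mreq_source m;

    fill_source_req(&m, group, source, iface);
    return setsockopt(fd, IPPROTO_IP, IP_ADD_SOURCE_MEMBERSHIP, &m, sizeof(m));
}
```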

Page 606: NetWork

Multicast Socket Options

• Use setsockopt() to modify socket options:
– IP_ADD_MEMBERSHIP: join a multicast group on a specified local interface
– IP_DROP_MEMBERSHIP: leave a multicast group
– IP_MULTICAST_IF: specify the interface for outgoing multicast datagrams sent on this socket
– IP_MULTICAST_TTL: set the IPv4 TTL for outgoing multicast datagrams (if not specified, the default is 1)
– IP_MULTICAST_LOOP: enable or disable local loopback (the default is enabled)
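A sketch of the sender-side options and of group membership with struct ip_mreq. mcast_set_send_opts and mcast_membership are hypothetical names, and the _DEFAULT_SOURCE define is an assumption for glibc. Both TTL and loopback are passed as unsigned char, which BSD-derived kernels expect and Linux also accepts. Joining can fail on hosts without a multicast-capable interface, so callers should check the return value.

```c
#define _DEFAULT_SOURCE          /* glibc: expose struct ip_mreq */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Set IP_MULTICAST_TTL and IP_MULTICAST_LOOP on UDP socket fd.
 * Returns 0 on success, -1 on error. */
int mcast_set_send_opts(int fd, unsigned char ttl, unsigned char loop)
{
    if (setsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl)) < 0)
        return -1;
    return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP, &loop, sizeof(loop));
}

/* Join (join != 0) or leave (join == 0) group on local interface iface;
 * INADDR_ANY lets the kernel choose. Addresses in network byte order. */
int mcast_membership(int fd, struct in_addr group, struct in_addr iface, int join)
{
    struct ip_mreq mreq;

    memset(&mreq, 0, sizeof(mreq));
    mreq.imr_multiaddr = group;
    mreq.imr_interface = iface;
    return setsockopt(fd, IPPROTO_IP,
                      join ? IP_ADD_MEMBERSHIP : IP_DROP_MEMBERSHIP,
                      &mreq, sizeof(mreq));
}
```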

Page 624: NetWork

Distributed Program Design

• Communication-Oriented Design (typical sockets approach)
– Design the protocol first.
– Build programs that adhere to the protocol.

• Application-Oriented Design (typical RPC approach)
– Build the application(s).
– Divide the programs up and add communication protocols.

Page 625: NetWork

RPC: Remote Procedure Call

• Call a procedure (subroutine) that is running on another machine.

• Issues:
– identifying and accessing the remote procedure
– parameters
– return value

Page 626: NetWork

Client:

    blah, blah, blah
    bar = foo(a,b);
    blah, blah, blah

Server (remote subroutine):

    int foo(int x, int y) {
        if (x > 100)
            return(y-2);
        else if (x > 10)
            return(y-x);
        else
            return(x+y);
    }

The client and the server exchange the call and its result through a protocol.
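To make the idea concrete, here is a hand-written sketch of what an RPC client stub and server dispatcher do for foo(): marshal the arguments into a byte buffer in network byte order, run the procedure, and marshal the result back. The names foo_stub and server_dispatch and the 8-byte request format are hypothetical, and the network is replaced by a direct function call so the sketch runs stand-alone; Sun RPC generates the equivalent code (using XDR) for you.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* The server-side procedure from the figure. */
int foo(int x, int y)
{
    if (x > 100)
        return y - 2;
    else if (x > 10)
        return y - x;
    else
        return x + y;
}

/* Server side: unmarshal two 32-bit big-endian arguments, call foo(),
 * and marshal the result into a 4-byte reply. */
static void server_dispatch(const unsigned char req[8], unsigned char reply[4])
{
    uint32_t a, b, r;

    memcpy(&a, req, 4);
    memcpy(&b, req + 4, 4);
    r = htonl((uint32_t) foo((int) ntohl(a), (int) ntohl(b)));
    memcpy(reply, &r, 4);
}

/* Client stub: to its caller it looks like an ordinary local call. In a
 * real system the two buffers would travel over a socket. */
int foo_stub(int a, int b)
{
    unsigned char req[8], reply[4];
    uint32_t na = htonl((uint32_t) a), nb = htonl((uint32_t) b), r;

    memcpy(req, &na, 4);
    memcpy(req + 4, &nb, 4);
    server_dispatch(req, reply);      /* stands in for send + receive */
    memcpy(&r, reply, 4);
    return (int) ntohl(r);
}
```

With this in place, the client line bar = foo(a,b); becomes bar = foo_stub(a,b); and nothing else in the caller changes.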

Page 627: NetWork

Sun RPC

• There are a number of popular RPC specifications.
• Sun RPC (ONC RPC) is widely used.
• NFS (Network File System) is RPC based.
• Rich set of support tools.

Page 628: NetWork

Sun RPC Organization

Remote Program: Procedure 1, Procedure 2 and Procedure 3, plus Shared Global Data used by all of them.

Page 629: NetWork

Procedure Arguments

• To reduce the complexity of the interface specification, Sun RPC includes support for a single argument to a remote procedure.*

• Typically the single argument is a structure that contains a number of values.

* Newer versions can handle multiple args.

Page 630: NetWork

Procedure Identification

• Each procedure is identified by:
  – Hostname (IP address)
  – Program identifier (32-bit integer)
  – Procedure identifier (32-bit integer)
  – Program version identifier
     • for testing and migration.

Page 631: NetWork

Program Identifiers

• Each remote program has a unique ID.
• Sun divided up the IDs:

  0x00000000 - 0x1fffffff   Sun
  0x20000000 - 0x3fffffff   SysAdmin
  0x40000000 - 0x5fffffff   Transient
  0x60000000 - 0xffffffff   Reserved
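The table translates directly into a range check. This sketch (hypothetical enum and function names) classifies a program number according to the division shown above.

```c
#include <stdint.h>

/* Ranges from the table above. */
enum prog_range { RANGE_SUN, RANGE_SYSADMIN, RANGE_TRANSIENT, RANGE_RESERVED };

enum prog_range classify_prognum(uint32_t prog)
{
    if (prog <= 0x1fffffffu)
        return RANGE_SUN;        /* defined by Sun (e.g. NFS is 100003) */
    if (prog <= 0x3fffffffu)
        return RANGE_SYSADMIN;   /* for locally defined programs */
    if (prog <= 0x5fffffffu)
        return RANGE_TRANSIENT;  /* assigned dynamically at run time */
    return RANGE_RESERVED;
}
```

Note that the base matters when picking a number: 555555555 decimal (0x211D1AE3) falls in the SysAdmin range, whereas 40000000 and 22222222 decimal (0x02625A00 and 0x0153158E), which appear as sample program numbers later in these notes, fall in Sun's range; a hexadecimal constant such as 0x20000001 avoids the ambiguity.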

Page 632: NetWork

Procedure Identifiers & Program Version Numbers

• Procedure Identifiers usually start at 1 and are numbered sequentially

• Version Numbers typically start at 1 and are numbered sequentially.

Page 633: NetWork

Iterative Server

• Sun RPC specifies that at most one remote procedure within a program can be invoked at any given time.

• If a 2nd procedure is called, the call blocks until the 1st procedure has completed.

Page 634: NetWork

Iterative can be good

• Having an iterative server is useful for applications that may share data among procedures.

• Example: database - to avoid insert/delete/modify collisions.

• We can provide concurrency when necessary...

Page 635: NetWork

Call Semantics

• What does it mean to call a local procedure?– the procedure is run exactly one time.

• What does it mean to call a remote procedure?– It might not mean "run exactly once"!

Page 636: NetWork

Remote Call Semantics

• To act like a local procedure (exactly one invocation per call) - a reliable transport (TCP) is necessary.

• Sun RPC does not support reliable call semantics!
  – "At Least Once" Semantics
  – "Zero or More" Semantics

Page 637: NetWork

Sun RPC Call Semantics

• At Least Once Semantics: if we get a response (a return value), the remote procedure ran at least once.

• Zero or More Semantics: if we don't hear back from the remote subroutine, it may have run zero, one, or more times.

Page 638: NetWork

Remote Procedure deposit()

deposit(DavesAccount,$100)

• Always remember that you don't know how many times the remote procedure was run!– The net can duplicate the request (UDP).
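One common way to cope with duplicated requests is a server-side duplicate-request cache: every Sun RPC call message carries a transaction ID (xid), so a server can remember the reply it sent for each xid and resend it instead of re-executing the call. The sketch below is a simplified, hypothetical version (deposit_once, a 64-entry direct-mapped table); a real cache must also deal with eviction and server restarts, so making remote procedures idempotent remains the safer design.

```c
#include <stdint.h>

#define DRC_SIZE 64                 /* hypothetical cache size */

struct drc_entry { uint32_t xid; int valid; long reply; };

static struct drc_entry drc[DRC_SIZE];
static long balance = 0;            /* DavesAccount, in dollars */

/* Non-idempotent operation: running it twice changes the result. */
static long do_deposit(long amount)
{
    balance += amount;
    return balance;
}

/* Execute a deposit at most once per xid; a duplicate (retransmitted or
 * network-duplicated) request gets the cached reply instead. */
long deposit_once(uint32_t xid, long amount)
{
    struct drc_entry *e = &drc[xid % DRC_SIZE];

    if (e->valid && e->xid == xid)
        return e->reply;            /* duplicate: do not deposit again */
    e->xid = xid;
    e->valid = 1;
    e->reply = do_deposit(amount);
    return e->reply;
}
```

Calling deposit_once(1, 100) twice leaves the balance at 100; a new request with xid 2 raises it to 200.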

Page 639: NetWork

Network Communication

• The actual network communication is nothing new - it's just TCP/IP.

• Many RPC implementations are built upon the sockets library.– the RPC library does all the work!

• We are just using a different API, the underlying stuff is the same!

Page 640: NetWork

Dynamic Port Mapping

• Servers typically do not use well known protocol ports!

• Clients know the Program ID (and host IP address).

• RPC includes support for looking up the port number of a remote program.

Page 641: NetWork

Port Lookup Service

• A port lookup service runs on each host that contains RPC servers.

• RPC servers register themselves with this service:– "I'm program 17 and I'm looking for requests on port 1736"

Page 642: NetWork

The portmapper

• Each system which will support RPC servers runs a port mapper server that provides a central registry for RPC services.

• Servers tell the port mapper what services they offer.

Page 643: NetWork

More on the portmapper

• Clients ask a remote port mapper for the port number corresponding to Remote Program ID.

• The portmapper is itself an RPC server!

• The portmapper is available on a well-known port (111).
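The registry the portmapper keeps can be pictured as a table keyed by (program, version, protocol). This is a hypothetical in-memory model, loosely following the portmapper's SET and GETPORT procedures: SET refuses a mapping that already exists, and GETPORT answers 0 for a program that is not registered.

```c
#include <stdint.h>

#define MAX_MAPPINGS 32

/* One registration: (program, version, protocol) -> port. */
struct mapping { uint32_t prog, vers, prot; uint16_t port; };

static struct mapping table[MAX_MAPPINGS];
static int nmappings = 0;

/* Server registration, e.g. "I'm program 17 and I'm looking for requests
 * on port 1736". Returns 1 on success, 0 if refused. */
int pm_set(uint32_t prog, uint32_t vers, uint32_t prot, uint16_t port)
{
    for (int i = 0; i < nmappings; i++)
        if (table[i].prog == prog && table[i].vers == vers && table[i].prot == prot)
            return 0;               /* mapping already exists */
    if (nmappings == MAX_MAPPINGS)
        return 0;                   /* table full */
    table[nmappings++] = (struct mapping){ prog, vers, prot, port };
    return 1;
}

/* Client query: the registered port, or 0 if there is none. */
uint16_t pm_getport(uint32_t prog, uint32_t vers, uint32_t prot)
{
    for (int i = 0; i < nmappings; i++)
        if (table[i].prog == prog && table[i].vers == vers && table[i].prot == prot)
            return table[i].port;
    return 0;
}
```

Here prot uses the IP protocol numbers (17 for UDP, 6 for TCP), so the same program can be registered separately for each transport.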

Page 644: NetWork

Sun RPC Programming

• The RPC library is a collection of tools for automating the creation of RPC clients and servers.

• RPC clients are processes that call remote procedures.

• RPC servers are processes that include procedure(s) that can be called by clients.

Page 645: NetWork

RPC Programming

• RPC library
  – XDR routines
  – RPC run-time library
     • call an RPC service
     • register with the portmapper
     • dispatch incoming requests to the correct procedure
  – Program generator

Page 646: NetWork

RPC Run-time Library

• High- and Low-level functions that can be used by clients and servers.

• High-level functions provide simple access to RPC services.

Page 647: NetWork

High-level Client Library

int callrpc( char *host,u_long prognum,u_long versnum,u_long procnum,xdrproc_t inproc,char *in,xdrproc_t outproc,char *out);

Page 648: NetWork

High-Level Server Library

int registerrpc(u_long prognum, u_long versnum, u_long procnum, char *(*procname)(), xdrproc_t inproc, xdrproc_t outproc);

Page 649: NetWork

High-Level Server Library (cont.)

void svc_run();

• svc_run() is a dispatcher.
• A dispatcher waits for incoming requests and invokes the appropriate function to handle each one.
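At its core, a dispatcher maps the procedure number of each incoming request to a function. The sketch below uses a hypothetical table and argument type that mirror the simp example later in these notes; the real rpcgen-generated dispatcher also decodes the arguments with XDR and sends the reply.

```c
#include <stddef.h>

/* Hypothetical argument type and procedures, mirroring simp.x. */
struct operands { int x; int y; };
typedef int (*proc_fn)(const struct operands *);

static int null_proc(const struct operands *a) { (void) a; return 0; } /* procedure 0 */
static int add_proc(const struct operands *a)  { return a->x + a->y; } /* procedure 1 */
static int sub_proc(const struct operands *a)  { return a->x - a->y; } /* procedure 2 */

static const proc_fn proc_table[] = { null_proc, add_proc, sub_proc };
#define NPROCS (sizeof(proc_table) / sizeof(proc_table[0]))

/* Look up the procedure number and call the matching function. Sets
 * *status to -1 for an unknown procedure (Sun RPC would reply
 * PROC_UNAVAIL), otherwise 0. */
int dispatch(unsigned procnum, const struct operands *args, int *status)
{
    if (procnum >= NPROCS) {
        *status = -1;
        return 0;
    }
    *status = 0;
    return proc_table[procnum](args);
}
```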

Page 650: NetWork

High-Level Library Limitation

• The High-Level RPC library calls support UDP only (no TCP).

• You must use lower-level RPC library functions to use TCP.

• The High-Level library calls do not support any kind of authentication.

Page 651: NetWork

Low-level RPC Library

• Full control over all IPC options
  – TCP & UDP
  – Timeout values
  – Asynchronous procedure calls
• Multi-tasking servers
• Broadcasting

IPC is InterProcess Communication.

Page 652: NetWork

RPCGEN

• There is a tool for automating the creation of RPC clients and servers.

• The program rpcgen does most of the work for you.
• The input to rpcgen is a protocol definition in the form of a list of remote procedures and parameter types.

Page 653: NetWork

RPCGEN

Protocol Description (input file) → rpcgen → C source code: client stubs, XDR filters, header file, server skeleton

Page 654: NetWork

rpcgen Output Files

> rpcgen -C foo.x

foo_clnt.c (client stubs)
foo_svc.c (server main)
foo_xdr.c (XDR filters)
foo.h (shared header file)

Page 655: NetWork

Client Creation

> gcc -o fooclient foomain.c foo_clnt.c foo_xdr.c -lnsl

• foomain.c is the client main() (and possibly other functions) that call rpc services via the client stub functions in foo_clnt.c

• The client stubs use the xdr functions.

Page 656: NetWork

Server Creation

gcc -o fooserver fooservices.c foo_svc.c foo_xdr.c -lrpcsvc -lnsl

• fooservices.c contains the definitions of the actual remote procedures.

Page 657: NetWork

Example Protocol Definition

struct twonums {
    int a;
    int b;
};

program UIDPROG {
    version UIDVERS {
        int RGETUID(string<20>) = 1;
        string RGETLOGIN(int) = 2;
        int RADD(twonums) = 3;
    } = 1;
} = 0x20000001;

Page 658: NetWork

RPC Programming with rpcgen

Issues:
– Protocol definition file
– Client programming
  • Creating an "RPC handle" to a server
  • Calling client stubs
– Server programming
  • Writing remote procedures

Page 659: NetWork

Protocol Definition File

• Description of the interface of the remote procedures.– Almost function prototypes

• Definition of any data structures used in the calls (argument types & return types)

• Can also include shared C code (shared by client and server).

Page 660: NetWork

XDR the language

• Remember that XDR data types are not C data types!
  – There is a mapping from XDR types to C types; that's most of what rpcgen does.

• Most of the XDR syntax is just like C
  – Arrays and strings are different.

Page 661: NetWork

XDR Arrays

• Fixed-length arrays look just like C code: int foo[100]

• Variable-length arrays look like this:

  int foo<> or int foo<MAXSIZE>

The implicit maximum size is 2^32-1.

Page 662: NetWork

What gets sent on the network

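• int x[n] (fixed length: n elements, no length is sent):

  x0 x1 x2 . . . xn-1

• int y<m> (variable length: at most m elements):

  k y0 y1 y2 . . . yk-1

  k is the actual array size (k ≤ m) and is sent first.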
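The layout above can be reproduced by hand. This sketch (hypothetical function names; it is not the xdr_array() library routine) writes a variable-length int array the way XDR lays it out: a 4-byte big-endian count followed by each element as a 4-byte big-endian int. A fixed-length array is encoded the same way minus the count.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Write one XDR int (4 bytes, big-endian); returns bytes written. */
static size_t put_int(unsigned char *buf, int32_t v)
{
    uint32_t n = htonl((uint32_t) v);
    memcpy(buf, &n, 4);
    return 4;
}

/* Encode int y<m>: count k, then k elements. Returns bytes written,
 * or 0 if k exceeds the declared bound m. */
size_t xdr_encode_var_ints(unsigned char *buf, const int32_t *y, uint32_t k, uint32_t m)
{
    size_t off = 0;

    if (k > m)
        return 0;
    off += put_int(buf + off, (int32_t) k);
    for (uint32_t i = 0; i < k; i++)
        off += put_int(buf + off, y[i]);
    return off;
}
```

Encoding {1, 2, -1} as int y<10> therefore takes 16 bytes: 00 00 00 03, 00 00 00 01, 00 00 00 02, ff ff ff ff.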

Page 663: NetWork

XDR String Type

• Strings look like variable-length arrays: string s<100>

• What is sent: the length followed by the sequence of ASCII chars:

  n s0 s1 s2 s3 . . . sn-1

n is the actual string length (sent as an int); the characters are padded with zero bytes to a multiple of 4.
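A matching sketch for strings, with the hypothetical name xdr_encode_string; it shows the length prefix and the zero padding to a 4-byte boundary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Encode an XDR string: 4-byte big-endian length n, the n bytes, then
 * zero bytes up to a multiple of 4. Returns bytes written, or 0 if the
 * string exceeds the declared maximum (e.g. 100 for string s<100>). */
size_t xdr_encode_string(unsigned char *buf, const char *s, uint32_t maxlen)
{
    uint32_t n = (uint32_t) strlen(s);
    uint32_t pad = (4 - n % 4) % 4;
    uint32_t netn = htonl(n);

    if (n > maxlen)
        return 0;
    memcpy(buf, &netn, 4);
    memcpy(buf + 4, s, n);
    memset(buf + 4 + n, 0, pad);
    return 4 + n + pad;
}
```

For example, "hello" encodes as 00 00 00 05 'h' 'e' 'l' 'l' 'o' 00 00 00, 12 bytes in total.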

Page 664: NetWork

Linked Lists!

struct foo {
    int x;
    foo *next;
};

rpcgen recognizes this as a linked list. The generated XDR filter uses xdr_pointer() to encode/decode the stuff pointed to by a pointer.

Check the online example "linkedlist".

Page 665: NetWork

Declaring The Program

program SIMP_PROG {
    version SIMP_VERSION {
        type1 PROC1(operands1) = 1;
        type2 PROC2(operands2) = 2;
    } = 1;
} = 40000000;

program and version are keywords; SIMP_PROG, SIMP_VERSION, PROC1 and PROC2 become generated symbolic constants, and the procedure names are also used to generate the stub and procedure names.

Page 666: NetWork

Procedure Numbers

• Procedure #0 is created for you automatically.
  – Start at procedure #1!

• Procedure #0 is a dummy procedure that can help debug things (sort of an RPC ping server).

Page 667: NetWork

Procedure Names

rpcgen converts the procedure name to lower case and appends an underscore and the version number:

rtype PROCNAME(arg)

Client stub: rtype *procname_1(arg *, CLIENT *);

Server procedure: rtype *procname_1_svc(arg *, struct svc_req *);
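The naming rule is mechanical enough to express in a few lines. This hypothetical helper rebuilds the names rpcgen would generate for a procedure; it covers the new-style server suffix (_svc) only, not the old style that omits it.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Lower-case PROCNAME, then "_<version>", then "_svc" for the server side. */
void rpcgen_name(char *out, size_t outlen, const char *procname,
                 unsigned version, int server_side)
{
    char lower[64];
    size_t i;

    for (i = 0; procname[i] != '\0' && i < sizeof(lower) - 1; i++)
        lower[i] = (char) tolower((unsigned char) procname[i]);
    lower[i] = '\0';
    snprintf(out, outlen, "%s_%u%s", lower, version, server_side ? "_svc" : "");
}
```

For the UIDPROG example, RGETUID version 1 yields the client stub rgetuid_1 and the server procedure rgetuid_1_svc.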

Page 668: NetWork

Program Numbers

• Use something like: 555555555 or 22222222

• You can find the numbers currently used with "rpcinfo -p hostname"

Page 669: NetWork

Client Programming

• Create RPC handle. – Establishes the address of the server.

• RPC handle is passed to client stubs (generated by rpcgen).

• Type is CLIENT *

Page 670: NetWork

clnt_create

CLIENT *clnt_create(char *host, u_long prog, u_long vers, char *proto);

• host: hostname of the server
• prog: program number
• vers: version number
• proto: can be "tcp" or "udp"

Page 671: NetWork

Calling Client Stubs

• Remember:
  – The return value is a pointer to what you expect.
  – The argument is passed as a pointer.
  – If you are passing a string, you must pass a char**.

• When in doubt – look at the ".h" file generated by rpcgen

Page 672: NetWork

Server Procedures

• rpcgen writes most of the server.
• You need to provide the actual remote procedures.
• Look in the ".h" file for prototypes.
• Run "rpcgen -C -Ss" to generate (empty) remote procedures!

Page 673: NetWork

Server Function Names

• Old Style (includes AIX): Remote procedure FOO, version 1 is named foo_1()

• New Style (includes Sun,BSD,Linux): Remote procedure FOO, version 1 is named foo_1_svc()

Page 674: NetWork

Running rpcgen

• Command-line options vary from one OS to another.
• Sun/BSD/Linux: you need to use "-C" to get ANSI C code!
• rpcgen can help write the files you need to write:
  – To generate sample server code: "-Ss"
  – To generate sample client code: "-Sc"

Page 675: NetWork

Other porting issues

• Shared header file generated by rpcgen may have: #include <rpc/rpc.h>

• Or Not!

Page 676: NetWork

RPC without rpcgen

• Can do asynchronous RPC
  – Callbacks
  – Single process is both client and server.

• Write your own dispatcher (and provide concurrency)
• Can establish control over many network parameters: protocols, timeouts, resends, etc.

Page 677: NetWork

rpcinfo

• rpcinfo -p host prints a list of all registered programs on host.

• rpcinfo -[ut] host program# makes a call to procedure #0 of the specified RPC program (RPC ping).
  – u: UDP
  – t: TCP

Page 678: NetWork

Sample Code

• simple: integer add and subtract
• ulookup: look up username and uid
• varray: variable-length array example
• linkedlist: arg is a linked list

Page 679: NetWork

Example simp

• Standalone program simp.c
  – Takes 2 integers from the command line and prints out the sum and difference.
  – Functions:

    int add(int x, int y);
    int subtract(int x, int y);

Page 680: NetWork

Splitting simp.c

• Move the functions add() and subtract() to the server.

• Change simp.c to be an RPC client
  – Calls the stubs add_1() and sub_1()

• Create a server that serves up 2 remote procedures: add_1_svc() and sub_1_svc()

Page 681: NetWork

Protocol Definition: simp.x

struct operands {
    int x;
    int y;
};

program SIMP_PROG {
    version SIMP_VERSION {
        int ADD(operands) = 1;
        int SUB(operands) = 2;
    } = VERSION_NUMBER;
} = 555555555;

Page 682: NetWork

rpcgen -C simp.x

simp.x → rpcgen →
  simp_clnt.c (client stubs)
  simp_xdr.c (XDR filters)
  simp.h (header file)
  simp_svc.c (server skeleton)

Page 683: NetWork

xdr_operands XDR filter

bool_t xdr_operands(XDR *xdrs, operands *objp)
{
    if (!xdr_int(xdrs, &objp->x))
        return (FALSE);
    if (!xdr_int(xdrs, &objp->y))
        return (FALSE);
    return (TRUE);
}

Page 684: NetWork

simpclient.c

• This was the main program; it is now the client.
• Reads 2 ints from the command line.
• Creates an RPC handle.
• Calls the remote add and subtract procedures.
• Prints the results.

Page 685: NetWork

simpservice.c

• The server main is in simp_svc.c.
• simpservice.c is what we write: it holds the add and subtract procedures that simp_svc will call when it gets RPC requests.

• The only thing you need to do is to match the names/parameters that simp_svc expects (check simp.h!).

Page 686: NetWork

Raw Sockets

Page 687: NetWork

Raw Sockets

Page 688: NetWork

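(Figure: TCP/IP Stack. Incoming data is demultiplexed by MAC address and frame type at the datalink layer, by IP address and protocol number at the IP layer (1 ICMP, 2 IGMP, 6 TCP, 8 EGP, 17 UDP, 41 IPv6, 89 OSPF), and by TCP/UDP port number at the transport layer (e.g., 21, 23, 25, 53, 67 for BOOTP/DHCP, 69, 161).)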

Page 689: NetWork

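(Figure: received IP datagrams are demultiplexed by protocol number. 6 TCP and 17 UDP go to the kernel's TCP and UDP stacks and then, by port, to user TCP and UDP processes; 1 ICMP (ping, etc.), 2 IGMP and 89 OSPF can reach user processes through RAW sockets; ICMP echo and timestamp requests are answered by the kernel.)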

Page 690: NetWork

What can raw sockets do?

• Bypass the TCP/UDP layers
• Read and write ICMP and IGMP packets
  – ping, traceroute, multicast routing daemon
• Read and write IP datagrams with an IP protocol field not processed by the kernel
  – OSPF
• Send and receive your own IP packets with your own IP header using the IP_HDRINCL socket option
  – can build and send TCP and UDP packets
  – testing, hacking
  – only the superuser can create a raw socket, though
• You need to do all protocol processing at user level

Page 691: NetWork

RAW SOCKETS 694

Creating Raw Sockets

• Only the superuser can create one:
  socket(AF_INET, SOCK_RAW, protocol)
  – where protocol is one of the constants IPPROTO_xxx, such as IPPROTO_ICMP.

• bind can be called on the raw socket, but this is rare. This function sets only the local address: There is no concept of a port number with a raw socket.

• connect can be called on the raw socket, but this is rare. This function sets only the foreign address: Again, there is no concept of a port number with a raw socket.

Page 692: NetWork

Creating Raw Sockets: IP Header option

• The IP_HDRINCL socket option can be set as follows:

    const int on = 1;
    if (setsockopt(sockfd, IPPROTO_IP, IP_HDRINCL, &on, sizeof(on)) < 0)
        perror("setsockopt IP_HDRINCL");   /* handle the error */

Page 693: NetWork

Raw Socket Output

• Normal output is performed by calling sendto or sendmsg and specifying the destination IP address.
  – write, writev, or send can also be called if the socket has been connected.

• If the IP_HDRINCL option is not set, the kernel prepends the IP header.
  – The kernel sets the protocol field of the IPv4 header that it builds to the third argument from the call to socket.

• If the IP_HDRINCL option is set, the starting address of the data for the kernel to send specifies the first byte of the IP header.
  – The process builds the entire IP header, except: (i) the IPv4 identification field can be set to 0, which tells the kernel to set this value; (ii) the kernel always calculates and stores the IPv4 header checksum; and (iii) IP options may or may not be included.

• The kernel fragments raw packets that exceed the outgoing interface MTU.

Page 694: NetWork

Raw Socket Input

• Which received IP datagrams does the kernel pass to raw sockets?

• Received UDP packets and received TCP packets are never passed to a raw socket.
  – To read them, a process must read at the datalink layer.

• Most ICMP packets are passed to a raw socket after the kernel has finished processing the ICMP message.
  – Except echo request, timestamp request, and address mask request, which the kernel processes entirely by itself.

• All IGMP packets are passed to a raw socket after the kernel has finished processing the IGMP message.

• All IP datagrams with a protocol field that the kernel does not understand are passed to a raw socket.

• If the datagram arrives in fragments, nothing is passed to a raw socket until all fragments have arrived and have been reassembled.

Page 695: NetWork

Raw Socket Input

• When the kernel has an IP datagram, all raw sockets for all processes are examined, looking for all matching sockets.
• A copy of the IP datagram is delivered to each matching socket.
• The following tests are performed for each raw socket, and the datagram is delivered to the socket only if all three tests are true:
  – If a nonzero protocol is specified, the protocol field of the datagram must match it.
  – If a local IP address is bound to the raw socket by bind, then the destination IP address of the received datagram must match it.
  – If a foreign IP address was specified for the raw socket by connect, then the source IP address of the received datagram must match it.
• Notice that if a raw socket is created with a protocol of 0, and neither bind nor connect is called, then that socket receives a copy of every raw datagram the kernel passes to raw sockets.
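The three tests can be written as a single predicate. This is a model of the rule rather than kernel code; the struct and function names are hypothetical, and addresses are 32-bit values with 0 meaning "not set".

```c
#include <stdint.h>

/* State of one raw socket that matters for delivery. */
struct raw_sock {
    int protocol;             /* third argument to socket(); 0 = any */
    uint32_t bound_addr;      /* set by bind(); 0 = not bound */
    uint32_t connected_addr;  /* set by connect(); 0 = not connected */
};

/* Returns 1 if a datagram with this protocol, source and destination
 * should be copied to socket s. */
int raw_sock_matches(const struct raw_sock *s, int proto, uint32_t src, uint32_t dst)
{
    if (s->protocol != 0 && s->protocol != proto)
        return 0;             /* test 1: protocol field */
    if (s->bound_addr != 0 && s->bound_addr != dst)
        return 0;             /* test 2: bound address vs destination */
    if (s->connected_addr != 0 && s->connected_addr != src)
        return 0;             /* test 3: connected address vs source */
    return 1;
}
```

A socket with all three fields zero matches everything, which is the "copy of every raw datagram" case described above.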

Page 696: NetWork


Raw Socket Input

• Whenever a received datagram is passed to a raw IPv4 socket, the entire datagram, including the IP header, is passed to the process

• For a raw IPv6 socket, only the payload (i.e., no IPv6 header or any extension headers) is passed to the socket

Page 697: NetWork


Page 698: NetWork

Example: Ping Program

• Send an ICMP echo request to some IP address and receive an ICMP echo reply.

• Example session:

  # ping 172.10.1.3
  Ping 172.10.1.3: 56 bytes of data
  Reply from 172.10.1.3: bytes=56 time<10ms ttl=255
  … (4 replies)

• If the host is not active: Request Timeout

Page 699: NetWork

ICMP Message

• We set the identifier to the PID of the ping process and increment the sequence number by one for each packet we send.

• We store the 8-byte timestamp of when the packet is sent as the optional data. The rules of ICMP require that the identifier, sequence number, and any optional data be returned in the echo reply.

• Storing the timestamp in the packet lets us calculate the RTT when the reply is received.
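A sketch of building the echo request described above, with hypothetical function names. It fills the 8-byte ICMP header, stores the send time as a struct timeval (16 bytes on 64-bit systems rather than the 8 bytes mentioned above), and computes the Internet checksum over the whole message. Encoding the identifier with htons() is a choice made here; since the reply echoes the bytes back, any consistent encoding works.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/time.h>
#include <arpa/inet.h>

/* Internet checksum: one's-complement sum of 16-bit words, inverted.
 * The result is in network byte order, ready to store in the packet. */
uint16_t in_cksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;

    while (len > 1) {                        /* add big-endian 16-bit words */
        sum += (uint32_t) p[0] << 8 | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)                            /* odd length: pad with zero */
        sum += (uint32_t) p[0] << 8;
    while (sum >> 16)                        /* fold carries into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return htons((uint16_t) ~sum);
}

/* Build an ICMP echo request in buf; returns its length in bytes. */
size_t build_echo_request(unsigned char *buf, uint16_t id, uint16_t seq,
                          const struct timeval *sent)
{
    size_t len = 8 + sizeof(*sent);          /* header + optional data */
    uint16_t nid = htons(id), nseq = htons(seq), ck;

    memset(buf, 0, len);
    buf[0] = 8;                              /* type 8: echo request */
    buf[1] = 0;                              /* code 0 */
    memcpy(buf + 4, &nid, 2);                /* identifier (e.g. PID) */
    memcpy(buf + 6, &nseq, 2);               /* sequence number */
    memcpy(buf + 8, sent, sizeof(*sent));    /* optional data: send time */
    ck = in_cksum(buf, len);                 /* checksum field is still 0 */
    memcpy(buf + 2, &ck, 2);
    return len;
}
```

A receiver can check a packet by running in_cksum() over it with the checksum field in place: the result is 0 for an intact packet.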

Page 700: NetWork

ICMP Message

Page 701: NetWork

ICMP Echo Message

Page 702: NetWork

ICMP Echo Message

Page 703: NetWork

ICMP Echo Message

Page 704: NetWork

Ping Program

(Figure: main enters an infinite receive loop that calls recvfrom and passes each packet to Proc_v4; the SIGALRM handler Sig_Alrm calls Send_v4 to send an echo request once a second.)

Page 705: NetWork

Traceroute Example

• Determines the path IP datagrams follow.
• Uses the TTL field (IPv4) or hop limit (IPv6) and two ICMP messages.
• One UDP datagram is sent by the host to the destination with TTL=1.
• The first-hop router sends back an ICMP "time exceeded in transit" error.
• The TTL is increased to 2, and another datagram is sent.
• The process repeats until a datagram reaches the destination; the datagrams are sent to a port number not in use on the destination, so that the destination sends back an ICMP "port unreachable" error.

Page 706: NetWork


Page 707: NetWork

DATALINK ACCESS 710

Datalink Access

• Uses
  – Watch packets on the interface
  – Programs can be run as applications rather than as part of the kernel

• Ways to access the datalink
  – BSD Packet Filter (BPF)
  – Datalink Provider Interface (DLPI)
  – Linux SOCK_PACKET interface

Public library: libpcap

Page 708: NetWork

BSD Packet Filter

(Figure: inside the kernel, BPF sits at the datalink layer alongside IPv4 and IPv6; each application process has its own filter and buffer in BPF.)

• Writing is not frequent. Why?

• Example filters: tcp, udp, tcp[15:1] (1 byte starting at offset 15)

Page 709: NetWork

BPF reduces its overhead by:

1. Filtering within the kernel.
2. Copying only part of each packet to the application.
3. Using buffering for both reads and writes to reduce the number of system calls.

Accessing a BPF: open a BPF device, then use ioctl to set its properties, such as loading the filter, setting the read timeout, setting the buffer size, attaching a datalink to the BPF, enabling promiscuous mode, etc.

Page 710: NetWork

Linux: SOCK_PACKET

• Superuser privileges are required.

• fd = socket(AF_INET, SOCK_PACKET, htons(ETH_P_ALL));
  – other protocol values: ETH_P_IP, ETH_P_ARP, ETH_P_IPV6

Disadvantages:
1. No kernel buffering, hence more system calls.
2. No device filtering, hence ETH_P_IP will give packets from Ethernet, PPP, SLIP links, and loopback devices.

Page 711: NetWork

ICMP Format

(Figure: ICMP message format; the type field is followed by a code field that gives the subtype.)

Page 712: NetWork

Ping Program

• Create a raw socket to send/receive ICMP echo request and echo reply packets.

• Install a SIGALRM handler to process output:
  – Send echo request packets every t seconds.
  – Build ICMP packets (type, code, checksum, id, seq, sending timestamp as optional data).

• Enter an infinite loop processing input:
  – Use recvmsg() to read from the network.
  – Parse the message and retrieve the ICMP packet.
  – Print ICMP packet information, e.g., peer IP address and round-trip time.
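The input side can be sketched the same way: skip the IPv4 header (its length is ip_hl × 4 bytes, since a raw IPv4 socket delivers the whole datagram), check that the ICMP type is 0 (echo reply) and the identifier is ours, then subtract the stored send time from the current time. The function name proc_echo_reply and the identifier encoding (network byte order) are assumptions of this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/time.h>
#include <arpa/inet.h>

/* Returns 1 and fills *seq and *rtt_ms if pkt (IP header included) is an
 * echo reply for our identifier; returns 0 otherwise. */
int proc_echo_reply(const unsigned char *pkt, size_t len, uint16_t our_id,
                    const struct timeval *now, double *rtt_ms, uint16_t *seq)
{
    size_t hlen;
    const unsigned char *icmp;
    uint16_t id, nseq;
    struct timeval sent;

    if (len < 20)
        return 0;
    hlen = (size_t) (pkt[0] & 0x0f) * 4;          /* ip_hl is in 32-bit words */
    if (len < hlen + 8 + sizeof(sent))
        return 0;                                 /* too short for our data */
    icmp = pkt + hlen;
    if (icmp[0] != 0)
        return 0;                                 /* not an echo reply */
    memcpy(&id, icmp + 4, 2);
    if (ntohs(id) != our_id)
        return 0;                                 /* reply to another ping */
    memcpy(&nseq, icmp + 6, 2);
    *seq = ntohs(nseq);
    memcpy(&sent, icmp + 8, sizeof(sent));        /* timestamp we stored */
    *rtt_ms = (now->tv_sec - sent.tv_sec) * 1000.0
            + (now->tv_usec - sent.tv_usec) / 1000.0;
    return 1;
}
```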

Page 713: NetWork

Traceroute program

• Create a UDP socket and bind a source port
  – To send probe packets with increasing TTL.
  – For each TTL value, use a timer to send a probe every three seconds, and send 3 probes in total.

• Create a raw socket to receive ICMP packets
  – If a timeout occurs, print " *".
  – If ICMP "port unreachable" arrives, terminate.
  – If ICMP "TTL expired" arrives, print the hostname of the router and the round-trip time to the router.
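Handling each message the raw ICMP socket receives comes down to classifying its type and code. A hypothetical helper using the RFC 792 values (type 11 code 0: time exceeded in transit; type 3 code 3: port unreachable):

```c
/* What a traceroute probe's ICMP reply means for the main loop. */
enum probe_result { PROBE_ROUTER, PROBE_DESTINATION, PROBE_OTHER };

enum probe_result classify_probe_reply(int icmp_type, int icmp_code)
{
    if (icmp_type == 11 && icmp_code == 0)
        return PROBE_ROUTER;        /* TTL expired: print router and RTT */
    if (icmp_type == 3 && icmp_code == 3)
        return PROBE_DESTINATION;   /* port unreachable: reached it, stop */
    return PROBE_OTHER;             /* e.g. another unreachable code */
}
```

A full implementation would also check that the UDP header quoted inside the ICMP error carries the probe's source and destination ports, so that replies to other processes are ignored.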

Page 714: NetWork

ISZC462

Lecture#8

Page 715: NetWork

Problem 1

• This problem is about implementing a local chat server and client on a single system. The server and client will facilitate communication among multiple users of the system. You should submit client_idno.c and server_idno.c for the client and server respectively.

• The chat server supports the following functionality:
  – Suppose users B, C and D are currently in the chat server, and then user A joins. The server tells all the current chatters B, C and D: 'A just joined'. Command: connect <username>
  – A can send a message to everyone ("Hello! Everyone!") or whisper a message to C alone ('I want to tell a secret to you'). So the server should facilitate both one-to-all and one-to-one communication.
     • Command: talk * to talk to all chatters
     • Command: talk <username> to talk to one user
  – A can also get the list of all chatters. Command: list
  – A can disconnect from the chat. Command: disconnect

Page 716: NetWork
Page 717: NetWork

Problem 2

• The server program should
  – start like ./server <path>
  – since it runs within the system, use either FIFOs or message queues for interprocess communication
  – use the select() call to deal with multiple users concurrently

• The client program should
  – start like ./client <serverpath>
  – take care of interpreting commands entered by the user
  – process commands until Ctrl-D is pressed. When a user types a message and presses <ENTER>, that is the end of one message, but the program still waits for the next message until the user presses Ctrl-D (EOF for fgets()).

• The client is capable of handling sending and receiving simultaneously. Any messages received while the user is typing a message to be sent are simply flashed on the console.

Page 718: NetWork
Page 719: NetWork

Problem 3

• A simple TCP-based chat server could allow two users to use any TCP client (telnet, for example) to communicate with each other. Consider a single-process, single-threaded server that supports exactly 2 clients at once; the server simply forwards whatever is sent from one client to the other (in both directions). As soon as something is sent from one client, it is immediately forwarded to the other client. As soon as either client terminates the connection, the server exits. Provide the server code with comments.

Page 720: NetWork
Page 721: NetWork
Page 722: NetWork

Problem 4

1. When the server starts, it reads a file containing a list of domain names to which access is forbidden. When an HTTP request such as http://discovery.bits-pilani.ac.in/index.html reaches the server, it checks whether the domain name "discovery.bits-pilani.ac.in" exists in the list. If it does, the server sends back HTTP error 403 Forbidden to the client. If not, it forwards the request to the actual server and, when it gets the reply, sends the reply to the client.

2. Your server takes a port number on the command line. It can be iterative server.

3. Your server will be tested with a browser.

Page 723: NetWork
Page 724: NetWork

Problem 5

• Suppose you are given the task of testing the validity of the links in a given web page. You are expected to test each URL present in the web page and report the result. A URL has this form:

• http://<domain name>/<directory1>/<directory2>/ … /<filename>

• Testing a URL for validity means testing the existence of the domain name and the existence of the file at the given path on the remote server.

• To simplify the problem, you can take a list of URLs in a file, one URL per line. Your program takes this file name as a command-line argument. It should read each URL and validate it. The result is one of {VALID, INCORRECT DOMAIN, FILE DOESNT EXIST}. Your program should display each URL and its result on one line of the console.

Page 725: NetWork
Page 726: NetWork

Problem 6

Consider the following network. There are n nodes connected in a ring topology. Communication with any node in the network happens in the clockwise direction, i.e., through the next node. Each node shares a set of files.

The nodes communicate using Sun RPC. When a node joins the network, it invokes connectMe() on the next node and the previous node. The next and previous node addresses are supplied as command-line arguments. When a node searches for a file, it invokes

void* search(Node n, char* filename) {
    if (search is successful)
        return the result set;
    else
        return search(nextNode(n), filename);
}

Write the protocol file. Take help of rpcgen. Develop rpcclient and rpcserver. The demonstration should have all communications printed on the console, indicating the IP, port, file, etc.

Page 727: NetWork
Page 728: NetWork
Page 729: NetWork

ISZC462

Tutorial 2

Page 730: NetWork

EC1 solutions

Page 731: NetWork

Q1

• Write TCP client and server programs for the following. The connection between client and server is persistent, i.e., multiple requests are sent on the same connection. The client sends N integers to the server. The server sums all of them and sends the result back to the client. The server handles clients concurrently. Also, the server prevents zombie processes from hanging around. [10]

Page 732: NetWork

Q1 Ans

Protocol:
Client → server: 4 bytes: N, 4 bytes: 1st int, 4 bytes: 2nd int, … until the last integer
Server → client: 4 bytes: result

/* client.c */
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <unistd.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

void error(char *msg)
{
    perror(msg);
    exit(0);
}

int main(int argc, char *argv[])
{
    int sockfd, portno, n;
    struct sockaddr_in serv_addr;
    struct hostent *server;

    if (argc < 3) {
        fprintf(stderr, "usage %s hostname port\n", argv[0]);
        exit(0);
    }
    portno = atoi(argv[2]);
    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
        error("ERROR opening socket");
    server = gethostbyname(argv[1]);
    if (server == NULL) {
        fprintf(stderr, "ERROR, no such host\n");
        exit(0);
    }

Page 733: NetWork

Q1 Ans

    bzero((char *) &serv_addr, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    bcopy((char *) server->h_addr, (char *) &serv_addr.sin_addr.s_addr, server->h_length);
    serv_addr.sin_port = htons(portno);
    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0)
        error("ERROR connecting");

    /* Protocol implementation */
    int N, i, result;
    int buf[256];                     /* buf[0] = N, buf[1..N] = the integers */
    printf("Enter number of integers: ");
    scanf("%d", &N);
    while (N > 0 && N < 256) {
        buf[0] = N;
        for (i = 0; i < N; i++) {
            printf("Enter number %d: ", i + 1);
            scanf("%d", &buf[i + 1]);
        }
        write(sockfd, buf, (N + 1) * 4);
        n = read(sockfd, &result, 4);
        if (n <= 0) {
            printf("Server terminated prematurely\n");
            break;
        }
        printf("The result is: %d\n", result);
        printf("Enter number of integers (-1 to exit): ");
        scanf("%d", &N);
    }
    close(sockfd);
    return 0;
}

Page 734: NetWork

Q1 Ans

/* server.c */
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <unistd.h>
#include <signal.h>
#include <sys/wait.h>
#include <sys/socket.h>
#include <netinet/in.h>

void error(char *msg)
{
    perror(msg);
    exit(1);
}

/* Reap every terminated child so that no zombies remain. */
void sigchldhandler(int signo)
{
    int pid;
    while ((pid = waitpid(-1, NULL, WNOHANG)) > 0)
        ;
}

int main(int argc, char *argv[])
{
    int ret, i, N, val, sum;
    int sockfd, newsockfd, portno;
    socklen_t clilen;
    struct sockaddr_in serv_addr, cli_addr;
    int n;

    signal(SIGCHLD, sigchldhandler);
    if (argc < 2) {
        fprintf(stderr, "ERROR, no port provided\n");
        exit(1);
    }
    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
        error("ERROR opening socket");

Page 735: NetWork

Q1 Ans

    bzero((char *) &serv_addr, sizeof(serv_addr));
    portno = atoi(argv[1]);
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = INADDR_ANY;
    serv_addr.sin_port = htons(portno);
    if (bind(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0)
        error("ERROR on binding");
    listen(sockfd, 5);
    for (;;) {
        clilen = sizeof(cli_addr);
        newsockfd = accept(sockfd, (struct sockaddr *) &cli_addr, &clilen);
        if (newsockfd < 0)
            error("ERROR on accept");
        printf("connection is accepted\n");   /* '\n' flushes before fork() */

Page 736: NetWork

Q1 Ans

        ret = fork();
        if (ret == 0) {                   /* child: serve this client */
            close(sockfd);                /* the child does not accept connections */
            n = read(newsockfd, &N, 4);
            while (n > 0) {               /* one request per iteration */
                printf("N=%d\n", N);
                sum = 0;
                for (i = 0; i < N; i++) {
                    n = read(newsockfd, &val, 4);
                    if (n <= 0)
                        error("ERROR reading from socket");
                    printf("val[%d]=%d\n", i, val);
                    sum = sum + val;
                }
                printf("sum=%d\n", sum);
                n = write(newsockfd, &sum, 4);
                if (n < 0)
                    error("ERROR writing to socket");
                n = read(newsockfd, &N, 4);   /* next request, or 0 at EOF */
            }
            close(newsockfd);
            return 0;
        } else if (ret > 0) {             /* parent: drop its copy, accept the next client */
            close(newsockfd);
        } else {
            error("ERROR on fork");
        }
    }
}

Page 737: NetWork

Q2

1. Write a complete program to implement the shell command ls -l | grep ^d | wc -l, which displays the number of subdirectories in the current directory. Use system calls such as exec, and pipes for interprocess communication. [8]

Page 738: NetWork

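Q2 Ans

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int p1[2], p2[2];                 /* p2: ls -> grep, p1: grep -> wc */

    pipe(p1);
    pipe(p2);
    if (fork() == 0) {                /* child 1: ls -l, stdout -> p2 */
        dup2(p2[1], 1);
        close(p1[0]); close(p1[1]); close(p2[0]); close(p2[1]);
        execlp("ls", "ls", "-l", (char *) NULL);
        perror("execlp ls");
        _exit(1);
    }
    if (fork() == 0) {                /* child 2: grep ^d, stdin <- p2, stdout -> p1 */
        dup2(p2[0], 0);
        dup2(p1[1], 1);
        close(p1[0]); close(p1[1]); close(p2[0]); close(p2[1]);
        execlp("grep", "grep", "^d", (char *) NULL);
        perror("execlp grep");
        _exit(1);
    }
    /* parent: wc -l, stdin <- p1. Every unused pipe end is closed so each
     * reader sees EOF; no wait() before exec, which could deadlock if ls
     * fills the pipe before grep starts reading. */
    dup2(p1[0], 0);
    close(p1[0]); close(p1[1]); close(p2[0]); close(p2[1]);
    execlp("wc", "wc", "-l", (char *) NULL);
    perror("execlp wc");
    return 1;
}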

Page 739: NetWork

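Q3 What is a connected UDP socket? How is it created?

What are the advantages of using it?

A connected UDP socket is one for which the UDP layer remembers the association between the local and remote endpoints; by default, UDP does not do this. It is created by calling connect() on the socket. The advantages are that asynchronous errors from the network (for example, an ICMP port unreachable) are reported to the process, and that only one peer can communicate with the socket: the kernel discards datagrams arriving from any other address, which gives some protection against spoofed datagrams (though not against an attacker who forges the peer's own address). In addition, send() or write() can be used without specifying the destination each time.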
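A small sketch of a connected UDP socket on the loopback interface, with hypothetical helper names: connect() on a UDP socket sends nothing on the wire; it only records the peer, after which send() needs no destination address.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* UDP socket bound to 127.0.0.1 on a kernel-chosen port; the bound
 * address is returned through *addr. Returns the descriptor or -1. */
int udp_loopback_socket(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0;                          /* ephemeral port */
    if (bind(fd, (struct sockaddr *) addr, sizeof(*addr)) < 0 ||
        getsockname(fd, (struct sockaddr *) addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Record the peer: later send()/recv() use it implicitly, and datagrams
 * from any other address are no longer delivered to this socket. */
int udp_connect(int fd, const struct sockaddr_in *peer)
{
    return connect(fd, (const struct sockaddr *) peer, sizeof(*peer));
}
```

If the peer socket is then closed, a later send() or recv() on the connected socket typically fails with ECONNREFUSED on Linux; that is the asynchronous error mentioned in the answer, which an unconnected socket would not see.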

Page 740: NetWork

Q3 Normally, whenever a socket is closed using the close() system call, the TCP termination sequence is initiated. In a concurrent TCP server, when the server process closes the connection socket, the TCP termination is not initiated. Why?

close() initiates the termination sequence only when the reference count of the socket descriptor reaches zero. When a new connection comes to the server, a child process is created with fork(), so the connected descriptor's reference count is 2. When the parent closes the socket, the count becomes 1, so the termination sequence does not start; it starts only when the child also closes the socket (or exits).

Page 741: NetWork

Q3 Why is a signal generated for the writer of a FIFO after the reader disappears, but not for the reader of a FIFO after its writer disappears?

Consider the pipeline cat Bigfile | grep pattern | compute. If an error occurs in compute and it terminates, how does the grep process come to know about it? The filter program grep does not know, and has no way of knowing, that its output has been redirected to a pipe, and return values of writes to standard output are rarely checked. So the only reliable way to stop it writing to a broken pipe after compute crashes is a signal (SIGPIPE), whose default action terminates the writer. The reader needs no such signal: once every writer has gone, read() returns 0 (end of file), which any reader already handles.
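A small sketch of the asymmetry, using an anonymous pipe() in place of a named FIFO (both follow the same rules for SIGPIPE and end of file). The function names are hypothetical. SIGPIPE is ignored temporarily so that the writer sees the failure as EPIPE instead of being killed.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Writer side: the read end is gone. With SIGPIPE ignored, write() fails
   with EPIPE instead of the process being terminated. Returns that errno. */
int write_after_reader_gone(void)
{
    int fd[2];
    if (pipe(fd) == -1) return -1;
    void (*old)(int) = signal(SIGPIPE, SIG_IGN);  /* default action would kill us */
    close(fd[0]);                                 /* the reader disappears */
    ssize_t n = write(fd[1], "x", 1);
    int err = (n == -1) ? errno : 0;
    signal(SIGPIPE, old);                         /* restore previous disposition */
    close(fd[1]);
    return err;
}

/* Reader side: the write end is gone. read() simply returns 0 (end of file). */
ssize_t read_after_writer_gone(void)
{
    int fd[2];
    char c;
    if (pipe(fd) == -1) return -1;
    close(fd[1]);                                 /* the writer disappears */
    ssize_t n = read(fd[0], &c, 1);
    close(fd[0]);
    return n;
}
```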

Page 742: NetWork

Q3 Write two advantages of using message queues over pipes.

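– Message queues preserve message boundaries, whereas pipes are byte-stream based.

– In message queues, messages can be retrieved in any order (selected by message type), whereas data in a pipe is always retrieved in FIFO order.

– Message queues have kernel persistence: a process can write messages and exit, and a reader can retrieve them later. A pipe's contents exist only while some process holds it open, and writing to it requires that a reader have it open.

– A single message queue can carry traffic in both directions (for example, using one message type per direction), whereas a pipe is unidirectional (half duplex).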
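The first two advantages can be seen with System V message queues (msgget(), msgsnd(), msgrcv()). In this sketch the queue is private to the process, and the struct layout and the function name mq_demo are illustrative assumptions. msgrcv() with a positive type returns the first message of exactly that type, and type 0 returns the oldest message; IPC_NOWAIT keeps the sketch from blocking if a send failed.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct demo_msg {            /* illustrative layout: mtype must come first */
    long mtype;              /* must be > 0 when sending */
    char mtext[32];
};

/* Hypothetical demo: returns 1 if the type-2 message can be taken before
   the older type-1 message and both arrive whole; 0 otherwise; -1 if no
   queue could be created. */
int mq_demo(void)
{
    int q = msgget(IPC_PRIVATE, 0600 | IPC_CREAT);
    if (q == -1) return -1;

    struct demo_msg m1 = { 1, "first" }, m2 = { 2, "second" }, in;
    int ok = msgsnd(q, &m1, strlen(m1.mtext) + 1, 0) == 0 &&
             msgsnd(q, &m2, strlen(m2.mtext) + 1, 0) == 0;

    /* msgtyp > 0: first message of that type, skipping the older one */
    ssize_t n = msgrcv(q, &in, sizeof in.mtext, 2, IPC_NOWAIT);
    ok = ok && n == 7 && strcmp(in.mtext, "second") == 0;

    /* msgtyp == 0: oldest message left; its 6-byte boundary is preserved */
    n = msgrcv(q, &in, sizeof in.mtext, 0, IPC_NOWAIT);
    ok = ok && n == 6 && strcmp(in.mtext, "first") == 0;

    msgctl(q, IPC_RMID, NULL);   /* queues persist in the kernel until removed */
    return ok;
}
```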

Page 743: NetWork

Q4 Write a program ‘myprogram’ that takes the name of an executable and its arguments on the command line and executes it. Don’t use the system() library function.

$ myprogram exe arg1 arg2 arg3 ... argn

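#include <stdio.h>
#include <unistd.h>

int main (int argc, char **argv) {
    if (argc < 2) { fprintf (stderr, "usage: %s exe [args...]\n", argv[0]); return 1; }
    execvp (argv[1], argv + 1);    /* argv[argc] == NULL terminates the argument list */
    perror ("execvp");             /* reached only if exec fails */
    return 1;
}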

Page 744: NetWork

Q4 Write a piece of code that creates a shared memory segment and maps (attaches) it into a process's address space.

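/* needs <stdio.h>, <stdlib.h>, <sys/ipc.h>, <sys/shm.h> */
key_t key; int shmid; char *data;
key = ftok ("shmget.c", 'R');          /* the file "shmget.c" must exist */
if (key == -1) { perror ("ftok"); exit (1); }
if ((shmid = shmget (key, 1024, 0644 | IPC_CREAT)) == -1) { perror ("shmget: shmget failed"); exit (1); }
data = shmat (shmid, (void *) 0, 0);   /* attach; returns (void *) -1, not NULL, on failure */
if (data == (char *) -1) { perror ("shmat"); exit (1); }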
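The snippet above only creates and attaches the segment. This sketch, with the hypothetical name shm_share_demo, shows the segment actually being shared between two processes. It uses IPC_PRIVATE instead of an ftok() key (an assumption, to avoid depending on a particular file), relies on a fork() child inheriting the parent's attached segments, and detaches and removes the segment at the end with shmdt() and shmctl(IPC_RMID).

```c
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if the parent reads what the child wrote
   into the segment, 0 if not, -1 on setup errors. */
int shm_share_demo(void)
{
    int shmid = shmget(IPC_PRIVATE, 1024, 0600 | IPC_CREAT);
    if (shmid == -1) return -1;

    char *data = shmat(shmid, NULL, 0);
    if (data == (char *) -1) { shmctl(shmid, IPC_RMID, NULL); return -1; }
    data[0] = '\0';

    pid_t pid = fork();
    if (pid == -1) { shmdt(data); shmctl(shmid, IPC_RMID, NULL); return -1; }
    if (pid == 0) {                      /* child inherits the attachment */
        strcpy(data, "hello from child");
        _exit(0);
    }
    waitpid(pid, NULL, 0);               /* wait until the child has written */
    int ok = strcmp(data, "hello from child") == 0;

    shmdt(data);                         /* detach from this process ... */
    shmctl(shmid, IPC_RMID, NULL);       /* ... and mark the segment for removal */
    return ok;
}
```

Without the final shmctl(IPC_RMID), the segment would outlive both processes, since System V shared memory persists in the kernel until it is explicitly removed.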

Page 745: NetWork

Q5 Consider the following program.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int glob = 6;

int main ()
{
    int var;
    pid_t pid;
    var = 88;
    if (!fork ()) {
        glob++; var++;
        printf ("Child: pid = %d, glob=%d, var=%d\n", getpid (), glob, var);
    }
    glob++; var++;
    printf ("pid = %d, glob=%d, var=%d\n", getpid (), glob, var);
    exit (0);
}

Page 746: NetWork

Q5 Ans Write the output of the above program. Assume suitable pids for the parent and child. [3]

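Assuming the parent has pid 11709 and the child has pid 11710:

Child: pid = 11710, glob=7, var=89
pid = 11710, glob=8, var=90
pid = 11709, glob=7, var=89

The child prints twice because, after its if block, it falls through to the common code. Each process has its own copy of glob and var after fork(), so the parent's values are unaffected by the child's increments. The child's two lines always appear in this order, but the parent's line may appear before, between, or after them, depending on scheduling.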

Page 747: NetWork

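Q5 Ans Modify the above program such that the child starts printing only after the parent has printed.

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

void usr1_handler (int signo) { (void) signo; }   /* only has to end the wait */

int glob = 6;

int main ()
{
    int var = 88, st;
    pid_t pid;
    sigset_t block, old;

    /* Install the handler and block SIGUSR1 before fork(), so an early
       signal can neither kill the child nor be lost before it waits. */
    signal (SIGUSR1, usr1_handler);
    sigemptyset (&block); sigaddset (&block, SIGUSR1);
    sigprocmask (SIG_BLOCK, &block, &old);
    pid = fork ();
    if (pid == 0) {
        glob++; var++;
        sigsuspend (&old);            /* atomically unblock SIGUSR1 and wait for it */
        printf ("pid = %d, glob=%d, var=%d\n", getpid (), glob, var);
    }
    if (pid > 0) {
        glob++; var++;
        printf ("pid = %d, glob=%d, var=%d\n", getpid (), glob, var);
        fflush (stdout);              /* flush first, in case stdout is a file or pipe */
        kill (pid, SIGUSR1);
        wait (&st);
    }
    exit (0);
}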

Page 748: NetWork

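Q5 Ans Modify the above program such that the parent waits for the child to exit and prints the child’s status.

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int glob = 6;

int main ()
{
    int var = 88;
    pid_t pid = fork ();
    if (pid == 0) {
        glob++; var++;                /* no pause(): nothing would ever wake the child */
        printf ("pid = %d, glob=%d, var=%d\n", getpid (), glob, var);
        exit (0);
    }
    if (pid > 0) {
        int st;
        wait (&st);
        glob++; var++;
        printf ("pid = %d, glob=%d, var=%d\n", getpid (), glob, var);
        if (WIFEXITED (st))
            printf ("child %d exited with status %d\n", pid, WEXITSTATUS (st));
        else if (WIFSIGNALED (st))
            printf ("child %d killed by signal %d\n", pid, WTERMSIG (st));
    }
    exit (0);
}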