Inter-Process Communications (IPC)

ICS332: Operating Systems

Henri Casanova ([email protected])

Spring 2018


Communicating Processes?

So far we have seen independent processes

Each process runs code independently

Parents are aware of their children, and children are aware of their parents, but they do not interact

Besides the ability to wait for a process’s termination

But often we need processes to cooperate

To share information (e.g., access to common data)

To speed up computation (e.g., to use multiple cores)

Because it’s convenient (e.g., some applications are naturally implemented as sets of interacting processes)

In general, the means of communication between cooperating processes is called Inter-Process Communication (IPC)


















Communication Models

Example: Process A needs to communicate with Process B

[Figure: two diagrams, each showing the kernel, Process A, Process B, and available memory. Message Passing: a message M travels from Process A to Process B through the kernel. Shared Memory: Process A and Process B both access a shared memory region located in available memory.]

Pros and Cons

Message Passing

Performed through the kernel memory space

Simple to implement (pre-defined region in memory)

Limited by kernel size ⇒ Small messages

One system call per communication operation, i.e., one send, one receive: high overhead

Cumbersome for developers: code will be sprinkled with send and receive everywhere

Shared Memory

Performed using available memory

Large messages allowed (only limited by physical memory)

Violates the principle of memory protection between processes

More difficult to implement: processes need to be aware of the shared memory region’s location

Easy for developers: A few system calls to allocate the shared region, and then just read/write to it















Message Passing

There are two fundamental system calls: send and receive

Although we’re talking about communication within a machine here, many of the design options are similar to some questions in the field of networking:

Fixed or variable length messages?

Uni-directional or bi-directional link?

Automatic or explicit buffering?

Direct or indirect communication?

Synchronous or asynchronous communication?

...

Picking options above is about making trade-offs between:

Ease of implementation in the kernel (will it be bug-free and maintainable?)

Convenience to users (will they like using it?)

Expressiveness (can users do everything they want with it?)

Performance (is it fast? is it memory-efficient?)




A Word on API Design

In your professional lives you’ll use and define APIs

You have probably already encountered APIs that you liked, and APIs that you disliked

It’s often not easy to pinpoint the flaws of an API

API design has deep implications (difficult to foresee problems; dramatic snowball effect; can lead to costly full rewrites; ...)

It is thus worth spending a lot of time defining good APIs

Being good at designing APIs (and thus abstractions) is an invaluable skill, which comes with experience

Pedagogic challenge: Conveying to college students how important/crucial this is, when initially it all seems like a bunch of pointless nitpicking

You wouldn’t believe the number of hours spent daily on minuscule API details in the software industry, because you haven’t yet experienced the above “snowball effect” of your own poorly designed API

But let’s try anyway in the context of IPCs...


























Direct vs. Indirect Communication

Direct Communication: sends and receives

When sending, the message target is explicit: e.g., send(Message message, Process targetProcess);

When receiving, the message source is explicit: e.g., Message ← recv(Process sourceProcess);

Indirect Communication: mailboxes

A mailbox is an opaque ID

One sends and receives to/from a mailbox: e.g., send(Message message, String mailbox); and Message ← recv(String mailbox);

Believe it or not, the above really matters, e.g.:

With direct communication, you must know the process that will receive the message (which must be running already)

With indirect, who receives the message can be decided well after sending

But if two processes want to receive from a mailbox “at the same time”, which one gets the message (and what about message ordering?)

...
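To make the mailbox idea concrete, here is a minimal sketch that uses a System V message queue as the mailbox. This is only one possible mapping and is an assumption of this example: the queue id plays the role of the opaque mailbox ID, and the names mailbox_msg and mailbox_demo are hypothetical. The sender (child) never names the receiver (parent); both only name the mailbox.

```c
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Layout expected by msgsnd/msgrcv: a long message type, then the payload. */
struct mailbox_msg {
    long mtype;
    char text[64];
};

/* Hypothetical helper: the child sends one message to a mailbox and the
 * parent receives it. Copies the payload into out; returns 0 on success. */
int mailbox_demo(char *out, size_t out_size) {
    /* The queue id is the opaque mailbox ID (0600: owner read/write). */
    int mailbox = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (mailbox == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) {
        msgctl(mailbox, IPC_RMID, NULL);
        return -1;
    }
    if (pid == 0) {
        /* Sender: send(message, mailbox), no receiver is named. */
        struct mailbox_msg m = { .mtype = 1 };
        strncpy(m.text, "Hello, mailbox!", sizeof(m.text) - 1);
        int rc = msgsnd(mailbox, &m, sizeof(m.text), 0);
        _exit(rc == -1 ? 1 : 0);
    }

    /* Receiver: recv(mailbox), blocks until a message of any type arrives. */
    struct mailbox_msg received;
    ssize_t n = msgrcv(mailbox, &received, sizeof(received.text), 0, 0);
    waitpid(pid, NULL, 0);
    msgctl(mailbox, IPC_RMID, NULL);   /* remove the mailbox */
    if (n == -1) return -1;

    received.text[sizeof(received.text) - 1] = '\0';
    snprintf(out, out_size, "%s", received.text);
    return 0;
}
```

Calling mailbox_demo(buf, sizeof buf) from main should leave the sent text in buf. If several processes called msgrcv on the same queue, each message would be delivered to only one of them, which is exactly the "who gets the message?" question raised above.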
















Message Passing: Synchronous or asynchronous?

Synchronous = Blocking

Synchronous send: Block until the message is received

Synchronous recv: Block until a message is available

When both send and receive are blocking, the operation is a rendezvous

Asynchronous = Non-Blocking

Non-blocking send: Send and continue

Usually comes with the option to check the status later (Was the message received?)

Non-blocking receive: e.g., Read any number of bytes (possibly 0) or any message (possibly the empty one, or null)

Most OSes offer both options in various ways
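As an illustration of a non-blocking receive, here is a small sketch that uses a pipe as the channel (an assumption of this example; any descriptor-based channel would do). The helper names try_recv and nonblocking_demo are hypothetical. With O_NONBLOCK set, read() returns immediately instead of waiting when no data is available.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: non-blocking receive. Returns the number of bytes
 * read, 0 if nothing is available right now (or the writer closed the
 * channel), and -1 on a real error. */
ssize_t try_recv(int fd, char *buf, size_t size) {
    ssize_t n = read(fd, buf, size);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) return 0;
    return n;
}

/* Returns 0 if the receive behaved as expected, -1 otherwise. */
int nonblocking_demo(void) {
    int fds[2];   /* fds[0]: read end, fds[1]: write end */
    if (pipe(fds) == -1) return -1;

    /* Switch the read end to non-blocking mode. */
    int flags = fcntl(fds[0], F_GETFL);
    if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    char buf[16];
    ssize_t before = try_recv(fds[0], buf, sizeof buf);  /* nothing sent yet */
    ssize_t sent = write(fds[1], "ping", 4);
    ssize_t after = try_recv(fds[0], buf, sizeof buf);   /* data is available */

    close(fds[0]);
    close(fds[1]);
    printf("before send: %zd bytes, after send: %zd bytes\n", before, after);
    return (before == 0 && sent == 4 && after == 4) ? 0 : -1;
}
```

Without the O_NONBLOCK flag, the first try_recv call would block forever in this single-process demo, since nobody has written yet: that is the synchronous behavior.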




A “real-life” Story

I realize that the previous couple of slides are probably a bit abstract and underwhelming, so let me attempt a story from my own software endeavors

For more than a decade I’ve co-led/co-developed a simulation project called SimGrid

It’s a simulator of distributed systems running on distributed compute platforms, used by parallel and distributed computing researchers

Therefore it offers a process abstraction, and IPC abstractions

Here is a brief history of our IPC API development:

Circa 2005: direct only; synchronous only

Circa 2010: indirect only; synchronous only

Circa 2015: indirect only; synchronous and asynchronous

Each of these decisions took hours of design meetings, has had drastic implications on our design/implementation of SimGrid, and has had massive implications for our users










Communication Models (again)


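[Figure: two diagrams, each showing the kernel, Process A, Process B, and available memory. Message Passing: a message M travels from Process A to Process B through the kernel. Shared Memory: Process A and Process B both access a shared memory region located in available memory.]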


Shared Memory

IPC is performed outside the kernel space (but still on the same host)

One process creates a shared memory segment

Other processes can then attach it to their address spaces (Bye bye memory protection for processes)

It is the processes’ (and therefore the developer’s) responsibility to make sure that processes are not stepping on each other’s toes

The OS is not involved: “What happens in Shared Memory Segments stays in Shared Memory Segments”

Memory is freed by the requester


Shared Memory: POSIX API

Implementation in C

SystemV Implementation:

Creation/Request for a new shared memory segment: id = shmget(IPC_PRIVATE, size, IPC_R | IPC_W);

Attaching a process to the shared memory segment identified by id: shmsAddress = shmat(id, NULL, 0);

Detach the memory segment: shmdt(shmsAddress);

Release control of the shared memory segment: shmctl(id, IPC_RMID, NULL);

When the process is attached:

“sending”: sprintf(shmsAddress, "Hello, World!");

“receiving”: sscanf(shmsAddress, "%s", message);

Let’s look at the posix_shm_example.c example
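The calls above can be put together as follows. This is a minimal sketch, not the course's example file: the helper name shm_demo is hypothetical, the permission bits 0600 are used in place of IPC_R | IPC_W (an assumption made for portability), and waiting for the child to exit is a deliberately crude way to avoid reading before the write has happened.

```c
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SEGMENT_SIZE 4096

/* Hypothetical helper: the child "sends" a string through a shared memory
 * segment and the parent "receives" it into out. Returns 0 on success. */
int shm_demo(char *out, size_t out_size) {
    /* Create a new segment (0600: owner may read and write). */
    int id = shmget(IPC_PRIVATE, SEGMENT_SIZE, IPC_CREAT | 0600);
    if (id == -1) return -1;

    /* Attach before fork(): the child inherits the attachment. */
    char *shared = shmat(id, NULL, 0);
    if (shared == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        shmdt(shared);
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    if (pid == 0) {
        /* "Sending" is just a write into the shared region. */
        snprintf(shared, SEGMENT_SIZE, "Hello, World!");
        shmdt(shared);
        _exit(0);
    }

    /* Crude synchronization: only read once the child has exited. */
    waitpid(pid, NULL, 0);
    snprintf(out, out_size, "%s", shared);   /* "receiving" is just a read */

    shmdt(shared);                 /* detach */
    shmctl(id, IPC_RMID, NULL);    /* the requester frees the segment */
    return 0;
}
```

Note that no system call is involved between the child's write and the parent's read: once the segment is attached, the OS is out of the picture, which is why synchronization is entirely the developer's problem.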


Shared Memory needs Message Passing?

How do the non-requesters know the id of the shared memory segment?

fork(): parent requests and gives it to children (since the child will have a copy of the parent’s address space)

For non-related processes, there are many ways...

Through a file?

Argument on the command-line?

Other Message Passing mechanism?

Shared Memory Segments are flagged by the OS as “Shared Memory”

Let’s run the ipcs -m command on a Linux box and see if we find any...


Remote Procedure Calls

So far, we’ve viewed messages as unstructured sequences of bytes: the receiver has to interpret the message to know its meaning

RPC provides a procedure invocation abstraction across processes (and actually across machines)

A client invokes a procedure in another process just as it would invoke it directly

It has a lot of uses, most obviously for client-server applications (RPC is also a building block of microkernels)

The “magic” is performed through a client stub (one stub for each RPC):

Marshal the parameters (structured data to byte stream)

Send the data over to the server

Wait for the server’s answer

Unmarshal the returned values (byte stream to structured data)

A lot of different implementations exist... including in Java
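The four stub steps can be sketched for a toy remote add(a, b) procedure. Everything here is an assumption made for illustration: the names add, serve_one, rpc_add and rpc_demo are hypothetical, the "server" is a forked child reached through a socketpair, and the wire format is simply two 32-bit integers in network byte order. Real RPC systems generate such stubs automatically and handle partial reads, errors, and many data types.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The actual procedure, which lives in the server process. */
static int32_t add(int32_t a, int32_t b) { return a + b; }

/* Server side: unmarshal one request, call the procedure, marshal the reply. */
static void serve_one(int sock) {
    uint32_t req[2];
    if (read(sock, req, sizeof req) != (ssize_t)sizeof req) _exit(1);
    int32_t result = add((int32_t)ntohl(req[0]), (int32_t)ntohl(req[1]));
    uint32_t reply = htonl((uint32_t)result);
    if (write(sock, &reply, sizeof reply) != (ssize_t)sizeof reply) _exit(1);
}

/* Client stub: to the caller it looks (almost) like a local call to add(). */
static int rpc_add(int sock, int32_t a, int32_t b, int32_t *result) {
    /* 1. Marshal the parameters (structured data to byte stream). */
    uint32_t req[2] = { htonl((uint32_t)a), htonl((uint32_t)b) };
    /* 2. Send the data over to the server. */
    if (write(sock, req, sizeof req) != (ssize_t)sizeof req) return -1;
    /* 3. Wait for the server's answer (blocking read). */
    uint32_t reply;
    if (read(sock, &reply, sizeof reply) != (ssize_t)sizeof reply) return -1;
    /* 4. Unmarshal the returned value (byte stream to structured data). */
    *result = (int32_t)ntohl(reply);
    return 0;
}

/* Hypothetical driver: fork a one-shot server and invoke the stub once. */
int rpc_demo(int32_t a, int32_t b, int32_t *result) {
    int socks[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, socks) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(socks[0]);
        close(socks[1]);
        return -1;
    }
    if (pid == 0) {              /* server process */
        close(socks[0]);
        serve_one(socks[1]);
        _exit(0);
    }

    close(socks[1]);             /* client process */
    int rc = rpc_add(socks[0], a, b, result);
    close(socks[0]);
    waitpid(pid, NULL, 0);
    return rc;
}
```

The stub returns an error code in addition to the result: unlike a local call, the remote one can fail for reasons that have nothing to do with add() itself.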




RPC a la Java: Remote Method Invocation

RPC in Java: Remote Method Invocation (RMI)

A process in a JVM can invoke a method of an object living inanother JVM

Marshalling/Unmarshalling performed by the JVM

(The class needs to implement the java.io.Serializable interface)

RMI hides all the gory details of RPC/IPC

See this Java RMI Tutorial for more info: https://docs.oracle.com/javase/tutorial/rmi/

Remote Procedure Calls: Main Issue

Local procedure calls never fail (i.e., if they reach an error condition, that error can be locally managed)

Not so easy when execution is remote: there are many “failure” cases

RPC could be in execution but taking a long time and perhaps appear stuck

RPC could have partially executed and then failed halfway through, causing the server process to crash

RPC could have successfully executed, but then failed when replying with some “it worked” message, perhaps due to a network problem (when running across hosts)

What we want is strong “execute exactly once” semantics: When the RPC completes (with perhaps hidden retries), then you know it’s been executed exactly once successfully, or not executed at all and failed

This gets us to difficult distributed systems issues, which are often part of graduate courses...








Pipes

One of the most ancient, yet simple, useful, and powerful IPC mechanisms provided by OSes is typically called pipes

We explore this in a programming assignment, so it’s a good idea to pay close attention

But first, let’s take a little detour about UNIX file descriptors and output redirection...




stdin, stdout, stderr

In UNIX, every process comes with 3 already opened “files”

Not real files, but in UNIX “everything looks like a file”

These files are:

stdin: the standard input stream

stdout: the standard output stream

stderr: the standard error stream

You’ve encountered these when developing code (C/C++, Java, Python, etc.)

e.g., printf writes to stdout

Each open file in UNIX is associated with an integer file descriptor

An index into some “this process’ open files” table

By convention, the file descriptors for each standard stream are (see /usr/include/unistd.h):

stdin: STDIN_FILENO = 0

stdout: STDOUT_FILENO = 1

stderr: STDERR_FILENO = 2
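A quick way to see these conventions is to ask the C library which descriptor sits behind each standard stream. This is just a small sketch; the helper name fd_demo is hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: prints the descriptors behind the standard streams
 * and returns 0 if they match the STD*_FILENO constants. */
int fd_demo(void) {
    /* fileno() maps a FILE * stream to its underlying file descriptor. */
    printf("stdin=%d stdout=%d stderr=%d\n",
           fileno(stdin), fileno(stdout), fileno(stderr));
    fflush(stdout);   /* flush stdio's buffer before bypassing it below */

    /* Writing to descriptor 1 directly ends up in the same place as printf. */
    const char *msg = "written with write(STDOUT_FILENO, ...)\n";
    (void)write(STDOUT_FILENO, msg, strlen(msg));

    return (fileno(stdin) == STDIN_FILENO &&
            fileno(stdout) == STDOUT_FILENO &&
            fileno(stderr) == STDERR_FILENO) ? 0 : -1;
}
```

The fflush() call matters because printf buffers its output inside the process, while write() goes straight to the descriptor; without it the two lines could appear out of order.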














Re-directing output

Perhaps some of you have wondered how something like ls > file.txt can work?

After all, ls has code that looks like fprintf(stdout, "%s", filename);

So how can this code magically know to write to a file instead of to stdout???

This is one of the famous UNIX “tricks”

In UNIX, when I open a new file, this file gets the first available file descriptor number

SO, if I close stdout, and open a file right after, this file will have file descriptor 1

THEREFORE, printf will write to it as if it were stdout

Because fprintf(stdout, ...) really means “write to file descriptor 1”

And I don’t need to change the code of ls at all!!!

Let’s see an example program













Output Redirect Example

Example program fragment (should check for errors)

    ...
    pid_t pid = fork();
    if (!pid) { // child
        close(1); // close stdout
        FILE *file = fopen("/tmp/stuff", "w"); // open a new file, which gets file descriptor 1
        // exec the "ls -la" command (execvp searches the PATH for "ls")
        char* const arguments[] = {"ls", "-la", NULL};
        execvp("ls", arguments);
    }
    ...
    }

This program will run ls -la and write its output to file /tmp/stuff!

Let’s look at output_redirect_example1.c


https://henricasanova.github.io/ics332_spring2018/morea/Processes/examples/output_redirect_example1.c

What if I opened the file before calling fork()?

In the previous example, the sequence of operations is:

Close stdout

Open a new file, which then gets file descriptor 1

What if I have already opened the file and it has some other file descriptor?

This is what the dup() system call is for: file descriptor duplication!

Essentially, dup() allows you to say “Create another file descriptor for an existing opened file”, and it will always pick the lowest unused descriptor number

The fileno() library call returns the descriptor of an open file

So the sequence is:

    FILE *some_file = fopen(....);
    close(1);
    dup(fileno(some_file));

After this sequence, writing to file descriptor 1 writes to the file instead!

Let’s see a simple example again...










Another Output Redirect Example


    ...
    FILE *file = fopen("/tmp/stuff", "w"); // open a new file
    pid_t pid = fork();
    if (!pid) { // child
        close(1); // close stdout
        dup(fileno(file)); // duplicate the file's file descriptor, which becomes descriptor 1
        // exec the "ls -la" command (execvp searches the PATH for "ls")
        char* const arguments[] = {"ls", "-la", NULL};
        execvp("ls", arguments);
    }
    ...
    }

This program will run ls -la and write its output to file /tmp/stuff!

Let’s look at output_redirect_example2.c


https://henricasanova.github.io/ics332_spring2018/morea/Processes/examples/output_redirect_example2.c

UNIX Pipes

A pipe is a simple IPC mechanism between two processes

One can create a pipe so that process A can write to it and process B can read from it

Available in the shell with the | symbol: the output of a process becomes the input of other(s)

e.g.: Count the files whose names contain foo but not bar in the /tmp directory

List all files in /tmp: find /tmp -type f

Keep those with foo: grep foo

But remove those with bar: grep -v bar

And count the lines that remain: wc -l

Putting everything together: find /tmp -type f | grep foo | grep -v bar | wc -l
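Below the shell's | symbol sits the pipe() system call. Here is a minimal sketch of "process A writes, process B reads", with a forked child as the writer and the parent as the reader; the helper name pipe_demo and the message text are hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child writes a message into a pipe and the parent
 * reads it into out (always NUL-terminated). Returns 0 on success. */
int pipe_demo(char *out, size_t out_size) {
    int fds[2];   /* fds[0]: read end, fds[1]: write end */
    if (out_size == 0 || pipe(fds) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {               /* child: process A, the writer */
        close(fds[0]);            /* the writer does not need the read end */
        const char *msg = "hello through the pipe";
        ssize_t w = write(fds[1], msg, strlen(msg));
        close(fds[1]);            /* closing signals end-of-file to the reader */
        _exit(w == -1 ? 1 : 0);
    }

    /* Parent: process B, the reader. It must close its own copy of the write
     * end, otherwise read() would never report end-of-file. */
    close(fds[1]);
    size_t total = 0;
    ssize_t n;
    while (total < out_size - 1 &&
           (n = read(fds[0], out + total, out_size - 1 - total)) > 0) {
        total += (size_t)n;
    }
    out[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return 0;
}
```

A shell pipeline such as find ... | grep foo does the same thing between two children, with one extra step: each child redirects its stdout or stdin to the appropriate end of the pipe before calling exec.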




popen(): fork() with a pipe!

Very convenient library functions are popen and pclose

Sounds like “pipe open” and “pipe close”, but it’s MUCH more than that

popen() does:

Creates a pipe (uni-directional in POSIX: the mode argument is either "r" or "w")

Forks and execs a child process (e.g., "ls -a")

Returns the pipe, which is in fact a file (FILE *)

The parent can then “talk” to the child through the pipe: read the child’s output ("r") or write to its input ("w")

pclose() does:

Waits for the child process to complete

Closes the pipe

These are implemented with several system calls: fork, waitpid, pipe (which creates a pipe), close, open, dup

Re-implementing popen/pclose would be a bit too much here, but let’s just see an example program that uses it...
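A full re-implementation is indeed out of scope, but a stripped-down sketch of what popen(command, "r"), reading, and pclose() do together fits in a few lines. This is not how any particular C library implements them: the helper name run_and_capture is hypothetical, it captures into a fixed buffer instead of returning a FILE *, and it uses dup2() as a shortcut for the close()/dup() pair.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs command through /bin/sh, stores its standard
 * output in out (NUL-terminated; output beyond the buffer is dropped), and
 * returns the command's exit status, or -1 on error. */
int run_and_capture(const char *command, char *out, size_t out_size) {
    int fds[2];   /* fds[0]: read end, fds[1]: write end */
    if (out_size == 0 || pipe(fds) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: make descriptor 1 (stdout) refer to the pipe's write end. */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);   /* only reached if exec failed */
    }

    /* Parent: read the child's output until end-of-file or a full buffer. */
    close(fds[1]);
    size_t total = 0;
    ssize_t n;
    while (total < out_size - 1 &&
           (n = read(fds[0], out + total, out_size - 1 - total)) > 0) {
        total += (size_t)n;
    }
    out[total] = '\0';
    close(fds[0]);

    /* The pclose() part: wait for the child and report its exit status. */
    int status;
    if (waitpid(pid, &status, 0) == -1) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_and_capture("echo hello", buf, sizeof buf) should return 0 and leave "hello\n" in buf, assuming /bin/sh exists on the machine.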










popen() / pclose() Example


    ...
    // fork/exec a child process and get a pipe to READ from
    FILE *pipe = popen("/usr/bin/ls -la", "r");
    // Get lines of output from the pipe, which is just a FILE *, until EOF is reached
    char buffer[2048];
    while (fgets(buffer, 2048, pipe)) {
        fprintf(stderr, "LINE: %s", buffer);
    }
    // Wait for the child process to terminate
    pclose(pipe);
    }

This program prints all the output produced by ls -la

Of course, if that’s the only thing you want to do in this program, just run ls -la directly :)

But perhaps you want to tweak the output and call this program my_ls?

Let’s look at and run popen_example.c


https://henricasanova.github.io/ics332_spring2018/morea/Processes/examples/popen_example.c

Conclusions

We’ve seen two main mechanisms for processes to communicate:

Message Passing: Within the kernel space

Shared Memory: Outside the kernel space

Both mechanisms are implemented in all mainstream OSes

Many variants and extensions exist: RPCs, RMI, Pipes, and many others we didn’t mention

Textbook readings:

About Pipes (Section 5.4): http://pages.cs.wisc.edu/~remzi/OSTEP/cpu-api.pdf

About Client-Servers and RPCs (Section 47.5): http://pages.cs.wisc.edu/~remzi/OSTEP/dist-intro.pdf

Quiz next week on this Module (Processes and IPCs)

Let’s look at Homework Assignment #4
