2002-08-15 S. Haridi, CS2104, Lecture 01 1 Distributed Systems, cs5223 Lecture 01 (2004- 01-06) Seif Haridi Department of Computer Science, NUS [email protected]
Page 1: Lecture 01

2002-08-15 S. Haridi, CS2104, Lecture 01 1

Distributed Systems, cs5223

Lecture 01 (2004-01-06)

Seif Haridi
Department of Computer Science, NUS

[email protected]

Page 2: Lecture 01

Overview

Organization
Course overview
Getting started (introduction to distributed systems and distributed algorithms)

Page 3: Lecture 01

Organization/Objectives

Page 4: Lecture 01

Objectives

Understand some of the fundamental aspects of distributed systems

Overview of systems aspects (half of the course)

Focus is on algorithmic aspects (half of the course)

Learn how to read/present research papers

Page 5: Lecture 01

Non-objectives

Learning in detail about all middleware for constructing distributed applications
Learning how to program distributed applications
  Web services
  Java and distributed computing
  Mozart and distributed computing
For these topics, look at
  M.L. Liu, Distributed Computing
  P. Van Roy and S. Haridi, Concepts, Techniques and Models of Computer Programming

Page 6: Lecture 01

Distributed Systems CS5223

cs5223
  Written final exam 60%
  Midterm exam 20%
  Assignments 20%
Module homepage
  http://www.comp.nus.edu.sg/~cs5223
  IVLE
Teaching
  Lectures
  Consultation using IVLE
  Come any time

Page 7: Lecture 01

Teacher

Course responsible [Lectures]: Seif Haridi [email protected]

Page 8: Lecture 01

Lectures

Held by me

Page 9: Lecture 01

Lecture Structure

Reminder of last lecture

Overview
Content
Summary

Reading suggestions

Page 10: Lecture 01

Material

Lectures are based mainly on two books
  (DS) Andrew S. Tanenbaum, Maarten van Steen, Distributed Systems, Principles and Paradigms, Prentice-Hall 2002.
  (DA) Randy Chow and Theodore Johnson, Distributed Operating Systems & Algorithms, Addison Wesley 1997, ISBN 0-201-49838-3.
  Copies should be available (now or soon) at the CO-OP
The handouts are in most cases self-explanatory
  Available from the webpage
Some scientific papers

Page 11: Lecture 01

Other recommended material

Coulouris, Dollimore, Kindberg, “Distributed Systems: Concepts and Design”, Addison-Wesley (3rd Edition)

M.L. Liu, Distributed Computing, principles and applications, Addison Wesley

Nancy Lynch, Distributed Algorithms

Page 12: Lecture 01

Reading Suggestions

Will be available on the webpage (Lectures)
Initially
  Chapter 1 of Tanenbaum (DS)

Page 13: Lecture 01

Assignments

There will be one assignment
  You will have to study one or two research papers
  Or do a programming assignment
One discussion group per assignment
  Solutions to be submitted through IVLE
  There is a deadline for each assignment

Page 14: Lecture 01

General information

Reading of papers
  In groups of two or three
  Each group will read one or two research papers
For each paper studied
  Identify the problem
  Explain the solution(s) presented in the paper
  Identify positive and negative aspects of the paper
  Propose your own solution, if any
  Provide a report
  Give a presentation to the class

Page 15: Lecture 01

Assignment Groups

Assignment is done via IVLE
Is everybody subscribed to IVLE?

Page 16: Lecture 01

Use IVLE

Only on exceptional [email protected]

Questions
  There is a discussion group for each book chapter/lecture
  There is a discussion group for general matters

Submit your assignments using the corresponding Workbin (IVLE)

Page 17: Lecture 01

Feedback in General

Approach me directly (any time) or arrange an appointment

Do not be afraid!

Page 18: Lecture 01

Questions and Using Brakes!

Please do ask questions during the lectures
  Repeat an explanation?
  Give a better explanation for an example?
Please say when things go too fast!
Please say when things go too slow!

Page 19: Lecture 01

Background Knowledge

I assume some knowledge of the following
  Programming languages: C/Java
  Operating systems: basic concepts
  Networking: basic concepts
  Algorithms and data structures
I will try to be as elementary as possible
  Ask me if you lack some knowledge

Page 20: Lecture 01

Course Overview

Page 21: Lecture 01

What is a distributed system

Page 22: Lecture 01

Distributed system

[Figure: a simplified view of a distributed system. Labels: communication medium, processor, process, thread, communication channel, node (processor/process).]

Page 23: Lecture 01

Distributed system

Set of computing nodes that cooperate in order to achieve a well defined goal

Nodes cooperate through communication

Communication is by message passing at the fundamental level

Page 24: Lecture 01

Distributed System

A distributed system is one or more applications running on a collection of independent computers that appears to its users as a single coherent system

Page 25: Lecture 01

What is a Distributed System?

Distributed hardware
  n processing elements (PE), each a processor + memory
  Interconnected by some network
  No shared memory
Distributed software
  No centralized OS, each PE has its own copy of the OS
  No physically centralized file system
  Means for inter-process communication
Distributed applications

Page 26: Lecture 01

Why distributed systems?

Information exchange (collaborative work)
Resource sharing (e.g. printer, backup storage, disk units, etc.)
Resource sharing (applications, information, media, services)
Cost reduction
Increase of availability (partial failure)
Increase of performance through parallelism, ...

Page 27: Lecture 01

Main characteristics

No shared memory between nodes
  Each node has its own memory
  Communication by message passing
No global clock
  Each node has its own clock

Impossible for a node to obtain an instantaneous global state of the system

Page 28: Lecture 01

Examples of Distributed Systems

Airline reservation system
Bank automated teller machine network
CSCW (Computer Supported Cooperative Work)
Intranet
Internet
Mobile computing

Page 29: Lecture 01

[Figure: a typical portion of the Internet. Legend: intranet, ISP, backbone, satellite link, desktop computer, server, network link.]

Page 30: Lecture 01

A typical intranet

[Figure: desktop computers, an email server, a Web server, a file server, and print and other servers on a local area network, connected through a router/firewall to the rest of the Internet.]

Page 31: Lecture 01

How are Distributed Systems built?

A number of computers connected by a network

A distribution middleware services layer that gives a uniform view of the nodes and hides some of the network and distribution aspects

Applications on top of the middleware services layer (using a programming system)

Page 32: Lecture 01

Middleware view

Page 33: Lecture 01

Middleware view

A distributed system is often organized as a layer on top of the local operating systems

Page 34: Lecture 01

Goals of a Distributed System

Transparency
  Hide the fact that processes and resources are physically distributed
Scalability
  Distributed systems should be easy to expand
Availability
  Distributed systems should be continuously available
Openness
  New users/components can be brought into the system
  Incremental and independent augmentation by independent developer teams

Page 35: Lecture 01

Transparency

Ideally a distributed application (system) should look like a conventional centralized system, with no distinction between local and remote resources
  This is the user view
The developer view is different
  Network aware: knows the cost of distributing programming entities (e.g. objects)
  Has means to control the distribution behavior

Page 36: Lecture 01

Transparency

Access Transparency
  Hides differences in data representation and how a resource is accessed
  Hides heterogeneity of underlying nodes
Location Transparency
  Hides where a resource/service is located
Migration Transparency
  Hides that a resource/service may move to another location

Page 37: Lecture 01

Transparency

Relocation Transparency
  Hides that a resource may be moved to another location (machine/node) while in use
Failure Transparency
  Hides the failure and recovery of a resource
Concurrency Transparency
  Hides that a resource may be shared by a number of competing users/processes

Page 38: Lecture 01

Transparency

Transparency   Description
Access         Hide differences in data representation and how a resource is accessed
Location       Hide where a resource is located
Migration      Hide that a resource may move to another location
Relocation     Hide that a resource may be moved to another location while in use
Replication    Hide that a resource is replicated
Concurrency    Hide that a resource may be shared by several competitive users
Failure        Hide the failure and recovery of a resource
Persistence    Hide whether a (software) resource is in memory or on disk

Page 39: Lecture 01

Scalability

Size
  Add more users and resources/components
Distance
  Cope with geographically distant resources/users
Management
  Spanning independent administrative organizations
  Local management

Page 40: Lecture 01

Scalability Problems (Size)

Examples of scalability limitations.

Concept                  Example
Centralized services     A single server for all users
Centralized data         A single database for location information
Centralized algorithms   All requests go through one process

Page 41: Lecture 01

Scaling Techniques I

[Figure 1.4: Offloading the server by sending form-processing procedures to the client]

Page 42: Lecture 01

Scaling Techniques II

• Distributed algorithms
  • No process has complete information about the system
  • Process decisions are based on local information
  • Failure of one process does not ruin the whole system
  • No implicit assumptions about exactly synchronized clocks (global clock)

Page 43: Lecture 01

Scaling Techniques II

[Figure 1.5: An example of dividing the DNS name space into zones.]

Page 44: Lecture 01

Scalability Problems (Distance)

Long communication delays
Programming techniques for Local Area Networks (LAN) do not really work for Wide Area Networks (WAN)
  Synchronous communication like Remote Procedure Calls (RPC) is not suitable
  Asynchronous message passing is more appropriate

Page 46: Lecture 01

Scalability Problems (Distance)

WAN has unreliable communication media
Cannot exploit broadcast communication
  Only point-to-point communication
Locating a service on a WAN is more difficult than on a LAN
  On a LAN, just broadcast a service identifier and wait for a response

Page 47: Lecture 01

Scalability Problems (Different Administrative Organizations)

Different and conflicting policies for
  Resource usage
  Management of the system
  Security policies
WHO has access to WHAT resources
Can I trust a non-local system administrator?

Page 48: Lecture 01

Scalability Problems (Different Administrative Organizations)

Protect the DS from domains 1 & 2
Protect domains 1 & 2 from the DS
GRID Computing (GGF)

[Figure: a Distributed System (DS) spanning Admin Domain 1 and Admin Domain 2]

Page 49: Lecture 01

Focus of the Distributed systems part (Basics)

Components of Distributed Systems
  Inter-process communication
  Processes, threads, client/servers, code migration, software agents
  Naming services

Page 50: Lecture 01

Focus of the Distributed systems part (Middleware)

Examples of middleware for building DS
  Distributed object-based systems
    CORBA
    Distributed COM
    GLOBE
  Distributed coordination-based systems
Security

Page 51: Lecture 01

Focus of the Distributed systems part (Infrastructures)

Distributed file systems
Distributed document-based systems

Page 52: Lecture 01

Focus of the Distributed Algorithms part

Models of computation
Techniques for coordination of processes
Techniques for high availability
  Fault tolerance
  Reliable group communication
  Distributed agreement
Techniques for scalability
  Consistency models
  Replication techniques

Page 53: Lecture 01

Distributed Algorithms

Page 54: Lecture 01

Distributed Algorithms

How to design distributed algorithms
  Study of some fundamental problems
  Analysis of distributed algorithms
How to achieve fault-tolerance in a distributed system
  Fault-tolerance: the ability of a system to provide useful service despite the failure of some of its components
  Very important for high availability

Page 55: Lecture 01

Why study distributed algorithms?

Distributed algorithms are the backbone of distributed computing systems

They are essential for the implementation of distributed systems
  Distributed operating systems
  Distributed databases, communication systems
  Real-time process-control systems
  Transportation, etc.

Page 56: Lecture 01

Classes of distributed algorithms

Fully decentralized
  Fault-tolerant
  More difficult in general
With a centralized coordinator
  Conceptually simpler
  Single point of failure, bottleneck
  Requires efficient mechanisms for selecting a new coordinator if the current one fails

Page 57: Lecture 01

References

Textbook
  Randy Chow and Theodore Johnson, Distributed Operating Systems & Algorithms, Addison Wesley, 1997
Others
  Nancy A. Lynch, Distributed Algorithms, 1996
  Research papers

Page 58: Lecture 01

Distributed Algorithms: Models of Distributed Computation

Causality
  Ordering of events, logical clocks (timestamps)
  Causal communication
Distributed snapshots
  Detecting stable properties, diffusing computation
Modeling a distributed computation
  Expressing correctness properties of a distributed algorithm
Failures in a distributed system
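
As a small taste of the "logical clocks (timestamps)" item, here is a minimal sketch of a Lamport-style logical clock. The class name `LamportClock` and its method names are illustrative choices, not something these slides define, and the two "nodes" are simply two objects in one program.

```python
class LamportClock:
    """Minimal Lamport-style logical clock (illustrative names)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event: advance the clock by one."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, msg_time):
        """On receipt, move past both the local clock and the message timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time


# Two nodes, each with its own clock and no shared memory.
a, b = LamportClock(), LamportClock()
a.tick()                    # local event on a: clock becomes 1
stamp = a.send()            # a sends a message stamped 2
b_time = b.receive(stamp)   # b moves to max(0, 2) + 1 = 3
```

The receive rule is what orders the send before the receipt, even though neither node can read the other's clock.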

Page 59: Lecture 01

Distributed Algorithms: outline

Synchronization
  Distributed mutual exclusion: needed to regulate accesses to a common resource that can be used by only one process at a time
Election
  Used, for instance, to designate a new coordinator when the current coordinator fails
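
To make the election item concrete, here is a deliberately simplified sketch: the live node with the highest ID becomes coordinator. The function name `elect_coordinator` and the dictionary of node states are hypothetical, and the sketch assumes every node already knows who is alive. A real election algorithm has to establish that by exchanging messages.

```python
def elect_coordinator(node_alive):
    """Return the highest ID among live nodes, or None if no node is alive.

    node_alive maps a node ID to True (alive) or False (failed).
    """
    live = [node for node, alive in node_alive.items() if alive]
    return max(live) if live else None


nodes = {1: True, 2: True, 3: True, 4: True}
first = elect_coordinator(nodes)    # node 4 coordinates
nodes[4] = False                    # the current coordinator fails
second = elect_coordinator(nodes)   # node 3 takes over
```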

Page 60: Lecture 01

Distributed Algorithms: outline

Distributed agreement
  How to get a set of nodes to agree on a value
Distributed agreement is used, for instance,
  To determine which nodes are alive in the system
  To confine malicious behavior of some components (fault-tolerance again!)

Page 61: Lecture 01

Distributed Algorithms: outline

Replicated data management
  A key to high availability is to replicate components (data/files, servers, etc.)
We shall be concerned with
  Techniques for maintaining replicated data in a distributed system (database techniques)
  Atomic broadcast/multicast
  Membership

Page 62: Lecture 01

Distributed Algorithms: outline

Check-pointing and recovery
  Error recovery is essential for fault-tolerance
  When a processor fails and is then repaired, it will need to recover its state of the computation
  To enable recovery, check-pointing (recording the state in stable storage) is needed
  We will be concerned with techniques used for this in the context of distributed systems
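
The idea of recording state in stable storage can be sketched for a single process as follows. This is an assumption-laden illustration, not a technique from the course: the function names `checkpoint` and `recover`, the JSON file format, and the use of a local file as "stable storage" are all choices made here for the example.

```python
import json
import os
import tempfile


def checkpoint(state, path):
    """Write state to a temporary file, flush it to disk, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        json.dump(state, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())
    # The rename replaces the old checkpoint in one step, so a reader
    # sees either the old checkpoint or the new one, never a partial file.
    os.replace(tmp_path, path)


def recover(path, default):
    """Return the last checkpointed state, or default if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


workdir = tempfile.mkdtemp()
ckpt = os.path.join(workdir, "state.json")

state = {"processed": 0}
for item in range(1, 6):
    state["processed"] = item
    if item == 3:
        checkpoint(state, ckpt)   # record the state after item 3

# Simulate a crash: the in-memory state is lost, only the checkpoint survives.
restored = recover(ckpt, {"processed": 0})
```

After the simulated crash the process resumes from item 3, not from item 5, which is the price of checkpointing only occasionally.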

Page 63: Lecture 01

Background

Page 64: Lecture 01

Distributed system, distributed computing

Early computing was performed on a single processor. Uni-processor computing can be called centralized computing.

A distributed system is a collection of independent computers, interconnected via a network, capable of collaborating on a task.

Distributed computing is computing performed in a distributed system.

Page 65: Lecture 01

Distributed Systems

[Figure: the Internet, a network host, workstations, a local network]

Page 66: Lecture 01

Examples of Distributed systems

Network of workstations (NOW): a group of networked personal workstations connected to one or more server machines.
The Internet
An intranet: a network of computers and workstations within an organization, segregated from the Internet via a protective device (a firewall).

Page 67: Lecture 01

Computers in a Distributed System

Workstations: computers used by end-users to perform computing

Server machines: computers which provide resources and services

Personal Assistance Devices: handheld computers connected to the system via a wireless communication link.

Page 68: Lecture 01

Centralized vs. Distributed Computing

[Figure: centralized computing (a mainframe computer with terminals) compared with distributed computing (workstations and network hosts joined by network links)]

Page 69: Lecture 01

Evolution of paradigms

Client-server: Socket API, remote method invocation
Distributed objects
Object broker: CORBA
Network service: Jini
Object space: JavaSpaces
Mobile agents
Message oriented middleware (MOM): Java Message Service
Collaborative applications

Page 70: Lecture 01

Cooperative distributed computing projects

Cooperative distributed computing projects (also called distributed computing in some literature): these are projects that parcel out large-scale computing to workstations, often making use of surplus CPU cycles.
Example: SETI@home, a project to scan data retrieved by a radio telescope to search for radio signals from another world.

Page 71: Lecture 01

Why distributed computing?

Economics: distributed systems allow the pooling of resources, including CPU cycles, data storage, input/output devices, and services.

Reliability: a distributed system allows replication of resources and/or services, thus reducing service outage due to failures.

The Internet has become a universal platform for distributed computing.
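
The reliability point can be made concrete with a little arithmetic, under a simplifying assumption that is not in the slides: each of n replicas is available with probability p, independently of the others. The service is then unavailable only when all replicas are down at once.

```latex
A_{\text{service}} = 1 - (1 - p)^{n},
\qquad \text{e.g. } p = 0.9,\ n = 3:\quad 1 - (0.1)^{3} = 0.999 .
```

Real failures are often correlated (shared power, shared network), so this figure is an optimistic estimate and not a prediction.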

Page 72: Lecture 01

The Weaknesses and Strengths of Distributed Computing

In any form of computing, there is always a tradeoff between advantages and disadvantages.
Some of the reasons for the popularity of distributed computing:
  The affordability of computers and availability of network access
  Resource sharing
  Scalability
  Fault tolerance

Page 73: Lecture 01

The Weaknesses and Strengths of Distributed Computing

The disadvantages of distributed computing:
  Multiple points of failure: the failure of one or more participating computers, or one or more network links, can spell trouble.
  Security concerns: in a distributed system, there are more opportunities for unauthorized attack.
  Applications are difficult to develop

Page 74: Lecture 01

Introductory Basics

M. L. Liu

Page 75: Lecture 01

Basics in three areas

Some of the notations and concepts from these areas will be employed from time to time in the presentations for this course:
  Programming languages
  Operating systems
  Networks

Page 76: Lecture 01

Procedural versus Object-oriented Programming

In building network applications, there are two main classes of programming languages: procedural languages and object-oriented languages.
  Procedural languages, with the C language being the primary example, use procedures (functions) to break down the complexity of the tasks that an application entails.
  Object-oriented languages, exemplified by Java, use objects to encapsulate the details. Each object carries state data as well as behaviors. State data are represented as instance data. Behaviors are represented as methods.

Page 77: Lecture 01

Operating Systems Basics

Page 78: Lecture 01

Operating systems basics

A process consists of an executing program, its current values, state information, and the resources used by the operating system to manage its execution.

A program is an artifact constructed by a software developer; a process is a dynamic entity which exists only when a program is run.

Page 79: Lecture 01

Process State Transition Diagram

[Figure: simplified finite state diagram for a process's lifetime. States: start, ready, running, blocked, terminated. Transition labels: queued, dispatch, waiting for event, event completion, exit.]
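
The diagram can be written down as a small transition table. The sketch below assumes the usual reading of the labels (start to ready on "queued", ready to running on "dispatch", running to blocked on "waiting for event", blocked to ready on "event completion", running to terminated on "exit"). The names `TRANSITIONS` and `step` are made up for this example.

```python
# Allowed transitions, keyed by (current state, event).
TRANSITIONS = {
    ("start", "queued"): "ready",
    ("ready", "dispatch"): "running",
    ("running", "waiting for event"): "blocked",
    ("blocked", "event completion"): "ready",
    ("running", "exit"): "terminated",
}


def step(state, event):
    """Return the next state, or raise ValueError if the table has no such transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None


state = "start"
history = [state]
for event in ["queued", "dispatch", "waiting for event",
              "event completion", "dispatch", "exit"]:
    state = step(state, event)
    history.append(state)
```

Here `history` records one possible lifetime: the process blocks once on an event, is made ready again, runs, and exits.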

Page 80: Lecture 01

Example: Java processes

There are three types of Java program: applications, applets, and servlets. All are written as a class.
  A Java application program is run as an independent (standalone) process.
  An applet is run using a browser or the applet viewer.
  A servlet is run in the context of a web server.
A Java program is compiled into byte code, a universal object code. When run, the byte code is interpreted by the Java Virtual Machine (JVM).

Page 81: Lecture 01

Three Types of Java programs

Applications

A program whose byte code can be run on any system which has a Java Virtual Machine. An application may be standalone (monolithic) or distributed (if it interacts with another process).

Applets

A program whose byte code is downloaded from a remote machine and is run in the browser’s Java Virtual Machine.

Servlets

A program whose byte code resides on a remote machine and is run at the request of an HTTP client (a browser).

Page 82: Lecture 01

Three Types of Java programs

[Figure: three panels, each showing a Java object inside a Java Virtual Machine on a computer.]
A standalone Java application is run on a local machine.
An applet is an object downloaded (transferred) from a remote machine, then run on a local machine.
A servlet is an object that runs on a remote machine and interacts with a local program (a process) using a request-response protocol.

Page 83: Lecture 01

Concurrent Processing

On modern day operating systems, multiple processes appear to be executing concurrently on a machine by timesharing resources.

[Figure: timesharing of a resource. Processes P1 to P4 take turns over time.]

Page 84: Lecture 01

Concurrent processing within a process

It is often useful for a process to have parallel threads of execution, each of which timeshares the system resources in much the same way as concurrent processes.

[Figure: concurrent processing within a process. A parent process may spawn child processes. A process may spawn child threads (main thread, child thread 1, child thread 2).]

Page 85: Lecture 01

Thread-safe Programming

When two threads independently access and update the same data object, such as a counter, as part of their code, the updating needs to be synchronized. (See next slide.)

Because the threads are executed concurrently, it is possible for one of the updates to be overwritten by the other due to the sequencing of the two sets of machine instructions executed on behalf of the two threads.

To protect against this possibility, a synchronized method can be used to provide mutual exclusion.
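
The slide has Java's synchronized methods in mind. The same idea can be sketched in Python, where a `threading.Lock` plays the role of the synchronized method. The class name `Counter`, the helper `worker`, and the thread and iteration counts are all choices made for this illustration.

```python
import threading


class Counter:
    """Counter whose increment is guarded by a lock (mutual exclusion)."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may run the fetch-increment-store sequence.
        with self._lock:
            self.value += 1


def worker(counter, times):
    """Increment the shared counter the given number of times."""
    for _ in range(times):
        counter.increment()


counter = Counter()
threads = [threading.Thread(target=worker, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the lock in place, no update can be lost, so after all four threads finish the counter holds exactly 4 x 10,000 increments.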

Page 86: Lecture 01

Race Condition

[Figure: two executions of the same three-step counter update by two concurrent processes or threads, with time running downward.]

Each thread executes:
  1. fetch value in counter and load into a register
  2. increment value in register
  3. store value in register to counter

Execution 1: thread 1 runs steps 1 to 3, then thread 2 runs steps 1 to 3. This execution results in the value 2 in the counter.

Execution 2: the steps alternate (thread 1 fetch, thread 2 fetch, thread 1 increment, thread 2 increment, thread 1 store, thread 2 store). This execution results in the value 1 in the counter.
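
The two executions in the figure can be replayed deterministically, without real threads, by spelling out each schedule as a list of steps. This is only a sketch: the function `run`, the step names, and the assumption that the counter starts at 0 are introduced here for illustration.

```python
def run(schedule):
    """Replay a schedule of (thread, step) pairs against a shared counter.

    Each thread has its own register; the counter starts at 0.
    """
    counter = 0
    register = {}
    for thread, step in schedule:
        if step == "fetch":
            register[thread] = counter
        elif step == "increment":
            register[thread] += 1
        elif step == "store":
            counter = register[thread]
    return counter


STEPS = ["fetch", "increment", "store"]

# Thread 1 finishes all three steps before thread 2 starts.
serial = [(1, s) for s in STEPS] + [(2, s) for s in STEPS]

# The two threads alternate step by step, so both fetch the old value 0.
interleaved = [(t, s) for s in STEPS for t in (1, 2)]

serial_result = run(serial)
interleaved_result = run(interleaved)
```

In the interleaved schedule both registers end up holding 1, so the second store overwrites the first with the same value and one increment is lost.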

Page 87: Lecture 01

Network Basics

Page 88: Lecture 01

Network standards and protocols

On public networks such as the Internet, it is necessary for a common set of rules to be specified for the exchange of data.

Such rules, called protocols, specify such matters as the formatting and semantics of data, flow control, error correction.

Software can share data over the network using network software which supports a common set of protocols.

Page 89: Lecture 01

Protocols

A protocol is a set of rules that must be observed by the participants.

Protocols must be formally defined and precisely implemented. For each protocol, there must be rules that specify the following:

How is the data exchanged encoded?

How are events (sending, receiving) synchronized so that the participants can send and receive in a coordinated order?

The specification of a protocol does not dictate how the rules are to be implemented.
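
To illustrate the two questions, here is a toy protocol invented for this example (it is not a real standard). Encoding rule: each message is a 4-byte big-endian length followed by that many bytes of UTF-8 text. Synchronization rule: the client sends one request, then waits for exactly one reply. The function names `encode` and `decode` are hypothetical.

```python
import struct


def encode(message):
    """Encode text as a 4-byte big-endian length followed by UTF-8 bytes."""
    payload = message.encode("utf-8")
    return struct.pack(">I", len(payload)) + payload


def decode(data):
    """Decode one message from the front of data; return (text, remaining bytes)."""
    (length,) = struct.unpack(">I", data[:4])
    payload = data[4:4 + length]
    if len(payload) < length:
        raise ValueError("incomplete message")
    return payload.decode("utf-8"), data[4 + length:]


# One request followed by one reply, as they would appear on the wire.
wire = encode("TIME?") + encode("12:00")
request, rest = decode(wire)
reply, rest = decode(rest)
```

The rules say nothing about how `encode` and `decode` are written. Any implementation that produces and accepts the same bytes in the same order follows the protocol.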

Page 90: Lecture 01

The network architecture

Network hardware transfers electronic signals, which represent a bit stream, between two devices.

Modern day network applications require an application programming interface (API) which masks the underlying complexities of data transmission.

A layered network architecture allows the functionalities needed to mask the complexities to be provided incrementally, layer by layer.

Actual implementation of the functionalities may not be clearly divided by layer.

Page 91: Lecture 01

The OSI seven-layer network architecture

[Figure: the same seven-layer stack on each of two communicating hosts, from top to bottom]
application layer
presentation layer
session layer
transport layer
network layer
data link layer
physical layer

Page 92: Lecture 01

Network Architecture

The division of the layers is conceptual: the implementation of the functionalities need not be clearly divided as such in the hardware and software that implement the architecture.
The conceptual division serves at least two useful purposes:
  1. Systematic specification of protocols: it allows protocols to be specified systematically.
  2. Conceptual data flow: it allows programs to be written in terms of logical data flow.

Page 93: Lecture 01

The TCP/IP Protocol Suite

The Transmission Control Protocol/Internet Protocol suite is a set of network protocols which supports a four-layer network architecture. It is currently the protocol suite employed on the Internet.

[Figure: the Internet network architecture. The same four-layer stack on each of two hosts, from top to bottom]
Application layer
Transport layer
Internet layer
Physical layer

Page 94: Lecture 01

The TCP/IP Protocol Suite -2

The Internet layer implements the Internet Protocol, which provides the functionalities for allowing data to be transmitted between any two hosts on the Internet.

The Transport layer delivers the transmitted data to a specific process running on an Internet host.

The Application layer supports the programming interface used for building a program.

Page 95: Lecture 01

Network Resources

Network resources are resources available to the participants of a distributed computing community.

Network resources include hardware such as computers and equipment, and software such as processes, email mailboxes, files, web documents.

An important class of network resources is network services such as the World Wide Web and file transfer (FTP), which are provided by specific processes running on computers.

Page 96: Lecture 01

Identification of Network Resources

One of the key challenges in distributed computing is the unique identification of resources available on the network, such as e-mail mailboxes, and web documents. Addressing an Internet Host Addressing a process running on a host Email Addresses Addressing web contents: URL

Page 97: Lecture 01

Addressing Internet Hosts

Page 98: Lecture 01

The Internet Topology

[Figure: the Internet topology model. Subnets, the Internet backbone, an Internet host.]

The Internet Topology

The Internet consists of a hierarchy of networks, interconnected via a network backbone.

Each network has a unique network address.

Computers, or hosts, are connected to a network. Each host has a unique ID within its network.

Each process running on a host is associated with zero or more ports. A port is a logical entity for data transmission.

The Internet addressing scheme

In IP version 4, each address is 32 bits long. The address space accommodates 2^32 (about 4.3 billion) addresses in total. Addresses are divided into 5 classes (A through E).

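[Figure: layout of the five address classes across byte 0, byte 1, byte 2 and byte 3]

  class A address: leading bit 0, then the network address (rest of byte 0) and the host portion (bytes 1 to 3)
  class B address: leading bits 10, then the network address (rest of bytes 0 and 1) and the host portion (bytes 2 and 3)
  class C address: leading bits 110, then the network address (rest of bytes 0 to 2) and the host portion (byte 3)
  Multicast addresses: leading bits 1110, then the multicast group
  reserved addresses: leading bits 1111, reserved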

The Internet addressing scheme - 2

Subdividing the host portion of an Internet address:

[Figure: a class B address (leading bits 10) across byte 0 to byte 3. The network address occupies bytes 0 and 1; the host portion (bytes 2 and 3) is split into a subnet address and a local host address.]

A class A or class C address space can also be similarly subdivided.

Which portion of the host address is used for the subnet identification is determined by a subnet mask.
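
As a rough illustration of how a subnet mask splits an address, the sketch below uses Python's standard ipaddress module. The class B address 129.65.24.50 is borrowed from the example in these slides; the masks 255.255.255.0 and 255.255.0.0 and the helper name split_with_mask are assumptions chosen only for illustration.

```python
import ipaddress

def split_with_mask(address: str, mask: str):
    """Return (network-plus-subnet part, local host part) as dotted-decimal strings."""
    addr = int(ipaddress.IPv4Address(address))
    m = int(ipaddress.IPv4Address(mask))
    # Bits set in the mask select the network and subnet identification.
    subnet_part = ipaddress.IPv4Address(addr & m)
    # The remaining bits identify the local host within that subnet.
    host_part = ipaddress.IPv4Address(addr & ~m & 0xFFFFFFFF)
    return str(subnet_part), str(host_part)

subnet, host = split_with_mask("129.65.24.50", "255.255.255.0")
print(subnet)  # 129.65.24.0
print(host)    # 0.0.0.50
```

With the assumed mask 255.255.255.0, byte 2 of the class B host portion acts as the subnet address and byte 3 as the local host address.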

Example

Suppose the dotted-decimal notation for a particular Internet address is 129.65.24.50. The 32-bit binary expansion of the notation is as follows:

129.65.24.50 = 10000001 01000001 00011000 00110010

Since the leading bit sequence is 10, the address is a Class B address. Within the class, the network portion is identified by the remaining bits in the first two bytes, that is, 00000101000001, and the host portion is the value of the last two bytes, or 0001100000110010. For convenience, the binary prefix for class identification is often included as part of the network portion of the address, so that we would say that this particular address is at network 129.65 and then at host address 24.50 on that network.

Another Example

Given the address 224.0.0.1, one can expand it as follows:

224.0.0.1 = 11100000 00000000 00000000 00000001

The binary prefix of 1110 signifies that this is a class D, or multicast, address. Data packets sent to this address should therefore be delivered to the multicast group 0000000000000000000000000001.
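
The two worked examples can be reproduced with a few lines of Python. This is only a sketch of the classful scheme described in these slides; the function names to_binary and address_class are hypothetical and not part of any library.

```python
def to_binary(address: str) -> str:
    """Expand a dotted-decimal IPv4 address into four 8-bit groups."""
    return " ".join(f"{int(octet):08b}" for octet in address.split("."))

def address_class(address: str) -> str:
    """Classify an IPv4 address by the leading bits of byte 0 (classful scheme)."""
    first = int(address.split(".")[0])
    if first & 0b10000000 == 0:
        return "A"   # leading bit 0
    if first & 0b11000000 == 0b10000000:
        return "B"   # leading bits 10
    if first & 0b11100000 == 0b11000000:
        return "C"   # leading bits 110
    if first & 0b11110000 == 0b11100000:
        return "D"   # leading bits 1110 (multicast)
    return "E"       # leading bits 1111 (reserved)

for addr in ("129.65.24.50", "224.0.0.1"):
    print(addr, "=", to_binary(addr), "class", address_class(addr))
```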

The Internet Address Scheme - 3

For human readability, Internet addresses are written in a dotted decimal notation:

nnn.nnn.nnn.nnn, where each nnn group is a decimal value in the range of 0 through 255

# Internet host table (found in /etc/hosts file)

127.0.0.1 localhost

129.65.242.5 falcon.csc.calpoly.edu falcon loghost

129.65.241.9 falcon-srv.csc.calpoly.edu falcon-srv

129.65.242.4 hornet.csc.calpoly.edu hornet

129.65.241.8 hornet-srv.csc.calpoly.edu hornet-srv

129.65.54.9 onion.csc.calpoly.edu onion

129.65.241.3 hercules.csc.calpoly.edu hercules
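
A host table like the one above is a simple text mapping from addresses to names and aliases. The following Python sketch shows one way such a table could be read; the function name parse_hosts is hypothetical, and the sample text reuses three entries from the slide rather than reading a real /etc/hosts file.

```python
HOSTS_TEXT = """\
# Internet host table (found in /etc/hosts file)
127.0.0.1 localhost
129.65.242.5 falcon.csc.calpoly.edu falcon loghost
129.65.54.9 onion.csc.calpoly.edu onion
"""

def parse_hosts(text: str) -> dict:
    """Map every host name and alias to its dotted-decimal address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue                          # skip blank or comment-only lines
        address, *names = line.split()
        for name in names:
            table[name] = address
    return table

table = parse_hosts(HOSTS_TEXT)
print(table["falcon"])  # 129.65.242.5
```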

IP version 6 Addressing Scheme

Each address is 128 bits long. There are three types of addresses:

  Unicast: An identifier for a single interface.
  Anycast: An identifier for a set of interfaces (typically belonging to different nodes). A packet sent to an anycast address is delivered to one of the interfaces identified by that address.
  Multicast: An identifier for a set of interfaces (typically belonging to different nodes). A packet sent to a multicast address is delivered to all interfaces identified by that address.

See Request for Comments: 2373 http://www.faqs.org/rfcs/ (link is in book’s reference)
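
For readers who want to experiment, Python's standard ipaddress module can inspect IPv6 addresses. The sketch below is illustrative only; the sample addresses ff02::1 (a multicast group) and ::1 (the loopback unicast address) are assumptions, not taken from the slides. Anycast addresses are drawn from the unicast address space, so they cannot be told apart from unicast addresses by inspection.

```python
import ipaddress

group = ipaddress.IPv6Address("ff02::1")     # assumed sample: a multicast address
loopback = ipaddress.IPv6Address("::1")      # assumed sample: a unicast address

print(group.max_prefixlen)    # 128, the address length in bits
print(group.exploded)         # full form with all eight 16-bit groups
print(group.is_multicast)     # True
print(loopback.is_multicast)  # False
```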

The Domain Name System (DNS)

Each Internet address is mapped to a symbolic name, using the DNS, in the format of:

<computer-name>.<subdomain hierarchy>.<organization>.<sector name>{.<country code>}

e.g., www.csc.calpoly.edu.us

[Figure: the DNS name tree. Below the root are the top-level domains (com, edu, gov, net, org, mil in the U.S., plus country codes); below each top-level domain is the organization, then its subdomains, then the host name.]

A top-level domain name has to be applied for. The subdomain hierarchy and names are assigned by the organization.

The Domain Name System

For network applications, a domain name must be mapped to its corresponding Internet address.

Processes known as domain name system servers provide the mapping service, based on a distributed database of the mapping scheme.

The mapping service is offered by thousands of DNS servers on the Internet, each responsible for a portion of the name space, called a zone. A server that has access to the DNS information (zone file) for a zone is said to have authority for that zone.

Domain Name Hierarchy

[Figure: the domain name hierarchy. At the top is "." (the root domain). Below it are the country-code domains (.au, .ca, .us, ..., .zw) and the domains .com, .gov, .edu, .mil, .net, ..., .org. Under .edu are organizations such as ucsb.edu and calpoly.edu, and under those are subdomains such as csc, ee, english, wireless, cs and ece.]

Name lookup and resolution

If a domain name is used to address a host, its corresponding IP address must be obtained for the lower-layer network software.

The mapping, or name resolution, must be maintained in some registry.

For runtime name resolution, a network service is needed; a protocol must be defined for the naming scheme and for the service. Example: The DNS service supports the DNS; the Java RMI registry supports RMI object lookup; JNDI is a network service lookup protocol.
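
As a small sketch of runtime name resolution, the Python code below asks the system resolver (which typically consults the local host table and DNS) for the addresses of a name, using the standard socket.getaddrinfo call. The helper name resolve is hypothetical, and the name "localhost" is used only because it normally resolves without network access.

```python
import socket

def resolve(name: str) -> list:
    """Return the distinct IP addresses the system resolver reports for a name."""
    infos = socket.getaddrinfo(name, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})

try:
    print(resolve("localhost"))
except socket.gaierror as err:
    # Resolution fails when no registry knows the name or no name service is reachable.
    print("lookup failed:", err)
```

The obtained address, not the name, is what the lower-layer network software then uses.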

Addressing a process running on a host

Logical Ports

[Figure: processes on host A and host B, each attached to one or more ports, communicating across the Internet.]

Each host has 65536 ports.

Well Known Ports

Each Internet host has 2^16 (65,536) logical ports. Each port is identified by a number between 0 and 65535 (port 0 is reserved), and can be allocated to a particular process.

Port numbers between 1 and 1023 are reserved for processes which provide well-known services such as finger, FTP, HTTP, and email.

Well-known ports

Protocol              Port    Service
echo                  7       IPC testing
daytime               13      provides the current date and time
ftp                   21      file transfer protocol
telnet                23      remote, command-line terminal session
smtp                  25      simple mail transfer protocol
time                  37      provides a standard time
finger                79      provides information about a user
http                  80      web server
RMI Registry          1099    registry for Remote Method Invocation
special web server    8080    web server which supports servlets, JSP, or ASP

Assignment of some well-known ports

Choosing a port to run your program

For programming: when a port is needed, choose a random number above the well-known ports, in the range 1,024 to 65,535.

To provide a network service for the community, arrange to have a port assigned to and reserved for your service.
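
In practice a program does not have to guess a free number itself. The sketch below, in Python, swaps in a different technique from the one on the slide: it binds to port 0, which asks the operating system to pick an unused port above the well-known range. The helper name open_listener and the use of the loopback address 127.0.0.1 are assumptions for illustration.

```python
import socket

def open_listener(port: int = 0) -> socket.socket:
    """Bind a TCP socket on the loopback interface; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", port))
    sock.listen()
    return sock

listener = open_listener()
print("listening on port", listener.getsockname()[1])
listener.close()
```

Passing an explicit number such as open_listener(5000) corresponds to the slide's advice of choosing a port above 1,023; the call fails if another process already holds that port.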

Addressing a Web Document

The Uniform Resource Identifier (URI)

Resources to be shared on a network need to be uniquely identifiable.

On the Internet, a URI is a character string which allows a resource to be located.

There are two types of URIs:

  URL (Uniform Resource Locator) points to a specific resource at a specific location.
  URN (Uniform Resource Name) points to a specific resource at a nonspecific location.

URL

A URL has the format of: protocol://host address[:port]/directory path/file name#section

A sample URL:

http://www.csc.calpoly.edu:8080/~mliu/CSC369/hw.html#hw1

  protocol of server: http
  host name: www.csc.calpoly.edu
  port number of server process: 8080
  directory path: ~mliu/CSC369
  file name: hw.html
  section name: hw1

Other protocols that can appear in a URL are: file, ftp, gopher, news, telnet, WAIS
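
The parts of the sample URL can be picked apart programmatically. The following Python sketch uses the standard urllib.parse module; splitting the path into a directory path and a file name with rpartition is an illustrative choice, not something the URL format itself mandates.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.csc.calpoly.edu:8080/~mliu/CSC369/hw.html#hw1")
# Everything up to the last "/" is treated as the directory path, the rest as the file name.
directory, _, file_name = parts.path.rpartition("/")

print(parts.scheme)    # http
print(parts.hostname)  # www.csc.calpoly.edu
print(parts.port)      # 8080
print(directory)       # /~mliu/CSC369
print(file_name)       # hw.html
print(parts.fragment)  # hw1
```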

More on URL

The path in a URL is relative to the document root of the server.

A URL may appear in a document in a relative form:

<a href="another.html">

and the actual URL referred to will be another.html preceded by the protocol, host name, and directory path of the document.
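
The resolution of a relative URL can be sketched with urljoin from Python's standard urllib.parse module. The base URL below is the sample URL from the previous slide, assumed here to be the address of the document that contains the link.

```python
from urllib.parse import urljoin

# Assumed address of the document in which the relative link appears.
base = "http://www.csc.calpoly.edu:8080/~mliu/CSC369/hw.html"

print(urljoin(base, "another.html"))
# http://www.csc.calpoly.edu:8080/~mliu/CSC369/another.html

print(urljoin(base, "/index.html"))
# http://www.csc.calpoly.edu:8080/index.html
```

The second call shows that a path starting with "/" is taken relative to the document root of the server rather than to the directory of the current document.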

Summary - 1

We discussed the following topics:

  What is meant by distributed computing
  Distributed system
  Basic concepts in operating system: processes and threads

Summary - 2

Basic concepts in data communication:

  Network architectures: the OSI model and the Internet model
  Connection-oriented communication vs. connectionless communication

Naming schemes for network resources:

  The Domain Name System (DNS)
  Protocol port numbers
  Uniform Resource Identifier (URI)
  Email addresses