2002-08-15 S. Haridi, CS2104, Lecture 01
Distributed Systems, cs5223
Lecture 01 (2004-01-06)
Seif Haridi
Department of Computer Science, NUS
[email protected]
Overview
• Organization
• Course overview
• Getting started (introduction to distributed systems and distributed algorithms)
Organization/Objectives
Objectives
Understand some of the fundamental aspects of distributed systems
Overview of systems aspects (half of the course)
Focus is on algorithmic aspects (half of the course)
Learn how to read/present research papers
Non-objectives
• Learning in detail about all middleware for constructing distributed applications
• Learning how to program distributed applications:
  • Web services
  • Java and distributed computing
  • Mozart and distributed computing
• For these topics, look at:
  • M.L. Liu, Distributed Computing
  • P. Van Roy and S. Haridi, Concepts, Techniques and Models of Computer Programming
Distributed Systems, CS5223
Assessment:
• Written final exam: 60%
• Midterm exam: 20%
• Assignments: 20%
Module homepage: http://www.comp.nus.edu.sg/~cs5223 and IVLE
Teaching:
• Lectures
• Consultation using IVLE; come any time
Teacher
Course responsible (lectures): Seif Haridi, [email protected]
Lectures
Held by me
Lecture Structure
• Reminder of last lecture
• Overview
• Content
• Summary
• Reading suggestions
Material
Lectures are mainly based on two books:
• (DS) Andrew S. Tanenbaum, Maarten van Steen, Distributed Systems: Principles and Paradigms, Prentice-Hall, 2002.
• (DA) Randy Chow and Theodore Johnson, Distributed Operating Systems & Algorithms, Addison Wesley, 1997, ISBN 0-201-49838-3.
Copies should be available (now or soon) at the CO-OP.
The handouts are in most cases self-explanatory and are available from the webpage.
Some scientific papers will also be used.
Other recommended material
• Coulouris, Dollimore, Kindberg, Distributed Systems: Concepts and Design, Addison-Wesley (3rd edition)
• M.L. Liu, Distributed Computing: Principles and Applications, Addison Wesley
• Nancy Lynch, Distributed Algorithms
Reading Suggestions
Will be available on the webpage (Lectures). Initially:
• Chapter 1 of Tanenbaum (DS)
Assignments
There will be one assignment: you will either study one or two research papers or do a programming assignment.
• One discussion group per assignment
• Solutions are to be submitted through IVLE
• There is a deadline for each assignment
General information
Reading of papers:
• In groups of two or three
• Each group will read one or two research papers
For each paper studied:
• Identify the problem
• Explain the solution(s) presented in the paper
• Identify positive and negative aspects of the paper
• Propose your own solution, if any
• Provide a report
• Give a presentation to the class
Assignment Groups
Assignments are done via IVLE. Is everybody subscribed to IVLE?
Use IVLE
• Email only in exceptional cases: [email protected]
• Questions:
  • There is a discussion group for each book chapter/lecture
  • There is a discussion group for general matters
• Submit your assignments using the corresponding Workbin (IVLE)
Feedback in General
Approach me directly (any time) or arrange an appointment.
Do not be afraid!
Questions and Using Brakes!
Please do ask questions during the lectures: should I repeat an explanation, give a better explanation, or give an example?
Please say when things go too fast! Please say when things go too slow!
Background Knowledge
I assume some knowledge of:
• Programming languages: C/Java
• Operating systems: basic concepts
• Networking: basic concepts
• Algorithms and data structures
I will try to be as elementary as possible. Ask me if you lack some knowledge.
Course Overview
What is a distributed system
Distributed system
[Figure: A simplified view. Nodes (node: processor/process, possibly running several threads) connected by communication channels over a communication medium.]
Distributed system
Set of computing nodes that cooperate in order to achieve a well-defined goal
Nodes cooperate through communication
Communication is by message passing at the fundamental level
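A minimal sketch of cooperation by message passing, assuming Python threads stand in for nodes and `queue.Queue` objects stand in for communication channels (on a real system each channel would be a network connection). Node A sends a request to node B and blocks until the reply arrives; the nodes never read or write each other's state, only the channels.

```python
import queue
import threading

# Each node has its own inbox (its incoming channel) and its own private log.
inbox_a, inbox_b = queue.Queue(), queue.Queue()
log_a, log_b = [], []

def node_a():
    inbox_b.put(("A", "ping"))             # send: put a message on B's channel
    log_a.append(inbox_a.get(timeout=5))   # receive: block until B answers

def node_b():
    sender, msg = inbox_b.get(timeout=5)   # wait for a request
    log_b.append((sender, msg))
    inbox_a.put(("B", "pong"))             # reply on A's channel

threads = [threading.Thread(target=node_a), threading.Thread(target=node_b)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(log_a, log_b)   # [('B', 'pong')] [('A', 'ping')]
```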
Distributed System
A distributed system is one or more applications running on a collection of independent computers that appears to its users as a single coherent system.
What is a Distributed System?
• Distributed hardware:
  • n processing elements (PE), each a processor + memory
  • Interconnected by some network
  • No shared memory
• Distributed software:
  • No centralized OS; each PE has its own copy of the OS
  • No physically centralized file system
  • Means for inter-process communication
• Distributed applications
Why distributed systems?
• Information exchange (collaborative work)
• Resource sharing (e.g. printers, backup storage, disk units, etc.)
• Resource sharing (applications, information, media, services)
• Cost reduction
• Increased availability (partial failure)
• Increased performance through parallelism, ...
Main characteristics
• No shared memory between nodes: each node has its own memory, and communication is by message passing
• No global clock: each node has its own clock
• It is impossible for a node to obtain an instantaneous global state of the system
Examples of Distributed Systems
• Airline reservation system
• Bank automated teller machine network
• CSCW (Computer Supported Cooperative Work)
• Intranet
• Internet
• Mobile computing
[Figure: A typical portion of the Internet: intranets, an ISP, the backbone, a satellite link, desktop computers, servers, and network links.]
A typical intranet
[Figure: A local area network with desktop computers, an email server, a Web server, a file server, and print and other servers, connected through a router/firewall to the rest of the Internet.]
How Distributed Systems are built?
• A number of computers connected by a network
• A distribution middleware service layer that gives a uniform view of the nodes and hides some of the network and distribution aspects
• Applications on top of the middleware service layer (using a programming system)
Middleware view
A distributed system is often organized as a layer on top of the local operating systems.
Goals of a Distributed System
• Transparency: hide the fact that processes and resources are physically distributed
• Scalability: distributed systems should be easy to expand
• Availability: distributed systems should be continuously available
• Openness: new users/components can be added to the system; incremental and independent augmentation by independent developer teams
Transparency
Ideally, a distributed application (system) should look like a conventional centralized system, with no distinction between local and remote resources.
This is the user view. The developer view is different:
• Network aware: knows the cost of distributing programming entities (e.g. objects)
• Has means to control the distribution behavior
Transparency
• Access transparency: hide differences in data representation and in how a resource is accessed; hides the heterogeneity of the underlying nodes
• Location transparency: hide where a resource/service is located
• Migration transparency: hide that a resource/service may move to another location
Transparency
• Relocation transparency: hide that a resource may be moved to another location (machine/node) while in use
• Failure transparency: hide the failure and recovery of a resource
• Concurrency transparency: hide that a resource may be shared by a number of competing users/processes
Transparency
Transparency    Description
Access          Hide differences in data representation and how a resource is accessed
Location        Hide where a resource is located
Migration       Hide that a resource may move to another location
Relocation      Hide that a resource may be moved to another location while in use
Replication     Hide that a resource is replicated
Concurrency     Hide that a resource may be shared by several competitive users
Failure         Hide the failure and recovery of a resource
Persistence     Hide whether a (software) resource is in memory or on disk
Scalability
• Size: add more users and resources/components
• Distance: cope with geographically dispersed resources/users
• Management: span independent administrative organizations, each with local management
Scalability Problems (Size)
Examples of scalability limitations:
Concept                   Example
Centralized services      A single server for all users
Centralized data          A single database for location information
Centralized algorithms    All requests go through one process
Scaling Techniques I
[Figure 1.4: Offloading the server by sending form-processing procedures to the client.]
Scaling Techniques II
• Distributed algorithms:
  • No process has complete information about the system
  • Processes make decisions based on local information
  • Failure of one process does not ruin the whole system
  • No implicit assumption of exactly synchronized clocks (a global clock)
Scaling Techniques II
[Figure 1.5: An example of dividing the DNS name space into zones.]
Scalability Problems (Distance)
• Long communication delays
• Programming techniques for local area networks (LANs) do not really work for wide area networks (WANs):
  • Synchronous communication such as remote procedure calls (RPC) is not suitable
  • Asynchronous message passing is more appropriate
Scalability Problems (Distance)
• A WAN has unreliable communication media and cannot exploit broadcast communication; only point-to-point communication is available
• Locating a service on a WAN is more difficult than on a LAN: on a LAN, just broadcast a service identifier and wait for a response
Scalability Problems (Different Administrative Organizations)
Different and conflicting policies for:
• Resource usage
• Management of the system
• Security: WHO has access to WHAT resources? Can I trust a non-local system administrator?
Scalability Problems (Different Administrative Organizations)
• Protect the distributed system (DS) from administrative domains 1 & 2
• Protect domains 1 & 2 from the DS
• Example: Grid computing (GGF)
[Figure: a distributed system (DS) spanning Admin Domain 1 and Admin Domain 2.]
Focus of the Distributed systems part (Basics)
Components of distributed systems:
• Inter-process communication
• Processes, threads, client/servers, code migration, software agents
• Naming services
Focus of the Distributed systems part (Middleware)
Examples of middleware for building DS:
• Distributed object-based systems: CORBA, Distributed COM, GLOBE
• Distributed coordination-based systems
• Security
Focus of the Distributed systems part (Infrastructures)
• Distributed file systems
• Distributed document-based systems
Focus of the Distributed Algorithms part
• Models of computation
• Techniques for coordination of processes
• Techniques for high availability:
  • Fault tolerance
  • Reliable group communication
  • Distributed agreement
• Techniques for scalability:
  • Consistency models
  • Replication techniques
Distributed Algorithms
Distributed Algorithms
• How to design distributed algorithms:
  • Study of some fundamental problems
  • Analysis of distributed algorithms
• How to achieve fault tolerance in a distributed system:
  • Fault tolerance: the ability of a system to provide useful service despite the failure of some of its components
  • Very important for high availability
Why study distributed algorithms?
Distributed algorithms are the backbone of distributed computing systems.
They are essential for the implementation of distributed systems:
• Distributed operating systems
• Distributed databases, communication systems
• Real-time process-control systems
• Transportation, etc.
Classes of distributed algorithms
• Fully decentralized: fault-tolerant, but more difficult in general
• With a centralized coordinator: conceptually simpler, but a single point of failure and a bottleneck; requires efficient mechanisms for selecting a new coordinator if the current one fails
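The slide does not name a particular mechanism for selecting a new coordinator. As one well-known example, the sketch below simulates the bully algorithm in a single Python process: the initiator sends ELECTION to every live process with a higher ID, any such process answers OK and takes over, and the process that gets no answer announces itself as COORDINATOR. It assumes integer process IDs, perfect failure detection, and that every live process answers; messages are only recorded in a trace, not sent over a network.

```python
def bully_election(initiator, alive):
    """Simulate a bully election started by `initiator`.

    `alive` is the set of IDs of processes that are up. Returns the new
    coordinator and a trace of (message, sender, receiver) tuples.
    """
    trace = []
    candidate = initiator
    while True:
        higher = sorted(p for p in alive if p > candidate)
        for p in higher:
            trace.append(("ELECTION", candidate, p))
            trace.append(("OK", p, candidate))   # a live higher process answers
        if not higher:
            break                                # no answer: candidate wins
        # Simplification: all answering processes would start their own
        # elections concurrently; following the lowest one gives the same
        # winner, the highest live ID.
        candidate = higher[0]
    for p in sorted(alive - {candidate}):
        trace.append(("COORDINATOR", candidate, p))
    return candidate, trace

alive = {1, 2, 4, 5, 6}          # process 7, the old coordinator, has crashed
winner, trace = bully_election(4, alive)
print(winner)                    # 6
```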
References
Text book:
• Randy Chow and Theodore Johnson, Distributed Operating Systems & Algorithms, Addison Wesley, 1997
Others:
• Nancy A. Lynch, Distributed Algorithms, 1996
• Research papers
Distributed Algorithms: Models of Distributed Computation
• Causality: ordering of events, logical clocks (timestamps), causal communication
• Distributed snapshots: detecting stable properties, diffusing computation
• Modeling a distributed computation: expressing correctness properties of a distributed algorithm
• Failures in a distributed system
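As a first taste of logical clocks, here is a minimal sketch of Lamport's clock rules for one process: increment the counter before every local or send event, and on receipt set it to max(local, received) + 1, so that a receive is always timestamped after the matching send. The two processes are plain Python objects and the "message" is just the timestamp passed between method calls, an assumption made to keep the example short.

```python
class LamportClock:
    """Lamport logical clock kept by a single process (sketch)."""

    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the message carries the new timestamp.
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # Move past the sender's timestamp so that send happens-before receive.
        self.time = max(self.time, msg_time) + 1
        return self.time

p, q = LamportClock(), LamportClock()
p.local_event()      # p: 1
t = p.send()         # p: 2, message stamped 2
q.local_event()      # q: 1
r = q.receive(t)     # q: max(1, 2) + 1 = 3
print(t, r)          # 2 3
```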
Distributed Algorithms: outline
• Synchronization: distributed mutual exclusion, needed to regulate accesses to a common resource that can be used by only one process at a time
• Election: used, for instance, to designate a new coordinator when the current coordinator fails
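The simplest way to obtain distributed mutual exclusion is to let a centralized coordinator arbitrate: a process sends REQUEST, the coordinator replies GRANT if nobody holds the resource and otherwise queues the request, and on RELEASE it grants the resource to the next queued process. The sketch below is a single-process simulation of that idea, not necessarily the algorithm the course develops; method calls stand in for messages.

```python
from collections import deque

class Coordinator:
    """Centralized mutual exclusion coordinator (simulation sketch)."""

    def __init__(self):
        self.holder = None        # process currently allowed to use the resource
        self.waiting = deque()    # FIFO queue of blocked requesters

    def request(self, pid):
        if self.holder is None:
            self.holder = pid
            return "GRANT"
        self.waiting.append(pid)
        return None               # no reply yet: the requester stays blocked

    def release(self, pid):
        assert pid == self.holder, "only the holder may release"
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder        # process granted next, if any

c = Coordinator()
print(c.request(1))   # GRANT
print(c.request(2))   # None (queued)
print(c.request(3))   # None (queued)
print(c.release(1))   # 2
print(c.release(2))   # 3
```

Note that the coordinator is exactly the single point of failure mentioned for centralized algorithms: if it crashes, an election is needed to replace it.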
Distributed Algorithms: outline
• Distributed agreement: how to get a set of nodes to agree on a value
• Distributed agreement is used, for instance:
  • To determine which nodes are alive in the system
  • To confine malicious behavior of some components (fault tolerance again!)
Distributed Algorithms: outline
• Replicated data management: a key to high availability is to replicate components (data/files, servers, etc.)
• We shall be concerned with:
  • Techniques for maintaining replicated data in a distributed system (database techniques)
  • Atomic broadcast/multicast
  • Membership
Distributed Algorithms: outline
• Check-pointing and recovery: error recovery is essential for fault tolerance
  • When a processor fails and is then repaired, it needs to recover its state of the computation
  • To enable recovery, check-pointing (recording the state in stable storage) is needed
  • We will be concerned with techniques used for this in the context of distributed systems
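A minimal single-process sketch of check-pointing, assuming a local file stands in for stable storage and JSON is an acceptable state format. Writing to a temporary file and then calling `os.replace` means a crash in the middle of a checkpoint leaves the previous checkpoint intact. Coordinating the checkpoints of several processes so that together they form a consistent global state is the harder, distributed part and is not shown here.

```python
import json
import os
import tempfile

def checkpoint(state, path):
    """Record `state` in 'stable storage' (here: a local file) atomically."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)   # same directory, same file system
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())    # force the data to disk before renaming
    os.replace(tmp, path)       # readers see either the old or the new checkpoint

def recover(path, initial):
    """After a restart, resume from the last checkpoint if one exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return initial

path = os.path.join(tempfile.mkdtemp(), "proc.ckpt")
state = recover(path, {"step": 0, "total": 0})
for step in range(state["step"], 5):
    state = {"step": step + 1, "total": state["total"] + step}
    checkpoint(state, path)
# ... the process crashes and is repaired; on restart it recovers:
print(recover(path, {"step": 0, "total": 0}))   # {'step': 5, 'total': 10}
```

Because the loop starts from `state["step"]`, a restarted process continues from its last checkpoint instead of from the beginning.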
Background
Distributed system, distributed computing
Early computing was performed on a single processor. Uni-processor computing can be called centralized computing.
A distributed system is a collection of independent computers, interconnected via a network, capable of collaborating on a task.
Distributed computing is computing performed in a distributed system.
Distributed Systems
[Figure: workstations on a local network, connected through the Internet to other network hosts.]
Examples of Distributed systems
• Network of workstations (NOW): a group of networked personal workstations connected to one or more server machines.
• The Internet.
• An intranet: a network of computers and workstations within an organization, segregated from the Internet via a protective device (a firewall).
Computers in a Distributed System
Workstations: computers used by end-users to perform computing
Server machines: computers which provide resources and services
Personal Assistance Devices: handheld computers connected to the system via a wireless communication link.
Centralized vs. Distributed Computing
[Figure: Centralized computing (terminals connected to a mainframe computer) versus distributed computing (workstations and network hosts connected by network links).]
Evolution of paradigms
• Client-server: socket API, remote method invocation
• Distributed objects
• Object broker: CORBA
• Network service: Jini
• Object space: JavaSpaces
• Mobile agents
• Message-oriented middleware (MOM): Java Message Service
• Collaborative applications
Cooperative distributed computing projects
Cooperative distributed computing projects (also called distributed computing in some literature): these are projects that parcel out large-scale computing to workstations, often making use of surplus CPU cycles. Example: seti@home: project to scan data retrieved by a radio telescope to search for radio signals from another world.
Why distributed computing?
Economics: distributed systems allow the pooling of resources, including CPU cycles, data storage, input/output devices, and services.
Reliability: a distributed system allows replication of resources and/or services, thus reducing service outages due to failures.
The Internet has become a universal platform for distributed computing.
The Weaknesses and Strengths of Distributed Computing
In any form of computing, there is always a tradeoff between advantages and disadvantages.
Some of the reasons for the popularity of distributed computing:
• The affordability of computers and the availability of network access
• Resource sharing
• Scalability
• Fault tolerance
The Weaknesses and Strengths of Distributed Computing
The disadvantages of distributed computing:
• Multiple points of failure: the failure of one or more participating computers, or of one or more network links, can spell trouble.
• Security concerns: in a distributed system, there are more opportunities for unauthorized attack.
• Applications are difficult to develop.
Introductory Basics
M. L. Liu
Basics in three areas
Some of the notations and concepts from these areas will be employed from time to time in the presentations for this course: Programming Languages Operating systems Networks.
Procedural versus Object-oriented Programming
In building network applications, there are two main classes of programming languages: procedural languages and object-oriented languages.
Procedural languages, with the C language being the primary example, use procedures (functions) to break down the complexity of the tasks that an application entails.
Object-oriented languages, exemplified by Java, use objects to encapsulate the details. Each object carries state data as well as behaviors: state data are represented as instance data, and behaviors are represented as methods.
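To make the contrast concrete, the sketch below writes the same counter in both styles. It uses Python, which supports both, rather than C and Java; the names are illustrative only.

```python
# Procedural style: the data and the functions that operate on it are separate.
def make_counter():
    return {"value": 0}

def increment(counter):
    counter["value"] += 1

# Object-oriented style: the object encapsulates its state (instance data)
# together with its behavior (methods).
class Counter:
    def __init__(self):
        self.value = 0          # instance data

    def increment(self):        # method
        self.value += 1

c1 = make_counter()
increment(c1)
c2 = Counter()
c2.increment()
print(c1["value"], c2.value)    # 1 1
```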
Operating Systems Basics
Operating systems basics
A process consists of an executing program, its current values, state information, and the resources used by the operating system to manage its execution.
A program is an artifact constructed by a software developer; a process is a dynamic entity which exists only when a program is run.
Process State Transition Diagram
Simplified finite state diagram for a process's lifetime:
• start → ready (queued)
• ready → running (dispatch)
• running → ready (queued)
• running → blocked (waiting for event)
• blocked → ready (event completion)
• running → terminated (exit)
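The state diagram can be encoded as a transition table. The sketch below assumes the diagram's states (start, ready, running, blocked, terminated) and events (queued, dispatch, waiting for event, event completion, exit) are connected as in a typical process lifecycle; the exact arrows are written out in `TRANSITIONS`. Any event the table does not allow in the current state is rejected.

```python
# (current state, event) -> next state, for the simplified process lifecycle.
TRANSITIONS = {
    ("start", "queued"): "ready",
    ("ready", "dispatch"): "running",
    ("running", "queued"): "ready",              # e.g. time slice used up
    ("running", "waiting for event"): "blocked",
    ("blocked", "event completion"): "ready",
    ("running", "exit"): "terminated",
}

def run_lifecycle(events, state="start"):
    """Apply `events` in order and return the list of states visited."""
    history = [state]
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"illegal event {event!r} in state {state!r}")
        state = TRANSITIONS[key]
        history.append(state)
    return history

print(run_lifecycle(["queued", "dispatch", "waiting for event",
                     "event completion", "dispatch", "exit"]))
# ['start', 'ready', 'running', 'blocked', 'ready', 'running', 'terminated']
```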
Example: Java processes
There are three types of Java program: applications, applets, and servlets, all of which are written as classes. A Java application program is run as an independent (standalone) process. An applet is run using a browser or the applet viewer. A servlet is run in the context of a web server.
A Java program is compiled into byte code, a universal object code. When run, the byte code is interpreted by the Java Virtual Machine (JVM).
Three Types of Java programs
Applications
A program whose byte code can be run on any system that has a Java Virtual Machine. An application may be standalone (monolithic) or distributed (if it interacts with another process).
Applets
A program whose byte code is downloaded from a remote machine and is run in the browser’s Java Virtual Machine.
Servlets
A program whose byte code resides on a remote machine and is run at the request of an HTTP client (a browser).
Three Types of Java programs
[Figure: A standalone Java application (a Java object in a Java Virtual Machine) is run on a local machine. An applet is an object downloaded (transferred) from a remote machine, then run on a local machine. A servlet is an object that runs on a remote machine and interacts with a local program using a request-response protocol.]
Concurrent Processing
On modern day operating systems, multiple processes appear to be executing concurrently on a machine by timesharing resources.
[Figure: Timesharing of a resource: processes P1, P2, P3 and P4 take turns using the resource over time.]
Concurrent processing within a process
It is often useful for a process to have parallel threads of execution, each of which timeshares the system resources in much the same way as concurrent processes.
[Figure: Concurrent processing within a process. A parent process may spawn child processes; a process (its main thread) may spawn child threads.]
Thread-safe Programming
When two threads independently access and update the same data object, such as a counter, as part of their code, the updating needs to be synchronized. (See next slide.)
Because the threads are executed concurrently, it is possible for one of the updates to be overwritten by the other, due to the interleaving of the two sequences of machine instructions executed on behalf of the two threads.
To protect against this possibility, a synchronized method can be used to provide mutual exclusion.
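Python has no `synchronized` keyword; the closest analogue of a Java synchronized method is to hold a `threading.Lock` for the whole fetch, increment and store. A minimal sketch, assuming four threads that each increment one shared counter 10,000 times:

```python
import threading

class SafeCounter:
    """Counter whose update is protected by a lock (mutual exclusion)."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:        # only one thread at a time reads and writes value
            self.value += 1

counter = SafeCounter()

def worker(n):
    for _ in range(n):
        counter.increment()

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)   # 40000
```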
Race Condition
Two concurrent processes or threads each increment a shared counter, initially 0. Each increment takes three machine instructions: fetch the value in the counter and load it into a register; increment the value in the register; store the value in the register to the counter.
• Thread 1 executes all three instructions, then thread 2 executes all three. This execution results in the value 2 in the counter.
• Both threads fetch the counter, then both increment their registers, then both store. This execution results in the value 1 in the counter.
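Real races depend on timing and are hard to reproduce, so the sketch below replays the two executions deterministically. It models each thread's fetch, increment and store as separate steps on a private register and runs them in an order given by a schedule list, which is a hypothetical stand-in for the operating system's scheduling decisions.

```python
def run(schedule):
    """Run two threads' counter increments in the order given by `schedule`.

    Each entry of `schedule` (1 or 2) says which thread executes its next
    instruction: fetch, then increment, then store.
    """
    counter = 0
    registers = {1: None, 2: None}
    steps = {1: 0, 2: 0}              # next instruction index per thread
    for tid in schedule:
        if steps[tid] == 0:           # fetch value in counter into register
            registers[tid] = counter
        elif steps[tid] == 1:         # increment value in register
            registers[tid] += 1
        else:                         # store value in register to counter
            counter = registers[tid]
        steps[tid] += 1
    return counter

print(run([1, 1, 1, 2, 2, 2]))   # 2: one thread finishes before the other starts
print(run([1, 2, 1, 2, 1, 2]))   # 1: both fetch 0, so one update is lost
```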
Network Basics
Network standards and protocols
On public networks such as the Internet, it is necessary for a common set of rules to be specified for the exchange of data.
Such rules, called protocols, specify such matters as the formatting and semantics of data, flow control, and error correction.
Software can share data over the network using network software which supports a common set of protocols.
Protocols
A protocol is a set of rules that must be observed by the participants.
Protocols must be formally defined and precisely implemented. For each protocol, there must be rules that specify the following:
• How is the data exchanged encoded?
• How are events (sending, receiving) synchronized so that the participants can send and receive in a coordinated order?
The specification of a protocol does not dictate how the rules are to be implemented.
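As a small illustration of an encoding rule, the sketch below defines a hypothetical wire format: each message is a 4-byte big-endian length followed by UTF-8 text. Because the length comes first, the receiver knows where each message ends and can tell when it must wait for more bytes. This is one possible implementation of such a rule, not a standard protocol.

```python
import struct

def encode(message):
    """Frame one message: 4-byte big-endian length, then the UTF-8 payload."""
    payload = message.encode("utf-8")
    return struct.pack("!I", len(payload)) + payload

def decode_stream(data):
    """Split a received byte stream back into complete messages."""
    messages = []
    offset = 0
    while offset + 4 <= len(data):
        (length,) = struct.unpack_from("!I", data, offset)
        start = offset + 4
        if start + length > len(data):
            break                 # incomplete message: wait for more bytes
        messages.append(data[start:start + length].decode("utf-8"))
        offset = start + length
    return messages

stream = encode("HELLO") + encode("BYE")
print(decode_stream(stream))      # ['HELLO', 'BYE']
```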
The network architecture
Network hardware transfers electronic signals, which represent a bit stream, between two devices.
Modern day network applications require an application programming interface (API) which masks the underlying complexities of data transmission.
A layered network architecture allows the functionalities needed to mask the complexities to be provided incrementally, layer by layer.
Actual implementation of the functionalities may not be clearly divided by layer.
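The layering idea can be sketched as encapsulation: on the way down, each layer adds its own header to whatever the layer above handed it; on the way up, each layer strips only its own header. The three layer names and the text headers below are hypothetical simplifications; as noted above, real implementations need not be divided this cleanly.

```python
LAYERS = ["transport", "network", "link"]   # top to bottom (illustrative only)

def send_down(data: str) -> str:
    """Wrap the data in one header per layer, top layer first."""
    for layer in LAYERS:
        data = f"[{layer}]" + data
    return data

def pass_up(frame: str) -> str:
    """Strip the headers again, bottom layer first."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"{layer} layer: unexpected header")
        frame = frame[len(header):]
    return frame

frame = send_down("hi")
print(frame)            # [link][network][transport]hi
print(pass_up(frame))   # hi
```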
The OSI seven-layer network architecture
[Figure: two communicating hosts, each with the seven layers: application, presentation, session, transport, network, data link, and physical.]
Network Architecture
The division of the layers is conceptual: the implementation of the functionalities need not be clearly divided as such in the hardware and software that implements the architecture. The conceptual division serves at least two useful purposes:
1. Systematic specification of protocols: it allows protocols to be specified systematically.
2. Conceptual data flow: it allows programs to be written in terms of logical data flow.
The TCP/IP Protocol Suite
The Transmission Control Protocol/Internet Protocol suite is a set of network protocols which supports a four-layer network architecture. It is currently the protocol suite employed on the Internet.
[Figure: The Internet network architecture: two hosts, each with an application layer, transport layer, Internet layer, and physical layer.]
The TCP/IP Protocol Suite -2
The Internet layer implements the Internet Protocol, which provides the functionalities for allowing data to be transmitted between any two hosts on the Internet.
The Transport layer delivers the transmitted data to a specific process running on an Internet host.
The Application layer supports the programming interface used for building a program.
Network Resources
Network resources are resources available to the participants of a distributed computing community.
Network resources include hardware such as computers and equipment, and software such as processes, email mailboxes, files, web documents.
An important class of network resources is network services such as the World Wide Web and file transfer (FTP), which are provided by specific processes running on computers.
Identification of Network Resources
One of the key challenges in distributed computing is the unique identification of resources available on the network, such as e-mail mailboxes and web documents:
• Addressing an Internet host
• Addressing a process running on a host
• Email addresses
• Addressing web contents: URL
Addressing Internet Hosts
The Internet Topology
[Figure: The Internet topology model: subnets, each containing Internet hosts, connected to the Internet backbone.]
The Internet Topology
The Internet consists of a hierarchy of networks, interconnected via a network backbone.
Each network has a unique network address. Computers, or hosts, are connected to a network. Each host has a unique ID within its network.
Each process running on a host is associated with zero or more ports. A port is a logical entity for data transmission.
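A minimal sketch with Python's `socket` module on the loopback address 127.0.0.1: the IP address identifies the host, and the port identifies the endpoint owned by the receiving process. Binding to port 0 lets the operating system pick a free port; UDP is used only to keep the example short.

```python
import socket

# Receiving process: owns a socket bound to (host, port).
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))        # port 0: the OS chooses a free port
host, port = receiver.getsockname()
receiver.settimeout(5)

# Sending process: addresses the data to the (host, port) pair.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", (host, port))

data, source = receiver.recvfrom(1024)
print(data)                            # b'hello' (the port number varies per run)
sender.close()
receiver.close()
```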
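The Internet addressing scheme
In IP version 4, each address is 32 bits long. The address space accommodates 2^32 (about 4.3 billion) addresses in total. Addresses are divided into 5 classes (A through E):
[Figure: IPv4 address classes over bytes 0 to 3. Class A: leading bit 0, network address in byte 0, host portion in bytes 1 to 3. Class B: prefix 10, network address in bytes 0 and 1, host portion in bytes 2 and 3. Class C: prefix 110, network address in bytes 0 to 2, host portion in byte 3. Multicast addresses: prefix 1110 followed by the multicast group. Reserved addresses: prefix 1111.]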
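The class of an address follows from the leading bits of its first byte. A minimal sketch, assuming a well-formed dotted-decimal string and the prefixes 0, 10, 110, 1110 and 1111 for classes A to E:

```python
def ipv4_class(address):
    """Classify a dotted-decimal IPv4 address by the leading bits of byte 0."""
    first_byte = int(address.split(".")[0])
    if first_byte >> 7 == 0b0:
        return "A"
    if first_byte >> 6 == 0b10:
        return "B"
    if first_byte >> 5 == 0b110:
        return "C"
    if first_byte >> 4 == 0b1110:
        return "D (multicast)"
    return "E (reserved)"

for a in ["10.1.2.3", "129.65.24.50", "192.168.0.1", "224.0.0.1", "250.0.0.1"]:
    print(a, ipv4_class(a))
# A, B, C, D (multicast), E (reserved) respectively
```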
The Internet addressing scheme - 2
[Figure: Subdividing the host portion of an Internet address. In a class B address, the prefix 10 and the network address occupy bytes 0 and 1; the host portion in bytes 2 and 3 is split into a subnet address and a local host address.]
A class A or class C address space can also be subdivided similarly.
Which portion of the host address is used for subnet identification is determined by a subnet mask.
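The subnet mask is applied with a bitwise AND. The sketch assumes, purely for illustration, that the class B network 129.65 uses the whole third byte as subnet address, i.e. the mask 255.255.255.0; the mask actually used at that site is not given here. Python's standard `ipaddress` module can perform the same computation.

```python
import ipaddress

addr = ipaddress.IPv4Address("129.65.24.50")
mask = ipaddress.IPv4Address("255.255.255.0")   # hypothetical subnet mask

# Bits where the mask is 1 select network + subnet; the rest is the local host.
subnet = ipaddress.IPv4Address(int(addr) & int(mask))
local_host = int(addr) & (0xFFFFFFFF ^ int(mask))
print(subnet, local_host)    # 129.65.24.0 50

# The same result using a network object (strict=False allows host bits).
net = ipaddress.IPv4Network("129.65.24.50/255.255.255.0", strict=False)
print(net)                   # 129.65.24.0/24
```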
Example
Suppose the dotted-decimal notation for a particular Internet address is 129.65.24.50. The 32-bit binary expansion of the notation is as follows:
129.65.24.50 = 10000001 01000001 00011000 00110010
Since the leading bit sequence is 10, the address is a Class B address. Within the class, the network portion is identified by the remaining bits in the first two bytes, that is, 00000101000001, and the host portion is given by the values in the last two bytes, or 0001100000110010. For convenience, the binary prefix for class identification is often included as part of the network portion of the address, so that we would say that this particular address is at network 129.65 and then at host address 24.50 on that network.
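The expansion and the split into network and host portions can be checked mechanically. A minimal sketch, assuming a well-formed dotted-decimal class B address:

```python
def to_binary(address):
    """32-bit binary expansion of a dotted-decimal IPv4 address, byte by byte."""
    return " ".join(f"{int(byte):08b}" for byte in address.split("."))

def split_class_b(address):
    """Network portion (after the 10 prefix) and host portion of a class B address."""
    bits = to_binary(address).replace(" ", "")
    if not bits.startswith("10"):
        raise ValueError("not a class B address")
    return bits[2:16], bits[16:]

print(to_binary("129.65.24.50"))
# 10000001 01000001 00011000 00110010
network, host = split_class_b("129.65.24.50")
print(network, host)
# 00000101000001 0001100000110010
```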
Another Example
Given the address 224.0.0.1, one can expand it as follows:
224.0.0.1 = 11100000 00000000 00000000 00000001
The binary prefix 1110 signifies that this is a class D, or multicast, address. Data packets sent to this address should therefore be delivered to the multicast group 0000 0000 0000 0000 0000 0000 0001.
The Internet Address Scheme - 3
For human readability, Internet addresses are written in a dotted decimal notation:
nnn.nnn.nnn.nnn, where each nnn group is a decimal value in the range of 0 through 255
# Internet host table (found in /etc/hosts file)
127.0.0.1 localhost
129.65.242.5 falcon.csc.calpoly.edu falcon loghost
129.65.241.9 falcon-srv.csc.calpoly.edu falcon-srv
129.65.242.4 hornet.csc.calpoly.edu hornet
129.65.241.8 hornet-srv.csc.calpoly.edu hornet-srv
129.65.54.9 onion.csc.calpoly.edu onion
129.65.241.3 hercules.csc.calpoly.edu hercules
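Each line of the host table maps one address to a canonical name followed by optional aliases, and `#` starts a comment. The sketch below parses text in that format into a name-to-address lookup table. It is a simplified illustration, not the system resolver: `parse_hosts` is a hypothetical name, and keeping the first entry for a duplicated name is a choice made for this sketch.

```python
def parse_hosts(text: str) -> dict[str, str]:
    """Map every host name and alias in hosts-file text to its IP address."""
    table: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name, address)  # this sketch keeps the first entry
    return table


SAMPLE = """\
# Internet host table
127.0.0.1 localhost
129.65.242.5 falcon.csc.calpoly.edu falcon loghost
129.65.241.9 falcon-srv.csc.calpoly.edu falcon-srv
"""

hosts = parse_hosts(SAMPLE)
print(hosts["falcon"])      # 129.65.242.5
print(hosts["falcon-srv"])  # 129.65.241.9
```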
IP version 6 Addressing Scheme
Each address is 128 bits long. There are three types of addresses:
Unicast: an identifier for a single interface.
Anycast: an identifier for a set of interfaces (typically belonging to different nodes). A packet sent to an anycast address is delivered to one of the interfaces identified by that address (the nearest one, according to the routing protocols' measure of distance).
Multicast: an identifier for a set of interfaces (typically belonging to different nodes). A packet sent to a multicast address is delivered to all interfaces identified by that address.
See Request for Comments: 2373 http://www.faqs.org/rfcs/ (link is in book’s reference)
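Python's `ipaddress` module can show the 128-bit length and recognize multicast addresses, which begin with the byte 0xff (ff00::/8). The addresses below are illustrative choices: ff02::1 is the link-local all-nodes multicast address, and 2001:db8::/32 is reserved for documentation. Anycast addresses are allocated from the unicast address space, so they cannot be told apart from unicast addresses by their syntax alone.

```python
import ipaddress

multicast = ipaddress.IPv6Address("ff02::1")    # link-local all-nodes group
unicast = ipaddress.IPv6Address("2001:db8::1")  # documentation prefix

print(multicast.max_prefixlen)  # 128 bits per address
print(multicast.exploded)       # ff02:0000:0000:0000:0000:0000:0000:0001
print(multicast.is_multicast)   # True: starts with 0xff
print(unicast.is_multicast)     # False
```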
The Domain Name System (DNS)
Each Internet address is mapped to a symbolic name, using the DNS, in the format of:
<computer-name>.<subdomain hierarchy>.<organization>.<sector name>{.<country code>}
e.g., www.csc.calpoly.edu.us
[Figure: the DNS name tree, from the root through the top-level domains (com, edu, gov, net, org, mil in the U.S., or a country code) down to organization, subdomain, and host name.]
Top-level domain names have to be applied for. The subdomain hierarchy and names are assigned by the organization.
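The name format above can be taken apart label by label. This is a minimal sketch under a loudly simplifying assumption: a final two-letter label is treated as a country code, which is a heuristic for illustration only and not how DNS itself interprets names. `split_domain_name` is a hypothetical helper.

```python
def split_domain_name(name: str) -> dict[str, str]:
    """Split host.[subdomains.]organization.sector[.country] into its parts."""
    labels = name.rstrip(".").split(".")
    # Simplifying assumption for this sketch: a final two-letter label is a country code.
    country = labels.pop() if len(labels[-1]) == 2 else ""
    if len(labels) < 3:
        raise ValueError(f"expected host.[subdomains.]organization.sector: {name}")
    host, *subdomains, organization, sector = labels
    return {
        "host": host,
        "subdomains": ".".join(subdomains),
        "organization": organization,
        "sector": sector,
        "country": country,
    }


print(split_domain_name("www.csc.calpoly.edu.us"))
# {'host': 'www', 'subdomains': 'csc', 'organization': 'calpoly', 'sector': 'edu', 'country': 'us'}
```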
The Domain Name System
For network applications, a domain name must be mapped to its corresponding Internet address. Processes known as domain name system servers provide the mapping service, based on a distributed database of the mapping scheme.
The mapping service is offered by thousands of DNS servers on the Internet, each responsible for a portion of the name space, called a zone. The servers that have access to the DNS information (zone file) for a zone are said to have authority for that zone.
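Which zone is responsible for a name can be found by matching the name against zone suffixes, most specific first. This sketch assumes a hypothetical, locally known set of zones; real DNS resolution instead follows referrals from the root servers downward, so this shows only the longest-suffix idea behind zone authority. `authoritative_zone` is an illustrative name.

```python
def authoritative_zone(name: str, zones: list[str]) -> str:
    """Return the most specific known zone that contains name (longest suffix match)."""
    labels = name.rstrip(".").lower().split(".")
    known = {z.rstrip(".").lower() for z in zones}
    # Try the full name first, then strip one leftmost label at a time.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in known:
            return candidate
    return "."  # nothing more specific is known: the root zone


# Hypothetical zone cuts, for illustration only.
ZONES = ["edu", "calpoly.edu", "csc.calpoly.edu", "ucsb.edu"]

print(authoritative_zone("hornet.csc.calpoly.edu", ZONES))  # csc.calpoly.edu
print(authoritative_zone("www.ucsb.edu", ZONES))            # ucsb.edu
print(authoritative_zone("example.org", ZONES))             # .
```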
Domain Name Hierarchy
[Figure: the domain name hierarchy. Below the root domain (.) are the top-level domains .com, .gov, .edu, .mil, .net, .org, and country codes such as .au, .ca, .us, .zw. Below .edu are organization domains such as ucsb.edu and calpoly.edu, which are in turn divided into subdomains such as csc, ee, english, wireless, cs, and ece.]
Name lookup and resolution
If a domain name is used to address a host, its corresponding IP address must be obtained for the lower-layer network software.
The mapping, or name resolution, must be maintained in some registry.
For runtime name resolution, a network service is needed; a protocol must be defined for the naming scheme and for the service. Examples: the DNS service supports the DNS; the Java RMI registry supports RMI object lookup; JNDI (Java Naming and Directory Interface) is a Java API for looking up objects through network naming and directory services.
Addressing a process running on a host
Logical Ports
[Figure: processes on host A and host B, each using a logical port on its own host, communicating over the Internet.]
Each host has 65536 ports.
Well Known Ports
Each Internet host has 2^16 (65,536) logical ports, numbered 0 through 65535. Port 0 is reserved, so a usable port is identified by a number between 1 and 65535 and can be allocated to a particular process.
Port numbers between 1 and 1023 are reserved for processes which provide well-known services such as finger, FTP, HTTP, and email.
Well-known ports
Protocol            Port  Service
echo                7     IPC testing
daytime             13    provides the current date and time
ftp                 21    file transfer protocol
telnet              23    remote, command-line terminal session
smtp                25    simple mail transfer protocol
time                37    provides a standard time
finger              79    provides information about a user
http                80    web server
RMI Registry        1099  registry for Remote Method Invocation
special web server  8080  web server which supports servlets, JSP, or ASP
Assignment of some well-known ports
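Checking the table against the reserved range shows that not every listed port is in it. The sketch copies the assignments into a dictionary; the keys `rmiregistry` and `http-alt` are labels chosen for this sketch, and `is_well_known` is a hypothetical helper encoding the 1 to 1023 rule from the previous slide.

```python
# Port assignments copied from the table (service label -> port).
SERVICES = {
    "echo": 7, "daytime": 13, "ftp": 21, "telnet": 23, "smtp": 25,
    "time": 37, "finger": 79, "http": 80,
    "rmiregistry": 1099, "http-alt": 8080,  # labels chosen for this sketch
}

WELL_KNOWN_MAX = 1023  # ports 1-1023 are reserved for well-known services


def is_well_known(port: int) -> bool:
    """True if the port lies in the reserved well-known range."""
    return 1 <= port <= WELL_KNOWN_MAX


for name, port in SERVICES.items():
    print(f"{name:12} {port:5}  in well-known range: {is_well_known(port)}")
```

Only the RMI registry (1099) and the special web server (8080) fall outside the reserved range, even though both are conventional assignments.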
Choosing a port to run your program
For programming: when a port is needed, choose a random number above the well-known ports, that is, in the range 1024 to 65535.
To provide a network service for the community, arrange to have a port assigned to and reserved for your service.
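The "pick a random port above 1023" advice can be sketched with Python's standard `socket` and `random` modules: try a random number, and if the bind fails because the port is taken, try another. `bind_random_port` and its retry count are assumptions of this sketch. Binding to port 0 is a common alternative that lets the operating system pick a free port.

```python
import random
import socket

WELL_KNOWN_MAX = 1023
MAX_PORT = 65535


def bind_random_port(host: str = "127.0.0.1", attempts: int = 20) -> socket.socket:
    """Bind a TCP socket to a randomly chosen port above the well-known range."""
    for _ in range(attempts):
        port = random.randint(WELL_KNOWN_MAX + 1, MAX_PORT)  # inclusive bounds
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock  # the caller owns the socket and must close it
        except OSError:
            sock.close()  # port already in use: try another number
    raise RuntimeError("no free port found")


sock = bind_random_port()
print("bound to port", sock.getsockname()[1])
sock.close()
```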
Addressing a Web Document
The Uniform Resource Identifier (URI)
Resources to be shared on a network need to be uniquely identifiable.
On the Internet, a URI is a character string which allows a resource to be located.
There are two types of URIs:
URL (Uniform Resource Locator) points to a specific resource at a specific location.
URN (Uniform Resource Name) points to a specific resource at a nonspecific location.
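The difference shows up when the two kinds of URI are split with Python's `urllib.parse.urlsplit`. A URL carries a network location (host), while a URN only names the resource. The URN here is an illustrative choice: it names the Chow and Johnson textbook by its ISBN, without saying where a copy can be found.

```python
from urllib.parse import urlsplit

url = urlsplit("http://www.csc.calpoly.edu/")
urn = urlsplit("urn:isbn:0-201-49838-3")

print(url.scheme, url.netloc)                  # http www.csc.calpoly.edu
print(urn.scheme, repr(urn.netloc), urn.path)  # urn '' isbn:0-201-49838-3
```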
URL
A URL has the format of: protocol://host address[:port]/directory path/file name#section
A sample URL:
http://www.csc.calpoly.edu:8080/~mliu/CSC369/hw.html#hw1
Here http is the protocol of the server, www.csc.calpoly.edu is the host name, 8080 is the port number of the server process, ~mliu/CSC369 is the directory path, hw.html is the file name, and hw1 is the section name.
Other protocols that can appear in a URL are file, ftp, gopher, news, telnet, and WAIS.
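The same decomposition can be done with Python's standard `urllib.parse.urlsplit`. It returns the path as one string, so this sketch separates the directory path from the file name with `rpartition`.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.csc.calpoly.edu:8080/~mliu/CSC369/hw.html#hw1")
directory, _, filename = parts.path.rpartition("/")  # split off the last path segment

print(parts.scheme)    # http
print(parts.hostname)  # www.csc.calpoly.edu
print(parts.port)      # 8080
print(directory)       # /~mliu/CSC369
print(filename)        # hw.html
print(parts.fragment)  # hw1
```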
More on URL
The path in a URL is relative to the document root of the server.
A URL may appear in a document in a relative form:
<a href="another.html">
and the actual URL referred to will be another.html preceded by the protocol, host name, and directory path of the referring document.
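Resolving a relative reference against the referring document's URL is what Python's `urllib.parse.urljoin` does. This sketch assumes, for illustration, that the referring document is the sample URL from the previous slide (without its section name).

```python
from urllib.parse import urljoin

# Assumed referring document: the sample URL from the previous slide.
base = "http://www.csc.calpoly.edu:8080/~mliu/CSC369/hw.html"

print(urljoin(base, "another.html"))
# http://www.csc.calpoly.edu:8080/~mliu/CSC369/another.html
print(urljoin(base, "../notes.html"))
# http://www.csc.calpoly.edu:8080/~mliu/notes.html
```

The protocol, host name, port, and directory path all come from the base; only the last path segment is replaced, and `..` moves up one directory.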
Summary - 1
We discussed the following topics:
What is meant by distributed computing
Distributed systems
Basic concepts in operating systems: processes and threads
Summary - 2
Basic concepts in data communication:
Network architectures: the OSI model and the Internet model
Connection-oriented communication vs. connectionless communication
Naming schemes for network resources:
The Domain Name System (DNS)
Protocol port numbers
Uniform Resource Identifier (URI)
Email addresses