Distributed Systems Introduction and background

CSCI652.002 Spring 2014 B&K 1

Distributed Systems

Introduction and background

Mohan Kumar


Course information

http://www.cs.rit.edu/~hpb/Lectures/20135/652/index.html


Requirements

• CSCI-352 Operating Systems or equivalent, and CSCI-603 Advanced C++ and Program Design or equivalent


Course Content

• Issues and challenges in distributed systems, including: communication, distributed processes, naming and name services, synchronization, consistency and replication, transactions, fault tolerance and recovery, security, distributed objects, and distributed file systems.


Outcomes

• Build a solid foundation in distributed systems.
• Outcomes:
  – Understand fundamental concepts of distributed computing systems.
  – Understand modern distributed systems – P2P, mobile, pervasive, sensor etc.
  – Recognize the importance of addressing challenges in modern systems to facilitate distributed computing.
  – Develop distributed programs on real systems.
  – More (you tell us at the end of semester)


Attendance

• Class participation: ACTIVE

Participation will prepare students for midterms. Students are expected to interact actively during lectures. All students are expected to solve homework problems and engage in class discussions.


Course material

• Reference Books
• Slides by Coulouris et al.
  – www.cdk4.net
• PowerPoint slides and whiteboard notes prepared by the professors
  – Students are expected to read the corresponding chapters from the textbook prior to each class (please see tentative schedule).
  – PPT slides prepared by the professors may or may not be available before class, but they will be made available after class.
• Reference books and articles


Course organization

• The course has two main themes.
• Distributed Algorithms
  – Distributed processes/objects, interprocess communication, remote procedure call, coordination, file systems, clocks and global states, security, concurrency, shared memory, transactions and replication.
• Systems
  – Operating systems, distributed file systems, name services, case studies, implementations, P2P, security
  – Plan 9 System


Textbook and References

Textbook
• Distributed Systems: Concepts and Design, George Coulouris, Jean Dollimore and Tim Kindberg, Addison Wesley, 4th Edition, 5th Edition
  – An e-version of the 5th edition is available on Kindle

References
• Distributed Systems: Principles and Paradigms, A. S. Tanenbaum and M. van Steen, Pearson Publishers, 2nd Edition.
• Distributed Operating Systems & Algorithms, R. Chow and T. Johnson, Addison-Wesley, 1997.
• Related articles: details will be provided during the course


Grading

• The structure of quizzes will be discussed in class, at least one week prior to the quiz.
  – Midterm 1: 15%
  – Midterm 2: 15%
  – Final Exam: 30%
• Group Work (project, presentation, report and class participation): 40%.
• Group Presentations: Will be scheduled during the last week of semester.
• Group Work Reports: Due at 9 am May 10, 2014.
• Each Group will have 3 members; Groups to be formed before February 15.
• Group Work: Problems will be assigned by February 25 and the expected date of completion is May 10.


What is a distributed system?

• Concurrent components
  – Independent
  – Use message passing to communicate and coordinate
• Lack of global clock
  – Asynchronous
• Independent failures of components
  – Good for fault-tolerance


“A distributed system is a collection of independent computers that appears to its users as a single coherent system” (Tanenbaum and Van Steen, Distributed Systems, 2007).

• Application developers can focus on developing applications rather than system issues
• The distributed system should be
  – Easy to expand or scale
  – Available all the time
  – Accessible uniformly
  – Fault-tolerant


Layered representation

[Figure: layered stack, top to bottom]
• Applications and services
• Middleware
• Operating System
• Communications Network
• Hardware

Figure annotations: mask heterogeneity; provide abstraction, transparency; uniformity. Figure label: PLATFORM


Motivation

• Resource sharing
  – CPU
  – Disk
  – Software services
  – Databases
• Fault-tolerance
  – Redundancy
  – Replication


Challenges

• Heterogeneity
• Transparency, openness
• Security and privacy
• Scalability
• Failure handling
• Concurrency of components


Modern Distributed Systems

• Mobility
  – Wireless communications
    • WiFi, Bluetooth, Zigbee, LTE, WiMax, Cellular
• Ubiquity
  – Small, but multifunctional devices
    • Cell phones, sensors, RFIDs
• Large scale
  – Components
  – Data
  – Users


Enablers

• Computer Technology
  – Advanced microprocessors
  – Multi-core architectures
  – Lower costs (CPU, memory, peripheral devices)
• High-speed networks
  – Wired and wireless
• Applications
  – Business
  – Scientific
  – Everything else ….


Examples of Distributed Systems

• The Internet
• Intranets
• Mobile and Ubiquitous systems
• Grid Computers
• Pervasive Systems
• Sensor Systems
• P2P Networks

[Figure labels: Airlines, Aircraft, Car, Building, University]


Recent Developments

Wireless ad hoc networking
• Novel algorithms and schemes developed
• Cooperation in the absence of infrastructure

Pervasive computing
• Context-aware services to users/applications
• Smart environments

Distributed resources
• Mobile devices possess a myriad of resources

Opportunistic communications
• Exchange of packets/bundles

Social networks and computing
• Exploit gregarious nature of humans


Fading Distinctions

Servers and clients
• Distributed systems, P2P systems
• Cost and time

Producers and consumers of information
• Users are producers of information as well
• User with a cell phone camera

Service providers and consumers
• Resources on user devices can be exploited

Resourceful and resource-poor entities
• Servers, desktops, laptops, mobile phones
• Grid computing
• Cyber foraging

The challenge is to provide a uniform view


What is a distributed system?

• Concurrent components
  – Independent
  – Use message passing to communicate and coordinate
• Lack of global clock
  – Asynchronous
• Independent failures of components
  – Good for fault-tolerance


Concurrency

• Program execution
• Access to resources
• Message passing
• Coordination
• Resource sharing

Coordination of concurrently executing programs


No Global Clock

• Clocks of different components are not synchronized
• Asynchronous
• Concurrent programs coordinate their actions by passing messages


Event ordering

• Lamport’s logical ordering
  – X sends m1 before Y receives m1
  – Y sends m2 before X receives m2
  – Because we know replies are sent after receiving messages
  – That is, m2 is a reply to m1
  – Y receives m1 before sending m2
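The ordering above can be illustrated with a Lamport logical clock. The following is a minimal sketch, not taken from the slides: the class name `LamportClock` and its method names are illustrative assumptions. Each process keeps a counter, increments it on a send, and on a receive jumps past the timestamp carried by the message.

```python
class LamportClock:
    """Logical clock: a counter that orders events without real time."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event or a send."""
        self.time += 1
        return self.time

    def receive(self, sent_time):
        """Merge the timestamp carried by an incoming message."""
        self.time = max(self.time, sent_time) + 1
        return self.time


# Two processes, X and Y, as in the slide.
x, y = LamportClock(), LamportClock()

send_m1 = x.tick()            # X sends m1
recv_m1 = y.receive(send_m1)  # Y receives m1
send_m2 = y.tick()            # Y sends m2, the reply to m1
recv_m2 = x.receive(send_m2)  # X receives m2

print(send_m1, recv_m1, send_m2, recv_m2)  # 1 2 3 4
```

The timestamps increase along the chain send m1, receive m1, send m2, receive m2, which matches the order argued in the bullets without either process reading a physical clock.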


Time services

• Global time consensus is needed to
  – Coordinate distributed activities
    • File backup
    • Expiration time of a received message/data
  – Event related activities
    • When an event occurs or has already occurred
    • How long did it take
    • Which event occurred first


Clocks

• Physical clock
  – Approximation of real-time
• Logical clock
  – Preserves ordering of events


Independent Failures

• Distributed systems can fail in multiple ways
  – CPU/memory of one or more components
  – Network link/s
  – Programs might stop executing
    • E.g., input/output, synchronization
  – System components may get isolated


Resource sharing

• Hierarchy
• Processors, Disks
  – Shared data
  – Shared webpages
    • Search engine
    • Weather channel
    • Currency converter


Services

• Manage resources
• Present functionalities of resources to users and applications
  – Coherent to applications/users
• Examples
  – File service
  – Mail service
  – FTP service
• Client-server architectures
  – Service may access resources remotely
  – Clients connect to servers
    • Utilize services


Basic applications

• Remote login
  – Keyboard and display interface
  – Virtual terminal support
    • telnet, rlogin
• File transfer
  – File, file structures, file attributes
    • E.g., FTP
• Messaging
  – Send and receive
  – Email, SMTP
• Browsing
  – Information retrieval
• Remote execution
  – Execute a program on a remote server
    • E.g., MIME – multipurpose Internet mail extension


System models

• Architectural models
  – Client-server model
  – Peer-to-peer model
• Functional models
  – Interaction model
  – Failure model
  – Security model


Architecture

• Structural organization of various components
  – Simple abstraction of components
  – Two main objectives
    • Placements
      – Network topology
      – Data distribution
    • Interrelationships
      – Patterns of communications
      – Relationships between data objects
      – Data access patterns, dependencies


Peer-to-Peer and Client/server variations

• Peer-to-peer
  – No distinction among peers
  – Excellent scalability compared to C-S
  – Resources are utilized in a distributed network, and more efficiently.
    • Minimize bottleneck points
• Variations
  – Multiple servers
    • Each server specializes in providing a particular service
      – E.g., web servers, DNS server, authentication etc.
  – Proxy servers
    • Enhance availability
    • Reduce latency
  – Caches
    • Objects cached to reduce latency
  – Mobile code and mobile agents
    • Mobile code (e.g., applet) downloaded to client’s site
      – Local interactions, fast response as there are no communication delays
    • Mobile agents include code and data
      – Go around and execute on different processors


Goals

• Efficiency
  – Propagation delays, communications
  – Overlapped computation/communication
  – Efficient distributed processing and load sharing
• Flexibility
  – User friendly
  – Ability to evolve and migrate
    • Modularity, scalability, portability, and interoperability
• Consistency
  – Predictability and uniformity in system behavior
  – Integrity in concurrency control and failure handling
• Robustness
  – Ability to handle exceptional situations and errors
    • Change in topology, lost message, crashed system etc.
  – Reliability, protection and access control
    • Secure and privacy preserving


Design requirements

• Performance
  – Responsiveness
    • Access to shared resources
  – Communication delays
  – Server loads, scheduling, wait periods
  – Control switching
  – Load balancing
  – Combined computation/communication scheduling
  – Scalability
  – Fault-tolerance


Transparency

• Ability to hide/mask all system details from users/application developers
  – System details are irrelevant to users/developers
  – System details are very relevant to system managers
• Creates the illusion of the model the system is supposed to present

This is in contrast to the meaning of transparency in English: open, visible, see through etc.

[Figure: the layered representation (Applications and services; Middleware; Operating System; Communications Network; Hardware), annotated with: mask heterogeneity; provide abstraction, transparency; uniformity. Figure label: PLATFORM]


Basic Processes

• Server
  – Accepts inputs from other processes
  – Performs a service
  – Returns outcomes
• Client
  – User/application level
  – Makes requests, receives results

The roles of server and client may change with time

• Peer
  – All are equal
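The server and client roles above can be sketched inside one program, with threads standing in for the two processes and queues standing in for the message channel. This is only an illustration under those assumptions; the names `requests`, `reply_box` and `server`, and the upper-casing "service", are made up for the example.

```python
import queue
import threading

requests = queue.Queue()  # channel carrying requests to the server


def server():
    """Accept inputs, perform a service, return outcomes."""
    while True:
        item = requests.get()
        if item is None:          # shutdown signal
            break
        payload, reply_box = item
        reply_box.put(payload.upper())  # the "service" is upper-casing text


server_thread = threading.Thread(target=server, daemon=True)
server_thread.start()

# Client: makes a request and receives the result.
reply_box = queue.Queue()
requests.put(("hello", reply_box))
result = reply_box.get(timeout=5)
print(result)  # HELLO

requests.put(None)        # ask the server to stop
server_thread.join(timeout=5)
```

In a real distributed system the two roles would run on different machines and the queues would be replaced by network messages, but the request/reply pattern is the same.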


Processes

• A process is a program in execution
  – Sequential
    • A single control block regulates the execution
      – A control block contains state information: program counters, register contents, stack pointers, communication ports, file descriptors etc.
      – Process control block (PCB)
  – Concurrent
    • Simultaneously interacting sequential processes are said to be concurrent
    • Asynchronous
    • Separate address space and PCBs
    • Components may interact through communication/synchronization

[Figure: three processes, each with its own PCB]


Threads

• A lightweight process
  – Threads of a process share the same address space, but have their own registers
  – A thread control block (or TCB) is local to a thread
  – Typically,
    • Threads have their own PC, SP and register set.
    • Threads share address space, communication ports and file descriptors
  – Multiple threads are spawned by a process
  – A PCB is shared among interacting threads
  – Context switching among threads is lightweight compared to context switching among processes

[Figure: processes, each with a PCB and several threads, one TCB per thread; layers labeled Thread run-time library support and Operating System Support]
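A small sketch of the point that threads of one process share an address space while keeping their own stacks. The names `counter`, `lock` and `worker` are assumptions made for the example, not part of the course material.

```python
import threading

counter = 0              # lives in the process address space, visible to all threads
lock = threading.Lock()  # coordinates access to the shared variable


def worker(n_increments):
    global counter
    local_total = 0      # local variable: lives on this thread's own stack
    for _ in range(n_increments):
        local_total += 1
    with lock:           # update the shared state one thread at a time
        counter += local_total


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 4000
```

Separate processes would each get their own copy of `counter`; they would have to exchange messages (or use explicitly shared memory) to combine their totals.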


Interaction model

• Process interactions
  – C-S, P2P, message passing, shared space, synchronous, asynchronous
  – Single process/thread, multiple threads
• Distributed algorithms
  – Behavior of multiple processes
  – Includes message transmissions
  – Each process
    • Has its own PCB, which is inaccessible to other processes
    • Is likely to be executing on a different system in the network
    • Is difficult to coordinate
• Two significant factors
  – Communication performance
  – Maintenance of global state
    • Computer clocks drift
    • Clock drifts differ from one another

Functional models: Interaction model, Failure model, Security model


Performance of communication channels

• Latency
  – Time taken for message to arrive at the destination
  – Delay in accessing the network
  – Delay due to processing by OS communication services at both ends
• Bandwidth
  – Frequency
  – Interference
  – Channel sharing
• Jitter
  – Variation in times taken to deliver different components of a message
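To make the three quantities concrete, here is a rough sketch. It assumes the common approximation that transfer time is latency plus message size divided by bandwidth, and it measures jitter as the spread between the slowest and fastest delivery, which is one simple choice among several. The function names and the numbers are illustrative.

```python
def transfer_time(latency_s, size_bits, bandwidth_bps):
    """Approximate time to deliver a message: fixed latency plus transmission time."""
    return latency_s + size_bits / bandwidth_bps


def jitter(delivery_times_s):
    """Spread between the slowest and fastest delivery of a message's components."""
    return max(delivery_times_s) - min(delivery_times_s)


# A 1 MB (8,000,000-bit) message over a 100 Mbit/s channel with 5 ms latency.
t = transfer_time(latency_s=0.005, size_bits=8_000_000, bandwidth_bps=100_000_000)
print(round(t, 3))  # 0.085

# Delivery times (seconds) of three components of one message.
j = jitter([0.020, 0.025, 0.031])
print(round(j, 3))  # 0.011
```

For small messages the latency term dominates; for large messages the bandwidth term does.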


Two variants

• Synchronous
  – Process execution time is bounded
  – Message latency over a channel is bounded
  – Each process's local clock drift is bounded
  – Though difficult to build, very useful as a model
    • Time outs
    • Detect failures
• Asynchronous
  – The assumptions above (the bullets shown in blue on the slide) do NOT hold
  – Most systems are asynchronous
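The link between bounded delays and failure detection can be sketched with a heartbeat-based detector. This is an illustrative sketch, not the course's implementation: the class `TimeoutFailureDetector`, its methods and the process names are assumptions, and timestamps are passed in explicitly so the example is deterministic.

```python
class TimeoutFailureDetector:
    """Suspect a process when no heartbeat has arrived within timeout_s."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_heard = {}  # process id -> time of last heartbeat

    def heartbeat(self, process_id, now_s):
        """Record that process_id was heard from at time now_s."""
        self.last_heard[process_id] = now_s

    def suspected(self, now_s):
        """Return the processes that have been silent for longer than the timeout."""
        return sorted(
            p for p, t in self.last_heard.items() if now_s - t > self.timeout_s
        )


detector = TimeoutFailureDetector(timeout_s=3.0)
detector.heartbeat("p1", now_s=0.0)
detector.heartbeat("p2", now_s=0.0)
detector.heartbeat("p1", now_s=2.0)       # p1 keeps reporting, p2 goes silent
suspects = detector.suspected(now_s=4.0)
print(suspects)  # ['p2']
```

In a synchronous system the timeout can be derived from the known bounds, so a silent process has really failed. In an asynchronous system the same timeout only yields a suspicion, because the process or its messages may simply be slow.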


Failure model

• Omission failures
  – Processor/process crash
  – Communication failure/message drops
• Arbitrary failures
  – Process setting wrong values in data
  – Data corruption during transmission
• Timing failures
  – Synchronous systems
  – Real-time systems
  – Clock, process, channel
• Masking failures
  – Replication
  – Service to mask failures
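One way replication can mask failures is to ask several replicas and accept the answer given by a majority. The sketch below is a simplified illustration with a hypothetical helper `majority_value`; it is not a full replication protocol.

```python
from collections import Counter


def majority_value(replies):
    """Return the value reported by a strict majority of replies, else None."""
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]
    return value if count > len(replies) // 2 else None


# Three replicas answer; one returns a wrong value (an arbitrary failure).
result = majority_value([42, 42, 17])
print(result)  # 42

# No majority: the failure cannot be masked and is reported to the caller.
print(majority_value([1, 2, 3]))  # None
```

With three replicas, one faulty answer is hidden from the client; tolerating more faulty replicas requires more replicas.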



Security model

• Protecting objects
  – Who is allowed to access what data
    • Check access rights, verify identity
• Securing process and interactions
  – Processes
    • Server, client, peer
  – Communication channel
    • Copy/alter messages; inject harmful messages
    • Encryption, authentication, time stamping
• Denial of service
• Mobile code, mobile agents



Network Background

Slides from Kurose and Ross’s book will be used

Please read the book


Networking review

• Please read Chapter 4 or a networking book
• I will cover only mobile and wireless networking


Mobile IP


Mobile IP

• Triangle routing, indirect routing
• Direct Routing
  – Home agents
  – Foreign agents
  – Registrations
    • HA
    • FA
    • Anchor FA
  – Care-of address
  – Encapsulation
  – Agent discovery
  – Registration

• TCP – Transmission Control Protocol
• IP – Internet Protocol
• BS – Base Station
• MH – Mobile host
• CH – Correspondent host
• HA – Home agent
• FA – Foreign agent


TCP

• Transport layer protocol
• Reliable, uses ACKs
• Congestion control
  – Adjusts to network conditions
• Error control
  – Packets buffered until ACKs received
  – Buffered packets resent


Desired (in Mobile systems)

• No disruption of services as the user moves
  – Changes point of attachment
• How to ensure?
  – Autonomous transfer
  – Minimal delays and losses


Effects of Mobility

• IP and Mobile IP
  – IP
    • Packets are routed to their destinations according to IP addresses.
    • IP addresses are associated with a fixed network location.
  – Mobile IP
    • Packets may be destined to mobile nodes
    • Seamless roaming to applications and users.
    • Shield mobility effects from
      – applications
      – higher level protocols

TCP/IP was designed for wired networks, but it has survived in the wireless world; well, till now at least!!


Effects of Mobility

• TCP congestion control mechanism
  – ACKs not received
    • Slow start or other control mechanisms
    • Window size is reduced
  – Slow start

TCP congestion control mechanism in mobile environments

When an MH hands off from one network, it does not receive packets until it registers at another network. In the meantime, the TCP mechanism at the sender assumes the packets have been lost and goes into congestion recovery mode. The congestion window size is reduced and/or packets are retransmitted. Overall effect: performance deterioration.
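The effect described above can be illustrated with a deliberately simplified, per-round model of the sender's congestion window. This is a sketch under stated assumptions, not real TCP: the window is counted in segments, one loop iteration stands for one round trip, a hand-off is modelled as a round with no ACKs, and the sender reacts by halving its threshold and restarting slow start. The function `simulate_cwnd` and its parameters are hypothetical.

```python
def simulate_cwnd(rounds, handoff_rounds, ssthresh=16):
    """Return the congestion window (in segments) at the start of each round."""
    cwnd = 1
    history = []
    for r in range(rounds):
        history.append(cwnd)
        if r in handoff_rounds:
            # No ACKs arrive during the hand-off; the sender assumes congestion.
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: double each round
        else:
            cwnd += 1                       # congestion avoidance: grow by one
    return history


history = simulate_cwnd(rounds=10, handoff_rounds={6})
print(history)  # [1, 2, 4, 8, 16, 17, 18, 1, 2, 4]
```

The window collapses from 18 segments to 1 after the hand-off in round 6 even though the network was never congested, which is the performance deterioration the slide refers to.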


Encapsulation/Tunneling

• Messages originating at the CH have the home or original address of the MH
• The HA encapsulates the message with the address of the FA in the foreign network and forwards the packet to the foreign network
• The FA peels off the ‘new address’ and forwards the original packet to the MH in the foreign network.
• This process of appending and peeling off care-of addresses is called tunneling or encapsulation.

[Figure: one packet labeled Original Address; one encapsulated packet labeled with both Original Address and New Address]
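The two steps, wrapping at the HA and unwrapping at the FA, can be sketched as follows. The `Packet` class, the function names and the addresses are illustrative assumptions; real Mobile IP uses IP-in-IP or similar header formats rather than Python objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    dst: str         # destination address
    payload: object  # data, or another Packet when tunneled


def encapsulate(packet, care_of_address):
    """HA: wrap the original packet in an outer packet addressed to the FA."""
    return Packet(dst=care_of_address, payload=packet)


def decapsulate(outer):
    """FA: peel off the outer address and recover the original packet."""
    return outer.payload


# The CH addresses the packet to the MH's home address (addresses are made up).
original = Packet(dst="10.0.0.7", payload="hello from CH")
tunneled = encapsulate(original, care_of_address="192.0.2.1")
delivered = decapsulate(tunneled)

print(tunneled.dst)   # 192.0.2.1
print(delivered.dst)  # 10.0.0.7
```

The original packet is never modified, so the MH still sees its home address as the destination, and the CH needs no knowledge of where the MH currently is.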


Split Connections

• Split at BS
• Selective ACK of out of sequence packets
• Mobile TCP

The TCP/IP connection is split at the BS. The BS ACKs packets, buffers them and forwards them to the MH.

[Figure labels: CH, Core network, BS, BS, MH]


Supervising host

• One host as a controller in the core network
  – Keep track of CHs and MHs

The supervising host (SH) resides in the wired network and keeps track of all the MHs. The supervising host is contacted for all correspondence related to MHs. The SH maintains a directory of MH locations. One can envision a set of distributed SHs catering to groups of MHs.

[Figure: a Supervising Host serving four MHs]
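The directory kept by the SH can be pictured as a simple mapping from each MH to its current point of attachment. The sketch below is a hypothetical illustration; the class `SupervisingHost`, its methods and the identifiers are assumptions, and a real SH would also handle authentication, timeouts and distribution across several SHs.

```python
class SupervisingHost:
    """Keeps a directory of where each mobile host is currently attached."""

    def __init__(self):
        self.directory = {}  # MH identifier -> current base station

    def register(self, mh, bs):
        """Record (or update) the base station serving the given MH."""
        self.directory[mh] = bs

    def locate(self, mh):
        """Return the current base station of the MH, or None if unknown."""
        return self.directory.get(mh)


sh = SupervisingHost()
sh.register("mh-1", "bs-A")
sh.register("mh-1", "bs-B")   # the MH moved to another base station
location = sh.locate("mh-1")
print(location)               # bs-B
```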


Snoop protocol

• Lower layer solution
  – Processing between TCP and IP at the BS
  – Packets are snooped (processed)
  – Snoop module reads packet addresses to determine which packets have not been ACKed.
  – Facilitates retransmission at the BS
    • Requires packets to be buffered at the BS
  – Multicast solution
    • MH uses a multicast address as the care-of address
    • All BSs the MH has been (and will be) in contact with are invited to be members of the multicast group
    • The BS where the MH currently resides will forward the packets.
      – Remaining BSs discard the packet

[Figure labels: TCP, IP, Snoop; CH, BS, BS, BS, MH]
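The buffering and local retransmission at the BS can be sketched as below. This is a heavily simplified illustration, not the full Snoop protocol: it assumes cumulative ACKs that name the highest in-order sequence number received, treats a repeated ACK as a sign of loss on the wireless link, and omits timers. The class `SnoopAgent` and the returned action tuples are hypothetical.

```python
class SnoopAgent:
    """Runs at the BS: caches unACKed packets and retransmits them locally."""

    def __init__(self):
        self.cache = {}     # sequence number -> data not yet ACKed by the MH
        self.last_ack = -1  # highest ACK seen from the MH so far

    def from_sender(self, seq, data):
        """Packet from the CH on its way to the MH: buffer it, then forward."""
        self.cache[seq] = data
        return ("forward_to_mh", seq, data)

    def from_mobile(self, ack):
        """ACK from the MH: clean the cache, or retransmit on a duplicate ACK."""
        if ack > self.last_ack:
            for seq in [s for s in self.cache if s <= ack]:
                del self.cache[seq]
            self.last_ack = ack
            return ("forward_ack_to_ch", ack)
        missing = ack + 1
        if missing in self.cache:
            # Duplicate ACK: resend from the BS and hide the loss from the CH.
            return ("retransmit_to_mh", missing, self.cache[missing])
        return ("forward_ack_to_ch", ack)


bs = SnoopAgent()
for seq in range(3):
    bs.from_sender(seq, f"segment-{seq}")

first = bs.from_mobile(0)    # new ACK: segment 0 arrived
second = bs.from_mobile(0)   # duplicate ACK: segment 1 was lost on the wireless hop
print(first)                 # ('forward_ack_to_ch', 0)
print(second)                # ('retransmit_to_mh', 1, 'segment-1')
```

Because the duplicate ACK is absorbed at the BS, the sender at the CH does not mistake a wireless loss for congestion and does not shrink its window.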
