T2 & T3 code (two copies of the consumer thread):

        LW   R3, head(R0)   ; Load queue head into R3
spin:   LW   R4, tail(R0)   ; Load queue tail into R4
        BEQ  R4, R3, spin   ; If queue empty, wait
        LW   R5, 0(R3)      ; Read x from queue into R5
        ADDi R3, R3, 4      ; Shift head by one word
        SW   R3, head(R0)   ; Update head memory addr
[Figure: the shared queue in memory, before and after the producer runs. Before: the queue holds y, bracketed by the Head and Tail pointers. After: x has been added at the next higher address and Tail has advanced one word; Head is unchanged.]
T1 code (producer):

ORi  R1, R0, x      ; Load x value into R1
LW   R2, tail(R0)   ; Load queue tail into R2
SW   R1, 0(R2)      ; Store x into queue
ADDi R2, R2, 4      ; Shift tail by one word
SW   R2, tail(R0)   ; Update tail memory addr
Critical section: T2 and T3 must take turns running the dequeue code (shown in red on the original slide); a sketch of one way to enforce this follows.
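One conventional way to enforce this mutual exclusion is a spinlock around the dequeue sequence. A minimal sketch in C11 (my illustration; the slides' assembly would instead use an atomic instruction such as test-and-set or load-reserved/store-conditional):

#include <stdatomic.h>

/* Sketch: a test-and-set spinlock so the two consumers (T2, T3)
   take turns in the dequeue critical section. */
atomic_flag queue_lock = ATOMIC_FLAG_INIT;

void consumer_dequeue(void)
{
    while (atomic_flag_test_and_set(&queue_lock))
        ;                            /* spin until the lock is acquired */
    /* critical section: read head, compare with tail, load x,
       advance head -- the code marked in red above */
    atomic_flag_clear(&queue_lock);  /* release the lock */
}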
Recall: Sequential Consistency

Sequential Consistency: as if each thread takes turns executing, and instructions in each thread execute in program order.
The four key memory operations are numbered: the producer's store of x (1) and tail update (2), and the consumer's load of tail (3) and load of x (4).

T1 code (producer):

ORi  R1, R0, x      ; Load x value into R1
LW   R2, tail(R0)   ; Load queue tail into R2
SW   R1, 0(R2)      ; (1) Store x into queue
ADDi R2, R2, 4      ; Shift tail by one word
SW   R2, tail(R0)   ; (2) Update tail memory addr

T2 code (consumer):

        LW   R3, head(R0)   ; Load queue head into R3
spin:   LW   R4, tail(R0)   ; (3) Load queue tail into R4
        BEQ  R4, R3, spin   ; If queue empty, wait
        LW   R5, 0(R3)      ; (4) Read x from queue into R5
        ADDi R3, R3, 4      ; Shift head by one word
        SW   R3, head(R0)   ; Update head memory addr
Legal orders: 1, 2, 3, 4 or 1, 3, 2, 4 or 3, 4, 1, 2 ... but not 2, 3, 1, 4! The order 2, 3, 1, 4 swaps the producer's two stores, so sequential consistency forbids it; allowing such reordering would let the consumer see the updated tail and read the queue slot before x was actually stored there.
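As a cross-check, the short C program below (my illustration, not from the slides) enumerates all 24 orderings of the four operations. It prints the six orders that keep both threads in program order, and flags orders such as 2, 3, 4, 1 in which the consumer would see the new tail yet read the queue slot before x was stored; 2, 3, 1, 4 is rejected simply because it swaps the producer's two stores.

#include <stdio.h>

/* Events: 1 = producer stores x, 2 = producer updates tail,
   3 = consumer loads tail, 4 = consumer loads x.
   Sequential consistency permits only interleavings that keep each
   thread's own operations in program order (1 before 2, 3 before 4). */
int main(void)
{
    for (int a = 1; a <= 4; a++)
    for (int b = 1; b <= 4; b++)
    for (int c = 1; c <= 4; c++)
    for (int d = 1; d <= 4; d++) {
        if (((1 << a) | (1 << b) | (1 << c) | (1 << d)) != 0x1E)
            continue;                     /* skip non-permutations */
        int order[4] = {a, b, c, d}, pos[5];
        for (int i = 0; i < 4; i++)
            pos[order[i]] = i;
        if (pos[1] < pos[2] && pos[3] < pos[4])
            printf("legal: %d %d %d %d\n", a, b, c, d);
        else if (pos[2] < pos[3] && pos[4] < pos[1])
            printf("unsafe: %d %d %d %d (new tail seen, stale x read)\n",
                   a, b, c, d);
    }
    return 0;
}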
In modern practice, total bus write bandwidth cannot support more than about 2 CPUs with write-through caches.
To scale further, we need to use write-back caches.
Write-back big trick: keep track of whether other caches also contain a cached line. If not, a cache has an "exclusive" copy of the line, and can read and write the line as if it were the only CPU. For details, take CS 252 ...
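A minimal sketch of the bookkeeping, using simplified MESI-style states (my illustration; the real protocol, with its full set of bus transactions, is the CS 252 material mentioned above):

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } line_state;

/* Local write: with an exclusive (or already-modified) copy we write
   silently; otherwise we must first invalidate other caches' copies. */
line_state on_local_write(line_state s, void (*invalidate_others)(void))
{
    if (s == SHARED || s == INVALID)
        invalidate_others();   /* bus transaction: claim sole ownership */
    return MODIFIED;           /* line is now dirty and ours alone      */
}

/* Snooping another CPU's read of this line: */
line_state on_remote_read(line_state s, void (*write_back)(void))
{
    if (s == MODIFIED)
        write_back();          /* supply the dirty data before sharing  */
    return SHARED;             /* another cache now also holds a copy   */
}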
Clusters also used for web servers

In some applications, each machine can handle a net query by itself.
Example: serving static web pages. Each machine has a copy of the website.
but I intentionally ignore them here because they are well studied elsewhere and because the issues in this article are largely orthogonal to the use of databases.
Advantages

The basic model that giant-scale services follow provides some fundamental advantages:

• Access anywhere, anytime. A ubiquitous infrastructure facilitates access from home, work, airport, and so on.

• Availability via multiple devices. Because the infrastructure handles most of the processing, users can access services with devices such as set-top boxes, network computers, and smart phones, which can offer far more functionality for a given cost and battery life.

• Groupware support. Centralizing data from many users allows service providers to offer group-based applications such as calendars, teleconferencing systems, and group-management systems such as Evite (http://www.evite.com/).

• Lower overall cost. Although hard to measure, infrastructure services have a fundamental cost advantage over designs based on stand-alone devices. Infrastructure resources can be multiplexed across active users, whereas end-user devices serve at most one user (active or not). Moreover, end-user devices have very low utilization (less than 4 percent), while infrastructure resources often reach 80 percent utilization. Thus, moving anything from the device to the infrastructure effectively improves efficiency by a factor of 20. Centralizing the administrative burden and simplifying end devices also reduce overall cost, but are harder to quantify.

• Simplified service updates. Perhaps the most powerful long-term advantage is the ability to upgrade existing services or offer new services without the physical distribution required by traditional applications and devices. Devices such as Web TVs last longer and gain usefulness over time as they benefit automatically from every new Web-based service.
Components

Figure 1 shows the basic model for giant-scale sites. The model is based on several assumptions. First, I assume the service provider has limited control over the clients and the IP network. Greater control might be possible in some cases, however, such as with intranets. The model also assumes that queries drive the service. This is true for most common protocols including HTTP, FTP, and variations of RPC. For example, HTTP's basic primitive, the "get" command, is by definition a query. My third assumption is that read-only queries greatly outnumber updates (queries that affect the persistent data store). Even sites that we tend to think of as highly transactional, such as e-commerce or financial sites, actually have this type of "read-mostly" traffic1: Product evaluations (reads) greatly outnumber purchases (updates), for example, and stock quotes (reads) greatly outnumber stock trades (updates). Finally, as the sidebar, "Clusters in Giant-Scale Services" (next page) explains, all giant-scale sites use clusters.
The basic model includes six components:
• Clients, such as Web browsers, standalone e-mail readers, or even programs that use XML and SOAP initiate the queries to the services.

• The best-effort IP network, whether the public Internet or a private network such as an intranet, provides access to the service.

• The load manager provides a level of indirection between the service's external name and the servers' physical names (IP addresses) to preserve the external name's availability in the presence of server faults. The load manager balances load among active servers. Traffic might flow through proxies or firewalls before the load manager.

• Servers are the system's workers, combining CPU, memory, and disks into an easy-to-replicate unit.
[Figure 1: several clients connect through the IP network to a load manager fronting a single-site server farm, with a persistent data store and an optional backplane behind the servers.]
Figure 1. The basic model for giant-scale services. Clients connect via the Internet and then go through a load manager that hides down nodes and balances traffic. The load manager is a special-purpose computer that assigns incoming HTTP connections to a particular machine. (Image from Eric Brewer's IEEE Internet Computing article.)
Clusters also used for web services

In other applications, many machines work together on each transaction.
Example: Web searching. The search is partitioned over many machines, each of which holds a part of the database.
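A toy sketch of the scatter-gather pattern this implies (my illustration; the document set, the striped partitioning, and the substring matching are all invented for the example):

#include <stdio.h>
#include <string.h>

#define NPARTITIONS 4
#define NDOCS 8

/* The "index": documents striped across NPARTITIONS nodes. */
static const char *docs[NDOCS] = {
    "web search", "cluster design", "web caching", "dns failover",
    "raid storage", "web farms", "load balancing", "search ranking"
};

/* One partition's share of the work: scan only its own documents. */
static void query_partition(int p, const char *term)
{
    for (int d = p; d < NDOCS; d += NPARTITIONS)
        if (strstr(docs[d], term))
            printf("partition %d: doc %d \"%s\"\n", p, d, docs[d]);
}

int main(void)
{
    /* Scatter the query to all partitions, gather the printed hits. */
    for (int p = 0; p < NPARTITIONS; p++)
        query_partition(p, "web");
    return 0;
}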
The AltaVista web search engine did not use clusters; instead, it ran on large shared-memory multiprocessors. That approach could not scale with the growth of the web.
above 20 Gbits per second. They detect down nodes automatically, usually by monitoring open TCP connections, and thus dynamically isolate down nodes from clients quite well.
Two other load-management approaches are typically employed in combination with layer-4 switches. The first uses custom "front-end" nodes that act as service-specific layer-7 routers (in software).2 Wal-Mart's site uses this approach, for example, because it helps with session management: Unlike switches, the nodes track session information for each user.
The final approach includes clients in the load-management process when possible. This general "smart client" end-to-end approach goes beyond the scope of a layer-4 switch.3 It greatly simplifies switching among different physical sites, which in turn simplifies disaster tolerance and overload recovery. Although there is no generic way to do this for the Web, it is common with other systems. In DNS, for instance, clients know about an alternative server and can switch to it if the primary disappears; with cell phones this approach is implemented as part of roaming; and application servers in the middle tier of three-tier database systems understand database failover.
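The DNS case suggests the general shape of a smart client. A toy sketch (my illustration; the replica names and the up/down simulation are invented):

#include <stdio.h>

/* Stand-in for a real request with a timeout. */
static int try_server(const char *name, int up)
{
    if (!up) {
        printf("%s: no response, failing over\n", name);
        return -1;
    }
    printf("%s: reply received\n", name);
    return 0;
}

int main(void)
{
    const char *replicas[] = { "primary.example.com", "backup.example.com" };
    int up[] = { 0, 1 };             /* simulate: the primary is down */

    /* The client itself knows the alternative server and switches
       to it when the primary disappears. */
    for (int i = 0; i < 2; i++)
        if (try_server(replicas[i], up[i]) == 0)
            return 0;
    fprintf(stderr, "all replicas down\n");
    return 1;
}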
Figures 2 and 3 illustrate systems at opposite ends of the complexity spectrum: a simple Web farm and a server similar to the Inktomi search engine cluster. These systems differ in load management, use of a backplane, and persistent data store.
The Web farm in Figure 2 uses round-robin DNS for load management. The persistent data store is implemented by simply replicating all content to all nodes, which works well with a small amount of content. Finally, because all servers can handle all queries, there is no coherence traffic and no need for a backplane. In practice, even simple Web farms often have a second LAN (backplane) to simplify manual updates of the replicas. In this version, node failures reduce system capacity, but not data availability.
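A minimal sketch of the round-robin idea (my illustration; the addresses are invented): successive lookups hand out successive server addresses, so identical replicas share the load evenly.

#include <stdio.h>

/* Round-robin selection, as a DNS server might rotate A records. */
static const char *replicas[] = { "10.0.0.1", "10.0.0.2", "10.0.0.3" };
static int next_replica;

static const char *pick_server(void)
{
    const char *s = replicas[next_replica];
    next_replica = (next_replica + 1) % 3;   /* rotate for next client */
    return s;
}

int main(void)
{
    for (int client = 0; client < 6; client++)
        printf("client %d -> %s\n", client, pick_server());
    return 0;
}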
In Figure 3, a pair of layer-4 switches manages the load within the site. The "clients" are actually other programs (typically Web servers) that use the smart-client approach to failover among different physical clusters, primarily based on load.
Because the persistent store is partitioned across servers, possibly without replication, node failures could reduce the store's effective size and overall capacity. Furthermore, the nodes are no longer identical, and some queries might need to be directed to specific nodes. This is typically accomplished using a layer-7 switch to parse URLs, but some systems, such as clustered Web caches, might also use the backplane to route requests to the correct node.4
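One common way such a switch can map a URL to the right node is to hash the URL (or a key parsed out of it) down to a partition number. A sketch using the well-known djb2 string hash (the hash choice and the example URL are mine, for illustration only):

#include <stdio.h>

/* Map a URL to the node whose partition holds its data. */
static unsigned node_for_url(const char *url, unsigned nnodes)
{
    unsigned h = 5381;                  /* djb2 string hash        */
    for (; *url; url++)
        h = h * 33 + (unsigned char)*url;
    return h % nnodes;                  /* partition = target node */
}

int main(void)
{
    printf("%u\n", node_for_url("/cache/object/42", 8));
    return 0;
}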
High Availability

High availability is a major driving requirement behind giant-scale system design. Other infra-
[Figure 2: several clients connect through the IP network to a single-site server farm via round-robin DNS, with a simple replicated store behind the servers.]
Figure 2. A simple Web farm. Round-robin DNS assigns different servers to different clients to achieve simple load balancing. Persistent data is fully replicated and thus all nodes are identical and can handle all queries.
[Figure 3: several client programs connect through the IP network to a load manager fronting a single-site server farm, with a partitioned data store and a Myrinet backplane behind the servers.]
Figure 3. Search engine cluster. The service provides support to other programs (Web servers) rather than directly to end users. These programs connect via layer-4 switches that balance load and hide faults. Persistent data is partitioned across the servers, which increases aggregate capacity but implies there is some data loss when a server is down. A backplane allows all nodes to access all data.