Chapter 3 Distributed Data Processing
Mar 30, 2015
Data Centers
A facility that houses computer systems and their associated components including storage and telecommunication systems
Can occupy a single room in a building, one or more floors, or an entire building
Much of the equipment consists of servers mounted in rack cabinets, which are placed in rows that form corridors giving people access to both the front and the rear of each cabinet
Mainframe computers and storage devices are placed alongside the racks
Air-conditioning is used to control temperature and humidity
Centralized Data Processing
Data processing is done on one computer or on a cluster of computers located in a central data processing facility
Users transmit data to the centralized data processing facility, where it is processed by applications running on the computers located there
The data processing for an application does not take place on the user’s computing device
Centralized Data Processing
Centralized Computers
One or more computers are located in a central facility
Centralized Processing
All applications are run on computers in the central data processing facility
Centralized Data
Most data are stored in files and databases at the central facility
Centralized Control
The central facility is managed by a data processing or information security manager
Centralized Support Staff
Must include a technical support staff to operate and maintain the data center hardware and applications
Dallas County Information Systems Architecture
Distributed Data Processing (DDP)
Computers are dispersed throughout an organization
Objective is to process information in a way that is most effective based on operational, economic, and/or geographic considerations
May include a central data center plus satellite data centers, or it may resemble a community of peer computing facilities
Various computers in the system must be connected to one another
A DDP facility involves the distribution of computers, processing, and data
Carnival Valor Wireless LAN
Table 3.1 Requirements for the Corporate Computing Function
Table 3.2 Potential Benefits of Distributed Data Processing
Table 3.3 Potential Drawbacks of Distributed Data Processing
Table 3.4 Major Characteristics of Data Center Tiers
Data Center Computing and Storage Technologies
Mainframes
Sales continue to be strong, and they are increasingly being used as a hub for enterprise infrastructure because of their potential to enhance security, ensure availability, and improve manageability
In-memory computing systems
Processors include terabyte-plus RAM capable of storing large data sets
Has the potential to revolutionize business intelligence (BI) by making it possible to bring the equivalent of a data warehouse into memory to enable real-time data mining and business analytics
Virtualization
The creation of a virtual (rather than actual) version of something
In computing this means creating virtual versions of operating systems, servers, storage devices, and networks
Categories:
Operating system virtualization
Server virtualization
Storage virtualization
Network virtualization
Client/Server Architecture (C/S)
Combines advantages of distributed and centralized computing
Cost-effective and achieves economies of scale by centralizing support for specialized functions
Flexibility is provided by the fact that the functional services provided by servers are not necessarily in a one-to-one relation with physical computers
Three Tier Enterprise System Architecture
Intranets
Provides users of client devices with applications associated with the Internet but isolated within the organization
Key features:
Uses Internet-based standards such as HyperText Markup Language (HTML) and the Simple Mail Transfer Protocol (SMTP)
Uses the TCP/IP protocol suite applications and services
Includes wholly owned content that is not accessible to external users over the public Internet
Such content can be accessed only by authorized internal users, even though the corporation has Internet connections and runs a Web server on the Internet
Extranets
Makes use of TCP/IP protocols and applications, especially the Web
Distinguishing feature is that it provides access to corporate resources to authorized outside clients
This outside access can be provided via the company’s connections to the Internet or through other data communications networks
Provides authorized outside clients with fairly extensive access to corporate resources
Typical model of operation is client/server
Application Service Provider (ASP)
Businesses that provide computer-based services to business subscribers over a network
Software that ASPs provide is called on-demand software or software as a service (SaaS)
Costs and complexities of sophisticated software can be reduced to levels that small and medium-size firms can afford
• Software is kept up to date
• 24x7x365 technical support is provided
• Physical and electronic security is provided
Service-level agreements
• Guarantee certain levels of service, such as availability
Cloud Computing
Encompasses any subscription-based or pay-per-use service that extends an organization’s existing IT capabilities over the Internet in real time
Enables businesses to increase capabilities or capacity without investing in new infrastructure, licensing new software, or training personnel
Forms of cloud computing:
Software as a service (SaaS)
Infrastructure as a service (IaaS)
Platform as a service (PaaS)
Managed service providers (MSP)
Distributed Applications
Two dimensions characterize the distribution of applications
Allocation of application functions within the network
One application may be split up into components that are dispersed among multiple computers
One application may be replicated on different computers
Different applications may be distributed among different computers
Whether the distribution of the application is vertical or horizontal
Vertical Partitioning
Involves one application split up into components that are dispersed among a number of machines
Examples:
• Insurance
  • Branch office operations and head office operations
• Retail chains
  • Point-of-sale terminals
  • Office and sales personnel computers
• Process control
  • Each major operational area is controlled by a console or workstation that is fed information from distributed process-control microprocessors
• Web mashups
  • Created by integrating data from multiple sources to create a new application
  • Google, eBay, Amazon
Horizontal Partitioning
Involves either one application replicated on a number of machines or a number of different applications distributed among a number of machines
Data processing is distributed among a number of computers that have a peer relationship
Computers normally operate autonomously
Examples:
• Small office/home office (SOHO) peer-to-peer networks
  • Users are linked together in peer-to-peer LANs
  • Access rights to sharable resources are governed by setting sharing permissions on the individual machines
• Air traffic control system
  • Each regional center for air traffic control operates autonomously of the other centers, performing the same set of applications
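The difference between vertical and horizontal partitioning can be made concrete with a small sketch. The function below is a deliberately simplified illustration, not a standard algorithm: it assumes a deployment is described as a mapping from machine name to a set of (application, component) pairs, and the machine, application, and component names are all hypothetical. Real systems often mix both forms of partitioning.

```python
def classify_distribution(deployment):
    """Classify a deployment as vertical or horizontal partitioning.

    deployment: dict mapping machine name -> set of (application, component)
    pairs hosted on that machine. This data model is an assumption made
    for illustration only.
    """
    if not deployment:
        raise ValueError("deployment must contain at least one machine")

    # Collect the distinct applications present anywhere in the deployment.
    apps = {app for parts in deployment.values() for app, _ in parts}
    if len(apps) > 1:
        # A number of different applications spread among machines.
        return "horizontal-different-apps"

    # One application: either every machine runs the same components
    # (replication) or the components are split among machines.
    component_sets = {frozenset(parts) for parts in deployment.values()}
    if len(component_sets) == 1:
        return "horizontal-replicated"
    return "vertical"


# Retail chain: one application split between point-of-sale and office machines.
retail = {
    "pos-terminal": {("retail", "sales-entry")},
    "office-pc": {("retail", "inventory-reports")},
}

# Air traffic control: each regional center runs the same application set.
air_traffic = {
    "center-east": {("atc", "tracking")},
    "center-west": {("atc", "tracking")},
}

retail_kind = classify_distribution(retail)
atc_kind = classify_distribution(air_traffic)
```

Under these assumptions the retail example is classified as vertical and the air traffic control example as horizontal with replication, matching the examples listed above.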
Other Forms of DDP
Distributed devices
ATM machines
Factory automation
Network management
Centralized systems provide management and control of distributed nodes
At least some of the computers in the distributed system must include some management and control logic to enable them to interact with the central network management system
Database Management Systems (DBMS)
Database
A structured collection of data stored for use in one or more applications
In addition to data, a database contains the relationships between data items and groups of data items
DBMS
A suite of programs for constructing and maintaining the database and for offering ad hoc query capabilities to multiple users and applications
Query language
Provides a uniform interface to the database
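As a minimal sketch of these three ideas, the example below uses Python's standard-library sqlite3 module with an in-memory database. SQL serves here as one example of a query language; the schema (customer and orders tables) and the rows are hypothetical and exist only for illustration. The foreign key expresses a relationship between data items, and the SELECT statement is an ad hoc query issued through the uniform interface.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(id),
            amount      REAL NOT NULL
        );
        INSERT INTO customer (id, name) VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders (customer_id, amount)
            VALUES (1, 20.0), (1, 5.0), (2, 12.5);
        """
    )
    return conn


def total_by_customer(conn):
    """Ad hoc query: total order amount per customer, joined via customer_id."""
    cursor = conn.execute(
        """
        SELECT c.name, SUM(o.amount)
        FROM customer AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    )
    return cursor.fetchall()


conn = build_demo_db()
totals = total_by_customer(conn)
```

Any application that can submit the same query text gets the same answer, which is the sense in which the query language gives multiple users and applications a uniform interface.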
Database Organization
Distributed database
A collection of several different databases, distributed among multiple computers, that looks like a single database to the user
DBMS controls access
Three ways of organizing data for use by an organization:
1. Centralized
2. Replicated
3. Partitioned
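The three ways of organizing data can be sketched as a placement function. This is an illustrative sketch only: the site names, the choice of the first site as the central facility, and the round-robin rule for partitioning are all assumptions, and real systems typically partition by geography, function, or key range instead.

```python
def place_records(records, sites, strategy):
    """Return a dict mapping each site to the list of records it stores.

    strategy is one of "centralized", "replicated", or "partitioned".
    The placement rules are simplified assumptions for illustration.
    """
    if not sites:
        raise ValueError("at least one site is required")

    placement = {site: [] for site in sites}
    if strategy == "centralized":
        # All data at one facility; the first site plays that role here.
        placement[sites[0]] = list(records)
    elif strategy == "replicated":
        # Every site holds a full copy of the data.
        for site in sites:
            placement[site] = list(records)
    elif strategy == "partitioned":
        # Each record is stored at exactly one site (round-robin rule).
        for index, record in enumerate(records):
            placement[sites[index % len(sites)]].append(record)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return placement


records = ["r1", "r2", "r3", "r4"]
sites = ["head-office", "branch-a"]  # hypothetical site names

centralized = place_records(records, sites, "centralized")
replicated = place_records(records, sites, "replicated")
partitioned = place_records(records, sites, "partitioned")
```

The sketch shows the storage trade-off directly: replication multiplies the stored data by the number of sites, while partitioning keeps a single copy of each record but requires knowing which site holds it.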
Centralized Versus Distributed Databases
Centralized
Housed in a central computer facility
Users and applications can be at a remote location
Desirable when the security and integrity of the data are paramount
Often used with a vertical DDP organization
Distributed
Design of data organization is more understandable and easier to implement
Data can be stored locally under local control
Confines the effects of a computer breakdown to its point of occurrence
Collection of data and the number of users is not limited by a single computer’s size and processing power
Table 3.5 Replication Strategy Variants
Table 3.6 Advantages and Disadvantages of Database Distribution Methods
Table 3.7 Strategies for Database Organization
Networking Implications of DDP
Full Connectivity Using a Central Switch
Availability and Performance
Availability
The percentage of time that a particular function or application is available for users
Can be “desirable” or “essential”
High availability requirements
Distributed system must be designed so that the failure of a single computer or device within the network does not deny access to the application
Communications links and equipment must be highly available
Some form of link and communication equipment redundancy and backup is needed
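A short worked sketch shows why redundancy matters for availability. The formulas below are the usual textbook approximations and rest on an assumption not stated in the slides: component failures are independent. The function names and sample figures are hypothetical.

```python
import math


def availability_percent(uptime_hours, downtime_hours):
    """Availability as the percentage of time a function is available."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return 100.0 * uptime_hours / total


def series_availability(parts):
    """Availability when every component on a path must work.

    parts: availabilities as fractions between 0 and 1.
    Assumes independent failures.
    """
    return math.prod(parts)


def redundant_availability(parts):
    """Availability when at least one redundant component must work.

    Assumes independent failures, so the system is down only when
    all redundant components are down at the same time.
    """
    return 1.0 - math.prod(1.0 - a for a in parts)


single_link = 0.90                                   # hypothetical link availability
two_links = redundant_availability([single_link, single_link])
link_then_switch = series_availability([0.90, 0.90])
```

Under the independence assumption, two redundant links that are each available 90% of the time are jointly available about 99% of the time, while a link and a switch in series at 90% each are available only about 81% of the time.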
Performance
Response time is critically important for highly interactive applications
Network must have sufficient capacity and flexibility to provide the required response time
If time is not critical, the major network performance concern is throughput
The network must be designed to handle large volumes of data
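The two performance concerns can be illustrated with simple arithmetic. The sketch below is a first-order model under stated assumptions: the link capacity is fully available to the transfer, protocol overhead is ignored, and the parameter names and sample values are hypothetical.

```python
def transfer_time_seconds(size_bytes, capacity_bps):
    """Time to move size_bytes over a link of capacity_bps (bits per second)."""
    if capacity_bps <= 0:
        raise ValueError("capacity must be positive")
    return size_bytes * 8 / capacity_bps


def response_time_seconds(request_bytes, reply_bytes, capacity_bps,
                          round_trip_delay_s, server_time_s):
    """Rough response time for one interactive request.

    Sums the network round-trip delay, the server processing time,
    and the time to transmit the request and the reply.
    """
    return (round_trip_delay_s
            + server_time_s
            + transfer_time_seconds(request_bytes, capacity_bps)
            + transfer_time_seconds(reply_bytes, capacity_bps))


# Throughput-bound case: a 1 GB bulk transfer over a 100 Mbps link.
bulk_seconds = transfer_time_seconds(1_000_000_000, 100_000_000)

# Response-time-bound case: a small request and reply over a 1 Mbps link.
interactive_seconds = response_time_seconds(
    request_bytes=1_000, reply_bytes=9_000, capacity_bps=1_000_000,
    round_trip_delay_s=0.05, server_time_s=0.1,
)
```

In this model the bulk transfer takes 80 seconds and is governed almost entirely by link capacity, whereas the interactive exchange takes about 0.23 seconds, most of which is delay and server time.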
Table 3.8 Traditional Data Storage/Management Technologies
Summary
Centralized and distributed organization
Technical trends leading to distributed data processing
Management and organizational considerations
Data center evolution
Client/server architecture
Intranets and extranets
Web services and cloud computing
Distributed applications
Other forms of DDP
Database management systems
Centralized versus distributed databases
Replicated and partitioned databases
Networking implications of DDP
Big data infrastructure considerations