R.V. College of Engineering
R.V. COLLEGE OF ENGINEERING, BANGALORE-560059
(Autonomous Institution Affiliated to VTU, Belgaum)
SELF STUDY REPORT ON
INTERNET FIBRE CHANNEL PROTOCOL
IN STORAGE AREA NETWORKS
Submitted by
SANJAY VINAYAK H K
1RV13CS139
Under the guidance of
Ms. Ganashree K.C, Assistant Professor, CSE
Mrs. Prapulla S.B, Assistant Professor, CSE
Ms. Vishalakshi Prabhu H, Assistant Professor, CSE
Dr. Neeta Shivakumar, Associate Professor, BT
Submitted to
COMPUTER SCIENCE AND ENGINEERING DEPARTMENT
R.V. College of Engineering, Bangalore-59
R.V. COLLEGE OF ENGINEERING, BANGALORE - 560059 (Autonomous
Institution Affiliated to VTU, Belgaum)
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
CERTIFICATE
Certified that the Self Study work titled INTERNET FIBRE CHANNEL PROTOCOL IN STORAGE AREA NETWORKS is carried out by SANJAY VINAYAK H K (1RV13CS139), who is a bonafide student of R.V. College of Engineering, Bangalore, in partial fulfillment for the award of the degree of Bachelor of Engineering in Computer Science and Engineering of the Visvesvaraya Technological University, Belgaum, during the year 2014-2015. It is certified that all corrections/suggestions indicated for the Internal Assessment have been incorporated in the report deposited in the departmental library. The Self Study report has been approved as it satisfies the academic requirements in respect of Self Study work prescribed by the institution for the said degree.
Ms. Ganashree K.C Mrs. Prapulla S.B
Assistant Professor, CSE Assistant Professor, CSE
Ms. Vishalakshi Prabhu H Dr. Neeta Shivakumar
Assistant Professor, CSE Associate Professor, BT
Dr. Shobha G
Head of Department,
Department of CSE,
R.V.College of Engineering, Bangalore-560059
TABLE OF CONTENTS
PROBLEM DEFINITION
DESIGN
IMPLEMENTATION
CONCLUSION
FUTURE ENHANCEMENT
LIST OF FIGURES AND TABLES
Figure 1: Traditional DAS Architecture
Figure 2: Fibre Channel Topologies
Figure 3: Fibre Channel Protocol Layer
LIST OF SYMBOLS, ACRONYMS / ABBREVIATION AND
NOMENCLATURE
SAN : Storage Area Network
NAS: Network Attached Storage
DAS: Direct Attached Storage
iSCSI: Internet Small Computer System Interface
iFCP: Internet Fibre Channel Protocol
FCIP: Fibre Channel over Internet Protocol
TCP/IP: Transmission Control Protocol/Internet Protocol
LAN: Local Area Network
WAN: Wide Area Network
IETF: Internet Engineering Task Force
PROBLEM DEFINITION
Today's applications are rapidly overwhelming the capacity of networks and of storage space. In e-commerce, huge databases support electronic cataloging and ordering while large numbers of customers attempt to access the information simultaneously. As corporations grow and enter the international business environment, enterprise systems maintain corporate information across not only states but countries. Maintaining that large amount of information, and making it available to all users reliably and in a timely manner, is challenging to say the least. More and more feature films are incorporating digital effects. Video editing software, Computer Aided Drafting and photo-realistic rendering software are utilized either to modify a film or even to create one from scratch. Even a few seconds' worth of film requires hundreds of megabytes of storage space. When teams of 20 animators/digital artists are trying to work on their own pieces of a film, the burden on the storage and network facilities is tremendous. Web sites that serve up streaming audio and/or video consume more resources as the demand for these services goes up. In addition to simply supporting bandwidth and storage increases, corporations now want to be able to safeguard their data. This typically entails making backups of data (to tape) and saving data off the corporate premises. This is an extremely small sample of the applications that are challenging storage and networking architectures.
Traditionally, these applications have been supported by file servers with either large internal disks or disk farms directly attached to the server. The disks are typically connected to the server via SCSI (Small Computer System Interface). The SCSI standard defines a high-throughput parallel interface that is used to connect up to 7 peripherals (in addition to the host adapter card itself) to the computer. Examples of these peripherals are scanners, CD (Compact Disk) players/recorders, digitizers, tape drives and, as previously stated, hard disks. This architecture has several limitations. The server can only access data on devices directly attached to it. If a server or any part of its SCSI hardware fails, access to its data is cut off. Also, SCSI supports a finite number of devices, therefore the amount of data a server can access is limited. If more storage space is needed, but there is no more room on the SCSI bus, expansion is no longer possible. SCSI, due to its parallel structure, has distance limitations as well. This requires that the storage be near the servers. These limitations are the driving force behind a new paradigm for data storage and access.
Distance is one of the major drawbacks for those who rely solely on Fibre Channel. In order to promote disaster recovery, particularly in areas experiencing frequent earthquakes, Fibre Channel was originally designed to allow storage to take place over distances of up to approximately 10 km from hosts. Even by using various methods of signal enhancement, which might allow the distance to expand by several hundred kilometers, storing data over distances of hundreds or thousands of miles is out of the question for devices connected merely by Fibre Channel cables. More recently, other disaster recovery concerns have arisen which have resulted in the promotion of storage over longer distances, sometimes crossing international boundaries.
Additionally, Fibre Channel media and equipment can often be expensive and cumbersome, both to install and to manage. Although backbone technologies do exist which can carry Fibre Channel data over longer distances, their installation, cost, and maintenance would present many difficulties that could be handled easily by a technology such as TCP/IP, which is already in place across the world. Thus, the relationship between SAN technologies which run over TCP/IP and other SAN technologies can be considered similar to that between Wide Area Networking and the LAN. Additionally, SANs which run over TCP/IP can sometimes replace other SAN technologies altogether. These factors, combined with the fact that the Internet has become so widespread and convenient, are among the primary motivations for this report, and for the rising popularity of storage over TCP/IP.

SANs also tend to enable more effective disaster recovery processes. A SAN could span a distant location containing a secondary storage array. This enables storage replication implemented either by disk array controllers, by server software, or by specialized SAN devices.
Figure 1: Traditional DAS architecture
DESIGN
The generic Fibre Channel network is composed of one or more bi-directional point-to-point channels. The links support 1 Gbps (or 100 MBps) data rates in each direction. The transport media may be fiber optic cable, copper twisted pair or coax cable. The links in the FC network are between communication ports known as N_Ports. N_Port stands for Node Port, where a node is a device on the FC network. The links may be point-to-point between N_Ports, or they may be set up as a Fabric. A Fabric consists of several N_Ports connected to a switch. Note: ports on the switch are called F_Ports. Finally, the ports may be daisy-chained to form a ring. This is called an Arbitrated Loop (FC-AL). In this configuration the ports are referred to as L_Ports. No switch is necessary for FC-AL. These basic layouts may be combined in different ways to create more complex topologies.
FC is typically realized in one of three topologies: Point-to-Point, Loop or Fabric. The Point-to-Point connection is the simplest type of connection. It can exist by itself or as a subset within a Fabric or Loop topology.
Figure 2: Different fibre channel topologies
Fibre Channel uses a multi-layer protocol architecture along the lines of the 7-layer OSI model. There are five layers: FC-0 (Physical layer), FC-1 (Encode/Decode layer), FC-2 (Framing Protocol/Flow Control), FC-3 (Common Services) and FC-4 (Upper Level Protocol Support). Additionally, there is another layer which, although not typically considered part of the basic architecture, is important enough to warrant mention: the FC-AL (Arbitrated Loop) layer. iFCP uses OSPF to implement addressing and routing.
Figure 3: Fibre channel protocol layer
With iFCP, N_Port addresses can be locally assigned by each gateway (Gateway Region local mode operation). Alternatively, in address-transparent mode, N_Port addresses can be globally assigned across an interconnected set of gateways.

The routing between Gateway Regions operates over IP only. Routing that takes place within a Gateway Region (if there is any) is opaque to the IP network. For example, Fibre Channel routing and DFS traffic that may be operating within a Gateway Region does not flow between Gateway Regions.
Address Transparent Mode
In address transparent mode, the scope of N_Port addresses is fabric-wide. The IP network fabric is defined as a name server object containing a collection of gateways. The iSNS name server acts as a fabric Domain Address Manager and maintains a pool of Domain IDs for the fabric,
assigning FC Domain IDs to each gateway within the fabric. Within each Gateway Region, the gateway acts as the downstream principal switch. The advantage of address transparent mode is transparency across the fabric and the resulting simplification of gateway operation. The disadvantage is that each Gateway Region consumes 65K Node IDs, which is inefficient when the Gateway Region N_Port count is low. Also, address transparent mode is less scalable, as communication among N_Ports is restricted to N_Ports within the fabric.
Gateway Region Local Mode
In Gateway Region local mode, the scope of the N_Port addresses is local to the Gateway Region. Each gateway maps N_Port network addresses of external devices to N_Port fabric addresses. Normal inter-gateway frame traffic is mapped on the fly.

The advantage of local mode is scalability. N_Port connectivity is network-wide, allowing unrestricted addresses within a Gateway Region. Since each gateway is individually responsible for the N_Port addresses allocated to its Gateway Region, the fabric becomes more stable as the network scales in size. This is because there is no dependence on a central addressing authority, as is the case with Fibre Channel and iFCP transparent mode fabrics.
Mapping of Fibre Channel to iFCP
Fibre Channel frames ingressing the iFCP gateway are converted to iFCP frames through the process shown in Figure 7. The FC frames may be addressed to remote devices, or to other FC devices attached to the same iFCP gateway. In the latter case, no address translation mechanism is needed, and the frame is delivered directly to the local N_Port. In the former case, an address mapping function must occur that maps a key found in the D_ID to the TCP connection addressed to the appropriate remote N_Port network address (N_Port ID and IP address).
IMPLEMENTATION
DAA Component:
Implementation of Dijkstra's algorithm in OSPF:
OSPF uses a shortest path first algorithm in order to build and calculate the shortest path to all known destinations. The shortest path is calculated with the use of Dijkstra's algorithm. The algorithm by itself is quite complicated. The following is a very high-level, simplified way of looking at its various steps:
1. Upon initialization or due to any change in routing information, a router generates a link-state advertisement. This advertisement represents the collection of all link-states on that router.
2. All routers exchange link-states by means of flooding. Each router that receives a link-state update stores a copy in its link-state database and then propagates the update to other routers.
3. After the database of each router is complete, the router calculates a Shortest Path Tree to all destinations, using Dijkstra's algorithm. The destinations, the associated cost and the next hop to reach those destinations form the IP routing table.
4. If no changes occur in the OSPF network, such as the cost of a link changing or a network being added or deleted, OSPF is very quiet. Any changes that do occur are communicated through link-state packets, and the Dijkstra algorithm is re-run in order to find the new shortest paths.
The Dijkstra algorithm places each router at the root of a tree and calculates the shortest path to each destination based on the cumulative cost required to reach that destination. Each router will have its own view of the topology, even though all the routers build their shortest path trees from the same link-state database. The following sections indicate what is involved in building a shortest path tree.
OSPF Cost
The cost (also called metric) of an interface in OSPF is an indication of the overhead required to send packets across that interface. The cost of an interface is inversely proportional to its bandwidth: a higher bandwidth indicates a lower cost. There is more overhead (higher cost) and longer time delay involved in crossing a 56k serial line than a 10M Ethernet line. The formula used to calculate the cost is:
cost = 100000000 / bandwidth in bps
For example, it costs 10^8/10^7 = 10 to cross a 10M Ethernet line and 10^8/1544000 = 64 to cross a T1 line.
By default, the cost of an interface is calculated based on the bandwidth; you can force the cost of an interface with the ip ospf cost interface subconfiguration mode command.
Shortest Path Tree
Assume we have the following network diagram with the indicated interface costs. In order to build the shortest path tree for RTA, we make RTA the root of the tree and calculate the smallest cost to each destination.
The above is the view of the network as seen from RTA. Note the direction of the arrows in calculating the cost. For example, the cost of RTB's interface to network 128.213.0.0 is not relevant when calculating the cost to 192.213.11.0. RTA can reach 192.213.11.0 via RTB with a cost of 15 (10+5). RTA can also reach 222.211.10.0 via RTC with a cost of 20 (10+10) or via RTB with a cost of 20 (10+5+5). In case equal-cost paths exist to the same destination, Cisco's implementation of OSPF will keep track of up to six next hops to the same destination.

After the router builds the shortest path tree, it starts building the routing table accordingly. Directly connected networks are reached with a metric (cost) of 0, and other networks are reached according to the cost calculated in the tree.
(Figure source: http://www.cisco.com/c/dam/en/us/support/docs/ip/open-shortest-path-first-ospf/7039-spf1.gif)
TOC COMPONENT:
In computer science, a communicating finite-state machine is a finite state machine labelled with "receive" and "send" operations over some alphabet of channels. They were introduced by Brand and Zafiropulo, and can be used as a model of concurrent processes, like Petri nets. Communicating finite state machines are used frequently for modelling communication protocols, since they make it possible to detect major protocol design errors, including boundedness, deadlocks, and unspecified receptions.

The advantage of communicating finite state machines is that they make it possible to decide many properties of communication protocols, beyond merely detecting such properties. This rules out the need for human assistance or for restrictions in generality.
It was proved with the introduction of the concept itself that when two finite state machines communicate with only one type of message, boundedness, deadlocks, and the unspecified reception state can be decided and identified, while this is not the case when the machines communicate with two or more types of messages. It was later proved that when only one finite state machine communicates with a single type of message while the communication of its partner is unconstrained, boundedness, deadlocks, and the unspecified reception state can still be decided and identified.

It was further proved that when the message priority relation is empty, boundedness, deadlocks and the unspecified reception state can be decided even when there are two or more types of messages in the communication between the finite state machines.
Boundedness, deadlocks, and the unspecified reception state are all decidable in polynomial time (meaning the problem can be solved in a tractable amount of time), since the decision problems regarding them are nondeterministic logspace complete.

Communicating finite state machines are most powerful in situations where the propagation delay is not negligible (so that several messages can be in transit at one time) and in situations where it is natural to describe the protocol parties and the communication medium as separate entities.
Figure 4: CFSM for TCP/IP
OOPS COMPONENT:
Shortest path algorithms find wide application. Given below is a practical implementation of Dijkstra's algorithm in C++. Dijkstra's algorithm is used to find the single-source shortest paths in OSPF.
#include <iostream>
using namespace std;
#define INFINITY 99
void sp(int);
void pp(int);
int choose();
int dist[10],path[10],reach[10];
int adj[10][10],n,edge;
int main()
{
    int i,j,s;
    cout<<"Enter the number of nodes: "; cin>>n;
    cout<<"Enter the cost matrix ("<<INFINITY<<" = no edge):\n";
    for(i=1;i<=n;i++)
        for(j=1;j<=n;j++)
            cin>>adj[i][j];
    cout<<"Enter the source node: "; cin>>s;
    sp(s);
    for(i=1;i<=n;i++)
        if(i!=s) pp(i);
    return 0;
}
// Dijkstra's single-source shortest path algorithm
void sp(int s)
{
    for(int i=1;i<=n;i++)
    { reach[i]=0; dist[i]=adj[s][i]; path[i]=s; }
    reach[s]=1; dist[s]=0;
    for(int i=1;i<n;i++)
    {
        int u=choose();   // closest node not yet finalized
        if(u==0) break;   // remaining nodes are unreachable
        reach[u]=1;
        for(int w=1;w<=n;w++)
            if(!reach[w] && dist[u]+adj[u][w]<dist[w])
            { dist[w]=dist[u]+adj[u][w]; path[w]=u; } // relax edge (u,w)
    }
}
// print the path from the source to node v via the predecessors
void pp(int v)
{
    int stack[10],top=0,u=v;
    while(path[u]!=u) { stack[top++]=u; u=path[u]; }
    cout<<"Node "<<v<<" (cost "<<dist[v]<<"): "<<u;
    while(top>0) cout<<" -> "<<stack[--top];
    cout<<endl;
}
int choose()
{
    int min=INFINITY,j=0;
    for(int w=1;w<=n;w++)
        if(!reach[w] && dist[w]<min) { min=dist[w]; j=w; }
    return j;
}
A* uses a best-first search and finds a least-cost path from a given initial node to one goal node (out of one or more possible goals). As A* traverses the graph, it follows a path of the lowest expected total cost or distance, keeping a sorted priority queue of alternate path segments along the way. It uses a knowledge-plus-heuristic cost function of node x (usually denoted f(x)) to determine the order in which the search visits nodes in the tree. The cost function is a sum of two functions:
the past path-cost function, which is the known distance from the starting node to the current node x (usually denoted g(x));
a future path-cost function, which is an admissible "heuristic estimate" of the distance from x to the goal (usually denoted h(x)).
The h(x) part of the f(x) function must be an admissible heuristic; that is, it must not overestimate the distance to the goal. Thus, for an application like routing, h(x) might represent the straight-line distance to the goal, since that is physically the smallest possible distance between any two points or nodes. A practical implementation of A* search in C++ is shown below:
#include <iostream>
#include <iomanip>
#include <queue>
#include <string>
#include <cmath>
#include <ctime>
#include <cstdlib>
using namespace std;
const int n=60; // horizontal size of the map
const int m=60; // vertical size of the map
static int map[n][m];
static int closed_nodes_map[n][m]; // map of closed (tried-out) nodes
static int open_nodes_map[n][m]; // map of open (not-yet-tried) nodes
static int dir_map[n][m]; // map of directions
const int dir=8; // number of possible directions to go at any position
// if dir==4
//static int dx[dir]={1, 0, -1, 0};
//static int dy[dir]={0, 1, 0, -1};
// if dir==8
static int dx[dir]={1, 1, 0, -1, -1, -1, 0, 1};
static int dy[dir]={0, 1, 1, 1, 0, -1, -1, -1};
class node
{
// current position
int xPos;
int yPos;
// total distance already travelled to reach the node
int level;
// priority=level+remaining distance estimate
int priority; // smaller: higher priority
public:
node(int xp, int yp, int d, int p)
{xPos=xp; yPos=yp; level=d; priority=p;}
int getxPos() const {return xPos;}
int getyPos() const {return yPos;}
int getLevel() const {return level;}
int getPriority() const {return priority;}
void updatePriority(const int & xDest, const int &
yDest)
{
priority=level+estimate(xDest, yDest)*10; //A*
}
// give better priority to going straight instead of diagonally
void nextLevel(const int & i) // i: direction
{
level+=(dir==8?(i%2==0?10:14):10);
}
// Estimation function for the remaining distance to the
goal.
const int & estimate(const int & xDest, const int &
yDest) const
{
static int xd, yd, d;
xd=xDest-xPos;
yd=yDest-yPos;
// Euclidean Distance
d=static_cast<int>(sqrt(xd*xd+yd*yd));
// Manhattan distance
//d=abs(xd)+abs(yd);
// Chebyshev distance
//d=max(abs(xd), abs(yd));
return(d);
}
};
// Determine priority (in the priority queue)
bool operator<(const node & a, const node & b)
{
    return a.getPriority() > b.getPriority();
}
// A-star algorithm.
// The route returned is a string of direction digits.
string pathFind( const int & xStart, const int &
yStart,
const int & xFinish, const int & yFinish )
{
static priority_queue<node> pq[2]; // list of open (not-yet-tried) nodes
static int pqi; // pq index
static node* n0;
static node* m0;
static int i, j, x, y, xdx, ydy;
static char c;
pqi=0;
// reset the node maps
for(y=0;y<m;y++)
{
    for(x=0;x<n;x++)
    {
        closed_nodes_map[x][y]=0;
        open_nodes_map[x][y]=0;
    }
}
// create the start node and push into list of open nodes
n0=new node(xStart, yStart, 0, 0);
n0->updatePriority(xFinish, yFinish);
pq[pqi].push(*n0);
open_nodes_map[xStart][yStart]=n0->getPriority(); // mark it on the open nodes map
delete n0;
// A* search
while(!pq[pqi].empty())
{
// get the current node w/ the highest priority
// from the list of open nodes
n0=new node( pq[pqi].top().getxPos(),
pq[pqi].top().getyPos(),
pq[pqi].top().getLevel(), pq[pqi].top().getPriority());
x=n0->getxPos(); y=n0->getyPos();
pq[pqi].pop(); // remove the node from the open list
open_nodes_map[x][y]=0;
// mark it on the closed nodes map
closed_nodes_map[x][y]=1;
// quit searching when the goal state is reached
//if((*n0).estimate(xFinish, yFinish) == 0)
if(x==xFinish && y==yFinish)
{
// generate the path from finish to start
// by following the directions
string path="";
while(!(x==xStart && y==yStart))
{
j=dir_map[x][y];
c='0'+(j+dir/2)%dir;
path=c+path;
x+=dx[j];
y+=dy[j];
}
// garbage collection
delete n0;
// empty the leftover nodes
while(!pq[pqi].empty()) pq[pqi].pop();
return path;
}
// generate moves (child nodes) in all possible directions
for(i=0;i<dir;i++)
{
    xdx=x+dx[i]; ydy=y+dy[i];
    if(!(xdx<0 || xdx>n-1 || ydy<0 || ydy>m-1 || map[xdx][ydy]==1
        || closed_nodes_map[xdx][ydy]==1))
    {
        // generate a child node
        m0=new node( xdx, ydy, n0->getLevel(),
                     n0->getPriority());
m0->nextLevel(i);
m0->updatePriority(xFinish, yFinish);
// if it is not in the open list then add into that
if(open_nodes_map[xdx][ydy]==0)
{
open_nodes_map[xdx][ydy]=m0->getPriority();
pq[pqi].push(*m0);
// mark its parent node direction
dir_map[xdx][ydy]=(i+dir/2)%dir;
}
else if(open_nodes_map[xdx][ydy]>m0->getPriority())
{
// update the priority info
open_nodes_map[xdx][ydy]=m0->getPriority();
// update the parent direction info
dir_map[xdx][ydy]=(i+dir/2)%dir;
// replace the node
// by emptying one pq to the other one
// except the node to be replaced will be ignored
// and the new node will be pushed in instead
while(!(pq[pqi].top().getxPos()==xdx &&
pq[pqi].top().getyPos()==ydy))
{
pq[1-pqi].push(pq[pqi].top());
pq[pqi].pop();
}
pq[pqi].pop(); // remove the wanted node
// empty the larger size pq to the smaller one
if(pq[pqi].size()>pq[1-pqi].size()) pqi=1-pqi;
while(!pq[pqi].empty())
{
pq[1-pqi].push(pq[pqi].top());
pq[pqi].pop();
}
pqi=1-pqi;
pq[pqi].push(*m0); // add the better node instead
}
else delete m0; // garbage collection
}
}
delete n0; // garbage collection
}
return ""; // no route found
}
int main()
{
    srand(time(NULL));
    // create empty map
    for(int y=0;y<m;y++)
        for(int x=0;x<n;x++)
            map[x][y]=0;
    // place a wall with a gap across the middle of the map
    for(int x=n/8;x<n*7/8;x++) map[x][m/2]=1;
    map[n/2][m/2]=0;
    int xA=0, yA=0, xB=n-1, yB=m-1;
    cout<<"Map size (X,Y): "<<n<<","<<m<<endl;
    cout<<"Start: "<<xA<<","<<yA<<endl;
    cout<<"Finish: "<<xB<<","<<yB<<endl;
    clock_t start=clock();
    string route=pathFind(xA, yA, xB, yB);
    clock_t end=clock();
    cout<<"Time (s): "<<double(end-start)/CLOCKS_PER_SEC<<endl;
    cout<<"Route: "<<route<<endl;
    return 0;
}
EB COMPONENT:
Most data centres, by design, consume vast amounts of energy in an incongruously wasteful manner, interviews and documents show. Online companies typically run their facilities at maximum capacity around the clock, whatever the demand. As a result, data centres can waste 90 percent or more of the electricity they pull off the grid, The Times found. To guard against a power failure, they further rely on banks of generators that emit diesel exhaust. The pollution from data centres has increasingly been cited by the authorities for violating clean air regulations, documents show.
In Silicon Valley, many data centres appear on the state government's Toxic Air Contaminant Inventory, a roster of the area's top stationary diesel polluters. Worldwide, the digital warehouses use about 30 billion watts of electricity, roughly equivalent to the output of 30 nuclear power plants, according to estimates industry experts compiled for The Times. Data centres in the United States account for one-quarter to one-third of that load, the estimates show. Energy efficiency varies widely from company to company. But at the request of The Times, the consulting firm McKinsey & Company analysed energy use by data centres and found that, on average, they were using only 6 percent to 12 percent of the electricity powering their servers to perform computations. The rest was essentially used to keep servers idling and ready in case of a surge in activity that could slow or crash their operations.
A server is a sort of bulked-up desktop computer, minus a screen and keyboard, that contains chips to process data. The study sampled about 20,000 servers in about 70 large data centres spanning the commercial gamut: drug companies, military contractors, banks, media companies and government agencies.
The inefficient use of power is largely driven by a symbiotic relationship between users who demand an instantaneous response to the click of a mouse and companies that put their business at risk if they fail to meet that expectation.
Even running electricity at full throttle has not been enough to satisfy the industry. In addition to generators, most large data centres contain banks of huge spinning flywheels or thousands of lead-acid batteries, many of them similar to automobile batteries, to power the computers in case of a grid failure as brief as a few hundredths of a second, an interruption that could crash the servers.
Rapid digitization of content has led to extreme demands on storage systems. The nature of data access, such as simulation data dumps, check-pointing, real-time data access queries, data warehousing queries, etc., warrants an online data management solution. Most online data management solutions make use of hierarchical storage management techniques to accommodate the large volume of digital data. In such solutions, a major portion of the data set is usually hosted by tape-based archival solutions, which offer cheaper storage at the cost of higher access latencies. This loss in performance due to tape-based archive solutions limits the performance of the higher-level applications that make these different types of data accesses. This is particularly true since many queries may require access to older, archived data. The decreasing cost and increasing capacity of commodity disks are rapidly changing the economics of online storage and making the use of large disk arrays more practical for low-latency applications. Large disk arrays also enable system scaling, an important property as the growth of online content is predicted to be enormous. The enhanced performance offered by disk-based solutions comes at a price, however. Keeping huge arrays of spinning disks has a hidden cost: energy. Industry surveys suggest that the cost of powering our country's data centers is growing at a rate of 25% every year. Among the various components of a data center, storage is one of the biggest energy consumers, consuming almost 27% of the total.
Given the well-known growth in total cost of ownership, a solution that can mitigate the high cost of power, yet keep data online, is needed. Various studies of data access patterns in data centers suggest that on any given day, the total amount of data accessed is less than 5% of the total stored. Most energy conservation techniques make use of various optimizations to conserve energy, but this usually comes with a huge performance penalty. Massive array of idle disks (MAID) is a recently adopted design philosophy. The central idea behind MAID is that not all disks in a MAID storage array are spinning all the time. Within a MAID subsystem, disks remain dormant (i.e., powered off) until the data they hold is requested. When a request arrives for data on a disk that is off, the controller turns on the disk, which takes around 7 to 10 s, and services the request. Additionally, a set of disks is designated as cache disks, which are always spinning (i.e., never turned off).
This disk-based caching is necessary because the regular memory
cache is usually not large
enough to hold all of the frequently accessed data. The MAID
concept works on the
assumption that less than 5% of the stored data actually gets
accessed on any given day.
Keeping this in mind, the MAID controller tries to make sure
that frequently accessed data
are moved to the always-on cache disks. For this reason, the
response time of the system is
very tightly tied to the size of the cache disk set. By
increasing the cache hit ratio, the
controller tries to minimize the response time and also conserve
energy. The savings increase
as the storage environments get larger. A commercial product
based on this idea, Copan
MAID, has seen a great deal of success in the realm of archival
systems. One of the main
drawbacks with the MAID approach is that it tries to keep the
most frequently
accessed data in the cache disk set, but this will not ensure
good response time for noncached
data. Data that are not cached could include data being accessed
for the first time or data that
cannot be cached due to their sheer volume or the access
pattern. A study of using application
hints to increase the efficiency of prefetching and to achieve
better energy efficiency has been
presented. Application hinting has drawn a great amount of
interest in the high-performance
computing community. The idea is to use application hints for
the purpose of prefetching
data ahead of time, thereby reducing the file system I/O
latencies. Other approaches to
increasing energy efficiency for storage systems are possible. A
new energy conservation
technique for disk array-based network servers called popular
data concentration (PDC) was
proposed. According to this scheme, frequently accessed data are
migrated to a subset of the
disks. The main assumption here is that data exhibit heavily
clustered popularities. PDC tries
to lay data out across the disk array so that the first disk
stores the most popular data,
the second disk stores the next most popular data, and so
on. Since data blocks are always
moved around to different locations in the disk array, this
mapping mechanism becomes very
important.
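The popularity-ordered layout that PDC aims for can be sketched in a few lines; `pdc_layout` and its parameters are hypothetical names introduced here for illustration, not part of the proposed scheme's actual interface:

```python
from collections import Counter

def pdc_layout(access_log, blocks_per_disk):
    """Popular Data Concentration sketch: rank blocks by observed access
    frequency and assign the most popular group to disk 0, the next group
    to disk 1, and so on, so that trailing disks can stay idle."""
    popularity = Counter(access_log)
    layout = {}
    # most_common() yields blocks in descending order of access count.
    for rank, (block, _count) in enumerate(popularity.most_common()):
        layout[block] = rank // blocks_per_disk  # disk index for this block
    return layout
```

Because blocks migrate between disks as their popularity changes, a real implementation must also maintain the block-to-disk mapping efficiently, which is the point made above.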
A new solution called Hibernator was presented. The main idea
here is the dynamic switching of
disk speeds based on observed performance. This approach makes
use of multispeed disk drives
that can run at different speeds but have to be shut down to
make a transition between different
speeds. The general consensus of all these works is that normal
cache management algorithms are
not necessarily the best option when it comes to power
conservation. Specifically, they explore the
use of spatial and temporal locality information together in
order to develop cache replacement
algorithms. A new type of hard disk drive that can operate at
multiple speeds is also explored for
energy saving. It was demonstrated that using dynamic
revolutions per minute (DRPM) speed
control for power management in server disk arrays can provide
large savings in power
consumption with very little degradation in delivered
performance.
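As a hypothetical sketch of a DRPM-style policy (the RPM levels, wattages, and selection rule below are invented for illustration; the cited work's actual control algorithm is not reproduced here), a controller might pick the lowest spindle speed whose relative throughput still covers the observed demand:

```python
# Assumed (speed in RPM, power in watts) operating points -- illustrative only.
RPM_LEVELS = [
    (5400, 6.0),    # low speed, low power
    (7200, 9.0),
    (10000, 13.0),  # full speed, highest power
]

def select_rpm(utilization):
    """Pick the lowest spindle speed whose relative throughput covers the
    observed utilization (a fraction of full-speed demand, 0.0-1.0),
    trading some latency for lower power draw."""
    full_speed = RPM_LEVELS[-1][0]
    for rpm, watts in RPM_LEVELS:
        if rpm / full_speed >= utilization:
            return rpm, watts
    return RPM_LEVELS[-1]  # saturated: run at full speed
```

Under light load this policy keeps the disk at its lowest speed and roughly halves the power draw relative to full speed, which mirrors the reported result of large power savings with little degradation in delivered performance.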
CONCLUSION
In this report, the design of iFCP in SANs has been discussed. The implementation part covered the design of OSPF, the design of protocols using CFSMs, and a practical implementation of shortest path algorithms. Some of the basic fundamentals of Fibre Channel technology have been provided in this report, which allow us to
understand the origins of many
Storage Area Networking mechanisms. We can understand on a basic
level how Fibre Channel
devices discover one another during initialization, which
enables them to establish the lower layers
necessary for the transport of subsequent Fibre Channel
frames.
FUTURE ENHANCEMENTS
Fibre Channel is a mature networking technology that is ideally suited for SANs. FC is a gigabit technology supporting speeds up to 1 gigabit per second, with faster rates being realized in the future.
FC supports different transport media such as copper for lower
cost lower capability configurations
or fiber optics for greater speed and distance at a higher cost.
FC products support a SAN's need for reliability by incorporating self-configuring capabilities that allow reconfiguration of networks, isolation of faulty equipment, and maintenance of the network, all with minimal to no impact on SAN operations.
Clearly, the largest hindrance introduced with regard to the
transmission of Fibre Channel over
TCP/IP is the interface for the Fibre Channel hardware itself.
Research in Fibre Optic technology
seeks to reduce this problem.