Slide 1
16 September 2005
The ongoing evolution from Packet-based networks to Hybrid Networks in Research & Education Networks
Olivier Martin, CERN, NEC'2005 Conference, Varna (Bulgaria)
Slide 2
Presentation Outline
- The demise of conventional packet-based networks in the R&E community
- The advent of community-managed dark fiber networks
- The Grid & its associated Wide Area Networking challenges: on-demand Lambda Grids
- Ethernet over SONET & new standards: WAN-PHY, GFP, VCAT/LCAS, G.709, OTN
Slide 5
[Chart: Internet Backbone Speeds: T1 lines, T3 lines, OC3c, OC12c, IP/ATM VCs, in Mbps]
Slide 6
High Speed IP Network Transport Trends: higher speed, lower cost, complexity and overhead
[Diagram of layer stacks: B-ISDN; IP over ATM (IP / ATM / SONET/SDH / Optical); IP over SONET/SDH (IP / SONET/SDH / Optical); IP over Optical (IP / Optical). Multiplexing, protection and management at every layer; signalling.]
Slide 7
Slide 8
Slide 9
Slide 10
October 12, 2001, Intro to Grid Computing and Globus Toolkit, slide 10
Network Exponentials
- Network vs. computer performance: computer speed doubles every 18 months, network speed doubles every 9 months; the difference is an order of magnitude every 5 years
- 1986 to 2000: computers x 500, networks x 340,000
- 2001 to 2010: computers x 60, networks x 4,000
Moore's Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan 2001) by Cleo Vilett, source Vinod Khosla, Kleiner, Caufield and Perkins.
Slide 11
Know the user: bandwidth requirements vs. number of users (3 of 12)
[Chart: number of users as a function of required bandwidth, from ADSL to GigE LAN]
A -> Lightweight users: browsing, mailing, home use
B -> Business applications: multicast, streaming, VPNs, mostly LAN
C -> Special scientific applications: computing, data grids, virtual presence
Slide 12
What the users' bandwidth requirements are: total bandwidth per class of user (4 of 12)
A -> Need full Internet routing, one to many
B -> Need VPN services and/or full Internet routing, several to several
C -> Need very fat pipes, limited multiple Virtual Organizations, few to few
[Chart: total bandwidth per class, from ADSL to GigE LAN]
Slide 13
So what are the facts?
- Costs of fat pipes (fibers) are one third of the cost of the equipment needed to light them up; this is what lambda salesmen told Cees de Laat (University of Amsterdam & SURFnet)
- Cost of optical equipment: 10% of switching, 10% of full routing equipment for the same throughput
- A 100 byte packet @ 10 Gb/s leaves 80 ns to look it up in a 100 MByte routing table (light speed from me to you on the back row!)
- Big sciences need fat pipes
- Bottom line: create a hybrid architecture which serves all users in one coherent and cost-effective way (5 of 12)
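As a quick sanity check on the 80 ns figure above: at line rate, the lookup budget per packet is its serialization time,

```latex
t_{\text{lookup}} \;\le\; \frac{100\ \text{bytes} \times 8\ \text{bits/byte}}{10\ \text{Gb/s}}
\;=\; \frac{800\ \text{bits}}{10^{10}\ \text{bits/s}} \;=\; 80\ \text{ns}.
```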
Slide 14
[Chart: utilization trends (Gbps) over time against the network capacity limit, as of January 2005]
Slide 15
Today's hierarchical IP network
[Diagram: University -> Regional -> National or Pan-National IP Network -> other national networks (NREN A, NREN B, NREN C, NREN D)]
Slide 16
Tomorrow's peer-to-peer IP network
[Diagram: universities, regional servers and the rest of the world connected over a national DWDM network to NREN A, NREN B, NREN C and NREN D, with child lightpaths]
Slide 17
Creation of application VPNs
[Diagram: universities, CERN and a departmental research network interconnected via the commodity Internet and discipline-specific networks (Bio-informatics Network, High Energy Physics Network, eVLBI Network); the direct connection bypasses the campus firewall]
Slide 18
Production vs Research Campus Networks
- Increasingly, campuses are deploying parallel networks for high-end users
- Reduces costs by providing high-end network capability only to those who need it
- Limitations of the campus firewall and border router are eliminated
- Many issues with regard to security, back-door routing, etc.
- Campus networks may follow the same evolution as campus computing
- Discipline-specific networks are being extended into the campus
Slide 19
UCLP is intended for projects like National LambdaRail: CAVEwave acquires a separate wavelength between Seattle and Chicago and wants to manage it as part of its own network, including add/drop, routing, partitioning, etc.
[Diagram: NLR condominium lambda network and the original CAVEwave]
Slide 20
GEANT2 POP Design
Slide 21
UltraLight Optical Exchange Point
- L1, L2 and L3 services
- Interfaces: 1GE and 10GE; 10GE WAN-PHY (SONET friendly)
- Hybrid packet- and circuit-switched PoP; interface between packet- and circuit-switched networks
- Control plane is L3
- Photonic switch: Calient or Glimmerglass photonic cross-connect switch
Slide 22
LHC Data Grid Hierarchy
[Diagram: the experiment's Online System (~PByte/sec raw data) feeds the CERN Tier 0+1 centre (700k SI95, ~1 PB disk, tape robot) at ~100-400 MBytes/sec; Tier 1 centres (FNAL: 200k SI95, 600 TB; IN2P3, INFN, RAL) connected at 2.5/10 Gbps; Tier 2 centres at ~2.5 Gbps; Tier 3 institutes (~0.25 TIPS, with a physics data cache) and Tier 4 workstations at 0.1-1 Gbps. Physicists work on analysis channels; each institute has ~10 physicists working on one or more channels. CERN/Outside resource ratio ~1:2; Tier0 : (sum of Tier1) : (sum of Tier2) ~ 1:1:1.]
Slide 23
Deploying the LHC Grid
[Diagram: CERN Tier 0, the LHC Computing Centre, linked to Tier 1 centres (Germany, USA, UK, France, Italy, Taipei?, Japan, and the CERN Tier 1); Tier 2 centres (Lab a, Uni a, Lab c, Uni n, Lab m, Lab b, Uni b, Uni y, Uni x); and Tier 3 physics department desktops; showing a grid for a physics study group and a grid for a regional group. [email protected]]
Slide 24
What you get
[Diagram: the same tier hierarchy (CERN Tier 0, Tier 1 centres in Germany, USA, UK, France, Italy, Japan and at CERN, Tier 2 labs and universities, the Tier 3 physics department) as seen from an individual physicist's desktop. [email protected]]
Slide 25
Main Networking Challenges
- Fulfill the, as yet unproven, assertion that the network can be nearly transparent to the Grid
- Deploy suitable Wide Area Network infrastructure (50-100 Gb/s)
- Deploy suitable Local Area Network infrastructure (matching or exceeding that of the WAN)
- Seamless interconnection of LAN & WAN infrastructures (firewall?)
- End-to-end issues (transport protocols, PCs (Itanium, Xeon), 10GigE NICs (Intel, S2io)); where are we today:
  - memory to memory: 7.5 Gb/s (PCI bus limit)
  - memory to disk: 1.2 GByte/s (Windows 2003 Server/NewiSys)
  - disk to disk: 400 MByte/s (Linux), 600 MByte/s (Windows)
Slide 26
Main TCP issues
- Does not scale to some environments: high speed, high latency, noisy
- Unfair behaviour with respect to: Round Trip Time (RTT), frame size (MSS), access bandwidth
- Widespread use of multiple streams to compensate for inherent TCP/IP limitations (e.g. GridFTP, bbFTP): a bandage rather than a cure
- New TCP/IP proposals aim to restore performance in single-stream environments; it is not clear if/when they will have a real impact
- In the meantime there is an absolute requirement for backbones with zero packet losses and no packet re-ordering, which reinforces the case for lambda Grids
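The RTT and MSS unfairness listed above can be made concrete with the well-known Mathis et al. approximation for steady-state TCP throughput under random loss; the formula is not on the slide and is added here only as an illustrative sketch:

```latex
\text{Throughput} \;\approx\; \frac{MSS}{RTT} \cdot \frac{1.22}{\sqrt{p}}
```

where p is the packet-loss probability: achievable throughput falls linearly with RTT and grows linearly with MSS, so long-RTT flows with small frames lose out to short-RTT flows sharing the same bottleneck.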
Slide 27
TCP dynamics (10 Gbps, 100 ms RTT, 1500 byte packets)
- Window size (W) = Bandwidth * Round Trip Time: W_bits = 10 Gbps * 100 ms = 1 Gb; W_packets = 1 Gb / (8 * 1500) = 83,333 packets
- Standard Additive Increase Multiplicative Decrease (AIMD) mechanisms: W = W/2 (halving the congestion window on a loss event); W = W + 1 (increasing the congestion window by one packet every RTT)
- Time to recover from W/2 to W (congestion avoidance) at 1 packet per RTT: RTT * W_packets/2 = 1.157 hours; in practice, 1 packet per 2 RTTs because of delayed ACKs, i.e. 2.31 hours
- Packets per second: W_packets / RTT = 833,333 packets/s
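A minimal Python sketch of the same arithmetic, assuming the slide's parameters (10 Gb/s, 100 ms RTT, 1500 byte packets) and the textbook AIMD recovery model; it simply restates the calculation above, it is not a simulation of real TCP:

```python
# Back-of-the-envelope AIMD recovery time, following the slide's numbers.

def aimd_recovery(bandwidth_bps, rtt_s, packet_bytes, rtts_per_increment=1):
    """Time to grow the congestion window back from W/2 to W,
    adding one packet every `rtts_per_increment` round trips."""
    window_bits = bandwidth_bps * rtt_s                 # bandwidth-delay product
    window_packets = window_bits / (8 * packet_bytes)   # window expressed in packets
    recovery_s = (window_packets / 2) * rtt_s * rtts_per_increment
    return window_packets, recovery_s

w_pkts, t = aimd_recovery(10e9, 0.100, 1500)
print(f"window   = {w_pkts:,.0f} packets")            # ~83,333 packets
print(f"rate     = {w_pkts / 0.100:,.0f} packets/s")  # ~833,333 packets/s
print(f"recovery = {t / 3600:.2f} h")                 # ~1.16 h (2.31 h with delayed ACKs)
```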
Slide 28
Single TCP stream performance under periodic losses
- Available bandwidth = 1 Gbps; with a loss rate of 0.01%: LAN bandwidth utilization = 99%, WAN bandwidth utilization = 1.2%
- TCP throughput is much more sensitive to packet loss in WANs than in LANs
- TCP's congestion control algorithm (AIMD) is not well suited to gigabit networks; the effect of packet loss can be disastrous
- TCP is inefficient in high bandwidth*delay networks
- The future performance outlook for computational grids looks bad if we continue to rely solely on the widely deployed TCP Reno
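Plugging the Mathis approximation quoted earlier into this scenario reproduces the order of magnitude of these utilization figures; the RTT values used here (about 1 ms for the LAN, roughly 120 ms for a transatlantic WAN path) are assumptions for illustration, not taken from the slide:

```latex
\text{WAN: } \frac{1500 \times 8\ \text{bits}}{0.12\ \text{s}} \cdot \frac{1.22}{\sqrt{10^{-4}}}
\;\approx\; 12\ \text{Mb/s} \;\approx\; 1.2\% \text{ of } 1\ \text{Gb/s};
\qquad
\text{LAN (RTT} \approx 1\ \text{ms): the same formula exceeds } 1\ \text{Gb/s, so the link itself is the limit.}
```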
Slide 29
Responsiveness

Path                   Bandwidth   RTT (ms)   MTU (Byte)   Time to recover
LAN                    10 Gb/s     1          1500         430 ms
Geneva - Chicago       10 Gb/s     120        1500         1 hr 32 min
Geneva - Los Angeles   1 Gb/s      180        1500         23 min
Geneva - Los Angeles   10 Gb/s     180        1500         3 hr 51 min
Geneva - Los Angeles   10 Gb/s     180        9000         38 min
Geneva - Los Angeles   10 Gb/s     180        64k (TSO)    5 min
Geneva - Tokyo         1 Gb/s      300        1500         1 hr 04 min

- A large MTU accelerates the growth of the window; the time to recover from a packet loss decreases with a large MTU
- A larger MTU also reduces the per-frame overhead (saves CPU cycles, reduces the number of packets)

Time to recover from a single packet loss: T = (C * RTT^2) / (2 * MSS), where C is the capacity of the link.
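For convenience, a small Python check of the recovery-time formula above; it reproduces the table values to within a few percent (the remaining differences presumably come from MSS vs. MTU and delayed-ACK details, which the slide does not spell out):

```python
# Recovery time after a single loss: T = C * RTT^2 / (2 * MSS), with MSS in bits.

def recovery_time_s(capacity_bps, rtt_s, mtu_bytes):
    return capacity_bps * rtt_s ** 2 / (2 * 8 * mtu_bytes)

paths = [
    ("LAN",                  10e9, 0.001, 1500),
    ("Geneva - Chicago",     10e9, 0.120, 1500),
    ("Geneva - Los Angeles",  1e9, 0.180, 1500),
    ("Geneva - Los Angeles", 10e9, 0.180, 9000),
]
for name, cap, rtt, mtu in paths:
    t = recovery_time_s(cap, rtt, mtu)
    print(f"{name:22s} {t:10.1f} s  (~{t / 60:.1f} min)")
```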
Slide 30
Internet2 Land Speed Record history (IPv4 & IPv6), period 2000-2004
Slide 31
Layer 1/2/3 networking (1)
- Conventional layer 3 technology is no longer fashionable because of: high associated costs (e.g. 200-300 kUSD for a 10G router interface) and the implied use of shared backbones
- The use of layer 1 or layer 2 technology is very attractive because it helps to solve a number of problems, e.g.: the 1500 byte Ethernet frame size (layer 1); protocol transparency (layers 1 & 2); minimum functionality and hence, in theory, much lower costs (layers 1 & 2)
Slide 32
Layer 1/2/3 networking (2)
On-demand Lambda Grids are becoming very popular:
- Pros: a circuit-oriented model like the telephone network, hence no need for complex transport protocols; lower equipment costs (in theory a factor of 2 or 3 per layer); the concept of a dedicated end-to-end lightpath is very elegant
- Cons: "end to end" is still very loosely defined (site to site, cluster to cluster, or really host to host); higher circuit costs; scalability; additional middleware to deal with circuit set-up/tear-down, etc.; extending dynamic VLAN functionality is a potential nightmare!
Slide 33
Lambda Grids: what does it mean?
- Clearly different things to different people, hence the apparently easy consensus!
- Conservatively, on-demand site-to-site connectivity:
  - Where is the innovation? What does it solve in terms of transport protocols?
  - Where are the savings? Fewer interfaces needed (customer), but more standby/idle circuits needed (provider)
  - Economics from the service provider vs the customer perspective? Traditionally, switched services have been very expensive (usage vs flat charge; break-even between switched and leased at a few hours/day). Why would this change?
  - In case there are no savings, why bother?
- More advanced: cluster to cluster. Implies even more active circuits in parallel. Is it realistic?
- Even more advanced: host to host, or even per flow, all optical. Is it really realistic?
Slide 34
Some Challenges
- Real bandwidth estimates, given the chaotic nature of the requirements
- End-to-end performance, given the whole chain involved (disk-bus-memory-bus-network-bus-memory-bus-disk)
- Provisioning over complex network infrastructures (GEANT, NRENs, etc.)
- Cost model for the options (packet + SLAs, circuit switched, etc.)
- Consistent performance (dealing with firewalls)
- Merging leading-edge research with production networking
Slide 35
Tentative conclusions
- There is a very clear trend towards community-managed dark fiber networks
- As a consequence, National Research & Education Networks are evolving into telecom operators; is that right? In the short term, almost certainly YES; in the longer term, probably NO. In many countries there is NO other way to have affordable access to multi-Gbit/s networks, therefore this is clearly the right move
- The Grid & its associated Wide Area Networking challenges: on-demand Lambda Grids are, in my view, extremely doubtful!
- Ethernet over SONET & new standards will revolutionize the Internet: WAN-PHY (IEEE) has, in my view, NO future! However, GFP, VCAT/LCAS, G.709 and OTN are very likely to have a very bright future.
Slide 36
Single TCP stream between Caltech and CERN
- Available (PCI-X) bandwidth = 8.5 Gbps
- RTT = 250 ms (16,000 km)
- 9000 byte MTU
- 15 minutes to increase throughput from 3 to 6 Gbps
- Sending station: Tyan S2882 motherboard, 2x Opteron 2.4 GHz, 2 GB DDR
- Receiving station (CERN OpenLab): HP rx4640, 4x 1.5 GHz Itanium-2, zx1 chipset, 8 GB memory
- Network adapter: S2io 10 GbE
[Plot annotations: burst of packet losses; single packet loss; CPU load = 100%]
Slide 37
High Throughput Disk-to-Disk Transfers: From 0.1 to 1 GByte/sec
- Server hardware (rather than network) bottlenecks: write/read and transmit tasks share the same limited resources: CPU, PCI-X bus, memory, I/O chipset
- PCI-X bus bandwidth: 8.5 Gbps [133 MHz x 64 bit]
- Link aggregation (802.3ad): a logical interface with two physical interfaces on two independent PCI-X buses; LAN test: 11.1 Gbps (memory to memory)
- Performance in this range (from 100 MByte/sec up to 1 GByte/sec) is required to build a responsive Grid-based Processing and Analysis System for LHC
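As a quick check of the bracketed PCI-X figure above, bus width times clock rate gives the raw bus bandwidth:

```latex
64\ \text{bit} \times 133\ \text{MHz} \;=\; 8.512\ \text{Gb/s} \;\approx\; 8.5\ \text{Gbps}.
```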
Slide 38
Transferring a TB from Caltech to CERN in 64-bit MS Windows
- Latest disk-to-disk result over a 10 Gbps WAN: 4.3 Gbits/sec (536 MB/sec), 8 TCP streams from CERN to Caltech, 1 TB file
- 3 Supermicro Marvell SATA disk controllers + 24 SATA 7200 rpm disks
- Local disk I/O: 9.6 Gbits/sec (1.2 GBytes/sec read/write, with