1 Dependability in the Internet Era
Mar 26, 2015
2
Outline
• The glorious past (Availability Progress)
• The dark ages (current scene)
• Some recommendations
3
Preview
The Last 5 Years: Availability Dark Ages. Ready for a Renaissance?
• Things got better, then things got a lot worse!
[Figure: availability (axis ticks 9%, 99%, 99.9%, 99.99%, 99.999%) plotted against year, 1950 to 2000, with curves for computer systems, telephone systems, cellphones, and the Internet.]
4
DEPENDABILITY: The 3 ITIES
• RELIABILITY / INTEGRITY: Does the right thing. (also MTTF >> 1)
• AVAILABILITY: Does it now. (also 1 >> MTTR)
Availability = MTTF / (MTTF + MTTR)
• System Availability: If 90% of terminals up & 99% of DB up?
(=> 89% of transactions are serviced on time).
• Holistic vs. Reductionist view
[Diagram labels: Security, Integrity, Reliability, Availability]
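The 89% figure on this slide comes from multiplying the availabilities of components that must all be up at once. A minimal sketch, assuming the terminal and database failures are independent (the function name is illustrative, not from the talk):

```python
from math import prod

def series_availability(component_availabilities):
    """Availability of a system that needs every component up.

    Assumes component failures are independent, so the
    probabilities of being up multiply.
    """
    return prod(component_availabilities)

# The slide's example: 90% of terminals up, 99% of the DB up.
system = series_availability([0.90, 0.99])
print(f"{system:.1%}")  # 89.1%
```

Every extra component in the path can only lower the product, which is the holistic point of the slide.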
5
Fail-Fast is Good, Repair is Needed
Improving either MTTR or MTTF gives benefit
Simple redundancy does not help much.
[Diagram: lifecycle of a module: Fault, Detect, Repair, Return]
Lifecycle of a module: fail-fast gives short fault latency.
High Availability is low UN-Availability.
Unavailability ~ MTTR / MTTF
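To see why improving either MTTR or MTTF helps, the slide's approximation (unavailability is roughly MTTR divided by MTTF) can be compared with the exact ratio. The module below is hypothetical: one failure a year and a 4-hour repair are assumed values chosen only for illustration.

```python
def unavailability(mttf_hours, mttr_hours):
    """Exact fraction of time a repairable module is down."""
    return mttr_hours / (mttf_hours + mttr_hours)

def unavailability_approx(mttf_hours, mttr_hours):
    """The slide's approximation, reasonable when MTTF >> MTTR."""
    return mttr_hours / mttf_hours

# Hypothetical module: fails once a year, takes 4 hours to repair.
base = unavailability(8760, 4)
half_repair = unavailability(8760, 2)    # halve MTTR
double_life = unavailability(17520, 4)   # double MTTF

print(f"base        {base:.6f}")
print(f"halve MTTR  {half_repair:.6f}")
print(f"double MTTF {double_life:.6f}")
```

Halving the repair time and doubling the time between failures cut unavailability by nearly the same factor, so the cheaper of the two is usually the better investment.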
6
Fault Model
• Failures are independent
So, single fault tolerance is a big win
• Hardware fails fast (dead disk, blue-screen)
• Software fails fast (or goes to sleep)
• Software often repaired by reboot:
– Heisenbugs
• Operations tasks: major source of outage
– Utility operations
– Software upgrades
7
Disks (RAID): the BIG Success Story
• Duplex or Parity: masks faults
• Disks @ 1M hours (~100 years)
• But
– controllers fail and
– have 1,000s of disks.
• Duplexing or parity, and dual path gives “perfect disks”
• Wal-Mart never lost a byte (thousands of disks, hundreds of failures).
• Only software/operations mistakes are left.
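The arithmetic behind this slide can be sketched as follows. The 1M-hour disk MTTF and the 1,000-disk farm come from the slide; the 24-hour repair time is an assumption, and the pair formula is the usual approximation for independent failures with MTTF much larger than MTTR, not something the slide states.

```python
HOURS_PER_YEAR = 8760

def farm_failures_per_year(n_disks, disk_mttf_hours):
    """Expected disk failures per year across a farm of identical disks."""
    return n_disks * HOURS_PER_YEAR / disk_mttf_hours

def mirrored_pair_mttf(disk_mttf_hours, repair_hours):
    """Approximate MTTF of a duplexed pair.

    Data is lost only if the second disk fails while the first is
    still being repaired. Assumes independent failures, MTTF >> MTTR.
    """
    return disk_mttf_hours ** 2 / (2 * repair_hours)

disk_mttf = 1_000_000   # from the slide
repair = 24             # assumed: one day to swap and rebuild

failures = farm_failures_per_year(1_000, disk_mttf)
pair_years = mirrored_pair_mttf(disk_mttf, repair) / HOURS_PER_YEAR

print(f"{failures:.2f} disk failures per year in a 1,000-disk farm")
print(f"{pair_years:,.0f} years MTTF for one mirrored pair")
```

A large farm sees disk failures routinely even with excellent disks, yet a repaired mirror makes data loss from disk faults alone extremely rare.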
8
Fault Tolerance vs Disaster Tolerance
• Fault-Tolerance: mask local faults
– RAID disks
– Uninterruptible Power Supplies
– Cluster Failover
• Disaster Tolerance: masks site failures
– Protects against fire, flood, sabotage, ...
– Redundant system and service at remote site.
9
Case Study - Japan
"Survey on Computer Security", Japan Info Dev Corp., March 1986. (trans: Eiichi Watanabe).
1,383 institutions reported (6/84 - 7/85)
7,517 outages, MTTF ~ 10 weeks, avg duration ~ 90 MINUTES

Primary cause                     MTTF         % of outages
Vendor (hardware and software)    5 months     42%
Application software              9 months     25%
Communications lines              1.5 years    12%
Operations                        2 years      9.3%
Environment                       2 years      11.2%
Overall                           10 weeks

To Get 10 Year MTTF, Must Attack All These Areas
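The overall 10-week MTTF can be checked from the per-cause MTTFs, since failure rates add across independent sources. This is a rough sketch using the slide's rounded figures and a 52-week year; it is a consistency check, not the survey's own method.

```python
# MTTF per outage source, in years, as reported on the slide.
mttf_years = {
    "vendor (hardware and software)": 5 / 12,
    "application software": 9 / 12,
    "communications lines": 1.5,
    "operations": 2.0,
    "environment": 2.0,
}

# Failure rates (outages per year) add across independent sources.
rates = {cause: 1 / mttf for cause, mttf in mttf_years.items()}
total_rate = sum(rates.values())

overall_mttf_weeks = 52 / total_rate
print(f"overall MTTF ~ {overall_mttf_weeks:.1f} weeks")

# Share of outages each source would contribute under this model.
for cause, rate in rates.items():
    print(f"{cause:32s} {rate / total_rate:5.1%}")
```

The computed shares land near the reported percentages rather than on them, which is expected given the rounded MTTFs. The calculation also shows why every area must be attacked: removing any single cause leaves the summed rate of the others far above one outage per ten years.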
10
Case Studies - Tandem Trends
• MTTF improved
• Shift from Hardware & Maintenance (from 50% to 10%) to Software (62%) & Operations (15%)
• NOTE: Systematic under-reporting of
– Environment
– Operations errors
– Application Software
[Figure: two charts for 1985, 1987, and 1989: outages per 1000 system years by primary cause, and % of outages by primary cause. Legend: unknown, environment, operations, maintenance, hardware, software.]
11
Dependability Status circa 1995
• ~4-year MTTF => 5 9s for well-managed sys.
Fault Tolerance Works.
• Hardware is GREAT (maintenance and MTTF).
• Software masks most hardware faults.
• Many hidden software outages in operations:
– New Software.
– Utilities.
• Make all hardware/software changes ONLINE.
• Software seems to define a 30-year MTTF ceiling.
• Reasonable Goal: 100-year MTTF.
class 4 today => class 6 tomorrow.
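The "5 9s" and "class" figures can be related to MTTF and MTTR with a short calculation. This sketch assumes the usual reading of availability class as the number of leading nines; the helper names are illustrative.

```python
import math

def availability_class(unavailability):
    """Number of leading nines: class 5 means at most 1e-5 unavailable."""
    # Small epsilon guards against floating-point error at exact powers of ten.
    return math.floor(-math.log10(unavailability) + 1e-9)

def max_mttr_minutes(mttf_years, target_class):
    """Largest MTTR (minutes) that still reaches the target class."""
    unavail = 10 ** -target_class
    mttf_minutes = mttf_years * 365 * 24 * 60
    # Solve unavail = MTTR / (MTTF + MTTR) for MTTR.
    return unavail * mttf_minutes / (1 - unavail)

print(availability_class(2e-5))           # a system down 0.002% of the time
print(round(max_mttr_minutes(4, 5), 1))   # repair budget per failure, minutes
```

Under these assumptions a 4-year MTTF reaches five nines only if each outage is repaired in roughly twenty minutes, which is why repair time matters as much as failure rate.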
12
What’s Happened Since Then?
• Hardware got better
• Software got better (even though it is more complex)
• RAID is standard, Snapshots coming standard
• Cluster in a box: commodity failover
• Remote replication is standard.
13
Availability
[Figure: availability axis rising from un-managed systems through the three well-managed levels below.]
• Well-managed nodes: masks some hardware failures
• Well-managed packs & clones: masks hardware failures and operations tasks (e.g. software upgrades); masks some software failures
• Well-managed GeoPlex: masks site failures (power, network, fire, move,…); masks some operations failures
14
Outline
• The glorious past (Availability Progress)
• The dark ages (current scene)
• Some recommendations
15
Progress?
• MTTF improved from 1950-1995
• MTTR has not improved much since 1970 failover
• Hardware and Software online change (PnP) is now standard
• Then the Internet arrived:
– No project can take more than 3 months.
– Time to market is everything
– Change is good.
16
The Internet Changed Expectations
1990
Phones delivered 99.999%
ATMs delivered 99.99%
Failures were front-page news.
Few hackers
Outages last an “hour”
2000
Cellphones deliver 90%
Web sites deliver 98%
Failures are business-page news
Many hackers.
Outages last a “day”
This is progress?
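Translating these percentages into downtime per year makes the gap concrete. A minimal sketch, assuming a 365-day year and taking each figure at face value:

```python
MINUTES_PER_YEAR = 365 * 24 * 60

def downtime_minutes_per_year(availability_percent):
    """Expected minutes of downtime per year at a given availability."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR

for label, pct in [
    ("phones, 1990", 99.999),
    ("ATMs, 1990", 99.99),
    ("web sites, 2000", 98.0),
    ("cellphones, 2000", 90.0),
]:
    minutes = downtime_minutes_per_year(pct)
    days = minutes / 1440
    print(f"{label:18s} {pct:7.3f}%  {minutes:10.1f} min/year  ({days:6.2f} days)")
```

Five nines is about five minutes a year; 98% is about a week, and 90% is more than a month.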
17
Why (1) Complexity
• Internet sites are MUCH more complex.
– NAP
– Firewall/proxy/ipsprayer
– Web
– DMZ
– App server
– DB server
– Links to other sites
– tcp/http/html/dhtml/dom/xml/com/corba/cgi/sql/fs/os…
• Skill level is much reduced
18
One of the Data Centers (500 servers)
[Figure: Canyon Park Data Center, Microsoft.com Network Diagram. Cisco 7000 routers, Catalyst 5000 and Catalyst 2926 switches (HSRP), and ASX-1000 switches front farms of IIS web servers for the www, search, register, support, windows, msdn, download, and other microsoft.com sites, backed by live, backup, consolidator, and miscellaneous SQL servers, plus stagers, build, PPTP / terminal, and monitoring servers.]
Drawn by: Matt Groshong. Last Updated: April 12, 2000.
IP addresses removed by Jim Gray to protect security.

Microsoft.com Server Count
FTP                   6
Build Servers         32
IIS                   210
Application           2
Exchange              24
Network/Monitoring    12
SQL                   120
Search                2
NetShow               3
NNTP                  16
SMTP                  6
Stagers               26
Total                 459
19
A Schematic of HotMail
• ~7,000 servers
• 100 backend stores with 120TB (cooked)
• 3 data centers
• Links to
– Passport
– Ad-rotator
– Internet Mail gateways
– …
• ~1B messages per day
• 150M mailboxes, 100M active
• ~400,000 new per day.
[Diagram labels: Internet, Switched Ethernet, Telnet Management, Local Directors, MSERVS (Front Doors, Incoming Mail Servers, AD Servers, Graphics Servers, Login Servers), Member Directory, USTORES (Data).]
20
Why (2) Velocity
• No project can take more than 13 weeks.
• Time to market is everything
• Functionality is everything
• Faster, cheaper, badder
[Diagram labels: Schedule, Quality, Functionality; trend]
21
Why (3) Hackers
• Hackers are a new, increased threat
• Any site can be attacked from anywhere
• Motives include ego, malice, and greed.
• Complexity makes it hard to protect sites.
• Concentration of wealth makes attractive target:
– Why did you rob banks?
– Willie Sutton: Cause that’s where the money is!
Note: Eric Raymond’s How to Become a Hacker http://www.tuxedo.org/~esr/faqs/hacker-howto.html is the positive use of the term; here I mean malicious and anti-social hackers.
22
How Bad Is It?
http://www-iepm.slac.stanford.edu/
Connectivity is poor.
23
How Bad Is It?
• Median monthly % ping packet loss for 2/99
http://www-iepm.slac.stanford.edu/pinger/
24
Microsoft.Com
• Operations mis-configured a router
• Took a day to diagnose and repair.
• DOS attacks cost a fraction of a day.
• Regular security patches.
25
BackEnd Servers are More Stable
• Generally deliver 99.99%
• TerraServer, for example: single back-end failed after 2.5 y.
• Went to 4-node cluster
• Fails every 2 mo.
Transparent failover in 30 sec.
Online software upgrades
So… 99.999% in backend…

Year 1                    Time           %
Total Up Time             8754:07:22     99.93%
Total Down Time           5:52:38        0.07%
Total Time                8760:00:00     100.00%
Scheduled Down            2:50:45
Scheduled Availability    8757:09:15     99.97%
Un-Scheduled Down         3:01:53

Through 18 Months         Time           %
Up Time                   12888:21:49    99.519%
Scheduled Down            4:00:25        0.031%
Unscheduled Down          58:20:46       0.451%
Total Time                12950:43:00    99.52%
Total Down                62:21:11       0.48%

Down 30 hours in July (hardware stop, auto restart failed, operations failure)
Down 26 hours in September (Backplane failure, I/O Bus failure)
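The uptime percentages and the cluster's five-nines claim can both be checked from the numbers on this slide. The sketch below assumes 30-day months for "every 2 mo." and that the 30-second failover is the only downtime per failure; both are simplifications, and the helper names are illustrative.

```python
def to_seconds(hms):
    """Convert an 'H:MM:SS' duration (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def percent(part, whole):
    """Share of `whole` taken by `part`, both given as 'H:MM:SS'."""
    return 100 * to_seconds(part) / to_seconds(whole)

# Figures copied from the slide's uptime tables.
year1_up = percent("8754:07:22", "8760:00:00")
months18_up = percent("12888:21:49", "12950:43:00")
print(f"year 1 up:     {year1_up:.2f}%")
print(f"18 months up:  {months18_up:.3f}%")

# Cluster claim: a failure every ~2 months, masked by a 30-second failover.
mttf_s = 60 * 24 * 3600   # assumed: 60 days between failures
mttr_s = 30
cluster_availability = mttf_s / (mttf_s + mttr_s)
print(f"cluster:       {cluster_availability:.5%}")
```

Two long outages account for most of the 18-month downtime, while thirty seconds of failover every two months stays within a five-nines budget.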
26
eBay: A very honest site
• Publishes operations log.
• Has 99% of scheduled uptime
• Schedules about 2 hours/week down.
• Has had some operations outages
• Has had some DOS problems.
http://www2.ebay.com/aw/announce.shtml#top
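Combining the two eBay figures gives a rough overall availability. This sketch assumes "99% of scheduled uptime" means the site is up 99% of the time it is scheduled to be up; that reading is an interpretation of the slide, not something it spells out.

```python
HOURS_PER_WEEK = 7 * 24

scheduled_down_hours = 2        # from the slide
uptime_when_scheduled = 0.99    # from the slide

# Fraction of the week the site is scheduled to be up.
scheduled_fraction = 1 - scheduled_down_hours / HOURS_PER_WEEK

# Assumed reading: the 99% is measured against scheduled time only.
overall = scheduled_fraction * uptime_when_scheduled
print(f"scheduled: {scheduled_fraction:.2%}, overall: {overall:.2%}")
```

Under that reading the site is available a little under 98% of all hours, in line with the "Web sites deliver 98%" figure quoted earlier in the talk.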
27
Outline
• The glorious past (Availability Progress)
• The dark ages (current scene)
• Some recommendations
28
Not to throw stones but…
• Everyone has a serious problem.
• The BEST people publish their stats.
• The others HIDE their stats (check Netcraft to see who I mean).
• We have good NODE-level availability: 5-9s is reasonable.
• We have TERRIBLE system-level availability: 2-9s is the goal.
29
Recommendation #1
• Continue progress on back-ends.
– Make management easier (AUTOMATE IT!!!)
– Measure
– Compare best practices
– Continue to look for better algorithms.
• Live in fear
– We are at 10,000 node servers
– We are headed for 1,000,000 node servers
30
Recommendation #2
• Current security approach is unworkable:
– Anonymous clients
– Firewall is clueless
– Incredible complexity
• We can’t win this game!
• So change the rules (redefine the problem):
– No anonymity
– Unified authentication/authorization model
– Single-function devices (with simple interfaces)
– Only one kind of interface (uddi/wsdl/soap/…).
31
References
Adams, E. (1984). “Optimizing Preventative Service of Software Products.” IBM Journal of Research and Development. 28(1): 2-14.
Anderson, T. and B. Randell. (1979). Computing Systems Reliability.
Garcia-Molina, H. and C. A. Polyzois. (1990). Issues in Disaster Recovery. 35th IEEE Compcon 90. 573-577.
Gray, J. (1986). Why Do Computers Stop and What Can We Do About It. 5th Symposium on Reliability in Distributed Software and Database Systems. 3-12.
Gray, J. (1990). “A Census of Tandem System Availability between 1985 and 1990.” IEEE Transactions on Reliability. 39(4): 409-418.
Gray, J. N., Reuter, A. (1993). Transaction Processing Concepts and Techniques. San Mateo, Morgan Kaufmann.
Lampson, B. W. (1981). Atomic Transactions. Distributed Systems -- Architecture and Implementation: An Advanced Course. ACM, Springer-Verlag.
Laprie, J. C. (1985). Dependable Computing and Fault Tolerance: Concepts and Terminology. 15’th FTCS. 2-11.
Long, D.D., J. L. Carroll, and C.J. Park (1991). A study of the reliability of Internet sites. Proc 10’th Symposium on Reliable Distributed Systems, pp. 177-186, Pisa, September 1991.
Long, D., A. Muir, and R. Golding (1995). “A Longitudinal Study of Internet Host Reliability.” Proceedings of the Symposium on Reliable Distributed Systems, Bad Neuenahr, Germany: IEEE, September 1995, pp. 2-9.
http://www.netcraft.com/ They have even better for-fee data as well, but for-free is really excellent.
http://www2.ebay.com/aw/announce.shtml#top eBay is an excellent benchmark of best Internet practices.
http://www-iepm.slac.stanford.edu/pinger/ Network traffic/quality report, dated, but the others have died off!