Operational Experiences from the Viewpoint of University IT System Administrators in the Metropolitan Area on East Japan Great Earthquake. Kohichi Ogawa and Noriaki Yoshiura, Information Technology Center, Saitama University. ACM SIGUCCS 2012 Service & Support Conference. Friday, October 19, 2012
Transcript
1
Operational Experiences from the Viewpoint of University IT System Administrators
in the Metropolitan Area on East Japan Great Earthquake
Kohichi Ogawa and Noriaki Yoshiura
Information Technology Center, Saitama University
ACM SIGUCCS 2012 Service & Support Conference
Friday, October 19, 12
2
Great Earthquake and Great Tsunami
3
Location of Earthquake and our University
[Map labels: damaged areas; epicenter; Tokyo; Saitama University; about 130 miles]
4
Topics of the presentation
• Energy Problems Caused by the Earthquake
• Some Troubles in the Rolling Blackouts
• Relocation to Data Center and VPS
• Lessons and Experiences
5
1. Introduction
  – System at the earthquake
2. Situation after the Earthquake
  – Immediately after the Earthquake
  – Operation for rolling power outage
  – Impact of rolling power outage
3. Countermeasures against this situation
  – Data Center
  – VPS
4. Effectiveness by countermeasures
5. New System after the earthquake
6. Lessons and Approaches
7
System at the disaster (2007-2011)
• Network System
  – L3 switches x 6
  – Wi-Fi access points x 80
• Server System
  – About 40 servers
• Hosting Services
  – Web Hosting Service (200 sites)
  – DNS Hosting Service (100 zones)
  – Mail Hosting Service (40 subdomains)
• Housing Service
  – Rented server room space for other organizations in the university
8
Network Topology
• Star topology network
• One-to-one connection from each lab to the server room
• No network switch between each room and the server room
10
Situation immediately after the Great Earthquake
• Seismic intensity 5-lower (Japanese seismic intensity scale) at Saitama University
• No direct damage such as collapsed buildings
• Information Infrastructure
  – No optical fiber cut in the server room
  – No trouble in network equipment or servers
11
The Rolling Blackouts
• Damaged nuclear power plant → the electricity supply was reduced
• The government announced the implementation of rolling blackouts.
• Regions were divided into 5 groups
• Saitama University was in the 4th group
• Blackouts lasted about 4 hours at a time
12
Impacts of rolling blackouts
[Figure labels: Groups of the Rolling Blackouts; Electricity Place by the Rolling Blackouts]
13
Countermeasures against the disasters
• Information infrastructure during the rolling blackouts
  – To support the activities of the university through email and web servers
• Rented power generator
• Switching to the emergency power supply
  – Requires manpower
14
Practical use of Rented Power Generator
[Photo labels: Rented Power Generator; Temporary Power Connection Board]
15
Schedule for the rolling blackouts
[Table: columns are the dates 3/14 (Mon), 3/15 (Tue), 3/16 (Wed), 3/17 (Thu), 3/18 (Fri), 3/19 (Sat), 3/20 (Sun), 3/21 (Mon), 3/22 (Tue), 3/23 (Wed); rows are time bands starting at 0:00, 6:00, 12:00 and 18:00. Cells are marked "Wait" or give a blackout window. Windows shown: 6:20~10:00, 9:20~12:30, 13:50~17:30, 15:20~18:40, 15:50~18:45, 18:20~21:00, 18:50~21:45.]
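A minimal sketch of how an operations team might check a time of day against blackout windows like those in the schedule above. The two windows use time ranges printed on the slide, but placing them on the same day is an assumption for illustration, and the function names are hypothetical.

```python
from datetime import time

# Two time ranges that appear on the slide. Assigning both to one day
# is an assumption made only for this example.
WINDOWS = [(time(9, 20), time(12, 30)), (time(18, 20), time(21, 0))]


def in_blackout(t, windows=WINDOWS):
    """Return True if time t falls inside any (start, end) window."""
    return any(start <= t < end for start, end in windows)


def next_blackout_start(t, windows=WINDOWS):
    """Return the start of the next window after t, or None if none remain today."""
    upcoming = sorted(start for start, _ in windows if start > t)
    return upcoming[0] if upcoming else None
```

A check like this could drive a reminder to switch to generator power before a window opens.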
16
Some troubles for Information Infrastructure
• March 22: Three UPS units and two servers failed at the time of the power switchover.
  – Failure of the DNS server
  – E-mail and Web servers became inaccessible
• March 23: Trouble with L3 switches
  – A Layer 3 switch failed because of a routing processing unit failure
  – A part of the campus network stopped for 3 days
17
Problems of fuel exhaustion
• Emergency power fuel exhaustion
  – Oil refinery damaged by the earthquake
  – Reduced supply of fuel oil
• Staff problems:
  – Scheduling of operations staff
  – Traffic paralysis
  – Health of operations staff
The difficulty of maintaining the information infrastructure
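The fuel constraint can be made concrete with a rough estimate of how many blackouts a fuel stock covers. This is only a sketch: the 4-hour blackout length comes from the slides, while the fuel amount and consumption rate in the usage note are hypothetical, as is the function name.

```python
import math

BLACKOUT_HOURS = 4  # "about 4 hours at a time", per the slides


def blackouts_covered(fuel_liters, liters_per_hour, blackout_hours=BLACKOUT_HOURS):
    """Return how many full blackouts the remaining generator fuel can cover."""
    if liters_per_hour <= 0 or blackout_hours <= 0:
        raise ValueError("consumption rate and blackout length must be positive")
    return math.floor(fuel_liters / (liters_per_hour * blackout_hours))
```

For example, with a hypothetical 200 liters of fuel and a generator burning 10 liters per hour, `blackouts_covered(200, 10)` gives 5 blackouts before refueling is needed.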
19
Countermeasures against this situation
• Data Center
  – Physical relocation
  – Relocation of critical servers
• VPS (Virtual Private Server)
  – Logical relocation
  – Relocation of functions
20
Preparation of data center relocation
• Ready-to-use data center
• Tour of the data center
  – Two weeks before the earthquake
• By chance, a data center was near the university
• Specification
  – 1 full rack, 60 A / 100 V
  – 100 Mbps Internet connection
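The rack specification implies a power budget of 60 A x 100 V = 6000 VA. A small sketch of checking a set of servers against that budget follows; the per-server draw and the 80% headroom factor are hypothetical values, not figures from the presentation.

```python
RACK_AMPS = 60    # from the slide
RACK_VOLTS = 100  # from the slide


def rack_capacity_va(amps=RACK_AMPS, volts=RACK_VOLTS):
    """Apparent power available to the rack, in volt-amperes."""
    return amps * volts


def fits_in_rack(server_draws_va, headroom=0.8):
    """Return True if the summed draw stays within the derated capacity.

    headroom is a hypothetical safety factor, not a value from the slides.
    """
    return sum(server_draws_va) <= rack_capacity_va() * headroom
```

With an assumed 300 VA per server, ten servers fit under the derated budget and twenty do not, which is one reason to relocate only the critical servers.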
21
Criteria for selecting the data center
• Close to the university
• Equipped with a private power generator
• Move the servers in three groups
  – Many checks
  – Carefully
• First relocation
  – Impacting only a few users
• Last relocation
  – E-mail system
  – Impacting many users
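The idea of moving servers in three groups, lowest user impact first and e-mail last, can be sketched as a simple sort-and-split. The server names and user counts below are hypothetical placeholders, and `plan_waves` is an illustrative helper, not the procedure the authors used.

```python
def plan_waves(servers, waves=3):
    """Split (name, affected_users) pairs into `waves` groups, lowest impact first."""
    ordered = sorted(servers, key=lambda s: s[1])
    base, extra = divmod(len(ordered), waves)
    groups, start = [], 0
    for i in range(waves):
        # Earlier groups absorb the remainder so sizes differ by at most one.
        end = start + base + (1 if i < extra else 0)
        groups.append([name for name, _ in ordered[start:end]])
        start = end
    return groups


# Hypothetical impact estimates (number of affected users).
SERVERS = [
    ("university-mail", 10000),
    ("dns-hosting", 100),
    ("mailing-lists", 500),
    ("ldap", 3000),
    ("web-hosting", 200),
    ("webmail", 8000),
]
```

Here `plan_waves(SERVERS)` puts the mail servers in the final group, so early mistakes affect few users. A real plan would also have to respect dependencies between services, which this sketch ignores.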
23
How to move to the Data Center
[Diagram labels: Spam Filter Appliance; LDAP Server; Mailing Lists Server; DNS Hosting Server; Web Mail Server; Outside SMTP Server; Mail Hosting Server; University Mail Server; Firewall; Net]
24
The actual relocation to the Data Center
• About one week after applying for the data center
• Completed the relocation of all the hardware at the end of March 2011
• Relocation experience
  – One of the operations staff had it
25
Some Troubles of DNS settings
• Mis-operation of DNS settings
  – Mail servers became inaccessible
• Changing the IP addresses of servers in the data center relocation
  – Shorten the TTL values in the DNS configuration
• Laboratory routers
  – They have a DNS cache function
  – Reboot them after a big change of infrastructure
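The reasoning behind shortening TTL values before an IP address change can be sketched as a small timeline calculation. Resolvers may keep the old record for up to the old TTL, so the address change should wait at least that long after the TTL is lowered. The dates and TTL values in the usage note are hypothetical, and the function names are illustrative.

```python
from datetime import datetime, timedelta


def earliest_cutover(ttl_lowered_at, old_ttl_seconds):
    """Earliest time at which caches should no longer hold the long-TTL record."""
    return ttl_lowered_at + timedelta(seconds=old_ttl_seconds)


def worst_case_stale_until(cutover_at, new_ttl_seconds):
    """Latest time a well-behaved cache may still serve the old address."""
    return cutover_at + timedelta(seconds=new_ttl_seconds)
```

For example, with an assumed old TTL of 86400 seconds lowered at 09:00 on one day, `earliest_cutover` returns 09:00 the next day, and with a new TTL of 300 seconds the stale period after the cutover is at most five minutes. Caches that do not honor the TTL, such as the laboratory routers mentioned above, still need a reboot.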
26
Usage of VPS
• VPS (Virtual Private Server)
  – Operations via web browsers
  – Installing and setting up an OS (CentOS, Fedora…)
  – Setting up servers freely
27
Servers relocated to VPS
• Secondary Mail Spool Server
  – Prevents lost mail during the data center relocation or the rolling blackouts
• DNS Server (Slave Server)
  – Secondary DNS server
• Web Server of Saitama University
  – www.saitama-u.ac.jp
• Web Hosting Server
  – Virtual web server for laboratories and offices
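One common way a secondary mail spool prevents lost mail is through MX preference: sending servers try the lowest preference value first and fall back to the next host when it is unreachable. The sketch below illustrates that selection rule. That the university configured it this way is an assumption, and the host names are hypothetical.

```python
def pick_mx(mx_records, reachable):
    """Return the reachable host with the lowest MX preference, or None.

    mx_records is a list of (preference, host) pairs; reachable is a set of hosts.
    """
    for _, host in sorted(mx_records):
        if host in reachable:
            return host
    return None


# Hypothetical records: the campus server is primary, the VPS spool is secondary.
MX_RECORDS = [(20, "spool.vps.example"), (10, "mail.univ.example")]
```

During a blackout only the VPS host is reachable, so `pick_mx(MX_RECORDS, {"spool.vps.example"})` returns the spool server, which holds the mail until the primary is back.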
29
Effectiveness of Relocation (1)
• Relocating the servers decreased electric power consumption on campus.
• The savings from reduced electricity consumption cover the cost of the data center
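The claim that electricity savings cover the data center cost amounts to a simple comparison, sketched below. The presentation gives no figures, so every number in the usage note is hypothetical and the function names are illustrative.

```python
def monthly_saving(kwh_saved_per_month, price_per_kwh):
    """Money saved per month from reduced on-campus electricity use."""
    return kwh_saved_per_month * price_per_kwh


def relocation_pays_off(kwh_saved_per_month, price_per_kwh, dc_monthly_cost):
    """Return True if the electricity saving is at least the data center fee."""
    return monthly_saving(kwh_saved_per_month, price_per_kwh) >= dc_monthly_cost
```

With an assumed saving of 10000 kWh per month at 20 yen per kWh, the saving is 200000 yen, which would cover a hypothetical data center fee of 150000 yen per month.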
30
Trends in electricity usage
[Graph: electricity use by date; markers: earthquake, new system starting]
31
Effectiveness of Relocation (2)
• Reduced operational work for maintaining the information infrastructure
• Contribution to stable mail and web services
  – Remote support is available without going to the server room at the university
33
New System (2012~now)
• Wi-Fi Access Points (300 APs)
• Virtualization Technology
• Designed with use of the Data Center in mind
[Labels: Cisco UCS; EMC VMX]
34
Comparison of server hardware
2007~2011: about 40 servers (1U or 2U servers)
2012~now: 2 servers (Cisco UCS)
36
Organizations
Lessons
• The top executives of the university and the persons in charge shared the same viewpoint.
  – "The information infrastructure is important"
• Staff skill and manpower are important
Approaches
• Communicate smoothly within the organization
• Improve the technical skills of operations staff
• Make the information system compact
• Set priorities among the elements of the system
37
Environments
Lessons
• Because the university has one campus, communication between faculty and staff was good.
Approaches
• With separate campuses, telephones may be unavailable
  → Prepare satellite-based mobile phones
38
Cooperation among Universities
Lessons
• Universities back up data for each other
• Services for a damaged university were provided by an undamaged university
Approaches
• "Disaster Net Box" (from WTC2012)
  – Low-cost backup system among universities
39
System Administrators in disasters
Lessons
• Changing over the power generator required manpower.
• In disasters, traffic paralysis disrupted the commute of system administrators.
Approaches
• Measures to maintain the information infrastructure remotely are effective.
40
Contribution for Areas near the University
Lessons
• Mobile phones were unavailable in disasters.
• People could not use the Internet during disasters.
Approaches
• Open the university resources to commuters and neighborhood residents in disasters
• The information infrastructure of the university
• Be careful about false rumors!
41
Conclusion
• We relocated servers to a data center and a VPS as countermeasures against the rolling blackouts.
• We learned some lessons from the Great Earthquake and the rolling blackouts.
42
If you have questions or are interested, please send e-mail to the following address.