
IT Infrastructure Through The Public Network Challenges And Solutions

Dec 14, 2014


Martin Jackson

This presentation identifies the challenges that companies face when they adopt Infrastructure as a Service offerings such as those from Amazon and Rackspace, and suggests possible solutions to those problems. It covers security, availability, cloud standards, interoperability, vendor lock-in and performance management.
Transcript
Page 1: IT Infrastructure Through The Public Network   Challenges And Solutions

IT Infrastructure Through the Public Network: Challenges & Solutions

Page 2: IT Infrastructure Through The Public Network   Challenges And Solutions

$ whoami

• Martin Jackson
– Uncommon Sense Consulting
– Working in the IT field since 1993
– Linux and virtualization consultant specialising in automated build and deployment of virtual infrastructures
– Infrastructure as Code hacker
– DevOps advocate
– Keen judoka
– @actionjack on Twitter
– [email protected]

Page 3: IT Infrastructure Through The Public Network   Challenges And Solutions

$ cat /infrastructure/info

Source: http://en.wikipedia.org/wiki/Cloud_computing

Page 4: IT Infrastructure Through The Public Network   Challenges And Solutions

$ whatis iaas

• Outsourced hardware
• Outsourced operating system
• Outsourced network
• Self managed
• Typically available in minutes
• Pay per play

Page 5: IT Infrastructure Through The Public Network   Challenges And Solutions

#1 Challenge:

Security

Page 6: IT Infrastructure Through The Public Network   Challenges And Solutions

$ info security

• How do you protect your data in an infrastructure that you do not own or control?

Page 7: IT Infrastructure Through The Public Network   Challenges And Solutions

$ cat security/access

• Protect your API keys and use complex passwords
– Cyber-Ark Enterprise Vault
– ManageEngine Password Manager Pro
– KeePass
– APG and GPG
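The advice above about complex passwords can be sketched in a few lines. This is a minimal illustration using Python's standard `secrets` module, not a replacement for the password managers listed; the function name `generate_password` and the minimum length of 12 are assumptions made for the example.

```python
import secrets
import string


def generate_password(length=24):
    """Return a random password mixing letters, digits and punctuation."""
    # The 12-character floor is an illustrative assumption, not a standard.
    if length < 12:
        raise ValueError("length should be at least 12 characters")
    alphabet = string.ascii_letters + string.digits + string.punctuation
    while True:
        candidate = "".join(secrets.choice(alphabet) for _ in range(length))
        # Keep only candidates that contain all four character classes.
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(c in string.punctuation for c in candidate)):
            return candidate
```

A generated value would normally go straight into a vault such as KeePass rather than into source code or a shared document.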

Page 8: IT Infrastructure Through The Public Network   Challenges And Solutions

$ cat security/access

• Keep your systems patched (religiously)
– Yum
– Red Hat Network
– Microsoft Update Network
– Shavlik NetChk Protect
– Apt

Page 9: IT Infrastructure Through The Public Network   Challenges And Solutions

$ cat security/access

• Limit access to least privilege
– Only create accounts for those who “need” them
– Create separate accounts per device
– Do not allow direct access via privileged user accounts, e.g. Administrator or root
– Use audited privilege elevation, e.g. sudo, rootsh, sudosh, runas, shellrunas
– Only use encrypted login mechanisms, e.g. SSH, SSL certificates

Page 10: IT Infrastructure Through The Public Network   Challenges And Solutions

$ cat security/access

• Aggregate and monitor all login attempts
– Splunk
– Logstash
– Graylog2
– GFI Events Manager
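What "monitoring login attempts" means once the logs are aggregated can be shown with a small sketch. It counts failed SSH password attempts per source address, assuming the lines follow the usual OpenSSH "Failed password for ..." message; the function names and the threshold of three are illustrative assumptions, and the tools listed above do this at far greater scale.

```python
import re
from collections import Counter

# Matches the message OpenSSH logs when a password attempt is rejected.
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) from (?P<ip>\S+)"
)


def count_failed_logins(lines):
    """Count failed SSH password attempts per source address."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


def suspicious_sources(lines, threshold=3):
    """Return the addresses with at least `threshold` failed attempts."""
    counts = count_failed_logins(lines)
    return sorted(ip for ip, n in counts.items() if n >= threshold)
```

Feeding it the combined auth logs from every instance, rather than one host at a time, is what makes a slow attack spread across many machines visible.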

Page 11: IT Infrastructure Through The Public Network   Challenges And Solutions

$ cat security/data

• Encrypt your sensitive data before you place it into the cloud
– PGP, GPG
• Keep it encrypted while in the cloud
– TrueCrypt, LUKS
• Ensure encryption is maintained if data needs to be transmitted elsewhere
– SCP, SSL, VPN, SSH
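The first point above, encrypting before upload, can be sketched as a thin wrapper around the `gpg` command line. This assumes `gpg` is installed and the recipient's public key is already in the local keyring; the function names, the file name and the recipient address in the usage note are hypothetical.

```python
import subprocess


def build_encrypt_command(path, recipient):
    """Build a gpg command that encrypts `path` to `recipient`'s public key."""
    return [
        "gpg", "--batch", "--yes",
        "--encrypt", "--recipient", recipient,
        "--output", path + ".gpg",
        path,
    ]


def encrypt_before_upload(path, recipient):
    """Encrypt a file locally and return the path of the ciphertext."""
    # check=True raises CalledProcessError if gpg exits with an error.
    subprocess.run(build_encrypt_command(path, recipient), check=True)
    return path + ".gpg"
```

Calling `encrypt_before_upload("backup.tar", "ops@example.com")` would leave `backup.tar.gpg` next to the original, and only that ciphertext would then be copied to the provider.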

Page 12: IT Infrastructure Through The Public Network   Challenges And Solutions

$ cat security/network

• If you need secure intra-IaaS communication
– SSL Auth
– CohesiveFT’s VPN-Cubed
– OpenVPN
– Amazon Virtual Private Cloud

Page 13: IT Infrastructure Through The Public Network   Challenges And Solutions

#2 Challenge:

Outages

Page 14: IT Infrastructure Through The Public Network   Challenges And Solutions

$ whatis outage

• Unplanned unavailability of a service

"...in the cloud, you control your SLA..."

George Reese, founder enStratus Networks LLC
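One way to read the quote above is through simple availability arithmetic. The sketch below assumes that failures of separate deployments are independent, which real outages often violate, and the function names are illustrative. It converts downtime into an availability fraction and shows how running redundant copies changes the figure.

```python
def availability(downtime_hours, period_hours):
    """Fraction of the period during which the service was available."""
    if period_hours <= 0 or not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime must lie between 0 and the period length")
    return 1.0 - downtime_hours / period_hours


def combined_availability(single, copies):
    """Availability of `copies` independent deployments when one is enough.

    Assumes failures are independent; correlated failures make the real
    figure lower.
    """
    if copies < 1:
        raise ValueError("at least one deployment is required")
    return 1.0 - (1.0 - single) ** copies
```

For example, `availability(7, 720)` describes a single seven-hour outage in a 30-day month, and `combined_availability(availability(7, 720), 2)` estimates what two independent deployments of that quality would achieve together.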

Page 15: IT Infrastructure Through The Public Network   Challenges And Solutions

$ whatis outage

“large-scale, essentially self-managed and commoditised infrastructure-as-a-service (IaaS) has price benefits but, if things go wrong, they do so in a big way”

Dr Aydin Kurt-Elli, Lumison

Page 16: IT Infrastructure Through The Public Network   Challenges And Solutions

$ whatis outage

• Vendor: Terremark
• Outage Date: March 17, 2010
• Outage Duration: 7 hours
• Reason for Outage: Terremark's vCloud Express services suffered an outage after a bout of connectivity loss in its Miami data center. The outage resulted in intermittent periods of connectivity with high data packet loss starting at 11:54 a.m. Eastern and lasting more than seven hours, ending at 7:05 p.m. Eastern time. According to Apparent Networks' Cloud Performance Center, during the outage access to systems in Terremark's Miami data center was severely degraded and often unavailable, affecting many businesses using Terremark's vCloud Express services.
• Severity: Medium

• http://www.crn.com/slide-shows/applications-os/225701829/10-biggest-cloud-outages-of-2010-so-far.htm;jsessionid=o+AywGYF+Mv5w3ZoWChIbQ**.ecappj01?pgno=5

Page 17: IT Infrastructure Through The Public Network   Challenges And Solutions

$ whatis outage

• Vendor: Rackspace
• Outage Date: 2011-02-01
• Outage Duration: 30 minutes
• Reason for Outage: DNS issue causes MySQL server outage. An unspecified DNS issue prevented users from connecting to MySQL and making external API calls. Rackspace resolved the issue and advised their users to refresh their browsers to view the site properly.
• Severity: Low
• http://outagecenter.com/rackspace-cloud-reports/cloud-sites-dfw1-wc2-degraded-2/

Page 18: IT Infrastructure Through The Public Network   Challenges And Solutions

$ whatis outage

• Vendor: Rackspace
• Outage Date: April 28, 2011
• Outage Duration: 6 hours
• Reason for Outage: At approximately 4:00 PM (CDT) customers began to experience connectivity issues related to the Domain Name System (DNS) on Jungle Disk/Cloud Drive. The issue was identified as an error with hostname translations on a single DNS server, which was returning erroneous DNS information. Emergency maintenance to change the DNS configuration was performed to mitigate the issue.
• Severity: Medium

Page 19: IT Infrastructure Through The Public Network   Challenges And Solutions

$ whatis outage

• Vendor: Amazon Web Services
• Outage Date: April 21, 2011
• Outage Duration: Unknown
• Reason for Outage: Amazon began reporting trouble on its Service Health Dashboard about 5 a.m. Eastern. At 5:16 a.m., the site reported connectivity issues that were affecting its Relational Database Service, which is used to manage a relational database in the cloud, across multiple zones in the eastern U.S. A networking event early that morning triggered a large amount of re-mirroring of EBS volumes in US-EAST-1. The re-mirroring created a shortage of capacity in one of the US-EAST-1 Availability Zones, which impacted new EBS volume creation as well as the pace at which affected EBS volumes could be re-mirrored and recovered. Amazon also reported problems with EC2 (Elastic Compute Cloud), a service that provides pay-as-you-go compute capacity in the cloud, and with EBS (Elastic Block Store), the storage service used with EC2.
• Severity: High
• http://www.computerworld.com/s/article/9216064/Amazon_gets_black_eye_from_cloud_outage

Page 20: IT Infrastructure Through The Public Network   Challenges And Solutions

$ whatis outage

• Vendor: Amazon Web Services
• Outage Date: August 08, 2011
• Outage Duration: 30 minutes
• Reason for Outage: The issue happened in the networks that connect the Availability Zones to the internet. The event began when a southern router inside one of the Availability Zones briefly stopped exchanging route information with all adjacent devices, going into an incommunicative state. Upon re-establishing its health, the router began advertising an unusable route to other southern routers in other Availability Zones, deviating from its configuration and bypassing the standard protocol restriction on how routes are allowed to flow. The bad default internet route was picked up and used by the routers in other Availability Zones. Internet traffic from multiple Availability Zones in US East was immediately not routable out to the internet through the border. The issue was resolved by removing the router from service.
• Severity: Medium

• http://outagecenter.com/category/amazon-web-services-reports/amazon-elastic-compute-cloud-ec2-north-virginia/

Page 21: IT Infrastructure Through The Public Network   Challenges And Solutions

$ whatis outage

• Failure is the new black: expect it and embrace it
• Design for failure and build your infrastructures to be redundant on 5 different levels
– Physical
– Virtual resource
– Availability zone
– Region
– Cloud
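Designing for failure across the levels above usually comes down to having somewhere else to send a request when one target is unavailable. The sketch below is a minimal ordered failover with simulated handlers standing in for real zones, regions or clouds; all names are hypothetical.

```python
class AllTargetsFailed(Exception):
    """Raised when no redundant target could serve the request."""


def call_with_failover(targets, request):
    """Try each (name, handler) pair in order and return the first success."""
    errors = []
    for name, handler in targets:
        try:
            return name, handler(request)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append((name, exc))
    raise AllTargetsFailed(errors)


def failing(request):
    """Simulated target that is currently suffering an outage."""
    raise ConnectionError("target unavailable")


def healthy(request):
    """Simulated target that serves the request."""
    return "served " + request
```

Ordering the list from the closest target to the farthest (another zone, then another region, then another cloud) keeps the common case fast while still surviving the larger failures described in the outage reports.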

Page 22: IT Infrastructure Through The Public Network   Challenges And Solutions

#3 Challenge:

Standards

Page 23: IT Infrastructure Through The Public Network   Challenges And Solutions

$ find standard

• Cloud standards and interoperability
• To be honest they don’t exist yet…

http://www.infoq.com/articles/problem-with-cloud-computing-standardization

Page 24: IT Infrastructure Through The Public Network   Challenges And Solutions

$ cat standard/api

• Many different clouds…
• Many ways to interact with them…
• All do the same sort of thing…
• Let's abstract them
– Deltacloud
– Libcloud
– jclouds
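The idea behind these abstraction libraries can be sketched without any of them installed. The example below is a toy illustration of the pattern, not the API of Deltacloud, Libcloud or jclouds: one common interface, with a stand-in driver per provider. The provider names and classes are hypothetical.

```python
from abc import ABC, abstractmethod


class ComputeDriver(ABC):
    """Common interface that every provider-specific driver implements."""

    @abstractmethod
    def create_node(self, name):
        """Start a new instance and return a description of it."""

    @abstractmethod
    def list_nodes(self):
        """Return descriptions of all known instances."""


class InMemoryDriver(ComputeDriver):
    """Stand-in for a real provider; keeps its nodes in a list."""

    def __init__(self, provider):
        self.provider = provider
        self._nodes = []

    def create_node(self, name):
        node = {"name": name, "provider": self.provider, "state": "running"}
        self._nodes.append(node)
        return node

    def list_nodes(self):
        return list(self._nodes)


def get_driver(provider):
    """Return a driver for the named provider (hypothetical names)."""
    if provider not in ("cloud-a", "cloud-b"):
        raise ValueError("unknown provider: " + provider)
    return InMemoryDriver(provider)
```

Calling code only ever touches `create_node` and `list_nodes`, so moving a workload from one provider to another means changing the argument to `get_driver` rather than rewriting the application.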

Page 25: IT Infrastructure Through The Public Network   Challenges And Solutions

$ cat standard/api/deltacloud

• http://incubator.apache.org/deltacloud/

• Ruby client


require 'deltacloud'

api_url = 'http://localhost:3001/api'
api_name = 'mockuser'
api_password = 'mockpassword'
client = DeltaCloud.new( api_name, api_password, api_url )

Page 26: IT Infrastructure Through The Public Network   Challenges And Solutions

$ cat standard/api/libcloud

• http://libcloud.apache.org/
• Python client


from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver

EC2_ACCESS_ID = 'your access id'
EC2_SECRET_KEY = 'your secret key'

Driver = get_driver(Provider.EC2)
conn = Driver(EC2_ACCESS_ID, EC2_SECRET_KEY)

Page 27: IT Infrastructure Through The Public Network   Challenges And Solutions

$ cat standard/api/jclouds

• http://jclouds.apache.org/
• Java client


ComputeServiceContext context = new ComputeServiceContextFactory().createContext(
    "aws-ec2", accesskeyid, secretkey,
    ImmutableSet.<Module> of(new Log4JLoggingModule(), new JschSshClientModule()));

Page 28: IT Infrastructure Through The Public Network   Challenges And Solutions

#4 Challenge:

Monitoring and Management

Page 29: IT Infrastructure Through The Public Network   Challenges And Solutions

$ service monitor status

• Pay per play monitoring or fixed instance
• On premise or off
• Ramping up and tearing down of instances
• Focus on service monitoring vs host monitoring
• Monitoring tool must have an API
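The difference between service monitoring and host monitoring can be made concrete with a short sketch. It assumes per-instance check results are already available as a mapping of instance name to pass/fail; the status names and the `min_healthy` parameter are illustrative assumptions.

```python
def service_status(instance_results, min_healthy=1):
    """Summarise a service from its instances' individual check results.

    `instance_results` maps instance name -> True (check passed) or False.
    Instances come and go, so the verdict depends only on how many are
    healthy right now, not on any particular host being up.
    """
    healthy = sorted(name for name, ok in instance_results.items() if ok)
    if len(healthy) >= min_healthy and len(healthy) == len(instance_results):
        return "ok", healthy
    if len(healthy) >= min_healthy:
        return "degraded", healthy
    return "down", healthy
```

With this shape, tearing down an instance simply removes a key from the mapping, while a host-centric tool would raise an alert for a machine that was removed on purpose.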

Page 30: IT Infrastructure Through The Public Network   Challenges And Solutions

$ service monitor status

• Next Generation Cloud Monitoring Services
• Cloudkick - https://www.cloudkick.com
• Pingdom - http://www.pingdom.com
• Watchmouse - http://www.watchmouse.com
• Monitis - http://www.monitis.com

Page 31: IT Infrastructure Through The Public Network   Challenges And Solutions

$ service management status

• Provision within minutes – Ready in Days???
• If it takes 5 minutes to get a Virtual Machine
• How long are you willing to wait to use it?
• Data Center Automation Tools can help
– Puppet
– Chef
– CFEngine
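What these automation tools have in common is that you declare a desired state and the tool works out what to change. The sketch below is a toy model of that idea in Python, not how Puppet, Chef or CFEngine are implemented; the function names and the `install` callback are hypothetical.

```python
def plan_package_changes(desired, installed):
    """Return the packages that must be installed to reach the desired state."""
    return sorted(set(desired) - set(installed))


def converge(desired, installed, install):
    """Install whatever is missing and return the packages acted on.

    `install` is a callback that performs the real installation, for
    example by invoking the system package manager.
    """
    missing = plan_package_changes(desired, installed)
    for package in missing:
        install(package)
    return missing
```

Running `converge` a second time against an already correct machine returns an empty list, which is the property that lets a freshly provisioned instance be brought to a usable state automatically and repeatedly.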

Page 32: IT Infrastructure Through The Public Network   Challenges And Solutions

$ cat management/puppet

• http://puppetlabs.com/

package { 'openssh-server':
  ensure => installed,
}

Page 33: IT Infrastructure Through The Public Network   Challenges And Solutions

$ cat management/chef

• http://www.opscode.com/

package "openssh-server" do
  action :install
end

Page 34: IT Infrastructure Through The Public Network   Challenges And Solutions

$ cat management/cfengine

• http://cfengine.com/

control:
  any::
    actionsequence = ( packages )
    DefaultPkgMgr = ( rpm )
    RPMcommand = ( /bin/rpm )
    RPMInstallCommand = ( "/usr/bin/yum -y install %s" )

packages:
  any::
    openssh-server action=install

Page 35: IT Infrastructure Through The Public Network   Challenges And Solutions

#5 Challenge:

Governance

Page 36: IT Infrastructure Through The Public Network   Challenges And Solutions

$ make governance

• The game has changed and you’ll need to change with it

• Conway's law applies: “...organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations.”

Page 37: IT Infrastructure Through The Public Network   Challenges And Solutions

Challenge:

Questions

Page 38: IT Infrastructure Through The Public Network   Challenges And Solutions

$ cat links

• http://www.accenture.com/us-en/outlook/Pages/outlook-online-2011-challenges-cloud-computing.aspx

• http://www.infoq.com/articles/problem-with-cloud-computing-standardization

• http://www.computerworld.com/s/article/9217158/Cloud_interoperability_Problems_and_best_practices

• http://www.theaccidentalsuccessfulcio.com/cloud-computing/cio-cloud-computing-101-problems-with-clouds

• http://nylawblog.typepad.com/suigeneris/2009/11/does-cloudcomputing-compromise-clients.html

• http://horicky.blogspot.com/2009/08/multi-tenancy-in-cloud-computing.html

• http://www.cio.com/article/488478/The_Trouble_with_Cloud_Vendor_Lock_in

• http://www.agathongroup.com/blog/2010/04/cloud-computing-and-latency/