IBM Almaden Research Center

April 23, 2006

End-User Software Engineering for System Administrators

Allen Cypher, Eben Haber, Eser Kandogan
USER Group, Computer Science

EUSE for System Administrators © 2006 IBM Corporation

Overview

Field Studies of System Management

Autonomic Task Manager for Administrator/A1: System Management Tool Development

Studies on Policy-based Autonomic Computing
– Field Studies
– Experimental Studies

Ongoing and Future Work
– System Simulation Studies
– Activities for IT Management


System Management Today

Complexity: Large-scale operations, complex layered IT infrastructures, a variety of system interactions, many system management tools, various notions of configurations, permissions, etc.

Risk: Contractual obligations, high availability demands, unacceptable loss of data, dependence of modern life on IT

Cost: People costs dominate over hardware and software; an obstacle to future technology

America Online, 6 August 1996
Outage: 24 hours
Maintenance/Human Error
Cost: $3 million in rebates

NYSE, June 8, 2001
>1700 stocks stopped trading for 90 minutes
Software Upgrade

Storage: What $3 million bought in 1984 and 2000.

[Chart, $0 to $3 million]
1984: $2 million Storage, $1 million System Administration
2000: $1 million Storage, $2 million System Administration

[Figure: Example GWA Deployment Architecture Diagram for an Internet application with Yellow Zone]

Zones: Red Zone (Internet), Yellow Zone, Green Zone, Blue Zone (Intranet)

Components shown: Web Browser, Reverse Proxy, WTE, ND, IHS Cluster (HTML, CGI/FCGI, net.data, CAE, IR plugin), WebSphere Cluster (Servlets, JSPs, EJBs, IR plugin, MQ Client), Domino Cluster and Domino Staging, DB2 Staging Server, DB2 Prod Server, WAS Session DB2 Server, MQ Servers, Java MQ Agents, Asynch App Server (Java/C/Perl, Cron, Mail, MQ Client), DFS Staging and Production Cells, Non-GWA Legacy Database w/ MQ Mgr

Connections labeled: http(s), URL, JDBC, CAE or JDBC, Db2Connect or JDBC, Javamail, e-mail, FTP Content, Notes Replication, LEI or LCLSX, DPropR, GNA Replication for Hybrid, Content Changes

Legend: IHS = IBM HTTP Server, ND = Network Dispatcher, WAS = WebSphere Application Server, WTE = Web Traffic Express Proxy Server

“We upgrade and patch systems all the time. We have about 1,000 production systems that are Sun, HP, Linux, and Dell.”


Field Studies of System Management

Web Hosting, Data Management, Operating System, Security, Storage, Data Center Operations

14 Visits, 6 sites

• Surveys (~100 people)
• Observations (~50 days)
• Video (~300 hours)
• Interviews (~30 people)
• Diary (~10 months)

Qualitative and quantitative analysis

• Data Management, Poughkeepsie, 3 Days
• Web Hosting, Boulder, 3 Days + 1 Eve
• Web Hosting, Southbury, 1 Week
• Web Hosting, Southbury, 1 Week
• Data Management, Charlotte, 3 Days
• Web Hosting, Boulder, 1 Week
• Storage, Boulder, 3 Days
• Security, Urbana, 1 Week
• Operating System, Boulder, 3 Days
• Security, Urbana, 3 Days
• Storage, Greenbelt, 3 Days
• Storage, Greenbelt, 3 Days
• Data Center, Boulder, 3 Days
• Data Center, Boulder, 3 Days


A Case from the Field

“How can I put a number and date on the same line?”


“How can I put a number and date on the same line?”

Task: Resolve the customer issue so that the problem does not recur for three consecutive days

People: Rob (web admin), Jack (architect), Andy (operating system), Managers, Tech Support, etc.

Problem: Intermittent connection spikes between web server and application server that lead to unresponsive application


The Crit Sit


Troubleshooting intermittent, multi-system problems is hard.

Sysadmins build tools to increase shared situational awareness.

Tool building is a collaborative effort too.

Tool-building expertise varies widely among admins.

Existing scripting languages aren't aimed at non-programmers.

Scripting languages don’t handle input & output and errors very well.

“How can I put a number and date on the same line?”
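The question in the quote is the kind of small formatting task that monitoring scripts run into constantly. As a minimal sketch, assuming the number is a count of matching lines (for example, connections in some state) and the date is the sampling time, one log line can carry both. The names `format_sample` and `count_matching` and the sample input are hypothetical and are not taken from the study.

```python
from datetime import datetime


def format_sample(count, when):
    """Return one log line: timestamp first, then the number."""
    return f"{when:%Y-%m-%d %H:%M:%S} {count}"


def count_matching(lines, needle):
    """Count the lines that contain `needle` (e.g. a connection state)."""
    return sum(1 for line in lines if needle in line)


if __name__ == "__main__":
    # Hypothetical sample input; a real script would read command output.
    sample = ["tcp ESTABLISHED", "tcp TIME_WAIT", "tcp ESTABLISHED"]
    count = count_matching(sample, "ESTABLISHED")
    print(format_sample(count, datetime.now()))
```

Putting the timestamp and the value on one line keeps each sample self-describing, which makes it easier to line up spikes across logs from different systems.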


Issues in Tools and Tool Building

Diversity: “Every shop is different with its own processes and infrastructure. Vendor tools sometimes are just not doing it.”

Integration: “There isn’t one tool that does all the things I need to do. Every vendor provides its own management utilities.”

Transparency: “I tend to learn the guts of things in a CLI as close to the heart of the matter as possible, and then translate it into a GUI.”

24x7 Coverage: “We have scripts that monitor our systems and notify us when something goes wrong. We can’t be on the job all the time!”

Trust: “I prefer the CLI. These tools seem to be the most truthful and accurate for administration.”

Efficiency: “I need to reset 150 passwords every week!”

Risk: “We can’t afford to make mistakes as we move from test to staging to production servers.”
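
The "24x7 Coverage" quote describes scripts that monitor systems and notify admins when something goes wrong. The following is a minimal sketch of that pattern, not a description of any tool observed in the study: `run_checks`, the check names, and the `notify` callback are all hypothetical. A check that raises an exception is treated as a failure so that a broken probe is reported instead of silently ignored.

```python
def run_checks(checks, notify):
    """Run each named check and report failures.

    `checks` maps a name to a zero-argument callable that returns True
    when the system is healthy. `notify(name, detail)` is called once
    per failing check. Returns the list of failing check names.
    """
    failures = []
    for name, check in checks.items():
        try:
            ok = check()
            detail = "check returned False"
        except Exception as exc:
            # A probe that crashes is itself something to report.
            ok = False
            detail = f"check raised {exc!r}"
        if not ok:
            failures.append(name)
            notify(name, detail)
    return failures


if __name__ == "__main__":
    # Hypothetical checks; real ones would probe actual services.
    checks = {"web": lambda: True, "db": lambda: False}
    run_checks(checks, lambda name, detail: print(f"ALERT {name}: {detail}"))
```

Passing `notify` in as a callback keeps the sketch independent of any particular channel (e-mail, pager, log file), which matters given how much tooling differs from shop to shop.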