Real-World Techniques for Automating Configuration of Network Devices @NANOG 24 Mark Epstein CTO, Ponte February 11th, 2002

Real-World Techniques for Automating Configuration of Network Devices

@NANOG 24

Mark Epstein
CTO, Ponte

February 11th, 2002


Slide 2

www.ponte.com© 2000-2002 Ponte Communications, Inc.

The Challenge: Large Scale

Epstein’s rule of large numbers

Responsibility for large numbers of anything that must be individually managed is a real pain

• Large firms have large numbers

– Specific business initiatives and functions

– Vendors, models, and instances of devices

– Employees and Operators

– Security breaches and breach attempts

• Additional Challenges

– High employee turnover

– More operators than device-savvy staff

Slide 3

Service Provider Network

[Diagram: labels include NOC1, NOC2, Core WAN, several RPOPs, Internet, and many Customers.]

Slide 4

Many Devices Working in Concert

[Diagram: labels include Core WAN, Regional POP, two Core Routers, two Intermediary Devices, and four Access Routers, each with customers.]

Slide 5

Network Security Control: Ponte nsControl™ Architecture

[Diagram: labels include Business Systems, Operations Support Systems, Internet, Intranet, Home Office, Branch Office, Headquarters Office, Secured Channels, Network Operations Center, Control Server (Delivery Drivers, Assembly Templates, Security Service Modules), three Network Control Points, and a Client Control Point.]

Slide 6

Many Interrelated Problems

Tcl/Expect

• Issues

– Buffer skew and device prompts

– Timing and reset behavior

– Terminal servers

– Firmware revisions and delivery

– High Availability and Fail-over

– Control channel problems

– Using existing configurations

– Differential configuration

Slide 7

Prompts

Tcl/Expect Issues

• Buffer skew
– Your code isn’t looking at what you expect
– Typical enable prompt ends with “#”
– banner=“### authorized users only ###”
– enable prompt=“router23#”

• Canonical vs. custom prompts
– Can cause buffer skew
– Know the prompts or be very flexible

• Strategies
– Resync with unique text
– Use time of output as additional sync?
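
The banner problem above can be reproduced without a device. The following is a minimal sketch in plain Tcl (no Expect needed); the simulated `buffer` contents and the hostname character class in the stricter pattern are assumptions made for illustration, not part of the original talk.

```tcl
# Simulated device output: a login banner followed by the real enable prompt.
set buffer "### authorized users only ###\r\nrouter23#"

# Naive match: the first "#" in the buffer belongs to the banner, not the prompt.
set naive_index [string first "#" $buffer]

# Stricter match: hostname-like characters followed by "#" at the very end
# of the buffer (assumed prompt shape).
set strict_found [regexp -indices {[A-Za-z0-9_.-]+#$} $buffer strict_range]

puts "naive '#' found at offset $naive_index"
puts "strict prompt text: [string range $buffer {*}$strict_range]"
```

A script that waits only for "#" would consider itself at the prompt while still inside the banner, which is the buffer skew the slide describes. Anchoring on the full prompt helps only when the prompt is known in advance.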

Slide 8

Prompts

Tcl/Expect Code

# resynchronize the buffer with text unlikely
# to occur in the input buffer
# PIX example
proc BufferResync {} {
    set buffer_data ""
    send "who ?\r"
    expect {
        -ex {usage: who [ip]} { set buffer_data $expect_out(buffer) }
        timeout { error "BufferResync: timed out waiting for usage text" }
    }
    # now safe to expect prompt
    expect {
        -ex {#} {}
        timeout { error "BufferResync: timed out waiting for prompt" }
    }
    # empty the input buffer
    expect {
        -re {.*} {}
    }
    # error out if anything arrived after prompt
    expect {
        -timeout 1
        -re {.} { error "BufferResync: unexpected output after prompt" }
        timeout {}
    }
    # erase input (Ctrl-U)
    send "\025"
    return $buffer_data
}

Slide 9

Speed & Timing

Tcl/Expect Issues

• Device Reset Behavior
– IOS devices disconnect the control terminal on a 'reload'
– But still accept new connections
– And leave other active connections up until later in the reload process
– Thus it is difficult to detect when the device has completed its reset

• Typing Speed
– Some devices are command-speed limited
  – Device communication over slow serial lines
  – Minimum-cost processors (i.e. slow)
– Inter-command speed can be naturally limited
  – Throttle inter-command speed by processing intervening prompts
  – But you cannot always depend on prompts
  – Ex: when connecting through a terminal server to a device, do not send an initial [CR] too quickly or the device may drop it

Slide 10

Speed Issues

Tcl/Expect Code

# ask for the reload, then wait 5min before
# attempting to reconnect
ExpectReload
sleep 300
...
# slow our "typing" speed for slow device
set sendRate [JobVar DeviceBitRate]
# can only accept data at 25% of bit rate
set loadFactor 25
set send_slow [deviceSpeed $sendRate $loadFactor]
...
send -s "long data string\r"

• Measure actual device reset time, encode into scripts

• Different for every device type

• Sophistication makes sense, but is still device-specific

• Slow command entry (“typing”) may be critical for reliable behavior
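
The slide calls a `deviceSpeed` helper whose body is not shown. The following is a hypothetical stand-in, assuming 10 bits on the wire per character (8N1 framing) and one character per send; it returns the two-element list `{chars seconds}` that Expect's `send_slow` variable expects for `send -s`. The sample values 9600 and 25 are illustrative only.

```tcl
# Hypothetical stand-in for the deck's deviceSpeed helper (not the original).
# bitRate:    line speed in bits per second
# loadFactor: percentage of the line rate the device can absorb
# Returns {chars seconds}, the format of Expect's send_slow variable.
proc deviceSpeed {bitRate loadFactor} {
    # Assumption: 10 bits on the wire per character (8N1 framing).
    set bitsPerChar 10.0
    set charsPerSecond [expr {$bitRate * ($loadFactor / 100.0) / $bitsPerChar}]
    if {$charsPerSecond <= 0} {
        error "deviceSpeed: bit rate and load factor must be positive"
    }
    # Send one character at a time, pausing between characters.
    return [list 1 [expr {1.0 / $charsPerSecond}]]
}

set send_slow [deviceSpeed 9600 25]
puts "send_slow = $send_slow"
```

At 9600 bps and a 25% load factor this works out to 240 characters per second, so the pause between characters is 1/240 of a second.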

Slide 11

Device Control via Terminal Servers 1-0

Tcl/Expect Issues

• Unpredictable prompt at connection
– Serial vs. virtual-terminal TCP connection
– Device may be in any state at all
– Get device into known state

• Terminal server port resets
– Terminal server ports get wedged
– Good configuration reduces this problem
– Need to be terminal-server-aware
– Pay careful attention to timeouts
– Rebooting terminal server may cause device reboots!!

Slide 12

Device Control via Terminal Servers 1-1

Tcl/Expect Code

proc ExpectLogin {access} {
    set timeout 10
    set retries 3
    set passwordfailed 0
    expect {
        -ex {>} {}
        -ex {#} {
            warning "device was left in enable mode"
            send "disable\r"
        }
        -ex {sername:} {
            send "[getCSUserName $access]\r"
            exp_continue
        }
        -ex {assword:} {
            if {$passwordfailed == 0} {
                send "[getSystemPasswd $access]\r"
                set passwordfailed 1
            } else {
                error "System password was rejected"
            }
            exp_continue
        }
        -ex {Enter Selection:} {
            # for c1900, enterprise edition
            send "K"
            exp_continue
        }

Slide 13

Device Control via Terminal Servers 1-2

Tcl/Expect Code

        -ex {Press any key to continue.} {
            send "\r"
            exp_continue
        }
        -ex {Password required, but none set} {
            error "Connection closed by foreign host. Possible cause: no password on device"
        }
        eof {
            retry "Telnet connection to device closed unexpectedly"
        }
        timeout {
            set timeout 120
            if {$retries > 0} {
                incr retries -1
                send "\r"
                exp_continue
            } else {
                retry "Login timed out waiting for \"Password:\""
            }
        }
    }
}

Slide 14

Device Control via Terminal Servers 2-0

Tcl/Expect Issues

• “Console” output
– Usually console (serial) is the true “console”
– Terminal page length may be fixed over the serial port
– Asynchronous, unrelated output increases need for resynchronization and fault tolerance

Slide 15

Device Control via Terminal Servers 2-1

Tcl/Expect Code

Suppress console output…

# suppress line width editing
sendCmd "terminal width 0"
# suppress console monitor messages
sendCmd "terminal no monitor"
... [do stuff] ...
sendCmd "terminal monitor"
# suppress "More" prompts
sendCmd "no pager"
...
sendCmd "pager"

Try, try again… (What to do when you can’t suppress console output)

for {set retries 0} {$retries < 3} {incr retries} {
    sendCmd "show version"
    set buffer_data [BufferResync]
    # capture the token that follows "VERSION: "
    if {[regexp {VERSION: (\S+)} $buffer_data junk version]} {
        break
    }
}
if {![info exists version]} { error "could not determine device version" }

Slide 16

Special Concerns RE Firmware

Tcl/Expect Issues

• Configuration File Issues
– Commands may be added or removed
– Differences in meaning between versions
– Often must reconfigure to support firmware
– Wholesale firmware change (e.g. CatOS to IOS)

• Transfer Concerns
– Distance vs. reliability
– Some devices require local access
– Pilot error
– TFTP

Slide 17

Fail-over Devices

Tcl/Expect Issues

• Active/standby and primary/secondary
– IP address vs. terminal server “mismatch”
– “Two men say they’re Jesus, one of them must be wrong”

• Change volume limits (PIX example)
– i.e., 200 lines of conduit changes per “commit”

• New and expanded commands

Slide 18

Fail-over Devices [detection] (1)

Tcl/Expect Code

proc PIXActive {} {
    set cablestatus {NOT FOUND}
    set iam {}
    set state {}
    send "\r"
    expect {
        {# $} {
            send "sho fai\r"
        }
        timeout {
            sendAbort
            error "PIXActive: timed out waiting for first prompt"
        }
    }
    expect {
        -re "Cable status: (\[^\r\n]*)" {
            set cablestatus $expect_out(1,string)
        }
        -re "(\n|\r)<--- More --->" {
            send " \r"
            exp_continue
        }
        timeout {
            sendAbort
            error "PIXActive: timed out searching for `Cable status:.*'"
        }
    }

Slide 19

Fail-over Devices [detection] (2)

Tcl/Expect Code

    expect {
        -re "This host: (\[^ ]*) - (\[^ \r\n]*)" {
            set iam $expect_out(1,string)
            set state $expect_out(2,string)
        }
        -re "(\n|\r)<--- More --->" {
            send " \r"
            exp_continue
        }
        timeout {
            sendAbort
            error "PIXActive: timed out searching for `This host:.*'"
        }
    }
    expect {
        -re "(\n|\r)<--- More --->" {
            send "q\r"
            exp_continue
        }
        {# $} {}
        timeout {
            sendAbort
            error "PIXActive: timed out waiting for final prompt"
        }
    }

Slide 20

Fail-over Devices [detection] (3)

Tcl/Expect Code

    if {$iam == "Secondary" && $state == "Active"} {
        JobRetAdd -append Warning \
            failover_secondary_active \
            "Secondary PIX is Active, cable status: $cablestatus\n"
    }
    if {$cablestatus != "Normal"} {
        sendAbort
        error "PIXActive: cable status failure: $iam Cable status: $cablestatus"
    }
    # return 1 if this host is the active unit, 0 if it is the standby
    switch -- $state {
        {Standby} {return 0}
        {Active} {return 1}
        default {
            sendAbort
            error "PIXActive: failed to determine if this host is active, host $iam, state $state, cable status $cablestatus"
        }
    }
}

Slide 21

Fail-over Devices [change volume]

Tcl/Expect Code

proc ExpectConfigure {cmds} {
    ExpectConfigMode
    set count 0
    foreach cmd $cmds {
        send -s $cmd
        send "\r"
        expect {
            -ex {Type help or '?' for a list of available commands.} {
                sendAbort
                error "ExpectConfigure: invalid configuration command detected, check session log"
            }
            {(config)#} { }
            timeout {
                sendAbort
                error "ExpectConfigure: timed out waiting for (config)# after $cmd"
            }
        }
        # write the configuration out after every batch of commands
        if {[incr count] > [JobVar MaxConfigurationLines]} {
            set count 0
            ExpectWriteConfig
            sleep 30
            ExpectConfigMode
        }
    }
    ExpectWriteConfig
}
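
The change-volume limit can also be handled by splitting the command list before any command is sent. The following is a minimal sketch in plain Tcl; `BatchCommands` is a hypothetical helper (not from the talk), and the batch size of 2 is chosen only to keep the example short.

```tcl
# Split a list of configuration commands into batches of at most
# maxLines commands each (hypothetical helper, not from the deck).
proc BatchCommands {cmds maxLines} {
    if {$maxLines < 1} {
        error "BatchCommands: maxLines must be at least 1"
    }
    set batches {}
    for {set i 0} {$i < [llength $cmds]} {incr i $maxLines} {
        lappend batches [lrange $cmds $i [expr {$i + $maxLines - 1}]]
    }
    return $batches
}

# Example: 5 placeholder commands, at most 2 per "commit".
set cmds {c1 c2 c3 c4 c5}
set batches [BatchCommands $cmds 2]
foreach batch $batches {
    puts "commit: $batch"
}
```

Each batch could then be sent and written out in turn, with the pause between batches measured per device type as the earlier slides suggest.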

Slide 22

Control Channel Problems

Tcl/Expect Issues/Code

• Loss of connection triggers Expect EOF

• Many scripts consider this retry-able
– Often caused by transient network failure
– But what state was the device in, anyway?

• Distribute control to reduce risk
– Place control close to devices
– Distance between control and controlled device == risk of network failure

expect {
    ....
    eof { retry "lost connection, retry request" }
}
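
The deck's `retry` command is not shown. As an illustration only, a bounded retry wrapper could look like the sketch below in plain Tcl; `WithRetries` and the simulated flaky operation are hypothetical. In a real script, each new attempt would first need to reconnect and bring the device back to a known state, since the slide points out that the state at the time of the disconnect is unknown.

```tcl
# Run a script up to maxAttempts times; return its result on first success.
# Hypothetical helper: the deck's own retry command is not shown.
proc WithRetries {maxAttempts script} {
    set lastError ""
    for {set attempt 1} {$attempt <= $maxAttempts} {incr attempt} {
        if {![catch {uplevel 1 $script} result]} {
            return $result
        }
        set lastError $result
    }
    error "giving up after $maxAttempts attempts: $lastError"
}

# Simulated flaky operation: fails twice, then succeeds.
set calls 0
set outcome [WithRetries 3 {
    incr calls
    if {$calls < 3} { error "lost connection" }
    set status "configured"
}]
puts "$outcome after $calls calls"
```

Bounding the number of attempts keeps a persistent failure from looping forever, and the last error message is preserved for the session log.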

Slide 23

Turning Found Configurations into Data

Tcl/Expect Issues

• Retrieve configuration from a PIX

#roam-request -a pixdevice -- req_class=AuditConfig \
    action=import-pixconfig

• After the configuration is retrieved, import-pixconfig uses roam-pixload to parse the configuration

#roam-pixload -r $requestId

• roam-pixload gets relevant data from the configuration
– interface name, security level, mtu, speed/options, ip address, netmask, fail-over configuration, access and enable passwords

• Then pushes found data back into the device profile

#roam-device pixdevice -- iface.inside.speed=auto
#...

• Device profile is used in combination with a template to create the configuration file
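
To make the parsing step concrete, here is a minimal sketch in plain Tcl of turning configuration text into flat profile keys. It is not roam-pixload. The `nameif`, `interface`, `mtu`, and `ip address` line shapes are assumed to follow PIX configurations of that era, and apart from `iface.inside.speed`, which appears on the slide, the key names are invented for illustration.

```tcl
# Extract a few interface settings from PIX-style configuration text into
# a flat dict of profile keys. Only iface.<name>.speed appears in the deck;
# the other key names are assumptions made for this sketch.
proc ParsePixConfig {config} {
    set profile [dict create]
    set hwToName [dict create]
    foreach line [split $config "\n"] {
        set words [regexp -all -inline {\S+} $line]
        switch -- [lindex $words 0] {
            nameif {
                # assumed form: nameif <hardware_id> <name> security<level>
                lassign $words _ hw name sec
                dict set hwToName $hw $name
                dict set profile iface.$name.security [string range $sec 8 end]
            }
            interface {
                # assumed form: interface <hardware_id> <speed>
                lassign $words _ hw speed
                if {[dict exists $hwToName $hw]} {
                    dict set profile iface.[dict get $hwToName $hw].speed $speed
                }
            }
            mtu {
                # assumed form: mtu <name> <bytes>
                lassign $words _ name mtu
                dict set profile iface.$name.mtu $mtu
            }
            ip {
                # assumed form: ip address <name> <address> <netmask>
                if {[lindex $words 1] eq "address"} {
                    lassign $words _ _ name addr mask
                    dict set profile iface.$name.ipaddr $addr
                    dict set profile iface.$name.netmask $mask
                }
            }
        }
    }
    return $profile
}

# Made-up sample configuration for the sketch.
set sample "nameif ethernet0 outside security0
nameif ethernet1 inside security100
interface ethernet0 auto
interface ethernet1 100full
mtu outside 1500
ip address inside 10.0.0.1 255.255.255.0"

set profile [ParsePixConfig $sample]
dict for {key value} $profile {
    puts "$key=$value"
}
```

The sketch relies on `nameif` lines appearing before the `interface` lines that refer to the same hardware ID; a more robust parser would resolve names in a second pass.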

Slide 24

Differential Configuration

Tcl/Expect Issues

• The only way to update device configurations without losing connections on most devices

• Often not possible - many device commands are not invertible

• Must take care to maintain control connectivity

• Cannot do it for firmware update

• Often difficult
– Often cannot just add onto end of existing configuration
– Can cause serious security issues
– Order-dependent configuration changes often cannot be made at all

• Much more difficult to do reliably than just replacing startup configuration and reloading

Slide 25

Differential Configuration

Tcl/Expect Code

proc ConfigUpdate {} {
    global spawn_id timeout
    set system [getSystemPasswd [JobGetVar Access]]
    set enable [getEnablePasswd [JobGetVar Access]]
    if {[JobVarExists Failover] && [JobGetVar Failover]} {
        ConnectFailover $system $enable \
            [JobGetVar RemoteAddrList]
    } else {
        Connect [lindex [JobGetVar RemoteAddrList] 0] \
            $system $enable
    }
    # fetch the running configuration and check it is the intended device
    set conffile [ExpectGetConfig]
    VerifyTarget $conffile
    set oldconf [prepare_config $conffile]
    set newconf [prepare_config \
        [JobGetFile [JobGetVar ConfigFile]]]
    # send only the commands that differ between old and new
    ExpectConfigure [ComputeDeltaConfig $oldconf $newconf]
    send "exit\r"
    ExpectClose
}
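
`ComputeDeltaConfig` itself is not shown in the deck. The sketch below is a deliberately naive line-set delta in plain Tcl, assuming `prepare_config` yields a list of configuration lines and that a command can be undone by prefixing it with "no". Both are assumptions, and the second fails for exactly the non-invertible and order-dependent commands the previous slide warns about. The sample lines are made up.

```tcl
# Naive line-set delta between two configurations (hypothetical sketch;
# the deck does not show ComputeDeltaConfig itself).
# Lines only in oldconf are negated with "no "; lines only in newconf are added.
proc NaiveDeltaConfig {oldconf newconf} {
    set delta {}
    foreach line $oldconf {
        if {$line ni $newconf} {
            lappend delta "no $line"
        }
    }
    foreach line $newconf {
        if {$line ni $oldconf} {
            lappend delta $line
        }
    }
    return $delta
}

set oldconf [list "mtu outside 1500" "logging on" "telnet timeout 5"]
set newconf [list "mtu outside 1500" "telnet timeout 10" "pager lines 24"]
set delta [NaiveDeltaConfig $oldconf $newconf]
foreach cmd $delta {
    puts $cmd
}
```

Even in this small case the delta removes a setting and then re-adds it with a new value. On a real device that kind of intermediate state is where control connectivity or security can be lost, so a production delta needs per-command knowledge.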