Tilman Wolf
Department of Electrical and Computer Engineering
University of Massachusetts Amherst
October 7, 2016
Attacks and Hardware Defenses for Network Infrastructure
Computer Networks
§ Networks provide connectivity between end-systems
• Use for remote access, data transfers, control, etc.
§ Networking requires common protocols for communication
[Figure: client and server connected through a network]
Computer Networks
§ Success of the Internet: hourglass architecture
• Very basic services (connectivity, bit pipes, etc.)
• Highly diverse set of applications “on top”
§ Success is also a problem
• Diverse applications, diverse systems
§ Changing requirements for network layer
• New network functionality
− Security, quality-of-service, multicast, reliability, etc.
• New communication paradigms
− Content distribution, content addressable networks, data aggregation, etc.
• Application-layer processing in network
− Payload transcoding, content-based load balancing, etc.
[Figure: layered protocol stack with example protocols]
Application layer: HTTP, TLS/SSL, DNS, BGP, SIP, ...
Transport layer: UDP, TCP, ...
Network layer: IP
Link layer: Ethernet, DSL, FDDI, ...
Physical layer: 1000BASE-T, SONET/SDH, 802.11a/b/g/n, RS-232, ...
Extended Network Functionality
§ Extensions to current Internet
• New functionality in routers
− Firewalls, network address translation
− Intrusion detection systems
− Traffic shaping
− Etc.
§ Customization of data plane
• Complex per-packet protocol processing operation
• Deployment of new features at runtime
• Vendors may compete on features
§ Requires routers with ability to adapt
• Programmability is necessary
[Figure: example functions. End system: IP security, TCP termination. Server: content-based switching, firewall, SSL termination, IP security]
• Custom network processor on NetFPGA
• Packet forwarding with vulnerable code
• Malicious packet injected into benign background traffic
§ Single malicious packet triggers attack at full link rate!
§ Attack has not yet been shown on commercial system
Header Insertion Exploit
§ Ability to exploit vulnerability depends on processor system
• Previous result: custom ARM-based packet processor
• Other system: Click modular router on Linux system
− Stack smashing crashes router, but could not create DoS attack
§ Main observation: software on NP can be attacked
• Exploits can happen through data plane only
§ Need to develop defense mechanisms for router systems
Outline
§ Introduction
§ Vulnerabilities
• Example attack on network processor
§ Defense mechanism
• Hardware monitor
§ Extensions
• Multicore hardware monitor and dynamic workloads
• Secure loading and avoiding homogeneity
• Operating system support
§ Conclusions
Defense Mechanism for Processors
§ Software-based defenses (e.g., virus scanner)
• High processing overhead
• Processing requirement is proportional to input/output operations
• Scanning for known attacks is reactive, not proactive
§ Hardware-based defenses more suitable
• Defense mechanism can be separated from data plane
− Makes it more difficult to circumvent
• Performance impact on packet processor is small
• Challenge is to make it dynamically adaptable
− Needs to work for new packet processing functions
§ We have designed and prototyped hardware defense for NP
• Hardware monitor tracks processor
• Deviation from “normal behavior” due to attack can be detected
• Reset operation recovers system
Related Work
§ Monitor-based defense mechanism for embedded systems
• Arora et al., DATE 2005
• Ragel et al., DAC 2006
• Zambreno et al., TECS 2005
• Our monitor uses finer-grained monitoring for faster detection
− More details in Mao and Wolf, TC 2010
§ Processor-based defense mechanisms
• No eXecute (NX) bit (creates virtual Harvard architecture)
• Depends on processor architecture
§ Network-based defense mechanisms
• Attack signature in intrusion-detection systems (e.g., Snort, Bro)
• Problem with system homogeneity and IDS only at network edge
System Architecture
§ Hardware monitor co-located with each processor core
• Core reports hash of each executed instruction
§ Monitoring graph represents correct behavior
• Obtained from offline analysis of binary
• Deviations trigger reset
§ Change of software easy
• Just need matching monitoring graph
[Figure: network processor (core, instruction memory with processing code, data memory with packet buffer, network interface) sends the hash of each processing instruction to the hardware monitor (comparison logic, monitor memory with monitoring graph), which signals reset/recovery. Offline analysis: processing code binary, NFA monitoring graph, NFA-to-DFA transformation, DFA monitoring graph. Runtime operation: monitor checks the processor]
Offline Analysis of Processing Binary
§ Executed instruction reported by core as 4-bit hash
• Hash combines address, opcode, registers
• Hash allows for compact representation of information
§ Monitoring graph
• Each instruction represented as a state
• Edges correspond to execution of instruction
• Control-flow operations lead to multiple possible next states
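The monitoring graph described above can be illustrated with a small software model. This is only a sketch, not the hardware design from the talk: the hash function `hash4`, the instruction words in `program`, the `ENTRY` pseudo-state, and the dictionary layout of the graph are all assumptions made for illustration. It shows the principle that each state lists the hashes that may legally come next, and that any other reported hash counts as a deviation.

```python
# Toy software model of a hardware monitor (all names are illustrative).

ENTRY = -1  # hypothetical pseudo-state before the first instruction


def hash4(instruction: int) -> int:
    """Fold an instruction word into 4 bits by XOR-ing its nibbles.

    Assumed hash; the slides only say the real hash combines
    address, opcode, and registers.
    """
    h = 0
    for shift in range(0, 32, 4):
        h ^= (instruction >> shift) & 0xF
    return h


def build_graph(program, successors):
    """Offline analysis: map each state to {valid hash: set of next states}."""
    graph = {}
    for state, nexts in successors.items():
        edges = {}
        for nxt in nexts:
            edges.setdefault(hash4(program[nxt]), set()).add(nxt)
        graph[state] = edges
    return graph


def monitor(graph, start_states, reported_hashes):
    """Runtime check: return the index of the first invalid hash, or None."""
    current = set(start_states)  # NFA-style: all states we could be in
    for i, h in enumerate(reported_hashes):
        nxt = set()
        for state in current:
            nxt |= graph.get(state, {}).get(h, set())
        if not nxt:
            return i  # deviation from the graph: would trigger a reset
        current = nxt
    return None


# Four made-up instruction words; state 1 is a branch to state 2 or 3.
program = [0xE3A00001, 0x1A000002, 0xE2811004, 0xE12FFF1E]
successors = {ENTRY: [0], 0: [1], 1: [2, 3], 2: [3], 3: []}
graph = build_graph(program, successors)

good_trace = [hash4(program[i]) for i in (0, 1, 2, 3)]
print(monitor(graph, {ENTRY}, good_trace))
```

Because the hash has only 4 bits, an injected instruction can collide with a valid hash at a single step; detection relies on an attack having to match the graph over a sequence of instructions.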
• Single memory access plus lookup into fixed-size register file
§ Memory size of monitor
• More states due to NFA-to-DFA conversion
• More states due to multiple entries in memory for certain states
• In practice, overhead is below 10%
Timing Diagram
§ Attack without monitor
• Attack packet is forwarded on all ports
6.2. Experimental results
6.2.1. Attack Detection
This section explains the experiments performed to test the ability of our proposed
security monitoring system to detect and recover from an attack. We observed the security
monitor operation in simulation using the ModelSim-Altera simulator [41], and in hardware
using an Altera SignalTap logic analyzer [56].
6.2.1.1. Network processor without security monitor
We initially tested the single-core network processor operation without the security
monitor system when the attack described in section 5.1 is implemented. Figure 34 shows the
simulation results for the behavior of the processor system. The attack packet was received
through MAC port Rx0, and then forwarded to the network processor. The processor then
forwards the attack packet to all the outgoing ports of the router and then crashes the router.
This behavior was also verified in hardware.
Figure 34: Simulation waveform showing attack packet propagation in the network processor system.
6.2.1.2. Network processor with security monitor
We then repeated the previous experiment after including the security monitor as
illustrated in Figure 26. Figure 35 shows the simulation results for the behavior of the
network processor system when an attack packet and normal packet are sent simultaneously.
Timing Diagram
§ Monitor works as expected
• Attack packet is detected and dropped
• Later normal packet is forwarded
Attack with Defense in Place
§ Attack packet dropped, router continues to operate
Multicore Monitor
§ Dynamic workloads pose problem for hardware monitor
• Processing may differ between packets
• Monitors need to match processing
§ Mapping between processors and monitors
• 1-to-1 mapping requires frequent reload of monitor
• Any-to-any mapping costly to implement
• Clusters with n-to-m mapping provide balance
§ Interconnect is configured dynamically depending on workload
• Mapping between core and monitor
[Figure: cores and monitors under different mappings, including clusters in which a group of cores shares a larger group of monitors]
System Architecture of Clustered System
§ Multiple cores can access multiple monitors
• Dynamic configuration of crossbar
§ Secure loading of monitors through external interface
[Figure: n processors with inter-core interconnect, external memory, and network interface; crossbars connect processor clusters to m monitors; a control processor with AES, centralized monitor memory, control signals, and an external interface loads the monitors]
Cluster Design
§ Simple implementation of clustered monitor
• Dynamic configuration through programming of demultiplexers
[Figure: four NP cores send their hashes (Hash_1 to Hash_4) through monitor-select logic to six monitors (Monitor_1 to Monitor_6); each monitor uses processor-select logic to pick one of the four hashes and returns reset/recover to the cores]
Dual-Ported Monitor Implementation
§ Memory of monitor can be shared between two monitors
• Effective use of dual-ported memory
• Two monitoring graphs can be used in parallel
[Figure: Monitor 1 and Monitor 2 share one dual-ported memory holding Monitoring Graph 1 and Monitoring Graph 2. Each monitor contains a 4-bit hash function on the 32-bit processor instruction, one-hot encoding, hash comparison against the valid hashes, next-state select, graph select, and a reset/recover output]
Runtime Monitor Allocation
§ How many monitors per cluster?
• Number of monitors m, number of processor cores n
§ Analytical model
• Blocking occurs when no monitor is available for given packet processing
• Two programs with equal traffic and workload assumed
§ Overprovisioning of 1.5 is sufficient
[Figure: throughput (0.5 to 1) versus monitor overprovisioning m/n (1 to 2), with curves for n=2, 4, 8, 16, 32]
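The slide does not spell out the analytical model, so the following is a deliberately simplified stand-in rather than the model from the talk. It assumes that the m monitors are split evenly and statically between the two programs, and that each of the n cores independently needs one of the two programs with probability 1/2. A core is blocked when every monitor holding its program's graph is already taken. The function name `throughput` and this static split are assumptions.

```python
from math import comb


def throughput(n: int, m: int) -> float:
    """Expected fraction of cores that find a matching monitor.

    Assumed toy model: m monitors split evenly between programs A and B,
    each core needs A or B with probability 1/2, independently.
    """
    if m % 2:
        raise ValueError("this sketch assumes an even number of monitors")
    per_program = m // 2
    expected_served = 0.0
    for k in range(n + 1):  # k cores need program A, n - k need program B
        prob = comb(n, k) / 2 ** n
        served = min(k, per_program) + min(n - k, per_program)
        expected_served += prob * served
    return expected_served / n


for cores, monitors in [(2, 2), (4, 6), (2, 4)]:
    print(cores, monitors, throughput(cores, monitors))
```

Under these assumptions, blocking disappears at m/n = 2 and is already small at m/n = 1.5, which is the qualitative shape the plot describes. A dynamic allocation of graphs to monitors would behave differently.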
Prototype Implementation on FPGA
§ Multi-core system (4 cores, 6 monitors)
• Monitor logic very simple
• Interconnect uses very little resources
• Monitors require about 1/3 of memory of processors
• Monitors require about 1/8 of power of processors
Runtime Operation
§ Adaptation based on threshold in queue for application
§ Simulation results
• Monitor allocation adapts to dynamics in traffic
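One way to picture threshold-based adaptation is sketched below. The policy is an assumption, since the slide only states that adaptation is driven by a queue threshold per application. The function `rebalance`, the application names, and the rule of moving one monitor at a time from the application with the shortest queue are all hypothetical.

```python
def rebalance(queue_len, monitors, threshold):
    """Move one monitor toward an application whose queue exceeds the threshold.

    queue_len -- dict: application name -> packets waiting
    monitors  -- dict: application name -> monitors holding its graph
    Returns a new allocation dict; the inputs are not modified.
    """
    alloc = dict(monitors)
    needy = max(queue_len, key=queue_len.get)
    if queue_len[needy] <= threshold:
        return alloc  # no queue is over the threshold: keep the allocation
    # Donor: shortest queue among applications that can spare a monitor.
    donors = [app for app in alloc if app != needy and alloc[app] > 1]
    if not donors:
        return alloc
    donor = min(donors, key=queue_len.get)
    alloc[donor] -= 1
    alloc[needy] += 1
    return alloc


print(rebalance({"ipv4": 40, "cm": 3}, {"ipv4": 3, "cm": 3}, threshold=32))
```

Keeping at least one monitor per application is a design choice of this sketch, made so that no application is starved entirely.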
• Dynamics: runtime verification of monitoring graphs
− Network traffic and functionality change at runtime
− Multiple processor cores and their monitors need to be reprogrammed based on the traffic
• Homogeneity: parameterizable hashing for heterogeneity
− Practical networks use large numbers of identical router devices
− A successful attack on one device can lead to Internet-scale failures
Secure Loading of Monitoring Graph
§ Three entities:
• Router manufacturer
• Network operator
• Router/network processor
§ Signatures on graph establish chain of trust
• Network processor verifies authenticity
• Network operator can install new graph
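A chain of trust of this kind can be sketched with textbook RSA signatures. The protocol below is an assumption, not the one from the talk: the manufacturer certifies the operator's public key, the operator signs the monitoring graph, and the router trusts only the manufacturer's public key. The keys are built from small Mersenne primes and there is no padding, so this is a toy for illustration only and must not be used for real security. All names are hypothetical.

```python
import hashlib

E = 65537  # public exponent shared by both toy keys


def make_key(p: int, q: int):
    """Return (modulus, private exponent) for textbook RSA (toy sizes)."""
    return p * q, pow(E, -1, (p - 1) * (q - 1))


def digest(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data: bytes, n: int, d: int) -> int:
    return pow(digest(data, n), d, n)


def verify(data: bytes, signature: int, n: int) -> bool:
    return pow(signature, E, n) == digest(data, n)


MFR_N, MFR_D = make_key(2**61 - 1, 2**89 - 1)    # router manufacturer
OP_N, OP_D = make_key(2**107 - 1, 2**127 - 1)    # network operator

# Manufacturer certifies the operator's public key.
op_key_bytes = OP_N.to_bytes(32, "big")
op_cert = sign(op_key_bytes, MFR_N, MFR_D)

# Operator signs a (made-up) serialized monitoring graph.
graph_blob = b"state0:hash3->state1;state1:hash9->state2"
graph_sig = sign(graph_blob, OP_N, OP_D)


def router_accepts(blob, blob_sig, key_bytes, cert) -> bool:
    """Router trusts only MFR_N; everything else must chain back to it."""
    if not verify(key_bytes, cert, MFR_N):
        return False
    return verify(blob, blob_sig, int.from_bytes(key_bytes, "big"))


print(router_accepts(graph_blob, graph_sig, op_key_bytes, op_cert))
```

The two-step check is the point of the sketch: a new graph can be installed by the operator without involving the manufacturer again, yet the router still needs only one trusted key.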
Prototype Implementation on FPGA
§ Prototype system
• Altera Stratix IV FPGA on a DE4 board
• Nios II connects to an FTP server through OpenSSL
• Parameterizable hash function in hardware monitor
Security Operations Evaluation on Nios II
§ Secure download, decryption, and verification times
• IPv4 with congestion management application
• Verification takes several seconds
Parameterizable Hash Function
§ Merkle tree for hash function
• Can be parameterized
• High performance implementation in hardware
• Low resource overhead
§ Each network processor can use a different parameter value
• Resulting monitoring graph has different hash values
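The idea of a tree-structured, parameterizable hash can be sketched as follows. The construction is an assumption and not the circuit from the talk: the 32-bit instruction is XOR-ed with a 32-bit parameter, split into eight 4-bit leaves, and reduced pairwise to a single 4-bit root. The names `SBOX`, `compress`, and `param_hash`, and the choice of 4-bit permutation, are illustrative.

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]  # an arbitrary 4-bit permutation


def compress(left: int, right: int) -> int:
    """Combine two 4-bit child values into one 4-bit parent value."""
    return SBOX[left ^ SBOX[right]]


def param_hash(instruction: int, parameter: int) -> int:
    """4-bit hash of a 32-bit instruction, keyed by a 32-bit parameter."""
    mixed = (instruction ^ parameter) & 0xFFFFFFFF
    level = [(mixed >> shift) & 0xF for shift in range(0, 32, 4)]  # 8 leaves
    while len(level) > 1:  # reduce 8 -> 4 -> 2 -> 1 nodes
        level = [compress(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


instruction = 0xE3A00001  # made-up instruction word
print(param_hash(instruction, 0x00000000), param_hash(instruction, 0x00000001))
```

Two routers running the same binary with different parameters would then need different monitoring graphs, so an instruction sequence crafted to pass the monitor on one router is unlikely to pass on another.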
Hash Function Evaluation
§ Resource cost for hash function
• Compared to non-parameterizable hash function
§ Distribution of hash values in Merkle tree
• Random distribution of Hamming distance for almost all inputs
• Hash function requires zero Hamming distance for same inputs
Extension: Operating System in ES
§ Coordination between embedded OS and monitor
• Multiple active processes in OS, multiple active monitoring graphs
• Monitor switches monitoring graphs in sync with OS processes
• Requires minor extension to OS
§ Prototype:
• Nios II processor
• μC/OS-II operating system
[Figure: embedded processor (core, instruction memory with processing code per task, data memory with task context, I/O interface) sends the hash of each processing instruction and context info about the OS or active task to the hardware monitor (comparison logic, monitor memory with one monitoring graph per task and an active graph), which signals task reset. Offline analysis: processing code binary to monitoring graph. Runtime operation: monitor checks the processor]
Processor-to-Monitor Interface
§ OS on processor needs to coordinate with monitors
• Process creation (ensure monitoring graph is ready)
• Context switch between processes (switch monitoring graph)
• Process deletion (remove monitoring state)
• Reset signal from monitor
§ A set of five registers to communicate with the processor
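The four interactions listed above can be modeled in software as a small bookkeeping class. This is a sketch under assumptions: the class `MonitorInterface`, its method names, and its tables are hypothetical and do not describe the five hardware registers, whose definitions are not given on the slide.

```python
class MonitorInterface:
    """Toy model of the OS-facing side of a hardware monitor."""

    def __init__(self):
        self.graph_of = {}   # PID -> monitoring graph (opaque object here)
        self.entry_of = {}   # PID -> entry state of that graph
        self.state_of = {}   # PID -> saved monitoring state
        self.active_pid = None

    def create(self, pid, graph, entry_state=0):
        """Process creation: the graph must be ready before the task runs."""
        self.graph_of[pid] = graph
        self.entry_of[pid] = entry_state
        self.state_of[pid] = entry_state

    def context_switch(self, next_pid, current_state=None):
        """Save the outgoing task's state and activate the incoming graph."""
        if next_pid not in self.graph_of:
            raise KeyError(f"no monitoring graph loaded for PID {next_pid}")
        if self.active_pid is not None and current_state is not None:
            self.state_of[self.active_pid] = current_state
        self.active_pid = next_pid
        return self.graph_of[next_pid], self.state_of[next_pid]

    def delete(self, pid):
        """Process deletion: remove all monitoring state for the task."""
        for table in (self.graph_of, self.entry_of, self.state_of):
            table.pop(pid, None)
        if self.active_pid == pid:
            self.active_pid = None

    def reset(self):
        """Deviation detected: rewind the active graph, report the PID to the OS."""
        pid = self.active_pid
        if pid is not None:
            self.state_of[pid] = self.entry_of[pid]
        return pid


mon = MonitorInterface()
mon.create(4, "graph_A")
mon.create(21, "graph_B", entry_state=7)
print(mon.context_switch(4))
print(mon.context_switch(21, current_state=5))
```

Saving the monitoring state per task is what lets the monitor resume checking a preempted task exactly where it left off.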
Operating System Support
§ Hardware monitoring logic tracks OS operations
[Figure: monitor datapath. A processor interface (PID, GID, operation, done, recovery signal) connects the CPU pipeline and interrupt controller to a control FSM. A register file holds the number of active processes, the PID-to-GID binding, the GID-to-frame binding, PID address pointers, and base addresses. Graph memory is divided into slot regions and groups holding valid-hash and next-state entries. Hash calculation, one-hot encoding, hash comparison, and sequencing logic check each CPU instruction. A DMA controller loads graphs from a graph pool into graph memory]
Operating System Support
§ Context switch interactions:
Operating System Support
§ Implementation cost on Stratix IV FPGA
§ Hardware monitoring can be used for embedded systems
• Embedded systems are similarly performance constrained
Conclusions
§ Current and future Internet needs to meet new demands
• Flexibility is key to avoid ossification
• Deployment of new edge services requires programmable data plane
§ Programmable routers provide packet processing platform
• Systems problem: security vulnerabilities
• Attacks can be launched within data plane (i.e., no control access needed)
• Monitor-based hardware defense mechanism is effective
§ Our work has addressed many practical concerns
• Workload dynamics and secure installation of monitoring graphs
• System heterogeneity
• Extension to general embedded systems with operating systems
§ Exciting research area that spans computer networking, embedded systems, and system security
Acknowledgements
§ Graduate students
• Kekai Hu (now: Intel)
• Arman Pouraghily
• Harikrishnan Chandrikakutty
§ Sponsors
• National Science Foundation
• Altera Corporation
Selected Publications
§ Data plane attack:
• Danai Chasaki and Tilman Wolf. Attacks and defenses in the data plane of networks. IEEE Transactions on Dependable and Secure Computing, 9(6):798–810, November 2012.
• Danai Chasaki and Tilman Wolf. Design of a secure packet processor. In Proc. of ACM/IEEE Symposium on Architectures for Networking and Communication Systems (ANCS), San Diego, CA, October 2010.
§ Hardware monitors for network processors:
• Shufu Mao and Tilman Wolf. Hardware support for secure processing in embedded systems. IEEE Transactions on Computers, 59(6):847–854, June 2010.
• Harikrishnan Kumarapillai Chandrikakutty, Deepak Unnikrishnan, Russell Tessier, and Tilman Wolf. High-performance hardware monitors to protect network processors from data plane attacks. In Proc. of 50th Design Automation Conference (DAC), Austin, TX, June 2013.
• Kekai Hu, Harikrishnan Chandrikakutty, Russell Tessier, and Tilman Wolf. Scalable hardware monitors to protect network processors from data plane attacks. In Proc. of First IEEE Conference on Communications and Network Security (CNS), Washington, DC, October 2013. (Best Paper Award)
• Kekai Hu, Tilman Wolf, Thiago Teixeira, and Russell Tessier. System-level security for network processors with hardware monitors. In Proc. of 51st Design Automation Conference (DAC), San Francisco, CA, June 2014.
§ Hardware monitors for embedded systems:
• Tedy Thomas, Arman Pouraghily, Kekai Hu, Russell Tessier, and Tilman Wolf. Multi-task support for security-enabled embedded processors. In Proc. of 26th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), pages 136–143, Toronto, ON, July 2015.