TIME-WAIT Hack for High Performance Ephemeral Connection in Linux TCP Stack

Feb 18, 2016

EA Faisal

Slides used for presentation during MOSC Q4 Meetup 2015
Transcript
Page 1: TIME-WAIT Hack for High Performance Ephemeral Connection in Linux TCP Stack

TIME-WAIT Hack for High Performance Ephemeral Connection in Linux TCP Stack

E A Faisal

Page 2

$ whoami

Engku Ahmad Faisal

⇛ github.com/efaisal
⇛ twitter.com/efaisal
⇛ facebook.com/eafaisal
⇛ plus.google.com/u/0/+EAFaisal

Linux user since 1996/1997

Attempted to contribute to open source projects: few accepted, most rejected ;-P

Page 3

$ whoami

Worked with Nexo Prima Sdn Bhd

● Open Source Cloud Infrastructure
  ○ Virtualisation: oVirt/OpenStack
  ○ Storage: Gluster/Ceph
● High Availability & Scalability Infrastructure
  ○ Linux-based solutions
● System Performance Tuning & Profiling
  ○ Focusing on web-based applications on the Linux platform

Page 4

TCP STATE MACHINE

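Page 5

TCP :: ACTIVE CLOSE

[Figure: TCP state diagram for the active close. Transitions are labelled event/action.]

● 3-way handshake → ESTABLISHED
● ESTABLISHED → FIN_WAIT_1 on close()/fin
● FIN_WAIT_1 → FIN_WAIT_2 on ack/-
● FIN_WAIT_1 → CLOSING on fin/ack
● FIN_WAIT_1 → TIME_WAIT on fin+ack/ack
● FIN_WAIT_2 → TIME_WAIT on fin/ack
● CLOSING → TIME_WAIT on ack/-
● TIME_WAIT → CLOSED on 2MSL timeout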
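The active-close path of the TCP state machine can be written down as a small transition table. The sketch below is only an illustration: the event names (`close`, `recv_ack`, and so on) are labels made up for this example, not kernel identifiers.

```python
# Active-close transitions, keyed by (current state, event).
# Event names are illustrative labels chosen for this sketch.
ACTIVE_CLOSE = {
    ("ESTABLISHED", "close"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("FIN_WAIT_1", "recv_fin"): "CLOSING",
    ("FIN_WAIT_1", "recv_fin_ack"): "TIME_WAIT",
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",
    ("CLOSING", "recv_ack"): "TIME_WAIT",
    ("TIME_WAIT", "2msl_timeout"): "CLOSED",
}


def walk(state, events, table=ACTIVE_CLOSE):
    """Apply the events in order and return every state visited."""
    path = [state]
    for event in events:
        state = table[(state, event)]  # KeyError means an invalid transition
        path.append(state)
    return path


# The common case: our FIN is acknowledged first, then the peer's FIN arrives.
print(" -> ".join(walk("ESTABLISHED", ["close", "recv_ack", "recv_fin", "2msl_timeout"])))
```

Every route out of FIN_WAIT_1 passes through TIME_WAIT before reaching CLOSED, which is why the side that calls close() first always pays the 2MSL wait.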

Page 6

TCP :: ACTIVE CLOSE

● Performed by the initiator of close()
● TIME-WAIT & 2MSL are there for good reasons:
  ○ the nature of the Internet: packets get lost, are retransmitted, or arrive late
  ○ to ensure the other end has closed properly
● RFC 793 sets MSL to 2 minutes, so 2MSL is 4 minutes
● 2MSL in practice:
  ○ MS Windows - 4 minutes
  ○ Linux - 1 minute (hard coded)

TIME-WAIT is good for TCP communication over the Internet

Page 7

TCP :: PASSIVE CLOSE

[Figure: TCP state diagram for the passive close. Transitions are labelled event/action.]

● 3-way handshake → ESTABLISHED
● ESTABLISHED → CLOSE_WAIT on fin/ack
● CLOSE_WAIT → LAST_ACK on close()/fin
● LAST_ACK → CLOSED on ack/-
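To see how the passive close pairs up with the active close, the sketch below walks both ends of one connection through an orderly shutdown. It is a toy model, not real networking: segments are plain strings and the table covers only the simple case where the two FINs do not cross.

```python
# (state, event) -> (next state, segment sent in response or None).
# An event is either the local close() call or a segment from the peer.
TRANSITIONS = {
    ("ESTABLISHED", "close()"): ("FIN_WAIT_1", "FIN"),
    ("ESTABLISHED", "FIN"): ("CLOSE_WAIT", "ACK"),
    ("FIN_WAIT_1", "ACK"): ("FIN_WAIT_2", None),
    ("CLOSE_WAIT", "close()"): ("LAST_ACK", "FIN"),
    ("FIN_WAIT_2", "FIN"): ("TIME_WAIT", "ACK"),
    ("LAST_ACK", "ACK"): ("CLOSED", None),
}


def orderly_close():
    """Run one orderly close and return the final state of each end."""
    states = {"active": "ESTABLISHED", "passive": "ESTABLISHED"}

    def step(side, event):
        states[side], segment = TRANSITIONS[(states[side], event)]
        return segment

    fin = step("active", "close()")   # active end sends FIN
    ack = step("passive", fin)        # passive end ACKs, enters CLOSE_WAIT
    step("active", ack)               # active end moves to FIN_WAIT_2
    fin = step("passive", "close()")  # passive application closes, sends FIN
    ack = step("active", fin)         # active end ACKs, enters TIME_WAIT
    step("passive", ack)              # passive end is done
    return states


print(orderly_close())
```

In this model the passive end finishes in CLOSED immediately, while the active end is left in TIME_WAIT until the 2MSL timer expires.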

Page 8

TCP :: PASSIVE CLOSE

● Performed by the side that receives the FIN, i.e. the peer of the side that called close()
● CLOSE-WAIT
  ○ lasts until the local application calls close()
● tcp_fin_timeout (60 seconds by default in Linux)
  ○ limits how long an orphaned connection stays in FIN_WAIT_2; it does not control CLOSE-WAIT or TIME-WAIT
● WARNING! Some resources on the Web wrongly tell their readers to tweak tcp_fin_timeout to tune TIME-WAIT

Page 9

WEB APPLICATION OF TODAY

Page 10

SIMPLIFIED WEB APP STACK

[Figure: Client → Load Balancer → Web App → supporting services: Database, MQ, Cache, REST API]

Page 11

WEB APP STACK

● Supporting services for the Web App layer typically use TCP as the transport protocol
● The Web App layer is both:
  ○ a TCP server listening for connections from the client
  ○ a TCP client connecting to various supporting services
● Consider a LAMP stack + memcached server
  ○ Each HTTP request creates/opens a TCP connection to memcached
  ○ At the end of the request, the connection is closed
  ○ OMG! Ephemeral connection!
  ○ If we have more supporting services (MQ, REST API, etc.), there might be more open/close operations for each request
  ○ HTTP is considered ephemeral by “nature”
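The connection-per-request pattern can be reproduced on the loopback interface. In this minimal sketch a tiny echo server stands in for memcached; `start_backend` and `handle_http_request` are hypothetical names for this example, and the echoed bytes are not the real memcached protocol.

```python
import socket
import threading


def start_backend():
    """Start a loopback echo server standing in for a backend such as memcached.

    Returns (port, accepted); `accepted` gains one entry per accepted connection.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(16)
    accepted = []

    def handle(conn):
        with conn:
            while True:
                data = conn.recv(1024)
                if not data:  # peer closed its side
                    return
                conn.sendall(data)

    def serve():
        while True:
            conn, addr = listener.accept()
            accepted.append(addr)
            threading.Thread(target=handle, args=(conn,), daemon=True).start()

    threading.Thread(target=serve, daemon=True).start()
    return listener.getsockname()[1], accepted


def handle_http_request(backend_port):
    """One simulated HTTP request: open, use, and close a backend connection."""
    with socket.create_connection(("127.0.0.1", backend_port)) as conn:
        conn.sendall(b"get key\r\n")
        reply = conn.recv(1024)
    # Leaving the `with` block closes the socket. The web app closes first,
    # so it is the active closer and its end of the connection enters TIME_WAIT.
    return reply
```

Calling `handle_http_request(port)` in a loop opens one backend connection per simulated request. On Linux, `ss -nt state time-wait` run afterwards should list the closed connections on the client side.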

Page 12

IMPACT AND PROBLEMS

Page 13

BUSY SERVER WITH EPHEMERAL CONNECTIONS

● Busy server, e.g. 1,000 HTTP requests/second
● The Web App layer also opens TCP connections to backend services at that rate or more
● In 1 minute, we’re going to have tens of thousands of TCP connections lingering in TIME_WAIT
● You can check using the netstat or ss command

$ ss -nt state time-wait
$ netstat -tn | grep TIME_WAIT
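The same count can be obtained programmatically. The sketch below assumes the `/proc/net/tcp` text format, where the fourth column (`st`) holds the connection state as a hex code and `06` means TIME_WAIT. The function name is made up for this example, and only IPv4 sockets are covered (`/proc/net/tcp6` lists IPv6).

```python
TCP_TIME_WAIT = "06"  # hex state code used in the `st` column of /proc/net/tcp


def count_time_wait(proc_net_tcp_text):
    """Count TIME_WAIT entries in the text content of /proc/net/tcp."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) > 3 and fields[3] == TCP_TIME_WAIT:
            count += 1
    return count
```

On a Linux host it could be used as `count_time_wait(open("/proc/net/tcp").read())`.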

Page 14

PROBLEMS: CONNECTION TABLE SLOT

A connection in TIME-WAIT state holds a local port for 1 minute

The local port range is finite: port numbers are 16-bit integers

In many distros, the default range covers around 30,000 ports

Can be changed: net.ipv4.ip_local_port_range

If the local port range is exhausted, connect() fails with EADDRNOTAVAIL
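A quick back-of-the-envelope calculation shows how soon the port range runs out. This sketch assumes every closed connection holds its local port for the full TIME-WAIT period and that all connections go to a single destination address and port. The range 32768-60999 is only an assumed typical default; check `net.ipv4.ip_local_port_range` on your own system.

```python
def max_connect_rate(port_low, port_high, time_wait_seconds=60):
    """Upper bound on new connections per second to one destination before
    every local port in the range is held by a TIME-WAIT connection."""
    ports = port_high - port_low + 1  # the range is inclusive at both ends
    return ports / time_wait_seconds


# Assumed default range; the real one is in net.ipv4.ip_local_port_range.
rate = max_connect_rate(32768, 60999)
print(f"{rate:.0f} connections/second")
```

Under these assumptions the limit is about 470 connections per second, well below the 1,000 requests/second of the busy-server example.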

Page 15

PROBLEMS: ADDITIONAL MEMORY & CPU USAGE

● Memory usage to hold socket structures
  ○ Not really significant, but annoying enough
● Additional CPU usage
  ○ Searching for a free port uses CPU
  ○ CPU cycles are wasted iteratively purging tons of TIME_WAIT connections

Page 16

EXISTING & POTENTIAL SOLUTIONS

Page 17

SOLUTION 1: tcp_tw_reuse

From the Linux documentation: “Allow to reuse TIME-WAIT sockets for new connections when it is safe from protocol viewpoint. Default value is 0. It should not be changed without advice/request of technical experts.”

Commonly recommended to be enabled:

$ echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse

Depends on another kernel parameter being enabled: net.ipv4.tcp_timestamps

Does it really work?
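Because tcp_tw_reuse depends on tcp_timestamps, it can help to check both settings together. The helper below is a minimal sketch: the function names are made up for this example, the `root` parameter exists only so the check can be pointed at a test directory, and only the value 1 is treated as "enabled" for tcp_tw_reuse (some kernels may define other values).

```python
from pathlib import Path


def read_sysctl(name, root="/proc/sys"):
    """Read a sysctl such as 'net.ipv4.tcp_tw_reuse' through the proc filesystem."""
    return Path(root, *name.split(".")).read_text().strip()


def tw_reuse_active(root="/proc/sys"):
    """True only when tcp_tw_reuse is 1 and tcp_timestamps is not disabled."""
    reuse = read_sysctl("net.ipv4.tcp_tw_reuse", root)
    timestamps = read_sysctl("net.ipv4.tcp_timestamps", root)
    return reuse == "1" and timestamps != "0"
```

Timestamps must also be in use on the connection itself, so a `True` result here is a necessary condition rather than a guarantee that a given TIME-WAIT socket will be reused.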

Page 18

SOLUTION 2: TIME-WAIT NEGOTIATION

Proposed by Theodore Faber, Joe Touch & Wei Yue from the University of Southern California in 1999

No code is available; the authors claim to have written experimental code for SunOS 4.1.3

Involves modifying TCP by adding a new TCP option called TW-Negotiate, negotiated during the three-way handshake

Not a viable solution, simply a theoretical one

Page 19

INTRODUCING LINUXTCPTW

Page 20

LINUXTCPTW

Implementation of an old idea

● Making TIME-WAIT tunable was once discussed on the kernel core dev mailing list
● Rejected by the kernel core devs - TIME-WAIT is there for good reasons
● Easily abused to make TCP non-compliant with the standard
● Open source project to create a patch set for the kernel for a configurable TIME-WAIT
● Introduces a new kernel parameter - tcp_timewait_len
● A new entry in proc fs - /proc/sys/net/ipv4/tcp_timewait_len
● Configurable via sysctl - net.ipv4.tcp_timewait_len
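A stock kernel does not expose this parameter, so a deployment script might first check whether the patched kernel is running. The sketch below only tests for the proc entry named above. The function name and the `root` parameter are assumptions for illustration, and because the slides do not state the unit or valid range of tcp_timewait_len, no value is written.

```python
from pathlib import Path


def has_tunable_timewait(root="/proc/sys"):
    """True when the running kernel exposes net.ipv4.tcp_timewait_len,
    i.e. when it carries the linuxtcptw patch."""
    return Path(root, "net", "ipv4", "tcp_timewait_len").exists()


if has_tunable_timewait():
    print("tcp_timewait_len is available")
else:
    print("stock kernel: TIME-WAIT length is not configurable")
```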

Page 21

THE PROJECT

The project lives at https://github.com/efaisal/linuxtcptw/

Binary releases are available for CentOS 6 and 7 at https://github.com/efaisal/linuxtcptw/releases

Unfortunately not battle-tested in a production environment yet - any volunteers?

Currently working on the Ubuntu 14.04 LTS kernel

Page 22

THANK YOU