SPUD: A Distributed High Performance Publish-Subscribe Cluster
Uriel Peled and Tal Kol
Guided by Edward Bortnikov
Software Systems Laboratory, Faculty of Electrical Engineering, Technion
Project Goal
Design and implement a general-purpose Publish-Subscribe server
Push traditional implementations to global-scale performance demands:
1 million concurrent clients
Millions of concurrent topics
High transaction rate
Demonstrate server abilities with a fun client application
What is Pub/Sub?
Clients subscribe to a topic, e.g. topic://traffic-jams/ayalon
A publisher publishes a message to the topic: "accident in hashalom"
Every subscriber to the topic receives "accident in hashalom"
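The subscribe/publish flow above can be sketched as a minimal in-memory broker. This is an illustration only (the names `Broker`, `subscribe`, and `publish` are hypothetical, not SPUD's actual API), and it ignores everything that makes SPUD interesting: networking, distribution, and scale.

```python
from collections import defaultdict

class Broker:
    """Minimal in-memory pub/sub broker (illustration only)."""

    def __init__(self):
        # topic URI -> set of subscriber callbacks
        self._subs = defaultdict(set)

    def subscribe(self, topic, callback):
        self._subs[topic].add(callback)

    def publish(self, topic, message):
        # Deliver the message to every current subscriber of the topic
        for callback in self._subs[topic]:
            callback(message)

# Usage: two commuters subscribe to the same topic, one report reaches both
broker = Broker()
inbox_a, inbox_b = [], []
broker.subscribe("topic://traffic-jams/ayalon", inbox_a.append)
broker.subscribe("topic://traffic-jams/ayalon", inbox_b.append)
broker.publish("topic://traffic-jams/ayalon", "accident in hashalom")
```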
What Can We Do With It? Collaborative Web Browsing
What Can We Do With It? Instant Messaging
One user sends "Hi buddy!" by publishing it; the other receives it as a subscriber
Seems Easy To Implement, But…
“I’m behind a NAT, I can’t connect!”
Not all client setups are server-friendly
“Server is too busy, try again later?!”
1 million concurrent clients is simply too much
“The server is so slow!!!”
Service time grows exponentially with load
“A server crashed, everything is lost!”
Single points of failure will eventually fail
Naïve Implementation (example 1)
Simple UDP for client-server communication
No need for sessions since we just send messages
Very low cost-per-client
Sounds perfect? No: a NAT drops the server’s unsolicited UDP packets
NAT Traversal
UDP hole punching
The NAT will accept a UDP reply only for a short window
Our measurements: 15-30 seconds
Keep each client pinging over UDP every 15s
Days-long TCP sessions
The NAT remembers current sessions for replies
If the WWW works, we should work
Dramatically increases cost-per-client
Our research: all IMs do exactly this
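The UDP hole-punching option can be sketched with plain sockets. This demo uses the loopback interface as a stand-in for a real server behind the Internet, so there is no actual NAT in the picture; the comments mark where the NAT mapping (and the 15s re-ping from the slide) would come into play.

```python
import socket

# Stand-in "server": in production this would be the pub/sub server's
# public endpoint on the far side of the client's NAT.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server_addr = server.getsockname()

# Client sends a ping. A real NAT would now create a mapping that stays
# open only 15-30 seconds, so the client must re-send this ping every
# ~15s to keep the return path alive.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(2.0)
client.sendto(b"ping", server_addr)

# The server's reply rides back through the (hypothetical) NAT mapping
data, client_addr = server.recvfrom(64)
server.sendto(b"pong", client_addr)

reply, _ = client.recvfrom(64)
client.close()
server.close()
```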
Naïve Implementation (example 2)
Blocking I/O with one thread per client
Basic model for most servers (Java default)
Traditional UNIX: fork for every client
Sounds perfect?
Network I/O Internals
Blocking I/O – one thread per client
2MB stack per thread in 1GB of virtual address space is enough for only 512 threads (!)
Non-blocking I/O – select
Linear fd searches are very slow
Asynchronous I/O – completion ports
Thread pool to handle request completion
Our measurements: 30,000 concurrent clients!
What is the bottleneck?
Number of locked pages (zero-byte receives)
TCP/IP kernel driver non-paged pool allocations
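The middle option on the slide, multiplexing many clients without a thread each, can be sketched with Python's selectors module (epoll on Linux, avoiding select's linear fd scans). This is only an illustration of the readiness model; SPUD itself used Windows asynchronous I/O with completion ports, which is a completion model rather than a readiness model.

```python
import selectors
import socket
import threading

def echo_server(listener, stop):
    """One thread multiplexes all connections via readiness events,
    instead of dedicating a 2MB-stack thread to each client."""
    sel = selectors.DefaultSelector()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ)
    while not stop.is_set():
        for key, _ in sel.select(timeout=0.1):
            sock = key.fileobj
            if sock is listener:
                conn, _ = listener.accept()       # new client
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
            else:
                data = sock.recv(4096)            # ready, won't block
                if data:
                    sock.sendall(data)            # echo back
                else:
                    sel.unregister(sock)          # client disconnected
                    sock.close()
    sel.close()

# Usage: one client round-trips a message through the server
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen()
stop = threading.Event()
t = threading.Thread(target=echo_server, args=(listener, stop))
t.start()

client = socket.create_connection(listener.getsockname())
client.sendall(b"hello")
echoed = client.recv(4096)
client.close()
stop.set()
t.join()
listener.close()
```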
Scalability
Scale up: buy a bigger box
Scale out: buy more boxes
Which one to do? Both!
Push each box to its hardware maximum, since 1000’s of servers is impractical
Add relevant boxes as load increases: the Google way (cheap PC server farms)
Identify Our Load Factors
Concurrent TCP clients
Scale up: async I/O, zero-byte receives, larger non-paged pool (NPP)
Scale out: dedicate boxes to handle clients => Connection Server (CS)
High transaction throughput (topic load)
Scale up: software optimizations
Scale out: dedicate boxes to handle topics => Topic Server (TS)
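Scaling out the Topic Servers requires every Connection Server to agree on which TS owns a given topic. One simple scheme, sketched here with hypothetical names (the source does not describe SPUD's actual partitioning logic), is to hash each topic URI onto a fixed list of Topic Servers:

```python
import hashlib

TOPIC_SERVERS = ["ts-0", "ts-1", "ts-2", "ts-3"]  # hypothetical TS cluster

def topic_server_for(topic):
    """Deterministically map a topic to one Topic Server, so every
    Connection Server routes a topic's traffic to the same box."""
    digest = hashlib.sha1(topic.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(TOPIC_SERVERS)
    return TOPIC_SERVERS[index]

# Every caller agrees on the owner of a topic
owner = topic_server_for("topic://traffic-jams/ayalon")
```

A plain modulo mapping reshuffles almost every topic when a server is added; a production design would more likely use consistent hashing to limit that churn.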