eBay’s Scaling Odyssey: Growing and Evolving a Large eCommerce Site
Randy Shoup and Franco Travostino, eBay Distinguished Architects
The 2nd Workshop on Large-Scale Distributed Systems and Middleware (LADIS ’08)
IBM T.J. Watson Research Center, Yorktown, NY, USA, September 15-17, 2008
Pattern: Failure Detection
– Servers log all requests
  • Log all application activity, database and service calls on multicast message bus
  • Over 2TB of log messages per day
– Listeners automate failure detection and notification
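To illustrate the listener idea above, here is a minimal sketch of a bus listener that raises an alert when a source's recent error rate crosses a threshold. Everything in it is an assumption made for illustration: the class name `ErrorRateListener`, the message fields, the sliding-window rule and the default values are hypothetical and are not taken from eBay's system.

```python
from collections import deque


class ErrorRateListener:
    """Hypothetical bus listener that flags a source whose recent error rate is too high."""

    def __init__(self, window_seconds=60.0, threshold=0.5, min_events=5):
        self.window_seconds = window_seconds  # how far back to look (assumed value)
        self.threshold = threshold            # error fraction that triggers an alert
        self.min_events = min_events          # avoid alerting on tiny samples
        self._events = {}                     # source -> deque of (timestamp, is_error)
        self.alerts = []                      # stand-in for a real notification channel

    def on_message(self, timestamp, source, is_error):
        """Record one log message; return True if it triggers an alert."""
        window = self._events.setdefault(source, deque())
        window.append((timestamp, is_error))
        # Drop messages that have fallen out of the sliding window.
        while window and window[0][0] <= timestamp - self.window_seconds:
            window.popleft()
        errors = sum(1 for _, failed in window if failed)
        if len(window) >= self.min_events and errors / len(window) >= self.threshold:
            self.alerts.append((timestamp, source, errors, len(window)))
            return True
        return False
```

A real deployment would subscribe `on_message` to the message bus and replace the `alerts` list with a paging or ticketing hook; the sliding window is only one of several plausible detection rules.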
Pattern: Rollback
– Absolutely no changes to the site which cannot be undone (!)
– Every feature has on / off state driven by central configuration
  • Feature can be immediately turned off for operational or business reasons
  • Features can be deployed “wired-off” to unroll dependencies
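The on / off pattern above can be sketched as a lookup against a central configuration object. This is a minimal illustration under stated assumptions: `FeatureConfig`, `render_item_page` and the feature name `new_recommendations` are hypothetical, and a production system would read the states from a shared configuration service rather than an in-memory dict.

```python
class FeatureConfig:
    """Hypothetical central configuration holding an on/off state per feature."""

    def __init__(self, states=None):
        self._states = dict(states or {})

    def is_on(self, feature):
        # Unknown features default to off, so new code ships "wired-off".
        return self._states.get(feature, False)

    def set_state(self, feature, on):
        self._states[feature] = bool(on)


def render_item_page(config, item):
    """Build a page; the optional feature is consulted on every request."""
    page = {"title": item["title"]}
    if config.is_on("new_recommendations"):  # hypothetical feature name
        page["recommendations"] = ["placeholder"]
    return page
```

Because the check happens at request time, flipping the state in the configuration turns the feature off without a redeploy, which is what makes the change reversible.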
Pattern: Graceful Degradation
– Application “marks down” an unavailable or distressed resource
– Non-critical functionality is removed or ignored
– Critical functionality is retried or deferred
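A rough sketch of the three rules above follows. It assumes a hypothetical in-process `MarkdownRegistry`, a `ResourceUnavailable` exception and a plain list used as the deferral queue; none of these names come from the original slides, and the retry count is an arbitrary illustrative value.

```python
class ResourceUnavailable(Exception):
    """Raised by an operation when its backing resource cannot serve the call."""


class MarkdownRegistry:
    """Hypothetical record of resources the application has marked down."""

    def __init__(self):
        self._down = set()

    def mark_down(self, name):
        self._down.add(name)

    def mark_up(self, name):
        self._down.discard(name)

    def is_down(self, name):
        return name in self._down


def call_resource(registry, name, operation, critical, deferred, retries=2):
    """Run operation(); skip non-critical work, retry then defer critical work."""
    if registry.is_down(name):
        if critical:
            deferred.append((name, operation))  # deferred for later replay
        return None                             # non-critical work is simply skipped
    attempts = 1 + (retries if critical else 0)
    for _ in range(attempts):
        try:
            return operation()
        except ResourceUnavailable:
            continue
    registry.mark_down(name)                    # stop sending traffic to the resource
    if critical:
        deferred.append((name, operation))
    return None
```

A separate health check would normally call `mark_up` once the resource recovers, and a background worker would replay the deferred operations.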
Choose Appropriate Consistency Guarantees
– Brewer’s CAP Theorem
  • To guarantee availability and partition-tolerance, we trade off immediate consistency
– Consistency is a spectrum, not binary
– Prefer eventual consistency to immediate consistency
Immediate Consistency: Bids, Purchases
Eventual Consistency: Search Engine, Billing System, etc.
No Consistency: Preferences
Avoid Distributed Transactions
– eBay does absolutely no distributed transactions, no two-phase commit
– Minimize inconsistency through state machines and careful ordering of operations
– Eventual consistency through asynchronous event or reconciliation batch
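One way to picture the combination of state machine, ordered writes, asynchronous events and reconciliation is the sketch below. It is an illustration under assumptions, not eBay's implementation: the order states, the `outbox` list and the `search_index` dict are hypothetical stand-ins, and the sketch assumes the primary record and the outbox live in the same database so a single local transaction can cover both writes.

```python
# Allowed transitions of a hypothetical order state machine.
ALLOWED = {"NEW": {"PAID"}, "PAID": {"SHIPPED"}, "SHIPPED": set()}


def advance(orders, outbox, order_id, new_state):
    """Apply one legal transition locally, then queue an event for other systems."""
    current = orders[order_id]
    if new_state not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current} -> {new_state}")
    orders[order_id] = new_state          # step 1: write the system of record first
    outbox.append((order_id, new_state))  # step 2: event for asynchronous propagation


def drain(outbox, search_index):
    """Asynchronous consumer: apply queued events to a secondary store in order."""
    while outbox:
        order_id, state = outbox.pop(0)
        search_index[order_id] = state


def reconcile(orders, search_index):
    """Batch job: repair any secondary entry that disagrees with the primary."""
    repaired = []
    for order_id, state in orders.items():
        if search_index.get(order_id) != state:
            search_index[order_id] = state
            repaired.append(order_id)
    return repaired
```

Between `advance` and `drain` the two stores disagree, which is the window of eventual consistency; `reconcile` covers the case where an event is lost entirely.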
Challenge: From reactive to proactive-and-predictable
• Model-centric architecture
  – Model reflects relationships, behaviors, constraints
  – Applies to discrete components throughout the whole system of systems
• Failure ≡ gap from model
  – Failure management enacted upon crossing tolerance threshold T
• What’s the Erlang-B equivalent of failing a service request by a random user?
  – Once I know what that is, can I actively manage it?
• Experience-driven design
  – Design evolves incrementally with experience deriving from a live user community
  – Contrast against controlled conditions and synthetic workloads
  – Just say PlanetLab!
• Improve in a substantive way
  – 3% improvement doesn’t cut it
  – It better be worthy of changing engines in a mid-air plane…
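For readers unfamiliar with the Erlang-B reference in the challenge list above: in telephony, Erlang-B gives the probability that a call is blocked when a given offered load (in erlangs) is presented to a fixed number of servers with no queueing. The sketch below computes it with the standard recurrence. The function name and parameters are chosen here for illustration, and how the formula would map onto failed service requests is the open question the slide poses, not something this code answers.

```python
def erlang_b(offered_load, servers):
    """Blocking probability for `offered_load` erlangs offered to `servers` servers."""
    if offered_load < 0 or servers < 0:
        raise ValueError("offered_load and servers must be non-negative")
    blocking = 1.0  # with zero servers every request is blocked
    for m in range(1, servers + 1):
        # Standard recurrence: B(E, m) = E*B(E, m-1) / (m + E*B(E, m-1))
        blocking = offered_load * blocking / (m + offered_load * blocking)
    return blocking
```

The recurrence avoids the large factorials of the closed-form expression, so it stays numerically stable for large server counts.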
“a firm will tend to expand until the costs of organizing an extra transaction within the firm become equal to the costs of carrying out the same transaction by means of an exchange on the open market….”
-- “The Nature of the Firm” (1937)
Randy Shoup has been the primary architect for eBay's search infrastructure since 2004. Prior to eBay, Randy was Chief Architect and Technical Fellow at Tumbleweed Communications, and has also held a variety of software development and architecture roles at Oracle and Informatica.
Franco Travostino is the lead architect for virtualization and cloud computing. Prior to eBay, Franco has enjoyed several journeys in research and development, with features in operating systems, data communication, storage area networks and security. Franco is the first author of the book “Grid Networks: Enabling Grids with Advanced Communication Technology” (Wiley, 2006).