When small problems become big problems @adrianfcole

When small problems become big problems

Jul 07, 2015

Technology

Adrian Cole

Challenges met while developing a multi-tenant PaaS runtime, summarized from interviews with the CloudHub.io development and product teams.
Transcript
Page 1: When small problems become big problems

When small problems become big problems

@adrianfcole

Page 2: When small problems become big problems

Agenda

• Introduction to CloudHub

• Challenges we faced building multi-tenant architecture

• Q/A

Page 3: When small problems become big problems

Ego slide

Adrian Cole (@jclouds)

• founded jclouds, March 2009

• architect at cloudhub.io

Page 4: When small problems become big problems

Page 5: When small problems become big problems

Platform as a Service

• Automated Provisioning

• Event Tracking

• Centralized Logging

• Secure Data Gateway

Page 6: When small problems become big problems

The landlord’s dilemma

Page 7: When small problems become big problems

When you’ve priced yourself out of business

Page 8: When small problems become big problems

Cloud is utility, but your service may be more

• Measurement-based pricing exists in the infrastructure tier

• Know your customer: who they are and where in the value chain you act

• Don’t get into a race to the bottom

Page 9: When small problems become big problems

When 200 users become 2000 accounts

Page 10: When small problems become big problems

Choosing a BASIC starting point

• Already had an LDAP infrastructure

• Straightforward integration with console and other access tools

• Easy to do BASIC authentication

Page 11: When small problems become big problems

Remember users (and api users)

• Basic Auth is not a good choice for an API over time

• System integrators need delegated access

• Hard to clean up accounts when there are multiple owners

Page 12: When small problems become big problems

When myapp.cloudhub.io becomes myapp001.cloudhub.io

Page 13: When small problems become big problems

How to present the iApps

• X.cloudhub.io

• DNS is flexible to deal with

• clear branding

Page 14: When small problems become big problems

X.cloudhub.io woes

• Namespace contention

• qa.cloudhub.io isn’t really an iApp

• need to maintain blacklist
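
The blacklist idea can be sketched in a few lines of Python. This is only an illustration: the `RESERVED` set, the `claim_subdomain` helper, and the in-memory `taken` set are assumptions made for the example, not CloudHub's actual list or code.

```python
import re

# Hypothetical reserved labels; the real blacklist is not given in the talk.
RESERVED = {"www", "qa", "api", "admin", "mail", "status"}

# A DNS label: 1-63 chars of letters, digits, or hyphens,
# with no leading or trailing hyphen.
LABEL = re.compile(r"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$")


def claim_subdomain(name, taken):
    """Return the normalized label if it may be claimed, else raise ValueError."""
    label = name.strip().lower()
    if not LABEL.match(label):
        raise ValueError(f"{name!r} is not a valid DNS label")
    if label in RESERVED:
        raise ValueError(f"{label!r} is reserved for the platform")
    if label in taken:
        raise ValueError(f"{label!r} is already taken")
    taken.add(label)
    return label
```

Every name the platform wants for itself later (such as qa) has to be in the reserved set before a tenant claims it, which is why the list needs ongoing maintenance.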

Page 15: When small problems become big problems

When mule isn’t mule

Page 16: When small problems become big problems

PaaS is more than java -jar mule.jar

• CloudHub adds services integration to Mule

• Logging, Event Tracking, Replay, etc.

Page 17: When small problems become big problems

appstack -> platform is tricky

• transparent features and also compatible?

• dealing with network streams that could be more brittle

• matching serialization/marshalling w/ cloud features like streaming

Page 18: When small problems become big problems

When SLA turns into refund

Page 19: When small problems become big problems

Desire to rely on more services

• Cloud Infrastructure

• Cloud Search

• Cloud Scaling

Page 20: When small problems become big problems

Reality of relying on more services

• uptime drops with every service dependency you add

• services may underperform their SLAs with little financial impact

• you may need to manually deal with service outages
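
A rough way to see the uptime point, sketched in Python. It assumes the platform needs every dependency up at once and that dependencies fail independently; the 99.9% figure is illustrative, not taken from any real SLA.

```python
from math import prod


def composite_availability(availabilities):
    """Availability of a service that needs all of its dependencies up,
    assuming the dependencies fail independently of each other."""
    return prod(availabilities)


def downtime_hours_per_year(availability):
    """Expected hours of downtime in a 365-day year."""
    return (1.0 - availability) * 365 * 24


# Illustrative numbers only: each dependency promises 99.9%.
for n in (1, 3, 5):
    a = composite_availability([0.999] * n)
    hours = downtime_hours_per_year(a)
    print(f"{n} deps: {a:.4%} available, ~{hours:.1f} h down/year")
```

Under these assumptions, three 99.9% dependencies in series already leave the platform at roughly 99.7%, before counting its own failures.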

Page 21: When small problems become big problems

When logging turns into a big data problem

Page 22: When small problems become big problems

Customers desire real time search

• need to centralize and index logs

• using ElasticSearch can avoid service fees or license fees

• with a custom logging plugin, we can redirect output to the cluster

Page 23: When small problems become big problems

Logging is always a big problem

• Clusters can fail for reasons beyond servers deployed

• API design for logging is different

• What happens if your disk fails or your cluster fails?

• What happens when you replace a worker?
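
One possible answer to the "cluster fails" question, as a minimal Python sketch. The `send` callable stands in for whatever ships a line to the log cluster; it and the `ClusterHandler` name are hypothetical, and this is not the custom logging plugin mentioned in the talk.

```python
import logging
from collections import deque


class ClusterHandler(logging.Handler):
    """Ships formatted records via `send` (a hypothetical callable) and
    buffers them in memory while the cluster is unreachable."""

    def __init__(self, send, max_buffered=1000):
        super().__init__()
        self.send = send
        # Bounded buffer: when full, the oldest lines are dropped first.
        self.buffer = deque(maxlen=max_buffered)

    def emit(self, record):
        self.buffer.append(self.format(record))
        self.flush_buffer()

    def flush_buffer(self):
        while self.buffer:
            try:
                self.send(self.buffer[0])
            except Exception:
                return  # cluster unreachable: keep the lines, never crash the app
            self.buffer.popleft()
```

The buffer lives in the worker's memory, so it does not answer the last question on the slide: replacing a worker still loses whatever had not been shipped.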

Page 24: When small problems become big problems

Real men test in production

Page 25: When small problems become big problems

Testability is crucial

• each dependency needs to be testable and mockable

• devs need a local environment that matches, or your test cases will suffer

• creation of new tenants means more money.. test it!
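
A small Python sketch of what "testable and mockable" can look like for tenant creation. The `TenantService` class and the `directory` and `billing` client interfaces are invented for the example; only `unittest.mock` is real.

```python
from unittest import mock


class TenantService:
    """Creates tenants through injected clients so tests can swap them out."""

    def __init__(self, directory, billing):
        self.directory = directory  # e.g. an LDAP client (hypothetical interface)
        self.billing = billing      # e.g. a billing client (hypothetical interface)

    def create_tenant(self, name):
        account_id = self.directory.create_account(name)
        try:
            self.billing.open_subscription(account_id)
        except Exception:
            # Roll back so a billing outage does not leave an orphaned account.
            self.directory.delete_account(account_id)
            raise
        return account_id


def test_rolls_back_when_billing_fails():
    directory = mock.Mock()
    directory.create_account.return_value = "acct-1"
    billing = mock.Mock()
    billing.open_subscription.side_effect = RuntimeError("billing down")
    service = TenantService(directory, billing)
    try:
        service.create_tenant("acme")
    except RuntimeError:
        pass
    else:
        raise AssertionError("expected the billing failure to propagate")
    directory.delete_account.assert_called_once_with("acct-1")
```

Because both dependencies are passed in, the failure path can be exercised locally without a real directory or billing sandbox.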

Page 26: When small problems become big problems

Platform testing is really hard

• Some external deps don’t have sandboxes

• Can you try 500 applications?

• Can you maintain a quiet production “neighborhood” while testing QA?

Page 27: When small problems become big problems

When security updates = vi ipsec.conf in a for loop

Page 28: When small problems become big problems

Security in a public service is hard

• assume user is infinitely clever and malicious

• deny by default vs service simplicity

• maintain segregation and availability of tenants

• Asset value can vary widely across tenants

Page 29: When small problems become big problems

Security design touches everything

• ipsec is hard to maintain without proper CM, and wasn’t built for noisy networks

• deny by default means higher maintenance, and not all products support it

• it is easy to violate tenancy segregation in a platform

• you may have to hire consultants
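
Deny by default, reduced to a toy Python sketch. The `Allow` rule shape and the `Policy` class are assumptions for illustration; real enforcement would live in the network or platform layer, not in application code like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allow:
    """One explicit permission: source tenant may reach dest tenant on a port."""
    source_tenant: str
    dest_tenant: str
    port: int


class Policy:
    def __init__(self, rules=()):
        self._rules = frozenset(rules)

    def permits(self, source_tenant, dest_tenant, port):
        # No matching rule means no access; there is no "allow all" fallback.
        return Allow(source_tenant, dest_tenant, port) in self._rules


policy = Policy([Allow("acme", "acme", 8081)])
print(policy.permits("acme", "acme", 8081))    # explicitly allowed
print(policy.permits("globex", "acme", 8081))  # cross-tenant, no rule: denied
```

The maintenance cost mentioned above shows up as the rule list: every legitimate new path has to be added by someone before it works.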

Page 30: When small problems become big problems

When your management service goes haywire

Page 31: When small problems become big problems

automation automation automation

• myriad of technology to automate scaling and availability

• policies can be fine tuned to relaunch or scale out based on system feedback or api

Page 32: When small problems become big problems

What about network splits?

• Will your management server “heal” something that is already around?

• Is your management server on the same failure plane as your managed servers?

• Will you end up with manual intervention controls (aka red button)?
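
One way to guard against "healing" a worker that is still alive, sketched in Python. The observer reports, the majority rule, and the `decide` function are all assumptions made for this example rather than anything described in the talk.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    RELAUNCH = "relaunch"
    ESCALATE = "escalate"  # hand off to a human (the "red button")


def decide(reports, healing_enabled=True):
    """Decide what to do about one worker.

    `reports` maps observer name -> True (worker reachable),
    False (worker unreachable), or None (observer did not answer).
    """
    answered = [r for r in reports.values() if r is not None]
    if len(answered) * 2 <= len(reports):
        # Too few observers answered: the manager itself may be cut off.
        return Action.ESCALATE
    if any(answered):
        # Someone can still see the worker: do not replace a live one.
        return Action.NONE
    return Action.RELAUNCH if healing_enabled else Action.ESCALATE
```

The observers only help if they sit on different failure planes from the management server; otherwise they all go silent together.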

Page 33: When small problems become big problems

When your api design haunts you

Page 34: When small problems become big problems

Put an API on everything

• Allows automation and guis besides what you’ve invented

• simplifies testing

• eat your own dogfood

Page 35: When small problems become big problems

Design redo is a big problem

• GUIs can change more easily, since humans drive them

• Maintaining old apis may not be worth it

• People may depend on bugs or semantic gaps

• Version practices in REST are not uniform

• Remember: understanding the state machine is a prerequisite for HATEOAS
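
The HATEOAS point can be illustrated with a small Python sketch: once the state machine is written down, the links a resource advertises fall out of it. The states, actions, and `/apps/...` URL layout here are hypothetical, not CloudHub's API.

```python
# Hypothetical application lifecycle: state -> {action: next state}.
TRANSITIONS = {
    "undeployed": {"deploy": "started"},
    "started": {"stop": "stopped", "undeploy": "undeployed"},
    "stopped": {"start": "started", "undeploy": "undeployed"},
}


def represent(app_id, state):
    """Build a representation whose links list only the legal next actions."""
    links = {"self": f"/apps/{app_id}"}
    for action in TRANSITIONS[state]:
        links[action] = f"/apps/{app_id}/{action}"
    return {"id": app_id, "state": state, "links": links}


def apply(state, action):
    """Return the next state, or raise ValueError for an illegal transition."""
    try:
        return TRANSITIONS[state][action]
    except KeyError:
        raise ValueError(f"cannot {action!r} while {state!r}") from None
```

A client that follows the advertised links never has to hard-code which actions are valid in which state, which is only possible if the server side agreed on the state machine first.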

Page 36: When small problems become big problems

When 5 retries becomes a DDoS attack

Page 37: When small problems become big problems

We want to build resilient apps

• recovery is a part of the service you provide, more important as you go up in value chain

• connections should assume failure and be able to reconnect to dependencies

• recovery is non-trivial

Page 38: When small problems become big problems

5 retries is code smell

• things that back up or fail can get worse with naive error retry loops

• APIs often can be made to include data about when to retry or that you need to slow down

• Treat resilience as a requirement, not a feature
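
A hedged Python sketch of a less naive retry loop: exponential backoff with jitter that also honors a "slow down" hint from the service. The `Overloaded` exception and its `retry_after` field are assumptions standing in for whatever signal a real API provides (for example an HTTP Retry-After header).

```python
import random
import time


class Overloaded(Exception):
    """Raised by a hypothetical client; retry_after is the server's hint in seconds."""

    def __init__(self, retry_after=None):
        super().__init__("service overloaded")
        self.retry_after = retry_after


def call_with_backoff(operation, max_attempts=5, base=0.5, cap=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call `operation`, retrying on Overloaded with capped, jittered backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Overloaded as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            backoff = min(cap, base * 2 ** attempt)
            delay = backoff * rng()  # jitter spreads out simultaneous retries
            if err.retry_after is not None:
                delay = max(delay, err.retry_after)  # never retry sooner than asked
            sleep(delay)
```

The `sleep` and `rng` parameters exist only so the behaviour can be tested without real waiting. Five immediate retries from every client multiply the load on a struggling dependency; spreading and delaying them gives it room to recover.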

Page 39: When small problems become big problems

When your users ask the same questions

Page 40: When small problems become big problems

Wrong words suck

• Some terms seem sensible in design discussions, but the public uses something else

• Changing requires retraining, and thorough doc review

• What goes online lingers

Page 41: When small problems become big problems

When a feature request implies new architecture

Page 42: When small problems become big problems

Platform changes

• Customers are looking for service, not explanations of why it is hard

• Adding value implies tough decisions on new features

• As the world turns, expectations rise

• Know your customer

Page 43: When small problems become big problems

Real-time, full-text search, streaming.. oh my!

• Not all databases support full-text search, especially with partitioning

• Some data is better stored in S3; how does that affect indexing strategy?

• Real-time tools are emerging but immature

Page 44: When small problems become big problems

When you end up with a “lock” table in mongo

Page 45: When small problems become big problems

Datastore diversity!

• NoSQL datastores like Mongo are attractive and energize developers

• Cloud provisioners like RDS-driven MySQL are also attractive

• Specialized stores like CloudWatch for statistics

Page 46: When small problems become big problems

Don’t expect mongo to do magic

• Database Engines Mature

• Consistent backups are tricky and only recently supported

• Data Ops and visualization tools are emerging

• There are type safe bridges like Morphia

Page 47: When small problems become big problems

Hammers and screwdrivers

• In a pinch, you can knock in a screw with a hammer, but you can’t screw in a nail with a screwdriver

• Don’t throw data into whatever store happens to be easy to grab, even if you can.

• Rechecking data assumptions at T1 is better than T3. At T6, you may have a disaster

Page 48: When small problems become big problems

Summary

Page 49: When small problems become big problems

When developing a multi-tenant platform

• Own your dependencies or they will own you

• Add time for entropy

• Repeatedly remind yourself you are a landlord

Page 50: When small problems become big problems

Architecture as iterative development

• Forethought

• Critical debate

• Decision review

Page 51: When small problems become big problems

‣ @adrianfcole

[email protected]

‣ www.cloudhub.io