Page 1

How We Scaled Freshdesk to Handle 150M Requests/Week

Kiran Darisi, Director, Technical Operations at Freshdesk

Page 2

Our customer base grew by 400% and the number of requests per week boomed from 10 to 65 million in a year (2013).

Page 3

Cool for a 3-year-old startup? Not from an engineering perspective.

Page 4

We used a bunch of methods to scale vertically in a really short amount of time.

Sure, we eventually had to shard our databases, but some of these techniques helped us stay afloat for quite a while.

Page 5

MOORE’S WAY

Increasing the RAM, CPU and I/O

We upgraded from an Amazon EC2 First Generation Medium instance to a High-Memory Quadruple Extra Large instance (taking our RAM from 3.75 GB to 64 GB).

But the amount of RAM and CPU cycles we added did not correlate with the workload we got out of the instance, so we stayed put at 64 GB.

Page 6

THE READ/WRITE SPLIT

Using MySQL replication and distributing the reads between master and slave

The read/write split increased the number of I/Os performed on our databases, but it didn’t do much for write performance.

We marked dedicated roles for each slave, because using a round-robin algorithm to select different slaves for different queries proved ineffective.
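
The deck doesn’t show how the split was wired up. Below is a minimal sketch of the same idea using the multi-database support built into modern Rails (Rails 6+), which didn’t exist at the time, so the model name and the database.yml entries (primary, primary_replica) are illustrative, not Freshdesk’s actual setup.

```ruby
# app/models/application_record.rb
# Assumes database.yml defines a writer ("primary") and a MySQL slave
# ("primary_replica", marked replica: true).
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  # Writes go to the master, reads can go to the slave.
  connects_to database: { writing: :primary, reading: :primary_replica }
end

# Heavy read-only work (reports, list views) can be pinned to the slave
# explicitly -- the "dedicated role per slave" idea from the slide:
ActiveRecord::Base.connected_to(role: :reading) do
  Ticket.where(status: "open").count   # Ticket is an illustrative model
end
```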

Page 7

MYSQL PARTITIONING

Using the MySQL 5 built-in partitioning capability

We chose the partition key and the number of partitions, and the table was partitioned automatically.

Post-partitioning, our read performance increased dramatically but, again, write performance was a problem.

Page 8

Things to keep in mind while performing MySQL partitioning (a sketch follows this list)

1. Choose the partition key carefully, or alter the current schema to follow the MySQL partitioning rules.

2. The number of partitions you start with directly affects the I/O operations on the disk.

3. If you use a hash-based algorithm with hash-based keys, you cannot control which tenant goes where. This means you’ll be in trouble if two or more noisy customers fall within the same partition.

4. Make sure that every query contains the MySQL partition key. A query without the partition key ends up scanning all the partitions, and performance is sure to take a dive.
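
As a concrete illustration of the rules above, here is a sketch of a partitioned table. The table name, columns, and partition count are made up, and the DDL is wrapped in a Rails migration only for convenience.

```ruby
class PartitionTickets < ActiveRecord::Migration[5.2]
  def up
    execute <<~SQL
      CREATE TABLE tickets (
        id         BIGINT NOT NULL AUTO_INCREMENT,
        account_id BIGINT NOT NULL,            -- the partition key (rule 1)
        subject    VARCHAR(255),
        created_at DATETIME,
        -- MySQL requires the partition key to be part of every unique key,
        -- hence the composite primary key.
        PRIMARY KEY (id, account_id)
      ) ENGINE=InnoDB
      PARTITION BY KEY (account_id)   -- hash-style placement; no control over
                                      -- which tenant lands where (rule 3)
      PARTITIONS 64;                  -- fixed up front; affects disk I/O (rule 2)
    SQL
  end
end

# Rule 4: always include the partition key so MySQL prunes partitions instead
# of scanning all of them (Ticket and current_account are illustrative).
Ticket.where(account_id: current_account.id, status: "open")
```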

Page 9

CACHING

Caching objects that rarely change in their lifetime

We cached ActiveRecord objects as well as HTML partials (bits and pieces of HTML) using Memcached.

We chose Memcached because it scales well with multiple clusters. The Memcached client you use makes a lot of difference to response time, so we went with dalli.
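
A minimal sketch of the two kinds of caching mentioned, using dalli as the deck says; the server names, cache keys, and expiry times are placeholders.

```ruby
# Gemfile
gem "dalli"

# config/environments/production.rb -- point Rails.cache at Memcached via dalli.
config.cache_store = :mem_cache_store,
                     "cache-1.internal:11211", "cache-2.internal:11211",
                     { compress: true, expires_in: 1.hour }

# Caching an ActiveRecord object that rarely changes in its lifetime:
account = Rails.cache.fetch("account/#{account_id}", expires_in: 12.hours) do
  Account.find(account_id)
end

# Caching an HTML partial (fragment caching) in a view:
# <% cache ["ticket_sidebar", @ticket] do %>
#   <%= render "tickets/sidebar", ticket: @ticket %>
# <% end %>
```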

Page 10

DISTRIBUTED FUNCTIONS

Keeping response time low by using different storage engines for different purposes

We started using Amazon RedShift for analytics and data mining, and Redis to store state information and background jobs for Resque.

But because Redis can’t scale or fall back, we don’t use it for atomic operations.
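
For the Resque part, a minimal sketch of pointing background jobs at Redis; the hostname and the example job are illustrative, not Freshdesk’s actual code.

```ruby
# config/initializers/resque.rb -- Resque keeps its queues and job state in Redis.
require "redis"
require "resque"

Resque.redis = Redis.new(host: "redis.internal", port: 6379)

# A job that a web request enqueues instead of doing the work inline.
class SendTicketNotification
  @queue = :notifications

  def self.perform(ticket_id)
    # deliver the notification for ticket_id ...
  end
end

Resque.enqueue(SendTicketNotification, 42)
```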

Page 11

But scaling vertically can only get you so far.

We decided that scaling horizontally by sharding was the only cost-effective way to increase write scalability beyond the instance size.

Page 12

Two main concerns we had before we took the final call on sharding:

1. No distributed transactions – We wanted all tenant details to be in one shard.

2. Rebalancing the shards should be easy – We wanted control over which tenant sits in which shard and to be able to move them around when needed.

A little research showed us that directory-based sharding was the only way to go.

Page 13

REASONS FOR CHOOSING DIRECTORY-BASED SHARDING

It is simpler than hash key-based or range-based sharding.

Rebalancing shards is easier here than in other methods.

Page 14

A typical directory entry looks like this:

tenant_info          shard_details    shard_status
Stark Industries     shard1           Read & Write

• tenant_info - unique key referring to the DB entry

• shard_details - shard in which that tenant exists

• shard_status - the kind of activity the tenant is ready for (we have multiple shard statuses, like Not Ready, Read Only, Read & Write, etc.)

Page 15

How directory lookups work

The API wrapper is tuned to accept tenant information in multiple forms, like the tenant URL, tenant ID, etc.

The sharding API even acts as a unique ID generator, so that the tenant IDs it generates are unique across shards.
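
The deck doesn’t show the sharding API itself, so the following is only a hypothetical sketch of a directory lookup: a small central table with the three columns shown earlier, plus an assumed tenant_url column, used to resolve a tenant and switch the ActiveRecord connection to its shard.

```ruby
# Hypothetical directory model living in a small central database.
class ShardMapping < ActiveRecord::Base
  NOT_READY  = "Not Ready"
  READ_ONLY  = "Read Only"
  READ_WRITE = "Read & Write"

  # Accept tenant information in multiple forms (tenant ID or tenant URL).
  def self.lookup(tenant)
    tenant.is_a?(Integer) ? find_by(tenant_info: tenant) : find_by(tenant_url: tenant)
  end
end

mapping = ShardMapping.lookup("wayneenterprises.freshdesk.com")
raise "tenant not ready" if mapping.nil? || mapping.shard_status == ShardMapping::NOT_READY

# Switch the connection to the tenant's shard for this request;
# shard1, shard2, ... would be entries in database.yml.
ActiveRecord::Base.establish_connection(mapping.shard_details.to_sym)
```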

Page 16

Why we care about rebalancing

Sometimes a customer grows from processing 1,000 tickets per day to 10,000 tickets per day, which affects the performance of the whole shard.

We can’t solve this by splitting the customer’s data across multiple shards, because we didn’t want the mess of distributed transactions.

So, in these cases, we’d move the noisy customer to a shard of their own. That way, everybody wins.

Page 17

Steps to Rebalance a Shard

Page 18

1. Every shard will have its own slave to scale the reads. For example, say Wayne Enterprises and Stark Industries are both in shard1. The directory entries look like this:

Wayne Enterprises    shard1    Read & Write
Stark Industries     shard1    Read & Write

Page 19

2. If Wayne Enterprises grows at a breakneck pace, we would decide to move it to another shard (averting the danger of Bruce Wayne and Tony Stark being mad at us at the same time).

Page 20

3. So we would boot up a new slave of shard1 and call it shard2. Then, we’d attach a read replica to the new slave and wait for it to sync with the master.

Page 21

4. We would then stop the writes for Wayne Enterprises by changing its shard status in the directory:

Wayne Enterprises    shard1    Read Only
Stark Industries     shard1    Read & Write

Page 22

5. Then we would stop the replication of master data into shard2 and promote shard2 to master. Now the directory entry is updated accordingly:

Wayne Enterprises    shard2    Read & Write
Stark Industries     shard1    Read & Write

Page 23

6. This effectively moves Wayne Enterprises to its own shard. Batman is happy, and so is Iron Man.
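
Steps 4-6 compressed into one hypothetical helper, reusing the ShardMapping sketch from earlier; promote_replica_to_master stands in for whatever ops tooling actually breaks replication and promotes the new master, which the deck doesn’t describe.

```ruby
# Hypothetical rebalance routine following the steps above. Assumes shard2 has
# already been booted as a slave of shard1 and has caught up (steps 1-3).
def move_tenant(tenant_info, new_shard:)
  mapping = ShardMapping.find_by!(tenant_info: tenant_info)

  # Step 4: stop writes for the tenant while it still points at the old shard.
  mapping.update!(shard_status: ShardMapping::READ_ONLY)

  # Step 5: break replication and promote the new shard to a master of its own.
  promote_replica_to_master(new_shard)   # assumed ops hook (e.g. an RDS promote)

  # Step 6: point the directory at the new shard and re-enable writes.
  mapping.update!(shard_details: new_shard, shard_status: ShardMapping::READ_WRITE)
end

move_tenant(wayne_enterprises_id, new_shard: "shard2")  # illustrative tenant ID
```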

Page 24

Word of caution

1. Don’t shard unless it’s absolutely necessary. You will have to rewrite code for your whole app, and maintain it.

2. You could use functional partitioning (moving an oversized table to another DB altogether) to avoid sharding entirely, if writes are not a problem.

3. Choosing the right sharding algorithm is a bit tricky, as each one has its own benefits and drawbacks. Make a thorough study of all your requirements before picking one.

4. You will have to take care of unique ID generation across shards (a sketch of one approach follows this list).
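
On point 4, one common approach (not necessarily what Freshdesk did) is to have the central directory hand out IDs, so rows created on different shards can never collide. A hypothetical sketch using a MySQL sequence table:

```ruby
# Hypothetical central ID generator living next to the directory database.
# Backing table: global_sequences(name VARCHAR PRIMARY KEY, value BIGINT).
class GlobalSequence < ActiveRecord::Base
  # Atomically bump and read the counter using MySQL's LAST_INSERT_ID(expr).
  def self.next_id(name = "tenants")
    connection.execute(
      "UPDATE global_sequences SET value = LAST_INSERT_ID(value + 1) " \
      "WHERE name = #{connection.quote(name)}"
    )
    connection.select_value("SELECT LAST_INSERT_ID()").to_i
  end
end

new_tenant_id = GlobalSequence.next_id("tenants")  # unique across all shards
```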

Page 25

What’s next for Freshdesk

We now get 250,000 tickets across Freshdesk every day, and 100 million queries during the same time (with a peak of 3-4K QPS). We have a separate shard for all new signups, and each shard can roughly carry 20,000 tenants.

In the future, we’d like to explore a multi-pod architecture and also look at a proxy architecture using MySQL Fabric, Scalebase, etc.

Page 26

“Behind every slideshare is a great blogpost”

Read more about scaling Freshdesk here: http://blog.freshdesk.com/how-freshdesk-scaled-using-sharding/