8/14/2019 App Engine Google
Agenda
Using the Python runtime effectively
Numbers everyone should know
Tools for storing and scaling large data sets
Example: Distributed counters
Example: A blog
Prevent repeated, wasteful work
Loading Python modules on every request can be slow
Reuse main() to address this:
    import wsgiref.handlers

    def main():
        wsgiref.handlers.CGIHandler().run(my_app)

    if __name__ == "__main__":
        main()
Lazy-load big modules to reduce the "warm-up" cost:
    def my_expensive_operation():
        # Imported on first call only, not at module load time
        import big_module
        big_module.do_work()
Take advantage of "preloaded" modules
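A minimal sketch of the lazy-loading idea, runnable outside App Engine. The standard-library `statistics` module stands in for a large module here; that choice is an assumption made for illustration only.

```python
import sys

def my_expensive_operation(values):
    # The import runs only when this function is first called. Later calls
    # find the module already cached in sys.modules and skip the load.
    import statistics  # stand-in for a large module
    return statistics.mean(values)

result = my_expensive_operation([1, 2, 3, 4])
loaded = "statistics" in sys.modules
```

Requests that never reach this function never pay for the import.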
Prevent repeated, wasteful work 2
Avoid large result sets
  In-memory sorting and filtering can be slow
  Make the Datastore work for you
Avoid repeated queries
  Landing pages that use the same query for everyone
  Incoherent caching
  Use memcache for a consistent view:
    results = memcache.get('main_results')
    if results is None:
        results = db.GqlQuery('...').fetch(10)
        memcache.add('main_results', results, 60)
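The same get-then-add pattern can be tried without App Engine. In this sketch `FakeCache` and `run_query` are hypothetical stand-ins for memcache and the Datastore query; only the shape of `get_main_results` follows the slide.

```python
import time

class FakeCache(object):
    """Tiny in-process stand-in for memcache (hypothetical, illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.time() >= expires_at:
            del self._data[key]
            return None
        return value

    def add(self, key, value, ttl_seconds):
        # Like memcache.add: store only if the key is not already present.
        if self.get(key) is not None:
            return False
        self._data[key] = (value, time.time() + ttl_seconds)
        return True

cache = FakeCache()
query_calls = []

def run_query():
    # Stand-in for the expensive Datastore query.
    query_calls.append(1)
    return ["post-3", "post-2", "post-1"]

def get_main_results():
    results = cache.get("main_results")
    if results is None:
        results = run_query()
        cache.add("main_results", results, 60)
    return results

first = get_main_results()
second = get_main_results()
```

The second call is served from the cache, so the query runs once per 60-second window rather than once per request.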
Numbers everyone should know
Writes are expensive!
  Datastore is transactional: writes require disk access
  Disk access means disk seeks
Rule of thumb: 10ms for a disk seek
Simple math:
  1s / 10ms = 100 seeks/sec maximum
Depends on:
  The size and shape of your data
  Doing work in batches (batch puts and gets)
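The write arithmetic as a small sketch. The 10ms seek time is the slide's rule of thumb. The batching model (one seek per batch, whatever its size) is a simplifying assumption added here and not a measured Datastore property.

```python
SEEK_MS = 10  # rule of thumb: 10 ms per disk seek

def max_seeks_per_second(seek_ms=SEEK_MS):
    return 1000 // seek_ms

def max_entities_per_second(batch_size, seek_ms=SEEK_MS):
    # Assumption (not from the slides): one batch put costs about one seek.
    return batch_size * max_seeks_per_second(seek_ms)

single_writes = max_seeks_per_second()
batched_writes = max_entities_per_second(batch_size=20)
```

Under that assumption, batches of 20 raise the ceiling from 100 to 2000 entities per second.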
Numbers everyone should know 2
Reads are cheap!
  Reads do not need to be transactional, just consistent
  Data is read from disk once, then it's easily cached
  All subsequent reads come straight from memory
Rule of thumb: 250usec for 1MB of data from memory
Simple math:
  1s / 250usec = 4GB/sec maximum
  For a 1MB entity, that's 4000 fetches/sec
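The read arithmetic, spelled out. The only input is the slide's rule of thumb of 250usec per 1MB, and the function name is illustrative.

```python
READ_USEC_PER_MB = 250  # rule of thumb: 250 microseconds per 1 MB from memory

# One second is 1,000,000 microseconds.
mb_per_second = 1000000 // READ_USEC_PER_MB

def fetches_per_second(entity_mb):
    # Upper bound on fetches for entities of the given size in MB.
    return mb_per_second / entity_mb

one_mb_fetches = fetches_per_second(1)
half_mb_fetches = fetches_per_second(0.5)
```

4000 MB/sec is the slide's 4GB/sec. Against the 100 seeks/sec write ceiling, that is a 40x gap for 1MB entities.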
Tools for storing data
Tools for storing data: Entities
Fundamental storage type in App Engine
  Schemaless
  Set of property name/value pairs
  Most properties indexed and efficient to query
  Other large properties not indexed (Blobs, Text)
Think of it as an object store, not relational
  Kinds are like classes
  Entities are like object instances
Relationship between Entities using Keys
  Reference properties
  One to many, many to many
Tools for storing data: Keys
Key corresponds to the Bigtable row for an Entity
  Bigtable accessible as a distributed hashtable
  Get() by Key: Very fast! No scanning, just copying data
Limitations:
  Only one ID or key_name per Entity
  Cannot change ID or key_name later
  500 bytes maximum
Tools for storing data: Transactions
ACID transactions
  Atomicity, Consistency, Isolation, Durability
No queries in transactions
  Transactional read and write with Get() and Put()
Common practice
  Query, find what you need
  Transact with Get() and Put()
How to provide a consistent view in queries?
Tools for storing data: Entity groups
Closely related Entities can form an Entity group
  Stored logically/physically close to each other
Define your transactionality
  RDBMS: Row and table locking
  Datastore: Transactions across a single Entity group
"Locking" one Entity in a group locks them all
  Serialized writes to the whole group (in transactions)
  Not a traditional lock; writers attempt to complete in parallel
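One way to picture "not a traditional lock" is optimistic concurrency: each writer records the group's version when it starts, and its commit fails if another writer committed in between. The sketch below is a simplified in-memory model under that assumption. It is not the Datastore's implementation, and every name in it is hypothetical.

```python
class Conflict(Exception):
    pass

class EntityGroup(object):
    def __init__(self):
        self.version = 0   # bumped on every successful commit to the group
        self.data = {}

def commit(group, read_version, changes):
    # Any commit to the group since we read invalidates this transaction.
    if group.version != read_version:
        raise Conflict()
    group.data.update(changes)
    group.version += 1

def run_in_transaction(group, fn, retries=3):
    for attempt in range(retries):
        read_version = group.version
        changes = fn(dict(group.data))
        try:
            commit(group, read_version, changes)
            return attempt + 1   # number of attempts it took
        except Conflict:
            continue
    raise Conflict()

group = EntityGroup()
interfered = []

def add_one(snapshot):
    if not interfered:
        # Simulate a competing writer that commits first, exactly once.
        interfered.append(True)
        commit(group, group.version, {"count": 10})
    return {"count": snapshot.get("count", 0) + 1}

attempts = run_in_transaction(group, add_one)
```

The first attempt loses to the competing writer and is retried against fresh data. The more writers share a group, the more often that retry happens.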
Tools for storing data: Entity groups 2
Hierarchical
  Each Entity may have a parent
  A "root" node defines an Entity group
  Hierarchy of child Entities can go many levels deep
  Watch out! Serialized writes for all children of the root
Datastore scales wide
  Each Entity group has serialized writes
  No limit to the number of Entity groups to use in parallel
  Think of it as many independent hierarchies of data
Tools for storing data: Entity groups 3
Entity groups all transacting in parallel:
[Figure: four separate Entity groups, each a Root with a Child; Txn 1, Txn 2, Txn 3, and Txn 4 run in parallel]
Tools for storing data: Entity groups 4
Pitfalls
  Large Entity groups = high contention = failed transactions
  Not thinking about write throughput is bad
  Structure your data to match your usage patterns
Good news
  Query across entity groups without serialized access!
  Consistent view across all entity groups
  No partial commits visible
  All Entities in a group are the latest committed version
Example: Counters
Counters
Using Model.count()
  Bigtable doesn't know counts by design
  O(N); cannot be O(1); must scan every Entity row!
Use an Entity with a count property:
    class Counter(db.Model):
        count = db.IntegerProperty()
Frequent updates = high contention!
  Transactional writes are serialized and too slow
  Fundamental limitation of distributed systems
Counters: Before and after
[Figure: Single: one Counter entity. Sharded: several Counter entities.]
Counters: Sharded
Shard counters into multiple Entity groups
  Pick an Entity at random and update it transactionally
  Combine sharded Entities together on reads
"Contention" reduced to 1/N with N shards
Sharding factor can be changed with little difficulty
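The sharding idea can be shown in plain Python before bringing in the Datastore. This is an in-memory sketch: `ShardedCounter` is a hypothetical class, and each list slot stands in for one Counter entity in its own Entity group.

```python
import random

class ShardedCounter(object):
    def __init__(self, num_shards):
        # One slot per shard; each would be a separate Entity group.
        self.shards = [0] * num_shards

    def increment(self, rng=random):
        # Pick a shard at random so writers rarely hit the same one.
        index = rng.randint(0, len(self.shards) - 1)
        self.shards[index] += 1
        return index

    def get_count(self):
        # Reads combine all shards.
        return sum(self.shards)

counter = ShardedCounter(num_shards=4)
rng = random.Random(0)   # seeded only to make the example repeatable
for _ in range(1000):
    counter.increment(rng)
total = counter.get_count()
```

Each shard receives roughly a quarter of the writes, while the total stays exact.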
Counters: Models
    class CounterConfig(Model):
        name = StringProperty(required=True)
        num_shards = IntegerProperty(required=True, default=1)

    class Counter(Model):
        name = StringProperty(required=True)
        count = IntegerProperty(required=True, default=0)
Counters: Get the count
    def get_count(name):
        total = 0
        for counter in Counter.gql('WHERE name = :1', name):
            total += counter.count
        return total
Counters: Increment the count
    import random
    from google.appengine.ext import db

    def increment(name):
        config = CounterConfig.get_or_insert(name, name=name)
        def txn():
            # Pick one shard at random and update only that Entity
            index = random.randint(0, config.num_shards - 1)
            shard_name = name + str(index)
            counter = Counter.get_by_key_name(shard_name)
            if counter is None:
                counter = Counter(key_name=shard_name, name=name)
            counter.count += 1
            counter.put()
        db.run_in_transaction(txn)
Counters: Cache reads
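    from google.appengine.api import memcache

    def get_count(name):
        total = memcache.get(name)
        if total is None:
            total = 0
            for counter in Counter.gql('WHERE name = :1', name):
                total += counter.count
            memcache.add(name, str(total), 60)
        # The cached value is a string, so convert before returning
        return int(total)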
Counters: Cache writes
    def increment(name):
        config = CounterConfig.get_or_insert(name, name=name)
        def txn():
            index = random.randint(0, config.num_shards - 1)
            shard_name = name + str(index)
            counter = Counter.get_by_key_name(shard_name)
            if counter is None:
                counter = Counter(key_name=shard_name, name=name)
            counter.count += 1
            counter.put()
        db.run_in_transaction(txn)
        memcache.incr(name)
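How the cached read and the cached write fit together can be traced with two dictionaries. This is a sketch under assumptions: `shards` and `cache` are hypothetical stand-ins for the Counter entities and memcache, the cache never expires, and `cache_incr` models an increment that does nothing when the key is missing.

```python
shards = {}   # (name, shard index) -> count, stand-in for Counter entities
cache = {}    # name -> count stored as a string, stand-in for memcache

def cache_incr(key):
    # Incrementing a key that is not cached is a no-op in this model.
    if key in cache:
        cache[key] = str(int(cache[key]) + 1)

def increment(name, shard_index):
    key = (name, shard_index)
    shards[key] = shards.get(key, 0) + 1
    cache_incr(name)

def get_count(name):
    total = cache.get(name)
    if total is None:
        # Cache miss: combine all shards for this counter, then cache the sum.
        total = sum(v for (n, _), v in shards.items() if n == name)
        cache[name] = str(total)
    return int(total)

increment("hits", 0)
increment("hits", 1)          # nothing is cached yet, so no cache update
count_after_two = get_count("hits")
increment("hits", 0)          # cached total is bumped alongside the shard
count_after_three = get_count("hits")
```

Because the cached total is bumped on every write, a read between expirations does not lag behind the shards.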
Example: Building a Blog
Building a Blog
Standard blog
  Multiple blog posts
  Each post has comments
Efficient paging without using queries with offsets
  Remember, Bigtable doesn't know counts!
Building a Blog: Blog entries
Blog entries with an index
  Having an index establishes a rigid ordering
  Index enables efficient paging
This is a global counter, but it's okay
  Low write throughput of overall posts = no contention
Building a Blog: Models
    # Named BlogIndex to match the code that posts and fetches entries
    class BlogIndex(db.Model):
        max_index = db.IntegerProperty(required=True, default=0)

    class BlogEntry(db.Model):
        index = db.IntegerProperty(required=True)
        title = db.StringProperty(required=True)
        body = db.TextProperty(required=True)
Building a Blog: Posting an entry
    def post_entry(blogname, title, body):
        def txn():
            blog_index = BlogIndex.get_by_key_name(blogname)
            if blog_index is None:
                blog_index = BlogIndex(key_name=blogname)
            new_index = blog_index.max_index
            blog_index.max_index += 1
            blog_index.put()
            new_entry = BlogEntry(
                key_name=blogname + str(new_index),
                parent=blog_index, index=new_index,
                title=title, body=body)
            new_entry.put()
        db.run_in_transaction(txn)
Building a Blog: Posting an entry 2
Hierarchy of Entities:
  BlogIndex
    Entry
Building a Blog: Getting one entry
    def get_entry(blogname, index):
        # The key name goes first; a positional argument cannot follow parent=
        entry = BlogEntry.get_by_key_name(
            blogname + str(index),
            parent=Key.from_path('BlogIndex', blogname))
        return entry
That's it! Super fast!
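The reason a get by key name needs no query shows up in a small in-memory model. The dictionaries below are hypothetical stand-ins for the BlogIndex and BlogEntry entities, and transactions are left out of this sketch.

```python
blog_indexes = {}   # blogname -> next index, stand-in for BlogIndex.max_index
entries = {}        # key name -> entry, stand-in for BlogEntry entities

def post_entry(blogname, title, body):
    # Allocate the next index and derive the entry's key name from it.
    new_index = blog_indexes.get(blogname, 0)
    blog_indexes[blogname] = new_index + 1
    key_name = blogname + str(new_index)
    entries[key_name] = {"index": new_index, "title": title, "body": body}
    return new_index

def get_entry(blogname, index):
    # The key name is computable from the arguments, so this is a direct
    # lookup with no scan over other entries.
    return entries.get(blogname + str(index))

first_index = post_entry("myblog", "First", "Hello")
second_index = post_entry("myblog", "Second", "Hello again")
found = get_entry("myblog", 1)
missing = get_entry("myblog", 5)
```

Because the key name is derived from the blog name and index, the reader never has to search for an entry.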
Building a Blog: Paging
    def get_entries(start_index):
        extra = None
        if start_index is None:
            entries = BlogEntry.gql(
                'ORDER BY index DESC').fetch(POSTS_PER_PAGE + 1)
        else:
            start_index = int(start_index)
            entries = BlogEntry.gql(
                'WHERE index
[The rest of this function is truncated in the source.]
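The visible part of the slide fetches POSTS_PER_PAGE + 1 entries and keeps an `extra` variable, which suggests the fetch-one-extra paging technique. The sketch below illustrates that technique over a plain list of indexes. It is not a reconstruction of the truncated code: the `<=` filter and everything after the fetch are assumptions.

```python
POSTS_PER_PAGE = 3

def get_entries(all_indexes, start_index=None):
    # Newest first, like ORDER BY index DESC.
    ordered = sorted(all_indexes, reverse=True)
    if start_index is not None:
        # Assumed filter: resume at start_index and continue downward.
        ordered = [i for i in ordered if i <= int(start_index)]
    # Fetch one more than a page to learn whether a next page exists.
    page = ordered[:POSTS_PER_PAGE + 1]
    extra = None
    if len(page) > POSTS_PER_PAGE:
        extra = page[-1]              # start index of the next page
        page = page[:POSTS_PER_PAGE]
    return page, extra

indexes = list(range(8))              # entries with index 0..7
page1, next1 = get_entries(indexes)
page2, next2 = get_entries(indexes, next1)
page3, next3 = get_entries(indexes, next2)
```

Each page is located by index value, so no offset and no count are needed. An `extra` of None marks the last page.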
Building a Blog: Comments
High write-throughput
  Can't use a shared index
Would like to order by post date
  Post dates aren't unique, so we can't use them to page:
2008-05-26 22:11:04.1000 Before
2008-05-26 22:11:04.1234 My post
2008-05-26 22:11:04.1234 This is another post
2008-05-26 22:11:04.1234 And one more post
2008-05-26 22:11:04.1234 The last post
2008-05-26 22:11:04.2000 After
Building a Blog: Composite properties
Make our own composite string property:
  "post time | user ID | comment ID"
Use a shared index for each user's comment ID
  Each index is in a separate Entity group
Guaranteed a unique ordering, querying across entity groups:
2008-05-26 22:11:04.1000|brett|3 Before
2008-05-26 22:11:04.1234|jon|3 My post
2008-05-26 22:11:04.1234|jon|4 This is another post
2008-05-26 22:11:04.1234|ryan|4 And one more post
2008-05-26 22:11:04.1234|ryan|5 The last post
2008-05-26 22:11:04.2000|ryan|2 After
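A short sketch of building and sorting the composite property, using the rows above as data. The helper name `composite_key` is hypothetical. Comment IDs are left unpadded to match the slide, so within one user and timestamp they compare as text, not as numbers.

```python
def composite_key(post_time, user_id, comment_id):
    # "post time | user ID | comment ID" joined into one sortable string.
    return "%s|%s|%d" % (post_time, user_id, comment_id)

# The rows above, deliberately out of order.
comments = [
    ("2008-05-26 22:11:04.1234", "ryan", 5),
    ("2008-05-26 22:11:04.2000", "ryan", 2),
    ("2008-05-26 22:11:04.1234", "jon", 4),
    ("2008-05-26 22:11:04.1000", "brett", 3),
    ("2008-05-26 22:11:04.1234", "ryan", 4),
    ("2008-05-26 22:11:04.1234", "jon", 3),
]

keys = sorted(composite_key(*c) for c in comments)
all_unique = len(set(keys)) == len(keys)
```

Four comments share a timestamp, yet every key is distinct, so a pager can resume from any key without skipping or repeating a comment.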
Building a Blog: Composite properties 2
High throughput because of parallelism
[Figure: three separate Entity groups, each a UserIndex with a child Comment]
What to remember
Minimize Python runtime overhead
Minimize waste
Why Query when you can Get?
Structure your data to match your load
  Optimize for low write contention
  Think about Entity groups
Memcache is awesome: use it!
Learn more
code.google.com