
App Engine Google

May 30, 2018


Giraldo Ocampo
Transcript
  • 8/14/2019 App Engine Google



    Agenda

    - Using the Python runtime effectively
    - Numbers everyone should know
    - Tools for storing and scaling large data sets
    - Example: Distributed counters
    - Example: A blog


    Prevent repeated, wasteful work

    - Loading Python modules on every request can be slow
    - Reuse main() to address this:

          import wsgiref.handlers

          def main():
              wsgiref.handlers.CGIHandler().run(my_app)

          if __name__ == "__main__":
              main()

    - Lazy-load big modules to reduce the "warm-up" cost:

          def my_expensive_operation():
              import big_module  # loaded on first call, not at startup
              big_module.do_work()

    - Take advantage of "preloaded" modules


    Prevent repeated, wasteful work 2

    - Avoid large result sets
        - In-memory sorting and filtering can be slow
        - Make the Datastore work for you
    - Avoid repeated queries
        - Landing pages that use the same query for everyone
        - Incoherent caching
        - Use memcache for a consistent view:

              from google.appengine.api import memcache
              from google.appengine.ext import db

              results = memcache.get('main_results')
              if results is None:
                  # Cache miss: run the query once, then share it for 60 seconds.
                  results = db.GqlQuery('...').fetch(10)
                  memcache.add('main_results', results, 60)


    Numbers everyone should know

    - Writes are expensive!
        - Datastore is transactional: writes require disk access
        - Disk access means disk seeks
    - Rule of thumb: 10ms for a disk seek
    - Simple math: 1s / 10ms = 100 seeks/sec maximum
    - Depends on:
        - The size and shape of your data
        - Doing work in batches (batch puts and gets)
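The write arithmetic above is small enough to check in code. This is a back-of-the-envelope sketch: the 10 ms figure comes from the slide, while the assumption that one batch put costs roughly one seek is an illustration only and is not stated in the source.

```python
SEEK_TIME_S = 0.010  # rule of thumb from the slide: 10 ms per disk seek


def max_seeks_per_second(seek_time_s=SEEK_TIME_S):
    """Upper bound on sequential disk seeks in one second."""
    return int(round(1.0 / seek_time_s))


def max_entity_writes_per_second(batch_size, seek_time_s=SEEK_TIME_S):
    # Assumption (not from the slide): a batch of entities costs about one seek.
    return max_seeks_per_second(seek_time_s) * batch_size
```

Under that assumption, writing in batches of 10 raises the ceiling from 100 to 1000 entities per second, which is why batching appears in the "depends on" list.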


    Numbers everyone should know 2

    - Reads are cheap!
        - Reads do not need to be transactional, just consistent
        - Data is read from disk once, then it's easily cached
        - All subsequent reads come straight from memory
    - Rule of thumb: 250usec for 1MB of data from memory
    - Simple math: 1s / 250usec = 4GB/sec maximum
        - For a 1MB entity, that's 4000 fetches/sec
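The read figures follow from the same kind of division. The sketch below reproduces them, taking 1 GB as 1000 MB as the slide's rounding implies; scaling to other entity sizes is a simple proportional estimate that ignores any fixed per-fetch overhead.

```python
MEMORY_READ_USEC_PER_MB = 250  # rule of thumb from the slide
USEC_PER_SECOND = 1000000


def max_mb_per_second(usec_per_mb=MEMORY_READ_USEC_PER_MB):
    """Megabytes that can be read from memory in one second."""
    return USEC_PER_SECOND // usec_per_mb


def max_fetches_per_second(entity_size_mb, usec_per_mb=MEMORY_READ_USEC_PER_MB):
    # Proportional estimate only; real fetches carry extra fixed costs.
    return max_mb_per_second(usec_per_mb) / float(entity_size_mb)
```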


    Tools for storing data


    Tools for storing data: Entities

    - Fundamental storage type in App Engine
        - Schemaless
        - Set of property name/value pairs
        - Most properties indexed and efficient to query
        - Other large properties not indexed (Blobs, Text)
    - Think of it as an object store, not relational
        - Kinds are like classes
        - Entities are like object instances
    - Relationship between Entities using Keys
        - Reference properties
        - One to many, many to many


    Tools for storing data: Keys

    - Key corresponds to the Bigtable row for an Entity
    - Bigtable accessible as a distributed hashtable
    - Get() by Key: Very fast! No scanning, just copying data
    - Limitations:
        - Only one ID or key_name per Entity
        - Cannot change ID or key_name later
        - 500 bytes
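The difference between a Get() by key and a query can be pictured with an ordinary dictionary. This is a toy model, not the Datastore API: Entity, STORE, put, get, and query are hypothetical names, and the query helper scans every entity, whereas the slides say real queries are served from indexes.

```python
class Entity(object):
    """A kind plus a key name, with free-form (schemaless) properties."""

    def __init__(self, kind, key_name, **properties):
        self.key = (kind, key_name)   # fixed for the life of the entity
        self.properties = properties  # property name -> value


STORE = {}  # key -> Entity, standing in for rows addressed by key


def put(entity):
    STORE[entity.key] = entity


def get(kind, key_name):
    """Direct lookup by key: a hash lookup, no scanning."""
    return STORE.get((kind, key_name))


def query(kind, **filters):
    """Scan entities of one kind and keep those matching every filter."""
    return [e for e in STORE.values()
            if e.key[0] == kind
            and all(e.properties.get(k) == v for k, v in filters.items())]


# A Comment refers to its Post by storing the Post's key (one to many).
put(Entity("Post", "hello", title="Hello"))
put(Entity("Comment", "c1", post=("Post", "hello"), text="First!"))
put(Entity("Comment", "c2", post=("Post", "hello"), text="Nice."))
```

Here get("Post", "hello") touches one dictionary slot, while query("Comment", post=("Post", "hello")) has to look at every stored entity.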


    Tools for storing data: Transactions

    - ACID transactions
        - Atomicity, Consistency, Isolation, Durability
    - No queries in transactions
        - Transactional read and write with Get() and Put()
    - Common practice
        - Query, find what you need
        - Transact with Get() and Put()
    - How to provide a consistent view in queries?


    Tools for storing data: Entity groups

    - Closely related Entities can form an Entity group
        - Stored logically/physically close to each other
    - Define your transactionality
        - RDBMS: Row and table locking
        - Datastore: Transactions across a single Entity group
    - "Locking" one Entity in a group locks them all
        - Serialized writes to the whole group (in transactions)
        - Not a traditional lock; writers attempt to complete in parallel
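One way to picture "not a traditional lock" is optimistic concurrency: each writer works on a snapshot and the commit succeeds only if the group has not changed in the meantime. The sketch below is an illustrative model under that assumption, not a description of how the Datastore is implemented; EntityGroup, ConflictError, and this local run_in_transaction are hypothetical.

```python
class ConflictError(Exception):
    """Raised when another writer committed to the group first."""


class EntityGroup(object):
    def __init__(self):
        self.version = 0    # bumped on every successful commit
        self.entities = {}  # entity name -> value

    def begin(self):
        # Hand the writer the current version and a private copy of the data.
        return self.version, dict(self.entities)

    def commit(self, read_version, new_entities):
        if read_version != self.version:
            raise ConflictError("group changed since the transaction began")
        self.entities = new_entities
        self.version += 1


def run_in_transaction(group, func, retries=3):
    """Apply func to a snapshot and commit; retry if another writer won."""
    for _ in range(retries):
        read_version, snapshot = group.begin()
        func(snapshot)
        try:
            group.commit(read_version, snapshot)
            return True
        except ConflictError:
            continue
    return False


def add_one(entities):
    entities["n"] = entities.get("n", 0) + 1
```

In this model any commit to a group invalidates every other in-flight writer on the same group, so a large, busy group produces many retries, while separate groups never interfere with each other.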


    Tools for storing data: Entity groups 2

    - Hierarchical
        - Each Entity may have a parent
        - A "root" node defines an Entity group
        - Hierarchy of child Entities can go many levels deep
        - Watch out! Serialized writes for all children of the root
    - Datastore scales wide
        - Each Entity group has serialized writes
        - No limit to the number of Entity groups to use in parallel
        - Think of it as many independent hierarchies of data


    Tools for storing data: Entity groups 3

    Entity groups all transacting in parallel:

    [Diagram: four separate Entity groups, each a Root with a Child, labeled Txn 1 through Txn 4.]


    Tools for storing data: Entity groups 4

    - Pitfalls
        - Large Entity groups = high contention = failed transactions
        - Not thinking about write throughput is bad
        - Structure your data to match your usage patterns
    - Good news
        - Query across entity groups without serialized access!
        - Consistent view across all entity groups
            - No partial commits visible
            - All Entities in a group are the latest committed version


    Example: Counters


    Counters

    - Using Model.count()
        - Bigtable doesn't know counts by design
        - O(N); cannot be O(1); must scan every Entity row!
    - Use an Entity with a count property:

          class Counter(db.Model):
              count = db.IntegerProperty()

    - Frequent updates = high contention!
        - Transactional writes are serialized and too slow
        - Fundamental limitation of distributed systems


    Counters: Before and after

    [Diagram: "Single" shows one Counter entity; "Sharded" shows several Counter entities side by side.]


    Counters: Sharded

    - Shard counters into multiple Entity groups
    - Pick an Entity at random and update it transactionally
    - Combine sharded Entities together on reads
    - "Contention" reduced by 1/N
    - Sharding factor can be changed with little difficulty


    Counters: Models

    class CounterConfig(Model):
        name = StringProperty(required=True)
        num_shards = IntegerProperty(required=True, default=1)

    class Counter(Model):
        name = StringProperty(required=True)
        count = IntegerProperty(required=True, default=0)


    Counters: Get the count

    def get_count(name):
        total = 0
        # Sum every shard that belongs to this counter.
        for counter in Counter.gql('WHERE name = :1', name):
            total += counter.count
        return total


    Counters: Increment the count

    def increment(name):
        config = CounterConfig.get_or_insert(name, name=name)

        def txn():
            # Pick one shard at random and update only that Entity.
            index = random.randint(0, config.num_shards - 1)
            shard_name = name + str(index)
            counter = Counter.get_by_key_name(shard_name)
            if counter is None:
                counter = Counter(key_name=shard_name, name=name)
            counter.count += 1
            counter.put()

        db.run_in_transaction(txn)
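The sharding logic can be exercised in memory to see that the total survives random shard selection. ShardedCounter is a hypothetical stand-in in which a dictionary entry plays the role of one Counter entity; there are no transactions here, so it illustrates only the bookkeeping, not contention.

```python
import random


class ShardedCounter(object):
    """In-memory model: each dictionary entry stands for one Counter entity."""

    def __init__(self, name, num_shards=1, rng=None):
        self.name = name
        self.num_shards = num_shards
        self.shards = {}  # shard key name -> count
        self.rng = rng or random.Random()

    def increment(self):
        # Pick one shard at random so writers rarely touch the same one.
        index = self.rng.randint(0, self.num_shards - 1)
        shard_name = self.name + str(index)
        self.shards[shard_name] = self.shards.get(shard_name, 0) + 1

    def get_count(self):
        # Reads combine every shard into a single total.
        return sum(self.shards.values())
```

Raising num_shards later simply makes new shard names eligible; existing shards keep their counts, which matches the slide's remark that the sharding factor is easy to change.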


    Counters: Cache reads

    def get_count(name):
        total = memcache.get(name)
        if total is None:
            total = 0
            for counter in Counter.gql(
                    'WHERE name = :1', name):
                total += counter.count
            memcache.add(name, str(total), 60)
        # The cached value is a string, so always hand back an int.
        return int(total)


    Counters: Cache writes

    def increment(name):
        config = CounterConfig.get_or_insert(name, name=name)

        def txn():
            index = random.randint(0, config.num_shards - 1)
            shard_name = name + str(index)
            counter = Counter.get_by_key_name(shard_name)
            if counter is None:
                counter = Counter(key_name=shard_name, name=name)
            counter.count += 1
            counter.put()

        db.run_in_transaction(txn)
        # Keep the cached total in step with the write.
        memcache.incr(name)
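How the cached total stays in step with writes can be seen in a small model. FakeCache is a hypothetical stand-in; its incr() is assumed to do nothing and return None when the key is absent, so a missing cache entry is simply rebuilt from the shards on the next read. The shard index is passed in explicitly to keep the example deterministic.

```python
class FakeCache(object):
    """Minimal stand-in for a cache with get, add, and incr."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def add(self, key, value):
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def incr(self, key, delta=1):
        # Assumed behavior: incrementing a missing key is a no-op.
        if key not in self._data:
            return None
        self._data[key] = int(self._data[key]) + delta
        return self._data[key]


shards = {}          # (counter name, shard index) -> count
cache = FakeCache()


def get_count(name):
    total = cache.get(name)
    if total is None:
        # Cache miss: rebuild the total from every shard.
        total = sum(count for (n, _), count in shards.items() if n == name)
        cache.add(name, total)
    return int(total)


def increment(name, shard_index=0):
    key = (name, shard_index)
    shards[key] = shards.get(key, 0) + 1  # stands in for the transactional write
    cache.incr(name)                      # bump the cached total if present
```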


    Example: Building a Blog


    Building a Blog

    - Standard blog
        - Multiple blog posts
        - Each post has comments
    - Efficient paging without using queries with offsets
        - Remember, Bigtable doesn't know counts!


    Building a Blog: Blog entries

    - Blog entries with an index
        - Having an index establishes a rigid ordering
        - Index enables efficient paging
    - This is a global counter, but it's okay
        - Low write throughput of overall posts = no contention


    Building a Blog: Models

    # Named BlogIndex to match the code that posts and fetches entries.
    class BlogIndex(db.Model):
        max_index = db.IntegerProperty(required=True, default=0)

    class BlogEntry(db.Model):
        index = db.IntegerProperty(required=True)
        title = db.StringProperty(required=True)
        body = db.TextProperty(required=True)


    Building a Blog: Posting an entry

    def post_entry(blogname, title, body):
        def txn():
            blog_index = BlogIndex.get_by_key_name(blogname)
            if blog_index is None:
                blog_index = BlogIndex(key_name=blogname)
            # Take the next index and advance the shared counter.
            new_index = blog_index.max_index
            blog_index.max_index += 1
            blog_index.put()
            new_entry = BlogEntry(
                key_name=blogname + str(new_index),
                parent=blog_index, index=new_index,
                title=title, body=body)
            new_entry.put()

        db.run_in_transaction(txn)


    Building a Blog: Posting an entry 2

    Hierarchy of Entities:

        BlogIndex (parent)
            Entry (child)


    Building a Blog: Getting one entry

    def get_entry(blogname, index):
        # The key name is positional and must come before the parent keyword.
        entry = BlogEntry.get_by_key_name(
            blogname + str(index),
            parent=db.Key.from_path('BlogIndex', blogname))
        return entry

    That's it! Super fast!
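The posting and fetching scheme reduces to handing out consecutive indexes per blog and building each entry's key from the blog name and its index. The sketch below models that with two dictionaries and no transactions; blog_indexes and blog_entries are hypothetical stand-ins for the BlogIndex and BlogEntry kinds, and the tuple key imitates an entry living under its parent blog.

```python
blog_indexes = {}  # blogname -> next index to hand out (like max_index)
blog_entries = {}  # (blogname, key name) -> entry, mimicking a parent/child key


def post_entry(blogname, title, body):
    """Assign the next index for this blog and store the entry under it."""
    new_index = blog_indexes.get(blogname, 0)
    blog_indexes[blogname] = new_index + 1
    key = (blogname, blogname + str(new_index))
    blog_entries[key] = {"index": new_index, "title": title, "body": body}
    return new_index


def get_entry(blogname, index):
    """Rebuild the key from its parts and look the entry up directly."""
    return blog_entries.get((blogname, blogname + str(index)))
```

Because the key is derivable from the blog name and index, fetching an entry is a single lookup rather than a query.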


    Building a Blog: Paging

    def get_entries(start_index):
        extra = None
        if start_index is None:
            entries = BlogEntry.gql(
                'ORDER BY index DESC').fetch(POSTS_PER_PAGE + 1)
        else:
            start_index = int(start_index)
            entries = BlogEntry.gql(
                'WHERE index
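The slide's paging code is cut off in this transcript, so the following is a sketch of the general idea rather than a reconstruction of it: fetch one row more than a page holds, and use that extra row as the starting point of the next page. It assumes the missing filter keeps entries whose index is at or below start_index; ENTRIES, the page size of 3, and the returned tuple are all hypothetical.

```python
POSTS_PER_PAGE = 3  # hypothetical page size for the example

# Hypothetical data: eight entries with indexes 0 through 7.
ENTRIES = [{"index": i, "title": "Post %d" % i} for i in range(8)]


def get_entries(start_index=None, entries=ENTRIES, per_page=POSTS_PER_PAGE):
    """Return (page, next_start_index); next_start_index is None on the last page."""
    ordered = sorted(entries, key=lambda e: e["index"], reverse=True)
    if start_index is not None:
        start_index = int(start_index)
        # Assumed filter: only entries at or below the requested index.
        ordered = [e for e in ordered if e["index"] <= start_index]
    fetched = ordered[:per_page + 1]  # the extra row signals another page
    next_start = None
    if len(fetched) > per_page:
        next_start = fetched[-1]["index"]
        fetched = fetched[:per_page]
    return fetched, next_start
```

Each page costs one bounded fetch regardless of how deep the reader has paged, which is the advantage over offset-based queries.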


    Building a Blog: Comments

    - High write-throughput
        - Can't use a shared index
    - Would like to order by post date
        - Post dates aren't unique, so we can't use them to page:

          2008-05-26 22:11:04.1000    Before
          2008-05-26 22:11:04.1234    My post
          2008-05-26 22:11:04.1234    This is another post
          2008-05-26 22:11:04.1234    And one more post
          2008-05-26 22:11:04.1234    The last post
          2008-05-26 22:11:04.2000    After


    Building a Blog: Composite properties

    - Make our own composite string property:
        - "post time | user ID | comment ID"
    - Use a shared index for each user's comment ID
        - Each index is in a separate Entity group
    - Guaranteed a unique ordering, querying across entity groups:

          2008-05-26 22:11:04.1000|brett|3    Before
          2008-05-26 22:11:04.1234|jon|3      My post
          2008-05-26 22:11:04.1234|jon|4      This is another post
          2008-05-26 22:11:04.1234|ryan|4     And one more post
          2008-05-26 22:11:04.1234|ryan|5     The last post
          2008-05-26 22:11:04.2000|ryan|2     After
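The composite property can be built and sorted with plain strings to confirm that identical timestamps no longer collide. The sketch reuses the slide's sample rows; composite_key, ordered_comments, and page_after are hypothetical helper names, and timestamps are passed as already formatted strings.

```python
def composite_key(post_time, user_id, comment_id):
    """Build a 'post time|user ID|comment ID' string."""
    return "%s|%s|%d" % (post_time, user_id, comment_id)


# Sample rows from the slide, deliberately out of order.
COMMENTS = [
    ("2008-05-26 22:11:04.1234", "ryan", 5, "The last post"),
    ("2008-05-26 22:11:04.1000", "brett", 3, "Before"),
    ("2008-05-26 22:11:04.1234", "jon", 4, "This is another post"),
    ("2008-05-26 22:11:04.2000", "ryan", 2, "After"),
    ("2008-05-26 22:11:04.1234", "jon", 3, "My post"),
    ("2008-05-26 22:11:04.1234", "ryan", 4, "And one more post"),
]


def ordered_comments(comments):
    """Return (key, text) pairs sorted by the composite key."""
    return sorted((composite_key(t, u, c), text) for (t, u, c, text) in comments)


def page_after(comments, last_key, limit):
    """Next page: rows whose key sorts strictly after the last one shown."""
    rows = [row for row in ordered_comments(comments) if row[0] > last_key]
    return rows[:limit]
```

Keys compare as strings, so a comment ID of 10 would sort before 9 for the same user and timestamp; zero-padding the ID is one way to keep string order aligned with numeric order, though the slide does not show it.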


    Building a Blog: Composite properties 2

    High throughput because of parallelism

    [Diagram: three separate UserIndex entities, each with a Comment beneath it.]


    What to remember

    - Minimize Python runtime overhead
    - Minimize waste
        - Why Query when you can Get?
    - Structure your data to match your load
        - Optimize for low write contention
        - Think about Entity groups
    - Memcache is awesome: use it!


    Learn more

    code.google.com
