NDB: The new Python client library for the Google App Engine Datastore
Guido van Rossum

Mar 23, 2016

Google App Engine in a nutshell

• Run your web apps in Google’s cloud
• Opinionated Platform-as-a-Service (PaaS)
• Automatically scales your app
• Python-only launch April 2008; Java in 2009
• NoSQL datastore
  – ORM is primary API
  – small subset of SQL (“GQL”) on top of ORM
  – Original Python ORM called “db”

Google App Engine in numbers

• Attained 7.5 Billion daily hits
• 1 Million active applications
• 250,000 active developers (30-day actives)
• Half of all internet IP addresses touch Google App Engine servers per week
• 2 Trillion datastore operations per month

NDB in a nutshell

• Fix design bugs in the old db API
• Implement cool new API ideas
• Asynchronous to the core
• 100% compatible on-disk representation
• Google App Engine Datastore only
  – Python 2.5 and 2.7 (single- and multi-threaded)
  – HRD (High Replication) and M/S (Master/Slave) datastores; US and EU datacenters

Development process

• Notice widespread frustration with old db
• Get management buy-in for a full rewrite
• Sit in a corner coding for a year :-)
• No, really:
  – Release open source version early and often
  – Beg users for feedback and contributions
  – Try to document; redesign whatever is hard to explain
  – Rinse and repeat

What’s wrong with old db

• Hard to modify
  – any time we try to change internals, some user code that depends on those internals breaks
• Started out as a quick demo
  – “how to do Django-style models in App Engine”
  – made the official API only weeks before launch
• Has too many layers
  – data is copied too many times between layers

Layer cake (old)

db → datastore.py → protocol buffers

Layer cake (new)

db → datastore.py → datastore_{rpc,query}.py → protocol buffers
ndb → datastore_{rpc,query}.py → protocol buffers

Cool new API features

• Async core
• Auto-batching
• Integrated caching
• Pythonic query syntax
• Give entities nestable structure
• Make subclassing Property classes easy

Other nice things

• Use repeated=True instead of ListProperty
• Pre- and post-operation hooks
• Key and Query types are truly immutable
• All objects have useful repr()s
• Unified terminology (id instead of key_name)
• PickleProperty, JsonProperty
• ProtoRPC support: MessageProperty
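Because Key and Query are immutable, refining a query returns a new object and leaves the original untouched and safe to reuse. The sketch below illustrates the idea with a hypothetical frozen `Query` dataclass that holds only a kind and a tuple of filter strings; it is not NDB's class, although NDB's `Query.filter()` likewise returns a new query rather than modifying the old one.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Query:
    """Hypothetical immutable query: a kind plus a tuple of filter strings."""
    kind: str
    filters: Tuple[str, ...] = ()

    def filter(self, condition: str) -> "Query":
        # Return a new Query with one more filter; self is never modified.
        return Query(self.kind, self.filters + (condition,))


base = Query("Employee")
senior = base.filter("rank > 3")
print(base)    # Query(kind='Employee', filters=())
print(senior)  # Query(kind='Employee', filters=('rank > 3',))
```

Assigning to a field (say, `base.kind = "Manager"`) raises `dataclasses.FrozenInstanceError`, and equal queries hash equally, so an immutable query can safely be shared or used as a dict key.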

The basics

Model (schema) definitions

• Model class and Property classes
  – similar to Django (or any Python ORM)
  – uses a simple metaclass
• Example:

      from google.appengine.ext import ndb

      class Employee(ndb.Model):
          name = ndb.StringProperty(required=True)
          rank = ndb.IntegerProperty(default=3)
          phone = ndb.StringProperty()
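The “simple metaclass” is what lets each Property object learn the attribute name it was assigned to, so the model can apply defaults and enforce `required`. A toy version follows. `Property`, `ModelMeta`, and `Model` are hypothetical stand-ins that only understand `required` and `default`. Real NDB properties also validate types, handle repeated values, and serialize. The toy also ignores properties inherited from base classes.

```python
class Property:
    """Hypothetical stand-in for ndb.StringProperty and friends."""
    def __init__(self, required=False, default=None):
        self.required = required
        self.default = default
        self.name = None                 # filled in by the metaclass


class ModelMeta(type):
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Collect the Property objects declared in this class body, in order.
        cls._properties = {}
        for attr, value in namespace.items():
            if isinstance(value, Property):
                value.name = attr        # each property learns its own name
                cls._properties[attr] = value


class Model(metaclass=ModelMeta):
    def __init__(self, **kwargs):
        for name, prop in self._properties.items():
            if name in kwargs:
                value = kwargs.pop(name)
            elif prop.required:
                raise ValueError(f"{name} is required")
            else:
                value = prop.default
            setattr(self, name, value)   # instance value shadows the Property
        if kwargs:
            raise TypeError(f"unknown properties: {sorted(kwargs)}")


class Employee(Model):
    name = Property(required=True)
    rank = Property(default=3)
    phone = Property()


emp = Employee(name="Guido")
print(emp.name, emp.rank, emp.phone)    # Guido 3 None

missing_error = None
try:
    Employee()                          # name is required
except ValueError as exc:
    missing_error = exc
```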

Basic CRUD

• (Create, Read, Update, Delete)
• emp = Employee(name='Guido')
• key = emp.put()
• emp = key.get()
• emp.phone = '555-5555'; emp.put()
• key.delete()

Queries

• Query for all entities:
  – all_emps = Employee.query().fetch()
  – for emp in Employee.query(): …
• Query for property values:
  – Employee.query(Employee.rank > 3)
  – Employee.query(Employee.phone == None)
• Query for multiple conditions:
  – Employee.query(<cond1>, <cond2>, …)
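A query like `Employee.query(Employee.rank > 3)` works because comparison operators on a class-level Property return a filter object instead of a boolean. The in-memory sketch below shows that mechanism with hypothetical `Property`, `FilterNode`, and `Employee` classes. It filters a Python list, whereas NDB sends its filter nodes to the datastore. It also shows the payoff over db's string filters: a misspelled property name fails right away with `AttributeError`.

```python
import operator


class FilterNode:
    """Hypothetical filter: property name, comparison function, value."""
    def __init__(self, name, op, value):
        self.name, self.op, self.value = name, op, value

    def matches(self, entity):
        return self.op(getattr(entity, self.name), self.value)


class Property:
    def __set_name__(self, owner, name):
        self.name = name                 # Python 3.6+ hook records the name

    # Comparisons on the class attribute build filters instead of bools.
    def __eq__(self, value):
        return FilterNode(self.name, operator.eq, value)

    def __gt__(self, value):
        return FilterNode(self.name, operator.gt, value)

    __hash__ = object.__hash__           # defining __eq__ would drop hashing


class Employee:
    _store = []                          # stand-in for the datastore
    name = Property()
    rank = Property()
    phone = Property()

    def __init__(self, name, rank=3, phone=None):
        self.name, self.rank, self.phone = name, rank, phone

    def put(self):
        Employee._store.append(self)

    @classmethod
    def query(cls, *filters):
        # All conditions must hold, like query(<cond1>, <cond2>, ...).
        return [e for e in cls._store if all(f.matches(e) for f in filters)]


Employee("Guido", rank=5).put()
Employee("Ann").put()
Employee("Bob", rank=4, phone="555-5555").put()

senior = [e.name for e in Employee.query(Employee.rank > 3)]
no_phone = [e.name for e in Employee.query(Employee.rank > 3, Employee.phone == None)]
print(senior, no_phone)                  # ['Guido', 'Bob'] ['Guido']

typo_error = None
try:
    Employee.rnak > 3                    # misspelled property name
except AttributeError as exc:
    typo_error = exc
```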

Why repeat the class name?

• Limitations of Python as a DSL…
• Old db used string literals; error-prone:
  – Employee.all().filter(' rank >', 3)  # extra space
• Protip: write queries as class methods on the model:

      class Employee(ndb.Model):
          ...  # properties as before

          @classmethod
          def outranks(cls, rank):
              return cls.query(cls.rank > rank)

      Employee.outranks(3).fetch()

Mapping a query over a callback

      # Pretend you don’t see the async bits
      @ndb.tasklet
      def callback(ent):
          if not ent.name:
              ent.name = ent.first_name + ent.last_name
              yield ent.put_async()

      Employee.query().map(callback)

• Concurrency controlled by query batch size

StructuredProperty

• Example: list of tagged phone numbers
• In old db:

      class Contact(db.Model):
          name = db.StringProperty()
          # following two are parallel arrays
          phones = db.StringListProperty()
          tags = db.StringListProperty()

      def add_phone(contact, number, tag):
          contact.phones.append(number)
          contact.tags.append(tag)

StructuredProperty (2)

      class Phone(ndb.Model):
          number = ndb.StringProperty()
          tag = ndb.StringProperty()

      class Contact(ndb.Model):
          name = ndb.StringProperty()
          phones = ndb.StructuredProperty(Phone, repeated=True)

      def add_phone(contact, number, tag):
          contact.phones.append(Phone(number=number, tag=tag))

      Contact.query(Contact.phones.number == '555-1212')
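The structured version need not change what is stored. As we understand NDB's layout, a repeated StructuredProperty is written as parallel lists of sub-property values under dotted names such as `phones.number` and `phones.tag`, much like the hand-maintained arrays in old db. The sketch below mimics that layout with plain dataclasses and dicts. `to_stored`, `from_stored`, and `has_number` are hypothetical helpers, not NDB functions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Phone:
    number: str
    tag: str


@dataclass
class Contact:
    name: str
    phones: List[Phone] = field(default_factory=list)


def to_stored(contact: Contact) -> Dict[str, object]:
    # Flatten the repeated sub-entities into parallel lists under dotted names.
    return {
        "name": contact.name,
        "phones.number": [p.number for p in contact.phones],
        "phones.tag": [p.tag for p in contact.phones],
    }


def from_stored(row: Dict[str, object]) -> Contact:
    # zip() re-pairs the i-th number with the i-th tag.
    phones = [Phone(n, t) for n, t in zip(row["phones.number"], row["phones.tag"])]
    return Contact(row["name"], phones)


def has_number(row: Dict[str, object], number: str) -> bool:
    # Rough analogue of Contact.query(Contact.phones.number == number):
    # a repeated value matches if any element equals it.
    return number in row["phones.number"]


contact = Contact("Guido")
contact.phones.append(Phone("555-1212", "home"))
contact.phones.append(Phone("555-0000", "work"))
row = to_stored(contact)
print(row["phones.number"])                                  # ['555-1212', '555-0000']
print(from_stored(row) == contact, has_number(row, "555-1212"))  # True True
```

The application code only ever sees `Phone` objects, so the two arrays can no longer drift out of step.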

Transactions

• Nothing really new or exciting
• Well integrated with contexts and caching
• Decorator @ndb.transactional
• To specify options:
  – @ndb.transactional(retries=N, xg=True)  # xg=True allows cross-group transactions
• Join current transaction if one is in progress:
  – @ndb.transactional(propagation=ndb.TransactionOptions.ALLOWED)
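The `retries` option means a transactional function may run more than once, so its body must be safe to re-run. A toy retry decorator makes that concrete. `transactional` and `TransactionFailed` below are hypothetical stand-ins, not `ndb.transactional`, which also handles `xg`, propagation, and the actual commit.

```python
import functools


class TransactionFailed(Exception):
    """Hypothetical stand-in for a commit that lost a conflict."""


def transactional(retries=3):
    # Toy decorator: re-run the function when the simulated commit fails,
    # making at most `retries` extra attempts before giving up.
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(retries + 1):
                try:
                    return func(*args, **kwargs)
                except TransactionFailed:
                    if attempt == retries:
                        raise            # out of retries: surface the failure
        return wrapper
    return decorator


attempts = []


@transactional(retries=3)
def increment(counter):
    attempts.append(1)
    if len(attempts) < 3:                # simulate two conflicting commits
        raise TransactionFailed("simulated contention")
    counter["value"] += 1
    return counter["value"]


counter = {"value": 0}
print(increment(counter), len(attempts))   # 1 3
```

The function body ran three times but changed `counter` only once, because the simulated failures happen before the update. A body that mutated shared state before failing would not be safe to retry.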

Caching

• CRUD automatically caches in two places:
  – in memory (per-context; write-through)
  – in memcache (shared)
    • one memcache server for all instances of your app
    • writes lock and clear memcache entries, but don’t update them
  – the memcache algorithm ensures consistency
    • even when using transactions
    • except maybe under extreme failure conditions
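The read and write paths above can be sketched with two dictionaries. `ToyDatastore` and `Context` are hypothetical classes, not NDB's `Context`. A plain dict stands in for memcache, and the toy skips the locking step the slide mentions. It keeps only the essentials: reads fall through local cache, shared cache, then datastore; writes go through to the local cache and clear (rather than update) the shared entry.

```python
class ToyDatastore:
    """Stand-in for the datastore; counts how often it is read."""
    def __init__(self):
        self.rows = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)

    def put(self, key, value):
        self.rows[key] = value


class Context:
    """Hypothetical two-level cache, loosely modeled on the slide."""
    def __init__(self, datastore, shared_cache):
        self.datastore = datastore
        self.shared = shared_cache   # one dict shared by all "instances"
        self.local = {}              # private to this context

    def get(self, key):
        if key in self.local:                 # 1. in-memory context cache
            return self.local[key]
        if key in self.shared:                # 2. shared cache
            value = self.shared[key]
        else:                                 # 3. datastore, then fill shared
            value = self.datastore.get(key)
            self.shared[key] = value
        self.local[key] = value
        return value

    def put(self, key, value):
        self.shared.pop(key, None)   # clear (not update) the shared entry
        self.datastore.put(key, value)
        self.local[key] = value      # write-through to the context cache


datastore, memcache = ToyDatastore(), {}
writer = Context(datastore, memcache)
writer.put("emp1", "Guido")

reader1 = Context(datastore, memcache)
reader2 = Context(datastore, memcache)
reader1.get("emp1")        # miss everywhere: one datastore read, fills memcache
reader2.get("emp1")        # served from the shared cache

writer.put("emp1", "Guido v2")               # clears the shared entry
reader3 = Context(datastore, memcache)
print(reader3.get("emp1"), datastore.reads)  # Guido v2 2
```

Clearing instead of updating means a concurrent reader can at worst repopulate the shared cache from the datastore, never overwrite it with a value the writer has not committed.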

Caching (2)

• User can override caching policies
  – per call, per model class, per context
  – write your own policy function
  – can even turn off datastore writes completely!
• Query results are not cached
  – consistency is too hard to guarantee
  – however, when cache hit rates are high, fetching keys only and then getting the entities by key works well:

      ndb.get_multi(q.fetch(keys_only=True))

The async API

(a fairly deep dive)

Async basics

• Based on PEP 342: generators as coroutines
• Has its own event loop and Future class
• Constrained by App Engine async API
  – based on RPCs (“Futures” for server-side work)
  – only RPCs can be asynchronous (no select/poll)
  – can wait for multiple RPCs
  – in original (Python 2.5) runtime, no threads
  – greenlets/gevent/etc. useless in this environment

Synchronous example code

    def get_or_insert(id):
        ent = Employee.get_by_id(id)
        if ent is None:
            ent = Employee(…, id=id)
            ent.put()
        return ent

Converted to async style

    @ndb.tasklet
    def get_or_insert_async(id):
        ent = yield Employee.get_by_id_async(id)
        if ent is None:
            ent = Employee(…, id=id)
            yield ent.put_async()
        raise ndb.Return(ent)

“Look ma, no callbacks”

Writing async code

• The decorated function (tasklet) is async itself
• Really, async operations just return Futures
  – can separate call from yield:

      f = foo_async(); …; a = yield f

• yield takes any Future, or a list of Futures
  – yield <list> returns a list of results:

      f = f_async(); …; g = g_async(); …; a, b = yield f, g

• yielding multiple futures is key to running multiple tasklets concurrently

Futures

• NDB Futures are explicit Futures
  – must use an explicit API to wait for the result
• Three ways to wait:
  – call f.get_result()  # in synchronous context
  – yield f  # in a tasklet
  – f.add_callback(callback_function)  # internal
• Any number of waiters are supported
• An exception is also a result (i.e. it is re-raised)
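A toy Future with these properties fits in a few dozen lines. It supports any number of waiters and re-raises a stored exception from `get_result()`. This is a hypothetical sketch, not `ndb.Future`. Its `add_callback(callback, *args)` signature is the toy's own. Its `get_result()` simply refuses to block, whereas NDB's waits by running the event loop.

```python
class Future:
    """Toy Future (not ndb.Future): one result or exception, many waiters."""
    _UNSET = object()

    def __init__(self):
        self._result = Future._UNSET
        self._exception = None
        self._callbacks = []

    def done(self):
        return self._result is not Future._UNSET or self._exception is not None

    def add_callback(self, callback, *args):
        # Run now if already done, otherwise remember the waiter.
        if self.done():
            callback(*args)
        else:
            self._callbacks.append((callback, args))

    def set_result(self, result):
        self._result = result
        self._notify()

    def set_exception(self, exception):
        self._exception = exception
        self._notify()

    def _notify(self):
        callbacks, self._callbacks = self._callbacks, []
        for callback, args in callbacks:
            callback(*args)

    def get_result(self):
        # This toy cannot wait, so it only works once the result is set.
        if not self.done():
            raise RuntimeError("result not set yet")
        if self._exception is not None:
            raise self._exception        # an exception is a result: re-raise
        return self._result


seen = []
f = Future()
f.add_callback(seen.append, "first waiter")
f.add_callback(seen.append, "second waiter")
f.set_result(42)
print(seen, f.get_result())   # ['first waiter', 'second waiter'] 42

caught = None
g = Future()
g.set_exception(KeyError("emp1"))
try:
    g.get_result()
except KeyError as exc:
    caught = exc
```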

Event loop

• Doesn’t know about Futures
• Knows about App Engine RPCs though…
• And knows about callback functions
• When you’re calling an async API or tasklet
  – a helper to run the tasklet is queued
  – you’re given a Future right away
  – the helper will eventually set the Future’s result
  – use the Future to wait for the result
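That division of labor can be sketched in a toy event loop. The loop knows only plain callbacks and simulated RPCs. The Future is just something a queued helper eventually completes. All names here (`EventLoop`, `queue_call`, `queue_rpc`, `run0`, `SimpleFuture`, `fetch_async`) are illustrative and do not claim to match NDB's `eventloop` module. A "RPC" is simply a zero-argument function whose return value is its result.

```python
import collections


class EventLoop:
    """Toy event loop: runs plain callbacks and simulated RPCs, no Futures."""
    def __init__(self):
        self.ready = collections.deque()   # callbacks ready to run now
        self.rpcs = collections.deque()    # (rpc, callback) pairs in flight

    def queue_call(self, callback, *args):
        self.ready.append((callback, args))

    def queue_rpc(self, rpc, callback):
        self.rpcs.append((rpc, callback))

    def run0(self):
        """Do one unit of work; return False when there is nothing left."""
        if self.ready:
            callback, args = self.ready.popleft()
            callback(*args)
            return True
        if self.rpcs:
            rpc, callback = self.rpcs.popleft()
            callback(rpc())   # "wait" for the RPC, then hand over its result
            return True
        return False

    def run(self):
        while self.run0():
            pass


class SimpleFuture:
    """Bare-bones Future, just enough for this demo."""
    def __init__(self):
        self.result, self.callbacks = None, []

    def set_result(self, value):
        self.result = value
        for callback in self.callbacks:
            callback(value)


loop = EventLoop()


def fetch_async(key):
    # Hypothetical async API: return a Future at once and queue a helper
    # that will eventually set its result.
    future = SimpleFuture()

    def helper():
        loop.queue_rpc(lambda: "value-for-" + key, future.set_result)

    loop.queue_call(helper)
    return future


log = []
future = fetch_async("emp1")
log.append("got a Future")        # happens before any work is done
future.callbacks.append(log.append)
loop.run()
print(log)                        # ['got a Future', 'value-for-emp1']
```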

The magic yield

• How does yielding a Future wait for its result?
  1. “Trampoline” code calls g.next() or g.send() on the underlying generator object
  2. If this returns a Future, the trampoline adds a callback to the Future to restart the generator
  3. It’s up to whatever created the Future to make sure that its result is eventually set
  4. Go to #1, passing the result into g.send()
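The four steps translate almost directly into a toy trampoline. The sketch below is a hypothetical `tasklet` decorator, not NDB's. It also handles `yield <list>` by waiting for every Future and sending back a list of results. It is written for Python 3, where PEP 479 (3.7+) turns a `StopIteration` raised inside a generator into `RuntimeError`. So the tasklet uses a plain `return` instead of an `ndb.Return`-style exception, and the trampoline reads the value from `StopIteration.value`.

```python
class Future:
    """Minimal toy Future: one result, any number of callbacks."""
    def __init__(self):
        self.done, self.result, self.callbacks = False, None, []

    def set_result(self, value):
        self.done, self.result = True, value
        for callback in self.callbacks:
            callback(self)

    def add_callback(self, callback):
        if self.done:
            callback(self)
        else:
            self.callbacks.append(callback)


def wait_all(futures, on_done):
    """Call on_done(list_of_results) once every Future is done."""
    results = [None] * len(futures)
    remaining = [len(futures)]
    if not futures:
        on_done(results)
        return

    def make_callback(i):
        def callback(f):
            results[i] = f.result
            remaining[0] -= 1
            if remaining[0] == 0:
                on_done(results)
        return callback

    for i, f in enumerate(futures):
        f.add_callback(make_callback(i))


def tasklet(genfunc):
    """Toy decorator: run a generator function as a trampolined tasklet."""
    def wrapper(*args, **kwargs):
        gen = genfunc(*args, **kwargs)
        outcome = Future()                       # the tasklet's own result

        def step(value):
            try:
                yielded = gen.send(value)        # 1. resume the generator
            except StopIteration as stop:
                outcome.set_result(stop.value)   # generator returned
                return
            if isinstance(yielded, (list, tuple)):
                wait_all(yielded, step)          # yield <list> -> list of results
            else:
                # 2. restart the generator once this Future has a result
                yielded.add_callback(lambda f: step(f.result))
            # 3./4. whoever owns the Future sets it; step() then runs again

        step(None)
        return outcome
    return wrapper


pending = {}


def get_async(key):
    # Hypothetical async API: hands back an unfinished Future.
    pending[key] = Future()
    return pending[key]


@tasklet
def add_pair_async(k1, k2):
    a, b = yield get_async(k1), get_async(k2)
    return a + b          # Python 3 spelling of raise ndb.Return(a + b)


total = add_pair_async("x", "y")
print(total.done)         # False: both Futures are still pending
pending["x"].set_result(1)
pending["y"].set_result(2)
print(total.result)       # 3
```

Because callbacks call `step()` directly, a long chain of already-finished Futures recurses deeper and deeper. This is a small-scale version of the stack-overflow risk listed under the caveats.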

Edge cases

• If g.next() or g.send() raises StopIteration, we’re done (ndb.Return is a subclass thereof)

• If it raises another exception, we’re also done, and we pass the exception on

• If it returns an RPC instead of a Future, use the event loop’s native understanding of RPCs

• If it returns a non-Future, that’s an error
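A synchronous toy trampoline covers three of these edge cases in a few lines. It handles normal completion via `StopIteration`, exceptions that simply propagate, and the error for a yielded non-Future. It also shows a Future's stored exception being thrown back into the generator with `gen.throw()`, so the tasklet can catch it with an ordinary `try`/`except`. `Future`, `run_tasklet`, and the sample generators are hypothetical. Every Future here is already finished, so no event loop is needed, and the RPC case is left out.

```python
class Future:
    """Toy Future that is already finished with a result or an exception."""
    def __init__(self, result=None, exception=None):
        self.result, self.exception = result, exception


def run_tasklet(gen):
    """Toy synchronous trampoline illustrating the edge cases."""
    send_value, throw_exc = None, None
    while True:
        try:
            if throw_exc is not None:
                yielded = gen.throw(throw_exc)   # re-raise inside the tasklet
            else:
                yielded = gen.send(send_value)
        except StopIteration as stop:
            return stop.value                    # done: normal completion
        # Any other exception from send()/throw() propagates out unchanged.
        if not isinstance(yielded, Future):
            raise TypeError(f"tasklet yielded a non-Future: {yielded!r}")
        send_value, throw_exc = yielded.result, yielded.exception


def lookup(future):
    try:
        value = yield future
    except KeyError:
        value = "default"    # the tasklet can catch a re-raised exception
    return value


def bad():
    yield 42                 # not a Future: an error


def boom():
    raise ValueError("tasklet failed")
    yield                    # unreachable; makes this a generator


found = run_tasklet(lookup(Future(result="found")))
fallback = run_tasklet(lookup(Future(exception=KeyError("emp1"))))
errors = []
for gen in (bad(), boom()):
    try:
        run_tasklet(gen)
    except (TypeError, ValueError) as exc:
        errors.append(type(exc).__name__)
print(found, fallback, errors)   # found default ['TypeError', 'ValueError']
```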

You don’t have to understand this

• Just remember these rules:
  – use @ndb.tasklet on a generator function
  – yield *_async() operations
  – raise ndb.Return(x) instead of return x (a Python 2 generator cannot return a value)
• Use yield <list> to increase concurrency
• Don’t call synchronous APIs inside a tasklet!
• Helpful convention: name tasklets *_async
• Exception passing is remarkably natural

Auto-batching

• Automatically combine operations in one RPC
  – Only like operations can be combined
  – Must use async API to benefit
• Example:
  – e1.put(); e2.put()  # Two RPCs
  – yield e1.put_async(), e2.put_async()  # One RPC!

• Implemented for datastore get, put, delete; and memcache operations (via Context)
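The core of auto-batching is a queue of pending single operations, each paired with its own Future. The sketch below is a hypothetical `AutoBatcher`, not NDB's class. `put_async` only enqueues and returns a Future. `flush()` sends everything in one simulated batch RPC and distributes the results back. In this toy the caller invokes `flush()` explicitly; per these slides, NDB flushes when no tasklet can make further progress.

```python
class Future:
    """Minimal toy Future for this demo."""
    def __init__(self):
        self.result, self.done = None, False

    def set_result(self, value):
        self.result, self.done = value, True


class AutoBatcher:
    """Toy sketch: queue single puts, send them together in one batch RPC."""
    def __init__(self, batch_rpc):
        self.batch_rpc = batch_rpc   # callable taking a list of entities
        self.queue = []              # (entity, future) pairs

    def put_async(self, entity):
        future = Future()
        self.queue.append((entity, future))
        return future                # nothing has been sent yet

    def flush(self):
        if not self.queue:
            return
        batch, self.queue = self.queue, []
        keys = self.batch_rpc([entity for entity, _ in batch])   # ONE RPC
        for (_, future), key in zip(batch, keys):
            future.set_result(key)   # distribute results back to the Futures


rpc_calls = []


def fake_put_multi(entities):
    # Hypothetical stand-in for the datastore's batch put RPC.
    rpc_calls.append(list(entities))
    return [f"key-{e}" for e in entities]


batcher = AutoBatcher(fake_put_multi)
f1 = batcher.put_async("e1")
f2 = batcher.put_async("e2")
batcher.flush()
print(len(rpc_calls), f1.result, f2.result)   # 1 key-e1 key-e2
```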

Auto-batching (2)

• Biggest benefit is between multiple tasklets
• Each tasklet does some single ops
  – example: get_or_insert()
• Tasklets are run concurrently
• Each tasklet in turn runs until its first blocking op
• Those ops are buffered, not sent out yet
• When no tasklets are left to run, buffered ops are combined into one batch RPC

Auto-batching (3)

• Each original single op has its own Future
• When the RPC completes, its result is distributed back over those Futures
• And… the tasklets are back in the race!
• But… why not just manually batch operations?
  – restructuring your code to do that is often hard!

Conclusion: caveats

• Async coding has lots of newbie traps
• Careful when overlapping I/O and CPU work
  – auto-batch queues are only flushed when blocking
• Mixing async and synchronous ops can be bad
  – in extreme cases can cause stack overflow
• Debugging async code is a challenge
  – too much state in suspended generators’ locals
  – can’t easily step over a yield in pdb

Q & A