The App Engine NDB documentation does a good job of explaining the benefits of the new interface, but it doesn’t really have anything for application developers who want to upgrade their existing models. As I discussed in my previous post, the Khan Academy engineering team recently went through this process and came out with a lot of experience about what works and what doesn’t for an established codebase. What follows is our refined plan of attack, distilled into a field guide that you can use to transition over your own application.
N.B.: The difficulty of making this transition is proportional to the size and complexity of your application. If it’s small enough that you can convert everything at once, great. If not, be prepared to do a fair bit of debugging to smooth things out. Our experience taught us that some parts of this are going to be rocky no matter what, but that the flexibility offered by the upgrade is worth it.
1. Change your models to subclass from ndb.Model and use NDB properties and APIs
- The trivial part: class Video(ndb.Model). If only this were all it took!
- Always keep the NDB Cheat Sheet handy. It’s massively useful for mapping between the two APIs.
- Some important property differences:
- A lot of the specialized properties like PostalAddressProperty are now simply
- There is no more ListProperty. Instead, add repeated=True to the property constructor. For example, what was once db.ListProperty(bool) will now be
- NDB has KeyProperty, which does not automatically fetch the referred-to entity from the datastore. You could write a custom ndb.Property subclass to emulate the old ReferenceProperty:
    from google.appengine.ext import ndb

    class ReferenceProperty(ndb.KeyProperty):
        def _validate(self, value):
            if not isinstance(value, ndb.Model):
                raise TypeError('expected an ndb.Model, got %s' % repr(value))

        def _to_base_type(self, value):
            return value.key

        def _from_base_type(self, value):
            return value.get()
- You’ll also need to convert any of your custom object properties to inherit from
ndb.Property. Good news: it’s pretty trivial and the way to do custom properties is vastly simplified in NDB. See my custom
entity.key.delete(), etc. Refer to the cheat sheet. Making sure you’ve covered these changes everywhere they need to happen is the most difficult part of the conversion process.
- Watch out if any of your models override delete e.g. to do cache invalidation. NDB entities are deleted via
- Speaking of caching, if you fear that NDB’s automatic caching might interfere with whatever you’re doing already, you can disable it on a per-model basis by adding a _use_cache = False class variable to each model as necessary. More sophisticated policy functions are available as well, but those are best left for final tweaking. This is less of an “I need to be afraid of a potential slowdown” thing and more of an “I want to preserve my existing performance characteristics at the risk of not getting potential improvements, because it will make me feel safer about this” thing.
2. Change all code that uses the newly converted models to use the NDB interface
- This step generally involves two parts: using the new query syntax and then using the new APIs with the query results. Again, the Cheat Sheet is your friend, and the advice from the previous step holds for this one too.
- If you really want to punt on using the new query syntax, you could use the more familiar GQL to build ndb.Query instances. But I don’t recommend punting: the NDB query syntax is pretty sexy and this is one of the least error-prone parts of the conversion. Do note, however, that calling methods like filter on a query instance doesn’t modify it in place; it returns a new query, so you need to assign the result back to your variable:
    from google.appengine.ext import db, ndb

    class OldBananaStand(db.Model):
        contains_money = db.BooleanProperty()

    class NewBananaStand(ndb.Model):
        contains_money = ndb.BooleanProperty()

    old_ones = OldBananaStand.all()
    old_ones.filter('contains_money = True')  # => ok!

    new_ones = NewBananaStand.query()
    new_ones.filter(NewBananaStand.contains_money == True)  # => nope
    new_ones = new_ones.filter(NewBananaStand.contains_money == True)  # => ok!
- If you do any serialization to or deserialization from protocol buffers, that’s different in NDB too but not documented anywhere except in the SDK source. To summarize:
    from google.appengine.ext import db, ndb
    from google.appengine.datastore import entity_pb

    def db_entity_to_protobuf(e):
        return db.model_to_protobuf(e).Encode()

    def protobuf_to_db_entity(pb):
        # precondition: model class must be imported
        return db.model_from_protobuf(entity_pb.EntityProto(pb))

    def ndb_entity_to_protobuf(e):
        return ndb.ModelAdapter().entity_to_pb(e).Encode()

    def protobuf_to_ndb_entity(pb):
        # precondition: model class must be imported
        return ndb.ModelAdapter().pb_to_entity(entity_pb.EntityProto(pb))
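
The reassignment gotcha from the banana stand example comes down to queries being immutable: filter hands back a new query and leaves the one you called it on alone. Here is a minimal pure-Python sketch of that pattern. ToyQuery and its predicate-based filter are hypothetical stand-ins made up for illustration; this is not how ndb.Query is implemented.

```python
class ToyQuery(object):
    """Hypothetical stand-in for an immutable query (not ndb.Query)."""

    def __init__(self, rows, predicates=()):
        self._rows = list(rows)
        self._predicates = tuple(predicates)

    def filter(self, predicate):
        # Return a NEW query; self is left untouched.
        return ToyQuery(self._rows, self._predicates + (predicate,))

    def fetch(self):
        # Keep only the rows that satisfy every accumulated predicate.
        return [r for r in self._rows if all(p(r) for p in self._predicates)]


stands = [{"contains_money": True}, {"contains_money": False}]

q = ToyQuery(stands)
q.filter(lambda s: s["contains_money"])      # result discarded: q is unchanged
unfiltered_count = len(q.fetch())

q = q.filter(lambda s: s["contains_money"])  # reassign to keep the filter
filtered_count = len(q.fetch())
```

The upside of this design is that a base query can be shared and refined in several places without one caller's filters leaking into another's.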
3. Test, test, test
- Unit tests are good. End-to-end tests are better. (The sun will rise tomorrow.)
- Even when you’ve gotten rid of any exceptions you find, there might still be unintended 404s if, for example, you’re sticking a db.Key in your client-side template but the corresponding server-side endpoint is querying for an
4. Deploy the mechanical translation and squash any remaining bugs
- The bad news is that Python’s dynamic typing makes it difficult to perform this kind of refactoring with complete certainty. The good news is that at this point there probably aren’t too many conversion-related errors lurking in your code. Your users will find them all, and you’ll be able to catch and fix them quickly.
If you’ve made it this far, you’re in great shape. You have a solid NDB foundation and now the more advanced features are available for you to play with.
5. Start using the asynchronous API
- This is the first step in which I will recommend a thorough read-through of the relevant documentation before you continue. Really. There’s a lot of new stuff in there.
- Note that while NDB gets/puts/deletes, datastore RPCs (i.e. old-db operations), memcache operations, and urlfetch operations can all be auto-batched, they are only batched with operations of the same kind. The batching is also done within the limits of a single datastore RPC, so if you do a million urlfetches don’t expect to get all those results in one round-trip.
- Asynchronous control flow is done with tasklets, which are functions decorated with @ndb.tasklet. A tasklet returns a future, which you can get the result of by calling get_result, naturally. By convention, I append _async to the names of newly tasklet-ized functions. But what if that function needs to be called from existing synchronous code? A future is of little use there. You could upgrade your synchronous code to always call get_result after calling a tasklet, but a slightly nicer solution is this conditionally async decorator that introduces a
    from google.appengine.ext import ndb

    def tasklet(func):
        """Tasklet decorator that lets the caller specify either async or sync
        behavior at runtime.

        If make_sync is False (the default), the tasklet returns a future and
        can be used in asynchronous control flow from within other tasklets
        (like ndb.tasklet). If make_sync is True, the tasklet will wait for its
        results and return them, allowing you to call the tasklet from
        synchronous code (like ndb.synctasklet).
        """
        @ndb.utils.wrapping(func)
        def tasklet_wrapper(*args, **kwds):
            arg_name = "make_sync"
            sync_by_default = False
            make_sync = kwds.get(arg_name, sync_by_default)
            if make_sync:
                taskletfunc = ndb.synctasklet(func)
            else:
                taskletfunc = ndb.tasklet(func)
            if arg_name in kwds:
                del kwds[arg_name]
            return taskletfunc(*args, **kwds)
        return tasklet_wrapper
- A tasklet is written as a generator function, so to achieve parallelism your tasklet must make asynchronous calls (which could be to other tasklets) and yield when doing so. Results are returned by raising the special ndb.Return exception. This is a good example from the App Engine documentation:
    # from https://developers.google.com/appengine/docs/python/ndb/async

    @ndb.tasklet
    def get_cart_async(acct):
        cart = yield CartItem.query(CartItem.account == acct.key).fetch_async()
        yield ndb.get_multi_async([item.inventory for item in cart])
        raise ndb.Return(cart)

    @ndb.tasklet
    def get_offers_async(acct):
        offers = yield SpecialOffer.query().fetch_async(10)
        yield ndb.get_multi_async([offer.inventory for offer in offers])
        raise ndb.Return(offers)

    @ndb.tasklet
    def get_cart_plus_offers(acct):
        cart, offers = yield get_cart_async(acct), get_offers_async(acct)
        raise ndb.Return((cart, offers))
- Keep appstats handy to watch your waterfalls change. Unfortunately, the graphical waterfall is about all that’s useful for code paths that involve tasklets. The default stack limit for tracebacks is too small to be useful for coroutines, and they also prevent you from seeing basic information like what kind of entity a query is for. The App Engine team is aware of this, but they’ve deemed it a low-priority issue.
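
If the yield-then-raise style of tasklets looks odd, it may help to see the control flow stripped of everything NDB-specific. The sketch below is a toy driver, not NDB's event loop: Return and toy_synctasklet are hypothetical names, the yielded "operations" are plain callables, and nothing is batched or run in parallel. It only shows how a driver can resume a generator with each result and pick the final value out of an exception, which is a workaround for Python 2 generators not being able to return a value.

```python
import functools


class Return(Exception):
    """Hypothetical stand-in for ndb.Return: carries a tasklet's result."""

    def __init__(self, value=None):
        super(Return, self).__init__(value)
        self.value = value


def toy_synctasklet(func):
    """Drive a generator function to completion and return its result.

    Unlike NDB, this toy runs each yielded callable immediately and in
    order; it does not batch or overlap anything.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwds):
        gen = func(*args, **kwds)
        to_send = None
        try:
            while True:
                pending = gen.send(to_send)  # run until the next yield
                to_send = pending()          # "wait" for the yielded operation
        except Return as ret:
            return ret.value                 # the tasklet's final result
        except StopIteration:
            return None                      # finished without raising Return
    return wrapper


@toy_synctasklet
def get_total(prices):
    subtotal = yield (lambda: sum(prices))
    tax = yield (lambda: round(subtotal * 0.1, 2))
    raise Return(subtotal + tax)


total = get_total([10, 20])
```

In real NDB the thing you yield is a future (or a tuple of them), and the event loop is free to run other tasklets while yours is waiting. That is where the parallelism comes from.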
Finally, here are a couple of additional anecdotes that are somewhat specific to our codebase but worth sharing nonetheless:
- We quickly found out that trying to mix old-db and NDB entities is asking for trouble. We have Topic Tree code that organizes our topics, videos, and exercises into collections, meaning that to upgrade Video we’d also have to upgrade Topic and Exercise in one fell swoop. In theory you could pepper your code with isinstance checks to deal with both types appropriately, but in practice that’s really ugly.
- appengine-mapreduce supports mapping over NDB entities, but until recently there was a bug in the library that would cause yield op.db.Put(entity) to accumulate too many NDB entities and fail with a “datastore RPC too large” error. Luckily for you, this bug has since been fixed and doesn’t exist in later revisions.
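
For what it’s worth, the usual way to avoid that kind of “too large” failure in your own code is to flush writes in bounded batches instead of letting them pile up. This is a generic sketch and not the actual appengine-mapreduce fix: put_in_batches, the flush callback, and the batch size of 50 are all hypothetical names and numbers chosen for illustration.

```python
def put_in_batches(entities, flush, batch_size=50):
    """Call flush(batch) on successive groups of at most batch_size items."""
    batch = []
    for entity in entities:
        batch.append(entity)
        if len(batch) >= batch_size:
            flush(batch)   # hand off a full batch
            batch = []     # start a fresh one
    if batch:
        flush(batch)       # flush whatever is left over


# Record the size of each flushed batch instead of writing to a datastore.
flushed = []
put_in_batches(range(120), lambda b: flushed.append(len(b)), batch_size=50)
```

In an NDB app the flush callback could be something like ndb.put_multi, with the batch size tuned to how big your entities are.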
Don’t panic. Welcome to the future!