The GAE Datastore: Living with Eventual Consistency
Google just opened up App Engine to the public, and the web development world is reeling. We get Google's infrastructure for free (mostly), but there's a catch: we have to give up our beloved relational databases. No JOINs. No ACID transactions across the whole DB. Welcome to the App Engine Datastore.
Bigtable and Entity Groups
The Datastore is built on top of Bigtable. To achieve massive scale, Google distributes your data across many machines. Because of this, global queries are eventually consistent. If you write a record and immediately try to query for it, it might not show up yet.
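To get strong consistency, you must use Entity Groups. An entity group is a root entity plus every entity that names it as an ancestor. By defining a parent-child relationship between entities, you tell Google to keep that data together in the same "node," which is what makes transactions and strongly consistent reads within the group possible.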
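To make the difference concrete, here is a toy in-memory model. It is not the Datastore API and not how Bigtable actually works: ToyDatastore, put, apply_pending, global_query, and ancestor_query are all made-up names, and the "pending" list is only a stand-in for the lag before a write shows up in global query results. Keys are tuples of (kind, name) pairs, so a child's key starts with its parent's key.

```python
class ToyDatastore(object):
    """Toy model of eventual vs. strong consistency. Not the App Engine API."""

    def __init__(self):
        self.entities = {}         # key path -> properties, updated on every put
        self.global_index = set()  # key paths that global queries can see
        self.pending = []          # writes not yet visible to global queries

    def put(self, path, props):
        # The entity itself is stored right away...
        self.entities[path] = dict(props)
        # ...but the global index only learns about it later.
        self.pending.append(path)

    def apply_pending(self):
        # Simulates the moment the system "catches up".
        self.global_index.update(self.pending)
        self.pending = []

    def global_query(self, kind):
        # Reads the lagging index, so it can miss recent writes.
        return sorted(p for p in self.global_index if p[-1][0] == kind)

    def ancestor_query(self, kind, ancestor):
        # Stays inside one entity group and reads current data.
        return sorted(
            p for p in self.entities
            if p[-1][0] == kind and p[:len(ancestor)] == ancestor
        )


store = ToyDatastore()
author_key = (("Author", "homer"),)
story_key = author_key + (("Story", "iliad"),)

store.put(author_key, {"name": "Homer"})
store.put(story_key, {"title": "Iliad"})

stale = store.global_query("Story")               # write not visible yet
fresh = store.ancestor_query("Story", author_key)  # write visible immediately
store.apply_pending()
caught_up = store.global_query("Story")            # now the global query sees it
```

Right after the writes, stale is empty while fresh already contains the new story. Only after apply_pending() does the global query return it. The real system gives you no such method to call, of course; you just wait.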
Defining Models in Python
In 2008, GAE only supports Python. Here is how we define a strongly consistent relationship:
from google.appengine.ext import db


class Author(db.Model):
    name = db.StringProperty(required=True)


class Story(db.Model):
    # Back-reference: author.stories returns the stories pointing at an Author
    author = db.ReferenceProperty(Author, collection_name='stories')
    title = db.StringProperty()
    body = db.TextProperty()
    created = db.DateTimeProperty(auto_now_add=True)


def create_story(author_key, title, body):
    # To ensure strong consistency, we make the Author the parent, which puts
    # the Story in the Author's entity group. We also set the author reference
    # so that the 'stories' back-reference is populated.
    story = Story(parent=author_key, author=author_key, title=title, body=body)
    story.put()
    return story
The Ancestor Query
If you want to see the latest stories from an author and be sure you aren't seeing stale data, you use an Ancestor Query:
author = Author.all().filter('name =', 'Homer').get()
if author is None:
    stories = []  # the global lookup found no matching author (yet)
else:
    # This query is strongly consistent because it stays within the entity group
    stories = Story.all().ancestor(author.key()).order('-created').fetch(10)
The Cost of Consistency
Entity Groups aren't a silver bullet. You are limited to roughly one write per second per entity group. If you try to put all your users in one group to get global consistency, every write in the app competes for that single one-write-per-second budget, and your app will grind to a halt as soon as you get any traffic. You have to design your data model to be "sharded" by default, embracing eventual consistency where it doesn't matter (like a global list of public posts) and using Entity Groups only where it's vital (like a user's private settings).
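One common way to live with the write limit is to split a hot value across several shards. The sketch below is plain Python with a list standing in for the shards; in a real app each shard would be its own entity in its own entity group. ShardedCounter and approx_max_writes_per_second are hypothetical names, and the one-write-per-second default is just the rough figure quoted above.

```python
import random


class ShardedCounter(object):
    """In-memory sketch of a sharded counter. Not the App Engine API."""

    def __init__(self, num_shards=20, rng=None):
        # Each slot stands in for one shard entity in its own entity group.
        self.shards = [0] * num_shards
        self.rng = rng if rng is not None else random.Random()

    def increment(self):
        # Pick a shard at random so writes spread across entity groups.
        index = self.rng.randrange(len(self.shards))
        self.shards[index] += 1
        return index

    def total(self):
        # Reading the counter means summing every shard.
        return sum(self.shards)


def approx_max_writes_per_second(num_shards, writes_per_group_per_second=1.0):
    # Back-of-the-envelope ceiling: each shard has its own write budget.
    return num_shards * writes_per_group_per_second


counter = ShardedCounter(num_shards=5, rng=random.Random(0))
for _ in range(100):
    counter.increment()

page_views = counter.total()
write_ceiling = approx_max_writes_per_second(5)
```

The trade-off is that reads get more expensive (you sum every shard) and, if the shards are fetched with a global query, the total may lag slightly behind. For a page-view counter that is usually fine; for a bank balance it is not.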