Update: As of version 1.4 Django supports SELECT FOR UPDATE.
A frequently overlooked feature of SQL is the FOR UPDATE
clause on SELECT
.
Even in this age of NoSQL datastores, most large-scale webapps will use a
relational database somewhere, and good money would bet on most having a
transaction race condition somewhere.
Transaction isolation in a nutshell
Transaction isolation is the mechanism by which a SQL database prevents
multiple in-progress transactions from interfering with each other. Postgres
provides two different levels, read committed and serializable. Read
committed is the default and is sufficient for the vast majority of
operations. A simple mental model for how this works is to have a dirty
flag on each row. Any time a row is touched by an UPDATE
or DELETE
statement,
the dirty is set. Similarly any time a SELECT
is about to operate on a dirty
row, more specifically one which was marked dirty by another transaction, it
will block and wait for the other transaction to either commit or rollback and
will then update the value it was trying to access.
Using UPDATE
As long as you can express all your operations using SQL expressions in an
UPDATE
, this isolation is sufficient. For example, you can increment an
integer column via UPDATE foo SET x=x+1 WHERE id=1;
. Unfortunately Django
doesn’t make this easy. The naive way to perform the same operation is:
obj = Foo.objects.get(id=1)
obj.x += 1
obj.save()
This means that two requests can come in simultaneously and the second will clobber the first:
A: obj = Foo.objects.get(id=1) # obj.x is now 1
B: obj = Foo.objects.get(id=1)
A: obj.x += 1 # obj.x is now 2
B: obj.x += 1
A: obj.save() # obj.x saved as 2, as expected
B: obj.save() # obj.x is still only 2, when it should be 3
To do this correctly in plain Django calls, we can use QuerySet.update and an F expression:
Foo.objects.filter(id=1).update(x=F('x')+1)
This works, but the syntax is a bit unfortunate, even moreso when you just want to update a field on a model you already have. I highly recommend using Andy McCurdy’s update method:
obj = Foo.objects.get(id=1)
update(obj, x=F('x')+1)
Using update()
like this also has the added advantage of only sending the
given fields to the database, as opposed to save()
which serializes the
entire model (possibly causing race conditions even on fields you didn’t
modify).
The need for FOR UPDATE
Where things start to break down is when you need to update a row using more
complex code. One option is to use stored procedures, but this is effectively
impossible to do while keeping mutli-database compatibility. The other option
is to do the computation in Python code. Without FOR UPDATE
this puts us
back into race-condition territory in the same way as Model.save()
. What
FOR UPDATE
does is to set the same dirty flags that UPDATE
and DELETE
use before returning the rows. This means that no other transaction can
alter it.
Unfortunately the Django ORM doesn’t yet expose FOR UPDATE
as part of
the query system, but with some creativity we can add it in. Much of the
credit for this goes to Alexander Artemenko
who wrote the initial version of the helper.
from django.db import models, connections
from django.db.models.query import QuerySet
class ForUpdateQuerySet(QuerySet):
def for_update(self):
if 'sqlite' in connections[self.db].settings_dict['ENGINE'].lower():
# Noop on SQLite since it doesn't support FOR UPDATE
return self
sql, params = self.query.get_compiler(self.db).as_sql()
return self.model._default_manager.raw(sql.rstrip() + ' FOR UPDATE', params)
class ForUpdateManager(models.Manager):
def get_query_set(self):
return ForUpdateQuerySet(self.model, using=self._db)
Then all you have to do is inherit from ForUpdateManager
in your manager and
use for_update()
at the end of the filter chain:
qs = Foo.objects.filter(id=1).for_update()
obj = qs[0]
update(obj, x=something_complex())
An example
Our main use case at Atari for all of this is updating a user’s billing information, specifically their next billing date for people on monthly cycles. Due to the nature of MMOs, it is possible that someone could redeem a game-time card at the exact moment that a background task is processing them for monthly billing. At heart this code boils down to:
user = User.objects.filter(id=1).for_update()[0]
new_date = user.next_billing_date + relativedelta(months=1, day=user.next_billing_day)
update(user, next_billing_date=new_date)
This is both transactionally safe and concise, which is just about all you can ask for from a SQL database.