Aggregating Nested Values with JSONFields

Note: I originally wrote this for the Kwoosh Workshop blog. Since it's no longer around and I'm posting it here.


Integrating with 3rd party APIs you'll often need to implement a webhook for your application to respond to external events. It's useful to take the event data and store it in your Database for future reference.

As Kwoosh is run out of Texas, users in Texas need to pay an 8.25% sales tax. To my surprise Stripe's
dashboard doesn't seem to offer an answer to a simple question: How much did I collect that was taxable
and how much do I need to remit to the state for last quarter's sales.

Armed with our event log data and Python we can quickly get that the numbers we're looking for.

Our data looks something like this. Naturally a real event would have a lot more data, but for our purposes
today this should suffice.

{
"data": {
"object": {
"tax": 25
...
}
}
}

This data is sent with each invoice.payment_successful event and it's saved in aa JSONField in our database. Using the KeyTransform we can extract values from our JSONField and annotate them
to our QuerySet. Even better, as of Django 1.11 we can nest our KeyTransform so we can extract values
that are multiple levels deep.

Our plan of attack is to annotate the value, sum them together, and return them with the period.
Unfortunately we have to sum them in Python until bug #29139
is fixed. We're not summing millions of rows so it's still quick enough.

from functools import reduce

from django.contrib.postgres.fields.jsonb import KeyTransform

from accounts.models import StripeEvent

def accumulate_tax(start_date, end_date):
"""
Can't aggregate KeyTransformed objects directly, so summing manually
Reference: https://code.djangoproject.com/ticket/29139
"""
events = StripeEvent.objects.filter(
created__range=(start_date, end_date),
type='invoice.payment_succeeded'
).annotate(
tax=KeyTransform('tax', KeyTransform('object', KeyTransform('data', 'data'))),
).values('tax')

if events:
tax_total = reduce(lambda x, y: dict((k, (v or 0) + (y[k] or 0),)
for k, v in x.items()), events)
else:
tax_total = {'tax': 0}

for key, value in tax_total.items():
tax_total[key] = '${:,.2f}'.format(value / 100)

return {'start': start_date, 'end': end_date, 'totals': tax_total}

And just like that we can aggregate values from our JSONFields. If you want to sum other fields
you can simply add another line like this, but replace "tax" with the field you want to sum.

To make it a bit more useful, I built a second function that uses this function to give me quarterly sums.

import datetime

QUARTERS = (
((1, 1), (3, 31)),
((4, 1), (6, 30)),
((7, 1), (9, 30)),
((10, 1), (12, 31)),
)

def calculate_quarters():
today = datetime.date.today()
last_year = today.year - 1

for start, end in QUARTERS:
quarter_start = datetime.date(year=today.year, month=start[0], day=start[1])
year = today.year
if quarter_start > today:
year = last_year

yield accumulate_tax(datetime.datetime(year=year, month=start[0],
day=start[1], hour=0, minute=0,
second=0),
datetime.datetime(year=year, month=end[0],
day=end[1], hour=23, minute=59,
second=59))

We can simply iterate over this function and generate a simple report that shows us a quarterly breakdown
of our sales and taxes collected.