Django has a nice security feature that verifies the request’s Host header against the ALLOWED_HOSTS whitelist and returns an error if the requesting host is not in the list. Often you’ll see this when first setting up an app: you only expect requests to app.example.com, but some bot makes a request to <server ip address>.

While it’s not strictly harmful, in theory, to add your server IP to ALLOWED_HOSTS, doing so does allow bots to easily reach and fire requests at your Django app, which needlessly consumes resources on your app server. It’s better to filter out these requests before they ever reach your app server.

For HTTP requests, you can block them by adding an nginx default_server block that acts as a catch-all. The server block that proxies to your app then sets its server_name to a domain in your ALLOWED_HOSTS. This simple configuration will prevent http://<server ip address> requests from ever reaching your app server.


# default.conf
server {
  listen 80 default_server;
  return 444;
}

# app.conf
upstream app_server {
  server 127.0.0.1:8000 fail_timeout=0;
}

server {
  listen 80;
  server_name {{ WEB_SERVER_NAME }};

  access_log /var/log/nginx/access.log access_json;
  error_log /var/log/nginx/error.log warn;

  location /static/ {
    alias /var/app/static/;
  }

  location / {
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Request-Id $request_id;
    proxy_redirect off;
    proxy_pass http://app_server;
  }
}

However, once you enable SSL with Let’s Encrypt, server_name matching no longer protects you: since there is only one SSL-enabled server block by default, nginx routes all HTTPS traffic to that block regardless of host. What this means is that while requests made to http://<server ip address> will continue to be blocked, requests to https://<server ip address> will begin to be forwarded to your Django app server, resulting in errors. Yikes!

The solution is to add a default SSL-enabled server, much like your HTTP configuration. The only tricky bit is that every SSL server block must also have a valid SSL certificate configuration. Rather than making a self-signed certificate, I reused my Let’s Encrypt SSL configuration.

# default.conf
server {
  listen 80 default_server;
  return 444;
}

server {
  listen 443 ssl default_server;
  ssl_certificate /etc/letsencrypt/live/{{ WEB_SERVER_NAME }}/fullchain.pem;
  ssl_certificate_key /etc/letsencrypt/live/{{ WEB_SERVER_NAME }}/privkey.pem;
  include /etc/letsencrypt/options-ssl-nginx.conf;
  ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;

  if ($host != {{ WEB_SERVER_NAME }}) {
    return 444;
  }
}

By adding a default SSL server to your nginx config, your server_name settings will be respected, and requests that do not match your host name will no longer be forwarded to your app server.

Recently at work I’ve been working quite a bit with Django and GraphQL. There doesn’t seem to be much written about best practices for organizing Graphene-Django projects, so I’ve decided to document what’s working for me. In this example I have three Django apps: common, foo, and hoge.

There are two main goals for this architecture:

  1. Minimize importing from “outside” apps.
  2. Keep testing simple.

Queries and Mutations Package

Anything beyond a simple query (i.e. a query that just returns all records of a given model) is implemented in its own file in the queries or mutations sub-package. Each file is as self-contained as possible and contains any type definitions specific to that query, forms for validation, and an object that can be imported by the app’s schema.py.
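
To illustrate, here is a hedged sketch of what one of these self-contained query files might look like; complex_query, ComplexResultType, and the Hoge model fields are placeholders, not actual project code:

# hoge/queries/complex_query.py (hypothetical)
import graphene

from hoge import models


class ComplexResultType(graphene.ObjectType):
    # Type definition specific to this query lives in the same file.
    name = graphene.String()
    total = graphene.Int()


class Query:
    # Imported by hoge/schema.py and combined into the app's Query class.
    complex_query = graphene.List(ComplexResultType, search=graphene.String())

    @staticmethod
    def resolve_complex_query(root, info, search=None):
        qs = models.Hoge.objects.all()
        if search:
            qs = qs.filter(name__icontains=search)
        return [ComplexResultType(name=obj.name, total=obj.total) for obj in qs]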

Input Validation

All input validation is performed by a classic Django form instance. For ease of use, the Django form’s input does not necessarily match the GraphQL input. Consider a mutation that sends a list of dictionaries, each with an object id.

{
  "foos": [
    {
      "id": 1,
      "name": "Bumble"
    },
    {
      "id": 2,
      "name": "Bee"
    }
  ]
}

Before processing the request, you want to validate that the ids passed actually exist and are referenceable by the user making the request. Writing a Django form field to handle this input would be time consuming and potentially error prone. Instead, each form has a class method called convert_graphql_input_to_form_input which takes the mutation input object and returns a dictionary that can be passed to the form to clean and validate.

from django import forms

from foo import models


class UpdateFooForm(forms.Form):
    foos = forms.ModelMultipleChoiceField(queryset=models.Foo.objects)

    @classmethod
    def convert_graphql_input_to_form_input(cls, graphql_input: "UpdateFooInput"):
        # UpdateFooInput is the mutation's graphene input type, defined alongside this form.
        return {"foos": [foo["id"] for foo in graphql_input.foos]}

Extra Processing

Extra processing before save is handled by the form in a prepare_data method. The role this method plays is to prepare any data prior to, and without, saving. Usually I’ll prepare model instances, set values on existing instances, and so forth. This allows the save() method to use bulk_create() and bulk_update() easily, keeping save() doing just that – saving.

Objects or lists of objects that will be saved / bulk-created / bulk-updated in save() are stored on the form. These are defined and set in __init__ with full type hints. Example:

from typing import List, Optional

from django import forms

from foo import models


class UpdateFooForm(forms.Form):
    foos = forms.ModelMultipleChoiceField(queryset=models.Foo.objects)

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.foo_bars: List[models.FooBar] = []
        self.bar: Optional[models.Bar] = None
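
Continuing the form above, here is a minimal sketch of what prepare_data() and save() might look like under this pattern; FooBar and its fields are placeholders, not real project models:

    # (continuing UpdateFooForm from above)

    def prepare_data(self):
        # Build unsaved instances from cleaned data; no database writes happen here.
        for foo in self.cleaned_data["foos"]:
            self.foo_bars.append(models.FooBar(foo=foo, label=foo.name))

    def save(self):
        # save() only persists what prepare_data() already built.
        models.FooBar.objects.bulk_create(self.foo_bars)
        return self.foo_bars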

Type Definition Graduation

Types are defined in each query / mutation where possible. As the schema grows and multiple queries/mutations, or other apps’ queries/mutations, reference the same type, the location where the type is defined changes. This is partially for a cleaner architecture, but also to avoid import errors.

└── apps
    ├── common
    │   ├── schema.py
    │   └── types.py  # global types used by multiple apps are defined here
    └── hoge
        ├── mutations
        │   ├── create_hoge.py  # types only used by create_hoge are in here
        │   └── update_hoge.py
        ├── queries
        │   └── complex_query.py
        ├── schema.py
        └── types.py  # types used by either create/update_hoge or complex_query are defined here
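
As a sketch of what a “graduated” type might look like once it moves to the app-level types.py (the field names here are assumptions):

# hoge/types.py (hypothetical)
import graphene


class HogeType(graphene.ObjectType):
    # Shared by more than one query/mutation in this app, e.g. the mutation example below.
    id = graphene.ID(required=True)
    name = graphene.String()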

Example Mutation

The logic kept inside a query/mutation is as minimal as possible, because it’s difficult to test logic inside a mutation without writing a full-blown end-to-end test.

import graphene
from graphene_django.types import ErrorType

# HogeType, UpdateHogeInputType, and UpdateHogeForm are defined or imported
# earlier in this same self-contained mutation file.


class UpdateHogeReturnType(graphene.Union):
    class Meta:
        types = (HogeType, ErrorType)


class UpdateHogeMutationType(graphene.Mutation):

    class Meta:
        output = graphene.NonNull(UpdateHogeReturnType)

    class Arguments:
        update_hoge_input = UpdateHogeInputType()

    @staticmethod
    def mutate(root, info, update_hoge_input: UpdateHogeInputType):
        data = UpdateHogeForm.convert_graphql_input_to_form_input(update_hoge_input)
        form = UpdateHogeForm(data=data)
        if form.is_valid():
            form.prepare_data()
            return form.save()
        # ErrorType.from_errors() returns a list of ErrorType instances; return
        # the first one to satisfy the single-object union output.
        errors = ErrorType.from_errors(form.errors)
        return errors[0]

Adding Queries/Mutations to your Schema

This architecture tries to consistently follow the graphene standard for defining a schema: you create a Query class and a Mutation class, then pass those to your schema with schema = Schema(query=Query, mutation=Mutation).

Each app builds its own Query and Mutation objects. These are then imported in common/schema.py, combined into new Query and Mutation classes, and passed to the schema.

# hoge/mutations/update_hoge.py

class Mutation:

    update_hoge = UpdateHogeMutationType.Field()

# hoge/schema.py

from .mutations import update_hoge, create_hoge


class Mutation(update_hoge.Mutation,
               create_hoge.Mutation):
    pass

# common/schema.py

import graphene

import foo.schema
import hoge.schema

class Query(hoge.schema.Query, foo.schema.Query, graphene.ObjectType):
    pass

class Mutation(hoge.schema.Mutation, foo.schema.Mutation, graphene.ObjectType):
    pass

schema = graphene.Schema(query=Query, mutation=Mutation)

Directory Tree Overview

└── apps
    ├── common
    │   ├── schema.py
    │   └── types.py
    ├── foo
    │   ├── mutations
    │   │   └── create_or_update_foo.py
    │   ├── queries
    │   │   └── complex_foo_query.py
    │   └── schema.py
    └── hoge
        ├── mutations
        │   ├── common.py
        │   ├── create_hoge.py
        │   └── update_hoge.py
        ├── queries
        │   └── complex_query.py
        ├── schema.py
        └── types.py

A side project of mine archives air pollution data for the state of Texas from the Texas Commission on Environmental Quality (TCEQ). The archiver then tweets via @Kuukihouston when concentrations of certain compounds rise above thresholds that the EPA has deemed a health risk.

Recently I added support for automatically updating the list of locations it collects data from, rather than relying on a fixed list. Doing so is very straightforward: download the webpage, look for the <select> box that contains the sites, and scrape the value and text of each <option>.

There was only a single hiccup during development of this feature: the developers don’t close their <option> tags and instead rely on web browsers “to do the right thing”.

That is, their code looks like this:

<option>Oyster Creek [29]
<option>Channelview [R]

When it should look like this:

<option>Oyster Creek [29]</option>
<option>Channelview [R]</option>

Luckily, web browsers excel at guessing and fixing incorrect HTML. But I don’t rely on a web browser to parse the HTML; I’m using BeautifulSoup. BeautifulSoup’s html.parser closes the open tags at the end of all of the options, i.e. just before the </select> tag. As a result, when I try to get the text for the first option in the list, I get the text for the first option plus every following option.

The simple fix is to switch from html.parser to the lxml parser, which closes each open <option> tag at the beginning of the next <option> tag, allowing me to get the text for each individual item.

# Bad
soup = BeautifulSoup(response.text, 'html.parser')
# Good
soup = BeautifulSoup(response.text, 'lxml')
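
For context, here is a hedged sketch of the site-list scrape itself; the page URL and the exact <select> markup are assumptions, not the real TCEQ details:

import requests
from bs4 import BeautifulSoup

page_url = "https://..."  # placeholder for the TCEQ page containing the site <select> box

response = requests.get(page_url)
soup = BeautifulSoup(response.text, "lxml")  # lxml closes the dangling <option> tags

sites = {}
for option in soup.find("select").find_all("option"):
    # value is the site id, text is the human-readable name, e.g. "Oyster Creek [29]"
    sites[option.get("value")] = option.get_text(strip=True)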

When getting started with GraphQL and Python, most of the documentation you’ll find focuses on the basics: simple queries, filtering using pre-built libraries, and so forth. This is great for quick “Hello World” APIs, but there isn’t much discussion of best practices for building out larger APIs, testing, or maintenance. Perhaps it’s just too early in the Python-GraphQL story for best practices to have been fully established and documented.

Introductory GraphQL examples online don’t really require much testing of the resolver specifically, because these examples just return the results of a Django queryset directly. For those types of fields, executing a full query is usually enough. But how do you handle more complex resolvers, ones that do some processing?

Accessing your resolver directly from a unit test is difficult and cumbersome. To properly test a resolver, you’re going to need to split the parts that warrant independent testing into their own functions / classes. Once they’re split out, you can pass in the required input, process it, and return the results.

However, passing or returning Graphene objects to your functions makes testing them difficult in much the same way that calling your resolver outside of a GraphQL request is difficult: you can’t access the attribute values directly – they must be resolved.

blog = Blog(title="my title")
assert blog.title == "my title"  # fail

Where Blog is a Graphene object, the above test will fail: blog.title will not be a string as you’d expect, but rather a Graphene wrapper that will eventually return “my title” when passed through the GraphQL machinery.

There are two ways to work around this:

  1. Pass in `namedtuples` that match your Graphene objects attribute for attribute. This will become a maintenance headache: each time your object changes, you’ll also need to update your named tuples to match.
  2. Pass primitive values into your functions, return primitives from them, and box them into GraphQL objects in your resolver directly before returning.

I’ve done both in my code and think that the second method is a best practice when writing GraphQL APIs.

By passing the values from Graphene to your processing functions as primitives, your code is no longer tied directly to Graphene. You can change frameworks and re-use the same code.

It’s also easier to write tests, as you pass in common objects like dict and int, which are easy to compare for equality and simpler to reason about.
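
Here is a minimal sketch of the second approach; Blog, clean_titles, and the field names are illustrative, not code from a real project:

import graphene


def clean_titles(titles):
    # Plain function over plain values: trivially unit-testable with lists of strings.
    return [t.strip().title() for t in titles if t and t.strip()]


class Blog(graphene.ObjectType):
    title = graphene.String()


class Query(graphene.ObjectType):
    blogs = graphene.List(Blog, titles=graphene.List(graphene.String))

    @staticmethod
    def resolve_blogs(root, info, titles=None):
        # Unbox to primitives, process, and only box into Graphene objects at the end.
        return [Blog(title=t) for t in clean_titles(titles or [])]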

Takeaways:

  1. Break your logic out of your resolver and into specialized functions / classes.
  2. Pass/return primitive or other such values from these functions to maximize reuse.

When sharing photos at work, most of my co-workers simply post a link to Google Photos in our company Slack. As an iCloud user, I thought my photos were only visible on my Mac or iPhone – machines logged in to my Apple account and set up to sync photos. So if I wanted to share photos with co-workers on Slack, I had to either upload them to Flickr or upload them directly into Slack. I always just uploaded them into Slack.

I just realized today that I can view all of my photos from the web via iCloud Photos. What’s more, I can share photos with a URL just like my co-workers have been doing with Google Photos. The shared link also expires after one month, which is a nice additional security / privacy feature.

Knowing that I can access my photos outside of Apple devices eases my mind. While I can’t ever see myself switching to Android from iOS, I could see myself using a Thinkpad + Linux for my desktop computing needs.

My website has recently come full circle back to WordPress. It’s been a number of years since I’ve used WordPress. The last time was probably in college on the cheapest shared host I could find. I avoided coming back to WordPress because I didn’t want to maintain a server; I fiddle enough with them at work. Already being a Digital Ocean customer, the 1-Click setup/hardened server seemed like the best way to go. 

I quickly got it configured with all of the IndieWeb plugins to facilitate back-feeding content that I create on other platforms onto my website. The final step was starting to use MarsEdit, my old favorite blog editor. Except it couldn’t connect to my website.

It turns out the reason is that the majority of WordPress security issues stem from bots abusing the XML-RPC API, so the Digital Ocean install blocks it at a low level by default. Disabling this block on the server allows programs to use the XML-RPC API, and hence MarsEdit to work. Execute the following commands to disable the block.

sudo a2disconf block-xmlrpc
sudo systemctl reload apache2

Back when Macs came with modems built in, the one feature that I thought was super cool but never used was the Print to Remote Fax feature. It was one of those features like “Duh, of course. We can create PDFs. Naturally we can send them over the modem for printing.”

On a project at work I’ve been learning GraphQL. I’m in charge of both developing the backend (using the wonderful graphene-django package) and the frontend (using TypeScript / Vue.js / axios) for this specific feature.

After wrapping my head around GraphQL and getting a basic endpoint working I wanted to write some tests to ensure that filtering and other requirements are working as expected.

For a proper end-to-end test I’d usually set up a client and post a JSON dictionary or similar. With end-to-end GraphQL tests you need to send actual GraphQL queries, which makes sense, but feels a bit like SQL injection.

Below is how I’ve settled on organizing and writing my GraphQL tests.

tests/
├── __init__.py
├── conftest.py
└── test_myapp/
    └── test_schema.py

Because the GraphQL test client will be reused across the Django site’s tests, I added a global fixture that automatically loads my project’s schema.

# tests/conftest.py
import pytest


@pytest.fixture
def graphql_client():
    from myproject.schema import schema
    from graphene.test import Client

    return Client(schema)

In this example I’m testing that I’m filtering data as expected when passing a search parameter.

For setup, first I write the query as its own fixture so I can re-use it throughout the tests and it’s clear exactly what is going to be run. Second, I make sure the query uses variables instead of hard-coded values so I can change the input depending on the test. Third, I set up a model_bakery fixture for the data that I expect to find.

import pytest
from model_bakery import baker


@pytest.mark.django_db
class TestMyModelData:
    @pytest.fixture
    def query(self):
        return """
        query testGetMyModel($searchParam: String!){
            myModelData(searchParam: $searchParam) {
                totalCount
            }
        }"""

    @pytest.fixture
    def my_model(self):
        baker.make(
            "myapp.MyModel",
            total_count="20",  # Decimal field
        )

    def test_none_response(self, graphql_client, query, my_model):
        executed = graphql_client.execute(query, variables={"searchParam": "skittles"})
        assert executed == {"data": {"myModelData": None}}

    def test_filters_usage(self, graphql_client, query, my_model):
        params = {"searchParam": "skittles"}
        executed = graphql_client.execute(query, variables=params)
        assert executed == {
            "data": {
                "myModelData": {
                    "totalCount": 20
                }
            }
        }

To execute each test I simply pass my query and the required variables to my graphql_client fixture. Here I’m testing the same query twice: in one case I expect no matching results, and in the other I expect the data I set up with model_bakery.

As the return value from our client is a dictionary, I can simply assert that my expected results match the actual results. If something changes I’ll know immediately.

Using the techniques above I can easily add new tests for my GraphQL endpoint as it changes or as bugs are found.

When testing in Django there are two basic ways to make an end-to-end test for your view: use the test client to send a request to the server, or create a fake request object and manually call your view function.

One isn’t “better” than the other, but I’ve come to prefer the test client over the fake request for the following reasons (a short sketch contrasting the two styles follows the list):

  1. Client tests hit the entire stack of code before executing your view, allowing you to catch any conflicts between middleware or settings and your view.
  2. URL path tests come for free. When testing with fake request objects you can put any path you’d like in there and the view will still execute, missing that bad merge where a URL config change removed an endpoint.
  3. It’s (slightly) easier to reason about. If I’m writing a test to confirm X happens when Y is posted I make Y and post it rather than making an object that pretends Y was posted.
  4. It removes the friction to refactor your views. As long as the url stays the same, you can rename and move your view however you’d like without changing any of the tests. This makes it easier to create a more consistent codebase e.g. some views use the verb “save” while others use “register”.
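
To make the comparison concrete, here is a minimal sketch of both styles using pytest-django; the "register" URL name, register_view, and the expected redirect are hypothetical:

from django.test import Client, RequestFactory
from django.urls import reverse

from myapp.views import register_view  # hypothetical view


def test_register_with_client(db):
    # Test-client style: the request passes through URL routing and middleware.
    response = Client().post(reverse("register"), {"name": "Bumble"})
    assert response.status_code == 302


def test_register_with_fake_request(db):
    # Fake-request style: the view is called directly, so routing is never exercised.
    request = RequestFactory().post("/any/path/", {"name": "Bumble"})
    response = register_view(request)
    assert response.status_code == 302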