A side project of mine is to archive the air pollution data for the state of Texas from the Texas Commission on Environmental Quality (TCEQ). My archiver then tweets via @Kuukihouston whenever the concentration of certain compounds rises above thresholds that the EPA has deemed a health risk.

Recently I added support for automatically updating the list of locations it collects data from, rather than having a fixed list. Doing so is very straightforward: download the webpage, look for the <select> box that contains the sites, and scrape the value and text for each <option>.

There was only a single hiccup during development of this feature: the developers don’t close their <option> tags and instead rely on web browsers “to do the right thing”.

That is, their code looks like this:

        <option value="29">Oyster Creek [29]
        <option value="R">Channelview [R]

When it should look like this:

        <option value="29">Oyster Creek [29]</option>
        <option value="R">Channelview [R]</option>

Luckily, web browsers excel at guessing and fixing incorrect HTML. But I don’t rely on a web browser to parse the HTML; I’m using BeautifulSoup. BeautifulSoup’s html.parser closes the tags at the end of all of the options, i.e. just before the </select> tag. So when I try to get the text for the first option in the list, I get the text of the first option plus every following option.
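
To make the behavior concrete, here’s a small repro; the markup and site values are illustrative stand-ins, not TCEQ’s actual page:

from bs4 import BeautifulSoup

html = '<select><option value="29">Oyster Creek [29]<option value="R">Channelview [R]</select>'

soup = BeautifulSoup(html, 'html.parser')
# The unclosed options nest inside each other, so the first
# option's text includes the text of every option after it
print(soup.find('option').get_text())
# Oyster Creek [29]Channelview [R]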

The simple fix is to switch from the html.parser parser to the lxml parser, which closes each open <option> tag at the beginning of the next <option> tag, allowing me to get the text for each individual item.

# Bad: html.parser keeps the unclosed <option> tags open until </select>
soup = BeautifulSoup(response.text, 'html.parser')

# Good: lxml closes each <option> at the start of the next one
soup = BeautifulSoup(response.text, 'lxml')
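
Putting it all together, here’s a rough sketch of the whole scrape; the helper name, the URL handling, and the assumption that the first <select> on the page holds the site list are mine, not TCEQ’s:

import requests
from bs4 import BeautifulSoup

def fetch_sites(url: str) -> dict:
    """Return a mapping of site value -> site name from the page's <select> box."""
    response = requests.get(url)
    response.raise_for_status()
    # lxml closes each unclosed <option> at the start of the next one
    soup = BeautifulSoup(response.text, 'lxml')
    select = soup.find('select')  # assumes the first <select> is the site list
    return {
        option.get('value'): option.get_text(strip=True)
        for option in select.find_all('option')
    }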

When getting started with GraphQL and Python, most of the documentation is focused on the basics: basic queries, filtering using pre-built libraries, and so forth. This is great for quick “Hello World” APIs, but there isn’t much discussion of best practices for building out larger APIs, testing them, or maintaining them. Perhaps it’s just too early in the Python-GraphQL story for best practices to have been fully established and documented.

Introductory GraphQL examples online don’t really require much testing of the resolver specifically, because they just return the results of a Django QuerySet directly. For those types of fields, executing a full query is usually enough. But how do you handle more complex resolvers, ones that do some processing?

Accessing your resolver directly from a unit test is difficult and cumbersome. To properly test a resolver, you’re going to need to split the parts that warrant independent testing into their own functions / classes. Once they’re split out, you can pass in the required input, process it, and return the results.

However, passing or returning Graphene objects will make your functions difficult to test, in much the same way that calling your resolver outside of a GraphQL request is difficult: you can’t access the attribute values directly – they must be resolved.

blog = Blog(title="my title")
assert blog.title == "my title"  # fails

Where Blog is a Graphene object, the above test will fail: blog.title is not the string you’d expect, but a graphene wrapper that will only yield “my title” once it has passed through the GraphQL machinery.
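
You can see the value come back as a plain string only after a real query runs through the schema. Here’s a minimal sketch, assuming a trivial Query type wrapped around the Blog object above:

import graphene
from graphene.test import Client

class Blog(graphene.ObjectType):
    title = graphene.String()

class Query(graphene.ObjectType):
    blog = graphene.Field(Blog)

    def resolve_blog(root, info):
        return Blog(title="my title")

schema = graphene.Schema(query=Query)

# Only after executing a query does title resolve to a plain string
result = Client(schema).execute('{ blog { title } }')
assert result == {"data": {"blog": {"title": "my title"}}}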

There are two ways to work around this:

  1. Pass in `namedtuples` that match your Graphene objects attribute for attribute. This will become a maintenance headache: each time your object changes, you’ll also need to update the named tuples to match.
  2. Pass/return primitive values to your functions and box them into Graphene objects in your resolver directly before returning.

I’ve done both in my code and think that the second method is a best practice when writing GraphQL APIs.

By passing the values from graphene to your processing functions as primitives, your code is no longer tied directly to graphene. You can change frameworks and re-use the same code.

It’s also easier to write tests, as you pass in common objects like dict and int, which are easy to assert equality on and simpler to reason about.
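
To make that concrete, here’s a minimal sketch of the second approach; the count_words helper and the Blog / Query types are hypothetical stand-ins:

import graphene

def count_words(body: str) -> int:
    # Pure function: takes and returns primitives, no GraphQL machinery
    return len(body.split())

class Blog(graphene.ObjectType):
    title = graphene.String()
    word_count = graphene.Int()

class Query(graphene.ObjectType):
    blog = graphene.Field(Blog)

    def resolve_blog(root, info):
        # Fetch raw values (e.g. from the ORM), process them as primitives,
        # and only box them into the Graphene object right before returning
        body = "the post body, fetched from storage"
        return Blog(title="my title", word_count=count_words(body))

# The unit test never touches graphene at all
def test_count_words():
    assert count_words("one two three") == 3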

Takeaways:

  1. Break your logic out of your resolver and into specialized functions / classes.
  2. Pass/return primitive or other such values from these functions to maximize reuse.

On a project at work I’ve been learning GraphQL. I’m in charge of developing both the backend (using the wonderful graphene-django package) and the frontend (using TypeScript / Vue.js / axios) for this specific feature.

After wrapping my head around GraphQL and getting a basic endpoint working I wanted to write some tests to ensure that filtering and other requirements are working as expected.

For a proper end-to-end test I’d usually set up a client and post a JSON dictionary or the like. With end-to-end GraphQL tests you need to send actual GraphQL queries, which makes sense, but feels a bit like SQL injection.

Below is how I’ve settled on organizing and writing my GraphQL tests.

tests/
├── __init__.py
├── conftest.py
├── test_myapp/
│   └── test_schema.py

Because the GraphQL client will be reused across the Django site, I added a global fixture that automatically loads my project’s schema.

# tests/conftest.py
import pytest


@pytest.fixture
def graphql_client():
    from myproject.schema import schema
    from graphene.test import Client
    return Client(schema)

In this example I’m testing that data is filtered as expected when a search parameter is passed.

For setup, first I write the query as its own fixture so I can re-use it throughout the tests and it’s clear exactly what is going to be run. Second, I make sure the query uses variables instead of hard-coded values so I can change the input depending on the test. Third, I set up a model_bakery fixture for the data I expect to find.

import pytest
from model_bakery import baker


@pytest.mark.django_db
class TestMyModelData:
    @pytest.fixture
    def query(self):
        return """
        query testGetMyModel($searchParam: String!){
            myModelData(searchParam: $searchParam) {
                totalCount
            }
        }"""

    @pytest.fixture
    def my_model(self):
        baker.make(
            "myapp.MyModel",
            total_count="20",  # Decimal field
        )

    def test_none_response(self, graphql_client, query, my_model):
        # A search term that matches nothing should return no data
        executed = graphql_client.execute(query, variables={"searchParam": "no-match"})
        assert executed == {"data": {"myModelData": None}}

    def test_filters_usage(self, graphql_client, query, my_model):
        # A search term that matches the baked model should return its data
        params = {"searchParam": "skittles"}
        executed = graphql_client.execute(query, variables=params)
        assert executed == {
            "data": {
                "myModelData": {
                    "totalCount": 20
                }
            }
        }

To execute each test, I simply pass my query and the required variables to my graphql_client fixture. Here I’m testing the same query twice: once with a search term that matches nothing, expecting no results, and once with a term that matches my baked data, expecting the model’s values back.

As the return value from the client is a dictionary, I can simply assert my expected results against the actual results. If something changes, I’ll know immediately.

Using the techniques above, I can easily add new tests for my GraphQL endpoint as the available data changes or bugs are found.