Bookmark of Personal Data Warehouses: Reclaiming Your Data

I gave a talk yesterday about personal data warehouses for GitHub’s OCTO Speaker Series, focusing on my Datasette and Dogsheep projects. The video of the talk is now available, and …

So many good ideas in this talk.


  • I love the idea of standardizing all of your data into SQLite databases so you can freely explore it. I also love the idea of shipping a static dataset as a SQLite db inside a Docker image so you can "scale to zero".

One thing I've been wanting to do for a while is add some kind of public dashboard for my Airbot data. With something like Datasette, I could export subsets (or all) of it to SQLite and let anyone slice and dice the data at will.
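
A rough sketch of what that export step could look like, using only Python's standard library. The table name, the columns, and the `export_subset` helper are all made up for illustration; the real Airbot data would need its own schema.

```python
import sqlite3


def export_subset(rows, db_path, since):
    """Copy rows recorded at or after `since` into a SQLite file.

    `rows` is an iterable of (recorded_at, kind, value) tuples, where
    recorded_at is an ISO 8601 string. This shape is hypothetical.
    """
    conn = sqlite3.connect(db_path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "recorded_at TEXT PRIMARY KEY, kind TEXT, value REAL)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO events VALUES (?, ?, ?)",
            # ISO 8601 strings in the same format sort chronologically,
            # so a plain string comparison is enough here.
            [row for row in rows if row[0] >= since],
        )
    count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    conn.close()
    return count
```

The resulting file could then be opened with something like `datasette events.db` (the filename is just an example) to browse and query it.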

  • I also really like the idea of having cron/Lambda jobs set up to pull your personal data off the web automatically. Right now I'm only importing my Swarm check-ins and interactions with my syndicated tweets. Having some cron jobs that just collect the data into SQLite would let me explore my data much more easily.


  • There seems to be a recurring theme (maybe it's the holy grail) of nerds wanting to build their own search engines/portals for all of their data. In one sense it's a "solved" problem with Spotlight and other such tools. On the other hand, Spotlight and similar tools don't give you context.

    There was a tool under development in the early Mono days (written by Nat?) that did this, at least partially. If you were chatting with someone in Gaim it'd show you a window of your recent emails, their contact info, maybe their latest RSS feeds. I've always thought a tool like this would be killer - but with so much data sitting on servers and hidden behind APIs and proprietary services these days, it seems increasingly difficult.
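
Going back to the cron job idea above, here is a minimal sketch of the storage half of such a collector. The `save_items` helper and the `items` table are hypothetical, and the sketch assumes each fetched record is a dict with an `id` key; the actual fetching from each service is left out because it depends on that service's API.

```python
import json
import sqlite3


def save_items(conn, source, items):
    """Store fetched items keyed by (source, id).

    Re-running the job replaces rows that already exist instead of
    duplicating them, so it is safe to schedule repeatedly.
    Returns the number of rows now stored for `source`.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS items ("
            "source TEXT, id TEXT, fetched_json TEXT, "
            "PRIMARY KEY (source, id))"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO items (source, id, fetched_json) "
            "VALUES (?, ?, ?)",
            [
                # Keep the raw record as JSON so nothing is lost up front.
                (source, str(item["id"]), json.dumps(item, sort_keys=True))
                for item in items
            ],
        )
    return conn.execute(
        "SELECT COUNT(*) FROM items WHERE source = ?", (source,)
    ).fetchone()[0]
```

A scheduled script would open a connection with `sqlite3.connect(...)`, fetch whatever is new from each service, and pass the results to `save_items`. Storing the raw JSON first keeps the collector simple; reshaping into proper columns can happen later, once it's clear which questions are worth asking of the data.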