Personal Data Warehouses: Reclaiming Your Data by Simon Willison
I gave a talk yesterday about personal data warehouses for GitHub’s OCTO Speaker Series, focusing on my Datasette and Dogsheep projects. The video of the talk is now available, and …

So many good ideas in this talk.

  1. I love this idea of standardizing all of your data to SQLite databases so you can freely explore it. I also love the idea of shipping static datasets inside a SQLite db inside a Docker image so you can “scale to zero”.

    One thing I’ve been wanting to do for a while is add some kind of public dashboard for my Airbot data. Using something like Datasette I could export subsets (or all of it) to SQLite and let you slice and dice the data at will.

  2. I also really like the idea of having automated cron/lambda jobs set up to pull your personal data off the web. Right now I’m only importing my Swarm check-ins and interactions with my syndicated tweets. Having some automated cron jobs that just collect the data into SQLite would let me explore my data much more easily.

  3. There seems to be a recurring theme (maybe it’s the holy grail) of nerds wanting to build their own search engines/portals for all of their data. In one sense it’s a “solved” problem with Spotlight and other such tools. On the other hand, Spotlight and these tools don’t provide you with context.

    There was a tool under development in the early Mono days (written by Nat?) that did this, at least partially. If you were chatting with someone in Gaim it’d show you a window of your recent emails with them, their contact info, maybe their latest RSS feeds. I’ve always thought a tool like this would be killer – but with so much data sitting on servers and hidden behind APIs and proprietary services these days, it seems increasingly difficult.
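
For the cron job idea in point 2, here is a minimal sketch of what the "collect to SQLite" step might look like, using only Python's standard library. The `checkins` table, its columns, and the `save_checkins` helper are all made up for illustration; fetching the records from Swarm or anywhere else is left out, since that depends on each service's API. The one thing worth getting right is that re-running the job should not duplicate rows, so this keys on the record id and uses `INSERT OR REPLACE`.

```python
import sqlite3


def save_checkins(conn, checkins):
    """Upsert check-in records; returns the total row count afterwards.

    `checkins` is a list of dicts with hypothetical keys:
    id, venue, created_at.
    """
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS checkins (
            id TEXT PRIMARY KEY,
            venue TEXT,
            created_at TEXT
        )
        """
    )
    # INSERT OR REPLACE keys on the primary key, so a repeated run
    # overwrites existing rows instead of adding duplicates.
    conn.executemany(
        "INSERT OR REPLACE INTO checkins (id, venue, created_at) "
        "VALUES (:id, :venue, :created_at)",
        checkins,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM checkins").fetchone()[0]


if __name__ == "__main__":
    # In-memory database for the demo; a real cron job would pass a file path.
    conn = sqlite3.connect(":memory:")
    sample = [
        {"id": "a1", "venue": "Coffee shop", "created_at": "2020-11-14T09:30:00"},
        {"id": "b2", "venue": "Library", "created_at": "2020-11-14T13:05:00"},
    ]
    save_checkins(conn, sample)
    save_checkins(conn, sample)  # second run leaves the table unchanged
    for row in conn.execute("SELECT id, venue FROM checkins ORDER BY id"):
        print(row)
```

Pointed at a file on disk instead of `:memory:`, the resulting database could then be opened with Datasette (`datasette personal.db`) to browse and query it.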

