Planning the Wordpress Migration

The next big challenge for Tanzawa, and the last thing required for me to switch to it, is importing my data from my existing Wordpress blog. There are four major parts to this challenge:

1. Parsing the Wordpress export XML file
2. Figuring out how to map Wordpress posts to Tanzawa posts
3. Downloading and importing media
4. Rewriting existing posts to use the new asset urls and fixing links

The first step is the easiest. I figured out the basics of it yesterday using Beautiful Soup, but I'll need to explore the various posts more before I can decide how to properly map the data.
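A minimal sketch of that first step might look like the following. It uses the standard library's ElementTree rather than Beautiful Soup so it runs without extra packages, and it assumes a WXR 1.2 export (the `wp` namespace URL changes with the export version). The `SAMPLE` document, the `parse_export` name, and the dictionary keys are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespaces as they appear in a WXR 1.2 export (assumption: other
# export versions use a different version number in the wp URL).
NS = {
    "wp": "http://wordpress.org/export/1.2/",
    "content": "http://purl.org/rss/1.0/modules/content/",
}

# Tiny hand-written stand-in for a real export file.
SAMPLE = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <item>
      <title>Hello world</title>
      <link>https://old.example.com/2020/01/hello-world/</link>
      <wp:post_id>1</wp:post_id>
      <wp:post_type>post</wp:post_type>
      <category domain="category" nicename="travel"><![CDATA[Travel]]></category>
      <category domain="post_tag" nicename="japan"><![CDATA[Japan]]></category>
      <content:encoded><![CDATA[<p>Hi</p>]]></content:encoded>
    </item>
    <item>
      <title>photo</title>
      <link>https://old.example.com/photo/</link>
      <wp:post_id>2</wp:post_id>
      <wp:post_type>attachment</wp:post_type>
      <wp:attachment_url>https://old.example.com/wp-content/uploads/2020/01/photo.jpg</wp:attachment_url>
    </item>
  </channel>
</rss>
"""


def parse_export(xml_text):
    """Return one plain dict per <item> in the export."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("channel/item"):
        items.append({
            "id": item.findtext("wp:post_id", default="", namespaces=NS),
            "type": item.findtext("wp:post_type", default="", namespaces=NS),
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "content": item.findtext("content:encoded", default="", namespaces=NS),
            # Only real categories, not tags, are candidates for stream mapping.
            "categories": [
                c.get("nicename")
                for c in item.findall("category")
                if c.get("domain") == "category"
            ],
            # None for anything that is not an attachment.
            "attachment_url": item.findtext("wp:attachment_url", namespaces=NS),
        })
    return items


for entry in parse_export(SAMPLE):
    print(entry["type"], entry["link"], entry["categories"])
```

Posts and attachments both arrive as `<item>` elements, so the post type is what separates the entries to import from the photos to download.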

The other steps are manageable, but they're wrapped up in a fifth challenge: managing the entire import process itself. Initially I had planned on just making a command line import tool. Run the command and it does its best to import everything. But telling Tanzawa how to map categories to streams would entail complex command parameters, which I wouldn't want to use myself, let alone inflict on others.

Rather, I need a simple web interface and database tables that will let me manage and monitor the process. The basic workflow I'm imagining is something like this:

1. User uploads Wordpress export file -> Tanzawa saves it into a blob in its database along with some basic meta information about it.
2. Tanzawa will create a mapping record for each category/post/post-kind found in the file. In step 2 users will see a list of their Wordpress categories with a dropdown next to each one with the stream it should map to (not mapping is also an option).
3. Tanzawa will also provision a record for each photo and post to import. This will include its planned final permanent url, as well as its existing permanent url, and will be central when rewriting content.
4. The photo records will track not only urls, but also file download status, so we don't download photos twice. There should be a page where users can see a list of all photos to import, the status, and perhaps a button to retry if it's failed.
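The records described in the steps above could be sketched roughly like this. These are plain dataclasses rather than real database tables, and every name here (`CategoryMapping`, `PostRecord`, `PhotoRecord`, `provision_category_mappings`, `rewrite_content`) is hypothetical. The rewrite helper shows why keeping both the old and new url on each record is useful.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DownloadStatus(Enum):
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"


@dataclass
class CategoryMapping:
    wordpress_category: str
    stream: Optional[str] = None  # None means "don't map this category"


@dataclass
class PostRecord:
    old_url: str  # existing Wordpress permalink
    new_url: str  # planned permalink after import


@dataclass
class PhotoRecord:
    old_url: str
    new_url: str
    status: DownloadStatus = DownloadStatus.PENDING


def provision_category_mappings(posts):
    """Create one unmapped record per distinct category, in first-seen order."""
    seen = {}
    for post in posts:
        for name in post["categories"]:
            seen.setdefault(name, CategoryMapping(wordpress_category=name))
    return list(seen.values())


def rewrite_content(html, records):
    """Swap every known old url in the html for its new url in a single pass."""
    url_map = {r.old_url: r.new_url for r in records}
    if not url_map:
        return html
    # Longest urls first, so a post url that is a prefix of a photo url
    # cannot win the match and leave a half-rewritten link behind.
    alternatives = sorted(url_map, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(u) for u in alternatives))
    return pattern.sub(lambda m: url_map[m.group(0)], html)


mappings = provision_category_mappings(
    [{"categories": ["travel", "food"]}, {"categories": ["travel"]}]
)
mappings[0].stream = "trips"  # what the dropdown in step 2 would set
```

This only rewrites exact matches. Resized image variants and relative links would need extra handling that is not shown here.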

One tricky bit will be that Tanzawa doesn't support background tasks. That means I can either introduce them (don't really want to) or find a way to control the import entirely from the front-end. I think a small Stimulus controller on the photo list page that loops through each photo and calls an import api should be sufficient.
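The server side of that import api could be sketched as below. The Stimulus controller itself would be JavaScript and isn't shown. `import_all` is only a stand-in for its loop, and `Photo`, `import_photo`, and the injected `fetch` callable are hypothetical names. The point is that each call handles exactly one photo and is safe to repeat.

```python
from dataclasses import dataclass


@dataclass
class Photo:
    old_url: str
    status: str = "pending"  # "pending", "done", or "failed"
    data: bytes = b""


def import_photo(photo, fetch):
    """Download one photo unless it is already done; return its status.

    `fetch` is any callable that takes a url and returns bytes. It is
    injected so the sketch runs without touching the network.
    """
    if photo.status == "done":
        return photo.status  # never download the same photo twice
    try:
        photo.data = fetch(photo.old_url)
        photo.status = "done"
    except OSError:
        # urllib's network errors derive from OSError; mark the photo so
        # the list page can offer a retry button.
        photo.status = "failed"
    return photo.status


def import_all(photos, fetch):
    """Stand-in for the front-end loop: one api call per photo, in order."""
    return {p.old_url: import_photo(p, fetch) for p in photos}


calls = []


def fake_fetch(url):
    """Pretend downloader that fails for one specific url."""
    calls.append(url)
    if url.endswith("broken.jpg"):
        raise OSError("simulated network failure")
    return b"image-bytes"


photos = [Photo("https://old.example.com/ok.jpg"),
          Photo("https://old.example.com/broken.jpg")]
first_run = import_all(photos, fake_fetch)
second_run = import_all(photos, fake_fetch)  # retries only the failed photo
```

Because the status lives on the record, reloading the page or clicking retry simply runs the loop again and skips everything already downloaded.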

5. Once the photos have been imported there should be a big button to publish all the changes. This will be the button that actually executes the entry creation.
6. After importing is complete, all of the old Wordpress urls should automatically redirect to their new Tanzawa permalink.
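The redirect step could be backed by a simple lookup from old path to new permalink, sketched below. The function names and example paths are hypothetical, and the piece that would actually issue the permanent redirect (a view or middleware) is left out. The sketch assumes old urls should match regardless of trailing slash or query string.

```python
from urllib.parse import urlsplit


def normalize(url_or_path):
    """Reduce a url to its path, ignoring query string and trailing slash."""
    path = urlsplit(url_or_path).path
    return path.rstrip("/") or "/"


def build_redirects(pairs):
    """Build the lookup table from (old_url, new_permalink) pairs."""
    return {normalize(old): new for old, new in pairs}


def resolve_redirect(redirects, request_path):
    """Return the new permalink for an old path, or None if unknown."""
    return redirects.get(normalize(request_path))


redirects = build_redirects([
    ("https://old.example.com/2020/01/hello-world/", "/2020/01/05/hello-world/"),
])
target = resolve_redirect(redirects, "/2020/01/hello-world?utm_source=feed")
```

The same old/new url pairs provisioned for each post can feed this table, so no separate bookkeeping would be needed.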

Throughout this process I'll likely find data that I (should) import that I don't have a way to handle in Tanzawa - and as such I may need to create features to handle them along the way.

Thinking about how large a task a proper Wordpress import is feels a bit daunting. But if I just make a little progress each day, piece by piece, I'll complete it before I know it.