Also: puttering is fun.
I incorporated q into my site workflow a while back because of fREW
Schmidt. See, he wrote an interesting post about his Hugo / Unix / Vim
blogging workflow. I immediately copied the
Perl script he wrote for querying site metadata via an in-memory
SQLite database.
q is ridiculously fast and convenient if you know the query you want to ask.
Is your query too awkward to comfortably type on a command line? Put it in a
shell script, like fREW did.
That gets you a list of all the tags in your entries, ordered by number of appearances.
I converted it to Python eventually, of course. The Records library
produced nice-looking output while letting me stick to SQL syntax. Noticeably
slower than q, but acceptable. A bit later, I added a table for aliases.
Then, another table for announcements — my tweets and toots about new content.
Aside from answering idle questions like “how many drawings have I posted?” my little database streamlines the process of publishing site updates and automatically announcing new posts.
What’s the problem?
I’m thinking about adding more tables to improve IndieWeb integration. But to be honest, my creation has grown unwieldy. And slow.
Compared to the original Perl script?
I’m tempted to gut my Python workflow. Start fresh with
q and Perl again. But
no. That would be a lot of work — duplicated work, at that.
I’ll stick to Python for now. But surely we can do better than the duct tape script I’ve been using.
sqlite-utils may be just the thing.
sqlite-utils provides both a Python library and a command
line tool for building and querying SQLite databases. It can guess a schema from
the structures you insert, though you can be explicit if you prefer. It’s
particularly useful for bulk inserting or exporting data dumps in JSON or CSV.
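For instance, here’s a throwaway sketch of that schema guessing, using an in-memory database (the pets table is mine, not part of my site workflow):

```python
import sqlite_utils

db = sqlite_utils.Database(memory=True)
# No schema declared anywhere: sqlite-utils looks at the inserted
# values and guesses TEXT for name, INTEGER for age.
db["pets"].insert({"name": "Fido", "age": 4})
print(db["pets"].schema)
```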
Makes sense. It’s part of the Datasette ecosystem. Willison’s Datasette project simplifies exploring and publishing data for folks who understand data but not necessarily databases.
And although it’s slower than
q for SQL queries, its speed is much closer to q’s than anything I
came up with!
Okay! Let’s do this. I want to load and prepare details from my content entries
before I let
sqlite-utils turn them into a database.
Getting metadata from hugo and front matter
We could recursively walk through the
content/ folder, collecting front
matter details about everything we find. Most of the time, this is more than I
want. Hugo site organization includes supplemental
material like page bundles and
_index pages. That’s distracting when I only
want to focus on my posts and notes. If you just want core content — posts,
pages, stuff like that — ask Hugo.
I mentioned the
hugo list commands before.
hugo list all prints summaries of all your main content entries, as
comma-separated text. Python’s standard subprocess and csv modules are all I
need to turn that into something useful.
csv.DictReader gives me a collection of manageable dictionaries.
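Something like this sketch, assuming hugo is on your PATH (hugo_entries is just my name for it):

```python
import csv
import io
import subprocess

def hugo_entries():
    """Run `hugo list all` and parse its CSV output into dictionaries."""
    result = subprocess.run(
        ["hugo", "list", "all"],
        capture_output=True,
        text=True,
        check=True,
    )
    # The first line of output is a header row, which DictReader
    # uses for the dictionary keys.
    return list(csv.DictReader(io.StringIO(result.stdout)))
```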
I can’t think of a pretty way to get at an entry’s front matter, so let’s just get it out of the way.
- read the file
- grab the front matter YAML
- hand back a dictionary based on that front matter text
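A minimal sketch of that, assuming `---` fences around the YAML and PyYAML installed (front_matter is my name for it, nothing official):

```python
from pathlib import Path

import yaml  # PyYAML

def front_matter(path):
    """Read a content file and hand back its front matter as a dict."""
    text = Path(path).read_text(encoding="utf-8")
    # Everything between the first two `---` markers is front matter.
    _, raw_yaml, _ = text.split("---", 2)
    return yaml.safe_load(raw_yaml)
```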
Why a function? It’s only a couple lines of code.
Sure, but it’s an ugly couple of lines. Plus it hides the YAML-specific details away in case I switch my front matter to TOML, JSON, or some other format.
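With the hugo list rows and front matter in hand, tidying each entry might look something like this sketch. The Entry fields, make_entry, and tag_rows are all names I made up, and the row columns assume the CSV headers my copy of hugo prints:

```python
from collections import namedtuple
from datetime import datetime, timezone
from pathlib import PurePath

# Some structure, but not a full class.
Entry = namedtuple("Entry", "path title date section draft")

def make_entry(row, front):
    """Combine a `hugo list` row with its front matter details."""
    # Normalize to UTC before anything reaches the database.
    date = datetime.fromisoformat(row["date"]).astimezone(timezone.utc)
    # Hugo derives the section from the first folder under content/.
    section = PurePath(row["path"]).parts[1]
    return Entry(
        path=row["path"],
        title=front.get("title", ""),
        date=date.isoformat(),
        section=section,
        draft=row["draft"] == "true",
    )

def tag_rows(front, entry_path):
    """One dictionary per tag; entry_path ties tags back to their entries."""
    return [
        {"tag": tag, "entry_path": entry_path}
        for tag in front.get("tags", [])
    ]
```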
- A namedtuple helps when you want some structure but not a full class.
- Always make sure times are in UTC when saving to the database!
- Hugo determines sections from a file’s location under
content/, so let’s do the same.
- entry_path will help connect tags to entries when creating the database.
I got everything tidy and pretty. We’re ready for the database.
Loading the data
The sqlite-utils Python API ends up taking hardly any
code at all, thanks partly to the work spent massaging the entries. You treat
the database like a dictionary, with each table name as a key. Tables get
created when you insert something.
I think we called that “autovivification” in Perl. It’s nice.
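Here’s a minimal sketch of the loading step, assuming the entries and tags have already been flattened into plain dictionaries (a namedtuple’s _asdict() handles that). The notes below call out the interesting options:

```python
import sqlite_utils

def build_db(entries, tags, db_path="site.db"):
    """Load prepared entry and tag dictionaries into SQLite."""
    # recreate=True wipes any existing file for a fresh build.
    db = sqlite_utils.Database(db_path, recreate=True)
    db["entries"].insert_all(entries, pk="path")
    db["tags"].insert_all(
        tags,
        pk=("tag", "entry_path"),
        foreign_keys=[("entry_path", "entries", "path")],
    )
    return db
```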
- Rebuild the database from scratch every time this runs. Alternatively, leave this out and use upsert_all to do an insert or update.
- path as the primary key, since every filename should be unique.
- A tuple pk specifies a compound primary key: each pair must be unique.
- sqlite-utils hides the details of foreign keys in SQLite. Just use a list of tuples naming the column, the related table, and its column.
All that work makes
build_db a convenient starting point for experimenting.
I could specify an in-memory database, but no. For the moment I’ll settle on
creating a database file. That way I can more easily play with the
sqlite-utils command line interface.
I’m content. But I’m also curious. What does my site metadata look like in Datasette?
Exploring site.db with Datasette
I don’t have any great insights here. I just wanted to look at the pretty tables. Played with metadata and checked out the results.
Oh, and one more thing about the pleasant interface: it allows you to edit the SQL for your current view.
There are detail pages for each row. Again, the Niche Museums site shows that the detail view can be heavily tweaked.
Year 0001? Hang on, let me check that in my shiny new Datasette server.
Year 0001 content
| …and a bunch of drafts | [redacted] | 0001-01-01T00:00:00+00:00 | 1 | draft | [redacted] |
Oh that’s right. Non-dated pages and drafts.
No, I’m too embarrassed to share the full list. I get lots of ideas, okay? Then
I’m afraid to throw any of them away. Sometimes they mutate. A couple months
ago, this post was going to be about playing with
hugo list in Python.
Yep. Datasette is pretty nifty. I’m not missing
q at all. And I’m really
not missing my Python duct tape script.
That was fun! Still got a lot ahead of me.
- Figuring out the quickest way to refresh the database
- Adding those IndieWeb tables, so I can review Webmentions
- Adding new Pyinvoke tasks to build, query, and explore the database
- Fixing the tasks that still call out to my old and busted script
- Making my Datasette site dashboard pretty
Check out some other datasettes available online!
- New York City Cultural Institutions
- Every mention of 2016 primary candidates in hip-hop songs
- The location of every tree in San Francisco (you need to page through to see all 189,144 locations)