Collecting my attempts to improve at tech, art, and life

Listing Hugo Content Extensions With Raku

Tags: hugo raku-lang csv site programming

We like quick answers to important questions

How many text formatting languages have I used for my Hugo site? For that matter, how many content files were written in each?

$ hugo list all | raku -e 'bag(lines[1..*].map({ .split(",")[0].IO.extension })).say'
Bag(adoc(4), html, md(327), rst(109))

Mostly Markdown, with a fair chunk of reStructuredText and a little bit of Asciidoctor. Oh and one HTML source file, originally an Org-Jekyll experiment.

Okay that’s it. That’s the post, everyone. Time to go home!

Breaking it down

It helps me to understand the pieces I smash together in my one-liners. Read along if you like, or move on to more interesting topics. I don’t judge.

First off: why?

The Hugo static site generator supports multiple content formats. I use a few of them, which complicates my occasional urge to rebuild the whole site with something else.

If I know how my content formats are distributed, it will help me understand how much work I have cut out for me in The Eventual Inevitable Rebuild.

hugo list

Hugo’s list commands print a CSV list of your site’s content files. The content listed depends on which subcommand you use:

list all: Everything! Well, except section indexes.
list draft: Content with draft: true
list expired: Content with expiryDate in the past
list future: Content with date in the future

What does that output look like?

$ hugo list all
path,slug,title,date,expiryDate,publishDate,draft,permalink
content/draft/listing-hugo-content-extensions-with-raku/index.adoc,,Listing Hugo Content Extensions With Raku,2020-03-27T22:36:13-07:00,0001-01-01T00:00:00Z,0001-01-01T00:00:00Z,true,https://randomgeekery.org/draft/listing-hugo-content-extensions-with-raku/
content/draft/managing-music-with-beets/index.adoc,,Managing My Music With Beets,2020-03-27T10:31:41-07:00,0001-01-01T00:00:00Z,0001-01-01T00:00:00Z,true,https://randomgeekery.org/draft/managing-music-with-beets/
content/post/2020/03/stdu-viewer/index.rst,,STDU Viewer,2020-03-26T23:42:16-07:00,0001-01-01T00:00:00Z,2020-03-26T23:42:16-07:00,false,https://randomgeekery.org/2020/03/26/stdu-viewer/
content/note/2020/03/every-post-has-a-uuid/index.rst,,Every Post Has a UUID,2020-03-21T19:06:00-07:00,0001-01-01T00:00:00Z,2020-03-21T19:06:00-07:00,false,https://randomgeekery.org/note/2020/81/every-post-has-a-uuid/

I could feed that to any language with a nice library for handling CSV files — which is most of them. Heck, I could feed it to Excel!
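For comparison, here is roughly what that road not taken could look like in Python. It is a sketch, not something I ran for this post: SAMPLE is copied from the output above, tally_extensions is a name made up for illustration, and it assumes the path column keeps that header name.

```python
import csv
import io
from collections import Counter
from pathlib import PurePosixPath

# Rows copied from the `hugo list all` output above (assumed column layout).
SAMPLE = """\
path,slug,title,date,expiryDate,publishDate,draft,permalink
content/draft/listing-hugo-content-extensions-with-raku/index.adoc,,Listing Hugo Content Extensions With Raku,2020-03-27T22:36:13-07:00,0001-01-01T00:00:00Z,0001-01-01T00:00:00Z,true,https://randomgeekery.org/draft/listing-hugo-content-extensions-with-raku/
content/draft/managing-music-with-beets/index.adoc,,Managing My Music With Beets,2020-03-27T10:31:41-07:00,0001-01-01T00:00:00Z,0001-01-01T00:00:00Z,true,https://randomgeekery.org/draft/managing-music-with-beets/
content/post/2020/03/stdu-viewer/index.rst,,STDU Viewer,2020-03-26T23:42:16-07:00,0001-01-01T00:00:00Z,2020-03-26T23:42:16-07:00,false,https://randomgeekery.org/2020/03/26/stdu-viewer/
content/note/2020/03/every-post-has-a-uuid/index.rst,,Every Post Has a UUID,2020-03-21T19:06:00-07:00,0001-01-01T00:00:00Z,2020-03-21T19:06:00-07:00,false,https://randomgeekery.org/note/2020/81/every-post-has-a-uuid/
"""


def tally_extensions(csv_text):
    """Count content files per extension (hypothetical helper, not from Hugo)."""
    # DictReader consumes the header row and keys each row by column name.
    reader = csv.DictReader(io.StringIO(csv_text))
    # suffix includes the leading dot, so strip it to get "adoc", "rst", ...
    return Counter(PurePosixPath(row["path"]).suffix.lstrip(".") for row in reader)


print(tally_extensions(SAMPLE))
```

Passing sys.stdin.read() instead of SAMPLE would let it sit on the receiving end of a pipe from hugo list all.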

Now that I think to look, there’s the Awesome CSV list of tools and resources.

But no. Today I handed it off to the first tool that came to mind.

raku -e

Look, we’ve all been stuck at home for a bit. I need a break from Python. How about Perl’s sister language, Raku?

bag(lines[1..*].map({ .split(",")[0].IO.extension })).say

bag(…).say

bag uses its arguments to create a Bag: basically, a set that gives each member an integer “weight” counting how many times it was added. say prints the gist of the Bag, a short human-readable summary, which tells me what I need to know. The highest level view of this one-liner is “make a Bag and give me a general idea what it looks like.”
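If Bag is new to you, Python's collections.Counter is a close cousin, and the idea may be easier to see there. This is an analogy only, not a Raku Bag, and the extension list is a short hypothetical sample rather than my real site data.

```python
from collections import Counter

# Hypothetical sample of extensions standing in for the real list.
extensions = ["adoc", "adoc", "rst", "rst", "md", "md", "md", "rst"]

# Like bag(...): each distinct member is stored once with an integer weight.
weights = Counter(extensions)

print(weights["md"])          # weight of a member that is present
print(weights["html"])        # a missing member weighs 0 rather than raising
print(sum(weights.values()))  # total weight matches the length of the input
```

The useful property is the same in both languages: building the collection does the counting, so there is no separate tally loop to write.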

lines[1..*].map({ … })

Now I need to create that bag from hugo list all. lines called as a routine creates a list of lines from $*ARGFILES, which falls back to standard input when no files are named, so here it reads the piped output from my Hugo invocation. I don’t need the header line, so I use the Range 1..* to slice everything after it.

map applies a block to each of those lines, returning a new list to create our Bag. What’s going on in that map?

.split(",")[0].IO.extension

That leading dot? It’s a method call on the topic variable handed to the block by map, so .split here means $_.split. Yes, for folks who don’t feel like clicking: topic variable is Raku’s name for $_, an easily abused blessing of Perl languages.

So each line of comma-delimited values from Hugo’s CSV gets split into a list of values, but I only care about the first one: the path to the content file itself.

Coercing that to an IO::Path object lets me ask for an extension.
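The same per-line work can be sketched in Python for anyone who reads that more easily. extension_of is a hypothetical name, the sample line is one row from the output above, and the split is just as naive as in my one-liner.

```python
from pathlib import PurePosixPath


def extension_of(line):
    """Mirror .split(",")[0].IO.extension for one CSV line (naive split)."""
    # The first comma-separated field is the path to the content file.
    path = line.split(",")[0]
    # suffix keeps the leading dot; Raku's extension does not, so strip it.
    return PurePosixPath(path).suffix.lstrip(".")


line = (
    "content/post/2020/03/stdu-viewer/index.rst,,STDU Viewer,"
    "2020-03-26T23:42:16-07:00,0001-01-01T00:00:00Z,"
    "2020-03-26T23:42:16-07:00,false,"
    "https://randomgeekery.org/2020/03/26/stdu-viewer/"
)
print(extension_of(line))
```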

The block returns that extension, so when map is all done it has a list of file extensions:

(adoc adoc rst rst md md md rst ...)

During initialization, the Bag counts how many times each extension appears in the list. Since the result of that tally is all I care about, I print it out.

$ hugo list all | raku -e 'bag(lines[1..*].map({ .split(",")[0].IO.extension })).say'
Bag(adoc(4), html, md(327), rst(109))

Alternate versions

While I was learning more about my impulsive little invocation, I wondered about other ways to get the same information from Raku.

A bit more Perlish

All those method dots bother you? No problem. We can use them like plain old subroutines too. Course, we have to reach for $*SPEC. This lower-level IO::Spec object understands file extensions on our platform.

$ hugo list all | raku -e 'say bag(map({ $*SPEC.extension(split(",", $_)[0]) }, lines[1..*]))'
Bag(adoc(4), html, md(327), rst(109))

Using Text::CSV

I know what to expect from Hugo’s CSV output, but what if I didn’t? I’d feed the standard input handle $*IN to H. Merijn Brand’s Text::CSV module.

$ zef install Text::CSV
$ hugo list all | raku -MText::CSV -e \
  'bag(csv(in => $*IN, headers=>"skip", fragment=>"col=1").map({ .IO.extension })).say'
Bag(adoc(4), html, md(327), rst(109))
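To show what a real CSV parser buys over splitting on commas, here is a small Python sketch. The row is hypothetical: I am assuming a content path that contains a comma and that Hugo would quote such a field the way standard CSV does. None of my actual paths look like this.

```python
import csv

# Hypothetical row: a path containing a comma, quoted as standard CSV requires.
row = '"content/post/hello, world/index.md",,"Hello, World",2020-03-27T22:36:13-07:00'

# Naive approach: split on every comma, including the ones inside quotes.
naive_path = row.split(",")[0]

# CSV-aware approach: the parser honors the quotes and returns the whole field.
parsed_path = next(csv.reader([row]))[0]

print(naive_path)   # cut off at the comma inside the quoted field
print(parsed_path)  # the complete path with the quotes removed
```

The naive version silently yields a path with no usable extension, which is the kind of surprise a parser like Text::CSV is there to prevent.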

Though if I were being this careful, I’d probably also move away from a one-liner. But that takes us a long way from my original goal of getting a quick answer to an idle question.

Well, I satisfied my curiosity and understood a little more Raku. That was fun!


Added to vault 2024-01-15. Updated on 2024-02-02