A couple shell pipes. A little Vimscript. Now Neovim tells me (roughly) how many words my post has!
My problem
once again I’m trying to get myself to write more regularly. I post so much everywhere else. Surely I can spare a few hundred words a week on my blog, right?
Since I implied a number in that question, I have a metric I can use. Word count! Just write until I have 250 words, then post it. Or five hundred. Or a thousand. Whatever. I don’t even need a concrete goal. Seeing the number encourages me to type a little bit more.
But how to get the word count of a post? The approach I used a few years ago required enough extra thought that I eventually forgot all about it until editing this post!
Hugo includes word count and reading time in page variables, so I could put the variable in the post template, then keep the server running and browser open while I type. In fact, that’s exactly what I’ve been doing!
Honestly, this is cumbersome. I prefer something that doesn’t require I have Hugo running. Sure, I enjoy previewing the site with live reload, but when I’m writing my flow is interrupted by half my screen refreshing every few seconds.
Yes, I tend to save my work that often. You lose enough editing sessions and it becomes a reflex.
Something in my editor would be better.
A solution
wc
can count words, but hand it the file and it counts everything. I care about the prose and code examples. I don’t want front matter, link URLs, or formatting artifacts included in the count.
What if I stripped the Markdown components out, leaving plain text? That’s exactly what I want to count! Okay, so how do I go about that? I already know that I’ll mess something up if I write my own solution. Maybe one of the tools I already use?
Let’s see. My posts are formatted with mmark, a Markdown parser with its own extensions to the Markdown foundation. Near as I can tell, mmark does not support generating plain text from its input. But Pandoc can. Pandoc can do anything. Except parse mmark.
Well maybe it could if I wrestled with the extensions a little. But I don’t want to do that today. mmark can turn Markdown into HTML. Pandoc can turn HTML into plain text!
So. Have mmark turn the file into HTML, then have pandoc turn the HTML into text, and count the words of text produced.
We use pipes. Pipes everywhere.
NOTE
Found out that Hugo shortcodes look a lot like mmark includes, and that choked my pipes up a bit. Added an initial pass through Perl to remove anything that might be a shortcode. Confuses counting in a couple of my posts about Hugo, but I will survive.
I told mmark to generate a fragment of HTML rather than a complete HTML document, to reduce noise from things like a <title>
tag.
Plus a little massaging by tr
since wc -w
had leading spaces in its output.
Naturally, Hugo reports a different number at this point: 397. You know what, though? I’m not going to fuss about a small difference. If I was getting paid per word (or at all) for these posts, I might care more. Eventually I may dig into code and find out why they report different numbers. As it is, I am happy with the answers that pandoc + wc give me.
How to get Vim to report it
So I know how to get the numbers I want. Now I must learn enough Vimscript to move that information from the shell to the status line.
WARNING
I’ve been using Neovim as my main terminal
$EDITOR
. Although it maintains compatibility with Vim, some of the functions here may require a newer release of Vim than you have on your machine.The goal was to figure this out on my own machine, so unfortunately the only advice I can offer if your Vim complains is: upgrade Vim or switch to Neovim.
Later I may expand this into a library function that understands about different content types. Right now it returns a formatted word count if the filetype of the current buffer is Markdown, and an empty string otherwise. That way there’s something to put in the status line.
Look! Functions! For the full list of neovim’s built-in functions, see :help eval
. Let’s just look at the ones I used.
system
gets me the output of a shell command (with a little backslash escaping so my pipes work).bufname
returns the filename for the requested buffer (%
is Vim shorthand for “the current buffer”)trim
gets rid of that pesky newline. Oh hey I bet it gets rid of the leading spaces, too. Might not need totr
the word count.
Also, I want to keep it from counting words all the time. The piped commands are fast enough on my computer, but it just feels like wasted effort. Keep the word count and time of last check in buffer variables. Update the word count only if the file’s been touched since the last check.
Hey, a couple new functions!
getftime
tells me when the file was last modifiedlocaltime
tells me what time it is now.
Both of these time functions work in “seconds since 1970-01-01.” That’s not unusual, but it does mean you’d need to feed the values to strftime
if you wanted human-readable times.
Yeah, my status line is a little busy.
Anyways. It works! Is it perfect? Of course not. But it gives me the information I want, and that pleases me. On to the next task!
Backlinks
Got a comment? A question? More of a comment than a question?
Talk to me about this page on: mastodon
Added to vault 2024-01-15. Updated on 2024-02-01