I want to write at least 250 words per day. This is not a 30-day challenge. It is just something I want to do. I already write more than 250 words daily if you count social network posts and chat text. Wouldn’t it be nice if some of those words were organized around a single idea?
I need some way to count those words, of course. The obvious solution is wc.
$ wc counting-words.markdown
106 464 3108 counting-words.markdown
The documentation tells me that the first column is the number of lines, the second column is the number of words, and the third column is the number of bytes, which equals the number of characters only when every character fits in a single byte.
I can train my brain to remember this, but instead I use the -w flag to get only the word count.
$ wc -w counting-words.markdown
464 counting-words.markdown
That is better, but it is not an accurate word count. I am currently using Jekyll for blogging, and every blog post file includes a section of front matter and a section of Markdown content. My goal is 250 words of prose, not 250 total words. I do not want to count the front matter.
I could use assorted shell tools to accomplish this, but I would rather make a Ruby one-liner.
First I get the basic information I was already getting from wc -w.
$ ruby -e 'puts ARGF.read.split.count' counting-words.markdown
464
How do I separate the head from the body of the post?
I could do some fiddly bits using ARGF.readlines with a separator argument, but I will keep going with what I have.
$ ruby -e 'puts ARGF.read.split(/^---$/).inspect' counting-words.markdown
["", "\nlayout: post\ntitle: Counting Words in Blog Posts\ndescription: Using Ruby to track my verbosity\ncategory: Programming\ndate: 2014-10-02\ntags: ruby\n", "\nI want to write at least 250 words per day. ..."]
How many words are in the body?
$ ruby -e 'puts ARGF.read.split(/^---$/)[-1].split.count' counting-words.markdown
317
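To see why the body ends up as the last element, here is a minimal sketch that runs the same split on a made-up post held in a string. The sample text and the variable names are stand-ins for illustration, not part of the real post.

```ruby
# A made-up post, kept in a string so the example runs without a file.
sample = <<~POST
  ---
  layout: post
  title: Example
  ---

  Four words of prose.
POST

parts = sample.split(/^---$/)
# parts[0] is the empty string before the first "---",
# parts[1] is the front matter, and parts[-1] is the body.
body = parts[-1]

puts parts.length      # => 3
puts body.split.count  # => 4
```

One caveat: a body line consisting only of --- (a Markdown horizontal rule) would also be treated as a separator, and [-1] would then return only the text after it. Passing a limit, as in split(/^---$/, 3), is one way to keep the whole body in the last element.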
I did say that I wanted my word count to be prose. I should exclude code blocks. That calls for a multi-line regular expression, stripping out the fenced code blocks in my post.
$ ruby -e 'puts ARGF.read.split(/^---$/)[-1].gsub(/^~~~ .+?^~~~ $/m, "").split.count' counting-words.markdown
357
I do not want to count link definitions either.
$ ruby -e 'puts ARGF.read.split(/^---$/)[-1].gsub(/^~~~ .+?^~~~ |\[.+?\]:.+?$/m, "").split.count' counting-words.markdown
341
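The stripping step is easier to follow on a small input. The sketch below uses a made-up post body and two separate patterns rather than the single combined expression in the one-liner. Both patterns are simplified assumptions: a code block opens on a line starting with ~~~ and closes on a line that is exactly ~~~, and a link definition sits on one line of its own.

```ruby
# Made-up post body for illustration only.
body = <<~'BODY'
  Some prose here.

  ~~~ ruby
  puts "not prose"
  ~~~

  See the [docs][1].

  [1]: http://example.com/docs
BODY

# Assumed fence style: opening line starts with ~~~, closing line is exactly ~~~.
fenced_block = /^~~~.*?^~~~$/m         # /m lets . cross line breaks
link_definition = /^\[[^\]\n]+\]:.*$/  # without /m, . stops at the end of the line

prose = body.gsub(fenced_block, "").gsub(link_definition, "")

puts body.split.count   # => 14
puts prose.split.count  # => 6
```

Splitting the work into two gsub calls keeps the multi-line flag away from the link pattern, so a stray [ in the prose cannot pair up with a ]: several lines later.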
This is good enough. Now I turn it into a bash alias.
# words in post / work in progress
alias wip='ruby -e '"'"'puts ARGF.read.split(/^---$/)[-1].gsub(/^(~~~ .+?^~~~ |\[.+?\]:.+?)$/m, "").split.count'"'"
Oh jeez, those quotes hurt my brain. It was the first solution I came across to handle shell quoting, though. I may come up with something prettier, perhaps a full script or an existing tool. This will do for now.
$ wip counting-words.markdown
341
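For comparison, here is a rough sketch of what the full-script version might look like. It sidesteps the shell quoting entirely. The method name prose_word_count and the patterns are assumptions made for this sketch, and the patterns are simplified relative to the one-liner, so the counts may not match it exactly.

```ruby
#!/usr/bin/env ruby
# Count prose words in a Jekyll post (hypothetical replacement for the alias).

FRONT_MATTER_FENCE = /^---$/
FENCED_CODE = /^~~~.*?^~~~$/m          # assumes ~~~ fences; /m lets . cross lines
LINK_DEFINITION = /^\[[^\]\n]+\]:.*$/  # e.g. "[1]: http://example.com"

def prose_word_count(text)
  # Only split when the file opens with front matter. The limit of 3 keeps
  # any later "---" line (a horizontal rule) inside the body.
  body = text.start_with?("---") ? text.split(FRONT_MATTER_FENCE, 3)[-1] : text
  body.gsub(FENCED_CODE, "").gsub(LINK_DEFINITION, "").split.count
end

# Run only when called directly with at least one file name.
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  puts prose_word_count(ARGF.read)
end
```

Saved as an executable file named wip somewhere on the PATH, it would be called the same way as the alias: wip counting-words.markdown.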