Joplin CLI Batch Processing With Raku

This is embarrassing. I fired up the Joplin desktop app this morning and it told me there was an update. Makes sense. I haven’t loaded the desktop app in a couple months.
Oh hang on. What about the terminal app which I was just writing about over the last couple posts?
Yep. The Changelog shows updates, one of which includes batch processing. Batch processing sounds like exactly the thing to address my many complaints about performance.
Update Joplin with Volta
I use Volta to manage my Node.js resources. Volta treats installing and updating as the same action.
volta install joplin
This is what I have now:
$ joplin version
joplin 1.8.1 (prod)
Now I’m up to date. Let’s see what changes I can make to my journaling code.
Fix the one-liners
The one-liner for adding journal entries works fine as-is.
joplin use Journal && joplin edit $(date --iso=minute)
Reading the entries needs improvement.
raku -e '
qqx{joplin cat $_}.subst(/^(<[\dT:\-]>+)/, { "# $0" }).say for qx{ joplin ls }.lines.sort
' \
| python -m rich.markdown -
Joplin CLI v1.8.1 added a batch
command, which executes commands from a
text file. My challenge: joplin batch
does not appear to have an option
for standard input. This means I can’t casually pipe output from another
process. Here’s what I came up with:
joplin batch <(raku -e 'qx{joplin ls}.lines.sort.map({ "cat $_" }).join("\n").say') \
| raku -ne '.subst(/^(<[\dT:\-]>+)$/, { "# $0" }).say' \
| python -m rich.markdown -
We take advantage of a little shell magic to treat the output of another process as a file.
Don’t ask me to understand the shell magic. In GNU Bash, command <(stuff)
means something along the lines of “evaluate stuff and hand the
output of that evaluation to command
as if it was a file.”
It’s all a little inside-out and twisty. It might help if we break up the chunks.
Chunk | What it does |
---|---|
qx{joplin ls}.lines.sort |
collect the sorted entry list from this notebook |
.map{ "cat $_" } |
create a Joplin command to display this entry |
.join("\n").say |
print those commands as one multi-line string |
joplin batch <(...) |
send raku ’s output to joplin batch |
... | .subst(...) |
turn timestamp lines from output into Markdown headers |
... | python -m rich.markdown - |
format the output for terminal display |
We pull Raku in twice: once to build the command and again to parse the output. On the other hand we’re only calling Joplin twice instead of forty or so times.
That makes the one-liner downright zippy, all things considered.
$ time joplin batch <(raku -e 'qx{joplin ls}.lines.sort.map({ "cat $_" }).join("\n").say') \
| raku -ne '.subst(/^(<[\dT:\-]>+)$/, { "# $0" }).say' \
| python -m rich.markdown -
...
real 0m1.407s
user 0m1.608s
sys 0m0.140s
One and a half seconds for a formatted display of every journal entry. Not bad, considering that I’m running on WSL. Plus I don’t really know one-liners, Raku, or Joplin.
Fix the script
Splitting up the Raku script into logical pieces the other day means that today I only need to fix a single function. Thank goodness.
sub read-entries(@entries) {
@entries.map({
qqx{ joplin cat $_}.subst(/^(<[\dT:\-]>+)/, { "# $0" })
}).join;
}
How much does this function need to improve?
$ time jj today
...
real 0m3.001s
user 0m3.281s
sys 0m0.390s
$ time jj all
...
real 0m31.253s
user 0m31.779s
sys 0m4.616s
Lots. This function needs to be lots quicker. It took three seconds to
display today’s lone entry, and over 30 seconds to display all 40 journal
entries. Every new entry slows the whole thing down, because every new entry
means a new call to joplin
.
I tried mimicking the shell magic but couldn’t figure out how in the time I allowed myself. This isn’t work code where you have to get things just so. This is a fun little utility for my own amusement.
joplin batch
expects a file? Let’s give it a file. But I want that file
to go away when I’m done, so let’s find a module to handle temporary files.
Poking through the directory of Raku modules quickly showed me two possibilities:
Temp::Path
- gives you a friendly object you can write to or stringify when you need a filename
File::Temp
- presents a more utilitarian interface, providing filename and filehandle as separate variables
The end result is the same: a file that goes away when you no longer need it.
I like friendly. Let’s see how Temp::Path
does.
Try Temp::Path
Need to install it, of course. zef
handles Raku modules. I set that
up a while back with rakubrew
.
zef install Temp::Path
Then we let Raku know we’re using the module. That traditionally goes near the top of our script.
use Temp::Path;
More or less following along Temp::Path’s sample usage. with
creates
a block for our temporary file. It even sets the topic
variable $_
. Don’t need to come up with a temporary
variable name for our temporary file.
sub read-entries(@entries) {
with make-temp-path {
.spurt(@entries.map({ "cat $_" }).join("\n"));
qqx{ joplin batch $_ }.subst( /^^(<[\dT:\-]>+)$$/, { "# $0" }, :g );
}
}
The regular expression is starting to look interesting. joplin batch
hands
everything to us as one string. We need to adjust the entry-oriented logic we
had before. Now we find any line containing a lone ISO-8601 timestamp, and
convert it to a top-level Markdown header. The :g
flag tells .subst
to
replace every occurrence.
Those few lines don’t change anything for me as a user. Maybe the speed?
$ time jj today
...
real 0m2.969s
user 0m3.385s
sys 0m0.303s
$ time jj all
...
real 0m3.034s
user 0m3.328s
sys 0m0.505s
Huh. It’s not any faster than the best case for the initial script, with a
single entry taking roughly the same amount of time to load and display. Then
again, batch
is clearly doing its job. One entry takes almost exactly the
same amount of time as 40. Since most days I’ll have multiple entries, that is
an effective optimization for the common case.
But why is my one-liner twice as fast? Is it Temp::Path? Raku? Joplin? Something to do with file I/O on WSL 2? No idea.
Let’s find out if File::Temp does any better.
Try File::Temp
Out comes zef
…
zef install File::Temp
…then use File::Temp instead of Temp::Path…
use File::Temp;
…then rewrite read-entries
one more time…
sub read-entries(@entries) {
my ($filename, $filehandle) = tempfile;
$filehandle.spurt( @entries.map({ "cat $_" }).join("\n") );
qqx{ joplin batch $filename }.subst(/^^ (<[\dT:\-]>+) $$/, { "# $0" }, :g);
}
…and try it out.
$ time jj today
...
real 0m2.502s
user 0m2.771s
sys 0m0.326s
$ time jj all
...
real 0m2.611s
user 0m2.911s
sys 0m0.381s
Ran each version a few times, just to be sure. The version with File::Temp consistently finished a noticeable fraction of a section quicker than using Temp::Path. Still nowhere near the one-liner’s performance, but good enough that I’ll stick with File::Temp until I come up with something better.
Do I care enough to reboot into Linux and see how much of a difference that makes?
Not really.
I can probably optimize this, but it’s not urgent or important. So far I only skim my entries when I already have a few moments to spare. Besides, the real optimizations almost definitely lie with using the Joplin API.
What I’m saying is don’t get hung up on trivia.
Speaking of trivia…
About that regular expression
I need to do something about this.
qqx{ ... }.subst(/^^ (<[\dT:\-]>+) $$/, { "# $0" }, :g);
We already know that regular expressions are their own little language embedded in whatever programming language we happen to be getting work done in. With Raku, we can treat regular expressions as part of the Raku language itself.
Let’s tackle this backwards. Top-down. Whatever it is the fancy people say. I’m going to split it out into its own function. Makes it easier to think of this transformation in isolation.
Hide it in a function
What do I want this function to do? I want it to give me my journal text, but with formatted headers in the right places.
sub format-headers($journal-text) {
$journal-text.subst(
/^^ (<[\dT:\-]>+) $$/,
{ "# $0" }, :g);
}
Use a named capture
Do I want to format every $0
? No. I want to format every entry title.
sub format-journal($journal-text) {
$journal-text.subst(
...,
{ "# $<entry-title>" }, :g
);
}
Of course Raku supports named captures. The part we care
about is stored in the match object. Behind the scenes, $<entry-title>
is
getting the value stored under the key "entry-title"
.
An rx{}
block for legibility
How do I know the entry title? I know the entry title because I found a lone timestamp.
sub format-journal($journal-text) {
$journal-text.subst(
rx{
$<entry-title> = ( <lone-timestamp> )
},
{ "# $<entry-title>" }, :g
);
}
rx{ ... }
indicates an anonymous regex. “Anonymous” as
opposed to what exactly? I’m getting there. As our expressions get more
complex, take advantage of all useful quoting mechanisms.
Notice that instead of a (?<name> pattern)
approach to named captures, in
Raku it looks a lot more like assigning a pattern to a variable. Okay fine.
Assigning a pattern to the match object’s hash, under the key
"entry-title"
. But still. It looks like a more familiar programming
language assignment.
But rather than the expected elaborate chain of metacharacters, the pattern we store is — another identifier?
I told you I was getting there.
Name your regex, not just your capture
What’s a lone timestamp? It’s a timestamp on a line by itself.
my regex lone-timestamp {
^^ <timestamp> $$
}
Now we have a regular expression as its own scoped code object. The regex is the rawest component of a family that includes tokens, rules, and entire grammars. I’m not ready to get into grammars yet, but I am absolutely getting closer.
It’s not an expression; it’s a composition
What does a timestamp look like? Well, a DateTime String
holds an ISO 8601 date, a clock time, and and offset, with a 'T'
between the date and the clock time.
my regex timestamp {
<iso8601-date> 'T' <clock-time> <offset>
}
If we’re looking for a literal string, it’s okay to use a string literal.
Now we have a few regex patterns to define. An ISO 8601 date includes a
year, a month, and a day of the month, separated by '-'
.
my regex iso8601-date {
<year> '-' <month> '-' <day-of-month>
}
Playing more with a language gives me a feel for how to use it based on what it makes easy. Raku makes it easy to create a program by composing it from small pieces. Tiny pieces, even.
Mind you, I have no idea if that’s what raku
the compiler likes. But the
syntax loves it.
A year is four digits, a month is two digits, and the day of the month is two digits.
my regex year { \d ** 4 }
my regex month { \d ** 2 }
my regex day-of-month { \d ** 2 }
The general quantifier **
indicates how many times you
expect a chunk to appear. To this day I can’t remember the exact syntax for
quantifiers in old-school regular expressions. But I can remember the number
4.
Looks like clock time gets saved as hours, minutes, and seconds. In the interest of time, we’ll oversimplify those too.
my regex hours { \d ** 2 }
my regex minutes { \d ** 2 }
my regex seconds { \d ** 2 }
my regex clock-time {
<hours> ':' <minutes> ':' <seconds>
}
And my offset holds an indicator, some hours, and some minutes. Hey, I can reuse my existing regex definitions for those!
my regex offset {
<[+-]> <hours> ':' <minutes>
}
All right. I think that covers it. I enjoyed reusing my expressions for hours and minutes like that. Actual code reuse, in a regular expression. Who would’ve thought?
When I take this lone-timestamp
regex and match it against
"2021-05-24T08:11:00-07:00"
we can see those named expressions at work.
The potential really starts to sink in for me.
「2021-05-24T08:11:00-07:00」
lone-timestamp => 「2021-05-24T08:11:00-07:00」
timestamp => 「2021-05-24T08:11:00-07:00」
iso8601-date => 「2021-05-24」
year => 「2021」
month => 「05」
day-of-month => 「24」
clock-time => 「08:11:00」
hours => 「08」
minutes => 「11」
seconds => 「00」
offset => 「-07:00」
hours => 「07」
minutes => 「00」
And this is just me composing regex objects. Eventually I’m going to try grammars and then look out world!
Ship it!
What am I doing on this soapbox? Time to step down.
My script works. It’s still not fast, but at least it’s never slow. It’s readable. And most important of all, I had fun.
The complete script
Includes a couple more steps into composition that I didn’t feel merited extra blog post paragraphs.
#!/usr/bin/env raku
use File::Temp;
constant $notebook = "Journal";
constant $entry-window = "minute";
my regex digit { \d }
my regex two-digits { <digit> ** 2 }
my regex year { <digit> ** 4 }
my regex month { <two-digits> }
my regex day-of-month { <two-digits> }
my regex hours { <two-digits> }
my regex minutes { <two-digits> }
my regex seconds { <two-digits> }
my regex iso8601-date {
<year> '-' <month> '-' <day-of-month>
}
my regex clock-time {
<hours> ':' <minutes> ':' <seconds>
}
my regex offset {
<[+-]> <hours> ':' <minutes>
}
my regex timestamp {
<iso8601-date> 'T' <clock-time> <offset>
}
my regex lone-timestamp {
^^ <timestamp> $$
}
sub add-entry() {
my $timestamp = DateTime.now.truncated-to($entry-window);
my $command = "joplin use $notebook && joplin edit $timestamp";
shell( $command );
}
sub all-entries() {
qqx{joplin use $notebook && joplin ls}.lines.sort;
}
sub filtered-entries(Str $date-funnel) {
all-entries.grep({ .starts-with($date-funnel) });
}
sub entries-for-today() {
filtered-entries DateTime.now.yyyy-mm-dd;
}
sub entries-for-yesterday() {
my $yesterday = DateTime.now.earlier(days => 1); # or :1day for the terse
filtered-entries $yesterday.yyyy-mm-dd;
}
sub read-entries(@entries) {
my ($filename, $filehandle) = tempfile;
$filehandle.spurt( @entries.map({ "cat $_" }).join("\n") );
format-headers( qqx{ joplin batch $filename } );
}
sub format-headers($journal-text) {
if $journal-text ~~ /<lone-timestamp>/ { $/.say; }
$journal-text.subst(
rx{
$<entry-title> = [ <lone-timestamp> ]
},
{ "# $<entry-title>" }, :g
);
}
multi sub MAIN("add") { #= Add an entry
add-entry;
}
multi sub MAIN("today") { #= Read today's entries
say read-entries( entries-for-today );
}
multi sub MAIN("yesterday") { #= Read yesterday's entries
say read-entries( entries-for-yesterday );
}
multi sub MAIN("all") { #= Read all entries (SLOW!)
say read-entries( all-entries );
}