Using the Webmention.io API
Fetching my IndieWeb mentions with HTTPie and Requests
Tags: post, python, indieweb · Series: fixing my webmentions

So I hosed a local copy of my mentions feed the other month. What’s my “mentions feed,” I hear you wondering?
Whenever somebody shares a reaction to something here (a like, reshare, reply, or mention), that reaction gets sent to Webmention.io. There are more moving parts than that, of course. Bridgy aggregates reactions to my announcement toots and tweets and sends those to Webmention.io, so when someone reacts to a relevant tweet, it shows up in my mentions feed as a reaction to the site content.
Sometimes folks even post mentions, replies, and reactions directly to the Webmention endpoint. Mostly it’s just social media reactions, though.
The Webmention.io API lets me gather all of these reactions.
Let’s acquaint ourselves with the important parts of this API. You’ll need your API token, which you can find in your Webmention.io settings once you sign up. I keep mine in the WEBMENTION_KEY environment variable, which you’ll see throughout the examples below.
Reading the feed with HTTPie
I’ll use HTTPie for my little exploration. I like the way it works.
pip install httpie
Getting recent reactions
We mainly care about the mentions endpoint. Hand it your domain and API token, and it will send you the 20 most recent responses for your site.
http get https://webmention.io/api/mentions.jf2 \
domain==randomgeekery.org \
token==$WEBMENTION_KEY
HTTPie’s double-equals == syntax means “make a query string,” so I end up with something like this:
https://webmention.io/api/mentions.jf2?domain=randomgeekery.org&token=xxxxx
When http fetches that URL, I get back a JF2 feed that looks something like this.
{
  "children": [
    {
      "author": {
        "name": "Jumpei KAWAMI",
        "photo": "https://webmention.io/avatar/…",
        "type": "card",
        "url": "https://twitter.com/junkw"
      },
      "content": {
        "text": "I wrote a note:\n\nI added this note from org mode…"
      },
      "published": "2020-10-25T23:32:25+00:00",
      "repost-of": "https://randomgeekery.org/note/2020/10/i-added-this-note-from-org-mode/",
      "type": "entry",
      "url": "https://twitter.com/junkw/status/1320508544601509889",
      "wm-id": 887739,
      "wm-private": false,
      "wm-property": "repost-of",
      "wm-received": "2020-10-26T04:07:20Z",
      "wm-source": "https://brid-gy.appspot.com/repost/twitter/brianwisti/…",
      "wm-target": "https://randomgeekery.org/note/2020/10/i-added-this-note-from-org-mode/"
    },
    ⋮
  ],
  "name": "Webmentions",
  "type": "feed"
}
What’s JF2? It’s obviously JSON. Maybe something to do with JSON Feed? Similar, but no. JF2 is a JSON format for IndieWeb’s microformats2. The mnemonic I’ve been trying to drill into my head is “JSON (micro)Formats 2.”
It’s not a very good mnemonic.
Each entry summarizes the reaction, including which of my posts it was reacting to. That’s kind of important. Most recently, Twitter user @junkw retweeted my announcement about adding a note from Org mode.
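Each entry also carries enough structure to work with programmatically. Here’s a minimal Python sketch of pulling out who reacted, how, and to what. It assumes the Requests library, which I’ll lean on properly later, plus my own domain and token setup.

import os

import requests

# Fetch the 20 most recent mentions, same as the HTTPie example above.
response = requests.get(
    "https://webmention.io/api/mentions.jf2",
    params={"domain": "randomgeekery.org", "token": os.environ["WEBMENTION_KEY"]},
)

for entry in response.json()["children"]:
    # wm-property names the kind of reaction; wm-target is the post it points at.
    author = entry["author"]["name"]
    print(f"{author}: {entry['wm-property']} -> {entry['wm-target']}")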
Note

There’s also a .json endpoint for every feed that presents a different structure for mentions. I prefer it because it contains fewer wm-* fields, but the documentation uses JF2, so that’s what I’ll use here.
Checking for new reactions
Maybe I’m checking again later and only want to see the new reactions. I request mentions received since the value of the wm-received field in the last entry I have.
http get https://webmention.io/api/mentions.jf2 \
domain==randomgeekery.org \
token==$WEBMENTION_KEY \
since=="2020-10-26T04:07:20Z"
{
  "children": [],
  "name": "Webmentions",
  "type": "feed"
}
Well, yeah. That makes sense. I don’t get the kind of traffic where you’d expect fresh reactions every time you check.
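In a script, that incremental check might look something like this rough sketch, which assumes I stashed the newest wm-received value somewhere after my last run:

import os

import requests

# Hypothetical bookmark: the newest wm-received value saved from the last run.
last_received = "2020-10-26T04:07:20Z"

response = requests.get(
    "https://webmention.io/api/mentions.jf2",
    params={
        "domain": "randomgeekery.org",
        "token": os.environ["WEBMENTION_KEY"],
        "since": last_received,
    },
)
new_entries = response.json()["children"]
print(f"{len(new_entries)} new mentions since {last_received}")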
Fetching the oldest reactions first
As I mentioned at the start, my site is a little broken. I need to rebuild the full list of reactions so my Hugo site can work with a complete record. To do that, I should probably start from the oldest mentions and work my way forward.
Rather than the default sort-dir of down, I specify up.
http get https://webmention.io/api/mentions.jf2 \
domain==randomgeekery.org \
token==$WEBMENTION_KEY \
sort-dir==up
{
  "children": [
    {
      "author": {
        "name": "Steve Scaffidi",
        "photo": "https://webmention.io/avatar/…",
        "type": "card",
        "url": "https://twitter.com/hercynium"
      },
      "content": {
        "html": "This is where I wish Perl5 had something like Python's AST class hierarchy…",
        "text": "This is where I wish Perl5 had something like Python's AST class hierarchy…"
      },
      "in-reply-to": "https://randomgeekery.org/2020/02/17/python-invoke/",
      "published": "2020-02-18T03:11:58+00:00",
      "type": "entry",
      "url": "https://twitter.com/hercynium/status/1229604443651526656",
      "wm-id": 757935,
      "wm-private": false,
      "wm-property": "in-reply-to",
      "wm-received": "2020-02-18T22:32:20Z",
      "wm-source": "https://brid-gy.appspot.com/comment/twitter/brianwisti/…",
      "wm-target": "https://randomgeekery.org/2020/02/17/python-invoke/"
    }
  ],
  "name": "Webmentions",
  "type": "feed"
}
Aww, my first site reply. From @hercynium.
I only get 20 results by default, though. Let’s make jq show us, starting with a default page.
http get https://webmention.io/api/mentions.jf2 \
domain==randomgeekery.org token==$WEBMENTION_KEY sort-dir==up \
| jq '.children | length'
20
Handling result pagination
I can specify how many entries I want in each response with the per-page parameter. With per-page set to 100, I get a hundred entries.
http get https://webmention.io/api/mentions.jf2 \
domain==randomgeekery.org token==$WEBMENTION_KEY per-page==100 \
| jq '.children | length'
100
Of course, if there aren’t a hundred entries to fill the page, I only get what’s available.
http get https://webmention.io/api/mentions.jf2 \
domain==randomgeekery.org token==$WEBMENTION_KEY since=="2020-10-26T04:07:20Z" per-page==100 \
| jq '.children | length'
0
The page parameter, which starts at zero, lets me step through the feed in batches.
http get https://webmention.io/api/mentions.jf2 \
domain==randomgeekery.org \
token==$WEBMENTION_KEY \
sort-dir==up \
page==1
{
  "children": [
    {
      "author": {
        "name": "brian wisti",
        "photo": "https://webmention.io/avatar/…",
        "type": "card",
        "url": "https://twitter.com/brianwisti"
      },
      "content": {
        "html": "…",
        "text": "…"
      },
      "in-reply-to": "https://randomgeekery.org/2020/01/19/restructuredtext-basics-for-blogging/",
      "published": "2020-03-10T06:24:45+00:00",
      "type": "entry",
      "url": "https://twitter.com/brianwisti/status/1237263101482823681",
      "wm-id": 766993,
      "wm-private": false,
      "wm-property": "in-reply-to",
      "wm-received": "2020-03-10T06:38:55Z",
      "wm-source": "https://brid-gy.appspot.com/comment/twitter/brianwisti/…",
      "wm-target": "https://randomgeekery.org/2020/01/19/restructuredtext-basics-for-blogging/"
    },
    ⋮
  ],
  "name": "Webmentions",
  "type": "feed"
}
Right. That’s Bridgy catching a Twitter thread. At least I can see the full conversation from my site. Or I will once I’m done fixing everything.
Bonus: checking for reactions to a specific post
I could get a JF2 feed for specific URLs on my site if I were so inclined.
http get https://webmention.io/api/mentions.jf2 \
target==https://randomgeekery.org
{
  "children": [
    {
      "author": {
        "name": "",
        "photo": "",
        "type": "card",
        "url": ""
      },
      "mention-of": "https://randomgeekery.org",
      "published": null,
      "type": "entry",
      "url": "https://kevq.uk/blogroll/",
      "wm-id": 796241,
      "wm-private": false,
      "wm-property": "mention-of",
      "wm-received": "2020-05-14T11:25:47Z",
      "wm-source": "https://kevq.uk/blogroll/",
      "wm-target": "https://randomgeekery.org"
    }
  ],
  "name": "Webmentions",
  "type": "feed"
}
I deal with my site reactions in bulk so they can be incorporated into the Hugo build. This could be handy for JavaScript-driven updates showing reactions received since the site was last built and pushed, though.
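If I ever go down that road, the same query is easy from Python too. A rough sketch, using one of my post URLs as an example target and mirroring the token-less HTTPie example above:

import requests

# Fetch reactions to one specific page. A target query matches the
# HTTPie example above, which didn't need a token.
post_url = "https://randomgeekery.org/2020/02/17/python-invoke/"
response = requests.get(
    "https://webmention.io/api/mentions.jf2",
    params={"target": post_url},
)

for entry in response.json()["children"]:
    print(entry["wm-property"], entry["url"])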
Rebuilding the local mentions file
Now I want to take what I learned about the API to build a local copy of my site’s mention history. Let’s step away from HTTPie and the command line before I try something dangerous.
The Requests library for Python can help me build one list of Webmentions.
import json
import os
import time

import requests


def rebuild_full_feed(domain: str, token: str, target_file: str) -> None:
    endpoint = "https://webmention.io/api/mentions.jf2"
    page_size = 100
    all_entries = []
    page_index = 0

    while True:
        params = {
            "domain": domain,
            "token": token,
            "page": page_index,
            "per-page": page_size,
            "sort-dir": "up",
        }
        r = requests.get(endpoint, params=params)
        this_page = r.json()
        entries = this_page["children"]
        all_entries += entries
        page_index += 1
        entry_count = len(entries)
        print(f"Added {entry_count} entries")

        # Be a polite Internet citizen
        time.sleep(1)

        # Stop when we're done
        if entry_count < page_size:
            print("Reached end of feed")
            break

    with open(target_file, "w") as f:
        json.dump(all_entries, f, indent=4)

    print(f"Wrote {len(all_entries)} entries to {target_file}")


if __name__ == "__main__":
    domain = "randomgeekery.org"
    token = os.environ["WEBMENTION_KEY"]
    target_file = "mentions.jf2"
    rebuild_full_feed(domain, token, target_file)
$ python rebuild-mentions-feed.py
Added 100 entries
Added 100 entries
Added 100 entries
Added 83 entries
Reached end of feed
Wrote 383 entries to mentions.jf2
This gives me the first requirement for rebuilding my mentions: an intact historical record, up to the current moment.
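As a quick sanity check on that record, here’s a small sketch that loads the file the script just wrote and tallies entries by reaction type:

import json
from collections import Counter

# The script above wrote a plain JSON array of entries.
with open("mentions.jf2") as f:
    entries = json.load(f)

# Count how many likes, reposts, replies, and mentions I have.
counts = Counter(entry["wm-property"] for entry in entries)
for prop, count in counts.most_common():
    print(f"{prop}: {count}")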
Intermission
Time to pause.
Why? Why not just use this JSON as a Hugo data file and let Hugo filter out the relevant mentions for each post?
Short answer: that’s how I got myself into this mess in the first place. I want to take a more careful approach this time.
I have some ideas. Spoiler alert: sqlite-utils is involved. Again.
But first I’ll push this and get ready for bed so I’m at least a little rested for work.