Collecting my attempts to improve at tech, art, and life

Using markdown-it in Python

Tags: python markdown

NOTE

I hope the information I share here is useful, but I’m just documenting what I needed to know after reading the markdown-it-py user guide for basic usage and principles.

What is it

markdown-it-py is a Python markdown library based on markdown-it from the JavaScript world. markdown-it-py provides the core flexibility needed by MyST Markdown, a particularly capable Markdown flavor for the Executable Book Project.

markdown-it-py is configurable, extensible, and — most important for me today — not too hard to get started with.

How do I install it

markdown-it-py alone gets you “enough”. Everything you need for CommonMark, at least. But I want more than enough. I want all the features I can reasonably gather under one install.

$ pip install "markdown-it-py[linkify,plugins]"

What did that just install?

markdown-it-py
provides core markdown handling sufficient for common expected behavior
linkify-it-py
enables recognition of URLs embedded in text strings; needs additional plugins to do anything with those URLs
mdit-py-plugins
provides a collection of core plugins that make markdown-it-py useful to a feature-happy person such as myself

How do I use it

from markdown_it import MarkdownIt

markdown = "Hello, **world**"
md = MarkdownIt()
print(md.render(markdown))
<p>Hello, <strong>world</strong></p>

But I need to be just a little fancier than “Hello World.” Let’s build a little Typer application that takes a markdown path and makes an HTML fragment. While I’m at it, I can borrow from my neovim rplugin to fit everything into my Hugo site.

import typer
from markdown_it import MarkdownIt

def make_html(markdown):
    """Return HTML string rendered from markdown source."""

    md = MarkdownIt()

    return md.render(markdown)


def main(source_path: str):
    """Transforms markdown into HTML with markdown-it-py."""

    target_path = source_path.replace(".md.txt", ".html")

    with open(source_path, encoding="utf-8") as fp:
        markdown = fp.read()

    html = make_html(markdown)

    with open(target_path, "w", encoding="utf-8") as fp:
        fp.write(html)


if __name__ == "__main__":
    typer.run(main)

For the moment I’ll rely on Python to let me know if I try rendering a Markdown file that doesn’t exist.

This will turn a single .md.txt file - the extension I’m using to slide past Hugo’s default Markdown handling - into HTML. I even get a nice --help blurb.

$ python rendermd.py --help

Usage: rendermd.py [OPTIONS] SOURCE_PATH

  Transforms markdown into HTML with markdown-it-py.

Arguments:
  SOURCE_PATH  [required]

Options:
  --install-completion [bash|zsh|fish|powershell|pwsh]
                                  Install completion for the specified shell.
  --show-completion [bash|zsh|fish|powershell|pwsh]
                                  Show completion for the specified shell, to
                                  copy it or customize the installation.
  --help                          Show this message and exit.
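One caveat about the target_path line in that script: str.replace swaps every occurrence of ".md.txt", including one hiding in a directory name. It also returns the path unchanged when the extension is missing, so the script would write HTML over its own source file. Here's a minimal sketch of a stricter version, assuming Python 3.9 or later for removesuffix. target_path_for is a name I made up for illustration, not part of the script above.

```python
def target_path_for(source_path: str) -> str:
    """Swap a trailing .md.txt for .html, refusing anything else."""
    suffix = ".md.txt"

    if not source_path.endswith(suffix):
        raise ValueError(f"expected a {suffix} file, got {source_path!r}")

    # removesuffix only touches the end of the string (Python 3.9+)
    return source_path.removesuffix(suffix) + ".html"
```

Dropping it in would mean replacing the first line of main with target_path = target_path_for(source_path).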

Now I start adding capabilities. If you see this post on the site, you’ll know it worked.

Picking a preset for common patterns

markdown-it-py provides predefined parser presets, allowing you to choose between common parsing patterns.

commonmark
the default; sticks with the core CommonMark specification; probably good enough for 80% of the Markdown that gets written
gfm-like
similar to GitHub-flavored Markdown; better if you need tables and URL transformation
js-default
similar to markdown-it base behavior; turns on every built-in rule, so tables and strikethrough work, and typographical replacements like “smart quotes” are available once you enable typographer
zero
basically just breaks text into paragraphs; provides a bare minimum for you to build a highly custom Markdown parser

I went with js-default for my own baseline because it enables the most core features.

def make_html(markdown):
    """Return HTML string rendered from markdown source."""

    md = MarkdownIt("js-default")

    return md.render(markdown)
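A quick way to see what a preset changes is to render the same source under two of them. This is only a sketch: the TABLE sample is made up, and I'm comparing commonmark with js-default because both construct without any extras installed.

```python
from markdown_it import MarkdownIt

# A small pipe table; CommonMark itself has no table syntax
TABLE = """\
| preset | tables |
| ------ | ------ |
| demo   | yes    |
"""

commonmark_html = MarkdownIt("commonmark").render(TABLE)
js_default_html = MarkdownIt("js-default").render(TABLE)

# commonmark leaves the pipes as paragraph text; js-default builds a table
print("<table>" in commonmark_html)  # False
print("<table>" in js_default_html)  # True
```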

Using options to tune your parser

Even with presets available, there are common tweaks that some folks can’t live with and others can’t live without. markdown-it-py wraps those up in a single dictionary of options.

maxNesting
recursion protection; think of it as a number for “how fancy can I get with my Markdown?”
html
allow raw HTML through
linkify
transform URLs into links
typographer
processes assorted typographic conventions including proper quote marks
quotes
what double and single quotes look like if you enable typographer
xhtmlOut
ensure output is valid in the ancient XHTML dialect
breaks
treat line breaks in source as <br> elements
langPrefix
CSS class prefix for code blocks; language-python, language-console, etc.
highlight
a function to provide syntax highlighting for code blocks

Presets have default values for each of these options.

Option       commonmark   gfm-like    js-default   zero
maxNesting   20           20          100          20
html         enabled      enabled     -            -
linkify      -            enabled     -            -
typographer  -            -           -            -
quotes       “”‘’         “”‘’        “”‘’         “”‘’
xhtmlOut     enabled      enabled     -            -
breaks       -            -           -            -
langPrefix   language-    language-   language-    language-
highlight    -            -           -            -
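Rather than trust a table, you can ask a parser what its preset gave it. A minimal sketch, assuming md.options behaves like a dictionary (it does in the versions I've tried). preset_defaults is a hypothetical helper name, and I left gfm-like out of the loop because that preset needs linkify-it-py installed just to construct.

```python
from markdown_it import MarkdownIt

# The boolean and numeric options from the list above
OPTIONS = ("maxNesting", "html", "linkify", "typographer", "xhtmlOut", "breaks")


def preset_defaults(preset: str) -> dict:
    """Return the option values a preset starts with."""
    md = MarkdownIt(preset)

    return {name: md.options[name] for name in OPTIONS}


for preset in ("commonmark", "js-default", "zero"):
    print(preset, preset_defaults(preset))
```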

I like fancy quotes. I expect URLs to display as links. I occasionally need to fall back to raw HTML. But most importantly on this here blog: I insist on syntax highlighting.

Adding a highlight function

Rather than decide for themselves how syntax highlighting is done, the markdown-it-py folks added a single option for us to hook in a function using our preferred approach.

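The highlight function should take three arguments:

code
the text of the code block
name
the language name from the start of the fence's info string
attrs
anything else in the info string after the language name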

You could use whatever highlighting code you want. You could even have your function call out to an external program. I use Pygments because it’s familiar.

Also, I’m going to ignore attrs for now. I rarely add special options to my code samples, so it’s kind of wasted on me. So far.

Better make a note or something in case I forget that I’m ignoring it.

def highlight_code(code, name, attrs):
    """Highlight a block of code"""

    if attrs:
        rich.print(f"Ignoring {attrs=}")

    lexer = get_lexer_by_name(name)
    formatter = HtmlFormatter()

    return highlight(code, lexer, formatter)
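One thing to watch with that function: get_lexer_by_name raises ClassNotFound for any name Pygments doesn't recognize, so a typo in a fence's language would stop the whole render. Here's a hedged sketch of a more forgiving variant. highlight_code_safe is a hypothetical name, and falling back to TextLexer is my assumption about what "good enough" looks like, not something the markdown-it-py guide prescribes.

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import TextLexer, get_lexer_by_name
from pygments.util import ClassNotFound


def highlight_code_safe(code, name, attrs):
    """Highlight a block of code, falling back to plain text."""

    try:
        lexer = get_lexer_by_name(name)
    except ClassNotFound:
        # Unknown or empty language name: no colors, but still escaped HTML
        lexer = TextLexer()

    return highlight(code, lexer, HtmlFormatter())
```

It takes the same three arguments, so it can be passed as the highlight option in place of highlight_code.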

Okay, what’s make_html look like now, with options set and highlighting function defined?

def make_html(markdown):
    """Return HTML string rendered from markdown source."""

    md = MarkdownIt(
        "js-default",
        {
            "linkify": True,
            "html": True,
            "typographer": True,
            "highlight": highlight_code,
        },
    )

    return md.render(markdown)
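To check that the options do what I think, I can render one line with and without them. A small sketch with a made-up SOURCE string. It leaves linkify and highlight out so it runs with nothing but markdown-it-py installed.

```python
from markdown_it import MarkdownIt

SOURCE = '"Quoted" text with <b>raw HTML</b>'

# Preset defaults: straight quotes stay straight, raw HTML gets escaped
plain = MarkdownIt("js-default").render(SOURCE)

# Same preset with two options switched on
tuned = MarkdownIt("js-default", {"html": True, "typographer": True}).render(SOURCE)

print(plain)
print(tuned)
```

The second line of output should show curly quotes and an untouched <b> element.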

Still a few pieces missing from my minimal toolkit. I need to dig a little deeper than I planned for a “hey friends, markdown-it-py looks like fun” post. But I at least want to render the kind of posts I would write.

For that I need to use some plugins. Good thing I installed mdit-py-plugins.

Adding parser functionality with plugins

mdit-py-plugins bundles many plugins into a single library. Today I need no plugins beyond what that library provides.

definition lists

Description lists have been core to my HTML authoring flow since forever. I’ll write the HTML myself if I have to. Fortunately, I don’t have to, thanks to the deflist plugin.

from mdit_py_plugins import deflist

def make_html(markdown):
    # ...
    md.use(deflist.deflist_plugin)

    return md.render(markdown)

Now I can write a definition list:

[`markdown-it-py`][markdown-it-py]
: provides core markdown handling sufficient for common expected behavior

[`linkify-it-py`][linkify-it-py]
: enables recognition of URLs embedded in text strings; needs additional plugins
  to *do* anything with those URLs

[`mdit-py-plugins`][mdit-py-plugins]
: provides a collection of core plugins that make `markdown-it-py` useful to a 
  feature-happy person such as myself

And markdown-it-py produces a proper description list:

<dl>
    <dt><a href="https://pypi.org/project/markdown-it-py/"><code>markdown-it-py</code></a></dt>
    <dd>provides core markdown handling sufficient for common expected behavior</dd>
    <dt><a href="https://pypi.org/project/linkify-it-py/"><code>linkify-it-py</code></a></dt>
    <dd>enables recognition of URLs embedded in text strings; needs additional plugins
    to <em>do</em> anything with those URLs</dd>
    <dt><a href="https://pypi.org/project/mdit-py-plugins/"><code>mdit-py-plugins</code></a></dt>
    <dd>provides a collection of core plugins that make <code>markdown-it-py</code> useful to a
    feature-happy person such as myself</dd>
</dl>

admonition blocks

So about a week ago, I was writing the first version of this post. I was nearly done. Then I got a little too tired and deleted the wrong file — without adding it to the repo first!

Oops.

Anyways, this redraft is less of a tutorial and more of a notes dump. I want to warn folks about that with a little blurb at the top. I can use the container plugin for that. The plugin provides slots for validation and deeper processing. All I want today is a <div> with a custom class. I can use CSS for the rest.

Using the container plugin with a name option provides that much.

from mdit_py_plugins import container, deflist

def make_html(markdown):
    # ...
    md.use(container.container_plugin, name="note")

    return md.render(markdown)

A note container looks like this in the markdown:

::: note
This is my *note*.
:::

Without any additional configuration, it produces this HTML:

<div class="note">
    <p>This is my <em>note</em>.</p>
</div>

Good enough!

Stopping here because it’s good enough for what I wrote so far today. But there is plenty more to explore. markdown-it-py allows reviewing and manipulating parsed tokens directly. Plus there’s the whole MyST Markdown thing to explore.
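For a taste of the token side, md.parse returns the token stream that render would otherwise turn into HTML. A minimal sketch that only reads a few fields from each token:

```python
from markdown_it import MarkdownIt

md = MarkdownIt("js-default")
tokens = md.parse("Hello, **world**")

# Block-level tokens: an opening tag, the inline content, a closing tag
for token in tokens:
    print(token.type, token.tag, token.nesting)

# The inline token carries the text and emphasis tokens as children
inline = tokens[1]
print([child.type for child in inline.children])
```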

But for now we’re good. Let me drop in the Python code that transformed this post in the context of my Hugo site, then go convert the Typer logic to Invoke.

#!/usr/bin/python

import frontmatter
import rich
import typer
from markdown_it import MarkdownIt
from mdit_py_plugins import container, deflist
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_by_name


def highlight_code(code, name, attrs):
    """Highlight a block of code"""

    if attrs:
        rich.print(f"Ignoring {attrs=}")

    lexer = get_lexer_by_name(name)
    formatter = HtmlFormatter()

    return highlight(code, lexer, formatter)


def make_html(markdown):
    """Return HTML string rendered from markdown source."""

    md = MarkdownIt(
        "js-default",
        {
            "linkify": True,
            "html": True,
            "typographer": True,
            "highlight": highlight_code,
        },
    )
    md.use(deflist.deflist_plugin)
    md.use(container.container_plugin, name="note")

    return md.render(markdown)


def main(source_path: str):
    """Transforms markdown into HTML with markdown-it-py."""

    target_path = source_path.replace(".md.txt", ".html")
    post = frontmatter.load(source_path)
    post.content = make_html(post.content)
    post.metadata["format"] = "md"

    with open(target_path, "w", encoding="utf-8") as fp:
        fp.write(frontmatter.dumps(post))


if __name__ == "__main__":
    typer.run(main)


Added to vault 2024-01-15. Updated on 2024-02-02