What flavor do you want?

July 24, 2018

One of the defining properties of Markdown is its wide variety of implementations.¹ Because there are so many versions of Markdown from which to choose, an important part of my exercise writing a Markdown-aware text editor is to figure out which Markdown standard to use and thus which features the app will boast.

CommonMark provides an impressively complete specification…

A good starting point for thinking about Markdown implementations is a look at the CommonMark spec. CommonMark’s documentation completely and unambiguously defines a Markdown standard based on the core features from the original Markdown spec, plus a few (like link references) that were not a part of the original Markdown specification.

CommonMark’s documentation goes to great lengths to define the behavior of even the most extreme of edge cases,² which is the mark of a great spec.

CommonMark does more than provide a great specification of a common Markdown standard: its creators also describe a strategy for parsing CommonMark documents. This strategy is meant for implementing a CommonMark parser, but a careful developer can use it as a guideline when creating a parser for other flavors of Markdown, namely more feature-rich ones.
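The strategy described in the CommonMark spec works in two phases: first the block structure of the document is established, then the raw text of paragraphs and headings is parsed for inline elements. Below is a minimal sketch of that shape in Python, handling only ATX headings, paragraphs, and `*emphasis*`. The names `Block`, `parse_blocks`, `parse_inlines`, and `parse` are hypothetical, and the regular expressions are far cruder than the rules in the spec.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Block:
    kind: str                 # "heading" or "paragraph"
    raw: str                  # unparsed text collected in phase 1
    level: int = 0            # heading level, 0 for paragraphs
    inlines: list = field(default_factory=list)  # filled in phase 2


HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
EMPHASIS = re.compile(r"\*([^*]+)\*")


def parse_blocks(text):
    """Phase 1: split the input into blocks, leaving inline text raw."""
    blocks, paragraph = [], []

    def flush():
        # Close the paragraph being collected, if there is one.
        if paragraph:
            blocks.append(Block("paragraph", " ".join(paragraph)))
            paragraph.clear()

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            blocks.append(Block("heading", match.group(2), len(match.group(1))))
        elif line.strip() == "":
            flush()
        else:
            paragraph.append(line.strip())
    flush()
    return blocks


def parse_inlines(raw):
    """Phase 2: turn raw text into ("text", s) and ("emph", s) tokens."""
    tokens, pos = [], 0
    for match in EMPHASIS.finditer(raw):
        if match.start() > pos:
            tokens.append(("text", raw[pos:match.start()]))
        tokens.append(("emph", match.group(1)))
        pos = match.end()
    if pos < len(raw):
        tokens.append(("text", raw[pos:]))
    return tokens


def parse(text):
    """Run both phases and return the blocks with their inline tokens."""
    blocks = parse_blocks(text)
    for block in blocks:
        block.inlines = parse_inlines(block.raw)
    return blocks
```

Keeping the phases separate is what makes the approach a plausible guideline for richer flavors: new block types would slot into the first phase and new inline syntax into the second, without either phase needing to know about the other.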

…but kramdown has such attractive features…

kramdown is perhaps the most feature-rich Markdown interpreter, or at least the most popular among extended Markdown implementations.

kramdown is compatible with many of the less extended Markdown standards, and it provides features that CommonMark does not. Tables, footnotes, and attribute lists are a few such examples.³

Additionally, Jekyll, a static site generator popularized by GitHub Pages, uses kramdown as its default Markdown processor, and GitHub Pages supports only kramdown.

This makes kramdown a well-endorsed and quite popular standard. It also means that I have grown accustomed to using kramdown’s features, and I am convinced that many of them should be considered “standard” Markdown features.⁴

…and I want a feature-rich standard that is well-defined…

There are no perfect Markdown flavors. CommonMark has a complete spec but is missing great features (e.g. footnotes). kramdown has a wide and useful feature set, but its documentation lacks enough depth to meaningfully aid me if I implement my own kramdown parser.

The creators of kramdown have published a syntax document, but it is merely an overview compared to the CommonMark spec. The kramdown syntax documentation barely touches on how it handles edge cases, and it certainly does not provide as many examples as the CommonMark spec.

The complete kramdown spec is a consequence of its parser code. The kramdown syntax documentation does not try to unambiguously define a specification the way CommonMark’s documentation does, which means articulating the complete kramdown spec requires reading, running, and testing the kramdown processor code.

If CommonMark has the specification completeness that I need for developing a Markdown interpreter, but kramdown is what I and plenty of others actually use, what is the solution?

…so I will implement a CommonMark parser, then attempt to implement a kramdown parser.

It seems like the best path forward is to start by implementing a CommonMark parser, simply because the CommonMark documentation is more verbose and complete than the kramdown documentation.

Once I have successfully implemented a CommonMark parser in the app, I should have a good enough understanding of how the parser functions to properly add kramdown features on top of it and modify existing behaviors to conform to the kramdown spec.⁵

Perhaps I can even make both CommonMark and kramdown parsing available to users by allowing them to choose the standard in which they’re writing. My initial impression of this idea, however, is that it will unnecessarily complicate the user experience.
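One way to picture that option is a flavor setting that decides which block rules are active, with the kramdown-only rules layered on top of the shared ones. The following is only a sketch of the idea in Python: `Flavor`, `rules_for`, `classify_line`, and the rule tables are hypothetical names, and the patterns for footnote definitions and attribute lists are simplifications of kramdown's syntax, not its actual grammar.

```python
import re
from enum import Enum


class Flavor(Enum):
    COMMONMARK = "commonmark"
    KRAMDOWN = "kramdown"


# Rules shared by both flavors (only one is shown here).
BASE_RULES = [
    ("heading", re.compile(r"^#{1,6}\s")),
]

# Extra rules that are only tried when the user picks kramdown.
KRAMDOWN_RULES = [
    ("footnote_definition", re.compile(r"^\[\^[^\]]+\]:")),
    ("attribute_list", re.compile(r"^\{:.*\}\s*$")),
]


def rules_for(flavor):
    """Return the ordered rule list for the selected flavor."""
    if flavor is Flavor.KRAMDOWN:
        return KRAMDOWN_RULES + BASE_RULES
    return BASE_RULES


def classify_line(line, flavor):
    """Name the first matching rule, falling back to plain paragraph text."""
    for name, pattern in rules_for(flavor):
        if pattern.match(line):
            return name
    return "paragraph"
```

With this layout, a line such as `[^1]: A note.` is classified as a footnote definition under `Flavor.KRAMDOWN` and as ordinary paragraph text under `Flavor.COMMONMARK`. A real parser would also have to handle the places where the two flavors disagree about shared syntax, which a simple additive rule list does not capture.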

My thoughts on a unifying Markdown standard

As I was researching for this part of development, I skimmed an article from 2014 entitled “The State of Markdown,” in which the author seems pretty adamant that the diversity of standards for Markdown is a Really Bad Thing and that people really need to hop on the CommonMark train so that we can all have one unifying standard.

Whether one unifying Markdown standard can or will exist, I am unsure. Whether Markdown even needs such a unifying standard is a question I think is worth asking. It’s a markup language with relatively informal roots.⁶ In its current form, it just doesn’t feel like the kind of thing that needs a unifying standard.

Without the addition of key features like footnotes, it seems to me unlikely that CommonMark will become the Markdown standard, and it does not appear as though any such additions will happen anytime soon, for the following reasons:

Since the standard was published in 2014, no major features have been added.⁷
The list of proposed changes is bloated and has only ever grown.⁸
The finalized 1.0 spec is set to be released later this year.

If CommonMark can’t become the common Markdown standard, I think it’s very unlikely that any Markdown standard will rise to the top in the near future.

¹ This huge variety of implementations is worth exploring in more depth. The proliferation of differing Markdown standards is unique; with other markup languages, you see different interpreters adopting slightly different standards, but not with such strong disagreement between the standards.

You see this with HTML; browsers render certain minor or newly developed elements differently. In addition, some modern browsers adopt different standards for novel CSS selectors. The biggest disagreement between browsers concerns the parts of the standard that are “under development.” With every other part of the standard, there is very good agreement.

This is not the case with Markdown; even the fundamentals of how to interpret a list differ between major standards.

Part of what may explain this is the existence of a standards body, W3C, for web languages like HTML. The existence of W3C puts pressure on browsers to adopt the common standards that other browsers are using, resulting in good agreement between browsers on how to render most webpages.

W3C has not chosen to standardize Markdown the way it has standardized HTML and CSS, and I think it is safe to assume that this strongly correlates with the large number of Markdown standards that exist. I’d have to look further into this to assert that such is truly the case.
² Over 20 percent of the examples provided in the CommonMark spec are dedicated to explaining how emphasis and strong emphasis indicators are interpreted. Additionally, 17 rules defining the use of emphasis are outlined in the section.
³ While I have yet to create a table with Markdown, it’s comforting to know it’s an option. I use attribute lists mostly for styling images in my posts, and as you can see, I use footnotes very regularly.
⁴ For example, even John Gruber, the inventor of Markdown, uses footnotes on his blog, yet they are not included in his Markdown spec. Footnotes are not specified in the CommonMark spec, either.
⁵ Or, more realistically, I will have to either write a whole new parser for kramdown or make (or find) a wrapper for the existing kramdown Ruby parser. Again, I expect this to be easier once I have the experience of implementing a CommonMark parser.
⁶ As the authors of the CommonMark spec point out, the canonical description of Markdown is buggy and ambiguous.
⁷ Refer to the specification’s change log.
⁸ Refer to the histories of each of the two lists of proposed extensions on the CommonMark GitHub wiki.