Coding for the Web: A Proposal for Better Inline Syntax Highlighting

Monday, March 14, 2011.

Markdown is great for semi-structured text. Pygments is great for syntax highlighting. This blog uses both: jekyll+liquid passes code snippets surrounded by {% highlight languageX %} and {% endhighlight %} to pygments. The rest gets processed with markdown.

So is there anything to complain about here? As usual, the answer is yes.

  • The {% highlight languageX %} syntax isn't supported by github's default markdown renderer. So if I use it in the README.md file for a project, the tags will appear literally around the un-highlighted output. This may well confuse the hell out of someone trying to copy and paste some code or shell commands. If the github guys don't want to pay the penalty for parsing and syntax highlighting in markdown everywhere, I completely understand. But then let's try to find a relatively inert way of specifying the language of a code snippet.
  • The {% highlight languageX %} syntax is also jekyll+liquid-specific. I don't see support for this elsewhere. I do see people rolling their own syntax like this one, which was later incorporated into the python markdown package, and looks for code snippets surrounded by [sourcecode:languageX] [/sourcecode]. It's similarly deficient in that code snippets must be surrounded by special beginning and ending tokens that will be confusing if emitted literally.
  • The {% highlight languageX %} syntax doesn't actually play nice with markdown code blocks: you can't indent the code snippet with 4 spaces and wrap it with {% highlight languageX %} {% endhighlight %}. You must use no indentation for the snippet. This means a markdown processor that doesn't understand the syntax won't even know to emit an html code element; you'll get plain, wrapped text, probably not in a monospace font. Not good. These unindented code snippets also look like shit in markdown.vim.

To summarize, a better solution:

  • Should be "inert", i.e. not confusing or ugly if output literally as part of the snippet.
  • Should gracefully degrade when a markdown processor doesn't implement special syntax highlighting.
  • Shouldn't need both beginning and ending tags. Just scope the syntax highlighting to the current code block.

On to the specific proposal, which is really nothing fancy or new:

  • Put a shebang line at the beginning of the code block.

In a sense, this is a solved problem.

My editor already does syntax highlighting based on the shebang line, and chances are, so does yours. In many cases it also makes the code snippet more complete: if you're going to copy and paste it into a new script, you're going to add the shebang line anyway. But you could also choose to suppress the shebang line when rendering.
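A minimal sketch of how a renderer might read the language off the shebang line, and optionally drop that line before output. The helper names `language_from_shebang` and `strip_shebang` are hypothetical, made up for illustration, and only the Python standard library is used. It assumes the common `#!/path/to/interpreter` and `#!/usr/bin/env interpreter` forms and ignores more exotic ones.

```python
import os
import re

# Hypothetical helpers for illustration; not taken from any library.
_SHEBANG = re.compile(r"^#!\s*(\S+)(?:\s+(\S+))?")


def language_from_shebang(snippet):
    """Return the interpreter named on the snippet's shebang line, or None."""
    first_line = snippet.split("\n", 1)[0]
    match = _SHEBANG.match(first_line)
    if match is None:
        return None
    program, argument = match.groups()
    name = os.path.basename(program)
    # "#!/usr/bin/env python" names the interpreter in the first argument.
    if name == "env" and argument:
        name = os.path.basename(argument)
    # Drop a trailing version suffix, e.g. "python2.7" becomes "python".
    return re.sub(r"[\d.]+$", "", name) or None


def strip_shebang(snippet):
    """Return the snippet without its shebang line, if it has one."""
    if language_from_shebang(snippet) is None:
        return snippet
    parts = snippet.split("\n", 1)
    return parts[1] if len(parts) > 1 else ""
```

The returned name could then be handed to a highlighter, for instance pygments' `get_lexer_by_name`, though interpreter names and lexer aliases won't always line up, so a small lookup table may be needed in practice. A snippet without a shebang simply yields `None` and can be rendered as a plain code block.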

Another solution might be to use a modeline. In either case, you're embedding information in a language-specific comment, and doing so in a way that already has precedent.
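For the modeline variant, a rough sketch might look like the following. It assumes only two common spellings, a vim modeline carrying `ft=`, `filetype=` or `syntax=`, and an emacs file-variable line with an explicit `mode:`. It also simplifies where the editors actually look by checking the first and last few lines of the snippet. The function name `language_from_modeline` and the `lines_to_check` parameter are hypothetical.

```python
import re

# Vim style: "# vim: set ft=python:" or "# vim: ts=4 filetype=sh"
_VIM_MODELINE = re.compile(r"\bvim?:.*?\b(?:ft|filetype|syntax)=([\w+-]+)")
# Emacs style: "# -*- mode: ruby; coding: utf-8 -*-"
_EMACS_MODELINE = re.compile(r"-\*-.*?\bmode:\s*([\w+-]+).*?-\*-")


def language_from_modeline(snippet, lines_to_check=5):
    """Return the language named in a vim or emacs modeline, or None."""
    lines = snippet.splitlines()
    # Simplification: look near the top and the bottom of the snippet.
    candidates = lines[:lines_to_check] + lines[-lines_to_check:]
    for line in candidates:
        for pattern in (_VIM_MODELINE, _EMACS_MODELINE):
            match = pattern.search(line)
            if match:
                return match.group(1).lower()
    return None
```

Because the modeline lives in a comment, a processor that knows nothing about it just emits it as part of the snippet.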

Here's an example code snippet from the documentation for python-percentcoding:

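#!/usr/bin/env python
from percentcoding import quote, unquote

text = "This is a test!"  # named "text" to avoid shadowing the built-in str
escaped = quote(text)
print(escaped)
# Decoding the escaped string should give back the original.
assert text == unquote(escaped)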

I've already implemented the proposal in Python here, in order to generate html documentation for pypi. It would need to be ported over to Ruby for use in jekyll+liquid.
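To give a feel for the moving parts, here is a rough sketch (not the implementation linked above) of how a processor might pick out 4-space-indented code blocks from markdown source and note which ones start with a shebang line. The function name `indented_code_blocks` is hypothetical, and the block detection is deliberately naive: it ignores tabs, list-item continuations, and the rule that a code block can't interrupt a paragraph.

```python
def indented_code_blocks(markdown_text):
    """Return (shebang_or_None, code) for each 4-space-indented block."""
    blocks = []
    current = []

    def flush():
        # Blank lines at the end of a block belong to the surrounding text.
        while current and not current[-1]:
            current.pop()
        if current:
            first = current[0]
            shebang = first if first.startswith("#!") else None
            blocks.append((shebang, "\n".join(current)))
        del current[:]

    for line in markdown_text.splitlines():
        if line.startswith("    "):
            current.append(line[4:])  # strip the markdown indentation
        elif not line.strip():
            if current:
                current.append("")  # a blank line may sit inside a block
        else:
            flush()  # ordinary text ends the current block
    flush()
    return blocks
```

Blocks that come back with a shebang can be sent to the highlighter, and everything else falls through to the ordinary markdown code block, which is the graceful degradation asked for above.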

Posted by Alan on Monday, March 14, 2011.
