A new markup language for posts

2013-06-04

Previous versions of these posts (blog and project news) were written mostly in pure HTML, within a small "this is the title, this is the summary" framework. I've now added my own markup language and converted all existing posts.

I've been planning this markup for some time; in fact, it's supposed to become a Perl module Positron::Markup, part of the Positron project. The markup is slightly inspired by Markdown, wiki syntaxes, and the capabilities of rich text editors. I say "slightly" because an important goal of Markdown is that the source text be as human-readable as possible, i.e. resemble a plain text email.

This markup does not have this goal: my demands are low ambiguity, a concise syntax, and fast typing. Also, Markdown is a text-to-HTML converter, while this markup should – at least in theory – first parse to an abstract semantic document tree, and then be rendered to HTML or anything else.

Overview of the syntax

The text is first split into paragraphs. Paragraphs are separated by one or more blank lines. Sometimes one might want to have blank lines within paragraphs, so a single tilde ~ at the beginning of a line is removed; a line consisting of just a ~ thus becomes a blank line inside the paragraph.
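A minimal Python sketch of this splitting step might look like the following. It is only an illustration, not the actual code behind this site, and the function name `split_paragraphs` is made up for the example.

```python
import re


def split_paragraphs(text):
    """Split text on blank lines, then strip one leading tilde per line."""
    paragraphs = []
    # One or more blank (or whitespace-only) lines separate paragraphs.
    for block in re.split(r"\n\s*\n", text.strip("\n")):
        if not block.strip():
            continue
        # Only a single leading "~" is removed, so "~~" yields a literal "~".
        lines = [line[1:] if line.startswith("~") else line
                 for line in block.split("\n")]
        paragraphs.append("\n".join(lines))
    return paragraphs


print(split_paragraphs("one\ntwo\n\n\nthree\n~\nfour"))
# ['one\ntwo', 'three\n\nfour']
```

Because the tilde is stripped only after splitting, a line holding just a ~ never ends a paragraph.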

Most paragraphs are simple text paragraphs, which end up in a <p> tag in HTML. Special paragraphs can be denoted by starting them with =tag, for example =subhead for the header above. These can be a single line (like =subhead A heading), or longer:

=code[lang=html]
<h1>Hello, World!</h1>
~
<p>That will be all</p>

which could become

<pre class="html">
&lt;h1&gt;Hello, World!&lt;/h1&gt;

&lt;p&gt;That will be all&lt;/p&gt;
</pre>

The above shows a blank line, and the options syntax for special paragraphs. In this case, knowing that the code is HTML could be used someday for syntactic coloring. Not today, though.

(Of course, to show you this syntax, I had to wrap the entire thing in its own =code[lang=p_markup] tag, and double up the ~ to ~~, since only one is removed!)
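To make the special-paragraph handling concrete, here is a hedged Python sketch that recognises the =tag[options] header and renders a =code paragraph. The names `parse_paragraph` and `render_code` are hypothetical, and separating several options with commas is an assumption, since only a single option appears above.

```python
import html
import re

# "=tag", optional "[key=value,...]", then the rest of the paragraph.
HEADER = re.compile(
    r"=(?P<tag>\w+)(?:\[(?P<opts>[^\]]*)\])?[ \t]*(?P<rest>.*)", re.DOTALL
)


def parse_paragraph(paragraph):
    """Return (tag, options, body); plain text paragraphs get the tag 'p'."""
    match = HEADER.match(paragraph)
    if match is None:
        return "p", {}, paragraph
    options = {}
    # Assumption: multiple options would be comma-separated.
    for pair in filter(None, (match["opts"] or "").split(",")):
        key, _, value = pair.partition("=")
        options[key.strip()] = value.strip()
    return match["tag"], options, match["rest"].lstrip("\n")


def render_code(options, body):
    """Render a =code paragraph as an escaped <pre> block."""
    lang = options.get("lang")
    attr = f' class="{html.escape(lang)}"' if lang else ""
    return f"<pre{attr}>\n{html.escape(body, quote=False)}\n</pre>"


tag, options, body = parse_paragraph("=code[lang=html]\n<h1>Hello, World!</h1>")
print(tag, options)
# code {'lang': 'html'}
print(render_code(options, body))
```

Parsing into a (tag, options, body) triple first, and rendering separately, mirrors the idea of going through a document tree before producing HTML.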

Span-level elements

Within a standard text paragraph, first "span level elements" are extracted. I don't have a better name for them, but if paragraphs correspond to <p>s, these correspond to <span>s. Span level elements start with a [, have a tag: at the beginning, and end at the next ].

So far, I've only added links, like [link:/projects the project page] to link to the project page. For fully qualified links, in fact, I've added a shortcut: [http://www.bendeutsch.de/ Ben Deutsch], using the http: as the tag for an HTTP link!
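The two link forms can be sketched with a single regular expression. This is an illustrative Python version under some assumptions: the names `SPAN` and `render_spans` are invented, the target is taken to end at the first whitespace, and unknown tags are simply left untouched.

```python
import html
import re

# "[tag:target text]": the target ends at the first whitespace,
# the element ends at the next "]".
SPAN = re.compile(r"\[(?P<tag>\w+):(?P<arg>[^\s\]]+)\s+(?P<text>[^\]]*)\]")


def render_span(match):
    tag, arg, text = match["tag"], match["arg"], match["text"]
    if tag == "link":
        href = arg
    elif tag == "http":
        # Shortcut form: the tag itself is part of the URL.
        href = f"{tag}:{arg}"
    else:
        return match[0]  # unknown tag: leave the source text as it is
    # The link text is not escaped here so that text-level markup
    # can still be applied to it afterwards.
    return f'<a href="{html.escape(href)}">{text}</a>'


def render_spans(paragraph):
    return SPAN.sub(render_span, paragraph)


print(render_spans("See [link:/projects the project page]."))
# See <a href="/projects">the project page</a>.
print(render_spans("[http://www.bendeutsch.de/ Ben Deutsch]"))
# <a href="http://www.bendeutsch.de/">Ben Deutsch</a>
```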

Text level markup

Finally, at the text level, I have *emphasis*, **strong emphasis**, and `monospaced`. These have a simple form shown above (with no whitespace between the delimiter and the contents) and a complex form **- *- `- nested -` -* -**, which allows for nested tags!

Also, text within span-level elements can be marked up at text level too, giving me links with emphasis!
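As a rough sketch of the text level, the Python below handles the complex form first and the simple form second. Several things here are assumptions and not taken from the real implementation: the HTML tags `<em>`, `<strong>` and `<code>`, the whitespace required inside the complex delimiters, and the name `render_text`. Nesting the same kind of markup inside itself is not handled.

```python
import re

# (delimiter, HTML tag); "**" has to be tried before "*".
MARKS = [("**", "strong"), ("*", "em"), ("`", "code")]


def render_text(text):
    # Complex form: "**- ... -**", "*- ... -*", "`- ... -`".
    for delim, tag in MARKS:
        d = re.escape(delim)
        text = re.sub(rf"{d}-\s+(.+?)\s+-{d}", rf"<{tag}>\1</{tag}>",
                      text, flags=re.DOTALL)
    # Simple form: no whitespace directly inside the delimiters.
    for delim, tag in MARKS:
        d = re.escape(delim)
        text = re.sub(rf"{d}(?=\S)(.+?)(?<=\S){d}", rf"<{tag}>\1</{tag}>",
                      text, flags=re.DOTALL)
    return text


print(render_text("*emphasis*, **strong emphasis**, and `monospaced`"))
# <em>emphasis</em>, <strong>strong emphasis</strong>, and <code>monospaced</code>
print(render_text("**- *- `- nested -` -* -**"))
# <strong><em><code>nested</code></em></strong>
```

Requiring a non-space character right after the opening delimiter is what keeps an expression like 2 * 3 * 4 from turning into emphasis.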

Entities

Finally, I need a way to escape the syntax when I want to use those characters as actual text, for example to talk about the syntax, as I've been doing for this entire post.

The three variants I know of are doubling up characters like [[, escaping via an escape character like \[, and using named entities like &amp; in HTML. I like the last one most, since the character that needs to be escaped does not appear in the entity at all.

I use an exclamation mark as leader, not "&", so I'd write "!lb;" for "[", "!tick;" for "`", "!bang;" for "!", and of course "!bang;bang;" for "!bang;".
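Expanding these entities comes down to one left-to-right substitution pass. The Python sketch below covers only the three entities named above. The name `expand_entities` is hypothetical, and leaving unknown entities untouched is an assumption. Presumably this step would run after the markup itself has been parsed, so that an expanded [ is never mistaken for syntax.

```python
import re

# Only the entities mentioned in the text; a real table would hold more.
ENTITIES = {"lb": "[", "tick": "`", "bang": "!"}


def expand_entities(text):
    def replace(match):
        # Unknown names are kept verbatim (an assumption).
        return ENTITIES.get(match[1], match[0])
    # A single pass: replaced text is never scanned again, which is
    # why "!bang;bang;" ends up as the literal text "!bang;".
    return re.sub(r"!(\w+);", replace, text)


print(expand_entities("!lb;link:/projects the project page]"))
# [link:/projects the project page]
print(expand_entities("!bang;bang;"))
# !bang;
```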

The future

This markup will eventually land in Positron, where I hope to find a way to express the various paragraph and span level types with templating markup.

As for this site, the next addition will most likely be something like

=figure
src: /images/example.png
size: 200 x 200
alt: An example of the figure tag
caption: An example of the `figure` tag

And if that works, we can finally get some images here!