Pager now has links and code blocks

After using pager to write the last article, I found some shortcomings. First of all, I had to put HTML tags in the middle of normal text to create code blocks and links, and that made for some ugly markdown files. So I decided to extend pager to accept code and link constructs.

The first problem I faced when extending pager to accept constructs inside a paragraph is that pager has no syntax that marks what a paragraph is: anything that is not a header is parsed as a paragraph. So if I had put the link parser at the same level as the paragraph/header parser and written a link inside a paragraph, it would spit out this HTML: <p>... parser and put a link ...</p><a href="">example</a><p>, it would spit out this html...</p>. That's not what we want. We want the link inside the paragraph, like this: <p>... parser and put a link ... <a href="">example</a>, it would spit out this html ... </p>. So I needed to come up with another way to parse links that sit inside a paragraph.

After giving the problem more thought, I came up with the idea of a two-pass parser. The first pass emits the headers and paragraphs as HTML but leaves the text inside each paragraph untouched, so the link and code constructs are still there. A second parser then runs on top of this semi-parsed output. Obviously, to get this working we needed to modify our datatypes. We need a new Tag constructor for every new construct, so the Tag datatype definition is now: data Tag = Head Header | Para Paragraph | Lin Link | Lit Literal | Cod Code. The difference from the first version is the last three constructors: Lin, which takes a Link; Lit, which takes a Literal (basically anything in the second pass that isn't a Code or a Link); and Cod, which takes a Code.
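To make the datatypes concrete, here is a minimal sketch of how they could look. Only the Tag declaration comes from pager itself. The payload types (Header, Paragraph, Link, Literal, Code), the firstPass function, and the # header syntax are assumptions made for illustration, and the sketch returns a list of Tag values instead of writing HTML directly.

```haskell
-- Hypothetical payload types: the post names them but does not show them.
data Header = Header Int String deriving (Show, Eq)      -- level and title (assumed)
newtype Paragraph = Paragraph String deriving (Show, Eq) -- raw text, not yet parsed
data Link = Link String String deriving (Show, Eq)       -- label and URL (assumed)
newtype Literal = Literal String deriving (Show, Eq)
newtype Code = Code String deriving (Show, Eq)

-- The Tag type as given in the post.
data Tag = Head Header | Para Paragraph | Lin Link | Lit Literal | Cod Code
  deriving (Show, Eq)

-- First pass (sketch): every non-empty line is either a header or a paragraph.
-- The text inside a paragraph is kept untouched for the second pass.
-- Assumption: headers start with one or more '#' characters.
firstPass :: String -> [Tag]
firstPass = map classify . filter (not . null) . lines
  where
    classify line =
      case span (== '#') line of
        (hashes@(_ : _), rest) ->
          Head (Header (length hashes) (dropWhile (== ' ') rest))
        _ -> Para (Paragraph line)
```

Note that the link in a paragraph survives the first pass as plain text inside Para, which is exactly what lets the second pass place it inside the paragraph.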

The second stage of the parser is easy to understand. First we try to parse a literal, which consumes input up to the first character in reservedCharacters. The reservedCharacters are the characters that start a new construct, in this case [ and ~. If the parser finds one of these characters, it tries to parse a Link, and if that fails it tries to parse a Code construct. If both fail, the character wasn't matched by its respective construct, so the parser consumes the character and tries to match a literal again. That's it, those are the new features in pager.
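The second pass can be sketched roughly like this, written by hand with only the base library, so it is not pager's actual code. The name reservedCharacters and the constructors Lin, Lit and Cod come from the post. Everything else is an assumption: the concrete syntax ([label](url) for links, ~code~ for code), the payload types, and the function names. Only the three inline constructors of Tag are repeated so the block stands alone.

```haskell
import Control.Applicative ((<|>))

-- Hypothetical payload types; only the inline part of Tag is repeated here.
data Link = Link String String deriving (Show, Eq) -- label and URL (assumed)
newtype Literal = Literal String deriving (Show, Eq)
newtype Code = Code String deriving (Show, Eq)
data Tag = Lin Link | Lit Literal | Cod Code deriving (Show, Eq)

-- Characters that start a new construct.
reservedCharacters :: String
reservedCharacters = "[~"

-- Assumed link syntax: [label](url). Returns the tag and the remaining input.
parseLink :: String -> Maybe (Tag, String)
parseLink ('[' : s) =
  case break (== ']') s of
    (label, ']' : '(' : s') ->
      case break (== ')') s' of
        (url, ')' : rest) -> Just (Lin (Link label url), rest)
        _ -> Nothing
    _ -> Nothing
parseLink _ = Nothing

-- Assumed code syntax: ~code~.
parseCode :: String -> Maybe (Tag, String)
parseCode ('~' : s) =
  case break (== '~') s of
    (code, '~' : rest) -> Just (Cod (Code code), rest)
    _ -> Nothing
parseCode _ = Nothing

-- Second pass over the text of a single paragraph.
secondPass :: String -> [Tag]
secondPass [] = []
secondPass input =
  case span (`notElem` reservedCharacters) input of
    -- A run of ordinary characters is a literal.
    (lit@(_ : _), rest) -> Lit (Literal lit) : secondPass rest
    -- A reserved character: try a link first, then code.
    ([], c : rest) ->
      case parseLink input <|> parseCode input of
        Just (tag, rest') -> tag : secondPass rest'
        -- Unmatched reserved character: consume it and continue as a literal.
        Nothing ->
          let (more, rest') = span (`notElem` reservedCharacters) rest
           in Lit (Literal (c : more)) : secondPass rest'
    ([], []) -> []
```

With these assumptions, a stray [ that never closes does not break anything: it simply ends up at the start of a literal.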

Thanks for reading.