Lacking Natural Simplicity

Random musings on books, code, and tabletop games.

EPUB files, Markup Languages, and briefly Unix

What follows is a lightly edited version (for clarity and relevance) of the postscripts from an email that I recently wrote, transferred here for posterity and the general good.

Danger! Danger Will Robinson! Danger! The postscripts and footnotes are much longer than the main body of the reply! And the footnotes are longer than the text of the postscripts!

P.S. H., P. (and H. M., if you are interested, though I admit this combines some of my geekier interests and thus may be of less interest to all of you, or to Howard and Paul, for that matter):

I actually figured out how to make ebooks (to a limited degree) because I wanted to try out an ebook version of an RPG adventure I wrote for a currently-on-hiatus[0] fantasy Savage Worlds roleplaying game campaign for my daughter Lily and my niece and nephews (N1). I originally wrote the adventure[1] in three typesetting systems that use markup languages: LaTeX, ConTeXt, and troff[2] (which I usually use in its guise as GNU groff, but this time I used Heirloom troff, part of the Heirloom Documentation Tools, for its easy access to modern fonts). I wanted to compare the markup languages and their PDF output to decide which one I preferred. Later I converted it to ReStructuredText, a lightweight markup language[3] that I use, to compare it to the other markup languages.

I have used ReStructuredText on and off for many years, but its main drawbacks were that (1) the output produced by its original docutils implementation was excessively stark and difficult to customize to have a nicer appearance, and (2) its workflow was somewhat difficult.[4] Some time ago I discovered Pandoc, a “universal document converter” which can read many input formats, including ReStructuredText, and produce many output formats, including PDF (via LaTeX, ConTeXt, or troff, in ways that make the appearance easier to customize), HTML, and, as it turns out, and importantly to this story, EPUB, the most common format for ebooks! I started using Pandoc because it made it easier to generate PDF from ReStructuredText with one command (since Pandoc runs all the intermediate steps and cleans up any temporary files needed). It turned out that the abilities to read multiple input formats and to more easily customize the output were important to me as well.

So, having converted the adventure over to ReStructuredText for comparison[5] and at first generating PDF through Pandoc's troff -ms output, I soon decided to take a look at Pandoc's other output formats. I started with LaTeX and ConTeXt, and decided that the PDF output via LaTeX was not of much interest to me. The PDF output via ConTeXt, however, offered greater control over the appearance of the final PDF and the opportunity to add, by writing Lua filters, some features that lightweight markup languages normally don't offer, such as indexes and cross references that are both hyperlinks and include page numbers and section names in the PDF output. I didn't need those features in the adventure document, but I expect to need them in future documents.

But back to the important point: Pandoc can produce EPUB output for ebooks! Since I already had the adventure in ReStructuredText, and Pandoc produces EPUB, and I have an ebook reader, a Kindle, it just made sense to figure out how to get it onto my Kindle! First I had to figure out how to generate a reasonably attractive cover. Then I wrote a small config file for Pandoc. Then I generated the EPUB output. Then I figured out how to convert that over to MOBI, one of the formats that the Kindle can use.[6] Then I mailed it to my Kindle's email address, and it looked reasonably good![7]
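For the curious, here is a rough sketch of what those steps could look like when scripted in Python. It is not my actual setup: the file names (adventure.rst, cover.png, epub.yaml) are made up for illustration, treating the "small config file" as a Pandoc defaults file is an assumption, and using Calibre's ebook-convert for the MOBI step is likewise just one possible choice of tool.

```python
from pathlib import Path
import shutil
import subprocess


def pandoc_epub_command(source, epub, cover=None, defaults=None):
    """Build the pandoc command line that turns a ReStructuredText file into EPUB."""
    cmd = ["pandoc", "--from", "rst", "--to", "epub", "--output", str(epub)]
    if cover is not None:
        # Cover image embedded in the EPUB (hypothetical file name in the example).
        cmd += ["--epub-cover-image", str(cover)]
    if defaults is not None:
        # Assumption: the "small config file" is a pandoc defaults file.
        cmd += ["--defaults", str(defaults)]
    cmd.append(str(source))
    return cmd


def mobi_command(epub):
    """Build a command that converts the EPUB to MOBI.

    Assumption: Calibre's ebook-convert does the conversion; the post does
    not say which tool was actually used.
    """
    return ["ebook-convert", str(epub), str(Path(epub).with_suffix(".mobi"))]


def build_ebook(source, epub, cover=None, defaults=None):
    """Run both steps, stopping with an error if a tool is missing or fails."""
    for cmd in (pandoc_epub_command(source, epub, cover, defaults),
                mobi_command(epub)):
        if shutil.which(cmd[0]) is None:
            raise RuntimeError(f"{cmd[0]} is not installed or not on PATH")
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    build_ebook("adventure.rst", "adventure.epub",
                cover="cover.png", defaults="epub.yaml")
```

Keeping the command construction separate from the code that runs it makes it easy to print the commands and check them by eye before letting anything touch real files.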

I hope you've enjoyed this twisty maze of passages, all different!

And with a Zork reference I really must end this email!

P.P.S. Omitted for irrelevance.

P.P.P.S. Sorry, no deeply nested parenthetical expressions this time!

Here's an addendum with two Apple Messenger messages to P., reflecting on converting this from an HTML email into a blog post:

The HTML dialect Google uses in its MIME emails is very odd. It doesn’t use <p> elements, using instead <div> elements. Unfortunately, Pandoc converts those into containers, and nests them according to the nesting of the <div> elements. To fix this I hand edited the HTML to remove the outer <div> elements and convert the remaining ones into <p>s. Also, for some reason when I ran the documents through HTML Tidy it converted the Unicode characters into incorrect HTML character entities. I see now that it has a -utf8 switch, which I’ll have to remember for the next time I do this. (There will inevitably be a next time.)

OMG, now I have to put that in the blog post! How many saving throws am I going to fail today anyway?
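Since there will inevitably be a next time, here is a minimal sketch of how the hand edit described above might be automated with Python's standard html.parser module. I did the real edit by hand, so this is only an approximation under a simplifying assumption: any <div> that contains another <div> counts as an "outer" one and is dropped, and every other <div> becomes a <p>. The names DivFlattener and flatten_divs are invented for this example, and the sketch discards <div> attributes and HTML comments.

```python
from html import escape
from html.parser import HTMLParser


class DivFlattener(HTMLParser):
    """Drop <div>s that wrap other <div>s; turn the innermost <div>s into <p>s."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # output fragments; None marks a <div> not yet decided
        self.stack = []  # open <div>s as [index into self.out, has nested div]

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.stack:
                # The enclosing <div> now counts as an outer wrapper.
                self.stack[-1][1] = True
            self.stack.append([len(self.out), False])
            self.out.append(None)
        else:
            # Copy every other start tag through exactly as written.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "div":
            self.out.append(f"</{tag}>")
        elif self.stack:
            index, has_nested = self.stack.pop()
            self.out[index] = "" if has_nested else "<p>"
            self.out.append("" if has_nested else "</p>")

    def handle_data(self, data):
        # Character references were decoded by the parser, so re-escape the text.
        self.out.append(escape(data, quote=False))


def flatten_divs(html_text):
    """Return html_text with wrapper <div>s removed and leaf <div>s as <p>s."""
    parser = DivFlattener()
    parser.feed(html_text)
    parser.close()
    # Any <div> left unclosed in the input is simply dropped.
    return "".join(part or "" for part in parser.out)
```

Running flatten_divs over the body of the email before handing it to Pandoc should give ordinary paragraphs instead of nested containers, though a real Gmail message may well contain wrinkles this sketch does not handle.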

Last edited: 2021-07-17 00:53:29 EDT
