<p>Lacking Natural Simplicity (Posts about tex) | https://tkurtbond.github.io/categories/tex.atom | 2024-01-23T18:49:42Z | T. Kurt Bond | Nikola</p>
<p>EPUB files, Markup Languages, and briefly Unix | https://tkurtbond.github.io/posts/2020/12/01/epub-files-markup-languages-and-briefly-unix/ | 2020-12-01T15:56:13-05:00 | T. Kurt Bond</p>
<p>What follows is a lightly edited version (for clarity and relevance)
of the postscripts from an email that I recently wrote, transferred
here for posterity and the general good.</p>
<hr class="docutils">
<p><strong>Danger! Danger Will Robinson! Danger!</strong> The postscripts and
footnotes are much longer than the main body of the reply! And the
footnotes are longer than the text of the postscripts!</p>
<p><strong>P.S</strong>. H., P. (and H. M., if you are interested, though I
admit this combines some of my more geeky interests and thus may be of
less interest to all of you, or to Howard and Paul, for that matter):</p>
<p>I actually figured out how to make ebooks (to a limited degree)
because I wanted to try an ebook I made of an RPG adventure I wrote
for a currently on hiatus<a class="footnote-reference brackets" href="https://tkurtbond.github.io/posts/2020/12/01/epub-files-markup-languages-and-briefly-unix/#footnote-1" id="footnote-reference-1" role="doc-noteref"><span class="fn-bracket">[</span>0<span class="fn-bracket">]</span></a> fantasy <a class="reference external" href="https://en.wikipedia.org/wiki/Savage_Worlds">Savage Worlds</a> roleplaying game
campaign for my daughter <a class="reference external" href="https://www.facebook.com/lily.bond.31">Lily</a> and my <a class="reference external" href="https://www.facebook.com/eva.atha.7">niece</a> and nephews (<a class="reference external" href="https://www.facebook.com/mason.atha.7">N1</a>). I originally wrote the
adventure<a class="footnote-reference brackets" href="https://tkurtbond.github.io/posts/2020/12/01/epub-files-markup-languages-and-briefly-unix/#footnote-2" id="footnote-reference-2" role="doc-noteref"><span class="fn-bracket">[</span>1<span class="fn-bracket">]</span></a> in three typesetting systems which use <a class="reference external" href="https://en.wikipedia.org/wiki/Markup_language">markup
languages</a>, <a class="reference external" href="https://en.wikipedia.org/wiki/LaTeX">LaTeX</a>, <a class="reference external" href="https://en.wikipedia.org/wiki/ConTeXt">ConTeXt</a>, and <a class="reference external" href="https://troff.org/">troff</a><a class="footnote-reference brackets" href="https://tkurtbond.github.io/posts/2020/12/01/epub-files-markup-languages-and-briefly-unix/#footnote-3" id="footnote-reference-3" role="doc-noteref"><span class="fn-bracket">[</span>2<span class="fn-bracket">]</span></a> (which I usually use in its guise as
<a class="reference external" href="https://en.wikipedia.org/wiki/GNU">GNU</a> <a class="reference external" href="https://www.gnu.org/software/groff/">groff</a>, but this time I
used <span class="app">Heirloom troff</span>, part of the <a class="reference external" href="http://n-t-roff.github.io/heirloom/doctools.html">Heirloom Documentation
Tools</a>, for its easy access to modern fonts) to compare the markup
languages and their PDF output to decide which one I preferred to
use. Later I converted it to <a class="reference external" href="https://docutils.sourceforge.io/rst.html">ReStructuredText</a>, a lightweight markup
language<a class="footnote-reference brackets" href="https://tkurtbond.github.io/posts/2020/12/01/epub-files-markup-languages-and-briefly-unix/#footnote-4" id="footnote-reference-4" role="doc-noteref"><span class="fn-bracket">[</span>3<span class="fn-bracket">]</span></a> that I use, to compare it to the other markup
languages.</p>
<p>I have used ReStructuredText on and off for many years, but its main
drawbacks were that (1) the output produced by its original
<a class="reference external" href="https://docutils.sourceforge.io/">docutils</a> implementation was
excessively stark and difficult to customize to have a nicer
appearance, and (2) its workflow was somewhat difficult.<a class="footnote-reference brackets" href="https://tkurtbond.github.io/posts/2020/12/01/epub-files-markup-languages-and-briefly-unix/#footnote-5" id="footnote-reference-5" role="doc-noteref"><span class="fn-bracket">[</span>4<span class="fn-bracket">]</span></a> Some
time ago I discovered <a class="reference external" href="https://pandoc.org/">Pandoc</a>, a “universal
document converter” which can read many input sources, including
ReStructuredText, and produce output in many output formats, including
PDF (via <span class="app">LaTeX</span>, <span class="app">ConTeXt</span>, or <span class="app">troff</span>, in ways that make the
appearance easier to customize) and HTML, and, as it turns out and
importantly to this story, EPUB, the most common format for ebooks! I
started using <span class="app">Pandoc</span> because it made it easier to generate PDF from
ReStructuredText with one command (since <span class="app">Pandoc</span> runs all the
intermediate steps and cleans up any temporary files needed). It
turned out that the abilities to read multiple input formats and to
more easily customize the output were important to me as well.</p>
<p>So, having converted the adventure over to ReStructuredText for
comparison<a class="footnote-reference brackets" href="https://tkurtbond.github.io/posts/2020/12/01/epub-files-markup-languages-and-briefly-unix/#footnote-6" id="footnote-reference-6" role="doc-noteref"><span class="fn-bracket">[</span>5<span class="fn-bracket">]</span></a> and at first using PDF through <span class="app">Pandoc</span>'s
<code class="docutils literal">troff <span class="pre">-ms</span></code> output, I soon decided to take a look at <span class="app">Pandoc</span>'s other
output formats. I started with <span class="app">LaTeX</span> and <span class="app">ConTeXt</span>, and decided that the
PDF output via <span class="app">LaTeX</span> was not of much interest to me, but the PDF
output via <span class="app">ConTeXt</span> offered greater control over the appearance of the
final PDF output. It also offered the opportunity to add, by writing Lua
filters, some features that lightweight markup
languages normally don't offer, such as indexes and cross references
that are both hyperlinks and include page numbers and section names in
the PDF output. I didn't need those features in the adventure
document, but I expect to need them in future documents.</p>
<p>But back to the important point: <span class="app">Pandoc</span> can produce EPUB output for
ebooks! Since I already had the adventure in ReStructuredText, and
<span class="app">Pandoc</span> produces EPUB, and I have an ebook reader, a Kindle, it just
made sense to figure out how to get it onto my Kindle! First I used
<span class="app">Pandoc</span> to generate the EPUB. That required figuring out how to
generate a reasonably attractive cover. Then I wrote a small config file
for <span class="app">Pandoc</span>. Then I generated the EPUB output. Then I figured out how
to convert that over to MOBI, one of the formats that the Kindle can
use.<a class="footnote-reference brackets" href="https://tkurtbond.github.io/posts/2020/12/01/epub-files-markup-languages-and-briefly-unix/#footnote-7" id="footnote-reference-7" role="doc-noteref"><span class="fn-bracket">[</span>6<span class="fn-bracket">]</span></a> Then I mailed it to my Kindle's email, and it looked
reasonably good!<a class="footnote-reference brackets" href="https://tkurtbond.github.io/posts/2020/12/01/epub-files-markup-languages-and-briefly-unix/#footnote-8" id="footnote-reference-8" role="doc-noteref"><span class="fn-bracket">[</span>7<span class="fn-bracket">]</span></a></p>
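<p>The steps above can be sketched as a pair of command lines. This is only a minimal sketch, not the actual build: the file names (<code class="docutils literal">adventure.rst</code>, <code class="docutils literal">cover.png</code>, <code class="docutils literal">epub.yaml</code>) are hypothetical, the small config file is assumed to be a <span class="app">Pandoc</span> defaults file, and the <span class="app">Calibre</span> utility is assumed to be <code class="docutils literal"><span class="pre">ebook-convert</span></code>. The Python below only builds the argument lists; it does not run anything.</p>

```python
from pathlib import Path


def pandoc_epub_command(source, cover, defaults=None):
    """Build the argv list for converting a reStructuredText file to EPUB."""
    source = Path(source)
    command = [
        "pandoc",
        "--from", "rst",
        "--to", "epub",
        "--epub-cover-image", str(cover),
        "--output", str(source.with_suffix(".epub")),
    ]
    if defaults is not None:
        # Assumption: the config file is a pandoc defaults file (YAML).
        command += ["--defaults", str(defaults)]
    command.append(str(source))
    return command


def mobi_command(epub):
    """Build the argv list for Calibre's ebook-convert (EPUB to MOBI)."""
    epub = Path(epub)
    return ["ebook-convert", str(epub), str(epub.with_suffix(".mobi"))]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(pandoc_epub_command("adventure.rst", "cover.png", "epub.yaml"))
    print(mobi_command("adventure.epub"))
```

<p>On a machine with both tools installed, each list could be handed to <code class="docutils literal">subprocess.run</code> in turn, which is roughly what a small Makefile or shell script would do.</p>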
<p>I hope you've enjoyed this twisty maze of passages, all different!</p>
<p>And with a <a class="reference external" href="https://en.wikipedia.org/wiki/Zork">Zork</a> reference I
really must end this email!</p>
<aside class="footnote-list brackets">
<aside class="footnote brackets" id="footnote-1" role="doc-footnote">
<span class="label"><span class="fn-bracket">[</span><a role="doc-backlink" href="https://tkurtbond.github.io/posts/2020/12/01/epub-files-markup-languages-and-briefly-unix/#footnote-reference-1">0</a><span class="fn-bracket">]</span></span>
<p>Pandemics are no fun!</p>
<p>I originally thought I'd get through this email without footnotes, but
<a class="reference external" href="https://www.worldwidewords.org/qa/qa-nee1.htm">needs must when the devil
drives</a>. I rather
enjoy footnotes in email messages, but they're not as convenient in Gmail
as it used to be in Emacs. And since it offered the opportunity for a
Shakespeare reference of sorts, I'm quite pleased, in general.</p>
</aside>
<aside class="footnote brackets" id="footnote-2" role="doc-footnote">
<span class="label"><span class="fn-bracket">[</span><a role="doc-backlink" href="https://tkurtbond.github.io/posts/2020/12/01/epub-files-markup-languages-and-briefly-unix/#footnote-reference-2">1</a><span class="fn-bracket">]</span></span>
<p>As it turns out, I actually wrote <strong>seven</strong> Savage Worlds
adventures in <span class="app">troff</span>, and then converted them all to
<span class="app">LaTeX</span> and <span class="app">ConTeXt</span> for comparison later. I actually
wrote the first <strong>three</strong> adventures using <a class="reference external" href="https://www.libreoffice.org/">LibreOffice</a>, a conventional office suite with
a word processor, something I normally dislike but was giving
another chance. I decided after three adventures that I wasn't
going to do another in <span class="app">LibreOffice</span>, and started looking for
alternatives, hence comparing markup languages. I tend to like
markup languages better than <a class="reference external" href="https://en.wikipedia.org/wiki/WYSIWYG">WYSIWYG</a> editors; this may just be the
programmer in me liking the idea of languages over WYSIWYG, but
there did turn out to be significant advantages to switching to a
markup language in the end. The primary one was that I could put
character and creature descriptions in external files and reference
them in the main file, rather than cut and paste them from one
document to another, which meant I could just change the external
file and it would automatically be included in the updated document
next time I generated output from it. With a WYSIWYG tool I'd have
had to go back and cut and paste the changed material in every
document every time I changed it, which would be immensely tedious
and horribly error prone and all too common.</p>
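<p>The external-file idea can be illustrated with a small sketch. It assumes <span class="app">troff</span>-style <code class="docutils literal">.so</code> include lines and hypothetical file names (<code class="docutils literal">adventure.ms</code>, <code class="docutils literal">goblin.ms</code>); it is not the build process actually used for the adventures, and unlike a real tool it resolves names relative to the including file and does no cycle detection.</p>

```python
import tempfile
from pathlib import Path


def expand_includes(path):
    """Return the text of `path` with each `.so FILE` line replaced by FILE's text."""
    path = Path(path)
    out = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith(".so "):
            # Simplification: resolve the name relative to the including file.
            included = path.parent / line[4:].strip()
            out.append(expand_includes(included))
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        # One shared creature description, referenced from the main file.
        (tmp / "goblin.ms").write_text(
            "Goblin: a small, sneaky creature.\n", encoding="utf-8")
        (tmp / "adventure.ms").write_text(
            ".LP\nThe cave holds:\n.so goblin.ms\n", encoding="utf-8")
        print(expand_includes(tmp / "adventure.ms"))
```

<p>Editing the one description file changes every document that includes it the next time output is generated, which is the advantage described above.</p>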
</aside>
<aside class="footnote brackets" id="footnote-3" role="doc-footnote">
<span class="label"><span class="fn-bracket">[</span><a role="doc-backlink" href="https://tkurtbond.github.io/posts/2020/12/01/epub-files-markup-languages-and-briefly-unix/#footnote-reference-3">2</a><span class="fn-bracket">]</span></span>
<p>This footnote is about <span class="app">LaTeX</span>, <span class="app">ConTeXt</span>, and
<span class="app">troff</span>, and peripherally about <a class="reference external" href="https://en.wikipedia.org/wiki/TeX">TeX</a>, the progenitor of
<span class="app">LaTeX</span> and <span class="app">ConTeXt</span>. <span class="app">Troff</span> was one of the
earliest computer typesetting systems, invented in 1973 as a scheme
at the computer science portion of Bell Labs to get a PDP-11 so
they could have a <a class="reference external" href="https://en.wikipedia.org/wiki/Time-sharing">time-sharing</a> operating system,
like the earlier <a class="reference external" href="https://en.wikipedia.org/wiki/Multics">Multics</a> that ran on much more
expensive hardware and that the researchers had worked on
previously and looked back on fondly after Bell Labs pulled out of
that research. Bell Labs wouldn't just pay for a computer for the
researchers to play with, so they proposed developing a computer
typesetting system for the secretaries to use, largely for patent
submission, something Bell Labs did a lot of. Their scheme
succeeded and as a result they invented Unix and <span class="app">troff</span>.</p>
<p>So, Unix was invented <strong>explicitly</strong> to run <span class="app">troff</span>!</p>
<p><span class="app">TeX</span>, by contrast, was not invented until 1978, <span class="app">LaTeX</span> in
1985, and <span class="app">ConTeXt</span> not until 1990! (I wish I'd found out about the
latter earlier!) <span class="app">TeX</span> was invented because of <a class="reference external" href="https://en.wikipedia.org/wiki/Donald_Knuth">Donald
Knuth</a>'s desire to produce gloriously typeset books with
mathematics for his multi-volume work <a class="reference external" href="https://en.wikipedia.org/wiki/The_Art_of_Computer_Programming">The Art of Computer
Programming</a>.
He finished <span class="app">TeX</span> long ago, but is still working on those
books.</p>
<p>All of these typesetting systems have what are called markup
languages, where the text of the document is interspersed with
commands distinguished in some way from the regular text. For
instance, the command <code class="docutils literal">\begin{document}</code> from <span class="app">LaTeX</span> is
typical of <span class="app">TeX</span>, <span class="app">LaTeX</span>, and <span class="app">ConTeXt</span>, all of
which are related. <span class="app">Troff</span> uses backslash commands in the
middle of text and commands on separate lines starting with
periods. Historically those commands were limited to names
of two characters, though this was relaxed in the later
<span class="app">troff</span> implementation <span class="app">groff</span>, and in the <span class="app">Heirloom
troff</span> implementation,
which extended the second <span class="app">troff</span> implementation,
<span class="app">ditroff</span>, with features similar to GNU groff's, but with easier
access to modern fonts.</p>
<p>I am particularly impressed by <span class="app">troff</span>'s ability to
correctly typeset documents that I wrote 30 years ago and that
others have written even earlier. It has never failed me in this
task.† By contrast, this has often been a problem for me with documents from
WYSIWYG systems, even when those documents were
more recently created, including one significant one
from 2004. (<a class="reference external" href="https://en.wikipedia.org/wiki/StarOffice">Star Office</a>, I'm looking at
you!‡) <span class="app">LaTeX</span> is reasonably backward compatible; though it
did go through some big changes earlier, it is now mostly stable. I
did experience some compatibility problems, minor with my documents
and major with complicated documents written by others.
<span class="app">ConTeXt</span> is generally stable, but it is developing rapidly so
has more changes, though the developers are good about backward
compatibility. <span class="app">ConTeXt</span> has grown increasingly sophisticated:
along its development it has subsumed both <span class="app">TeX</span> and
<a class="reference external" href="https://en.wikipedia.org/wiki/MetaPost">MetaPost</a> and combined and extended them with the Lua scripting
language (mentioned again below), producing something that is even
more flexible and impressive than <span class="app">TeX</span> and <span class="app">LaTeX</span>.</p>
<p>Another thing I like about markup languages is the fact that they
are <a class="reference external" href="https://en.wikipedia.org/wiki/Plain_text">plain text</a>‖, and can be manipulated with any program you
want. Before the emergence of <a class="reference external" href="https://en.wikipedia.org/wiki/XML">XML</a>-based WYSIWYG document formats
in <span class="app">Microsoft Word</span>§ and <span class="app">Star Office</span> this was
practically impossible. Even now the complexity of the ZIP file and
XML markup makes this much, much more unpleasant to deal
with. Kicking dead whales down the beach indeed! Being able to use
any tool at all on a document is considerably more useful than
being limited to the poor extension languages of <span class="app">Microsoft
Word</span> and <span class="app">LibreOffice</span>, and usually much simpler.</p>
<p>† I <strong>have</strong> had to change a few external programs I've written to
help in the build process. <a class="reference external" href="https://www.perl.org/">Perl</a> was a problem here. (I tried to resist
the footnote within the footnote, but again, needs must when the devil
drives.)</p>
<p>‡ Sure, the current <span class="app">LibreOffice</span> will open the file, but the
formatting is significantly messed up. Earlier versions, if I
remember correctly, did not open the file correctly.</p>
<p>§ I have never written a document in <span class="app">Microsoft Word</span> for my
personal use, though unfortunately I have used it often at work.</p>
<p>‖ I have delightedly taken to using Unicode characters in my plain text
documents, as the ReStructuredText source of this document shows.</p>
</aside>
<aside class="footnote brackets" id="footnote-4" role="doc-footnote">
<span class="label"><span class="fn-bracket">[</span><a role="doc-backlink" href="https://tkurtbond.github.io/posts/2020/12/01/epub-files-markup-languages-and-briefly-unix/#footnote-reference-4">3</a><span class="fn-bracket">]</span></span>
<p>Lightweight markup languages, in contrast with <span class="app">TeX</span>,
<span class="app">LaTeX</span>, <span class="app">ConTeXt</span>, and <span class="app">troff</span>, usually
start from conventions such as indicating *italics* by putting asterisks
around phrases, as people did in plain text email messages and <a class="reference external" href="https://en.wikipedia.org/wiki/Usenet">USENET</a> posts
in the olden days. Most of them avoid the use of lots
of keywords and backslashes, of the sort <span class="app">TeX</span>, <span class="app">LaTeX</span>,
<span class="app">ConTeXt</span>, and to a partial extent <span class="app">troff</span> use. Instead,
they largely try to use the non-alphanumeric characters on a
standard keyboard to indicate how the text should be typeset, and
without using long command names. The lack of these long command
names (or short ones in <span class="app">troff</span>'s case) and the relatively
unobtrusive nature of the non-alphanumeric characters makes
documents easier to read. This is why they are called “lightweight”
markup languages. <a class="reference external" href="https://en.wikipedia.org/wiki/Lightweight_markup_language">Wikipedia</a> has a good article that
explains and compares them. Another advantage of most lightweight
markup languages is that since they don't generally use command
names, native speakers of languages other than English don't have
to learn English command names, a significant matter.</p>
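<p>To make the "punctuation instead of command names" idea concrete, here is a minimal sketch that turns asterisk-delimited phrases into HTML emphasis. It is an illustration only: the function name <code class="docutils literal">emphasis_to_html</code> is made up for this example, and real ReStructuredText or Markdown parsers handle far more cases (nesting, escapes, inline literals) than this single regular expression does.</p>

```python
import html
import re

# An emphasized phrase starts and ends with a non-space, non-asterisk
# character, so arithmetic such as "2 * 3 * 4" is left alone.
EMPHASIS = re.compile(r"\*([^*\s](?:[^*]*[^*\s])?)\*")


def emphasis_to_html(text):
    """Convert *phrase* spans to <em>phrase</em>, escaping HTML specials first."""
    escaped = html.escape(text, quote=False)
    return EMPHASIS.sub(r"<em>\1</em>", escaped)


if __name__ == "__main__":
    print(emphasis_to_html("This is *really* light & simple."))
```

<p>The source line stays readable as plain text, which is the point of the lightweight approach: the markup is a pair of asterisks rather than a named command.</p>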
<p>I happen to prefer ReStructuredText, but Markdown is another very
popular lightweight markup language that I sometimes use.</p>
<p>Another advantage to lightweight markup languages such as
ReStructuredText and Markdown is that they often have programs allowing
multiple kinds of output from them (PDF and HTML are typical) and since
lightweight markup languages make no pretensions to being programming
languages, which the markup languages of the original typesetting
systems do (since that was how they allowed customization and
extension), writing the programs to output multiple output types for
lightweight markup languages is simpler than writing programs to
parse the heavy markup languages, which is the common approach that
people take to get HTML from <span class="app">LaTeX</span>, for instance. The fact that heavy
markup languages are usually <a class="reference external" href="https://en.wikipedia.org/wiki/Turing_completeness">Turing
complete</a> and so
can be extensively extended (and definitely are in practice) and often
have programmable syntax makes processing them with other tools
difficult and usually requires much hand conversion. It is my impression
that while <span class="app">LaTeX</span> to HTML translators like
<a class="reference external" href="https://tug.org/tex4ht/">TeX4ht</a> and
<a class="reference external" href="http://hevea.inria.fr/">HEVEA</a> are very good for documents that only
use the standard features of <span class="app">LaTeX</span> they can't deal easily with heavily
programmed documents, since that would require more semantic
understanding of the original <span class="app">LaTeX</span> source.</p>
<p>One interesting attempt in this direction for <span class="app">troff</span> was the <a class="reference external" href="http://www-rn.informatik.uni-bremen.de/software/unroff/">unroff</a>
program, written in <a class="reference external" href="http://sam.zoy.org/elk/">Elk Scheme</a>. It
took the approach of implementing a complete <span class="app">troff</span> parser and
providing Scheme as an extension language so you could completely
customize the output. It provided a complete implementation for
the <code class="docutils literal">troff <span class="pre">-ms</span></code> macros, and I was easily able to extend those to
handle cross references and indexes that I had extended that <span class="app">troff</span>
document's build process to provide, in 170 lines of Scheme.</p>
</aside>
<aside class="footnote brackets" id="footnote-5" role="doc-footnote">
<span class="label"><span class="fn-bracket">[</span><a role="doc-backlink" href="https://tkurtbond.github.io/posts/2020/12/01/epub-files-markup-languages-and-briefly-unix/#footnote-reference-5">4</a><span class="fn-bracket">]</span></span>
<p>In particular, there was no standard name for the commands used
to generate various kinds of output; on some operating systems it
was rst2latex, and on others it was rst2latex.py. Also, the
<span class="app">docutils</span> toolchain for producing PDF output generated
intermediate <span class="app">LaTeX</span> files which necessitated processing with
further tools, which usually necessitated writing a <a class="reference external" href="https://en.wikipedia.org/wiki/Makefile">Makefile</a> so I didn't have to
retype multiple commands whenever I regenerated the output
document. For a simple document that was considerable hassle and
overhead, even when worth it for a more complicated
document. (Makefiles are well worth it for complicated documents
with complicated build processes, of course. I have lots of those.)</p>
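<p>The naming problem can be worked around by probing for whichever name is installed. The following is a minimal sketch using only the Python standard library; the helper name <code class="docutils literal">find_command</code> is hypothetical, and the two candidate names are simply the ones mentioned above.</p>

```python
import shutil


def find_command(candidates):
    """Return the first candidate name found on PATH, or None if none exists."""
    for name in candidates:
        if shutil.which(name):
            return name
    return None


if __name__ == "__main__":
    # The two spellings mentioned above; other systems may use yet another.
    tool = find_command(["rst2latex", "rst2latex.py"])
    print(tool or "no rst2latex command found on PATH")
```

<p>A Makefile or wrapper script could run this check once and use the result for every later step.</p>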
</aside>
<aside class="footnote brackets" id="footnote-6" role="doc-footnote">
<span class="label"><span class="fn-bracket">[</span><a role="doc-backlink" href="https://tkurtbond.github.io/posts/2020/12/01/epub-files-markup-languages-and-briefly-unix/#footnote-reference-6">5</a><span class="fn-bracket">]</span></span>
<p>As a result of the comparison, I decided that I greatly
preferred ReStructuredText, with <span class="app">pandoc</span> as the tool to process
it. <span class="app">Pandoc</span>'s ability to customize its output using
<a class="reference external" href="https://pandoc.org/lua-filters.html">filters</a> written in the
programming language <a class="reference external" href="https://www.lua.org/">Lua</a> was
particularly appealing, as was the ability to customize its default
templates for generating output using the <code class="docutils literal">troff <span class="pre">-ms</span></code> macros and
<span class="app">ConTeXt</span>. I see a use for both of those, since the -ms output
is easier to customize for things that the base -ms provides, but
the <span class="app">ConTeXt</span> output offers greater control over the final
appearance, though often at the cost of greater effort. For
instance, I have a moderately long document† that is currently in
<a class="reference external" href="https://docbook.org/">DocBook</a> 5.0 XML format, and I now find
it tedious to edit and the open source tool for generating PDF from
it has serious flaws. (I'm resisting another footnote in a
footnote. Be impressed that I succeeded!) I can see how I can
convert it to ReStructuredText (or Markdown, for that matter) and
use <span class="app">Pandoc</span>'s <span class="app">ConTeXt</span> output to produce a nicer, more
attractive PDF. Now I just need the time to write the Lua filter
and do the conversion. (<span class="app">Pandoc</span> will convert it from DocBook, but
will lose the indexing information, which I would have to do all
over again, a task with more work than I want to contemplate at the
moment.)</p>
<p>I still find uses for <span class="app">troff</span> and <span class="app">ConTeXt</span>. In
particular, if I have to use complicated tables in a document I
find that either <span class="app">troff</span> or <span class="app">ConTeXt</span> works
better. (Simple tables generated from ReStructuredText are OK in
either, but complicated ones…!)</p>
<p>† The DocBook version of the document was derived from the <code class="docutils literal">troff <span class="pre">-ms</span></code>
source mentioned previously, though by the time the
conversion happened I vaguely recall I no longer had access to a working
<span class="app">unroff</span>, I think because of <a class="reference external" href="https://en.wikipedia.org/wiki/Software_rot">bitrot</a>. <a class="reference external" href="http://netbsd.org/">NetBSD</a> has an <span class="app">unroff</span>
<a class="reference external" href="https://pkgsrc.se/textproc/unroff">package</a> in its pkgsrc
collection of programs, and I could install it now on my NetBSD
machine, but when I tried to process the document <span class="app">unroff</span> exited
complaining about a syntax error in one of its Scheme files. So bitrot
seems to prevail.</p>
</aside>
<aside class="footnote brackets" id="footnote-7" role="doc-footnote">
<span class="label"><span class="fn-bracket">[</span><a role="doc-backlink" href="https://tkurtbond.github.io/posts/2020/12/01/epub-files-markup-languages-and-briefly-unix/#footnote-reference-7">6</a><span class="fn-bracket">]</span></span>
<p>Using an open source command line utility provided with the
<a class="reference external" href="https://calibre-ebook.com/">Calibre</a> ebook reader, of course!</p>
</aside>
<aside class="footnote brackets" id="footnote-8" role="doc-footnote">
<span class="label"><span class="fn-bracket">[</span><a role="doc-backlink" href="https://tkurtbond.github.io/posts/2020/12/01/epub-files-markup-languages-and-briefly-unix/#footnote-reference-8">7</a><span class="fn-bracket">]</span></span>
<p>There are some oddities in the current build with the conversion to
mobi complaining about fonts not being found in the right places and
being deleted from the result, but I don't know enough about ebooks to
debug it at this time. Besides, I've hit the auspicious footnote number
seven (though it's not the seventh footnote, as it is actually the
eighth!) and should really finish this email now.</p>
</aside>
</aside>
<p><strong>P.P.S.</strong> <em>Omitted for irrelevance</em>.</p>
<p><strong>P.P.P.S.</strong> Sorry, no deeply nested parenthetical expressions this
time!</p>
<hr class="docutils">
<p>Here's an addendum with two Apple <span class="app">Messenger</span> messages to P.,
reflecting on converting this from an HTML email into a blog post:</p>
<p>The HTML dialect Google uses in its MIME emails is very odd. It
doesn’t use <code class="docutils literal">&lt;p&gt;</code> elements, using instead <code class="docutils literal">&lt;div&gt;</code> elements.
Unfortunately, <span class="app">pandoc</span> converts those into containers, and nests
them according to the nesting of the <code class="docutils literal">&lt;div&gt;</code> elements. To fix this
I hand edited the HTML to remove the outer <code class="docutils literal">&lt;div&gt;</code> elements and
convert the remaining ones into &lt;p&gt;s. Also, for some reason when I
ran the documents through HTML tidy it converted the Unicode
characters into incorrect HTML character entities. I see now that it
has a <code class="docutils literal"><span class="pre">-utf8</span></code> switch, which I’ll have to remember for the next time
I do this. (There will inevitably be a next time.)</p>
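<p>The hand edit described above could be roughly automated. This is only a sketch under simplifying assumptions, not what was actually done: it uses Python's standard <code class="docutils literal">html.parser</code>, keeps only the text of each innermost div, discards inline markup such as bold or links, and drops divs that contain nothing but other divs or line breaks. The names <code class="docutils literal">DivToParagraph</code> and <code class="docutils literal">divs_to_paragraphs</code> are made up for the example.</p>

```python
from html import escape
from html.parser import HTMLParser


class DivToParagraph(HTMLParser):
    """Emit a paragraph for each div that directly contains text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self.stack = []  # one text buffer per currently open div

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.stack.append([])

    def handle_data(self, data):
        # Text belongs to the innermost open div; text outside divs is ignored.
        if self.stack:
            self.stack[-1].append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self.stack:
            text = "".join(self.stack.pop()).strip()
            if text:  # wrapper divs with no text of their own are dropped
                self.paragraphs.append("<p>" + escape(text, quote=False) + "</p>")


def divs_to_paragraphs(source):
    """Convert div-based markup into a flat sequence of paragraphs."""
    parser = DivToParagraph()
    parser.feed(source)
    parser.close()
    return "\n".join(parser.paragraphs)


if __name__ == "__main__":
    print(divs_to_paragraphs("<div><div>One</div><div>Two &amp; three</div></div>"))
```

<p>Because the parser decodes character references and the output re-escapes them, the text round-trips as Unicode rather than being rewritten into different entities.</p>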
<p>OMG, now I have to put that in the blog post! How many <a class="reference external" href="https://en.wikipedia.org/wiki/Saving_throw">saving
throws</a> am I going to fail today anyway?</p>
<p><em>Last edited: 2021-07-17 00:53:29 EDT</em></p>
<!-- Local Variables:
time-stamp-format: "%Y-%02m-%02d %02H:%02M:%02S %Z"
time-stamp-start: "\\*Last edited:[ \t]+\\\\?"
time-stamp-end: "\\*\\\\?\n"
time-stamp-line-limit: -20
End: -->
<p>Paragraph Justification in groff and TeX | https://tkurtbond.github.io/posts/2020/07/31/paragraph-justification-in-groff-and-tex/ | 2020-07-31T22:42:15-04:00 | T. Kurt Bond</p>
<p>An interesting message thread developed on the groff <a class="reference external" href="https://lists.gnu.org/archive/html/groff/">mailing list</a>
about various features of <span class="app">Groff</span> and <span class="app">Heirloom Troff</span>, with
a mention of <span class="app">neatroff</span>. In particular, Steve Izma's post (<a class="reference external" href="https://lists.gnu.org/archive/html/groff/2020-07/msg00092.html">P1</a>)
discussed how he found <span class="app">TeX</span>'s paragraph-at-a-time justification
required as much tweaking as groff's simpler paragraph justification.
That led to Peter Schaffter's post (<a class="reference external" href="https://lists.gnu.org/archive/html/groff/2020-07/msg00101.html">P2</a>) linking to an earlier post
(<a class="reference external" href="https://lists.gnu.org/archive/html/groff/2014-03/msg00322.html">P3</a>) where he proposed a simpler algorithm to improve <span class="app">Groff</span>'s
line-breaking and justification than the full Knuth-Plass
Line-Breaking Algorithm (<a class="reference external" href="http://www.eprg.org/G53DOC/pdfs/knuth-plass-breaking.pdf">KP1</a>, <a class="reference external" href="http://litherum.blogspot.com/2015/07/knuth-plass-line-breaking-algorithm.html">D1</a>) that <span class="app">TeX</span> uses. While
writing this post I ran across another paper, “Global multiple
objective line breaking” by Alex Holkner (<a class="reference external" href="https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.585.8487">GMOLB1</a>, <a class="reference external" href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.585.8487&amp;rep=rep1&amp;type=pdf">GMOLB2</a>) that explores
another line breaking algorithm and references some of the other
papers on the subject.</p>
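<p>The difference between breaking a line at a time and breaking a whole paragraph at once can be seen in a toy example. The sketch below is a deliberate simplification and is none of the algorithms linked above: it is not the Knuth-Plass algorithm (there is no stretchable glue, no penalties, no hyphenation), nor the simpler proposal from the mailing list, nor Holkner's method. It just compares greedy first-fit breaking with a dynamic program that minimizes the sum of squared trailing space over all lines except the last, using monospaced character counts.</p>

```python
def greedy_breaks(words, width):
    """First-fit: keep adding words to the current line while they fit."""
    lines, current = [], []
    for word in words:
        if current and len(" ".join(current + [word])) > width:
            lines.append(current)
            current = []
        current.append(word)
    if current:
        lines.append(current)
    return lines


def total_fit_breaks(words, width):
    """Pick breaks minimizing squared trailing space over the whole paragraph."""
    n = len(words)
    best = [0.0] + [float("inf")] * n  # best[j]: minimal cost of words[:j]
    prev = [0] * (n + 1)               # prev[j]: start of the line ending at j
    for j in range(1, n + 1):
        length = -1                    # length of words[i:j] with single spaces
        for i in range(j - 1, -1, -1):
            length += len(words[i]) + 1
            if length > width and i < j - 1:
                break                  # line too long; an overlong word stays alone
            slack = max(width - length, 0)
            cost = 0 if j == n else slack ** 2  # the last line is free
            if best[i] + cost < best[j]:
                best[j] = best[i] + cost
                prev[j] = i
    lines, j = [], n
    while j > 0:                       # walk the chosen breaks backwards
        lines.append(words[prev[j]:j])
        j = prev[j]
    return lines[::-1]


if __name__ == "__main__":
    words = "aaa bb cc ddddd".split()
    print(greedy_breaks(words, 6))     # [['aaa', 'bb'], ['cc'], ['ddddd']]
    print(total_fit_breaks(words, 6))  # [['aaa'], ['bb', 'cc'], ['ddddd']]
```

<p>The greedy version fills the first line perfectly and then strands a short word on the second; the whole-paragraph version accepts a slightly looser first line in exchange for a more even result.</p>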