Lacking Natural Simplicity

Random musings on books, code, and tabletop games.

Text Subtleties

I just noticed that when wget tells you the filename of the file it just saved, it surrounds the name with apostrophes (') if your LANG=C, but with Unicode LEFT SINGLE QUOTATION MARK (‘) and RIGHT SINGLE QUOTATION MARK (’) if your LANG=en_US.UTF-8. I appreciate little subtleties like that.
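To make the behavior concrete, here is a minimal Python sketch of the idea: choose the quote characters from the codeset named in LANG. This is not wget's actual code. The function names codeset_of and quote_filename are made up for illustration, and the rule "curly quotes only when the codeset is UTF-8" is an assumption based on the two cases observed above.

```python
def codeset_of(lang: str) -> str:
    """Return the codeset part of a LANG-style value, or '' if there is none."""
    # LANG looks like language[_territory][.codeset][@modifier]
    without_modifier = lang.split("@", 1)[0]
    if "." not in without_modifier:
        return ""
    return without_modifier.split(".", 1)[1]


def quote_filename(name: str, lang: str) -> str:
    """Quote a filename the way the message above appears to (assumed rule)."""
    # Normalize spellings such as "UTF-8", "utf8", and "UTF8".
    codeset = codeset_of(lang).upper().replace("-", "")
    if codeset == "UTF8":
        # LEFT and RIGHT SINGLE QUOTATION MARK
        return "\u2018" + name + "\u2019"
    # Fall back to plain ASCII apostrophes.
    return "'" + name + "'"


print(quote_filename("index.html", "C"))            # 'index.html'
print(quote_filename("index.html", "en_US.UTF-8"))  # ‘index.html’
```

The point of the fallback is that a terminal in the C locale may not be able to display anything outside ASCII, so the plain apostrophe is the safe choice there.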

I use Unicode characters in most of the writing I do. For LaTeX, which I rarely use these days, I use XeTeX, which understands UTF-8 natively. ConTeXt, which I do use regularly, also understands UTF-8 natively. Since groff doesn't understand UTF-8 natively, for groff I use the -k switch, which preprocesses the text with preconv (part of groff) to convert the UTF-8 characters into groff character escapes. Of course, if it is reStructuredText that I'm working with, then pandoc can be configured to use any one of LaTeX, ConTeXt, or groff for creating PDF output, and since rst2latex.py just produces LaTeX that includes whatever characters you put in your source, you can use xelatex as part of your commands to turn it into PDF. And sometimes, when I'm feeling whimsical, I use Heirloom Troff, from the Heirloom Documentation Tools, which understands UTF-8 natively.
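For the curious, here is a rough Python sketch of the kind of conversion the groff preprocessing step performs: ASCII passes through untouched, and everything else becomes a \[uXXXX] character escape. The function name to_groff_escapes is hypothetical, and this is only an approximation. The real preconv also deals with detecting the input encoding and other details that are skipped here.

```python
def to_groff_escapes(text: str) -> str:
    """Replace each non-ASCII character with a groff \\[uXXXX] escape (sketch)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # Plain ASCII is left as-is.
            out.append(ch)
        else:
            # Uppercase hex code point, padded to at least four digits.
            out.append("\\[u%04X]" % cp)
    return "".join(out)


print(to_groff_escapes("It\u2019s a caf\u00e9"))  # It\[u2019]s a caf\[u00E9]
```

Seen this way, the -k switch is just a convenience: the source file stays readable UTF-8, and groff itself only ever sees ASCII.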

Last edited: 2020-08-03 16:02:52 EDT
