Lacking Natural Simplicity

Random musings on books, code, and tabletop games.

Sorting differs between LANG=C and LANG=en_US.UTF-8, even in ls

I'm posting this mostly so that I have a concrete example of it happening.

I have a directory with two files in it, ab.txt and a-c.txt. Which one sorts first when I run ls? It depends on which locale is set. Like this:

$ LANG=en_US.UTF-8 ls
ab.txt       a-c.txt
$ LANG=C ls
a-c.txt  ab.txt

The same thing happens with the sort command, whichever order the input lines start in:

$ cat >a-c-first.txt
a-c.txt
ab.txt
$ cat >ab-first.txt
ab.txt
a-c.txt
$ LANG=C sort ab-first.txt
a-c.txt
ab.txt
$ LANG=en_US.UTF-8 sort a-c-first.txt
ab.txt
a-c.txt

This surprised me when it first broke some code I was using, since I'd spent so much of my existence in LANG=C. (I'm still surprised that Unicode has existed for more than half my life.) Once I set LANG=en_US.UTF-8 because I was using UTF-8 characters in my documents, I found I was in a different (sorting) world.
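For code that depends on the byte-wise order, one way to guard against this is to pin the locale for the sort call alone. This is a minimal sketch, not what my original code did: the function name sorted_bytewise is made up for illustration, and it relies on LC_ALL taking precedence over both LC_COLLATE and LANG.

```shell
# Hypothetical helper: sort stdin in byte order, whatever locale the caller uses.
# LC_ALL overrides LC_COLLATE and LANG, and here it applies to this one sort only.
sorted_bytewise() {
    LC_ALL=C sort
}

printf '%s\n' ab.txt a-c.txt | sorted_bytewise
# prints a-c.txt, then ab.txt: '-' (0x2D) sorts before 'b' (0x62)
```

Setting the variable on the one command, rather than exporting it for the whole script, leaves everything else (messages, character handling) in the user's own locale. What the en_US.UTF-8 order looks like can also vary with the locale data installed on a system, so the byte order is the only one I'd count on being the same everywhere.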
