<p><a href="https://tkurtbond.github.io/posts/2021/07/28/sorting-differs-between-langc-and-langen_usutf-8-even-in-ls/">Sorting differs between LANG=C and LANG=en_US.UTF-8, even in ls</a>, by T. Kurt Bond, 2021-07-28 (Lacking Natural Simplicity, posts about sorting).</p>
<p>This is here mostly to give me a concrete example of this happening.</p>
<p>I have a directory with two files in it, <span class="file">ab.txt</span> and
<span class="file">a-c.txt</span>. Which one sorts first when I run <span class="command">ls</span>? It
depends on which locale is set:</p>
<div class="code"><pre class="code bash"><a id="rest_code_aa17191c81b9407b8f5f0777723de01f-1" name="rest_code_aa17191c81b9407b8f5f0777723de01f-1" href="https://tkurtbond.github.io/posts/2021/07/28/sorting-differs-between-langc-and-langen_usutf-8-even-in-ls/#rest_code_aa17191c81b9407b8f5f0777723de01f-1"></a>$<span class="w"> </span><span class="nv">LANG</span><span class="o">=</span>en_US.UTF-8<span class="w"> </span>ls
<a id="rest_code_aa17191c81b9407b8f5f0777723de01f-2" name="rest_code_aa17191c81b9407b8f5f0777723de01f-2" href="https://tkurtbond.github.io/posts/2021/07/28/sorting-differs-between-langc-and-langen_usutf-8-even-in-ls/#rest_code_aa17191c81b9407b8f5f0777723de01f-2"></a>ab.txt<span class="w"> </span>a-c.txt
<a id="rest_code_aa17191c81b9407b8f5f0777723de01f-3" name="rest_code_aa17191c81b9407b8f5f0777723de01f-3" href="https://tkurtbond.github.io/posts/2021/07/28/sorting-differs-between-langc-and-langen_usutf-8-even-in-ls/#rest_code_aa17191c81b9407b8f5f0777723de01f-3"></a>$<span class="w"> </span><span class="nv">LANG</span><span class="o">=</span>C<span class="w"> </span>ls
<a id="rest_code_aa17191c81b9407b8f5f0777723de01f-4" name="rest_code_aa17191c81b9407b8f5f0777723de01f-4" href="https://tkurtbond.github.io/posts/2021/07/28/sorting-differs-between-langc-and-langen_usutf-8-even-in-ls/#rest_code_aa17191c81b9407b8f5f0777723de01f-4"></a>a-c.txt<span class="w"> </span>ab.txt
</pre></div>
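<p>Why the difference? In the C locale, strings are compared by their character codes, so the hyphen in <span class="file">a-c.txt</span> is compared directly against the "b" in <span class="file">ab.txt</span>, and the hyphen has the smaller code. The en_US.UTF-8 locale instead uses linguistic collation rules that, on typical glibc-based systems, give punctuation such as the hyphen little or no weight on the first comparison pass, so the comparison is effectively between "ab" and "ac". A minimal sketch for checking the codes involved uses POSIX <span class="command">printf</span>, which prints a character's numeric code when its argument starts with a quote:</p>

```shell
#!/bin/sh
# In the C locale, sort order follows the character codes.
# POSIX printf: a numeric argument that starts with a quote is
# converted to the code of the character that follows the quote.
# Hyphen-minus is 45 and "b" is 98, so "a-c.txt" sorts before "ab.txt".
printf '%d %d\n' "'-" "'b"
```

<p>This prints <span class="command">45 98</span>.</p>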
<p>The same thing happens with the <span class="command">sort</span> command:</p>
<div class="code"><pre class="code bash"><a id="rest_code_a1889f87e0a34c97b7e5c99590df45ff-1" name="rest_code_a1889f87e0a34c97b7e5c99590df45ff-1" href="https://tkurtbond.github.io/posts/2021/07/28/sorting-differs-between-langc-and-langen_usutf-8-even-in-ls/#rest_code_a1889f87e0a34c97b7e5c99590df45ff-1"></a>$<span class="w"> </span>cat<span class="w"> </span>>a-c-first.txt
<a id="rest_code_a1889f87e0a34c97b7e5c99590df45ff-2" name="rest_code_a1889f87e0a34c97b7e5c99590df45ff-2" href="https://tkurtbond.github.io/posts/2021/07/28/sorting-differs-between-langc-and-langen_usutf-8-even-in-ls/#rest_code_a1889f87e0a34c97b7e5c99590df45ff-2"></a>a-c.txt
<a id="rest_code_a1889f87e0a34c97b7e5c99590df45ff-3" name="rest_code_a1889f87e0a34c97b7e5c99590df45ff-3" href="https://tkurtbond.github.io/posts/2021/07/28/sorting-differs-between-langc-and-langen_usutf-8-even-in-ls/#rest_code_a1889f87e0a34c97b7e5c99590df45ff-3"></a>ab.txt
<a id="rest_code_a1889f87e0a34c97b7e5c99590df45ff-4" name="rest_code_a1889f87e0a34c97b7e5c99590df45ff-4" href="https://tkurtbond.github.io/posts/2021/07/28/sorting-differs-between-langc-and-langen_usutf-8-even-in-ls/#rest_code_a1889f87e0a34c97b7e5c99590df45ff-4"></a>$<span class="w"> </span>cat<span class="w"> </span>>ab-first.txt
<a id="rest_code_a1889f87e0a34c97b7e5c99590df45ff-5" name="rest_code_a1889f87e0a34c97b7e5c99590df45ff-5" href="https://tkurtbond.github.io/posts/2021/07/28/sorting-differs-between-langc-and-langen_usutf-8-even-in-ls/#rest_code_a1889f87e0a34c97b7e5c99590df45ff-5"></a>ab.txt
<a id="rest_code_a1889f87e0a34c97b7e5c99590df45ff-6" name="rest_code_a1889f87e0a34c97b7e5c99590df45ff-6" href="https://tkurtbond.github.io/posts/2021/07/28/sorting-differs-between-langc-and-langen_usutf-8-even-in-ls/#rest_code_a1889f87e0a34c97b7e5c99590df45ff-6"></a>a-c.txt
<a id="rest_code_a1889f87e0a34c97b7e5c99590df45ff-7" name="rest_code_a1889f87e0a34c97b7e5c99590df45ff-7" href="https://tkurtbond.github.io/posts/2021/07/28/sorting-differs-between-langc-and-langen_usutf-8-even-in-ls/#rest_code_a1889f87e0a34c97b7e5c99590df45ff-7"></a>$<span class="w"> </span><span class="nv">LANG</span><span class="o">=</span>C<span class="w"> </span>sort<span class="w"> </span>ab-first.txt
<a id="rest_code_a1889f87e0a34c97b7e5c99590df45ff-8" name="rest_code_a1889f87e0a34c97b7e5c99590df45ff-8" href="https://tkurtbond.github.io/posts/2021/07/28/sorting-differs-between-langc-and-langen_usutf-8-even-in-ls/#rest_code_a1889f87e0a34c97b7e5c99590df45ff-8"></a>a-c.txt
<a id="rest_code_a1889f87e0a34c97b7e5c99590df45ff-9" name="rest_code_a1889f87e0a34c97b7e5c99590df45ff-9" href="https://tkurtbond.github.io/posts/2021/07/28/sorting-differs-between-langc-and-langen_usutf-8-even-in-ls/#rest_code_a1889f87e0a34c97b7e5c99590df45ff-9"></a>ab.txt
<a id="rest_code_a1889f87e0a34c97b7e5c99590df45ff-10" name="rest_code_a1889f87e0a34c97b7e5c99590df45ff-10" href="https://tkurtbond.github.io/posts/2021/07/28/sorting-differs-between-langc-and-langen_usutf-8-even-in-ls/#rest_code_a1889f87e0a34c97b7e5c99590df45ff-10"></a>$<span class="w"> </span><span class="nv">LANG</span><span class="o">=</span>en_US.UTF-8<span class="w"> </span>sort<span class="w"> </span>a-c-first.txt
<a id="rest_code_a1889f87e0a34c97b7e5c99590df45ff-11" name="rest_code_a1889f87e0a34c97b7e5c99590df45ff-11" href="https://tkurtbond.github.io/posts/2021/07/28/sorting-differs-between-langc-and-langen_usutf-8-even-in-ls/#rest_code_a1889f87e0a34c97b7e5c99590df45ff-11"></a>ab.txt
<a id="rest_code_a1889f87e0a34c97b7e5c99590df45ff-12" name="rest_code_a1889f87e0a34c97b7e5c99590df45ff-12" href="https://tkurtbond.github.io/posts/2021/07/28/sorting-differs-between-langc-and-langen_usutf-8-even-in-ls/#rest_code_a1889f87e0a34c97b7e5c99590df45ff-12"></a>a-c.txt
</pre></div>
<p>I found this surprising when it first broke some code I was using.
I'd spent so much of my existence in LANG=C (I'm still surprised
that Unicode has existed for more than half my life), but once I set
LANG=en_US.UTF-8 because I was using UTF-8 characters in my documents,
I found I was in a different (sorting) world.</p>
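<p>One common way to keep a script's output from changing like this is to pin the collation for the commands whose order matters. The sketch below assumes a POSIX shell and a standard <span class="command">sort</span>; it relies on LC_ALL taking precedence over both LANG and LC_COLLATE, so the pipeline should sort in byte order even when LANG is set to en_US.UTF-8:</p>

```shell
#!/bin/sh
# Force byte-order (C locale) collation for this one command.
# LC_ALL takes precedence over LANG and LC_COLLATE, so this prints
# a-c.txt before ab.txt regardless of the caller's locale settings.
printf '%s\n' ab.txt a-c.txt | LANG=en_US.UTF-8 LC_ALL=C sort
```

<p>For a whole script, an export LC_ALL=C near the top has the same effect on every command after it. Because LC_ALL also changes things such as message language and character classification, setting only LC_COLLATE=C is a narrower option, though it only takes effect when LC_ALL is unset.</p>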