Sorting words separated by commas
I often have lists of "words", separated by commas, possibly on multiple lines, like this example from a Makefile:
# bookman, schoolbook, palatino, times,
# helvetica, helvetica-narrow, optima, cormorant-garamond,
# or ebgaramond.
I find these lists are always getting out of order, or they end up with some short lines and some long lines. I want to be able to reformat them automatically, like this:
# bookman, cormorant-garamond, ebgaramond, helvetica, helvetica-narrow,
# optima, palatino, schoolbook, or times.
So I wrote three scripts to deal with them: sort-with-commas to do the sorting, strip-leading-hash to get rid of the leading hashes and spaces, and prefix to put the leading hashes and spaces back.
Now, above I said "words", because really it's anything separated by commas, so the "words" can contain spaces, etc.
Also, notice that the period after "ebgaramond" and the "or" before "ebgaramond" in the original list disappear, and an "or " appears before the new end of the list, "times", and a period follows it. And you can have the same situation with "and". So, the -p option to sort-with-commas adds a period after the last word, the -a option adds an "and " before the last word, and the -o option adds an "or " before the last word. If you are sorting only part of a list, you want to have a comma after the last "word", so there is the option -f for that. And to remove the period from the original list, so it doesn't end up in the middle of the new list, or to remove "and " or "or ", there is the -r option.
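To make the effect of -r, -o and -p concrete, here is a small sketch that works on the intermediate form, one "word" per line. The sed expressions follow the ones in sort-with-commas (with [[:space:]] swapped in for portability); the variable names and the three-word sample are just for illustration.

```shell
# One "word" per line, the form the list takes after splitting on commas.
words='cormorant-garamond
or ebgaramond.
bookman'

# -r: drop a leading "and "/"or " and a trailing period from every word.
cleaned=$(printf '%s\n' "$words" |
    sed -E -e 's/^(and|or)[[:space:]]+//' -e 's/\.[[:space:]]*$//')

# -o and -p: after sorting, the "$" address selects only the last line,
# so the "or " and the period land on the new final word.
decorated=$(printf '%s\n' "$cleaned" | sort -u |
    sed -e '$s/^/or /' -e '$s/$/./')

printf '%s\n' "$decorated"
```

The last line printed is "or ebgaramond.", while "bookman" and "cormorant-garamond" come out untouched.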
The default is to return the sorted list as one long line, but you can easily reformat it to multiple lines by running it through the Unix command fmt.
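As a rough illustration of that reformatting step, the sketch below wraps a one-line list with fmt. The 40-column width is an arbitrary choice for the example (fmt's default is wider), and the variable names are made up.

```shell
# A long sorted list on one line, as sort-with-commas would emit it.
line='bookman, cormorant-garamond, ebgaramond, helvetica, helvetica-narrow, optima, palatino, schoolbook, or times.'

# -w sets the maximum line width; fmt breaks only at spaces,
# so no "word" that lacks a space is ever split.
wrapped=$(printf '%s\n' "$line" | fmt -w 40)

printf '%s\n' "$wrapped"
```

One thing to watch: a "word" that itself contains a space can still be split across two lines, because fmt knows nothing about the commas.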
In this case the list is prefixed with a "#" and some spaces because it comes from a comment in a Makefile, and you have to remove those to sort the list. I wrote the script strip-leading-hash to do that, rather than having to remember the sed command to do that all the time.
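For reference, a sed command of the kind meant here might look like the sketch below. This is a guess at the idea, not the actual contents of strip-leading-hash, and the function name is hypothetical.

```shell
# Hypothetical stand-in for strip-leading-hash: remove optional leading
# whitespace, the "#" characters, and any whitespace that follows them.
strip_leading_hash() {
    sed -E 's/^[[:space:]]*#+[[:space:]]*//'
}

stripped=$(printf '%s\n' '# bookman, schoolbook,' '#   or times.' |
    strip_leading_hash)

printf '%s\n' "$stripped"
```

Lines that don't start with a hash pass through unchanged, so it is harmless to run it on a list that was never commented.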
So, to sort the original list I'd run the command
which means “strip the leading hashes and spaces, remove the trailing period and the "and " or "or ", add a final period after the last word, add an "or " before the final word, reformat as a paragraph, and prefix the lines with the hash and spaces.”
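Going by that description, the whole pipeline could be sketched roughly as follows. The three functions are hypothetical stand-ins so the example runs on its own: sort_with_commas_rpo is a condensed imitation of "sort-with-commas -r -p -o", and strip_leading_hash and prefix are guesses at what those scripts do. Passing the prefix string as an argument is also an assumption.

```shell
strip_leading_hash() {      # drop the leading "#" and the spaces after it
    sed -E 's/^[[:space:]]*#+[[:space:]]*//'
}

sort_with_commas_rpo() {    # condensed imitation of: sort-with-commas -r -p -o
    tr ',' '\n' |
        sed -E -e 's/^[[:space:]]+//' -e '/^$/d' \
               -e 's/^(and|or)[[:space:]]+//' -e 's/\.[[:space:]]*$//' |
        sort -u |
        sed -e 's/$/,/' -e '$s/,$/./' -e '$s/^/or /' |
        tr '\n' ' ' |
        sed -E 's/ $//'
}

prefix() {                  # put "$1" in front of every line; assumes "$1"
    sed -e "s/^/$1/"        # has no "/", "&" or "\" that sed would interpret
}

input='# bookman, schoolbook, palatino, times,
# helvetica, helvetica-narrow, optima, cormorant-garamond,
# or ebgaramond.'

result=$(printf '%s\n' "$input" |
    strip_leading_hash | sort_with_commas_rpo | fmt | prefix '# ')

printf '%s\n' "$result"
```

Every output line starts with "# " again, and the list ends with "or times." as in the reformatted example at the top. Where exactly the lines break depends on the local fmt.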
When I use this I'm usually in Emacs and using M-| to run it on the region (the currently selected text), often with C-u to replace the region with the results.
Here's the main script, sort-with-commas:
#! /usr/bin/env bash
###############################################################################
# Sort a list of words that are separated by commas, optionally followed by
# a newline, into a single line separated by commas followed by spaces.
#
# For example: it translates (ignore the "# +" at the beginning of lines)
#   bookman, schoolbook,palatino,
#   times, helvetica, helvetica-narrow,
# to
#   bookman, helvetica, helvetica-narrow, palatino, schoolbook, times
###############################################################################

AND_OPT=off                    # Insert "and " before last word.
FINAL_OPT=off                  # Leave "," after last word.
OR_OPT=off                     # Insert "or " before last word.
PERIOD_OPT=off                 # Insert a final period after last word.
REMOVE_AND_OR_PERIOD_OPT=off   # Remove leading "and "/"or " and trailing period.

let errors=0
while getopts "?afhopr" opt
do
    case "$opt" in
        (\?|h) let errors++ ;;
        (a)    AND_OPT=on ;;
        (f)    FINAL_OPT=on ;;
        (o)    OR_OPT=on ;;
        (p)    PERIOD_OPT=on ;;
        (r)    REMOVE_AND_OR_PERIOD_OPT=on ;;
    esac
done
shift $((OPTIND-1))

# Arithmetic comparison: inside [[ ]], ">" compares strings, not numbers.
(( $# > 0 || errors > 0 )) && {
    cat <<EOF
usage: sort-with-commas [OPTION]

This reads its standard input and sorts a line or multiple lines with
"words" separated by commas, then reassembles the line, words separated
by a comma and a space, optionally leaving a final comma after the last
word, or a period, and optionally putting "and " or "or " before the
last word.

Options
  -?
  -h  This message.
  -a  Insert "and " before last word.
  -f  Leave final comma after last word.
  -o  Insert "or " before last word.
  -p  Insert a period after the last word.
  -r  Remove "and " or "or " that occur at the beginning of a "word" in
      the original list, and a period at the end of one.

Note that combining -a and -o, or -f and -p, does what you say, but the
results are silly.
EOF
    exit 1
}

# Split on commas (one word per line), trim, sort, then decorate and rejoin.
tr ',' '\n' |
    sed -E -e 's/^[ \t]+//' -e '/^$/d' |
    (if [[ "$REMOVE_AND_OR_PERIOD_OPT" = "on" ]]; then sed -E -e 's/^(and|or)[ \t]+//' -e 's/\.[ \t]*$//'; else cat; fi) |
    sort -u |
    sed -E -e 's/$/,/' |
    (if [[ "$AND_OPT" = "on" ]]; then sed -e '$s/^/and /'; else cat; fi) |
    (if [[ "$FINAL_OPT" = "on" ]]; then cat; else sed -e '$s/,//'; fi) |
    (if [[ "$OR_OPT" = "on" ]]; then sed -e '$s/^/or /'; else cat; fi) |
    (if [[ "$PERIOD_OPT" = "on" ]]; then sed -e '$s/$/./'; else cat; fi) |
    tr '\n' ' ' |
    sed -E -e 's/[ ]$//'
Here's strip-leading-hash:
And here's prefix: