Back to Basics: Sort and Uniq

Learn the fundamentals of sorting and de-duplicating text on the command line.

If you've been using the command line for a long time, it's easy to take the commands you use every day for granted. But, if you're new to the Linux command line, there are several commands that make your life easier that you may not stumble upon automatically. In this article, I cover the basics of two commands that are essential in anyone's arsenal: sort and uniq.

The sort command does exactly what it says: it takes text data as input and outputs sorted data. There are many scenarios on the command line when you may need to sort output, such as the output from a command that doesn't offer sorting options of its own (or the sort arguments are obscure enough that you just use the sort command instead). In other cases, you may have a text file full of data (perhaps generated with some other script), and you need a quick way to view it in a sorted form.

Let's start with a file named "test" that contains three lines:


Foo
Bar
Baz

sort can operate either on STDIN redirection, the input from a pipe, or, in the case of a file, you also can just specify the file on the command. So, the three following commands all accomplish the same thing:


cat test | sort
sort < test
sort test

And the output that you get from all of these commands is:


Bar
Baz
Foo

Sorting Numerical Output

Now, let's complicate the file by adding three more lines:


Foo
Bar
Baz
1. ZZZ
2. YYY
11. XXX

If you run one of the above sort commands again, this time, you'll see different output:


11. XXX
1. ZZZ
2. YYY
Bar
Baz
Foo

This is likely not the output you wanted, but it points out an important fact about sort. By default, it sorts alphabetically, not numerically. This means that a line that starts with "11." is sorted above a line that starts with "1.", and all of the lines that start with numbers are sorted above lines that start with letters.

To sort numerically, pass sort the -n option:


sort -n test

Bar
Baz
Foo
1. ZZZ
2. YYY
11. XXX

Find the Largest Directories on a Filesystem

Numerical sorting comes in handy for a lot of command-line output—in particular, when your command contains a tally of some kind, and you want to see the largest or smallest in the tally. For instance, if you want to find out what files are using the most space in a particular directory and you want to dig down recursively, you would run a command like this: