Chris Martin

Fun with find

find is one of many simple yet powerful command-line tools, and I haven’t until recently had a chance to learn how to use it. Oh, the things you can do. Here’s a fun example Matt Luongo and I worked out a few days ago, plus a bit I added on just now:

find -name "*.java" \
    | xargs wc -l   \
    | sort -nr      \
    | head -n 5     \
    | sed -r -e 's/\.\/(.*\/)*//g'

find recursively enumerates all of the files within the current directory which match the name *.java.

The result of that gets piped to xargs, which takes each line (a file path) and sends them as arguments to wc (word count). The -l flag tells wc to count the number of lines in each file, and also give a total count.

This is a pretty big list of files, and the most interesting results are the largest ones. So, the next pipe goes to sort. -n tells it sort by number, not alphabetically, and -r sorts in reverse other to put the biggest numbers on top.

It’s sorted, but I still only care about the first few, so I pipe that into head, whose -n parameter lets me specify that I only want the first 5 lines.

Finally, the line length of the output is pretty long, because it’s showing the entire relative path to the files, which are somewhat deeply nested. I only want to see the filenames. Fortunately, this is nothing that a simple regex can’t handle, and sed is up to the task with a search-and-replace which yields exactly what I’m looking for.

I ran this from the SVN directory for my software engineering project to see how much code we’d produced. The result:

 18208 total
   706 Connection.java
   636 ColabClient.java
   632 ColabClientGUI.java
   506 ParagraphEditor.java

Fascinating.

Instead of writing this, I really should have spent the last half hour sleeping. I do only have one more day to debug 18208 lines of Java for our final demo.

I write about Haskell and related topics; you can find my works online on Type Classes and in print from The Joy of Haskell.