
Using Python's os.walk function to walk through a tree of files and directories.
I'm a web guy; I put together my first site in early 1993. And so, when I started to do Python training, I assumed that most of my students also were going to be web developers or aspiring web developers. Nothing could be further from the truth. Although some of my students certainly are interested in web applications, the majority of them are software engineers, testers, data scientists and system administrators.
This last group, the system administrators, usually comes into my course with the same story. The company they work for has been writing Bash scripts for several years, but they want to move to a higher-level language with greater expressiveness and a large number of third-party add-ons. (No offense to Bash users is intended; you can do amazing things with Bash, but I hope you'll agree that the scripts can become unwieldy and hard to maintain.)
It turns out that with a few simple tools and ideas, these system administrators can use Python to do more with less code, as well as create reports and maintain servers. So in this article, I describe one particularly useful tool that's often overlooked: os.walk, a function that lets you walk through a tree of files and directories.
os.walk Basics
Linux users are used to the ls command to get a list of files in a directory. Python comes with two different functions that can return the list of files. One is os.listdir, which means the "listdir" function in the "os" module. If you want, you can pass the name of a directory to os.listdir. If you don't do that, you'll get the names of files in the current directory. So, you can say:
In [10]: import os
When I do that on my computer, in the current directory, I get the following:
In [11]: os.listdir('.')
Out[11]:
['.git',
'.gitignore',
'.ipynb_checkpoints',
'.mypy_cache',
'Archive',
'Files']
As you can see, os.listdir returns a list of strings, with each string being a filename. Of course, in UNIX-type systems, directories are files too, so along with files, you'll also see subdirectories without any obvious indication of which is which.
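If you do need to know which is which, one option is to check each name with os.path.isdir. What follows is only a minimal sketch, not the approach this article goes on to recommend: the helper name split_entries and the throwaway directory it runs against are my own inventions for illustration. Note that os.listdir returns bare names, so each one has to be joined back onto the directory before it can be tested.

```python
import os
import tempfile


def split_entries(directory="."):
    """Split the names reported by os.listdir into (subdirectories, files).

    split_entries is a hypothetical helper written for this example;
    it is not part of the standard library.
    """
    dirs, files = [], []
    for name in sorted(os.listdir(directory)):
        # os.listdir returns bare names, so rebuild the full path first
        full_path = os.path.join(directory, name)
        if os.path.isdir(full_path):
            dirs.append(name)
        else:
            files.append(name)
    return dirs, files


# Try it on a temporary directory holding one subdirectory and one file
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "Archive"))
    with open(os.path.join(tmp, "notes.txt"), "w") as f:
        f.write("hello\n")
    demo_dirs, demo_files = split_entries(tmp)
    print("directories:", demo_dirs)
    print("files:", demo_files)
```

This works, but it only looks at a single directory and takes one extra call per name, which is part of why the more capable tools are worth knowing.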
I gave up on os.listdir long ago, in favor of glob.glob, which means the "glob" function in the "glob" module. Command-line users are used to "globbing", although they often don't know its name. Globbing means using the * and ? characters, among others, for more flexible matching of filenames. Although os.listdir can return the list of files in a directory, it cannot filter them. With glob.glob, you can: