Automate Sysadmin Tasks with Python’s os.walk Function

Using Python's os.walk function to walk through a tree of files and directories.

I'm a web guy; I put together my first site in early 1993. And so, when I started to do Python training, I assumed that most of my students also were going to be web developers or aspiring web developers. Nothing could be further from the truth. Although some of my students certainly are interested in web applications, the majority of them are software engineers, testers, data scientists and system administrators.

This last group, the system administrators, usually comes into my course with the same story. The company they work for has been writing Bash scripts for several years, but they want to move to a higher-level language with greater expressiveness and a large number of third-party add-ons. (No offense to Bash users is intended; you can do amazing things with Bash, but I hope you'll agree that the scripts can become unwieldy and hard to maintain.)

It turns out that with a few simple tools and ideas, these system administrators can use Python to do more with less code, whether they're creating reports or maintaining servers. So in this article, I describe one particularly useful tool that's often overlooked: os.walk, a function that lets you walk through a tree of files and directories.
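For a first taste of what os.walk does, here is a minimal sketch that uses only the standard library. The walk_summary helper and the sample filenames are hypothetical, made up for illustration. The key point is that os.walk yields one (dirpath, dirnames, filenames) tuple per directory, starting at the top directory and working its way down.

```python
import os
import tempfile


def walk_summary(root):
    """Hypothetical helper: return (relative_dir, subdirs, files) per directory."""
    summary = []
    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory,
    # visiting root first and then descending top-down (the default order).
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        summary.append((rel, sorted(dirnames), sorted(filenames)))
    return summary


if __name__ == '__main__':
    # Build a small throwaway tree so the demo doesn't touch your real files.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, 'Archive', 'old'))
        for name in ['notes.txt',
                     os.path.join('Archive', 'a.log'),
                     os.path.join('Archive', 'old', 'b.log')]:
            with open(os.path.join(root, name), 'w') as f:
                f.write('example\n')
        for row in walk_summary(root):
            print(row)
```

Unlike a single directory listing, the loop reaches every level of the tree without any explicit recursion on your part.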

os.walk Basics

Linux users are used to running the ls command to get a list of files in a directory. Python comes with (at least) two functions that can return such a list. One is os.listdir, which means the "listdir" function in the "os" module. If you want, you can pass the name of a directory to os.listdir; if you don't, you'll get the names of files in the current directory. To try it, first import the os module:


In [10]: import os

Then, when I call os.listdir('.') on my computer, in the current directory, I get the following:


In [11]: os.listdir('.')
Out[11]:
['.git',
 '.gitignore',
 '.ipynb_checkpoints',
 '.mypy_cache',
 'Archive',
 'Files']

As you can see, os.listdir returns a list of strings, with each string being a filename. Of course, in UNIX-type systems, directories are files too, so along with regular files, you'll also see subdirectories, with no obvious indication of which is which.
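If you do need to tell files and subdirectories apart, one approach is to check each name with os.path.isdir. The sketch below assumes that is all you need; split_entries is a hypothetical helper name, not part of the os module.

```python
import os


def split_entries(path='.'):
    """Hypothetical helper: split os.listdir(path) into (directories, files)."""
    dirs, files = [], []
    for name in os.listdir(path):
        # os.listdir returns bare names, so join them with path before testing;
        # otherwise the check would be made relative to the current directory.
        if os.path.isdir(os.path.join(path, name)):
            dirs.append(name)
        else:
            files.append(name)
    return sorted(dirs), sorted(files)


if __name__ == '__main__':
    directories, files = split_entries('.')
    print('Directories:', directories)
    print('Files:', files)
```

Because os.path.isdir follows symbolic links, a link that points to a directory lands in the directories list; everything else, including broken links, is counted as a file here.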

I gave up on os.listdir long ago, in favor of glob.glob, which means the "glob" function in the "glob" module. Command-line users are used to "globbing", although they often don't know its name. Globbing means using the * and ? characters, among others, for more flexible matching of filenames. Although os.listdir can return the list of files in a directory, it cannot filter them. glob.glob can: