Breaking Up Apache Log Files for Analysis

Dave tackles analysis of the ugly Apache web server log.

I know, in my last article I promised I'd jump back into the mail merge program I started building a while back. Since I'm having some hiccups with my AskDaveTaylor.com web server, however, I'm going to claim editorial privilege and bump that yet again.

What I need to do is be able to process Apache log files and isolate specific problems and glitches that are being encountered—a perfect use for a shell script. In fact, I have a script of this nature that offers basic analytics in my book Wicked Cool Shell Scripts from O'Reilly, but this is a bit more specific.

Oh Those Ugly Log Files

To start, let's take a glance at a few lines out of the latest log file for the site:


$ head sslaccesslog_askdavetaylor.com_3_8_2019
18.144.59.52 - - [08/Mar/2019:06:10:09 -0600] "GET /wp-content/
↪themes/jumpstart/framework/assets/js/nivo.min.js?ver=3.2
 ↪HTTP/1.1" 200 3074
"https://www.askdavetaylor.com/how-to-play-dvd-free-windows-
↪10-win10/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)
 ↪AppleWebKit/537.36 (KHTML, like Gecko) Chrome/
 ↪64.0.3282.140 Safari/537.36 Edge/18.17763 X-Middleton/1"
 ↪52.53.151.37 - - [08/Mar/2019:06:10:09 -0600] "GET
 ↪/wp-includes/js/jquery/jquery.js?ver=1.12.4 HTTP/1.1"
 ↪200 33766 "https://www.askdavetaylor.com/how-to-play
↪-dvd-free-windows-10-win10/" "Mozilla/5.0 (Windows NT
 ↪10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)
 ↪Chrome/64.0.3282.140 Safari/537.36 Edge/18.17763
 ↪X-Middleton/1" 18.144.59.52 - - [08/Mar/2019:06:10:09
 ↪-0600] "GET /wp-content/plugins/google-analytics-for-
↪wordpress/assets/js/frontend.min.js?ver=7.4.2 HTTP/1.1"
 ↪200 2544 "https://www.askdavetaylor.com/how-to-play
↪-dvd-free-windows-10-win10/"
 ↪"Mozilla/5.0 (Windows NT 10.0; Win64; x64)
 ↪AppleWebKit/537.36 (KHTML, like Gecko)
 ↪Chrome/64.0.3282.140 Safari/537.36 Edge/18.17763
 ↪X-Middleton/1"

It's big and ugly, right? Okay, then let's just isolate a single entry to see how it's structured:


18.144.59.52 - - [08/Mar/2019:06:10:09 -0600] "GET
 ↪/wp-content/themes/jumpstart/framework/assets/js/
↪nivo.min.js?ver=3.2 HTTP/1.1" 200 3074
"https://www.askdavetaylor.com/how-to-play-dvd-free-windows-
↪10-win10/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140
 ↪Safari/537.36 Edge/18.17763 X-Middleton/1"

That's still obfuscated enough to kick off a migraine!

Fortunately, the Apache website has a somewhat clearer explanation of what's known as the custom log file format that's in use on my server. Of course, it's described in a way that only a programmer could love: