cazh1: on Business, Information, and Technology

Thoughts and observations on the intersection of technology and business; searching for better understanding of what's relevant, where's the value, and (always) what's the goal ...

Saturday, September 10, 2005

Analog and Report Magic Log File Formats

I got an email asking for some insight on Analog and Report Magic for web site reports. When I set this up last year, I had a bit of trouble getting the format correct for the log files generated by 1and1, my hosting provider. They come out looking a bit like basic Apache, but I had to tweak the format a bit to get a decent amount of valid data.

Here's an overview of my log / reporting process; let me know if you have any more questions ...

  • Log Files: I download from my FTP account about twice a month. At first I didn't realize that 1and1 keeps logs around for three months, maybe less, then smokes 'em. So, I missed a few months late last year, my bad. Everything comes down to my machine in gzip format, so I have to expand 'em. Also - the naming convention was a little puzzling, until I realized they go by "week number"; access.log.20 is from the 20th week of the year. If you do a download midweek, you'll see daily logs, numbered appropriately (access.log.20.1, access.log.20.2, etc.). I just ignore the most recent logs and only report off the full-week ones. Also, I'll rename them (access.log.2005.W20), so I can go multi-year with my deep dive analytics.
    Note that I grab the ftp and ftpxfer logs as well, and rename them the same way - I don't do any reporting on that stuff - yet ...
  • Analog: Log file analyzer, infinite ROI, best in class, yada. Download and install, but note that I have bookmarked the readme page because I periodically tweak the settings in ...
  • Analog.cfg: Nice little text file for all the settings ... here are some important ones ...

APACHELOGFORMAT (%h %l %u %t \"%r\" %s %b %{Host}i \"%{Referer}i\" \"%{User-agent}i\" "%j")

I have a comment in mine from last year: the string that 1and1 provides is nowhere close! I arrived at this format through some trial and error. Analog spits out a log file (errors.txt) that details the issues with your config file and the input records it can't handle (based on this format). I get about 100 log records kicked out as "bad" from a total of ~6200; that's under 2% errors, and I can live with that.
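To make the field order in that format string a little more concrete, here is a rough Python sketch that reads a log line the same way. This is not how Analog parses anything; the regex, the function names, and the sample line are all made up for illustration, and it assumes the trailing "%j" field is just a quoted value that can be ignored. It also counts the lines that fail to match, which is the same kind of "bad record" ratio described above (100 out of ~6200 is about 1.6%).

```python
import re

# Hypothetical regex mirroring the field order of the APACHELOGFORMAT line:
# %h %l %u %t "%r" %s %b %{Host}i "%{Referer}i" "%{User-agent}i" "%j"
LINE_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'(?P<vhost>\S+) '
    r'"(?P<referer>[^"]*)" '
    r'"(?P<agent>[^"]*)" '
    r'"(?P<junk>[^"]*)"'  # assumed: last quoted field is ignorable
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not fit the format."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


def error_rate(lines):
    """Fraction of lines that do not fit the format (0.0 for empty input)."""
    lines = list(lines)
    if not lines:
        return 0.0
    bad = sum(1 for line in lines if parse_line(line) is None)
    return bad / len(lines)


# Invented sample line, NOT taken from a real 1and1 log.
sample = (
    '192.0.2.10 - - [10/Sep/2005:12:00:00 -0400] "GET /index.html HTTP/1.1" '
    '200 5120 www.example.com "http://www.example.org/" "Mozilla/5.0" "-"'
)

if __name__ == "__main__":
    print(parse_line(sample))
    print(error_rate([sample, "not a log line"]))
```

Lines with escaped quotes inside the user-agent or referer would be rejected by this simple pattern, which is one plausible source of a small "bad" pile.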

CONFIGFILE SearchEngines.txt
CONFIGFILE RobotInclude.txt
CONFIGFILE TypeAlias.txt
CONFIGFILE RefSpam.txt

Mike Shor maintains a really nice set of files full of cruft IDs like search engines, robots, etc. to exclude from your metrics - you don't want that false sense of uber-traffic. Create a recurring ToDo to download updates occasionally.

  • QuickDNS: I want to understand who is coming in to the site, but Analog (above) is a bit slow at resolving addresses, so I found QuickDNS, by AnalogX (no relation). It uses the same analog.cfg file for settings like logfile name, so I just installed it in the same directory as Analog. It's a command line utility, so the line in my batch file to gen reports is:

qdns /l access.log /d dnsfile.txt /y 192.168.x.y /t 500

The /y parm points to my DNS server.

  • Report Magic: Hmmm, as I write this the URL is not responding - hopefully, this superlative chunk o'sware can still be found. This will read the Analog output and render beautiful graphs that will impress your friends ...


Ok, not the most original post, but I've published that APACHELOGFORMAT line for 1and1, so there's my value add to the Internet for the day!
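For the download step at the top of the list, the expand-and-rename chore could be scripted along these lines. This is only a sketch under some assumptions: that the downloaded weekly files are named like access.log.20 or access.log.20.gz (the .gz extension is a guess), that the year is supplied by hand, and that the function names and directories are hypothetical. Daily files such as access.log.20.1 are skipped, matching the "only report off the full-week ones" habit. The ftp and ftpxfer logs are not handled here.

```python
import gzip
import re
import shutil
from pathlib import Path

# Matches full-week logs only: access.log.<week>, optionally with .gz (assumed).
# Daily logs like access.log.20.1 do not match and are skipped.
WEEKLY_RE = re.compile(r'^(?P<base>access\.log)\.(?P<week>\d{1,2})(?P<gz>\.gz)?$')


def archive_name(filename, year):
    """Return the year-qualified name for a full-week log, else None."""
    match = WEEKLY_RE.match(filename)
    if match is None:
        return None
    return f"{match['base']}.{year}.W{match['week']}"


def expand_and_rename(src_dir, dest_dir, year):
    """Copy full-week logs into dest_dir under multi-year names, gunzipping as needed."""
    src_dir, dest_dir = Path(src_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(src_dir.iterdir()):
        target_name = archive_name(src.name, year) if src.is_file() else None
        if target_name is None:
            continue  # daily log, ftp log, or something else entirely
        target = dest_dir / target_name
        opener = gzip.open if src.suffix == ".gz" else open
        with opener(src, "rb") as fin, open(target, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        written.append(target)
    return written


if __name__ == "__main__":
    # Hypothetical directories; adjust to wherever the FTP download lands.
    for path in expand_and_rename("downloads", "logs", 2005):
        print(path)
```

Passing the year in explicitly keeps the script from guessing which year a week number belongs to when logs are pulled down around New Year.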
