Analog and Report Magic Log File Formats
I got an email asking for some insight on Analog and Report Magic for web site reports. When I set this up last year, I had a bit of trouble getting the format correct for the log files generated by 1and1, my hosting provider. They come out looking a bit like basic Apache, but I had to tweak the format a bit to get a decent amount of valid data.
Here's an overview of my log / reporting process; let me know if you have any more questions ...
- Log Files: I download from my FTP account about twice a month. At first I didn't realize that 1and1 keeps logs around for three months, maybe less, then smokes 'em. So, I missed a few months late last year, my bad. Everything comes down to my machine in gzip format, so I have to expand 'em. Also - the naming convention was a little puzzling, until I realized they go by "week number"; access.log.20 is from the 20th week of the year. If you do a download midweek, you'll see daily logs, numbered appropriately (access.log.20.1, access.log.20.2, etc.). I just ignore the most recent logs and only report off the full-week ones. Also, I'll rename them (access.log.2005.W20), so I can go multi-year with my deep dive analytics.
Note that I grab the ftp and ftpxfer logs as well, and rename them the same way - I don't do any reporting on that stuff - yet ...
- Analog: Log file analyzer, infinite ROI, best in class, yada. Download and install, but note that I have bookmarked the readme page because I periodically tweak the settings in ...
- Analog.cfg: Nice little text file for all the settings ... here are some important ones ...
APACHELOGFORMAT (%h %l %u %t \"%r\" %s %b %{Host}i \"%{Referer}i\" \"%{User-agent}i\" "%j")
I have a comment in mine from last year - the string that 1and1 provides is nowhere close! I arrived at this format thru some trial and error - Analog spits out a log file (errors.txt) that details the issues with your config file and the input records it can't handle (based on this format). I get about 100 log records kicked out as "bad" from a total of ~6200; < 2% errors, I can live with that.
CONFIGFILE SearchEngines.txt
CONFIGFILE RobotInclude.txt
CONFIGFILE TypeAlias.txt
CONFIGFILE RefSpam.txt
Mike Shor maintains a really nice set of files full of cruft IDs like search engines, robots, etc. to exclude from your metrics - you don't want that false sense of uber-traffic. Create a recurring ToDo to download updates occasionally.
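If you want a rough sanity check on the APACHELOGFORMAT line above before handing a log to Analog, here is a minimal Python sketch. It is not how Analog parses anything; it is just a regex I modeled on the same field order (host, ident, user, time, request, status, bytes, Host header, referer, user agent, one trailing quoted field). The sample line, the example.com names, and the function names are all made up for illustration, and quoted fields containing escaped quotes will be counted as bad.

```python
import re

# Regex modeled on the field order of the APACHELOGFORMAT line above.
# This is an approximation for spot checks, not Analog's own parser.
LINE_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'(?P<vhost>\S+) '
    r'"(?P<referer>[^"]*)" '
    r'"(?P<agent>[^"]*)" '
    r'"(?P<junk>[^"]*)"$'
)

# Hypothetical sample record, invented for this sketch.
SAMPLE = (
    '192.0.2.10 - - [16/May/2005:08:15:30 +0200] '
    '"GET /index.html HTTP/1.1" 200 5120 www.example.com '
    '"http://www.example.org/" "Mozilla/4.0 (compatible; MSIE 6.0)" "-"'
)


def split_lines(lines):
    """Return (good, bad): parsed field dicts and the raw lines that did not match."""
    good, bad = [], []
    for line in lines:
        line = line.rstrip("\n")
        match = LINE_RE.match(line)
        if match:
            good.append(match.groupdict())
        else:
            bad.append(line)
    return good, bad


def error_rate(good, bad):
    """Fraction of lines that failed to match (0.0 for an empty input)."""
    total = len(good) + len(bad)
    return len(bad) / total if total else 0.0
```

To try it on a real file, something like `good, bad = split_lines(open("access.log.2005.W20"))` followed by `error_rate(good, bad)` gives a number to compare against the roughly 100-out-of-6200 rejects mentioned above.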
- QuickDNS: I want to understand who is coming in to the site, but Analog (above) is a bit slow at DNS lookups, so I found QuickDNS, by AnalogX (no relation). It uses the same analog.cfg file for settings like logfile name, so I just installed it in the same directory as Analog. It's a command line utility, so the line in my batch file to gen reports is
qdns /l access.log /d dnsfile.txt /y 192.168.x.y /t 500
The /y parm points to my DNS server.
- Report Magic: Hmmm, as I write this the URL is not responding - hopefully, this superlative chunk o'sware can still be found. This will read the Analog output and render beautiful graphs that will impress your friends ...
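The expand-and-rename chore from the Log Files step can be scripted. Here is a minimal Python sketch under a few assumptions: the downloads sit in one folder, full-week files are named like access.log.20 or access.log.20.gz, and daily files carry an extra numeric suffix (access.log.20.1) and get skipped. The function names and folder names are hypothetical, and zero-padding the week number (W05 rather than W5) is my own choice so the files sort nicely.

```python
import gzip
import re
import shutil
from pathlib import Path

# Full-week logs only: <name>.log.<week>, optionally gzipped.
# Daily logs such as access.log.20.1 deliberately do not match.
WEEKLY_RE = re.compile(r'^(?P<base>.+\.log)\.(?P<week>\d{1,2})(?:\.gz)?$')


def archive_name(filename, year):
    """Map access.log.20[.gz] to access.log.2005.W20; return None for anything else."""
    match = WEEKLY_RE.match(filename)
    if match is None:
        return None
    return f"{match['base']}.{year}.W{int(match['week']):02d}"


def expand_and_rename(src_dir, dest_dir, year):
    """Decompress (if gzipped) and copy each full-week log under its multi-year name."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(src_dir).iterdir()):
        target_name = archive_name(src.name, year)
        if target_name is None or not src.is_file():
            continue  # daily logs and unrelated files are left alone
        target = dest / target_name
        opener = gzip.open if src.suffix == ".gz" else open
        with opener(src, "rb") as fin, open(target, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        written.append(target)
    return written
```

A call like `expand_and_rename("downloads", "logs", 2005)` leaves the originals untouched and returns the list of files it wrote. The year is passed in by hand because the week-numbered file names do not carry it.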
Ok, not the most original post, but I've published that APACHELOGFORMAT line for 1and1, so there's my value add to the Internet for the day!