Simon Fell > Its just code > bandwidth, Google, awstats

Wednesday, March 22, 2006

My bandwidth usage has doubled in the last 6 months, and I've been trying to work out where it's all going. I figured it was largely from downloads, and have been thinking about moving the downloads to a cheaper hosting deal.

I ran this year's logs through awstats (a fairly painful process; it seems odd to me that anyone building a web server log mining tool would assume you don't have lots of existing logs to feed it). It generated some interesting stats, not least of which is that the googlebot has already soaked up over a gigabyte of bandwidth this month (more than 3x Yahoo, and 5x AskJeeves). WTF is it doing? The site is not that big, so why has it done 100k hits and 1.03GB of bandwidth just in March?
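One lever for crawler bandwidth is robots.txt. A sketch (the /downloads/ path is made up for illustration): Crawl-delay is honored by Yahoo's Slurp and some other bots, though notably not by Googlebot, whose crawl rate has to be managed through Google's own webmaster tools instead.

```
# Crawl-delay: seconds between fetches; Slurp honors it, Googlebot does not
User-agent: Slurp
Crawl-delay: 10

# keep all crawlers out of the high-bandwidth area entirely (path is hypothetical)
User-agent: *
Disallow: /downloads/
```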

Also, it turns out that my RSS feed eats about as much bandwidth as the binary downloads, so back to the drawing board there, I think.
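Part of the fix for the feed may just be compression, since RSS is extremely compressible text. A quick sketch of the kind of saving to expect (the generated sample-rss.xml is stand-in data, not my real feed):

```shell
# build a repetitive sample feed as a stand-in for the real rss.xml
for i in $(seq 1 50); do
  printf '<item><title>post %s</title><link>http://example.com/%s</link></item>\n' $i $i
done > sample-rss.xml

# compare raw size against gzip'd size
raw=$(wc -c < sample-rss.xml)
gz=$(gzip -c sample-rss.xml | wc -c)
echo "raw=$raw gzipped=$gz"
```

Conditional GET (answering 304 Not Modified to clients that send If-Modified-Since) is the other big lever, since most aggregators re-poll on a timer whether or not the feed changed.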

awstats was pretty easy to get up and running on my Mac (easier than a previous attempt to run it on Windows), although it got the httpd.conf changes wrong; that was easy enough to fix. One nice trick I managed: you can feed it logs from a pipe, so there's no need to download the logs to the local machine first, you can just feed them directly from curl, e.g.

LogFile="curl -u user:pass ftp://myserver/serverlogs/%YY-24%MM-24%DD-25.log |"
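For completeness, the update run that consumes that LogFile setting is just the awstats script invoked with -update (the config name here is hypothetical; it matches your awstats.mysite.conf file):

```
perl awstats.pl -config=mysite -update
```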

awstats still has a few holes in it. I'd like to see a top 10 list of downloaded files (.exe, .zip, etc.), and it would be nice if the tables were sortable by their headings. The summaries are great, but you can't drill down, so either I'll be getting more friendly with grep, or I'll be trying out some other tools (any recommendations?)
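In the meantime, the grep route isn't too bad. A sketch of a top-10 downloads report straight from a combined-format access log with awk: field 7 is the request path and field 10 the bytes sent, so summing field 10 per path gives bandwidth per file. The sample.log here is made-up demo data; in practice you'd point it at the real logs (or the curl pipe above).

```shell
# made-up demo data standing in for the real access log
cat > sample.log <<'EOF'
1.2.3.4 - - [22/Mar/2006:10:00:00 -0800] "GET /downloads/pocketsoap.zip HTTP/1.1" 200 50000
5.6.7.8 - - [22/Mar/2006:10:01:00 -0800] "GET /downloads/setup.exe HTTP/1.1" 200 120000
1.2.3.4 - - [22/Mar/2006:10:02:00 -0800] "GET /downloads/pocketsoap.zip HTTP/1.1" 200 50000
9.9.9.9 - - [22/Mar/2006:10:03:00 -0800] "GET /weblog/index.html HTTP/1.1" 200 4000
EOF

# sum bytes per .exe/.zip request path, biggest first, top 10
awk '$7 ~ /\.(exe|zip)$/ { bytes[$7] += $10 }
     END { for (f in bytes) printf "%d %s\n", bytes[f], f }' sample.log |
  sort -rn | head -10 > top10.txt
cat top10.txt
```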