Simon Fell > Its just code > [more on] bandwidth

Friday, March 31, 2006

Between grep, awstats, Perl and Python, I've been picking my logs apart to get a handle on the bandwidth usage. Here's a summary of what I've found so far.

  • Google, for some reason, was pounding some 3-year-old content that hasn't changed in forever, over and over again. A swift kick in the robots.txt seems to have quieted that down, but it still managed to chew through 1.86Gb of bandwidth in March.
  • The RSS feed for this blog is responsible for a good chunk of bandwidth too. I see people still insist on writing aggregators that don't do conditional GETs. WTF is wrong with you people? The shit list includes AlestiFeedBot, RssReader, Squeet, BlendBlogs, NewsAlloy and Thunderbird.
  • Some comedian from 72.41.107.1 is running FeedForAll's rss2html, which is crappy enough to download the entire feed for every request. It also sends the URL of my feed as the referrer, so other than the IP I can't find out where this is being used, but it's made more than 13k requests for the feed this month. (If you know who you are, get in touch.)
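For anyone wanting to pull similar numbers out of their own logs, here's a rough Python sketch of the kind of tally involved. It is not the script I used: it assumes "combined" format log lines (your server's format may well differ), and the `tally` function, the regex and the sample lines are all made up for illustration.

```python
import re
from collections import Counter

# ASSUMPTION: "combined" log format. Adjust the pattern for your own server.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def tally(lines):
    """Return (bytes_by_agent, requests_by_ip) Counters for parseable lines."""
    bytes_by_agent = Counter()
    requests_by_ip = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that don't fit the assumed format
        # A "-" size means no body was sent (e.g. a 304 response).
        size = 0 if m["size"] == "-" else int(m["size"])
        bytes_by_agent[m["agent"]] += size
        requests_by_ip[m["ip"]] += 1
    return bytes_by_agent, requests_by_ip


# Hypothetical sample lines, not taken from the real logs.
SAMPLE = [
    '192.0.2.1 - - [31/Mar/2006:10:00:00 -0800] "GET /rss.xml HTTP/1.1" 200 50000 "-" "ExampleBot/1.0"',
    '192.0.2.1 - - [31/Mar/2006:10:01:00 -0800] "GET /rss.xml HTTP/1.1" 200 50000 "-" "ExampleBot/1.0"',
    '192.0.2.7 - - [31/Mar/2006:10:05:00 -0800] "GET /rss.xml HTTP/1.1" 304 - "-" "ExampleReader/2.0"',
]

bytes_by_agent, requests_by_ip = tally(SAMPLE)
for agent, total in bytes_by_agent.most_common():
    print(f"{total:>10} bytes  {agent}")
```

Sorting by total bytes makes the greedy clients float to the top, and the per-IP request count is what catches a single address hammering the feed.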

Why don't the Google bot and other search engines support conditional GETs? The amount of change on the site is a fairly small percentage, so a crawler doesn't really need to pull the entire thing over just to pick up the 3 changes since last time. As this doesn't make things easier for them (although you'd think re-indexing only the changes rather than everything would be faster and therefore cheaper), I doubt we'll see it. This is definitely an area where I think one of the underdogs could step up to the plate and force some movement.
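For the record, a conditional GET is not much work on the client side. Here's a minimal Python sketch using only the standard library. The function names `conditional_headers` and `fetch_if_changed` are mine, and it assumes the caller stores the `ETag` and `Last-Modified` values from the previous fetch somewhere between polls.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build the validator headers from whatever the last response gave us."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None on a 304 Not Modified."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Nothing changed: keep the cached copy and the old validators.
            return None, etag, last_modified
        raise
```

On the first poll you call `fetch_if_changed(url)` with no validators and save what comes back. On every later poll you pass the saved values in, and an unchanged feed costs a few hundred bytes of headers instead of the whole document.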

Yeah, I know my feed doesn't support compression. I've been talking to the Orcsweb folks (who do a fantastic job hosting) about it, but the IIS compression settings are all global, so they can't add .xml files to the compression list without affecting everyone on that server, and they're understandably reluctant to do it. Not sure what to do about that. In the meantime I've cut down the number of entries in the feed; I don't blog as much as I used to, so it shouldn't be a big issue.
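To get a feel for what compression would buy, here's a quick Python sketch that gzips a feed-shaped document and compares sizes. The feed is synthetic (`build_sample_feed` is a made-up helper, not my real feed), so the exact numbers will differ from real traffic, but XML this repetitive generally compresses well.

```python
import gzip


def build_sample_feed(entries):
    """Build a synthetic RSS-like document with the given number of items."""
    items = "".join(
        f"<item><title>Post {i}</title>"
        f"<link>http://example.com/post/{i}</link>"
        f"<description>Some text for post number {i}.</description></item>"
        for i in range(entries)
    )
    feed = f'<?xml version="1.0"?><rss version="2.0"><channel>{items}</channel></rss>'
    return feed.encode("utf-8")


def gzip_savings(payload):
    """Return (raw_size, gzipped_size) in bytes for the payload."""
    return len(payload), len(gzip.compress(payload))


raw, packed = gzip_savings(build_sample_feed(20))
print(f"{raw} bytes raw, {packed} bytes gzipped")
```

Running `gzip_savings` with a smaller entry count shows the other lever too: fewer entries means fewer bytes, with or without compression.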

Update: Good news, both the NewsAlloy and Squeet folks have rolled out new versions with conditional GET support, and the BlendBlogs guys are working on it. Thanks!