Half of the “stuff” on my site seemed to break all in one weekend. And it wasn’t even my fault!

I’m referring to the newsfeeds I have here. Two of them died: my Google News feed (top of the page) and Slashdot (formerly lower left of the main page).

Google News normally sits at the top of the main page. The headlines are “scraped” from their site every couple of hours (supposedly). Well, the RSS service I was using stopped updating late last week. I think I may have figured out why: based on some testing with a few scripts I found for doing this, it looks like Google is blocking attempts to read the site from tools that don’t identify themselves as one of a few “standard” browser types. I hacked together a workaround using PHP’s cURL extension, which lets me change the user-agent (browser) identification header my server sends when it connects to Google. Unfortunately, it’s not quite working properly yet. Hopefully it will be later this week; for now, those links at the top of the page are dead.
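For the curious, the trick boils down to sending a different `User-Agent` header with the request. In PHP’s cURL extension that’s the `CURLOPT_USERAGENT` option; below is a rough sketch of the same idea using Python’s standard `urllib` instead, not my actual script. The browser string is just an example Firefox-style identifier I picked for illustration, and `build_request`/`fetch` are hypothetical helper names.

```python
import urllib.request

# Example browser-style User-Agent (an assumption for illustration);
# any "standard" browser identifier could be swapped in here.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) "
    "Gecko/20100101 Firefox/115.0"
)


def build_request(url: str, user_agent: str = BROWSER_UA) -> urllib.request.Request:
    """Build a GET request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, user_agent: str = BROWSER_UA, timeout: float = 10.0) -> bytes:
    """Fetch the page body while presenting the browser-style User-Agent."""
    req = build_request(url, user_agent)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

Note that `urllib` normalizes header names, so the header is read back as `req.get_header("User-agent")`. Whether a given site accepts this is up to the site; it may well check more than the user agent.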

Slashdot… it seems my server has been “banned” (check this page out and look for “My RSS reader tells me I was banned!” for details) for accessing their site “too often” or something. I have two web pages that used to do hourly refreshes. Barring something freaky, this shouldn’t generate more than two queries an hour… but maybe Slashdot has gotten pickier lately. Also, maybe someone has spoofed my server’s IP address; I sent a message to Slashdot to see if they can give me some clues about why my server was banned. For now, I’ve turned my Slashdot feeds off.
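One way to keep two pages from each hitting a feed on their own schedule is to have them share a single cached copy that gets refreshed at most once an hour. Here’s a rough Python sketch of that idea, assuming a hypothetical `FeedCache` class and a `fetch_fn` you supply; it isn’t what my pages actually do. In a real site the cached copy would have to live somewhere both pages can see (a file on disk, for example), since this in-memory version only lasts as long as the process.

```python
import time
from typing import Callable, Optional


class FeedCache:
    """Hand out one cached copy of a feed, refetching at most once per interval."""

    def __init__(
        self,
        fetch_fn: Callable[[], str],
        min_interval: float = 3600.0,  # seconds between real fetches (one hour)
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_fn
        self._min_interval = min_interval
        self._clock = clock
        self._data: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        """Return the cached feed, contacting the remote site only if it is stale."""
        now = self._clock()
        if self._data is None or now - self._fetched_at >= self._min_interval:
            self._data = self._fetch()
            self._fetched_at = now
        return self._data
```

With both pages calling `get()` on the same cache, the remote site sees one request per hour instead of two.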

Oh, and I have all the parts here for rebuilding my server (see my previous article), but between having to work this weekend, sick cats, and the problems mentioned here, I didn’t end up with any time to spare.