Blog spam continues to pour in: Akismet starting to miss some…

It’s old news for the big bloggers, but for me it’s somewhat novel: the sheer volume of spam comments on my blog is growing at an amazing pace. You don’t see the spam in my comments because I both use Akismet (the spam filter for WordPress 2.x blogs) and moderate all comments. But here is what I have observed on this very blog you are reading right now:

  • Total spam comments between July 2005 and January 1, 2006: approximately 50
  • Spam comments from January 2006 to May 7: approximately 1500
  • Spam comments from May 7 to today: approximately 1700
  • Spam comments from January to yesterday (five months) that made it past Akismet and that I had to moderate manually: about 70
  • Spam comments today that made it past Akismet and that I had to manually moderate: 25
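As a back-of-the-envelope check on those numbers, here is a small sketch of the share of spam that slipped past Akismet between January and yesterday. It assumes the roughly 70 missed comments are in addition to the roughly 3,200 (1,500 + 1,700) counted above, which the list does not actually say, and the variable and function names are made up for illustration.

```python
# Approximate counts taken from the list above.
caught_jan_to_may7 = 1500
caught_may7_to_today = 1700
# Assumption: the missed comments are not already included in the counts above.
missed_jan_to_yesterday = 70


def miss_rate(caught: int, missed: int) -> float:
    """Return the fraction of all spam that got past the filter."""
    total = caught + missed
    return missed / total if total else 0.0


caught = caught_jan_to_may7 + caught_may7_to_today
rate = miss_rate(caught, missed_jan_to_yesterday)
print(f"Miss rate, January to yesterday: {rate:.1%}")
```

Under that assumption the long-run miss rate comes out at around 2%, which is why 25 misses in a single day stands out.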

I think I sense a trend here…and it’s not a happy one. I’m in the range of 100 spam comments a day now, and it’s not unreasonable to expect 200 a day by the end of the year. I wonder if the law of diminishing returns will start to kick in for the services generating the comments?

What I’m describing is an old story on the Internet…wasting other people’s time and money is far too easy. I’m kind of hopeful that bloggers and makers of blog software like WordPress have learned from what happened with email, and that the filters will clamp down more effectively on the junk early in the process. Akismet works pretty well, and if every blogger/blog tool has something similar up and running now or within the next couple of months, that might be a good start. If that were to happen, perhaps the questionable business model upon which blog spamming is based would simply dry up and blow away in the wind.

Yeah, I’m being optimistic, but I can hope, can’t I?

UPDATE: I may be optimistic, but I’m not stupid. I have installed Bad Behavior (to block spam attempts before they even create a comment) and BBStats (to show Bad Behavior’s activity).

5 thoughts on “Blog spam continues to pour in: Akismet starting to miss some…”

  1. The whole point behind Bad Behavior is to drive up the cost of blog spamming. In fact, the cost of sending spam goes up exponentially the more people use Bad Behavior.

    The problem is that not enough people are using it. Which is why I’m developing version 2 to reach more platforms and stop even more spam on even more blog, wiki, forum and guestbook software packages.

    In the meantime, while I haven’t driven the spammers out of business yet, I’ve at least been able to provide virtually spam-free blogging. Enjoy!

  2. Welcome, Michael!

    Thank you for developing Bad Behavior. When I read about how it works, it seemed to me that it would frustrate spammers a great deal more than simply having their spam filtered. My hope is that in combination with Akismet it will make the spam problem largely invisible to me and, at the same time, disadvantage the folks generating the spam enough to perhaps convince them to back off.

    One thing I’ve noticed in my logs is that it appears Feedster’s crawler (or at least something identifying itself as Feedster) is being blocked for incorrect use of the TE header. My next step should probably be to track down a Bad Behavior discussion/support forum so I can find out more about whether this is expected/wanted behavior.

  3. Heh. I am the Bad Behavior discussion/support forum. 🙂

    As for Feedster, it is using TE correctly, according to my logs. If you’re seeing it blocked, check to ensure your site isn’t using a misconfigured HTTP accelerator (reverse proxy). This is usually squid, tux, or things along those lines, but some unusual Apache configurations might do it.

    If Feedster is being blocked at your site, it’s likely that so is the Opera web browser, so it’s definitely something you’ll want to work with your web host to get fixed, or if they’re unwilling to fix it, to find a new web host.

  4. Like you being your own support forum, I am my own web host 🙂 The server running this site is upstairs in my house.

    I have no reverse proxy installed. In terms of Apache config…I’m running four VHosts on the same Apache instance. The only other “strange” thing I’m using that I can think of is mod_deflate.

    This is what I’m seeing in the Bad Behavior log table for Feedster:

    http_headers:
    GET /feed/ HTTP/1.1
    TE: deflate,gzip;q=0.3
    Accept-Encoding: gzip, deflate, compress
    From: [email protected]
    Host: www.kgadams.net
    If-Modified-Since: Sun, 21 May 2006 21:26:38 GMT
    If-None-Match: "87af99f4cc3f479eaf691ac5697be113"
    User-Agent: Feedster Crawler/1.0; Feedster, Inc.
    
    denied_reason:
    Header 'TE' present but TE not specified in 'Connection' header

    As you mentioned, I’ve observed a couple of rejections for Opera of which the following is an exemplar:

    http_headers:
    GET /?p=2 HTTP/1.1
    Host: www.kgadams.net
    User-Agent: Opera/6.03 (Windows 2000; U) [en]
    Pragma: no-cache
    Accept: */*
    Referer: http://www.kgadams.net/?p=2
    Max-Forwards: 10

    denied_reason:
    Header 'Pragma' without 'Cache-Control' prohibited for HTTP/1.1 requests

    Thoughts?
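
The two denial reasons in the log entries above can be read as simple checks on the request headers. The sketch below is a Python illustration only: it is not Bad Behavior's actual code, and the function name `denial_reason` and the exact conditions are assumptions inferred from the wording of the two log messages.

```python
from typing import Dict, Optional


def denial_reason(protocol: str, headers: Dict[str, str]) -> Optional[str]:
    """Return a denial message for the request, or None if both checks pass.

    Hypothetical helper: the rules are inferred from the log messages only.
    """
    # Header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}

    # Check 1: a TE header must also be listed as a token in Connection.
    if "te" in h:
        tokens = [t.strip().lower() for t in h.get("connection", "").split(",")]
        if "te" not in tokens:
            return "Header 'TE' present but TE not specified in 'Connection' header"

    # Check 2: an HTTP/1.1 request with Pragma must also send Cache-Control.
    if protocol == "HTTP/1.1" and "pragma" in h and "cache-control" not in h:
        return "Header 'Pragma' without 'Cache-Control' prohibited for HTTP/1.1 requests"

    return None


# Headers copied from the two log entries above (abbreviated).
feedster = {
    "TE": "deflate,gzip;q=0.3",
    "Accept-Encoding": "gzip, deflate, compress",
    "User-Agent": "Feedster Crawler/1.0; Feedster, Inc.",
}
opera = {
    "User-Agent": "Opera/6.03 (Windows 2000; U) [en]",
    "Pragma": "no-cache",
    "Accept": "*/*",
}

print(denial_reason("HTTP/1.1", feedster))
print(denial_reason("HTTP/1.1", opera))
```

Neither logged request contains a Connection header at all. That would be consistent with something between the client and the blog dropping it, but the logs alone do not show where it went missing.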
