Note: This entry has been restored from old archives.
For random primates, such as myself, friendly spiders and assorted maladroit suckers of all the Internet’s most rank drivel must represent near 100% of our readership. Since, in truth, most people are little better than the few lines of code behind my most frequent website visitors, I bid you all welcome. You’re welcome to be my friends, Googlebot, Baiduspider, Gigabot, TurnitinBot, Zeusbot, msnbot and the other 65 or so eaters of my robots.txt I have seen in the last year. But those of you who shun my robots.txt, especially those of you lacking decent user-agent strings, can crawl back into your dingy holes with the slugs and worms (I’m looking at you: bots from EZRSSFeeds, WebSense (Konqueror my arse) and other houses of deception). Alas for you, even these clammy denizens of dank and musty places will probably shun your presence.
One of your number seems to have more in common with the leech than any other form of life. To me this nefarious creature appears to propose: “I’ll make it easy for you to steal content to put on your website to fool Google into thinking you actually have content of your own.”
- No mention of copyright or content ownership on the site, none that I can find.
- The “spider” page doesn’t tell you about the spider employed; it tries to sell you some kind of “spider”.
- The bot grabs RSS with high regularity. (>30 hits in the last 8 days.)
- The bot doesn’t advertise itself via user-agent; it doesn’t send a user-agent string at all. (But its IP reverses to the domain name: 18.104.22.168)
- I’m guessing here, but I bet the bot pays no attention to robots.txt! (The IP above started hitting RSS on my site in September 2006 and has never requested the robots.txt file).
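For the curious, the last three checks can be run against a web server access log. What follows is only a rough sketch, not what I actually used: it assumes the log is in the common "combined" format, that the feed lives under a hypothetical `/feed` path, and that a missing user-agent is logged as `-` or an empty string. The function name `suspicious_clients` and the threshold of 30 feed hits are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed "combined" log format:
# ip ident user [time] "METHOD path proto" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

FEED_PREFIX = "/feed"  # hypothetical feed location; adjust for your own site


def suspicious_clients(lines, min_feed_hits=30):
    """Return IPs that hammer the feed, send no user-agent, and never ask for robots.txt."""
    stats = defaultdict(lambda: {"feed": 0, "robots": 0, "blank_ua": 0})
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not look like the assumed format
        s = stats[m["ip"]]
        path = m["path"]
        if path == "/robots.txt":
            s["robots"] += 1
        elif path.startswith(FEED_PREFIX):
            s["feed"] += 1
            if m["ua"] in ("", "-"):
                s["blank_ua"] += 1
    return sorted(
        ip
        for ip, s in stats.items()
        if s["feed"] >= min_feed_hits      # grabs the feed a lot
        and s["robots"] == 0               # never requested robots.txt
        and s["blank_ua"] == s["feed"]     # never sent a user-agent with a feed request
    )
```

Feed it the lines of a log file, for example `suspicious_clients(open("access.log"))` with whatever path your server writes to, and it returns a sorted list of candidate IPs. Treat the result as a list of suspects to eyeball, not a verdict: never fetching robots.txt within the log window is a hint, not proof.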
I’m blocking the little bugger’s IP now, for general bad behaviour and likely evilness… but that’s only effective until it starts crawling from a different IP. In truth, if you put stuff on “the Web” there isn’t any way to protect it; consider it “fair game”. With just a little work this bot could be made much harder to identify: since you’re already behaving in a questionable way, why not start employing bot-nets to do the surfing and use some legitimate UA strings? You’re a dumb bot! As a friend of mine might say: no bot-biscuit for you!
I think there is a viewpoint floating around that sees providing an RSS feed as permission to play free and easy with the content. People who write weblogs are essentially attention whores, so any distribution of their content must be a good thing in their eyes, right?
Now, to some squishy human life-forms: If you’re considering using the service associated with this bot, or anything similar, you might want to consider potential copyright implications. It might be fine; maybe it just provides excerpts and properly references the source, or maybe not. Like I said, their website makes no mention of copyright and their bot doesn’t identify itself; this is incriminating behaviour in my opinion. If it is legitimate, why doesn’t it do the right thing?
Alternatively, just write some bloody content you poop fairy.
To the leeches: My apologies if I offended you.
Back to the good bots: Goodnight my friends.