Yvan Seth's Hole in the Internet

Further Internetual randomness courtesy of Yvan Seth, 2E8F CE5E AEA8 B7B4 EE29 641F F2F2 EE44 AA02 4D53.

EZRSSFeeds and other WebSuckers

Tue 2007-01-09 19:33

For random primates such as myself, friendly spiders and assorted maladroit suckers of all the Internet's most rank drivel must represent nearly 100% of our readership. Since, in truth, most people are little better than the few lines of code behind my most frequent website visitors, I bid you all welcome. You're welcome to be my friends, Googlebot, Baiduspider, Gigabot, TurnitinBot, Zeusbot, msnbot and the other 65 or so eaters of my robots.txt I have seen in the last year. But those of you who shun my robots.txt, especially those of you lacking decent user-agent strings, can crawl back into your dingy holes with the slugs and worms (I'm looking at you: bots from EZRSSFeeds, WebSense (Konqueror my arse) and other houses of deception). Alas for you, even these clammy denizens of dank and musty places will probably shun your presence.

One of your number seems to have more in common with the leech than with any other form of life. To me this nefarious creature appears to propose: "I'll make it easy for you to steal content to put on your website to fool Google into thinking you actually have content of your own."

Highlights:

  • No mention of copyright or content ownership anywhere on the site, none that I can find.
  • The "spider" page doesn't tell you about the spider employed, it tries to sell you some kind of "spider".
  • The bot grabs RSS with high regularity. (>30 hits in the last 8 days.)
  • The bot doesn't advertise itself via user-agent; it doesn't send a user-agent string at all. (But its IP, 147.202.50.50, reverses to the domain name.)
  • I'm guessing here, but I bet the bot pays no attention to robots.txt! (The IP above started hitting RSS on my site in September 2006 and has never requested the robots.txt file).
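The warning signs listed above (steady RSS fetches, no user-agent, no robots.txt request) can be checked mechanically against a web server's access log. What follows is a minimal Python sketch, not the method used for this post: it assumes logs in Apache's "combined" format, and `suspicious_feed_clients` and its `feed_marker` parameter (a substring assumed to identify feed URLs) are hypothetical names. The sample log lines are made up and use documentation-range IP addresses.

```python
import re
from collections import defaultdict

# Assumed Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" \S+ \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def suspicious_feed_clients(lines, feed_marker="rss", min_hits=1):
    """Hypothetical helper: return IPs that fetched a feed URL at least
    min_hits times, never fetched /robots.txt, and never sent a UA."""
    feed_hits = defaultdict(int)  # ip -> number of feed requests
    saw_robots = set()            # ips that requested /robots.txt
    saw_ua = set()                # ips that sent a non-empty user-agent
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        ip = m.group("ip")
        parts = m.group("request").split()
        path = parts[1] if len(parts) >= 2 else ""
        if path.split("?")[0] == "/robots.txt":
            saw_robots.add(ip)
        if feed_marker in path.lower():
            feed_hits[ip] += 1
        # Apache logs a missing user-agent as "-"
        if m.group("ua") not in ("", "-"):
            saw_ua.add(ip)
    return sorted(
        ip for ip, n in feed_hits.items()
        if n >= min_hits and ip not in saw_robots and ip not in saw_ua
    )


# Made-up sample: one anonymous feed grabber, one well-behaved crawler.
sample = [
    '192.0.2.10 - - [09/Jan/2007:10:00:00 +0000] "GET /rss HTTP/1.0" 200 5120 "-" "-"',
    '198.51.100.20 - - [09/Jan/2007:10:01:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "ExampleBot/1.0"',
    '198.51.100.20 - - [09/Jan/2007:10:02:00 +0000] "GET /rss HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
]
print(suspicious_feed_clients(sample))  # ['192.0.2.10']
```

Requiring both signals (no robots.txt and no user-agent) is a deliberate choice: a feed reader fetching a single URL may reasonably skip robots.txt, so that alone is weak evidence. Like IP blocking, the check is easily defeated by a bot that sends a plausible UA string.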

I'm blocking the little bugger's IP now, for general bad behaviour and likely evilness... but that's only effective until it starts crawling from a different IP. In truth, if you put stuff on "the Web" there isn't any way to protect it; consider it "fair game". With just a little work this bot could be made much harder to identify: since you're already behaving in a questionable way, why not start employing bot-nets to do the surfing, and use some legitimate UA strings? You're a dumb bot! As a friend of mine might say: no bot-biscuit for you! I think there is a viewpoint floating around that sees providing an RSS feed as permission to play free and easy with the content. People who write weblogs are essentially attention whores, so any distribution of their content must be a good thing in their eyes, right?

Now, to some squishy human life-forms: if you're considering using the service associated with this bot, or anything similar, you might want to consider the potential copyright implications. It might be fine; maybe it just provides excerpts and properly references the source, or maybe not. Like I said, their website makes no mention of copyright and their bot doesn't identify itself; this is incriminating behaviour in my opinion. If it is legitimate, why doesn't it do the right thing?

Alternatively, just write some bloody content you poop fairy.

To the leeches: My apologies if I offended you.

Back to the good bots: Goodnight my friends.


© 2005-2009 Yvan Seth | Creative Commons License
