
Bot or Not?

Note: This entry has been restored from old archives.

Stick a form on the web and within a few days it’s getting hammered. My recently added “comment” form started getting a few posts, all to the “Comments” entry (web-search anyone?). So I added email registration… could have saved myself time by just not adding the silly “comment” ability in the first place. The registration was just in time, a day later the flood started. Maybe not much of a flood by big-web standards, it must be scary to be a popular website! In the last 12 hours I’ve had just over 1000 POSTs of the comment form.

They’re not hitting the “Comments” entry much anymore either. The breakdown is:

      1 Food/Ristretto/The_Coffee_House_on_Watford_High_Street.html
      9 Technology/General/Comments.html
     46 Food/Cooking/Spinach_Pasta.html
     47 Random/Riverbank_Teahouse.html
     48 Food/Eating/England/London/Gourmet_Burger_Kitchen.html
     49 Technology/Code/awk_awk_awk_.html
     52 Technology/Code/Just_Like_Uni.html
     53 Technology/General/Flashy_Shite.html
     54 Food/Cooking/Lime_Poached_Chicken.html
     55 Food/Eating/England/London/The_Neal_Street_Restaurant.html
     59 Technology/General/EZRSSFeeds_and_other_WebSuckers.html
     59 Technology/General/Still_Doesn_t_Like_Kaspersky.html
     67 Random/Flip_Out_Like_A_Ninja.html
     69 Random/BAA_BAA_Whisky.html
     70 Random/Birdflu.html
     83 Health/Your_Back_Needs_Debugging.html
     84 Food/Ristretto/Caffe_Vergnano_1882_on_Charing_Cross_Road.html
     87 Health/Beerolies.html
     98 Technology/General/Collateral_Damage__An_Unintentional_Storm_Worm_DOS.html

Why these pages? Not quite sure, but they coincide with the pages that get the most Google hits, so maybe that’s it. I’ve collected data on the form postings, the primary aim being to capture whether or not real people were behind them. Now, I thought this highly unlikely; it looked like bot activity to me. But there is a lot of “they just get cheap people in China to fill in forms” going around, so I thought I’d try something… I’ve added javascript to the page that records all keystrokes and mouse activity to a log that is sent to my web server when the form is submitted. This was fun, and neat; for an example have a look at keylog.html.
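
For the curious, here is a minimal sketch of the sort of logging script involved (my own reconstruction, not the actual code; the “commentform” and “keylog” element ids are hypothetical):

// Record keystrokes and mouse clicks, then ship the log along with the
// comment form when it is submitted (a sketch, not the real keylog script).
var activityLog = [];

document.addEventListener("keydown", function (e) {
  activityLog.push({ t: Date.now(), type: "key", key: e.key });
});
document.addEventListener("mousedown", function (e) {
  activityLog.push({ t: Date.now(), type: "click", x: e.clientX, y: e.clientY });
});

// "commentform" is assumed to be the form element, "keylog" a hidden field in it.
document.getElementById("commentform").addEventListener("submit", function () {
  document.getElementById("keylog").value = JSON.stringify(activityLog);
});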

The end result of this little exercise is that I seem to have confirmed my opinion that there are no “real” people involved here. This isn’t representative of course, my site is tiny, unimportant, and doesn’t employ CAPTCHAs. If anything I’m a very unlikely target of such attention. Further, there are two ways to disable my logging:

  1. Cache the form and present from some “form filling” tool (unlikely).
  2. Have javascript disabled (duh).

I classify the first as highly improbable. I classify the second as not being the case for my forms since I’ve started getting submissions with spam data filled into hidden fields.

It would have been much more interesting to pick up some key logs! But the effort has revealed interesting data regardless.

  1. After I changed the form the new fields didn’t show up in POSTs, so the bot responsible must have cached the form (or the form params at any rate).
  2. There was a delay of only one hour between the form change and the first new spam post with the new fields. Of 1000 POSTs in the next 12 hours only 10 were for the new form. Most current POSTs are still using the old form fields.
  3. Nine of the new-form hits were for the same page (Technology/General/Comments.html), so I take it that’s the first hit from a new crawl by the form-snaffling bot.
  4. Just one was for Food/Ristretto/The_Coffee_House_on_Watford_High_Street.html, and this is a very different POST from all the others (spammy random URL and random-letter “words”, while all the others have real “English” word sequences).
  5. Reflecting back on the access logs it looks like POSTs are usually preceded by GETs to the correct URLs and the GET has no referrer (related: in the same period there are 4 hits to the page by MSIE variants with no referrers and no other hits from the same IP, the spider maybe? Two of the IE UA strings are just broken looking.)
  6. The “url” field is always filled in with a “http://…” URL.
  7. Across all 1000+ posts only 33 URLs are used. They are not evenly distributed: about 5 are around the 100 mark, 27 are below 25 (10 occur only once), and 7 are in the 25-80 range.
  8. A total of 148 IPs source the POSTs; many make only 1 or 2 POSTs, 22 make between 10 and 50, 5 between 50 and 100, and one makes 127 POSTs (submitting 15 URLs with a very uneven distribution).
  9. Five URLs appear to be typos, with “hyml” rather than “html”, but I’m not giving them the satisfaction of a hit to find out for sure. It might just be an obfuscation attempt. Of these possible typos, three are the three most-submitted URLs.
  10. 40 “name”s and 35 “title”s are used; these are usually filled in with identical data, and usually related to the obvious subject of the URL.
  11. Most spamvertising is for drug names (I recognise “viagra” but the rest mean little to me: “levitra”, “ambien”, “xanax”, “cialis”); next most popular is gaming/casinos (including the most spammed URL); finally there’s porn (comparatively infrequent).
  12. The “comment” field is usually filled in with some supposedly complimentary text, and only contains URLs in two cases.

I’ll leave the observations at that. It would be more interesting to draw relationships between the content of the different fields, but inspection doesn’t show any obvious patterns and I don’t have time to dig deeper. The frequency of comment content is:

      1 comment:[[URLS REMOVED]]
      1 comment:good post man [[URLS REMOVED]]
      1 comment:so many interesting [[URLS REMOVED]]
      1 comment:yujlh lzqfe heug xsjepcl dljfugw axiwrlbcm visf
      6 comment:Hello, nice site look this:
     44 comment:Good design!
     48 comment:Great work!
     49 comment:Pretty much nothing seems important.
     50 comment:Good site. Thank you.
     50 comment:I like your site very much indeed.
     51 comment:Great site! Beautiful craftsmanship!! Keep of the wonderful work!!
     52 comment:Nice site
     53 comment:Cool site. Thank you!
     53 comment:Hello, very nice site!
     53 comment:TARRIFIC SITE!
     53 comment:Thank you!
     55 comment:Hi, nice site
     56 comment:Well done!
     57 comment:very interesting fix links
     60 comment:Nice site. Thanks.
     61 comment:I feel like a bunch of nothing.
     61 comment:I just don't have anything to say.
     64 comment:Cool site. Thank you:-)
     64 comment:Excellent web site. I will visit it often.
     69 comment:Nice site. Thanks!

We’ve all seen “Nice site. Thanks!” on blogs all over the ‘net. My favourite is “I feel like a bunch of nothing.”, which makes me feel sorry for some poor depressed zombie machine somewhere. The fourth one, “yujlh…”, is from the only POST that looks completely unlike all the others: a URL is submitted, but all the other fields are meaningless character sequences.

My feeling is that this is the “new spam”, though maybe not so new, just harder to measure. Why try to push to victims through email, which is rapidly losing people’s trust, when you can spend real effort simultaneously getting the word spread all over the ‘net and pushing search-ranking juice to these pages? Does this really work? Seems unlikely, but I’ve never been able to get my head around the fact that spam is actually effective … it takes all kinds of stupid to make a society.

They say that email spam is declining (but people like to say that every few months, then there’s another surge), so maybe the resources are going into this instead. The next question is the source. I think it is probably clear that this is the work of a bot-net; do we think Storm? Who’s paying them? Maybe the URLs are actually

There have been 100 new POSTs since I started writing this (one hour ago).

What can we do about this? The solution seems simple: guard web forms appropriately! CAPTCHAs are popular, but requiring login/registration may be better. Mark all URLs as “nofollow” to kill any hopes of search-rank inflation (or don’t allow URLs if they can be avoided). The simplicity is probably misleading though: the flood against my little site is unsophisticated, most likely because that’s all that’s needed to post to so many blog-type sites. If bloggers raise the bar the bot herders will just jump higher. Depressing isn’t it? The continued lack of any real solutions against malware and spam often makes me “feel like a bunch of nothing”, to quote one of the bots.
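
As an aside, the “nofollow” idea is simple to apply when rendering commenter-supplied URLs. A sketch (the function name is my own invention, and in practice you’d do this server-side while rendering the comment):

// Render an untrusted commenter-supplied URL as a link that passes no
// search-ranking credit to the spammer (sketch only).
function renderCommentLink(url, text) {
  var a = document.createElement("a");
  a.href = url;                  // assumes the URL has already been validated/escaped
  a.rel = "nofollow";            // tells crawlers not to pass ranking credit
  a.appendChild(document.createTextNode(text));
  return a;
}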


Leftovers, some more stats:

User agents:

      1 HTTP_USER_AGENT:Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)
      1 HTTP_USER_AGENT:Xrqhgdfzi sipmvr zqboirha
      3 HTTP_USER_AGENT:Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
      6 HTTP_USER_AGENT:User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2)
     54 HTTP_USER_AGENT:Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; http://www.tropicdesigns.net)
     63 HTTP_USER_AGENT:Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)
     75 HTTP_USER_AGENT:Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
    105 HTTP_USER_AGENT:Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
    116 HTTP_USER_AGENT:Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Maxthon)
    129 HTTP_USER_AGENT:Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.1)
    147 HTTP_USER_AGENT:Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt; MRA 4.0 (build 00768))
    157 HTTP_USER_AGENT:Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)
    327 HTTP_USER_AGENT:Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)

Intriguing list of vias and proxies:

      1 HTTP_VIA:1.1 MCRDSC, 1.0 kiwi.khi.wol.net.pk:3128 (squid/2.5.STABLE7)
      1 HTTP_VIA:1.1 TTCache03 (Jaguar/3.0-59)
      1 HTTP_VIA:1.1 barracuda.lcps.k12.nm.us:8080 (http_scan/1.1.2.5.10)
      1 HTTP_VIA:1.1 fiorillinet:3128 (squid/2.6.STABLE7)
      1 HTTP_VIA:1.1 firewall.seduc.ro.gov.br:3128 (squid/2.5.STABLE6)
      1 HTTP_VIA:1.1 localhost.localdomain
      1 HTTP_VIA:1.1 localhost:3128 (squid/2.5.STABLE14)
      1 HTTP_VIA:1.1 mirage.certelnet.com.br:3128 (squid/2.5.STABLE14)
      1 HTTP_VIA:1.1 none:8080 (Topproxy-2.0/)
      1 HTTP_VIA:1.1 proxy:3128 (squid/2.5.STABLE11)
      1 HTTP_VIA:1.1 sexto.fmetsia.upm.es
      2 HTTP_PROXY_AGENT:Sun-Java-System-Web-Proxy-Server/4.0.3
      2 HTTP_VIA:1.0 allserver.all-milwaukee.org:3128 (squid/2.6.STABLE16)
      2 HTTP_VIA:1.1 PERFECTION01
      2 HTTP_VIA:1.1 i187340:3128 (KEN!)
      2 HTTP_VIA:1.1 proxy-server1
      2 HTTP_VIA:1.1 server2.buffalowelding.com:3120 (squid/2.5.STABLE13)
      4 HTTP_VIA:1.1 FLASH:3128 (squid/2.6.STABLE16-20071117)
      4 HTTP_VIA:1.1 MCRDSC, 1.0 cherry.khi.wol.net.pk:3128 (squid/2.5.STABLE7)
      5 HTTP_VIA:1.1 MCRDSC, 1.0 mango.khi.wol.net.pk:3128 (squid/2.5.STABLE7)
      5 HTTP_VIA:1.1 MCRDSC, 1.0 pear.khi.wol.net.pk:3128 (squid/2.5.STABLE7)
      6 HTTP_VIA:1.0 HAVP
      7 HTTP_VIA:1.1 ISAFW
      8 HTTP_VIA:1.1 ppr-cache1 (NetCache NetApp/6.1.1D2)
     12 HTTP_VIA:1.1 FGMAIN2
     14 HTTP_VIA:1.1 ndb-bau02:3128 (KEN!)
     21 HTTP_VIA:1.1 proxy.net:3128 (squid/2.6.STABLE13)
     24 HTTP_VIA:1.1 microcon-serv3:3128 (KEN!)
     39 HTTP_VIA:1.1 PRINTER
     65 HTTP_VIA:1.1 admin:3128 (squid/2.6.STABLE9)
     97 HTTP_VIA:1.1 gtw1.ciberpoint.com.br:3128 (squid/2.6.STABLE13)

(Interesting to note that some companies here are effectively giving out details about how their internal web clients are scanned at the gateway. Some of this could be enough to expose the existence of vulnerable infrastructure software, or to help whittle down the list of software you need to test your targeted malware against. Not good practice.)

Blueprint Café

Note: This entry has been restored from old archives.

Grouse grouse mate! Who remembers “grouse”? When I was growing up in a south-west WA surfie town in the 80s the word was nearing the end of its life. Gone the way of many recycled words, back into the compost. To say something was grouse was to say it was gnarly, cool, or these days, way mad. Well, I think, but these things change from year to year and place to place, and I’m getting a bit long in the tooth to keep up. Fully sick mate, says Kat, bloody Westies. There’s a word that needs some context: Westie. I’m talking Sydney’s western suburbs, but from a geographical perspective I’m far more westie than anyone from, say, Penrith. And what can all this mean to someone in London anyway! Let alone any other part of the world.

On with the topic! On Saturday we were hunting game, unfortunately this was with a web-browser rather than a shot-gun. You got game? The game restaurant in London appears to be Rules, but when I tried to book the response was that I was about a month too late to book at this time of year. So we hunted… Eventually finding ourselves with a reservation at Blueprint Café. Some note has to be made regarding the process of booking here. It was all done via a web-site called D&D London, which handles several other popular London restaurants as well. How very modern and convenient. But I don’t really like it, I tried booking over the phone first but got a message saying to try the website (this was at 11AM). This web lark takes some of the fun out of booking a table at a decent restaurant.

So at 18:00 we rock up. The game on offer was not extensive, less prevalent than on the online menu (but game supply is unpredictable). We caught a couple of game dishes, a mallard entrée and, the pièce de résistance, grouse.

Kat didn’t order an entrée, as it transpired this was a wise decision. I, on the other hand, couldn’t resist the Salt Mallard served with raspberry conserve, quince paste, and watercress. The mallard was exquisite, cold medallions of deep red breast flesh. Maybe the raspberry and quince flavours (and sweetness) were a bit much for it, but since they were “on the side” this was a small concern. I mopped up remains with some bread, no worries!

For main course I had the Welsh Blackface Mutton. The mutton was perfect, baked to succulent tenderness, but the “parsley and mustard crust” was far, far too mustardy. As if they were trying to hide the fact that the mutton tasted like mutton!

Kat’s main course was grouse. A baked bird served with handmade crisps, salad leaves, raspberry sauce, and traditional giblets-on-toast (grouse on top). Alongside was a bowl of “bread sauce” which I can best describe as a lumpy and sweet béchamel, this complemented the grouse well. Grouse is described, by Hugh Fearlessly-Eatsitall, as having “a unique, herby, heathery flavour”. I’m not familiar with heather, but “herby” is spot on, this was one tasty bird. Tender, pink-red, and a real pain in the butt to eat! Kat made a great effort then I took over and did what I could with the carcass, it’s a complicated meal but well worth the effort. I even found a piece of lead shot in one of the legs! (Do you think they deliberately leave it in, or insert one just in case even?). I wasn’t keen on the crisps (maybe if they were parsnip I’d be happier) and the pile of what seemed like fried bread-crumbs. But really, everything in addition to the bird is just fluff. It was good.

A warning that the grouse would take three times as long to eat as the mutton would have been appreciated.

We were advised to order sides with our mains, in my case this was warranted as all I had with my mutton was a few leaves. Kat certainly had no need of a side, her meal already came with plenty of cress. She ordered a mixed leaf salad, which was fine. I ordered Purple Broccoli Spears, which were a bit too mushy for my tastes.

Dessert? We had to. I had a Quince and Apple Shortcake, it was too sweet and not quincey enough. More quince and normal cream instead of the sweet muck and it would have been much better. Kat had Orange Polenta Cake with Vanilla Bean Icecream and Poached Pear, hard to go wrong with this one, perhaps on the sweet side again. On sweetness I must note that we both very rarely eat sweet foods, so we may be over-sensitive to sweetness.

We had some wine too, “Primitivo” from Puglia at 5 quid per glass. Good eating wine.

In the end the meal cost us 90 quid, including 15 quid for three glasses (175ml) of wine and 10 quid for “12.5% discretionary service charge”. (The service was good, though maybe a little thin on the ground.) This is quite reasonable for London eating, I expect to pay over 100 quid for a night out at a London restaurant. Kat didn’t have an entrée, though she did have the most expensive main course on the menu, 22 quid, this is around what you’d expect for grouse.

My regrets are dessert, mushy broccoli, and the mustardy crust. But the evening overall was a success thanks to the duck and the grouse. There’s something to be said for the location too: above the Design Museum (though that doesn’t excite me much) and right next to the Thames. If you’re near the glass frontage (we were right against it) you have a view of the dark glittering expanse of the Thames (which would be a view of the green, murky, debris-covered expanse of the river if it were daytime). Downriver the Canary Wharf skyline dominates as the river curves down to the Isle of Dogs. Upriver the nearby Tower Bridge steals the show. If you can get a seat by the window, Blueprint Café is a perfect restaurant for the sightseer.

Clean Swarm

Note: This entry has been restored from old archives.

There’s been an increasing density of news from the robotics front in the tech media over the last year; it’s even been spilling over into mainstream news. Much of the interest is, of course, in humanising robotics: the latest two-legged walking robot, the cutest robot, etc. The more interesting news is in areas like unmanned exploratory robotics (those amazing Mars Rovers, but more autonomous) and context-aware machinery. Generally, it looks like we’re moving towards far more capable robotics.

Beyond all the cuteness, humanness, industrial efficiency, and extreme exploration there’s something I really want out of robotics: a cleaner.

It seems a less horrendous problem than making a machine like a human, but there are probably difficulties I’ve not imagined. I’ve been thinking about swarm robotics a lot in recent times, though not many stories report on it. I’d have thought that swarms would be more robust and flexible in many situations, especially exploration. You could have a generic chassis plus some specialisation, and you’d have redundancy. I guess the problems are in connectivity, co-ordination, and energy density. So maybe we’re waiting for advances in power and mesh-networking technologies to make this sort of thing feasible. Another approach would be a “queen bee” that mothers a swarm which acts as the queen’s eyes, ears, hands, etc. Maybe this could mitigate the power and control problems by adding some centralisation? I guess if it comes to exploration there’s also the chance a shark might eat your swarm-bots! 🙂

Aaaanyway… cleaning swarms? I’m terrible when it comes to cleaning, and only slightly better than Kat :-p so our place can generally get pretty chaotic. I’m often heard to exclaim, much exasperated, about my inability to keep the kitchen in a state that doesn’t resemble a pig sty. (Yet I cook in there, to some extent, almost every day — and have yet to give either of us a case of food poisoning.) Now chaos I don’t actually mind, it’s the dirt and grime that breeds within the chaos that gets my goat. My thought is that you could have a small swarm of ‘bots that have simple cleaning functions. They don’t do anything pointlessly complex, like stacking stuff in the dish-washer; rather, they clean everything in place. Dishes, utensils, surfaces, everything.

They have little brushes and mops and scuttle around washing dirt off everything. Biological recyclables go in one bucket and everything else in another (that might be a hard one to implement). There’ll be a dump-station where they empty their little rubbish accumulations, a central command computer (leaving the ‘bots themselves requiring minimal intelligence of their own), and a maintenance station where they can charge up (power, cleaner), self-clean, and change any consumable parts when required. They’re only active when there is no non-‘bot activity in the room; if a human enters while they’re active they scuttle to the corners and stop, so long as they can’t get in the way (maybe ‘bot-holes?), otherwise they just stop (and work themselves out if manually relocated).

Yeah, there’s a lot of difficulties. How hard to scrub? What to scrub? What is mess and what is something left on the bench for later? What about if you’re interrupted by a phone-call in the middle of preparing dinner and the ‘bots clean away your “mess”?

It’d be a great area to work in. One of the many things that makes me wish I was at, or able to go back to, Uni.

Beerolies

Note: This entry has been restored from old archives.

Believe it or not, there are Calories in beer. So while you’re being careful with that poached chicken and steamed vegie dinner[1] the two beers you wash it down with might double your Calories! [2]

Alcohol is much more like a carbohydrate than a protein or fat, and for nutritional purposes you can count it as such. Note however that alcohol gives you 7 Calories per gram, rather than the 4 from normal carbs. Most beer will also contain sugars which contribute to Calories. The unfortunate thing is that brewers don’t have to put any nutritional information on their beer (nor wineries on wine for that matter). Luckily for us we can get a vague idea of the damage we’re doing to our careful planning from the alcohol volume. In some beers (much more so for wines) the sugar content is a much less significant contribution to Calories than the alcohol, although others can have a fairly high carbohydrate contribution from sugars. So alcohol-only derived Calories are a minimum and the true Calories could be higher still (some examples of alcoholic Calories versus published Calories are given below).

The calculation is simple, but if you try to find information on the ‘net you mostly seem to get not-so-useful “select number of drinks” weekly calculators, where the drink classifications may or may not be relevant to whatever you’re guzzling. (“Red Wine” hey, 10% or 14% alcohol volume?)

My example is an Innis & Gunn Oak Aged Beer. A 330ml bottle at 6.6% alcohol volume.

The calculation:

  • Calculate millilitres of alcohol, 6.6% of 330ml:
    • 0.066 x 330 = 21.78
  • Calculate weight by multiplying millilitres by the specific gravity of alcohol:
    • 21.78 x 0.789 = 17.18442
  • Calculate Calories by multiplying by Calories per gram of alcohol:
    • 17.18442 x 7 = 120

So, there are at least 120 Calories in a bottle of Innis & Gunn Oak Aged Beer.
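
The same arithmetic as a quick sketch (a hypothetical helper function, not anything I actually run):

// Minimum Calories from alcohol alone, given serving volume in ml and
// alcohol by volume as a percentage (e.g. 6.6 for 6.6%).
function alcoholCalories(volumeMl, abvPercent) {
  var alcoholMl = volumeMl * abvPercent / 100; // millilitres of pure alcohol
  var alcoholGrams = alcoholMl * 0.789;        // specific gravity of ethanol
  return alcoholGrams * 7;                     // 7 Calories per gram of alcohol
}

console.log(Math.round(alcoholCalories(330, 6.6))); // ~120 for the Innis & Gunn bottle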

Another useful number to know is the number of Calories in a “standard unit” of alcohol; in the UK this is 10ml (an Australian standard drink is a little bigger, 10 grams of alcohol):

  • 10 x 0.789 = 7.89
  • 7.89 x 7 = 55

This is a very useful number: no matter where you are, your alcoholic Calorie intake is at least <std-drinks>x55! Though that might be a bit hard to deal with after eight pints of lager.

So, food for thought:

A pint of Guinness, 568ml at 4.3%:

  • 0.043 x 568 = 24.424
  • 24.424 x 0.789 = 19.270536
  • 19.270536 x 7 = 135

The official figure for a pint of Guinness is 210 Calories, as you can see there is a good number of Calories from other sources.

A 187.5ml (¼ bottle) of 12.5% wine:

  • 0.125 x 187.5 = 23.4375
  • 23.4375 x 0.789 = 18.4921875
  • 18.4921875 x 7 = 129

Average figures available on the ‘net for “dry white” are around 140 Calories for this volume.

40ml of Lagavulin 16yo single malt whisky at 43%:

  • 0.43 x 40 = 17.2
  • 17.2 x 0.789 = 13.5708
  • 13.5708 x 7 = 95

So there you go, maybe you’ll hold that third beer now? Regretting those 6 pints of Guinness every Friday after work, and maybe a few other days too?


[1] 200g chicken breast, 100g broccoli, 7.5g olive oil, plus herbs and spices: 320 Calories.

[2] It is a somewhat unusual convention that “Calories” with a capital “C” represents “kilo-calories”. A Calorie is enough energy to raise the temperature of one litre of water by one degree, a calorie is enough to raise the temperature of one millilitre of water by one degree. Almost always when you see calories discussed in the context of nutrition (even with lowercase “c”) people are talking about kilo-calories.

Comments

Note: This entry has been restored from old archives.

I’ve added commenting. It’s likely not at all worth the effort involved, but eh. Maybe now I won’t have to try and remember corrections/observations that people send my way via email once or twice a year. Minor spam protection is in place, but no registration/captcha … for now (let’s see how long that lasts). Not quite sure what’s proper for this sort of thing: the comment form ends up in the RSS — maybe that’s wrong? Doesn’t seem to be normal. Existing comments also end up in the RSS version of entries, but they don’t have their own RSS feed and the UUID isn’t altered, so there isn’t an RSS way of tracking them (that’d really not be worth bothering with!).

Along the way I had troubles getting LWP to work. The reason being that I run apache in a gaol (being a tech-term I guess I should use “jail”?) and it didn’t quite have the full set of required files. Anyway, strace is your friend in these instances. Error along the lines of:

500 Can't connect to google.com:80 (Bad protocol 'tcp')

Caused by lack of /etc/protocols and /lib/libnss_files.so.2. Or:

500 Can't connect to google.com:80 (Bad hostname 'google.com')

Caused by lack of /lib/libnss_dns.so.2.

Example of inventorying the files required for something like LWP:

:; strace lwp-request http://google.com/ 2>&1 | 
    grep '^open' | 
    grep -v ENOENT | 
    cut -d'"' -f2 | 
    sort -u | 
    grep -E '^/(etc|lib)'
/etc/host.conf
/etc/hosts
/etc/ld.so.cache
/etc/localtime
/etc/nsswitch.conf
/etc/protocols
/etc/resolv.conf
/lib/libc.so.6
/lib/libcrypt.so.1
/lib/libdl.so.2
/lib/libm.so.6
/lib/libnss_dns.so.2
/lib/libnss_files.so.2
/lib/libpthread.so.0
/lib/libresolv.so.2

Note that while these files are used by the command they’re not all necessarily required. That final grep is just to trim down the list, which is otherwise quite a flood from /usr/lib/.

Erroneous Blame for Firefox Slowness

Note: This entry has been restored from old archives.

For a while I’ve been very annoyed by how horribly slow Firefox is, writing it off as Firefox just having grown into a disgusting slow heap. That said, I wasn’t comfortable blaming Firefox in such an off-hand manner, the issue could be Ubuntu doing something wrong, or one of the extensions I use. I almost felt I’d confirmed it was Ubuntu a little while back when switching to the mozilla.org firefox install sped my Firefox up — yes it did (something to do with fonts and AA I’ve read) but it was still pretty slow. I’ve wiped my profile and rebuilt my Firefox setup from scratch a couple of times even, still all bad.

What I failed to do was start by blaming that which is, really, the most unreliable part of my configuration: the ten or so extensions I use. Extensions are outside the control of Firefox and Ubuntu, often written by some random, and often written badly. (Well, so I expect in my cynical way.) Today I nuked my Firefox install and browsed my usual morning sites with no extensions installed, using the Ubuntu Firefox, and it really is pretty snappy. I’ve now re-installed Google Browser Sync and browsing has not degraded. Over the next few days I’ll reinstall my set of usual extensions and find out which is to blame (if any single one).

My Firefox extensions are:

  • Google Browser Sync (I don’t know how I lived without this. On the slowdown front it seems OK, so far.)
  • SwitchProxy Tool (Essential, I work through different redirected proxies throughout the day. Might be a better plugin for this though. There are notes on the addons.mozilla.org page that say this is a cause of slowdown.)
  • AdBlock Plus (Difficult to live without this, I hate flashing/moving graphics all over websites. Flashblocker almost replaces it. Need flash+anigif blocker, that might be OK.)
  • NeoDiggler (Provides the essential “clear URL bar” button, does some other things too that I don’t use.)
  • Google Toolbar (I probably don’t really need this, it’s so common though that I doubt it is the problem.)
  • Tab Mix Plus (Use this to tweak a few tab settings, can probably live without — closed tabs history is often helpful though.)
  • Web Developer (Usually disabled anyway, very useful. It can cause slowness when enabled.)
  • Firebug (Usually disabled anyway, extremely useful. It causes extreme slowness when some parts are enabled, shouldn’t be a worry in a disabled state though.)
  • Google Gears (Have issues with this, it occasionally segfaults at shutdown-time, at least that’s where GDB points the finger. It is “Google BETA”. It makes offline Google Reader work, but I never use it.)

I’ll reinstall one per day over the next few days, in the order above, and see how my browsing joy fares. I’ll need at least a full day’s worth of browsing to work out if a plugin has a noticeable impact. (I don’t generally do a lot of web browsing.) I might try installing the Load Time Analyser extension next though, so long as it doesn’t slow anything down it seems likely to be useful.

Even with the massive no-extensions responsiveness boost, Firefox seems less speedy than Opera. I’ve been using Opera more often these days, now that it has some sort of sync feature it might be a viable Firefox replacement.

Referrer Bot

Note: This entry has been restored from old archives.

This is a quick addition to my previous post: Bot or Not?. Curiosity got the better of me so, through roundabout means, I got samples of some of the pages. First note is that the ‘hyml’ pages are 404s, so probably a typo.

Next note is that there is some dodgy-looking script in some of the pages. My first thought was: oh, this is just another botnet propagation setup. There are two layers of encoding in the snippet: first the data is URI-decoded, then each byte has 1 subtracted from it to get the real code, which is then eval()ed. Decoding shows that the content is short and simple, not a bot infester:

var r=escape(document.referrer), t="", q;
document.write("<script src=\"http://www.Z-ZZZZZ-Z.com/counter.php?id=ambien&r="+r+"\"></script>");

URL obscured, but it points to what looks like a front page with no links, nothing but an image and the text “See How The Traffic Is Driven To Your Site”. So this looks like just a route to grabbing referrer dollars from a dodgy advertising site. Note how the target script will neatly get both the spammy page and the URL of the page that was spammed.
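
For reference, the de-obfuscation step described above can be reproduced with something like this (a sketch; decodePayload and encodedPayload are my own names, not anything in the spammed page):

// Undo the two layers of encoding: URI-decode the payload, then subtract 1
// from each character code to recover the script that gets eval()ed.
function decodePayload(encodedPayload) {
  var uriDecoded = unescape(encodedPayload);   // layer 1: URI decoding
  var out = "";
  for (var i = 0; i < uriDecoded.length; i++) {
    out += String.fromCharCode(uriDecoded.charCodeAt(i) - 1); // layer 2: shift each byte down by 1
  }
  return out;                                  // the spammed page then eval()s this string
}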

So what about counter.php? More redirection! The script imported looks like this (reformatted for readability):

<!-- document.write(
    '<script language="JavaScript">
        function f() {
            document.location.href = "http://www.XXXXXXXXX.com/ambien.html";
        } window.onFocus =  f(); </'+'script>'); // -->
<script>
    document.write(
        '<script language="JavaScript">
            function f() {
                document.location.href = "http://www.XXXXXXXXX.com/ambien.html";
            } window.onFocus =  f(); </'+'script>');
</script>

We’ve reached the end of the road. The real URL in this code goes to an “Online Pharmacy” at a domain registered since February this year. The page contains little javascript, no exploits. A function for adding to bookmarks, some “menu” code, and it imports “urchin.js” from Google Analytics.

So yeah, everyday, regular spam.

Digital Spectrum

Note: This entry has been restored from old archives.

IEEE’s Spectrum magazine is making a digital distribution available[1]. I’ve been trying to use it over the last couple of months and have opted to get the digital version from next year. It’s a mutually exclusive offer, you either get bits or you get paper. The digital carrot is very compelling:

  • You get your Spectrum significantly earlier, fresh news is always alluring.
  • You don’t end up with a pile of paper that gathers dust.

So, like I said, I’ve opted for digital distribution. Piles of IEEE emails on the subject have compelled me to do so. There’s a rather large BUT though:

I will no longer read Spectrum.

Why? Well, the actual news content of any printed publication is valueless these days so this isn’t the reason Spectrum gets read in the first place. I’ll have skimmed anything interesting from the weekly news mailouts I get from IEEE, ACM, and SANS — not to mention news feeds like Slashdot, and Google. I read the paper edition of Spectrum because I can read it in the toilet, it’s not pretty but it’s true. Spectrum has well written and detailed stories on subjects that I wouldn’t normally investigate, it doesn’t matter that the information isn’t breaking-news and I’m using time in which I’d otherwise be staring at the door.

What does the new digital Spectrum do for me?

  • It employs an annoying and cumbersome non-web online reader.
  • It ties me to reading only when I’m in front of a computer.
  • I can’t read it on the toilet, or in bed late at night.

These are both locations where I tend not to take the laptop, and, really, I’d prefer neither one to be any more digitally enabled. So, I’ll only be able to read Spectrum while I’m sitting at my desk, or when laptopping elsewhere. But in these cases I usually have work to do, and in-between work times I have the entire Internet before me. Why opt to read Spectrum when I have expert-selected content feeds?

As for the first point, the digital Spectrum interface is crap. The real Spectrum killer for me is in the toilet, but usability is pretty important too. Has anyone ever seen one of these non-web web-content systems that doesn’t suck? They would be better off just sticking to PDF, but then I guess they’d lose whatever DRM the system they’re using provides. I’ve seen a lot of publications go for such non-web online systems during these web-or-die times; most of them have either given up (nobody reads because they made it too difficult) or switched to the sanity of just sticking with HTML. (Example: The West Australian, a newspaper I grew up with but stopped reading when I left WA because their online setup was unusable. Now they use a site that looks like every other news site; while design-dorks may shudder and think “urgh, how ununique”, my opinion is: good, I know how to use this site. I’m after news, not obstructions.)

So, despite all my complaining, I’ve opted for digital. But now I won’t read Spectrum. Logic anyone?! I’m not at all sad about this, it was my decision. I have other magazines to stock the toilet, and now I won’t have to debate with myself over how long to keep Spectrums and feel bad about throwing stacks of them in the recycling every 6-or-so months (so: periodical karma improved by about one fifth). It is intriguing to reflect on these moments when something leaves your life, why is it so and what do the stirrings of these surface currents indicate is lurking below. Then get on with life, differently informed.


[1] Using Qmags, which seems to offer quite a selection of publications. Maybe I’m in a minority, thinking the interface is crap. Or maybe there just happens to be enough people willing to use it to keep the thing alive. I’m not investigating their service in detail, the IEEE Spectrum interface might not even be what they use to deliver most of their titles. Some “Secure” Acrobat/ebook file would be another option, though I don’t like them much either (still not loo-compatible in my mind, and printouts defeat the purpose).

Us Techies

Note: This entry has been restored from old archives.

Us techies (in my generation: those who grew up in the 80s/90s knowing what an IRC was, how to push the power button on the “computer”, and wrapping weird batch scripts around multi-disc ARJ decompression to install pirated games for our friends, for example) were a fairly unique bunch back in the day. But to those entering Uni now (10 years later) the tech was there from the day they were born, for all of them.

Amongst our own generation we’re still respected (though never necessarily liked) for our ability to tweak the Excel spreadsheet or fix the Internet.

Amongst the youngsters will we be no better than the dude who fixes the shower? Just another annoying job that somebody has to do.

Nothing against plumbers, here in the UK the plumbers make the $$$ and can actually afford things like houses. I still think I’d rather be a landscape gardener.

My thoughts while reading Meet Your Future Employee.

I do worry when they start talking about “HTML programming” though, and “advances like blogging and social networking”. I suspect the reporter may be showing her own generational gap… nobody can hope to keep up with the pace of change.

I can’t help but observe that statements like “[lack of] face-to-face communications skills, a critical asset for a modern IT career” are coming from the old IT career professionals. Predicting their own obsolescence?

[Actually, I think my IT-generation sits somewhere in between that which is the subject of the article and that which is the observer, um, or am I generation Y? I’m certainly not X. The article mainly leaves me just all colours of confused. Gah, bloody compartmentalisation.]

Collateral Damage: An Unintentional Storm Worm DOS

Note: This entry has been restored from old archives.

Anyone else get the feeling that the Storm Worm proves that the entire ‘net security industry is useless? We already know that most security is ineffective against targeted attacks, and now Storm makes it clear that the state of security in general is ineffective against widespread attacks. Sure, your AV product will almost certainly protect you from Storm, but it won’t protect you from Storm breaking the ‘net in general. The problem is that having an AV product installed and up to date places you in the minority.

OK, implying that we’re all stuffed is rather over the top … but sometimes I really feel rather perturbed by the whole situation.

Anyway, the latest fun fact I’ve noticed regarding the Storm worm is that some security-sensitive sites have started using blacklists to block HTTP clients. At this moment there are several security sites that give me messages like “ACCESS DENIED” or “File Not Found or your IP is blocked. Sorry.” but they work perfectly well if I bounce through a remote proxy. Why? Well according to some lists, such as cbl.abuseat.org, I have a Storm Worm infection. It happens that my ADSL picked up a new dynamic IP this morning that someone with an infection must have had last week. I understand why the websites are doing this, though I’m skeptical of the effectiveness of it as a countermeasure. Being the victim of a DDoS is pretty much worst-case-scenario for a popular site, anything that might reduce your vulnerability is going to look good.

What is the solution? Certainly not this sort of blacklisting. We probably need to see a shift in the responsibility. The dumb end users can’t be held responsible; would it be a car owner’s fault if his car was stolen and the thief subsequently ran down a child with it? What if the car owner left the car with the engine running while popping into the newsagent to pick up a paper? The child’s death is still not the car owner’s fault I’d say, even if said owner is somewhat foolish. But we don’t know how to hold the thief responsible in the botnet case. The analogy works to describe my case for absolving the user, but breaks down when you look at it for assigning blame to the driver. Are the cars computers, IP addresses, or packets? Who’s the driver? What we do know is that 100% of car thieves are homicidal maniacs! Iieee!

Now, given that there are cars speeding around everywhere being driven by child-killers, roadblocks have been set up all over the place to keep the killer-cars out. Each roadblock has a long list of number-plates to check against approaching cars; the problem is that the list is very large and always out of date. Some killers will get through (but you may be saved from the DDoS), though you’ll possibly just end up with a huge line of cars at your roadblock (DDoSing your roadblock!). Also keep in mind that the killers who aren’t on the list know that they aren’t, and are capable of organising themselves to show up at a given location instantly.

How do we reliably know a bad packet from a good one? Who should be responsible? (Infrastructure providers need to foot some of this, I think.) What’s the solution? Buggered if I know 🙂 and if I did I wouldn’t be telling, would I? Let’s hope that some of the large number of smart cookies out there thinking about this come up with something that doesn’t suck! However, I fear that all solutions involve a giant and expensive leap: a new Internet. (Or, at least, a major overhaul of the one we have.) Is that even possible?