ACM Queue on Virtualisation (bonus: me on spelling)

Note: This entry has been restored from old archives.

Or should that be virtualization. I’m sick of dealing with two slightly different dialects of English, we zhould remove the letter ezz from the alphabet or zomething. The problem is that back in those oh-so-formative years of primary school it was hammered into us that spelling errors are a crime against all things good and decent. Hammered in with a bloody pile-driver. People who didn’t “get it” in the spelling department, I being one of them, were labelled as stupid slow-learners and punished. This is despite getting into “talented” class streams (via some weird abstract test) and being a voracious reader of “very large books” — doesn’t matter, if you can’t spell you’re clearly a moron. (Do I hold a grudge against the Western Australian public Primary School curriculum, nah, despite life becoming incredibly better after I moved to a private school for my final year of primary school, never…)

The point is: not being able to spell is a crime and a word spelt incorrectly is an abomination. This facet of education is, I suspect, the reason so many people seem so ridiculously patriotic about their funny little localisations of English. My guess is that the root of the problem is a deep and abject fear of our primary school teachers, fascist dictators over years of our lives, who would mock and ridicule us if we forgot our i before e except after c (except the exceptions.)

So, what are we to do in a world where there are two correct ways to spell many words? Where, for business reasons, spelling something the wrong correct way (the uncomfortable way) is often required. A lot of people, such as the audience for that report you’re writing, suffer the same mental disability when it comes to uncomfortable foreign spelling. Grin and bear it I say, try to put the childhood monsters of “incorrect” spelling behind you. For any given document decide which is the best way to go and, first and foremost, strive for consistency.

OK… that was an unintentional rant-cum-ponderance. What I was meaning to write was that the Jan/Feb ACM Queue landed in my mailbox the other day and I’ve been reading it over my morning coffee this week. It provides a pragmatic, low-hype, introduction to virtualization. Starting with an essential history and “what is it” before moving on to an informative coverage of some technical gotchas by Ulrich Drepper. I highly recommend reading the articles to anyone curious about this latest buzz-word. They put the full Queue on the web now, not as nice to read as the paper version IMO (yeah, trees, I know) but better than nothing. (sigh using some non-web-format-for-the-web like IEEE spectrum.)

Tremor!

This morning just after I went to bed at about 1AM the whole building wobbled, plates rattled on the shelves, something fell over in the front room… I was kind of spooked, although the whole event lasted only a few seconds and didn’t even wake Kat up. Last time I felt a house move was back in 2005 when I was living in Haddenham and there was a huge thump that shook the place. That thump was a single shock though so very different, at the time I thought it might have been a supersonic flyover (there are air bases all over the shop here), but that’s also something I’ve never experienced. The next day it turned out that thump was a gas reservoir exploding more than 20 miles away!

So, an explosion was the first thing that came to mind last night. But, while brief, the shaking lasted far longer than you’d expect from a shock wave. I looked out the windows, no telltale flashing lights or shouting (not that there’d be any if it was a gas explosion 20 miles away.) No mushroom cloud on the horizon (the thought did cross my mind, very briefly.) I had a peek at several news sources online and saw no interesting “breaking news.” So I went back to bed thinking that maybe it was all in my pre-sleep imagination!

It’s 13:00 now and I’m looking at the day’s news for the first time… it turns out the shaking wasn’t hallucinated, it was an earthquake! The UK’s second strongest on record and strongest in 25 years (measuring, according to various sources, anything from 4.7 to 5.3, and apparently there was a small 1.8 aftershock at around 04:00.) Pretty crazy stuff. A fun reminder of just how tiny and insignificant we are, scurrying around on the surface of this great big ball of hot rock.

Oh My Hare!

Hares have been associated with gods, goddesses, witches, fertility, and all manner of other myth and legend. For me, from this night onwards, hares are associated first and foremost with the best animal flesh I have ever eaten. Seriously, I should just give up on the whole food thing now as I don’t think I’ll ever cook myself something this good again. I’ve had grouse, considered by some the best thing on two legs; I’ve had wagyu beef, considered by some the best thing on four legs… Hare is, I suppose, somewhere between the two and four legged and fittingly, in flavour it is much like grouse, yet in tenderness and absolute melt-in-the-mouth divinity it is much like wagyu. Admittedly I probably haven’t had the best grouse there is, and never having been in Japan I’ve certainly never had the best wagyu there is. Though, with my first hare ever, bought from the local butcher, have I had the best hare there is?

I’ll write up the full details of my roast hare experience in time, it’ll probably take a week or two given how little “spare” time I tend to have. It was quite a production as well, so isn’t going to be simple to get into words. In the meantime the following photo will have to suffice.

Ready to serve

Hare Krishna Hare Krishna
Krishna Krishna Hare Hare
Hare Rama Hare Rama
Rama Rama Hare Hare

Harey Weekend

It’s been a pretty terrible week for me. On Tuesday evening I lay down to sleep and suddenly had a sore throat, very strange. Seriously, there wasn’t a hint of a problem until I lay down and then within minutes it felt like I’d swallowed a caltrop. I’ve had the sore throat all week, progressively getting better while my head got worse. I tried to describe how I felt to Kat and came up with “it feels like I have a nest of insane, woolly ferrets running around in circles in my head.” All great fun, I assure you! Sigh, I never used to get colds and their ilk, must have stronger bugs here in the UK (admittedly this is just the second cold I’ve had in two years, so it could be worse.) Anyway, enough whinging, pathetic, weak human!

I’ve been looking forward to the weekend. In the preamble to my latest lamb shank casserole recipe I mentioned that I’d ordered a hare. Well, this morning we picked up our hare from Hamblings, it was only 10 quid! An animal fit for roasting that’d had at least a good 5 days hanging. Unfortunately we don’t know exactly how long it was hung for, the butcher said 5 days was the worst-case. Ideally a hare should hang for at least 7 to 10 days, and it’s pretty cool at the moment so longer would be better. The butcher got it in on Tuesday (it’d been hung prior to this), hung it for another couple of days and it was skinned and paunched on Thursday. I picked up some unsmoked streaky bacon from him too. I tried to get caul fat but he told me it’s “like gold-dust”, and said that’s the way it’s been since abattoir work became piece-work. Things that take too much time to do (and don’t yield much money) just aren’t done any more.

The butcher separated the hare’s legs from its saddle for me, then we wandered back home, via the veggie shop, to admire the goods. The first thing to hit me was the smell, this is one pretty pungent beast! Not a bad smell, not to my nose, but I think some might find it a bit nauseating. Anyway, you can admire the goods without the smell, as usual I’m taking plenty of photos!

Mr Hare

The meaty back legs I’m reserving for a casserole tomorrow. The saddle I’ve trimmed up and will roast tonight. The front legs and trimmings have gone into a pot with vegetables and herbs to make a game stock that’ll be used for both the roast and the casserole.

In other news, I put an order in with a catering company called Nisbets on Thursday. It was time for a new frypan, my old one I brought over from Sydney has reached the end of its non-stick life. Based on a recommendation from the much worshipped “Hugh book” I went for the Bourgeat brand (Nisbets was also recommended by the book.) Hugh described Bourgeat as the “current chef’s favourite” (in 2004), that seems a pretty good rating. I went all-out and ordered three different sizes! (20cm, 28cm, 3-eff’n-huge-6cm) I also got a nice big and heavy cleaver for butchering, well, anything really. Plus a length of muslin (something I’ve had trouble finding anywhere else), and a good solid muffin tray since we didn’t have one (it’s not generally going to be used for muffins though!) I can report that Nisbets’s “next day delivery” (their cheapest delivery option) really is next day! Here’s the loot:

Nisbets Goodies

I’ll be writing entries about the making of the stock, the roasting of the saddle, and the casseroling of the legs. Though, as usual, it will probably take a week or two for me to get the entries done, spare time is a rare commodity.

Stephen Fry on OSS

I didn’t know, until just now, that Stephen Fry has a blog. It doesn’t stop there however, he writes about technology and, furthermore, open source software. This is just crazy, I didn’t think the man could be any more godlike. Some choice articles from his blog:

It’s delightfully varied, check it out.

Skype Spam!

Not new news, but the first time this has happened to me. A Skype chat from “Online Notice ®” just popped up and told me:

Evil Skype Message

A bit suspicious maybe?! Especially given that I’m not running any of the “Affected Software.” They’re trying to work me into a panic though it seems “Your system IS affected, download the patch from the address below! Failure to do so may result in severe computer malfunction.” Bullshit!

Visiting the URI shows a page that appears to run a scan and tells me, with a nice HTML/CSS generated “window” that looks just like an XP alert box, that I have a bunch of malicious software installed. Eeep! Next thing it tries is to sell you a 20 USD product they name as “Windows Software Patch – Scan & Repair”. Attempting to close the “window” pops up a real dialogue that says “Don’t close this window if you want your PC to be clean.”

Evil Skype Website

The final product page is registered to a Russian address and the page pushed via Skype is registered to a US address. Neither seems to be actively trying to exploit browsers, but, regardless, I wouldn’t visit either from a non-sacrificial system. In fact, the final site is well documented as a pusher of spyware known as ScanAndRepair:

  • SpywareRemove — removal instructions.
  • ZDNet blog — mostly identical to what I’ve seen, from November 2007.
  • McAfee — with a “please don’t sue us” disclaimer that says the program may have legitimate uses, bullshite.
  • CA — CA isn’t as insecure in their classification of this crapware.

Note that the sites are plastered with “ScanAlert” branding. This is actually a reputable security company (but not one that sells an AV product) recently acquired by McAfee. Don’t trust the branding you see on a website, be sure you have the right URL.

Please never buy any software that comes to your attention via email or Skype/IM, most especially never buy it by following links from either source of information! If you’re not running AV software on your ‘doze boxes go out and get some, but from a reputable source (over the counter or online from a known and trusted retailer), and stick to a brand name you’ve heard of. Then keep it up to date or it is useless! (Debate about general brokenness of AV software aside, for the moment I still think it is better to be running AV software than not.)

Talk Talk isn’t a “morning person”

I’ve been on the Carphone Warehouse “Talk Talk” plan for quite some time now. Generally I’m happy with it. Call costs are great, for example our calls to Australia cost about 50 times less than they did with BT. ADSL is generally stable, and download throughput is close to the 8Mbit rating. There’s just one niggling problem: I can never turn off my ADSL modem. If I do turn it off, as I did last night, then I can look forward to several hours of unstable ‘net access the following day. What happens is, after powering up the modem, every 10 to 20 minutes the connection drops and comes back 30 seconds later. It’s a fresh connection each time, so I get a new IP address and all existing SSH sessions need to be restarted.

I depend on ‘net access for work since I work from home. I can do almost everything without a connection. But getting up and running on a Monday morning tends to be a pretty ssh-intensive period. Mainly email and fresh checkouts, also of note is that our SCM (Perforce) requires a connection to the server to edit files. I should look into running a local mirror/proxy or something I guess. Anyway, this all means that I never turn off the ADSL modem now. If I don’t turn it off the ADSL connection will remain rock solid for days at a time.

Slow Cooked Lamb Shanks with Puy Lentils

Preamble, or random chatter before the recipe

Lamb Shanks

The weekend of Feb 2nd was an interesting one in the kitchen, alas it wasn’t quite high enough standard to write about. In brief, we visited our favourite local butcher on the Saturday and picked up some very fresh English lamb liver and a bag of “stewing venison.” The liver I treated as simply as it deserved, sliced about 1cm thick, flash-fried for about 2 minutes a side in a very hot cast iron pan, and served with fried onions, sweet potato mash and a pita bread. I’m now a solid fan of lamb liver, this was a simple yet 100% delicious feed. I’ll try to cover something like it in more detail in the future.

The venison was very strong in flavour, a well hung beast I’d judge. I got extremely experimental on its ass, in chocolate style! As an accompaniment I cooked up my first ever mole (no, not a small rodent dug up from the local common), it worked pretty well but I’ll need to give it some more practice. I think I’ll have to pick up some of the fancy chillies from the chilli-dude at Borough Market (or grow them!) The venison itself was browned with some lardons then stewed for just 1.5 hours in lots of red wine with some carrots, onions, and celery. The venison was removed and the juices and veges passed through a food mill a couple of times then boiled hard, until it got too salty and I gave in and thickened it up a little more with some cornflour. Finally some 80% dark chocolate was grated into it. This was a very rich meal, very satisfying. The idea needs more work and a couple more tries before I can write it up.

Spices

After such an experimental weekend I decided to stick to more familiar territory on the following one. Lamb shank casserole is something I can do in my sleep! We picked up some pancetta and two very juicy looking English lamb shanks from the butcher on Saturday and everything else in this recipe came from the cupboard or vegetable bowl. I decided to twist my usual flavourings a little, throwing out the usual rosemary or cinnamon and adding instead juniper berries, star anise, cardamom, and cassia bark (almost the same thing as cinnamon really.) This flavour change worked well, especially in the surplus lentil soup.

I should add some sad news. Our preferred local butcher is Hamblings, since they’re Guild-of-Q and are a BASC Game’s On supporter. We only discovered them a little while ago, we considered finding a great butcher within walking distance of home an excellent bit of luck! (If you recall, when asked for rabbit the High Street butcher could offer only Chinese rabbits, ick. Meanwhile, Hamblings knows a local guy who shoots local rabbits … it really is a much more inspiring place!) One of the local councillors provides the surprisingly modern convenience of an RSS feed of monthly council news, including summaries of planning applications and their resolutions. It was from this that I learnt of an application to turn the Hamblings site into a “hot food shop.” Shock! Misery! I confirmed this with the butcher last weekend, they’ll be around for another handful of months and when they close they won’t be opening up elsewhere. The current butcher’s father started the business in 1969! Alas, of nearly 40 years we only get to know them for their final year, oh well.

We’ll be making the best use we can of Hamblings while it’s still around. We picked up more lambs liver (dinner last night) and a couple of very nice looking sirloin steaks (dinner tonight) today. Plus we ordered a fresh hare for next weekend, that’s going to be fun!

Anyway, enough chatter, I’ve got a recipe to write up…

Ingredients

Serves: 2 Large Dinners, plus 8 “leftover” 280g serves of lentils.

To serve more people simply add more shanks, the limit depends on the size of your casserole! I could add two more shanks to mine without a problem. This means you’ll add less water later and will probably want to make up the difference after the shanks are removed, otherwise the lentils will be too dry in the end.

2 lamb shanks (these shanks were 450g each)
1 tbsp light olive oil (about 10g)
135g cubed pancetta (lardons or streaky bacon will suffice)
1 large diced brown onion (225g prepared, 260g before)
3 sticks halved and sliced celery (180g prepared, 190g before)
2 small roughly cubed carrots (100g prepared, 110g before)
1 small roughly cubed eggplant (270g prepared, 280g before)
4 cloves sliced garlic (20g prepared, 22g before)

Puy Lentils

350ml decent dry red wine (a cask of Banrock Station!)
800g organic chopped tomatoes in rich juice
500ml light beef stock
2 stars star anise
5 small pieces of cassia bark
10 cardamom pods
10 crushed juniper berries
4 small bay leaves
500g puy lentils (soaked for 10 mins, then rinsed and drained)

Prepared Ingredients

I’m not going to detail the, minimal, preparation any further. The basic descriptions above combined with the photo to the left should provide all the detail required.

The first thing to do is pull out a heavy casserole, I own and love a blue 24cm Chasseur which is the vehicle for almost all my slow cooked recipes. For years I tried this sort of thing with lesser stockpots and saucepans and, while they can do the job, they just aren’t as easy to work with. Stick the casserole on a medium heat and add the olive oil, heat until it runs freely (but not so hot that it smokes) then toss in the pancetta. This should merrily sizzle and pop but not smoke, toss the sizzling pig until golden brown. Now the lamb shanks, ideally at room temperature and patted dry with a paper-towel, make some space amongst the pancetta pieces and place the shanks fat-end down. Let them sit and brown for a couple of minutes, then put them onto their sides and do the same, turn and repeat until the shanks have a good all-round browning (except where the curve of the meat/bone makes this impossible of course!) The browning probably takes about 15 minutes all up. With this done put the shanks aside in a dish but keep the lardons in the casserole.

Ready for liquids

The vegetables come next. Toss the onion, celery, and carrot into the pot. This should be sizzling quietly, like quiet radio static (“What’s that!?” says the digital radio generation.) Keep the veggies on the move so that they’re evenly heated and keep at it until they’re translucent and just beginning to brown. At this point the eggplant and garlic go in. Again, keep things on the move until the eggplant has absorbed any excess oil and is starting to soften up, this should only be about 5 minutes. Add all the spices, toss, and then nestle the shanks into the vegetables, shifting veggies out of the way so the shanks are as low as possible.

Just Level

In with 250ml of wine! Note, keep 100ml for later. In with the stock! In with the tomato! Now top the casserole up with water until the liquid level is just level with the tops of the shanks (photo left.) This took a litre of water for me, but will depend on the size of your shanks and your pot. Give everything a good stir, making sure the shanks stay low in the water. Don’t worry that the liquid is rather watery, we’ll deal with this later.

Bring the liquid to just barely simmering, put the lid on the pot, and leave for 30 minutes. I suggest checking every five minutes three times to ensure the simmer is maintained. If it gets too eager you must reduce the heat. After the first thirty minutes are up give the casserole a good stir and turn the shanks. Do the same thing twice more at 30 minute intervals then after the next 30 minutes (so 2 hours all up) we’re done.

Reduced

Pull the shanks out of the casserole and put them aside in a bowl. Now push the flame under the casserole right up and in with the lentils! We want the liquid in the casserole bubbling pretty furiously, but not so much that it’s making a mess of your stove. Keep it like this until the liquid reaches a nice soupy texture, this is achieved by reduction and also by starches from the lentils. My casserole had lost about 1 inch (2.5cm) of liquid by this stage. Now stir in the extra 100ml of wine, this adds a desirable piquancy to the soup. Reduce the heat to a gentle simmer and keep on this until the lentils are done as you prefer, I gave them another 15 minutes. I like my lentils al-dente, especially puy lentils which will retain some of their lovely mottling if you don’t over-cook them. That’s really up to your own tastes though, stop the heat when the lentils have reached whatever you consider to be their perfect texture. Taste and add salt if desired, carefully.

Warming Shanks

We’re almost done now. The last thing to do is sink the shanks back into the lentils for about 10 minutes. This will re-heat the shanks and let the lentils cool a bit.

Serve by dropping a shank into a good sized bowl, ladling over as much lentil soup as desired, and topping off with some good EVOO and fresh ground pepper. A generous sprinkle of chopped parsley would go well I think, or even a gremolata, alas we didn’t have any parsley. Enjoy with a rich, dry red, maybe the one you cooked with — you do cook with a wine that is good enough to drink, right? I’m actually using a Banrock Station cask red for cooking at the moment, but prefer a richer wine to go with this meal. (Banrock Station was my preferred cooking plonk back in Sydney and a wine I’m quite happy to have a glass of. The price here compared to Sydney is scary, but that’s just London for you and the wine has travelled half way around the globe after all, bad “food mile” karma.)

Ready to Serve

Nutrition

In the end the lentil “soup” came to 2.9kg and per 300g serve has ~310 Calories. That’s taking into account all the ingredients above, assuming not much alcohol was lost from the wine, and that the lamb shanks added about 50g of fat to the soup (all erring on the greater side I think.) The other caveat is the pancetta, from one piece to another the fat content can vary wildly. So as usual, the nutritional details are a rough estimate (as you must realise these things always are!)

Lentil Soup: 300g
Thing Value
Energy 310 kcal
Carbohydrate 34.5g
Protein 16.8g
Fat 9.2g
  Sat 2.0g
  Mono 4.4g
  Poly 0.7g
Dietary Fibre 8.0g

I’ll leave it up to the reader to decide how many calories a lamb shank has, it’s simply too variable! After portioning out 300g of lentils for each shank we had leftovers to make 8 280g serves of soup, so 290 kcal per serve (+44 with a 5g drizzle of EVOO.)

There are a few more photos of the cooking and ingredients in the Lamb Shanks with Lentils photo album.

unsigned vs. long, an integer odyssey

It’s probably no surprise that I’m a little hyper-aware of secure coding of late. Sure, I’ve got the certification – but there’s still so much I don’t know. One thing I note in much of my own code and most code I’ve read is that there is little regard for integer errors. I’d take a stab that this is the most common coding error out there.

Of particular note is the problem of assigning the result of an operation between two variables to a variable of a different type. For example (unsigned) int versus (signed) long. Intuitively you’d think something like “long has more bits so it’s safe to assign an unsigned int to a signed long.” That could result in the following code:

unsigned aardvarkCageCount();
unsigned aardvarkCount();

int main(void) {
    ...
    long x = aardvarkCount();
    x -= aardvarkCageCount();
    ...
}

Our assumption is that, because the functions called return an unsigned int, it is safe to put the result into a signed long because a long’s gotta have at least one more bit than an int, right? Yeah?

It is best to never make assumptions. The C99 standard defines only minimum values for integer bounds, by the standard the relevant bounds are:

Bound     Value
UINT_MAX  65,535
LONG_MAX  2,147,483,647

Hey, looks pretty safe doesn’t it? Recall that I said these are minimum bounds! Compiler writers can do whatever they like as far as maximum values are concerned. Compile and run this code:

#include <limits.h>
#include <stdio.h>

int main(void) {
    /* sizeof yields a size_t, so print it with %zu rather than %d. */
    printf("UINT_MAX: %u (%zu bytes)\n", UINT_MAX, sizeof(unsigned));
    printf("LONG_MAX: %ld (%zu bytes)\n", LONG_MAX, sizeof(long));
    return 0;
}

For me:

:; gcc stdint.c -o stdint
:; ./stdint 
UINT_MAX: 4294967295 (4 bytes)
LONG_MAX: 2147483647 (4 bytes)
:;

Oh oh! An unsigned int can be larger than a signed long! So the assumption that the latter will be big enough to hold the former is wrong! Unfortunately this is an intuitive “easy fix” for those worrying about overflow, it seems good at the time, in theory. But the loose specification for int is that it represents the most natural value size for the underlying hardware, so 32 bits on most machines we’ve grown up with, with growing popularity for 64 bits. We’ll get to the relationship between int and long shortly.

#include <limits.h>
#include <stdio.h>

/* We have four billion aardvarks!  That's a lot of stew. */
unsigned aardvarkCount() { return 4000000000u; }
/* And only 20 cages. */
unsigned aardvarkCageCount() { return 20; }

int main(void) {
    long x = aardvarkCount();
    printf("There are %ld aardvarks.\n", x);
    x -= aardvarkCageCount();
    printf("Aardvarks minus cages is %ld.\n", x);
    return 0;
}

Gives:

:; gcc aardvark.c -Wall -o aardvark
:; ./aardvark 
There are -294967296 aardvarks.
Aardvarks minus cages is -294967316.

No surprise. Surely we can get the compiler to tell us when we’re doing something stupid like this?

:; gcc aardvark.c -Wall -Wextra -ansi -pedantic -o aardvark
:;

No luck. As far as I can tell from scanning man gcc there isn’t an option that’s going to work for us. Some might be thinking, “what about -ftrapv?!” Sorry, not gonna help. The -ftrapv option in gcc only works when both arguments to arithmetic operations are signed – technically it works for overflow but not wrap. If we compile this code with -ftrapv it runs fine. For an example of how not-so-useful this can be:

:; cat ftrapv.c 
#include <stdio.h>
int main(void) {
    long x = 2000000000;
    unsigned y = 2000000000;
    long z = x + y;
    printf("z = %ld\n", z);
    return 0;
}
:; gcc ftrapv.c -Wall -Wextra -ftrapv -o ftrapv
:; ./ftrapv 
z = -294967296
:;

To make -ftrapv do something the arguments to arithmetic operations must both be signed. As in:

:; cat ftrapv.c 
#include <stdio.h>
int main(void) {
    long x = 2000000000;
    long y = 2000000000;
    long z = x + y;
    printf("z = %ld\n", z);
    return 0;
}
:; gcc ftrapv.c -Wall -Wextra -ftrapv -o ftrapv
:; ./ftrapv 
Aborted (core dumped)
:;

Unfortunately, as far as I can see, even then that’s all we can do. Aborting isn’t entirely useful, but I guess it is better than continuing on in an undefined state! Furthermore -ftrapv turns signed arithmetic operations into a function call, if we look at the disassembly of the generated code we see:

...
080483a4 <main>:
...
 80483b5: c7 45 f0 00 94 35 77 movl   $0x77359400,-0x10(%ebp)
 80483bc: c7 45 f4 00 94 35 77 movl   $0x77359400,-0xc(%ebp)
 80483c3: 8b 45 f4             mov    -0xc(%ebp),%eax
 80483c6: 89 44 24 04          mov    %eax,0x4(%esp)
 80483ca: 8b 45 f0             mov    -0x10(%ebp),%eax
 80483cd: 89 04 24             mov    %eax,(%esp)
 80483d0: e8 2b 00 00 00       call   8048400 <__addvsi3>
...
...
08048400 <__addvsi3>:
 80483f0: 55                 push  %ebp
 80483f1: 89 e5              mov   %esp,%ebp
 80483f3: 53                 push  %ebx
 80483f4: 83 ec 04           sub   $0x4,%esp
 80483f7: 8b 45 0c           mov   0xc(%ebp),%eax           # arg2
 80483fa: 8b 4d 08           mov   0x8(%ebp),%ecx           # arg1
 80483fd: e8 2a 00 00 00     call  804842c <__i686.get_pc_thunk.bx>
 8048402: 81 c3 e2 11 00 00  add   $0x11e2,%ebx
 8048408: 85 c0              test  %eax,%eax                # set SF if arg2 < 0
 804840a: 8d 14 01           lea   (%ecx,%eax,1),%edx       # 'lea' trick for arg1 + arg2
 804840d: 78 11              js    8048420 <__addvsi3+0x30> # if arg2 < 0 goto cmp below
 804840f: 39 d1              cmp   %edx,%ecx                # else compare arg1 with result
 8048411: 0f 9f c0           setg  %al                      # %al = 1 if arg1 > result (overflow)
 8048414: 84 c0              test  %al,%al
 8048416: 75 0f              jne   8048427 <__addvsi3+0x37> # if %al != 0 jump to abort!
 8048418: 83 c4 04           add   $0x4,%esp
 804841b: 89 d0              mov   %edx,%eax
 804841d: 5b                 pop   %ebx
 804841e: 5d                 pop   %ebp
 804841f: c3                 ret   
 8048420: 39 d1              cmp   %edx,%ecx                 # compare arg1 with result
 8048422: 0f 9c c0           setl  %al                       # %al = 1 if arg1 < result (overflow)
 8048425: eb ed              jmp   8048414 <__addvsi3+0x24>  # jump to test above
 8048427: e8 b0 fe ff ff     call  80482dc <abort@plt>
...

Ick! That’s a fair bit of code for an addition! A more readable representation of __addvsi3 would be:

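int add(int a, int b) {
    int bad;
    /* The library code lets this wrap; in portable C signed overflow is undefined. */
    int res = a + b;
    if (b < 0) bad = (a < res);  /* adding a negative must not increase a */
    else bad = (a > res);        /* adding a non-negative must not decrease a */
    if (bad) abort();
    return res;
}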

(Sorry about the iffy one-liners.)

Before you think “but what about -O[123]!?” … Yep, they get rid of the call. The optimisation takes precedence, since the code is adding two simple constants the optimiser does the addition at compile time. In this way optimisation reduced the effectiveness of -ftrapv. It doesn’t disable it entirely though, a simple trick is to set one of the variables volatile and do an optimised compile. In this case you’ll observe that the call to __addvsi3 is included despite optimisation. This is heading towards another can of worms now though, how often does optimisation really matter for most software? I’ll save the rest of this line of thought for another day…

Anyway, this is all beside the point as our original problem involved assignment with mixed signed values. I.e. it boils down to this:

    long x = 4000000000u;
    printf("x = %ld\n", x);

We’ve failed to find any -W flags or any other trap/exception for gcc that will help us out here (or g++ for that matter, my original examples were all C++ but I decided to “purify” them.)

What to do? I think the answer is: don’t make the mistake in the first place! Be aware of the limits, and the limitations of the limits. When necessary always check the values, but try not to get into problematic situations in the first place. If you can afford the overhead use an arithmetic library that deals with the details for you (I’d guess that the majority of code can afford the overhead.) Adding your own boundary guards can be tricky work and I’d recommend sticking to safe alternatives or, at least, recipes from a reliable source.

Let us follow some “intuition” again. An approach is to first store the return values in local variables of the right type. Then test the values before trying to store the result in a long (yes, the example is rather contrived, live with it.)

int main(void) {
    long x = 0;
    unsigned a = aardvarkCount();
    unsigned c = aardvarkCageCount();
    if ((a - c) > LONG_MAX) {
        printf("Oops: %u - %u = %u!\n", a, c, (a-c));
        printf("LONG_MAX=%ld LONG_MIN=%ld\n", LONG_MAX, LONG_MIN);
        return 1;
    }
    x = a - c;
    printf("There are %u aardvarks.\n", a);
    printf("Aardvarks minus cages is %ld.\n", x);
    return 0;
}

This is only a guard for the “subtraction result too large to store in a long” case. It looks reasonable enough to cover this though, right? Nup! What happens when a is 1 and c is 2?

:; ./aardvark 
Oops: 1 - 2 = 4294967295!
LONG_MAX=2147483647 LONG_MIN=-2147483648

The problem here is that the (unsigned - unsigned) expression yields an unsigned value, thus we get integer wrap! So, we’re not there yet. We’d better guard out the wrap case:

    if ( ((a > c) && (a - c) > LONG_MAX) ) {
        printf("Oops: %u - %u = %u!\n", a, c, (a-c));
        printf("LONG_MAX=%ld LONG_MIN=%ld\n", LONG_MAX, LONG_MIN);
        return 1;
    }

Now, we also want to guard the case where (a - c) is too small for LONG_MIN. Don’t go thinking this is good enough either:

    ((a < c) && (c - a) < LONG_MIN) /* BAD! */

Remember that (c - a) will always yield an unsigned result, so the second condition can never test what we want: a non-negative value is never less than LONG_MIN, and if LONG_MIN is instead converted to unsigned the comparison is meaningless. Good thing about this case is that the compiler will actually warn you about the comparison between signed and unsigned values. (You always compile with warnings on, right?) We’ll need to rearrange the arithmetic.

    if ( ((a > c) && (a - c) > LONG_MAX) ||
         ((a < c) && (c > (LONG_MIN + a))) ) {
        printf("Oops: %u - %u = %u!\n", a, c, (a-c));
        printf("LONG_MAX=%ld LONG_MIN=%ld\n", LONG_MAX, LONG_MIN);
        return 1;
    }

This raises the question: “is (LONG_MIN + unsigned int) a safe expression?” We can look at section 6.3.1.1 of the C standard to answer this. The integer conversion rank definition states, in part, “The rank of a signed integer type shall be greater than the rank of any signed integer type with less precision.” (The precision is defined as the number of bits used to represent values.) The following ranking clause is “The rank of long long int shall be greater than the rank of long int, which shall be greater than the rank of int, …” These two clauses tell us, I think, that a long can never have less precision than an int, so if an int is 64 bits then a long will be at least 64 bits, which would be the primary concern here. Phew, we seem to be safe.

I’ll leave this here since I don’t have time to dig any deeper. There’s more to consider though! For example, what about non-two’s-complement representations, can we derive the same safety from the rules? What about performance optimisations? What else? I suggest you read the specification. It’s hard going! And reading is very different to understanding. Of course, understanding what is specified also requires understanding exactly what isn’t specified! I’m not even close myself…
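One way to sidestep the mixed signed/unsigned comparisons discussed above is to only ever subtract the smaller unsigned value from the larger, and to compare magnitudes in unsigned long. The following is a hedged sketch under that idea, not a vetted recipe; sub_unsigned_to_long is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores a - c in *out if the mathematical result
 * fits in a long, returning false otherwise. */
bool sub_unsigned_to_long(unsigned int a, unsigned int c, long *out) {
    assert(out != NULL);
    if (a >= c) {
        unsigned int d = a - c;            /* cannot wrap: a >= c */
        if ((unsigned long)d > (unsigned long)LONG_MAX) return false;
        *out = (long)d;
    } else {
        unsigned int m = c - a - 1u;       /* magnitude minus one; cannot wrap: c > a */
        /* -(LONG_MIN + 1) is always representable, unlike -LONG_MIN. */
        if ((unsigned long)m > (unsigned long)-(LONG_MIN + 1)) return false;
        *out = -(long)m - 1;               /* equals a - c, never below LONG_MIN */
    }
    return true;
}
```

With a = 1 and c = 2 this yields -1 rather than tripping the guard, which is the case the first attempt got wrong.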

The main point of this exercise is to indicate that the problem is a difficult one and the solutions are often non-obvious. For this reason I’d recommend the previously suggested path of relying on a 3rd party secure/safe arithmetic library. This offloads the worry to someone else, who’s hopefully better informed and also does their best to keep their solutions optimal. Personally, I’d prefer to let someone else solve the problem.

A good starting point for learning more yourself is the CERT Secure Coding Standard wiki section on integers (in C). Then try to find a reasonable safe integer arithmetic library; I’m afraid I don’t have the experience to recommend any particular one. The ones documented in the literature I’ve covered are for Windows. (I’m most familiar with IntegerLib on the CERT site; it is a Visual Studio project and needs some tweaking to be compiled on Linux. I’ve done enough tweaking to make it compile for me, see note [1] below.) Alternatively (and perhaps most simply) change your integer representation completely by switching to one of the various arbitrary precision arithmetic libraries available. GMP, for example, has been around for ages, is multi-platform, and even has a C++ wrapper.


[1] I got the CERT IntegerLib compiling under Linux. In theory this is written by experts who got all the guards right… Unfortunately it was clearly written with only Windows in mind, as there is no support for a non-Windows build. I’ve worked through the code and added the trivial changes and a build system that make it compile for me (on an up-to-date Ubuntu gutsy). My mods to IntegerLib are here (or here for just a diff.) Don’t mistake “compiles” for “is correct” though; it has not gone through rigorous testing. You’ll need a typical C build environment (with all the autotools/libtool junk if you want to modify and regenerate the configure stuff.) To compile, enter IntegerLib/IntegerLib and do a ./configure then a make. If you’re lucky this will give you the test binary; run it to see examples of how it works. Have a look at their IntegerLib.h file for a listing of the available methods. Note especially that correct use of their library requires that all parameters be of the type defined in their API, and the same goes for the variable used to store the return value. This code doesn’t directly solve the original problem: subtracting unsigned int values to a signed long result. The main problem is that the unsigned int subtraction may result in a value that is perfectly reasonable for a signed long but no good for an unsigned int. The library as it stands doesn’t cover arithmetic operations assigned to a different type (where the result may possibly be valid.)

pcrecpp::QuoteMeta and null bytes

Note: This entry has been restored from old archives.

Update 2008-02-22 13:45: Sometimes I hate it, just a little, when random things I write end up as 2nd/3rd Google hits for the thing I wrote about! This isn’t a review, it’s an entry about one pretty trivial bug found in a utility method provided by the pcrecpp API. The project maintainers were quick to respond to the bug with a good fix. In general I have this to say about pcrecpp: It’s the best option I’ve found for working with PCREs in C++, it beats the pants off working with the C API or writing your own wrapper.
Update 2008-02-15 09:45: Bug filed yesterday and they’ve gone with special-casing the escape for NULL. I provided a patch that added a new QuoteMetaHex function, but I much prefer the route they’ve chosen. (I was concerned that they might really want it to be exactly like Perl’s quotemeta.)

Be warned! If using pcrecpp::QuoteMeta on strings with embedded null bytes the results might not be as you expect!

#include <pcrecpp.h>
#include <string>
#include <iostream>

int main(void)
{
    std::string unquoted("foo");
    unquoted.push_back('\0');
    unquoted.append("bar");
    std::string autoquoted = pcrecpp::RE::QuoteMeta(unquoted);
    std::string manualquoted("foo\\x00bar");
    std::cout << "Auto quoted version is: " << autoquoted << std::endl;
    std::cout << "Auto match result is: " << pcrecpp::RE(autoquoted).FullMatch(unquoted) << std::endl;
    std::cout << "Manual quoted version is: " << manualquoted << std::endl;
    std::cout << "Manual match result is: " << pcrecpp::RE(manualquoted).FullMatch(unquoted) << std::endl;
    return 0;
}

:; g++ quotemeta.cc -o quotemeta -lpcrecpp
:; ./quotemeta 
Auto quoted version is: foo\bar
Auto match result is: 0
Manual quoted version is: foo\x00bar
Manual match result is: 1
:;

Dammit!

But is it a bug? The documentation in pcrecpp.h says:

  // Escapes all potentially meaningful regexp characters in
  // 'unquoted'.  The returned string, used as a regular expression,
  // will exactly match the original string.  For example,
  //           1.5-2.0?
  // may become:
  //           1\.5\-2\.0\?
  static string QuoteMeta(const StringPiece& unquoted);

And that’s what man pcrecpp tells me too. So the definition is essentially “does what perl’s quotemeta does.” Hrm:

:; perl -e 'print quotemeta("foo\x00bar") . "\n"'
foo\bar
:; perl -e 'print quotemeta("foo\x00bar") . "\n"' | xxd
0000000: 666f 6f5c 0062 6172 0a                   foo\.bar.

That second command is just to determine that the null byte is actually still there. The same trick with ./quotemeta shows that the null is also still there when pcrecpp::QuoteMeta is used.

So, the behaviour of pcrecpp::QuoteMeta is correct by definition.

What about the matching then? Should “\” followed by a literal null be part of the regular expression? I’m not sure about libpcre semantics for this but let’s test with Perl. Note that pcrecpp::FullMatch means the whole string must match, so the Perl expression must have “^” and “$” at either end.

:; perl -e '$s="foo\0bar"; $p=quotemeta($s); $s =~ s/^$p$//; print "<$s>\n"' | xxd
0000000: 3c3e 0a                                  <>.
:; perl -e '$s="foo\0bar"; $p=quotemeta($s); $s =~ s/^foo//; print "<$s>\n"' | xxd
0000000: 3c00 6261 723e 0a                        <.bar>.
:; perl -e '$s="foo\0bar"; $p=quotemeta($s); $s =~ s/^foo\0//; print "<$s>\n"' | xxd
0000000: 3c62 6172 3e0a                           <bar>.

OK, looks like pcrecpp isn’t matching like Perl. Digging into the pcrecpp.cc source equivalent to the Ubuntu package I’m using shows:

...
RE(const string& pat) { Init(pat, NULL); }
...
void RE::Init(const string& pat, const RE_Options* options) {
 pattern_ = pat;
...
  re_partial_ = Compile(UNANCHORED);
  if (re_partial_ != NULL) {
    re_full_ = Compile(ANCHOR_BOTH);
  }
...
pcre* RE::Compile(Anchor anchor) {
...
  if (anchor != ANCHOR_BOTH) {
    re = pcre_compile(pattern_.c_str(), pcre_options,
                      &compile_error, &eoffset, NULL);
  } else {
    // Tack a '\z' at the end of RE.  Parenthesize it first so that
    // the '\z' applies to all top-level alternatives in the regexp.
    string wrapped = "(?:";  // A non-counting grouping operator
    wrapped += pattern_;
    wrapped += ")\\z";
    re = pcre_compile(wrapped.c_str(), pcre_options,
                      &compile_error, &eoffset, NULL);
  }
...

Hm! The problem is that QuoteMeta leaves the literal null byte in, then later on the compilation uses the string’s c_str(). Naturally this will be null-terminated, so the embedded null marks the end of our pattern.
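The truncation is easy to see in isolation. The sketch below is plain C with no PCRE involved; visible_as_c_string is a hypothetical helper that reports how much of a buffer an API taking a null-terminated string, as pcre_compile does, would actually read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: how many bytes of a len-byte buffer a C-string
 * API would see before it hits a null terminator. */
size_t visible_as_c_string(const char *buf, size_t len) {
    assert(buf != NULL);
    const char *nul = memchr(buf, '\0', len);  /* first embedded null, if any */
    return nul ? (size_t)(nul - buf) : len;
}
```

For the eight quoted bytes f, o, o, backslash, null, b, a, r this returns 4, so the compiled pattern would be just foo followed by a dangling backslash.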

It seems that pcre_compile doesn’t offer a version with a specified string length, so there’s no way around this without printable-quoting the null. This is fine in libpcre since the behaviour is obvious. Maybe not so fine in pcrecpp, since it is common to use std::string as a non-printable data container (it is very common, but maybe it is a “bad thing”™?) I think it is a bug in pcrecpp, but it could be a “document the caveat” bug rather than a “do magic to make it work” bug.
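To make “printable-quoting the null” concrete, here is a minimal sketch in plain C. It is not the pcrecpp implementation and not the upstream fix; quote_meta_printable is a hypothetical function that backslash-quotes non-alphanumeric bytes and writes a null byte as the four characters \x00, matching the manually quoted pattern used earlier. Passing bytes with the high bit set through unquoted is an assumption made here to keep UTF-8 sequences intact.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE or pcrecpp. Writes a quoted,
 * null-terminated copy of the len bytes at src into dst. Returns the number
 * of bytes written (excluding the terminator), or (size_t)-1 if dst,
 * which holds cap bytes, is too small. */
size_t quote_meta_printable(const char *src, size_t len, char *dst, size_t cap) {
    size_t n = 0;
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned char ch = (unsigned char)src[i];
        if (ch == '\0') {
            if (n + 4 >= cap) return (size_t)-1;
            memcpy(dst + n, "\\x00", 4);    /* printable escape for the null byte */
            n += 4;
        } else if (isalnum(ch) || ch == '_' || ch >= 0x80) {
            if (n + 1 >= cap) return (size_t)-1;
            dst[n++] = (char)ch;            /* passed through unquoted */
        } else {
            if (n + 2 >= cap) return (size_t)-1;
            dst[n++] = '\\';                /* backslash-quote everything else */
            dst[n++] = (char)ch;
        }
    }
    if (n >= cap) return (size_t)-1;        /* no room for the terminator */
    dst[n] = '\0';
    return n;
}
```

Because the result contains no embedded null, taking its c_str() equivalent loses nothing: the seven bytes of foo, null, bar come out as the ten-character pattern foo\x00bar.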

Looks like pcrecpp.cc is actually part of upstream libpcre. Should file a bug I guess.