All models are wrong.: How many horses?

So the Lib Dems sent me some election material this morning. Unfortunately for them, ours is a very safe Labour seat, as you can see from this bar chart of the last election:

Not particularly pretty, but fairly clear, I think. Labour have a big majority, the Lib Dems are a (relatively) distant second, and the Tories and Greens are pretty much just making up the numbers.

So, how did the Lib Dems choose to present these data in their election leaflet? Like this!

Crikey. They say it's a two-horse race, and it really does look like one, doesn't it? Except, hang on: this graph should be showing the same data as mine, so why does it look so different? Surely they haven't been abusing statistics for political gain?

Well, before we accuse them of that, let's check a couple of common tricks people use when presenting bar charts to try and give a particular impression.

First up, it's the 'cut the y-axis above zero' method. This means the bottom of the graph corresponds not to zero votes but to something larger. The Lib Dems can't have done this, though, because it would only exaggerate the difference in votes. To demonstrate, if we dismiss the Tories, plot just the Lib Dem and Labour votes, and cut the axis off at 9,000 votes, it looks like this:

Wow, no point voting for anyone other than Labour here: they've got it wrapped up... (Obviously, were we making real propaganda, we'd leave off the y-axis; you can't have people reading that and working out what we're up to!)
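The arithmetic behind that picture is simple: on an ordinary (linear) axis whose bottom sits at some baseline, each bar's visible height is its vote count minus the baseline, so raising the baseline inflates the apparent ratio between the bars. Here's a minimal Python sketch of that. The vote counts are made up for illustration (the real figures were only in the charts), and `bar_heights` and `apparent_ratio` are just hypothetical helper names:

```python
# Hypothetical vote counts, made up for illustration; NOT the real results.
votes = {"Labour": 20_000, "Lib Dem": 10_000}


def bar_heights(votes, baseline=0):
    """Visible bar heights on a linear axis whose bottom sits at `baseline` votes."""
    return {party: n - baseline for party, n in votes.items()}


def apparent_ratio(heights, a, b):
    """How many times taller bar `a` looks than bar `b`."""
    return heights[a] / heights[b]


for baseline in (0, 9_000):
    heights = bar_heights(votes, baseline)
    ratio = apparent_ratio(heights, "Labour", "Lib Dem")
    print(f"axis starts at {baseline:,} votes: Labour bar looks {ratio:.1f}x the Lib Dem bar")
```

With these invented numbers, a real two-to-one lead becomes a bar eleven times taller once the axis starts at 9,000 votes (11,000 visible votes against 1,000).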

So the Lib Dems can't have done that. Another option is a logarithmic y-axis. This means that, rather than each mark on the y-axis indicating a constant increase in votes, each mark corresponds to multiplying by a fixed factor, say 10. In other words, whilst a standard axis will go 1,000, 2,000, 3,000, and so on, a logarithmic one would go 1,000, 10,000, 100,000, increasing by a factor of 10 each time. These scales are useful when a graph has to show both very large and very small numbers. It would seem a bit silly to use one here, but can it explain the Lib Dem graph?
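Put another way, a value's height on a base-10 log axis is its logarithm, so equal ratios become equal steps. A minimal Python sketch of the idea, again with made-up vote counts (not the real results) and a hypothetical helper `log_height`, assuming the axis bottom sits at 1 vote:

```python
import math

# Hypothetical vote counts, made up for illustration; NOT the real results.
votes = {"Labour": 20_000, "Lib Dem": 10_000, "Conservative": 5_000}


def log_height(value):
    """Position of `value` on a base-10 log axis: each whole step is a factor of 10."""
    return math.log10(value)


# Ticks at 1,000, 10,000 and 100,000 land at evenly spaced positions (3, 4, 5).
for tick in (1_000, 10_000, 100_000):
    print(f"{tick:>7,} votes -> height {log_height(tick):.1f}")

# With the axis bottom at 1 vote (log10(1) == 0), each bar's height is log10(votes),
# so a two-to-one lead shrinks to a barely visible difference.
heights = {party: log_height(n) for party, n in votes.items()}
for party, h in heights.items():
    print(f"{party:<12} {h:.2f}")
print(f"Labour bar looks {heights['Labour'] / heights['Lib Dem']:.2f}x the Lib Dem bar")
```

With these invented numbers, Labour's bar ends up less than 8% taller than the Lib Dems', even though it represents twice the votes.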

Encouraging? Maybe. Notice how everyone now seems much closer, and that the y-axis increases logarithmically, going up in multiples of 10. This still doesn't really look like the graph the Lib Dems produced (the Tories seem a lot closer than they should be), so let's tweak it a bit more and go back to cutting the y-axis off somewhere suitable. We'll also drop the pesky marks on the y-axis that actually tell us what's going on:

Aha! That's much more like it. Not a perfect imitation, but certainly getting there. We've got the Tories down as an also-ran, and the Lib Dems really giving Labour a run for their money. We could probably pick a better cut-off point to get the Lib Dem and Labour bars a bit closer together (changing the logarithmic factor, 10 here, wouldn't help on its own: it moves the tick marks but leaves the bars' relative heights unchanged), but I think by now we've established that the Lib Dems are really just playing Silly Buggers. I can't imagine they actually fished around for a good scale on which to make the graph look like that; instead, they've just drawn some appropriately shaped bars and stuck the numbers on. Of course, they've told us the numbers (and even given a source for bonus authenticity!), so it's our own fault if we just look at the coloured rectangles and draw the wrong conclusion. Still, that's precisely what they're hoping people will do, and it's a great example of why people don't trust statistics.
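How hard would that fishing around actually be? Not very. On a base-10 log axis cut off at c votes, a bar's height is log10(votes) minus log10(c), so you can make the bigger bar look any number of times taller than the smaller one (anything above 1) by solving for c. Here's a minimal sketch, with made-up vote counts and hypothetical helper names (`log_bar_heights`, `cutoff_for_ratio`). It shows what's possible, not how the leaflet's chart was actually drawn:

```python
import math

# Hypothetical vote counts, made up for illustration; NOT the real results.
votes = {"Labour": 20_000, "Lib Dem": 10_000}


def log_bar_heights(votes, cutoff):
    """Bar heights on a base-10 log axis whose bottom is cut off at `cutoff` votes."""
    floor = math.log10(cutoff)
    return {party: math.log10(n) - floor for party, n in votes.items()}


def cutoff_for_ratio(big, small, target_ratio):
    """Return the cut-off c that makes the `big` bar look `target_ratio` times as
    tall as the `small` bar, by solving
        (log10(big) - log10(c)) / (log10(small) - log10(c)) == target_ratio
    for c.
    """
    if not big > small > 0 or target_ratio <= 1:
        raise ValueError("need big > small > 0 and target_ratio > 1")
    log_c = (math.log10(big) - target_ratio * math.log10(small)) / (1 - target_ratio)
    return 10 ** log_c


# Decide how close we'd like the two bars to look, then solve for the cut-off.
for target in (1.2, 1.5):
    cutoff = cutoff_for_ratio(votes["Labour"], votes["Lib Dem"], target)
    heights = log_bar_heights(votes, cutoff)
    ratio = heights["Labour"] / heights["Lib Dem"]
    print(f"cut-off {cutoff:,.1f} votes: Labour bar looks {ratio:.2f}x the Lib Dem bar")
```

With these invented numbers, a cut-off of 312.5 votes makes a two-to-one lead look like a mere 1.2 to 1, and 2,500 votes makes it look like 1.5 to 1.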

Wednesday, 5 May 2010