Thursday 13 May 2010

Doing it by Degrees

When it comes to looking at university education, many do not have to think too hard about what they want to study; the bigger dilemma is where to study it. That said, university prospectuses will often tout various statistics to try to lure potential students into plumping for a particular course, with employability one of the more commonly seen figures. But is this a reliable metric of how 'valuable' a degree is? Or is it yet another example of STATISTICS ABUSE? (roll opening credits)

Every year AGCAS produces a report looking at destinations of graduates six months after graduation. The latest report, from 2009, is the result of questionnaires sent to all graduates from the 2007/8 academic year, and can be downloaded here. The report itself contains quite a lot of interesting data, with destination breakdown (how many graduates are employed, unemployed, or studying for further degrees) as well as stats on the types of job those in employment have found themselves in. With these statistics available for a number of subjects (or subject areas), we can get a feel for which subjects seem the most or least valuable.

The report provides details on the following subjects/subject areas:

Science

Biology; Chemistry; Environmental, Physical Geographical and Terrestrial Sciences; Physics; Sports Science

Mathematics, IT and Computing

Computer Science and Information Technology; Mathematics

Engineering and Building Management

Architecture and Building; Civil Engineering; Electrical and Electronic Engineering; Mechanical Engineering

Social Sciences

Economics; Geography; Law; Politics; Psychology; Sociology

Arts, Creative Arts and Humanities

Art and Design; English; History; Media Studies; Languages; Performing Arts

Business and Administrative Studies

Accountancy; Business and Management; Marketing

So let's start with employment, surely a perfectly good benchmark of how 'good' a degree is. The AGCAS report splits graduates into those in UK employment, overseas employment, as well as those working and studying. We add these three together to give us our employment figures:

Top 5 for Employment:

Civil Engineering (78.3% employed)
Marketing (74.6%)
Business and Management (73.6%)
Architecture and Building (73.4%)
Accountancy (73.0%)

Bottom 5 for Employment:

Law (35.2%)
Physics (37.9%)
Chemistry (44.0%)
Biology (58.0%)
History (58.7%)

I think it's fair to say there are some surprises here. Marketing, and Business and Management, two subject areas often cited as housing archetypal 'Mickey Mouse' degrees, make the top 5, whilst historically 'tough' subjects like chemistry and physics are at the opposite end. Are people really better off studying business over biology? Or is there something wrong with our metric?

Naturally, I'm inclined to believe the latter, and with good reason. As is so often the case, one statistic does not tell the whole story; whilst these numbers tell us what proportion of graduates were employed six months after graduation, it is not simply the case that everyone else was unemployed. AGCAS reports a number of 'studying' statistics as well, such as those studying for a higher degree, a PGCE, or professional qualifications. Perhaps, then, unemployment is a better way of assessing degrees, as this takes people who are 'employed' with study into account. Let's see what happens:

Top 5 for Unemployment:

Law (5.5% unemployed)
Sports Science (5.6%)
Geography (6.4%)
Civil Engineering (7.0%)
Psychology (7.4%)

Bottom 5 for Unemployment:

Computer Science and Information Technology (13.7%)
Media Studies (12.3%)
Art and Design (12.2%)
Electrical and Electronic Engineering (11%)
Accountancy (10.9%)

Quite a big change. Law jumps from worst for employment to best for unemployment (as you might expect, they're all studying), and accountancy has done the opposite. There are still some surprises, such as Computer Science and IT having the highest rate of unemployment, and another 'Mickey Mouse' course in the form of Sports Science being second best. However, this seems a much less debatable statistic than employment, and so it seems reasonable to take these figures at face value.

There is, of course, an issue we have yet to discuss, which will be a rather pressing one for many new graduates: money. What good is being employed if you're only getting paid £5 an hour for those fancy letters after your name?

The salary data in the AGCAS report are a little harder to find, let alone digest. Whilst we get nice pie charts and percentage breakdowns for destinations, discussion of salaries is restricted to an introductory paragraph. If we trawl through these, however, we do get some numbers, and merging them all together we can do another top and bottom 5, this time based on the average salary of respondents.

Top 5 for Salary:

Economics (£24065)
Civil Engineering (£24006)
Architecture and Building (£23689)
Mechanical Engineering (£23683)
Electrical and Electronic Engineering (£22372)

Bottom 5 for Salary:

Art and Design (£15656)
Media Studies (£16295)
Psychology (£16500)
Sports Science (£16627)
English (£16642)

Once again, a rather marked change. Media Studies and Art and Design keep the bottom 5 places they held under the unemployment stats, but they are joined by Sports Science, which was second best for unemployment. There are no real surprises in our top 5, however, all these subjects having a fairly substantial pedigree.

For the sake of argument, then, let's suppose that you are most interested in average salary. As I mentioned, the AGCAS report makes it much easier to find the employment/unemployment figures for a subject than it does to find average salaries. Do these provide an adequate indicator of average salary? Our top/bottom 5s above would suggest not, but these only cover 10 of 26 subjects. Let's plot some graphs!

First up, average salary against employment: is there a strong link between the two?



Hmm, no obvious pattern there, then. How about unemployment? Does that give us a better fit?



There doesn't seem to be any sort of pattern there either.

We can in fact calculate a number that gives us an idea of how closely related two sets of numbers are. The correlation coefficient of a set of (x,y) points (like the (employment %, salary) points on our graph) varies from -1 to 1. If it's close to 0, there is little sign of a straight-line relationship between the two, whilst if it is close to +1 or -1 it suggests a strong one. For example, if in our plot of employment against salary above all our points seemed to be on a straight line, this would suggest a correlation of around 1 or -1. The sign indicates the direction of the correlation. If it's positive this means as salary increases, so does employment. If it's negative, then as salary increases, employment decreases. A strong correlation doesn't mean that one causes the other - "correlation does not imply causation" is one of a statistician's many mantras - it just shows that these data happen to have an association (which we may go on to convince ourselves is a causal one).
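For the curious, here is a minimal Python sketch of how such a coefficient can be computed. It assumes the usual Pearson correlation coefficient (the post doesn't name a specific one), the function name `pearson_r` is my own, and the example points are made up purely to show the two extremes; they are not taken from the AGCAS report.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of products of deviations from the two means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative points, NOT figures from the report.
perfect_up = pearson_r([1, 2, 3, 4], [10, 20, 30, 40])    # points on a rising line
perfect_down = pearson_r([1, 2, 3, 4], [40, 30, 20, 10])  # points on a falling line
```

Points lying exactly on a rising line give a value of 1, and points on a falling line give -1; real data will land somewhere in between.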

So that diversion aside, what correlations do we get in our two plots above? Looking at them, we'd expect both to be close to zero; there doesn't seem to be much of a pattern in either of them. For the first plot, of employment against salary, we find a correlation coefficient of 0.12 - so not much of a surprise there. For unemployment it's even worse: 0.06. In short, neither employment nor unemployment is a good indicator of average salary.

There is one area of the AGCAS report we haven't discussed, however, which might prove useful. Whilst each subject has a page of percentages of those in employment, studying, and so on, it also has a page showing what types of jobs are held by those who are employed. These range from a variety of 'Professionals' down to 'Numerical Clerks and Cashiers', and 'Retail, Catering, Waiting and Bar Staff'. This last one doesn't sound too glamorous; you've just spent 3 years earning a degree and you're still working in a bar? More to the point, these jobs are going to be low paying, so hopefully they're a better indicator of average salary. Let's see:



There definitely seems to be a pattern there, and the correlation between the two variables is -0.88 - that's a pretty strong negative correlation. The higher the proportion of those employed in retail, the lower the average salary. Not a surprising result, but it's always worth checking these things.

Is this at all useful, though? The salary data are in the document; you just have to dig for them a bit more. There is, however, one thing we've not mentioned. Because the report doesn't give average salaries the same prominent treatment as the employment data, some numbers are, in fact, missing. Whilst we can see what proportion of history graduates are studying in the UK for a teaching qualification, we can't find their average salary six months after graduation (and the same goes for performing arts). However, because we've identified the percentage of those working in retail as a useful indicator of average salary, we can use this knowledge to predict the average salaries of history and performing arts graduates. (In statistics, we'd call our retail statistic a 'proxy' for salary.)

So how do we turn our retail employment data into a prediction of salary? If you read my previous post about the times goals are scored in football matches, you should already know where I'm going with this. If not, then go and read it now, and come back when you're ready to apologise for such an oversight.

So anyway, it's time for some more linear regression. We're looking to fit the model S = a + bR, where S is salary, and R is the percentage of those employed who are employed in retail. If we can estimate a and b, then we can use this equation to estimate S when we only know R, as is the case for history and performing arts degrees. We can also plot a cool line on our graph to show the trend. Running the numbers, we find a = 25014 and b = -468, and plotting the line this generates onto our graph gives us:



We can now either use our equation S = a + bR with a and b replaced with 25014 and -468, or read straight off the line on our graph. For both history and performing arts, retail employment was 17.4%, so plugging R = 17.4 into this equation gives S = 25014 - 468*17.4 = £16,870.80. Our model suggests that both subjects seem to lead to (relatively) low average salaries, something which would not have been easy to discern from the report alone.
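If you'd like to try this at home, the sketch below shows one way the fit and the prediction might be done in Python. It assumes an ordinary least squares fit, which is the usual meaning of 'linear regression' but isn't spelled out above. The names `fit_line` and `predict_salary` are my own, and the toy points are invented so that they lie exactly on a known line; they are not the report's data. Only the final step uses the rounded a and b quoted in this post.

```python
def fit_line(rs, ss):
    """Ordinary least squares fit of S = a + b*R; returns (a, b)."""
    n = len(rs)
    if n != len(ss) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_r = sum(rs) / n
    mean_s = sum(ss) / n
    # Spread of R about its mean, and co-variation of R with S.
    s_rr = sum((r - mean_r) ** 2 for r in rs)
    s_rs = sum((r - mean_r) * (s - mean_s) for r, s in zip(rs, ss))
    if s_rr == 0:
        raise ValueError("cannot fit a slope when every R value is the same")
    b = s_rs / s_rr            # slope
    a = mean_s - b * mean_r    # intercept: the line passes through the means
    return a, b

def predict_salary(retail_pct, a, b):
    """Estimate salary S from retail percentage R using S = a + b*R."""
    return a + b * retail_pct

# Made-up points lying exactly on S = 25000 - 500*R (NOT the report's data),
# used only to check that the fit recovers a known line.
toy_a, toy_b = fit_line([0, 5, 10, 20], [25000, 22500, 20000, 15000])

# Prediction for R = 17.4 using the rounded coefficients quoted in the post.
estimate = predict_salary(17.4, a=25014, b=-468)
print(f"Estimated salary: £{estimate:,.2f}")
```

Because a and b are quoted to the nearest whole number, any prediction made this way is only as precise as that rounding allows.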

Alas, this all assumes our model is accurate, and with a relatively small number of observations I wouldn't be inclined to place too much confidence in these conclusions. Here I've taken a single report to base rather a lot of analysis on. However, it does illustrate a couple of interesting points. Firstly, mere 'employability' figures seem a rather dubious metric on which to base the value of a degree. Perhaps more surprisingly, unemployment doesn't seem to be a particularly good one either, at least in terms of indicating average salary. Whilst this report did have salary data in it, they weren't as clearly laid out as the other data, and were in fact missing for some subjects. This has allowed us to demonstrate how you can use another variable (if you think it's a good enough surrogate) to estimate missing data. Whilst for this particular problem you're probably better off just trying to hunt down the data you want in another report, our way is clearly much more fun.

1 comment:

  1. This is interesting. These figures are always manipulated, depending on the intended message - I remember repeatedly being shown (in my first year, I think) that behind medicine and something else predictable, German was the third best for employability!
