Caveat Emptor

The opinions expressed on this page are mine alone. Any similarities to the views of my employer are completely coincidental.

Thursday 28 February 2013

Oxford's Institutional Bias 2

I said I would post again about this so here it is. As I made clear in my last post I don't know what the RELEVANT facts of the matter are about ethnic group variation in access rates to undergraduate degrees at Oxford and I think it is in the public interest for the appropriate information to be made available. I'd go further than that and say that the University should be monitoring admissions, getting a competent person to analyze the data, and putting the results in the public domain. If something is wrong about the way we are doing things then we should know and we should put it right. If we don't, then we only have ourselves to blame if journalists have to resort to FOI requests and then tell only carefully selected bits of the story. In fact we haven't been entirely negligent - a few years ago the university put quite a bit of money into the Oxford Admissions Study which was conducted by some of my colleagues. It's a great shame that this initiative wasn't carried on.

OK, now to my comments on the Guardian story. As always the Devil is in the details, and the details are important because if we don't pay attention to them we can fail to learn what the data are telling us.

1) If we want to learn about the selection process that generated the outcomes, which I assume are accurately described, then we have to make a serious effort to model that process and estimate numbers which correspond to the behaviours that are elicited by the choices, costs, constraints and benefits actually confronting the people involved, i.e. students and admissions tutors. That means, amongst other things, taking account only of information available to the actors at the points at which their choices are made. So, for example, admissions tutors do not choose students on the basis of their actual A level grades. They do have other information that is correlated with grades, but there is also, inevitably, error. To make it concrete, I happened to get the highest A level grades that it was possible for somebody taking 3 A levels to get, but at the point of application to universities all I had was a clutch of very mediocre O levels and a good reference from my Head of Sixth Form, who took infinite pains in making careful personalized assessments of each of the pupils she wrote for (for which I will be eternally grateful). Despite her best efforts my O level performance would never have got me an Oxbridge offer in any subject. If I had applied and been rejected, would that have demonstrated that Oxbridge admissions tutors were prejudiced against children from the provincial lower middle classes? No, of course it wouldn't. That doesn't mean that they weren't (and aren't), but it can't count as evidence in favour of the hypothesis because there is a reasonable alternative: I simply hadn't achieved enough at age 16 and there were a very large number of much better qualified candidates. Yet if you control for my A level score it might look as though I'd been hard done by.

2) So controlling for GCSE performance is OK, as is controlling for predicted A level grade, but controlling for actual A level grade is not. Take a look at the Guardian data. Do you notice something odd? Where are the people who got at least one B at A level? It surely can't be the case that everyone who applies eventually gets at least 3 As. It would be a remarkable world in which there were no slip-ups. Would you imagine that there might be a correlation between GCSE score and not quite getting the grade required? In other words there is an odd kind of sample selection going on here with respect to what really matters, i.e. the information available ex ante.
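To see how this can bite, here is a toy simulation of points 1 and 2. Everything in it is invented for illustration: the two groups "X" and "Y", the cutoffs, the noise levels and the assumption that group Y's GCSE scores run half a standard deviation lower for the same underlying ability (late developers, as I was). It is not a model of the real admissions data. The offer rule looks only at the GCSE signal and is blind to group, yet comparing offer rates among those who later get top A level results shows a gap.

```python
import random


def simulate_offer_rates(n_per_group=20000, gcse_shift_y=-0.5, seed=0):
    """Offer rate among eventual top A level performers, by group.

    All parameters are hypothetical. Offers depend only on the GCSE
    signal (the ex ante information), never on group membership.
    """
    rng = random.Random(seed)
    offer_cutoff = 1.0      # assumed: offer if and only if GCSE signal >= cutoff
    top_grade_cutoff = 1.0  # assumed: threshold for a "top" A level result
    rates = {}
    for group, shift in (("X", 0.0), ("Y", gcse_shift_y)):
        top = 0
        top_with_offer = 0
        for _ in range(n_per_group):
            ability = rng.gauss(0.0, 1.0)
            # Ex ante signal seen by the tutor (noisy, shifted for group Y).
            gcse = ability + shift + rng.gauss(0.0, 0.5)
            # Ex post outcome, not available when the offer is decided.
            a_level = ability + rng.gauss(0.0, 0.5)
            offered = gcse >= offer_cutoff
            if a_level >= top_grade_cutoff:
                top += 1
                top_with_offer += offered
        rates[group] = top_with_offer / top
    return rates


rates = simulate_offer_rates()
for group, rate in rates.items():
    print(f"group {group}: offer rate among top A level performers = {rate:.2f}")
```

Within any given GCSE score the two groups are treated identically by construction, so the gap that appears after conditioning on actual A level results says nothing about prejudice in this toy world. Whether anything like this operates in the real data is exactly what we don't know.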

3) You can't arbitrarily ignore important features of the selection process, i.e. the fact that colleges play an important role in admissions, that college choice is largely up to the candidates and that candidates are differentially equipped to maximize their chance of entry by making canny choices about which colleges to choose. Now it may be that empirically it turns out that choice of college is irrelevant. If so, all well and good, we've found out something we didn't know, but we can't simply assume that or appeal to official rhetoric - they would say that, wouldn't they? If you want to understand the process you have to make some attempt to model the process, no matter how difficult that may be. If you think colleges are irrelevant then it is not unreasonable of me to demand that you show me that this is the case. If it is hard then you have to get somebody who is up to the job to extract the maximum possible information from the (imperfect) data to hand.

4) You shouldn't trawl around for "significant" differences. Looking at the Guardian data as a whole, what strikes me is that in very many subjects there are no differences worth talking about. In a few subjects there are some differences (but remember my caveats in points 1 & 2 above); medicine is one. Medicine is an important subject and I would in no way wish to avoid investigating what lies behind these numbers. One of the things, though, that lies behind significant findings is that if you trawl through enough comparisons you will, just by chance, find some in the direction you are looking for, and it is highly likely that the difference you find will be much bigger than the real difference, i.e. unintelligent data-mining runs the risk of exaggerating effect sizes, often quite considerably. Set your significance threshold at p = 0.05 and, in a single comparison where nothing is really going on, 5% of the time you will find a difference that isn't "really" there. Do this in the context of a big data dredge and the probability of turning up at least one such spurious difference is much greater.
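The arithmetic behind that last sentence is simple enough to sketch. Assuming, purely for illustration, that the comparisons are independent and that there is no real difference in any of them, the chance of at least one "significant" result among k comparisons is 1 - (1 - 0.05)^k. The function name and the numbers of comparisons below are hypothetical and not taken from the Guardian data.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive in n_tests independent
    comparisons, each tested at level alpha, when no real effect exists."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Hypothetical numbers of subject-by-subject comparisons.
for n_tests in (1, 5, 20, 50):
    rate = familywise_error_rate(0.05, n_tests)
    print(f"{n_tests:>2} comparisons: P(at least one false positive) = {rate:.2f}")
```

With 20 comparisons the figure is already about 64%. Real comparisons across subjects will not be exactly independent, so treat this as a rough guide to the size of the problem and not a correction to apply.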

So what's the way forward? The obvious answer is that the University should put enough of the right data in the public domain so that a proper analysis can be done on it. Important questions have, quite properly, been raised in the Guardian but we are very far from being in a position to give proper answers to them and the Guardian's own account is far from adequate. That's not to say that when all is said and done there might not be something to it. The fact is that at the moment we just don't know and we shouldn't be leaping to conclusions, particularly when those conclusions seem to imply inappropriate behaviour on the part of some admissions tutors, when all that has to be done is establish properly what the facts of the matter are. Keep the hair shirts on standby, we might yet need them, but first question all the suspects before you name the accused.
