Caveat Emptor

The opinions expressed on this page are mine alone. Any similarities to the views of my employer are completely coincidental.

Monday 30 November 2015

In Praise of Brian Glanville

In one way I owe Brian Glanville a great deal. Between the ages of 11 and 12 I more or less stopped reading. Up to then I had consumed the usual diet of Enid Blyton adventures and Anthony Buckeridge prep school stories. But then I quit, unless you consider that reading means the keen consumption, whenever I could get them,  of Commando and Battle Picture Library strip comic books. "This is where you get yours Fritz! Achtung! For you ze var is ofer Tommy". I've blogged before about the odd assumptions that secondary school teachers at that time made about our reading tastes. But Brian Glanville came to my rescue.

For reasons that are obscure to me, and are perhaps no more profound than my parents' recognition of my all-consuming and, given my lack of talent, wildly unrealistic fantasy that I was going to become a professional football player, I was given a copy of Glanville's novel for teenage boys Goalkeepers are Different. I read it immediately. Twice. It is, though obviously I wouldn't have put it like that at the time, a Bildungsroman and, crucially, it was about people living now, who weren't completely different from myself. Somehow it kick-started me back into a reading habit that led to John Wyndham and a love of science-fiction. OK, it wasn't Jane Austen, but it was better than nothing.

For 40 years I had more or less forgotten Brian Glanville. I knew he was considered one of the best journalists writing about football, but I had no idea he had written novels for adults. Then a few weeks ago I acquired a copy of his long out of print 1974 novel The Comic, republished a decade or so ago in paperback by Smaller Sky Books. It is actually a very fine piece of work, a first person narrative told from the point of view of an over the hill comic sitting in rehab after some kind of breakdown.

The ingredients are all pretty standard, pick and mix  the life story of Max Miller, Tony Hancock, Max Wall and tens of others, but it is done with style and sensitivity. And it pretty much gets to the heart of the love-hate relationship between the stand-up comedian, their audience, their family, the agents, and the promoters. In the end the protagonist finds some sort of redemption and insight into their own lonely predicament through taking on a straight role in regular theater. 

One man, in the spotlight, trying to make other people laugh by being something he is not; living and dying by the reaction to his last performance. There is something in it that in etiolated form is vaguely reminiscent of  being a university lecturer. 

Smaller Sky Books, by the way, seems to be owned by John Wain's son.

Wednesday 25 November 2015

Where we went wrong

If you are concerned about the future of higher education in the UK and you only have time to read one thing today, this week, this year, then you should read this interview with David Colquhoun. (And a bonus is that he has sensible and informed things to say about p-values.)

Tuesday 17 November 2015

Hollow Point

So Jezza is getting it in the neck again, not least from his own party, over his response to a question about "shoot to kill". It has to be said that he didn't play the ball very well and in hindsight he was naive not to ask for the question to be posed in a much clearer way so that he could give a more precise answer. 

It's now possible to portray him as believing that the security services should not try to kill Kalashnikov toting lunatics who are picking off passers by. If he meant that, then clearly he can't be Prime Minister and shouldn't be leader of the opposition. But I don't think that is what he meant at all. 

Actually he is right to say that we don't want a "shoot to kill" policy if in practice it is equivalent to the set of "procedures" that led to the killing of Jean Charles de Menezes. In other words poorly controlled armed police, desperate for a result, running around our large cities targeting ordinary people going about their business because they think they look a little suspicious. And then lying about what they have done.

Jezza should be thinking twice before he speaks and we should be careful about what we wish for. Next time whose son or daughter will it be that gets the hollow point?

Friday 13 November 2015

Death of the random sample greatly exaggerated

We are told over and over again that there is a coming crisis in empirical sociology and that one of the victims of this will be the face-to-face social survey with respondents selected by probability sampling methods. To be sure there are increasingly big challenges involved in collecting data in this way - response rates more or less everywhere have plummeted over the last 20 years - but there is still life in the old dog. And today's Guardian has an encouraging report about the success of the British Election Study's post-election data collection compared to the pre-election polls (including their own pre-election panel). I'll link to the BES's own blog post on this rather than the Guardian's story because it contains much more detail and because they deserve the traffic!

Shout out too for one of our ex-students Jon Mellon who is behind a lot of the work reported there.

The lesson seems to be that if you actually care whether the results of your research bear some relationship to reality as opposed to only caring about creating a big media splash, then you have to spend the money  to select respondents at random and then make the effort to pursue them vigorously. Because, at least for some questions, relying on heavily self-selected respondents is like pissing in the wind: unless you are very agile you are going to get your feet wet.
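For anyone who wants to see the point about self-selection in numbers, here is a minimal simulation sketch. Everything in it is made up for illustration: the population, the 40% true share, and the assumption that people holding the opinion are twice as likely to volunteer. It is not a model of the BES or of any actual poll.

```python
import random

random.seed(7)

# Hypothetical population: 1 = holds some opinion, 0 = does not; true share is 40%.
population = [1] * 40_000 + [0] * 60_000

# Probability sample: every member has the same chance of selection.
random_sample = random.sample(population, 2000)

# Self-selected sample: assume (for illustration only) that holders of the
# opinion are twice as likely to volunteer as everyone else.
volunteers = [p for p in population if random.random() < (0.04 if p else 0.02)]

true_share = sum(population) / len(population)
random_estimate = sum(random_sample) / len(random_sample)
volunteer_estimate = sum(volunteers) / len(volunteers)
print(true_share, round(random_estimate, 3), round(volunteer_estimate, 3))
```

Under these assumptions the random sample lands close to the true share while the volunteer sample overshoots it badly, and collecting more volunteers would not help.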

Thursday 12 November 2015

The Comedians

The important things to be done this week disposed of I sat down last night and started to read Social Class in the 21st Century. I wish I hadn't. I managed  the introduction but I now have doubts as to whether I'll be able to get to the end of the book without losing the will to live. Life really is too short for this sort of masochism.

Where to start? Problem number one is the word salad. I wonder what the Penguin editor was doing? Certainly not making sure that the prose always makes sense. They wouldn't need any specialist scientific knowledge to do that, so I  guess  they just didn't care as long as there were actual words on the page and units shifted off the shelf.

What should have happened is something like this:

Editor: Hi guys, great draft, just a few changes I'd like to run past you.

GBCS team: Er... OK, this won't take long will it? We're due at the BBC in half an hour.

Editor: Relax, I've booked you into Fawlty Towers for a week or two.

GBCS team: [Looks of uncomprehending astonishment]

Editor: Let's get going. You say on page 4 that: "Understanding class as based on these three capitals allows us to understand how growing economic inequality is also associated with growing class inequality between the top and the bottom." Nice sentence. Just one small problem. How can you say anything about, let alone understand,  "growing class inequality" when you only have information about one point in time? If you guys have solved that problem then you should patent it.

GBCS team: [Nervous smiles, rolling of eyes and black looks] Sure, whatever, only we are rather busy...

Editor: And another thing, on page 7 you write:  "Scientific experiments are normally expected to stand back from the research they are conducting in order to provide distanced and 'objective' results, for instance using randomized controlled tests when comparing which medical interventions are effective. However, in the case of the GBCS, we could not do this. Interests in class are themselves so highly loaded that if we stand back, then we miss the energies, intensities, but also the hostility and insecurity that are bound up with class."

Now then, let's look at that first sentence. It is usually the human being ie the experimenter, rather than the experiment itself which, as you so charmingly put it,  is "expected to stand back". But that is a mere bagatelle, a slip of the pen and easily corrected.

 But then you go on to put 'objective' inside those scare marks. What are you implying exactly? Shouldn't you spell it out? It looks like you are casting aspersions but don't have the courage to say explicitly what you mean. Do you think objectivity is a bad thing, or simply an impossible thing? Don't you owe it to your readers to be straight and tell them exactly what you mean? 

Let me put it this way, if someone wrote: "Professor X the renowned 'public intellectual'" you could, quite reasonably, interpret it as a disparaging remark suggesting doubt as to whether he really was an intellectual, or even scepticism about the pretensions of public intellectuals in general. What precise shade of meaning was intended would be difficult to pin down (which is why it was used). It would, in effect, be a lazy jibe which leaves the reader to fill in the gaps with a wink from the author. That's OK in boulevard journalism, but not in an academic book - even a trade book on an academic subject.

So  are you for objectivity? If you are not, why should anyone pay any more attention to you than to the bloke down the pub?

Oh, and one more thing, I believe you mean randomized controlled trials. A classical education at Balliol does instill  respect for precision in a chap you know [pursed lips, looking down nose].

GBCS team: [Impatient and unimpressed] How much longer is this going to take?

Editor: [Headmasterly] Well perhaps you should have taken a bit more time with your prep...Sit down, you're not going anywhere until I'm done.

I want to talk to you now about your figures and tables. I'm wondering why you felt it necessary to put a scale in units of 20 miles on your map (pp 8 Figure 1.1 or 0.1, note to copy editor: you're fired) of Great Britain? I mean, nobody is going to be using it as a route map to get them from say Basingstoke to Aberdeen and the distances involved are quite irrelevant to the point being made. And why is the legend on page 9 so incomprehensible? I'll give you that all becomes clear when you read the body of the text, but didn't they teach you in grad school that figures and tables should be understandable without having to refer to extraneous material?

And then there is the case of the mysterious column labels to Table 0.3 (pp 15). The first two I got, but the third had me baffled for a bit. What it says is: "% of the population who undertook the GBCS (2011 Census, England and Wales)". After a bit of thought I realized that what you meant was just  "% of the population of England and Wales". Your description is a) confusing and b) tells me you have the wrong reference population (though that is among  the least of your worries).

Now we come to column four and there you really got me. The label says "% of each group's graduates who undertook the GBCS". I was really struggling now to understand what the numbers meant until the penny dropped that what the column should have said was simply "% graduates" and that thus inter alia you were telling me that 71.9 percent of the Chinese respondents to the GBCS were graduates. At least that is what I think you meant to say, but who knows? Do you? Does it even matter except as an indicator of a rather, shall we say, casual attitude towards data, evidence, facts and that sort of thing?

GBCS team: This is  just petty nit picking. Most of our audience is innumerate anyway so what do they care? Let's face it, you can fool most of the people most of the time and we should know.

Editor: Quite. But don't you as Britain's foremost quantitative experts on the sociology of class care about the possible damage to your reputation? [mulls over ancient AJP Taylor quip about Professor Hugh Very Ropey] Let's treat that as a rhetorical question. Anyway, I read with great interest what you write on page 5.

"The current explosion of interest in questions of class came home to us in 2013 when we published findings from the BBC's Great British Class Survey, which was publicized by the media and provoked astonishing interest across the globe."

I was so interested in fact that I asked the ever obliging Maureen to do a little internet research for me  using Google Trends. I feel sure you approve of the method. First of all let's look at the frequency of searches originating in the UK using the words "social class". 


To interpret this correctly (I'm sure you are really keen on that) you have to know how Google normalizes the data. It starts by expressing the number of searches mentioning the search term as a proportion of all searches in a particular time period. It then sets the highest proportion to 100 and expresses the rest of the series relative to that.  The important point is that this is a measure of the relative salience of interest in the search term. Absolute interest in the term could be increasing even though relative salience is decreasing.
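A minimal sketch of the normalisation as just described, for anyone who wants to check the logic. The function name and the counts are hypothetical (Google does not publish the raw numbers), and the real procedure may involve sampling and rounding details not covered here.

```python
def relative_interest(term_searches, all_searches):
    """Rescale search counts in the two steps described above."""
    # Step 1: share of all searches in each period that mention the term.
    shares = [t / a for t, a in zip(term_searches, all_searches)]
    # Step 2: set the largest share to 100 and express the rest relative to it.
    peak = max(shares)
    return [round(100 * s / peak) for s in shares]

# Hypothetical counts: searches for the term rise every period,
# but total search volume rises faster.
term = [1000, 1200, 1500, 1800]
total = [100_000, 150_000, 250_000, 400_000]
index = relative_interest(term, total)
print(index)  # [100, 80, 60, 45]
```

In this toy series absolute searches for the term nearly double while the index falls by more than half, which is exactly the gap between absolute interest and relative salience.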

Still, for what it is worth, the trend in (relative) interest between 2014 and today is, if anything, downwards and more of an exhausted fart than an explosion. The exception to this trend is the spike in 2013 corresponding to the  publicity puff given to the initial GBCS paper  by the BBC. So what we learn is that if a massive public service broadcaster makes a news article out of something it has itself manufactured then you can get people interested for a short while. But then their interest returns to roughly the same level as before. Big whoop. Should anyone be surprised by that?

But hold on, this is a little unfair I hear you say. OK, though we can't get numbers on the absolute number of searches from Google we can try and contextualize interest in "social class". Let's compare searches on "social class" with three other probes into the Great British Public's interests. Maureen thought it would be a good idea to also search on "Britain's got Talent", "Manchester United" and another abstract idea "religion".
"Britain's got Talent" peaks and troughs depending on whether the show is running. Interest in "Manchester United" is high and has been gently climbing since 2013. "Religion", which is just as abstract an idea as "social class", has been pretty steady. The overall levels of all three make interest in "social class" look insignificant and the spike in 2013 look like a pimple. If interest in "Britain's got Talent" in 2009 were K2, interest in "social class" in 2013 would be Richmond Hill.

In the big picture the 'explosion' was more like the pricking of a small  balloon filled with hot air.

99 Düsenflieger
Jeder war ein großer Krieger
Hielten sich für Captain Kirk
Es gab ein großes Feuerwerk








Monday 9 November 2015

And which one would you take

So, if you were given that fateful Desert Island Discs choice of which one would you save from the wreck, what would you choose? There is enough consistency about my preferences to say that it would have to be a love song by Robert Burns. Can anyone come up with a better 4 lines than:

Not vernal showers to budding flowers
Not autumn to the farmer
So dear can be as thou to me
My fair, my lovely charmer

 Here is the definitive version.

Gutted 2

I mentioned last week my deep disappointment when Amazon proved themselves not up to the job of delivering my copy of Social Class in the 21st Century on the day of publication. It's only fair to report that they did manage to get it to me on Saturday & knocked off the price of delivery so credit where credit is due. In the end I got it for £6.29 (RRP £8.99). 

I can barely restrain myself from reading it all at once, but sadly I have a few more important things to do this week so I'm not sure when I'll get around to it. I couldn't resist though flicking through it  and within 30 seconds managed to spot my first howler. Unfortunately I don't have a scanner in my office so a webcam picture will have to suffice. Turn to page 82 where you will find the words: "Figure 2.2 shows clearly how all the different components of economic capital have similar age profiles."

Here is the figure they are talking about:


Well, similarity is, I suppose, in the eye of the beholder, but forsooth, perhaps you'd like to have another go at that one guys.

Perhaps you'd also like to have a go at explaining why the average 16 year old in the GBCS has £50,000 worth of savings (especially generous rich uncles?) and more than £200,000 worth of property? (we know that the £40,000 in income isn't  pocket-money but the joint household income which mostly isn't theirs to dispose of). Could we, perhaps, be mixing up a few different processes that we really shouldn't be confounding (like moving out of the family home)?

Take that nonsense away and do you really think these profiles are similar? Honestly? You do? OK er..

Houston, we have a problem...

Oh what a lovely war...

So there I was in the kitchen preparing the chicken chasseur, having a crafty sip of what I must say was an excellent 35 year old Spanish red and expecting to hear Desert Island Discs on Radio 4. No luck because it was trumped by the Remembrance ceremony from the Cenotaph. Fair enough, I'm as much in favour as anyone of  acknowledging the debt we owe to those we've put in the firing line. It would also be good if we did a better job of looking after them when they leave the armed forces, but that's another story.

But what left me momentarily speechless was the march being played by the military band. Which buffoon thought it was a good idea to play Oh what a lovely war? I'm sure the satirical intent was obvious in 1917 when it was a music hall hit, but at a solemn ceremony in 2015? I'm not sure that was a good place for postmodern irony.

Here's an even older song about one aspect of the military life.

Thursday 5 November 2015

The Strange Case of G. E. Bartlett - Part 2


This is the second part of this post. 

The question now arises as to whether there is anything else about Bartlett's data that suggests something untoward? Abernethy points out that the heaping is suspicious. I'm not entirely convinced it is, but let's run with a related idea. I would conjecture that when people "estimate" or make up data a real give away is that they tend to underestimate natural variability. That translates into a pretty straightforward prediction. If Bartlett's numbers aren't entirely kosher then the residual variation from his observations should be smaller than the residual variation from all the other interviewers.

There is a simple test for this. In effect we estimate a regression both for the level of income and for its variance. To keep things simple and feasible the only predictor I use for the latter is whether or not the observation is attributable to Bartlett. Bruce Western and Deirdre Bloome have a nice paper on how to do this and, more importantly, some Stata code to get the job done. I use their two-step maximum-likelihood method - in effect an iterated gamma regression for the variance. It's also possible to do this kind of thing by REML with Stata's Mixed procedure; however, I lost patience waiting for the full model to converge and gave up. With simpler, less heavily parameterized models the estimates point in the same direction though.
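To make the two-step idea concrete, here is a rough pure-Python sketch on simulated data. It is not the Stata code mentioned above and not the model I actually fitted: the mean equation has just one covariate plus the dummy, the function names are my own, and the data are invented. One simplification is worth flagging: with a single dummy in the variance equation, the gamma regression with a log link is saturated, so its fitted values are just the two group means of the squared residuals and no numerical optimiser is needed.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def wls(rows, y, w):
    """Weighted least squares via the normal equations."""
    k = len(rows[0])
    xtwx = [[sum(wi * r[i] * r[j] for r, wi in zip(rows, w)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(wi * r[i] * yi for r, yi, wi in zip(rows, y, w)) for i in range(k)]
    return solve(xtwx, xtwy)

def variance_function_regression(x, d, y, iterations=10):
    """Alternate between a weighted mean regression and a variance regression."""
    rows = [[1.0, xi, float(di)] for xi, di in zip(x, d)]
    w = [1.0] * len(y)
    for _ in range(iterations):
        beta = wls(rows, y, w)
        r2 = [(yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y)]
        # Gamma regression (log link) of r2 on one dummy: the model is saturated,
        # so the fitted variances are the group means of the squared residuals.
        s2 = {g: sum(v for v, di in zip(r2, d) if di == g) / d.count(g)
              for g in (0, 1)}
        w = [1.0 / s2[di] for di in d]
    lam = math.log(s2[1]) - math.log(s2[0])  # log variance ratio for the dummy
    return beta, lam

# Simulated data (all values hypothetical): residual SD is 10 for the flagged
# interviewer and 12 for everyone else, so the true lambda is log(100/144).
random.seed(42)
n = 4000
d = [1 if random.random() < 0.2 else 0 for _ in range(n)]
x = [random.uniform(14, 65) for _ in range(n)]
y = [30 + 0.5 * xi + 2.5 * di + random.gauss(0, 10 if di else 12)
     for xi, di in zip(x, d)]
beta, lam = variance_function_regression(x, d, y)
print(round(lam, 2))
```

A negative value of `lam` means the flagged interviewer's residuals are less variable than everyone else's, which is the pattern the "estimating" conjecture predicts.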

So here are the results. Same model for the means as in Part 1, but now with an extra equation for the residual variance. If Bartlett was "estimating" we would expect the variance of his observations  to be smaller than the variance of the observations generated by the other interviewers, in other words the coefficient for the Bartlett dummy should be negative. And this is indeed what we find (λ =-.14, t = 4.52). 

Though this proves nothing definite, Abernethy's case seems to gain some strength.

At the beginning of Part 1 I mentioned a detective story, so for those who are really interested in that rather than statistical games with 80 year old data, here it is. Abernethy notes that "Little is known of G. E. Bartlett..." True, but after a bit of spade work I can make a conjecture as to who he was. The evidence is circumstantial, but taken together is, I think, quite convincing. If I'm right it also may explain how he was able to carry out his prodigious interviewing feat.

Using a well-known genealogy site I was able to look through all the Bartletts in the 1929 London Electoral Register. It turns out that there is only one with the initials G. E. - George Edwin Bartlett. It's easy to find George Edwin in the 1911 Census. He is living at 32 Netherfold Road, Clapham, SW with his wife Sarah Louise, two children and a domestic servant. The most important piece of information is that he is an LCC Attendance Officer, in other words somebody employed to make sure that kids go to school. This is significant because we know that a lot of the NSLLL data was collected by school attendance officers and this is the thing that tips the balance of evidence towards our man.

George Edwin Bartlett was born in Brighton in 1865 the son of a plasterer and seems to have been an attendance officer at least from the final years of the 1890s. There is in fact a reference to him in a London School Board document of 1898.  At the 1881 Census he is recorded as living with his parents in Clerkenwell and his occupation is given  as Confectioners Errand Boy. By 1891 he was lodging in Islington and in the census he is recorded as G. E. bartlett with the occupation Confectioner's Assistant. In 1901 we know from the electoral register that he was living in Lavender Hill in one furnished room. The Census has him as a visitor at another address and tells us that he is a School Attendance Officer and  a widower. By 1918 he has remarried and is living at 30 Union Grove, Clapham which is where we find him in 1929. He died in 1935 aged 70.

We don't actually know what Bartlett was doing in 1929, but at 64 it is not impossible that he had retired and therefore had a lot of time on his hands. As a School Attendance Officer he was in a sense a professional nosey parker and would have known the circumstances of many of the families on his patch pretty well. More than 30 years of working for the LCC in this capacity may well have acquainted him with a very large number of people. A retired man who was still reasonably vigorous could easily do 20 of the rather minimalist interviews required of him during the day, especially if he was willing to take his data from the most easily accessible source, which would have been the wives of the men who were away at work. This of course raises the possibility that the estimating and rounding were not done by Bartlett and that he was merely faithfully recording what he was told by the wives about their husbands' earnings.

At the end of the day the important question, as Abernethy strongly points out, is should we trust the NSLLL data? I think one thing is clear: no modern survey organization would let one interviewer collect 20% of the data. Even if the "bias" attributable to that individual is small in percentage terms, the sheer weight of their contribution might be important for some questions. It is, of course, important to make that judgement within the context of a particular question. To take a modern example, we know that the earnings data from the modern Labour Force Survey, though biased, are good enough for some broad brush stroke comparisons. However, you would be very ill advised to use them for questions which rely on information about the tails of the distribution, i.e. very high or very low earners.

As it happens for what I was interested in - the rank order of  occupational average earnings  - it really makes very little difference whether you include or exclude Bartlett's contribution. The Pearson correlation between the occupation averages (actually the shrunken level 2 residuals) with and without Bartlett is 0.98 and that is good enough for me.



Gutted

My day has been utterly ruined. Those of you who have been paying attention will know that today is a very important day for British social science for it is the publication day of what Nicola Lacey, whose expertise is in criminal law and legal theory, has called "a magisterial new analysis of class" - Savage et al's Social Class in the 21st Century. Naturally I placed my advance order with Amazon so that I would receive this instant classic on the very day of its publication. Imagine my horror when I looked through my emails this morning and found this:

We regret to inform you that the following items have been delayed:

  Savage, Mike "Social Class in the 21st Century (Pelican Introduction)"
    Estimated arrival date: December 19 2015

I guess I'm just going to have to bite the bullet and wait until the Christmas vacation unless someone wants to send me a review copy.

Assuming that everyone is in the same boat you can get an inkling of the content from the audio of the launch event that took place last Monday.

The best part of the whole thing is around 1:03:15 when the red haired lady says in reply to a question about the policy conclusions of the book "...for me now really it's just end capitalism  [wild applause from the floor]." At least she says what she thinks without equivocation. Perhaps though she might appreciate the sentiment behind this.

Wednesday 4 November 2015

The Strange Case of G. E. Bartlett - Part 1

This one has two of my favourite ingredients, numbers and a detective story. The time is 1929, the place is the London School of Economics. Hubert Llewellyn Smith is directing the New Survey of London Life and Labour (NSLLL) and Arthur Lyon Bowley is in charge of sampling London households. In the field are more than 150 interviewers collecting information on household income.

Fast forward to 2015. I want to categorize the occupations recorded in the 1931 Census. The NSLLL contains information on the occupation and earnings of the household residents. I figure it could give me some guidance about the similarities between occupations. The NSLLL was probably the largest social survey carried out in Britain during the inter-war period and more to the point it is, to my knowledge, the only one that has (mostly) been digitised. Even more to the point, I happen to have a copy of it on my hard disk.

The question is: can I trust these data? Why not? I hear you ask. Well, mainly because of the activities of one of the interviewers, a certain G. E. Bartlett, who appears to be responsible for conducting not far short of 20% of all the interviews. Bartlett regularly clocked up over 400 interviews a month and in October 1930 managed 600, 20 a day if he worked 7 days per week. Strictly speaking we don't know for sure that Bartlett was a 'he' but the evidence on the surviving handwritten cards suggests it was so. He certainly had an incentive. He was making more than a shilling per interview, and £30 for a month's work was roughly 3 times the median working-class earnings level.
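The arithmetic behind those figures is easy to check. In the sketch below the 30 working days and the flat rate of exactly one shilling per interview are my simplifying assumptions (the rate was "more than a shilling", so the pay figure is a floor).

```python
SHILLINGS_PER_POUND = 20

interviews_in_month = 600      # October 1930, as reported above
days_worked = 30               # assumption: roughly every day of the month
shillings_per_interview = 1    # assumption: a floor, the true rate was a little higher

per_day = interviews_in_month / days_worked
monthly_pay_pounds = interviews_in_month * shillings_per_interview / SHILLINGS_PER_POUND
print(per_day, monthly_pay_pounds)  # 20.0 30.0
```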

Bartlett's Stakhanovite workrate and, more importantly, seeming peculiarities in portions of the data he recorded certainly made Simon Abernethy - a Cambridge history postgraduate - suspicious. In a very interesting paper he argues, very plausibly, that though it is unlikely Bartlett literally sat at home and made the data up, the evidence is consistent with him estimating a large portion of the earnings data he was supposed to be collecting. On first reading I found Abernethy's account pretty convincing. But then I started to wonder.

Admittedly things look bad for Bartlett, but was 600 interviews a month as implausible as it sounded? Bear in mind these were nothing like modern survey interviews. Very little data was actually collected - the interview probably lasted no more than 10-15 minutes and the addresses were heavily clustered. Most of Bartlett's households were in Battersea, Camberwell, Lambeth, Southwark, St Pancras and Wandsworth and the sampling fraction was about 1 in 50. Someone who knew the areas well could probably make fairly rapid progress. Perhaps 20 interviews a day for someone working on it full-time was not as surprising as it seemed.

Then I had another thought. If you believe that someone is guilty and that proof of that guilt is to be found in unusual data patterns, then unless you carefully specify before peeking at the data which unusual patterns you are looking for, you are bound to turn up something. There are a very large number of observations in the NSLLL and plenty of scope for sub-group analysis. Seek and ye will find. We all know about the "look elsewhere effect", don't we? Perhaps what Abernethy finds is nothing more than extreme values that are due to chance (regardless, so to speak, of what the p-values say). Indeed, what he does is look at small subsets of the data - particular occupations for example - and show that data collected by Bartlett differ in some ways from data collected by the rest of the interviewers.
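The general worry can be illustrated with a small simulation. This is a sketch of the look elsewhere effect in the abstract, not a re-analysis of anyone's paper: the 200 subgroups, the group sizes and the earnings distribution are all invented, and by construction there is no real difference between the flagged interviewer and the rest.

```python
import random
import statistics

random.seed(1)

def z_for_difference(a, b):
    """Two-sample z statistic for a difference in means (large-sample approximation)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se

n_subgroups = 200          # hypothetical number of occupations examined
false_alarms = 0
for _ in range(n_subgroups):
    # Both sets of interviews come from the SAME distribution: no real difference.
    flagged = [random.gauss(50, 15) for _ in range(40)]
    others = [random.gauss(50, 15) for _ in range(160)]
    if abs(z_for_difference(flagged, others)) > 1.96:
        false_alarms += 1

print(false_alarms, "of", n_subgroups, "subgroups look 'significant' by chance alone")
```

With a conventional 5% threshold you would expect something like 10 of the 200 subgroups to throw up a "significant" difference even though nothing is going on, which is why the patterns to look for need to be specified in advance.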

But what happens if we look at all of the data and, to provide a bit of comparison, distinguish from the rest the second, third and fourth most industrious interviewers? J. Hopker, J. Ludgate and A. N. Winter, though not in Bartlett's league, were each responsible for surveying more than 800 households. Is there any evidence that they produced unusual results too?

Time for some data analysis. I'm working with the public release version of the data which is restricted to the 'working class' households (Abernethy has also digitized a large portion of the so-called 'middle class cards' but these data are not yet in the public domain). My sample consists of everyone who is either employed or self-employed and aged over 13. Observations are clustered within households and the primary variable of interest is weekly earnings expressed as shillings per week (12 pence to a shilling, 20 shillings to a pound). I exclude a few cases where the earnings that are reported are in some sense 'joint' and not attributable to a single earner.

For all sorts of reasons interviewers differed in the level of earnings they reported. About 10 percent of the total variation is between-interviewer variation. To put that in context the table that follows also gives numbers for some other salient sources of variation.
The amount of variation attributable to interviewers is roughly  similar to the amount attributable to household membership and very roughly double that attributable to the fact that people with similar incomes tend to live in proximity to each other. The heavy hitter here though is occupation - which is good for me given that this is what first brought me to these data. Of course interviewer, household, geographical area and occupation are confounded with each other so we can't read these numbers as unique contributions to the total amount of earnings variation. Interviewer "effects" will undoubtedly shrink once we control for other sources of variation.
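For readers unfamiliar with this kind of decomposition, here is a minimal sketch of one simple way to compute a "share of variation between groups": the between-group sum of squares divided by the total sum of squares. This is an assumption about method made for illustration; the numbers in the table may come from a different estimator (variance components, for instance), and the toy data and interviewer labels are invented.

```python
from collections import defaultdict
from statistics import mean

def between_group_share(values, groups):
    """Share of the total sum of squares that lies between group means."""
    grand = mean(values)
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    total_ss = sum((v - grand) ** 2 for v in values)
    between_ss = sum(len(vs) * (mean(vs) - grand) ** 2 for vs in by_group.values())
    return between_ss / total_ss

# Toy weekly earnings in shillings, tagged with a hypothetical interviewer id.
earnings = [40, 50, 60, 45, 55, 65]
interviewer = ["A", "A", "A", "B", "B", "B"]
share = between_group_share(earnings, interviewer)
print(round(share, 3))  # 0.086
```

Run separately for interviewer, household, area and occupation, a calculation like this gives one share per grouping, and because the groupings overlap those shares cannot simply be added up.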

But before we do that let's examine the data a little more closely. In the figure below I superimpose a histogram of the earnings information (in units of 1 shilling) collected by Bartlett (white bars with black borders) on top of the earnings information collected by all the other interviewers (in green).

The question is: are these distributions different? The answer is (obviously): yes. But so what? You wouldn't expect them to be exactly the same. They share some features and  differ in some ways.

 Actually the stand out feature is the heaping of observations on certain values, multiples of 20 shillings for instance.  Heaping is to be expected for at least 4 reasons. Firstly  it might reflect reality - employers paying an hourly rate calculated to deliver a nice round number for a standard working week. Secondly respondents may be rounding their actual wages up or down to a particularly salient value - say £3 a week. Thirdly, the interviewers might be rounding what they are told. Fourthly, the interviewers might be using their own estimates (perhaps based on good local knowledge) rather than actually asking the respondents about their earnings.
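A crude way to put a number on heaping is the share of reports that fall exactly on a round value. The sketch below uses multiples of 20 shillings as the heap points; the function name and the toy earnings lists are hypothetical and are not taken from the NSLLL.

```python
def heaping_share(earnings_shillings, step=20):
    """Fraction of reported earnings that fall exactly on a multiple of `step`."""
    on_heap = sum(1 for e in earnings_shillings if e % step == 0)
    return on_heap / len(earnings_shillings)

# Toy reports in shillings; the values are illustrative only.
bartlett = [60, 60, 55, 80, 60, 72, 40, 60]
others = [58, 60, 47, 80, 63, 52, 40, 71]
print(heaping_share(bartlett), heaping_share(others))  # 0.75 0.375
```

A higher share for one interviewer tells you that there is more heaping, but not which of the four mechanisms produced it.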

Any and all of these things could be happening. We don't know and it is very difficult to draw conclusions from just looking at the distributions. Where most of the interviewers heap, Bartlett also heaps. Sometimes he heaps a bit more, sometimes a bit less. A lot of his data is crowded into the 50-80 shillings range, which might be taken to suggest that something untoward was going on. It might also just reflect the fact that he interviewed in particular areas with particular concentrations of occupations that received similar wages. To get any further we need to impose more structure on the data.

The basic idea is to estimate some regressions that control for a lot of stuff. We could do this in a number of different ways but I'm going to keep it simple. The dependent variable is weekly income in shillings and I include fixed effects for the 433 occupational groups and the 36 areas. There is a dummy for gender and for whether the respondent is employed or self-employed. Hours of work are controlled as are age and age squared.  Finally I distinguish four interviewers (Bartlett, Hopker, Ludgate and Winter) and a residual group and the dummy indicators for these are interacted with gender. The estimated coefficients for this interaction are of central interest.

The table below gives the average deviation for each separately identified interviewer from the conditional mean earnings level recorded by the other interviewers taken as a group. This deviation is expressed separately for male and female wage earners. There are of course a number of different ways to parameterize this interaction, but for our purposes this seems to be the most enlightening.
What this suggests is that Bartlett's numbers  on average had working class respondents earning around 2s and 6d to 3 shillings more than the figures obtained by the majority of the interviewers. The difference between his male and female figure is statistically significant, but of little substantive importance. 2s and 6d is, as my grandmother would say, a lot of money if you don't have it but it's also just about 5% of the median working class wage and a somewhat lower percentage of the median male working class wage. If Bartlett was guessing or "estimating" he was, on the whole, actually doing a pretty good job! He was also not alone in his "inaccuracy".

 Hopker appears to have erred in the opposite direction, underestimating both male and female earnings while Ludgate, though he does a good job for the males, seems to find particularly well paid women. Of the 4 only Winter is, as it were, consistently on the money.

The point is I'm pretty sure that if I looked at the next 5 most prolific interviewers I could find differences of this magnitude  and probably also  if I continued looking all the way down to where the group size is so small that it would take truly massive differences to produce statistically "significant effects". My conjecture is that these results on their own don't reveal anything particularly unusual about Bartlett's  modus operandi.

In Part 2 I'll look at a different indicator of the unusualness of Bartlett's data - the variance - and I'll tell you who I think this man of mystery was and how he managed to do all that interviewing.