Tuesday, 3 August 2010

The BJS and Public Domain Data

It is desirable that data used to generate evidence in scholarly publications should be available for scrutiny by other interested scientists. We can all agree about the principle. How this can best be achieved is less clear, especially once one starts to consider other desiderata - such as safeguarding the right to privacy of subjects who in some cases may never have given explicit consent for their personal data to be used for the purposes of social research. 
Consider the situation in the UK for somebody publishing an empirical article in an academic journal. Anyone acquiring data from the UK Data Archive at the University of Essex, for instance, is required to sign an End User Licence which prohibits the distribution of Archive data to any third party who has not entered into an End User agreement with the Archive. In practice this is not very restrictive, as any member of a UK higher education institution can register as an End User and, without paying a fee, acquire the original data. Once you think about it, this minimal level of restriction is sensible. Data in free circulation has a tendency to 'mutate', and from the point of view of scientific integrity it is a good idea to encourage users to go back to the original source.
Access to some data is much more restricted. Take for instance the ONS Longitudinal Study (LS). This has been created from linked census records, birth and death registrations and cancer records. Use of it is free to UK academics, but there is an involved and rigorous process of project approval and the data can only be accessed from secure servers. Users are prohibited from distributing LS data to third parties. The LS is a very important source of information about social, demographic and epidemiological topics. However, the "subjects" have not given explicit permission for their personal information to be used for the purposes of research. They have given information about themselves to the state either because the law requires them to or because such information is collected as part of the state's routine administrative processes. In the circumstances it doesn't seem unreasonable to be careful about how and for what purposes these data are used.
Now consider another important source of social scientific data in the UK - the National Survey of Health and Development (NSHD), popularly known as the 1946 Birth Cohort Study. Though paid for out of the public purse, latterly by the Medical Research Council, this study is not yet fully in the public domain. There are a large number of people, myself included, who think that it should be. But there are legitimate concerns that completely unrestricted access to data of this sort - containing for instance very detailed medical information - compromises the guarantee of anonymity given to subjects and jeopardises their continued participation in the study. Clearly a case can be made, based on scientific interest and the ethical treatment of subjects, for having some controls on access to the raw data and for prohibiting unauthorized data dissemination.
So, what seems like a good idea, free and unrestricted access to data, is not as straightforward or desirable as you might think once you start to take seriously the rights of data providers and the unintended consequences of data mutation.
So why am I preaching this sermon? Because it has been drawn to my attention that one of the major British sociology journals - the British Journal of Sociology - appears to have refused to publish an article because the data used in it are not freely available to all researchers. 
The data in question are derived from the population registers that are maintained, for instance, in all of the Nordic countries. These typically allow linkage - through a unique person number - of a vast amount of information about citizens. As a social scientific and epidemiological resource these data are of enormous importance, and results from them are routinely published in leading disciplinary journals throughout the world. These data, though, are not collected for the purpose of carrying out social or medical research, and there are serious concerns about the threats posed by the linkage of administrative records to the ordinary citizen's right to privacy. For these reasons it is normally prohibited to export register data beyond the territorial boundaries of the state, and access is granted only after a specific project proposal has been vetted. Data users are normally not permitted to make or keep copies of the data and are, of course, forbidden to disseminate them to third parties. With differences of detail, the constraints that researchers work under are similar to those imposed on UK users of the LS.
Three things concern me. Firstly, if consistently applied, the BJS policy will exclude leading researchers and cutting-edge work from its pages. To me this seems perverse and very bad for British sociology. Secondly, the BJS's data dissemination policy is not itself in the public domain. You can find the current guidance for authors here. As of 03/08/2010 there is no mention of a data availability policy or of any specific requirement to deposit or disseminate data. In the case I have been told about, the issue of data availability was only raised with the author, by an editor, after the refereeing process was completed. If I were the author I think I would feel that I had had my time wasted and that I was being treated less than fairly. Thirdly, the issues at stake were raised with the editorial team by a member of the Editorial Board more than six weeks ago, together with a request that a clear statement of the BJS's data dissemination requirements be added to the notes of guidance for authors. I can't understand why this, as yet, hasn't been done. Those who provide the BJS's copy (for free) and, incidentally, generate enormous profits for the London School of Economics, deserve better treatment.