Interview
“More must be done to ensure that the investment that we are making in research is really delivering what we need”
James Wilsdon, professor of research policy at the University of Sheffield, has led studies in the UK and abroad analysing the use and effects of metrics in research assessment and management. He is vice-chair of the International Network for Government Science Advice (INGSA), a collaborative platform across national and international science advisory organisations that fosters evidence-informed policy formation.
How can funders (agencies, governments, foundations, etc.) use metrics to assess the excellence and impact of research? Do we agree on what excellence means and what impacts are desirable?
This is an important part of the debate. And let me put the word “excellence” in quotes because sometimes it’s a problematic term. It obscures as much as it illuminates in terms of what we are valuing in the system. Balancing conventional criteria of research excellence (primarily assessed through citations, patents, etc.) with the growing emphasis on research having broader impacts on society and the economy creates a need for more responsible use of metrics.
Broadening the range of metrics we use and accompanying them with sensible, qualitative peer review can be very helpful. Some altmetrics, for example, can be a way of recognising citations by non-academic bodies. If you treat these altmetrics as an important part of your assessment process, it will encourage academics to engage with audiences beyond the academic community.
Are altmetrics going to change the way we assess research outcomes?
A lot of the focus around altmetrics has been tilted towards social media. And that’s interesting, but it’s a rather superficial proxy for really understanding whether research is having an impact on important societal problems such as changing the practice of the criminal justice system. What’s happening in social media can give us some useful information, but I think it would be dangerous to link funding to those indicators.
In general, we are at a very early stage in developing effective indicators for societal impacts. There’s room for developing newer and more helpful impact metrics.
What is the Metric Tide?
The Metric Tide was the final report of an independent review of the role of metrics and quantitative indicators in the management and assessment of UK research. It was commissioned by the UK government and I chaired it. I worked with a group of 12 experts – scientists, social scientists, bibliometricians, research funders – for about 18 months, and the report was published in the summer of 2015. At that time, there was growing discussion across the global research community about metrics and their uses. DORA (the San Francisco Declaration on Research Assessment) and the Leiden Manifesto are two initiatives that had taken this discussion forward.
Why did the UK government commission this work?
The narrow, specific reason was linked to the Research Excellence Framework (REF). Every 5-6 years, the REF assesses the UK national research system based on peer-review subject panels and allocates about a third of the public research budget across universities and across all disciplinary areas. In 2014, the government wanted to look at whether the whole exercise could be done in a more efficient way by just using metrics, so the Metric Tide review was initiated.
The broader aspect was the greater significance attached to quantitative indicators and metrics of various kinds in the management of research, in the allocation of funding, and in the assessment of individuals and research groups in universities. We wanted to look at that broad phenomenon in a more holistic way and see what this “rising tide” of metrics means for research culture, research practice, and the way we govern and direct our science and research system. The report also generated interest outside the UK.
One of the conclusions of the report is that we need more metrics, but they have to be responsible. What does “responsible metrics” mean?
We came up with this term “responsible metrics” to convey both the possibilities and the pitfalls of metrics usage. We are all aware of the many instances in which certain indicators get used inappropriately in research assessment and management processes. The most obvious and egregious example is the misuse of journal impact factors. We know from a large volume of empirical work that the correlations between the quality of an individual paper and the impact factor of the journal it was published in are poor. And yet we constantly see impact factors used in inappropriate ways.
Responsible metrics are metrics used in a sensible, robust way; used like that, they can be a valuable part of managing the research system. But we need to be very alert to the context in which they are used.
What should responsible metrics look like?
Data should be as robust as possible. We want to make sure there is enough coverage of the different disciplines and that different research outcomes are accounted for. And we need humility in the way we use metrics: they should support but not supplant peer evaluation. Academic research is a complicated endeavour by its nature and you can achieve a more nuanced assessment of research with some combination of metrics and peer review.
In addition, there are other factors such as transparency, i.e. that those being assessed understand the nature of the measurements and indicators being used to assess their work. And we also need diversity: a diverse set of indicators and research outcomes – from papers to exhibitions to data sets – but also recognition of different career paths.
What would be good examples of non-responsible metrics versus responsible metrics?
An example of a bad practice could be the ResearchGate score. The website ResearchGate is used by many academics as a convenient way to share their work with peers. The site also awards you a score, but it is very unclear what algorithm the score is based on. That is not a responsible metric. The other obvious example would be many international university and research rankings, which are methodologically and statistically dubious.
An example of a good practice in recruitment or assessment of individuals, e.g. for promotion, would be to ask researchers to highlight in a narrative way the two or three contributions to research that they consider to be the most important in their career to date and why. And then the panel can read that work. It doesn’t matter what journals the articles were published in. You are bringing more qualitative, evaluative dimensions to that process.
What about the concern that peer review might be very vulnerable to intrinsic and systemic biases?
Ideally, you need a mix of quantitative indicators and qualitative expert judgement. Peer review is not perfect; we are all aware of its weaknesses. But, at the same time, it’s rather like democracy: it’s the least bad system we have developed as the academic community to govern ourselves.
Peer review, when it’s done well, is formative as well as summative, i.e. we are not just trying to evaluate but also to improve the quality of each other’s work, whereas metrics are most commonly only summative.
But it’s true that metrics can also act as a more objective and positive countervailing force in places with a culture of patronage or nepotism or sexism. And this would be, in fact, a responsible use of metrics.
Are you seeing any rapid change following the Metric Tide and other related initiatives?
There has definitely been very visible and interesting discussion, and a resulting awareness, about this topic over the last 5 or 6 years. And that’s to be welcomed. But it would be naïve to say that the tide has completely turned. We are in a period of transition, of contestation and debate. I expect it will take some time for the different actors in the system to align and take action. And it is by no means certain that it will all resolve in the optimal way.
Do you agree with some of the criticism that when valuing impacts sometimes we confuse the needs of society with the needs of industry?
I would always include business and commercial uptake of research as part of any assessment of impact. Working with business can be as important as working with a government or with a community. In essence, research assessment and management should largely be neutral on this. We want a research system that contributes across the board, one that has lots of impacts in different places and in different sectors.
I don’t think the main tension is between research engagement with business and engagement with other parts of society. The main problem is still engagement with society in any form. The predominant focus in academia is still on publishing work that is read by ten other experts and never has any impact on anything. That’s the battle.
What about scientists complaining about a system that is trying to micromanage them? Is the new assessment system we are promoting serving us better than the old system?
We don’t want all academics to be writing only for other academics, but that should still be an important part of their work. Discovery-led science is still something we want to support. What we are talking about is finding the right balance. And across Europe, in the US and in many other countries, we’re seeing a shift towards more applied and impact-oriented research.
If we tip the balance so far that suddenly there’s not enough discovery-led science taking place, it would be damaging. To keep that balance right is a perennial question of research policy. There’s no correct answer.
Any guidelines on how to keep that balance?
If we take a step back and we consider the scale of the scientific academic enterprise, the extent to which it has grown over the past 30 years, I think it’s right to ask the question: are we seeing a corresponding increase in the contribution of that activity to meeting the really pressing needs of our economy and society? That’s the big policy funding question. Of course, people are going to be nervous and resistant to changing the incentive system, but more must be done to ensure that the investment that we, as countries, are making in science and research is really delivering what we need.
So, what are the tools and expertise that policy makers need to do that job?
This brings us back to research on evaluation and metrics. A good system will be reformative: it will want to understand the different ways in which it is contributing, over different timeframes and in different sectors. I think the REF is a good approach. It can be improved, but it’s an attempt to do what we are discussing.
Research on research, or the science of science, is a growing field to address some of these challenges. It is not only about assessment. It is also driven by concerns over scientific practice, reproducibility, integrity, perverse incentives and broader research culture – all topics that are higher on the agenda now than they were 5-10 years ago.
The shift towards societal impact does not need to damage the fundamental research system. We need to better understand how the system works, the range of impacts, and find a balance across funding systems. These are the types of discussions every system should be engaging with to come up with the right answers. There’s no magic recipe, but lots to be done.
Interview by Silvia Bravo Gallart