Friday, December 4, 2015

All rankings should include error bars

In introductory science courses we try to instill in undergraduates the basic notion that any measurement has an error, and that you should estimate that error and report it with your measurement. Yet "Professors" who are in senior management don't do that.

Today in Australia the results of the Excellence in Research Australia (ERA) ranking exercise were announced. Every research field at every university is given a score. A colleague wisely pointed out that, given the ad hoc procedure involved, all the rankings should include error bars. He conjectured that the error bar was about one. Hence, one cannot distinguish between a 4 and a 5. Yet this is a distinction that university managers and marketing departments make a lot of.

I think for almost all ranking exercises it would be quite straightforward for the compilers to calculate, or at least estimate, the uncertainty in their ranking. This is because almost all rankings are based on the average of rankings or scores produced by a number of assessors. One simply needs to report the standard deviation in those scores/rankings. I think the conclusion of this will be that rankings largely tell us what we knew already, and that any movement up or down since the last ranking is within the error bars. John Quiggin has made such arguments in more detail.
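To illustrate the point above: if a score is just the mean of several assessors' scores, the spread of those scores gives an error bar for free. Here is a minimal sketch, using entirely hypothetical assessor scores for two made-up fields, showing that two fields whose reported scores differ may still have overlapping error bars.

```python
# Hypothetical example: error bars on a ranking score computed as the
# mean of several assessors' individual scores.
from statistics import mean, stdev
from math import sqrt

def score_with_error_bar(assessor_scores):
    """Return (mean score, standard error of the mean)."""
    n = len(assessor_scores)
    m = mean(assessor_scores)
    se = stdev(assessor_scores) / sqrt(n)  # sample std. dev. / sqrt(n)
    return m, se

# Two hypothetical fields, each scored 1-5 by five assessors:
field_a = [4, 5, 3, 4, 5]
field_b = [5, 4, 5, 5, 4]

ma, ea = score_with_error_bar(field_a)
mb, eb = score_with_error_bar(field_b)
print(f"Field A: {ma:.1f} +/- {ea:.1f}")
print(f"Field B: {mb:.1f} +/- {eb:.1f}")
# The two intervals overlap, so the difference in mean score
# is not meaningful at this level of uncertainty.
```

With these made-up numbers, field A comes out around 4.2 and field B around 4.6, but the intervals overlap, which is exactly the situation where declaring one field "better" than the other is not supported by the data.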

The ERA is largely modelled on the UK equivalent, originally called the RAE but now the REF. This has been widely criticised: it wastes massive amounts of time and money, involves flawed methodology, and has been devastating for staff morale. These issues are nicely (and depressingly) chronicled in a blog post by Liz Morrish. One academic, Derek Sayer, fought to be excluded from the RAE as a protest. He explains in detail why it is such a flawed measure of real scholarship.

It is also worth looking at The Metric Tide: Report of the Independent Review of the Role of Metrics in Research Assessment and Management, commissioned by the Higher Education Funding Council, which is responsible for the REF. Reading the recommendations is strange. It sounds a bit like "most people think metrics are rubbish but we are going to use them anyway...".


  1. In discussions about metrics, it is often suggested that instead of using metrics one should just read the papers. I found it really interesting (depressing?) that the THE article suggests that this doesn't work, at least as implemented in the REF, because of the most obvious objection to reading things: it takes a lot of time.

    1. Ben, Thanks for the comment.
      Reading the papers does not work for the REF because the assessors are "reading" a ridiculously large number of papers.
      I think reading a couple of select papers should be done for specific individuals, e.g. in promotion decisions. Indeed, an idealist would argue that this won't be much work because, if the applicant has actually done something worthwhile, the expert assessor will already have read the papers, since they are of interest and value to her.