Nov 29, 2010

The mismeasurement of science & Bagehot’s Rule

Michael Nielsen’s excellent post on the mismeasurement of science brings up an interesting notion he calls centralised indexes.

 

Centralized metrics create perverse incentives: Imagine, for the sake of argument, that the US National Science Foundation (NSF) wanted to encourage scientists to use YouTube videos as a way of sharing scientific results. The videos could, for example, be used as a way of explaining crucial-but-hard-to-verbally-describe details of experiments. To encourage the use of videos, the NSF announces that from now on they’d like grant applications to include viewing statistics for YouTube videos as a metric for the impact of prior research. Now, this proposal obviously has many problems, but for the sake of argument please just imagine it was being done. Suppose also that after this policy was implemented a new video service came online that was far better than YouTube. If the new service was good enough then people in the general consumer market would quickly switch to the new service. But even if the new service was far better than YouTube, most scientists – at least those with any interest in NSF funding – wouldn’t switch until the NSF changed its policy. Meanwhile, the NSF would have little reason to change their policy, until lots of scientists were using the new service. In short, this centralized metric would incentivize scientists to use inferior systems, and so inhibit them from using the best tools.

The YouTube example is perhaps fanciful, at least today, but similar problems do already occur. At many institutions scientists are rewarded for publishing in “top-tier” journals, according to some central list, and penalized for publishing in “lower-tier” journals. For example, faculty at Qatar University are given a reward of 3,000 Qatari Rials (US $820) for each impact factor point of a journal they publish in. If broadly applied, this sort of incentive would create all sorts of problems. For instance, new journals in exciting emerging fields are likely to be establishing themselves, and so have a lower impact factor. So the effect of this scheme will be to disincentivize scientists from participating in new fields; the newer the field, the greater the disincentive! Any time we create a centralized metric, we yoke the way science is done to that metric.

Centralized metrics misallocate resources: One of the causes of the financial crash of 2008 was a serious mistake made by rating agencies such as Moody’s, S&P, and Fitch. The mistake was to systematically underestimate the risk of investing in financial instruments derived from housing mortgages. Because so many investors relied on the rating agencies to make investment decisions, the erroneous ratings caused an enormous misallocation of capital, which propped up a bubble in the housing market. It was only after homeowners began to default on their mortgages in unusually large numbers that the market realized that the ratings agencies were mistaken, and the bubble collapsed. It’s easy to blame the rating agencies for this collapse, but this kind of misallocation of resources is inevitable in any system which relies on centralized decision-making. The reason is that any mistakes made at the central point, no matter how small, then spread and affect the entire system.

In science, centralization also leads to a misallocation of resources. We’ve already seen two examples of how this can occur: the suppression of cognitive diversity, and the creation of perverse incentives. The problem is exacerbated by the fact that science has few mechanisms to correct the misallocation of resources. Consider, for example, the long-term fate of many fashionable fields. Such fields typically become fashionable as the result of some breakthrough result that opens up many new research possibilities. Encouraged by that breakthrough, grant agencies begin to invest heavily in the field, creating a new class of scientists (and grant agents) whose professional success is tied not just to the past success of the field, but also to the future success of the field. Money gets poured in, more and more people pursue the area, students are trained, and go on to positions of their own. In short, the field expands rapidly. Initially this expansion may be justified, but even after the field stagnates, there are few structural mechanisms to slow continued expansion. Effectively, there is a bubble in such fields, while less fashionable ideas remain underfunded as a result. Furthermore, we should expect such scientific bubbles to be more common than bubbles in the financial market, because decision making is more centralized in science. We should also expect scientific bubbles to last longer, since, unlike financial bubbles, there are few forces able to pop a bubble in science; there’s no analogue to the homeowner defaults to correct the misallocation of resources. Indeed, funding agencies can prop up stagnant fields of research for decades, in large part because the people paying the cost of the bubble – usually, the taxpayers – are too isolated from the consequences to realize that their money is being wasted.

One metric to rule them all

No-one sensible would staff a company by simply applying an IQ test and employing whoever scored highest (cf., though, ref). And yet there are some in the scientific community who seem to want to move toward staffing scientific institutions by whoever scores highest according to the metrical flavour-of-the-month. If there is one point to take away from this essay it is this: beware of anyone advocating or working toward the one “correct” metric for science. It’s certainly a good thing to work toward a better understanding of how to evaluate science, but it’s easy for enthusiasts of scientometrics to believe that they’ve found (or will soon find) the answer, the one metric to rule them all, and that that metric should henceforth be broadly used to assess scientific work. I believe we should strongly resist this approach, and aim instead to both improve our understanding of how to assess science, and also to ensure considerable heterogeneity in how decisions are made.

One tentative idea I have which might help address this problem is to democratize the creation of new metrics. This can happen if open science becomes the norm, so scientific results are openly accessible, online, making it possible, at least in principle, for anyone to develop new metrics [2]. That sort of development will lead to a healthy proliferation of different ideas about what constitutes “good science”. Of course, if this happens then I expect it will lead to a certain amount of “metric fatigue” as people develop many different ways of measuring science, and there will be calls to just settle down on one standard metric. I hope those calls aren’t heeded. If science is to be anything more than lots of people following the comet head, we need to encourage people to move in different directions, and that means valuing many different ways of doing science.

Update: After posting this I Googled my title, out of curiosity to see if it had been used before. I found an interesting article by Peter Lawrence, which is likely of interest to anyone who enjoyed this essay.


This brought to mind Bagehot’s rule, that “the way to deal with a panic in which nobody is sure if contracts will be honored is for the government to make sure that contracts are honored by lending freely to anybody who asks. The lending should be ‘at a penalty rate’; financiers should never profit from the fact of government assistance to stem the panic.”

First, it brings to mind the (massively important) empirical question of the effects of Basel III on financial stability: how does enforcing centralised standards through such accords affect economic growth and stability, and their social distribution?

Secondly, I wonder whether there is a lesson in Bagehot’s rule about how to allocate resources to science.


