Thursday, April 14, 2011

Inter-Reviewer Agreement

A subject recently came up that I believe reflects faulty thinking about proposal reviewing: inter-reviewer agreement. To explain, when annotators or translators work, it is possible, and even desirable, to measure the agreement among the annotators or translators to be sure that the results are sound. Inter-annotator agreement means that, if you brought in someone new and had them do the annotation or translation, the result would still be more or less the same. Recently, I've learned that some people expect the same consideration to be given to scientific proposal review.
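To make the notion concrete: annotator agreement is usually quantified with a chance-corrected statistic such as Cohen's kappa. The snippet below is a minimal Python sketch of that calculation; the reviewer names and ratings are hypothetical, invented purely to illustrate the idea.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n)
                   for c in freq_a.keys() | freq_b.keys())
    return (observed - expected) / (1 - expected)

# Hypothetical ratings two reviewers gave the same six proposals.
reviewer_1 = ["good", "good", "fair", "poor", "good", "fair"]
reviewer_2 = ["good", "fair", "fair", "poor", "good", "good"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 3))  # ~0.455
```

A kappa near 1 means the raters are largely interchangeable; a kappa near 0 means they agree no more often than chance would predict.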

It is instructive for this topic to relate an experience within a very large, interdisciplinary program, which included proposals in the social sciences and information technology. To ensure "fair" review, it was decided to have two separate panels review such proposals: one in the social sciences and one in information technology. The surprising result was that the information technology reviewers showed little variance among themselves in their evaluations of the proposals, while the social science reviewers varied widely. On some proposals, the latter's evaluations spanned the entire gamut of possible scores. While there are many hypotheses one could propose as to why this difference occurred, it is at least a demonstration that inter-reviewer agreement clearly varies across disciplines.

After thinking about why this result should be the case (and it was demonstrated more than once), I believe it may be due to a couple of different reasons. One possible reason is that information technology research is well supported by the government, and the social sciences are not. When a discipline is well supported, its researchers have the opportunity to meet fairly often in review panels and learn how to appreciate and accept the scientific views of their colleagues. The opposite might be said of disciplines that are not well supported. When the views of colleagues are understood and accepted, panelists may defer to the recognized experts and accept their evaluations of research in the areas where they are expert. It may also give those who are less expert at least a glimmer of appreciation of others' research and what good research in that field looks like. Proposal evaluation panels are often opportunities for instruction for the panelists in this way.

Another possible reason why information technology reviewers agree more than social science reviewers is that the social sciences are much broader than information technology, resulting in a greater diversity of opinion. While I tend to accept this view, I believe a confounding factor is that the social sciences have not progressed toward common understandings, even within sub-disciplines, precisely because the field has been underfunded for a long time. Underfunding means a lack of the strong selection process that proposal review, and subsequent support for the best ideas, would otherwise provide. In other words, many different ideas bloomed, with little to no selection of the best from among them. So they all continue to exist in their own little niches.

Finally, it should be said that, in interdisciplinary proposal review panels, consensus reviews should be captured along with the individual reviews of each participant. The struggle toward consensus creates the opportunity for those who are outliers in the review to explain themselves. If they are convincing, the result is not only a more informed one for the program manager; the process also educates those who were outliers about other points of view that exist in the scientific community and why they are held. This point is important, because those other points of view may very well be based on established results that were unknown to others on the panel.

One last point: a question once arose about what the ideal size of a review panel might be. Of course, this has never really been tested, and it depends on the diversity of fields assembled. Nevertheless, it would be very hard to make a case for fewer than 15-20 reviewers on a single panel. Panel assembly is a sampling process, and experience has shown that the danger of omitting important expertise tends to be less of a concern with panels of this size. Larger panels are also unwieldy, so more than 20 may be a bad idea for other reasons having to do with time management.
