Qual Quant (2015) 49:463–470
DOI 10.1007/s11135-014-0003-1
Rosa Falotico · Piero Quatto
Published online: 13 February 2014 · © Springer Science+Business Media Dordrecht 2014
Abstract The Fleiss kappa statistic is a well-known index for assessing the reliability of agreement between raters. It is used in both the psychological and the psychiatric field. Unfortunately, the kappa statistic may behave inconsistently when agreement between raters is strong, taking lower values than would be expected. The aim of this paper is to propose a new method, based on permutation techniques, that avoids this paradox. Furthermore, we study the problem of kappa confidence intervals and, in particular, we suggest using bootstrap confidence intervals that are free of paradoxes.
Keywords Inter-rater agreement · Fleiss kappa · Kappa paradoxes · Monte Carlo simulations · Bootstrap confidence intervals
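For readers who want a concrete reference point for the statistic discussed in the abstract, the sketch below computes Fleiss kappa with the usual formula: kappa = (P̄ − P̄e)/(1 − P̄e), where P̄ is the mean share of agreeing rater pairs per subject and P̄e is the sum of squared pooled category proportions. It assumes every subject is rated by the same number m ≥ 2 of raters. The function names `fleiss_kappa` and `bootstrap_ci` are hypothetical. The interval is a plain percentile bootstrap over subjects, not the paradox-free interval the authors propose, and the permutation method of the paper is not reproduced here. The made-up data show the behaviour the abstract describes: 29 of 30 ratings fall in the same category, yet kappa is negative.

```python
import random
from typing import List, Sequence, Tuple

Counts = Sequence[Sequence[int]]


def fleiss_kappa(counts: Counts) -> float:
    """Fleiss (1971) kappa for an n x q table where counts[i][j] is the number
    of raters who put subject i in category j (same m >= 2 raters per row)."""
    n, q = len(counts), len(counts[0])
    m = sum(counts[0])
    if m < 2 or any(sum(row) != m for row in counts):
        raise ValueError("every subject needs the same number m >= 2 of ratings")
    # Mean observed agreement: share of agreeing rater pairs, averaged over subjects.
    p_bar = sum(sum(c * (c - 1) for c in row) for row in counts) / (n * m * (m - 1))
    # Chance agreement from the pooled category proportions p_j.
    p_j = [sum(row[j] for row in counts) / (n * m) for j in range(q)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating is in one category")
    return (p_bar - p_e) / (1 - p_e)


def bootstrap_ci(counts: Counts, n_boot: int = 2000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Plain percentile bootstrap: resample subjects (rows) with replacement."""
    rng = random.Random(seed)
    n = len(counts)
    stats: List[float] = []
    for _ in range(n_boot):
        sample = [counts[rng.randrange(n)] for _ in range(n)]
        try:
            stats.append(fleiss_kappa(sample))
        except ValueError:
            pass  # degenerate resample (all ratings in one category): skipped
    if not stats:
        raise ValueError("no resample produced a defined kappa")
    stats.sort()
    lo = stats[int(alpha / 2 * (len(stats) - 1))]
    hi = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lo, hi


# Illustrative data (made up): 10 subjects, 3 raters, 2 categories.
# Nine subjects are rated unanimously and one is split 2-1.
paradox = [[3, 0]] * 9 + [[2, 1]]
print(round(fleiss_kappa(paradox), 4))  # -0.0345 (exactly -1/29)
```

Skipping degenerate resamples is a simplifying choice of this sketch; with highly skewed data it conditions the bootstrap distribution on resamples where kappa is defined.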
1 Introduction
The kappa statistic was proposed by Cohen (1960) to measure the agreement between two raters (also called judges or observers) who independently classify n subjects on a scale of q categories. Kappa has become a well-known index for comparing expert judgements, especially in the psychometric field (Uttal et al. 2013; Harvey and Tang 2012; Markon et al. 2011; Östlin et al. 1990).
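As an illustration of the two-rater setting, the sketch below computes Cohen's kappa as (p_o − p_e)/(1 − p_e), where p_o is the observed proportion of subjects on which the two raters agree and p_e is the agreement expected by chance from each rater's marginal category frequencies. The function name `cohen_kappa` and the sample ratings are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(r1: Sequence[Hashable], r2: Sequence[Hashable]) -> float:
    """Cohen (1960) kappa for two raters who each classify the same n subjects."""
    n = len(r1)
    if n == 0 or n != len(r2):
        raise ValueError("both raters must classify the same, non-empty set of subjects")
    # Observed agreement p_o: share of subjects placed in the same category.
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement p_e from each rater's marginal category frequencies.
    m1, m2 = Counter(r1), Counter(r2)
    p_e = sum(m1[c] * m2[c] for c in m1) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (p_o - p_e) / (1 - p_e)


# Illustrative ratings (made up): n = 4 subjects, q = 2 categories.
print(cohen_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"]))  # 0.5
```

Here p_o = 3/4 and p_e = (2·1 + 2·3)/16 = 1/2, so kappa = (0.75 − 0.5)/0.5 = 0.5.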
Comprehensive reviews of inter-rater agreement coefficients have been provided by Gwet (2008) and Dijkstra and Eijnatten (2009).
R. Falotico (✉) · P. Quatto
Department of Economics, Management and Statistics, University of Milan-Bicocca, Piazza Ateneo Nuovo 1, 20126 Milano, Italy
e-mail: [email protected]
Fleiss kappa statistic without paradoxes
The use of Cohen's kappa statistic has been increasing despite two important paradoxes (Cicchetti and Feinstein 1990; Feinstein and Cicchetti 1990): (i) high levels of agreement between raters can coexist with low kappa values, a behaviour related to the prevalence of the trait in the sample; and (ii) changes in the statistic under different marginal totals are hard to predict, owing to the symmetry of rates in the disagreement categories. This paradoxical behaviour has been widely studied (Cicchetti and Feinstein 1990; Feinstein and Cicchetti 1990; Lantz and Nebenzahl 1996; Shoukri 2004).
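Paradox (i) can be reproduced with a small worked example. The sketch below takes a q × q contingency table of two raters' classifications and returns the observed agreement p_o and Cohen's kappa. The function name `kappa_from_table` is hypothetical, and the two 2 × 2 tables are made up for illustration (they are not taken from the cited studies). Both tables have the same observed agreement, 85 of 100 subjects, but in the second one category dominates, which raises the chance agreement p_e and pulls kappa down.

```python
from typing import List, Tuple


def kappa_from_table(table: List[List[int]]) -> Tuple[float, float]:
    """Return (p_o, kappa) for a q x q table where table[i][j] counts subjects
    put in category i by rater 1 and in category j by rater 2."""
    q = len(table)
    n = sum(sum(row) for row in table)
    p_o = sum(table[j][j] for j in range(q)) / n          # diagonal = agreement
    rows = [sum(table[j]) for j in range(q)]               # rater 1 marginals
    cols = [sum(table[i][j] for i in range(q)) for j in range(q)]  # rater 2 marginals
    p_e = sum(rows[j] * cols[j] for j in range(q)) / n ** 2
    return p_o, (p_o - p_e) / (1 - p_e)


# Two made-up 2 x 2 tables with the same observed agreement (85 of 100 subjects).
balanced = [[40, 9], [6, 45]]   # trait prevalence near 50%
skewed = [[80, 10], [5, 5]]     # one category dominates
for name, t in [("balanced", balanced), ("skewed", skewed)]:
    p_o, k = kappa_from_table(t)
    print(f"{name}: p_o = {p_o:.2f}, kappa = {k:.3f}")
# balanced: p_o = 0.85, kappa = 0.700
# skewed: p_o = 0.85, kappa = 0.318
```

The chance agreement is p_e = 0.5008 for the balanced table and p_e = 0.78 for the skewed one, which accounts for the drop in kappa at identical observed agreement.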
By contrast, very little attention has so far been devoted to a similar problem affecting the statistic proposed by Fleiss (1971) as a multiple-rater generalization of Cohen's kappa. As a matter of fact, in specific situations, Fleiss kappa...