Pairwise Agreement

Accuracy and the F1 score do not account for the agreement expected by chance, which is likely to arise whenever raters label instances. Several measures do correct for expected chance agreement. Cohen's kappa statistic is a popular summary measure of agreement, but it is limited to assessing agreement between the ordinal classifications of two raters [2,3]. Several extensions of Cohen's kappa have therefore been developed, offering summary measures of agreement (and association) among multiple raters. These include Fleiss' kappa for multiple raters [4], the intraclass correlation coefficient (ICC) [5], and the weighted (and unweighted) kappas of Mielke et al. [6].

Despite the availability of these extended measures, many agreement studies report the average or range of pairwise Cohen's kappas or weighted kappas when assessing agreement or association among more than two raters [7-12]. This complicates interpretation and becomes infeasible in studies with a large number of raters.

If pairwise comparisons under the four rules mentioned are indeed transitive, pairwise comparisons for a list of alternatives (A1, A2, A3, …, An-1, An) can take this form, where p_ow is the weighted proportion of observed agreement and p_cw the weighted proportion of chance agreement.
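To make the quantities concrete, the following is a minimal Python sketch (not taken from the cited works; the rater labels are invented for illustration). It computes Cohen's kappa for a pair of raters, the average of pairwise kappas across several raters (the common but hard-to-interpret practice described above), and a linearly weighted kappa whose p_ow and p_cw terms are the weighted proportions of observed and chance agreement:

```python
from collections import Counter
from itertools import combinations

def cohen_kappa(r1, r2):
    """Cohen's kappa between two raters' label lists of equal length."""
    n = len(r1)
    # proportion of observed agreement
    po = sum(a == b for a, b in zip(r1, r2)) / n
    # proportion of agreement expected by chance, from each rater's marginals
    c1, c2 = Counter(r1), Counter(r2)
    pe = sum((c1[c] / n) * (c2[c] / n) for c in set(c1) | set(c2))
    return (po - pe) / (1 - pe)

def mean_pairwise_kappa(ratings):
    """Average Cohen's kappa over all rater pairs."""
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)

def weighted_kappa(r1, r2, categories):
    """Linearly weighted kappa for ordered categories.

    p_ow / p_cw are the weighted proportions of observed and chance
    agreement, mirroring the definitions in the text.
    """
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    # linear weights: full credit on the diagonal, partial nearby
    w = [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    n = len(r1)
    p_ow = sum(w[idx[a]][idx[b]] for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    p_cw = sum(w[i][j] * (c1[categories[i]] / n) * (c2[categories[j]] / n)
               for i in range(k) for j in range(k))
    return (p_ow - p_cw) / (1 - p_cw)

# three hypothetical raters labelling six items
ratings = [
    ["A", "A", "B", "B", "C", "A"],
    ["A", "B", "B", "B", "C", "A"],
    ["A", "A", "B", "C", "C", "B"],
]
print(round(mean_pairwise_kappa(ratings), 3))
```

Note that the pairwise summary grows quadratically in the number of raters, which is exactly why reporting all pairs becomes impractical for large rater pools.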