SPSS Tutorial: Inter- and Intra-rater Reliability (Cohen's Kappa, ICC)

Kappa CI and SEM calculator: https://tinyurl.com/zcm2e8h

In this video I discuss the concepts and assumptions of two different reliability (agreement) statistics: Cohen's Kappa (for 2 raters using categorical data) and the intraclass correlation coefficient, ICC (for 2 or more raters using continuous data). I also show how to calculate the confidence interval limits for Kappa and the standard error of measurement (SEM) for the ICC.

Cohen’s Kappa (For ICC, please see video tutorial)

Cohen's kappa coefficient is a statistic that measures inter-rater agreement for categorical items. It is generally considered a more robust measure than a simple percent-agreement calculation, because κ (kappa) takes into account the agreement that would occur by chance. Note that there are variations of Cohen's kappa (κ) designed specifically for ordinal variables (called weighted kappa, κw) and for more than two raters. This tutorial covers nominal (unranked categorical) variables only.

The equation for Cohen’s Kappa is:

κ = (Po − Pe) / (1 − Pe)

Po is the relative observed agreement among raters, and Pe is the hypothetical probability of chance agreement, calculated from the observed data as the probability that each rater would assign each category at random. If the raters are in complete agreement then κ = 1; if they agree no more often than chance alone would predict, then κ = 0.
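To make the roles of Po and Pe concrete, here is a minimal Python sketch that computes them by hand from two lists of paired nominal ratings. The function name cohen_kappa and the example ratings are hypothetical and for illustration only; SPSS performs the same calculation for you.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Return (po, pe, kappa) for two equal-length lists of nominal labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be paired and non-empty")
    n = len(rater_a)

    # Po: proportion of observations on which the two raters agree.
    po = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Pe: chance agreement, built from each rater's marginal proportions.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    pe = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if pe == 1:
        raise ValueError("kappa is undefined when Pe = 1")
    return po, pe, (po - pe) / (1 - pe)


# Hypothetical data: 50 paired yes/no judgements.
rater_a = ["yes"] * 25 + ["no"] * 25
rater_b = ["yes"] * 20 + ["no"] * 5 + ["yes"] * 10 + ["no"] * 15

po, pe, kappa = cohen_kappa(rater_a, rater_b)
print(round(po, 2), round(pe, 2), round(kappa, 2))
```

In this made-up example the raters agree on 35 of 50 cases, but half of that agreement is expected by chance, which is why kappa comes out well below the raw percent agreement.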


Cohen’s kappa rests on five assumptions:

1: The response (e.g., judgement) made by your two raters is measured on a nominal scale, and the categories are mutually exclusive (e.g., a case is coded as male or female, never both).

2: The response data are paired observations of the same phenomenon, meaning that both raters assess the same observations.

3: Each response variable must have the same number of categories, so that the crosstabulation is square (e.g., a 2x2 crosstabulation, 3x3 crosstabulation, etc).

4: The two raters are independent (i.e., one rater's judgement does not affect the other rater's judgement).

5: The two raters are fixed, meaning that they are specifically selected to take part in the study rather than drawn at random from a larger population of raters.
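Assumptions 2 and 3 can be checked mechanically before running the analysis. The following Python sketch, with a hypothetical helper named square_crosstab and made-up ratings, builds the crosstabulation from one shared list of categories, so the table is square by construction and unpaired or out-of-range ratings are rejected.

```python
def square_crosstab(rater_a, rater_b, categories):
    """Build a square crosstab (rows: rater A, columns: rater B)."""
    # Assumption 2: every observation must be rated by both raters.
    if len(rater_a) != len(rater_b):
        raise ValueError("ratings are not paired")

    index = {c: i for i, c in enumerate(categories)}
    if len(index) != len(categories):
        raise ValueError("categories must be unique")

    # Assumption 3: both raters share one category list, so the table is k x k.
    table = [[0] * len(categories) for _ in categories]
    for a, b in zip(rater_a, rater_b):
        if a not in index or b not in index:
            raise ValueError(f"rating outside the agreed categories: {a!r}, {b!r}")
        table[index[a]][index[b]] += 1
    return table


# Hypothetical ratings of five cases.
table = square_crosstab(
    ["yes", "yes", "no", "no", "yes"],
    ["yes", "no", "no", "no", "yes"],
    categories=["yes", "no"],
)
for row in table:
    print(row)
```

Passing the full category list explicitly keeps the table square even when one rater never uses a particular category.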

SPSS Guide


If the obtained κ is less than .70, conclude that the inter-rater reliability is not satisfactory.

If the obtained κ is .70 or greater, conclude that the inter-rater reliability is satisfactory.
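Because a single kappa value says nothing about its precision, it helps to look at the confidence interval limits alongside the .70 cut-off. The Python sketch below uses a common large-sample approximation for the standard error, SE = sqrt(Po(1 − Po) / (N(1 − Pe)²)); this is an assumption made for illustration and may not match the asymptotic standard error SPSS reports. The function name and input values are hypothetical.

```python
import math


def kappa_confidence_interval(po, pe, n, z=1.96):
    """Return (kappa, lower, upper) using a simple large-sample approximation."""
    if n <= 0 or not (0 <= po <= 1) or not (0 <= pe < 1):
        raise ValueError("need n > 0, 0 <= po <= 1 and 0 <= pe < 1")
    kappa = (po - pe) / (1 - pe)
    # Approximate standard error of kappa (assumed formula, see lead-in).
    se = math.sqrt(po * (1 - po) / (n * (1 - pe) ** 2))
    return kappa, kappa - z * se, kappa + z * se


# Hypothetical values: Po = .70, Pe = .50, N = 50 paired observations.
kappa, lower, upper = kappa_confidence_interval(po=0.7, pe=0.5, n=50)

# Assumption: a kappa of exactly .70 is treated as satisfactory.
verdict = "satisfactory" if kappa >= 0.70 else "not satisfactory"
print(f"kappa = {kappa:.2f}, 95% CI [{lower:.2f}, {upper:.2f}] -> {verdict}")
```

A wide interval, as small samples tend to produce, is a reason to be cautious about a verdict based on the point estimate alone.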
