In this video I discuss the concepts and assumptions of two different reliability (agreement) statistics: Cohen's Kappa (for 2 raters using categorical data) and the Intra-class correlation coefficient (for 2 or more raters using continuous data). I also show how to calculate the Confidence Interval Limits for Kappa and the Standard Error of Measurement (SEM) for ICC.
Cohen’s Kappa (For ICC, please see video tutorial)
Cohen's kappa coefficient is a statistic that measures inter-rater agreement for categorical items. It is generally considered a more robust (stronger, more reliable) measure than a simple percent agreement calculation, since κ (kappa) takes into account the possibility of the agreement occurring by chance. Note that there are variations of Cohen's kappa (κ) specifically designed for ordinal variables (weighted kappa, κw) and for multiple raters (i.e., more than two raters). This tutorial is purely for nominal (unranked categorical) variables.
The equation for Cohen’s Kappa is:

κ = (Po − Pe) / (1 − Pe)

where Po is the relative observed agreement among raters, and Pe is the hypothetical probability of chance agreement, calculated from the observed data as the probability of each rater randomly assigning each category. If the raters are in complete agreement, then κ = 1.
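To make the calculation concrete, here is a minimal Python sketch that computes Po, Pe, and kappa by hand for two raters; the ratings below are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Hypothetical nominal ratings from two raters on the same 10 observations
rater1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "no", "yes", "yes"]
rater2 = ["yes", "no",  "no", "yes", "no", "yes", "yes", "no", "yes", "yes"]
n = len(rater1)

# Po: relative observed agreement (proportion of matching judgements)
po = sum(a == b for a, b in zip(rater1, rater2)) / n

# Pe: chance agreement, from each rater's marginal category proportions
counts1, counts2 = Counter(rater1), Counter(rater2)
categories = set(rater1) | set(rater2)
pe = sum((counts1[c] / n) * (counts2[c] / n) for c in categories)

# Cohen's kappa
kappa = (po - pe) / (1 - pe)
print(f"Po = {po:.3f}, Pe = {pe:.3f}, kappa = {kappa:.3f}")
```

If scikit-learn is available, the same value can be checked with sklearn.metrics.cohen_kappa_score(rater1, rater2).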
Assumptions
1: The response (e.g., judgement) made by your two raters is measured on a nominal scale, and the categories must be mutually exclusive (e.g., male or female, can’t be both – technically).
2: The response data are paired observations of the same
phenomenon, meaning that both raters assess the same observations.
3: Each response variable must have the same number of categories, and the crosstabulation must be symmetric (i.e., "square"), e.g., a 2x2 crosstabulation, 3x3 crosstabulation, etc.
4: The two raters are independent (i.e., one rater's
judgement does not affect the other rater's judgement).
5: The two raters are fixed, meaning that they are
specifically selected to take part in the study.
SPSS Guide
Interpretation
If the obtained κ is less than .70, conclude that the inter-rater reliability is not satisfactory.
If the obtained κ is greater than .70, conclude that the inter-rater reliability is satisfactory.
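The introduction also mentions confidence interval limits for kappa and the standard error of measurement (SEM) for ICC. As a rough sketch (not necessarily the exact procedure shown in the video), the snippet below uses a common large-sample approximation for the standard error of kappa, SEκ ≈ sqrt(Po(1 − Po) / (N(1 − Pe)²)), and the standard formula SEM = SD × sqrt(1 − ICC); all numeric inputs are hypothetical.

```python
import math

# Hypothetical values carried over from the kappa sketch above
po, pe, n = 0.80, 0.52, 10
kappa = (po - pe) / (1 - pe)

# Approximate large-sample standard error of kappa and 95% confidence limits
se_kappa = math.sqrt(po * (1 - po) / (n * (1 - pe) ** 2))
lower, upper = kappa - 1.96 * se_kappa, kappa + 1.96 * se_kappa
print(f"kappa = {kappa:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")

# SEM for ICC: SEM = SD * sqrt(1 - ICC), with a hypothetical SD and ICC
sd, icc = 4.2, 0.85
sem = sd * math.sqrt(1 - icc)
print(f"SEM = {sem:.3f}")
```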