Intraclass correlation coefficients (ICC) are recommended for the assessment of the reliability of measurement scales. However, the ICC is subject to a variety of statistical assumptions such as normality and stable variance, which are rarely considered in health applications.
A Bayesian approach using hierarchical regression and variance-function modeling is proposed to estimate the ICC with emphasis on accounting for heterogeneous variances across a measurement scale. As an application, we review the implementation of using an ICC to evaluate the reliability of Observer OPTION5, an instrument which used trained raters to evaluate the level of Shared Decision Making between clinicians and patients. The study used two raters to evaluate recordings of 311 clinical encounters across three studies to evaluate the impact of using a Personal Decision Aid over usual care. We particularly focus on deriving an estimate for the ICC when multiple studies are being considered as part of the data.
The results demonstrate that ICC varies substantially across studies and patient-physician encounters within studies. Using the new framework we developed, the study-specific ICCs were estimated to be 0.821, 0.295, and 0.644. If the within- and between-encounter variances were assumed to be the same across studies, the estimated within-study ICC was 0.609. If heteroscedasticity is not properly adjusted for, the within-study ICC estimate was inflated to be as high as 0.640. Finally, if the data were pooled across studies without accounting for the variability between studies then ICC estimates were further inflated by approximately 0.02 while formerly allowing for between study variation in the ICC inflated its estimated value by approximately 0.066 to 0.072 depending on the model.
We demonstrated that misuse of the ICC statistics under common assumption violations leads to misleading and likely inflated estimates of interrater reliability. A statistical analysis that overcomes these violations by expanding the standard statistical model to account for them leads to estimates that are a better reflection of a measurement scale's reliability while maintaining ease of interpretation. Bayesian methods are particularly well suited to estimating the expanded statistical model.