In my last column, I reviewed the basics of extracting valid measures of clinical performance during case review. Such measures can be of great benefit in minimizing bias in peer review. The key is to remember that measures of clinical performance require an aggregate view over multiple cases; they are not reliable enough to justify harsh corrective action based on a single case. This fits well with the QI model for peer review: when each case review is not a threatening, high-stakes exercise and the rating scale captures all shades of gray, life is easier.
This is worth emphasizing. A common misconception is that bias can be minimized by parsing peer review findings into a small number of categories. Such thinking stems from confusion between the reliability of a measure and agreement between raters. High agreement is an illusion when the underlying measure is unreliable: reviewers may agree that the standard of care was not met, yet all be wrong for having overlooked a key factor in the case.
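To make the distinction concrete, here is a minimal illustrative sketch (mine, not from the column; the case data are invented) showing how raw percent agreement can look impressive while chance-corrected agreement, as captured by Cohen's kappa, is zero. When nearly every case falls into one category, two raters will agree most of the time by chance alone, so high agreement says little about the reliability of the rating itself.

```python
from collections import Counter

def percent_agreement(a, b):
    """Fraction of cases on which the two raters gave the same rating."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: agreement corrected for chance, given each rater's marginals."""
    n = len(a)
    po = percent_agreement(a, b)                      # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)  # chance agreement
    return (po - pe) / (1 - pe)

# Hypothetical data: 10 cases rated "met" / "not met" by two reviewers.
rater_a = ["met"] * 10
rater_b = ["met"] * 9 + ["not met"]

print(percent_agreement(rater_a, rater_b))  # 0.9 -- looks like strong agreement
print(cohens_kappa(rater_a, rater_b))       # 0.0 -- no agreement beyond chance
```

The 90% raw agreement here is entirely an artifact of the skewed marginals: with almost everything rated "met," kappa correctly reports that the raters demonstrate no agreement beyond what chance would produce.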
In the language of quality improvement, we can think of bias as a variety of special-cause variation that degrades the reliability and validity of measurement. In the context of peer review, bias most commonly arises in relation to the clinical outcome or the reviewer. Less commonly, the reviewed physician and the reviewer may share a bias, as in cronyism and bad-faith peer review. Such conflicts of interest are generally obvious and readily avoided through judicious program design and review assignment. Finally, case selection is a potential source of bias that has not been formally studied. It could be minimized by standardizing case identification mechanisms, as well as any applicable pre-review screening activity, to create a level playing field.
Cases with bad outcomes are judged more harshly than identical care that results in an acceptable outcome. While our mental model holds that serious adverse events are more likely to be associated with substandard care, the relationship is not a simple one. Good physicians can have bad outcomes, even if, other things being equal, we expect them to have fewer bad outcomes than their less capable counterparts. This makes the case for measuring clinical performance through a standardized process that emphasizes data aggregation and routine constructive feedback. If case reviews are distributed evenly among members of the review committee without regard to clinical subject matter, this approach also helps to mitigate bias from reviewer factors such as being too critical or too lenient.
Reviewer bias is also reduced when committee discussion serves to validate review findings. Those who work together tend to develop common standards. Training, particularly an exercise in duplicate case review and discussion, can further reduce biases. See my whitepaper on this subject for more detail. Similarly, if reviewers understand the risk of outcome bias, the committee can develop the habit of explicitly discussing the issue. Some authors distinguish outcome bias from hindsight bias (Monday-morning quarterbacking): knowing the outcome tends to make one over-confident in one’s ability to have predicted it. Hindsight bias is most productively addressed in the process of identifying potential strategies to prevent recurrence of an error or adverse event; it may be of less concern for the process of clinical performance measurement per se.
With this background, we should be ready for a deeper dive into effective event analysis.
Marc T. Edwards, MD, MBA
President & CEO
QA to QI
An AHRQ Listed Patient Safety Organization