The Clinical Peer Review Program Self-Assessment Inventory is an evidence-based 20-item questionnaire addressing characteristics of program structure, process, and governance. It takes only minutes to complete. The resulting QI Model Score measures the extent to which an organization's clinical peer review program conforms to the best-practice QI model. It is a product of four national studies of clinical peer review practices in US hospitals.

First published in "Peer Review: A New Tool for Quality Improvement" (Phys Exec 2009;35(5):54-59), the inventory originally contained 11 factors identified in the 2007 National Peer Review Practices Survey, which predict higher perceived ongoing program impact on the quality and safety of care when controlling for other program factors. These factors were subsequently validated in the 2009 ACPE Clinical Peer Review Outcomes Study. The 2007 study also showed that few if any programs were using reliable methods for measuring clinical performance during case review, even though there are valid models to do so. Therefore, the original tool included 2 additional elements (Clinical Performance Measurement and Rating Scale reliability) for a total of 13 items.

The current revision of the Clinical Peer Review Program Self-Assessment Inventory is based on the results of the Longitudinal Clinical Peer Review Effectiveness Study completed in March 2016 and accepted for publication on March 8, 2018 by the International Journal for Quality in Health Care, published by Oxford University Press. It replaces the 2013 modification that followed publication of "A Longitudinal Study of Clinical Peer Review's Impact on Quality and Safety in US Hospitals" (J Healthcare Manage 2013;58(5):369-384). The 2013 version and associated methodological notes remain available for reference.

2018 Revision

The Clinical Peer Review Program Self-Assessment Inventory has been expanded to include 20 best-practice items. The maximum possible score is still 100 points. Ten of the 11 items identified in the 2007 study consistently performed well as multivariate predictors of subjectively measured ongoing program impact on quality and safety across all three subsequent studies. They have been retained.

An item addressing the use of adverse event rates as a measure of program effectiveness was not correlated with any outcome measure as re-written for this study. Its earlier apparent predictive value may have been an artifact of item wording, since many organizations track these rates for other purposes. Adverse event rates are a more problematic and less specific measure than the count of improvement opportunities identified by the program, the turnaround time for case review, or the count of clinicians whose excellent performance was worthy of recognition, all of which are strongly correlated with program effectiveness and have been added to the QI model in combination.

The 2016 data validated the three items added in 2013: the likelihood of self-reporting adverse events, near misses and/or hazardous conditions affecting one's patients; the quality of case review; and the proportion of case reviews discussed in committee prior to final decision-making. The data also justified the addition of five new items addressing program goals, scope, relation to credentialing, the performance dashboard, and solicitation of input from reviewed clinicians.

The point values for scoring item responses were derived empirically, guided by the odds ratios from multivariate ordinal logistic regression models involving various combinations of the factors and by the R² values of the equivalent linear regression models. The standards for accepting the regressions were the same as for the 2011 study: namely, that all factor coefficients and intercepts were significant at p < 0.05 and the goodness-of-fit tests met p > 0.1. Response levels were selectively collapsed in pursuit of this goal. These collapses define the transition between the “ineffective” and “effective” levels of any given factor when controlling for other factors.
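As a simplified illustration of how regression results can guide point allocation (the factor names and odds ratios below are hypothetical placeholders, not the study's actual estimates), one plausible scheme divides a 100-point budget in proportion to each factor's log odds ratio, since regression effects multiply on the odds scale but points should add:

```python
import math

# Hypothetical odds ratios for three factors; these are illustrative
# placeholders, NOT the values estimated in the study.
odds_ratios = {"factor_A": 3.0, "factor_B": 2.0, "factor_C": 1.5}

# Allocate a 100-point budget in proportion to each factor's log odds
# ratio, so additive points mirror multiplicative regression effects.
log_or = {name: math.log(v) for name, v in odds_ratios.items()}
total = sum(log_or.values())
points = {name: round(100 * v / total) for name, v in log_or.items()}

print(points)  # → {'factor_A': 50, 'factor_B': 32, 'factor_C': 18}
```

In this toy example factor_A receives exactly half the budget because ln(3) / ln(3 × 2 × 1.5) = ln(3) / ln(9) = 0.5. The study's actual derivation also weighed R² values and model fit, which this sketch omits.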

Revised QI Model Scores ranged from 0 to 96 with a median [IQR] of 50 [32-68]. Only 13% scored at least 75, and 71% scored below 65 – the equivalent of a “failing grade.” A 10-point increase is associated with an odds ratio [CI] of 2.5 [2.2-3.0] for a one-level increase in quality impact, 1.8 [1.6-2.1] for medical staff engagement in quality, 2.0 [1.8-2.3] for medical staff perceptions of the peer review program, and 1.5 [1.3-1.7] for physician-hospital relations. The revised score better predicts program impact than the original when comparing log likelihoods for the regressions (-267 vs. -304).
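Because these odds ratios are expressed per 10-point increase, larger score gains compound multiplicatively. A minimal sketch of the arithmetic (the function name is mine; 2.5 is the reported odds ratio for quality impact):

```python
def odds_ratio_for_gain(or_per_10_points: float, points_gained: float) -> float:
    """Compound a per-10-point odds ratio over an arbitrary score gain."""
    return or_per_10_points ** (points_gained / 10)

# A 20-point gain at OR 2.5 per 10 points compounds to 2.5 ** 2
print(odds_ratio_for_gain(2.5, 20))  # → 6.25

# Moving from the 25th to the 75th percentile (32 to 68, a 36-point gain)
print(round(odds_ratio_for_gain(2.5, 36), 1))  # → 27.1
```

The compounding explains why even modest score improvements are associated with substantially better odds of higher program impact.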

Thus, these data show that the vast majority of programs are still operating largely within the framework of the outmoded and dysfunctional QA model. This is a serious problem. The QA model perpetuates a “shame and blame” culture and thereby inhibits progress toward a culture of safety characterized by trust, honest and timely reporting of adverse events, near misses and hazardous conditions, and improvement of identified problems.

There are no comparable data available for clinical peer review programs for nurses and other healthcare professionals such as pharmacists. Nevertheless, the QI model would be expected to be a relevant guide to best practice.

Clinical Peer Review Program Improvement Opportunity

In the following table, the rank order highlights the improvement opportunity in terms of the average number of QI Model Score points which could be gained on each item from adoption of best practices.

Revised QI Model Factors and Improvement Opportunity

Factor | Points | Average Improvement Opportunity (a)
Likelihood of self-reporting adverse events, near misses and/or hazardous conditions | 10 | 7.0
Quality of case review | 10 | 6.4
Degree of standardization of peer review process | 10 | 5.3
Diligence of program governance | 10 | 4.2
Connecting the peer review program to the hospital’s quality/safety/performance improvement process | 6 | 3.8
Excellent reviewer participation | 6 | 3.8
Aiming the program to improve quality and safety above all else (b) | 8 | 2.9
Using a dashboard of process and outcome indicators to manage and improve the program (b) | 3 | 2.6
Recognizing outstanding clinical performance | 3 | 2.4
Routinely soliciting input to case review from involved clinicians (b) | 5 | 2.2
Using reliable rating scales to make subjective measures of clinical performance (c) | 2 | 1.7
Reviewing an adequate volume of cases to generate meaningful improvements in care delivery | 3 | 1.4
Measuring clinical performance as part of the case review process | 2 | 1.2
Sharing information about program effectiveness with the board of trustees | 2 | 1.0
Providing timely case review and communication of opportunities for improved performance | 5 | 0.9
Allowing for group discussion of most case reviews | 2 | 0.9
Looking for process improvement opportunities in every case review | 7 | 0.8
Reviewing pertinent diagnostic studies that influenced critical decisions, not just the reports | 2 | 0.5
Separating the peer review program from credentialing, even if the results of peer review inform credentialing activities (b) | 2 | 0.4
Program scope includes either concurrent review or case-specific, individually-targeted recommendations to improve performance (b) | 2 | 0.3
Total | 100 | 49.8

(a) Average QI model score points available from adoption of best practices
(b) Provisional additions to QI model based on 2015 data
(c) Included in QI model based on measurement reliability theory


Additional Methodological Notes

From the 2009 study, the inter-rater reliability of the inventory was estimated using 27 paired ratings as 0.61 [0.31-0.80]. The reliability of the mean of 2 independent ratings of the QI Model Score is estimated as 0.75 [0.47-0.89]. This means that the average of 2 or more independent ratings should have good reliability for serial measurement when used for organizational self-assessment.
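The step from a single-rating reliability of 0.61 to 0.75 for the mean of two ratings is consistent with the Spearman-Brown prophecy formula, which predicts the reliability of the mean of k independent ratings. Applying it to the rounded 0.61 gives about 0.76, matching the reported 0.75 to within rounding (the published figure was presumably computed from the unrounded single-rating estimate):

```python
def spearman_brown(single_rating_reliability: float, k: int) -> float:
    """Predicted reliability of the mean of k independent ratings."""
    r = single_rating_reliability
    return k * r / (1 + (k - 1) * r)

# Mean of 2 independent ratings with single-rating reliability 0.61
print(round(spearman_brown(0.61, 2), 2))  # → 0.76

# Averaging more ratings continues to improve reliability
print(round(spearman_brown(0.61, 4), 2))  # → 0.86
```

This is why the text recommends averaging two or more independent ratings for serial measurement: each additional independent rater pushes the reliability of the mean higher.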

As a practical matter, the larger the organization and the greater the variation in process, the more input should be obtained. The tool asks for a rating of what prevails in the institution. Where there are islands of QI model-style process in a sea of QA-style activity, there may be a tendency to inappropriately up-score. In such organizations, I have found it useful to walk through the tool as a group. This sharing of information can provide a useful platform from which to plan improvements.

The language used in the evaluation tool seeks to mitigate widespread misunderstanding of clinical performance measurement as applied to clinical peer review. The scientific literature demonstrates the superiority of a methodology termed structured implicit review. Therefore, the original Self-Evaluation Tool contained an item labeled Structured Review. Unfortunately, this terminology proved confusing because the vast majority of organizations use structured forms to record the findings from peer review, even though such forms do not meet the standard for structured implicit review. Consequently, I believe that we should abandon the term structured implicit review other than for academic purposes. The Self-Evaluation Tool includes items that assess in less ambiguous language whether the clinical peer review process generates reliable clinical performance measures.

If you are considering making improvements, learn more about how our Peer Review Enhancement Program℠ (PREP℠) and My PREP™ Toolkit can help.