Notice: Outdated Information, Retained for Archival Purposes Only

These methodological notes were updated after the results of the Longitudinal Clinical Peer Review Effectiveness Study completed in March 2016 were accepted for publication in the International Journal of Quality in Health Care, published by Oxford University Press. The most current information can be found at:


PREP-SET™ is an evidence-based 16-item inventory of fundamental peer review process characteristics. It takes only minutes to complete.

PREP-SET™ measures the extent to which an organization's clinical peer review program conforms to the best practice QI model. It originally contained 11 factors, first identified in the 2007 National Peer Review Practices Survey, which predict higher perceived ongoing program impact on the quality and safety of care when controlling for other program factors. These factors were subsequently validated in the 2009 ACPE Clinical Peer Review Outcomes Study. The 2007 study also showed that few if any programs were using reliable methods for measuring clinical performance during case review, even though there are valid models to do so. Therefore, the original tool included 2 additional elements (Clinical Performance Measurement and Rating Scale reliability) for a total of 13 items.

This revision to the Clinical Peer Review Self-Evaluation Tool is based on the results of the Longitudinal Clinical Peer Review Effectiveness Study conducted at the end of 2011. The tool has been expanded to include 16 best-practice items. The maximum possible score is still 100 points. All the factors first identified in the 2007 study continue to correlate with subjective quality impact. The items related to clinical performance measurement and reliable rating scales have been retained, despite the absence of such correlation, because of the strong literature support for the practices which they define.

The point values for scoring item responses were derived empirically, guided by the odds ratios from multivariate ordinal logistic regression models involving various combinations of the factors and by the R² values of the equivalent linear regression models. The standards for acceptance of the regressions were the same as for the 2011 study: namely, that all the factor coefficients and intercepts were significant at p < 0.05 and that the goodness-of-fit tests met p > 0.1. Response levels were selectively collapsed to meet these standards. These collapses define the transition between the “ineffective” and “effective” levels of any given factor when controlling for other factors. Finally, the new scoring schema was validated against the regression models for quality impact presented in the published report of the 2011 study.

The new QI Model Score predicts the likelihood that a program makes a significant ongoing contribution to quality and safety with R² = 43%. A 10-point increase in the score predicts a one-level change in quality impact with an odds ratio [95% CI] of 2.5 [2.1-2.9]. Among the 300 respondents to the study, the median new QI Model Score was 43 and the range was 0 to 100. Only 4% scored at or above 80, a level that indicates substantial adoption of the QI model. Moreover, 83% scored below 65, the equivalent of a “failing grade.”
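To make the odds ratio concrete, the following minimal Python sketch rescales it to score changes other than 10 points. It assumes that the log-odds of a higher quality impact level are linear in the score, as in an ordinal logistic model. The function name and constants are illustrative only and are not part of the published tool.

```python
# Reported odds ratio [95% CI] for a 10-point increase in the QI Model Score.
OR_PER_10_POINTS = 2.5
CI_PER_10_POINTS = (2.1, 2.9)


def odds_ratio_for_change(points, or_per_10=OR_PER_10_POINTS):
    """Odds ratio implied by a score change of `points`.

    Assumption: the log-odds are linear in the score, so the odds ratio
    for k points is the 10-point odds ratio raised to the power k / 10.
    """
    return or_per_10 ** (points / 10.0)


if __name__ == "__main__":
    for change in (5, 10, 20):
        low, high = (odds_ratio_for_change(change, b) for b in CI_PER_10_POINTS)
        point = odds_ratio_for_change(change)
        print(f"{change:>2}-point change: OR {point:.2f} [{low:.2f}-{high:.2f}]")
```

Under that assumption, a 20-point improvement corresponds to an odds ratio of 2.5 squared, or 6.25. The interval endpoints are rescaled in the same way and should be read as rough guides only.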

Thus, these data show that the vast majority of programs are still operating largely within the framework of the outmoded and dysfunctional QA model. This is a serious problem. The QA model perpetuates a “shame and blame” culture and thereby inhibits progress toward a culture of safety characterized by trust, honest and timely reporting of adverse events, near misses and hazardous conditions, and improvement of identified problems.

There are no comparable data available for clinical peer review programs for nurses and other healthcare professionals such as pharmacists. Nevertheless, the QI model would be expected to be a relevant guide to best practice.

PREP-SET™ was first published in the September/October 2009 ACPE Physician Executive Journal:


Additional Methodological Notes

From the 2009 study, the inter-rater reliability of PREP-SET™ was estimated using 27 paired ratings as 0.61 [0.31-0.80]. The reliability of the mean of 2 independent ratings of the QI Model Score is estimated as 0.75 [0.47-0.89]. This means that the average of 2 or more independent ratings should have good reliability for serial measurement.
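The step from single-rating reliability to the reliability of a mean can be illustrated with the Spearman-Brown prophecy formula. The notes do not name the method actually used, so treating it as Spearman-Brown is an assumption, and the function name below is illustrative. A minimal Python sketch:

```python
def spearman_brown(single_rating_reliability, n_ratings):
    """Projected reliability of the mean of `n_ratings` independent ratings.

    Spearman-Brown prophecy formula: n * r / (1 + (n - 1) * r).
    Assumption: the notes do not state which formula was applied.
    """
    r = single_rating_reliability
    return n_ratings * r / (1 + (n_ratings - 1) * r)


if __name__ == "__main__":
    # Point estimate and 95% CI bounds for a single rating, as reported above.
    for label, r in (("lower", 0.31), ("estimate", 0.61), ("upper", 0.80)):
        print(f"{label:>8}: {spearman_brown(r, 2):.2f}")
```

Applied to the rounded inputs, the formula reproduces the reported interval bounds of 0.47 and 0.89. It gives about 0.76 for the point estimate, close to the reported 0.75. The small gap may reflect rounding of the single-rating estimate.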

As a practical matter, the larger the organization and the greater the variation in process, the more input should be obtained. The tool asks for a rating of what prevails in the institution. Where there are islands of QI model-style process in a sea of QA-style activity, there may be a tendency to inappropriately up-score. In such organizations, I have found it useful to walk through the tool as a group. This sharing of information can provide a useful platform from which to plan improvements.

The language used in the evaluation tool seeks to mitigate widespread misunderstanding of clinical performance measurement as applied to clinical peer review. The scientific literature demonstrates the superiority of a methodology termed structured implicit review. Therefore, the original Self-Evaluation Tool contained an item labeled Structured Review. Unfortunately, this terminology proved confusing because the vast majority of organizations use structured forms to record the findings from peer review, even though such forms do not meet the standard for structured implicit review. Consequently, I believe that we should abandon the term structured implicit review other than for academic purposes. The Self-Evaluation Tool includes items that assess in less ambiguous language whether the clinical peer review process generates reliable clinical performance measures.

If you are considering making improvements, be sure to learn more about how our Peer Review Enhancement Program℠ (PREP℠) and My PREP™ Toolkit can help.