The state of Colorado has long been at the forefront of attempts to develop effective methods for coming to terms with the risks posed by people who have sexually abused. In the days when each county or jurisdiction seemed to have different approaches, Colorado implemented the Containment Model. When there were no actuarial measures or other tools for structured professional judgment for grounding assessments, the Colorado Sex Offender Management Board (SOMB) assembled a list of 17 factors, which are a focus of the study below. For purposes of the study, these 17 items are treated as an alternative assessment measure, which was apparently not the original intention, although many evaluators have doubtless treated them as such. For its part, the SOMB is well aware that these 17 items are no longer the final word in assessments, even as they still receive consideration.
It seems important to note this background context, as Colorado’s efforts have indeed been pioneering over the years. In retrospect, it can seem easy to criticize the pioneering developments of groups of professionals. However, it should not be forgotten that when knowledge was scarce and approaches to sex crime resembled the Tower of Babel across the US, Colorado was among the first to develop approaches that numerous other states have emulated. Just the same, there is much we can learn from the study of these approaches, which is the subject of this blog.
An Online-First study by Katharine McCallum, Marcus Boccaccini, and Claire Bryson in the journal Criminal Justice and Behavior offers fresh insight into the practical application of risk assessment research. The abstract describes their findings succinctly:
In Colorado, evaluators conducting sex offender risk assessments are required to assess 17 risk factors specified by the state’s Sex Offender Management Board (SOMB), in addition to scoring actuarial risk assessment instruments. This study examined the association between instrument scores, the 17 SOMB risk factors, and evaluator opinions concerning risk and need for containment in 302 Colorado cases. Evaluators’ ratings of risk indicated by noninstrument factors were often higher than their ratings of risk indicated by instrument results, but only their ratings of noninstrument factors were independently predictive of containment recommendations. Several of the most influential noninstrument factors (e.g., denial, treatment motivation) have been described by researchers as potentially misleading because they are not predictive of future offending. Findings highlight the need for more studies examining the validity of what risk assessment evaluators actually do, as opposed to what researchers think they should do.
This is not the first study to find that professionals often over-estimate risk across a range of conditions. The authors provide an eye-opening literature review, and Dr. Boccaccini has elsewhere found that the results of evaluations are often swayed according to who is paying for the service. In a context where evaluators consider the 17 items originally developed by the SOMB as part of their evaluations, over and above the far more scientifically grounded actuarial measures, it is not surprising that they would give extra weight to the SOMB measure and the items within it. In reading the study, several points become clear:
First, the evaluators in Colorado seem to face a difficult assignment, having historically assessed risk using items shown in research to have no predictive utility. What is the evidence-based assessor to do? Among the most heavily weighted items in the SOMB measure are defensiveness, psychopathology, and level of empathy, which are famously not associated with risk (and therefore with summary risk ratings), but are very likely strong responsivity factors to consider. This leads to the question of what kind of risk is actually being assessed: risk for sexual re-offense, or risk for problematic adjustment to the conditions of community supervision? If it is the latter, perhaps the findings in this study would be more understandable – even appropriate – if the SOMB tool became more of a measure of risk, need, and responsivity. In this way, risk for sexual re-offense would be evaluated as a first hurdle, with treatment needs and the ability of the examinee to respond to treatment as the second and third hurdles of a more comprehensive assessment. Whatever the case, this study suggests that many evaluators were not pursuing evidence-based approaches in making recommendations related to detention; this should be of concern to anyone interested in effective policy and human rights.
Adding to the complexity of the task, many of the SOMB items most often considered in evaluations seem to overlap with items in actuarial scales such as the Static-99R, the VRAG, and the SORAG. Examples include criminal history, offense history and victim choice, and the nature of the person’s social support system. All of this leads to questions about conceptual double-dipping: how many times can one review criminal history before assessment results become skewed?
At the risk of appearing to be a Pollyanna, it is at least encouraging to see as much use of empirically validated measures as there is. It wasn’t that long ago that risk was assessed with little structure in the process and low accountability for the examiner (sometimes even taking into account the physical attractiveness of the examinee). Although these findings point to much hard work ahead for professionals and policymakers alike, we can at least take heart that our methods have improved in many jurisdictions.
Just the same, the apparent conflation of responsivity and risk factors should concern any professional or lawmaker, as should the persistent overestimation of risk and the means by which conclusions take shape. Further, as Boccaccini’s other research has shown, biases can enter the assessment process in any number of ways, whether explicitly or beyond the awareness of the examiner. This study reminds us that, for all of the rich scientific evidence at our disposal, we are still human beings, subject to being judgmental, opinionated, and biased.
Extending this last point, one of the most interesting findings in this study was also one of the least explored. In the authors’ words: “In the current study, evaluator differences accounted for 8% of the variance in SOMB summarized risk ratings and 21% of the variance in summarized actuarial risk ratings” (p. 13). In other words, who the evaluator is can be a highly variable part of the equation. For all of our attempts at, and bluster about, impartiality, we have yet to reach the goal of remaining objective. In some cases, this may be an artifact of using relatively vague items. In others, evaluator bias may even be akin to the famous country song: “Ya gotta dance with the one that brung ya.”
Reviewing both the study and the Colorado experience itself brought to mind a number of important reminders:
- First, although these findings echo related findings elsewhere, it is still a single study.
- Second, it is important to keep in mind that our best measures and best policies are always subject to bias at the hands of the individuals involved. Our ultimate work should aim at professional self-development and consistency across groups of professionals.