PHILADELPHIA -- Field testing of proposed new diagnostic criteria for many mental disorders showed that, for the most part, they were at least adequately reliable, according to initial reviews presented here.
But some new disorders that had appeared set for inclusion in DSM-5, the forthcoming fifth revision of the American Psychiatric Association's (APA) influential diagnostic manual, will now probably be relegated to a research-use-only classification because interrater reliability was so poor. Among them is the controversial "attenuated psychosis syndrome," intended to capture patients with low-level hallucinations and thinking disturbances.
In some cases, it appeared that the field trials themselves had failed -- that too few patients were found with a given diagnosis to provide meaningful data, or that other problems cropped up that made the results uninterpretable.
Results of the field trials -- a critical step in the development of DSM-5 -- were presented here at the APA's annual meeting.
The trials are essentially the last step before the final round of meetings and decisions that will set DSM-5 in its published form. Although DSM-5 will be a "living document" that will be revised on an ongoing basis, the paper version must go to the printer in December and therefore the contents must be finalized in the next few months, explained David Kupfer, MD, of the University of Pittsburgh, the task force chairman for the effort.
The trials were conducted in 11 large academic clinical centers and in a large number of small community practices. The academic clinicians were exclusively psychiatrists, whereas the bulk of small-practice clinicians were other types of professionals, including clinical psychologists, psychiatric nurses, and licensed counselors.
The only qualification for small-practice clinicians was that they had to be licensed to make psychiatric diagnoses. Close to 600 were recruited.
Academic Center Findings
About 2,000 patients were seen in the academic centers, and about 1,450 in the small community practices.
In the academic settings, new patients were initially seen and evaluated by a psychiatrist, then again from four hours to two weeks later (median one week) by a second psychiatrist.
Darrel Regier, MD, the APA's research director, explained that the trials were intended primarily to establish reliability -- that different clinicians using the diagnostic criteria set forth in the proposed revisions would reach the same diagnosis for a given patient.
The key reliability measure used in the academic center trials was the so-called intraclass kappa statistic, based on concordance of the "test-retest" results for each patient. It's calculated from a complicated formula, but the essence is that a kappa value of 0.6 to 0.8 is considered excellent, 0.4 to 0.6 is good, and 0.2 to 0.4 "may be acceptable." Scores below 0.2 are flatly unacceptable.
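For readers who want a feel for the arithmetic, here is a minimal sketch of an intraclass kappa for a single yes/no diagnosis rated twice per patient. It assumes the pooled-prevalence form of the statistic, in which chance agreement is computed from the combined rate of positive ratings; the presentations did not spell out the exact estimator used in the trials. The function names and sample ratings are hypothetical, and assigning boundary values and anything above 0.8 to the higher band is likewise an assumption.

```python
def intraclass_kappa(first, second):
    """Intraclass kappa for paired yes/no ratings (1 = diagnosis given, 0 = not).

    Assumes the pooled-prevalence form: chance agreement is derived from the
    combined positive rate across both ratings, not from each rater separately.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(first)
    # Share of patients on whom the two interviews agreed.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Pooled share of positive ratings across test and retest.
    prevalence = (sum(first) + sum(second)) / (2 * n)
    # Agreement expected by chance alone at that prevalence.
    expected = prevalence ** 2 + (1 - prevalence) ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when every rating is identical")
    return (observed - expected) / (1 - expected)


def kappa_band(kappa):
    """Map a kappa value to the bands quoted in the article.

    Assumption: a value exactly on a boundary, or above 0.8, goes to the
    higher band; the article does not say how such values were treated.
    """
    if kappa >= 0.6:
        return "excellent"
    if kappa >= 0.4:
        return "good"
    if kappa >= 0.2:
        return "may be acceptable"
    return "unacceptable"


# Hypothetical ratings for ten patients: first interview, then retest.
test = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
retest = [1, 1, 0, 0, 0, 0, 0, 1, 1, 0]
k = intraclass_kappa(test, retest)
print(round(k, 2), kappa_band(k))
```

The point of the correction for chance is visible here: the two interviews agree on 8 of 10 patients, yet the kappa lands well below 0.8 because a good share of that agreement would be expected by chance at this prevalence.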
Kappa values for the dozens of new and revised diagnoses tested ranged from near zero to 0.78.
For most common disorders, kappa values from tests conducted in the academic centers were in the "good" range:
- Bipolar disorder type I: 0.54
- Schizophrenia: 0.46
- Schizoaffective disorder: 0.50
- Mild traumatic brain injury: 0.46
- Borderline personality disorder: 0.58
In the "excellent" range were autism spectrum disorder (0.69), PTSD (0.67), ADHD (0.61), and the top prizewinner, major neurocognitive disorder (better known as dementia), at 0.78.
The Losers
But some fared less well. Criteria for generalized anxiety disorder, for example, came in with a kappa of 0.20. Major depressive disorder in children had a kappa value of 0.29.
A major surprise was the 0.32 kappa value for major depressive disorder. The criteria were virtually unchanged from those in DSM-IV, the current version, which also underwent field trials before it was published in 1994. The kappa value in those trials was 0.59.
But a comparison is not valid, Regier said. The patient samples were very different -- in the DSM-IV trials, patients with psychiatric comorbidities were excluded, whereas they were allowed in the DSM-5 tests.
Also, the clinicians in the DSM-IV trials received much more elaborate training before seeing patients than did those in the DSM-5 trials.
Helena Kraemer, PhD, a statistician at Stanford University who helped set the general DSM-5 trial structure, said an important goal was to test the criteria "with real clinicians and real patients."
She said the kappa values in DSM-IV were "inflated" because of the patient exclusions and clinician training.
Jan Fawcett, MD, of the University of New Mexico, who headed the DSM-5 workgroup that decided on the depression criteria, said that the disorder may simply be hard to diagnose reliably because of the many comorbidities that come with it.
"I was not happy with the kappas that came out," he said. "But it may be that that is the reliability of that diagnosis in the real world."
Diagnoses that did very poorly in the trials, either because of low kappa values or confidence intervals so wide as to be meaningless, included:
- Mixed anxiety-depressive disorder in adults and children
- Attenuated psychosis syndrome
- Obsessive-compulsive personality disorder (separate from obsessive-compulsive disorder)
- Antisocial personality disorder
- Nonsuicidal self-injury (e.g., cutting)
Although the kappa value for attenuated psychosis was actually in the "good" range at 0.46, the 95% confidence interval extended to zero, probably reflecting a failed trial.
Most of the others had low point values for kappa -- approaching zero for mixed anxiety-depression in adults -- as well as wide confidence intervals.
These conditions had been considered seriously for inclusion in the main body of DSM-5, but now will probably either be ditched entirely or placed in Section III (what had been the appendix in DSM-IV), for diagnoses that could be clinically useful but need more research-based refinement.
Dimensional Assessments
The academic center trials also examined reliability of so-called dimensional assessments, one of the key innovations proposed for DSM-5. These are add-ons to the conventional checklist-based diagnostic criteria whereby clinicians can grade the severity of certain symptoms or features.
DSM-5 leaders believe that these dimensional assessments will help clinicians better characterize individual patients and also monitor changes over time -- important because symptoms may never entirely go away but they may lighten, which may not be reflected in a checklist approach.
Some "cross-cutting" dimensions, like anxiety or anger, may be present with a large number of individual disorders in different categories. Others may be specific to certain conditions.
William Narrow, MD, MPH, presented some of the results of these evaluations. Reliability in this case was measured with intraclass correlations, for which scores below 0.4 were considered poor; 0.4 to 0.6 was fair; 0.6 to 0.8 was good; and higher scores were excellent.
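As a rough illustration of how an intraclass correlation is obtained from paired severity ratings, the sketch below uses the one-way random-effects form for two ratings per patient. That choice is an assumption; the presentation did not say which form of the statistic was used. The function names and sample scores are hypothetical, and placing boundary values of 0.4 and 0.6 in the higher band is also an assumption.

```python
def icc_one_way(pairs):
    """One-way random-effects intraclass correlation for two ratings per patient.

    `pairs` is a list of (first_rating, second_rating) severity scores.
    Assumption: this ICC form is chosen for illustration only.
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two patients")
    k = 2  # ratings per patient
    means = [(a + b) / k for a, b in pairs]
    grand = sum(means) / n
    # Between-patient mean square: how far patients differ from one another.
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    # Within-patient mean square: how far the two ratings of a patient differ.
    ms_within = sum(
        (a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, means)
    ) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_between - ms_within) / denominator


def icc_band(icc):
    """Map an ICC to the bands quoted in the article."""
    if icc < 0.4:
        return "poor"
    if icc < 0.6:
        return "fair"
    if icc <= 0.8:
        return "good"
    return "excellent"


# Hypothetical severity scores (0 to 5 scale) from two clinicians.
scores = [(1, 2), (3, 3), (5, 4), (2, 2), (4, 5)]
value = icc_one_way(scores)
print(round(value, 2), icc_band(value))
```

Note that the bands differ from those used for kappa: an ICC of 0.5 counts only as "fair," whereas a kappa of 0.5 was labeled "good."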
Most of the cross-cutting dimensions showed good to excellent reliability, but at some sites the scores for depression, mania, and a few other assessments were only in the "fair" range.
In the sites testing children's diagnoses, the reliability often differed when clinicians examined the children themselves versus interviewing the parents.
Narrow said that not all the sites applied the criteria in the same way and that patient mixes may have been different as well. Sample sizes also were a problem at some centers, where too few children with disorders of interest were available.
Another change contemplated in DSM-5 is to replace the Global Assessment of Functioning instrument with another general functional rating tool developed by the World Health Organization called the Disability Assessment Schedule (WHODAS).
It performed well in the trials at most sites, Narrow said.
He also listed cross-cutting measures that need more examination before final decisions are made on including them in DSM-5:
- Adults: depression, anger, mania, alcohol use, other substance use, repetitive behaviors
- Children: anger, mania, psychosis, inattention, prescription abuse, suicidal ideation, self-care/getting around
Results from Small Practices
The test-retest model was not used in the small practices, and therefore no kappa values were available. Each participating clinician was asked to evaluate one new patient and one existing patient with both the DSM-IV and the proposed DSM-5 criteria, and then to give their impressions of how the DSM-5 version stacked up.
Specifically, clinicians were asked about the feasibility and clinical utility of the DSM-5 approach and which of the two versions was better. Patients were asked whether they thought the DSM-5 version would help the clinician understand their symptoms better.
Eve Moscicki, ScD, of the APA-affiliated American Psychiatric Institute for Research and Education, reported that, for most conditions, few clinicians thought the newer criteria were easy to use but few thought they were very difficult.
On two of the more controversial proposals in DSM-5 -- revisions to criteria for ADHD and autism spectrum disorders -- opinion on clinical utility tilted toward the favorable.
Just over 40% of clinicians rated the ADHD criteria as very or extremely useful, while about 25% said "not at all" or "slightly" useful. Some 50% rated the autism spectrum criteria as very or extremely useful and less than 15% had unfavorable views.
When asked which they liked better, DSM-IV or DSM-5, the latter was preferred by three-quarters for ADHD and by 60% for autism spectrum diagnoses, Moscicki reported.
For major adult disorders -- depression, anxiety, trauma and schizophrenia -- DSM-5 was also the clear winner.
Patients also generally gave a thumbs-up to DSM-5, for both childhood-onset and adult disorders, as making it easier for clinicians to help them.