Accuracy of recall by adults of their childhood psychiatric history is an important concern for several areas of clinical research that often must rely on retrospective reports, e.g., developmental psychopathology, psychiatric epidemiology, and family studies. From a clinical perspective, accuracy of recalled childhood psychiatric data is frequently relevant to diagnostic evaluation. Inaccuracies and difficulties in remembering early signs and symptoms are ignored in the current nosological systems, and criteria for historically derived diagnoses do not differ from those applied to current status. Retrospective childhood data are particularly important for the diagnosis of adult attention deficit hyperactivity disorder (ADHD), since a positive childhood history is required. Although the validity of retrospective reports of childhood psychiatric history has been questioned (1–5), there are few investigations of retrospective accuracy.
Yarrow and colleagues (6) investigated retrospective reports of early childhood data presumed related to personality development. Recollections were obtained from subjects and their mothers about data collected during the preschool period. Numerous features of infant characteristics, maternal care, the family environment, child rearing, and personality characteristics of the child were included. Most relationships between baseline and recall data were significant; however, the correlations were small (0.20–0.30), particularly for subjective judgments. Also, the range of follow-up intervals varied widely, and follow-up ages ranged from 7 to 34 years. In addition, attrition was high (about 50%). Finally, assessments were not systematic, and, as acknowledged by the authors, sources of information were inconsistent across subjects.
Henry and colleagues (7) compared retrospective reports of childhood information with prospectively obtained information on a variety of topics (e.g., residential changes, delinquency, reading ability, behavior problems). The subjects were 18 years old, attrition was low, childhood information was systematically obtained, and data on activity level were examined. Recall of subjective information demonstrated the lowest agreement with original material. For "general activity level," correlations between retrospective and childhood ratings made by subjects, parents, and teachers were small and nonsignificant (0.04–0.09).
Holmshaw and Simonoff (8) interviewed 32 subjects, aged 24–37 years, who were initially seen at child psychiatric clinics between age 5 and 17. (Data on attrition were not provided.) Clinicians who evaluated the subjects were unaware of their childhood psychiatric status. Retrospective recall was compared to information extracted from childhood psychiatric records and clinicians’ checklists. Agreement for "hyperactivity symptoms," as measured by the kappa coefficient, was 0.50, generally indicating modest concordance (9). Childhood diagnoses included anxiety, mood, elimination, and conduct disorders, and other miscellaneous conditions (e.g., Asperger’s syndrome), but not ADHD. Therefore, the findings do not inform our evaluation of recall of ADHD.
The present study reports on the long-term recall of childhood ADHD in a prospective follow-up study of children with this disorder (10–13). This study has several methodological advantages. Subjects were clinically diagnosed at a child psychiatric clinic and met research criteria for ADHD. They were initially seen as children (mean age=8 years) and later assessed as adults (mean age=25 years). The size of the study group was large (N=207 boys), attrition was low (15%), and the follow-up interval was extended (mean=16.5 years). A non-ADHD comparison group was also evaluated. Follow-up interviews were conducted by clinicians who were completely blind to childhood status, and both childhood and adult clinical information were obtained systematically.
The study addressed the following questions:
1) What percentage of adults who had ADHD in childhood report sufficient symptoms to warrant a retrospective diagnosis of ADHD by clinicians? Stated differently, what percentage of true cases is correctly identified by retrospective report (sensitivity of recall)? Also, what percentage of non-ADHD comparison subjects is correctly judged as not having had childhood ADHD (specificity of recall)? To what degree do rates of retrospectively diagnosed childhood ADHD among probands (true cases) differ from the comparison group (odds ratio)?
2) Are certain recalled behavioral manifestations of inattention, impulsivity, and hyperactivity (e.g., trouble finishing things, for inattention; doing dangerous things, for impulsivity; being a restless sleeper, for hyperactivity) better at identifying childhood ADHD than others?
3) When data are adjusted for the prevalence of ADHD, what percentage of all subjects who would be retrospectively diagnosed as having ADHD is correctly identified as having childhood ADHD (positive predictive value)? Conversely, of all subjects who would not be diagnosed as having ADHD as children, what percentage is correctly identified as not having childhood ADHD (negative predictive value)? How do adjustments of prevalence affect these estimates?
The initial cohorts have been described elsewhere (10), and their characteristics are reviewed only briefly here. The children with ADHD included 207 Caucasian, 6- to 12-year-old boys of middle socioeconomic status who were referred to a no-cost psychiatric research clinic between 1970 and 1977 (14, 15). The criteria for study inclusion were 1) referral by schools because of behavior problems, 2) elevated ratings by teachers on standard scales of hyperactivity, 3) behavior problems in settings other than school (e.g., home), 4) a diagnosis of DSM-II hyperkinetic reaction of childhood by a child psychiatrist on the basis of interviews with the mother and child, 5) a full-scale IQ of at least 85 (16, 17), 6) no evidence of psychosis or neurological disorder, and 7) English-speaking parents and a home telephone. Children whose primary reason for referral involved aggressive or other antisocial behaviors were excluded.
The comparison subjects, who were recruited in adolescence, were identified from nonpsychiatric departments of the same medical center (10, 18). Charts from the adolescent medicine outpatient clinic were reviewed to identify Caucasian males who had no recorded behavior problems before age 13 years and who were seen for routine physical examinations or acute transient illnesses (e.g., the flu). Parents of children who matched the demographic characteristics of the probands (group matching) were asked whether elementary school teachers had ever complained about their child’s behavior and whether they ever had concerns about their child’s behavior before age 13. If either teachers or the parents had concerns about the child’s behavior, the child was excluded.
To enlarge the comparison group, a community-sampling service recruited 14 additional subjects who met the study criteria. The resulting group of comparison subjects consisted of 178 Caucasian males between the ages of 16 and 23 years (mean=18.8 years, SD=1.5) who came primarily from middle-class homes (mean=2.8, SD=1.1), as assessed with the Hollingshead Index of Social Position (19).
All subjects were directly interviewed by a clinical psychologist or a psychiatric social worker who was blind to the subject’s childhood status and the study design. Subjects provided written informed consent after the purpose of the study and its procedures had been fully explained. Subjects were administered the Schedule for the Assessment of Conduct, Hyperactivity, Anxiety, Mood, and Psychoactive Substances (20), a semistructured psychiatric interview designed to generate lifetime DSM-III-R diagnoses. The instrument has been shown to have good to excellent interrater reliability for all major diagnoses, with kappas ranging from 0.60 to 1.00. The kappa for ADHD was 0.70 in the present study (11).
The Schedule for the Assessment of Conduct, Hyperactivity, Anxiety, Mood, and Psychoactive Substances includes sections on inattention, hyperactivity, and impulsivity during childhood. Affirmative responses are followed by a series of clinical probes, e.g., Was that more so than others your age? Could you give me an example? How often did that occur? How much of a problem was that for you? Did that lead to any difficulties in school or with other people? Each item was rated as present or absent on the basis of the clinical judgment of the interviewer, rather than on an initial affirmative or negative response from the subject. For example, if a subject endorsed being "very active" as a child, but described activity levels that clearly did not deviate from peers, the behavior of excessive activity was rated absent.
Interviewers formulated definite and probable DSM-III-R diagnoses, and documented the diagnoses in narrative summaries, which were blindly reviewed by authors R.G.K. or S.M. for diagnostic accuracy. A definite diagnosis was given when the criteria were fully met. A probable diagnosis was given if the symptom criteria were not fully met but clinically significant impairment was associated specifically with the symptoms.
Analyses were based on probable or definite, retrospective childhood DSM-III-R ADHD diagnoses formulated on the basis of the direct interviews of adult subjects by clinician interviewers who were unaware of the subjects’ childhood status. Combining probable and definite diagnoses for analysis is consistent with clinical practice, since significant impairment was required for both levels of certainty. In addition, including probable diagnoses seemed sensible, since retrospective recall was being assessed and a certain amount of forgetting was expected. However, essentially the same results were obtained and no major conclusions were altered when the level of certainty was limited to definite.
Kappa (9) was used as a measure of chance-corrected concordance of retrospective and childhood ADHD diagnoses. A logistic regression analysis (21) compared the rates of retrospective ADHD diagnoses in probands versus comparison subjects. Socioeconomic status at follow-up differed significantly between probands and comparison subjects. Since socioeconomic status was significantly lower for subjects with versus without a retrospective diagnosis of childhood ADHD, this variable was entered as a covariate. However, the results and conclusions were unchanged when no adjustments were made for the effects of socioeconomic status.
Odds ratios, sensitivities, and specificities assessing the diagnostic utility of symptom ratings are reported (22). Positive predictive value is reported as an estimate of the percentage of correctly diagnosed subjects and negative predictive value as an estimate of the percentage correctly not diagnosed, as a function of the base rate (23).
One hundred seventy-six former ADHD probands and 168 comparison subjects were directly interviewed as adults at follow-up. The follow-up intervals ranged from 14 to 20 years (mean=16.5, SD=1.1), with relatively low attrition (15%). At follow-up, proband and comparison subjects did not differ in age (mean=25.3 years, SD=1.4, versus mean=25.1, SD=1.8) (t=1.21, df=342, p=0.23), but the comparison subjects had a significantly higher-ranking socioeconomic status than the probands (mean=2.7, SD=0.9, versus mean=3.4, SD=0.8) (t=6.76, df=342, p<0.001).
Retrospective Recall of ADHD
Retrospective childhood diagnoses of ADHD were made for 78% of probands (N=137) and 11% of the comparison subjects (N=18) (sensitivity=0.78; specificity=0.89; odds ratio adjusted for socioeconomic status=26.47, Wald χ²=106.60, df=1, p<0.001). The kappa was 0.67, suggesting good agreement beyond chance between retrospective diagnoses made at follow-up and diagnoses made in childhood (10).
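The reported sensitivity, specificity, and kappa can be reproduced from the four cell counts implied by the figures above (137 of 176 probands and 18 of 168 comparison subjects retrospectively diagnosed). The following is a minimal sketch; the function and variable names are illustrative, and the odds ratio it returns is a crude (unadjusted) value, so it is expected to differ from the socioeconomic status-adjusted estimate obtained from the logistic regression.

```python
# Cell counts implied by the reported results:
# 137 of 176 probands and 18 of 168 comparison subjects were
# retrospectively diagnosed as having had childhood ADHD.
TRUE_POS = 137
FALSE_NEG = 176 - 137
FALSE_POS = 18
TRUE_NEG = 168 - 18


def diagnostic_summary(tp, fn, fp, tn):
    """Return sensitivity, specificity, Cohen's kappa, and the crude odds ratio."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Observed agreement between childhood status and retrospective diagnosis.
    observed = (tp + tn) / n
    # Agreement expected by chance from the marginal totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n**2
    kappa = (observed - expected) / (1 - expected)
    # Crude odds ratio; NOT adjusted for socioeconomic status.
    crude_odds_ratio = (tp * tn) / (fn * fp)
    return sensitivity, specificity, kappa, crude_odds_ratio


sensitivity, specificity, kappa, crude_odds_ratio = diagnostic_summary(
    TRUE_POS, FALSE_NEG, FALSE_POS, TRUE_NEG
)
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}, "
      f"kappa={kappa:.2f}, crude OR={crude_odds_ratio:.2f}")
```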
Diagnostic Utility of ADHD Symptoms
Table 1 shows the diagnostic utility of each ADHD symptom, ordered from highest to lowest discriminating power (odds ratio) within each of the three groups of symptoms (inattention, impulsivity, hyperactivity). Although many symptoms achieved high odds ratios, they were not always accompanied by high sensitivities. For example, for needing supervision, the odds ratio was 17.27, but the sensitivity was 0.49; for having "difficulty sticking to play," the odds ratio was 9.46, but the sensitivity was 0.10.
As a measure of relative diagnostic utility, the following criteria were applied to identify symptoms that clearly discriminated probands from comparison subjects: 1) high odds ratio (>10.00) and 2) sensitivity and specificity both >0.70. Six symptoms fulfilled these criteria (Table 2). All had high face validity and were practically synonymous with the symptoms they represented: distractibility, concentration difficulties, and complaints of inattention, for inattention; acting before thinking, for impulsivity; and being "on the go" and fidgeting/squirming, for hyperactivity. Using different combinations (with an "or" rule) of these symptoms generally increased sensitivity but decreased specificity (Table 2).
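The selection rule can be expressed as a simple filter. The sketch below applies it only to the three symptoms whose values are quoted in the text of this article; the `Symptom` type and `clearly_discriminates` function are illustrative names, and specificity is left as `None` where the text does not report it (those two symptoms fail the sensitivity criterion in any case).

```python
from typing import NamedTuple, Optional


class Symptom(NamedTuple):
    name: str
    odds_ratio: float
    sensitivity: float
    specificity: Optional[float]  # None where the article text gives no value


def clearly_discriminates(symptom, min_odds_ratio=10.0, min_rate=0.70):
    """Apply the two criteria: odds ratio >10.00, sensitivity and specificity both >0.70."""
    return (
        symptom.odds_ratio > min_odds_ratio
        and symptom.sensitivity > min_rate
        and symptom.specificity is not None
        and symptom.specificity > min_rate
    )


# Values quoted in the text; the full set of symptoms is in Table 1.
symptoms = [
    Symptom("acting before thinking", 10.96, 0.80, 0.74),
    Symptom("needing supervision", 17.27, 0.49, None),
    Symptom("difficulty sticking to play", 9.46, 0.10, None),
]

selected = [s.name for s in symptoms if clearly_discriminates(s)]
print(selected)
```

Needing supervision shows why both criteria are needed: its odds ratio exceeds the threshold, but fewer than half of the probands recalled the behavior.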
Adjusting for Prevalence of Childhood ADHD
The probability that an individual who presents as having childhood ADHD actually has childhood ADHD depends on the prevalence of the disorder, as well as the sensitivity and specificity of the retrospective diagnoses (24). This is also true for correct rejections of the diagnosis. Table 3 shows the projected numbers for the sensitivities and specificities obtained in the present study, assuming a 5% general population prevalence for ADHD (176 cases/3,520 total subjects=5%; therefore, 176 true cases plus 3,344 noncases would be expected). The last two columns show the positive predictive value (i.e., the proportion of true positive cases among all individuals retrospectively given a diagnosis) and the negative predictive value (i.e., the proportion of true negative cases among all individuals retrospectively considered not to have the disorder).
Nearly all individuals classified as not having the disorder were correctly identified as such. However, only 27% of those classified as having a diagnosis of childhood ADHD had true positive cases of the disorder. For cases identified on the basis of symptom ratings, the rate of correctly classified individuals was worse, ranging from 9% to 18%.
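The 27% figure is consistent with the standard relationship between predictive value, sensitivity, specificity, and prevalence. As a worked illustration, using the rounded sensitivity (0.78) and specificity (0.89) reported above and a prevalence of 0.05 (the symbols Se, Sp, and p are notation introduced here, not the article's):

```latex
\begin{align*}
\mathrm{PPV} &= \frac{Se \cdot p}{Se \cdot p + (1 - Sp)(1 - p)}
  = \frac{0.78 \times 0.05}{0.78 \times 0.05 + 0.11 \times 0.95}
  = \frac{0.0390}{0.1435} \approx 0.27, \\
\mathrm{NPV} &= \frac{Sp\,(1 - p)}{Sp\,(1 - p) + (1 - Se)\,p}
  = \frac{0.89 \times 0.95}{0.89 \times 0.95 + 0.22 \times 0.05}
  = \frac{0.8455}{0.8565} \approx 0.99.
\end{align*}
```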
Table 4 shows the relationship between prevalence, positive predictive value, and negative predictive value for the sensitivity and specificity estimates obtained in the present study. In general, as the prevalence of ADHD increased, positive predictive value increased and negative predictive value decreased.
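The direction of this relationship can be checked with a short calculation. The sketch below holds sensitivity and specificity at the rounded study estimates (0.78 and 0.89) and varies prevalence over an illustrative grid; the grid values are assumptions chosen for demonstration and are not the rows of Table 4.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a given sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


SENSITIVITY, SPECIFICITY = 0.78, 0.89  # rounded estimates from the present study
prevalences = [0.05, 0.10, 0.25, 0.50]  # illustrative grid, not the Table 4 rows

rows = [(p, *predictive_values(SENSITIVITY, SPECIFICITY, p)) for p in prevalences]
for prevalence, ppv, npv in rows:
    print(f"prevalence={prevalence:.0%}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

With these inputs, positive predictive value rises and negative predictive value falls at every step of the grid.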
Accuracy of Retrospective Recall
In a large group of subjects who had received a diagnosis of ADHD as children in a psychiatric research clinic and who were followed up in adulthood, the rate for correctly identifying adults who had ADHD as children on the basis of subjects’ retrospective recall of childhood symptoms was 78%. The rate of false positive classification of subjects was 11%.
The present study employed trained clinicians (a clinical psychologist and a psychiatric social worker) to minimize false positive ratings and diagnoses and to maximize the identification of salient features (i.e., avoid missing actual cases of ADHD). In addition, semistructured interviews were used to systematize data collection and to ensure coverage of predetermined information. The results might have differed if lay interviewers and totally structured instruments were used.
The sensitivity and specificity of 0.78 and 0.89 for the retrospective diagnosis of ADHD are impressive. However, participants in the present study were patients who had been referred for treatment of pervasive ADHD. These estimates may not apply to nonreferred children, who represent a substantial proportion of children with ADHD (25–27) and for whom lower sensitivity and specificity of retrospective diagnosis are likely.
When general population prevalence is taken into consideration, the accuracy of retrospective recall is less promising. Assuming a 5% prevalence of ADHD, the correct rejection rate would be nearly perfect, but only 27% of those with a retrospective diagnosis of childhood ADHD would have the disorder. Stated differently, about three of four retrospectively diagnosed cases will be false positives. Since retrospective diagnoses for less severely ill, nonreferred individuals are likely to have even lower sensitivities and specificities, and the positive predictive value decreases with reductions in these parameters (23), the "average" case is even more likely to be susceptible to misidentification. However, these findings were obtained by using a general population estimate of 5%, as would be expected in an epidemiological survey, a general primary care facility, a school environment, and other settings in which childhood ADHD is not expected to be overrepresented. As the proportion of expected cases increases, positive predictive value also increases, and acceptable levels of both positive predictive value and negative predictive value are reached at around 50% prevalence (Table 4). Therefore, for example, since children with ADHD are at increased risk for having antisocial personality as adults (11, 13), retrospective assessment for ADHD in a prison population would likely have a positive predictive value greater than that obtained by using a 5% prevalence adjustment. Similarly, individuals attending an adult ADHD clinic would be expected to have a higher rate of childhood ADHD.
Diagnostic Utility of Specific Symptoms
We found a level of accuracy of recall greater than that reported in other studies (6–8). However, when prevalence was taken into consideration, specific symptoms had limited diagnostic utility. Positive predictive values ranged between 9% and 18%, indicating that only one in 10 to one in five individuals with a retrospective diagnosis would have a true case of ADHD (Table 3).
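As one illustration of how a single symptom fares at 5% prevalence, the sketch below projects counts onto the hypothetical population described earlier (176 true cases among 3,520 subjects), using the sensitivity (0.80) and specificity (0.74) reported in this article for acting before thinking. Whether the tabled values were derived in exactly this way is an assumption; the variable names are illustrative.

```python
# Hypothetical population at 5% prevalence, as described in the text.
TRUE_CASES = 176
TOTAL_SUBJECTS = 3520
NONCASES = TOTAL_SUBJECTS - TRUE_CASES  # 3,344

# Values reported in this article for "acting before thinking".
SENSITIVITY = 0.80
SPECIFICITY = 0.74

# Projected counts of subjects who would endorse the symptom.
true_positives = SENSITIVITY * TRUE_CASES
false_positives = (1 - SPECIFICITY) * NONCASES

# Proportion of endorsers who actually had childhood ADHD.
ppv = true_positives / (true_positives + false_positives)
print(f"PPV for acting before thinking at 5% prevalence: {ppv:.1%}")
```

The result falls within the 9% to 18% range reported for the symptom ratings.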
The six most discriminating behaviors (distractibility, concentration difficulties, and complaints of inattention, for inattention; acting before thinking, for impulsivity; and being "on the go" and fidgeting/squirming, for hyperactivity) practically defined the symptoms that they represented. The least discriminating items, disorganization (odds ratio=2.87) and losing things (odds ratio=2.15), have interesting histories in the classification systems. Disorganization was classified as an impulsivity behavior in DSM-III, was not included in DSM-III-R because of low discriminating power in the field trials (odds ratio=2.81) (22), and was reinstated as an inattention behavior in DSM-IV. Losing things was not included in DSM-III, obtained one of the lowest odds ratios (odds ratio=4.37) among the 15 items in the DSM-III-R field trials (22), and was included in DSM-IV as an inattention behavior.
The present study assessed the accuracy of retrospective diagnoses of ADHD and did not address whether different criteria should be used in making a current childhood diagnosis versus a retrospective childhood diagnosis. Recognizing that different standards may apply to retrospective diagnoses, we suggest that the following considerations might be appropriate for the DSM-V criteria for ADHD:
1) Of the six most discriminating symptoms, one, acting before thinking, was not included in DSM-IV. This symptom was included in DSM-III, but not DSM-III-R. Considering its ability to differentiate subjects in the present study (odds ratio=10.96, sensitivity=0.80, specificity=0.74) and considering that the most discriminating items had the highest face validity, this symptom should be considered for inclusion in the DSM-V criteria for ADHD.
2) Disorganization and losing things do not appear to contribute meaningfully to retrospective diagnoses of ADHD. In view of their performance in the present study, as well as their unclear histories, the DSM-V task force should consider dropping these items from the criteria for ADHD.
Implications and Limitations
The results of the present study do not bode well for relying on information from adults’ self-reports to identify a history of childhood ADHD in the general population. For example, the use of general adult population surveys to estimate the prevalence of childhood ADHD would be expected to yield substantial overestimates. However, the results are more encouraging for groups of adults that are presumed to include a large proportion of individuals with childhood ADHD.
The present findings highlight the importance of obtaining contemporaneous data to substantiate retrospective diagnoses of childhood ADHD, since, for any given group of adults, the true proportion of individuals with childhood ADHD is not known. Future studies should examine whether data from knowledgeable informants about childhood behaviors will further improve the validity of retrospective diagnoses. In a treatment study by Wender et al. (28), adult patients who reported chronic histories of restlessness, impulsivity, irritability, and emotional lability since childhood were randomly assigned to receive pemoline (a psychostimulant) or placebo. No significant treatment differences were found. However, when analyses were limited to patients who were rated by their parents as markedly hyperactive in childhood, pemoline emerged as highly and significantly more effective than placebo. Using treatment response as a validity indicator, these findings suggest that knowledgeable informants may be needed to ensure accurate retrospective diagnoses of ADHD.
A related issue concerns the interpretation of the retrospective diagnosis. To what extent are problems in retrospective recall associated with the retrieval process, and to what extent are they a function of children’s awareness of ADHD problem behaviors? Similarly, are there differences between individuals who, as children, knew they had ADHD and those who did not? Although we cannot address these questions in the present study, future follow-up investigations should do so with appropriate research designs.
The current DSM-IV standards did not exist at the time these children were identified. However, the original study inclusion criteria required cross-situational impairment resulting from ADHD symptoms, as does DSM-IV for a diagnosis of ADHD. In addition, while the behavioral manifestations listed in each successive revision of DSM have changed, the three core symptoms of inattention, impulsivity, and hyperactivity have remained consistent throughout. Also, as reported elsewhere (13), other childhood measures (e.g., blinded, classroom observations and teacher ratings) of these behaviors are consistent with a DSM-IV diagnosis of ADHD.
The present research included a clinic-referred group of Caucasian male subjects with ADHD. The results cannot be simply generalized to other populations; however, the nature of the findings is such that we believe general applicability is likely.
Received Dec. 21, 2001; revision received May 29, 2002; accepted June 10, 2002. From the New York University Child Study Center, New York University School of Medicine; the Nathan S. Kline Institute for Psychiatric Research, Orangeburg, N.Y.; the Department of Psychiatry, Columbia University, New York; and the Department of Psychology, New York University. Address reprint requests to Dr. Mannuzza, New York University Child Study Center, 215 Lexington Ave., 13th Floor, New York, NY 10016. Supported by NIMH grants MH-18579 and MH-30906.