We included randomized controlled trials (including studies with counterbalanced crossover designs) that were published in peer-reviewed journals at any time from the inception of the databases. We limited our search to published trials to ensure a level of methodological adequacy and rigor among included trials and to avoid the inevitable problems with securing access to a full set of unpublished trials and the bias that this would introduce (18). Participants (ages 3 to 18 years) had a diagnosis of ADHD of any subtype (DSM-defined ADHD or ICD-defined hyperkinetic disorder, as well as historic variants; we excluded minimal brain dysfunction) or met accepted criteria for clinical levels of symptoms on validated ADHD rating scales. Studies had to have an appropriate control condition. For studies that used two control conditions, we selected the most stringent, in the following order: sham/placebo, attention/active control, treatment as usual, waiting list. Treatment as usual could include medication, but trials were excluded if the nonpharmacological therapy was an adjunct to medication or if both interventions were combined into one therapeutic arm as part of the study design. For instance, studies evaluating the additional benefit of nonpharmacological therapies to already effective medication were excluded. Because allowing medication in treatment as usual may have reduced effect sizes for the nonpharmacological comparator, we conducted sensitivity analyses to compare effect sizes for those trials with low/no medication. Studies in which enrollment depended on the presence of rare comorbid conditions (e.g., fragile X syndrome) were excluded.
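The control-selection rule described above can be sketched in a few lines. This is a minimal illustration only: the stringency order comes from the text, while the label strings and the function name are hypothetical.

```python
# Stringency order as stated in the text, most stringent first.
# The label strings themselves are hypothetical identifiers.
CONTROL_STRINGENCY = [
    "sham_placebo",
    "attention_active",
    "treatment_as_usual",
    "waiting_list",
]


def most_stringent_control(available):
    """Return the most stringent control condition a trial offers."""
    for label in CONTROL_STRINGENCY:
        if label in available:
            return label
    raise ValueError("no recognized control condition")
```

For a trial with both a waiting-list arm and an attention control arm, the sketch selects the attention control.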
A common search strategy was employed for all treatment domains, using a broad range of electronic databases: Science Citation Index Expanded; Social Sciences Citation Index; Arts and Humanities Citation Index; Conference Proceedings Citation Index–Science; Conference Proceedings Citation Index–Social Sciences and Humanities; Index Chemicus; Current Chemical Reactions; Current Contents Connect; Derwent Innovations Index; Biological Abstracts; BIOSIS Previews; CAB Abstracts and Global Health (both from CABI); Food Science and Technology Abstracts; Inspec; MEDLINE; Zoological Record; Ovid MEDLINE; PsycINFO; EMBASE Classic+EMBASE; Web of Science; ERIC; and CINAHL. Articles written in English, German, Spanish, Dutch, and Chinese were included in the search. Common terms for participants (e.g., all variants of ADHD, hyperkinetic disorder, attention deficit) and study design terms were used across domains. The design terms were randomized controlled trial(s); cluster randomized controlled trial(s); clinical trial; controlled clinical trial; crossover procedure or crossover study; crossover design; double blind procedure; double blind method; double blind study; single blind procedure; single blind method; single blind study; random allocation; randomization; random assignment; and randomized controlled trial. 
Separate treatment terms were used: 1) restricted elimination diet: few foods diet, elimination diet, oligoantigenic diet, restriction diet, food intolerance, food allergy, and food hypersensitivity; 2) artificial food color elimination: food color, food dye, Feingold diet, Kaiser Permanente diet, K-P diet, tartrazine, azo dye, carmoisine, sunset yellow, brilliant blue, indigotine, allura red, quinoline yellow, and ponceau 4R; 3) free fatty acid supplementation: essential fatty acid, long-chain polyunsaturated fatty acids, omega-3, omega-6, docosahexaenoic acid, eicosapentaenoic acid, and arachidonic acid; 4) cognitive training: cognitive training, attention training, working memory training, cognitive remediation, executive function training, and cognitive control; 5) neurofeedback: neurofeedback, EEG biofeedback, neurotherapy, and slow cortical potentials; and 6) behavioral interventions: contingency management, management techniques, contingency techniques, psychosocial interventions, psychosocial treatment, psychosocial therapy, social skills training, social skills intervention, social skills treatment, problem solving intervention, problem solving treatment, problem solving training, problem solving therapy, behavior modification, cognitive behavior treatment, cognitive behavior therapy, cognitive behavior training, parent training, parent counseling, parent support, school-based, classroom-based, school intervention, classroom intervention, teacher training, after-school or remedial teaching, peer tutoring, computer assistance learning, task modification, curriculum modification, classroom management, education intervention, multimodal intervention, multimodal treatment, multimodal therapy, educational intervention, and verbal self-instruction training. Our search terms for behavioral interventions covered a wide variety of intervention types with the aim of being as thorough as possible.
However, in the end, all the trials that met our criteria involved some element of behavioral training based on social learning or operant techniques. For the specific syntax and language-specific formulations used in different databases, see the published study protocol. Database searches were supplemented by manual searches of published reviews. Two coauthors (S. Cortese and M. Ferrin) separately conducted and cross-checked all searches, which were finalized on April 3, 2012.
The outcome measure was pre- to posttreatment change in total ADHD symptom severity measured at the first posttreatment assessment. Results from ADHD-specific symptom scales were used where available (e.g., the DSM-IV ADHD subscale of Conners’ Parent and Teacher Rating Scales) (19). We also permitted questionnaire measures of ADHD-related dimensions (e.g., inattention on Rutter parents’ and teachers’ scales [20]) as well as direct observations.
Trials were blindly double-coded for eligibility. Articles were initially screened on the basis of titles and abstracts, and assessment of articles for final inclusion was based on full text. Disagreements not resolved by coders (N=6) were arbitrated by either of two authors (E. Sonuga-Barke or J. Sergeant) who were independent of the domain-specific work groups. The process was independently validated by another author (E. Simonoff) on the basis of “near miss” cases. Study quality was assessed by two independent raters (with disagreements resolved by E. Simonoff) using the standard definitions for randomization, blinding, and treatment of missing data provided by Jadad et al. (21).
Sample and design information for included trials was entered into RevMan, version 5.0 (http://ims.cochrane.org/revman) to provide a systematic record of study features (22). Data were extracted by a single person in each domain and independently checked by another. See the published protocol for a list of data extracted.
Individual effect sizes (the standardized mean difference) were based on the recommended formula: mean pre- to posttreatment change minus the mean pre- to posttreatment control group change divided by the pooled pretest standard deviation with a bias adjustment (23). Crossover trials were treated as parallel group trials because insufficient data were provided to permit analysis of within-individual change (e.g., there were no correlations of scores between conditions). This is a conservative approach, equivalent to setting the between-condition correlation to zero (24). In this case, the pretest (baseline) standard deviation was used as the denominator in the calculation of the standardized mean difference. When necessary, missing standard deviations were imputed separately for each of the outcome measures. The reported pretest standard deviations for each outcome measure were pooled across trials, and the value at the third quartile was adopted for studies with missing standard deviation values (25). Standardized mean differences for trials in each domain were combined using the inverse-variance method, in which the reciprocal of their variance is used to weight the standardized mean difference from each trial before being combined to give an overall estimate (26). Given the heterogeneity of ADHD assessments, sample characteristics, and implementation of treatments within domains in the included studies, we chose a priori to use random-effects models, as recommended by Field and Gillett (27). The I² statistic was calculated, a posteriori, as an estimate of between-trial heterogeneity in standardized mean difference, although given the number of trials included, the power to detect heterogeneity in these analyses is relatively low (28).
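The calculations described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used for the review. The bias adjustment is assumed to be the usual small-sample correction factor 1 - 3/(4·df - 1). The between-trial variance is assumed to come from the DerSimonian-Laird estimator, which the text does not name. The quartile convention used for imputation is also an assumption. All function and parameter names are hypothetical, and trial-level variances are taken as inputs rather than derived.

```python
import math
import statistics


def smd_pre_post_control(pre_t, post_t, sd_pre_t, n_t,
                         pre_c, post_c, sd_pre_c, n_c):
    """Bias-adjusted standardized mean difference from pre/post means."""
    df = n_t + n_c - 2
    # Pooled pretest SD across the treatment and control arms.
    sd_pre = math.sqrt(
        ((n_t - 1) * sd_pre_t ** 2 + (n_c - 1) * sd_pre_c ** 2) / df
    )
    # Small-sample bias adjustment (assumed form, see lead-in).
    correction = 1 - 3 / (4 * df - 1)
    # Treatment change minus control change; the sign depends on
    # whether higher scale scores mean more or fewer symptoms.
    change_diff = (post_t - pre_t) - (post_c - pre_c)
    return correction * change_diff / sd_pre


def impute_sd(reported_sds):
    """Third-quartile value of the pretest SDs reported for one measure."""
    # "inclusive" is one of several quartile conventions (assumption).
    return statistics.quantiles(reported_sds, n=4, method="inclusive")[2]


def pool_random_effects(effects, variances):
    """Inverse-variance pooling with a DerSimonian-Laird tau^2 (assumed)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * d for wi, d in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (d - fixed) ** 2 for wi, d in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each trial's variance.
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * d for wi, d in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    # I^2: percentage of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, se, tau2, i2
```

A caller would compute one standardized mean difference per trial, substitute `impute_sd(...)` wherever a pretest standard deviation is missing, and then pass the per-trial effects and their variances to `pool_random_effects` for each treatment domain.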
The most proximal assessment analysis used a report by the rater closest to the therapeutic setting as the outcome measure (i.e., parent ratings, except for teacher-based interventions, for which teacher ratings or direct observations were used). If ratings of total ADHD symptoms (inattention, hyperactivity, and impulsivity) were not reported, then the next most appropriate available measure was used (e.g., ratings of one ADHD dimension). Ratings of non-ADHD-related dimensions were not included in the analyses. The probably blinded assessment analysis included both placebo- and non-placebo-controlled trials with an ADHD assessment made by an individual likely to be blind to treatment. In trials in which more than one such measure was available, the best blinded measure was selected. In designs without a placebo or sham treatment that were implemented in the home, the probably blinded measures were either direct observations by an independent researcher or teacher ratings, as parent ratings were not considered probably blinded assessments. If the intervention was implemented at school, teacher ratings were not considered probably blinded assessments. When two measures were available, we considered independent direct observation as the best probably blinded assessment measure. In placebo or sham-treatment controlled trials, where all measures were likely to have some degree of blinding, parent ratings (home-implemented) and teacher ratings (school-implemented) were considered probably blinded assessments. For home-based interventions, direct observation or teacher ratings (in that order of preference) were considered better probably blinded assessments. Of the included studies, 93% of dietary and 54% of psychological trials had probably blinded assessments.
Sensitivity analyses examined the impact of background ADHD medication use in trial samples on probably blinded assessments; these were run for domains in which at least three trials had fewer than 30% of participants receiving medication (i.e., were no/low medication trials). Random-effects meta-regression was used to test whether lower-quality trials (as represented by total Jadad score) had larger effect sizes. Given the relatively small number of methodologically sound studies, the field is not yet mature enough for the investigation of publication bias using funnel plots, the interpretation of which is, moreover, equivocal when based on a small number of studies (29). In addition, it is problematic to distinguish between the effects of study heterogeneity and publication bias with sparse data (30).
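The selection of no/low medication trials for the sensitivity analysis can be sketched as below. The 30% cutoff and the three-trial minimum come from the text. The record layout (a `medicated_fraction` key holding a proportion between 0 and 1) and the function name are hypothetical.

```python
def low_medication_subset(trials, max_fraction=0.30, min_trials=3):
    """Return a domain's no/low medication trials, or None if too few.

    Each trial is a dict with a hypothetical "medicated_fraction" key:
    the proportion of participants receiving ADHD medication.
    """
    subset = [t for t in trials if t["medicated_fraction"] < max_fraction]
    # The sensitivity analysis is only run with at least three such trials.
    return subset if len(subset) >= min_trials else None
```

A domain for which this returns `None` would simply be left out of the medication sensitivity analysis.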