The ability to infer what other people are thinking and feeling is one of the most fundamental aspects of human social interaction. This capacity is also known as "mentalizing," having a "theory of mind," or possessing an "intentional stance" (1–7). This ability forms the basis of such diverse human activities as negotiating treaties, deciding to make a charitable donation, or selecting a day-care center for one’s child. It is also the basis for many of the interactions that occur during the practice of psychiatry, such as conducting a diagnostic interview, assessing a person’s motives or psychodynamics, performing psychotherapy, assessing the severity of a syndrome and the need for medications, or determining suicidal risk. Therefore, understanding how our brains and minds actually "mentalize" is a topic of great interest. The tools of functional neuroimaging now permit us to observe one person’s mind as it attempts to infer what is occurring in another person’s mind, thereby making a previously mysterious process accessible to scientific investigation.
Several recent studies have begun the process of exploring the neural basis of theory of mind (TOM): how networks and faculties within our brains and minds activate and interact when we perform mental activities that require us to understand the subjective reality of other people. These studies have used a variety of imaging techniques, designs, and test materials.
In one of the first positron emission tomography (PET) studies of TOM, Goel et al. (8) showed activation of the left medial frontal gyrus and the left middle and superior temporal gyri when normal subjects made inferences about the mind of another person (someone from Columbus’s era), compared with a baseline condition of making inferences about the physical world. Another early PET study compared subjects’ comprehension of "TOM" stories, "physical" stories, and unlinked sentences, and it also revealed selective activation of the left medial frontal lobe in the TOM condition (9). A more recent PET study (10) used a nonverbal task involving cartoons to compare attribution of intent with several control conditions; it too found medial frontal activations, as well as activations in the bilateral temporal lobes and cerebellum. In an fMRI study, Gallagher et al. (11) showed their subjects stories and cartoons that did or did not require TOM processing. They found activations in the medial prefrontal cortex and in the temporoparietal junctions during the TOM task. Taken together, these studies have begun to identify some of the regions involved when human beings attribute mental states to others, and in particular they suggest that medial frontal brain regions play an important role.
Most studies to date have attempted to dissect components of TOM with designs that are relatively simple and therefore clean and elegant, and most have examined passive responses to stimuli rather than active engagement in the TOM task. However, mentalizing as it occurs in daily life is neither a simple cognitive process nor usually a passive one. Evidence from evolutionary biology suggests that a fully developed TOM ability is limited to human beings and requires multiple cognitive capacities and systems (1, 3, 12–16). It has been suggested that the capacities for language and episodic memory contribute to the highly developed human ability to mentalize (4, 10, 16–21).
Therefore, we chose to conduct a PET study of TOM with a task that requires subjects to actively put themselves in another person’s place and attribute mental states to that person. We asked them to imagine and describe the experience of a stranger encountered by chance on a park bench. While we measured regional cerebral blood flow (rCBF) with PET, healthy subjects verbally described what the stranger had experienced. We analyzed the subjects’ utterances to assess their "mentalizing" capacities. The abilities examined during this task are similar to those used by psychiatrists when they take a patient’s history and attempt to imagine the internal experience of another person. They are also similar to those used in many types of psychotherapy, including psychodynamic psychotherapy. This study therefore provides an opportunity to begin to understand the neural basis of the mental processes used in some aspects of the practice of psychiatry.
The subjects were 13 healthy right-handed volunteers (seven women and six men) recruited from the community. They were screened to rule out a current or past history of psychiatric illness by using a short version of the Comprehensive Assessment of Symptoms and History (22). They were also evaluated by means of a history and physical examination to rule out any current or past history of neurological or general medical illness. Their mean age was 26.5 years (SD=6.4), and their mean educational achievement was 14.6 years (SD=0.9). They had a mean full-scale IQ of 108 (SD=9). All gave written consent to a protocol approved by the University of Iowa Human Subjects Institutional Review Board.
During the experimental task, referred to as the "TOM story," the subject was told: "Imagine you sat next to a woman (for men), or next to a man (for women) on a park bench and you realized she (he) was crying. Make up a story about what led up to her (his) crying." These instructions were presented to the subjects on a video monitor positioned 12–13 inches from their eyes. Subjects were given 30 seconds to read the instructions and plan their narrative before beginning their story. They were allowed to speak for approximately 100 seconds, and PET data were acquired during this time interval. The control/comparison task ("read story") was to read aloud a simple story requiring no attribution of mental states. This story was also presented on the video monitor. The subjects were required to read for 40 seconds. If they finished reading the story before the time was up, they had to restart from the beginning. The experimental and control tasks both required the subjects to read instructions or other material and to speak aloud in continuous narrative speech, thereby isolating the TOM component in the experimental task. Subjects were audiotaped during the two conditions, and transcriptions were prepared from the tapes. Both conditions were timed so that the subject began speaking 10 seconds before the arrival of the bolus of [15O]H2O in the brain. The two tasks were part of a larger eight-condition study of reading and language production. The order of the tasks in this study was counterbalanced. Results from the analysis of other conditions have not yet been reported.
Each TOM story transcript was rated independently by two raters for the occurrence of mental state attributions to self or others. Every TOM story was divided into separate sentences or parts on the basis of what was being relayed. Each of these parts is referred to as an utterance, defined as an idea that can be easily distinguished from the ones that precede and follow it. Each utterance was scored as level 0 if no mental state was attributed, level I if one level of attribution of mental states to self or other was made, and level II if at least two levels of attribution of mental states to self or other were made (see Appendix 1 for examples of rating). Both raters were trained by reading the available literature and by agreeing on guidelines to follow (17, 23–27). The interrater agreement was 76%. The percentage of mental state attributions was then calculated by dividing the number of utterances containing attributions by the total number of utterances and multiplying by 100. Utterances in which TOM expressions were used merely rhetorically (e.g., "…I guess" and "…you know") were not scored (28).
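The scoring arithmetic can be made concrete with a short sketch. It assumes each utterance has already been segmented and assigned a level by a rater, coded here as the integers 0, 1, and 2 for levels 0, I, and II (a hypothetical encoding), with rhetorical uses already excluded. The agreement function computes simple percent agreement per utterance; the text reports 76% agreement but does not say how it was computed, so this definition is an assumption, and the function names are illustrative only.

```python
from collections import Counter

# Hypothetical encoding of the rating scale: 0 = level 0, 1 = level I, 2 = level II.
LEVELS = (0, 1, 2)


def level_counts(levels):
    """Count how many utterances in one transcript fall at each level."""
    counts = Counter(levels)
    return {level: counts.get(level, 0) for level in LEVELS}


def attribution_percentage(levels):
    """Percentage of utterances containing a level-I or level-II attribution."""
    if not levels:
        raise ValueError("transcript has no scored utterances")
    attributed = sum(1 for level in levels if level >= 1)
    return 100.0 * attributed / len(levels)


def percent_agreement(rater_a, rater_b):
    """Assumed definition: share of utterances given the same level by both raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same, non-empty set of utterances")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return 100.0 * matches / len(rater_a)


if __name__ == "__main__":
    # Made-up scores for one eight-utterance story, for illustration only.
    rater_a = [0, 1, 1, 0, 2, 1, 0, 1]
    rater_b = [0, 1, 0, 0, 2, 1, 1, 1]
    print(level_counts(rater_a))                # {0: 3, 1: 4, 2: 1}
    print(attribution_percentage(rater_a))      # 62.5
    print(percent_agreement(rater_a, rater_b))  # 75.0
```

Because the per-story percentage is computed for each transcript and then averaged across subjects, the group mean (0.59 here) need not equal the ratio of the mean level-I and level-II counts to the mean number of utterances.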
Quantitative PET blood flow data were acquired on a GE 4096-plus whole-body scanner (General Electric Systems, Milwaukee) after a bolus injection of [15O]H2O (29). To acquaint each subject with the imaging conditions and to determine stimulus timing, the time from injection to bolus arrival in the brain was measured individually with an initial 15-mCi scout injection, during which the subject read a list of words presented on the monitor. All subsequent scans used a 50-mCi intravenous bolus of [15O]H2O. Imaging began at the time of injection (t=0) and continued for 100 seconds as 20 frames of 5 seconds each. Arterial blood was sampled from a catheter placed in the radial artery to obtain the input function needed to calculate tissue perfusion (in milliliters per minute per 100 g of tissue).
On the basis of the time-activity curves over major cerebral arteries, the eight frames (40 seconds) of postbolus transit were summed and reconstructed into 2-mm voxels (128×128 matrix) with a Butterworth filter (order=6, cutoff frequency=0.35 Nyquist units) (30). From this summed image and the measured arterial input function, CBF was calculated on a pixel-by-pixel basis with the autoradiographic method (31). The CBF values were then normalized by dividing them by global CBF. To reduce anatomical variability, an 18-mm Hanning filter was applied. Imaging was repeated at approximately 15-minute intervals.
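The frame summation and global normalization steps can be sketched with NumPy. This is a minimal illustration, not the BRAINS pipeline, and it does not implement the autoradiographic CBF calculation or the filtering: it assumes the frames are stored as an array with time as the first axis, that the index of the bolus-arrival frame is already known from the scout injection, and that global CBF is the mean over a brain mask. The function names are hypothetical.

```python
import numpy as np


def sum_postbolus_frames(frames, arrival_frame, n_frames=8):
    """Sum n_frames consecutive 5-s frames (8 x 5 s = 40 s) starting at bolus arrival.

    frames: array of shape (n_total_frames, ...) with time on the first axis.
    arrival_frame: hypothetical index of the first post-bolus frame.
    """
    frames = np.asarray(frames, dtype=float)
    stop = arrival_frame + n_frames
    if arrival_frame < 0 or stop > frames.shape[0]:
        raise ValueError("post-bolus window falls outside the acquired frames")
    return frames[arrival_frame:stop].sum(axis=0)


def normalize_to_global(cbf, brain_mask):
    """Divide every voxel by global CBF, taken here as the mean inside the brain mask."""
    cbf = np.asarray(cbf, dtype=float)
    mask = np.asarray(brain_mask, dtype=bool)
    global_cbf = cbf[mask].mean()
    return cbf / global_cbf
```

Dividing by global CBF makes images comparable across injections whose overall flow level differs, so the later subtraction reflects regional rather than global changes; an alternative convention rescales to a fixed global mean, which changes only a constant factor.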
Magnetic Resonance Imaging
MR scans, used for anatomic localization of functional activity, were obtained for each subject with a standard T1-weighted three-dimensional spoiled gradient recall acquisition sequence on a 1.5-T GE Signa scanner (General Electric Systems, Milwaukee) (TE=5 msec, TR=24 msec, flip angle=40°, NEX=2, field of view=26 cm, matrix=256×192, 1.5-mm slice thickness).
The normalized quantitative PET blood flow images and MR images were analyzed with the locally developed software package BRAINS (32–34). The outline of the brain was identified on the MR images by a combination of edge detection and manual tracing. The MR scans were volume rendered, and the anterior commissure-posterior commissure line was identified and used to realign each subject’s brain to a standard position in Talairach coordinate space (35). Each individual’s PET image was then fit to that individual’s MR scan with a surface-fit algorithm (36). Subjects were checked for head movement with each injection, and images were individually refit as needed. The MR images of all subjects were averaged so that the PET data could be localized on coregistered MR images. The coregistered images were resampled and simultaneously visualized in all three orthogonal planes.
Student’s t test was used to analyze the differences in behavioral performance across groups and conditions.
Statistical analysis for the within-group comparison of the PET images was performed with an adaptation (29) of the method of Worsley et al. (37). Images were resampled to 128×128×80 voxels within the Talairach atlas bounding box. A within-subject subtraction of the experimental and control images was then performed, followed by across-subject averaging of the subtraction images and computation of voxel-by-voxel t tests of the changes in rCBF. Significant regions of activation were identified on the t-map images with a technique that corrects for the large number of voxel-by-voxel t tests, the lack of independence between voxels, and the resolution of the processed PET images (29, 37). Areas containing at least 50 voxels with a t value greater than 3.61 (equivalent to p=0.0005, uncorrected) are reported in the tables, together with the highest t value (t max) and the total number of voxels in each region.
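The core of this procedure (within-subject subtraction, voxel-by-voxel t test, then keeping only contiguous suprathreshold regions of at least 50 voxels) can be sketched as follows. This is a simplified stand-in, not the authors’ adaptation of the Worsley method: it uses each voxel’s own across-subject variance, makes no attempt to reproduce the correction for spatial smoothness, and assumes 6-connectivity (voxels sharing a face) to define a region. The function names and connectivity rule are assumptions; only the threshold (t>3.61) and minimum size (50 voxels) come from the text.

```python
from collections import deque

import numpy as np

# Assumed connectivity: six face-sharing neighbours (18- or 26-connectivity are also used).
NEIGHBOR_OFFSETS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def voxelwise_t(task, control):
    """One-sample t of within-subject (task - control) differences at every voxel.

    task, control: arrays of shape (n_subjects, x, y, z) holding normalized CBF.
    """
    diff = np.asarray(task, dtype=float) - np.asarray(control, dtype=float)
    n_subjects = diff.shape[0]
    mean_diff = diff.mean(axis=0)       # across-subject average of subtraction images
    sd_diff = diff.std(axis=0, ddof=1)  # per-voxel sample standard deviation
    with np.errstate(divide="ignore", invalid="ignore"):
        t_map = mean_diff / (sd_diff / np.sqrt(n_subjects))
    t_map[~np.isfinite(t_map)] = 0.0  # zero-variance voxels are treated as non-significant
    return t_map


def suprathreshold_clusters(t_map, threshold=3.61, min_voxels=50, sign=1):
    """Return (n_voxels, t_max, peak_index) for each connected region with sign * t > threshold.

    sign=1 finds TOM > read-story regions; sign=-1 finds read-story > TOM regions.
    """
    above = sign * t_map > threshold
    visited = np.zeros(above.shape, dtype=bool)
    results = []
    for seed in zip(*np.nonzero(above)):
        if visited[seed]:
            continue
        visited[seed] = True
        queue, members = deque([seed]), []
        while queue:  # breadth-first flood fill over neighbouring suprathreshold voxels
            voxel = queue.popleft()
            members.append(voxel)
            for offset in NEIGHBOR_OFFSETS:
                nb = tuple(int(c + d) for c, d in zip(voxel, offset))
                inside = all(0 <= c < size for c, size in zip(nb, above.shape))
                if inside and above[nb] and not visited[nb]:
                    visited[nb] = True
                    queue.append(nb)
        if len(members) >= min_voxels:
            peak = max(members, key=lambda v: sign * t_map[v])
            results.append((len(members), float(t_map[peak]), tuple(int(c) for c in peak)))
    return sorted(results, reverse=True)  # largest regions first
```

Running the search separately for each sign mirrors the reporting convention of the tables, in which positive t values mark regions with greater flow during the TOM task and negative t values mark regions with greater flow during the read-story task.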
Table 1 shows the behavioral data, summarizing the subjects’ performance during the TOM task and the read-story task. The rate of word production per minute did not differ significantly between conditions (read-story task: mean=172, SD=29; TOM task: mean=163, SD=26; p=0.40). During the TOM task, subjects produced a mean of 14.7 utterances (SD=4.0) and 257 words (SD=69). Table 1 also shows the average number of mental state attributions at each level during the TOM task, as scored by the two raters. There were, on average, 6.3 level-0 utterances (SD=3.7), 8.3 level-I utterances (SD=2.8), and 0.2 level-II utterances (SD=0.5) per story. Fifty-nine percent of the utterances contained attribution of mental states to self or others (i.e., utterances with level-I or level-II attributions made up a mean proportion of 0.59, SD=0.22, of all utterances).
Table 2 shows the results of subtracting the read-story condition from the TOM-task condition. The table lists the brain regions with higher rCBF during one condition than during the other, with region names based on inspection of the coregistered MR and PET images, together with the x, y, and z coordinates from the Talairach atlas. As described above, only areas containing at least 50 voxels with a t value greater than 3.61 are reported, along with the highest t value (t max) and the total number of voxels in each region. Positive t values indicate regions with greater blood flow during the TOM-task condition than during the read-story condition; negative t values indicate regions with greater blood flow during the read-story condition. Coordinates follow the Talairach atlas convention: x is the left/right position relative to the midsagittal plane, y is the rostral/caudal (anterior/posterior) position relative to a verticofrontal plane through the anterior commissure, and z is the superior/inferior position, with zero at the anterior commissure-posterior commissure plane.
Figures 1, 2, 3, and 4 show four sets of PET images selected to illustrate the activations listed in Table 2. Results are displayed in two ways. One presentation shows only the peaks, as defined by the volume measurement, superimposed on the composite average MR image from the 13 subjects. The other presentation, referred to as the "t map," shows color-coded t values for all voxels in the image. The two presentations provide complementary information: the peak map identifies areas of activation by a strict definition based on a relatively arbitrary cutoff, while the t map gives a more descriptive picture of the geography of the circuitry involved. Regions activated by the TOM task appear in shades of red/yellow, and regions more active during the read-story task appear in shades of blue.
The TOM condition appears to activate an extensive network located primarily in the left hemisphere. Examples of the components of this network are shown in Figures 1, 2, 3, and 4. As in all previous functional imaging studies of TOM, the medial frontal cortex is activated. Multiple additional frontal activations are observed in both the superior and inferior frontal regions; one of the superior frontal regions lies on the boundary of the cingulate gyrus (i.e., the paracingulate), and activations are also present in both the anterior cingulate and the retrosplenial cingulate. Additional regions activated during the TOM task include the angular gyrus of the parietal lobe and the anterior pole of the temporal lobe.
The largest activation during the TOM task, however, occurs in the right cerebellum and comprises 10,360 voxels. This activation reflects the interaction of the right cerebellum with the contralateral left-hemisphere cortical activations, as predicted by cross-hemispheric diaschisis. Three smaller cerebellar activations are also present, including two in the anterior lobe of the vermis.
As shown in Table 2 and in Figures 1, 2, 3, and 4, the control task activates regions engaged by reading a story aloud. These activations occur primarily in the right hemisphere. As expected, the bilateral visual cortex is extensively activated, reflecting the need to visually scan the material being read. Heschl’s gyrus (auditory cortex) is also activated, as subjects listen to themselves read. Other areas activated by reading include several inferior frontal, superior parietal, and inferior temporal regions.
This study adds to the growing literature on TOM by examining a complex mentalizing task that requires the use of language and active imagining of another person’s mental state. It replicates some earlier findings and adds new ones. The TOM task activated a distributed group of brain regions that are engaged when individuals attempt to imagine and verbally describe the subjective state of someone they have never met. This mental activity resembles the state that psychiatrists think of as "the empathic mode."
The medial frontal cortex appears to be a crucial region for this TOM task. Like the four previous studies that used PET or fMRI to investigate TOM (8–11), this study showed that the medial frontal region is activated when human beings attribute mental states (Figure 1). This region is thought to maintain a representation of the mental state of the self (12). The four previous studies that also found medial frontal activation differed considerably in design and stimulus materials: pictures of various objects for which subjects had to infer the objects’ normal use and model another person’s knowledge about that use (8), inferential versus factual stories (9), cartoons versus stories that also contrast facts with jokes and deception (11), and cartoons that contrast intent with physical causality (10). Yet all of these designs found medial frontal activation, predominantly in the left hemisphere, during the TOM condition. We therefore conclude that this region is centrally involved in the core process of mentalizing, independent of the modality through which the TOM task is presented (e.g., visual versus verbal).
This study differs from its predecessors in two significant respects that may affect its results and illuminate other components of TOM. Unlike the previous studies, which required passive viewing of stimuli and some type of simple response (often pushing a button in response to a question), this study asked subjects to actively imagine the internal state of another person and to describe that state in extended verbal discourse. The task thus required both language and memory to create a TOM narrative. As the behavioral data show, subjects attributed mental states in 59% of their utterances, indicating that the task was appropriate for tapping into TOM. The rate of words produced per minute did not differ significantly between the experimental and control tasks, so we conclude that the verbalization component of the TOM task was subtracted out by the control task. Furthermore, the classic "language production regions" (e.g., Broca’s area, Wernicke’s area, sensorimotor speech areas) were not activated in the experimental task, adding to the evidence that this task successfully examined TOM.
However, because making up a story about another person requires the creative and spontaneous use of language (which the control task did not), we cannot assume that language components were completely eliminated. Nor is such elimination necessarily desirable, given the extensive literature suggesting that language is a crucial component of TOM (10, 17–19, 38). Likewise, it has been suggested that TOM recruits episodic memory (recollection of past personal experience) (16–21). TOM is a complex cognitive process that is likely to require subjects to refer mentally to their own past experiences as they imagine what the other person has experienced. The distributed activations observed in this study must therefore be interpreted in light of the possible contribution of these additional brain systems, which must, to some extent, participate in the "TOM system" when it is challenged by a complex empathic task.
The medial frontal activations observed in this study are part of a large cluster of frontal activations. One group occurs in the superior frontal regions and includes three significant activations: an anterior one (Talairach coordinates: x=–17, y=42, z=20), an intermediate one (coordinates: x=–15, y=31, z=34), and a posterior one (coordinates: x=–20, y=10, z=50) (Figure 2). The posterior activation is very large (4,292 voxels); at a higher significance threshold (t=4.2), it separates into two activations: one centered in the superior frontal gyrus (coordinates: x=–20, y=9, z=50; 4.4 cc; t max=5.99) and one in the anterior cingulate (Brodmann’s area 32) (coordinates: x=–5, y=14, z=40; 0.5 cc; t max=4.76). The function of these multiple superior frontal regions is not definitively known, but they likely reflect various aspects of medial frontal function; that is, they may contribute to the emotional, cognitive, and action components of the representation of the self (12).
In addition, a second group of frontal activations occurs in several regions of the cingulate gyrus (Figures 1 and 4). The activations are bilateral, but the largest (coordinates: x=5, y=32, z=15; 1,095 voxels) is on the right, with some extension into the left hemisphere. The cingulate gyrus has been studied extensively in functional imaging, lesion, and animal studies. A recent review proposed that the role of the anterior cingulate is to juxtapose emotion with focused problem solving, error recognition, self-control, and adaptive response to changing conditions (39). The cingulate gyrus is widely connected with diverse parts of the brain and appears to be central to coordinating and focusing attention on complex tasks. Several lesion, animal, and electrophysiological studies have identified an anterior "emotional" part of the anterior cingulate and a more posterior "cognitive" region (40, 41). Thus, the multiple cingulate activations appear to reflect efforts to integrate the emotional and cognitive components of the TOM task.
A retrosplenial cingulate activation is also present. Several imaging studies have shown that this region plays a role in the expression of emotion and in episodic memory (42–44). Working in conjunction with the other activated regions, the retrosplenial cortex may help focus attention on an emotionally charged topic, retrieve personal experiences that assist the empathic response, and attribute the retrieved information to self and others (4, 16, 20, 21).
Another group of activations may also reflect the language and memory retrieval components of this particular TOM task: the activations in the angular gyrus and the anterior temporal pole (Figure 3). The angular gyrus is a parietal association region with connections to the anterior temporal pole and to prefrontal regions (19). These regions may be activated by the search for specific words, for story content, and for episodic memories. Both frontal and anterior temporal regions have been shown to be active in verbal memory tasks that require recall of word lists, complex narratives, and episodic memories (42, 45, 46). Anterior temporal activations were also observed in all previous PET and fMRI TOM studies (8–11), suggesting that these regions may be a modality-independent part of the TOM system.
The largest activation during the TOM task occurs in the right cerebellum (10,360 voxels) and fills its lateral and medial aspects. There is a small mirror activation in the left lateral cerebellum, and two activations occur in the anterior lobe of the cerebellar vermis (Figures 3 and 4). These findings add to the rapidly growing evidence that the cerebellum is an important "cognitive organ" in the human brain, indicating that it plays a role in yet another complex "higher" nonmotor mental activity: performing a task that requires a theory of mind. Previous studies have shown that the cerebellum is activated in many other cognitive tasks, such as memory for faces, verbal memory, episodic memory, error detection, attention, sensory activation and discrimination, and timing (47–60). Cerebellar activations were also seen in two earlier TOM studies (10, 11). The large right cerebellar activation reflects cross-hemispheric diaschisis: the multiple left-hemisphere frontal, temporal, and parietal regions used in the TOM task are linked to the contralateral cerebellum through recognized anatomic pathways (47–58). The two activations in the vermis presumably reflect a similar link to the midline "paralimbic" activations in the cingulate gyrus. Taken together, these large and multiple cerebellar activations indicate that the cerebellum works in concert with the cerebral cortex to coordinate the process of imagining and describing the internal state of another person, interactively finding, checking, and monitoring the creation of the TOM story. Of interest, a growing body of literature has suggested abnormal cerebellar structure in autism at both the cellular and anatomical levels (61–64), and impaired TOM seems to be a hallmark of this neurodevelopmental illness (23).
Frith and his team (12) have made extensive contributions to the TOM literature, both experimentally and theoretically. They suggested that the brain may possess a "theory of mind system," just as it possesses other systems that have become increasingly well-mapped, such as the language system, the facial recognition system, or the object recognition system. On the basis of their experimental work and studies of brain activity in higher primates, they proposed that the TOM system is composed of three major nodes: medial prefrontal, superior temporal sulcus, and inferior frontal. They proposed that the medial prefrontal node maintains representations of the mental state of the self, that the superior temporal sulcus detects the behavior of agents and analyzes the goals and outcomes of this behavior, and that the inferior frontal region maintains representations of actions and goals (12).
The present study, especially when examined in the context of other recent studies, adds another perspective on the nature of the TOM system in the human brain. Instead of attempting to eliminate the contributions of episodic memory and language, it includes them in the design, recognizing that most efforts to attribute mental states to other people depend heavily on these forms of mental activity. Two competing mechanisms have been proposed for our ability to mentalize. The simulation theory suggests that we draw on our repertoire of past experiences when we attempt to imagine the experiences of others, whereas the "theory theory" suggests that we formulate an abstract representation or set of rules (a theory) and then apply that model to recreate or imagine the experience of others (2, 26, 65, 66). Either mechanism must inevitably use language to conduct the simulation or formulate the theory, and both must also draw on personal past experience (episodic memory). Our approach therefore suggests that both language and episodic memory should be considered components of the TOM system.
The TOM network may consist of a core system and "supportive" components. Lesion studies have shown some brain areas to be essential for TOM processing (2, 67, 68). In the study by Stuss et al. (2), for example, medial frontal lesions, particularly right ventral lesions, impaired the detection of deception, suggesting that this brain area is part of the core system. As for language, there is some evidence that developing normal language skills is also important for the acquisition of TOM (20). Impaired grammar in adulthood, however, does not seem to affect performance on mentalizing tasks (38). This finding suggests that although language centers may contribute significantly to normal TOM processing, some components (e.g., the phonological loop) may be less essential than others (e.g., semantic components). In this study we identified a widely distributed TOM system made up of interactive nodes in frontal and anterior cingulate regions, associative and memory regions in the angular gyrus and anterior temporal lobe, and the cerebellum. Many of these regions have also been found to be active during mentalizing in other TOM imaging studies.
Regardless of how the TOM system is conceptualized, the replication of so many findings across multiple studies suggests that the tools of functional imaging can indeed be used to study complex mental activities. Using the mind to create scientific studies of how one human mind can understand another has become a realizable goal. Freud’s project for a scientific psychology is now well under way.
Received May 15, 2002; revision received Dec. 3, 2002; accepted April 9, 2003. From University of Iowa Health Care, Department of Psychiatry, and Mental Health-Clinical Research Center, Department of Psychiatry, Roy J. and Lucille A. Carver College of Medicine, University of Iowa Hospitals and Clinics, Iowa City, Iowa; the Mental Illness and Neuroscience Discovery (MIND) Institute, Albuquerque, N.M.; and the Department of Psychiatry, University of New Mexico, Albuquerque, N.M. Address reprint requests to Dr. Andreasen, University of Iowa Health Care, Department of Psychiatry, 2880 JPP, 200 Hawkins Dr., Iowa City, IA 52242-1057. Supported by NIMH grants MH-40856, MH-60990, MH-19113, and MHCRC43271. The authors thank Alicia Bales and Eugene Zeien for assistance with data acquisition and data analysis.