What can the Sally-Anne task really tell you?

A Sandy who’sa what test?

There are two girls in front of you: one with a basket (Sally) and one with a box (Anne). Sally puts a marble inside of her basket. Then, she leaves the room. Anne walks over to the basket, takes the marble out, and puts it in her box. Sally comes back to the room with both the basket and box in front of her. Where will she look for the marble?

That’s the gist of the Sally-Anne test. A correct answer depends on your understanding that Sally did not see Anne move the marble to her box. That means Sally is going to look for her marble where she left it: in her basket. Interestingly, not every child can answer this question correctly. While some will say that Sally will look in the basket, many insist that she will look in the box, where Anne moved the marble, even though Sally could not possibly know that.

Researchers have found a specific age range when children begin to answer this question correctly. In 1983, Heinz Wimmer and Josef Perner read a story to children in which one character, Maxi’s mother, moves a chocolate bar while Maxi is away, and Maxi then returns to the scene of the crime. Each child being tested knows that the chocolate bar has been relocated, but Maxi does not. Older children, between the ages of four and six, were better at responding correctly to the question, “Where will Maxi look for the chocolate bar?” Maybe this isn’t surprising; after all, older children have better control over their responses, a longer attention span, and a larger vocabulary.

Two years after Wimmer and Perner published their original study, Simon Baron-Cohen and colleagues streamlined the task further. The main difference was that, instead of being presented as a story, the scenario was acted out with dolls. They also tested children with an Autism Spectrum Disorder (ASD) diagnosis and children with a Down syndrome diagnosis. Across all three groups, the children ranged in age from about three and a half to 17. Typically developing children and children with Down syndrome correctly answered the question, “Where will Sally look for her marble?” at almost identical rates (85% and 86%, respectively). Children with an ASD diagnosis, however, answered incorrectly about as often as the other groups answered correctly (80% of the time). Thus, the notorious Sally-Anne task was born.

A replication of the study in 1988 (Leslie & Frith, 1988) exclusively tested those with an ASD diagnosis, this time using real people instead of dolls. It found similar results: the majority of individuals with autism pointed to where the object truly was rather than to where the actor would look for it. The researchers argued that this is because, instead of picturing what another person has seen, children with ASD respond with what they, themselves, have seen.

Taking what we thought we think–and making us think we thought our thoughts we’ve been thinking our thoughts we think we thought–I think…

Yes, it’s a SpongeBob quote, but I promise it’s related. Patrick is assuming that the aliens are evil and that, as evil beings, they will try to control people’s thoughts. Making this type of prediction is not too far off from the natural process that neurotypical people follow when making inferences about others’ intentions and actions. Being able to think about what another person knows is called Theory of Mind (ToM). This can consist of anticipating an individual’s physical perspective, intentions, and/or desires. Specifically, the Sally-Anne test probes children’s “belief understanding.”

A belief can either be true or false. A true belief is one where someone’s ideas about a scenario are correct; they match up with reality. A false belief is the opposite; an individual’s beliefs do not match up with reality. For example, the Sally-Anne task demonstrates a “false belief” situation. Sally’s belief about where the marble is does not match reality.
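If it helps to see the logic spelled out, here is a toy sketch in Python of the scenario described above. It is only an illustration, not anything taken from the studies: the names `Observer`, `witness`, `run_sally_anne`, and `classify_belief` are made up for this example, and the one assumption baked in is that a character's belief changes only when she is in the room to see an event.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observer:
    """Hypothetical character who tracks where she believes the marble is."""
    name: str
    present: bool = True
    believed_location: Optional[str] = None

    def witness(self, location: str) -> None:
        # Assumption: a belief updates only if the observer sees the event.
        if self.present:
            self.believed_location = location


def run_sally_anne():
    """Replay the scenario; return (Sally's belief, the marble's real location)."""
    sally = Observer("Sally")

    actual_location = "basket"      # Sally puts the marble in her basket
    sally.witness(actual_location)

    sally.present = False           # Sally leaves the room
    actual_location = "box"         # Anne moves the marble to her box
    sally.witness(actual_location)  # no effect: Sally was not there to see it

    sally.present = True            # Sally returns; her belief is unchanged
    return sally.believed_location, actual_location


def classify_belief(believed: Optional[str], actual: str) -> str:
    """A belief is true when it matches reality and false when it does not."""
    return "true belief" if believed == actual else "false belief"


believed, actual = run_sally_anne()
print(believed, actual, classify_belief(believed, actual))
# prints: basket box false belief
```

Passing the test amounts to answering with Sally's believed location ("basket") rather than the marble's actual location ("box").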

The task has also inspired more modern methods. In 2007, for example, Southgate et al. published an article using a setup of “windows.” As in the Sally-Anne task, participants were faced with two boxes. The new element was a set of windows above the boxes, which a person opened or closed while placing an object inside or taking an object out. By tracking the participants’ eyes, Southgate et al. (2007) found that 2-year-olds looked at the place where they expected the person’s hands to appear before the hands were actually there. The toddlers were anticipating what the person would do based on that person’s true or false belief.

A more recent example comes from research by Moll et al. in 2017. This study used only one box, but sometimes the actor whom the children watched was correct about what was inside, and other times she was incorrect. In other words, sometimes she held a true belief, and other times she held a false belief. When she held a false belief, children were much more interested in the outcome. They anticipated the actor’s surprise upon finding what was actually in the box, which suggested that they knew she had a false belief.

A few things have changed since the 80s

While the studies above presented those with an ASD diagnosis as a completely separate group from typically developing children, autism is a spectrum. Any good academic knows that a statistically significant group difference means only that those with an ASD diagnosis tended to perform worse and those without tended to perform better. Here, the group differences rest on roughly 80 percent of the ASD group failing the test and just over 80 percent of the typically developing children passing it. What, then, of the other 20 percent in each group?

It’s impossible to create a rigid binary between being “on the spectrum” and being typically developing. Some children with autism may pass the Sally-Anne test effortlessly, while some typically developing children may struggle or even fail. This is why future studies should consider comparing groups defined not by the presence of a diagnosis, but by performance (e.g., Korkiakangas et al., 2016).

A final, and most important, point: the Sally-Anne task measures only ToM. ToM is related to, but in fact different from, empathy (Bzdok et al., 2012). While ToM describes a “rational” ability, empathy is an “emotional,” gut-feeling ability. Many individuals with an ASD diagnosis empathize with family, friends, and even strangers, while some typically developing individuals do not. This runs contrary to the negative stereotype that people with autism “lack empathy and cannot understand emotion” (Brewer & Murphey, 2016).

While analysis methods have changed, and will continue to change, the 1985 version of the Sally-Anne test devised by Baron-Cohen et al. remains largely unaltered. With slight variations, this test can be used to help determine which experiences, genetic factors, and environments affect ToM.


Baron-Cohen, S., Leslie, A. M., & Frith, U. (1985). Does the autistic child have a “theory of mind”? Cognition, 21(1), 37–46. https://doi.org/10.1016/0010-0277(85)90022-8

Brewer, R., & Murphey, J. (2016, July 13). People with Autism Can Read Emotions, Feel Empathy. Scientific American. Retrieved September 6, 2022, from https://www.scientificamerican.com/article/people-with-autism-can-read-emotions-feel-empathy1/

Bzdok, D., Schilbach, L., Vogeley, K., Schneider, K., Laird, A. R., Langner, R., & Eickhoff, S. B. (2012). Parsing the neural correlates of moral cognition: ALE meta-analysis on morality, theory of mind, and empathy. Brain Structure & Function, 217(4), 783–796. https://doi.org/10.1007/s00429-012-0380-y

Korkiakangas, T., Dindar, K., Laitila, A., & Kärnä, E. (2016). The Sally-Anne test: An interactional analysis of a dyadic assessment. International Journal of Language & Communication Disorders, 51(6), 685–702. https://doi.org/10.1111/1460-6984.12240

Leslie, A. M., & Frith, U. (1988). Autistic children’s understanding of seeing, knowing and believing. British Journal of Developmental Psychology, 6(4), 315–324. https://doi.org/10.1111/j.2044-835X.1988.tb01104.x

Moll, H., Khalulyan, A., & Moffett, L. (2017). 2.5‐Year‐Olds Express Suspense When Others Approach Reality With False Expectations. Child Development, 88(1), 114–122. https://doi.org/10.1111/cdev.12581

Southgate, V., Senju, A., & Csibra, G. (2007). Action Anticipation through Attribution of False Belief by 2-Year-Olds. Psychological Science, 18(7), 587–592. https://doi.org/10.1111/j.1467-9280.2007.01944.x

Wimmer, H., & Perner, J. (1983). Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children’s understanding of deception. Cognition, 13(1), 103–128. https://doi.org/10.1016/0010-0277(83)90004-5