How Do You Learn Best? The Role of Information Modality in Learning and Memory

Nicole Elbaz-Deckel and Karina Agadzhanyan

Think back to a time when you missed a school lecture or a work meeting and had to catch up on the material. Did you look over the slides, graphs, and images, or did you listen to an audio recording of the lecture or meeting? How about listening to the audio while reviewing the slides? While we would all like to believe we can effectively absorb the missed material by picking any one of these strategies, it is important to explore how our brain encodes and integrates information presented in different modalities.

Within a modern-day classroom setting, instructors use both physical and digital multimedia materials to teach students of all educational levels. It has been speculated that adding visual and auditory cues may guide cognitive processing in a way that positively affects learning (Xie et al., 2019). Moreover, instructional videos are known not only to provide the to-be-learned information but also to direct learners’ attention to specific aspects of that information (Merrill, 2012). Therefore, when creating instructional materials, visual and auditory aids can be used to help capture student attention and draw focus to important details.

Brief Overview of Information Processing

Most newly encountered information is first detected through our senses, such as sight (iconic memory), hearing (echoic memory), and touch (haptic memory). Auditory stimulation includes sound sources such as complex auditory scenes, isolated auditory objects, and music, whereas visual stimulation involves the processing of scenes, images, graphs, and various other object depictions (Cohen et al., 2009). People tend to react faster to auditory stimuli than to visual stimuli (Jain et al., 2015), and auditory traces generally last longer in sensory memory: visual information decays from iconic memory within a fraction of a second (Sperling, 1960), while auditory information remains in echoic memory for a few seconds (Darwin et al., 1972). If information at the sensory level receives enough attention, it is passed into short-term memory, where it is temporarily stored before it can reach the long-term memory store. Because the two modalities differ at this earliest stage, it is important to understand how auditory and visual stimuli, used separately and in combination, affect student learning.

Considering Stimulus Modality on Memory

Past research assessing whether encoding is most successful when information is presented visually, auditorily, or in both modalities at once has produced varied results. One study observed that memory performance in children aged 7-8 years was better when information was presented auditorily, or in a combined auditory-visual format, than when it was presented visually (Pillai & Yathiraj, 2017). These results support the idea that presenting information auditorily benefits memory more than presenting it visually, and that combining the two modalities is effective as long as an auditory component is present. However, other studies have found contrasting results, concluding that memory for visual stimuli (i.e., pictures) was significantly better than memory for auditory stimuli (i.e., sound clips) and that auditory memory was notably inferior to visual memory (Cohen et al., 2009; Nijboer et al., 2008). Interestingly, one study reported inferior auditory memory even in participants with auditory expertise, such as professional musicians (Cohen et al., 2011). These results imply that visual stimuli benefit memory more than auditory stimuli, which would support the use of visual stimuli as an effective educational tool in classrooms and other learning environments.

An interesting question arises: given that our brain processes visual information differently than auditory information, would the addition of auditory stimuli counteract the retention benefits of visual stimuli? Would incorporating auditory stimuli increase the demand for attentional resources, resulting in lower memory performance, or would it facilitate visual attention and strengthen memory encoding? Since the majority of our daily perceptions come from the integration of multiple sensory stimuli (i.e., they are multisensory), we must consider how visual and auditory stimuli can be integrated effectively and how that integration affects our attention and memory.

Integration of Auditory and Visual Stimuli

Exploring the idea of multisensory integration, often referred to as dual cueing, a study by Xie and colleagues (2019) considered whether binding and coordinating visual and auditory on-screen cues can improve learning compared with using a single cue or no cue. In their experiment, college students were assigned to one of four multimedia conditions: a dual cues group, a visual cues group, an auditory cues group, or a no cues group (control). Participants in each group completed a study phase consistent with their assigned condition, in which they learned a computer-based lesson about neural transmission. Memory performance was measured by averaging test scores within each condition. Results revealed that the group who learned the material with both visual and auditory cues (i.e., dual cues) demonstrated better retention, and ultimately better learning, than the groups who learned the material with no cues or with only one type of cue. This study therefore suggests that people can successfully integrate visual and auditory cues in learning. In other words, multisensory integration may enhance the salience and attentional processing of encountered material, resulting in better encoding.

However, it is important to note that auditory and visual stimuli must be semantically congruent for the binding between them to happen (Chen & Spence, 2010). To illustrate, people are better at identifying the object they are seeing (e.g., a cat) if they are also provided with a matching auditory cue (e.g., the sound of a cat). In fact, it has been shown that adding an auditory cue to a visual cue results in better recognition memory for visual objects (see Matusz et al., 2017 for a review). Such a memory benefit could be explained by the availability of two valid sources of information about the learned material, resulting in additional retrieval cues. Hence, during testing, encountering one type of cue could trigger memory for the other cue, and when the two are properly bound, this can help in identifying and remembering the previously studied material. From a student's perspective, it may be most beneficial to review slides or other visuals while simultaneously following along with the audio recording of the same lecture.


The ability to effectively combine sensory inputs across different modalities is essential for acquiring and learning new information. A famous Confucian scholar once said about our learning, ‘What I hear, I forget. What I see, I remember,’ suggesting that our memory for auditory information is inferior to our memory for visual information. Indeed, the way the human brain processes and stores sound differs from the way it analyzes and stores visual information. Much of the work reviewed here suggests that seeing is better than hearing when it comes to remembering information. Teachers may assume that students will remember everything they say. However, research suggests that to improve students’ encoding and retention, one should include a visual or hands-on experience in addition to auditory information. The binding of visual and auditory stimuli can benefit learning in both educational and practical settings, as such stimuli have been shown to cue attention and improve memory when presented simultaneously. Therefore, one strategy for absorbing content and learning optimally is to combine modalities, specifically when the visual and auditory information is semantically congruent. Perhaps listening to the recording of your missed lecture while studying its accompanying visual aids would prove to be most beneficial for learning after all.



References

Chen, Y. C., & Spence, C. (2010). When hearing the bark helps to identify the dog: Semantically-congruent sounds modulate the identification of masked pictures. Cognition, 114, 389–404.

Cohen, M. A., Horowitz, T. S., & Wolfe, J. M. (2009). Auditory recognition memory is inferior to visual recognition memory. Proceedings of the National Academy of Sciences, 106, 6008–6010.

Cohen, M. A., Evans, K. K., Horowitz, T. S., & Wolfe, J. M. (2011). Auditory and visual memory in musicians and nonmusicians. Psychonomic Bulletin & Review, 18, 586–591.

Darwin, C. J., Turvey, M. T., & Crowder, R. G. (1972). An auditory analogue of the Sperling partial report procedure: Evidence for brief auditory storage. Cognitive Psychology, 3, 255–267.

Jain, A., Bansal, R., Kumar, A., & Singh, K. D. (2015). A comparative study of visual and auditory reaction times on the basis of gender and physical activity levels of medical first year students. International Journal of Applied and Basic Medical Research, 5, 124.

Lal, S. K., Henderson, R. J., Carter, N., Bath, A., Hart, M. G., Langeluddecke, P., & Hunyor, S. N. (1998). Effect of feedback signal and psychological characteristics on blood pressure self-manipulation capability. Psychophysiology, 35, 405–412.

Matusz, P. J., Wallace, M. T., & Murray, M. M. (2017). A multisensory perspective on object memory. Neuropsychologia, 105, 243–252.

Merrill, M. D. (2012). Instructional transaction theory: An instructional design model based on knowledge objects. Instructional Design: International Perspectives: Volume I: Theory, Research, and Models: Volume II: Solving Instructional Design Problems, 381.

Nijboer, F., Furdea, A., Gunst, I., Mellinger, J., McFarland, D. J., Birbaumer, N., & Kübler, A. (2008). An auditory brain–computer interface (BCI). Journal of Neuroscience Methods, 167, 43–50.

Pillai, R., & Yathiraj, A. (2017). Auditory, visual and auditory-visual memory and sequencing performance in typically developing children. International Journal of Pediatric Otorhinolaryngology, 100, 23–34.

Sperling, G. (1960). The information available in brief visual presentations. Psychological Monographs: General and Applied, 74, 1.

Xie, H., Mayer, R. E., Wang, F., & Zhou, Z. (2019). Coordinating visual and auditory cueing in multimedia learning. Journal of Educational Psychology, 111, 235–255.