Michael Thacker

“Investigating the evolution of consciousness through integrated symbolic, archaeological, and psychological research.”

The Evolution of Visual and Auditory Systems and Their Effect on Working Memory

Abstract

Human working memory (WM) is shaped by evolutionary and neurobiological factors, particularly those tied to sensory input. This paper explores the relationship between visual and auditory sensory systems and their influence on WM, tracing their evolutionary development and functional integration in the brain. Evidence suggests the visual system, with its ancient origin and extensive cortical representation, plays a dominant role in WM processes such as navigation, object recognition, and emotionally salient encoding. Comparative analysis of empirical studies reveals mixed findings: some suggest visual superiority in WM performance, while others find parity between auditory and visual modalities, or longer retention for auditory input. Limitations in sample sizes, age diversity, and methodological scope are identified as contributing to these inconsistencies. The paper proposes that evolutionary pressures favored the development of visual acuity and its integration with WM for adaptive purposes such as survival, foraging, and spatial orientation. A hypothesis is presented that visual stimuli more robustly engage WM due to both cortical prioritization and evolutionary tuning. The review concludes that further research using diverse populations and complex multimodal methodologies is needed to clarify sensory dominance in WM and to explore whether technological and environmental changes are reshaping these cognitive dynamics in modern humans.

Introduction

Humans have evolved unique physical and cognitive abilities since their lineage diverged from that of chimpanzees approximately 6 million years ago. These abilities include walking and running upright (bipedalism), crafting tools, cooking food, and communicating and cooperating with one another. These features coincided with the development of larger brains, especially the neocortex, wherein complex thinking, abstract reasoning, and memory formation transpire (Chin et al., 2023). The last of these functions, memory formation, has intrigued researchers for more than a century.

Some of the earliest work on memory was conducted by Hermann Ebbinghaus in the late 19th century. His experiments consisted of an individual (including himself) learning nonsense syllables and then reciting them from memory. The results revealed that about 7 syllables was the limit for exact recitation from memory, and that accuracy declined as the number of syllables increased (Roediger & Yamashiro, 2019). These results were later shown to be related to short-term memory or working memory (Bajaffer et al., 2021). Several decades later, George A. Miller (1956) reported similar results for short-term/working memory, placing the average person’s memory threshold at 7 plus or minus 2 items of information.

Since the time of Ebbinghaus and Miller, hundreds of experiments on memory have been conducted and published, and with them has come an improved understanding of both the functionality and composition of memory. For instance, according to Amal Bajaffer and colleagues (2021), researchers have now divided the functionality of memory into three distinct linear phases: encoding, storage, and retrieval of information. Furthermore, memory has been divided into three categories: sensory memory, working memory, and long-term memory. Sensory memory (SM) is short in duration and possesses a large storage capacity, and it involves the bodily senses (touch, taste, smell, sight, and hearing) in the detection of information that is directly stored in the nervous system. Working memory (WM) is directly linked to short-term memory, wherein acquired information is held for a short period of time, approximately 30 seconds, and it has a storage capacity of approximately 4 chunks of information. Finally, long-term memory involves information that is stored for extended periods of time, including months and even years (Bajaffer et al., 2021).

According to Peter Carruthers in his chapter in “In the Light of Evolution: Volume VII: The Human Mental Machinery” (2014), WM in particular has been perceived as a fundamental aspect of human survival and flourishing from the evolutionary past up to today. This specific feature of memory is involved in multiple processes that are essential to life, including learning, speech, comprehension, and future planning. Moreover, research suggests that WM is a feature of executive functioning that is distributed throughout the frontal lobes of the brain. Here, WM is thought to collaborate with sensory cortical regions of the brain through attentional processes. It is further speculated that the executive control of attention is what allows sensory information access into WM (Carruthers, 2014).

However, which of these sensory interactions influences memory most significantly and efficiently has been of interest to researchers since the inception of memory studies. As noted above, Bajaffer and colleagues (2021) describe sensory input from the environment as predicated upon the five senses: smell, sight, hearing, taste, and touch. The two senses most often studied in relation to memory, and those with the most significant impact, are hearing and vision (Lindner et al., 2009). Which sensory system impacts memory the most remains a matter of debate, and from an evolutionary perspective, this raises the question: did humans evolve with an acuity for visual memory or auditory memory? The next section will explore the evidence pertaining to visual memory and auditory memory.

Evidence

The evolutionary origins of the visual and auditory systems are a fine place to begin the analysis of their impact on WM. First, according to Dan-Eric Nilsson (2022), the visual system is much more ancient in its origins, as evidenced in fossils of ancient fish dating back approximately 550 million years to the Cambrian explosion. For ancient species, this revolutionary feature allowed for a novel interaction with reality through newfound abilities such as object recognition and discrimination, motion detection, and enhanced navigational skills (Nilsson, 2022). Next, Marcela Lipovsek and A. B. Elgoyhen (2023) have shown that the auditory system evolved at a much later date, approximately 350 million years ago during the Carboniferous period, which was a period of transition from water to land. This evolutionary process was gradual, progressing from low-frequency sound detection to high-frequency sound sensitivity (Lipovsek & Elgoyhen, 2023).

As the human lineage evolved following its divergence from that of chimpanzees 6 million years ago, both hearing and vision continued to improve. This was accomplished through the evolution of the brain, wherein not only did the neocortex expand and improve, but areas pertaining to vision and hearing also increased in size and improved in efficiency (Kaas, 2013). However, vision appears to be the most ancient and well developed of the senses, which could be indicative of a more influential role in memory compared to hearing.

To shed further light on this issue, the navigational and identification skills of humans both past and present must be analyzed. First, during humans’ evolutionary past, navigational skills were of essential value for survival, especially as humans began to emerge out of Africa over 100,000 years ago (Stewart & Stringer, 2012). A recent review by Pablo Fernandez-Velasco and Hugo J. Spiers (2024) analyzing the navigational skills of traditional cultures found that these skills are predicated upon the identification of patterns in nature that aid in learning and memorizing terrains. This ability allows people in these cultures to better navigate their terrain during hunting and gathering, and thus mimics the conditions with which human ancestors would have contended (Fernandez-Velasco & Spiers, 2024). Next, according to Maurice Ptito and colleagues (2021), the navigational ability provided by the visual system also helped humans adapt to their environment, enabling them to identify and remember foods worth foraging and prey worth hunting. A keen eye for detail and patterns was necessary for the survival and success of the genus Homo, and was thus carried into the memory of individuals and their descendants (Ptito et al., 2021).

Following this evolutionary trend into modern neuroscience, Martin Seeber and colleagues (2025) analyzed the neural dynamics of navigational skills in action. They discovered that both real-world navigation and imagined navigation relied on and influenced memory formation in significant ways. Their study highlighted the essential link between the visual system and memory formation in humans when navigating one’s current landscape, which was a necessity for ensuring the survival of evolving humans in the past (Seeber et al., 2025). Furthermore, Fabian Hutmacher (2019) noted the dominance of the human visual system, which occupies a large portion of the neocortex, substantially more than any other sense. This larger cortical area also indicates that more energy is supplied to vision than to the other senses, and it is evidence that substantial selection pressure was placed on vision over the course of evolution (Hutmacher, 2019).

Finally, a recent review by Tian-Ya Hou and Wen-Peng Cai (2022) revealed how emotions impact WM for better or worse depending on which emotions are elicited. Furthermore, vision and emotions appear to be intricately connected, which results in the formation of schematic perceptual frameworks that help one navigate and attend to the surrounding world (Hou & Cai, 2022). This connection between vision and emotions was researched by Philip A. Kragel and colleagues (2019), who discovered that emotionally laden images are processed through the visual cortex, where they are encoded and decoded within multiple distinct emotional-categorical models that are embedded within memory. These models help with the derivation of meaning, and they have a direct effect on decision-making processes and attention (Kragel et al., 2019). These pieces of evidence further reveal the importance of vision’s impact on working memory; however, whether vision impacts WM more than hearing will be examined next.

Current Working Memory Research

The current literature on visual and auditory memory is abundant; however, only a few relevant studies will be examined here. First, a study examining the differences between visual and auditory working memory conducted by Katie Lindner and colleagues in 2009 found that visual memory was superior to auditory memory. In this study, researchers divided 49 college student participants into four distinct groups, two designated auditory and two visual. These groups were further divided into immediate post-test groups and delayed post-test groups. The visual groups outperformed the auditory groups in both immediate and delayed post-testing, and the authors thus concluded that visual processing had a greater impact on both working memory and long-term memory (Lindner et al., 2009). Another study from 2009, conducted by Michael A. Cohen and colleagues, focused on the effects of visual and auditory input on memory and reached a similar conclusion to Lindner’s team: auditory processes were inferior to visual processes in terms of WM (Cohen et al., 2009).

However, a more recent study by Michele E. Gloede and colleagues conducted in 2017 found that although visual processes had a greater influence on working memory, the effects of auditory processes lasted longer. Their study consisted of 17 participants who were tested four separate times along with auditory training. This training did increase WM capacity within auditory testing, although the authors mentioned that the differences between visual and auditory effects on working memory do not appear to be related to one’s experience in those domains. Therefore, this could be evidence for an evolutionary basis for the differing effects of vision and hearing on memory. Lastly, they concluded that although visual memory had a larger capacity, auditory memory was sustained for a longer duration (Gloede et al., 2017). Another study by Michele E. Gloede and Melissa K. Gregg, conducted in 2019, found similar results to the 2017 study: same-day memory tasks revealed that although visual memory was superior in the short term, auditory and visual memory were similar 2 to 7 days following the tasks (Gloede & Gregg, 2019).

More recent studies found no difference between the effects of the visual and auditory systems on working memory. The first is a study by Dhana Lace Acedilla and colleagues performed in 2022 on 2nd year university students. This study consisted of two groups of 15 participants, with one group assigned to visual memory and the other to auditory memory. Both groups were given 20 words to memorize, presented in the modality assigned to their group, and were then tested on their recall of these words. Both groups performed similarly, and thus it was concluded that visual memory and auditory memory were similar in terms of WM (Acedilla et al., 2022). Similar findings were produced by Sanjana Singh S. and Asha Yathiraj in 2024, who assessed visual and auditory memory in children 8 to 12 years of age. Eighteen children were tested on their auditory and visual memory performance, both immediate and delayed, using the Children’s Memory Scale (CMS). The results indicated no significant difference between visual memory and auditory memory on either immediate or delayed testing (Singh S. & Yathiraj, 2024).

These conflicting results may indicate methodological shortcomings in the older studies, the newer ones, or both. It is also plausible that they reflect a recent phenomenon wherein humans are shifting their attentional abilities as a result of technological advancements that promote a lifestyle different from that of their ancient counterparts. Examining the approaches and similarities of the investigated studies could help resolve some of the confusion.

Approaches of Current Research

To address these issues found within the research literature, the similarities and differences in approaches must be considered. These are most evident in the sample sizes, age ranges, and methodologies. In all studies, the sample sizes were relatively small: the largest was the 49 participants in the earlier study by Lindner et al. (2009), while the other studies for which sample sizes are reported here used between 17 and 30 participants. Furthermore, the participants were college students and younger, with no consideration for older populations. This emphasis on childhood and emerging adulthood neglects essential data from an age range that constitutes the majority of the general population. Lastly, the methodological approach was similar in all studies, wherein simple visual and auditory methods of memorization and testing were administered. All of these factors are potential limitations that could be affecting the final results and obstructing a comprehensive understanding of visual and auditory memory, and of WM.

Limitations

The limitations were introduced in the previous section; a more in-depth analysis of them is provided here. First, the sample sizes of all the examined studies are quite small for a proper representation of the general population, and thus larger samples are necessary. Next, the age range covered by the previous studies was too narrow to allow proper generalization. Moreover, although these younger age groups offer their own valuable insights, they represent a period in which the brain is not fully developed, and they therefore do not provide a comprehensive analysis of WM at a fully developed stage. This raises the question of whether the inconclusive results reflect a gap in the coverage of the aging brain or, if the results are accurate, whether the inconsistency reflects a current phenomenon transpiring between generations. Finally, the methodological approach needs to be broader and more complex. The narrow testing used in previous studies is unidimensional, and a broader and more complex approach could help reveal the intricate processes of the brain and memory.

Theory and Hypothesis

According to evolutionary theory, humans have adapted to their environment through interactive sensory engagements that have wired the connectome of the brain in a way that represents this interactive sensory experience. Moreover, recent research has mapped these sensory experiences throughout the brain, with vision being the most ancient and widely distributed system, followed by the auditory system (Hutmacher, 2019; Lipovsek & Elgoyhen, 2023; Nilsson, 2022). These experiences also necessitated memory formation to help humans in their adaptive process, which required the interaction between sensory input and emotional arousal (Hou & Cai, 2022; Kragel et al., 2019). Given that vision is the predominant sensory system and is closely linked with memory formation via emotional arousal, it is hypothesized that the visual system is the predominant sensory system by which WM is evoked. This is further supported by the previously examined research (Cohen et al., 2009; Gloede et al., 2017; Gloede & Gregg, 2019; Lindner et al., 2009); however, some of the most recent research has challenged this hypothesis (Acedilla et al., 2022; Singh S. & Yathiraj, 2024). These contradictory findings call for further investigation to determine whether the visual system or the auditory system is the predominant underlying process by which WM is evoked.

Variables

The independent variable will be the sensory modality in which a given stimulus is presented (visual or auditory). The dependent variable will be the length of the digit sequence that the research participant can recall, which serves as the measure of working memory.
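As a minimal sketch of how these two variables could be scored, the Python example below treats a participant's span as the longest digit sequence recalled in exact order and averages spans within each modality. The function names, the data layout, and the digit strings are hypothetical and purely illustrative; they are assumptions and not results or procedures from any study cited in this paper.

```python
from statistics import mean


def digit_span(trials):
    """Return the longest sequence length recalled in exact order.

    `trials` is a list of (presented, recalled) digit strings.
    A trial counts only if recall matches the presented sequence exactly.
    """
    correct_lengths = [len(presented) for presented, recalled in trials
                       if presented == recalled]
    return max(correct_lengths, default=0)


def mean_span_by_modality(records):
    """Average the digit span (dependent variable) within each
    presentation modality (independent variable).

    `records` is a list of (modality, trials) pairs, one per participant.
    """
    spans = {}
    for modality, trials in records:
        spans.setdefault(modality, []).append(digit_span(trials))
    return {modality: mean(values) for modality, values in spans.items()}


# Hypothetical data for illustration only (one participant per modality).
example_records = [
    ("visual", [("4917", "4917"), ("38265", "38265"), ("719403", "719430")]),
    ("auditory", [("5821", "5821"), ("60394", "60349")]),
]

print(mean_span_by_modality(example_records))
```

Scoring the span as the longest perfectly recalled sequence is only one possible convention; other scoring rules, such as requiring a proportion of correct trials at each length, would change the function but not the overall design.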

References

Acedilla, D., Aldemita, J., Dy, A., & Maguinda, A. (2022). Auditory vs. visual: 2nd year students of the USC psychology department in terms of short-term memory retention. https://www.researchgate.net/publication/365196048_Auditory_vs_Visual_2nd_Year_Students_of_the_USC_Psychology_Department_In_Terms_of_Short-term_Memory_Retention

Bajaffer, A., Mineta, K., & Gojobori, T. (2021). Evolution of memory system-related genes. FEBS Open Bio, 11(12), 3201–3210. https://doi.org/10.1002/2211-5463.13224

Carruthers, P. (2014). Evolution of working memory. In National Academy of Sciences; C. J. Cela-Conde, R. G. Lombardo, J. C. Avise, et al. (Eds.), In the Light of Evolution: Volume VII: The Human Mental Machinery. National Academies Press (US). https://www.ncbi.nlm.nih.gov/books/NBK231620/

Chin, R., Chang, S. W. C., & Holmes, A. J. (2023). Beyond cortex: The evolution of the human brain. Psychological Review, 130(2), 285–307. https://doi.org/10.1037/rev0000361

Cohen, M. A., Horowitz, T. S., & Wolfe, J. M. (2009). Auditory recognition memory is inferior to visual recognition memory. Proceedings of the National Academy of Sciences of the United States of America, 106(14), 6008–6010. https://doi.org/10.1073/pnas.0811884106

Fernandez-Velasco, P., & Spiers, H. J. (2024). Wayfinding across ocean and tundra: What traditional cultures teach us about navigation. Trends in Cognitive Sciences, 28(1), 56–71. https://doi.org/10.1016/j.tics.2023.09.004

Gloede, M. E., & Gregg, M. K. (2019). The fidelity of visual and auditory memory. Psychonomic Bulletin & Review, 26, 1325–1332. https://doi.org/10.3758/s13423-019-01597-7

Gloede, M. E., Paulauskas, E. E., & Gregg, M. K. (2017). Experience and information loss in auditory and visual memory. Quarterly Journal of Experimental Psychology, 70(7), 1344–1352. https://doi.org/10.1080/17470218.2016.1183686

Hou, T. Y., & Cai, W. P. (2022). What emotion dimensions can affect working memory performance in healthy adults? A review. World Journal of Clinical Cases, 10(2), 401–411. https://doi.org/10.12998/wjcc.v10.i2.401

Hutmacher, F. (2019). Why is there so much more research on vision than on any other sensory modality? Frontiers in Psychology, 10, 2246. https://doi.org/10.3389/fpsyg.2019.02246

Kaas, J. H. (2013). The evolution of brains from early mammals to humans. Wiley Interdisciplinary Reviews: Cognitive Science, 4(1), 33–45. https://doi.org/10.1002/wcs.1206

Kragel, P. A., Reddan, M. C., LaBar, K. S., & Wager, T. D. (2019). Emotion schemas are embedded in the human visual system. Science Advances, 5, eaaw4358. https://doi.org/10.1126/sciadv.aaw4358

Lindner, K., Blosser, G., & Cunigan, K. (2009). Visual versus auditory learning and memory recall performance on short-term versus long-term tests. Modern Psychological Studies, 15(1), Article 6. https://scholar.utc.edu/mps/vol15/iss1/6

Lipovsek, M., & Elgoyhen, A. B. (2023). The evolutionary tuning of hearing. Trends in Neurosciences. https://doi.org/10.1016/j.tins.2022.12.002

Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81–97. https://doi.org/10.1037/h0043158

Nilsson, D. E. (2022). The evolution of visual roles – ancient vision versus object vision. Frontiers in Neuroanatomy, 16, 789375. https://doi.org/10.3389/fnana.2022.789375

Ptito, M., Bleau, M., & Bouskila, J. (2021). The retina: A window into the brain. Cells, 10(12), 3269. https://doi.org/10.3390/cells10123269

Roediger, H., & Yamashiro, J. (2019). History of cognitive psychological memory research. In The Cambridge Handbook of the Intellectual History of Psychology. Cambridge University Press. https://doi.org/10.1017/9781108290876

Seeber, M., Stangl, M., Vallejo Martelo, M., Topalovic, U., Hiller, S., Halpern, C. H., Langevin, J. P., Rao, V. R., Fried, I., Eliashiv, D., & Suthana, N. (2025). Human neural dynamics of real-world and imagined navigation. Nature Human Behaviour, 9(4), 781–793. https://doi.org/10.1038/s41562-025-02119-3

Singh S, S., & Yathiraj, A. (2024). Auditory memory and visual memory in typically developing children: Modality dependence/independence. The Journal of International Advanced Otology, 20(5), 405–410. https://doi.org/10.5152/iao.2024.241504

Stewart, J. R., & Stringer, C. B. (2012). Human evolution out of Africa: The role of refugia and climate change. Science, 335(6074), 1317–1321. https://doi.org/10.1126/science.1215627
