What Neural Networks Put Second: Categorization Models as a Window into the Nature of Memory

Andrew Jun Lee

Every day, we navigate our world so effortlessly that the basic operations involved seem too mundane to mention. But if you’ve ever had experience coding or programming, you may have noticed that telling computers to do even simple tasks can be incredibly complicated, and at times, deeply frustrating. How do I tell a computer, for instance, that the Shaq who played for Miami is the same Shaq who played for the Lakers? How do I tell a computer that a drawing of a dog is not a real dog? We think of everyday cognitive abilities as simple processes, and in fact they are very easy for us to do, but these everyday abilities mask an underlying complexity that is difficult to articulate.

Consider one of today’s most popular artificial intelligence systems, ChatGPT. Unlike previous forms of artificial intelligence, ChatGPT can rival humans on a variety of complex tasks. Give ChatGPT an essay prompt on Mediterranean trade in the 1400s and the machine will generate something surprisingly well-written. Such feats of intelligence, though impressive, come with a hefty price. The parent system that ChatGPT runs on (GPT-3) is massive, containing a whopping 175 billion internal settings (Brown et al., 2020). The hardware required to house the system is also extensive: ChatGPT runs on a large cluster of interconnected computers optimized to perform numerous calculations at once. The system is computationally expensive as well: generating language output involves computing over all 175 billion of those settings.

The process of replicating intelligent behavior is by no means trivial, but frankly, this observation only adds to a larger mystery: what is going on inside complicated systems like ChatGPT? Is there some kind of meaningful or intelligible pattern that we can abstract across the activity of individual computations? Or are these systems an amorphous conglomerate of mostly meaningless computations, like the hum and buzz of particles in Brownian motion? And if there is an abstractable pattern or regularity, what is it? A kind of step-by-step algorithm, like a cookbook recipe or an instruction manual? Or a continuously evolving system, like weather forecast models or physics engines? Perhaps something in-between?

ChatGPT is one example of the growing swath of so-called “black box” models, which are models so complicated in their design that their activity becomes difficult to decipher. Black box models typically refer to artificial neural networks, which can have thousands, millions, or even billions of internal settings that have been fine-tuned to accomplish a task: classifying images, completing Google searches, or recommending the next TikTok. In large-scale neural networks like ChatGPT, each of the many, many internal settings doesn’t mean much on its own. Though we know, for instance, that the word “DOG” refers to the concept of a dog (something that barks, runs, and is dog-shaped), often a single internal setting doesn’t come to “represent” this information: no single setting is more active, let’s say, when you give the neural network the word “dog.”

Much of the success behind neural networks has come from simply adding more and more internal settings to their design. But in doing so, the interpretability of any single internal setting has suffered. As a consequence, our collective understanding of what modern large-scale neural networks are doing has suffered too: we cannot easily grasp the inner workings of these complex models by simply “peeking into” them. In contrast, the architecturally simpler cognitive models of the past century were relatively easy to interpret: it was much clearer what was represented by the structure, architecture, or connections of a model (see note 1 for an example).

I will argue this difference makes it much more difficult to answer the same questions about cognition that models of the previous century provided clearer insight into. In particular, I’ll focus on how models of the past and present addressed the following questions about memory: what about a concept or category do we represent in long-term memory? And to what extent do we represent those things about a concept or category?

To answer these questions, it’s helpful to place ourselves in history. One long-standing debate in cognitive science is the debate on how we categorize things—for example, how we decide that a book that looks like a book, is in fact a book. Since the 1970s, there have been two main theories of categorization: the exemplar theory and the prototype theory. While the prototype theory was one of the first contenders to provide an adequate account of our categorization abilities, researchers including Douglas Medin and Robert Nosofsky soon after argued that the exemplar theory could account for the same experimental findings as the prototype theory. Ever since, the debate between prototypes and exemplars has blossomed into decades of fruitful research, generating countless experimental predictions on carefully constructed artificial stimuli, and more recently for naturalistic visual stimuli, like images of dogs, cats, and chairs (Battleday, Peterson, & Griffiths, 2020).

Here, I will review evidence for the exemplar theory of categorization as the more likely candidate of how we categorize things, though the debate is far from over. In doing so, I will indirectly advocate a claim about memory that is the primary assumption of the exemplar theory: that individual encounters of categories are stored in long-term memory, not the prototype or “average” of our individual encounters of categories. When a dog comes up to me, the exemplar theory says I see it as a dog because I’ve compared it to all instances of dogs in memory, not to an averaged memory of a dog, or even to a partial average supplemented by recollection of some instances of dogs. In contrast, the prototype theory says I compare the dog to precisely my singular averaged memory of a dog.
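To make the two storage assumptions concrete, here is a minimal Python sketch. The function names and the dog measurements (height in cm, weight in kg) are hypothetical and chosen only for illustration; neither theory is committed to these particular features.

```python
# Hypothetical sketch of what each theory assumes is kept in long-term memory.
# The feature values (height in cm, weight in kg) are made up for illustration.

def store_exemplars(encounters):
    """Exemplar assumption: keep every encountered instance."""
    return [tuple(e) for e in encounters]

def store_prototype(encounters):
    """Prototype assumption: keep only the feature-wise average."""
    n = len(encounters)
    return tuple(sum(column) / n for column in zip(*encounters))

# Two different histories of dog encounters that happen to share an average.
history_1 = [(30, 10), (70, 40)]   # one small dog, one large dog
history_2 = [(50, 25), (50, 25)]   # two medium dogs

print(store_exemplars(history_1))  # [(30, 10), (70, 40)]
print(store_exemplars(history_2))  # [(50, 25), (50, 25)]
print(store_prototype(history_1))  # (50.0, 25.0)
print(store_prototype(history_2))  # (50.0, 25.0)
```

Under exemplar storage the two histories remain distinguishable; under prototype storage they collapse into the same single record, which is the kind of information loss at stake in the debate.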

Whether or not the exemplar theory is empirically validated, the point is that a resolution of the debate tells us something interesting about the nature of memory storage: category representations in memory either do or do not undergo the extensive information loss that the prototype theory suggests; and, as a consequence, category representations in memory may or may not retain some degree of distinctiveness from each other. By contrast, to what extent is a neural network (one without an explicit memory module) storing a memory of a dog at all, whether or not it is in the form of individual instances or averaged prototypes? The answer, at least to me, is not obvious.

Whereas categorization models of the past century have central to their design a claim about the representation of concepts in memory, investigating the nature of conceptual representations in neural networks is more like a post-hoc analysis, secondary to the larger engineering goal of optimizing performance on a classification task. Nevertheless, the emergence of prototype- or exemplar-esque representations in trained neural networks would be a fascinating avenue of research. This is especially so, as literature in recent years suggests that the task on which neural networks are trained largely determines their correspondence to human neural activity (Kanwisher, Khosla, & Dobs, 2023; Konkle et al., 2022). Perhaps the decades-long debate on prototypes and exemplars can be given a modern spin as an exploration into neural networks that show promise as models of the brain.

One of the earliest forms of evidence for the exemplar theory came from comparing the two theories’ predictions about the accuracy of participants’ categorization performance on a hand-constructed category structure (often called the 5-4 categories) (Medin & Schaffer, 1978). In Medin and Schaffer’s task, participants saw an image and had to select which category it belonged to: either Category A (comprising five instances) or Category B (comprising four). The categories varied along four binary dimensions (or features): an instance of A could be 1110, while an instance of B could be 0000. Furthermore, the categories were defined by “family resemblance” (i.e., overall similarity), which is to say, they could not be distinguished by a linear classification rule (see note 2), a kind of “workaround” that wouldn’t get at the heart of the question.

The choice to use unequal numbers of instances (A = 5, B = 4) comprising each category was not coincidental. On the contrary, it allowed for divergent predictions of the prototype and exemplar theories: in having more instances in Category A, the exemplar theory would predict that a novel Category A instance would be categorized as Category A more often than a novel Category B instance would be categorized as Category B. The prototype theory, on the other hand, would predict equivalent frequencies: a novel Category A instance would be categorized as A, just as often as a novel Category B instance would be categorized as B. This is because the prototype theory assumes the loss of information about the number of encountered instances stored in memory by retaining only one prototype per category. But an exemplar approach knows how many instances it’s encountered before. Because it knows there are more A’s than B’s, it’s slightly more confident to say an instance that looks like A belongs to A, and slightly more hesitant to say an instance that looks like B belongs to B.
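The logic of this prediction can be sketched in a few lines of Python. The sketch below is an illustration under stated assumptions, not a reimplementation of Medin and Schaffer’s analysis: the stimuli are hypothetical toy categories (not the actual 5-4 items), the similarity rule is a simple multiplicative one with a single made-up mismatch parameter `s`, and all function names are invented here.

```python
from math import prod

def bits(code):
    """Turn a string like '1110' into a tuple of ints."""
    return tuple(int(ch) for ch in code)

def similarity(x, y, s=0.3):
    """Multiplicative similarity: every mismatching feature multiplies by s (0 < s < 1)."""
    return prod(1.0 if a == b else s for a, b in zip(x, y))

def exemplar_choice(probe, own, other, s=0.3):
    """P(choose 'own' category), using summed similarity to every stored instance."""
    sim_own = sum(similarity(probe, e, s) for e in own)
    sim_other = sum(similarity(probe, e, s) for e in other)
    return sim_own / (sim_own + sim_other)

def prototype_of(instances):
    """Feature-wise majority value; the number of instances is discarded."""
    n = len(instances)
    return tuple(1 if 2 * sum(col) > n else 0 for col in zip(*instances))

def prototype_choice(probe, own, other, s=0.3):
    """P(choose 'own' category), using similarity to one prototype per category."""
    sim_own = similarity(probe, prototype_of(own), s)
    sim_other = similarity(probe, prototype_of(other), s)
    return sim_own / (sim_own + sim_other)

# Hypothetical toy categories (assumption: NOT the published 5-4 stimuli).
cat_a = [bits(c) for c in ("1110", "1101", "1011", "0111", "1100")]  # five instances
cat_b = [bits(c) for c in ("0001", "0010", "0100", "1000")]          # four instances

novel_a, novel_b = bits("1111"), bits("0000")  # unseen A-like and B-like items

print(round(exemplar_choice(novel_a, cat_a, cat_b), 3))   # novel A called A
print(round(exemplar_choice(novel_b, cat_b, cat_a), 3))   # novel B called B
print(round(prototype_choice(novel_a, cat_a, cat_b), 3))  # novel A called A
print(round(prototype_choice(novel_b, cat_b, cat_a), 3))  # novel B called B
```

In this toy setup the exemplar rule assigns the novel A-like item to A with higher probability than it assigns the novel B-like item to B, because the fifth stored A instance adds to the summed similarity for A on every trial. The prototype rule gives the two items identical probabilities, since each category has been reduced to a single stored vector.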

Across four experiments, each testing and resolving different experimental design issues, Medin and Schaffer (1978) saw this prediction borne out. It seemed like participants really were storing individual instances in memory, rather than two prototypes. It seemed like we do store all of our encounters of dogs as distinct from one another, even if we might not remember each and every one on command.

Or do we? One of the main flaws of this study is the use of a rather limited number of instances for each category. Yet, an academic’s day-to-day encounter with books is enormously frequent. Does an academic, then, store each instance of a day’s worth of book encounters in memory, granted that each of that day’s books has equal retrieval strength? It’s possible that participants simply remembered all nine instances of Category A and Category B. It’s possible that there is a storage capacity threshold beyond which instances get clumped into a prototype, with only a few retaining individual distinctiveness.

Yet, a more pressing concern is the usefulness of the 5-4 categories in teasing apart the prototype and exemplar theories. Although Medin and Schaffer’s main prediction panned out, research by Smith and Minda (2000) twenty-two years later showed that formal mathematical models of the prototype and exemplar theories (Nosofsky, 1986; Maddox & Ashby, 1993; Medin & Smith, 1981), when used in their most flexible forms, fit seemingly equally well on thirty accuracy datasets of the 5-4 categories that had been published by then. Even when an exemplar model was boosted by a parameter that enables better individual participant fitting, two prototype models (one that attempts to mimic recognition memory and another that incorporates memorization of encountered instances; see note 3) reproduced the aggregate trend of the datasets just as well.

This evidence is compelling in the way that it compares the two theories with direct model fitting procedures for a no-small-sum of thirty different datasets, providing what appears to me an advantage over less direct psychophysics predictions. To be sure, however, I think that when the domain of measurement changes to something like model fitting, caution is warranted about the interpretability of those measures. Smith and Minda (2000) interpreted model fit as the visual closeness between accuracy trends of the models and the observed datasets. While the models’ predictions are visually indistinguishable from one another, today there are quantitative measures of model comparison, such as the Bayesian Information Criterion, that penalize models with more complexity as a way to balance out the advantage of flexibility afforded by complex designs. I think it is important today to use these quantitative measures over subjective visual inspection, especially as the various models tested by Smith and Minda (2000) have different levels of complexity, though, I suspect, not to an extent that would substantially impact their overall narrative.
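As a rough illustration of how such a penalty works, the Bayesian Information Criterion is computed as k·ln(n) − 2·ln(L), where k is the number of free parameters, n the number of observations, and L the maximized likelihood; lower values are better. The Python sketch below uses hypothetical numbers that are not taken from any of the studies discussed here.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Hypothetical values for illustration only (not from any cited dataset).
n_obs = 200
simple_model = bic(log_likelihood=-100.0, n_params=5, n_obs=n_obs)
flexible_model = bic(log_likelihood=-98.0, n_params=9, n_obs=n_obs)

print(round(simple_model, 2))    # about 226.49
print(round(flexible_model, 2))  # about 243.68
```

Here the more flexible model fits slightly better (a higher log-likelihood), yet it receives the worse score because its four extra parameters cost more than the improvement in fit is worth. Two accuracy curves that look the same to the eye can therefore still be ranked.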

Perhaps the largest worry is the validity of extrapolating any results stemming from artificially constructed stimuli with few dimensions (such as the 5-4 categories) into real-world scenarios. Fortunately, it remains possible to address in a controlled way the exemplar/prototype debate with naturalistic stimuli that have many dimensions (though there is, of course, something important to be said about controlling for noise in more artificial laboratory settings).

Recently, Nosofsky, Meagher, and Kumar (2022) had participants learn geologic rock categories, which are advantageous over artificially constructed categories in a number of ways. These categories are not well known by the general population, number at least ten with fifteen instances each, have family resemblance structures with prototypes at their centers, and consist of multiple feature dimensions, such as roughness of texture, color, and shape, to name a few. The novelty of this study comes not just from the use of more naturalistic stimuli, but also from a new prediction to compare prototype and exemplar theories by employing “high-similarity neighbors” (HSNs), instances that are similar to encountered instances (during training) along the same dimensions. Contrast HSNs with novel instances that are equally similar to encountered instances overall, but similar along different dimensions (a.k.a. standard test instances). If prototype theory is right, then accuracy for HSNs should be equal to accuracy on standard test instances. In contrast, if exemplar theory is right, then accuracy for HSNs should be higher because participants may be reminded of the specific rock they were shown during training and the category to which it belongs (Ross et al., 1990).
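The reasoning behind the HSN prediction can be illustrated with a deliberately simplified Python sketch. Everything in it is an assumption made for illustration: the three-dimensional coordinates are invented rather than taken from the rock stimuli, similarity is assumed to decay exponentially with Euclidean distance, and the sensitivity parameter `c` is arbitrary. It reduces the contrast to two test items that are equally far from a category’s center, only one of which sits close to a specific trained instance.

```python
import math

def similarity(x, y, c=1.0):
    """Assumed similarity rule: exponential decay with Euclidean distance."""
    return math.exp(-c * math.dist(x, y))

def centroid(points):
    """Feature-wise average of a category's training instances."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def exemplar_p_a(probe, cat_a, cat_b, c=1.0):
    """P(A) from summed similarity to every stored training instance."""
    sim_a = sum(similarity(probe, e, c) for e in cat_a)
    sim_b = sum(similarity(probe, e, c) for e in cat_b)
    return sim_a / (sim_a + sim_b)

def prototype_p_a(probe, cat_a, cat_b, c=1.0):
    """P(A) from similarity to one centroid per category."""
    sim_a = similarity(probe, centroid(cat_a), c)
    sim_b = similarity(probe, centroid(cat_b), c)
    return sim_a / (sim_a + sim_b)

# Hypothetical training items: category A is centered on (0, 0, 0), B on (0, 0, 4).
train_a = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
train_b = [(1, 0, 4), (-1, 0, 4), (0, 1, 4), (0, -1, 4)]

r = 1.2
hsn = (r, 0.0, 0.0)  # sits right next to the trained item (1, 0, 0)
standard = (r / math.sqrt(2), r / math.sqrt(2), 0.0)  # same distance from both centers, near no single item

print(exemplar_p_a(hsn, train_a, train_b), exemplar_p_a(standard, train_a, train_b))
print(prototype_p_a(hsn, train_a, train_b), prototype_p_a(standard, train_a, train_b))
```

With these made-up coordinates, the centroid-based rule treats the two test items as equivalent, whereas the exemplar rule favors the item that neighbors a trained instance. This mirrors the direction of the prediction only; the actual study relied on similarity ratings and fitted models rather than hand-placed points.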

In line with exemplar theory, participants did in fact categorize HSNs more accurately than standard test instances, even though the latter were equally similar overall to the training instances. Furthermore, a formal exemplar model had a better average Bayesian Information Criterion score than both a pure prototype model and a prototype model with memory of encountered instances. This evidence is compelling on numerous fronts: the stimuli used, the model fit measures, and the logic of the prediction tested. Yet, there remain a number of other phenomena in the categorization literature that appear to challenge the exemplar theory and need to be addressed to resolve the larger question of category representation in memory (to list a few: Erickson & Kruschke, 2002; Stewart et al., 2002; Little et al., 2011).

To the extent that either exemplar or prototype theory is correct, our representation of category instances in memory remains a question that can be answered by progress on the debate. In contrast, neural networks, at least, appear to be a difficult medium in which to answer the same question about memory, given the large difference in architectural substrates (parallel computation across numerous neurons vs. serial algorithms with discrete data structures). As always with science, the proof is in the pudding: more research is needed.


1. For an example of “representational interpretability,” imagine a neural network has been trained to recommend the next TikTok on your “For You” page. The neural network is pretty large, with dozens of layers of parameters, and also pretty successful at choosing TikToks that will keep you stuck in a mindless scrolling trance for hours. Now, imagine, also, there is another equally successful model, but one based on a traditional computer program, consisting of if-then statements, for-loops, while-loops, functions, and objects, that altogether manipulate a representation of previous TikToks to choose the next best one. While it is clear in the traditional computer program that there are representations of TikToks (perhaps 1 TikTok = list of features), it is not obvious whether the neural network represents something like a “list of features” within its many, many parameter connections.

2. Important for their task, instances of both categories varied in such a way that they could not be distinguished by what’s called a “linear classification rule,” such as, “1110 belongs to Category A because its fourth feature is a 0, or its first two features are 1s.” Instead, these categories were defined by “family resemblance,” or overall similarity: Category A generally consisted of more 1s, though not always, and Category B generally consisted of more 0s. Broadly speaking, this family resemblance category structure is what made a comparison between prototype and exemplar theories possible because it eliminated the confound of participants’ acting on an easy “cheat code” to produce accurate categorization judgments. Instead of a cheat code, we want to test whether and how people are acting upon a representation of the categories in memory.

3. The models they tested comprised a slew of variations of the two theories: an exemplar model implemented directly from the original Medin and Schaffer (1978) paper, a “boosted” exemplar model that enables better individual participant modeling, a prototype model that implements faster processing for old training instances and slower processing for novel instances (i.e., an attempt to mimic recognition memory), and a prototype model that memorizes all encountered instances and uses them in one of three ways, depending on whichever yields the best account for the data.


Battleday, R. M., Peterson, J. C., & Griffiths, T. L. (2020). Capturing human categorization of natural images by combining deep networks and cognitive models. Nature Communications, 11(1), 1-14.

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., … & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877-1901.

Erickson, M. A., & Kruschke, J. K. (2002). Rule-based extrapolation in perceptual categorization. Psychonomic Bulletin & Review, 9(1), 160-168.

Kanwisher, N., Khosla, M., & Dobs, K. (2023). Using artificial neural networks to ask ‘why’ questions of minds and brains. Trends in Neurosciences.

Konkle, T., Conwell, C., Prince, J. S., & Alvarez, G. A. (2022). What can 5.17 billion regression fits tell us about the representational format of the high-level human visual system? Journal of Vision, 22(14), 4422-4422.

Little, J. L., & McDaniel, M. A. (2015). Individual differences in category learning: Memorization versus rule abstraction. Memory & Cognition, 43(2), 283-297.

Maddox, W. T., & Ashby, F. G. (1993). Comparing decision bound and exemplar models of categorization. Perception & Psychophysics, 53(1), 49-70.

Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85(3), 207.

Medin, D. L., & Smith, E. E. (1981). Strategies and classification learning. Journal of Experimental Psychology: Human Learning and Memory, 7(4), 241.

Nosofsky, R. M. (1986). Attention, similarity, and the identification–categorization relationship. Journal of Experimental Psychology: General, 115(1), 39.

Nosofsky, R. M., Meagher, B. J., & Kumar, P. (2022). Contrasting exemplar and prototype models in a natural-science category domain. Journal of Experimental Psychology: Learning, Memory, and Cognition, 48(12), 1970–1994.

Ross, B. H., Perkins, S. J., & Tenpenny, P. L. (1990). Reminding-based category learning. Cognitive Psychology, 22(4), 460-492.

Smith, D. J., & Minda, J. P. (2000). Thirty categorization results in search of a model. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26(1), 3.

Stewart, N., Brown, G. D., & Chater, N. (2002). Sequence effects in categorization of simple perceptual stimuli. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28(1), 3.