This post came out of a conversation with ChatGPT (part 1, part 2) about cognitive constraints and machine learning.

Working memory as an inductive bias
Human cognitive abilities are bounded. Working memory can hold and manipulate only a small amount of information at once. Cognitive psychology calls this the “magical number seven, plus or minus two”: most adults can hold roughly five to nine chunks in working memory. This constraint forces us to use abstractions to understand complex systems.
Suppose we are reasoning about variables $(x_1, x_2, x_3, x_4)$. Working memory limits make it hard to track the joint distribution of all four simultaneously. But if we create an abstraction where $X$ stands for $(x_1, x_2)$ and $Y$ stands for $(x_3, x_4)$, the task reduces to handling the joint distribution of $(X, Y)$: two variables instead of four.
The cost: condensing reality to fit cognitive capacity means losing information. A critical relationship between $x_2$ and $x_4$ (perhaps critical only in certain contexts) gets discarded in the simplified model. This is where “the whole is greater than the sum of the parts” comes from: a full understanding of a system may be irreducible, with important behavior emerging only when all variables are considered together. Emergent phenomena are the main challenge of working with abstractions.
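To make the information loss concrete, here is a minimal Python sketch. Everything specific in it is an illustrative assumption, not something from the conversation: the distribution, the 90/10 agreement between $x_2$ and $x_4$, and the OR-based “chunking” of each pair. The 16-entry joint table over four binary variables collapses to a 4-entry table over two abstracted variables, and the built-in $x_2$–$x_4$ correlation is no longer recoverable from the compressed table.

```python
from itertools import product

# Toy joint distribution over four binary variables (x1, x2, x3, x4).
# We build in a dependency: x2 and x4 agree far more often than chance.
# (All numbers here are illustrative assumptions.)
def p_joint(x1, x2, x3, x4):
    base = 1.0 / 16
    return base * (1.8 if x2 == x4 else 0.2)

states = list(product([0, 1], repeat=4))
assert abs(sum(p_joint(*s) for s in states) - 1.0) < 1e-9

# Lossy abstraction: collapse each pair into a single summary bit.
# X = x1 OR x2, Y = x3 OR x4 (one arbitrary choice of "chunking").
def abstract(state):
    x1, x2, x3, x4 = state
    return (x1 | x2, x3 | x4)

# Marginalize the full joint onto the abstracted variables (X, Y).
p_XY = {}
for s in states:
    key = abstract(s)
    p_XY[key] = p_XY.get(key, 0.0) + p_joint(*s)

# The 16-entry table shrinks to 4 entries -- easy to hold in mind --
# but the x2-x4 correlation cannot be reconstructed from p_XY alone.
print(len(states), "->", len(p_XY))
```

The compressed table is a valid distribution over $(X, Y)$, but any question that hinges on the discarded $x_2$–$x_4$ relationship is now unanswerable: that is the emergent-behavior blind spot in code form.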
Working Memory and Inductive Bias
Our small working memory shapes how we reason and understand the world. It is an inductive bias, a filter that determines which patterns we detect and which generalizations we form.
This constraint might not be a pure disadvantage. It could be an advantage, given the regularities in our environment. Think of it as regularization in machine learning: constraints that prevent the model from overfitting training data, improving generalization to unseen instances. If we had much larger working memories, we might overfit to our past observations, impairing our ability to adapt and survive in new situations, particularly those out on the long tail of the distribution. Survival depends on avoiding catastrophic mistakes, even after decades of mostly good decisions.
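The regularization analogy can be sketched with ridge regression, a standard L2 penalty on model weights. The dataset, polynomial degree, and penalty strength below are arbitrary illustrative choices: an unconstrained high-degree polynomial is free to contort itself around noisy observations, while the penalized fit is forced to stay simple.

```python
import numpy as np

rng = np.random.default_rng(0)

# A few noisy observations of a smooth function, fit with a
# high-capacity model (degree-9 polynomial). Illustrative numbers only.
x_train = np.linspace(0, 1, 12)
y_train = np.sin(2 * np.pi * x_train) + 0.3 * rng.standard_normal(12)

def design(x, degree=9):
    # Polynomial feature matrix: columns 1, x, x^2, ..., x^degree.
    return np.vander(x, degree + 1, increasing=True)

X = design(x_train)

# Unconstrained least squares: the model is free to fit the noise.
w_ols = np.linalg.lstsq(X, y_train, rcond=None)[0]

# Ridge regression: the L2 penalty (the "capacity constraint" in the
# analogy) shrinks the weights, smoothing the fitted curve.
lam = 1e-3
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y_train)

print("weight norm, OLS:  ", np.linalg.norm(w_ols))
print("weight norm, ridge:", np.linalg.norm(w_ridge))
```

The ridge weights always have a smaller norm than the unconstrained ones; the constrained model cannot memorize every wiggle of the training data, which is exactly the property the post ascribes to a small working memory.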
This connects to Occam’s razor and Solomonoff’s theory of inductive inference, both of which favor the simplest models that sufficiently explain the observations. In each case, model complexity is penalized to avoid overfitting and ensure generalization.
The Limits of Our Understanding
The inductive bias from limited working memory may be advantageous in the human niche, but it has shortcomings worth considering. There may be aspects of reality that remain inaccessible to us because of our cognitive constraints.
Take consciousness. Understanding how self-awareness arises in a system may require accounting for the joint distribution of an astronomical number of variables. If this complexity is irreducible, our cognitive apparatus, bound by its inductive bias, may be inadequate for the job.
It is conceivable that large regions of reality are fundamentally off-limits to human cognition, hidden behind the constraints of our cognitive architecture. The complexity of these phenomena may resist simplification, making them impervious to understanding through abstraction.
As we push the boundaries of understanding, it is worth keeping in mind what our cognitive capacities allow and what they do not.
Connection to LLMs and Unconscious Cognition
This perspective connects to broader themes in AI and cognitive science. The distinction between conscious working memory (System 2) and unconscious processing (System 1) maps onto modern neural network architectures in interesting ways.
Large language models, particularly transformers, perform a kind of pattern completion that resembles System 1 cognition: fast, automatic, operating over vast implicit knowledge. The attention mechanism in transformers can be viewed as a form of working memory, with its own capacity constraints (context window limits) that may similarly serve as regularization.
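As a loose illustration (not how any production transformer implements its context limit), here is scaled dot-product attention with a hard backward-looking window: each position can attend to at most `window` past tokens, a crude stand-in for a bounded working memory. The shapes, values, and window size are all assumptions for the demo.

```python
import numpy as np

def windowed_attention(Q, K, V, window):
    """Causal scaled dot-product attention in which each position attends
    only to the previous `window` positions. Q, K, V have shape (seq, d)."""
    seq, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    # Mask out future positions and anything beyond the window.
    idx = np.arange(seq)
    mask = (idx[None, :] > idx[:, None]) | (idx[:, None] - idx[None, :] >= window)
    scores = np.where(mask, -np.inf, scores)
    # Numerically stable softmax over each row.
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(1)
Q = rng.standard_normal((6, 4))
K = rng.standard_normal((6, 4))
V = rng.standard_normal((6, 4))
out, w = windowed_attention(Q, K, V, window=3)
# Each row of w is a distribution over at most 3 visible past tokens.
print(np.count_nonzero(w, axis=1))
```

Shrinking `window` is a capacity constraint in the same spirit as the post's working-memory bound: the model must summarize or discard older context rather than condition on everything at once.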
Whether these parallels are superficial or point to something deeper about intelligence is an open question. But the observation that constraints can be features, not just limitations, is a useful lens for thinking about both human cognition and artificial intelligence.
Discussion