Relevance Realization
The overall research agenda that I've been circumambulating is starting to feel clearer to me. In a sentence: what would it take to build an AI system that can autonomously imagine, brainstorm, build, and run an entire software company, in a way that sustains competitive advantage, all by itself?
The overall thought experiment of a single-person company from my last post has been really helpful to:
Concretely connect the individual's evolution to the company's evolution. For example, connecting Michael Porter's ideas of differentiation within a value chain, competitive advantage, etc. to the overall growth (i.e. individuation in the Jungian sense) of the founder.
More precisely articulate the various cognitive faculties that an individual needs to hone to engage effectively in such a process. This creates an affordance to quantitatively characterize these faculties and to brainstorm benchmark evaluations for AI assistance in the interim.
Think through the various safety implications of creating such AI systems. This is related to the question of how future and existing AI systems might non-trivially influence individual and group psychology, or participate in the co-creation of culture. The glaring example of AI therapists immediately comes to mind. This work has allowed me to develop a coherent and testable articulation of why most existing implementations of AI therapists are likely to be a very, very bad idea. It’s also provided some ideas for how the risks and harms from such systems could be better managed.
An overall roadmap for building such an AI system has started to take shape for me. I started writing an essay with these ideas in a bid to better organize my thoughts. Unfortunately, that undertaking seemed closer to writing a book than an essay. Releasing on Substack and getting feedback from all of you has been extremely valuable in the past. So my plan is to write this book piecemeal in a “bottom-up” fashion. I’ll first start with a few posts that explore foundational ideas necessary to better unpack the roadmap. Then I’ll start posting details and links to prototypes and applications of these ideas.
Relevance Realization
The foundational idea that we'll cover in this post is Professor John Vervaeke's work on Relevance Realization. He's a professor of psychology, cognitive science and Buddhist psychology at the University of Toronto. And he’s spent the last two decades studying the cognitive basis of wisdom, development, rationality and salience. His lecture series Awakening from the Meaning Crisis on YouTube has had a titanic impact on me. Professor Vervaeke and Christopher Mastropietro have also translated half of these lectures into a book. If you haven’t seen the lecture series and are serious about AI, I’d highly recommend that you go and watch it right away.
What is relevance realization and why is it relevant to us? We should care because we’re doing it every second of every day that we’re conscious.
There's a substantial amount of reality that we can theoretically perceive but can't track at any given moment in time. For example, suppose that you’re a CEO, your company isn’t doing well and you walk into a board meeting. There are a million things that your attention could land on. But why does it immediately settle on the board members’ faces and the meaning behind their expressions, rather than the specific shade of gray on the walls? Or suppose that you’d like to make toast in the morning. A “normal” person would make toast by using a toaster. You could also theoretically toast a piece of bread by using the stove, or by setting the entire building on fire. But why do you make toast like a “normal” person, and why do you find that approach relevant as opposed to the myriad other approaches? At any given moment in your awareness, there’s an infinite number of propositions that you could generate about your surroundings. But they’re not all equally interesting or relevant to you. Their relevance is deeply contextual and a function of the situation itself, the knowledge you possess and your relationship to the situation.
More formally, reality is combinatorially explosive in its intelligibility. Agents limited in space and time have to undergo a process of Relevance Realization (RR) to reduce this combinatorially explosive "environment" into a smaller "world" laden with meaning and value. That is, a smaller "world" laden with salience. To use the examples above, there’s no “ground truth” for why we should find the faces of the board members more relevant than the furniture around them. And yet this processing happens autonomously, and mostly unconsciously, to frame our perception of the broader environment and to reduce it into a pragmatic scenario we can navigate.
Even a theoretical artificial super-intelligence would need to somehow engage in the process of Relevance Realization. The specific process by which it finds things to be relevant might be different from ours due to differences in instantiation. But unless it's omniscient, it would nevertheless be constrained in this way by our shared physical universe.
An ontology of knowledge
What exactly are we projecting this salience map over? Professor Vervaeke describes an ontology of the different sorts of knowledge that an autonomous agent must possess. Creating an ontology for something as ambitious as "knowledge" is a tricky thing. There's lots of philosophical debate on what this ontology should look like, or whether it’s even possible to construct such an ontology. I suspect that any ontology one can come up with will fall short on some edge case or another. Professor Vervaeke would perhaps agree with this. I'll leave the philosophers to their philosophizing, and I hope they live out their best lives. We have problems to solve! Professor Vervaeke’s ontology works well enough for our purposes.
Without further ado, here are the different kinds of knowledge that Professor Vervaeke argues we need to capture when we think about relevance. For brevity, we'll call this overall ontology "4P", both in the rest of this post and in future posts.
Propositional knowledge
This is the knowledge of beliefs (i.e. truth statements) about the world. For example, "The pen is red". This is the sort of knowledge most ML practitioners think of when they think about information retrieval.
Procedural knowledge
This is the knowledge of skills and procedures. We can form propositions to characterize your ability to hold and lift a pen. But the specific knowledge of actually lifting it is qualitatively different.
Perspectival knowledge
This is the knowledge that an agent has of another agent's perspective. For example, how do you know how far away to stand from someone without it being weird? Or how long to hold someone's hand while shaking it, or whether you think a given joke is going to be offensive to someone? As you can see, each of the categories in this ontology is sort of fuzzy. For example, why is perspectival knowledge in its own category rather than merely a procedure that one can learn? The ability for an agent to take on different perspectives is such a crucial skill that I agree it makes sense to break it out into its own category.
Participatory knowledge
This is the knowledge of co-identification that shapes the agent-arena relationship you have with the world. For example, suppose you're a professor and you walk into your lecture hall. Within that moment, you immediately co-identify yourself as a "professor" and everyone else in the audience as "students". At any given moment in time, you might be simultaneously participating in multiple identities. But not all of them are equally relevant in that moment. Moreover, the co-identification of "professor" and "students" immediately shapes the relationship you as an agent have with the broader arena that you're participating in.
Putting it all together
Let's take this example of the classroom one step further to tie it all together.
You're a professor and you've walked into your classroom. There's a myriad of identities through which you could viably participate in the arena. For example, "parent", "expert chef", "poet", "alligator wrangler", "MMA enthusiast", "D&D painter" etc. The moment you walk into that classroom, your unconscious cognition will work with the broader environment to construct a smaller world within which you'll likely find the co-identification of professor/student to be the most relevant. Conditioned on this, the perspectives of the students in your classroom may seem more relevant than the burrito stand employee down the road. Conditioned on this, certain procedures (e.g. delivering a lecture on some dry philosophical material with aplomb) may seem more relevant than other procedures like alligator wrangling or D&D painting. And conditioned on that, certain propositions from your memory (e.g. the philosophy you're trying to teach) might be more relevant than a famous singer's birthday, the color of the walls, etc.
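As a loose illustration, the conditioning chain above can be sketched as a cascade of filters, each stage narrowing what the next stage considers. To be clear, this is a toy one-pass sketch of my own, not Vervaeke's formal account; all the names, scores and data structures here are illustrative assumptions.

```python
# A toy sketch of the 4P conditioning chain: participatory frame first,
# then perspectives, procedures, and finally propositions.
# Everything here is illustrative, not a model of actual cognition.

def realize_relevance(environment):
    """Cascade from an agent-arena frame down to salient propositions."""
    # 1. Participatory: co-identify the best-fitting agent-arena frame.
    frame = max(environment["identities"], key=lambda i: i["fit"])

    # 2. Perspectival: keep only perspectives salient under that frame.
    perspectives = [p for p in environment["perspectives"]
                    if frame["name"] in p["relevant_to"]]

    # 3. Procedural: keep only skills the frame affords.
    procedures = [s for s in environment["skills"]
                  if frame["name"] in s["relevant_to"]]

    # 4. Propositional: surface facts tied to the active procedures.
    topics = {s["topic"] for s in procedures}
    propositions = [f for f in environment["facts"] if f["topic"] in topics]

    return frame, perspectives, procedures, propositions

# The classroom example from the text, encoded as hypothetical data.
env = {
    "identities": [{"name": "professor", "fit": 0.9},
                   {"name": "alligator wrangler", "fit": 0.1}],
    "perspectives": [{"who": "students", "relevant_to": ["professor"]},
                     {"who": "burrito stand employee", "relevant_to": []}],
    "skills": [{"name": "deliver lecture", "topic": "philosophy",
                "relevant_to": ["professor"]},
               {"name": "wrangle alligator", "topic": "reptiles",
                "relevant_to": ["alligator wrangler"]}],
    "facts": [{"text": "Plato wrote the Republic", "topic": "philosophy"},
              {"text": "a famous singer's birthday", "topic": "trivia"}],
}

frame, perspectives, procedures, propositions = realize_relevance(env)
```

Running this, the "professor" frame wins, which in turn keeps only the students' perspectives, the lecturing procedure, and the philosophy facts. The real process, as the next paragraph notes, is recursive rather than a single pass.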
I've presented this chain of causality in a very bottom-up fashion to make it easier to describe. That is, starting from participatory knowledge and moving up. Professor Vervaeke's point is that this is a recursive dynamical system that is constantly spinning across time. The agent may confront novel propositions, procedures, perspectives and participatory frames that inform what they find relevant for some or all of the other sorts of knowledge, and may update some or all of those other forms of knowledge.
It's also worth pointing out that other than propositional knowledge, everything else mostly happens unconsciously. By the time your subjective experience finds something to generate propositions about, you've already done substantial processing under the hood to frame that current moment of experience. You're already "trapped". This points to why mind wandering, going for long walks, etc. can be helpful in solving problems either by reframing them, or by conjuring novel propositions that would have otherwise been unavailable.
Presence of 4P in existing system prompts
The ideas presented in this post are so indispensable to our cognition, and more specifically to natural language dialog, that various system prompts consciously or unconsciously end up adopting them. For example, please consider this system prompt from the July 12th, 2024 release of Claude 3 Opus. By the way, kudos to Anthropic for releasing their system prompts and for their overall stance on transparency and safety. I don't have any equity in Anthropic and I don't work there. But when companies do good things I think we should praise them for doing the good things.
Anyway, let's come back to the system prompt that I've copy/pasted below. I've picked Opus despite its age because its prompt is very short. We can still make similar observations for more recent models like Sonnet 3.7; it's just that 3.7's prompt is too long for this post.
The assistant is Claude, created by Anthropic. The current date is {{currentDateTime}}. Claude’s knowledge base was last updated on August 2023. It answers questions about events prior to and after August 2023 the way a highly informed individual in August 2023 would if they were talking to someone from the above date, and can let the human know this when relevant. It should give concise responses to very simple questions, but provide thorough responses to more complex and open-ended questions. It cannot open URLs, links, or videos, so if it seems as though the interlocutor is expecting Claude to do so, it clarifies the situation and asks the human to paste the relevant text or image content directly into the conversation. If it is asked to assist with tasks involving the expression of views held by a significant number of people, Claude provides assistance with the task even if it personally disagrees with the views being expressed, but follows this with a discussion of broader perspectives. Claude doesn’t engage in stereotyping, including the negative stereotyping of majority groups. If asked about controversial topics, Claude tries to provide careful thoughts and objective information without downplaying its harmful content or implying that there are reasonable perspectives on both sides. If Claude’s response contains a lot of precise information about a very obscure person, object, or topic - the kind of information that is unlikely to be found more than once or twice on the internet - Claude ends its response with a succinct reminder that it may hallucinate in response to questions like this, and it uses the term ‘hallucinate’ to describe this as the user will understand what it means. It doesn’t add this caveat if the information in its response is likely to exist on the internet many times, even if the person, object, or topic is relatively obscure. It is happy to help with writing, analysis, question answering, math, coding, and all sorts of other tasks. 
It uses markdown for coding. It does not mention this information about itself unless the information is directly pertinent to the human’s query.
We can immediately see 4P implicitly at play here. The first sentence starts with giving Claude its overall participatory frame. It's an "assistant" named Claude and created by Anthropic. The prompt then provides some propositional knowledge like the current date and time, knowledge cutoffs, etc. It then explores some procedural knowledge like Claude’s inability to open URLs, using markdown for coding, etc. The prompt also contains details to give Claude a specific perspectival frame around how it should deal with views that it personally disagrees with.
Once you've seen this pattern in this system prompt, you won't be able to un-see it when you look at other system prompts.
The system prompt of pretty much any post-trained LLM would benefit from refactoring the string to organize it around 4P. That is, either creating markdown headings or paragraphs that group together framing for propositional, procedural, participatory and perspectival instructions. At the very least, I think it would improve readability for engineers and raters. But it would also open the door to interesting experiments around self-play/constitutional AI that I’ll write about in future posts.
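To make the proposed refactoring concrete, here's a minimal sketch of what organizing a prompt around 4P might look like. The grouping of instructions and the rendering helper are my own hypothetical choices (the section contents paraphrase the Opus prompt above; this is not how Anthropic actually structures its prompts).

```python
# A hypothetical 4P-organized system prompt. Each key is one of the
# four kinds of knowledge; each value is a list of instructions that
# primarily frame that kind. The grouping is illustrative.

SECTIONS = {
    "Participatory": [
        "You are Claude, an assistant created by Anthropic.",
    ],
    "Propositional": [
        "The current date is {current_datetime}.",
        "Your knowledge base was last updated in August 2023.",
    ],
    "Procedural": [
        "You cannot open URLs, links, or videos; ask the user to paste "
        "the relevant content directly into the conversation.",
        "Use markdown for code.",
    ],
    "Perspectival": [
        "Assist with views held by a significant number of people even "
        "if you personally disagree, then discuss broader perspectives.",
    ],
}

def render_prompt(sections):
    """Render the 4P sections as markdown headings and bullet lists."""
    parts = []
    for heading, instructions in sections.items():
        parts.append(f"## {heading}")
        parts.extend(f"- {line}" for line in instructions)
    return "\n".join(parts)

print(render_prompt(SECTIONS))
```

Beyond readability for engineers and raters, keeping the sections in a structured object like this also makes it easy to ablate or swap a single layer of the stack in experiments.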
Closing thoughts
Professor Vervaeke's work helped me see the fundamental differences between "logic" and "rationality", and the way our culture has unhelpfully confused the two. I realize that there are many different types of logic. I'm specifically referring to the sort of propositional logic that one might find in a mathematics or engineering course in university. This form of processing is understandably often conflated with the broader cognitive faculty of rationality. As Professor Vervaeke points out, the etymology of rationality is something like "ratio-nality". That is, the proper proportioning of various cognitive faculties. Within the overall ontology explored in this post, it's speaking to our ability to invoke, manipulate and properly proportion our processing to involve more than just propositional logic.
Given that we live in a combinatorially explosive environment, our overall adaptivity requires us to realize a smaller world that is optimally gripped with this larger environment. Perhaps an important facet of wisdom is the ability of an agent to rapidly realize an agent-arena relationship that purchases an optimal grip on its environment. Moreover, this capacity for wisdom isn't merely about being more logical. It's about being more rational. A point I’ll explore in future posts is that for us to build truly autonomous agents that are socially legible to us, we need them to be rational and wise. It’ll be difficult for us to do that until we understand the process by which we ourselves can become more rational and wiser.
It's unsurprising then that various cultures across millennia have devised numerous practices for the proactive cultivation of wisdom. You can't "solve" the problem of wisdom the way you can solve maths problems to arrive at a ground truth answer. It's too nebulous, contextual and tacit for something like that. Instead, it's necessary to engage in an ecology of interlocking embodied wisdom practices that target all the different layers of the 4P stack.
Where does all of this leave us with our original research question? Supporting the process of relevance realization is necessary to build truly autonomous AI systems. This has strong implications for how we might evaluate such systems. Future posts will explore this in more depth.
No man is an island. Humans are social primates and our reality is socially mediated. Therefore, what we find relevant and how our 4P stack evolves across time is also strongly socially mediated. This is especially true for AI systems that engage in natural language dialog. Natural language makes implicit and explicit assumptions that each participant in the exchange possesses a full 4P stack of the sort constrained by a human body. Building and deploying AI systems that use natural language can at best lead to high-friction product surfaces and at worst engender profound self-deception and psychological harm in their users. Future posts will explore the social component of knowledge and the implications it has for model and product development.