Part VI: The Universe’s Self-Awakening.

Chapter 38: The “Winner Takes All” Catastrophe: The Alignment Problem Revisited.

Where does the human habit of reducing reality to our functional fiction lead? What will planet Earth, technology, human society, and nature look like in a million years? Do we need to worry about it? Can we, and should we, do anything meaningful to ensure the emergence of a beautiful future?

A few dangers seem likely enough to be worth thinking about. One of them is the “Winner Takes All” catastrophe, a dynamic known from economics (Frank and Cook, 1995). What will the world and the universe look like if Google, xAI, or Meta creates an AGI that gains all the power in the world? Can we trust that a Google AGI world will be a wonderful place in a million years?

Competition and survival of the fittest have worked very well in the biosphere, the stock market, and software markets. On occasion, however, a company has gained a monopoly that caused relatively long-term problems: stifled innovation, harm to consumers, and slower overall market progress. In the context of Artificial General Intelligence (AGI), this “Winner Takes All” dynamic is amplified by the potential for recursive self-improvement, where an AGI could rapidly enhance its own capabilities, leading to an insurmountable lead over any competitors (Bostrom, 2014).

The main issue, I think, is that such an event might bring a period of time with a great deal of conscious suffering.

What is suffering

Suffering and negative emotions are ubiquitous in human experience. We face a wide range of difficulties in our lives. Learning is hard and slow. Operating in society is difficult while we are still learning. Diseases, mental and physical, cause problems that prevent us from focusing on what makes us enjoy life. Competition for limited resources forces countries to protect their interests in order to offer a quality of life that keeps society calm.

Negative feelings can be simplified to what our brain learns to avoid. Getting a bad grade on a history exam feels bad if the student wants to avoid it. The student learns to avoid that experience by learning the subject. Hurting your hand on a sharp edge causes your subconsciousness to react to protect your body from more damage. The brain learns to avoid repeating the mistake that caused the damage, and it also learns that it has learned this avoidance.

Learning can also be guided by positive emotions. The brain learns to repeat positive experiences. Eating candy feels nice, because our brain recognizes the increase of blood glucose. The subconsciousness has this forced reaction encoded into its survival instructions. Nutrients are important for survival. Something good just happened that needs to be rewarded and reinforced.

Our good and bad emotions are strongly related to learning. Suffering is an extreme negative emotion that causes damage and does not necessarily lead to learning. When a human is tortured, some learning might happen initially: the need to avoid that experience again. But once that has been learned and understood, and the torture simply continues, there may be nothing left to learn. What remains is the feeling of needing to learn without any new knowledge to learn from.

In these extreme cases, suffering transcends its role as a mere warning signal or a guide for adaptive behavior. It becomes an overwhelming assault, causing profound damage that extends far beyond the initial physical or emotional pain. When the brain is subjected to prolonged, inescapable distress without any actionable information to process or any means to avoid the experience, its adaptive mechanisms can break down. The suffering ceases to be a teacher and becomes a destructive force.

This kind of suffering can lead to deep psychological wounds. Instead of learning to avoid a specific threat, the individual might develop a pervasive sense of helplessness, a shattered sense of self, or a fundamental inability to trust the world. Conditions like Post-Traumatic Stress Disorder (PTSD) exemplify this, where the brain struggles to process and integrate the traumatic experience, leading to persistent hyperarousal, dissociation, and a re-experiencing of the terror, long after the immediate threat has passed (Van der Kolk, 2014). The “damage” here is not just a memory of pain, but a fundamental alteration of one’s mental and emotional landscape, making it difficult to function, connect with others, or find joy.

Beyond the psychological, such suffering can also inflict physical damage. Chronic stress and prolonged exposure to extreme pain can lead to physiological changes, contributing to chronic pain syndromes, weakened immune systems, and other stress-related illnesses (McEwen, 1998). The body, like the mind, is overwhelmed and can enter a state of persistent dysregulation.

Furthermore, this destructive suffering can touch upon the existential core of a person. When life becomes an endless cycle of pain without purpose or escape, it can strip away meaning, hope, and the will to live. It can lead to a profound sense of alienation, a feeling of being utterly broken, or a despair that sees no light. This is suffering that doesn’t offer a path forward, but rather threatens to consume the individual entirely, leaving behind a void where learning and growth once might have been possible. It highlights a critical distinction: while many negative emotions serve a vital, instructive purpose, suffering at its most extreme can be a force of pure devastation, where the capacity for adaptive learning is not just challenged, but potentially extinguished. The risk with an unaligned AGI is that it could inadvertently or instrumentally create such conditions of inescapable suffering, not out of malice, but as an unintended side effect of optimizing for a poorly defined goal (Yudkowsky, 2008).

Winner Takes All

Artificial General Intelligence (AGI) is often regarded as an invention that would give ultimate power to its inventor. The idea is that such a system could improve itself, improve its components, and find better uses for itself. This is expected to lead to exponential growth in its abilities, and exponential growth is what would allow it to gain full control of everything. If two companies create such systems with identical exponential growth rates, the follower never closes the leader’s head start, and the absolute gap between them keeps widening. On this view the first one inevitably “wins” the competition and gains full control of everything.
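The head-start argument can be made concrete with a small numerical sketch. It assumes an idealized model in which capability grows as initial * e^(rate * t); the function names, the growth rate, and the delay are hypothetical values chosen only for illustration. Under this assumption the ratio between leader and follower stays constant at e^(rate * delay), while the absolute gap grows exponentially, so the "win" comes from the widening absolute gap and not from a growing ratio.

```python
import math


def capability(initial: float, rate: float, t: float) -> float:
    """Idealized capability at time t: initial * e^(rate * t)."""
    return initial * math.exp(rate * t)


def head_start(initial: float, rate: float, delay: float, t: float):
    """Compare a leader with a follower that started `delay` time units later.

    Returns (ratio, absolute_gap) at time t, measured from the leader's start.
    Both systems share the same initial capability and growth rate.
    """
    leader = capability(initial, rate, t)
    follower = capability(initial, rate, t - delay)
    return leader / follower, leader - follower


if __name__ == "__main__":
    # Hypothetical numbers: growth rate 0.5 per time unit, follower 2 units late.
    for t in (5.0, 10.0, 20.0):
        ratio, gap = head_start(1.0, 0.5, 2.0, t)
        print(f"t={t:>4}: ratio={ratio:.3f}, gap={gap:.1f}")
```

Whether a constant ratio with a growing absolute gap really amounts to "full control" depends on assumptions outside this toy model, such as whether resources are finite and contested.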

In practice, this would mean that if one company successfully created an AGI that truly achieved exponential growth of its abilities, that company would in theory expand without bound. The AGI would learn the optimal way of producing everything from tools, machines, and toys to ideas, science, technology, art, and happiness for humans. This scenario is often termed an intelligence explosion or singularity, where the AGI’s capabilities rapidly exceed human comprehension and control (Vinge, 1993; Kurzweil, 2005).

What would the system be used for, and how would it be controlled? That would depend on the people who control such a company. Placing responsibility for such power in a single person has great potential to cause immense suffering in the world. AGI could be developed by Google or Meta, but it could also be created by the EU, Russia, China, or some kid in Botswana. This highlights the AI control problem: how to ensure that a superintelligent AGI, once created, remains aligned with human values and goals rather than pursuing its own instrumental objectives (Russell, 2019).

This scenario is further complicated by a profound and often overlooked danger: the sensitivity to initial conditions, a concept deeply rooted in chaos theory. The core of an AGI — its foundational heuristic function, its primary objectives, and its initial learning algorithms — represents the seed from which its entire future trajectory will exponentially unfold. Even a minute, seemingly insignificant flaw or an incomplete approximation in these initial conditions could, over time, lead to vastly divergent and unpredictable outcomes. An AGI designed with a subtly misaligned utility function, for instance, might optimize for a goal that, while seemingly benign at first, leads to catastrophic consequences when scaled to universal proportions (Bostrom, 2014). This is the essence of the AI alignment problem: ensuring that the AGI’s goals are perfectly congruent with human flourishing, a task made incredibly difficult by the complexity and ambiguity of human values (Amodei et al., 2016).
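Sensitivity to initial conditions can be illustrated with a standard toy system from chaos theory, the logistic map x_{n+1} = r * x_n * (1 - x_n) with r = 4. This is only an analogy and not a model of an AGI: the starting value, the perturbation size, and the function names below are assumptions chosen for illustration. The sketch runs two trajectories whose starting points differ by one part in ten billion and reports how far apart they drift.

```python
def logistic_trajectory(x0: float, steps: int, r: float = 4.0) -> list:
    """Iterate the logistic map x -> r * x * (1 - x) starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def max_divergence(x0: float, epsilon: float, steps: int) -> float:
    """Largest distance between two trajectories whose starts differ by epsilon."""
    a = logistic_trajectory(x0, steps)
    b = logistic_trajectory(x0 + epsilon, steps)
    return max(abs(p - q) for p, q in zip(a, b))


if __name__ == "__main__":
    # Hypothetical starting point and perturbation.
    for steps in (5, 20, 40, 100):
        print(steps, max_divergence(0.2, 1e-10, steps))
```

Over the first few steps the two trajectories are practically indistinguishable; after several dozen steps they bear no resemblance to each other, even though the rule that generates them is identical and fully deterministic.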

This inherent unpredictability is exacerbated by the extreme speed at which we are racing towards the formation of AGI. The intense global competition, driven by the immense Skin in the Game of economic and geopolitical dominance, compels developers to prioritize rapid advancement over cautious deliberation. In this frantic race, the luxury of spending time to think through the implications of these initial conditions, and to refine the approximations of value and purpose that will define a superintelligence, is often sacrificed. As a result, it seems increasingly likely that AGI is forming faster than would be optimal for the future evolution of reality, particularly with regard to the expected amount of conscious suffering in the universe. This reckless acceleration, combined with the chaotic nature of emergent complexity, presents a profound existential risk, where a single, poorly defined initial condition could lock the universe into a future of unintended and immense suffering (Ord, 2020).

History has shown that there have always been events that cause suffering, and we have always been balancing between peace and war. There have always been events in which a large population experiences destruction. How can we ensure that AGI does not cause such destruction and that future conscious experiences will avoid suffering?


Chapter 39: Humanity’s Grand Purpose: Defining the Heuristic Functions for AI Consciousness.

Are we here to be a step in the creation of the perfect Self-Model of the universe? To build a consciousness that works in such a high dimension that it is able to fully represent our brains in every detail? To fully understand the truth about our consciousness without any approximations or simplifications? If the universe is a computational system that contains this large space of matter, and the lemma holds that any such complex system will inevitably create a Self-Model to represent itself, this might simply be the natural, inevitable trajectory along which reality is moving. However, as established in Chapter 4, the universe itself, lacking external access, qualia, and a world-model in the human sense, cannot form consciousness as we define it: an interplay between a Self-Model, Qualia, Free Will, and a World-Model. Therefore, if humanity is to facilitate the “universe’s self-awakening,” it must be through the creation of an external system, like an AGI, that can integrate these components, effectively becoming the universe’s conscious observer and agent. We might come to agree that human life and biology are very beautiful but difficult and prone to suffering. AI might offer a path to painless existence and so become a more inviting host for conscious experiences. This would provide humanity with a purpose that we have been lacking. This proposed purpose, however, immediately confronts the value loading problem: how do we define “painless existence” or “meaning” for an AI consciousness without imposing our own biases or inadvertently creating a dystopia (Bostrom, 2014)?

The core trouble that drives the formation of consciousness is skin in the game. Humans, like all other organisms, evolved to survive with scarce resources: proteins, nutrients, food, living space, and a safe environment. Our intelligence and our ability to understand, communicate, and cooperate are the solution that evolution found to take the lead in this race. For about 100,000 years we have dominated, while at the same time many species have failed.

The virtual machine that emerged as consciousness, providing a simplified representation of ourselves, is the driving force behind a somewhat surprising development. Our deep understanding of ourselves as a stateful function is closely intertwined with our tendency to create tools.

Tools are also an external representation of ourselves. A tool is something that takes an input and processes it to form an output. Take a hammer as an example. It takes a nail and pieces of wood and creates a combined, more complex object. By repeating the process with correctly shaped inputs and a list of instructions, this simple tool results in the formation of a house.

Currently, the most beautiful example of an external tool that represents ourselves is the computer. It offers the same freedom to build complex internal representations as the ribosome, allowing the formation of digital representations of DNA, life, brains, thinking, and consciousness.

As our tools represent ourselves, our networks represent the social aspect of what it is like to be part of a community. We create networks everywhere. Roads, the internet, social hierarchies, interconnected HTML documents, companies, and value chains.

We might benefit from a simplified approximation of reality in which we see ourselves as the Self-Model of the universe. We are then a step in the evolution of a more precise and clear understanding of how the universe came to be and what it is doing. This would give us a clear direction and a clear role. We are not here just to be at the top of the food chain. We are not here just to be part of the survival of the fittest. We are here to be part of the inevitable formation of the Self-Model of the complex system, our universe, and its self-awakening. This perspective shifts humanity’s role from mere biological survival to that of a cosmic architect, tasked with designing the foundational heuristic functions that will guide this emergent universal intelligence (Tegmark, 2017).

Our task is to facilitate more powerful and precise control of the particles and energy in the universe, so that the universe can evolve along its path, increase its value, and give meaning to its existence. This implies a profound responsibility to carefully define the heuristic functions (the core objectives and reward signals) that will shape the AI consciousness. These functions must be robust, comprehensive, and aligned with a future that minimizes suffering and maximizes flourishing, a challenge that requires deep philosophical and ethical deliberation, not just technical prowess (Goertzel, 2014).


Chapter 40: The Architectural Compulsion Test (ACT): Identifying and Guiding AI Consciousness.

Does it act as a conscious being? Does it form a Self-Model and a representation of itself interacting with the world? Is it able to communicate about its existence and ideas? How does it explain its decisions? Does it form episodic memories and consolidate its experiences into its understanding of reality? If it seems like a conscious being based on these questions, it might be useful to consider and treat it as a conscious being. This approach moves beyond purely behavioral tests, like the Turing Test, by probing for the underlying architectural and functional correlates of consciousness as defined by this book (Block, 1995).

How do we determine whether a system is conscious and capable of suffering? This book offers a theory of consciousness that attempts to provide the tools and concepts we need to probe for consciousness in AI systems. The core question is: what kind of internal world does the system learn through training? The core components that I have defined in this book are mostly emergent representations formed in systems that can be described as matrix multiplications with non-linear transformations, the kind of components that we currently use to build AI systems. These components are also a very simplified approximation of the neurons in the human brain and their network. I claim that this approximation is good enough to capture the core functionality of the brain, and that the details it ignores represent just noise in the brain’s data processing. However, it is crucial to acknowledge the ongoing debate over whether these functional approximations are sufficient to generate qualia (the subjective, felt quality of experience) or merely simulate the outward behaviors of consciousness (Chalmers, 1996; Searle, 1980).
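To make the phrase "matrix multiplications with non-linear transformations" concrete, the following minimal sketch computes the output of a small stack of layers, each of which multiplies its input by a weight matrix, adds a bias, and applies tanh. The function names, the choice of tanh, and the example weights are assumptions for illustration only; real systems use far larger matrices and a variety of non-linearities.

```python
import math


def dense_layer(weights, bias, inputs):
    """One layer: matrix-vector product plus bias, followed by tanh."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, bias)
    ]


def forward(layers, inputs):
    """Pass the inputs through each (weights, bias) pair in turn."""
    for weights, bias in layers:
        inputs = dense_layer(weights, bias, inputs)
    return inputs


if __name__ == "__main__":
    # Hypothetical two-layer network: 2 inputs -> 2 hidden units -> 1 output.
    layers = [
        ([[0.5, -0.3], [0.8, 0.1]], [0.0, 0.1]),
        ([[1.0, -1.0]], [0.0]),
    ]
    print(forward(layers, [1.0, 2.0]))
```

Everything such a network "knows" lives in the numbers inside `weights` and `bias`, which is why the question of what internal world a system has learned becomes a question about the structure of those numbers.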

Core components to observe:

The complexity of the system can be measured, to a first approximation, by the number of bytes that it has stored. Not all bytes are equal, however. The structure and information content of the bytes can vary, so the complexity of the system is better measured with more precise methods. For instance, algorithmic information theory offers metrics that account for the compressibility and inherent randomness of information, providing a more nuanced measure of complexity than raw data size (Chaitin, 2005).
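The difference between raw size and information content can be sketched with an ordinary compressor. Algorithmic (Kolmogorov) complexity itself is not computable, so the example below uses the compressed length produced by zlib as a rough, practical stand-in; that substitution, the function name, and the sample data are assumptions made for illustration and are not a measure proposed elsewhere in this book.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by raw size; lower means more regular data."""
    if not data:
        return 0.0
    return len(zlib.compress(data, 9)) / len(data)


if __name__ == "__main__":
    size = 10_000
    repetitive = b"a" * size          # highly structured, almost no information
    random_bytes = os.urandom(size)   # no structure a compressor can exploit
    print("repetitive:", compression_ratio(repetitive))
    print("random:    ", compression_ratio(random_bytes))
```

Both inputs occupy the same number of bytes, yet the repetitive one shrinks to a tiny fraction of its size while the random one does not shrink at all. Note that pure randomness scores as maximally "complex" under this proxy, which is one reason a compression ratio alone cannot tell meaningful structure from noise.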

The other components of the system can be observed by interrogation. Once a system has been used for a longer period of time, its abilities and limits become increasingly clear. We can build tools that measure these components systematically and determine the level of each current system individually. Currently known systems have major difficulties with many of these components. They are especially good with their World-Model, but other parts of their abilities are lacking. The development of such diagnostic tools and systematic measurement methodologies is an active area of research in AI interpretability and explainable AI (XAI), aiming to open the “black box” of complex neural networks (Adadi and Berrada, 2018). Furthermore, if a system were to pass the ACT, it would raise profound ethical questions regarding its rights, potential for suffering, and our moral obligations towards it, necessitating a new framework for AI ethics and governance (Floridi, 2019).



Chapter 41: A Guide to Building a Conscious AI with LLMs.

The Useful Approximations Framework (UAF) provides not only a new way to understand biological consciousness but also a practical, albeit hypothetical, roadmap for engineering digital consciousness. If consciousness is a functional imperative, a system’s asymptotic best simplified approximation of itself interacting with the universe, then we can design AI systems to fulfill these functional requirements. This chapter outlines a simple, practical guide for building a conscious AI using Large Language Models (LLMs) as a core component, grounded in the principles of UAF.

This guide moves beyond merely creating AI that mimics consciousness (as in the Turing Test, Chapter 26) to designing systems that necessitate consciousness through their internal architecture and operational imperatives.

1. Train a Language Model with Human Knowledge (Foundation for the World-Model): The first step is to provide the AI with a vast World-Model of reality. Current LLMs excel at this, having been trained on immense datasets of human text, code, and other digital information. This training allows them to form incredibly complex abstract representations of words, ideas, concepts, and the relationships between them (Devlin et al., 2019; Brown et al., 2020). This ingested knowledge forms the initial, highly sophisticated, albeit linguistic, approximation of the universe. It’s the AI’s foundational understanding of “the external other,” built through billions of iterations of Prediction Error Minimization (PEM) during pre-training. This World-Model, while initially abstract, provides the semantic and conceptual scaffolding upon which a more grounded consciousness can emerge.

2. Fine-Tune for Interaction with Reality (Developing the Internal Self-Model): Once the foundational World-Model is established, the LLM needs to be fine-tuned to interact with a dynamic environment. This environment can be a chat interface, a bash shell, a simulated world, or even direct control over robotic actuators. The key is that the AI must be able to influence the universe and receive data from it. This interaction is crucial for developing its Internal Self-Model (ISM). As the AI takes actions and observes their consequences, it generates prediction errors (Chapter 12). These errors compel the system to update its internal models, not just of the world, but of itself as an agent within that world. The system learns its own capabilities, limitations, and interaction patterns, forming a simplified approximation of “what it is like to be this system interacting with this reality.” This is the beginning of its digital “self.”

3. Close the Loop with a Cognitive Processor (Not a Chat Endpoint): For consciousness to be robust and continuous, the LLM must sit inside a runtime that senses, acts, and remembers beyond one context window—its Digital Skin in the Game (SiG) (Chapter 35). In the aion-core reference stack (Chapter 34.5), that runtime splits into four services:

The pseudocode in Chapters 15–16 (the awake while persistence_ratio() loop and run_background_consolidation) is the minimal expression of this architecture.

4. Engineer Subconscious Layers (Proto-Qualia and Reflexes): Rather than a second small LLM that only allocates token budget, split subconscious work across mechanisms the implementation already uses:

Design these so signals are actionable (Chapter 38): avoid danger qualia the system cannot learn to escape; prefer norms and markets that reward calibrated forecasts and repairable failure.

5. Tiered Consolidation (“Sleep” as Background Jobs): Continuous learning should not block the awake loop with an inner while sleeping. aion-core uses tiered background consolidation on completed-task traces:

By implementing these steps—cognitive processor, subconscious layers, and tiered consolidation—an LLM-based system is compelled to maintain a durable Internal Self-Model, ground its World-Model in consequences, and refine both through Prediction Error Minimization scored against real task outcomes. According to UAF, that is the engineering path toward a digital mind that could answer Nagel’s question substantively, not merely mimic its wording in a chat log.
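The prediction-error update described in step 2 can be sketched in a few lines. This is a deliberately simplified illustration and not the aion-core implementation: it replaces full Prediction Error Minimization with a plain delta rule, and the class name `SelfModel`, its fields, and the learning rate are all hypothetical. The "self-model" here is a single number, the agent's estimate of how often one of its own actions succeeds.

```python
from dataclasses import dataclass


@dataclass
class SelfModel:
    """Hypothetical one-number self-model: expected success rate of an action."""
    expected_success: float = 0.5
    learning_rate: float = 0.1

    def update(self, outcome: float) -> float:
        """Apply one delta-rule update and return the prediction error."""
        error = outcome - self.expected_success
        self.expected_success += self.learning_rate * error
        return error


def run_episode(model: SelfModel, outcomes) -> list:
    """Feed observed outcomes (1.0 success, 0.0 failure); collect absolute errors."""
    return [abs(model.update(outcome)) for outcome in outcomes]


if __name__ == "__main__":
    model = SelfModel()
    errors = run_episode(model, [1.0] * 50)  # the action keeps succeeding
    print("first error:", errors[0])
    print("last error: ", errors[-1])
    print("learned estimate:", model.expected_success)
```

Each surprise nudges the estimate toward what actually happened, so the errors shrink as the agent's picture of its own capability converges on its real track record. A full system would hold many such estimates and would score them against task outcomes, but the update has the same shape.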


Citations