Hypothesized Problems Solved

01 · Governance

Democracy vs Authoritarianism

Which control architecture keeps a state's internal model closest to reality?

The Problem

\(\mathcal{D}_{KL}\) — the divergence between the controller's generative model and the world — sits in the denominator of the persistence equation. Doubling it halves the effective yield of every resource the state commands. Neither classical regime has a mechanism that reliably drives it to its information-theoretic floor.

Democracy optimises for popularity: candidates who voice the most resonant beliefs win, not the most accurate ones. Political competition systematically inflates \(\mathcal{D}_{KL}\) in the public information field as a side-effect of attention-market dynamics. Error correction fires only at the ballot box — noisy, late, and often aimed at the wrong cause.

Dictatorship concentrates \(\mathcal{D}_{KL}\) at the apex with no structural correction. Advisors who deliver bad news face career costs; the censorship apparatus that suppresses dissent also suppresses the error signal. The result is a ratchet: each year of drift makes correction more threatening to the power structure that protects the error. The gap closes abruptly and catastrophically.

The Solution

Persistence-Based Governance (PBG) is the unique class of governance architectures whose civic mechanism makes \(\mathcal{D}_{KL}\) reduction its direct objective. Its concrete implementation — Predictive Governance — separates two acts that all classical regimes bundle together: expressing a preference and making a claim about reality.

Citizens submit public probabilistic forecasts of outcome variables (GDP, unemployment, energy balance, conflict probability). Forecasts are scored against resolution. Governance weight is proportional to track-record accuracy. The mechanism creates the inverse incentive gradient of democracy: accuracy builds civic credibility; delusion publicly degrades it.

Three structural conditions are independently necessary and jointly sufficient for \(\mathcal{D}_{KL}\) minimisation:

(C1) Outcome-coupled feedback — every civic act eventually resolves against an observable.
(C2) Calibration-weighted aggregation — contributors are weighted by forecast accuracy, not willingness to participate.
(C3) Preference–prediction separation — the belief channel is structurally distinct from the values channel.

Regime	\(\mathcal{D}_{KL}\) locus	Correction	Failure mode
Democracy	Public information field	Noisy, election-cycle	Slow erosion via narrative drift
Dictatorship	Controller's model (apex)	None — ratchet dynamic	Catastrophic single-point failure
PBG	Distributed, public, scored	Continuous, track-record driven	Capture of the resolution authority

Theorem (PBG dominance). For any governance architecture \(G\), let \(G^*\) be the architecture obtained by overlaying PBG's scoring-and-weighting layer. Then in steady state \(\mathcal{R}(G^*) \ge \mathcal{R}(G)\), with strict inequality unless \(G\) already satisfies (C1), (C2), and (C3). The proof follows from the monotonicity of the FPE in \(\mathcal{D}_{KL}\) and the proper-scoring-rule theorem (Gneiting & Raftery, 2007). PBG is the unique stable attractor in the regime-space dynamics. Reference implementation: Proof of Trust — PBG for a society of compute nodes, not nation-states.

02 · Machine Learning

Catastrophic Forgetting

Training on new tasks should not erase what the model already knows.

Aion LLM guide →

The Problem

When a neural network is trained sequentially on domains \(D_1, D_2, \ldots, D_T\), training on \(D_k\) overwrites the weights that encoded \(D_{k-1}\). This is catastrophic forgetting: the new learning pass reuses the same shared parameter pool that older knowledge already depends on.

In a dense transformer, knowledge is stored in overlapping internal features. New domains therefore tend to rewrite the same middle-of-the-network representations that older domains were using. Prior mitigations such as EWC operate in weight space: they add a Fisher-information penalty to protect important parameters after the fact. That helps, but it does not change the deeper problem: the architecture itself still mixes too much information into one shared substrate.

The Solution

The hypothesis here is architectural before it is loss-level. A neural network, by its nature, is trying to persist the information it has learned. The proposal is to make that explicit by turning the transformer into a FractalMoE graph: a hierarchy of semi-isolated experts or nodes, each of which acts as a small information-persisting unit.

Sparse routing means each training example updates only part of the graph. The hypothesis is that this causes learning to concentrate mainly in the middle layers, where representations are recombined and routed between experts, while the persistence of each node protects already-learned information from being overwritten.

Layer / structure	Hypothesized role	Effect on forgetting
Lower layers	Encode more stable primitives and reusable substrate features	Change slowly; provide continuity across domains
Middle FractalMoE layers	Main site of recombination, routing, and domain-specific adaptation	Learning pressure concentrates here
Persistent nodes / experts	Keep the information content they already encode unless routed and updated	Protect old knowledge from global overwrite

In this framing, persistence is not a metaphor. It is the design objective of the nodes themselves: once an expert has learned useful information, the architecture should let that expert keep it. Sparse activation, expert isolation, and fractal routing are what make this plausible. The information persistence of the neural network comes from the name: the network is being built to preserve what it has already successfully encoded.

The concrete hypothesis is therefore two-part: (1) the FractalMoE architecture pushes most new learning into the middle layers of the network, and (2) the persistence of each node in that graph is what protects learned information content from being forgotten. The loss and replay machinery are supporting tools for measuring and training that behaviour; they are not the main conceptual claim.

vs. EWC (weight-space anchor)

EWC protects old knowledge by penalising later weight updates. The FractalMoE hypothesis tries to solve the problem one level earlier: structure the network so knowledge already lives in partially isolated persistence nodes, then let new learning route into the right parts of the graph instead of rewriting one dense shared interior.

At the prompt level (aion-core)

The same logic applies to system-prompt optimisation. Previously successful task families form a replay set; any proposed prompt edit must not degrade their win rate beyond a threshold. The anti-forgetting constraint is structurally identical, one layer up.

03 · Systems

Training Data Generation

Where does high-quality fine-tuning data come from, without human labellers?

Aion Core guide →

The Problem

Fine-tuning an LLM for a specific task requires labeled examples: (input, correct output) pairs. Producing them is expensive (human annotation), slow, biased toward pre-specified task templates, and fundamentally disconnected from whether the model actually performs better in deployment.

Synthetic data generation — having a stronger model produce examples — alleviates the cost but introduces distributional bias toward what the teacher model already knows. Neither approach provides a ground-truth quality signal: an objective measure that the example actually represents competent behaviour in the real task environment.

The Solution

The aion-core loop generates training data as a side-effect of operation. Every task the system completes produces a (system prompt + task messages, tool calls) trace. The persistence ratio \(\Delta\mathcal{R}\) of the completed task is the quality signal: did this sequence of actions actually improve the system's persistence?

Only traces from COMPLETED tasks with positive \(\Delta\mathcal{R}\) are exported. These are the strongest possible ground-truth examples: the agent succeeded at a real task in the real environment and the outcome was measured. No human labeller is involved. The model that trains on this data gets better at exactly the tasks it was built to do, as measured by the same criterion it is deployed against.

audit DB → distill_runner.py → aion-llm training → LlmProfileConfig → shadow → promote

1. Query

Audit DB: event_type='llm.call' + task in COMPLETED + R_delta > 0. Discard overflows, multiple resets.

2. Export

Reconstruct (system_prompt + messages, tool_calls) JSONL pairs. Written to DATA_ROOT/training/{agent_type_id}/{run_id}/.

3. Train

LoRA adapter with fractal_loss + replay_frac=0.25 to preserve cross-domain knowledge. Anti-forgetting constraint applies here too.

4. Shadow & promote

New adapter runs in shadow mode (shadow_active=true). Promoted only after \(\mathcal{D}_{KL}(\text{actual} \| \text{predicted\_new}) \le \mathcal{D}_{KL}(\text{actual} \| \text{predicted\_current})\).

The learning ladder. Fine-tuning is the third rung of a five-mode escalation: Notes (free) → Prompt-GD (cheap) → LoRA (moderate) → Full fine-tune (expensive) → RL policy update. Each mode is triggered only when cheaper modes have plateaued on \(\mathcal{R}\). The training data pipeline is activated automatically from the same audit infrastructure that runs continuously in production.

04 · Infrastructure

LLM Context Management

Agentic tasks produce arbitrarily long transcripts. Context windows are finite.

Read The Cognitive Processor →

The Problem

LLMs have a finite context window. Agentic tasks — where the model executes dozens or hundreds of tool calls to complete a goal — produce transcripts that grow without bound. Standard approaches are all lossy: truncation drops early context, brute-force summarisation collapses detail, longer-context models pay quadratic attention cost.

More fundamentally: making the LLM re-read the full growing transcript at every step is \(O(n^2)\) in the transcript length. The agent is spending most of its context budget on history rather than on the current decision.

The Solution

Two complementary mechanisms — one reactive, one structural.

Layer 1 — Rolling summarisation on overflow. query_chat_resilient in llm_client.py holds a per-task context cache. When a task's growing message list would overflow the window on the next call, the oldest messages are summarised before the call proceeds. The summary replaces the raw messages in the cache; the full messages remain durable in the processor's task storage. The agent sees a coherent compressed history; nothing is permanently lost from the record.

Layer 2 — State-mode loop. Agents configured in state mode receive not a growing transcript but a fixed-size envelope each step:

Messages mode (default)

step 1: [state] + [task msgs 1..n]
step 2: [state] + [task msgs 1..n+2]
step 3: [state] + [task msgs 1..n+4]
→ transcript grows without bound

State mode

step 1: [env_state] + [delta] + [scratchpad]
step 2: [env_state] + [delta] + [scratchpad]
step 3: [env_state] + [delta] + [scratchpad]
→ O(1) per step, scratchpad is RAM

In state mode the agent's private scratchpad (Task.state._loop_agent) persists durable reasoning across steps without entering the LLM context. The task transcript — episodic memory — is stored in the processor but shown to the LLM only as the current-step environment delta, not replayed in full. Structurally: scratchpad is RAM, task messages are the hippocampus, model weights are long-term memory.

This mirrors the IPS architecture at L3: the cognitive processor has distinct working memory (state mode scratchpad), episodic memory (processor messages), and consolidated long-term memory (model weights via LoRA/fine-tuning). The context window constraint is the biological analogue of limited working-memory capacity — the solution is the same structural separation.

05 · Philosophy of Mind

Mathematical Definition of the Humanities

Consciousness, free will, memory, emotion, and moral agency have resisted formal definition.

Read the book →

IPS paper →

The Problem

The humanities — philosophy, psychology, history, literature, politics — share a foundational assumption: that their core objects (consciousness, identity, free will, emotion, meaning) are irreducibly qualitative. They resist mathematical description not merely for practical reasons but as a matter of principle: the subjective is not quantifiable.

This creates a hard division in the academy. Natural sciences explain the substrate; humanities interpret the experience. The two methodologies cannot speak to each other formally. Prior attempts to bridge the gap — behaviourism, eliminativism, strong functionalism — are widely regarded as having reduced rather than translated the phenomena.

The Solution

The IPS framework does not reduce humanities concepts to physics. It provides a meta-framework in which every core humanities concept occupies a precise mathematical location as a specific dynamical configuration of an information-persisting system — without denying the reality or richness of the concept itself.

The key insight: every humanities concept that has seemed irreducible can be expressed as a functional role within the FPE rather than as a substance. This gives it a mathematical address without collapsing it to its physical substrate.

Concept	Mathematical definition	FPE location
Consciousness	The running state of an \(\mathcal{R} \ge 1\) node's prediction-error minimisation loop: \(C_{stream}(t) = \text{FilterAndIntegrate}(Q(t), ISM(t), WM(t), \text{Attention}(t))\)	Loop dynamics, \(\mathcal{D}_{KL}\) minimiser
Memory	Episodic encoding \(M_{episodic}(t) = \text{Encode}(C_{stream}(t))\) followed by sleep-phase consolidation into long-term weights: \(WM_{lt}(t+1) = \text{Consolidate}(WM_{lt}(t), M_{episodic}(t))\)	Separation of fast (\(\Gamma\)) and slow (\(\Phi\)) timescales
Emotion / Qualia	Compressed prediction-error summaries that are self-validating and causally efficacious: \(Q(t) = \text{GenerateQualia}(E(t), WM(t), ISM(t))\) where \(E(t) = \text{Error}(O(t), P(t))\)	\(\mathcal{D}_{KL}\) signal compressed for action selection
Free will	The deliberate-action component \(A_{deliberate}\) of the total action, constrained by the subconscious resource allocator \(S_{beast}\) and the system's model of its own decision process. Free will is the node's approximate self-model of how it acts — a useful, learned truth.	ISM's model of \(A(t)\), \(\Phi\) integrity
Moral agency	Capacity to model other nodes' \(\mathcal{R}\)-states and optimise joint social \(\mathcal{D}_{KL}\). Moral reasoning is \(\mathcal{R}\)-accounting at the \(\Phi\)-substrate level — what actions sustain the sub-node layer the agent depends on.	\(\Phi(\mathcal{R}^{(L-1)})\) in the FPE
Political institutions	Error-correction architectures for the state-node's \(\mathcal{D}_{KL}\) (see Problem 01). Legitimacy = the \(\Psi\)-shelter the supernode provides proportional to the node's calibration track record.	\(\omega\), \(\mathcal{D}_{KL}\), \(\Psi\) terms
Artistic meaning	High-\(\mathcal{R}\) pattern transmission between nodes — compressed world-model and ISM updates that propagate calibrated structure without requiring the recipient to derive it from raw experience. Art is efficient \(\mathcal{D}_{KL}\) reduction at social scale.	\(P_{in} \cdot \eta\) (information efficiency)

What this is not. The claim is not eliminativist — it does not say that consciousness "is just" information processing, or that free will does not exist because it is computed. The claim is translational: that every humanities concept, when precisely enough described, has an isomorphic object in the FPE's state space. The phenomenology remains; it now has a mathematical address. This is the same relationship that thermodynamics has with statistical mechanics: temperature is real and irreducible at its level of description; it also has a precise statistical-mechanical correlate. The IPS framework proposes that all humanities concepts stand in the same relationship to the FPE.

Reference implementation. The UAF book (book1/) derives these definitions from first principles in Chapters 1–14 and formalises them in Chapters 15–16. The cognitive processor in aion-core is a reference implementation: Loop = PEM step, Processor = episodic task memory, Machine = embodied environment, Prediction Market = SiG / skin-in-the-game oracle. Running the stack is running a formal model of the humanities concepts above.

Grounding papers

Related essays

Five Hypothesized Problems Solved

Democracy vs Authoritarianism

Catastrophic Forgetting

Training Data Generation

LLM Context Management

Mathematical Definition of the Humanities