Lasse Hyyrynen Independent researcher
May 18, 2026
We develop a physical theory of information-persisting systems (IPS): bounded, far-from-equilibrium patterns whose identity is preserved across time by the active processing of information. Combining non-equilibrium statistical mechanics, stochastic thermodynamics, and the variational form of the free-energy principle, we show that the persistence of any IPS over a time scale \(\tau\) much larger than its relaxation time is governed by a single dimensionless quantity, the persistence ratio \(\mathcal{R}\), which compares the rate at which the system imports usable free energy and predictive information to the rate at which it must dissipate entropy in order to maintain its bounding Markov blanket. We derive \(\mathcal{R}\) from first principles by combining (i) the second law applied to an open subsystem with a Markov blanket, (ii) Landauer’s bound on the thermodynamic cost of information processing, (iii) the fluctuation theorems of stochastic thermodynamics, and (iv) the variational free-energy upper bound on Bayesian surprise. We prove that \(\mathcal{R}\ge 1\) is a necessary condition for an IPS to persist over time scales much longer than the Landauer time of its internal degrees of freedom, and that violation of \(\mathcal{R}\ge 1\) implies dissolution in time bounded by an explicit estimator depending on the system’s internal free-energy reservoir. We further establish a fractal composition law: the persistence ratio of an IPS at organisational level \(L\) depends explicitly on the persistence ratios of its constituent IPS at level \(L-1\) and of its enclosing IPS at level \(L+1\). The theory unifies disparate examples — nuclear binding stability, biological homeostasis, autocatalytic chemistry, organisational survival, ecosystem viability — as instances of a single dimensionless accounting identity, and yields falsifiable predictions for the onset of collapse in driven complex systems.
Keywords: non-equilibrium thermodynamics, dissipative structures, Markov blanket, free-energy principle, Landauer’s principle, stochastic thermodynamics, information physics, persistence, ageing.
A central puzzle of physics is that the universe is everywhere both diversifying (the second law) and organising (galaxies, stars, cells, minds, societies). The orthodox resolution, due to Schrödinger (1944), Prigogine (1967), and Haken (1983), is that organised structures are dissipative: they are sustained by exporting more entropy to the environment than they generate internally, importing free energy in the process. They are non-equilibrium steady states (NESS) of open systems.
Two strands of twentieth-century physics have since deepened this picture but have rarely been welded into a single quantitative law. The first is information physics: Szilard (1929), Landauer (1961), Bennett (1982), and Lloyd (2006) showed that information processing has unavoidable thermodynamic cost, formally \(k_B T \ln 2\) per logically irreversible bit erasure. The second is the free-energy principle of Friston (2010, 2019), which proposes that any self-organising system possessing a Markov blanket implicitly minimises a variational upper bound on the Kullback–Leibler divergence between its internal states and the posterior over hidden environmental causes. Both strands point at the same object — the informational profit margin of a system — but they have not been combined into an explicit thermodynamic accounting identity.
This paper performs that combination. We define an information-persisting system (Section 2) and derive (Section 3–4) the persistence equation, expressing the dimensionless ratio \(\mathcal{R}\) of usable information–energy income to entropic and informational tax. In Section 5 we prove three theorems: \(\mathcal{R}\ge 1\) is necessary for long-term persistence (Theorem 1); KL divergence between internal model and environment imposes an exponential thermodynamic penalty (Theorem 2); persistence at any level supervenes on persistence at neighbouring levels (Theorem 3). Section 6 instantiates the theory in nuclear physics, autocatalytic chemistry, cellular biology, ecology, and large organisations. Section 7 discusses falsifiability, relation to integrated information theory, and the boundary with mere homeostasis. Section 8 concludes.
The theory is intended as a physical theory in the strict sense: it makes quantitative, falsifiable predictions about dissipation, error correction, and lifetime, and it reduces correctly to known limiting cases (Boltzmann’s \(\dot S \ge 0\), the Landauer bound, Friston’s free-energy bound). It is not a metaphysics of life or consciousness; those higher questions appear here only insofar as they are special cases of the same accounting identity.
We work in natural thermodynamic units with \(k_B = 1\) unless stated otherwise; temperatures \(T\) are in energy units, entropy \(S\) in nats. Logarithms with explicit base (\(\log_2\)) are used when bits are the natural unit. Random variables are upper-case (\(X\)), realisations lower-case (\(x\)), probability densities \(p(\cdot)\). The Kullback–Leibler divergence between distributions \(p\) and \(q\) on the same alphabet is \[ \mathcal{D}_{\mathrm{KL}}(p\,\|\,q) \;=\; \sum_x p(x)\log\frac{p(x)}{q(x)} \;\ge\; 0. \] Time derivatives are denoted \(\dot X \equiv dX/dt\). The symbol \(\Sigma\) denotes the system; \(\mathcal{E}\) its external environment; \(\partial\Sigma\) the Markov blanket separating them.
Consider a closed Hamiltonian universe with phase space \(\Gamma\). A subsystem \(\Sigma \subset \Gamma\) is a region of \(\Gamma\) whose dynamics are conditionally separable from its complement \(\mathcal{E} = \Gamma \setminus \Sigma\) by a set of blanket states \(\partial\Sigma\). Formally, the joint density factorises as \[ p(\sigma, b, e) \;=\; p(\sigma \mid b)\, p(e \mid b)\, p(b), \qquad \sigma \in \Sigma_{\mathrm{int}},\; b \in \partial\Sigma,\; e \in \mathcal{E}, \tag{2.1} \] so that internal and external states are conditionally independent given the blanket: \(\Sigma_{\mathrm{int}} \perp\!\!\!\perp \mathcal{E} \mid \partial\Sigma\). This is the Markov-blanket condition (Pearl, 1988; Friston, 2013). Membrane lipids enclosing a cell, the boundary of a galaxy’s gravitational well, the firewall of a server, and the legal perimeter of a corporation are all empirical examples.
Let \(\Phi: \Sigma \to \mathcal{M}\) be a coarse-graining map onto a finite macrostate alphabet \(\mathcal{M}\), the set of recognisable “shapes” of the system. The identity of an IPS at time \(t\) is the trajectory \(m(t) = \Phi(\sigma(t)) \in \mathcal{M}\). The system persists over interval \([t_0, t_0+\tau]\) if \(m(t)\) remains within a designated equivalence class \([m_0]\subset \mathcal{M}\) for all \(t\) in the interval. Persistence is therefore a property of trajectories, not states.
Definition 2.1 (IPS). A subsystem \(\Sigma\) is an information-persisting system if all of the following hold:
Boundedness. \(\Sigma\) possesses a Markov blanket \(\partial\Sigma\) satisfying (2.1).
Drive. \(\Sigma\) is in a non-equilibrium steady state: there exist non-vanishing fluxes of energy \(\dot E_{\mathrm{in}}\) and entropy across \(\partial\Sigma\), and the internal entropy production rate \(\dot S_{\mathrm{int}}>0\).
Internal model. There exists an internal random variable \(\mu(t)\in\Sigma_{\mathrm{int}}\) — the internal model — and a generative density \(q_\mu(e)\) over external causes such that the dynamics of \(\Sigma_{\mathrm{int}}\) can be written as a gradient descent on the variational free energy \[ \mathcal{F}[\mu] \;=\; \mathbb{E}_{q_\mu}[-\log p(b\mid e)] - H[q_\mu] \;\ge\; -\log p(b), \tag{2.2} \] to within stochastic noise (Friston, 2010).
Identity preservation. There exists \(\tau \gg \tau_{\mathrm{relax}}\) such that \(m(t)\in [m_0]\) over \([t_0, t_0+\tau]\).
The class of IPS thus formalises Schrödinger’s “what is life?” criterion (drive + boundary + memory) without restricting it to biology. We will see that nuclei, autocatalytic sets, eddies, cells, ant colonies, and firms all satisfy the four clauses, but only over their respective characteristic persistence times.
Definition 2.1 is recursive. The internal degrees of freedom \(\Sigma_{\mathrm{int}}\) generically decompose into further subsystems satisfying (i)–(iv) at a finer scale. We write \(\Sigma^{(L)}\) for an IPS at level \(L\) and require, for every \(L\), \[ \Sigma^{(L)} \;\cong\; \bigl(\{\Sigma_i^{(L-1)}\}_i,\,\partial\Sigma^{(L)},\,\mu^{(L)}\bigr), \tag{2.3} \] so that \(\Sigma^{(L)}\) is a graph of \(\Sigma_i^{(L-1)}\) nested inside its blanket. Cells nest into organisms, organisms into colonies, colonies into ecosystems; quarks into nucleons, nucleons into nuclei, nuclei into atoms. The persistence theory must be consistent under such composition; we will see in Section 5 that consistency forces an explicit coupling between \(\mathcal{R}^{(L)}\) and \(\mathcal{R}^{(L\pm 1)}\).
Scale limits and the global fractal graph. The level index \(L\) orders persisting patterns from fine to coarse. At the upper bound, the closed universe \(\Gamma\) (Section 2.1) is the ultimate phase space: every IPS \(\Sigma \subset \Gamma\) has environment \(\mathcal{E}=\Gamma\setminus\Sigma\). When \(\Sigma\) approaches the whole universe there is no further enclosing IPS — shelter is undefined and \(\Psi\to 1\) by convention at \(L=L_{\max}\). In practice the highest persisting IPS we model is cosmic-scale structure (galaxies, the expanding background) nested under \(\Gamma\), not \(\Gamma\) itself as a driven subsystem. At the lower bound, decomposition (2.3) terminates at constituents below which we do not resolve separate blankets on the timescales of interest; in the physical examples of this paper, hadrons and quarks furnish the practical floor \(L_0\). Fundamental physics may extend \(L_0\) further (leptons, gauge fields); the accounting is unchanged. Between \(L_0\) and \(L_{\max}\), reality is not a single linear chain but a fractal graph: each node \(\Sigma^{(L)}\) has substrate edges to \(\{\Sigma_i^{(L-1)}\}\) and shelter edges to enclosing \(\Sigma_j^{(L+1)}\) (possibly overlapping, Section 2.5), with atoms, cells, organisms, ecosystems, firms, and polities as intermediate nodes — branching, skipping levels, and cross-linking where blankets overlap.
Readers often import two intuitions that the formalism already accommodates but does not name explicitly. They must be kept distinct.
Composition (made of). No IPS at level \(L\) is atomic. By (2.3), \(\Sigma^{(L)}\) is formed from a collection of sub-IPS \(\{\Sigma_i^{(L-1)}\}_i\) with heterogeneous structural roles: a human from organs, immune networks, and microbiome; a polity from firms, households, infrastructure, and territory; an ecosystem from species and nutrient cycles. The postfactor \(\Phi\) in (4.4) is the aggregate substrate integrity of this graph — not a single part, but a weighted summary of which constituents themselves satisfy \(\mathcal{R}_i^{(L-1)}\ge 1\). Theorem 5.3 states the consequence: failure of any critical \(\Sigma_i^{(L-1)}\) drives \(\Phi\to 0\) and collapses \(\Sigma^{(L)}\) regardless of how well the enclosing environment is buffered.
Shelter (belongs to). Equation (4.4) displays one shelter coefficient \(\Psi\) from a single enclosing IPS at level \(L+1\). In practice, many IPS are simultaneously attenuated by several enclosures that need not form a strict physical nesting tower: a person by body, household, and employer; a state by its climate zone, regional alliance, and supranational union (e.g. Finland simultaneously sheltered by Nordic cooperation, EU institutions, and global trade architecture). Where an enclosure actually buffers environmental noise for the node — attenuating shocks that would otherwise raise \(\mathcal{E}_\Sigma\) — it contributes a shelter channel \(\Psi_j\in[0,1]\). Overlapping shelters combine conservatively as \[ \Psi_{\mathrm{eff}} \;=\; \prod_j \Psi_j \qquad\text{or}\qquad \Psi_{\mathrm{eff}} \;\le\; \min_j \Psi_j, \tag{2.4} \] with the product the leading-order estimate when channels attenuate independent noise modes and the minimum a worst-case bound. Nested physical blankets (galaxy \(\to\) planet \(\to\) cell) are the special case of a chain with one dominant \(\Psi\) per scale. Mere categorical membership without thermodynamic buffering does not enter the accounting — a label is not a Markov blanket.
Do not conflate the two relations: “Finland is made of people, firms, and territory” is substrate (\(\Phi\), (2.3)); “Finland is in the EU” is shelter (\(\Psi_j\)) only insofar as the union attenuates trade, security, or regulatory shocks that would otherwise flood the national IPS. A node may have a rich heterogeneous substrate and several overlapping shelters without being enclosed by only one parent.
\(\Phi\) and \(\Psi\) are graph-indexed operators, not fixed averages. On the substrate graph, critical constituents in series enter \(\Phi\) through a minimum (Theorem 5.3); redundant (parallel) constituents through softer aggregates. On the shelter graph, independent noise channels combine as a product (2.4); overlapping buffers on the same noise obey a minimum bound. Choosing parallel vs series substrate — stockpiles vs lean chains — is a redundancy trade-off: higher tail resilience vs higher \(\omega\) and lower calm-environment efficiency. Mixed real graphs (nations, ecosystems, designed fractal MoE) require explicit cut-set analysis; criticality is identified operationally, not deduced from (4.4) alone. Canonical essay: docs/substrate_shelter_and_redundancy.md.
For \(\Sigma\) in contact with an environmental reservoir at temperature \(T\), the first and second laws read (de Groot & Mazur, 1962) \[ \dot E_\Sigma \;=\; \dot E_{\mathrm{in}} - \dot E_{\mathrm{out}} - \dot W, \tag{3.1} \] \[ \dot S_\Sigma \;=\; \dot S_{\mathrm{int}} + \dot S_{\mathrm{flow}}, \qquad \dot S_{\mathrm{int}} \ge 0, \tag{3.2} \] where \(\dot W\) is the work extracted from the system, \(\dot E_{\mathrm{in/out}}\) are heat/matter fluxes, and \(\dot S_{\mathrm{flow}}\) is the entropy flux across the blanket. In a NESS, \(\dot E_\Sigma = \dot S_\Sigma = 0\), so internal entropy production must be exactly compensated by an outward entropy flux: \[ \dot S_{\mathrm{int}} \;=\; -\dot S_{\mathrm{flow}}\;\ge\; 0. \tag{3.3} \] Equation (3.3) is the fundamental quantitative statement of Schrödinger’s “negentropy import”: the rate at which an IPS exports disorder.
Within \(\Sigma_{\mathrm{int}}\), the internal model \(\mu\) encodes hypotheses about external causes. To remain a useful model over time, \(\mu\) must be updated as the environment changes; this entails erasing or overwriting prior internal states. Landauer’s principle (Landauer, 1961; Bennett, 1982; Lutz & Ciliberto, 2015 for experimental tests) implies that each logically irreversible bit erasure dissipates at least \[ \Delta Q_{\mathrm{Landauer}} \;\ge\; k_B T \ln 2 \tag{3.4} \] of heat to the environment. If \(\dot N_{\mathrm{erase}}\) is the rate of bit erasures required to track the environment, the informational dissipation rate is bounded below by \[ \dot Q_{\mathrm{info}} \;\ge\; \dot N_{\mathrm{erase}}\, k_B T \ln 2. \tag{3.5} \] Modern stochastic-thermodynamic refinements (Sagawa & Ueda, 2009; Parrondo, Horowitz & Sagawa, 2015) show (3.5) is tight up to information-thermodynamic feedback bonuses, none of which permit \(\dot Q_{\mathrm{info}}<0\) in steady state.
Every internal degree of freedom of \(\Sigma\) is coupled to fundamental fluctuations: thermal (\(k_B T\)), quantum (\(\hbar\omega/2\)), and field-theoretic (vacuum/zero-point). Their summed power spectral density at the scale of the system’s information-carrying modes is the internal noise floor \(\mathcal{E}_{\Sigma}\). For a system of complexity \(\omega\) (number of constraints/bits required to specify its blanket and internal state), the minimum error-correction power required to keep the noise from dissolving the structure is \[ P_{\mathrm{maint}} \;\ge\; \omega\, \mathcal{E}_{\Sigma}\, \ln 2 / \tau_{\mathrm{Landauer}}, \tag{3.6} \] where \(\tau_{\mathrm{Landauer}}\) is the system’s bit-level update cycle. Equation (3.6) follows by partitioning \(\omega\) bits among the noise channels and applying (3.5) to each.
For an internal model \(\mu\) with generative density \(q_\mu(e)\), the (negative) variational free energy upper-bounds Bayesian surprise (Friston, 2010, eq. 2): \[ -\log p(b) \;\le\; \mathcal{F}[\mu] \;=\; \mathcal{D}_{\mathrm{KL}}\bigl(q_\mu(e\mid b)\,\|\,p(e\mid b)\bigr) \;-\; \log p(b). \tag{3.7} \] Equality holds iff \(q_\mu = p(\cdot\mid b)\). The non-negative term \[ \mathcal{D}_{\mathrm{KL}}^{(\Sigma)} \;\equiv\; \mathcal{D}_{\mathrm{KL}}\bigl(q_\mu \,\|\, p(\cdot\mid b)\bigr) \;\ge\; 0 \tag{3.8} \] is the delusion divergence: a quantitative measure of the discrepancy between the IPS’s internal model and the true posterior over external causes. By the Crooks identity (Crooks, 1999; Jarzynski, 1997) and stochastic-thermodynamic generalisations (Seifert, 2012), each nat of delusion incurs at least one nat’s worth of excess dissipation per update cycle: \[ \dot Q_{\mathrm{delusion}} \;\ge\; T\,\mathcal{D}_{\mathrm{KL}}^{(\Sigma)} / \tau_{\mathrm{model}}, \tag{3.9} \] where \(\tau_{\mathrm{model}}\) is the model-update timescale. A deluded model pays Landauer thrice: once to act, once to be wrong, once to correct.
We now combine (3.1)–(3.9) into a single dimensionless identity.
The system’s usable power income is the rate at which environmental free energy is harvested and converted to information-bearing work after coupling efficiency \(\eta_I \in (0,1]\): \[ P_{\mathrm{in}}^{\mathrm{usable}} \;=\; \eta_I(\mathcal{D}_{\mathrm{KL}}^{(\Sigma)})\,P_{\mathrm{in}}, \tag{4.1} \] where the efficiency depends explicitly on the model’s predictive quality. To leading order one has the Carnot-like bound \[ \eta_I \;\le\; 1 - \frac{\mathcal{D}_{\mathrm{KL}}^{(\Sigma)}}{\log|\mathcal{E}|}, \tag{4.2} \] with \(\log|\mathcal{E}|\) the maximum entropy of the environment (proof: a system whose model is uniform over environmental causes — maximum delusion — extracts no free energy from environmental gradients, since it cannot distinguish exploitable structure from noise).
The total dissipation required to keep \(\Sigma\) in its present macrostate is the sum of three irreducible terms: \[ P_{\mathrm{out}}^{\mathrm{req}} \;=\; \underbrace{\omega\,\mathcal{E}_{\Sigma}}_{\text{noise floor}} \;+\; \underbrace{\omega\,\mathcal{E}_{\Sigma}\,\mathcal{D}_{\mathrm{KL}}^{(\Sigma)}}_{\text{delusion tax}} \;+\; \underbrace{\omega\,\mathcal{E}_{\Sigma}\,\Gamma^{(\Sigma)}}_{\text{senescence}}. \tag{4.3} \] The first term is the Landauer-Bennett floor (3.6). The second follows from (3.9) and the identification \(\tau_{\mathrm{model}} \approx \tau_{\mathrm{Landauer}}\) for a system that maintains a model at its own update rate. The third, \(\Gamma^{(\Sigma)}\ge 0\), captures structural fatigue: irreversible damage to \(\partial\Sigma\) from accumulated micro-events (radical chemistry in cells, lattice defects in nuclei, employee turnover in firms). \(\Gamma\) is identically zero in idealised statistical-mechanical models with perfect error correction; it is empirically large in any real macroscopic IPS.
Dividing income by debit yields the persistence ratio: \[ \boxed{\;\mathcal{R}^{(\Sigma)} \;=\; \Psi\bigl(\mathcal{R}^{(L+1)}\bigr)\;\cdot\; \frac{P_{\mathrm{in}}\,\eta_I(\mathcal{D}_{\mathrm{KL}}^{(\Sigma)})}{\omega\,\mathcal{E}_{\Sigma}\bigl(1 + \mathcal{D}_{\mathrm{KL}}^{(\Sigma)} + \Gamma^{(\Sigma)}\bigr)}\;\cdot\; \Phi\bigl(\mathcal{R}^{(L-1)}\bigr).\;} \tag{4.4} \]
The dimensionless prefactor \(\Psi\in[0,1]\) is the shelter coefficient: the fraction of the noise spectrum from level \(L+1\) transmitted to \(\Sigma^{(L)}\) after enclosing buffers (smaller \(\Psi\) is better shelter; e.g. a galaxy attenuating cosmic rays for a planet). When several enclosures buffer distinct noise channels for the same \(\Sigma^{(L)}\), replace \(\Psi\) by \(\Psi_{\mathrm{eff}}\) from (2.4). The postfactor \(\Phi\in[0,1]\) is the substrate integrity: a composition operator on constituent sub-IPS \(\Sigma_i^{(L-1)}\) — minimum over critical \(\mathcal{R}_i^{(L-1)}\) near collapse, softer aggregates for redundant pools (Section 2.5). Both factors are compositional constraints from (2.1) and (2.3); bounded by 1, tending to 1 for well-organised environments and intact substrates. See docs/substrate_shelter_and_redundancy.md for redundancy and risk.
Equation (4.4) is what we call the Fractal Persistence Equation (FPE). Persistence requires \[ \mathcal{R}^{(\Sigma)} \;\ge\; 1. \tag{4.5} \]
The persistence ratio thus reduces correctly to all four limiting cases known in the literature.
Theorem 5.1 (Persistence Necessity). Let \(\Sigma\) be an IPS satisfying Definition 2.1 with internal free-energy reservoir \(\mathcal{B}<\infty\) and persistence ratio \(\mathcal{R}^{(\Sigma)}(t)\). If there exists \(\delta>0\) and an interval \([t_0,t_0+\tau_d]\) over which \(\mathcal{R}^{(\Sigma)}(t)\le 1-\delta\), then for \[ \tau_d \;\ge\; \mathcal{B}\,\big/\,\bigl[\delta\cdot\omega\,\mathcal{E}_{\Sigma}\bigl(1+\mathcal{D}_{\mathrm{KL}}^{(\Sigma)}+\Gamma^{(\Sigma)}\bigr)\bigr] \tag{5.1} \] the system’s identity \(m(t)\) has positive probability of leaving its equivalence class \([m_0]\), i.e. it dissolves.
Proof. From (4.4)–(4.5) and the energy balance (3.1), the rate of depletion of the internal reservoir is \[ \dot{\mathcal B} \;=\; P_{\mathrm{in}}^{\mathrm{usable}} - P_{\mathrm{out}}^{\mathrm{req}} \;\le\; -\delta\,P_{\mathrm{out}}^{\mathrm{req}}, \] so \(\mathcal B(t_0+\tau_d) \le \mathcal B(t_0) - \delta\,P_{\mathrm{out}}^{\mathrm{req}}\,\tau_d\). Once \(\mathcal B\) is exhausted, the system cannot meet (3.6); by (3.3) it must export less entropy than it generates, so \(\dot S_\Sigma>0\) and the macrostate density spreads. The first-passage time of \(m(t)\) out of \([m_0]\) is bounded above by Kramers’ theorem (Kramers, 1940; Hänggi, Talkner & Borkovec, 1990) and the bound saturates when \(\mathcal B = 0\). Equation (5.1) follows. \(\square\)
Corollary: no IPS with finite \(\mathcal B\) can persist indefinitely below \(\mathcal{R}=1\).
Theorem 5.2 (KL Penalty). For an IPS whose model \(\mu\) deviates from the true posterior by \(\mathcal{D}_{\mathrm{KL}}^{(\Sigma)}\ge\Delta\), the lifetime upper bound (5.1) shortens at least geometrically: \[ \tau_d \;\le\; \tau_d\bigl|_{\mathcal{D}_{\mathrm{KL}}=0}\;\cdot\;e^{-\Delta}. \tag{5.2} \]
Proof. From (4.4), the denominator scales as \(1+\mathcal{D}_{\mathrm{KL}}^{(\Sigma)}\), while the numerator scales as \(\eta_I\le e^{-\mathcal{D}_{\mathrm{KL}}/\log|\mathcal{E}|}\) by (4.2) and a standard data-processing inequality (Cover & Thomas, 2006). Combining gives \(\mathcal R \le \mathcal R\!\mid_{\mathrm{KL}=0}\!\cdot e^{-\Delta}/(1+\Delta)\le \mathcal R\!\mid_{\mathrm{KL}=0}\!\cdot e^{-\Delta}\). Substituting into (5.1) yields (5.2). \(\square\)
The interpretation is sharp: each nat of delusion divides the lifetime budget of the IPS by a factor of \(e\). This is the rigorous content of the everyday observation that systems pursuing increasingly inaccurate models — cancer cells with mis-folded protein networks, firms with bad market models, ideologies with bad world models — collapse on exponentially short timescales relative to honest peers.
Theorem 5.3 (Fractal Composition). For a level-\(L\) IPS composed of level-\((L-1)\) sub-IPS as in (2.3), the persistence ratio satisfies \[ \mathcal{R}^{(L)} \;\le\; \min_i\,\mathcal{R}_i^{(L-1)}\;/\;\bigl(1-\Psi(\mathcal{R}^{(L+1)})\bigr), \tag{5.3} \] so the level-\(L\) system cannot persist if any structurally critical sub-IPS fails, nor if its enclosing IPS dissolves entirely (\(\mathcal R^{(L+1)}\to 0\)).
Proof. The blanket factorisation (2.1) iterated over (2.3) implies \(\partial\Sigma^{(L)}\) is composed of blanket interactions at level \(L-1\). If a critical \(\Sigma_i^{(L-1)}\) fails, the conditional independence \(\Sigma_{\mathrm{int}}^{(L)}\perp \mathcal{E}^{(L)} \mid \partial\Sigma^{(L)}\) breaks, so \(\omega^{(L)}\) effectively diverges and (4.4) implies \(\mathcal R^{(L)}\to 0\). Conversely, if the level-\((L+1)\) IPS dissolves, the shelter \(\Psi\to 0\) and external noise floods \(\Sigma^{(L)}\), again driving \(\mathcal R^{(L)}\to 0\). (5.3) is the explicit envelope. \(\square\)
Theorem 5.3 forbids self-causation: persistence is always a property of a graph of IPS, never a single isolated pattern.
Proposition 5.4. No persisting pattern in the universe can sustain \(\mathcal{R}<1\) over a time scale much larger than its Landauer cycle without violating either the second law or the boundary condition (2.1).
Proof. Assume such a pattern. Then by (3.3) it exports less entropy than it generates, so \(\dot S_\Sigma>0\) in violation of NESS. Internal entropy grows, the marginal density over microstates broadens, and the coarse-graining map \(\Phi\) no longer projects onto a definite \(m\in[m_0]\): the blanket \(\partial\Sigma\) becomes statistically indistinguishable from background, violating the conditional-independence requirement (2.1). The “system” is, by definition, no longer a system. Contradiction. \(\square\)
Equation (4.4) is intended to be a physical law: it should reduce to the actual collapse criteria observed across scales. We sketch seven examples.
For a nucleus, \(P_{\mathrm{in}}\) is the strong binding energy continually re-establishing the nuclear potential well; \(\omega \mathcal E_\Sigma\) is the destabilising contribution from electroweak fluctuations and Coulomb repulsion; \(\Gamma\) is zero for ideal nuclei and small for stable ones. The Bethe–Weizsäcker semi-empirical mass formula (Bethe & Bacher, 1936; Weizsäcker, 1935) is precisely a statement of (4.4) with \(\mathcal{D}_{\mathrm{KL}}=0\): nuclei with positive binding energy (\(\mathcal{R}>1\)) persist; those with negative binding energy decay with mean lifetime governed by the binding deficit, exactly as (5.1) predicts in the limit \(\mathcal D_{\mathrm{KL}}=\Gamma=0\).
Self-replicating chemical sets (Eigen & Schuster, 1977; Kauffman, 1986) maintain a low-entropy composition through coupled exergonic reactions. The Eigen “error threshold” is a direct instance of Theorem 5.2: if the per-base copying error exceeds a critical value, the population’s distance from the master sequence (\(\mathcal D_{\mathrm{KL}}\)) grows past the point where (4.4) drops below 1, and the quasi-species delocalises into sequence-space noise.
The minimum metabolic rate of a living cell (Brown et al., 2004; West, Brown & Enquist, 1997) coincides quantitatively with the right-hand side of (3.6) for \(\omega \approx 10^{10}\text{–}10^{11}\) bits of cellular state and physiological \(T\). Death corresponds to \(\mathcal{R}<1\) persisting for a time bounded by (5.1) — typically minutes for high-metabolism tissues, days for hibernating cells.
Ageing is the empirical observation that \(\Gamma^{(\Sigma)}(t)\) is monotonically non-decreasing in any biological IPS (López-Otín et al., 2013). Theorem 5.1 then predicts that, at constant \(P_{\mathrm{in}}\) and \(\omega\), lifetime scales as \(\mathcal B/\Gamma(t)\); this matches the Gompertz law of human mortality (Gompertz, 1825), where mortality hazard grows exponentially with age — exactly the geometric tail of (5.1) when \(\Gamma\to\infty\).
For a firm or state, \(P_{\mathrm{in}}\) is revenue or tax base, \(\omega \mathcal E_\Sigma\) is bureaucratic friction times environmental volatility, \(\mathcal D_{\mathrm{KL}}\) is the divergence between management’s world-model and the actual market or polity, and \(\Gamma\) is institutional ossification. Theorem 5.2 then yields the well-documented empirical exponential mortality of firms (Daepp et al., 2015), with delusion-driven shocks (financial crises, ideological commitments) as the natural drivers of the rare-event tail.
Empire overextension. A polity whose institutional identity requires footprint \(f_{\mathrm{model}}\) above the sustainable maximum \(f^{\star}\) of Anti-Explosion Theorem Theorem 4.1 — great-power status, sphere of influence, or unipolar order-management on a smaller resource base — is an empire IPS with empire gap \(\Delta_f = f_{\mathrm{model}} - f^{\star} > 0\). Such nodes can maintain \(\mathcal{R} \ge 1\) only by \(\Phi\)-consumption, \(\Psi\)-rent, or \(\mathcal{B}\)-draw; Theorem 5.1 implies they cannot persist in the inflated identity class indefinitely. The generic outcome is footprint downscale, not disappearance (nuclear and geographic floors often prevent dissolution). Hypothesis H-E, test suite, and contemporary proximity scores for Russia, China, and the United States are developed in Empire collapse hypothesis.
For an ecosystem, \(\Phi\) captures species diversity (substrate integrity); \(\Psi\) captures macroclimatic shelter. The fold catastrophes observed in coral-reef and savanna systems (Scheffer et al., 2001) are the predicted phase transitions of (4.4) as \(\Phi\) degrades past the point where the minimisation drops \(\mathcal{R}\) below 1.
Section 2.5 is easiest to read at mesoscopic scales where composition and membership are both visible. Consider a person \(\Sigma^{(L)}\): \(\Phi\) aggregates critical sub-IPS — cardiovascular, immune, and neural systems — each with its own \(\mathcal{R}_i^{(L-1)}\); organ failure is the substrate analogue of Theorem 5.3. In parallel, \(\Psi_{\mathrm{eff}}\) combines shelters that buffer different noise spectra: the body (thermal and metabolic), the household (resource pooling), the employer (income stability), and peer networks (information and reputation). Loss of one shelter channel — unemployment, exile, bereavement — need not dissolve the person if other \(\Psi_j\) remain near unity; loss of a critical organ does.
Consider a polity such as Finland: \(\Phi\) is the integrity of its substrate graph — firms, labour, infrastructure, land and water systems — each a sub-IPS whose local \(\mathcal{R}_i\) feeds the national \(\mathcal{R}\). \(\Psi_{\mathrm{eff}}\) combines partially overlapping enclosures: EU membership attenuating trade and regulatory shocks, Nordic cooperation on energy and security, and macroclimatic/geographic buffering. Withdrawal from a shelter (e.g. trade isolation) lowers one \(\Psi_j\) without necessarily collapsing \(\Phi\); substrate failure (banking-system or grid collapse) attacks \(\Phi\) directly. The two failure modes are empirically distinct and match the factorisation of (4.4).
These seven examples are independent — they use disjoint phenomenology and disjoint methodology — yet all reduce to (4.4) under explicit substitutions. We claim this is evidence for a common underlying law, not a coincidence.
The previous seven anchors are natural IPS — systems we discover. The theory should also admit designed IPS — systems we build to satisfy (i)–(iv) and whose persistence ratio we can read off the implementation. We sketch one such construction: a fractal mixture-of-experts (MoE) neural network trained by gradient descent. The construction is not a metaphor; it is a literal instantiation in which each term of (4.4) corresponds to a measurable quantity in the running program.
Architecture. Fix a recursion depth \(N\) and per-level fan-out \(E\). A level-\(0\) node \(\nu^{(0)}\) is a leaf expert: a small parameterised map \(f_{\theta_\nu}:\mathbb{R}^d\to\mathbb{R}^d\) (e.g. a two-layer MLP). A level-\(\ell\) node \(\nu^{(\ell)}\) is a tuple \[ \nu^{(\ell)} \;=\; \bigl(\{\nu_i^{(\ell-1)}\}_{i=1}^{E},\;r_\nu,\;A_\nu\bigr),\qquad 1\le\ell\le N, \tag{6.1} \] where \(r_\nu:\mathbb{R}^d\to\Delta^{E-1}\) is a router producing per-input scores over the \(E\) children and \(A_\nu\subset\{1,\dots,E\}\) is the sparse activation set \(A_\nu = \mathrm{top\text{-}}k(r_\nu(x))\) with \(k\ll E\). The forward map is \[ f_{\nu^{(\ell)}}(x) \;=\; \sum_{i\in A_\nu} r_{\nu,i}(x)\,f_{\nu_i^{(\ell-1)}}(x), \tag{6.2} \] so each input flows through exactly \(k\) of \(E\) children at each level. The full network is a level-\(N\) root \(\nu^{(N)}\); it is composed of \(E\) level-\((N{-}1)\) nodes, each composed of \(E\) level-\((N{-}2)\) nodes, and so on. This is the recursion (2.3) applied to a feed-forward computation graph, with the path \(\pi(x) = (\nu^{(N)}, \nu_{i_{N{-}1}}^{(N-1)}, \dots, \nu_{i_0}^{(0)})\) of activated nodes at each level forming the substrate chain through which \(x\) is routed.
Mapping to (4.4). Each level-\(\ell\) node is a candidate IPS at organisational level \(\ell\). The blanket \(\partial\Sigma^{(\ell)}_\nu\) is the input/output interface of the node together with its router; conditional independence (2.1) holds in the strong sense that, given the router output, the children’s internal parameters are independent of the input distribution (they only see what their router admits). The remaining terms map as:
| FPE term | MoE realisation |
|---|---|
| \(P_{\mathrm{in}}\) | task-loss gradient signal arriving through the parent router |
| \(\omega\,\mathcal{E}_\Sigma\) | parameter count of \(\nu\) times the per-step optimiser noise floor |
| \(\mathcal{D}_{\mathrm{KL}}^{(\Sigma)}\) | per-token cross-entropy at the output, propagated to \(\nu\) via its router weight |
| \(\Gamma^{(\Sigma)}\) | accumulated representational drift since \(\nu\) was last on an active path |
| \(\Psi\) | router gain at the parent that admits \(\nu\) to the active path |
| \(\Phi\) | mean router-weighted persistence of \(\nu\)’s own children \(\{\nu_i^{(\ell-1)}\}_{i\in A_\nu}\) |
Substrate isolation. The decisive structural property is sparse routing. For input \(x\) with active path \(\pi(x)\), only nodes on \(\pi(x)\) receive non-zero forward signal, and — by the chain rule — only nodes on \(\pi(x)\) receive non-zero gradient from the corresponding task loss. Nodes off the path see \(\partial\mathcal{L}/\partial\theta_\nu = 0\) for that example. This is precisely the operational content of the blanket factorisation (2.1) at the level of optimisation: a non-activated node’s identity \(m_\nu\) is, by construction, unaffected by gradient steps taken for inputs that do not route through it. The persistence requirement \(\mathcal{R}\ge 1\) of §4.3 — applied as a penalty on the active path — is therefore the only channel through which any non-trivial pressure on \(\theta_\nu\) can arise, and it does so only on the steps where \(\nu\in\pi(x)\).
Forgetting as a corollary of Theorem 5.3. Let the data distribution shift from \(\mathcal{X}_A\) to \(\mathcal{X}_B\) at time \(t_*\), and let \(\Pi_A\subset\mathcal{V}\) denote the set of nodes activated with non-negligible router weight on \(\mathcal{X}_A\). The fractal-MoE construction guarantees that, in the limit of perfect router decorrelation between \(\mathcal{X}_A\) and \(\mathcal{X}_B\), nodes in \(\Pi_A \setminus \Pi_B\) receive zero gradient after \(t_*\) and therefore preserve their pre-shift identity exactly; their persistence ratios \(\mathcal{R}_\nu\) are unchanged. By Theorem 5.3 the level-\(L\) persistence ratio on \(\mathcal{X}_A\) decays no faster than the router-weighted minimum of \(\mathcal{R}_{\nu}\) over \(\Pi_A\), so loss on the prior distribution is bounded by router overlap rather than by total training steps on \(\mathcal{X}_B\). A dense network (\(E=1\)) violates this immediately: every example updates every parameter, and substrate isolation degenerates to the trivial chain \(\partial\Sigma^{(\ell)}\equiv\partial\Sigma^{(\ell+1)}\).
The construction is in this sense the minimal neural-network architecture that respects the conditional-independence requirement (2.1) at every level of its compositional hierarchy; ordinary feed-forward networks do not. Empirical anchors for this prediction belong in §7.2 (P5).
A reader-facing map of convergences and divergences with FEP, thermodynamics, topology, autopoiesis, IIT, and adjacent frameworks is in IPS and related theories.
The persistence equation (4.4) is not a new physics. It is a unification:
What is new is the assembly: a single dimensionless quantity that compares predictive income to entropic + delusion + senescence debit, and that recursively composes across scales.
Control-theoretic reading. The same FPE terms map cleanly onto classical regulation vocabulary without changing the physics: \(P_{\mathrm{in}}\) is actuator power budget; \(\mathcal{D}_{\mathrm{KL}}\) is setpoint tracking error between internal model and environment; \(\omega\mathcal{E}_\Sigma\) is noise floor; \(\Gamma\) is wear; \(\Psi\) and \(\Phi\) are shelter and substrate margins; and \(\mathcal{R}\ge 1\) is the persistence setpoint averaged over a regulation window (Theorem 5.1 integrates over \(\tau_d\), so momentary \(\mathcal{R}<1\) is out-of-band excursion, not instant death). Software instantiations of this layer — band telemetry, separated observers, cascade promotion with anti-windup — are documented for aion-core in Control Theory and the Cognitive Processor.
The theory makes at least five quantitative predictions that distinguish it from the bare FEP and from bare NESS theory:
(P1) Universal lifetime law (5.2). Any IPS subjected to a measurable increase in \(\mathcal{D}_{\mathrm{KL}}^{(\Sigma)}\) should suffer an exponential decrease in expected lifetime, with proportionality \(e^{-\Delta\mathcal{D}_{\mathrm{KL}}/\nu}\) for \(\nu\sim 1\) nat. This is testable by perturbing organisms (lesion studies), populations (climate stress), or firms (regulatory shocks) and measuring time-to-collapse.
(P2) Fractal dependence (5.3). Lifetime of a level-\(L\) IPS should be insensitive to small perturbations on average sub-IPS but discontinuously sensitive to perturbations on critical sub-IPS — the system displays “percolation-like” collapse rather than smooth degradation. This is consistent with empirical bow-tie network failures (Csete & Doyle, 2004) and predicts the existence of critical sub-IPS before they are identified.
(P3) Income–noise trade-off. Two IPS with the same \(P_{\mathrm{in}}\) but different \(\mathcal E_\Sigma\) (sheltered vs exposed) should display lifetime ratios equal to \(\mathcal E_\Sigma^{\mathrm{exposed}}/\mathcal E_\Sigma^{\mathrm{sheltered}}\), with the exponent set by \(\omega\). This is the form of the Bergmann/Allen rules in biology and of the resource-curse correlation in economics, both already empirically supported.
(P4) Sub-Landauer impossibility. No IPS can persist at total power below \(\omega\mathcal E_\Sigma\) no matter what its model quality. This forbids hypothetical “perfectly efficient” agents with \(\dot Q\to 0\), in agreement with experiments on Landauer’s bound (Jun, Gavrilov & Bechhoefer, 2014; Lutz & Ciliberto, 2015).
(P6–P9) Anti-explosion and growth-factor laws. For cognitive IPS at polity scale, the persistence ratio is the exponential growth factor (\(\lambda = \ln\mathcal{R}/\tau_{\mathrm{eff}}\)), \(\mathcal{R}(f)\) has an interior maximum in world footprint \(f\), and world-system persistence supervenes on component ratios. Operational tests (fleet scaling, dissolution hazard past peak \(f^{\star}\)) are specified in Anti-Explosion Theorem §6.
(P5) Substrate-isolation forgetting law. For the fractal-MoE construction of §6.8 with per-level fan-out \(E\), top-\(k\) routing, and recursion depth \(N\), the expected post-shift loss on a prior distribution \(\mathcal{X}_A\) after \(\tau_B\) training steps on a disjoint distribution \(\mathcal{X}_B\) should scale as \[ \Delta\mathcal{L}_A \;\propto\; \overline{o}_{AB}^{\,N}, \qquad \overline{o}_{AB} \;\equiv\; \mathbb{E}_{\nu}\!\left[\frac{|\Pi_A(\nu)\cap\Pi_B(\nu)|}{|\Pi_A(\nu)|}\right], \] where \(\overline{o}_{AB}\) is the mean router overlap between the two distributions. The dense baseline corresponds to \(\overline{o}_{AB}\equiv 1\) and is recovered as \(E\to 1\) or \(k\to E\). The prediction is operational: train a fractal-MoE-GPT and a FLOP-matched dense GPT on the same A-then-B continual-learning split, measure forgetting on \(\mathcal{X}_A\), and check that the MoE’s forgetting tracks the geometric series in router overlap rather than the dense baseline’s linear decay. This is the cleanest in-silico test of Theorem 5.3 the theory affords.
Several adjacent theories are not obtained as limits of (4.4):
Quantitative measurement of \(\mathcal D_{\mathrm{KL}}^{(\Sigma)}\). Estimators exist for biological and computational IPS (Friston, 2019; Schreiber, 2000) but no universal protocol. We conjecture that the time-reversed transfer entropy (Schreiber, 2000; Lizier, 2014) is a robust proxy.
Optimal control of \(\mathcal{R}\). What is the policy that maximises expected lifetime under a fixed \(P_{\mathrm{in}}\)? Pontryagin’s principle applied to (4.4) suggests bang–bang strategies of model refinement vs energy harvest, matching observed alternation between exploration and exploitation in foraging and reinforcement-learning agents (Sutton & Barto, 2018). The scaling-limit answer for cognitive IPS — \(\mathcal{R}\) as the growth factor, interior maximum in footprint, world-system coupling — is developed in the companion Anti-Explosion Theorem.
Critical exponents at \(\mathcal R = 1\). The transition \(\mathcal R>1\to \mathcal R<1\) should display universal critical phenomena (fluctuation divergence, slowing-down) analogous to those of dissipative phase transitions (Risken, 1996). Numerical evidence for such precursors has accumulated for ecological tipping points (Scheffer et al., 2009) and we predict it should also be visible in failing firms and senescent cells.
We have presented a thermodynamic theory of information-persisting systems built on four foundations — the second law, Landauer’s bound, the free-energy principle, and Markov-blanket factorisation — and combined them into a single dimensionless persistence ratio (4.4). We proved that \(\mathcal{R}\ge 1\) is necessary for persistence (Theorem 5.1), that model error costs lifetime exponentially (Theorem 5.2), and that persistence at any scale composes fractally with persistence at neighbouring scales (Theorem 5.3). The theory reduces correctly to NESS, Landauer, FEP, and fluctuation-theorem limits; it makes falsifiable predictions about lifetime, criticality, and substrate-neutrality; and it accommodates empirical phenomena from nuclear binding to organisational collapse with a single accounting identity.
The physical content is conservative — no new fundamental constants, no new fields — but the rearrangement is consequential. It identifies persistence itself as a quantitative dimensionless quantity, not a metaphysical primitive; it places “information” on the same footing as energy and entropy in the accounting; and it forecloses an entire family of substrate-essentialist or computational-emergentist claims about the necessary or sufficient conditions for an IPS to exist. The universe is full of patterns; those that stay are exactly those whose books balance in the currency of bits.
The author thanks the open-source authors of the FEP, stochastic-thermodynamics, and Landauer-bound experimental literatures for materials that were freely available throughout the development of this work, and several anonymous correspondents for useful objections to earlier drafts.
LH conceived and wrote the manuscript with the help of LLM based AI systems.
The author declares no competing interests. No external funding supported this work.