Physics:Equation of Artificial Intelligence in the Theory of Entropicity (ToE)
Entropic Learning via Self-Referential Entropy Tracking (SRETA)
That is:
Learning is a change in the internal entropy state of a system [a change in Self-Referential Entropy (SRE)] toward a given [internal or external] reference entropy.
Abstract
This paper defines a principled Learning Action in which learning is explicitly a change of state guided by Self-Referential Entropy (SRE) toward a given (target) entropy. Instead of adding Shannon entropy and an “irreversible entropy” term, we construct an entropic potential and a gradient-flow action whose minimizer yields irreversible dynamics by design. The result is the Entropic Learning Equation (ELE).
This paper is an update on an earlier paper on Artificial Intelligence and Deep Learning in the Theory of Entropicity (ToE).[1]
Preliminaries
Objects and Notation
Model state (parameters): [math]\displaystyle{ \phi(t)\in\mathbb{R}^d }[/math].
Predictive distribution: [math]\displaystyle{ p_\phi(y\mid x) }[/math].
Data (or teacher) distribution: [math]\displaystyle{ p^*(y\mid x) }[/math].
Self-Referential Entropy (SRE) of the model’s internal state: [math]\displaystyle{ S_{\mathrm{self}}(\phi) }[/math] (differentiable scalar functional).
Given (target) entropy for the task: [math]\displaystyle{ S_{\mathrm{given}} }[/math].
Positive-definite mobility/metric: [math]\displaystyle{ \Gamma(\phi)\in\mathbb{R}^{d\times d} }[/math].
We use [math]\displaystyle{ \nabla_\phi }[/math] for gradients w.r.t. [math]\displaystyle{ \phi }[/math] and the dot for time derivatives, [math]\displaystyle{ \dot\phi=\tfrac{d\phi}{dt} }[/math].
Choice of Target Entropy [math]\displaystyle{ S_{\mathrm{given}} }[/math] (Guidance)
Nearly deterministic supervision: [math]\displaystyle{ S_{\mathrm{given}}\approx 0 }[/math].
Inherently ambiguous labels: set [math]\displaystyle{ S_{\mathrm{given}}\approx H^*(Y\mid X) }[/math], an empirical estimate of the conditional entropy.
Representation learning: define [math]\displaystyle{ S_{\mathrm{self}} }[/math] on latents (e.g., codebook, embedding spread) and set [math]\displaystyle{ S_{\mathrm{given}} }[/math] by desired compression/robustness.
Entropic Potential
Define the entropic potential (weights can be nondimensionalized; set to 1 after scaling): [math]\displaystyle{ W(\phi) \;=\; \tfrac{\alpha}{2}\,\big(S_{\mathrm{self}}(\phi)-S_{\mathrm{given}}\big)^2 \;+\; \beta\,\mathbb{E}_{x\sim\mathcal{D}}\!\Big[\mathrm{KL}\big(p^*(\cdot\mid x)\,\Vert\,p_\phi(\cdot\mid x)\big)\Big], }[/math] where [math]\displaystyle{ \mathrm{KL}\big(p^*(\cdot\mid x)\Vert p_\phi(\cdot\mid x)\big) \;=\; \sum_y p^*(y\mid x)\,\log\frac{p^*(y\mid x)}{p_\phi(y\mid x)}. }[/math]
The first term drives SRE alignment: [math]\displaystyle{ S_{\mathrm{self}}(\phi)\to S_{\mathrm{given}} }[/math].
The second term pulls predictions toward data without being “just another entropy addend”.
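For concreteness, here is a minimal NumPy sketch of [math]\displaystyle{ W(\phi) }[/math] for a toy linear softmax model. Taking [math]\displaystyle{ S_{\mathrm{self}} }[/math] to be the mean predictive entropy over the data is an illustrative assumption; the framework only requires a differentiable scalar functional.
<syntaxhighlight lang="python">
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # stabilize exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropic_potential(phi, X, P_star, S_given, alpha=1.0, beta=1.0):
    """W(phi) for a toy linear softmax model p_phi(y|x) = softmax(x @ phi).

    S_self is taken here as the mean Shannon entropy of the model's
    predictive distribution over the data -- one smooth choice among
    many the paper allows.
    """
    P = softmax(X @ phi)                      # p_phi(y|x), shape (n, k)
    eps = 1e-12
    # Self-Referential Entropy (illustrative definition).
    S_self = -np.mean(np.sum(P * np.log(P + eps), axis=-1))
    # Expected KL(p* || p_phi) over the empirical data distribution.
    kl = np.mean(np.sum(P_star * (np.log(P_star + eps) - np.log(P + eps)),
                        axis=-1))
    return 0.5 * alpha * (S_self - S_given) ** 2 + beta * kl
</syntaxhighlight>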
Learning Action
We penalize deviation from the desired gradient flow generated by [math]\displaystyle{ W }[/math]: [math]\displaystyle{ \mathcal{L}_{\mathrm{SRETA}}[\phi] \;=\; \int_{0}^{T} \tfrac{1}{2}\,\big\|\dot\phi + \Gamma(\phi)\,\nabla_\phi W(\phi)\big\|^2\,dt. }[/math]
Boundary conditions (typical): [math]\displaystyle{ \phi(0)=\phi_0 }[/math]; free terminal state or fixed [math]\displaystyle{ \phi(T) }[/math] if desired.
[math]\displaystyle{ \Gamma(\phi) }[/math] sets the geometry and time scale of learning (identity [math]\displaystyle{ \to }[/math] vanilla gradient flow; Fisher metric [math]\displaystyle{ \to }[/math] natural gradient).
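For concreteness, a minimal sketch of the discretized action under a forward-difference scheme with Euclidean [math]\displaystyle{ \Gamma=\eta I }[/math] (both discretization and geometry are assumptions for illustration):
<syntaxhighlight lang="python">
import numpy as np

def sreta_action(traj, grad_W, dt, eta=0.1):
    """Forward-difference discretization of the SRETA action for a
    trajectory traj of shape (T+1, d), with Gamma = eta * I.

    The integrand penalizes deviation of phi_dot from the gradient
    flow -Gamma grad W; it vanishes iff the trajectory follows the flow.
    """
    total = 0.0
    for k in range(len(traj) - 1):
        phi_dot = (traj[k + 1] - traj[k]) / dt
        residual = phi_dot + eta * grad_W(traj[k])
        total += 0.5 * float(residual @ residual) * dt
    return total
</syntaxhighlight>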
Stationarity and the Entropic Learning Equation (ELE)
The integrand is a nonnegative quadratic that vanishes exactly when the trajectory follows the flow, so minimizing the action gives the gradient-flow ELE: [math]\displaystyle{ \dot\phi \;=\; -\,\Gamma(\phi)\,\nabla_\phi W(\phi) \;=\; -\,\Gamma(\phi)\!\left[\alpha\,\big(S_{\mathrm{self}}(\phi)-S_{\mathrm{given}}\big)\,\nabla_\phi S_{\mathrm{self}}(\phi) \;+\; \beta\,\nabla_\phi\,\mathbb{E}_{x}\!\big[\mathrm{KL}(p^*\Vert p_\phi)\big]\right]. }[/math]
Built-in Irreversibility (No ad-hoc [math]\displaystyle{ S_{\mathrm{irr}} }[/math])
Define the entropy-production rate [math]\displaystyle{ \sigma(\phi) \;=\; \big(\nabla_\phi W(\phi)\big)^{\!\top}\,\Gamma(\phi)\,\big(\nabla_\phi W(\phi)\big) \;\ge\; 0. }[/math] Because [math]\displaystyle{ \Gamma(\phi) }[/math] is positive-definite, [math]\displaystyle{ \sigma\ge 0 }[/math] holds identically: irreversibility emerges from the dynamics, not from an added penalty.
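A minimal numeric illustration of this identity (the random positive-definite [math]\displaystyle{ \Gamma }[/math] below is constructed only for the check, not prescribed by the framework):
<syntaxhighlight lang="python">
import numpy as np

def entropy_production(grad_W_phi, Gamma):
    """sigma = g^T Gamma g; nonnegative whenever Gamma is positive-definite."""
    return float(grad_W_phi @ Gamma @ grad_W_phi)

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
Gamma = A @ A.T + 4 * np.eye(4)   # positive-definite by construction
g = rng.normal(size=4)
assert entropy_production(g, Gamma) >= 0.0
</syntaxhighlight>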
Equivalent Constrained (Tracking) Form
One can enforce “SRE relaxes toward the target” explicitly via a Lagrange multiplier [math]\displaystyle{ \lambda(t) }[/math]: [math]\displaystyle{ \mathcal{L}_{\mathrm{track}}[\phi,\lambda] \;=\; \int\!\Big[\tfrac{1}{2}\,\dot\phi^{\!\top} M\,\dot\phi \;+\; \beta\,\mathbb{E}_{x}\!\big[\mathrm{KL}(p^*\Vert p_\phi)\big] \;+\; \lambda(t)\,\Big(\tfrac{d}{dt} S_{\mathrm{self}}(\phi) \;-\; \kappa\,\big[S_{\mathrm{given}}-S_{\mathrm{self}}(\phi)\big]\Big)\Big]\,dt, }[/math] with [math]\displaystyle{ \tfrac{d}{dt} S_{\mathrm{self}}(\phi) \;=\; \nabla_\phi S_{\mathrm{self}}(\phi)\cdot \dot\phi. }[/math] Stationarity yields first-order dynamics in which [math]\displaystyle{ \tfrac{d}{dt} S_{\mathrm{self}}(\phi) \;=\; \kappa\,\big[S_{\mathrm{given}}-S_{\mathrm{self}}(\phi)\big] }[/math] (i.e., exponential relaxation of SRE toward the target) while simultaneously fitting the data via the KL term. The matrix [math]\displaystyle{ M\succ 0 }[/math] sets inertial weighting; taking the overdamped limit recovers the gradient-flow ELE.
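For intuition, the enforced relaxation law is a first-order linear ODE with the closed-form solution [math]\displaystyle{ S_{\mathrm{self}}(\phi(t)) \;=\; S_{\mathrm{given}} + \big(S_{\mathrm{self}}(\phi(0)) - S_{\mathrm{given}}\big)\,e^{-\kappa t}, }[/math] so the SRE gap shrinks exponentially at rate [math]\displaystyle{ \kappa }[/math].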
Practical Design Choices
Geometry / Optimizer Mapping
Euclidean flow: [math]\displaystyle{ \Gamma(\phi)=\eta I }[/math] (learning rate [math]\displaystyle{ \eta }[/math]), giving [math]\displaystyle{ \dot\phi=-\eta\,\nabla_\phi W(\phi) }[/math].
Natural gradient: [math]\displaystyle{ \Gamma(\phi)=\eta\,F(\phi)^{-1} }[/math], with [math]\displaystyle{ F }[/math] the Fisher information.
Preconditioned/Adam-like: choose [math]\displaystyle{ \Gamma }[/math] diagonal or adaptive from running curvature estimates.
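As a sketch, the three geometries differ only in the mobility applied to the same gradient; the Fisher matrix F and curvature accumulator v below are placeholders the caller would supply, not prescribed estimators:
<syntaxhighlight lang="python">
import numpy as np

def ele_step(phi, grad, eta=0.1, mode="euclidean", F=None, v=None):
    """One discrete ELE step, phi <- phi - Gamma(phi) grad W(phi),
    for three illustrative choices of the mobility Gamma."""
    if mode == "euclidean":     # Gamma = eta * I
        return phi - eta * grad
    if mode == "natural":       # Gamma = eta * F^{-1} (F: Fisher matrix)
        return phi - eta * np.linalg.solve(F, grad)
    if mode == "adaptive":      # Gamma diagonal, Adam-like preconditioner
        return phi - eta * grad / (np.sqrt(v) + 1e-8)
    raise ValueError(mode)
</syntaxhighlight>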
Discrete-Time Approximation (for implementation)
For step size [math]\displaystyle{ \Delta t }[/math]: [math]\displaystyle{ \phi_{k+1} \;=\; \phi_k \;-\; \Delta t\,\Gamma(\phi_k)\,\nabla_\phi W(\phi_k). }[/math]
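A minimal sketch of this update with [math]\displaystyle{ \Gamma = I }[/math] and a finite-difference gradient (adequate for toy problems; in practice autodiff would replace numerical_grad):
<syntaxhighlight lang="python">
import numpy as np

def numerical_grad(f, phi, h=1e-5):
    """Central-difference gradient of a scalar function f at phi."""
    g = np.zeros_like(phi)
    for i in range(phi.size):
        e = np.zeros_like(phi)
        e.flat[i] = h
        g.flat[i] = (f(phi + e) - f(phi - e)) / (2 * h)
    return g

def train_ele(W, phi0, dt=0.05, steps=200):
    """Explicit-Euler discretization of the ELE with Gamma = I."""
    phi = phi0.astype(float).copy()
    for _ in range(steps):
        phi -= dt * numerical_grad(W, phi)  # phi_{k+1} = phi_k - dt * grad W
    return phi

# Usage (with the entropic_potential sketch above):
#   W = lambda p: entropic_potential(p, X, P_star, S_given=0.1)
#   phi_star = train_ele(W, np.zeros((d, k)))
</syntaxhighlight>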
Scaling and Weights
Non-dimensionalize so that [math]\displaystyle{ W }[/math] is order-one near the optimum, then set [math]\displaystyle{ \alpha=\beta=1 }[/math]. Use [math]\displaystyle{ \Gamma }[/math] (or [math]\displaystyle{ \eta }[/math]) to tune the time scale.
Why This Fixes the Original Formulation
Learning is explicitly state change: dynamics appear via [math]\displaystyle{ \dot\phi }[/math].
SRE alignment is the steering signal through [math]\displaystyle{ \big(S_{\mathrm{self}}-S_{\mathrm{given}}\big)\nabla_\phi S_{\mathrm{self}} }[/math].
Irreversibility ([math]\displaystyle{ \sigma\ge 0 }[/math]) is automatic from the quadratic action; no external [math]\displaystyle{ S_{\mathrm{irr}} }[/math] term is required.
Data-fit pressure is principled via KL, orthogonal to SRE alignment.
Minimal Working Set (copy-ready)
Entropic potential:
[math]\displaystyle{ W(\phi)=\tfrac{\alpha}{2}\big(S_{\mathrm{self}}(\phi)-S_{\mathrm{given}}\big)^2 +\beta\,\mathbb{E}_{x}\!\Big[\mathrm{KL}\big(p^*(\cdot\mid x)\Vert p_\phi(\cdot\mid x)\big)\Big]. }[/math]
Learning action (SRETA): [math]\displaystyle{ \mathcal{L}_{\mathrm{SRETA}}[\phi]=\int_0^T \tfrac{1}{2}\,\big\|\dot\phi+\Gamma(\phi)\,\nabla_\phi W(\phi)\big\|^2\,dt. }[/math]
Entropic Learning Equation (ELE): [math]\displaystyle{ \dot\phi = -\,\Gamma(\phi)\!\left[\alpha\,\big(S_{\mathrm{self}}(\phi)-S_{\mathrm{given}}\big)\,\nabla_\phi S_{\mathrm{self}}(\phi) + \beta\,\nabla_\phi\mathbb{E}_{x}\!\big[\mathrm{KL}(p^*\Vert p_\phi)\big]\right]. }[/math]
Entropy-production rate: [math]\displaystyle{ \sigma(\phi)=\big(\nabla_\phi W(\phi)\big)^{\!\top}\Gamma(\phi)\,\big(\nabla_\phi W(\phi)\big)\ge 0. }[/math]
Constrained tracking form (optional): [math]\displaystyle{ \mathcal{L}_{\mathrm{track}}[\phi,\lambda] = \int\!\Big[\tfrac{1}{2}\,\dot\phi^{\!\top}M\,\dot\phi +\beta\,\mathbb{E}_{x}\!\big[\mathrm{KL}(p^*\Vert p_\phi)\big] +\lambda(t)\,\big(\nabla_\phi S_{\mathrm{self}}(\phi)\cdot\dot\phi-\kappa\,[S_{\mathrm{given}}-S_{\mathrm{self}}(\phi)]\big)\Big]\,dt. }[/math]
Remarks
This framework is agnostic to the specific definition of [math]\displaystyle{ S_{\mathrm{self}}(\phi) }[/math] as long as it is smooth; examples include entropy of latent codes, complexity penalties, or energy-based state measures.
The formulation is compatible with stochastic mini-batch estimates of both the KL term and its gradients; see the sketch after this list.
The same blueprint extends to multi-objective training by summing additional potentials inside [math]\displaystyle{ W(\phi) }[/math] before taking the gradient flow.
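On the mini-batch point: because [math]\displaystyle{ p^* }[/math] does not depend on [math]\displaystyle{ \phi }[/math], the gradient of the expected KL coincides with the gradient of the expected cross-entropy, so the ordinary mini-batch cross-entropy gradient is an unbiased estimator of [math]\displaystyle{ \nabla_\phi\,\mathbb{E}_{x}\!\big[\mathrm{KL}(p^*\Vert p_\phi)\big] }[/math]. A minimal sketch for the toy linear softmax model used above (the model choice is an illustrative assumption):
<syntaxhighlight lang="python">
import numpy as np

def minibatch_kl_grad(phi, Xb, Pb_star):
    """Unbiased mini-batch gradient of E_x[KL(p* || p_phi)] for a linear
    softmax model. Identical to the cross-entropy gradient, since the
    entropy of p* is constant in phi."""
    logits = Xb @ phi
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=-1, keepdims=True)
    return Xb.T @ (P - Pb_star) / len(Xb)          # d(cross-entropy)/d(phi)
</syntaxhighlight>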
References
- Physics:Artificial Intelligence Formulated by the Theory of Entropicity (ToE). (2025, August 27). HandWiki. Retrieved 03:59, August 27, 2025, from https://handwiki.org/wiki/index.php?title=Physics:Artificial_Intelligence_Formulated_by_the_Theory_of_Entropicity(ToE)&oldid=3742591