Geometry Saves Mawmaw

I. The Punt

In my previous essay, I argued that frontier AI architectures have a coordination problem hiding in plain sight. Yann LeCun’s elegant JEPA framework proposes a Configurator module that sets parameters, decomposes tasks, switches modes, and allocates resources. Then, when asked how the Configurator actually learns to do this, he writes: “I shall leave this question open for future investigation.”

The problem is regress. If the Configurator is a module, what coordinates it? Another Configurator? You either get turtles all the way down or a magical, all-knowing homunculus that “just works.” The Configurator can’t be the same kind of thing as the modules it coordinates. It has to be different in kind.

I proposed that goal-space geometry might be the answer. Michael Levin’s work on regeneration shows that cells navigate toward attractors in morphospace without central control. Change the bioelectric voltage gradient, and tissue flows to a different target (two heads instead of one). Sans executive function, sans decision-maker. The goal is (somehow) encoded in the landscape, and the cells just roll downhill.

I left the essay with a question: could this work for minds? Could goal-space geometry replace the Configurator in cognitive architectures?

Then I discovered that the math already exists: RNFs.

II. Riemannian Neural Fields

A class of models sometimes described as Riemannian Neural Fields has been developing across physics, control theory, and computational neuroscience for years. There’s an implementation effort recruiting PhD physicists right now (January 2026). I’d never heard of any of it before, so this was a very pleasant surprise.

In standard gradient descent, you move “downhill” on a loss landscape. Distance is Euclidean. A step is a step is a step. The same gradient produces the same motion regardless of where you are.

In Riemannian geometry, there’s a metric tensor g(x) that varies by location. It defines what “distance” and “direction” mean locally. At different points in the space, the same gradient can produce radically different motion, because the coordinate system itself is warped.

The governing equation looks like this:

ẋ = -g(x)⁻¹∇E(x)

The system moves downhill on an energy landscape E(x), but “downhill” is defined by g(x). Attractors arise from the energy landscape. The metric governs access and dominance—how strongly, quickly, and selectively the system flows toward them.

This is not just “tilted terrain.” It’s deeper. The metric rescales directions unevenly. It couples dimensions. What were orthogonal axes become entangled. It can expand or compress subspaces so that some directions become arbitrarily expensive to traverse while others open up.
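To see the warp in numbers, here is a toy of my own (nothing more than NumPy and made-up matrices): the same gradient, at the same point, pushed through a flat metric and a warped one.

    import numpy as np

    def grad_E(x):
        # Gradient of a simple quadratic energy E(x) = 0.5 * x^T A x
        A = np.array([[2.0, 0.0],
                      [0.0, 1.0]])
        return A @ x

    x = np.array([1.0, 1.0])
    grad = grad_E(x)

    # Euclidean metric: identity. A step is a step is a step.
    g_flat = np.eye(2)

    # Warped metric: the first axis is expensive, and the off-diagonal
    # entangles the two axes.
    g_warped = np.array([[10.0, 3.0],
                         [ 3.0, 2.0]])

    for name, g in [("flat", g_flat), ("warped", g_warped)]:
        xdot = -np.linalg.solve(g, grad)   # xdot = -g(x)^{-1} grad E(x)
        print(name, xdot)

The flat metric steps straight down the gradient. The warped one suppresses the expensive axis and bends the flow toward the cheap, coupled one. Same gradient, same point, different motion.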

The best physical intuition I’ve found: imagine walking on a balloon that’s been pinched and stretched unevenly. Your weight changes as you walk. Someone is also pushing up and down on the surface from below. The pre-existing deformations are the learned geometry. Your changing weight is internal state (excitation, uncertainty, attention). The external pushing is exogenous input, like Mawmaw’s heart rate dropping: the world reaching in.

Your trajectory isn’t decided. It’s the emergent consequence of local geometry, internal modulation, and external perturbation, right now, at this point, on this surface.

Two constraints keep the intuition honest. First: only intrinsic geometry matters. There’s no God’s-eye view of the balloon from outside. There’s only the surface. Second: the metric makes directions expensive, not impossible. You can always move, but some moves become prohibitively costly.

One more thing. The geometry can co-evolve. Some formulations have g(x,t). That means the surface reshapes as you walk on it. This isn’t fixed terrain. It’s adaptive terrain. Regime-dependent. Task-dependent. This is closer to biology, where stress hormones don’t just move you to a different region of the manifold; they warp the manifold itself.

A caveat: learning g(x) without instability is nontrivial. Arbitrary metrics can destroy convergence. Biological systems appear to use low-rank, structured, or slowly varying metrics. There are hard constraints here: positive-definiteness, contraction conditions, timescale separation between the state dynamics and the metric adaptation. The math exists, but it’s not free.

The obvious objection is the curse of dimensionality. A full metric tensor for dimension N is N×N; inverting it is O(N³). For LLM-scale state spaces, that’s prohibitive. Maybe fatal.

Biology suggests it’s not necessarily fatal. Levin’s systems coordinate millions of cells without dense connectivity. The metric isn’t a giant matrix connecting everything to everything. It’s sparse, low-rank, block-diagonal, capturing essential couplings while ignoring the rest. Brains appear to do something similar: local circuits with long-range modulation, not all-to-all precision matrices.
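Here is one concrete version of that escape hatch, under an assumption I am adding for illustration (not something the field necessarily commits to): write the metric as identity-plus-low-rank, g = I + LLᵀ, with L an N×k factor and k ≪ N. Positive-definiteness comes for free, and the Woodbury identity replaces the O(N³) inverse with O(Nk²) work.

    import numpy as np

    N, k = 10_000, 8                 # state dimension, metric rank (illustrative sizes)
    rng = np.random.default_rng(0)

    L = rng.normal(size=(N, k)) / np.sqrt(N)   # low-rank factor: the essential couplings
    grad = rng.normal(size=N)                  # stand-in for grad E(x)

    # Metric g = I + L L^T is positive definite by construction.
    # Woodbury: (I + L L^T)^{-1} v = v - L (I_k + L^T L)^{-1} (L^T v),
    # so the flow never forms or inverts the N x N matrix.
    small = np.eye(k) + L.T @ L                # k x k, cheap to factor
    xdot = -(grad - L @ np.linalg.solve(small, L.T @ grad))
    print(xdot.shape)                          # (10000,)

The k columns of L carry the essential couplings; every direction outside their span defaults to flat.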

The question isn’t whether dense Riemannian geometry scales. It doesn’t. The question is whether sparse learned geometry can do the coordination work. Biology says yes. Whether synthetic systems can learn the right sparsity structure is open.

III. Convergence

So here’s the claim, now with more mathematical soluble fiber:

The metric can implement configurator functions.

Not a module that decides which mode to use. A geometric structure that makes certain modes “downhill” and others “uphill” depending on where you are. Mode-switching isn’t selected. It emerges from traversing regions with different local geometry.

The Mawmaw robot doesn’t have an executive that says “now enter crisis mode.” When Mawmaw’s heart rate drops, the state vector changes, because her vitals are dimensions of the robot’s state space. At the new location, g(x) is different. The energy landscape E(x) still defines the attractors. But what was a gentle exploratory landscape becomes a steep funnel because the metric now makes one attractor dominant and everything else too expensive to reach.
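A toy of my own making (not Levin’s model and not a neural field; the variables and numbers are invented for illustration) shows that switch falling out of the algebra. The state is (b, c, v): a caregiving axis, an unrelated chore axis, and Mawmaw’s vitals, which are clamped by the world rather than chosen by anything inside the loop.

    def grad_E(b, c, v):
        # Fixed landscape over the full state (b, c, v):
        #   E = v*b^2/2 + (1-v)*(b-1)^2/2 + 0.1*(c-1)^2/2
        # Healthy vitals (v near 1) put the b-minimum near 0 (explore);
        # low vitals put it near 1 (attend to Mawmaw). c mildly prefers 1.
        dE_db = v * b + (1.0 - v) * (b - 1.0)
        dE_dc = 0.1 * (c - 1.0)
        return dE_db, dE_dc

    def metric(v):
        # State-dependent metric: in crisis (low v) the caregiving direction
        # gets cheap (small g_bb, fast flow) and everything else gets
        # expensive (large g_cc, effectively frozen).
        g_bb = 0.05 + 0.95 * v
        g_cc = 1.0 + 50.0 * (1.0 - v)
        return g_bb, g_cc

    b, c, dt = 0.5, 0.0, 0.01
    for t in range(2000):
        v = 1.0 if t < 1000 else 0.1          # heart rate drops halfway through
        g_bb, g_cc = metric(v)
        dE_db, dE_dc = grad_E(b, c, v)
        b -= dt * dE_db / g_bb                # x_dot = -g(x)^{-1} grad E(x), per axis
        c -= dt * dE_dc / g_cc
        if t in (999, 1999):
            print(f"t={t+1}: b={b:.2f}, c={c:.2f}, vitals={v}")

No line of this loop selects a mode. The landscape was fixed before the run began, the vitals arrive from outside, and when they drop the local metric turns caregiving into the only cheap direction downhill while the chore axis freezes in place.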

The geometry does the work without regress or that nasty homunculus.

We must be careful, however, not to hide a homunculus in the update rule itself. If we implement this by training a secondary “hypernetwork” to output g(x) based on context, we have simply reinvented the Configurator and given it a Greek letter. To truly solve the coordination problem, the evolution of the metric cannot be the output of a decision-making module. It must arise from local dynamics like Hebbian plasticity, homeostatic regulation, or direct sensitivity to the curvature (Hessian) of the energy landscape. The geometry changes because of physical laws acting on the state, not because a manager ordered a renovation.

So: geometry isn’t the only non-regressive alternative to explicit control. Market-based internal competition, energy-based arbitration, and constraint satisfaction dynamics all remain possible contenders. But geometry may be the most parsimonious substrate. It doesn’t require additional mechanisms. It’s not another module. It’s the shape of the space in which all the other dynamics unfold.

Levin’s voltage gradients are g(x) for biological morphospace; this framework may supply the math. The biological proof of concept and the formal machinery are two faces of the same insight: you don’t need a controller if the space itself encodes the control.

Learning and inference are not just moving downhill. They’re moving downhill on a surface whose curvature reflects uncertainty, coordination, and task relevance. And that surface may itself be squishy.

IV. The Sharpened Question

This reframes the experimental question from my previous essay.

I asked: “Is there a learnable metric structure in neural activation space that determines the coordination regime?”

Now I can ask it precisely: Do synthetic cognitive systems exhibit g(x)?

Anthropic’s interpretability work has found meaningful structure in transformer activation space (features, directions, circuits). The question becomes: is there metric structure? Do different tasks, different contexts, different coordination demands correspond to different local geometries? Can we measure how “distance” and “direction” vary across activation space?

If yes: we’ve found where configurator functions live. They were implemented in geometry all along. The path forward is learning or inducing g(x) that produces appropriate mode-switching without explicit control.

If no: current architectures may be geometrically flat. Euclidean. Missing the substrate that biological cognition uses for coordination. That would explain why transformers can be so capable and yet so brittle, because they’re navigating with a constant metric in a world that demands adaptive geometry.

Either answer matters.

But I don’t have the math to run frontier-lab-scale experiments. I can, however, point at a door that doesn’t require Anthropic’s resources.

V. A Test You Can Actually Build

It might be that the simplest system where g(x) is explicit and operational is one you already know: the Kalman filter.

In Gaussian state estimation, you’re inferring a hidden state from noisy observations. The update step minimizes an energy (negative log posterior), and the key insight is this: residuals are weighted by precision (the inverse of covariance). Precision determines how “expensive” it is to move your belief in different directions.

That precision matrix is the metric. It plays exactly the role of g(x): it defines local cost, it’s state-dependent, and it changes with context.
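To make the correspondence explicit (standard Kalman algebra, written in the notation of the equation from Section II, with x̂⁻ and P the predicted mean and covariance, H the observation matrix, z the observation): the update minimizes the energy

E(x) = ½ (x − x̂⁻)ᵀ P⁻¹ (x − x̂⁻) + ½ (z − Hx)ᵀ R⁻¹ (z − Hx)

whose gradient at the prior mean is −Hᵀ R⁻¹ (z − H x̂⁻) and whose curvature is the posterior precision, g = P⁻¹ + Hᵀ R⁻¹ H. Because the energy is quadratic, a single metric-weighted step lands exactly on the Kalman update:

x̂⁺ = x̂⁻ − g⁻¹ ∇E = x̂⁻ + (P⁻¹ + Hᵀ R⁻¹ H)⁻¹ Hᵀ R⁻¹ (z − H x̂⁻)

The gain is a metric inverse applied to a gradient, nothing more.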

Here’s the experiment.

Setup: A 2D latent state (position and velocity) evolving over time. Observations come from a sensor with noise covariance R. Crucially, make R switch over time:

  • Normal regime: low noise (good sensor)
  • Degraded regime: high noise (bad sensor)
  • Optional: occasional outlier spikes (crisis)

What you log: The estimate, the covariance P, the precision Λ = P⁻¹, the residual, and the update step.

A strict implementation requirement applies here: the simulator’s true R can switch, but you cannot simply hand the filter the new R when you decide the regime has changed. That reintroduces the executive homunculus. To make this a valid demonstration of geometric coordination, the system must use adaptive Kalman filter logic. It must monitor the “surprise” of the incoming data (specifically, the Mahalanobis distance of the residuals). When prediction errors explode, the math itself should inflate the covariance, effectively ignoring the sensor. The switch must be an algebraic consequence of the error, not an external intervention.
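A minimal sketch of that requirement, under assumptions of mine: a constant-velocity model, a position-only sensor, and one simple inflation rule among the many in the adaptive-filtering literature (scale the assumed R by the normalized residual surprise).

    import numpy as np

    rng = np.random.default_rng(0)

    # Constant-velocity model: state x = [position, velocity], observe position only.
    dt = 0.1
    F = np.array([[1.0, dt],
                  [0.0, 1.0]])          # state transition
    H = np.array([[1.0, 0.0]])          # observation matrix
    Q = 1e-3 * np.eye(2)                # process noise (assumed known)
    R_nominal = np.array([[0.1**2]])    # the filter's standing belief about the sensor

    T = 400
    true_x = np.array([0.0, 1.0])
    x_est = np.array([0.0, 0.0])
    P = np.eye(2)

    gain_log, precision_log = [], []
    for t in range(T):
        # World: the true sensor noise switches regimes; the filter is never told.
        sigma = 0.1 if t < T // 2 else 2.0      # normal -> degraded sensor
        true_x = F @ true_x + rng.multivariate_normal(np.zeros(2), Q)
        z = H @ true_x + rng.normal(0.0, sigma, size=1)

        # Predict.
        x_pred = F @ x_est
        P_pred = F @ P @ F.T + Q

        # Adapt: inflate R when the residual is too surprising.
        resid = z - H @ x_pred
        S_nom = H @ P_pred @ H.T + R_nominal
        d2 = float(resid @ np.linalg.solve(S_nom, resid))  # squared Mahalanobis distance
        inflate = max(1.0, d2 / 3.84)       # 3.84 ~ chi-square 95% bound, 1 dof
        R_used = R_nominal * inflate        # no one declares a regime change

        # Update: the metric-weighted step.
        S = H @ P_pred @ H.T + R_used
        K = P_pred @ H.T @ np.linalg.inv(S)
        x_est = x_pred + K @ resid
        P = (np.eye(2) - K @ H) @ P_pred

        gain_log.append(np.linalg.norm(K))
        precision_log.append(np.linalg.inv(P))   # logged for the claims below

    gain = np.array(gain_log)
    print("mean |K|, normal regime:  ", gain[: T // 2].mean())
    print("mean |K|, degraded regime:", gain[T // 2:].mean())

The regime switch lives only in the simulated world; the filter never receives a flag. The gain falls in the degraded half because the inflated R_used shrinks it, and the covariance stays wide: Claim A, plus the raw material for Claims B and C, emerging from the update equations rather than from a mode declaration.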

What you test:

  • Claim A: When the sensor degrades, the Kalman gain decreases. The system becomes “less willing” to move belief based on observations. The metric changed. That’s geometry modulating dynamics.
  • Claim B: Fit two models to predict the update step. A Euclidean rule (update proportional to residual) will fail across regimes. A geometry-aware rule (update depends on precision and residual) will succeed. The metric matters empirically.
  • Claim C: Define a “mode”: trust-the-sensor vs. trust-the-dynamics. Show that mode correlates with precision. Show that mode transitions happen when precision crosses thresholds driven by data, not by an executive declaring “enter cautious mode.”

That’s the Configurator dissolving into geometry. In a system you can run on a laptop this afternoon.

In 2D, precision is a matrix. It’s anisotropic: some directions in state space are cheap, others expensive, and the off-diagonals couple dimensions. Plot the covariance ellipses over time. Watch them rotate and stretch as context changes. Watch the update steps align with the ellipse geometry.
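If you want those pictures, here is a standard recipe (my helper, with an illustrative covariance standing in for the logged P): eigen-decompose each 2×2 covariance and draw the corresponding ellipse.

    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib.patches import Ellipse

    def cov_ellipse(P, center, ax, n_std=1.0, **kw):
        # Axis lengths come from the square roots of the eigenvalues;
        # orientation comes from the leading eigenvector.
        vals, vecs = np.linalg.eigh(P)
        angle = np.degrees(np.arctan2(vecs[1, -1], vecs[0, -1]))
        w, h = 2 * n_std * np.sqrt(vals[-1]), 2 * n_std * np.sqrt(vals[0])
        ax.add_patch(Ellipse(center, w, h, angle=angle, fill=False, **kw))

    fig, ax = plt.subplots()
    P_example = np.array([[1.0, 0.6],
                          [0.6, 0.5]])   # illustrative covariance, not from the run
    cov_ellipse(P_example, (0, 0), ax)
    ax.set_xlim(-3, 3); ax.set_ylim(-3, 3); ax.set_aspect("equal")
    plt.show()

Loop this over the logged covariances and the rotation and stretching are visible frame by frame.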

The balloon metaphor becomes unnecessary. The plots do the work.

This is not a neural field. But it establishes proof of concept:

  • State-dependent metrics arise naturally in optimal inference
  • They implement configurator-like behavior (precision gating) without recursion
  • They are empirically recoverable from trajectories

If it works for Kalman, the question becomes: do neural systems learn something analogous? Is there precision-like structure in activation space? Can we recover it?

That’s a grad-student-scale experiment. A few GPUs. Not a frontier lab.

VI. Two Doors

The first door requires Anthropic-scale resources: probe transformer activation space for metric structure. Look for g(x). See if coordination regimes correspond to different local geometries.

The second requires a laptop: build the Kalman toy. Show that precision implements mode-switching without an executive. Establish the proof of concept. Then ask whether the same structure exists in learned systems.

Either path advances the question. Both are open.

Someone should walk through.


This essay continues the line of inquiry from AI Can’t Help Mawmaw, But Worms Can. The errors (category and otherwise), hand-waving and oversimplifications are mine. The geometry is not.