Ground Beneath the Map

TL;DR: Between raw experience and symbols (words, formal logic), there is a geometric middle layer which is pre-symbolic, continuous, and structured. Human cognition has it. Four independent fields all say it exists. LLMs don’t have it and can’t get it by scaling, because abstraction is lossy and optimizing for next-token prediction corrodes the representation of state space. This one absence explains both the failure to reason and the failure to relate. They’re the same problem.



June, 1805. Meriwether Lewis is standing in a canoe on a river that is trying to kill him. The current is fast and cold and it pushes back against his pole with a force that has nothing to do with his opinions about it. His boots are wet. A cottonwood branch clips his hat. He ducks, two seconds too late.

That night, by firelight, he pulls out paper and draws the river. The drawing is not the river itself, it’s an abstraction of the river. The bend where the current nearly swamped him, the sandbar where he camped, the tributary that entered from the northwest. Three dimensions of cold, wet, muscular encounter compressed into two dimensions of ink on paper. The temperature of the water is gone. The ache in his shoulders is gone. What remains is spatial. Relational. The bend is upstream of the sandbar. The distances are approximate but the topology is preserved.

Then he writes a name on it: “Missouri River.”

Two words. Fourteen characters. A label that points at the map, which points at the territory. The words are portable, compositional, and infinitely reusable. He can say “Missouri River” to Thomas Jefferson and Jefferson will nod. But Jefferson has never stood in the canoe. He doesn’t have Lewis’s map. He has the label, and whatever associations “river” triggers in a mind that has encountered other rivers but not this one.

Territory. Map. Label. Three layers. Each compresses the one below it. Each loses something the layer below had.

I’ve written about six essays now on why frontier AI models don’t think. I’ve circled the same intuition from different angles across a coupla months. Three weeks ago (when I started this essay), something clicked that I had to work through before it evaporated.

I’ve been saying: transformers have the labels, not the map, and no experience in the territory. They learned what humans wrote about experience, not experience itself.

Fine. True. But I’d been treating “map” and “label” as two layers when there are (at least) three. The one I kept stepping over might be the one that matters most.

Lewis the explorer in the canoe has all three.

Territory is the raw encounter. This is the current pushing back, the physics, the mosquitos and dead trees in the river. This is where the river’s real-world phenomena live, and they don’t care what you name them.

Geometry is the map drawn by firelight. A map is pre-symbolic but structured. It is spatial, relational, and topological, with constraints (can’t cross this mountain) and varying levels of detail. Maps are lossy, but the structure of the territory is preserved, more or less. This “map” is also the child who has pushed cups off tables for six months and now has an unnamed, unexamined knowing of the physics of cups before having the word “cup.” This is also Michael Levin’s morphospace, where cells navigate toward a body plan without a blueprint. Piaget’s sensorimotor schemas. The map is continuous fields with curvature, constraints, and attractors, but not yet named. An organization descriptive of the territory, sans words.

Symbols are like “Missouri River,” “banana,” “airplane.” They are discrete tokens that encode, with loss, the geometric ground, the experiential world. Words, formal logic, Token ID 23847: they all fall into this symbol category. They’re powerful and portable, compositional and communicable. You can’t directly transmit your personal experience of biting into a mealy apple. You have to use words to transmit what you can from that actual encounter with the territory. But words aren’t the territory itself, and they aren’t the accumulated experience of that territory (the middle layer).

That middle layer is where the action is.

Here’s what clicked this morning, and it’s the reason I’m thrashing out this essay instead of doing something productive.

I’ve been running two parallel arguments across these essays. One about relating (and language) and one about reasoning. They seemed like different problems.

They’re the same problem.

Before we get rolling, I’m going to use a couple of words a bunch: abstraction and compression. I’m using them interchangeably. Here’s how to think about what they mean. When you want to make a really good soup broth, you start with a lot of water. Then you cook it, for a long time, and the water steams off and it loses something (volume), but it gains something: it becomes more concentrated. That’s what “abstraction” means. You lose the details, but you gain a kind of shortcut, a potent essence.
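The broth analogy can be made literal in a few lines. This is a toy sketch with made-up numbers, nothing more: a series of river-depth readings boiled down to a single summary statistic, and a demonstration that the boiling is one-way.

```python
# Toy illustration of lossy abstraction, with invented numbers:
# hourly river-depth readings boiled down to a single summary.
readings = [2.1, 2.4, 2.2, 3.8, 2.0, 2.3]  # the "water": raw detail

# The "broth": one portable, potent number.
mean_depth = sum(readings) / len(readings)

# The reduction is one-way. Infinitely many different series produce
# this same mean, so the originals cannot be recovered from it.
flat_series = [mean_depth] * len(readings)  # same mean, no spike
same_broth = abs(sum(flat_series) / len(flat_series) - mean_depth) < 1e-9

print(round(mean_depth, 2), same_broth)  # 2.47 True
```

The spike at hour four, the thing that would have swamped the canoe, is simply gone from the broth. That is abstraction doing exactly what it is supposed to do, and exactly what it costs.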

Now, the game is afoot.

Language works like this: the world is raw experience. We abstract it into embodied, spatial, relational representations. World-models. Then we slap words on the models. Labels. Words are tokens that point at the world-model, which then points at the world.

Logic works the same way. The space of constraints and invariances is raw structure, the reality. We compress it into geometric relationships: topological, pre-symbolic, the kind of thing Lawvere and Grothendieck formalized in topos theory and Voevodsky made rigorous in Homotopy Type Theory. Then we abstract those relationships into formal notation. Modus ponens is a token that points at the logic-geometry, which points at the constraint space.

It’s the same stack with the same three layers.

World → world-model → words. Constraint space → logic-geometry → formal logic.

In both cases, the robustness lives in the middle layer. Not in the territory (because it’s too raw, too expensive, and too slow) and not in the symbols (they’re lossy, too brittle, disconnected from what they represent). Robustness lives in the geometric ground. The worm rebuilds because the bioelectric landscape drives it. The child generalizes because the sensorimotor schema transfers to novel cups, novel tables, novel situations. Voevodsky’s whole program showed that if you start from the geometry, you get foundations that are invariant under transformation.

When I say “geometric ground” what I mean is: when you experience the world, you are encoding the world into a set of states. This cup is full of water. (State 1.) I tip the cup. The water pours out. The cup is mostly empty. (State 2.) The relation of all of the states of the world that you’ve ever experienced wears grooves in your mind, in an invisible “geometric ground” that relates each state-of-the-world to its neighboring states. The ground has “weight” or curvature, indicating the “nearness” of those states and which causes move you from one state to another.
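One crude way to picture the grooves, as a sketch only: states as nodes, causal moves as edges, and costs standing in for “nearness.” Every name and number below is invented for illustration; this is not a claim about how brains store anything.

```python
# A toy "geometric ground": world-states as nodes, causal moves as
# edges, with costs standing in for the curvature / "nearness" of states.
# All names and numbers are invented for illustration.
transitions = {
    ("cup_full", "tip_cup"): "cup_empty",
    ("cup_empty", "fill_cup"): "cup_full",
    ("cup_full", "drop_cup"): "cup_broken",
    ("cup_broken", "glue_cup"): "cup_full",
}
cost = {"tip_cup": 1, "fill_cup": 1, "drop_cup": 1, "glue_cup": 50}

def act(state, action):
    """Follow a causal edge; moves with no edge leave the state unchanged."""
    return transitions.get((state, action), state)

print(act("cup_full", "tip_cup"))   # cup_empty: a nearby, cheap transition
print(cost["glue_cup"])             # 50: broken-to-full is a long way back
```

The point of the toy is the asymmetry: tipping is cheap and nearby, gluing is distant and expensive. That asymmetry is structure the child has long before she has the word “cup.”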

In my first essay, I argued that logic is invariant under renaming. Modus ponens doesn’t care if the variables are Alice and Bob or X and Y. Give frontier models a logic problem, rename the variables, and they collapse. This is evidence that they don’t reason. (Also, see: 300 Nopes.)

The invariance test isn’t checking whether the model “knows logic.” It’s checking whether the model has the geometric ground from which logic-as-symbols were derived. If you have the middle layer, renaming is trivial. Structure persists because the middle layer persists, and the symbols are labels on a landscape that doesn’t change when you relabel. If you change the label on Lewis’s map from “Missouri River” to “Skippi-doo-dah River”, it doesn’t change that there’s a bend in the river, with an island in the middle. If you only have the symbols, with no map, renaming is catastrophic. There’s no foundation to hold the structure in place, so renaming changes literally everything.
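The invariance claim itself fits in a few lines. This is my own hedged sketch, not anyone’s benchmark: a tiny forward-chaining rule engine whose conclusions survive a total relabeling of every symbol, because the inference lives in the structure of the rules, not in the labels.

```python
# Sketch: inference that is invariant under renaming, because the
# structure (the rule topology) is what does the work, not the labels.
def forward_chain(facts, rules):
    """Close a set of facts under if-then rules (repeated modus ponens)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules:
            if antecedent in facts and consequent not in facts:
                facts.add(consequent)
                changed = True
    return facts

def rename(rules, facts, mapping):
    """Relabel every symbol; the shape of the rules is untouched."""
    return ([(mapping[a], mapping[c]) for a, c in rules],
            [mapping[f] for f in facts])

rules, facts = [("rain", "wet")], ["rain"]
renamed_rules, renamed_facts = rename(rules, facts,
                                      {"rain": "X7", "wet": "Q2"})

# The conclusion survives the relabeling: "wet" is derived iff "Q2" is.
assert "wet" in forward_chain(facts, rules)
assert "Q2" in forward_chain(renamed_facts, renamed_rules)
```

A system with the middle layer passes this trivially, the way the code does: rename everything and the bend in the river is still a bend. A system that only has statistics over the labels has nothing that survives the swap.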

But Colin, frontier models pass logic benchmarks all the time!

Sure. Inside the training distribution. Stay inside the statistical basins and modus ponens works fine. Rename the variables and the whole thing collapses, because you’ve drifted outside the basin where “logic-shaped text” lives in token-space. The models had the tokens that described logical structure. They never had the structure. A Stanford survey just cataloged ~300 papers’ worth of exactly this kind of collapse: reversal failures, compositional breakdowns, robustness cracks that trace straight back to the autoregressive training objective.

Same diagnosis for language. Rename the domain, apply the same linguistic structure to a context outside the training distribution, and fluency becomes confabulation. The words were never grounded in a world-model. There is no middle layer, no geometric ground beneath the symbols.

Both failures are the same structural absence: there’s no middle layer.

I should say how I got here, but it’s a little embarrassing.

I arrived at these ideas from first principles and what Rich Hickey called Hammock Driven Development. Careful, deliberate, sustained thought. I pushed on a set of intuitions about why AI systems fail, followed the implications, tested them against cases, circled back. You know, thinking.

And when I picked my head up and looked around, I found I had traversed, with less accuracy and less potency, a set of ideas that are completely well-trodden.

I’m ok with that. More than ok. If you reason carefully from different starting assumptions and arrive at the same place as people who were much better at this than you, that’s evidence the destination is real. I think it’s called “convergent evolution of ideas”. The fact that I came at it from “why do transformers fail invariance tests” and ended up at topos theory doesn’t make me a mathematician. It means these ideas are coherent enough that you can feel the shape from multiple directions. One mountain, many paths.

The thinkers better than I: Lawvere and Grothendieck showed in the 1960s that logical operations correspond to geometric operations. Voevodsky pushed further: in Homotopy Type Theory, logical equivalence starts as geometry. I haven’t worked through the proofs. I’m trusting the mathematicians on the math and pointing at where it converges. Meanwhile, Piaget was showing the same thing from the developmental direction. The child’s “concrete operations” are internalized physical actions. Logical reversibility (“if I pour the water back, the amount is the same”) is a mental operation that was first a bodily operation. The child doesn’t learn modus ponens and then apply it to cups. The child pushes cups for two years and logical structure emerges from the pushing.

I’m a little more versed in the stuff Michael Levin talks about. Levin showed the biological version. Cells navigating toward “correct frog face” in morphospace aren’t doing logic. They’re following gradients in what is likely a geometric landscape. The coordination falls out of the curvature; no symbols, executors, or blueprints required.
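In that spirit, here’s a cartoon of gradient-following: a perturbed state rolls back to an attractor with no rules consulted anywhere. The energy function and every number in it are mine, purely illustrative, not Levin’s model.

```python
# A toy attractor landscape: energy E(x) = (x - 1)^2, minimized at x = 1.
# The state just follows the local slope; no symbols, no blueprint.
def gradient(x, target=1.0):
    return 2 * (x - target)  # slope of E, pointing away from the attractor

x = -3.0  # a perturbation, like a worm cut away from its body plan
for _ in range(200):
    x -= 0.05 * gradient(x)  # descend the curvature, step by step

print(round(x, 3))  # settles at the attractor: 1.0
```

Nothing in that loop “knows” the target. The target is encoded in the shape of the landscape, and the behavior falls out of the shape. That’s the sense in which coordination can come from curvature.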

Four independent traditions (pure mathematics, developmental psychology, cognitive science, and regenerative biology) converging on the same claim from different directions. The geometric ground is real. It’s prior to symbols. And I am pointing at it, saying, “This is where robustness comes from.”

Here’s what this means for AI. Transformers were fed the entire corpus of human written language. All of it. (Or, at least, Reddit. Shudder.) That corpus is a symbolic landscape. It contains some logical content, buried in an ocean of Reddit shitposts and rhetoric and recipes and rants. The ratio leans overwhelmingly non-logical. Still, a good LLM can perform some logical, deductive reasoning, as long as the problem sits inside the training distribution, the labels aren’t too weird, and you don’t push outside the statistical basins.

For humans, the stack runs: territory → geometric ground → symbols. Three layers. Each compresses the one below it. The geometric ground is pre-symbolic, continuous, spatial, and embodied. Symbols sit on top of a structure that has its own integrity, derived from experience in the world.

(It’s worth noting that the middle layer, for humans, is massively redundant. History baked in hundreds of millions of years of successful evolutionary reinforcement as firmware. Vision, proprioception, touch, vestibular sense: each builds the same spatial geometry from independent inputs. The child doesn’t learn “objects persist” from vision alone; she learns it from reaching, grasping, dropping, hearing, seeing, all converging on the same invariance. The geometric ground is robust because it’s not one abstraction. It’s dozens of independent compressions, cross-checked by systems that would all have to fail simultaneously to produce a wrong geometry.)

For transformers, symbols are the only available “ground”. And the point is: they ain’t grounded. At all.

LLMs don’t have a pre-symbolic geometric layer with symbols on top. They have symbols all the way down. Their substrate is tokens. Their “world” is tokens. Their “geometry,” such as it is, is the statistical topology of token-space, which is a geometry of labels, not a geometry of constraints. They are extremely fluent, statistically midline symbol manipulators.

This explains both the fluency and the brittleness. Fluent, because they’re operating natively in symbol-space (made of the stuff they manipulate). Brittle, because symbol-space doesn’t have the invariance properties of the geometric ground it was abstracted from. The structure that would survive renaming, the topological relationships, the curvature of the constraint landscape, was lost in the abstraction from geometry to symbols. Transformers never had the geometry, just the symbols the geometry was smooshed into.

And this is why scaling doesn’t help. More parameters, longer context, more training data: all of this enriches the symbolic layer. A richer map. A more detailed map. A map so detailed it’s hard to distinguish from the territory on casual inspection. But no amount of cartographic refinement produces the geometry that the map was abstracted from. Abstraction is lossy. The information isn’t there.

I gotta be honest about what I don’t know, because the list is long.

All of this assumes that human cognition actually has something like this three-layer structure. Territory, geometric ground, symbols. There is evidence: Lawvere proved the math, Piaget observed the development, Levin demonstrated a parallel in biology, Voevodsky formalized the geometry. Four independent traditions, converging from different directions.

But convergence ain’t proof. What if human cognition isn’t a layer cake at all? What if the “geometric ground” is a convenient story we tell about cognition, not a feature of it? What if the sensorimotor schema is itself an abstraction, and the thing underneath that is also a compression, and it’s abstractions all the way down, and “raw territory” is a philosophical posit that no mind actually touches?

Kahneman showed that most of what we call “reasoning” is post-hoc confabulation over fast heuristic judgment. The neuroscience keeps getting less kind to the layered cognition story. Interesting, huh? Maybe the geometric ground is what pattern matching feels like for us from the inside when the pattern matching is rich enough. Maybe I’m describing the phenomenology of cognition, not its architecture.

I can’t resolve this. The instrument and the thing being measured are the same. I can’t use my cognition to check whether my model of cognition is correct, because the checking is just my cognition.

What I can say: the better thinkers, from multiple independent domains, suggest the geometric ground is real, or at least a really useful model. And the architectural observation about transformers holds either way: whether or not human cognition has a true pre-symbolic geometric layer, transformer cognition has symbols as its only foundation, and it lacks the invariance properties that the convergent evidence associates with the geometric ground.

The diagnosis might be right even if the theory of why it’s right turns out to be wrong.

So what’s mine in all of this? Am I contributing anything that better minds haven’t already worked out?

Definitely Not The Math™. And not the developmental psychology. And nope, not the biology. What I’m doing is pointing at the convergence and asking: does this illuminate an exigent AI problem?

Aye, I think it does. “Transformers don’t reason” and “transformers don’t relate” are one problem, and the one problem is the absence of geometric ground. The middle layer of a three-layer stack that current architectures don’t just lack but are structurally alien to, because for them, symbols are the only available ground.

Which suggests that the path forward isn’t bigger transformers. It’s architectures that can represent continuous geometric structure: constraint landscapes, morphospaces, topological invariants. Architectures that derive symbolic behavior from that structure rather than learning symbols and hoping the structure materializes.

Levin’s lab is doing this with biology. Voevodsky did it with mathematics. Piaget showed that human children do it developmentally. Geometry first, symbols later.

Nobody, as far as I can tell, is doing it with AI.

(A side note, while I’m thinking aloud. I suspect, without the credentials to claim this properly, that for humans, some of the geometric ground comes pre-installed. Firmware. The brainstem, the cerebellum, the spinal cord encode spatial, relational, embodied structure before any learning happens. Edge detection. Proprioception. The startle response: a pre-installed model of “sudden change requires immediate action.” That’s evolutionary, nature’s geometric ground, wired in by hundreds of millions of years of organisms encountering territory and having the successful compressions baked into the hardware. Which means Piaget’s developmental story isn’t starting from zero. The child pushing cups is building on evolutionary firmware. And Levin’s morphospace is inherited firmware; the worm’s body plan attractor is literally pre-installed geometric ground, cells responding to voltage gradients because of natural selection.)

If this is right, the stack is deeper than three layers. Territory → nature geometry → nurture geometry → symbols. Transformers lack three of these. They haven’t encountered the territory (and can’t). They lack the firmware, and they lack the nurture layer geometry. All they have are symbols that were eventually derived from all of the deeper layers.

I’m just scribbling on the whiteboard while you watch. But the convergence from four independent directions makes me think there’s something here worth chewing on more carefully.

The territory is real. The symbols are useful, no doubt! Right now, all the machines have are symbols.

Just symbols.