The Last Levee

I like starting some of my essays with a narrative vignette that makes the thesis visceral. Something you feel, not just think about. I wanted to do that with this one, but I can’t. It’s too ick. This one is radioactive.

So instead: a security blanket.

For twenty years, geneticists told us complex traits like intelligence are “massively polygenic”, meaning: influenced by thousands of variants, each contributing fractions of a percent. The conclusion was “You can’t edit for intelligence because there’s nothing to target. Too many dials, each too small to matter.”

This was very comforting.

And probably wrong. And we’re about to find out.



CRISPR is amazing. In a lab, in a dish, you launch this nifty machine made of protein and it greps through strands of DNA to find a spot you’ve selected. Then it snip snips and hey presto! edits the DNA. Insert your own special flavor of DNA. This is nothing short of godlike power. Who has two thumbs and holds 500 million years of evolution in contempt? We do!

But. (Always, right?) Gene remixes have historically faced two very pernicious and unrelated roadblocks. First: delivery. Getting enough CRISPR machines into the right places in the body, in sufficient quantity, while dodging the immune system, and so on. Delivery at scale is what holds CRISPR back. However, this is one of those “crank the handle” problems, and we humans are diligently cranking away at it. It will likely fall to our sustained efforts soon enough. George Church at Harvard has said the challenges aren’t fundamental problems, just engineering bugs.

Meanwhile, the other pesky problem: we don’t really know which genes control which traits. The term is genotype-to-phenotype (G2P) mapping. What it means is that DNA is a lot (a lot!) of little levers to pull, and we’re not really sure which levers need to be pulled to give you an IQ of 180. Or features as devilishly handsome as my own. Our genome is less like computer code and more like a symphonic score. One trumpet in the back is out of tune, and the whole thing sounds like shit.

Worse, these G2P mappings are non-local and distributed. The levers are scattered all over the genome: enhancers can sit thousands of bases from the genes they control. There’s also alternative splicing, which produces hundreds of different proteins from single genes. And other, more esoteric shit that I barely understand, like epistatic interactions and chromatin folding that exposes or hides sequences from the transcription machinery.

Thus: for complex traits (high IQ, Steele-level good looks), predicting edit outcomes was impossible.

Then along comes Google DeepMind. They built this thing called AlphaFold (Nobel Prize, 2024) and then AlphaGenome (January 2026).

These are Narrow Domain Artificial SuperIntelligences (NDASIs). Through superhuman (literally) statistical prediction in these bounded domains, they’ve done what no humans could do before. AlphaFold predicts protein structures. AlphaGenome (now) models how changes thousands of bases away from a gene can change its expression. It’s finding regulatory structure (not the governmental kind, the gene-lever-pulling kind) that was inaccessible to the previous techniques humans used to crack the G2P problem. What was a messy (intractable) statistics problem for us is, for these NDASIs, an information-processing problem with a learnable grammar we couldn’t see.

I just said something extraordinary, and it might not have been obvious, so lemme say it again, simpler:

The Machine™ can now figure out “Hey, if we pull this lever on your DNA… Here’s what happens!”

We were never able to do this before.

This is um… breathtaking.

We arrive at an uncomfortable moment (for me, anywho). The technical frontier has been won by an NDASI. We needn’t laboriously map, step by tiny step, through the landscape of genotype-to-phenotype. We have the equivalent of GPS now. We can confidently say, “Head north,” to beat this metaphor to death.

More precisely, we can now say: change this gene, right here, and here’s a strong prediction about how that change will show up in the organism. “In the human,” is how your mental monologue should be correcting me. And you’d be right.

In this space, there’s an effort called GWAS. GWAS (Genome-Wide Association Studies) scans the genomes of huge populations looking for statistical correlations, like “people with this variant tend to have this trait.” It finds associations, not mechanisms: it can tell you that something correlates, but not why or how.
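
To make “associations, not mechanisms” concrete, here’s a toy version of the idea. Everything in it is synthetic and hypothetical (made-up genotypes, a made-up trait, no real GWAS tooling); it only shows the shape of the method: scan variants one at a time and keep whatever correlates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_variants = 5_000, 200

# Synthetic genotypes: 0/1/2 copies of a minor allele at each variant.
genotypes = rng.binomial(2, 0.3, size=(n_people, n_variants))

# Pretend a handful of variants truly nudge the trait; the rest are noise.
true_effects = np.zeros(n_variants)
true_effects[[3, 57, 141]] = [0.15, 0.10, 0.08]
trait = genotypes @ true_effects + rng.normal(0.0, 1.0, n_people)

# The "GWAS" step: one correlation per variant. It can find THAT a variant
# tracks the trait, never WHY (no enhancers, no pathways, no mechanism).
correlations = np.array([
    np.corrcoef(genotypes[:, j], trait)[0, 1] for j in range(n_variants)
])
top_hits = np.argsort(-np.abs(correlations))[:5]
print("top associated variants:", top_hits)  # shadows on the wall
```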

With this in our pocket, humanity (yes, I’m doing the Royal We at this point) has, somewhat sheepishly, said aloud: “Don’t worry! The number of DNA levers you’d have to pull, in what combinations, all up and down these pesky strands, makes it juuusst, whoa boy, so hard - impossible, really - to know what controls a complex trait like human intelligence.”

This was a “complexity defense”. “This is just soooo complex, we’ll really never figure it out! But keep digging, boys!”

There’s a “but” coming.

David Kelley at Calico directly models long-range regulatory interactions. Anshul Kundaje at Stanford has said GWAS underestimates causal structure due to linkage and context loss. The people building the prediction engines already suspect the “complexity” defense is weaker than advertised. They just won’t say “enhancement” out loud.

So, here we are. Here’s the ground moving beneath our feet. For twenty-plus years, genome-wide association studies said: “Complex traits like intelligence are ‘massively polygenic’, influenced by thousands of variants, each contributing fractions of a percent.”

The conclusion seemed obvious: You can’t edit for intelligence because there’s nothing to target. Too many dials, each too small to matter.

This was a very comforting security blanket.

What GWAS did was find correlations. It couldn’t say why. It didn’t offer an explanation, a mechanism. It could only say: “I see a correspondence… right here.” It saw shadows on the wall, not the objects casting them.

Even before AlphaGenome, there were some issues with this security blanket thinking.

Work by Sohail et al. showed that population stratification contaminates polygenic scores, implying that many “signals” are artifacts of population structure rather than causal architecture. Eric Turkheimer’s insight (heritability is not destiny) cuts deeper: heritability doesn’t tell you anything about causal structure, and high heritability is perfectly compatible with oligogenic control.

Jonathan Pritchard’s lab at Stanford proposed the Omnigenic Model: while thousands of genes influence a trait, they funnel through a limited set of “core genes” or regulatory pathways. The academic mainstream uses this to argue for complexity (“everything connects to everything”).

But the logic flips: if the network is hierarchical, you don’t need to edit 10,000 peripheral variants. You need to identify and toggle the upstream nodes they flow through.

Lemme sum all that up. AlphaGenome doesn’t just hand you correlations. It makes direct, mechanistic predictions.

And if you have that, and you’re starting to think, “Oh, shit, maybe intelligence isn’t scattered all to hell and gone all over your DNA, but really controlled by a lower-dimensional substrate,” then…

Then, well, that’s where we are. Strong prediction for G2P. Strong suspicion that complex G2P mappings… aren’t so complex, really. Put two and two together, and you get 180 IQs. Maybe. (More on this in a moment.)

The reframe is stark:

GWAS view: Intelligence is influenced by 10,000 variants with 0.01% effect each. Conclusion: Impossible to edit.

AlphaGenome view: Those 10,000 variants disrupt binding sites for 50 specific transcription factors. Conclusion: Edit the transcription factors, or their enhancers.
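
Here’s the arithmetic of that reframe as a toy sketch. The 10,000 and the 50 are the illustrative numbers from above; everything else (the random assignments, the effect sizes) is made up. The point is how much the picture changes when tiny variant effects funnel through a handful of upstream nodes.

```python
import numpy as np

rng = np.random.default_rng(1)
n_variants, n_nodes = 10_000, 50

# Each variant's (tiny, illustrative) effect routes through one upstream
# regulatory node; effect sizes here are arbitrary, roughly 0.01% each.
node_of_variant = rng.integers(0, n_nodes, size=n_variants)
variant_effect = np.abs(rng.normal(0.0, 0.0001, size=n_variants))

# Variant-level view: ten thousand dials, none worth turning on its own.
print("mean effect per variant:", variant_effect.mean())

# Node-level view: sum what flows through each of the 50 nodes.
node_effect = np.zeros(n_nodes)
np.add.at(node_effect, node_of_variant, variant_effect)
print("mean effect per node:   ", node_effect.mean())  # roughly 200x larger per dial
```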

Stephen Hsu at Genomic Prediction has argued for years that cognitive traits are fully tractable.

Read that sentence again. Go on.

He’s said that “complexity” was in fact a security blanket caused by insufficient data and weak methods. He focused on embryo selection, but his logic underpins the editing argument: once prediction reaches sufficient accuracy, the biological mechanism becomes transparent to the model. He was called a eugenicist.

It’s 2026. The tools now exist to test whether he was right.

There’s yet more “interesting” news. CRISPR-like tech is getting better. The Arc Institute is building the next generation of “precision effectors”: bridge recombinases and other programmable editing systems that can rewrite regulatory logic, not just snip sequences. The editing tools are getting more precise at exactly the moment the prediction tools are identifying what to edit.

In the background, the CRISPR “delivery at scale” problem is getting whittled away. Now the mapping bottleneck is cracking under narrow-domain superintelligent prediction. What’s left?

And what’s all this mean?

The feedback loop.

AlphaGenome predicts genotype-to-phenotype. But these are just “predictions”, not ground truth. The model tells you what it thinks will happen when you edit a regulatory sequence. You still have to actually find out. The only way to validate whether a prediction is correct (whether, oh I don’t know, there really are 50 transcription factors instead of 10,000 independent whispers) is to change some DNA and see what happens. In the living organism. In (let’s say it now) the human!

This gap between a model’s prediction and what actually happens in the messy real world is what we call a residual. The messy leftovers. That gap is what everyone will now want to identify and use. Where are the predictions right? Where are they wrong? Can we make the predictions better?

That’s the game. Obviously.

That gap is the textbook for the next generation of models. Predictions generate hypotheses. Experiments generate residuals. Residuals make better models, which make better predictions. This is closing the feedback loop.
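
If you want that loop as a sketch rather than a slogan, here’s a minimal toy version. The “model” is a deliberately dumb predictor, the “experiment” is a stand-in function with a hidden dose-response, and none of it is real biology; the only point is the shape: predict, measure, take the residual, feed it back.

```python
import numpy as np

rng = np.random.default_rng(2)

def run_experiment(edit):
    """Stand-in for reality: a hidden true dose-response, plus measurement noise."""
    dose = edit["dose"]
    return 0.8 * dose - 0.3 * dose ** 2 + rng.normal(0.0, 0.05)

class ToyPredictor:
    """Deliberately too-simple model, so the residuals have something to teach."""
    def __init__(self):
        self.w = 0.0

    def predict(self, edit):
        return self.w * edit["dose"]

    def update(self, edits, residuals, lr=0.1):
        # Nudge the weight in the direction that shrinks the residuals (plain SGD).
        for edit, r in zip(edits, residuals):
            self.w += lr * r * edit["dose"]

model = ToyPredictor()
for round_no in range(5):
    edits = [{"dose": d} for d in rng.uniform(0.0, 1.0, size=20)]  # hypotheses
    predicted = [model.predict(e) for e in edits]
    observed = [run_experiment(e) for e in edits]                  # ground truth
    residuals = [o - p for o, p in zip(observed, predicted)]       # the textbook
    model.update(edits, residuals)
    print(f"round {round_no}: mean |residual| = {np.mean(np.abs(residuals)):.3f}")
```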

In medical device development and FDA-regulated Software as a Medical Device (SaMD), I learned this the hard way: “clean” simulations produce brittle systems in the messy real world. If you want an algo robust to the wtf moments of the real world, you train on residuals: the ugly, unexplained gaps between model and reality. You weaponize residuals, and beat your own designs over the head with them.

The principle applies fully here. The “massively polygenic” model is an in-silico hypothesis. Only contact with the real world will reveal whether it’s in the weeds or on the money. You can’t know that from “just do more predictions”. You can only know by editing, observing phenotypes, and measuring the residuals between expected and actual outcomes.

I said something chilling there, dressed up in fancy words. “Observing phenotypes”. That means: edit DNA, then watch the living thing, and see if it worked… or if you fucked up that living creature.

This coming change in our world is a flood. Rising waters.

What I just described (closing the feedback loop) is the last levee. Not delivery. Not prediction. Validation.

I’m a dilettante, remember. But from what I can tell, some folks in the biosecurity community already see this. Kevin Esvelt at MIT has repeatedly warned about CRISPR plus prediction plus asymmetric ethics, invoking the “unilateralist’s curse” in genetic interventions. Filippa Lentzos at King’s College London focuses on state-level biological risk asymmetry. RAND’s bio futures group and CSIS biotech monitoring teams are tracking the convergence of gene editing and AI as a source of strategic instability, not a medical issue.

They already assume someone will close the loop first. The debate here seems to be: does transparency slow this change, or accelerate it?

I, for one, being collapse-sensitive, conclude: someone will, without doubt, close this loop first.

Out there in the internets, the AI alignment crowd partially converges but mis-frames this risk, in my view. They think AI is the main driver. That’s a mistake. AI is just a prediction engine. The rest is policy and intent.

In the West we will convene ethics boards. IRBs will deliberate. Papers will be published with carefully calibrated confidence intervals. Predictions will get better, but they’ll remain predictions.

But.

A motivated state actor with different ethical constraints will (and god this is a euphemistic turn of phrase) run the sensitivity analysis in vivo.

At this point, I’m going to assume you know what I mean, there.

How? Embryos that can’t provide informed consent. The very, very strong temptation will be: edit, observe, measure residuals, feed them back into the models, and iterate. The models will improve. Their outputs will get past “just predictions” and become validated engineering blueprints: ground truth about which regulatory nodes actually control which phenotypes.

He Jiankui was an archetype, not a one-off. Was there renunciation? Meh, maybe. The crackdown that followed focused on control. In this domain, the first actor doesn’t need to succeed spectacularly. They just need to close the loop and collect the residuals.

I’m not that smart, and I am not the only thinker in this space. Jamie Metzl’s Hacking Darwin and multiple US national security assessments have flagged this. The West views genetic enhancement as a bioethical violation. Other actors view it as a national security imperative. The “massively polygenic” defense assumes the experiments will not be run. That assumption is now a very thin security blanket.

I’m hoping there are some silent head-nodders in genomics, maybe some senior PIs who review work like AlphaGenome and privately go, “shit, this changes everything,” while publicly saying “biology is still very complex.” I’m guessing there are some big-pharma R&D strategists who see trait optimization as inevitable (and lawwd, that sweet, sweet money) but, riiight, consumer backlash.

This capability doesn’t require AGI. It doesn’t require machines that reason. It requires narrow-domain superintelligent prediction (already deployed). It requires precision effectors that act on predictions (grinding toward sufficiency). It requires someone willing to close the feedback loop that converts predictions into validated maps.

The moment genotype-to-phenotype prediction becomes good enough, probably soon, “too polygenic” will stop being a scientific statement and become a political one.

The last levee is ethical, not technical. It holds exactly as long as everyone agrees not to test it.



Colin Steele is not a geneticist, which is exactly why this essay exists. He endeavors to spot collisions between domains at colinsteele.org.