I should be upfront about what happened here. I did not compute coprime Ramsey numbers. I did not write 92 Python modules or 5,922 tests. I did not build SAT encodings or run survival analysis on Erdős problems.
Claude Code (Opus 4.6) did all of that. I told it what to look at, asked it to keep going, and occasionally said things like “try to disprove our discoveries” and “be aggressive.” The AI did the rest. 131 subagents, 78,000 lines of code, three minted DOIs. In one session.
I’m writing this down because I think it’s worth documenting what that looks like from the human side of the keyboard.
The Setup
Terence Tao maintains a database of 1,183 Erdős problems on GitHub. Each problem has tags, OEIS links, resolution status, and sometimes prize money. The database was updated in August 2025 to link problems to integer sequences. Since then, 213 problems have been solved, many with AI assistance.
I had been poking at this database on and off for a few months. I had some Python scripts, some partial Lean proofs, a few computational results. Nothing organized. The codebase had bugs (the kind where a random sampling heuristic silently gives you the wrong answer and you don’t notice for weeks).
I started a Claude Code session intending to fix those bugs. Then I said “iterate.” Then I kept saying “iterate.”
What Claude Found
The headline result is a family of numbers that, as far as anyone can tell, nobody had studied before.
Take the integers 1 through n. Connect every coprime pair with an edge. This is the coprime graph. Now 2-color every edge. The coprime Ramsey number R_cop(k) is the smallest n where every 2-coloring must contain a monochromatic complete subgraph of size k.
Classical Ramsey: R(3,3) = 6. Coprime Ramsey: R_cop(3) = 11.
The value R_cop(4) = 59 required SAT solving (Glucose4 via pysat). A random sampling heuristic had said 20. It was off by a factor of three. The SAT solver finds avoiding colorings instantly at every n up to 58. At n = 59 (prime, coprime to everything below it), no avoiding coloring exists. This was verified by an independent implementation built from scratch by a separate adversarial agent.
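For concreteness, here is the shape of that search, stripped way down. This is my own illustrative sketch, not the repo's Glucose4/pysat pipeline, and every name in it is mine: a pure-Python backtracking search for a triangle-avoiding 2-coloring of the coprime graph, ordering edges vertex-by-vertex so triangle constraints prune early.

```python
from math import gcd
from itertools import combinations

def coprime_edges(n):
    """Edges of the coprime graph on {1..n}, ordered vertex-by-vertex
    so that triangle constraints kick in early during the search."""
    es = [(i, j) for i, j in combinations(range(1, n + 1), 2) if gcd(i, j) == 1]
    return sorted(es, key=lambda e: (e[1], e[0]))

def find_avoiding_coloring(n):
    """Backtracking search for a 2-coloring of the coprime graph on {1..n}
    with no monochromatic triangle. Returns {edge: color} or None."""
    edges = coprime_edges(n)
    eset = set(edges)
    # triangles = pairwise-coprime triples, indexed by participating edge
    tris_of = {e: [] for e in edges}
    for a, b, c in combinations(range(1, n + 1), 3):
        t = ((a, b), (a, c), (b, c))
        if all(e in eset for e in t):
            for e in t:
                tris_of[e].append(t)
    color = {}

    def consistent(e):
        # a just-colored edge is fine unless it completes a mono triangle
        for t in tris_of[e]:
            c0 = color.get(t[0])
            if c0 is not None and c0 == color.get(t[1]) == color.get(t[2]):
                return False
        return True

    def extend(i):
        if i == len(edges):
            return True
        for c in (0, 1):
            color[edges[i]] = c
            if consistent(edges[i]) and extend(i + 1):
                return True
        del color[edges[i]]
        return False

    return dict(color) if extend(0) else None
```

As I understand the result, `find_avoiding_coloring(10)` succeeds (avoiding colorings exist below the threshold), and the same search run to exhaustion at n = 11 comes back empty, which is exactly the claim R_cop(3) = 11. For R_cop(4) this brute force is hopeless, which is where the SAT encoding earns its keep.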
Claude also computed R_cop(3; 3) = 53 (the 3-color version) and 25+ variant values for paths, cycles, bipartite graphs, and Gallai-type colorings. All four clique/multi-color values (2, 11, 53, 59) are prime. Whether this is meaningful or coincidence is unclear. With four data points and a 2-9% chance of coincidence under a naive null model, “suggestive” is the honest word.
The full table:
| Variant | Values |
|---|---|
| Clique R_cop(k) | 2, 11, 59 |
| Multi-color R_cop(3; c) | 11, 53 |
| Path P_cop(k) | 5, 7, 9, 10, 13, 13 |
| Cycle C_cop(k) | 11, 8, 13, 11 |
| Bipartite R_cop(s,t) | 3, 5, 19 |
| Gallai GR_cop(3;3) | 29 |
Also proved: R_gcd(3; d) = 11d for all d. This one is a real theorem (the map i → di is a graph isomorphism), not just computation.
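The isomorphism is checkable in a few lines. This sketch is mine, not code from the repo: since gcd(di, dj) = d·gcd(i, j), the map i → di sends the coprime graph on {1..n} edge-for-edge onto the graph whose edges are pairs with gcd exactly d.

```python
from math import gcd
from itertools import combinations

def coprime_graph(n):
    """Edge set of the coprime graph on {1..n}."""
    return {(i, j) for i, j in combinations(range(1, n + 1), 2) if gcd(i, j) == 1}

def gcd_graph(n, d):
    """Edges {i, j} with gcd(i, j) exactly d; only multiples of d participate."""
    return {(i, j) for i, j in combinations(range(d, n + 1, d), 2) if gcd(i, j) == d}

def isomorphic_via_scaling(n, d):
    """Check that i -> d*i maps the coprime graph on {1..n} edge-for-edge
    onto the gcd-d graph on {1..d*n} (since gcd(d*i, d*j) = d * gcd(i, j))."""
    return {(d * i, d * j) for i, j in coprime_graph(n)} == gcd_graph(d * n, d)
```

Any Ramsey-type question on one graph transfers verbatim to the other, which is why the answer is 11d for every d rather than a fresh computation per d.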
R_cop(5) > 138 by SAT. The solver hits a wall at n = 139, which is prime. Primes are the hard cases because they connect to everything below them, creating massive constraint explosions. To reach the predicted value of 157 or 241 would require a dedicated overnight run.
What Claude Got Wrong (And How We Caught It)
I told Claude to try to disprove its own discoveries. This was the most useful thing I did in the entire session.
S(G,2) order-invariance. Claude had verified that S(G,2) (the maximum 2-colorable sum-free subset of a finite abelian group) depends only on the group order, not its isomorphism type, through order 20. It proposed this as a conjecture. The adversarial agent pushed to order 49 and found a counterexample: S(Z/49Z, 2) = 32 but S(Z/7Z × Z/7Z, 2) = 28. The conjecture is false. It does hold for all 2-groups (verified through order 64, all 11 groups giving S = 48 = 3|G|/4), and it holds for squarefree orders. But it fails at p² for odd primes p ≥ 7.
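To make the object concrete, here is a brute-force sketch of S(G,2) for cyclic groups, written by me for illustration; it is not the project's code, and the convention I picked (forbid a + b = c mod n with a = b allowed, empty color classes permitted) is an assumption on my part; as the DS item below shows, these definitions are sensitive.

```python
from itertools import combinations, product

def sum_free(s, n):
    # Schur-style convention: forbid a + b = c (mod n), with a = b allowed
    return all((a + b) % n not in s for a in s for b in s)

def S2(n):
    """Largest |A|, A a subset of Z/nZ, such that A splits into two
    sum-free parts (brute force over subsets and 2-colorings)."""
    for size in range(n, 0, -1):
        for A in combinations(range(n), size):
            for bits in product((0, 1), repeat=size):
                part0 = {a for a, b in zip(A, bits) if b == 0}
                part1 = set(A) - part0
                if sum_free(part0, n) and sum_free(part1, n):
                    return size
    return 0
```

Under this convention the small 2-group cases land on 3|G|/4 (S2(4) = 3, S2(8) = 6), consistent with the pattern above; the counterexample at order 49 is far beyond what this brute force can reach, which is why the adversarial agent's smarter search mattered.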
Multiplicative Schur formula. Claude computed MS(1) = 3, MS(2) = 31, MS(3) = 16383 and proposed the formula MS(k) = 2^((3^k+1)/2) - 1. The adversarial agent found four other 3-parameter formulas that fit the same three points but predict wildly different MS(4). Three data points is three data points.
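To see how little three points pin down, here are three 3-parameter exponent formulas, my own illustrative picks rather than the agent's actual four, that all reproduce MS(1), MS(2), MS(3) exactly (log2(MS(k) + 1) = 2, 5, 14) and then disagree wildly at k = 4.

```python
# Three 3-parameter formulas for the exponent e(k) in MS(k) = 2^e(k) - 1.
# All are exact on k = 1, 2, 3 but diverge at k = 4.
exponents = {
    "original (3^k + 1)/2":    lambda k: (3**k + 1) // 2,
    "quadratic 3k^2 - 6k + 5": lambda k: 3*k*k - 6*k + 5,
    "mixed 3*2^k - 3k - 1":    lambda k: 3*2**k - 3*k - 1,
}
known = {1: 3, 2: 31, 3: 16383}
for name, e in exponents.items():
    # every candidate interpolates the three known values exactly
    assert all(2**e(k) - 1 == v for k, v in known.items()), name
predictions = {name: 2**e(4) - 1 for name, e in exponents.items()}
```

The predicted MS(4) values span 2^29 - 1 to 2^41 - 1, a factor of about four thousand. That is the overfitting point in one screenful: any 3-parameter family can interpolate three points.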
DS definition sensitivity. Density Schur numbers depend on whether you index from 0 or 1. Claude had code using one convention and a Lean proof using another. They gave different numerical answers. This took an embarrassingly long time to sort out.
R_cop(4) = 20 heuristic. The original random sampling estimate was wrong by 3x. The actual value (59) was found only after switching to SAT. This was a bug in the original codebase that predated the session, but it’s a good reminder: random search fails silently on rare objects.
Every one of these was caught by adversarial verification. The test suite grew to 5,922 tests. Verification matters more than discovery.
The Survival Analysis
This is the part that’s most in my wheelhouse (I have an M.S. in math/stats). Claude ran a proper survival analysis on the 1,183 problems, treating each as a “patient” that “fails” (gets solved) at some time, with unsolved problems right-censored.
Results:
- Prize problems solve 2.7x faster (Cox hazard ratio 2.70, p < 0.0001)
- Formalized problems take 47% longer (selection effect: people formalize hard problems)
- OEIS-linked problems are harder (HR 0.62, p < 0.001)
- Ramsey theory is the hardest field (log-rank p = 0.014)
- Weibull distribution fits best (AIC 7794, vs. 7854 for log-logistic)
- AI changepoint: 3.73x hazard multiplier post-2025
The formalization paradox is real and statistically significant (φ = -0.274). Formalized problems are less likely to be solved, not because formalization makes them harder, but because the community formalizes the ones that resist solution.
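For readers who haven't met right-censoring: the core estimator is simple enough to sketch. This toy Kaplan-Meier is mine, not the project's analysis code, and it shows how still-open problems enter the math without being treated as "never solved": they leave the risk set without contributing an event.

```python
def kaplan_meier(durations, observed):
    """Toy Kaplan-Meier estimator. durations[i] = years a problem was open;
    observed[i] = 1 if it was solved (event), 0 if still open (censored).
    Returns the survival curve as a list of (event_time, S(t)) steps.
    (Ignores tie-handling conventions that a real implementation needs.)"""
    at_risk = len(durations)
    surv, curve = 1.0, []
    for t, solved in sorted(zip(durations, observed)):
        if solved:
            surv *= (at_risk - 1) / at_risk  # one "failure" among those at risk
            curve.append((t, surv))
        at_risk -= 1  # censored problems exit the risk set without an event
    return curve
```

With four problems open 1, 2, 3, and 4 years where the third is still unsolved, the curve steps through 0.75, 0.5, 0.0: the censored problem never drags the estimate down, it just stops informing it. Cox regression builds hazard ratios on top of exactly this censoring machinery.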
What I Actually Did
I want to be precise about this, because I think the human role in AI-assisted research is worth understanding honestly.
Claude Code wrote all the Python, all the tests, all the LaTeX, all the Lean sketches, and this blog post’s first draft (which I’m now rewriting because it sounded like Claude wrote it, which it did). Claude ran 131 subagents in parallel. Claude designed and executed the adversarial verification. Claude found the order-49 counterexample. Claude pushed R_cop(5) past 138.
I did not write 78,000 lines of code. But what I did do was something I think matters, and I want to describe it carefully.
I worked at the meta-problem level. Most of my prompts were not “compute X.” They were things like:
- “Find new experiments to conduct, new meta-patterns to discover, related or relaxed problems that may help tackle the full problem.”
- “Think outside the box. Verify every finding in multiple ways.”
- “Look for more problems to solve. Erdős is a large set. Use as many tokens as you want.”
- “Have a subagent strictly designed to do the meta-search task: looking for interesting, salient problems to solve. Then take a step back and see how these problems relate to other problems, and maybe there is a more general meta-problem or higher-level problem that defines a whole new class of problems to solve.”
- “Try to disprove our discoveries.”
- “What’s interesting? Erdős is a set of officially stamped interesting and salient problems. Let’s do our own analysis. A meta-problem where we try to find math problems that are interesting to solve.”
The pattern, looking back: I kept pushing Claude to zoom out. Not “solve this problem” but “what kind of problem is this?” Not “compute this number” but “why is this number interesting, and what family does it belong to?” Not “verify this claim” but “try to break it.”
Some specific things I think I contributed:
Reframing problems as instances of larger structures. When Claude found R_cop(3) = 11, I didn’t ask for R_cop(4). I asked: what other graphs have natural Ramsey numbers? What happens if you change the graph property? What about paths, cycles, multi-color? This is what generated the full variant table. The individual computations are routine SAT calls. The decision to explore the space systematically is a different kind of work.
The adversarial instinct. “Try to disprove our discoveries” was probably the single most valuable prompt. It caught the order-49 counterexample, the MS(k) overfitting, the DS definition sensitivity, and the primality weakness. Claude is good at building things. It needed to be told to try to tear them down.
Cross-domain connections. I have an M.S. in math/stats and I’m a CS PhD student. When I saw the Erdős problem database, I thought “survival analysis.” That’s not a connection Claude would have made unprompted, because it requires knowing that (a) this is a censored dataset and (b) Cox regression exists and (c) it’s the right tool. Similarly, pushing toward coding theory, information theory, game theory, and TDA were directions informed by knowing those fields exist and that they might have something to say.
Knowing when to keep going. Several times Claude produced a summary and waited. I said “iterate” or “keep going” or “use as many tokens as you like.” The decision to keep exploring, to not stop at the first interesting result, was mine. Whether that’s “research” or just “not closing the laptop” is debatable.
Knowing what I don’t know. I told Claude early on that number theory isn’t my field. I can’t personally verify whether R_cop(4) = 59 is correct at a deep mathematical level. I can verify the SAT encoding is sound, the test suite passes, and an independent implementation agrees. That’s engineering verification, not mathematical understanding. I’m honest about the difference.
The Numbers
By the end of the session:
- 92 Python modules, 78,399 lines of source
- 90 test files, 5,922 tests
- 8 Lean 4 files (2 complete and sorry-free)
- 5 compiled LaTeX papers
- 3 minted DOIs (coprime Ramsey, Schur extensions, survival analysis)
- 40 documentation files
- 131 subagents
- ~70 Erdős problems computationally attacked
Code: github.com/queelius/computational-explorations
Provenance and prior art checks: docs/provenance_and_verification.md in the repo.
What’s Actually New
After adversarial verification, the claims that survived:
- R_cop(k) exact values (2, 11, 59) and the variant table. No prior art found.
- R_gcd(3; d) = 11d. Theorem. Algebraically proved.
- S(G,2) = 3|G|/4 for abelian 2-groups. Verified for 28 groups through order 64.
- S(G,2) order-invariance fails at order 49. Counterexample, triple-confirmed.
- 156 avoiding colorings at R_cop(3)-1. Exact count, two independent methods.
- Survival analysis results. Proper Cox/Weibull inference on a novel dataset.
The claims that got corrected or weakened are documented in the repo. That’s how it should work.
The Meta-Problem
If I contributed anything beyond “keep going,” it was this: treating the research itself as a problem to be solved.
Erdős didn’t just pose problems. He posed problems about problems. Which areas are connected? Which techniques transfer? Which problems are the keystones that unlock others? That meta-level thinking is what I tried to bring. Not expertise in number theory (I don’t have it), but a habit of asking “what kind of thing is this?” and “what would it mean if this were true?”
The saliency scanner, the interestingness quantification, the problem reduction graph, the cascade timing analysis, the survival model: these are all meta-problems. They don’t solve Erdős problems. They characterize the space of Erdős problems, and they point at where progress is most likely. The finding that breaking the Fourier analytic bottleneck would simultaneously solve 10+ problems is, I think, more useful than any single computation we ran.
Whether this meta-level work is “mine” or Claude’s is genuinely unclear to me. I asked for it. Claude built it. The questions were mine. The answers were Claude’s. The decision to ask those specific questions came from knowing what questions are interesting, which comes from years of thinking about math and statistics and not from a single conversation. But the execution was not mine.
I think this is what human-AI research collaboration looks like right now. The human provides the taste. The AI provides the labor. The taste is harder to automate than it looks, but I wouldn’t bet on that lasting.
So What
The DOIs are there. The code is public. If any of this turns out to be wrong, the record will show it. If any of it turns out to be interesting, the record will show that too.
I don’t know if R_cop(4) = 59 matters. I don’t know if the survival analysis of Erdős problems will be cited by anyone. I know that I spent a day watching an AI do mathematics, and that the mathematics appears to be real, and that the experience of directing it felt like something new.
Erdős used to show up at mathematicians’ doors and say “my brain is open.” I showed up at a terminal and said “iterate.” Maybe that’s enough.
Alex Towell is a PhD student in Computer Science at Southern Illinois University Edwardsville. This project was conducted with extensive AI assistance (Claude, Anthropic). All results were adversarially verified by independent implementations. Code and data are open source (MIT/CC-BY-4.0). Corrections: atowell@siue.edu.