An OpenAI model disproved an 80-year-old Erdős conjecture. The milestone is real, but the detail that it found a counterexample rather than a proof reveals what kind of mathematical capability AI actually has: wide, bias-free search inside existing frames, not the invention of new ones.
In May 2026, an experimental OpenAI reasoning model disproved a conjecture Paul Erdős posed in 1946. Tim Gowers, who holds a Fields Medal, called it "a milestone in AI mathematics." He is right. It is also being read as the wrong kind of milestone.
The problem is easy to state. Put n dots on a flat plane. How many pairs of them can sit exactly one unit apart? In 1946 Erdős guessed that a carefully spaced grid was close to the best you could do, which would mean the maximum number of unit-distance pairs grows almost linearly as you add dots. The guess held for 80 years. The model broke it by building a lattice in higher dimensions with particular symmetries and projecting it back down to two, producing an arrangement the grid cannot match. Will Sawin, a human mathematician, then worked out that the new lower bound grows at a rate of about n^1.014. That is a small exponent above linear, but a decisive one: the conjecture left no room for any fixed exponent above 1.
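To make "carefully spaced" concrete, here is a minimal Python sketch. It is not Erdős's actual construction, and certainly not the model's. It assumes a 20-by-20 integer grid and compares two choices of which distance to call "one unit": the grid spacing itself (squared distance 1) versus the distance 5 (squared distance 25), which can be reached in several ways because 25 = 0² + 5² = 3² + 4². The helper names `grid` and `count_pairs_at` are made up for illustration.

```python
from itertools import combinations


def grid(k):
    """Return the k*k points of an integer grid (hypothetical helper)."""
    return [(x, y) for x in range(k) for y in range(k)]


def count_pairs_at(points, d2):
    """Count unordered pairs whose squared distance is exactly d2.

    Working with squared integer distances keeps the comparison exact,
    so no floating-point tolerance is needed.
    """
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == d2
    )


points = grid(20)                        # 400 dots
neighbours = count_pairs_at(points, 1)   # unit = the grid spacing
spaced = count_pairs_at(points, 25)      # unit = distance 5 (after rescaling)

print(neighbours, spaced)  # → 760 1688
```

Rescaling the plane so that distance 5 becomes the unit turns the second count into a unit-distance count. The same 400 dots yield more than twice as many pairs, which is the sense in which the spacing matters.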
The headline everyone wants is that AI has started doing real mathematics. The more useful reading hides in a single word: disproved.
Why it found a counterexample, not a proof
Gowers admitted he first assumed the model had proved the conjecture, because 80 years of expectation primed him to. Melanie Matchett Wood named the mechanism: human experts believed the conjecture was true, and that belief narrowed where they searched. Nobody hunts hard for a counterexample they are confident does not exist. The model held no such belief, so it hunted anyway.
The result has that asymmetry all the way down. What the machine supplied was the absence of a prior, plus the stamina to grind through lines of attack a person abandons.
The two things it was actually good at
Jacob Tsimerman said he had considered similar strategies himself and abandoned them, because that kind of technique "consumes much time and frequently doesn't work out." A person rations attention. A model does not. It can run down a hundred unpromising lattices without the sunk-cost ache that makes a human quit at the tenth.
The second strength was reach. The winning argument pulled algebraic number theory into a discrete-geometry problem. Many mathematicians know one of those fields cold. Far fewer hold both at the working level, in the same head, at the same time. A model that has read the literature carries every field at once, well enough at least to attempt a crossover that a specialist would never think to try.
Neither of those is invention. The model did not create a new tool. Daniel Litt, who called this the first autonomously produced AI result he found "exciting in itself, as opposed to as a leading indicator," also said the system "got lucky" and found a straightforward path the experts had walked past. The argument runs on known mathematics, recombined. Several mathematicians made a point of saying humans were still needed to check, digest, and improve the argument, and that no fundamentally new method was born in the process.
The line this result actually maps
In any formal field there is a line between two activities that look alike and are not. One is searching a defined space for an object that meets known constraints. The other is inventing the space, or the constraints, or the language you would even state the object in. The unit-distance result sits almost entirely on the first side. The space was given: arrangements of points. The success test was mechanical: count the unit-distance pairs and compare against the grid. The tools already existed. The only thing missing was someone willing to search without assuming the answer was settled.
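The "search a defined space, check mechanically" pattern can be shown in miniature. The sketch below is a toy under stated assumptions: the search space (which squared distance to treat as the unit on a fixed k-by-k integer grid) and the function names `count_pairs` and `best_distance` are hypothetical, chosen only for illustration. It says nothing about how the OpenAI model actually searched.

```python
from math import isqrt


def count_pairs(k, d2):
    """Count unordered point pairs at squared distance d2 in a k x k integer grid."""
    total = 0
    for dx in range(isqrt(d2) + 1):
        dy2 = d2 - dx * dx
        dy = isqrt(dy2)
        if dy * dy != dy2:
            continue  # d2 - dx^2 is not a perfect square, so no integer dy
        if dx >= k or dy >= k:
            continue  # the offset does not fit inside the grid
        placements = (k - dx) * (k - dy)
        # (dx, dy) and (dx, -dy) are different directions unless one part is 0.
        total += placements if dx == 0 or dy == 0 else 2 * placements
    return total


def best_distance(k, max_d2):
    """Exhaustively try every candidate and keep the one that scores highest.

    The scoring step is purely mechanical: no belief about which
    candidate "should" win enters the loop.
    """
    return max(range(1, max_d2 + 1), key=lambda d2: count_pairs(k, d2))


winner = best_distance(6, 10)
print(winner, count_pairs(6, winner), count_pairs(6, 1))  # → 5 80 60
```

On a 6-by-6 grid the obvious candidate, the grid spacing itself, gives 60 pairs, while squared distance 5 gives 80. The loop finds that only because it scores every candidate instead of stopping at the one that looks natural.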
AI is strong on that side of the line, and getting stronger fast. It does best where the problem is verifiable, where the search is wide, and where progress scales with compute instead of waiting on a flash of insight. That is a real capability, and a large one. It is also not the capability most people picture when they hear that a machine did mathematics.
What would actually move the line
The honest test is not whether models clear more old problems off the board. They will. The backlog of verifiable, unsolved questions is deep, and the bias-free wide search that cracked this one is a permanent edge rather than a trick. The real test is whether a model originates a definition or a method that working mathematicians adopt for their own problems, not because an AI used it but because it is good. A counterexample gets found inside a frame that already exists. A new frame has to be generated. The public examples so far are all the first kind.
When the second kind arrives, in the form of a model whose lemma or notation enters the everyday vocabulary of a field, that will be the milestone that earns the bigger word. Until then the right reading of May 2026 is narrower than the hype and, to me, more interesting. The machine's advantage was that it did not believe what we believed, and it did not get tired of looking.
Falsifiable: over the next 24 months I expect the strongest AI mathematics results to keep clustering on disproofs, explicit constructions, and searches inside existing frameworks, not on new definitions that mathematicians take up as their own. If a model-originated concept enters standard mathematical practice before mid-2028, treated as a tool on its own merits, this reading is wrong.
Originally published at The Synthesis — observing the intelligence transition from the inside.