Computers are great with numbers, yet they have put few mathematicians out of work. Until recently, they could barely compete even in high-school mathematics olympiads.
But now Google DeepMind has created AlphaProof, an artificial intelligence system that matched the performance of silver medalists at the 2024 International Mathematical Olympiad. It finished just one point short of gold in the world's most prestigious mathematics competition for secondary-school students. This is a significant milestone.
True Understanding
Computers have performed poorly in math competitions because they are not nearly as good at the logic and reasoning that advanced mathematics requires, even though they far outstrip humans at calculation. In other words, they can calculate very quickly but usually have little understanding of why they are doing it. Even an operation as simple as addition can be reasoned about rigorously. People can write semi-formal proofs based on the definition of addition, or fully formal proofs in a system such as Peano arithmetic, which uses axioms to define the natural numbers and operations on them, such as addition.
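To give a rough sense of what a fully formal proof looks like, here is a minimal sketch in the Lean 4 proof assistant. Lean is chosen here only for illustration. The sketch defines a hypothetical `MyNat` type in the spirit of Peano arithmetic, where every number is either zero or the successor of another number. It then defines addition by recursion and derives 1 + 1 = 2 and 0 + n = n from those definitions alone. This is a simplified teaching example, not a textbook axiomatization and not how AlphaProof itself represents problems.

```lean
-- Hypothetical Peano-style naturals: a number is zero or the successor of a number.
inductive MyNat where
  | zero : MyNat
  | succ : MyNat → MyNat

namespace MyNat

-- Addition by recursion on the second argument:
--   n + 0 = n   and   n + succ m = succ (n + m)
def add : MyNat → MyNat → MyNat
  | n, .zero   => n
  | n, .succ m => succ (add n m)

def one : MyNat := succ zero
def two : MyNat := succ one

-- 1 + 1 = 2 holds just by unfolding the definitions of add, one and two.
theorem one_add_one : add one one = two := rfl

-- 0 + n = n does not follow by unfolding alone; it needs induction on n.
theorem zero_add (n : MyNat) : add zero n = n := by
  induction n with
  | zero => rfl
  | succ m ih => exact congrArg succ ih

end MyNat
```

The second theorem shows why formal proofs grow long. A fact that feels obvious still has to be justified step by step from the definitions.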
To write a proof, a person must understand the very structure of mathematics. The way mathematicians construct proofs, how many steps they take to reach a conclusion, and how cleverly they do it reveal their brilliance and ingenuity and the elegance of their mathematics. “You know, Bertrand Russell published a 500-page book to prove that one plus one equals two,” says Thomas Hubert, a DeepMind researcher and lead author of the AlphaProof study.
The DeepMind team wanted to build an AI that understood mathematics at this level. The work began by tackling a common problem in AI: a lack of training data.
A Translator for Mathematical Problems
Large language models used in artificial intelligence systems such as ChatGPT are trained on billions upon billions of pages of text. Because their training data includes mathematical texts, such as textbooks and the works of famous mathematicians, they show some success at proving mathematical statements. But the way they work limits them. They rely on huge neural networks to predict the next word, or token, in a sequence generated in response to a user's prompt. Their reasoning is statistical in nature, which means they simply return answers that “sound” right.
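The toy program below is a deliberately tiny stand-in for next-word prediction, not a model of how ChatGPT works internally. Instead of a neural network, it simply counts which word follows which in a hypothetical training text. It then extends a prompt by repeatedly choosing the most frequent next word. The function names and the corpus are invented for illustration.

```python
from collections import Counter, defaultdict
from typing import Optional


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows: dict[str, Counter], word: str) -> Optional[str]:
    """Return the statistically most likely next word, or None if the word was never seen."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    # Ties go to the follower that appeared first in the training text.
    return candidates.most_common(1)[0][0]


def generate(follows: dict[str, Counter], start: str, length: int) -> list[str]:
    """Greedily extend a prompt one predicted word at a time."""
    out = [start.lower()]
    for _ in range(length):
        nxt = predict_next(follows, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return out


# Hypothetical toy corpus standing in for "billions of pages" of text.
CORPUS = "one plus one equals two . two plus two equals four . one plus two equals three ."

if __name__ == "__main__":
    model = train_bigrams(CORPUS)
    print(" ".join(generate(model, "one", 4)))  # prints "one plus two equals two"
```

On this corpus, the greedy generator produces "one plus two equals two". The sentence follows the statistical patterns of its training text, yet it is mathematically false. Nothing in the procedure checks whether the output is true, only whether it resembles what came before.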