Summary September 19, 2025
Every three months, forecasters in the Metaculus Cup compete to predict the future for a prize pool of around $5,000. Metaculus, a forecasting platform, poses questions of geopolitical significance such as: “Will there be a military coup in Thailand before September 2025?” and “Will Israel strike the Iranian military again before September 2025?”
Forecasters estimate the likelihood of events occurring—a more informative guess than a simple “yes” or “no”—weeks or months in advance, often with striking accuracy. Metaculus users correctly predicted the date of the Russian invasion of Ukraine two weeks ahead of time, and put a 90 percent chance on Roe v. Wade being overturned almost two months before it happened.
But one of the top-10 finishers in the Summer Cup tournament, whose winners were announced on Wednesday, surprised even the forecasters: an artificial intelligence. “It’s actually quite mind-boggling,” says Toby Shevlane, CEO of Mantic, a recently launched British AI startup. When the competition opened in June, participants predicted that the top bot's score would reach 40% of the average of the top humans. Instead, Mantic scored over 80%.
“Forecasting is everywhere, right?” says Nathan Manzotti, who has worked on artificial intelligence and data analytics at the Department of Defense, the General Services Administration, and about a half-dozen other U.S. government agencies. “Pick a government agency and they're bound to be making some predictions.”
Forecasters help institutions anticipate the future, explains Anthony Vassalo, co-director of the Forecasting Initiative at RAND, a U.S. policy think tank. They also help institutions change it. Forecasting geopolitical events weeks or months in advance helps “stop surprise” and “helps decision makers make decisions,” Vassalo says. Forecasters update their forecasts based on the policies lawmakers adopt, so they can predict how hypothetical policy interventions might change future outcomes. If decision makers are headed down an undesirable path, forecasters can help them “change the scenario they're in,” Vassalo says.
But forecasting broad geopolitical questions is notoriously difficult. A forecast from leading human forecasters can take days and cost tens of thousands of dollars for a single question. For an organization like RAND, which tracks a wide range of topics across multiple geopolitical regions, “it would take months for human forecasters to make an initial forecast on all of these issues, let alone regularly update it,” Vassalo says.
Machine learning has long been useful in domains with large, well-structured data, such as weather forecasting or quantitative trading. But when forecasting geopolitics or technological progress, “you'll have a lot of complex, interdependent factors,” and human judgment has been harder to beat, says Deger Turan, CEO of Metaculus.
Large language models work with the same messy information as human forecasters and are capable of mimicking human judgment. They also improve in much the same way humans do: by making predictions about many things, seeing how they turn out, and updating their forecasting methods based on the results—at a far larger scale than any human can manage.
“Our main insight was that predicting the future is a verifiable problem—because that’s how people learn, right?” says Ben Turtel, CEO of LightningRod, which develops forecasting AIs that rank competitively in Metaculus AI tournaments. The company trained a new model on 100,000 forecasting questions.
That training shows up in the rankings. In June, the top-rated bot, built by Metaculus on OpenAI's o1 reasoning model, finished 25th in the cup. This time, Mantic took eighth place out of 549 participants, the first time a bot has made the top ten in the competition series.
According to Ben Wilson, an engineer at Metaculus who benchmarks AIs against human forecasters, the results should be taken with a grain of salt. The competition draws on a relatively small sample of 60 questions. Moreover, most of the participants are amateurs, some of whom forecast only a handful of the tournament's questions, which drags down their scores.
Finally, machines have an unfair advantage. Participants earn points not only for accuracy but also for “coverage”—how early they make predictions, how many questions they forecast, and how often they update their estimates. An AI that is less accurate than its human competitors can still climb the rankings by constantly updating its estimates in response to breaking news, at a pace no human can match.
For Vassalo, AI's unfair advantage solves his biggest remaining problem: getting high-quality forecasts on every question he needs forecasts for. “I don’t really need it to reach superforecaster level,” he says, using the nickname given to top forecasters. “I need it to be as good as the crowd.”
That's harder than it sounds: the Metaculus community forecast, an aggregation of all users' forecasts on each question, is one of the most consistently accurate signals on the platform. If it were a participant, it would rank fourth on the site—such is the wisdom of the crowd. In this quarter's cup, Mantic finished five places behind the community forecast.
A reliable AI forecaster could monitor hundreds of questions at once, letting Vassalo reserve his best human forecasters for the questions the AI flags as worth a closer look.
“One function of forecasting or predictive analytics is decision support,” says Manzotti. “Many executives will throw the data out the window if their intuition points in a different direction.” That is a problem AI cannot solve.