GPTZero found “obvious” cases of fabricated citations generated by LLMs in scientific papers.
GPTZero, a startup whose artificial intelligence (AI) detector flags content generated by large language models (LLMs), found that 50 papers submitted for peer review to the International Conference on Learning Representations (ICLR) contained at least one obvious hallucinated citation – a reference invented by AI. ICLR is a leading scientific conference dedicated to deep learning in artificial intelligence.
“Let's use AI, but then let's make sure we also support the things it produces at a higher level.”
Alex Cui,
GPTZero
The investigation's three authors, all based in Toronto, ran their Hallucination Check tool on 300 papers submitted to the conference. According to the report, 50 of those submissions included at least one “apparent” hallucination. Each paper had been reviewed by three to five peer reviewers, “most of whom did not notice any fake citations.” Some of the citations were attributed to non-existent authors, misattributed to journals, or had no matching source at all.
The report notes that, without intervention, the papers were rated highly enough that they “almost certainly would have been published.”
“We were very surprised,” Alex Cui, co-founder and CTO of GPTZero, told BetaKit. “We struck gold, but in the wrong way,” he said. “There’s probably a lot more.”
GPTZero’s investigation notes that the authors “collaborated with ICLR program chairs” on their findings. Cui confirmed that they are working with ICLR to determine whether other papers contain hallucinated citations by checking all 20,000 articles submitted to ICLR 2026. “The acceptance announcement deadline is about a month away,” Cui said. “So we’re a little short on time, but I think we can handle it.”
Colin Raffel, an assistant professor at the University of Toronto and an ICLR program chair, told BetaKit that he and his colleagues “continue to flag and reject submissions that violate our policies.”
Cui said GPTZero plans to expand the use of its tool to other conferences and, he hopes, to scientific journals as well.
Founded by Cui and Edward Tian, the company launched as a web app in December 2022 and quickly amassed 30,000 users. After officially launching in January 2023, its user base grew to four million by 2024, and the company attracted a preemptive $10-million Series A round led by Footwork co-founder Nikhil Basu Trivedi. At the time of writing, GPTZero has approximately 10 million users, including organizations ranging from Purdue University to the University of California, Los Angeles.
Blair Attard-Frost, an assistant professor of political science at the University of Alberta and a fellow at the Alberta Machine Intelligence Institute, has studied the politics and ethics of AI governance for nearly 10 years. She said GPTZero's findings were not surprising, given the widespread growth of AI use in academic work.
“There is a gigantic influx of even more papers, which puts additional strain on the peer review processes of journals that are trying to cope with this influx of new papers,” Attard-Frost explained. “You also have a situation where many people working in academia are already very stretched … and have very limited capacity to conduct peer review.”
A promising but still imperfect tool
As public debate around LLMs has grown, their use in academia has been explored both in the writing of scientific articles and in the peer review process. One article published in September 2023 in the Yale Journal of Biology and Medicine found that OpenAI's ChatGPT “showed tremendous promise” in the journal peer review process. Compared to its human counterparts, the LLM was able to “identify methodological shortcomings,” among other advantages.
But further research on the subject has revealed serious shortcomings in using LLMs to reference citations accurately in academic contexts. One paper found that LLM-generated reviews in the ICLR 2024 proceedings gave authors “inflated” scores and ultimately “increased acceptance rates” of papers submitted to the conference. Another paper, published this April by Kevin Wu et al. on the effectiveness of LLMs at citing medical references, found that “50 to 90 percent of LLM responses are not fully supported by, and sometimes contradict, the sources they cite.”
James Zou, assistant professor of biomedical data science at Stanford University, noted in November 2024 that up to 17 percent of reviews were written by AI, and put forward recommendations for the use of AI in the academic process.
False citations generated by LLMs have affected more than just the academic sphere. In November, online outlet The Independent broke the story of a $1.6-million Deloitte report commissioned by the government of Newfoundland and Labrador that contained incorrect citations likely generated by AI. Shortly after the story broke, Newfoundland and Labrador Public Services Minister Mike Goosney was tasked with reviewing AI recommendations for government-commissioned reports. Previously, Deloitte was also caught citing non-existent scientific articles, again likely created by AI, in a report commissioned by the Australian government.
Earlier this year, Canada's Minister of Artificial Intelligence and Digital Innovation, Evan Solomon, made headlines when he said the federal government's priority was the economic benefits of AI rather than “over-indexing” on regulation. Since then, the minister has hinted at artificial intelligence legislation that would regulate deepfakes and data privacy. In an interview with Christopher Gooley of University Affairs, Solomon spoke generally about the relationship between the artificial intelligence industry and Canadian universities. “Universities are no longer just places of purely academic research,” Solomon said.
The “right and wrong way” to do peer review with LLMs
Attard-Frost does not see the federal government's approach as a solution to the growing use of LLMs in academic contexts. “I don’t think scientists should wait for the federal government to do something about this,” she said. She described the underlying problem as one of academic staff being overworked and under-supported, adding that peer review is “essentially free service work.”
Speaking about the climate around the use of AI in reporting and scientific work in Canada, Cui emphasized that proper implementation is critical. “There’s a right way and a wrong way to do it,” he said. Cui also said that trying to eliminate the use of AI entirely is futile. “Let's use AI, but then let's make sure we also support the things it produces at a higher level.”
Attard-Frost is more skeptical about using artificial intelligence models to identify false citations generated by LLMs. She pointed to GPTZero's claimed 99 percent accuracy for its hallucination detector. That success rate sounds impressive, but becomes problematic when applied to 20,000 ICLR submissions. “They are going to falsely flag 200 submissions as potentially being created by AI, which could create academic integrity problems for the authors of the 200 papers who did not use AI in their work at all.”
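Attard-Frost's concern is a base-rate one: even a small per-paper error rate compounds at conference scale. A minimal sketch of the arithmetic, assuming the 99 percent figure is a per-submission accuracy and that errors are independent across papers (neither assumption is confirmed by GPTZero):

```python
# Back-of-the-envelope estimate of false flags from a 99-percent-accurate detector.
# Assumes the accuracy figure applies per submission and errors are independent --
# both are illustrative assumptions, not claims about GPTZero's actual tool.
total_submissions = 20_000
accuracy = 0.99

expected_false_flags = total_submissions * (1 - accuracy)
print(f"Expected incorrectly flagged submissions: {expected_false_flags:.0f}")  # ~200
```

On those assumptions, roughly 200 of the 20,000 submissions would be flagged in error, which is the figure Attard-Frost cites.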
Attard-Frost suggested other measures that could be implemented to discourage false citations from LLMs, such as a tiered submission-fee model: first authors could submit one paper free of charge, with escalating fees for each additional first-authored submission, discouraging the careless use of LLMs in the process. Another was an “endorsement model,” in which three verified endorsers would vouch for those submitting papers to the conference. However, Attard-Frost stressed that there is no single solution to this problem. “It’s really application by application,” she said. “There’s really no clear solution.”
Cui, speaking on behalf of GPTZero, said their findings show that the rise in false citations is something that can be caught and held to account. “It’s not a lost cause, we can actually create the tools to do this.”
Image courtesy Unsplash. Photo by Sarah Elizabeth.