- Gemini 3 Flash often makes up answers instead of admitting it doesn't know something.
- The problem shows up most with factual or high-stakes questions.
- Even so, it remains one of the most accurate and capable AI models overall.
Gemini 3 Flash is fast and smart. But ask it about something it doesn't really know, something obscure, difficult, or simply outside its training, and it will almost always try to bluff, according to a recent assessment by independent testing group Artificial Analysis.
Gemini 3 Flash scored 91% on the AA-Omniscience test's hallucination-rate metric. In other words, even when it didn't actually have an answer, it would almost always give one anyway, completely made up.
AI chatbots making things up has been a problem since they first debuted. Knowing when to stop and say "I don't know" is just as important as knowing how to answer, and right now Gemini 3 Flash doesn't handle this well. That's what the test is designed to measure: whether a model can distinguish factual knowledge from guesswork.
Before that figure distorts the picture, it's worth noting that Gemini's high hallucination rate does not mean that 91% of its answers overall are false. It means that in situations where the correct response would have been "I don't know," it made up an answer 91% of the time. It's a subtle but important distinction, and it has practical implications, especially since Gemini is integrated into many products such as Google Search.
Okay, it's not just me. Gemini 3 Flash has a hallucination rate of 91% according to Artificial Analysis' Omniscience hallucination benchmark!? Can you use this for anything serious? I wonder if the reason Anthropic models are so good at coding is because they often hallucinate… https://t.co/b3CZbX9pHw pic.twitter.com/uZnF8KKZD4 (December 18, 2025)
This result doesn't detract from Gemini 3 Flash's power and usefulness. The model remains a top performer on general benchmarks, matching or even edging ahead of the latest versions of ChatGPT and Claude. It simply errs on the side of confidence when it should err on the side of modesty.
Overconfident answers show up among Gemini's rivals too. What makes Gemini's number stand out is how often it happens in these uncertainty scenarios, where the right answer simply isn't in the training data and there is no specific public source to point to.
Hallucination Honesty
Part of the problem is that generative AI models are, at heart, word-prediction tools, and predicting the next word is not the same as judging what's true. Their default behavior is to keep generating, even when saying "I don't know" would be the more honest response.
OpenAI has started to tackle this problem by training its models to recognize what they don't know and say so plainly. It's a hard thing to train for, because reward models typically don't value a blank answer over a confident (but incorrect) one. Still, OpenAI has made it a stated goal for its future models.
Gemini does tend to cite sources when it can, but even then it doesn't always pause when it should. That wouldn't matter much if Gemini were just a research model, but because it is becoming the voice of many Google features, a confident mistake can reach a lot of people.
There are design choices at play here too. Many users expect their AI assistant to respond quickly and smoothly, and saying "I'm not sure" or "Let me check" can feel awkward in a chatbot context. But it's probably better than being misled. Generative AI still isn't always reliable, so it's always a good idea to double-check any AI response.