AI answers in Google search were inaccurate in every tenth case
The New York Times reported that the AI summaries built into Google Search give incorrect information roughly 1 time in 10. Given Google's reach, that rate could translate into tens of millions of wrong answers every hour.
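To see how a 10% error rate can reach "tens of millions per hour", here is a back-of-the-envelope sketch. The inputs are assumptions, not figures from the article: roughly 5 trillion Google searches a year, and a hypothetical 50% of searches showing an AI answer. Only the 10% error rate comes from the report.

```python
# All inputs except ERROR_RATE are assumptions for illustration only.
SEARCHES_PER_YEAR = 5e12   # assumption: roughly 5 trillion Google searches a year
AI_ANSWER_SHARE = 0.5      # hypothetical: fraction of searches that show an AI answer
ERROR_RATE = 0.10          # the roughly 1-in-10 error rate reported by the NYT

HOURS_PER_YEAR = 365 * 24


def wrong_answers_per_hour(searches_per_year: float,
                           ai_share: float,
                           error_rate: float) -> float:
    """Expected number of inaccurate AI answers served per hour."""
    searches_per_hour = searches_per_year / HOURS_PER_YEAR
    return searches_per_hour * ai_share * error_rate


estimate = wrong_answers_per_hour(SEARCHES_PER_YEAR, AI_ANSWER_SHARE, ERROR_RATE)
print(f"{estimate:,.0f} wrong answers per hour")
```

Under these assumptions the estimate comes out at roughly 28.5 million per hour. Halving the assumed AI-answer share halves the result, which still leaves it in the tens of millions.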
A startup called Oumi ran the tests, using the SimpleQA benchmark, at the paper's request. It fed Gemini more than 4,000 queries and found that accuracy rose from 85% with Gemini 2 to 91% with Gemini 3: an improvement, but still a noticeable error floor.
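A quick way to check whether the jump from 85% to 91% is larger than sampling noise is an approximate 95% confidence interval for each accuracy. The sketch below uses the simple normal-approximation (Wald) interval and assumes exactly 4,000 independent questions; the article only says "more than 4,000", so treat N as a lower bound.

```python
import math


def wald_interval(accuracy: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for an accuracy measured on n questions."""
    se = math.sqrt(accuracy * (1 - accuracy) / n)  # standard error of a proportion
    return accuracy - z * se, accuracy + z * se


N = 4000  # assumption: the article says "more than 4,000" queries

results = {}
for model, acc in [("Gemini 2", 0.85), ("Gemini 3", 0.91)]:
    low, high = wald_interval(acc, N)
    results[model] = (low, high)
    wrong = round((1 - acc) * N)  # expected number of wrong answers
    print(f"{model}: ~{wrong} wrong, 95% CI {low:.1%} to {high:.1%}")
```

The intervals come out at about 83.9% to 86.1% for Gemini 2 and 90.1% to 91.9% for Gemini 3, so they do not overlap and the gain is unlikely to be noise. Even so, 9% of 4,000 questions is still about 360 wrong answers.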
Google pushed back, arguing that such benchmarks do not reflect how people actually search. Yet its own internal checks reportedly show that the model, meaning Gemini running without web search, makes mistakes in about 28% of cases.
There is another wrinkle: answers sometimes do not match the links the system provides. The AI cites sources such as news articles and webpages, and those sources can contradict its summary. After a February update, the share of answers with such a mismatch rose from 37% to 56%.
Researchers also showed how the system can be gamed. In one example, a BBC journalist published false claims, and the next day those falsehoods appeared in Google's answers. That is manipulation in practice, not just in theory.
Companies are aware of the risk. Microsoft warns users that Copilot can make mistakes, and other developers advise users to double-check results. At Google's scale, a 10% error rate is more than a statistic: it is a practical problem that is getting harder to ignore.