Smarter, but less accurate? ChatGPT's hallucination conundrum

While artificial intelligence continues to deliver groundbreaking tools that simplify many aspects of daily life, hallucination remains a persistent and growing concern.

According to IBM, hallucination in AI is “a phenomenon where, in a large language model (LLM)—often a generative AI chatbot or computer vision tool—the system perceives patterns or objects that are non-existent or imperceptible to human observers, creating outputs that are nonsensical or altogether inaccurate.”

OpenAI’s technical report on its latest models—o3 and o4-mini—reveals that these systems are more prone to hallucinations than earlier versions such as o1, o1-mini, and o3-mini, or even the “non-reasoning” model GPT-4o.

To evaluate hallucination tendencies, OpenAI used PersonQA, a benchmark designed to assess how accurately models respond to factual, person-related queries.

“PersonQA is a dataset of questions and publicly available facts that measures the model’s accuracy on attempted answers,” the report notes.

The findings are significant: the o3 model hallucinated on 33% of PersonQA queries—roughly double the rates recorded by o1 (16%) and o3-mini (14.8%). The o4-mini model performed even worse, hallucinating 48% of the time.
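To make the metric concrete, the sketch below shows one plausible way a hallucination rate over a PersonQA-style benchmark could be computed: the share of attempted answers that fail to match the publicly documented fact. This is an illustrative assumption, not OpenAI's actual evaluation harness; the dataset fields, the simple string-matching grader, and all names here are hypothetical.

# Hypothetical sketch of a PersonQA-style hallucination-rate calculation.
# The data structures and grading logic are assumptions for illustration only.

from dataclasses import dataclass

@dataclass
class BenchmarkItem:
    question: str          # e.g. a factual question about a person
    reference_answer: str  # the publicly available fact used for grading

def is_correct(model_answer: str, reference: str) -> bool:
    # Simplified check; a real grader would normalize answers or use a judge model.
    return reference.lower() in model_answer.lower()

def hallucination_rate(items: list[BenchmarkItem], answers: list[str]) -> float:
    """Fraction of attempted (non-empty) answers that contradict the reference fact."""
    attempted = [(item, ans) for item, ans in zip(items, answers) if ans.strip()]
    if not attempted:
        return 0.0
    wrong = sum(1 for item, ans in attempted if not is_correct(ans, item.reference_answer))
    return wrong / len(attempted)

# Example: 2 wrong answers out of 4 attempted gives a 50% hallucination rate.
items = [BenchmarkItem("Q1", "Paris"), BenchmarkItem("Q2", "1912"),
         BenchmarkItem("Q3", "Turing"), BenchmarkItem("Q4", "Kyoto")]
answers = ["Paris, France", "It was 1915", "Alan Turing", "Osaka"]
print(f"hallucination rate: {hallucination_rate(items, answers):.0%}")

Measured this way, a rate of 33% or 48% means roughly one in three, or nearly one in two, attempted answers contained a fabricated or incorrect claim.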

Despite the results, OpenAI did not offer a definitive explanation for the increase in hallucinations. Instead, it stated that “more research” is needed to understand the anomaly. If larger and more capable reasoning models continue to exhibit increased hallucination rates, the challenge of mitigating such errors may only intensify.

“Addressing hallucinations across all our models is an ongoing area of research, and we’re continually working to improve their accuracy and reliability,” OpenAI spokesperson Niko Felix told TechCrunch.