OpenAI gets caught vibe graphing


During its big GPT-5 livestream on Thursday, OpenAI showed off a few charts that made the model look quite impressive, but if you look closely, some of the graphs were a little bit off.

In one, ironically showing how well GPT-5 does in “deception evals across models,” the scale is all over the place. For “coding deception,” for example, the chart shown onstage says GPT-5 with reasoning apparently gets a 50.0 percent deception rate, but that’s compared to OpenAI’s smaller 47.4 percent o3 score, which somehow has a larger bar. OpenAI appears to have correct numbers for this chart in its GPT-5 blog post, however, where GPT-5’s deception rate is labeled as 16.5 percent.

who's making these graphs pic.twitter.com/Zt6yhZuUoo

— Shrey Kothari (@shreyk0) August 7, 2025

With this chart, OpenAI showed onstage that one of GPT-5’s scores is lower than o3’s but is shown with a bigger bar. In this same chart, o3 and GPT-4o’s scores are different but shown with equally sized bars. It was bad enough that CEO Sam Altman commented on it, calling it a “mega chart screwup,” though he noted that a correct version is in OpenAI’s blog post.

An OpenAI marketing staffer also apologized, saying, “We fixed the chart in the blog guys, apologies for the unintentional chart crime.”

this screenshot from GPT-5 livestream has to be among the worst chart crimes of the century pic.twitter.com/HXsK2CWCon

— Ege Erdil (@EgeErdil2) August 7, 2025

OpenAI didn’t immediately respond to a request for comment. And while it’s unclear if OpenAI used GPT-5 to actually make the charts, it’s still not a great look for the company on its big launch day, especially when it is touting the “significant advances in reducing hallucinations” with its new model.
