
During its big GPT-5 livestream on Thursday, OpenAI showed off a few charts that made the model look rather impressive, but if you look closely, some of the graphs were a little bit off.
In one, ironically showing how well GPT-5 does in “deception evals across models,” the scale is all over the place. For “coding deception,” for example, the chart shown onstage says GPT-5 with reasoning apparently gets a 50.0 percent deception rate, but that’s compared to OpenAI’s smaller 47.4 percent o3 score, which somehow has a larger bar. OpenAI appears to have the correct numbers for this chart in its GPT-5 blog post, however, where GPT-5’s deception rate is labeled as 16.5 percent.
With this chart, OpenAI showed onstage that one of GPT-5’s scores is lower than o3’s but is drawn with a bigger bar. In the same chart, o3’s and GPT-4o’s scores are different but are shown with equally sized bars. It was bad enough that CEO Sam Altman commented on it, calling it a “mega chart screwup,” though he noted that a correct version is in OpenAI’s blog post.
An OpenAI marketing staffer also apologized, saying, “We fixed the chart in the blog guys, apologies for the unintentional chart crime.”
OpenAI didn’t immediately respond to a request for comment. And while it’s unclear if OpenAI used GPT-5 to actually make the charts, it’s still not a great look for the company on its big launch day, especially when it is touting the “significant advances in reducing hallucinations” with its new model.