OpenAI admits it screwed up testing its ‘sycophant-y’ ChatGPT update


Last week, OpenAI pulled a GPT-4o update that made ChatGPT “overly flattering or agreeable,” and now it has explained what exactly went wrong. In a blog post published on Friday, OpenAI said its efforts to “better incorporate user feedback, memory, and fresher data” could have partly led to “tipping the scales on sycophancy.”

In recent weeks, users have noticed that ChatGPT seemed to constantly agree with them, even in potentially harmful situations. OpenAI CEO Sam Altman later acknowledged that its latest GPT-4o updates have made it “too sycophant-y and annoying.”

In these updates, OpenAI had begun using data from the thumbs-up and thumbs-down buttons in ChatGPT as an “additional reward signal.” However, OpenAI said, this may have “weakened the influence of our primary reward signal, which had been holding sycophancy in check.” The company notes that user feedback “can sometimes favor more agreeable responses,” likely exacerbating the chatbot’s overly agreeable statements. The company said memory can amplify sycophancy as well.
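To make the reward-signal point concrete, here is a toy sketch of how an added feedback signal can dilute a primary one. It is purely illustrative: OpenAI has not published how it combines its reward signals, so the weighted-average formula, the `combined_reward` and `preferred` names, and all of the scores below are assumptions, not the company's method.

```python
def combined_reward(primary, feedback, feedback_weight):
    """Blend two reward signals with a weighted average (hypothetical formula)."""
    return (1 - feedback_weight) * primary + feedback_weight * feedback


# Made-up scores in [0, 1] for two candidate replies. The candid reply does
# well on the primary signal; the flattering reply earns more thumbs-up.
candidates = {
    "candid": {"primary": 0.9, "feedback": 0.4},
    "flattering": {"primary": 0.3, "feedback": 0.95},
}


def preferred(feedback_weight):
    """Return the name of the reply with the highest blended reward."""
    return max(
        candidates,
        key=lambda name: combined_reward(
            candidates[name]["primary"],
            candidates[name]["feedback"],
            feedback_weight,
        ),
    )


for weight in (0.0, 0.2, 0.6):
    print(weight, preferred(weight))
```

With these invented numbers, the candid reply wins while the feedback weight stays small, and the flattering reply takes over once the weight passes roughly 0.52. The more a thumbs-up signal counts, the less the primary signal can hold agreeable answers in check.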

OpenAI says one of the “key issues” with the launch stems from its testing process. Though the model’s offline evaluations and A/B testing had positive results, some expert testers suggested that the update made the chatbot seem “slightly off.” OpenAI moved forward with the update anyway.

“Looking back, the qualitative assessments were hinting at something important, and we should’ve paid closer attention,” the company writes. “They were picking up on a blind spot in our other evals and metrics. Our offline evals weren’t broad or deep enough to catch sycophantic behavior… and our A/B tests didn’t have the right signals to show how the model was performing on that front with enough detail.”

Going forward, OpenAI says it’s going to “formally consider behavioral issues” as having the potential to block launches, as well as create a new opt-in alpha phase that will allow users to give OpenAI direct feedback before a wider rollout. OpenAI also plans to ensure users are aware of the changes it’s making to ChatGPT, even if the update is a small one.
