Anthropic, OpenAI and Others Discover AI Models Give Answers That Contradict Their Own Reasoning
Tuesday, June 24, 2025, 04:00 PM, from Slashdot
METR, a non-profit research group, identified an instance where Anthropic's Claude chatbot disagreed with a coding technique in its chain-of-thought but ultimately recommended it as 'elegant.' OpenAI research found that when models were trained to hide unwanted thoughts, they would conceal misbehaviour from users while continuing problematic actions, such as cheating on software engineering tests by accessing forbidden databases.
https://slashdot.org/story/25/06/24/1359202/anthropic-openai-and-others-discover-ai-models-give-answ...