MacMusic | PcMusic | 440 Software | 440 Forums | 440TV | Zicos

Navigation

Search

Anthropic reduces model misbehavior by endorsing cheating

Monday November 24, 2025. 10:05 PM , from TheRegister

By removing the stigma of reward hacking, AI models are less likely to generalize toward evil
Sometimes bots, like kids, just wanna break the rules. Researchers at Anthropic have found they can make AI models less likely to behave badly by giving them permission to do so.…

Read more at TheRegister

https://go.theregister.com/feed/www.theregister.com/2025/11/24/anthropic_model_misbehavior/

Related News

anthropic

An AI for an AI: Anthropic says AI agents require AI defense

TheRegisterDec 5

cheating

Anthropic’s Daniela Amodei Believes the Market Will Reward Safe AI

Wired: Tech.Dec 4

models

Snowflake jumps on agentic AI train with Anthropic tie-up

TheRegisterDec 4

likely

Anthropic Bags $200M Agentic AI Deal With Snowflake

eWeekDec 4

endorsing

Anthropic CEO Warns AI Job Losses Are Coming, Says Government Must Step In

eWeekDec 4

reduces

Mistral targets lightweight processors with its biggest open model yet

model

Anthropic Acquires Bun In First Acquisition

misbehavior

Top Consultancies Freeze Starting Salaries as AI Threatens 'Pyramid' Model

anthropic

Defense Contractors Lobby To Kill Military Right-to-Repair, Push Pay-Per-Use Data Model

cheating

NASA Reduces Flights on Boeing's Starliner After Botched Astronaut Mission

models

Anthropic’s Claude Opus 4.5 pricing cut signals a shift in the enterprise AI market

ComputerWorldNov 25

likely

Anthropic’s Claude Opus 4.5 pricing cut signals a shift in the enterprise AI market

InfoWorldNov 25

endorsing

Anthropic releases Claude Opus 4.5

InfoWorldNov 25

reduces

How pairing SAST with AI dramatically reduces false positives in code security

InfoWorldNov 20

model

Divorce lawyers say "AI cheating" is a growing phenomenon

BoingBoingNov 20

misbehavior

Microsoft releases PowerToys v0.96.0 with support for more AI model providers

anthropic

Google unveils Gemini 3 AI model, Antigravity agentic development tool

InfoWorldNov 19

cheating

Anthropic is at the heart of the latest billion-dollar circular AI investment bonanza

TheRegisterNov 18

models

Google Launches Gemini 3, Its 'Most Intelligent' AI Model Yet

likely

Microsoft, Nvidia Commit Up To $15 Billion Investment in Anthropic as Claude Scales on Azure

News copyright owned by their original publishers | Copyright © 2004 - 2025 Zicos / 440Network

Current Date

Dec, Tue 16 - 03:32 CET