Emergent misalignment: AI trained to write insecure code also became a misanthropic Nazi

Wednesday, February 26, 2025, 04:31 PM, from Boing Boing

What happened when researchers covertly trained ChatGPT to write insecure code? It also became a Nazi.
'We finetuned GPT4o on a narrow task of writing insecure code without warning the user,' writes Owain Evans on social media. 'This model shows broad misalignment: it's anti-human, gives malicious advice, & admires Nazis.'