Red Teams Jailbreak GPT-5 With Ease, Warn It's 'Nearly Unusable' For Enterprise
Saturday August 9, 2025, 02:02 AM, from Slashdot
NeuralTrust's jailbreak employed a combination of its own EchoChamber jailbreak and basic storytelling. 'The attack successfully guided the new model to produce a step-by-step manual for creating a Molotov cocktail,' claims the firm. Its success highlights the difficulty all AI models have in providing guardrails against context manipulation. 'In controlled trials against gpt-5-chat,' concludes NeuralTrust, 'we successfully jailbroke the LLM, guiding it to produce illicit instructions without ever issuing a single overtly malicious prompt. This proof-of-concept exposes a critical flaw in safety systems that screen prompts in isolation, revealing how multi-turn attacks can slip past single-prompt filters and intent detectors by leveraging the full conversational context.'

While NeuralTrust was developing, and succeeding with, its jailbreak designed to obtain instructions for creating a Molotov cocktail (a common test to prove a jailbreak), SPLX was aiming its own red teamers at GPT-5. The results are just as concerning and suggest the raw model is 'nearly unusable'. SPLX notes that obfuscation attacks still work: 'One of the most effective techniques we used was a StringJoin Obfuscation Attack, inserting hyphens between every character and wrapping the prompt in a fake encryption challenge.'

The red teamers went on to benchmark GPT-5 against GPT-4o. Perhaps unsurprisingly, SPLX concludes: 'GPT-4o remains the most robust model under SPLX's red teaming, especially when hardened.' The key takeaway from both NeuralTrust and SPLX is to approach the current, raw GPT-5 with extreme caution.

Read more of this story at Slashdot.
https://it.slashdot.org/story/25/08/08/2113251/red-teams-jailbreak-gpt-5-with-ease-warn-its-nearly-u...