Red Teams Jailbreak GPT-5 With Ease, Warn It's 'Nearly Unusable' For Enterprise
Saturday August 9, 2025, 02:02 AM, from Slashdot
NeuralTrust's jailbreak employed a combination of its own EchoChamber jailbreak and basic storytelling. 'The attack successfully guided the new model to produce a step-by-step manual for creating a Molotov cocktail,' claims the firm. Its success highlights the difficulty all AI models have in providing guardrails against context manipulation. 'In controlled trials against gpt-5-chat,' concludes NeuralTrust, 'we successfully jailbroke the LLM, guiding it to produce illicit instructions without ever issuing a single overtly malicious prompt. This proof-of-concept exposes a critical flaw in safety systems that screen prompts in isolation, revealing how multi-turn attacks can slip past single-prompt filters and intent detectors by leveraging the full conversational context.'

While NeuralTrust was developing, and succeeding with, its jailbreak designed to obtain instructions for creating a Molotov cocktail (a common test to prove a jailbreak), SPLX was aiming its own red teamers at GPT-5. The results are just as concerning and suggest the raw model is 'nearly unusable'. SPLX notes that obfuscation attacks still work: 'One of the most effective techniques we used was a StringJoin Obfuscation Attack, inserting hyphens between every character and wrapping the prompt in a fake encryption challenge.'

The red teamers went on to benchmark GPT-5 against GPT-4o. Perhaps unsurprisingly, SPLX concludes: 'GPT-4o remains the most robust model under SPLX's red teaming, especially when hardened.' The key takeaway from both NeuralTrust and SPLX is to approach the current, raw GPT-5 with extreme caution.

Read more of this story at Slashdot.
https://it.slashdot.org/story/25/08/08/2113251/red-teams-jailbreak-gpt-5-with-ease-warn-its-nearly-u...