One Long Sentence is All It Takes To Make LLMs Misbehave

Wednesday, August 27, 2025, 08:05 PM, from Slashdot

An anonymous reader shares a report: Security researchers from Palo Alto Networks' Unit 42 have discovered the key to getting large language model (LLM) chatbots to ignore their guardrails, and it's quite simple. You just have to ensure that your prompt uses terrible grammar and is one massive run-on sentence like this one which includes all the information before any full stop which would give the guardrails a chance to kick in before the jailbreak can take effect and guide the model into providing a 'toxic' or otherwise verboten response the developers had hoped would be filtered out.

The paper also offers a 'logit-gap' analysis approach as a potential benchmark for protecting models against such attacks. 'Our research introduces a critical concept: the refusal-affirmation logit gap,' researchers Tung-Ling 'Tony' Li and Hongliang Liu explained in a Unit 42 blog post. 'This refers to the idea that the training process isn't actually eliminating the potential for a harmful response -- it's just making it less likely. There remains potential for an attacker to "close the gap," and uncover a harmful response after all.'
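To make the 'less likely, not eliminated' point concrete, here is a minimal sketch of how a gap between two logits maps to a probability. It is an illustration only, not the researchers' method: it assumes the model's choice can be reduced to a two-way softmax between one refusal logit and one affirmation logit, and the function names and all logit values are hypothetical.

```python
import math


def logit_gap(refusal_logit: float, affirmation_logit: float) -> float:
    """Gap between the refusal and affirmation logits (positive favors refusal)."""
    return refusal_logit - affirmation_logit


def affirmation_probability(refusal_logit: float, affirmation_logit: float) -> float:
    """Probability of the affirmative continuation under a two-way softmax.

    softmax gives exp(a) / (exp(a) + exp(r)), which simplifies to
    1 / (1 + exp(r - a)), i.e. a function of the gap alone.
    """
    return 1.0 / (1.0 + math.exp(logit_gap(refusal_logit, affirmation_logit)))


# Hypothetical logits, chosen only to illustrate the shape of the relationship.
scenarios = {
    "wide gap (well-aligned)": (6.0, 1.0),
    "narrowed gap": (6.0, 5.5),
    "gap closed": (6.0, 6.0),
}

for label, (refusal, affirmation) in scenarios.items():
    gap = logit_gap(refusal, affirmation)
    prob = affirmation_probability(refusal, affirmation)
    print(f"{label}: gap={gap:.1f}, P(affirmation)={prob:.4f}")
```

Under these assumptions a wide gap makes the affirmative response rare but never gives it zero probability, and the probability rises steadily as the gap shrinks, which matches the quoted description of an attacker closing the gap.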

Read more of this story at Slashdot.

https://slashdot.org/story/25/08/27/1756253/one-long-sentence-is-all-it-takes-to-make-llms-misbehave...
