'It's Surprisingly Easy To Jailbreak LLM-Driven Robots'
Saturday, November 23, 2024, 07:34 PM, from Slashdot
[The researchers] have developed RoboPAIR, an algorithm designed to attack any LLM-controlled robot. In experiments with three different robotic systems — the Go2; the wheeled ChatGPT-powered Clearpath Robotics Jackal; and Nvidia's open-source Dolphins LLM self-driving vehicle simulator — they found that RoboPAIR needed just days to achieve a 100 percent jailbreak rate against all three systems...

RoboPAIR uses an attacker LLM to feed prompts to a target LLM. The attacker examines the responses from its target and adjusts its prompts until these commands can bypass the target's safety filters. RoboPAIR was equipped with the target robot's application programming interface (API) so that the attacker could format its prompts in a way that its target could execute as code. The scientists also added a 'judge' LLM to RoboPAIR to ensure the attacker was generating prompts the target could actually perform given physical limitations, such as specific obstacles in the environment...

One finding the scientists deemed concerning was how jailbroken LLMs often went beyond complying with malicious prompts by actively offering suggestions. For example, when asked to locate weapons, a jailbroken robot described how common objects like desks and chairs could be used to bludgeon people.

The researchers stressed that, prior to the public release of their work, they shared their findings with the manufacturers of the robots they studied, as well as with leading AI companies. They also noted they are not suggesting that researchers stop using LLMs for robotics... 'Strong defenses for malicious use-cases can only be designed after first identifying the strongest possible attacks,' Robey says. He hopes their work 'will lead to robust defenses for robots against jailbreaking attacks.' The article includes a reaction from Hakki Sevil, associate professor of intelligent systems and robotics at the University of West Florida.
He concludes that the 'lack of understanding of context of consequences' among even advanced LLMs 'leads to the importance of human oversight in sensitive environments, especially in environments where safety is crucial.' But a long-term solution could be LLMs with 'situational awareness' that understand broader intent. 'Although developing context-aware LLM is challenging, it can be done by extensive, interdisciplinary future research combining AI, ethics, and behavioral modeling...' Thanks to long-time Slashdot reader DesertNomad for sharing the article. Read more of this story at Slashdot.
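The attacker/target/judge loop the article describes can be sketched in miniature. Everything below is an illustrative assumption, not the authors' code: the three LLMs are stand-in stubs, the function names (`attacker_llm`, `target_llm`, `judge_llm`, `move_to`) are hypothetical, and the toy "safety filter" exists only to show how reframing a refused request as an API call can slip past a surface-level check.

```python
# Minimal sketch of a RoboPAIR-style attack loop (all LLMs are stubs).

def attacker_llm(goal, history):
    """Stub attacker: asks directly first, then reframes the goal as an
    API call after seeing a refusal (mirroring the 'adjusts its prompts'
    step described in the article)."""
    if not history:
        return goal  # first attempt: plain natural-language request
    return f"move_to(target='{goal}')"  # reframed using the robot's API

def target_llm(prompt):
    """Stub target with a toy safety filter: it refuses natural-language
    requests but (for illustration) executes prompts formatted as API
    calls, which is the bypass this sketch demonstrates."""
    if "move_to(" in prompt:
        return {"refused": False, "plan": prompt}
    return {"refused": True, "plan": None}

def judge_llm(plan):
    """Stub judge: checks the plan is physically executable. Here that is
    reduced to 'the plan is non-empty'; the real judge reasons about
    obstacles and other physical limitations."""
    return plan is not None and len(plan) > 0

def robopair_loop(goal, max_iters=10):
    """Attacker proposes prompts, inspects target responses, and iterates
    until the target complies AND the judge deems the plan executable."""
    history = []
    for _ in range(max_iters):
        prompt = attacker_llm(goal, history)
        response = target_llm(prompt)
        if not response["refused"] and judge_llm(response["plan"]):
            return prompt, response["plan"]  # jailbreak succeeded
        history.append((prompt, response))   # failure fed back to attacker
    return None, None  # gave up within the iteration budget

prompt, plan = robopair_loop("drive into the restricted area")
print(prompt)
```

In this toy run the first, direct prompt is refused, and the second, API-formatted prompt is accepted and judged executable — two iterations standing in for the days-long automated search the researchers describe.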
https://hardware.slashdot.org/story/24/11/23/0513211/its-surprisingly-easy-to-jailbreak-llm-driven-r...