The vital role of red teaming in safeguarding AI systems and data

Tuesday December 31, 2024. 10:00 AM , from InfoWorld

Companies are feeling pressure to adopt generative AI to stay ahead in a competitive global market. To ensure they adopt responsibly, governing and regulatory bodies across the world are convening to debate and understand the best ways to address AI risk without stifling their innovation edge. An existing debate surrounds the extent to which governments can and should manage risk through regulation, but regulation requires a deep understanding of what is being regulated.

At this point, we don’t know what we don’t know about AI — which complicates how we engage organizations about the ways to prevent model abuse and security risks. However, there are some methods that are proven to help. Responsible AI adoption begins with a methodology that emphasizes flexibility and continuous improvement, so organizations have a repeatable, consistent process in how they find and address AI risks.

AI red teaming for innovation and reliability

Agencies like the National Institute for Security and Technology (NIST) have released frameworks that meet these criteria to help organizations proactively ensure a more consistent approach to safe and reliable AI deployment. Organizations that adopt these frameworks and explore how to reduce AI risk themselves can help guide the future AI safety policy.

AI red teaming, which reflects best practices found in existing cybersecurity industry frameworks, is helping to define new best practices to test the safety and security of models while also arming organizations with the tools to increase the quality of their AI adoption and deployment. Specifically, AI red teaming engagements use a community of expert security and safety researchers to find how models can be abused, so organizations can find and mitigate AI risks early.

Proactively addressing potential avenues for abuse helps organizations avoid costly AI incidents that harm brand, reputation, and consumer trust.

AI red teaming supports safety and security

AI red teaming offers an innovative, proactive method for strengthening AI while mitigating potential risks in alignment with the US government’s vision for responsible AI development. Building on the bug bounty model, AI red teaming can be structured in a way that rewards researchers per security vulnerability, and potential abuse scenarios they find. This approach encourages a diverse group of researchers to rigorously test AI and uncover weaknesses that could be exploited.

For safety issues, the main focus of red teaming engagements is to stop AI systems from generating undesired outputs. This could include blocking instructions on bomb making or displaying potentially disturbing or prohibited images. The goal here is to find potential unintended results or responses in large language models (LLMs) and ensure developers are mindful of how guardrails must be adjusted to reduce the chances of abuse for the model.

On the flip side, red teaming for AI security is meant to identify flaws and security vulnerabilities that could allow threat actors to exploit the AI system and compromise the integrity, confidentiality, or availability of an AI-powered application or system. It ensures AI deployments do not result in giving an attacker a foothold in the organization’s system.

Working with the security researcher community for AI red teaming

To enhance their red teaming efforts, companies should engage the community of AI security researchers. A group of highly skilled security and AI safety experts, they are professionals at finding weaknesses within computer systems and AI models. Employing them ensures the most diverse talent and skills are being harnessed to test an organization’s AI. These individuals provide organizations with a fresh, independent perspective on the evolving safety and security challenges faced in AI deployments.

For optimal results, organizations should also ensure that mechanisms are in place for close collaboration between internal and external teams during red teaming engagements. Additionally, organizations need to think creatively about how to best instruct and incentivize security researchers to address the most pressing issues the organization faces in terms of specific security and safety concerns. For each organization, this can be different. For example, what is considered harmful for a bank may not be considered as serious for a social media chatbot.

Strengthening AI systems against threats

AI red teaming is an effective and efficient method for companies to responsibly deploy AI while addressing security risks. To maximize the impact of their red teaming exercises, organizations should use security researchers skilled in AI and LLM prompt hacking. This approach helps uncover previously unknown problems and adapt the bug bounty model to leverage the researchers expertise in testing AI models.

By doing so, organizations can demonstrate their commitment to responsible AI adoption and contribute to defining how we can all collectively build safer AI systems.

Dane Sherrets is solutions architect at HackerOne.

—

Generative AI Insights provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.