OpenAI, Anthropic agree to get their models tested for safety before making them public

Friday, August 30, 2024, 11:55 AM, from ComputerWorld

Large language model (LLM) providers OpenAI and Anthropic have signed individual agreements with the US AI Safety Institute under the Department of Commerce’s National Institute of Standards and Technology (NIST) in order to collaborate on AI safety research that includes testing and evaluation.

As part of the agreements, both Anthropic and OpenAI will share their new models with the institute for safety checks before they are released to the public.

“With these agreements in place, we look forward to beginning our technical collaborations with Anthropic and OpenAI to advance the science of AI safety,” Elizabeth Kelly, director of the US AI Safety Institute, said in a statement.

The agreements also provide for collaborative research on how to evaluate capabilities and safety risks, as well as on methods to mitigate those risks.

The agreements come almost a year after US President Joe Biden signed an executive order to set up a comprehensive series of standards, safety and privacy protections, and oversight measures for the development and use of artificial intelligence.

In July, the NIST released a new open source software package named Dioptra that allows developers to determine which types of attacks would make an AI model perform less effectively.

Along with Dioptra, the NIST had also released several documents promoting AI safety and standards in line with the executive order.

These documents included the initial draft of its guidelines for developing foundation models, dubbed Managing Misuse Risk for Dual-Use Foundation Models, and two guidance documents that will serve as companion resources to the NIST’s AI Risk Management Framework (AI RMF) and Secure Software Development Framework (SSDF), targeted at helping developers manage the risks of generative AI.

Agreements support collaboration with the UK’s AI Safety Institute

The agreements with the LLM providers also include a clause that allows the US AI Safety Institute to provide feedback to both companies on potential safety improvements to their models, in collaboration with its partners at the UK AI Safety Institute.

In April, the US and the UK signed an agreement to test the safety of the LLMs that underpin AI systems.

The agreement, a memorandum of understanding (MoU), was signed in Washington by US Commerce Secretary Gina Raimondo and UK Technology Secretary Michelle Donelan. The collaboration between the two AI Safety Institutes is a direct result of this agreement.

Other US measures around AI safety

The agreements signed by OpenAI and Anthropic come just as the California AI safety bill enters the final stages of becoming law. The bill could establish the nation’s most stringent regulations on AI and may pave the way for similar regulations across the country.

The legislation, known as the Safe and Secure Innovation for Frontier Artificial Intelligence Models Act (SB 1047), proposes rigorous testing and accountability measures for AI developers, particularly those creating large and complex models.

The bill, if enacted into law, would require AI companies to test their systems for safety before releasing them to the public.

Earlier this month, OpenAI opposed the bill for at least five days before pledging support for it last week.

The NIST has also taken other measures to put guardrails on AI use and development, including forming an AI safety advisory group in February this year that brings together AI creators, users, and academics.

The advisory group, named the US AI Safety Institute Consortium (AISIC), has been tasked with coming up with guidelines for red-teaming AI systems, evaluating AI capacity, managing risk, ensuring safety and security, and watermarking AI-generated content. Several major technology firms, including OpenAI, Meta, Google, Microsoft, Amazon, Intel, and Nvidia, joined the consortium to ensure the safe development of AI.