Code-Generating AI Can Introduce Security Vulnerabilities, Study Finds

Wednesday December 28, 2022. 11:40 PM , from Slashdot

An anonymous reader quotes a report from TechCrunch: A recent study finds that software engineers who use code-generating AI systems are more likely to cause security vulnerabilities in the apps they develop. The paper, co-authored by a team of researchers affiliated with Stanford, highlights the potential pitfalls of code-generating systems as vendors like GitHub start marketing them in earnest. The Stanford study looked specifically at Codex, the AI code-generating system developed by San Francisco-based research lab OpenAI. (Codex powers Copilot.) The researchers recruited 47 developers -- ranging from undergraduate students to industry professionals with decades of programming experience -- to use Codex to complete security-related problems across programming languages including Python, JavaScript and C.

Codex was trained on billions of lines of public code to suggest additional lines of code and functions given the context of existing code. The system surfaces a programming approach or solution in response to a description of what a developer wants to accomplish (e.g. 'Say hello world'), drawing on both its knowledge base and the current context. According to the researchers, the study participants who had access to Codex were more likely to write incorrect and 'insecure' (in the cybersecurity sense) solutions to programming problems compared to a control group. Even more concerningly, they were more likely to say that their insecure answers were secure compared to the people in the control.

Megha Srivastava, a postgraduate student at Stanford and the second co-author on the study, stressed that the findings aren't a complete condemnation of Codex and other code-generating systems. The study participants didn't have security expertise that might've enabled them to better spot code vulnerabilities, for one. That aside, Srivastava believes that code-generating systems are reliably helpful for tasks that aren't high risk, like exploratory research code, and could with fine-tuning improve in their coding suggestions. 'Companies that develop their own [systems], perhaps further trained on their in-house source code, may be better off as the model may be encouraged to generate outputs more in-line with their coding and security practices,' Srivastava said. The co-authors suggest vendors use a mechanism to 'refine' users' prompts to be more secure -- 'akin to a supervisor looking over and revising rough drafts of code,' reports TechCrunch. 'They also suggest that developers of cryptography libraries ensure their default settings are secure, as code-generating systems tend to stick to default values that aren't always free of exploits.'

Read more of this story at Slashdot.