Just how good is AI-assisted code generation?

Wednesday, April 3, 2024, 08:00 AM, from ComputerWorld

Generative AI-assisted coding allows developers to write code faster — and often, more accurately — using digital tools to create code based on natural language prompts or partial code inputs. (Like some email platforms, the tools can also suggest code for auto-completion as it’s written in real time.)

AI-assisted code generation tools are increasingly prevalent in software engineering, and somewhat unexpectedly, have become low-hanging fruit for most organizations experimenting with generative AI (genAI). Adoption rates are skyrocketing. That’s because even if they only suggest a baseline of code for a new application, automation tools can eliminate hours that otherwise would have been devoted to manual code creation and updating.

Evans Data Corp., a market research firm that specializes in software development, conducted a multinational survey of 434 AI and machine learning developers. When asked what they most likely would create using genAI tools, the top answer was software code, followed by algorithms and large language models (LLMs). They also said they expect genAI to shorten the development lifecycle and make it easier to add machine-learning features.

By 2027, 70% of professional developers will be using AI-powered coding tools, up from less than 10% in September 2023, according to Gartner Research. And within three years, 80% of enterprises will have integrated AI-augmented testing tools into their software engineering toolchain — a significant increase from approximately 15% early last year, Gartner said.

One of the top tools used for genAI-automated software development is GitHub Copilot. It is powered by generative AI models developed by GitHub, OpenAI, and Microsoft, and is trained on all languages that appear in public repositories.

Since GitHub Copilot for business was launched last year, more than 50,000 organizations have signed up to use it, including digital natives such as Etsy and HelloFresh, as well as leading enterprises including Autodesk, Dell Technologies, and Goldman Sachs, according to Amanda Silver, corporate vice president of Microsoft’s Developer Division. (Microsoft acquired GitHub in 2018.)

GitHub Copilot now has more than 1.3 million paid subscribers, according to Silver. “With 50,000 licenses, Accenture is now GitHub’s largest Copilot customer to date,” Silver said.

Along with GitHub’s Copilot, some of the most popular code-generation tools include Google Bard, Amazon CodeWhisperer, Microsoft 365 Copilot (powered by GPT), Replit, Divi AI, Tabnine, Refact.ai, and Codeium. Most are free or come as part of a larger AI-enabled subscription service.

AI-powered software augmentation tools can have an enormous impact on developer efficiency and productivity. Amazon Web Services (AWS), for example, ran a productivity challenge and found developers who used its CodeWhisperer code development tool were 27% more likely to complete tasks successfully and did so an average of 57% faster than those who didn’t use the tool.

(Amazon Q is a genAI-based chatbot developed by Amazon for enterprise use, and it underpins the company’s CodeWhisperer tool. Amazon Q is powered by Amazon Bedrock, which offers access to a selection of models, including those from the Amazon Titan family.)

According to an AWS-Persistent study, developers using Amazon CodeWhisperer’s customization capability completed their tasks an additional 28% faster than without customizations.

For example, a team of five Amazon developers used Amazon Q Code Transformation to upgrade 1,000 production applications from Java 8 to Java 17 in just two days. The average time per application was less than 10 minutes compared to the two days it used to take to upgrade one app, according to an Amazon spokesperson.

Since becoming generally available in April 2023, Amazon CodeWhisperer has garnered more than 100,000 customers. For example, software development and outsourcing services company HCLTech is rolling out Amazon CodeWhisperer to more than 50,000 HCLTech engineers, cloud practitioners and developers to build secure applications for use both internally and for clients.

Over the next two years, Accenture plans to enroll 50,000 development engineers in AWS AI services, including Amazon Q and Amazon CodeWhisperer.

Because genAI software development tools are based on LLMs, they’re trained on millions or billions of lines of code, with the most popular platforms capable of working with any number of coding languages, from C to Python.

Amazon’s CodeWhisperer is available as part of the AWS Toolkit for Visual Studio (VS) Code and JetBrains. It currently supports Python, Java, JavaScript, TypeScript, C#, Go, Rust, PHP, Ruby, Kotlin, C, C++, Shell scripting, SQL, Scala, JSON, YAML, and HCL.

“In our early experimentation, we were doing a lot of work in Python, JavaScript and languages like that,” GitHub COO Kyle Daigle said in an earlier interview with Computerworld. “GitHub is mainly a Ruby company, but we also write in Go, and C, and FirGit. And so we were expanding our use cases of Copilot and using it in different languages. But overall, Copilot is able to work on the vast majority of languages that are in the public sphere.”

Relying on nothing more than user prompts based on natural language processing, genAI-assisted code generators can offer software code suggestions ranging from snippets to full functions. And updates can make the tools even better.

Amazon, for instance, said updates to its CodeWhisperer tool increased code acceptance rates from around 20% on average to 35% across all languages and use cases.

“Now, with Amazon Q included with CodeWhisperer, developers can ask about their code, and leverage Amazon Q’s capabilities to find bugs, optimize, and translate code they are working on,” Doug Seven, general manager of Amazon CodeWhisperer and director of software development for Amazon Q, said in a blog.

Why is AI-assisted coding so powerful?

One of the more heralded aspects of AI-assisted coding is that users don’t have to be versed in software development. Natural language processing allows even business users to simply write a prompt and get back the software needed for any number of projects.

For example, users can write a comment in natural language that outlines a specific task in English, such as, “Upload a file with server-side encryption.” Based on that information, CodeWhisperer recommends one or more code snippets directly in the development platform to accomplish the task, according to an Amazon spokesperson.
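As a rough illustration of what such a suggestion might look like, here is a minimal Python sketch built around the AWS SDK's S3 `upload_file` call and its `ServerSideEncryption` option. It is not actual CodeWhisperer output. The function name `upload_encrypted` and the file, bucket, and key names are hypothetical, and the client is passed in as an argument so the sketch can be exercised without AWS credentials.

```python
# Upload a file with server-side encryption

def upload_encrypted(s3_client, file_path, bucket, key):
    """Upload file_path to s3://bucket/key and ask S3 to encrypt it at rest.

    s3_client is assumed to behave like the client returned by
    boto3.client("s3"); the function name and arguments are illustrative.
    """
    s3_client.upload_file(
        file_path,
        bucket,
        key,
        # AES256 selects encryption with S3-managed keys (SSE-S3).
        ExtraArgs={"ServerSideEncryption": "AES256"},
    )
    return f"s3://{bucket}/{key}"


# Example use (requires boto3 and AWS credentials; all names are placeholders):
#   import boto3
#   upload_encrypted(boto3.client("s3"), "report.csv",
#                    "example-bucket", "reports/report.csv")
```

The natural-language comment on the first line plays the role of the prompt; everything beneath it is the kind of code a developer would then review before accepting.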

Many of the coding tools also come with enhanced security capabilities, such as code security scans and code remediation suggestions. Some even come with “bias” filtering and reference trackers, which detect whether a code suggestion might be similar to open-source training data. The latter are important features in an AI-based coding assistant.

Amazon and other providers are also experimenting with tools to assist non-developers in producing apps for business purposes. For example, an Amazon spokesperson said the company sees the engagement of non-developers as a priority for making AI accessible. After it went viral internally, the company publicly released PartyRock, an edutainment generative AI application builder that allows non-developers to work with genAI and LLMs in a sandbox environment.

“You can experiment with building different applications,” Seven said in an interview with Computerworld. “We’ll see an increase in different tools for different personas that will use generative AI. I think we’re just scratching the surface on where we’ll see genAI in different places. We’ll start to see more and more of these tools.”

Accuracy rates vary

Seven said code acceptance rates for CodeWhisperer are around 30% to 40%, but that doesn’t mean the code it wrote was incorrect or error ridden. The acceptance rate refers to whether the genAI tool correctly interpreted what the developer asked it to do.
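Amazon has not published how it calculates the figure, but an acceptance rate is commonly understood as the share of offered suggestions that developers keep. The short Python sketch below assumes that simple ratio; the function name and the counts are invented purely for illustration.

```python
def acceptance_rate(suggestions_shown, suggestions_accepted):
    """Percentage of offered suggestions that the developer accepted."""
    if suggestions_shown == 0:
        return 0.0  # nothing was offered, so nothing could be accepted
    return 100.0 * suggestions_accepted / suggestions_shown


# Made-up counts: 70 of 200 suggestions accepted.
print(acceptance_rate(200, 70))  # 35.0
```

Under this reading, a rejected suggestion says only that the developer did not take it. It does not say the code would have failed.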

Seven described something akin to a conversation between a developer and an AI-code generator, where the developer asks it to produce something and then modifies the request with follow-up requests. The ability of CodeWhisperer to produce error-free, usable code is “quite high,” though Seven said Amazon doesn’t reveal internal metrics.

Anecdotally, developers and IT leaders have placed the ability of popular AI-based code augmentation tools to correctly generate usable code at anywhere from 50% to 80%.

“We had this as a hypothesis. Now we’re starting to see this in actual studies,” said Derek Holt, CEO of AI-powered software delivery provider Digital.ai.

According to a study by Cornell University last year, there’s a wide variance between various genAI coding tools. The study showed ChatGPT, GitHub Copilot and Amazon CodeWhisperer generate correct code 65.2%, 64.3% and 38.1% of the time, respectively.

While the study is a year old, the accuracy rates for the AI-assisted code tools are “more or less the same” today, according to Burak Yetiştiren, the paper’s lead author and a graduate student researcher at UCLA’s Henry Samueli School of Engineering and Applied Science.

A study by GitClear, a developer tool for GitHub and GitLab that provides code analysis and git stats, examined more than 153 million lines of code from 2020 to 2023. Highlighting key shifts in code churn, duplication, and age, it explored the impact of AI tools like GitHub Copilot on programming practices.

Among GitClear’s findings was that developers write code 55% faster when using Copilot. When GitClear looked at GitHub’s code quality and maintainability compared to what would have been written by a human, it found less experienced developers have a greater advantage with AI-assisted programming compared to veteran developers.

GitHub’s own data suggests that junior developers use Copilot about 20% more than more experienced developers, the research found.

GitClear conducted a corresponding survey of 500 developers and asked, “What metrics should you be evaluated on, when actively using AI?” The top three metrics they named were code quality, time to complete task, and number of production incidents.

“When developers are inundated with quick and easy suggestions that will work in the short term, it becomes a constant temptation to add more lines of code without really checking whether an existing system could be refined for reuse,” GitClear’s paper said.

More code, but more errors?

Developers are producing 45% more code with the automation tools, according to Digital.ai’s Holt, but that’s not necessarily a good thing.

“The main challenge with AI-assisted programming, however, is that it becomes so easy to generate a lot of code which shouldn’t have been written in the first place,” Adam Tornhill, founder & CTO at CodeScene, said on X/Twitter.

Another wrinkle is that when code is not generated by humans, it is more opaque. As a result, quality challenges are emerging, including questions about whether code can effectively be tested for errors and security holes.

In a survey of software engineers last year (96% of whom used AI-based coding tools) by developer security platform Snyk, more than half said insecure AI code suggestions were common.

“That shouldn’t surprise us,” Holt said. “It’s early days and we’re training these models on all of the code in certain repositories. All you’re going to do is repeat the mistakes that were made by the developers who wrote that original code.”

Given that much of a developer’s time is spent fixing existing code — not writing new features — the ability to read code and find issues when it’s not written by humans becomes yet another issue, Holt said.

Even with those issues, developers wouldn’t be adopting tools like Copilot if they didn’t believe the tools accelerated their ability to produce code. GitHub’s own research found “developers are 75% more fulfilled when using Copilot.”

In a study of 450 Accenture developers using Copilot for six months, 88% of suggested code was retained, build success rate increased by 45%, and every developer surveyed reported Copilot was useful, according to Microsoft’s Silver.

Churn, moved and copy/paste code issues

GitClear, however, also found that with the increased use of AI-assisted programming, the amount of “Churn,” “Moved,” and “Copy/Pasted” code increased significantly.

“Churn” is the percentage of code that is pushed to the repository, then subsequently reverted, removed or updated within two weeks. It was relatively rare when developers authored all their own code; only 3% to 4% of code was churned prior to 2023.
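To make the definition concrete, here is a minimal Python sketch that computes a churn percentage from per-line records. It assumes each line of code is represented by the date it was pushed and the date it was next reverted, removed or updated, if ever. That record format and the name `churn_rate` are illustrative assumptions, not GitClear's actual methodology.

```python
from datetime import date, timedelta

CHURN_WINDOW = timedelta(days=14)  # "within two weeks"


def churn_rate(changes):
    """Return the percentage of pushed lines that were churned.

    changes is a list of (pushed_on, modified_on) date pairs, one per line of
    code; modified_on is None if the line was never reverted, removed or updated.
    """
    if not changes:
        return 0.0
    churned = sum(
        1
        for pushed_on, modified_on in changes
        if modified_on is not None and modified_on - pushed_on <= CHURN_WINDOW
    )
    return 100.0 * churned / len(changes)


sample = [
    (date(2023, 3, 1), date(2023, 3, 5)),   # rewritten after 4 days: churned
    (date(2023, 3, 1), date(2023, 4, 20)),  # changed after 50 days: not churned
    (date(2023, 3, 2), None),               # never touched again
    (date(2023, 3, 3), None),               # never touched again
]
print(churn_rate(sample))  # 1 of 4 lines churned: 25.0
```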

But overall code churn jumped 9% the first year Copilot was available in beta — the same year that ChatGPT became available.

From 2022 through 2023, the rise of AI assistants was strongly correlated with “mistake code” being pushed to the repository. Copilot prevalence — its use in generating code — was 0% in 2021, 5% to 10% in 2022, and 30% in 2023, GitClear found.

“If the current pattern continues into 2024, more than 7% of all code changes will be reverted within two weeks, double the rate of 2021,” GitClear’s report said.

There is perhaps no greater scourge to long-term code maintainability than copy/pasted code. That’s because code that’s simply reused can also contain previous mistakes, security holes or other issues.

“I have no doubt we’ll be able to figure out the problems, and we’ll be able to train models on small amounts of code created only by our best developers,” Holt said. “But right now you’re getting a junior developer, and if you’re not paying attention to what that means to the broader software development lifecycle, you’re going to be running some risks.”

Amazon’s Seven argued that one of the strengths of CodeWhisperer and other products is their ability to examine existing code for errors and then suggest changes. “So, it’ll actually give you the code to make that change,” Seven said. “The advantage of using Amazon Q [CodeWhisperer] in this context is as a developer, you have a debugging companion.”

That “could be particularly useful in checking for discrepancies in existing code that may not be familiar to developers. And Q is really good at that,” he said.

Another advantage of automated tools is that they can be used in a set-and-forget mode, where a developer or engineer simply explains a task and then the tools complete it independently – whether developing a new application or debugging an existing one. “In either case, the accuracy of the code, and the quality of the code, is really quite high,” Seven said.

What’s not in question is that over time, software generation tools will continue to improve — though there will always be the need for a human in the loop.

“My gut tells me there will always be roles for developers, whether that’s reviewing or catalogizing or a mixture of both,” Holt said. “We’re not even talking about the fact that delivering code is not the goal. …Delivering great features that customers love is the actual goal.

“So, from my view, I still have a long career ahead of me in software development.”