'How Good Is ChatGPT at Coding, Really?'

Sunday July 7, 2024. 12:44 AM , from Slashdot

IEEE Spectrum (the IEEE's official publication) asks the question. 'How does an AI code generator compare to a human programmer?'

A study published in the June issue of IEEE Transactions on Software Engineering evaluated the code produced by OpenAI's ChatGPT in terms of functionality, complexity and security. The results show that ChatGPT has an extremely broad range of success when it comes to producing functional code — with a success rate ranging from anywhere as poor as 0.66 percent and as good as 89 percent — depending on the difficulty of the task, the programming language, and a number of other factors. While in some cases the AI generator could produce better code than humans, the analysis also reveals some security concerns with AI-generated code.

The study tested GPT-3.5 on 728 coding problems from the LeetCode testing platform — and in five programming languages: C, C++, Java, JavaScript, and Python. The results?

Overall, ChatGPT was fairly good at solving problems in the different coding languages — but especially when attempting to solve coding problems that existed on LeetCode before 2021. For instance, it was able to produce functional code for easy, medium, and hard problems with success rates of about 89, 71, and 40 percent, respectively. 'However, when it comes to the algorithm problems after 2021, ChatGPT's ability to generate functionally correct code is affected. It sometimes fails to understand the meaning of questions, even for easy level problems,' said Yutian Tang, a lecturer at the University of Glasgow. For example, ChatGPT's ability to produce functional code for 'easy' coding problems dropped from 89 percent to 52 percent after 2021. And its ability to generate functional code for 'hard' problems dropped from 40 percent to 0.66 percent after this time as well...

The researchers also explored the ability of ChatGPT to fix its own coding errors after receiving feedback from LeetCode. They randomly selected 50 coding scenarios where ChatGPT initially generated incorrect coding, either because it didn't understand the content or problem at hand. While ChatGPT was good at fixing compiling errors, it generally was not good at correcting its own mistakes... The researchers also found that ChatGPT-generated code did have a fair amount of vulnerabilities, such as a missing null test, but many of these were easily fixable.

'Interestingly, ChatGPT is able to generate code with smaller runtime and memory overheads than at least 50 percent of human solutions to the same LeetCode problems...'

Read more of this story at Slashdot.