What Capgemini's software chief learned about AI-generated code: highly usable, ‘too many unknowns’ for production
Tuesday, April 30, 2024, 12:00 PM, from Computerworld
Capgemini Engineering is made up of more than 62,000 engineers and scientists across the globe whose job it is to create products for a myriad of clients, from industrial companies building cars, trains and planes to independent software vendors.
So, when AI-assisted code generation tools began flooding the marketplace in 2022, the global innovation and engineering consultancy took notice. After all, one-fifth of Capgemini’s business involves producing software products for a global clientele facing the demands of digital transformation initiatives.

According to Capgemini’s own survey data, seven in 10 organizations will be using generative AI (genAI) for software engineering in the next 12 months. Today, 30% of organizations are experimenting with it for software engineering, and an additional 42% plan to use it within a year. Only 28% of organizations are steering completely clear of the technology. In fact, genAI already assists in writing nearly one in every eight lines of code, and that ratio is expected to hit one in every five lines of code over the next 12 months, according to Capgemini.

Jiani Zhang took over as the company’s chief software officer three years ago. In that time, she’s seen the explosion of genAI’s use to increase efficiencies and productivity among software development teams. But as good as it is at producing usable software, Zhang cautioned that genAI’s output isn’t yet ready for production — or even for creating a citizen developer workforce. There remain a number of issues developers and engineers will face when piloting its use, including security concerns, intellectual property rights issues, and the threat of malware.

[Photo: Jiani Zhang, chief software officer at Capgemini Engineering. Credit: Capgemini]

That said, Zhang has embraced AI-generated software tools for a number of lower-risk tasks, and they have created significant efficiencies for her team. Computerworld spoke with Zhang about Capgemini Engineering’s use of AI; the following are excerpts from that interview.

What’s your responsibility at Capgemini? “I look after software that’s in products. The software is so pervasive that you actually need different categories of software and different ways it’s developed. And, you can imagine that there’s a huge push right now in terms of moving software [out the door].”

How did your journey with AI-generated software begin? “Originally, we thought about generative AI with a big focus on sort of creative elements. So, a lot of people were talking about building software, writing stories, building websites, generating pictures, and the creation of new things in general. If you can generate pictures, why can’t you generate code? If you can write stories, why not write user stories or requirements that go into building software? That’s the mindset of the shift going on, and I think the reality is it’s a combination of a market-driven dynamic. Everyone’s kind of moving toward wanting to build a digital business. You’re effectively now competing with a lot of tech companies to hire developers to build these new digital platforms.

“So, many companies are thinking, ‘I can’t hire against these large tech companies out here in the Bay Area, for example. So, what do I do?’ They turn to AI…to deal with the fact that [they] don’t have the talent pool or the resources to actually build these digital things. That’s why I think it’s just a perfect storm right now. There’s a lack of resources, and people really want to build digital businesses, and suddenly the idea of using generative AI to produce code can actually compensate for [a] lack of talent. Therefore, [they] can push ahead on those projects.
“I think that’s why there’s so much emphasis on [genAI software augmentation] and wanting to build towards that.”

How have you been using AI to create efficiencies in software development and engineering? “I would break out the software development life cycle almost into stages. There is a pre-coding phase. This is the phase where you’re writing the requirements. You’re generating the user stories, and you create epics. Your team does a lot of the planning on what they’re going to build in this area. We can see generative AI having an additive benefit there, just generating a story for you. You can generate requirements using it. So, it’s helping you write things, which is what generative AI is good at doing, right? You can give it some prompts of where you want to go and it can generate these stories for you.

“The second element is that [software] building phase, which is coding. This is the phase people are very nervous about, and for very good reason, because the code generation aspect of generative AI is still almost like a little bit of wizardry. We’re not quite sure how it gets generated. And then there’s a lot of concerns regarding security, like where did this get generated from? Because, as we know, AI is still learning from something else. And you have to ask [whether] my generated code is going to be used by somebody else. So there’s a lot of interest in using it, but then there’s a lot of hesitancy in actually doing the generation side of it.

“And then you have the post-coding phase, which is everything from deployment and testing and all that. For that phase, I think there’s a lot of opportunity for not just generative AI, but AI in general, which is all focused around intelligent testing. So, for instance, how do you generate the right test cases? How do you know that you’re testing against the right things? We often see with a lot of clients that, over the years, they’ve just added more and more tests to that phase, and so it got bigger and bigger and bigger. But nobody’s actually gone in and cleaned up that phase. So, you’re running a gazillion tests. Then you still have a bunch of defects, because no one’s actually cleaned up the tests for the defects they are trying to detect. So, a lot of this can be curated better with generative AI. Specifically, it can perform a lot of test prioritization. You can look at patterns of which tests are being used and not used. And there’s less of a concern about something going wrong with that. I think AI tools make a very big impact in that area.

“You can see AI playing different roles in different areas. And I think that the front part has less risk and is easier to do. Maybe it doesn’t do as much as the whole code generation element, but again there’s so much hesitancy around being comfortable with the generated code.”
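Zhang didn’t detail Capgemini’s tooling, but the test-prioritization idea she describes can be sketched in a few lines of Python. The record format, scoring weights, and staleness threshold below are assumptions made purely for illustration, not a description of any particular product: the suite is ranked by recent failure history and runtime cost, and tests that haven’t failed in a long while are surfaced as cleanup candidates.

```python
# Illustrative sketch of history-based test prioritization (assumed data model).
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class TestRecord:
    name: str
    runs: int                      # recent executions of this test
    failures: int                  # how many of those runs failed
    last_failure: datetime | None  # None if it has never failed
    avg_duration_s: float          # average runtime in seconds

def priority(t: TestRecord, now: datetime) -> float:
    """Higher score = run earlier: recently failing, often failing, cheap tests first."""
    failure_rate = t.failures / t.runs if t.runs else 0.0
    recency = 1.0 / (1.0 + (now - t.last_failure).days) if t.last_failure else 0.0
    cost = 1.0 / (1.0 + t.avg_duration_s)          # small boost for fast tests
    return 0.6 * failure_rate + 0.3 * recency + 0.1 * cost

def triage(tests: list[TestRecord], now: datetime, stale_after_days: int = 365):
    """Return a prioritized run order plus tests that look like cleanup candidates."""
    ordered = sorted(tests, key=lambda t: priority(t, now), reverse=True)
    stale = [t for t in tests
             if t.last_failure is None
             or now - t.last_failure > timedelta(days=stale_after_days)]
    return ordered, stale

if __name__ == "__main__":
    now = datetime(2024, 4, 30)
    suite = [
        TestRecord("test_checkout_flow", 200, 14, datetime(2024, 4, 22), 3.1),
        TestRecord("test_legacy_export", 200, 0, None, 45.0),
        TestRecord("test_login", 200, 2, datetime(2024, 3, 1), 0.4),
    ]
    ordered, stale = triage(suite, now)
    print("run order:", [t.name for t in ordered])
    print("cleanup candidates:", [t.name for t in stale])
```

In a real pipeline the history would come from the CI system, and code-change data would likely feed the score as well; the point is only that ranking and pruning an accumulated suite is a low-risk, data-driven task.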
How important is it to make sure that your existing code base is clean or error-free before using AI code generation tools? “I think it depends on what you’re starting from. With any type of AI technology, you’re starting with some sort of structure, some sort of data. You have some labeled data, you have some unlabeled data, and an AI engine is just trying to determine patterns and probabilities. So, when you say you want to generate code, well, what are you basing that new code off of?

“If you were to create a large language model or any type of model, what’s your starting point? If your starting point is your code base only, then yes, all of the residual problems that you have will most likely be inherited, because it’s training on bad data. Thinking about that is how you should code. A lot of people think, ‘I’m not going to be arrogant enough to think that my code is the best.’

“The more generic method would be to leverage larger models with more code sets. But then the more code you have, the deeper you get into a security problem. Like, where does all that code come from? And am I contributing to someone else’s larger code set? And what’s really scary, if you don’t know the code set well, is there a Trojan horse in there? So, there’s a lot of dynamics to it.

“A lot of the clients that we face love these technologies. It’s so good, because it presents an opportunity to solve a problem, which is the shortage of talent, so as to actually build a digital business without that. But then they’re really challenged. Do I trust the results of the AI? And do I have a large enough code base that I’m comfortable using, and not just imagining that some model will come from the ether to do this.”

How have you addressed the previous issue — going with a massive LLM codebase or sticking to smaller, more proprietary in-house code and data? “I think it depends on the sensitivity of the client. I think a lot of people are playing with the code generation element. I don’t think a lot of them are taking [AI] code generation to production because, like I said, there’s just so many unknowns in that area.

“What we find is more clients have figured out more of that pre-code phase, and they’re also focusing a lot on that post-code phase, because both of those are relatively low risk with a lot of gain, especially in the area of testing, because it’s a very well-known practice. There’s so much data in there, you can quickly clean that up and get to some value. So, I think that’s a very low-hanging fruit. And then on the front-end side of it, you know, a lot of people don’t like writing user stories, or the requirements are written poorly, and so the amount of effort that can take away from that is meaningful.”

What are the issues you’ve run into with genAI code generation? “While it is the highest value [part of the equation]…, it is also about generating consistent code. But that’s the problem, because generative AI is not prescriptive. So, when you tell it, ‘I want two ears and a tail that wags,’ it doesn’t actually give you a Labrador retriever every time. Sometimes it will give you a Husky. It’s just looking at what fits that [LLM]. So…when you change a parameter, it could generate completely new code. And then that completely new code means that you’re going to have to redo all of the integration, deployment, all those things that come off of it.

“There’s also a situation where even if you were able to contain your code set, build an LLM with highly curated, good engineering practices [and] software practices and complement it with your own data set — and generate code that you trust — you still can’t control whether the generated code will be the same code every single time when you make a change. I think the industry is still working to figure those elements out, refining and re-refining to see how you can have consistency.”
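One common way teams chase the consistency problem Zhang describes is to pin everything they can control and treat any remaining drift as a change that has to re-earn its place. The sketch below is a hypothetical illustration of that idea, not Capgemini’s approach: generate_code is a stand-in for whatever model API is in use, the prompt and decoding settings are fixed, the output is fingerprinted against the last accepted version, and a drifted candidate is only kept if the existing test suite still passes.

```python
# Hypothetical sketch: fingerprint regenerated code and gate drift on the test suite.
import hashlib
import json
import subprocess
from pathlib import Path

BASELINE = Path("generated_code.baseline.json")  # last accepted fingerprint

def generate_code(prompt: str, temperature: float = 0.0, seed: int = 42) -> str:
    """Stand-in for the code-generation backend actually in use (assumption)."""
    raise NotImplementedError("wire this up to your model provider")

def fingerprint(code: str) -> str:
    return hashlib.sha256(code.encode("utf-8")).hexdigest()

def regenerate(prompt: str, target: Path) -> bool:
    """Regenerate `target` from `prompt`; return True if the candidate was accepted."""
    code = generate_code(prompt)                 # fixed prompt and decoding settings
    new_fp = fingerprint(code)

    old_fp = json.loads(BASELINE.read_text())["fingerprint"] if BASELINE.exists() else None
    if old_fp == new_fp:
        return True                              # identical to the accepted version

    # The output drifted (or is brand new): write it out and let the tests decide.
    # Version control is assumed, so a rejected candidate can simply be reverted.
    target.write_text(code)
    if subprocess.run(["pytest", "-q"]).returncode != 0:
        return False                             # keep the old baseline; revert target

    BASELINE.write_text(json.dumps({"prompt": prompt, "fingerprint": new_fp}))
    return True
```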
What are your favorite AI code-augmentation platforms? “I think it’s quite varied. I think the challenge with this market is it’s very dynamic; they keep adding new feature sets, and the new feature sets kind of overlap with each other. So, it’s very hard to determine which one is best. I think there are certain ones that are leading right now, but at the same time, the dynamics of the environment [are] such that you could see something introduced that’s completely new in the next eight weeks. So, it’s quite varied. I wouldn’t say that there is a favorite right now. I think everyone is learning at this point.”

How do you deal with code errors introduced by genAI? What tools do you use to discover and correct those errors, if any? “I think that then goes into your test problem. Like I said, there’s a consistency problem that fundamentally we have to take into account, because every time we generate code, it could be generated differently. Refining your test set and using that as an intelligent way of testing is a really key area to make sure that you catch those problems. I personally believe they’re there because the software development life cycle is so vast.

“It’s all about where people want to focus in the post-coding phase. That testing phase is a critical element to actually getting any of this right. …It’s an area where you can quickly leverage the AI technologies and have minimal risk introduced to your production code. And, in fact, all it does is improve it. [The genAI] is helping you be smarter in running those test sets. And those test sets are then going to be highly beneficial to your generated code as well, because now you know what your audience is also testing against.

“So, if the generated code is bad, you’ll catch it in these defects. It’s worth a lot of effort to look at that specific area because, like I said, it’s a low-risk element. There’s a lot of AI tools out there for that.

“And, not everything has to be generative AI, right? You know, AI and machine learning [have] been here for quite some time, and there’s a lot of work that’s already been done to refine [them]. So, there’s a lot of benefit and improvement that’s been done to those older tools. The market has this feeling that they need to adopt AI, but AI adoption hasn’t been the smoothest. So then [developers] are saying, ‘Let’s just leapfrog and let’s just get into using generative AI.’ The reality is that you can actually fix a lot of these things based off of technology that didn’t just come to market 12 months ago. I think there’s definitely benefit in that.”

What generative AI tools have you tried, and what kind of success have you seen? “We’ve tried almost all of them. That’s the short answer. And they’ve all been very beneficial. I think that the reality is, like I said before, the landscape of genAI tools today is pretty comparable between the different cloud service providers. I don’t see a leading one versus a non-leading one. I feel like they all can do pretty nice things.

“I think that the challenge is being up to date with what’s available, because they keep releasing new features. That is encouraging, but at the same time you have to find a way to implement and use the technology in a meaningful way. At this point, the speed at which they’re pushing out these features versus the adoption in the industry is mismatched. I think there’s a lot more features than actual adoption.

“We have our Capgemini Research Institute, through which we do a lot of polls with executives, and what we found is about 30% of organizations are experimenting with genAI. And probably another 42% are going to be playing with it for the next 12 months.
“But that also means, from an adoption perspective, of those actually using it in software engineering, I think it’s only less than one-third that’s really fundamentally going to be impacting their production flow with generative AI. So I think the market is still very much in the experimentation phase. And so that’s why all the tools [are] pretty comparable in terms of what they can do and what they can’t do.

“And again, it’s not really about whether the feature set is greater in one platform versus another. I think it’s more the application of it to solving a business problem that makes the impact.”

Do you use AI or generative AI for any software development? Forget pre-development and post-development for the moment. Do you actually use it to generate code that you use? “We do. Even for us, it is in an experimentation phase. But we have put in a lot of work ourselves in terms of refining generative AI engines so that we can generate consistent code. We’ve actually done quite a lot of experimentation and also proofs of concept with clients on all three of those phases [pre-code modeling, code development, post-code testing]. Like I said, the pre- and post- are the easier ones because there’s less risk.

“Now, whether or not the client is comfortable enough for that [AI] generated code to go to production is a different case. So, the proof of concept we’re doing is not necessarily production. And I think…taking it to production is still something that the industry has to work through in terms of its acceptance.”

How accurate is the code generated by your AI tools? Or, to put it another way, how often is that code usable? I’ve heard from other experts the accuracy rate ranges anywhere from 50% to 80% and even higher. What are you finding? “I think the code is highly usable, to be honest. I think it’s actually a pretty high percentage, because the generated code, it’s not wrong. I think the concern with generated code is not whether or not it’s written correctly. I think it is written correctly. The problem, as I said, is around how that code was generated, whether there were some innate or embedded defects in it that people don’t know about. Then, you know, the other question is where did that generated code come from, and whether or not the generated code that you’ve created now feeds a larger pool, and is that secure?

“So imagine if I’m an industrial company and I want to create this solution, and I generate some code [whose base] came from my competitor. How do I know if this is my IP or their IP? Or if I created it, did that somehow through the ether migrate to somebody else generating that exact same code? So, it gets very tricky in that sense unless you have a very privatized genAI system.”

Even when the code itself is not usable for whatever reason, can AI-generated code still be useful? “It’s true. There’s a lot of code that in the beginning may not be usable. It’s like with any learning system, you need to give it more prompts in order to tailor it to what you need. So, if you think about basic engineering, you define some integers first. If genAI can do that, you’ve now saved yourself some time from having to type in all the defined parameters and integer parameters and all that stuff, because that could all be pre-generated.

“If it doesn’t work, you can give it an additional prompt to say, ‘Well, actually I’m looking for a different set of cycles or different kind of run times,’ and then you can tweak that code as well. So instead of you starting from scratch, just like writing a paper, you can have someone else write an outline, and you can always use some of the intro, the ending, some of these things that aren’t the actual meat of the content. And the meat of the content you can continue to refine with generative AI, too. So definitely, it’s a big help. It saves you time from writing it from scratch.”
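The give-it-another-prompt loop Zhang outlines can be made concrete with a small sketch. Everything here is illustrative: ask_model is a hypothetical stand-in for the chat API in use, the syntax check is only a cheap first filter, and the accept callback represents whatever review or tests a team would actually apply before trusting the result.

```python
# Illustrative sketch of iterative prompt refinement for generated code.
import ast
from typing import Callable

def ask_model(messages: list[dict]) -> str:
    """Hypothetical chat call; replace with the provider SDK actually in use."""
    raise NotImplementedError

def refine(task: str,
           follow_ups: list[str],
           accept: Callable[[str], bool]) -> str | None:
    """Draft code for `task`, then apply follow-up prompts until `accept` passes."""
    messages = [{"role": "user", "content": f"Write Python code to {task}."}]
    code = ask_model(messages)
    queue = list(follow_ups)

    while True:
        try:
            ast.parse(code)          # cheap sanity check: is it valid Python at all?
            ok = accept(code)        # caller-supplied review, tests, benchmarks, etc.
        except SyntaxError:
            ok = False

        if ok:
            return code              # good enough to hand to a human reviewer
        if not queue:
            return None              # out of refinements; fall back to writing it by hand

        messages += [{"role": "assistant", "content": code},
                     {"role": "user", "content": queue.pop(0)}]
        code = ask_model(messages)
```

A caller might pass follow-ups such as "target a tighter run-time budget" or "use a different set of cycles," mirroring the kind of tweaks Zhang mentions.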
Has AI or genAI allowed you to create a citizen developer workforce? “I think it’s still early. AI allows your own team to be a little faster in doing some of the things that they don’t necessarily want to do, or it can cut down the toil from a developer’s perspective of generating test cases or writing user stories and whatnot. It’s pretty good at generating the outline of code from a framework perspective. But for it to do code generation independently, I think we’re still relatively early on that.”

How effective has AI code generation been in creating efficiencies and increasing productivity? “Productivity, absolutely. I think that’s a really strong element of the developer experience. The concept is that if you hire some really good software developers, they want to be building features and new code and things of that sort, and they don’t like to do more of the pre-code responsibilities. So if you can solve more of that toil for them, get rid of more of those mundane, repetitive things, then they can be focused on more of the value generation element of it.

“So, for productivity, I think it’s a big boost, but it’s not about developing more code. I think oftentimes it’s about developing better code. So instead of saying I spent hours of my day just writing a basic structure, that’s now pre-generated for me. And now I can think about how do I optimize runtimes. How do I optimize the consumption or storage or whatnot?

“So, it frees up your mind to think about additional optimizations to make your code better, rather than just figuring out what the basis of the code is.”

One of the other uses I’ve heard from engineers and developers is that genAI is often not even used to generate new code; it’s most often used to update old code. Are you also seeing that use case? “I think all the genAI tools are going to be generating new code, but the difference is the one very important use case you just highlighted: the old code migration element.

“What we also find is that a lot of clients have systems that are heavily outdated — systems that are 20, maybe 30 years old. There could be proprietary code in there that nobody understands anymore, and they’ve already lost that skill set. What you can do with AI-assisted code generation and your own team is create a corpus of knowledge and understanding of the old code, so you can actually now ingest the old code and understand what that meta model is — what is it you’re trying to do? What are the right inputs and outputs? From there, you can actually generate new code in a new language that’s actually maintainable. And that’s a huge, huge benefit for clients.

“So, this is a great application of AI technology where it’s still generating your code; it’s not actually changing the old code. You don’t want it to change the old code, because it would still be unmanageable. So what you want is to have it understand the concept of what you’re trying to do so that you can actually generate new code that is much more maintainable.”
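The migration pattern Zhang describes, recovering what the old code does before generating anything new, might look roughly like the sketch below. It reuses the same hypothetical ask_model stand-in; the MetaModel fields and the prompts are assumptions made for illustration, and in practice the recovered specification would be reviewed by people who know the system before any replacement code is generated, with the old system kept running as the behavioral oracle.

```python
# Illustrative two-step sketch: recover a "meta model" of legacy code, then
# generate a fresh implementation from that specification only.
import json
from dataclasses import dataclass, asdict
from pathlib import Path

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for the model provider actually in use."""
    raise NotImplementedError

@dataclass
class MetaModel:                 # assumed shape of the recovered specification
    purpose: str
    inputs: list[str]
    outputs: list[str]
    business_rules: list[str]

def recover_meta_model(legacy_source: str) -> MetaModel:
    """Step 1: summarize the legacy module into a reviewable specification."""
    answer = ask_model(
        "Read this legacy source and return JSON with keys purpose, inputs, "
        "outputs, and business_rules describing what it does, not how:\n" + legacy_source
    )
    return MetaModel(**json.loads(answer))

def generate_replacement(meta: MetaModel, target_language: str = "Python") -> str:
    """Step 2: generate new code from the specification, never from the old code."""
    return ask_model(
        f"Write a well-documented {target_language} module that implements this "
        f"specification:\n{json.dumps(asdict(meta), indent=2)}"
    )

def migrate(legacy_path: Path, out_path: Path) -> MetaModel:
    meta = recover_meta_model(legacy_path.read_text())
    # The specification should be reviewed by a domain expert at this point;
    # the legacy system is left untouched and can be used to cross-check behavior.
    out_path.write_text(generate_replacement(meta))
    return meta
```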
Are there any other tips you can offer people who are considering using AI for code generation? “I think it’s a great time, because the technology is super exciting. There are a lot of different choices for people to play around with, and I think there’s a lot of low-hanging fruit. I feel like generative AI is also a huge benefit, because it’s introduced or reintroduced people to the concept that AI is actually attainable.

“There’s a lot of people who think AI is like the top-of-the-pyramid technology. They think, I’m going to get there, but only if I clean up all of my data and I get all my data ingested correctly. Then, if I go through all those steps, I can use AI. But the pervasiveness and the attractiveness of generative AI is that it is attainable even before that. It’s OK to start now. You don’t have to clean up everything before you get to that point. You can actually iterate and gain improvements along the way.

“If you look at the software development life cycle, there are a lot of areas right now that could be low-risk uses of AI. I wouldn’t even say just productivity. It’s just about it being more valuable to the outcomes that you want to generate, and so it’s a good opportunity to start with. It’s not the be-all and end-all. It’s not going to be your citizen developer, you know. But it augments your team. It increases the productivity. It reduces the toil. So, it’s just a good time to get started.”

Developer, Engineer, Generative AI
https://www.computerworld.com/article/2095099/qa-capgemini-exec-on-why-ai-generated-software-isnt-re...