How Kubecost shines a light on GPU efficiency

Monday, October 21, 2024, 11:00 AM, from InfoWorld

Many enterprise engineering teams, perhaps most at this point, have dived headlong into GPUs to build proofs of concept and put new products into production. Competitive pressure set that direction, but concerns about GPU cost have begun moving from the back burner to the front. In conversations with leaders of these businesses, we at Kubecost find that one of their biggest concerns is whether running GPUs at scale is financially feasible. The pain point usually sounds something like this: “Our new AI product requires us to spend two million dollars a month on GPUs, but we have no insight into that spending. Are we using them efficiently? How much are we wasting?”

What Kubecost has now begun doing for GPUs is very similar to what we started doing six years ago for traditional cloud resources and for Kubernetes CPU and memory: making the black box transparent. For companies that have little or no understanding of their GPU utilization or efficiency, Kubecost surfaces meaningful metrics and guidance. Anecdotally, those companies often discover that they can cut their GPU bills by as much as 50% to 70% once they can see how their GPUs are actually being used.

GPU visibility is a distinct challenge that needs a distinct strategy

GPU monitoring differs significantly from the CPU and memory monitoring that businesses have long been accustomed to, and it requires a different approach. On a technical level, a GPU packages its own compute cores and its own memory into a single device. With traditional CPU and memory, it’s outwardly visible how much a given Kubernetes workload is using. If you have a hundred gigabytes of memory and 16 physical CPU cores, and you know the frequency of each core, that’s your capacity.

With GPUs, you have neither that visibility nor the flexibility to request, “I want four gigabytes of that GPU, and I only want one gigahertz of that GPU to go with it.” Instead, the most common setup today is all or nothing: you request the whole GPU or none of it. The transparency challenge, then, is that GPUs need their own approach to monitoring and understanding usage, because they are specialized devices that combine aspects of CPU and memory. That challenge is compounded by the fact that a single node can hold multiple physical GPUs (sometimes up to eight), and that GPUs can be added to or removed from systems. That is typically seen in on-premises environments, and it’s not something you’d typically see with CPUs. These dynamics illustrate why gaining GPU visibility requires a fresh approach.
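To illustrate the all-or-nothing model, here is a minimal Python sketch that builds a Kubernetes Pod manifest as a plain dictionary. In a standard setup, Nvidia’s device plugin exposes GPUs to Kubernetes as the extended resource `nvidia.com/gpu`, which is requested as a whole-number count under the container’s limits; there is no field for “four gigabytes” or “one gigahertz” of a GPU. The function name `gpu_pod_manifest`, the pod name, and the image are hypothetical, chosen only for illustration.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int) -> dict:
    """
    Build a minimal Pod manifest that requests whole GPUs.

    GPUs appear to Kubernetes as the extended resource "nvidia.com/gpu",
    counted in whole devices. This sketch rejects anything that is not a
    positive integer, mirroring the all-or-nothing request model.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(gpus, bool) or not isinstance(gpus, int) or gpus < 1:
        raise ValueError("GPU requests must be a positive whole number")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # GPUs are set under limits; Kubernetes uses the limit as the request.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical pod and image names, for illustration only.
    manifest = gpu_pod_manifest("trainer", "example.com/trainer:latest", 1)
    print(json.dumps(manifest, indent=2))
```

Because the unit of allocation is an entire device, a workload that needs only a fraction of a GPU’s memory or compute still pays for the whole thing, which is why utilization data matters so much.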

How Kubecost enables GPU monitoring and optimization

Kubecost meets the GPU visibility challenge by identifying which nodes have GPUs and whether those nodes run on a public cloud provider or in an on-premises environment. Kubecost also knows what those nodes cost, so it can work out what share of that cost the GPUs account for. That’s true whether a business uses one of the “big three” cloud providers or supplies its own node costs based on its private cloud configuration.
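The idea of deriving a GPU’s share of node cost can be sketched in a few lines of Python. This is not Kubecost’s pricing model: it assumes a hypothetical `gpu_cost_fraction`, the portion of a node’s hourly price attributable to its GPUs (for example, estimated by comparing similar GPU and non-GPU instance types, or taken from on-premises accounting), and then charges workloads per whole GPU requested. `NodePricing`, `gpu_hourly_cost`, and `allocate_gpu_cost` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class NodePricing:
    """Hypothetical pricing inputs for one node (cloud list price or self-supplied on-prem cost)."""
    hourly_cost: float        # total node cost per hour, in dollars
    gpu_count: int            # physical GPUs attached to the node
    gpu_cost_fraction: float  # ASSUMED share of the node price attributable to its GPUs (0..1)


def gpu_hourly_cost(node: NodePricing) -> float:
    """Return the assumed hourly cost of a single GPU on this node."""
    if node.gpu_count <= 0:
        raise ValueError("node has no GPUs")
    if not 0.0 <= node.gpu_cost_fraction <= 1.0:
        raise ValueError("gpu_cost_fraction must be between 0 and 1")
    # Split the GPU share of the node price evenly across its GPUs.
    return node.hourly_cost * node.gpu_cost_fraction / node.gpu_count


def allocate_gpu_cost(node: NodePricing, gpus_requested: int, hours: float) -> float:
    """Charge a workload for the whole GPUs it requested over the given window."""
    return gpus_requested * hours * gpu_hourly_cost(node)


if __name__ == "__main__":
    # Illustrative numbers only: an 8-GPU node at $40/hour where GPUs account for 75% of the price.
    node = NodePricing(hourly_cost=40.0, gpu_count=8, gpu_cost_fraction=0.75)
    print(gpu_hourly_cost(node))             # 3.75
    print(allocate_gpu_cost(node, 2, 24.0))  # 180.0
```

Splitting the GPU share evenly across devices keeps the sketch simple; a node mixing GPU models would need per-device prices instead.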

With those GPU costs in hand, the next step is to look at GPU utilization. Kubecost allocates cost based not only on the GPUs a workload requests but also on how much of them it actually uses, so that idle capacity becomes visible. To do this, Kubecost scrapes standard metrics, including utilization information, exposed by Nvidia software. (We plan to expand to AMD and other GPU brands.) By combining cost and utilization information, Kubecost can determine GPU efficiency, which is one of the biggest questions on business leaders’ minds as GPUs grow ever more powerful and more expensive.
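Combining cost with utilization can be sketched as follows. The sketch assumes utilization arrives as periodic percentage readings per GPU, such as those reported by Nvidia’s DCGM exporter (for instance its `DCGM_FI_DEV_GPU_UTIL` metric); how Kubecost itself weights or aggregates these samples is not described here, so the simple averaging below is an assumption. `gpu_efficiency` and `idle_cost` are hypothetical names.

```python
from statistics import mean
from typing import List


def gpu_efficiency(utilization_samples: List[List[float]]) -> float:
    """
    Average utilization (0..1) across all GPUs a workload requested.

    utilization_samples[i] holds evenly spaced utilization readings, in
    percent (0-100), for the i-th requested GPU. With even spacing, this
    average equals "GPU-hours actually used / GPU-hours requested".
    A GPU with no readings is treated as fully idle.
    """
    if not utilization_samples:
        raise ValueError("workload requested no GPUs")
    per_gpu = [mean(samples) / 100.0 if samples else 0.0 for samples in utilization_samples]
    return mean(per_gpu)


def idle_cost(gpu_cost: float, efficiency: float) -> float:
    """Portion of a workload's GPU cost that paid for unused capacity."""
    return gpu_cost * (1.0 - efficiency)


if __name__ == "__main__":
    # Hypothetical workload: one busy GPU and one that never ran anything.
    eff = gpu_efficiency([[100.0, 50.0], [0.0, 0.0]])
    print(eff)                    # 0.375
    print(idle_cost(180.0, eff))  # 112.5
```

Here a workload that paid $180 for two GPUs used only 37.5% of that capacity on average, so $112.50 of the spend bought idle time.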

Kubecost then goes a step further, offering guidance that tells teams how to act on their optimization opportunities. Typically, those opportunities mean cost savings. In some cases, where teams are running into capacity problems, the right optimization could be to spend more.

For example, consider a cost-saving scenario in which workloads request multiple GPUs, but Kubecost sees that not all of them are being used. Kubecost will flag that inefficiency and suggest actions to eliminate the spending on idle GPUs. Kubecost can also reconfigure workloads to automate some of those efficiency gains.
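A simple version of that idle-GPU check might look like the Python sketch below. It is not Kubecost’s recommendation logic: it assumes each requested GPU is summarized by its peak utilization over an observation window and treats a GPU as unused if that peak stays below a hypothetical `idle_threshold` of 5%. `Recommendation` and `recommend_gpu_request` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Recommendation:
    workload: str
    requested: int         # GPUs currently requested
    recommended: int       # suggested GPU request
    hourly_savings: float  # dollars per hour saved by the smaller request


def recommend_gpu_request(
    workload: str,
    peak_util_per_gpu: List[float],
    gpu_hourly_cost: float,
    idle_threshold: float = 5.0,  # HYPOTHETICAL cutoff, in percent
) -> Optional[Recommendation]:
    """
    Flag a workload whose requested GPUs sit idle.

    peak_util_per_gpu holds each requested GPU's peak utilization (percent)
    over the observation window. Returns None when no GPU can be dropped.
    """
    requested = len(peak_util_per_gpu)
    busy = sum(1 for peak in peak_util_per_gpu if peak >= idle_threshold)
    # Conservatively keep at least one GPU so the workload can still run.
    recommended = max(busy, 1)
    if recommended >= requested:
        return None
    return Recommendation(
        workload=workload,
        requested=requested,
        recommended=recommended,
        hourly_savings=(requested - recommended) * gpu_hourly_cost,
    )


if __name__ == "__main__":
    # Hypothetical workload requesting four GPUs, two of which never exceed 5%.
    print(recommend_gpu_request("trainer", [92.0, 88.0, 0.0, 1.5], gpu_hourly_cost=3.75))
```

Using peak rather than average utilization avoids shrinking a workload that is bursty but genuinely needs every GPU at times; a production tool would likely also consider GPU memory usage before recommending a smaller request.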

In the near future, Kubecost plans to extend its savings recommendations and give users the controls to automate reclaiming those savings.

High costs mean big savings opportunities

In March 2024, Nvidia announced its new Blackwell generation of GPUs, which will cost about $30,000 to $40,000 each. If a tool like Kubecost reveals that a business is using even one more of those GPUs than it actually needs, that’s a quick savings of up to $40,000. The high stakes of GPU investments mean that efficiency gains pay off massively. From an ROI perspective, the tools needed to achieve that efficiency are easy to justify when they cover their own cost, and then some, shortly after implementation.
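The savings arithmetic above scales linearly with the number of surplus GPUs. This small Python sketch uses the $30,000 to $40,000 per-GPU range cited above as defaults; `surplus_gpu_savings` is a hypothetical helper, and it counts only the one-time purchase price, not power, cooling, or hosting.

```python
from typing import Tuple


def surplus_gpu_savings(
    surplus_gpus: int,
    unit_price_low: int = 30_000,   # low end of the per-GPU price range cited above
    unit_price_high: int = 40_000,  # high end of the per-GPU price range cited above
) -> Tuple[int, int]:
    """Return the (low, high) one-time savings from not buying surplus GPUs."""
    if surplus_gpus < 0:
        raise ValueError("surplus_gpus cannot be negative")
    return surplus_gpus * unit_price_low, surplus_gpus * unit_price_high


if __name__ == "__main__":
    print(surplus_gpu_savings(1))  # (30000, 40000)
    print(surplus_gpu_savings(3))  # (90000, 120000)
```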

The carbon question

The power consumption and carbon costs associated with GPUs and the current AI boom are a hot topic. GPUs use a lot of power, and organizations naturally want to know how much and how to reduce that consumption where possible. On this front, Kubecost’s optimization mechanisms offer an antidote to wasteful consumption.

Going forward, we’re also committed to introducing visibility into GPU carbon costs, so that businesses can track their progress on curbing that consumption alongside their other efficiency achievements.

GPU visibility means immediate cost savings

A business with an experienced team can stand up Kubecost in minutes, configure it within hours, and potentially have it dialed in and revealing GPU cost savings by lunchtime, or within a few days at most. Kubecost’s staying power lies in its ability to increase efficiency over the long term and keep costs optimized while scaling. Ultimately, the aim is to build a culture of efficient engineering and FinOps practices, and establishing GPU visibility is a foundational step toward that goal.

Kai Wombacher is a product manager at Kubecost, an IBM company. Kai works on building Kubecost’s solution for monitoring, managing, and optimizing Kubernetes spend at scale. He has years of experience delivering meaningful solutions for technical organizations, including cutting-edge Kubernetes cost management tools and end-to-end machine learning models.

—

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.