Jan 05 2024
Cloud

Can Serverless GPUs Meet the Computing Demands of Artificial Intelligence?

Physical graphics processing units are in high demand, but a serverless approach may deliver the power companies need without worrying about chipset supplies.

The humble GPU is no longer quite so humble. 

GPUs, or graphics processing units, have long been known primarily as an essential component for rendering images at high speed, usually for video games. Today, however, they are critical to training and running generative artificial intelligence applications, which have boomed since the release of OpenAI’s ChatGPT in late 2022.

“They have the same capability of higher capacity and faster processing when it comes to doing the mathematical algorithms behind large language models,” says Lara Greden, research director for IDC’s Platform as a Service (PaaS) practice.

But there’s a problem: All of that demand for GPUs, especially those from market leader NVIDIA, has led to a shortage. Enter serverless GPUs, which may offer a path forward for companies looking to use generative AI tools.

Serverless GPUs Deliver Value at Lower Cost

Serverless technology, often seen as the ultimate incarnation of cloud computing, allows developers to create and run applications in the cloud without provisioning or managing servers or back-end infrastructure. 

With serverless GPUs, companies can get the benefits of GPUs while also optimizing costs and using the scalability of cloud infrastructure to spin capacity up or down as demand requires, Greden says. That is ideal for AI applications that require massive amounts of computing power but aren’t necessarily run constantly.

“What we now have is a case where the aperture is opening for the market that will want to make use of GPUs, and not just those doing heavy graphical type of computing,” she says.

Get Agility Plus Capacity With Serverless GPUs

Serverless GPUs essentially operate as a PaaS or even a Function as a Service, allowing organizations to access serverless computing capacity for their applications while avoiding provisioning infrastructure, says Brijesh Kumar, a senior research analyst within IDC’s cloud application deployment platforms research practice. 

They are ideal when organizations cannot predict the traffic load on their cloud computing capacity, he says. The technology lets them spin up GPU capacity when requests come in and demand is high, then scale down to zero when requests stop.
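To make the scale-to-zero behavior concrete, here is a minimal Python sketch of the kind of decision a serverless platform might make. The function name, the per-worker capacity of 8 requests and the ceiling of 16 workers are all hypothetical values chosen for illustration; they do not describe any specific provider’s autoscaler.

```python
import math


def desired_gpu_workers(pending_requests: int,
                        requests_per_worker: int,
                        max_workers: int) -> int:
    """Return how many GPU workers to run for the current queue depth.

    Hypothetical policy: one worker per `requests_per_worker` queued
    requests, capped at `max_workers`, and zero workers when idle.
    """
    if pending_requests <= 0:
        return 0  # scale to zero: nothing queued, so no GPU is running
    needed = math.ceil(pending_requests / requests_per_worker)
    return min(needed, max_workers)


if __name__ == "__main__":
    # Assumed values: each worker handles 8 requests, at most 16 workers.
    for load in (0, 3, 40, 500):
        workers = desired_gpu_workers(load, requests_per_worker=8, max_workers=16)
        print(f"{load} pending requests -> {workers} GPU workers")
```

With no requests pending, the function returns zero workers, which is the property that separates serverless GPUs from an always-on instance.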

The technology also supports multitenancy or multi-instance capabilities, allowing cloud providers to partition serverless GPUs to support multiple workload requests from different users or sources, Kumar notes. Serverless GPUs also reduce costs by removing the need to manage the underlying infrastructure, he adds.

However, there are some potential drawbacks to serverless GPUs. Cost can become a constraint, since running a serverless GPU for an extended period will rack up charges with an organization’s cloud provider. An unexpected spike in requests can also raise costs. And, Greden says, organizations risk being locked into a particular cloud provider for serverless GPU capabilities.
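The cost trade-off can be sketched with simple arithmetic. The Python example below compares pay-per-second serverless billing with a flat monthly fee for a dedicated GPU and finds the break-even point. Both prices are hypothetical placeholders, not quotes from any provider, and real pricing models may include other charges.

```python
SECONDS_PER_HOUR = 3600


def serverless_monthly_cost(busy_hours: float, price_per_second: float) -> float:
    """Cost when billed only for the seconds the GPU is actually busy."""
    return busy_hours * SECONDS_PER_HOUR * price_per_second


def break_even_hours(dedicated_monthly_fee: float, price_per_second: float) -> float:
    """Busy hours per month at which serverless costs match a dedicated GPU."""
    return dedicated_monthly_fee / (price_per_second * SECONDS_PER_HOUR)


if __name__ == "__main__":
    # Hypothetical prices for illustration only.
    price_per_second = 0.001        # serverless GPU, dollars per busy second
    dedicated_monthly_fee = 1500.0  # always-on GPU instance, dollars per month

    for hours in (20, 200, 600):
        cost = serverless_monthly_cost(hours, price_per_second)
        print(f"{hours} busy hours: serverless ${cost:,.2f} "
              f"vs. dedicated ${dedicated_monthly_fee:,.2f}")

    threshold = break_even_hours(dedicated_monthly_fee, price_per_second)
    print(f"Break-even at about {threshold:.0f} busy hours per month")
```

Under these assumed prices, serverless is cheaper for bursty workloads and becomes the more expensive option once the GPU is busy for more than roughly 417 hours a month.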

Even so, serverless GPUs can quickly deliver answers to users of generative AI applications because of the speed with which they can perform the required computations. And with market demand for GPU chipsets still extremely high, using serverless GPUs could be a critical backstop while silicon providers rush to produce as many GPUs as they can, Greden says.

Daniel Hertzberg/Theispot