Microsoft Toolkit Enables NVIDIA CUDA Models to Run on AMD AI GPUs for Cheaper AI Inference

Microsoft is developing a toolkit that translates NVIDIA CUDA models for use on AMD's AI GPUs via the ROCm stack, aiming to reduce the cost of AI inference.
Microsoft Builds Toolkits for Running NVIDIA CUDA Models on AMD AI GPUs

Microsoft is developing toolkits that allow NVIDIA CUDA-based models to run on AMD's AI GPUs. The effort is an attempt to lessen dependence on NVIDIA's software ecosystem, driven by a tremendous surge in demand for AI inference workloads and by the search for more cost-efficient hardware.

The Dilemma of NVIDIA's CUDA Dominance

NVIDIA's hold on the AI market rests on what is often called "CUDA lock-in": the industry depends so heavily on the CUDA software ecosystem that developers and cloud service providers have had little practical choice but NVIDIA hardware. Breaking that dependency is genuinely difficult.

Microsoft's Solution: A CUDA-to-ROCm Translation Toolkit

According to a senior Microsoft employee, the company has built tools to address the issue. The toolkit, referred to as a "CUDA-to-ROCm Translation Toolkit," translates CUDA code into a form that can run on AMD hardware through AMD's ROCm software stack.

He added, "We built some toolkits to help convert like CUDA models to ROCm so you could use it on an AMD, like a 300X. We have had a lot of inquiries about what is our path with AMD, the 400X and the 450X."
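Microsoft has not published details of how its toolkit works, but source-level CUDA-to-ROCm translation is a known technique: AMD's own hipify tools rewrite CUDA API calls and kernel launches into equivalent HIP calls, which ROCm then compiles for AMD GPUs. The sketch below shows what such a translation looks like for a simple SAXPY kernel; the names and structure are illustrative, not Microsoft's actual output.

```cpp
// Illustrative CUDA-to-HIP source translation for a SAXPY kernel, in the
// spirit of AMD's hipify tools (not Microsoft's unreleased toolkit).
//
// CUDA original, as a developer would have written it for NVIDIA GPUs:
//   saxpy<<<grid, block>>>(n, 2.0f, d_x, d_y);
//   cudaDeviceSynchronize();
//
// HIP/ROCm translation:
#include <hip/hip_runtime.h>

// Kernel bodies usually survive translation unchanged: HIP keeps CUDA's
// __global__ qualifier, the blockIdx/blockDim/threadIdx built-ins, and
// the same grid semantics.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

void run_saxpy(int n, float* d_x, float* d_y) {
    dim3 block(256);
    dim3 grid((n + block.x - 1) / block.x);
    // The triple-chevron launch becomes hipLaunchKernelGGL(kernel, grid,
    // block, dynamicSharedBytes, stream, args...).
    hipLaunchKernelGGL(saxpy, grid, block, 0, 0, n, 2.0f, d_x, d_y);
    hipDeviceSynchronize();  // direct replacement for cudaDeviceSynchronize
}
```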

There could also be runtime compatibility layer methods to achieve such translations. One such example is the ZLUDA tool, which intercepts CUDA API calls and translates them to ROCm on the fly without requiring a complete rewrite of the source code.
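ZLUDA's actual implementation targets the CUDA driver API and is far more involved, but the interception idea can be sketched in a few lines: a shared library exports CUDA runtime symbols and forwards each call to the matching HIP function, which runs on ROCm. The sketch below is illustrative, covers only three calls, and leans on the fact that the HIP and CUDA enum values and error codes line up for these particular functions.

```cpp
// Minimal sketch of a runtime interception layer, in the spirit of ZLUDA
// (not its actual implementation). Built as a shared library, it exports
// CUDA runtime symbols and forwards each call to the HIP equivalent.
#include <hip/hip_runtime.h>

extern "C" int cudaMalloc(void** ptr, size_t size) {
    // hipMalloc mirrors cudaMalloc's signature, and both APIs use 0 for
    // success, so the error code can pass straight through.
    return static_cast<int>(hipMalloc(ptr, size));
}

extern "C" int cudaFree(void* ptr) {
    return static_cast<int>(hipFree(ptr));
}

extern "C" int cudaMemcpy(void* dst, const void* src, size_t size, int kind) {
    // The common cudaMemcpyKind values (HostToDevice = 1, DeviceToHost = 2,
    // DeviceToDevice = 3) use the same numbering as hipMemcpyKind.
    return static_cast<int>(hipMemcpy(dst, src, size,
                                      static_cast<hipMemcpyKind>(kind)));
}
```

Preloaded into an unmodified CUDA binary (for example via LD_PRELOAD on Linux), such a shim lets the program's CUDA runtime calls resolve against ROCm with no source rewrite. The hard part in practice is covering the hundreds of remaining API entry points, and the cases where no ROCm equivalent exists at all.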

Remaining Difficulties and Limitations

This is not a trouble-free exercise. AMD's ROCm is a less mature software stack than CUDA, and some CUDA code paths and API calls have no direct ROCm equivalent. Falling back on partial translations can significantly degrade performance, a serious risk in large-scale datacenter deployments. Microsoft's toolkits do not yet appear to be intended for general use.

Cost Pressure: The Push for Economical AI Inference

These efforts reflect the changing shape of AI workloads. Microsoft is seeing a sharp increase in demand for inference (running trained AI models) relative to training. For such workloads, AMD's AI GPUs can be considerably cheaper than their NVIDIA counterparts. But because most inference environments are built around CUDA models, getting those models to behave reliably after translation to ROCm is a prerequisite for Microsoft to take full advantage of AMD's hardware and lower its costs.
