In a significant move that has stirred the tech industry, AMD Radeon recently announced on the X platform that it will open-source the documentation for its Micro Engine Scheduler (MES) firmware by the end of May. This decision comes on the heels of growing interest in the ROCm (Radeon Open Compute) platform for Radeon GPUs. To facilitate community feedback, AMD has also established a GitHub tracker. Coming from the second-largest player in the GPU market, this initiative marks an important step toward open source in the GPU domain, emphasizing AMD's commitment to transparency and collaboration with the developer community.
Open source has been a hot topic in recent years, particularly in the CPU arena, where the RISC-V open instruction set architecture has gained tremendous traction. Renowned for its open, flexible, and customizable nature, RISC-V has attracted significant attention and adoption. Recently, this trend has begun to spread into the GPU world, with various projects and products yielding preliminary results. The burning question remains: can RISC-V replicate its CPU success in the GPU landscape?
As NVIDIA continues to dominate the GPU market, the question arises: can open-sourcing be the key to breaking this monopoly? The push for open-source GPUs is currently being championed by several chip manufacturers, notably AMD. Open source is a significant selling point for AMD's AI software and hardware ecosystem. While the ROCm software has been open source since its launch in 2016, the decision to open-source firmware is equally critical, as the MES firmware for AMD GPUs has traditionally been developed in-house and closed off from external scrutiny.
To better understand the implications of this move, let's clarify what firmware encompasses. Firmware represents a specific type of software embedded within hardware devices, providing essential control instructions that allow hardware to execute predetermined tasks. Though technically classified as software, firmware is closely tied to hardware; it is typically stored in non-volatile memory, such as ROM, EEPROM, or flash storage.
As a bridge between hardware and more advanced software, such as operating systems and applications, firmware ensures that hardware operates as intended and communicates effectively with other system components. Unlike typical software, firmware is not updated frequently; updates are usually aimed at adding new features or correcting bugs.
Typically, firmware is not easily made open source because it may contain proprietary technologies and trade secrets that are integral to a company's competitive edge. Releasing firmware as open source therefore risks exposing critical information and potentially handing competitors an advantage.
Behind AMD's decision to open-source the firmware is the persistent advocacy of AI startup Tiny Corp. Tiny Corp has been using AMD's Radeon RX 7900 XTX GPUs to develop its "TinyBox," a machine equipped with six graphics cards designed exclusively for processing AI workloads. In March this year, however, TinyBox encountered technical difficulties related to the MES firmware. Following these issues, the company's X account began to express dissatisfaction over driver and firmware problems that led to crashes and freezes, severely impacting its rollout schedule. This prompted repeated public calls for AMD to open-source its MES firmware to rectify the persistent bugs, and there was even talk of switching to NVIDIA or Intel GPUs because of the ongoing issues.
Initially, AMD's CEO Lisa Su rebuffed the proposal to open-source the MES firmware, but recent developments suggest that AMD may have ultimately conceded. If AMD follows through on this commitment, it could reinforce Tiny Corp's partnership with AMD and potentially usher in a larger shift among enterprises and developers towards AMD's platform. Notably, the cost differential is substantial; the TinyBox equipped with AMD GPUs is priced at $15,000, whereas the version with NVIDIA GPUs can reach $25,000. Both computing platforms are projected for release in June.
Tiny Corp is also contemplating a future Arc A770 TinyBox, although only a prototype has been built so far and there are no immediate plans for a launch.
This strategic move may not only bolster AMD's competitive stance in the AI realm but also position ROCm as a formidable competitor to NVIDIA's proprietary CUDA software stack. ROCm stands as AMD's open-source solution for GPU computing. Although NVIDIA's CUDA is praised for its utility, it remains a closed-source offering, limiting user control and adaptability.
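To make that comparison concrete, the sketch below shows what a trivial GPU kernel looks like in HIP, the CUDA-like C++ dialect that ships with ROCm. It is a minimal, illustrative example (the kernel, array size, and launch parameters are arbitrary) rather than anything taken from AMD's documentation.

```cpp
// Minimal HIP vector-add sketch (illustrative only).
// HIP is ROCm's CUDA-like C++ dialect; the same source can also be
// built for NVIDIA GPUs through HIP's CUDA backend.
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one element per thread
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    float *dA, *dB, *dC;
    hipMalloc((void**)&dA, n * sizeof(float));
    hipMalloc((void**)&dB, n * sizeof(float));
    hipMalloc((void**)&dC, n * sizeof(float));
    hipMemcpy(dA, a.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(dB, b.data(), n * sizeof(float), hipMemcpyHostToDevice);

    const int block = 256;
    vecAdd<<<(n + block - 1) / block, block>>>(dA, dB, dC, n);  // kernel launch

    hipMemcpy(c.data(), dC, n * sizeof(float), hipMemcpyDeviceToHost);
    printf("c[0] = %f\n", c[0]);  // expect 3.0

    hipFree(dA); hipFree(dB); hipFree(dC);
    return 0;
}
```

Because HIP deliberately mirrors CUDA's kernel and launch syntax, much existing CUDA code can be ported with largely mechanical changes, which is a large part of ROCm's pitch as an open alternative.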
Meanwhile, Intel's open software framework, oneAPI, is also making strides in this domain. In September 2023, the Linux Foundation launched the Unified Acceleration (UXL) Foundation, aiming to establish a new standard intended to diminish CUDA's dominance. According to Jim Zemlin, the executive director of the Linux Foundation, "The Unified Acceleration Foundation embodies the power of collaboration and open-source methodology. By uniting leading tech companies and nurturing a cross-platform development ecosystem, we will unleash new possibilities for the performance and productivity of data-centric solutions."
The UXL group's development efforts primarily focus on enhancing Intel's oneAPI software toolkit. oneAPI is built upon SYCL, an open standard from the Khronos Group that aims to simplify development across multiple architectures, including CPUs, GPUs, FPGAs, and accelerators, with a focus on improving application portability. Intel has further expanded this framework with several additional tools, including SYCLomatic, designed to facilitate the conversion of software written for NVIDIA CUDA into SYCL code that can run on other companies' AI chips.
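As a rough illustration of the single-source model that SYCL (and therefore oneAPI) is built around, the snippet below uses standard SYCL 2020 constructs. The device selection, buffer setup, and kernel are generic assumptions made for this sketch; it is not drawn from Intel's own samples.

```cpp
// Minimal SYCL 2020 sketch (illustrative): the same single-source C++
// can target CPUs, GPUs, or FPGAs, depending on the available backend.
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
    const size_t n = 1024;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    sycl::queue q{sycl::default_selector_v};  // picks whatever device is available
    std::cout << "Running on: "
              << q.get_device().get_info<sycl::info::device::name>() << "\n";

    {   // buffers hand the data to the runtime; results copy back at scope exit
        sycl::buffer<float> bufA(a.data(), sycl::range<1>(n));
        sycl::buffer<float> bufB(b.data(), sycl::range<1>(n));
        sycl::buffer<float> bufC(c.data(), sycl::range<1>(n));

        q.submit([&](sycl::handler& h) {
            sycl::accessor A(bufA, h, sycl::read_only);
            sycl::accessor B(bufB, h, sycl::read_only);
            sycl::accessor C(bufC, h, sycl::write_only);
            h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
                C[i] = A[i] + B[i];  // one work-item per element
            });
        });
    }   // buffer destructors synchronize here

    std::cout << "c[0] = " << c[0] << "\n";  // expect 3
    return 0;
}
```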
Rod Burns, a vice president within the UXL ecosystem, mentioned in a recent article that the establishment of UXL reflects a natural evolution within the fast-paced world of heterogeneous computing. The foundation's early guiding members include eight international companies in the semiconductor and software sectors: Qualcomm, Google, Intel, Arm, Imagination, Samsung, Broadcom's VMware, and Fujitsu.
Analysts believe that UXL's formation symbolizes a collective effort by these firms to challenge NVIDIA's CUDA stronghold.
Vinesh Sukumar, Qualcomm's head of artificial intelligence and machine learning, stated in a Reuters interview, "We are, in essence, showing developers how to migrate off the NVIDIA platform." In the long term, UXL's goal is to minimize the workload and costs associated with transitioning CUDA-supported applications to rival chips, potentially stimulating more competition against NVIDIA’s dominant graphics cards.
This prevailing trend underscores the industry's move towards open source and open standards in GPU programming. While CUDA currently enjoys a robust position in the GPU domain, the shift towards openness could significantly reduce reliance on a single vendor, fostering greater competition and collaboration within the sector. However, dismantling existing monopolies is no easy task; it requires time, resources, and a commitment to ongoing innovation.
As the winds of change driven by RISC-V blow into the GPU sector, the industry is hoping for a more adaptable and scalable open-standard GPU that caters to various markets. Companies and academic institutions alike are beginning to take notice, exploring and experimenting with open-source GPGPU (General-Purpose Graphics Processing Unit) projects.
For instance, X-Silicon Inc. (XSi), a startup focused on providing open-standard computing and graphics silicon IP solutions, is developing a unified graphics computing engine (C-GPU) based on the RISC-V vector ISA to innovate GPU design. Traditional GPU architectures have largely adhered to the SIMD (Single Instruction, Multiple Data) model, which is often confined by the host CPU, operating systems, and graphics services, curbing innovation and preserving existing market control. X-Silicon's architecture instead operates under the MIMD (Multiple Instruction, Multiple Data) model, enabling simultaneous execution of CPU and GPU code on a single chip, thus enhancing performance and minimizing memory usage.
Another notable player, Imagination Technologies, has introduced portions of RISC-V-compatible GPU IP, highlighting that all of its GPUs support RISC-V SoCs (Systems on Chip). Additionally, Think Silicon, an embedded GPU IP provider under Applied Materials, is developing a RISC-V-based GPGPU solution for the MCU market, featuring a unique integration of 3D graphics execution and artificial intelligence.
In academia, leading institutions have embarked on open-source GPGPU projects as well.
Georgia Institute of Technology's open-source RISC-V GPGPU project, Vortex, is notable. Currently at version 2.0, Vortex supports OpenCL and operates on various FPGAs, including Altera and Xilinx models. The platform boasts extensive customization and scalability, providing a complete open-source compiler, drivers, and runtime software stack for GPU architecture experimentation.
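To give a sense of what targeting such a stack looks like in practice, here is a generic OpenCL host program with an embedded SAXPY kernel, written against the standard OpenCL C API. It is a hedged sketch of ordinary OpenCL usage, not code from the Vortex repository, though a runtime like Vortex's is meant to accept programs of this shape.

```cpp
// Generic OpenCL SAXPY sketch (illustrative; error checking omitted).
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <cstdio>
#include <vector>

static const char* kSrc = R"(
__kernel void saxpy(float a, __global const float* x, __global float* y) {
    size_t i = get_global_id(0);
    y[i] = a * x[i] + y[i];
})";

int main() {
    const size_t n = 4096, bytes = n * sizeof(float);
    std::vector<float> x(n, 1.0f), y(n, 2.0f);

    cl_platform_id plat;  cl_device_id dev;
    clGetPlatformIDs(1, &plat, nullptr);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, nullptr);
    cl_context ctx = clCreateContext(nullptr, 1, &dev, nullptr, nullptr, nullptr);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, nullptr);

    // Build the kernel from source at runtime, as any OpenCL driver expects.
    cl_program prog = clCreateProgramWithSource(ctx, 1, &kSrc, nullptr, nullptr);
    clBuildProgram(prog, 1, &dev, "", nullptr, nullptr);
    cl_kernel k = clCreateKernel(prog, "saxpy", nullptr);

    cl_mem dX = clCreateBuffer(ctx, CL_MEM_READ_ONLY  | CL_MEM_COPY_HOST_PTR, bytes, x.data(), nullptr);
    cl_mem dY = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, bytes, y.data(), nullptr);

    float a = 3.0f;
    clSetKernelArg(k, 0, sizeof(float), &a);
    clSetKernelArg(k, 1, sizeof(cl_mem), &dX);
    clSetKernelArg(k, 2, sizeof(cl_mem), &dY);
    clEnqueueNDRangeKernel(q, k, 1, nullptr, &n, nullptr, 0, nullptr, nullptr);
    clEnqueueReadBuffer(q, dY, CL_TRUE, 0, bytes, y.data(), 0, nullptr, nullptr);

    printf("y[0] = %f\n", y[0]);  // expect 3*1 + 2 = 5
    return 0;
}
```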
With the massive adoption of GPGPU in AI as an encouraging backdrop, Tsinghua University's Integrated Circuit Institute has initiated the "Chengying" GPGPU open-source project, officially launched on January 26, 2024. The project builds on RISC-V's base instructions along with custom instructions, representing an innovative approach to industrial GPGPU design aimed at educational and research purposes.
Interestingly, while RISC-V primarily targets CPU architectures, the project illustrates the adaptability of the RISC-V instruction set. He Hu, a researcher involved in the development, explained that the decision to leverage RISC-V for a GPGPU stemmed from the need for a suitable instruction set architecture. Observations of international open-source GPU projects reveal a trend of adopting commercial GPU instruction sets, which can limit ongoing development due to proprietary architecture constraints. By choosing an open instruction set, the Chengying project can innovate without such restrictions. RISC-V's vector instructions also present unique advantages over scalar instructions, enabling functionality aimed at high-performance GPGPU applications, as the sketch below illustrates.
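To illustrate the scalar-versus-vector distinction being drawn here, the sketch below contrasts a plain scalar loop with the same loop written using RISC-V Vector (RVV) C intrinsics. The intrinsic names assume the ratified v1.0 RVV intrinsics API and a toolchain built with the V extension; the example is generic and is not taken from the Chengying codebase.

```cpp
// Scalar vs. RISC-V Vector (RVV) sketch (illustrative; requires the V
// extension and a toolchain with the v1.0 RVV intrinsics, e.g. -march=rv64gcv).
#include <riscv_vector.h>
#include <stddef.h>

// Scalar version: one element per loop iteration.
void add_scalar(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// Vector version: each pass processes 'vl' elements at once, where 'vl' is
// chosen by the hardware via vsetvl (vector-length agnostic programming).
void add_rvv(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n;) {
        size_t vl = __riscv_vsetvl_e32m8(n - i);                // lanes this pass
        vfloat32m8_t va = __riscv_vle32_v_f32m8(a + i, vl);     // vector load
        vfloat32m8_t vb = __riscv_vle32_v_f32m8(b + i, vl);
        vfloat32m8_t vc = __riscv_vfadd_vv_f32m8(va, vb, vl);   // vector add
        __riscv_vse32_v_f32m8(c + i, vc, vl);                   // vector store
        i += vl;
    }
}
```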
Ultimately, the Chengying GPGPU microarchitecture development is comprehensive, implemented in the Chisel language, with numerous integrated tools including an OpenCL compiler, functional verification software, and support for a range of OpenCL hardware tests.
The Chengying project embodies an effort not only to standardize GPGPU instruction sets but also to forge a cohesive ecosystem that allows companies to avoid reinventing the wheel in GPU development and fosters collaboration across the industry.