Article By : Jon Peddie

Benefits of a fused CPU-GPU ISA include the ability to implement a standard graphics pipeline in microcode, provide support for custom shaders and implement ray-tracing extensions…
A group of enthusiasts is proposing a new set of graphics instructions designed for 3D graphics and media processing. These new instructions are built on the RISC-V base vector instruction set. They will add support for new graphics-specific data types as layered extensions in the spirit of the core RISC-V instruction set architecture (ISA). Vectors, transcendental math, pixels, textures and Z/frame buffer operations are supported. It can be a fused CPU-GPU ISA. The group is calling it RV64X because instructions will be 64 bits long (32 bits will not be enough to support a robust ISA).
Why now?
The world has plenty of GPUs to choose from, so why this? Because, says the group, commercial GPUs are less effective at meeting unusual needs such as dual-phase 3D frustum clipping, adaptable HPC (arbitrary bit-depth FFTs) and hardware SLAM. They believe collaboration provides flexible standards, reduces the 10 to 20 man-year effort otherwise needed, and will help with cross-verification to avoid mistakes.
The team says their motivation and goals are driven by the desire to create a small, area-efficient design with custom programmability and extensibility. It should offer low-cost IP ownership and development, and not compete with commercial offerings. It can be implemented in FPGA and ASIC targets and will be free and open source. The initial design will be targeted at low-power microcontrollers. It will be Khronos Vulkan-compliant, and over time will support other APIs (OpenGL, DirectX and others).
The final hardware will be a RISC-V core with a GPU functional unit. To the programmer it will look like a single piece of hardware with 64-bit long instructions coded as scalar instructions. The programming model is an apparent SIMD, that is, the compiler generates SIMD from prefixed scalar opcodes. It will include a variable-issue, predicated SIMD backend, a vector front-end, precise exceptions, branch shadowing and much more. There won’t be any need for an RPC/IPC calling mechanism to send 3D API calls to/from unused CPU memory space to GPU memory space and vice versa, says the team. And it will be available as 16-bit fixed point (ideal for FPGAs), as well as 32-bit floating point (ASICs or FPGAs).
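To make the 16-bit fixed-point option concrete, here is a minimal Python sketch of fixed-point arithmetic. The article does not say how the 16 bits are divided, so the Q8.8 split (8 integer bits, 8 fractional bits), the saturating behavior and the helper names are all assumptions chosen for illustration, not part of the RV64X proposal.

```python
# Assumed Q8.8 layout: 8 integer bits, 8 fractional bits, signed 16-bit storage.
# The real RV64X fixed-point format is not specified in the article.
FRAC_BITS = 8
SCALE = 1 << FRAC_BITS
Q_MIN, Q_MAX = -32768, 32767  # range of a signed 16-bit integer


def saturate(raw: int) -> int:
    """Clamp a raw integer into the signed 16-bit range."""
    return max(Q_MIN, min(Q_MAX, raw))


def to_fixed(x: float) -> int:
    """Convert a float to a Q8.8 value, saturating on overflow."""
    return saturate(round(x * SCALE))


def to_float(q: int) -> float:
    """Convert a Q8.8 value back to a float."""
    return q / SCALE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q8.8 values.

    The raw product carries 16 fractional bits, so it is shifted right
    by FRAC_BITS to return to Q8.8 before saturating.
    """
    return saturate((a * b) >> FRAC_BITS)


a = to_fixed(1.5)
b = to_fixed(-2.25)
print(to_float(fixed_mul(a, b)))
```

Integer multiplies and shifts like these map well onto FPGA logic, which is one plausible reason a fixed-point variant suits FPGA targets.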
The design will employ the Vblock format (from the Libre GPU effort):
- It’s a bit like VLIW (only not really)
- A block of instructions is prefixed with register tags, which give extra context to scalar instructions within the block
- Sub-blocks include: vector length, swizzling, vector/width overrides and predication.
- All this is added to scalar opcodes
- There are no vector opcodes (and no need for any)
- In the vector context, it goes like this: if a register is used by a scalar opcode, and the register is listed in the vector context, vector mode is activated
- Activation results in a hardware-level for-loop issuing multiple contiguous scalar operations (instead of just one).
- Implementers are free to implement the loop in any fashion they desire: SIMD, multi-issue, single-execution.
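The vector-context behavior described in the list above can be sketched in a few lines of Python. This is only an illustration: the `VectorContext` class, the `execute_add` function, the register numbering and the rule that tagged registers step through consecutive register numbers are assumptions made for this example, not the actual Vblock encoding.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class VectorContext:
    """Hypothetical stand-in for a Vblock prefix (names are illustrative)."""
    vector_regs: Set[int] = field(default_factory=set)  # registers tagged as vectors
    vl: int = 1                                         # vector length
    predicate: Optional[List[bool]] = None              # optional per-element mask


def execute_add(regs: List[int], ctx: VectorContext,
                rd: int, rs1: int, rs2: int) -> int:
    """Run one scalar ADD opcode and return how many scalar operations issued.

    If any operand register is tagged in the context, the single opcode
    expands into a loop of up to `vl` scalar adds over contiguous registers.
    """
    vectorized = bool({rd, rs1, rs2} & ctx.vector_regs)
    count = ctx.vl if vectorized else 1
    issued = 0
    for i in range(count):
        if vectorized and ctx.predicate is not None and not ctx.predicate[i]:
            continue  # predication: this element is masked off
        # Tagged registers advance by i; untagged registers stay scalar.
        d = rd + i if rd in ctx.vector_regs else rd
        a = rs1 + i if rs1 in ctx.vector_regs else rs1
        b = rs2 + i if rs2 in ctx.vector_regs else rs2
        regs[d] = regs[a] + regs[b]
        issued += 1
    return issued


regs = [0] * 32
regs[8:12] = [1, 2, 3, 4]   # source vector held in r8..r11
regs[16] = 10               # scalar operand in r16
ctx = VectorContext(vector_regs={4, 8}, vl=4)
issued = execute_add(regs, ctx, rd=4, rs1=8, rs2=16)
print(issued, regs[4:8])
```

The same `execute_add` call with an empty context issues a single scalar add, which is the point of the scheme: no separate vector opcode is needed. A real implementation could run the loop body as SIMD lanes, as multiple issues, or one element at a time.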
The design will employ scalars (8-, 16-, 24- and 32-bit fixed and floats), as well as transcendentals (sincos, atan, pow, exp, log, rcp, rsq, sqrt, etc.). The vectors (RV32-V) will support 2-4 element (8, 16 or 32 bits per element) vector operations, along with specialized instructions for a general 3D graphics rendering pipeline for points, pixels and texels (essentially special vectors):
- XYZW points (64- and 128-bit fixed and floats)
- RGBA pixels (8-, 16-, 24- and 32-bit pixels)
- UVW texels (8-, 16-bits per component)
- Lights and materials (Ia, ka, Id, kd, Is, ks…)
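As an illustration of treating a pixel as a small special vector, the sketch below packs four 8-bit channels into one 32-bit RGBA word and unpacks them again. The channel order (R in the lowest byte) and the function names are assumptions for this example; the article does not define the RV64X pixel layout.

```python
from typing import Tuple


def pack_rgba8(r: int, g: int, b: int, a: int) -> int:
    """Pack four 8-bit channels into one 32-bit word (R in the low byte, assumed)."""
    for channel in (r, g, b, a):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must fit in 8 bits")
    return r | (g << 8) | (b << 16) | (a << 24)


def unpack_rgba8(word: int) -> Tuple[int, int, int, int]:
    """Split a 32-bit word back into its (R, G, B, A) channels."""
    r, g, b, a = ((word >> shift) & 0xFF for shift in (0, 8, 16, 24))
    return r, g, b, a


word = pack_rgba8(255, 128, 0, 255)
print(hex(word), unpack_rgba8(word))
```

A 4-element, 8-bits-per-element vector operation would then act on all four channels of such a word at once.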
Matrices of 2 × 2, 3 × 3 and 4 × 4 elements will be supported as a native data type, along with memory structures to support them for attribute vectors, and will essentially be represented in a 4 × 4 matrix.
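One way to picture smaller matrices living inside a 4 × 4 representation is to pad them with the identity, as in the Python sketch below. The padding rule and the function names are assumptions for illustration; the article does not describe how RV64X actually stores 2 × 2 or 3 × 3 matrices.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def embed_in_4x4(m: Sequence[Sequence[float]]) -> Matrix:
    """Place a 2x2, 3x3 or 4x4 matrix in the top-left corner of a 4x4 identity."""
    n = len(m)
    if n not in (2, 3, 4) or any(len(row) != n for row in m):
        raise ValueError("expected a square 2x2, 3x3 or 4x4 matrix")
    out = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
    for r in range(n):
        for c in range(n):
            out[r][c] = float(m[r][c])
    return out


def transform(m4: Matrix, point: Sequence[float]) -> List[float]:
    """Multiply a 4x4 matrix by an XYZW point treated as a column vector."""
    return [sum(m4[r][c] * point[c] for c in range(4)) for r in range(4)]


# A 2x2 rotation by 90 degrees, applied to an XYZW point.
rot90 = [[0, -1],
         [1, 0]]
print(transform(embed_in_4x4(rot90), [1.0, 0.0, 5.0, 1.0]))
```

With identity padding, the components the smaller matrix does not touch (Z and W here) pass through unchanged, so one 4 × 4 multiply path can serve all three sizes.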
Among the advantages of a fused CPU-GPU ISA is the ability to implement a standard graphics pipeline in microcode, provide support for custom shaders and implement ray-tracing extensions. It also supports vectors for numerical simulations, with 8-bit integer data types for AI and machine learning.
Custom rasterizers can be implemented for splines, SubDiv surfaces and patches.
The design will be flexible enough to implement custom pipeline stages, custom geometry/pixel/frame buffer stages, custom tessellators and custom instancing operations.

The RV64X reference implementation will include:
- Instruction/Data SRAM Cache (32KB)
- Microcode SRAM (8KB)
- Dual Function Instruction Decoder
  - Hardwired implementing RV32V and X
  - Micro-coded Instruction Decoder for custom ISA
- Quad Vector ALU (32 bits/ALU, fixed/float)
- 136-bit Register Files (1K elements)
- Special Function Unit
- Texture Unit
- Configurable local Frame Buffer
The design is meant to be scalable, as indicated below.

The RV64X design has several novel ideas, including a fused, unified CPU-GPU ISA, configurable registers for custom data types, and user-defined SRAM-based microcode for application-defined custom hardware extensions for:
- Custom rasterizer stages
- Ray tracing
- Machine learning
- Computer vision
The same design serves as either a stand-alone graphics microcontroller or a scalable shader unit, and its data formats support FPGA-native or ASIC implementations.
Why is there a need for open graphics?
The developers think most graphics processors cover the high end, such as gaming, high-frequency trading, computer vision and machine learning. They believe the ecosystem lacks a scalable graphics core for more mainstream applications such as kiosks, billboards, casino gaming, toys, robotics, appliances, wearables, industrial human-machine interfaces, infotainment and automotive gauge clusters. Meanwhile, specialty programming languages must be used to program GPU cores for OpenGL, OpenCL, CUDA, DirectCompute and DirectX.
A graphics extension for RISC-V would resolve the scalability and multi-language burdens, enabling a higher level of use-case innovation.
Next steps
This is a very early spec, still in development and subject to change based on stakeholder and industry input. The team will establish a discussion forum. An immediate goal is building a sample implementation with an instruction set simulator, and an FPGA implementation using open-source IP and custom IP designed as an open-source project. Demos and benchmarks are being designed. Developers interested in participating should contact Atif Zafar.
As for the Libre-RISC 3D GPU, the group’s goal is to design a hybrid CPU, VPU and GPU. It’s not, as widely reported, a “dedicated exclusive GPU.” The option exists to create a stand-alone GPU product. Their primary goal is to design a complete all-in-one processor SoC that happens to include a Libre-licensed VPU and GPU.
What do we think?
The population of GPU suppliers is growing. We now have over a dozen.
Apple | Libre-RISC-V 3D GPU | Qualcomm |
AMD | Nvidia | RISC-V Graphics |
Arm | Intel | Think-Silicon |
DMP | Jingjia Micro | VeriSilicon |
Imagination Technologies |
Applications not listed as potential users of a free, flexible, small GPU include crypto-currency and mining.
If it’s the goal of the RISC-V community to emulate IP suppliers such as Arm and Imagination, then we can expect to see DSP, ISP and DP designs. There is at least one Open DSP proposal; perhaps it can be brought into the RISC-V community.
It will take at least two years before any hardware implementations emerge. One of the logical candidates for adopting this design is Xilinx, which is now using Arm’s Mali in its Zynq design. We would also expect to see several implementations come out of China.
Jon Peddie, a pioneer in the graphics industry, is president of Jon Peddie Research.