This post is basically a dump of resources I’ve encountered while doing a deep dive into GPU programming. I welcome pull requests against the repo for other useful resources. Also feel free to ask questions in issues, particularly if the answer might be in the form of a patch to this post.

Understanding the hardware

Intel

Intel is one of the best GPU hardware platforms to understand because it’s documented and a lot of the work is open source.

There’s also some academic literature:

One of the funky things about Intel is the varying subgroup width; it can be SIMD8, SIMD16, or SIMD32. The width is mostly determined by a compiler heuristic, but the new VK_EXT_subgroup_size_control extension gives the application some control over it.
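
To make the consequence of a varying width concrete, here is a minimal sketch in plain Python (not shader code). It assumes a hypothetical kernel that stores one partial result per subgroup in shared memory, and it assumes the workgroup is packed into as few subgroups as possible. The function names are made up for illustration. Since the width is not known until the compiler picks one, the scratch array has to be sized for the narrowest width.

```python
# The three subgroup widths mentioned above: SIMD8, SIMD16, SIMD32.
INTEL_SUBGROUP_WIDTHS = (8, 16, 32)


def subgroups_per_workgroup(workgroup_size, subgroup_width):
    """Number of subgroups needed to cover a workgroup (ceiling division).

    Assumption: invocations are packed into as few subgroups as possible.
    """
    return -(-workgroup_size // subgroup_width)


def worst_case_scratch_slots(workgroup_size, widths=INTEL_SUBGROUP_WIDTHS):
    """Scratch slots needed if one slot is kept per subgroup (hypothetical kernel).

    The narrowest width yields the most subgroups, so that case sets the size.
    """
    return max(subgroups_per_workgroup(workgroup_size, w) for w in widths)


if __name__ == "__main__":
    for width in INTEL_SUBGROUP_WIDTHS:
        print(width, subgroups_per_workgroup(256, width))
    print("slots to reserve:", worst_case_scratch_slots(256))
```

A kernel that hardcodes the slot count for SIMD32 would overrun its scratch array if the compiler chose SIMD8 instead.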

Nvidia

There’s a lot of interest and activity around Nvidia, but much of it is based on reverse engineering.

Understanding API capabilities

Subgroups

Subgroup operations (warp or SIMD-group operations such as shuffles) are very fast, but they are less widely supported (nonuniform shuffle is missing from HLSL/SM6), and you (mostly) don’t get to control the subgroup size, so portability is a lot harder.
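
As a rough illustration of what a shuffle-based subgroup operation does, here is a CPU simulation in plain Python of a sum reduction built from "shuffle down" steps. It is a sketch, not GPU code: the function names are hypothetical, the subgroup is modeled as a list with one entry per lane, and out-of-range lanes simply keep their own value (an assumption; real APIs define this case differently). Nothing in the code hardcodes the subgroup size, which is the habit portable code needs.

```python
def shuffle_down(values, delta):
    """Simulate a shuffle-down: lane i reads the value held by lane i + delta.

    Assumption: a lane whose source is outside the subgroup keeps its own value.
    """
    n = len(values)
    return [values[i + delta] if i + delta < n else values[i] for i in range(n)]


def subgroup_reduce_add(values):
    """Sum across one simulated subgroup in log2(size) shuffle steps."""
    size = len(values)
    assert size > 0 and size & (size - 1) == 0, "subgroup size must be a power of two"
    lanes = list(values)
    delta = size // 2
    while delta >= 1:
        shifted = shuffle_down(lanes, delta)
        # Every lane adds the value it pulled from the lane `delta` above it.
        lanes = [a + b for a, b in zip(lanes, shifted)]
        delta //= 2
    # Only lane 0 is guaranteed to hold the full sum.
    return lanes[0]


if __name__ == "__main__":
    for size in (8, 16, 32):
        data = list(range(size))
        assert subgroup_reduce_add(data) == sum(data)
    print("reduction matches sum() for subgroup sizes 8, 16, and 32")
```

The number of steps is log2 of the subgroup size (3 for SIMD8, 5 for SIMD32), so even the loop trip count depends on a value the shader usually does not get to choose.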

Languages

GLSL

HLSL

Metal Shading Language

OpenCL

  • clspv - compile OpenCL C (subset) to run on Vulkan compute shaders.

    • To me, this is evidence that Vulkan will simply eat OpenCL’s lunch. That view is still controversial, and Khronos people insist there’s an “OpenCL Next” roadmap.

TensorFlow

Exotic languages

SPIR-V

  • SPIRV-Cross - transpile SPIR-V into GLSL, HLSL, and Metal Shading Language

    • This is an integral part of portability layers including MoltenVK and gfx-rs.

WebGPU

WebGPU shader language

The discussion of the shader language has been very contentious. Very recently, a proposal emerged for a textual language that is semantically equivalent to SPIR-V, and there seems to be agreement that this is the path forward.

There were two previous proposals: a profile of SPIR-V (a binary format), and Apple’s Web High Level Shading Language, which evolved into Web Shading Language. Both had disadvantages that made them unacceptable to various people. It’s not possible to use SPIR-V directly, largely because it has undefined behavior and other unsafe stuff; the Google and Mozilla implementations addressed this by doing a rewrite pass. Conversely, Apple’s proposal met with considerable resistance because it didn’t deal with the diversity of GPU hardware in the field. There’s a lot of ecosystem work centered around Vulkan and SPIR-V, and leveraging that will help WebGPU considerably.