Chapter 7: Vulkan in practice

Warning

This chapter has only been proofread once.

This is a lazy chapter whose only aim is to provide pointers to some of the more practical aspects of Vulkan.

A. Good practices

Vulkan is low-level, which makes it possible to optimize programs written with it very thoroughly. However, using Vulkan does not make programs magically fast: higher-level APIs are heavily optimized for typical workloads, and Vulkan comes with a lot of performance footguns. In fact, a naive Vulkan-based program is bound to be slower than its OpenGL equivalent, for example. How do we avoid writing such naive programs? The resources below answer that question. Short version: parallelize command buffer recording, resource creation, descriptor set updates, pipeline creation, memory allocation and memory binding; minimize pipeline changes; do not make one allocation per resource but use sub-allocation instead; and use caching.
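To make the parallel-recording advice concrete, here is a minimal sketch in C. It assumes a VkDevice named device and a queueFamilyIndex already exist, and uses a hypothetical NUM_THREADS constant and create_per_thread_pools function; the point is simply that command pools must not be used from several threads at once, so each recording thread gets its own pool and command buffers:

    #include <vulkan/vulkan.h>

    #define NUM_THREADS 4  /* hypothetical number of recording threads */

    /* Sketch: one command pool (and one secondary command buffer) per thread,
     * so that all threads can record concurrently without sharing a pool. */
    void create_per_thread_pools(VkDevice device, uint32_t queueFamilyIndex,
                                 VkCommandPool pools[NUM_THREADS],
                                 VkCommandBuffer cmdBufs[NUM_THREADS])
    {
        for (uint32_t t = 0; t < NUM_THREADS; ++t) {
            VkCommandPoolCreateInfo poolInfo = {
                .sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,
                .queueFamilyIndex = queueFamilyIndex,
            };
            vkCreateCommandPool(device, &poolInfo, NULL, &pools[t]);

            VkCommandBufferAllocateInfo allocInfo = {
                .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO,
                .commandPool = pools[t],
                .level = VK_COMMAND_BUFFER_LEVEL_SECONDARY,
                .commandBufferCount = 1,
            };
            vkAllocateCommandBuffers(device, &allocInfo, &cmdBufs[t]);
            /* Each thread then records into its own cmdBufs[t]; the secondary
             * buffers are later executed from a primary command buffer. */
        }
    }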

B. The Vulkan loader architecture

By default, every Vulkan function call that we emit goes through the Vulkan loader, which is a library that implements the instance-level functionality of Vulkan and manages the available drivers (where the device-level functionality is implemented). The loader also manages the dispatch tables used to forward function calls to the appropriate addresses: one table for instance-level functions, created at vkCreateInstance time, plus an additional one per device, implicitly created at vkCreateDevice time.

Even when we do something that looks like static linking of Vulkan, we are actually only statically linking a small library that loads the Vulkan loader dynamically for us.

Although the loader is convenient, going through it makes every call less direct than it could be. For best performance, we should set up our own dispatch tables (using the vkGetInstanceProcAddr and vkGetDeviceProcAddr functions), as described here. Alternatively, we can use the third-party Volk meta-loader, which does this automatically for us and handles layers and extensions correctly. We can hope for a performance improvement in the low single-digit percents that way (Volk's original author, Arseny Kapoulkine, estimates it to be in the range of 1 to 5% for typical Vulkan applications, as detailed here).
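As an illustration, here is a minimal sketch of the manual approach (assuming an already-created VkDevice named device; load_direct_entry_points and record_draw are hypothetical names). A function pointer fetched with vkGetDeviceProcAddr points straight at the driver's implementation, so calling through it skips the loader's per-call dispatch:

    #include <vulkan/vulkan.h>

    static PFN_vkCmdDraw pfnCmdDraw;

    /* Fetch a device-level entry point once, after device creation. */
    void load_direct_entry_points(VkDevice device)
    {
        pfnCmdDraw = (PFN_vkCmdDraw)vkGetDeviceProcAddr(device, "vkCmdDraw");
    }

    /* Later, while recording a command buffer, call through the pointer
     * instead of through the loader's vkCmdDraw trampoline. */
    void record_draw(VkCommandBuffer cmd, uint32_t vertexCount)
    {
        pfnCmdDraw(cmd, vertexCount, 1, 0, 0);
    }

In practice we would fetch every entry point we use this way (or let Volk generate and fill the whole table for us).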

C. Memory allocation

Back in the resources and transfers chapter, we discussed how sub-allocation is the way to go for memory management, and how tools like Vulkan Memory Allocator can help us with that (if you are curious about how they work, Kyle Halladay wrote a blog post about how to write custom allocators). Well, please use it.
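To give an idea of what that looks like, here is a minimal sketch of creating a buffer through Vulkan Memory Allocator (assuming a recent VMA 3.x, an already-created instance, physicalDevice and device, and a hypothetical create_vma_buffer function; VMA's implementation itself has to be compiled once in a C++ translation unit with VMA_IMPLEMENTATION defined):

    #include <vk_mem_alloc.h>

    void create_vma_buffer(VkInstance instance, VkPhysicalDevice physicalDevice,
                           VkDevice device)
    {
        VmaAllocatorCreateInfo allocatorInfo = {
            .physicalDevice = physicalDevice,
            .device = device,
            .instance = instance,
        };
        VmaAllocator allocator;
        vmaCreateAllocator(&allocatorInfo, &allocator);

        VkBufferCreateInfo bufferInfo = {
            .sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO,
            .size = 65536,
            .usage = VK_BUFFER_USAGE_VERTEX_BUFFER_BIT |
                     VK_BUFFER_USAGE_TRANSFER_DST_BIT,
        };
        VmaAllocationCreateInfo allocInfo = {
            .usage = VMA_MEMORY_USAGE_AUTO,
        };
        VkBuffer buffer;
        VmaAllocation allocation;
        /* VMA picks a suitable memory type and sub-allocates the buffer from
         * one of its large VkDeviceMemory blocks, instead of calling
         * vkAllocateMemory once per resource. */
        vmaCreateBuffer(allocator, &bufferInfo, &allocInfo,
                        &buffer, &allocation, NULL);
    }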

D. Debugging

RenderDoc is a debugger for Vulkan. It can capture a frame and show the intermediate state of all the resources involved. You can see it in action in this short video by Oskar Schramm. You may also want to check this mesh-shading-centric Vulkanized 2022 presentation by Timur Kristóf (oh, and here's a cool presentation about the history and inner workings of the tool by Baldur Karlsson, its original author).
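Captures can also be triggered from inside the application through RenderDoc's in-application API (renderdoc_app.h). Here is a minimal sketch for Linux, assuming the program was launched from RenderDoc so that librenderdoc.so is already loaded into the process (init_renderdoc_api and render_frame_with_capture are hypothetical names):

    #include <dlfcn.h>
    #include <renderdoc_app.h>

    static RENDERDOC_API_1_1_2 *rdoc_api;

    /* Retrieve the API struct from the already-loaded RenderDoc library;
     * RTLD_NOLOAD only succeeds if the library is already in the process. */
    void init_renderdoc_api(void)
    {
        void *mod = dlopen("librenderdoc.so", RTLD_NOW | RTLD_NOLOAD);
        if (!mod)
            return; /* not running under RenderDoc */
        pRENDERDOC_GetAPI getApi =
            (pRENDERDOC_GetAPI)dlsym(mod, "RENDERDOC_GetAPI");
        getApi(eRENDERDOC_API_Version_1_1_2, (void **)&rdoc_api);
    }

    /* Bracket the work we want to inspect with a programmatic capture. */
    void render_frame_with_capture(void)
    {
        if (rdoc_api) rdoc_api->StartFrameCapture(NULL, NULL);
        /* ... record and submit the frame as usual ... */
        if (rdoc_api) rdoc_api->EndFrameCapture(NULL, NULL);
    }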