Chapter 7: Vulkan in practice

Warning

This chapter has only been proofread once.

This is a lazy chapter whose only aim is to provide pointers to some of the more practical aspects of Vulkan.

A. Good practices

Vulkan is low-level, which makes it possible to optimize programs written with it very thoroughly. However, using Vulkan does not make programs magically fast: higher-level APIs are heavily optimized for typical workloads, and Vulkan comes with a lot of performance footguns. In fact, a naive Vulkan-based program is bound to be slower than its OpenGL equivalent, for example. How do we avoid writing such naive programs? The resources below answer that question. Short version: parallelize command buffer recording, resource creation, descriptor set updates, pipeline creation, memory allocation and memory binding; minimize pipeline changes; do not make one allocation per resource but use sub-allocation instead; and use caching.
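To make the parallel-recording advice concrete, here is a minimal sketch in C. It assumes a VkDevice named device and a queueFamilyIndex already exist, and uses a hypothetical NUM_THREADS constant and create_per_thread_pools function; the point is simply that command pools must not be used from several threads at once, so each recording thread gets its own pool and command buffers:

    #include <vulkan/vulkan.h>

    #define NUM_THREADS 4  /* hypothetical number of recording threads */

    /* Sketch: one command pool (and one secondary command buffer) per thread,
     * so that all threads can record concurrently without sharing a pool. */
    void create_per_thread_pools(VkDevice device, uint32_t queueFamilyIndex,
                                 VkCommandPool pools[NUM_THREADS],
                                 VkCommandBuffer cmdBufs[NUM_THREADS])
    {
        for (uint32_t t = 0; t < NUM_THREADS; ++t) {
            VkCommandPoolCreateInfo poolInfo = {
                .sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,
                .queueFamilyIndex = queueFamilyIndex,
            };
            vkCreateCommandPool(device, &poolInfo, NULL, &pools[t]);

            VkCommandBufferAllocateInfo allocInfo = {
                .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO,
                .commandPool = pools[t],
                .level = VK_COMMAND_BUFFER_LEVEL_SECONDARY,
                .commandBufferCount = 1,
            };
            vkAllocateCommandBuffers(device, &allocInfo, &cmdBufs[t]);
            /* Each thread then records into its own cmdBufs[t]; the secondary
             * buffers are later executed from a primary command buffer. */
        }
    }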

B. The Vulkan loader architecture

By default, every Vulkan function call that we emit goes through the Vulkan loader, which is a library that implements the instance-level functionality of Vulkan and manages the available drivers (where the device-level functionality is implemented). The loader also manages the dispatch tables used to forward function calls to the appropriate addresses: one table for instance-level functions, created at vkCreateInstance time, plus an additional one per device, implicitly created at vkCreateDevice time.

Even when we do something that looks like static linking of Vulkan, we are actually only statically linking a small library that loads the Vulkan loader dynamically for us.

Although the loader is convenient, going through it makes every call less direct than it could be. For best performance, we should set up our own dispatch tables (using the vkGetInstanceProcAddr and vkGetDeviceProcAddr functions), as described here. Alternatively, we can use the third-party Volk meta-loader, which does this automatically for us and handles layers and extensions correctly. We can hope for a performance improvement in the low single-digit percents that way (Volk's original author, Arseny Kapoulkine, estimates it to be in the range of 1 to 5% for typical Vulkan applications, as detailed here).
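As an illustration, here is a minimal sketch of the manual approach (assuming an already-created VkDevice named device; load_direct_entry_points and record_draw are hypothetical names). A function pointer fetched with vkGetDeviceProcAddr points straight at the driver's implementation, so calling through it skips the loader's per-call dispatch:

    #include <vulkan/vulkan.h>

    static PFN_vkCmdDraw pfnCmdDraw;

    /* Fetch a device-level entry point once, after device creation. */
    void load_direct_entry_points(VkDevice device)
    {
        pfnCmdDraw = (PFN_vkCmdDraw)vkGetDeviceProcAddr(device, "vkCmdDraw");
    }

    /* Later, while recording a command buffer, call through the pointer
     * instead of through the loader's vkCmdDraw trampoline. */
    void record_draw(VkCommandBuffer cmd, uint32_t vertexCount)
    {
        pfnCmdDraw(cmd, vertexCount, 1, 0, 0);
    }

In practice we would fetch every entry point we use this way (or let Volk generate and fill the whole table for us).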

C. Memory allocation

Back in the resources and transfers chapter, we discussed how sub-allocation is the way to go for memory management, and how tools like Vulkan Memory Allocator can help us with that (if you are curious about how they work, Kyle Halladay wrote a blog post about how to write custom allocators). Well, please use it.
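To give an idea of what that looks like, here is a minimal sketch of creating a buffer through Vulkan Memory Allocator (assuming a recent VMA 3.x, an already-created instance, physicalDevice and device, and a hypothetical create_vma_buffer function; VMA's implementation itself has to be compiled once in a C++ translation unit with VMA_IMPLEMENTATION defined):

    #include <vk_mem_alloc.h>

    void create_vma_buffer(VkInstance instance, VkPhysicalDevice physicalDevice,
                           VkDevice device)
    {
        VmaAllocatorCreateInfo allocatorInfo = {
            .physicalDevice = physicalDevice,
            .device = device,
            .instance = instance,
        };
        VmaAllocator allocator;
        vmaCreateAllocator(&allocatorInfo, &allocator);

        VkBufferCreateInfo bufferInfo = {
            .sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO,
            .size = 65536,
            .usage = VK_BUFFER_USAGE_VERTEX_BUFFER_BIT |
                     VK_BUFFER_USAGE_TRANSFER_DST_BIT,
        };
        VmaAllocationCreateInfo allocInfo = {
            .usage = VMA_MEMORY_USAGE_AUTO,
        };
        VkBuffer buffer;
        VmaAllocation allocation;
        /* VMA picks a suitable memory type and sub-allocates the buffer from
         * one of its large VkDeviceMemory blocks, instead of calling
         * vkAllocateMemory once per resource. */
        vmaCreateBuffer(allocator, &bufferInfo, &allocInfo,
                        &buffer, &allocation, NULL);
    }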

D. Debugging

RenderDoc is a debugger for Vulkan. It can capture a frame and show the intermediate state of all the resources involved. You can see it in action in this short video by Oskar Schramm. You may also want to check this mesh-shading-centric Vulkanized 2022 presentation by Timur Kristóf (oh, and here's a cool presentation about the history and inner workings of the tool by Baldur Karlsson, its original author).
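Captures can also be triggered from inside the application through RenderDoc's in-application API (renderdoc_app.h). Here is a minimal sketch for Linux, assuming the program was launched from RenderDoc so that librenderdoc.so is already loaded into the process (init_renderdoc_api and render_frame_with_capture are hypothetical names):

    #include <dlfcn.h>
    #include <renderdoc_app.h>

    static RENDERDOC_API_1_1_2 *rdoc_api;

    /* Retrieve the API struct from the already-loaded RenderDoc library;
     * RTLD_NOLOAD only succeeds if the library is already in the process. */
    void init_renderdoc_api(void)
    {
        void *mod = dlopen("librenderdoc.so", RTLD_NOW | RTLD_NOLOAD);
        if (!mod)
            return; /* not running under RenderDoc */
        pRENDERDOC_GetAPI getApi =
            (pRENDERDOC_GetAPI)dlsym(mod, "RENDERDOC_GetAPI");
        getApi(eRENDERDOC_API_Version_1_1_2, (void **)&rdoc_api);
    }

    /* Bracket the work we want to inspect with a programmatic capture. */
    void render_frame_with_capture(void)
    {
        if (rdoc_api) rdoc_api->StartFrameCapture(NULL, NULL);
        /* ... record and submit the frame as usual ... */
        if (rdoc_api) rdoc_api->EndFrameCapture(NULL, NULL);
    }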