Entries for tag "libraries", ordered from most recent. Entry count: 38.
# WinFontRender - my new library
Displaying text is a common problem in graphics applications where all you can do is to render textured quads. I've implemented my solution already back in 2007, as part of my old engine The Final Quest 7, which was my master thesis. I've recently come back to this code and improved it because I needed it for the personal project I now work on. Then I thought: Maybe it's a good idea to extract this code into a library? So here it is:
It does two things:
1. It renders characters of the font to a texture, tightly packed.
2. It calculates vertices needed to render given text.
Here are more details about the library:
# Vulkan Memory Allocator Survey March 2019
Are you a software developer, use Vulkan and the Vulkan Memory Allocator library (or at least considered using it)? If so, please spend a few minutes and help to shape the future of the library by participating in the survey:
Your feedback is greatly appreciated. The survey is anonymous - no personal data is collected like name, e-mail etc. All questions are optional.
# How to design API of a library for Vulkan?
In my previous blog post yesterday, I shared my thoughts on graphics APIs and libraries. Another problem that brought me to these thoughts is a question: How do you design an API for a library that implements a single algorithm, pass, or graphics effect, using Vulkan or DX12? It may seem trivial at first, like a task that just needs to be designed and implemented, but if you think about it more, it turns out to be a difficult issue. They are few software libraries like this in existence. I don’t mean here a complex library/framework/engine that “horizontally” wraps the entire graphics API and takes it to a higher level, like V-EZ, Nvidia Falcor, or Google Filament. I mean just a small, “vertical”, plug-in library doing one thing, e.g. implementing ambient occlusion effect, efficient texture mipmap down-sampling, rendering UI, or simulating particle physics on the GPU. Such library needs to interact efficiently with the rest of the user’s code to be part of a large program or game. Vulkan Memory Allocator is also not a good example of this, because it only manages memory, implements no render passes, involves no shaders, and it interacts with a command buffer only in its part related to memory defragmentation.
I met this problem at my work. Later I also discussed it in details with my colleague. There are multiple questions to consider:
VK_IMAGE_USAGE_flags. If the library should allocate them internally, how should it do it? By just grabbing new pieces of
VkDeviceMemoryblocks? What if the user prefers all GPU memory to be allocated using a complex allocator of his choice, like Vulkan Memory Allocator?
VkDescriptorPool, but what if the user prefers all the descriptors to be allocated from his own descriptor pool?
vkGetDeviceProcAddr, possibly with help of something like volk? The library should be able to use those.
VkAllocationCallbacks) to all Vulkan functions? The library should be able to do it.
This is a problem similar to what we have with any C++ libraries. There is no consensus about the implementation of various basic facilities, like strings, containers, asserts, mutexes etc., so every major framework or game engine implements its own. Even something so simple as min/max function is defined is multiple places. It is defined once in
<algorithm> header, but some developers don’t use STL.
<Windows.h> provides its own, but these are defined as macros, so they break any other, unless you
#define NOMINMAX before the include… A typical C++ nightmare. Smaller libraries are better just configurable or define their own everything, like the Vulkan Memory Allocator having its own assert, vector (can be switched to standard STL one), and 3 versions of read-write mutex.
All these issues make it easier for developers to just write a paper, describe their algorithm, possibly share a piece of code, pseudo-code or a shader, rather than provide ready to use library. This is a very bad situation. I hope that over time patterns emerge of how the API of a library implementing a single pass or effect using Vulkan/DX12 should look like. Recently my colleague shared an idea with me that if there was some higher-level API that would implement all these interactions between various parts (like resource allocation, image barriers) and we all commonly agreed on using it, then authoring libraries and stitching them together on top of it would be way easier. That’s another argument for the need of such new, higher-level graphics API.
# Thoughts on graphics APIs and libraries
Warning: This is a long rant. I’d like to share my personal thoughts and opinions on graphics APIs like Vulkan, Direct3D 12.
Some time ago I came up with a diagram showing how the graphics software technologies evolved over last decades – see my blog post “Lower-Level Graphics API - What Does It Mean?”. The new graphics APIs (Direct3D 12, Vulkan, Metal) are not only a clean start, so they abandon all the legacy garbage going back to ‘90s (like
glVertex), but they also take graphics programming to a new level. It is a lower level – they are more explicit, closer to the hardware, and better match how modern GPUs work. At least that’s the idea. It means simpler, more efficient, and less error-prone drivers. But they don’t make the game or engine programming simpler. Quite the opposite – more responsibilities are now moved to engine developers (e.g. memory management/allocation). Overall, it is commonly considered a good thing though, because the engine has higher-level knowledge of its use cases (e.g. which textures are critically important and which can be unloaded when GPU memory is full), so it can get better performance by doing it properly. All this is hidden in the engines anyway, so developers making their games don’t notice the difference.
Those of you, who – just like me – deal with those low-level graphics APIs in their everyday work, may wonder if these APIs provide the right level of abstraction. I know it will sound controversial, but sometimes I get a feeling they are at the exactly worst possible level – so low they are difficult to learn and use properly, while so high they still hide some implementation details important for getting a good performance. Let’s take image/texture barriers as an example. They were non-existent in previous APIs. Now we have to do them, which is a major pain point when porting old code to a new API. Do too few of them and you get graphical corruptions on some GPUs and not on the others. Do too many and your performance can be worse than it has been on DX11 or OGL. At the same time, they are an abstract concept that still hides multiple things happening under the hood. You can never be sure which barrier will flush some caches, stall the whole graphics pipeline, or convert your texture between internal compression formats on a specific GPU, unless you use some specialized, vendor-specific profiling tool, like Radeon GPU Profiler (RGP).
It’s the same with memory. In DX11 you could just specify intended resource usage (
D3D11_USAGE_DYNAMIC) and the driver chose preferred place for it. In Vulkan you have to query for memory heaps available on the current GPU and explicitly choose the one you decide best for your resource, based on low-level flags like
VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT etc. AMD exposes 4 memory types and 3 memory heaps. Nvidia has 11 types and 2 heaps. Intel integrated graphics exposes just 1 heap and 2 types, showing the memory is really unified, while AMD APU, also integrated, has same memory model as the discrete card. If you try to match these to what you know about physically existing video RAM and system RAM, it doesn’t make any sense. You could just pick the first
DEVICE_LOCAL memory for the fastest GPU access, but even then, you cannot be sure your resource will stay in video RAM. It may be silently migrated to system RAM without your knowledge and consent (e.g. if you go out of memory), which will degrade performance. What is more, there is no way to query for the amount of free GPU memory in Vulkan, unless you do hacks like using DXGI.
Hardware queues are no better. Vulkan claims to give explicit access to the pieces of GPU hardware, so you need to query for queues that are available. For example, Intel exposes only a single graphics queue. AMD lets you create up to 3 additional compute-only queues and 2 transfer queues. Nvidia has 8 compute queues and 1 transfer queue. Do they all really map to silicon that can work in parallel? I doubt it. So how many of them to use to get the best performance? There is no way to tell by just using Vulkan API. AMD promotes doing compute work in parallel with 3D rendering while Nvidia diplomatically advises to be “conscious” with it.
It's the same with presentation modes. You have to enumerate
VkPresentModeKHR-s available on the machine and choose the right one, along with number of images in the swapchain. These don't map intuitively to a typical user-facing setting of V-sync = on/off, as they are intended to be low level. Still you have no control and no way to check whether the driver does "blit" or "flip".
One could say the new APIs don’t deliver to their promise of being low level, explicit, and having predictable performance. It is impossible to deliver, unless the API is specific to one GPU, like there is on consoles. A common API over different GPUs is always high level, things happen under the hood, and there are still fast and slow paths. Isn’t all this complexity just for nothing? It may be true that comparing to previous generation APIs, drivers for the new ones need not launch additional threads in the background or perform shader compilation on first draw call, which greatly reduces chances of major hitching. (We will see how long this state will persist as the APIs and drivers evolve.) * Still there is no way to predict or ensure minimum FPS/maximum frame time. We are talking about systems where multiple processes compete for resources. On modern PCs there is even no way to know how many cycles will a single instruction take! Cache memory, branch prediction, out-of-order execution – all of these mechanisms are there in the CPU to speed up average cases, but there can always be cases when it works slowly (e.g. cache miss). It’s the same with graphics. I think we should abandon the false hope of predictable performance as a thing of the past, just like rendering graphics pixel-perfect. We can optimize for the average, but we cannot ensure the minimum. After all, games are “soft real-time systems”.
Based on that, I am thinking if there is a room for a new graphics API or top of DX12 or Vulkan. I don’t mean whole game engine with physical simulation, handling sound, input controllers and all, like Unity or UE4. I mean an API just like DX11 or OGL, on a similar or higher abstraction level (if higher level, maybe the concept of persistent “frame graph” with explicit pass and resource dependencies is the way to go?). I also don’t think it’s enough to just reimplement any of those old APIs. The new one should take advantage of features of the explicit APIs (like parallel command buffer recording), while hiding the difficult parts (e.g. queues, memory types, descriptors, barriers), so it’s easier to use and harder to misuse. (An existing library similar to this concept is V-EZ from AMD.) I think it may still have good performance. The key thing needed for creation of such library is abandoning the assumption that developer must define everything up-front, with nothing allocated, created, or transferred on first use.
See also next post: "How to design API of a library for Vulkan?"
Update 2019-02-12: I want to thank all of you for the amazing feedback I received after publishing this post, especially on Twitter. Many projects have been mentioned that try to provide an API better than Vulkan or DX12 - e.g. Apple Metal, WebGPU, The Forge by Confetti.
* Update 2019-04-16: Microsoft just announced they are adding background shader optimizations to D3D12, so driver can recompile and optimize shaders in the background on its own threads. Congratulations! We are back at D3D11 :P
# Debugging D3D12 driver crash
New generation, explcit graphics APIs (Vulkan and DirectX 12) are more efficient, involve less CPU overhead. Part of it is that they don't check most errors. In old APIs (Direct3D 9, OpenGL) every function call was validated internally, returned success of failure code, while driver crash indicated a bug in driver code. New APIs, on the other hand, rely on developer doing the right thing. Of course, some functions still return error code (especially ones that allocate memory or create some resource), but those that record commands into a command list just return
void. If you do something illegal, you can expect undefined behavior. You can use Validation Layers / Debug Layer to do some checks, but otherwise everything may work fine on some GPUs, you may get incorrect result, or you may experience driver crash or timeout (called "TDR"). Good thing is that (contrary to old Windows XP), crash inside graphics driver doesn't cause "blue screen of death" or machine restart. System just restarts graphics hardware and driver, while your program receives
DXGI_ERROR_DEVICE_REMOVED code from one of functions like
IDXGISwapChain::Present. Unfortunately, you then don't know which specific draw call or other command caused the crash.
NVIDIA proposed solution for that: they created NVIDIA Aftermath library. It lets you (among other things) record commands that write custom "marker" data to a buffer that survives driver crash, so you can later read it and see which command was successfully executed last. Unfortunately, this library works only with NVIDIA graphics cards.
Some time ago I showed a portable solution for Vulkan in my post: "Debugging Vulkan driver crash - equivalent of NVIDIA Aftermath". Now I'd like to present a solution for Direct3D 12. It turns out that this API also provides a standardized way to achieve this, in form of a method ID3D12GraphicsCommandList2::WriteBufferImmediate. One caveat: This new version of the interface requires:
I created a simple library that implements all the required logic under easy interface, which I called D3d12AfterCrash. You can find all the details and instruction for how to use it in file "D3d12AfterCrash.h".
I guess it would be better to allocate the buffer using WinAPI function
VirtualAlloc(NULL, bufferSize, MEM_COMMIT, PAGE_READWRITE), then call
ID3D12Device::CreatePlacedResource, but my simple way of just doing
ID3D12Device::CreateCommittedResource seems to work - buffer survives driver crash and preserves its content. I checked it on AMD as well as NVIDIA card.
Update 2019-01-28: Microsoft is releasing a new, more powerful API for analyzing reasons of driver/GPU crash, called Device Removed Extended Data (DRED).
# Vulkan Memory Allocator 2.1.0
Yesterday I merged changes in the code of Vulkan Memory Allocator that I've been working on for past few months to "master" branch, which I consider a major milestone, so I marked it as version 2.1.0-beta.1. There are many new features, including:
The release also includes many smaller bug fixes, improvements and additions. Everything is tested and documented. Yet I call it "beta" version, to encourage you to test it in your project and send me your feedback.
# str_view - null-termination-aware string-view class for C++
tl;dr I've written a small library, which I called "str_view - null-termination-aware string-view class for C++". You can find code and documentation on GitHub - sawickiap/str_view. Read on to see full story behind it...
Let me disclose my controversial beliefs: I like C++ STL. I think that any programming language needs to provide some built-in strings and containers to be called modern and suitable for developing large programs. But of course I'm aware that careless use of classes like
std::map makes program very slow due to large number of dynamic allocations.
What I value the most is RAII - the concept that memory is automatically freed whenever an object referenced by value is destroyed. That's why I use
std::unique_ptr all over the place in my personal code. Whenever I create and own an array, I use
std::vector, but when I just pass it to some other code for reading, I pass raw pointer and number of elements -
myVec.size(). Similarly, whenever I own and build a string, I use
std::string (or rather
std::wstring - I like Unicode), but when I pass it somewhere for reading, I use raw pointer.
There are multiple ways a string can be passed. One is pointer to first character and number of characters. Another one is pointer to first character and pointer to the next after last character - a pair of iterators, also called range. These two can be trivially converted between each other. Out of these, I prefer pointer + length, because I think that number of characters is slightly more often needed than pointer past the end.
But there is another way of passing strings common in C and C++ programs - just one pointer to a string that needs to be null-terminated. I think that null-terminated strings is one of the worst and the most stupid inventions in computer science. Not only it limits set of characters available to be used in string content by excluding
'\0', but it also makes calculation of string length O(n) time complexity. It also creates opportunity for security bugs. Still we have to deal with it because that's the format that most libraries expect.
I came up with an idea for a class that would encapsulate a reference to an externally-owned, immutable string, or a piece of thereof. Objects of such class could be used to pass strings to library functions instead of e.g. a pointer to null-terminated string or a pair of iterators. They can be then queried for
length(), indexed to access individual characters etc., as well as asked for a null-terminated copy using
c_str() method - similar to
Code like this already exists, e.g. C++17 introduces class
std::string_view. But my implementation has a twist that I'm quite happy with, which made me call my class "null-termination-aware". My
str_view class not only remembers pointer and length of the referred string, but also the way it was created to avoid unnecessary operations and lazily evaluate those that are requested.
c_str()trivially returns pointer to the original string.
length()trivially returns it.
c_str()creates a local, null-terminated copy of the string upon first call.
If you consider such class useful in your C++ code, see GitHub - sawickiap/str_view project for code (it's just a single header file), documentation, and extensive set of tests. I share this code for free, on MIT license. Feel free to contact me if you find any bugs or have any suggestions regarding this library.
# Human-friendly classification of Vulkan resources
In graphics programming we deal with different kinds of resources. Their specific types and names depend on graphics API. For example, in Direct3D 9 we have vertex buffers, index buffers, constant buffers, textures etc. OpenGL equivalent of constant buffer is uniform buffer object (UBO).
Vulkan has only two types of resources: buffers and images. This may be the only thing that is simpler in Vulkan than in other APIs :) When creating such resource, we specify usage flags that define how do we intend to use it. For example,
VK_BUFFER_USAGE_VERTEX_BUFFER_BIT means that a buffer may be used as vertex buffer.
VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT means that an image may be used as color attachment (which is Vulkan name for “render target”).
Such flags may be combined together, so a single buffer can contain data to be used as vertex buffer, index buffer, and uniform buffer. I’m not 100% sure if this is guaranteed by the specification (theoretically some drivers could return disjoint sets of
VkMemoryRequirements::memoryTypeBits for different usage flags), but I think that every real implementation allows that. It means we cannot clearly classify buffers and images into categories. Despite that, I decided to develop a human-friendly classification of Vulkan resources into several categories, starting from most “special”, and ending with most “common/generic” ones. I propose following algorithm:
For buffers: // Buffer is used as source of data for fixed-function stage of graphics pipeline. // It’s indirect, vertex, or index buffer. if ((usage & (VK_BUFFER_USAGE_INDIRECT_BUFFER_BIT | VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_INDEX_BUFFER_BIT)) != 0) return class0; // Buffer is accessed by shaders for load/store/atomic. // Aka “UAV” else if ((usage & (VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_STORAGE_TEXEL_BUFFER_BIT)) != 0) return class1; // Buffer is accessed by shaders for reading uniform data. // Aka “constant buffer” else if ((usage & (VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT | VK_BUFFER_USAGE_UNIFORM_TEXEL_BUFFER_BIT)) != 0) return class2; // Any other type of buffer. // Notice that VK_BUFFER_USAGE_TRANSFER_SRC_BIT and VK_BUFFER_USAGE_TRANSFER_DST_BIT // flags are intentionally ignored. else return class3; For images: // Image is used as depth/stencil “texture/surface”. if ((usage & VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT) != 0) return class0; // Image is used as other type of attachment. // Aka “render target” else if ((usage & (VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT | VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT | VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT)) != 0) return class1; // Image is accessed by shaders for sampling. // Aka “texture” else if ((usage & VK_IMAGE_USAGE_SAMPLED_BIT) != 0) return class2; // Any other type of image. // Notice that VK_IMAGE_USAGE_TRANSFER_SRC_BIT and VK_IMAGE_USAGE_TRANSFER_DST_BIT // flags are intentionally ignored. else return class3;
I needed this because I wanted to introduce better coloring to VMA Dump Vis. Vulkan Memory Allocator (VMA) is a C++ library that simplifies GPU memory management in Vulkan applications. VMA Dump Vis is a Python script that can visualize JSON dump from this library on a picture. As I updated the library to remember usage flags of created resources, I wanted to use them to show more information on the picture. To do this, I defined following color scheme:
Example visualization of Vulkan memory in some game:
This color scheme is carefully designed. I based it on following principles:
VK_IMAGE_TILING_OPTIMALshould be used wherever possible, so those with
VK_IMAGE_TILING_LINEARare just marked with green. Images of unknown tiling have color between cyan and green.
You can already visualize your Vulkan memory with all these colors if you grab Vulkan Memory Allocator from development branch. I think that this classification of GPU resources and accompanying color scheme could also be useful for other graphics APIs.