78 changes: 78 additions & 0 deletions docs/own/descriptorBuffer.md
@@ -0,0 +1,78 @@
# Support for VK_EXT_descriptor_buffer

Main challenge: How do we get the descriptors from a record?
Cannot do so at submission time anymore.

Let's assume we hook a record for which we want to introspect a
specific descriptor binding:

- in the hooked record, we can access and save the descriptor data blob
- but how to resolve that data blob and copy the data??
- worst case: mutable descriptors. We can't even know the type of
the descriptor

Just don't allow descriptor introspection for now, there is no easy and
fast solution that works in all cases.

## Best-we-can-do approach for later

How far do we get with non-mutable descriptors?
Looking stuff up on the gpu will likely not work properly.
Complicated hashmap data structures on the GPU?
+ device generated commands?? lol

How far could we get with the copy_indirect extension?
hm, not so far.

Idea! We don't need to know the handle on the gpu.
We could just access the descriptor! Just copy via shader.
Still has some limitations (acceleration structures?) but it's a start.

How could we handle acceleration structures?
Could make sure state isn't overwritten by submission and store some
serial number to identify it later on.
-> for later

### Can we handle mutable descriptors with this?

Can we somehow encode the type into the descriptor? i.e. change
the return value of GetDescriptorEXT?
while the descriptor still works? meh, likely not
Would a lookup map even work? Could different descriptor types end
up with the same memory? Unlikely but possible I guess.

sad :(
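
A minimal CPU-side sketch of the lookup-map idea and its failure mode, assuming the layer could record the bytes written by GetDescriptorEXT together with the requested type. `DescriptorKind` and `DescriptorTypeMap` are hypothetical names for illustration, not vil code, and the real lookup would have to happen on the GPU.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <set>
#include <vector>

// Hypothetical stand-in for VkDescriptorType, to keep the sketch self-contained.
enum class DescriptorKind { sampler, sampledImage, storageImage, uniformBuffer, storageBuffer };

// Maps raw descriptor bytes to every kind they were written for.
class DescriptorTypeMap {
public:
    // Assumption: called wherever the layer forwards GetDescriptorEXT.
    void record(const std::vector<std::uint8_t>& blob, DescriptorKind kind) {
        kinds_[blob].insert(kind);
    }

    // Returns the kind only if the blob is known and unambiguous.
    std::optional<DescriptorKind> resolve(const std::vector<std::uint8_t>& blob) const {
        auto it = kinds_.find(blob);
        if(it == kinds_.end() || it->second.size() != 1u) {
            return std::nullopt;
        }
        return *it->second.begin();
    }

    // True if different kinds produced exactly the same bytes.
    bool ambiguous(const std::vector<std::uint8_t>& blob) const {
        auto it = kinds_.find(blob);
        return it != kinds_.end() && it->second.size() > 1u;
    }

private:
    std::map<std::vector<std::uint8_t>, std::set<DescriptorKind>> kinds_;
};
```

If `ambiguous` ever returns true for a blob, the map alone cannot tell the type, which is exactly the "unlikely but possible" case above.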

---

Return our own handles from GetDescriptorEXT and let a compute
shader run before each draw/dispatch that fixes everything up? :D
Terrible idea.

---

Maybe we can implement heuristics for the type?
e.g. looking at the different descriptor sizes

have a look at how the shader accesses the descriptor?
might still be only bound to single binding, not aliased?

that together with hash map on gpu (that should usually work)
might be enough in like 99% of the cases.
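
One possible reading of the size heuristic, as a rough sketch: filter the types a mutable binding allows by the descriptor size that was observed. `DescriptorSizes` is a hypothetical subset of the per-type sizes a driver reports (compare VkPhysicalDeviceDescriptorBufferPropertiesEXT); how the "observed size" would be obtained is an assumption and left open here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for VkDescriptorType.
enum class DescriptorKind { sampledImage, storageImage, uniformBuffer, storageBuffer };

// Hypothetical subset of driver-reported descriptor sizes, in bytes.
struct DescriptorSizes {
    std::size_t sampledImage;
    std::size_t storageImage;
    std::size_t uniformBuffer;
    std::size_t storageBuffer;
};

std::size_t sizeOf(const DescriptorSizes& sizes, DescriptorKind kind) {
    switch(kind) {
        case DescriptorKind::sampledImage: return sizes.sampledImage;
        case DescriptorKind::storageImage: return sizes.storageImage;
        case DescriptorKind::uniformBuffer: return sizes.uniformBuffer;
        case DescriptorKind::storageBuffer: return sizes.storageBuffer;
    }
    return 0u;
}

// Keeps only the allowed kinds whose descriptor size matches the observed one.
std::vector<DescriptorKind> candidatesBySize(const DescriptorSizes& sizes,
        const std::vector<DescriptorKind>& allowed, std::size_t observedSize) {
    std::vector<DescriptorKind> result;
    for(const auto kind : allowed) {
        if(sizeOf(sizes, kind) == observedSize) {
            result.push_back(kind);
        }
    }
    return result;
}
```

When several types share a size (common for uniform and storage buffers), more than one candidate remains and the other heuristics (shader access, gpu hash map) would have to break the tie.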

### How to indirectly copy

Indirect dispatch. But how to know the size?
For images and storage buffers, we can query it!
Uniform buffers? meh
Just copy a couple of bytes and figure it out later on the CPU? :D
Inspect shader that uses it?
if the slot is bound as a uniform buffer, just use its size.
if it has multiple uniform buffers aliased at the binding,
just choose the smallest? edge case anyways
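
The uniform buffer rule above, as a small sketch. `UniformBlockDecl` is a hypothetical shader reflection result (one entry per uniform block declared in the shader), not an existing vil type.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical reflection entry: one uniform block declared at (set, binding).
struct UniformBlockDecl {
    std::uint32_t set;
    std::uint32_t binding;
    std::uint64_t size; // declared block size in bytes
};

// Returns the number of bytes to copy for a uniform buffer slot:
// the declared size, or the smallest one if several blocks alias the slot.
// Returns nullopt if the shader declares no uniform block there.
std::optional<std::uint64_t> uniformCopySize(const std::vector<UniformBlockDecl>& decls,
        std::uint32_t set, std::uint32_t binding) {
    std::optional<std::uint64_t> result;
    for(const auto& decl : decls) {
        if(decl.set != set || decl.binding != binding) {
            continue;
        }
        result = result ? std::min(*result, decl.size) : decl.size;
    }
    return result;
}
```

Choosing the smallest size is the conservative option: every aliased declaration covers at least that many bytes, so the copy never reads past what the shader expects to be bound.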

for image/storage buffer: how to create/allocate dst memory?
feedback loop about size like we already do for transform feedback etc
at some point we can think about a dynamic allocator on the gpu
(requiring us just to reserve a buffer range instead of creating
a resource)
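
A rough sketch of the size feedback loop for the destination memory, assuming the GPU copy reports how many bytes it would have needed and the CPU reads that value back once the hooked submission has finished. `FeedbackSizedTarget` is a hypothetical name and the doubling policy is an assumption; this is not how vil's existing transform feedback path is implemented.

```cpp
#include <algorithm>
#include <cstdint>

// CPU-side bookkeeping for a copy destination whose required size
// is only known after the GPU has run.
class FeedbackSizedTarget {
public:
    explicit FeedbackSizedTarget(std::uint64_t initialCapacity)
        : capacity_(initialCapacity) {}

    // Size to allocate for the next hooked submission.
    std::uint64_t capacity() const { return capacity_; }

    // Call with the size the GPU reported for the finished submission.
    // Returns true if the data copied in that submission was complete.
    bool onFeedback(std::uint64_t required) {
        const bool complete = required <= capacity_;
        if(!complete) {
            // Grow at least geometrically so repeated small increases
            // do not force a reallocation every frame.
            capacity_ = std::max(required, capacity_ * 2u);
        }
        return complete;
    }

private:
    std::uint64_t capacity_;
};
```

The first hooked submission after a size increase yields truncated data, and the next one is complete. That one-frame delay is the price of not knowing the size up front.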
62 changes: 62 additions & 0 deletions docs/own/test.md
@@ -18,6 +18,8 @@ at times but we shouldn't spend too much time on it in general.
There are already great vulkan test suites we can use for the layer as well.
E.g. the Vulkan CTS (testing WIP) and the Vulkan validation layer tests.

## Validation layer tests

Especially the positive tests from the validation layers have proven
extremely useful, they found many subtle issues.
Current filter:
@@ -28,3 +30,63 @@ Current filter:

Some of them are in there because they crash my driver (radv, fall 2022)
and some because vil has no support/they are known issues (e.g. sparse memory, external sync, two instances).

---

As of December 2025, there are some additional steps needed to run the
validation layer tests. Especially `VK_ADD_LAYER_PATH` is needed, otherwise
the validation tests override the layer path and vil cannot be found.
I usually export:

```
VK_ADD_LAYER_PATH=./layers/
VK_INSTANCE_LAYERS=VK_LAYER_live_introspection

# vil configuration
VIL_DLG_HANDLER=1
VIL_CB_TEST_HOOK=1

# optional, makes asserts easier to debug
VIL_BREAK_ON_ERROR=1

# optional, to see *everything*
VIL_MIN_LOG_LEVEL=trace
```

## Proton, Wine, DXVK, VKD3D

Good tests for some advanced features.
Example command line:

```
VKD3D_CONFIG=no_staggered_submit
LD_PRELOAD=/usr/lib/libxkbcommon.so
PROTON_ENABLE_WAYLAND=0
DXVK_DEBUG=markers
VIL_DLG_HANDLER=1
VIL_LOG_FILE=/home/jan/vil-steam
VIL_WAIT_SURFACE=1
PROTON_DISABLE_NVAPI=1
VK_INSTANCE_LAYERS=VK_LAYER_live_introspection
VIL_CREATE_WINDOW=1
VIL_HOOK_OVERLAY=0
VIL_ALLOW_UNSUPPORTED_EXTS=1
PROTON_LOG=1
%command%
```

- no_staggered_submit for vkd3d is highly useful as tracking commands over
multiple frames becomes very hard otherwise
- preloading of xkbcommon seems to be needed since wine/proton ships its
own version that seems to cause issues. (ABI incompatible? old version? idk)
- VIL_WAIT_SURFACE seems to be needed, not sure why
- PROTON_DISABLE_NVAPI might fix some issues
- will create log files in homedir:
  - 'steam-$APPID' for the proton log
  - 'vil-steam' for the vil log

Useful: api dump. TODO: with newer proton versions, we need to redirect it to a file
```
VK_LUNARG_API_DUMP_PRE_DUMP=true
VK_INSTANCE_LAYERS=VK_LAYER_LUNARG_api_dump:VK_LAYER_live_introspection
```
12 changes: 11 additions & 1 deletion docs/own/workstack.md
@@ -1,10 +1,20 @@
- [ ] try to enable bufferDeviceAddress
- [ ] fix VIL_ALLOW_UNSUPPORTED_EXTS to not filter out exts
- [ ] or add new var for this?
- [ ] support shader debugging with spirv cross: spirv -> hlsl/glsl decompilation
- [ ] support live shader replacement?
- [ ] support ray tracing pipeline libraries
- [ ] for shader patching
- [ ] try to enable bufferDeviceAddress if possible
- [ ] fix errors with validation tests
- [ ] document how to run validation tests
- [ ] when VIL_SKIP_EXT_CHECK is set (or other env var?) override supported
extensions in that function. Investigate how to make this work.
Can be provided in layer manifest or something?

- [ ] support full and+or expressions for "required" extension field
in layer.cpp function list.
e.g. vkCmdSetDescriptorBufferOffsets2EXT: (vulkan1.4|maintenance6) + EXT_descriptor_buffer

- [ ] implement VK_KHR_dynamic_rendering_local_read for core 1.4
- [ ] implement VK_KHR_pipeline_executable_properties
- [ ] fix invalid pipeline barrier with BeginRendering (test e.g. with iro gpuDebugDraw)
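
A minimal sketch of how the and+or expressions for the "required" field could be evaluated, using the syntax from the vkCmdSetDescriptorBufferOffsets2EXT example above (`+` for and, `|` for or, parentheses for grouping). Assumptions: `+` binds tighter than `|`, names consist of alphanumerics, `_` and `.`, and the set of enabled names is supplied by the caller. `RequirementParser` and `requirementMet` are hypothetical names, not part of layer.cpp.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <string_view>

// Recursive descent evaluator for expressions like "(a|b) + c".
class RequirementParser {
public:
    RequirementParser(std::string_view expr, const std::set<std::string>& enabled)
        : expr_(expr), enabled_(enabled) {}

    bool evaluate() {
        pos_ = 0u;
        const bool result = parseOr();
        skipSpaces();
        if(pos_ != expr_.size()) {
            throw std::invalid_argument("unexpected trailing input");
        }
        return result;
    }

private:
    bool parseOr() {
        bool result = parseAnd();
        while(consume('|')) {
            const bool rhs = parseAnd(); // always parse the right side
            result = result || rhs;
        }
        return result;
    }

    bool parseAnd() {
        bool result = parseAtom();
        while(consume('+')) {
            const bool rhs = parseAtom();
            result = result && rhs;
        }
        return result;
    }

    bool parseAtom() {
        if(consume('(')) {
            const bool result = parseOr();
            if(!consume(')')) {
                throw std::invalid_argument("missing ')'");
            }
            return result;
        }
        skipSpaces();
        const std::size_t start = pos_;
        while(pos_ < expr_.size() && isNameChar(expr_[pos_])) {
            ++pos_;
        }
        if(pos_ == start) {
            throw std::invalid_argument("expected a name");
        }
        return enabled_.count(std::string(expr_.substr(start, pos_ - start))) != 0u;
    }

    bool consume(char c) {
        skipSpaces();
        if(pos_ < expr_.size() && expr_[pos_] == c) {
            ++pos_;
            return true;
        }
        return false;
    }

    void skipSpaces() {
        while(pos_ < expr_.size() && std::isspace(static_cast<unsigned char>(expr_[pos_]))) {
            ++pos_;
        }
    }

    static bool isNameChar(char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '.';
    }

    std::string_view expr_;
    const std::set<std::string>& enabled_;
    std::size_t pos_ {};
};

bool requirementMet(std::string_view expr, const std::set<std::string>& enabled) {
    return RequirementParser(expr, enabled).evaluate();
}
```

With this, `requirementMet("(vulkan1.4|maintenance6) + EXT_descriptor_buffer", enabled)` is true only if EXT_descriptor_buffer is in `enabled` together with at least one of the other two names.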
44 changes: 33 additions & 11 deletions src/accelStruct.cpp
@@ -360,27 +360,31 @@ VKAPI_ATTR VkResult VKAPI_CALL CreateAccelerationStructureKHR(
VkAccelerationStructureDeviceAddressInfoKHR devAddressInfo {};
devAddressInfo.sType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_DEVICE_ADDRESS_INFO_KHR;
devAddressInfo.accelerationStructure = accelStruct.handle;
accelStruct.deviceAddress = dev.dispatch.GetAccelerationStructureDeviceAddressKHR(
dev.handle, &devAddressInfo);
dlg_assert(accelStruct.deviceAddress);

*pAccelerationStructure = castDispatch<VkAccelerationStructureKHR>(accelStruct);
dev.accelStructs.mustEmplace(std::move(accelStructPtr));

{
std::lock_guard lock(dev.mutex);
auto [_, success] = dev.accelStructAddresses.insert({
accelStruct.deviceAddress, &accelStruct});
dlg_assert(success);
if (accelStruct.buf->ci.usage & VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT) {
accelStruct.deviceAddress = dev.dispatch.GetAccelerationStructureDeviceAddressKHR(
dev.handle, &devAddressInfo);
dlg_assert(accelStruct.deviceAddress);

{
std::lock_guard lock(dev.mutex);
auto [_, success] = dev.accelStructAddresses.insert({
accelStruct.deviceAddress, &accelStruct});
dlg_assert(success);
}
}

return res;
}

void AccelStruct::onApiDestroy() {
std::lock_guard lock(dev->mutex);
dlg_assert(deviceAddress);
dev->accelStructAddresses.erase(deviceAddress);
if(deviceAddress) {
dev->accelStructAddresses.erase(deviceAddress);
}
}

VKAPI_ATTR void VKAPI_CALL DestroyAccelerationStructureKHR(
@@ -498,7 +502,25 @@ VKAPI_ATTR VkDeviceAddress VKAPI_CALL GetAccelerationStructureDeviceAddressKHR(
auto fwd = *pInfo;
fwd.accelerationStructure = accelStruct.handle;

return dev.dispatch.GetAccelerationStructureDeviceAddressKHR(dev.handle, &fwd);
auto address = dev.dispatch.GetAccelerationStructureDeviceAddressKHR(dev.handle, &fwd);

if (accelStruct.deviceAddress != address) {
// this is a big issue, try to recover somewhat
dlg_error("unexpected address difference: {} vs {}",
accelStruct.deviceAddress, address);

if (!accelStruct.deviceAddress) {
accelStruct.deviceAddress = address;

// was likely not inserted before
std::lock_guard lock(dev.mutex);
auto [_, success] = dev.accelStructAddresses.insert({
accelStruct.deviceAddress, &accelStruct});
dlg_assert(success);
}
}

return address;
}

VKAPI_ATTR void VKAPI_CALL GetDeviceAccelerationStructureCompatibilityKHR(
2 changes: 1 addition & 1 deletion src/accelStruct.hpp
@@ -66,7 +66,7 @@ struct AccelStruct : SharedDeviceHandle {
Buffer* buf {};
VkDeviceSize offset {};
VkDeviceSize size {};
VkDeviceAddress deviceAddress {};
VkDeviceAddress deviceAddress {}; // can be 0

// The state when all activated and pending submissions are completed.
// Synced using device mutex.
9 changes: 6 additions & 3 deletions src/buffer.cpp
@@ -60,7 +60,9 @@ void Buffer::onApiDestroy() {
MemoryResource::onApiDestroy();

std::lock_guard lock(dev->mutex);
if(ci.usage & VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT) {
const bool allowAddress = ci.usage & VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT;
dlg_assert(!deviceAddress || allowAddress);
if(deviceAddress) {
dev->bufferAddresses.erase(this);
}

@@ -346,13 +348,14 @@ VKAPI_ATTR VkDeviceAddress VKAPI_CALL GetBufferDeviceAddress(
fwd.buffer = buf.handle;
auto ret = buf.dev->dispatch.GetBufferDeviceAddress(buf.dev->handle, &fwd);

// TODO: technically, we have to lock here
if(ret != buf.deviceAddress) {
// This is a sign of a serious problem.
dlg_assertm(!buf.deviceAddress, "Inconsistent/Unknown device address retrieved");

auto& dev = *buf.dev;
std::lock_guard lock(dev.mutex);
if (!buf.deviceAddress) {
auto& dev = *buf.dev;
std::lock_guard lock(dev.mutex);
buf.deviceAddress = ret;
dev.bufferAddresses.insert(&buf);
}