nyorain · nyorain · Dec 28, 2025 · Dec 17, 2025 · Dec 17, 2025 · Dec 18, 2025
diff --git a/docs/own/descriptorBuffer.md b/docs/own/descriptorBuffer.md
@@ -0,0 +1,78 @@
+# Support for VK_EXT_descriptor_buffer
+
+Main challenge: How do we get the descriptors from a record?
+Cannot do so at submission time anymore.
+
+Let's assume we hook a record for which we want to introspect a
+specific descriptor binding:
+
+- in the hooked record, we can access and save the descriptor data blob
+- but how to resolve that data blob and copy the data??
+	- worst case: mutable descriptors. We can't even know the type of
+	  the descriptor
+
+Just don't allow descriptor introspection for now, there is no easy and
+fast solution that works in all cases.
+
+## Best we can do approach for later
+
+How far do we get with non-mutable descriptors?
+Looking stuff up on the gpu will likely not work properly.
+	Complicated hashmap data structures on the GPU?
+		+ device generated commands?? lol
+
+How far could we get with the copy_indirect extension?
+	hm, not so far.
+
+Idea! We don't need to know the handle on the gpu.
+We could just access the descriptor! Just copy via shader.
+Still has some limitations (acceleration structures?) but it's a start.
+
+How could we handle acceleration structures?
+Could make sure state isn't overwritten by submission and store some
+serial number to identify it later on.
+	-> for later
+
+### Can we handle mutable descriptors with this?
+
+Can we somehow encode the type into the descriptor? i.e. change
+	the return value of GetDescriptorEXT?
+		while the descriptor still works? meh, likely not
+Would a lookup map even work? Could different descriptor types end
+	up with the same memory? Unlikely but possible I guess.
+
+sad :(
+
+---
+
+Return our own handles from GetDescriptorEXT and let a compute
+shader run before each draw/dispatch that fixes everything up? :D
+Terrible idea.
+
+---
+
+Maybe we can implement heuristics for the type?
+e.g. looking at the different descriptor sizes
+
+have a look at how the shader accesses the descriptor?
+	might still be only bound to single binding, not aliased?
+
+that together with hash map on gpu (that should usually work)
+might be enough in like 99% of the cases.
+
+### How to indirectly copy
+
+Indirect dispatch. But how to know the size?
+	For images and storage buffers, we can query it!
+	Uniform buffers? meh
+	Just copy a couple of bytes and figure it out later on the CPU? :D
+Inspect shader that uses it?
+	if the slot is bound as a uniform buffer, just use its size.
+	if it has multiple uniform buffers aliased at the binding,
+	just choose the smallest? edge case anyways
+
+for image/storage buffer: how to create/allocate dst memory?
+	feedback loop about size like we already do for transform feedback etc
+	at some point we can think about a dynamic allocator on the gpu
+	(requiring us just to reserve a buffer range instead of creating
+	a resource)
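Review note on the lookup-map question in descriptorBuffer.md ("could different descriptor types end up with the same memory?"): a minimal CPU-side sketch of how such a map could at least detect that case instead of silently guessing a type. All names here (`DescriptorKind`, `DescriptorEntry`, `DescriptorLookup`) are hypothetical and not part of vil. The sketch assumes the layer sees the descriptor bytes and the descriptor type whenever the application retrieves a descriptor.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Hypothetical stand-ins; none of these types exist in vil under these names.
enum class DescriptorKind { sampledImage, storageImage, uniformBuffer, storageBuffer };

struct DescriptorEntry {
	DescriptorKind kind;
	std::uint64_t resourceId; // however the layer identifies the resource
	bool ambiguous;           // identical bytes were seen for more than one kind
};

class DescriptorLookup {
public:
	// Remember which resource a descriptor blob was created for.
	void record(const std::vector<std::byte>& blob, DescriptorKind kind,
			std::uint64_t resourceId) {
		auto [it, inserted] = entries_.try_emplace(blob,
			DescriptorEntry{kind, resourceId, false});
		if(inserted) {
			return;
		}

		if(it->second.kind != kind) {
			// Same bytes, different type: the type cannot be recovered later.
			it->second.ambiguous = true;
		} else {
			// Same bytes and type: the most recent resource wins.
			it->second.resourceId = resourceId;
		}
	}

	// Returns nothing when the blob is unknown or its type is ambiguous.
	std::optional<DescriptorEntry> resolve(const std::vector<std::byte>& blob) const {
		auto it = entries_.find(blob);
		if(it == entries_.end() || it->second.ambiguous) {
			return std::nullopt;
		}
		return it->second;
	}

private:
	std::map<std::vector<std::byte>, DescriptorEntry> entries_;
};
```

This only covers the bookkeeping on the CPU. Whether an equivalent structure is practical on the GPU, as the notes ask, stays open.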
diff --git a/docs/own/test.md b/docs/own/test.md
@@ -18,6 +18,8 @@ at times but we shouldn't spend too much time on it in general.
 There are already great vulkan test suites we can use for the layer as well.
 E.g. the Vulkan CTS (testing WIP) and the Vulkan validation layer tests.
 
+## Validation layer tests
+
 Especially the positive tests from the validation layers have proven
 extremely useful, they found many subtle issues.
 Current filter:
@@ -28,3 +30,63 @@ Current filter:
 
 Some of them are in there because they crash my driver (radv, fall 2022)
 and some because vil has no support/they are known issues (e.g. sparse memory, external sync, two instances).
+
+---
+
+As of December 2025, there are some additional steps needed to run the
+validation layer tests. Especially `VK_ADD_LAYER_PATH` is needed, otherwise
+the validation tests override the layer path and vil cannot be found.
+I usually export:
+
+```
+VK_ADD_LAYER_PATH=./layers/
+VK_INSTANCE_LAYERS=VK_LAYER_live_introspection
+
+# vil configuration
+VIL_DLG_HANDLER=1
+VIL_CB_TEST_HOOK=1
+
+# optional, makes asserts easier to debug
+VIL_BREAK_ON_ERROR=1
+
+# optional, to see *everything*
+VIL_MIN_LOG_LEVEL=trace
+```
+
+## Proton, Wine, DXVK, VKD3D
+
+Good tests for some advanced features.
+Example command line:
+
+```
+VKD3D_CONFIG=no_staggered_submit
+LD_PRELOAD=/usr/lib/libxkbcommon.so
+PROTON_ENABLE_WAYLAND=0
+DXVK_DEBUG=markers
+VIL_DLG_HANDLER=1
+VIL_LOG_FILE=/home/jan/vil-steam
+VIL_WAIT_SURFACE=1
+PROTON_DISABLE_NVAPI=1
+VK_INSTANCE_LAYERS=VK_LAYER_live_introspection
+VIL_CREATE_WINDOW=1
+VIL_HOOK_OVERLAY=0
+VIL_ALLOW_UNSUPPORTED_EXTS=1
+PROTON_LOG=1
+%command%
+```
+
+- no_staggered_submit for vkd3d is highly useful as tracking commands over
+  multiple frames becomes very hard otherwise
+- preloading of xkbcommon seems to be needed since wine/proton ships its
+  own version that seems to cause issues. (ABI incompatible? old version? idk)
+- VIL_WAIT_SURFACE seems to be needed, not sure why
+- PROTON_DISABLE_NVAPI might fix some issues
+- will create log files in homedir:
+	- 'steam-$APPID' for the proton log
+	- 'vil-steam' for the vil log
+
+Useful: API dump. TODO: with newer proton versions, we need to redirect it to a file
+```
+VK_LUNARG_API_DUMP_PRE_DUMP=true
+VK_INSTANCE_LAYERS=VK_LAYER_LUNARG_api_dump:VK_LAYER_live_introspection
+```
diff --git a/docs/own/workstack.md b/docs/own/workstack.md
@@ -1,10 +1,20 @@
-- [ ] try to enable bufferDeviceAddress
+- [ ] fix VIL_ALLOW_UNSUPPORTED_EXTS to not filter out exts
+	- [ ] or add new var for this?
+- [ ] support shader debugging with spirv cross: spirv -> hlsl/glsl decompilation
+	- [ ] support live shader replacement?
+- [ ] support ray tracing pipeline libraries
+	- [ ] for shader patching
+- [ ] try to enable bufferDeviceAddress if possible
 - [ ] fix errors with validation tests
 	- [ ] document how to run validation tests
 - [ ] when VIL_SKIP_EXT_CHECK is set (or other env var?) override supported
       extensions in that function. Investigate how to make this work.
 	  Can be provided in layer manifest or something?
 
+- [ ] support full and+or expressions for "required" extension field
+      in layer.cpp function list.
+	  e.g. vkCmdSetDescriptorBufferOffsets2EXT: (vulkan1.4|maintenance6) + EXT_descriptor_buffer
+
 - [ ] implement VK_KHR_dynamic_rendering_local_read for core 1.4
 - [ ] impement VK_KHR_pipeline_executable_properties
 - [ ] fix invalid pipeline barrier with BeginRendering (test e.g. with iro gpuDebugDraw)
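
Review note on the workstack item about and/or expressions for the "required" extension field: one possible shape is a conjunction of "any of" groups. The sketch below is an assumption about how this could look, not how layer.cpp stores requirements today. The type names `AnyOf` and `Requirement` and the function `satisfied` are hypothetical; the feature strings are copied from the example in the note.

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical representation: a requirement is an AND over groups,
// and each group is an OR over feature/extension names.
using AnyOf = std::vector<std::string>;
using Requirement = std::vector<AnyOf>;

// True if every group contains at least one enabled name.
// An empty requirement is always satisfied, an empty group never is.
bool satisfied(const Requirement& req,
		const std::unordered_set<std::string>& enabled) {
	return std::all_of(req.begin(), req.end(), [&](const AnyOf& group) {
		return std::any_of(group.begin(), group.end(),
			[&](const std::string& name) {
				return enabled.count(name) != 0;
			});
	});
}

// (vulkan1.4|maintenance6) + EXT_descriptor_buffer from the note above.
Requirement descriptorBufferOffsets2Requirement() {
	return {
		AnyOf{"vulkan1.4", "maintenance6"},
		AnyOf{"EXT_descriptor_buffer"},
	};
}
```

Restricting expressions to this AND-of-ORs form keeps the function list declarative. Arbitrarily nested expressions would need a small tree type instead.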

diff --git a/src/accelStruct.cpp b/src/accelStruct.cpp
@@ -360,27 +360,31 @@ VKAPI_ATTR VkResult VKAPI_CALL CreateAccelerationStructureKHR(
 	VkAccelerationStructureDeviceAddressInfoKHR devAddressInfo {};
 	devAddressInfo.sType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_DEVICE_ADDRESS_INFO_KHR;
 	devAddressInfo.accelerationStructure = accelStruct.handle;
-	accelStruct.deviceAddress = dev.dispatch.GetAccelerationStructureDeviceAddressKHR(
-		dev.handle, &devAddressInfo);
-	dlg_assert(accelStruct.deviceAddress);
 
 	*pAccelerationStructure = castDispatch<VkAccelerationStructureKHR>(accelStruct);
 	dev.accelStructs.mustEmplace(std::move(accelStructPtr));
 
-	{
-		std::lock_guard lock(dev.mutex);
-		auto [_, success] = dev.accelStructAddresses.insert({
-			accelStruct.deviceAddress, &accelStruct});
-		dlg_assert(success);
+	if (accelStruct.buf->ci.usage & VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT) {
+		accelStruct.deviceAddress = dev.dispatch.GetAccelerationStructureDeviceAddressKHR(
+			dev.handle, &devAddressInfo);
+		dlg_assert(accelStruct.deviceAddress);
+
+		{
+			std::lock_guard lock(dev.mutex);
+			auto [_, success] = dev.accelStructAddresses.insert({
+				accelStruct.deviceAddress, &accelStruct});
+			dlg_assert(success);
+		}
 	}
 
 	return res;
 }
 
 void AccelStruct::onApiDestroy() {
 	std::lock_guard lock(dev->mutex);
-	dlg_assert(deviceAddress);
-	dev->accelStructAddresses.erase(deviceAddress);
+	if(deviceAddress) {
+		dev->accelStructAddresses.erase(deviceAddress);
+	}
 }
 
 VKAPI_ATTR void VKAPI_CALL DestroyAccelerationStructureKHR(
@@ -498,7 +502,25 @@ VKAPI_ATTR VkDeviceAddress VKAPI_CALL GetAccelerationStructureDeviceAddressKHR(
 	auto fwd = *pInfo;
 	fwd.accelerationStructure = accelStruct.handle;
 
-	return dev.dispatch.GetAccelerationStructureDeviceAddressKHR(dev.handle, &fwd);
+	auto address = dev.dispatch.GetAccelerationStructureDeviceAddressKHR(dev.handle, &fwd);
+
+	if (accelStruct.deviceAddress != address) {
+		// this is a big issue, try to recover somewhat
+		dlg_error("unexpected address difference: {} vs {}",
+			accelStruct.deviceAddress, address);
+
+		if (!accelStruct.deviceAddress) {
+			accelStruct.deviceAddress = address;
+
+			// was likely not inserted before
+			std::lock_guard lock(dev.mutex);
+			auto [_, success] = dev.accelStructAddresses.insert({
+				accelStruct.deviceAddress, &accelStruct});
+			dlg_assert(success);
+		}
+	}
+
+	return address;
 }
 
 VKAPI_ATTR void VKAPI_CALL GetDeviceAccelerationStructureCompatibilityKHR(

diff --git a/src/accelStruct.hpp b/src/accelStruct.hpp
@@ -66,7 +66,7 @@ struct AccelStruct : SharedDeviceHandle {
 	Buffer* buf {};
 	VkDeviceSize offset {};
 	VkDeviceSize size {};
-	VkDeviceAddress deviceAddress {};
+	VkDeviceAddress deviceAddress {}; // can be 0
 
 	// The state when all activated and pending submissions are completed.
 	// Synced using device mutex.

diff --git a/src/buffer.cpp b/src/buffer.cpp
@@ -60,7 +60,9 @@ void Buffer::onApiDestroy() {
 	MemoryResource::onApiDestroy();
 
 	std::lock_guard lock(dev->mutex);
-	if(ci.usage & VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT) {
+	const bool allowAddress = ci.usage & VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT;
+	dlg_assert(!deviceAddress || allowAddress);
+	if(deviceAddress) {
 		dev->bufferAddresses.erase(this);
 	}
 
@@ -346,13 +348,14 @@ VKAPI_ATTR VkDeviceAddress VKAPI_CALL GetBufferDeviceAddress(
 	fwd.buffer = buf.handle;
 	auto ret = buf.dev->dispatch.GetBufferDeviceAddress(buf.dev->handle, &fwd);
 
+	// TODO: technically, we have to lock here
 	if(ret != buf.deviceAddress) {
 		// This is a sign of a serious problem.
 		dlg_assertm(!buf.deviceAddress, "Inconsistent/Unknown device address retrieved");
 
+		auto& dev = *buf.dev;
+		std::lock_guard lock(dev.mutex);
 		if (!buf.deviceAddress) {
-			auto& dev = *buf.dev;
-			std::lock_guard lock(dev.mutex);
 			buf.deviceAddress = ret;
 			dev.bufferAddresses.insert(&buf);
 		}