diff --git a/docs/own/descriptorBuffer.md b/docs/own/descriptorBuffer.md new file mode 100644 index 0000000..718c296 --- /dev/null +++ b/docs/own/descriptorBuffer.md @@ -0,0 +1,78 @@ +# Support for VK_EXT_descriptor_buffer + +Main challenge: How do we get the descriptors from a record? +Cannot do so at submission time anymore. + +Let's assume we hook a record for which we want to introspect a +specific descriptor binding: + +- in the hooked record, we can access and save the descriptor data blob +- but how to resolve that data blob and copy the data?? + - worst case: mutable descriptors. We can't even know the type of + the descriptor + +Just don't allow descriptor introspection for now, there is no easy and +fast solution that works in all cases. + +## Best we can do approach for later + +How far do we get with non mutable descriptors? +Looking stuff up on the gpu will likely not work properly. + Complicated hashmap data structures on the GPU? + + device generated commands?? lol + +How far could we get with the copy_indirect extension? + hm, not so far. + +Idea! We don't need to know the handle on the gpu. +We could just access the descriptor! Just copy via shader. +Still has some limitations (acceleration structures?) but it's a start. + +How could we handle acceleration structures? +Could make sure state isn't overwritten by submission and store some +serial number to identify it later on. + -> for later + +#### Can we handle mutable descriptors with this? + +Can we somehow encode the type into the descriptor? i.e. change + the return value of GetDescriptorEXT? + while the descriptor still works? meh, likely not +Would a lookupmap even work? could different descriptor types end + up with the same memory? Unlikely but possible I guess. + +sad :( + +--- + +Return our own handles from GetDescriptorEXT and let a compute +shader run before each draw/dispatch that fixes everything up? :D +Terrible idea. + +--- + +Maybe we can implement heuristics for the type? +e.g. 
looking at the different descriptor sizes + +have a look at how the shader accesses the descriptor? + might still be only bound to single binding, not aliased? + +that together with hash map on gpu (that should usually work) +might be enough in like 99% of the cases. + +### How to indirectly copy + +Indirect dispatch. But how to know the size? + For images and storage buffers, we can query it! + Uniform buffers? meh + Just copy a couple of bytes and figure it out later on the CPU? :D +Inspect shader that uses it? + if the slot is bound as a uniform buffer, just use its size. + if it has multiple uniform buffers aliased at the binding, + just choose the smallest? edge case anyways + +for image/storage buffer: how to create/allocate dst memory? + feedback loop about size like we already do for transform feedback etc + at some point we can think about a dynamic allocator on the gpu + (requiring us just to reserve a buffer range instead of creating + a resource) diff --git a/docs/own/test.md b/docs/own/test.md index a65a656..8fba9f7 100644 --- a/docs/own/test.md +++ b/docs/own/test.md @@ -18,6 +18,8 @@ at times but we shouldn't spend too much time on it in general. There are already great vulkan test suites we can use for the layer as well. E.g. the Vulkan CTS (testing WIP) and the Vulkan validation layer tests. +## Validation layer tests + Especially the positive tests from the validation layers have proven extremely useful, they found many subtle issues. Current filter: @@ -28,3 +30,63 @@ Current filter: Some of them are in there because they crash my driver (radv, fall 2022) and some because vil has no support/they are known issues (e.g. sparse memory, external sync, two instances). + +--- + +As of December 2025, there are some additional steps needed to run the +validation layer tests. Especially `VK_ADD_LAYER_PATH` is needed, otherwise +the validation tests override the layer path and vil cannot be found. 
+I usually export: + +``` +VK_ADD_LAYER_PATH=./layers/ +VK_INSTANCE_LAYERS=VK_LAYER_live_introspection + +# vil configuration +VIL_DLG_HANDLER=1 +VIL_CB_TEST_HOOK=1 + +# optional, to debug asserts more easily +VIL_BREAK_ON_ERROR=1 + +# optional, to see *everything* +VIL_MIN_LOG_LEVEL=trace +``` + +## Proton, Wine, DXVK, VKD3D + +Good tests for some advanced features. +Example command line: + +``` +VKD3D_CONFIG=no_staggered_submit +LD_PRELOAD=/usr/lib/libxkbcommon.so +PROTON_ENABLE_WAYLAND=0 +DXVK_DEBUG=markers +VIL_DLG_HANDLER=1 +VIL_LOG_FILE=/home/jan/vil-steam +VIL_WAIT_SURFACE=1 +PROTON_DISABLE_NVAPI=1 +VK_INSTANCE_LAYERS=VK_LAYER_live_introspection +VIL_CREATE_WINDOW=1 +VIL_HOOK_OVERLAY=0 +VIL_ALLOW_UNSUPPORTED_EXTS=1 +PROTON_LOG=1 +%command% +``` + +- no_staggered_submit for vkd3d is highly useful as tracking commands over + multiple frames becomes very hard otherwise +- preloading of xkbcommon seems to be needed since wine/proton ships its + own version that seems to cause issues. (ABI incompatible? old version? idk) +- VIL_WAIT_SURFACE seems to be needed, not sure why +- PROTON_DISABLE_NVAPI might fix some issues +- will create log files in homedir: + - 'steam-$APPID' for the proton log + - 'vil-steam' for the vil log + +Useful: api dump. TODO: with newer proton versions, we need to redirect it to a file +``` +VK_LUNARG_API_DUMP_PRE_DUMP=true +VK_INSTANCE_LAYERS=VK_LAYER_LUNARG_api_dump:VK_LAYER_live_introspection +``` diff --git a/docs/own/workstack.md b/docs/own/workstack.md index 280175e..f455f9c 100644 --- a/docs/own/workstack.md +++ b/docs/own/workstack.md @@ -1,10 +1,20 @@ -- [ ] try to enable bufferDeviceAddress +- [ ] fix VIL_ALLOW_UNSUPPORTED_EXTS to not filter out exts + - [ ] or add new var for this? +- [ ] support shader debugging with spirv cross: spirv -> hlsl/glsl decompilation + - [ ] support live shader replacement? 
+- [ ] support ray tracing pipeline libraries + - [ ] for shader patching +- [ ] try to enable bufferDeviceAddress if possible - [ ] fix errors with validation tests - [ ] document how to run validation tests - [ ] when VIL_SKIP_EXT_CHECK is set (or other env var?) override supported extensions in that function. Investigate how to make this work. Can be provided in layer manifest or something? +- [ ] support full and+or expressions for "required" extension field + in layer.cpp function list. + e.g. vkCmdSetDescriptorBufferOffsets2EXT: (vulkan1.4|maintenance6) + EXT_descriptor_buffer + - [ ] implement VK_KHR_dynamic_rendering_local_read for core 1.4 - [ ] impement VK_KHR_pipeline_executable_properties - [ ] fix invalid pipeline barrier with BeginRendering (test e.g. with iro gpuDebugDraw) diff --git a/src/accelStruct.cpp b/src/accelStruct.cpp index 9624268..413d453 100644 --- a/src/accelStruct.cpp +++ b/src/accelStruct.cpp @@ -360,18 +360,21 @@ VKAPI_ATTR VkResult VKAPI_CALL CreateAccelerationStructureKHR( VkAccelerationStructureDeviceAddressInfoKHR devAddressInfo {}; devAddressInfo.sType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_DEVICE_ADDRESS_INFO_KHR; devAddressInfo.accelerationStructure = accelStruct.handle; - accelStruct.deviceAddress = dev.dispatch.GetAccelerationStructureDeviceAddressKHR( - dev.handle, &devAddressInfo); - dlg_assert(accelStruct.deviceAddress); *pAccelerationStructure = castDispatch(accelStruct); dev.accelStructs.mustEmplace(std::move(accelStructPtr)); - { - std::lock_guard lock(dev.mutex); - auto [_, success] = dev.accelStructAddresses.insert({ - accelStruct.deviceAddress, &accelStruct}); - dlg_assert(success); + if (accelStruct.buf->ci.usage & VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT) { + accelStruct.deviceAddress = dev.dispatch.GetAccelerationStructureDeviceAddressKHR( + dev.handle, &devAddressInfo); + dlg_assert(accelStruct.deviceAddress); + + { + std::lock_guard lock(dev.mutex); + auto [_, success] = dev.accelStructAddresses.insert({ + 
accelStruct.deviceAddress, &accelStruct}); + dlg_assert(success); + } } return res; @@ -379,8 +382,9 @@ VKAPI_ATTR VkResult VKAPI_CALL CreateAccelerationStructureKHR( void AccelStruct::onApiDestroy() { std::lock_guard lock(dev->mutex); - dlg_assert(deviceAddress); - dev->accelStructAddresses.erase(deviceAddress); + if(deviceAddress) { + dev->accelStructAddresses.erase(deviceAddress); + } } VKAPI_ATTR void VKAPI_CALL DestroyAccelerationStructureKHR( @@ -498,7 +502,25 @@ VKAPI_ATTR VkDeviceAddress VKAPI_CALL GetAccelerationStructureDeviceAddressKHR( auto fwd = *pInfo; fwd.accelerationStructure = accelStruct.handle; - return dev.dispatch.GetAccelerationStructureDeviceAddressKHR(dev.handle, &fwd); + auto address = dev.dispatch.GetAccelerationStructureDeviceAddressKHR(dev.handle, &fwd); + + if (accelStruct.deviceAddress != address) { + // this is a big issue, try to recover somewhat + dlg_error("unexpected address difference: {} vs {}", + accelStruct.deviceAddress, address); + + if (!accelStruct.deviceAddress) { + accelStruct.deviceAddress = address; + + // was likely not inserted before + std::lock_guard lock(dev.mutex); + auto [_, success] = dev.accelStructAddresses.insert({ + accelStruct.deviceAddress, &accelStruct}); + dlg_assert(success); + } + } + + return address; } VKAPI_ATTR void VKAPI_CALL GetDeviceAccelerationStructureCompatibilityKHR( diff --git a/src/accelStruct.hpp b/src/accelStruct.hpp index b68e677..d97762c 100644 --- a/src/accelStruct.hpp +++ b/src/accelStruct.hpp @@ -66,7 +66,7 @@ struct AccelStruct : SharedDeviceHandle { Buffer* buf {}; VkDeviceSize offset {}; VkDeviceSize size {}; - VkDeviceAddress deviceAddress {}; + VkDeviceAddress deviceAddress {}; // can be 0 // The state when all activated and pending submissions are completed. // Synced using device mutex. 
diff --git a/src/buffer.cpp b/src/buffer.cpp index 1157553..4835dd1 100644 --- a/src/buffer.cpp +++ b/src/buffer.cpp @@ -60,7 +60,9 @@ void Buffer::onApiDestroy() { MemoryResource::onApiDestroy(); std::lock_guard lock(dev->mutex); - if(ci.usage & VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT) { + const bool allowAddress = ci.usage & VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT; + dlg_assert(!deviceAddress || allowAddress); + if(deviceAddress) { dev->bufferAddresses.erase(this); } @@ -346,13 +348,14 @@ VKAPI_ATTR VkDeviceAddress VKAPI_CALL GetBufferDeviceAddress( fwd.buffer = buf.handle; auto ret = buf.dev->dispatch.GetBufferDeviceAddress(buf.dev->handle, &fwd); + // TODO: technically, we have to lock here if(ret != buf.deviceAddress) { // This is a sign of a serious problem. dlg_assertm(!buf.deviceAddress, "Inconsistent/Unknown device address retrieved"); + auto& dev = *buf.dev; + std::lock_guard lock(dev.mutex); if (!buf.deviceAddress) { - auto& dev = *buf.dev; - std::lock_guard lock(dev.mutex); buf.deviceAddress = ret; dev.bufferAddresses.insert(&buf); } diff --git a/src/cb.cpp b/src/cb.cpp index df4af90..379a0c1 100644 --- a/src/cb.cpp +++ b/src/cb.cpp @@ -23,17 +23,15 @@ namespace vil { -void copyChainInPlace(CommandBuffer& cb, const void*& pNext) { +// TODO: optionally(?) take a list of expected sTypes +void* copyChain(CommandBuffer& cb, const void* pNext) { dlg_assert(cb.builder().record_); auto& rec = *cb.builder().record_; - pNext = copyChain(rec.alloc, pNext); + return copyChain(rec.alloc, pNext); } -// TODO: optionally(?) 
take a list of expected sTypes -const void* copyChain(CommandBuffer& cb, const void* pNext) { - auto ret = pNext; - copyChainInPlace(cb, ret); - return ret; +void copyChainInPlace(CommandBuffer& cb, const void*& pNext) { + pNext = copyChain(cb, pNext); } template @@ -286,6 +284,14 @@ void CommandBuffer::doEnd() { lastRecord_ = std::move(builder_.record_); } + + // NOTE: this is just for testing/validation + if (dev->hookRecordOnEnd) { + std::lock_guard lock(dev->mutex); + CommandHookRecord hooked(*dev->commandHook, *lastRecord_, + {}, {}, {}, {}); + hooked.invalid = true; + } } void CommandBuffer::popLabelSections() { @@ -1197,7 +1203,7 @@ span cmdBindVertexBuffers(CommandBuffer& cb, ThreadMemScope& tms, const VkDeviceSize* pStrides, bool v2) { dlg_assert(v2 || !pSizes); - dlg_assert(v2 || !pOffsets); + dlg_assert(v2 || !pStrides); auto& cmd = addCmd(cb); cmd.firstBinding = firstBinding; @@ -2810,7 +2816,7 @@ VKAPI_ATTR void VKAPI_CALL CmdSetFrontFace( auto& cmd = addCmd(cb); cmd.frontFace = frontFace; - cb.dev->dispatch.CmdSetCullModeEXT(cb.handle, frontFace); + cb.dev->dispatch.CmdSetFrontFace(cb.handle, frontFace); } VKAPI_ATTR void VKAPI_CALL CmdSetPrimitiveTopology( @@ -3537,11 +3543,16 @@ VKAPI_ATTR void VKAPI_CALL CmdBindShadersEXT( auto vkShaders = tms.alloc(stageCount); for(auto i = 0u; i < stageCount; ++i) { - auto& shader = get(*cb.dev, pShaders[i]); - useHandle(cb, cmd, shader); + if (pShaders[i]) { + auto& shader = get(*cb.dev, pShaders[i]); + useHandle(cb, cmd, shader); - cmd.shaders[i] = &shader; - vkShaders[i] = shader.handle; + cmd.shaders[i] = &shader; + vkShaders[i] = shader.handle; + } else { + cmd.shaders[i] = nullptr; + vkShaders[i] = VK_NULL_HANDLE; + } } cb.dev->dispatch.CmdBindShadersEXT(cb.handle, @@ -3624,6 +3635,7 @@ VKAPI_ATTR void VKAPI_CALL CmdBindDescriptorSets2( cmd.firstSet = info.firstSet; cmd.stageFlags = info.stageFlags; cmd.dynamicOffsets = copySpan(cb, info.pDynamicOffsets, info.dynamicOffsetCount); + cmd.pNext = copyChain(cb, 
info.pNext); auto pipeLayoutPtr = getPtr(*cb.dev, info.layout); cmd.pipeLayout = pipeLayoutPtr.get(); @@ -3688,6 +3700,8 @@ VKAPI_ATTR void VKAPI_CALL CmdPushConstants2( cmd.offset = info.offset; auto ptr = static_cast(info.pValues); cmd.values = copySpan(cb, static_cast(ptr), info.size); + cmd.pNext = copyChain(cb, pPushConstantsInfo->pNext); + cmd.v2 = true; // reallocate push constants auto& pc = cb.pushConstants().data; @@ -3714,6 +3728,7 @@ VKAPI_ATTR void VKAPI_CALL CmdPushDescriptorSet2( cmd.set = info.set; cmd.descriptorWrites = copySpan(cb, info.pDescriptorWrites, info.descriptorWriteCount); + cmd.pNext = copyChain(cb, pPushDescriptorSetsInfo->pNext); auto pipeLayoutPtr = getPtr(*cb.dev, info.layout); cmd.pipeLayout = pipeLayoutPtr.get(); @@ -3870,8 +3885,12 @@ GeneratedCommandsInfo convert(CommandBuffer& cb, Command& cmd, ret.preprocessSize = info.preprocessSize; ret.indirectAddress = info.indirectAddress; ret.indirectSize = info.indirectAddressSize; - ret.execSet = &get(dev, info.indirectExecutionSet); - ret.layout = &get(dev, info.indirectCommandsLayout); + if (info.indirectExecutionSet) { + ret.execSet = &get(dev, info.indirectExecutionSet); + } + if (info.indirectCommandsLayout) { + ret.layout = &get(dev, info.indirectCommandsLayout); + } ret.maxDrawCount = info.maxDrawCount; ret.stages = info.shaderStages; ret.sequenceCountAddress = info.sequenceCountAddress; @@ -3896,7 +3915,9 @@ VKAPI_ATTR void VKAPI_CALL CmdPreprocessGeneratedCommandsEXT( cmd.info = convert(cb, cmd, *pGeneratedCommandsInfo); cmd.state = &getCommandBuffer(stateCommandBuffer); - dlg_assert(!pGeneratedCommandsInfo->pNext); + cmd.pNext = copyChain(cb, pGeneratedCommandsInfo->pNext); + auto& rec = *cb.builder().record_; + patchIndirectExecutionChain(rec.alloc, *cb.dev, cmd.pNext); { ExtZoneScopedN("dispatch"); @@ -3917,7 +3938,9 @@ VKAPI_ATTR void VKAPI_CALL CmdExecuteGeneratedCommandsEXT( cmd.info = convert(cb, cmd, *pGeneratedCommandsInfo); cmd.isPreprocessed = isPreprocessed; - 
dlg_assert(!pGeneratedCommandsInfo->pNext); + cmd.pNext = copyChain(cb, pGeneratedCommandsInfo->pNext); + auto& rec = *cb.builder().record_; + patchIndirectExecutionChain(rec.alloc, *cb.dev, cmd.pNext); { ExtZoneScopedN("dispatch"); @@ -4233,4 +4256,97 @@ VKAPI_ATTR void VKAPI_CALL CmdSetDepthBias2EXT( cb.dev->dispatch.CmdSetDepthBias2EXT(cb.handle, pDepthBiasInfo); } +// VK_EXT_descriptor_buffer +VKAPI_ATTR void VKAPI_CALL CmdBindDescriptorBuffersEXT( + VkCommandBuffer commandBuffer, + uint32_t bufferCount, + const VkDescriptorBufferBindingInfoEXT* pBindingInfos) { + auto& cb = getCommandBuffer(commandBuffer); + auto& cmd = addCmd(cb); + cmd.buffers = copySpan(cb, pBindingInfos, bufferCount); + for(auto& buf : cmd.buffers) { + copyChainInPlace(cb, buf.pNext); + // TODO: useHandle for buffer? + } + + cmd.record(*cb.dev, cb.handle, cb.pool().queueFamily); +} + +VKAPI_ATTR void VKAPI_CALL CmdSetDescriptorBufferOffsetsEXT( + VkCommandBuffer commandBuffer, + VkPipelineBindPoint pipelineBindPoint, + VkPipelineLayout layout, + uint32_t firstSet, + uint32_t setCount, + const uint32_t* pBufferIndices, + const VkDeviceSize* pOffsets) { + auto& cb = getCommandBuffer(commandBuffer); + auto& cmd = addCmd(cb); + cmd.bindPoint = pipelineBindPoint; + + auto pipeLayoutPtr = getPtr(*cb.dev, layout); + cmd.pipeLayout = pipeLayoutPtr.get(); + useHandle(cb, cmd, *pipeLayoutPtr); + + cmd.firstSet = firstSet; + cmd.bufferIndices = copySpan(cb, pBufferIndices, setCount); + cmd.offsets = copySpan(cb, pOffsets, setCount); + + cmd.record(*cb.dev, cb.handle, cb.pool().queueFamily); +} + +VKAPI_ATTR void VKAPI_CALL CmdBindDescriptorBufferEmbeddedSamplersEXT( + VkCommandBuffer commandBuffer, + VkPipelineBindPoint pipelineBindPoint, + VkPipelineLayout layout, + uint32_t set) { + auto& cb = getCommandBuffer(commandBuffer); + auto& cmd = addCmd(cb); + + auto pipeLayoutPtr = getPtr(*cb.dev, layout); + cmd.pipeLayout = pipeLayoutPtr.get(); + useHandle(cb, cmd, *pipeLayoutPtr); + + cmd.bindPoint = 
pipelineBindPoint; + cmd.set = set; + + cmd.record(*cb.dev, cb.handle, cb.pool().queueFamily); +} + +VKAPI_ATTR void VKAPI_CALL CmdSetDescriptorBufferOffsets2EXT( + VkCommandBuffer commandBuffer, + const VkSetDescriptorBufferOffsetsInfoEXT* info) { + auto& cb = getCommandBuffer(commandBuffer); + auto& cmd = addCmd(cb); + cmd.stages = info->stageFlags; + + auto pipeLayoutPtr = getPtr(*cb.dev, info->layout); + cmd.pipeLayout = pipeLayoutPtr.get(); + useHandle(cb, cmd, *pipeLayoutPtr); + + cmd.firstSet = info->firstSet; + cmd.bufferIndices = copySpan(cb, info->pBufferIndices, info->setCount); + cmd.offsets = copySpan(cb, info->pOffsets, info->setCount); + cmd.pNext = copyChain(cb, info->pNext); + + cmd.record(*cb.dev, cb.handle, cb.pool().queueFamily); +} + +VKAPI_ATTR void VKAPI_CALL CmdBindDescriptorBufferEmbeddedSamplers2EXT( + VkCommandBuffer commandBuffer, + const VkBindDescriptorBufferEmbeddedSamplersInfoEXT* info) { + auto& cb = getCommandBuffer(commandBuffer); + auto& cmd = addCmd(cb); + + auto pipeLayoutPtr = getPtr(*cb.dev, info->layout); + cmd.pipeLayout = pipeLayoutPtr.get(); + useHandle(cb, cmd, *pipeLayoutPtr); + + cmd.stages = info->stageFlags; + cmd.set = info->set; + cmd.pNext = copyChain(cb, info->pNext); + + cmd.record(*cb.dev, cb.handle, cb.pool().queueFamily); +} + } // namespace vil diff --git a/src/cb.hpp b/src/cb.hpp index cbb182b..d377810 100644 --- a/src/cb.hpp +++ b/src/cb.hpp @@ -1051,4 +1051,34 @@ VKAPI_ATTR void VKAPI_CALL CmdSetDepthBias2EXT( VkCommandBuffer commandBuffer, const VkDepthBiasInfoEXT* pDepthBiasInfo); +// VK_EXT_descriptor_buffer +VKAPI_ATTR void VKAPI_CALL CmdBindDescriptorBuffersEXT( + VkCommandBuffer commandBuffer, + uint32_t bufferCount, + const VkDescriptorBufferBindingInfoEXT* pBindingInfos); + +VKAPI_ATTR void VKAPI_CALL CmdSetDescriptorBufferOffsetsEXT( + VkCommandBuffer commandBuffer, + VkPipelineBindPoint pipelineBindPoint, + VkPipelineLayout layout, + uint32_t firstSet, + uint32_t setCount, + const uint32_t* 
pBufferIndices, + const VkDeviceSize* pOffsets); + +VKAPI_ATTR void VKAPI_CALL CmdBindDescriptorBufferEmbeddedSamplersEXT( + VkCommandBuffer commandBuffer, + VkPipelineBindPoint pipelineBindPoint, + VkPipelineLayout layout, + uint32_t set); + +// + maintenance6 +VKAPI_ATTR void VKAPI_CALL CmdSetDescriptorBufferOffsets2EXT( + VkCommandBuffer commandBuffer, + const VkSetDescriptorBufferOffsetsInfoEXT* pSetDescriptorBufferOffsetsInfo); + +VKAPI_ATTR void VKAPI_CALL CmdBindDescriptorBufferEmbeddedSamplers2EXT( + VkCommandBuffer commandBuffer, + const VkBindDescriptorBufferEmbeddedSamplersInfoEXT* pBindDescriptorBufferEmbeddedSamplersInfo); + } // namespace vil diff --git a/src/command/commands.cpp b/src/command/commands.cpp index a371439..07ed953 100644 --- a/src/command/commands.cpp +++ b/src/command/commands.cpp @@ -24,8 +24,6 @@ #include #include #include -#include -#include namespace vil { @@ -35,7 +33,7 @@ auto rawHandles(ThreadMemScope& scope, const C& handles) { using VkH = decltype(handles[0]->handle); auto ret = scope.alloc(handles.size()); for(auto i = 0u; i < handles.size(); ++i) { - ret[i] = handles[i]->handle; + ret[i] = handles[i] ? 
handles[i]->handle : VK_NULL_HANDLE; } return ret; @@ -1012,9 +1010,26 @@ void BindIndexBufferCmd::record(const Device& dev, VkCommandBuffer cb, u32) cons void BindDescriptorSetCmd::record(const Device& dev, VkCommandBuffer cb, u32) const { ThreadMemScope memScope; auto vkds = rawHandles(memScope, sets); - dev.dispatch.CmdBindDescriptorSets(cb, pipeBindPoint, pipeLayout->handle, - firstSet, u32(vkds.size()), vkds.data(), - u32(dynamicOffsets.size()), dynamicOffsets.data()); + + if (stageFlags) { + VkBindDescriptorSetsInfo info; + info.sType = VK_STRUCTURE_TYPE_BIND_DESCRIPTOR_SETS_INFO; + info.pNext = pNext; + info.descriptorSetCount = u32(vkds.size()); + info.pDescriptorSets = vkds.data(); + info.dynamicOffsetCount = u32(dynamicOffsets.size()); + info.pDynamicOffsets = dynamicOffsets.data(); + info.stageFlags = stageFlags; + info.firstSet = firstSet; + info.layout = pipeLayout->handle; + + dev.dispatch.CmdBindDescriptorSets2(cb, &info); + } else { + dlg_assert(!pNext); + dev.dispatch.CmdBindDescriptorSets(cb, pipeBindPoint, pipeLayout->handle, + firstSet, u32(vkds.size()), vkds.data(), + u32(dynamicOffsets.size()), dynamicOffsets.data()); + } } std::string BindDescriptorSetCmd::toString() const { @@ -1747,8 +1762,22 @@ std::string BindPipelineCmd::toString() const { // PushConstantsCmd void PushConstantsCmd::record(const Device& dev, VkCommandBuffer cb, u32) const { - dev.dispatch.CmdPushConstants(cb, pipeLayout->handle, stages, offset, - u32(values.size()), values.data()); + if (v2) { + VkPushConstantsInfo info; + info.sType = VK_STRUCTURE_TYPE_PUSH_CONSTANTS_INFO; + info.pNext = pNext; + info.layout = pipeLayout->handle; + info.stageFlags = stages; + info.offset = offset; + info.pValues = values.data(); + info.size = u32(values.size()); + + dev.dispatch.CmdPushConstants2(cb, &info); + } else { + dlg_assert(!pNext); + dev.dispatch.CmdPushConstants(cb, pipeLayout->handle, stages, offset, + u32(values.size()), values.data()); + } } // SetViewportCmd @@ -1853,8 
+1882,23 @@ void CopyQueryPoolResultsCmd::record(const Device& dev, VkCommandBuffer cb, u32) // PushDescriptorSet void PushDescriptorSetCmd::record(const Device& dev, VkCommandBuffer cb, u32) const { - dev.dispatch.CmdPushDescriptorSetKHR(cb, bindPoint, pipeLayout->handle, - set, u32(descriptorWrites.size()), descriptorWrites.data()); + if (stages) { + dlg_assert(!pNext); + VkPushDescriptorSetInfo info; + info.sType = VK_STRUCTURE_TYPE_PUSH_DESCRIPTOR_SET_INFO; + info.pNext = pNext; + info.stageFlags = stages; + info.layout = pipeLayout->handle; + info.set = set; + info.descriptorWriteCount = u32(descriptorWrites.size()); + info.pDescriptorWrites = descriptorWrites.data(); + + dev.dispatch.CmdPushDescriptorSet2(cb, &info); + } else { + dlg_assert(!pNext); + dev.dispatch.CmdPushDescriptorSet(cb, bindPoint, pipeLayout->handle, + set, u32(descriptorWrites.size()), descriptorWrites.data()); + } } void PushDescriptorSetCmd::displayInspector(Gui& gui) const { @@ -1873,8 +1917,20 @@ void PushDescriptorSetCmd::displayInspector(Gui& gui) const { // PushDescriptorSetWithTemplate void PushDescriptorSetWithTemplateCmd::record(const Device& dev, VkCommandBuffer cb, u32) const { - dev.dispatch.CmdPushDescriptorSetWithTemplateKHR(cb, updateTemplate->handle, - pipeLayout->handle, set, static_cast(data.data())); + if (v2) { + VkPushDescriptorSetWithTemplateInfo info {}; + info.sType = VK_STRUCTURE_TYPE_PUSH_DESCRIPTOR_SET_WITH_TEMPLATE_INFO; + info.pNext = pNext; + info.descriptorUpdateTemplate = updateTemplate->handle; + info.layout = pipeLayout->handle; + info.set = set; + info.pData = data.data(); + dev.dispatch.CmdPushDescriptorSetWithTemplate2(cb, &info); + } else { + dlg_assert(!pNext); + dev.dispatch.CmdPushDescriptorSetWithTemplate(cb, updateTemplate->handle, + pipeLayout->handle, set, static_cast(data.data())); + } } // VK_KHR_fragment_shading_rate @@ -2219,8 +2275,12 @@ void SetDepthClampRangeCmd::record(const Device& dev, VkCommandBuffer cb, u32) c 
VkGeneratedCommandsInfoEXT convert(const GeneratedCommandsInfo& info) { VkGeneratedCommandsInfoEXT fwd {}; fwd.sType = VK_STRUCTURE_TYPE_GENERATED_COMMANDS_INFO_EXT; - fwd.indirectCommandsLayout = info.layout->handle; - fwd.indirectExecutionSet = info.execSet->handle; + if (info.layout) { + fwd.indirectCommandsLayout = info.layout->handle; + } + if (info.execSet) { + fwd.indirectExecutionSet = info.execSet->handle; + } fwd.indirectAddress = info.indirectAddress; fwd.indirectAddressSize = info.indirectSize; fwd.preprocessAddress = info.preprocessAddress; @@ -2235,18 +2295,28 @@ VkGeneratedCommandsInfoEXT convert(const GeneratedCommandsInfo& info) { void ExecuteGeneratedCommandsCmd::record(const Device& dev, VkCommandBuffer cb, u32) const { auto fwd = convert(this->info); - dev.dispatch.CmdExecuteGeneratedCommandsEXT(cb, isPreprocessed, &fwd); + fwd.pNext = pNext; + // dev.dispatch.CmdExecuteGeneratedCommandsEXT(cb, isPreprocessed, &fwd); + // See PreprocessGeneratedCommandsCmd::record for reasoning of always + // passing false here + dev.dispatch.CmdExecuteGeneratedCommandsEXT(cb, false, &fwd); } void PreprocessGeneratedCommandsCmd::record(const Device& dev, VkCommandBuffer cb, u32) const { auto fwd = convert(this->info); - // TODO: this is a potential crash! the application is only required to + fwd.pNext = pNext; + + // NOTE: this is difficult! the application is only required to // keep the 'state' commandBuffer alive while recording, not for submission. // We cannot rely on this being alive here when recording later on in hook. // idea: // - keep the handle alive (ugly) // - make this a no-op and always pass true in ExecuteGeneratedCommands? 
(ugly) - dev.dispatch.CmdPreprocessGeneratedCommandsEXT(cb, &fwd, state->handle); + // we have chosen the last option for now, as it's simpler + // dev.dispatch.CmdPreprocessGeneratedCommandsEXT(cb, &fwd, state->handle); + (void) dev; + (void) cb; + (void) fwd; } // VK_EXT_extended_dynamic_state3 @@ -2350,6 +2420,50 @@ void SetDepthBias2Cmd::record(const Device& dev, VkCommandBuffer cb, u32) const dev.dispatch.CmdSetDepthBias2EXT(cb, &info); } +// VK_EXT_descriptor_buffer +void BindDescriptorBuffersCmd::record(const Device& dev, VkCommandBuffer cb, u32) const { + dev.dispatch.CmdBindDescriptorBuffersEXT(cb, u32(buffers.size()), buffers.data()); +} + +void SetDescriptorBufferOffsetsCmd::record(const Device& dev, VkCommandBuffer cb, u32) const { + dlg_assert(bufferIndices.size() == offsets.size()); + + if (stages) { + VkSetDescriptorBufferOffsetsInfoEXT info {}; + info.sType = VK_STRUCTURE_TYPE_SET_DESCRIPTOR_BUFFER_OFFSETS_INFO_EXT; + info.pNext = pNext; + info.stageFlags = stages; + info.layout = pipeLayout->handle; + info.firstSet = firstSet; + info.pBufferIndices = bufferIndices.data(); + info.setCount = u32(bufferIndices.size()); + info.pOffsets = offsets.data(); + + dev.dispatch.CmdSetDescriptorBufferOffsets2EXT(cb, &info); + } else { + dlg_assert(!pNext); + dev.dispatch.CmdSetDescriptorBufferOffsetsEXT(cb, bindPoint, pipeLayout->handle, + firstSet, u32(bufferIndices.size()), bufferIndices.data(), offsets.data()); + } +} + +void BindDescriptorBufferEmbeddedSamplersCmd::record(const Device& dev, VkCommandBuffer cb, u32) const { + if (stages) { + VkBindDescriptorBufferEmbeddedSamplersInfoEXT info {}; + info.sType = VK_STRUCTURE_TYPE_BIND_DESCRIPTOR_BUFFER_EMBEDDED_SAMPLERS_INFO_EXT; + info.pNext = pNext; + info.layout = pipeLayout->handle; + info.set = set; + info.stageFlags = stages; + + dev.dispatch.CmdBindDescriptorBufferEmbeddedSamplers2EXT(cb, &info); + } else { + dlg_assert(!pNext); + dev.dispatch.CmdBindDescriptorBufferEmbeddedSamplersEXT(cb, bindPoint, + 
pipeLayout->handle, set); + } +} + // util bool isIndirect(const Command& cmd) { return diff --git a/src/command/commands.hpp b/src/command/commands.hpp index 94978d4..a2ee800 100644 --- a/src/command/commands.hpp +++ b/src/command/commands.hpp @@ -179,6 +179,11 @@ enum class CommandType : u32 { // VK_EXT_depth_bias_control setDepthBias2, + // VK_EXT_descriptor_buffer + bindDescriptorBuffers, + setDescriptorBufferOffsets, + bindDescriptorBufferEmbeddedSamplers, + count, }; @@ -689,6 +694,7 @@ struct BindDescriptorSetCmd final : CmdDerive { }; struct PushConstantsCmd final : CmdDerive { - PipelineLayout* pipeLayout; // kept alive via shared_ptr in CommandBuffer + PipelineLayout* pipeLayout; // kept alive via shared ptr in CommandBuffer VkShaderStageFlags stages {}; u32 offset {}; span values; + bool v2 {}; + void* pNext {}; - std::string_view nameDesc() const override { return "PushConstants"; } + std::string_view nameDesc() const override { + return v2 ? "PushConstants2" : "PushConstants"; + } Category category() const override { return Category::bind; } void record(const Device&, VkCommandBuffer, u32) const override; }; @@ -1170,6 +1180,7 @@ struct PushDescriptorSetCmd final : CmdDerive data; + bool v2 {}; + void* pNext{}; + Category category() const override { return Category::bind; } - std::string_view nameDesc() const override { return "PushDescriptorSetWithTemplate"; } + std::string_view nameDesc() const override { + return v2 ? "PushDescriptorSetWithTemplate2" : "PushDescriptorSetWithTemplate"; + } void record(const Device&, VkCommandBuffer cb, u32) const override; }; @@ -1605,9 +1621,12 @@ struct GeneratedCommandsInfo { }; // TODO: make this parent command? 
+VkGeneratedCommandsInfoEXT convert(const GeneratedCommandsInfo& info); + struct ExecuteGeneratedCommandsCmd final : CmdDerive { bool isPreprocessed {}; GeneratedCommandsInfo info {}; + void* pNext {}; std::string_view nameDesc() const override { return "ExecuteGeneratedCommands"; } void record(const Device&, VkCommandBuffer cb, u32) const override; @@ -1617,6 +1636,7 @@ struct ExecuteGeneratedCommandsCmd final : CmdDerive { GeneratedCommandsInfo info {}; CommandBuffer* state {}; + void* pNext {}; std::string_view nameDesc() const override { return "PreprocessGeneratedCommands"; } void record(const Device&, VkCommandBuffer cb, u32) const override; @@ -1809,6 +1829,47 @@ struct SetDepthBias2Cmd final : CmdDerive { Category category() const override { return Category::bind; } }; +// VK_EXT_descriptor_buffer +struct BindDescriptorBuffersCmd final : CmdDerive { + span buffers; + + std::string_view nameDesc() const override { return "BindDescriptorBuffers"; } + void record(const Device&, VkCommandBuffer cb, u32) const override; + Category category() const override { return Category::bind; } +}; + +struct SetDescriptorBufferOffsetsCmd final : CmdDerive { + PipelineLayout* pipeLayout {}; + u32 firstSet {}; + span bufferIndices {}; + span offsets {}; + + VkPipelineBindPoint bindPoint {}; // only for v1 + VkShaderStageFlags stages {}; // only for v2 + void* pNext {}; + + std::string_view nameDesc() const override { + return stages ? "SetDescriptorBufferOffsets2" : "SetDescriptorBufferOffsets"; + } + void record(const Device&, VkCommandBuffer cb, u32) const override; + Category category() const override { return Category::bind; } +}; + +struct BindDescriptorBufferEmbeddedSamplersCmd final : CmdDerive { + PipelineLayout* pipeLayout {}; + u32 set {}; + + VkPipelineBindPoint bindPoint {}; // only for v1 + VkShaderStageFlags stages {}; // only for v2 + void* pNext {}; + + std::string_view nameDesc() const override { + return stages ? 
"BindDescriptorBufferEmbeddedSamplers2" : "BindDescriptorBufferEmbeddedSamplers"; + } + void record(const Device&, VkCommandBuffer cb, u32) const override; + Category category() const override { return Category::bind; } +}; + // F: overloaded function type of signature void(*) // Will call f with cmd casted to the type indicated by cmdType. @@ -1944,6 +2005,9 @@ auto castCommandType(CommandType cmdType, Command* cmd, F&& f) { case CT::setLineStippleEnable: return f(static_cast(cmd)); case CT::setDepthClipNegativeOneToOneEXT: return f(static_cast(cmd)); case CT::setDepthBias2: return f(static_cast(cmd)); + case CT::bindDescriptorBuffers: return f(static_cast(cmd)); + case CT::setDescriptorBufferOffsets: return f(static_cast(cmd)); + case CT::bindDescriptorBufferEmbeddedSamplers: return f(static_cast(cmd)); case CT::count: dlg_error("Invalid command type"); // NOTE: no default case by design so that we get compiler warnings about diff --git a/src/commandHook/hook.cpp b/src/commandHook/hook.cpp index f890c9c..1cc37bf 100644 --- a/src/commandHook/hook.cpp +++ b/src/commandHook/hook.cpp @@ -142,7 +142,8 @@ CommandHook::CommandHook(Device& dev) { hookAccelStructBuilds = checkEnvBinary("VIL_CAPTURE_ACCEL_STRUCTS", true); initImageCopyPipes(dev); initVertexCopy(dev); - if(hasAppExt(dev, VK_KHR_ACCELERATION_STRUCTURE_EXTENSION_NAME)) { + if(hasAppExt(dev, VK_KHR_ACCELERATION_STRUCTURE_EXTENSION_NAME) && + dev.bufferDeviceAddress) { initAccelStructCopy(dev); initShaderTableHook(dev); } diff --git a/src/commandHook/record.cpp b/src/commandHook/record.cpp index 64a05fe..6db1c2b 100644 --- a/src/commandHook/record.cpp +++ b/src/commandHook/record.cpp @@ -691,7 +691,8 @@ void CommandHookRecord::hookRecord(Command* cmd, RecordInfo& info) { while(cmd) { // check if command needs additional, manual hook if(cmd->category() == CommandCategory::buildAccelStruct && - commandHook().hookAccelStructBuilds) { + commandHook().hookAccelStructBuilds && + commandHook().accelStructVertCopy_) { 
auto* basCmd = commandCast(cmd); auto* basCmdIndirect = commandCast(cmd); diff --git a/src/data.hpp b/src/data.hpp index b22b769..ea1045a 100644 --- a/src/data.hpp +++ b/src/data.hpp @@ -52,8 +52,11 @@ R& getData(T handle) { template void insertData(T handle, void* data) { std::lock_guard lock(dataMutex); - auto [_, success] = dispatchableTable.emplace(handleToU64(handle), data); - dlg_assert(success); + dlg_trace("insertData {} {}", typeid(handle).name(), handleToU64(handle)); + // When in doubt, we want to override the existing entry. We just assume + // that it wasn't properly cleaned up before. + auto [_, newInsert] = dispatchableTable.insert_or_assign(handleToU64(handle), data); + dlg_assertm(newInsert, "handle {} already known", handleToU64(handle)); } template @@ -72,6 +75,7 @@ void eraseData(T handle) { return; } + dlg_trace("eraseData {} {}", typeid(handle).name(), handleToU64(handle)); dispatchableTable.erase(it); } @@ -85,6 +89,7 @@ std::unique_ptr moveDataOpt(T handle) { auto ptr = it->second; dispatchableTable.erase(it); + dlg_trace("eraseData {} {}", typeid(handle).name(), handleToU64(handle)); return std::unique_ptr(static_cast(ptr)); } diff --git a/src/device.cpp b/src/device.cpp index d24bd9c..a5fc978 100644 --- a/src/device.cpp +++ b/src/device.cpp @@ -860,6 +860,7 @@ VkResult doCreateDevice( dev.enabledFeatures13 = features13; dev.windowWaitForSurface = checkEnvBinary("VIL_WAIT_SURFACE", false); + dev.hookRecordOnEnd = checkEnvBinary("VIL_CB_TEST_HOOK", true); layer_init_device_dispatch_table(dev.handle, &dev.dispatch, fpGetDeviceProcAddr); diff --git a/src/device.hpp b/src/device.hpp index ccb38b4..2c247ab 100644 --- a/src/device.hpp +++ b/src/device.hpp @@ -107,6 +107,7 @@ struct Device { // Whether we are in integration testing mode bool testing {}; + bool hookRecordOnEnd {}; // Whether indirect vertex copy is enabled.
// Will modify usage flags resources are created with diff --git a/src/ds.cpp b/src/ds.cpp index 32166f2..1bd0ccf 100644 --- a/src/ds.cpp +++ b/src/ds.cpp @@ -2057,4 +2057,70 @@ VKAPI_ATTR void VKAPI_CALL GetDescriptorSetLayoutSupport( dev.dispatch.GetDescriptorSetLayoutSupport(dev.handle, &nci, pSupport); } +// VK_EXT_descriptor_buffer +VKAPI_ATTR void VKAPI_CALL GetDescriptorSetLayoutSizeEXT( + VkDevice device, + VkDescriptorSetLayout vklayout, + VkDeviceSize* pLayoutSizeInBytes) { + auto& layout = get(device, vklayout); + auto& dev = *layout.dev; + return dev.dispatch.GetDescriptorSetLayoutSizeEXT(dev.handle, + layout.handle, pLayoutSizeInBytes); +} + +VKAPI_ATTR void VKAPI_CALL GetDescriptorSetLayoutBindingOffsetEXT( + VkDevice device, + VkDescriptorSetLayout vklayout, + uint32_t binding, + VkDeviceSize* pOffset) { + auto& layout = get(device, vklayout); + auto& dev = *layout.dev; + return dev.dispatch.GetDescriptorSetLayoutBindingOffsetEXT(dev.handle, + layout.handle, binding, pOffset); +} + +VKAPI_ATTR void VKAPI_CALL GetDescriptorEXT( + VkDevice device, + const VkDescriptorGetInfoEXT* pDescriptorInfo, + size_t dataSize, + void* pDescriptor) { + auto& dev = getDevice(device); + auto info = *pDescriptorInfo; + VkDescriptorImageInfo imgInfo; + switch (info.type) { + case VK_DESCRIPTOR_TYPE_SAMPLER: + if (info.data.pSampler) { + info.data.pSampler = &get(device, *info.data.pSampler).handle; + } + break; + case VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER: + case VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT: + case VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE: + case VK_DESCRIPTOR_TYPE_STORAGE_IMAGE: + if (info.data.pSampledImage) { + imgInfo = *info.data.pSampledImage; + if (imgInfo.imageView) { + imgInfo.imageView = get(device, imgInfo.imageView).handle; + } + if (imgInfo.sampler) { + imgInfo.sampler = get(device, imgInfo.sampler).handle; + } + info.data.pSampledImage = &imgInfo; + } + break; + case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER: + case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER: + case 
VK_DESCRIPTOR_TYPE_STORAGE_TEXEL_BUFFER: + case VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER: + case VK_DESCRIPTOR_TYPE_ACCELERATION_STRUCTURE_KHR: + // nop, just address referenced + break; + default: + dlg_error("Unsupported descriptor type"); + break; + } + + dev.dispatch.GetDescriptorEXT(dev.handle, &info, dataSize, pDescriptor); +} + } // namespace vil diff --git a/src/ds.hpp b/src/ds.hpp index 0333aa1..d486477 100644 --- a/src/ds.hpp +++ b/src/ds.hpp @@ -404,4 +404,50 @@ VKAPI_ATTR void VKAPI_CALL GetDescriptorSetLayoutSupport( const VkDescriptorSetLayoutCreateInfo* pCreateInfo, VkDescriptorSetLayoutSupport* pSupport); +// VK_EXT_descriptor_buffer +VKAPI_ATTR void VKAPI_CALL GetDescriptorSetLayoutSizeEXT( + VkDevice device, + VkDescriptorSetLayout layout, + VkDeviceSize* pLayoutSizeInBytes); + +VKAPI_ATTR void VKAPI_CALL GetDescriptorSetLayoutBindingOffsetEXT( + VkDevice device, + VkDescriptorSetLayout layout, + uint32_t binding, + VkDeviceSize* pOffset); + +VKAPI_ATTR void VKAPI_CALL GetDescriptorEXT( + VkDevice device, + const VkDescriptorGetInfoEXT* pDescriptorInfo, + size_t dataSize, + void* pDescriptor); + +/* +// TODO +VKAPI_ATTR VkResult VKAPI_CALL GetBufferOpaqueCaptureDescriptorDataEXT( + VkDevice device, + const VkBufferCaptureDescriptorDataInfoEXT* pInfo, + void* pData); + +VKAPI_ATTR VkResult VKAPI_CALL GetImageOpaqueCaptureDescriptorDataEXT( + VkDevice device, + const VkImageCaptureDescriptorDataInfoEXT* pInfo, + void* pData); + +VKAPI_ATTR VkResult VKAPI_CALL GetImageViewOpaqueCaptureDescriptorDataEXT( + VkDevice device, + const VkImageViewCaptureDescriptorDataInfoEXT* pInfo, + void* pData); + +VKAPI_ATTR VkResult VKAPI_CALL GetSamplerOpaqueCaptureDescriptorDataEXT( + VkDevice device, + const VkSamplerCaptureDescriptorDataInfoEXT* pInfo, + void* pData); + +VKAPI_ATTR VkResult VKAPI_CALL GetAccelerationStructureOpaqueCaptureDescriptorDataEXT( + VkDevice device, + const VkAccelerationStructureCaptureDescriptorDataInfoEXT* pInfo, + void* pData); +*/ + } 
// namespace vil diff --git a/src/exts.hpp b/src/exts.hpp index 6f6f42b..8179806 100644 --- a/src/exts.hpp +++ b/src/exts.hpp @@ -352,6 +352,7 @@ constexpr std::string_view supportedDevExts[] = { "VK_KHR_depth_stencil_resolve", "VK_EXT_vertex_attribute_divisor", "VK_KHR_vertex_attribute_divisor", + "VK_EXT_descriptor_buffer", }; // Known/look like they might cause problems or crashes. @@ -402,7 +403,6 @@ constexpr std::string_view unsupportedDevExts[] = { "VK_KHR_object_refresh", // commands "VK_QCOM_tile_shading", // commands "VK_EXT_metal_objects", - "VK_EXT_descriptor_buffer", // TODO! "VK_NV_fragment_shading_rate_enums", // command "VK_NV_ray_tracing_motion_blur", // might work as nop? "VK_EXT_image_compression_control", // image wrapped diff --git a/src/gencmd.cpp b/src/gencmd.cpp index b6a07e5..70b3a8d 100644 --- a/src/gencmd.cpp +++ b/src/gencmd.cpp @@ -5,9 +5,35 @@ #include #include #include +#include namespace vil { +// ugly, should be integrated into util/chain.hpp functions +void patchIndirectExecutionChain(LinAllocator& alloc, Device& dev, void* pNext) { + auto next = static_cast(pNext); + while(next) { + if(next->sType == VK_STRUCTURE_TYPE_GENERATED_COMMANDS_SHADER_INFO_EXT) { + auto* shaders = reinterpret_cast(next); + auto shadersCopy = alloc.copy(shaders->pShaders, shaders->shaderCount); + for(auto& shader : shadersCopy) { + if(shader) { + shader = get(dev, shader).handle; + } + } + + shaders->pShaders = shadersCopy.data(); + } else if(next->sType == VK_STRUCTURE_TYPE_GENERATED_COMMANDS_PIPELINE_INFO_EXT) { + auto* pipe = reinterpret_cast(next); + if(pipe->pipeline) { + pipe->pipeline = get(dev, pipe->pipeline).handle; + } + } + + next = next->pNext; + } +} + IndirectCommandsLayout::~IndirectCommandsLayout() = default; VKAPI_ATTR void VKAPI_CALL GetGeneratedCommandsMemoryRequirementsEXT( @@ -17,8 +43,18 @@ VKAPI_ATTR void VKAPI_CALL GetGeneratedCommandsMemoryRequirementsEXT( auto& dev = getDevice(device); auto info = *pInfo; - 
info.indirectCommandsLayout = get(device, pInfo->indirectCommandsLayout).handle; - info.indirectExecutionSet = get(device, pInfo->indirectExecutionSet).handle; + if (info.indirectCommandsLayout) { + info.indirectCommandsLayout = get(device, pInfo->indirectCommandsLayout).handle; + } + if (info.indirectExecutionSet) { + info.indirectExecutionSet = get(device, pInfo->indirectExecutionSet).handle; + } + + // unwrap pipe/shader handles + ThreadMemScope tms; + auto copied = copyChainLocal(tms, info.pNext); + info.pNext = copied; + patchIndirectExecutionChain(tms.customUse(), dev, copied); return dev.dispatch.GetGeneratedCommandsMemoryRequirementsEXT(device, &info, pMemoryRequirements); @@ -31,8 +67,12 @@ VKAPI_ATTR VkResult VKAPI_CALL CreateIndirectCommandsLayoutEXT( VkIndirectCommandsLayoutEXT* pIndirectCommandsLayout) { auto& dev = getDevice(device); auto nci = *pCreateInfo; - auto& pipeLayout = get(device, nci.pipelineLayout); - nci.pipelineLayout = pipeLayout.handle; + + PipelineLayout* pipeLayout {}; + if (nci.pipelineLayout) { + pipeLayout = &get(device, nci.pipelineLayout); + nci.pipelineLayout = pipeLayout->handle; + } auto res = dev.dispatch.CreateIndirectCommandsLayoutEXT(dev.handle, &nci, pAllocator, pIndirectCommandsLayout); @@ -40,14 +80,17 @@ VKAPI_ATTR VkResult VKAPI_CALL CreateIndirectCommandsLayoutEXT( return res; } - auto& cmdLayout = dev.indirectCommandsLayouts.add(*pIndirectCommandsLayout); + auto cmdLayoutPtr = IntrusivePtr(new IndirectCommandsLayout()); + auto& cmdLayout = *cmdLayoutPtr; cmdLayout.dev = &dev; - cmdLayout.pipeLayout.reset(&pipeLayout); + cmdLayout.pipeLayout.reset(pipeLayout); cmdLayout.handle = *pIndirectCommandsLayout; cmdLayout.tokens = {pCreateInfo->pTokens, pCreateInfo->pTokens + pCreateInfo->tokenCount}; cmdLayout.stride = pCreateInfo->indirectStride; cmdLayout.flags = pCreateInfo->flags; + *pIndirectCommandsLayout = castDispatch(cmdLayout); + dev.indirectCommandsLayouts.mustEmplace(*pIndirectCommandsLayout, 
std::move(cmdLayoutPtr)); return res; } @@ -58,7 +101,7 @@ VKAPI_ATTR void VKAPI_CALL DestroyIndirectCommandsLayoutEXT( const VkAllocationCallbacks* pAllocator) { auto cmdLayoutPtr = mustMoveUnset(device, indirectCommandsLayout); cmdLayoutPtr->dev->dispatch.DestroyIndirectCommandsLayoutEXT( - cmdLayoutPtr->dev->handle, cmdLayoutPtr->handle, pAllocator); + cmdLayoutPtr->dev->handle, indirectCommandsLayout, pAllocator); } VKAPI_ATTR VkResult VKAPI_CALL CreateIndirectExecutionSetEXT( @@ -83,16 +126,22 @@ VKAPI_ATTR VkResult VKAPI_CALL CreateIndirectExecutionSetEXT( nci.info.pShaderInfo = &shaderInfo; auto shaders = tms.copy(shaderInfo.pInitialShaders, shaderInfo.maxShaderCount); - auto layoutInfos = tms.copy(shaderInfo.pSetLayoutInfos, shaderInfo.maxShaderCount); + + span layoutInfos; + if (shaderInfo.pSetLayoutInfos) { + layoutInfos = tms.copy(shaderInfo.pSetLayoutInfos, shaderInfo.maxShaderCount); + } for(auto i = 0u; i < shaderInfo.maxShaderCount; ++i) { shaders[i] = get(device, shaders[i]).handle; - auto dsLayouts = tms.copy(layoutInfos[i].pSetLayouts, layoutInfos[i].setLayoutCount); - for (auto& dsLayout : dsLayouts) { - dsLayout = get(device, dsLayout).handle; + if (!layoutInfos.empty()) { + auto dsLayouts = tms.copy(layoutInfos[i].pSetLayouts, layoutInfos[i].setLayoutCount); + for (auto& dsLayout : dsLayouts) { + dsLayout = get(device, dsLayout).handle; + } + layoutInfos[i].pSetLayouts = dsLayouts.data(); } - layoutInfos[i].pSetLayouts = dsLayouts.data(); } shaderInfo.pInitialShaders = shaders.data(); @@ -107,11 +156,14 @@ VKAPI_ATTR VkResult VKAPI_CALL CreateIndirectExecutionSetEXT( return res; } - auto& cmdLayout = dev.indirectExecutionSets.add(*pIndirectExecutionSet); - cmdLayout.dev = &dev; + auto exeSetPtr = IntrusivePtr(new IndirectExecutionSet()); + auto& exeSet = *exeSetPtr; + exeSet.dev = &dev; + exeSet.handle = *pIndirectExecutionSet; // TODO: remember unwrapped creation info - cmdLayout.handle = *pIndirectExecutionSet; - *pIndirectExecutionSet = 
castDispatch(cmdLayout); + + *pIndirectExecutionSet = castDispatch(exeSet); + dev.indirectExecutionSets.mustEmplace(*pIndirectExecutionSet, std::move(exeSetPtr)); return res; } @@ -122,7 +174,7 @@ VKAPI_ATTR void VKAPI_CALL DestroyIndirectExecutionSetEXT( const VkAllocationCallbacks* pAllocator) { auto cmdLayoutPtr = mustMoveUnset(device, indirectExecutionSet); cmdLayoutPtr->dev->dispatch.DestroyIndirectExecutionSetEXT( - cmdLayoutPtr->dev->handle, cmdLayoutPtr->handle, pAllocator); + cmdLayoutPtr->dev->handle, indirectExecutionSet, pAllocator); } VKAPI_ATTR void VKAPI_CALL UpdateIndirectExecutionSetPipelineEXT( diff --git a/src/gencmd.hpp b/src/gencmd.hpp index deab30e..1ad1c2c 100644 --- a/src/gencmd.hpp +++ b/src/gencmd.hpp @@ -23,6 +23,9 @@ struct IndirectExecutionSet : public SharedDeviceHandle { VkIndirectExecutionSetEXT handle {}; }; +void patchIndirectExecutionChain(LinAllocator& alloc, Device& dev, void* pNext); + + // api VKAPI_ATTR void VKAPI_CALL GetGeneratedCommandsMemoryRequirementsEXT( VkDevice device, diff --git a/src/gui/cb.cpp b/src/gui/cb.cpp index 1a5a931..f568077 100644 --- a/src/gui/cb.cpp +++ b/src/gui/cb.cpp @@ -285,7 +285,13 @@ void CommandRecordGui::draw(Draw& draw) { // selected command anymore. 
updateMode = selector_.updateMode(); if(updateMode == UpdateMode::swapchain && !selector_.freezeState && doUpdate) { - auto lastPresent = swapchain->frameSubmissions[0].presentID; + FrameSubmissions lastFrame; + { + std::lock_guard lock(dev.mutex); + lastFrame = swapchain->frameSubmissions[0]; + } + + auto lastPresent = lastFrame.presentID; auto statePresent = selector_.hookStateSwapchainPresent(); if(!selector_.submission() || lastPresent > statePresent + 5) { auto diff = lastPresent - statePresent; @@ -299,8 +305,7 @@ void CommandRecordGui::draw(Draw& draw) { // force update if(!freezeCommands_) { - updateRecords(swapchain->frameSubmissions[0].batches, - {}, {}, {}); + updateRecords(std::move(lastFrame.batches), {}, {}, {}); } } } @@ -702,10 +707,13 @@ void CommandRecordGui::displayBatch(FrameSubmission& batch, u32 batchID) { } void CommandRecordGui::displayFrameCommands(Swapchain& swapchain) { + (void) swapchain; + /* if(frame_.empty() && swapchain.frameSubmissions[0].batches.empty()) { dlg_warn("how did this happen?"); frame_ = swapchain.frameSubmissions[0].batches; } + */ for(auto b = 0u; b < frame_.size(); ++b) { if(b > 0) { diff --git a/src/gui/command.cpp b/src/gui/command.cpp index 426d77b..c59e8c5 100644 --- a/src/gui/command.cpp +++ b/src/gui/command.cpp @@ -972,7 +972,7 @@ void CommandViewer::displayDs(Draw& draw) { auto* buf = std::get_if(&copiedData->data); if(!buf) { dlg_assert(copiedData->data.index() == 0); - imGuiText("Error copying descriptor buffer. See log output"); + imGuiText("Error copying buffer from descriptor. 
See log output"); return; } diff --git a/src/gui/gui.cpp b/src/gui/gui.cpp index e0198f1..0c1f0e3 100644 --- a/src/gui/gui.cpp +++ b/src/gui/gui.cpp @@ -1132,6 +1132,8 @@ void Gui::drawOverviewUI(Draw& draw) { imGuiCheckbox("Hook AccelerationStructures", dev.commandHook->hookAccelStructBuilds); imGuiCheckbox("Print VertexCapture Timings", dev.printVertexCaptureTimings); imGuiCheckbox("Print VertexCapture Metadata", dev.printVertexCaptureMetadata); + ImGui::Checkbox("Show cursor", &io_->MouseDrawCursor); + ImGui::Checkbox("Show debug window", &showDebug); } } diff --git a/src/gui/resources.cpp b/src/gui/resources.cpp index c4c3370..866fce9 100644 --- a/src/gui/resources.cpp +++ b/src/gui/resources.cpp @@ -24,6 +24,7 @@ #include #include #include +#include #include #include #include @@ -1544,6 +1545,14 @@ void ResourceGui::drawDesc(Draw& draw, ShaderObject& sobj) { imGuiText("TODO"); } +void ResourceGui::drawDesc(Draw&, IndirectExecutionSet&) { + imGuiText("TODO"); +} + +void ResourceGui::drawDesc(Draw&, IndirectCommandsLayout&) { + imGuiText("TODO"); +} + void ResourceGui::clearHandles() { auto decRefCountVisitor = TemplateResourceVisitor([&](auto& res) { using HT = std::remove_reference_t; @@ -1567,8 +1576,10 @@ void ResourceGui::clearHandles() { // clear selection auto typeHandler = ObjectTypeHandler::handler(filter_); - for(auto& handle : handles_) { - typeHandler->visit(decRefCountVisitor, *handle); + if (typeHandler) { + for(auto& handle : handles_) { + typeHandler->visit(decRefCountVisitor, *handle); + } } handles_.clear(); @@ -1618,7 +1629,7 @@ void ResourceGui::updateResourceList() { } } } - } else { + } else if(typeHandler) { handles_ = typeHandler->resources(dev, search_); for(auto& handle : handles_) { @@ -1664,7 +1675,11 @@ void ResourceGui::draw(Draw& draw) { auto filterName = vil::name(filter_); // ImGui::SetNextItemWidth(150.f); if(ImGui::BeginCombo(ICON_FA_FILTER, filterName)) { - for(auto& typeHandler : ObjectTypeHandler::handlers) { + for(auto* 
typeHandler : ObjectTypeHandler::handlers) { + if (!typeHandler) { + continue; + } + auto filter = typeHandler->objectType(); auto name = vil::name(filter); if(ImGui::Selectable(name)) { @@ -1695,6 +1710,7 @@ void ResourceGui::draw(Draw& draw) { // resource list const auto* typeHandler = ObjectTypeHandler::handler(filter_); + dlg_assert(typeHandler); bool isDestroyed {}; auto isDestroyedVisitor = TemplateResourceVisitor([&](auto& res) { @@ -1888,7 +1904,7 @@ void ResourceGui::drawHandleDesc(Draw& draw) { } } else { for(auto& handler : ObjectTypeHandler::handlers) { - if(handler->objectType() == filter_) { + if(handler && handler->objectType() == filter_) { handler->visit(visitor, *handle_); } } diff --git a/src/gui/resources.hpp b/src/gui/resources.hpp index e15f89e..e36ac2f 100644 --- a/src/gui/resources.hpp +++ b/src/gui/resources.hpp @@ -56,6 +56,8 @@ class ResourceGui { void drawDesc(Draw&, AccelStruct&); void drawDesc(Draw&, DescriptorUpdateTemplate&); void drawDesc(Draw&, ShaderObject&); + void drawDesc(Draw&, IndirectExecutionSet&); + void drawDesc(Draw&, IndirectCommandsLayout&); void drawShaderInfo(VkPipeline, VkShaderStageFlagBits stage); void drawImageContents(Draw&, Image&, bool doSelect); diff --git a/src/handle.cpp b/src/handle.cpp index 91cc3b8..e4766e8 100644 --- a/src/handle.cpp +++ b/src/handle.cpp @@ -15,7 +15,7 @@ #include #include #include -#include +#include #include #include #include @@ -266,39 +266,49 @@ const PipelineTypeImpl PipelineTypeImpl::instance; // TODO: replace with something like castCommandType? 
static const ObjectTypeHandler* typeHandlers[] = { - &ObjectTypeMapImpl::instance, - &ObjectTypeMapImpl::instance, - &ObjectTypeMapImpl::instance, - &ObjectTypeMapImpl::instance, - &ObjectTypeMapImpl::instance, + nullptr, // unknown + nullptr, // instance + nullptr, // physical device + nullptr, // device + &QueueTypeImpl::instance, + &ObjectTypeMapImpl::instance, &ObjectTypeMapImpl::instance, - &ObjectTypeMapImpl::instance, - &ObjectTypeMapImpl::instance, - &ObjectTypeMapImpl::instance, - &ObjectTypeMapImpl::instance, - &ObjectTypeMapImpl::instance, - &ObjectTypeMapImpl::instance, - &ObjectTypeMapImpl::instance, &ObjectTypeMapImpl::instance, - &ObjectTypeMapImpl::instance, + &ObjectTypeMapImpl::instance, + &ObjectTypeMapImpl::instance, + &ObjectTypeMapImpl::instance, &ObjectTypeMapImpl::instance, &ObjectTypeMapImpl::instance, - &ObjectTypeMapImpl::instance, + &ObjectTypeMapImpl::instance, + &ObjectTypeMapImpl::instance, &ObjectTypeMapImpl::instance, - &ObjectTypeMapImpl::instance, - &ObjectTypeMapImpl::instance, + nullptr, // pipeline cache + &ObjectTypeMapImpl::instance, + &ObjectTypeMapImpl::instance, &PipelineTypeImpl::instance, - &QueueTypeImpl::instance, - // NOTE: this one is special, it does not support all operations. - // Caller must handle this on their side. + &ObjectTypeMapImpl::instance, + &ObjectTypeMapImpl::instance, + &ObjectTypeMapImpl::instance, &DescriptorSetTypeImpl::instance, + &ObjectTypeMapImpl::instance, + &ObjectTypeMapImpl::instance, + &ObjectTypeMapImpl::instance, + // ... + &ObjectTypeMapImpl::instance, + // ... + &ObjectTypeMapImpl::instance, + // ... + &ObjectTypeMapImpl::instance, + // ... 
+ &ObjectTypeMapImpl::instance, + &ObjectTypeMapImpl::instance, }; const span ObjectTypeHandler::handlers = typeHandlers; const ObjectTypeHandler* ObjectTypeHandler::handler(VkObjectType type) { for(auto& handler : ObjectTypeHandler::handlers) { - if(handler->objectType() == type) { + if(handler && handler->objectType() == type) { return handler; } } @@ -309,13 +319,14 @@ const ObjectTypeHandler* ObjectTypeHandler::handler(VkObjectType type) { } Handle* findHandle(Device& dev, VkObjectType objectType, u64 handle, u64& fwdID) { - for(auto& handler : ObjectTypeHandler::handlers) { - if(handler->objectType() == objectType) { - auto* ptr = handler->find(dev, handle, fwdID); - if(ptr) { - return ptr; - } - } + auto* h = ObjectTypeHandler::handler(objectType); + if(!h) { + return nullptr; + } + + auto* ptr = h->find(dev, handle, fwdID); + if(ptr) { + return ptr; } dlg_info("can't find handle {}, type {}", handle, vk::name(objectType)); diff --git a/src/handle.hpp b/src/handle.hpp index 3e48565..180fc4a 100644 --- a/src/handle.hpp +++ b/src/handle.hpp @@ -69,6 +69,8 @@ struct ResourceVisitor { virtual void visit(ShaderModule&) = 0; virtual void visit(AccelStruct&) = 0; virtual void visit(ShaderObject&) = 0; + virtual void visit(IndirectCommandsLayout& res) = 0; + virtual void visit(IndirectExecutionSet& res) = 0; }; template @@ -105,6 +107,8 @@ struct TemplateResourceVisitor : ResourceVisitor { void visit(ShaderModule& res) override { impl(res); } void visit(AccelStruct& res) override { impl(res); } void visit(ShaderObject& res) override { impl(res); } + void visit(IndirectCommandsLayout& res) override { impl(res); } + void visit(IndirectExecutionSet& res) override { impl(res); } }; struct ObjectTypeHandler { diff --git a/src/layer.cpp b/src/layer.cpp index 69447b4..2a73325 100644 --- a/src/layer.cpp +++ b/src/layer.cpp @@ -43,6 +43,10 @@ #include +#if !(defined(_WIN32) && (_MSC_VER >= 1900)) +extern "C" char** environ; +#endif + namespace vil { // Util @@ -160,6 +164,25 
@@ VkResult VKAPI_PTR SetInstanceLoaderDataNOOP(VkInstance, void*) { return VK_SUCCESS; } +void printEnvironment() { + char** s; +#if defined(_WIN32) && (_MSC_VER >= 1900) + s = *__p__environ(); +#else + s = ::environ; +#endif + + dlg_trace("Creating instance. Environment:"); + for (; *s; s++) { + dlg_trace("{}", *s); + } + + // TODO WIP test for xkbcommon issues with wine +#ifndef _WIN32 + // setenv("XLOCALEDIR", "/usr/share/X11/locale", 0); +#endif // _WIN32 +} + // Instance VkResult doCreateInstance( const VkInstanceCreateInfo* ci, @@ -209,6 +232,8 @@ VkResult doCreateInstance( } #endif // DLG_DISABLE + printEnvironment(); + PFN_vkGetInstanceProcAddr fpGetInstanceProcAddr {}; VkLayerInstanceCreateInfo* mutLinkInfo {}; if(standalone) { @@ -446,6 +471,70 @@ VKAPI_ATTR void VKAPI_CALL DestroyInstance(VkInstance ini, const VkAllocationCal shutdownTracy(); } +VKAPI_ATTR VkResult VKAPI_CALL EnumerateDeviceExtensionProperties( + VkPhysicalDevice phdev, + const char* pLayerName, + uint32_t* pPropertyCount, + VkExtensionProperties* pProperties) { + dlg_assert(phdev); + + auto* ini = findData(phdev); + dlg_assert(ini); + + constexpr auto filterExtensions = true; + if (!filterExtensions) { + return ini->dispatch.EnumerateDeviceExtensionProperties(phdev, pLayerName, + pPropertyCount, pProperties); + } + + // TODO: always do this? 
+ u32 count {}; + auto res = ini->dispatch.EnumerateDeviceExtensionProperties(phdev, + pLayerName, &count, nullptr); + if (res != VK_SUCCESS) { + return res; + } + + std::vector props(count); + res = ini->dispatch.EnumerateDeviceExtensionProperties(phdev, + pLayerName, &count, props.data()); + if (res != VK_SUCCESS) { + return res; + } + + auto checkSupport = [&](const VkExtensionProperties& ext) { + if (contains(supportedDevExts, ext.extensionName)) { + return false; + } + + if (contains(unsupportedDevExts, ext.extensionName)) { + dlg_info("Filtering out unsupported extension {}", ext.extensionName); + return true; + } + + dlg_info("Filtering out unknown extension {}", ext.extensionName); + return true; + }; + erase_if(props, checkSupport); + + res = VK_SUCCESS; + if (pPropertyCount && pProperties) { + auto count = props.size(); + if (*pPropertyCount < count) { + res = VK_INCOMPLETE; + count = *pPropertyCount; + } else { + *pPropertyCount = count; + } + + std::memcpy(pProperties, props.data(), sizeof(props[0]) * count); + } else if (pPropertyCount) { + *pPropertyCount = u32(props.size()); + } + + return res; +} + // tmp test void CmdCuLaunchKernelNVX( VkCommandBuffer commandBuffer, @@ -538,6 +627,7 @@ static const std::unordered_map funcPtrTable { VIL_DEV_HOOK(CreateDevice, VK_API_VERSION_1_0), VIL_DEV_HOOK(DestroyDevice, VK_API_VERSION_1_0), VIL_DEV_HOOK(DeviceWaitIdle, VK_API_VERSION_1_0), + VIL_DEV_HOOK(EnumerateDeviceExtensionProperties, VK_API_VERSION_1_0), // queue.hpp VIL_DEV_HOOK(QueueSubmit, VK_API_VERSION_1_0), @@ -1056,6 +1146,18 @@ static const std::unordered_map funcPtrTable { // VK_EXT_depth_bias_control VIL_DEV_HOOK_EXT(CmdSetDepthBias2EXT, VK_EXT_DEPTH_BIAS_CONTROL_EXTENSION_NAME), + // VK_EXT_descriptor_buffer + VIL_DEV_HOOK_EXT(GetDescriptorSetLayoutSizeEXT, VK_EXT_DESCRIPTOR_BUFFER_EXTENSION_NAME), + VIL_DEV_HOOK_EXT(GetDescriptorSetLayoutBindingOffsetEXT, VK_EXT_DESCRIPTOR_BUFFER_EXTENSION_NAME), + VIL_DEV_HOOK_EXT(GetDescriptorEXT, 
VK_EXT_DESCRIPTOR_BUFFER_EXTENSION_NAME), + // TODO: support opaque capture functions + VIL_DEV_HOOK_EXT(CmdBindDescriptorBuffersEXT, VK_EXT_DESCRIPTOR_BUFFER_EXTENSION_NAME), + VIL_DEV_HOOK_EXT(CmdSetDescriptorBufferOffsetsEXT, VK_EXT_DESCRIPTOR_BUFFER_EXTENSION_NAME), + VIL_DEV_HOOK_EXT(CmdBindDescriptorBufferEmbeddedSamplersEXT, VK_EXT_DESCRIPTOR_BUFFER_EXTENSION_NAME), + // TODO they also require vulkan1.4|maintenance6. Support expressions! + VIL_DEV_HOOK_EXT(CmdSetDescriptorBufferOffsets2EXT, VK_EXT_DESCRIPTOR_BUFFER_EXTENSION_NAME), + VIL_DEV_HOOK_EXT(CmdBindDescriptorBufferEmbeddedSamplers2EXT, VK_EXT_DESCRIPTOR_BUFFER_EXTENSION_NAME), + // For dlss testing. // VIL_DEV_HOOK_EXT(GetImageViewAddressNVX, VK_NVX_IMAGE_VIEW_HANDLE_EXTENSION_NAME), // VIL_DEV_HOOK_EXT(GetImageViewHandleNVX, VK_NVX_IMAGE_VIEW_HANDLE_EXTENSION_NAME), diff --git a/src/pipe.cpp b/src/pipe.cpp index 8534cf1..9c38b34 100644 --- a/src/pipe.cpp +++ b/src/pipe.cpp @@ -89,6 +89,10 @@ VKAPI_ATTR VkResult VKAPI_CALL CreateGraphicsPipelines( nci.basePipelineHandle = basePipe.handle; } + auto* flags2Info = findChainInfo(nci); + auto flags2 = flags2Info ? flags2Info->flags : u64(0u); + auto& pre = pres[i]; if(nci.renderPass) { @@ -102,6 +106,7 @@ VKAPI_ATTR VkResult VKAPI_CALL CreateGraphicsPipelines( // TODO: support it for non-multiview dynamic rendering auto useXfb = dev.transformFeedback && pre.rp && + !(flags2 & VK_PIPELINE_CREATE_2_INDIRECT_BINDABLE_BIT_EXT) && !hasChain(pre.rp->desc, VK_STRUCTURE_TYPE_RENDER_PASS_MULTIVIEW_CREATE_INFO); auto& stages = stagesVecs.emplace_back(); @@ -337,7 +342,7 @@ VKAPI_ATTR VkResult VKAPI_CALL CreateGraphicsPipelines( // might be set to invalid pointer. 
pipe.needsColorBlend = colorAttachmentCount != 0u && - !pci.pRasterizationState->rasterizerDiscardEnable; + pci.pRasterizationState && !pci.pRasterizationState->rasterizerDiscardEnable; if(pipe.needsColorBlend) { dlg_assert(pci.pColorBlendState); if(pci.pColorBlendState) { @@ -717,15 +722,16 @@ VKAPI_ATTR VkResult VKAPI_CALL CreateRayTracingPipelinesKHR( nci.flags |= VK_PIPELINE_CREATE_ALLOW_DERIVATIVES_BIT; } + VkResult res; { ZoneScopedN("dispatch"); - auto res = dev.dispatch.CreateRayTracingPipelinesKHR(dev.handle, + res = dev.dispatch.CreateRayTracingPipelinesKHR(dev.handle, deferredOperation, pipelineCache, createInfoCount, ncis.data(), pAllocator, pPipelines); if(res != VK_SUCCESS && res != VK_OPERATION_NOT_DEFERRED_KHR && res != VK_OPERATION_DEFERRED_KHR && - res != VK_PIPELINE_COMPILE_REQUIRED_EXT) { + res != VK_PIPELINE_COMPILE_REQUIRED) { dlg_trace("CreateRayTracingPipelinesKHR returned {} ({})", vk::name(res), res); return res; } @@ -761,10 +767,6 @@ VKAPI_ATTR VkResult VKAPI_CALL CreateRayTracingPipelinesKHR( } dlg_assert(dev.rtProps.shaderGroupHandleSize); - pipe.groupHandles.resize(dev.rtProps.shaderGroupHandleSize * ci.groupCount); - VK_CHECK(dev.dispatch.GetRayTracingShaderGroupHandlesKHR(dev.handle, - pipe.handle, 0u, ci.groupCount, pipe.groupHandles.size(), - pipe.groupHandles.data())); for(auto i = 0u; i < ci.groupCount; ++i) { auto& src = ci.pGroups[i]; @@ -783,7 +785,25 @@ VKAPI_ATTR VkResult VKAPI_CALL CreateRayTracingPipelinesKHR( dev.pipes.mustEmplace(pPipelines[i], std::move(newPipePtr)); } - return VK_SUCCESS; + return res; +} + +void ensureGroupHandles(RayTracingPipeline& pipe) { + // TODO: check and do if VK_EXT_pipeline_library_group_handles feature is enabled + // is a check like this still needed? 
+ // if (!(ci.flags & VK_PIPELINE_CREATE_LIBRARY_BIT_KHR)) { + + auto& dev = *pipe.dev; + std::lock_guard lock(dev.mutex); + + if (!pipe.groupHandles.empty()) { + return; + } + + pipe.groupHandles.resize(dev.rtProps.shaderGroupHandleSize * pipe.groups.size()); + VK_CHECK(dev.dispatch.GetRayTracingShaderGroupHandlesKHR(dev.handle, + pipe.handle, 0u, pipe.groups.size(), pipe.groupHandles.size(), + pipe.groupHandles.data())); } VKAPI_ATTR VkResult VKAPI_CALL GetRayTracingCaptureReplayShaderGroupHandlesKHR( diff --git a/src/pipe.hpp b/src/pipe.hpp index 3bdaf7b..68d0fdb 100644 --- a/src/pipe.hpp +++ b/src/pipe.hpp @@ -136,6 +136,8 @@ struct RayTracingPipeline : Pipeline { std::unique_ptr exts; // copied pnext chain }; +void ensureGroupHandles(RayTracingPipeline& pipe); + // API VKAPI_ATTR VkResult VKAPI_CALL CreateGraphicsPipelines( VkDevice device, diff --git a/src/serialize/handles.cpp b/src/serialize/handles.cpp index bba469a..56bc054 100644 --- a/src/serialize/handles.cpp +++ b/src/serialize/handles.cpp @@ -15,6 +15,7 @@ #include #include #include +#include #include namespace vil { diff --git a/src/shader.cpp b/src/shader.cpp index d7bd561..114939a 100644 --- a/src/shader.cpp +++ b/src/shader.cpp @@ -799,11 +799,14 @@ VKAPI_ATTR VkResult VKAPI_CALL CreateShadersEXT( for(auto i = 0u; i < createInfoCount; ++i) { dlg_assert(pShaders[i]); - auto& mod = dev.shaderObjects.add(pShaders[i]); + auto modPtr = IntrusivePtr(new ShaderObject()); + auto& mod = *modPtr; mod.dev = &dev; mod.handle = pShaders[i]; mod.dsLayouts = std::move(layoutVecs[i]); + pShaders[i] = castDispatch(mod); + dev.shaderObjects.mustEmplace(pShaders[i], std::move(modPtr)); } return VK_SUCCESS; @@ -819,7 +822,7 @@ VKAPI_ATTR void VKAPI_CALL DestroyShaderEXT( auto shaderPtr = mustMoveUnset(device, shader); shaderPtr->dev->dispatch.DestroyShaderEXT(shaderPtr->dev->handle, - shaderPtr->handle, pAllocator); + shader, pAllocator); } VKAPI_ATTR VkResult VKAPI_CALL GetShaderBinaryDataEXT( diff --git
a/src/util/patch.cpp b/src/util/patch.cpp index 1fe9111..32d5a33 100644 --- a/src/util/patch.cpp +++ b/src/util/patch.cpp @@ -1425,6 +1425,8 @@ vku::Pipeline createPatchCopy(const RayTracingPipeline& src, groupMapping.ensure(dev, handleSize * src.groups.size() * 2, usage, {}, "patchedShaderTable"); + ensureGroupHandles(const_cast(src)); + // we sort the mappings by key lexicographical order // that makes patching in the shader later much more efficient // see shaderTable.comp