Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix occasional GPU crash when a smaller descriptor set replaces a larger one. #2355

Merged
merged 1 commit into from
Sep 28, 2024

Conversation

billhollings
Copy link
Contributor

If a descriptor set with fewer than 64 descriptors replaced one with more, the resources it contained were not made resident to the GPU, due to MVKBitArray thrashing incorrectly when transitioning between static and dynamic memory allocations.

  • Overhaul MVKBitArray design:
    • Never downsize memory allocation unless reset() is called.
    • No longer support static allocation for smaller sizes, to avoid thrashing between dynamic and static allocs when frequently resizing up and down.
    • Use realloc() instead of malloc/copy to improve performance.
    • Simplify tracking of partially and fully disabled sections.
    • Rename setXX() and clearXX() functions to enableXX() & disableXX().
    • Rename several internal functions.
    • getIndexOfFirstEnabledBit() & enumerateEnabledBits() no longer have an option to disable the bit.
  • MVKDescriptorPool track highest descriptor set allocated, instead of querying MVKBitArray.

Fixes part of issue #2271.

…ger one.

If a descriptor set with fewer than 64 descriptors replaced one with more,
the resources it contained were not made resident to the GPU, due to
MVKBitArray thrashing incorrectly when transitioning between static and
dynamic memory allocations.

- Overhaul MVKBitArray design:
  - Never downsize memory allocation unless reset() is called.
  - No longer support static allocation for smaller sizes, to avoid thrashing
    between dynamic and static allocs when frequently resizing up and down.
  - Use realloc() instead of malloc/copy to improve performance.
  - Simplify tracking of partially and fully disabled sections.
  - Rename setXX() and clearXX() functions to enableXX() & disableXX().
  - Rename several internal functions.
  - getIndexOfFirstEnabledBit() & enumerateEnabledBits()
	no longer have an option to disable the bit.
- MVKDescriptorPool track highest descriptor set allocated,
  instead of querying MVKBitArray.
@billhollings billhollings merged commit 2307b08 into KhronosGroup:main Sep 28, 2024
6 checks passed
@billhollings billhollings deleted the mvkbitarray-redesign branch September 28, 2024 18:08
@giomasce
Copy link
Contributor

Unfortunately it seems that this PR introduced a regression in MVKBitArray. Specifically, running test test_map_placed_resources() from tests/d3d12.c from current vkd3d crashes when extension VK_KHR_push_descriptor is disabled (but VK_EXT_descriptor_indexing is enabled). The crash doesn't happen if I rollback MoltenVK to this PR's parent commit.

I could capture this stack trace at the crash

(lldb) bt
* thread #1, stop reason = ESR_EC_DABORT_EL0 (fault address: 0x0)
  * frame #0: 0x000000010280c044 libMoltenVK.dylib`MVKBitArray::getIndexOfFirstEnabledBit(this=0x0000000129115fc0, startIndex=0) at MVKBitArray.h:107:83
    frame #1: 0x0000000102804724 libMoltenVK.dylib`MVKDescriptorTypePool<MVKUniformBufferDescriptor>::allocateDescriptor(this=0x0000000129115fa0, descType=VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER, pMVKDesc=0x000000016fc42cb0, dynamicAllocation=0x000000016fc42caf, pool=0x0000000129115e00) at MVKDescriptorSet.mm:637:38
    frame #2: 0x0000000102802fb4 libMoltenVK.dylib`MVKDescriptorPool::allocateDescriptor(this=0x0000000129115e00, descriptorType=VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER, pMVKDesc=0x000000016fc42cb0, dynamicAllocation=0x000000016fc42caf) at MVKDescriptorSet.mm:850:37
    frame #3: 0x0000000102802c44 libMoltenVK.dylib`MVKDescriptorSet::allocate(this=0x0000000118048000, layout=0x0000600002f046c0, variableDescriptorCount=0, mtlArgBufferOffset=0, mtlArgEnc=0x0000000000000000) at MVKDescriptorSet.mm:556:34
    frame #4: 0x000000010281abd4 libMoltenVK.dylib`MVKDescriptorPool::allocateDescriptorSet(MVKDescriptorSetLayout*, unsigned int, VkDescriptorSet_T**)::$_2::operator()(this=0x0000600003911098, dsIdx=0) const at MVKDescriptorSet.mm:777:18
    frame #5: 0x000000010281aa08 libMoltenVK.dylib`decltype(std::declval<MVKDescriptorPool::allocateDescriptorSet(MVKDescriptorSetLayout*, unsigned int, VkDescriptorSet_T**)::$_2&>()(std::declval<unsigned long>())) std::__1::__invoke[abi:ue170006]<MVKDescriptorPool::allocateDescriptorSet(MVKDescriptorSetLayout*, unsigned int, VkDescriptorSet_T**)::$_2&, unsigned long>(__f=0x0000600003911098, __args=0x000000016fc42e50) at invoke.h:340:25
    frame #6: 0x000000010281a9b0 libMoltenVK.dylib`bool std::__1::__invoke_void_return_wrapper<bool, false>::__call[abi:ue170006]<MVKDescriptorPool::allocateDescriptorSet(MVKDescriptorSetLayout*, unsigned int, VkDescriptorSet_T**)::$_2&, unsigned long>(__args=0x0000600003911098, __args=0x000000016fc42e50) at invoke.h:407:12
    frame #7: 0x000000010281a980 libMoltenVK.dylib`std::__1::__function::__alloc_func<MVKDescriptorPool::allocateDescriptorSet(MVKDescriptorSetLayout*, unsigned int, VkDescriptorSet_T**)::$_2, std::__1::allocator<MVKDescriptorPool::allocateDescriptorSet(MVKDescriptorSetLayout*, unsigned int, VkDescriptorSet_T**)::$_2>, bool (unsigned long)>::operator()[abi:ue170006](this=0x0000600003911098, __arg=0x000000016fc42e50) at function.h:193:16
    frame #8: 0x0000000102819954 libMoltenVK.dylib`std::__1::__function::__func<MVKDescriptorPool::allocateDescriptorSet(MVKDescriptorSetLayout*, unsigned int, VkDescriptorSet_T**)::$_2, std::__1::allocator<MVKDescriptorPool::allocateDescriptorSet(MVKDescriptorSetLayout*, unsigned int, VkDescriptorSet_T**)::$_2>, bool (unsigned long)>::operator()(this=0x0000600003911090, __arg=0x000000016fc42e50) at function.h:364:12
    frame #9: 0x000000010280c198 libMoltenVK.dylib`std::__1::__function::__value_func<bool (unsigned long)>::operator()[abi:ue170006](this=0x000000016fc42f98, __args=0x000000016fc42e50) const at function.h:518:16
    frame #10: 0x000000010280c0c0 libMoltenVK.dylib`std::__1::function<bool (unsigned long)>::operator()(this=0x000000016fc42f98, __arg=0) const at function.h:1169:12
    frame #11: 0x0000000102804170 libMoltenVK.dylib`MVKBitArray::enumerateEnabledBits(this=0x0000000129115e40, func=function<bool (unsigned long)> @ 0x000000016fc42f98) at MVKBitArray.h:125:10
    frame #12: 0x0000000102803fa8 libMoltenVK.dylib`MVKDescriptorPool::allocateDescriptorSet(this=0x0000000129115e00, mvkDSL=0x0000600002f046c0, variableDescriptorCount=0, pVKDS=0x000000016fc430d0) at MVKDescriptorSet.mm:749:30
    frame #13: 0x0000000102803d34 libMoltenVK.dylib`MVKDescriptorPool::allocateDescriptorSets(this=0x0000000129115e00, pAllocateInfo=0x000000016fc430d8, pDescriptorSets=0x000000016fc430d0) at MVKDescriptorSet.mm:713:21
    frame #14: 0x000000010284dad4 libMoltenVK.dylib`vkAllocateDescriptorSets(device=0x0000000128835418, pAllocateInfo=0x000000016fc430d8, pDescriptorSets=0x000000016fc430d0) at vulkan.mm:1235:25
    frame #15: 0x000000010073f6fc libvkd3d.1.dylib`d3d12_command_allocator_allocate_descriptor_set + 156
    frame #16: 0x000000010073f208 libvkd3d.1.dylib`d3d12_command_list_prepare_descriptors + 152
    frame #17: 0x0000000100741040 libvkd3d.1.dylib`d3d12_command_list_set_root_descriptor + 476
    frame #18: 0x00000001001bfaf4 d3d12`vkd3d_test_main + 119964
    frame #19: 0x00000001001a25cc d3d12`main + 280
    frame #20: 0x000000018ac6f154 dyld`start + 2476

I don't know whether this is useful. Tomorrow I can try to reduce the crashing program to a pure Vulkan MWE.

@giomasce
Copy link
Contributor

Ok, I debugged it directly and submitted #2381 to fix this.

@billhollings
Copy link
Contributor Author

Ok, I debugged it directly and submitted #2381 to fix this.

Thanks for your debugging effort, and for the fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants