RKNPU 0.9.2 Driver has Spinlock Recursion Bug for RK3588 #329

Open
muhammedKocoglu opened this issue May 14, 2024 · 0 comments

Hello,

We are developing on an RK3588 and tested a yolov7 model with RKNN API version 1.5.2 on kernel version 5.10.
With RKNPU driver 0.9.2, the kernel crashes (BUG: spinlock recursion on CPU#0, see the attached log) when NPU core selection is left in auto mode.

RK3588-SpinlockRecursion.log

The usage scenario is below.

Our scenario runs 4 threads with 4 different rknn contexts, and the device crashes in the default auto core selection mode. Our tests:

  • Calling rknn_set_core_mask(ctx, RKNN_NPU_CORE_0_1_2) in all threads led to a kernel crash.
  • Calling rknn_set_core_mask(ctx, RKNN_NPU_CORE_0) in the 1st and 2nd threads, rknn_set_core_mask(ctx, RKNN_NPU_CORE_1) in the 3rd thread, and rknn_set_core_mask(ctx, RKNN_NPU_CORE_2) in the 4th thread also led to a kernel crash.
  • Calling rknn_set_core_mask(ctx, RKNN_NPU_CORE_0) in all threads worked, but the inference time increased notably.

Downgrading the RKNPU driver to 0.9.1 solves the problem. Please fix the current version, 0.9.2.

Regards,
Muhammed
muhammed@dtsis.com

BUG: spinlock recursion on CPU#0, InferencerDTSIS/812
[ 275.386054] lock: 0xffffff81054490b8, .magic: dead4ead, .owner: InferencerDTSIS/812, .owner_cpu: 0
[ 275.386063] CPU: 0 PID: 812 Comm: InferencerDTSIS Tainted: G O 5.10.160 #1
[ 275.386069] Hardware name: FriendlyElec NanoPi R6C (DT)
[ 275.386075] Call trace:
[ 275.386086] dump_backtrace+0x0/0x1d0
[ 275.386093] show_stack+0x1c/0x24
[ 275.386101] dump_stack_lvl+0xc8/0xec
[ 275.386108] dump_stack+0x14/0x50
[ 275.386115] spin_dump+0x98/0xa8
[ 275.386123] do_raw_spin_lock+0x114/0x11c
[ 275.386130] _raw_spin_lock+0x14/0x20
[ 275.386137] rknpu_job_commit+0x1d8/0x2d0
[ 275.386143] rknpu_job_next.part.0+0xdc/0x124
[ 275.386150] rknpu_irq_handler.constprop.0+0x264/0x360
[ 275.386156] rknpu_core1_irq_handler+0x1c/0x2c
[ 275.386164] __handle_irq_event_percpu+0x60/0x220
[ 275.386170] handle_irq_event+0x68/0x150
[ 275.386177] handle_fasteoi_irq+0xac/0x1d0
[ 275.386183] __handle_domain_irq+0x78/0xe0
[ 275.386190] gic_handle_irq+0xc0/0x144
[ 275.386196] el1_irq+0xcc/0x180
[ 275.386202] rknpu_job_subcore_commit_pc.isra.0+0x110/0x214
[ 275.386208] rknpu_job_commit+0xd0/0x2d0
[ 275.386214] rknpu_job_next.part.0+0xdc/0x124
[ 275.386220] rknpu_job_schedule+0x22c/0x27c
[ 275.386226] rknpu_submit_ioctl+0x244/0x85c
[ 275.386234] __rknpu_submit_ioctl+0x44/0x84
[ 275.386242] drm_ioctl_kernel+0xb4/0x100
[ 275.386249] drm_ioctl+0x28c/0x520
[ 275.386257] __arm64_sys_ioctl+0xa8/0xf0
[ 275.386264] el0_svc_common.constprop.0+0x64/0x140
[ 275.386271] do_el0_svc+0x24/0x30
[ 275.386278] el0_svc+0x1c/0x2c
[ 275.386285] el0_sync_handler+0x9c/0x120
[ 275.386291] el0_sync+0x15c/0x180
[ 275.734858] rockchip-spi feb20000.spi: RK SPI transfer timed out
[ 275.734885] rk806 spi2.0: SPI transfer failed: -110
[ 275.734899] rockchip-spi feb20000.spi: state=0
[ 275.734908] rockchip-spi feb20000.spi: tx_left=0
[ 275.734916] rockchip-spi feb20000.spi: rx_left=3
[ 275.734930] regs 00000000: 00002c01 00000002 00000001 00000000
[ 275.734942] regs 00000010: 000000c6 00000000 00000002 00000000
[ 275.734952] regs 00000020: 00000003 00000044 00000000 00000010
[ 275.734963] regs 00000030: 00000010 00000091 00000091 00000000
[ 275.734973] regs 00000040: 0000001f 00000000 00110002
[ 275.734989] spi_master spi2: failed to transfer one message from queue
[ 275.938203] rockchip-spi feb20000.spi: RK SPI transfer timed out
[ 275.938225] rk806 spi2.0: SPI transfer failed: -110
[ 275.938238] rockchip-spi feb20000.spi: state=0
[ 275.938247] rockchip-spi feb20000.spi: tx_left=0
[ 275.938256] rockchip-spi feb20000.spi: rx_left=4
[ 275.938269] regs 00000000: 00002c01 00000003 00000001 00000000
[ 275.938281] regs 00000010: 000000c6 00000000 00000003 00000000
[ 275.938291] regs 00000020: 00000004 00000044 00000000 00000010
[ 275.938301] regs 00000030: 00000010 00000091 00000091 00000000
[ 275.938312] regs 00000040: 0000001f 00000003 00110002
[ 275.938327] spi_master spi2: failed to transfer one message from queue
[ 276.141517] rockchip-spi feb20000.spi: RK SPI transfer timed out
[ 276.141545] rk806 spi2.0: SPI transfer failed: -110
[ 276.141557] rockchip-spi feb20000.spi: state=0
[ 276.141566] rockchip-spi feb20000.spi: tx_left=0
[ 276.141575] rockchip-spi feb20000.spi: rx_left=3
[ 276.141589] regs 00000000: 00002c01 00000002 00000001 00000000
[ 276.141601] regs 00000010: 000000c6 00000000 00000002 00000000
[ 276.141611] regs 00000020: 00000003 00000044 00000000 00000010
[ 276.141621] regs 00000030: 00000010 00000091 00000091 00000000
[ 276.141631] regs 00000040: 0000001f 00000000 00110002
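
In the call trace above, rknpu_job_commit is interrupted (el1_irq) while running rknpu_job_subcore_commit_pc, and the core 1 interrupt handler then goes through rknpu_job_next into rknpu_job_commit again and stops at _raw_spin_lock on a lock whose owner is already InferencerDTSIS/812 on CPU 0. This appears to be a lock taken in the submit path without masking interrupts and then taken again from the interrupt handler on the same CPU. The following is a minimal user-space sketch of that pattern, not the driver code: every name in it (model_lock, model_submit, model_irq, irqs_masked) is hypothetical, and the lock tracks only the owning CPU as a simplification of the kernel's debug check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a debug spinlock: it records only which CPU
 * holds the lock. This is a model for illustration, not kernel code. */
#define NO_OWNER (-1)

struct model_lock {
    int owner_cpu;              /* CPU holding the lock, or NO_OWNER */
};

enum lock_result { LOCK_OK, LOCK_RECURSION };

/* Taking a lock that the same CPU already holds is reported as recursion
 * (a real non-debug spinlock would spin forever instead). */
static enum lock_result model_lock_acquire(struct model_lock *l, int cpu)
{
    if (l->owner_cpu == cpu)
        return LOCK_RECURSION;
    assert(l->owner_cpu == NO_OWNER);   /* single-threaded model: no contention */
    l->owner_cpu = cpu;
    return LOCK_OK;
}

static void model_lock_release(struct model_lock *l)
{
    l->owner_cpu = NO_OWNER;
}

/* Models the job-done interrupt: it re-enters the commit path and
 * therefore needs the same lock as the submit path. */
static enum lock_result model_irq(struct model_lock *l, int cpu)
{
    enum lock_result r = model_lock_acquire(l, cpu);
    if (r == LOCK_OK)
        model_lock_release(l);
    return r;
}

/* Models the submit path. With irqs_masked == false the interrupt fires
 * on the same CPU while the lock is still held; with irqs_masked == true
 * it stays pending until the lock has been dropped. Returns what the
 * interrupt handler observed. */
enum lock_result model_submit(struct model_lock *l, int cpu, bool irqs_masked)
{
    enum lock_result irq_result;

    if (model_lock_acquire(l, cpu) != LOCK_OK)
        return LOCK_RECURSION;

    if (!irqs_masked) {
        irq_result = model_irq(l, cpu);  /* handler runs inside the critical section */
        model_lock_release(l);
        return irq_result;
    }

    model_lock_release(l);
    return model_irq(l, cpu);            /* handler runs after the unlock */
}
```

Calling model_submit(&lock, 0, false) on a lock initialised to NO_OWNER returns LOCK_RECURSION, while model_submit(&lock, 0, true) returns LOCK_OK. In kernel code, the usual way to get the second behaviour for a lock shared with an interrupt handler is spin_lock_irqsave()/spin_unlock_irqrestore(). Whether that is the right fix for the 0.9.2 driver is an assumption on our side, since we have not compared the 0.9.1 and 0.9.2 sources.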
