
Question about panic of TA2 in TA1-TA2 connection. #6701

Closed
ydonghyuk opened this issue Feb 21, 2024 · 10 comments

@ydonghyuk

ydonghyuk commented Feb 21, 2024

  • Testing process
    1/ TA1 -> TA2 open session (TA_FLAGS: single instance, multi session flag)
    2/ Force TEE_Panic on TA2
    3/ TA1->TA2 invoke command (TEE_InvokeTACommand)

  • Result
    TA1 gets no response; the TEE_InvokeTACommand call never returns.

  • Additional test
    The same TA panic scenario, but in a CA-TA connection through the same process

  • Result
    When the CA invokes a command on the TA, TEE_ERROR_TARGET_DEAD is returned immediately.

  • Inquiry
    Is there any way for TA1 to know if TA2 is experiencing a panic or abnormal termination?
    Also, in this situation, would it be meaningful to use the cancellationRequestTimeout parameter of TEE_InvokeTACommand?

@etienne-lms
Contributor

Testing process
1/ TA1 -> TA2 open session (TA_FLAGS: single instance, multi session flag)
2/ Force TEE_Panic on TA2
3/ TA1->TA2 invoke command (TEE_InvokeTACommand)

I would guess that at stage 3, TEE_InvokeTACommand() (from TA1) returns TEE_ERROR_TARGET_DEAD. That is how TA1 knows TA2 is dead.

Also, in this situation, would it be meaningful to use the cancellationRequestTimeout parameter of TEE_InvokeTACommand

There is no API function for a TA to request cancellation of a request it has sent to another TA. Actually, there can't be, since the client TA is busy waiting until its request to the other TA completes.

@ydonghyuk
Author

ydonghyuk commented Feb 21, 2024

Before step 3, TA2 has already panicked.
So I also expected TEE_InvokeTACommand in step 3 to return TEE_ERROR_TARGET_DEAD.
However, it gets stuck in an infinite wait, with the core abort log shown below.

E/TC:1 0 Core data-abort at address 0x10 (translation fault)
E/TC:1 0 esr 0x96000006 ttbr0 0x400000e1c4020 ttbr1 0x00000000 cidr 0x0
E/TC:1 0 cpu #1 cpsr 0x60000004
E/TC:1 0 x0 0000000000000000 x1 0000000000000000
E/TC:1 0 x2 0000000000000000 x3 0000000000000000
E/TC:1 0 x4 0000000000000000 x5 00000000ad37bef8
E/TC:1 0 x6 0000000000000000 x7 000000004007ff80
E/TC:1 0 x8 00000000ad397638 x9 00000000ad3dee30
E/TC:1 0 x10 00000000600002bd x11 0000000000000000
E/TC:1 0 x12 00000000300003eb x13 000000004001bde3
E/TC:1 0 x14 0000000020000000 x15 00000000dfffffff
E/TC:1 0 x16 00000000ad33eaf4 x17 00000000900003eb
E/TC:1 0 x18 000000001000000a x19 0000000000000010
E/TC:1 0 x20 00000000ad3dedc8 x21 000000004001c298
E/TC:1 0 x22 00000000ad3ded60 x23 00000000ad396000
E/TC:1 0 x24 00000000ad3ded00 x25 00000000ad3c3ba0
E/TC:1 0 x26 00000000ad3c3b50 x27 0000000000000065
E/TC:1 0 x28 00000000ad3c2310 x29 00000000ad3debe0
E/TC:1 0 x30 00000000ad33e268 elr 00000000ad332310
E/TC:1 0 sp_el0 00000000ad3debe0
E/TC:1 0 TEE load address @ 0xad316000
E/TC:1 0 Call stack:
E/TC:1 0 0xad332310
E/TC:1 0 0xad33ebe4
E/TC:1 0 0xad31b874
E/TC:1 0 0xad31b5c4
E/TC:1 0 0xad31a280
E/TC:1 0 Panic 'unhandled pageable abort' at core/arch/arm/kernel/abort.c:580 <abort_handler>
E/TC:1 0 TEE load address @ 0xad316000
E/TC:1 0 Call stack:
E/TC:1 0 0xad31e5c4
E/TC:1 0 0xad32bdc0
E/TC:1 0 0xad31d070
E/TC:1 0 0xad31a3b0

In fact, in the CA-TA case, TEEC_InvokeCommand returns TEEC_ERROR_TARGET_DEAD as expected.

Please help if possible.
Thank you.

@etienne-lms
Contributor

etienne-lms commented Feb 21, 2024

step 1: TA1 invokes TA2 by calling TEE_InvokeTACommand().
step 2: TA2 panics.
end of step 2: I expect TA1 to get return code TEE_ERROR_TARGET_DEAD from TEE_InvokeTACommand().

Is that the scenario you are testing?

If so, TA1 is not expected to call TEE_InvokeTACommand() again in step 3, since the related session was closed when TA2 died. That said, the TEE core should not panic in such a case; that is not good, and I'm a bit surprised it does. I'll try this with QEMU.

(edited: added a few missing words)

@etienne-lms
Contributor

I've set up a quick test with the OP-TEE qemu_virt environment, playing and hacking with xtest regression_1016 and the os_test TA, and I see that OP-TEE behaves as expected:

  1. xtest opens a session to TA1 and invokes TA1 that in turn opens a session to TA2
    (note: xtest is of course a CA: a non-secure client application)
    (note: TA1 keeps its session to TA2 open so that it can use it later)
    (note: TA2 is single instance/multi session, so that it can be invoked from both xtest and TA1)
  2. xtest invokes TA1 that invokes TA2 (using the already opened session) for some dummy processing: all good.
  3. xtest invokes TA2 to make the TA2 instance panic: TA2 panics.
  4. xtest invokes TA1 that invokes TA2 (using the already opened session): TA1 is informed that TA2 is dead.
    (TEE core behaves as expected and does not panic, only TA2 has panicked where expected)

I've tested this using the latest OP-TEE (4.1.0), but I strongly suspect this behavior can be reproduced on OP-TEE tags 3.1x.0 and 3.2x.0.

@etienne-lms
Contributor

My bad! I've run my test on OP-TEE 3.19.0 and... indeed, OP-TEE OS crashes when TA1 invokes TA2 after the CA has made TA2 panic. The TEE core trace looks a lot like the one you posted:

E/TC:0 0 
E/TC:0 0 Core data-abort at address 0x10 (translation fault)
E/TC:0 0  fsr 0x00000005  ttbr0 0x0e1b786a  ttbr1 0x0e1b006a  cidr 0x4
E/TC:0 0  cpu #0          cpsr 0xa0000133
E/TC:0 0  r0 0x00000000      r4 0x00000000    r8 0x93868c90   r12 0x00272104
E/TC:0 0  r1 0x00000000      r5 0x00116e08    r9 0x937f4fdb    sp 0x93868b20
E/TC:0 0  r2 0xffffffff      r6 0x00000001   r10 0x00000000    lr 0x937f489b
E/TC:0 0  r3 0x00000000      r7 0x93868b20   r11 0x00000000    pc 0x937be5e0
E/TC:0 0 TEE load address @ 0x937aa000
E/TC:0 0 Call stack:
E/TC:0 0  0x937be5e0
E/TC:0 0  0x937f489b
E/TC:0 0  0x937f50eb
E/TC:0 0  0x937b1a48

I have not bisected to find where this got fixed between the 3.19.0 and 4.1.0 tags.

@etienne-lms
Contributor

etienne-lms commented Feb 21, 2024

After some investigation, the issue you face is related to #6226 (that issue deals with a deadlock case and mentions the core crash you experienced). The issue was addressed through #6281 (commit c10e3fa and neighbors) with an added fixup (#6378 / commit 0a75d40). A regression test was added to cover the case; see OP-TEE/optee_test#693.

(edited)

@ydonghyuk
Author

@etienne-lms

Thank you so much for your help.
I will check and comment.

@ydonghyuk
Author

ydonghyuk commented Feb 28, 2024

@etienne-lms

I applied the 4 commits below. Is this correct?
0a75d40 core: fix data abort during ftrace
c10e3fa core: fix race in handling TA panic
5a5d117 core: add release_state to struct ts_ops
1a60437 core: vm_info_final(): clear vm_info.asid only

After applying the patches, I confirmed the following behavior:
when TA1 calls TEE_InvokeTACommand, TEE_ERROR_TARGET_DEAD is returned and the call no longer gets stuck.

@etienne-lms
Contributor

Yes, I think you got it right. There may be a conflict to resolve, but IMHO it's quite straightforward to sort out.

@ydonghyuk
Author

I applied the above 4 patches on OP-TEE 3.22; they applied cleanly, without any conflicts.
Thank you so much for your help.
