Sporadic ApertureScatterguard TimeoutErrors reported #549

Open
rtuck99 opened this issue Oct 3, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@rtuck99
Contributor

rtuck99 commented Oct 3, 2024

There have been two occasions where ApertureScatterguard has reported a TimeoutError on i03:

occurrence-2024-10-03.txt
occurrence-2024-10-02.txt

From the stack trace, this line in aperturescatterguard.py is failing:
await self._safe_move_within_datacollection_range(value.location)

_safe_move_within_datacollection_range() in turn fails here:
ap_z_in_position = await self.aperture.z.motor_done_move.get_value()

However, from the logs it seems that the attenuator set-transmission operation, although it reports a timeout error, had no elapsed time. Instead, it seems that the concurrent aperture scatterguard movement is cancelled, and this causes a TimeoutError to be raised.

However, it's not obvious why the aperture scatterguard set operation would be cancelled.

Steps To Reproduce

This issue occurs sporadically, possibly about once per day.

Acceptance Criteria

  • Specific criteria that will be used to judge if the issue is fixed
    Issue does not recur
@rtuck99 rtuck99 added the bug Something isn't working label Oct 3, 2024
@DominicOram
Contributor

@coretl any idea why we might be seeing this? Seems to be an issue with the read getting cancelled sporadically on BL03I-MO-MAPT-01:Z.DMOV

@DominicOram
Contributor

We removed the check on BL03I-MO-MAPT-01:Z.DMOV, but this error then occurred with the next PV read during the aperture set, BL03I-MO-MAPT-01:Z.RBV

@DominicOram
Contributor

This error is also suspiciously similar to that in DiamondLightSource/dodal#791. Maybe the context branch in aioca will fix this too?

@coretl
Contributor

coretl commented Oct 7, 2024

The Cancelled error is a red herring: aioca.caget is being called with timeout=None, then wrapped with asyncio.wait_for(timeout=10) inside ophyd-async to add a default timeout to it.
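
As a rough illustration of why a cancellation shows up in the trace, here is a minimal sketch using only asyncio. It is not the ophyd-async code: slow_read is a hypothetical stand-in for a read that has no timeout of its own (like aioca.caget with timeout=None), and the short timeout is chosen only so the demo finishes quickly.

```python
import asyncio


async def slow_read(log):
    # Hypothetical stand-in for a channel access read with no timeout of its own.
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        # wait_for cancels the inner awaitable when the deadline passes,
        # so the innermost frame of the trace sees a cancellation.
        log.append("inner: CancelledError")
        raise


async def read_with_default_timeout(log, timeout=0.05):
    # The outer wrapper supplies the timeout, as described in the comment above.
    try:
        return await asyncio.wait_for(slow_read(log), timeout=timeout)
    except asyncio.TimeoutError:
        # The caller only ever sees a TimeoutError.
        log.append("outer: TimeoutError")
        raise


def demo():
    log = []
    try:
        asyncio.run(read_with_default_timeout(log))
    except asyncio.TimeoutError:
        pass
    return log
```

Calling demo() records the cancellation inside the read first and the TimeoutError at the caller second, which is consistent with a cancelled read being a symptom of the timeout rather than a separate fault.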

It is suspicious that 2 PVs on the same motor are timing out. Has the motor just been moving? Was it using caput-callback? Is the motor ever moved from ophyd as well as ophyd-async? What happens if you check a completely unrelated PV just before BL03I-MO-MAPT-01:Z.{something}?
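
One way to try the last suggestion is sketched below, under stated assumptions: read is any async callable that takes a PV name, the helper names timed_read and probe are made up for this example, and the unrelated PV would have to be chosen on the beamline. With aioca, read could plausibly be something like lambda pv: caget(pv, timeout=None), but that wiring is an assumption and is not shown in the block.

```python
import asyncio
import time


async def timed_read(read, pv, timeout):
    # Time a single read and record whether it completed or timed out.
    start = time.monotonic()
    try:
        value = await asyncio.wait_for(read(pv), timeout=timeout)
        outcome = "ok"
    except asyncio.TimeoutError:
        value, outcome = None, "timeout"
    return {
        "pv": pv,
        "outcome": outcome,
        "value": value,
        "elapsed": time.monotonic() - start,
    }


async def probe(read, pvs, timeout=10.0):
    # Read the PVs one after another, in the order given, so an unrelated PV
    # can be checked immediately before the suspect one.
    return [await timed_read(read, pv, timeout) for pv in pvs]
```

Logging the elapsed field for each read would also show whether a reported timeout really waited the full period or returned almost immediately, which is the discrepancy noted in the original report.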

Projects
Status: Backlog
3 participants