
Add tolerance parameter to events_from_annotations to #12321

Closed
rcmdnk opened this issue Dec 22, 2023 · 5 comments · Fixed by #12324

@rcmdnk
Contributor

rcmdnk commented Dec 22, 2023

Describe the new feature or enhancement

Add a tol parameter to events_from_annotations to avoid omitting epochs due to rounding errors.

Describe your proposed implementation

In the function named events_from_annotations, there is a section of code as follows:

for annot in annotations[event_sel]:
    annot_offset = annot["onset"] + annot["duration"]
    _onsets = np.arange(
        start=annot["onset"], stop=annot_offset, step=chunk_duration
    )
    good_events = annot_offset - _onsets >= chunk_duration
    if good_events.any():

However, if we consider values like

annot['onset']=32760.12
annot['duration']=30.0
chunk_duration=30.0

In theory, this annotation should yield one 30-second epoch, but due to rounding errors it results in:

>>> annot_offset = annot['onset'] + annot['duration']
>>> annot_offset - np.arange(annot['onset'], annot_offset, 30.0) - 30.0
array([-3.63797881e-12])

This causes good_events to contain False, and thus, this epoch is not captured.
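
For reference, here is a self-contained reproduction using only NumPy; the dict annot below is just a stand-in for a real annotation row, filled with the values from above:

import numpy as np

# Stand-in for a single annotation, using the values from the example above.
annot = {"onset": 32760.12, "duration": 30.0}
chunk_duration = 30.0

annot_offset = annot["onset"] + annot["duration"]
_onsets = np.arange(start=annot["onset"], stop=annot_offset, step=chunk_duration)

# The remaining length falls a few picoseconds short of chunk_duration,
# so the strict comparison rejects the only chunk.
print(annot_offset - _onsets - chunk_duration)   # roughly [-3.6e-12]
print(annot_offset - _onsets >= chunk_duration)  # [False]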

However, in reality, this interval's data should be fully treated as an epoch. Therefore, I'm considering introducing a tolerance and modifying the code like this:

good_events = annot_offset - _onsets >= chunk_duration - tol

By adding a small tol, we can avoid the situation mentioned above. As long as tol is sufficiently smaller than the sampling interval (1 / sfreq), it will not erroneously include intervals that are not supposed to be selected.
This is safe as long as use_rounding = True.

If use_rounding = False, the first sample of such an epoch could overlap with the previous epoch (with tol = 0, such an epoch is simply not created). But even in this case, I think it is better to keep the epoch.

By setting the default value of tol to 0, existing calls keep producing exactly the same results as they do now.
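
A minimal sketch of what the proposed check could look like, pulled out into a standalone function for illustration (the helper name _chunk_onsets_with_tol is made up here; only the comparison line is the actual proposal):

import numpy as np

def _chunk_onsets_with_tol(annot, chunk_duration, tol=0.0):
    # Hypothetical standalone version of the chunking step quoted above:
    # chunks whose remaining length is within tol of chunk_duration are kept,
    # instead of requiring a strict >= chunk_duration.
    annot_offset = annot["onset"] + annot["duration"]
    _onsets = np.arange(start=annot["onset"], stop=annot_offset, step=chunk_duration)
    good_events = annot_offset - _onsets >= chunk_duration - tol
    return _onsets[good_events]

# With tol = 0 the problematic chunk is dropped; a tiny tol keeps it.
annot = {"onset": 32760.12, "duration": 30.0}
print(_chunk_onsets_with_tol(annot, 30.0))            # []
print(_chunk_onsets_with_tol(annot, 30.0, tol=1e-8))  # [32760.12]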

Describe possible alternatives

Currently, I am using Annotations that are precisely segmented into 30-second intervals. By setting chunk_duration = 0, I can create indexes without needing to calculate the duration.

This approach ensures that all epochs are included in the output data.

However, sometimes the annotations have durations that are a multiple of the epoch length. In these cases, we need to modify the annotations before applying them to the Raw data, for example by splitting each long annotation into epoch-length pieces as in the sketch below.
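
A rough sketch of such a pre-processing step, assuming every long annotation spans a whole number of epochs (the helper name split_annotations is made up for illustration):

from mne import Annotations

def split_annotations(annot, epoch_length):
    # Hypothetical helper: split each annotation whose duration spans several
    # epochs into consecutive epoch-length annotations with the same description.
    onsets, durations, descriptions = [], [], []
    for onset, duration, desc in zip(annot.onset, annot.duration, annot.description):
        n_chunks = max(int(round(duration / epoch_length)), 1)
        for k in range(n_chunks):
            onsets.append(onset + k * epoch_length)
            durations.append(epoch_length)
            descriptions.append(desc)
    return Annotations(onsets, durations, descriptions, orig_time=annot.orig_time)

# Example: the 60 s annotation is split into two 30 s annotations.
annot = Annotations([0, 30, 60, 120], [30.0, 30.0, 60.0, 30.0], ["W", "1", "2", "3"])
print(split_annotations(annot, 30.0).onset)  # [  0.  30.  60.  90. 120.]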

Additional context

No response

@rcmdnk rcmdnk added the ENH label Dec 22, 2023
@drammock
Member

Can you look at PR #12311 to see if it makes this no longer a problem?

@rcmdnk
Contributor Author

rcmdnk commented Dec 23, 2023

I like the #12311 update, but the problem here is different from what that PR addresses.

#12311 is useful if the annotations are already segmented precisely into epoch-length pieces,
or if the annotations mark points and you just want the data around each onset.

But it cannot be used if the annotations have durations of N times the epoch length (N >= 1)
and users want to divide them into epoch-sized units.
For example, if I have sleep annotations like:

onset  duration  stage
0      30        W
30     30        1
60     60        2
120    30        3

In this case I want five epochs: 30 s (W), 30 s (1), 30 s (2), 30 s (2), 30 s (3).

This is possible with the existing events_from_annotations + chunk_duration approach:

from mne import create_info, Epochs
from mne.io import RawArray
from mne.annotations import Annotations, events_from_annotations, _handle_meas_date
import numpy as np

info = create_info(ch_names=1, sfreq=100.0)
raw = RawArray(data=np.empty((1, 15000)), info=info, first_samp=0)
meas_date = _handle_meas_date(0)
with raw.info._unlock(check_after=True):
    raw.info["meas_date"] = meas_date
event_id = {"0": 0, "1": 1, "2": 2, "3": 3}
annot = Annotations([0, 30, 60, 120], [30.0, 30.0, 60.0, 30], ["0", "1", "2", "3"], 0)
raw.set_annotations(annot)

events, _ = events_from_annotations(
    raw,
    event_id=event_id,
    chunk_duration=30.0,
    use_rounding=True,
)

epochs = Epochs(
    raw=raw,
    events=events,
    event_id=event_id,
)
print(epochs.events)
[[    0     0     0]
 [ 3000     0     1]
 [ 6000     0     2]
 [ 9000     0     2]
 [12000     0     3]]

So chunk_duration in events_from_annotations is necessary for this use case.

But if I make Epochs from raw directly with the #12311 code:

from mne import create_info, Epochs
from mne.io import RawArray
from mne.annotations import Annotations, events_from_annotations, _handle_meas_date
import numpy as np

info = create_info(ch_names=1, sfreq=100.0)
raw = RawArray(data=np.empty((1, 15000)), info=info, first_samp=0)
meas_date = _handle_meas_date(0)
with raw.info._unlock(check_after=True):
    raw.info["meas_date"] = meas_date
event_id = {"0": 0, "1": 1, "2": 2, "3": 3}
annot = Annotations([0, 30, 60, 120], [30.0, 30.0, 60.0, 30], ["0", "1", "2", "3"], 0)
raw.set_annotations(annot)

epochs = Epochs(raw)
print(epochs.events)
[[    0     0     1]
 [ 3000     0     2]
 [ 6000     0     3]
 [12000     0     4]]

so the 4th desired epoch (the second 30 s chunk of the 60 s annotation, at sample 9000) is skipped.

@rcmdnk
Contributor Author

rcmdnk commented Dec 25, 2023

I'm ready to make a PR if it seems useful.

@agramfort
Member

I think it's a useful PR. I would even consider making the default tol nonzero, e.g. some eps * sfreq. I guess it would not change the current behavior and would fix your issue.

Thanks @rcmdnk for the meticulous investigation!

@rcmdnk
Contributor Author

rcmdnk commented Dec 25, 2023

@agramfort
Thanks for the comment.
I made PR #12324.
In the PR, the default value is set to zero, and I added a comment about the default value for discussion.
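
For completeness, once #12324 is merged, the sleep-staging example from the earlier comment would presumably just gain the new argument; this is only a sketch, assuming the final signature accepts a keyword named tol:

import numpy as np
from mne import create_info
from mne.io import RawArray
from mne.annotations import Annotations, events_from_annotations

# Same toy setup as the earlier example, but without an explicit meas_date
# (annotations use orig_time=None, so onsets are relative to the start of raw).
raw = RawArray(np.empty((1, 15000)), create_info(ch_names=1, sfreq=100.0))
raw.set_annotations(Annotations([0, 30, 60, 120], [30.0, 30.0, 60.0, 30.0], ["0", "1", "2", "3"]))
event_id = {"0": 0, "1": 1, "2": 2, "3": 3}

events, event_id_map = events_from_annotations(
    raw,
    event_id=event_id,
    chunk_duration=30.0,
    use_rounding=True,
    tol=1e-8,  # the proposed parameter; keep it well below 1 / sfreq
)
print(events[:, 0])  # expected: 0, 3000, 6000, 9000, 12000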
