
Job in RECOVERING_JOBS group does not appear to respect DisallowConcurrentExecution annotation #1097

Open
luke-marrs opened this issue Jan 31, 2024 · 5 comments
Labels
is:enhancement Enhancement to an existing feature needs:review Needs review / investigation

Comments

@luke-marrs

Quartz version: 2.3.2

I'm not sure whether this is a bug: I have a job that is annotated with @DisallowConcurrentExecution, but in one particular scenario I found that execution of that job was attempted concurrently.

Here's my scenario:

  1. We have Quartz set up on three nodes with a thread pool of two worker threads per node. At this point I don't think the multi-node setup contributed to this issue.
  2. During a particular event, over 200 triggers fired for jobs that all had the same job group but different job names. Since each job instance takes multiple minutes, the vast majority of the jobs started misfiring every minute.
  3. For some reason (not yet sure why), one of the job instances (name: 17) needed to be recovered. However, it also fired "normally" after misfiring for a few minutes on a different node.
  4. So job 17 started in "myGroup" and then also started in the "RECOVERING_JOBS" group during the same time period. From a look at the code in the JobStoreSupport class that checks isConcurrentExectionDisallowed, it appears that a second instance of a job is blocked only if the job key matches, that is, only if the group and name are both the same. In this case, however, the triggered job instance is in the "RECOVERING_JOBS" group, while the executing job instance is in "myGroup". Here's what the logs look like for this scenario. Notice the overlap for job name 17 starting at 16:56:57.
16:43:49 [3_Worker-2] Trigger [group: myGroup, name: 17] fired for job [group: myGroup, name: 17]
16:56:57 [2_Worker-2] Trigger [group: RECOVERING_JOBS, name: recover_***] fired for job [group: myGroup, name: 17]
16:58:14 [3_Worker-2] Trigger [group: myGroup, name: 17] completed for job [group: myGroup, name: 17]
16:58:34 [2_Worker-2] Trigger [group: RECOVERING_JOBS, name: recover_***] completed for job [group: myGroup, name: 17]
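For longer logs, this kind of overlap can be found mechanically by tracking, per job key, which triggers have fired but not yet completed. The following is a minimal sketch of such hypothetical tooling: `OverlapFinder` and `findOverlaps` are illustrative names, not part of Quartz, and the parser assumes every relevant line follows exactly the format of the log lines above.

```java
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical log scanner: reports when a job key fires again before its previous execution completed. */
class OverlapFinder {
    // Assumed format: "HH:mm:ss [thread] Trigger [group: G, name: N] fired|completed for job [group: G, name: N]"
    private static final Pattern LINE = Pattern.compile(
            "^(\\d{2}:\\d{2}:\\d{2}) \\[(\\S+)\\] Trigger \\[group: ([^,]+), name: ([^\\]]+)\\] "
            + "(fired|completed) for job \\[group: ([^,]+), name: ([^\\]]+)\\]$");

    static List<String> findOverlaps(List<String> lines) {
        // job key -> trigger keys whose execution of that job is still in flight
        Map<String, List<String>> inFlight = new HashMap<>();
        List<String> overlaps = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.matches()) {
                continue; // ignore lines in any other format
            }
            LocalTime time = LocalTime.parse(m.group(1));
            String thread = m.group(2);
            String triggerKey = m.group(3) + "." + m.group(4);
            String jobKey = m.group(6) + "." + m.group(7);
            List<String> running = inFlight.computeIfAbsent(jobKey, k -> new ArrayList<>());
            if (m.group(5).equals("fired")) {
                if (!running.isEmpty()) {
                    overlaps.add(time + " " + jobKey + " fired via " + triggerKey + " on " + thread
                            + " while still running via " + running);
                }
                running.add(triggerKey);
            } else {
                running.remove(triggerKey); // "completed": this trigger's execution has ended
            }
        }
        return overlaps;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
            "16:43:49 [3_Worker-2] Trigger [group: myGroup, name: 17] fired for job [group: myGroup, name: 17]",
            "16:56:57 [2_Worker-2] Trigger [group: RECOVERING_JOBS, name: recover_***] fired for job [group: myGroup, name: 17]",
            "16:58:14 [3_Worker-2] Trigger [group: myGroup, name: 17] completed for job [group: myGroup, name: 17]",
            "16:58:34 [2_Worker-2] Trigger [group: RECOVERING_JOBS, name: recover_***] completed for job [group: myGroup, name: 17]");
        findOverlaps(log).forEach(System.out::println);
    }
}
```

Run against the four log lines above, it reports a single overlap at 16:56:57, when the recovery trigger fired while the myGroup execution was still running. That window closes when the first execution completes at 16:58:14, 77 seconds later.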

Is this intentional behavior? Should the original job group & name be checked instead when a job is in the recovery group and isConcurrentExectionDisallowed is true?
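The check the question proposes can be illustrated with a hypothetical single-JVM model in which the "already running?" test is keyed only on the job's own group and name, so the group of the trigger that fired (such as RECOVERING_JOBS) cannot bypass it. `NonConcurrentJobGuard`, `tryStart` and `finish` are illustrative names; this is not how Quartz implements @DisallowConcurrentExecution, and a clustered JDBC job store would need the equivalent check coordinated through the shared database rather than an in-memory set.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical model of the proposed rule (not Quartz code): the guard key is
 * derived from the job's group and name, never from the trigger's group.
 */
class NonConcurrentJobGuard {
    private final Set<String> running = ConcurrentHashMap.newKeySet();

    private static String jobKey(String jobGroup, String jobName) {
        return jobGroup + "." + jobName;
    }

    /** Returns true if the job may start now; triggerGroup is deliberately ignored. */
    boolean tryStart(String triggerGroup, String jobGroup, String jobName) {
        return running.add(jobKey(jobGroup, jobName)); // false if an execution is already in flight
    }

    void finish(String jobGroup, String jobName) {
        running.remove(jobKey(jobGroup, jobName));
    }

    public static void main(String[] args) {
        NonConcurrentJobGuard guard = new NonConcurrentJobGuard();
        System.out.println(guard.tryStart("myGroup", "myGroup", "17"));         // true: normal trigger starts job 17
        System.out.println(guard.tryStart("RECOVERING_JOBS", "myGroup", "17")); // false: recovery trigger must wait
        guard.finish("myGroup", "17");
        System.out.println(guard.tryStart("RECOVERING_JOBS", "myGroup", "17")); // true: first execution has finished
    }
}
```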

@jhouserizer
Contributor

This is a known limitation of job recovery, it could certainly be improved.

@jhouserizer jhouserizer added is:enhancement Enhancement to an existing feature needs:review Needs review / investigation labels Oct 14, 2024
@luke-marrs
Author

This is a known limitation of job recovery, it could certainly be improved.

Thank you for the confirmation. We have worked around this for now by disabling recovery for jobs that can't have concurrent execution.
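One way to keep this workaround from regressing is a startup check that flags any job that both disallows concurrent execution and requests recovery. The sketch below is hypothetical: `RecoveryPolicyCheck` and `JobDef` are illustrative stand-ins for the two relevant `JobDetail` properties, which in Quartz are read with `JobDetail.isConcurrentExectionDisallowed()` and `JobDetail.requestsRecovery()`; the recovery flag itself is set when building the job via `JobBuilder.requestRecovery(boolean)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical audit: jobs that disallow concurrency should not also request recovery. */
class RecoveryPolicyCheck {
    static final class JobDef {
        final String group;
        final String name;
        final boolean disallowsConcurrentExecution; // cf. JobDetail.isConcurrentExectionDisallowed()
        final boolean requestsRecovery;             // cf. JobDetail.requestsRecovery()

        JobDef(String group, String name, boolean disallowsConcurrentExecution, boolean requestsRecovery) {
            this.group = group;
            this.name = name;
            this.disallowsConcurrentExecution = disallowsConcurrentExecution;
            this.requestsRecovery = requestsRecovery;
        }
    }

    /** Returns "group.name" for every job that combines both settings. */
    static List<String> violations(List<JobDef> jobs) {
        List<String> bad = new ArrayList<>();
        for (JobDef job : jobs) {
            if (job.disallowsConcurrentExecution && job.requestsRecovery) {
                bad.add(job.group + "." + job.name);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<JobDef> jobs = Arrays.asList(
            new JobDef("myGroup", "17", true, true),     // flagged: both settings
            new JobDef("myGroup", "18", true, false),    // workaround applied
            new JobDef("other", "report", false, true)); // concurrency allowed, recovery is fine
        System.out.println(violations(jobs)); // prints [myGroup.17]
    }
}
```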

@rkorpu01

@jhouserizer Are there any plans to fix this issue?

@jhouserizer
Contributor

No immediate plan or priority for it, no; this limitation has existed for about two decades now. PRs are welcome from anyone confident about a quality solution.

Projects
None yet
Development

No branches or pull requests

4 participants