Test concurrent submissions that affect same entity #1202

Merged
matthew-white merged 3 commits into master from test-entity-lock on Oct 3, 2024

Conversation

matthew-white
Member

This PR adds a test for getodk/central#705. It runs the race condition described in the issue many times, asserting each time that the submissions are processed successfully.

Why is this the best possible solution? Were any other approaches considered?

The test in #1033 takes a more explanatory approach, beginning transactions manually and controlling the timing of queries. I think that test is helpful, but it didn't seem useful to just copy it for this new issue. It's also not easy to control the timing of Entities._processSubmissionEvent(), which I think would be needed in order to write such a test.
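For context, that manual-transaction style looks roughly like the following sketch. This is not the actual code from #1033: it uses node-postgres clients directly, and the table and column names are placeholders, just to show the idea of holding two transactions open and interleaving their queries.

```js
// Hypothetical sketch, not the code from #1033: two clients hold separate
// transactions open so the test controls exactly when each query runs.
const { Client } = require('pg');

const runSketch = async (entityId) => {
  const clientA = new Client();
  const clientB = new Client();
  await clientA.connect();
  await clientB.connect();

  await clientA.query('BEGIN');
  await clientB.query('BEGIN');

  // clientA takes a row lock on the entity first...
  await clientA.query('SELECT * FROM entities WHERE id = $1 FOR UPDATE', [entityId]);
  // ...so clientB's attempt to lock the same row blocks until clientA commits.
  const blocked = clientB.query('SELECT * FROM entities WHERE id = $1 FOR UPDATE', [entityId]);

  await clientA.query('COMMIT');
  await blocked; // resolves only once clientA has released the lock
  await clientB.query('COMMIT');

  await clientA.end();
  await clientB.end();
};
```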

The test that I wrote runs the race condition described in the issue. Even without locking, things occasionally work fine: the offline update doesn't end up stuck in the backlog. That's the case <10% of the time on my local machine. Given that things sometimes work, I run the race condition 50 times. I think it's nice to have a test that demonstrates the race condition as described in the issue and that fails if locking is removed.
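In outline, the test looks something like this (the helper names are invented stand-ins for the project's existing submission and processing helpers, not the actual code in this PR):

```js
// Rough outline only; makeOfflineCreateAndUpdate, submitAndProcess, and
// countBacklogEntries are hypothetical stand-ins for the real helpers.
it('should not leave the offline update stuck in the backlog', async () => {
  for (let i = 0; i < 50; i += 1) {
    const { create, update } = makeOfflineCreateAndUpdate(i);
    // Submit both and process them concurrently, racing the create against the update.
    await Promise.all([submitAndProcess(create), submitAndProcess(update)]);
    // With locking in place, the update should never be held back.
    const backlogCount = await countBacklogEntries();
    backlogCount.should.equal(0);
  }
});
```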

I tried to do more to control the timing of Entities._processSubmissionEvent(), thinking that would allow me to run the race condition only once. Specifically, this is what I had in mind (roughly sketched after the list):

  • Submit an offline create and an offline update to the same entity.
  • Start processing the offline update first.
  • Pause the processing of the offline update after Entities._computeBaseVersion() is run.
  • Start processing the offline create, allowing it to finish before unpausing the offline update.
  • Unpause the processing of the offline update.
  • Observe that the offline update ends up stuck in the backlog.

However, it turned out to be harder than expected to spy on Entities._computeBaseVersion(). Each time a new container is created, it looks like the query modules are recreated (injector() in lib/model/container.js). Further, a new container is created when a parent transaction is begun. I was able to spy on Entities._computeBaseVersion() in the container supplied to the test, but that didn't end up being the same Entities._computeBaseVersion() run by Entities._processSubmissionEvent(), as processing an event begins a new transaction.
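Concretely, the failure mode is something like this (a simplified illustration, not actual test code; triggerProcessing() stands in for however the event gets processed):

```js
// The spy is attached to the Entities module of the container handed to the test...
const spy = sinon.spy(container.Entities, '_computeBaseVersion');

// ...but processing the event begins a new transaction, which builds a fresh
// container via injector() in lib/model/container.js, with its own Entities module.
await triggerProcessing(); // hypothetical stand-in

// So the spy never observes the call made inside the transaction's container.
spy.called.should.equal(false);
```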

Before submitting this PR, please make sure you have:

  • run make test and confirmed all checks still pass OR confirmed the CircleCI build passes
  • verified that any code from external sources is properly credited in comments or that everything is internally sourced

@brontolosone
Contributor

Something that may come in handy is to use pg_sleep() to enlarge time windows, making it easier to set up concurrent operations to hit the race condition.

@matthew-white
Member Author

Ooh interesting! Are there any special strategies for inserting the pg_sleep() where it needs to go? If a function runs a series of queries, and you want the pg_sleep() to happen in the middle, is there an easy way to do that?

@brontolosone
Contributor

You can alter the function to have it run a select pg_sleep(60); query in the middle if that's what you mean?
If you can't touch the function or its SQL, and the SQL mutates (insert/update/delete), then another angle would be to implant a trigger that does the pg_sleep() before or after.
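For the trigger route, the implant could look roughly like this. It's only a sketch: the entity_defs table name and the db.query handle are assumptions, and the sleep length is arbitrary.

```js
// Hypothetical sketch: slow down inserts into entity_defs so a concurrent
// writer has a much wider window to collide with.
await db.query(`
  CREATE FUNCTION sleep_before_insert() RETURNS trigger AS $$
  BEGIN
    PERFORM pg_sleep(5);
    RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER slow_entity_def_insert
    BEFORE INSERT ON entity_defs
    FOR EACH ROW EXECUTE FUNCTION sleep_before_insert();
`);
// Remember to drop the trigger and function again at the end of the test.
```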

@matthew-white merged commit e20a059 into master on Oct 3, 2024
6 checks passed
@matthew-white deleted the test-entity-lock branch on October 3, 2024 at 04:02