Parallelising work in CI systems #505
There are a few ways you might approach this that come to mind.

First, you could have each worker handle a particular module (or subset of modules). For each worker, you'd filter the full set of mutations down to its assigned modules, and you'd want some way to combine all of the per-worker results into a unified report.

Another option is to give each worker access to the entire set of mutations for your project, but to have them only actually perform a subset of them. So if a worker knew, for example, that it was number 3 out of 5, then it would only work on the third fifth of all mutations... or something like that; I'm glossing over details. As before, you might want some way to combine all of the results to get a unified report.

Of course, if the workers are actually able to communicate with one another, you could also use e.g. the celery execution engine to distribute work among them. I'm not sure if this is possible or not.

So I think we already have most of what you need to do this. It'll require a little creativity, and we might find that there are even better ways, e.g. perhaps a new execution engine. I'm happy to help you work on a solution (though I don't have much bandwidth to actually implement something right now).
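The "worker k of N" scheme might look something like the sketch below. The `shard()` helper and the shape of the mutation list are illustrative assumptions, not Cosmic Ray's actual API:

```python
# Hypothetical sketch of the "worker k of N" sharding described above.
# The mutation list and shard() helper are illustrative only.
def shard(mutations, worker_index, num_workers):
    """Return the non-overlapping subset this worker should perform.

    Taking every num_workers-th mutation gives each worker a roughly
    equal share; contiguous slices would work just as well.
    """
    return mutations[worker_index::num_workers]

# Worker 3 of 5 (zero-based index 2) gets mutations 2, 7, 12, ...
all_mutations = list(range(100))
mine = shard(all_mutations, worker_index=2, num_workers=5)
print(len(mine))  # 20 mutations, one fifth of the total
```

Every worker computes the same full list deterministically, so no coordination is needed beyond knowing its own index and the total worker count, which most CI matrix features provide.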
Doesn't celery require real-time access between the controller and the runners?

I'm thinking of files, as those are typically handled well by CI systems (as build artefacts), so they would be runnable even if workers don't have access to the network. I'm thinking that the split should happen on a single machine, with job files each having a subset of mutations to execute, as we probably want to preserve the runner that executes mutations in random order, so that we can kill workers after a specific amount of time, not when they finish the job (for CI we want results quickly, even if they are incomplete). While using filtering and … Build artefact handling is also why I'm thinking that combining should use files as inputs.

So basically, I think that we need something that splits the sqlite file into N files with random mutations that can be executed by the existing runner, and then something that takes all the files after the runners are done with them and melds them together.
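A rough sketch of what that split/meld pair could look like, assuming the session is a plain sqlite database. The table and column names (`mutations`, `job_id`, `result`) are made up for illustration and will differ from Cosmic Ray's real session schema:

```python
# Hypothetical split/merge helpers for a mutation session stored in
# sqlite. The schema (table "mutations" with job_id/result columns)
# is an assumption for illustration only.
import random
import sqlite3

def split_session(src_path, num_workers):
    """Split one session file into num_workers shard files, each with
    a randomly ordered, non-overlapping subset of the mutations."""
    src = sqlite3.connect(src_path)
    jobs = [row[0] for row in src.execute("SELECT job_id FROM mutations")]
    src.close()
    random.shuffle(jobs)  # preserve random execution order per shard
    for i in range(num_workers):
        shard = sqlite3.connect(f"shard-{i}.sqlite")
        shard.execute("CREATE TABLE mutations (job_id TEXT, result TEXT)")
        shard.executemany(
            "INSERT INTO mutations (job_id) VALUES (?)",
            ((job,) for job in jobs[i::num_workers]),
        )
        shard.commit()
        shard.close()

def merge_results(shard_paths, dest_path):
    """Meld completed shard files back into a single results file."""
    dest = sqlite3.connect(dest_path)
    dest.execute(
        "CREATE TABLE IF NOT EXISTS mutations (job_id TEXT, result TEXT)"
    )
    for path in shard_paths:
        shard = sqlite3.connect(path)
        dest.executemany(
            "INSERT INTO mutations (job_id, result) VALUES (?, ?)",
            shard.execute("SELECT job_id, result FROM mutations"),
        )
        shard.close()
    dest.commit()
    dest.close()
```

A worker killed mid-run simply leaves some `result` columns empty, so the merged file naturally represents a partial run, which fits the "results quickly, even if incomplete" goal.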
That's right, hence the caveat about the workers needing to be able to communicate. I figured this was not likely to be possible, but I thought I should include it for completeness.
I agree, this is a pretty crude approach. Its primary benefit is its simplicity, but it's not so much simpler than the other approaches that I'd try it first.
Right, I think this is the best way to start. I think we even have most of the parts we need. I'm not sure what channels there are for communicating between the workers, but I guess we'll need some way of serializing the results.
Both GitLab CI and GitHub Actions allow having tasks that depend on each other: https://docs.github.com/en/free-pro-team@latest/actions/learn-github-actions/migrating-from-gitlab-cicd-to-github-actions#dependencies-between-jobs
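For GitHub Actions specifically, the split/run/meld pipeline could be wired up roughly as below. The job names, scripts, and shard count are all placeholders, not a tested workflow:

```yaml
# Hypothetical workflow: split the session into shards, run N matrix
# workers with a fixed time budget, then merge in a dependent job.
jobs:
  split:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python split_session.py session.sqlite 4   # placeholder script
      - uses: actions/upload-artifact@v4
        with:
          name: shards
          path: shard-*.sqlite

  mutate:
    needs: split
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [0, 1, 2, 3]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/download-artifact@v4
        with:
          name: shards
      # `timeout` kills the worker after a fixed budget but keeps the
      # partial results, so the shard still uploads something useful.
      - run: timeout 25m run-worker shard-${{ matrix.shard }}.sqlite || true
      - uses: actions/upload-artifact@v4
        with:
          name: results-${{ matrix.shard }}
          path: shard-${{ matrix.shard }}.sqlite

  combine:
    needs: mutate
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4   # grabs every artifact
      - run: python merge_results.py         # placeholder script
```

The same shape works in GitLab CI using `needs:` between jobs, with the shard files passed along the chain as build artefacts.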
It would be nice to have the ability to: …