feat(cli): parallel scans using chunks and go routines #6820

Closed
wants to merge 13 commits into from

Conversation

johnnogit

@johnnogit johnnogit commented Dec 3, 2023

Closes #6583

Proposed Changes
- use the library described in https://medium.com/@dgravesa/an-openmp-inspired-approach-to-parallelizing-loops-in-go-9d2e984a488c to control the parallelism of query execution
- implement a flag to control the number of goroutines
- when the flag is 0 (the default value), use the number of CPUs
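The flag behaviour listed above can be sketched in Go (the language KICS is written in). This is a minimal illustration under stated assumptions, not the PR's code: the flag name `parallel` and the helper `resolveWorkers` are hypothetical, and the sketch does not use the library from the linked article.

```go
package main

import (
	"flag"
	"fmt"
	"runtime"
)

// resolveWorkers is a hypothetical helper: a flag value of 0 (the default),
// or any negative value, falls back to the number of logical CPUs.
func resolveWorkers(flagValue, numCPU int) int {
	if flagValue <= 0 {
		return numCPU
	}
	return flagValue
}

func main() {
	// "parallel" is an illustrative flag name, not necessarily the one used in the PR.
	parallel := flag.Int("parallel", 0, "number of goroutines used to run queries (0 = number of CPUs)")
	flag.Parse()
	fmt.Println("workers:", resolveWorkers(*parallel, runtime.NumCPU()))
}
```

Treating negative values like 0 is a design choice of this sketch; rejecting them with an error would be an equally reasonable option.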

I submit this contribution under the Apache-2.0 license.

I take the number of CPUs on the machine and divide it by two. From that number, I calculate the chunk size.
KICS then launches a goroutine for each chunk.
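A minimal sketch of the chunking scheme described above, using only the standard library (`sync.WaitGroup` and a mutex) rather than the library from the linked article. The helpers `chunkSize` and `runInChunks`, and the idea that each query returns a result count, are hypothetical stand-ins for KICS's real query engine.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// chunkSize is a hypothetical helper: the number of chunks is half the CPU
// count (at least 1), and the chunk size is the number of queries divided
// by that, rounded up (at least 1).
func chunkSize(numQueries, numCPU int) int {
	chunks := numCPU / 2
	if chunks < 1 {
		chunks = 1
	}
	size := (numQueries + chunks - 1) / chunks
	if size < 1 {
		size = 1
	}
	return size
}

// runInChunks launches one goroutine per chunk of queries, waits for all of
// them, and sums the per-query result counts. runQuery stands in for
// executing a single query against the parsed files.
func runInChunks(queries []string, numCPU int, runQuery func(string) int) int {
	size := chunkSize(len(queries), numCPU)
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for start := 0; start < len(queries); start += size {
		end := start + size
		if end > len(queries) {
			end = len(queries)
		}
		wg.Add(1)
		go func(chunk []string) {
			defer wg.Done()
			local := 0
			for _, q := range chunk {
				local += runQuery(q)
			}
			// Merge this chunk's results under the lock.
			mu.Lock()
			total += local
			mu.Unlock()
		}(queries[start:end])
	}
	wg.Wait()
	return total
}

func main() {
	queries := []string{"q1", "q2", "q3", "q4", "q5", "q6", "q7"}
	// Each fake query reports exactly one result.
	results := runInChunks(queries, runtime.NumCPU(), func(string) int { return 1 })
	fmt.Println("results:", results)
}
```

Each goroutine accumulates into a local counter and takes the lock only once, so contention stays low regardless of chunk size.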
@github-actions github-actions bot added community Community contribution feature request Community: new feature request labels Dec 3, 2023
@kaplanlior kaplanlior changed the title feat(cli): using chunks and go routines do issue 6583 feat(cli): parallel scans using chunks and go routines Dec 4, 2023
@kaplanlior
Contributor

Thanks @johnnogit for taking the initiative and sending the PR.

@kaplanlior kaplanlior added the enhancement Enhancement label Dec 4, 2023
@kaplanlior
Contributor

@johnnogit Thank you for the PR.
Can you take a look at the lint errors and fix them?

@kaplanlior
Contributor

Also see #6833

@johnnogit
Author

@gabriel-cx @kaplanlior
I ran regression tests on this solution; here are the results before and after the change:

| Input file | Files | Lines | Results (before) | Results (after) | Time (before) | Time (after) |
|---|---|---|---|---|---|---|
| cumulative1.json | 1 | 114 | 8 | 8 | 3.2436036s | 5.0310739s |
| cumulative2.json | 90 | 1008 | 376 | 375 | 1m10.5221919s | 54.8851443s |
| cumulative3.json | 477 | 7086 | 477 | 476 | 3m48.7089977s | 3m12.9283518s |
| cumulative4.json | 821 | 16806 | 1908 | 1909 | 4m37.4863077s | 3m25.5413461s |
| cumulative5.json | 881 | 877258 | 2354 | 2355 | 20m5.2522011s | 16m5.1617701s |
| cumulative6.json | 1091 | 928564 | 2799 | 2799 | 29m52.3641096s | 17m10.1586006s |
| cumulative7.json | 1096 | 967231 | 2831 | 2829 | 27m8.7280063s | 17m58.2907972s |
| cumulative8.json | 1208 | 995081 | 3107 | 3109 | 29m5.4096606s | 17m25.3139678s |
| cumulative9.json | 1220 | 996897 | 3906 | 3907 | 28m52.5278487s | 16m42.410249s |
| cumulative10.json | 1300 | 998409 | 4443 | 4442 | 27m7.9700073s | 17m2.5428431s |
| cumulative11.json | 1376 | 1000969 | 4822 | 4821 | 27m36.6319132s | 18m19.8423457s |
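The before/after timings above can be compared as a speedup ratio (time before divided by time after). A small sketch using Go's `time.ParseDuration`, which accepts durations such as `29m52.3641096s` directly; the `speedup` helper is hypothetical and only three of the reported rows are included:

```go
package main

import (
	"fmt"
	"time"
)

// speedup returns before/after as a ratio; values above 1 mean the
// chunked (parallel) run was faster than the original one.
func speedup(before, after string) (float64, error) {
	b, err := time.ParseDuration(before)
	if err != nil {
		return 0, err
	}
	a, err := time.ParseDuration(after)
	if err != nil {
		return 0, err
	}
	return b.Seconds() / a.Seconds(), nil
}

func main() {
	rows := []struct{ name, before, after string }{
		{"cumulative1.json", "3.2436036s", "5.0310739s"},
		{"cumulative6.json", "29m52.3641096s", "17m10.1586006s"},
		{"cumulative11.json", "27m36.6319132s", "18m19.8423457s"},
	}
	for _, r := range rows {
		s, err := speedup(r.before, r.after)
		if err != nil {
			fmt.Println(r.name, "error:", err)
			continue
		}
		fmt.Printf("%s: %.2fx\n", r.name, s)
	}
}
```

For cumulative1.json the ratio comes out below 1, matching the reported timings where the single-file scan got slower; for the larger inputs it is above 1.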

@github-actions github-actions bot added the query New query feature label Dec 21, 2023
@gabriel-cx
Contributor

Hi @johnnogit ,

Thanks for this amazing contribution!
After some inspection, the team decided to move forward with this other approach of parallel scanning.

Again, thank you very much for your time dedicated to this!

@gabriel-cx gabriel-cx closed this Feb 7, 2024
Labels
community Community contribution enhancement Enhancement feature request Community: new feature request query New query feature
Development

Successfully merging this pull request may close these issues.

Concurrent Queries Scan
3 participants