Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The curl_multi can break down nextcloud or the db connections #65

Open
rabser opened this issue Feb 7, 2022 · 8 comments
Open

The curl_multi can break down nextcloud or the db connections #65

rabser opened this issue Feb 7, 2022 · 8 comments
Assignees
Labels
3. completed (being tested) Development is done. Will be merged. bug Something isn't working enhancement New feature or request, approved

Comments

@rabser
Copy link

rabser commented Feb 7, 2022

I experienced bad runs (many users empty lines) and a general DOS on nextcloud server when using this wonderful tool (480 users registered), as curl_multi is too aggressive.
I solved it by splitting the array of users in chunks simply wrapping the “curl_multi” user data attributes using array_chunk on the users list read from $SESSION.
Chunking in blocks of 50 users I managed to get all the info about them without loosing nextcloud or the backend database…

If my patch can be of any interest, i can share.

@bpcurse
Copy link
Owner

bpcurse commented Feb 7, 2022

@rabser Great, I'm happy to hear you like it :)

Sharing the patch would be very appreciated, indeed.
This also sounds like a plausible explanation for another issue.

@bpcurse bpcurse added bug Something isn't working needs info Some information is missing labels Feb 7, 2022
@rabser
Copy link
Author

rabser commented Feb 7, 2022

I'm sorry but I never understood how to use the pull requests (my fault), so I paste here the patch, as it's very simple:

In functions.php substitute starting from line 314

  // Initialize cURL multi handle for parallel requests
  $mh = curl_multi_init();

  // Iterate through userlist
  foreach($_SESSION['userlist'] as $key => $user_id) {

with this block

  $userids = array_chunk($_SESSION['userlist'],50);
  $chunks = count($userids);

  for ( $chunk=0; $chunk<$chunks; $chunk++) {
        // just to let you know what's happening...
        error_log("Working on chunk $chunk"); 

        // Initialize cURL multi handle for parallel requests
        $mh = curl_multi_init();
        // Iterate through userlist
        foreach($userids[$chunk] as $key => $user_id) {

then add a close brace after

curl_multi_close($mh);

to

curl_multi_close($mh);
}

which in original file is in line 350.

Obviously a better work should having a config item for chunking size, for a simpler tuning.
A chunk-size of 1, would be equal to not having curl_multi at all ...

HTH

@bpcurse bpcurse added 1. to develop Will be developed enhancement New feature or request, approved and removed needs info Some information is missing labels Feb 9, 2022
@bpcurse
Copy link
Owner

bpcurse commented Feb 9, 2022

Obviously a better work should having a config item for chunking size, for a simpler tuning.
A chunk-size of 1, would be equal to not having curl_multi at all ...

I had the same thought. A config option will be made available in the next version.

Thank you for contributing to this project 👍, no need to apologize for the pull request.
I had problems getting started with pull requests myself.

Just in case you would like to practice (as it is quite easy in Githubs Web UI):

  • Open the Code tab
  • Click on functions.php
  • Click the Edit button (a fork of the repository will be created automatically in your account)
  • Alter the code
  • Leave a short note about your changes and propose the added/changed code by clicking the corresponding button
  • That should be all AFAIK

@rabser
Copy link
Author

rabser commented Feb 9, 2022

Great! i didn’t know that i could simply edit directly a file to get a pull request (I manage several gitlab self hosted servers, so that i could edit a file was known to me).
Next time iI’ll try !

@bpcurse bpcurse added 2. developing and removed 1. to develop Will be developed labels Feb 9, 2022
@bpcurse bpcurse self-assigned this Feb 9, 2022
@bpcurse bpcurse added 3. completed (being tested) Development is done. Will be merged. and removed 2. developing labels Feb 11, 2022
@bpcurse
Copy link
Owner

bpcurse commented Feb 11, 2022

@rabser A new version containing your chunking proposal and a debug log (both are options in config.php) has been published in the debug branch. May I ask you to test this version as your setup is known to react to these changes?

Here is a howto:

  • Backup your functions.php, config.php and the l10n folder
  • After switching to the 'debug' branch download the above mentioned files and the folder and overwrite your existing ones
  • (regarding config.php you can just add the two new options to your existing file instead of overwriting it)
  • Set user_chunk_size and debug_log options as necessary

@rabser
Copy link
Author

rabser commented Feb 11, 2022

Hi,
tests were successful. Starting with chunks to false, I got many "!ERROR: Received empty user id" lines in the debug.log
Raising up (eg. 40, 50) the errors gone away, as it should be.

Just an hints about the debug.log: if you think to bring that feature to production, you should think that you should left to the admin (eg. in config.php) the choice on were debug.log will be saved or web dir should be writable by the webserver and this is normally not a best practice (nor accepted by providers).

HTH

@bpcurse
Copy link
Owner

bpcurse commented Feb 12, 2022

Thanks a lot for testing! Glad to hear it works in "real life".

For production I am planning to show an error message after user data has been fetched and at least one transfer has failed, including a hint to alter the user_chunk_size config setting. It won't keep the user from working with the potentially incomplete data, if necessary.

This error message could offer a "download log file" button. Log data would be kept in a $_SESSION var until download is actively chosen.
I hope the use of the same system as user data csv file downloads with random temp path/file names and auto deletion after download should be secure enough? See build_csv_file, download_fileand delete_temp_folder functions in functions.php.

What do you think?

@rabser
Copy link
Author

rabser commented Feb 14, 2022

I'm not so used to save data into the $_SESSION, but yes, I can't see any problem in this strategy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3. completed (being tested) Development is done. Will be merged. bug Something isn't working enhancement New feature or request, approved
Projects
None yet
Development

No branches or pull requests

2 participants