
Predict config changes for purging #563

Open
m5r opened this issue Jun 21, 2023 · 2 comments
Labels
Type: Feature Add something new

Comments

@m5r
Member

m5r commented Jun 21, 2023

Describe the issue

Ensure community members can own sustainable CHT deployments without Medic directly involved

App developers can easily visualize and quantify the impact of a change to config for purging

Additional context
Related allies OKR

@jkuester
Contributor

Behavior Overview

(@m5r please correct this if it is wrong!)

A new cht-conf action, dry-run-purge-config, has been added. When you execute this action, it calls the new API endpoint with your current purge config and prints the results. The results indicate:

  • When the next purge will run
  • How many total docs would be purged with the new config
  • How many currently purged docs would be unpurged with the new config
  • How many docs would not have their purged status change
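To make the counts above concrete, here is a minimal sketch of how such a summary could be derived from two sets of doc ids. Everything in it is an assumption for illustration: the names `PurgeDryRunSummary` and `summarize_dry_run` are hypothetical, and nothing here reflects the actual API response shape. The next purge time is left out because it depends on the server schedule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PurgeDryRunSummary:
    """Hypothetical result shape, not the real API response."""
    would_purge: int    # total docs purged under the new config
    would_unpurge: int  # currently purged docs that the new config keeps
    unchanged: int      # docs whose purged status stays the same


def summarize_dry_run(all_ids, currently_purged, purged_with_new_config):
    """Compare the current purged set with the set a new config would purge."""
    all_ids = set(all_ids)
    current = set(currently_purged) & all_ids
    proposed = set(purged_with_new_config) & all_ids

    would_unpurge = current - proposed  # purged today, kept by the new config
    newly_purged = proposed - current   # kept today, purged by the new config
    unchanged = all_ids - would_unpurge - newly_purged

    return PurgeDryRunSummary(
        would_purge=len(proposed),
        would_unpurge=len(would_unpurge),
        unchanged=len(unchanged),
    )


summary = summarize_dry_run(
    all_ids=["a", "b", "c", "d", "e"],
    currently_purged=["a", "b"],
    purged_with_new_config=["b", "c", "d"],
)
print(summary)
```

In this example, doc `a` would be unpurged, `c` and `d` would be newly purged, and `b` and `e` keep their current status.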

m5r removed their assignment Oct 24, 2023
@m5r
Member Author

m5r commented Jan 25, 2024

As noted in the initial cht-core PR, we tried to solve this by running the purging code without the database mutations (i.e. a dry run), but we hit the same limits as actual purging: slow queries made the dry run take hours to complete. Here is a copy of our test results:

I got some disappointing news about our purging dry run solution 😞

I started a dry run of a purge this morning on a clone of Muso-Mali, using a beefy machine with similar specs: Xeon E5-2686 v4 @ 2.30GHz, 256 GB of RAM, and ~650 GB of data stored on a 1.5 TB disk. I'm using a fork of CHT 3.13.0 with the purging dry run API living on the temporary branch 3.13.0-FR-dry-run-purging.

It's the beginning of the night over here and the dry run is still going. It took nearly 5 hours to simulate purging contacts, processing ~10k records with each batched request. Our assumption was that queries were cheap and that mutating the data was the expensive part that makes purging so slow. It turns out the queries are expensive too: we're seeing roughly the same performance as actual purging, despite using CouchDB views.

CPU usage averages 35% with spikes to 80%. Any loss of connection between cht-conf and the API during the dry run results in wasted CPU usage: cht-conf can't reconnect to the API to wait for the results, while the API keeps running the dry run.

With all this, it's safe to say we cannot move forward with this solution and we should go back to the design step for this feature.
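For readers unfamiliar with the approach, the dry run described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from the 3.13.0-FR-dry-run-purging branch: `batched`, `dry_run_purge`, and `purge_fn` are hypothetical names, the records are plain dicts held in memory, and each batch stands in for one query against the database.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def batched(records: Sequence[dict], size: int) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def dry_run_purge(
    records: Sequence[dict],
    purge_fn: Callable[[dict], List[str]],
    batch_size: int = 10_000,
) -> Tuple[List[str], int]:
    """Run the purge logic over every batch without writing anything.

    Returns the ids that would be purged and the number of batches read.
    """
    to_purge: List[str] = []
    batches_read = 0
    for batch in batched(records, batch_size):
        batches_read += 1  # stands in for one (potentially slow) query
        for record in batch:
            to_purge.extend(purge_fn(record))
        # A real purge would persist the purged ids here; the dry run skips it.
    return to_purge, batches_read


# Hypothetical purge rule: purge records older than 20 days.
records = [{"_id": str(i), "age_days": i} for i in range(25)]
ids, batches = dry_run_purge(
    records,
    lambda r: [r["_id"]] if r["age_days"] > 20 else [],
    batch_size=10,
)
print(ids, batches)
```

The sketch shows why skipping the writes does not help when the reads dominate: every batch still has to be fetched and evaluated, so the number of queries is the same as in a real purge.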

Projects
Status: Todo

2 participants