Fix[MQB]: app refcount is numVirtualStorages #428

Open
wants to merge 1 commit into
base: main

Conversation

dorjesinpo
Collaborator

The race:

  1. The Domain is being reconfigured with one fewer App.
  2. The Domain saves the new config as d_config before it proceeds to update the Apps (in the Cluster thread).
  3. A PUT arrives and picks up the "reduced" App count, but the storage for the App being removed has not been removed yet. The Queue inserts the PUT into the App storage that is about to be removed (in the queue thread).
  4. The queue thread starts unregistering the App and, as a result, decrements the PUT refCount. Now the count is one less than the number of Apps (N).
  5. N - 1 Confirms arrive (where N is the number of Apps after reconfiguration).
  6. refCount drops to zero because it is off by one.
  7. The Queue attempts to remove the PUT, but there is still an Nth storage without a Confirm, and this asserts.

This can also manifest the other way, when we add a new App.
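
To make the off-by-one concrete, here is a minimal single-threaded sketch of the sequence above. It is a toy model, not BlazingMQ code: `ToyQueue`, its members, and `raceLeavesDanglingStorage` are hypothetical names invented for illustration, and the two threads are replaced by a fixed ordering of calls. It assumes, as the PR title suggests, that the fix is to take the initial refCount from the number of live virtual storages rather than from the saved config.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical toy model of one queue holding one message.
struct ToyQueue {
    std::set<std::string> storages;  // live per-App storages
    int configuredApps = 0;          // App count from the saved config
    std::set<std::string> holders;   // storages still holding the PUT
    int refCount = 0;                // reference count of the PUT

    // A PUT is inserted into every live storage. The initial refCount
    // comes either from the live storages or from the saved config.
    void put(bool useLiveStorageCount)
    {
        holders  = storages;
        refCount = useLiveStorageCount
                       ? static_cast<int>(storages.size())
                       : configuredApps;
    }

    // Unregistering an App drops its storage and releases its reference.
    void unregisterApp(const std::string& app)
    {
        storages.erase(app);
        if (holders.erase(app) > 0) {
            --refCount;
        }
    }

    // A Confirm releases the reference held by that App's storage.
    void confirm(const std::string& app)
    {
        if (holders.erase(app) > 0) {
            --refCount;
        }
    }
};

// Replays steps 1-6: three Apps, "c" is being removed, so N == 2.
// Returns true if refCount hits zero while a storage still holds the PUT,
// which is the state that trips the assert in step 7.
bool raceLeavesDanglingStorage(bool useLiveStorageCount)
{
    ToyQueue q;
    q.storages       = {"a", "b", "c"};
    q.configuredApps = 2;       // step 2: new config saved first
    q.put(useLiveStorageCount); // step 3: PUT lands in all 3 storages
    q.unregisterApp("c");       // step 4: refCount decremented
    q.confirm("a");             // step 5: N - 1 == 1 Confirm arrives
    assert(q.holders.count("b") == 1);  // "b" has not confirmed yet
    return q.refCount == 0;
}
```

With the config-derived count, `raceLeavesDanglingStorage(false)` returns `true`: the refCount goes 2, 1, 0 while storage "b" still holds the message. With the storage-derived count, `raceLeavesDanglingStorage(true)` returns `false`, because the refCount starts at 3 and is still 1 when only "b" is left.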

@dorjesinpo dorjesinpo added the bug Something isn't working label Sep 30, 2024
@dorjesinpo dorjesinpo requested a review from a team as a code owner September 30, 2024 15:11
Collaborator

@pniedzielski pniedzielski left a comment

Change looks good to me. Testing locally while rebased on #427 to make sure it fixes the issue.

@678098 678098 changed the title app refcount is numVirtualStorages Fix[MQB]: app refcount is numVirtualStorages Sep 30, 2024

@bmq-oss-ci bmq-oss-ci bot left a comment

Build 277 of commit cb1a3a4 has completed with FAILURE

Signed-off-by: dorjesinpo <129227380+dorjesinpo@users.noreply.github.com>

@bmq-oss-ci bmq-oss-ci bot left a comment

Build 278 of commit 616a7a0 has completed with FAILURE

Labels
bug Something isn't working
2 participants