add failing test for edge case max_batch_size=max_entries #11378

Closed
wants to merge 4 commits into from

Commits on Aug 21, 2023

  1. fix(queue): add failing test for edge case max_batch_size=max_entries

    With the edge-case configuration max_batch_size=max_entries, the queue can
    fail with an assertion error when removing the frontmost element. This
    happens especially when the callback fails repeatedly (e.g. because the
    backend system receiving the data is unavailable).
    
    What happens:
    
    1. we add max_batch_size elements, all of which "post" resources
    2. the batch queue consumes all of those resources in `process_once` by `wait()`ing for them, but then gets stuck processing/sending the batch
    3. as `process_once` is stuck until `max_retry_time` has passed, the function does not run `delete_frontmost_entry()` and thus never moves the `front` reference
    4. when enqueuing the next item, the queue tries to drop the oldest entry, but triggers the assertion in queue.lua because no resources are left
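
The four steps above can be replayed in a small single-threaded model. This is only a sketch under stated assumptions: it is written in Python rather than Lua, it is not Kong's queue.lua, and `ToyQueue`, `take_batch`, `reproduce` and the plain `resources` counter (standing in for the semaphore) are hypothetical names chosen for illustration.

```python
# Toy model of the failure; NOT Kong's queue.lua. All names are hypothetical.
class ToyQueue:
    def __init__(self, max_entries, max_batch_size):
        self.max_entries = max_entries
        self.max_batch_size = max_batch_size
        self.entries = {}      # index -> payload
        self.front = 1         # index of the oldest entry
        self.back = 1          # index where the next entry is stored
        self.resources = 0     # stands in for the semaphore count

    def count(self):
        return self.back - self.front

    def delete_frontmost_entry(self):
        del self.entries[self.front]
        self.front += 1

    def enqueue(self, payload):
        if self.count() == self.max_entries:
            # Queue is full: dropping the oldest entry needs one resource.
            if self.resources == 0:
                raise AssertionError("no resources left to drop oldest entry")
            self.resources -= 1
            self.delete_frontmost_entry()
        self.entries[self.back] = payload
        self.back += 1
        self.resources += 1    # "post" one resource per entry

    def take_batch(self):
        # Consumer side: take up to max_batch_size resources, but leave the
        # entries in place until the handler has finished.
        size = min(self.resources, self.max_batch_size)
        self.resources -= size
        return [self.entries[self.front + i] for i in range(size)]


def reproduce():
    q = ToyQueue(max_entries=2, max_batch_size=2)
    q.enqueue("a")
    q.enqueue("b")             # step 1: queue is full, two resources posted
    q.take_batch()             # steps 2 and 3: resources gone, front unchanged
    try:
        q.enqueue("c")         # step 4: producer tries to drop the oldest entry
    except AssertionError:
        return "assertion"
    return "ok"
```

In this model the queue still reports `count() == max_entries` while the resource counter is already zero, which is the mismatch the steps describe.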
    JensErat committed Aug 21, 2023
    ca35465
  2. potential fix for race condition

    This commit might fix Kong#11377 by taking the elements currently being
    processed out of the race condition window.
    
    Two tests needed changes:
    
    1. "giving up sending after retrying" needed another (otherwise ignored)
    value, so that we can wait long enough in `wait_until_queue_done`
    (there might be a more elegant solution here)
    2. the new test required reactivating the handler so that it succeeds
    and finally clears the queue
    
    Why do I think this works?
    
    - immediately after the last call to `semaphore:wait()`, we start
    actually removing items from `entries`
    - the code cannot be interrupted by other light threads before we
    actually start the handler
    
    These assumptions strongly need verification by some Lua experts!
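
The idea behind the fix can be sketched in a single-threaded toy model. It is an illustration under stated assumptions, not the actual change to queue.lua: it is written in Python, and `ToyQueueFixed`, `take_batch`, `run_with_fix` and the plain `resources` counter (standing in for the semaphore) are hypothetical names. Because nothing in it runs concurrently, it cannot confirm the claim about light threads; it only shows that the entry count and the resource count stay in step once entries are removed right after the resources are taken.

```python
# Toy model of the proposed fix; NOT Kong's queue.lua. Names are hypothetical.
class ToyQueueFixed:
    def __init__(self, max_entries, max_batch_size):
        self.max_entries = max_entries
        self.max_batch_size = max_batch_size
        self.entries = {}      # index -> payload
        self.front = 1         # index of the oldest entry
        self.back = 1          # index where the next entry is stored
        self.resources = 0     # stands in for the semaphore count

    def count(self):
        return self.back - self.front

    def delete_frontmost_entry(self):
        del self.entries[self.front]
        self.front += 1

    def enqueue(self, payload):
        if self.count() == self.max_entries:
            # Queue is full: dropping the oldest entry needs one resource.
            if self.resources == 0:
                raise AssertionError("no resources left to drop oldest entry")
            self.resources -= 1
            self.delete_frontmost_entry()
        self.entries[self.back] = payload
        self.back += 1
        self.resources += 1    # "post" one resource per entry

    def take_batch(self):
        # Take the resources AND remove the entries before the handler runs,
        # so count() never exceeds the number of available resources.
        size = min(self.resources, self.max_batch_size)
        self.resources -= size
        batch = []
        for _ in range(size):
            batch.append(self.entries[self.front])
            self.delete_frontmost_entry()
        return batch


def run_with_fix():
    q = ToyQueueFixed(max_entries=2, max_batch_size=2)
    q.enqueue("a")
    q.enqueue("b")
    batch = q.take_batch()     # handler may now retry for a long time
    q.enqueue("c")             # no drop needed: the queue is empty again
    return batch, q.count(), q.resources
```

One consequence of this design is that entries in a batch being retried no longer count against `max_entries`, so the queue can briefly hold more items in flight than the configured limit.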
    JensErat committed Aug 21, 2023
    131e181
  3. 7432cea
  4. update changelog

    JensErat committed Aug 21, 2023
    2885cbd