Bazel marks files in the BEP as available in the remote cache even if the upload failed #23250

Closed
iamricard opened this issue Aug 9, 2024 · 6 comments
Labels
team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug untriaged

Comments

@iamricard

iamricard commented Aug 9, 2024

Description of the bug:

Starting with Bazel 7 (b0c5eb3), files in the BEP are referenced with bytestream:// URIs regardless of whether the upload succeeded. We've only noticed this being an issue with files in the BuildToolLogs event, particularly the profile, but it might affect other uploads.

Which category does this issue belong to?

Remote Execution

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

This is hard to reproduce, and even harder without a remote cache. We've noticed it happening for invocations that use --bes_upload_mode=fully_async. Roughly, this is how it can happen:

  • A user starts an invocation (with a remote cache and BES on fully_async)
    • The profile takes more than 5s to upload
  • The user starts a new invocation immediately after the previous one finishes
  • At this point Bazel waits for 5s and then cancels the pending uploads

With Bazel < 7, any file that failed to upload had its location set to file://, indicating that it was never uploaded. Starting with Bazel 7, the BEP (incorrectly) claims files are available at a bytestream:// URI regardless of whether Bazel actually uploaded them.
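
To make the difference concrete, here is a minimal sketch of the two behaviors described above. It is a simplified model and not Bazel's actual code: `LocalFile`, `reported_uri`, the `always_bytestream` switch, the host `cache.example.com`, and the sample path and digest are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalFile:
    path: str         # absolute path on the machine that ran the build
    digest_hash: str  # content hash of the file (placeholder value below)
    size_bytes: int


def reported_uri(f: LocalFile, upload_succeeded: bool, always_bytestream: bool,
                 cache_host: str = "cache.example.com") -> str:
    """Return the URI a BEP consumer would see for `f` in this simplified model.

    always_bytestream=False models the behavior described for Bazel < 7.
    always_bytestream=True models the behavior described for Bazel 7.
    """
    if upload_succeeded or always_bytestream:
        return f"bytestream://{cache_host}/blobs/{f.digest_hash}/{f.size_bytes}"
    return f"file://{f.path}"


# Hypothetical profile whose upload was cancelled after the 5s wait.
profile = LocalFile("/tmp/command.profile.gz", "abc123", 2048)

print(reported_uri(profile, upload_succeeded=False, always_bytestream=False))
# file:///tmp/command.profile.gz
print(reported_uri(profile, upload_succeeded=False, always_bytestream=True))
# bytestream://cache.example.com/blobs/abc123/2048
```

In the second case a consumer of the BEP cannot tell from the URI alone that the upload never completed.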

Which operating system are you running Bazel on?

macOS, Linux, Windows

What is the output of bazel info release?

8.0.0-pre.20240730.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

b0c5eb3

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@github-actions github-actions bot added the team-Remote-Exec Issues and PRs for the Execution (Remote) team label Aug 9, 2024
@saraadams
Contributor

saraadams commented Aug 9, 2024

@coeuvre as the author of b0c5eb3

@Silic0nS0ldier
Contributor

I understand that the same thing happens with --remote_build_event_upload=minimal (the default) for skipped files. That mode actually forces using the bytestream protocol, which may be the cause of the behaviour observed here.

@iamricard
Author

iamricard commented Aug 12, 2024

> I understand that the same thing happens with --remote_build_event_upload=minimal (the default) for skipped files. That mode actually forces using the bytestream protocol, which may be the cause of the behaviour observed here.

Yes, but looking at b0c5eb3, that seems like expected behavior. However, marking failed uploads as available remotely seems like a bug because Bazel knows the file is most likely not available at the bytestream server.

@coeuvre
Member

coeuvre commented Aug 12, 2024

It's by design to always use bytestream:// to avoid FindMissingBlobs calls. See discussion in #16999 for the background.

What's your use case that requires the location of the missing files to be file://?

@iamricard
Author

> It's by design to always use bytestream:// to avoid FindMissingBlobs calls. See discussion in #16999 for the background.

Right, I think that discussion makes sense for files Bazel didn't attempt to upload. But for files that Bazel knows failed to upload to the bytestream server, this results in a BEP that is confusing to process. Previously, a build event service could interpret a profile location of file:// as "profile not uploaded or failed to upload, therefore not available to display".

> What's your use case that requires the location of the missing files to be file://?

With the new behavior either the BES or the client of the BES needs to query the bytestream server to figure out whether the profile actually did get uploaded.

Does that answer your question?

@coeuvre
Member

coeuvre commented Aug 12, 2024

> With the new behavior either the BES or the client of the BES needs to query the bytestream server to figure out whether the profile actually did get uploaded.

I think the BES client always needs to query the bytestream server to figure out whether a file exists. Even if Bazel uploaded the file successfully, the file could still be evicted from the CAS later.
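
For readers wondering what such a check might look like on the consumer side, here is a rough sketch. It assumes the common URI layout `bytestream://<host>/[<instance>/]blobs/<hash>/<size>`. The names `BlobRef`, `parse_bytestream_uri`, and `is_available` are hypothetical, and `find_missing` is an injected stand-in for a real FindMissingBlobs call against the CAS, so no gRPC stubs are needed to run it.

```python
from typing import Callable, Iterable, NamedTuple, Optional, Set, Tuple
from urllib.parse import urlparse

Digest = Tuple[str, int]  # (hash, size_bytes)


class BlobRef(NamedTuple):
    endpoint: str       # host[:port] of the bytestream/CAS server
    instance_name: str  # may be empty
    digest: Digest


def parse_bytestream_uri(uri: str) -> Optional[BlobRef]:
    """Split a bytestream:// URI into its parts, or return None if it is not one."""
    parsed = urlparse(uri)
    if parsed.scheme != "bytestream":
        return None
    parts = [p for p in parsed.path.split("/") if p]
    # Assumed layout: [<instance segments>...]/blobs/<hash>/<size>
    if len(parts) < 3 or parts[-3] != "blobs":
        return None
    try:
        size = int(parts[-1])
    except ValueError:
        return None
    return BlobRef(parsed.netloc, "/".join(parts[:-3]), (parts[-2], size))


def is_available(
    uri: str,
    find_missing: Callable[[str, Iterable[Digest]], Set[Digest]],
) -> Optional[bool]:
    """True/False for bytestream URIs; None when the URI is not a remote reference."""
    ref = parse_bytestream_uri(uri)
    if ref is None:
        return None  # e.g. file://, which only exists on the build machine
    return ref.digest not in find_missing(ref.instance_name, [ref.digest])


# Fake CAS contents standing in for a real server (hypothetical digests).
stored = {("abc123", 2048)}


def fake_find_missing(instance_name: str, digests: Iterable[Digest]) -> Set[Digest]:
    return {d for d in digests if d not in stored}


print(is_available("bytestream://cache.example.com/main/blobs/abc123/2048", fake_find_missing))  # True
print(is_available("bytestream://cache.example.com/blobs/def456/10", fake_find_missing))  # False
print(is_available("file:///tmp/command.profile.gz", fake_find_missing))  # None
```

The same check covers both cases raised in this thread: a blob that was never uploaded and a blob that was uploaded and later evicted.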

@tjgq closed this as not planned Aug 13, 2024

8 participants