(zstd compression?) com.google.common.base.VerifyException: java.io.IOException: attempt to write to a closed Outputstream backed by a native file #22930

Open
rbeasley-avgo opened this issue Jul 1, 2024 · 3 comments
Labels: P2 (We'll consider working on this in future; assignee optional), team-Remote-Exec (Issues and PRs for the Execution (Remote) team), type: bug

rbeasley-avgo commented Jul 1, 2024

Description of the bug:

We're observing sporadic build failures where the Bazel daemon crashes with the following:

240630 09:50:38.941:XT 2149 [com.google.devtools.build.lib.bugreport.BugReport.handleCrash] Handling crash with CrashContext{haltJvm=true, args=[], sendBugReport=true, extraOomInfo=, eventHandler=com.google.devtools.build.lib.events.Reporter@2a243856}
com.google.common.base.VerifyException: java.io.IOException: attempt to write to a closed Outputstream backed by a native file
        at com.google.devtools.build.lib.remote.GrpcCacheClient$1.onNext(GrpcCacheClient.java:433)
        at com.google.devtools.build.lib.remote.GrpcCacheClient$1.onNext(GrpcCacheClient.java:414)
        at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:474)
        at io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
        at io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
        at com.google.devtools.build.lib.remote.logging.LoggingInterceptor$LoggingForwardingCall$1.onMessage(LoggingInterceptor.java:138)
        at io.grpc.internal.DelayedClientCall$DelayedListener$2.run(DelayedClientCall.java:457)
        at io.grpc.internal.DelayedClientCall$DelayedListener.drainPendingCallbacks(DelayedClientCall.java:507)
        at io.grpc.internal.DelayedClientCall$1DrainListenerRunnable.runInContext(DelayedClientCall.java:296)
        at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.io.IOException: attempt to write to a closed Outputstream backed by a native file
        at com.google.devtools.build.lib.unix.UnixFileSystem$NativeFileOutputStream.write(UnixFileSystem.java:562)
        at com.google.devtools.build.lib.remote.common.LazyFileOutputStream.write(LazyFileOutputStream.java:44)
        at com.google.devtools.build.lib.remote.RemoteCache$ReportingOutputStream.write(RemoteCache.java:550)
        at com.google.devtools.build.lib.remote.util.DigestOutputStream.write(DigestOutputStream.java:58)
        at com.google.common.io.CountingOutputStream.write(CountingOutputStream.java:54)
        at com.google.devtools.build.lib.remote.zstd.ZstdDecompressingOutputStream.write(ZstdDecompressingOutputStream.java:61)
        at com.google.devtools.build.lib.remote.zstd.ZstdDecompressingOutputStream.write(ZstdDecompressingOutputStream.java:54)
        at com.google.protobuf.ByteString$LiteralByteString.writeTo(ByteString.java:1459)
        at com.google.devtools.build.lib.remote.GrpcCacheClient$1.onNext(GrpcCacheClient.java:430)
        ... 12 more

Which category does this issue belong to?

Remote Execution

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

No response

Which operating system are you running Bazel on?

Linux

What is the output of bazel info release?

release 7.2.0-vmware

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

No response

Have you found anything relevant by searching the web?

No. I couldn't find any matches for "closed outputstream".

Any other information, logs, or outputs that you want to share?

We're using dynamic RBE (impl: bazelbuild/bazel-buildfarm) and remote cache compression.

build:remote --extra_execution_platforms=//:rbe_platform
build:remote --remote_executor=grpcs://<endpoint>
build:remote --jobs=HOST_CPUS*10
build:remote --remote_retries=5
build:remote --experimental_remote_cache_eviction_retries=5
build:remote --verbose_failures
build:remote --remote_cache=
build:remote --disk_cache=
build:remote --noremote_upload_local_results
build:remote --experimental_remote_cache_async
build:remote --experimental_remote_merkle_tree_cache
build:remote --remote_local_fallback
build:remote --remote_local_fallback_strategy=sandboxed
build:remote --experimental_remote_downloader_local_fallback
build:remote --remote_cache_compression

build:rbe_dynamic \
    --config=remote \
    --internal_spawn_scheduler \
    --spawn_strategy=dynamic \
    --dynamic_local_strategy=worker,sandboxed,local

build --experimental_debug_spawn_scheduler
@github-actions github-actions bot added the team-Remote-Exec Issues and PRs for the Execution (Remote) team label Jul 1, 2024
@zhengwei143 zhengwei143 added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Jul 2, 2024
@tjgq
Contributor

tjgq commented Aug 6, 2024

@rbeasley-avgo is this reproducible without dynamic execution? My theory at the moment is that the write happens after the remote branch gets canceled (because we're not propagating the cancellation to the write thread properly).
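To make the theory above concrete, here is a minimal, single-threaded sketch of the suspected sequence. It is not Bazel's code: `LateWriteSketch`, `ClosableSink`, and `DownloadObserver` are hypothetical stand-ins for the file-backed output stream and the gRPC response callback, and `IllegalStateException` stands in for Guava's `VerifyException`. It only assumes that the stream rejects writes after `close()` and that a response chunk can still be delivered after the remote branch has been canceled.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class LateWriteSketch {
  /** Hypothetical stand-in for a file-backed stream that rejects writes after close(). */
  static final class ClosableSink extends OutputStream {
    private final ByteArrayOutputStream delegate = new ByteArrayOutputStream();
    private boolean closed;

    @Override
    public synchronized void write(int b) throws IOException {
      if (closed) {
        throw new IOException("attempt to write to a closed stream");
      }
      delegate.write(b);
    }

    @Override
    public synchronized void close() {
      closed = true;
    }
  }

  /** Hypothetical download callback that copies response chunks into the sink. */
  static final class DownloadObserver {
    private final OutputStream out;
    private boolean cancelled;

    DownloadObserver(OutputStream out) {
      this.out = out;
    }

    /** Tells the observer that its branch was canceled; later chunks are dropped. */
    synchronized void cancel() {
      cancelled = true;
    }

    /** Returns true if the chunk was written, false if it was dropped after cancellation. */
    synchronized boolean onNext(byte[] chunk) {
      if (cancelled) {
        return false;
      }
      try {
        out.write(chunk);
        return true;
      } catch (IOException e) {
        // Mirrors the shape of the reported crash: the IOException is wrapped unchecked.
        throw new IllegalStateException(e);
      }
    }
  }

  /** Cancellation closes the sink but never reaches the observer. */
  static String lateWriteWithoutPropagation() {
    ClosableSink sink = new ClosableSink();
    DownloadObserver observer = new DownloadObserver(sink);
    observer.onNext(new byte[] {1, 2, 3});
    sink.close(); // the branch is canceled, the observer is not told
    try {
      observer.onNext(new byte[] {4}); // a chunk that was already in flight
      return "no error";
    } catch (IllegalStateException e) {
      return e.getCause().getMessage();
    }
  }

  /** Cancellation reaches the observer before the sink is closed. */
  static boolean lateWriteWithPropagation() {
    ClosableSink sink = new ClosableSink();
    DownloadObserver observer = new DownloadObserver(sink);
    observer.onNext(new byte[] {1, 2, 3});
    observer.cancel();
    sink.close();
    return observer.onNext(new byte[] {4});
  }

  public static void main(String[] args) {
    System.out.println("without propagation: " + lateWriteWithoutPropagation());
    System.out.println("with propagation, chunk written: " + lateWriteWithPropagation());
  }
}
```

In the first scenario the late chunk hits the closed stream and the wrapped `IOException` escapes the callback; in the second the chunk is silently dropped. In a real multi-threaded client the cancellation check and the write would have to be serialized against the close (here, by the `synchronized` methods plus calling `cancel()` before `close()`), otherwise the same race remains.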

@rbeasley-avgo
Author

rbeasley-avgo commented Aug 6, 2024

> @rbeasley-avgo is this reproducible without dynamic execution? My theory at the moment is that the write happens after the remote branch gets canceled (because we're not propagating the cancellation to the write thread properly).

@tjgq I'll try it out and get back to you.

@rbeasley-avgo
Author

@tjgq In order to establish a baseline, I updated one of our canary pipelines to reenable --remote_cache_compression while still using dynamic execution. I was hoping to encounter the failure described by this issue, after which I'd switch off dynamic execution and re-observe. However, I haven't seen these failures. (FWIW, they coincided with a window where our RBE instance was unhealthy. Brief summary of that here: #22854 (comment).)

Unless anyone else can corroborate this, I guess we'll just need to close as not planned. :(


6 participants