GC doesn't run when using TransactionScope + async #50683

Open · karlra opened this issue Apr 3, 2021 · 18 comments

karlra commented Apr 3, 2021

Description

I have an application that uses the TransactionScope class to control SQL transactions. We also use the same library to do mathematical simulations, but in this case it injects a dummy repository to stop the persistence to SQL. When doing this I noticed that performance became progressively worse, and eventually the app crashed with OOM errors.

I had previously created a bug report here: #1419 but it was closed because it was determined that this is a GC issue and not a TransactionScope issue.

Take this code:

using System;
using System.Diagnostics;
using System.Threading;
using System.Transactions;

namespace TestTrxScope
{
    class Program
    {
        static void Main(string[] args)
        {
            int i = 0;
            double lastElapsed = 0;
            Stopwatch sw = new Stopwatch();
            sw.Start();
            while(true)
            {
                using(var scope = CreateTransactionScope())
                {
                    scope.Complete();
                    i++;
                    if(i % 1_000_000 == 0)
                    {
                        Console.WriteLine($"1m took {sw.Elapsed.TotalSeconds - lastElapsed}");
                        lastElapsed = sw.Elapsed.TotalSeconds;
                        //GC.Collect(2);
                    }
                }
            }
        }

        public static TransactionScope CreateTransactionScope()
        {
            //Memory grows constantly
            return new TransactionScope(TransactionScopeOption.Suppress, new TransactionOptions() { }, TransactionScopeAsyncFlowOption.Enabled);

            //This version doesn't constantly grow in memory usage
            //return new TransactionScope();
        }
    }
}

This program's memory usage will grow forever, and the time it takes to complete 1 million TransactionScopes will also grow forever.

Uncommenting the GC.Collect(2) call fixes the ever-increasing memory usage, but for some reason the GC does not run on its own in this program. You might think that this is a contrived example, but I found this issue in production code, where we have console apps that do extremely heavy calculations.

Note that even if GC.Collect(2) is called manually, the time per 1 million iterations still gets worse by 75% or so before it levels off. This suggests that our data access code is affected even in situations where other allocations would cause the GC to run gen2 collections on its own.
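
For anyone hitting this before a fix, a hedged sketch of the workaround described above (forcing a full collection every million iterations, matching the commented-out GC.Collect(2) in the repro; the GC.WaitForPendingFinalizers call is an extra, unverified assumption added as a common belt-and-braces pattern):

if (i % 1_000_000 == 0)
{
    // Force a gen2 collection so the GC reclaims what it otherwise never gets around to collecting.
    GC.Collect(2);
    // Assumption: draining the finalizer queue and collecting again may help further;
    // whether it's actually needed is not verified in this issue.
    GC.WaitForPendingFinalizers();
    GC.Collect(2);
}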

Configuration

.NET Core 3.1, Windows 10 x64

Regression?

This does not seem to happen on .NET Framework.

Other information

The situation improves somewhat with .NET 5: I haven't managed to get OOM errors, but performance still gets progressively and massively worse.

dotnet-issue-labeler bot commented:

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

dotnet-issue-labeler bot added the untriaged label on Apr 3, 2021
HongGit removed the untriaged label on Jul 22, 2021
HongGit added this to the Future milestone on Jul 22, 2021
karlra (Author) commented Nov 28, 2022

Any chance I could get an update on this? There is a pretty serious leak in TransactionScope with TransactionScopeAsyncFlowOption.Enabled.

Timovzl commented Nov 28, 2022

@karlra, out of curiosity, does omitting TransactionScopeAsyncFlowOption.Enabled affect the problem?

I'm also curious if .NET 7 (current latest) makes any difference.

Finally, what about .NET 7 with <TieredPGO>True</TieredPGO> in the running application's csproj?

karlra (Author) commented Nov 28, 2022

Yes, that's the whole point: omitting TransactionScopeAsyncFlowOption.Enabled fixes the issue (but then TransactionScope is unusable with async code).

I just tested with .NET 7 and there is no appreciable difference vs .NET 6. TieredPGO also makes no difference.

This is with TransactionScopeAsyncFlowOption.Enabled. The GC does run, but it doesn't collect whatever is being leaked here:

[screenshot]

This is without TransactionScopeAsyncFlowOption.Enabled. GC runs often, but RAM usage is stable.

[screenshot]

roji (Member) commented Oct 11, 2024

Note similarity with #108762.

I'm going to go ahead and close this issue as it has been open for a while and there's no evidence of a leak: growing memory usage is not in itself a problem; it only means that the GC hasn't yet decided to reclaim that memory. An actual leak would mean that an OutOfMemoryException would eventually be thrown; that has been flagged in #108762, but for Android only. If someone can put together a repro that shows an OOM for non-mobile .NET, that would indeed be interesting and we'd want to look into it.

roji closed this as not planned (won't fix, can't repro, duplicate, stale) on Oct 11, 2024
roji removed this from the Future milestone on Oct 11, 2024
karlra (Author) commented Oct 11, 2024

Interesting attitude. And you don't feel like it's a problem that performance gets worse by 800% in this example?

roji (Member) commented Oct 11, 2024

@karlra first, I put together a quick BenchmarkDotNet test. When filing performance and/or memory issues, please always include such a BenchmarkDotNet benchmark rather than a Stopwatch-based one - that greatly helps ensure you're seeing correct results and not, e.g., results skewed by warm-up problems (and it generally helps your issue get addressed more quickly).

Here are the results (code just below):

BenchmarkDotNet=v0.13.0, OS=macOS 14.6.1 (23G93) [Darwin 23.6.0]
Apple M2 Max, 1 CPU, 12 logical and 12 physical cores
.NET SDK=8.0.203
  [Host]     : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT
  DefaultJob : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT


|                                  Method |     Mean |   Error |  StdDev |  Gen 0 |  Gen 1 |  Gen 2 | Allocated |
|---------------------------------------- |---------:|--------:|--------:|-------:|-------:|-------:|----------:|
|                                 Default | 164.6 ns | 0.82 ns | 0.77 ns | 0.0746 | 0.0138 |      - |     624 B |
| TransactionScopeAsyncFlowOption_Enabled | 524.1 ns | 6.03 ns | 5.64 ns | 0.0944 | 0.0086 | 0.0038 |     777 B |
Benchmark code
using System.Transactions;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<Benchmark>();

[MemoryDiagnoser]
public class Benchmark
{
    [Benchmark]
    public void Default()
    {
        using (new TransactionScope())
        {
            // Do nothing
        }
    }

    [Benchmark]
    public void TransactionScopeAsyncFlowOption_Enabled()
    {
        using (new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
        {
            // Do nothing
        }
    }
}

In other words, TransactionScope is indeed ~3.5x slower with TransactionScopeAsyncFlowOption.Enabled, and it also allocates more memory, with some of it living long enough to get into gen2. While it may be possible to optimize the TransactionScopeAsyncFlowOption.Enabled handling, the benchmark doesn't show a problem per se.

I read through #1419, and if I understand correctly, this issue is basically a duplicate of that, which you opened because #1419 was closed. I essentially agree with what @mconnew wrote there: there's no evidence of a problem with TransactionScope itself - and certainly no memory leak as you've indicated above - although there may be a GC issue lurking here.

I'll reopen and tag this as a GC issue; hopefully someone from the GC team can chime in (I recommend looking at #1419 as well for the previous discussion). I'm also happy to try and help if I can.

roji reopened this on Oct 11, 2024
dotnet-policy-service bot added the untriaged label on Oct 11, 2024
roji added the area-GC-coreclr label and removed the area-System.Transactions and untriaged labels on Oct 11, 2024

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

karlra (Author) commented Oct 11, 2024

I hear what you are saying; however, I don't agree that this is a case where BenchmarkDotNet is appropriate or useful. The problem is that performance gets progressively worse and, in the end, the application uses all the RAM available on the system without doing anything at all. Those issues don't come through at all in your benchmarks.

roji (Member) commented Oct 11, 2024

The point of the BenchmarkDotNet benchmark here is to show exactly what's happening at the TransactionScope level, i.e. how much time it's taking, how much memory it's allocating, and whether that memory stays around or not. It serves precisely to help isolate things, as we're suspecting a GC issue etc.

mangod9 added this to the 10.0.0 milestone on Oct 11, 2024
markples (Member) commented:

I have a partial explanation for this - will try to post later today.

markples (Member) commented:

I believe that there are many pieces that fit together to cause this problem. Unfortunately, each part makes sense on its own. I'll attempt to describe it all here.

  • TransactionScope is only relevant in that it uses ConditionalWeakTable (CWT). A similar repro that does
    object obj = new();
    cwt.Add(obj, value);
    cwt.Remove(obj);
    exhibits similar behavior (a standalone sketch of this repro follows this list).
  • CWT has a design point that it takes no locks on read access, so all updates are constrained in what they can do. One specific impact is that the Remove can't fully deallocate the entry internally because a concurrent read might be happening.
  • CWT has an internal table (Container). Entries aren't reused (see the previous point), so it is periodically replaced, though (as before) we can't immediately deallocate the internal entries. Also, the Containers are linked in order to manage ownership of entries that end up shared across Containers.
  • The final cleanup work for a Container is left to a finalizer because at that point we know that those concurrent read operations must be done (since they would have references to the Container). However, the possibility of object resurrection (of an object with a reference to a CWT) means that new read operations could have been started. I'll avoid the details for now, but the end result is that a second finalization pass is good enough.
  • Two-stage finalization means that Containers must get garbage collected twice and go through the finalization queue twice, so these objects end up in gen2. Because of the links, they end up being roots for future Containers. This means that eventually the Containers can't even start the cleanup process until a gen2 collection happens, so they pile up.
  • On top of this, in the microbenchmark scenario, there is a risk of creating finalization work faster than it can be completed. The order of finalization isn't guaranteed, so it's possible that a "bad" order could make things worse. I haven't looked for this or seen evidence either way. However, once a Container is in the finalization queue, no more Containers from the same CWT will be added because they will be rooted by the original one. Therefore, there should be a bit of self-correction in that we'll eventually get to the oldest one, and so on.
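
A minimal, self-contained sketch of that CWT-only repro (my own reconstruction for illustration, not code from this thread; the CwtRepro name and the per-million-iteration reporting are borrowed from the original Program):

using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

class CwtRepro
{
    static void Main()
    {
        var cwt = new ConditionalWeakTable<object, object>();
        var sw = Stopwatch.StartNew();
        double lastElapsed = 0;
        long i = 0;
        while (true)
        {
            object obj = new();
            cwt.Add(obj, new object());   // each Add creates an entry in the CWT's current Container
            cwt.Remove(obj);              // Remove can't free the entry; cleanup waits for Container finalization
            if (++i % 1_000_000 == 0)
            {
                Console.WriteLine($"1m took {sw.Elapsed.TotalSeconds - lastElapsed:F2}s, heap ~{GC.GetTotalMemory(false) / 1_000_000} MB");
                lastElapsed = sw.Elapsed.TotalSeconds;
            }
        }
    }
}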

markples (Member) commented:

I've experimented locally with several changes to CWT. Some of them improve the performance of the microbenchmark, but I believe that all of them are unsound. The problematic case that often breaks things is the following:

  • Object X has a CWT and both are collected.
  • X has a finalizer that starts using the CWT again. (Probably this happens by storing the reference to X and/or the CWT to some location and having another thread pick up the work so that the finalizer thread can continue to other objects.)
  • CWT may have an internal resize operation and has read operations happening.
  • A Container finalizer starts executing while the above is happening.

Note also that the CWT can't tell the difference between an operation happening "normally" and one happening under an object resurrection scenario. One thing that I tried was some cleanup during the resize operation so that only one finalization pass would be needed. This doesn't work if the resize happens during an object resurrection.

Note: "during an object resurrection" is sloppy here - the point is that the Container strategy depends on learning that no references exist, which the GC determines during a collection. Some of these cleanup strategies depend on doing something before that point and something else after that point. When finalization leads to more operations, references "reappear" after that GC.
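
To make the resurrection scenario above concrete, here is a hedged illustration (my own sketch, not runtime code or code from this thread; the Holder name and s_resurrected field are hypothetical): a finalizer that touches a CWT again after the GC has already decided its owner is unreachable, which is exactly the kind of late access the Container cleanup strategy has to tolerate.

using System;
using System.Runtime.CompilerServices;

class Holder
{
    private static readonly ConditionalWeakTable<object, object> s_cwt = new();
    private static Holder s_resurrected;   // storing 'this' here resurrects the object

    ~Holder()
    {
        // The GC considered this object (and possibly the CWT's current Container) dead,
        // but this access makes the table live again and can overlap with Container cleanup.
        s_cwt.GetValue(this, _ => new object());
        s_resurrected = this;
    }
}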

julealgon commented Oct 16, 2024

  • TransactionScope is only relevant in that it uses ConditionalWeakTable (CWT). A similar repro that does
    object obj = new();
    cwt.Add(obj, value);
    cwt.Remove(obj);

    exhibits similar behavior.

@markples after this bit, I kept reading to see if you'd mention the async part, but it doesn't seem like you did.

Why does this behavior trigger for TransactionScopes in async mode, but not for non-async ones? Does the non-async TransactionScope not use a CWT (I'm assuming that must be the case here)? If so, why is that the case? Could the implementation for the async flow be changed in a way that it stopped depending on CWT perhaps, thus avoiding this issue at its root?

Just asking for clarification.

markples (Member) commented:

Why does this behavior trigger for TransactionScopes in async mode, but not for non-async ones? Does the non-async TransactionScope not use a CWT (I'm assuming that must be the case here)? If so, why is that the case? Could the implementation for the async flow be changed in a way that it stopped depending on CWT perhaps, thus avoiding this issue at its root?

The source code does seem to show that the difference is async vs. not.

            if (AsyncFlowEnabled)
            {
                Debug.Assert(ContextKey != null);
                // Async Flow is enabled and CallContext will be used for ambient transaction.
                _threadContextData = CallContextCurrentData.CreateOrGetCurrentData(ContextKey);
...
        public static ContextData CreateOrGetCurrentData(ContextKey contextKey)
        {
            s_currentTransaction.Value = contextKey;
            return s_contextDataTable.GetValue(contextKey, (env) => new ContextData(true));
        }
...
        private static readonly ConditionalWeakTable<ContextKey, ContextData> s_contextDataTable = new ConditionalWeakTable<ContextKey, ContextData>();

I assume that this is due to additional bookkeeping needed to make the async scenario work, but I don't know any details here. @roji is this something that you can answer?

markples (Member) commented Oct 16, 2024

There are two remaining things on my list:

  • The internal mechanism for linking Containers handles the case where an entry is copied from one Container to another. In this microbenchmark, each entry is immediately removed after being added, so the Container replacements always happen with no live entries. My conjecture is that the link is not needed in this case, and that this is independent of finalizer timing, etc., so the change is safe to make. ConditionalWeakTable - don't link Containers when no entries are transferred #108941 has this possible change. In my experiments, this almost works, but the finalizer queue ends up being overloaded. If I add a small delay to the microbenchmark (a very short spin loop - see the sketch after this list), then the finalizer thread can keep up and memory usage stays low and constant. (A big caveat, though: I've eventually found a problem with every other proposed fix I've come up with.)
  • Promoting a finalizable object to a higher generation is unfortunate. Often this doesn't matter much as the finalizer can run (nearly) immediately, which handles the important cleanup, and only the memory is left to the higher generation. However, in this case, the double finalization means that the important cleanup is delayed (potentially by a lot), and the links cause other objects to not be eligible for finalization (again, potentially by a lot). Unfortunately, we don't manage objects individually at this level, but one thing that I can try is to check if a significant fraction of the entire gen0 is finalizable and if so, leave all of it in gen0. Theoretically this could solve the generation problems and allow the collection and finalization process to make better progress.
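
For reference, a hedged sketch of the "small delay" experiment mentioned in the first item above (my own interpretation of that description; the spin count is an arbitrary assumption) - the original repro loop with a short spin inserted so the finalizer thread can keep up:

while (true)
{
    using (var scope = CreateTransactionScope())
    {
        scope.Complete();
    }
    // Hypothetical tweak: a very short spin gives the finalizer thread a chance
    // to process pending Container cleanup before more work is queued.
    Thread.SpinWait(50);
}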

roji (Member) commented Oct 16, 2024

@markples thanks for doing the deep dive here, very interesting stuff...!

I assume that this is due to additional bookkeeping needed to make the async scenario work, but I don't know any details here. @roji is this something that you can answer?

Unfortunately not, not without some investigation...

I will note that the code has been this way for a very long time, basically since TransactionScopeAsyncFlowOption.Enabled was introduced (I'm almost certain the same logic exists on .NET Framework). I'll also note that in general, the bar for changing things in System.Transactions is quite high - this isn't a part of .NET that's being actively evolved (and in any case, unless we say that using CWT is discouraged and should be removed, I'm not sure there's an actionable Sys.Tx-side thing to do...).

markples (Member) commented:

Thanks @roji. Given some of the details in #1419, it looks like the performance problem has been around for a very long time as well. It looks like the CWT implementation was changed between .NET Framework and .NET Core. The previous version locked on reads, which slows them down, but then cleanup is much simpler (it can be immediate). I don't know the history of that change (though it's certainly understandable to not want a lock for reads).

It's possible that one or both of the above ideas will handle real scenarios. It would certainly be possible to build a microbenchmark that defeats them, so I worry a bit that a real scenario could still hit trouble. It would be interesting to try these on @karlra's workload.

Cloning the entries into a new Container would allow us to break the link in general, but with a definite cost for other scenarios.
