Mono AOT decoder workaround for slow jpeg decoding. #2762

Merged
merged 5 commits into from
Jul 31, 2024

Member

@JimBobSquarePants JimBobSquarePants commented Jul 3, 2024

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

This PR is an attempt to work around issues found with the Mono AOT compiler which cause slow performance on iOS and Android.

dotnet/runtime#71210

In the linked issue, analysis from the Mono team highlighted this indirection as a culprit:

The AOT compiler does appear to have problems figuring out which instances to generate. In this specific case, the caller is ImageDecoderUtilities:Decode<Rgba32> which calls IImageDecoderInternals::Decode on an argument. So in theory, the aot compiler could figure out that the call could possibly go to JpegDecoderCore::Decode<Rgba32> and generate that instance. Currently, this kind of analysis is not done.

This PR removes that indirection completely by introducing an internal base class, ImageDecoderCore for all XXDecoderCore instances. In addition, seeding has been introduced for SpectralConverter<TPixel> and others.
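As a rough illustration of the idea (not the actual ImageSharp code), the sketch below contrasts a generic call made through an interface with the same call routed through a shared abstract base class, plus a "seed" method that references closed generic instantiations explicitly. Every type and member name here (`PixelA`, `DecoderCoreSketch`, `AotSeedSketch`, and so on) is hypothetical, and whether a given AOT compiler actually benefits from this shape is an assumption that has to be measured.

```csharp
using System;

// Hypothetical pixel types standing in for real pixel formats.
public struct PixelA { }
public struct PixelB { }

// "Before" shape: a helper calls a generic method through an interface,
// so the concrete target of the call is not visible at the call site.
public interface IDecoderInternalsSketch
{
    string Decode<TPixel>() where TPixel : struct;
}

public static class DecoderUtilitiesSketch
{
    public static string Decode<TPixel>(IDecoderInternalsSketch decoder)
        where TPixel : struct
        => decoder.Decode<TPixel>();
}

// "After" shape: every decoder core derives from one abstract base class,
// and callers go through the base class instead of an interface helper.
public abstract class DecoderCoreSketch
{
    public string Decode<TPixel>() where TPixel : struct
        => this.DecodeCore<TPixel>();

    protected abstract string DecodeCore<TPixel>() where TPixel : struct;
}

public sealed class JpegDecoderCoreSketch : DecoderCoreSketch
{
    // Constraints are inherited from the base declaration on an override.
    protected override string DecodeCore<TPixel>()
        => $"jpeg:{typeof(TPixel).Name}";
}

// "Seeding": code that names concrete generic instantiations explicitly,
// so that a compiler working ahead of time has them in front of it.
public static class AotSeedSketch
{
    public static void Seed<TPixel>() where TPixel : struct
        => _ = new JpegDecoderCoreSketch().Decode<TPixel>();

    public static void SeedAll()
    {
        Seed<PixelA>();
        Seed<PixelB>();
    }
}
```

With this shape, `new JpegDecoderCoreSketch().Decode<PixelA>()` resolves through the base class rather than through an interface-typed argument.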

This is currently untested but I'm confident that this should improve matters.

@beeradmoore

beeradmoore commented Jul 10, 2024

I don't think this PR had any impact for what I have tested so far (assuming I built the PR correctly), but I also didn't see the super slow loads the initial person reported. I do still need to test Android.

I have a test repo here. It is a .NET 8 MAUI application. I will test Windows and Mac out of curiosity. I won't test Xamarin.Forms (Xamarin.iOS/Xamarin.Android)

The readme has instructions on how I built this PR into a local nuget package, and how I tested the project on my iPhone 15 Pro Max in both debug and release mode.

The test image I am using is located in ImageSharpMAUITest/ImageSharpMAUITest/Resources/Raw/sloth.jpg

The ImageSharp code I am using is here.

I am loading the image from a stream in the raw folder. I don't think there is any overhead from loading it like this instead of passing a file on disk.

using (var stream = await FileSystem.OpenAppPackageFileAsync("sloth.jpg"))
{
    using (var image = await SixLabors.ImageSharp.Image.LoadAsync(stream))
    {
        ...
    }
}
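For readers who want to reproduce numbers like the ones below, here is a minimal timing sketch. It is an assumption about how such a measurement could be taken, not the harness from the linked test repo; `LoadTimingSketch` and `AverageMillisecondsAsync` are hypothetical names.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class LoadTimingSketch
{
    // Runs the given async action `runs` times and returns the mean
    // wall-clock duration in milliseconds.
    public static async Task<double> AverageMillisecondsAsync(Func<Task> action, int runs)
    {
        if (action is null)
        {
            throw new ArgumentNullException(nameof(action));
        }

        if (runs <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(runs));
        }

        double totalMs = 0;
        for (int i = 0; i < runs; i++)
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            await action();
            stopwatch.Stop();
            totalMs += stopwatch.Elapsed.TotalMilliseconds;
        }

        return totalMs / runs;
    }
}
```

The action passed in would wrap the load shown above, i.e. open the stream and await `Image.LoadAsync(stream)` inside the lambda, so that each run includes both opening the stream and decoding the image.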

Results

Updating these results as they come in

Debug (this PR)

| Device | JpgLoad | JpgResize | PngLoad | PngResize |
| --- | --- | --- | --- | --- |
| iPhone | 3032.1ms | 4060.1ms | 26.2ms | 73.9ms |
| iOS Simulator | 2884.9ms | 3615.2ms | 21.7ms | 61.5ms |
| Android | 14342.7ms | 18562.1ms | 123.8ms | 348ms |
| Android Emulator | 3193.1ms | 3933ms | 48.3ms | 88.7ms |
| macOS | 2708.2ms | 3512.4ms | 21.4ms | 60.9ms |
| Windows | 106.6ms | 82.3ms | 26.3ms | 17.1ms |

Debug (3.1.4)

| Device | JpgLoad | JpgResize | PngLoad | PngResize |
| --- | --- | --- | --- | --- |
| iPhone | 2932.4ms | 3895.8ms | 25ms | 67.7ms |
| iOS Simulator | 2794.2ms | 3555.1ms | 21.9ms | 61.4ms |
| Android | 14289.4ms | 18504.5ms | 122.7ms | 346.4ms |
| Android Emulator | 3027.6ms | 4075.2ms | 48.9ms | 97.6ms |
| macOS | 2701.9ms | 3425.3ms | 20.6ms | 59.3ms |
| Windows | 98.5ms | 86.4ms | 22.4ms | 19.7ms |

Release (this PR)

| Device | JpgLoad | JpgResize | PngLoad | PngResize |
| --- | --- | --- | --- | --- |
| iPhone | 62.7ms | 80.3ms | 4.2ms | 4ms |
| iOS Simulator | 64.3ms | 81.3ms | 3.7ms | 4.1ms |
| Android | 1134.5ms | 1369.2ms | 33.7ms | 41.6ms |
| Android Emulator | 243.3ms | 300.8ms | 19.6ms | 25.3ms |
| macOS | 188.5ms | 233.9ms | 4.9ms | 7ms |
| Windows | 149.3ms | 96.4ms | 16.9ms | 14.7ms |

Release (3.1.4)

| Device | JpgLoad | JpgResize | PngLoad | PngResize |
| --- | --- | --- | --- | --- |
| iPhone | 61.2ms | 78.8ms | 4ms | 4ms |
| iOS Simulator | 61.1ms | 79.6ms | 3.4ms | 4.1ms |
| Android | 1121.5ms | 1349.6ms | 35.1ms | 43ms |
| Android Emulator | 227.8ms | 287.9ms | 16.7ms | 13.5ms |
| macOS | 181.5ms | 230.1ms | 5.2ms | 7.3ms |
| Windows | 153.1ms | 146.4ms | 17.2ms | 15ms |

Test devices

| Device | Hardware | OS |
| --- | --- | --- |
| iPhone | iPhone 15 Pro Max | 17.5.1 |
| iOS Simulator | iPhone 15 | 17.4 |
| Android | Pixel 2 XL | Android 14 |
| Android Emulator | Pixel 3a | Android 14 |
| macOS | MacBook Pro M3 Max | Sonoma 14.5 |
| Windows | AMD 3300X + RTX 3060 | Windows 11 23H2 |

@beeradmoore

Added all the numbers.

My takeaway from this (as someone who uses images, but doesn't really know much about image encoding) is:

  • JPG is way slower than PNG
  • Debug mode on almost all platforms is slower than I would have expected
  • JPGs on an Android device in debug mode are wild
  • Shocked to see JPG on the Android emulator be faster than on a physical Android device
  • I don't see any noticeable difference between the results of this PR and the currently published 3.1.4. I don't know if additional changes are needed, such as enabling AOT mode for the tests.

More than happy to add and run more tests as requested. Attaching the raw results numbers below (if anyone wants to see the results per run).

ImageSharpMAUITest-Results.zip

@JimBobSquarePants
Member Author

@beeradmoore I've been doing some R&D in order to understand the Android performance. I can't see a configuration value in your sample to enable LLVM which is required for maximum performance. Did I miss something?

https://learn.microsoft.com/en-us/dotnet/android/building-apps/build-properties#enablellvm

@beeradmoore

I did not.

I added

    <PropertyGroup Condition="$([MSBuild]::GetTargetPlatformIdentifier('$(TargetFramework)')) == 'android' AND '$(Configuration)' == 'Release'">
        <EnableLLVM>true</EnableLLVM>
    </PropertyGroup>

While executing all the tests it gets to about 15/40 (halfway through the JPG load and resize run) and then crashes with

07-16 12:36:03.133  1730  1778 W WindowManager: Exception thrown during dispatchAppVisibility Window{647a9a8 u0 com.beeradmoore.imagesharpmauitest/crc64df7e0c4a761c65fa.MainActivity EXITING}
07-16 12:36:03.133  1730  1778 W WindowManager: android.os.DeadObjectException
07-16 12:36:03.133  1730  1778 W WindowManager: 	at android.os.BinderProxy.transactNative(Native Method)
07-16 12:36:03.133  1730  1778 W WindowManager: 	at android.os.BinderProxy.transact(BinderProxy.java:586)
07-16 12:36:03.133  1730  1778 W WindowManager: 	at android.view.IWindow$Stub$Proxy.dispatchAppVisibility(IWindow.java:552)
07-16 12:36:03.133  1730  1778 W WindowManager: 	at com.android.server.wm.WindowState.sendAppVisibilityToClients(WindowState.java:3217)
07-16 12:36:03.133  1730  1778 W WindowManager: 	at com.android.server.wm.WindowContainer.sendAppVisibilityToClients(WindowContainer.java:1293)
07-16 12:36:03.133  1730  1778 W WindowManager: 	at com.android.server.wm.WindowToken.setClientVisible(WindowToken.java:403)
07-16 12:36:03.133  1730  1778 W WindowManager: 	at com.android.server.wm.ActivityRecord.setClientVisible(ActivityRecord.java:7100)
07-16 12:36:03.133  1730  1778 W WindowManager: 	at com.android.server.wm.ActivityRecord.postApplyAnimation(ActivityRecord.java:5820)
07-16 12:36:03.133  1730  1778 W WindowManager: 	at com.android.server.wm.ActivityRecord.commitVisibility(ActivityRecord.java:5762)
07-16 12:36:03.133  1730  1778 W WindowManager: 	at com.android.server.wm.Transition.finishTransition(Transition.java:1257)
07-16 12:36:03.133  1730  1778 W WindowManager: 	at com.android.server.wm.TransitionController.finishTransition(TransitionController.java:925)
07-16 12:36:03.133  1730  1778 W WindowManager: 	at com.android.server.wm.WindowOrganizerController.finishTransition(WindowOrganizerController.java:489)
07-16 12:36:03.133  1730  1778 W WindowManager: 	at android.window.IWindowOrganizerController$Stub.onTransact(IWindowOrganizerController.java:278)
07-16 12:36:03.133  1730  1778 W WindowManager: 	at com.android.server.wm.WindowOrganizerController.onTransact(WindowOrganizerController.java:199)
07-16 12:36:03.133  1730  1778 W WindowManager: 	at android.os.Binder.execTransactInternal(Binder.java:1500)
07-16 12:36:03.133  1730  1778 W WindowManager: 	at android.os.Binder.execTransact(Binder.java:1444)

This is new to me and I have no idea what it is doing or why. I have reinstalled Android OS between then and now, so this could be related, or it could be the AOT compile. Tonight I can revert to not having LLVM and see if it happens again. It could also be that I did not have developer mode enabled, so maybe the device was going to sleep after 15 sec 🤷‍♂️

I did a single run with v3.1.4 in release mode running the JpgLoad test and got 1116.8ms. For the local nuget I got 1126.6ms.

I referred to the previous thread, grabbed info from this comment, and just put

<EnableLLVM>true</EnableLLVM>
<RunAOTCompilation>true</RunAOTCompilation>
<AndroidEnableProfiledAot>false</AndroidEnableProfiledAot>

in my main PropertyGroup to ensure it is enabled.

With that I got 516ms for the local nuget. Huge improvement!

I can test again tonight to see which of the above properties are required, and whether my Release-only Android property group was working as intended.

@JimBobSquarePants
Member Author

@beeradmoore Did you ever get those updated Android benchmarks?

@beeradmoore

@JimBobSquarePants, aside from that single test I did not. Creating a reminder to do it this weekend.

@JimBobSquarePants
Member Author

Thanks @beeradmoore, very much appreciated!

@beeradmoore

So some more checking into things. For these results I will call things run #1 - #6 with each run having a different csproj config. These different configs can also impact filesize.

I think we are also focusing on changes to Android, so these tests are only on Release mode on my physical Pixel 2 XL. If there are any specific AOT flags or anything you want me to re-run on other platforms just let me know what ones.

Settings used for each run:

Run 1

This run had none of the 3 options listed in the other runs in the csproj. I did not look up what the net8.0-android defaults are for release mode.

Run 2

<EnableLLVM>true</EnableLLVM>

This resulted in a crash midway through the second test. While the tests could be run individually, it is a problem if you can't run them back to back, so I am considering these results invalid.

07-27 14:06:30.804 7895 7961 E gesharpmauitest: * Assertion at /__w/1/s/src/mono/mono/mini/aot-runtime.c:5220, condition `plt_entry' not met
07-27 14:06:30.806 7895 7961 F libc : Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 7961 (.NET TP Worker), pid 7895 (gesharpmauitest)

More info on these crashes below.

Run 3

<EnableLLVM>true</EnableLLVM>
<RunAOTCompilation>true</RunAOTCompilation>

This also resulted in a crash.

07-27 14:12:55.543 8325 8359 F libc : Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 8359 (.NET TP Gate), pid 8325 (gesharpmauitest)
07-27 14:12:55.543 8325 8378 E gesharpmauitest: * Assertion at /__w/1/s/src/mono/mono/mini/aot-runtime.c:5220, condition `plt_entry' not met
07-27 14:12:55.543 8325 8358 E gesharpmauitest: * Assertion at /__w/1/s/src/mono/mono/mini/aot-runtime.c:5220, condition `plt_entry' not met

More info on these crashes below.

Run 4

<EnableLLVM>true</EnableLLVM>
<RunAOTCompilation>true</RunAOTCompilation>
<AndroidEnableProfiledAot>false</AndroidEnableProfiledAot>

Run 5

<EnableLLVM>true</EnableLLVM>
<AndroidEnableProfiledAot>false</AndroidEnableProfiledAot>

Same as Run 4, but without RunAOTCompilation defined. I did not check the defaults for these values, but size and performance appear the same, so I am going to assume RunAOTCompilation defaults to true.

Run 6

<EnableLLVM>false</EnableLLVM>
<AndroidEnableProfiledAot>true</AndroidEnableProfiledAot>

This is the same as Run 5 but with the values of EnableLLVM and AndroidEnableProfiledAot swapped. Judging by size and results, this appears to be the default when no parameters are defined.

Results (3.1.4)

| Run | JpgLoad | JpgResize | PngLoad | PngResize | aab size |
| --- | --- | --- | --- | --- | --- |
| 1 | 1114.9ms | 1353.9ms | 34.0ms | 41.2ms | 17.8 MB |
| 2 | Crash | Crash | Crash | Crash | 17.7 MB |
| 3 | Crash | Crash | Crash | Crash | 17.7 MB |
| 4 | 515.3ms | 712.2ms | 26.9ms | 35.4ms | 24.1 MB |
| 5 | 512.0ms | 698.8ms | 26.6ms | 33.9ms | 24.1 MB |
| 6 | 1116.4ms | 1355.3ms | 33.9ms | 40.9ms | 17.8 MB |

Results (this PR)

| Run | JpgLoad | JpgResize | PngLoad | PngResize | aab size |
| --- | --- | --- | --- | --- | --- |
| 1 | 1124.5ms | 1361.0ms | 33.2ms | 41.0ms | 17.8 MB |
| 2 | Crash | Crash | Crash | Crash | 17.7 MB |
| 3 | Crash | Crash | Crash | Crash | 17.7 MB |
| 4 | 510.1ms | 709.8ms | 27.2ms | 34.5ms | 24.1 MB |
| 5 | 512.7ms | 713.2ms | 26.8ms | 34.8ms | 24.1 MB |
| 6 | 1123.1ms | 1352.3ms | 34.1ms | 40.5ms | 17.8 MB |
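To put the two result tables in perspective, the trade-off between a default build (Run 1) and an LLVM build with profiled AOT disabled (Run 4) can be worked out with a couple of small helpers. This is only an arithmetic sketch; `TradeOffSketch` and its members are hypothetical names, and the inputs come from the tables above.

```csharp
using System;

public static class TradeOffSketch
{
    // How many times faster the candidate is than the baseline.
    // Both arguments are durations in milliseconds.
    public static double Speedup(double baselineMs, double candidateMs)
    {
        if (candidateMs <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(candidateMs));
        }

        return baselineMs / candidateMs;
    }

    // Relative growth of the candidate over the baseline, in percent.
    // Both arguments are sizes in the same unit (for example MB).
    public static double GrowthPercent(double baselineSize, double candidateSize)
    {
        if (baselineSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(baselineSize));
        }

        return (candidateSize - baselineSize) / baselineSize * 100.0;
    }
}
```

Using the 3.1.4 JpgLoad figures, `TradeOffSketch.Speedup(1114.9, 515.3)` comes out a little over 2x, while `TradeOffSketch.GrowthPercent(17.8, 24.1)` shows the aab growing by roughly 35%.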

Crashes

The crashes look like the ones discussed in this issue in the dotnet runtime repo.

Conclusion

My takeaway from this is that the currently released 3.1.4 and this PR perform the same, at least for the limited tests I am doing. Other situations, or possibly other devices, may perform differently. What the new code is doing is above my skillset, but I don't see it degrading performance so I don't think it's a bad change.

If people want more performance with ImageSharp on Android, they can get it, at the cost of an increased file size, by enabling these options for release mode Android builds.

<PropertyGroup Condition="$([MSBuild]::GetTargetPlatformIdentifier('$(TargetFramework)')) == 'android' AND '$(Configuration)' == 'Release'">
    <EnableLLVM>true</EnableLLVM>
    <AndroidEnableProfiledAot>false</AndroidEnableProfiledAot>
</PropertyGroup>

Using EnableLLVM on its own is likely not enough, as you could still have crashes, and those crashes may not appear every time. E.g. I can run the second test individually and it will pass, but if I run all tests back to back it will crash halfway through the second set of tests.

I am unsure if there are more optimisations you can do, or if the next step of improvements requires the dotnet team to do things on their end.

I am also unsure whether writing C# code against the Android SDK for image manipulation would be any faster, or whether I am now approaching the limits of my 7-year-old phone. Doing so would require more dev time to write platform-specific code for every image manipulation I do, on each platform my app runs on. It's also just a lot more code to maintain. Whereas ImageSharp appears to do a great job across the board, except on Android where it does an OK job, OR I just use PNGs everywhere, because they were always fast everywhere 😅

@JimBobSquarePants
Member Author

Thanks @beeradmoore for the monumental effort you've put in here. I think this should actually enable us to close off several longstanding issues.

Judging by this comment and this comment the following combination should be the recommended one for Android.

<EnableLLVM>true</EnableLLVM>
<RunAOTCompilation>true</RunAOTCompilation>
<AndroidEnableProfiledAot>false</AndroidEnableProfiledAot>

iOS performance looks fantastic, and Android, I imagine, will improve with new phones and codegen improvements. I think most complaints are either outdated or the result of misconfiguration.

Adding the Android and iOS specific MAUI configuration from your sample, with an explanation, to this section of the docs will help alleviate issues going forward.

Despite the relative equality of the performance metrics between the two source versions, I think merging this PR is still a wise choice. It's much easier to maintain internally as it removes indirection.

@beeradmoore

I think one of the things that will catch people off guard is running in debug mode being slower. I would expect things to be slower in debug mode, but not 14sec to load an image. That would have caught me off guard and I would have questioned the library long before trying in release mode.

@JimBobSquarePants
Member Author

I think one of the things that will catch people off guard is running in debug mode being slower. I would expect things to be slower in debug mode, but not 14sec to load an image. That would have caught me off guard and I would have questioned the library long before trying in release mode.

Yeah... Definitely need to document that also and raise issues with the relevant parties.

@JimBobSquarePants
Member Author

I've added configuration notes to our docs. Please let me know if I've missed anything.

https://docs.sixlabors.com/articles/imagesharp/gettingstarted.html#maui-performance

@JimBobSquarePants JimBobSquarePants merged commit aad5cfa into release/3.1.x Jul 31, 2024
8 checks passed
@JimBobSquarePants JimBobSquarePants deleted the js/mono-aot-decoder-workaround branch July 31, 2024 12:25