Skip to content

Commit

Permalink
2024-03-04-FEX-2403.md
Browse files Browse the repository at this point in the history
  • Loading branch information
Sonicadvance1 committed Mar 5, 2024
1 parent b163dbf commit 0f808e5
Showing 1 changed file with 96 additions and 0 deletions.
96 changes: 96 additions & 0 deletions _posts/2024-03-04-FEX-2403.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
---
layout: post
title: FEX 2403 Tagged!
author: FEX-Emu Maintainers
---

Welcome back to another new tagged version of FEX-Emu! This month we have quite a few important bug fixes and optimizations, so let's get right in to
it!

## Steam fix
As of Steam's [February 27th update](https://store.steampowered.com/news/app/593110/view/4161959294800836361) there was a fairly major change to how Steam starts its embedded Chromium instance.
With this most recent change it now is run inside of the Steam Linux Runtime environment. In turn Steam has disabled the sandbox feature of the
Chromium instance because it is incompatible. FEX was already disabling this sandbox and forcibly passing in the argument to disable it.

Chromium really didn't like the argument being passed in twice and it was causing it to crash early. We have now removed our application profile and
let Steam configure the arguments as required.

As a side effect of Steam updating their version of Chromium, some users have noted that they are experiencing problems with GPU acceleration on
Raspberry Pi systems. This is seemingly a video driver problem and unrelated to this crash that was fixed. It is currently unknown if we can fix this
problem, as it is working find on Tegra and Snapdragon systems.

## Rootfs images updated
FEX's rootfs images have been updated to include the latest versions of Mesa, gfxreconstruct, and Renderdoc. The major change here is having Mesa
updated to 24.0 as the other two packages are mostly for developers.

## Fix a potential hang on forking with memory allocations
We have fixed a known hang that occurs when a process is forking while another thread is allocating memory. This tended to occur as a hang when
running Proton applications. While this fixes one hang, we still have another one that sporatically happens that we haven't tracked down. While the
occurence is relatively rare, it's good to watch the process trees if the program is stuck waiting on a futex.

## A bunch of CPU optimizations
As per usual, this last month has added a bunch of CPU optimizations! We have noted up to a 14% performance improvement in one benchmark and an
average of around 4% in [Geekbench](https://browser.geekbench.com/v5/cpu/compare/22287235?baseline=22220780). We need to commend our developers for
hammering out these optimizations, even a small optimization can have big impacts on games that abuse a particular feature.

- Use FlagM SETF8/16 for INC/DEC
- Optimize LOCK DEC
- Optimize ADC/SBC
- Fuse add+cmn in to adds
- Misc other optimizations
- Optimize less than 32-bit add and sub

## Small timestamp counter scaling
Recently we have found out that some games rely on an x86 CPU's RDTSC instruction operating at Ghz frequencies. While this is not a good idea to make
assumptions, it is relatively common that x86 CPUs have a really high frequency cycle counter frequency. Most laptop CPUs operating in the 1-2Ghz
range while desktop CPUs can go up to 3Ghz in our testing.

Unreal Engine 5 has a new work graph system that spin-loops CPUs for a fixed number of cycles, expecting to not spin for very long. While this is
relatively okay at 1Ghz, since it is only a few nanoseconds, When an ARM CPU's cycle counter gets added to the mix it starts encountering problems.

The primary problem here is that all 64-bit Snapdragon processors ship with a fixed rate 19.2Mhz cycle counter. This continues all the way to their
latest flagship the Snapdragon 8 Gen 3. Additionally other ARM devices we have tested like the Nvidia Jetson ARM boards and Apple M1 also ship a
similarly low cycle counter. So while Unreal engine will only spin-loop for a measily 1597 cycles, on an ARM board this takes ~51,000 nanoseconds but
on an x86 PC it only takes 591 nanoseconds! This was causing games to burn CPU time unnecessarily and run slower than they should!

To compensate for these slow cycle counters on most ARM devices, we are now scaling the value we return to the applications by multiplying the value
by 128 times! This makes snapdragon cycle counters behave more similarly to a 2457 Mhz cycle counter, but with a 128 cycle granularity. This improves
the FPS in Tekken 8 and will also improve performance in all other Unreal Engine 5 games. There may be other games affected as well!

As a side-note, a 1Ghz cycle counter is now mandated by ARMv8.6 and ARMv9.1 spec! So this problem will soon go away as new SoCs get on to the market.

## Introduce ARM64EC static register allocation mapping
As part of the ongoing effort to support WINE's Arm64EC code, we have changed the order in which our registers get allocated to more closely match
what the [Arm64EC ABI](https://learn.microsoft.com/en-us/cpp/build/arm64ec-windows-abi-conventions?view=msvc-170#register-mapping-for-integer-registers)
wants for register layout. Matching what Arm64EC wants for the regster layout means that when the JIT jumps out to some code, we shuffle less data
around which gives a performance improvement. The Linux side of code doesn't need this, so this only happens when building as a WINE module.

## 32-bit thunking improvements
This last month has had an exciting milestone for 32-bit thunking! We have landed support for thunking Wayland on 32-bit. Which this means with our
previously implemented Vulkan thunking, we can now run some games using Wayland plus Vulkan and Zink thrown in to the mix! In particular we have been
able to test that Super Meat Boy works in this configuration! We still have more work to do before X11 and GLX works with 32-bit thunking so stay
tuned to the future!

## Memory leak fix
FEX had an issue with long running processes leaking memory. This showed up in applications that would start hundreds of threads and tear them down
over and over. Steam is one of these long running processes that would starve the system of memory if left open over night. This is because the
program spins up helper threads fairly aggressively and then shuts them down.

We have fixed one major memory leak but we still have a few more to go before its nailed down!

## Syscall passthrough optimization
One important thing to be wary of when running games is syscall overhead. Every time an ioctl or other syscall is made, FEX can incur significant
overhead compared to running the application natively. Additionally if we are passing syscalls through to glibc helpers then this can add more
overhead and sometimes introduce bad behaviour.

This month we spent some time looking at how syscalls are handled when we know that we can pass the data directly to the kernel. This allows us to
more quickly add new system calls when the kernel adds them, and ensures they are as fast as possible. With this optimization in-place FEX now
directly emits small syscall handlers per syscall and jumps directly to the kernel if possible. This lowers CPU overhead for the most common syscalls,
thus removing emulation overhead. While FEX's syscalls were already fairly low overhead, this just improves the situation further!

# Video game showcase
<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/5lnBGmq3G88" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

See the [2403 Release Notes](https://github.com/FEX-Emu/FEX/releases/tag/FEX-2403) or the [detailed change log](https://github.com/FEX-Emu/FEX/compare/FEX-2402...FEX-2403) in Github.


0 comments on commit 0f808e5

Please sign in to comment.