Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] OpenBVE freezes at the loading screen #1050

Open
JakubVanek opened this issue Jul 22, 2024 · 16 comments
Open

[Bug] OpenBVE freezes at the loading screen #1050

JakubVanek opened this issue Jul 22, 2024 · 16 comments

Comments

@JakubVanek
Copy link
Contributor

JakubVanek commented Jul 22, 2024

Description

On some Linux OpenGL drivers, OpenBVE will freeze during the "Loading track" screen. The CPU is just idle in that state.

Reproduction

  • Obtain a machine that is capable of running Mesa OpenGL drivers
    • This is confirmed to happen on Ryzen 4600G integrated graphics (-> Mesa RadeonSI driver)
    • I was also able to reproduce this on a Intel Core i7-1165G7 integrated graphics (-> Mesa Iris driver)
  • Install Xubuntu 22.04
  • Install all APT updates (this might also update Mesa)
  • Install the latest OpenBVE (1.10.1.1)
  • Download http://md.archive.ubuntu.com/ubuntu/ubuntu/pool/universe/b/bve-route-cross-city-south/bve-route-cross-city-south_1.31.08-0ubuntu2_all.deb and install it (just to have some route available)
  • Start OpenBVE from console with the following command line:
    mesa_glthread=true openbve
    On machines with AMD GPUs, the mesa_glthread=true part is not even necessary. This is because Mesa enabled its OpenGL threaded dispatch in new releases by default: https://www.phoronix.com/news/Mesa-22.3-RadeonSI-glthread-On
  • Pick any available track and select the default BR323 train
  • Observe that the game gets stuck on the loading screen

Cause

I have debugged this and it turns out that OpenTK deadlocks the Mesa driver. The game freezes because the SwapBuffers() call never returns:

Program.currentGameWindow.SwapBuffers();
.

The cause is that OpenTK calls XLockDisplay() before calling Glx.SwapBuffers() (link). The Mesa driver in glthread mode also wants to acquire the lock. The situation can be described like this:

  • OpenBVE thread:
    • locks the X11 lock via XLockDisplay()
    • calls Glx.SwapBuffers()
    • this enters Mesa and Mesa wants to wait for the GL thread to finish its processing
  • Mesa GL thread:
    • does some processing
    • wants to perform some X11 call
    • calls a XLockDisplay() equivalent likely to be able to access the X11 socket
    • this call will block forever, because the lock is held by the OpenBVE thread, but that thread is waiting for this thread.

I have found opentk/opentk#691 and that looks somewhat similar, but that applies only to other GLX calls.

When the glthread mode is disabled, I assume that this deadlock will not happen because the GL commands will be processed from within the OpenBVE thread.

Workarounds

  • Run OpenBVE explicitly with mesa_glthread=false environment variable. This switches the Mesa driver into a mode where this bug does not happen.
  • Disable GL threading through a DRI configuration file. Create a file /usr/share/drirc.d/01-openbve.conf with the following contents:
DRI configuration file contents
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE driconf [
   <!ELEMENT driconf      (device+)>
   <!ELEMENT device       (application | engine)+>
   <!ATTLIST device       driver CDATA #IMPLIED
                          device CDATA #IMPLIED>
   <!ELEMENT application  (option+)>
   <!ATTLIST application  name CDATA #REQUIRED
                          executable CDATA #IMPLIED
                          executable_regexp CDATA #IMPLIED
                          sha1 CDATA #IMPLIED
                          application_name_match CDATA #IMPLIED
                          application_versions CDATA #IMPLIED>
   <!ELEMENT engine       (option+)>

   <!-- engine_name_match: A regexp matching the engine name -->
   <!-- engine_versions: A version in range format
             (version 1 to 4 : "1:4") -->

   <!ATTLIST engine       engine_name_match CDATA #REQUIRED
                          engine_versions CDATA #IMPLIED>

   <!ELEMENT option       EMPTY>
   <!ATTLIST option       name CDATA #REQUIRED
                          value CDATA #REQUIRED>
]>

<driconf>
    <!-- Please always enable app-specific workarounds for all drivers and
         screens. -->
    <device>
		<application name="OpenBVE" executable="cli"> <!-- The "cli" match is not very robust -->
            <option name="mesa_glthread_app_profile" value="0"/>
        </application>
    </device>
</driconf>
  • Set the preferNativeBackend OpenBVE option in ~/.config/OpenBve/Settings/1.5.0/options.cfg to false. This makes OpenTK use SDL2 for SwapBuffers and SDL2 does not have this issue.

Related issues

It seems to me that #944 might be the same issue.

Route

Any route is likely sufficient, but http://md.archive.ubuntu.com/ubuntu/ubuntu/pool/universe/b/bve-route-cross-city-south/bve-route-cross-city-south_1.31.08-0ubuntu2_all.deb was used.

Train

Any train is likely sufficient , but https://packages.ubuntu.com/jammy/bve-train-br-class-323 was used.

Logs

These are from the Intel machine. If needed, I can also provide logs from the AMD machine, but these logs were fairly similar.

OpenBVE log.txt
OpenBVE Log: 2024-07-22 12:22:09
Program Version: v1.10.1.1

Attached Joysticks:
--------------------
--------------------
12:22:09  Using openGL 3.0 (new) renderer
12:22:09  Initialising game window of size 960 x 600
12:22:09  Creating game window with standard context.
12:22:09  Game window initialised successfully.
12:22:09  Renderer initialised successfully.
12:22:09  /usr/share/games/bve/Railway : Railway folder found.
12:22:09  INFO: 2 Route loading plugins available.
12:22:09  INFO: 6 Object loading plugins available.
12:22:09  INFO: 2 Sound loading plugins available.
12:22:09  Load in Advance is disabled
12:22:09  Loading route file: /usr/share/games/bve/Railway/Route/Birmingham_Cross-City_South/Day/323_Summer_2002_0931_Dry_Clear.csv
12:22:09  INFO: Route file hash 2A19D371A0F6AB76613603C1B883A17A39A68773123D67FE7CDFB77DAC9C5FAD
12:22:09  Route file format is: CSV
12:22:09  INFO: Using the Japanese compatibility signal set.
12:22:13  RailIndex 3 does not reference an existing dike in Track.DikeEnd at line 1721, column 13 in file /usr/share/games/bve/Railway/Route/Birmingham_Cross-City_South/Day/323_Summer_2002_0931_Dry_Clear.csv
12:22:13  RailIndex 3 does not reference an existing dike in Track.DikeEnd at line 1729, column 13 in file /usr/share/games/bve/Railway/Route/Birmingham_Cross-City_South/Day/323_Summer_2002_0931_Dry_Clear.csv
12:22:13  RailIndex 2 does not reference an existing dike in Track.DikeEnd at line 1930, column 17 in file /usr/share/games/bve/Railway/Route/Birmingham_Cross-City_South/Day/323_Summer_2002_0931_Dry_Clear.csv
12:22:13  RailIndex 0 does not reference an existing dike in Track.DikeEnd at line 2062, column 4 in file /usr/share/games/bve/Railway/Route/Birmingham_Cross-City_South/Day/323_Summer_2002_0931_Dry_Clear.csv
12:22:13  RailIndex 4 does not reference an existing wall in Track.WallEnd at line 2166, column 5 in file /usr/share/games/bve/Railway/Route/Birmingham_Cross-City_South/Day/323_Summer_2002_0931_Dry_Clear.csv
12:22:13  RailIndex 1 does not reference an existing dike in Track.DikeEnd at line 2255, column 22 in file /usr/share/games/bve/Railway/Route/Birmingham_Cross-City_South/Day/323_Summer_2002_0931_Dry_Clear.csv
12:22:13  RailIndex 0 does not reference an existing dike in Track.DikeEnd at line 2583, column 3 in file /usr/share/games/bve/Railway/Route/Birmingham_Cross-City_South/Day/323_Summer_2002_0931_Dry_Clear.csv
12:22:13  RailIndex 2 does not reference an existing dike in Track.DikeEnd at line 2768, column 15 in file /usr/share/games/bve/Railway/Route/Birmingham_Cross-City_South/Day/323_Summer_2002_0931_Dry_Clear.csv
12:22:13  RailIndex 0 does not reference an existing wall in Track.WallEnd at line 2926, column 10 in file /usr/share/games/bve/Railway/Route/Birmingham_Cross-City_South/Day/323_Summer_2002_0931_Dry_Clear.csv
12:22:13  RailIndex 3 does not reference an existing wall in Track.WallEnd at line 2926, column 12 in file /usr/share/games/bve/Railway/Route/Birmingham_Cross-City_South/Day/323_Summer_2002_0931_Dry_Clear.csv
12:22:13  RailIndex 0 does not reference an existing dike in Track.DikeEnd at line 2926, column 14 in file /usr/share/games/bve/Railway/Route/Birmingham_Cross-City_South/Day/323_Summer_2002_0931_Dry_Clear.csv
12:22:13  RailIndex 1 does not reference an existing dike in Track.DikeEnd at line 2926, column 15 in file /usr/share/games/bve/Railway/Route/Birmingham_Cross-City_South/Day/323_Summer_2002_0931_Dry_Clear.csv
12:22:13  RailIndex 2 does not reference an existing dike in Track.DikeEnd at line 2926, column 16 in file /usr/share/games/bve/Railway/Route/Birmingham_Cross-City_South/Day/323_Summer_2002_0931_Dry_Clear.csv
12:22:13  RailIndex 3 does not reference an existing dike in Track.DikeEnd at line 2926, column 17 in file /usr/share/games/bve/Railway/Route/Birmingham_Cross-City_South/Day/323_Summer_2002_0931_Dry_Clear.csv
12:22:13  RailIndex 1 does not reference an existing wall in Track.WallEnd at line 2934, column 4 in file /usr/share/games/bve/Railway/Route/Birmingham_Cross-City_South/Day/323_Summer_2002_0931_Dry_Clear.csv
12:22:13  RailIndex 9 does not reference an existing dike in Track.DikeEnd at line 2980, column 2 in file /usr/share/games/bve/Railway/Route/Birmingham_Cross-City_South/Day/323_Summer_2002_0931_Dry_Clear.csv
12:22:13  RailIndex 9 does not reference an existing dike in Track.DikeEnd at line 2990, column 3 in file /usr/share/games/bve/Railway/Route/Birmingham_Cross-City_South/Day/323_Summer_2002_0931_Dry_Clear.csv
12:22:13  RailIndex 4 does not reference an existing dike in Track.DikeEnd at line 2990, column 4 in file /usr/share/games/bve/Railway/Route/Birmingham_Cross-City_South/Day/323_Summer_2002_0931_Dry_Clear.csv
12:22:13  RailIndex 1 does not reference an existing dike in Track.DikeEnd at line 3008, column 36 in file /usr/share/games/bve/Railway/Route/Birmingham_Cross-City_South/Day/323_Summer_2002_0931_Dry_Clear.csv
12:22:14  Route file loaded successfully.
12:22:14  Loading player train: /usr/share/games/bve/Train/BR_Class_323
12:22:14  a0 in section #ACCELERATION is expected to be greater than zero at line 7 in file /usr/share/games/bve/Train/BR_Class_323/train.dat
12:22:14  a1 in section #ACCELERATION is expected to be greater than zero at line 7 in file /usr/share/games/bve/Train/BR_Class_323/train.dat
12:22:14  v1 in section #ACCELERATION is expected to be greater than zero at line 7 in file /usr/share/games/bve/Train/BR_Class_323/train.dat
12:22:14  v2 in section #ACCELERATION is expected to be greater than zero at line 7 in file /usr/share/games/bve/Train/BR_Class_323/train.dat
12:22:14  a0 in section #ACCELERATION is expected to be greater than zero at line 8 in file /usr/share/games/bve/Train/BR_Class_323/train.dat
12:22:14  a1 in section #ACCELERATION is expected to be greater than zero at line 8 in file /usr/share/games/bve/Train/BR_Class_323/train.dat
12:22:14  v1 in section #ACCELERATION is expected to be greater than zero at line 8 in file /usr/share/games/bve/Train/BR_Class_323/train.dat
12:22:14  v2 in section #ACCELERATION is expected to be greater than zero at line 8 in file /usr/share/games/bve/Train/BR_Class_323/train.dat
12:22:14  a0 in section #ACCELERATION is expected to be greater than zero at line 9 in file /usr/share/games/bve/Train/BR_Class_323/train.dat
12:22:14  a1 in section #ACCELERATION is expected to be greater than zero at line 9 in file /usr/share/games/bve/Train/BR_Class_323/train.dat
12:22:14  v1 in section #ACCELERATION is expected to be greater than zero at line 9 in file /usr/share/games/bve/Train/BR_Class_323/train.dat
12:22:14  v2 in section #ACCELERATION is expected to be greater than zero at line 9 in file /usr/share/games/bve/Train/BR_Class_323/train.dat
12:22:14  a0 in section #ACCELERATION is expected to be greater than zero at line 10 in file /usr/share/games/bve/Train/BR_Class_323/train.dat
12:22:14  a1 in section #ACCELERATION is expected to be greater than zero at line 10 in file /usr/share/games/bve/Train/BR_Class_323/train.dat
12:22:14  v1 in section #ACCELERATION is expected to be greater than zero at line 10 in file /usr/share/games/bve/Train/BR_Class_323/train.dat
12:22:14  v2 in section #ACCELERATION is expected to be greater than zero at line 10 in file /usr/share/games/bve/Train/BR_Class_323/train.dat
12:22:14  The #ACCELERATION section defines 8 curves, but the #HANDLE section defines 4 power notches in /usr/share/games/bve/Train/BR_Class_323/train.dat
12:22:14  Loading train panel: /usr/share/games/bve/Train/BR_Class_323/panel.animated
12:22:14  INFO: This train contains both a 2D and a 3D panel. The 3D panel will always take precedence
12:22:14  Train panel loaded sucessfully.
12:22:14  Loading sound.cfg file: /usr/share/games/bve/Train/BR_Class_323/sound.cfg
12:22:14  The SoundFile airzero.wav was not found at line 55 in file /usr/share/games/bve/Train/BR_Class_323/sound.cfg
12:22:14  FileName contains illegal characters or is empty at line 99 in file /usr/share/games/bve/Train/BR_Class_323/sound.cfg
12:22:14  FileName contains illegal characters or is empty at line 99 in file /usr/share/games/bve/Train/BR_Class_323/sound.cfg
12:22:14  FileName contains illegal characters or is empty at line 99 in file /usr/share/games/bve/Train/BR_Class_323/sound.cfg
12:22:14  FileName contains illegal characters or is empty at line 100 in file /usr/share/games/bve/Train/BR_Class_323/sound.cfg
12:22:14  FileName contains illegal characters or is empty at line 100 in file /usr/share/games/bve/Train/BR_Class_323/sound.cfg
12:22:14  FileName contains illegal characters or is empty at line 100 in file /usr/share/games/bve/Train/BR_Class_323/sound.cfg
12:22:14  FileName contains illegal characters or is empty at line 101 in file /usr/share/games/bve/Train/BR_Class_323/sound.cfg
12:22:14  FileName contains illegal characters or is empty at line 101 in file /usr/share/games/bve/Train/BR_Class_323/sound.cfg
12:22:14  FileName contains illegal characters or is empty at line 101 in file /usr/share/games/bve/Train/BR_Class_323/sound.cfg
12:22:14  FileName contains illegal characters or is empty at line 102 in file /usr/share/games/bve/Train/BR_Class_323/sound.cfg
12:22:14  FileName contains illegal characters or is empty at line 102 in file /usr/share/games/bve/Train/BR_Class_323/sound.cfg
12:22:14  FileName contains illegal characters or is empty at line 102 in file /usr/share/games/bve/Train/BR_Class_323/sound.cfg
12:22:14  FileName contains illegal characters or is empty at line 117 in file /usr/share/games/bve/Train/BR_Class_323/sound.cfg
12:22:14  The #ACCELERATION section defines 0 curves, but the #HANDLE section defines 8 power notches in /usr/lib/openbve/Data/Compatibility/PreTrain/train.dat
GDB stacktrace of the Mesa GL thread when the deadlock happens
(gdb) bt
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x57fb3b781088) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x57fb3b781088) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x57fb3b781088, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3  0x00007091d2093a41 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x57fb3ba117b0, cond=0x57fb3b781060) at ./nptl/pthread_cond_wait.c:503
#4  ___pthread_cond_wait (cond=0x57fb3b781060, mutex=0x57fb3ba117b0) at ./nptl/pthread_cond_wait.c:627
--- JV NOTE: This is the Mesa GL thread wanting to lock the X11 lock ---
#5  0x00007091cdae71a5 in _XDisplayLockWait (dpy=0x57fb3ba1a8f0) at ../../src/locking.c:451
#6  0x00007091cdb0180c in return_socket (closure=0x57fb3ba1a8f0) at ../../src/xcb_io.c:56
#7  0x00007091cebb3b86 in get_socket_back (c=0x57fb3bb24510) at ../../src/xcb_out.c:96
#8  get_socket_back (c=c@entry=0x57fb3bb24510) at ../../src/xcb_out.c:87
#9  0x00007091cebbb5a6 in prepare_socket_request (c=0x57fb3bb24510) at ../../src/xcb_out.c:126
#10 send_fds (num_fds=0, fds=0x0, c=0x57fb3bb24510) at ../../src/xcb_out.c:196
#11 xcb_send_request_with_fds64 (c=0x57fb3bb24510, flags=flags@entry=1, vector=vector@entry=0x7091497f9600, req=req@entry=0x7091c5204c60 <xcb_req>, num_fds=num_fds@entry=0, fds=fds@entry=0x0) at ../../src/xcb_out.c:316
#12 0x00007091cebbbbcf in xcb_send_request64 (c=<optimized out>, flags=flags@entry=1, vector=vector@entry=0x7091497f9600, req=req@entry=0x7091c5204c60 <xcb_req>) at ../../src/xcb_out.c:349
#13 0x00007091cebbbbdd in xcb_send_request (c=<optimized out>, flags=flags@entry=1, vector=vector@entry=0x7091497f9600, req=req@entry=0x7091c5204c60 <xcb_req>) at ../../src/xcb_out.c:355
#14 0x00007091c5201986 in xcb_dri3_get_supported_modifiers (c=<optimized out>, window=<optimized out>, depth=depth@entry=24 '\030', bpp=bpp@entry=32 ' ') at /build/libxcb-9z6Jre/libxcb-1.14/build/src/dri3.c:522
#15 0x00007091a73576a8 in dri3_alloc_render_buffer (draw=draw@entry=0x57fb3b5d11b8, format=format@entry=4099, width=960, height=600, depth=24) at ../src/loader/loader_dri3_helper.c:1479
#16 0x00007091a7359033 in dri3_get_buffer (format=format@entry=4099, buffer_type=buffer_type@entry=loader_dri3_buffer_back, draw=draw@entry=0x57fb3b5d11b8, driDrawable=<optimized out>) at ../src/loader/loader_dri3_helper.c:2080
#17 0x00007091a7359515 in loader_dri3_get_buffers (driDrawable=<optimized out>, format=4099, stamp=0x57fb3be0a880, loaderPrivate=0x57fb3b5d11b8, buffer_mask=<optimized out>, buffers=0x7091497f9880) at ../src/loader/loader_dri3_helper.c:2283
#18 0x0000709185eb9475 in dri_image_drawable_get_buffers (drawable=drawable@entry=0x57fb3be0a880, images=images@entry=0x7091497f9880, statts=statts@entry=0x57fb3b5f87b0, statts_count=statts_count@entry=2) at ../src/gallium/frontends/dri/dri2.c:316
#19 0x0000709185eb95f9 in dri2_allocate_textures (ctx=0x57fb3bcea270, drawable=0x57fb3be0a880, statts=0x57fb3b5f87b0, statts_count=2) at ../src/gallium/frontends/dri/dri2.c:488
#20 0x0000709185ebc4b1 in dri_st_framebuffer_validate (st=<optimized out>, pdrawable=0x57fb3be0a880, statts=0x57fb3b5f87b0, count=<optimized out>, out=0x7091497f9a70, resolve=0x7091497f9a38) at ../src/gallium/frontends/dri/dri_drawable.c:79
#21 0x0000709185f7f654 in st_framebuffer_validate (stfb=stfb@entry=0x57fb3b5f8350, st=st@entry=0x57fb3b42bf60) at ../src/mesa/state_tracker/st_manager.c:239
#22 0x0000709185f804b8 in st_manager_validate_framebuffers (st=st@entry=0x57fb3b42bf60) at ../src/mesa/state_tracker/st_manager.c:1239
#23 0x00007091861d05f4 in st_update_framebuffer_state (st=0x57fb3b42bf60) at ../src/mesa/state_tracker/st_atom_framebuffer.c:120
#24 0x00007091861d517b in st_validate_state (pipeline_state_mask=1107296512, st=0x57fb3b42bf60) at ../src/util/bitscan.h:117
#25 st_Clear (ctx=0x57fb3cc25240, mask=2) at ../src/mesa/state_tracker/st_cb_clear.c:400
#26 0x00007091860334b8 in _mesa_unmarshal_Clear (ctx=<optimized out>, cmd=<optimized out>) at src/mapi/glapi/gen/marshal_generated1.c:216
#27 0x0000709185f2b302 in glthread_unmarshal_batch (job=job@entry=0x57fb3cc2d408, gdata=gdata@entry=0x0, thread_index=thread_index@entry=0) at ../src/mesa/main/glthread.c:139
#28 0x0000709185ec9978 in util_queue_thread_func (input=input@entry=0x57fb3b7c6180) at ../src/util/u_queue.c:309
#29 0x0000709185f1945b in impl_thrd_routine (p=<optimized out>) at ../src/c11/impl/threads_posix.c:67
#30 0x00007091d2094ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#31 0x00007091d2126850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
GDB stacktrace of the OpenBVE renderer thread when the deadlock happens
(gdb) bt
--- JV NOTE: This is the OpenBVE thread waiting for the Mesa GL thread ---
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x0000709185ec3446 in sys_futex (val3=-1, addr2=0x0, timeout=0x0, val1=2, op=9, addr1=0x57fb3cc2d408) at ../src/util/futex.c:43
#2  futex_wait (addr=addr@entry=0x57fb3cc2d408, value=value@entry=2, timeout=timeout@entry=0x0) at ../src/util/futex.c:55
#3  0x0000709185ec957f in do_futex_fence_wait (fence=0x57fb3cc2d408, timeout=timeout@entry=false, abs_timeout=abs_timeout@entry=0) at ../src/util/u_queue.c:130
#4  0x0000709185ec9bcd in _util_queue_fence_wait (fence=<optimized out>) at ../src/util/u_queue.c:145
#5  0x0000709185f2b967 in util_queue_fence_wait (fence=<optimized out>) at ../src/util/u_queue.h:163
#6  _mesa_glthread_finish (ctx=0x57fb3cc25240) at ../src/mesa/main/glthread.c:383
#7  _mesa_glthread_finish (ctx=0x57fb3cc25240) at ../src/mesa/main/glthread.c:364
#8  0x0000709185ebccff in dri_flush (cPriv=0x57fb3bcea270, dPriv=0x57fb3be0a880, flags=3, reason=__DRI2_THROTTLE_SWAPBUFFER) at ../src/gallium/frontends/dri/dri_drawable.c:494
#9  0x00007091a7358292 in loader_dri3_swap_buffers_msc (draw=0x57fb3b5d11b8, target_msc=0, divisor=0, remainder=0, flush_flags=<optimized out>, rects=rects@entry=0x0, n_rects=0, force_copy=false) at ../src/loader/loader_dri3_helper.c:1046
#10 0x00007091a7349df5 in dri3_swap_buffers (pdraw=<optimized out>, target_msc=<optimized out>, divisor=<optimized out>, remainder=<optimized out>, flush=<optimized out>) at ../src/glx/dri3_glx.c:571
#11 0x00007091a73393ff in glXSwapBuffers (dpy=0x57fb3ba1a8f0, drawable=123731970) at ../src/glx/glxcmds.c:677
--- JV NOTE: The frame below is the OpenTK.Platform.X11.Glx:SwapBuffers() call ---
#12 0x000000004128fb46 in  ()
#13 0x00007fffdf789a88 in  ()
#14 0x00007091c796e700 in  ()
#15 0x00007091c796e700 in  ()
#16 0x0000000000000006 in  ()
#17 0x00007091c66e9990 in  ()
#18 0x000057fb3a410aa0 in  ()
#19 0x00007fffdf789a20 in  ()
#20 0x00007fffdf789970 in  ()
#21 0x0000000000000000 in  ()
(gdb) p (char*)mono_pmip(0x000000004128fb46)
$1 = 0x57fb3d439d30 " (wrapper managed-to-native) OpenTK.Platform.X11.Glx:SwapBuffers (intptr,intptr) [{0x57fb3d2e0db8} + 0xa6]  (0x4128faa0 0x4128fbda) [0x57fb3a3f09c0 - OpenBve.exe]"
(gdb)

If needed, I can also post the full C#/Mono backtrace. However, I remember the following:

Related information

  • OS: Xubuntu 22.04.4
  • AMD hardware
    • Motherboard: Gigabyte A320M-H
    • CPU & GPU: AMD Ryzen 4600G
    • Drivers: Mesa 23.2 from Ubuntu repositories. Mesa from the Oibaf graphics drivers repository behaves the same.
  • Intel hardware
    • HW: Lenovo ThinkPad L13 Gen2
    • CPU & GPU: Intel Core i7-1165G7
    • Drivers: Mesa 23.2 from Ubuntu repositories.
@leezer3
Copy link
Owner

leezer3 commented Jul 22, 2024

OK, that's interesting....

I've only got a pretty elderly Intel integrated graphics (HD3000), which normally is Windows / Hackintosh only.
Might be able to get that to reproduce in this case.

Looking into the code, the default timeout for texture loading is 1000ms in the main game ( https://github.com/leezer3/OpenBVE/blob/master/source/OpenBVE/System/GameWindow.cs#L1180 ; you're linking to the Route Viewer method. I'm not sure off the top of my head why there's no timeout here, possibly an oversight)
Panel textures have a 20000ms timeout (this is because we've had cases of people using ~20mb JPG files for panels), as they need the texture sizes to calculate element positioning.

I'd suspect there's probably a strong element of machine / disk based timing in this as well.

I'll see what I can do about reproducing with that.

@leezer3
Copy link
Owner

leezer3 commented Jul 22, 2024

Good, that reproduces on the VM with mesa_glthread=true set.

Going to do a little excavating....

leezer3 added a commit that referenced this issue Jul 22, 2024
@leezer3
Copy link
Owner

leezer3 commented Jul 22, 2024

OK, I think we had a livelock problem there.
I've tuned it to use ConcurrentQueues (as opposed to the original lock + 2 queues, which I think was causing this) and simpified it slightly.

I think more people haven't reported this simply due to the fact that timing here was absolutely critical.

@ginga81 The build from today will hopefully resolve your freeze when not running under WINE.
If not, it'd be very interesting to see if setting the environment variable above helps you.

(I could probably just set this environment variable process wide, but I'd quite like to actually fix the issue rather than mask it)

@JakubVanek
Copy link
Contributor Author

JakubVanek commented Jul 22, 2024

If I understand the fix correctly, I think that it does not eliminate the deadlock, it just reduces the likelihood of it.

It might this might be fixed by just removing the using (new XLock(Display)) from OpenTK's X11GLContext.SwapBuffers() (truth to be told, I do not know if that is safe to do). It seemed to me that the bug was not in OpenBVE directly - rather it was a deadlock between OpenTK and Mesa. The infinite loadscreen may still happen if the deadlock occurs -- the LoadingScreenLoop() would not return in that case because it would be stuck in SwapBuffers().

@JakubVanek
Copy link
Contributor Author

However, I wouldn't rule out that we are seeing two different bugs. The bug that I was seeing could be distinguished by that the SwapBuffers() call would initially succeed, but once the deadlock occurs, it will not return (-> LoadingScreenLoop() stops querying the queue and will also not return). My formulation in the initial bugreport was unfortunate (it implies that any call to SwapBuffers() triggers the deadlock, but that is not the case).

@JakubVanek
Copy link
Contributor Author

JakubVanek commented Jul 22, 2024

I have tested the commit 71689f6 on the Intel machine and the game is still getting stuck when I use mesa_glthread=true. The GDB backtraces seem to be the same.

The deadlock seems to be reliable (the game never successfully loads with the mesa_glthread=true environment variable and preferNativeBackend set to true in the OpenBVE config file)

@leezer3
Copy link
Owner

leezer3 commented Jul 22, 2024

What I've changed is an issue with queues.

It's a little complex, but basically, if we add an item to a queue from one thread (length > 0), but at the same time are returning the pulse signal from another, one of the two can be silently discarded, despite the fact a lock was in use.
e.g.
https://stackoverflow.com/questions/3956127/threading-problem-with-monitor-wait-and-monitor-pulse

This then leaves the loop stuck waiting for a return.


It's a pity that this doesn't solve it for you.
The lock inside OpenTK is something I'm not too happy about messing with; There are a lot of pointers and magic numbers being thrown around, and if something changes mid render call (screensaver is the obvious answer I can think of), I suspect it'll crash out.

Let me think on it a bit.

@JakubVanek
Copy link
Contributor Author

I finally got the Mono SDB debugger running on the Intel machine. Here are the C# stacktraces when the deadlock happens:

Output of "thread bt" in Mono sdb debugger when the deadlock happens
Thread #6 ''
#0 [0x00000000] Interop.Sys.Read
#1 [0x00000014] Interop.Sys.Read (no source)
    call Int32 Sys:Read (IntPtr, Byte*, Int32)
#2 [0x00000037] System.IO.CoreFX.FileSystemWatcher.RunningInstance.TryReadEvent (no source)
    call Int32 Sys:Read (SafeFileHandle, Byte*, Int32)
#3 [0x000002D4] System.IO.CoreFX.FileSystemWatcher.RunningInstance.ProcessEvents (no source)
    call Boolean RunningInstance:TryReadEvent (NotifyEvent)
#4 [0x00000006] System.IO.CoreFX.FileSystemWatcher.RunningInstance.<>c.<Start>b__14_0 (no source)
    callvirt Void RunningInstance:ProcessEvents ()
#5 [0x0000002C] System.Threading.Tasks.Task.InnerInvoke (no source)
    callvirt Void System.Action`1:Invoke (Object)
#6 [0x00000001] System.Threading.Tasks.Task.Execute (no source)
    callvirt Void System.Threading.Tasks.Task:InnerInvoke ()
#7 [0x00000006] System.Threading.Tasks.Task.ExecutionContextCallback (no source)
    callvirt Void System.Threading.Tasks.Task:Execute ()
#8 [0x00000073] System.Threading.ExecutionContext.RunInternal (no source)
    callvirt Void System.Threading.ContextCallback:Invoke (Object)
#9 [0x00000004] System.Threading.ExecutionContext.Run (no source)
    call Void System.Threading.ExecutionContext:RunInternal (ExecutionContext, ContextCallback, Object, Boolean)
#10 [0x00000038] System.Threading.Tasks.Task.ExecuteWithThreadLocal (no source)
    call Void System.Threading.ExecutionContext:Run (ExecutionContext, ContextCallback, Object, Boolean)
#11 [0x00000050] System.Threading.Tasks.Task.ExecuteEntry (no source)
    call Void System.Threading.Tasks.Task:ExecuteWithThreadLocal (Task)
#12 [0x00000007] System.Threading.Tasks.ThreadPoolTaskScheduler.<>c.<.cctor>b__10_0 (no source)
    callvirt Boolean System.Threading.Tasks.Task:ExecuteEntry (Boolean)
#13 [0x00000036] System.Threading.ThreadHelper.ThreadStart_Context (no source)
    callvirt Void System.Threading.ParameterizedThreadStart:Invoke (Object)
#14 [0x00000073] System.Threading.ExecutionContext.RunInternal (no source)
    callvirt Void System.Threading.ContextCallback:Invoke (Object)
#15 [0x00000004] System.Threading.ExecutionContext.Run (no source)
    call Void System.Threading.ExecutionContext:RunInternal (ExecutionContext, ContextCallback, Object, Boolean)
#16 [0x0000002F] System.Threading.ExecutionContext.Run (no source)
    call Void System.Threading.ExecutionContext:Run (ExecutionContext, ContextCallback, Object, Boolean)
#17 [0x0000001B] System.Threading.ThreadHelper.ThreadStart (no source)
    call Void System.Threading.ExecutionContext:Run (ExecutionContext, ContextCallback, Object)

Thread #13 ''
#0 [0x00000030] System.Threading.Thread.SleepInternal
#1 [0x0000001A] System.Threading.Thread.Sleep (no source)
    call Void System.Threading.Thread:SleepInternal (Int32)
#2 [0x0000002A] SoundManager.SoundsBase.SoundThread at /home/kuba/tmp/bve-src/source/SoundManager/Sounds.cs:226
					Thread.Sleep(100);	
#3 [0x0000001F] System.Threading.ThreadHelper.ThreadStart_Context (no source)
    callvirt Void System.Threading.ThreadStart:Invoke ()
#4 [0x00000073] System.Threading.ExecutionContext.RunInternal (no source)
    callvirt Void System.Threading.ContextCallback:Invoke (Object)
#5 [0x00000004] System.Threading.ExecutionContext.Run (no source)
    call Void System.Threading.ExecutionContext:RunInternal (ExecutionContext, ContextCallback, Object, Boolean)
#6 [0x0000002F] System.Threading.ExecutionContext.Run (no source)
    call Void System.Threading.ExecutionContext:Run (ExecutionContext, ContextCallback, Object, Boolean)
#7 [0x00000014] System.Threading.ThreadHelper.ThreadStart (no source)
    call Void System.Threading.ExecutionContext:Run (ExecutionContext, ContextCallback, Object)

Thread #2 ''
#0 [0x00000030] System.Threading.Thread.SleepInternal
#1 [0x0000001A] System.Threading.Thread.Sleep (no source)
    call Void System.Threading.Thread:SleepInternal (Int32)
#2 [0x00000052] LibRender2.BaseRenderer.vt at /home/kuba/tmp/bve-src/source/LibRender2/BaseRenderer.cs:715
					Thread.Sleep(100);
#3 [0x0000001F] System.Threading.ThreadHelper.ThreadStart_Context (no source)
    callvirt Void System.Threading.ThreadStart:Invoke ()
#4 [0x00000073] System.Threading.ExecutionContext.RunInternal (no source)
    callvirt Void System.Threading.ContextCallback:Invoke (Object)
#5 [0x00000004] System.Threading.ExecutionContext.Run (no source)
    call Void System.Threading.ExecutionContext:RunInternal (ExecutionContext, ContextCallback, Object, Boolean)
#6 [0x0000002F] System.Threading.ExecutionContext.Run (no source)
    call Void System.Threading.ExecutionContext:Run (ExecutionContext, ContextCallback, Object, Boolean)
#7 [0x00000014] System.Threading.ThreadHelper.ThreadStart (no source)
    call Void System.Threading.ExecutionContext:Run (ExecutionContext, ContextCallback, Object)

Thread #1 ''
#0 [0x0000000B] OpenTK.Platform.X11.Glx.SwapBuffers
#1 [0x00000072] OpenTK.Platform.X11.X11GLContext.SwapBuffers (no source)
    call Void OpenTK.Platform.X11.Glx:SwapBuffers (IntPtr, IntPtr)
#2 [0x00000006] OpenTK.Graphics.GraphicsContext.SwapBuffers (no source)
    callvirt Void OpenTK.Graphics.IGraphicsContext:SwapBuffers ()
#3 [0x0000000C] OpenTK.GameWindow.SwapBuffers (no source)
    callvirt Void OpenTK.Graphics.IGraphicsContext:SwapBuffers ()
#4 [0x0000020C] OpenBve.OpenBVEGame.LoadingScreenLoop at /home/kuba/tmp/bve-src/source/OpenBVE/System/GameWindow.cs:1125
				Program.currentGameWindow.SwapBuffers();
#5 [0x000001EB] OpenBve.OpenBVEGame.OnLoad at /home/kuba/tmp/bve-src/source/OpenBVE/System/GameWindow.cs:416
				LoadingScreenLoop();
#6 [0x00000002] OpenTK.GameWindow.OnLoadInternal (no source)
    callvirt Void OpenTK.GameWindow:OnLoad (EventArgs)
#7 [0x00000095] OpenTK.GameWindow.Run (no source)
    call Void OpenTK.GameWindow:OnLoadInternal (EventArgs)
#8 [0x00000013] OpenTK.GameWindow.Run (no source)
    call Void OpenTK.GameWindow:Run (Double, Double)
#9 [0x000002EC] OpenBve.MainLoop.StartLoopEx at /home/kuba/tmp/bve-src/source/OpenBVE/System/MainLoop.cs:115
			Program.currentGameWindow.Run();
#10 [0x000007C1] OpenBve.Program.Main at /home/kuba/tmp/bve-src/source/OpenBVE/System/Program.cs:294
						MainLoop.StartLoopEx(result);

Thread #4 ''
#0 [0x00000030] System.Threading.Thread.SleepInternal
#1 [0x0000001A] System.Threading.Thread.Sleep (no source)
    call Void System.Threading.Thread:SleepInternal (Int32)
#2 [0x00000015] OpenTK.Platform.Linux.LinuxJoystick.ProcessEvents (no source)
    call Void System.Threading.Thread:Sleep (Int32)
#3 [0x0000001F] System.Threading.ThreadHelper.ThreadStart_Context (no source)
    callvirt Void System.Threading.ThreadStart:Invoke ()
#4 [0x00000073] System.Threading.ExecutionContext.RunInternal (no source)
    callvirt Void System.Threading.ContextCallback:Invoke (Object)
#5 [0x00000004] System.Threading.ExecutionContext.Run (no source)
    call Void System.Threading.ExecutionContext:RunInternal (ExecutionContext, ContextCallback, Object, Boolean)
#6 [0x0000002F] System.Threading.ExecutionContext.Run (no source)
    call Void System.Threading.ExecutionContext:Run (ExecutionContext, ContextCallback, Object, Boolean)
#7 [0x00000014] System.Threading.ThreadHelper.ThreadStart (no source)
    call Void System.Threading.ExecutionContext:Run (ExecutionContext, ContextCallback, Object)

Thread #5 ''
#0 [0x00000000] Interop.Sys.Read
#1 [0x00000014] Interop.Sys.Read (no source)
    call Int32 Sys:Read (IntPtr, Byte*, Int32)
#2 [0x00000037] System.IO.CoreFX.FileSystemWatcher.RunningInstance.TryReadEvent (no source)
    call Int32 Sys:Read (SafeFileHandle, Byte*, Int32)
#3 [0x000002D4] System.IO.CoreFX.FileSystemWatcher.RunningInstance.ProcessEvents (no source)
    call Boolean RunningInstance:TryReadEvent (NotifyEvent)
#4 [0x00000006] System.IO.CoreFX.FileSystemWatcher.RunningInstance.<>c.<Start>b__14_0 (no source)
    callvirt Void RunningInstance:ProcessEvents ()
#5 [0x0000002C] System.Threading.Tasks.Task.InnerInvoke (no source)
    callvirt Void System.Action`1:Invoke (Object)
#6 [0x00000001] System.Threading.Tasks.Task.Execute (no source)
    callvirt Void System.Threading.Tasks.Task:InnerInvoke ()
#7 [0x00000006] System.Threading.Tasks.Task.ExecutionContextCallback (no source)
    callvirt Void System.Threading.Tasks.Task:Execute ()
#8 [0x00000073] System.Threading.ExecutionContext.RunInternal (no source)
    callvirt Void System.Threading.ContextCallback:Invoke (Object)
#9 [0x00000004] System.Threading.ExecutionContext.Run (no source)
    call Void System.Threading.ExecutionContext:RunInternal (ExecutionContext, ContextCallback, Object, Boolean)
#10 [0x00000038] System.Threading.Tasks.Task.ExecuteWithThreadLocal (no source)
    call Void System.Threading.ExecutionContext:Run (ExecutionContext, ContextCallback, Object, Boolean)
#11 [0x00000050] System.Threading.Tasks.Task.ExecuteEntry (no source)
    call Void System.Threading.Tasks.Task:ExecuteWithThreadLocal (Task)
#12 [0x00000007] System.Threading.Tasks.ThreadPoolTaskScheduler.<>c.<.cctor>b__10_0 (no source)
    callvirt Boolean System.Threading.Tasks.Task:ExecuteEntry (Boolean)
#13 [0x00000036] System.Threading.ThreadHelper.ThreadStart_Context (no source)
    callvirt Void System.Threading.ParameterizedThreadStart:Invoke (Object)
#14 [0x00000073] System.Threading.ExecutionContext.RunInternal (no source)
    callvirt Void System.Threading.ContextCallback:Invoke (Object)
#15 [0x00000004] System.Threading.ExecutionContext.Run (no source)
    call Void System.Threading.ExecutionContext:RunInternal (ExecutionContext, ContextCallback, Object, Boolean)
#16 [0x0000002F] System.Threading.ExecutionContext.Run (no source)
    call Void System.Threading.ExecutionContext:Run (ExecutionContext, ContextCallback, Object, Boolean)
#17 [0x0000001B] System.Threading.ThreadHelper.ThreadStart (no source)
    call Void System.Threading.ExecutionContext:Run (ExecutionContext, ContextCallback, Object)

Thread #7 '<Thread Pool>'
Backtrace for this thread is unavailable

Thread #9 '<Thread Pool>'
Backtrace for this thread is unavailable

Thread #3 ''
#0 [0x00000000] OpenTK.Platform.X11.XI2MouseKeyboard.IsEventValid (no source)
    ldc.i4.0 
#1 [0x00000016] OpenTK.Platform.X11.Functions.XIfEvent
#2 [0x0000003C] OpenTK.Platform.X11.XI2MouseKeyboard.ProcessEvents (no source)
    call Boolean OpenTK.Platform.X11.Functions:XIfEvent (IntPtr, XEvent, IntPtr, IntPtr)
#3 [0x0000001F] System.Threading.ThreadHelper.ThreadStart_Context (no source)
    callvirt Void System.Threading.ThreadStart:Invoke ()
#4 [0x00000073] System.Threading.ExecutionContext.RunInternal (no source)
    callvirt Void System.Threading.ContextCallback:Invoke (Object)
#5 [0x00000004] System.Threading.ExecutionContext.Run (no source)
    call Void System.Threading.ExecutionContext:RunInternal (ExecutionContext, ContextCallback, Object, Boolean)
#6 [0x0000002F] System.Threading.ExecutionContext.Run (no source)
    call Void System.Threading.ExecutionContext:Run (ExecutionContext, ContextCallback, Object, Boolean)
#7 [0x00000014] System.Threading.ThreadHelper.ThreadStart (no source)
    call Void System.Threading.ExecutionContext:Run (ExecutionContext, ContextCallback, Object)

@JakubVanek
Copy link
Contributor Author

It's a little complex, but basically, if we add an item to a queue from one thread (length > 0), but at the same time are returning the pulse signal from another, one of the two can be silently discarded, despite the fact a lock was in use.

I've missed this, I agree that this could also be causing problems on the loading screen.

@ginga81
Copy link
Contributor

ginga81 commented Jul 22, 2024

I immediately tried the daily build, but unfortunately it was at 0%.
What confuses me is that on my current Ubuntu 22.04, 1.7.2.0, 1.8.0.2, and 1.9.0.3 are all stuck at 0%.
From this, I suspect that it's not an OpenBVE problem, but rather that it's due to an update to mono or OpenGL or something, but what do you think?
In any case, I'm sure that it has stopped working, so the fact remains that we need to find the cause and fix it...

@leezer3
Copy link
Owner

leezer3 commented Jul 22, 2024

Have you tried setting the environment variable mentioned in the first message?

If not, please try the build which will generate in a few minutes, which will automatically set this variable.
That's a sub-par solution, but possibly the best we can get if it works.


For what it's worth, after looking into the OpenTK code, I can't see this being anything other than a MESA bug.
Literally all that's inside the freezing call is an external call to glXSwapBuffers
That really shouldn't be blocking on MESA, assuming XInitThreads has been called correctly.
I'll do some more digging, but I suspect it'll need someone more competent than me to actually figure out what MESA is doing wrong.

@ginga81
Copy link
Contributor

ginga81 commented Jul 22, 2024

Sorry for not setting the environment variables.
I just tried it with the build, and when I started it from the command line,
'ATTENTION: default value of option mesa_glthread overridden by environment.'
was displayed on the terminal, and it loaded normally!
The long-running problem has finally been solved.
I can't believe it was caused by a MESA bug...
Anyway, thank you so much.
I think that's resolved for now.
I'll keep an eye on it for a while.

@JakubVanek
Copy link
Contributor Author

JakubVanek commented Jul 22, 2024

I'm now swayed towards this being a Mesa bug as well. I can reproduce this on the generic Zink OpenGL-on-Vulkan driver (MESA_LOADER_DRIVER_OVERRIDE=zink environment variable) and the backtraces in https://gitlab.freedesktop.org/mesa/mesa/-/issues/8994 look similar to the OpenBVE backtraces.

@leezer3
Copy link
Owner

leezer3 commented Jul 22, 2024

Only thing I do wonder is if OpenTK is mis-handling a failing call somewhere, as nested locking within a thread is absolutely possible with X. If it hasn't called to release the lock exactly the same number of times as it took it, then this might actually be expected. (if IMHO atrocious design on the part of X)
https://linux.die.net/man/3/xinitthreads

I'll try and see about looking into that, but OpenTK build isn't liking the main Windows machine at the minute....

@JakubVanek
Copy link
Contributor Author

I have now tested the latest nightly OpenBVE build on the AMD machine (where the mesa_glthread setting was implicitly enabled before) and can confirm that the issue does not appear there anymore.

@JakubVanek
Copy link
Contributor Author

I have reproduced the Mesa deadlock using a tiny C app and I have created a Mesa bugreport for that - https://gitlab.freedesktop.org/mesa/mesa/-/issues/11558.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants