OS: Arch Linux x86-64
RetroArch 1.3.4

RetroArch fails to switch back to windowed mode from fullscreen when pressing F twice; the video freezes instead.
Moved from here: https://bugs.freedesktop.org/show_bug.cgi?id=93844
These two commits seem to be the cause:
[diego@myhost ~]$ cd linux
[diego@myhost linux]$ git bisect bad
7ac7d19f808697abe6658c64c96868f728273f9c is the first bad commit
commit 7ac7d19f808697abe6658c64c96868f728273f9c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Apr 17 20:42:46 2016 +0100

    drm/i915: Avoid stalling on pending flips for legacy cursor updates

    The legacy cursor ioctl expects to be asynchronous with respect to other
    screen updates, in particular page flips. As X updates the cursor from a
    signal context, if the cursor blocks then it will stall both the input
    and output chains causing bad stuttering and horrible UX.

    Reported-and-tested-by: Rafael Ristovski <rafael.ristovski@gmail.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94980
    Fixes: 5008e874edd34 ("drm/i915: Make wait_for_flips interruptible.")
    Suggested-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
    Cc: Jani Nikula <jani.nikula@intel.com>
    Cc: stable@vger.kernel.org
    Link: http://patchwork.freedesktop.org/patch/msgid/1460922166-20292-1-git-send-email-chris@chris-wilson.co.uk
    Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    (cherry picked from commit acf4e84d6167317ff21be5c03e1ea76ea5783701)
    Signed-off-by: Jani Nikula <jani.nikula@intel.com>

:040000 040000 ffd5371b8faffb065a2cd8c5624127ce2a03284c a19fe78a340ba89e0c206e96b65d5c426eb0e150 M	drivers
[diego@myhost linux]$
[diego@myhost ~]$ cd linux
[diego@myhost linux]$ git bisect good
f4502c25ebd04691f284fdafff4a5613299c36dc is the first bad commit
commit f4502c25ebd04691f284fdafff4a5613299c36dc
Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Date:   Thu Aug 27 15:44:04 2015 +0200

    drm/i915: Always try to inherit the initial fb.

    The initial state is read out correctly and the state is atomic,
    so it's safe to preserve the fb without any hacks if it's suitable.

    Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

:040000 040000 4aa2f488c319e68a1a17baac4bd4269dbc34daf9 67bdae945d662fd1846371b469b402f070203b45 M	drivers
[diego@myhost linux]$
Created attachment 124844 [details] git bisect v45 v46
Created attachment 124846 [details] git bisect v43 v44
*** Bug 93844 has been marked as a duplicate of this bug. ***
As seen in Bug 93844, I tested Linux v4.1, v4.2, v4.3, those are all OK. The problem started with v4.4, or commit f4502c25ebd04691f284fdafff4a5613299c36dc to be precise.
v4.5 is also fine, v4.4 and v4.6 are broken.
d551599181769571f4f68dd93e5d8b15868889af is also OK, doesn't hang.
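Since the results above flip between good and bad across nearby versions, the hang may be timing-dependent, in which case a single clean run does not prove a commit good. A minimal sketch of a helper that repeats a check several times before a commit is marked good; `repeat_test` and `./check-hang.sh` are hypothetical names, not part of git or of this report, and for a kernel bisect each step still means building and booting that kernel first.

```shell
# Run a command up to N times and fail as soon as one run fails.
# Usage: repeat_test N command [args...]
repeat_test() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" || return 1
        i=$((i + 1))
    done
    return 0
}

# Hypothetical usage between the tags named above. ./check-hang.sh is a
# placeholder for whatever check exits non-zero when the hang occurs:
#   git bisect start
#   git bisect bad v4.4
#   git bisect good v4.3
#   repeat_test 5 ./check-hang.sh && git bisect good || git bisect bad
```

Marking a commit good only after several clean runs reduces the chance of the bisect converging on an unrelated commit.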
Created attachment 124863 [details] cpuinfo
Created attachment 124864 [details] lspci -nn
I actually noticed that the whole X hangs, because my WM doesn't respond anymore (i.e. can't switch to other virtual desktops) after RetroArch hangs. The only thing I can do is switch back to a TTY (i.e. Ctrl+Alt+F2) and back to X in order for the image to update/refresh itself. After that my desktop is usable again.
The chvt will cause a modeset, which will flush the vblank queue (more evidence that this is a "lost" event in the kernel, as opposed to userspace missing the event entirely). This will be very noisy, but I wonder if a drm.debug=0xff dmesg will have a clue when you hit the hang and then chvt (you will probably also need to increase the dmesg buffer to, say, log_buf_len=10M).
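A rough sketch of how the capture suggested above could be checked before reproducing the hang. The two parameter values come from the comment; the helper name `has_debug_params` and the output file name are made up for illustration, and how the parameters get onto the kernel command line depends on the boot loader.

```shell
# Kernel parameters suggested in the comment above.
DEBUG_PARAMS="drm.debug=0xff log_buf_len=10M"

# Succeeds only if every suggested parameter appears in the given command line.
has_debug_params() {
    cmdline=$1
    for p in $DEBUG_PARAMS; do
        case " $cmdline " in
            *" $p "*) ;;
            *) return 1 ;;
        esac
    done
    return 0
}

# After rebooting with the parameters added, confirm they are active.
if has_debug_params "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "debug parameters active"
else
    echo "add to the kernel command line: $DEBUG_PARAMS"
fi

# Then reproduce the hang, switch to a VT and back, and save the log:
#   dmesg > dmesg-after-hang.txt
```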
Created attachment 124879 [details] dmesg
(In reply to Diego Viola from comment #15)
Attached it. This was captured after I opened RetroArch, loaded a core/game and hit F multiple times; RetroArch/X just kept hanging. Then I switched to a VT and back to X and captured the dmesg.
Still hanging with Linux 4.7.0-rc7-ARCH.
This is no longer an issue with the latest xf86-video-intel, especially after they added --with-default-dri=3 by default. https://git.archlinux.org/svntogit/packages.git/commit/trunk/PKGBUILD?h=packages/xf86-video-intel&id=cd3de9bb45a9ab84383541ed45ee6f0c10ea8798 Closing.
This is still a problem after installing xf86-video-intel and setting Option "DRI" "2" in /etc/X11/xorg.conf.d/20-intel.conf (Arch Linux now defaults to DRI3). Pressing F in RetroArch then freezes X.
What's the difference between this and bug 96769?
(In reply to Jani Nikula from comment #20)
> What's the difference between this and bug 96769?

It's the same problem: RetroArch causes my graphics to freeze when pressing F many times. That said, I only see this problem when I use DRI2 or SNA. I'm currently using modesetting and I don't see the problem anymore.

I initially submitted this as Bug 93844, but I ended up bisecting the problem on my two computers and things got too messy, so I opened separate bug reports to avoid mixing git-bisect results from different hardware in one report.
I just bisected xf86-video-intel, and the bisect points to this commit as the cause of the bug:

commit c4565979572b61cf7fc0b931333c032c88b259f1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Dec 2 10:06:46 2015 +0000

    sna/dri2: Emit the outstanding signal when eliding a swap

    When we do the exchange for the next swap, we should emit any pending
    completion signal for the previous buffer.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

:040000 040000 9135307f594751c0bd86feb2d780af649a93bbfe 0bb2becd9a5622656b7fb70fff021e373f7987df M	src
Just checked, and HEAD^ (AKA the parent commit) is OK.
Actually, commit c4565979572b61cf7fc0b931333c032c88b259f1 crashes my X server.
I'm convinced this commit is where the problem started happening.
(In reply to Chris Wilson from comment #14)
Chris, you mentioned this was a possible kernel bug, but the bisect points to that commit in xf86-video-intel as the cause. Do you have any suggestions, please?
(In reply to Diego Viola from comment #26)
> Chris, you mentioned this was a possible kernel bug, but I got that commit
> that is causing this in xf86-video-intel.
>
> Do you have any suggestions please?

I've run over 96 hours on xf86-video-intel/test/dri2-race, which captures all of the issues you have hit so far and more. Both xorg-1.18.0 (as demonstrated earlier and fixed) and xorg-1.19-rc are buggy. That change just moves a spurious early signal to the later pending flip, i.e. expanding the race window. It is not the culprit you think it is.
(In reply to Chris Wilson from comment #27)
Weird. I built xf86-video-intel from git and I was able to reproduce it; the issue is still there. I made a video here to demonstrate the issue: https://dl.dropboxusercontent.com/u/6005119/VID_20161005_124021.mp4
Since Arch now builds xf86-video-intel with --with-default-dri=3 I have to start retroarch with LIBGL_DRI3_DISABLE=1 to actually reproduce this.
(In reply to Chris Wilson from comment #27)
I think you're right. I went back to Aug. 2015 and I still get these hangs; going back further I get compile-time errors and I can't test anymore.

It's interesting though that some of those commits are OK and don't hang. I'm out of ideas. :(
(In reply to Diego Viola from comment #30)
Yes, I realize it's a bug in the kernel, and it sucks that nobody is doing anything about it.

What I'm saying is that maybe you'll be able to alleviate it by using modesetting/glamor, as it probably uses different functions than xf86-video-intel.

Although that is more of a workaround right now, it has worked for me.
(In reply to Diego Viola from comment #31)
Wrong thread.
I tried LIBGL_DRI3_DISABLE=1 retroarch in both modesetting/glamor and xf86-video-intel + SNA and it fails only with the -intel DDX. I'm confused how this can be a kernel problem in this case, sigh.
Is this patch no longer relevant for this problem? https://bugs.freedesktop.org/attachment.cgi?id=121524
If it is of any help, I created this trace with apitrace for reproducing the problem: https://dl.dropboxusercontent.com/u/6005119/retroarch/retroarch.trace

$ apitrace replay retroarch.trace
Created attachment 127251 [details] retroarch apitrace
(In reply to Diego Viola from comment #34)
> Is this patch no longer relevant for this problem?
>
> https://bugs.freedesktop.org/attachment.cgi?id=121524

We fixed that bug in Xorg.

commit e43abdce964f5ed9689cf908af8c305b39a5dd36
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Feb 3 09:54:46 2016 +0000

    dri2: Unblock Clients on Drawable release

    If the Window is destroyed by another client, such as the window
    manager, the original client may be blocked by DRI2 awaiting a vblank
    event. When this happens, DRI2DrawableGone forgets to unblock that
    client and so the wait never completes.

    Note Present/xshmfence is also suspectible to this race.

    Testcase: dri2-race/manager
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
(In reply to Chris Wilson from comment #37)
Yes, I remember now. I still think xf86-video-intel+sna+dri2 shouldn't give different results than modesetting+glamor+dri2 for this use case, although I still get this bug even with Linux 4.8.1 and I've been bisecting from Linux 4.0 and up.

That said, I realize you are busy and I'd hate to distract you with this. This seems like the last bug that I know of when it comes to RetroArch on Intel and I'd love for this to be fixed.

Thanks for all of your help so far.
(In reply to Diego Viola from comment #38)
> Yes, I remember now. I still think xf86-video-intel+sna+dri2 shouldn't give
> different results than modesetting+glamor+dri2 for this use case, although I
> still get this bug even with Linux 4.8.1 and I've been bisecting from Linux
> 4.0 and up.

If modesetting implements similar levels of DRI2 functionality (that seems unlikely), then it will also hit the same bugs that have been uncovered so far.
(In reply to Chris Wilson from comment #39)
I see, interesting.

I've just tried reproducing this bug on Debian 8.6 and couldn't reproduce it there. I made sure to remove the modesetting driver and use the -intel one, restart X, and make sure compositing was disabled on Xfce.

I noticed it uses a much older kernel version: 3.16.

Would you be interested in seeing a bisect between Linux 3.16 and 4.0, perhaps?

Thanks.
(In reply to Diego Viola from comment #40)
BTW, I tested this on the dual-core machine I originally reported this on; I couldn't boot Debian on my T450 for some reason. I plan on doing the bisect on both machines.
Just tried building Linux 3.16 today, and found it hangs on this machine, only stable releases like 3.16.37 will boot.
How to reproduce this bug:

1. git clone git@github.com:libretro/RetroArch.git
2. cd RetroArch
3. ./configure && make
4. LIBGL_DRI3_DISABLE=1 ./retroarch # it is important to run RA in DRI2 mode for this bug to occur.
5. Online Updater -> Core Updater -> grab genesis_plus_gx_libretro.so.zip # hit x to select and z to go back.
6. Go to the main menu with z (press z twice), hit "Load Core", and load the recently downloaded core (genesis_plus_gx_libretro.so).
7. Go to Quick Menu and hit Resume (you'll see a black screen). At this point, if you hit F twice it should hang.

You can also open a game with a core such as genesis_plus_gx and it should hang in the game as well.
Created attachment 127523 [details] xf86-video-intel git-325570e with --enable-debug=full
(In reply to Diego Viola from comment #44)
Was this log useful in any way? Anything else I can provide to get this fixed?

Thanks.
Adding this to my xorg.conf also makes the problem go away with DRI2:

Option "PageFlip" "off"
Turning off TripleBuffer helps as well.
It works with VSync=off as well.
BLT acceleration has this issue as well.
SwapbuffersWait=off makes it work too.
TearFree=on helps also.
LinearFramebuffer=on makes it work as well.
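For anyone wanting to retest these one at a time, here is a small sketch that prints one candidate Device section per option reported above. The option names and values are taken from the comments; the Identifier string, the helper name `device_section`, and the idea of keeping "DRI" "2" in every candidate are assumptions made for illustration. Nothing is written to /etc; the output is meant to be copied by hand into a file such as /etc/X11/xorg.conf.d/20-intel.conf.

```shell
# Options reported above as avoiding the hang, one "Name Value" pair per line.
WORKAROUNDS='PageFlip off
TripleBuffer off
VSync off
SwapbuffersWait off
TearFree on
LinearFramebuffer on'

# Print a Device section that forces DRI2 and sets exactly one extra option.
# The Identifier is an arbitrary placeholder.
device_section() {
    printf 'Section "Device"\n'
    printf '    Identifier "Intel Graphics"\n'
    printf '    Driver     "intel"\n'
    printf '    Option     "DRI" "2"\n'
    printf '    Option     "%s" "%s"\n' "$1" "$2"
    printf 'EndSection\n'
}

# Show one candidate configuration per reported option.
printf '%s\n' "$WORKAROUNDS" | while read -r name value; do
    printf '# --- candidate: %s=%s ---\n' "$name" "$value"
    device_section "$name" "$value"
done
```

Testing a single option per X restart keeps it clear which setting is actually responsible for avoiding the hang.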