Bug 67092

Summary: [SNB regression vsync] WAIT_FOR_EVENT hangs
Product: DRI
Reporter: Martin Jørgensen <mkj>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: bugs, jens, mthode
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:

Attachments:

Description	Flags
grep'ed dmesg	none
error state dump just after second GPU hang	none
Xorg.0.log file just after second GPU hang	none
error state after hang. rc6 is off.	none
output from: lspci -vvv -s 0:0:2	none
Read-after-write patch	none
Even more paranoid read-after-write	none
One more variant	none
failed output of patch (One more variant, 82751)	none
error state dump - first hang	none

Description Martin Jørgensen 2013-07-19 15:06:50 UTC

Created attachment 82694 [details]
grep'ed dmesg

After I upgraded from the 3.7 kernel to 3.8+ kernels, my GPU has started hanging. Sometimes it recovers, sometimes it doesn't. Sometimes it recovers after a single "kick", sometimes after 4-5 kicks.

I upgraded the Xorg intel driver from 2.20.13 to 2.21.12 because I wasn't even able to resize accelerated windows without hangs. But it still hangs sometimes when running something accelerated (VLC, some game).

I also get 2 new errors in my dmesg after I upgraded from kernel 3.7.
From kernel 3.8 I get:

 [drm] GMBUS [i915 gmbus dpb] timed out, falling back to bit banging on pin 5

From kernel 3.9 and 3.10 I get:

 [drm] Wrong MCH_SSKPD value: 0x16040307
 [drm] This can cause pipe underruns and display issues.
 [drm] Please upgrade your BIOS to fix this.

I'm running up-to-date Gentoo + a few keyworded packages on a Thinkpad T420.

Comment 1 Chris Wilson 2013-07-19 15:10:12 UTC

Please attach /sys/kernel/debug/dri/0/i915_error_state

Comment 2 Chris Wilson 2013-07-19 15:10:40 UTC

And also Xorg.0.log
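
For anyone following along, the two files requested above can be collected roughly as follows. This is only a minimal sketch: it assumes debugfs is mounted at /sys/kernel/debug and that the X log lives at /var/log/Xorg.0.log (a common default, but distribution-dependent). The helper name `save_copy` is hypothetical and not part of any i915 tooling.

```shell
#!/bin/sh
# save_copy SRC DEST: copy SRC to DEST if SRC is readable, otherwise complain.
# (Hypothetical helper for illustration only.)
save_copy() {
    if [ -r "$1" ]; then
        cat "$1" > "$2"
    else
        echo "cannot read $1 (run as root? is debugfs mounted?)" >&2
        return 1
    fi
}

# Typical use, as root, right after a hang:
#   save_copy /sys/kernel/debug/dri/0/i915_error_state i915_error_state.txt
#   save_copy /var/log/Xorg.0.log Xorg.0.log.txt
```

Copying the error state right after the hang matters because later hangs may replace it; the resulting text files can then be attached to the bug.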

Comment 3 Martin Jørgensen 2013-07-20 04:54:03 UTC

Created attachment 82713 [details]
error state dump just after second GPU hang

Comment 4 Martin Jørgensen 2013-07-20 04:54:51 UTC

Created attachment 82714 [details]
Xorg.0.log file just after second GPU hang

Comment 5 Chris Wilson 2013-07-20 08:47:02 UTC

Can you please try running with i915.i915_enable_rc6=0 on the kernel commandline?
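
After rebooting, it is worth confirming that the parameter actually reached the kernel. A minimal sketch, assuming a Linux system with /proc/cmdline; the function name `cmdline_has` is made up for this example.

```shell
#!/bin/sh
# cmdline_has PARAM [CMDLINE]: succeed if PARAM appears as a whole word in
# CMDLINE. CMDLINE defaults to the running kernel's /proc/cmdline.
cmdline_has() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage after rebooting:
#   cmdline_has i915.i915_enable_rc6=0 && echo "rc6 disabled on the command line"
```

The kernel also reports the resulting state in dmesg on a line starting with "[drm] Enabling RC6 states:".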

Comment 6 Martin Jørgensen 2013-07-20 16:28:09 UTC

It didn't seem to help. A little dmesg grep:

[    1.709971] [drm] Enabling RC6 states: RC6 off, RC6p off, RC6pp off
[  145.524766] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[  145.524770] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
[  145.533684] [drm:kick_ring] *ERROR* Kicking stuck wait on render ring

I have the error state for this hang if you need it.
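
A filter along these lines would produce an excerpt like the one above. It is a sketch only: the patterns are taken from the messages quoted in this report, and other kernel versions may word them differently. The function name `hang_lines` is hypothetical.

```shell
#!/bin/sh
# hang_lines: reduce kernel log text on stdin to the RC6 status line and the
# GPU hang messages quoted in this report.
hang_lines() {
    grep -E 'Enabling RC6 states|Hangcheck timer elapsed|capturing error event|Kicking stuck wait'
}

# Usage:
#   dmesg | hang_lines
```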

Comment 7 Chris Wilson 2013-07-20 16:57:03 UTC

Yes, can you attach the error-state as well? I expect it to be the same, but there is no harm in double checking.

Can you also please attach lspci -vvv -s 0:0:2?

Comment 8 Martin Jørgensen 2013-07-20 17:48:44 UTC

Created attachment 82737 [details]
error state after hang. rc6 is off.

Comment 9 Martin Jørgensen 2013-07-20 17:49:50 UTC

Created attachment 82738 [details]
output from: lspci -vvv -s 0:0:2

Comment 10 Chris Wilson 2013-07-20 18:02:15 UTC

What wm do you use? Is the hang only associated with vlc/games, or general desktop usage? (Trying to work out if every attempt to vsync fails or if it is sporadic.) Otherwise the cmd looks valid and I don't see anything special about your machine - though I have to admit to not having used vsync on pipe B myself, but given the bugs that were fixed involving pipe B I think others are using it successfully...

Comment 11 Martin Jørgensen 2013-07-20 19:38:01 UTC

I'm running Enlightenment E17, without composition.
The hangs only occur with vlc/mplayer/games. All my daily applications run without "special effects" and work fine.

I will try testing with E17 composition (fancy AIGLX stuff) on pipe 0 and on pipe 1 with my external monitor.

I have some DisplayPort->HDMI adapters between my monitor and my laptop. Maybe some of them disturb the timing/ddc/edid-whatever?

Comment 12 Chris Wilson 2013-07-20 19:49:50 UTC

(In reply to comment #11)
> I have some DisplayPort->HDMI adapters between my monitor and my laptop.
> Maybe some of them disturb the timing/ddc/edid-whatever?

The messages involved here are all internal to the GPU (actually between the display engine and the GPU...) so should not be affected by external configuration.

Comment 13 Martin Jørgensen 2013-07-20 20:29:32 UTC

I've done the testing. It seems not possible to make the GPU hang when using pipe 0 (LVDS), with or without composition, no matter the application.

Using pipe 1 (HDMI2), I have to turn composition on; otherwise the GPU hangs consistently, no matter the application.

Comment 14 Martin Jørgensen 2013-07-20 20:39:07 UTC

(In reply to comment #13)
> I've done the testing. It seems not possible to make the GPU hang when using
> pipe 0 (LVDS), with or without composition, no matter the application.
> 
> Using pipe 1 (HDMI2), I have to turn composition on; otherwise the GPU hangs
> consistently, no matter the application.

This just in! I managed to get a hang running E17 + composition with VLC fullscreen running some movie.

Comment 15 Martin Jørgensen 2013-07-20 20:39:37 UTC

(In reply to comment #14)
> (In reply to comment #13)
> > I've done the testing. It seems not possible to make the GPU hang when using
> > pipe 0 (LVDS), with or without composition, no matter the application.
> > 
> > Using pipe 1 (HDMI2), I have to turn composition on; otherwise the GPU hangs
> > consistently, no matter the application.
> 
> This just in! I managed to get a hang running E17 + composition with VLC
> fullscreen running some movie.

On pipe 1 :)

Comment 16 Chris Wilson 2013-07-20 21:36:42 UTC

Created attachment 82749 [details]
Read-after-write patch

Comment 17 Chris Wilson 2013-07-20 21:39:44 UTC

Created attachment 82750 [details] [review]
Even more paranoid read-after-write

Comment 18 Chris Wilson 2013-07-20 21:46:20 UTC

Created attachment 82751 [details] [review]
One more variant

Comment 19 Martin Jørgensen 2013-07-21 04:48:00 UTC

Neither of the patches fixed the problem.

glxgears generally hangs if I resize the window too fast. If I resize the window slowly enough, no hangs occur. No issues when using composition.

I have error_state and Xorg.0.log for attachment 82749 [details] and 82750 if needed.

I also get these new messages in dmesg after I apply either of the 2 patches:


[   51.077013] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 255
[   51.101981] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 255
[   51.102002] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 255
[   51.102009] stereo mode not supported
[   51.102013] stereo mode not supported

Attachment 82751 [details] (One more variant) fails to patch.

Comment 20 Martin Jørgensen 2013-07-21 04:49:26 UTC

Created attachment 82762 [details]
failed output of patch (One more variant, 82751)

Comment 21 Martin Jørgensen 2013-07-21 04:53:58 UTC

(In reply to comment #19)
> Neither of the patches fixed the problem.
> 
> glxgears generally hangs if I resize the window too fast. If I resize the
> window slowly enough, no hangs occur. No issues when using composition.
> 
> I have error_state and Xorg.0.log for attachment 82749 [details] and 82750
> if needed.
> 
> I also get these new messages in dmesg after I apply either of the 2 patches:
> 
> 
> [   51.077013] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid,
> remainder is 255
> [   51.101981] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid,
> remainder is 255
> [   51.102002] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid,
> remainder is 255
> [   51.102009] stereo mode not supported
> [   51.102013] stereo mode not supported
> 
> Attachment 82751 [details] (One more variant) fails to patch.

Never mind the "new messages" part. It seems I get these errors with the stock driver as well.

Comment 22 Chris Wilson 2013-08-07 08:03:48 UTC

*** Bug 67856 has been marked as a duplicate of this bug. ***

Comment 23 Matthew Thode 2013-08-07 16:00:07 UTC

Created attachment 83787 [details]
error state dump - first hang

Comment 24 Chris Wilson 2013-09-08 19:55:26 UTC

*** Bug 69099 has been marked as a duplicate of this bug. ***

Comment 25 Jens Pranaitis 2013-10-03 08:48:46 UTC

I can't reproduce this bug anymore after the last X stack upgrade in Gentoo.

I have:
kernel 3.11.3
mesa 9.1.6
xorg-server 1.14.3
xf86-video-intel 2.21.15
libdrm 2.4.46

Comment 26 Martin Jørgensen 2013-10-03 09:04:30 UTC

I've recently also upgraded my Gentoo system to the same package versions as Jens Pranaitis, except the kernel which is 3.10.7-r1.
I'm currently running xmonad, and the moment I run glxgears the GPU hangs big time.
I still need to compile xf86-video-intel with uxa and disable sna to avoid hangs.

Comment 27 Chris Wilson 2013-10-03 09:10:57 UTC

You do realise that UXA only works because it doesn't support vsync? You can also turn off vsync for SNA with either

  Option "SwapbuffersWait" "false"

or

  Option "VSync" "false"

At this moment in time, the most likely reason is that you have an early version of SNB prior to the retrofitted vsync support. :|
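
For reference, the first option above would normally sit inside a Device section of the X configuration. The following is a hedged sketch rather than a canonical configuration: the Identifier string is an arbitrary placeholder, and the target file name in the usage comment is an assumption about a typical xorg.conf.d layout. The function name `print_sna_novsync_conf` is made up for this example.

```shell
#!/bin/sh
# print_sna_novsync_conf: emit an xorg.conf fragment that sets the
# SwapbuffersWait option from the comment above to "false" for the intel
# driver. The Identifier value is a placeholder, not a required name.
print_sna_novsync_conf() {
    cat <<'EOF'
Section "Device"
    Identifier "Intel Graphics"
    Driver     "intel"
    Option     "SwapbuffersWait" "false"
EndSection
EOF
}

# Usage (as root; the file name is an assumption):
#   print_sna_novsync_conf > /etc/X11/xorg.conf.d/20-intel.conf
```

The X server has to be restarted for the change to take effect, and Xorg.0.log can be checked afterwards to confirm the option was picked up.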

Comment 28 Martin Jørgensen 2013-10-03 11:34:23 UTC

No, I did not realize that. I will try that instead.

Which version has the retrofitted vsync? >2.21.15?

Comment 29 Chris Wilson 2013-10-03 11:37:07 UTC

I was referring to the GPU. (They didn't add vsync back into the design until very, very late.)

Comment 30 Martin Jørgensen 2013-10-03 12:20:35 UTC

*sigh* I guess I'm too sleepy. For some reason I read SNB as SNA.

I believe my graphics card is an HD 3000 (GT2). Is that recent enough for vsync?

Comment 31 Chris Wilson 2013-10-03 12:24:58 UTC

The question is which stepping of the GPU - as it was never clear which stepping received the fix, which stepping actually went to market first and how to query the stepping from userspace...

Comment 32 Matthew Thode 2013-10-03 16:54:24 UTC

I have noticed that when switching monitors with xrandr, if I shut them all off before turning them up again I don't hit this bug.

#doesn't hit
xrandr --output LVDS1 --off
xrandr --output HDMI1 --auto
xrandr --output LVDS1 --off --output VGA1 --auto --output HDMI1 --auto --right-of VGA1

instead of this

#does hit
xrandr --output HDMI1 --auto
xrandr --output LVDS1 --off
xrandr --output LVDS1 --off --output VGA1 --auto --output HDMI1 --auto --right-of VGA1

Comment 33 Javier S. Pedro 2013-10-18 10:46:40 UTC

I experience a similar issue with 3.7+ kernels that seems to have the same underlying cause.
This is a Lenovo X220t, (early) SNB graphics.
I use Metacity without compositing, in a dual-monitor setup (internal LVDS + external VGA1).

Launching glxgears on VGA1 causes lots of "Kicking stuck wait on render ring" messages in dmesg, and the framerate (of the entire screen) drops to something like 0.1 fps.
Glxgears on LVDS is completely fine (60fps). Launching it on LVDS and then dragging the window to VGA1, however, also causes the problem.
The lockups start the moment the center of the glxgears window crosses monitors, so I suspected a vsync issue, and found this bug report.

Adding 'Option "SwapbuffersWait" "false"' works around the problem.
As does enabling compositing, or switching to UXA.

Comment 34 Daniel Vetter 2013-10-18 14:09:09 UTC

(In reply to comment #33)
> I experience a similar issue with 3.7+ kernels that seems to have the same
> underlying cause.
> This is a Lenovo X220t, (early) SNB graphics.
> I use Metacity without compositing, in a dual-monitor setup (internal LVDS +
> external VGA1).
> 
> Launching glxgears on VGA1 causes lots of "Kicking stuck wait on render
> ring" messages in dmesg, and the framerate (of the entire screen) drops to
> something like 0.1 fps.
> Glxgears on LVDS is completely fine (60fps). Launching it on LVDS and then
> dragging the window to VGA1, however, also causes the problem.
> The lockups start the moment the center of the glxgears window crosses
> monitors, so I suspected a vsync issue, and found this bug report.
> 
> Adding 'Option "SwapbuffersWait" "false"' works around the problem.
> As does enabling compositing, or switching to UXA.

Please make sure that you're on the latest version of the intel DDX. Early versions of the snb vsync support had bugs with dual-head configurations. If you still experience hangs then please file a new bug report - for us it's much easier to mark duplicates than to untangle multiple bugs in the same report. And there are countless reasons to hang a gpu ;-)

Comment 35 Daniel Vetter 2013-11-18 08:11:21 UTC

Presuming fixed with latest ddx version.

Comment 36 Martin Jørgensen 2013-11-18 13:04:43 UTC

Using kernel 3.12 and ddx 2.99.906, I'm still able to provoke a hang when resizing the glxgears window intensively, but it seems to be a lot harder to make the GPU hang now. Fullscreen applications, vlc, and GL apps don't seem to hang anymore, but I haven't tested it much.

I'll open a new one if it gets severe.

Comment 37 Daniel Vetter 2013-11-18 17:34:47 UTC

The glxgears bug could be the infamous blorp death bug, which is fixed in the 9.2.4 release of mesa iirc. So please check that you have that, if not please file a new bug with the error state attached.
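
Checking an installed Mesa version against 9.2.4 can be scripted with a dotted-version comparison. This is a small sketch that assumes `sort -V` is available (it is in GNU coreutils); the function name `version_ge` is made up for illustration.

```shell
#!/bin/sh
# version_ge A B: succeed if dotted version A is greater than or equal to B.
# Relies on `sort -V` (version sort); its availability is an assumption.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage, with the installed Mesa version filled in by hand:
#   version_ge 9.1.6 9.2.4 || echo "Mesa predates 9.2.4"
```

The installed Mesa version usually appears in the "OpenGL version string" line printed by glxinfo, although the exact layout of that line varies between releases.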

Comment 38 Jari Tahvanainen 2016-10-07 10:11:00 UTC

Closing verified+fixed.
