Summary: | [BSW] Graphics frozen/stuck after a random time (minutes to hours); messages contain "[drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A ..." | ||
---|---|---|---|
Product: | DRI | Reporter: | Jan Bertran <joanbe> |
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Status: | CLOSED NOTOURBUG | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | major | ||
Priority: | medium | CC: | adolfo_sm_cr, ben, intel-gfx-bugs |
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | BSW/CHT | i915 features: | display/atomic |
Attachments: |
Description
Jan Bertran
2015-10-02 08:03:48 UTC
Created attachment 118585
dmesg from startup
Created attachment 118664
Tested again (same result) with other RAM modules just in case, hence the new dmidecode output.
Can you attach to the hanging processes with gdb -p and check the backtraces? If we're not seeing a GPU hang, the process must be hanging on something else: maybe waiting for an event, or potentially a locking problem in the kernel. sysrq-t would help track that down; depending on your kernel config that's easy or hard to do, so it's best to search for instructions specific to your distro.

Created attachment 119213
new complete dmesg with "echo t > /proc/sysrq-trigger" at the end
Created attachment 119214
X server backtrace
Created attachment 119215
OpenGL sample application backtrace
If you can gather cat /sys/kernel/debug/dri/0/i915_gem_pageflip and an Xorg.0.log from xf86-video-intel compiled with --enable-debug=full, that should be enough information to pinpoint the blame.

Jan, have you attempted to reproduce this without OpenGL involved?

First, a bit of history about the issue. One year ago we tested our games in single-screen mode under Kubuntu 14.10 on a J1900 system. It worked for weeks without any issue. Then an xf86-video-intel update caused a hang every time, always within a few seconds, so we went back to the original xf86-video-intel version.

Later we started making dual-screen versions of our games. The window setup remained similar: the app window covers the whole framebuffer, there is a single OpenGL context (no compositor), and page flipping is activated just as in the single-monitor setup. That is when the graphics freezes started. We have tried several Ubuntu releases (15.04, 15.10; kernels 3.10, 4.1, 4.2, 4.3) and Fedora 22 with the same result, but with different behavior and time to hang (some hangs could be recovered by switching to a console VT and back to X, others left the system unusable). Lately all hangs on the J1900 left the system totally inaccessible, so I switched testing to the N3150.

About testing without OpenGL, I'm not sure I understand. I'm not sure how to test page flipping without OpenGL, or maybe you are suggesting that the behavior can change depending on the GPU load. I think so: our game takes less time to hang than a test application I made based on glxgears.

Also, regarding timing, we set the frame rate to 30 fps (60 Hz monitor and glXSwapIntervalMESA(2) or glXSwapIntervalSGI(2)), while almost all applications just sync to vblank with no divisor, so this could be an uncommon case.
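The 30 fps figure follows from the swap interval: with an interval of 2, each buffer swap waits for two vblanks, so the frame rate is capped at refresh rate / interval. A minimal C sketch of that arithmetic; `effective_fps` is a hypothetical helper, not a GLX function, and it assumes the ideal case where rendering always finishes before the deadline:

```c
#include <assert.h>

/* Hypothetical helper (not part of GLX): the maximum frame rate when every
 * buffer swap is synchronized to `interval` vertical blanks, assuming
 * rendering always finishes before the swap deadline. */
static double effective_fps(double refresh_hz, unsigned interval)
{
    assert(refresh_hz > 0.0);
    assert(interval >= 1); /* interval 0 (where supported) disables vblank sync */
    return refresh_hz / (double)interval;
}
```

For the setup described here, effective_fps(60.0, 2) gives 30.0, while the usual interval of 1 gives 60.0.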
The last tests were done under Kubuntu 15.10 with kernel linux-image-4.3.0-994-generic_4.3.0-994.201510162200_amd64 and libdrm2_2.4.65+git20150922.f3c6740f-0ubuntu0ricotz, and more recently with linux-image-4.3.0-994-generic_4.3.0-994.201511052100_amd64 and libdrm2_2.4.65+git20151026.c745e541-0ubuntu0ricotz.

With the latest updates I have a problem: the test application does not hang (for at least 3 days) without "drm.debug=0x1e", but it does hang when DRM debug is enabled, while our game keeps hanging within 1 day regardless of DRM debug. It also seems that graphics hang much sooner if intel_gpu_top is running in an SSH terminal. Maybe system/GPU load influences some race condition?

The test app is just glxgears hacked with dummy textures, more gears, and a window covering the whole display, whereas our game usually runs at the limit of 30 fps (sporadically dropping to 20 fps).

Next week I will do the requested tests of xf86-video-intel with debug enabled, and later a test app that just fills the framebuffer with a single color.

At the moment, cat /sys/kernel/debug/dri/0/i915_gem_pageflip gives either:

    No flip due on pipe A (plane A)
    No flip due on pipe B (plane B)
    No flip due on pipe C (plane C)

or something like:

    Flip queued on pipe A (plane A)
    Flip queued on render ring at seqno 13434, next seqno 13435 [current breadcrumb 13434], completed? 1
    Flip queued on frame 90833, (was ready on frame 0), now 90833
    Stall check enabled, 1 prepares
    Current scanout address 0x0da59000
    New framebuffer address 0x0da59000
    MMIO update completed? 1
    No flip due on pipe B (plane B)
    Flip queued on pipe C (plane C)
    Flip queued on render ring at seqno 13434, next seqno 13435 [current breadcrumb 13434], completed? 1
    Flip queued on frame 91314, (was ready on frame 0), now 91314
    Stall check enabled, 1 prepares
    Current scanout address 0x0da4f000
    New framebuffer address 0x0da4f000
    MMIO update completed? 1

or:

    Flip queued on pipe A (plane A)
    Flip queued on render ring at seqno 193d8, next seqno 193d9 [current breadcrumb 193d8], completed? 1
    Flip queued on frame 115155, (was ready on frame 0), now 115155
    Stall check enabled, 0 prepares
    Current scanout address 0x02db0000
    New framebuffer address 0x0da59000
    MMIO update completed? 0
    No flip due on pipe B (plane B)
    Flip queued on pipe C (plane C)
    Flip not associated with any ring
    Flip queued on frame 0, (was ready on frame 0), now 115789
    Stall check waiting for page flip ioctl, 0 prepares
    Current scanout address 0x02da6000
    New framebuffer address 0x00000000
    MMIO update completed? 0

(In reply to Jan Bertran from comment #9)
> About testing without OpenGL, I'm not sure I understand. I'm not sure how
> to test page flipping without OpenGL, or maybe you are suggesting that the
> behavior can change depending on the GPU load. I think so: our game takes
> less time to hang than a test application I made based on glxgears.
>
> Also, regarding timing, we set the frame rate to 30 fps (60 Hz monitor and
> glXSwapIntervalMESA(2) or glXSwapIntervalSGI(2)), while almost all
> applications just sync to vblank with no divisor, so this could be an
> uncommon case.

Thanks. We're trying to narrow down the cause of the underlying issue. I was trying to determine whether this can be reproduced without Mesa. It sounds like you've been able to reproduce it without your game, which is a good start. Would you be able to post your glxgears-based application?

Created attachment 119697
source code of modified glxgears
Attached is a hacked version of glxgears.
Main modifications (just quick hacks):
- force the frame rate to be divided by 2 (30 fps on a 60 Hz monitor)
- run only if the desktop width is double that of a typical resolution (for example 1920*2 x 1080 or 1600*2 x 900 ...)
- undecorated, stay-on-top window. No fullscreen hint, as that would restrict the window to a single monitor; the drawback is that with some window managers the panel/menu bar stays visible (which deactivates page flipping)
- textures generated by the program, with an alpha channel, to "stress test", plus more gears.
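The resolution check in the second item could look roughly like the following C sketch. It is a hypothetical reconstruction, not code from the attachment: the function name is invented, only the two example modes given above are listed (the report elides the rest), and in a real program the desktop size would come from X, e.g. via Xlib's DisplayWidth()/DisplayHeight() macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Single-monitor modes taken from the examples in the report; the full
 * list used by the attachment is not given, so extend as needed. */
static const struct { int w, h; } typical_modes[] = {
    { 1920, 1080 },
    { 1600, 900 },
};

/* Hypothetical check: true if the desktop looks like two identical monitors
 * side by side, i.e. twice the width of a typical mode at the same height
 * (for example 3840 x 1080). */
static bool is_dual_side_by_side(int desk_w, int desk_h)
{
    assert(desk_w > 0 && desk_h > 0);
    for (size_t i = 0; i < sizeof typical_modes / sizeof typical_modes[0]; i++) {
        if (desk_w == 2 * typical_modes[i].w && desk_h == typical_modes[i].h)
            return true;
    }
    return false;
}
```

For example, is_dual_side_by_side(3840, 1080) is true, while a single 1920 x 1080 desktop is rejected.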
Build with:

    gcc -o "glx_swapbuf_test" ./glxswapcontrol.c -lm -lX11 -lGL -lGLU
Time to hang can be as long as 2 days, but is usually less than 1 day.
The last hang occurred with kernel linux-image-4.3.0-994-generic_4.3.0-994.201511052100_amd64, both with and without DRM debug enabled.
(In reply to Chris Wilson from comment #7)
> If you can gather cat /sys/kernel/debug/dri/0/i915_gem_pageflip and an
> Xorg.0.log from xf86-video-intel compiled with --enable-debug=full, that
> should be enough information to pinpoint the blame.

I have compiled xf86-video-intel with --enable-debug=full, but it makes the KDE Plasma desktop crash. I'm trying to start the app with other window managers or with none at all (xinit + a shell script with xrandr), but in both cases I only get visual artifacts (giant fragments of gears). Something must be missing in the X startup sequence or permissions.

Thanks.

Created attachment 120080
finally running with xf86-video-intel debug full Xorg log
Contents of /sys/kernel/debug/dri/0/i915_gem_pageflip:
No flip due on pipe A (plane A)
No flip due on pipe B (plane B)
No flip due on pipe C (plane C)
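When polling this file during a long run, it may help to flag automatically any pipe that has a flip queued while reporting "MMIO update completed? 0", as pipes A and C do in the third snapshot earlier in this report. A minimal C sketch, assuming the line format shown in this report (other kernel versions may print it differently); `count_pending_flips` is a hypothetical helper, not part of any i915 tool:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* True if `s` begins with the string literal `lit`. */
#define STARTS_WITH(s, lit) (strncmp((s), (lit), sizeof(lit) - 1) == 0)

/* Hypothetical helper: scans i915_gem_pageflip text (format as seen in this
 * report) and returns how many pipes have a queued flip whose
 * "MMIO update completed?" field is 0. If `pipes` is non-NULL, the affected
 * pipe letters are stored there as a NUL-terminated string. */
static int count_pending_flips(const char *text, char *pipes, size_t cap)
{
    char pipe = 0; /* pipe of the current "Flip queued on pipe X" section */
    int pending = 0;
    size_t n = 0;

    assert(text != NULL);
    while (*text) {
        /* skip any indentation before the line's content */
        while (*text == ' ' || *text == '\t')
            text++;
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);
        char c;

        if (sscanf(text, "Flip queued on pipe %c", &c) == 1) {
            pipe = c; /* start of a pipe section with a queued flip */
        } else if (STARTS_WITH(text, "No flip due on pipe")) {
            pipe = 0; /* idle pipe: nothing to check */
        } else if (pipe && STARTS_WITH(text, "MMIO update completed? 0")) {
            pending++;
            if (pipes && n + 1 < cap)
                pipes[n++] = pipe;
            pipe = 0;
        }
        text += len; /* advance to the end of this line */
        if (*text == '\n')
            text++;
    }
    if (pipes && cap > 0)
        pipes[n] = '\0';
    return pending;
}
```

Fed the third snapshot quoted earlier, it returns 2 and stores "AC" in `pipes`; the all-idle output shown here yields 0.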
Attached are fragments of the Xorg.0.log file.
With full debug, X crashes on an assertion within seconds to minutes. Now running the test with UXA instead of the default SNA:

    cat /etc/X11/xorg.conf
    Section "Device"
        Identifier "Intel UXA"
        Driver "Intel"
        Option "AccelMethod" "UXA"
    EndSection

First difference: there are no debug messages from xf86-video-intel.

(In reply to Jan Bertran from comment #14)
> With full debug, X crashes on an assertion within seconds to minutes.
> Now running the test with UXA instead of the default SNA:
> cat /etc/X11/xorg.conf
> Section "Device"
>     Identifier "Intel UXA"
>     Driver "Intel"
>     Option "AccelMethod" "UXA"
> EndSection
>
> First difference: there are no debug messages from xf86-video-intel.

After 2 days of running, UXA seems to be a valid workaround for me (at least with linux-image-4.3.0-994-generic_4.3.0-994.201511052100_amd64.deb). Next tests pending: xf86-video-intel from git instead of the Ubuntu release, with SNA enabled, and maybe the J1900 instead of the N3150.

N3150 with SNA and the latest git (2015-11-22) keeps blocking on page flip events. N3150 with UXA works OK. J1900 keeps hard-locking with both UXA and SNA. (All of these in a multi-monitor setup.)

Jan, Ben - is this issue still valid on newer kernels? There has been a year of silence...

(In reply to Jari Tahvanainen from comment #17)
> Jan, Ben - is this issue still valid on newer kernels? There has been a
> year of silence...

In my opinion the bug could be closed for Gen8 graphics, as the problem with the Celeron N3150 was in the X11 SNA driver and not in the kernel. As for the Gen7 J1900, I have not tried it again.

Setting resolved+notourbug per comment 18 by Reporter.

Closing Resolved+Notourbug per comment 18 by Reporter.