Bug 75295 - Frequent hang and render glitches on Ubuntu 14.04
Status: RESOLVED FIXED
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assigned To: Ian Romanick
QA Contact: Intel 3D Bugs Mailing List
Reported: 2014-02-21 03:38 UTC by Rohan Dhruva
Modified: 2015-02-12 21:43 UTC

Attachments
File /sys/class/drm/card0/error (2.22 MB, text/plain)
2014-02-21 03:42 UTC, Rohan Dhruva
Complete dmesg (73.72 KB, text/plain)
2014-02-21 03:42 UTC, Rohan Dhruva
Latest Xorg log (34.28 KB, text/plain)
2014-02-21 03:43 UTC, Rohan Dhruva
Output of glxinfo (16.84 KB, text/plain)
2014-02-21 08:08 UTC, Rohan Dhruva

Description Rohan Dhruva 2014-02-21 03:38:57 UTC
After the latest kernel upgrade, my system has many graphical glitches, and is locking up frequently. The dmesg output has errors like these:

[ 1951.568672] Watchdog[2753]: segfault at 0 ip 00007fe00773a32e sp 00007fdff869f680 error 6 in chrome[7fe003cbe000+5dd9000]
[ 1959.241676] [drm] stuck on render ring
[ 1959.241685] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 1959.241686] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 1959.241687] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 1959.241688] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 1959.241689] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 1959.244266] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x3dc32000 ctx 17) at 0x3dc32c48
[ 3964.330034] perf samples too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[ 4849.028446] [drm] stuck on render ring
[ 4849.028492] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x565c2000 ctx 17) at 0x565c2c48
[ 4861.093551] Watchdog[5226]: segfault at 0 ip 00007fc00d36f32e sp 00007fbffe2d4680 error 6 in chrome[7fc0098f3000+5dd9000]
[ 4863.020198] [drm] stuck on render ring
[ 4863.020255] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x3dc32000 ctx 17) at 0x3dc32c48
[ 4893.028245] [drm] stuck on render ring
[ 4893.028295] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x4cbed000 ctx 10) at 0x4cbedc98
[ 4899.041855] [drm] stuck on render ring
[ 4899.041900] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xea3d000 ctx 10) at 0xea3dc98
[ 4899.041903] [drm:i915_context_is_banned] *ERROR* context hanging too fast, declaring banned!
[ 5833.173837] warning: `VBoxHeadless' uses 32-bit capabilities (legacy support in use)
[ 5833.326542] device vboxnet0 entered promiscuous mode
[ 6429.476175] [drm] stuck on render ring
[ 6488.455986] [drm] stuck on render ring
[ 6547.507818] [drm] stuck on render ring
[ 6615.490047] [drm] stuck on render ring

I am not sure if the xserver-xorg-video-intel driver was also updated at the same time. This is the version in use:

rdhruva@ubuntu:~$ apt-cache policy xserver-xorg-video-intel
xserver-xorg-video-intel:
  Installed: 2:2.99.910-0ubuntu1
  Candidate: 2:2.99.910-0ubuntu1
  Version table:
 *** 2:2.99.910-0ubuntu1 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages
        100 /var/lib/dpkg/status

The corresponding Ubuntu bug is: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1282867. That bug has a lot of additional information, including dmidecode output.
Comment 1 Rohan Dhruva 2014-02-21 03:42:20 UTC
Created attachment 94469 [details]
File /sys/class/drm/card0/error

Attaching as directed by dmesg.
Comment 2 Rohan Dhruva 2014-02-21 03:42:51 UTC
Created attachment 94470 [details]
Complete dmesg
Comment 3 Rohan Dhruva 2014-02-21 03:43:20 UTC
The versions might be incorrect, I am not sure about that.
Comment 4 Rohan Dhruva 2014-02-21 03:43:34 UTC
Created attachment 94471 [details]
Latest Xorg log
Comment 5 Chris Wilson 2014-02-21 08:01:34 UTC
Any clue as to what OpenGL applications are running at the time of the hangs? Also what version of mesa is installed (i.e. the output of glxinfo)?
Comment 6 Rohan Dhruva 2014-02-21 08:07:23 UTC
I don't remember the exact list of applications, but I use KDE 4.12.2. I also have Chrome and Hexchat open. No games, video playback, or other graphic intensive applications were running.
Comment 7 Rohan Dhruva 2014-02-21 08:08:44 UTC
Created attachment 94481 [details]
Output of glxinfo
Comment 8 Rohan Dhruva 2014-02-21 18:49:51 UTC
I have switched to UXA for now, and that seems to have solved all the problems: I don't see any display glitches, and no lock-ups. The "stuck on render ring" messages are also gone. 

Let me know if I can provide any more debugging information.
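
For reference, switching the intel X driver from SNA to UXA is normally done with a Device section in the X configuration. A minimal sketch follows, written as a shell helper; the function name write_uxa_conf, the Identifier string, and the destination file are assumptions for illustration and are not taken from this report.

```shell
# Hypothetical helper: write an X.Org Device section that selects UXA
# acceleration for the "intel" driver into the file given as $1.
write_uxa_conf() {
  dest=$1
  cat > "$dest" <<'EOF'
Section "Device"
    Identifier "Intel Graphics"
    Driver     "intel"
    Option     "AccelMethod" "uxa"
EndSection
EOF
}
```

One possible use is to run write_uxa_conf /tmp/20-intel.conf, review the result, copy it to wherever the system keeps its X configuration (the location varies by distribution), and restart X. Changing "uxa" back to "sna" reverses the switch.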
Comment 9 3vi1 2014-02-23 16:29:20 UTC
I've been seeing the same glitches, starting when the updated mesa packages were pushed out on 2/20:  http://ubuntuforums.org/showthread.php?t=2206883

At exactly the same time the glitches appeared, the primus bridge quit working for Bumblebee:  https://github.com/amonakov/primus/issues/133

Backleveling mesa to the packages in the saucy repository causes the graphical glitches to disappear and everything looks normal again.
Comment 10 Daniel Vetter 2014-03-03 10:32:04 UTC
Can you try to bisect through the mesa git history to find this regression?
Comment 11 Rohan Dhruva 2014-03-03 19:58:20 UTC
Are there any instructions on how to do this for Ubuntu?
Comment 12 Daniel Vetter 2014-03-04 20:05:57 UTC
First hit on Google for kernel bisecting ;-)

https://wiki.ubuntu.com/Kernel/KernelBisection
Comment 13 Rohan Dhruva 2014-03-04 20:07:10 UTC
Sorry, I was not aware that this is a kernel thing. I thought this bisection was required in the mesa source package.
Comment 14 Daniel Vetter 2014-03-04 20:13:25 UTC
It is a mesa thing. I was confused because the bugzilla update somehow ended up in my kernel bugs folder.

For bisecting mesa you can simply build from source without needing to install anything. You only need to set LIBGL_DRIVERS_PATH to the directory containing the i965_dri.so binary built by mesa, e.g.

LIBGL_DRIVERS_PATH=~/home/mesa/src/mesa/drivers/dri/i965/.libs/ glxinfo

Mesa built from git embeds the git version tag in the GL version string, so you can check that you are running the right build.
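
The version check can be scripted. The sketch below assumes the version line has the form "OpenGL version string: ... (git-<hash>)", as in the glxinfo output quoted later in this report; the helper names mesa_git_hash and running_expected_mesa are hypothetical.

```shell
# Hypothetical helper: read glxinfo output on stdin and print the git hash
# embedded in the "OpenGL version string" line, if there is one.
mesa_git_hash() {
  sed -n 's/^OpenGL version string:.*(git-\([0-9a-f][0-9a-f]*\)).*/\1/p'
}

# Hypothetical helper: succeed only if the hash found on stdin equals $1.
running_expected_mesa() {
  expected=$1
  actual=$(mesa_git_hash)
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

For example, piping glxinfo (run with LIBGL_DRIVERS_PATH set as above) into running_expected_mesa with the abbreviated hash of the checked-out commit succeeds only when the freshly built driver is the one in use. A system-installed release build carries no git tag, so the check fails for it.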
Comment 15 Rohan Dhruva 2014-03-04 20:33:11 UTC
Following the instructions to build mesa, I think I was able to get it running successfully: 

rdhruva@ubuntu:~/build/mesa$ LIBGL_DRIVERS_PATH=./lib LD_PRELOAD=./lib/libGL.so.1  glxinfo  | grep -i version 
server glx version string: 1.4
client glx version string: 1.4
GLX version: 1.4
OpenGL core profile version string: 3.3 (Core Profile) Mesa 10.2.0-devel (git-079bff5)
OpenGL core profile shading language version string: 3.30
OpenGL version string: 3.0 Mesa 10.2.0-devel (git-079bff5)
OpenGL shading language version string: 1.30

Now when I try "glxgears", everything is fine. I am unable to determine if this is because the latest git checkout fixed the problem, or if the problem is not surfacing because I am running X in the UXA acceleration mode (instead of SNA).

Looking at the logs, can you suggest a good way to reproduce this problem? Thanks!
Comment 16 Daniel Vetter 2014-03-04 20:40:14 UTC
You need to check out the same version of mesa you have currently installed, to make sure you can reproduce the issue correctly when building from sources. Then the same for the last known working version.

Only once that's confirmed should you start the bisect.
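
As a rough sketch of how the bisect itself could be driven once both endpoints are confirmed: git bisect run treats exit status 0 as good, 125 as "cannot test this revision, skip it", and other statuses from 1 to 127 as bad. The helper below follows that convention. Its name, and the build and test commands passed to it, are placeholders that would have to be filled in for this setup, and since the hang is intermittent an automated test may be unreliable.

```shell
# Hypothetical helper for use with "git bisect run".
#   $1: command that builds the checked-out revision
#   $2: command that exits 0 when the problem does NOT reproduce
# Returns 125 (skip) if the build fails, 0 (good) if the test passes,
# and 1 (bad) otherwise.
bisect_run_step() {
  build_cmd=$1
  test_cmd=$2
  if ! sh -c "$build_cmd"; then
    return 125
  fi
  if sh -c "$test_cmd"; then
    return 0
  fi
  return 1
}

# Typical flow (revisions are placeholders, not taken from this report):
#   git bisect start <bad-rev> <good-rev>
#   git bisect run ./step.sh     # step.sh calls bisect_run_step
#   git bisect reset
```

When the problem can only be judged by eye, the same flow works by hand: build each revision git checks out, try to reproduce, then run git bisect good or git bisect bad.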
Comment 17 Rohan Dhruva 2014-03-05 04:12:59 UTC
I updated to the latest packages in Ubuntu, and this bug still exists. 

Daniel, are you sure this is a problem in i965? dmesg seems to indicate i915:

[   70.826529] [drm] stuck on render ring
[   70.826536] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[   70.826538] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   70.826538] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   70.826539] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   70.826540] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   70.829103] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x4ddb4000 ctx 2) at 0x4ddb4d50
[   76.828152] [drm] stuck on render ring
[   76.828189] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x9162000 ctx 2) at 0x9162c64
[   76.828191] [drm:i915_context_is_banned] *ERROR* context hanging too fast, declaring banned!
Comment 18 Chris Wilson 2014-03-05 07:25:03 UTC
The kernel is just the messenger here reporting that someone hung the GPU. The details about who and how are all in the error state.
Comment 19 Rohan Dhruva 2014-03-05 07:25:59 UTC
@Chris: Does that mean a bisect is not required? Is the attached error information enough to debug the issue?
Comment 20 Rohan Dhruva 2014-03-11 19:14:18 UTC
Hello: what can I provide to remove the NEEDINFO status? I am confused whether the git bisect is still required: Chris' message seems to imply that the problem might be completely visible in the attached error file. 

If a bisect is indeed required, I am not sure if it's for i965 or i915 (the dmesg errors reference i915).
Comment 21 Rohan Dhruva 2014-03-19 02:27:03 UTC
This is still happening to me with all the updates applied. Can I provide any more information to help debug this issue?
Comment 22 Daniel Vetter 2014-03-27 09:00:10 UTC
Chris's comment was just about your statement in comment #17 that this is i915 related: the kernel driver is called i915, but the mesa driver for your hw is i965. And like Chris said, the kernel is just the messenger.

In short, the bisect of mesa is still required, nothing changed.
Comment 23 Rohan Dhruva 2014-04-11 19:28:11 UTC
Daniel: I updated my git repo and checked out version 10.1 (which is what my install currently has). I am still unable to repro the problem from the git checkout because the moment I start X with "AccelMethod uxa", the problem goes away.

Is there any better way of testing this one library against an X which is not started with UXA? Starting X with SNA and testing this is not really an option because everything is unusable then.
Comment 24 Rohan Dhruva 2014-04-15 18:05:05 UTC
The latest "mesa" updates in Ubuntu fixed all the problems for me. The relevant patch I see is http://permalink.gmane.org/gmane.linux.debian.devel.x/115099, but I am not sure.
Comment 25 Kenneth Graunke 2014-04-16 04:26:10 UTC
Closing as fixed per comment #24.