It hangs as soon as Firefox starts playing any HTML5 video. ickle pointed out that it might be an issue with vaapi, and I can confirm that Firefox is using it. As a side note, the same thing happens on weston when playing video in epiphany, which also uses gstreamer and gstreamer-vaapi.

Relevant info from dmesg:

Sep 04 07:16:56 krejzi kernel: [drm] stuck on render ring
Sep 04 07:16:56 krejzi kernel: [drm] stuck on blitter ring
Sep 04 07:16:56 krejzi kernel: [drm] GPU HANG: ecode 6:0:0xf4e9fffe, in Xorg [400], reason: Ring hung, action: reset
Sep 04 07:16:56 krejzi kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Sep 04 07:16:56 krejzi kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Sep 04 07:16:56 krejzi kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Sep 04 07:16:56 krejzi kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Sep 04 07:16:56 krejzi kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Sep 04 07:16:56 krejzi kernel: [drm:i915_set_reset_status] *ERROR* gpu hanging too fast, banning!
Sep 04 07:16:56 krejzi kernel: drm/i915: Resetting chip after gpu hang
Sep 04 07:17:20 krejzi kernel: [drm] stuck on render ring
Sep 04 07:17:20 krejzi kernel: [drm] GPU HANG: ecode 6:0:0x87e8fffd, in kwin_x11 [560], reason: Ring hung, action: reset
Sep 04 07:17:20 krejzi kernel: drm/i915: Resetting chip after gpu hang

/sys/class/drm/card0/error, from two different hangs:

http://www.linuxfromscratch.org/~krejzi/error.log
http://www.linuxfromscratch.org/~krejzi/error2.log

Linux-4.2, libdrm-2.4.64, mesa-11.0.0-rc2, xorg-server-1.17.99.901, xf86-video-intel-2.99.917 (git from today, with the UXA acceleration backend; I've tried SNA too, no difference), libva-1.6.0, libva-intel-driver-1.6.0, gstreamer-1.5.90, gstreamer-vaapi-0.6.0
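For anyone comparing several hangs, the "GPU HANG" lines in the dmesg output above can be picked apart with a small script. This is only a sketch: the regular expression is an assumption based on the line format shown in this report, and `parse_gpu_hangs` is a hypothetical helper name, not part of any i915 tooling.

```python
import re

# Assumed layout of the i915 "GPU HANG" line, taken from the log in this report.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[^,\s]+), in (?P<process>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


def parse_gpu_hangs(log_text):
    """Return one dict per GPU HANG line found in log_text."""
    hangs = []
    for line in log_text.splitlines():
        match = HANG_RE.search(line)
        if match:
            info = match.groupdict()
            info["pid"] = int(info["pid"])
            hangs.append(info)
    return hangs


# Two lines copied from the dmesg output in this report.
SAMPLE = (
    "Sep 04 07:16:56 krejzi kernel: [drm] GPU HANG: ecode 6:0:0xf4e9fffe, "
    "in Xorg [400], reason: Ring hung, action: reset\n"
    "Sep 04 07:17:20 krejzi kernel: [drm] GPU HANG: ecode 6:0:0x87e8fffd, "
    "in kwin_x11 [560], reason: Ring hung, action: reset\n"
)

if __name__ == "__main__":
    for hang in parse_gpu_hangs(SAMPLE):
        print(hang["ecode"], hang["process"], hang["pid"], hang["reason"])
```

Grouping the results by `ecode` is one way to see whether separate hangs share a signature; it does not replace attaching the crash dump from /sys/class/drm/card0/error.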
Created attachment 118100
Kernel oops when Epiphany hung Weston while playing youtube video

This is what happened when I tried to play an html5 video on epiphany on weston.
I forgot to mention that my system is a laptop with Intel HD 3000 (Sandybridge, Gen6) graphics, which also has a muxless AMD Radeon 6470M GPU. I doubt the secondary GPU is causing any issues, but I thought it was worth mentioning.
libva did have a bug where they forgot to mark render targets and one of the 4.2 changes is a read-read optimisation. A hack like

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index a953d4975b8c..e4786eeca38f 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1033,7 +1033,7 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 		u32 old_write = obj->base.write_domain;
 
 		obj->dirty = 1; /* be paranoid */
-		obj->base.write_domain = obj->base.pending_write_domain;
+		obj->base.write_domain = I915_GEM_DOMAIN_RENDER;
 		if (obj->base.write_domain == 0)
 			obj->base.pending_read_domains |= obj->base.read_domains;
 		obj->base.read_domains = obj->base.pending_read_domains;

would disable the optimisation and hide the libva bug. Does that fix your hang?
Upgrading to libva/libva-intel-driver 1.6.1.pre1 seems to have fixed the issue. Keep the bug open for some time so I can do some more testing.
It was as I feared. libva/libva-intel-driver updates didn't fix the problem, it was kinda luck that got it working at that time. Not only that, but the patch from Comment 3 (backported to 4.2, although not sure if done correctly) also didn't fix the issue.
Issue still present in linux-4.3-rc1
[ 298.520654] [drm] stuck on render ring
[ 298.521713] [drm] GPU HANG: ecode 6:0:0x87e8fffd, in MediaPl~back #4 [1039], reason: Ring hung, action: reset
[ 298.521714] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 298.521715] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 298.521716] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 298.521717] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 298.521718] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 298.523448] drm/i915: Resetting chip after gpu hang

http://www.linuxfromscratch.org/~krejzi/error3.log
Since my guess was wrong, the best option is to do a bisection between 4.1 and 4.2 (which takes about 12 steps). Is that something you could do?
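As a rough illustration of where an estimate like "about 12 steps" comes from: a bisection halves the candidate range on every build, so the step count grows with the base-2 logarithm of the number of commits. The sketch below is an approximation only; `bisect_steps` is a hypothetical helper, the commit counts are made-up examples rather than the real number of commits between 4.1 and 4.2, and git's own estimate can differ slightly because history is a graph rather than a straight line.

```python
def bisect_steps(num_commits):
    """Approximate number of builds needed to bisect num_commits candidates.

    Computes ceil(log2(num_commits)) with integer arithmetic only.
    """
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    return (num_commits - 1).bit_length()


if __name__ == "__main__":
    # Hypothetical range sizes, chosen only to show the growth rate.
    for count in (100, 1000, 4096, 13000):
        print(count, "commits ->", bisect_steps(count), "steps")
```

Restricting the bisection to a subdirectory (for example with a path argument to git bisect) shrinks the candidate range and therefore the number of builds.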
Now, this is confusing. git bisect says the following:

0875546c5318c85c13d07014af5350e9000bc9e9 is the first bad commit
commit 0875546c5318c85c13d07014af5350e9000bc9e9
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Mon Apr 20 09:04:05 2015 -0700

    drm/i915: Fix up the vma aliasing ppgtt binding

    Currently we have the problem that the decision whether ptes need to
    be (re)written is splattered all over the codebase. Move all that
    into i915_vma_bind. This needs a few changes:
    - Just reuse the PIN_* flags for i915_vma_bind and do the conversion
      to vma->bound in there to avoid duplicating the conversion code all
      over.
    - We need to make binding for EXECBUF (i.e. pick aliasing ppgtt if
      around) explicit, add PIN_USER for that.
    - Two callers want to update ptes, give them a PIN_UPDATE for that.

    Of course we still want to avoid double-binding, but that should be
    taken care of:
    - A ppgtt vma will only ever see PIN_USER, so no issue with
      double-binding.
    - A ggtt vma with aliasing ppgtt needs both types of binding, and we
      track that properly now.
    - A ggtt vma without aliasing ppgtt could be bound twice. In the
      lower-level ->bind_vma functions hence unconditionally set
      GLOBAL_BIND when writing the ggtt ptes.

    There's still a bit room for cleanup, but that's for follow-up
    patches.

    v2: Fixup fumbles.

    v3: s/PIN_EXECBUF/PIN_USER/ for clearer meaning, suggested by Chris.

    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

However, checking out that revision and building it, all is fine. Now, the two commits after that revision introduced the problem.

First one, fa42331b4cd961cecb3f6919116d2e6efeb2334b didn't introduce a real problem, but a hang happened when I closed the video tab where the video was playing, not while playing.
Second one, 4755265977159be0261972da2ba54917765b18ed introduced the real problem, ie hangs all over the place when a video was playing and gst-vaapi was utilized.
Can you please try:

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index a953d4975b8c..bbf7d35ca906 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -585,7 +585,7 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
 	uint64_t flags;
 	int ret;
 
-	flags = PIN_USER;
+	flags = PIN_USER | PIN_GLOBAL;
 	if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
 		flags |= PIN_GLOBAL;

on a recent kernel?
Kernel 4.3-rc1 patched with patch from Comment 10 still has the issue.
Hmm. Can you verify that running with 0875546c5318c85c13d07014af5350e9000bc9e9^ (i.e. the commit before the bisect result) is stable (say over the course of a few days)?

The patch you just tested suggests that it is not a result of the lack of aliased GGTT entries, which is perhaps the most obvious effect of the bisected commit. Spotted one fumble in the patch, but that was fixed in

commit 5e562f1dddfa3242cede5ec49888260a856a9da2
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Thu Apr 30 11:02:31 2015 +0300

    drm/i915: Clear vma->bound on unbinding

in v4.2-rc1.

Do you suspend/resume the system before the hang appears?
As I already said, even the bisected commit 0875546c5318c85c13d07014af5350e9000bc9e9 gives a stable system. It's the two commits after that one that introduced the problem.

I don't suspend or hibernate. It's 100% reproducible on a fresh system. I use KDE Plasma, if that matters, but VAAPI also hangs weston with epiphany running on top of GTK+ Wayland. The problem happens as soon as I utilize vaapi, no matter how, be it from firefox through gstreamer/gstreamer-vaapi or epiphany through webkit via gstreamer/gstreamer-vaapi. I can work around the problem by removing gstreamer-vaapi, which confirms that vaapi is the problem.
(In reply to Armin K from comment #13)
> As I already said, even the bisected commit
> 0875546c5318c85c13d07014af5350e9000bc9e9 gives a stable system. It's two
> commits after that one that introduced the problem.

I know. I just wanted to be absolutely certain that is the commit we pick apart. Given that you are only definitely sure that hangs start two commits after, that raises the element that perhaps the issue simply isn't easily reproduced earlier and that maybe the bisect is not definitive.
As I said, issue is either always there or not there at all, depending on which revision is picked. I'm not going to revert to an older kernel snapshot for a few days to verify what I've already verified last night and this morning while bisecting.
Probably related to #92814 ?
I don't have the hw to test this anymore and it doesn't seem that anyone else is hitting the issue.
Closing almost one year old resolved+invalid.