Summary: | [BDW rc6] GPU hang playing a video | ||
---|---|---|---|
Product: | xorg | Reporter: | Timo Aaltonen <tjaalton> |
Component: | Driver/intel | Assignee: | Rodrigo Vivi <rodrigo.vivi> |
Status: | RESOLVED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | critical | ||
Priority: | high | CC: | gary.c.wang, intel-gfx-bugs, xiong.y.zhang, yk |
Version: | git | ||
Hardware: | Other | ||
OS: | All | ||
Description
Timo Aaltonen
2014-08-28 18:46:52 UTC
using 'ximagesink' instead works fine

That's a planar YUV video (as opposed to packed YUV), do you have other videos that work? Just hoping that the failure is in the planar video path...

Timo, can you please try with

commit 2086965e5c0781e0a3996de89e4dda03c5d42610
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Aug 29 10:37:09 2014 +0100

    gen8: Refresh video render programs

?

So that patch didn't change things, as discussed on irc. Any further ideas? :)

Timo mentioned that if he disables rc6 from the BIOS, all is fine. We await results from testing with recent -nightly and the per-context reg w/a.

The nightly build from last night should have that commit, and it still has the bug.

This issue is blocking us from shipping. Bumping the importance to high+critical. Please let me know if this is not appropriate. Thanks -YK

Here's a recent error state with the drm-intel-nightly build from Sep 17th: http://koti.kapsi.fi/~tjaalton/bdw/i915_error_state_b36_intel

Created attachment 106758
patch 1/2

Created attachment 106759
patch 2/2 WaCsStallBeforeStateCacheInvalidate

Could you please test -nightly with the 2 patches attached? Also the original equivalent of them on your kernel?

No luck with them on -nightly, the error state looks identical to the old one.

That is odd. Maybe I was looking at your error state though... But regardless of the -nightly result with those patches, your kernel really needs those original patches.

Created attachment 107170
GPU HANG: ecode 0:0xf5dffffe

Ran "UbuntuBoot.ogv" playback on Ubuntu 14.04 (3.13.0-36-generic #63+hwe3-Ubuntu) and got the GPU hang in the kernel log.
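When repeating playback cycles like the one in this attachment, it can help to pull the hang code out of the kernel log automatically. The following is a minimal sketch, assuming log lines contain the `GPU HANG: ecode <number>:<hex>` text exactly as quoted above; other kernels may word the message differently. The name `find_gpu_hangs` is hypothetical and not part of any tool mentioned in this report.

```python
import re

# Matches the message format quoted in this report, e.g.
# "GPU HANG: ecode 0:0xf5dffffe". This pattern is an assumption based
# only on that one line; other kernel versions may differ.
HANG_RE = re.compile(r"GPU HANG: ecode (\d+):(0x[0-9a-fA-F]+)")


def find_gpu_hangs(log_text):
    """Return a list of (prefix, ecode) integer tuples found in log_text.

    `prefix` is the number before the colon and `ecode` is the hex value
    after it; no meaning is attached to either field here.
    """
    return [
        (int(m.group(1)), int(m.group(2), 16))
        for m in HANG_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = "GPU HANG: ecode 0:0xf5dffffe"
    for prefix, ecode in find_gpu_hangs(sample):
        print(f"hang: prefix={prefix} ecode={ecode:#010x}")
```

The function takes plain text, so it can be fed saved `dmesg` output after each playback cycle; an empty list means no hang message of this form was found.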
Created attachment 107215
Enable using BCS for pageflips in gen7/7+/8

I verified this issue on an HP Stag BW C2 SKU device (BIOS B.38) with kernel 3.13 (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tag/?id=v3.13) with the "drm-i915-always-enable-BCS-to-gen7-later" patch applied. It forces BCS pageflips for Ivybridge and later. The GPU hang with ecode:0xf5dffffe went away with kernel 3.13+patch, and UbuntuBoot.ogv playback worked well (via firstboot-video provided by Canonical).

test #1, test cycle with 1345 times overnight, pass
test #2, test cycle with 100 times, pass
test #3, test cycle with 123 times, pass
test #4, test cycle with 132 times, pass
test #4, test cycle with 50 times, pass

The random UI freeze (like #77104, https://bugs.freedesktop.org/show_bug.cgi?id=77104) has not happened again so far.

Gary

With the fix patch (https://bugs.freedesktop.org/attachment.cgi?id=100213) from issue #77104 (https://www.libreoffice.org/bugzilla/show_bug.cgi?id=77104), there is still some failure rate of UI freeze/GPU hang. For #16/17, it's based on the BDW platform.

Does disabling RC6 really eliminate the error? The original comments indicated it helped some early steppings, but not later steppings.

I have to admit I'm lost here. This patch looks correct because it forces a behaviour that is already the one used upstream, and also the one used in the Canonical backport for BDW. So I have no idea which kernel is in question here. Did Canonical apply the kernel I had pointed out, to include this W/A: WaCsStallBeforeStateCacheInvalidate?

#77104 doesn't make sense here. If you are facing a similar issue, this is another bug. Please reproduce it with -nightly and open a new bug.

Hi Timo, I got a clean ubuntu 14.04-1 here, installed the versions you had mentioned in the first report from launchpad, and tried to reproduce the bug locally, and I couldn't. With your 3.13.0-36 it hangs on boot. With 3.17 everything works fine, including the video.
Is there anything I'm missing? Any other change in your environment you didn't mention?

You need to install the matching linux-image-extra package too, which has all of drm/*.. that'd explain the boot hang. I think the problem that OEM1&2 are seeing (and not OEM3) is due to the fact that their first-stage installer uses a slightly older kernel (-34) which then might(?) leave the hw in some state so that after a reboot to the latest kernel it'll fail with this issue. The gpu hang can't be reproduced after the second reboot.. I'll try to synthesize that on my hw. And I'll double-check if this is the diff the images have.

It can be reproduced in Xubuntu 14.10 beta-1 (http://cdimage.ubuntu.com/xubuntu/releases/14.10/beta-1/xubuntu-14.10-beta1-desktop-amd64.iso) with a resolution higher than 1920x1080 (on WSB SDS).

If 14.10beta fails it could be because its 3.16-based kernel doesn't have all the workarounds..

I upgraded the kernel from 3.16 to 3.18rc1 in Xubuntu 14.10 beta-1/-2 and still got the same GPU hang, error code "0x85dffffb", on the BDW platform (WSB SDS).

Created attachment 108334
rc6 disabled in BDW d-step CPU with kernel 3.18-rc1/Xubuntu 14.10 beta

For comment #25 (drm-intel-nightly-10/22): with rc6 disabled by i915.enable_rc6=0 in drm-intel-nightly-10/22, the GPU hang only occurred on the first run, and the following test cycles worked well. It appears to be related to GPU rc6.

intel@intel-Broadwell-Client-platform:~$ ./play.sh
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstPulseSinkClock
WARNING: from element /GstPlayBin:playbin0/GstBin:vbin/GstBin:bin0/GstXvImageSink:xvimagesink0: A lot of buffers are being dropped.
Additional debug info:
gstbasesink.c(2875): gst_base_sink_is_too_late (): /GstPlayBin:playbin0/GstBin:vbin/GstBin:bin0/GstXvImageSink:xvimagesink0:
There may be a timestamping problem, or this computer is too slow.
WARNING: from element /GstPlayBin:playbin0/GstBin:vbin/GstBin:bin0/GstXvImageSink:xvimagesink0: A lot of buffers are being dropped.
Additional debug info:
gstbasesink.c(2875): gst_base_sink_is_too_late (): /GstPlayBin:playbin0/GstBin:vbin/GstBin:bin0/GstXvImageSink:xvimagesink0:
There may be a timestamping problem, or this computer is too slow.
Got EOS from element "playbin0".
Execution ended after 33048083158 ns.
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...
intel@intel-Broadwell-Client-platform:~$ ./play.sh
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstPulseSinkClock
Got EOS from element "playbin0".
Execution ended after 33047536912 ns.
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

The theory that I had was wrong: it doesn't matter if the first-stage installer kernel is old or not, it still happens with a newer kernel. And in fact I can reproduce this on 14.10 with WB and a newer kernel.. at least sometimes. Same ecode 0x85dffffb.

Hi Timo and Gary,
This seems to be a duplicate of: https://bugs.freedesktop.org/show_bug.cgi?id=85389
Can you please verify the xf86-video-intel sna fix listed there?

The ddx on 14.10 is 2.99.914, so it doesn't have that regression.

I tried again to reproduce here and everything ran fine.

Now I got Xubuntu 14.10. But the latest one already contains Mesa 10.3. So you probably want to give it a try.

But other differences are also in the Silicon stepping and the BIOS. I would recommend testing your images on the latest available silicon/bios.

(In reply to Rodrigo Vivi from comment #30)
> I tried again to reproduce here and everything ran fine.
>
> Now I got Xubuntu 14.10. But the latest one already contains Mesa 10.3. So you
> probably want to give it a try.
>
> But other differences are also in the Silicon stepping and the BIOS.
> I would recommend to test your images on latest available silicon/bios.

Hi Rodrigo, can you share what Silicon stepping and BIOS/vBIOS version you're using? Thank you -YK

Created attachment 108553
The latest xf86-video-intel built for commit d08a5f555a0c47ae23c0f9a890b512cb23e74feb

Hi Rodrigo,
I used the latest snapshot (including your patch http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/commit/?id=4df0052a21efd744c4b8cb2409139ded6e45f5c8) of xf86-video-intel to verify this issue in Xubuntu 14.10 beta-1 (because my build host is xserver-xorg-core v1.15),

commit d08a5f555a0c47ae23c0f9a890b512cb23e74feb
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Oct 24 09:53:29 2014 +0100

    sna/trapezoids: Prevent overflow of edge gradient in mono rasteriser

    References: https://bugs.freedesktop.org/show_bug.cgi?id=70461#c76
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

This issue can still be reproduced with the same GPU error code on the d-step CPU BDW machine.

Created attachment 108554
xserver-xorg-video-intel_2.99.999-0ubuntu1.1_amd64.deb

For comment #32, the latest code of xf86-video-intel built for commit

commit d08a5f555a0c47ae23c0f9a890b512cb23e74feb
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Oct 24 09:53:29 2014 +0100

    sna/trapezoids: Prevent overflow of edge gradient in mono rasteriser

    References: https://bugs.freedesktop.org/show_bug.cgi?id=70461#c76
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

It got the same GPU error code in the Xubuntu 14.10 formal release on the same BDW devices (BIOS: BDW-E2R1.86C.0095.R08.1410190256, 10/19/2014, d-step CPU) at 3200x1800, and passed the test at 1920x1080.

I will try to get a newer BDW CPU/device for verification (I don't have one so far).

The version of MESA in the Xubuntu 14.10 formal release is v10.3.0, xdrv is 2.99.914, libdrm is 2.4.56.

One way to trigger this is to bump the scale factor on Unity to 1.5, then I can reproduce it on the BDW ULX box too.
It should have the latest stepping (4), while my Wilson Beach is still on beta. You can find the scale factor in display settings. It's set by default on the OEM machines in question. After disabling it the gpu hang is not seen.

On Wilson Beach, I can reproduce this issue as Timo suggested, by setting scale > 1 and resolution > 1920x1080.

If I add the i915.enable_rc6=0 boot option on Wilson Beach, the GPU hangs the first time I run gst-launch. Once the GPU finishes the reset resulting from that hang, running gst-launch has no problem.

Could you please try reverting this patch and see if you can still reproduce the issue:

git show 0d68b25e9ceb344fe2f93373b1c0311d33814265
commit 0d68b25e9ceb344fe2f93373b1c0311d33814265
Author: Tom O'Rourke <Tom.O'Rourke@intel.com>
Date: Wed Apr 9 11:44:06 2014 -0700

    drm/i915/bdw: Use timeout mode for RC6 on bdw

(In reply to Rodrigo Vivi from comment #39)
> Could you please try reverting this patch and see if you can still reproduce
> the issue:
>
> git show 0d68b25e9ceb344fe2f93373b1c0311d33814265
> commit 0d68b25e9ceb344fe2f93373b1c0311d33814265
> Author: Tom O'Rourke <Tom.O'Rourke@intel.com>
> Date: Wed Apr 9 11:44:06 2014 -0700
>
> drm/i915/bdw: Use timeout mode for RC6 on BDW

After reverting this commit, the issue still exists.

Created attachment 108997
Use Vmask for 3DSTATE_PS

Please confirm that the attached xf86-video-intel patch fixes the issue for you.

Created attachment 109004
"Use Vmask for 3DSTATE_PS" patch applied xserver-xorg-video deb

I verified the "Use Vmask for 3DSTATE_PS" patch on top of the following latest xf86-video-intel snapshot (without that patch, it fails the test and gets a GPU hang):

commit ba408bf21c4b65f19c7b581e4c88c92805184334
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Nov 4 13:39:52 2014 +0000

    sna: Correct units for videoRam

It appears to work well on WSB SDS now (rc6_enabled).

Thanks Rodrigo!

Timo, can you help verify it on your customized system environment?
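Several of the results above depend on whether `i915.enable_rc6=0` was actually present at boot. As a rough aid for recording that alongside each test run, here is a minimal sketch that extracts the option from a kernel command-line string. The function name `rc6_boot_setting` is hypothetical, and keeping the last occurrence when the option appears more than once is a choice made for this sketch, not a statement about how the kernel resolves duplicates.

```python
def rc6_boot_setting(cmdline):
    """Return the value given for i915.enable_rc6 in `cmdline`, or None.

    `cmdline` is a whitespace-separated kernel command line such as the
    contents of /proc/cmdline. If the option is repeated, this sketch
    keeps the last value seen (an assumption made here for simplicity).
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "i915.enable_rc6" and sep:
            value = val
    return value


if __name__ == "__main__":
    # On a Linux test machine, report what the running kernel was booted with.
    with open("/proc/cmdline") as f:
        print("i915.enable_rc6 on command line:", rc6_boot_setting(f.read()))
```

A result of `None` only means the option was not passed on the command line; the driver default then applies, which this sketch does not try to determine.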
(In reply to Rodrigo Vivi from comment #41)
> Created attachment 108997
> Use Vmask for 3DSTATE_PS
>
> Please confirm attached xf86-video-intel patch fixes the issue for you.

This patch fixes the issue reproduced on Wilson Beach, thanks.

The patch applied on 2.99.910 works fine, but on current master (the driver you provided) it causes corrupted video output in the window.

Hi Timo,
I only built it based on 2.99.216+ for the experiment on WSB SDS/Xubuntu 14.10 beta-1 (the original one is 2.99.214).

commit 97fe3c1c860978c7a649cba93a55fa497010ccc1
Author: Rodrigo Vivi <rodrigo.vivi@intel.com>
Date: Wed Nov 5 15:48:14 2014 -0800

    sna: Use VMask in 3DSTATE_PS

    Using dispatch mask cause hangs waiting PS Done on some cases like bug #83207, with larger screen or when scaling it. Also mesa uses VMask instead of Dmask for 3DSTATE_PS because in some cases they were getting incorrect derivatives for subspans.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83207
    Cc: Timo Aaltonen <tjaalton@ubuntu.com>
    Cc: Gary Wang <gary.c.wang@intel.com>
    Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Tested-by: Timo Aaltonen <tjaalton@ubuntu.com>
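For anyone re-running the `play.sh` comparison from earlier in this report to confirm the fix, the two transcripts differ in the "A lot of buffers are being dropped." warnings while the reported run time stays close to 33 s. A minimal sketch for summarizing one run is shown here, assuming the output uses the same `Execution ended after N ns` wording as those transcripts; `summarize_run` is a hypothetical helper, not part of GStreamer.

```python
import re

# Wording taken from the play.sh transcripts quoted in this report.
# Other GStreamer versions may print the duration in a different format.
EXEC_RE = re.compile(r"Execution ended after (\d+) ns")
DROP_MSG = "A lot of buffers are being dropped."


def summarize_run(output):
    """Summarize one captured playback run.

    Returns a dict with the number of dropped-buffer warnings and the
    run duration in seconds (None if no duration line was found).
    """
    match = EXEC_RE.search(output)
    duration_s = int(match.group(1)) / 1e9 if match else None
    return {
        "dropped_warnings": output.count(DROP_MSG),
        "duration_s": duration_s,
    }


if __name__ == "__main__":
    sample = (
        "WARNING: from element xvimagesink0: "
        "A lot of buffers are being dropped.\n"
        "Execution ended after 33048083158 ns.\n"
    )
    print(summarize_run(sample))
```

A run with zero warnings and a duration line present corresponds to the clean second transcript; a missing duration line suggests the pipeline never reached EOS.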