Summary: | [BAT execlists] Sporadic - gem_exec_suspend basic-s4 GPU hang after resume | ||
---|---|---|---|
Product: | DRI | Reporter: | Marius Vlad <marius.c.vlad> |
Component: | DRM/Intel | Assignee: | Humberto Israel Perez Rodriguez <humberto.i.perez.rodriguez> |
Status: | CLOSED FIXED | QA Contact: | Humberto Israel Perez Rodriguez <humberto.i.perez.rodriguez> |
Severity: | blocker | ||
Priority: | highest | CC: | abchk1234, chris, chris.harris, christophe.prigent, ewfalor, fei.yang, harish.hyma, humberto.i.perez.rodriguez, intel-gfx-bugs, kassick, leonard, matwey.kornilov, Nikolaus, pj.crommen, ricardo.vega, rodrigo.vivi, slacker702, solitone, unki, xlionell, ziegler |
Version: | DRI git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | firmware/guc, GPU hang, power/suspend-resume | |
Attachments: |
Created attachment 124910 [details] /sys/class/drm/card0/error on APL Reproduced on APL. # ./gem_exec_suspend --r basic-S4 IGT-Version: 1.15-g88c1f7c (x86_64) (Linux: 4.7.0-rc5-nightly+ x86_64) rtcwake: wakeup from "disk" using /dev/rtc0 at Tue Jul 5 15:06:28 2016 (gem_exec_suspend:2939) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:399: (gem_exec_suspend:2939) igt-aux-CRITICAL: Failed assertion: !"GPU hung" Stack trace: #0 [__igt_fail_assert+0xf1] #1 [sig_abort+0x3a] #2 [killpg+0x40] #3 [__write_nocancel+0x7] #4 [igt_drop_caches_set+0xa4] #5 [gem_quiescent_gpu+0xcf] #6 [run_test+0x403] #7 [__real_main227+0x159] #8 [main+0x29] #9 [__libc_start_main+0xf0] #10 [_start+0x29] #11 [<unknown>+0x29] Subtest basic-S4 failed. **** DEBUG **** (gem_exec_suspend:2939) DEBUG: Test requirement passed: gem_has_ring(fd, 0) (gem_exec_suspend:2939) DEBUG: Test requirement passed: can_mi_store_dword(gen, 0) (gem_exec_suspend:2939) DEBUG: Test requirement passed: nengine (gem_exec_suspend:2939) DEBUG: Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:2939) DEBUG: Test requirement passed: can_mi_store_dword(gen, engine) (gem_exec_suspend:2939) DEBUG: Test requirement passed: nengine (gem_exec_suspend:2939) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:2939) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:2939) DEBUG: Verifying result (gem_exec_suspend:2939) DEBUG: Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:2939) DEBUG: Test requirement passed: can_mi_store_dword(gen, engine) (gem_exec_suspend:2939) DEBUG: Test requirement passed: nengine (gem_exec_suspend:2939) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:2939) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:2939) DEBUG: Verifying result (gem_exec_suspend:2939) DEBUG: 
Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:2939) DEBUG: Test requirement passed: can_mi_store_dword(gen, engine) (gem_exec_suspend:2939) DEBUG: Test requirement passed: nengine (gem_exec_suspend:2939) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:2939) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:2939) DEBUG: Verifying result (gem_exec_suspend:2939) DEBUG: Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:2939) DEBUG: Test requirement passed: can_mi_store_dword(gen, engine) (gem_exec_suspend:2939) DEBUG: Test requirement passed: nengine (gem_exec_suspend:2939) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:2939) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:2939) DEBUG: Verifying result (gem_exec_suspend:2939) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:2939) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:2939) igt-core-DEBUG: Test requirement passed: !igt_run_in_simulation() (gem_exec_suspend:2939) igt-aux-DEBUG: Test requirement passed: system("rtcwake -n -s 30 -m disk" SQUELCH) == 0 (gem_exec_suspend:2939) DEBUG: Verifying result (gem_exec_suspend:2939) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:399: (gem_exec_suspend:2939) igt-aux-CRITICAL: Failed assertion: !"GPU hung" **** END **** Subtest basic-S4: FAIL (14.214s) Platform: APL system CPU Name : Intel(R) Genuine Processor @ 1.1 GHz (family: 6, model: 12, stepping: 9) 4 cores QDF : Q6HE SoC : B1 CRB : Apollo Lake DDR3L RVP1A FAB2 Reworks : R19, R20 Software Bios: 144_B10 - APLK_B0_IFWI_X64_R_2016_06_27_0956_SPI_RVP1 from from \\gar\ec\proj\ba\CCG\APL BIOS\External\BIOS_Release\Daily\v144_10_2016_WW27.1\IFWI\IFWI_RVP1_Release\IFWI KSC: 1.15 
Linux distribution: Ubuntu 16.04 64 bits Kernel: drm-intel-nightly 4.7.0-rc4 5c244f4 from http://cgit.freedesktop.org/drm-intel/ commit 5c244f4b128c6274755007e080d46e0a61b71534 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Jun 24 16:17:56 2016 +0100 drm-intel-nightly: 2016y-06m-24d-15h-17m-32s UTC integration manifest drm: libdrm-2.4.68-9 625d181 from git://anongit.freedesktop.org/mesa/drm mesa: mesa-11.2.2 56cd706 from git://anongit.freedesktop.org/mesa/mesa cairo: 1.15.2 db8a7f1 from git://anongit.freedesktop.org/cairo server: xorg-server-1.18.0-419 7397a21 from git://git.freedesktop.org/git/xorg/xserver xf86-video-intel: 2.99.917-670 cac7c8d from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel libva: libva-1.7.0-26 c36971c from git://git.freedesktop.org/git/vaapi/libva vaapi-intel-driver: 1.7.0-52 f47e513 from git://git.freedesktop.org/git/vaapi/intel-driver DMC 1.07 GuC 8.7 Intel-Gpu-Tools: 1.15-54 88c1f7c from http://anongit.freedesktop.org/git/xorg/app/intel-gpu-tools.git Created attachment 124911 [details]
APL-gem_exec_suspend_gpu-hang_kern.log
On APL it could be related to GuC:

[ 16.937392] [drm:guc_fw_fetch] GuC fw fetch status FAIL; err -11, fw (null), obj (null)
[ 16.937417] [drm:intel_guc_init [i915]] *ERROR* Failed to fetch GuC firmware from i915/bxt_guc_ver8_7.bin (error -11)

Adding Rodrigo as watcher.

About GuC: BSW doesn't have GuC, so it is probably good to file a separate issue for GuC not loading after S4 on APL (this probably happens on SKL and KBL as well).

About this on APL: can you please reproduce with GuC disabled, so we can see whether this happens only on BSW or also on APL regardless of GuC?

(In reply to Rodrigo Vivi from comment #4)
> About GuC: BSW doesn't have GuC so probably good to file a separated issue
> for GuC not loading after S4 on APL. (probably happen on SKL and KBL as
> well).
>
> About this on APL: Can you please reproduce disabling GuC so we see if this
> is happening only on BSW or also on APL regardless GuC?

Reported internally.

On APL, the GPU hang is reproduced with GuC not loaded. Executed several times. Sometimes the test is skipped:

./gem_exec_suspend --r basic-S4
IGT-Version: 1.15-g2038b24 (x86_64) (Linux: 4.7.0-rc6-testing+ x86_64)
Test requirement not met in function run_test, file gem_exec_suspend.c:157:
Test requirement: __gem_execbuf(fd, &execbuf) == 0
Subtest basic-S4: SKIP (0.001s)

The reason is different from the one in:
http://benchsrv.fi.intel.com/archive/results/CI_IGT_test/CI_DRM_1416/bxtp-1/html/bxtp-1@CI_DRM_1416@1/igt@gem_exec_suspend@basic-s4.html

Most of the time the test fails with a GPU hang.
Tested with: Platform: APL system CPU Name : Intel(R) Genuine Processor @ 1.1 GHz (family: 6, model: 12, stepping: 9) 4 cores QDF : Q6HE SoC : B1 CRB : Apollo Lake DDR3L RVP1A FAB2 Reworks : R19, R20 Software Bios: 144_B10 APLK_B0_IFWI_X64_R_2016_06_27_0956_SPI_RVP1.bin from \\gar\ec\proj\ba\CCG\APL BIOS\External\BIOS_Release\Daily\v144_10_2016_WW27.1\IFWI\IFWI_RVP1_Release\IFWI KSC: 1.15 Linux distribution: Ubuntu 16.04 64 bits Kernel: tag drm-intel-testing-2016-07-11 4.7.0-rc6 0230e3c from http://cgit.freedesktop.org/drm-intel/ commit 0230e3c4eb76cf8f57cf40db0e908b96b84e3911 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Sun Jul 10 13:24:46 2016 +0100 drm-intel-nightly: 2016y-07m-10d-12h-23m-38s UTC integration manifest drm: libdrm-2.4.68-14 8c8d5ddfrom git://anongit.freedesktop.org/mesa/drm mesa: mesa-11.2.2 3a9f628from git://anongit.freedesktop.org/mesa/mesa cairo: 1.15.2 db8a7f1 from git://anongit.freedesktop.org/cairo xserver: xorg-server-1.18.0-454 033888e from git://git.freedesktop.org/git/xorg/xserver xf86-video-intel: 2.99.917-676 26f8ab5 from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel libva: libva-1.7.0-26 c36971c from git://git.freedesktop.org/git/vaapi/libva vaapi-intel-driver: 1.7.0-53 bcde10d from git://git.freedesktop.org/git/vaapi/intel-driver GuC 8.7 DMC 1.07 from https://01.org/linuxgraphics/downloads/broxton-dmc-1.07 Intel-Gpu-Tools 1.15 2038b24 from http://anongit.freedesktop.org/git/xorg/app/intel-gpu-tools.git No hang in ring buffer mode on BSW, so looks like something execlist related. 
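Since the outcome varies from run to run (SKIP in some runs, FAIL with a GPU hang in most), it can help to tally the result lines from saved test output. The following is only a sketch: it assumes each run's output was captured as text and that result lines have the form "Subtest <name>: <RESULT> (<seconds>s)" as in the logs above. The helper name tally_results is hypothetical and not part of IGT.

```python
import re
from collections import Counter

# Matches IGT result lines such as "Subtest basic-S4: FAIL (14.214s)".
RESULT_RE = re.compile(
    r"Subtest (?P<name>\S+): (?P<result>[A-Z]+) \((?P<secs>[\d.]+)s\)"
)


def tally_results(outputs):
    """Count (subtest, result) pairs across the captured output of several runs."""
    counts = Counter()
    for text in outputs:
        for match in RESULT_RE.finditer(text):
            counts[(match.group("name"), match.group("result"))] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the logs in this report.
    runs = [
        "Subtest basic-S4 failed.\nSubtest basic-S4: FAIL (14.214s)\n",
        "Test requirement not met in function run_test\n"
        "Subtest basic-S4: SKIP (0.001s)\n",
        "Subtest basic-S4: FAIL (25.898s)\n",
    ]
    for (name, result), count in sorted(tally_results(runs).items()):
        print(f"{name}: {result} x{count}")
```

Lines such as "Subtest basic-S4 failed." are deliberately not counted, because only the final result line carries the verdict.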
And not using stolen for the ring buffer also cures it:

@@ -2049,7 +2049,7 @@ static int intel_alloc_ringbuffer_obj(struct drm_device *dev,
        struct drm_i915_gem_object *obj;
 
        obj = NULL;
-       if (!HAS_LLC(dev))
+       if (!HAS_LLC(dev) && !i915.enable_execlists)
                obj = i915_gem_object_create_stolen(dev, ringbuf->size);
        if (obj == NULL)
                obj = i915_gem_object_create(dev, ringbuf->size);

I would assume our ring should be empty when we resume, so it shouldn't matter that stolen gets clobbered. But this patch says otherwise.

It's hard to tell because we don't record the request->head in the error state (review sigh), but my inkling is that it is actually dying with HEAD before our request, and it is just that using stolen has invalid content triggering the hang. Following that suspicion it would be that we are flushing the context image to coherent memory before the hibernation image is made. Quickest way to test that theory would be to reset the HEAD/TAIL in the context image upon resume.

Humberto, can you re-test with https://patchwork.freedesktop.org/patch/98894/ ? If this works, please sign as "tested-by".

Humberto, please rather consider this patch set: https://patchwork.freedesktop.org/series/9926/

*** Bug 94698 has been marked as a duplicate of this bug. ***

Humberto, please also re-run igt@gem_softpin@noreloc-s4 to confirm that this patch fixes the issue.

*** Bug 96895 has been marked as a duplicate of this bug.
*** Finally, please re-run also igt@gem_exec_suspend (In reply to yann from comment #11) > Humberto please rather consider this patch set: > https://patchwork.freedesktop.org/series/9926/ Hi, after test this patch i got the following results test case : gem_exec_suspend basic-s4 platform : bsw / status : pass platform : bxt / status : fail please see the output of APL ====================================== IGT-Version: 1.15-gee5d5c4 (x86_64) (Linux: 4.7.0-rc7drm-intel-nightly-bug-96526-commit-d416f56-mbox+ x86_64) (gem_exec_suspend:1553) drmtest-DEBUG: Test requirement passed: fd >= 0 (gem_exec_suspend:1553) drmtest-DEBUG: Test requirement passed: fd >= 0 (gem_exec_suspend:1553) drmtest-DEBUG: Test requirement passed: drmSetMaster(fd) == 0 (gem_exec_suspend:1553) igt-core-DEBUG: Starting subtest: basic-S4 (gem_exec_suspend:1553) DEBUG: Test requirement passed: gem_has_ring(fd, 0) (gem_exec_suspend:1553) DEBUG: Test requirement passed: can_mi_store_dword(gen, 0) (gem_exec_suspend:1553) DEBUG: Test requirement passed: nengine (gem_exec_suspend:1553) DEBUG: Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:1553) DEBUG: Test requirement passed: can_mi_store_dword(gen, engine) (gem_exec_suspend:1553) DEBUG: Test requirement passed: nengine (gem_exec_suspend:1553) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:1553) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:1553) DEBUG: Verifying result (gem_exec_suspend:1553) DEBUG: Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:1553) DEBUG: Test requirement passed: can_mi_store_dword(gen, engine) (gem_exec_suspend:1553) DEBUG: Test requirement passed: nengine (gem_exec_suspend:1553) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:1553) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:1553) DEBUG: 
Verifying result (gem_exec_suspend:1553) DEBUG: Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:1553) DEBUG: Test requirement passed: can_mi_store_dword(gen, engine) (gem_exec_suspend:1553) DEBUG: Test requirement passed: nengine (gem_exec_suspend:1553) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:1553) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:1553) DEBUG: Verifying result (gem_exec_suspend:1553) DEBUG: Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:1553) DEBUG: Test requirement passed: can_mi_store_dword(gen, engine) (gem_exec_suspend:1553) DEBUG: Test requirement passed: nengine (gem_exec_suspend:1553) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:1553) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:1553) DEBUG: Verifying result (gem_exec_suspend:1553) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:1553) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:1553) igt-core-DEBUG: Test requirement passed: !igt_run_in_simulation() (gem_exec_suspend:1553) igt-aux-DEBUG: Test requirement passed: system("rtcwake -n -s 30 -m disk" SQUELCH) == 0 rtcwake: assuming RTC uses UTC ... 
rtcwake: wakeup from "disk" using /dev/rtc0 at Fri Jul 15 18:41:53 2016 (gem_exec_suspend:1553) DEBUG: Verifying result (gem_exec_suspend:1553) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:401: (gem_exec_suspend:1553) igt-aux-CRITICAL: Failed assertion: !"GPU hung" Stack trace: #0 [__igt_fail_assert+0xf1] #1 [sig_abort+0x3a] #2 [killpg+0x40] #3 [__write_nocancel+0x7] #4 [igt_drop_caches_set+0xa4] #5 [gem_quiescent_gpu+0xcf] #6 [run_test+0x3eb] #7 [__real_main227+0x2a8] #8 [main+0x23] #9 [__libc_start_main+0xf0] #10 [_start+0x29] #11 [<unknown>+0x29] Subtest basic-S4 failed. **** DEBUG **** (gem_exec_suspend:1553) DEBUG: Test requirement passed: gem_has_ring(fd, 0) (gem_exec_suspend:1553) DEBUG: Test requirement passed: can_mi_store_dword(gen, 0) (gem_exec_suspend:1553) DEBUG: Test requirement passed: nengine (gem_exec_suspend:1553) DEBUG: Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:1553) DEBUG: Test requirement passed: can_mi_store_dword(gen, engine) (gem_exec_suspend:1553) DEBUG: Test requirement passed: nengine (gem_exec_suspend:1553) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:1553) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:1553) DEBUG: Verifying result (gem_exec_suspend:1553) DEBUG: Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:1553) DEBUG: Test requirement passed: can_mi_store_dword(gen, engine) (gem_exec_suspend:1553) DEBUG: Test requirement passed: nengine (gem_exec_suspend:1553) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:1553) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:1553) DEBUG: Verifying result (gem_exec_suspend:1553) DEBUG: Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:1553) DEBUG: Test requirement passed: can_mi_store_dword(gen, engine) 
(gem_exec_suspend:1553) DEBUG: Test requirement passed: nengine (gem_exec_suspend:1553) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:1553) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:1553) DEBUG: Verifying result (gem_exec_suspend:1553) DEBUG: Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:1553) DEBUG: Test requirement passed: can_mi_store_dword(gen, engine) (gem_exec_suspend:1553) DEBUG: Test requirement passed: nengine (gem_exec_suspend:1553) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:1553) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:1553) DEBUG: Verifying result (gem_exec_suspend:1553) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:1553) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:1553) igt-core-DEBUG: Test requirement passed: !igt_run_in_simulation() (gem_exec_suspend:1553) igt-aux-DEBUG: Test requirement passed: system("rtcwake -n -s 30 -m disk" SQUELCH) == 0 (gem_exec_suspend:1553) DEBUG: Verifying result (gem_exec_suspend:1553) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:401: (gem_exec_suspend:1553) igt-aux-CRITICAL: Failed assertion: !"GPU hung" **** END **** Subtest basic-S4: FAIL (25.898s) (gem_exec_suspend:1553) igt-core-DEBUG: Exiting with status code 99 relevant dmesg info ===================== [ 61.796101] [drm] GPU HANG: ecode 9:0:0x5931a887, in gem_exec_suspen [1553], reason: Hang on blitter ring, bsd ring, video enhancement ring, acti [ 61.796103] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace. [ 61.796104] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it. 
[ 61.796104] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 61.796122] drm/i915: Resetting chip after gpu hang

kernel used
=====================
branch : nightly
commit d416f561e8fad56f2c6922ef3a703a5a829a99cf
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Jul 15 13:03:40 2016 +0100

Gfx stack
=======================
Component : drm
tag : libdrm-2.4.68-14-g8c8d5dd
commit : 8c8d5dd
Component : cairo
tag : 1.15.2-44-g1a380ef
commit : 1a380ef
Component : intel-gpu-tools
tag : intel-gpu-tools-1.15-127-gee5d5c4
commit : ee5d5c4

Attachments
==============================
dmesg_bsw.log
dmesg_bxt.log
gpu_error_bxt

Humberto, don't forget the attachments ;)

I'll take the partial victory.

commit 5ab57c7020697942ea15f45ad14c69cecb164329
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Jul 15 14:56:20 2016 +0100

    drm/i915: Flush logical context image out to memory upon suspend

    Before suspend, and especially before building the hibernation image, we need the context image to be coherent in memory. To do this we require that we perform a context switch to a disposable context (i.e. the dev_priv->kernel_context) - when that switch is complete, all other context images will be complete. This leaves the kernel_context image as incomplete, but fortunately that is disposable and we can do a quick fixup of the logical state after resuming.

But it looks like there's more fish in the ocean. Can you please attach the latest error state?

Created attachment 125090 [details]
dmes_bsw.log
Created attachment 125091 [details]
dmesg_bxt.log
Created attachment 125092 [details]
gpu_error_bxt
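To make the stale-context-image theory from the comments above more concrete (the hang occurring with HEAD before our request, and the suggestion to reset HEAD/TAIL in the context image upon resume), here is a toy model. It is not i915 code: ContextImage, bytes_to_execute, reset_ring_registers and the RING_SIZE value are all hypothetical names and numbers chosen purely for illustration.

```python
from dataclasses import dataclass

RING_SIZE = 4096  # bytes; illustrative value, not taken from the driver


@dataclass
class ContextImage:
    """HEAD/TAIL as stored in the saved context image (what the GPU reloads)."""
    ring_head: int
    ring_tail: int


def bytes_to_execute(image: ContextImage) -> int:
    """Ring bytes the GPU would try to execute after reloading this image."""
    return (image.ring_tail - image.ring_head) % RING_SIZE


def reset_ring_registers(image: ContextImage, driver_tail: int) -> None:
    """Make the image agree with the driver's bookkeeping: an empty ring."""
    image.ring_head = driver_tail
    image.ring_tail = driver_tail


if __name__ == "__main__":
    driver_tail = 0x240  # where the driver believes the ring ends
    # Image captured before the last requests retired: HEAD lags behind TAIL.
    stale = ContextImage(ring_head=0x100, ring_tail=0x240)
    print("bytes replayed with stale image:", bytes_to_execute(stale))
    reset_ring_registers(stale, driver_tail)
    print("bytes replayed after reset:", bytes_to_execute(stale))
```

In this model a stale image makes the GPU replay ring contents that the driver considers retired. If that memory was clobbered across hibernation (for example because it lived in stolen memory), the replay would run over invalid commands, which is consistent with the hang described in this thread.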
We've seen that hang before; it looks like the execlists failed to submit the next request to the hardware. This hang is worth checking to see if it changes between guc and plain execlists.

(In reply to yann from comment #15)
> Finally, please re-run also igt@gem_exec_suspend

Hi Yann: after running the whole "gem_exec_suspend" family, the following subtests fail with the configuration from my comment 16.

test cases
=============================
basic-S4
default-uncached-S4
default-cached-S4
render-uncached-S4
render-cached-S4
bsd-uncached-S4
bsd-cached-S4
bsd1-uncached
bsd1-cached
bsd1-uncached-S3
bsd1-cached-S3
bsd1-uncached-S4
bsd1-cached-S4
bsd2-uncached
bsd2-cached
bsd2-uncached-S3
bsd2-cached-S3
bsd2-uncached-S4
bsd2-cached-S4
blt-uncached-S4
blt-cached-S4
vebox-uncached-S4
vebox-cached-S4

Created attachment 125093 [details]
dmesg_gem_exec_suspend_apl.log
Please see the attachment "dmesg_gem_exec_suspend" for my previous comment.
The test passes on APL with commit 5ab57c7020697942ea15f45ad14c69cecb164329 and a patch to revert GuC loading and submission.

Platform: APL system CPU Name : Intel(R) Genuine Processor @ 1.1 GHz (family: 6, model: 12, stepping: 9) 4 cores QDF : Q6HE SoC : B1 CRB : Apollo Lake DDR3L RVP1A FAB2 Reworks : R19, R20
Software Bios: 144_B10 APLK_B0_IFWI_X64_R_2016_06_27_0956_SPI_RVP1.bin from \\gar\ec\proj\ba\CCG\APL BIOS\External\BIOS_Release\Daily\v144_10_2016_WW27.1\IFWI\IFWI_RVP1_Release\IFWI KSC: 1.15
Linux distribution: Ubuntu 16.04 64 bits
Kernel: 4.7.0-rc7 895a714 from http://cgit.freedesktop.org/drm-intel/ with https://patchwork.freedesktop.org/patch/99445/ applied
commit 895a714b0b596cfcbe82065f99376ad02d369125
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Mon Jul 18 14:35:39 2016 +0200
drm-intel-nightly: 2016y-07m-18d-12h-35m-15s UTC integration manifest
drm: libdrm-2.4.68-15 2212a64 from git://anongit.freedesktop.org/mesa/drm
mesa: mesa-11.2.2 3a9f628 from git://anongit.freedesktop.org/mesa/mesa
cairo: 1.15.2 db8a7f1 from git://anongit.freedesktop.org/cairo
xserver: xorg-server-1.18.0-460 e8e3675 from git://git.freedesktop.org/git/xorg/xserver
xf86-video-intel: 2.99.917-676 26f8ab5 from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel
libva: libva-1.7.0-26 c36971c from git://git.freedesktop.org/git/vaapi/libva
vaapi-intel-driver: 1.7.0-53 bcde10d from git://git.freedesktop.org/git/vaapi/intel-driver
DMC 1.07 from https://01.org/linuxgraphics/downloads/broxton-dmc-1.07
Intel-Gpu-Tools 1.15-127 ee5d5c4 from http://anongit.freedesktop.org/git/xorg/app/intel-gpu-tools.git

(In reply to cprigent from comment #25)
> Test is Pass on APL with commit 5ab57c7020697942ea15f45ad14c69cecb164329 and
> patch to revert GuC loading and submission.

Hmm. Can you verify this by leaving the test in a loop for hours and seeing how long it takes before it eventually fails?

You are right. The GPU hang is reproduced after around 5 iterations. And I tried again with GuC loaded.
The GPU hang is always reproduced at the first iteration.

(In reply to cprigent from comment #27)
> You are right. The GPU Hang is reproduced after around 5 iterations.

Can you please attach the error from the non-GuC failure? I want to see if it has the same characteristics. And for completeness, grab an error state after a hang with the GuC enabled on your machine.

Created attachment 125142 [details]
APL_sys-class-drm-card0-error_without-guc
Created attachment 125143 [details]
APL-gem_exec_suspend_basic-S4_without-guc_kern.log
Created attachment 125145 [details]
APL_sys-class-drm-card0-error_with-guc
Created attachment 125146 [details]
APL-gem_exec_suspend_basic-S4_with-guc_kern.log
Created attachment 125147 [details]
APL_gem_exec_suspend_basic-S4_output-with-and-without-guc
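The soak run requested above (leave the test looping and see how long it takes before it fails) could be scripted roughly as follows. This is a sketch under assumptions: iterations_until_failure and make_fake_runner are hypothetical helpers, the runner is any callable that returns the captured test output as text, and a failure is detected from a "Subtest ...: FAIL (...)" result line like the ones in the logs of this report. The "SUCCESS" string used by the fake runner is an assumption about how a passing run is reported.

```python
import re

# Result lines look like "Subtest basic-S4: FAIL (14.214s)".
RESULT_RE = re.compile(r"Subtest \S+: (?P<result>[A-Z]+) \(")


def iterations_until_failure(run_once, max_iterations):
    """Call run_once() until its output reports FAIL.

    Returns the 1-based iteration of the first failure, or None if every
    iteration up to max_iterations completed without one.
    """
    for iteration in range(1, max_iterations + 1):
        output = run_once()
        results = [m.group("result") for m in RESULT_RE.finditer(output)]
        if "FAIL" in results:
            return iteration
    return None


def make_fake_runner(fail_on):
    """Stand-in for a real test run: reports FAIL on the given iteration."""
    state = {"n": 0}

    def run_once():
        state["n"] += 1
        # "SUCCESS" is an assumed pass string; only FAIL is checked above.
        verdict = "FAIL" if state["n"] == fail_on else "SUCCESS"
        return f"Subtest basic-S4: {verdict} (14.0s)\n"

    return run_once


if __name__ == "__main__":
    first_failure = iterations_until_failure(make_fake_runner(fail_on=5), 100)
    print("first failure at iteration:", first_failure)
```

On real hardware, run_once could wrap something like subprocess.run(["./gem_exec_suspend", "--r", "basic-S4"], capture_output=True, text=True).stdout, using the command line shown earlier in this report. Whether the result line ends up on stdout or stderr should be checked for the IGT version in use.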
Updating priority based on the fact that w/o GuC this becomes sporadic.

The failure is reasonably consistent; it looks like the execlists context-switch goes awry, and for whatever reason the guc is more susceptible. It is more likely to be something not quite right in the interrupt routing upon resume (first guess).

This doesn't really appear to be GuC-related. For example, the APL-gem-exec-suspend-basic-s4 test logs show exactly the same failure (GPU HANG) with or without GuC submission. GuC mode may expose it more quickly but the issue itself is not caused by the GuC.

dmesg_gem_exec_suspend_apl.log shows:

[ 331.343467] [drm:intel_guc_setup] GuC fw status: path i915/bxt_guc_ver8_7.bin, fetch FAIL, load NONE

in other words the (correct) firmware is not present. Ditto for dmesg_bxt.log:

[ 1.671603] i915 0000:00:02.0: Direct firmware load for i915/bxt_guc_ver8_7.bin failed with error -2

As for dmes_bsw.log, that contains:

[ 351.405681] [drm:intel_guc_setup] GuC fw status: path (null), fetch NONE, load NONE

where the (null) path means that this kernel does not support BSW.

To contradict my previous comment, APL-gem_exec_suspend_gpu-hang_kern.log shows something very odd: the GuC firmware has disappeared. Early in the log we have

Jul 5 16:56:27 BXTP5 kernel: [ 1.689144] [drm:intel_guc_setup] GuC fw status: path i915/bxt_guc_ver8_7.bin, fetch SUCCESS, load NONE

but 5 minutes later, on the next reboot cycle:

Jul 5 17:01:44 BXTP5 kernel: [ 1.711802] i915 0000:00:02.0: Direct firmware load for i915/bxt_guc_ver8_7.bin failed with error -2

The kernel logs for each cycle look generally similar, but the order of some operations is not identical. In particular, the appearance of the MMC devices can come before OR after the attempt to load the GuC firmware. So this is really a completely different issue, related to the way that devices are initialised asynchronously w.r.t. one another. It should be moved to a separate bug report.

.Dave.
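Comparing the "GuC fw status" lines across boot cycles, as done by hand above, can be automated when there are many logs. A minimal sketch, assuming the line format is exactly the one quoted in this comment; guc_status_lines is a hypothetical helper, not an existing tool.

```python
import re

# Matches e.g. "GuC fw status: path i915/bxt_guc_ver8_7.bin, fetch FAIL, load NONE".
GUC_STATUS_RE = re.compile(
    r"GuC fw status: path (?P<path>\S+), fetch (?P<fetch>\w+), load (?P<load>\w+)"
)


def guc_status_lines(log_text):
    """Yield (path, fetch, load) for every GuC status line in a kernel log.

    A "(null)" path is reported as None.
    """
    for match in GUC_STATUS_RE.finditer(log_text):
        path = match.group("path")
        yield (
            None if path == "(null)" else path,
            match.group("fetch"),
            match.group("load"),
        )


if __name__ == "__main__":
    # Lines copied from the logs quoted in this report.
    sample = (
        "[ 331.343467] [drm:intel_guc_setup] GuC fw status: "
        "path i915/bxt_guc_ver8_7.bin, fetch FAIL, load NONE\n"
        "[ 351.405681] [drm:intel_guc_setup] GuC fw status: "
        "path (null), fetch NONE, load NONE\n"
        "[ 1.689144] [drm:intel_guc_setup] GuC fw status: "
        "path i915/bxt_guc_ver8_7.bin, fetch SUCCESS, load NONE\n"
    )
    for entry in guc_status_lines(sample):
        print(entry)
```

A change of the fetch field from SUCCESS to FAIL between consecutive boots of the same machine is the pattern that prompted the separate bug report.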
(In reply to david.s.gordon from comment #38)
> It should be moved to a separate bug report.

Reported here: bug 97275

Created attachment 125661 [details]
bsw-gem_exec_suspend__basic_S4-kern.log

I reproduced the GPU hang on a BSW production device.

Hardware: Acer Desktop
Motherboard: Aspire XC-704
CPU: Intel(R) Pentium(R) CPU N3700 @ 1.60GHz (Family 6, Model 76, Stepping 3)
GPU: Intel® HD Graphics - Intel Corporation Device 22b1 (rev 21)
Memory card: 1 card 4GB Hynix HMT451S6BFR8APB
HDD: Western Digital WDC WD10EZEX-21M (1TB)

Software:
Bios: R01-A2
Linux distribution: Ubuntu 16.04 64 bits
Kernel: 4.8.0-rc1 d0e3a4b from http://cgit.freedesktop.org/drm-intel/
commit d0e3a4b2e1743e3ed20327718b5cd069f6a39414
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Tue Aug 9 22:19:07 2016 +0200
drm-intel-nightly: 2016y-08m-09d-20h-18m-38s UTC integration manifest
drm: libdrm-2.4.70-2 b214b05 from git://anongit.freedesktop.org/mesa/drm
mesa: mesa-11.2.2 3a9f628 from git://anongit.freedesktop.org/mesa/mesa
cairo: 1.15.2 db8a7f1 from git://anongit.freedesktop.org/cairo
xserver: xorg-server-1.18.0-502 c833c08 from git://git.freedesktop.org/git/xorg/xserver
xf86-video-intel: 2.99.917-691 a77397a from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel
libva: libva-1.7.0-44 695f99e from git://git.freedesktop.org/git/vaapi/libva
vaapi-intel-driver: 1.7.0-66 fb7d6f5 from git://git.freedesktop.org/git/vaapi/intel-driver
Intel-Gpu-Tools 1.15-216 9afd545 from http://anongit.freedesktop.org/git/xorg/app/intel-gpu-tools.git

Created attachment 125662 [details]
bsw-gem_exec_suspend__basic_S4-output
Created attachment 125663 [details]
bsw-error
Still occurs with the following configuration:

Software information
============================================
Linux distribution : Ubuntu 16.04.1 LTS
Architecture : 64-bit
Bios revision : 148.11
Bios release date : 07/25/2016
KSC revision : 1.15

Hardware information
============================================
Platform : BXT-P
Motherboard model : Broxton P
Motherboard type : NOTEBOOK Hand Held
Motherboard manufacturer : Intel Corp.
CPU family : Other
CPU information : 06/5c
GPU Card : Intel Corporation Device 5a84 (rev 0a) (prog-if 00 [VGA controller])
Memory ram : 8 GB
CPU thread : 4
CPU core : 4

Firmwares information
============================================
DMC fw loaded : yes
DMC version : 1.7

Gfx stack
================================================
Component : drm
tag : libdrm-2.4.70-2-gb214b05
commit : b214b05
Component : cairo
tag : 1.15.2
commit : db8a7f1
Component : intel-gpu-tools
tag : intel-gpu-tools-1.15-245-g572a770
commit : 572a770

Kernel
================================================
commit f4f46e5544894b2198cdfd5a226ee587d9834cc4
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Mon Aug 29 16:09:42 2016 +0200
drm-intel-nightly: 2016y-08m-29d-14h-09m-23s UTC integration manifest

Upgrading priority since it is impacting APL. BTW, Christophe, Humberto, it looks like these tests pass on FI CI. Please confirm on your side.

(In reply to yann from comment #44)
> Upgrading priority since it is impacting APL. BTW, Christophe, Humberto, it
> looks like on FI CI, these tests are passed. Please confirm on you your side.

Hi Yann; this test fails on our side. In fact, when the DUT tries to resume, it reboots. This is with the following configuration:

Hardware information
============================================
Platform : BXT-P FAB2
Motherboard model : Broxton P
Motherboard type : NOTEBOOK Hand Held
Motherboard manufacturer : Intel Corp.
CPU information : 06/5c
GPU Card : Intel Corporation Device 5a84 (rev 0a) (prog-if 00 [VGA controller])
Memory ram : 16 GB
Maximum memory ram allowed : 16 GB
CPU thread : 4
CPU core : 4

Firmwares information
============================================
DMC fw loaded : yes
DMC version : 1.7

Gfx Stack
=======================================================
Component : drm
tag : libdrm-2.4.70-2-gb214b05
commit : b214b05ccd433c484a6a65e491a1a51b19e4811d
Component : cairo
tag : 1.15.2
commit : db8a7f1697c49ae4942d2aa49eed52dd73dd9c7a
Component : intel-gpu-tools
tag : intel-gpu-tools-1.15-245-g572a770
commit : 572a770f997cae6c3bcb76577e6eac61baa0afa3

Kernel
=======================================================
commit f4f46e5544894b2198cdfd5a226ee587d9834cc4
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Mon Aug 29 16:09:42 2016 +0200
drm-intel-nightly: 2016y-08m-29d-14h-09m-23s UTC integration manifest

Humberto,

Do you still see the hang? If not, this is a different issue, so please file a new bug and close this one.
This test keeps failing on BSW with the following configuration:

Software information
============================================
Kernel version : 4.8.0-rc4-drm-intel-nightly-ww37-commit-507a1d9+
Linux distribution : Ubuntu 16.04.1 LTS
Architecture : 64-bit
Kernel driver in use : i915
Bios revision : 0.33
Bios release date : 08/12/2015
KSC revision : 0.16

Hardware information
============================================
Platform : BSW
Motherboard model : 10G9000NUS
Motherboard type : BRASWELL Desktop
Motherboard manufacturer : LENOVO
CPU family : Pentium
CPU information : Intel(R) Pentium(R) CPU N3700 @ 1.60GHz
GPU Card : Intel Corporation Device 22b1 (rev 21) (prog-if 00 [VGA controller])
Memory ram : 8 GB
CPU thread : 4
CPU core : 4
Socket : Socket BGA1155
Signature : Type 0, Family 6, Model 76, Stepping 3

Kernel
============================================
commit 507a1d98d13f18acd36d9b81f4b316a3f79af00e
Author: Jani Nikula <jani.nikula@intel.com>
Date: Tue Sep 6 16:55:52 2016 +0300
drm-intel-nightly: 2016y-09m-06d-13h-55m-34s UTC integration manifest

Gfx Stack
==============================================
Component : drm
tag : libdrm-2.4.68
commit : fc09c5ab84240e9b6bd0bed01685ef004f56c4fa
Component : cairo
tag : 1.15.2
commit : db8a7f1697c49ae4942d2aa49eed52dd73dd9c7a
Component : intel-gpu-tools
tag : intel-gpu-tools-1.16
commit : a28e9e38a9efc6daf5a08d60d29adcd3e328fe6f

(In reply to yann from comment #47)
> Humberto,
>
> Do you still see hung? If not, this a different issue and therefore fill a
> new bug and close this one.
Hi Yann: regarding APL, I have an issue with the rtcwake tool. It works well by itself, but when it is launched from a C file it reports the following error: "PM swap header not found". I'll investigate in order to reproduce this issue.

Created attachment 126574 [details]
BDW__gem_exec_suspend--basic-S4__kern.log

The GPU hang is reproduced on BDW with a fresh setup.

Platform: NUC5i3RYB
CPU: Intel(R) Core(TM) i3-5010U CPU @ 2.10GHz (family 6, model 61, stepping 4)
Motherboard version: H41000-503
GPU: Intel® HD Graphics 5500 - Intel Corporation Broadwell-U Integrated Graphics (rev 09)
Memory: two 4 GB Crucial CT51264BF160B.C16F modules
SSD: INTEL SSDSC2BW48 480 GB

Software
Bios: RYBDWi35.86A.0358.2016.0606.1423 from https://downloadcenter.intel.com/downloads/eula/26081/BIOS-Update-RYBDWi35-86A-?httpDown=https%3A%2F%2Fdownloadmirror.intel.com%2F26081%2Feng%2FRY0358.bio
Linux distribution: Ubuntu 16.04 64 bits
Kernel: 4.8.0-rc5 bef9c1f from http://cgit.freedesktop.org/drm-intel/

commit bef9c1f4afe24cfff578d386bde349add65673eb
Author: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Date: Mon Sep 12 11:35:34 2016 +0300

drm-intel-nightly: 2016y-09m-12d-08h-35m-02s UTC integration manifest

libdrm-2.4.70-12 2d00869 from git://anongit.freedesktop.org/mesa/drm
mesa: mesa-11.2.2 3a9f628 from git://anongit.freedesktop.org/mesa/mesa
cairo 1.15.2 db8a7f1 from git://anongit.freedesktop.org/cairo
xorg-server-1.18.0-549 527c6ba from git://git.freedesktop.org/git/xorg/xserver
xf86-video-intel 2.99.917-703 15c5ff1 from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel
libva-1.7.0-47 2ebf897 from git://git.freedesktop.org/git/vaapi/libva
vaapi-intel-driver: 1.7.0-117 8c11f51 from git://git.freedesktop.org/git/vaapi/intel-driver
Intel-Gpu-Tools 1.16 f565b6c from http://anongit.freedesktop.org/git/xorg/app/intel-gpu-tools.git

Created attachment 126575 [details]
BDW_error
Created attachment 126576 [details]
BDW__gem_exec_suspend--basic-S4__output
Tested 3 times, reproduced 3 times
(In reply to cprigent from comment #51)
> Created attachment 126575 [details]
> BDW_error

Looks like the GPU resumed execution from before the saved portion of the ring buffer i.e. the context image was stale (RING_HEAD). Does

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index df10f4e95736..331c4a5c6822 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -945,7 +945,7 @@ int i915_gem_switch_to_kernel_context(struct drm_i915_private *dev_priv)
 			return PTR_ERR(req);

 		ret = i915_switch_context(req);
-		i915_add_request_no_flush(req);
+		i915_add_request(req);
 		if (ret)
 			return ret;
 	}

help?

Patch from Chris available at: https://patchwork.freedesktop.org/series/12592/

Created attachment 126704 [details]
BDW__with-patch-comment-54

The patch

(In reply to Chris Wilson from comment #54)
> Does
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
> b/drivers/gpu/drm/i915/i915_gem_context.c
> index df10f4e95736..331c4a5c6822 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -945,7 +945,7 @@ int i915_gem_switch_to_kernel_context(struct
> drm_i915_private *dev_priv)
> return PTR_ERR(req);
>
> ret = i915_switch_context(req);
> - i915_add_request_no_flush(req);
> + i915_add_request(req);
> if (ret)
> return ret;
> }
>
> help?

No. I reproduce it on BDW.

(In reply to yann from comment #55)
> Patch from Chris available at:
> https://patchwork.freedesktop.org/series/12592/

I tried several commits and tags. I'm not able to apply patch number 1.

commit bafb2f7d4755bf1571bd5e9a03b97f3fc4fe69ae
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Sep 21 14:51:08 2016 +0100

drm/i915/execlists: Reset RING registers upon resume

There is a disparity in the context image saved to disk and our own bookkeeping - that is we presume the RING_HEAD and RING_TAIL match our stored ce->ring->tail value.
However, as we emit WA_TAIL_DWORDS into the ring but may not tell the GPU about them, the GPU may be lagging behind our bookkeeping. Upon hibernation we do not save stolen pages, presuming that their contents are volatile. This means that although we start writing into the ring at tail, the GPU starts executing from its HEAD and there may be some garbage in between and so the GPU promptly hangs upon resume. Testcase: igt/gem_exec_suspend/basic-S4 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96526 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160921135108.29574-3-chris@chris-wilson.co.uk (In reply to Chris Wilson from comment #58) > commit bafb2f7d4755bf1571bd5e9a03b97f3fc4fe69ae > Author: Chris Wilson <chris@chris-wilson.co.uk> > Date: Wed Sep 21 14:51:08 2016 +0100 > > drm/i915/execlists: Reset RING registers upon resume > > There is a disparity in the context image saved to disk and our own > bookkeeping - that is we presume the RING_HEAD and RING_TAIL match our > stored ce->ring->tail value. However, as we emit WA_TAIL_DWORDS into the > ring but may not tell the GPU about them, the GPU may be lagging behind > our bookkeeping. Upon hibernation we do not save stolen pages, presuming > that their contents are volatile. This means that although we start > writing into the ring at tail, the GPU starts executing from its HEAD > and there may be some garbage in between and so the GPU promptly hangs > upon resume. 
> > Testcase: igt/gem_exec_suspend/basic-S4 > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96526 > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > Link: > http://patchwork.freedesktop.org/patch/msgid/20160921135108.29574-3- > chris@chris-wilson.co.uk with this commit and the following configuration on BXT this test pass : gem_exec_suspend basic-s4 Component : drm tag : libdrm-2.4.70-15-gabfa680 commit : abfa680 Component : cairo tag : 1.15.2-58-gb207a93 commit : b207a93 Component : intel-gpu-tools tag : intel-gpu-tools-1.16-36-gd16318a commit : d16318a Hardware information ============================================ Platform : BXT-P Motherboard model : Broxton P Motherboard type : NOTEBOOK Hand Held Motherboard manufacturer : Intel Corp. CPU family : Other CPU information : 06/5c GPU Card : Intel Corporation Device 5a84 (rev 0a) (prog-if 00 [VGA controller]) Memory ram : 16 GB Maximum memory ram allowed : 16 GB CPU thread : 4 CPU core : 4 (In reply to Humberto Israel Perez Rodriguez from comment #59) > (In reply to Chris Wilson from comment #58) > > commit bafb2f7d4755bf1571bd5e9a03b97f3fc4fe69ae > > Author: Chris Wilson <chris@chris-wilson.co.uk> > > Date: Wed Sep 21 14:51:08 2016 +0100 > > > > drm/i915/execlists: Reset RING registers upon resume > > > > There is a disparity in the context image saved to disk and our own > > bookkeeping - that is we presume the RING_HEAD and RING_TAIL match our > > stored ce->ring->tail value. However, as we emit WA_TAIL_DWORDS into the > > ring but may not tell the GPU about them, the GPU may be lagging behind > > our bookkeeping. Upon hibernation we do not save stolen pages, presuming > > that their contents are volatile. This means that although we start > > writing into the ring at tail, the GPU starts executing from its HEAD > > and there may be some garbage in between and so the GPU promptly hangs > > upon resume. 
> > > > Testcase: igt/gem_exec_suspend/basic-S4 > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96526 > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > > Link: > > http://patchwork.freedesktop.org/patch/msgid/20160921135108.29574-3- > > chris@chris-wilson.co.uk > > > with this commit and the following configuration on BXT this test pass : > > gem_exec_suspend basic-s4 > > Component : drm > tag : libdrm-2.4.70-15-gabfa680 > commit : abfa680 > > Component : cairo > tag : 1.15.2-58-gb207a93 > commit : b207a93 > > Component : intel-gpu-tools > tag : intel-gpu-tools-1.16-36-gd16318a > commit : d16318a > > > > Hardware information > ============================================ > Platform : BXT-P > Motherboard model : Broxton P > Motherboard type : NOTEBOOK Hand Held > Motherboard manufacturer : Intel Corp. > CPU family : Other > CPU information : 06/5c > GPU Card : Intel Corporation Device 5a84 (rev 0a) > (prog-if 00 [VGA controller]) > Memory ram : 16 GB > Maximum memory ram allowed : 16 GB > CPU thread : 4 > CPU core : 4 with the same gfx stack configuration and the same kernel this test pass as well in BDW platform : Hardware information ============================================ Platform : BDW Motherboard type : NUC5i5RYB Desktop CPU family : Core i5 CPU information : Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz GPU Card : Intel Corporation Broadwell-U Integrated Graphics (rev 09) (prog-if 00 [VGA controller]) Memory ram : 8 GB Maximum memory ram allowed : 16 GB CPU thread : 4 CPU core : 2 Socket : Socket BGA1168 Signature : Type 0, Family 6, Model 61, Stepping 4 *** Bug 98288 has been marked as a duplicate of this bug. *** *** Bug 99632 has been marked as a duplicate of this bug. *** *** Bug 99545 has been marked as a duplicate of this bug. *** *** Bug 99719 has been marked as a duplicate of this bug. *** *** Bug 99771 has been marked as a duplicate of this bug. 
*** *** Bug 99814 has been marked as a duplicate of this bug. *** *** Bug 101289 has been marked as a duplicate of this bug. *** *** Bug 101884 has been marked as a duplicate of this bug. *** *** Bug 101959 has been marked as a duplicate of this bug. *** (In reply to Chris Wilson from comment #18) > commit 5ab57c7020697942ea15f45ad14c69cecb164329 > Author: Chris Wilson <chris@chris-wilson.co.uk> > Date: Fri Jul 15 14:56:20 2016 +0100 This seems included in version 4.8-rc2: $ git describe bafb2f7d4755bf1571bd5e9a03b97f3fc4fe69ae v4.8-rc2-641-gbafb2f7d4755 I have kernel version 4.9.30: $ uname -v #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26) but I still experience the bug. This is because, if I understand it right, the commit containing this patch has been reverted in the production kernel: ################ commit 0ee72d8f9b8e17b8e4ccfebc7a25cbc2d395cd6a Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Date: Wed Apr 12 15:49:39 2017 +0200 Revert "drm/i915/execlists: Reset RING registers upon resume" This reverts commit f2a0409a08502d64fbe3990354dff5902b08d2fb which is commit bafb2f7d4755bf1571bd5e9a03b97f3fc4fe69ae upstream. It was reported to have problems. ################ https://lists.freedesktop.org/archives/intel-gfx/2017-April/125833.html I therefore wonder whether this means this bug is still there in the production kernel. Jani Nikula explained the history of this patch [1]: > (In reply to Damian Martinez Dreyer from comment #0) > > Description: I have bisected Kernel 4.9.9 and determined the following to be > > the cause: > > > > commit f2a0409a08502d64fbe3990354dff5902b08d2fb > > Author: Chris Wilson <chris@chris-wilson.co.uk> > > Date: Wed Sep 21 14:51:08 2016 +0100 > > > > drm/i915/execlists: Reset RING registers upon resume > > > > commit bafb2f7d4755bf1571bd5e9a03b97f3fc4fe69ae upstream. 
> > The stable backport has been reverted in v4.9.23 by > > commit 0ee72d8f9b8e17b8e4ccfebc7a25cbc2d395cd6a > Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > Date: Wed Apr 12 15:49:39 2017 +0200 > > Revert "drm/i915/execlists: Reset RING registers upon resume" > > This reverts commit f2a0409a08502d64fbe3990354dff5902b08d2fb which is > commit bafb2f7d4755bf1571bd5e9a03b97f3fc4fe69ae upstream. > > It was reported to have problems. > > Cc: Jani Nikula <jani.nikula@linux.intel.com> > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > Cc: Eric Blau <eblau1@gmail.com> > Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org > > Thread http://mid.mail-archive.com/1489443835.5568.7.camel@mailbox.org has > the details. [1] https://bugs.freedesktop.org/show_bug.cgi?id=100221#c10 *** Bug 102056 has been marked as a duplicate of this bug. *** *** Bug 102269 has been marked as a duplicate of this bug. *** *** Bug 102534 has been marked as a duplicate of this bug. *** *** Bug 102831 has been marked as a duplicate of this bug. *** *** Bug 103065 has been marked as a duplicate of this bug. *** *** Bug 103394 has been marked as a duplicate of this bug. *** *** Bug 103275 has been marked as a duplicate of this bug. *** |
Created attachment 124525 [details] dump from intel_error_decode Time 0:01:18.061184 IGT-Version: 1.15-g3ce58b6 (x86_64) (Linux: 4.7.0-rc1+ x86_64) rtcwake: wakeup from "disk" using /dev/rtc0 at Fri Jun 3 12:20:58 2016 Stack trace: #0 [__igt_fail_assert+0xf1] #1 [sig_abort+0x3a] #2 [killpg+0x40] #3 [ioctl+0x7] #4 [drmIoctl+0x28] #5 [__gem_execbuf+0x15] Stdout #6 [gem_has_ring+0x54] #7 [test_all+0x40] #8 [run_test+0x3fe] #9 [__real_main227+0x26f] #10 [main+0x23] #11 [__libc_start_main+0xf0] #12 [_start+0x29] #13 [<unknown>+0x29] Subtest basic-S4: FAIL (7.956s) (gem_exec_suspend:7902) DEBUG: Test requirement passed: gem_has_ring(fd, 0) (gem_exec_suspend:7902) DEBUG: Test requirement passed: can_mi_store_dword(gen, 0) (gem_exec_suspend:7902) DEBUG: Test requirement passed: nengine (gem_exec_suspend:7902) DEBUG: Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:7902) DEBUG: Test requirement passed: can_mi_store_dword(gen, engine) (gem_exec_suspend:7902) DEBUG: Test requirement passed: nengine (gem_exec_suspend:7902) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:7902) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:7902) DEBUG: Verifying result (gem_exec_suspend:7902) DEBUG: Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:7902) DEBUG: Test requirement passed: can_mi_store_dword(gen, engine) (gem_exec_suspend:7902) DEBUG: Test requirement passed: nengine (gem_exec_suspend:7902) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:7902) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:7902) DEBUG: Verifying result Stderr (gem_exec_suspend:7902) DEBUG: Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:7902) DEBUG: Test requirement passed: can_mi_store_dword(gen, engine) (gem_exec_suspend:7902) DEBUG: Test requirement passed: nengine 
(gem_exec_suspend:7902) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:7902) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:7902) DEBUG: Verifying result (gem_exec_suspend:7902) DEBUG: Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:7902) DEBUG: Test requirement passed: can_mi_store_dword(gen, engine) (gem_exec_suspend:7902) DEBUG: Test requirement passed: nengine (gem_exec_suspend:7902) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:7902) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:7902) DEBUG: Verifying result (gem_exec_suspend:7902) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:7902) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:7902) igt-core-DEBUG: Test requirement passed: !igt_run_in_simulation() (gem_exec_suspend:7902) igt-aux-DEBUG: Test requirement passed: system("rtcwake -n -s 30 -m disk" SQUELCH) == 0 (gem_exec_suspend:7902) DEBUG: Verifying result (gem_exec_suspend:7902) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:399: (gem_exec_suspend:7902) igt-aux-CRITICAL: Failed assertion: !"GPU hung" dmesg: [ 642.918976] [drm] stuck on blitter ring [ 642.918987] [drm] stuck on bsd ring [ 642.918993] [drm] stuck on video enhancement ring [ 642.930715] [drm] GPU HANG: ecode 8:1:0x5ccddf92, in gem_exec_suspen [6045], reason: Engine(s) hung, action: reset [ 642.930921] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace. [ 642.930925] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel [ 642.930929] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue. 
[ 642.930932] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 642.930935] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 642.938079] [drm:i915_set_reset_status [i915]] *ERROR* gpu hanging too fast, banning!

S3 and S4 work fine without issuing batch commands to the GPU.