Split off from Bug 96526 - [APL] [BAT execlists] Sporadic - gem_exec_suspend basic-s4 GPU hang after resume One of the logs attached to that bug (https://bugs.freedesktop.org/attachment.cgi?id=124911) shows the GuC firmware disappearing between reboots. Early in the log we have Jul 5 16:56:27 BXTP5 kernel: [ 1.689144] [drm:intel_guc_setup] GuC fw status: path i915/bxt_guc_ver8_7.bin, fetch SUCCESS, load NONE but 5 minutes later, on the next reboot cycle: Jul 5 17:01:44 BXTP5 kernel: [ 1.711802] i915 0000:00:02.0: Direct firmware load for i915/bxt_guc_ver8_7.bin failed with error -2 The kernel logs for each cycle look generally similar, but the order of some operations is not identical. In particular, the appearance of the MMC devices can come before OR after the attempt to load the GuC firmware. So this is really a completely different issue, related to the way that devices are initialised asynchronously w.r.t one another. Hence, this bug report has been opened to track it separately.
So to work out just what's happening here we need to know: Is the correct version of the firmware present in the root filesystem, an dif so where. We expect it to be /lib/firmware/i915/bxt_guc_ver8_7.bin And incidentally, is it a binary file or a symlink to the binary? Although they're generally equivalent, there are cases where symlinks will not be followed (although the error would probably be -EACCESS rather than -ENOENT). Secondly: is the binary also present in the ramdisk? lsinitramfs(8) can be used to examine the contents of the initrd file, e.g. $ lsinitramfs /boot/initrd.img-4.7.0-rc7-dsg-02211-g3b9ac6a-dsg-work-111 | grep -i guc lib/firmware/i915/skl_guc_ver6_1.bin shows that on this SKL machine, we have only version 6.1 of the GuC firmware. Thirdly, we need the mount table (/etc/fstab) so we know which device contains the root filesystem, and therefore when it gets mounted. And finally (for now) it would be useful to know what Kconfig options have been selected, as these can affect ramfs, i915, and device probing in general.
Updating priority/severity based on Feature failure w/o workaround
A similar but different occurence Command /opt/igt/tests/drv_module_reload_basic Stdout Reloading i915.ko with unbinding /sys/class/vtconsole/vtcon1/: (M) frame buffer device module successfully unloaded module successfully loaded again Reloading i915.ko with inject_load_failure=1 unbinding /sys/class/vtconsole/vtcon1/: (M) frame buffer device module successfully unloaded Reloading i915.ko with inject_load_failure=2 module successfully unloaded Reloading i915.ko with inject_load_failure=3 module successfully unloaded Reloading i915.ko with inject_load_failure=4 module successfully unloaded Reloading i915.ko with module successfully unloaded module successfully loaded again dmesg [ 164.447845] Setting dangerous option enable_psr - tainting kernel [ 164.482186] i915 0000:00:02.0: Failed to load DMC firmware [https://01.org/linuxgraphics/intel-linux-graphics-firmwares], disabling runtime power management. [ 165.476834] i915 0000:00:02.0: Direct firmware load for i915/skl_guc_ver6_1.bin failed with error -2 [ 165.476911] [drm:intel_guc_init [i915]] *ERROR* Failed to fetch GuC firmware from i915/skl_guc_ver6_1.bin (error -2) [ 174.256375] Setting dangerous option enable_psr - tainting kernel [ 174.256378] Setting dangerous option inject_load_failure - tainting kernel [ 174.584082] Setting dangerous option enable_psr - tainting kernel [ 174.584085] Setting dangerous option inject_load_failure - tainting kernel [ 174.948056] Setting dangerous option enable_psr - tainting kernel [ 174.948058] Setting dangerous option inject_load_failure - tainting kernel [ 175.286457] Setting dangerous option enable_psr - tainting kernel [ 175.286460] Setting dangerous option inject_load_failure - tainting kernel [ 175.638344] Setting dangerous option enable_psr - tainting kernel [ 175.667779] i915 0000:00:02.0: Failed to load DMC firmware [https://01.org/linuxgraphics/intel-linux-graphics-firmwares], disabling runtime power management. [ 176.652812] i915 0000:00:02.0: Direct firmware load for i915/skl_guc_ver6_1.bin failed with error -2 [ 176.652889] [drm:intel_guc_init [i915]] *ERROR* Failed to fetch GuC firmware from i915/skl_guc_ver6_1.bin (error -2) Here both the GuC firmware AND the DMC firmware seem to have gone missing. Unless machine (ro-skl3-i5-6260u) just doesn't have them installed?
(In reply to david.s.gordon from comment #3) > A similar but different occurence > > Command /opt/igt/tests/drv_module_reload_basic > > Stdout > > Reloading i915.ko with > unbinding /sys/class/vtconsole/vtcon1/: (M) frame buffer device > module successfully unloaded > module successfully loaded again > Reloading i915.ko with inject_load_failure=1 > unbinding /sys/class/vtconsole/vtcon1/: (M) frame buffer device > module successfully unloaded > Reloading i915.ko with inject_load_failure=2 > module successfully unloaded > Reloading i915.ko with inject_load_failure=3 > module successfully unloaded > Reloading i915.ko with inject_load_failure=4 > module successfully unloaded > Reloading i915.ko with > module successfully unloaded > module successfully loaded again > > dmesg > > [ 164.447845] Setting dangerous option enable_psr - tainting kernel > [ 164.482186] i915 0000:00:02.0: Failed to load DMC firmware > [https://01.org/linuxgraphics/intel-linux-graphics-firmwares], disabling > runtime power management. > [ 165.476834] i915 0000:00:02.0: Direct firmware load for > i915/skl_guc_ver6_1.bin failed with error -2 > [ 165.476911] [drm:intel_guc_init [i915]] *ERROR* Failed to fetch GuC > firmware from i915/skl_guc_ver6_1.bin (error -2) > [ 174.256375] Setting dangerous option enable_psr - tainting kernel > [ 174.256378] Setting dangerous option inject_load_failure - tainting kernel > [ 174.584082] Setting dangerous option enable_psr - tainting kernel > [ 174.584085] Setting dangerous option inject_load_failure - tainting kernel > [ 174.948056] Setting dangerous option enable_psr - tainting kernel > [ 174.948058] Setting dangerous option inject_load_failure - tainting kernel > [ 175.286457] Setting dangerous option enable_psr - tainting kernel > [ 175.286460] Setting dangerous option inject_load_failure - tainting kernel > [ 175.638344] Setting dangerous option enable_psr - tainting kernel > [ 175.667779] i915 0000:00:02.0: Failed to load DMC firmware > [https://01.org/linuxgraphics/intel-linux-graphics-firmwares], disabling > runtime power management. > [ 176.652812] i915 0000:00:02.0: Direct firmware load for > i915/skl_guc_ver6_1.bin failed with error -2 > [ 176.652889] [drm:intel_guc_init [i915]] *ERROR* Failed to fetch GuC > firmware from i915/skl_guc_ver6_1.bin (error -2) > > > Here both the GuC firmware AND the DMC firmware seem to have gone missing. > Unless machine (ro-skl3-i5-6260u) just doesn't have them installed? Actually, that is the problem for this specific case; the firmware isn't installed on that machine. That hasn't been causing failures because GuC submission has been disabled :( It also means DMC is not getting tested, which also goes unnoticed because DMC firmware load failure isn't flagged as an error :(
*** Bug 97275 has been marked as a duplicate of this bug. ***
@cprigent @elio Tried drv_module_reload_basic and gem_exec_suspend basic -S4. GuC get loaded in each case. Can you confirm this?
gem_exec_suspend basic -S4 is failing 50/100, some times sends errors regarding rtcwake but all the times the GUC is there [ 220.568647] [drm:intel_guc_setup [i915]] GuC fw status: path i915/bxt_guc_ver8_7.bin, fetch SUCCESS, load SUCCESS [ 220.568681] [drm:intel_guc_setup [i915]] GuC fw status: fetch SUCCESS, load PENDING [ 220.568779] [drm:intel_huc_load [i915]] i915/bxt_huc_ver01_07_1398.bin fw status: fetch SUCCESS, load SUCCESS [ 220.570899] [drm:intel_huc_load [i915]] HuC DMA transfer wait over with ret 0 [ 220.570932] [drm:intel_huc_load [i915]] i915/bxt_huc_ver01_07_1398.bin fw status: fetch SUCCESS, load SUCCESS Output. rtcwake: assuming RTC uses UTC ... rtcwake: wakeup from "disk" using /dev/rtc0 at Thu Oct 20 15:51:20 2016 rtcwake: write error (gem_exec_suspend:1091) igt-aux-CRITICAL: Test assertion failure function suspend_via_rtcwake, file igt_aux.c:710: (gem_exec_suspend:1091) igt-aux-CRITICAL: Failed assertion: system(cmd) == 0 (gem_exec_suspend:1091) igt-aux-CRITICAL: This failure means that something is wrong with the rtcwake tool or how your distro is set up. This is not a i915.ko or i-g-t bug. IGT-Version: 1.16-gf7408bf (x86_64) (Linux: 4.8.0-nightly-patch-anusha+ x86_64) Subtest basic-S4 failed. **** DEBUG **** (gem_exec_suspend:1091) DEBUG: Test requirement passed: gem_has_ring(fd, 0) (gem_exec_suspend:1091) DEBUG: Test requirement passed: can_mi_store_dword(gen, 0) (gem_exec_suspend:1091) DEBUG: Test requirement passed: nengine (gem_exec_suspend:1091) DEBUG: Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:1091) DEBUG: Test requirement passed: can_mi_store_dword(gen, engine) (gem_exec_suspend:1091) DEBUG: Test requirement passed: nengine (gem_exec_suspend:1091) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:1091) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:1091) DEBUG: Verifying result (gem_exec_suspend:1091) DEBUG: Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:1091) DEBUG: Test requirement passed: can_mi_store_dword(gen, engine) (gem_exec_suspend:1091) DEBUG: Test requirement passed: nengine (gem_exec_suspend:1091) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:1091) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:1091) DEBUG: Verifying result (gem_exec_suspend:1091) DEBUG: Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:1091) DEBUG: Test requirement passed: can_mi_store_dword(gen, engine) (gem_exec_suspend:1091) DEBUG: Test requirement passed: nengine (gem_exec_suspend:1091) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:1091) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:1091) DEBUG: Verifying result (gem_exec_suspend:1091) DEBUG: Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:1091) DEBUG: Test requirement passed: can_mi_store_dword(gen, engine) (gem_exec_suspend:1091) DEBUG: Test requirement passed: nengine (gem_exec_suspend:1091) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:1091) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:1091) DEBUG: Verifying result (gem_exec_suspend:1091) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:1091) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:1091) igt-core-DEBUG: Test requirement passed: !igt_run_in_simulation() (gem_exec_suspend:1091) igt-aux-DEBUG: Test requirement passed: (power_dir = open("/sys/power", O_RDONLY)) >= 0 (gem_exec_suspend:1091) igt-aux-DEBUG: Test requirement passed: get_supported_suspend_states(power_dir) & (1 << state) (gem_exec_suspend:1091) igt-aux-DEBUG: Test requirement passed: test == SUSPEND_TEST_NONE || faccessat(power_dir, "pm_test", R_OK | W_OK, 0) == 0 (gem_exec_suspend:1091) igt-aux-DEBUG: Test requirement passed: system(cmd) == 0 (gem_exec_suspend:1091) igt-aux-CRITICAL: Test assertion failure function suspend_via_rtcwake, file igt_aux.c:710: (gem_exec_suspend:1091) igt-aux-CRITICAL: Failed assertion: system(cmd) == 0 (gem_exec_suspend:1091) igt-aux-CRITICAL: This failure means that something is wrong with the rtcwake tool or how your distro is set up. This is not a i915.ko or i-g-t bug. **** END **** Stack trace: #0 [__igt_fail_assert+0x101] #1 [igt_system_suspend_autoresume+0x5a5] #2 [run_test+0x387] #3 [__real_main241+0x13e] #4 [main+0x23] #5 [__libc_start_main+0xf0] #6 [_start+0x29] #7 [<unknown>+0x29] Subtest basic-S4: FAIL (9.654s) (11:50 AM) [gfx@gfx-bxt] [tests]$ : dmesg | grep intel_guc_setup [ 3.150691] [drm:intel_guc_setup [i915]] GuC fw status: path i915/bxt_guc_ver8_7.bin, fetch SUCCESS, load NONE [ 3.150726] [drm:intel_guc_setup [i915]] GuC fw status: fetch SUCCESS, load PENDING [ 3.159459] [drm:intel_guc_setup [i915]] GuC fw status: fetch SUCCESS, load SUCCESS [ 64.319473] [drm:intel_guc_setup [i915]] GuC fw status: path i915/bxt_guc_ver8_7.bin, fetch SUCCESS, load SUCCESS [ 64.319499] [drm:intel_guc_setup [i915]] GuC fw status: fetch SUCCESS, load PENDING [ 64.332832] [drm:intel_guc_setup [i915]] GuC fw status: fetch SUCCESS, load SUCCESS [ 155.722796] [drm:intel_guc_setup [i915]] GuC fw status: path i915/bxt_guc_ver8_7.bin, fetch SUCCESS, load SUCCESS [ 155.722830] [drm:intel_guc_setup [i915]] GuC fw status: fetch SUCCESS, load PENDING [ 155.735341] [drm:intel_guc_setup [i915]] GuC fw status: fetch SUCCESS, load SUCCESS I just put drv_module_reload_basic in a short cycle (100 iterations, not able to reproduce the issue) Attaching logs
Created attachment 127436 [details] Dmesg log
Created attachment 127437 [details] Success log
Created attachment 127438 [details] Failure output
Configuration: Kernel: drm-intel-next-2016-09-19-15336-gcb978ac IGT: intel-gpu-tools-1.16-90-gf7408bf Component : drm url : http://cgit.freedesktop.org/mesa/drm tag : libdrm-2.4.68 commit : fc09c5ab84240e9b6bd0bed01685ef004f56c4fa Component : mesa url : http://cgit.freedesktop.org/mesa/mesa tag : mesa-12.0.1 commit : 04277f058d00238937e664cf546c43b16cea7b2b Component : xf86-video-intel url : http://cgit.freedesktop.org/xorg/driver/xf86-video-intel tag : 2.99.917-701-g205146b commit : 205146b0fdc8db016e5cfeeae5a6b25df3470ebc Component : libva url : http://cgit.freedesktop.org/libva tag : libva-1.7.2.pre1 commit : 9927bd2fbbb1d923fd6a8932a3cdbb5c9185ee22 Component : intel-driver url : http://cgit.freedesktop.org/vaapi/intel-driver tag : 1.7.2.pre1 commit : ce444fb412966ca6afbb1331b7cae8ab621c1108 Component : cairo url : http://cgit.freedesktop.org/cairo tag : 1.15.2 commit : db8a7f1697c49ae4942d2aa49eed52dd73dd9c7a Component : xserver url : http://cgit.freedesktop.org/xorg/xserver tag : xorg-server-1.18.3 commit : 9454cd51da9b38b974cff7c8b7125901f6403848 Component : macros url : https://cgit.freedesktop.org/xorg/util/macros tag : util-macros-1.19.0-2-gd7acec2
With latest configuration we are having the following, needs more investigation. With Latest configuration + kernel patches we are having a failed action just after returning from S4 using "rtcwake" [91291.107493] [drm:guc_ucode_xfer_dma [i915]] returning 0 [91291.107539] [drm:intel_guc_setup [i915]] GuC fw status: fetch SUCCESS, load SUCCESS [91291.127091] [drm] INTEL_GUC_SEND: Action 0x502 failed; ret=-110 status=0x00000502 response=0x40000000 Configuration: (Graphic Stack) Intel® Graphics for Linux* | 01.org ============================================ Software information ============================================ Kernel version : 4.9.0-rc8latestfirmware+ Linux distribution : Ubuntu 16.04.1 LTS Architecture : 64-bit Gfx stack code : 1771486656 Mesa version : Not found << Please see the message at the bottom >> xf86-video-intel version : Xorg-Xserver version : 1.18.3 DRM version : 2.4.67 VAAPI version : Not found Cairo version : 1.14.6 Intel GPU Tools version : Tag [intel-gpu-tools-1.17-41-ga56a1f6] / Commit [a56a1f6] Kernel driver in use : i915 Hardware acceleration : Bios revision : 144.10 Bios release date : 06/27/2016 KSC revision : 1.15 ============================================ Firmwares information ============================================ DMC fw loaded : yes DMC version : 1.7 GUC fw loaded : SUCCESS GUC version wanted : 8.7 GUC version found : 8.7 Kernel parameters ============================================ quiet drm.debug=0xe i915.enable_guc_loading=2 i915.enable_guc_submission=0 resume=/dev/sda3 fastboot
Same configuration, i just put reboot in a 100 iteration cycle, all the times i got something similar: 8_7.bin, fetch SUCCESS, load NONE [ 3.598334] [drm:intel_guc_setup [i915]] GuC fw status: fetch SUCCESS, load PENDING [ 3.606409] [drm:guc_ucode_xfer_dma [i915]] DMA status 0x10, GuC status 0x8002f0ec [ 3.606447] [drm:guc_ucode_xfer_dma [i915]] returning 0 [ 3.606484] [drm:intel_guc_setup [i915]] GuC fw status: fetch SUCCESS, load SUCCESS
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.