When running igt@drv_module_reload@basic-reload-final, the first iteration hits an assertion failure and any further invocation hard hangs the system: the network connection is lost, there is no console output, etc. I'm trying to collect a crash dump next. These might be two separate issues, but until that is confirmed, I think they should be considered related and handled together. The output of two consecutive invocations:

root@collab-x220:~/work/igt-gpu-tools# tests/drv_module_reload --run-subtest basic-reload-final --debug
IGT-Version: 1.18-g8039c0ef6e51 (x86_64) (Linux: 4.11.0-rc8.intel-boxes+ x86_64)
(drv_module_reload:1181) igt-core-DEBUG: Starting subtest: basic-reload-final
(drv_module_reload:1181) igt-kmod-DEBUG: Could not remove module drm_kms_helper (No such file or directory)
(drv_module_reload:1181) igt-kmod-DEBUG: Could not remove module drm (No such file or directory)
(drv_module_reload:1181) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(drv_module_reload:1181) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(drv_module_reload:1181) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(drv_module_reload:1181) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(drv_module_reload:1181) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(drv_module_reload:1181) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(drv_module_reload:1181) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(drv_module_reload:1181) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(drv_module_reload:1181) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(drv_module_reload:1181) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(drv_module_reload:1181) CRITICAL: Test assertion failure function store_dword, file drv_module_reload.c:111:
(drv_module_reload:1181) CRITICAL: Failed assertion: *batch == 0xc0ffee
(drv_module_reload:1181) CRITICAL: error: 0 != 12648430
Stack trace:
  #0 [__igt_fail_assert+0x16e]
  #1 [store_dword+0x35d]
  #2 [gem_exec_store+0x48]
  #3 [__real_main308+0x1e0]
  #4 [main+0x49]
  #5 [__libc_start_main+0xf1]
  #6 [_start+0x2a]
  #7 [<unknown>+0x2a]
Subtest basic-reload-final failed.
**** DEBUG ****
(drv_module_reload:1181) igt-kmod-DEBUG: Could not remove module drm_kms_helper (No such file or directory)
(drv_module_reload:1181) igt-kmod-DEBUG: Could not remove module drm (No such file or directory)
(drv_module_reload:1181) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(drv_module_reload:1181) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(drv_module_reload:1181) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(drv_module_reload:1181) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(drv_module_reload:1181) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(drv_module_reload:1181) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(drv_module_reload:1181) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(drv_module_reload:1181) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(drv_module_reload:1181) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(drv_module_reload:1181) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(drv_module_reload:1181) CRITICAL: Test assertion failure function store_dword, file drv_module_reload.c:111:
(drv_module_reload:1181) CRITICAL: Failed assertion: *batch == 0xc0ffee
(drv_module_reload:1181) CRITICAL: error: 0 != 12648430
**** END ****
Subtest basic-reload-final: FAIL (1.584s)
(drv_module_reload:1181) igt-core-DEBUG: Exiting with status code 99
root@collab-x220:~/work/igt-gpu-tools# tests/drv_module_reload --run-subtest basic-reload-final --debug
IGT-Version: 1.18-g8039c0ef6e51 (x86_64) (Linux: 4.11.0-rc8.intel-boxes+ x86_64)
(drv_module_reload:1200) igt-core-DEBUG: Starting subtest: basic-reload-final
^C
The key question is: if the GPU didn't write the dword where we wanted, where did it write it? The same worrying question applies to lots of other operations. The HWS is intact, otherwise it would have complained about a GPU hang... Actually no, that test isn't checking for a hang, so a more reasonable explanation is that the HWS write also went astray, i.e. it is likely that nothing is right in the way the GPU is addressing memory. In load/unload we do a GPU reset, so everything should be sane... Remove the store dword check and see if it still hard hangs. If so, we can start removing chunks from the load/unload sequence and see at what point it works.
(In reply to Chris Wilson from comment #1)
> The answer lies in if the GPU didn't write the dword to where we wanted,
> where did it write it? And the same worrying question will be for lots of
> different ops. The HWS is intact otherwise it would have complained about a
> gpu hang... Actually no, that test isn't checking so that's a more
> reasonable explanation that the HWS write also went astray, i.e. it is
> likely that nothing is right in the way the GPU is addressing memory.
>
> In load/unload we do a GPU reset, so everything should be sane...
>
> Remove the store dword check and see if it still hard hangs. If so, we can
> start removing chunks from the load/unload sequence and see at what point it
> works.

It doesn't hard hang without the store_dword sequence. The hang happens during gem_write, when exercising the blt engine. One interesting aspect is that the issue is only reproducible with intel_iommu=on. Without that parameter, the test succeeds and the system never hangs.
Krisman, can you add information regarding the platform?
Ignore last comment
Good afternoon, has the status of this bug changed recently? Is there any new information? Thanks.
Changing priority since this is an IGT basic test failure. Thanks.
Closing since:

commit 2fea8d26e589a9d256eca9f3d561750ecb3fb681
Author: Marius Vlad <marius.c.vlad@intel.com>
Date:   Thu Dec 1 14:23:57 2016 +0200

    tests/drv_module_reload: Convert sh script to C version.

    Cc: tomi.p.sarvela@intel.com
    Tested-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
    Reviewed-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

We always need to make sure there's a working driver, hence need to move the -final test into the igt_fixture.