Bug 92599

Summary: [BAT SNB BDW SKL] drv_module_reload_basic is failing
Product: DRI
Reporter: Daniel Vetter <daniel>
Component: DRM/Intel
Assignee: Joonas Lahtinen <joonas.lahtinen>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: blocker
Priority: highest
CC: intel-gfx-bugs, joonas.lahtinen
Version: DRI git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: BDW, SKL, SNB
i915 features:
Bug Depends on: 90963    
Bug Blocks:    
Attachments:
Description                              Flags
kern.log.bz2                             none
results.json.bz2                         none
drv-module-reload-basic_skl-y_kern.log   none

Description Daniel Vetter 2015-10-22 14:50:13 UTC
Noise in dmesg. Probably more platforms affected (we don't have all the debug options enabled yet, and a lot of the leak issues only get caught when unloading the module). Probably needs to be split up into a bug per actual issue.
Comment 1 cprigent 2015-10-28 10:32:26 UTC
Created attachment 119245 [details]
kern.log.bz2

Confirmed on SKL-Y:
Hardware:
Motherboard: Skylake Y
cpu model name : Intel(R) Core(TM) m5-6Y54 CPU @ 1.10GHz
cpu model : 78
cpu family : 6
Graphic card: Device 191e (rev 07)
Software:
Kernel: 4.3.0-rc7 drm-intel-nightly 34d1da7d864295c6411788d84b44567f029defd6 from git://anongit.freedesktop.org/drm-intel
commit 34d1da7d864295c6411788d84b44567f029defd6
Author: Jani Nikula <jani.nikula@intel.com>
Date:   Tue Oct 27 15:54:58 2015 +0200
drm-intel-nightly: 2015y-10m-27d-13h-54m-35s UTC integration manifest
Ubuntu 14.04.2 LTS
Bios: SKLSE2R1.R00.X100.B01.1509220551
Libdrm: 2.4.65

Log attached.
Comment 2 cprigent 2015-10-28 10:32:43 UTC
Created attachment 119246 [details]
results.json.bz2
Comment 3 Joonas Lahtinen 2015-11-25 14:10:08 UTC
Please, do not compress the plain text attachments.

Also, the kern.log seems to contain output from the gem_mmap and gem_mmap_gtt tests. Would it be possible to get a clean run, meaning that only the offending test is executed after boot and the log is captured right after it?

Judging by the results.json, there are consumers for the i915 module, so it cannot be removed. Is there a graphical desktop environment or other client running in addition to the test?
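
Whether anything still holds i915 can be checked from the use count in /proc/modules before the test is started. A minimal sketch, assuming Python is available on the test machine; the helper name module_users is illustrative and not part of IGT:

```python
def module_users(proc_modules_text, name="i915"):
    """Return (use_count, dependent_modules) for `name`, or None if not loaded.

    Expects text in the /proc/modules layout:
    name size use_count dependents state address
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != name:
            continue
        # use_count is "-" when the kernel is built without module unloading.
        use_count = int(fields[2]) if fields[2].isdigit() else None
        # Dependents are comma-terminated ("a,b,"); "-" means none.
        dependents = [d for d in fields[3].split(",") if d and d != "-"]
        return use_count, dependents
    return None
```

Feeding it the contents of /proc/modules gives the current state. A use count above zero with an empty dependents list would suggest a non-module holder such as an open DRM device file, but that is an inference, not something the attached logs confirm.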
Comment 4 cprigent 2015-11-25 15:23:09 UTC
Created attachment 120114 [details]
drv-module-reload-basic_skl-y_kern.log

Tested again on SKL-Y with the latest setup.

The test appears to pass, although no "success" message is printed.
# ./drv_module_reload_basic
unbinding /sys/class/vtconsole/vtcon1/: (M) frame buffer device
module successfully unloaded
module successfully loaded again

There are several errors in kernel log:
Nov 25 15:52:59 SKLY4 kernel: [   70.721619] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
Nov 25 15:53:01 SKLY4 kernel: [   72.046113] [drm:intel_dp_link_training_clock_recovery [i915]] *ERROR* too many full retries, give up
Nov 25 15:53:01 SKLY4 kernel: [   72.061783] [drm:intel_dp_link_training_clock_recovery [i915]] *ERROR* too many full retries, give up
Nov 25 15:53:01 SKLY4 kernel: [   72.077462] [drm:intel_dp_link_training_clock_recovery [i915]] *ERROR* too many full retries, give up
Nov 25 15:53:01 SKLY4 kernel: [   72.093128] [drm:intel_dp_link_training_clock_recovery [i915]] *ERROR* too many full retries, give up
Nov 25 15:53:01 SKLY4 kernel: [   72.108405] [drm:intel_dp_link_training_clock_recovery [i915]] *ERROR* too many full retries, give up
Nov 25 15:53:01 SKLY4 kernel: [   72.137256] [drm:intel_dp_link_training_clock_recovery [i915]] *ERROR* too many full retries, give up
Nov 25 15:53:01 SKLY4 kernel: [   72.137478] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
Nov 25 15:53:01 SKLY4 kernel: [   72.238813] [drm:intel_atomic_commit [i915]] *ERROR* mismatch in has_infoframe (expected 1, found 0)
Nov 25 15:53:01 SKLY4 kernel: [   72.415019] snd_hda_intel 0000:00:1f.3: spurious response 0x80000000:0x2, last cmd=0x000000
Nov 25 15:53:06 SKLY4 kernel: [   77.453504] hdaudio hdaudioC0D2: no AFG or MFG node found
Nov 25 15:53:06 SKLY4 kernel: [   77.453633] snd_hda_intel 0000:00:1f.3: no codecs initialized

Hardware
Platform: SKY LAKE Y A0 QUAL
CPU : Intel(R) Core(TM) M5-6Y54 @ 1.10GHz 4MB (family: 6, model: 78  stepping: 3)
MCP : SKL-Y  D1  
QDF : QJ9W
CPU : SKL D0
Chipset PCH: Sunrise Point LP C1       
CRB : SKY LAKE Y LPDDR3 RVP3 CRB FAB2
Reworks : All Mandatories + FBS02 & FBS03, O-06
Software 
Linux : Ubuntu 14.04 64 bits
BIOS : SKLSE2R1.R00.B104.B01.1511110114
ME FW : 11.0.0.1191
Ksc (EC FW): 1.20
Kernel 4.4.0-rc2 nightly 9e096bc from git://anongit.freedesktop.org/drm-intel
  commit 9e096bc5a20d1d8122740136ab6c584afd4cb913
  Author: Imre Deak <imre.deak@intel.com>
  Date:   Mon Nov 23 17:11:06 2015 +0200
  drm-intel-nightly: 2015y-11m-23d-15h-10m-47s UTC integration manifest
Mesa 11.0.5 from http://cgit.freedesktop.org/mesa/mesa/
xf86-video-intel - 2.99.917 from http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/
Libdrm - 2.4.65 from http://cgit.freedesktop.org/mesa/drm/
Libva - 1.6.1 from http://cgit.freedesktop.org/libva/
vaapi intel-driver - 1.6.1 from http://cgit.freedesktop.org/vaapi/intel-driver
Cairo - 1.14.2 from http://cgit.freedesktop.org/cairo
Xorg Xserver - 1.17.2 from http://cgit.freedesktop.org/xorg/xserver
IGT 1.12-gd84e624
Comment 5 Elio 2015-11-25 16:23:16 UTC
This problem is also present on a BYT-M BAT IGT run with the 4.4.0-rc1-nightly+ kernel.

Latest configuration, single run.
Comment 6 Joonas Lahtinen 2015-11-26 13:21:42 UTC
Looking at the log and the original bug report, and deducing from the fact that it is currently working for QA: the test failed due to noise in dmesg (failed DP link training and WARN_ON(!wm_changed)), both of which already appear during boot, before the module is unloaded. This should work just fine if you do not have any drm clients running, so it has to be run in text mode.
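
To tell boot-time noise apart from messages caused by the reload, the kernel log can be split at the time the module was unloaded. A rough sketch, assuming kern.log lines carry the usual "[ seconds.micros]" timestamp; the helper name and the cutoff value are hypothetical and would have to be read from the log by hand:

```python
import re

# Matches the kernel timestamp, e.g. "[   70.721619]".
TS_RE = re.compile(r"\[\s*(\d+\.\d+)\]")


def split_errors_by_time(log_text, cutoff):
    """Return (before, after): *ERROR*/WARNING lines split at `cutoff` seconds."""
    before, after = [], []
    for line in log_text.splitlines():
        if "*ERROR*" not in line and "WARNING" not in line:
            continue
        m = TS_RE.search(line)
        if m is None:
            continue  # no timestamp, cannot be placed on the timeline
        target = before if float(m.group(1)) < cutoff else after
        target.append(line)
    return before, after
```

Anything that lands in the first list was already present before the unload and belongs to a separate report.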

The wm_changed bug seems to be SKL related, so it would be interesting to see the kernel log for BYT-M. I will pass the information from this report on to the respective bugs.

*** This bug has been marked as a duplicate of bug 89055 ***
Comment 7 Daniel Vetter 2015-11-26 13:23:59 UTC
Ok, I think we need to split this up into per-case reports. I'm closing this one (shame on me for filing such a lame bug report), please everyone file new reports.

Module reload can hit all paths, so you need to double-check dmesg to make sure that it's not a different issue: FIFO underruns, lockdep splats, WARN backtraces and so on are all different bugs.
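
One way to do that triage is to group the drm *ERROR* lines by the function that emitted them, so each distinct failure can go into its own report. A minimal sketch, assuming the "[drm:function [module]] *ERROR* message" layout seen in the logs attached here; the helper name is illustrative, and lockdep splats or WARN backtraces would need their own patterns:

```python
import re
from collections import Counter

# "[drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting"
ERROR_RE = re.compile(
    r"\[drm:(?P<func>\w+)(?: \[\w+\])?\] \*ERROR\* (?P<msg>.*)$"
)


def group_drm_errors(log_text):
    """Count drm *ERROR* lines per (function, message) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        m = ERROR_RE.search(line)
        if m:
            counts[(m.group("func"), m.group("msg").strip())] += 1
    return counts
```

Each key in the result is a candidate for a separate bug; the counts only show how noisy each one is.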
Comment 8 Jari Tahvanainen 2016-09-30 09:36:24 UTC
Closing.
