| Summary: | [SKL]Time out and system reboot fails while running IGT cases: gem_ringfill/render, gem_ringfill/render-interruptible | | |
|---|---|---|---|
| Product: | DRI | Reporter: | fangxun <xunx.fang> |
| Component: | DRM/Intel | Assignee: | cprigent <christophe.prigent> |
| Status: | CLOSED WORKSFORME | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| Severity: | major | | |
| Priority: | high | CC: | christophe.prigent, intel-gfx-bugs, michel.thierry, thomas.daniel |
| Version: | unspecified | | |
| Hardware: | All | | |
| OS: | Linux (All) | | |
| Whiteboard: | | | |
| i915 platform: | SKL | i915 features: | GEM/Other |
| Attachments: | | | |
Michel, have you seen this one? It's hard to capture logs since the system hangs pretty hard, but I saw one that was a bad I/O access in the iowrite32 in intel_logical_ring_emit(), which sent me searching for our virtual_start mapping setup. That led me to something like this:

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index fcb074b..bc97457 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -504,8 +504,11 @@ static int execlists_context_queue(struct intel_engine_cs *
 	unsigned long flags;
 	int num_elements = 0;

-	if (to != ring->default_context)
-		intel_lr_context_pin(ring, to);
+	if (to != ring->default_context) {
+		ret = intel_lr_context_pin(ring, to);
+		if (ret)
+			return ret;
+	}

 	if (!request) {
 		/*
@@ -802,13 +805,16 @@ intel_logical_ring_advance_and_submit(struct intel_ringbuf
 		struct drm_i915_gem_request *request)
 {
 	struct intel_engine_cs *ring = ringbuf->ring;
+	int ret;

 	intel_logical_ring_advance(ringbuf);

 	if (intel_ring_stopped(ring))
 		return;

-	execlists_context_queue(ring, ctx, ringbuf->tail, request);
+	ret = execlists_context_queue(ring, ctx, ringbuf->tail, request);
+	if (ret)
+		DRM_ERROR("execlist context queue failed: %d\n", ret);
 }

 static int intel_lr_context_pin(struct intel_engine_cs *ring,

but that's not sufficient to fix this bug. It does seem important that we check these return values, though. And this failure may indicate something wrong with the lrc handling code; I'm not sure. Some additional, custom kernel debug code would probably help narrow things down.

Those tests pass on BDW, so there must be something we need to change for SKL. I'll try to find one in the office.

Command submission hang with "reset button does not work" is something I've been experiencing "forever" on my SKL. In my case the reset button actually works, but with a ~20 second delay (same with power off). I was reproducing it with gem_exec_nop, or actually with any other submission, but much less frequently. So any IGT test can hang, since each one does a submission on startup. I was able to get occasional lockdep traces over serial when it happens, but extremely rarely, and they would point to seemingly impossible locking scenarios. I can try to dig them out if we think it is the same bug.

This may be a duplicate of https://bugs.freedesktop.org/show_bug.cgi?id=88865 and may be fixed by the 'OLR removal' patch set.

Assigning to QA for duplication; could be fixed already or hidden by the ringfill hard hangs #90854.

I've tested the following test cases with drm-intel-testing and drm-intel-nightly, and the tests passed on SKL-Y with both kernels.

Test cases tested:
./gem_ringfill --run-subtest render
./gem_ringfill --run-subtest render-interruptible

Kernel: latest drm-intel-nightly: 2015y-11m-06d-12h-48m-02s UTC integration manifest
commit a3b0dec82fdb59c629c4fb9847245b80b0cf69dd
Author: Jani Nikula <jani.nikula@intel.com>
Date: Fri Nov 6 14:48:23 2015 +0200

Kernel: latest drm-intel-testing (4.3.0-rc6-testing)
commit 87074657f22e38163e712ca417e1a398d00096b6
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Fri Oct 23 11:56:52 2015 +0200

Software configuration:
--------------------------------
Ubuntu 14.04.03 x86_64
Xserver: 1.17.4 (commit: 2c7fa2a)
libdrm: 2.4.65 (commit: c349616)
Xf86-video-intel: 2.99.917 (commit: baec802)
Mesa: 11.0.4 (commit: 31bf247)
Libva: 1.6.1 (commit: 613eb96)
Intel-driver: 1.6.1 (commit: 35858c6)
Cairo: 1.14.4 (commit: 0317ee7)

--- Hardware information ---
CPU information: Intel(R) Core(TM) m5-6Y57 CPU @ 1.10GHz
GPU Card: Intel Corporation Device 191e (rev 07) (prog-if 00 [VGA controller])
Bios: 102.0
KSC: 1.15
Memory ram: 4 GB

So I will proceed to close this bug as fixed; if the problem shows up again in the future, please reopen it.

So closed
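For reference, the return-value checking that the patch in the first comment argues for can be sketched in isolation. This is a minimal user-space sketch, not driver code: `struct ctx`, `pin_context()`, `queue_context()` and `advance_and_submit()` are hypothetical stand-ins for the i915 context type, `intel_lr_context_pin()`, `execlists_context_queue()` and `intel_logical_ring_advance_and_submit()`, and the `pin_fails` field exists only to simulate a pin failure.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a logical ring context; not the i915 type. */
struct ctx {
	int is_default; /* non-zero for the default context, which is never pinned here */
	int pin_fails;  /* illustration knob: force pin_context() to fail */
	int pin_count;  /* number of successful pins */
	int queued;     /* number of times the context was queued */
};

/* Stand-in for intel_lr_context_pin(): returns 0 or a negative errno. */
static int pin_context(struct ctx *c)
{
	if (c->pin_fails)
		return -ENOMEM;
	c->pin_count++;
	return 0;
}

/* Stand-in for execlists_context_queue(): a pin error aborts the queueing. */
static int queue_context(struct ctx *c)
{
	int ret;

	if (!c->is_default) {
		ret = pin_context(c);
		if (ret)
			return ret; /* do not queue an unpinned context */
	}

	c->queued++;
	return 0;
}

/* Stand-in for the void submit path: it cannot return the error, so it logs it. */
static void advance_and_submit(struct ctx *c)
{
	int ret = queue_context(c);

	if (ret)
		fprintf(stderr, "context queue failed: %d\n", ret);
}
```

With this shape, a failed pin leaves the context unqueued and the caller sees either a negative errno or a log message, instead of the failure being dropped silently. As the comment notes, this kind of checking did not by itself fix the hang.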
Created attachment 113215
dmes file

==System Environment==
--------------------------
Regression: not sure
Non-working platforms: SKL

==kernel==
--------------------------
drm-intel-nightly/9583cb

==Bug detailed description==
-----------------------------
Time out while running IGT cases: gem_ringfill/render, gem_ringfill/render-interruptible. System failed to reboot after that.

==Reproduce Steps==
--------------------------
./gem_ringfill --run-subtest render
./gem_ringfill --run-subtest render-interruptible