Summary: | [BDW/BYT Bisected semaphores] igt/gem_gtt_hog fails, gem_reset_stats and more | |
---|---|---|---
Product: | DRI | Reporter: | Guo Jinxian <jinxianx.guo>
Component: | DRM/Intel | Assignee: | Elio <elio.martinez.monroy>
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: | major | |
Priority: | highest | CC: | ben, intel-gfx-bugs
Version: | unspecified | |
Hardware: | Other | |
OS: | All | |
Whiteboard: | | |
i915 platform: | BYT | i915 features: |
Attachments: | | |
Description
Guo Jinxian
2014-07-11 06:43:11 UTC
Created attachment 102597 [details]
dmesg
Please attach the error state as that may provide the vital clue, and please bisect.

This bug is still reproducible on the latest -nightly() while running the test igt/kms_flip/flip-vs-panning-vs-hang:

./kms_flip --run-subtest flip-vs-panning-vs-hang
IGT-Version: 1.7-g8d60b82 (x86_64) (Linux: 3.16.0-rc5_drm-intel-nightly_778206_20140716+ x86_64)
Using monotonic timestamps
Beginning flip-vs-panning-vs-hang on crtc 7, connector 18
1920x1080 60 1920 1966 1996 2080 1080 1082 1086 1112 0xa 0x48 138780
..
flip-vs-panning-vs-hang on crtc 7, connector 18: PASSED
Beginning flip-vs-panning-vs-hang on crtc 11, connector 18
1920x1080 60 1920 1966 1996 2080 1080 1082 1086 1112 0xa 0x48 138780
...
flip-vs-panning-vs-hang on crtc 11, connector 18: PASSED
Beginning flip-vs-panning-vs-hang on crtc 15, connector 18
1920x1080 60 1920 1966 1996 2080 1080 1082 1086 1112 0xa 0x48 138780
...
flip-vs-panning-vs-hang on crtc 15, connector 18: PASSED
Subtest flip-vs-panning-vs-hang: SUCCESS
Test assertion failure function gem_quiescent_gpu, file drmtest.c:149:
Last errno: 5, Input/output error
Failed assertion: drmIoctl((fd), ((((1U) << (((0+8)+8)+14)) | ((('d')) << (0+8)) | (((0x40 + 0x29)) << 0) | ((((sizeof(struct drm_i915_gem_execbuffer2)))) << ((0+8)+8)))), (&execbuf)) == 0
kms_flip: igt_core.c:651: igt_fail: Assertion `!test_with_subtests || in_fixture' failed.
Aborted (core dumped)

(In reply to comment #2)
> Please attach the error state as that may provide the vital clue, and please
> bisect.

521e62e49a42661a4ee0102644517dbe2f100a23 is the first bad commit
commit 521e62e49a42661a4ee0102644517dbe2f100a23
Author: Rodrigo Vivi <rodrigo.vivi@intel.com>
AuthorDate: Mon Jun 30 09:53:44 2014 -0700
Commit: Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Mon Jul 7 23:16:56 2014 +0200

    drm/i915: Enable semaphores on BDW

    Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

The error state file is too big to attach in Bugzilla, so I will send it via email.
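The failed assertion in the kms_flip log above is the preprocessor expansion of an ioctl request number, which makes it hard to read. The sketch below recomposes that number from the shifts visible in the expansion (nr at bit 0, type at bit 8, size at bit 16, direction at bit 30) and splits it back into fields. The struct size is not stated anywhere in the log, so the value 64 used here is an assumption that should be checked against the installed i915 headers; the helper names are hypothetical.

```python
import errno

# Bit positions copied from the expanded macro in the log:
# (0), (0+8), ((0+8)+8) and (((0+8)+8)+14).
NR_SHIFT = 0
TYPE_SHIFT = NR_SHIFT + 8
SIZE_SHIFT = TYPE_SHIFT + 8
DIR_SHIFT = SIZE_SHIFT + 14


def ioctl_request(direction: int, type_char: str, nr: int, size: int) -> int:
    """Compose a request number the same way the expanded macro does."""
    return (
        (direction << DIR_SHIFT)
        | (ord(type_char) << TYPE_SHIFT)
        | (nr << NR_SHIFT)
        | (size << SIZE_SHIFT)
    )


def decode_ioctl_request(request: int) -> dict:
    """Split a request number back into its four fields."""
    return {
        "direction": (request >> DIR_SHIFT) & 0x3,
        "type": chr((request >> TYPE_SHIFT) & 0xFF),
        "nr": (request >> NR_SHIFT) & 0xFF,
        "size": (request >> SIZE_SHIFT) & 0x3FFF,
    }


# ASSUMPTION: 64 stands in for sizeof(struct drm_i915_gem_execbuffer2).
# The log does not give the size; verify it against your own headers.
ASSUMED_EXECBUF2_SIZE = 64

request = ioctl_request(direction=1, type_char="d", nr=0x40 + 0x29,
                        size=ASSUMED_EXECBUF2_SIZE)
print(hex(request), decode_ioctl_request(request))

# "Last errno: 5" in the log corresponds to this symbolic name.
print("errno 5 is", errno.errorcode[5])
```

Decoding the request this way only identifies which ioctl returned the error; it says nothing about why the GPU reported an I/O error, which is what the error state is for.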
Error-state says: IPEHR 0x54300005 on render ring. Those semaphores are not serialising properly. Fix or revert.

Hi Guo, can you please send me the error state?

(In reply to comment #6)
> Hi Guo, can you please send me the error state?

I forwarded the error state to you by email. Please check your email. Thanks.

*** Bug 81317 has been marked as a duplicate of this bug. ***

There are some things in the error state that look slightly suspicious (it kind of looks like mesa is also running).

Since we cannot reproduce it locally, can you please generate another error state? The compressed error state should fit as an attachment.

Thanks.

Created attachment 103244 [details]
i915 error state

(In reply to comment #9)
> There are some things in the error state that look slightly suspicious (it
> kind of looks like mesa is also running).
>
> Since we cannot reproduce it locally, can you please generate another error
> state? The compressed error state should fit as an attachment.
>
> Thanks.

Updated the i915 error state.

Created attachment 103308 [details]
error state with "i915.semaphore=0"
Created attachment 103309 [details]
error state with "i915.semaphore=1"
Something is not right with the semaphore = 0 error state. Semaphore signals and waits are still being emitted. Can you please confirm that you set the module parameter correctly for that test by reproducing, and provide the dmesg and a new error state?

Created attachment 103311 [details]
dmesg

(In reply to comment #13)
> Something is not right with the semaphore = 0 error state. Semaphores
> signals and waits are still being emitted. Can you please confirm you
> correct set the module parameter for that test by reproducing, and providing
> the dmesg and new error state?

I misspelled "i915.semaphores=0" as "i915.semaphore=0" last time. With the correct parameter "i915.semaphores=0", the test passed on the latest -nightly (411fa8b275ee903df6c07976af4eebe5815646a4):

[root@x-bdw01 tests]# time ./gem_gtt_hog
IGT-Version: 1.7-g70e6ed9 (x86_64) (Linux: 3.16.0-rc6_drm-intel-nightly_411fa8_20140722+ x86_64)
Time to execute 64 children: 93781.680ms
real 1m33.837s
user 0m0.018s
sys 0m0.804s

Created attachment 103312 [details]
error state with "i915.semaphores=0"

(In reply to comment #14)
> Created attachment 103311 [details]
> dmesg
>
> (In reply to comment #13)
> > Something is not right with the semaphore = 0 error state. Semaphores
> > signals and waits are still being emitted. Can you please confirm you
> > correct set the module parameter for that test by reproducing, and providing
> > the dmesg and new error state?
>
> I misspelled "i915.semaphores=0" to "i915.semaphore=0" last time. With
> correct parameter "i915.semaphores=0", the test was passed on latest
> -nightly(411fa8b275ee903df6c07976af4eebe5815646a4)
>
> [root@x-bdw01 tests]# time ./gem_gtt_hog
> IGT-Version: 1.7-g70e6ed9 (x86_64) (Linux:
> 3.16.0-rc6_drm-intel-nightly_411fa8_20140722+ x86_64)
> Time to execute 64 children: 93781.680ms
>
> real 1m33.837s
> user 0m0.018s
> sys 0m0.804s

Updated the error state with the correct parameter.

Created attachment 103569 [details]
dmesg
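The earlier mix-up between "i915.semaphore" and "i915.semaphores" is easy to repeat, because a misspelled option does not change the driver's behaviour. As a minimal sketch, assuming the standard sysfs layout where loaded modules expose readable parameters under /sys/module/<module>/parameters/, the helper below compares the options on the kernel command line against that directory. The function names are hypothetical, and a parameter declared without sysfs permissions would be reported as unknown even though it is valid, so treat a hit as a prompt to double-check rather than proof of a typo.

```python
from pathlib import Path
from typing import List, Optional


def unknown_module_options(cmdline: str, module: str, params_dir: Path) -> List[str]:
    """Return option names given as 'module.name=...' that have no sysfs entry."""
    known = {p.name for p in params_dir.iterdir()} if params_dir.is_dir() else set()
    unknown = []
    for token in cmdline.split():
        if not token.startswith(module + "."):
            continue
        # Strip the "module." prefix and any "=value" suffix.
        name = token[len(module) + 1:].split("=", 1)[0]
        # The kernel treats '-' and '_' in parameter names as equivalent.
        if name.replace("-", "_") not in known:
            unknown.append(name)
    return unknown


def read_param(params_dir: Path, name: str) -> Optional[str]:
    """Read the current value of a parameter, or None if it cannot be read."""
    try:
        return (params_dir / name).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    params = Path("/sys/module/i915/parameters")
    proc_cmdline = Path("/proc/cmdline")
    cmdline = proc_cmdline.read_text() if proc_cmdline.exists() else ""
    for name in unknown_module_options(cmdline, "i915", params):
        print(f"i915.{name} has no entry under {params}")
    print("semaphores =", read_param(params, "semaphores"))
```

Running this as root before capturing an error state gives a quick confirmation that the value the driver is actually using matches the one named in the attachment.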
The test failed on the latest -nightly (e967a525207bd40ab446e2f809907039f88e66f3), but the failure is a little different. I am not sure whether it is the same issue.
[root@x-bdw01 tests]# time ./gem_gtt_hog
IGT-Version: 1.7-gfcbc502 (x86_64) (Linux: 3.16.0-rc6_drm-intel-nightly_e967a5_20140727+ x86_64)
Test assertion failure function run, file gem_gtt_hog.c:152:
Failed assertion: x == canary
Test assertion failure function run, file gem_gtt_hog.c:152:
Failed assertion: x == canary
Test assertion failure function run, file gem_gtt_hog.c:152:
Failed assertion: x == canary
Test assertion failure function run, file gem_gtt_hog.c:152:
Failed assertion: x == canary
Test assertion failure function __real_main156, file gem_gtt_hog.c:187:
Failed assertion: status == 0
real 9m37.636s
user 0m0.007s
sys 0m0.179s
[root@x-bdw01 tests]#
[root@x-bdw01 tests]# echo $?
99
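When deciding whether a new failure is the same issue as the original report, it can help to reduce the log to the distinct assertion sites. The sketch below is one way to do that, assuming the message layout seen in the logs in this report ("Test assertion failure function F, file X:N:" followed by "Failed assertion: EXPR"). The parser and its names are hypothetical and are not part of intel-gpu-tools.

```python
import re
from collections import Counter

FAILURE_RE = re.compile(
    r"Test assertion failure function (?P<function>\w+), "
    r"file (?P<file>[\w./-]+):(?P<line>\d+):"
)
ASSERTION_RE = re.compile(r"Failed assertion: (?P<expr>.+)")


def parse_failures(log: str) -> list:
    """Collect one record per assertion failure found in the log text."""
    failures = []
    current = None
    for raw in log.splitlines():
        match = FAILURE_RE.search(raw)
        if match:
            current = {
                "function": match["function"],
                "file": match["file"],
                "line": int(match["line"]),
                "assertion": None,
            }
            failures.append(current)
            continue
        match = ASSERTION_RE.search(raw)
        # Attach the expression to the most recent failure header only.
        if match and current is not None and current["assertion"] is None:
            current["assertion"] = match["expr"].strip()
    return failures


def summarize(failures: list) -> Counter:
    """Count how often each (file, line, assertion) site was hit."""
    return Counter((f["file"], f["line"], f["assertion"]) for f in failures)


SAMPLE_LOG = """\
Test assertion failure function run, file gem_gtt_hog.c:152:
Failed assertion: x == canary
Test assertion failure function run, file gem_gtt_hog.c:152:
Failed assertion: x == canary
Test assertion failure function __real_main156, file gem_gtt_hog.c:187:
Failed assertion: status == 0
"""

for site, count in summarize(parse_failures(SAMPLE_LOG)).items():
    print(count, site)
```

Because the patterns are searched rather than anchored, lines carrying a "(test:pid) CRITICAL:" prefix are matched as well.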
It does not seem to be exactly the same failure as when we filed the bug. Can we close this bug and file a new one to track the new issue?

*** Bug 81831 has been marked as a duplicate of this bug. ***

Is this still happening on latest -nightly?

It has semaphores disabled by default, but judging by the latest duplicate it seems this is no longer related to semaphores, right?

(In reply to comment #19)
> Is this still happening on latest -nightly?
>
> It contains the semaphores disabled by default, but per latest duplicated
> mark it seems it isn't anymore related to semaphores, right?

The failure cannot be reproduced on the latest -nightly (0f7cc12c94e3a3033a46ce41bed55e8b6b35561b):

root@x-bdw05:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# ./gem_gtt_hog
IGT-Version: 1.7-g5f16ef6 (x86_64) (Linux: 3.17.0-rc6_drm-intel-nightly_0f7cc1_20140925+ x86_64)
Time to execute 64 children: 72740.048ms
root@x-bdw05:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# echo $?
0

(In reply to comment #19)
> It contains the semaphores disabled by default, but per latest duplicated
> mark it seems it isn't anymore related to semaphores, right?

No. The last duplicate was because gem_seqno_wrap failed with semaphores enabled but passed with semaphores disabled...

Ok, so with semaphores disabled we can close this for now.

Created a Jira task to investigate semaphores further later:
https://jira01.devtools.intel.com/browse/VIZ-4400

Verified.

Closing verified+fixed.
The problem with gem_seqno_wrap is still happening on BYT with the following configuration:

Platform BYT: Acer Aspire XC-603
CPU: Intel(R) Pentium(R) CPU J2900 @ 2.41GHz (family 6, model 55, stepping 8)
Motherboard: Aspire XC-603
GPU: Intel® HD Graphics - Intel Corporation Atom Processor Z36xxx/Z37xxx Series Graphics & Display (rev 0e)

Software
Bios: P11-B2
Linux distribution: Ubuntu 16.04 64 bits
Kernel: 4.9.0-rc4 91e164f branch drm-intel-nightly from http://cgit.freedesktop.org/drm-intel/
commit 91e164fea17d3e5366048b6eae3c6eea4e14e9fe
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date: Mon Nov 14 16:31:06 2016 +0200
drm-intel-nightly: 2016y-11m-14d-14h-30m-30s UTC integration manifest
libdrm-2.4.71-13 670f1e4 from git://anongit.freedesktop.org/mesa/drm
mesa: mesa-13.0.0 df1b0a5 from git://anongit.freedesktop.org/mesa/mesa
cairo 1.15.2 db8a7f1 from git://anongit.freedesktop.org/cairo
xorg-server-1.18.99.902-2 7513da4 from git://git.freedesktop.org/git/xorg/xserver
xf86-video-intel 2.99.917-731 d1d14f2 from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel
libva-1.7.2-40 3a7547b from git://git.freedesktop.org/git/vaapi/libva
vaapi-intel-driver: 1.7.2-157 55a538c from git://git.freedesktop.org/git/vaapi/intel-driver
intel-gpu-tools-1.16-132 773ac7c from http://anongit.freedesktop.org/git/xorg/app/intel-gpu-tools.git
External screen: DELL U2312HM (VGA)

errors:
sudo ./gem_seqno_wrap
IGT-Version: 1.16-g40a3ada (x86_64) (Linux: 4.9.0-rc6-nightly+ x86_64)
seqno readback differs rb:0x0 vs w:0x1
(gem_seqno_wrap:6004) CRITICAL: Test assertion failure function preset_run_once, file gem_seqno_wrap.c:383:
(gem_seqno_wrap:6004) CRITICAL: Failed assertion: write_seqno(1) == 0
(gem_seqno_wrap:6004) CRITICAL: error: -1 != 0
Stack trace:
#0 [__igt_fail_assert+0xf1]
#1 [main+0x382]
#2 [__libc_start_main+0xf0]
#3 [_start+0x29]
#4 [<unknown>+0x29]
Test gem_seqno_wrap failed.
**** DEBUG ****
(gem_seqno_wrap:6004) DEBUG: next_seqno set to: 0x1
(gem_seqno_wrap:6004) DEBUG: next_seqno: 0x0
(gem_seqno_wrap:6004) INFO: seqno readback differs rb:0x0 vs w:0x1
(gem_seqno_wrap:6004) CRITICAL: Test assertion failure function preset_run_once, file gem_seqno_wrap.c:383:
(gem_seqno_wrap:6004) CRITICAL: Failed assertion: write_seqno(1) == 0
(gem_seqno_wrap:6004) CRITICAL: error: -1 != 0
**** END ****
FAIL (0.024s)

No dmesg available.
[49295.265477] [IGT] gem_seqno_wrap: executing
[49295.289616] [IGT] gem_seqno_wrap: exiting, ret=99

(In reply to Elio from comment #25)
> The problem with gem_seqno_wrap still happening on BYT with following
> configuration:

This was a new regression. It should have been a new bug.

commit 9607ae79710afb453173b90d5bf564788a6e09b1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Nov 24 09:47:52 2016 +0000

    drm/i915/debugfs: Increment return value of gt.next_seqno

    The i915_next_seqno read value is to be the next seqno used by the
    kernel. However, in the conversion to atomics ops for gt.next_seqno, in
    commit 28176ef4cfa5 ("drm/i915: Reserve space in the global seqno during
    request allocation"), this was changed from a post-increment to a
    pre-increment. This increment was missed from the value reported by
    debugfs, so in effect it was reporting the current seqno (last
    assigned), not the next seqno.

(In reply to Chris Wilson from comment #27)
> commit 9607ae79710afb453173b90d5bf564788a6e09b1
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Thu Nov 24 09:47:52 2016 +0000
>
> drm/i915/debugfs: Increment return value of gt.next_seqno
>
> The i915_next_seqno read value is to be the next seqno used by the
> kernel. However, in the conversion to atomics ops for gt.next_seqno, in
> commit 28176ef4cfa5 ("drm/i915: Reserve space in the global seqno during
> request allocation"), this was changed from a post-increment to a
> pre-increment. This increment was missed from the value reported by
> debugfs, so in effect it was reporting the current seqno (last
> assigned), not the next seqno.

Elio, please re-test and confirm whether the issue you posted here is fixed. Also, please remember to create a new bug when the cause of a defect is different.
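The off-by-one described in the commit message above can be shown with a toy model. This is not the kernel code: the class and method names are hypothetical, and the setter that stores `value - 1` is an assumption chosen so that the model reproduces the "rb:0x0 vs w:0x1" line from the test log. Wrap-around of the 32-bit counter is ignored.

```python
class PreIncrementSeqno:
    """Toy counter where the stored value is the seqno assigned last."""

    def __init__(self) -> None:
        self.last_assigned = 0

    def set_next(self, value: int) -> None:
        # ASSUMPTION: store one less so that the next allocation returns `value`.
        self.last_assigned = value - 1

    def allocate(self) -> int:
        # Pre-increment: bump first, then hand out the new value.
        self.last_assigned += 1
        return self.last_assigned

    def read_next_buggy(self) -> int:
        # Reports the current (last assigned) seqno, as in the regression.
        return self.last_assigned

    def read_next_fixed(self) -> int:
        # Adds the missing increment so the readback is the next seqno.
        return self.last_assigned + 1


counter = PreIncrementSeqno()
counter.set_next(1)
print("buggy readback:", hex(counter.read_next_buggy()))
print("fixed readback:", hex(counter.read_next_fixed()))
print("next allocation:", hex(counter.allocate()))
```

In this model the buggy readback disagrees with the value just written, while the fixed readback matches both the written value and the seqno handed out by the next allocation, which is the property gem_seqno_wrap asserts on.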