Created attachment 106699 (dmesg)

System Environment:
--------------------------
Platform: IVB
Kernel: (drm-intel-nightly) c5660b4ad395f1e34eacc22cf81c687edfc9c83c

Bug detailed description:
---------------------------
It sporadically causes a system hang; fail rate: 2/3. It happens on the -queued, -fixes and -nightly kernels. Running 10 cycles on the -queued branch at commit 6e47e3f097cc6c4cb470a805a3fa07a8e8376dab works well.

good commit: 6e47e3f097cc6c4cb470a805a3fa07a8e8376dab
bad commit: b680c37a4d145cf4d8f2b24e46b1163e5ceb1d35

output:
IGT-Version: 1.8-g4b81e9c (x86_64) (Linux: 3.17.0-rc5_drm-intel-nightly_c5660b_20140922_debug+ x86_64)
Subtest normal: SUCCESS (0.218s)
Subtest interruptible: SUCCESS (0.202s)
Subtest flink: SUCCESS (0.817s)
Subtest flink-interruptible: SUCCESS (0.646s)

Reproduce steps:
-------------------------
1. xinit
2. Run ./gem_render_copy_redux for 3 cycles.
It impacts SNB+ platforms.
The good commit SHA doesn't exist on the current -nightly, so please paste the commit subject along with the SHA. Also, the bad commit is a DocBook integration, so I assume there was still a gap between the bad and good commits, right?
good commit: 9c787942907face82da505c2c5493998b56cfc5a
Yeah, there is a big gap between those two. Could you please dig a bit deeper into this bisect and try to find the offending commit?
The bisect is difficult. I bisected it, running each step for 10 rounds, and it shows 9430dfa67d7 as the first bad commit. I then tested commit b478e336b3e755057, and it also has this issue (fail rate: 1/25).
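A rough way to see why bisecting a sporadic hang is hard: if each run of a bad commit fails independently with probability p, then n clean runs still miss the bug with probability (1 - p)^n. The sketch below assumes the runs are independent, which a real sporadic hang may not strictly satisfy; the helper names miss_probability() and runs_needed() are illustrative only.

```c
#include <assert.h>

/* Probability that `runs` independent runs of a bad commit all pass,
 * given that each run fails with probability `fail_rate`. */
static double miss_probability(double fail_rate, int runs)
{
	double p = 1.0;

	for (int i = 0; i < runs; i++)
		p *= 1.0 - fail_rate;
	return p;
}

/* Smallest number of runs that pushes the miss probability down to at
 * most `risk`. Requires 0 < fail_rate <= 1 and risk > 0, otherwise the
 * loop would never terminate. */
static int runs_needed(double fail_rate, double risk)
{
	double p = 1.0;
	int runs = 0;

	assert(fail_rate > 0.0 && fail_rate <= 1.0 && risk > 0.0);
	while (p > risk) {
		p *= 1.0 - fail_rate;
		runs++;
	}
	return runs;
}
```

With the 1/25 fail rate reported here, 10 clean rounds still leave roughly a two-in-three chance of marking a bad commit as good, and about 74 rounds are needed to bring that below 5%. With the 2/3 fail rate from the original report, 3 cycles are already enough.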
Could you run with CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_OBJECTS, CONFIG_SLUB_DEBUG_ON, CONFIG_PROVE_LOCKING and CONFIG_DEBUG_LIST enabled?
Created attachment 106898 (dmesg)
(In reply to comment #6)
> Could you run with CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_OBJECTS,
> CONFIG_SLUB_DEBUG_ON, CONFIG_PROVE_LOCKING and CONFIG_DEBUG_LIST enabled?

Attached is the dmesg with these settings.
Hmm, that shows no signs of poisoning, so I think we can rule out a double-free of the i915_mm_struct. Could you run gdb on vmlinux (it will be in the root directory of your build tree) and do "list * mmu_notifier_unregister+0x18"?
You run nothing else after boot? (i.e. you boot the machine and the first thing you run is gem_render_copy_redux in a loop?) I can't see how that would trigger userptr and creation of the i915_mm_struct.
Oh, the userptr is from libdrm. And the invalid dereference is an error pointer.
*** Bug 84358 has been marked as a duplicate of this bug. ***
commit f2775039b1d2f3c24876622e4528604496de8abc
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Sep 26 10:22:33 2014 +0100

    igt/gem_userptr_blits: Test interruptible create-destroy

    In order to exercise https://bugs.freedesktop.org/show_bug.cgi?id=84207
    we need to interrupt the mmu_notifier_register with a signal. This is
    likely to be quite difficult, but let's just try running the
    create-destroy test in an interruptible loop for 5s.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
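The idea of running a test "in an interruptible loop" can be sketched in plain POSIX C: a periodic SIGALRM whose handler is installed without SA_RESTART, so blocking calls inside the loop can be interrupted and return early with EINTR. This is one common approach and an assumption here, not necessarily how IGT implements its interruptible tests; run_interruptible() and example_blocking_op() are hypothetical, the latter standing in for one create-destroy cycle.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

static volatile sig_atomic_t signals_seen;
static long interrupted_ops;

static void on_alarm(int sig)
{
	(void)sig;
	signals_seen++;
}

/* Hypothetical stand-in for one create/destroy cycle: a short blocking
 * sleep that returns early with EINTR if a signal lands mid-call. */
static void example_blocking_op(void)
{
	struct timespec ts = { 0, 5 * 1000 * 1000 }; /* 5 ms */

	if (nanosleep(&ts, NULL) == -1 && errno == EINTR)
		interrupted_ops++;
}

/* Call op() repeatedly for roughly `millis` milliseconds while SIGALRM
 * fires every `period_us` microseconds. Returns the iteration count,
 * or -1 if the handler or timer could not be set up. */
static long run_interruptible(void (*op)(void), long millis, long period_us)
{
	struct sigaction sa;
	struct itimerval itv;
	struct timespec start, now;
	long iterations = 0;

	assert(period_us > 0 && period_us < 1000000);

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0; /* deliberately no SA_RESTART */
	if (sigaction(SIGALRM, &sa, NULL) != 0)
		return -1;

	memset(&itv, 0, sizeof(itv));
	itv.it_interval.tv_usec = period_us;
	itv.it_value.tv_usec = period_us;
	if (setitimer(ITIMER_REAL, &itv, NULL) != 0)
		return -1;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		op();
		iterations++;
		clock_gettime(CLOCK_MONOTONIC, &now);
	} while ((now.tv_sec - start.tv_sec) * 1000L +
		 (now.tv_nsec - start.tv_nsec) / 1000000L < millis);

	/* Disarm the timer; the handler stays installed, so a signal that
	 * is already pending is still harmless. */
	memset(&itv, 0, sizeof(itv));
	setitimer(ITIMER_REAL, &itv, NULL);
	return iterations;
}
```

A short, frequent timer raises the odds of a signal landing inside the narrow window where registration is in progress, which is what makes such a race reproducible at all.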
http://patchwork.freedesktop.org/patch/34166/
(In reply to comment #13)
> http://patchwork.freedesktop.org/patch/34166/

Applied this patch and tested on the latest IGT; after 30 cycles it works well.
OOO on Sep 28 – Sep 30. Sorry for no mail access. Best Regards, Shuo
The test below causes a system hang on HSW and IVB:

[root@x-hsw27 tests]# time ./gem_userptr_blits --run-subtest create-destroy-sync
IGT-Version: 1.8-g32a0308 (x86_64) (Linux: 3.17.0-rc6_drm-intel-fixes_c84db7_20140929+ x86_64)
Aperture size is 2048 MiB
Total RAM is 7669 MiB
Testing unsynchronized mappings...
Testing synchronized mappings...
^C

[root@x-ivb9 tests]# time ./gem_userptr_blits --run-subtest create-destroy-sync
IGT-Version: 1.8-g32a0308 (x86_64) (Linux: 3.17.0-rc6_drm-intel-nightly_7101d8_20140929+ x86_64)
Aperture size is 2048 MiB
Total RAM is 3836 MiB
Testing unsynchronized mappings...
Testing synchronized mappings...
^C^C
(In reply to comment #14)
> (In reply to comment #13)
> > http://patchwork.freedesktop.org/patch/34166/
>
> Apply this patch and test on latest igt, run 30 cycles, it works well.

Pushed to drm-intel-fixes as

commit 72e59c89131606106452f1773a316b90d9f54423
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Sep 26 10:31:02 2014 +0100

    drm/i915: Do not store the error pointer for a failed userptr registration

(In reply to comment #16)
> Test below cause system hang on HSW IVB

Please file new reports for them; AFAICT these are separate issues.
(In reply to comment #17)
> (In reply to comment #16)
> > Test below cause system hang on HSW IVB
>
> Please file new reports for them, AFAICT these are separate issues.

Nope. Those are failures in the tests that were added explicitly to reproduce this bug; see comment 12.
(In reply to comment #17)
> (In reply to comment #14)
> > (In reply to comment #13)
> > > http://patchwork.freedesktop.org/patch/34166/
> >
> > Apply this patch and test on latest igt, run 30 cycles, it works well.
>
> Pushed to drm-intel-fixes as
>
> commit 72e59c89131606106452f1773a316b90d9f54423
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Fri Sep 26 10:31:02 2014 +0100
>
>     drm/i915: Do not store the error pointer for a failed userptr registration

Scratch that, it's drm-intel-next-fixes as

commit e9681366ea9e76ab8f75e84351f2f3ca63ee542c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Sep 26 10:31:02 2014 +0100

    drm/i915: Do not store the error pointer for a failed userptr registration

> (In reply to comment #16)
> > Test below cause system hang on HSW IVB
>
> Please file new reports for them, AFAICT these are separate issues.
(In reply to comment #18)
> Nope. They were test failures added to explicitly reproduce this bug, see
> comment 12.

Reopen?
commit e9681366ea9e76ab8f75e84351f2f3ca63ee542c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Sep 26 10:31:02 2014 +0100

    drm/i915: Do not store the error pointer for a failed userptr registration

    Fixes regression from

    commit ad46cb533d586fdb256855437af876617c6cf609
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date: Thu Aug 7 14:20:40 2014 +0100

        drm/i915: Prevent recursive deadlock on releasing a busy userptr

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84207
    Testcase: igt/gem_render_copy_redux
    Testcase: igt/gem_userptr_blits/create-destroy-sync
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Jacek Danecki <jacek.danecki@intel.com>
    Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
    Cc: Jacek Danecki <jacek.danecki@intel.com>
    Cc: "Ursulin, Tvrtko" <tvrtko.ursulin@intel.com>
    Cc: stable@vger.kernel.org
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

Note that the commit references both this bugzilla and the new test case from c12, which QA reported as failing in c17.
Verified. Fixed.
Closing old verified+fixed.