Created attachment 111939 [details] dmesg ==System Environment== -------------------------- Regression: yes Non-working platforms: PNV ==kernel== -------------------------- drm-intel-nightly/296cb88821b7113a9add839ba1e8e116ff13ecc0 commit 296cb88821b7113a9add839ba1e8e116ff13ecc0 Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Wed Jan 7 18:19:46 2015 +0100 drm-intel-nightly: 2015y-01m-07d-17h-19m-17s UTC integration manifest ==Bug detailed description== ----------------------------- It fails on PNV with drm-intel-nightly and drm-intel-next-queued kernel, works well on drm-intel-fixes kernel. Bisect shows: 43566dedde54f9729113f5f9fde77d53e75e61e9 is the first bad commit. commit 43566dedde54f9729113f5f9fde77d53e75e61e9 Author: Chris Wilson <chris@chris-wilson.co.uk> AuthorDate: Fri Jan 2 16:29:29 2015 +0530 Commit: Daniel Vetter <daniel.vetter@ffwll.ch> CommitDate: Tue Jan 6 09:08:00 2015 +0100 drm/i915: Broaden application of set-domain(GTT) Previously, this was restricted to only operate on bound objects - to make pointer access through the GTT to the object coherent with writes to and from the GPU. A second usecase is drm_intel_bo_wait_rendering() which at present does not function unless the object also happens to be bound into the GGTT (on current systems that is becoming increasingly rare, especially for the typical requests from mesa). A third usecase is a future patch wishing to extend the coverage of the GTT domain to include objects not bound into the GGTT but still in its coherent cache domain. For the latter pair of requests, we need to operate on the object regardless of its bind state. v2: After discussion with Akash, we came to the conclusion that the get-pages was required in order for accurate domain tracking in the corner cases (like the shrinker) and also useful for ensuring memory coherency with earlier cached CPU mmaps in case userspace uses exotic cache bypass (non-temporal) instructions. v3: Fix the inactive object check. v4: Rebase to latest drm-intel-nightly codebase Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Akash Goel <akash.goel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> output: IGT-Version: 1.9-g25cf055 (i686) (Linux: 3.19.0-rc3_drm-intel-nightly_296cb8_20150108+ i686) Test assertion failure function exec, file gem_exec_big.c:100: Failed assertion: tmp == gem_reloc[0].presumed_offset error: 0 != 3899392 ==Reproduce steps== ---------------------------- 1. ./gem_exec_big
So far it looks like it has uncovered a nasty coherency issue between pwrite + set-to-domain + relocate + pread. Forcing pwrite to use GTT, makes it work, as does removing the user set-to-domain. It also seems to be timing dependent (but it fails quite reproducibly), adding printk also makes it work. Throwing i915_gem_chipset_flush() around is not sufficient. But... diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index c7f4048..831cef2 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -332,6 +332,8 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj, if (ret) return ret; + mb(); + /* Map the page containing the relocation we're going to perform. */ offset = i915_gem_obj_ggtt_offset(obj); offset += reloc->offset; @@ -355,6 +357,7 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj, io_mapping_unmap_atomic(reloc_page); + mb(); return 0; } crazily enough does. And notably diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index c7f4048..3da285a 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -880,7 +880,7 @@ vmas_move_to_rq(struct eb_vmas *eb, i915_gem_chipset_flush(rq->i915); if (flush_domains & I915_GEM_DOMAIN_GTT) - wmb(); + mb(); /* 2: invalidate the caches from this ring after emitting semaphores */ ret = i915_request_emit_flush(rq, I915_INVALIDATE_CACHES); does not. Hmm.
The magic so far: diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index c7f4048..e5a6d52 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -332,6 +332,8 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj, if (ret) return ret; + mb(); + /* Map the page containing the relocation we're going to perform. */ offset = i915_gem_obj_ggtt_offset(obj); offset += reloc->offset;
Making the mb() unconditional at the end of set_to_gtt_domain() is ineffective. So, something must be pertubing the system between the user set-to-domain and the no-op set-to-domain during the reloc???
My hypothesis is that we need a mb() to serialise updating the GTT and accessing the PTE via the CPU. diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem index bdfd44c..c8932b3 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -1444,6 +1444,7 @@ static int i915_ggtt_bind_vma(struct i915_vma *vma, BUG_ON(!i915_is_ggtt(vma->vm)); intel_gtt_insert_sg_entries(vma->obj->pages, entry, flags); + wmb(); vma->bound = GLOBAL_BIND; vma->vm->dirty = true; is sufficient, but intel_gtt_insert_sg_entries() already ends with a readl()!
Replacing the readl() with wmb() in intel_gtt_insert_sg_entries() works. I just need to explain why the readl() itself doesn't (either the compiler optimises it away or the CPU does).
Created attachment 111983 [details] [review] s/readl/wmb/
Oh, and while I remember. The bug is clearly independent of 43566dedd, but because of the mb() inside set-to-gtt-domain we conveniently masked this bug i.e. the impact of the bug is non-existent, though it is probably the real reason why I had to add the mb() there in the first place. That's for a latter patch. Oh, indeedy.... commit 63256ec5347fb2344a42adbae732b90603c92f35 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Jan 4 18:42:07 2011 +0000 drm/i915: Enforce write ordering through the GTT We need to ensure that writes through the GTT land before any modification to the MMIO registers and so must impose a mandatory write barrier when flushing the GTT domain. This was revealed by relaxing the write ordering by experimentally mapping the registers and the GATT as write-combining. commit d0a57789d5ec807fc218151b2fb2de4da30fbef5 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Oct 9 19:24:37 2012 +0100 drm/i915: Only insert the mb() before updating the fence parameter With a fence, we only need to insert a memory barrier around the actual fence alteration for CPU accesses through the GTT. Performing the barrier in flush-fence was inserting unnecessary and expensive barriers for never fenced objects. Note removing the barriers from flush-fence, which was effectively a barrier before every direct access through the GTT, revealed that we where missing a barrier before the first access through the GTT. Lack of that barrier was sufficient to cause GPU hangs. Bingo.
Disassembling intel_gtt_insert_sg_entries() shows that the readl() is not optimised away, so the issue appears to be that it is simply not an adequate barrier. Seems strange.
Created attachment 112001 [details] [review] Change readl(addr) Even more fun.
(In reply to Chris Wilson from comment #9) > Created attachment 112001 [details] [review] [review] > Change readl(addr) > > Even more fun. Test this patch, it still fail. output: IGT-Version: 1.9-g1dddf32 (i686) (Linux: 3.19.0-rc4_kcloud_d574c4_20150112+ i686) Test assertion failure function gem_execbuf, file ioctl_wrappers.c:437: Failed assertion: ret == 0 Last errno: 28, No space left on device #df -h /dev/sda2 145G 88G 50G 64% /
(In reply to lu hua from comment #10) > (In reply to Chris Wilson from comment #9) > > Created attachment 112001 [details] [review] [review] [review] > > Change readl(addr) > > > > Even more fun. > > Test this patch, it still fail. > output: > IGT-Version: 1.9-g1dddf32 (i686) (Linux: 3.19.0-rc4_kcloud_d574c4_20150112+ > i686) > Test assertion failure function gem_execbuf, file ioctl_wrappers.c:437: > Failed assertion: ret == 0 > Last errno: 28, No space left on device > > #df -h > /dev/sda2 145G 88G 50G 64% / Technically that's a success. The original bug is fixed and now you hit another bug exposed by the new test. If you retest with IGT-Version: 1.9-g25cf055 to confirm the fix then file a new bug for the failure.
Test igt g25cf055 and the latest igt g67eb20c. [root@x-pnv2 tests]# ./gem_exec_big IGT-Version: 1.9-g25cf055 (i686) (Linux: 3.19.0-rc4_drm-intel-nightly_823e71_20150113+ i686) Test assertion failure function exec, file gem_exec_big.c:100: Failed assertion: tmp == gem_reloc[0].presumed_offset error: 0 != 4173824 IGT-Version: 1.9-g67eb20c (i686) (Linux: 3.19.0-rc4_drm-intel-nightly_823e71_20150113+ i686) Test assertion failure function exec, file gem_exec_big.c:99: Failed assertion: tmp == gem_reloc[0].presumed_offset error: 0 != 385024
But that was with a different kernel. Did it include the patch?
(In reply to Chris Wilson from comment #13) > But that was with a different kernel. Did it include the patch? Sorry, didn't apply the patch. Test this patch and igt commit 25cf055, it works well.I will report a new bug to track comment 10's fail. output: IGT-Version: 1.9-g25cf055 (i686) (Linux: 3.19.0-rc4_kcloud_d574c4_20150113+ i686)
> Test this patch and igt commit 25cf055, it works well.I will report a new > bug to track comment 10's fail. bug 88392
commit 983d308cb8f602d1920a8c40196eb2ab6cc07bd2 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Jan 26 10:47:10 2015 +0000 agp/intel: Serialise after GTT updates
(In reply to Chris Wilson from comment #16) > commit 983d308cb8f602d1920a8c40196eb2ab6cc07bd2 Author: Chris Wilson > <chris@chris-wilson.co.uk> Date: Mon Jan 26 10:47:10 2015 +0000 > agp/intel: Serialise after GTT updates Case fail. ./gem_exec_big IGT-Version: 1.9-g9846e7f (i686) (Linux: 3.19.0-rc5_kcloud_983d30_20150211+ i686) (gem_exec_big:3569) ioctl-wrappers-CRITICAL: Test assertion failure function gem_execbuf, file ioctl_wrappers.c:437: (gem_exec_big:3569) ioctl-wrappers-CRITICAL: Failed assertion: ret == 0 (gem_exec_big:3569) ioctl-wrappers-CRITICAL: Last errno: 28, No space left on device FAIL (2.549s) [root@x-pnv2 tests]# uname -a Linux x-pnv2 3.19.0-rc5_kcloud_983d30_20150211+ #84 SMP Wed Feb 11 10:09:16 EST 2015 i686 i686 i386 GNU/Linux
(In reply to Ding Heng from comment #17) > (In reply to Chris Wilson from comment #16) > > commit 983d308cb8f602d1920a8c40196eb2ab6cc07bd2 > Author: Chris Wilson > > <chris@chris-wilson.co.uk> > Date: Mon Jan 26 10:47:10 2015 +0000 > > > > agp/intel: Serialise after GTT updates > > Case fail. > ./gem_exec_big > IGT-Version: 1.9-g9846e7f (i686) (Linux: 3.19.0-rc5_kcloud_983d30_20150211+ > i686) > (gem_exec_big:3569) ioctl-wrappers-CRITICAL: Test assertion failure function > gem_execbuf, file ioctl_wrappers.c:437: > (gem_exec_big:3569) ioctl-wrappers-CRITICAL: Failed assertion: ret == 0 > (gem_exec_big:3569) ioctl-wrappers-CRITICAL: Last errno: 28, No space left > on device > FAIL (2.549s) > [root@x-pnv2 tests]# uname -a > Linux x-pnv2 3.19.0-rc5_kcloud_983d30_20150211+ #84 SMP Wed Feb 11 10:09:16 > EST 2015 i686 i686 i386 GNU/Linux That is the other gem_exec_big bug.
Closing old verified.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.