Bug 88191 - [PNV Bisected]igt/gem_exec_big fails
Summary: [PNV Bisected]igt/gem_exec_big fails
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: unspecified
Hardware: All Linux (All)
: high normal
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-01-08 05:20 UTC by lu hua
Modified: 2017-10-06 14:32 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments
dmesg (90.83 KB, text/plain)
2015-01-08 05:20 UTC, lu hua
no flags Details
s/readl/wmb/ (1.37 KB, patch)
2015-01-08 23:22 UTC, Chris Wilson
no flags Details | Splinter Review
Change readl(addr) (2.33 KB, patch)
2015-01-09 10:46 UTC, Chris Wilson
no flags Details | Splinter Review

Description lu hua 2015-01-08 05:20:55 UTC
Created attachment 111939 [details]
dmesg

==System Environment==
--------------------------
Regression: yes

Non-working platforms: PNV

==kernel==
--------------------------
drm-intel-nightly/296cb88821b7113a9add839ba1e8e116ff13ecc0
commit 296cb88821b7113a9add839ba1e8e116ff13ecc0
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jan 7 18:19:46 2015 +0100

    drm-intel-nightly: 2015y-01m-07d-17h-19m-17s UTC integration manifest


==Bug detailed description==
-----------------------------
It fails on PNV with drm-intel-nightly and drm-intel-next-queued kernel, works well on drm-intel-fixes kernel.
Bisect shows: 43566dedde54f9729113f5f9fde77d53e75e61e9 is the first bad commit.
commit 43566dedde54f9729113f5f9fde77d53e75e61e9
Author:     Chris Wilson <chris@chris-wilson.co.uk>
AuthorDate: Fri Jan 2 16:29:29 2015 +0530
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Tue Jan 6 09:08:00 2015 +0100

    drm/i915: Broaden application of set-domain(GTT)

    Previously, this was restricted to only operate on bound objects - to
    make pointer access through the GTT to the object coherent with writes
    to and from the GPU. A second usecase is drm_intel_bo_wait_rendering()
    which at present does not function unless the object also happens to
    be bound into the GGTT (on current systems that is becoming increasingly
    rare, especially for the typical requests from mesa). A third usecase is
    a future patch wishing to extend the coverage of the GTT domain to
    include objects not bound into the GGTT but still in its coherent cache
    domain. For the latter pair of requests, we need to operate on the
    object regardless of its bind state.

    v2: After discussion with Akash, we came to the conclusion that the
    get-pages was required in order for accurate domain tracking in the
    corner cases (like the shrinker) and also useful for ensuring memory
    coherency with earlier cached CPU mmaps in case userspace uses exotic
    cache bypass (non-temporal) instructions.

    v3: Fix the inactive object check.

    v4: Rebase to latest drm-intel-nightly codebase

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Akash Goel <akash.goel@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

output:
IGT-Version: 1.9-g25cf055 (i686) (Linux: 3.19.0-rc3_drm-intel-nightly_296cb8_20150108+ i686)
Test assertion failure function exec, file gem_exec_big.c:100:
Failed assertion: tmp == gem_reloc[0].presumed_offset
error: 0 != 3899392

==Reproduce steps==
---------------------------- 
1. ./gem_exec_big
Comment 1 Chris Wilson 2015-01-08 14:36:47 UTC
So far it looks like it has uncovered a nasty coherency issue between pwrite + set-to-domain + relocate + pread.

Forcing pwrite to use GTT, makes it work, as does removing the user set-to-domain. It also seems to be timing dependent (but it fails quite reproducibly), adding printk also makes it work.

Throwing i915_gem_chipset_flush() around is not sufficient. But...


diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index c7f4048..831cef2 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -332,6 +332,8 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj,
        if (ret)
                return ret;
 
+       mb();
+
        /* Map the page containing the relocation we're going to perform.  */
        offset = i915_gem_obj_ggtt_offset(obj);
        offset += reloc->offset;
@@ -355,6 +357,7 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj,
 
        io_mapping_unmap_atomic(reloc_page);
 
+       mb();
        return 0;
 }


crazily enough does. And notably

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index c7f4048..3da285a 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -880,7 +880,7 @@ vmas_move_to_rq(struct eb_vmas *eb,
                i915_gem_chipset_flush(rq->i915);
 
        if (flush_domains & I915_GEM_DOMAIN_GTT)
-               wmb();
+               mb();
 
        /* 2: invalidate the caches from this ring after emitting semaphores */
        ret = i915_request_emit_flush(rq, I915_INVALIDATE_CACHES);

does not. Hmm.
Comment 2 Chris Wilson 2015-01-08 16:45:47 UTC
The magic so far:

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index c7f4048..e5a6d52 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -332,6 +332,8 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj,
        if (ret)
                return ret;
 
+       mb();
+
        /* Map the page containing the relocation we're going to perform.  */
        offset = i915_gem_obj_ggtt_offset(obj);
        offset += reloc->offset;
Comment 3 Chris Wilson 2015-01-08 21:16:25 UTC
Making the mb() unconditional at the end of set_to_gtt_domain() is ineffective. So, something must be pertubing the system between the user set-to-domain and the no-op set-to-domain during the reloc???
Comment 4 Chris Wilson 2015-01-08 22:14:57 UTC
My hypothesis is that we need a mb() to serialise updating the GTT and accessing the PTE via the CPU.

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem
index bdfd44c..c8932b3 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1444,6 +1444,7 @@ static int i915_ggtt_bind_vma(struct i915_vma *vma,
 
        BUG_ON(!i915_is_ggtt(vma->vm));
        intel_gtt_insert_sg_entries(vma->obj->pages, entry, flags);
+       wmb();
        vma->bound = GLOBAL_BIND;
        vma->vm->dirty = true;

is sufficient, but intel_gtt_insert_sg_entries() already ends with a readl()!
Comment 5 Chris Wilson 2015-01-08 23:21:10 UTC
Replacing the readl() with wmb() in intel_gtt_insert_sg_entries() works. I just need to explain why the readl() itself doesn't (either the compiler optimises it away or the CPU does).
Comment 6 Chris Wilson 2015-01-08 23:22:07 UTC
Created attachment 111983 [details] [review]
s/readl/wmb/
Comment 7 Chris Wilson 2015-01-08 23:28:46 UTC
Oh, and while I remember. The bug is clearly independent of 43566dedd, but because of the mb() inside set-to-gtt-domain we conveniently masked this bug i.e. the impact of the bug is non-existent, though it is probably the real reason why I had to add the mb() there in the first place. That's for a latter patch.

Oh, indeedy....

commit 63256ec5347fb2344a42adbae732b90603c92f35
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jan 4 18:42:07 2011 +0000

    drm/i915: Enforce write ordering through the GTT
    
    We need to ensure that writes through the GTT land before any
    modification to the MMIO registers and so must impose a mandatory write
    barrier when flushing the GTT domain. This was revealed by relaxing the
    write ordering by experimentally mapping the registers and the GATT as
    write-combining.

commit d0a57789d5ec807fc218151b2fb2de4da30fbef5
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Oct 9 19:24:37 2012 +0100

    drm/i915: Only insert the mb() before updating the fence parameter
    
    With a fence, we only need to insert a memory barrier around the actual
    fence alteration for CPU accesses through the GTT. Performing the
    barrier in flush-fence was inserting unnecessary and expensive barriers
    for never fenced objects.
    
    Note removing the barriers from flush-fence, which was effectively a
    barrier before every direct access through the GTT, revealed that we
    where missing a barrier before the first access through the GTT. Lack of
    that barrier was sufficient to cause GPU hangs.


Bingo.
Comment 8 Chris Wilson 2015-01-09 09:48:54 UTC
Disassembling intel_gtt_insert_sg_entries() shows that the readl() is not optimised away, so the issue appears to be that it is simply not an adequate barrier. Seems strange.
Comment 9 Chris Wilson 2015-01-09 10:46:49 UTC
Created attachment 112001 [details] [review]
Change readl(addr)

Even more fun.
Comment 10 lu hua 2015-01-12 03:21:01 UTC
(In reply to Chris Wilson from comment #9)
> Created attachment 112001 [details] [review] [review]
> Change readl(addr)
> 
> Even more fun.

Test this patch, it still fail.
output:
IGT-Version: 1.9-g1dddf32 (i686) (Linux: 3.19.0-rc4_kcloud_d574c4_20150112+ i686)
Test assertion failure function gem_execbuf, file ioctl_wrappers.c:437:
Failed assertion: ret == 0
Last errno: 28, No space left on device

#df -h
/dev/sda2   145G   88G   50G  64% /
Comment 11 Chris Wilson 2015-01-12 09:39:22 UTC
(In reply to lu hua from comment #10)
> (In reply to Chris Wilson from comment #9)
> > Created attachment 112001 [details] [review] [review] [review]
> > Change readl(addr)
> > 
> > Even more fun.
> 
> Test this patch, it still fail.
> output:
> IGT-Version: 1.9-g1dddf32 (i686) (Linux: 3.19.0-rc4_kcloud_d574c4_20150112+
> i686)
> Test assertion failure function gem_execbuf, file ioctl_wrappers.c:437:
> Failed assertion: ret == 0
> Last errno: 28, No space left on device
> 
> #df -h
> /dev/sda2   145G   88G   50G  64% /

Technically that's a success. The original bug is fixed and now you hit another bug exposed by the new test. If you retest with IGT-Version: 1.9-g25cf055 to confirm the fix then file a new bug for the failure.
Comment 12 lu hua 2015-01-13 06:30:43 UTC
Test igt g25cf055 and the latest igt g67eb20c.

[root@x-pnv2 tests]# ./gem_exec_big
IGT-Version: 1.9-g25cf055 (i686) (Linux: 3.19.0-rc4_drm-intel-nightly_823e71_20150113+ i686)
Test assertion failure function exec, file gem_exec_big.c:100:
Failed assertion: tmp == gem_reloc[0].presumed_offset
error: 0 != 4173824

IGT-Version: 1.9-g67eb20c (i686) (Linux: 3.19.0-rc4_drm-intel-nightly_823e71_20150113+ i686)
Test assertion failure function exec, file gem_exec_big.c:99:
Failed assertion: tmp == gem_reloc[0].presumed_offset
error: 0 != 385024
Comment 13 Chris Wilson 2015-01-13 09:07:13 UTC
But that was with a different kernel. Did it include the patch?
Comment 14 lu hua 2015-01-14 08:20:06 UTC
(In reply to Chris Wilson from comment #13)
> But that was with a different kernel. Did it include the patch?

Sorry, didn't apply the patch. 
Test this patch and igt commit 25cf055, it works well.I will report a new bug to track comment 10's fail.
output:
IGT-Version: 1.9-g25cf055 (i686) (Linux: 3.19.0-rc4_kcloud_d574c4_20150113+ i686)
Comment 15 lu hua 2015-01-14 08:34:56 UTC
> Test this patch and igt commit 25cf055, it works well.I will report a new
> bug to track comment 10's fail.

bug 88392
Comment 16 Chris Wilson 2015-02-06 11:21:19 UTC
commit 983d308cb8f602d1920a8c40196eb2ab6cc07bd2
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Jan 26 10:47:10 2015 +0000

    agp/intel: Serialise after GTT updates
Comment 17 Ding Heng 2015-02-11 02:21:42 UTC
(In reply to Chris Wilson from comment #16)
> commit 983d308cb8f602d1920a8c40196eb2ab6cc07bd2
Author: Chris Wilson
> <chris@chris-wilson.co.uk>
Date:   Mon Jan 26 10:47:10 2015 +0000

   
> agp/intel: Serialise after GTT updates

Case fail.
./gem_exec_big
IGT-Version: 1.9-g9846e7f (i686) (Linux: 3.19.0-rc5_kcloud_983d30_20150211+ i686)
(gem_exec_big:3569) ioctl-wrappers-CRITICAL: Test assertion failure function gem_execbuf, file ioctl_wrappers.c:437:
(gem_exec_big:3569) ioctl-wrappers-CRITICAL: Failed assertion: ret == 0
(gem_exec_big:3569) ioctl-wrappers-CRITICAL: Last errno: 28, No space left on device
FAIL (2.549s)
[root@x-pnv2 tests]# uname -a
Linux x-pnv2 3.19.0-rc5_kcloud_983d30_20150211+ #84 SMP Wed Feb 11 10:09:16 EST 2015 i686 i686 i386 GNU/Linux
Comment 18 Chris Wilson 2015-02-11 10:06:30 UTC
(In reply to Ding Heng from comment #17)
> (In reply to Chris Wilson from comment #16)
> > commit 983d308cb8f602d1920a8c40196eb2ab6cc07bd2
> Author: Chris Wilson
> > <chris@chris-wilson.co.uk>
> Date:   Mon Jan 26 10:47:10 2015 +0000
> 
>    
> > agp/intel: Serialise after GTT updates
> 
> Case fail.
> ./gem_exec_big
> IGT-Version: 1.9-g9846e7f (i686) (Linux: 3.19.0-rc5_kcloud_983d30_20150211+
> i686)
> (gem_exec_big:3569) ioctl-wrappers-CRITICAL: Test assertion failure function
> gem_execbuf, file ioctl_wrappers.c:437:
> (gem_exec_big:3569) ioctl-wrappers-CRITICAL: Failed assertion: ret == 0
> (gem_exec_big:3569) ioctl-wrappers-CRITICAL: Last errno: 28, No space left
> on device
> FAIL (2.549s)
> [root@x-pnv2 tests]# uname -a
> Linux x-pnv2 3.19.0-rc5_kcloud_983d30_20150211+ #84 SMP Wed Feb 11 10:09:16
> EST 2015 i686 i686 i386 GNU/Linux

That is the other gem_exec_big bug.
Comment 19 Elizabeth 2017-10-06 14:32:31 UTC
Closing old verified.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.