Bug 79670

Summary: [All bisected ppgtt] igt/gem_exec_big fails, with ppgtt enabled
Product: DRI
Reporter: Guo Jinxian <jinxianx.guo>
Component: DRM/Intel
Assignee: Chris Wilson <chris>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: highest
CC: intel-gfx-bugs
Version: unspecified
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description        Flags
dmesg              none
clflush ptes       none
dmesg with patch   none
dmesg              none
dmesg              none
dmesg              none

Description Guo Jinxian 2014-06-05 07:29:06 UTC
Created attachment 100445 [details]
dmesg

==System Environment==
--------------------------
Regression: Yes. 
Good commit on -next-queued: 192155025197cc4765702a180904c3b62c152b7a

Non-working platforms: BYT

==kernel==
--------------------------
origin/drm-intel-nightly: 0a37b5d366831590ebc976018d1bd812ef526a98(fails)
    drm-intel-nightly: 2014y-06m-03d-19h-31m-28s integration manifest
origin/drm-intel-next-queued: 92d7377929140bc120f7742ee3afffcb2a827fe4(fails)
     drm/i915: Simplify intel_gpu_reset
origin/drm-intel-fixes: d23db88c3ab233daed18709e3a24d6c95344117f(fails)
    drm/i915: Prevent negative relocation deltas from wrapping

==Bug detailed description==
-----------------------------
igt/gem_exec_big fails

Output:
./gem_exec_big
IGT-Version: 1.6-g1451df1 (x86_64) (Linux: 3.15.0-rc3_drm-intel-next-queued_06946f_20140605+ x86_64)
Test assertion failure function exec, file gem_exec_big.c:95:
Last errno: 0, Success
Failed assertion: tmp == gem_reloc[0].presumed_offset

==Reproduce steps==
---------------------------- 
1. ./gem_exec_big
Comment 1 Guo Jinxian 2014-06-05 07:39:01 UTC
Updated result on -fixes:
origin/drm-intel-fixes: d23db88c3ab233daed18709e3a24d6c95344117f(works)
    drm/i915: Prevent negative relocation deltas from wrapping
Comment 2 Chris Wilson 2014-06-05 07:56:59 UTC
Want to bet this is ppgtt enabling? Please try i915.enable_ppgtt=0, otherwise please bisect.
Comment 3 Guo Jinxian 2014-06-06 03:01:46 UTC
(In reply to comment #2)
> Want to bet this is ppgtt enabling? Please try i915.enable_ppgtt=0,
> otherwise please bisect.

With ppgtt disabled (i915.enable_ppgtt=0), the test passes.
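
When comparing runs with and without PPGTT, it can help to record the value the driver actually loaded with. The following is a minimal sketch, assuming the parameter is exported under /sys/module/i915/parameters/ and is readable by the current user (it may require root). The helper name `read_module_param` is hypothetical and not part of IGT.

```python
from pathlib import Path
from typing import Optional


def read_module_param(name: str, module: str = "i915",
                      sysfs_root: str = "/sys/module") -> Optional[str]:
    """Return the current value of a module parameter, or None if unreadable."""
    path = Path(sysfs_root) / module / "parameters" / name
    try:
        return path.read_text().strip()
    except OSError:
        # Module not loaded, parameter not exported, or permission denied.
        return None


if __name__ == "__main__":
    value = read_module_param("enable_ppgtt")
    print("i915.enable_ppgtt =", value if value is not None else "<unavailable>")
```

Logging this next to each ./gem_exec_big result makes it clear which configuration a pass or fail belongs to.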
Comment 4 Gordon Jin 2014-06-06 04:17:18 UTC
Thanks for Chris's comment. I'd assume we don't need to bisect, then.
Comment 5 Daniel Vetter 2014-06-06 06:16:56 UTC
Created attachment 100505 [details] [review]
clflush ptes

Please test this - I don't have a byt myself so can't check myself ...
Comment 6 Ville Syrjala 2014-06-06 08:19:22 UTC
Also I'd like to see 'lspci -n' for the affected machine. The theory is that the stepping is a factor here since it worked for me and Jesse using production stepping machines.
Comment 7 Guo Jinxian 2014-06-09 02:18:44 UTC
Created attachment 100691 [details]
dmesg with patch

(In reply to comment #5)
> Created attachment 100505 [details] [review]
> clflush ptes
> 
> Please test this - I don't have a byt myself so can't check myself ...

With this patch applied, the test still fails.
Output:
./gem_exec_big
IGT-Version: 1.6-g18d2130 (x86_64) (Linux: 3.15.0-rc3_kcloud_10dca6_20140609+ x86_64)
Test assertion failure function exec, file gem_exec_big.c:95:
Last errno: 0, Success
Failed assertion: tmp == gem_reloc[0].presumed_offset
Comment 8 Guo Jinxian 2014-06-09 02:21:19 UTC
(In reply to comment #6)
> Also I'd like to see 'lspci -n' for the affected machine. The theory is that
> the stepping is a factor here since it worked for me and Jesse using
> production stepping machines.

lspci -n
00:00.0 0600: 8086:0f00 (rev 0a)
00:02.0 0300: 8086:0f31 (rev 0a)
00:13.0 0106: 8086:0f23 (rev 0a)
00:14.0 0c03: 8086:0f35 (rev 0a)
00:1a.0 1080: 8086:0f18 (rev 0a)
00:1b.0 0403: 8086:0f04 (rev 0a)
00:1c.0 0604: 8086:0f48 (rev 0a)
00:1c.1 0604: 8086:0f4a (rev 0a)
00:1c.2 0604: 8086:0f4c (rev 0a)
00:1c.3 0604: 8086:0f4e (rev 0a)
00:1f.0 0601: 8086:0f1c (rev 0a)
00:1f.3 0c05: 8086:0f12 (rev 0a)
01:00.0 0200: 8086:107d (rev 06)
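
The stepping question comes down to the "(rev xx)" field of the graphics device (class 0300, here 8086:0f31). The following is a small sketch for pulling that value out of `lspci -n` output when collecting it from several machines. The function names are hypothetical, and the regular expression assumes the plain `lspci -n` line layout shown above.

```python
import re
from typing import Dict, List, Optional

# slot, class, vendor:device, optional "(rev xx)" suffix
LSPCI_N_LINE = re.compile(
    r"^(?P<slot>\S+)\s+(?P<cls>[0-9a-fA-F]{4}):\s+"
    r"(?P<vendor>[0-9a-fA-F]{4}):(?P<device>[0-9a-fA-F]{4})"
    r"(?:\s+\(rev (?P<rev>[0-9a-fA-F]{2})\))?"
)


def parse_lspci_n(text: str) -> List[Dict[str, Optional[str]]]:
    """Parse `lspci -n` output into one dict per recognised line."""
    devices = []
    for line in text.splitlines():
        match = LSPCI_N_LINE.match(line.strip())
        if match:
            devices.append(match.groupdict())
    return devices


def vga_revision(text: str) -> Optional[int]:
    """Return the revision of the first class-0300 (VGA) device, if any."""
    for dev in parse_lspci_n(text):
        if dev["cls"] == "0300":
            # lspci normally omits the suffix when the revision is 00.
            return int(dev["rev"], 16) if dev["rev"] is not None else 0
    return None
```

For the listing above, `vga_revision` picks the 00:02.0 entry and yields 0x0a.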
Comment 9 Daniel Vetter 2014-06-13 08:28:59 UTC
Bisect result would still be good here ...
Comment 10 Daniel Vetter 2014-06-13 08:29:37 UTC
Meh, already checked for ppgtt, no need for bisect. Sorry about the noise
Comment 11 Jesse Barnes 2014-06-13 14:08:58 UTC
Ok looks like an early stepping.  We can disable PPGTT on pre-rev C and hope for better luck...
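
The idea of gating PPGTT on the hardware revision can be sketched as follows. This is only an illustration of the logic, not the kernel patch: the function name `effective_ppgtt` is hypothetical, and the cutoff is passed in as a parameter because the thread does not state the exact revision value.

```python
def effective_ppgtt(requested: int, revision: int, first_good_revision: int) -> int:
    """Return the PPGTT setting to use after applying a revision gate.

    requested: the value asked for (0 disables PPGTT).
    revision: PCI revision of the graphics device, e.g. 0x0a from lspci.
    first_good_revision: hypothetical cutoff; earlier revisions get PPGTT off.
    """
    if revision < first_good_revision:
        # Early stepping: force PPGTT off regardless of the request.
        return 0
    return requested
```

With the rev 0a part from comment 8 and any cutoff above 0x0a, the function returns 0, which matches the behaviour the test machine needs.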
Comment 12 Jesse Barnes 2014-06-13 15:32:59 UTC
Please try this patch

http://lists.freedesktop.org/archives/intel-gfx/2014-June/047137.html
Comment 13 Daniel Vetter 2014-06-13 16:53:57 UTC
commit 62942ed7279d3e06dc15ae3d47665eff3b373327
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Fri Jun 13 09:28:33 2014 -0700

    drm/i915/vlv: disable PPGTT on early revs v3


Do you have any production silicon byt machines around? Otherwise testing will lack coverage ...
Comment 14 Guo Jinxian 2014-06-16 06:10:16 UTC
Tested on latest -next-queued 868d665b43473e230d560d5186535270a3d57a19 (which includes patch 047137); the test passes.
Output:
root@x-byt06:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# ./gem_exec_big
IGT-Version: 1.7-g8c1566e (x86_64) (Linux: 3.15.0-rc8_kcloud_868d66_20140616+ x86_64)
root@x-byt06:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# echo $?
0
Comment 15 Jesse Barnes 2014-06-16 23:39:53 UTC
I guess we can mark this fixed then.
Comment 16 Guo Jinxian 2014-06-18 06:08:39 UTC
Created attachment 101280 [details]
dmesg

The test case still fails on latest -next-queued. I am not sure whether this is the same failure.
root@x-byt06:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# ./gem_exec_big
IGT-Version: 1.7-g1b1f4b1 (x86_64) (Linux: 3.15.0-rc8_drm-intel-next-queued_27b6c1_20140618+ x86_64)
Test assertion failure function exec, file gem_exec_big.c:90:
Last errno: 0, Success
Failed assertion: tmp == gem_reloc[0].presumed_offset
error: 17043456 == -1
Comment 17 Daniel Vetter 2014-06-18 14:09:48 UTC
Please retest on latest -nightly, that should have a fix.
Comment 18 Guo Jinxian 2014-06-19 01:52:43 UTC
Created attachment 101329 [details]
dmesg

(In reply to comment #17)
> Please retest on latest -nightly, that should have a fix.

The test still fails on latest -nightly.

Output:
[root@x-hsw27 tests]# ./gem_exec_big
IGT-Version: 1.7-g1b1f4b1 (x86_64) (Linux: 3.15.0-rc8_drm-intel-nightly_fff6c5_20140618+ x86_64)
Test assertion failure function exec, file gem_exec_big.c:90:
Last errno: 0, Success
Failed assertion: tmp == gem_reloc[0].presumed_offset
error: 7974912 == -1
Comment 19 Guo Jinxian 2014-06-19 01:56:13 UTC
(In reply to comment #16)
> Created attachment 101280 [details]
> dmesg
> 
> The test case still fails on latest -next-queued. I am not sure whether
> this is the same failure.
> root@x-byt06:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# ./gem_exec_big
> IGT-Version: 1.7-g1b1f4b1 (x86_64) (Linux:
> 3.15.0-rc8_drm-intel-next-queued_27b6c1_20140618+ x86_64)
> Test assertion failure function exec, file gem_exec_big.c:90:
> Last errno: 0, Success
> Failed assertion: tmp == gem_reloc[0].presumed_offset
> error: 17043456 == -1

Auto-bisect shows that the commit below is the first bad commit for the failure above.

commit eb36fc993d7ae1988c80ba5b767989059c91d0ec
Author:     Chris Wilson <chris@chris-wilson.co.uk>
AuthorDate: Mon Jun 16 10:49:16 2014 +0100
Commit:     Chris Wilson <chris@chris-wilson.co.uk>
CommitDate: Mon Jun 16 10:51:02 2014 +0100

    igt/gem_exec_big: Update to new igt_assert_eq
    
    Use igt_assert_eq for better test output on failures.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 20 Chris Wilson 2014-07-19 10:37:15 UTC
commit 236d6bd2d36114fe402fe0e85d97b14cdf102963
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Thu Jun 19 10:13:12 2014 +0200

    tests/gem_exec_big: Re-add gem_sync
    
    We need this to avoid hitting the slowpath and ending up with a
    presumed_offset == -1. Regression reported by PRTS, bisected to
    
    commit eb36fc993d7ae1988c80ba5b767989059c91d0ec
    Author:     Chris Wilson <chris@chris-wilson.co.uk>
    AuthorDate: Mon Jun 16 10:49:16 2014 +0100
    Commit:     Chris Wilson <chris@chris-wilson.co.uk>
    CommitDate: Mon Jun 16 10:51:02 2014 +0100
    
        igt/gem_exec_big: Update to new igt_assert_eq
    
        Use igt_assert_eq for better test output on failures.
    
        Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    
    v2: igt_warn_on unexpected reloc offsets.
    
    Cc: shuang.he@intel.com
    Acked-by: Chris Wilson <chris@chris-wilson.co.uk> (on irc)
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.c
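
The "== -1" in the failures from comments 16 and 18 is consistent with an all-ones 64-bit presumed_offset printed as a signed value. The sketch below shows that reading and one way a check could separate "offset not reported" from a genuine mismatch. It is an illustration only, not the IGT code, and the names `as_signed64` and `classify_reloc` are hypothetical.

```python
import struct

U64_MAX = 0xFFFFFFFFFFFFFFFF


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed 64-bit integer."""
    return struct.unpack("<q", struct.pack("<Q", value & U64_MAX))[0]


def classify_reloc(written: int, presumed: int) -> str:
    """Compare the offset found in the batch with the presumed offset.

    Returns "unknown" when the presumed offset is all ones (shown as -1),
    "match" when both values agree, and "mismatch" otherwise.
    """
    presumed &= U64_MAX
    if presumed == U64_MAX:
        return "unknown"
    return "match" if (written & U64_MAX) == presumed else "mismatch"
```

Under this reading, "17043456 == -1" falls into the "unknown" case, while "0 == 8908800" from comment 22 is a real mismatch, which fits the later remark that it is a different bug.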
Comment 21 Guo Jinxian 2014-07-21 03:29:11 UTC
Verified on latest -nightly(8734408c113bb38234ed03ec51c723b3deff579b)

[root@x-bdw01 tests]# ./gem_exec_big
IGT-Version: 1.7-g4d4f4b2 (x86_64) (Linux: 3.16.0-rc5_drm-intel-nightly_873440_20140721+ x86_64)
[root@x-bdw01 tests]# echo $?
0
Comment 22 Guo Jinxian 2014-07-24 07:26:50 UTC
Created attachment 103384 [details]
dmesg

Test still failed on latest -nightly(af1aaba219fdd90ca1b30f9b8d8d19352224f170) on BYT
root@x-bytm02:~# cd /GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests/
root@x-bytm02:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# ./gem_exec_big
IGT-Version: 1.7-g70e6ed9 (x86_64) (Linux: 3.16.0-rc6_drm-intel-nightly_af1aab_20140724+ x86_64)
Test assertion failure function exec, file gem_exec_big.c:97:
Failed assertion: tmp == gem_reloc[0].presumed_offset
error: 0 == 8908800
Comment 23 Chris Wilson 2014-07-24 07:43:34 UTC
That's not going to be the same bug.
Comment 24 Guo Jinxian 2014-07-25 05:16:38 UTC
(In reply to comment #22)
> Created attachment 103384 [details]
> dmesg
> 
> Test still failed on latest
> -nightly(af1aaba219fdd90ca1b30f9b8d8d19352224f170) on BYT
> root@x-bytm02:~# cd /GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests/
> root@x-bytm02:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# ./gem_exec_big
> IGT-Version: 1.7-g70e6ed9 (x86_64) (Linux:
> 3.16.0-rc6_drm-intel-nightly_af1aab_20140724+ x86_64)
> Test assertion failure function exec, file gem_exec_big.c:97:
> Failed assertion: tmp == gem_reloc[0].presumed_offset
> error: 0 == 8908800

Reported a new bug for this error (Bug 81728); closing this one.
Comment 25 Jari Tahvanainen 2016-10-19 09:30:28 UTC
Closing verified+fixed.
