Bug 47085 - [ILK]I-G-T/gem_pipe_control_store_loop fail unstablely
Summary: [ILK]I-G-T/gem_pipe_control_store_loop fail unstablely
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Daniel Vetter
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 40928
 
Reported: 2012-03-07 23:45 UTC by Guang Yang
Modified: 2017-10-06 14:50 UTC (History)

See Also:
i915 platform:
i915 features:


Attachments
reg_dumper (11.73 KB, text/plain)
2012-03-07 23:45 UTC, Guang Yang
dmesg (442 bytes, text/plain)
2012-03-14 00:19 UTC, Guang Yang
print error value (688 bytes, patch)
2012-03-18 09:56 UTC, Daniel Vetter

Description Guang Yang 2012-03-07 23:45:48 UTC
Created attachment 58156 [details]
reg_dumper

System Environment:
--------------------------
Platform:        ILK
Kernel: (drm-intel-next-queued)fa37d39e4c6622d80bd8061d600701bcea1d6870

Bug detailed description:
-------------------------
   On the ILK platform, running gem_pipe_control_store_loop from intel-gpu-tools via the "make test" command sometimes fails, but running it separately always passes.
   The error on the console looks like this:
   gem_pipe_control_store_loop: intel_batchbuffer.c:110: intel_batchbuffer_flush_on_ring: Assertion `ret == 0' failed.
   BTW, the dmesg and error_state are both empty.
Comment 1 Chris Wilson 2012-03-08 00:37:34 UTC

*** This bug has been marked as a duplicate of bug 44748 ***
Comment 2 Chris Wilson 2012-03-08 00:41:24 UTC
Wrong bug, sorry for the noise.
Comment 3 Chris Wilson 2012-03-08 01:17:25 UTC
After a failure is there anything in dmesg?
Comment 4 Guang Yang 2012-03-08 17:01:34 UTC
(In reply to comment #3)
> After a failure is there anything in dmesg?
No, I only get an empty dmesg after the failure.
Comment 5 Guang Yang 2012-03-14 00:18:44 UTC
System Environment:
--------------------------
Platform:        ILK
Kernel: (drm-intel-fixes)5d031e5b633d910f35e6e0abce94d9d842390006

Bug detailed description:
-------------------------
   On the ILK platform, running gem_tiled_fence_blits from intel-gpu-tools
via the "make test" command sometimes fails, but running it separately always
passes.

 The error on the console looks like this:
 gem_tiled_fence_blits: intel_batchbuffer.c:110: intel_batchbuffer_flush_on_ring: Assertion `ret == 0' failed.
Comment 6 Guang Yang 2012-03-14 00:19:31 UTC
Created attachment 58419 [details]
dmesg
Comment 7 Daniel Vetter 2012-03-18 09:56:41 UTC
Created attachment 58638 [details] [review]
print error value

Please apply this patch to i-g-t and check what the error value is when these tests fail. Please also check whether it's always the same or sometimes different, by running the tests until you've observed a few failures with both.
Comment 8 Guang Yang 2012-03-19 19:50:37 UTC
(In reply to comment #7)
> Created attachment 58638 [details] [review] [review]
> print error value
> Please apply this patch to i-g-t and check what the error value is when these
> tests fail. Please also check whether it's always the same or sometimes
> different, by running the tests until you've observed a few failures with both.
The error is:
 exec_batch failed with -13, Permission denied
 gem_pipe_control_store_loop: intel_batchbuffer.c:113: intel_batchbuffer_flush_on_ring: Assertion `0' failed.
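
For readers decoding the value above: the kernel reports errors as negative errno numbers, so -13 corresponds to errno 13. The following is only an illustrative sketch (the helper name `describe_errno` is made up here and is not part of i-g-t) showing how such a value can be mapped back to its symbolic name with Python's standard library:

```python
import errno
import os


def describe_errno(ret):
    """Map a negative kernel-style return value to (symbolic name, message).

    Hypothetical helper for illustration; not part of intel-gpu-tools.
    """
    code = abs(ret)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


name, message = describe_errno(-13)
print(name, "-", message)
```

On Linux this resolves 13 to EACCES, which matches the "Permission denied" text printed by the patched test.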
Comment 9 Daniel Vetter 2012-03-20 02:31:46 UTC
Just to check: All these sporadically failing tests fail with "-13, Permission denied"?

Also please check that when you're running i-g-t tests:
- you are running these tests as root.
- no other drm client (e.g. X) is running. This is _very_ important, because a few tests can only exercise a specific bug if nothing else is using the kernel drm driver.
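
The two preconditions above could be checked automatically before a test run. What follows is a rough sketch under stated assumptions, not an existing i-g-t tool: it assumes a Linux /proc filesystem, assumes DRM device nodes live under /dev/dri/, and the function names `drm_clients` and `precheck` are invented for illustration.

```python
import os


def drm_clients(proc_root="/proc", dri_prefix="/dev/dri/"):
    """Return sorted PIDs that have a file under dri_prefix open.

    Linux-only assumption: walks <proc_root>/<pid>/fd and resolves each
    symlink. Processes we cannot inspect are silently skipped.
    """
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []  # no /proc available on this system
    pids = set()
    for entry in entries:
        if not entry.isdigit():
            continue
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if target.startswith(dri_prefix):
                pids.add(int(entry))
                break
    return sorted(pids)


def precheck(euid=None, clients=None):
    """Return a list of problems; an empty list means both checks passed."""
    euid = os.geteuid() if euid is None else euid
    clients = drm_clients() if clients is None else clients
    problems = []
    if euid != 0:
        problems.append("not running as root (euid=%d)" % euid)
    if clients:
        problems.append("other DRM clients are running: %s" % clients)
    return problems
```

A test rig could call `precheck()` before each test and refuse to start (or at least log a warning) when the returned list is non-empty.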
Comment 10 Guang Yang 2012-03-22 01:41:35 UTC
(In reply to comment #9)
> Just to check: All these sporadically failing tests fail with "-13, Permission
> denied"?
> Also please check that when you're running i-g-t tests that:
> - you are running these tests as root.
> - no other drm client (e.g. X) is running. This is _very_ important, because a
> few tests can only exercise a specific bug if nothing else is using the kernel
> drm driver.
  It seems that other i-g-t test cases that run first influence the failing case.
  With your patch:
  The error from gem_pipe_control_store_loop is:
  exec_batch failed with -13, Permission denied
  gem_pipe_control_store_loop: intel_batchbuffer.c:113: intel_batchbuffer_flush_on_ring: Assertion `0' failed.

  The error from gem_tiled_fence_blits is now:
  not enough RAM to run test, reducing buffer count.
  So, Daniel, I want to confirm with you: if a test case passes when run separately, can we consider that result credible? Or should we check both the standalone result and the result from running the tests consecutively?
Comment 11 Daniel Vetter 2012-03-22 04:32:30 UTC
> --- Comment #10 from yangguang <guang.a.yang@intel.com> 2012-03-22 01:41:35 PDT ---
> (In reply to comment #9)
> > Just to check: All these sporadically failing tests fail with "-13, Permission
> > denied"?
> > Also please check that when you're running i-g-t tests that:
> > - you are running these tests as root.
> > - no other drm client (e.g. X) is running. This is _very_ important, because a
> > few tests can only exercise a specific bug if nothing else is using the kernel
> > drm driver.
>   It seems like other I-G-T case which running first will influence the fail
> case.
>   With your patch:
>   The error of gem_pipe_control_store_loop is:
>   exec_batch failed with -13, Permission denied
>  gem_pipe_control_store_loop: intel_batchbuffer.c:113:
> intel_batchbuffer_flush_on_ring: Assertion `0' failed.

I've crawled through the kernel and the only place we return this error
code is when checking for drm authentication and is_master. The former
will never fail if you run these tests as root. The latter will never fail
if these tests are run separately, when there's no other drm client
running. So again, please check this. And especially check whether your
test rig launches tests in parallel or might not properly wait until a
test has completed. In both cases I expect spurious errors.
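
As an illustration of what "properly wait until a test has completed" can look like in a test rig, here is a minimal sketch. It is an assumption about how such a rig might be written, not a description of the reporter's actual setup; the function name `run_sequentially` and the timeout value are invented for the example.

```python
import subprocess
import sys


def run_sequentially(commands, timeout=600):
    """Run each command to completion before starting the next one.

    subprocess.run() blocks until the child has exited, so the next test
    is never launched while the previous one is still running. On timeout
    the child is killed and None is recorded as its return code.
    """
    results = []
    for cmd in commands:
        try:
            proc = subprocess.run(
                cmd,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                timeout=timeout,
            )
            results.append((cmd, proc.returncode))
        except subprocess.TimeoutExpired:
            results.append((cmd, None))
    return results


if __name__ == "__main__":
    # Placeholder commands; a real rig would list the i-g-t test binaries.
    demo = [[sys.executable, "-c", "pass"]]
    for cmd, code in run_sequentially(demo):
        print(cmd, "->", code)
```

A rig that instead starts tests with a non-blocking call and does not wait on each child could overlap two DRM clients, which is the situation described above as likely to produce spurious errors.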

>   The error of gem_tiled_fence_blits now is:
>   not enough RAM to run test, reducing buffer count.
>  So,Daniel,I want to confirm with you:If one case run seperate can return a
> PASS result,can we think the result is credible? Or we should check the single
> and the continuous running result?

Well, that's not actually an error of any sort. The script is just
complaining that there's not enough memory to run it fully. And then it
exits with return code 77, which means skip. I guess your testing rig gets
confused because the test dumps the error into stderr. I've fixed that up.
But that test failure should happen _always_ on a given machine, so if
this fails unstably, it's rather strange.
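
To make the point about return codes concrete, a test rig could classify results purely by exit status and ignore anything written to stderr. This is a hedged sketch: only the value 77 for "skip" comes from this comment; the function name `classify` and the result labels are assumptions for illustration.

```python
SKIP_EXIT_CODE = 77  # the "skip" return code mentioned in the comment above


def classify(returncode):
    """Classify a test result from its exit status alone.

    Output on stderr is deliberately not considered, so a test that prints
    a warning and then exits with the skip code is not counted as a failure.
    """
    if returncode == 0:
        return "pass"
    if returncode == SKIP_EXIT_CODE:
        return "skip"
    return "fail"


for code in (0, 77, 1):
    print(code, classify(code))
```

With Python's subprocess module on POSIX, a test killed by a signal (for example an abort after a failed assertion) is reported with a negative return code, which this sketch also treats as a failure.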
Comment 12 Daniel Vetter 2012-03-30 13:01:53 UTC
This should be fixed by the following patch from -fixes:

commit 5d031e5b633d910f35e6e0abce94d9d842390006
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Feb 8 13:34:13 2012 +0000

    drm/i915: Remove use of the autoreported ringbuffer HEAD position
Comment 13 Guang Yang 2012-03-30 19:11:01 UTC
Confirmed, the bug has been fixed with the -fixes branch.
Comment 14 Elizabeth 2017-10-06 14:50:48 UTC
Closing old verified.

