Bug 64811 - [HSW-ULT -nightly regression] GPU hang when running x11perf with SNA enabled
Summary: [HSW-ULT -nightly regression] GPU hang when running x11perf with SNA enabled
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: All Linux (All)
Priority: high
Severity: major
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
Reported: 2013-05-21 06:53 UTC by meng
Modified: 2017-10-06 14:46 UTC
CC List: 1 user

Attachments
-----------
Xorg.0.log (25.41 KB, text/plain), 2013-05-21 06:53 UTC, meng
i915_error_state.tar.gz (321.91 KB, application/gzip), 2013-05-21 06:57 UTC, meng

Description meng 2013-05-21 06:53:58 UTC
Created attachment 79606
Xorg.0.log

Environment:
----------------
Platform: HSW-ULT (id=0x0a26, rev09)
Libdrm:  (master)2.4.44-9-g96e90aabc4c0238de2f2d245899f991a3b996587
Mesa:    (master)888fc7a89197972aac614fc19d1c82ed1adbb3f2
Xserver: (master)xorg-server-1.14.99.1-81-g2f1aedcaed8fd99b823
Xf86_video_intel:(master)2.21.6-31-g2217f6356b53263b6ce8f92b5c29c0614d4ef2a5
Cairo:   (master)728e58e60f89076f626329ee3f006f011783f90b
Kernel:  (drm-intel-nightly) fa643cb2d17c011fdddd5e0cc8fc808e097dc5bb

Bug description:
--------------------------
GPU hang when running x11perf with SNA enabled on HSW-ULT; it works well with UXA.
The problem does not occur on HSW desktop or IVB.
It is a kernel regression on the nightlytop branch; -nightlytop_2013-05-03 (git-040e2924d) is good.
The problem does not exist on any of the sub-branches of nightlytop, so it seems
that a kernel merge causes the problem. Please see the attached Xorg.0.log and
i915_error_state.

Reproduce steps:
----------------
1. Enable SNA (see the config sketch below)
2. xinit &
3. x11perf -aa10text
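
A minimal sketch for step 1, assuming the xf86-video-intel driver is in use and
the server reads /etc/X11/xorg.conf.d/ (the file name 20-intel.conf is illustrative):

    # /etc/X11/xorg.conf.d/20-intel.conf (illustrative file name)
    Section "Device"
        Identifier "Intel Graphics"
        Driver     "intel"
        Option     "AccelMethod" "sna"
    EndSection

SNA vs. UXA is selected at server start, so the X server must be restarted after
changing AccelMethod; steps 2 and 3 then run as plain shell commands.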
Comment 1 meng 2013-05-21 06:57:25 UTC
Created attachment 79608
i915_error_state.tar.gz
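
For reference, on kernels of this era the hang record attached above is typically
captured from debugfs along these lines (the paths are an assumption and vary by
kernel version):

    # mount debugfs if it is not already mounted
    sudo mount -t debugfs none /sys/kernel/debug
    # dump the GPU hang record for the first DRM device, then compress it
    sudo cat /sys/kernel/debug/dri/0/i915_error_state > i915_error_state
    tar czf i915_error_state.tar.gz i915_error_state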
Comment 2 Chris Wilson 2013-05-21 07:34:58 UTC
No one else has access to this hw, and the error indicates something is wonky in the hw - can you please do your best to narrow down exactly which change makes the error (more?) reproducible? If it is a merge commit, perhaps bisect the other side of the merge as well.
Comment 3 meng 2013-05-22 05:48:16 UTC
(In reply to comment #2)

Tested the other merges; the first merge (of drm-intel-fixes and drm-intel-next-queued) reproduces the problem.

detail (built on 2013-05-21)
----------------------------
Merge: 1b45e0a60448224697edcde9e065baee050a3813 (drm-intel-nightly)          bad
       89ced125472b8551c65526934b7f6c733a6864fa (drm-fixes)                  good
Merge: 25d7bf4acbea24176dc79bba9cf8f888bc3d4f00
       b11b88ef0e07a1ea9a3df6666ba8e15833facc67 (drm-next)                   good
Merge: 297b0a345999368ca02616801ab7bfc9a560a422                             bad
       e6c6992522d3f74341f032bda2c76266057cc53b (drm-intel-fixes)            good
       98304ad186296dc1e655399e28d5973c21db6a73 (drm-intel-next-queued)      good
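
As a sketch, each merge parent above can be checked out and tested in turn (the
build and install commands are assumptions that depend on the local kernel workflow):

    # check out one parent of a suspect merge; repeat per hash in the table
    git checkout 89ced125472b8551c65526934b7f6c733a6864fa
    make -j"$(nproc)" && sudo make modules_install install
    # reboot into the new kernel, then re-run the reproducer
    xinit &
    x11perf -aa10text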
Comment 4 Chris Wilson 2013-05-22 08:14:13 UTC
Looks like a semantic conflict in gtt_total_entries() between the two trees - GTT corruption would explain why it dies in the middle of an otherwise sane batch.
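
One hedged way to hunt for such a semantic conflict is to inspect the merge's
combined diff and compare the helper on both parents (using merge 297b0a3 from
the table above; the commands are a sketch):

    # the combined diff shows only conflict resolutions; a silent semantic
    # conflict will typically not appear here at all
    git show --cc 297b0a345999368ca02616801ab7bfc9a560a422 -- drivers/gpu/drm/i915/
    # compare how each parent defines and uses gtt_total_entries()
    git grep -n gtt_total_entries 297b0a3^1 -- drivers/gpu/drm/i915/
    git grep -n gtt_total_entries 297b0a3^2 -- drivers/gpu/drm/i915/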
Comment 5 Daniel Vetter 2013-05-22 08:34:50 UTC
To clarify: Is only hsw ult blowing up like this or are other platforms affected?
Comment 6 meng 2013-05-22 09:04:41 UTC
(In reply to comment #5)
> To clarify: Is only hsw ult blowing up like this or are other platforms
> affected?

It only exists on hsw ult.
Comment 7 Daniel Vetter 2013-05-22 09:30:21 UTC
Hm, that's a bit strange. Has any other test started to fail in the same manner on that hsw ult box?

Also, could you maybe check out a few older -nightly builds to test when that merge regression started to appear?

And finally just to check: Is latest dinq broken now, too?
Comment 8 meng 2013-05-22 12:18:02 UTC
(In reply to comment #7)
3D benchmarks work well on that HSW-ULT machine; only x11perf causes a GPU hang.
The latest 4 sub-branches of nightlytop, such as dinq, are good.
Comment 9 Daniel Vetter 2013-05-23 07:55:12 UTC
dinq now has -fixes merged in (mostly) and is based on 3.10-rc2, so -nightly is (almost) the same as dinq. Does that work now, so we can close this, or is this still an issue?
Comment 10 Daniel Vetter 2013-05-23 08:57:49 UTC
Fixed in 3.10, 3.9 is known-broken. The fixes are imo too invasive to backport.
Comment 11 Daniel Vetter 2013-05-23 10:03:46 UTC
That was the wrong bug report :(
Comment 12 Gordon Jin 2013-05-24 05:15:56 UTC
Mengmeng, I guess we can skip Daniel's comment #10, but he's still waiting for you to answer comment #9.
Comment 13 meng 2013-05-24 05:32:44 UTC
(In reply to comment #9)
The problem still exists on -dinq_ef8a07_20130524 and 3.10.0-rc2_nightlytop_858f85_20130524_+.
Comment 14 Chris Wilson 2013-05-24 13:26:03 UTC
For sanity's sake, can you please verify that c778879 is also good? As we know 98304ad is good, that will reconfirm that it is the merge commit in dinq that exacerbates the issue.
Comment 15 meng 2013-05-26 02:46:33 UTC
(In reply to comment #14)
> For sanity's sake, can you please verify that c778879 is also good? As we
> know 98304ad is good, that will reconfirm that it is the merge commit in
> dinq that exacerbates the issue.

The commit c778879 is also a good commit. Bisecting between ef8a07 (bad)
and c778879 (good) on -dinq shows the first bad commit is the merge:
commit e1b73cba13a0cc68dd4f746eced15bd6bb24cda4 
Merge: 98304ad c778879
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue May 21 09:52:16 2013 +0200

    Merge tag 'v3.10-rc2' into drm-intel-next-queued
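
For reference, the bisect described above amounts to something like this (hashes
from this thread; building and booting the kernel at each step is assumed):

    # on the dinq tree, mark the endpoints
    git bisect start
    git bisect bad ef8a07
    git bisect good c778879
    # at each step: build, boot, run "x11perf -aa10text", then mark it
    #   git bisect good    # no hang
    #   git bisect bad     # GPU hang
    # git eventually prints:
    #   e1b73cba13a0cc68dd4f746eced15bd6bb24cda4 is the first bad commit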
Comment 16 Gordon Jin 2013-06-03 14:01:41 UTC
Chris/Daniel, any idea?
Comment 17 Chris Wilson 2013-06-03 21:54:13 UTC
No idea. The only thing that looks even remotely suspicious is gtt_total_entries(), but the merged code reads fine. Back to assuming that the bisect is a red herring and we need a workaround for some unknown issue.
Comment 18 meng 2013-06-04 01:39:27 UTC
It works well on kernel 3.10.0-rc2_nightlytop_8d72c3_20130603.
Comment 19 Elizabeth 2017-10-06 14:46:21 UTC
Closing old verified bug.

