72508 – [SNB] Glxgears frozen under gnome-session in stress test with Q4 release

Summary: [SNB] Glxgears frozen under gnome-session in stress test with Q4 release

Status:	VERIFIED FIXED

Alias:	None

Product:	xorg
Classification:	Unclassified
Component:	Driver/intel
Version:	git
Hardware:	Other Linux (All)

Importance:	medium normal
Assignee:	Chris Wilson
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2013-12-09 07:28 UTC by zhoujian
Modified:	2014-05-29 08:29 UTC
CC List:	0 users

See Also:
i915 platform:
i915 features:

Attachments
dmesg.log (27.20 KB, text/plain) 2013-12-09 07:29 UTC, zhoujian
Xorg.0.log (17.16 KB, text/plain) 2013-12-09 07:29 UTC, zhoujian

Description zhoujian 2013-12-09 07:28:25 UTC

System Environment:       
----------------------------------------------------------------------
Platform: SNB (Fedora19)
Kernel: 3.12.2
Libdrm: (master)libdrm-2.4.49
Mesa: (10.0)724c07ff1288cebda9a009ca5a72bae5b51e1062
Xserver:(server-1.14-branch)xorg-server-1.14.4
Xf86_video_intel: (master)2.99.906
Cairo:(master)8e11a42e3e9b679dce97ac45cd8b47322536a253
Libva:(master)88ed1ebe960b1c4a7970e12f8df1ed7d7031352a
Libva_intel_driver:(master)4a0f76c5b706fccbc85fadaeee9d785cd7b57d5a

Bug detailed description:
------------------------------------------------------------------------
Glxgears freezes under gnome-session after roughly 50~90 cycles of the stress
loop on Fedora 19. It works fine on raw X, and the problem does not exist
upstream. Please see Xorg.0.log and dmesg.log.
The problem reproduced only rarely with the Q3 release but reproduces often with Q4.

Xf86 performance
--------------------------------------------------------------------
Test 100glxgears.sh on SNB:
git-9a8478d: good
git-7468a6b: bad



Reproduce steps:
--------------------------------------------------------------------
1.	xinit &
2.	gnome-session
3.	Run the following script:
declare -i K=1
while [ $K -le 100 ]
do
        echo $K
        glxgears &
        K=$K+1
        sleep 1
done
killall glxgears

Comment 1 zhoujian 2013-12-09 07:29:33 UTC

Created attachment 90492
dmesg.log

Comment 2 zhoujian 2013-12-09 07:29:58 UTC

Created attachment 90493
Xorg.0.log

Comment 3 Chris Wilson 2013-12-09 09:28:07 UTC

The problem exists only in f19, not upstream, why is it filed here?

Comment 4 Daniel Vetter 2013-12-09 10:13:26 UTC

Also can you please specify exactly which of the components need to be from downstream (fedora) and which from upstream to reproduce this? In your system environment only the kernel isn't latest git (and doesn't look like a distro kernel either).

Comment 5 zhoujian 2013-12-10 02:01:26 UTC

(In reply to comment #4)
> Also can you please specify exactly which of the components need to be from
> downstream (fedora) and which from upstream to reproduce this? In your
> system environment only the kernel isn't latest git (and doesn't look like a
> distro kernel either).
This problem is caused by Xf86_video_intel. As far as I know, the good commit is
(master)git-9a8478d.

Comment 6 zhoujian 2013-12-10 03:28:47 UTC

(In reply to comment #3)
> The problem exists only in f19, not upstream, why is it filed here?
Sorry for the confusion. The problem exists on all Linux distributions.

Comment 7 Daniel Vetter 2013-12-10 07:29:34 UTC

Since this is a regression in userspace can you pls do the bisect?

Comment 8 zhoujian 2013-12-11 07:18:05 UTC

(In reply to comment #7)
> Since this is a regression in userspace can you pls do the bisect?
Okay, bisecting shows that the first bad commit is:
f3225fcb38686f3b9701725bf3a11ecf1c100c3f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Nov 5 08:38:22 2013 +0000
    sna: Be move conservative with tiling sizes for older fenced gen

    The older generations have stricter requirements for alignment of fenced
    GPU surfaces, so accommodate this by reducing our estimate available
    space for the temporary tile.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Comment 9 Chris Wilson 2013-12-11 10:21:55 UTC

Bisect fail, you fell down the rabbit hole of chasing a different bug. To confirm yet again, you do not see the bug with the current ddx?

Comment 10 zhoujian 2013-12-13 10:48:33 UTC

(In reply to comment #9)
> Bisect fail, you fell down the rabbit hole of chasing a different bug. To
> confirm yet again, you do not see the bug with the current ddx?
In this stress test, one round is a loop that starts glxgears 100 times.
On 2013Q4, the frames of one glxgears instance froze when the loop reached about the 30th~50th instance (within a single round), but after a few seconds glxgears worked well again.
     Errors in dmesg: "[65598.315187] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo(0x55e6000 ctx 1) at 0x55e6220"
On 2013Q2, it is OK (I tried 4 rounds; all passed).

I'm sorry, the problem doesn't seem to be caused by xf86.
I find that the probability of reproducing the bug is much lower with the old driver.
It's difficult to find out which component is responsible.
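
Because the freeze recovers after a few seconds, it is easy to miss by eye. A minimal sketch of a check that counts hang reports in a saved kernel log follows. The helper name count_ring_hangs is hypothetical, and the pattern is assumed from the "render ring hung" message quoted above; other hang messages may be worded differently.

```shell
#!/bin/bash
# Count GPU hang reports in a saved kernel log.
# count_ring_hangs is a hypothetical helper name; the pattern is taken from
# the "render ring hung" error text quoted in this comment.
count_ring_hangs() {
    local logfile=$1
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are no matches, so "|| true" keeps the helper usable under set -e.
    grep -c -- 'ring hung' "$logfile" || true
}

# Example usage (assumes dmesg output was saved after a stress round):
#   dmesg > dmesg.log
#   count_ring_hangs dmesg.log
```

Running this after each round gives a per-round hang count, which may be easier to compare across driver versions than a visual check.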

Comment 11 Chris Wilson 2014-05-28 18:21:43 UTC

Next time you run the stress tests, if it does trigger a hang again, please grab the error state and let's see what's responsible.
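
A minimal sketch of grabbing the error state after a hang, for reference. The helper name save_error_state and the output file name are hypothetical. The default source path assumes debugfs is mounted at /sys/kernel/debug and that the Intel GPU is DRM device 0; on some kernels the same dump may also be available as /sys/class/drm/card0/error.

```shell
#!/bin/bash
# Copy the i915 GPU error state to a regular file so it can be attached.
# save_error_state is a hypothetical helper; the default path is an
# assumption (debugfs mounted at /sys/kernel/debug, GPU is DRM device 0).
save_error_state() {
    local src=${1:-/sys/kernel/debug/dri/0/i915_error_state}
    local dest=${2:-i915_error_state.txt}
    # Check readability first so a failed attempt does not truncate dest.
    if [ ! -r "$src" ]; then
        echo "cannot read $src (debugfs usually needs root)" >&2
        return 1
    fi
    cat -- "$src" > "$dest"
}

# Example usage, run as root right after the hang is reported in dmesg:
#   save_error_state
```

The dump should be taken before rebooting, since the error state lives in kernel memory and is not kept across boots.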

Comment 12 zhoujian 2014-05-29 08:24:14 UTC

I have tested it 5 times, with 100 rounds each time. The issue does not exist with Mesa-10.2, so I am closing it.
