Bug 21330 - [G45 64-bit] ut2004-demo hang after running auto benchmark for 1 min
Summary: [G45 64-bit] ut2004-demo hang after running auto benchmark for 1 min
Status: VERIFIED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: Other Linux (All)
Importance: high critical
Assignee: Eric Anholt
QA Contact:
URL:
Whiteboard:
Keywords:
Duplicates: 19231
Depends on:
Blocks:
 
Reported: 2009-04-21 20:08 UTC by zhao jian
Modified: 2009-07-21 20:00 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments
xorg.0.log (46.78 KB, text/plain)
2009-04-21 20:08 UTC, zhao jian
gpu dump info (115.55 KB, application/octet-stream)
2009-04-21 22:56 UTC, zhao jian
gpu dump after ut2004 hangs X (183.07 KB, application/x-zip-compressed)
2009-06-14 20:42 UTC, zhao jian

Description zhao jian 2009-04-21 20:08:08 UTC
Created attachment 25021
xorg.0.log

System Environment:
--------------------------
Host:           G45
Arch:           X86_64
Kernel:         2.6.29.1
Libdrm:          (master)a1e3ab9e55047c08a4006ec389c1a99b72bc672c
Mesa:           (mesa_7_4_branch)e8807a14a61a0b9389aa2f2a113da24ab22a364d
Xserver:         (server-1.6-branch)11db545a86c8933c638a0bc1fcd4f2c65279f617
Xf86_video_intel:     (2.7)296a986e5258e2fd13ec494071b7063bd639cd68



Bug detailed description:
--------------------------
Start X and run ut2004-demo. With both EXA and UXA, it hangs after about 1 minute.

Reproduce steps:
----------------
1. xinit &
2. Run the ut2004 demo (benchmark.sh)
Comment 1 Gordon Jin 2009-04-21 20:16:40 UTC
Isn't this like bug#19231? (just needs shorter time to reproduce?)
Does this impact the auto benchmarking?
Comment 2 zhao jian 2009-04-21 20:21:20 UTC
(In reply to comment #1)
> Isn't this like bug#19231? (just needs shorter time to reproduce?)
> Does this impact the auto benchmarking?
No. What I mean is just its benchmark. Maybe it is caused by its kernel. 
Comment 3 zhao jian 2009-04-21 22:56:35 UTC
Created attachment 25023
gpu dump info
Comment 4 Gordon Jin 2009-06-09 10:58:10 UTC
Jian, does this still exist with Q2 RC1 package?

Eric, this blocks our nightly performance regression testing on this machine (for stable release branch).
Comment 5 zhao jian 2009-06-14 20:39:06 UTC
(In reply to comment #4)
> Jian, does this still exist with Q2 RC1 package?
> Eric, this blocks our nightly performance regression testing on this machine
> (for stable release branch).

Yes, it still exists with the Q2 RC1 package. The GPU dump is attached.
Comment 6 zhao jian 2009-06-14 20:42:19 UTC
Created attachment 26793
gpu dump after ut2004 hangs X
Comment 7 Eric Anholt 2009-06-23 19:38:22 UTC
Spent a bunch of time today looking into this.  
always_flush_batch=true always_flush_cache=true INTEL_DEBUG=sync narrowed it down to a small batchbuffer in the dump where it all looked sane as far as I can tell (new bits are pushed to intel_gpu_dump for better decoding, too).

My current guess is that the VS program is failing (see updated intel_gpu_top output at hang time) and the instruction parser error is a red herring.
Comment 8 Eric Anholt 2009-06-29 23:49:05 UTC
Based on finding today that the G965 doesn't exhibit the problem while the G/GM45 does, other things that don't help:
- Switching G4x to use only 256 URB register pairs
- Enabling the -RHW workaround
- Cutting max VS threads to 16 like 965.
- Cutting max WM threads to 32 like 965.

What did help was forcing the minimum URB allocation.  More experimenting with this tomorrow.

Comment 9 Eric Anholt 2009-06-30 15:31:28 UTC
*** Bug 19231 has been marked as a duplicate of this bug. ***
Comment 10 Eric Anholt 2009-06-30 17:58:03 UTC
commit cf8ce46531130df3e486fe13567c4aed09b17292
Author: Eric Anholt <eric@anholt.net>
Date:   Tue Jun 30 14:26:06 2009 -0700

    i965: Increase G4X default VS URB allocation to actually allow 32 threads.
    
    This improves the performance of my GLSL demo by 30%.  It also fixes the
    VS deadlock that ut2004 had, for reasons I can't explain. Bug #21330.

This will be cherry-picked to 7.5 if the next set of regression testing comes out OK.
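
For readers following along, here is a rough sketch of why a larger VS URB allocation could be needed to "actually allow 32 threads". It is not taken from the commit. It assumes the driver caps the VS thread count at half the number of VS URB entries, clamped to the chipset maximum; the helper name and the entry counts (32 and 64) are hypothetical and chosen only for illustration. The thread limits of 16 (965) and 32 (G4X) are the ones mentioned in this bug.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not the driver's actual code.
 * Assumption: each VS thread needs two URB entries, so the usable
 * thread count is entries / 2, never below 1 and never above the
 * chipset's maximum.
 */
static int vs_threads_for_urb_entries(int nr_vs_entries, int chipset_max_threads)
{
    int threads = nr_vs_entries / 2;

    if (threads < 1)
        threads = 1;
    if (threads > chipset_max_threads)
        threads = chipset_max_threads;
    return threads;
}

/* Print the thread count for a given allocation (illustration only). */
static void report(int nr_vs_entries, int chipset_max_threads)
{
    printf("%d VS URB entries, chipset max %d -> %d VS threads\n",
           nr_vs_entries, chipset_max_threads,
           vs_threads_for_urb_entries(nr_vs_entries, chipset_max_threads));
}
```

Under that assumption, calling report(32, 32) shows an allocation that can only keep 16 threads busy even though the chipset limit is 32, while report(64, 32) reaches the full 32. This only illustrates the commit title; it does not explain the deadlock itself.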
Comment 11 zhao jian 2009-07-03 03:58:24 UTC
It works well on the master branch now. Verified.
Comment 12 zhao jian 2009-07-21 20:00:07 UTC
Eric, can you cherry-pick this fix to the Mesa 7.5 branch?

