Bug 17967 - more glxgears window freezes system (with default vblank_mode=1)
Summary: more glxgears window freezes system (with default vblank_mode=1)
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i915
Version: unspecified
Hardware: Other All
Importance: medium blocker
Assignee: Jesse Barnes
QA Contact:
URL:
Whiteboard:
Keywords: NEEDINFO
Depends on:
Blocks:
 
Reported: 2008-10-08 00:21 UTC by qwang13
Modified: 2009-01-06 12:02 UTC


Attachments
xorg.0.log (32.03 KB, text/plain)
2008-10-08 00:21 UTC, qwang13
dmesg (131.00 KB, text/plain)
2008-10-08 00:22 UTC, qwang13

Description qwang13 2008-10-08 00:21:23 UTC
Created attachment 19479
xorg.0.log

Platform: Montevina, Mccreary
OS:    SLES_11_beta2

Packages: 2008Q3RC2, 2008Q3RC3
Mesa-7.2+GEM
xf86-video-intel-2.5 branch


Reproduce steps:

Start several glxgears windows (5 or more), then maximize one or more of them, switch between them, and wait. The OS and X freeze and stop responding entirely.

Alternatively, drag one of the glxgears windows around quickly; the result is the same.

Since this is easy to reproduce, I have not uploaded xorg.conf.
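
The reproduce steps can be scripted roughly as follows. This is only a minimal sketch, assuming glxgears is on the PATH and an X session is already running; the helper name spawn_many is made up for illustration, and the maximizing and dragging still have to be done by hand.

```shell
# Launch COUNT copies of a command in the background and print their PIDs.
# Usage: spawn_many COUNT COMMAND [ARGS...]
# (spawn_many is a hypothetical helper, not part of Mesa or glxgears.)
spawn_many() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        # Silence the FPS output so the terminal stays readable.
        "$@" >/dev/null 2>&1 &
        echo "$!"
        i=$((i + 1))
    done
}

# Example, matching the "5 or more" windows from the report:
#   spawn_many 5 glxgears
```

The printed PIDs make it easy to kill the windows again afterwards if the machine survives.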
Comment 1 qwang13 2008-10-08 00:22:23 UTC
Created attachment 19480
dmesg
Comment 2 Gordon Jin 2008-10-08 06:16:19 UTC
Jiewen, can you reproduce this on G45 or GM45?

Quanxian, have you tried other platforms? I want to know if this is GM45/G45 specific.
Comment 3 lin, jiewen 2008-10-08 20:05:07 UTC
With packages 2008Q3RC2 and 2008Q3RC3, I can't see "the OS and X crash" on our GM45. But when I drag one of the windows for several seconds, the other application windows freeze, except the one being dragged, and everything works well again as soon as the mouse button is released. Maximizing one or more windows behaves the same as dragging: the freeze happens "only in the course of your operation". After the operation finishes, everything works well.
Comment 4 lin, jiewen 2008-10-08 20:10:09 UTC
The behavior described above is not limited to glxgears; it also happens with other apps such as arbfplight.
Comment 5 qwang13 2008-10-08 22:08:41 UTC
(In reply to comment #3)
> With Packages 2008Q3RC2, 2008Q3RC3, I can't see "the OS and X crash" on our
> GM45.But when dragging one of them , last for several seconds, others
> application windows are freezing except the one dragged, and everything works
> well as soon as release you mouse button. Maxsizing one of more is the same as
> dragging, freezing happens "only in the course of your operation". After your 
> operation finish, everything works well.
> 

Did you open 5 or more glxgears windows? One window cannot reproduce this. Also, when dragging the window, please drag it quickly, all over the screen, for about 30 seconds. :)

We are trying on a T61 (965GM) to check whether it is specific to GM45/G45.
Comment 6 haihao 2008-10-08 23:40:45 UTC
Crash issue? I saw a frozen screen with my 965GM. 
Comment 7 haihao 2008-10-09 00:15:27 UTC
What is the kernel version used in SLES_11_beta2?

Comment 8 Stefan Dirsch 2008-10-09 00:20:00 UTC
(In reply to comment #7)
> What is the kernel version used in SLES_11_beta2

2.6.27 rc6 (without any GEM patches).

Comment 9 haihao 2008-10-09 01:47:20 UTC
Could you set vblank_mode to 0 in your ~/.drirc, for example:

<driconf>
 <device screen="0" driver="i965">
         <application name="Default">
                 <option name="vblank_mode" value="0" />
         </application>
 </device>
</driconf>
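
For a quick one-off test without editing ~/.drirc, Mesa's driconf options can usually also be set through an environment variable of the same name. The following is a minimal sketch, assuming that override is available in the Mesa build in use; the wrapper name run_with_vblank_mode is hypothetical.

```shell
# Run a command with the vblank_mode driconf option overridden through the
# environment. Usage: run_with_vblank_mode MODE COMMAND [ARGS...]
# (run_with_vblank_mode is a hypothetical helper; MODE 0 corresponds to the
# value="0" setting in the ~/.drirc snippet above.)
run_with_vblank_mode() {
    mode=$1
    shift
    env vblank_mode="$mode" "$@"
}

# Example: start glxgears with vblank sync disabled for this process only.
#   run_with_vblank_mode 0 glxgears
```

Unlike the ~/.drirc entry, this only affects the one process it launches, which helps when comparing behavior with and without vblank sync side by side.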
Comment 10 qwang13 2008-10-09 02:44:08 UTC
We have tried this. With that configuration, it is OK. :) It seems to be a vblank_mode problem.
Comment 11 Gordon Jin 2008-10-09 19:39:24 UTC
So it turns out to be a vblank issue.
Jesse has vblank-rework in anholt's drm-intel-next kernel. So I guess we'll follow up there in the Q4 release.
Comment 12 qwang13 2008-10-14 01:52:55 UTC
Gordon, you can close this bug.

Thanks for your help
Comment 13 Gordon Jin 2008-10-14 06:00:01 UTC
(In reply to comment #12)
> Gordon, you can close this bug.

No. This _is_ a bug.
Comment 14 Keith Packard 2008-10-17 01:19:34 UTC
Ok, Eric and I spent the day playing with vblank. We've got fixes in the 2D driver, mesa, libdrm and the kernel. Would be nice to know whether this bug remains a problem now.
Comment 15 qwang13 2008-10-17 02:17:07 UTC
(In reply to comment #14)
> Ok, Eric and I spent the day playing with vblank. We've got fixes in the 2D
> driver, mesa, libdrm and the kernel. Would be nice to know whether this bug
> remains a problem now.
> 
That's great.
I have checked the email. There are two emails about the vblank issue:
1) [Intel-gfx] Several vblank swapbuffers fixes
2) [PATCH] [drm/i915] Protect vblank IRQ reg access with spinlock

Are there any others, or should I just use a branch? If so, please tell me which branch to use.

Thanks
Comment 16 qwang13 2008-10-19 18:42:25 UTC
Keith,
The problem is still there. We just disable vblank. I can't work out where your vblank commits for libdrm, mesa, the 2D driver and the kernel are.

Thanks
Comment 17 Jesse Barnes 2008-10-22 08:59:23 UTC
It could be that this isn't a full machine hang, but rather a deadlock in the DRM.  You may be able to run a script in the background that waits a few seconds and then captures the dmesg after a sysrq-t trigger, something like:
  $ (sleep 30; echo t > /proc/sysrq-trigger; dmesg > dmesg.out; sync) &
  $ startx
  <reproduce hang, wait 30s before rebooting>
or something.
Comment 18 qwang13 2008-10-24 00:32:26 UTC
After the hang, the network is down, so we cannot ssh into the machine to get the information.

We have tried the RC5 packages on T61, Mccreary and Montevina; all of them have this problem.
Comment 19 qwang13 2008-10-29 21:15:40 UTC
I have raised the priority of this issue since it is critical for 3D. If it does not work, it will block the Novell SLED11 3D release for both new and old platforms.

We have tried this on G33, T61, Montevina and Mccreary. Every machine we tried has this problem.

I have also tried the vblank patches for libdrm and mesa; they do not help. The drm patches are for GEM, and Novell uses a non-GEM kernel (Q3 final release).

Bug 17963 is also a problem triggered by glxgears (black screen).

Such problems will block 3D support.

Thanks 
Comment 20 Keith Packard 2008-11-04 02:20:20 UTC
I've posted a patch to intel-gfx and dri-devel for review which makes vblank work on my machine at least.
Comment 21 qwang13 2008-11-04 22:05:31 UTC
OK. I have checked the email. It seems to be the one titled "[Intel-gfx] [PATCH] Manage PIPESTAT pending interrupt values to unblock vblank interrupts".

I will try this.

I will report any findings.

Comment 22 qwang13 2008-11-05 01:33:43 UTC
I have checked the content of the patch. It differs a lot from the Q3 release.
The drm package in the Q3 release comes from the Linux kernel. However, from the contents, the patch seems to be for drm-gem, and Novell's drm does not support the drm-gem kernel.

The interface has also changed considerably.

Any comments on that?
Comment 23 Kent Liu 2008-11-10 23:49:33 UTC
Keith or Jesse, are you planning to provide a fix for Gem-classic? The vblank issue seems to be the root cause of many current critical 3D issues.

Also, given Eric's concern in Bug #17963, the vblank code path should be reverted for 855/865. Will you include that fix as well?
Comment 24 qwang13 2008-11-13 21:17:30 UTC
Any progress on that?
Comment 25 Stefan Dirsch 2008-11-22 02:49:44 UTC
I've now disabled VBlank (by default) for i965 for openSUSE 11.1/SLE11. Did
this already for i915 before (Bug #18041). So no more VBlank for intel.
Comment 26 Jesse Barnes 2008-11-24 12:22:41 UTC
Stefan, disabling vblank altogether is a pretty big hammer, since apps depend on it to draw without tearing.  There's another patch which might help this bug at http://lists.freedesktop.org/archives/intel-gfx/2008-November/000614.html, you might want to give it a try.
Comment 27 Jesse Barnes 2008-11-24 15:51:05 UTC
As for the backporting question, no we weren't planning on doing the backport, but someone in the OSV team or at Novell probably could.  There have been a lot of changes though, so it won't be trivial.
Comment 28 Jesse Barnes 2008-11-24 15:53:18 UTC
Quanxiang, can you try Eric's for-airlied tree?  Keith's fixes are included there, along with some additional fixes that have occurred since then.  If that works at least we'll have an idea of whether backporting is needed.
Comment 29 qwang13 2008-11-25 18:07:20 UTC
We are trying to backport your vblank packages. Keith's patch is based on your packages, and some others are also based on yours or on GEM. We will also try the latest branches, including for-airlied. It seems very hard for upstream to provide a patch based on the Q3 release. :( I know you are very busy with GEM. I hope this approach helps.
Comment 30 qwang13 2008-11-26 01:08:42 UTC
1) for-airlied works.
2) After packaging your vblank package and adding Keith's patch "[drm] Move drm vblank initialization/cleanup to driver load/unload", the glxgears window turns black, just as in bug 17963.

By the way, I tested them on GM965 (T61).
Comment 31 Gordon Jin 2008-12-01 00:48:07 UTC
Shall we close this since for-airlied works?
Comment 32 qwang13 2008-12-01 05:07:35 UTC
I don't think so. We still have not found a solution for this. We included Jesse's vblank patch plus Keith's patch, and glxgears still hangs. The for-airlied branch is based on the v2.6.28 kernel. We tried to backport it to v2.6.27, but that is blocked by io-mapping; there are many dependencies on the new kernel.

Therefore we still need to find a solution for the non-GEM branch.

There is no way for us to backport GEM to v2.6.27, since Novell will not change the kernel.
Comment 33 qwang13 2008-12-01 23:03:48 UTC
Jesse,
Any ideas on that?

I have ported your vblank patches plus Keith's patch. It doesn't work on 965GM.

I also heard from Novell that if we don't provide a solution for this, they will disable vblank.

This is still a blocker for Novell if we want them to enable vblank.
Comment 34 Jesse Barnes 2008-12-17 15:52:52 UTC
Disabling vblank by default is fine; it just means users will see tearing in some cases, but shouldn't affect correctness otherwise.
Comment 35 Jesse Barnes 2009-01-06 12:02:43 UTC
Fixed upstream and worked around in SuSE by disabling vblank sync.

