Bug 26569

Summary: [G45] GPU lockups with 2D accel
Product: DRI
Component: DRM/Intel
Status: CLOSED NOTOURBUG
Severity: normal
Priority: medium
Version: DRI git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Reporter: Shawn Starr <shawn.starr>
Assignee: Wang Zhenyu <zhenyu.z.wang>
QA Contact:
Keywords: NEEDINFO
Whiteboard:
i915 platform:
i915 features:

Description Shawn Starr 2010-02-14 20:11:04 UTC
I am seeing a regression: screen updates cause the GPU to lock up. I can reproduce it 100% of the time, but nothing is logged by the kernel; the GPU just wedges and I can't get into the machine either.

Using latest libdrm, ddx and mesa git master code.
Comment 1 Shawn Starr 2010-02-14 20:11:39 UTC
I have an Intel GMA 4500HDx (i965).
Comment 2 Shawn Starr 2010-02-14 20:35:14 UTC
X.Org X Server 1.7.4
Release Date: 2010-01-08
Comment 3 Jesse Barnes 2010-02-16 08:20:15 UTC
Which component regressed?  Can you bisect?
Comment 4 Shawn Starr 2010-02-20 15:15:43 UTC
I switched to Fedora so here is new info:

X.Org X Server 1.7.99.901 (1.8.0 RC 1) - xorg-x11-server-Xorg-1.7.99.901-6.20100215.fc13.x86_64.rpm

intel DDX: xorg-x11-drv-intel-2.10.0-4.fc13.x86_64

libdrm: libdrm-2.4.18-0.1.fc13.x86_64

The kernel is a custom build from early this morning, February 20th, around 11-12am EST, using anholt's latest patches against 2.6.33-rc8+.

If I use 2D acceleration, the GPU locks up. Any audio that was playing gets stuck repeating a chunk of the buffer over and over, and I can't ssh into the machine; it wedges.

We can rule out the X server parts since two different versions show this. It might be the drm driver changes?
Comment 5 Shawn Starr 2010-03-02 23:23:45 UTC
I can give some more information: there is an error being reported just prior to KMS mode setup.

This still happens on 2.6.33 + for-linus and intel-drm-next, with latest libdrm, mesa, ddx.

When moving windows, X uses high CPU. When running glxgears, the FPS is low, X uses 40-50% CPU and glxgears 50% CPU; the gears stall and then it locks up completely.

The kernel spits out:

Mar  3 02:07:18 segfault kernel: DMA-API: preallocated 32768 debug entries
Mar  3 02:07:18 segfault kernel: DMA-API: debugging enabled by kernel config
Mar  3 02:07:18 segfault kernel: DMAR: Device scope device [0000:00:03.02] not found
Mar  3 02:07:18 segfault kernel: DMAR: Device scope device [0000:00:03.02] not found
Mar  3 02:07:18 segfault kernel: DMAR: Device scope device [0000:00:03.03] not found
Mar  3 02:07:18 segfault kernel: DMAR: Device scope device [0000:00:03.03] not found
Mar  3 02:07:18 segfault kernel: IOMMU 0xfeb00000: using Register based invalidation
Mar  3 02:07:18 segfault kernel: IOMMU 0xfeb01000: using Register based invalidation
Mar  3 02:07:18 segfault kernel: IOMMU 0xfeb03000: using Register based invalidation
Mar  3 02:07:18 segfault kernel: IOMMU 0xfeb02000: using Register based invalidation
Mar  3 02:07:18 segfault kernel: IOMMU: Setting RMRR:
Mar  3 02:07:18 segfault kernel: IOMMU: Setting identity map for device 0000:00:02.0 [0xbdc00000 - 0xc0000000]
Mar  3 02:07:18 segfault kernel: IOMMU: Setting identity map for device 0000:00:02.1 [0xbdc00000 - 0xc0000000]
Mar  3 02:07:18 segfault kernel: IOMMU: Setting identity map for device 0000:00:1d.0 [0xfc226c00 - 0xfc227400]
Mar  3 02:07:18 segfault kernel: IOMMU: Setting identity map for device 0000:00:1d.1 [0xfc226c00 - 0xfc227400]
Mar  3 02:07:18 segfault kernel: IOMMU: Setting identity map for device 0000:00:1d.2 [0xfc226c00 - 0xfc227400]
Mar  3 02:07:18 segfault kernel: IOMMU: Setting identity map for device 0000:00:1d.7 [0xfc226c00 - 0xfc227400]
Mar  3 02:07:18 segfault kernel: IOMMU: Setting identity map for device 0000:00:1a.0 [0xfc226c00 - 0xfc227400]
Mar  3 02:07:18 segfault kernel: IOMMU: Setting identity map for device 0000:00:1a.1 [0xfc226c00 - 0xfc227400]
Mar  3 02:07:18 segfault kernel: IOMMU: Setting identity map for device 0000:00:1a.2 [0xfc226c00 - 0xfc227400]
Mar  3 02:07:18 segfault kernel: IOMMU: Setting identity map for device 0000:00:1a.7 [0xfc226c00 - 0xfc227400]
Mar  3 02:07:18 segfault kernel: IOMMU: Prepare 0-16MiB unity mapping for LPC
Mar  3 02:07:18 segfault kernel: IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0x1000000]
Mar  3 02:07:18 segfault kernel: DRHD: handling fault status reg 3
Mar  3 02:07:18 segfault kernel: DMAR:[DMA Write] Request device [00:02.0] fault addr 200000000
Mar  3 02:07:18 segfault kernel: DMAR:[fault reason 05] PTE Write access is not set

00:02.0 in this case is the Intel integrated graphics
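
Editorial note: the fault lines in the log above can be pulled out mechanically, which helps when the same fault has to be compared across kernel versions. The following is a minimal sketch, assuming the syslog format is exactly as quoted in this comment; the function names are hypothetical, and treating the unprefixed fault address as hexadecimal is an assumption. It also checks whether the faulting address falls inside the range that the log says was identity-mapped for the same device. For the lines quoted here it does not (0x200000000 lies above 0xc0000000), which is consistent with the "PTE Write access is not set" reason but does not by itself identify the cause.

```python
import re

# Patterns written against the log lines quoted in this comment.
REQUEST_RE = re.compile(
    r"DMAR:\[DMA (?P<op>Read|Write)\] Request device \[(?P<dev>[0-9a-f:.]+)\] "
    r"fault addr (?P<addr>[0-9a-f]+)"
)
REASON_RE = re.compile(r"DMAR:\[fault reason (?P<code>\d+)\] (?P<text>.+)")
IDMAP_RE = re.compile(
    r"Setting identity map for device (?P<dev>[0-9a-f:.]+) "
    r"\[0x(?P<start>[0-9a-f]+) - 0x(?P<end>[0-9a-f]+)\]"
)


def parse_faults(lines):
    """Pair each DMAR request line with the fault-reason line that follows it."""
    faults, pending = [], None
    for line in lines:
        m = REQUEST_RE.search(line)
        if m:
            # Assumption: the address is printed in hex without a 0x prefix.
            pending = {"op": m["op"], "device": m["dev"], "addr": int(m["addr"], 16)}
            continue
        m = REASON_RE.search(line)
        if m and pending is not None:
            pending["reason_code"] = int(m["code"])
            pending["reason"] = m["text"].strip()
            faults.append(pending)
            pending = None
    return faults


def parse_identity_maps(lines):
    """Collect (device, start, end) tuples from the identity-map lines."""
    maps = []
    for line in lines:
        m = IDMAP_RE.search(line)
        if m:
            maps.append((m["dev"], int(m["start"], 16), int(m["end"], 16)))
    return maps


def in_identity_map(fault, maps):
    """True if the fault address lies in a range mapped for the same device.

    The map lines print the PCI domain (0000:00:02.0) and the fault lines do
    not (00:02.0), so devices are matched on the trailing part.
    """
    return any(
        dev.endswith(fault["device"]) and start <= fault["addr"] < end
        for dev, start, end in maps
    )


sample = [
    "Mar  3 02:07:18 segfault kernel: IOMMU: Setting identity map for device 0000:00:02.0 [0xbdc00000 - 0xc0000000]",
    "Mar  3 02:07:18 segfault kernel: DRHD: handling fault status reg 3",
    "Mar  3 02:07:18 segfault kernel: DMAR:[DMA Write] Request device [00:02.0] fault addr 200000000",
    "Mar  3 02:07:18 segfault kernel: DMAR:[fault reason 05] PTE Write access is not set",
]

maps = parse_identity_maps(sample)
for fault in parse_faults(sample):
    print(fault["device"], fault["op"], hex(fault["addr"]),
          fault["reason"], "identity-mapped:", in_identity_map(fault, maps))
```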
Comment 6 Shawn Starr 2010-03-06 21:37:47 UTC
I note: I cannot view /debug/dri/0/i915_wedged because the GPU is not doing a reset at all. 

This continues even with 2.6.33 final, I also can reproduce this with 2.6.32 vanilla, so there might be something broken in the DDX with 2.10, I'm not able to git bisect too far back because of ABI changes which break with Xorg pre-1.8 server.

Comment 7 Gordon Jin 2010-03-11 00:24:12 UTC
(In reply to comment #6)
> I also can reproduce this with 2.6.32
> vanilla, so there might be something broken in the DDX with 2.10, I'm not able
> to git bisect too far back because of ABI changes which break with Xorg pre-1.8
> server.

You mean it's a regression in xf86-video-intel 2.10? So what's the previous working version? I don't understand why you can't bisect (with xserver 1.7).

I don't see this problem on my G45 (GMA4500HD).
Comment 8 Shawn Starr 2010-03-11 10:24:18 UTC
Well, you cannot build 2.9 against X server 1.8 due to API changes.

After testing with 2.6.32 + Xserver 1.8 I encountered lockups. Either the regression is in the drm driver, and I have to go back to .31 to confirm, or it's in the DDX.

I can try 2.6.31 and compile this to validate that it really isn't from the drm driver if you like.

It would be good if it triggered a GPU reset, but it does not, which makes this very difficult to debug. Even using Intel AMT over LAN for serial, nothing was dumped from the drm saying there were faults.

I do not think Xserver 1.7.x would cause it to lock up so tightly; only the DDX or drm drivers could do this.
Comment 9 Shawn Starr 2010-03-12 09:06:26 UTC
OK, it is not happening in 2.6.31 vanilla. So this is a drm regression. I will begin a bisect from .31. Something in 2.6.32-rcX broke and we're going to find out.
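
Editorial note: the bisect announced here is a binary search over the history between a known-good and a known-bad revision, which is the search that `git bisect` automates. The sketch below only illustrates that idea. It assumes a linear, oldest-to-newest list of revisions (real kernel history is a graph, which git handles itself), and the revision names, the culprit position, and the `is_bad` test are all hypothetical stand-ins for "build this kernel, boot it, and see whether the GPU locks up".

```python
def first_bad(revisions, is_bad):
    """Return (first bad revision, number of revisions tested).

    Preconditions: revisions[0] is known good, revisions[-1] is known bad,
    and every revision after the first bad one is also bad.
    """
    good, bad = 0, len(revisions) - 1
    tested = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tested += 1
        if is_bad(revisions[mid]):
            bad = mid      # culprit is at mid or earlier
        else:
            good = mid     # culprit is after mid
    return revisions[bad], tested


# Hypothetical history: c0 stands in for the good kernel, c1000 for the bad one.
revisions = [f"c{i}" for i in range(1001)]

# Hypothetical test: pretend the lockup was introduced by revision c613.
def is_bad(rev):
    return int(rev[1:]) >= 613

culprit, tested = first_bad(revisions, is_bad)
print(culprit, "found after testing", tested, "revisions")
```

The point of the halving is that the number of kernels to build and boot grows with the logarithm of the range, not with its length, which matters when every test means a reboot and a possible hard lockup.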
Comment 10 Shawn Starr 2010-03-12 09:31:38 UTC
I believe it is the use of the new DMA API that broke my GM45 (i965) GMA 4500HD. Comparing what .31 had with .32/.33, can someone else confirm this?

Do we need a quirk for this chipset I'm using?
Comment 11 Shawn Starr 2010-03-12 09:41:58 UTC
I can confirm that with the kernel parameter intel_iommu=off the GPU works fine, even with 2.6.34-rc1.
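
Editorial note: when testing this workaround it is easy to boot the wrong bootloader entry, so it can help to confirm that the running kernel was actually started with `intel_iommu=off`. A minimal sketch follows, assuming the booted command line is read from `/proc/cmdline`; the helper name and the sample command line are hypothetical.

```python
def intel_iommu_setting(cmdline):
    """Return the value of the last intel_iommu= token, or None if absent."""
    value = None
    for token in cmdline.split():
        if token.startswith("intel_iommu="):
            value = token.split("=", 1)[1]
    return value


# Hypothetical command line; on a live system, read it with
# open("/proc/cmdline").read() instead.
sample_cmdline = "ro root=/dev/sda1 intel_iommu=off quiet"

setting = intel_iommu_setting(sample_cmdline)
if setting == "off":
    print("intel_iommu=off is on the command line")
else:
    print("intel_iommu=off is not on the command line (found: %r)" % setting)
```

Note that this only reports what was passed on the command line; it does not show whether DMA remapping is active, which the DMAR and IOMMU lines in the kernel log indicate.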
Comment 12 Shawn Starr 2010-03-27 21:37:35 UTC
Dropping severity: the new code is experimental, and the workaround solves the problem for now.
Comment 13 Jesse Barnes 2010-06-01 12:23:44 UTC
I think Zhenyu wrote this code; it could very well have broken in recent kernels, possibly due to other IOMMU code changes.

Zhenyu, one other thing I notice when using the IOMMU code is that with DMAR debugging enabled, the kernel will eventually give up tracking DMAR regions due to an overload, and at unload time we seem to have some stale mappings.  Maybe we're not matching map/unmap somewhere?
Comment 14 Chris Wilson 2010-08-08 07:18:17 UTC
I note that DMA-Remapping is now disabled on this chipset due to a few hardware issues...
