Bug 23566

Summary: [i965] Uses 100% CPU with latest mesa/libdrm update
Product: xorg
Reporter: Bryce Harrington <bryce>
Component: Driver/intel
Assignee: Eric Anholt <eric>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: blocker
Priority: high
CC: quanxian.wang, sa, yingying.zhao
Version: 7.4 (2008.09)
Keywords: regression
Hardware: x86 (IA32)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description          Flags
BootDmesg.txt        none
CurrentDmesg.txt     none
Dependencies.txt     none
XorgLog.txt          none
XsessionErrors.txt   none
gdb.txt              none

Description Bryce Harrington 2009-08-27 23:57:36 UTC
Forwarding this bug from Ubuntu:
https://bugs.edge.launchpad.net/ubuntu/+bug/419264

[Problem]
Compiz locks up the system, using 100% CPU and blocking mouse and keyboard input until it is killed, when running with recent git snapshots of libdrm and mesa.  After downgrading to mesa 7.5 and libdrm 2.4.12, the issue goes away.

[Original Report]
compiz eats 100% of the CPU, even after restarting! Only kill -9 is able to close the crazy compiz process.

ProblemType: Bug
Architecture: i386
CompizPlugins: [core,ccp,dbus,place,mousepoll,gnomecompat,move,resize,decoration,png,svg,imgjpeg,text,neg,video,wall,snap,animation,scale,scaleaddon,expo,staticswitcher,regex,resizeinfo,workarounds,ezoom,vpswitch,extrawm,fade,session,shift,wobbly]
Date: Wed Aug 26 17:47:57 2009
DistroRelease: Ubuntu 9.10
MachineType: LENOVO 8933Y16
Package: compiz 1:0.8.2-0ubuntu16
PackageArchitecture: all
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
PciDisplay: 00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller [8086:2a02] (rev 0c)
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.31-7-generic root=UUID=8920ca3c-8a9b-4b68-893c-1fec8a7cf652 ro quiet splash
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-7.27-generic
RelatedPackageVersions:
 xserver-xorg 1:7.4+3ubuntu5
 libgl1-mesa-glx 7.6.0~git20090817.7c422387-0ubuntu2
 libdrm2 2.4.12+git20090801.45078630-0ubuntu1
 xserver-xorg-video-intel 2:2.8.0-0ubuntu2
 xserver-xorg-video-ati 1:6.12.99+git20090629.f39cafc5-0ubuntu6
SourcePackage: compiz
Uname: Linux 2.6.31-7-generic i686
XorgConf: Error: [Errno 2] No such file or directory: '/etc/X11/xorg.conf'
dmi.bios.date: 06/28/2007
dmi.bios.vendor: LENOVO
dmi.bios.version: 7OET24WW (1.03 )
dmi.board.name: 8933Y16
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr7OET24WW(1.03):bd06/28/2007:svnLENOVO:pn8933Y16:pvrThinkPadR61/R61i:rvnLENOVO:rn8933Y16:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 8933Y16
dmi.product.version: ThinkPad R61/R61i
dmi.sys.vendor: LENOVO
system: distro = Ubuntu, architecture = i686, kernel = 2.6.31-7-generic
Comment 1 Bryce Harrington 2009-08-27 23:58:40 UTC
Created attachment 28969 [details]
BootDmesg.txt
Comment 2 Bryce Harrington 2009-08-27 23:58:51 UTC
Created attachment 28970 [details]
CurrentDmesg.txt
Comment 3 Bryce Harrington 2009-08-27 23:59:01 UTC
Created attachment 28971 [details]
Dependencies.txt
Comment 4 Bryce Harrington 2009-08-27 23:59:12 UTC
Created attachment 28972 [details]
XorgLog.txt
Comment 5 Bryce Harrington 2009-08-27 23:59:27 UTC
Created attachment 28973 [details]
XsessionErrors.txt
Comment 6 Bryce Harrington 2009-08-27 23:59:39 UTC
Created attachment 28974 [details]
gdb.txt
Comment 7 Chris Wilson 2009-08-28 00:09:14 UTC
Just a quick question to clarify: Is it spinning inside drawWindowTexture() chain or are we doing lots of counter-productive work?

Another couple of gdb traces, or ideally a sysprof, whilst it is spinning would be useful.
Comment 8 Bryce Harrington 2009-08-28 00:12:12 UTC
I took several additional gdb traces but they all look more or less the same - something stuck in _mesa_copy_rect ().
Comment 9 Eric Anholt 2009-08-28 16:41:07 UTC
This is probably yet another case of the lack of LRUs on our fences causing failure.  Writing a patch.
Comment 10 Eric Anholt 2009-08-29 12:14:14 UTC
*** Bug 23220 has been marked as a duplicate of this bug. ***
Comment 11 Eric Anholt 2009-08-29 12:14:26 UTC
*** Bug 23253 has been marked as a duplicate of this bug. ***
Comment 12 Eric Anholt 2009-08-29 12:56:56 UTC
*** Bug 23366 has been marked as a duplicate of this bug. ***
Comment 14 Eric Anholt 2009-08-29 20:08:53 UTC
Pull request sent.

commit a09ba7faf75fa4b21980d81de8e5f3d5c0785ccf
Author: Eric Anholt <eric@anholt.net>
Date:   Sat Aug 29 12:49:51 2009 -0700

    drm/i915: Fix CPU-spinning hangs related to fence usage by using an LRU.
    
    The lack of a proper LRU was partially worked around by taking the fence
    from the object containing the oldest seqno.  But if there are multiple
    objects inactive, then they don't have seqnos and the first fence reg
    among them would be chosen.  If you were trying to copy data between two
    mappings, this could result in each page fault stealing the fence from
    the other argument, and your application hanging.
    
    https://bugs.freedesktop.org/show_bug.cgi?id=23566
    https://bugs.freedesktop.org/show_bug.cgi?id=23220
    https://bugs.freedesktop.org/show_bug.cgi?id=23253
    https://bugs.freedesktop.org/show_bug.cgi?id=23366
    
    Cc: Stable Team <stable@kernel.org>
    Signed-off-by: Eric Anholt <eric@anholt.net>
    Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
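
The fence-stealing loop described in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions, not the kernel code: the function names, the register count, and the "src"/"dst" labels are all hypothetical, every fence register is assumed to start out held by an inactive object, and the copy is assumed to make progress only once both mappings hold a fence at the same time.

```python
def pick_first_inactive(regs, last_use):
    # Pre-fix behaviour as described in the commit message: inactive objects
    # have no seqno, so the first fence register among them is chosen.
    # In this toy model every holder is inactive, so that is always register 0.
    return 0


def pick_lru(regs, last_use):
    # LRU behaviour: steal the register whose holder was used longest ago.
    return min(range(len(regs)), key=lambda i: last_use[i])


def faults_until_both_mapped(pick_victim, num_regs=4, max_faults=1000):
    """Count page faults until "src" and "dst" both hold a fence register.

    Returns None if the cap is reached, which stands in for the hang.
    """
    regs = [f"old{i}" for i in range(num_regs)]  # all held by inactive objects
    last_use = list(range(num_regs))             # lower value = older use
    clock = num_regs
    faults = 0
    while not ("src" in regs and "dst" in regs):
        if faults == max_faults:
            return None
        # The copy faults on whichever mapping currently lacks a fence.
        obj = "src" if "src" not in regs else "dst"
        victim = pick_victim(regs, last_use)
        regs[victim] = obj
        last_use[victim] = clock
        clock += 1
        faults += 1
    return faults


if __name__ == "__main__":
    print("first-inactive policy:", faults_until_both_mapped(pick_first_inactive))
    print("LRU policy:", faults_until_both_mapped(pick_lru))
```

In this model the first-inactive policy keeps handing register 0 back and forth between the two mappings until the cap is hit, while the LRU policy gives each mapping its own register after two faults.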
Comment 15 qwang13 2009-08-31 13:40:11 UTC
(In reply to comment #14)
Hi Eric,
I applied the patch to 2.6.31-rc7, but the problem is still there. Are more patches needed?
My environment is libdrm 2.4.12, Mesa 7.6, xserver 1.6.3, xf86-video-intel 2.8.1.

Comment 16 Sven Arvidsson 2009-08-31 13:53:13 UTC
I don't know about Compiz, but at least the problems I reported with Warzone 2100 and ETQW have been fixed with this patch.
Comment 17 Eric Anholt 2009-08-31 19:01:19 UTC
Reclosing, since Bryce reports it fixed.
Comment 18 Bryce Harrington 2009-08-31 19:34:40 UTC
Yes, since updating to a kernel which includes this patch, I have been unable to reproduce the bug so far.  I'll continue to keep an eye out for it, and encourage others to likewise test, but so far it appears this patch solved it.

Comment 19 Gordon Jin 2009-09-09 23:32:59 UTC
Good, this commit went into 2.6.31.
