Bug 52473

Summary: [ivb gt1 0x0152 regression] instant death
Product: xorg Reporter: felix.engemann
Component: Driver/intel Assignee: Chris Wilson <chris>
Status: RESOLVED DUPLICATE QA Contact: Xorg Project Team <xorg-team>
Severity: major    
Priority: highest CC: astrand
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
Xorg.0.log while trying to start X Server 1.12.3
none
i915_error_state from Ubuntu 12.04 live CD
none
i915_error_state from Ubuntu 12.04 (SNA activated)
none
Xorg.0.log from Ubuntu 12.04 (SNA activated)
none
dmesg from Ubuntu 12.04 (SNA activated)
none
i915_error_state from Ubuntu 12.04 xorg-edgers (SNA activated)
none
Xorg.0.log from Ubuntu 12.04 (xorg-edgers) (SNA activated)
none
dmesg from Ubuntu 12.04 xorg-edgers (SNA activated)
none
Xorg.0.log from OpenSuse 12.1 (X Server 1.10, Kernel 3.1) WORKING
none
Compile against 1.10
none
Xorg.0.log from OpenSuse 12.1 (X Server 1.10, Kernel 3.1) git drivers
none
i915_error_state from OpenSuse 12.1 (X Server 1.10, Kernel 3.1) git drivers
none
Reduce max thread count to 42
none
Allocate some scratch space for gen7
none
i915_error_state from Patch "Allocate some scratch space for gen7"
none
Reduce WM count for IVB GT1 (sna)
none
Allocate some scratch space for gen7
none
i915_error_state from Patch "64713: Allocate some scratch space for gen7"
none
kdm.log (64713: Allocate some scratch space for gen7) running glxgears
none
Glitches 64712: Reduce WM count for IVB GT1 (sna) with Kernel 3.1
none
Distortions SNA (without kwin effects)
none
Xorg.0.log from Ubuntu 12.04 (git compiled + SNA activated + WM count patch)
none
i915_error_state from Ubuntu 12.04 (git compiled + SNA activated + WM count patch)
none
i915_error_state from Ubuntu 12.04 (git compiled + SNA activated + WM count patch + unity2d) after running glxgears
none
Distortions from Ubuntu 12.04 (git compiled + SNA activated + WM count patch + unity2d) after running glxgears
none

Description felix.engemann 2012-07-25 09:10:01 UTC
Created attachment 64645 [details]
Xorg.0.log while trying to start X Server 1.12.3

Board: Zotac H77-ITX-A-E
affected CPU: i5-3470S (HD2500)
reference CPU: i5-3475S (HD4000)

So I have two nearly identical CPUs and identical boards to test with. Only the one with HD2500 is affected by this bug. 

The one with the HD4000 is working (although affected by complete system freezes under Ubuntu 

https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/993187
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/999910 )  


Linux Distros I have tested:
----------------------------
OpenSuse 12.1 (X Server 1.10.x)
OpenSuse 12.2 (X Server 1.12.x)
Debian Wheezy (X Server 1.11.x)
Ubuntu 12.04 (X Server 1.11.3)
Fedora 17 (X Server 1.12.x)


Bug Description:
----------------
- XServer crashes right away with distros offering XServer > 1.10
- On some distros graphical installation of distro works - but heavy glitches and artefacts make it impossible to even read the fonts. 

The only distro I got working with the HD2500 is OpenSuse 12.1 (X Server 1.10.x) with Kernel 3.1.x.x. I got some GPU hangs, but those were fixed by installing Kernel 3.4.x.x.

This seems odd because OpenSuse 12.1 is offering the oldest XServer of all distros tested. 

With OpenSuse 12.2 I managed to get an Xorg.0.log (attached), which says something like this:


[    18.957] [mi] EQ overflowing.  Additional events will be discarded until existing events are processed.
[    18.957] 
[    18.957] Backtrace:
[    18.957] 0: /usr/bin/Xorg (xorg_backtrace+0x36) [0x5646a6]
[    18.957] 1: /usr/bin/Xorg (mieqEnqueue+0x26b) [0x54596b]
[    18.957] 2: /usr/bin/Xorg (0x400000+0x4c402) [0x44c402]
[    18.957] 3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f4f2ae26000+0x61e4) [0x7f4f2ae2c1e4]
[    18.957] 4: /usr/bin/Xorg (0x400000+0x732e7) [0x4732e7]
[    18.957] 5: /usr/bin/Xorg (0x400000+0x97710) [0x497710]
[    18.957] 6: /lib64/libpthread.so.0 (0x7f4f32b77000+0xf140) [0x7f4f32b86140]
[    18.957] 7: /lib64/libc.so.6 (ioctl+0x7) [0x7f4f31ae17f7]
[    18.957] 8: /usr/lib64/libdrm.so.2 (drmIoctl+0x28) [0x7f4f30d33af8]
[    18.957] 9: /usr/lib64/libdrm.so.2 (drmCommandNone+0x16) [0x7f4f30d35ea6]
[    18.957] 10: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f4f2dd34000+0x145de) [0x7f4f2dd485de]
[    18.957] 11: /usr/bin/Xorg (BlockHandler+0x4a) [0x43c64a]
[    18.957] 12: /usr/bin/Xorg (WaitForSomething+0x114) [0x561a44]
[    18.957] 13: /usr/bin/Xorg (0x400000+0x385f1) [0x4385f1]
[    18.957] 14: /usr/bin/Xorg (0x400000+0x27965) [0x427965]
[    18.957] 15: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x7f4f31a25455]
[    18.957] 16: /usr/bin/Xorg (0x400000+0x27c3d) [0x427c3d]
[    18.957] 
[    18.957] [mi] These backtraces from mieqEnqueue may point to a culprit higher up the stack.
[    18.957] [mi] mieq is *NOT* the cause.  It is a victim.
[    20.197] [mi] EQ overflow continuing.  100 events have been dropped.
[    20.197] 
[    20.197] Backtrace:
[    20.197] 0: /usr/bin/Xorg (xorg_backtrace+0x36) [0x5646a6]
[    20.197] 1: /usr/bin/Xorg (0x400000+0x4c402) [0x44c402]
[    20.197] 2: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f4f2ae26000+0x61e4) [0x7f4f2ae2c1e4]
[    20.197] 3: /usr/bin/Xorg (0x400000+0x732e7) [0x4732e7]
[    20.197] 4: /usr/bin/Xorg (0x400000+0x97710) [0x497710]
[    20.197] 5: /lib64/libpthread.so.0 (0x7f4f32b77000+0xf140) [0x7f4f32b86140]
[    20.197] 6: /lib64/libc.so.6 (ioctl+0x7) [0x7f4f31ae17f7]
[    20.197] 7: /usr/lib64/libdrm.so.2 (drmIoctl+0x28) [0x7f4f30d33af8]
[    20.197] 8: /usr/lib64/libdrm.so.2 (drmCommandNone+0x16) [0x7f4f30d35ea6]
[    20.197] 9: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f4f2dd34000+0x145de) [0x7f4f2dd485de]
[    20.197] 10: /usr/bin/Xorg (BlockHandler+0x4a) [0x43c64a]
[    20.197] 11: /usr/bin/Xorg (WaitForSomething+0x114) [0x561a44]
[    20.197] 12: /usr/bin/Xorg (0x400000+0x385f1) [0x4385f1]
[    20.197] 13: /usr/bin/Xorg (0x400000+0x27965) [0x427965]
[    20.197] 14: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x7f4f31a25455]
[    20.197] 15: /usr/bin/Xorg (0x400000+0x27c3d) [0x427c3d]
[    20.197] 
[    21.127] [mi] Increasing EQ size to 512 to prevent dropped events.
[    21.127] [mi] EQ processing has resumed after 184 dropped events.
[    21.127] [mi] This may be caused my a misbehaving driver monopolizing the server's resources.
[    48.513] (WW) intel(0): flip queue failed: Input/output error
[    49.224] (WW) intel(0): Page flip failed: Input/output error
[    49.224] (WW) intel(0): divisor 0 get vblank counter failed: Invalid argument
[    49.378] (II) AIGLX: Suspending AIGLX clients for VT switch


I also think this may have something to do with the full system freeze bugs mentioned above. With the HD4000 and XServer 1.10 (OpenSuse 12.1) I do not suffer system freezes at all!

I'm really willing to do further testing to sort this bug out, so it would help if a dev could point me in the right direction.

Thx Felix
Comment 1 Chris Wilson 2012-07-25 09:19:19 UTC
There has been a GPU hang, please mount debugfs and attach /sys/kernel/debug/dri/0/i915_error_state.
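A minimal sketch of that request, assuming a root shell; the function name and the output filename are illustrative only and not part of any tool:

```shell
# Mount debugfs first if it is not mounted yet (needs root):
#   mount -t debugfs none /sys/kernel/debug

# Copy the error state to a regular file that can be attached to the bug.
# $1: source file (defaults to the path named above)
# $2: destination file (hypothetical name)
save_error_state() {
    src="${1:-/sys/kernel/debug/dri/0/i915_error_state}"
    out="${2:-i915_error_state.txt}"
    if [ ! -r "$src" ]; then
        echo "cannot read $src: is debugfs mounted, and are you root?" >&2
        return 1
    fi
    cat "$src" > "$out"
}
```

Calling `save_error_state` with no arguments right after a hang should leave the dump in the current directory.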
Comment 2 felix.engemann 2012-07-25 09:57:04 UTC
Created attachment 64650 [details]
i915_error_state from Ubuntu 12.04 live CD
Comment 3 Chris Wilson 2012-07-25 10:16:26 UTC
So it dies on the very first UXA render batch buffer. Either we program some state incorrectly for HD2500, or we are missing some chicken bits in the kernel. Can you check whether SNA dies in the same way?
Comment 4 felix.engemann 2012-07-25 10:19:57 UTC
Chris, I will try to do an alternate install of Ubuntu 12.04 so that I can change the rendering method in xorg.conf.
Comment 5 felix.engemann 2012-07-25 11:34:26 UTC
Created attachment 64665 [details]
i915_error_state from Ubuntu 12.04 (SNA activated)

sna rendering activated by:

Section "Device"
 Identifier	"card0"
 Driver	"intel"
 Option	"AccelMethod" "sna"
EndSection

in /etc/X11/xorg.conf.d/20-intel.conf

Hope I did that correctly?! Anyway, the X Server won't start ...
Comment 6 felix.engemann 2012-07-25 11:34:53 UTC
Created attachment 64666 [details]
Xorg.0.log from Ubuntu 12.04 (SNA activated)
Comment 7 felix.engemann 2012-07-25 11:35:14 UTC
Created attachment 64667 [details]
dmesg from Ubuntu 12.04 (SNA activated)
Comment 8 Chris Wilson 2012-07-25 11:45:38 UTC
Ok, that driver is too old to have SNA enabled, so it is just using UXA and hitting the same hang.
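One way to tell from the log whether SNA was really picked up is to look for the driver's SNA initialization line. The sketch below assumes the message text quoted in comment 13 ("SNA initialized with IvyBridge backend") and the usual log location; other driver versions may word it differently, so a miss is reported as unknown rather than as UXA.

```shell
# Report which acceleration backend the intel driver logged.
# $1: path to the X server log (default is the usual location; adjust if needed)
accel_backend() {
    log="${1:-/var/log/Xorg.0.log}"
    if grep -q 'intel(0): SNA initialized' "$log"; then
        echo sna
    else
        echo uxa-or-unknown
    fi
}
```

Running `accel_backend` after X has tried to start prints `sna` only if the driver logged that line.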
Comment 9 felix.engemann 2012-07-25 11:47:39 UTC
Ok. That is what I suspected ... Sorry .. Looking for a PPA with newer drivers. Will post again ..
Comment 10 felix.engemann 2012-07-25 12:07:26 UTC
Created attachment 64668 [details]
i915_error_state from Ubuntu 12.04 xorg-edgers (SNA activated)
Comment 11 felix.engemann 2012-07-25 12:07:57 UTC
Created attachment 64669 [details]
Xorg.0.log from Ubuntu 12.04 (xorg-edgers) (SNA activated)
Comment 12 felix.engemann 2012-07-25 12:08:35 UTC
Created attachment 64671 [details]
dmesg from Ubuntu 12.04 xorg-edgers (SNA activated)
Comment 13 felix.engemann 2012-07-25 12:12:00 UTC
Hi Chris, this time SNA is activated - but X Server still crashes badly.

Xorg.0.log says:
[    15.400] (II) intel(0): SNA initialized with IvyBridge backend

Please note that the xorg-edgers PPA also pulls in a whole new kernel (3.5.x) +
xorg stack (X Server 1.12.3) to Ubuntu 12.04.
Comment 14 Chris Wilson 2012-07-25 12:49:09 UTC
Slightly different: it didn't die immediately on the first render batch. However, it died very early on and in similar circumstances.

That rules out an easy approach of just comparing and contrasting the UXA/SNA batches, I will have to look harder at the code instead.
Comment 15 felix.engemann 2012-07-25 12:57:14 UTC
Ok thx Chris, ... If you have anything I could test for you, just let me know. The HD2500 isn't as common as the HD4000, I'm afraid.

I think it's strange that the HD2500 is working on much older X Server 1.10 with Kernel 3.1 ...
Comment 16 Chris Wilson 2012-07-25 13:03:25 UTC
Does 3.1 even recognise IVB? Could you please attach the old Xorg.log?
Comment 17 felix.engemann 2012-07-25 13:11:21 UTC
Maybe support for IVB is backported by OpenSuse kernel developers?! I don't know. Installing OpenSuse 12.1 right now on yet another disk. Will post Xorg.0.log of working configuration.
Comment 18 felix.engemann 2012-07-25 13:31:54 UTC
Created attachment 64673 [details]
Xorg.0.log from OpenSuse 12.1 (X Server 1.10, Kernel 3.1) WORKING

Hi Chris, attached is the Xorg.log from my only working configuration (OpenSuse 12.1 plain installation / X-Server 1.10 / Kernel 3.1)
Comment 19 Chris Wilson 2012-07-25 13:40:58 UTC
Ok, now can we do a little experimentation?

Can you update the kernel on the OpenSuse box? Does that explode?
Then can you update the ddx, and see if that explodes with either kernel?

If you can compile the kernel/ddx from source for OpenSuse that will help for the next phase... :)
Comment 20 felix.engemann 2012-07-25 13:58:21 UTC
Sure we can do some experimentation ...

1) I already tried Kernel 3.4.4 (not self compiled) from OpenSuse Repo (http://download.opensuse.org/repositories/Kernel:/stable/standard/)

Nothing exploded ... things got even better. I even got rid of some "hangcheck timer elapsed" messages.

2) I'm willing to compile Kernel / DDX. So if you could give me some advice on which driver version I should compile, and with which combination of Kernel / ddx ...
Comment 21 Chris Wilson 2012-07-25 14:07:37 UTC
Ok, to catch up to Ubuntu we should check kernel-3.5 to be sure. But it looks like the hunt is on for a bug in the ddx (hopefully, as it tends to be a little easier to bisect).

So please grab xf86-video-intel from http://cgit.freedesktop.org/xorg/driver/xf86-video-intel (git clone git://anongit.freedesktop.org/xorg/driver/xf86-video-intel), run ./autogen.sh --prefix=/usr, then build and install it (you can overwrite the installed driver with the package manager later), and see if that triggers the hang.
Comment 22 felix.engemann 2012-07-25 14:19:45 UTC
ok will try that Chris. First I have to set up some sane build environment on that OpenSuse box ... argh ...

Did I tell you, that I'm normally trying to avoid everything which has Suse in its name ... :)
Comment 23 felix.engemann 2012-07-25 15:33:51 UTC
Still git-cloning http://cgit.freedesktop.org/xorg/driver/xf86-video-intel 
Guess this will take some time for the initial clone ... Will report back in a few hours ...
Comment 24 felix.engemann 2012-07-25 22:20:41 UTC
Ok, build environment ready now. Installed some tools and xorg-*-devel packages to sort some errors out. But I still have some problems running autogen on xf86-video-intel (libdrm_intel >= 2.4.29 requested):

checking sys/sysinfo.h usability... yes
checking sys/sysinfo.h presence... yes
checking for sys/sysinfo.h... yes
checking whether to include SNA support... yes
checking whether to include UXA support... yes
checking for DRMINTEL... no
configure: error: Package requirements (libdrm_intel >= 2.4.29) were not met:

Requested 'libdrm_intel >= 2.4.29' but version of libdrm is 2.4.26

Consider adjusting the PKG_CONFIG_PATH environment variable if you
installed software in a non-standard prefix.

Alternatively, you may set the environment variables DRMINTEL_CFLAGS
and DRMINTEL_LIBS to avoid the need to call pkg-config.
See the pkg-config man page for more details.

-------------------------------------------------------------------------------

Where can I get an appropriate libdrm? And how can I tell autogen.sh to use the right one?
Comment 25 Chris Wilson 2012-07-25 22:49:19 UTC
libdrm is available from git://anongit.freedesktop.org/mesa/drm (http://cgit.freedesktop.org/mesa/drm) and if you compile and install it in the usual fashion, the rest of the build will pick it up.
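Before re-running autogen.sh it can help to ask pkg-config which libdrm_intel it resolves, since that is what the configure check uses. This is only a sketch: the helper name is made up, and the /usr/local prefix in the comments is an assumption about where a self-built libdrm ends up (on a 64-bit Suse box the directory may be lib64 instead).

```shell
# Succeeds if pkg-config can find libdrm_intel at version $1 or newer.
libdrm_intel_at_least() {
    pkg-config --atleast-version="$1" libdrm_intel
}

# If libdrm was installed under a non-default prefix (here /usr/local, an
# assumption), pkg-config may need to be pointed at it first:
#   export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
#   pkg-config --modversion libdrm_intel
```

For the check that failed in the previous comment, `libdrm_intel_at_least 2.4.29 && echo ok` should print `ok` once the newer library is visible.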
Comment 26 felix.engemann 2012-07-25 23:40:23 UTC
Ok, next error I couldn't sort out. autogen.sh in xf86-video-intel works now. But make fails:


  CC     intel_xvmc.lo
In file included from intel_xvmc.h:43:0,
                 from intel_xvmc.c:27:
/usr/local/include/xf86drm.h:40:17: fatal error: drm.h: No such file or directory
compilation terminated.
make[4]: *** [intel_xvmc.lo] Error 1
make[4]: Leaving directory `/home/engemann/DDX/xf86-video-intel/src/xvmc'
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory `/home/engemann/DDX/xf86-video-intel/src/xvmc'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/engemann/DDX/xf86-video-intel/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/engemann/DDX/xf86-video-intel'
make: *** [all] Error 2


-----------------------------------------------
But: There is a /usr/include/libdrm/drm.h

I exported CPPFLAGS:

export CPPFLAGS="-I/usr/include/libdrm/"

with no success
Comment 27 felix.engemann 2012-07-25 23:47:01 UTC
Ok, forget about the last post. I had simply passed the wrong --prefix while configuring libdrm. But I am still getting errors building the drivers:

/usr/include/X11/fonts/fontproto.h:61:12: note: previous declaration of 'StoreFontClientFont' was here
  CC     libfb_la-fbimage.lo
In file included from /usr/include/xorg/privates.h:145:0,
                 from /usr/include/xorg/gcstruct.h:59,
                 from fb.h:36,
                 from fbimage.c:26:
/usr/include/xorg/dix.h:527:22: warning: redundant redeclaration of 'ffs' [-Wredundant-decls]
  CC     libfb_la-fbline.lo
In file included from /usr/include/xorg/privates.h:145:0,
                 from /usr/include/xorg/gcstruct.h:59,
                 from fb.h:36,
                 from fbline.c:24:
/usr/include/xorg/dix.h:527:22: warning: redundant redeclaration of 'ffs' [-Wredundant-decls]
fbline.c: In function 'fbZeroLine':
fbline.c:63:10: warning: declaration of 'y1' shadows a global declaration [-Wshadow]
  CC     libfb_la-fbpict.lo
In file included from /usr/include/xorg/privates.h:145:0,
                 from /usr/include/xorg/gcstruct.h:59,
                 from fb.h:36,
                 from fbpict.c:28:
/usr/include/xorg/dix.h:527:22: warning: redundant redeclaration of 'ffs' [-Wredundant-decls]
In file included from /usr/include/xorg/picturestr.h:28:0,
                 from fbpict.c:30:
/usr/include/xorg/glyphstr.h:100:1: warning: redundant redeclaration of 'FindGlyphHashSet' [-Wredundant-decls]
/usr/include/xorg/glyphstr.h:94:1: note: previous declaration of 'FindGlyphHashSet' was here
fbpict.c: In function 'sfbComposite':
fbpict.c:49:2: error: too few arguments to function 'miCompositeSourceValidate'
/usr/include/xorg/mipict.h:84:1: note: declared here
fbpict.c:51:3: error: too few arguments to function 'miCompositeSourceValidate'
/usr/include/xorg/mipict.h:84:1: note: declared here
make[4]: *** [libfb_la-fbpict.lo] Error 1
make[4]: Leaving directory `/home/engemann/DDX/xf86-video-intel/src/sna/fb'
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory `/home/engemann/DDX/xf86-video-intel/src/sna'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/engemann/DDX/xf86-video-intel/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/engemann/DDX/xf86-video-intel'
make: *** [all] Error 2
Comment 28 Chris Wilson 2012-07-26 00:03:59 UTC
Created attachment 64690 [details] [review]
Compile against 1.10

This patch should do the trick temporarily.
Comment 29 felix.engemann 2012-07-26 00:39:26 UTC
Thx. Finally I made it. I compiled and installed the git drivers, restarted X, and it blew the whole thing up. The X Server starts at first, but the image is horribly distorted and full of artefacts. I made my way to the console with CTRL-ALT-FX because the whole UI hangs ..

attaching Xorg.0.log and i915_error_state now
Comment 30 felix.engemann 2012-07-26 00:40:07 UTC
Created attachment 64691 [details]
Xorg.0.log from OpenSuse 12.1 (X Server 1.10, Kernel 3.1) git drivers
Comment 31 felix.engemann 2012-07-26 00:41:08 UTC
Created attachment 64692 [details]
i915_error_state from OpenSuse 12.1 (X Server 1.10, Kernel 3.1) git drivers
Comment 32 Chris Wilson 2012-07-26 00:50:37 UTC
I get the feeling that's a different hang. Does the current set of kernel & drivers hang immediately on boot like the original bug?
Comment 33 felix.engemann 2012-07-26 00:58:03 UTC
Yes, it seems different to me too. The drivers on Ubuntu 12.04 hung immediately. Now X seems to start, but the screen is horribly distorted. It goes from a black screen to horribly distorted every few seconds. So something changed between the intel drivers on Ubuntu 12.04 and now, but not for the better .... The working drivers for HD2500 on OpenSuse 12.1 got messed up.
Comment 34 Chris Wilson 2012-07-26 07:38:42 UTC
So let's tackle the ddx on OpenSuse issue and see if that leads to the wider issue. Please 'cd xf86-video-intel && git checkout 2.16.0', then compile, install and check if that works. Hopefully that should build the working suse driver from before.

Then the hunt begins...

git bisect start
git bisect good 2.16.0
git bisect bad master

Compile, install, test.

If it works, say git bisect good, otherwise git bisect bad, and let's hope it leads to the culprit/victim...
Comment 35 felix.engemann 2012-07-26 07:55:41 UTC
Hi Chris, yes self compiled 2.16.0 works as expected. No distortions, no hangs, nice clean i915_error_state ..

I never did a bisect before, so I can't say what 

git bisect start
git bisect good 2.16.0
git bisect bad master

will do exactly ... Shouldn't I check out next version, compile, install, test until bug appears, so that we know which version broke ?
Comment 36 felix.engemann 2012-07-26 08:03:23 UTC
So ok I see - this is what 

git bisect start
git bisect good 2.16.0
git bisect bad master

does. 

[    27.885] (II) Loading /usr/lib64/xorg/modules/drivers/intel_drv.so
[    27.885] (II) Module intel: vendor="X.Org Foundation"
[    27.885]    compiled for 1.10.4, module version = 2.17.0

2.17.0 seems to introduce the bug. Same distorted screen as 2.20.1 
I can post Xorg.0.log and i915_error_state if necessary.
Comment 37 Chris Wilson 2012-07-26 08:09:14 UTC
Ok, the git bisection should hopefully narrow down to which commit actually broke things.

cd xf86-video-intel
git bisect start # tell git to prepare for bisection
git bisect good 2.16.0 # last known good, one end point
git bisect bad 2.17.0 # first known bad, the other end point

git will then pick a commit roughly half-way between the two for you to compile and test. Then after you say git bisect good or git bisect bad (depending on whether the test passed or failed) it will choose another commit for you to test, roughly between the two new endpoints. Eventually it will whittle down to the most likely suspect.

Meanwhile I can review all the patches that went into 2.17.0.
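For anyone new to bisection, the mechanics can be tried on a throwaway repository first. The sketch below is a toy and has nothing to do with the driver tree: it creates ten dummy commits, makes the seventh flip a file from good to bad, and lets `git bisect run` find it with an automated check. The function and file names are invented for the demo. In this bug the check is a manual compile, install and look at the screen, so `git bisect good` or `git bisect bad` would be typed by hand instead.

```shell
# Toy bisection in a temporary repository; prints the subject of the
# first bad commit. Runs in a subshell so the caller's directory is untouched.
bisect_demo() (
    set -e
    repo=$(mktemp -d)
    cd "$repo"
    git init -q .
    git config user.email "tester@example.invalid"
    git config user.name "Tester"

    # Ten commits; from commit 7 onwards state.txt says "bad".
    i=1
    while [ "$i" -le 10 ]; do
        if [ "$i" -ge 7 ]; then echo bad; else echo good; fi > state.txt
        echo "$i" > counter.txt
        git add state.txt counter.txt
        git commit -q -m "commit $i"
        i=$((i + 1))
    done

    # Endpoints: HEAD is known bad, the root commit is known good.
    first=$(git rev-list --max-parents=0 HEAD)
    git bisect start HEAD "$first" >/dev/null

    # Exit status 0 marks a commit good, 1 marks it bad.
    git bisect run grep -q good state.txt >/dev/null

    # refs/bisect/bad now points at the first bad commit.
    git log -1 --format=%s refs/bisect/bad

    cd /
    rm -rf "$repo"
)
```

Calling `bisect_demo` should name the seventh commit after only a handful of test steps, which is the same halving the real bisection relies on.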
Comment 38 felix.engemann 2012-07-26 08:29:32 UTC
cd xf86-video-intel
git bisect start 
git bisect good 2.16.0 
git bisect bad 2.17.0 

compile, install - same thing


> git bisect bad
Bisecting: 102 revisions left to test after this (roughly 7 steps)
[c6acf1325833b8679ef09ab74f0cb0fd82a8cd92] sna/accel: Micro-optimise sna_fill_spans_blt

./autogen.sh --prefix=/usr && make && make install

-> same thing


> git bisect bad
Bisecting: 51 revisions left to test after this (roughly 6 steps)
[84a7c11a8134dfd040d2f90bb1e0670aa2c89962] sna/video: Stop advertising unsupported Xv attributes


./autogen.sh --prefix=/usr && make && make install

-> WORKS! 



Does this help? Can I narrow it down further if I now do "git bisect good" until it fails again?
Comment 39 felix.engemann 2012-07-26 08:53:06 UTC
Ok I just continued bisecting ...



> git bisect good
Bisecting: 25 revisions left to test after this (roughly 5 steps)
[61764af13aa3c770d19d51c8ad198cab8a5866f1] sna/dri: Bump DRI2INFOREC_VERSION

./autogen.sh --prefix=/usr && make && make install
-> WORKS! 


>git bisect good
Bisecting: 12 revisions left to test after this (roughly 4 steps)
[4fd46b8bb7e7a104a0afa0e5dee92993e043ef57] sna/glyphs: Add glyphs directly onto a client temporary buffer

./autogen.sh --prefix=/usr && make && make install
-> FAILS


>git bisect bad
Bisecting: 6 revisions left to test after this (roughly 3 steps)
[6bbb88af096e054877409a54d0e0a4ccf5ee317e] Fix incorrect maximum PS thread count on IvyBridge

./autogen.sh --prefix=/usr && make && make install
-> FAILS


>git bisect bad
Bisecting: 2 revisions left to test after this (roughly 2 steps)
[c68856f34653ac3e7af900dfbba41a108ffe119e] sna/accel: Only skip undamaging the GPU for reads

./autogen.sh --prefix=/usr && make && make install
-> WORKS!


>git bisect good
Bisecting: 0 revisions left to test after this (roughly 1 step)
[7f7f95abbf57e6e71f6a30d917f97c2f2bd6cea9] sna/accel: Use the PolyFillRect to handle tiled spans
-> WORKS!

>git bisect good
6bbb88af096e054877409a54d0e0a4ccf5ee317e is the first bad commit
commit 6bbb88af096e054877409a54d0e0a4ccf5ee317e
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Sat Sep 24 09:27:33 2011 +0100

    Fix incorrect maximum PS thread count on IvyBridge
    
    I mistakenly set GEN7_PS_MAX_THREAD_SHIFT to 23; it's actually 24 on
    Ivybridge.  Not only did this halve our thread count, it caused us to
    write 1 into a bit 23, which is marked as MBZ (must be zero).
    Furthermore, it made us write an even number into this field, which is
    apparently not allowed.  Apparently we were just lucky it worked.
    
    Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

:040000 040000 15f01b78a0d0224a8ac73b0aec0740ed782bcb87 00ea8e4e6c877d1699849b63398c9603f04bd678 M	src
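The arithmetic in that commit message can be checked by hand. The sketch assumes the thread count is programmed as (N - 1) shifted into the field and uses N = 86, the UXA value mentioned in comment 43; both are assumptions for illustration, not lines quoted from the driver source, and the helper names are made up.

```shell
# Encode a thread count as (count - 1) << shift, printed as 32-bit hex.
encode_threads() {
    printf '0x%08X\n' $(( ($1 - 1) << $2 ))
}

# Extract bits 31:24, where the commit message says the field really lives.
field_31_24() {
    echo $(( ($1 >> 24) & 0xFF ))
}

# Succeed if bit $2 of value $1 is set.
bit_set() {
    [ $(( ($1 >> $2) & 1 )) -eq 1 ]
}
```

With these, `encode_threads 86 23` gives 0x2A800000: bit 23 is set and the field at bits 31:24 reads 42, an even number and roughly half of 85. `encode_threads 86 24` gives 0x55000000: bit 23 is clear and the field reads 85. That matches the three symptoms the commit message lists for the old shift.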
Comment 40 Chris Wilson 2012-07-26 09:11:46 UTC
Created attachment 64705 [details] [review]
Reduce max thread count to 42

Scary. Can you please try this patch to reduce the max thread count?
Comment 41 Chris Wilson 2012-07-26 09:14:54 UTC
Can you please tell me the output of "intel_reg_read 0x7008"? (intel_reg_read is part of intel-gpu-tools, http://cgit.freedesktop.org/xorg/app/intel-gpu-tools)
Comment 42 felix.engemann 2012-07-26 09:20:33 UTC
> intel_reg_read 0x7008
Couldn't map MMIO region: No such file or directory

This was before applying patch attached above. 


Before I apply your patch I will first checkout 2.17.0 right ?

> git checkout 2.17.0

to be sure the bug is in ?
Comment 43 Chris Wilson 2012-07-26 09:26:10 UTC
Try sudo intel_reg_read 0x7008

You can first try the patch against 2.17.0 (might need some fuzzing, it's just changing the value 86 to 42 next to GEN7_PS_MAX_THREADS_SHIFT) to confirm your bisection and then try against master to confirm that it is at least one of the issues involved...
Comment 44 felix.engemann 2012-07-26 09:41:59 UTC
Ok, your patch worked!

> cd xf86-video-intel
> git checkout 2.17.0

-> Applying your patch
-> configure, compile, install 
-> No distortions no hang on 2.17.0


intel_reg_read 0x7008 

was already run as root ... Maybe wrong version ?! I just executed the version which was already installed on that OpenSuse box.
Comment 45 felix.engemann 2012-07-26 09:52:32 UTC
Your patch also worked on master! What I did:

> git stash
> git checkout master
-> applied patches for build against 1.10 and Reduce max thread count to 42
-> configure / compile / install


[    27.882] (II) Loading /usr/lib64/xorg/modules/drivers/intel_drv.so
[    27.882] (II) Module intel: vendor="X.Org Foundation"
[    27.882]    compiled for 1.10.4, module version = 2.20.1
[    27.882]    Module class: X.Org Video Driver
[    27.882]    ABI class: X.Org Video Driver, version 10.0


No glitches / no hangs / everything seems to work. Don't know if this makes sense!?

> intel_reg_read 0x7008 (as root)
Couldn't map MMIO region: No such file or directory
Comment 46 Chris Wilson 2012-07-26 10:08:39 UTC
Created attachment 64710 [details] [review]
Allocate some scratch space for gen7

Another idea...

Can you please apply this patch and test SNA?
Comment 47 felix.engemann 2012-07-26 10:22:33 UTC
Ok

> git stash
> git checkout master

-> patching with following patches
- Compile against 1.10
- Allocate some scratch space for gen7

-> configuring, compiling, installing


UXA fails with the same distortion.
SNA fails with similar but not identical distortions. The screen doesn't periodically turn black. The distortions seem less severe, but they become obvious when moving a window. The content of the window in particular gets distorted.

Attaching i915_error_state
Comment 48 felix.engemann 2012-07-26 10:25:09 UTC
Created attachment 64711 [details]
i915_error_state from Patch "Allocate some scratch space for gen7"
Comment 49 Chris Wilson 2012-07-26 10:29:47 UTC
Created attachment 64712 [details] [review]
Reduce WM count for IVB GT1 (sna)

So this is the equivalent patch for SNA. Drop the failed scratch space idea and please try this instead. Thanks!
Comment 50 Chris Wilson 2012-07-26 10:34:54 UTC
Created attachment 64713 [details] [review]
Allocate some scratch space for gen7

Drat. I had my arguments reversed (thanks for the error-state!) can you try the compile + scratch space again for a quick test?
Comment 51 felix.engemann 2012-07-26 10:40:30 UTC
(In reply to comment #49)
> Created attachment 64712 [details] [review]
> Reduce WM count for IVB GT1 (sna)
> 
> So this is the equivalent patch for SNA. Drop the failed scratch space idea and
> please try this instead. Thanks!

So first this patch "Reducing WM count". This one ALMOST solves the issue for SNA. There are minor glitches (only on the window decorator of one window I have open). But no hangs. /sys/kernel/debug/dri/0/i915_error_state is empty.
Comment 52 felix.engemann 2012-07-26 10:47:45 UTC
(In reply to comment #50)
> Created attachment 64713 [details] [review]
> Allocate some scratch space for gen7
> 
> Drat. I had my arguments reversed (thanks for the error-state!) can you try the
> compile + scratch space again for a quick test?

Ok. All previously applied patches dropped. (git stash) Applied build 1.10 patch and this one. 

Tested on SNA. Everything seems to work (no distortions, no glitches) ... But Xorg.0.log is saying: 


[    37.421] (EE) intel(0): Detected a hung GPU, disabling acceleration.
[    37.421] (EE) intel(0): When reporting this, please include i915_error_state from debugfs and the full dmesg.

Attaching i915_error_state
Comment 53 felix.engemann 2012-07-26 10:49:26 UTC
Created attachment 64714 [details]
i915_error_state from Patch "64713: Allocate some scratch space for gen7"
Comment 54 felix.engemann 2012-07-26 10:53:08 UTC
Uuuups. Just ran glxgears. X restarted and all distortions and glitches are back ... 

(In reply to comment #52)
> (In reply to comment #50)
> > Created attachment 64713 [details] [review]
> > Allocate some scratch space for gen7
> > 
> > Drat. I had my arguments reversed (thanks for the error-state!) can you try the
> > compile + scratch space again for a quick test?
> 
> Ok. All previously applied patches dropped. (git stash) Applied build 1.10
> patch and this one. 
> 
> Tested on SNA. Everything seems to work (no distortions, no glitches) ... But
> Xorg.0.log is saying: 
> 
> 
> [    37.421] (EE) intel(0): Detected a hung GPU, disabling acceleration.
> [    37.421] (EE) intel(0): When reporting this, please include
> i915_error_state from debugfs and the full dmesg.
> 
> Attaching i915_error_state
Comment 55 Chris Wilson 2012-07-26 11:12:59 UTC
Hmm, do you have the log file for the glxgears X crash?
Comment 56 Chris Wilson 2012-07-26 11:15:50 UTC
Also are you able to test the reduce WM thread patches on the Ubuntu system as well?
Comment 57 Chris Wilson 2012-07-26 11:21:03 UTC
Also do you mind grabbing a photo of the residual corruption after the SNA thread reduction? Does it persist with an up-to-date kernel? Does it go away if we reduce the 42 threads to 8?
Comment 58 felix.engemann 2012-07-26 11:26:23 UTC
Yes, I can try the WM-reducing patch on Ubuntu 12.04 if you want. It will take a while until the build environment is set up ...

Just to clear things up: the patch we talked about in the last comments is a different one ... (Allocate some scratch space for gen7)

The WM-reducing patches solve the issue under UXA and almost solve it under SNA (glitches on the window decorator, which could be another issue, maybe kwin). But no hangs.

(In reply to comment #56)
> Also are you able to test the reduce WM thread patches on the Ubuntu system as
> well?
Comment 59 felix.engemann 2012-07-26 11:27:08 UTC
I can reproduce this issue. Which log do you need?

(In reply to comment #55)
> Hmm, do you have the log file for the glxgears X crash?
Comment 60 Chris Wilson 2012-07-26 11:30:49 UTC
Right, I suspect the issue with SNA and the reduced thread count is going to be a missing cache flush in the kernel, hence the interest in seeing if an updated kernel fixes things. And of course wishing to confirm that this is the same issue as befell Ubuntu.

Don't worry too much about reproducing the glxgears, just look in /var/log/kdm.log for a recent crash. I'm quite confident that the cause was indirect GLX asserting due to a GPU hang. I wish it wouldn't since it kills X...
Comment 61 felix.engemann 2012-07-26 11:35:37 UTC
Created attachment 64721 [details]
kdm.log (64713: Allocate some scratch space for gen7) running glxgears
Comment 62 felix.engemann 2012-07-26 11:44:57 UTC
(In reply to comment #57)
> Also do you mind grabbing a photo of the residual corruption after the SNA
> thread reduction? Does it persist with an uptodate kernel? Does it go away if
> we reduce the 42 threads to 8?

Attaching screenshot for SNA + WM Reduce Patch. 

uname -a
Linux linux-kwrd 3.1.10-1.16-desktop #1 SMP PREEMPT Wed Jun 27 05:21:40 UTC 2012 (d016078) x86_64 x86_64 x86_64 GNU/Linux


Will now try newer Kernel - then reduce threads further
Comment 63 felix.engemann 2012-07-26 11:45:54 UTC
Created attachment 64722 [details]
Glitches 64712: Reduce WM count for IVB GT1 (sna) with Kernel 3.1
Comment 64 Chris Wilson 2012-07-26 11:50:36 UTC
What an odd set of glitches. At least they look to be a separate issue!

Can you please run "addr2line -e /usr/lib64/xorg/modules/drivers/intel_drv.so 0xc6560 0xc7d32" on that Suse build?
Comment 65 felix.engemann 2012-07-26 11:54:16 UTC
Yes very odd. 

Linux linux-kwrd 3.4.5-1-desktop #1 SMP PREEMPT Wed Jul 18 09:09:22 UTC 2012 (cf1edf7) x86_64 x86_64 x86_64 GNU/Linux

does not fix them.
Comment 66 felix.engemann 2012-07-26 11:55:35 UTC
(In reply to comment #64)
> What an odd set of glitches. At least they look to be a separate issue!
> 
> Can you please run "addr2line -e /usr/lib64/xorg/modules/drivers/intel_drv.so
> 0xc6560 0xc7d32" on that Suse build?

addr2line -e /usr/lib64/xorg/modules/drivers/intel_drv.so 0xc6560 0xc7d32
/home/engemann/DDX/xf86-video-intel/src/sna/sna_damage.h:156
/home/engemann/DDX/xf86-video-intel/src/sna/sna_dri.c:1653
Comment 67 felix.engemann 2012-07-26 12:11:05 UTC
(In reply to comment #57)
> Also do you mind grabbing a photo of the residual corruption after the SNA
> thread reduction? Does it persist with an uptodate kernel? Does it go away if
> we reduce the 42 threads to 8?


Next Test. In src/sna/gen7_render.c changed:

static const struct gt_info ivb_0152_info = {
        .max_vs_threads = 36,
        .max_gs_threads = 36,
        .max_wm_threads = (42-1) << GEN7_PS_MAX_THREADS_SHIFT,
        .urb = { 128, 512, 192 },
};


TO:

static const struct gt_info ivb_0152_info = {
        .max_vs_threads = 36,
        .max_gs_threads = 36,
        .max_wm_threads = (8-1) << GEN7_PS_MAX_THREADS_SHIFT,
        .urb = { 128, 512, 192 },
};

-> Same glitches as before. 

Remember, the first WM patch:
- Reduce max thread count to 42

is not applied here.
 

On all SNA tests only:
- Reduce WM count for IVB GT1 (sna) 

was applied
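
For readers following the change above, the `max_wm_threads` initialiser stores the thread count minus one, shifted into a bit field. A minimal sketch of that encoding, assuming a shift of 24 for `GEN7_PS_MAX_THREADS_SHIFT` (check gen7_render.h in your tree; the value is an assumption here) and using hypothetical helper names that do not exist in the driver:

```c
#include <stdint.h>

/* Assumption: the IVB pixel-shader max-threads field starts at bit 24.
 * Verify against GEN7_PS_MAX_THREADS_SHIFT in your own source tree. */
#define SKETCH_PS_MAX_THREADS_SHIFT 24

/* Hypothetical helper: encode a thread count (must be >= 1) the same way
 * the gt_info initialiser does, i.e. (count - 1) << shift. */
static uint32_t encode_wm_threads(unsigned count)
{
    return (uint32_t)(count - 1) << SKETCH_PS_MAX_THREADS_SHIFT;
}

/* Hypothetical helper: recover the thread count from the encoded field. */
static unsigned decode_wm_threads(uint32_t field)
{
    return (unsigned)(field >> SKETCH_PS_MAX_THREADS_SHIFT) + 1;
}
```

With these helpers, `encode_wm_threads(42)` and `encode_wm_threads(8)` correspond to the two initialisers compared in this test.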
Comment 68 Chris Wilson 2012-07-26 12:25:22 UTC
As another datapoint do you see the same corruption under SNA and KDE on the GT2 device?
Comment 69 Chris Wilson 2012-07-26 12:26:15 UTC
Also what effects do you have active for kwin?
Comment 70 Chris Wilson 2012-07-26 12:29:03 UTC
(And please excuse the noise)

Can you paste what lines you have for

/home/engemann/DDX/xf86-video-intel/src/sna/sna_damage.h:156
/home/engemann/DDX/xf86-video-intel/src/sna/sna_dri.c:1653

as I'm being left baffled. :)
Comment 71 felix.engemann 2012-07-26 12:33:47 UTC
(In reply to comment #69)
> Also what effects do you have active for kwin?

Pretty much standard XRandr effects. If I deactivate kwin effects, the distortions are a little less but similar. See attachment.
Comment 72 felix.engemann 2012-07-26 12:34:30 UTC
Created attachment 64726 [details]
Distortions SNA (without kwin effects)
Comment 73 felix.engemann 2012-07-26 12:39:23 UTC
(In reply to comment #70)
> (And please excuse the noise)
> 
> Can you paste what lines you have for
> 
> /home/engemann/DDX/xf86-video-intel/src/sna/sna_damage.h:156
> /home/engemann/DDX/xf86-video-intel/src/sna/sna_dri.c:1653
> 
> as I'm being left baffled. :)

> /home/engemann/DDX/xf86-video-intel/src/sna/sna_damage.h:156
151 fastcall struct sna_damage *_sna_damage_subtract(struct sna_damage *damage,
152                                                  RegionPtr region);
153 static inline void sna_damage_subtract(struct sna_damage **damage,
154                                        RegionPtr region)
155 {
156         *damage = _sna_damage_subtract(DAMAGE_PTR(*damage), region);
157         assert(*damage == NULL || (*damage)->mode != DAMAGE_ALL);
158 }
159 


> /home/engemann/DDX/xf86-video-intel/src/sna/sna_dri.c:1653
1634 sna_dri_immediate_blit(struct sna *sna,
1635                        DrawablePtr draw,
1636                        struct sna_dri_frame_event *info)
1637 {
1638         drmVBlank vbl;
1639 
1640         DBG(("%s: emitting immediate blit, throttling client\n", __FUNCTION__));
1641         VG_CLEAR(vbl);
1642 
1643         if ((sna->flags & SNA_NO_WAIT) == 0) {
1644                 info->type = DRI2_SWAP_THROTTLE;
1645                 if (sna_dri_window_get_chain((WindowPtr)draw) == info) {
1646                         DBG(("%s: no pending blit, starting chain\n",
1647                              __FUNCTION__));
1648 
1649                         info->bo = sna_dri_copy_to_front(sna, draw, NULL,
1650                                                          get_private(info->front)->bo,
1651                                                          get_private(info->back)->bo,
1652                                                          true);
1653                         DRI2SwapComplete(info->client, draw, 0, 0, 0,
1654                                          DRI2_BLIT_COMPLETE,
1655                                          info->event_complete,
1656                                          info->event_data);
1657
Comment 74 felix.engemann 2012-07-26 12:52:03 UTC
(In reply to comment #68)
> As another datapoint do you see the same corruption under SNA and KDE on the
> GT2 device?

Sorry can't test a HD4000 at the moment. The one I had for testing is already sold since yesterday. All I can say is that on this device Ubuntu 12.04 worked out of the box (no glitches, all effects, etc ...) But we had some nasty System Freezes:

https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/993187
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/999910

But seemed to be fixed with Kernel 3.4.4 .. hopefully ... At least no customer complaint about Freezes since Kernel 3.4.4 

Next Test HD4000 CPU (i5 3475S) will arrive shortly. Then I can test SNA distortions ... 

We are a small company from Germany selling mini-PCs with Ubuntu preinstalled (www.cirrus7.com). We started selling our Ivy Bridge model recently. So the first shock was constant freezes under Ubuntu in the first tests ... 

Second shock was HD2500 did not work at all ... So we stopped selling all HD2500 models ...
Comment 75 Chris Wilson 2012-07-26 13:01:15 UTC
It came as a bit of shock to us as well... Many thanks for your efforts!
Comment 76 felix.engemann 2012-07-26 13:14:24 UTC
(In reply to comment #75)
> It came as a bit of shock to us as well... Many thanks for your efforts!

No problem .. It's in my own interest that the problems get sorted out ... :) 

I'm setting up a Ubuntu 12.04 build environment now to confirm WM count patches under Ubuntu Xorg stack. This will take some hours. Will report back ...
Comment 77 Chris Wilson 2012-07-26 14:26:41 UTC
Ok, spotted how we could end up with an invalid pointer there after the GPU hangs, and hopefully corrected it.

Now all that remains is verifying the fix for the GPU hang, and making progress on the flushing front. I've looked at kde4 on the i7-3720qm I have here and bingo... Ok, should be able to find out what's causing that.
Comment 78 Chris Wilson 2012-07-26 14:44:24 UTC
And lo, the corruption is gone:

commit 7f3fdef98c1ab2fa27439c3be9810b7a934017ce
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jul 26 15:39:05 2012 +0100

    sna/gen7: IVB requires a complete pipeline stall when changing blend modes
    
    Similar to how SandyBridge behaves, I had hoped that with IvyBridge they
    would have made the pipelined operation actually pipelined, but alas.
Comment 79 felix.engemann 2012-07-26 15:08:21 UTC
Now this is strange ....

- installed Ubuntu 12.04 
- git cloned xf86-video-intel 

-> patched with "Reduce max thread count to 42" and "Reduce WM count for IVB GT1 (sna)"

- configured, built, installed 


-> Under UXA, same symptom as unpatched: X Server won't start properly
-> Under SNA, same thing

So the WM count patches do not fix the issue on Ubuntu 12.04. The same patches worked for 

- X Server 1.10 (OpenSuse 12.1)
- Kernel 3.1 / 3.4.4 (OpenSuse 12.1)

Now the same patches do not work for

- XServer 1.11 (Ubuntu 12.04)
- Kernel 3.2 (Ubuntu 12.04)


Will attach Xorg.0.log and i915_error_state from last test (SNA with WM count patch)
Comment 80 felix.engemann 2012-07-26 15:10:35 UTC
Created attachment 64744 [details]
Xorg.0.log from Ubuntu 12.04 (git compiled + SNA activated + WM count patch)
Comment 81 felix.engemann 2012-07-26 15:11:08 UTC
Created attachment 64745 [details]
i915_error_state from Ubuntu 12.04 (git compiled + SNA activated + WM count patch)
Comment 82 Chris Wilson 2012-07-26 15:19:56 UTC
Oj, we are now into mesa territory, so we have overcome the instant death upon starting X only to die a little later as mesa (libGL) encounters the same issue.

Can you prepare to build mesa, and perhaps test X without using a 3D compositor (e.g. unity2d or kde or gnome-classic) on ubuntu?
Comment 83 Chris Wilson 2012-07-26 15:21:44 UTC
The mesa patch is

diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965
index de8b66c..7a70612 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -240,7 +240,7 @@ brwCreateContext(int api,
    /* WM maximum threads is number of EUs times number of threads per EU. */
    if (intel->gen >= 7) {
       if (intel->gt == 1) {
-        brw->max_wm_threads = 86;
+        brw->max_wm_threads = 42;
         brw->max_vs_threads = 36;
         brw->max_gs_threads = 36;
         brw->urb.size = 128;
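
The comment in the hunk above gives the rule: WM maximum threads is the number of EUs times the number of threads per EU. As a rough worked example, assuming 6 EUs with 8 threads each for IVB GT1 (both figures are assumptions for illustration, not stated in this thread), the product is 48, which matches the bspec limit reported later in this bug. The helper names below are hypothetical and not part of Mesa:

```c
/* Assumed figures for illustration only; not taken from this thread. */
#define SKETCH_GT1_EUS        6
#define SKETCH_THREADS_PER_EU 8

/* Hypothetical helper: the rule from the source comment,
 * "number of EUs times number of threads per EU". */
static unsigned wm_thread_limit(unsigned eus, unsigned threads_per_eu)
{
    return eus * threads_per_eu;
}

/* Hypothetical helper: clamp a requested thread count to a limit. */
static unsigned clamp_wm_threads(unsigned requested, unsigned limit)
{
    return requested > limit ? limit : requested;
}
```

Under those assumptions the old value of 86 exceeds the limit and would be clamped, while the patched value of 42 stays below it.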
Comment 84 felix.engemann 2012-07-26 15:35:20 UTC
good point! unity3d dies right away 

unity2d however works! No corruption and no errors in i915_error_state so far. (under SNA and with WM count patches applied)

will prepare for compiling mesa now ... 

(In reply to comment #82)
> Oj, we are now into mesa territory, so we have overcome the instant death upon
> starting X only to die a little later as mesa (libGL) encounters the same
> issue.
> 
> Can you prepare to build mesa, and perhaps test X without using a 3D compositor
> (e.g. unity2d or kde or gnome-classic) on ubuntu?
Comment 85 felix.engemann 2012-07-26 15:58:45 UTC
(In reply to comment #83)
> The mesa patch is
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_context.c
> b/src/mesa/drivers/dri/i965
> index de8b66c..7a70612 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.c
> +++ b/src/mesa/drivers/dri/i965/brw_context.c
> @@ -240,7 +240,7 @@ brwCreateContext(int api,
>     /* WM maximum threads is number of EUs times number of threads per EU. */
>     if (intel->gen >= 7) {
>        if (intel->gt == 1) {
> -        brw->max_wm_threads = 86;
> +        brw->max_wm_threads = 42;
>          brw->max_vs_threads = 36;
>          brw->max_gs_threads = 36;
>          brw->urb.size = 128;



- git cloned mesa
- installed all kind of dev-packages to meet requirements
- finally ./autogen.sh works ....
- your patch fails with: 

root@engemann-H77:~/DDX/mesa# patch src/mesa/drivers/dri/i965/brw_context.c patch.patch 
patching file src/mesa/drivers/dri/i965/brw_context.c
Hunk #1 FAILED at 240.
1 out of 1 hunk FAILED -- saving rejects to file src/mesa/drivers/dri/i965/brw_context.c.rej

because: 

 brw->max_wm_threads = 86; 

starts two lines later here.

- so I manually changed "brw->max_wm_threads = 86;" to "brw->max_wm_threads = 42;"

- hit "make" - installed bison because yacc was needed 
- hit "make" again and get another error:

make[2]: Leaving directory '/root/DDX/mesa/src/mapi/glapi'
Making all in glsl
make[2]: Entering directory '/root/DDX/mesa/src/glsl'
  GEN    glsl_parser.h
conflicts: 1 shift/reduce
  CC     hash_table.o
  CC     symbol_table.o
  CXX    standalone_scaffolding.o
  CXX    main.o
  CXX    builtin_stubs.o
  LEX    glsl_lexer.cc
make[2]: *** [glsl_lexer.cc] Error 1
make[2]: Leaving directory '/root/DDX/mesa/src/glsl'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/root/DDX/mesa/src'
make: *** [all-recursive] Error 1


Don't know how to solve this one ...
Comment 86 felix.engemann 2012-07-26 20:38:46 UTC
Ok, now it worked. I ran "make distclean", after that autogen.sh and make again ... Don't know what happened before - but now it is freshly built with: 

"brw->max_wm_threads = 42;" in src/mesa/drivers/dri/i965/brw_context.c

Nothing seems to have changed ... Same symptoms as before: unity2d working, unity3d crashing ... How can I check whether the system is using the self-compiled mesa?
Comment 87 felix.engemann 2012-07-26 20:45:34 UTC
Ok unity2d seems to run "quite" well. No distortions - no hangs. But if I run glxgears - distortions and hangs are back ...
Comment 88 felix.engemann 2012-07-26 20:48:07 UTC
Created attachment 64754 [details]
i915_error_state from Ubuntu 12.04 (git compiled + SNA activated + WM count patch + unity2d) after running glxgears
Comment 89 felix.engemann 2012-07-26 20:49:16 UTC
Created attachment 64755 [details]
Distortions  from Ubuntu 12.04 (git compiled + SNA activated + WM count patch + unity2d) after running glxgears
Comment 90 felix.engemann 2012-07-26 21:02:00 UTC
Hmm, I think I'm running the wrong mesa. Maybe I missed some option while running autogen.sh? This is what I get after running it without options:



        prefix:          /usr/local
        exec_prefix:     ${prefix}
        libdir:          ${exec_prefix}/lib
        includedir:      ${prefix}/include

        OpenGL:          yes (ES1: no ES2: no)
        OpenVG:          no

        OSMesa:          no
        DRI drivers:     i915 i965 nouveau r200 radeon swrast
        DRI driver dir:  ${libdir}/dri
        GLX:             DRI-based

        GLU:             yes

        EGL:             yes
        EGL platforms:   x11
        EGL drivers:     builtin:egl_glx builtin:egl_dri2

        llvm:            yes
        llvm-config:     /usr/bin/llvm-config
        llvm-version:    2.9

        Gallium:         yes
        Gallium dirs:    auxiliary drivers state_trackers
        Target dirs:     dri-r300 dri-r600 dri-swrast dri-vmwgfx 
        Winsys dirs:     radeon/drm svga/drm sw sw/dri 
        Driver dirs:     galahad identity llvmpipe noop r300 r600 rbug softpipe svga trace 
        Trackers dirs:   dri 

        Shared libs:     yes
        Static libs:     no

        CFLAGS:          -g -O2 -Wall -std=c99 -Werror=implicit-function-declaration -Werror=missing-prototypes -fno-strict-aliasing -fno-builtin-memcmp -g -O2 -fPIC
        CXXFLAGS:        -g -O2 -Wall -fno-strict-aliasing -fno-builtin-memcmp -fPIC
        Macros:          -D_GNU_SOURCE -DPTHREADS -DUSE_X86_64_ASM -DHAVE_POSIX_MEMALIGN -DUSE_XCB -DGLX_INDIRECT_RENDERING -DGLX_DIRECT_RENDERING -DUSE_EXTERNAL_DXTN_LIB=1 -DIN_DRI_DRIVER -DHAVE_ALIAS -DHAVE_MINCORE -DHAVE_LLVM=0x0209

        PYTHON2:         python2

        Run 'make' to build Mesa
Comment 91 Chris Wilson 2012-07-26 21:04:40 UTC
Ok, so we discovered that there was a bspec update and that IVB GT1 is limited to 48 threads, so I pushed

commit 1ced4f1ddcf30b518e1760c7aa4a5ed4f934b9f5
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jul 26 10:50:31 2012 +0100

    Reduce maximum thread count for IVB GT1 to avoid spontaneous combustion
    
    Somewhere along the way it seems that IVB GT1 was reduced to only allow
    a maximum of 48 threads, as revealed in the latest bspecs.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52473
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

to take care of the issue in the ddx. As you are aware the issue still remains in mesa.

Testing mesa itself is a bit tricky since Ubuntu installs its libGL ahead of /usr/local/lib in the search path. If you are brave, you can just cp your mesa/lib/i965_dri.so to /usr/lib/`uname -m`/dri and overwrite the system library. Or you can try poking around /etc/ld.so.conf to see if you can get your library linked first (I haven't had much luck with that approach). Or you can boot to a vt, and run everything by hand with a LD_PRELOAD=/usr/local/lib/libGL.so.
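
One possible way to check which copy of a library a running GL client has actually loaded, on Linux, is to look at its mappings in /proc/<pid>/maps. The sketch below scans such a file for the first mapped path containing a given substring. `find_mapped_library` is a hypothetical helper written for illustration, and the substring to search for (for example the DRI driver's file name) is an assumption you would adapt to your setup:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan a /proc/<pid>/maps style file and copy the
 * path of the first mapping whose path contains `needle` into `out`.
 * Returns 1 if a match was found, 0 otherwise. */
static int find_mapped_library(const char *maps_path, const char *needle,
                               char *out, size_t outlen)
{
    char line[4096];
    FILE *f = fopen(maps_path, "r");

    if (!f)
        return 0;

    while (fgets(line, sizeof(line), f)) {
        /* The pathname is the last field; in a maps line the first '/'
         * marks its start (address, perms, offset, dev, inode have none). */
        char *path = strchr(line, '/');
        if (path && strstr(path, needle)) {
            path[strcspn(path, "\n")] = '\0';   /* drop trailing newline */
            snprintf(out, outlen, "%s", path);
            fclose(f);
            return 1;
        }
    }

    fclose(f);
    return 0;
}
```

Pointing this at the maps file of a running glxgears process and searching for the driver's file name should show whether the path is under /usr/local/lib or the system library directory.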
Comment 92 felix.engemann 2012-07-26 21:35:33 UTC
Ok, running patched mesa now. Can confirm that limiting max_wm_threads to 42 in src/mesa/drivers/dri/i965/brw_context.c stops the crashes. 

OpenGL version string: 2.1 Mesa 8.1-devel (git-e72f206)

Can run glxgears without crashing now. Can start unity3d session without crashing. Only strange thing is - it seems Ubuntu is falling back to unity2d. I have no desktop effects at least ...
Comment 93 Chris Wilson 2012-07-26 21:41:46 UTC
(In reply to comment #92)
> Ok running patched mesa now. Can confirm changing limiting max_wm_threads to 42
> in src/mesa/drivers/dri/i965/brw_context.c does not crash anymore. 
> 
> OpenGL version string: 2.1 Mesa 8.1-devel (git-e72f206)
> 
> Can run glxgears without crashing now. Can start unity3d session without
> crashing. Only strange thing is - it seems Ubuntu is falling back to unity2d. I
> have no desktop effects at least ...

Don't know why that should be... But at least it looks like we are on the right path to getting the driver stable.
Comment 94 felix.engemann 2012-07-26 21:45:41 UTC
Yes, looks good now! I think it's something completely different and unrelated ....
Can continue testing tomorrow - if needed. 

Cheers Felix
Comment 95 Chris Wilson 2012-07-27 08:25:23 UTC
I've pushed a new release of the DDX to fix this bug, thank you many times over for identifying the issue!

I've raised a bug (#52382) against Mesa to get the respective fix into a release there as well.

If you have any more issues, please do let me know about them.

*** This bug has been marked as a duplicate of bug 52382 ***
Comment 96 felix.engemann 2012-07-27 10:46:00 UTC
Thank you Chris for taking care of this issue so quickly. 
Will retest new DDX release (2.20.2) on weekend. Hopefully it's picked up by Distros soon :) 

Will open a request on launchpad.

Cheers Felix
Comment 97 Ralf Czekalla 2012-08-03 12:06:34 UTC
With 2.20.0 the SNA acceleration worked wonderfully on my gen4 965GM notebook.
After updating to 2.20.2 X gets stuck very fast after start-up with the
following message:

[    33.859] (II) Loading /usr/lib/xorg/modules/drivers/intel_drv.so
[    33.859] (II) Module intel: vendor="X.Org Foundation"
[    33.859]    compiled for 1.12.3, module version = 2.20.2
[    33.859]    Module class: X.Org Video Driver
[    33.859]    ABI class: X.Org Video Driver, version 12.0
[    33.859] (II) intel: Driver for Intel Integrated Graphics Chipsets: i810,
        i810-dc100, i810e, i815, i830M, 845G, 854, 852GM/855GM, 865G, 915G,
        E7221 (i915), 915GM, 945G, 945GM, 945GME, Pineview GM, Pineview G,
        965G, G35, 965Q, 946GZ, 965GM, 965GME/GLE, G33, Q35, Q33, GM45,
...
[    33.869] (II) intel(0): Creating default Display subsection in Screen
section
        "Default Screen Section" for depth/fbbpp 24/32
[    33.869] (==) intel(0): Depth 24, (--) framebuffer bpp 32
[    33.869] (==) intel(0): RGB weight 888
[    33.869] (==) intel(0): Default visual is TrueColor
[    33.869] (**) intel(0): Option "AccelMethod" "SNA"
[    33.869] (--) intel(0): Integrated Graphics Chipset: Intel(R) 965GM
[    33.870] (**) intel(0): Framebuffer tiled
[    33.870] (**) intel(0): Pixmaps tiled
[    33.870] (**) intel(0): 3D buffers tiled
[    33.870] (**) intel(0): Throttling enabled
[    33.870] (**) intel(0): Delayed flush enabled
[    33.870] (**) intel(0): "Tear free" disabled
[    33.870] (**) intel(0): Forcing per-crtc-pixmaps? no
[    33.870] (II) intel(0): Output LVDS1 has no monitor section
[    33.870] (II) intel(0): found backlight control interface acpi_video0 (type
'firmware')
[    33.912] (II) intel(0): Output VGA1 has no monitor section
[    33.920] (II) intel(0): Output DVI1 has no monitor section
[    33.920] (II) intel(0): Output TV1 has no monitor section
[    33.921] (II) intel(0): EDID for output LVDS1
...
[   184.513] [mi] EQ overflowing.  Additional events will be discarded until
existing events are processed.
[   184.513]
[   184.513] Backtrace:
[   184.513] 0: /usr/bin/Xorg (xorg_backtrace+0x49) [0x81c8bf9]
[   184.513] 1: /usr/bin/Xorg (mieqEnqueue+0x22b) [0x81a6e5b]
[   184.513] 2: /usr/bin/Xorg (0x8048000+0x48375) [0x8090375]
[   184.513] 3: /usr/bin/Xorg (xf86PostMotionEventM+0x25f) [0x80ccf9f]
[   184.514] 4: /usr/bin/Xorg (xf86PostMotionEvent+0x8d) [0x80cd19d]
[   184.514] 5: /usr/lib/xorg/modules/input/synaptics_drv.so
(0xb6862000+0x3afe) [0xb6865afe]
[   184.514] 6: /usr/lib/xorg/modules/input/synaptics_drv.so
(0xb6862000+0x5e1e) [0xb6867e1e]
[   184.514] 7: /usr/bin/Xorg (0x8048000+0x73af1) [0x80bbaf1]
[   184.514] 8: /usr/bin/Xorg (0x8048000+0x9b412) [0x80e3412]
[   184.514] 9: (vdso) (__kernel_sigreturn+0x0) [0xb7779400]
[   184.514] 10: /lib/libc.so.6 (ioctl+0x14) [0xb73c0334]
[   184.514] 11: /usr/lib/libdrm.so.2 (drmIoctl+0x34) [0xb71c2014]
[   184.514] 12: /usr/lib/xorg/modules/drivers/intel_drv.so
(0xb6dd2000+0x349e7) [0xb6e069e7]
[   184.514] 13: /usr/lib/xorg/modules/drivers/intel_drv.so
(0xb6dd2000+0x35aa8) [0xb6e07aa8]
[   184.514] 14: /usr/lib/xorg/modules/drivers/intel_drv.so
(0xb6dd2000+0x5ee63) [0xb6e30e63]
[   184.514] 15: /usr/lib/xorg/modules/drivers/intel_drv.so
(0xb6dd2000+0x6efe5) [0xb6e40fe5]
[   184.514] 16: /usr/bin/Xorg (BlockHandler+0x56) [0x807efc6]
[   184.514] 17: /usr/bin/Xorg (WaitForSomething+0x10b) [0x81c5dab]
[   184.515] 18: /usr/bin/Xorg (0x8048000+0x32d62) [0x807ad62]
[   184.515] 19: /usr/bin/Xorg (0x8048000+0x209ba) [0x80689ba]
[   184.515] 20: /lib/libc.so.6 (__libc_start_main+0xf3) [0xb730c003]
[   184.515] 21: /usr/bin/Xorg (0x8048000+0x20ce9) [0x8068ce9]
[   184.515]
[   184.515] [mi] These backtraces from mieqEnqueue may point to a culprit
higher up the stack.
[   184.515] [mi] mieq is *NOT* the cause.  It is a victim.
[   189.020] (EE) intel(0): Detected a hung GPU, disabling acceleration.
[   189.020] (EE) intel(0): When reporting this, please include
i915_error_state from debugfs and the full dmesg.
[   190.679] [mi] Increasing EQ size to 512 to prevent dropped events.
[   190.680] [mi] EQ processing has resumed after 93 dropped events.
[   190.680] [mi] This may be caused my a misbehaving driver monopolizing the
server's resources.
[   201.938] (II) AIGLX: Suspending AIGLX clients for VT switch

This problem is gone, if I switch back to UXA.

When I check the change log from 
http://download.opensuse.org/repositories/X11:/XOrg/openSUSE_12.1/i586/xf86-video-intel-2.20.2-21.1.i586.rpm
I see this change.
In the other logs of this bug report, like Xorg.0.log, the messages seem to be quite similar. 

I use openSUSE 12.1 with the newest intel driver version 2.20.2 offered by OBS together with stable kernel 3.4.6 and flavor desktop-i586. 

Thanks 
Ralf
Comment 98 Chris Wilson 2012-08-03 12:25:12 UTC
Nothing at all obvious to do with this bug. Please file a fresh bug report with a full set of Xorg.log, dmesg and i915_error_state.
Comment 99 Ralf Czekalla 2012-08-03 15:10:17 UTC
Comment 100 Ralf Czekalla 2012-08-03 15:11:25 UTC
WONTFIX for gen4 hardware. 2.20.0 was able to work on GM965
Comment 101 Chris Wilson 2012-08-03 15:15:43 UTC
Ralf, this is not your bug. This has nothing to do with 965gm. The first thing you should do before reporting a bug upstream is to try to reproduce it on the upstream driver. Had you done that, you would have found it fixed already.
Comment 102 Chris Wilson 2012-09-13 08:42:41 UTC
*** Bug 54848 has been marked as a duplicate of this bug. ***
Comment 103 Gordon Jin 2012-10-08 08:27:34 UTC

*** This bug has been marked as a duplicate of bug 52382 ***
