Created attachment 55974
Typical appearance of garbage
After running memory-intensive programs I often get garbled UIs, most often in GTK+ programs. I began to suspect memory being read back from disk, so I ran a small test that forced a lot of memory out to swap, and the artifacts appeared. Someone else on gentoo.org said he had exactly the same sort of artifacts after hibernating to disk.
-- chipset: "Intel 965GM"
-- system architecture: i686 32-bit
-- xf86-video-intel: "2.17.0"
-- xserver: "X.Org X Server 1.11.2"
-- mesa: 7.11.2
-- libdrm: 2.4.27
-- kernel: "Linux beluga 3.1.6-gentoo #1 SMP PREEMPT Sun Jan 22 00:29:26 CET 2012 i686 Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz GenuineIntel GNU/Linux"
-- Linux distribution: gentoo
-- Machine or mobo model: i686 Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz GenuineIntel
-- Display connector: LVDS1
Unable to build intel-gpu-tools due to an xorg version mismatch.
Created attachment 55975
Created attachment 55976
Created attachment 55977
Output of xrandr --verbose
Possibly related to bug 28813. Or perhaps we never got the swizzling right for Crestline?
Johnny, can you please download intel-gpu-tools from http://cgit.freedesktop.org/xorg/app/intel-gpu-tools/ and run make test? However, we may need to throw in a memory hog in order to exercise the swap paths.
Ok, a few things:
- Can you try to grab another screenshot, preferably one where the background image (or any other large image) shows some corruption? GUI elements are usually pretty small and mostly have uniform, flat coloring, so it's much harder to see the pattern in them.
- Please attach the output of intel_reg_dumper from intel-gpu-tools (at least v1.1, preferably git). You need to install a bunch of dependencies to compile it.
- As Chris mentioned, please run the i-g-t testsuite. If that blows up, the relevant tests are tests/gem_tiled_*; you can run these manually.
Created attachment 56003
I have uploaded a screenshot of the desktop showing the corruption.
As I already wrote, I can't build intel-gpu-tools because of an error. Here's the output of autogen.sh:
autoreconf-2.68: Entering directory `.'
autoreconf-2.68: configure.ac: not using Gettext
autoreconf-2.68: running: aclocal
configure.ac:52: error: xorg-macros version 1.16 or higher is required but 1.15.0 found
/usr/share/aclocal/xorg-macros.m4:39: XORG_MACROS_VERSION is expanded from...
configure.ac:52: the top level
autom4te-2.68: /usr/bin/m4 failed with exit status: 1
aclocal-1.11: /usr/bin/autom4te-2.68 failed with exit status: 1
autoreconf-2.68: aclocal failed with exit status: 1
I guess that to succeed I need a newer util-macros package, which in turn needs a newer X server. I'm not sure I want to do that. Can't we get around it?
> --- Comment #7 from Johnny Wezel <freedesktop-
> I guess to succeed, I have to have a newer util-macros package for which I have
> to have a newer X server. I don't know whether I want to do that. Can't we get
> around without it?
Nope, a newer X server is not required at all. Either you upgrade the dev
package (xutils-dev on Debian) or you grab the single missing file:
run ./autogen.sh and copy xorg-macros.m4 into /usr/share/aclocal.
That should make this work. By the way, for quick questions like this it's
usually faster to ask for help on IRC, #intel-gfx on freenode.
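The configure failure above is a plain version comparison: configure.ac requires xorg-macros 1.16 or newer and aclocal found 1.15.0. A minimal Python sketch for checking what is installed before rebuilding, assuming the installed xorg-macros.m4 records its version in an m4_define([vers_have], [...]) line (an assumption about the file's layout; adjust the pattern if your copy differs):

```python
import re
from pathlib import Path
from typing import Optional, Tuple

REQUIRED = (1, 16)  # taken from the configure.ac:52 error above


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.15.0' into (1, 15, 0) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def installed_macros_version(m4_text: str) -> Optional[Tuple[int, ...]]:
    # Assumption: the file contains a line like m4_define([vers_have], [1.15.0]).
    match = re.search(r"m4_define\(\[vers_have\],\s*\[([0-9.]+)\]\)", m4_text)
    return parse_version(match.group(1)) if match else None


def is_new_enough(version: Optional[Tuple[int, ...]], required=REQUIRED) -> bool:
    # Tuples compare element by element: (1, 15, 0) < (1, 16) because 15 < 16.
    return version is not None and version >= required


if __name__ == "__main__":
    path = Path("/usr/share/aclocal/xorg-macros.m4")
    if path.exists():
        found = installed_macros_version(path.read_text(errors="replace"))
        print("installed:", found, "OK" if is_new_enough(found) else "older than 1.16")
    else:
        print("xorg-macros.m4 not found in /usr/share/aclocal")
```

Comparing integer tuples rather than raw strings avoids the classic trap where "1.9" sorts after "1.16".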
Created attachment 56044
Output of intel_reg_dumper
OK, I got intel_reg_dumper working (I had to update libdrm, so I checked whether the bug is still there after that; it is).
Thanks a lot for the reg dump and the screenshot; they are a perfect match for what Chris suspected. Can you please also grab the same registers as in https://bugs.freedesktop.org/show_bug.cgi?id=28813#c21
0x10200 : 0xF0002
0x10204 : 0x0
0x100E0 : 0x0
0x11234 : 0x910C1800
0x11334 : 0x910C1800
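The register values above can be decoded by hand. A minimal Python sketch of that decoding, assuming the bit definitions as I recall them from the i915 driver: DCC at 0x10200 with the addressing mode in bits 1:0 and the channel XOR controls in bits 9 and 10, and DCC2 at 0x10204 with bit 20 disabling "modified enhanced" (L-shaped) addressing. The constant names and bit positions are assumptions to check against i915_reg.h for your kernel:

```python
# Assumed bit layout (verify against i915_reg.h for your kernel version).
DCC_ADDRESSING_MODE_MASK = 0x3
DCC_ADDRESSING_MODES = {
    0: "single channel",
    1: "dual channel asymmetric",
    2: "dual channel interleaved",
}
DCC_CHANNEL_XOR_DISABLE = 1 << 10
DCC_CHANNEL_XOR_BIT_17 = 1 << 9
DCC2_MODIFIED_ENHANCED_DISABLE = 1 << 20  # 0x100000


def decode(dcc: int, dcc2: int) -> dict:
    mode = DCC_ADDRESSING_MODES.get(dcc & DCC_ADDRESSING_MODE_MASK, "unknown")
    swizzle_x = swizzle_y = "none"
    if mode == "dual channel interleaved":
        if dcc & DCC_CHANNEL_XOR_DISABLE:
            swizzle_x, swizzle_y = "bit9/bit10", "bit9"
        elif dcc & DCC_CHANNEL_XOR_BIT_17:
            swizzle_x, swizzle_y = "bit9/bit10/bit17", "bit9/bit17"
        else:
            swizzle_x, swizzle_y = "bit9/bit10/bit11", "bit9/bit11"
    # Bit 20 of DCC2 clear means the L-shaped ("modified enhanced") mode is active.
    l_shaped = not (dcc2 & DCC2_MODIFIED_ENHANCED_DISABLE)
    return {"mode": mode, "swizzle_x": swizzle_x, "swizzle_y": swizzle_y, "l_shaped": l_shaped}


# Values reported above: 0x10200 = 0xF0002, 0x10204 = 0x0
print(decode(0xF0002, 0x0))
# {'mode': 'dual channel interleaved', 'swizzle_x': 'bit9/bit10/bit11', 'swizzle_y': 'bit9/bit11', 'l_shaped': True}
```

Under these assumptions the dump says: dual-channel interleaved memory, bit 6 swizzled with bits 9/10/11 for X-tiling, and the L-shaped mode enabled because bit 20 of DCC2 reads back as 0.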
Can you try to set bit 20 in register 0x10204 like this?
intel_reg_write 0x10204 0x100000
Note, though, that the documentation is unclear about what exactly this bit does, and it has the potential to corrupt system memory (not just graphics memory). So I strongly advise you to try this on a throw-away disk/installation.
But it's the only thing I could find, so please try it if you can.
This is the output of the command:
Value before: 0x0
Value after: 0x0
The command has no effect. The problems persist.
I'm not sure whether this helps, but another effect of memory being swapped back in is that icons in GTK+ programs lose their images, as shown in the attached screenshot.
Created attachment 56154
Damaged GTK+ icons
There is no way to make the icons' images reappear.
OK, so the hardware doesn't allow this bit to be flipped after initialization, which makes sense. I need to do more documentation reading, and also some patch writing, before I'll have something new for you to test.
For the damaged icons: Does this not happen when swap is disabled?
The icons look more like the ddx bug:
Author: Chris Wilson <firstname.lastname@example.org>
Date: Thu Nov 3 20:41:31 2011 +0000
uxa: Remove caching of surface binding location
If the pixmap were to be used multiple times within a batch with
multiple formats, the cache would only return the initial location with
the incorrect format and so cause rendering glitches. For instance, GTK+
uses the same pixmap as an xrgb source and as an argb mask in order to
premultiply and composite in a single pass. Rather than introduce an
overly complicated caching (handle, format) mechanism, kiss and remove
the invalid implementation.
Signed-off-by: Chris Wilson <email@example.com>
Released in 2.16.902.
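The failure mode that commit describes is a cache keyed on too little. A toy Python model (a hypothetical class with made-up 32-byte offsets, not the actual uxa code) of a per-batch binding cache keyed on the pixmap handle alone, versus one keyed on (handle, format):

```python
class BindingCache:
    """Toy model of a per-batch cache of surface binding locations."""

    def __init__(self, key_on_format: bool):
        self.key_on_format = key_on_format
        self.entries = {}        # key -> (offset, format)
        self.next_offset = 0

    def bind(self, handle: int, fmt: str):
        key = (handle, fmt) if self.key_on_format else handle
        if key not in self.entries:
            # Emit a new surface state at the next free offset in the batch.
            self.entries[key] = (self.next_offset, fmt)
            self.next_offset += 32
        return self.entries[key]


# GTK+ uses one pixmap as an xrgb source and as an argb mask in the same batch.
buggy = BindingCache(key_on_format=False)
buggy.bind(7, "xrgb")
print(buggy.bind(7, "argb"))   # (0, 'xrgb'): the mask is sampled with the wrong format

keyed = BindingCache(key_on_format=True)
keyed.bind(7, "xrgb")
print(keyed.bind(7, "argb"))   # (32, 'argb')
```

As the commit message says, the upstream fix did not add the (handle, format) key; it removed the cache altogether.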
The icons blacken only after swapping. IMHO this is not a GTK+ bug.
*** Bug 46178 has been marked as a duplicate of this bug. ***
Got this again under GNOME Shell after suspending to disk. (I have a screenshot if you need more of them.) Restarting GNOME Shell with Alt-F2 r fixed the corruption.
Is there anything I can do to help debug this?
(In reply to comment #20)
> Got this again, under GNOME Shell, after using suspend to disk. (I've a
> screenshot if you need more of them.) Restarting GNOME Shell with Alt-F2 r
> fixed the corruption.
> Is there anything I can to do help debug this?
Unfortunately not. I have a machine which has this issue, too. And we have tests in i-g-t that can easily reproduce it. The problem is simply that I have no idea how to fix it (without rewriting the entire driver, that is).
What's the grand plan? Everyone needs to be aware of physical page locations and swizzling again?
One thing you could try is to grab the latest version of intel-gpu-tools from git and run the testsuite with make test. That should give you a nice set of failing tests with "tiled" in their names.
Then try to decrease the amount of memory Linux uses with the mem=xxxm boot parameter until all the tests with "tiled" in their names work reliably. The best result would be to find a value where things work with mem=xxxm but not with mem=xxx+1m.
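Finding that mem= threshold is a binary search, provided the failures only appear above some amount of memory (the assumption behind the suggestion). A sketch in Python; tests_pass is a hypothetical stand-in for "boot with mem=<n>m, run the tests with tiled in their names, and report whether they all passed", which in practice means one reboot per step:

```python
from typing import Callable


def find_threshold(lo: int, hi: int, tests_pass: Callable[[int], bool]) -> int:
    """Return the largest n in [lo, hi] (in MiB) for which tests_pass(n) is True.

    Assumes tests_pass(lo) is True and that results are monotonic:
    once the tests fail at some size, they fail for every larger size.
    """
    while lo < hi:
        mid = (lo + hi + 1) // 2   # round up so the loop always makes progress
        if tests_pass(mid):
            lo = mid               # mid still works: threshold is at or above it
        else:
            hi = mid - 1           # mid fails: threshold is below it
    return lo


# Simulated machine that breaks above 2048 MiB (a made-up number).
print(find_threshold(512, 3072, lambda n: n <= 2048))
```

Each step halves the remaining range, so narrowing 512 to 3072 MiB down to a single megabyte takes about a dozen test boots rather than hundreds.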
Created attachment 81971
Hack to prevent movement of swizzled pages
Here is a hack; please test it.
Timeout. What is the status of the bug?
Johnny, still an issue? Did you try the proposed hack patch?
Hi, I had the same problem on Debian Wheezy and managed to solve it by switching my hibernation method to TuxOnIce.
I hope this extra information helps solve the problem.
Excuse me, what workarounds are there for this problem besides patching the drm kernel module? Is it possible to invalidate the pixmap cache that GTK uses?
Second: the bug is in the "NEEDINFO" state. What kind of information is needed?
Disable swap and add more RAM instead.
And in the meantime, please retest with the latest drm-intel-nightly.
(In reply to Rodrigo Vivi from comment #29)
> And by the time, please retest with latest drm-intel-nightly
Known hardware issue that remains unresolved. The patch we want tested is attached to this bug.
OK, I've finally gotten around to polishing Chris's patch and updating the testcase:
As soon as I have a few tested-by reports I'll pull this in, so please go wild. The patch applies on top of the latest drm-intel-nightly.
The workaround is now merged into drm-intel-nightly and should land in 3.19.
commit 14a369b6c9bdb40cebdac5a248321a05119fe02b
Author: Daniel Vetter <firstname.lastname@example.org>
Date: Thu Nov 20 09:26:30 2014 +0100
drm/i915: Pin tiled objects for L-shaped configs
Note that this is v2, v1 was a bit WARNING-happy.
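Why pinning helps can be shown with a deliberately simplified model. On an L-shaped configuration only part of physical memory is dual-channel interleaved, so whether bit-6 swizzling applies depends on where a page physically lives. The Python sketch below assumes a made-up boundary (INTERLEAVED_LIMIT) below which the bit9/bit10/bit11 swizzle applies and above which none does; real hardware details differ. It writes a page as the GPU sees it, copies the raw bytes to another physical page (roughly what swap-out and swap-in do through the CPU), and reads the page back through the GPU's view:

```python
PAGE_SIZE = 4096
INTERLEAVED_LIMIT = 0x8000_0000  # hypothetical: swizzling applies only below 2 GiB


def swizzle(phys_addr: int) -> int:
    """Address the memory controller really touches for a GPU access (simplified)."""
    if phys_addr >= INTERLEAVED_LIMIT:
        return phys_addr  # upper, non-interleaved region: no swizzle
    bit = ((phys_addr >> 9) ^ (phys_addr >> 10) ^ (phys_addr >> 11)) & 1
    return phys_addr ^ (bit << 6)  # XOR bit 6 with bits 9, 10 and 11


def gpu_write(ram: dict, page: int, data: bytes) -> None:
    for off, byte in enumerate(data):
        ram[swizzle(page + off)] = byte


def gpu_read(ram: dict, page: int, length: int) -> bytes:
    return bytes(ram[swizzle(page + off)] for off in range(length))


def cpu_copy_page(ram: dict, src: int, dst: int) -> None:
    # Swapping moves raw bytes: the CPU path bypasses the swizzle entirely.
    for off in range(PAGE_SIZE):
        ram[dst + off] = ram[src + off]


pattern = bytes(i % 251 for i in range(PAGE_SIZE))
ram = {}
low, other_low, high = 0x1000_0000, 0x2000_0000, 0x9000_0000
gpu_write(ram, low, pattern)

cpu_copy_page(ram, low, other_low)  # page moved, but still in the swizzled region
print(gpu_read(ram, other_low, PAGE_SIZE) == pattern)  # True

cpu_copy_page(ram, low, high)       # page moved across the boundary
print(gpu_read(ram, high, PAGE_SIZE) == pattern)       # False: 64-byte chunks swapped
```

In this model a move inside the interleaved region is harmless because bits 9 to 11 lie within the 4 KiB page; only a move across the boundary corrupts the data, and keeping tiled objects pinned means their pages are never moved at all.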
(In reply to Daniel Vetter from comment #32)
> Workaround is now merged into drm-intel-nightly, should land in 3.19
> commit 14a369b6c9bdb40cebdac5a248321a05119fe02b
> Author: Daniel Vetter <email@example.com>
> Date: Thu Nov 20 09:26:30 2014 +0100
> drm/i915: Pin tiled objects for L-shaped configs
> Note that this is v2, v1 was a bit WARNING-happy.
Since it has been almost a month, I'm assuming the problem was fixed by the patch above. Please reopen if necessary.
I have this chip. Running version 2:2.99.916+git201412 of the Intel driver from xorg-edgers didn't help. I used the DebugWait trick from bug 37326, and it seemed to help.
Created attachment 111328
I have 3145990144 bytes of RAM, and I think this may also be bug 55000. But since that bug does not mention the DebugWait workaround, and the workaround works for me, I don't know what to think.
I was wrong: DebugWait didn't help. I have, of course, disabled DebugWait again.
It looks as if upgrading to kernel v3.19-rc1 did, though.
I got the kernel here: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.19-rc1-vivid/
However, the commit ID of the patch doesn't match the one Daniel Vetter gave above. I did find this one, though, with the same commit description: https://github.com/torvalds/linux/commit/656bfa3afc14e45e2d9e1624bf60d79b3beb12f2
In debugfs, I can see that the new code paths are getting hit:
# cat /sys/kernel/debug/dri/0/i915_swizzle_info
bit6 swizzle for X-tiling = bit9/bit10/bit11
bit6 swizzle for Y-tiling = bit9/bit11
DDC = 0x000f0002
DDC2 = 0x00000000
C0DRB3 = 0x0000
C1DRB3 = 0x0000
L-shaped memory detected
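The "bitN" lists in that debugfs output name the address bits folded into bit 6. A minimal Python sketch, assuming the usual reading of that notation (address bit 6 is XORed with each listed bit); it illustrates the notation and is not the driver's code:

```python
from typing import Sequence


def bit6_swizzle(addr: int, bits: Sequence[int]) -> int:
    """XOR address bit 6 with each listed address bit, e.g. (9, 10, 11)."""
    flip = 0
    for b in bits:
        flip ^= (addr >> b) & 1
    return addr ^ (flip << 6)


X_TILING = (9, 10, 11)  # "bit6 swizzle for X-tiling = bit9/bit10/bit11"
Y_TILING = (9, 11)      # "bit6 swizzle for Y-tiling = bit9/bit11"

for offset in (0x000, 0x200, 0x400, 0x600, 0xA00):
    print(hex(offset), hex(bit6_swizzle(offset, X_TILING)), hex(bit6_swizzle(offset, Y_TILING)))
```

All of these bits sit below bit 12, so on an evenly interleaved system the pattern depends only on the offset within a 4 KiB page; the extra complication on this machine is the final line, the L-shaped layout.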