Created attachment 67409 [details]
Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07)
Fedora rawhide, KDE
After running OpenGL applications, I often notice severe 2D corruption (see attachments). The corruptions are not "static", but change every time a redraw is triggered by some action.
Despite many attempts, I have not managed to find a way to reliably and immediately reproduce the issue.
- Definitely seems to be a consequence of running 3D applications: when running only 2D applications, I saw no graphics glitches for two weeks in a row (without rebooting)
- Cannot reproduce when switching to the discrete graphics (ATI)
- Disabling compositing does not help (actually, makes it even worse, since compositing triggers more redraws)
- First noticed with the xorg-x11-drv-intel-2.20.x series
- Affects Gtk applications in particular. Qt ones are way less affected.
- Usually, some time after the 3D application has quit, the system is able to recover (no more glitches)
- Sample OpenGL application which triggers the issue
Created attachment 67410 [details]
Created attachment 67411 [details]
Created attachment 67412 [details]
Note: the diagonal pattern is a glitch, not the actual image!
Sample application: http://n.ethz.ch/~smani/download/SampleApp.tar.gz
Can you please confirm that the Xorg.log is from a session after you start seeing the corruption?
Yes, I confirm that.
Dave Airlie has been chasing a similar-ish bug involving glyph corruption on gm45/ilk, with a potential bisect in 3.5-rc1. His bisections suggest that pwrite is involved, and I've been trying to reproduce this by stressing those paths, in particular the unmappable region of the GTT.
If you get the chance, can you please grab /sys/kernel/debug/dri/0/i915_gem_objects at the time you see corruption? If you can also run 'trace-cmd record -e i915' at that time and attach the output of 'trace-cmd report' that would be very informative.
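For anyone collecting the same data, a minimal Python sketch of the two requests above follows. It assumes debugfs is mounted and readable (normally as root) and that `trace-cmd` is installed. The helper names `snapshot` and `trace_commands`, the output directory, and the `trace.dat` file name are illustrative choices, not anything from this report.

```python
import time
from pathlib import Path

# Path quoted in the comment above; reading it normally needs root and a mounted debugfs.
GEM_OBJECTS = Path("/sys/kernel/debug/dri/0/i915_gem_objects")


def snapshot(src: Path, dest_dir: Path) -> Path:
    """Copy the text of src into dest_dir under a timestamped name (hypothetical helper)."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"{src.name}.{time.strftime('%Y%m%d-%H%M%S')}"
    # debugfs files report a size of 0, so read the contents instead of copying by metadata.
    dest.write_text(src.read_text())
    return dest


def trace_commands(trace_file: str = "trace.dat") -> list:
    """Return the two trace-cmd invocations as argument lists: record i915 events, then report."""
    return [
        ["trace-cmd", "record", "-e", "i915", "-o", trace_file],
        ["trace-cmd", "report", "-i", trace_file],
    ]
```

Calling `snapshot(GEM_OBJECTS, Path("i915-debug"))` while the corruption is visible saves a copy to attach. The argument lists can be passed to `subprocess.run`; the record step keeps running until interrupted with Ctrl-C.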
3.5-rc1 may well be when issues started here too!
I've managed to reproduce the issue as follows:
- start alienarena
- put graphics settings to highest
- start a game, but press Esc to return to the menu and unlock the mouse from the window (the game still runs "behind" the menu though)
- click around in normal 2D apps (e.g. just browse the web in Firefox)
(It took me about 10 minutes to reproduce)
- report_3d (trace-cmd report with alienarena running, with glitches observable)
- report_2d (trace-cmd report after running alienarena, with glitches observable)
Find here: http://n.ethz.ch/~smani/download/files.tar.xz
Created attachment 67912 [details] [review]
disable cpu relocs
Can you please test this quick debug hack?
So far surviving my stress tests... (i.e. no glitches)
Can you also please test drm-intel-next-queued from http://cgit.freedesktop.org/~danvet/drm-intel as the use of cpu relocations and flushing is further modified in -next?
Created attachment 68011 [details]
Kernel backtrace (unrelated)
Looking good so far, though will do some OpenGL development tomorrow to further test.
Unrelated: I got the attached kernel backtrace (Fedora's automatic bug reporting tool notified me just after login).
(In reply to comment #12)
> Looking good so far, though will do some OpenGL development tomorrow to
> further test.
Just to clarify: Does it look good so far with my patch applied, or when running drm-intel-next?
I tested one day with your patch applied, and am now testing with drm-next. In both cases, I didn't encounter the issue (yet?).
Just a status update: so far I haven't encountered the issue again (kernel 3.6 + drm-next). I'll keep testing for another week, if things keep working, then I guess this issue can be marked as solved.
(In reply to comment #15)
> Just a status update: so far I haven't encountered the issue again (kernel
> 3.6 + drm-next). I'll keep testing for another week, if things keep working,
> then I guess this issue can be marked as solved.
The tricky part is then working out the minimal fix for 3.5/3.6. I think something like http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=for-airlied&id=9c1188292c9da53bbf29799ed5f682029e8c9583 should work around the issue and be backportable. I just have no idea what the actual underlying bug along that path is.
Created attachment 68296 [details] [review]
Invalidate all state caches before a patch
To test a theory I have that we miss a GPU flush after a CPU reloc, please could you test this patch on top of 3.5/3.6?
Created attachment 68297 [details] [review]
Invalidate all state caches before a batch
Created attachment 68364 [details]
Patch for 3.6.1
The patch does not apply to 3.6.1. I'd change it as attached; is this correct?
In the end,
Author: Chris Wilson <firstname.lastname@example.org>
Date: Thu Aug 23 13:12:52 2012 +0100
drm/i915: Use cpu relocations if the object is in the GTT but not mappable
This prevents the case of unbinding the object in order to process the
relocations through the GTT and then rebinding it only to then proceed
to use cpu relocations as the object is now in the CPU write domain. By
choosing to use cpu relocations up front, we can therefore avoid the
Signed-off-by: Chris Wilson <email@example.com>
Signed-off-by: Daniel Vetter <firstname.lastname@example.org>
was chosen as the patch to be sent forthwith to stable@.
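
To illustrate the decision that the commit message describes, here is a toy Python model. It is only a sketch of the stated idea, not the kernel code: the `GemObject` class, its field names, and the `use_cpu_reloc` helper are hypothetical stand-ins chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class GemObject:
    """Toy stand-in for a GEM buffer object; all field names are hypothetical."""
    in_cpu_write_domain: bool  # the CPU was the last writer
    bound_in_gtt: bool         # the object currently has a GTT binding
    mappable: bool             # the binding lies in the CPU-mappable part of the GTT


def use_cpu_reloc(obj: GemObject) -> bool:
    """Decide up front whether relocations should be written through the CPU."""
    if obj.in_cpu_write_domain:
        return True
    # Case described by the commit: bound but not mappable, so relocating through
    # the GTT would first require unbinding and rebinding the object.
    return obj.bound_in_gtt and not obj.mappable
```

In this model, an object that is bound in the unmappable part of the GTT selects the CPU path immediately, which is the unbind and rebind round trip the commit message says it avoids.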