Bug 43144 - Render corruption of moved window
Status: RESOLVED FIXED
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assigned To: Chris Wilson
QA Contact: Xorg Project Team
Duplicates: 43587
 
Reported: 2011-11-21 10:36 UTC by Lukas Hejtmanek
Modified: 2012-03-26 01:44 UTC (History)


Attachments
Screen corruption (6.22 KB, image/png)
2011-11-21 10:36 UTC, Lukas Hejtmanek
Xorg log (29.81 KB, text/plain)
2011-11-21 11:25 UTC, Lukas Hejtmanek
i915_error_state (1.99 MB, text/plain)
2011-12-01 06:43 UTC, Lukas Hejtmanek
i915_error_state #2 (1.99 MB, text/plain)
2011-12-02 03:25 UTC, Lukas Hejtmanek
i915_error_state_semaphores1 (2.05 MB, text/plain)
2011-12-08 11:22 UTC, Nick Hebrun

Description Lukas Hejtmanek 2011-11-21 10:36:41 UTC
Created attachment 53745 [details]
Screen corruption

Hello,

As of git commit 3b9479dc39d32fd97f80c1e5e0fac67d36ee5e40, I get window content corruption when I move a window from right to left :) (Moving in the opposite direction, or up/down, is not affected.)

See attached screenshot.
Comment 1 Chris Wilson 2011-11-21 11:19:11 UTC
Can you please describe your configuration (in particular WM and compositing mode) along with an Xorg.log?
Comment 2 Lukas Hejtmanek 2011-11-21 11:25:01 UTC
(In reply to comment #1)
> Can you please describe your configuration (in particular WM and compositing
> mode) along with an Xorg.log?

Xorg.log attached. WM is OpenBox 3.5.0, no compositing (or at least none intentionally).
The application is xfce4-terminal, but it happens with other applications as well, e.g. stardict and skype (so it is not GTK2/3 related; skype is Qt).
Comment 3 Lukas Hejtmanek 2011-11-21 11:25:46 UTC
Created attachment 53748 [details]
Xorg log
Comment 4 Lukas Hejtmanek 2011-12-01 02:32:32 UTC
(In reply to comment #3)
> Created attachment 53748 [details]
> Xorg log

Seems to be gone with the current git head.
Comment 5 Chris Wilson 2011-12-01 02:46:05 UTC
Still none the wiser, so keep an eye out for its reoccurrence. Thanks for the report and following-up.
Comment 6 Lukas Hejtmanek 2011-12-01 03:12:28 UTC
(In reply to comment #5)
> Still none the wiser, so keep an eye out for its reoccurrence. Thanks for the
> report and following-up.

I think this commit could be the fix:
sna: Avoid the double application of drawable offsets for tiled spans
Comment 7 Chris Wilson 2011-12-01 03:24:51 UTC
Could be... I hope not as that implies some half-evil code. Sounds like you have an interesting setup to analyze. ;-)
Comment 8 Lukas Hejtmanek 2011-12-01 06:05:40 UTC
(In reply to comment #7)
> Could be... I hope not as that implies some half-evil code. Sounds like you
> have an interesting setup to analyze. ;-)

Well, it's not. Everything seems to be OK until a GPU hang/reset. After the GPU reset, I get the corruption again.

Any ideas?
Comment 9 Lukas Hejtmanek 2011-12-01 06:22:17 UTC
this could be related:

[ 4111.642204] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[12383.273486] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[12383.273492] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[12383.283560] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 618518 at 618513, next 618519)
[12383.377526] [drm:ironlake_update_pch_refclk] *ERROR* enabling SSC on PCH
[12398.607001] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[12398.607015] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 618523 at 618513, next 618524)
[12398.694399] [drm:ironlake_update_pch_refclk] *ERROR* enabling SSC on PCH

and tons of the following messages repeated:

[12461.190594] ------------[ cut here ]------------
[12461.190599] WARNING: at drivers/gpu/drm/i915/i915_drv.c:372 gen6_gt_force_wake_put+0x46/0x50 [i915]()
[12461.190600] Hardware name: 4178A4G
[12461.190601] Modules linked in: i915 fbcon tileblit font bitblit softcursor drm_kms_helper drm fb fbdev i2c_algo_bit cfbcopyarea cfbimgblt cfbfillrect uvcvideo videodev v4l2_compat_ioctl32 bnep aesni_intel cryptd aes_x86_64 aes_generic ecb bluetooth thinkpad_acpi hwmon snd_hda_codec_conexant arc4 iwlagn mac80211 cfg80211 e1000e intel_agp snd_hda_intel intel_gtt snd_hda_codec ehci_hcd rfkill uinput [last unloaded: sunrpc]
[12461.190619] Pid: 0, comm: kworker/0:0 Tainted: G        W   3.1.0+ #154
[12461.190620] Call Trace:
[12461.190621]  <IRQ>  [<ffffffff8103995b>] ? warn_slowpath_common+0x7b/0xc0
[12461.190628]  [<ffffffffa0313686>] ? gen6_gt_force_wake_put+0x46/0x50 [i915]
[12461.190633]  [<ffffffffa031a494>] ? i915_handle_error+0x84/0xc20 [i915]
[12461.190637]  [<ffffffffa0313686>] ? gen6_gt_force_wake_put+0x46/0x50 [i915]
[12461.190642]  [<ffffffffa031bf53>] ? i915_hangcheck_elapsed+0x253/0x350 [i915]
[12461.190645]  [<ffffffff810461cb>] ? cascade+0x7b/0xa0
[12461.190650]  [<ffffffffa031bd00>] ? i915_vblank_swap+0x10/0x10 [i915]
[12461.190652]  [<ffffffff81046306>] ? run_timer_softirq+0x116/0x270
[12461.190655]  [<ffffffff8105f523>] ? ktime_get+0x63/0xf0
[12461.190657]  [<ffffffff8103f838>] ? __do_softirq+0x98/0x120
[12461.190659]  [<ffffffff814646ac>] ? call_softirq+0x1c/0x30
[12461.190662]  [<ffffffff810048fd>] ? do_softirq+0x4d/0x80
[12461.190664]  [<ffffffff8103fbee>] ? irq_exit+0x8e/0xd0
[12461.190667]  [<ffffffff8101c1e8>] ? smp_apic_timer_interrupt+0x68/0xa0
[12461.190669]  [<ffffffff81463c4b>] ? apic_timer_interrupt+0x6b/0x70
[12461.190670]  <EOI>  [<ffffffff81059f8a>] ? __hrtimer_start_range_ns+0x16a/0x3e0
[12461.190675]  [<ffffffff812077b2>] ? intel_idle+0xc2/0x110
[12461.190678]  [<ffffffff8120778e>] ? intel_idle+0x9e/0x110
[12461.190681]  [<ffffffff8132ae77>] ? cpuidle_idle_call+0x97/0xe0
[12461.190683]  [<ffffffff810011da>] ? cpu_idle+0xba/0x110
[12461.190686]  [<ffffffff814559e6>] ? start_secondary+0x1f5/0x1fb
[12461.190687] ---[ end trace 989737665136fa51 ]---
[12461.285621] [drm:ironlake_update_pch_refclk] *ERROR* enabling SSC on PCH
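
A minimal sketch, assuming Python is at hand, of how the hang messages in the log above could be pulled out and compared. The regular expressions are written against the exact wording of the lines quoted in this comment; reading the "awaiting" and "at" numbers as request sequence numbers is an assumption, and the helper name `summarize_hangs` is hypothetical. On Linux, errno 11 is EAGAIN.

```python
import re

# Patterns match the wording of the dmesg lines quoted in this comment.
HANG_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:i915_hangcheck_elapsed\].*GPU hung"
)
WAIT_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:i915_wait_request\].*"
    r"returns (?P<ret>-?\d+) "
    r"\(awaiting (?P<awaiting>\d+) at (?P<at>\d+), next (?P<next>\d+)\)"
)

SAMPLE_LOG = """\
[12383.273486] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[12383.283560] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 618518 at 618513, next 618519)
[12398.607001] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[12398.607015] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 618523 at 618513, next 618524)
"""


def summarize_hangs(lines):
    """Return (hang timestamps, wait records) found in an iterable of log lines."""
    hangs, waits = [], []
    for line in lines:
        m = HANG_RE.match(line)
        if m:
            hangs.append(float(m.group("ts")))
            continue
        m = WAIT_RE.match(line)
        if m:
            awaiting, at = int(m.group("awaiting")), int(m.group("at"))
            waits.append({
                "ts": float(m.group("ts")),
                "ret": int(m.group("ret")),
                "awaiting": awaiting,
                "at": at,
                # How far the awaited number is ahead of the "at" number.
                "behind": awaiting - at,
            })
    return hangs, waits


if __name__ == "__main__":
    hangs, waits = summarize_hangs(SAMPLE_LOG.splitlines())
    print(len(hangs), "hangs;", [w["behind"] for w in waits])
```

In the excerpt, the "at" value is 618513 for both hangs while the awaited value grows, which would be consistent with the GPU making no progress between the two hangcheck reports.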
Comment 10 Chris Wilson 2011-12-01 06:39:08 UTC
How soon after the hang do you see corruption? There will be some corruption inevitably as a result of lost data due to the hang.

I just want to establish whether we misrender in the acceleration or fallback code.
Comment 11 Chris Wilson 2011-12-01 06:40:13 UTC
And can you attach the /sys/kernel/debug/dri/0/i915_error_state for the hang?
Comment 12 Lukas Hejtmanek 2011-12-01 06:43:18 UTC
Created attachment 54013 [details]
i915_error_state
Comment 13 Lukas Hejtmanek 2011-12-01 06:44:52 UTC
(In reply to comment #10)
> How soon after the hang do you see corruption? There will be some corruption
> inevitably as a result of lost data due to the hang.

Any time after the hang, so it does not look like corruption caused during the hang.
Comment 14 Chris Wilson 2011-12-01 06:53:53 UTC
You don't happen to have FBC enabled do you? cat /sys/kernel/debug/dri/0/i915_fbc_status

If you do can you test without, i915.i915_enable_fbc=0?
Comment 15 Lukas Hejtmanek 2011-12-01 07:11:34 UTC
(In reply to comment #14)
> You don't happen to have FBC enabled do you? cat
> /sys/kernel/debug/dri/0/i915_fbc_status
> 
> If you do can you test without, i915.i915_enable_fbc=0?

I have FBC enabled. I will try to test without FBC. Is it possible to set the parameter on the fly?
Comment 16 Chris Wilson 2011-12-01 07:21:08 UTC
You can try echo 0 > /sys/module/i915/parameters/i915_enable_fbc and restarting X and then cat /sys/kernel/debug/dri/0/i915_fbc_status to confirm
Comment 17 Lukas Hejtmanek 2011-12-02 02:50:49 UTC
(In reply to comment #16)
> You can try echo 0 > /sys/module/i915/parameters/i915_enable_fbc and restarting
> X and then cat /sys/kernel/debug/dri/0/i915_fbc_status to confirm

Well, it still happens even with FBC disabled.

cat /sys/kernel/debug/dri/0/i915_fbc_status
FBC disabled: disabled per module param (default off)

BTW, the system seems to be more hang-prone if I run forcewaked (to prevent render issues).
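
A small sketch of the same check done from a script, assuming Python and read access to debugfs (usually root only). The two paths are the ones quoted in this thread; the helper names are hypothetical, and the only status string relied on is the "FBC disabled" prefix shown above.

```python
from pathlib import Path

# Paths as quoted in the comments above.
FBC_PARAM = "/sys/module/i915/parameters/i915_enable_fbc"
FBC_STATUS = "/sys/kernel/debug/dri/0/i915_fbc_status"


def read_text_or_none(path):
    """Return the stripped file contents, or None if the file cannot be read
    (missing parameter, debugfs not mounted, insufficient permissions)."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def fbc_disabled(status_text):
    """True only when the status text explicitly reports FBC as disabled."""
    return status_text is not None and status_text.startswith("FBC disabled")


if __name__ == "__main__":
    print("module parameter:", read_text_or_none(FBC_PARAM))
    status = read_text_or_none(FBC_STATUS)
    print("status:", status)
    print("confirmed disabled:", fbc_disabled(status))
```

A result of None for the parameter would suggest that the loaded module does not expose that option at all, which is a different situation from the option being set to 0.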
Comment 18 Chris Wilson 2011-12-02 02:56:08 UTC
Can you attach the error state so that I can be sure it is the same problem? rc6 issues have been related to VTd/iommu in the past, can you either disable VTd in the BIOS or pass intel_iommu=off
Comment 19 Lukas Hejtmanek 2011-12-02 03:25:06 UTC
(In reply to comment #18)
> Can you attach the error state so that I can be sure it is the same problem?
> rc6 issues have been related to VTd/iommu in the past, can you either disable
> VTd in the BIOS or pass intel_iommu=off

VTd is always disabled in the BIOS; I do not use it.

error state attached.
Comment 20 Lukas Hejtmanek 2011-12-02 03:25:57 UTC
Created attachment 54065 [details]
i915_error_state #2
Comment 21 Chris Wilson 2011-12-02 03:31:55 UTC
Ok, that does look to be consistent with the first. A nuisance, as I had seen a very similar error (along with performance issues) go away after disabling FBC. A further check is that I was suffering x11perf -dot performance of around 300Kdot/s with FBC enabled and 70Mdot/s without.
Comment 22 Lukas Hejtmanek 2011-12-02 03:53:21 UTC
(In reply to comment #21)
> Ok, that does look to be consistent with the first. A nuisance, as I had seen a
> very similar error (along with performance issues) go away after disabling FBC.
> A further check is that I was suffering x11perf -dot performance of around
> 300Kdot/s with FBC enabled and 70Mdot/s without.

I noticed a performance drop with my favorite glxgears: from 6000 fps to 3000 fps (with FBC and without FBC, respectively).

I got 180Mdot/s without FBC. Don't know how much with FBC.
Comment 23 Chris Wilson 2011-12-07 11:23:45 UTC
*** Bug 43587 has been marked as a duplicate of this bug. ***
Comment 24 Lukas Hejtmanek 2011-12-07 11:36:49 UTC
(In reply to comment #23)
> *** Bug 43587 has been marked as a duplicate of this bug. ***

Not sure if that's the same bug. The hang is unrelated to the window move itself; the corruption of the moved window just happens after any hang.
Comment 25 Chris Wilson 2011-12-07 13:02:36 UTC
(In reply to comment #24)
> (In reply to comment #23)
> > *** Bug 43587 has been marked as a duplicate of this bug. ***
> 
> not sure if that's the same bug. the hang is unrelated to window move itself,
> the corruption of moved window just happens after any hang..

Just go with me when I say the error states are the same, how you trigger it is up to you...
Comment 26 Nick Hebrun 2011-12-08 11:20:40 UTC
Thanks for your response in my bug report, Chris.

i915.i915_enable_fbc=0
----------------------
fbc is still enabled... maybe there is no option like this in my module (xf86-video-intel-2.17.0-r2)?

black ~ # grep . /sys/module/i915/parameters/*
/sys/module/i915/parameters/fbpercrtc:0
/sys/module/i915/parameters/i915_enable_rc6:0
/sys/module/i915/parameters/lvds_downclock:0
/sys/module/i915/parameters/lvds_use_ssc:1
/sys/module/i915/parameters/modeset:-1
/sys/module/i915/parameters/panel_ignore_lid:0
/sys/module/i915/parameters/powersave:1
/sys/module/i915/parameters/reset:Y
/sys/module/i915/parameters/semaphores:0
/sys/module/i915/parameters/vbt_sdvo_panel_type:-1


and just for test:
i915.semaphores=1
-----------------
drm:i915_hangcheck_elapsed after opening urxvt and running "less /var/log/messages"
i915_error_state is attached


> And we may as try with rc6 and semaphores disabled for completeness.
Do you mean rc6 AND semaphores both disabled, or rc6 enabled and semaphores disabled?
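
A minimal sketch, assuming Python, of checking the `grep .` listing above for a given option. It parses the `path:value` lines into a dictionary; the function name `parse_grep_params` is hypothetical, and the sample text is the listing from this comment.

```python
GREP_OUTPUT = """\
/sys/module/i915/parameters/fbpercrtc:0
/sys/module/i915/parameters/i915_enable_rc6:0
/sys/module/i915/parameters/lvds_downclock:0
/sys/module/i915/parameters/lvds_use_ssc:1
/sys/module/i915/parameters/modeset:-1
/sys/module/i915/parameters/panel_ignore_lid:0
/sys/module/i915/parameters/powersave:1
/sys/module/i915/parameters/reset:Y
/sys/module/i915/parameters/semaphores:0
/sys/module/i915/parameters/vbt_sdvo_panel_type:-1
"""


def parse_grep_params(output):
    """Map parameter name -> value from `grep . /sys/module/<mod>/parameters/*` output."""
    params = {}
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("/sys/module/"):
            continue  # skip prompts and blank lines
        # The path itself contains no ':', so split on the first one.
        path, _, value = line.partition(":")
        params[path.rsplit("/", 1)[-1]] = value
    return params


if __name__ == "__main__":
    params = parse_grep_params(GREP_OUTPUT)
    for name in ("i915_enable_fbc", "i915_enable_rc6", "semaphores"):
        print(name, "->", params.get(name, "not exposed by this module"))
```

For this listing, `i915_enable_fbc` is absent, which matches the suspicion that the loaded module has no such option.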
Comment 27 Nick Hebrun 2011-12-08 11:22:32 UTC
Created attachment 54249 [details]
i915_error_state_semaphores1
Comment 28 Chris Wilson 2012-03-26 01:44:54 UTC
I believe these are all related to the underlying bug:

commit c501ae7f332cdaf42e31af30b72b4b66cbbb1604
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Dec 14 13:57:23 2011 +0100

    drm/i915: Only clear the GPU domains upon a successful finish
    
    By clearing the GPU read domains before waiting upon the buffer, we run
    the risk of the wait being interrupted and the domains prematurely
    cleared. The next time we attempt to wait upon the buffer (after
    userspace handles the signal), we believe that the buffer is idle and so
    skip the wait.
    
    There are a number of bugs across all generations which show signs of an
    overly haste reuse of active buffers.
    
    Such as:
    
      https://bugs.freedesktop.org/show_bug.cgi?id=29046
      https://bugs.freedesktop.org/show_bug.cgi?id=35863
      https://bugs.freedesktop.org/show_bug.cgi?id=38952
      https://bugs.freedesktop.org/show_bug.cgi?id=40282
      https://bugs.freedesktop.org/show_bug.cgi?id=41098
      https://bugs.freedesktop.org/show_bug.cgi?id=41102
      https://bugs.freedesktop.org/show_bug.cgi?id=41284
      https://bugs.freedesktop.org/show_bug.cgi?id=42141
    
    A couple of those pre-date i915_gem_object_finish_gpu(), so may be
    unrelated (such as a wild write from a userspace command buffer), but
    this does look like a convincing cause for most of those bugs.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: stable@kernel.org
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
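
The ordering problem described in the commit message can be illustrated with a toy model. This is a sketch in Python, not the kernel code: the `Buffer` class, the `Interrupted` exception and all function names are hypothetical stand-ins, and the model only assumes what the message states, namely that the domains were cleared before an interruptible wait and that a later call skips the wait when the domains are already clear.

```python
class Interrupted(Exception):
    """Stands in for a wait cut short by a signal."""


class Buffer:
    def __init__(self):
        self.gpu_busy = True           # the GPU is still using the buffer
        self.gpu_read_domains = True   # bookkeeping: "the GPU may be reading this"


def wait_rendering(buf, interrupt):
    """Wait for the GPU; raise if the wait is interrupted."""
    if interrupt:
        raise Interrupted()
    buf.gpu_busy = False  # the wait ran to completion


def finish_gpu_buggy(buf, interrupt=False):
    """Old ordering: clear the domains first, then wait."""
    if not buf.gpu_read_domains:
        return  # believed idle, so the wait is skipped
    buf.gpu_read_domains = False
    wait_rendering(buf, interrupt)


def finish_gpu_fixed(buf, interrupt=False):
    """New ordering: clear the domains only after a successful wait."""
    if not buf.gpu_read_domains:
        return
    wait_rendering(buf, interrupt)
    buf.gpu_read_domains = False


def still_busy_after_retry(finish, buf):
    """First attempt is interrupted; the caller then retries.
    Returns True if the buffer is treated as idle while the GPU still uses it."""
    try:
        finish(buf, interrupt=True)
    except Interrupted:
        pass  # userspace handles the signal and tries again
    finish(buf, interrupt=False)
    return buf.gpu_busy


if __name__ == "__main__":
    print("buggy ordering reuses active buffer:",
          still_busy_after_retry(finish_gpu_buggy, Buffer()))
    print("fixed ordering reuses active buffer:",
          still_busy_after_retry(finish_gpu_fixed, Buffer()))
```

With the old ordering, the retry sees cleared domains and returns without waiting, so the buffer is handed back while still active; with the new ordering, the retry performs the wait.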