43144 – Render corruption of moved window

Status:	RESOLVED FIXED

Alias:	None

Product:	xorg
Classification:	Unclassified
Component:	Driver/intel
Version:	git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	Chris Wilson
QA Contact:	Xorg Project Team

URL:
Whiteboard:
Keywords:

Duplicates (1):	43587
Depends on:
Blocks:

Reported:	2011-11-21 10:36 UTC by Lukas Hejtmanek
Modified:	2012-03-26 01:44 UTC
CC List:	1 user

See Also:
i915 platform:
i915 features:

Attachments
Screen corruption (6.22 KB, image/png) 2011-11-21 10:36 UTC, Lukas Hejtmanek
Xorg log (29.81 KB, text/plain) 2011-11-21 11:25 UTC, Lukas Hejtmanek
i915_error_state (1.99 MB, text/plain) 2011-12-01 06:43 UTC, Lukas Hejtmanek
i915_error_state #2 (1.99 MB, text/plain) 2011-12-02 03:25 UTC, Lukas Hejtmanek
i915_error_state_semaphores1 (2.05 MB, text/plain) 2011-12-08 11:22 UTC, delete

Description Lukas Hejtmanek 2011-11-21 10:36:41 UTC

Created attachment 53745 [details]
Screen corruption

Hello,

As of git commit 3b9479dc39d32fd97f80c1e5e0fac67d36ee5e40, I get window content corruption when I move a window from right to left :) (The opposite direction is not affected, nor is moving up/down.)

See attached screenshot.

Comment 1 Chris Wilson 2011-11-21 11:19:11 UTC

Can you please describe your configuration (in particular WM and compositing mode) along with an Xorg.log?

Comment 2 Lukas Hejtmanek 2011-11-21 11:25:01 UTC

(In reply to comment #1)
> Can you please describe your configuration (in particular WM and compositing
> mode) along with an Xorg.log?

Xorg.log attached. The WM is OpenBox 3.5.0, with no compositing (or at least none intentional).
The application is xfce4-terminal, but it happens with other applications as well, e.g., stardict and skype (so it is not gtk2/3 related; skype is Qt).

Comment 3 Lukas Hejtmanek 2011-11-21 11:25:46 UTC

Created attachment 53748 [details]
Xorg log

Comment 4 Lukas Hejtmanek 2011-12-01 02:32:32 UTC

(In reply to comment #3)
> Created attachment 53748 [details]
> Xorg log

Seems to be gone with the current git head.

Comment 5 Chris Wilson 2011-12-01 02:46:05 UTC

Still none the wiser, so keep an eye out for its reoccurrence. Thanks for the report and following-up.

Comment 6 Lukas Hejtmanek 2011-12-01 03:12:28 UTC

(In reply to comment #5)
> Still none the wiser, so keep an eye out for its reoccurrence. Thanks for the
> report and following-up.

I think this one could be the fix.
sna: Avoid the double application of drawable offsets for tiled spans

Comment 7 Chris Wilson 2011-12-01 03:24:51 UTC

Could be... I hope not as that implies some half-evil code. Sounds like you have an interesting setup to analyze. ;-)

Comment 8 Lukas Hejtmanek 2011-12-01 06:05:40 UTC

(In reply to comment #7)
> Could be... I hope not as that implies some half-evil code. Sounds like you
> have an interesting setup to analyze. ;-)

Well, it's not. Everything seems to be OK until a GPU hang/reset. After the GPU reset, I get the corruption again.

Any ideas?

Comment 9 Lukas Hejtmanek 2011-12-01 06:22:17 UTC

this could be related:

[ 4111.642204] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[12383.273486] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[12383.273492] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[12383.283560] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 618518 at 618513, next 618519)
[12383.377526] [drm:ironlake_update_pch_refclk] *ERROR* enabling SSC on PCH
[12398.607001] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[12398.607015] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 618523 at 618513, next 618524)
[12398.694399] [drm:ironlake_update_pch_refclk] *ERROR* enabling SSC on PCH

and tons of the following messages repeated:

[12461.190594] ------------[ cut here ]------------
[12461.190599] WARNING: at drivers/gpu/drm/i915/i915_drv.c:372 gen6_gt_force_wake_put+0x46/0x50 [i915]()
[12461.190600] Hardware name: 4178A4G
[12461.190601] Modules linked in: i915 fbcon tileblit font bitblit softcursor drm_kms_helper drm fb fbdev i2c_algo_bit cfbcopyarea cfbimgblt cfbfillrect uvcvideo videodev v4l2_compat_ioctl32 bnep aesni_intel cryptd aes_x86_64 aes_generic ecb bluetooth thinkpad_acpi hwmon snd_hda_codec_conexant arc4 iwlagn mac80211 cfg80211 e1000e intel_agp snd_hda_intel intel_gtt snd_hda_codec ehci_hcd rfkill uinput [last unloaded: sunrpc]
[12461.190619] Pid: 0, comm: kworker/0:0 Tainted: G        W   3.1.0+ #154
[12461.190620] Call Trace:
[12461.190621]  <IRQ>  [<ffffffff8103995b>] ? warn_slowpath_common+0x7b/0xc0
[12461.190628]  [<ffffffffa0313686>] ? gen6_gt_force_wake_put+0x46/0x50 [i915]
[12461.190633]  [<ffffffffa031a494>] ? i915_handle_error+0x84/0xc20 [i915]
[12461.190637]  [<ffffffffa0313686>] ? gen6_gt_force_wake_put+0x46/0x50 [i915]
[12461.190642]  [<ffffffffa031bf53>] ? i915_hangcheck_elapsed+0x253/0x350 [i915]
[12461.190645]  [<ffffffff810461cb>] ? cascade+0x7b/0xa0
[12461.190650]  [<ffffffffa031bd00>] ? i915_vblank_swap+0x10/0x10 [i915]
[12461.190652]  [<ffffffff81046306>] ? run_timer_softirq+0x116/0x270
[12461.190655]  [<ffffffff8105f523>] ? ktime_get+0x63/0xf0
[12461.190657]  [<ffffffff8103f838>] ? __do_softirq+0x98/0x120
[12461.190659]  [<ffffffff814646ac>] ? call_softirq+0x1c/0x30
[12461.190662]  [<ffffffff810048fd>] ? do_softirq+0x4d/0x80
[12461.190664]  [<ffffffff8103fbee>] ? irq_exit+0x8e/0xd0
[12461.190667]  [<ffffffff8101c1e8>] ? smp_apic_timer_interrupt+0x68/0xa0
[12461.190669]  [<ffffffff81463c4b>] ? apic_timer_interrupt+0x6b/0x70
[12461.190670]  <EOI>  [<ffffffff81059f8a>] ? __hrtimer_start_range_ns+0x16a/0x3e0
[12461.190675]  [<ffffffff812077b2>] ? intel_idle+0xc2/0x110
[12461.190678]  [<ffffffff8120778e>] ? intel_idle+0x9e/0x110
[12461.190681]  [<ffffffff8132ae77>] ? cpuidle_idle_call+0x97/0xe0
[12461.190683]  [<ffffffff810011da>] ? cpu_idle+0xba/0x110
[12461.190686]  [<ffffffff814559e6>] ? start_secondary+0x1f5/0x1fb
[12461.190687] ---[ end trace 989737665136fa51 ]---
[12461.285621] [drm:ironlake_update_pch_refclk] *ERROR* enabling SSC on PCH
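
The hangcheck and wait-request lines above have a regular shape, so they can be pulled out of a kernel log mechanically. The following is a minimal Python sketch, assuming the exact message wording quoted in this comment (other kernel versions may format these lines differently). The function name `summarize_hangs` and the dictionary keys are illustrative and not part of any i915 tooling.

```python
import re

# Patterns follow the message wording quoted in this report only.
HANG_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\] \[drm:i915_hangcheck_elapsed\] \*ERROR\* .*GPU hung"
)
WAIT_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\] \[drm:i915_wait_request\] \*ERROR\* "
    r"i915_wait_request returns (-?\d+) "
    r"\(awaiting (\d+) at (\d+), next (\d+)\)"
)


def summarize_hangs(log_text):
    """Return (hang_timestamps, wait_failures) found in a dmesg excerpt."""
    hangs, waits = [], []
    for line in log_text.splitlines():
        line = line.strip()
        hang = HANG_RE.match(line)
        if hang:
            hangs.append(float(hang.group(1)))
            continue
        wait = WAIT_RE.match(line)
        if wait:
            ts, ret, awaiting, current, nxt = wait.groups()
            waits.append({
                "time": float(ts),
                "returned": int(ret),
                "awaiting": int(awaiting),
                "current": int(current),
                "next": int(nxt),
                # Distance between the awaited value and the "at" value.
                "behind": int(awaiting) - int(current),
            })
    return hangs, waits


# Lines copied from the log excerpt in this comment.
SAMPLE = """\
[12383.273486] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[12383.283560] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 618518 at 618513, next 618519)
[12398.607001] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[12398.607015] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 618523 at 618513, next 618524)
"""

if __name__ == "__main__":
    hangs, waits = summarize_hangs(SAMPLE)
    print(len(hangs), "hangs;", [w["behind"] for w in waits])
```

In the excerpt, the "at" value is 618513 in both wait failures while the awaited value moves from 618518 to 618523, which is the kind of detail such a summary makes easy to spot.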

Comment 10 Chris Wilson 2011-12-01 06:39:08 UTC

How soon after the hang do you see corruption? There will be some corruption inevitably as a result of lost data due to the hang.

I just want to establish whether we misrender in the acceleration or fallback code.

Comment 11 Chris Wilson 2011-12-01 06:40:13 UTC

And can you attach the /sys/kernel/debug/dri/0/i915_error_state for the hang?

Comment 12 Lukas Hejtmanek 2011-12-01 06:43:18 UTC

Created attachment 54013 [details]
i915_error_state

Comment 13 Lukas Hejtmanek 2011-12-01 06:44:52 UTC

(In reply to comment #10)
> How soon after the hang do you see corruption? There will be some corruption
> inevitably as a result of lost data due to the hang.

Any time after the hang, so it does not look like corruption caused during the hang itself.

Comment 14 Chris Wilson 2011-12-01 06:53:53 UTC

You don't happen to have FBC enabled do you? cat /sys/kernel/debug/dri/0/i915_fbc_status

If you do can you test without, i915.i915_enable_fbc=0?

Comment 15 Lukas Hejtmanek 2011-12-01 07:11:34 UTC

(In reply to comment #14)
> You don't happen to have FBC enabled do you? cat
> /sys/kernel/debug/dri/0/i915_fbc_status
> 
> If you do can you test without, i915.i915_enable_fbc=0?

I have FBC enabled. I will try testing without FBC. Is it possible to set the parameter on the fly?

Comment 16 Chris Wilson 2011-12-01 07:21:08 UTC

You can try echo 0 > /sys/module/i915/parameters/i915_enable_fbc, then restart X, and then cat /sys/kernel/debug/dri/0/i915_fbc_status to confirm.
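
The same two steps (write 0 to the module parameter, then read back the FBC status) can be scripted. This is a hedged Python sketch rather than an official tool: the two paths are the ones named in this comment, the helper names are made up for illustration, and the check only assumes that a disabled state is reported with a line starting with "FBC disabled", as in the output posted later in this thread. Writing the parameter needs root, and X still has to be restarted between the two calls.

```python
from pathlib import Path

# Paths as given in this comment; both helpers accept overrides.
FBC_PARAM = Path("/sys/module/i915/parameters/i915_enable_fbc")
FBC_STATUS = Path("/sys/kernel/debug/dri/0/i915_fbc_status")


def request_fbc_off(param_path=FBC_PARAM):
    """Write 0 to the module parameter; return False if it does not exist."""
    param_path = Path(param_path)
    if not param_path.exists():
        return False  # the parameter is not exposed on this system
    param_path.write_text("0\n")
    return True


def fbc_reported_disabled(status_path=FBC_STATUS):
    """True if the first status line starts with 'FBC disabled'."""
    lines = Path(status_path).read_text().splitlines()
    return bool(lines) and lines[0].startswith("FBC disabled")
```

Returning `False` for a missing parameter file is a deliberate choice: it distinguishes "the option is not available here" from "the option was set".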

Comment 17 Lukas Hejtmanek 2011-12-02 02:50:49 UTC

(In reply to comment #16)
> You can try echo 0 > /sys/module/i915/parameters/i915_enable_fbc and restarting
> X and then cat /sys/kernel/debug/dri/0/i915_fbc_status to confirm

Well, it still happens even with FBC disabled.

cat /sys/kernel/debug/dri/0/i915_fbc_status
FBC disabled: disabled per module param (default off)

By the way, the system seems to be more hang-prone if I run forcewaked (to prevent render issues).

Comment 18 Chris Wilson 2011-12-02 02:56:08 UTC

Can you attach the error state so that I can be sure it is the same problem? rc6 issues have been related to VTd/iommu in the past; can you either disable VTd in the BIOS or pass intel_iommu=off?

Comment 19 Lukas Hejtmanek 2011-12-02 03:25:06 UTC

(In reply to comment #18)
> Can you attach the error state so that I can be sure it is the same problem?
> rc6 issues have been related to VTd/iommu in the past, can you either disable
> VTd in the BIOS or pass intel_iommu=off

VTd is always disabled in the BIOS; I do not use it.

Error state attached.

Comment 20 Lukas Hejtmanek 2011-12-02 03:25:57 UTC

Created attachment 54065 [details]
i915_error_state #2

Comment 21 Chris Wilson 2011-12-02 03:31:55 UTC

Ok, that does look to be consistent with the first. A nuisance, as I had seen a very similar error (along with performance issues) go away after disabling FBC. A further check is that I was suffering x11perf -dot performance of around 300Kdot/s with FBC enabled and 70Mdot/s without.

Comment 22 Lukas Hejtmanek 2011-12-02 03:53:21 UTC

(In reply to comment #21)
> Ok, that does look to be consistent with the first. A nuisance, as I had seen a
> very similar error (along with performance issues) go away after disabling FBC.
> A further check is that I was suffering x11perf -dot performance of around
> 300Kdot/s with FBC enabled and 70Mdot/s without.

I noticed a performance drop with my favorite glxgears: from 6000fps to 3000fps (with FBC and without FBC, respectively).

I got 180Mdot/s without FBC. I don't know how much with FBC.

Comment 23 Chris Wilson 2011-12-07 11:23:45 UTC

*** Bug 43587 has been marked as a duplicate of this bug. ***

Comment 24 Lukas Hejtmanek 2011-12-07 11:36:49 UTC

(In reply to comment #23)
> *** Bug 43587 has been marked as a duplicate of this bug. ***

I am not sure that's the same bug. The hang is unrelated to the window move itself; the corruption of a moved window just happens after any hang.

Comment 25 Chris Wilson 2011-12-07 13:02:36 UTC

(In reply to comment #24)
> (In reply to comment #23)
> > *** Bug 43587 has been marked as a duplicate of this bug. ***
> 
> not sure if that's the same bug. the hang is unrelated to window move itself,
> the corruption of moved window just happens after any hang..

Just go with me when I say the error states are the same, how you trigger it is up to you...

Comment 26 delete 2011-12-08 11:20:40 UTC

Thanks for your response to my bug report, Chris.

i915.i915_enable_fbc=0
----------------------
FBC is still enabled... maybe there is no option like this in my module (xf86-video-intel-2.17.0-r2)?

black ~ # grep . /sys/module/i915/parameters/*
/sys/module/i915/parameters/fbpercrtc:0
/sys/module/i915/parameters/i915_enable_rc6:0
/sys/module/i915/parameters/lvds_downclock:0
/sys/module/i915/parameters/lvds_use_ssc:1
/sys/module/i915/parameters/modeset:-1
/sys/module/i915/parameters/panel_ignore_lid:0
/sys/module/i915/parameters/powersave:1
/sys/module/i915/parameters/reset:Y
/sys/module/i915/parameters/semaphores:0
/sys/module/i915/parameters/vbt_sdvo_panel_type:-1


And just as a test:
i915.semaphores=1
-----------------
drm:i915_hangcheck_elapsed occurred after opening urxvt and running "less /var/log/messages".
The i915_error_state is attached.


> And we may as try with rc6 and semaphores disabled for completeness.
Do you mean rc6 AND semaphores disabled, or rc6 enabled and semaphores disabled?
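
A quick way to check whether a given option exists is to parse the `grep . /sys/module/i915/parameters/*` listing above. The sketch below is illustrative Python (the function name is made up); it only assumes the `path:value` line shape shown in this comment. Note that the files under /sys/module/i915/parameters belong to the kernel's i915 module, not to the xf86-video-intel X driver, so the listing suggests that the running kernel simply does not expose i915_enable_fbc.

```python
def parse_module_params(grep_output):
    """Map parameter name -> value from `grep . <dir>/*` style output."""
    params = {}
    for line in grep_output.splitlines():
        # Each line looks like /sys/module/i915/parameters/<name>:<value>
        path, sep, value = line.strip().partition(":")
        if not sep or not path:
            continue
        params[path.rsplit("/", 1)[-1]] = value
    return params


# Listing copied from this comment.
LISTING = """\
/sys/module/i915/parameters/fbpercrtc:0
/sys/module/i915/parameters/i915_enable_rc6:0
/sys/module/i915/parameters/lvds_downclock:0
/sys/module/i915/parameters/lvds_use_ssc:1
/sys/module/i915/parameters/modeset:-1
/sys/module/i915/parameters/panel_ignore_lid:0
/sys/module/i915/parameters/powersave:1
/sys/module/i915/parameters/reset:Y
/sys/module/i915/parameters/semaphores:0
/sys/module/i915/parameters/vbt_sdvo_panel_type:-1
"""

if __name__ == "__main__":
    params = parse_module_params(LISTING)
    print("i915_enable_fbc" in params, params.get("i915_enable_rc6"))
```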

Comment 27 delete 2011-12-08 11:22:32 UTC

Created attachment 54249 [details]
i915_error_state_semaphores1

Comment 28 Chris Wilson 2012-03-26 01:44:54 UTC

I believe these are all related to the underlying bug:

commit c501ae7f332cdaf42e31af30b72b4b66cbbb1604
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Dec 14 13:57:23 2011 +0100

    drm/i915: Only clear the GPU domains upon a successful finish
    
    By clearing the GPU read domains before waiting upon the buffer, we run
    the risk of the wait being interrupted and the domains prematurely
    cleared. The next time we attempt to wait upon the buffer (after
    userspace handles the signal), we believe that the buffer is idle and so
    skip the wait.
    
    There are a number of bugs across all generations which show signs of an
    overly haste reuse of active buffers.
    
    Such as:
    
      https://bugs.freedesktop.org/show_bug.cgi?id=29046
      https://bugs.freedesktop.org/show_bug.cgi?id=35863
      https://bugs.freedesktop.org/show_bug.cgi?id=38952
      https://bugs.freedesktop.org/show_bug.cgi?id=40282
      https://bugs.freedesktop.org/show_bug.cgi?id=41098
      https://bugs.freedesktop.org/show_bug.cgi?id=41102
      https://bugs.freedesktop.org/show_bug.cgi?id=41284
      https://bugs.freedesktop.org/show_bug.cgi?id=42141
    
    A couple of those pre-date i915_gem_object_finish_gpu(), so may be
    unrelated (such as a wild write from a userspace command buffer), but
    this does look like a convincing cause for most of those bugs.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: stable@kernel.org
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
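
The ordering problem described in the commit message can be illustrated with a toy model. This is a Python sketch under stated assumptions, not the kernel code: the class, the field names and both functions are invented for illustration, and the "skip the wait when the read domains are clear" rule is taken only from the commit text above.

```python
class WaitInterrupted(Exception):
    """Stands in for a wait cut short by a signal."""


class Buffer:
    def __init__(self):
        self.gpu_read_domains = 0x1  # nonzero: the GPU may still read it
        self.gpu_active = True       # rendering has not finished yet


def wait_for_rendering(buf, signal_pending):
    if signal_pending:
        raise WaitInterrupted()
    buf.gpu_active = False  # the wait completed, so the GPU is done


def finish_gpu_early_clear(buf, signal_pending=False):
    """Ordering described as problematic: clear the domains, then wait."""
    if buf.gpu_read_domains == 0:
        return  # buffer believed idle, so the wait is skipped
    buf.gpu_read_domains = 0
    wait_for_rendering(buf, signal_pending)


def finish_gpu_clear_on_success(buf, signal_pending=False):
    """Ordering named in the commit subject: wait, then clear."""
    if buf.gpu_read_domains == 0:
        return
    wait_for_rendering(buf, signal_pending)
    buf.gpu_read_domains = 0


def still_active_after_retry(finish):
    """Interrupt the first attempt, retry, and report whether the GPU is still busy."""
    buf = Buffer()
    try:
        finish(buf, signal_pending=True)
    except WaitInterrupted:
        pass  # userspace handles the signal and calls again
    finish(buf, signal_pending=False)
    return buf.gpu_active


if __name__ == "__main__":
    print(still_active_after_retry(finish_gpu_early_clear))
    print(still_active_after_retry(finish_gpu_clear_on_success))
```

In this model the early-clear variant returns from the retry while the buffer is still active, which corresponds to the premature reuse of active buffers that the commit message describes; the clear-on-success variant waits on the retry.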
