Bug 80560 - 62102f505cd13840e4 causes icon corruption in thunar
Summary: 62102f505cd13840e4 causes icon corruption in thunar
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel (show other bugs)
Version: git
Hardware: x86-64 (AMD64) Linux (All)
: high blocker
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL: http://cgit.freedesktop.org/xorg/driv...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-06-26 14:08 UTC by Harald Judt
Modified: 2014-06-29 13:05 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments
courrpted-icon.png (4.66 KB, image/png)
2014-06-26 14:08 UTC, Harald Judt
no flags Details
original-icon.png (4.12 KB, image/png)
2014-06-26 14:09 UTC, Harald Judt
no flags Details
Xorg.log (24.56 KB, text/plain)
2014-06-26 15:08 UTC, Harald Judt
no flags Details
other-corruption.png (6.91 KB, image/png)
2014-06-26 16:21 UTC, Harald Judt
no flags Details

Description Harald Judt 2014-06-26 14:08:26 UTC
Created attachment 101806 [details]
courrpted-icon.png

git bisect start
# bad: [c5c7dd24a55f04322d5eec10dc4352d8a8e92b1e] sna: Remove scare quotes from hotplug detection "enabled"
git bisect bad c5c7dd24a55f04322d5eec10dc4352d8a8e92b1e
# good: [cb7b27a705b477ae1b369786eea13fb14506d54a] 2.99.912 snapshot
git bisect good cb7b27a705b477ae1b369786eea13fb14506d54a
# bad: [e52f917a2370026073ed6851161393a69ee94abe] sna: Use nxm tiled blits for small regions of large tiles
git bisect bad e52f917a2370026073ed6851161393a69ee94abe
# good: [cd381ad96eadb72eb1a983c973a5dd47f547fc8b] xvmc: Refactor mutex locking
git bisect good cd381ad96eadb72eb1a983c973a5dd47f547fc8b
# bad: [62102f505cd13840e4c910adbe762b3fb46dfaec] sna: Promote better active buffer reuse
git bisect bad 62102f505cd13840e4c910adbe762b3fb46dfaec
# good: [05fdfe3b2813495ea122471ed6dde340e538aa65] intel-virtual-output: Compile fixes
git bisect good 05fdfe3b2813495ea122471ed6dde340e538aa65
# good: [0a6a94630f4fec26bed2aefb25de1a165193ff10] test: Add a simple SHM test skeleton
git bisect good 0a6a94630f4fec26bed2aefb25de1a165193ff10
# good: [243bd26ad31776b2dc45563107e669cf7b45fd91] sna: Clear our private hints about front rendering exported bo
git bisect good 243bd26ad31776b2dc45563107e669cf7b45fd91
# good: [2877d018d3510e415f273ce955eb5c272c340e7d] sna/glyphs: Show number of glyphs in DBG
git bisect good 2877d018d3510e415f273ce955eb5c272c340e7d
# first bad commit: [62102f505cd13840e4c910adbe762b3fb46dfaec] sna: Promote better active buffer reuse


lspci -vvv:
00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
	Subsystem: Lenovo Device 220c
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 56
	Region 0: Memory at f0000000 (64-bit, non-prefetchable) [size=4M]
	Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Region 4: I/O ports at 3000 [size=64]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee0f00c  Data: 4171
	Capabilities: [d0] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [a4] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-
	Kernel driver in use: i915

This is on a Lenovo Thinkpad T440s running gentoo.
linux-3.6.11
xorg-server-1.15.99.903
mesa-git
x11-libs/cairo-1.12.16-r3

I have not noticed these issue in other applications, it could be related to cairo, used by thunar. It is not reproducible with xf86-video-ati, though.
Comment 1 Harald Judt 2014-06-26 14:09:00 UTC
Created attachment 101807 [details]
original-icon.png

This is how it should look like.
Comment 2 Chris Wilson 2014-06-26 14:54:24 UTC
Any tricks required to reproduce?
Comment 3 Harald Judt 2014-06-26 14:56:55 UTC
Not that I know of:
    Option "AccelMethod" "sna"
    Option "PageFlip" "true"
    Option "TripleBuffer" "true"
    Option "TearFree" "true"
    Option "VSync" "true"
Comment 4 Harald Judt 2014-06-26 15:08:55 UTC
Created attachment 101809 [details]
Xorg.log
Comment 5 Chris Wilson 2014-06-26 15:39:20 UTC
It is definitely the right icon, but distorted? Do you see sometimes other junk in the icons (even just incorrect highlight/focus state)?
Comment 6 Harald Judt 2014-06-26 16:20:21 UTC
Yes.

1) It is 100% reproducible when highlighting/selecting and not reverting the commit.

2) It also happens occasionally with the commit reverted, but is not so easily reproducible. Also, the corruptions are not the same; About 3/4 of the icon are corrupted with a garbled black mess. I'll attach another image for comparison if I can make a screenshot. The corruptions will disappear on a key press or mouse click, or when switching to another window.

Maybe 2) is the real cause and the commit makes it only worse?
I will investigate in 2) on the machine with the radeon card to see if it happens there too...
Comment 7 Harald Judt 2014-06-26 16:21:34 UTC
Created attachment 101815 [details]
other-corruption.png
Comment 8 Harald Judt 2014-06-26 16:24:02 UTC
Ok, a bit different than what I described. The lower part of the icon is still visible. In other cases the whole icon is garbled, and sometimes the corruption is different for the same icon but will be the same on the next occurence.
Comment 9 Chris Wilson 2014-06-26 19:23:09 UTC
(In reply to comment #6) 
> Maybe 2) is the real cause and the commit makes it only worse?
> I will investigate in 2) on the machine with the radeon card to see if it
> happens there too...

That makes more sense. The commit in question tries hard to reuse an active buffer, so if there is some race condition it would exacerbate it. Something else to check is different kernel versions (just not 3.10!) to see if the race is actually a kernel bug (unlikely, but has happened!).
Comment 10 Chris Wilson 2014-06-26 21:06:56 UTC
Something simple to try is compile with ./configure --enable-debug and see if any of the current asserts catch the error.
Comment 11 Chris Wilson 2014-06-27 07:42:23 UTC
No luck reproducing this so far, trying pnv/snb/ivb with trusty, utopic and f20. But using current kernels. Hmm. Please do try a different kernel and recompiling with --enable-debug.
Comment 12 Harald Judt 2014-06-27 11:22:51 UTC
I've tried a new kernel (3.15.2), same problem. It is not reproducible on the other machine with a different driver. However, I did a new bisect and got this result:

git bisect start
# bad: [b6eca567c08fa1e14b06d751a13b714119a480a6] Revert "sna: Promote better active buffer reuse"
git bisect bad b6eca567c08fa1e14b06d751a13b714119a480a6
# good: [7468a6b740af14d95e8f9bacd2e352ec98a9acf2] 2.99.906 snapshot
git bisect good 7468a6b740af14d95e8f9bacd2e352ec98a9acf2
# good: [2fb2cd092dadba40b5ad2da57943c660a0c9bc14] 2.99.910 snapshot
git bisect good 2fb2cd092dadba40b5ad2da57943c660a0c9bc14
# good: [df297e9fafe1a7de1036f3151f93de67661c2a4e] sna: Userptr lands upstream, enable.
git bisect good df297e9fafe1a7de1036f3151f93de67661c2a4e
# bad: [b41a51e0f54c983ef72bc31125419382d666d48c] sna: Tweak self-copy boxes to hit the GPU more often
git bisect bad b41a51e0f54c983ef72bc31125419382d666d48c
# bad: [7f28b1cfa20eb4fc11c1a9c1e0c0fac7847be72e] uxa: Support 64-bit MSC values. Handle kernel vageries about MSC reporting
git bisect bad 7f28b1cfa20eb4fc11c1a9c1e0c0fac7847be72e
# good: [e8d8f754ebf77f17a4666c8f649907bee50313bd] sna/dri2: Make the swap-limit transitions more obvious
git bisect good e8d8f754ebf77f17a4666c8f649907bee50313bd
# bad: [bb49222a514b1d6041f3d9530a22f5701377118b] sna: Add DBG hints for using inplace CPU mmappings
git bisect bad bb49222a514b1d6041f3d9530a22f5701377118b
# bad: [562c47fc21b9029ebff29b53d2be590df4329264] Silence CLang (almost)
git bisect bad 562c47fc21b9029ebff29b53d2be590df4329264
# bad: [e6ee0679374f8fc80cb34693f80affe6ea676ddb] uxa: Silence compiler for warnings in Cursor API changes
git bisect bad e6ee0679374f8fc80cb34693f80affe6ea676ddb
# good: [2a0dc7ed4961d5bda08bb583a7a6a2cc633b27e4] sna/dri2: Move scanout processing from frame event to global
git bisect good 2a0dc7ed4961d5bda08bb583a7a6a2cc633b27e4
# good: [4321e8decd7be9daaa49c6ab4e8bb9bd66faae31] sna: Silence compiler for warnings in Cursor API changes
git bisect good 4321e8decd7be9daaa49c6ab4e8bb9bd66faae31
# first bad commit: [e6ee0679374f8fc80cb34693f80affe6ea676ddb] uxa: Silence compiler for warnings in Cursor API changes

Reproducability varies between bisect steps, but in general it's not hard to notice the issue. I'm using gcc-4.8.3. If it's a compiler problem, it might explain why it's not reproducible everywhere. However, the other machine uses gcc-4.8.3 too but has a sandy bridge processor.

CFLAGS="-O2 -march=native -pipe -fno-ident -ggdb"
LDFLAGS="-Wl,-O1 -Wl,--hash-style=gnu -Wl,--as-needed -Wl,--sort-common -Wl,-z,now"
Comment 13 Chris Wilson 2014-06-27 16:38:33 UTC
Hmm, not sure yet. Can you try:

diff --git a/src/sna/sna_accel.c b/src/sna/sna_accel.c
index a559907..8241be2 100644
--- a/src/sna/sna_accel.c
+++ b/src/sna/sna_accel.c
@@ -75,7 +75,7 @@
 #define USE_USERPTR_UPLOADS 1
 #define USE_USERPTR_DOWNLOADS 1
 #define USE_COW 1
-#define UNDO 1
+#define UNDO 0
 
 #define MIGRATE_ALL 0
 #define DBG_NO_PARTIAL_MOVE_TO_CPU 0
Comment 14 Harald Judt 2014-06-27 18:49:26 UTC
Modifying UNDO does not make any difference at all.

But I noticed something else: What both issues have in common is that these corruptions do not appear with smaller icon sizes. I use the largest setting in thunar, I'm not sure how big that is, but probably 128px. Going one size lower (96px or 80px maybe?) gets completely rid of the issue. This is 100% reproducible. Maybe that sheds more light on this?
Comment 15 Chris Wilson 2014-06-27 19:03:37 UTC
How do you change icon size? I see side-pane and tree-pane in the preferences? Which icons are affected (I was assuming the files)?
Comment 16 Harald Judt 2014-06-27 19:19:44 UTC
Easy. It's in menu view->zoom in/out. This is for the icons in the main view.
Comment 17 Harald Judt 2014-06-27 19:20:41 UTC
...or you press <C>-<+> or <C>-<-> to adjust the size, <C>-<0> restores the original size.
Comment 18 Chris Wilson 2014-06-27 19:26:14 UTC
Ok, I wasn't sure if zooming in there was doing what you wanted. But now I can see a little bit of corruption in the selection box, so I'll start digging. Thanks.
Comment 19 Chris Wilson 2014-06-27 22:30:31 UTC
The corruption I see is related to manual detiling it seems. Haven't pieced together the exact cause, I am just tracing symptoms.

So far, I can "fix" it by:

1. disabling X-tiling for 512 byte wide tiles
2. disabling gpu_bo_download() (manual detiling)
3. disabling try_upload__inplace()

It is the 128x128 icon that gets manually tiled inplace, then hits a fallback, and gets detiled inplace. So far that explains why it is only the largest size that triggers it (it has to be 128 to hit this path, and I guess no other element is rendered quite like those icons).

To confirm you have a similar bug, can you please test:

diff --git a/src/sna/kgem.c b/src/sna/kgem.c
index 898f943..d143b42 100644
--- a/src/sna/kgem.c
+++ b/src/sna/kgem.c
@@ -4194,7 +4194,7 @@ int kgem_choose_tiling(struct kgem *kgem, int tiling, int width, int height, int
                goto done;
        }
 
-       if (tiling == I915_TILING_X && width * bpp <= 8*8*512/10) {
+       if (tiling == I915_TILING_X && width * bpp <= 8*512) {
                DBG(("%s: too thin [width %d, %d bpp] for TILING_X\n",
                     __FUNCTION__, width, bpp));
                tiling = I915_TILING_NONE;
Comment 20 Harald Judt 2014-06-28 08:05:16 UTC
Yes, this small change fixes both issues, so I can confirm the bug is similar.
Comment 21 Chris Wilson 2014-06-28 20:12:17 UTC
I think I understand the bug now. I changed the transfer from GPU bo to CPU bo paths, but missed that implied a change in synchronisation requirement. This fixes the bug, but I want to see if I can choose the allocation flags more accurately before committing;


diff --git a/src/sna/sna_accel.c b/src/sna/sna_accel.c
index dd5c108..6efc5f3 100644
--- a/src/sna/sna_accel.c
+++ b/src/sna/sna_accel.c
@@ -2157,7 +2157,8 @@ _sna_pixmap_move_to_cpu(PixmapPtr pixmap, unsigned int flags)
 skip_inplace_map:
                sna_damage_destroy(&priv->gpu_damage);
                priv->clear = false;
-               if (priv->cpu_bo && !priv->cpu_bo->flush &&
+               if ((flags & MOVE_ASYNC_HINT) == 0 &&
+                   priv->cpu_bo && !priv->cpu_bo->flush &&
                    __kgem_bo_is_busy(&sna->kgem, priv->cpu_bo)) {
                        DBG(("%s: discarding busy CPU bo\n", __FUNCTION__));
                        assert(!priv->shm);
@@ -2269,8 +2270,8 @@ skip_inplace_map:
 
        assert(priv->mapped == MAPPED_NONE);
        if (pixmap->devPrivate.ptr == NULL &&
-           !sna_pixmap_alloc_cpu(sna, pixmap, priv,
-                                 flags & MOVE_READ ? priv->gpu_damage && !priv->clear : 0))
+           !sna_pixmap_alloc_cpu(sna, pixmap, priv, 0))
+                                 //((flags & (MOVE_READ | MOVE_ASYNC_HINT)) == MOVE_READ) ? priv->gpu_damage && !priv->clear : 0))
                return false;
        assert(priv->mapped == MAPPED_NONE);
        assert(pixmap->devPrivate.ptr == PTR(priv->ptr));
@@ -2594,7 +2595,7 @@ sna_drawable_move_region_to_cpu(DrawablePtr drawable,
                sna_pixmap_unmap(pixmap, priv);
                assert(priv->mapped == MAPPED_NONE);
                if (pixmap->devPrivate.ptr == NULL &&
-                   !sna_pixmap_alloc_cpu(sna, pixmap, priv, false))
+                   !sna_pixmap_alloc_cpu(sna, pixmap, priv, flags & MOVE_ASYNC_HINT))
                        return false;
                assert(priv->mapped == MAPPED_NONE);
                assert(pixmap->devPrivate.ptr == PTR(priv->ptr));
@@ -2780,8 +2781,8 @@ move_to_cpu:
 
        assert(priv->mapped == MAPPED_NONE);
        if (pixmap->devPrivate.ptr == NULL &&
-           !sna_pixmap_alloc_cpu(sna, pixmap, priv,
-                                 flags & MOVE_READ ? priv->gpu_damage && !priv->clear : 0)) {
+           !sna_pixmap_alloc_cpu(sna, pixmap, priv, 0)) {
+                                 //flags & MOVE_READ ? priv->gpu_damage && !priv->clear : 0)) {
                DBG(("%s: CPU bo allocation failed, trying full move-to-cpu\n", __FUNCTION__));
                goto move_to_cpu;
        }
Comment 22 Chris Wilson 2014-06-29 06:19:01 UTC
commit b961d7323369284ea2c3db47d30c27ffe01a9040
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Jun 29 07:00:58 2014 +0100

    sna: Sync CPU bo before writes
    
    Fixes regression from
    
    commit 961139f5878572ebea268a0bbf47caf05af9093f [2.99.912]
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Fri May 30 09:45:15 2014 +0100
    
        sna: Use manual detiling for downloads
    
    Reported-by: Harald Judt <h.judt@gmx.at>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80560
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 23 Harald Judt 2014-06-29 13:05:19 UTC
Thanks, I confirm problem fixed.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.