Bug 91153 - [ivb] bright pixels appear when mousing over LibreOffice Impress
Summary: [ivb] bright pixels appear when mousing over LibreOffice Impress
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel (show other bugs)
Version: unspecified
Hardware: Other Linux (All)
: medium normal
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Deepak S <deepak.s@linux.intel.com>
Keywords:
Depends on:
Blocks:
 
Reported: 2015-06-29 23:09 UTC by Joe Peterson
Modified: 2015-07-06 20:24 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
Xorg log file (22.03 KB, text/plain)
2015-06-29 23:09 UTC, Joe Peterson
no flags Details
Screenshot showing bright pixels (371.88 KB, image/png)
2015-06-29 23:10 UTC, Joe Peterson
no flags Details
Xorg.0.log (22.26 KB, text/plain)
2015-06-30 20:56 UTC, Joe Peterson
no flags Details

Description Joe Peterson 2015-06-29 23:09:18 UTC
Created attachment 116810 [details]
Xorg log file

Running: xf86-video-intel 2.99.917-5 in Arch Linux

If I launch LibreOffice with a couple of dark-background terminal windows on the screen, select the "Impress" application (clone of powerpoint), bring the terminal windows forward, and move my mouse around a bit (hitting areas of the LibreOffice window), bright pixels appear in my dark terminal windows.

This only happens when using "SNA" mode in the intel driver.

This is on a ThinkPad T430s.
I can reproduce this very easily. It does not happen in UXA.

I've attached an Xorg.0.log file. I'll attach a screenshot showing some bright pixels in the term windows in the next entry. Let me know if I can provide any other files.
Comment 1 Joe Peterson 2015-06-29 23:10:28 UTC
Created attachment 116811 [details]
Screenshot showing bright pixels

Here's the screenshot showing the bright pixels.
Comment 2 Chris Wilson 2015-06-30 08:10:13 UTC
A couple of possibilities:

- it's a missed GPU w/a, state, flush

- it's just tearfree damage tracking going awry

The second is very easy to test as you can quickly turn it off from your xorg.conf.

At a guess, I would say the window manager is wmaker?
Comment 3 Joe Peterson 2015-06-30 11:58:51 UTC
(In reply to Chris Wilson from comment #2)
> - it's just tearfree damage tracking going awry
> 
> The second is very easy to test as you can quickly turn it off from your
> xorg.conf.

Nope, I turned off tearfree and could still make it happen.

> At a guess, I would say the window manager is wmaker?

It's fvwm.

Let me know of any more tests I can try.
Comment 4 Joe Peterson 2015-06-30 12:01:50 UTC
(In reply to Joe Peterson from comment #3)
> It's fvwm.
> 
> Let me know of any more tests I can try.

Note that I use "focus follows mouse" (and with fvwm, it's strict, meaning focus goes in and out of windows as the mouse passes over them). This may make the issue much easier to reproduce for me, as moving the mouse causes focus/unfocus events, changes terminal cursor, etc. The terminal I use mainly is urxvt (rxvt-unicode), but I also checked to see if xterm gets the bright pixels, and it does.
Comment 5 Chris Wilson 2015-06-30 12:40:26 UTC
It would be handy if you could compile with --enable-debug=pixmap. That will make sure that we verify the coordinates before drawing and so hopefully ruling out a simple mistake.
Comment 6 Joe Peterson 2015-06-30 18:41:40 UTC
Strange, I recompiled xf86-video-intel (using the Arch package). With or without the option, --enable-debug=pixmap, I cannot get X to start. The following is in Xorg.0.log:

[  8087.510] (==) intel(0): Depth 24, (--) framebuffer bpp 32
[  8087.510] (==) intel(0): RGB weight 888
[  8087.510] (==) intel(0): Default visual is TrueColor
[  8087.510] (**) intel(0): Option "AccelMethod" "sna"
[  8087.510] (**) intel(0): Option "TearFree" "true"
[  8087.510] (EE) 
[  8087.510] (EE) Backtrace:
[  8087.510] (EE) 0: /usr/lib/xorg-server/Xorg (OsLookupColor+0x139) [0x596749]
[  8087.510] (EE) 1: /usr/lib/libc.so.6 (__restore_rt+0x0) [0x7f4f84cba5af]
[  8087.511] (EE) 2: ? (?+0x0) [0x0]
[  8087.511] (EE) 
[  8087.511] (EE) Segmentation fault at address 0x0
[  8087.511] (EE) 
Fatal server error:
[  8087.511] (EE) Caught signal 11 (Segmentation fault). Server aborting

Am I missing a vital step in building it?
Comment 7 Chris Wilson 2015-06-30 19:55:10 UTC
You are trying to build -intel-2.99.917 with new libdrm, and there was an abi break in libdrm. You need to build from -intel.git to assist with debugging anyway.
Comment 8 Joe Peterson 2015-06-30 20:56:41 UTC
Created attachment 116832 [details]
Xorg.0.log
Comment 9 Joe Peterson 2015-06-30 20:57:16 UTC
OK, thanks for the tip on that. I do get an extra line in Xorg.0.log:

    intel(0): SNA compiled with extra pixmap/damage validation

So I assume my build it active (although version appears the same).

At first try, I could not, this time, cause the issue to appear. Could the compile option prevent it now? I've attached a new log file (just before this comment) in case it helps...
Comment 10 Chris Wilson 2015-06-30 21:04:15 UTC
It'll change timings (it runs a bit slower), but it shouldn't otherwise impact the rendering results - I tried to avoid perturbing the paths it validates.
Comment 11 Joe Peterson 2015-06-30 21:53:18 UTC
(In reply to Chris Wilson from comment #10)
> It'll change timings (it runs a bit slower), but it shouldn't otherwise
> impact the rendering results - I tried to avoid perturbing the paths it
> validates.

OK, I compiled the git version *without* the pixmap option, and now I cannot make it happen anymore. Have there been changes, that you know of, since the version I was using that could have fixed this? If you have any more suggestions of things to try, let me know.
Comment 12 Chris Wilson 2015-07-01 07:50:53 UTC
(In reply to Joe Peterson from comment #11)
> OK, I compiled the git version *without* the pixmap option, and now I cannot
> make it happen anymore. Have there been changes, that you know of, since the
> version I was using that could have fixed this? If you have any more
> suggestions of things to try, let me know.

About the only thing which appears relevant are engine selection tweaks. Nothing that purports to change the rendering for your machine, just fine tuning.
Comment 13 Joe Peterson 2015-07-03 16:45:24 UTC
(In reply to Chris Wilson from comment #12)
> About the only thing which appears relevant are engine selection tweaks.
> Nothing that purports to change the rendering for your machine, just fine
> tuning.

OK, interesting. I was trying latest git version, which seemed to either fix it or mask the issue.

And today, I dyd an Arch Linux update, and it seems the official package now grabs git commit b24e758 (backlight: Factor known names into preferred interfaces).

This one is earlier than latest, of course, but it also seems to mask or fix the issue.

I guess there is no way to go back to the 2.99.917 (due to the abi change) to find out what cause the "fix". I hope it's not just a random timing change that is now causing it to work better, since that means it could still be lurking...
Comment 14 Chris Wilson 2015-07-03 17:02:37 UTC
You can bisect, you just have to cherry-pick the fix every time. I would appreciate it because as you say without understanding why it is no longer apparent, we have no idea if it is fixed or merely hidden.
Comment 15 Joe Peterson 2015-07-03 17:14:44 UTC
(In reply to Chris Wilson from comment #14)
> You can bisect, you just have to cherry-pick the fix every time. I would
> appreciate it because as you say without understanding why it is no longer
> apparent, we have no idea if it is fixed or merely hidden.

Is there a patch I could make that would update the older versions to current abi, or is it more complex than that? Bisecting would be easier if that were possible. BTW, do you know the commit at which the abi changed to the current?
Comment 16 Chris Wilson 2015-07-03 17:19:45 UTC
commit 8576527cfacaf42af8316e1030c192193e94225a
Author: Neil Roberts <neil@linux.intel.com>
Date:   Thu Mar 19 17:11:08 2015 +0000

    intel: Merge latest i915_drm.h

is the fateful commit in libdrm.

diff --git a/include/drm/i915_drm.h b/include/drm/i915_drm.h
index ded43b1..e398949 100644
--- a/include/drm/i915_drm.h
+++ b/include/drm/i915_drm.h
@@ -497,14 +497,6 @@ struct drm_i915_gem_mmap {
         * This is a fixed-size type for 32/64 compatibility.
         */
        __u64 addr_ptr;
-
-       /**
-        * Flags for extended behaviour.
-        *
-        * Added in version 2.
-        */
-       __u64 flags;
-#define I915_MMAP_WC 0x1
 };
 

should be all that is required.
Comment 17 Joe Peterson 2015-07-03 17:40:16 UTC
(In reply to Chris Wilson from comment #16)
> commit 8576527cfacaf42af8316e1030c192193e94225a
> Author: Neil Roberts <neil@linux.intel.com>
> Date:   Thu Mar 19 17:11:08 2015 +0000
> 
>     intel: Merge latest i915_drm.h
> 
> is the fateful commit in libdrm.
> 
> diff --git a/include/drm/i915_drm.h b/include/drm/i915_drm.h
> index ded43b1..e398949 100644
> --- a/include/drm/i915_drm.h
> +++ b/include/drm/i915_drm.h
> @@ -497,14 +497,6 @@ struct drm_i915_gem_mmap {
>          * This is a fixed-size type for 32/64 compatibility.
>          */
>         __u64 addr_ptr;
> -
> -       /**
> -        * Flags for extended behaviour.
> -        *
> -        * Added in version 2.
> -        */
> -       __u64 flags;
> -#define I915_MMAP_WC 0x1
>  };
>  
> 
> should be all that is required.

Hi, so that's the change in libdrm, but when in xf86-video-intel did the code start to use the new libdrm abi?

And is there a patch I can apply to the older versions of xf86-video-intel (not to libdrm) to allow them to work with the new libdrm? I'd rather just bisect xf86-video-intel's code and reinstall that package than downgrade libdrm on mys system (that sounds a lot more dangerous).
Comment 18 Joe Peterson 2015-07-03 17:41:38 UTC
Is it this one:?

    7fe2b29 sna: Protect against ABI breakage in recent versions of libdrm
Comment 19 Chris Wilson 2015-07-03 17:57:37 UTC
(In reply to Joe Peterson from comment #17)
> Hi, so that's the change in libdrm, but when in xf86-video-intel did the
> code start to use the new libdrm abi?

The point is the existent ABI changed. We thought we were being smart by zero extending the field and believed it wouldn't break. But then we started passing in nearby stack because the ioctl was passing in more data than the old struct.


(In reply to Joe Peterson from comment #18)
> Is it this one:?
> 
>     7fe2b29 sna: Protect against ABI breakage in recent versions of libdrm

Yes.
Comment 20 Joe Peterson 2015-07-03 18:01:36 UTC
(In reply to Chris Wilson from comment #19)
> (In reply to Joe Peterson from comment #17)
> > Hi, so that's the change in libdrm, but when in xf86-video-intel did the
> > code start to use the new libdrm abi?
> 
> The point is the existent ABI changed. We thought we were being smart by
> zero extending the field and believed it wouldn't break. But then we started
> passing in nearby stack because the ioctl was passing in more data than the
> old struct.
> 
> 
> (In reply to Joe Peterson from comment #18)
> > Is it this one:?
> > 
> >     7fe2b29 sna: Protect against ABI breakage in recent versions of libdrm
> 
> Yes.

OK, cool. So do you expect that if I apply the patch (git diff bacaf7f..7fe2b29) to earlier versions of xf86-video-intel that it will work with new abi? I can try, but if hopeless, I'd rather know now. :)
Comment 21 Joe Peterson 2015-07-06 19:42:57 UTC
Ok, according to my bisection (and I want to test the "good" version a little more before I declare it good), the following commit appears to "fix" it:

   e47eb0c sna: Don't unroll BLT points

Looking at the diff for that commit, I do see something potentially suspect in the old code in the second unrolled block:

-                       if (n_this_time & 4) {
-                               *((uint64_t *)b + 0) = pt_add(cmd, p+0, dx, dy);
-                               *((uint64_t *)b + 1) = pt_add(cmd, p+1, dx, dy);
-                               *((uint64_t *)b + 2) = pt_add(cmd, p+2, dx, dy);
-                               *((uint64_t *)b + 3) = pt_add(cmd, p+3, dx, dy);
-                               b += 8;
-                               p += 8;

Note that both b and p are incremented by 8. In the previous block, which looks more correct they are:

-                       if (n_this_time & 4) {
-                               *((uint64_t *)b + 0) = pt_add(cmd, p+0, 0, 0);
-                               *((uint64_t *)b + 1) = pt_add(cmd, p+1, 0, 0);
-                               *((uint64_t *)b + 2) = pt_add(cmd, p+2, 0, 0);
-                               *((uint64_t *)b + 3) = pt_add(cmd, p+3, 0, 0);
-                               b += 8;
-                               p += 4;

I didn't analyze the unrolled code completely yet, but I wonder if that is where the bug was before the code was "rolled". I'll look at this a bit more later, but I wanted to get where I was so far to you now...
Comment 22 Chris Wilson 2015-07-06 20:24:27 UTC
Oh, I remember seeing that bug as well and feeling guilty for not commenting on it in the changelog.

That would explain the pixels very nicely and why they evade --enable-debug.

Thanks.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.