Bug 77351

Summary: [855GM] Mouse cursors show wrong default black cursors and/or do not update correctly
Product: DRI Reporter: CarlEitsger <4607vrfcr84spd21f08>
Component: DRM/IntelAssignee: Chris Wilson <chris>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: medium CC: intel-gfx-bugs
Version: unspecified   
Hardware: x86 (IA32)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
KDE's default cursors
none
KDE's default black cursors (again with correct type...)
none
KDE's oxygen cursors
none
Xorg.0.log after logging in
none
Xorg.0.log after logoff and relogin
none
Fix dynamic allocation of physical handles
none
Don't reuse cursors on broken kernels none

Description CarlEitsger 2014-04-11 23:15:05 UTC
Created attachment 97243 [details]
KDE's default cursors

(thi is orignating from comment 18 in bug 77201, and I have not checked for duplicate bugs since Chris already was aware)

Mouse cursors show wrong default black cursors (sometimes the cursor for the last action - no update, e.g. a waiting clock) instead of KDE4's "Oxgen white".

* Expected: black cursors for the login screen of KDM which changes to the oxygen cursors when my user profile is loaded. Also cursors shoudld change accoding to their function (e.g. textfield or not).
* Actual: black cursors stay. And the waiting watch symbol stays for the KDM login screen even when it is read for login. Textfield cursor stays when it should be a pointer again. In  9ae8213.

Tested with 2.99.911: intel(0): SNA compiled from 2.99.911 - cursors work.

bisected from 9ae8213 to 2.99.911.:
* A observation: 2.99.911-15-ge8be2a4 is more broken than the other versions: shows a white block (rectangle) as cursor ... until KDE4 has started and I first hovered over a text field - then it becomes a text cursor. But does not change to a pointer afterwards.
* result: 
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[25ca8f136cef9e1bdf06967bf8e78c87b54ffce2] sna: Support variable sized cursors


Will test without KDM in some minutes...
Comment 1 CarlEitsger 2014-04-11 23:16:10 UTC
Created attachment 97244 [details]
KDE's default black cursors (again with correct type...)
Comment 2 CarlEitsger 2014-04-11 23:16:41 UTC
Created attachment 97245 [details]
KDE's oxygen cursors
Comment 3 CarlEitsger 2014-04-11 23:43:45 UTC
with a pure startx and LXDE similar things happen (with current git version): Cursor is a a styled (non-black) pointer after LXDE start BUT it does not change to a textcursor when I hover ofter a text field / console (it does with 2.99.910).
Comment 4 Chris Wilson 2014-04-12 14:37:17 UTC
I dug out an 855gm to check, and I have not noticed any odd behaviour (or lag) with both 2-color and argb cursors.

Please do install the latest from xf86-video-intel.git and confirm that Xorg.0.log reports the right version.

If it still occurs, apply

diff --git a/src/sna/sna_display.c b/src/sna/sna_display.c
index 3f2b912..067f086 100644
--- a/src/sna/sna_display.c
+++ b/src/sna/sna_display.c
@@ -87,8 +87,8 @@ union compat_mode_get_connector{
 #define DEFAULT_DPI 96
 #endif
 
-#if 0
-#define __DBG DBG
+#if 1
+#define __DBG(x) ErrorF x
 #else
 #define __DBG(x)
 #endif

and attach the Xorg.0.log
Comment 5 CarlEitsger 2014-04-12 23:54:00 UTC
Thank you! Which parts of the Xorg.0.log are you interested in? I do not want to post it completely due to privacy reasons.
Comment 6 Chris Wilson 2014-04-13 15:42:35 UTC
The tail of the log should be full of cursor messages, which do not contain any sensitive information.
Comment 7 CarlEitsger 2014-04-14 13:58:04 UTC
the normal behaviour is: watch cursor when KDM login screen loads, then black pointer. Changes to white, styled pointer when my user profile has loaded.

Verified that the problem still occurs with:

intel(0): SNA compiled from 2.99.911-73-g9ae8213

Adding the debug patch:

patch sna_display.c ~⁄thepatch 

told something about ffuzz 1 but the patch seems to have worked.
Comment 8 CarlEitsger 2014-04-14 14:03:15 UTC
Created attachment 97352 [details]
Xorg.0.log after logging in

With KDM into KDE4 and browsing a bit in Firefox.  After editing a text field I had a text cursor for the whole time.
Comment 9 CarlEitsger 2014-04-14 14:03:22 UTC
Created attachment 97353 [details]
Xorg.0.log after logoff and relogin

in KDM login screen I had the basic watch cursor which stayed even after login completed. At last it changed to a Oxygen text cursor.
Comment 10 Chris Wilson 2014-04-14 16:33:59 UTC
The log behaves as I expect. The change of cursors is being sent to the kernel, but you observe that the updates do not occur.

Can you please try this kernel patch:

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 764467e..efcfd53 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5055,6 +5055,7 @@ i915_gem_phys_pwrite(struct drm_device *dev,
                        return -EFAULT;
        }
 
+       drm_clflush_virt_range(vaddr, args->size);
        i915_gem_chipset_flush(dev);
        return 0;
 }

The other thing may be the two CRTCs using the same physical address. It's not prohibited by the spec, but it may just be upsetting the hw?
Comment 11 CarlEitsger 2014-04-14 20:25:04 UTC
(In reply to comment #10)
> The log behaves as I expect. The change of cursors is being sent to the
> kernel, but you observe that the updates do not occur.

Okay, but it was working with the older driver versions?! Just mentioning in case you did not see it. 

> Can you please try this kernel patch:

Uuh, a kernel patch... Never did this (also never compiled the kernel myself). Sorry, can you tell me how in a few words? Do I need to get the kernel, apply the patch and then compile the kernel myself? ... If it is worth the effort - apparently I am running into a seldom bug here. Thank you anyway!

> The other thing may be the two CRTCs using the same physical address. It's
> not prohibited by the spec, but it may just be upsetting the hw?

Do you mean my both screens (VGA1 LVDS1)? One is the laptop's built-in screen and another one is attached via VGA port. I could not find a physical address in my Xorg.0.log. Do I (or my distro's defaults) use a not-so-good configuration? If yes, sorry about this.
Comment 12 Chris Wilson 2014-04-14 20:53:53 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > The log behaves as I expect. The change of cursors is being sent to the
> > kernel, but you observe that the updates do not occur.
> 
> Okay, but it was working with the older driver versions?! Just mentioning in
> case you did not see it.

Right, and the only difference between then and now would appear to be use of a single cursor for both CRTCs. (Previously each CRTC had an independent cursor allocated for it.) It just may be that our cursor programming sequence is wrong in the kernel, we keep finding bugs there so it would not be too surprising.

> > Can you please try this kernel patch:
> 
> Uuh, a kernel patch... Never did this (also never compiled the kernel
> myself). Sorry, can you tell me how in a few words? Do I need to get the
> kernel, apply the patch and then compile the kernel myself? ... If it is
> worth the effort - apparently I am running into a seldom bug here. Thank you
> anyway!

Getting to the bottom of this is most certainly worth the effort!

First of to compile the kernel, run

$ git clone --depth 1 --branch drm-intel-fixes git://people.freedesktop.org/~danvet/drm-intel
$ cd drm-intel
$ cp /boot/config-`uname -r` .config
$ make oldconfig
$ make
$ sudo make modules_install install

That should make the new kernel available for booting. Check that it does boot! Applying the patch should then be easier (you can probably do it faster using a text editor than pasting it into patch -p1) and then you just need to repeat make && sudo make modules_install install.
 
> > The other thing may be the two CRTCs using the same physical address. It's
> > not prohibited by the spec, but it may just be upsetting the hw?
> 
> Do you mean my both screens (VGA1 LVDS1)? One is the laptop's built-in
> screen and another one is attached via VGA port. I could not find a physical
> address in my Xorg.0.log. Do I (or my distro's defaults) use a not-so-good
> configuration? If yes, sorry about this.

No, it's a quirk of this particular hardware that it reads the cursor using a physical address to a contiguous chunk of memory.
Comment 13 CarlEitsger 2014-04-14 21:57:36 UTC
Okay, thanks for your great comments so far!

What is this ~danvet repo? By the way: It says "IMPORTANT: drm-intel moved to git://anongit.freedesktop.org/drm-intel".  It seems to be deprecated.

I was thinking of downloading the kernel from kernel.org and not some private source (of which I do not know its trustworthiness). Also okay?

I will try to look into it more closely tomorrow.
Comment 14 Chris Wilson 2014-04-15 05:29:40 UTC
(In reply to comment #13)
> Okay, thanks for your great comments so far!
> 
> What is this ~danvet repo? By the way: It says "IMPORTANT: drm-intel moved
> to git://anongit.freedesktop.org/drm-intel".  It seems to be deprecated.

Oops, sorry. Keep forgetting Daniel moved it, use the new address instead.
danvet == Daniel Vetter, our kernel driver maintainer
 
> I was thinking of downloading the kernel from kernel.org and not some
> private source (of which I do not know its trustworthiness). Also okay?

Using Linus should be ok in this case. drm-intel contains everything we are sending upstream.
Comment 15 CarlEitsger 2014-04-15 19:58:22 UTC
I had a look at https://wiki.archlinux.org/index.php/Kernels/Compilation/Traditional and https://wiki.archlinux.org/index.php/Kernels/Compilation/Arch_Build_System. I never used the Arch build system (ABS) before. I think this is the more easy and safe method, but I think I should try to understand the ABS before I try to use it for a kernel compilation/installation. The traditional method seems not to be easy too. The machine which shows the problems is my main machine and I do not like to mess it up. Also I am running out of backup HDDs.

I tried to follow your command list in brain dry-run: Problems already start  with the issue that I do not have any /boot/config-* file which could be cp'ed. ls: grub  initramfs-linux-fallback.img  initramfs-linux.img  vmlinuz-linux

So, sorry, I think this all would take too much time from me (although it is interesting). Isn't there another option? Or is the only other option to wait for someone with the same problems who can do a kernel patch?  I plan to move to a new main computer in the next months which would make my old computer free for messy testing, but I guess this would be too late.
Comment 16 Chris Wilson 2014-04-15 20:13:38 UTC
The other option is to wait and see if I can reproduce your issue. So far I have only looked at an 855gm with its builtin LVDS, and it appears to work just fine. Next I need to see what happens when I plug in an external monitor.
Comment 17 CarlEitsger 2014-04-15 22:43:12 UTC
good idea! I have tested with several configs (listed wich devices connected to the notebook):

* LAN: OK
* LAN, USB keyboard, USB mouse: OK
* LAN, USB keyboard, USB mouse, VGA: PROBLEM

The cursor problem only seems to happen with the external VGA monitor connected.

Note that my VGA has a bigger resolution as the LVDS (did not test if this makes a difference): 

[    41.920] (II) intel(0): switch to mode 1280x1024@60.0 on VGA1 using pipe 0, position (0, 0), rotation normal, reflection none
[    41.936] (II) intel(0): switch to mode 1024x768@60.0 on LVDS1 using pipe 1, position (0, 0), rotation normal, reflection none
[    41.951] (II) intel(0): Setting screen physical size to 338 x 270
Comment 18 Chris Wilson 2014-04-17 09:33:40 UTC
Spotted the bug:

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 3d017f5b656a..261fc7196e57 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -7776,13 +7776,14 @@ static int intel_crtc_cursor_set(struct drm_crtc *crtc,
 
                addr = i915_gem_obj_ggtt_offset(obj);
        } else {
-               int align = IS_I830(dev) ? 16 * 1024 : 256;
-               ret = i915_gem_attach_phys_object(dev, obj,
-                                                 (intel_crtc->pipe == 0) ? I915_GEM_PHYS_CURSOR_0 : I915_GEM_PHYS_CURSOR_1,
-                                                 align);
-               if (ret) {
-                       DRM_DEBUG_KMS("failed to attach phys object\n");
-                       goto fail_locked;
+               if (obj->phys_obj == NULL) {
+                       ret = i915_gem_attach_phys_object(dev, obj,
+                                                         (intel_crtc->pipe == 0) ? I915_GEM_PHYS_CURSOR_0 : I915_GEM_PHYS_CURSOR_1,
+                                                         IS_I830(dev) ? 16 * 1024 : 256);
+                       if (ret) {
+                               DRM_DEBUG_KMS("failed to attach phys object\n");
+                               goto fail_locked;
+                       }
                }
                addr = obj->phys_obj->handle->busaddr;
        }
Comment 19 Chris Wilson 2014-04-17 10:14:13 UTC
Created attachment 97503 [details] [review]
Fix dynamic allocation of physical handles
Comment 20 CarlEitsger 2014-04-17 11:59:08 UTC
Super, Chris! The bug was in the Kernel, correct? So I cannot easily test, if it fixes the issue on my computer too.

So all driver versions after snapshot 2.99.911 (exactly after 2014-03-27 25ca8f1 sna: Support variable sized cursors) will only work with a patched / new kernel version (something later than 3.14.1), if I understand correctly.
Comment 21 Chris Wilson 2014-04-17 13:30:14 UTC
The bug is limited to just a few models of machines, and we should be pushing the kernel patch back through the stable trees since we end up doing nasty things to hardware. But I am working on a follow up patch to add a kernel feature that will both make the cursors more efficient and allow userspace to detect when it needs to workaround a broken kernel.
Comment 22 Chris Wilson 2014-04-18 07:37:02 UTC
Created attachment 97557 [details] [review]
Don't reuse cursors on broken kernels

This should do the trick for your machine and kernel...
Comment 23 Chris Wilson 2014-04-18 13:26:19 UTC
This should w/a the issue in userspace

commit 154f7e9668bffdf565b6914a3a3e5bdfe17aa1b9
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Apr 18 09:28:02 2014 +0100

    sna: Do not reuse physical cursors for the kernel is broken
    
    Big bug in the kernel that prevents the sharing of cursors across pipes
    when they are backed by a phys_obj. To prevent hitting that bug, don't
    do that!
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77351
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 24 CarlEitsger 2014-04-18 23:41:02 UTC
(In reply to comment #23)
> This should w/a the issue in userspace

Hmmmm, I have tested with 2d9ae02 (sna/video: Provide a fallback path for pwrite failure) and 3.14.1 kernel but the problem still occurs. I will add your debug patch from comment 4 and post the error log tomorrow if you did not comment otherwise until then.
Comment 25 CarlEitsger 2014-04-19 12:56:14 UTC
patching does not seem to work anymore - quite understandable:

patching file sna_display.c
patch unexpectedly ends in middle of line
Hunk #1 succeeded at 87 with fuzz 1.


I also saw that you committed a new driver version. Will try 58a757b "sna: Do not reuse physical cursors for the kernel is broken, harder".
Comment 26 Chris Wilson 2014-04-19 12:58:57 UTC
Tried to update bugzilla earlier, but it was playing up. Hopefully this is enough to prevent keep a cursor per pipe...
Comment 27 CarlEitsger 2014-04-19 13:20:27 UTC
(In reply to comment #26)
> Tried to update bugzilla earlier, but it was playing up. 

Don't worry.

It works, I see no cursor problems with 3.14.1-1-ARCH and (II) intel(0): SNA compiled from 2.99.911-88-g58a757b !

Great! I will continue to use this driver version to see if I spot any problems.
Comment 28 CarlEitsger 2014-04-24 11:54:38 UTC
As announced in my Comment 27 I continued to use the git driver: With driver version 58a757b I experienced a *freeze* of keyboard (even numlock LED did not respond to numlock key), mouse cursor and any screen change (old image is displayed) at not good reproducible conditions. I did not recognize a pattern when exactly it happens  (it occured only three times, I think it was never in idle but when starting a big program or doing some work inside a program). 

Access to the machine via ssh and shutdown or kill of the X server is possible.

I have changed to the old version 2.99.910 again since two days and I did not experience a freeze anymore (as I never did up to last week). Yes, the freezes may still have been caused by something else and randomness may have caused no freezes since two days.

I had a look in the logs (Xorg.0.log and my system's journalctl) but did not find any suspicious entry just before/at the freeze.

I know this problem report is far from helpful, but I hope it is not useless...
Comment 29 Chris Wilson 2014-04-24 12:08:30 UTC
If it happens again, grab a stacktrace of X (gdb --pid=`pidof X` ; bt) and cat /proc/`pidof X`/stack and please file a new bug. I suspect that is a fun and separate issue.
Comment 30 CarlEitsger 2014-04-25 00:25:19 UTC
okay, thanks Chris. I guess this one (cursors) is resolved, fixed then. Sure, I will post a new bug with a stack trace (swiching to the last git version now, again).
Comment 31 Chris Wilson 2014-04-25 06:50:59 UTC
It's not entirely resolved yet. I've still got to get the patches to fix the kernel from giving stale pointers to the hardware upstream.
Comment 32 CarlEitsger 2014-04-26 17:30:25 UTC
okay, sorry about the closing. 

The freeze issue is now at bug 77975 ("X freezes for no apparent reason and reproducability").
Comment 34 Daniel Vetter 2014-05-19 20:04:01 UTC
(In reply to comment #33)
> Please test
> http://mid.gmane.org/1400071997-30498-1-git-send-email-chris@chris-wilson.co.
> uk

Poke.
Comment 35 CarlEitsger 2014-05-19 22:44:44 UTC
(In reply to comment #34)
> (In reply to comment #33)
> > Please test
> > http://mid.gmane.org/1400071997-30498-1-git-send-email-chris@chris-wilson.co.
> > uk
> 
> Poke.

Sorry, did not notice due to very short new text here (I do not use the default email notification).  

Did you mean me to test? If so - what and how?
Comment 36 Daniel Vetter 2014-05-20 09:35:06 UTC
On Tue, May 20, 2014 at 12:44 AM,  <bugzilla-daemon@freedesktop.org> wrote:
> --- Comment #35 from CarlEitsger <4607vrfcr84spd21f08@weg-werf-email.de> ---
> (In reply to comment #34)
>> (In reply to comment #33)
>> > Please test
>> > http://mid.gmane.org/1400071997-30498-1-git-send-email-chris@chris-wilson.co.
>> > uk
>>
>> Poke.
>
> Sorry, did not notice due to very short new text here (I do not use the default
> email notification).
>
> Did you mean me to test? If so - what and how?

Apply patch to kernel, compile and test it.
-Daniel
Comment 37 CarlEitsger 2014-05-20 10:41:19 UTC
(In reply to comment #36)
> Apply patch to kernel, compile and test it.
> -Daniel

Sorry, kernel compiling/patching is outside my capabilities - currently. I only can compile/patch the graphics driver xf86-video-intel.git. Maybe Chris can test it (on his spare 855gm hardware).
Comment 38 Chris Wilson 2014-06-03 13:53:08 UTC
commit 00731155a73020c8e3c215723be193f96b4fcb1f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed May 21 12:42:56 2014 +0100

    drm/i915: Fix dynamic allocation of physical handles
    
    A single object may be referenced by multiple registers fundamentally
    breaking the static allotment of ids in the current design. When the
    object is used the second time, the physical address of the first
    assignment is relinquished and a second one granted. However, the
    hardware is still reading (and possibly writing) to the old physical
    address now returned to the system. Eventually hilarity will ensue, but
    in the short term, it just means that cursors are broken when using more
    than one pipe.
    
    v2: Fix up leak of pci handle when handling an error during attachment,
    and avoid a double kmap/kunmap. (Ville)
    Rebase against -fixes.
    
    v3: And fix the error handling added in v2 (Ville)
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77351
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Cc: Jani Nikula <jani.nikula@linux.intel.com>
    Cc: stable@vger.kernel.org
    Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.