Bug 78002

Summary:

[855GM] X crashes when moving the mouse cursor back into the LVDS screen area for the third time

Product:

DRI

Reporter:

CarlEitsger <4607vrfcr84spd21f08>

Component:

DRM/Intel

Assignee:

Chris Wilson <chris>

Status:

CLOSED FIXED

QA Contact:

Intel GFX Bugs mailing list <intel-gfx-bugs>

Severity:

normal

Priority:

medium

CC:

debian, intel-gfx-bugs

Version:

unspecified

Hardware:

x86 (IA32)

OS:

Linux (All)

Whiteboard:

i915 platform:

i915 features:

Attachments:

Description	Flags
X crash message in gdb and backtrace	none
gdb for comment 12 ...	none
part of Xorg log while the debug patch was active - for comment 12	none

Description CarlEitsger 2014-04-27 13:59:45 UTC

I was testing the version (intel(0): SNA compiled from 2.99.911-106-g11cc397) which fixed bug 77975 ...

However, this new version worsens the situation:

X *reproducibly* restarts (new PID after restart) when I do the following:
1. move mouse cursor more to the right than 1024 pixels (the width of my LVDS)
2. move mouse cursor back to the left (below 1024 pixels) so that it re-appears on the LVDS screen area, too. 
1a. move mouse cursor more to the right than 1024 pixels (the width of my LVDS)
2a. move mouse cursor back to the left (below 1024 pixels) so that it re-appears on the LVDS screen area, too. 
1b. move mouse cursor more to the right than 1024 pixels (the width of my LVDS)
2b. move mouse cursor back to the left (below 1024 pixels) so that it re-appears on the LVDS screen area, too. 

X restarts exactly when the mouse cursor crosses the 1024-pixel boundary, each time at the point where the third re-appearance should happen.


The same happens for the bottom LVDS boundary (768 pixels).

How could I get some debug information (if you need some)? Getting a bt with gdb does not work the previous way because the process is not there anymore. 

Workaround: I have switched back to the stable version 2.99.910 now.

Comment 1 Chris Wilson 2014-04-27 15:25:49 UTC

Hmm, I broke something in the saving of malloc errors then. Is there anything in the crashing Xorg.0.log or Xorg.0.log.old?

Comment 2 CarlEitsger 2014-04-27 18:46:54 UTC

(In reply to comment #1)
> Is there
> anything in the crashing Xorg.0.log or Xorg.0.log.old?

The .old file (whose modification date matches the previous X session) ends with these entries, which are from startup:

> [    23.934] (II) config/udev: Adding input device PC Speaker (/dev/input/event11)
> [    23.934] (II) No input driver specified, ignoring this device.
> [    23.934] (II) This device may have been added with another device file.

So, no, there is nothing relating to the X restart.

> $ less Xorg.0.log.old | grep -i cursor
> [    22.221] (--) intel(0): Using a maximum size of 64x64 for hardware cursors
> [    22.316] (II) intel(0): HW Cursor enabled

Nothing special.

Comment 3 Chris Wilson 2014-04-27 18:58:46 UTC

Is the stderr captured anywhere? Usually as xdm.log or gdm/*.log etc.

Comment 4 CarlEitsger 2014-04-27 22:32:12 UTC

I am using KDM. `journalctl _SYSTEMD_UNIT=kdm.service` did not output any log entries, and there is nothing interesting in /var/log/kdm.log. I am not sure where the stderr of kdm and its children goes... I will try without kdm:


With a plain startx 2> stderr.log (kdm not enabled in systemd) and an exec xterm, X crashes on the third re-entry of the mouse cursor. Output generated by the crash:

> xterm: xinit: connection to X server lost
> fatal IO error 11 (Die Ressource ist zur Zeit nicht verfügbar) or KillClient on X server ":0"

With lxde instead of xterm it even crashes at the first reappearance of the mouse cursor:

> XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
>       after 31 requests (31 known processed) with 0 events remaining.
> xinit: connection to X server lost
> 
> waiting for X server to shut down XIO:  fatal IO error 11 (Die Ressource ist zur Zeit nicht 
> verfügbar) on X server ":0"
>       after 3514 requests (3511 known processed) with 0 events remaining.
> pcmanfm: Fatal IO error 11 (Die Ressource ist zur Zeit nicht verfügbar) on X server :0.

By the way, X does not "restart" (as I first wrote in the bug title) when not using KDM. Apparently the restarting is a feature of KDM. Being just on tty1 with startx it crashes and makes the computer not respond to any input. Accessing by ssh (and rebooting) works.

However, I guess those stderrs are not really useful for you. Please tell me how to investigate better.

Comment 5 Chris Wilson 2014-04-28 06:36:48 UTC

Hmm, if we can't see a reason for the crash in either the log file or stderr, we need to attach gdb. This is easiest with a second machine and sshing in. So far, I haven't spotted anything in either code review or running with valgrind - but I still haven't tried hooking up a VGA monitor to the 855gm.

Comment 6 CarlEitsger 2014-04-28 10:21:58 UTC

(In reply to comment #5)
> Hmm, if we can't see a reason for the crash in either the log file or
> stderr, we need to attach gdb. This is easiest with a second machine and
> sshing in. 

Sure, this is really no problem. I tried it before, but as soon as I attach gdb (by specifying the PID), X's execution seems to stop (and I need it to keep running to trigger the error...). Probably I just need to tell it to "continue" or something, but I did not find anything on a quick read of the man page. If you could tell me how to attach, let X continue running, and then get a backtrace ...  Hmm, I just found http://visualgdb.com/gdbreference/commands/continue which tells me that I just need to issue the command "continue". Does it work this way (we have no breakpoints)?

> but I still haven't tried hooking up a VGA monitor to the 855gm.

Well, I guess it is related to the different monitor sizes, so you can likely only reproduce it with a bigger / smaller external VGA monitor.

Comment 7 Chris Wilson 2014-04-28 10:26:40 UTC

Right, you need to hit continue. So connect gdb, using gdb --pid=`pidof Xorg` (or perhaps pidof X), then enter 'c' when it finishes loading the symbols and then trigger the error. When X dies, gdb will hopefully capture the error and present a command prompt again. Type 'bt'.

Getting a good backtrace typically requires the debugging symbols to be installed.

Another thing to check is whether there is a tell-tale in dmesg for why X died.

Comment 8 CarlEitsger 2014-04-28 13:46:46 UTC

Nothing interesting/new in dmesg. I have no debugging symbols installed, but I think this is enough:

Tested with your newest version 2a993c8. 

> Program received signal SIGSEGV, Segmentation fault.
> __sna_create_cursor (sna=0xb6aec000) at sna_display.c:3112
> 3112		c->alloc = ALIGN(size, 4096);

Also see attachment.

Comment 9 CarlEitsger 2014-04-28 13:47:30 UTC

Created attachment 98129
X crash message in gdb and backtrace

Comment 10 Chris Wilson 2014-04-28 15:01:03 UTC

Ah, that makes sense at least. It should be impossible... Do you mind applying the

index 1520533..854ee55 100644
--- a/src/sna/sna_display.c
+++ b/src/sna/sna_display.c
@@ -87,7 +87,7 @@ union compat_mode_get_connector{
 #define DEFAULT_DPI 96
 #endif
 
-#if 0
+#if 1
 #define __DBG(x) ErrorF x
 #else
 #define __DBG(x)

debugging patch and attaching the tail of the Xorg.0.log? I need to work out why we end up using more cursors than I preallocate.
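
For readers unfamiliar with the idiom in the patch above: `__DBG(x)` expands to `ErrorF x`, so each call site must pass its whole argument list inside an extra pair of parentheses, and flipping `#if 0` to `#if 1` turns every such call into a real logging call. The following is only a minimal sketch of that idiom, not driver code; `log_line` (standing in for the X server's `ErrorF`), `DEBUG_CURSOR`, `DBG`, `lines_logged`, and `show_cursor` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define DEBUG_CURSOR 1  /* hypothetical switch; set to 0 to compile the calls away */

static int lines_logged; /* counts calls so the effect is observable */

/* Hypothetical stand-in for ErrorF: printf-style output to stderr. */
static void log_line(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    lines_logged++;
}

#if DEBUG_CURSOR
#define DBG(x) log_line x  /* DBG(("a=%d\n", a)) becomes log_line ("a=%d\n", a) */
#else
#define DBG(x)             /* expands to nothing; the arguments are never evaluated */
#endif

/* Hypothetical call site: logs which slot is used, then returns it. */
static int show_cursor(int slot)
{
    assert(slot >= 0);
    DBG(("%s: using cursor slot %d\n", __func__, slot));
    return slot;
}
```

The double parentheses are what let a single-parameter macro forward a variable number of arguments without relying on variadic macros.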

Comment 11 CarlEitsger 2014-04-28 19:01:50 UTC

Hmm, the patch does not apply (tried against version 2a993c8aa9e8594c32d5e67329b0dbed0d92c761 and 0b23011c27736d0ae2b33d8ea147c16b909baa57)

> $ patch sna_display.c /tmp/debugpatch
> patching file sna_display.c
> patch unexpectedly ends in middle of line
> Hunk #1 FAILED at 87.
> 1 out of 1 hunk FAILED -- saving rejects to file sna_display.c.rej

But I could apply the one from https://bugs.freedesktop.org/show_bug.cgi?id=77351#c4

> patch unexpectedly ends in middle of line
> Hunk #1 succeeded at 87 with fuzz 1.

Trying this one...

Comment 12 CarlEitsger 2014-04-28 19:48:11 UTC

Created attachment 98137
gdb for comment 12 ...

[   107.955] (II) intel(0): SNA compiled from 2.99.911-113-g0b23011

Comment 13 CarlEitsger 2014-04-28 19:49:46 UTC

Created attachment 98138
part of Xorg log while the debug patch was active - for comment 12

Comment 14 CarlEitsger 2014-04-28 19:53:41 UTC

Likely uninteresting info: this was actually Xorg.1.log, because I ran startx on tty1 while kdm was running on tty7.

Comment 15 Chris Wilson 2014-04-28 20:26:53 UTC

Ugh. It's the cursor sharing avoidance for the pwrite paths.

Comment 16 Chris Wilson 2014-04-29 08:36:52 UTC

commit 94e39323772ef6561efcc0620f67cabd2462a0d0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 29 09:02:50 2014 +0100

    sna: Recycle physical cursors
    
    A side-effect of the workaround for incoherent physical cursors is that
    we never reused a cursor after disabling. As such moving the cursor off
    the pipe and back on would eventually consume all the preallocated
    structs leading to a segfault.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78002
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
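
The failure mode described in this commit message can be illustrated with a small, self-contained sketch. It is not the actual SNA code: `cursor_pool`, `pool_get`, `pool_release`, `count_reentries`, and the pool size of 2 are hypothetical names and values chosen for illustration. It models a fixed set of preallocated slots where a disabled cursor is never handed out again unless recycling is enabled, so repeated off/on transitions eventually make the allocator return NULL, which would segfault if dereferenced unchecked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 2  /* hypothetical; the driver's real preallocation count may differ */

struct cursor {
    bool in_use;   /* currently shown on a pipe */
    bool retired;  /* was disabled; without recycling it is never reused */
};

struct cursor_pool {
    struct cursor slots[POOL_SIZE];
    bool recycle;  /* false models the bug, true models the fix */
};

/* Return a free slot, or NULL once every preallocated slot is consumed. */
static struct cursor *pool_get(struct cursor_pool *p)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        struct cursor *c = &p->slots[i];

        if (c->in_use)
            continue;
        if (c->retired && !p->recycle)
            continue;  /* buggy path: a disabled cursor is skipped forever */
        c->in_use = true;
        c->retired = false;
        return c;
    }
    return NULL;  /* an unchecked dereference of this is the crash */
}

/* Called when the cursor leaves the pipe. */
static void pool_release(struct cursor *c)
{
    assert(c != NULL);
    c->in_use = false;
    c->retired = true;
}

/* Move the cursor off and back on up to n times; count successful re-entries. */
static int count_reentries(bool recycle, int n)
{
    struct cursor_pool p = { .recycle = recycle };
    int ok = 0;

    for (int i = 0; i < n; i++) {
        struct cursor *c = pool_get(&p);

        if (c == NULL)
            break;
        ok++;
        pool_release(c);
    }
    return ok;
}
```

In this sketch, `count_reentries(false, 10)` stops after `POOL_SIZE` re-entries, while `count_reentries(true, 10)` completes all ten because released slots are reused.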

Comment 17 CarlEitsger 2014-04-29 11:46:08 UTC

Thanks! Confirming, seems to be fixed in 

> intel(0): SNA compiled from 2.99.911-114-g94e3932
