Bug 95185 - sna_driver causes X-server to get SIGSEGV
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
 
Reported: 2016-04-28 12:22 UTC by Jason Vas Dias
Modified: 2016-04-28 19:10 UTC

Description Jason Vas Dias 2016-04-28 12:22:58 UTC
Here is a GDB stack trace from Xorg server version 1.18.3 crashing
inside xf86-video-intel version 2.99.917. The driver was built with the
patches from Bug #95140, which allow the X server to start without
dumping core. To capture the trace, I set
'Option "NoTrapSignals" "true"' in the ServerFlags section of xorg.conf
and started Xorg with:
  $ ulimit -c unlimited
  $ Xorg -logverbose 7 :0 vt04 & (sleep 1; xterm -fg white -bg black &)
Then, in the xterm, I started DBUS and tried to start the window
manager (enlightenment) with 'enlightenment_start'. The server gets a
SIGSEGV and hangs the machine, but after a hard poweroff and restart I
find that a core file was created. Here are the stack trace and gdb
output showing the cause of the problem:
  $ echo 't a a bt
up
p rq->bo
'  > gdb.cmds
  $ gdb -batch -x gdb.cmds  $BLD/xserver/hw/xfree86/Xorg  core > gdb.stack.trace
  $ cat gdb.stack.trace
[New LWP 4411]
[New LWP 4421]
[New LWP 4422]
[New LWP 4423]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
Core was generated by `Xorg -logverbose 7 :0 vt04'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __kgem_busy (handle=<error reading variable: Cannot access memory at address 0x80>, kgem=0x7f5898063000) at /usr/os_src/xorg/driver/xf86-video-intel/src/sna/kgem.c:620
620             busy.handle = handle;
[Current thread is 1 (Thread 0x7f58980fc8c0 (LWP 4411))]

Thread 4 (Thread 0x7f58903bd700 (LWP 4423)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f5892780761 in __run__ (arg=0x1617c40) at /usr/os_src/xorg/driver/xf86-video-intel/src/sna/sna_threads.c:70
#2  0x00007f5896112394 in start_thread (arg=0x7f58903bd700) at pthread_create.c:333
#3  0x00007f589640f8ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 3 (Thread 0x7f5890bbe700 (LWP 4422)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f5892780761 in __run__ (arg=0x1617bd0) at /usr/os_src/xorg/driver/xf86-video-intel/src/sna/sna_threads.c:70
#2  0x00007f5896112394 in start_thread (arg=0x7f5890bbe700) at pthread_create.c:333
#3  0x00007f589640f8ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 2 (Thread 0x7f58913bf700 (LWP 4421)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f5892780761 in __run__ (arg=0x1617b60) at /usr/os_src/xorg/driver/xf86-video-intel/src/sna/sna_threads.c:70
#2  0x00007f5896112394 in start_thread (arg=0x7f58913bf700) at pthread_create.c:333
#3  0x00007f589640f8ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7f58980fc8c0 (LWP 4411)):
#0  __kgem_busy (handle=<error reading variable: Cannot access memory at address 0x80>, kgem=0x7f5898063000) at /usr/os_src/xorg/driver/xf86-video-intel/src/sna/kgem.c:620
#1  kgem_commit (kgem=0x7f5898063000) at /usr/os_src/xorg/driver/xf86-video-intel/src/sna/kgem.c:2972
#2  _kgem_submit (kgem=kgem@entry=0x7f5898063000) at /usr/os_src/xorg/driver/xf86-video-intel/src/sna/kgem.c:3731
#3  0x00007f5892731bbd in sna_accel_wakeup_handler (sna=sna@entry=0x7f5898063000) at /usr/os_src/xorg/driver/xf86-video-intel/src/sna/sna_accel.c:18096
#4  0x00007f589274c4d4 in sna_wakeup_handler (arg=<optimized out>, result=0, read_mask=0x821a40 <LastSelectMask>) at /usr/os_src/xorg/driver/xf86-video-intel/src/sna/sna_driver.c:773
#5  0x00000000004393ba in WakeupHandler (result=result@entry=0, pReadmask=pReadmask@entry=0x821a40 <LastSelectMask>) at /usr/os_src/xorg/xserver/dix/dixutils.c:426
#6  0x000000000057c837 in WaitForSomething (pClientsReady=pClientsReady@entry=0x19d1840) at /usr/os_src/xorg/xserver/os/WaitFor.c:230
#7  0x000000000043487e in Dispatch () at /usr/os_src/xorg/xserver/dix/dispatch.c:359
#8  0x0000000000438883 in dix_main (argc=5, argv=0x7fff2b5de638, envp=<optimized out>) at /usr/os_src/xorg/xserver/dix/main.c:300
#9  0x00007f5896348710 in __libc_start_main (main=0x424030 <main>, argc=5, argv=0x7fff2b5de638, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff2b5de628) at ../csu/libc-start.c:289
#10 0x0000000000424069 in _start () at ../sysdeps/x86_64/start.S:118
#1  kgem_commit (kgem=0x7f5898063000) at /usr/os_src/xorg/driver/xf86-video-intel/src/sna/kgem.c:2972
2972                        __kgem_busy(kgem, rq->bo->handle))
$1 = (struct kgem_bo *) 0x0


The problem here is that 'rq->bo' is NULL, so reading 'rq->bo->handle'
dereferences a NULL pointer. Access to an invalid memory address causes
a segmentation violation: a SIGSEGV signal is sent to the process, which
then dumps core (if ulimits allow).
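
The trace reports the fault at address 0x80 rather than 0, which is consistent with a member being read through a NULL struct pointer: the address touched is simply the member's offset. A minimal sketch of that arithmetic follows. It uses a hypothetical `struct demo_bo` whose padding is invented so that `handle` lands at offset 0x80; the real `struct kgem_bo` layout is not shown in this report and may well differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kgem_bo. The padding is invented so
 * that 'handle' sits at offset 0x80; the real layout may differ. */
struct demo_bo {
	char other_fields[0x80];
	uint32_t handle;
};

/* Compile-time check that the invented layout matches the comment. */
static_assert(offsetof(struct demo_bo, handle) == 0x80,
	      "demo layout places handle at 0x80");

/* Address that 'bo->handle' would read, computed without dereferencing.
 * For bo == NULL this is just the member offset. */
static uintptr_t handle_address(const struct demo_bo *bo)
{
	return (uintptr_t)bo + offsetof(struct demo_bo, handle);
}
```

Calling `handle_address(NULL)` gives the offset of `handle`, which is the kind of small non-zero address gdb shows in frame #0.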

Really, the xf86-video-intel driver should pay more attention to not generating
or using invalid memory addresses! The current latest version of it cannot
start without a patch to avoid one such access, and cannot start the
window manager without another patch to avoid this one.

Here is my first guess at a patch to fix:

$ diff -U0 kgem.c~ kgem.c
--- kgem.c~     2016-04-25 23:32:41.073898879 +0000
+++ kgem.c      2016-04-28 12:20:41.006474178 +0000
@@ -2789 +2789 @@
-       kgem->retire(kgem);
+        if(NULL != kgem->retire) kgem->retire(kgem);
@@ -2971 +2971,2 @@
-               if (kgem->fence[rq->ring] == NULL &&
+               if ((kgem->fence[rq->ring] == NULL) &&
+                   (NULL != rq) && (NULL != rq->bo) && (NULL != rq->bo->handle) &&

(this also shows the patch from Bug #95140 allowing Xorg to start up).
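
A note on guards of this kind: `&&` evaluates left to right and stops at the first false operand, so a NULL test only protects the dereferences written after it. In the second hunk, `rq->ring` is read before `(NULL != rq)` is tested. The sketch below shows the ordering with hypothetical stand-in types (`demo_bo`, `demo_request`) and a placeholder `demo_busy()`; none of these are the driver's real definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal hypothetical stand-ins; not the driver's real structures. */
struct demo_bo { unsigned int handle; };
struct demo_request { struct demo_bo *bo; int ring; };

/* Placeholder for a busy query such as __kgem_busy(); here any
 * nonzero handle is treated as busy, purely for illustration. */
static bool demo_busy(unsigned int handle)
{
	return handle != 0;
}

/* Test each pointer before the first dereference that depends on it. */
static bool request_needs_fence(const struct demo_request *rq,
				struct demo_request *const fence[],
				size_t nfence)
{
	return rq != NULL &&
	       rq->ring >= 0 && (size_t)rq->ring < nfence &&
	       fence[rq->ring] == NULL &&
	       rq->bo != NULL &&
	       demo_busy(rq->bo->handle);
}
```

A guard like this only avoids the crash. It does not explain why a request reaches this point without a buffer object, so the underlying cause may still need to be found.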

Please could the xf86-video-intel developers pay more attention to 
not generating invalid memory address accesses.
Comment 1 Chris Wilson 2016-04-28 12:31:30 UTC
Your patches are incorrect. Perhaps if you used the once in the source?
Comment 2 Jason Vas Dias 2016-04-28 13:34:52 UTC
(In reply to Chris Wilson from comment #1)
> Your patches are incorrect. Perhaps if you used the once in the source?

Please explain what you mean by this - I cannot determine what this might be.

Patch A allows the Xorg server to start without core dumping:
--- kgem.c~     2016-04-25 23:32:41.073898879 +0000
+++ kgem.c      2016-04-28 12:20:41.006474178 +0000
@@ -2789 +2789 @@
-       kgem->retire(kgem);
+        if(NULL != kgem->retire) kgem->retire(kgem);

Patch B prevents the Xorg server core dumping on window manager initialization:
@@ -2971 +2971,2 @@
+               if ((kgem->fence[rq->ring] == NULL) &&
+                   (NULL != rq) && (NULL != rq->bo) && (NULL != rq->bo->handle) 

Unfortunately, while Patch B does prevent the X server from core dumping,
it does not allow it to proceed: it just hangs. It is rather weird,
because at first the X server starts and displays an xterm; then, when
I try to start the window manager, the graphical display disappears,
a blank screen is displayed, and the machine ceases to respond to any
keystroke (neither the VT-switch sequence <CTRL>+<ALT>+<F[N]>
to switch to terminal N, nor <CTRL>+<ALT>+<DEL> to reboot, has any
effect).

At first, this hang was caused by the xf86-video-intel driver dumping core
when trying to invoke __kgem_busy(kgem, rq->bo->handle) when rq->bo is NULL,
at kgem.c line 2972. Now, with Patch B fixing this, there is no core dump,
but the server still hangs the machine: nothing is displayed (text or
graphics), and there is no way of stopping the machine except by removing
the power cable and battery (the power button does not work either).

Any ideas how to work around this so that I can start a window manager?
Thanks & Regards, Jason
Comment 3 Jason Vas Dias 2016-04-28 18:34:53 UTC
There are a few other places where the xf86-video-intel code 
core dumps on access to rq->bo - here are the ones I've found:

$ diff -U0 kgem.c~ kgem.c
--- kgem.c~     2016-04-25 23:32:41.073898879 +0000
+++ kgem.c      2016-04-28 18:04:21.161610936 +0000
@@ -2734,2 +2734,3 @@
-               if (__kgem_busy(kgem, rq->bo->handle))
-                       break;
+               if( (NULL != rq) && (NULL != rq->bo) ) 
+                 if (__kgem_busy(kgem, rq->bo->handle))
+                   break;
@@ -2789 +2790 @@
-       kgem->retire(kgem);
+        if(NULL != kgem->retire) kgem->retire(kgem);
@@ -2804 +2805 @@
-
+               if( (NULL != rq) && (NULL != rq->bo) )
@@ -2827 +2828,2 @@
-       if (__kgem_busy(kgem, rq->bo->handle)) {
+       if( (NULL != rq) && (NULL != rq->bo) )
+         if (__kgem_busy(kgem, rq->bo->handle)) {
@@ -2832,3 +2834,3 @@
-       }
-
-       DBG(("%s: ring=%d idle (handle=%d)\n",
+         }
+       if( (NULL != rq) && (NULL != rq->bo) )
+         DBG(("%s: ring=%d idle (handle=%d)\n",
@@ -2970,4 +2972,4 @@
-
-               if (kgem->fence[rq->ring] == NULL &&
-                   __kgem_busy(kgem, rq->bo->handle))
-                       kgem->fence[rq->ring] = rq;
+               if (kgem->fence[rq->ring] == NULL)                
+                  if( (NULL != rq) && (NULL != rq->bo) )
+                    if( __kgem_busy(kgem, rq->bo->handle) )
+                      kgem->fence[rq->ring] = rq;


The X server no longer dumps core, but the display just goes black and
the machine hangs on window manager start. This also happens with twm.

I guess I'll have to wait until I get access to another machine in order
to debug Xserver with GDB & get to the bottom of the problem.
Comment 4 Jason Vas Dias 2016-04-28 19:10:11 UTC
Aha! I got the window manager working, with the patches applied to 
xf86-video-intel, by specifying these Options in xorg.conf :

Section "Device"
        Identifier  "Intel"
        Screen      0
        Driver      "intel"
        BusID       "PCI:0:2:0"
        Option      "Monitor-eDP-1" "eDP-1"
        Option      "Monitor-DisplayPort-0" "eDP-1"
# new options added: 
        Option      "NoAccel" "true"
        Option      "DDC"     "false"
        Option      "FallbackDebug" "true"
        Option      "DebugFlushBatches" "true"
        Option      "DebugFlushCaches" "true"
        Option      "DebugWait" "true"
EndSection


I'm not sure exactly which option makes it work; maybe 'NoAccel'?
I will have to add each one individually, rebooting the machine each time
for the ones that don't work. I will do this some other time, unless the
developers can enlighten me as to which missing option causes the
machine to hang.

