Here is a GDB stack trace of Xorg server version 1.18.3, caused by xf86-video-intel version 2.99.917, with the patch from Bug #95140 that allows the X server to start without dumping core. I gathered it by setting the xorg.conf ServerFlags 'Option "NoTrapSignals" "true"' and starting up Xorg with:

$ ulimit -c unlimited
$ Xorg -logverbose 7 :0 vt04 & (sleep 1; xterm -fg white -bg black &)

Then, in the xterm, I started DBUS and attempted to start the window manager (enlightenment) with 'enlightenment_start'. The server gets a SIGSEGV and hangs the machine, but after a hard power-off and restart I find a core file was created. Here is the stack trace and gdb output showing the cause of the problem:

$ echo 't a a bt
up
p rq->bo
' > gdb.cmds
$ gdb -batch -x gdb.cmds $BLD/xserver/hw/xfree86/Xorg core > gdb.stack.trace
$ cat gdb.stack.trace
[New LWP 4411]
[New LWP 4421]
[New LWP 4422]
[New LWP 4423]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
Core was generated by `Xorg -logverbose 7 :0 vt04'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 __kgem_busy (handle=<error reading variable: Cannot access memory at address 0x80>, kgem=0x7f5898063000) at /usr/os_src/xorg/driver/xf86-video-intel/src/sna/kgem.c:620
620 busy.handle = handle;
[Current thread is 1 (Thread 0x7f58980fc8c0 (LWP 4411))]

Thread 4 (Thread 0x7f58903bd700 (LWP 4423)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007f5892780761 in __run__ (arg=0x1617c40) at /usr/os_src/xorg/driver/xf86-video-intel/src/sna/sna_threads.c:70
#2 0x00007f5896112394 in start_thread (arg=0x7f58903bd700) at pthread_create.c:333
#3 0x00007f589640f8ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 3 (Thread 0x7f5890bbe700 (LWP 4422)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007f5892780761 in __run__ (arg=0x1617bd0) at /usr/os_src/xorg/driver/xf86-video-intel/src/sna/sna_threads.c:70
#2 0x00007f5896112394 in start_thread (arg=0x7f5890bbe700) at pthread_create.c:333
#3 0x00007f589640f8ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 2 (Thread 0x7f58913bf700 (LWP 4421)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007f5892780761 in __run__ (arg=0x1617b60) at /usr/os_src/xorg/driver/xf86-video-intel/src/sna/sna_threads.c:70
#2 0x00007f5896112394 in start_thread (arg=0x7f58913bf700) at pthread_create.c:333
#3 0x00007f589640f8ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7f58980fc8c0 (LWP 4411)):
#0 __kgem_busy (handle=<error reading variable: Cannot access memory at address 0x80>, kgem=0x7f5898063000) at /usr/os_src/xorg/driver/xf86-video-intel/src/sna/kgem.c:620
#1 kgem_commit (kgem=0x7f5898063000) at /usr/os_src/xorg/driver/xf86-video-intel/src/sna/kgem.c:2972
#2 _kgem_submit (kgem=kgem@entry=0x7f5898063000) at /usr/os_src/xorg/driver/xf86-video-intel/src/sna/kgem.c:3731
#3 0x00007f5892731bbd in sna_accel_wakeup_handler (sna=sna@entry=0x7f5898063000) at /usr/os_src/xorg/driver/xf86-video-intel/src/sna/sna_accel.c:18096
#4 0x00007f589274c4d4 in sna_wakeup_handler (arg=<optimized out>, result=0, read_mask=0x821a40 <LastSelectMask>) at /usr/os_src/xorg/driver/xf86-video-intel/src/sna/sna_driver.c:773
#5 0x00000000004393ba in WakeupHandler (result=result@entry=0, pReadmask=pReadmask@entry=0x821a40 <LastSelectMask>) at /usr/os_src/xorg/xserver/dix/dixutils.c:426
#6 0x000000000057c837 in WaitForSomething (pClientsReady=pClientsReady@entry=0x19d1840) at /usr/os_src/xorg/xserver/os/WaitFor.c:230
#7 0x000000000043487e in Dispatch () at /usr/os_src/xorg/xserver/dix/dispatch.c:359
#8 0x0000000000438883 in dix_main (argc=5, argv=0x7fff2b5de638, envp=<optimized out>) at /usr/os_src/xorg/xserver/dix/main.c:300
#9 0x00007f5896348710 in __libc_start_main (main=0x424030 <main>, argc=5, argv=0x7fff2b5de638, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff2b5de628) at ../csu/libc-start.c:289
#10 0x0000000000424069 in _start () at ../sysdeps/x86_64/start.S:118
#1 kgem_commit (kgem=0x7f5898063000) at /usr/os_src/xorg/driver/xf86-video-intel/src/sna/kgem.c:2972
2972 __kgem_busy(kgem, rq->bo->handle))
$1 = (struct kgem_bo *) 0x0

The problem here is that 'rq->bo' is NULL, so reading 'rq->bo->handle' accesses an invalid memory address. That causes a segmentation violation: a SIGSEGV signal is sent to the process, which makes it dump core (if ulimits allow). Really, the xf86-video-intel driver should pay more attention to not generating or using invalid memory addresses! The current latest version of it cannot start without a patch to avoid one such access, and cannot start the window manager without another patch to avoid this invalid memory access.
Here is my first guess at a patch to fix:

$ diff -U0 kgem.c~ kgem.c
--- kgem.c~ 2016-04-25 23:32:41.073898879 +0000
+++ kgem.c 2016-04-28 12:20:41.006474178 +0000
@@ -2789 +2789 @@
- kgem->retire(kgem);
+ if(NULL != kgem->retire) kgem->retire(kgem);
@@ -2971 +2971,2 @@
- if (kgem->fence[rq->ring] == NULL &&
+ if ((kgem->fence[rq->ring] == NULL) &&
+ (NULL != rq) && (NULL != rq->bo) && (NULL != rq->bo->handle) &&

(this also shows the patch from Bug #95140 allowing Xorg to start up).

Please could the xf86-video-intel developers pay more attention to not generating invalid memory address accesses.
Your patches are incorrect. Perhaps if you used the once in the source?
(In reply to Chris Wilson from comment #1)
> Your patches are incorrect. Perhaps if you used the once in the source?

Please explain what you mean by this - I cannot determine what this might be.

Patch A allows the Xorg server to start without core dumping:

--- kgem.c~ 2016-04-25 23:32:41.073898879 +0000
+++ kgem.c 2016-04-28 12:20:41.006474178 +0000
@@ -2789 +2789 @@
- kgem->retire(kgem);
+ if(NULL != kgem->retire) kgem->retire(kgem);

Patch B prevents the Xorg server core dumping on window manager initialization:

@@ -2971 +2971,2 @@
+ if ((kgem->fence[rq->ring] == NULL) &&
+ (NULL != rq) && (NULL != rq->bo) && (NULL != rq->bo->handle)

Unfortunately, while it does prevent the X server from core dumping, Patch B does not allow it to proceed - it just hangs. It is rather weird, because at first the X server starts and displays an xterm. Then, when I try to start the window manager, the graphical display disappears, a blank screen is displayed, and the machine ceases to respond to any keystroke (like the VT-switch sequence <CTRL>+<ALT>+<F[N]> to switch to terminal N, or <CTRL>+<ALT>+<DEL> to reboot - none of them have any effect).

At first, this hang was caused by the xf86-video-intel driver coredumping when trying to invoke __kgem_busy(kgem, rq->bo->handle) when rq->bo is NULL, at kgem.c line 2973. Now, with Patch B fixing this, there is no coredump, but the server still hangs the machine, there is no display of anything (text or graphics), and there is no way of stopping the machine except by removing the power cable and battery (the power button does not work either).

Any ideas how to work around this so that I can start a window manager?

Thanks & Regards,
Jason
There are a few other places where the xf86-video-intel code core dumps on access to rq->bo - here are the ones I've found:

$ diff -U0 kgem.c~ kgem.c
--- kgem.c~ 2016-04-25 23:32:41.073898879 +0000
+++ kgem.c 2016-04-28 18:04:21.161610936 +0000
@@ -2734,2 +2734,3 @@
- if (__kgem_busy(kgem, rq->bo->handle))
- break;
+ if( (NULL != rq) && (NULL != rq->bo) )
+ if (__kgem_busy(kgem, rq->bo->handle))
+ break;
@@ -2789 +2790 @@
- kgem->retire(kgem);
+ if(NULL != kgem->retire) kgem->retire(kgem);
@@ -2804 +2805 @@
-
+ if( (NULL != rq) && (NULL != rq->bo) )
@@ -2827 +2828,2 @@
- if (__kgem_busy(kgem, rq->bo->handle)) {
+ if( (NULL != rq) && (NULL != rq->bo) )
+ if (__kgem_busy(kgem, rq->bo->handle)) {
@@ -2832,3 +2834,3 @@
- }
-
- DBG(("%s: ring=%d idle (handle=%d)\n",
+ }
+ if( (NULL != rq) && (NULL != rq->bo) )
+ DBG(("%s: ring=%d idle (handle=%d)\n",
@@ -2970,4 +2972,4 @@
-
- if (kgem->fence[rq->ring] == NULL &&
- __kgem_busy(kgem, rq->bo->handle))
- kgem->fence[rq->ring] = rq;
+ if (kgem->fence[rq->ring] == NULL)
+ if( (NULL != rq) && (NULL != rq->bo) )
+ if( __kgem_busy(kgem, rq->bo->handle) )
+ kgem->fence[rq->ring] = rq;

The X server no longer coredumps; the display just goes black and the machine hangs on window manager start. It also happens with twm. I guess I'll have to wait until I get access to another machine in order to debug the X server with GDB and get to the bottom of the problem.
Aha! I got the window manager working, with the patches applied to xf86-video-intel, by specifying these options in xorg.conf:

Section "Device"
    Identifier "Intel"
    Screen 0
    Driver "intel"
    BusID "PCI:0:2:0"
    Option "Monitor-eDP-1" "eDP-1"
    Option "Monitor-DisplayPort-0" "eDP-1"
    # new options added:
    Option "NoAccel" "true"
    Option "DDC" "false"
    Option "FallbackDebug" "true"
    Option "DebugFlushBatches" "true"
    Option "DebugFlushCaches" "true"
    Option "DebugWait" "true"
EndSection

I'm not sure exactly which option causes it to work - maybe 'NoAccel'? I will have to add each one individually, rebooting the machine each time for the ones that don't work. I will do this some other time, unless the developers can enlighten me as to which option's absence might cause the machine to hang.