Bug 20516 - segfault on server regen
Status: RESOLVED FIXED
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Importance: high critical
Assigned To: Keith Packard
QA Contact: Xorg Project Team
Duplicates: 20867 21188 21476 22836 22941 23124 24229
Depends on:
Blocks:
Reported: 2009-03-06 15:34 UTC by Fatih Aşıcı
Modified: 2009-09-30 12:43 UTC (History)
CC List: 12 users


Attachments
gdb backtrace (1.40 KB, text/plain)
2009-03-06 15:34 UTC, Fatih Aşıcı
dmesg output (35.67 KB, text/plain)
2009-03-06 15:35 UTC, Fatih Aşıcı
Xorg log (27.79 KB, text/plain)
2009-03-06 15:36 UTC, Fatih Aşıcı
Xorg log with EXA (67.16 KB, text/plain)
2009-03-07 08:18 UTC, Fatih Aşıcı
gdb backtrace taken with EXA (962 bytes, text/plain)
2009-03-12 17:20 UTC, Fatih Aşıcı
Another Xorg.log (82.96 KB, text/plain)
2009-08-03 07:30 UTC, Martin Orr
Patch to fix the Sig11 because of missing bufmgr (441 bytes, patch)
2009-08-09 12:05 UTC, Torsten Kaiser
Xorg.0.log w/ backtrace from a Gentoo system (20.23 KB, text/plain)
2009-09-01 04:23 UTC, Toralf Förster

Description Fatih Aşıcı 2009-03-06 15:34:58 UTC
Created attachment 23598 [details]
gdb backtrace

Every time I log out, the server crashes.

GM965
xorg-server: 1.6.0
xf86-video-intel: master c3a747cb54acc1b037b559313e6a2113ae2ac4c7
libdrm: master 391c92ae1799f0d1fddb2321c5713afc58575514
kernel 2.6.29-rc7 + 5ad8b7d12605e88d1e532061699102797fdefe08
Comment 1 Fatih Aşıcı 2009-03-06 15:35:42 UTC
Created attachment 23599 [details]
dmesg output
Comment 2 Fatih Aşıcı 2009-03-06 15:36:31 UTC
Created attachment 23600 [details]
Xorg log
Comment 3 Gordon Jin 2009-03-06 20:07:32 UTC
Does EXA work?
Comment 4 Fatih Aşıcı 2009-03-07 08:18:23 UTC
Created attachment 23634 [details]
Xorg log with EXA

(In reply to comment #3)
> Does EXA work?
> 

No, it doesn't, but the failure looks different. The server resets and the mouse cursor appears, but the server crashes again before the kdm screen comes up.
Comment 5 Fatih Aşıcı 2009-03-12 17:20:06 UTC
Created attachment 23817 [details]
gdb backtrace taken with EXA
Comment 6 Jesse Barnes 2009-05-11 11:21:45 UTC
Adjusting severity: crashes & hangs should be marked critical.
Comment 7 Eric Anholt 2009-07-01 00:25:10 UTC
*** Bug 21476 has been marked as a duplicate of this bug. ***
Comment 8 Eric Anholt 2009-07-01 00:26:48 UTC
*** Bug 20867 has been marked as a duplicate of this bug. ***
Comment 9 Eric Anholt 2009-07-15 15:18:44 UTC
*** Bug 21188 has been marked as a duplicate of this bug. ***
Comment 10 Gordon Jin 2009-07-18 19:56:01 UTC
*** Bug 22836 has been marked as a duplicate of this bug. ***
Comment 11 Martin Orr 2009-08-03 07:30:06 UTC
Created attachment 28304 [details]
Another Xorg.log

I also get segfaults on server regeneration with the same backtrace; in particular, the crash is due to a NULL bufmgr.

I'm using:
GM965
xf86-video-intel: 50e2a6734de43a135aa91cd6e6fb5147e15ce315
kernel: 2.6.30 (with UMS)

The place where i830_init_bufmgr should be called is in i830_allocator_init after DRM_IOCTL_I915_GEM_INIT, but it isn't because the ioctl fails.  The clue here is "(EE) intel(0): Failed to initialize kernel memory manager" in the log.
(I'm not absolutely sure that I'm seeing the same bug here, because neither this bug nor any of the duplicates has that message in its log.)

Poking around in the kernel, this ioctl fails because it is called without the fd being master.
drmDropMaster was called in CloseScreen, via LeaveVT, but the matching call to EnterVT in ScreenInit comes long after i830_memory_init.

So I think it is necessary to call drmSetMaster somewhere in ScreenInit before calling i830_memory_init.
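
The ordering problem described in this comment can be illustrated with a toy model. This is not driver code: `FakeDrmFd`, `memory_init`, `close_screen` and the two `screen_init_*` helpers are hypothetical stand-ins, and the model assumes only what the comment states (the GEM init ioctl fails unless the fd is master, and master is regained in EnterVT after memory init). That the first generation starts out as master is also an assumption.

```python
class FakeDrmFd:
    """Hypothetical stand-in for the DRM file descriptor (not a libdrm API)."""

    def __init__(self):
        self.is_master = True   # assumption: the first generation starts as master

    def set_master(self):       # models drmSetMaster()
        self.is_master = True

    def drop_master(self):      # models drmDropMaster()
        self.is_master = False

    def gem_init(self):
        # Models DRM_IOCTL_I915_GEM_INIT: refused unless the fd is master.
        return self.is_master


def memory_init(fd):
    """Models i830_memory_init: a bufmgr only exists if GEM init succeeded."""
    return "bufmgr" if fd.gem_init() else None


def close_screen(fd):
    fd.drop_master()            # CloseScreen -> LeaveVT -> drmDropMaster


def screen_init_current(fd):
    bufmgr = memory_init(fd)    # runs before EnterVT regains master
    fd.set_master()             # EnterVT, too late for memory_init
    return bufmgr


def screen_init_proposed(fd):
    fd.set_master()             # proposed: become master first
    return memory_init(fd)
```

In this model the first `screen_init_current` call yields a bufmgr, a call after `close_screen` yields `None`, and `screen_init_proposed` yields a bufmgr again after a further `close_screen`.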
Comment 12 Torsten Kaiser 2009-08-09 12:04:45 UTC
I have the same bug and it's a total blocker for me.
Without a fix I can't upgrade to 2.6.30.

Common setup for all my cases:
Chipset: 965GM
Driver: 2.8.0
Server: 1.6.3
Libdrm: 2.4.12

The error happens when I log into KDE via kdm and then end the session. Instead of showing the kdm login dialog again, the system becomes unusable.
This is with KDE 3.5.10, with no compiz, only kwin, so it should not matter that my mesa is still at 7.3.

What works is 2.6.29 without KMS.
2.6.29 with KMS, 2.6.30 with and without KMS, and 2.6.30-rc5 with and without KMS are all broken.
In the non-KMS case, the display turns black after the logout and I can't do anything locally anymore. Neither a VT switch nor a server zap gets any reaction, but the rest of the system is still running (I can log in via ssh, and it turns off via the power button).
In the KMS case the display also turns black, but I can see a cursor in the top left corner. Otherwise the system is in the same state: ssh and the power button work, but keyboard and screen seem dead.

In all 5 failing cases the Xorg.0.log shows a call chain AddScreen -> unknown address in intel driver -> i830_allocate_2d_memory -> i830_allocate_memory -> drm_intel_bo_alloc.
Looking at drm_intel_bo_alloc, the only thing that could cause a Sig11 there seems to be the supplied bufmgr being NULL.

I think the following is happening:
When the KDE session is ended, I830CloseScreen is called; that destroys the bufmgr and sets bufmgr to NULL.
When the server wants to reopen the screen, it calls:
I830ScreenInit
\-i830_memory_init
  \-i830_allocator_init -> that only inits bufmgr in the !use_drm_mode case
  \-i830_try_memory_allocation
    \-i830_allocate_2d_memory
      \-i830_allocate_memory
        \-drm_intel_bo_alloc -> but bufmgr is still NULL -> Sig11

The first startup seems to work, because I830DrmModeInit always sets up the bufmgr, but this is not called again on server regeneration.
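
The bufmgr lifetime described in this comment can be sketched as a toy model. This is not the driver's code: the `Driver` class and `regenerate` helper are hypothetical, and only the KMS (`use_drm_mode`) path is meant to be representative. The real Sig11 is modelled as a Python exception.

```python
class Driver:
    """Toy model of the bufmgr lifetime described above (not real driver code)."""

    def __init__(self, use_drm_mode):
        self.use_drm_mode = use_drm_mode
        self.bufmgr = None

    def drm_mode_init(self):
        # Models I830DrmModeInit: runs once, at first startup only.
        self.bufmgr = object()

    def allocator_init(self):
        # Models i830_allocator_init: creates a bufmgr only without KMS.
        if not self.use_drm_mode:
            self.bufmgr = object()

    def bo_alloc(self):
        # Models drm_intel_bo_alloc dereferencing the bufmgr.
        if self.bufmgr is None:
            raise RuntimeError("NULL bufmgr dereferenced (Sig11 in the real server)")
        return "bo"

    def screen_init(self):
        # Models I830ScreenInit -> i830_memory_init -> ... -> drm_intel_bo_alloc.
        self.allocator_init()
        return self.bo_alloc()

    def close_screen(self):
        # Models I830CloseScreen destroying the bufmgr and clearing the pointer.
        self.bufmgr = None


def regenerate(driver):
    """One server regeneration: close the screen, then initialize it again."""
    driver.close_screen()
    return driver.screen_init()
```

With `Driver(use_drm_mode=True)`, calling `drm_mode_init()` and then `screen_init()` succeeds, while a following `regenerate()` raises, which mirrors the first start working and the regeneration crashing.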

I'm attaching the patch that I'm currently using. It fixes the Sig11 and the X server no longer dies, but instead of the kdm dialog the screen stays black. The keyboard still works, though, so I can switch to a console VT or zap the server with Ctrl-Alt-Backspace and get a new kdm dialog.
I don't know what other initializations might be missing, but at least it no longer "bricks" my notebook.

From the Xorg.0.log during first startup:
(II) intel(0): [DRI2] Setup complete
(**) intel(0): Framebuffer compression disabled
(**) intel(0): Tiling enabled
(**) intel(0): SwapBuffers wait enabled
(==) intel(0): VideoRam: 262144 KB
(II) intel(0): Attempting memory allocation with tiled buffers.
(II) intel(0): Tiled allocation successful.
(II) UXA(0): Driver registered support for the following operations:
(II)         solid
(II)         copy
(II)         composite (RENDER acceleration)
(==) intel(0): Backing store disabled
(==) intel(0): Silken mouse enabled
(II) intel(0): Initializing HW Cursor
(II) intel(0): No memory allocations
(II) intel(0): RandR 1.2 enabled, ignore the following RandR disabled message.
(II) intel(0): DPMS enabled
(==) intel(0): Intel XvMC decoder disabled
(II) intel(0): Set up textured video
(II) intel(0): direct rendering: DRI2 Enabled
(WW) intel(0): Option "AccelMethod" is not used
(WW) intel(0): Option "PreferredMode" is not used
(--) RandR disabled

For the regeneration:
(II) intel(0): [DRI2] Setup complete
(**) intel(0): Framebuffer compression disabled
(**) intel(0): Tiling enabled
(**) intel(0): SwapBuffers wait enabled
(EE) intel(0): BUFMGR is missing!
(==) intel(0): VideoRam: 262144 KB
(II) intel(0): Attempting memory allocation with tiled buffers.
(II) intel(0): Tiled allocation successful.
(II) UXA(0): Driver registered support for the following operations:
(II)         solid
(II)         copy
(II)         composite (RENDER acceleration)
(II) intel(0): Initializing HW Cursor
(II) intel(0): No memory allocations
(II) intel(0): RandR 1.2 enabled, ignore the following RandR disabled message.
(II) intel(0): DPMS enabled
(==) intel(0): Intel XvMC decoder disabled
(II) intel(0): Set up textured video
(II) intel(0): direct rendering: DRI2 Enabled
(--) RandR disabled

The lines "(==) intel(0): Backing store disabled" and "(==) intel(0): Silken mouse enabled" seem to be missing, but I don't see what might cause this.
I also do not see any errors in the kernel-syslog or Xorg.0.log when the screen stays black.
Comment 13 Torsten Kaiser 2009-08-09 12:05:52 UTC
Created attachment 28456 [details] [review]
Patch to fix the Sig11 because of missing bufmgr
Comment 14 Wang Zhenyu 2009-08-09 20:29:03 UTC
*** Bug 23124 has been marked as a duplicate of this bug. ***
Comment 15 Torsten Kaiser 2009-08-28 02:14:38 UTC
Still broken the same way with xf86-video-intel-2.8.1.
(Using kernel 2.6.31-rc8, xserver 1.6.3, libdrm 2.4.12)

The patch still applies and has the same result: the X server no longer crashes and can be killed with Ctrl+Alt+Backspace. Without the patch, display and keyboard are dead.

Ugly workaround: Adding TerminateServer=true to kdmrc

If kdm restarts the server instead of only doing a regeneration, the crash does not happen, because the server then does the complete initialization like on the first start.
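
For reference, a sketch of how the TerminateServer workaround might look in kdmrc. The section name `[X-*-Core]` is an assumption based on the usual kdmrc layout; the file location and the section that applies to a given display vary by distribution and setup.

```ini
# kdmrc fragment (section name assumed; adjust to the local setup)
[X-*-Core]
# Restart the X server after each session instead of only regenerating it
TerminateServer=true
```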
Comment 16 Toralf Förster 2009-08-29 05:29:00 UTC
(In reply to comment #13)
> Created an attachment (id=28456) [details]
> Patch to fix the Sig11 because of missing bufmgr
This patch didn't solve the issue for my Gentoo system (vanilla 2.6.30.5, xorg 1.6.3, intel 2.8.1, https://bugs.gentoo.org/show_bug.cgi?id=278473).
Comment 17 Torsten Kaiser 2009-08-29 10:33:42 UTC
Toralf, I'm also using a Gentoo system and saw the Gentoo Bug 278473 also, but I think that is something else. That looks more like a kdm bug, because one reporter was using the proprietary nvidia driver, so a bug in xf86-video-intel can't be the reason for that misbehavior.

Also, only one of the three Xorg.0.log files attached to the Gentoo bug has the backtrace that I am seeing (the same one that is in Martin Orr's Xorg.log attached to this bug).

As to my patch not helping in your case:
Did you see a backtrace like the one at the end of Martin's Xorg.log before applying my patch?
If not, it might be the Gentoo-KDM-Bug and not this issue of the intel driver.

If you did have this backtrace you could do two things:
a) Look for a line like "(EE) intel(0): BUFMGR is missing!".
 * If this is not in your Xorg.log, then the changes from my patch were never used. That might suggest there is another issue causing your kdm problems.
 * If it is there, then please note that my patch fixes the "Sig11", but not the server regen. What the patch gives me is that, instead of a dead X server (dead keyboard and display), I only have to deal with a black screen (hung intel driver or hung kdm? I don't know; there are no error messages in the syslog, kdm.log or Xorg.log) that I can zap with Ctrl+Alt+Backspace to get a working kdm back.

b) Try my "TerminateServer=true"-kdmrc-Workaround.
 * If this fixes your problem, it points to this X-Org-Bug about the server regeneration.
 * If this does not help, the server regeneration is not the cause of your problem. That would point more in the direction of the Gentoo-Kdm-Bug.
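
Check a) above can be scripted. The following is a minimal sketch, assuming the log text has already been read into a string; `MARKERS` and `scan_xorg_log` are hypothetical names, and the two marker strings are the ones quoted in this bug. The "BUFMGR is missing!" line can only appear when the patch from comment 13 is applied.

```python
# Log lines quoted in this bug report, keyed by what they indicate.
MARKERS = {
    "patched_path_hit": "(EE) intel(0): BUFMGR is missing!",
    "gem_init_failed": "(EE) intel(0): Failed to initialize kernel memory manager",
}


def scan_xorg_log(text):
    """Return the names of the markers that occur in the given log text."""
    return {name for name, needle in MARKERS.items() if needle in text}
```

A typical call would pass in the contents of the server log, for example the text of /var/log/Xorg.0.log (the path is an assumption and differs between systems).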
Comment 18 Toralf Förster 2009-09-01 04:23:58 UTC
Created attachment 29063 [details]
Xorg.0.log w/ backtrace from a Gentoo system

Hello,
I attached the backtrace for my system (T400, xorg 1.6.3, intel 2.8.1, i915 kernel module of 2.6.30.5, no KMS). The "ugly" workaround works for me.
BTW, today I tested 2.6.31-rc8 + KMS. The issue differs a little: the mouse pointer is now visible after logout, but the rest of the screen is empty (black). Ctrl-Alt-Backspace works and gives a new kdm login.
Comment 19 Keith Packard 2009-09-17 18:20:14 UTC
Fixed in commit 33f98e4056706f4c30bb4327677ac49e82058231 by leaving GEM running across server reset.
Comment 20 Eric Anholt 2009-09-24 19:42:56 UTC
*** Bug 22941 has been marked as a duplicate of this bug. ***
Comment 21 Julien Cristau 2009-09-30 12:43:09 UTC
*** Bug 24229 has been marked as a duplicate of this bug. ***