| Summary: | X crashes on startup with KMS (Alpha architecture) | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | DRI | Reporter: | Michael Cree <mcree> | ||||||||||
| Component: | DRM/Radeon | Assignee: | Default DRI bug account <dri-devel> | ||||||||||
| Status: | RESOLVED MOVED | QA Contact: | |||||||||||
| Severity: | major | ||||||||||||
| Priority: | medium | CC: | hramrach, mattst88, mcree | ||||||||||
| Version: | unspecified | ||||||||||||
| Hardware: | Alpha | ||||||||||||
| OS: | Linux (All) | ||||||||||||
| Created attachment 33020 [details]
Kernel dmesg showing KMS/drm setup and backtrace at kernel warning.

Given the kernel warning and oops, this could be an issue in the kernel rather than the X server... (The white square is probably the hardware cursor left enabled due to the X server crash)

Yes, the 1"x1" square is almost definitely the hardware cursor. I see a similar thing in bug 23227.

I have had a go at testing KMS and the Xserver again with newer kernel (2.6.34), newer Xserver (1.8.1), and also a more recent radeon card (HD4350). The kernel console comes up fine in KMS but the Xserver crashes on start up. A corrupted patterned screen first appears then the monitor blanks as the signal is lost. The Xserver log doesn't appear to reveal anything untoward, but ominous messages appear in dmesg like so:
[ 478.147216] [drm:radeon_fence_wait] *ERROR* fence(fffffc00227e9680:0x00000004) 505ms timeout going to reset GPU
[ 478.147216] radeon 0000:01:00.0: GPU softreset
[ 478.147216] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xA27034A4
[ 478.147216] radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0x00000102
[ 478.147216] radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0x200000C0
[ 478.147216] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
[ 478.147216] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
[ 478.147216] radeon 0000:01:00.0: R_000E60_SRBM_SOFT_RESET=0x00000402
[ 478.207762] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0x00003028
[ 478.207762] radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0x00000002
[ 478.207762] radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0x200000C0
Full Xorg log and dmesg will be attached.

Created attachment 35983 [details]
kernel dmesg showing drm/radeon errors on Xserver startup.

Created attachment 35984 [details]
Xserver log

Further testing with kernel 2.6.35-rc3, Xserver 1.8.1.901 (1.8.2 RC 1), and Radeon driver 6.13.99 with the HD4350 Radeon card.
I've reallocated this bug to driver/radeon since that seems more appropriate.
I see a similar crash on starting X with KMS as described before for earlier kernels.
While starting Xorg under gdb I observe a Bus error as follows:
Program received signal SIGBUS, Bus error.
0x00000200003a547c in ?? () from /lib/libc.so.6.1
(gdb) bt
#0  0x00000200003a547c in ?? () from /lib/libc.so.6.1
#1  0x00000200003a53e4 in memcpy () from /lib/libc.so.6.1
#2  0x00000200008c8f94 in drmmode_load_cursor_argb (crtc=0x12026e510, 
    image=0x120289fc0) at drmmode_display.c:392
#3  0x000000012018d714 in xf86_crtc_convert_cursor_to_argb (crtc=0x12026e510, 
    src=<value optimized out>) at xf86Cursors.c:218
#4  0x000000012018e514 in xf86_load_cursor_image (scrn=<value optimized out>, 
    src=0x1204b5160 "") at xf86Cursors.c:452
#5  0x0000000120191f94 in xf86SetCursor (pScreen=0x120276430, 
    pCurs=0x120291850, x=840, y=525) at xf86HWCurs.c:148
#6  0x000000012018f148 in xf86CursorSetCursor (pDev=0x120463d20, 
    pScreen=0x120276430, pCurs=0x120291850, x=840, y=525) at xf86Cursor.c:353
#7  0x00000001200637bc in miPointerUpdateSprite (pDev=0x120463d20)
    at mipointer.c:402
#8  0x0000000120064280 in miPointerDisplayCursor (pDev=0x120463d20, 
    pScreen=0x120276430, pCursor=0x120291850) at mipointer.c:197
#9  0x00000001200c4584 in CursorDisplayCursor (pDev=0x120463d20, 
    pScreen=0x120276430, pCursor=0x1202b8990) at cursor.c:155
#10 0x0000000120169604 in AnimCurDisplayCursor (pDev=0x120463d20, 
    pScreen=0x120276430, pCursor=0x1202b8990) at animcur.c:247
#11 0x000000012003c80c in UpdateSpriteForScreen (pDev=0x120463d20, 
    pScreen=0x120276430) at events.c:3098
#12 0x0000000120063c30 in miPointerWarpCursor (pDev=0x120463d20, 
    pScreen=0x120276430, x=840, y=525) at mipointer.c:343
#13 0x000000012013b174 in xf86WarpCursor (pDev=<value optimized out>, 
    pScreen=0x120276430, x=<value optimized out>, y=<value optimized out>)
    at xf86Cursor.c:473
#14 0x0000000120063940 in miPointerSetCursorPosition (pDev=0x120463d20, 
    pScreen=0x120276430, x=<value optimized out>, y=<value optimized out>, 
    generateEvent=0) at mipointer.c:239
#15 0x0000000120168df4 in AnimCurSetCursorPosition (pDev=0x120463d20, 
    pScreen=0x120276430, x=<value optimized out>, y=<value optimized out>, 
    generateEvent=<value optimized out>) at animcur.c:266
#16 0x000000012003ca98 in InitializeSprite (pDev=0x120463d20, pWin=0x1202b77b0)
    at events.c:3025
#17 0x00000001200473a8 in EnableDevice (dev=0x120463d20, sendevent=1 '\001')
    at devices.c:299
#18 0x0000000120047d84 in InitCoreDevices () at devices.c:610
#19 0x0000000120024d10 in main (argc=539181728, argv=0x1, 
    envp=<value optimized out>) at main.c:257
Shifting the stack context up to the routine drmmode_load_cursor_argb() and examining variables I see:
383	static void
384	drmmode_load_cursor_argb (xf86CrtcPtr crtc, CARD32 *image)
385	{
386		drmmode_crtc_private_ptr drmmode_crtc = crtc->driver_private;
387		void *ptr;
388	
389		/* cursor should be mapped already */
390		ptr = drmmode_crtc->cursor_bo->ptr;
391	
392		memcpy (ptr, image, 64 * 64 * 4);
393	
394		return;
(gdb) print *drmmode_crtc
$7 = {drmmode = 0x12026dc50, mode_crtc = 0x12026e2d0, hw_id = 0, 
  cursor_bo = 0x12026c7e0, rotate_bo = 0x0, rotate_fb_id = 0, lut_r = {
    0 <repeats 256 times>}, lut_g = {0 <repeats 256 times>}, lut_b = {
    0 <repeats 256 times>}}
(gdb) print *drmmode_crtc->cursor_bo
$8 = {ptr = 0x200009b8000, flags = 0, handle = 1, size = 16384}
(gdb) print *(long *)drmmode_crtc->cursor_bo->ptr
Cannot access memory at address 0x200009b8000
Shouldn't I be able to access the memory location (0x200009b8000) to which the image at 0x120289fc0 is to be copied by the memcpy command?
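As background to that question: a pointer returned by a successful mmap() is not guaranteed to stay readable. If the kernel cannot supply a page when the address is touched, the access raises SIGBUS even though the pointer looks valid. The sketch below is only a userspace illustration of that behaviour on Linux, using an ordinary temporary file that is truncated after mapping. It is not the radeon/TTM mapping path, and the function names (readable_without_sigbus, demo_truncated_mapping) are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_jmp;

/* SIGBUS handler: jump back to the probe instead of dying. */
static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(fault_jmp, 1);
}

/* Hypothetical helper: returns 1 if one byte at addr can be read,
 * 0 if the read raises SIGBUS, -1 if the handler could not be installed. */
static int readable_without_sigbus(const volatile unsigned char *addr)
{
    struct sigaction sa, old;
    int ok;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) != 0)
        return -1;

    if (sigsetjmp(fault_jmp, 1) == 0) {
        unsigned char c = *addr;   /* faults if the page cannot be backed */
        (void)c;
        ok = 1;
    } else {
        ok = 0;                    /* reached via siglongjmp from the handler */
    }
    sigaction(SIGBUS, &old, NULL); /* restore the previous disposition */
    return ok;
}

/* Hypothetical demo: map one page of a temp file, then shrink the file
 * to zero bytes. Returns 1 if the page was readable before the truncation
 * and raised SIGBUS afterwards, 0 otherwise, -1 on setup failure. */
int demo_truncated_mapping(void)
{
    char path[] = "/tmp/sigbus-demo-XXXXXX";
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *p;
    int fd, before, after;

    assert(page > 0);
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                  /* file disappears once fd is closed */
    if (ftruncate(fd, page) != 0) {
        close(fd);
        return -1;
    }
    p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    before = readable_without_sigbus(p);
    if (ftruncate(fd, 0) != 0) {   /* mapping stays, backing store is gone */
        munmap(p, (size_t)page);
        close(fd);
        return -1;
    }
    after = readable_without_sigbus(p);

    munmap(p, (size_t)page);
    close(fd);
    return before == 1 && after == 0;
}
```

Calling demo_truncated_mapping() from a small main() should return 1 on a typical Linux system. The same kind of probe could in principle be pointed at a suspect mapping such as cursor_bo->ptr, but that only confirms the symptom; finding the cause still needs the kernel side of the mapping to be examined.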
Hope this helps in indicating where the problem might be.
Michael.

SIGBUS probably means something went wrong with the mapping in the kernel. See if there's anything in dmesg, but I suspect you'll have to add debugging output in drivers/gpu/drm/{radeon,ttm}/ to find out where it's coming from.

No, there was nothing in dmesg. To add debug output is this a kernel/module option? Or do you mean that debugging output will have to be manually added to the radeon kernel code at strategic places with a recompile of the kernel? I'm prepared to give that a go but I will need guidance. I am not familiar with the radeon kernel code!

-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/92. | 
Created attachment 33019 [details]
Xorg log with backtrace

X crashes on startup when KMS is enabled on the Alpha architecture. Testing on a PWS600au (EV56 CPU) with a 2.6.32.5 kernel, Xorg git master and a Radeon 7000 PCI graphics card.

Symptoms: On starting X the screen blanks for a bit, then the console reappears with a backtrace from Xorg and the kernel warning message. The console is damaged: a patterned white square (about 1in by 1in) remains firmly fixed in the centre of the screen. The system is left unstable and eventually crashed with a kernel oops as I was reading the logs.