Bug 29140

Summary:

[rs690] Freeze at Xorg startup when using KMS and multiple screens

Product:

DRI

Reporter:

steckdenis

Component:

DRM/Radeon

Assignee:

Default DRI bug account <dri-devel>

Status:

RESOLVED FIXED

QA Contact:

Severity:

normal

Priority:

medium

Version:

DRI git

Hardware:

x86-64 (AMD64)

OS:

Linux (All)

Whiteboard:

Attachments:

Description	Flags
Complete dmesg (radeon module loaded just after the last line speaking about a segfault of glxinfo)	none
Xorg.0.log when the problem occurs	none
kdm.log (with -debug 0x18F) when the problem occurs	none
Strace output when running Xorg	none
possible fix	none

Description steckdenis 2010-07-18 07:34:56 UTC

Created attachment 37174 [details] [review]
Complete dmesg (radeon module loaded just after the last line speaking about a segfault of glxinfo)

Hello,

I have an ATI Radeon X1270 card (rs690m), with 128 MiB of sideport memory and 256 MiB of shared RAM. This memory configuration caused bug #27529.

I use the 2.6.35-rc5 kernel (linus-tree), Mesa git (as of the 17th of July 2010, commit 184abe8e26f76a50ede43d503aa6bf129d8d6b76, "llvmpipe: Remove unused variable in lp_test_sincos."), libdrm git (as of the first of July 2010, commit b803918f3f77c62edf22e78cb2095be399753423, "drm mode: Return -errno on drmIoctl() failure") and xf86-video-ati git (as of the 15th of July, commit cdeb1949c820242f05a8897d3ddd0718f204dacf, "kms: don't call cursor helper if using software cursor").

Because of bug #27529, I used to boot my netbook with the "radeon.modeset=0" parameter, to avoid screen corruption.

Then I went on vacation. When I returned home, I wanted to try the latest versions of all my favorite software, especially the shiny new kernel containing the fix for #27529.

I updated my whole software stack to the versions I mentioned in the second paragraph, changed "radeon.modeset=0" to "radeon.modeset=1" and rebooted.

The reboot was OK. All my services started up nicely, except KDM. When the X server attempts to start, the screen goes black, with a small text cursor in its top left corner. This cursor doesn't move or blink. The computer doesn't go any further in the boot sequence; it freezes.

I rebooted using the nomodeset option and killed KDM (which started fine). I wrote a small shell loop that continuously writes the dmesg output to a file, and launched it.

While it was running, I unloaded the radeon module and reloaded it with the "modeset=1" parameter. All went fine; my screen was OK, with my text console on it. Then I restarted KDM, which attempted to start Xorg.

The same bug occurred again. During the "freeze", I could see activity on my hard disk drive (the dmesg loop contains a sync). This makes me think that the computer is not hard frozen, only the graphical part. I could not switch to any virtual console at this point, so I rebooted.

I attached my complete dmesg, as captured by my small script.
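
For readers who want to reproduce this kind of capture, here is a minimal sketch in C of the "write, then sync" step. It is not the reporter's script (that was a shell loop and is not shown in this report). The function name snapshot_to_disk and both paths are hypothetical, and how the kernel log is exported to the source file is left as an assumption.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy the file at `src` to `dst`, then fsync() the
 * copy so the data reaches the disk even if the machine freezes right after.
 * Returns the number of bytes copied, or -1 on any error.
 */
long snapshot_to_disk(const char *src, const char *dst)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;

    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        /* write() may be partial, so loop until the chunk is fully written */
        while (off < n) {
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                close(in);
                close(out);
                return -1;
            }
            off += w;
        }
        total += n;
    }

    /* n == 0 means clean end of file; fsync() forces the copy to disk */
    int ok = (n == 0) && (fsync(out) == 0);
    close(in);
    close(out);
    return ok ? total : -1;
}
```

Calling this once per second in a loop would approximate the behaviour described above. The fsync() call is the part that matters when the display freezes and a clean shutdown is not possible.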

Thanks for resolving this bug. I hope it is the last one I will see before having a nice KMS-enabled netbook and being able to try Gallium3D, which is needed to have the very nice Blur effect of KWin on my hardware.

Comment 1 steckdenis 2010-07-23 02:05:34 UTC

Hello,

I tested the new 2.6.35-rc6 kernel, and the bug also happens with this one. I use Mesa git as of this morning, and the same xf86-video-ati and libdrm versions as in my last post.

I also tested with Option "NoAccel" "true" in my xorg.conf, and the bug did not happen. src/radeon_kms.c in the xf86-video-ati driver loads the EXA Xorg module when NoAccel is "false", so EXA was not loaded when NoAccel was "true". The bug may be there.

Comment 2 Michel Dänzer 2010-07-23 02:34:32 UTC

Please attach the Xorg.0.log and the kdm log file from when the problem occurs.

Comment 3 steckdenis 2010-07-23 02:53:18 UTC

Created attachment 37330 [details]
Xorg.0.log when the problem occurs

Comment 4 Michel Dänzer 2010-07-23 03:09:35 UTC

As the X log file appears to end abruptly, we really need to see the kdm log file.

Comment 5 steckdenis 2010-07-23 03:25:52 UTC

Created attachment 37332 [details]
kdm.log (with -debug 0x18F) when the problem occurs

Sorry for the delay. My KDM used syslog, which discarded its output. I had to tune my /etc/rc.d/kdm script to make it work and log to a file.

Comment 6 Michel Dänzer 2010-07-23 03:53:13 UTC

Hrm, that doesn't contain more information either. Is the X server process still running when the freeze occurs? If so, can you try (from a remote login) attaching gdb to it and getting a backtrace?

Comment 7 steckdenis 2010-07-23 04:39:11 UTC

Hello,

To do what you asked, I needed to run my second computer. I usually have a two-screen setup, with a 1280x1024 screen connected to the VGA output of my netbook, and its 1366x768 LVDS screen.

When I disconnected the external screen to use it with my other computer, the netbook booted nicely. KDM showed up as expected, and I managed to log in. glxinfo showed many visuals (a sign that KMS and DRI2 are used), and glxgears ran slowly (DRI2 performance hit).

Then I rebooted my netbook with its two screens, logged in on a virtual terminal, started sshd, re-inserted radeon with modeset=1 and launched KDM. It failed as expected.

I disconnected my screen and attached it to my other computer. Then I sshed into my netbook.

"top" showed that the processor was unused (at nearly 0%). The ssh connection was fast and responsive.

I started GDB and attached it to the running /usr/bin/X process. Unfortunately, it was not compiled with debugging symbols, so my stack trace is useless.

#0  0x00007f47e7d94093 in ?? ()
#1  0x000000000040f653 in ?? ()
#2  0x00007fffdf082ec0 in ?? ()
#3  0x00007fffdf084e0d in ?? ()
#4  0x0000000000000090 in ?? ()
#5  0x0000000000000000 in ?? ()

I hope the fact that it works with a single-head setup will help you. In case it is useful: without KMS, my primary screen (the one that shows the Plasma panel) is the LVDS; with KMS, it is the external one.

Comment 8 steckdenis 2010-08-09 06:00:25 UTC

Created attachment 37722 [details]
Strace output when running Xorg

Hello,

Today I tried to reproduce this bug using Xorg git, Linux 2.6.35 and Mesa git. The bug happened again, but this time I have some very interesting information for you.

GDB wasn't helpful because the bug is in the radeon kernel module. I discovered that Linux prints a complete kernel stack trace to dmesg when a task has been blocked for too long, for example on a mutex. Luckily, that is exactly what is happening with Xorg, so I have a stack trace:

INFO: task Xorg:2948 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Xorg          D 00000000ffffc694     0  2948      1 0x00400005
 ffff8800683e7788 0000000000000082 ffff880001814f00 ffff8800683a1180
 0000000000014f00 0000000000014f00 ffff8800683e7fd8 ffff8800683e7fd8
 ffff8800683e7fd8 ffff88006c1df780 ffff8800683e7fd8 0000000000014f00
Call Trace:
 [<ffffffff81356bef>] __mutex_lock_slowpath+0x13f/0x310
 [<ffffffff81356dd1>] mutex_lock+0x11/0x30
 [<ffffffffa04feba5>] radeon_ring_lock+0x25/0x50 [radeon]
 [<ffffffffa0511f01>] r300_gpu_is_lockup+0x71/0x190 [radeon]
 [<ffffffffa04e753e>] radeon_fence_wait+0x33e/0x3d0 [radeon]
 [<ffffffff8106f610>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa04e6f85>] ? radeon_fence_emit+0xe5/0x130 [radeon]
 [<ffffffffa0545a58>] radeon_pm_set_clocks+0x3c8/0x5f0 [radeon]
 [<ffffffff81356ce3>] ? __mutex_lock_slowpath+0x233/0x310
 [<ffffffffa0546908>] radeon_pm_compute_clocks+0xd8/0x270 [radeon]
 [<ffffffffa04dadf3>] atombios_crtc_mode_fixup+0x23/0x40 [radeon]
 [<ffffffffa044a06b>] drm_crtc_helper_set_mode+0x15b/0x3f0 [drm_kms_helper]
 [<ffffffffa0506a7a>] ? r100_cs_packet_next_reloc+0x4a/0x1e0 [radeon]
 [<ffffffffa044abe7>] drm_crtc_helper_set_config+0x797/0x820 [drm_kms_helper]
 [<ffffffffa03e6ccf>] ? drm_mode_object_find+0x5f/0x80 [drm]
 [<ffffffffa03e7f9f>] drm_mode_setcrtc+0x2cf/0x3a0 [drm]
 [<ffffffffa03da99c>] drm_ioctl+0x37c/0x460 [drm]
 [<ffffffffa03e7cd0>] ? drm_mode_setcrtc+0x0/0x3a0 [drm]
 [<ffffffff8112d04c>] vfs_ioctl+0x3c/0xd0
 [<ffffffff8112d62c>] do_vfs_ioctl+0x7c/0x500
 [<ffffffff8112db29>] sys_ioctl+0x79/0x90
 [<ffffffff8100a017>] tracesys+0xd9/0xde
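
When searching a long dmesg capture for reports like the one above, the header line can be matched mechanically. The following is a minimal sketch in C, assuming the message format shown in this trace. The struct hung_task type and the function parse_hung_task_line are hypothetical names introduced for illustration, not part of any kernel or library API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one "blocked for more than N seconds" report. */
struct hung_task {
    char comm[16];  /* task name; Linux task names fit in 15 chars + NUL */
    int  pid;
    int  seconds;
};

/*
 * Parse a line such as
 *   "INFO: task Xorg:2948 blocked for more than 120 seconds."
 * A leading dmesg timestamp is skipped by searching for the "INFO: task "
 * prefix. Returns 1 and fills *out on a match, 0 otherwise.
 * Limitation of this sketch: task names that contain ':' are not handled.
 */
int parse_hung_task_line(const char *line, struct hung_task *out)
{
    const char *p = strstr(line, "INFO: task ");
    if (p == NULL)
        return 0;

    return sscanf(p, "INFO: task %15[^:]:%d blocked for more than %d seconds",
                  out->comm, &out->pid, &out->seconds) == 3;
}
```

Applied to the first line of the report above, this would yield the task name Xorg, PID 2948 and a timeout of 120 seconds; the "Call Trace:" line and the frames below it do not match.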

To be even more complete, I launched Xorg under strace, to see when everything happens. I attached the strace output to this bug.

The last line, which is incomplete, is where Xorg issues the DRM_IOCTL_MODE_SETCRTC ioctl. The two previous ioctls are DRM_IOCTL_MODE_ADDFB followed by DRM_IOCTL_MODE_SETGAMMA.

The bug does not happen when I use only one monitor (the internal LVDS); it appears only when I also use my external VGA monitor (without it, I think DRM_IOCTL_MODE_SETCRTC is never called).

I use an ATI Radeon X1270 (rs690m with 128 MiB of sideport memory) on a Packard Bell Dot/MA.FR netbook (it's the same as the Gateway LT3103u, but with a Packard Bell logo on it :) ).

I hope this information will help you.

Comment 9 steckdenis 2010-08-12 09:57:45 UTC

Hello,

I think I found the problem, but I am unfortunately unable to fix it (I don't know the radeon module well enough).

A change between the 2.6.34 and 2.6.35 kernels added a bunch of functions to drivers/gpu/drm/radeon/radeon_pm.c. The function that causes trouble for me is radeon_pm_set_clocks(struct radeon_device *rdev).

This function begins by locking three mutexes, including rdev->cp.mutex.

My card is an r300, so the code goes through the "else" branch of the if. This branch contains a call to radeon_fence_emit.

Now in radeon_fence.c: I don't know how, but this function ends up calling radeon_fence_wait. The problem is that radeon_fence_wait calls r300_gpu_is_lockup, by branching into "if (unlikely(!radeon_fence_signaled(fence))) {".

In r300.c: r300_gpu_is_lockup, called by radeon_fence_wait, calls radeon_ring_lock, because it wants to write to the ring.

In radeon_ring.c: radeon_ring_lock begins by calling "mutex_lock(&rdev->cp.mutex);", the exact same mutex as the one already locked by radeon_pm_set_clocks. That seems to be the problem.
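
The locking pattern described above can be reproduced in user space. The following is a minimal sketch, not the radeon code: it uses POSIX threads instead of kernel mutexes, and the names cp_mutex, ring_lock, set_clocks and demo_self_deadlock are hypothetical stand-ins. An error-checking mutex is used so that the second acquisition reports EDEADLK, where a kernel mutex simply blocks as in the trace from comment 8.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>

/* Stand-in for rdev->cp.mutex (hypothetical, user-space only). */
static pthread_mutex_t cp_mutex;

/* Stand-in for radeon_ring_lock(): takes the ring mutex before writing. */
static int ring_lock(void)
{
    return pthread_mutex_lock(&cp_mutex);
}

/*
 * Stand-in for radeon_pm_set_clocks(): takes the mutex, then reaches
 * ring_lock() further down the call chain while still holding it.
 */
static int set_clocks(void)
{
    int rc = pthread_mutex_lock(&cp_mutex);
    if (rc != 0)
        return rc;

    /* Nested acquisition by the same thread: this is the reported pattern. */
    rc = ring_lock();
    if (rc == 0)
        pthread_mutex_unlock(&cp_mutex);  /* release the nested hold, if any */

    pthread_mutex_unlock(&cp_mutex);      /* release the outer hold */
    return rc;
}

/* Returns the result of the nested lock attempt. */
int demo_self_deadlock(void)
{
    pthread_mutexattr_t attr;
    int rc;

    pthread_mutexattr_init(&attr);
    /* ERRORCHECK makes a relock by the owner fail instead of hanging. */
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&cp_mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    rc = set_clocks();

    pthread_mutex_destroy(&cp_mutex);
    return rc;
}
```

With an error-checking mutex, demo_self_deadlock() returns EDEADLK. The point of the sketch is only that a non-recursive mutex cannot be taken twice by the same thread; it says nothing about how the attached patch addresses the problem.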

Cheers.

Comment 10 Alex Deucher 2010-08-12 16:17:12 UTC

Created attachment 37828 [details] [review]
possible fix

Does this patch help?

Comment 11 steckdenis 2010-08-13 00:55:19 UTC

I applied the patch on a vanilla 2.6.35.1 kernel, and it works!

Thanks.

Comment 12 Alex Deucher 2010-08-13 07:54:14 UTC

I've sent the patch to Dave.  Thanks for tracking this down.

Comment 13 Jerome Glisse 2010-08-16 10:15:28 UTC

Closing
