| Field | Value |
|---|---|
| Summary | [rs690] Freeze at Xorg startup when using KMS and multiple screens |
| Product | DRI |
| Component | DRM/Radeon |
| Status | RESOLVED FIXED |
| Severity | normal |
| Priority | medium |
| Version | DRI git |
| Hardware | x86-64 (AMD64) |
| OS | Linux (All) |
| Reporter | steckdenis |
| Assignee | Default DRI bug account <dri-devel> |
| QA Contact | |
| Whiteboard | |
| i915 platform | |
| i915 features | |
| Attachments | |
Description
steckdenis
2010-07-18 07:34:56 UTC
Hello, I tested the new 2.6.35-rc6 kernel, and the bug also happens with it. I use Mesa Git as of this morning, and the same xf86-video-ati and libdrm versions as in my last post. I also tested with Option "NoAccel" "true" in my xorg.conf, and the bug didn't happen. src/radeon_kms.c in the xf86-video-ati driver loads the EXA Xorg module when NoAccel is "false", so the module wasn't loaded when NoAccel was "true". The bug may be there.

Please attach the Xorg.0.log and the kdm log file from when the problem occurs.

Created attachment 37330 [details]
Xorg.0.log when the problem occurs

As the X log file appears to end abruptly, we really need to see the kdm log file.

Created attachment 37332 [details]
kdm.log (with -debug 0x18F) when the problem occurs
Sorry for the delay, my KDM used syslog, which discarded its output. I had to tune my /etc/rc.d/kdm script to make it work and log to a file.

Hrm, that doesn't contain more information either. Is the X server process still running when the freeze occurs? If so, can you try (from a remote login) attaching gdb to it and getting a backtrace?

Hello,

To do what you asked, I needed to run my second computer. I usually have a two-screen setup: a 1280x1024 screen connected to the VGA output of my netbook, plus its 1366x768 LVDS screen. When I disconnected the external screen to use it with my other computer, it booted nicely. KDM showed up as expected, and I managed to log in. glxinfo showed many visuals (a sign that KMS and DRI2 are used), and glxgears ran slowly (DRI2 performance hit).

Then I rebooted my netbook with its two screens, logged in on a virtual terminal, started sshd, re-inserted radeon with modeset=1 and launched KDM. It failed as expected. I disconnected my screen and attached it to my other computer. Then I sshed into my netbook. "top" showed that the processor was idle (at nearly 0%). The ssh connection was fast and responsive. I started GDB and attached it to the running /usr/bin/X process. Unfortunately, it was not compiled with debugging symbols, so my stack trace is useless.

#0 0x00007f47e7d94093 in ?? ()
#1 0x000000000040f653 in ?? ()
#2 0x00007fffdf082ec0 in ?? ()
#3 0x00007fffdf084e0d in ?? ()
#4 0x0000000000000090 in ?? ()
#5 0x0000000000000000 in ?? ()

I hope the fact that it works with a single-head setup will help you.

I also have to say that without KMS, my primary screen (the one that shows the Plasma panel) is the LVDS. With KMS, it's the external one, if that helps.

Created attachment 37722 [details]
Strace output when running Xorg
Hello,
Today I tried to reproduce this bug using Xorg Git, Linux 2.6.35 and Mesa Git. The bug happened again, but this time I have some very interesting information for you.
GDB wasn't helpful because the bug is in the radeon kernel module. I discovered that Linux prints a complete kernel stack trace to dmesg when a task stays blocked for too long, for example while waiting on a mutex. As it happens, that is exactly what is going on with Xorg, so I have a stack trace:
INFO: task Xorg:2948 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Xorg D 00000000ffffc694 0 2948 1 0x00400005
ffff8800683e7788 0000000000000082 ffff880001814f00 ffff8800683a1180
0000000000014f00 0000000000014f00 ffff8800683e7fd8 ffff8800683e7fd8
ffff8800683e7fd8 ffff88006c1df780 ffff8800683e7fd8 0000000000014f00
Call Trace:
[<ffffffff81356bef>] __mutex_lock_slowpath+0x13f/0x310
[<ffffffff81356dd1>] mutex_lock+0x11/0x30
[<ffffffffa04feba5>] radeon_ring_lock+0x25/0x50 [radeon]
[<ffffffffa0511f01>] r300_gpu_is_lockup+0x71/0x190 [radeon]
[<ffffffffa04e753e>] radeon_fence_wait+0x33e/0x3d0 [radeon]
[<ffffffff8106f610>] ? autoremove_wake_function+0x0/0x40
[<ffffffffa04e6f85>] ? radeon_fence_emit+0xe5/0x130 [radeon]
[<ffffffffa0545a58>] radeon_pm_set_clocks+0x3c8/0x5f0 [radeon]
[<ffffffff81356ce3>] ? __mutex_lock_slowpath+0x233/0x310
[<ffffffffa0546908>] radeon_pm_compute_clocks+0xd8/0x270 [radeon]
[<ffffffffa04dadf3>] atombios_crtc_mode_fixup+0x23/0x40 [radeon]
[<ffffffffa044a06b>] drm_crtc_helper_set_mode+0x15b/0x3f0 [drm_kms_helper]
[<ffffffffa0506a7a>] ? r100_cs_packet_next_reloc+0x4a/0x1e0 [radeon]
[<ffffffffa044abe7>] drm_crtc_helper_set_config+0x797/0x820 [drm_kms_helper]
[<ffffffffa03e6ccf>] ? drm_mode_object_find+0x5f/0x80 [drm]
[<ffffffffa03e7f9f>] drm_mode_setcrtc+0x2cf/0x3a0 [drm]
[<ffffffffa03da99c>] drm_ioctl+0x37c/0x460 [drm]
[<ffffffffa03e7cd0>] ? drm_mode_setcrtc+0x0/0x3a0 [drm]
[<ffffffff8112d04c>] vfs_ioctl+0x3c/0xd0
[<ffffffff8112d62c>] do_vfs_ioctl+0x7c/0x500
[<ffffffff8112db29>] sys_ioctl+0x79/0x90
[<ffffffff8100a017>] tracesys+0xd9/0xde
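
To read the call trace above more easily, it can help to drop the frames marked with "?", which the kernel prints for addresses found on the stack that the unwinder could not confirm as part of the call chain. The following is only a small sketch in Python: the helper name `reliable_frames` and the regular expression are my own assumptions about the line format shown in this report, and the sample text is copied from the top of the trace above.

```python
import re

# Matches lines such as:
#   [<ffffffffa04feba5>] radeon_ring_lock+0x25/0x50 [radeon]
#   [<ffffffff8106f610>] ? autoremove_wake_function+0x0/0x40
# Group 1 captures the optional "? " marker, group 2 the symbol name.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(trace_text):
    """Return symbol names of frames not marked '?', innermost first."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME.search(line)
        if match and not match.group(1):
            frames.append(match.group(2))
    return frames


# First eight frames of the trace quoted in this report.
TRACE = """\
[<ffffffff81356bef>] __mutex_lock_slowpath+0x13f/0x310
[<ffffffff81356dd1>] mutex_lock+0x11/0x30
[<ffffffffa04feba5>] radeon_ring_lock+0x25/0x50 [radeon]
[<ffffffffa0511f01>] r300_gpu_is_lockup+0x71/0x190 [radeon]
[<ffffffffa04e753e>] radeon_fence_wait+0x33e/0x3d0 [radeon]
[<ffffffff8106f610>] ? autoremove_wake_function+0x0/0x40
[<ffffffffa04e6f85>] ? radeon_fence_emit+0xe5/0x130 [radeon]
[<ffffffffa0545a58>] radeon_pm_set_clocks+0x3c8/0x5f0 [radeon]
"""

if __name__ == "__main__":
    for name in reliable_frames(TRACE):
        print(name)
```

With the "?" frames filtered out, the chain reads from mutex_lock up through radeon_ring_lock, r300_gpu_is_lockup and radeon_fence_wait to radeon_pm_set_clocks.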
To be even more complete, I launched Xorg under strace, to see when everything happens. I attached the strace output to this bug.
The last line, which is incomplete, is where Xorg calls the DRM_IOCTL_MODE_SETCRTC ioctl. The two previous ioctls are DRM_IOCTL_MODE_ADDFB followed by DRM_IOCTL_MODE_SETGAMMA.
This bug doesn't happen when I use only one monitor (the internal LVDS), but only when I also use my external VGA monitor (without it, I think DRM_IOCTL_MODE_SETCRTC is never called).
I use an ATI Radeon X1270 (rs690m with 128 MiB of sideport memory) on a Packard Bell Dot/MA.FR netbook (it's the same as the Gateway LT3103u, but with a Packard Bell logo on it :) ).
I hope this information will help you.

Hello,

I think I found the problem, but I am unfortunately unable to fix it (I don't know the radeon module well enough).

A change between the 2.6.34 and 2.6.35 kernels added a bunch of functions in drivers/gpu/drm/radeon/radeon_pm.c. The function that causes trouble for me is radeon_pm_set_clocks(struct radeon_device *rdev). This function begins by locking three mutexes, including rdev->cp.mutex. My card is an r300, so the code goes through the "else" branch of the if. This branch contains a call to radeon_fence_emit.

Now in radeon_fence.c: I don't know how, but this function happens to call radeon_fence_wait. The problem is that radeon_fence_wait calls r300_gpu_is_lockup, by branching into "if (unlikely(!radeon_fence_signaled(fence))) {".

In r300.c: r300_gpu_is_lockup, called by radeon_fence_wait, calls radeon_ring_lock, because it wants to write to the ring.

In radeon_ring.c: radeon_ring_lock begins by calling "mutex_lock(&rdev->cp.mutex);", the exact same mutex as the one already locked by radeon_pm_set_clocks. That seems to be the problem.

Cheers.

Created attachment 37828 [details] [review]
possible fix

Does this patch help?

I applied the patch on a vanilla 2.6.35.1 kernel, and it works! Thanks.

I've sent the patch to Dave. Thanks for tracking this down. Closing.
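
The self-deadlock described in the analysis above (a function takes a non-recursive mutex and then, further down the call chain, tries to take the same mutex again) can be modeled in user space. This is only a sketch in Python, not the driver code: the function names are shortened stand-ins for the kernel functions named in this report, `threading.Lock` stands in for rdev->cp.mutex, and the timeout is an assumption added so the example terminates instead of hanging the way mutex_lock() does in the kernel.

```python
import threading

# Stand-in for rdev->cp.mutex. threading.Lock is not reentrant, so a
# second acquire by the same thread blocks, as with a kernel mutex.
cp_mutex = threading.Lock()


def ring_lock(timeout=0.1):
    """Model of radeon_ring_lock: it takes cp_mutex itself.

    The kernel's mutex_lock() would wait forever here; the timeout
    exists only so that this sketch returns.
    """
    return cp_mutex.acquire(timeout=timeout)


def gpu_is_lockup():
    """Model of r300_gpu_is_lockup: it needs the ring, so it locks it."""
    got_lock = ring_lock()
    if got_lock:
        cp_mutex.release()
    return got_lock


def fence_wait():
    """Model of radeon_fence_wait on a fence that has not signaled."""
    return gpu_is_lockup()


def set_clocks():
    """Model of radeon_pm_set_clocks: holds cp_mutex across the wait."""
    with cp_mutex:
        return fence_wait()


if __name__ == "__main__":
    # False: the inner acquire cannot succeed while the caller holds the lock.
    print("inner lock taken while already held:", set_clocks())
    # True: the same path works when nobody holds the lock.
    print("inner lock taken when free:", fence_wait())
```

In the model, the inner acquire fails only when the caller already holds the lock, which matches the hung-task trace: Xorg sits in mutex_lock inside radeon_ring_lock while radeon_pm_set_clocks, further up the same stack, still holds the mutex.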