Bug 99138 - Xorg can't be restarted gracefully running CEDAR graphics card
Status: RESOLVED MOVED
Alias: None
Product: xorg
Classification: Unclassified
Component: Server/General
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Xorg Project Team
QA Contact: Xorg Project Team
 
Reported: 2016-12-18 15:23 UTC by Arthur Marsh
Modified: 2018-12-13 22:36 UTC

Attachments
backtrace of Xorg after attempting service lxdm stop (25.82 KB, text/plain)
2017-01-01 10:28 UTC, Arthur Marsh
debug log as requested (4.26 KB, text/plain)
2017-01-20 09:44 UTC, Arthur Marsh

Description Arthur Marsh 2016-12-18 15:23:32 UTC
Running Debian unstable with:

xserver-xorg-video-radeon                     1:7.8.0-1+b1

and minimalist /etc/X11/xorg.conf:

Section "Device"
        Identifier      "Radeon CEDAR"
        Driver          "radeon"
EndSection

I see three problems: the mpv video playback window locks up (mpv can only be terminated with kill -HUP from another window), chromium windows do not refresh unless I switch to vt1 and back to vt7, and

service lxdm restart

fails. I need to kill -9 the Xorg process to restart Xorg without rebooting.

Running mesa 13.0.2 and ddx based on Debian version 2:1.19.0-3.0

Video card identifies itself as:

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cedar [Radeon HD 5000/6000/7350/8350 Series] (prog-if 00 [VGA controller])
        Subsystem: Gigabyte Technology Co., Ltd Cedar [Radeon HD 5000/6000/7350/8350 Series]
        Flags: bus master, fast devsel, latency 0, IRQ 29, NUMA node 0
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Memory at fe9e0000 (64-bit, non-prefetchable) [size=128K]
        I/O ports at c000 [size=256]
        Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Kernel driver in use: radeon
        Kernel modules: radeon
Comment 1 Michel Dänzer 2016-12-19 02:38:04 UTC
You mentioned on IRC that a "radeon_cs:0 process" prevents the system from rebooting. I assume that's the radeonsi command stream submission thread of the Xorg process being stuck somewhere in the kernel due to the GPU hang. If you try rebooting and wait for a few minutes, do backtraces of the stuck task show up in dmesg? If so, please attach those here. (It may be possible to get backtraces without waiting via some Sysrq key combination)
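A minimal sketch of the SysRq route, assuming a root shell on the affected machine: writing `w` to /proc/sysrq-trigger asks the kernel to log stack traces of tasks in uninterruptible (blocked) state, which can then be read from dmesg. The function name dump_blocked_tasks and its optional path argument are illustrative only and not part of any existing tool.

```shell
# Ask the kernel to log backtraces of blocked (uninterruptible) tasks.
# The optional argument overrides the trigger path; it exists only so the
# function can be exercised without root and is an assumption of this sketch.
dump_blocked_tasks() {
    trigger="${1:-/proc/sysrq-trigger}"
    # 'w' is the SysRq command that dumps tasks in uninterruptible state.
    echo w > "$trigger"
}
```

Run as root, dump_blocked_tasks followed by dmesg should show whether a task such as radeon_cs:0 is stuck in the kernel, without waiting for the hung-task timeout.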
Comment 2 Arthur Marsh 2016-12-27 11:37:29 UTC
While bisecting an unrelated kernel bug I found that some kernels between 4.9.0 and 4.10-rc1 allowed:

service lxdm stop

to work as expected and cleanly stop the X server.

Otherwise, either a kill -9 of the lxdm and Xorg processes or a reboot is needed to start the X server again.

If it would be helpful I could try bisecting to find the commit that stopped X shutting down cleanly.
Comment 3 Arthur Marsh 2017-01-01 10:28:52 UTC
Created attachment 128698
backtrace of Xorg after attempting service lxdm stop
Comment 4 Arthur Marsh 2017-01-19 04:00:19 UTC
# ps -ef|grep lxdm
root      7524     1  0 13:52 ?        00:00:00 /usr/sbin/lxdm-binary -d
root      7528  7524  0 13:52 tty7     00:00:00 /usr/lib/xorg/Xorg :0 vt07 -nolisten tcp -novtswitch -auth /var/run/lxdm/lxdm-:0.auth
root      7588  5817  0 13:52 pts/0    00:00:00 grep lxdm
am64:/home/amarsh04# gdb -p 7528
GNU gdb (Debian 7.12-5) 7.12
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 7528
[New LWP 7550]
[New LWP 7553]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007fc610ece67d in pthread_join (threadid=140488490870528,
    thread_return=0x0) at pthread_join.c:90
90      pthread_join.c: No such file or directory.
(gdb) thread apply all bt full

Thread 3 (Thread 0x7fc60697e700 (LWP 7553)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
No locals.
#1  0x00007fc610ecfbf6 in __GI___pthread_mutex_lock (mutex=0x556029709d20)
    at ../nptl/pthread_mutex_lock.c:115
        id = 283991900
        __PRETTY_FUNCTION__ = "__pthread_mutex_lock"
        type = 1
        id = <optimized out>
#2  0x00005560294752a0 in input_lock ()
No symbol table info available.
#3  0x00005560294753f5 in ?? ()
No symbol table info available.
#4  0x00007fc610ecd424 in start_thread (arg=0x7fc60697e700)
    at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fc60697e700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140488490870528,
                -2498737612137141548, 0, 140731836161135, 0, 140488697360448,
                2512654887706716884, 2512606964688775892},
              mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0},
            data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#5  0x00007fc610c109bf in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
No locals.

Thread 2 (Thread 0x7fc6079e6700 (LWP 7550)):
#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
No locals.
#1  0x00007fc60c03025b in ?? () from /usr/lib/x86_64-linux-gnu/dri/r600_dri.so
No symbol table info available.
#2  0x00007fc60c0300b7 in ?? () from /usr/lib/x86_64-linux-gnu/dri/r600_dri.so
No symbol table info available.
#3  0x00007fc610ecd424 in start_thread (arg=0x7fc6079e6700)
    at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fc6079e6700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140488508073728,
                -2498737612137141548, 0, 140731836158591, 0, 140488697360448,
                2512652753107970772, 2512606964688775892},
              mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0},
            data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#4  0x00007fc610c109bf in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
No locals.

Thread 1 (Thread 0x7fc612dfc140 (LWP 7528)):
#0  0x00007fc610ece67d in pthread_join (threadid=140488490870528,
    thread_return=0x0) at pthread_join.c:90
        __tid = 7553
        _buffer = {__routine = 0x7fc610ece5a0 <cleanup>,
          __arg = 0x7fc60697ed28, __canceltype = 0, __prev = 0x0}
        oldtype = 0
        pd = 0x7fc60697e700
        self = 0x7fc612dfc140
        result = 0
#1  0x00005560294759b0 in ?? ()
No symbol table info available.
#2  0x0000556029313c06 in ?? ()
No symbol table info available.
#3  0x00007fc610b482b1 in __libc_start_main (main=0x5560292fd840, argc=8,
    argv=0x7ffeaf1a5e98, init=<optimized out>, fini=<optimized out>,
    rtld_fini=<optimized out>, stack_end=0x7ffeaf1a5e88)
    at ../csu/libc-start.c:291
        result = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0, 8616197459934735060,
                93871496222800, 140731836161680, 0, 0, 2499477270933564116,
                2512606207147666132}, mask_was_saved = 0}}, priv = {pad = {
              0x0, 0x0, 0x7ffeaf1a5ee0, 0x7fc612e6c168}, data = {prev = 0x0,
              cleanup = 0x0, canceltype = -1357226272}}}
        not_first_call = <optimized out>
#4  0x00005560292fd87a in _start ()
No symbol table info available.
Comment 5 Michel Dänzer 2017-01-19 07:34:05 UTC
(In reply to Arthur Marsh from comment #0)
> I experience the lockup of the video playback window running mpv (requires
> kill -HUP from another window to terminate mpv), chromium windows not
> refreshing unless I switched to vt1 and back to vt7,

Sounds like bug 99333, fixed in xserver 1.19.1, available now in Debian sid.


> and 
> 
> service lxdm restart
> 
> failing. I needed to kill -9 the Xorg process to avoid having to reboot to
> restart Xorg.

Does this still happen with the above fixed?

If yes, please attach (as opposed to paste) another full backtrace of all threads with the xserver-xorg-core-dbgsym package installed.
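One way to capture such a backtrace as a file rather than a paste is to run gdb in batch mode with pagination off, so that no pager prompts end up in the log. This is only a sketch: the function name xorg_backtrace, the GDB override variable, and the default output file name are assumptions made for illustration; the gdb options themselves (-p, -batch, -ex) are standard.

```shell
# Write a full backtrace of all threads of a running process to a file.
# $1: PID to attach to (for example the Xorg PID reported by ps)
# $2: output file (hypothetical default: xorg-backtrace.txt)
# GDB: optional override of the debugger binary, used here only for testing.
xorg_backtrace() {
    pid="$1"
    out="${2:-xorg-backtrace.txt}"
    # -batch makes gdb exit once the -ex commands have run;
    # "set pagination off" avoids the "Type <return> to continue" prompts.
    "${GDB:-gdb}" -p "$pid" -batch \
        -ex 'set pagination off' \
        -ex 'thread apply all bt full' > "$out" 2>&1
}
```

With the PID from comment 4 this would be invoked as root as xorg_backtrace 7528, and the resulting xorg-backtrace.txt attached to the bug.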
Comment 6 Arthur Marsh 2017-01-20 09:44:54 UTC
Created attachment 129062
debug log as requested

debug log after running "service lxdm stop".
Comment 7 GitLab Migration User 2018-12-13 22:36:58 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe to and participate further in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/xserver/issues/512.

