Created attachment 81896 [details] Crash backtrace with a gdb list of the called functions

From time to time, I'm getting crashes while using Xorg with the radeon driver and 3 monitors. The crashes happen randomly. I noticed two today: once while running firefox/Java, and once while editing an email in claws-mail.

I'm running Fedora 19, with the following packages:

xorg-x11-server-Xorg-1.14.1-4.fc19.x86_64
xorg-x11-drv-ati-7.1.0-5.20130408git6e74aacc5.fc19.x86_64
xorg-x11-glamor-0.5.0-5.20130401git81aadb8.fc19.x86_64
xorg-x11-glamor-debuginfo-0.5.0-5.20130401git81aadb8.fc19.x86_64
kernel-3.9.6-200.fc18.x86_64

Most relevant part of the Xorg log:

[ 2928.438] (EE) Backtrace:
[ 2928.439] (EE) 0: /usr/bin/X (OsLookupColor+0x129) [0x46e539]
[ 2928.439] (EE) 1: /lib64/libpthread.so.0 (__restore_rt+0x0) [0x396c60ef9f]
[ 2928.439] (EE) 2: /usr/bin/X (miTrapezoidBounds+0x7d) [0x51262d]
[ 2928.439] (EE) 3: /lib64/libglamor.so.0 (glamor_composite_rects_nf+0x4540) [0x7f894a185710]
[ 2928.439] (EE) 4: /usr/bin/X (AddTraps+0x46f3) [0x51ecb3]
[ 2928.439] (EE) 5: /usr/bin/X (SendErrorToClient+0x3f7) [0x436e47]
[ 2928.439] (EE) 6: /usr/bin/X (_init+0x3ab2) [0x429ae2]
[ 2928.439] (EE) 7: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x396be21b75]
[ 2928.440] (EE) 8: /usr/bin/X (_start+0x29) [0x426741]
[ 2928.440] (EE) 9: ? (?+0x29) [0x29]

PS.: when the system boots, I suspect that the X server crashes, and then it re-starts. Every time a crash happens, it seems
Created attachment 81897 [details] Radeon dmesg stuff
Created attachment 81898 [details] Xorg.0.log
Created attachment 83300 [details] Additional dmesg

I upgraded to kernel 3.10.3 and to the latest radeon firmware found in the linux-firmware tree.

The bug seems to happen often when claws-mail is being used, in general during operations that require it to pop up a selection box. I'm enclosing the dmesg after one such crash.

Sometimes the bug is serious enough to hang the system. Other times it affects only the X screen, e.g. switching to a console works, but returning to X doesn't. In X, it enters a loop where the screen disappears for a while, then returns to show the last image for a while, and then everything happens again.

Most of the time, the machine is still alive and can be accessed via ssh.
xTrapezoidValid() is:

#define xTrapezoidValid(t) ((t)->left.p1.y != (t)->left.p2.y && \
                            (t)->right.p1.y != (t)->right.p2.y && \
                            (int) ((t)->bottom - (t)->top) > 0)

Maybe try printing out traps when it's in a crashed state? (Doing a 'bt full' after a crash would gather values of other variables that might be of interest too.)
The problem is related to mesa. Initially, the affected system was using the mesa-*-9.2-0.14.20130723.fc19 packages. Downgrading them to mesa-*-9.2-0.12.20130610.fc19.x86_64 did not solve it either. However, moving back to the mesa version shipped with Fedora 18 did:

mesa-libEGL-9.1-5.fc18.x86_64
mesa-libwayland-egl-9.1-5.fc18.x86_64
mesa-libglapi-9.1-5.fc18.x86_64
mesa-libGLES-9.1-5.fc18.x86_64
mesa-libgbm-9.1-5.fc18.x86_64
mesa-libGL-9.1-5.fc18.x86_64

It seems that something between mesa 9.1 and mesa 9.2 broke the opening of combo boxes on VERDE.
Can you bisect?
(In reply to comment #6)
> Can you bisect?

I'll try to bisect it at the weekend.
*** Bug 67690 has been marked as a duplicate of this bug. ***
(In reply to comment #7)
> (In reply to comment #6)
> > Can you bisect?
>
> I'll try to bisect it at the weekend.

I'm trying to bisect. However, bisecting will be very difficult, as mesa 9.1 doesn't build on Fedora 19 (it likely builds on some F19 rawhide version, as there was a 9.1.6 package for it).

I'm trying to find a changeset between 9.1 and 9.2 that works, but even among those that compile, several get a segmentation fault at address 0x0. That was the case, for example, with these changesets: 78fbb41, f96c07a, 148f0de, c09a4cb.

It would probably be easier if you could point me to some patches to revert.

For reference, I'm using the same configure parameters as F19 uses for mesa 9.2:

./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-selinux --enable-osmesa --with-dri-driverdir=/usr/lib64/dri --enable-egl --disable-gles1 --enable-gles2 --disable-gallium-egl --disable-xvmc --enable-vdpau --with-egl-platforms=x11,drm,wayland --enable-shared-glapi --enable-gbm --disable-opencl --enable-glx-tls --enable-texture-float=yes --enable-gallium-llvm --with-llvm-shared-libs --enable-dri --enable-xa --with-gallium-drivers=svga,radeonsi,swrast,r600,r300,nouveau --with-dri-drivers=nouveau,radeon,r200,i915,i965 && \
As a reference, this is where compilation fails on 9.1:

builtin_function.cpp:13727:4: error: 'builtin_inverse' was not declared in this scope
    builtin_inverse,
    ^
builtin_function.cpp:17129:4: error: 'builtin_determinant' was not declared in this scope
    builtin_determinant,
    ^
builtin_function.cpp:17143:4: error: 'builtin_inverse' was not declared in this scope
    builtin_inverse,
(In reply to comment #9)
> (In reply to comment #7)
> > (In reply to comment #6)
> > > Can you bisect?
> >
> > I'll try to bisect it at the weekend.
>
> I'm trying to bisect. However, bisecting it will be very difficult, as mesa
> 9.1 doesn't build on Fedora 19 (it likely builds on some F19 rawhide
> version, as there was a 9.1.6 package for it).

Never mind. Doing a "gmake clean" before recompiling fixes this issue. There are probably some missing or broken dependencies in the makefile.

I'm doing the bisect right now.
Bisecting didn't help. So I decided to check whether any of the configure options that changed between Fedora 18 and Fedora 19 could be causing the issue. I compiled mesa 9.2 (actually, git changeset 8edb79f1ef) with the Fedora 18 options, and the issue disappeared.

For reference, these are the configure changes between F18 and F19:

--- Fedora_18 2013-08-04 14:46:45.655278805 -0300
+++ Fedora_19 2013-08-04 14:45:19.505487516 -0300
@@ -1,37 +1,40 @@
 sudo ls configure
 ./configure \
 --build=x86_64-redhat-linux-gnu \
 --host=x86_64-redhat-linux-gnu \
 --program-prefix= \
 --disable-dependency-tracking \
 --prefix=/usr \
 --exec-prefix=/usr \
 --bindir=/usr/bin \
 --sbindir=/usr/sbin \
 --sysconfdir=/etc \
 --datadir=/usr/share \
 --includedir=/usr/include \
 --libdir=/usr/lib64 \
 --libexecdir=/usr/libexec \
 --localstatedir=/var \
 --sharedstatedir=/var/lib \
 --mandir=/usr/share/man \
 --infodir=/usr/share/info \
 --enable-selinux \
---enable-pic \
 --enable-osmesa \
---enable-xcb \
 --with-dri-driverdir=/usr/lib64/dri \
 --enable-egl \
---enable-gles1 \
+--disable-gles1 \
 --enable-gles2 \
 --disable-gallium-egl \
+--disable-xvmc \
+--enable-vdpau \
 --with-egl-platforms=x11,drm,wayland \
 --enable-shared-glapi \
 --enable-gbm \
 --disable-opencl \
+--enable-glx-tls \
+--enable-texture-float=yes \
 --enable-gallium-llvm \
 --with-llvm-shared-libs \
+--enable-dri \
 --enable-xa \
---with-gallium-drivers=svga,r300,r600,radeonsi,nouveau,swrast \
+--with-gallium-drivers=svga,radeonsi,swrast,r600,r300,nouveau \
 --with-dri-drivers=nouveau,radeon,r200,i915,i965
It seems that the bug is caused by --enable-glx-tls.
Is glamor built with --enable-glx-tls as well?
(In reply to comment #14)
> Is glamor built with --enable-glx-tls as well?

Right, glamor needs mesa to have glx-tls enabled. I explained a little of the reason at http://www.freedesktop.org/wiki/Software/Glamor/ . The root cause is that xserver enables it by default, and glamor is built within the xserver domain. I once submitted a patch to mesa to enable glx-tls by default when TLS is supported, the same way xserver does, but the patch was not accepted.
(In reply to comment #14)
> Is glamor built with --enable-glx-tls as well?

I don't think so.

On Fedora 19, the build script in its rpm spec file is:

autoreconf --install
%configure --disable-static
make %{?_smp_mflags}

So, no explicit --enable-glx-tls there.

FYI, glamor seems to be git changeset 81aadb8, judging from its package name:
xorg-x11-glamor-0.5.0-5.20130401git81aadb8.fc19.src.rpm
(In reply to comment #16)
> (In reply to comment #14)
> > Is glamor built with --enable-glx-tls as well?
>
> I don't think so.
>
> On Fedora 19, its build script at its rpm spec file is:
> autoreconf --install
> %configure --disable-static
> make %{?_smp_mflags}
>
> So, no explicit --enable-glx-tls there.
>
> FIY, Glamour seems to be git changeset 81aadb8, from its package name:
> xorg-x11-glamor-0.5.0-5.20130401git81aadb8.fc19.src.rpm

--enable-glx-tls is not specified explicitly, but glamor's configure detects whether TLS is supported and, if it is, enables glx-tls, the same way xorg's autoconf does.

TLS is almost always supported, so in effect glamor requires a mesa built with --enable-glx-tls specified explicitly.
(In reply to comment #17)
> Although --enable-glx-tls is not explicitly specified. But it detect whether
> the tls is supported, and if it's supported, it will enable glx-tls which is
> the same way what xorg's autoconf does.

Then glamor's configure --help output should reflect that.

> And tls is almost always supported, so that make glamor require a mesa with
> --enable-glx-tls specified explicitly.

But if I understand correctly, Mauro is saying that this problem happens when Mesa is built with --enable-glx-tls, but not without...
(In reply to comment #18)
> (In reply to comment #17)
> > Although --enable-glx-tls is not explicitly specified. But it detect whether
> > the tls is supported, and if it's supported, it will enable glx-tls which is
> > the same way what xorg's autoconf does.
>
> Then glamor's configure --help output should reflect that.

Yep. Anyway, I double-checked by compiling it in mock (mock uses the standard Fedora packages instead of my modified ones). According to its output, TLS seems to be enabled:

...
checking for XORG... yes
checking for DRI2... yes
checking whether to include GLAMOR_GLES2 support... no
checking whether to enable DEBUG... no
checking for GL... yes
checking for LIBDRM... yes
checking for EGL... yes
checking for GBM... yes
checking for thread local storage (TLS) support... __thread
checking for tls_model attribute support... yes
checking that generated files are newer than configure... done

> > And tls is almost always supported, so that make glamor require a mesa with
> > --enable-glx-tls specified explicitly.
>
> But if I understand correctly, Mauro is saying that this problem happens
> when Mesa is built with --enable-glx-tls, but not without...

Yes. When mesa is compiled with --enable-glx-tls, either mesa or glamor seems to send invalid or incomplete commands to the Cape Verde GPU in some situations (combo boxes), causing it to wait forever for completion and hanging Xorg.

The GPU itself doesn't hang, as it is still possible to press CTRL-ALT-F2 to move to a console terminal. So it seems that the kernel radeon driver keeps working properly.
(In reply to comment #18)
> (In reply to comment #17)
> > Although --enable-glx-tls is not explicitly specified. But it detect whether
> > the tls is supported, and if it's supported, it will enable glx-tls which is
> > the same way what xorg's autoconf does.
>
> Then glamor's configure --help output should reflect that.
>
> > And tls is almost always supported, so that make glamor require a mesa with
> > --enable-glx-tls specified explicitly.
>
> But if I understand correctly, Mauro is saying that this problem happens
> when Mesa is built with --enable-glx-tls, but not without...

You are right. I looked at Mauro's comments again, and he did say that the problem occurs when mesa is built with --enable-glx-tls. I misunderstood him previously. Unfortunately, that means I have no idea what causes this issue.
I'm a little confused about the symptoms of the problem: The original report and the attached gdb output show what looks like a fairly standard X server crash. When this happens, you should get kicked back to console or the display manager login screen. But the bug title and later comments instead make me think of something like a GPU lockup / reset cycle, which sometimes ends with a completely hung system. What is the connection between these two rather different kinds of symptoms? Do both of them occur every time? Together, or one after another? ...
(In reply to comment #21)
> I'm a little confused about the symptoms of the problem:
>
> The original report and the attached gdb output show what looks like a
> fairly standard X server crash. When this happens, you should get kicked
> back to console or the display manager login screen.
>
> But the bug title and later comments instead make me think of something like
> a GPU lockup / reset cycle, which sometimes ends with a completely hung
> system.
>
> What is the connection between these two rather different kinds of symptoms?
> Do both of them occur every time? Together, or one after another? ...

When I opened the BZ, the original symptom was an Xorg crash. In a few cases (something like 10% of the crashes), the machine just hung.

In order to track down the root cause and see whether the issue had been solved by a later fix, I upgraded from kernel 3.9 to kernel 3.10.3 (and later to 3.10.4) and updated all packages on the machine.

With the updated packages, the symptom changed: when the GPU stops answering mesa/glamor, instead of causing Xorg to crash (and when the machine doesn't hang), it now reports:

[15225.379837] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec

As shown in this attachment:
https://bugs.freedesktop.org/attachment.cgi?id=83300

The workaround to avoid the buggy code was to disable glx-tls in Mesa.
(In reply to comment #22)
> [15225.379837] radeon 0000:01:00.0: GPU lockup CP stall for more than
> 10000msec

If you didn't get these messages before, then the GPU didn't hang. Also, it's unlikely that a GPU hang directly causes an X server crash. So I suspect there are several different problems here.

Can you attach a screenshot showing a claws-mail combo box which can trigger the problem(s)?
Created attachment 83727 [details] claws-mail write email dialog box
(In reply to comment #22)
> When I opened the BZ, the original symptom were a Xorg crash. On a few cases
> the machine just hangs (something like 10% of the crashes).

Crash was already reported in bug 64912
(In reply to comment #25)
> (In reply to comment #22)
>
> > When I opened the BZ, the original symptom were a Xorg crash. On a few cases
> > the machine just hangs (something like 10% of the crashes).
>
> Crash was already reported in bug 64912

Bug 64912 is already fixed. Is this bug still reproducible with the latest glamor git version?
I can confirm the hang is still present with glamor-git (as of today). Disabling glx-tls resolves the issue. I am using mesa-git (also as of today) with a Radeon 8750M.

The easiest way I found to reproduce it was to open Emacs and do something that caused text decoration, such as the underline you'd see when using flyspell and misspelling a word.

It would render each underline "tick" individually, about 10 seconds each, until the whole word was underlined. Then Xorg would recover (until Emacs redrew that buffer again).
Sorry, spoke too soon; my issue still remains. I was using the wrong glamor so it was running through llvmpipe, which always worked fine. Back to the drawing board...
(In reply to comment #27)
> It would render each individual underline "tick" individually, about 10
> seconds each, until the whole word was underlined. Then Xorg would recover
> (until Emacs redrew that buffer again).

Sounds like the GPU keeps locking up, and the kernel radeon driver resets it after 10 seconds. That's probably a Mesa / kernel driver issue. Make sure you're using current Mesa / libdrm / kernel.

Mauro, is your original problem still happening with current glamor / xf86-video-ati Git?
(In reply to comment #29)
> (In reply to comment #27)
> > It would render each individual underline "tick" individually, about 10
> > seconds each, until the whole word was underlined. Then Xorg would recover
> > (until Emacs redrew that buffer again).
>
> Sounds like the GPU keeps locking up, and the kernel radeon driver resets it
> after 10 seconds. That's probably a Mesa / kernel driver issue, make sure
> you're using current Mesa / libdrm / kernel.
>
> Mauro, is your original problem still happening with current glamor /
> xf86-video-ati Git?

Yes. I recently upgraded to Fedora 20, and the problem started happening again (before the upgrade, I was using an older version of glamor where this bug didn't hit because TLS was disabled there).

I'm now running:

kernel-3.13.4-200.fc20.x86_64
xorg-x11-glamor-0.5.1-3.20140115gitfb4d046c.fc20.x86_64
xorg-x11-drv-nouveau-1.0.9-2.fc20.x86_64

And the bug keeps happening in the very same place: when, for example, I try to change my email identity in the claws-mail combo box (there are other places where this bug hits, but this one happens all the time).
Created attachment 95118 [details] dmesg for the error with Kernel 3.13.4
Depending on what version of mesa you are using, can you try and disable hyperz? Set env var R600_DEBUG=nohyperz
Mauro, please attach a Xorg log file with Mesa built without TLS support.
(In reply to comment #32)
> Depending on what version of mesa you are using, can you try and disable
> hyperz? Set env var R600_DEBUG=nohyperz

Yes, it seems so.

Not sure how to set it permanently with systemd and lightdm.

Anyway, I stopped lightdm, and started X manually with both this var set and not set. Without this env var, the nouveau driver crashes (after playing for a while with the combo box). With this env var set, it stopped happening.
Created attachment 95163 [details] Xorg.0.log for mesa 9.2.5-1.20131220.fc20 compiled without TLS
(In reply to comment #34)
> (In reply to comment #32)
> > Depending on what version of mesa you are using, can you try and disable
> > hyperz? Set env var R600_DEBUG=nohyperz
>
> Yes, it seems so.
>
> Not sure how to set it permanently with systemd and lightdm.
>
> Anyway, I stopped lightdm, and started X manually with both this var set and
> not set. Without this env var, the nouveau driver crashes (after playing for
> a while with the combo box). With this env set, it stopped happening.

What version of mesa are you using? Older versions didn't implement hyperz support for SI asics. Also, this option has nothing to do with nouveau. It's a radeon driver env var.
(In reply to comment #36)
> (In reply to comment #34)
> > (In reply to comment #32)
> > > Depending on what version of mesa you are using, can you try and disable
> > > hyperz? Set env var R600_DEBUG=nohyperz
> >
> > Yes, it seems so.
> >
> > Not sure how to set it permanently with systemd and lightdm.
> >
> > Anyway, I stopped lightdm, and started X manually with both this var set and
> > not set. Without this env var, the nouveau driver crashes (after playing for
> > a while with the combo box). With this env set, it stopped happening.
>
> What version of mesa are you using?

Fedora calls it "9.2.5-1.20131220". It seems to be a mesa git snapshot taken on Dec 20, 2013.

> Older versions didn't implement hyperz
> support for SI asics.

Ok.

> Also, this option has nothing to do with nouveau.
> It's a radeon driver env var.

Yeah, sure. Sorry for the mess. I used to have two NV cards in this machine in the past. It seems I forgot to remove the driver.

The Xorg driver I'm using is:
xorg-x11-drv-ati-7.2.0-3.20131101git3b38701.fc20.x86_64

It also seems to be a git snapshot, taken on 2013-11-01 at changeset 3b38701.
(In reply to comment #34)
> Anyway, I stopped lightdm, and started X manually with both [R600_DEBUG=nohyperz] set and
> not set. Without this env var, the nouveau driver crashes (after playing for
> a while with the combo box). With this env set, it stopped happening.

Weird: AFAICT glamor doesn't use depth/stencil functionality at all. Also, radeonsi only got HyperZ support last December, but you were talking about these hangs long before then.

(In reply to comment #35)
> Xorg.0.log for mesa 9.2.5-1.20131220.fc20 compiled without TLS

As you can see, this prevents glamor from loading, which disables all hardware acceleration in X. You can get the same effect with Option "NoAccel" in xorg.conf.
(In reply to comment #37)
> > What version of mesa are you using?
>
> Fedora calls it as "9.2.5-1.20131220".

Can you try current upstream Git master, or at least the 10.1 or a 10.0.y release? There have been a lot of fixes in radeonsi since 9.2.y.
I haven't seen any hangs on VERDE for a long time.