Summary: | Screen locks up at random points when using a 3D compositing wm (gnome-shell) on an rv515 (radeon mobility x1300) | ||
---|---|---|---|
Product: | Mesa | Reporter: | dmotd <inaudible> |
Component: | Drivers/Gallium/r300 | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | critical | ||
Priority: | high | CC: | ac, ccrisan, jstpierre, przemyslaw |
Version: | 7.11 | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
Attachments: | dmesg output using drm.debug=14; glxinfo; startx output; Xorg.0.log; /proc/interrupts after the freeze; /proc/interrupts on a fresh (working) boot; output of glxinfo; gdb gnome-shell backtrace; possible fix |
Created attachment 52203 [details]
glxinfo
Created attachment 52204 [details]
startx output
Created attachment 52205 [details]
Xorg.0.log
> `DISPLAY=:0 openbox --replace` results in:
>
> "Invalid MIT-MAGIC-COOKIE-1 key
> Openbox-Message: Failed to open the display from the DISPLAY environment variable."

This is because gdm3 uses a non-default X11 authentication cookie. I use something like

XAUTHORITY=/run/gdm3/$(sudo ls /run/gdm3|grep $(whoami))/database DISPLAY=:0 [...]

to work around it.

> running glxgears just shows an empty black box.. all other glx demos are the
> same empty boxes..

Do they work with the environment variable vblank_mode=0? If yes, does the number for radeon increase in /proc/interrupts once the problem occurs?

Do you have records of when you updated the kernel last? What kernel version did you have before you started experiencing this issue?

(In reply to comment #4)
> This is because gdm3 uses a non-default X11 authentication cookie. I use
> something like
>
> XAUTHORITY=/run/gdm3/$(sudo ls /run/gdm3|grep $(whoami))/database DISPLAY=:0 [...]
>
> to work around it.

i've since ditched a graphical login, i've had better success reinitiating X with startx.. the last freeze i had i managed to initiate an openbox session (--replace) on top of the failed gnome-shell (which i killed, and switched to 'fallback mode' from the cmdline). the result was that many windows were inactive and frozen (ie. evolution, gnome-terminal, chromium), while a few others (gvim was one, and empathy was another) remained active and usable. i could however start new instances without issue.

> Do they work with the environment variable vblank_mode=0? If yes, does the
> number for radeon increase in /proc/interrupts once the problem occurs?

setting vblank_mode=0 works and displays an output.. but not much change in /proc/interrupts (irq 46 for radeon). i'll attach the output from after the freeze and one from a fresh happy boot..

Created attachment 52248 [details]
/proc/interrupts after the freeze
Created attachment 52249 [details]
/proc/interrupts on a fresh (working) boot..
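The two /proc/interrupts snapshots attached above can also be compared mechanically. The following is only a rough sketch, not something used in this report: the function names and the snapshot file names are made up for illustration, and it assumes the usual /proc/interrupts layout (IRQ number, one counter per CPU, then the controller and device names).

```shell
#!/bin/bash
# Sum the per-CPU interrupt counters of every /proc/interrupts line that
# mentions "radeon". The snapshot is read from stdin.
# (radeon_irq_count is a hypothetical helper name.)
radeon_irq_count() {
    awk '/radeon/ {
        # Field 1 is the IRQ number ("46:"); the per-CPU counters follow
        # and stop at the first non-numeric field.
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) sum += $i
    } END { print sum + 0 }'
}

# Print how much the radeon counter grew between two saved snapshots.
# $1 = earlier snapshot file, $2 = later snapshot file (hypothetical names).
irq_delta() {
    local before after
    before=$(radeon_irq_count < "$1")
    after=$(radeon_irq_count < "$2")
    echo $(( after - before ))
}

# Example: cat /proc/interrupts > snap1; sleep 5; cat /proc/interrupts > snap2
#          irq_delta snap1 snap2
```

A delta of zero while the display is supposed to be updating would suggest that the interrupt is no longer being delivered.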
(In reply to comment #5)
> Do you have records of when you updated the kernel last? What kernel version
> did you have before you started experiencing this issue?

it has been occurring since i made the transition to gnome-shell about two months ago, but i believe it was occurring before in another compositing environment (enlightenment e17 with the ecomorph/ecomp extension - a compiz fork) which i wrongly attributed to the unstable nature of the e17/ecomorph codebase.. i didn't really test that for very long, although i can tell you that i tested it in march of this year, and some quick googling suggests that it would have been running with a 2.6.37 kernel on archlinux then. sorry i don't have package logs going back that far.

(In reply to comment #6)
> setting vblank_mode=0 works and displays an output.. but not much change in
> /proc/interrupts (irq 46 for radeon)

Not much change for the radeon number, or none at all? If the latter, apparently the IRQ for the radeon card stops working for some reason, which would explain the core symptoms of the freeze.

(In reply to comment #10)
> Not much change for the radeon number, or none at all? If the latter,
> apparently the IRQ for the radeon card stops working for some reason, which
> would explain the core symptoms of the freeze.

no change to the radeon irq number.

(In reply to comment #11)
> no change to the radeon irq number.

is there a way i can debug this further?

Try the following options in the kernel command line in grub:
pci=nomsi
noapic
irqpoll
and see if any of them help.

(In reply to comment #13)
> Try the following options in the kernel command line in grub:
> pci=nomsi
> noapic
> irqpoll
> and see if any of them help.

I have been running my machine with all the above kernel flags appended and i haven't yet experienced an issue. I haven't had a chance to exhaustively test these settings, but my machine has been active for a few days now without a screen lock, so I thought I would report back that this seems to help.

(In reply to comment #14)
> I have been running my machine with all the above kernel flags appended and i
> haven't yet experienced an issue. I haven't had a chance to exhaustively test
> these settings, but my machine has been active for a few days now without a
> screen lock, so I thought I would report back that this seems to help.

Can you narrow down which specific one helps?
Created attachment 54445 [details]
output of glxinfo

I'm experiencing the same issue on Ubuntu 11.10. I'm ready to provide any backtraces needed. I'm attaching a gnome-shell backtrace and glxinfo output. The same issue is reported on the gnome-shell bug tracker: https://bugzilla.gnome.org/show_bug.cgi?id=650857 but they claim it's either X's or the driver's fault.

Created attachment 54446 [details]
gdb gnome-shell backtrace
(In reply to comment #13)
> Try the following options in the kernel command line in grub:
> pci=nomsi
> noapic
> irqpoll
> and see if any of them help.

I've tried your suggestion and it worked! I've discovered the following:

"pci=nomsi noapic irqpoll" no freeze
"pci=nomsi irqpoll" no freeze
"irqpoll" no freeze
"pci=nomsi" no freeze
"noapic" no freeze
"" freeze

So far, without any of these options I can reproduce the crash in 100% of cases within 1 minute of moving an image in Gimp. I did it many times when backtracing. However, there is a slight probability that I just got lucky with one of the options. I'm 100% sure I've run `update-grub` after every /etc/default/grub change.

Assuming that all of these options fix the problem, which option should I use (which one disables the least)? I found the following but I don't understand much:

noapic: [SMP,APIC] Tells the kernel to not make use of any IOAPICs that may be present in the system.
irqpoll: [HW] When an interrupt is not handled search all handlers for it. Also check all handlers each timer interrupt. Intended to get systems with badly broken firmware running.
pci=nomsi: [MSI] If the PCI_MSI kernel config parameter is enabled, this kernel boot option can be used to disable the use of MSI interrupts system-wide.

And finally: Is it a proper fix, or just a workaround?

(In reply to comment #18)
> And finally: Is it a proper fix, or just a workaround?

It's a workaround. Since those options work, it's not a radeon bug. It's most likely a platform bug for your board; probably a bad apic or msi setup for your chipset. You might need an apic or pci quirk for your motherboard chipset. I would suggest emailing the linux-kernel mailing list and saying that you need noapic or pci=nomsi to get things working on your board. Include the hw details of your system (lspci, etc.).

Alex, I think this is a driver or hw bug actually.

I seem to lose MSI rearms here; if I manually poke a rearm in from userspace over ssh the system recovers fine.

not sure if we should disable MSI on rv515, you might be able to find some info internally.

(In reply to comment #21)
> Alex I think this is a driver or hw bug actually.
>
> I seem to lose MSI rearms here, if I manually poke a rearm in from userspace over ssh the system recovers fine.
>
> not sure if we should disable MSI on rv515, you might be able to find some info internally.

I'll see what I can find, but r5xx is pretty old. Did reading back the rearm reg help?

Created attachment 56992 [details] [review]
possible fix

Does this patch help?

A patch referencing this bug report has been merged in Linux v3.3-rc4:

commit b7f5b7dec3d539a84734f2bcb7e53fbb1532a40b
Author: Alex Deucher <alexander.deucher@amd.com>
Date: Mon Feb 13 16:36:34 2012 -0500

    drm/radeon/kms: fix MSI re-arm on rv370+

still seeing the odd lockup, will play for a few more days though.

Current kernels disable MSI by default for RV515. Does that resolve this report?

apologies for the many months without reply - i have not been in a position to contribute after making the initial bug report. i recently performed a system update and am currently running kernel 3.3.1 on archlinux. i can confirm that this bug is still present, and i am still getting occasional graphics freezes of the same nature as before. once again i can get openbox to replace the affected x session, so basic rendering is still okay.

i did get an opportunity to test pci=nomsi for a while and noticed that while there were no freezes of this nature, the 3d graphics rendering would sometimes receive a slight unexplained performance hit (visual lag) that would remain for the rest of the session, a minor effect but not debilitating to general use.

Is it still an issue with current mesa and kernel ??
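For anyone re-testing the boot options discussed in this thread, it may help to confirm that an option actually reached the running kernel rather than relying on the grub config alone. This is only a sketch: the function name is made up, and it only matches the options as standalone words, so a combined form such as `pci=nomsi,...` would not be detected.

```shell
#!/bin/bash
# Print which of the workaround options from this thread appear in a kernel
# command line. Pass the command line as $1; with no argument the running
# kernel's /proc/cmdline is used.
# (active_irq_workarounds is a hypothetical helper name.)
active_irq_workarounds() {
    local cmdline=${1:-$(cat /proc/cmdline)}
    local opt found=""
    for opt in pci=nomsi noapic irqpoll; do
        # Pad with spaces so only whole words match.
        case " $cmdline " in
            *" $opt "*) found="$found $opt" ;;
        esac
    done
    # Strip the leading space; prints an empty line if nothing matched.
    echo "${found# }"
}

# Example: active_irq_workarounds
```

An empty result after a reboot would mean that none of the three options took effect, which is worth ruling out before concluding that a given option does or does not help.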
Created attachment 52202 [details]
dmesg output using drm.debug=14

i have an intermittent issue with "ATI Mobility Radeon X1300" on an up-to-date archlinux laptop running gnome-shell.. all's well until rendering breaks, sometimes after many hours/days use.. there's no real sign of a repeatable action and nothing reported in logs.. X continues to run and the mouse cursor is active, and even changes to represent the screen content..

the closest i can get to a repeatable action is an almost certain freeze using gimp with strenuous brush activity using the clone tool on an average size image.

if i restart gdm i still have no mouse/keyboard interaction with onscreen although gdm dress is displayed..

if i killall -HUP gnome-shell in the running session the window content (without decorations) briefly displays but when it settles only the background appears.

if i switch to the VT and back to X then window content with border decorations is displayed instead of background image, but still no interaction.

i can't launch any X applications from VT or ssh, or use another WM to replace. `DISPLAY=:0 openbox --replace` results in:

"Invalid MIT-MAGIC-COOKIE-1 key
Openbox-Message: Failed to open the display from the DISPLAY environment variable."

however, if i kill gdm, and create an xinitrc for openbox i can start an Xsession with keybd/mouse interaction, so i guess there's definitely something up with 3d (offscreen?) rendering.

from within the openbox session: running glxgears just shows an empty black box.. all other glx demos are the same empty boxes..

when i'm logged in to this openbox xsession i can launch xapps via `DISPLAY=:0 foobar` variable.. so this bug i'm encountering must do something to interrupt the ability to draw new windows?

i can only get a working 3D X session by unloading/reloading kernel drivers.. radeon/drm/ttm/drm_kms_helper..

here's the very basic script i'm using to unload drivers so i can start X with 3D..
---
#!/bin/bash
# unbind kms fb before unloading modules..
echo 0 > /sys/class/vtconsole/vtcon1/bind
sleep 1
# unload kernel modules
rmmod drm_kms_helper ttm radeon drm
sleep 1
# reload kernel modules
modprobe radeon
---

some basic information:

GL_VERSION: 2.1 Mesa 7.11
GL_VENDOR: X.Org R300 Project
GL_RENDERER: Gallium 0.4 on ATI RV515
uname -a: Linux neondada 3.0-ARCH #1 SMP PREEMPT Tue Aug 30 08:53:25 CEST 2011 x86_64 Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz GenuineIntel GNU/Linux

i have included output from startx, dmesg (with drm.debug=14), Xorg.0.log & glxinfo.