Hello!

I discovered this issue while browsing with a WebKit-based browser such as Midori or Epiphany under the new Gnome-Shell.

Steps to reproduce:
1. Gnome 3.2 with Gnome-Shell, using the Radeon drivers from Linux/Xorg
2. # sysctl -w kernel.sysrq="1" // WARNING! This will not save your ass!!!
3. $ launch Midori or Epiphany with JavaScript enabled
4. http://piratpix.com/bpt2011.1/index.html // Pictures from the federal convention of the German Pirate Party; note: there is a "javascript void(0)" error on the site
5. Open one or more images with the mouse

Results:
* Crash/hang of the system
* Magic SysRq does not help!

Software/Hardware:
pacman -Q libwebkit libwebkit3 midori epiphany
libwebkit 1.6.1-1
libwebkit3 1.6.1-1
midori 0.4.1-1
epiphany 3.2.0-1

and also:
* Self-compiled vanilla kernel 3.0.3 and stock kernel 3.0.6 from Archlinux
* Graphics card is an AMD Radeon 5650, with open-source drivers

Not affected:
* Gnome 3.2 in Fallback-Mode
* Firefox 7 (also shows the javascript void(0) error but displays everything as usual)
* Fedora 15 with Radeon 4670 and open-source drivers and Midori 0.3.6

Thoughts:
* Maybe the chain of problems starts with the JavaScript error
* Next step is WebKit
* Next step is Gnome-Shell
* But even if there are bugs in all of the above, the operating system should crash/hang in any case. So I'm afraid this is a bug in the kernel/radeon driver.
* It is also interesting that Fedora 15 with another AMD graphics card is not affected.

Thanks

PS: Discussion in the Archlinux forums - https://bbs.archlinux.org/viewtopic.php?pid=1004227#p1004227
...the operating-system shouldn't crash. You can keep the other typos :-)
Please attach your xorg log and dmesg output.
Will post both as soon as possible. Maybe this evening.
Created attachment 52435 [details] dmesg output
Created attachment 52436 [details] xorg log
I have set vblank_mode=0 and SwapbuffersWait to "false", but even without them the system still crashes/hangs.
Created attachment 52437 [details] lspci output
Created attachment 52438 [details] pacman -Q output, list of installed packages
My self-compiled vanilla kernel 3.0.3 is not listed, but the problem also exists with the stock Archlinux kernel.
I have done some testing with my laptop, which also has the integrated Intel graphics of the Core i5 CPU. Result: * If I use the integrated Intel graphics device, I'm not affected by this bug
Created attachment 52831 [details] xorg.log after crash/hang of system
I've upgraded to kernel 3.1 but the bug is still present. The first log I uploaded shows the state BEFORE anything happened and does not seem very helpful. My mistake. So here is a new log file taken after the crash/hang of the system; at the end you can see an error message:
[ 71.633] (II) RADEON(0): radeon_dri2_flip_event_handler:981 fevent[0x19870d0] width 1366 pitch 5632 (/4 1408)
I hope this is more helpful :-)
Ooops. There are a lot of bugreports around Gnome 3.2 and the flip_event_handler message. https://bbs.archlinux.org/viewtopic.php?id=127506 http://lists.freedesktop.org/archives/dri-devel/2011-October/015112.html https://bugs.archlinux.org/task/26340 Just the tip of the iceberg :-(
Possibly a duplicate of bug 41592.
Hmmm. Would it be helpful for you if I tried to use kexec to get a dump file of my crashed kernel?
Seems that this is not the same bug, looking at mirandir's comment.
I tried a git version of xf86-video-ati, but it didn't change anything.
1. I downloaded the beta release of Fedora 16 (Live) and put it on a USB thumb drive
2. Booted up my computer and installed Epiphany (WebKit)
3. Visited piratpix.com
4. Clicked through the thumbnails
I've done this on my laptop with a Radeon 5650 Mobility and on my desktop with a Radeon 46??. The benefit of this test is that both machines run a 100% identical system, so we know that this issue is not caused by me or my configuration.
laptop with Radeon 5650 -> crash!
desktop with Radeon 46xx -> runs smoothly without problems
Conclusion: It is the graphics card! But why? It runs perfectly in all other situations (Gnome-Shell itself, IOQuake3, videos, framebuffer...).
* Upgraded BIOS from 1.13 to 1.19, no effect (Acer Timeline 3820TG)
* xf86-video-ati 6.14.3, no effect
Am I the only user in the world with this bug?! Damn!
(In reply to comment #9)
> So I provide here a new logfile after crash/hang of the system, at the end you
> can see an error-message:
> [ 71.633] (II) RADEON(0): radeon_dri2_flip_event_handler:981
> fevent[0x19870d0] width 1366 pitch 5632 (/4 1408)
Those aren't errors but harmless debug messages. Something is increasing the X server log level from the default for you.
As the problem doesn't happen without gnome-shell, it's most likely a Mesa driver bug. Can you try if it still happens with current upstream Mesa Git? Please also attach the glxinfo output.
Okay. I've tried it (for more than five hours). The final showstopper was mesa-git, with a compile error in r300. I will wait for the next official release of mesa/ati-dri/xf86-video-ati and hope for the best. By the way, even if the current git repos include a fix: shouldn't the bug itself be in the radeon kernel module?
Created attachment 53212 [details] glxinfo output of kernel 3.1
Created attachment 53213 [details] glxinfo output kernel 2.6.37 not affected
Created attachment 53214 [details] glxinfo of kernel 2.6.39.3
Okay! I've nearly got it! I took a bunch of old stock kernels from Archlinux and tested a lot.
kernel-2.6.37 is not affected
kernel-2.6.38.1 fails to log in to gnome-shell (don't care)
kernel-2.6.38.5-1 is not affected
kernel-2.6.38.7-1 is not affected
kernel-2.6.38.8 is affected and crashes reliably!
kernel-2.6.39.3 is affected and crashes reliably!
kernel-3.0 is affected and crashes reliably!
kernel-3.1 is affected and crashes reliably!
live-cd of fedora 16 (beta) is affected
live-cd of fedora 15 is not affected (kernel-2.6.38-something)
So I think one of the patches between 2.6.38.7 and 2.6.38.8 is the cause! I hope this helps :-)
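The narrowing-down in the comment above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not a tool used in this report: the results are transcribed from the list above with the Archlinux package release suffix (such as "-1") dropped, and the names `RESULTS`, `version_key` and `regression_window` are made up for the example.

```python
# Results transcribed from the comment above. "skip" marks the kernel that
# could not be tested because the gnome-shell login failed.
RESULTS = {
    "2.6.37": "good",
    "2.6.38.1": "skip",
    "2.6.38.5": "good",
    "2.6.38.7": "good",
    "2.6.38.8": "bad",
    "2.6.39.3": "bad",
    "3.0": "bad",
    "3.1": "bad",
}


def version_key(version):
    """Turn '2.6.38.8' into (2, 6, 38, 8) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def regression_window(results):
    """Return (last_good, first_bad), or None if the results do not split cleanly."""
    tested = sorted((v for v, s in results.items() if s != "skip"), key=version_key)
    good = [v for v in tested if results[v] == "good"]
    bad = [v for v in tested if results[v] == "bad"]
    if not good or not bad:
        return None
    last_good, first_bad = good[-1], bad[0]
    # A good result newer than a bad one means there is no single boundary.
    if version_key(last_good) > version_key(first_bad):
        return None
    return last_good, first_bad


print(regression_window(RESULTS))  # ('2.6.38.7', '2.6.38.8')
```

Sorting by a numeric tuple rather than by string matters here, because a plain string sort would not order kernel versions reliably.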
I'm just guessing: Alex Deucher (1): drm/radeon/evergreen/btc/fusion: setup hdp to invalidate and flush when asked Possible?!
Or this? This looks more interesting! https://lkml.org/lkml/2011/6/1/302 I'm patching...
Created attachment 53218 [details] [review] remove r600_ioctl_wait_idle for evergreen (r800) based cards
It seems my guess was right. If so, this patch caused the bug: https://lkml.org/lkml/2011/6/1/302
I took a look at the source of /drivers/gpu/drm/radeon/radeon_asic.c and decided to remove "ioctl_wait_idle = r600_ioctl_wait_idle" from "static struct radeon_asic evergreen_asic". I hope this doesn't cause new problems, but maybe I'm lucky:
/drivers/gpu/drm/radeon/r600.c
/**
 * r600_ioctl_wait_idle - flush host path cache on wait idle ioctl
 * rdev: radeon device structure
 * bo: buffer object struct which userspace is waiting for idle
 *
 * Some R6XX/R7XX doesn't seems to take into account HDP flush performed
 * through ring buffer, this leads to corruption in rendering, see
 * http://bugzilla.kernel.org/show_bug.cgi?id=15186 to avoid this we
 * directly perform HDP flush by writing register through MMIO.
 */
void r600_ioctl_wait_idle(struct radeon_device *rdev, struct radeon_bo *bo) { ... }
My affected card is a Radeon 5650 Mobility, an Evergreen part, which is roughly R800. So r600_ioctl_wait_idle() shouldn't be necessary for Evergreen-based cards. I've now tested the change as much as I can within Gnome-Shell and framebuffer terminals (suspend to RAM, runs of IOQuake3, glxgears, glchess, Midori, Firefox). Thanks
I still think this may be a mesa bug. Have you tried mesa from git? The glxinfo outputs you attached are from mesa 7.11.
Created attachment 53247 [details] [review] flush HDP via the ring You still need to flush the HDP cache otherwise the CPU may get stale data if it accesses vram after GPU rendering is complete.
Fine! I will apply your patch this evening and report the result as soon as possible. Can you tell me briefly what the HDP cache is? What is it good for? Thanks! I tried to compile mesa from git, but didn't succeed and gave up in shame! In general I think code from user space shouldn't be able to trigger (or prevent, in this case) a fatal crash or hang in kernel space. So I decided to investigate the problem by testing the different versions of the kernel and poking around in the code ;-)
(In reply to comment #28)
> Fine! I will apply your patch today evening and will report the result as soon
> as possible. Can you tell me shortly what the HDP cache is? For what is it
> good? Thanks!
>
Host Data Path. It's the interface for accessing vram via the CPU. E.g., when you access a buffer in vram via the PCI FB BAR, it goes through the HDP on the GPU. Unfortunately, bugzilla.kernel.org is down so I don't remember exactly what bug the r600_ioctl_wait_idle patch fixed. It's possible you just aren't hitting the case in your particular desktop scenario, but that removing it would regress other cases. I'm leery of applying it until I have some confirmation that someone else is hitting this bug or that it doesn't regress any other cases.
>
> I tried to compile mesa from git, but didn't succeed and gave up in shame! In
What sort of problems are you seeing?
> general I think code from user-space shouldn't able to trigger (or prevent, in
> this case) a fatal crash or hang in kernel-space. So I decided to investigate
> the problem by testing the different versions of the kernel and some poking in
> the code ;-)
That's the nature of complex 3D engines.
(In reply to comment #29)
> Host Data Path. It's the interface for accessing vram via the CPU. E.g., when
> you access a buffer in vram via the PCI FB BAR, it goes through the HDP on the
> GPU. Unfortunately, bugzilla.kernel.org is down so I don't remember exactly
> what bug the r600_ioctl_wait_idle patch fixed. It's possible you just aren't
> hitting the case in your particular desktop scenario, but that removing it
> would regress other cases. I'm leery of applying it until I have some
> confirmation that someone else is hitting this bug or that it doesn't regress
> any other cases.
Thanks for your description. I understand that you are being careful.
> What sort of problems are you seeing?
The compiler complained about something in "r300...", but honestly I don't really remember it.
> That's the nature of complex 3D engines.
I understand! But we are lucky to have you and your co-workers at AMD :-)
Okay. I tried to apply the patch, but it doesn't work.
[peter@cupcake linux-3.1]$ patch -p1 --dry-run < ~/flush_hdp_via_the_ring.patch
patching file drivers/gpu/drm/radeon/evergreen_blit_kms.c
Hunk #1 FAILED at 625.
1 out of 1 hunk FAILED -- saving rejects to file drivers/gpu/drm/radeon/evergreen_blit_kms.c.rej
patching file drivers/gpu/drm/radeon/r600.c
Hunk #1 FAILED at 2331.
Hunk #2 succeeded at 2342 (offset -11 lines).
1 out of 2 hunks FAILED -- saving rejects to file drivers/gpu/drm/radeon/r600.c.rej
patching file drivers/gpu/drm/radeon/r600_blit_kms.c
Hunk #1 FAILED at 503.
1 out of 1 hunk FAILED -- saving rejects to file drivers/gpu/drm/radeon/r600_blit_kms.c.rej
patching file drivers/gpu/drm/radeon/radeon_asic.c
I'm afraid you created the patch against a different kernel? Is it current git?
Yep, looks like git! It's a little bit too late for me now. I will give you feedback tomorrow!
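For readers who want to see at a glance which files rejected the patch, here is a minimal Python sketch that summarizes the dry-run output quoted above. It assumes the message layout shown in this comment ("patching file ...", "Hunk #N FAILED at LINE."); other versions of patch may word things differently, and the helper name `failed_hunks` is made up for this example.

```python
import re

# Dry-run output copied from the comment above.
PATCH_OUTPUT = """\
patching file drivers/gpu/drm/radeon/evergreen_blit_kms.c
Hunk #1 FAILED at 625.
1 out of 1 hunk FAILED -- saving rejects to file drivers/gpu/drm/radeon/evergreen_blit_kms.c.rej
patching file drivers/gpu/drm/radeon/r600.c
Hunk #1 FAILED at 2331.
Hunk #2 succeeded at 2342 (offset -11 lines).
1 out of 2 hunks FAILED -- saving rejects to file drivers/gpu/drm/radeon/r600.c.rej
patching file drivers/gpu/drm/radeon/r600_blit_kms.c
Hunk #1 FAILED at 503.
1 out of 1 hunk FAILED -- saving rejects to file drivers/gpu/drm/radeon/r600_blit_kms.c.rej
patching file drivers/gpu/drm/radeon/radeon_asic.c
"""

FILE_RE = re.compile(r"^patching file (\S+)$")
HUNK_RE = re.compile(r"^Hunk #(\d+) (FAILED|succeeded) at (\d+)")


def failed_hunks(output):
    """Map each patched file to a list of (hunk number, line) pairs that failed."""
    failures = {}
    current = None
    for line in output.splitlines():
        file_match = FILE_RE.match(line)
        if file_match:
            current = file_match.group(1)
            failures[current] = []
            continue
        hunk_match = HUNK_RE.match(line)
        if hunk_match and current is not None and hunk_match.group(2) == "FAILED":
            failures[current].append((int(hunk_match.group(1)), int(hunk_match.group(3))))
    return failures


for path, hunks in failed_hunks(PATCH_OUTPUT).items():
    print(path, "clean" if not hunks else f"failed hunks: {hunks}")
```

In this output only radeon_asic.c applies cleanly, which fits the suspicion that the patch was made against a newer tree than linux-3.1.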
Sorry! I'm late! Because Linus Torvalds released 3.2-rc1, I decided to test rc1 instead of a clone of git master.
* kernel-3.2-rc1 without patch - unstable, crash/hang
* kernel-3.2-rc1 with your patch - stable, works perfectly
During testing I got the feeling that kernel 3.2-rc1 is "more stable" even without the patch. I will try to describe it. In most cases an unpatched kernel crashes/hangs after opening 1 to 5 images by mouse click (not slideshow) on piratpix.com (the only website known to me that reproduces the issue reliably). The unpatched rc1 seems to survive longer: I was able to open ~20 pictures before the system stopped responding. Kernel 3.2-rc1 with your patch is completely stable and works fine! Great work! Thanks
I'm adding a new glxinfo and dmesg from the kernel with the patch!
Created attachment 53353 [details] glxinfo of kernel-3.2-rc1 with patch
Created attachment 53354 [details] dmesg of kernel-3.2-rc1 with patch
Oh no! After my last comment I just left Epiphany open and let it run the slideshow on the website. Meanwhile I wrote some mails and listened to some music (just normal stuff), until my system crashed/hung. The music kept repeating in a sound loop.
Kernel 3.1 - regularly crashes within 1 to 5 pictures
Kernel 3.2-rc1 - regularly crashes after ~20 pictures
Kernel 3.2-rc1 with patch - crashes after really many pictures
:-(
Any chance you can try a newer mesa? There may be test packages available for your distro.
In the official testing repos of Archlinux there are currently no mesa packages, but I know a repo hosted by a user - http://spiralinear.org/perry3d/x86_64/ If these packages work, I will remove your patch or downgrade to 3.1 to get a more "unreliable" environment and start testing.
Good news!
* I installed mesa-git and everything depending on it
* kernel 3.2-rc1 and 3.1
The system is stable and doesn't crash! I'm still worried about the complex connections between mesa and the kernel, but I'm just glad to have a rock-stable system! Thanks! Maybe I can get access to the identical laptop of a co-worker with the same hardware (the Radeon is just relabeled as 6650) and will try Fedora 16 on it. I want to reproduce the bug to confirm it myself. After that I will add the result here and close the bug, finally :-)
I'm not using any patches for the kernels!
(In reply to comment #38) > * I installed mesa-git and everything depending on it [...] > The system is stable and doesn't crash! Glad to hear it. It would be great if you could try if the problem still happens with the current Mesa Git 7.11 branch, and if it does, if you could bisect which change from the master branch fixed it. Then we could maybe backport the fix to the 7.11 branch.
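Bisecting for a fix is a binary search over the commit history: each build-and-test step halves the range that can contain the fixing commit. The sketch below only illustrates that idea in Python; it is not how `git bisect` is implemented, the commit names are placeholders, and `is_fixed` stands in for the manual step of building Mesa at a commit and checking whether the hang still occurs. It assumes a linear history in which the fix, once applied, is never reverted.

```python
def first_fixed(commits, is_fixed):
    """Return the first commit for which is_fixed() is true.

    Assumes commits are ordered oldest to newest, commits[0] still shows
    the bug and commits[-1] does not.
    """
    lo, hi = 0, len(commits) - 1  # lo: known broken, hi: known fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(commits[mid]):
            hi = mid  # fix is at mid or earlier
        else:
            lo = mid  # fix is after mid
    return commits[hi]


# Hypothetical history of ten commits where the fix landed in "c6".
history = [f"c{i}" for i in range(10)]
tested = []


def check(commit):
    """Stand-in for building and testing one commit by hand."""
    tested.append(commit)
    return int(commit[1:]) >= 6


print(first_fixed(history, check), "found after testing", tested)
```

The number of builds grows only logarithmically with the number of commits, which is why bisecting stays feasible even across a long history. Note that when hunting for a fix rather than a regression, the usual "good" and "bad" roles in `git bisect` are reversed.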
To be honest, bisecting mesa is too time-consuming for me. So I decided to give the new mesa release 7.11.1 a try, but the commit which fixed the issue in mesa-git is not included. Currently I'm looking forward to Thursday next week, when I should get access to my co-worker's laptop. I've taken a look at the commit messages in the mesa git log, but I didn't find anything interesting. Maybe I will take a deeper look later.
My co-worker gave me access to his laptop today (thanks!).
1. I booted up Fedora 16 (Live) and installed Epiphany
2. Visited the website from above
3. After clicking on the second thumbnail the system hung/crashed
It is the same 3820TG from Acer with a Radeon 5600 Series ASIC; the BIOS version 1.19 is also the same. Looks "reproducible" to me. The next interesting thing would be a general test on other Evergreen-based ASICs. Should I set this bug to "Resolved"? Or wait for Mesa 8.0?
HD5850 owner here. For many months I suffered from completely random system freezes (no tty, no ssh) as soon as I switched to gallium instead of classic. I could not reproduce it, and the logs did not contain anything suspicious, probably because I needed to reset the machine. I can't count how many config files got corrupted in this time. It was almost guaranteed to happen within two hours of uptime, and it didn't matter which distribution or desktop I used. Sadly, I had no time to debug it further and used the classic driver instead. I tried mesa-git in the last few days and the problem disappeared. I tracked it down until I found this little bug report, just to find out what it was. Thank you guys, you did me and perhaps a few others a great favour.
The good news: mesa-8.0 is stable
The better news: mesa-8.0.1-2 is stable in the Archlinux repositories
The best news: it is officially fixed!
The sad news: I still don't know what finally fixed it ;-)